Snowflake and Anaconda recently announced the general availability of Snowpark for Python, a solution that embeds Anaconda’s data and machine learning packages within Snowflake’s Data Cloud.
In public preview since June, this native integration is aimed at the Python community of data scientists, data engineers, developers, and analysts who want to build data pipelines and machine learning workflows directly within Snowflake.
Snowflake says Python is catching up to SQL in popularity within the data world. Its key motivation behind Snowpark for Python was to let SQL and Python work together without the complex infrastructure management that running separate languages usually requires.
The companies list the following capabilities for Snowpark for Python:
- Run secure Python-based workflows without the need to copy or move data.
- Access the most popular open-source Python packages such as NumPy, scikit-learn, SciPy, pandas, TensorFlow and others in Snowflake without any manual installs.
- Accelerate Python-based workflows running inside Snowflake’s secure processing engine with Anaconda’s dependency management and securely built packages.
- Build production data pipelines and data science workflows with Anaconda-curated Python libraries that run in a secure sandbox inside Snowflake.
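To make the first capability more concrete, here is a minimal sketch of how a Python function might be pushed down to run next to the data as a Snowpark user-defined function. It is an illustration under stated assumptions, not code from either company: the table name `ORDERS`, the column `AMOUNT`, the output column `AMOUNT_BUCKET`, and both function names are hypothetical, and `session` is assumed to be an already-connected Snowpark session.

```python
def amount_bucket(amount: float) -> str:
    """Plain-Python handler: label an amount as small, medium, or large."""
    if amount is None:
        return "unknown"
    if amount < 100:
        return "small"
    if amount < 1000:
        return "medium"
    return "large"


def add_bucket_column(session, table_name: str = "ORDERS", column: str = "AMOUNT"):
    """Register amount_bucket as a UDF and apply it to one column.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    ORDERS, AMOUNT, and AMOUNT_BUCKET are hypothetical names.
    """
    # Imported lazily so the handler above can be exercised locally
    # without the Snowpark client library installed.
    from snowflake.snowpark.functions import col, udf
    from snowflake.snowpark.types import FloatType, StringType

    # Register the handler as a temporary UDF for this session.
    bucket_udf = udf(
        amount_bucket,
        return_type=StringType(),
        input_types=[FloatType()],
        session=session,
    )

    # Builds a lazy DataFrame; the UDF runs inside Snowflake when the
    # result is eventually collected or written, so no rows are copied out.
    return session.table(table_name).with_column(
        "AMOUNT_BUCKET", bucket_udf(col(column))
    )
```

The handler here uses only the standard library so it can be checked locally. The `udf` function also accepts a `packages` argument, which is where third-party packages from the Snowflake channel would be listed for a handler that needs them.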
“Since we announced the public preview of Anaconda in Snowpark for Python this June, data scientists have told us that the ability to use their favorite programming language directly inside the database has been a game-changer,” said Peter Wang, CEO and co-founder of Anaconda. “Snowflake users can be more productive with cutting-edge machine learning tools while meeting the needs of organizational governance; at the production end, it is easier for the business to ‘see’ machine learning models and deploy them into business environments.”
Coding in multiple languages can heighten security risks because data ends up siloed. “Snowpark users’ seamless access to Anaconda’s curated package repository helps address two of the biggest challenges data scientists face using open-source software: meeting InfoSec standards and managing package dependencies in their computing environments,” Anaconda said in a release.
“As a major contributor to open source projects, Snowflake wanted to bring enterprise-grade open-source innovation to the Snowflake Data Cloud,” said Torsten Grabs, director of product management at Snowflake. “By embedding Anaconda’s repository and package manager into the Snowflake engine, data scientists and engineers can use the most popular open source packages without needing to copy or move the data.”
Snowflake says the GA of Snowpark for Python is just the beginning. The company is actively expanding functionality based on community feedback from the Snowflake and Anaconda ideas board, and it will continue adding to the existing repository of over 2,000 packages available in the Snowflake channel. Packages added since the public preview include Prophet, PyNomaly, Datasketch, h3-py, Gensim, email_validator, PyPDF2, and tzdata, among others. In the future, Snowflake plans to add support for Python 3.9 and higher, offer user-defined aggregate functions, and provide more granular package access controls.
Snowflake also announced the public preview of Snowpark-optimized warehouses. The company claims that each node of the new warehouse option provides 16x the memory and 10x the cache of a standard warehouse. Snowflake says this will unlock ML training inside Snowflake for large datasets and enable memory-intensive operations such as statistical analysis, feature engineering transformations, model training, and inference.
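For readers wondering how the new warehouse option is selected, the sketch below builds the SQL statement that requests one. It assumes the `WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'` option of Snowflake's `CREATE WAREHOUSE` command. The helper name, the warehouse name `ml_wh`, and the default size are hypothetical choices for illustration, and the sizes actually supported may differ by account and release.

```python
import re


def snowpark_optimized_warehouse_sql(name: str, size: str = "MEDIUM") -> str:
    """Build a CREATE WAREHOUSE statement for a Snowpark-optimized warehouse.

    Hypothetical helper: it only formats SQL text and does not contact Snowflake.
    """
    # Accept only simple unquoted identifiers, to keep the generated SQL safe.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", name):
        raise ValueError(f"invalid warehouse name: {name!r}")
    # Sizes are words such as MEDIUM or X-LARGE; reject anything else.
    if not re.fullmatch(r"[A-Za-z0-9-]+", size):
        raise ValueError(f"invalid warehouse size: {size!r}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        "WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'"
    )
```

With a connected Snowpark session, the statement could then be run with something like `session.sql(snowpark_optimized_warehouse_sql("ml_wh")).collect()`.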