AI and machine learning on Databricks
This article describes the tools that Mosaic AI (formerly Databricks Machine Learning) provides to help you build AI and ML systems. The diagram shows how various products on the Databricks platform help you implement end-to-end workflows to build and deploy AI and ML systems.
Machine learning on Databricks
With Mosaic AI, a single platform serves every step of ML development and deployment, from raw data to inference tables that save every request and response for a served model. Data scientists, data engineers, ML engineers, and DevOps teams can do their jobs using the same set of tools and a single source of truth for the data.
Mosaic AI unifies the data layer and ML platform. All data assets and artifacts, such as models and functions, are discoverable and governed in a single catalog. Using a single platform for data and models makes it possible to track lineage from the raw data to the production model. Built-in data and model monitoring saves quality metrics to tables that are also stored in the platform, making it easier to identify the root cause of model performance problems. For more information about how Databricks supports the full ML lifecycle and MLOps, see MLOps workflows on Azure Databricks and MLOps Stacks: model development process as code.
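To make the inference table idea mentioned above more concrete, the following minimal sketch saves every request and response for a model to a table. It is an illustration only, not the Databricks implementation: it uses a local in-memory SQLite database, and the `predict` function, the `inference_log` table name, and its columns are all hypothetical.

```python
import json
import sqlite3
import time


def predict(features):
    # Hypothetical stand-in for a served model: returns the mean of the inputs.
    return sum(features) / len(features)


def create_inference_table(conn):
    # Hypothetical schema chosen for this sketch; real inference tables
    # define their own layout.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inference_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ts REAL, request TEXT, response TEXT)"
    )


def serve(conn, features):
    # Score the request, then persist both the request and the response.
    result = predict(features)
    conn.execute(
        "INSERT INTO inference_log (ts, request, response) VALUES (?, ?, ?)",
        (time.time(), json.dumps(features), json.dumps(result)),
    )
    conn.commit()
    return result


conn = sqlite3.connect(":memory:")
create_inference_table(conn)
serve(conn, [1.0, 2.0, 3.0])
serve(conn, [4.0, 6.0])

# Read the log back, as a monitoring job might do later.
rows = conn.execute(
    "SELECT request, response FROM inference_log ORDER BY id"
).fetchall()
for request, response in rows:
    print(request, "->", response)
```

Because each prediction is stored next to the input that produced it, a later quality check can join the logged responses against ground-truth labels without re-running the model.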
Some of the key components of the data intelligence platform are:
Tasks | Component |
---|---|
Govern and manage data, features, models, and functions, including discovery, versioning, and lineage | Unity Catalog |
Feature development and management | Feature engineering and serving |
Train models | AutoML, Databricks notebooks |
Track model development | MLflow tracking |
Build automated workflows and production-ready ETL pipelines | Databricks Jobs |
Git integration | Databricks Git folders |
Deep learning on Databricks
Configuring infrastructure for deep learning applications can be difficult. Databricks Runtime for Machine Learning takes care of that for you, with clusters that have built-in compatible versions of the most common deep learning libraries like TensorFlow, PyTorch, and Keras.
Databricks Runtime ML clusters also include pre-configured GPU support with drivers and supporting libraries. They also support libraries like Ray for parallelizing compute to scale ML workflows and applications.
For machine learning applications, Databricks recommends using a cluster running Databricks Runtime for Machine Learning. See Create a cluster using Databricks Runtime ML.
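One way to confirm which of the libraries named above are available in the current Python environment is to query the installed package metadata. The sketch below uses only the Python standard library; the `installed_versions` helper is a hypothetical name, and the list of packages is an assumption based on the libraries mentioned in this section (`torch` is the PyPI distribution name for PyTorch).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Distribution names as published on PyPI (assumed list for illustration).
report = installed_versions(["tensorflow", "torch", "keras", "ray"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in a notebook cell attached to a cluster shows the versions that the cluster's runtime provides. On an environment without these libraries, each entry is reported as not installed.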
To get started with deep learning on Databricks, see:
- Best practices for deep learning on Azure Databricks
- Deep learning on Databricks
- Reference solutions for deep learning
Next steps
To get started, see:
For a recommended MLOps workflow on Databricks Mosaic AI, see:
To learn about key Databricks Mosaic AI features, see: