Databricks Asset Bundles for MLOps Stacks
You can use Databricks Asset Bundles, the Databricks CLI, and the Databricks MLOps Stack repository on GitHub to create MLOps Stacks. An MLOps Stack is an MLOps project on Azure Databricks that follows production best practices out of the box. See What are Databricks Asset Bundles?.
To create, deploy, and run an MLOps Stacks project, complete the following steps:
Requirements
- Make sure that the target remote workspace has workspace files enabled. See What are workspace files?.
- On your development machine, make sure that Databricks CLI version 0.212.2 or above is installed. To check your installed Databricks CLI version, run the command databricks -v. To update your Databricks CLI version, see Install or update the Databricks CLI. (Bundles do not work with Databricks CLI versions 0.18 and below.)
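The version requirement above can also be checked from a script. The following is a minimal sketch, not part of the MLOps Stacks tooling: the function names version_ge, extract_version, and check_cli_version are hypothetical, and the sketch assumes that databricks -v prints a line containing a dotted version number (for example, Databricks CLI v0.212.2). It compares numeric fields only, so pre-release suffixes are not handled.

```shell
#!/bin/sh
# Minimum Databricks CLI version stated in the requirements.
REQUIRED_VERSION="0.212.2"

# Returns 0 if dotted version $1 is greater than or equal to $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
  vg_left=$1
  vg_right=$2
  while [ -n "$vg_left" ] || [ -n "$vg_right" ]; do
    vg_l=${vg_left%%.*}
    vg_r=${vg_right%%.*}
    # Drop the field just read; an exhausted side becomes empty.
    case $vg_left in *.*) vg_left=${vg_left#*.} ;; *) vg_left= ;; esac
    case $vg_right in *.*) vg_right=${vg_right#*.} ;; *) vg_right= ;; esac
    vg_l=${vg_l:-0}
    vg_r=${vg_r:-0}
    if [ "$vg_l" -gt "$vg_r" ]; then return 0; fi
    if [ "$vg_l" -lt "$vg_r" ]; then return 1; fi
  done
  return 0
}

# Prints the first dotted number found in the text passed as $1.
# Assumption: the CLI's version output contains such a number.
extract_version() {
  printf '%s\n' "$1" | sed -n 's/[^0-9]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Reports whether the installed CLI meets REQUIRED_VERSION.
check_cli_version() {
  installed=$(extract_version "$(databricks -v 2>/dev/null)")
  if [ -z "$installed" ]; then
    echo "Databricks CLI not found, or its version was not recognized" >&2
    return 1
  fi
  if version_ge "$installed" "$REQUIRED_VERSION"; then
    echo "Databricks CLI $installed meets the minimum $REQUIRED_VERSION"
  else
    echo "Databricks CLI $installed is older than $REQUIRED_VERSION" >&2
    return 1
  fi
}
```

After these definitions are loaded, calling check_cli_version returns a nonzero status when the CLI is missing or too old, so it can serve as a guard at the top of a setup script.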
Step 1: Set up authentication
Configure the Databricks CLI for authentication.
This article assumes that you want to use OAuth user-to-machine (U2M) authentication and a corresponding Azure Databricks configuration profile named DEFAULT for authentication.
Note
U2M authentication is appropriate for trying out these steps in real time. For fully automated workflows, Databricks recommends that you use OAuth machine-to-machine (M2M) authentication instead. See the M2M authentication setup instructions in Authentication.
Use the Databricks CLI to initiate OAuth token management locally by running the following command for each target workspace.
In the following command, replace <workspace-url> with your Azure Databricks per-workspace URL, for example https://adb-1234567890123456.7.databricks.azure.cn.
databricks auth login --host <workspace-url>
The Databricks CLI prompts you to save the information that you entered as an Azure Databricks configuration profile. Press Enter to accept the suggested profile name, or enter the name of a new or existing profile. Any existing profile with the same name is overwritten with the information that you entered. You can use profiles to quickly switch your authentication context across multiple workspaces.
To get a list of any existing profiles, in a separate terminal or command prompt, use the Databricks CLI to run the command databricks auth profiles. To view a specific profile's existing settings, run the command databricks auth env --profile <profile-name>.
In your web browser, complete the on-screen instructions to log in to your Azure Databricks workspace.
To view a profile's current OAuth token value and the token's upcoming expiration timestamp, run one of the following commands:
databricks auth token --host <workspace-url>
databricks auth token -p <profile-name>
databricks auth token --host <workspace-url> -p <profile-name>
If you have multiple profiles with the same --host value, you might need to specify the --host and -p options together to help the Databricks CLI find the correct matching OAuth token information.
Step 2: Create the bundle project
Use Databricks Asset Bundle templates to create your MLOps Stacks project's starter files. To do this, begin by running the following command:
databricks bundle init mlops-stacks
Answer the on-screen prompts. For guidance on answering these prompts, see Start a new project in the Databricks MLOps Stacks repository on GitHub.
The first prompt offers the option of setting up the ML code components, the CI/CD components, or both. This option simplifies the initial setup as you can choose to create only those components that are immediately relevant. (To set up the other components, run the initialization command again.) Select one of the following:
- CICD_and_Project (default) - Set up both ML code and CI/CD components.
- Project_Only - Set up ML code components only. This option is for data scientists to get started.
- CICD_Only - Set up CI/CD components only. This option is for ML engineers to set up infrastructure.
After you answer all of the on-screen prompts, the template creates your MLOps Stacks project's starter files and adds them to your current working directory.
Customize your MLOps Stacks project's starter files as desired. To do this, follow the guidance in the following files within your new project:
| Role | Goal | Docs |
| --- | --- | --- |
| First-time users of this repo | Understand the ML pipeline and code structure in this repo | README.md |
| Data Scientist | Get started writing ML code for a brand new project | <project-name>/README.md |
| Data Scientist | Update production ML code (for example, model training logic) for an existing project | docs/ml-pull-request.md |
| Data Scientist | Modify production model ML resources (for example, model training or inference jobs) | <project-name>/resources/README.md |
| MLOps / DevOps | Set up CI/CD for the current ML project | docs/mlops-setup.md |
For customizing experiments, the mappings within an experiment declaration correspond to the create experiment operation's request payload as defined in POST /api/2.0/mlflow/experiments/create in the REST API reference, expressed in YAML format.
For customizing jobs, the mappings within a job declaration correspond to the create job operation's request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format.
Tip
You can define, combine, and override the settings for new job clusters in bundles by using the techniques described in Override cluster settings in Databricks Asset Bundles.
For customizing models, the mappings within a model declaration correspond to the create model operation's request payload as defined in POST /api/2.0/mlflow/registered-models/create in the REST API reference, expressed in YAML format.
For customizing pipelines, the mappings within a pipeline declaration correspond to the create pipeline operation's request payload as defined in POST /api/2.0/pipelines in the REST API reference, expressed in YAML format.
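To make the job mapping described above more concrete, the following sketch writes a small job declaration to a file. It is an illustration only: the function name write_example_job, the resource key example_training_job, the cluster values, and the notebook path are hypothetical placeholders, not files or settings that the MLOps Stacks template generates. The field names (name, job_clusters, new_cluster, tasks, notebook_task) follow the POST /api/2.1/jobs/create request payload.

```shell
#!/bin/sh
# Writes a hypothetical job declaration, in bundle YAML form, to the path in $1.
write_example_job() {
  cat > "$1" <<'EOF'
# Hypothetical resource file; every name and value below is a placeholder.
resources:
  jobs:
    example_training_job:
      name: example-training-job
      job_clusters:
        - job_cluster_key: example_cluster
          new_cluster:
            spark_version: 13.3.x-cpu-ml-scala2.12
            node_type_id: Standard_D3_v2
            num_workers: 1
      tasks:
        - task_key: train
          job_cluster_key: example_cluster
          notebook_task:
            notebook_path: ../notebooks/example_train.py
EOF
}
```

For example, calling write_example_job with a scratch file path produces a declaration that you can compare against the generated .yml files in your project before you edit them.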
Step 3: Validate the bundle project
Check whether the bundle configuration is valid. To do this, run the Databricks CLI from the project's root, where the databricks.yml file is located, as follows:
databricks bundle validate
If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, and then repeat this step.
Step 4: Deploy the bundle
Deploy the project's resources and artifacts to the desired remote workspace. To do this, run the Databricks CLI from the project's root, where the databricks.yml file is located, as follows:
databricks bundle deploy -t <target-name>
Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
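Validation and deployment are often chained so that a broken configuration never reaches the workspace. The following is a minimal sketch under stated assumptions: validate_and_deploy is a hypothetical helper name, and the sketch assumes that databricks bundle validate exits with a nonzero status when it reports errors.

```shell
#!/bin/sh
# Validates the bundle for the target in $1, then deploys only if validation passes.
# Run from the project's root, where databricks.yml is located.
validate_and_deploy() {
  target=$1
  if [ -z "$target" ]; then
    echo "usage: validate_and_deploy <target-name>" >&2
    return 2
  fi
  # Assumption: a failed validation is signalled by a nonzero exit status.
  databricks bundle validate -t "$target" || {
    echo "Validation failed for target '$target'; skipping deploy" >&2
    return 1
  }
  databricks bundle deploy -t "$target"
}
```

For example, validate_and_deploy dev runs both commands against the dev target and stops before deploying if validation fails.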
Step 5: Run the deployed bundle
The project's deployed Azure Databricks jobs automatically run on their predefined schedules. To run a deployed job immediately, run the Databricks CLI from the project's root, where the databricks.yml file is located, as follows:
databricks bundle run -t <target-name> <job-name>
- Replace <target-name> with the name of the desired target within the databricks.yml file where the job was deployed, for example dev, test, staging, or prod.
- Replace <job-name> with the name of the job in one of the .yml files within <project-name>/databricks-resources, for example batch_inference_job, write_feature_table_job, or model_training_job.
A link to the Azure Databricks job appears, which you can copy into your web browser to open the job within the Azure Databricks UI.
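When several deployed jobs need to run one after another, the run command can be wrapped in a loop. The following sketch is illustrative only: run_jobs_in_order is a hypothetical helper name, the order in which you pass the job names is your own choice, and the sketch assumes that databricks bundle run waits for the run to finish and exits with a nonzero status when the run fails.

```shell
#!/bin/sh
# Runs each named job on the given target, stopping at the first failure.
# Usage: run_jobs_in_order <target-name> <job-name> [<job-name> ...]
run_jobs_in_order() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_jobs_in_order <target-name> <job-name> [<job-name> ...]" >&2
    return 2
  fi
  target=$1
  shift
  for job in "$@"; do
    echo "Running $job on target $target"
    # Assumption: a failed run is signalled by a nonzero exit status.
    databricks bundle run -t "$target" "$job" || {
      echo "Job '$job' failed; stopping" >&2
      return 1
    }
  done
}
```

For example, run_jobs_in_order dev write_feature_table_job model_training_job runs the two jobs on the dev target and skips the second if the first fails.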
Step 6: Delete the deployed bundle (optional)
To delete a deployed project's resources and artifacts if you no longer need them, run the Databricks CLI from the project's root, where the databricks.yml file is located, as follows:
databricks bundle destroy -t <target-name>
Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
Answer the on-screen prompts to confirm the deletion of the previously deployed resources and artifacts.