Tutorial: Create your first custom Databricks Asset Bundle template
In this tutorial, you'll create a custom Databricks Asset Bundle template for creating bundles that run a job with a specific Python task on a cluster using a specific Docker container image.
Before you start
Install the Databricks CLI version 0.218.0 or above. If you've already installed it, confirm the version is 0.218.0 or higher by running databricks -v from the command line.
Define user prompt variables
The first step in building a bundle template is to define the databricks bundle init user prompt variables. From the command line:
Create an empty directory named dab-container-template:
mkdir dab-container-template
In the directory's root, create a file named databricks_template_schema.json:
cd dab-container-template
touch databricks_template_schema.json
Add the following contents to the databricks_template_schema.json and save the file. Each variable will be translated to a user prompt during bundle creation.
{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "project_name",
      "description": "Project name",
      "order": 1
    }
  }
}
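To make the role of the order field concrete, the following minimal Python sketch reads a schema like the one above and lists its variables in prompt order. The prompt_order helper is hypothetical and only illustrates the shape of the schema; it is not how the Databricks CLI itself processes the file.

```python
import json

# Same content as databricks_template_schema.json in this tutorial.
SCHEMA_TEXT = """
{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "project_name",
      "description": "Project name",
      "order": 1
    }
  }
}
"""


def prompt_order(schema: dict) -> list[tuple[str, str, str]]:
    """Return (name, description, default) tuples sorted by each property's 'order'."""
    properties = schema.get("properties", {})
    ordered = sorted(properties.items(), key=lambda item: item[1].get("order", 0))
    return [
        (name, spec.get("description", ""), str(spec.get("default", "")))
        for name, spec in ordered
    ]


if __name__ == "__main__":
    for name, description, default in prompt_order(json.loads(SCHEMA_TEXT)):
        print(f"{description} [{default}] -> {name}")
```

If you later add more variables to the schema, a check like this is a quick way to confirm that their order values produce the prompt sequence you intended.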
Create the bundle folder structure
Next, in the dab-container-template directory, create a template folder with subdirectories named resources and src. The template folder contains the directory structure for your generated bundles. The names of the subdirectories and files will follow Go package template syntax when derived from user values.
mkdir -p "template/resources"
mkdir -p "template/src"
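As a rough illustration of how a placeholder in a file or directory name becomes a concrete name, this Python sketch substitutes {{.name}} fields in a path. It is a simplified stand-in written for this tutorial, not Go's template engine: the render_path helper is hypothetical, handles only plain {{.name}} fields, and the value my_project is made up.

```python
import re

# Matches simple Go-template-style fields such as {{.project_name}}.
FIELD = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_path(path: str, values: dict[str, str]) -> str:
    """Replace each {{.name}} field in path with values[name]."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for template field {name!r}")
        return values[name]

    return FIELD.sub(substitute, path)


if __name__ == "__main__":
    # Hypothetical answer to the project_name prompt.
    values = {"project_name": "my_project"}
    print(render_path("src/{{.project_name}}/task.py", values))
```

The sketch raises an error for an unknown field rather than leaving the placeholder in place, so a misspelled variable name shows up immediately.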
Add YAML configuration templates
In the template directory, create a file named databricks.yml.tmpl and add the following YAML:
touch template/databricks.yml.tmpl
# This is a Databricks asset bundle definition for {{.project_name}}.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: {{.project_name}}
include:
  - resources/*.yml
targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: {{workspace_host}}
  # The 'prod' target, used for production deployment.
  prod:
    # For production deployments, we only have a single copy, so we override the
    # workspace.root_path default of
    # /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
    # to a path that is not specific to the current user.
    #
    # By making use of 'mode: production' we enable strict checks
    # to make sure we have correctly configured this target.
    mode: production
    workspace:
      host: {{workspace_host}}
      root_path: /Shared/.bundle/prod/${bundle.name}
    {{- if not is_service_principal}}
    run_as:
      # This runs as {{user_name}} in production. Alternatively,
      # a service principal could be used here using service_principal_name
      # (see Databricks documentation).
      user_name: {{user_name}}
    {{end -}}
Create another YAML file named {{.project_name}}_job.yml.tmpl and place it in the template/resources directory. This new YAML file splits the project job definitions from the rest of the bundle's definition. Add the following YAML to this file to describe the template job, which contains a specific Python task to run on a job cluster using a specific Docker container image:
touch template/resources/{{.project_name}}_job.yml.tmpl
# The main job for {{.project_name}}
resources:
  jobs:
    {{.project_name}}_job:
      name: {{.project_name}}_job
      tasks:
        - task_key: python_task
          job_cluster_key: job_cluster
          spark_python_task:
            python_file: ../src/{{.project_name}}/task.py
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            docker_image:
              url: databricksruntime/python:10.4-LTS
            node_type_id: i3.xlarge
            spark_version: 13.3.x-scala2.12
In this example, you use a default Databricks base Docker container image, but you can specify your own custom image instead.
Add files referenced in your configuration
Next, create a template/src/{{.project_name}} directory and create the Python task file referenced by the job in the template:
mkdir -p template/src/{{.project_name}}
touch template/src/{{.project_name}}/task.py
Now, add the following to task.py:
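from pyspark.sql import SparkSession
# Reuse the active Spark session if one exists; otherwise create one.
spark = SparkSession.builder.master('local[*]').appName('example').getOrCreate()
# Print the Spark version to confirm that the task ran.
print(f'Spark version: {spark.version}')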
Verify the bundle template structure
Review the folder structure of your bundle template project. It should look like this:
.
├── databricks_template_schema.json
└── template
    ├── databricks.yml.tmpl
    ├── resources
    │   └── {{.project_name}}_job.yml.tmpl
    └── src
        └── {{.project_name}}
            └── task.py
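If you prefer to check the layout programmatically, a small script can compare the directory against the tree above. This is a minimal sketch, assuming you run it from the dab-container-template directory; the missing_template_files helper and the EXPECTED_FILES list are hypothetical names introduced here.

```python
from pathlib import Path

# Relative paths this tutorial's template is expected to contain.
EXPECTED_FILES = [
    "databricks_template_schema.json",
    "template/databricks.yml.tmpl",
    "template/resources/{{.project_name}}_job.yml.tmpl",
    "template/src/{{.project_name}}/task.py",
]


def missing_template_files(root: Path) -> list[str]:
    """Return the expected relative paths that are not regular files under root."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_template_files(Path("."))
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected template files are present.")
```

Note that the {{.project_name}} segments are literal characters in the template's file names at this stage; they are only replaced when a bundle is generated.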
Test your template
Finally, test your bundle template. To generate a bundle based on your new custom template, use the databricks bundle init command, specifying the new template location. From your bundle projects' root folder (assumed here to be the folder that contains dab-container-template):
mkdir my-new-container-bundle
cd my-new-container-bundle
databricks bundle init ../dab-container-template
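For repeatable tests, it can help to supply the prompt values from a file instead of typing them at each prompt. The sketch below assumes the --config-file and --output-dir options of databricks bundle init (confirm them with databricks bundle init --help for your CLI version) and assumes the template directory sits next to the new bundle directory. The helper names, the file name init-values.json, and the value my_project are made up for illustration.

```python
import json
from pathlib import Path


def write_init_values(path: Path, values: dict[str, str]) -> Path:
    """Write the prompt answers as a JSON object and return the file path."""
    path.write_text(json.dumps(values, indent=2) + "\n", encoding="utf-8")
    return path


def build_init_command(template_dir: str, config_file: str, output_dir: str) -> list[str]:
    """Assemble the argument list for a non-interactive 'databricks bundle init' run."""
    return [
        "databricks", "bundle", "init", template_dir,
        "--config-file", config_file,
        "--output-dir", output_dir,
    ]


if __name__ == "__main__":
    # Only prints the command; nothing is executed or written here.
    command = build_init_command("../dab-container-template", "init-values.json", ".")
    print(" ".join(command))
```

To actually run the command, you could write the values file with write_init_values and pass the returned list to subprocess.run(command, check=True); this requires the Databricks CLI on your PATH.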
Next steps
- Create a bundle that deploys a notebook to an Azure Databricks workspace and then runs that deployed notebook as an Azure Databricks job. See Develop a job on Azure Databricks using Databricks Asset Bundles.
- Create a bundle that deploys a notebook to an Azure Databricks workspace and then runs that deployed notebook as a Delta Live Tables pipeline. See Develop Delta Live Tables pipelines with Databricks Asset Bundles.
- Create a bundle that deploys and runs an MLOps Stack. See Databricks Asset Bundles for MLOps Stacks.
- Add a bundle to a CI/CD (continuous integration/continuous deployment) workflow in GitHub. See Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions.