Databricks Asset Bundles resources

Article
2024-12-23

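Databricks Asset Bundles allows you to specify information about the Azure Databricks resources used by the bundle in the resources mapping in the bundle configuration. See resources mapping.

This article outlines the resource types that bundles support and provides details and an example for each supported type.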

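Supported resources

The following table lists the resource types that bundles support. Some resources can be created by defining them in a bundle and deploying the bundle; others can only be included in a bundle by referencing a resource that already exists.

Resources are defined using the create operation request payload of the corresponding Databricks REST API object, where the object's supported fields, expressed as YAML, are the resource's supported properties. The table links to the documentation for each resource's corresponding payload.

Tip

The databricks bundle validate command returns warnings if it finds unknown resource properties in bundle configuration files.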

Resource	Create support	Resource properties
cluster	✓	Cluster properties: POST /api/2.1/clusters/create
dashboard		Dashboard properties: POST /api/2.0/lakeview/dashboards
experiment	✓	Experiment properties: POST /api/2.0/mlflow/experiments/create
job	✓	Job properties: POST /api/2.1/jobs/create
model (legacy)	✓	Model properties: POST /api/2.0/mlflow/registered-models/create
model_serving_endpoint	✓	Model serving endpoint properties: POST /api/2.0/serving-endpoints
pipeline	✓	Pipeline properties: POST /api/2.0/pipelines
quality_monitor	✓	Quality monitor properties: POST /api/2.1/unity-catalog/tables/{table_name}/monitor
registered_model (Unity Catalog)	✓	Unity Catalog model properties: POST /api/2.1/unity-catalog/models
schema (Unity Catalog)	✓	Unity Catalog schema properties: POST /api/2.1/unity-catalog/schemas
volume (Unity Catalog)	✓	Unity Catalog volume properties: POST /api/2.1/unity-catalog/volumes

cluster

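The cluster resource allows you to create all-purpose clusters. The following example creates a cluster named my_cluster and sets it as the cluster used to run the notebook in my_job: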

bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

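  jobs:
    my_job:
      tasks:
        - task_key: test_task
          # Run this task on the cluster defined above.
          existing_cluster_id: ${resources.clusters.my_cluster.id}
          notebook_task:
            notebook_path: "./src/my_notebook.py"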

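dashboard

The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.

The following example includes the sample NYC Taxi Trip Analysis dashboard in the bundle and deploys it to the Databricks workspace.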

resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: "NYC Taxi Trip Analysis"
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}
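
The dashboard example references ${var.warehouse_id}, so a variable with that name has to be declared somewhere in the bundle configuration. A minimal sketch of such a declaration follows. The description text and the placeholder default are illustrative assumptions, not part of the original example:

```yaml
variables:
  warehouse_id:
    description: ID of the SQL warehouse that runs the dashboard queries
    # Placeholder value (assumption); replace it with a real warehouse ID
    # or override it per deployment target.
    default: "<your-warehouse-id>"
```

With a declaration like this in place, the value can be overridden per target or supplied at deploy time, for example with the --var option of databricks bundle deploy.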

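If you modify the dashboard in the UI, those modifications are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll for and retrieve changes to the dashboard. See Generate a bundle configuration file.

In addition, if you attempt to deploy a bundle whose dashboard JSON file differs from the one in the remote workspace, an error occurs. To force the deployment and overwrite the dashboard in the remote workspace with the local one, use the --force option. See Deploy a bundle.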

experiment

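The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see MLflow experiments.

The following example defines an experiment that all users can view: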

resources:
  experiments:
    experiment:
      name: my_ml_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs
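
On Databricks, workspace MLflow experiments are typically identified by an absolute workspace path rather than a bare name. The following variant is a sketch under that assumption. The folder layout is hypothetical, and ${workspace.current_user.userName} is the same substitution that the quality_monitor example in this article uses:

```yaml
resources:
  experiments:
    experiment:
      # Assumed absolute workspace path; adjust the folder to your workspace layout.
      name: /Users/${workspace.current_user.userName}/my_ml_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs
```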

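job

The job resource allows you to define jobs and their corresponding tasks in your bundle. For information about jobs, see Schedule and orchestrate workflows. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job on Azure Databricks using Databricks Asset Bundles.

The following example defines a job with the resource key hello-job and one notebook task: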

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py

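For information about defining job tasks and overriding job settings, see Add tasks to jobs in Databricks Asset Bundles, Override job tasks settings in Databricks Asset Bundles, and Override cluster settings in Databricks Asset Bundles.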
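
As a rough illustration of overriding a job task setting per deployment target, the sketch below repeats the hello-job definition and adds a targets mapping. The target name dev and the 600-second timeout are assumptions chosen for the example; the task in the target is assumed to be matched to the top-level task by its task_key:

```yaml
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:   # hypothetical target name
    resources:
      jobs:
        hello-job:
          tasks:
            # Assumed to merge with the top-level task that has the same task_key,
            # so only the settings listed here change for this target.
            - task_key: hello-task
              timeout_seconds: 600
```

Keeping the shared definition at the top level and placing only the differences under each target avoids duplicating the whole job per environment.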

model_serving_endpoint

The model_serving_endpoint resource allows you to define model serving endpoints.

The following example defines a Unity Catalog model serving endpoint:

resources:
  model_serving_endpoints:
    uc_model_serving_endpoint:
      name: "uc-model-endpoint"
      config:
        served_entities:
        - entity_name: "myCatalog.mySchema.my-ads-model"
          entity_version: "10"
          workload_size: "Small"
          scale_to_zero_enabled: "true"
        traffic_config:
          routes:
          - served_model_name: "my-ads-model-10"
            traffic_percentage: "100"
      tags:
      - key: "team"
        value: "data science"

quality_monitor (Unity Catalog)

The quality_monitor resource allows you to define a Unity Catalog table monitor.

The following example defines a quality monitor:

resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      inference_log:
        granularities: [1 day]
        model_id_col: model_id
        prediction_col: prediction
        label_col: price
        problem_type: PROBLEM_TYPE_REGRESSION
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC

registered_model (Unity Catalog)

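The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.

The following example defines a registered model in Unity Catalog: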

resources:
  registered_models:
    model:
      name: my_model
      catalog_name: ${bundle.target}
      schema_name: mlops_schema
      comment: Registered model in Unity Catalog for ${bundle.target} deployment target
      grants:
        - privileges:
            - EXECUTE
          principal: account users

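pipeline

The pipeline resource allows you to create Delta Live Tables pipelines. For information about pipelines, see What is Delta Live Tables?. For a tutorial that uses a Databricks Asset Bundles template to create a pipeline, see Develop Delta Live Tables pipelines with Databricks Asset Bundles.

The following example defines a pipeline with the resource key hello-pipeline: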

resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py
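
A pipeline defined this way can also be referenced from other resources in the same bundle. The following sketch assumes a job that triggers the pipeline through a pipeline task; the job key run-hello-pipeline and the task key refresh are hypothetical names, and the substitution follows the ${resources.<type>.<key>.id} pattern used elsewhere in this article:

```yaml
resources:
  jobs:
    run-hello-pipeline:   # hypothetical job key
      name: run-hello-pipeline
      tasks:
        - task_key: refresh   # hypothetical task key
          pipeline_task:
            # Resolves to the ID of the pipeline deployed by this bundle.
            pipeline_id: ${resources.pipelines.hello-pipeline.id}
```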

schema (Unity Catalog)

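The schema resource type allows you to define Unity Catalog schemas for tables and other assets that are created by the workflows and pipelines in your bundle. Unlike other resource types, a schema has the following limitations:

The owner of a schema resource is always the deployment user and cannot be changed. If run_as is specified in the bundle, it is ignored by operations on the schema.
Only the fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported because it is only available on the update API.

The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as its target: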

resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ./nb.sql
      development: true
      catalog: main
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by DABs.

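A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define them within the schemas mapping. For more information about grants, see Show, grant, and revoke privileges.

The following example defines a Unity Catalog schema with grants: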

resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - CAN_MANAGE
        - principal: my_team
          privileges:
            - CAN_READ
      catalog_name: main

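volume (Unity Catalog)

The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle that defines a volume, note the following:

A volume cannot be referenced in the bundle's artifact_path until it exists in the workspace. Therefore, to use Databricks Asset Bundles to create the volume, first define the volume in the bundle and deploy the bundle to create it, and then reference it in artifact_path in subsequent deployments.
Volumes in the bundle are not prepended with the dev_${workspace.current_user.short_name} prefix when the deployment target has mode: development configured. However, you can configure this prefix manually. See Custom presets.

The following example creates a Unity Catalog volume with the key my_volume: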

resources:
  volumes:
    my_volume:
      catalog_name: main
      name: my_volume
      schema_name: my_schema
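
Because a volume must exist before artifact_path can point at it, the reference is added only in a later deployment. A minimal sketch of that second step, assuming the volume above has already been deployed and using a hypothetical artifacts subfolder inside it:

```yaml
# Later deployment: my_volume already exists because an earlier deployment created it.
workspace:
  # /Volumes/<catalog>/<schema>/<volume>/...; the "artifacts" folder name is an assumption.
  artifact_path: /Volumes/main/my_schema/my_volume/artifacts
```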