Create a Unity Catalog metastore
This article shows how to create a Unity Catalog metastore and link it to workspaces.
Important
For workspaces that were enabled for Unity Catalog automatically, the instructions in this article are unnecessary. Databricks began to enable new workspaces for Unity Catalog automatically on November 9, 2023, with a rollout proceeding gradually across accounts. You must follow the instructions in this article only if you have a workspace and don't already have a metastore in your workspace region. To determine whether a metastore already exists in your region, see Automatic enablement of Unity Catalog.
A metastore is the top-level container for data in Unity Catalog. Unity Catalog metastores register metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them. Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized. You must have one metastore for each region in which your organization operates. To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
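As an illustration of the three-level namespace, the following minimal Python sketch splits a dotted name into its catalog, schema, and table parts. The names TableName and parse_table_name, and the example name main.sales.orders, are hypothetical and chosen only for this example. The sketch assumes that no part contains a dot or quoting characters, so it is not a full identifier parser.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified name in the three-level namespace (hypothetical helper)."""

    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        # Join the three levels back into catalog.schema.table form.
        return ".".join(self)


def parse_table_name(name: str) -> TableName:
    """Split a simple dotted name into catalog, schema, and table.

    Assumption: no part contains a dot or quoting characters.
    """
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableName(*parts)


if __name__ == "__main__":
    name = parse_table_name("main.sales.orders")
    print(name.catalog, name.schema, name.table)
```

A name with fewer or more than three parts raises ValueError, which reflects that every table in a metastore is addressed through all three levels.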
To create a metastore, you do the following:
In your Azure account, optionally create a storage location for metastore-level storage of managed tables and volumes.
For information to help you decide whether you need metastore-level storage, see (Optional) Create metastore-level storage and Data is physically separated in storage.
In your Azure account, create an Azure managed identity or service principal that gives access to that storage location.
In Azure Databricks, create the metastore, attaching the storage location, and assign workspaces to the metastore.
Note
In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.
Before you begin
Familiarize yourself with the basic Unity Catalog concepts, including metastores and managed storage. See What is Unity Catalog?.
You should also confirm that you meet the following requirements for all setup steps:
You must be an Azure Databricks account admin.
The first Azure Databricks account admin must be a Microsoft Entra ID Global Administrator at the time that they first log in to the Azure Databricks account console. Upon first login, that user becomes an Azure Databricks account admin and no longer needs the Microsoft Entra ID Global Administrator role to access the Azure Databricks account. The first account admin can assign users in the Microsoft Entra ID tenant as additional account admins (who can themselves assign more account admins). Additional account admins do not require specific roles in Microsoft Entra ID.
The workspaces that you attach to the metastore must be on the Azure Databricks Premium plan.
If you want to set up metastore-level root storage, you must have permission to create the following in your Azure tenant:
- A storage account to use with Azure Data Lake Storage Gen2. See Create a storage account to use with Azure Data Lake Storage Gen2.
- A new resource to hold a system-assigned managed identity. This requires that you be a Contributor or Owner of a resource group in any subscription in the tenant.
Step 1 (Optional): Create a storage container for metastore-level managed storage
In this step, which is optional, you create a storage account and container to store managed table and volume data at the metastore level. To determine whether you need metastore-level storage, see (Optional) Create metastore-level storage.
Create a storage account for Azure Data Lake Storage Gen2.
This storage account will contain Unity Catalog managed tables and volumes. This must be an Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces. See Create a storage account to use with Azure Data Lake Storage Gen2.
Create a storage container that will hold your managed tables and volume data at the metastore level.
You can create only one metastore per region. You must use the same region for your metastore and storage container.
This metastore-level storage location can be overridden at the catalog and schema levels. See Specify a managed storage location in Unity Catalog.
Make a note of the Azure Data Lake Storage Gen2 URI for the container, which is in the following format:
abfss://<container-name>@<storage-account-name>.dfs.core.chinacloudapi.cn/<metastore-name>
In the steps that follow, replace <storage-container> with this URI.
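The URI format above can be assembled programmatically. The following is a minimal Python sketch, not an official utility: the function name metastore_storage_uri and the sample values are hypothetical, and the default endpoint suffix is taken from the format shown in this article. The validation is deliberately loose and only rejects empty parts or parts that would break the URI structure.

```python
def metastore_storage_uri(
    container: str,
    storage_account: str,
    metastore_name: str,
    endpoint_suffix: str = "dfs.core.chinacloudapi.cn",
) -> str:
    """Build the abfss:// URI for a metastore-level storage container.

    Hypothetical helper. The default endpoint suffix is the one shown in
    this article; pass a different suffix if your cloud uses another one.
    """
    parts = {
        "container": container,
        "storage_account": storage_account,
        "metastore_name": metastore_name,
    }
    for label, value in parts.items():
        # Reject empty values and characters that would change the URI structure.
        if not value or "/" in value or "@" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"abfss://{container}@{storage_account}.{endpoint_suffix}/{metastore_name}"


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    print(metastore_storage_uri("uc-root", "mystorageacct", "primary"))
```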
Step 2 (Optional): Create a managed identity to access the managed storage location
In this step, which is required only if you completed step 1, you create an Azure Databricks access connector that holds a managed identity and give it access to the storage container.
Follow the instructions in Use Azure managed identities in Unity Catalog to access storage.
Note
You can use either an Azure managed identity or a service principal as the identity that gives access to the metastore's storage container. Databricks strongly recommends managed identities, because they do not require you to maintain credentials or rotate secrets, and they let you connect to an Azure Data Lake Storage Gen2 account that is protected by a storage firewall. If you want to use a service principal, see Create Unity Catalog managed storage using a service principal (legacy).
Step 3: Create the metastore and attach a workspace
Each Azure Databricks region requires its own Unity Catalog metastore, so you create one metastore for each region in which your organization operates. You can link each regional metastore to any number of workspaces in that region. Each linked workspace has the same view of the data in the metastore, and data access control can be managed across workspaces. You can access data in other metastores using Delta Sharing.
If you chose to create metastore-level storage, the metastore will use the storage container and Azure managed identity that you created in the previous steps.
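The one-metastore-per-region rule can be sketched as a grouping of workspaces by region. The following Python example is only a planning aid under stated assumptions: the function name plan_metastores, the workspace names, and the region names are hypothetical inputs that you would supply yourself. Nothing here calls an Azure Databricks API.

```python
from collections import defaultdict


def plan_metastores(workspace_regions: dict[str, str]) -> dict[str, list[str]]:
    """Group workspaces by region.

    Each key in the result is a region that needs exactly one metastore;
    each value lists the workspaces to link to that region's metastore.
    """
    plan: dict[str, list[str]] = defaultdict(list)
    # Sort by workspace name so the output is stable between runs.
    for workspace, region in sorted(workspace_regions.items()):
        plan[region].append(workspace)
    return dict(plan)


if __name__ == "__main__":
    # Hypothetical workspace-to-region inventory.
    inventory = {
        "ws-analytics": "chinaeast2",
        "ws-finance": "chinanorth3",
        "ws-ml": "chinaeast2",
    }
    for region, workspaces in plan_metastores(inventory).items():
        print(f"{region}: one metastore, linked to {workspaces}")
```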
To create a metastore:
If you chose to create metastore-level storage, make sure that you have the path to the storage container and the resource ID of the Azure Databricks access connector that you created in the previous steps.
Log in to your workspace as an account admin.
Click your username in the top bar of the Azure Databricks workspace and select Manage Account.
Log in to the Azure Databricks account console.
Click Catalog.
Click Create metastore.
Enter the following:
Name for the metastore.
Region where the metastore will be deployed.
This must be in the same region as the workspaces you want to use to access the data. If you chose to create a storage container for metastore-level storage, that region must also be the same.
(Optional) ADLS Gen 2 path: Enter the path to the storage container that you will use as root storage for the metastore.
The abfss:// prefix is added automatically.
(Optional) Access Connector ID: Enter the Azure Databricks access connector's resource ID in the format:
/subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
Click Create.
When prompted, select workspaces to link to the metastore.
For details, see Enable a workspace for Unity Catalog.
Transfer the metastore admin role to a group.
The user who creates a metastore is its owner, also called the metastore admin. The metastore admin can create top-level objects in the metastore such as catalogs and can manage access to tables and other objects. Databricks recommends that you reassign the metastore admin role to a group. See Assign a metastore admin.
Enable Azure Databricks management of uploads to managed volumes.
Azure Databricks uses cross-origin resource sharing (CORS) to upload data to managed volumes in Unity Catalog. See Configure Unity Catalog storage account for CORS.
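Before you enter values in the ADLS Gen 2 path and Access Connector ID fields described in the procedure above, it can help to check their shape. The following Python sketch is one way to do that. The helper names parse_access_connector_id and form_path are hypothetical, and the pattern only checks the segment layout shown in this article. It does not verify that the subscription, resource group, or connector exists, and it matches segment names case-sensitively, which is an assumption of this sketch.

```python
import re

# Segment layout of an access connector resource ID, as shown in this article.
_CONNECTOR_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks/accessConnectors/(?P<connector>[^/]+)$"
)


def parse_access_connector_id(resource_id: str) -> dict[str, str]:
    """Return the subscription, resource group, and connector name.

    Raises ValueError if the ID does not follow the expected layout.
    """
    match = _CONNECTOR_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not an access connector resource ID: {resource_id!r}")
    return match.groupdict()


def form_path(storage_uri: str) -> str:
    """Drop a leading abfss:// because the form adds that prefix automatically."""
    return storage_uri.removeprefix("abfss://")  # requires Python 3.9 or later


if __name__ == "__main__":
    # Placeholder values, not real resources.
    print(parse_access_connector_id(
        "/subscriptions/sub-123/resourceGroups/rg-data"
        "/providers/Microsoft.Databricks/accessConnectors/uc-connector"
    ))
    print(form_path("abfss://uc-root@mystorageacct.dfs.core.chinacloudapi.cn/primary"))
```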