Get started with Azure Synapse Link for SQL Server 2022

This article is a step-by-step guide for getting started with Azure Synapse Link for SQL Server 2022. For an overview, see Azure Synapse Link for SQL Server 2022.

Prerequisites

  • Before you begin, see Create a new Azure Synapse workspace to get Azure Synapse Link for SQL. The current tutorial is to create Azure Synapse Link for SQL in a public network. This article assumes that you selected Disable Managed virtual network and Allow connections from all IP addresses when you created an Azure Synapse workspace. If you want to configure Azure Synapse Link for SQL Server 2022 with network security, also see Configure Azure Synapse Link for SQL Server 2022 with network security.

  • Create an Azure Data Lake Storage Gen2 account, which is different from the account you create with the Azure Synapse Analytics workspace. You'll use this account as the landing zone to stage the data submitted by SQL Server 2022. For more information, see Create an Azure Data Lake Storage Gen2 account.

  • Make sure that your SQL Server 2022 database has a master key created.

    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<a new password>'
    

Create your target Azure Synapse SQL dedicated pool

  1. Open Synapse Studio.

  2. Open the Manage hub, go to SQL pools, and then select New.

    Screenshot that shows how to create a new Azure Synapse SQL dedicated pool from Synapse Studio.

  3. Enter a unique pool name, use the default settings, and create the dedicated pool.

  4. From the Data hub, under Workspace, your new Azure Synapse SQL database should be listed under Databases. From your new Azure Synapse SQL database, select New SQL script, and then select Empty script.

    Screenshot that shows how to create a new empty SQL script from Synapse Studio.

  5. To create the master key for your target Azure Synapse SQL database, paste the following script, and then select Run.

    CREATE MASTER KEY
    

Create a linked service for your source SQL Server 2022 database

  1. Select the Manage hub button, and then select Linked services.

    Go to linked services from Synapse Studio.

  2. Press New, select SQL Server and select Continue.

    Screenshot that shows how to create a SQL server linked service.

  3. In the Name box, enter the name of the linked service of SQL Server 2022.

    Screenshot that shows where to enter the server and database names to connect.

    Note

    Only the Linked Service in Legacy version is supported.

  4. When you're choosing the integration runtime, select your self-hosted integration runtime. If your Azure Synapse workspace doesn't have an available self-hosted integration runtime, create one.

  5. (Optional) To create a self-hosted integration runtime to connect to your source SQL Server 2022, do the following:

    a. Select New.

    Screenshot that shows how to create a new self-hosted integration runtime.

    b. Select Self-hosted, and then select Continue.

    c. In the Name box, enter the name of the self-hosted integration runtime, and then select Create.

    Screenshot that shows where to enter a name for the self-hosted integration runtime.

    A self-hosted integration runtime is now available in your Azure Synapse workspace.

    d. Follow the prompts to download, install, and use the key to register your integration runtime agent on your Windows machine, which has direct access to your SQL Server 2022 instance. For more information, see Create a self-hosted integration runtime - Azure Data Factory and Azure Synapse.

    Screenshot that shows where to download, install, and register the integration runtime.

    e. Select Close.

    Get the status of integration runtime.

    f. Go to the monitoring page, and then ensure that your self-hosted integration runtime is running by selecting Refresh to get the latest status of integration runtime.

  6. Continue to enter the remaining information for your linked service, including SQL Server name, Database name, Authentication type, User name, and Password to connect to your SQL Server 2022 instance.

    Note

    We recommend that you enable encryption on this connection. To do so, add the Encrypt property with a value of true as an additional connection property. Also set the Trust Server Certificate property to either true or false, depending on your server configuration. For more information, see Enable encrypted connections to the database engine.

  7. Select Test Connection to ensure that your self-hosted integration runtime can access your SQL Server instance.

  8. Select Create.

    Your new linked service will be connected to the SQL Server 2022 instance that's available in your workspace.

    Screenshot that shows where to view the linked service connection.

    Note

    The linked service that you create here isn't dedicated to Azure Synapse Link for SQL. It can be used by any workspace user who has the appropriate permissions. Take time to understand the scope of users who might have access to this linked service and its credentials. For more information about permissions in Azure Synapse workspaces, see Azure Synapse workspace access control overview - Azure Synapse Analytics.

Create a linked service to connect to your landing zone on Azure Data Lake Storage Gen2

  1. Go to your newly created Azure Data Lake Storage Gen2 account, select Access Control (IAM), select Add, and then select Add role assignment.

    Screenshot of the 'Access Control (IAM)' pane of the Data Lake Storage Gen2 account.

  2. Select Storage Blob Data Contributor for the chosen role, select Managed identity and then, under Members, select your Azure Synapse workspace. Adding this role assignment might take a few minutes.

    Screenshot that shows how to add a role assignment.

    Note

    Make sure that you've granted your Azure Synapse workspace managed identity permissions to the Azure Data Lake Storage Gen2 storage account that's used as the landing zone. For more information, see Grant permissions to a managed identity in an Azure Synapse workspace - Azure Synapse Analytics.

  3. Open the Manage hub in your Azure Synapse workspace, and go to Linked services.

    Screenshot that shows how to go to the linked service.

  4. Select New, and then select Azure Data Lake Storage Gen2.

  5. Do the following:

    a. In the Name box, enter the name of the linked service for your landing zone.

    b. For Authentication method, enter Managed Identity.

    c. Select the Storage account name, which has already been created.

  6. Select Test Connection to ensure that you can access your Azure Data Lake Storage Gen2 account.

  7. Select Create.

    Your new linked service will be connected to the Azure Data Lake Storage Gen2 account.

    Screenshot that shows the new linked service to Azure Data Lake Storage Gen2.

    Note

    The linked service that you create here isn't dedicated to Azure Synapse Link for SQL. It can be used by any workspace user who has the appropriate permissions. Take time to understand the scope of users who might have access to this linked service and its credentials. For more information about permissions in Azure Synapse workspaces, see Azure Synapse workspace access control overview - Azure Synapse Analytics.

  1. From Synapse Studio, open the Integrate hub.

  2. On the Integrate pane, select the plus sign (+), and then select Link connection.

    Screenshot that shows the 'Link connection' button.

  3. Enter your source database:

    a. For Source type, select SQL Server.

    b, For your source Linked service, select the service that connects to your SQL Server 2022 instance.

    c. For Table names, select names from your SQL Server instance to be replicated to your Azure Synapse SQL pool.

    d. Select Continue.

    Screenshot that shows where to enter source database details.

  4. From Synapse SQL Dedicated Pools, select a target database name.

  5. Select Continue.

  6. Enter your link connection settings:

    a. For Link connection name, enter the name.

    b. For Core count for the link connection compute, enter the number of cores. These cores will be used for the movement of data from the source to the target. We recommend that you start with a small number and increase the count as needed.

    c. For Linked service, select the service that will connect to your landing zone.

    d. Enter your Azure Data Lake Storage Gen2 container name or container/folder name as a landing zone folder path for staging the data. The container must be created first.

    e. Enter your Azure Data Lake Storage Gen2 shared access signature token. The token is required for the SQL change feed to access the landing zone. If your Azure Data Lake Storage Gen2 account doesn't have a shared access signature token, you can create one by selecting Generate token.

    f. Select OK.

    Screenshot that shows where to enter the link connection settings.

    Note

    The number of cores you select here are allocated to the ingestion service for processing data loading and changes. They don't affect the target dedicated SQL pool configuration. If you can't connect to landing zone using generated SAS token due to limitation from your storage, you can try to use delegation SAS token to connect to landing zone as well.

  7. With the new Azure Synapse Link connection open, you can now update the target table name, distribution type, and structure type.

    Note

    • Consider using heap table for the structure type when your data contains varchar(max), nvarchar(max), and varbinary(max).
    • Make sure that the schema in your Azure Synapse SQL dedicated pool has already been created before you start the link connection. Azure Synapse Link for SQL will create tables automatically under your schema in the Azure Synapse SQL pool.
  8. In the Action on existing target table dropdown list, choose the option most appropriate for your scenario if the table already exists in the destination.

    • Drop and recreate table: The existing target table will be dropped and recreated.
    • Fail on non-empty table: If target table contains data, link connection for the given table will fail.
    • Merge with existing data: Data will be merged into the existing table.

    Note

    If you want to merge multiple sources into the same destination by choosing "Merge with existing data", make sure the sources contain different data to avoid conflict and unexpected result.

  9. Specify whether to enable transaction consistency across tables.

    • When this option is enabled, a transaction spanning across multiple tables on the source database is always replicated to the destination database in a single transaction. This, however, will create overhead on the overall replication throughput.
    • When the option is disabled, each table will replicate changes in its own transaction boundary to the destination in parallel connections, thus improving overall replication throughput.

    Note

    When you want to enable transaction consistency across tables, please also make sure the transaction isolation levels in your Synapse dedicated SQL pool is READ COMMITTED SNAPSHOT ISOLATION.

  10. Select Publish all to save the new link connection to the service.

Select Start, and then wait a few minutes for the data to be replicated.

Note

A link connection will start from a full initial load from your source database, followed by incremental change feeds via the change feed feature in SQL Server 2022. For more information, see Azure Synapse Link for SQL change feed.

You can monitor the status of your Azure Synapse Link connection, see which tables are being initially copied over (snapshotting), and see which tables are in continuous replication mode (replicating).

  1. Go to the Monitor hub of your Azure Synapse workspace, and then select Link connections.

  2. Open the link connection you started, and view the status of each table.

  3. Select Refresh on the monitoring view for your connection to observe any updates to the status.

    Monitor the linked connection.

Query the replicated data

Wait for a few minutes, and then check to ensure that the target database has the expected table and data. See the data available in your Azure Synapse SQL dedicated pool destination store. You can also now explore the replicated tables in your target Azure Synapse SQL dedicated pool.

  1. In the Data hub, under Workspace, open your target database.

  2. Under Tables, right-click one of your target tables.

  3. Select New SQL script, and then select Top 100 rows.

  4. Run this query to view the replicated data in your target Azure Synapse SQL dedicated pool.

  5. You can also query the target database by using Microsoft SQL Server Management Studio (SSMS) or other tools. Use the SQL dedicated endpoint for your workspace as the server name. This name is usually <workspacename>.sql.azuresynapse.azure.cn. Add Database=databasename@poolname as an extra connection string parameter when connecting via SSMS or other tools.

To add or remove tables in Synapse Studio, do the following:

  1. In your Azure Synapse workspace, open the Integrate hub.

  2. Select the link connection that you want to edit, and then open it.

  3. Do either of the following:

    • To add a table, select New table.
    • To remove a table, select the trash can icon next to it.

    Screenshot of the link connection pane for adding or removing tables.

    Note

    You can directly add or remove tables when a link connection is running.

To stop the Azure Synapse Link connection in Synapse Studio, do the following:

  1. In your Azure Synapse workspace, open the Integrate hub.

  2. Select the link connection that you want to edit, and then open it.

  3. Select Stop to stop the link connection, and it will stop replicating your data.

    Screenshot of the pane for stopping a link connection.

    Note

    • If you restart a link connection after stopping it, it will start from a full initial load from your source database, and incremental change feeds will follow.
    • If you choose "Merge with existing data" as the action on existing target table, when you stop the link connection and restart it, the record deletions in source during that period won't be deleted in the destination. In such case, to ensure data consistency, consider to use pause/resume instead of stop/start, or to clean up the destination tables before you restart the link connection.

Rotate the shared access signature token for the landing zone

A shared access signature token is required for the SQL change feed to get access to the landing zone and push data there. It has an expiration date, so you need to rotate the token before that date. Otherwise, Azure Synapse Link will fail to replicate the data from the SQL Server instance to the Azure Synapse SQL dedicated pool.

  1. In your Azure Synapse workspace, open the Integrate hub.

  2. Select the link connection that you want to edit, and then open it.

  3. Select Rotate token.

    Screenshot that shows where to rotate a shared access signature token.

  4. To get the new shared access signature token, select Generate automatically or Input manually, and then select OK.

    Screenshot that shows how to get a new shared access signature token.

    Note

    If you can't connect to landing zone using generated SAS token due to limitation from your storage, you can try to use delegation SAS token to connect to landing zone as well.

Next steps

If you're using a database other than SQL Server 2022, see: