Copy data from SAP Business Warehouse with Azure Data Factory or Synapse Analytics
APPLIES TO: Azure Data Factory Azure Synapse Analytics
This article shows how to use Azure Data Factory or Synapse Analytics pipelines to copy data from SAP Business Warehouse (BW) via Open Hub to Azure Data Lake Storage Gen2. You can use a similar process to copy data to other supported sink data stores.
Tip
For general information about copying data from SAP BW, including SAP BW Open Hub integration and delta extraction flow, see Copy data from SAP Business Warehouse via Open Hub by using Azure Data Factory.
Prerequisites
Azure Data Factory or Synapse workspace: If you don't have one, follow the steps to create a data factory or create a Synapse workspace.
SAP BW Open Hub Destination (OHD) with destination type "Database Table": To create an OHD or to check that your OHD is configured correctly for integration with the service, see the SAP BW Open Hub Destination configurations section of this article.
The SAP BW user needs the following permissions:
- Authorization for Remote Function Calls (RFC) and SAP BW.
- Permissions to the “Execute” activity of the S_SDSAUTH authorization object.
A self-hosted integration runtime (IR) with SAP .NET connector 3.0. Follow these setup steps:
Install and register the self-hosted integration runtime, version 3.13 or later. (This process is described later in this article.)
Download the 64-bit SAP Connector for Microsoft .NET 3.0 from SAP's website, and install it on the same computer as the self-hosted IR. During installation, make sure that you select Install Assemblies to GAC in the Optional setup steps dialog box, as the following image shows:
Do a full copy from SAP BW Open Hub
In the Azure portal, go to your service. Select Open on the Open Azure Data Factory Studio or Open Synapse Studio tile to open the service UI in a separate tab.
On the home page, select Ingest to open the Copy Data tool.
On the Properties page, choose Built-in copy task under Task type, and choose Run once now under Task cadence or task schedule, and then select Next.
On the Source data store page, select + New connection. Select SAP BW Open Hub from the connector gallery, and then select Continue. To filter the connectors, you can type SAP in the search box.
On the New connection (SAP BW Open Hub) page, follow these steps to create a new connection.
From the Connect via integration runtime list, select an existing self-hosted IR. Or, choose to create one if you don't have one yet.
To create a new self-hosted IR, select +New, and then select Self-hosted. Enter a Name, and then select Next. Select Express setup to install on the current computer, or follow the Manual setup steps that are provided.
As mentioned in Prerequisites, make sure that you have SAP Connector for Microsoft .NET 3.0 installed on the same computer where the self-hosted IR is running.
Fill in the SAP BW Server name, System number, Client ID, Language (if other than EN), User name, and Password.
Select Test connection to validate the settings, and then select Create.
On the Source data store page, select the newly created connection in the Connection block.
In the section of selecting Open Hub destinations, browse the Open Hub Destinations that are available in your SAP BW. You can preview the data in each destination by selecting the preview button at the end of each row. Select the OHD to copy data from, and then select Next.
Specify a filter, if you need one. If your OHD only contains data from a single data-transfer process (DTP) execution with a single request ID, or you're sure that your DTP is finished and you want to copy the data, clear the Exclude Last Request check box in Advanced section. You can preview the data by selecting Preview data button.
Learn more about these settings in the SAP BW Open Hub Destination configurations section of this article. Then select Next.
On the Destination data store page, select + New connection > Azure Data Lake Storage Gen2 > Continue.
On the New connection (Azure Data Lake Storage Gen2) page, follow these steps to create a connection.
- Select your Data Lake Storage Gen2-capable account from the Name drop-down list.
- Select Create to create the connection.
On the Destination data store page, select the newly created connection in the Connection section, and enter copyfromopenhub as the output folder name. Then select Next.
On the File format setting page, select Next to use the default settings.
On the Settings page, specify a Task name, and expand Advanced. Enter a value for Degree of copy parallelism such as 5 to load from SAP BW in parallel. Then select Next.
On the Summary page, review the settings. Then select Next.
On the Deployment page, select Monitor to monitor the pipeline.
Notice that the Monitor tab on the left side of the page is automatically selected. You can use links under the Pipeline name column in the Pipeline runs page to view activity details and to rerun the pipeline.
To view activity runs that are associated with the pipeline run, select links under the Pipeline name column. There's only one activity (copy activity) in the pipeline, so you see only one entry. To switch back to the pipeline-runs view, select the All pipeline runs link at the top. Select Refresh to refresh the list.
To monitor the execution details for each copy activity, select the Details link, which is an eyeglasses icon in the same row of each copy activity in the activity-monitoring view. Available details include the data volume copied from the source to the sink, data throughput, execution steps and duration, and configurations used.
To view the maximum Request ID of each copy activity, go back to the activity-monitoring view and select Output in the same row of each copy activity.
Incremental copy from SAP BW Open Hub
Tip
See SAP BW Open Hub connector delta extraction flow to learn how the SAP BW Open Hub connector copies incremental data from SAP BW. This article can also help you understand basic connector configuration.
Now, let's continue to configure incremental copy from SAP BW Open Hub.
Incremental copy uses a "high-watermark" mechanism that's based on the request ID. That ID is automatically generated in SAP BW Open Hub Destination by the DTP. The following diagram shows this workflow:
On the home page, select Pipeline templates in the Discover more section to use the built-in template.
Search for SAP BW to find and select the Incremental copy from SAP BW to Azure Data Lake Storage Gen2 template. This template copies data into Azure Data Lake Storage Gen2. You can use a similar workflow to copy to other sink types.
On the template's main page, select or create the following three connections, and then select Use this template in the lower-right corner of the window.
- Azure Blob storage: In this walkthrough, we use Azure Blob storage to store the high watermark, which is the max copied request ID.
- SAP BW Open Hub: This is the source to copy data from. Refer to the previous full-copy walkthrough for detailed configuration.
- Azure Data Lake Storage Gen2: This is the sink to copy data to. Refer to the previous full-copy walkthrough for detailed configuration.
This template generates a pipeline with the following three activities and makes them chained on-success: Lookup, Copy Data, and Web.
Go to the pipeline Parameters tab. You see all the configurations that you need to provide.
SAPOpenHubDestinationName: Specify the Open Hub table name to copy data from.
Data_Destination_Container: Specify the destination Azure Data Lake Storage Gen2 container to copy data to. If the container doesn't exist, the copy activity creates one during execution.
Data_Destination_Directory: Specify the folder path under the Azure Data Lake Storage Gen2 container to copy data to. If the path doesn't exist, the copy activity creates a path during execution.
HighWatermarkBlobContainer: Specify the container to store the high-watermark value.
HighWatermarkBlobDirectory: Specify the folder path under the container to store the high-watermark value.
HighWatermarkBlobName: Specify the blob name to store the high watermark value, such as
requestIdCache.txt
. In Blob storage, go to the corresponding path of HighWatermarkBlobContainer+HighWatermarkBlobDirectory+HighWatermarkBlobName, such as container/path/requestIdCache.txt. Create a blob with content 0.LogicAppURL: In this template, we use WebActivity to call Azure Logic Apps to set the high-watermark value in Blob storage. Or, you can use Azure SQL Database to store it. Use a stored procedure activity to update the value.
You must first create a logic app, as the following image shows. Then, paste in the HTTP POST URL.
Go to the Azure portal. Select a new Logic Apps service. Select +Blank Logic App to go to Logic Apps Designer.
Create a trigger of When an HTTP request is received. Specify the HTTP request body as follows:
{ "properties": { "sapOpenHubMaxRequestId": { "type": "string" } }, "type": "object" }
Add a Create blob action. For Folder path and Blob name, use the same values that you configured previously in HighWatermarkBlobContainer+HighWatermarkBlobDirectory and HighWatermarkBlobName.
Select Save. Then, copy the value of HTTP POST URL to use in the pipeline.
After you provide the pipeline parameters, select Debug > Finish to invoke a run to validate the configuration. Or, select Publish to publish all the changes, and then select Add trigger to execute a run.
SAP BW Open Hub Destination configurations
This section introduces configuration of the SAP BW side to use the SAP BW Open Hub connector to copy data.
Configure delta extraction in SAP BW
If you need both historical copy and incremental copy or only incremental copy, configure delta extraction in SAP BW.
Create the Open Hub Destination. You can create the OHD in SAP Transaction RSA1, which automatically creates the required transformation and data-transfer process. Use the following settings:
- ObjectType: You can use any object type. Here, we use InfoCube as an example.
- Destination Type: Select Database Table.
- Key of the Table: Select Technical Key.
- Extraction: Select Keep Data and Insert Records into Table.
You might increase the number of parallel running SAP work processes for the DTP:
Schedule the DTP in process chains.
A delta DTP for a cube only works if the necessary rows haven't been compressed. Make sure that BW cube compression isn't running before the DTP to the Open Hub table. The easiest way to do this is to integrate the DTP into your existing process chains. In the following example, the DTP (to the OHD) is inserted into the process chain between the Adjust (aggregate rollup) and Collapse (cube compression) steps.
Configure full extraction in SAP BW
In addition to delta extraction, you might want a full extraction of the same SAP BW InfoProvider. This usually applies if you want to do full copy but not incremental, or you want to resync delta extraction.
You can't have more than one DTP for the same OHD. So, you must create an additional OHD before delta extraction.
For a full load OHD, choose different options than for delta extraction:
In OHD: Set the Extraction option to Delete Data and Insert Records. Otherwise, data will be extracted many times when you repeat the DTP in a BW process chain.
In the DTP: Set Extraction Mode to Full. You must change the automatically created DTP from Delta to Full immediately after the OHD is created, as this image shows:
In the BW Open Hub connector: Turn off Exclude last request. Otherwise, nothing will be extracted.
You typically run the full DTP manually. Or, you can create a process chain for the full DTP. It's typically a separate chain that's independent of your existing process chains. In either case, make sure that the DTP is finished before you start the extraction by using copy. Otherwise, only partial data will be copied.
Run delta extraction the first time
The first delta extraction is technically a full extraction. By default, the SAP BW Open Hub connector excludes the last request when it copies data. For the first delta extraction, no data is extracted by the copy activity until a subsequent DTP generates delta data in the table with a separate request ID. There are two ways to avoid this scenario:
- Turn off the Exclude last request option for the first delta extraction. Make sure that the first delta DTP is finished before you start the delta extraction the first time.
- Use the procedure for resyncing the delta extraction, as described in the next section.
Resync delta extraction
The following scenarios change the data in SAP BW cubes but are not considered by the delta DTP:
- SAP BW selective deletion (of rows by using any filter condition)
- SAP BW request deletion (of faulty requests)
An SAP Open Hub Destination isn't a data-mart-controlled data target (in all SAP BW support packages since 2015). So, you can delete data from a cube without changing the data in the OHD. You must then resync the data of the cube with the service:
- Run a full extraction in the service (by using a full DTP in SAP).
- Delete all rows in the Open Hub table for the delta DTP.
- Set the status of the delta DTP to Fetched.
After this, all subsequent delta DTPs and delta extractions work as expected.
To set the status of the delta DTP to Fetched, you can use the following option to run the delta DTP manually:
No Data Transfer; Delta Status in Source: Fetched
Related content
Learn about SAP BW Open Hub connector support: