你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn。
教程:将数据加载到 Azure Synapse Analytics SQL 池
本教程使用 PolyBase 将 WideWorldImportersDW 数据仓库从 Azure Blob 存储加载到 Azure Synapse Analytics SQL 池中的数据仓库。 本教程使用 Azure 门户和 SQL Server Management Studio (SSMS) 执行以下操作:
- 创建专用于加载数据的用户
- 创建使用 Azure Blob 作为数据源的外部表
- 使用 CTAS T-SQL 语句将数据加载到数据仓库
- 查看正在加载的数据的进度
- 在日期维度和销售事实数据表中生成一年的数据
- 创建新加载的数据的统计信息
如果你没有 Azure 订阅,请在开始之前创建一个 Azure 试用帐户。
准备阶段
开始本教程之前,请下载并安装最新版 SQL Server Management Studio (SSMS)。
本教程假设你已根据以下教程创建 SQL 专用池。
注意
对于本教程,建议至少使用 DW1000c。
创建用于加载数据的用户
服务器管理员帐户用于执行管理操作,不适合对用户数据运行查询。 加载数据是一种内存密集型操作。 内存最大值是根据所用 SQL 池的代系、数据仓库单位和资源类定义的。
最好创建专用于加载数据的登录名和用户。 然后,将加载用户添加到启用相应最大内存分配的资源类。
由于当前是以服务器管理员的身份连接的,因此可以创建登录名和用户。 使用以下步骤创建名为 LoaderRC60 的登录名和用户。 然后将该用户分配到 staticrc60 资源类。
在 SSMS 中,右键单击“master”,然后在显示的下拉菜单中选择“新建查询”。 此时将打开一个新的查询窗口。
在查询窗口中,输入以下 T-SQL 命令,创建名为 LoaderRC60 的登录名和用户,并将“a123STRONGpassword!”替换为自己的密码。
CREATE LOGIN LoaderRC60 WITH PASSWORD = 'a123STRONGpassword!'; CREATE USER LoaderRC60 FOR LOGIN LoaderRC60;
单击“执行” 。
右键单击“SampleDW”,并选择“新建查询”。 此时会打开一个新的查询窗口。
输入以下 T-SQL 命令,为 LoaderRC60 登录名创建名为 LoaderRC60 的数据库用户。 第二行为新用户授予对新数据仓库的 CONTROL 权限。 这些权限类似于使用户成为数据库所有者。 第三行将新用户添加为
staticrc60
资源类的成员。CREATE USER LoaderRC60 FOR LOGIN LoaderRC60; GRANT CONTROL ON DATABASE::[SampleDW] to LoaderRC60; EXEC sp_addrolemember 'staticrc60', 'LoaderRC60';
单击“执行” 。
以加载用户的身份连接到服务器
加载数据的第一步是以 LoaderRC60 的身份登录。
在对象资源管理器中,单击“连接”下拉菜单,然后选择“数据库引擎”。 此时会显示“连接到服务器”对话框。
输入完全限定的服务器名称,并输入 LoaderRC60 作为登录名。 输入 LoaderRC60 的密码。
单击“连接” 。
当连接准备就绪时,对象资源管理器中出现两个服务器连接。 一个是作为 ServerAdmin 连接,另一个是作为 LoaderRC60 连接。
创建外部表和对象
已准备好开始将数据加载到新的数据仓库。 若要了解如何将数据置于 Azure Blob 存储或将其直接从源加载到 SQL 池以供将来参考,请参阅加载概述。
运行以下 SQL 脚本,指定有关想要加载的数据的信息。 此信息包括数据所在的位置、数据内容的格式以及数据的表定义。 数据位于全局 Azure Blob 中。
在前一部分,我们以 LoaderRC60 的身份登录到了数据仓库。 在 SSMS 中,右键单击 LoaderRC60 连接下面的“SampleDW”,并选择“新建查询”。 此时会显示一个新的查询窗口。
比较查询窗口和之间的图像。 验证新查询窗口是否以 LoaderRC60 的身份运行,并对 SampleDW 数据库执行查询。 使用此查询窗口执行所有加载步骤。
创建 SampleDW 数据库的主密钥。 只需要为每个数据库创建一次主密钥。
CREATE MASTER KEY;
运行以下 CREATE EXTERNAL DATA SOURCE 语句,定义 Azure Blob 的位置。 这是外部全球进口商数据的位置。 要运行追加到查询窗口的命令,请突出显示要运行的命令,然后单击“执行”。
CREATE EXTERNAL DATA SOURCE WWIStorage WITH ( TYPE = Hadoop, LOCATION = 'wasbs://wideworldimporters@sqldwholdata.blob.core.windows.net' );
运行以下 CREATE EXTERNAL FILE FORMAT T-SQL 语句,指定外部数据文件的格式设置特征和选项。 此语句指定外部数据需存储为文本,且值由竖线(“|”)字符分隔。
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS ( FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = FALSE ) );
运行以下 CREATE SCHEMA 语句,创建外部文件格式的架构。 ext 架构提供组织即将创建的外部表的方法。 wwi 架构组织要包含数据的标准表。
CREATE SCHEMA ext; GO CREATE SCHEMA wwi;
创建外部表。 表定义存储在数据库中,但表引用数据存储在 Azure Blob 存储中。 运行以下 T-SQL 命令创建若干外部表,这些表都指向之前在外部数据源中定义的 Azure Blob。
CREATE EXTERNAL TABLE [ext].[dimension_City]( [City Key] [int] NOT NULL, [WWI City ID] [int] NOT NULL, [City] [nvarchar](50) NOT NULL, [State Province] [nvarchar](50) NOT NULL, [Country] [nvarchar](60) NOT NULL, [Continent] [nvarchar](30) NOT NULL, [Sales Territory] [nvarchar](50) NOT NULL, [Region] [nvarchar](30) NOT NULL, [Subregion] [nvarchar](30) NOT NULL, [Location] [nvarchar](76) NULL, [Latest Recorded Population] [bigint] NOT NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH (LOCATION='/v1/dimension_City/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[dimension_Customer] ( [Customer Key] [int] NOT NULL, [WWI Customer ID] [int] NOT NULL, [Customer] [nvarchar](100) NOT NULL, [Bill To Customer] [nvarchar](100) NOT NULL, [Category] [nvarchar](50) NOT NULL, [Buying Group] [nvarchar](50) NOT NULL, [Primary Contact] [nvarchar](50) NOT NULL, [Postal Code] [nvarchar](10) NOT NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH (LOCATION='/v1/dimension_Customer/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[dimension_Employee] ( [Employee Key] [int] NOT NULL, [WWI Employee ID] [int] NOT NULL, [Employee] [nvarchar](50) NOT NULL, [Preferred Name] [nvarchar](50) NOT NULL, [Is Salesperson] [bit] NOT NULL, [Photo] [varbinary](300) NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION='/v1/dimension_Employee/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[dimension_PaymentMethod] ( [Payment Method Key] [int] NOT NULL, [WWI Payment Method ID] [int] NOT NULL, [Payment Method] [nvarchar](50) NOT NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/dimension_PaymentMethod/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[dimension_StockItem]( [Stock Item Key] [int] NOT NULL, [WWI Stock Item ID] [int] NOT NULL, [Stock Item] [nvarchar](100) NOT NULL, [Color] [nvarchar](20) NOT NULL, [Selling Package] [nvarchar](50) NOT NULL, [Buying Package] [nvarchar](50) NOT NULL, [Brand] [nvarchar](50) NOT NULL, [Size] [nvarchar](20) NOT NULL, [Lead Time Days] [int] NOT NULL, [Quantity Per Outer] [int] NOT NULL, [Is Chiller Stock] [bit] NOT NULL, [Barcode] [nvarchar](50) NULL, [Tax Rate] [decimal](18, 3) NOT NULL, [Unit Price] [decimal](18, 2) NOT NULL, [Recommended Retail Price] [decimal](18, 2) NULL, [Typical Weight Per Unit] [decimal](18, 3) NOT NULL, [Photo] [varbinary](300) NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/dimension_StockItem/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[dimension_Supplier]( [Supplier Key] [int] NOT NULL, [WWI Supplier ID] [int] NOT NULL, [Supplier] [nvarchar](100) NOT NULL, [Category] [nvarchar](50) NOT NULL, [Primary Contact] [nvarchar](50) NOT NULL, [Supplier Reference] [nvarchar](20) NULL, [Payment Days] [int] NOT NULL, [Postal Code] [nvarchar](10) NOT NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/dimension_Supplier/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[dimension_TransactionType]( [Transaction Type Key] [int] NOT NULL, [WWI Transaction Type ID] [int] NOT NULL, [Transaction Type] [nvarchar](50) NOT NULL, [Valid From] [datetime2](7) NOT NULL, [Valid To] [datetime2](7) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/dimension_TransactionType/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[fact_Movement] ( [Movement Key] [bigint] NOT NULL, [Date Key] [date] NOT NULL, [Stock Item Key] [int] NOT NULL, [Customer Key] [int] NULL, [Supplier Key] [int] NULL, [Transaction Type Key] [int] NOT NULL, [WWI Stock Item Transaction ID] [int] NOT NULL, [WWI Invoice ID] [int] NULL, [WWI Purchase Order ID] [int] NULL, [Quantity] [int] NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/fact_Movement/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[fact_Order] ( [Order Key] [bigint] NOT NULL, [City Key] [int] NOT NULL, [Customer Key] [int] NOT NULL, [Stock Item Key] [int] NOT NULL, [Order Date Key] [date] NOT NULL, [Picked Date Key] [date] NULL, [Salesperson Key] [int] NOT NULL, [Picker Key] [int] NULL, [WWI Order ID] [int] NOT NULL, [WWI Backorder ID] [int] NULL, [Description] [nvarchar](100) NOT NULL, [Package] [nvarchar](50) NOT NULL, [Quantity] [int] NOT NULL, [Unit Price] [decimal](18, 2) NOT NULL, [Tax Rate] [decimal](18, 3) NOT NULL, [Total Excluding Tax] [decimal](18, 2) NOT NULL, [Tax Amount] [decimal](18, 2) NOT NULL, [Total Including Tax] [decimal](18, 2) NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/fact_Order/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[fact_Purchase] ( [Purchase Key] [bigint] NOT NULL, [Date Key] [date] NOT NULL, [Supplier Key] [int] NOT NULL, [Stock Item Key] [int] NOT NULL, [WWI Purchase Order ID] [int] NULL, [Ordered Outers] [int] NOT NULL, [Ordered Quantity] [int] NOT NULL, [Received Outers] [int] NOT NULL, [Package] [nvarchar](50) NOT NULL, [Is Order Finalized] [bit] NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/fact_Purchase/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[fact_Sale] ( [Sale Key] [bigint] NOT NULL, [City Key] [int] NOT NULL, [Customer Key] [int] NOT NULL, [Bill To Customer Key] [int] NOT NULL, [Stock Item Key] [int] NOT NULL, [Invoice Date Key] [date] NOT NULL, [Delivery Date Key] [date] NULL, [Salesperson Key] [int] NOT NULL, [WWI Invoice ID] [int] NOT NULL, [Description] [nvarchar](100) NOT NULL, [Package] [nvarchar](50) NOT NULL, [Quantity] [int] NOT NULL, [Unit Price] [decimal](18, 2) NOT NULL, [Tax Rate] [decimal](18, 3) NOT NULL, [Total Excluding Tax] [decimal](18, 2) NOT NULL, [Tax Amount] [decimal](18, 2) NOT NULL, [Profit] [decimal](18, 2) NOT NULL, [Total Including Tax] [decimal](18, 2) NOT NULL, [Total Dry Items] [int] NOT NULL, [Total Chiller Items] [int] NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/fact_Sale/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[fact_StockHolding] ( [Stock Holding Key] [bigint] NOT NULL, [Stock Item Key] [int] NOT NULL, [Quantity On Hand] [int] NOT NULL, [Bin Location] [nvarchar](20) NOT NULL, [Last Stocktake Quantity] [int] NOT NULL, [Last Cost Price] [decimal](18, 2) NOT NULL, [Reorder Level] [int] NOT NULL, [Target Stock Level] [int] NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/fact_StockHolding/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 ); CREATE EXTERNAL TABLE [ext].[fact_Transaction] ( [Transaction Key] [bigint] NOT NULL, [Date Key] [date] NOT NULL, [Customer Key] [int] NULL, [Bill To Customer Key] [int] NULL, [Supplier Key] [int] NULL, [Transaction Type Key] [int] NOT NULL, [Payment Method Key] [int] NULL, [WWI Customer Transaction ID] [int] NULL, [WWI Supplier Transaction ID] [int] NULL, [WWI Invoice ID] [int] NULL, [WWI Purchase Order ID] [int] NULL, [Supplier Invoice Number] [nvarchar](20) NULL, [Total Excluding Tax] [decimal](18, 2) NOT NULL, [Tax Amount] [decimal](18, 2) NOT NULL, [Total Including Tax] [decimal](18, 2) NOT NULL, [Outstanding Balance] [decimal](18, 2) NOT NULL, [Is Finalized] [bit] NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( LOCATION ='/v1/fact_Transaction/', DATA_SOURCE = WWIStorage, FILE_FORMAT = TextFileFormat, REJECT_TYPE = VALUE, REJECT_VALUE = 0 );
在对象资源管理器中展开“SampleDW”,查看已创建的外部表列表。
将数据加载到 SQL 池中
本部分使用已定义的外部表将示例数据从 Azure Blob 加载到 SQL 池。
注意
本教程直接将数据加载到最终表。 在生产环境中,通常使用 CREATE TABLE AS SELECT 将数据加载到临时表。 数据在临时表中时,可以执行任何必要的转换。 要将临时表中的数据追加到生产表,可以使用 INSERT...SELECT 语句。 有关详细信息,请参阅将数据插入到生产表。
下面的脚本使用 CREATE TABLE AS SELECT (CTAS) T-SQL 语句将数据从 Azure 存储 Blob 加载到数据仓库中的新表。 CTAS 基于 select 语句的结果创建新表。 新表包含与 select 语句结果相同的列和数据类型。 当 select 语句从外部表进行选择时,数据将导入到数据仓库的关系表中。
此脚本不会将数据载入 wwi.dimension_Date 和 wwi.fact_Sale 表。 稍后的步骤会生成这些表,使表中包含数目可调整的行。
运行以下脚本,将数据加载到数据仓库中的新表。
CREATE TABLE [wwi].[dimension_City] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_City] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_City]') ; CREATE TABLE [wwi].[dimension_Customer] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_Customer] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_Customer]') ; CREATE TABLE [wwi].[dimension_Employee] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_Employee] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_Employee]') ; CREATE TABLE [wwi].[dimension_PaymentMethod] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_PaymentMethod] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_PaymentMethod]') ; CREATE TABLE [wwi].[dimension_StockItem] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_StockItem] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_StockItem]') ; CREATE TABLE [wwi].[dimension_Supplier] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_Supplier] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_Supplier]') ; CREATE TABLE [wwi].[dimension_TransactionType] WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[dimension_TransactionType] OPTION (LABEL = 'CTAS : Load [wwi].[dimension_TransactionType]') ; CREATE TABLE [wwi].[fact_Movement] WITH ( DISTRIBUTION = HASH([Movement Key]), CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[fact_Movement] OPTION (LABEL = 'CTAS : Load [wwi].[fact_Movement]') ; CREATE TABLE [wwi].[fact_Order] WITH ( DISTRIBUTION = HASH([Order Key]), CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[fact_Order] OPTION (LABEL = 'CTAS : Load [wwi].[fact_Order]') ; CREATE TABLE [wwi].[fact_Purchase] WITH ( DISTRIBUTION = HASH([Purchase Key]), CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[fact_Purchase] OPTION (LABEL = 'CTAS : Load [wwi].[fact_Purchase]') ; CREATE TABLE [wwi].[seed_Sale] WITH ( DISTRIBUTION = HASH([WWI Invoice ID]), CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[fact_Sale] OPTION (LABEL = 'CTAS : Load [wwi].[seed_Sale]') ; CREATE TABLE [wwi].[fact_StockHolding] WITH ( DISTRIBUTION = HASH([Stock Holding Key]), CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[fact_StockHolding] OPTION (LABEL = 'CTAS : Load [wwi].[fact_StockHolding]') ; CREATE TABLE [wwi].[fact_Transaction] WITH ( DISTRIBUTION = HASH([Transaction Key]), CLUSTERED COLUMNSTORE INDEX ) AS SELECT * FROM [ext].[fact_Transaction] OPTION (LABEL = 'CTAS : Load [wwi].[fact_Transaction]') ;
在加载数据的同时查看数据。 假设要加载几个 GB 的数据,并将其压缩成高性能群集列存储索引。 在 SampleDW 中打开新查询窗口,并运行以下查询显示负载状态。 开始查询后,在 SQL 池执行繁重任务的同时,你可以享用咖啡和小吃。
SELECT r.command, s.request_id, r.status, count(distinct input_name) as nbr_files, sum(s.bytes_processed)/1024/1024/1024 as gb_processed FROM sys.dm_pdw_exec_requests r INNER JOIN sys.dm_pdw_dms_external_work s ON r.request_id = s.request_id WHERE r.[label] = 'CTAS : Load [wwi].[dimension_City]' OR r.[label] = 'CTAS : Load [wwi].[dimension_Customer]' OR r.[label] = 'CTAS : Load [wwi].[dimension_Employee]' OR r.[label] = 'CTAS : Load [wwi].[dimension_PaymentMethod]' OR r.[label] = 'CTAS : Load [wwi].[dimension_StockItem]' OR r.[label] = 'CTAS : Load [wwi].[dimension_Supplier]' OR r.[label] = 'CTAS : Load [wwi].[dimension_TransactionType]' OR r.[label] = 'CTAS : Load [wwi].[fact_Movement]' OR r.[label] = 'CTAS : Load [wwi].[fact_Order]' OR r.[label] = 'CTAS : Load [wwi].[fact_Purchase]' OR r.[label] = 'CTAS : Load [wwi].[fact_StockHolding]' OR r.[label] = 'CTAS : Load [wwi].[fact_Transaction]' GROUP BY r.command, s.request_id, r.status ORDER BY nbr_files desc, gb_processed desc;
查看所有系统查询。
SELECT * FROM sys.dm_pdw_exec_requests;
结果让人欣慰,数据已顺利载入数据仓库。
创建表以及用于生成日期和销售表的过程
本部分将创建 wwi.dimension_Date 和 wwi.fact_Sale 表。 此外,还会创建可在 wwi.dimension_Date 和 wwi.fact_Sale 表中生成数百万行的存储过程。
创建 dimension_Date 和 fact_Sale 表。
CREATE TABLE [wwi].[dimension_Date] ( [Date] [datetime] NOT NULL, [Day Number] [int] NOT NULL, [Day] [nvarchar](10) NOT NULL, [Month] [nvarchar](10) NOT NULL, [Short Month] [nvarchar](3) NOT NULL, [Calendar Month Number] [int] NOT NULL, [Calendar Month Label] [nvarchar](20) NOT NULL, [Calendar Year] [int] NOT NULL, [Calendar Year Label] [nvarchar](10) NOT NULL, [Fiscal Month Number] [int] NOT NULL, [Fiscal Month Label] [nvarchar](20) NOT NULL, [Fiscal Year] [int] NOT NULL, [Fiscal Year Label] [nvarchar](10) NOT NULL, [ISO Week Number] [int] NOT NULL ) WITH ( DISTRIBUTION = REPLICATE, CLUSTERED INDEX ([Date]) ); CREATE TABLE [wwi].[fact_Sale] ( [Sale Key] [bigint] IDENTITY(1,1) NOT NULL, [City Key] [int] NOT NULL, [Customer Key] [int] NOT NULL, [Bill To Customer Key] [int] NOT NULL, [Stock Item Key] [int] NOT NULL, [Invoice Date Key] [date] NOT NULL, [Delivery Date Key] [date] NULL, [Salesperson Key] [int] NOT NULL, [WWI Invoice ID] [int] NOT NULL, [Description] [nvarchar](100) NOT NULL, [Package] [nvarchar](50) NOT NULL, [Quantity] [int] NOT NULL, [Unit Price] [decimal](18, 2) NOT NULL, [Tax Rate] [decimal](18, 3) NOT NULL, [Total Excluding Tax] [decimal](18, 2) NOT NULL, [Tax Amount] [decimal](18, 2) NOT NULL, [Profit] [decimal](18, 2) NOT NULL, [Total Including Tax] [decimal](18, 2) NOT NULL, [Total Dry Items] [int] NOT NULL, [Total Chiller Items] [int] NOT NULL, [Lineage Key] [int] NOT NULL ) WITH ( DISTRIBUTION = HASH ( [WWI Invoice ID] ), CLUSTERED COLUMNSTORE INDEX )
创建 [wwi].[InitialSalesDataPopulation],以便以 8 为乘数增加 [wwi].[seed_Sale] 中的行数。
CREATE PROCEDURE [wwi].[InitialSalesDataPopulation] AS BEGIN INSERT INTO [wwi].[seed_Sale] ( [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] ) SELECT [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] FROM [wwi].[seed_Sale] INSERT INTO [wwi].[seed_Sale] ( [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] ) SELECT [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] FROM [wwi].[seed_Sale] INSERT INTO [wwi].[seed_Sale] ( [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] ) SELECT [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] FROM [wwi].[seed_Sale] END
创建此存储过程,以便在 wwi.dimension_Date 中填充行。
CREATE PROCEDURE [wwi].[PopulateDateDimensionForYear] @Year [int] AS BEGIN IF OBJECT_ID('tempdb..#month', 'U') IS NOT NULL DROP TABLE #month CREATE TABLE #month ( monthnum int, numofdays int ) WITH ( DISTRIBUTION = ROUND_ROBIN, heap ) INSERT INTO #month SELECT 1, 31 UNION SELECT 2, CASE WHEN (@YEAR % 4 = 0 AND @YEAR % 100 <> 0) OR @YEAR % 400 = 0 THEN 29 ELSE 28 END UNION SELECT 3,31 UNION SELECT 4,30 UNION SELECT 5,31 UNION SELECT 6,30 UNION SELECT 7,31 UNION SELECT 8,31 UNION SELECT 9,30 UNION SELECT 10,31 UNION SELECT 11,30 UNION SELECT 12,31 IF OBJECT_ID('tempdb..#days', 'U') IS NOT NULL DROP TABLE #days CREATE TABLE #days (days int) WITH (DISTRIBUTION = ROUND_ROBIN, HEAP) INSERT INTO #days SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14 UNION SELECT 15 UNION SELECT 16 UNION SELECT 17 UNION SELECT 18 UNION SELECT 19 UNION SELECT 20 UNION SELECT 21 UNION SELECT 22 UNION SELECT 23 UNION SELECT 24 UNION SELECT 25 UNION SELECT 26 UNION SELECT 27 UNION SELECT 28 UNION SELECT 29 UNION SELECT 30 UNION SELECT 31 INSERT [wwi].[dimension_Date] ( [Date], [Day Number], [Day], [Month], [Short Month], [Calendar Month Number], [Calendar Month Label], [Calendar Year], [Calendar Year Label], [Fiscal Month Number], [Fiscal Month Label], [Fiscal Year], [Fiscal Year Label], [ISO Week Number] ) SELECT CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS CHAR(4)) AS DATE) AS [Date] ,DAY(CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS CHAR(4)) AS DATE)) AS [Day Number] ,CAST(DATENAME(day, CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS CHAR(4)) AS DATE)) AS NVARCHAR(10)) AS [Day] ,CAST(DATENAME(month, CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(10)) AS [Month] ,CAST(SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(3)) AS [Short Month] ,MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS [Calendar Month Number] ,CAST(N'CY' + CAST(YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(4)) + N'-' + SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(10)) AS [Calendar Month Label] ,YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS [Calendar Year] ,CAST(N'CY' + CAST(YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(4)) AS nvarchar(10)) AS [Calendar Year Label] ,CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12) THEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) - 10 ELSE MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 2 END AS [Fiscal Month Number] ,CAST(N'FY' + CAST(CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12) THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 1 ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) END AS nvarchar(4)) + N'-' + SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(20)) AS [Fiscal Month Label] ,CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12) THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 1 ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) END AS [Fiscal Year] ,CAST(N'FY' + CAST(CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12) THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 1 ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE))END AS nvarchar(4)) AS nvarchar(10)) AS [Fiscal Year Label] , DATEPART(ISO_WEEK, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS [ISO Week Number] FROM #month m CROSS JOIN #days d WHERE d.days <= m.numofdays DROP table #month; DROP table #days; END;
创建此过程,以便填充 wwi.dimension_Date 和 wwi.fact_Sale 表。 此过程调用 [wwi].[PopulateDateDimensionForYear] 来填充 wwi.dimension_Date。
CREATE PROCEDURE [wwi].[Configuration_PopulateLargeSaleTable] @EstimatedRowsPerDay [bigint],@Year [int] AS BEGIN SET NOCOUNT ON; SET XACT_ABORT ON; EXEC [wwi].[PopulateDateDimensionForYear] @Year; DECLARE @OrderCounter bigint = 0; DECLARE @NumberOfSalesPerDay bigint = @EstimatedRowsPerDay; DECLARE @DateCounter date; DECLARE @StartingSaleKey bigint; DECLARE @MaximumSaleKey bigint = (SELECT MAX([Sale Key]) FROM wwi.seed_Sale); DECLARE @MaxDate date; SET @MaxDate = (SELECT MAX([Invoice Date Key]) FROM wwi.fact_Sale) IF ( @MaxDate < CAST(@YEAR AS CHAR(4)) + '1231') AND (@MaxDate > CAST(@YEAR AS CHAR(4)) + '0101') SET @DateCounter = @MaxDate ELSE SET @DateCounter= CAST(@Year as char(4)) + '0101'; PRINT 'Targeting ' + CAST(@NumberOfSalesPerDay AS varchar(20)) + ' sales per day.'; DECLARE @OutputCounter varchar(20); DECLARE @variance DECIMAL(18,10); DECLARE @VariantNumberOfSalesPerDay BIGINT; WHILE @DateCounter < CAST(@YEAR AS CHAR(4)) + '1231' BEGIN SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112); RAISERROR(@OutputCounter, 0, 1); SET @variance = (SELECT RAND() * 10)*.01 + .95 SET @VariantNumberOfSalesPerDay = FLOOR(@NumberOfSalesPerDay * @variance) SET @StartingSaleKey = @MaximumSaleKey - @VariantNumberOfSalesPerDay - FLOOR(RAND() * 20000); SET @OrderCounter = 0; INSERT [wwi].[fact_Sale] ( [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] ) SELECT TOP(@VariantNumberOfSalesPerDay) [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], @DateCounter, DATEADD(day, 1, @DateCounter), [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] FROM [wwi].[seed_Sale] WHERE --[Sale Key] > @StartingSaleKey and /* IDENTITY DOES NOT WORK THE SAME IN SQLDW AND CAN'T USE THIS METHOD FOR VARIANT */ [Invoice Date Key] >=cast(@YEAR AS CHAR(4)) + '-01-01' ORDER BY [Sale Key]; SET @DateCounter = DATEADD(day, 1, @DateCounter); END; END;
生成数百万行
使用创建的存储过程在 wwi.fact_Sale 表中生成数百万行,并在 wwi.dimension_Date 表中生成相应的数据。
运行此过程,在 [wwi].[seed_Sale] 中播种更多行。
EXEC [wwi].[InitialSalesDataPopulation]
运行此过程,以便每天在 wwi.fact_Sale 中填充 100,000 行,这些行对应于 2000 年的每一天。
EXEC [wwi].[Configuration_PopulateLargeSaleTable] 100000, 2000
在上一步骤中,由于生成的数据跨越一整年,因此,该过程可能需要花费较长时间。 若要查看当前进程正在处理哪一天的数据,请打开新的查询,并运行以下 SQL 命令:
SELECT MAX([Invoice Date Key]) FROM wwi.fact_Sale;
运行以下命令查看已用的空间。
EXEC sp_spaceused N'wwi.fact_Sale';
填充复制表缓存
SQL 池通过将数据缓存到每个计算节点来复制表。 针对该表运行查询时,将会填充缓存。 因此,针对复制表运行的第一个查询可能需要花费额外的时间来填充缓存。 填充缓存后,针对复制表运行查询的速度会加快。
运行这些 SQL 查询可填充计算节点上的复制表缓存。
SELECT TOP 1 * FROM [wwi].[dimension_City];
SELECT TOP 1 * FROM [wwi].[dimension_Customer];
SELECT TOP 1 * FROM [wwi].[dimension_Date];
SELECT TOP 1 * FROM [wwi].[dimension_Employee];
SELECT TOP 1 * FROM [wwi].[dimension_PaymentMethod];
SELECT TOP 1 * FROM [wwi].[dimension_StockItem];
SELECT TOP 1 * FROM [wwi].[dimension_Supplier];
SELECT TOP 1 * FROM [wwi].[dimension_TransactionType];
创建新加载的数据的统计信息
若要实现较高的查询性能,必须在首次加载后基于每个表的每个列创建统计信息。 此外,在对数据做出重大更改后,必须更新统计信息。
创建此存储过程,以便更新所有表的所有列中的统计信息。
CREATE PROCEDURE [dbo].[prc_sqldw_create_stats] ( @create_type tinyint -- 1 default 2 Fullscan 3 Sample , @sample_pct tinyint ) AS IF @create_type IS NULL BEGIN SET @create_type = 1; END; IF @create_type NOT IN (1,2,3) BEGIN THROW 151000,'Invalid value for @stats_type parameter. Valid range 1 (default), 2 (fullscan) or 3 (sample).',1; END; IF @sample_pct IS NULL BEGIN; SET @sample_pct = 20; END; IF OBJECT_ID('tempdb..#stats_ddl') IS NOT NULL BEGIN; DROP TABLE #stats_ddl; END; CREATE TABLE #stats_ddl WITH ( DISTRIBUTION = HASH([seq_nmbr]) , LOCATION = USER_DB ) AS WITH T AS ( SELECT t.[name] AS [table_name] , s.[name] AS [table_schema_name] , c.[name] AS [column_name] , c.[column_id] AS [column_id] , t.[object_id] AS [object_id] , ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS [seq_nmbr] FROM sys.[tables] t JOIN sys.[schemas] s ON t.[schema_id] = s.[schema_id] JOIN sys.[columns] c ON t.[object_id] = c.[object_id] LEFT JOIN sys.[stats_columns] l ON l.[object_id] = c.[object_id] AND l.[column_id] = c.[column_id] AND l.[stats_column_id] = 1 LEFT JOIN sys.[external_tables] e ON e.[object_id] = t.[object_id] WHERE l.[object_id] IS NULL AND e.[object_id] IS NULL -- not an external table ) SELECT [table_schema_name] , [table_name] , [column_name] , [column_id] , [object_id] , [seq_nmbr] , CASE @create_type WHEN 1 THEN CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+')' AS VARCHAR(8000)) WHEN 2 THEN CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+') WITH FULLSCAN' AS VARCHAR(8000)) WHEN 3 THEN CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+') WITH SAMPLE '+CONVERT(varchar(4),@sample_pct)+' PERCENT' AS VARCHAR(8000)) END AS create_stat_ddl FROM T ; DECLARE @i INT = 1 , @t INT = (SELECT COUNT(*) FROM #stats_ddl) , @s NVARCHAR(4000) = N'' ; WHILE @i <= @t BEGIN SET @s=(SELECT create_stat_ddl FROM #stats_ddl WHERE seq_nmbr = @i); PRINT @s EXEC sp_executesql @s SET @i+=1; END DROP TABLE #stats_ddl;
运行此命令,在数据仓库中所有表的所有列中创建统计信息。
EXEC [dbo].[prc_sqldw_create_stats] 1, NULL;
清理资源
需要为加载到数据仓库中的计算资源和数据付费。 这些需要单独计费。
请按照下列步骤按需清理资源。
登录到 Azure 门户,单击数据仓库。
如果想要将数据保留在存储中,可以在不使用数据仓库时暂停计算。 暂停计算后,仅需为数据存储付费,并且随时都可在准备处理数据时恢复计算。 要暂停计算,请单击“暂停”按钮。 暂停数据仓库后,可看到“启动”按钮。 要恢复计算,请单击“启动”。
如果不想支付将来的费用,则可以删除数据仓库。 要删除数据仓库,以便不再为计算或存储付费,请单击“删除”。
若要删除创建的服务器,请单击上图中的“sample-svr.database.chinacloudapi.cn”,然后单击“删除” 。 请审慎执行此操作,因为删除服务器会删除分配给该服务器的所有数据库。
若要删除资源组,请单击“SampleRG”,然后单击“删除资源组”。
后续步骤
在本教程中,已学习了如何创建数据仓库以及用于加载数据的用户。 创建了外部表以定义 Azure 存储 Blob 中存储的数据的结构,然后使用 PolyBase CREATE TABLE AS SELECT 语句将数据加载到数据仓库。
完成了以下操作:
- 在 Azure 门户中使用 SQL 池创建数据仓库
- 在 Azure 门户中设置服务器级防火墙规则
- 使用 SSMS 连接到 SQL 池
- 创建了专用于加载数据的用户
- 在 Azure 存储 Blob 中为数据创建了外部表
- 使用 CTAS T-SQL 语句将数据加载到数据仓库
- 查看了正在加载的数据的进度
- 创建了新加载的数据的统计信息
转到开发概述,了解如何将现有数据库迁移到 Azure Synapse SQL 池。