In this quickstart, you'll bulk load data into your dedicated SQL pool using the COPY statement, which is built for high-throughput data ingestion. The COPY statement is the recommended loading utility because it provides functionality to:
- Allow lower-privileged users to load without needing strict CONTROL permissions on the data warehouse
- Leverage only a single T-SQL statement without having to create any additional database objects
- Leverage a finer permission model without exposing storage account keys by using Shared Access Signatures (SAS)
- Specify a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
- Customize default values for each target column and specify source data fields to load into specific target columns
- Specify a custom row terminator for CSV files
- Escape string, field, and row delimiters for CSV files
- Leverage SQL Server Date formats for CSV files
- Specify wildcards and multiple files in the storage location path
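As a rough illustration of several options from the list above, the following sketch combines a column list with defaults, a SAS credential, a separate error-file location, a date format, and wildcard paths in one statement. The table `dbo.MyStagingTable`, its columns, and every `<placeholder>` are hypothetical and not part of this quickstart; adjust them to your own environment before running.

```sql
-- Hypothetical target table and storage paths; replace every <placeholder>.
COPY INTO [dbo].[MyStagingTable]
(
    -- column name, optional default used for NULL input values, source field number
    [CustomerName] DEFAULT 'unknown' 1,
    [OrderDate] 3,
    [Quantity] DEFAULT 0 4
)
-- Multiple locations are comma-separated and must share one storage account and container.
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/orders/2013/*.csv',
     'https://<storageaccount>.blob.core.windows.net/<container>/orders/2014/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    -- SAS token instead of a storage account key
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<Your_SAS_Token>'),
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    DATEFORMAT = 'ymd',
    MAXERRORS = 10,
    -- Rejected rows go to a different (hypothetical) storage account
    ERRORFILE = 'https://<errorstorageaccount>.blob.core.windows.net/<errorcontainer>/rejected/',
    ERRORFILE_CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<Your_Error_SAS_Token>')
);
```

No external data source, file format, or credential objects are created here; everything the load needs is stated in the single statement.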
Prerequisites
This quickstart assumes you already have a dedicated SQL pool. If a dedicated SQL pool hasn't been created, use the Create and Connect - portal quickstart.
Set up the required permissions
-- List the permissions for your user
select princ.name
, princ.type_desc
, perm.permission_name
, perm.state_desc
, perm.class_desc
, object_name(perm.major_id)
from sys.database_principals princ
left join
sys.database_permissions perm
on perm.grantee_principal_id = princ.principal_id
where princ.name = '<yourusername>';
--Make sure your user has the permissions to CREATE tables in the [dbo] schema
GRANT CREATE TABLE TO <yourusername>;
GRANT ALTER ON SCHEMA::dbo TO <yourusername>;
--Make sure your user has ADMINISTER DATABASE BULK OPERATIONS permissions
GRANT ADMINISTER DATABASE BULK OPERATIONS TO <yourusername>;
--Make sure your user has INSERT permissions on the target table
GRANT INSERT ON <yourtable> TO <yourusername>;
Create the target table
In this example, we'll be loading data from the New York taxi dataset. We'll load a table called Trip that represents taxi trips taken within a single year. Run the following to create the table:
CREATE TABLE [dbo].[Trip]
(
[DateID] int NOT NULL,
[MedallionID] int NOT NULL,
[HackneyLicenseID] int NOT NULL,
[PickupTimeID] int NOT NULL,
[DropoffTimeID] int NOT NULL,
[PickupGeographyID] int NULL,
[DropoffGeographyID] int NULL,
[PickupLatitude] float NULL,
[PickupLongitude] float NULL,
[PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DropoffLatitude] float NULL,
[DropoffLongitude] float NULL,
[DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[PassengerCount] int NULL,
[TripDurationSeconds] int NULL,
[TripDistanceMiles] float NULL,
[PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FareAmount] money NULL,
[SurchargeAmount] money NULL,
[TaxAmount] money NULL,
[TipAmount] money NULL,
[TollsAmount] money NULL,
[TotalAmount] money NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
);
Run the COPY statement
Run the following COPY statement to load data from the Azure Blob Storage account into the Trip table.
COPY INTO [dbo].[Trip] FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/'
WITH (
FIELDTERMINATOR='|',
ROWTERMINATOR='0x0A'
) OPTION (LABEL = 'COPY: dbo.trip');
Monitor the load
Check whether your load is making progress by periodically running the following query:
SELECT r.[request_id]
, r.[status]
, r.resource_class
, r.command
, sum(w.bytes_processed) AS bytes_processed
, sum(w.rows_processed) AS rows_processed
FROM sys.dm_pdw_exec_requests r
JOIN sys.dm_pdw_dms_workers w
ON r.[request_id] = w.request_id
WHERE r.[label] = 'COPY: dbo.trip' AND r.session_id <> session_id() AND w.type = 'WRITER'
GROUP BY r.[request_id]
, r.[status]
, r.resource_class
, r.command;
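Once the progress query stops changing, a follow-up check along these lines can confirm how the load ended. This is a minimal sketch that assumes the same label (`COPY: dbo.trip`) and target table used earlier in this quickstart; the column alias `loaded_rows` is just an illustrative name.

```sql
-- Most recent request(s) submitted under the label used by the COPY statement.
-- The status column shows values such as Running, Completed, or Failed.
SELECT r.request_id
     , r.[status]
     , r.submit_time
     , r.end_time
     , r.total_elapsed_time   -- milliseconds
     , r.error_id
FROM sys.dm_pdw_exec_requests r
WHERE r.[label] = 'COPY: dbo.trip'
ORDER BY r.submit_time DESC;

-- Row count of the target table after the request has completed.
SELECT COUNT_BIG(*) AS loaded_rows
FROM [dbo].[Trip];
```

A non-NULL `error_id` on a failed request can be looked up in `sys.dm_pdw_errors` for the error details.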
Next steps
- For best practices on data loading, see Best Practices for Loading Data.
- For information on how to manage the resources for your data loads, see Workload Isolation.