Tutorial: Analyze data with dedicated SQL pools
In this tutorial, use the NYC Taxi data to explore a dedicated SQL pool's capabilities.
- [Deploy a dedicated SQL pool]
- [Load data into the pool]
- [Explore the data you've loaded]
Prerequisites
- This tutorial assumes you've completed the steps in the rest of the quickstarts. Specifically it uses the 'contosodatalake' resource created in the Create a Synapse Workspace quickstart.
Create a dedicated SQL pool
- In Synapse Studio, on the left-side pane, select Manage > SQL pools under Analytics pools.
- Select New.
- For Dedicated SQL pool name select
SQLPOOL1
. - For Performance level choose DW100C.
- Select Review + create > Create. Your dedicated SQL pool will be ready in a few minutes.
Your dedicated SQL pool is associated with a SQL database that's also called SQLPOOL1
.
- Navigate to Data > Workspace.
- You should see a database named SQLPOOL1. If you don't see it, select Refresh.
A dedicated SQL pool consumes billable resources as long as it's active. You can pause the pool later to reduce costs.
Note
When creating a new dedicated SQL pool (formerly SQL DW) in your workspace, the dedicated SQL pool provisioning page will open. Provisioning will take place on the logical SQL server.
Load the NYC Taxi Data into SQLPOOL1
In Synapse Studio, navigate to the Develop hub, select the + button to add new resource, then create new SQL script.
Select the pool
SQLPOOL1
(pool created in STEP 1 of this tutorial) in Connect to drop down list above the script.Enter the following code:
IF NOT EXISTS (SELECT * FROM sys.objects O JOIN sys.schemas S ON O.schema_id = S.schema_id WHERE O.NAME = 'NYCTaxiTripSmall' AND O.TYPE = 'U' AND S.NAME = 'dbo') CREATE TABLE dbo.NYCTaxiTripSmall ( [VendorID] bigint, [store_and_fwd_flag] nvarchar(1) NULL, [RatecodeID] float NULL, [PULocationID] bigint NULL, [DOLocationID] bigint NULL, [passenger_count] float NULL, [trip_distance] float NULL, [fare_amount] float NULL, [extra] float NULL, [mta_tax] float NULL, [tip_amount] float NULL, [tolls_amount] float NULL, [ehail_fee] float NULL, [improvement_surcharge] float NULL, [total_amount] float NULL, [payment_type] float NULL, [trip_type] float NULL, [congestion_surcharge] float NULL ) WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX -- HEAP ) GO COPY INTO dbo.NYCTaxiTripSmall (VendorID 1, store_and_fwd_flag 4, RatecodeID 5, PULocationID 6 , DOLocationID 7, passenger_count 8,trip_distance 9, fare_amount 10, extra 11, mta_tax 12, tip_amount 13, tolls_amount 14, ehail_fee 15, improvement_surcharge 16, total_amount 17, payment_type 18, trip_type 19, congestion_surcharge 20 ) FROM 'https://contosolake.dfs.core.chinacloudapi.cn/users/NYCTripSmall.parquet' WITH ( FILE_TYPE = 'PARQUET' ,MAXERRORS = 0 ,IDENTITY_INSERT = 'OFF' ,AUTO_CREATE_TABLE ='ON' )
Tip
If you get an error that reads
Login failed for user '<token-identified principal>'
, you need to set your Entra Id admin.- In the Azure Portal, search for your synapse workspace.
- Under Settings select Microsoft Entra ID.
- Select Set admin and set a Microsoft Entra ID admin.
Select the Run button to execute the script.
This script finishes in less than 60 seconds. It loads 2 million rows of NYC Taxi data into a table called
dbo.NYCTaxiTripSmall
.
Explore the NYC Taxi data in the dedicated SQL pool
In Synapse Studio, go to the Data hub.
Go to SQLPOOL1 > Tables. (If you don't see it in the menu, refresh the page.)
Right-click the dbo.NYCTaxiTripSmall table and select New SQL Script > Select TOP 100 Rows.
Wait while a new SQL script is created and runs.
At the top of the SQL script Connect to is automatically set to the SQL pool called SQLPOOL1.
Replace the text of the SQL script with this code and run it.
SELECT passenger_count as PassengerCount, SUM(trip_distance) as SumTripDistance_miles, AVG(trip_distance) as AvgTripDistance_miles INTO dbo.PassengerCountStats FROM dbo.NYCTaxiTripSmall WHERE trip_distance > 0 AND passenger_count > 0 GROUP BY passenger_count; SELECT * FROM dbo.PassengerCountStats ORDER BY PassengerCount;
This query creates a table
dbo.PassengerCountStats
with aggregate data from thetrip_distance
field, then queries the new table. The data shows how the total trip distances and average trip distance relate to the number of passengers.In the SQL script result window, change the View to Chart to see a visualization of the results as a line chart. Change Category column to
PassengerCount
.
Clean up
Pause your dedicated SQL Pool to reduce costs.
- Navigate to Manage in your synapse workspace.
- Select SQL pools.
- Hover over SQLPOOL1 and select the Pause button.
- Confirm to pause.