Build a logical Enterprise Data Warehouse with ADLS and Synapse Serverless SQL pool
Published Sep 12 2022 07:56 AM 4,805 Views
Microsoft

Build a Logical Enterprise Data Warehouse using Azure Data Lake Storage Gen 2 and Synapse Serverless SQL pools

 

Overview

A common pattern in Modern Data Warehouse architecture is to land your source data in its existing format into a data lake followed by transforming and loading it into an Enterprise Data Warehouse (EDW) for reporting and analytics.  In Microsoft Azure, this translates to loading data into Azure Data Lake Storage Gen2 (ADLS), transforming data with Synapse or Azure Data Factory Pipelines, storing data in an Azure Synapse Dedicated Pool database or other Azure relational data store, and building reports over that data in Power BI, Excel, or other reporting tools. Typically, the EDW data is stored in a Star Schema, the optimal design for many reporting and analytical tools like Power BI.  This is a great practice for enterprise reporting requirements but has some pitfalls: 

  1. Traditional transformation and load of data to the final EDW data store can be time consuming
  2. Ingress and storage costs increase when transforming and loading from the landing zone to the final data store for reporting
  3. Deep knowledge of the data sources is required in order to provide business insights and optimal data warehouse design

 But: 

  • What if your data was transformed and ready for reporting as soon as it is landed in the cloud?
  • What if you could eliminate that time consuming ETL process? 
  • How about not incurring additional costs for moving, transforming and storing data in another location? 
  • What if you suspect that you will need a traditional EDW but would like to start an Exploratory Data Analysis project in the cloud?
  • How about doing this all within T-SQL? 

Then Serverless SQL Pools in Azure Synapse Analytics may be the answer for you!  

 

In this post, we'll walk through creating a logical data warehouse over your ADLS data using a Serverless SQL database. 

jehayes_0-1662743900808.png

Prerequisites

Steps to create a logical EDW via Synapse SQL Serverless

  1. Open Synapse Studio

  2. Create a SQL Serverless Database in Azure Synapse Analytics Workspace Data hub
jehayes_3-1662745045331.png jehayes_0-1662754527468.png

 

 

 

  1. I created two schemas, Bronze and Silver. The views in the Bronze schema represent the ADLS data in its existing format. The views in the Silver schema contain queries over the Bronze views to denormalize the data into Facts and Dimensions. In the Data hub, go to your SQL Serverless database and click on the ellipses to the right of the database name, choose New SQL script, then Empty script

jehayes_5-1662745576448.png

In the empty script, type CREATE SCHEMA Bronze, then click Run:

jehayes_6-1662745641267.png

  1. Repeat step 3 to create a Silver schema.
  2. Create the SQL views in your Bronze schema over your ADLS files to define column names for easy querying in t-SQL.

A quick way to do this is to navigate to your ADLS account in the Data hub and generate a New SQL script over the folder (or file) for each entity

jehayes_0-1662752165574.png

 

You will be prompted for the format (Text, Parquet, or Delta) and then a script will be created for you. Change the script to eliminate the TOP100 from the SELECT statement and add CREATE VIEW. Click Run to create the view. Do this for all of the ADLS files that will be used in your logical data warehouse.

jehayes_0-1662990720156.png

 

 

Here are the views I created in my Bronze Schema:

jehayes_2-1662752313516.png

 

  1. Create SQL views in the Silver schema to define the transformations for the fact and dimension tables for your star schema.

Below are the views I created in the Silver schema. You can create your own views or download the code I created for the views below

jehayes_3-1662752444560.png

Here is an example of the StockItems dimension table that was denormalized for the star schema:

 

 

 

Create View Silver.StockItems as
SELECT  SI_A.StockItemID,
SI_A.StockItemName,
S.SupplierID,
S.SupplierName,
SC.SupplierCategoryID,
SC.SupplierCategoryName,
C.ColorID,
SI_A.UnitPackageID as [Selling Package],
SP_A.PackageTypeName,
SI_A.OuterPackageID as [Buying Package],
SI_A.Brand,
SI_A.Size,
SI_A.LeadTimeDays,
SI_A.QuantityPerOuter,
SI_A.IsChillerStock,
SI_A.Barcode,
SI_A.TaxRate,
SI_A.UnitPrice,
SI_A.RecommendedRetailPrice,
SI_A.TypicalWeightPerUnit,
SI_A.MarketingComments,
SI_A.InternalComments,
SI_A.Photo, 
SI_A.CustomFields,
SI_A.Tags,
SI_A.SearchDetails,
SI_A.LastEditedBy,
SI_A.ValidFrom,
SI_A.ValidTo
From Bronze.StockItems as SI_A
INNER JOIN Bronze.Suppliers S ON SI_A.SupplierID = S.SupplierID
INNER JOIN Bronze.Colors C on SI_A.ColorID = C.ColorID
INNER JOIN Bronze.PackageTypes SP_A on SP_A.PackageTypeID = SI_A.UnitPackageID
INNER JOIN Bronze.SupplierCategories SC on S.SupplierCategoryID = SC.SupplierCategoryID

 

 

 

  1. Connect to the SQL Serverless Endpoint to query your database in your tool of choice, such as Power BI or SQL Server Management studio. Start by navigating to your Synapse Analytics Workspace in the Azure Portal and copying the Serverless SQL Endpoint:

jehayes_0-1662752668964.png

Connect to the endpoint in SQL Server Management Studio by pasting the Serverless SQL endpoint URL as the Server name:

jehayes_1-1662752832999.png

Here's the results of a query over one of the views:

jehayes_2-1662752901459.png

Connect to the Serverless SQL endpoint in Power BI Desktop:

jehayes_5-1662753299891.png

 

jehayes_6-1662753315447.png

 

Load and transform just like any other Power BI data source:

jehayes_0-1662753568474.png

Summary

Keeping data in ADLS Gen2 without transforming and loading to another data platform gives users quick access to the data they need for reporting and analytics without the cost and delays due to heavy transformations and data movement. After creating your logical EDW in the Synapse SQL Serverless database, you can further evaluate if Synapse SQL Serverless meets your goals for query performance. If more performance is needed, you can leverage your Synapse SQL Serverless views to move data to another Azure data platform like Azure Synapse Analytics SQL Dedicated pool for further query optimization and performance features.

Co-Authors
Version history
Last update:
‎Sep 12 2022 07:20 AM
Updated by: