A common pattern in Modern Data Warehouse architecture is to land your source data in its existing format into a data lake followed by transforming and loading it into an Enterprise Data Warehouse (EDW) for reporting and analytics. In Microsoft Azure, this translates to loading data into Azure Data Lake Storage Gen2 (ADLS), transforming data with Synapse or Azure Data Factory Pipelines, storing data in an Azure Synapse Dedicated Pool database or other Azure relational data store, and building reports over that data in Power BI, Excel, or other reporting tools. Typically, the EDW data is stored in a Star Schema, the optimal design for many reporting and analytical tools like Power BI. This is a great practice for enterprise reporting requirements but has some pitfalls:
But:
Then Serverless SQL Pools in Azure Synapse Analytics may be the answer for you!
In this post, we'll walk through creating a logical data warehouse over your ADLS data using a Serverless SQL database.
|
In the empty script, type CREATE SCHEMA Bronze, then click Run:
A quick way to do this is to navigate to your ADLS account in the Data hub and generate a New SQL script over the folder (or file) for each entity
You will be prompted for the format (Text, Parquet, or Delta) and then a script will be created for you. Change the script to eliminate the TOP100 from the SELECT statement and add CREATE VIEW. Click Run to create the view. Do this for all of the ADLS files that will be used in your logical data warehouse.
Here are the views I created in my Bronze Schema:
Below are the views I created in the Silver schema. You can create your own views or download the code I created for the views below
Here is an example of the StockItems dimension table that was denormalized for the star schema:
Create View Silver.StockItems as
SELECT SI_A.StockItemID,
SI_A.StockItemName,
S.SupplierID,
S.SupplierName,
SC.SupplierCategoryID,
SC.SupplierCategoryName,
C.ColorID,
SI_A.UnitPackageID as [Selling Package],
SP_A.PackageTypeName,
SI_A.OuterPackageID as [Buying Package],
SI_A.Brand,
SI_A.Size,
SI_A.LeadTimeDays,
SI_A.QuantityPerOuter,
SI_A.IsChillerStock,
SI_A.Barcode,
SI_A.TaxRate,
SI_A.UnitPrice,
SI_A.RecommendedRetailPrice,
SI_A.TypicalWeightPerUnit,
SI_A.MarketingComments,
SI_A.InternalComments,
SI_A.Photo,
SI_A.CustomFields,
SI_A.Tags,
SI_A.SearchDetails,
SI_A.LastEditedBy,
SI_A.ValidFrom,
SI_A.ValidTo
From Bronze.StockItems as SI_A
INNER JOIN Bronze.Suppliers S ON SI_A.SupplierID = S.SupplierID
INNER JOIN Bronze.Colors C on SI_A.ColorID = C.ColorID
INNER JOIN Bronze.PackageTypes SP_A on SP_A.PackageTypeID = SI_A.UnitPackageID
INNER JOIN Bronze.SupplierCategories SC on S.SupplierCategoryID = SC.SupplierCategoryID
Connect to the endpoint in SQL Server Management Studio by pasting the Serverless SQL endpoint URL as the Server name:
Here's the results of a query over one of the views:
Connect to the Serverless SQL endpoint in Power BI Desktop:
|
|
Load and transform just like any other Power BI data source:
Keeping data in ADLS Gen2 without transforming and loading to another data platform gives users quick access to the data they need for reporting and analytics without the cost and delays due to heavy transformations and data movement. After creating your logical EDW in the Synapse SQL Serverless database, you can further evaluate if Synapse SQL Serverless meets your goals for query performance. If more performance is needed, you can leverage your Synapse SQL Serverless views to move data to another Azure data platform like Azure Synapse Analytics SQL Dedicated pool for further query optimization and performance features.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.