Reference cloud Data Warehouse architecture for MS Azure
Data Warehousing in the traditional on-premises world is a well-elaborated and studied discipline. With all the logical and physical architectures, data modeling techniques and methodologies, industrial data models, business glossaries, and data governance principles, one can find plenty of guidelines on how to build the solution properly.
However, with emerging cloud technologies and a variety of options, finding the best practice and proven architecture for the cloud can be challenging. Thus, it is necessary to balance the options from the practicality, sustainability, and time-to-market perspectives.
Simplified reference architecture
The following diagram depicts a best-practice toolset covering the necessary functional blocks.
Data ingestion from the sources usually requires fast and reliable replication to cost-effective storage. Using Azure Data Factory and/or Pipelines to replicate the data to Azure Data Lake Storage is a good choice.
Data pipelines carrying most of the business logic for the typical core of the Data Warehouse require complex transformation logic. Here, again, using Azure Data Factory and/or Pipelines and storing the data in either Azure Synapse or Azure SQL Database is a well-grounded decision.
For data streaming, should real-time integration be necessary, working with Azure Event Hubs and Azure Stream Analytics and propagating the data to Azure Cosmos DB is good practice.
Tooling for reporting, data science, and machine learning is typically a matter of taste. Azure supports these with MS Power BI and Azure Machine Learning.
Last but not least, data governance in Azure is supported mostly by Azure Purview.
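The functional blocks listed above can also be written down as a simple lookup table, which is handy for checking which services a given block relies on. The following Python sketch is only an illustration: the block keys (such as "core_transformation") and the helper functions are hypothetical labels introduced here, not part of any Azure API.

```python
from __future__ import annotations

# Illustrative lookup table of the functional blocks described above.
# The dictionary keys are hypothetical labels chosen for this sketch.
REFERENCE_ARCHITECTURE = {
    "ingestion": {
        "tools": ["Azure Data Factory", "Pipelines"],
        "storage": ["Azure Data Lake Storage"],
    },
    "core_transformation": {
        "tools": ["Azure Data Factory", "Pipelines"],
        "storage": ["Azure Synapse", "Azure SQL Database"],
    },
    "streaming": {
        "tools": ["Azure Event Hubs", "Azure Stream Analytics"],
        "storage": ["Azure Cosmos DB"],
    },
    "analytics": {
        "tools": ["MS Power BI", "Azure Machine Learning"],
        "storage": [],
    },
    "governance": {
        "tools": ["Azure Purview"],
        "storage": [],
    },
}


def services_for(block: str) -> list[str]:
    """Return every service (tools first, then storage) used by one block."""
    entry = REFERENCE_ARCHITECTURE[block]
    return entry["tools"] + entry["storage"]


def blocks_using(service: str) -> list[str]:
    """Return the blocks that rely on the given service, in table order."""
    return [
        block
        for block, entry in REFERENCE_ARCHITECTURE.items()
        if service in entry["tools"] + entry["storage"]
    ]
```

For example, blocks_using("Azure Data Factory") lists both the ingestion and the core transformation blocks, which reflects that the same tool serves replication and business-logic pipelines.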
How does ADELE help accelerate migration?
The diagram above describes the reference architecture not only for a Data Warehouse built from scratch. The same architecture principles can be applied to a migrated or re-platformed legacy Data Warehouse. However, with legacy systems, things get more complicated.
What should be done with existing historical data? How can you make sure that all the functionality, ETL jobs, and data pipelines are migrated properly and behave the same way?
This is where ADELE is most helpful: it understands the legacy Data Warehouse, harvests its metadata, and provides automated generation capabilities for Azure Data Factory and/or Pipelines.
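To make the idea of metadata-driven generation more concrete, here is a minimal Python sketch that turns harvested job metadata into an Azure Data Factory pipeline definition with one Copy activity per job. It does not show how ADELE works internally. The LegacyJob record, the dataset names, and the pipeline name are hypothetical, and the source and sink type values ("SqlServerSource", "ParquetSink") are example values that should be checked against the connector documentation for the actual systems involved.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class LegacyJob:
    """Hypothetical record harvested from one legacy ETL job."""
    name: str
    source_dataset: str  # assumed to already exist as a dataset in the factory
    sink_dataset: str    # assumed to already exist as a dataset in the factory
    source_type: str     # example value: "SqlServerSource"
    sink_type: str       # example value: "ParquetSink"


def to_copy_activity(job: LegacyJob) -> dict:
    """Build a Copy activity definition for a single legacy job."""
    return {
        "name": f"Copy_{job.name}",
        "type": "Copy",
        "inputs": [{"referenceName": job.source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": job.sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": job.source_type},
            "sink": {"type": job.sink_type},
        },
    }


def to_pipeline(name: str, jobs: list[LegacyJob]) -> dict:
    """Wrap one Copy activity per job into a pipeline definition."""
    return {
        "name": name,
        "properties": {"activities": [to_copy_activity(job) for job in jobs]},
    }


if __name__ == "__main__":
    jobs = [
        LegacyJob("orders", "ds_legacy_orders", "ds_lake_orders",
                  "SqlServerSource", "ParquetSink"),
    ]
    # Print the generated definition as JSON for review before deployment.
    print(json.dumps(to_pipeline("pl_legacy_migration", jobs), indent=2))
```

Generating definitions from metadata in this way keeps every migrated job consistent in structure, which makes it easier to compare the behavior of the new pipelines against the legacy ETL jobs.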
Want to see ADELE in action? Book the demo or Proof of Concept now.