🚀 Azure_DE_Pipeline

This project demonstrates an end-to-end data engineering pipeline built on Azure. Data flows from an on-premises SQL Server database through cloud-based ingestion, transformation, storage, and visualization using Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Power BI.

🧠 Project Architecture

The pipeline follows the Medallion Architecture, organizing data into Bronze (raw), Silver (cleaned), and Gold (curated) layers:

Ingestion: ADF copies data from the on-prem SQL Server into Azure Data Lake Storage Gen2.
Transformation: Databricks notebooks move data through the Bronze → Silver → Gold layers.
Loading & Reporting: Gold-layer data is loaded into Synapse and visualized in Power BI.
Security & Governance: credentials are stored in Azure Key Vault, and access is managed through Azure Active Directory.
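The layer layout can be made concrete with a small path helper. This is a minimal sketch, assuming one ADLS Gen2 container per layer and a `<schema>/<table>/` sub-path; the container names, the storage account name, and the helper `layer_path` are hypothetical and not taken from this repository. Only the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI form is standard ADLS Gen2 syntax.

```python
# Minimal sketch of a Medallion-style folder layout in ADLS Gen2.
# Hypothetical: one container per layer ("bronze", "silver", "gold"), the
# storage account name, and the <schema>/<table>/ sub-path are illustrative.

LAYERS = ("bronze", "silver", "gold")


def layer_path(layer: str, account: str, schema: str, table: str) -> str:
    """Return the abfss:// URI of one table's folder in one Medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    # ADLS Gen2 URIs: abfss://<container>@<account>.dfs.core.windows.net/<path>
    return f"abfss://{layer}@{account}.dfs.core.windows.net/{schema}/{table}/"


if __name__ == "__main__":
    # Print where the SalesLT.Customer table would live in each layer.
    for layer in LAYERS:
        print(layer_path(layer, "mydatalake", "SalesLT", "Customer"))
```

Keeping each layer in its own container makes it easy to grant different permissions per layer, for example write access to Bronze only for the ingestion pipeline.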

📌 Project Overview

This project simulates a real-world cloud data pipeline designed to handle:

Structured data from an on-prem SQL Server source
Data orchestration and automation using Azure Data Factory
PySpark transformation logic in Azure Databricks notebooks
Scalable storage and analytics using Synapse Analytics
Dashboarding and reporting with Power BI
Secret and credential management via Azure Key Vault

It showcases data engineering best practices such as modular pipeline design, separation of compute and storage, and secure resource access.
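Secure resource access typically means the pipeline never stores the SQL Server password itself. The sketch below builds the JSON for an ADF SQL Server linked service whose password is an `AzureKeyVaultSecret` reference, so ADF resolves it from Key Vault at run time. It assumes a self-hosted integration runtime, which ADF needs to reach an on-premises SQL Server; the function `sql_server_linked_service` and every name passed to it (linked services, secret, runtime, user) are hypothetical and not taken from this repository's `DataFactory` folder.

```python
import json

# Minimal sketch: an ADF linked-service definition for the on-prem SQL Server
# whose password is resolved from Azure Key Vault instead of being stored inline.
# Hypothetical: all names passed in (linked services, secret, runtime, user).


def sql_server_linked_service(
    name: str,
    server: str,
    database: str,
    user: str,
    key_vault_ls: str,
    secret_name: str,
    integration_runtime: str,
) -> dict:
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                "connectionString": (
                    f"Data Source={server};Initial Catalog={database};"
                    "Integrated Security=False;"
                ),
                "userName": user,
                # Only a reference to the secret appears here; ADF fetches the value.
                "password": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_ls,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
            # On-prem sources are reached through a self-hosted integration runtime.
            "connectVia": {
                "referenceName": integration_runtime,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    ls = sql_server_linked_service(
        "SqlServerOnPrem", "localhost", "AdventureWorksLT2022",
        "adf_reader", "AzureKeyVaultLS", "sql-password", "SelfHostedIR",
    )
    print(json.dumps(ls, indent=2))
```

Rotating the password then only requires updating the Key Vault secret; the linked service and pipelines stay unchanged.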

🛠️ Tools & Technologies Used

Technology	Purpose
Azure Data Factory	Orchestrate and automate data pipelines
Azure Databricks	Clean and transform data using PySpark
Azure Synapse Analytics	Store and analyze structured data
SQL Server (managed with SSMS)	On-prem data source (AdventureWorksLT2022)
Power BI	Visualize data and build dashboards
Azure Storage (ADLS Gen2)	Staging and data lake storage
Azure Key Vault	Store credentials securely
Azure Active Directory	Identity and access management
GitHub	Version control and project repository

🔄 Pipeline Flow

Source: On-prem SQL Server database (AdventureWorksLT2022).
Ingestion: ADF ingests raw data into the Bronze Layer (Data Lake).
Transformation: Databricks notebooks refine the Bronze data into the Silver and Gold layers.
Loading: Gold-layer data is loaded into Synapse.
Visualization: Power BI connects to Synapse for reporting.
Orchestration: ADF coordinates the end-to-end pipeline execution.
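The Bronze → Silver → Gold steps can be illustrated on plain Python rows. This is a minimal sketch under assumed rules: Silver parses timestamp strings in columns whose names contain `Date` into dates, and Gold renames PascalCase columns to snake_case. These rules, the helper names, and the sample row are hypothetical, not necessarily what this repository's Databricks notebooks do. In PySpark the same steps would usually use `pyspark.sql.functions.to_date` and `DataFrame.withColumnRenamed`; plain Python is used here so the sketch runs without a Spark cluster.

```python
import re
from datetime import datetime
from typing import List

# Minimal sketch of Bronze -> Silver -> Gold on plain Python rows.
# Hypothetical rules: Silver parses *Date* timestamp strings into dates,
# Gold renames PascalCase columns to snake_case.


def bronze_to_silver(rows: List[dict]) -> List[dict]:
    """Parse ISO timestamp strings in columns whose name contains 'Date'."""
    silver = []
    for row in rows:
        clean = dict(row)
        for col, value in row.items():
            if "Date" in col and isinstance(value, str):
                clean[col] = datetime.fromisoformat(value).date()
        silver.append(clean)
    return silver


def to_snake_case(name: str) -> str:
    """CustomerID -> customer_id, ModifiedDate -> modified_date."""
    # "_" before an uppercase letter that follows a lowercase letter or digit.
    s = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    # "_" before a capitalized word that directly follows an uppercase run.
    s = re.sub(r"(?<=[A-Z])([A-Z][a-z])", r"_\1", s)
    return s.lower()


def silver_to_gold(rows: List[dict]) -> List[dict]:
    """Rename every column to snake_case; values are left unchanged."""
    return [{to_snake_case(col): value for col, value in row.items()} for row in rows]


if __name__ == "__main__":
    # Made-up Bronze row for illustration only.
    bronze = [{"CustomerID": 1, "FirstName": "Alex", "ModifiedDate": "2005-08-01T00:00:00"}]
    print(silver_to_gold(bronze_to_silver(bronze)))
    # [{'customer_id': 1, 'first_name': 'Alex', 'modified_date': datetime.date(2005, 8, 1)}]
```

Keeping each step a separate function mirrors the one-notebook-per-layer split, so ADF can run and retry each layer independently.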

📈 Power BI Dashboard Highlights

✅ Total Customers
✅ Total Sales
✅ Sales by Product Category
✅ Customer Gender Distribution
✅ Interactive filters & slicers
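The headline visuals boil down to a few aggregations over Gold-layer data. The sketch below expresses them in Python for illustration; in Power BI they would be DAX measures, and the report's actual definitions may differ. It assumes customers carry `customer_id` and `title`, order lines carry `category` and `line_total`, and that gender is inferred from the title, since the standard AdventureWorksLT Customer table has a Title column but no gender column. The column names, the title-to-gender mapping, and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Minimal sketch of the dashboard measures. Hypothetical: column names,
# the title-to-gender mapping, and the sample rows below.
TITLE_TO_GENDER = {"Mr.": "Male", "Ms.": "Female", "Mrs.": "Female"}


def total_customers(customers: List[dict]) -> int:
    # Count distinct customer IDs, like DISTINCTCOUNT in a DAX measure.
    return len({c["customer_id"] for c in customers})


def total_sales(order_lines: List[dict]) -> float:
    return round(sum(line["line_total"] for line in order_lines), 2)


def sales_by_category(order_lines: List[dict]) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for line in order_lines:
        totals[line["category"]] += line["line_total"]
    # Sort by category name so the output order is stable.
    return {cat: round(amount, 2) for cat, amount in sorted(totals.items())}


def gender_distribution(customers: List[dict]) -> Counter:
    # Missing or unmapped titles are counted as "Unknown".
    return Counter(TITLE_TO_GENDER.get(c.get("title"), "Unknown") for c in customers)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    customers = [
        {"customer_id": 1, "title": "Mr."},
        {"customer_id": 2, "title": "Ms."},
        {"customer_id": 3, "title": "Mr."},
        {"customer_id": 4, "title": None},
    ]
    order_lines = [
        {"category": "Road Bikes", "line_total": 1000.0},
        {"category": "Helmets", "line_total": 35.5},
        {"category": "Road Bikes", "line_total": 500.25},
    ]
    print(total_customers(customers))               # 4
    print(total_sales(order_lines))                 # 1535.75
    print(sales_by_category(order_lines))           # {'Helmets': 35.5, 'Road Bikes': 1500.25}
    print(dict(gender_distribution(customers)))     # {'Male': 2, 'Female': 1, 'Unknown': 1}
```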

📂 Repository Structure

DataFactory
Databricks_notebooks
Images
PowerBI
SQL
Synapse_workspace
README.md
publish_config.json

Repository: Pravalika1812/Azure_DE_Pipeline

Folders and files

Latest commit

History

Repository files navigation

🚀 Azure_DE_Pipeline

🧠 Project Architecture

📌 Project Overview

🛠️ Tools & Technologies Used

🔄 Pipeline Flow

📈 Power BI Dashboard Highlights

📈 Power BI Dashboard Snapshots

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages