This pipeline is designed to transfer data from Azure Data Lake Storage Gen2 (ADLS Gen2) to Azure Blob Storage using Azure Data Factory (ADF). The data is stored in CSV format in both source and target.
ADLS Gen2 → ADF → Blob Storage
Before setting up the pipeline, ensure you have the following services available in Azure:
- Azure Data Lake Storage Gen2 (ADLS Gen2)
- Azure Blob Storage
- Azure Data Factory (ADF)
- Required permissions to create Linked Services and Datasets in ADF
Azure services are independent and do not connect to each other by default. A Linked Service stores the connection information that ADF uses to reach each storage account.
- Source: ADLS Gen2
- Linked Service Name: `ls_adlsgen2_b21blobgen2storage`
- Authentication: Managed Identity / Service Principal
- Target: Blob Storage
- Linked Service Name: `ls_blob_b21blobstorage2`

Naming convention: `ls_blob_b21blobstorage2` → `[Linked Service]_[Service Type]_[Service Name]`

- `ls` → Linked Service
- `blob` → Blob Storage
- `b21blobstorage2` → Storage account name
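
The naming convention above can be applied mechanically when several storage accounts are involved. The following is a minimal Python sketch, assuming names are always lowercase and joined with underscores; the helper `linked_service_name` is hypothetical and is not part of ADF.

```python
def linked_service_name(service_type: str, account_name: str) -> str:
    """Compose a name as ls_<service type>_<storage account name>."""
    # Normalise both parts to lowercase, matching the names used in this README.
    parts = ["ls", service_type.strip().lower(), account_name.strip().lower()]
    if not all(parts):
        raise ValueError("service type and account name must be non-empty")
    return "_".join(parts)


print(linked_service_name("blob", "b21blobstorage2"))
print(linked_service_name("adlsgen2", "b21blobgen2storage"))
```

Both calls reproduce the linked service names used in this pipeline.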
A Dataset represents the data structure (file format, storage path) used in the pipeline.
- Source: ADLS Gen2
- File Format: CSV
- Dataset Name: `ds_adlsgen2_b21blobgen2storage_source`
- Path: Container / Directory Path
- Target: Blob Storage
- File Format: CSV
- Dataset Name: `ds_blob_b21blobstorage2_emp_tgt`

Naming convention: `ds_blob_b21blobstorage2_emp_tgt` → `[Dataset]_[Service Type]_[Service Name]_[Container Name]`

- `ds` → Dataset
- `blob` → Blob Storage
- `b21blobstorage2` → Storage account name
- `emp_tgt` → Container name
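
Because the last segment of a dataset name can itself contain an underscore (`emp_tgt`), splitting a name back into its parts needs a bounded split. The sketch below is illustrative only: `DatasetName` and `parse_dataset_name` are hypothetical helpers, and it assumes the service type and storage account segments contain no underscores.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    prefix: str
    service_type: str
    account_name: str
    container: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split ds_<service type>_<account>_<container> into its four parts."""
    # Split at most three times so underscores in the final segment are kept.
    parts = name.split("_", 3)
    if len(parts) != 4 or parts[0] != "ds":
        raise ValueError(f"not a dataset name: {name!r}")
    return DatasetName(*parts)


parsed = parse_dataset_name("ds_blob_b21blobstorage2_emp_tgt")
print(parsed.service_type, parsed.account_name, parsed.container)
```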
- Pipeline Name: `pl_adlsgen2_blob`
- Purpose: Read data from ADLS Gen2 and write to Blob Storage
- Activity Used: Copy Data Activity
  - Source: ADLS Gen2 Dataset (`ds_adlsgen2_b21blobgen2storage_source`)
  - Destination: Blob Storage Dataset (`ds_blob_b21blobstorage2_emp_tgt`)
  - Using List of Files Option: The Copy Activity's "List of Files" option points to the `source_list` file, which lists the files to load into the target. The source container also holds files other than the intended source files, so this option ensures that only the listed files are copied.
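
To make the "List of Files" setting concrete, here is a rough Python sketch that builds a Copy activity definition and prints it as JSON. It reflects my understanding of the ADF Copy activity schema (`fileListPath` under the source `storeSettings`), so compare it against the exported JSON in the `Pipeline` folder before relying on it. The activity name and the `FILE_LIST_PATH` placeholder are assumptions; the dataset names come from this README.

```python
import json

# Placeholder, not a real path: replace with the location of source_list.txt.
FILE_LIST_PATH = "<container>/<folder>/source_list.txt"

copy_activity = {
    "name": "Copy ADLS Gen2 to Blob",  # hypothetical activity name
    "type": "Copy",
    "inputs": [
        {"referenceName": "ds_adlsgen2_b21blobgen2storage_source",
         "type": "DatasetReference"}
    ],
    "outputs": [
        {"referenceName": "ds_blob_b21blobstorage2_emp_tgt",
         "type": "DatasetReference"}
    ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",
                # Text file with one file path per line; only these are copied.
                "fileListPath": FILE_LIST_PATH,
            },
        },
        "sink": {
            "type": "DelimitedTextSink",
            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
        },
    },
}

print(json.dumps(copy_activity, indent=2))
```

Each line of the list file is expected to be a file path relative to the path configured in the source dataset.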
- Debug Mode: Test the pipeline before deployment
- Trigger Now: Immediate execution of the pipeline
- ADF triggers the pipeline
- Reads files from ADLS Gen2
- Copies data to Blob Storage
- Completion status logged in ADF monitoring
📂 ADF-ADLSGen2-Blob-Pipeline
├── 📂 Pipeline # JSON export files of ADF pipeline
├── 📂 Images # Flow diagrams, linked services, and results
├── 📂 Datasets # JSON export files of ADF datasets
├── source_list.txt # File containing the list of source data files for processing
└── README.md ← (This file)
🔗 Author: Naveen Madala
📧 Contact: madalanaveen9@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/madalanaveen