Data Developer & Analytics Engineer | Specialist in Snowflake, Databricks, and Modern Data Stack Orchestration.
I build reliable data systems that bridge the gap between messy raw data and actionable business intelligence. My focus is on the Modern Data Stack and implementing Medallion Architecture patterns.
| Category | Tools |
|---|---|
| Languages | Python, SQL, Bash |
| Data Orchestration | Apache Airflow, Astronomer Cosmos, dbt (Core/Cloud) |
| Data Warehousing | Databricks, Snowflake, Postgres, MySQL |
| Big Data | Apache Spark, PySpark, Pandas |
| Infrastructure | Docker, MinIO, S3, GitHub |
| Visualization | Lightdash, Looker |
🇧🇷 Olist: Brazilian E-Commerce Analytics Pipeline

🎯 The objective of this project is to analyze the logistics performance and customer satisfaction of Olist, a major Brazilian e-commerce marketplace.
- Data Architecture & Tech Stack: The project follows Medallion Architecture principles, using a modular flow built with dbt Cloud and Snowflake.
- Key Business Metrics: The primary KPI for operational excellence is the "Perfect Order" rate. A Perfect Order is one that meets three strict criteria: Status, Logistics, and Satisfaction.
- Features & Engineering Highlights: Macro-driven cleaning, customer identity resolution, and historical tracking.
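As a minimal sketch of the Perfect Order logic above: the field names, the "delivered" status value, and the assumption that a 5/5 review counts as satisfied are all illustrative, not the project's actual schema.

```python
# Illustrative sketch of the "Perfect Order" KPI: an order is perfect
# only if all three criteria hold. Field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Order:
    status: str               # e.g. "delivered"
    delivered_on_time: bool   # delivered_date <= estimated_date
    review_score: int         # 1-5 customer satisfaction rating

def is_perfect_order(order: Order) -> bool:
    """All three criteria must hold: Status, Logistics, Satisfaction."""
    return (
        order.status == "delivered"
        and order.delivered_on_time
        and order.review_score == 5
    )

def perfect_order_rate(orders: list[Order]) -> float:
    """Share of orders that qualify as perfect."""
    return sum(is_perfect_order(o) for o in orders) / len(orders)
```

In the project itself this logic lives in dbt models rather than Python; the sketch only shows how the three criteria combine into one rate.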
A containerized ELT pipeline transforming raw TPC-H data into business-ready Marts.
- Orchestration: Leveraged Astronomer Cosmos to render dbt models as native Airflow tasks for granular monitoring.
- Architecture: Implemented a three-layer structure (Staging, Intermediate, Marts) with dynamic schema generation via custom dbt macros.
- Data Governance: Automated 15+ data quality tests and persisted technical metadata (primary keys, descriptions) directly to Snowflake.
- Observability: Integrated custom failure callbacks for real-time alerting.
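The observability bullet can be sketched as a minimal Airflow-style failure callback. The `task_instance` and `exception` context keys follow Airflow's convention, but the `send_alert` transport is a hypothetical stand-in, not the project's actual code.

```python
# Minimal sketch of an on_failure_callback for real-time alerting.
# Airflow invokes the callback with a context dict when a task fails.

def format_failure_message(context: dict) -> str:
    """Build a human-readable alert from the task context."""
    ti = context.get("task_instance")
    exc = context.get("exception")
    return f"Task {getattr(ti, 'task_id', 'unknown')} failed: {exc}"

def on_failure_callback(context: dict, send_alert=print) -> None:
    """Registered on a task/DAG; pushes the alert out on failure."""
    send_alert(format_failure_message(context))
```

Keeping the message formatting separate from the delivery mechanism makes the callback easy to unit-test without a running Airflow instance.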
⛽️ UK Fuel Prices Data Pipeline
A production-grade pipeline designed to track and model fuel prices across the UK.
- Ingestion: Pulls data from multiple retailer APIs and processes it with Apache Spark.
- Orchestration: Fully managed via Apache Airflow with custom retry policies and failure alerts.
- Modeling: Implements SCD Type 2 (Slowly Changing Dimensions) in dbt to maintain an accurate price history.
- Testing: Automated data quality and referential integrity checks.
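The SCD Type 2 pattern above is normally handled by dbt snapshots; as a plain-Python illustration of the mechanics, this toy version closes the current record and opens a new one whenever a tracked price changes. The record fields (`station_id`, `valid_from`, `valid_to`) are hypothetical.

```python
# Sketch of SCD Type 2: when a tracked attribute changes, close the
# current record (set valid_to) and append a new open-ended version,
# preserving full history instead of overwriting in place.
from datetime import date

def apply_scd2(history: list[dict], station_id: str, new_price: float,
               as_of: date) -> list[dict]:
    """Return history with the change applied (input is not mutated)."""
    history = [dict(row) for row in history]
    current = next(
        (r for r in history
         if r["station_id"] == station_id and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["price"] == new_price:
            return history  # no change, no new version
        current["valid_to"] = as_of  # close the old record
    history.append({
        "station_id": station_id,
        "price": new_price,
        "valid_from": as_of,
        "valid_to": None,  # open-ended = current record
    })
    return history
```

A `valid_to` of `None` marks the current row, so "price as of date D" is answered by filtering on the `valid_from`/`valid_to` interval.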
- 🎓 In Progress: Databricks Certified Associate Developer for Apache Spark.
- 💭 Researching: Query optimization and data automation.
- 📖 Reading:

