This project builds a production-style analytics mart from raw transactional retail data using dbt.
Using the UCI Online Retail dataset (2010–2011 UK e-commerce transactions), I transform raw invoice line items into a clean, tested star schema and generate revenue reporting models with built-in data quality enforcement.
This project demonstrates how modern Analytics Engineers:
- Separate raw ingestion from transformation logic
- Define clear model grain and data contracts
- Implement automated data quality checks
- Build documented, reproducible analytics pipelines
- Design warehouse-portable dbt projects (DuckDB → Snowflake)
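As a sketch of that portability, a single `profiles.yml` can carry both a local DuckDB target and a Snowflake target; switching platforms is then a `--target` flag rather than a code change. The Snowflake values below are placeholders, not real credentials, and the role/database/warehouse names are assumptions for illustration:

```yaml
retail_analytics_dbt:
  target: dev            # switch with: dbt build --target prod
  outputs:
    dev:
      type: duckdb
      path: "retail.duckdb"
    prod:
      type: snowflake
      account: "<your-account>"   # placeholder
      user: "<your-user>"         # placeholder
      role: "TRANSFORMER"         # assumed role name
      database: "ANALYTICS"       # assumed database
      warehouse: "TRANSFORMING"   # assumed warehouse
      schema: "marts"
      threads: 4
```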
Retail transactional data often contains:
- Returns and cancellations
- Negative quantities
- Inconsistent identifiers
- Revenue miscalculations due to unhandled edge cases
Without proper modeling and validation, downstream dashboards and financial reporting can be incorrect.
This project solves that by:
- Standardizing raw data in a staging layer
- Modeling clear fact and dimension tables
- Explicitly handling cancellation logic
- Enforcing integrity via dbt tests
- Producing trustworthy revenue reporting outputs
Raw Layer (Seed)
- raw.online_retail
- Grain: one row per invoice line item
Staging Layer
- stg_online_retail
- Typed and standardized fields
- Derived fields:
  - invoice_date
  - line_revenue
  - is_cancellation (business rule logic)
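A minimal sketch of how the staging model might derive these fields, assuming the raw UCI column names (`InvoiceNo`, `StockCode`, `Quantity`, `UnitPrice`, `InvoiceDate`, `CustomerID`) and a seed named `online_retail` — the actual model may differ:

```sql
-- models/staging/stg_online_retail.sql (illustrative sketch)
select
    InvoiceNo                          as invoice_no,
    StockCode                          as stock_code,
    CustomerID                         as customer_id,
    cast(InvoiceDate as timestamp)     as invoice_date,
    cast(Quantity as integer)          as quantity,
    cast(UnitPrice as decimal(10, 2))  as unit_price,
    cast(Quantity as integer)
      * cast(UnitPrice as decimal(10, 2)) as line_revenue,
    -- business rule: invoice numbers beginning with 'C' mark cancellations/returns
    InvoiceNo like 'C%'                as is_cancellation
from {{ ref('online_retail') }}
```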
Mart Layer (Star Schema)
Dimensions
- dim_customers — one row per customer
- dim_products — one row per product (StockCode)
Fact
- fct_order_items — one row per invoice line item
- Revenue cleaned and business-validated
Reporting
- rpt_daily_revenue — aggregated daily revenue, excluding cancellation rows
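The reporting model can be sketched as a simple aggregate over the fact table with cancellations filtered out; column names here follow the staging sketch above and are assumptions, not the project's exact code:

```sql
-- models/reporting/rpt_daily_revenue.sql (illustrative sketch)
select
    cast(invoice_date as date) as revenue_date,
    sum(line_revenue)          as total_revenue,
    count(*)                   as order_line_count
from {{ ref('fct_order_items') }}
where not is_cancellation      -- returns excluded from reported revenue
group by 1
order by 1
```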
- dbt Core
- DuckDB (local analytics warehouse)
- Snowflake-ready configuration
- SQL
- YAML (schema + test configuration)
- Git / GitHub
Dataset Source: UCI ML Repository – Online Retail
```bash
git clone <your-repo-url>
cd retail_analytics_dbt
```

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install dbt-core dbt-duckdb
```

## 3️⃣ Configure dbt profile

Create `~/.dbt/profiles.yml`:

```yaml
retail_analytics_dbt:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "retail.duckdb"
```

Verify connection:

```bash
dbt debug
```

```bash
dbt seed
dbt build
```

```bash
dbt docs generate
dbt docs serve
```

This project treats data tests as contracts, not afterthoughts.
Generic Tests
- unique and not_null on dimension keys
- relationships tests between fact and dimensions
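In dbt these generic tests are declared in YAML alongside the models. The sketch below assumes `customer_id` as the dimension key; the project's actual schema file may name columns differently:

```yaml
# models/marts/schema.yml (illustrative sketch)
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests: [unique, not_null]
  - name: fct_order_items
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```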
Business Logic Test
Custom SQL test:
- Ensures non-cancellation revenue is never negative
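A dbt singular test fails when its query returns any rows, so the rule can be expressed as "select the violations." A minimal sketch, assuming the `is_cancellation` and `line_revenue` columns from the staging layer:

```sql
-- tests/assert_non_cancellation_revenue_not_negative.sql (illustrative sketch)
-- Returns rows only when the contract is violated; dbt fails the test on any result.
select *
from {{ ref('fct_order_items') }}
where not is_cancellation
  and line_revenue < 0
```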
Cancellation Handling Logic
The dataset contains returns that appear as:
- Invoice numbers beginning with “C”
- Negative quantities
- Negative revenue patterns
The staging layer introduces a derived is_cancellation flag to ensure:
- Revenue reporting excludes returns
- Downstream aggregates are financially accurate
✔ Built a fully reproducible analytics mart from raw retail data
✔ Enforced primary and foreign key integrity
✔ Identified and corrected dataset-specific revenue edge cases
✔ Designed project for seamless warehouse portability
✔ Full dataset (~540k rows) builds locally in under 1 second
This pipeline ensures revenue metrics are reliable before reaching BI dashboards or executive reporting.
- Clearly defining model grain prevents metric inflation
- Data tests should reflect business logic, not just schema constraints
- Cancellation logic must be explicitly modeled in retail datasets
- Warehouse abstraction in dbt enables seamless platform migration
- Lineage documentation improves transparency and trust
This is not a dashboard exercise.
It demonstrates:
- Production-style analytics engineering practices
- Version-controlled transformation logic
- Data contract enforcement
- Business-aware modeling
- Reproducible local development
- A clean migration path to an enterprise warehouse
Anju Mercian — Platform Engineer | Data Platform | Modern Data Stack