Retail Revenue Analytics Mart (dbt + DuckDB)

📌 Overview

This project builds a production-style analytics mart from raw transactional retail data using dbt.

Using the UCI Online Retail dataset (2010–2011 UK e-commerce transactions), I transform raw invoice line items into a clean, tested star schema and generate revenue reporting models with built-in data quality enforcement.

This project demonstrates how modern Analytics Engineers:

  • Separate raw ingestion from transformation logic
  • Define clear model grain and data contracts
  • Implement automated data quality checks
  • Build documented, reproducible analytics pipelines
  • Design warehouse-portable dbt projects (DuckDB → Snowflake)

🎯 Business Context

Retail transactional data often contains:

  • Returns and cancellations
  • Negative quantities
  • Inconsistent identifiers
  • Revenue miscalculations due to unhandled edge cases

Without proper modeling and validation, downstream dashboards and financial reporting can be incorrect.

This project solves that by:

  • Standardizing raw data in a staging layer
  • Modeling clear fact and dimension tables
  • Explicitly handling cancellation logic
  • Enforcing integrity via dbt tests
  • Producing trustworthy revenue reporting outputs

🏗 Data Model

Raw Layer (Seed)

  • raw.online_retail
  • Grain: one row per invoice line item

Staging Layer

  • stg_online_retail
  • Typed and standardized fields
  • Derived fields:
    • invoice_date
    • line_revenue
    • is_cancellation (business rule logic)
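A sketch of how the staging model might derive these fields (the model name and raw column names are assumptions based on the UCI dataset, not verbatim project code):

```sql
-- models/staging/stg_online_retail.sql (illustrative sketch; names assumed)
with source as (
    select * from {{ ref('online_retail') }}  -- the seeded raw table
)

select
    cast("InvoiceNo"   as varchar)         as invoice_id,
    cast("StockCode"   as varchar)         as product_id,
    cast("CustomerID"  as varchar)         as customer_id,
    cast("InvoiceDate" as timestamp)       as invoice_date,
    cast("Quantity"    as integer)         as quantity,
    cast("UnitPrice"   as decimal(10, 2))  as unit_price,
    cast("Quantity"    as integer)
      * cast("UnitPrice" as decimal(10, 2)) as line_revenue,
    -- business rule: cancelled invoices are prefixed with "C" in the source data
    case
        when cast("InvoiceNo" as varchar) like 'C%' then true
        else false
    end as is_cancellation
from source
```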

Mart Layer (Star Schema)

Dimensions

  • dim_customers — one row per customer
  • dim_products — one row per product (StockCode)
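For illustration, a dimension at this grain could be built by deduplicating the staging model (the description column is an assumption from the dataset, and the SQL is a sketch rather than the project's actual model):

```sql
-- models/marts/dim_products.sql (illustrative sketch)
select
    product_id,
    max(description) as product_description  -- collapse to one row per StockCode
from {{ ref('stg_online_retail') }}
where product_id is not null
group by product_id
```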

Fact

  • fct_order_items — one row per invoice line item
    • Revenue cleaned and business-validated

Reporting

  • rpt_daily_revenue — aggregated daily revenue, excluding cancellation rows
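The reporting model could then be a straightforward aggregate over the fact table, filtered on the cancellation flag (a sketch, assuming the column names introduced in the staging layer):

```sql
-- models/marts/rpt_daily_revenue.sql (illustrative sketch)
select
    cast(invoice_date as date)  as revenue_date,
    count(distinct invoice_id)  as order_count,
    sum(line_revenue)           as daily_revenue
from {{ ref('fct_order_items') }}
where not is_cancellation
group by 1
order by 1
```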

🛠 Tech Stack

  • dbt Core
  • DuckDB (local analytics warehouse)
  • Snowflake-ready configuration
  • SQL
  • YAML (schema + test configuration)
  • Git / GitHub

Dataset Source: UCI ML Repository – Online Retail

🚀 How to Run Locally

1️⃣ Clone repository

git clone <your-repo-url>
cd retail_analytics_dbt

2️⃣ Create virtual environment

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install dbt-core dbt-duckdb

3️⃣ Configure dbt profile

Create ~/.dbt/profiles.yml:

retail_analytics_dbt:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "retail.duckdb"
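Because dbt resolves the warehouse through the profile, the Snowflake migration path amounts to adding a second target to the same profile; everything under `prod:` below is a placeholder, not real configuration from this project:

```yaml
retail_analytics_dbt:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "retail.duckdb"
    prod:                          # hypothetical Snowflake target
      type: snowflake
      account: "<your-account>"
      user: "<your-user>"
      password: "<your-password>"
      database: "ANALYTICS"
      warehouse: "TRANSFORMING"
      schema: "RETAIL"
```

Running `dbt build --target prod` would then execute the same models against Snowflake.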

4️⃣ Verify the connection

dbt debug

5️⃣ Build the full pipeline

dbt seed
dbt build

6️⃣ View documentation & lineage

dbt docs generate
dbt docs serve

✅ Data Quality & Testing Strategy

This project treats data tests as contracts, not afterthoughts.

Generic Tests

  • unique and not_null on dimension keys
  • relationships tests between fact and dimensions
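In dbt, these generic tests live in a schema YAML file; a sketch of what the relevant entries might look like (model and column names assumed):

```yaml
# models/marts/schema.yml (illustrative sketch)
version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: fct_order_items
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```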

Business Logic Test

Custom SQL test:

  • Ensures non-cancellation revenue is never negative
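A singular dbt test expresses this rule as a query that returns the violating rows; dbt fails the test if any rows come back. A sketch, with the file name and columns assumed:

```sql
-- tests/assert_non_cancellation_revenue_not_negative.sql (illustrative sketch)
select *
from {{ ref('fct_order_items') }}
where not is_cancellation
  and line_revenue < 0
```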

Cancellation Handling Logic

The dataset contains returns that appear as:

  • Invoice numbers beginning with “C”
  • Negative quantities
  • Negative revenue patterns

The staging layer introduces a derived is_cancellation flag to ensure:

  • Revenue reporting excludes returns
  • Downstream aggregates are financially accurate

📊 Results & Impact

✔ Built a fully reproducible analytics mart from raw retail data
✔ Enforced primary and foreign key integrity
✔ Identified and corrected dataset-specific revenue edge cases
✔ Designed project for seamless warehouse portability
✔ Full dataset (~540k rows) builds locally in under 1 second

This pipeline ensures revenue metrics are reliable before reaching BI dashboards or executive reporting.

🧠 Key Learnings

  • Clearly defining model grain prevents metric inflation
  • Data tests should reflect business logic, not just schema constraints
  • Cancellation logic must be explicitly modeled in retail datasets
  • Warehouse abstraction in dbt enables seamless platform migration
  • Lineage documentation improves transparency and trust

📌 Why This Project Matters

This is not a dashboard exercise.

It demonstrates:

  • Production-style analytics engineering practices
  • Version-controlled transformation logic
  • Data contract enforcement
  • Business-aware modeling
  • Reproducible local development
  • Clean migration path to an enterprise warehouse

👤 Author

Anju Mercian
Platform Engineer | Data Platform | Modern Data Stack
