Build core dbt data pipeline with staging and marts models #2

whitehackr · 2025-08-21T13:49:06Z

Overview

Establish the core data pipeline for Flit's e-commerce analytics platform. The pipeline transforms the TheLook dataset and synthetic experiment assignments into analytics-ready models that support experimentation analytics and ML workloads.

Changes Made

Project Foundation

Add dbt_project.yml with staging/intermediate/marts layer configuration
Configure BigQuery connection and dataset targeting (flit_staging, flit_marts)
Set up organized folder structure for scalable model development
Add .gitkeep files to maintain directory structure

Data Models

Staging Layer: Clean, standardized data models
- stg_customers - Customer profiles with demographic and behavioral attributes
- stg_orders - Order transactions with derived business metrics
- stg_products - Product catalog with category hierarchies
- stg_experiment_assignments - Experiment variant assignments from synthetic data

Core Marts: Business-ready analytics tables
- dim_customers - Customer 360° with lifetime metrics and segmentation
- fct_orders - Order fact table with enriched transaction data
- customer_ltv_base - Lifetime value foundations and cohort analysis
- product_performance - Product analytics for business intelligence

Experimentation Foundation

Experiment-Ready Models: All customer and order models include experiment context
Statistical Readiness: Models designed for A/B testing analysis (user-level aggregations, pre/post experiment windows)
Variant Tracking: Clean joining of experiment assignments with behavioral outcomes
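
As a rough illustration of the user-level, pre/post-window aggregation described above, here is a minimal Python sketch. The row shapes, column names (`user_id`, `variant`, `assigned_on`, `order_date`, `revenue`), and the `user_level_windows` helper are all hypothetical; the actual dbt models in this PR may structure this differently.

```python
from datetime import date

# Hypothetical sample rows; these column names are assumptions, not the PR's schema.
assignments = [
    {"user_id": 1, "variant": "control", "assigned_on": date(2025, 1, 10)},
    {"user_id": 2, "variant": "treatment", "assigned_on": date(2025, 1, 10)},
]
orders = [
    {"user_id": 1, "order_date": date(2025, 1, 5), "revenue": 20.0},
    {"user_id": 1, "order_date": date(2025, 1, 12), "revenue": 30.0},
    {"user_id": 2, "order_date": date(2025, 1, 15), "revenue": 50.0},
    {"user_id": 3, "order_date": date(2025, 1, 15), "revenue": 99.0},  # not assigned
]


def user_level_windows(assignments, orders):
    """Return one row per assigned user with pre- and post-assignment revenue."""
    by_user = {a["user_id"]: a for a in assignments}
    totals = {
        uid: {"variant": a["variant"], "pre_revenue": 0.0, "post_revenue": 0.0}
        for uid, a in by_user.items()
    }
    for o in orders:
        a = by_user.get(o["user_id"])
        if a is None:
            continue  # skip orders from users outside the experiment
        # Orders strictly before the assignment date count as "pre".
        key = "pre_revenue" if o["order_date"] < a["assigned_on"] else "post_revenue"
        totals[o["user_id"]][key] += o["revenue"]
    return totals


result = user_level_windows(assignments, orders)
```

Keeping one row per assigned user, including users with no orders, means zero-revenue users stay in the denominator when variants are compared.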

Data Quality & Testing

Source freshness tests for TheLook datasets
Uniqueness and referential integrity tests
Business logic validation tests
Documentation for all models and columns
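
The uniqueness and referential integrity checks listed above can be sketched in plain Python to show what they assert. In dbt these would typically be declared as the built-in `unique` and `relationships` generic tests, but this PR page does not show the test definitions, so the table and column names below are assumptions.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return key values that appear more than once (uniqueness check)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphaned_keys(child_rows, child_key, parent_rows, parent_key):
    """Return child key values with no matching parent row (referential integrity check)."""
    parents = {row[parent_key] for row in parent_rows}
    return sorted({row[child_key] for row in child_rows} - parents)


# Hypothetical sample data with one duplicate customer and one orphaned order.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 3}]

dupes = duplicate_keys(customers, "customer_id")
orphans = orphaned_keys(orders, "customer_id", customers, "customer_id")
```

A passing test corresponds to both functions returning an empty list.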

Architecture Design

Models are designed to support downstream analytics:

Experiment-aware customer segmentation for statistical analysis
Behavioral aggregations optimized for A/B testing calculations
Temporal features for ML model training (tenure, recency, frequency)
Normalized business metrics for consistent reporting across use cases
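
To make the temporal features concrete, the following Python sketch computes tenure, recency, and frequency for a single customer. The `temporal_features` function, its parameters, and the day-based units are illustrative assumptions; the PR does not specify how the mart models define these features.

```python
from datetime import date


def temporal_features(order_dates, signup_date, as_of):
    """Compute tenure and recency (in days) and order frequency as of a cutoff date."""
    # Ignore orders after the cutoff so features do not leak future information.
    past = [d for d in order_dates if d <= as_of]
    return {
        "tenure_days": (as_of - signup_date).days,
        "recency_days": (as_of - max(past)).days if past else None,
        "frequency": len(past),
    }


features = temporal_features(
    order_dates=[date(2025, 1, 5), date(2025, 3, 1), date(2025, 9, 1)],
    signup_date=date(2025, 1, 1),
    as_of=date(2025, 3, 31),
)
```

Computing features relative to an explicit `as_of` date, rather than the current date, keeps training rows reproducible for churn and LTV models.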

Testing & Validation

All models run successfully against BigQuery
Data quality tests pass with zero failures
Staging models correctly reference TheLook public datasets
Experiment assignments properly join with customer behavior
Marts models deliver accurate business metrics
Schema documentation generated for all models

Business Impact

Enables data-driven decision making through:

Customer 360° analytics - Complete customer journey visibility
Product performance insights - Revenue and engagement metrics
Business KPI tracking - Standardized metrics across teams
Experiment foundation - Statistical analysis-ready datasets
ML pipeline preparation - Feature-rich customer behavioral data

🔗 Related Work

This PR establishes the data foundation for advanced analytics capabilities:

Issue #[TBD]: Comprehensive A/B testing analytics framework (includes basic statistical analysis, CUPED, sequential testing, and interactive dashboards)
Issue #[TBD]: ML feature engineering for churn and LTV prediction models

Deployment Notes

Models deploy to flit_staging and flit_marts datasets
Incremental refresh strategy configured for large fact tables
Cost optimization: partitioned tables for time-series data
Experiment assignment data refreshed weekly via automated pipeline

Initialize dbt project structure for Flit data platform

ea52458

- Add dbt_project.yml with staging/marts configuration
- Set up folder structure for data transformations
- Configure BigQuery connection and datasets
- Add .gitkeep files to maintain empty directories

whitehackr self-assigned this Aug 21, 2025

Kevin Mugweru added 2 commits September 8, 2025 11:17

Update branch with latest updates

c10531a

Merge branch 'main' of https://github.com/whitehackr/flit-data-platform…

9038f61

… into data/dbt-models
