@whitehackr (Owner) commented Aug 21, 2025

Overview

Establish the core data pipeline for Flit's e-commerce analytics platform. The pipeline transforms the TheLook dataset and synthetic experiment assignments into analytics-ready models that support experimentation analytics and ML workloads.

Changes Made

Project Foundation

  • Add dbt_project.yml with staging/intermediate/marts layer configuration
  • Configure BigQuery connection and dataset targeting (flit_staging, flit_marts)
  • Set up organized folder structure for scalable model development
  • Add .gitkeep files to maintain directory structure

Data Models

  • Staging Layer: Clean, standardized data models

    • stg_customers - Customer profiles with demographic and behavioral attributes
    • stg_orders - Order transactions with derived business metrics
    • stg_products - Product catalog with category hierarchies
    • stg_experiment_assignments - Experiment variant assignments from synthetic data
  • Core Marts: Business-ready analytics tables

    • dim_customers - Customer 360° with lifetime metrics and segmentation
    • fct_orders - Order fact table with enriched transaction data
    • customer_ltv_base - Lifetime value foundations and cohort analysis
    • product_performance - Product analytics for business intelligence

Experimentation Foundation

  • Experiment-Ready Models: All customer and order models include experiment context
  • Statistical Readiness: Models designed for A/B testing analysis (user-level aggregations, pre/post experiment windows)
  • Variant Tracking: Clean joining of experiment assignments with behavioral outcomes
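To illustrate what "user-level aggregations" with "pre/post experiment windows" can look like, here is a minimal Python sketch of the logic. It is not the SQL in this PR: the field names (`user_id`, `variant`, `assigned_on`, `ordered_on`, `revenue`) and the sample rows are hypothetical, and it assumes one assignment per user.

```python
from datetime import date

# Hypothetical sample rows; field names are illustrative, not the project's schema.
assignments = [
    {"user_id": 1, "variant": "control", "assigned_on": date(2025, 3, 1)},
    {"user_id": 2, "variant": "treatment", "assigned_on": date(2025, 3, 1)},
]
orders = [
    {"user_id": 1, "ordered_on": date(2025, 2, 20), "revenue": 40.0},
    {"user_id": 1, "ordered_on": date(2025, 3, 5), "revenue": 25.0},
    {"user_id": 2, "ordered_on": date(2025, 3, 2), "revenue": 60.0},
    {"user_id": 2, "ordered_on": date(2025, 3, 9), "revenue": 15.0},
    {"user_id": 3, "ordered_on": date(2025, 3, 3), "revenue": 99.0},  # never assigned
]


def user_level_outcomes(assignments, orders):
    """Return one row per assigned user with pre- and post-assignment revenue."""
    rows = {
        a["user_id"]: {
            "variant": a["variant"],
            "assigned_on": a["assigned_on"],
            "pre_revenue": 0.0,
            "post_revenue": 0.0,
        }
        for a in assignments
    }
    for o in orders:
        row = rows.get(o["user_id"])
        if row is None:
            continue  # orders from users outside the experiment are ignored
        # Orders on or after the assignment date count toward the post window.
        window = "post_revenue" if o["ordered_on"] >= row["assigned_on"] else "pre_revenue"
        row[window] += o["revenue"]
    return rows


outcomes = user_level_outcomes(assignments, orders)
```

Keeping one row per user, with the pre-period metric next to the post-period one, is the shape that variance-reduction methods such as CUPED typically expect as input.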

Data Quality & Testing

  • Source freshness tests for TheLook datasets
  • Uniqueness and referential integrity tests
  • Business logic validation tests
  • Documentation for all models and columns
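The uniqueness and referential integrity checks listed above reduce to two simple questions: does any key appear more than once, and does any child row point at a parent that does not exist? The sketch below shows that logic in plain Python. It mirrors the idea behind dbt's `unique` and `relationships` generic tests but is not the project's test configuration; the helper names and sample rows are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that appear more than once (uniqueness check)."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphaned_keys(child_rows, fk, parent_rows, pk):
    """Return foreign-key values with no matching parent row (referential integrity check)."""
    parent_keys = {r[pk] for r in parent_rows}
    return sorted({r[fk] for r in child_rows} - parent_keys)


# Hypothetical sample data containing one duplicate customer and one orphaned order.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]

dupes = duplicate_keys(customers, "customer_id")
orphans = orphaned_keys(orders, "customer_id", customers, "customer_id")
```

A passing test corresponds to both lists being empty; here each check flags one offending key.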

Architecture Design

Models are strategically designed to support downstream analytics:

  • Experiment-aware customer segmentation for statistical analysis
  • Behavioral aggregations optimized for A/B testing calculations
  • Temporal features for ML model training (tenure, recency, frequency)
  • Normalized business metrics for consistent reporting across use cases
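As a rough illustration of the temporal features mentioned above (tenure, recency, frequency), the sketch below computes them for a single customer as of a snapshot date. The function name, the parameters, and the choice to exclude orders after the snapshot date are assumptions made for the example; the PR does not specify the exact definitions used in the models.

```python
from datetime import date


def temporal_features(order_dates, signup_date, as_of):
    """Compute tenure, recency, and frequency for one customer as of a snapshot date."""
    # Only orders on or before the snapshot count, so features never leak future data.
    past = [d for d in order_dates if d <= as_of]
    return {
        "tenure_days": (as_of - signup_date).days,
        "recency_days": (as_of - max(past)).days if past else None,
        "frequency": len(past),
    }


# Hypothetical customer: signed up Jan 1, with one order after the snapshot date.
features = temporal_features(
    order_dates=[date(2025, 1, 10), date(2025, 2, 14), date(2025, 9, 1)],
    signup_date=date(2025, 1, 1),
    as_of=date(2025, 3, 1),
)
```

Anchoring every feature to an explicit `as_of` date matters for ML training: it keeps the features consistent with the label window and avoids leaking post-snapshot behavior into the training set.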

Testing & Validation

  • All models run successfully against BigQuery
  • Data quality tests pass with zero failures
  • Staging models correctly reference TheLook public datasets
  • Experiment assignments properly join with customer behavior
  • Marts models deliver accurate business metrics
  • Schema documentation generated for all models

Business Impact

Enables data-driven decision making through:

  • Customer 360° analytics - Complete customer journey visibility
  • Product performance insights - Revenue and engagement metrics
  • Business KPI tracking - Standardized metrics across teams
  • Experiment foundation - Statistical analysis-ready datasets
  • ML pipeline preparation - Feature-rich customer behavioral data

🔗 Related Work

This PR establishes the data foundation for advanced analytics capabilities:

  • Issue #[TBD]: Comprehensive A/B testing analytics framework (includes basic statistical analysis, CUPED, sequential testing, and interactive dashboards)
  • Issue #[TBD]: ML feature engineering for churn and LTV prediction models

Deployment Notes

  • Models deploy to flit_staging and flit_marts datasets
  • Incremental refresh strategy configured for large fact tables
  • Cost optimization: partitioned tables for time-series data
  • Experiment assignment data refreshed weekly via automated pipeline
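The PR does not describe how the incremental refresh is implemented. One common pattern is a high-water mark: each run loads only source rows newer than the latest timestamp already in the target. The Python sketch below shows that pattern together with day-level partitioning; the `created_at` field and the function names are hypothetical, and this is an illustration of the idea rather than the configured strategy.

```python
from collections import defaultdict
from datetime import datetime


def incremental_load(target, source):
    """Append only source rows newer than the latest row already in target."""
    high_water = max((r["created_at"] for r in target), default=None)
    new_rows = [
        r for r in source
        if high_water is None or r["created_at"] > high_water
    ]
    target.extend(new_rows)
    return len(new_rows)


def partition_by_day(rows):
    """Group rows by calendar day, analogous to a date-partitioned table."""
    parts = defaultdict(list)
    for r in rows:
        parts[r["created_at"].date()].append(r)
    return dict(parts)


# Hypothetical data: the target already holds the first source row.
source = [
    {"order_id": 1, "created_at": datetime(2025, 8, 20, 9, 0)},
    {"order_id": 2, "created_at": datetime(2025, 8, 20, 17, 30)},
    {"order_id": 3, "created_at": datetime(2025, 8, 21, 8, 15)},
]
target = [source[0]]

first_run = incremental_load(target, source)   # loads the two newer rows
second_run = incremental_load(target, source)  # nothing new to load
partitions = partition_by_day(target)
```

A strict `>` comparison can miss late-arriving rows that share the high-water timestamp, so real pipelines often reprocess a short lookback window and deduplicate on a unique key.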

- Add dbt_project.yml with staging/marts configuration
- Set up folder structure for data transformations
- Configure BigQuery connection and datasets
- Add .gitkeep files to maintain empty directories
@whitehackr whitehackr self-assigned this Aug 21, 2025