Skip to content

πŸ›οΈ GO Sales Data Warehouse Project A dbt-powered datawarehouse project built on the IBM GO Sales sample dataset using πŸ¦† DuckDB and 🐍 Python.. It models the data through a layered design architecture - raw, staging, detail and mart layers.

Notifications You must be signed in to change notification settings

manz01/dbt-core-sample-duckdb

Repository files navigation

GOSALES CI - Flake8 & Pylint GOSALES CI - SonarCloud
SQL dbt DuckDB Python

NOTE: βœ… CI/CD Integration: This repository now includes static code analysis via Pylint and quality gate validation via SonarCloud..

GO Sales Logo

Go Sales πŸ¦† DuckDB dbt Sample Project

This repository contains a sample dbt project that demonstrates how to model and transform the GO Sales IBM sample data using dbt (data build tool) with DuckDB as the database engine.

Document Control

Version Date Author Description of Change
1.0 2025-05-18 Manzar Ahmed Initial Version
1.1 2025-06-12 Manzar Ahmed Added DET and MRT model sections
1.2 2025-06-22 Manzar Ahmed Added section with dbt docs
1.3 2025-06-23 Manzar Ahmed Added section High level design
1.4 2025-07-23 Manzar Ahmed Added section Low level design

Table of Content


NOTE: This sample project utlizes the GO Sales IBM sample data to demonstrate dbt modeling techniques. It is designed to be run with DuckDB as the database engine, but can be adapted for other engines like Snowflake, BigQuery, or Redshift with minor modifications to the dbt profiles and SQL syntax. The GO Sales dataset is a fictional retail dataset that simulates sales operations for a global retailer, and available under the MIT License.

1. Background

The GO Sales IBM sample data is a fictional retail dataset designed to demonstrate business analytics, reporting, and data warehousing techniques. It simulates sales operations for a global retailer and contains various interconnected tables that model business domains. A copy of the entity relationship diagram is provided below for reference.

GO Sales Entity Relationship Diagram

2. High Level Design

The dbt-core project follows a layered design architecture that systematically structures data transformations through a series of increasingly refined stages. This layered approach promotes modularity, reusability, and transparency in the data pipeline.

High Level Design

Layer Breakdown:

  1. Raw Layer (raw)

    • This layer ingests raw data directly from the MySQL DB instance.
    • It performs minimal transformation (if any), mainly focused on standardizing data types and storing source extracts as-is.
  2. Staging Layer (stg)

    • This layer acts as a clean-up zone where raw data is normalized, renamed, and prepared for further transformation.
    • Typical operations include renaming columns to snake_case, handling nulls, and deduplicating rows.
  3. Detailed Layer (det)

    • This is the business logic layer, where transformations are applied to derive meaningful metrics and dimensions.
    • It includes joins, surrogate key generation, Slowly Changing Dimensions (SCD), and other enrichment logic.
    • The detailed layer will build a star schema for the go sales data
 +------------------+  +--------------+
 |t_dim_order_method|  |t_dim_products|
 +------------------+  +--------------+                 
         \              /
          \            /
           +-----------+
           |t_fct_sales|
           +-----------+
           /          \
          /            \
   +-----------+    +---------------+
   |t_dim_dates|    |t_dim_retailers|
   +-----------+    +---------------+
  1. Mart Layer (mrt)
    • This final layer presents the data in a business-consumable format.
    • It aggregates and filters data for reporting, dashboards, and analytics use cases.

Each layer feeds into the next, ensuring that transformations are traceable and logically separated.

3. Run dbt Models

This section outlines how to set up your environment and run different layers of the GO Sales dbt models using convenient shell commands.

Create Aliases & Global Vars

First, set the required environment variables to specify the paths for the dbt project and profile directory. Also, set the PYTHONPATH so that any custom Python modules within your project can be properly resolved.

export DBT_PROJ_DIR='/home/u0001/dbt-core-sample-duckdb'
export DBT_PROFILE_DIR='/home/u0001/dbt-core-sample-duckdb'
export PYTHONPATH=$DBT_PROJ_DIR

Create dbt run go sales alias shorthand

Define a shell alias to simplify running the dbt project with the correct profile and target. This avoids repeating long command strings every time you want to run a model.

alias dbt_run_go_sales='dbt run --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_sales'

3.1. Raw Models

Run all models tagged with GO_SALES_RAW. These models typically ingest and prepare raw data, often performing minimal transformations.

dbt_run_go_sales --select tag:GO_SALES_RAW

3.2. Staging Models

Run staging layer models tagged with GO_SALES_STG. These models clean and standardize raw data into a more analysis-ready format.

dbt_run_go_sales --select tag:GO_SALES_STG

3.3. Detailed Models (DET)

Run detailed transformation models tagged with GO_SALES_DET. These models perform more complex business logic and enrichment tasks.

dbt_run_go_sales --select tag:GO_SALES_DET

3.4. Mart Models (MRT)

Run mart layer models tagged with GO_SALES_MRT. These are the final outputs optimized for reporting and analytics.

dbt_run_go_sales --select tag:GO_SALES_MRT

4. Visualise Lineage with dbt Docs

dbt provides an interactive lineage graph that visually represents how models are built from raw data through staging, transformation, and into marts. This helps developers, analysts, and stakeholders understand data dependencies and relationships.

To generate and view the lineage diagram:

Step 1: Generate dbt docs

dbt docs generate --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_sales

Step 2: Serve dbt docs

dbt docs serve --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_sales

This will start a local web server and open a browser where you can explore:

  • Model-level documentation
  • Column-level metadata
  • Tags and descriptions
  • The DAG (Directed Acyclic Graph) lineage diagram

The diagram includes paths from:

  • Raw sources (e.g., t_raw_go_daily_sales)
  • Through staging models (e.g., t_stg_go_daily_sales)
  • Into dimensional tables (e.g., t_dim_products)
  • Finally into fact and mart tables (e.g., t_fct_sales β†’ t_mrt_sales)

The following diagram provides a visual representation of the dbt model lineage for the GO Sales project, illustrating how raw data flows through staging, dimension, fact, and mart layers:

5. Low-Level Design (LLD)

5.1.1. Models - raw layer

# Object Name Object Type Description
1 t_raw_go_1k.py Python File Python script for GO 1k data
2 t_raw_go_1k.yml YAML File Metadata/config for GO 1k
3 t_raw_go_daily_sales.py Python File Python script for daily sales data
4 t_raw_go_daily_sales.yml YAML File Metadata/config for daily sales
5 t_raw_go_methods.py Python File Python script for GO methods
6 t_raw_go_methods.yml YAML File Metadata/config for GO methods
7 t_raw_go_products.py Python File Python script for GO products
8 t_raw_go_products.yml YAML File Metadata/config for GO products
9 t_raw_go_retailers.py Python File Python script for GO retailers
10 t_raw_go_retailers.yml YAML File Metadata/config for GO retailers

5.1.2. Models - stg layer

# Object Name Object Type Description
1 t_dim_dates.sql SQL File Staging logic for date dimension
2 t_dim_dates.yml YAML File Metadata/config for date dimension
3 t_dim_order_methods.sql SQL File Staging logic for order methods
4 t_dim_order_methods.yml YAML File Metadata/config for order methods
5 t_dim_products.sql SQL File Staging logic for products
6 t_dim_products.yml YAML File Metadata/config for products
7 t_dim_retailers.sql SQL File Staging logic for retailers
8 t_dim_retailers.yml YAML File Metadata/config for retailers
9 t_fct_sales.sql SQL File Staging logic for sales fact table
10 t_fct_sales.yml YAML File Metadata/config for sales fact table

5.1.3. Models - det layer

# Object Name Object Type Description
1 t_dim_dates.sql SQL File Detail-layer model for date dimension
2 t_dim_dates.yml YAML File Metadata/config for date dimension
3 t_dim_order_methods.sql SQL File Detail-layer model for order methods
4 t_dim_order_methods.yml YAML File Metadata/config for order methods
5 t_dim_products.sql SQL File Detail-layer model for products
6 t_dim_products.yml YAML File Metadata/config for products
7 t_dim_retailers.sql SQL File Detail-layer model for retailers
8 t_dim_retailers.yml YAML File Metadata/config for retailers
9 t_fct_sales.sql SQL File Detail-layer model for sales fact table
10 t_fct_sales.yml YAML File Metadata/config for sales fact table

5.1.4. Models - mrt layer

# Object Name Object Type Description
1 t_mrt_sales.sql SQL File Final mart model for sales
2 t_mrt_sales.yml YAML File Metadata/config for mart sales

5.1.5. Macros

# Object Name Object Type Description
1 custom_schema.sql SQL (Jinja) File Macro to dynamically assign custom schemas based on environment or config
2 scd2_ts.sql SQL (Jinja) File Macro to implement SCD Type 2 logic with timestamp-based tracking

5.1.6. Python Utils

# Object Name Object Type Description
1 __init__.py Python File Marks the directory as a Python package
2 config.py Python File Contains shared configuration values and helpers
3 db_utils.py Python File Utility functions for database access and queries

About

πŸ›οΈ GO Sales Data Warehouse Project A dbt-powered datawarehouse project built on the IBM GO Sales sample dataset using πŸ¦† DuckDB and 🐍 Python.. It models the data through a layered design architecture - raw, staging, detail and mart layers.

Topics

Resources

Security policy

Stars

Watchers

Forks

Packages

No packages published

Languages