NOTE: β CI/CD Integration: This repository now includes static code analysis via Pylint and quality gate validation via SonarCloud..
This repository contains a sample dbt project that demonstrates how to model and transform the GO Sales IBM sample data using dbt (data build tool) with DuckDB as the database engine.
| Version | Date | Author | Description of Change |
|---|---|---|---|
| 1.0 | 2025-05-18 | Manzar Ahmed | Initial Version |
| 1.1 | 2025-06-12 | Manzar Ahmed | Added DET and MRT model sections |
| 1.2 | 2025-06-22 | Manzar Ahmed | Added section with dbt docs |
| 1.3 | 2025-06-23 | Manzar Ahmed | Added section High level design |
| 1.4 | 2025-07-23 | Manzar Ahmed | Added section Low level design |
- Background
- High Level Design
- Run dbt Models
3.1. Raw Models
3.2. Staging Models
3.3. Detailed Models (DET)
3.4. Mart Models (MRT) - Visualise Lineage with dbt Docs
- Low-Level Design (LLD)
5.1.1. Models - raw layer
5.1.2. Models - stg layer
5.1.3. Models - det layer
5.1.4. Models - mrt layer
5.1.5. Macros
5.1.6. Python Utils
NOTE: This sample project utlizes the GO Sales IBM sample data to demonstrate dbt modeling techniques. It is designed to be run with DuckDB as the database engine, but can be adapted for other engines like Snowflake, BigQuery, or Redshift with minor modifications to the dbt profiles and SQL syntax. The GO Sales dataset is a fictional retail dataset that simulates sales operations for a global retailer, and available under the MIT License.
The GO Sales IBM sample data is a fictional retail dataset designed to demonstrate business analytics, reporting, and data warehousing techniques. It simulates sales operations for a global retailer and contains various interconnected tables that model business domains. A copy of the entity relationship diagram is provided below for reference.
The dbt-core project follows a layered design architecture that systematically structures data transformations through a series of increasingly refined stages. This layered approach promotes modularity, reusability, and transparency in the data pipeline.
-
Raw Layer (
raw)- This layer ingests raw data directly from the MySQL DB instance.
- It performs minimal transformation (if any), mainly focused on standardizing data types and storing source extracts as-is.
-
Staging Layer (
stg)- This layer acts as a clean-up zone where raw data is normalized, renamed, and prepared for further transformation.
- Typical operations include renaming columns to snake_case, handling nulls, and deduplicating rows.
-
Detailed Layer (
det)- This is the business logic layer, where transformations are applied to derive meaningful metrics and dimensions.
- It includes joins, surrogate key generation, Slowly Changing Dimensions (SCD), and other enrichment logic.
- The detailed layer will build a star schema for the go sales data
+------------------+ +--------------+
|t_dim_order_method| |t_dim_products|
+------------------+ +--------------+
\ /
\ /
+-----------+
|t_fct_sales|
+-----------+
/ \
/ \
+-----------+ +---------------+
|t_dim_dates| |t_dim_retailers|
+-----------+ +---------------+
- Mart Layer (
mrt)- This final layer presents the data in a business-consumable format.
- It aggregates and filters data for reporting, dashboards, and analytics use cases.
Each layer feeds into the next, ensuring that transformations are traceable and logically separated.
This section outlines how to set up your environment and run different layers of the GO Sales dbt models using convenient shell commands.
Create Aliases & Global Vars
First, set the required environment variables to specify the paths for the dbt project and profile directory. Also, set the PYTHONPATH so that any custom Python modules within your project can be properly resolved.
export DBT_PROJ_DIR='/home/u0001/dbt-core-sample-duckdb'
export DBT_PROFILE_DIR='/home/u0001/dbt-core-sample-duckdb'
export PYTHONPATH=$DBT_PROJ_DIRCreate dbt run go sales alias shorthand
Define a shell alias to simplify running the dbt project with the correct profile and target. This avoids repeating long command strings every time you want to run a model.
alias dbt_run_go_sales='dbt run --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_sales'Run all models tagged with GO_SALES_RAW. These models typically ingest and prepare raw data, often performing minimal transformations.
dbt_run_go_sales --select tag:GO_SALES_RAWRun staging layer models tagged with GO_SALES_STG. These models clean and standardize raw data into a more analysis-ready format.
dbt_run_go_sales --select tag:GO_SALES_STGRun detailed transformation models tagged with GO_SALES_DET. These models perform more complex business logic and enrichment tasks.
dbt_run_go_sales --select tag:GO_SALES_DETRun mart layer models tagged with GO_SALES_MRT. These are the final outputs optimized for reporting and analytics.
dbt_run_go_sales --select tag:GO_SALES_MRTdbt provides an interactive lineage graph that visually represents how models are built from raw data through staging, transformation, and into marts. This helps developers, analysts, and stakeholders understand data dependencies and relationships.
To generate and view the lineage diagram:
Step 1: Generate dbt docs
dbt docs generate --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_salesStep 2: Serve dbt docs
dbt docs serve --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_salesThis will start a local web server and open a browser where you can explore:
- Model-level documentation
- Column-level metadata
- Tags and descriptions
- The DAG (Directed Acyclic Graph) lineage diagram
The diagram includes paths from:
- Raw sources (e.g., t_raw_go_daily_sales)
- Through staging models (e.g., t_stg_go_daily_sales)
- Into dimensional tables (e.g., t_dim_products)
- Finally into fact and mart tables (e.g., t_fct_sales β t_mrt_sales)
The following diagram provides a visual representation of the dbt model lineage for the GO Sales project, illustrating how raw data flows through staging, dimension, fact, and mart layers:
| # | Object Name | Object Type | Description |
|---|---|---|---|
| 1 | t_raw_go_1k.py | Python File | Python script for GO 1k data |
| 2 | t_raw_go_1k.yml | YAML File | Metadata/config for GO 1k |
| 3 | t_raw_go_daily_sales.py | Python File | Python script for daily sales data |
| 4 | t_raw_go_daily_sales.yml | YAML File | Metadata/config for daily sales |
| 5 | t_raw_go_methods.py | Python File | Python script for GO methods |
| 6 | t_raw_go_methods.yml | YAML File | Metadata/config for GO methods |
| 7 | t_raw_go_products.py | Python File | Python script for GO products |
| 8 | t_raw_go_products.yml | YAML File | Metadata/config for GO products |
| 9 | t_raw_go_retailers.py | Python File | Python script for GO retailers |
| 10 | t_raw_go_retailers.yml | YAML File | Metadata/config for GO retailers |
| # | Object Name | Object Type | Description |
|---|---|---|---|
| 1 | t_dim_dates.sql | SQL File | Staging logic for date dimension |
| 2 | t_dim_dates.yml | YAML File | Metadata/config for date dimension |
| 3 | t_dim_order_methods.sql | SQL File | Staging logic for order methods |
| 4 | t_dim_order_methods.yml | YAML File | Metadata/config for order methods |
| 5 | t_dim_products.sql | SQL File | Staging logic for products |
| 6 | t_dim_products.yml | YAML File | Metadata/config for products |
| 7 | t_dim_retailers.sql | SQL File | Staging logic for retailers |
| 8 | t_dim_retailers.yml | YAML File | Metadata/config for retailers |
| 9 | t_fct_sales.sql | SQL File | Staging logic for sales fact table |
| 10 | t_fct_sales.yml | YAML File | Metadata/config for sales fact table |
| # | Object Name | Object Type | Description |
|---|---|---|---|
| 1 | t_dim_dates.sql | SQL File | Detail-layer model for date dimension |
| 2 | t_dim_dates.yml | YAML File | Metadata/config for date dimension |
| 3 | t_dim_order_methods.sql | SQL File | Detail-layer model for order methods |
| 4 | t_dim_order_methods.yml | YAML File | Metadata/config for order methods |
| 5 | t_dim_products.sql | SQL File | Detail-layer model for products |
| 6 | t_dim_products.yml | YAML File | Metadata/config for products |
| 7 | t_dim_retailers.sql | SQL File | Detail-layer model for retailers |
| 8 | t_dim_retailers.yml | YAML File | Metadata/config for retailers |
| 9 | t_fct_sales.sql | SQL File | Detail-layer model for sales fact table |
| 10 | t_fct_sales.yml | YAML File | Metadata/config for sales fact table |
| # | Object Name | Object Type | Description |
|---|---|---|---|
| 1 | t_mrt_sales.sql | SQL File | Final mart model for sales |
| 2 | t_mrt_sales.yml | YAML File | Metadata/config for mart sales |
| # | Object Name | Object Type | Description |
|---|---|---|---|
| 1 | custom_schema.sql | SQL (Jinja) File | Macro to dynamically assign custom schemas based on environment or config |
| 2 | scd2_ts.sql | SQL (Jinja) File | Macro to implement SCD Type 2 logic with timestamp-based tracking |
| # | Object Name | Object Type | Description |
|---|---|---|---|
| 1 | __init__.py |
Python File | Marks the directory as a Python package |
| 2 | config.py |
Python File | Contains shared configuration values and helpers |
| 3 | db_utils.py |
Python File | Utility functions for database access and queries |



