This repository contains a sample dbt project that demonstrates how to model and transform the GO Sales IBM sample data using dbt (data build tool) with ❄️ Snowflake as the database engine.
| Version | Date | Author | Description of Change |
|---|---|---|---|
| 1.0 | 2025-07-18 | Manzar Ahmed | Initial Version |
NOTE: This sample project utilises the GO Sales IBM sample data to demonstrate dbt modeling techniques with Snowflake. The dbt models have been built so that they can be easily adapted for other engines such as Databricks, BigQuery, or Redshift with minor modifications to the dbt profiles and SQL syntax. The GO Sales dataset is a fictional retail dataset that simulates sales operations for a global retailer and is available under the MIT License.
- Background
- High Level Design
- Load from S3 to Snowflake
- Run dbt Models
4.1. Local Development Using dbt Core
4.2. Model Execution by Layer
4.2.1. Raw Models
4.2.2. Staging Models
4.2.3. Detail Models
4.2.4. Mart Models
- Using GitHub and Visualising Lineage with dbt Docs
5.1. Push to GitHub
5.2. Sync with dbt Cloud
5.3. Lineage Views Across Platforms
The GO Sales IBM sample data is a fictional retail dataset designed to demonstrate business analytics, reporting, and data warehousing techniques. It simulates sales operations for a global retailer and contains various interconnected tables that model business domains. This project leverages dbt to transform the data and uses ❄️ Snowflake as the target data warehouse to house the final data solution.
A copy of the GO Sales entity relationship diagram is provided below for reference.
This dbt_sample_snowflake project leverages dbt (data build tool) to transform the data and uses ❄️ Snowflake as the target cloud data warehouse for the final data solution. It re-uses the dbt models originally developed for DuckDB, adapting them through environment-specific configuration to point to a Snowflake endpoint. The original DuckDB-based project can be found here: https://github.com/manz01/dbt-core-sample-duckdb.
Rather than accessing the original MySQL database, this implementation sources raw CSV files from an S3 bucket, aligning with cloud-native, serverless architecture best practices and integrating with Snowflake’s external stage capabilities.
Case Sensitivity Note
Snowflake treats unquoted identifiers as case-insensitive and, by default, stores unquoted column and table names in uppercase; quoted identifiers, by contrast, are case-sensitive. To ensure cross-platform compatibility and avoid quoting issues, all column names and table identifiers in this project are standardised to uppercase.
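To illustrate this behaviour, the sketch below uses a throwaway table (a hypothetical example, not part of the project) to show how Snowflake resolves unquoted versus quoted identifiers:

```sql
-- Illustrative only: a temporary table to show identifier folding.
CREATE OR REPLACE TEMPORARY TABLE demo_case (product_line VARCHAR);

-- Unquoted identifiers are folded to uppercase and resolved case-insensitively,
-- so both of these refer to the same column.
SELECT product_line FROM demo_case;
SELECT PRODUCT_LINE FROM DEMO_CASE;

-- Quoted identifiers are case-sensitive. The column above is stored as
-- PRODUCT_LINE, so the lowercase quoted form would fail with "invalid identifier".
-- SELECT "product_line" FROM demo_case;
SELECT "PRODUCT_LINE" FROM demo_case;
```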
The dbt-core project follows a layered design architecture that structures data transformations through a series of refined stages. This layered approach promotes modularity and reusability, and is a common best-practice approach for building scalable data pipelines.
- Raw Layer (RAW)
  - This layer ingests raw data directly from an AWS S3 bucket.
  - It performs minimal transformation (if any), mainly focused on standardizing data types and storing source extracts as-is.
- Staging Layer (STG)
  - This layer acts as a clean-up zone where raw data is normalized, renamed, and prepared for further transformation.
  - Typical operations include renaming columns to snake_case, handling nulls, and deduplicating rows.
- Detailed Layer (DET)
  - This is the business logic layer, where transformations are applied to derive meaningful metrics and dimensions.
  - It includes joins, surrogate key generation, Slowly Changing Dimensions (SCD), and other enrichment logic (a sketch of the surrogate-key lookup follows the diagram below).
  - The detailed layer builds a star schema for the GO Sales data:
+------------------+ +--------------+
|T_DIM_ORDER_METHOD| |T_DIM_PRODUCTS|
+------------------+ +--------------+
\ /
\ /
+-----------+
|T_FCT_SALES|
+-----------+
/ \
/ \
+-----------+ +---------------+
|T_DIM_DATES| |T_DIM_RETAILERS|
+-----------+ +---------------+
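To make the surrogate-key lookup concrete, the sketch below shows the general shape of a detail-layer fact model. It is a simplified, hypothetical example; the model and column names are assumed for illustration and the actual models in this repository may differ (for example, in how SCD records are matched).

```sql
-- Hypothetical simplified sketch of a detail-layer fact model (dbt SQL).
-- Model and column names are illustrative only.
SELECT
    d.DATE_KEY,
    r.RETAILER_KEY,
    p.PRODUCT_KEY,
    m.ORDER_METHOD_KEY,
    s.QUANTITY,
    s.UNIT_PRICE,
    s.QUANTITY * s.UNIT_PRICE AS SALES_AMOUNT
FROM {{ ref('stg_go_daily_sales') }} s
LEFT JOIN {{ ref('t_dim_dates') }}        d ON s.ORDER_DATE        = d.ORDER_DATE
LEFT JOIN {{ ref('t_dim_retailers') }}    r ON s.RETAILER_CODE     = r.RETAILER_CODE
LEFT JOIN {{ ref('t_dim_products') }}     p ON s.PRODUCT_NUMBER    = p.PRODUCT_NUMBER
LEFT JOIN {{ ref('t_dim_order_method') }} m ON s.ORDER_METHOD_CODE = m.ORDER_METHOD_CODE
```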
- Mart Layer (MRT)

The mart layer builds business-ready, denormalised tables designed for reporting, dashboarding, and analytical consumption (an illustrative model sketch follows the list below). These models typically:
- Combine multiple dimension and fact tables into a single wide table
- Pre-join and flatten hierarchical relationships
- Include calculated metrics and business-friendly labels
- Are optimised for ease of use by analysts and BI tools (e.g., Tableau, Power BI)
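The sketch below shows what such a wide mart model can look like. As with the fact-model sketch above, the model and column names are assumed for illustration rather than taken from the repository:

```sql
-- Hypothetical mart-layer model: one wide, denormalised sales table.
-- Model and column names are illustrative only.
SELECT
    d.CALENDAR_DATE,
    r.RETAILER_NAME,
    r.COUNTRY,
    p.PRODUCT_LINE,
    p.PRODUCT,
    m.ORDER_METHOD,
    f.QUANTITY,
    f.SALES_AMOUNT
FROM {{ ref('t_fct_sales') }}        f
JOIN {{ ref('t_dim_dates') }}        d ON f.DATE_KEY         = d.DATE_KEY
JOIN {{ ref('t_dim_retailers') }}    r ON f.RETAILER_KEY     = r.RETAILER_KEY
JOIN {{ ref('t_dim_products') }}     p ON f.PRODUCT_KEY      = p.PRODUCT_KEY
JOIN {{ ref('t_dim_order_method') }} m ON f.ORDER_METHOD_KEY = m.ORDER_METHOD_KEY
```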
This section outlines how to securely load raw data files from an AWS S3 bucket into Snowflake using STAGE and STORAGE INTEGRATION objects.
To allow Snowflake to access your S3 bucket, follow these steps:
1. Create an IAM Role in AWS named snowflake-access-role.
2. Attach a trust relationship policy that allows Snowflake to assume the role. Replace <account_locator> and the external ID with your Snowflake account details (both values are returned by DESC INTEGRATION, shown later):
A trust policy (also called an assume role policy) defines which entities (principals) are allowed to assume an IAM role. In the context of Snowflake, this is essential for enabling Snowflake to access AWS resources (e.g., S3 buckets) via an IAM Role.

Purpose of the trust policy:
- It does not grant permissions to access AWS resources directly.
- Instead, it controls who is allowed to assume the role.
- For a Snowflake storage integration, Snowflake assumes the IAM role via a temporary security credential using this trust relationship.

Where to find or edit the trust policy:
- Log in to the AWS Console: https://console.aws.amazon.com/iam
- Navigate to IAM > Roles
- Click on the specific IAM Role (e.g., snowflake-access-role)
- Go to the “Trust relationships” tab
- Click "Edit trust policy" to review or update the trust configuration.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::724772052480:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "YO19174_SFCRole=4_fMO0LOpe1w+L+dSzEeUZW6dWcuM="
}
}
}
]
}

An example AWS trust policy is shown above.
3. Attach an S3 bucket policy allowing that role to access the bucket:
To view or edit the permissions policy for an IAM Role (e.g., the role used in a Snowflake storage integration), follow these steps:
- Sign in to AWS Console
- Visit: https://console.aws.amazon.com/iam
- Navigate to IAM
- In the AWS Management Console, type IAM in the search bar and click on IAM under "Security, Identity, & Compliance".
- Open the Roles section
- In the IAM sidebar menu, click on Roles.
- Select your IAM Role
- Search for the role name you used in your Snowflake storage integration (e.g., snowflake-access-role).
- Click on the role name to open its details page.
- View the Permissions policies
- Under the Permissions tab, you’ll find:
  - Inline policies (specific to this role)
  - Managed policies (shared or AWS-provided)
- Inspect or Edit Policy
- Click on a policy name to view the JSON definition.
- If you want to attach a new policy, click “Add permissions” > “Attach policies” or create a custom one.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowSnowflakeAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:role/snowflake-access-role"
},
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::go-sales",
"arn:aws:s3:::go-sales/*"
]
}
]
}
The permissions policy can be found under IAM > Roles, as described above.
In Snowflake, create a STORAGE INTEGRATION object to connect to your S3 bucket:
CREATE OR REPLACE STORAGE INTEGRATION gosales_s3
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access-role'
STORAGE_ALLOWED_LOCATIONS = ('s3://go-sales/');

To verify the storage integration and retrieve the values needed for the trust policy:
DESC INTEGRATION gosales_s3;

This returns details like:
| property | property_value |
|---|---|
| ENABLED | true |
| STORAGE_PROVIDER | S3 |
| STORAGE_ALLOWED_LOCATIONS | s3://go-sales/ |
| STORAGE_AWS_IAM_USER_ARN | arn:aws:iam::724772052480:user/externalstages/cilen90000 |
| STORAGE_AWS_ROLE_ARN | arn:aws:iam::123456789012:role/snowflake-access-role |
| STORAGE_AWS_EXTERNAL_ID | YO19174_SFCRole=4_fMO0LOpe1w+L+dSzEeUZW6dWcuM= |
Use these values to update the AWS trust policy correctly.
Now link the S3 bucket to a named external stage:
USE DATABASE gos01;
CREATE OR REPLACE STAGE go_sales_stage
URL = 's3://go-sales/'
STORAGE_INTEGRATION = gosales_s3;

The following command verifies that Snowflake can see the files in your S3 bucket:
LIST @go_sales_stage;
Example output:
| name | size | md5 | last_modified |
|---|---|---|---|
| s3://go-sales/go_1k.csv | 25923 | 8a5042551bbd4cb4d5058511e059e5dc | Wed, 9 Jul 2025 19:17:06 GMT |
| s3://go-sales/go_daily_sales.csv | 6058951 | 960eebdea858c6e4f88ce90713f97cd7 | Wed, 9 Jul 2025 19:17:10 GMT |
| s3://go-sales/go_methods.csv | 143 | e1b0bbcb27e61d22c86db45e2aea75dd | Wed, 9 Jul 2025 19:17:10 GMT |
| s3://go-sales/go_products.csv | 20402 | 476c09b0e383bebe7cf861baedffe662 | Wed, 9 Jul 2025 19:17:10 GMT |
| s3://go-sales/go_retailers.csv | 26334 | 2270f9fa1c82d1a0c44e2cfa15577f58 | Wed, 9 Jul 2025 19:17:11 GMT |
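Beyond LIST, you can also preview the contents of a staged file directly from Snowflake before loading it. The sketch below is a hedged example: go_sales_csv is a hypothetical named file format introduced here for illustration, and positional references ($1, $2, ...) map to the CSV columns.

```sql
-- Hypothetical named file format used only for previewing staged CSVs.
CREATE OR REPLACE FILE FORMAT go_sales_csv
    TYPE = 'CSV'
    SKIP_HEADER = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- Preview the first rows of a staged file without loading it.
SELECT t.$1, t.$2, t.$3, t.$4
FROM @go_sales_stage/go_products.csv (FILE_FORMAT => 'go_sales_csv') t
LIMIT 10;
```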
The following dbt operations run macros that create and load Snowflake tables from the corresponding S3 files:
dbt run-operation load_raw_go_1k
dbt run-operation load_raw_go_methods
dbt run-operation load_raw_go_products
dbt run-operation load_raw_go_retailers
dbt run-operation load_raw_go_daily_sales

The macro load_csv_to_snowflake.sql below handles the loading; the example shown is for go_products:
{% macro load_csv_to_snowflake_and_create_table(table_name, file_name, columns, stage) %}
{% set full_table = target.database ~ '.' ~ target.schema ~ '.' ~ table_name %}
{% do run_query("CREATE OR REPLACE TABLE " ~ full_table ~ " (" ~ columns ~ ");") %}
{% do run_query("COPY INTO " ~ full_table ~ "
FROM @" ~ stage ~ "/" ~ file_name ~ "
FILE_FORMAT = (
TYPE = 'CSV',
FIELD_OPTIONALLY_ENCLOSED_BY = '\"',
SKIP_HEADER = 1,
DATE_FORMAT = 'DD/MM/YYYY',
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
)
ON_ERROR = 'CONTINUE';") %}
{{ return("Table created and loaded: " ~ full_table) }}
{% endmacro %}
{% macro load_raw_go_products() %}
{% set table_name = "T_RAW_GO_PRODUCTS" %}
{% set file_name = "go_products.csv" %}
{% set columns = """
PRODUCT_NUMBER BIGINT,
PRODUCT_LINE VARCHAR,
PRODUCT_TYPE VARCHAR,
PRODUCT VARCHAR,
PRODUCT_BRAND VARCHAR,
PRODUCT_COLOR VARCHAR,
UNIT_COST DOUBLE,
UNIT_PRICE DOUBLE
""" %}
{% set stage = "GOS01.RAW.GO_SALES_STAGE" %}
{{ load_csv_to_snowflake_and_create_table(table_name, file_name, columns, stage) }}
{% endmacro %}

The code that runs for the above products data is shown below:
CREATE OR REPLACE TABLE GOS01.RAW.T_RAW_GO_PRODUCTS (
PRODUCT_NUMBER BIGINT,
PRODUCT_LINE VARCHAR,
PRODUCT_TYPE VARCHAR,
PRODUCT VARCHAR,
PRODUCT_BRAND VARCHAR,
PRODUCT_COLOR VARCHAR,
UNIT_COST DOUBLE,
UNIT_PRICE DOUBLE
);
COPY INTO GOS01.RAW.T_RAW_GO_PRODUCTS
FROM @GOS01.RAW.GO_SALES_STAGE/go_products.csv
FILE_FORMAT = (
TYPE = 'CSV',
FIELD_OPTIONALLY_ENCLOSED_BY = '\"',
SKIP_HEADER = 1,
DATE_FORMAT = 'DD/MM/YYYY',
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
)
ON_ERROR = 'CONTINUE';

Explanation:
| Clause | Purpose |
|---|---|
| CREATE OR REPLACE TABLE | Creates the table structure for raw product data. |
| COPY INTO | Loads the data from the staged CSV in S3 into the Snowflake table. |
| @GOS01.RAW.GO_SALES_STAGE/go_products.csv | Refers to the staged file in the external stage created earlier. |
| FILE_FORMAT | Specifies CSV parsing options including delimiters, header rows, and date format. |
| ON_ERROR = 'CONTINUE' | Allows the load to continue even if some rows fail (useful during raw ingestion); a validation sketch follows below. |
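Because ON_ERROR = 'CONTINUE' can silently skip bad rows, it is worth checking the load afterwards. The queries below are a hedged sketch using a simple row count and Snowflake’s COPY_HISTORY table function; adjust the table name and time window as needed.

```sql
-- Quick sanity check on the loaded row count.
SELECT COUNT(*) AS row_count
FROM GOS01.RAW.T_RAW_GO_PRODUCTS;

-- Inspect recent COPY activity for the table, including rows loaded,
-- rows parsed, and the first error message (if any).
SELECT file_name, status, row_count, row_parsed, first_error_message
FROM TABLE(
    GOS01.INFORMATION_SCHEMA.COPY_HISTORY(
        TABLE_NAME => 'RAW.T_RAW_GO_PRODUCTS',
        START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())
    )
);
```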
This section outlines how to build and execute dbt models using dbt Core locally, how the models are committed to GitHub, and how they are synchronised with dbt Cloud for scheduled or ad-hoc runs via the web interface.
During development, dbt models are created and tested locally using VSCode and dbt Core. This allows developers to:
- Rapidly test and debug SQL models
- Use Git for version control
- Develop in the VSCode IDE, for developers who prefer working in VSCode
First, set the required environment variables to specify the paths for the dbt project and profile directory.
export DBT_PROJ_DIR='/home/u0001/dbt_sample_snowflake'
export DBT_PROFILE_DIR='/home/u0001/dbt_sample_snowflake'

Define a shell alias to simplify running the dbt project with the correct profile and target. This avoids repeating long command strings every time you want to run a model.
alias dbt_run_go_sales='dbt run --project-dir $DBT_PROJ_DIR --profiles-dir $DBT_PROFILE_DIR --target go_sales'

Use tags to run models grouped by their logical layer:
Load the source data from the S3 bucket using dbt run-operation, which invokes the corresponding dbt macro:
dbt run-operation load_raw_go_1k
dbt run-operation load_raw_go_methods
dbt run-operation load_raw_go_products
dbt run-operation load_raw_go_retailers
dbt run-operation load_raw_go_daily_sales

The staging models apply Snowflake naming conventions, data quality checks, and deduplication logic:
dbt_run_go_sales --select tag:GO_SALES_STG

The detail layer handles the transformation logic for Slowly Changing Dimensions (SCDs) and builds the fact table through surrogate-key lookups:
dbt_run_go_sales --select tag:GO_SALES_DET

The mart layer builds a business-ready transactional table by combining the fact table with the dimension tables:
dbt_run_go_sales --select tag:GO_SALES_MRT

Once tested locally:
- Commit and push changes to the GitHub repository.
- GitHub serves as the central source of truth for:
- dbt models
- Configuration files
- Documentation and macros
This enables version control, peer reviews, and change history tracking.
After pushing to GitHub:
- You must click “Pull” in the dbt Cloud IDE or deployment environment to retrieve the latest changes.
Once pulled, you can:
- Trigger runs from the dbt Cloud UI
- Schedule jobs using dbt Cloud Scheduler
- Monitor build logs, test results, and job history
- Generate and explore dbt Docs, including full lineage graphs
dbt provides powerful lineage tracking capabilities that allow you to visually trace the flow of data through the different layers of the data model.
Although the lineage metadata originates from the dbt project structure and compiled DAG, it can be viewed using several interfaces depending on your environment.
- Lineage can be explored through the dbt Cloud UI under the "Docs" tab or directly in dbt Studio.
- Highlights upstream dependencies and downstream impacts.
- Clicking a model reveals its SQL, metadata, and related nodes.
This is a fairly new feature in Snowflake. When models are built in Snowflake using dbt run, the underlying dependencies are registered in Snowflake’s metadata catalog.
- Available via the “Lineage” tab under table objects in the Snowsight interface.
- Reflects lineage based on actual execution, not just defined models.
- Useful for validating dbt's impact on Snowflake-native views and tables; a SQL sketch for querying this metadata follows below.
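If you prefer SQL to the Snowsight UI, dependency metadata can also be queried from the account usage views. The sketch below assumes access to the SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES view, which requires appropriate privileges and is populated with some latency; the database and object names are those used in this project.

```sql
-- List the objects that T_FCT_SALES depends on, according to Snowflake's
-- account usage metadata (may lag actual activity by a few hours).
SELECT
    referencing_object_name,
    referencing_object_domain,
    referenced_object_name,
    referenced_object_domain
FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES
WHERE referencing_database = 'GOS01'
  AND referencing_object_name = 'T_FCT_SALES';
```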
The VSCode extension for dbt provides an interactive lineage graph directly in your development environment.
- Allows you to click through models and explore the DAG locally.
- Provides contextual awareness during development.
- Syncs with dbt's internal manifest for accurate rendering.







