Skip to content

SQL database normalization project demonstrating Third Normal Form (3NF) principles with the Northwind dataset

License

Notifications You must be signed in to change notification settings

sasanamari/northwind-data-normalization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Northwind Database Normalization Project

A SQL project demonstrating database normalization from a denormalized CSV file to a properly structured relational database following Third Normal Form (3NF).

Project Overview

This project takes a flat Northwind dataset and normalizes it into separate, related tables to:

  • Eliminate data redundancy
  • Prevent update, insertion, and deletion anomalies
  • Improve data integrity through proper foreign key relationships
  • Optimize storage and query performance

Database Schema

The normalized database consists of 7 tables:

erDiagram
    customers ||--o{ orders : places
    employees ||--o{ orders : processes
    orders ||--o{ order_details : contains
    products ||--o{ order_details : "included in"
    categories ||--o{ products : categorizes
    suppliers ||--o{ products : supplies

    customers {
        varchar customerID PK
        varchar companyName
        varchar contactName
        varchar contactTitle
    }

    employees {
        int employeeID PK
        varchar lastName
        varchar firstName
        varchar title
    }

    orders {
        int orderID PK
        varchar customerID FK
        int employeeID FK
        date orderDate
        date requiredDate
        date shippedDate
        int shipVia
        decimal freight
    }

    order_details {
        int orderID PK,FK
        int productID PK,FK
        decimal unitPrice "price at time of order"
        int quantity
        decimal discount
    }

    products {
        int productID PK
        varchar productName
        varchar quantityPerUnit
        decimal unitPrice "current price"
        int unitsInStock
        int unitsOnOrder
        int reorderLevel
        smallint discontinued
        int categoryID FK
        int supplierID FK
    }

    categories {
        int categoryID PK
        varchar categoryName
    }

    suppliers {
        int supplierID PK
        varchar companyName
        varchar contactName
        varchar contactTitle
    }
Loading

Prerequisites

  • PostgreSQL (version 12 or higher recommended)
  • psql command-line tool

Setup Instructions

  1. Clone the repository

    git clone <repository-url>
    cd northwind-data-normalization
  2. Create a database and schema

    psql -U postgres
    CREATE DATABASE northwind_db;
    \c northwind_db
    CREATE SCHEMA northwind;
    \q
  3. Run the normalization script

    psql -U postgres -d northwind_db -f schema/create_tables.sql

    The script will:

    • Create the staging table TblNorthwind
    • Load data from the CSV file
    • Create normalized tables (categories, suppliers, customers, employees, products, orders, order_details)
    • Add primary keys and foreign keys
    • Create employee sales performance view
    • Display row counts for verification

Project Structure

northwind-data-normalization/
├── README.md                    # This file
├── .gitignore                   # Git ignore rules
├── data/
│   └── Northwind_utf8.csv      # Source data file
└── schema/
    ├── create_tables.sql        # Main normalization script
    ├── Northwind_schema.dbml    # Database schema (DBML format)
    └── Northwind_ERD.md         # Entity Relationship Diagram (Mermaid)

Key Features

Normalization Highlights

  • 1NF (First Normal Form): All columns contain atomic values
  • 2NF (Second Normal Form): No partial dependencies on composite keys
  • 3NF (Third Normal Form): No transitive dependencies between non-key columns

Junction Table

The order_details table serves as a junction table to resolve the many-to-many relationship between orders and products, storing:

  • Historical pricing (unitPrice at time of order)
  • Quantity and discount information
  • Composite primary key (orderID, productID)

Data Integrity

  • Primary keys ensure unique identification
  • Foreign keys maintain referential integrity
  • Proper data types with size constraints
  • Boolean conversion for discontinued status

Sample Queries and Results

Normalized Table Row Counts

After running the normalization script:

Table Name Row Count
categories 8
suppliers 29
customers 85
employees 9
products 77
orders 378
order_details 1000

Orders with Customer Information

SELECT o.orderID, c.companyName, o.orderDate, o.freight
FROM orders o
JOIN customers c ON o.customerID = c.customerID
ORDER BY o.orderDate DESC
LIMIT 5;

Sample Result:

orderID companyName orderDate freight
10625 Ana Trujillo Emparedados y helados 1997-08-08 43.90
10623 Frankenversand 1997-08-07 97.18
10624 The Cracker Box 1997-08-07 94.80
10622 Ricardo Adocicados 1997-08-06 50.97
10621 Island Trading 1997-08-05 23.73

Products by Category

SELECT c.categoryName, COUNT(p.productID) as product_count
FROM categories c
JOIN products p ON c.categoryID = p.categoryID
GROUP BY c.categoryName
ORDER BY product_count DESC;

Result:

categoryName product_count
Confections 13
Beverages 12
Condiments 12
Seafood 12
Dairy Products 10
Grains/Cereals 7
Meat/Poultry 6
Produce 5

Employee Sales Performance

The database includes an employee_record view for analyzing sales performance:

SELECT * FROM employee_record
ORDER BY revenue DESC
LIMIT 5;

Sample Result:

employeeID full_name title quantity_sold revenue
4 Margaret Peacock Sales Representative 5301 132910.29
1 Nancy Davolio Sales Representative 3741 92335.38
3 Janet Leverling Sales Representative 3421 86548.05
2 Andrew Fuller Vice President, Sales 2784 66519.00
7 Robert King Sales Representative 1881 59541.60

Technologies Used

  • PostgreSQL - Relational database management system
  • SQL - Data definition and manipulation
  • DBML - Database Markup Language for schema documentation
  • Mermaid - Diagram rendering in markdown

License

This project is for educational purposes.

About

SQL database normalization project demonstrating Third Normal Form (3NF) principles with the Northwind dataset

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published