A SQL project demonstrating database normalization from a denormalized CSV file to a properly structured relational database following Third Normal Form (3NF).
This project takes a flat Northwind dataset and normalizes it into separate, related tables to:
- Eliminate data redundancy
- Prevent update, insertion, and deletion anomalies
- Improve data integrity through proper foreign key relationships
- Optimize storage and query performance
The normalized database consists of 7 tables:
erDiagram
customers ||--o{ orders : places
employees ||--o{ orders : processes
orders ||--o{ order_details : contains
products ||--o{ order_details : "included in"
categories ||--o{ products : categorizes
suppliers ||--o{ products : supplies
customers {
varchar customerID PK
varchar companyName
varchar contactName
varchar contactTitle
}
employees {
int employeeID PK
varchar lastName
varchar firstName
varchar title
}
orders {
int orderID PK
varchar customerID FK
int employeeID FK
date orderDate
date requiredDate
date shippedDate
int shipVia
decimal freight
}
order_details {
int orderID PK,FK
int productID PK,FK
decimal unitPrice "price at time of order"
int quantity
decimal discount
}
products {
int productID PK
varchar productName
varchar quantityPerUnit
decimal unitPrice "current price"
int unitsInStock
int unitsOnOrder
int reorderLevel
smallint discontinued
int categoryID FK
int supplierID FK
}
categories {
int categoryID PK
varchar categoryName
}
suppliers {
int supplierID PK
varchar companyName
varchar contactName
varchar contactTitle
}
- PostgreSQL (version 12 or higher recommended)
psqlcommand-line tool
-
Clone the repository
git clone <repository-url> cd northwind-data-normalization
-
Create a database and schema
psql -U postgres
CREATE DATABASE northwind_db; \c northwind_db CREATE SCHEMA northwind; \q
-
Run the normalization script
psql -U postgres -d northwind_db -f schema/create_tables.sql
The script will:
- Create the staging table
TblNorthwind - Load data from the CSV file
- Create normalized tables (categories, suppliers, customers, employees, products, orders, order_details)
- Add primary keys and foreign keys
- Create employee sales performance view
- Display row counts for verification
- Create the staging table
northwind-data-normalization/
├── README.md # This file
├── .gitignore # Git ignore rules
├── data/
│ └── Northwind_utf8.csv # Source data file
└── schema/
├── create_tables.sql # Main normalization script
├── Northwind_schema.dbml # Database schema (DBML format)
└── Northwind_ERD.md # Entity Relationship Diagram (Mermaid)
- 1NF (First Normal Form): All columns contain atomic values
- 2NF (Second Normal Form): No partial dependencies on composite keys
- 3NF (Third Normal Form): No transitive dependencies between non-key columns
The order_details table serves as a junction table to resolve the many-to-many relationship between orders and products, storing:
- Historical pricing (
unitPriceat time of order) - Quantity and discount information
- Composite primary key
(orderID, productID)
- Primary keys ensure unique identification
- Foreign keys maintain referential integrity
- Proper data types with size constraints
- Boolean conversion for discontinued status
After running the normalization script:
| Table Name | Row Count |
|---|---|
| categories | 8 |
| suppliers | 29 |
| customers | 85 |
| employees | 9 |
| products | 77 |
| orders | 378 |
| order_details | 1000 |
SELECT o.orderID, c.companyName, o.orderDate, o.freight
FROM orders o
JOIN customers c ON o.customerID = c.customerID
ORDER BY o.orderDate DESC
LIMIT 5;Sample Result:
| orderID | companyName | orderDate | freight |
|---|---|---|---|
| 10625 | Ana Trujillo Emparedados y helados | 1997-08-08 | 43.90 |
| 10623 | Frankenversand | 1997-08-07 | 97.18 |
| 10624 | The Cracker Box | 1997-08-07 | 94.80 |
| 10622 | Ricardo Adocicados | 1997-08-06 | 50.97 |
| 10621 | Island Trading | 1997-08-05 | 23.73 |
SELECT c.categoryName, COUNT(p.productID) as product_count
FROM categories c
JOIN products p ON c.categoryID = p.categoryID
GROUP BY c.categoryName
ORDER BY product_count DESC;Result:
| categoryName | product_count |
|---|---|
| Confections | 13 |
| Beverages | 12 |
| Condiments | 12 |
| Seafood | 12 |
| Dairy Products | 10 |
| Grains/Cereals | 7 |
| Meat/Poultry | 6 |
| Produce | 5 |
The database includes an employee_record view for analyzing sales performance:
SELECT * FROM employee_record
ORDER BY revenue DESC
LIMIT 5;Sample Result:
| employeeID | full_name | title | quantity_sold | revenue |
|---|---|---|---|---|
| 4 | Margaret Peacock | Sales Representative | 5301 | 132910.29 |
| 1 | Nancy Davolio | Sales Representative | 3741 | 92335.38 |
| 3 | Janet Leverling | Sales Representative | 3421 | 86548.05 |
| 2 | Andrew Fuller | Vice President, Sales | 2784 | 66519.00 |
| 7 | Robert King | Sales Representative | 1881 | 59541.60 |
- PostgreSQL - Relational database management system
- SQL - Data definition and manipulation
- DBML - Database Markup Language for schema documentation
- Mermaid - Diagram rendering in markdown
This project is for educational purposes.