Author: ShuhanYang
Project Type: Database Normalization & Business Intelligence Dashboard
This project demonstrates comprehensive data analytics skills through database normalization and business intelligence visualization. Using a retail store dataset, the project transforms raw transactional data into a normalized database structure and creates insightful visualizations for business decision-making.
- Database Normalization: Transform denormalized retail data into a proper relational database structure
- Data Optimization: Achieve significant storage reduction through normalization
- Business Intelligence: Create meaningful visualizations for strategic business insights
- Performance Analysis: Implement efficient database design with proper indexing and foreign key relationships
The project uses a comprehensive retail store dataset containing:
- Order Information: Order IDs, dates, shipping details
- Customer Data: Customer demographics and segmentation
- Product Details: Categories, subcategories, product specifications
- Geographic Data: Regional distribution across US states and cities
- Financial Metrics: Sales, profit, discount, and quantity data
- Raw Data: Single denormalized table with 21 attributes
- Data Points: 209,874 in total (rows × attributes)
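The data-point figure is simply the number of rows multiplied by the number of attributes. The sketch below shows how pandas reports that count, using a tiny made-up frame rather than the project's data. As a side note, 209,874 divides evenly by 21 to give 9,994, which would be the implied row count; that figure is an inference from the two numbers above, not something stated in the project.

```python
import pandas as pd

# Toy stand-in for the raw table (hypothetical columns and values).
raw = pd.DataFrame({
    "order_id": ["A-1", "A-2", "A-3"],
    "sales": [10.0, 20.0, 30.0],
    "profit": [1.0, 4.0, 3.0],
})

rows, cols = raw.shape
data_points = raw.size  # equals rows * cols

# Inferred row count for the real dataset, assuming exactly 21 attributes.
implied_rows = 209_874 // 21
print(rows, cols, data_points, implied_rows)
```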
The database is normalized into 5 optimized tables:
- customers: Customer information and segmentation
- locations: Geographic data with unique location identifiers
- products: Product catalog with categories and subcategories
- orders: Order header information with foreign key relationships
- order_items: Order line items (junction table)
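One way the split into these five tables could be done with pandas is sketched below. The column names, the sample rows, the `unique_rows` helper, and the surrogate `location_id` are all assumptions made for illustration; the project's actual columns and method may differ.

```python
import pandas as pd

# Hypothetical denormalized rows; column names are illustrative assumptions.
raw = pd.DataFrame({
    "order_id":      ["O1", "O1", "O2"],
    "order_date":    ["2017-11-01", "2017-11-01", "2017-12-15"],
    "customer_id":   ["C1", "C1", "C2"],
    "customer_name": ["Ann", "Ann", "Bob"],
    "segment":       ["Consumer", "Consumer", "Home Office"],
    "city":          ["Los Angeles", "Los Angeles", "New York City"],
    "state":         ["California", "California", "New York"],
    "region":        ["West", "West", "East"],
    "product_id":    ["P1", "P2", "P1"],
    "category":      ["Technology", "Office Supplies", "Technology"],
    "sub_category":  ["Phones", "Paper", "Phones"],
    "sales":         [200.0, 15.0, 180.0],
    "quantity":      [2, 3, 1],
    "profit":        [40.0, 6.0, 30.0],
})


def unique_rows(df, cols):
    """Return the distinct combinations of `cols` with a fresh index."""
    return df[cols].drop_duplicates().reset_index(drop=True)


customers = unique_rows(raw, ["customer_id", "customer_name", "segment"])
products = unique_rows(raw, ["product_id", "category", "sub_category"])

# Locations have no natural key in this toy data, so assign a surrogate id.
locations = unique_rows(raw, ["city", "state", "region"])
locations.insert(0, "location_id", list(range(1, len(locations) + 1)))

# Attach location_id to every raw row, then split headers from line items.
keyed = raw.merge(locations, on=["city", "state", "region"], how="left")
orders = unique_rows(keyed, ["order_id", "order_date", "customer_id", "location_id"])
order_items = keyed[["order_id", "product_id", "sales", "quantity", "profit"]].copy()

print(len(customers), len(locations), len(products), len(orders), len(order_items))
```

Repeated customer, product, and location attributes are stored once each, which is where the storage reduction from normalization comes from.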
The project includes a comprehensive ER diagram showing:
- Primary and foreign key relationships
- Table dependencies and constraints
- Optimized data structure for query performance
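The ER diagram is not reproduced here, but the key relationships it describes can be sketched as table definitions. The example uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the column names, the composite key on `order_items`, and the index name are assumptions, and MySQL DDL would use different types and syntax in places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    segment     TEXT NOT NULL
);
CREATE TABLE locations (
    location_id INTEGER PRIMARY KEY,
    city TEXT, state TEXT, region TEXT
);
CREATE TABLE products (
    product_id TEXT PRIMARY KEY,
    category TEXT, sub_category TEXT
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    order_date  TEXT,
    customer_id TEXT NOT NULL REFERENCES customers(customer_id),
    location_id INTEGER NOT NULL REFERENCES locations(location_id)
);
-- Junction table: one row per (order, product) pair (an assumed key).
CREATE TABLE order_items (
    order_id   TEXT NOT NULL REFERENCES orders(order_id),
    product_id TEXT NOT NULL REFERENCES products(product_id),
    sales REAL, quantity INTEGER, profit REAL,
    PRIMARY KEY (order_id, product_id)
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

# An order that points at a missing customer is rejected by the constraint.
try:
    conn.execute("INSERT INTO orders VALUES ('O1', '2017-11-01', 'C404', 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(fk_enforced, table_names)
```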
- Technology Leadership: Technology category leads with $836K in sales
- Profit Margin Insights: Office Supplies shows more stable profitability
- Strategic Value: Guides product portfolio optimization
- West Region Excellence: 31.6% of total sales with 14.9% profit margin
- Geographic Opportunities: California, New York, and Texas lead in sales
- Operational Efficiency: East and West regions show superior profit margins
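Regional sales share and profit margin, as quoted above, are typically derived with a grouped aggregation. The following is a minimal sketch that assumes margin means profit divided by sales; the numbers are made up for illustration and are not the project's figures.

```python
import pandas as pd

# Illustrative numbers only, not the project's data.
items = pd.DataFrame({
    "region": ["West", "West", "East", "South"],
    "sales":  [300.0, 100.0, 400.0, 200.0],
    "profit": [60.0, 20.0, 40.0, 10.0],
})

by_region = items.groupby("region")[["sales", "profit"]].sum()

# Share of total sales and profit margin, both as percentages.
by_region["sales_share_pct"] = 100 * by_region["sales"] / by_region["sales"].sum()
by_region["profit_margin_pct"] = 100 * by_region["profit"] / by_region["sales"]

print(by_region.round(1))
```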
- Consumer Dominance: Consumers make up 51.6% of customers and contribute the highest sales ($1.16M)
- Home Office Premium: Smallest segment (18.7%) with highest average order value ($241)
- Segmentation Strategy: Clear behavioral differences require differentiated approaches
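Segment metrics like these can be computed per segment from order lines. The sketch below assumes that average order value means total sales divided by the number of distinct orders, which may not match the definition used in the project; the data is a toy example.

```python
import pandas as pd

# Toy order lines (hypothetical values).
lines = pd.DataFrame({
    "segment":     ["Consumer", "Consumer", "Consumer", "Home Office"],
    "customer_id": ["C1", "C1", "C2", "C3"],
    "order_id":    ["O1", "O1", "O2", "O3"],
    "sales":       [50.0, 70.0, 80.0, 240.0],
})

per_segment = lines.groupby("segment").agg(
    customers=("customer_id", "nunique"),
    orders=("order_id", "nunique"),
    sales=("sales", "sum"),
)

# Share of the customer base and average order value per segment.
per_segment["customer_share_pct"] = (
    100 * per_segment["customers"] / per_segment["customers"].sum()
)
per_segment["avg_order_value"] = per_segment["sales"] / per_segment["orders"]

print(per_segment.round(1))
```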
- Seasonal Patterns: Clear sales peaks at year-end periods
- Growth Trajectory: Significant growth in late 2017
- Sales-Profit Correlation: Strong correlation of 0.716 between sales and profit
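Seasonal totals and the sales-profit correlation can both be obtained with a few lines of pandas. This sketch uses toy rows and assumes a Pearson correlation computed across individual rows; the level at which the reported 0.716 was computed (order lines, orders, or months) is not stated here.

```python
import pandas as pd

# Toy data; values are illustrative only.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2017-10-05", "2017-11-12", "2017-11-20", "2017-12-03"]
    ),
    "sales":  [100.0, 200.0, 300.0, 400.0],
    "profit": [10.0, 30.0, 20.0, 50.0],
})

# Monthly sales totals, useful for spotting year-end peaks.
monthly_sales = orders.groupby(orders["order_date"].dt.to_period("M"))["sales"].sum()

# Pearson correlation between sales and profit (pandas' default method).
r = orders["sales"].corr(orders["profit"])

print(monthly_sales)
print(round(r, 3))
```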
- Market Focus: Technology products drive revenue, Office Supplies drive margins
- Regional Strategy: West region represents most valuable market
- Customer Strategy: Balance consumer volume with home office premium segments
- Inventory Management: Data-driven guidance for product allocation
- Marketing Optimization: Regional and segment-specific campaign targeting
- Financial Planning: Seasonal pattern recognition for cash flow management
- Database: MySQL
- Data Processing: Python (pandas, numpy)
- Visualization: Python (matplotlib, seaborn, plotly)
- Documentation: Microsoft Word, Markdown
- Sales Performance: Technology category: $836K, Office Supplies: stable margins
- Regional Distribution: West region: 31.6% sales share, 14.9% profit margin
- Customer Segments: Consumer: 51.6% of customers, Home Office: $241 average order value
- Temporal Patterns: Year-end sales peaks, 0.716 sales-profit correlation
- Product Portfolio: Balance high-volume technology sales with high-margin office supplies
- Geographic Expansion: Focus investment in West region while exploring East region opportunities
- Customer Acquisition: Differentiated strategies for consumer volume vs. home office premium segments
This project demonstrates comprehensive data analytics capabilities including database design, normalization, business intelligence, and strategic insight generation.