An end-to-end data analytics project that cleans, analyzes, and visualizes 9,000+ retail transaction records, featuring SQL-based KPI extraction and an interactive Streamlit dashboard.
This project simulates a real-world retail analytics pipeline, taking raw superstore sales data from ingestion all the way to an executive-ready dashboard. It covers data cleaning, exploratory data analysis, SQL querying, and interactive visualization.
Retail-Data-Analytics/
│
├── data/ # Raw and cleaned CSV + SQLite database
├── outputs/
│ ├── charts/ # Saved EDA charts
│ └── executive_summary.txt
├── src/
│ ├── data_cleaning.py # Data ingestion, cleaning, feature engineering
│ ├── eda.py # Exploratory data analysis + chart generation
│ ├── sql_analysis.py # SQLite setup + KPI queries
│ └── insights.py # Automated executive summary generation
├── dashboard/
│ └── app.py # Streamlit interactive dashboard
├── main.py # Run full pipeline in one command
└── requirements.txt
| Tool | Purpose |
|---|---|
| Python / Pandas | Data cleaning & EDA |
| SQLite / SQLAlchemy | SQL-based KPI extraction |
| Matplotlib / Seaborn | Static chart generation |
| Plotly | Interactive dashboard charts |
| Streamlit | Web-based dashboard UI |
1. Clone the repository
git clone https://github.com/your-username/Retail-Data-Analytics.git
cd Retail-Data-Analytics

2. Install dependencies
pip install -r requirements.txt

3. Add the dataset
Download the Superstore Sales dataset from Kaggle and place the CSV at:
data/Superstore.csv

4. Run the full pipeline
python main.py

5. Launch the dashboard
streamlit run dashboard/app.py

- Data Cleaning: Fixed date formats, removed duplicates, engineered features like `profit_margin` and `ship_days`
- EDA: Monthly revenue trends, category/regional breakdowns, discount vs. profit scatter analysis, top customer rankings
- SQL Queries: KPIs extracted using `GROUP BY`, subqueries, `COUNT(DISTINCT)`, and conditional aggregations
- Streamlit Dashboard: Fully interactive with sidebar filters by year, region, and category
- Executive Summary: Auto-generated plain-English business insights saved to file
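
As a rough illustration of the cleaning step described above, the sketch below parses dates, drops duplicates, and derives `profit_margin` and `ship_days` with pandas. The column names (`Order Date`, `Ship Date`, `Sales`, `Profit`) and the `clean` helper are assumptions based on the public Superstore schema, not taken from `src/data_cleaning.py`, and the margin definition (profit divided by sales) is likewise assumed.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper; column names are assumed, not confirmed."""
    out = df.drop_duplicates().copy()

    # Parse date strings; unparseable values become NaT instead of raising.
    for col in ("Order Date", "Ship Date"):
        out[col] = pd.to_datetime(out[col], errors="coerce")

    # Assumed definition: profit as a fraction of sales (NaN when sales is 0).
    out["profit_margin"] = out["Profit"] / out["Sales"].where(out["Sales"] != 0)

    # Whole days between ordering and shipping.
    out["ship_days"] = (out["Ship Date"] - out["Order Date"]).dt.days
    return out


# Tiny made-up sample (not real Superstore rows); the second row is a duplicate.
sample = pd.DataFrame(
    {
        "Order Date": ["2017-01-03", "2017-01-03", "2017-02-10"],
        "Ship Date": ["2017-01-07", "2017-01-07", "2017-02-12"],
        "Sales": [200.0, 200.0, 50.0],
        "Profit": [50.0, 50.0, -5.0],
    }
)
cleaned = clean(sample)
print(cleaned[["profit_margin", "ship_days"]])
```

Masking zero sales before dividing keeps the margin column free of infinities, which would otherwise distort averages and chart axes downstream.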
- Source: Superstore Sales — Kaggle
- Size: 9,000+ transaction records
- Features: Order dates, customer segments, product categories, regional data, sales, profit, and discount
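
Given the fields listed above, KPI queries of the kind this project describes (`GROUP BY`, `COUNT(DISTINCT)`, conditional aggregation) might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database and a handful of made-up rows; the table name `orders` and its column names are illustrative assumptions and may not match `src/sql_analysis.py`.

```python
import sqlite3

# In-memory database with a few made-up rows (not real Superstore data).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        customer_id TEXT, region TEXT, category TEXT,
        sales REAL, profit REAL, discount REAL
    )
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("C1", "West", "Furniture", 100.0, 20.0, 0.0),
        ("C1", "West", "Technology", 300.0, 90.0, 0.0),
        ("C2", "West", "Furniture", 50.0, -10.0, 0.4),
        ("C3", "East", "Technology", 200.0, 40.0, 0.2),
    ],
)

# Per-region KPIs: revenue, unique customers, and the number of
# loss-making orders (a conditional aggregation via CASE).
kpi_sql = """
    SELECT region,
           SUM(sales)                                   AS revenue,
           COUNT(DISTINCT customer_id)                  AS customers,
           SUM(CASE WHEN profit < 0 THEN 1 ELSE 0 END)  AS loss_orders
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
"""
kpis = conn.execute(kpi_sql).fetchall()
for row in kpis:
    print(row)
conn.close()
```

Counting distinct customers rather than rows matters here because one customer can place several orders in the same region.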
Nilay Srivastava
GitHub
