An interactive Python ETL pipeline for cleaning messy sales data, saving to a database, generating summary reports, and visualizing sales trends.
β ETL Process
- Load multi-CSV (monthly sales data)
 - Clean data: handle NaN, convert messy 
Salescolumn to numeric, remove outliers - Add 
Monthcolumn automatically based on file name 
β Database Integration
- Save cleaned data to SQLite database
 - Auto-create index on 
Productcolumn for faster queries - Run SQL queries for total sales per product & month
 
β Reports & Visualization
- Export sales summary CSV from DB query
 - Generate bar chart of total sales per month (
output/sales_per_month.png) - Full logging to 
output/etl.logfor traceability 
This project helps small businesses, analysts, and data teams:
- Automate monthly sales data consolidation
 - Build a clean, queryable sales database
 - Generate reports & visual insights for better decisions
 
output/sales_data.dbβ SQLite database fileoutput/monthly_sales_summary.csvβ Sales summary (DB query)output/sales_per_month.pngβ Bar chart of sales by monthoutput/etl.logβ Detailed ETL process log
1οΈβ£ (Recommended) Set up virtual environment
python -m venv venv
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows2οΈβ£ Install dependencies
pip install -r requirements.txt3οΈβ£ Run ETL pipeline
python etl/main.py- π Python 3.x
 - π¦ Pandas
 - β SQLAlchemy
 - π Matplotlib
 
Built with clean, modular code β ready for production or extension into dashboards.
β
 You can easily integrate this pipeline into Streamlit, BI tools, or cloud databases.
Khairu Ikramendra
πΌ Freelance Dashboard & Data Analytics Developer
π LinkedIn
π Upwork
π¬ Need help customizing this ETL for your business? Feel free to reach out!