A Data Engineering & Analytics Project by Aneesh AJ
PowerGrid+ is an end-to-end data engineering and analytics
platform designed to simulate real-time smart grid monitoring.
It demonstrates the full lifecycle of a modern data pipeline:
- Synthetic smart-meter generation
- Automated ETL processing
- Rolling-window feature engineering
- Real-time anomaly detection
- Storage into PostgreSQL
- Interactive Power BI dashboard delivering operational insights
This project simulates how real utilities monitor grid health, detect abnormal consumption patterns, track meter behavior, and identify potential faults or energy theft indicators.
It is designed for recruiters and hiring managers evaluating Data Engineering candidates.
- Synthetic smart-meter data generator (configurable, multi-region, 336K+ rows)
- Python ETL pipeline
  - Cleaning, transformations, temporal feature engineering
  - Rolling 1-hour load averages
  - Power factor + consumption metrics
- Rule-based anomaly detection
  - Sudden spikes/drops
  - Consumption irregularities
- Gold dataset creation for analytics
- PostgreSQL integration (Dockerized)
- Power BI operational dashboard with:
  - KPIs
  - Load trends
  - Region performance
  - Hourly heatmap
  - Top anomalous meters
  - Detailed anomaly review
Architecture (each stage feeds the next):
- Data Generation: synthetic meter readings (CSV)
- ETL Layer: cleaning + shaping, rolling features
- Anomaly Detection: spike / drop / PF
- Gold Dataset: analytics-optimized
- PostgreSQL (Docker): analytics warehouse
- Power BI Dashboard: insights & monitoring
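The Data Generation stage can be approximated with a short pandas/numpy sketch. It is not the project's generator: the function name `generate_readings`, the region names other than North, the 15-minute interval, the evening-bump load shape, and every noise parameter are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical region names; only "North" is mentioned in this README.
REGIONS = ["North", "South", "East", "West"]


def generate_readings(n_meters: int = 8, days: int = 1, seed: int = 42) -> pd.DataFrame:
    """Generate synthetic 15-minute smart-meter readings (illustrative only)."""
    rng = np.random.default_rng(seed)
    timestamps = pd.date_range("2024-01-01", periods=days * 96, freq="15min")
    n = len(timestamps)
    hours = timestamps.hour.to_numpy()
    # Assumed daily load shape: a base load plus a bump centred on 19:00.
    base_kw = 1.5 + 1.0 * np.exp(-((hours - 19) ** 2) / 8)

    frames = []
    for i in range(n_meters):
        voltage = rng.normal(230, 3, n)
        pf = np.clip(rng.normal(0.95, 0.03, n), 0.7, 1.0)
        power_kw = np.clip(base_kw + rng.normal(0, 0.2, n), 0, None)
        # Back out current from P = V * I * PF (single-phase assumption).
        current = power_kw * 1000 / (voltage * pf)
        frames.append(pd.DataFrame({
            "timestamp": timestamps,
            "meter_id": f"M{i:04d}",
            "region": REGIONS[i % len(REGIONS)],
            "voltage": voltage.round(1),
            "current": current.round(2),
            "power_kw": power_kw.round(3),
            "temperature_c": rng.normal(25, 4, n).round(1),
        }))
    return pd.concat(frames, ignore_index=True)
```

For example, `generate_readings(n_meters=100, days=35).to_csv("meter_readings.csv", index=False)` would write 336,000 rows (100 meters × 35 days × 96 readings per day); the file name is a placeholder.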
The pipeline constructs meaningful operational features:
rolling_kw_1h = mean(power_kw over the last 4 intervals), i.e. a 1-hour window (4 × 15-minute intervals).
This rolling average feeds the hourly heatmaps and temporal pattern analysis.
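A per-meter version of the rolling formula, as a sketch: it assumes 15-minute readings (so 4 rows span one hour) and `meter_id`/`timestamp` columns as in the gold schema; `add_rolling_kw_1h` is a hypothetical name. Grouping by meter keeps one meter's window from bleeding into the next, and `min_periods=1` lets each meter's first readings use a partial window, a design choice the README does not specify.

```python
import pandas as pd


def add_rolling_kw_1h(df: pd.DataFrame, window: int = 4) -> pd.DataFrame:
    """Add rolling_kw_1h: mean power_kw over the last `window` readings per meter."""
    out = df.sort_values(["meter_id", "timestamp"]).copy()
    out["rolling_kw_1h"] = (
        out.groupby("meter_id")["power_kw"]
        # Partial windows at the start of each meter's series are allowed.
        .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return out
```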
Power factor (PF): a low PF indicates inefficient loads or equipment issues.
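The README does not show how PF is derived. A common definition is real power divided by apparent power; the sketch below assumes single-phase readings with `power_kw` in kW, `voltage` in volts and `current` in amps, and writes a hypothetical `power_factor` column.

```python
import numpy as np
import pandas as pd


def add_power_factor(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate PF = real power (kW) / apparent power (kVA), single-phase assumption."""
    out = df.copy()
    apparent_kva = out["voltage"] * out["current"] / 1000.0
    # Zero current would divide by zero, so those rows get NaN instead.
    pf = out["power_kw"] / apparent_kva.replace(0, np.nan)
    # Rounding noise can push the ratio slightly above 1; cap it there.
    out["power_factor"] = pf.clip(upper=1.0)
    return out
```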
The percent change in load detects sudden transitions:
kw_pct_change = (current_kw - prev_kw) / prev_kw
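The formula maps directly onto a grouped `shift`. This sketch (hypothetical name `add_kw_pct_change`) computes it per meter, so a meter's first reading is never compared with another meter's last reading, and it returns NaN rather than infinity when `prev_kw` is zero, an edge case the formula leaves open.

```python
import numpy as np
import pandas as pd


def add_kw_pct_change(df: pd.DataFrame) -> pd.DataFrame:
    """kw_pct_change = (current_kw - prev_kw) / prev_kw, computed per meter."""
    out = df.sort_values(["meter_id", "timestamp"]).copy()
    # Previous reading of the same meter; NaN for each meter's first row.
    prev_kw = out.groupby("meter_id")["power_kw"].shift(1)
    # A zero previous reading would divide by zero; treat it as undefined.
    out["kw_pct_change"] = (out["power_kw"] - prev_kw) / prev_kw.replace(0, np.nan)
    return out
```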
The anomaly engine flags readings using simple rule-based thresholds:
- Spike: a large upward jump in consumption.
- Drop: zero load or rapid decay, which often indicates outages, equipment faults, or meter resets.
- Irregular: repeated power factor abnormalities.
Each row gets:
- `anomaly_flag` (True/False)
- `anomaly_reason` (drop / spike / irregular)
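The three rules can be sketched as boolean masks in pandas. The thresholds (±50% change, PF below 0.9 in at least 3 of the last 4 readings) and the rule priority (drop over spike over irregular) are assumptions, not the project's actual values; `flag_anomalies` is a hypothetical name and expects `kw_pct_change` and an assumed `power_factor` column to exist already.

```python
import pandas as pd

# Hypothetical thresholds: the README does not state the real values.
SPIKE_PCT = 0.5    # rise of more than 50% versus the previous reading
DROP_PCT = -0.5    # fall of more than 50% versus the previous reading
PF_LIMIT = 0.9     # readings below this PF count as abnormal
PF_REPEATS = 3     # abnormal readings within the last 4 that count as "repeated"


def flag_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """Add anomaly_flag and anomaly_reason using rule-based thresholds.

    Expects power_kw, kw_pct_change and power_factor columns, with rows
    sorted by meter_id and timestamp.
    """
    out = df.copy()
    low_pf = (out["power_factor"] < PF_LIMIT).astype(int)
    # Count abnormal-PF readings in a rolling 4-reading window per meter.
    repeated_low_pf = (
        low_pf.groupby(out["meter_id"])
        .transform(lambda s: s.rolling(4, min_periods=1).sum())
        .ge(PF_REPEATS)
    )
    spike = out["kw_pct_change"] > SPIKE_PCT
    drop = out["power_kw"].eq(0) | (out["kw_pct_change"] < DROP_PCT)

    reason = pd.Series([None] * len(out), index=out.index, dtype="object")
    # Assign the lowest-priority rule first so later rules overwrite it.
    reason.loc[repeated_low_pf] = "irregular"
    reason.loc[spike] = "spike"
    reason.loc[drop] = "drop"
    out["anomaly_reason"] = reason
    out["anomaly_flag"] = reason.notna()
    return out
```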
This is optimized for operational monitoring, not ML classification.
Gold dataset columns:
- `timestamp`
- `meter_id`
- `region`
- `voltage`
- `current`
- `power_kw`
- `temperature_c`
- `hour_of_day`
- `rolling_kw_1h`
- `kw_pct_change`
- `anomaly_flag`
- `anomaly_reason`
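A sketch of assembling the gold dataset with exactly the columns above, in that order. `build_gold` is a hypothetical helper; deriving `hour_of_day` from `timestamp` when it is missing is an assumption about where that column comes from.

```python
import pandas as pd

# Column order as listed in the gold dataset schema.
GOLD_COLUMNS = [
    "timestamp", "meter_id", "region", "voltage", "current", "power_kw",
    "temperature_c", "hour_of_day", "rolling_kw_1h", "kw_pct_change",
    "anomaly_flag", "anomaly_reason",
]


def build_gold(df: pd.DataFrame) -> pd.DataFrame:
    """Select the gold columns in a fixed order, deriving hour_of_day if absent."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    if "hour_of_day" not in out.columns:
        out["hour_of_day"] = out["timestamp"].dt.hour
    # Fail loudly rather than silently writing an incomplete gold table.
    missing = [c for c in GOLD_COLUMNS if c not in out.columns]
    if missing:
        raise ValueError(f"missing gold columns: {missing}")
    return out[GOLD_COLUMNS].sort_values(["meter_id", "timestamp"]).reset_index(drop=True)
```

The result could then be written with `build_gold(df).to_csv("gold_dataset.csv", index=False)`.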
The dashboard provides real-time operational visibility.
KPIs:
- Total Power Consumption
- Average Power Factor
- Total Readings Processed
- Total Anomalies
- Anomaly Percentage
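The KPI calculations can be sketched in pandas as follows. This assumes a `power_factor` column (not part of the gold schema) and reports consumption as energy in kWh by multiplying each kW reading by an assumed 0.25 h interval; `compute_kpis` is a hypothetical name, and the dashboard's actual measure definitions may differ.

```python
import pandas as pd


def compute_kpis(df: pd.DataFrame, interval_hours: float = 0.25) -> dict:
    """Dashboard-style KPIs from gold rows plus an assumed power_factor column."""
    total_readings = len(df)
    total_anomalies = int(df["anomaly_flag"].sum())
    return {
        # Energy: each kW reading is assumed to hold for one 15-minute interval.
        "total_consumption_kwh": float(df["power_kw"].sum() * interval_hours),
        "avg_power_factor": float(df["power_factor"].mean()),
        "total_readings": total_readings,
        "total_anomalies": total_anomalies,
        "anomaly_pct": 100.0 * total_anomalies / total_readings if total_readings else 0.0,
    }
```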
- Region performance: bar chart showing anomaly distribution by region.
- Load trends: continuous time series showing how load evolves.
- Hourly heatmap: consumption patterns over 24 hours × region.
- Top anomalous meters: identifies the worst-performing meters.
- Detailed anomaly review: a detailed record of flagged anomalies.
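The hourly heatmap and top-anomalous-meters views reduce to two aggregations over the gold columns. The helper names below are hypothetical, and the heatmap is assumed to show mean `power_kw` per cell.

```python
import pandas as pd


def hourly_heatmap(df: pd.DataFrame) -> pd.DataFrame:
    """Mean power_kw with hour_of_day as rows and region as columns."""
    return df.pivot_table(index="hour_of_day", columns="region",
                          values="power_kw", aggfunc="mean")


def top_anomalous_meters(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count flagged readings per meter and keep the n meters with the most."""
    return df.loc[df["anomaly_flag"], "meter_id"].value_counts().head(n)
```

`hourly_heatmap(df).idxmax()` returns the hour with the highest mean load for each region, one way to locate each region's demand peak.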
North exhibits higher mean consumption and more spikes than the other regions.
This may indicate:
- Industrial clients
- Transformer overload
- Aging infrastructure
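A regional comparison like this one can be reproduced with a single grouped aggregation. The sketch below (hypothetical `region_summary`) counts spikes via `anomaly_reason == "spike"` and ranks regions by mean consumption.

```python
import pandas as pd


def region_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean power_kw and spike count per region, highest mean first."""
    summary = (
        df.assign(is_spike=df["anomaly_reason"].eq("spike"))
        .groupby("region")
        .agg(mean_kw=("power_kw", "mean"), spikes=("is_spike", "sum"))
    )
    return summary.sort_values("mean_kw", ascending=False)
```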
The daily demand peak appears in every region, a classic residential demand pattern.
Identifying it supports load balancing and peak-shaving strategies.
Zero-load drops often represent outages, equipment faults, or meter resets.
The top-10 chart reveals meters with recurring faults.
These meters should be prioritized for field inspection.
PF < 0.9 in several intervals points to potential:
- Poor load quality
- Reactive power issues
- Need for capacitor bank adjustments
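Meters that repeatedly fall below the 0.9 threshold can be listed with a filter and a grouped count. This assumes a per-reading `power_factor` column; the helper name `low_pf_meters` and reading "several intervals" as at least 3 are assumptions.

```python
import pandas as pd


def low_pf_meters(df: pd.DataFrame, limit: float = 0.9, min_intervals: int = 3) -> pd.Series:
    """Meters with at least `min_intervals` readings where power_factor < limit."""
    counts = df.loc[df["power_factor"] < limit].groupby("meter_id").size()
    # Worst offenders first, ready for a field-inspection list.
    return counts[counts >= min_intervals].sort_values(ascending=False)
```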
These insights demonstrate operational value for utilities.
- Python (pandas, numpy)
  - ETL pipelines
  - Feature engineering
  - Anomaly detection
  - File-based data orchestration
- PostgreSQL (Dockerized)
- Power BI
- Docker, SQLAlchemy, psycopg2
- `cd db`
- `docker compose up -d`
- `powergrid generate`
- `powergrid etl`
- `powergrid anomalies`
- `powergrid load`
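The load step can be sketched with `DataFrame.to_sql`, which accepts a SQLAlchemy engine (as used with PostgreSQL here) or a plain `sqlite3` connection. This is not the project's loader: `load_gold` and the table name `gold_readings` are hypothetical, and the demo uses in-memory SQLite only so the sketch runs without Docker.

```python
import sqlite3

import pandas as pd


def load_gold(df: pd.DataFrame, con, table: str = "gold_readings") -> int:
    """Write the gold rows to `table` (replacing it) and return the stored row count.

    `con` can be a SQLAlchemy engine/connection or a sqlite3 connection.
    """
    # chunksize keeps each INSERT batch small for large datasets.
    df.to_sql(table, con, if_exists="replace", index=False, chunksize=10_000)
    counted = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table}", con)
    return int(counted["n"].iloc[0])


if __name__ == "__main__":
    # Local stand-in for PostgreSQL so the sketch runs without Docker.
    with sqlite3.connect(":memory:") as conn:
        demo = pd.DataFrame({"meter_id": ["M0001", "M0002"], "power_kw": [1.2, 0.8]})
        print(load_gold(demo, conn))  # prints 2
```

Against the Dockerized database, `con` would instead be an engine such as `sqlalchemy.create_engine("postgresql+psycopg2://USER:PASSWORD@localhost:5432/DBNAME")`; the credentials and database name are placeholders, not values from the project.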
Open Power BI → load `gold_dataset.csv` or connect directly to PostgreSQL.
- Add ML anomaly detection (XGBoost, Isolation Forest)
- Build a FastAPI service for real-time prediction
- Add Airflow orchestration
- Deploy to the cloud (AWS RDS + ECS + QuickSight)
- Add reactive power + harmonics analysis
Aneesh AJ
Data Engineering & AI Enthusiast