This project builds a production-ready data pipeline that combines daily weather and electricity usage data for five major US cities. The resulting insights help energy utilities, planners, and analysts forecast demand, reduce waste, and make data-driven decisions.
| City | State | NOAA Station ID | EIA Region Code |
|---|---|---|---|
| New York | New York | GHCND:USW00094728 | NYIS |
| Chicago | Illinois | GHCND:USW00094846 | PJM |
| Houston | Texas | GHCND:USW00012960 | ERCO |
| Phoenix | Arizona | GHCND:USW00023183 | AZPS |
| Seattle | Washington | GHCND:USW00024233 | SCL |
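
The city table above can be carried into code as a small lookup structure. The following is a minimal sketch, not the project's actual configuration: the `CityConfig` dataclass and the `CITIES` / `CITY_BY_NAME` names are hypothetical, while the station IDs and region codes are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityConfig:
    """One row of the city table (hypothetical structure)."""
    name: str
    state: str
    noaa_station_id: str
    eia_region_code: str


# Values copied verbatim from the table above.
CITIES = [
    CityConfig("New York", "New York", "GHCND:USW00094728", "NYIS"),
    CityConfig("Chicago", "Illinois", "GHCND:USW00094846", "PJM"),
    CityConfig("Houston", "Texas", "GHCND:USW00012960", "ERCO"),
    CityConfig("Phoenix", "Arizona", "GHCND:USW00023183", "AZPS"),
    CityConfig("Seattle", "Washington", "GHCND:USW00024233", "SCL"),
]

# Index by city name so a fetcher can resolve both identifiers in one lookup.
CITY_BY_NAME = {city.name: city for city in CITIES}
```

Keeping both identifiers on one record means the weather and energy requests for a city are always driven from the same entry.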
energy-analysis/
├── dashboards/
│ └── app.py # Streamlit dashboard
├── data/
│ ├── raw/ # Raw data from APIs
│ └── processed/ # Cleaned, validated, and analyzed data
├── logs/
│ ├── analysis.log # Analysis module logs
│ ├── fetcher_run.log # Data fetcher logs
│ ├── pipeline.log # Pipeline execution logs
│ └── processor_run.log # Data processor logs
├── src/
│ ├── data_fetcher.py # Weather & energy data collection
│ ├── data_processor.py # Cleaning + quality checks
│ ├── analysis.py # Data aggregation + insights
│ └── pipeline.py # Full ETL runner
├── tests/
│ ├── test_data_fetcher.py
│ ├── test_data_processor.py
│ ├── test_analysis.py
│ └── test_pipeline.py
├── streamlit/
│ └── config.toml # Streamlit configuration
├── Makefile # Workflow automation
├── pyproject.toml # Project metadata and dependencies
├── .env # Stores API keys (not versioned)
├── README.md # Project documentation
└── AI_USAGE.md # AI assistant audit (ChatGPT / GitHub Copilot)
- Uses NOAA and EIA APIs
- Pulls 90 days of:
  - Temperature (Max & Min)
  - Electricity Demand (daily)
- Automatically retries failed requests
- Output: `data/raw/`
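
The README says only that failed requests are retried automatically. One common way to do that is exponential backoff, sketched below. The function name `fetch_with_retry`, its parameters, and the backoff schedule are assumptions for illustration, not the behavior of `data_fetcher.py`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds or `max_attempts` is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts
    (an assumed schedule; the project may use a different one).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests replace the real delay with a no-op, so the retry logic can be checked without waiting.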
- Merges weather + energy data
- Performs 4 quality checks:
  - Missing values
  - Temperature outliers (over 130°F / under -50°F)
  - Negative or missing energy demand
  - Data freshness (within 2 days)
- Logs issues to `logs/`
- Output: cleaned CSVs + `*_quality_report.csv`
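
The four quality checks listed above can be sketched as a single function. This version uses plain Python to stay dependency-free, whereas the project itself lists Pandas. The row keys (`date`, `tmax_f`, `tmin_f`, `demand`) and the report field names are hypothetical; only the thresholds come from the list above.

```python
from datetime import date, timedelta

TEMP_MAX_F = 130.0  # upper outlier bound from the list above
TEMP_MIN_F = -50.0  # lower outlier bound from the list above
MAX_AGE_DAYS = 2    # freshness window from the list above


def quality_report(rows, today):
    """Summarize the four checks for a list of row dicts.

    Each row uses hypothetical keys: "date" (datetime.date),
    "tmax_f", "tmin_f", and "demand" (numbers or None).
    """
    fields = ("tmax_f", "tmin_f", "demand")
    # 1. Rows with at least one missing value.
    missing = sum(1 for r in rows if any(r.get(f) is None for f in fields))
    # 2. Temperature readings outside the plausible range.
    temp_outliers = sum(
        1
        for r in rows
        for f in ("tmax_f", "tmin_f")
        if r.get(f) is not None and not (TEMP_MIN_F <= r[f] <= TEMP_MAX_F)
    )
    # 3. Rows whose demand is negative or missing.
    bad_demand = sum(
        1 for r in rows if r.get("demand") is None or r["demand"] < 0
    )
    # 4. Freshness: the newest row must be at most MAX_AGE_DAYS old.
    latest = max((r["date"] for r in rows), default=None)
    is_fresh = latest is not None and (today - latest) <= timedelta(days=MAX_AGE_DAYS)
    return {
        "rows_with_missing": missing,
        "temperature_outliers": temp_outliers,
        "bad_demand_rows": bad_demand,
        "is_fresh": is_fresh,
    }
```

Taking `today` as an argument instead of calling `date.today()` inside the function keeps the freshness check deterministic under test.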
- Combines all city data into `merged_data.csv`
- Produces:
  - Correlation matrix
  - Seasonal patterns (monthly)
  - Weekday vs weekend usage
  - Heatmap matrix
  - Geographic summary
`make run-dashboard`

Visualizations:
- 📍 US Map View
  - Shows current energy + temperature
  - % change from previous day
  - Color-coded by energy change
- 📈 Time Series
  - 90-day chart: Max temperature (solid) vs Energy (dotted)
  - Weekends shaded
  - City selector dropdown
- 🔬 Correlation Analysis
  - Temperature vs energy scatter plot
  - Trendline (regression) with R² and R
  - Hover shows date and values
- 🔥 Heatmap by Temp & Weekday
  - Temp range on Y-axis
  - Weekday on X-axis
  - Color scale: low (blue) to high (red)
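
The heatmap view above needs demand averaged per temperature range and weekday. A minimal sketch of that aggregation follows. The 10°F band width, the label format, and the function names are assumptions; the README does not specify how `heatmap_matrix.csv` is built.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def temp_bucket(temp_f, width=10):
    """Label the `width`-degree band containing temp_f, e.g. '70-79°F'."""
    low = int(temp_f // width) * width
    return f"{low}-{low + width - 1}°F"


def heatmap_matrix(records, width=10):
    """Average demand per (temperature band, weekday) cell.

    `records` is an iterable of (datetime.date, max_temp_f, demand) tuples.
    Returns {band_label: {weekday_label: mean_demand}}.
    """
    cells = defaultdict(list)
    for day, temp_f, demand in records:
        key = (temp_bucket(temp_f, width), WEEKDAYS[day.weekday()])
        cells[key].append(demand)
    matrix = defaultdict(dict)
    for (band, weekday), values in cells.items():
        matrix[band][weekday] = mean(values)
    return {band: dict(row) for band, row in matrix.items()}
```

The nested dictionary maps directly onto the chart: outer keys become Y-axis rows (temperature range) and inner keys become X-axis columns (weekday).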
`make install`

Create a `.env` file:

NOAA_API_KEY=your_noaa_token
EIA_API_KEY=your_eia_token

`make pipeline`

`make run-dashboard`

| Path | Description |
|---|---|
| `data/raw/*.csv` | Raw NOAA and EIA data |
| `data/processed/*_cleaned.csv` | Cleaned and merged per-city data |
| `data/processed/merged_data.csv` | All-city data for dashboard and insights |
| `data/processed/*_quality_report.csv` | Summary of missing, outliers, and freshness |
| `data/processed/correlation_matrix.csv` | Pairwise correlations (Temp vs Energy) |
| `data/processed/geographic_overview.csv` | Latest energy usage + % change |
| `data/processed/heatmap_matrix.csv` | Heatmap-ready usage matrix |
To run unit tests:
`make test`

Covers:
- Data fetching
- Data validation
- Analysis calculations
- Full ETL orchestration
See AI_USAGE.md for:
- Prompts used
- Specific AI contributions (ChatGPT / GitHub Copilot)
- Manual edits vs AI completions
- Impact on productivity and bug detection
- Energy usage increases as temperatures rise, especially above 80°F
- Weekday demand is consistently higher than weekend demand
- Seasonal patterns vary: colder cities peak in winter, hotter cities in summer
- R-squared values indicate a moderate to strong correlation in most regions
- Python 3.13+
- Streamlit
- Plotly
- Pandas, NumPy
- Scikit-learn
- NOAA / EIA APIs
- Pytest
- Makefile automation
- `uv` for dependency management
Project: Pioneer AI Academy — Data Science Project 1
Maintainer: Felix Wilson Gbedemah
Contact: afrogbede09@gmail.com
License: Apache License
- Add ML model to forecast next-day demand
- Add error monitoring with Slack/email notifications
- Expand to all US states using FIPS codes
- Cache API responses to reduce load and retries