This project predicts the Air Quality Index (AQI) for five U.S. cities (New York, Los Angeles, Chicago, Houston, and Denver) over a four-day window, April 22–25, 2026. AQI reports how clean or polluted the air is on a scale of 0–500; the project focuses on PM2.5 (particulate matter with a diameter of 2.5 micrometers or less).
- Primary source: EPA Air Quality System (AQS) — historical PM2.5 and AQI data from outdoor monitors.
- Supplemental data: Weather variables (temperature, humidity, wind, etc.) and any other publicly available data that may influence air quality.
Models must not use neural networks or deep learning. Acceptable approaches include:
- Regression / classification (e.g., XGBoost, random forests)
- Time series analysis (e.g., ARIMA, SARIMA)
- Other statistical methods covered in class
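As a rough illustration of a permitted, non-neural approach, the sketch below fits a first-order autoregressive model by ordinary least squares and forecasts four days ahead. It is a simple stand-in for the ARIMA-style methods listed above, not the required method. The function names `fit_ar1` and `forecast_ar1` and the `history` values are hypothetical and made up for the example; real input would come from the AQS data.

```python
def fit_ar1(series):
    """Fit y[t] = intercept + slope * y[t-1] by ordinary least squares."""
    x = series[:-1]  # previous-day values
    y = series[1:]   # next-day values
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_ar1(series, horizon):
    """Forecast `horizon` steps ahead by feeding each prediction back in."""
    intercept, slope = fit_ar1(series)
    forecasts = []
    last = series[-1]
    for _ in range(horizon):
        last = intercept + slope * last
        forecasts.append(last)
    return forecasts


# Made-up daily AQI history for one city (illustrative only).
history = [41.0, 45.0, 52.0, 48.0, 44.0, 50.0, 55.0, 51.0, 47.0, 49.0]
forecast = forecast_ar1(history, horizon=4)
print(forecast)
```

A model this small ignores weather and seasonality, so it is mainly useful as a baseline to compare richer models against.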
Predictions are scored against actual AQI values from AirNow for the prediction dates in all five cities. The team with the smallest mean absolute error (MAE) wins.
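For reference, MAE is the average of the absolute differences between predicted and actual values. The sketch below computes it in plain Python; the function name and the sample numbers are made up for illustration and are not real AQI readings.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one pair of values")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative numbers only.
actual = [42, 55, 61, 38]
predicted = [40, 60, 58, 45]
mae = mean_absolute_error(actual, predicted)
print(mae)  # absolute errors are 2, 5, 3, 7, so the mean is 4.25
```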
| Deadline | Deliverable |
|---|---|
| March 24, 2026 | Team formation email |
| March 31, 2026 | Project status update (one paragraph) |
| April 7, 2026 | Initial predictions CSV (April 8–11, ungraded) |
| April 21, 2026 | Final predictions CSV (April 22–25) |
| April 30, 2026 | Final CSV (with actuals), project report (PDF), code with output (PDF) |
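The sketch below shows one way a final predictions CSV could be assembled for the five cities and four dates. The column names (`city`, `date`, `predicted_aqi`) and the helper `write_predictions` are assumptions made for illustration; the required layout is whatever the full project description specifies.

```python
import csv
import io
from datetime import date, timedelta

CITIES = ["New York", "Los Angeles", "Chicago", "Houston", "Denver"]
START = date(2026, 4, 22)  # first prediction date
DAYS = 4                   # April 22 through April 25


def write_predictions(handle, predictions):
    """Write one row per (city, date); `predictions` maps (city, date) to an AQI value."""
    writer = csv.writer(handle)
    writer.writerow(["city", "date", "predicted_aqi"])  # assumed column names
    for city in CITIES:
        for offset in range(DAYS):
            day = START + timedelta(days=offset)
            writer.writerow([city, day.isoformat(), round(predictions[(city, day)])])


# Placeholder values only; a real run would use model output.
placeholder = {
    (city, START + timedelta(days=i)): 50
    for city in CITIES
    for i in range(DAYS)
}
buffer = io.StringIO()
write_predictions(buffer, placeholder)
rows = buffer.getvalue().splitlines()
print(rows[:3])
```

To write to disk instead of a buffer, the same function could be passed a file opened with `open(path, "w", newline="")`.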
├── project_sp26.pdf # Full project description
└── README.md