Air Quality Index (AQI) Prediction Project
Overview
This project analyzes historical air quality data to train a machine learning model capable of predicting the Air Quality Index (AQI) category. It includes scripts for data processing, model training, and a web-based dashboard for live predictions and historical data visualization.
The core of the project is a Random Forest Classifier that predicts the AQI bucket (e.g., "Good", "Moderate", "Poor") based on the concentration of 12 different pollutants.
Features
Model Training (source.py):
Loads raw data from city_hour.csv.
Cleans data, imputes missing values using medians, and scales features.
Trains a RandomForestClassifier on the pollutant data.
Performs K-Means clustering and Isolation Forest outlier detection.
Saves the trained model, scaler, imputer, and encoder to outputs/aqi_model.pkl.
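The training steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of source.py: the column names, the AQI_Bucket target name, the use of StandardScaler, LabelEncoder, and pickle, and the dictionary keys are all assumptions, and the data is synthetic so the snippet runs without city_hour.csv.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Assumed column names; the real city_hour.csv may use different ones.
POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
              "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene"]
TARGET = "AQI_Bucket"  # assumed name of the label column


def train_and_save(df: pd.DataFrame, out_path: Path) -> dict:
    """Impute, scale, encode, fit a Random Forest, and pickle the artifacts."""
    df = df.dropna(subset=[TARGET])  # rows without a label cannot be used

    imputer = SimpleImputer(strategy="median")  # median imputation
    scaler = StandardScaler()                   # assumed scaling method
    encoder = LabelEncoder()                    # assumed label encoding

    X = scaler.fit_transform(imputer.fit_transform(df[POLLUTANTS]))
    y = encoder.fit_transform(df[TARGET])

    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X, y)

    # Assumed layout of the saved dictionary.
    artifacts = {"model": model, "scaler": scaler,
                 "imputer": imputer, "encoder": encoder}
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as f:
        pickle.dump(artifacts, f)
    return artifacts


# Synthetic stand-in for the real data, with some missing values.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.uniform(0, 300, size=(200, len(POLLUTANTS))),
                    columns=POLLUTANTS)
demo.loc[::10, "PM10"] = np.nan
demo[TARGET] = np.where(demo["PM2.5"] < 100, "Good",
                        np.where(demo["PM2.5"] < 200, "Moderate", "Poor"))

# Written to a temporary directory here rather than the project's outputs/ folder.
out_file = Path(tempfile.mkdtemp()) / "outputs" / "aqi_model.pkl"
artifacts = train_and_save(demo, out_file)
print(sorted(artifacts))
```

Saving the imputer, scaler, and encoder alongside the model means the same preprocessing can be replayed at prediction time instead of being refit.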
Streamlit Dashboard (air_dashboard.py):
Historical Analysis: Provides interactive charts for a selected city, showing AQI over time and the distribution of different pollutants.
Live AQI Prediction: A sidebar tool where you can input 12 pollutant levels (PM2.5, PM10, NO, etc.) and get an instant AQI category prediction from the trained model.
Prediction Test (forecast_api.py): A simple command-line script to test the prediction pipeline with a hardcoded set of pollutant values.
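A single prediction, as used by both the dashboard sidebar and the test script, might look like the sketch below. The predict_bucket helper, the column names, and the artifact dictionary keys are hypothetical; the artifacts are fitted in memory on synthetic data so the example runs without aqi_model.pkl.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Assumed column names; the real data may use different ones.
POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
              "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene"]


def predict_bucket(artifacts: dict, sample: dict) -> str:
    """Run one sample through imputer -> scaler -> model -> encoder."""
    # Keys missing from the sample become NaN and are filled by the imputer.
    row = pd.DataFrame([sample], columns=POLLUTANTS)
    X = artifacts["scaler"].transform(artifacts["imputer"].transform(row))
    code = artifacts["model"].predict(X)
    return str(artifacts["encoder"].inverse_transform(code)[0])


# Synthetic training data: every pollutant rises with one "severity" value.
rng = np.random.default_rng(0)
severity = rng.uniform(0, 1, size=300)
values = severity[:, None] * 300 + rng.normal(0, 5, size=(300, len(POLLUTANTS)))
train = pd.DataFrame(values, columns=POLLUTANTS)
labels = np.where(severity < 1 / 3, "Good",
                  np.where(severity < 2 / 3, "Moderate", "Poor"))

imputer = SimpleImputer(strategy="median")
scaler = StandardScaler()
encoder = LabelEncoder()
X_train = scaler.fit_transform(imputer.fit_transform(train))
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, encoder.fit_transform(labels))

# Stand-in for the dictionary loaded from aqi_model.pkl (keys are assumed).
artifacts = {"model": model, "scaler": scaler,
             "imputer": imputer, "encoder": encoder}

low = {name: 10.0 for name in POLLUTANTS}
high = {name: 290.0 for name in POLLUTANTS}
print(predict_bucket(artifacts, low), predict_bucket(artifacts, high))
```

The order of the transforms matters: the sample has to pass through the imputer and scaler in the same sequence used during training before it reaches the classifier.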
How to Run
- Install Dependencies
  You'll need the following Python libraries. You can install them using pip:
      pip install pandas numpy scikit-learn streamlit matplotlib seaborn
- Run the Training Pipeline
  Before you can use the dashboard, you must run source.py to train the model and create the necessary aqi_model.pkl file.
      python AIR/source.py
  This script will:
  Read AIR/city_hour.csv.
  Process the data and train the model.
  Save the model artifacts to AIR/outputs/aqi_model.pkl.
  Generate visualization plots (like kmeans.png, outlier.png, etc.).
- Run the Streamlit Dashboard
  Once the aqi_model.pkl file exists, you can start the web application:
      streamlit run AIR/air_dashboard.py
  This will open the dashboard in your web browser, where you can explore historical data and use the live prediction tool.
- (Optional) Test Prediction in Terminal
  You can run forecast_api.py to test the model's prediction on a single sample from your terminal.
      python AIR/forecast_api.py
File Structure
AIR/source.py: The main script for data cleaning, preprocessing, model training, and analysis.
AIR/air_dashboard.py: The Streamlit application file. This script loads the trained model and serves the interactive web dashboard.
AIR/forecast_api.py: A simple script to load the model and test a single prediction.
AIR/city_hour.csv: The raw input data file (required by source.py and air_dashboard.py).
AIR/outputs/aqi_model.pkl: The (generated) file containing the dictionary of model artifacts: the model, scaler, imputer, and encoder.
AIR/*.png: Image files (e.g., kmeans.png, outlier.png) that are generated by the visualization section of source.py.
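The kmeans.png and outlier.png files come from the clustering and outlier-detection step. As a rough sketch of what that step might compute before plotting, the snippet below runs K-Means and an Isolation Forest on synthetic two-feature data. The number of clusters, the contamination rate, and the data itself are assumptions chosen for illustration, not values taken from source.py.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Synthetic readings: two pollution regimes plus a few extreme spikes.
rng = np.random.default_rng(0)
clean = rng.normal(50, 5, size=(100, 2))
polluted = rng.normal(200, 5, size=(100, 2))
spikes = np.array([[600.0, 10.0], [10.0, 600.0], [700.0, 700.0]])
X = StandardScaler().fit_transform(np.vstack([clean, polluted, spikes]))

# K-Means assigns each row to one of n_clusters groups (2 is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Isolation Forest marks rows as -1 (outlier) or 1 (inlier);
# contamination is the assumed fraction of outliers.
iso = IsolationForest(contamination=0.02, random_state=0)
outlier_flags = iso.fit_predict(X)

print("cluster sizes:", np.bincount(cluster_labels))
print("rows flagged as outliers:", int((outlier_flags == -1).sum()))
```

Arrays like cluster_labels and outlier_flags are what a scatter plot would typically use to color the points.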