
AshvanP23/Air_quality_forecasting

 
 


Air Quality Index (AQI) Prediction Project

Overview

This project analyzes historical air quality data to train a machine learning model that predicts the Air Quality Index (AQI) category. It includes scripts for data processing, model training, and a web-based dashboard for live predictions and historical data visualization.

The core of the project is a Random Forest Classifier that predicts the AQI bucket (e.g., "Good", "Moderate", "Poor") based on the concentration of 12 different pollutants.

Features

Model Training (source.py):

Loads raw data from city_hour.csv.
Cleans data, imputes missing values using medians, and scales features.
Trains a RandomForestClassifier on the pollutant data.
Performs K-Means clustering and Isolation Forest outlier detection.
Saves the trained model, scaler, imputer, and encoder to outputs/aqi_model.pkl.
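The training steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of source.py: the pollutant column names, the AQI_Bucket target column, the use of StandardScaler, the dictionary keys, and the synthetic data are all assumptions made so the example runs without city_hour.csv. The artifacts are written to a temporary directory so the sketch does not overwrite a real model file.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Assumed column names; the README only states that 12 pollutants are used.
POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
              "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene"]
TARGET = "AQI_Bucket"  # assumed name of the category column


def make_synthetic_frame(n_rows=300, seed=0):
    """Build a toy stand-in for city_hour.csv with some missing readings."""
    rng = np.random.default_rng(seed)
    values = rng.gamma(shape=2.0, scale=30.0, size=(n_rows, len(POLLUTANTS)))
    df = pd.DataFrame(values, columns=POLLUTANTS)
    # Toy label driven by PM2.5 only, so the example is learnable.
    df[TARGET] = pd.cut(df["PM2.5"], bins=[-np.inf, 40, 90, np.inf],
                        labels=["Good", "Moderate", "Poor"]).astype(str)
    # Blank out about 5% of readings to mimic gaps in sensor data.
    missing = rng.random(values.shape) < 0.05
    df[POLLUTANTS] = df[POLLUTANTS].mask(missing)
    return df


def train(df):
    """Impute with medians, scale, and fit a Random Forest classifier."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(df[TARGET])
    X_train, X_test, y_train, y_test = train_test_split(
        df[POLLUTANTS], y, test_size=0.2, random_state=42, stratify=y)

    imputer = SimpleImputer(strategy="median")
    scaler = StandardScaler()
    # Fit preprocessing on the training split only, then reuse it for the test split.
    X_train_p = scaler.fit_transform(imputer.fit_transform(X_train))
    X_test_p = scaler.transform(imputer.transform(X_test))

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train_p, y_train)
    accuracy = model.score(X_test_p, y_test)

    artifacts = {"model": model, "scaler": scaler,
                 "imputer": imputer, "encoder": encoder}
    return artifacts, accuracy


def save_artifacts(artifacts, path):
    """Pickle the artifact dictionary, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(artifacts, fh)


frame = make_synthetic_frame()
artifacts, accuracy = train(frame)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "outputs" / "aqi_model.pkl"
    save_artifacts(artifacts, model_path)
    with model_path.open("rb") as fh:
        restored = pickle.load(fh)

print(f"held-out accuracy on toy data: {accuracy:.2f}")
print("saved keys:", sorted(restored))
```

Keeping the imputer, scaler, and encoder next to the model in one file means any script that loads it can apply the same preprocessing that was used during training.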

Streamlit Dashboard (air_dashboard.py):

Historical Analysis: Provides interactive charts for a selected city, showing AQI over time and the distribution of different pollutants.

Live AQI Prediction: A sidebar tool where you can input 12 pollutant levels (PM2.5, PM10, NO, etc.) and get an instant AQI category prediction from the trained model.
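As a rough idea of the data preparation behind an "AQI over time" chart, the sketch below filters an hourly table to one city and averages AQI per day with pandas. The column names City, Datetime, and AQI are assumptions about city_hour.csv, and the sample rows are made-up toy values, not real measurements.

```python
import pandas as pd


def daily_aqi(df, city):
    """Return the daily mean AQI for one city, indexed by date."""
    city_rows = df.loc[df["City"] == city].copy()
    city_rows["Datetime"] = pd.to_datetime(city_rows["Datetime"])
    return city_rows.set_index("Datetime")["AQI"].resample("D").mean()


# Toy hourly readings for two placeholder cities.
sample = pd.DataFrame({
    "City": ["CityA", "CityA", "CityA", "CityB"],
    "Datetime": ["2020-01-01 00:00", "2020-01-01 12:00",
                 "2020-01-02 00:00", "2020-01-01 00:00"],
    "AQI": [300.0, 340.0, 280.0, 120.0],
})

city_a_daily = daily_aqi(sample, "CityA")
print(city_a_daily)
```

In a Streamlit app, a series like this could be passed to st.line_chart to draw the time series for the city chosen in the interface.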

Prediction Test (forecast_api.py): A simple command-line script to test the prediction pipeline with a hardcoded set of pollutant values.
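A prediction round trip of the kind described above might look like the following sketch. It assumes the artifact dictionary uses the keys model, scaler, imputer, and encoder, and that the 12 pollutant columns have the names shown; neither is confirmed by this README. To stay runnable without aqi_model.pkl, a small stand-in model is fitted on random data (the hypothetical helper toy_artifacts) instead of loading the real file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Assumed column names and order; they must match what the model was trained on.
POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
              "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene"]


def predict_bucket(artifacts, readings):
    """Map one dict of pollutant readings to an AQI category label."""
    # Pollutants missing from the dict become NaN and are filled by the imputer.
    row = pd.DataFrame(
        [[readings.get(name, np.nan) for name in POLLUTANTS]],
        columns=POLLUTANTS)
    features = artifacts["scaler"].transform(artifacts["imputer"].transform(row))
    code = artifacts["model"].predict(features)
    return str(artifacts["encoder"].inverse_transform(code)[0])


def toy_artifacts(seed=0):
    """Hypothetical stand-in for unpickling outputs/aqi_model.pkl."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame(rng.uniform(0, 200, size=(200, len(POLLUTANTS))),
                     columns=POLLUTANTS)
    labels = np.where(X["PM2.5"] < 100, "Good", "Poor")  # toy rule, not real AQI
    encoder = LabelEncoder()
    y = encoder.fit_transform(labels)
    imputer = SimpleImputer(strategy="median")
    scaler = StandardScaler()
    X_p = scaler.fit_transform(imputer.fit_transform(X))
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_p, y)
    return {"model": model, "scaler": scaler,
            "imputer": imputer, "encoder": encoder}


artifacts = toy_artifacts()
clean_air = predict_bucket(artifacts, {"PM2.5": 5.0, "PM10": 20.0})
dirty_air = predict_bucket(artifacts, {"PM2.5": 195.0, "PM10": 180.0})
print(clean_air, dirty_air)
```

With the real project, toy_artifacts would be replaced by unpickling the saved file, and the same predict_bucket logic could serve both the command-line test and the dashboard sidebar.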

How to Run

  1. Install Dependencies

     You'll need the following Python libraries. You can install them using pip:

         pip install pandas numpy scikit-learn streamlit matplotlib seaborn

  2. Run the Training Pipeline

     Before you can use the dashboard, you must run source.py to train the model and create the necessary aqi_model.pkl file:

         python AIR/source.py

     This script will:

     Read AIR/city_hour.csv.
     Process the data and train the model.
     Save the model artifacts to AIR/outputs/aqi_model.pkl.
     Generate visualization plots (like kmeans.png, outlier.png, etc.).

  3. Run the Streamlit Dashboard

     Once the aqi_model.pkl file exists, you can start the web application:

         streamlit run AIR/air_dashboard.py

     This will open the dashboard in your web browser, where you can explore historical data and use the live prediction tool.

  4. (Optional) Test Prediction in Terminal

     You can run forecast_api.py to test the model's prediction on a single sample from your terminal:

         python AIR/forecast_api.py

File Structure

AIR/source.py: The main script for data cleaning, preprocessing, model training, and analysis.
AIR/air_dashboard.py: The Streamlit application file. This script loads the trained model and serves the interactive web dashboard.
AIR/forecast_api.py: A simple script to load the model and test a single prediction.
AIR/city_hour.csv: The raw input data file (required by source.py and air_dashboard.py).
AIR/outputs/aqi_model.pkl: The generated file containing the dictionary of model artifacts: the model, scaler, imputer, and encoder.
AIR/*.png: Image files (e.g., kmeans.png, outlier.png) generated by the visualization section of source.py.
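The kmeans.png and outlier.png files come from the clustering and outlier-detection part of source.py. The sketch below shows what those two steps could look like with scikit-learn, leaving out the plotting. The number of clusters, the contamination rate, and the two-dimensional synthetic data are assumptions chosen for illustration; the README does not state the settings the project uses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic groups of readings in two "pollutant" dimensions,
# plus two extreme points that should stand out as outliers.
low = rng.normal(loc=[30, 40], scale=5, size=(100, 2))
high = rng.normal(loc=[150, 200], scale=5, size=(100, 2))
extreme = np.array([[600.0, 10.0], [5.0, 900.0]])

# Scale features so both dimensions contribute comparably to distances.
X = StandardScaler().fit_transform(np.vstack([low, high, extreme]))

# K-Means assigns every row to one of n_clusters groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Isolation Forest returns -1 for rows it considers outliers and 1 otherwise.
iso = IsolationForest(contamination=0.01, random_state=0)
outlier_flags = iso.fit_predict(X)

print("cluster sizes:", np.bincount(cluster_labels))
print("rows flagged as outliers:", np.flatnonzero(outlier_flags == -1))
```

A scatter plot colored by cluster_labels or outlier_flags would then give images similar in spirit to the generated PNG files.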

About

Air Quality Index: an end-to-end machine learning pipeline to predict Air Quality Index categories. It features a Random Forest model trained on 12 pollutants, automated data preprocessing, outlier detection using Isolation Forest, and an interactive Streamlit dashboard for real-time predictions and historical analysis.
