GitHub - AshvanP23/Air_quality_forecasting: Air Quality Index: An end-to-end machine learning pipeline that predicts Air Quality Index categories. It features a Random Forest model trained on 12 pollutants, automated data preprocessing, outlier detection using Isolation Forest, and an interactive Streamlit dashboard for live predictions and historical analysis.

Air Quality Index (AQI) Prediction Project

Overview

This project analyzes historical air quality data to train a machine learning model capable of predicting the Air Quality Index (AQI) category. It includes scripts for data processing, model training, and a web-based dashboard for live predictions and historical data visualization.

The core of the project is a Random Forest Classifier that predicts the AQI bucket (e.g., "Good", "Moderate", "Poor") based on the concentration of 12 different pollutants.
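As a rough illustration of that idea, the sketch below trains a Random Forest on a 12-column pollutant table and maps the prediction back to a bucket name. It is not taken from source.py: the data is synthetic, the labelling rule is invented so the example has something to learn, and the use of StandardScaler and the chosen hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the pollutant table: 300 rows x 12 pollutant columns.
X = rng.gamma(shape=2.0, scale=30.0, size=(300, 12))

# Hypothetical labelling rule (NOT the project's AQI definition).
labels = np.where(X[:, 0] < 40, "Good", np.where(X[:, 0] < 90, "Moderate", "Poor"))

# Blank out about 5% of the readings to mimic missing sensor values.
X[rng.random(X.shape) < 0.05] = np.nan

imputer = SimpleImputer(strategy="median")  # median imputation
scaler = StandardScaler()                   # assumed scaler type
encoder = LabelEncoder()                    # bucket name <-> integer code

X_prepared = scaler.fit_transform(imputer.fit_transform(X))
y = encoder.fit_transform(labels)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_prepared, y)

# Predict the bucket for one row, reusing the fitted imputer and scaler.
sample = X[:1]
codes = model.predict(scaler.transform(imputer.transform(sample)))
predicted = encoder.inverse_transform(codes)
print(predicted[0])
```

The same fitted imputer, scaler, and encoder have to be reused at prediction time, which is why all of them are kept alongside the model.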

Features

Model Training (source.py):

Loads raw data from city_hour.csv.
Cleans data, imputes missing values using medians, and scales features.
Trains a RandomForestClassifier on the pollutant data.
Performs K-Means clustering and Isolation Forest outlier detection.
Saves the trained model, scaler, imputer, and encoder to outputs/aqi_model.pkl.
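The clustering and outlier-detection step listed above could look roughly like the following. This is a minimal sketch on synthetic data, not the code in source.py; the number of clusters, the contamination rate, and the injected extreme rows are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in: 200 rows x 12 pollutant columns.
X = rng.normal(loc=50.0, scale=10.0, size=(200, 12))
X[:5] += 200.0  # inject a few extreme rows so there is something to flag

X_scaled = StandardScaler().fit_transform(X)

# Group rows into clusters (n_clusters=3 is an assumed value).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X_scaled)

# Flag unusual rows; fit_predict returns -1 for outliers and 1 for inliers.
iso = IsolationForest(contamination=0.05, random_state=42)
flags = iso.fit_predict(X_scaled)
outlier_rows = np.where(flags == -1)[0]

print("rows per cluster:", np.bincount(cluster_ids))
print("rows flagged as outliers:", len(outlier_rows))
```

Both steps are unsupervised, so they run on the pollutant columns alone and do not need the AQI labels.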

Streamlit Dashboard (air_dashboard.py):

Historical Analysis: Provides interactive charts for a selected city, showing AQI over time and the distribution of different pollutants.

Live AQI Prediction: A sidebar tool where you can input 12 pollutant levels (PM2.5, PM10, NO, etc.) and get an instant AQI category prediction from the trained model.

Prediction Test (forecast_api.py): A simple command-line script to test the prediction pipeline with a hardcoded set of pollutant values.
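A prediction test of this kind might be sketched as below. The README states that the saved file holds the model, scaler, imputer, and encoder, but the dictionary key names, the use of the pickle module, the helper predict_category, and the sample values are assumptions. A small stand-in model is trained and written to a temporary directory so the example runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Train a tiny stand-in model so the example does not depend on city_hour.csv.
rng = np.random.default_rng(1)
X = rng.uniform(0, 200, size=(120, 12))
labels = np.where(X[:, 0] < 100, "Good", "Poor")  # invented rule, illustration only

imputer = SimpleImputer(strategy="median").fit(X)
scaler = StandardScaler().fit(imputer.transform(X))
encoder = LabelEncoder().fit(labels)
model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(scaler.transform(imputer.transform(X)), encoder.transform(labels))

# Save all four artifacts in one dictionary (key names are hypothetical).
artifacts = {"model": model, "scaler": scaler, "imputer": imputer, "encoder": encoder}
path = os.path.join(tempfile.mkdtemp(), "aqi_model.pkl")
with open(path, "wb") as f:
    pickle.dump(artifacts, f)


def predict_category(artifact_path, pollutant_values):
    """Load the saved artifacts and predict a category for one sample."""
    with open(artifact_path, "rb") as f:
        loaded = pickle.load(f)
    row = np.asarray(pollutant_values, dtype=float).reshape(1, -1)
    row = loaded["scaler"].transform(loaded["imputer"].transform(row))
    code = loaded["model"].predict(row)
    return loaded["encoder"].inverse_transform(code)[0]


# Hardcoded sample: 12 pollutant values in the training column order.
sample = [35.0] * 12
category = predict_category(path, sample)
print(category)
```

The values must be passed in the same column order that was used for training, since the artifacts only see a plain array.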

How to Run

Install Dependencies

You'll need the following Python libraries. You can install them using pip:

pip install pandas numpy scikit-learn streamlit matplotlib seaborn

Run the Training Pipeline

Before you can use the dashboard, you must run source.py to train the model and create the necessary aqi_model.pkl file.

python AIR/source.py

This script will:

Read AIR/city_hour.csv.
Process the data and train the model.
Save the model artifacts to AIR/outputs/aqi_model.pkl.
Generate visualization plots (like kmeans.png, outlier.png, etc.).

Run the Streamlit Dashboard

Once the aqi_model.pkl file exists, you can start the web application:

streamlit run AIR/air_dashboard.py

This will open the dashboard in your web browser, where you can explore historical data and use the live prediction tool.

(Optional) Test Prediction in Terminal

You can run forecast_api.py to test the model's prediction on a single sample from your terminal.

python AIR/forecast_api.py

File Structure

AIR/source.py: The main script for data cleaning, preprocessing, model training, and analysis.
AIR/air_dashboard.py: The Streamlit application file. This script loads the trained model and serves the interactive web dashboard.
AIR/forecast_api.py: A simple script to load the model and test a single prediction.
AIR/city_hour.csv: The raw input data file (required by source.py and air_dashboard.py).
AIR/outputs/aqi_model.pkl: The (generated) file containing the dictionary of model artifacts: the model, scaler, imputer, and encoder.
AIR/*.png: Image files (e.g., kmeans.png, outlier.png) that are generated by the visualization section of source.py.

