Team Members: Sofia Zanti, Vasilisa Matafonova, Tim Marchenko
This project explores the relationship between GDP growth and air pollution levels, specifically focusing on Suspended Particulate Matter (SPM) over the period from 1700 to 2016.
We aimed to understand how economic development has historically influenced air pollution and whether this relationship can be captured through various machine learning techniques.
- Collected and cleaned historical data (1700–2016)
- Used scatter plots and cubic regressions for visualization
- Calculated correlation coefficients
- Tested multiple machine learning models and identified the best-performing one
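The correlation and cubic-regression steps listed above can be sketched with NumPy alone. This is a minimal illustration, not the project's code. The helper names `pearson_r` and `cubic_fit` are hypothetical, and the GDP/SPM values are synthetic numbers chosen only to produce a hump-shaped curve.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def cubic_fit(x, y):
    """Least-squares cubic regression; returns the fitted polynomial and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    poly = np.poly1d(np.polyfit(x, y, deg=3))
    ss_res = np.sum((y - poly(x)) ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)         # total variation
    return poly, float(1.0 - ss_res / ss_tot)


# Synthetic, illustrative data only (not the project's dataset).
gdp = np.linspace(1.0, 50.0, 40)                 # hypothetical GDP per capita
spm = 20.0 + 5.0 * gdp - 0.1 * gdp ** 2          # hump-shaped SPM curve
r = pearson_r(gdp, spm)
poly, r2 = cubic_fit(gdp, spm)
print(f"Pearson r = {r:.3f}, cubic R^2 = {r2:.3f}")
```

On hump-shaped data like this, the Pearson coefficient stays close to zero even though the cubic fit explains nearly all of the variance. This is one reason a single correlation coefficient can understate a non-linear relationship.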
By the end of this project, we aim to achieve the following:
- Data Visualization: Generate graphs and data frames illustrating historical air quality changes.
- Analysis Report: Document methodologies, results, and insights.
- Research: Understand the problem and gather background information.
- Data Collection: Identify and acquire relevant datasets, adjusting the project scope only if absolutely necessary.
- Data Processing: Clean, structure, and preprocess data (handling missing values, formatting).
- Data Visualization: Use Matplotlib and/or Seaborn to create regression plots (linear and cubic) and other visualizations.
- Report Writing: Document findings, methodologies, and analysis.
- Prediction Modeling: Use historical data to forecast future air quality trends.
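The processing step (aligning two yearly series and handling missing values) might look like the following pandas sketch. It assumes hypothetical column names (`year`, `gdp`, `spm`) and uses linear interpolation between observed years; the project's actual cleaning rules are not specified here, so treat this as one reasonable option rather than the method used.

```python
import numpy as np
import pandas as pd


def merge_and_clean(gdp: pd.DataFrame, spm: pd.DataFrame) -> pd.DataFrame:
    """Align two yearly series on 'year' and fill interior gaps.

    Assumes hypothetical inputs: `gdp` has columns ['year', 'gdp'],
    `spm` has columns ['year', 'spm'].
    """
    # Outer join keeps every year that appears in either source.
    df = pd.merge(gdp, spm, on="year", how="outer").sort_values("year")
    df = df.drop_duplicates(subset="year").set_index("year")
    # Interpolate using the year values as x; only gaps *between* observed
    # points are filled, so leading/trailing NaNs remain NaN.
    df = df.interpolate(method="index", limit_area="inside")
    # Drop years where either series is still missing.
    return df.dropna().reset_index()


# Toy example with made-up values.
gdp = pd.DataFrame({"year": [1700, 1800, 1900, 2000], "gdp": [1.0, 2.0, 4.0, 8.0]})
spm = pd.DataFrame({"year": [1700, 1850, 1900, 2016], "spm": [10.0, np.nan, 30.0, 20.0]})
clean = merge_and_clean(gdp, spm)
print(clean)
```

Interpolating only inside the observed range avoids inventing values before the first or after the last measurement, which matters for sparse early-period records.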
- Matplotlib + Seaborn: Visualization of data
- Pandas: Data manipulation and structuring
- NumPy: Data calculations and predictions
- Testing Metrics: To measure and document model performance
- Google Docs: Report writing and documentation
- GitHub: Version control and project tracking
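The exact testing metrics are not listed above. As an assumption, the sketch below computes three common regression metrics (MAE, RMSE, R²) with NumPy; the function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))                 # average absolute error
    rmse = float(np.sqrt(np.mean(err ** 2)))          # penalizes large errors more
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                        # share of variance explained
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(m)  # MAE = 0.25, R2 = 0.975
```

Reporting several metrics side by side helps when comparing models: RMSE reacts strongly to a few large misses, while MAE does not.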
- Pollution follows the Environmental Kuznets Curve (EKC): it rises with GDP up to a point, then declines
- Linear models are inadequate for capturing this complex, non-linear relationship
- Ensemble models, particularly Random Forest, perform best
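A common way to check for an Environmental Kuznets Curve (assumed here, not taken from the project) is to regress pollution on log GDP and its square, $\text{SPM} = \beta_0 + \beta_1 \ln g + \beta_2 (\ln g)^2$. An inverted U requires $\beta_2 < 0$, and the peak then lies at $g^* = \exp(-\beta_1 / (2\beta_2))$. The sketch below uses synthetic data and a hypothetical helper name.

```python
import numpy as np


def ekc_turning_point(gdp, pollution):
    """Fit pollution ~ b0 + b1*ln(g) + b2*ln(g)^2 and return the peak GDP.

    Returns None when b2 >= 0, i.e. the fit is not an inverted U.
    """
    x = np.log(np.asarray(gdp, dtype=float))
    y = np.asarray(pollution, dtype=float)
    b2, b1, b0 = np.polyfit(x, y, deg=2)   # coefficients, highest power first
    if b2 >= 0:
        return None
    return float(np.exp(-b1 / (2.0 * b2)))  # vertex of the parabola in ln(g)


# Synthetic inverted-U that peaks at GDP = 20 (illustrative only).
gdp = np.linspace(2.0, 200.0, 60)
spm = 100.0 - 15.0 * (np.log(gdp) - np.log(20.0)) ** 2
print(ekc_turning_point(gdp, spm))  # close to 20.0
```

The quadratic specification only tests for a single peak; the cubic regressions mentioned earlier can additionally reveal a second turn.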
- Early Pollution: Coal use. Solution: Switch to solar/wind
- Industrial Era: More coal with GDP growth. Solution: Promote clean tech
- Modern Era: CO2 from cars. Solution: Encourage public transport, bike-friendly cities
- Regulatory Gaps. Solution: Use AI for monitoring & enforcement
- Add more input features to the models
- Apply time-series models
- Tune model hyperparameters
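The last two items can be combined: time-ordered (walk-forward) validation prevents a model from training on "future" years, and a grid search over a hyperparameter picks the setting with the lowest validation error. In this sketch, polynomial degree stands in for the hyperparameter; for a Random Forest one would tune settings such as the number of trees or maximum depth instead. All function names are hypothetical.

```python
import numpy as np


def walk_forward_splits(n, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs that respect time order.

    Each test block comes directly after its (expanding) training window.
    """
    test_size = (n - min_train) // n_splits
    if test_size < 1:
        raise ValueError("not enough observations for the requested splits")
    for k in range(n_splits):
        train_end = min_train + k * test_size
        yield np.arange(train_end), np.arange(train_end, train_end + test_size)


def tune_poly_degree(x, y, degrees, n_splits=4, min_train=20):
    """Pick the polynomial degree with the lowest mean walk-forward RMSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    for deg in degrees:
        errs = []
        for tr, te in walk_forward_splits(len(x), n_splits, min_train):
            coeffs = np.polyfit(x[tr], y[tr], deg)
            pred = np.polyval(coeffs, x[te])
            errs.append(np.sqrt(np.mean((y[te] - pred) ** 2)))
        scores[deg] = float(np.mean(errs))
    best = min(scores, key=scores.get)
    return best, scores


# Years rescaled to [0, 1] keep np.polyfit well-conditioned (synthetic data).
x = np.linspace(0.0, 1.0, 60)
y = 2.0 + 3.0 * x - 4.0 * x ** 2
best, scores = tune_poly_degree(x, y, degrees=[1, 2, 3])
print(best, scores)
```

Unlike a random train/test split, each validation block here lies strictly after its training data, which mirrors how the model would be used to forecast.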