Predict the magnitude of earthquakes using historical earthquake event data.
Accurate magnitude prediction helps assess potential risks, improve disaster preparedness, and support seismological research.
Source: Kaggle – Earthquake Dataset
Target Variable: magnitude
Features used:
- `longitude`, `latitude` – location of the earthquake
- `year`, `month` – time-based features
- `depth`, `sig`, `nst`, `dmin`, `rms`, `gap`, `tsunami` – seismological attributes
- `magType`, `type` – event type indicators
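As a rough illustration of how the feature list could be turned into a model-ready matrix, the sketch below one-hot encodes the two categorical columns and keeps the numeric ones as they are. The sample rows are made-up placeholder values, the helper name `build_xy` is hypothetical, and the choice of one-hot encoding is an assumption; the notebook may encode `magType` and `type` differently.

```python
import pandas as pd

# Column groups taken from the feature list above.
NUMERIC = ["longitude", "latitude", "year", "month", "depth",
           "sig", "nst", "dmin", "rms", "gap", "tsunami"]
CATEGORICAL = ["magType", "type"]
TARGET = "magnitude"

# Made-up placeholder rows; the real data would be read from the Kaggle CSV.
raw = pd.DataFrame({
    "longitude": [10.0, 20.0, 30.0],
    "latitude": [1.0, 2.0, 3.0],
    "year": [2001, 2002, 2003],
    "month": [1, 6, 12],
    "depth": [10.0, 50.0, 100.0],
    "sig": [100, 200, 300],
    "nst": [10, 20, 30],
    "dmin": [0.5, 1.0, 1.5],
    "rms": [0.2, 0.4, 0.6],
    "gap": [30.0, 60.0, 90.0],
    "tsunami": [0, 0, 1],
    "magType": ["mb", "ml", "mb"],
    "type": ["earthquake", "earthquake", "explosion"],
    "magnitude": [4.0, 5.0, 6.0],
})


def build_xy(df):
    """Split a frame into features X (categoricals one-hot encoded) and target y."""
    X = pd.get_dummies(df[NUMERIC + CATEGORICAL], columns=CATEGORICAL)
    y = df[TARGET]
    return X, y


X, y = build_xy(raw)
print(X.columns.tolist())
```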
Performed detailed analysis to understand relationships and data trends:
- Distribution of Magnitudes: Most earthquakes have magnitudes between 3.0 and 6.0.
- Depth vs Magnitude Correlation: Deeper earthquakes show slightly lower magnitudes on average.
- Regional Patterns: Higher activity clusters observed near specific latitude-longitude ranges.
- Temporal Trends: Magnitude patterns analyzed over time using `year` and `month`.
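The checks listed above can be approximated with a few pandas calls. This is a minimal sketch on synthetic data (the depth effect and noise level are invented so the code runs without the dataset), not the notebook's actual analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the earthquake table; values are not real measurements.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "year": rng.integers(2000, 2020, n),
    "month": rng.integers(1, 13, n),
    "depth": rng.uniform(0, 600, n),
})
# Assumed weak negative depth effect, only to mimic the trend described above.
df["magnitude"] = 5.0 - 0.001 * df["depth"] + rng.normal(0, 0.5, n)

# Share of events with magnitude between 3.0 and 6.0.
share_3_to_6 = df["magnitude"].between(3.0, 6.0).mean()

# Pearson correlation between depth and magnitude.
depth_corr = df["depth"].corr(df["magnitude"])

# Mean magnitude per year and per month for temporal trends.
yearly_mean = df.groupby("year")["magnitude"].mean()
monthly_mean = df.groupby("month")["magnitude"].mean()

print(f"share in [3, 6]: {share_3_to_6:.2f}, depth corr: {depth_corr:.2f}")
```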
Key visualizations include:
- Magnitude distribution histogram
- Scatter plot of depth vs magnitude
- Heatmap of feature correlations
- Map of earthquake locations (longitude vs latitude)
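One way to produce the four plots listed above is sketched here with plain matplotlib on synthetic data. The notebook may well use seaborn for the heatmap; the `imshow` version below is a stand-in chosen to keep the example dependency-light.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; not real earthquake records.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "longitude": rng.uniform(-180, 180, n),
    "latitude": rng.uniform(-90, 90, n),
    "depth": rng.uniform(0, 600, n),
    "magnitude": rng.normal(4.5, 0.8, n),
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# 1. Magnitude distribution histogram.
axes[0, 0].hist(df["magnitude"], bins=30)
axes[0, 0].set_title("Magnitude distribution")

# 2. Depth vs magnitude scatter plot.
axes[0, 1].scatter(df["depth"], df["magnitude"], s=8)
axes[0, 1].set_xlabel("depth")
axes[0, 1].set_ylabel("magnitude")

# 3. Heatmap of feature correlations.
corr = df.corr()
im = axes[1, 0].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[1, 0].set_xticks(range(len(corr.columns)))
axes[1, 0].set_xticklabels(corr.columns, rotation=45)
axes[1, 0].set_yticks(range(len(corr.columns)))
axes[1, 0].set_yticklabels(corr.columns)
fig.colorbar(im, ax=axes[1, 0])

# 4. Event locations, colored by magnitude.
axes[1, 1].scatter(df["longitude"], df["latitude"], c=df["magnitude"], s=8)
axes[1, 1].set_xlabel("longitude")
axes[1, 1].set_ylabel("latitude")

fig.tight_layout()
```

Calling `fig.savefig(...)` with a path of your choice writes the figure to disk.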
| Model | RMSE | MAE | R² Score |
|---|---|---|---|
| Gradient Boosting | 0.233 | 0.0299 | 0.9184 |
| Random Forest | 0.238 | 0.0277 | 0.9150 |
| Ridge Regression | 0.268 | 0.130 | 0.8920 |
| Linear Regression | 0.268 | 0.130 | 0.8920 |
| Lasso Regression | 0.272 | 0.131 | 0.8889 |
Best Model: Gradient Boosting Regressor (R² = 0.918)
It achieved the highest R² (0.918) and the lowest RMSE (0.233) of the five models. Random Forest had a slightly lower MAE (0.0277 vs 0.0299).
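For readers who want to reproduce this kind of comparison, the sketch below fits the same five model families and reports RMSE, MAE, and R² on a held-out split. It runs on scikit-learn's synthetic `make_friedman1` data, and the hyperparameters and the 80/20 split are assumptions, so its numbers will not match the table above.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic nonlinear regression data standing in for the earthquake features.
X, y = make_friedman1(n_samples=400, n_features=10, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are illustrative guesses, not the notebook's settings.
models = {
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Ridge Regression": Ridge(alpha=1.0),
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=0.01),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        # RMSE is the square root of the mean squared error.
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
        "R2": float(r2_score(y_test, pred)),
    }

# Print models from lowest to highest RMSE.
for name, m in sorted(results.items(), key=lambda kv: kv[1]["RMSE"]):
    print(f"{name:18s} RMSE={m['RMSE']:.3f} MAE={m['MAE']:.3f} R2={m['R2']:.3f}")
```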
Top influential features for magnitude prediction:
- sig (significance factor)
- dmin (distance to nearest station)
- tsunami
- rms (root mean square of travel time residuals)
- magType
- Actual vs Predicted Magnitude (Scatter Plot)
- Residual Distribution (Prediction Errors)
- Feature Importance (Gradient Boosting)
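A feature-importance ranking like the one above can be read from a fitted `GradientBoostingRegressor` through its `feature_importances_` attribute. In this sketch the synthetic target is deliberately built to depend mostly on `sig`, so the ranking it prints is true by construction and says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic features named after a few columns of the dataset.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "sig": rng.uniform(0, 1500, n),
    "dmin": rng.uniform(0, 10, n),
    "rms": rng.uniform(0, 2, n),
    "latitude": rng.uniform(-90, 90, n),
    "longitude": rng.uniform(-180, 180, n),
})
# Assumed relationship: target driven mainly by sig, weakly by dmin.
y = 3.0 + 0.003 * X["sig"] + 0.05 * X["dmin"] + rng.normal(0, 0.1, n)

model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check.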
- The `sig` feature dominates prediction importance, indicating event significance is a strong indicator of magnitude.
- Geographical coordinates (`latitude`, `longitude`) contribute less, suggesting local variations are less dominant than seismic parameters.
- Ensemble models (Random Forest, Gradient Boosting) outperform linear models due to their ability to capture nonlinear relationships.
The actual vs predicted plot shows strong alignment:
- Points are closely scattered around the diagonal line (ideal prediction).
- Gradient Boosting demonstrates minimal deviation and low residual error.
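An actual-vs-predicted plot and a residual histogram of this kind could be drawn as follows. This is a sketch on synthetic data with default Gradient Boosting settings, which are assumptions rather than the notebook's configuration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic nonlinear target; not real earthquake data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 4.0 + 2.0 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)
residuals = y_test - y_pred  # prediction errors on the held-out split

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Actual vs predicted, with the diagonal marking a perfect prediction.
ax_scatter.scatter(y_test, y_pred, s=10)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax_scatter.plot(lims, lims, linestyle="--", color="gray")
ax_scatter.set_xlabel("actual magnitude")
ax_scatter.set_ylabel("predicted magnitude")

# Residual distribution; a narrow peak near zero indicates low error.
ax_hist.hist(residuals, bins=30)
ax_hist.set_xlabel("residual (actual - predicted)")

fig.tight_layout()
```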
- Gradient Boosting performed best with lowest RMSE (0.23) and highest R² (0.92).
- Event significance (`sig`) and seismic parameters such as `dmin` and `rms` were most influential; geographic location (`longitude`, `latitude`) contributed less.
- Linear models underperformed compared to ensemble methods.
- Language: Python 3.x
- Libraries:
  - pandas, numpy
  - matplotlib, seaborn
  - scikit-learn
- Clone this repository: `git clone https://github.com/<your-username>/earthquake-prediction.git`
- Run the Jupyter Notebook: `jupyter notebook earthquake_magnitude_predictions.ipynb`
