iakcan/crop_production_analysis

Crop Production Analysis in India using Statistical Modeling

This project analyzes agricultural production data in India using statistical methods and modeling techniques to identify key environmental and soil-related factors influencing crop yield.


🚀 Project Overview

Agricultural productivity depends on complex interactions between climate conditions, soil properties, and farming practices.

This project explores crop production data across Indian states to:

  • Identify patterns in agricultural output
  • Analyze the impact of rainfall, temperature, and soil nutrients
  • Build interpretable statistical models for yield prediction

📂 Dataset

The dataset contains:

  • ~100,000 observations
  • Soil nutrients, climate variables, and production metrics

👉 Dataset sample:
View full dataset


🔬 Methodology

Data Preprocessing

  • Removed redundant variables
  • Outlier removal (IQR method)
  • Feature engineering (Year_Index, regions, crop categories)
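The IQR-based outlier step listed above could look roughly like the following sketch. The helper name `remove_outliers_iqr`, the multiplier `k = 1.5`, and the toy column names are assumptions for illustration; the notebook may use different columns or thresholds.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive on both ends by default
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Hypothetical toy data; column names are placeholders, not the real schema.
toy = pd.DataFrame({
    "Rainfall": [100.0, 110.0, 105.0, 95.0, 102.0, 5000.0],
    "Production": [10.0, 12.0, 11.0, 9.0, 10.5, 11.5],
})
cleaned = remove_outliers_iqr(toy, ["Rainfall"])
```

Filtering on several columns at once drops a row if any one of them is out of range, so the row count can shrink quickly as more columns are added.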

Exploratory Data Analysis

Regional Production

[Figure: Production by Region]

  • West region has the highest production
  • Strong regional variation in agricultural output
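A regional comparison like the one above typically comes from a grouped aggregation. The sketch below assumes columns named `Region` and `Production`; both names and the toy values are hypothetical and carry no information about the real data.

```python
import pandas as pd

# Hypothetical toy data with neutral region labels.
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "C", "C"],
    "Production": [500.0, 300.0, 200.0, 150.0, 100.0],
})

# Total production per region, largest first.
production_by_region = (
    toy.groupby("Region")["Production"].sum().sort_values(ascending=False)
)
top_region = production_by_region.index[0]
```

Using `sum()` ranks regions by total output; swapping in `mean()` would instead compare typical production per observation, which can give a different ordering when regions have unequal row counts.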

Climate Analysis

[Figure: Rainfall by Region]

[Figure: Temperature by Region]

  • Rainfall varies significantly across regions
  • Temperature differences influence productivity

Temporal Trends

[Figure: Rainfall vs Production]

[Figure: Temperature vs Production]

  • Higher rainfall is associated with lower production
  • Temperature trends correlate with changes in yield

Crop Type Analysis

[Figure: Crop Production by Crop Type]

  • Food crops dominate total production
  • Spices show the lowest production levels

Correlation Analysis

[Figure: Correlation Heatmap]

  • Strong correlation: Area ↔ Production
  • Weak correlations between climate and nutrients
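The matrix behind a correlation heatmap can be computed with `DataFrame.corr`. This is a minimal sketch on made-up numbers; the column names are assumptions, and the Pearson method is a guess at what the notebook uses.

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only.
toy = pd.DataFrame({
    "Area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Production": [2.1, 3.9, 6.2, 7.8, 10.1],
    "Rainfall": [3.0, 1.0, 4.0, 1.0, 5.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = toy.corr(method="pearson")
area_vs_production = corr.loc["Area", "Production"]
```

The resulting square matrix can be passed to a Matplotlib image plot to render the heatmap.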

📉 Model Development

PCA (Dimensionality Reduction)

[Figure: PCA]

  • 5 components explain ~77% of variance
  • Soil nutrients negatively related to pH
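A statement such as "5 components explain ~77% of variance" comes from the cumulative explained-variance ratio. The sketch below computes that ratio with a NumPy SVD on standardized random data; the function name, the data, and the 0.77 threshold lookup are illustrative assumptions, not the notebook's code.

```python
import numpy as np


def pca_explained_variance(X: np.ndarray) -> np.ndarray:
    """Return the explained-variance ratio of each principal component."""
    # Standardize so every feature contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular values of Z are returned in descending order.
    s = np.linalg.svd(Z, compute_uv=False)
    variances = s ** 2 / (Z.shape[0] - 1)
    return variances / variances.sum()


# Hypothetical data: 200 rows, 8 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)

ratio = pca_explained_variance(X)
cumulative = np.cumsum(ratio)
# Smallest number of components whose cumulative ratio reaches 0.77.
n_components = int(np.searchsorted(cumulative, 0.77) + 1)
```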

Model Results

  • Generalized Linear Model (GLM)
  • Model selection using AIC
  • Best model includes interaction terms
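To illustrate AIC-based selection between a main-effects model and one with an interaction term, here is a sketch that assumes a Gaussian GLM with identity link (equivalent to ordinary least squares). The README does not state the family or link, so that choice, the helper `gaussian_aic`, and the simulated variables are all assumptions.

```python
import numpy as np


def gaussian_aic(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of a Gaussian linear model fitted by least squares."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Log-likelihood at the maximum-likelihood variance estimate rss / n.
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    k = p + 1  # regression coefficients plus the error variance
    return 2 * k - 2 * log_lik


# Simulated data in which the response truly depends on an interaction.
rng = np.random.default_rng(42)
n = 500
rain = rng.normal(size=n)
temp = rng.normal(size=n)
y = 1.0 + 0.5 * rain - 0.3 * temp + 0.8 * rain * temp + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
X_main = np.column_stack([ones, rain, temp])
X_inter = np.column_stack([ones, rain, temp, rain * temp])

aic_main = gaussian_aic(X_main, y)
aic_inter = gaussian_aic(X_inter, y)  # lower AIC indicates the preferred model
```

AIC penalizes each extra parameter by 2, so the interaction model is preferred only when the added term improves the log-likelihood by more than that penalty.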

Model Performance

[Figure: Model Testing]

  • RMSE ≈ 7285.52
  • The model captures general trends, but its predictions still vary considerably around the observed values
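For reference, RMSE is the square root of the mean squared difference between observed and predicted values, so it is expressed in the same units as the target. A minimal sketch with made-up numbers:

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors are 1, 0 and -2, so the result is sqrt(5 / 3).
example = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Because squared errors weight large misses heavily, a few badly predicted observations can dominate the score.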

🧠 Key Insights

  • Climate variability strongly impacts crop production
  • Excess rainfall negatively affects yield
  • Soil nutrients interact with pH
  • Area is the strongest production driver

🛠️ Tech Stack

Python, Pandas, NumPy, Matplotlib


⚠️ Limitations

  • Implicit time dimension (no explicit years)
  • High redundancy in dataset
  • Crop-specific yield differences not normalized

🔮 Future Work

  • Apply ML models (XGBoost, Random Forest)
  • Improve feature engineering
  • Integrate external climate datasets

⚡ Implementation

The model implementation is available in the notebook notebooks/Crop_Production.ipynb.


👩‍💻 Author

Irem Akcan
