Exploratory Data Analysis of Climate and Land-Use Data 🌍

Python Dask Machine Learning License

This repository contains the code and documentation for a project exploring the relationship between climate change, land-use practices, and natural disasters. The study emphasizes Brazil while leveraging global datasets to provide insights into disaster trends and environmental factors.

📋 Summary

  • Objective: Analyze trends and patterns of natural disasters in relation to environmental factors.
  • Datasets:
  • Techniques:
    • Data cleaning and normalization
    • Exploratory data analysis (descriptive statistics and visualizations)
    • Predictive modeling using machine learning
    • Regional analysis focusing on Brazil
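As a rough illustration of the cleaning and normalization technique listed above, the sketch below drops incomplete rows and min-max scales numeric columns with pandas. The column names (`year`, `forest_area`, `temp_change`), the toy values, and the choice of min-max scaling are assumptions made for illustration; the project's actual logic lives in `main.py` and may differ.

```python
import pandas as pd


def clean_and_normalize(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Drop rows with missing values in `columns`, then min-max scale them to [0, 1]."""
    out = df.dropna(subset=columns).copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        # Guard against constant columns, which would otherwise divide by zero.
        out[col] = 0.0 if hi == lo else (out[col] - lo) / (hi - lo)
    return out


# Hypothetical toy data; not taken from the project's datasets.
raw = pd.DataFrame(
    {
        "year": [2000, 2001, 2002, 2003],
        "forest_area": [100.0, 90.0, None, 80.0],
        "temp_change": [0.2, 0.4, 0.5, 0.6],
    }
)
normalized = clean_and_normalize(raw, ["forest_area", "temp_change"])
print(normalized)
```

Min-max scaling keeps every feature in the same range, which is convenient for plotting variables side by side; a z-score would be a reasonable alternative when outliers dominate the range.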

🛠 Project Structure

├── data/
│   ├── raw/                # Raw datasets
│   ├── interim/            # Intermediate datasets
│   └── processed/          # Final processed datasets
├── main.py                 # Main script for data processing and analysis
├── ANALISE_EXPLORATORIA.ipynb  # Jupyter Notebook for analysis and visualizations
├── Exploratory Data Analysis of Climate and Land-Use Data.pdf  # Final report
└── README.md               # Project documentation

🚀 How to Run

Prerequisites

  • Python 3.9+
  • Libraries: Dask, Pandas, NumPy, Matplotlib, Seaborn

Steps

  1. Clone this repository:

    git clone https://github.com/your-username/your-repository.git
    cd your-repository
  2. Install the dependencies:

    pip install -r requirements.txt
  3. Run the main script:

    python main.py
  4. Explore the results in the output files:

    • Normalized data: data/interim/dataConcat_silver.csv
    • Processed data: data/processed/dataConcat_gold.csv
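To get a quick look at the two output files, something like the following sketch could be used. The paths are the ones listed above; the helper name `summarize_output` is hypothetical and not part of the repository, and the columns of each file are not assumed.

```python
from pathlib import Path

import pandas as pd

# Output paths as listed in the steps above.
SILVER_PATH = Path("data/interim/dataConcat_silver.csv")
GOLD_PATH = Path("data/processed/dataConcat_gold.csv")


def summarize_output(path: Path) -> pd.DataFrame:
    """Load one output CSV and return per-column descriptive statistics."""
    df = pd.read_csv(path)
    print(f"{path}: {df.shape[0]} rows, {df.shape[1]} columns")
    return df.describe(include="all")


if __name__ == "__main__":
    for path in (SILVER_PATH, GOLD_PATH):
        if path.exists():
            print(summarize_output(path))
        else:
            # The files only exist after main.py has been run.
            print(f"{path} not found; run main.py first")
```

For files too large to fit in memory, `dask.dataframe.read_csv` accepts the same path and could stand in for `pd.read_csv`, at the cost of calling `.compute()` on the result.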

Optional: Analyze in Jupyter Notebook

Open the ANALISE_EXPLORATORIA.ipynb file to explore the analysis and visualizations interactively.

📝 Key Findings

  • Increasing Disaster Frequency: Natural disasters show a statistically significant upward trend, increasing at a rate of 4.09 events/year (R²=0.37, p=0.0003).
  • Brazil Focus: The analysis identified deforestation rates and forest area as critical predictors of temperature change.
  • Best Predictive Model: Random Forest achieved the best R² score with minimal prediction error.
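The disaster-frequency trend above is reported as a rate with an R² value, which suggests a straight-line fit of yearly event counts against the year. The sketch below shows one way such a slope and R² could be computed with NumPy. It assumes ordinary least squares, which the README does not confirm, and the counts are synthetic values for illustration only, not the project's data.

```python
import numpy as np


def linear_trend(years, counts):
    """Return (slope, intercept, r_squared) of an ordinary least-squares line."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(counts, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals**2))      # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation
    return float(slope), float(intercept), 1.0 - ss_res / ss_tot


# Synthetic counts for illustration only; these are not the project's data.
years = [2000, 2001, 2002, 2003, 2004]
counts = [10, 14, 13, 19, 21]
slope, intercept, r2 = linear_trend(years, counts)
print(f"slope = {slope:.2f} events/year, R^2 = {r2:.2f}")
```

The slope is read as events per year, and R² as the share of year-to-year variation the line explains. A p-value like the one quoted above would need an additional significance test; `scipy.stats.linregress` returns one alongside the slope, though SciPy is not among the listed prerequisites.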

🧠 Conclusions

The findings emphasize the importance of regional environmental policies and climate resilience strategies. Data science plays a crucial role in deriving actionable insights for sustainable decision-making.

👥 Contributors

💻 Languages Used

Jupyter Notebook, Python
