This project aims to predict extreme weather events with a focus on wildfires. Wildfires are significant natural disasters that cause extensive damage to the environment, property, and human life. Accurate prediction of wildfires can help in planning and mitigating these risks.
We use a publicly available wildfire dataset. The same dataset can be accessed from either of two sources:
- National Interagency Fire Occurrence - Sixth Edition (1992-2020) on Data.gov
- Kaggle - US Wildfire Records (6th Edition)
- Download the dataset from one of the sources above.
- Save the dataset as `data.csv` in the project directory.
- `pandas` for data manipulation and analysis
- `matplotlib` for plotting and visualization
- `seaborn` for statistical data visualization
We start with exploratory data analysis (EDA) to understand the structure and characteristics of the dataset.
- Load the dataset into a Pandas DataFrame.
- Display the first few rows.
- Print a summary of the dataset.
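The three loading steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the helper name `load_and_summarize` is hypothetical, and it assumes the file was saved as `data.csv` as described earlier.

```python
import io

import pandas as pd


def load_and_summarize(source="data.csv"):
    """Load the CSV and return the DataFrame, its first rows, and a text summary.

    `source` may be a file path or an open text buffer.
    (Hypothetical helper, introduced here only for illustration.)
    """
    # low_memory=False reads the file in one pass, so columns with mixed
    # values get a single consistent dtype.
    df = pd.read_csv(source, low_memory=False)

    # First five rows, for a quick look at the columns.
    head = df.head()

    # DataFrame.info() writes to a buffer instead of returning a string,
    # so capture it in a StringIO.
    buffer = io.StringIO()
    df.info(buf=buffer)
    summary = buffer.getvalue()

    return df, head, summary
```

Calling `df, head, summary = load_and_summarize("data.csv")` and then `print(head)` and `print(summary)` shows the first rows along with column names, dtypes, and non-null counts.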
- Handle missing values.
- Remove duplicate rows.
- Ensure data consistency by converting columns to appropriate data types.
- Convert `DISCOVERY_DATE` and `CONT_DATE` to datetime format.
- Convert
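A possible sketch of these cleaning steps is shown below. The README does not say how missing values are handled, so the policy here (drop rows without a discovery date, keep rows without a containment date) is an assumption, as is the hypothetical helper name `clean_fires`. It also assumes the two date columns hold strings that `pd.to_datetime` can parse.

```python
import pandas as pd


def clean_fires(df, date_columns=("DISCOVERY_DATE", "CONT_DATE")):
    """Return a cleaned copy of df: duplicates removed and date columns parsed.

    (Hypothetical helper; the missing-value policy below is an assumption.)
    """
    # Remove rows that are exact duplicates of an earlier row.
    cleaned = df.drop_duplicates().copy()

    for column in date_columns:
        if column in cleaned.columns:
            # errors="coerce" turns unparseable values into NaT instead of raising.
            cleaned[column] = pd.to_datetime(cleaned[column], errors="coerce")

    # Assumed policy: a record without a discovery date cannot be placed in
    # time, so drop it. Missing containment dates are left as NaT.
    if "DISCOVERY_DATE" in cleaned.columns:
        cleaned = cleaned.dropna(subset=["DISCOVERY_DATE"])

    return cleaned.reset_index(drop=True)
```

Returning a copy rather than modifying the input keeps the raw DataFrame available for comparison, for example to check how many rows each step removed.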
- Fire Count by Year: Analyzed the yearly trend of fire counts from 1992 to 2020.
- Fire Count by State: Identified the states with the highest number of fires.
- Fire Size Classification: Examined the distribution of fires based on size classification.
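The three summaries above could be computed along these lines. The column names `FIRE_YEAR`, `STATE`, and `FIRE_SIZE_CLASS` are assumptions about the dataset's schema (they are not named in this README), and both helper functions are hypothetical.

```python
import pandas as pd


def fire_counts(df, column):
    """Count fires per distinct value of `column`, ordered by that value."""
    return df[column].value_counts().sort_index()


def top_states(df, n=10):
    """Return the n states with the most fires, largest first.

    Assumes a `STATE` column; adjust the name if the schema differs.
    """
    return df["STATE"].value_counts().head(n)
```

For example, `fire_counts(df, "FIRE_YEAR")` gives the yearly trend and `fire_counts(df, "FIRE_SIZE_CLASS")` the size distribution. Either result is a pandas Series, so `.plot(kind="bar")` draws it with matplotlib.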