Storm Stoppers

Introduction: According to NOAA storm data for the United States, storms directly or indirectly killed 978 people in 2021. Storms take many forms, including tornadoes, thunderstorms, hail, and high wind. These are only a few of the storm types that affect people, yet they can devastate communities and cause deaths and injuries. Storms occur year-round, and both their frequency and their type vary by location. Using the NOAA dataset for the U.S., we wanted to analyze whether we could predict future storms based on the region in which they appear. Using trends in storm type, latitude and longitude, and number of deaths, we aim to estimate when and where these storms are most likely to occur. We are also interested in predicting how many deaths and injuries these storms may cause.

Assumptions used in Data Cleaning: We cleaned our data using Python. Cleaning required several assumptions about which columns could safely be removed. In the location data, we assumed that location columns other than latitude, longitude, and specific region were unnecessary, since they conveyed similar or identical information. The fatalities dataset was mostly clean, but we removed several of the additional date columns. The details dataset had the most information and required the most work. We had to requantify the property damage column so that all values used the same units. Additionally, we removed several variables that duplicated ones present in the other datasets. Once the cleaning was complete, we joined all three datasets on the event identification number.
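As an illustration of the property damage step, here is a minimal sketch of converting damage strings to a single unit (dollars). It assumes the raw values are strings with an optional K, M, or B suffix (for example "10.00K"), which is how NOAA Storm Events files typically encode damage. The function name `parse_damage` and the choice to treat blank values as missing are hypothetical and may differ from what DataClean.py does.

```python
from typing import Optional

# Assumed suffix multipliers; the README does not list the units encountered.
_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(raw: Optional[str]) -> Optional[float]:
    """Convert a damage string such as '10.00K' to a dollar amount.

    Returns None for missing or blank values (an assumption of this sketch).
    """
    if raw is None:
        return None
    text = raw.strip().upper()
    if not text:
        return None
    suffix = text[-1]
    if suffix in _MULTIPLIERS:
        number = text[:-1]
        if not number:
            # A bare suffix with no number is treated as missing.
            return None
        return float(number) * _MULTIPLIERS[suffix]
    # No suffix: the value is assumed to already be in dollars.
    return float(text)
```

Applied to every value in the damage column, a function like this leaves a numeric column in one unit, which can then be used directly in a regression or summed by region.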

Methodology: For the analysis, we moved to RStudio and followed multiple approaches. We began with an exploratory data analysis to get familiar with our variables.

From there we attempted a linear regression to see whether relationships among the variables could predict month or deaths, using year, month, deaths, and event type as candidate variables. We used forward selection to choose predictors. Forward selection starts with no predictors, adds the candidate variable that most improves the model's fit, and repeats until no remaining variable improves it. From that analysis, the strongest relationship we found was in predicting deaths from month and year. For predicting month, the selection retained no predictors. When checking the conditions for our linear model of deaths, we see that linearity, zero mean, independence, and constant variance appear to be satisfied. Normality shows some left skew, but the residuals are otherwise well fit by a normal distribution. Although we found a correlation between deaths and these variables, we cannot claim causation, since variables outside the dataset may also contribute to deaths.
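The forward selection procedure described above can be sketched in Python as follows. This is an illustration only: our analysis was done in R, and the README does not state which selection criterion was used, so the use of AIC here is an assumption. The function names `aic` and `forward_select` are hypothetical.

```python
import numpy as np


def aic(y, X):
    """AIC (up to an additive constant) of an ordinary least squares fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * X.shape[1]


def forward_select(y, candidates):
    """Greedy forward selection by AIC.

    candidates maps a predictor name to a 1-D array the same length as y.
    Returns the names of the selected predictors in the order they were added.
    """
    n = len(y)
    intercept = np.ones((n, 1))
    selected = []
    remaining = list(candidates)
    best = aic(y, intercept)  # start from the intercept-only model
    while remaining:
        # Score each remaining candidate when added to the current model.
        scores = []
        for name in remaining:
            cols = [candidates[c] for c in selected + [name]]
            X = np.column_stack([intercept] + cols)
            scores.append((aic(y, X), name))
        score, name = min(scores)
        if score >= best:
            break  # no candidate improves the criterion
        best = score
        selected.append(name)
        remaining.remove(name)
    return selected
```

Calling `forward_select(deaths, {"month": month, "year": year})` with NumPy arrays would return whichever of the two predictors lower the AIC. An empty list means the intercept-only model was preferred, which corresponds to the case where no predictors are retained.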

In addition to the modeling, we wanted to visualize some of the trends we noticed. We loaded our dataset into Power BI and built an interactive view to see what trends a more user-friendly visualization tool could draw out.

Challenges: We faced two challenges in working with our dataset. The first was a lack of accurate documentation for the variables in the datasets. The second was that the data was split across three separately documented datasets, whose variables often overlapped or somewhat contradicted one another.

Repository files:
DataClean.py
README.md
analysis.py
cleandata.Rmd
details2000.csv
details2001.csv
details2002.csv
details2003.csv
details2004.csv
details2005.csv
details2006.csv
details2007.csv
details2008.csv
details2009.csv
details2010.csv
details2011.csv
details2012.csv
details2013.csv
details2014.csv
details2015.csv
details2016.csv
details2017.csv
details2018.csv
details2019.csv
details2020.csv
details2021.csv
details2022.csv
fatalities2000.csv
fatalities2001.csv
fatalities2002.csv
fatalities2003.csv
fatalities2004.csv
fatalities2005.csv
fatalities2006.csv
fatalities2007.csv
fatalities2008.csv
fatalities2009.csv
fatalities2010.csv
fatalities2011.csv
fatalities2012.csv
fatalities2013.csv
fatalities2014.csv
fatalities2015.csv
fatalities2016.csv
fatalities2017.csv
fatalities2018.csv
fatalities2019.csv
fatalities2020.csv
fatalities2021.csv
fatalities2022.csv
fats.csv
locations2000.csv
locations2001.csv
locations2002.csv
locations2003.csv
locations2004.csv
locations2005.csv
locations2006.csv
locations2007.csv
locations2008.csv
locations2009.csv
locations2010.csv
locations2011.csv
locations2012.csv
locations2013.csv
locations2014.csv
locations2015.csv
locations2016.csv
locations2017.csv
locations2018.csv
locations2019.csv
locations2020.csv
locations2021.csv
locations2022.csv
locs.csv

shefali-pai/cdc2022
