- Project Overview
- Project Software Stack
- Run the Project
- File Structure
- Software Requirements
- Conclusion
- Links
The motivation behind this project is to classify disaster messages into categories. Through a web app, the user can input a new disaster message and get classification results in several categories. With this classification, help can be organized in an efficient way.
In the “Disaster Response Pipeline” project, I will apply data engineering and machine learning to analyze disaster data provided by Figure Eight and Udacity and build an ML classifier model that classifies disaster messages from social media and news.
The 'data' directory contains real messages that were sent during disaster events. I will create a machine learning pipeline to categorize these events so that the messages can be routed to the appropriate disaster relief agencies.
The project data includes 26,248 messages, each with a unique id. The ML model categorizes each message into one or more of 36 categories.
This project will include a web app where an emergency worker can input a new message and get classification results in several categories. The web app will also display visualizations of the data.
The software stack of this project contains three main parts:
The file /data/process_data.py contains the data cleaning pipeline (a sketch of these steps follows the list):
- Loads the 'disaster_messages' and 'disaster_categories' datasets
- Merges the two datasets into one
- Cleans the data in the combined data frame
- Stores the data in the SQLite database “DisasterResponse.db”
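For illustration, here is a minimal sketch of these four steps. It assumes the project's CSV layout (a shared `id` column and a single semicolon-separated `categories` string) and a SQLite table name of 'DisasterResponse'; the authoritative implementation lives in /data/process_data.py.

```python
import pandas as pd
from sqlalchemy import create_engine

# Load both datasets and merge them on the shared 'id' column
messages = pd.read_csv('data/disaster_messages.csv')
categories = pd.read_csv('data/disaster_categories.csv')
df = messages.merge(categories, on='id')

# Split the single 'categories' string ("related-1;request-0;...")
# into 36 separate 0/1 columns
cats = df['categories'].str.split(';', expand=True)
cats.columns = [value.split('-')[0] for value in cats.iloc[0]]
for column in cats.columns:
    cats[column] = cats[column].str[-1].astype(int)

# Replace the raw column, drop duplicates, and store in SQLite
df = pd.concat([df.drop(columns='categories'), cats], axis=1)
df = df.drop_duplicates()
engine = create_engine('sqlite:///data/DisasterResponse.db')
df.to_sql('DisasterResponse', engine, index=False, if_exists='replace')
```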
The file /models/train_classifier.py contains the machine learning pipeline (a sketch follows the list):
- Loads data from the SQLite database “DisasterResponse.db”
- Splits the data into train and test data sets
- Builds a text processing and machine learning pipeline
- Trains and tunes a model using GridSearchCV
- Outputs classification metrics on the test set
- Exports the final model as a pickle file
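For illustration, a minimal sketch of such a text-processing and ML pipeline. The table name, the assumption that the category columns start at the fifth column, the random-forest estimator, and the small parameter grid are all placeholders; the authoritative version is in /models/train_classifier.py.

```python
import pickle
import pandas as pd
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load the cleaned data from the SQLite database
engine = create_engine('sqlite:///data/DisasterResponse.db')
df = pd.read_sql_table('DisasterResponse', engine)
X = df['message']
Y = df.iloc[:, 4:]  # assumed: the 36 category columns

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

# Text processing + multi-output classifier in one pipeline
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier())),
])

# Tune a small parameter grid with GridSearchCV
parameters = {'clf__estimator__n_estimators': [50, 100]}
model = GridSearchCV(pipeline, param_grid=parameters, cv=3)
model.fit(X_train, Y_train)

# Report metrics on the test set and export the final model
Y_pred = model.predict(X_test)
print(classification_report(Y_test, Y_pred, target_names=list(Y.columns)))
with open('models/classifier.pkl', 'wb') as f:
    pickle.dump(model, f)
```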
Running the start command (see section 3, "Run the Project") from the app directory will start the web app. Users can enter a query, i.e., a request message sent during a natural disaster, e.g., "We need more water and food in New York!".
Screenshot 1
The app classifies the newly entered text message into categories so that the appropriate relief agency can be contacted for help.
Starting with the ETL process, three steps are necessary to get the web app in place and use the tool.
Screenshot 2
Run the following commands in the project's root directory to set up your database and model.
To run the ETL pipeline that cleans the data and stores it in the database:

```
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
```

The first two arguments are the input data files, and the third argument is the SQLite database in which the cleaned data is saved. The ETL pipeline is implemented in process_data.py.
DisasterResponse.db already exists in the project's root directory, but the above command will still run and replace the file with the same information.
After the data cleaning process, run the following command from the project's root directory to train and save the ML classifier:

```
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
```

This uses the cleaned data to train the ML model, improves the model with grid search, and saves it to a pickle file (classifier.pkl).
classifier.pkl already exists, but the above command will still run and replace the file with new information.
After data cleaning and creation of the ML model, the model can be used to classify new messages directly in the web app interface. Go to the app directory and run the following command to start the web app:

```
python run.py
```
Go to http://0.0.0.0:3001/
Here you can enter messages and get classification results for them.
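For orientation, a simplified sketch of how run.py might wire the pieces together. The route name, template fields, table name, and relative paths below are assumptions, not the app's confirmed layout:

```python
import pickle
import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the cleaned data and the trained model once at startup
engine = create_engine('sqlite:///../data/DisasterResponse.db')
df = pd.read_sql_table('DisasterResponse', engine)
with open('../models/classifier.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/go')
def go():
    # Read the user's message from the query string and classify it
    query = request.args.get('query', '')
    labels = model.predict([query])[0]
    # Map the 36 predicted 0/1 labels back to their category names
    classification_results = dict(zip(df.columns[4:], labels))
    return render_template('go.html', query=query,
                           classification_result=classification_results)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3001, debug=True)
```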
Screenshot 3 "Frontend"
Screenshot 4 "Backend"
```
.
├── app
│   ├── run.py ------------------------# FLASK FILE THAT RUNS APP
│   └── templates
│       ├── go.html -------------------# CLASSIFICATION RESULT PAGE
│       └── master.html ---------------# MAIN PAGE OF WEB APP
├── data
│   ├── disaster_categories.csv -------# DATA TO PROCESS
│   ├── disaster_messages.csv ---------# DATA TO PROCESS
│   └── process_data.py ---------------# PERFORMS ETL PROCESS
├── images ----------------------------# PLOTS AND SCREENSHOTS
├── models
│   ├── classifier.pkl ----------------# ML MODEL
│   └── train_classifier.py -----------# PERFORMS CLASSIFICATION TASK
└── DisasterResponse.db ---------------# DATABASE TO SAVE CLEANED DATA
```
The project uses Python 3.7 and the following additional libraries:
- pandas
- numpy
- sys
- time
- collections
- json
- re
- warnings
- operator
- pickle
- pprint
- flask
- nltk
- plotly
- scikit-learn
- SQLAlchemy
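A minimal way to install the third-party dependencies with pip (versions are not pinned here; the standard-library modules in the list above ship with Python):

```
pip install pandas numpy flask nltk plotly scikit-learn SQLAlchemy
```

Depending on the tokenizer used in train_classifier.py, NLTK corpora such as punkt or wordnet may also need to be downloaded via nltk.download().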
Classification metrics for the ML model:
- Accuracy: 0.95 (accuracy is the fraction of samples predicted correctly)
- Recall: 0.64 (also known as sensitivity; the fraction of positive events that were predicted correctly)
- F1-score: 0.70 (the harmonic mean of recall and precision; a higher score means a better model)
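As a quick reference, these metrics can be reproduced for any prediction with scikit-learn; the label arrays below are made-up toy values, not the project's results:

```python
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Toy binary labels for a single category, purely illustrative
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]

print('Accuracy:', accuracy_score(y_true, y_pred))  # fraction predicted correctly
print('Recall:  ', recall_score(y_true, y_pred))    # fraction of positives found
print('F1-score:', f1_score(y_true, y_pred))        # harmonic mean of precision and recall
```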
You will see the exact values directly in the command line after the model is trained by grid search.
Although the accuracy metric is high, the recall value is poor. This ML model is not yet ready for production, or the input data is not sufficient. For this showcase, however, the data and the ML model worked well!
Screenshot 5
Classification Metrics:
Link to understanding Data Science Classification Metrics in Scikit-Learn in Python
Link regarding fine-tuning a classifier in scikit-learn
Help:
Ideas, Help and Templates:
This project was completed as part of the Udacity Data Scientist Nanodegree. Code templates and data were provided by Udacity. The data was originally sourced by Udacity from Figure Eight.