This project is part of the Udacity Nanodegree Course: Data Scientist - Data Engineering
The project analyzes disaster data from Appen, which contains real messages sent during disaster events, and builds a model for an API that classifies disaster messages.
You can clone the repository to your machine with:
git clone git@github.com:Ivana-DS/Disaster-Response-Pipeline.git
Python version: 3.8.10
You also need to install the packages listed in the requirements file:
pip install -r requirements.txt
1. ETL Pipeline
- Loads the messages and categories datasets
- Merges the two datasets
- Cleans the data
- Stores it in a SQLite database
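A minimal sketch of these ETL steps, assuming the usual layout of the Appen datasets (a shared `id` column and a semicolon-separated `categories` column) and the table name `disaster_messages` taken from the training command further below; the actual implementation lives in `data/process_data.py`:

```python
import pandas as pd
from sqlalchemy import create_engine

# load and merge the two datasets on their shared id column
messages = pd.read_csv('disaster_messages.csv')
categories = pd.read_csv('disaster_categories.csv')
df = messages.merge(categories, on='id')

# split the single 'categories' string into one binary column per category
categories = df['categories'].str.split(';', expand=True)
categories.columns = [value.split('-')[0] for value in categories.iloc[0]]
for column in categories:
    categories[column] = categories[column].str[-1].astype(int)

# clean: replace the raw categories column and drop duplicates
df = pd.concat([df.drop(columns='categories'), categories], axis=1)
df = df.drop_duplicates()

# store the clean data in a SQLite database (table name is an assumption)
engine = create_engine('sqlite:///DisasterResponse.db')
df.to_sql('disaster_messages', engine, index=False, if_exists='replace')
```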
2. ML Pipeline
- Loads data from the SQLite database
- Splits the dataset into training and test sets
- Trains and tunes a model using GridSearchCV
- Outputs results on the test set
- Exports the final model as a pickle file
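A condensed sketch of the same steps, assuming a TF-IDF text representation, a random-forest multi-output classifier, and a small illustrative parameter grid; the real pipeline and grid are defined in `models/train_classifier.py`:

```python
import pickle
import pandas as pd
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# load the cleaned data from the SQLite database (table and column names are assumptions)
engine = create_engine('sqlite:///../data/DisasterResponse.db')
df = pd.read_sql_table('disaster_messages', engine)
X = df['message']
Y = df.drop(columns=['id', 'message', 'original', 'genre'])

# split the dataset into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# build a text-processing + classification pipeline and tune it with grid search
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier())),
])
parameters = {'clf__estimator__n_estimators': [50, 100]}
model = GridSearchCV(pipeline, param_grid=parameters, cv=3)
model.fit(X_train, Y_train)

# report results on the test set and export the final model as a pickle file
Y_pred = model.predict(X_test)
for i, column in enumerate(Y.columns):
    print(column, classification_report(Y_test[column], Y_pred[:, i]))
with open('classifier.pkl', 'wb') as f:
    pickle.dump(model, f)
```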
3. Flask Web App
- Classifies new messages into several categories
- Displays visualizations of the data
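Under the hood, the app loads the pickled model and predicts the categories of whatever message is typed into the form. A minimal sketch of that classification step (the file paths, table name, and number of non-category columns are assumptions; the actual routes are in `app/run.py`):

```python
import pickle
import pandas as pd
from sqlalchemy import create_engine

# load the trained model and recover the category names from the database
with open('../models/classifier.pkl', 'rb') as f:
    model = pickle.load(f)
engine = create_engine('sqlite:///../data/DisasterResponse.db')
df = pd.read_sql_table('disaster_messages', engine)
category_names = df.columns[4:]  # assumes the first 4 columns are id, message, original, genre

# classify a new message and list the predicted categories
message = "We need water and medical supplies after the earthquake"
prediction = model.predict([message])[0]
print([category for category, flag in zip(category_names, prediction) if flag == 1])
```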
- app
| - template
| |- master.html # main page of web app
| |- go.html # classification result page of web app
|- run.py # Flask file that runs app
- data
|- disaster_categories.csv # data to process
|- disaster_messages.csv # data to process
|- process_data.py # ETL Pipeline
|- DisasterResponse.db # database to save clean data to
- models
|- train_classifier.py # ML Pipeline
|- classifier.pkl # saved model
|- evaluation_report.txt # classification reports for all categories
- readme_images
|- classification_example.png
|- data_visualisation_examlpe.png
- .gitignore
- README.md
- requirements.txt
1. ETL Pipeline
To run the ETL Pipeline, go to the data directory:
cd data
and run:
python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
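Optionally, you can sanity-check the generated database from the same directory (the table name `disaster_messages` is an assumption based on the training command below):

```python
import sqlite3

# list the tables in the new database and count the stored rows
con = sqlite3.connect('DisasterResponse.db')
print(con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())
print(con.execute("SELECT COUNT(*) FROM disaster_messages").fetchone())
con.close()
```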
2. ML Pipeline
To run the ML Pipeline and train the model, go to the models directory:
cd models
and run:
python train_classifier.py ../data/DisasterResponse.db disaster_messages classifier.pkl
This step takes about 1 hour and 40 minutes because the parameters are optimized with grid search. If you just want to try the Flask Web App, feel free to use the pretrained model instead.
Download the pretrained classifier from the release assets page https://github.com/Ivana-DS/Disaster-Response-Pipeline/releases/tag/pretrained_classifier and put it in the following location:
models/classifier.pkl
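Whether you trained the model yourself or downloaded the pretrained one, a quick way to check that the file is usable (assuming it was saved with the standard `pickle` module and wraps a `GridSearchCV` object) is:

```python
import pickle

# load the saved classifier and show the parameters found by grid search, if any
with open('models/classifier.pkl', 'rb') as f:
    model = pickle.load(f)
print(type(model))
print(getattr(model, 'best_params_', 'no grid-search parameters stored'))
```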
3. Flask Web App
To run the Flask Web App and classify messages, go to the app directory:
cd app
and run:
python run.py
and then open http://127.0.0.1:3000 in your browser
If you are interested in the model performance I got on the test data, you can find it in models/evaluation_report.txt.
The file contains the classification reports for all categories.

