
Disaster Impact Database

The Disaster Impact Database is an open‑source initiative that collects, cleans, and harmonises disaster‑related data from multiple global providers. It produces a unified, analysis‑ready dataset that supports the Anticipatory Action Framework and broader humanitarian research.

Supported sources: GDACS · GLIDE · CERF · EM‑DAT · IDMC · IFRC‑DREF · Disaster Charter


Key Goals

  • Automated downloads of raw data where APIs exist.
  • Headless scraping for semi‑automated sources.
  • Normalisation into consistent, JSON‑schema‑validated tables (under development; see the validation sketch after this list).
  • Export of tidy CSVs for each provider.
  • Event matching across feeds by hazard, country, and date (±7 days).
  • Exploratory analytics that quantify overlap and uniqueness across sources.
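
The validation step is still under development; purely as an illustration, the sketch below checks a harmonised record against a minimal JSON schema using the jsonschema package. The field names are assumptions for the example, not the project's actual unified schema.

# Validate one harmonised event record against a minimal JSON schema.
# Field names here are illustrative; the unified schema is a work in progress.
from jsonschema import ValidationError, validate

EVENT_SCHEMA = {
    "type": "object",
    "required": ["event_id", "hazard", "iso3", "start_date"],
    "properties": {
        "event_id": {"type": "string"},
        "hazard": {"type": "string"},
        "iso3": {"type": "string", "pattern": "^[A-Z]{3}$"},
        "start_date": {"type": "string"},
    },
}

record = {"event_id": "GDACS-1012345", "hazard": "flood",
          "iso3": "PHL", "start_date": "2024-07-01"}

try:
    validate(instance=record, schema=EVENT_SCHEMA)
except ValidationError as err:
    print(f"Record rejected: {err.message}")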

Prerequisites

Requirement      Purpose
Python ≥ 3.10    Core language
Poetry           Virtual‑env and dependency management
Firefox          Headless scraping engine
GeckoDriver      WebDriver interface for Firefox
Unix‑like shell  Tested on macOS, Linux, and Windows WSL2

Tip — GLIDE scraper profile
The GLIDE portal occasionally shows CAPTCHAs. Edit src/glide/data_acquisition_scrape.py and set FIREFOX_PROFILE to the absolute path of a persistent Firefox profile (e.g. ~/.mozilla/firefox/abcd1234.default-release).
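
For reference, a minimal sketch of how such a profile can be passed to headless Firefox with Selenium 4 follows; the profile path is a placeholder, and the repository's scraper may wire this up differently.

# Launch headless Firefox with a persistent profile via geckodriver.
# Replace the profile path with your own; this mirrors the FIREFOX_PROFILE
# idea above but is not the project's exact code.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")
options.add_argument("-profile")
options.add_argument("/home/user/.mozilla/firefox/abcd1234.default-release")

driver = webdriver.Firefox(options=options)
try:
    driver.get("https://glidenumber.net/")
    print(driver.title)
finally:
    driver.quit()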


Quick Start

# 0  Install Poetry (skip if you already have it)
$ curl -sSL https://install.python-poetry.org | python3 -

# 1  Clone the repo and enter it
$ git clone https://github.com/mapaction/disaster-impact.git
$ cd disaster-impact

# 2  Create the virtual environment and install dependencies
$ make .venv

# 3  Activate the environment (opens a sub‑shell inside the virtual‑env)
$ poetry shell

All Makefile targets below assume the environment is active.


Data Acquisition

Dataset           Access Method    Historical Coverage  Makefile Target          Status
GDACS             REST API         2000 – present       make run_gdacs_download  Automated
IDMC IDU          REST API         2016 – present       make run_idus_download   Automated
GLIDE             Headless scrape  1930 – present       make run_glide_scrape    Semi‑automated
CERF              Headless scrape  2006 – present       make run_cerf_scrape     Semi‑automated
Disaster Charter  Headless scrape  2000 – present       make run_charter_scrape  Semi‑automated
EM‑DAT            Manual download  2000 – present       —                        Manual
IFRC DREF         Manual download  2018 – present       —                        Manual

Raw files are stored in data/<provider>/, preserving provenance and update timestamps.
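
For a flavour of what the automated targets do, the sketch below fetches the public GDACS RSS feed and stores the raw payload with a UTC timestamp, following the data/<provider>/ layout; the actual src/gdacs module may use a different endpoint or file naming.

# Download the public GDACS RSS feed and keep the raw file with a
# timestamp, mirroring the data/<provider>/ convention described above.
from datetime import datetime, timezone
from pathlib import Path

import requests

GDACS_RSS = "https://www.gdacs.org/xml/rss.xml"

out_dir = Path("data/gdacs")
out_dir.mkdir(parents=True, exist_ok=True)

response = requests.get(GDACS_RSS, timeout=30)
response.raise_for_status()

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
(out_dir / f"gdacs_rss_{stamp}.xml").write_bytes(response.content)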


Processing & Analysis Workflow

  1. Load raw datasets (see notebooks/process_sandbox.ipynb).
  2. Pre‑process: select columns, rename, parse dates, harmonise hazard labels.
  3. Match events by hazard, ISO‑3 country code, and date window (±7 days); a minimal sketch follows this list.
  4. Generate analytics: bar charts of retention/overlap and a chord diagram of pairwise matches.
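
A minimal sketch of the matching rule in step 3, assuming pandas frames with illustrative column names (hazard, iso3, date — not necessarily the project's actual schema):

# Join two feeds on hazard and ISO‑3 code, then keep pairs whose dates
# fall within ±window_days of each other.
import pandas as pd

def match_events(left: pd.DataFrame, right: pd.DataFrame,
                 window_days: int = 7) -> pd.DataFrame:
    candidates = left.merge(right, on=["hazard", "iso3"],
                            suffixes=("_a", "_b"))
    delta = (candidates["date_a"] - candidates["date_b"]).abs()
    return candidates[delta <= pd.Timedelta(days=window_days)]

gdacs = pd.DataFrame({"hazard": ["flood"], "iso3": ["PHL"],
                      "date": pd.to_datetime(["2024-07-01"])})
glide = pd.DataFrame({"hazard": ["flood"], "iso3": ["PHL"],
                      "date": pd.to_datetime(["2024-07-05"])})
print(match_events(gdacs, glide))  # one match: dates are 4 days apart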

The notebook is fully reproducible; rerun it after refreshing data to obtain an updated master table.


Project Structure

.
├── data/                # Raw datasets (one sub‑folder per provider)
├── docs/                # Additional documentation
├── notebooks/           # Jupyter notebooks for ETL and analysis
├── src/                 # Source code modules
│   ├── cerf/
│   ├── disaster_charter/
│   ├── emdat/
│   ├── gdacs/
│   ├── glide/
│   ├── idmc/
│   ├── ifrc_dref/
│   ├── unified/         # Unified schema & helpers
│   └── utils/
├── static_data/         # Reference tables (e.g., country & hazard codes)
├── tests/               # Unit & integration tests
├── Makefile             # Automation commands
├── pyproject.toml       # Project metadata
└── README.md            # This file

Common Make Targets

Target                 Action
.venv                  Bootstrap the Poetry virtual‑env
test                   Run the test suite (pytest)
lint                   Run ruff and mypy checks
clean                  Remove virtual‑env, caches & temporary files
run_<source>_download  Refresh an automated feed (see table above)
run_<source>_scrape    Run a semi‑automated scraper (see table above)

Limitations & Roadmap

  • Matching logic is intentionally conservative; multi‑country or slow‑onset events may be under‑linked.
  • ETL pipeline is notebook‑driven; migration to a parametric workflow (e.g., Airflow, Dagster) is planned.
  • Manual feeds (EM‑DAT, IFRC‑DREF) need scripted ingestion once stable APIs become available.
  • A funding gap stalled development after the HNPW 2025 demo; contributions are welcome to resume full ETL work.

Contributing

  1. Fork the repository and create a feature branch from main.
  2. Commit logical, well‑documented changes.
  3. Ensure make test and make lint pass.
  4. Open a pull request; the CI pipeline will run automatically.

License

Distributed under the GNU GPL v3. See the LICENSE file for details.


Author

Evangelos Diakatos · ediakatos@mapaction.org


Happy coding & stay safe!
