A small Python tool that:
- Calls the Country Leaders API to retrieve a list of countries and their past/present political leaders.
- Visits each leader’s Wikipedia page and grabs the first paragraph of their biography.
- Cleans that paragraph (removes footnote markers, pronunciation text, extra whitespace) with regular expressions.
- Saves everything as nicely formatted JSON (leaders.json).
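
The cleaning and saving steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `clean_paragraph` and `save_leaders`, the exact regular expressions, and the shape of the `leaders` dictionary are all assumptions made for the example.

```python
import json
import re


def clean_paragraph(text):
    """Rough cleanup of a Wikipedia paragraph (illustrative patterns only)."""
    # Drop footnote markers such as [1], [23] or [a].
    text = re.sub(r"\[(?:\d+|[a-z])\]", "", text)
    # Drop slash-delimited pronunciation text such as /bəˈrɑːk/.
    text = re.sub(r"/[^/\n]+/", "", text)
    # Tidy the "( ; " left behind when a pronunciation is removed.
    text = re.sub(r"\(\s*;\s*", "(", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def save_leaders(leaders, path="leaders.json"):
    """Write the data as indented JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(leaders, f, ensure_ascii=False, indent=2)


raw = "Barack Obama[1]  (/bəˈrɑːk/ ; born 1961) is an\nAmerican politician.[a]"
print(clean_paragraph(raw))
# Barack Obama (born 1961) is an American politician.
```

Real Wikipedia paragraphs vary a lot between languages, so patterns like these usually need tuning against actual pages.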
- Fast – reuses one requests.Session() connection for all Wikipedia calls
- Cookie-aware – automatically refreshes API cookies if they expire
- One-command run – python leaders_scraper.py fetches → scrapes → cleans → saves
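
One way the cookie refresh could work is sketched below. This is a guess at the general pattern, not the repository's implementation: the helper name `get_with_cookie_refresh`, the `cookie_url` parameter, and the assumption that the API signals an expired cookie with HTTP status 403 are all hypothetical. The `session` argument is expected to behave like a `requests.Session()`, which stores cookies from responses automatically.

```python
def get_with_cookie_refresh(session, url, cookie_url, params=None, expired_status=403):
    """GET `url`; if the cookie looks expired, refresh it once and retry.

    `expired_status` is an assumption about how the API reports an
    expired cookie; adjust it to whatever the real API returns.
    """
    response = session.get(url, params=params)
    if response.status_code == expired_status:
        # Calling the cookie endpoint lets the session pick up a fresh cookie.
        session.get(cookie_url)
        # Retry the original request once with the refreshed cookie.
        response = session.get(url, params=params)
    return response
```

Passing the same session object to every call, including the Wikipedia requests, is what allows the underlying connection to be reused.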

- Clone the repository:
  git clone https://github.com/evivelentza/wikipedia-scraper.git
  cd wikipedia-scraper
- (Recommended) Create and activate a virtual environment:
  python3 -m venv clean_venv
  source clean_venv/bin/activate
- Install dependencies:
  pip install beautifulsoup4
  pip install requests

To run the scraper from the command line:
python3 leaders_scraper.py