This project is archived because FBRef no longer makes xG data available on its website. Several things would need to change to get this repository working from its current state. I have fixed it locally, but the changes are messy and likely to change often, so I don't feel it is worth updating until there is more clarity on the future of FBRef. The web pages and the master database can be downloaded from this Google Drive link. The matches included in the dataset are all valid matches (e.g. not abandoned) in the date ranges below:
The rest of this README.md is unaltered from how it was when the project wasn't archived.
A tool to compile player and team statistics from FBRef match reports into a local SQLite database.
- Clone the repository:

  ```bash
  git clone https://github.com/ChrisMusson/FBRef_DB.git
  cd FBRef_DB
  ```
- Install dependencies with uv:

  If you don't already have `uv` installed:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

  Then, create and activate a virtual environment:

  ```bash
  uv venv .venv
  source .venv/bin/activate
  ```

  And install the dependencies from `pyproject.toml`:

  ```bash
  uv pip install .
  ```
- Download match data:

  Download the `.zip` files for the leagues you want from this Google Drive link. (Last updated 15 Jan 2026)
- Extract files:

  Unzip the downloaded files into the `web_pages/` directory. This should result in a folder structure like:

  ```
  web_pages/
  ├── Premier_League/
  │   ├── 2017-2018/
  │   ├── 2018-2019/
  │   └── ...
  └── Ligue_1/
      └── ...
  ```
- Edit `main.py` to specify your leagues:

  By default, `main.py` is set to process the 2024-2025 season for the top 6 European leagues. You can modify the `competitions` and `seasons` lists to include or exclude the leagues and seasons you want to process. For example:

  ```python
  competitions = ["Premier_League", "Bundesliga", "La_Liga", "Ligue_1", "Serie_A", "Primeira_Liga"]
  seasons = ["2023-2024", "2024-2025"]

  main("master.db", competitions=competitions, seasons=seasons)
  ```
- Run the script:

  ```bash
  python main.py
  ```

  This will:

  - Check FBRef for newly played matches in your selected leagues
  - Add new match pages to the `web_pages/` folders
  - Parse the HTML pages
  - Populate/update the `master.db` SQLite database
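Because the setup expects match pages under `web_pages/<competition>/<season>/`, a quick pre-flight check can catch a missing or misnamed folder before a long run. The sketch below is a hypothetical helper, not part of FBRef_DB; it assumes the competition and season names configured in `main.py` match the extracted folder names exactly, as in the example layout above.

```python
from pathlib import Path


def missing_season_dirs(root, competitions, seasons):
    """Return (competition, season) pairs that have no folder under root.

    Hypothetical helper: assumes the web_pages/<competition>/<season>/
    layout; it is not part of FBRef_DB itself.
    """
    root = Path(root)
    missing = []
    for competition in competitions:
        for season in seasons:
            if not (root / competition / season).is_dir():
                missing.append((competition, season))
    return missing


if __name__ == "__main__":
    # Same lists as the main.py example; adjust to match your own config.
    competitions = ["Premier_League", "Bundesliga", "La_Liga", "Ligue_1", "Serie_A", "Primeira_Liga"]
    seasons = ["2023-2024", "2024-2025"]
    for competition, season in missing_season_dirs("web_pages", competitions, seasons):
        print(f"No folder for {competition} {season}")
```

An empty result means every configured competition/season pair has a folder; it does not check whether the folders contain any match pages.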
Use DB Browser for SQLite to explore `master.db` or `premier_league.db`.
A visual overview of the database schema is available here:
https://dbdiagram.io/d/62221bf854f9ad109a5e298c
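Besides DB Browser for SQLite, Python's built-in `sqlite3` module can give a quick overview of what a database file contains. The sketch below makes no assumptions about the schema: it reads the table names from `sqlite_master` and counts the rows in each. `table_row_counts` is a hypothetical helper, and it opens the database read-only so that a mistyped path raises an error instead of silently creating an empty file.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in the database.

    Hypothetical helper for exploring master.db; it does not rely on any
    particular FBRef_DB schema.
    """
    # Open read-only via a URI so a wrong path raises an error
    # instead of creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Identifiers cannot be bound as parameters, so quote them safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    db = Path("master.db")
    if db.exists():
        for table, count in table_row_counts(db).items():
            print(f"{table}: {count}")
```

The same function works on `premier_league.db`; the table names it prints can then be compared against the schema diagram.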
- The `master.db` file in this repo contains all of the top 6 European leagues.
- A `premier_league.db` file (Premier League only) is available in the Google Drive link.
