Event Dates: February 27–28, 2026
Location: Northeastern University / Network Science Institute
Supported by: PySport
This repository provides starter code and instructions for the Soccer Data Analytics Hackathon. You'll work with IMPECT Open Data containing 306 German Bundesliga matches from the 2023/24 season to tackle one of two challenge prompts.
Recommend an optimal starting eleven and/or substitution plan to maximize team cohesion and ball progression. Build a player-to-player pass network, analyze network structure, and compare alternative lineups with clear visualizations.
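As a starting point for the pass-network prompt, here is a minimal sketch that uses only the standard library. The position labels and pass list are made up for illustration, and the function names (`build_pass_network`, `weighted_degree`, `density`) are hypothetical helpers, not part of the starter code. In practice the pairs would come from one team's completed pass events.

```python
from collections import Counter

# Hypothetical completed passes as (passer, receiver) pairs.
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CB", "CM"), ("CM", "ST"),
    ("CM", "CB"), ("CB", "GK"), ("CM", "ST"),
]


def build_pass_network(passes):
    """Count passes on each directed (passer, receiver) edge."""
    return Counter(passes)


def weighted_degree(network):
    """Return (passes made, passes received) counters per player."""
    out_deg, in_deg = Counter(), Counter()
    for (passer, receiver), n in network.items():
        out_deg[passer] += n
        in_deg[receiver] += n
    return out_deg, in_deg


def density(network):
    """Share of possible directed player pairs with at least one pass."""
    players = {p for edge in network for p in edge}
    possible = len(players) * (len(players) - 1)
    return len(network) / possible if possible else 0.0


network = build_pass_network(passes)
out_deg, in_deg = weighted_degree(network)
net_density = density(network)
```

The same edge counts can be loaded into a `networkx.DiGraph` as edge weights if you want centrality or clustering measures when comparing lineups.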
Define an interpretable attacking or defensive metric using event data. Create a metric definition, produce a leaderboard comparing players, present a case study, and discuss limitations.
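One possible shape for the metric prompt is a per-90 rate with a minimum-minutes filter, sketched below. The metric ("progressive passes per 90"), the 900-minute threshold, and the player totals are all illustrative assumptions and not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class PlayerTotals:
    name: str
    minutes: int
    progressive_passes: int  # hypothetical season total


def per_90(count, minutes):
    """Scale a raw count to a rate per 90 minutes played."""
    return 90 * count / minutes


def leaderboard(players, min_minutes=900):
    """Rank eligible players by per-90 rate, highest first."""
    eligible = [p for p in players if p.minutes >= min_minutes]
    rows = [
        (p.name, round(per_90(p.progressive_passes, p.minutes), 2))
        for p in eligible
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


players = [
    PlayerTotals("Player A", 2700, 180),
    PlayerTotals("Player B", 1800, 150),
    PlayerTotals("Player C", 300, 40),  # excluded: too few minutes
]
board = leaderboard(players)
```

The minutes threshold keeps small samples such as Player C from topping the table. Where you set that cut-off is a limitation worth discussing in your write-up.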
!pip install "kloppy>=3.18.0" polars pyarrow

Open getting-started.ipynb to see examples of:
- Loading matches and squad data
- Filtering for specific event types (passes, shots)
- Transforming coordinate systems
- Exporting to Polars/Pandas DataFrames
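The notebook covers coordinate transformation with the installed libraries. To show what such a transformation does, the sketch below writes out the arithmetic by hand. It assumes a pitch-centred source system in metres on a 105 x 68 pitch, which may not match the provider's actual convention, so check the data documentation before relying on it. Both function names are hypothetical.

```python
def centered_to_unit(x, y, length=105.0, width=68.0):
    """Map pitch-centred metres to the unit square, origin at one corner.

    Assumes x runs along the pitch length and y along its width.
    """
    return (x + length / 2) / length, (y + width / 2) / width


def flip_direction(x, y):
    """Mirror unit-square coordinates so the team attacks the other way."""
    return 1 - x, 1 - y


centre_spot = centered_to_unit(0.0, 0.0)
corner = centered_to_unit(-52.5, -34.0)
mirrored = flip_direction(0.25, 0.75)
```

Normalizing every match to a single attacking direction before aggregating helps keep pass maps and zone counts comparable across halves and matches.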
SoccerImpectHackathon/
├── getting-started.ipynb # Tutorial notebook
├── environment.yml # Conda environment
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
├── LICENSE # MIT License
├── README.md # This file
└── CONTRIBUTING.md # Contribution guidelines
- Slide deck (PDF, max 8 slides) with clear visualizations
- GitHub repository with:
  - Clean, reproducible code
  - README.md explaining your approach
  - environment.yml or requirements.txt
  - Open-source license (MIT or Apache 2.0)
| Milestone | Date |
|---|---|
| Release & Data Primer | Monday, November 3, 2025 |
| Registration Deadline | Wednesday, December 31, 2025 |
| Checkpoint (draft slides/repo) | Monday, February 2, 2026 |
| Final Work Session & Judging | Friday, February 27, 2026 |
| Industry Talks & Awards | Saturday, February 28, 2026 |
- Python: kloppy, polars, pandas, numpy, mplsoccer, databallpy, networkx
- R: tidyverse, igraph
- Any language is acceptable as long as your work is reproducible
- IMPECT Open Data is for non-commercial use only
- Cite all sources appropriately
- If using AI tools, document where and how they were used
  - For example: "This README was generated with the help of Claude Sonnet 4.5"
- Be transparent about limitations in your methodology
- Problem framing & soccer context (10 pts)
- Data engineering & correctness (15 pts)
- Methodology quality (15 pts)
- Validation & robustness (15 pts)
- Results & insight (15 pts)
- Communication & visualization (15 pts)
- Reproducibility & ethics (15 pts)
Questions? Email northeasternsportsanalytics@gmail.com
Good luck and happy hacking! ⚽📊