
This repository houses the starter files and template README for the Northeastern University 2025-26 Soccer Data Hackathon.


eaegerber/SoccerImpectHackathon

 
 


Soccer Data Analytics Hackathon - Getting Started

Event Dates: February 27–28, 2026
Location: Northeastern University / Network Science Institute
Supported by: PySport

Overview

This repository provides starter code and instructions for the Soccer Data Analytics Hackathon. You'll work with IMPECT Open Data containing 306 German Bundesliga matches from the 2023/24 season to tackle one of two challenge prompts.

Challenge Prompts (Choose One)

Option A: Starting Eleven Lineup Construction

Recommend an optimal starting eleven and/or substitution plan to maximize team cohesion and ball progression. Build a player-to-player pass network, analyze network structure, and compare alternative lineups with clear visualizations.
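As a rough illustration of the pass-network step, the sketch below counts passes per directed passer-to-receiver pair and derives a weighted in/out degree for each player. It uses only the standard library. The sample passes, the player labels, and the helper names `build_pass_network` and `weighted_degree` are hypothetical and not part of the starter code.

```python
from collections import Counter

# Hypothetical sample of completed passes as (passer, receiver) pairs.
# In practice these would be extracted from the event data.
passes = [
    ("GK", "CB1"), ("CB1", "CM"), ("CM", "ST"),
    ("CB1", "CM"), ("CM", "CB1"), ("CM", "ST"), ("GK", "CB1"),
]


def build_pass_network(pass_pairs):
    """Count passes per directed (passer, receiver) edge."""
    return Counter(pass_pairs)


def weighted_degree(network):
    """Sum passes played (out) and passes received (in) for each player."""
    out_deg, in_deg = Counter(), Counter()
    for (passer, receiver), n in network.items():
        out_deg[passer] += n
        in_deg[receiver] += n
    return out_deg, in_deg


network = build_pass_network(passes)
out_deg, in_deg = weighted_degree(network)

# List the edges from most to least frequent.
for (passer, receiver), n in network.most_common():
    print(f"{passer} -> {receiver}: {n}")
```

The edge counts can then be handed to a graph library such as networkx (for example as weighted edges of a directed graph) to compute centrality or clustering measures when comparing lineups.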

Option B: Transparent Player Valuation Metric

Define an interpretable attacking or defensive metric using event data. Create a metric definition, produce a leaderboard comparing players, present a case study, and discuss limitations.
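One possible shape for such a metric, shown purely as an example, is "progressive passes per 90 minutes". The sketch below assumes a pass counts as progressive when it moves the ball at least 10 m toward the opponent's goal along the x-axis. That threshold, the field names, and the sample data are all assumptions made for illustration, not definitions from the hackathon brief.

```python
from collections import defaultdict

# Assumed threshold in metres; pick and justify your own definition.
PROGRESSIVE_MIN_GAIN_M = 10.0

# Hypothetical pass records; x increases toward the opponent's goal.
passes = [
    {"player": "A", "start_x": 0.0, "end_x": 15.0},
    {"player": "A", "start_x": 10.0, "end_x": 12.0},
    {"player": "B", "start_x": -20.0, "end_x": -5.0},
    {"player": "B", "start_x": 5.0, "end_x": 30.0},
    {"player": "B", "start_x": 30.0, "end_x": 20.0},
]
minutes_played = {"A": 90, "B": 45}


def progressive_passes_per_90(passes, minutes_played,
                              min_gain=PROGRESSIVE_MIN_GAIN_M):
    """Return (player, rate) pairs sorted from highest to lowest rate."""
    counts = defaultdict(int)
    for p in passes:
        if p["end_x"] - p["start_x"] >= min_gain:
            counts[p["player"]] += 1
    # Normalise by playing time so substitutes and starters are comparable.
    rates = {
        player: 90 * counts[player] / minutes
        for player, minutes in minutes_played.items()
        if minutes > 0
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


leaderboard = progressive_passes_per_90(passes, minutes_played)
for player, rate in leaderboard:
    print(f"{player}: {rate:.2f} progressive passes per 90")
```

Keeping the definition to a single threshold and a per-90 normalisation makes the metric easy to explain. It also makes the limitations easy to state, for instance that players with few minutes produce unstable per-90 rates.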

Quick Start

1. Installation (run inside a Jupyter notebook)

!pip install "kloppy>=3.18.0" polars pyarrow

2. Explore the Notebook

Open getting-started.ipynb to see examples of:

  • Loading matches and squad data
  • Filtering for specific event types (passes, shots)
  • Transforming coordinate systems
  • Exporting to Polars/Pandas DataFrames
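To make the coordinate-transformation item concrete, the sketch below shows the arithmetic behind one common conversion: mapping centre-origin coordinates in metres onto a normalized 0-to-1 pitch. It assumes a 105 x 68 m pitch with the origin at the centre spot, so check the coordinate system your loaded dataset actually reports. The function name `centered_to_normalized` is hypothetical. The notebook relies on kloppy's own transformation utilities rather than hand-written code like this.

```python
# Assumed pitch dimensions in metres; verify against your dataset's metadata.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0


def centered_to_normalized(x, y, length=PITCH_LENGTH_M, width=PITCH_WIDTH_M):
    """Map centre-origin metre coordinates to a [0, 1] x [0, 1] pitch."""
    return (x + length / 2) / length, (y + width / 2) / width


# The centre spot maps to the middle of the normalized pitch.
print(centered_to_normalized(0.0, 0.0))
```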

Project Structure

SoccerImpectHackathon/
├── getting-started.ipynb        # Tutorial notebook
├── environment.yml              # Conda environment
├── requirements.txt             # Python dependencies
├── .gitignore                   # Git ignore rules
├── LICENSE                      # MIT License
├── README.md                    # This file
└── CONTRIBUTING.md              # Contribution guidelines

Deliverables (Due Friday, February 27, 2026)

  1. Slide deck (PDF, max 8 slides) with clear visualizations
  2. GitHub repository with:
    • Clean, reproducible code
    • README.md explaining your approach
    • environment.yml or requirements.txt
    • Open-source license (MIT or Apache 2.0)

Timeline

| Milestone | Date |
|---|---|
| Release & Data Primer | Monday, November 3, 2025 |
| Registration Deadline | Wednesday, December 31, 2025 |
| Checkpoint (draft slides/repo) | Monday, February 2, 2026 |
| Final Work Session & Judging | Friday, February 27, 2026 |
| Industry Talks & Awards | Saturday, February 28, 2026 |

Suggested Tools

  • Python: kloppy, polars, pandas, numpy, mplsoccer, databallpy, networkx
  • R: tidyverse, igraph
  • Any language is acceptable as long as your work is reproducible

Resources

License & Ethics

  • IMPECT Open Data is for non-commercial use only
  • Cite all sources appropriately
  • If using AI tools, document where and how they were used
    • For example: This README was generated with the help of Claude Sonnet 4.5
  • Be transparent about limitations in your methodology

Judging Criteria (100 points)

  • Problem framing & soccer context (10 pts)
  • Data engineering & correctness (15 pts)
  • Methodology quality (15 pts)
  • Validation & robustness (15 pts)
  • Results & insight (15 pts)
  • Communication & visualization (15 pts)
  • Reproducibility & ethics (15 pts)

Contact

Questions? Email northeasternsportsanalytics@gmail.com


Good luck and happy hacking! ⚽📊

License: MIT Python 3.11+
