Howard Hamilton edited this page Jan 22, 2016 · 2 revisions

Welcome to the Marcotti-Light wiki!

Marcotti-Light is the new name of the light version of the Football Match Result Database, a relational database schema developed by Soccermetrics to create football (soccer) match databases.

Marcotti-Light captures full-time match results and, where applicable, penalty kick shootout results for friendly matches or league, knockout, and hybrid (league+knockout) competitions. It even tracks administrative point deductions. These data are used to enable and support research activities for the benefit of the football analytics community.

The Name

Yes, the Marcotti data schemas are named in honor of football pundit Gabriele Marcotti. The project has nothing to do with him, but he did appreciate the gesture.

Why Marcotti-Light?

Analytics projects depend on data, and collecting and preprocessing that data takes up 60-80% of a typical project. As project scope grows more complicated, the challenge of collecting and wrangling data becomes more daunting and painful. This data schema project grew out of a desire to collect football data once and access it multiple times and in multiple ways.

History of Marcotti and Marcotti-Light

The Marcotti schema was originally called the Football Match Results Database. It was created in 2011 and was refined and extended over the following years. The data models defined by the schema served as the foundation for the Soccermetrics API products, and an analytics library was written to interact with the models.

Marcotti-Light was created to support the former ResultsPage website as a stripped-down version of Marcotti that contained full-time results of teams in various types of competitions. It was open-sourced in November 2015 (link here).

The Current Marcotti-Light Schema

The original data schema consisted of scripts that defined tables and views in raw SQL. These scripts created two schemas: one for matches involving club teams, and another for matches involving national teams.
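To give a feel for the raw SQL approach, here is a minimal sketch using Python's built-in `sqlite3` module. The table, column, and view names (`teams`, `matches`, `match_results`) and the team names are made up for illustration; they are not taken from the original scripts.

```python
import sqlite3

# Hypothetical tables and view; the original Marcotti scripts are not reproduced here.
SCHEMA = """
CREATE TABLE teams (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE matches (
    id           INTEGER PRIMARY KEY,
    home_team_id INTEGER NOT NULL REFERENCES teams(id),
    away_team_id INTEGER NOT NULL REFERENCES teams(id),
    home_goals   INTEGER NOT NULL,
    away_goals   INTEGER NOT NULL
);
-- A view that resolves team ids to names for easier querying.
CREATE VIEW match_results AS
SELECT h.name AS home_team, a.name AS away_team,
       m.home_goals, m.away_goals
FROM matches AS m
JOIN teams AS h ON h.id = m.home_team_id
JOIN teams AS a ON a.id = m.away_team_id;
"""


def build_database():
    """Create an in-memory database and run the schema script against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
conn.executemany(
    "INSERT INTO teams (id, name) VALUES (?, ?)",
    [(1, "Home FC"), (2, "Away United")],
)
conn.execute(
    "INSERT INTO matches (home_team_id, away_team_id, home_goals, away_goals) "
    "VALUES (1, 2, 2, 1)"
)
rows = conn.execute(
    "SELECT home_team, away_team, home_goals, away_goals FROM match_results"
).fetchall()
```

With this style, a separate script of this kind has to be written and maintained for each schema, so definitions shared by the club and national team databases end up duplicated.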

The current data schema makes use of the SQLAlchemy database library to map the database tables to Python classes that define the corresponding data models. This allows us to build a collection of data models that are common to club and national team schemas. It also permits the creation of base models with common attributes and methods that are then inherited by other models.
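The idea of shared base models can be sketched as follows. This is an illustration only, assuming SQLAlchemy 1.4 or later; the class names (`MatchMixin`, `ClubMatch`, `NationalMatch`), columns, and the `result()` method are hypothetical and do not reproduce the actual Marcotti-Light models.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

# One declarative base per schema, so each schema gets its own metadata.
ClubBase = declarative_base()
NationalBase = declarative_base()


class MatchMixin:
    """Hypothetical attributes and methods common to both schemas."""

    id = Column(Integer, primary_key=True)
    home_goals = Column(Integer, nullable=False)
    away_goals = Column(Integer, nullable=False)

    def result(self):
        """Return 'home', 'away', or 'draw' from the full-time score."""
        if self.home_goals > self.away_goals:
            return "home"
        if self.home_goals < self.away_goals:
            return "away"
        return "draw"


class ClubMatch(MatchMixin, ClubBase):
    """Club schema model: inherits the common columns, adds club fields."""

    __tablename__ = "club_matches"
    home_team = Column(String, nullable=False)
    away_team = Column(String, nullable=False)


class NationalMatch(MatchMixin, NationalBase):
    """National team schema model: same common columns, country fields."""

    __tablename__ = "national_matches"
    home_country = Column(String, nullable=False)
    away_country = Column(String, nullable=False)


club_match = ClubMatch(home_team="Home FC", away_team="Away United",
                       home_goals=2, away_goals=1)
national_match = NationalMatch(home_country="Country A", away_country="Country B",
                               home_goals=0, away_goals=0)
```

Both models pick up `home_goals`, `away_goals`, and `result()` from the mixin, while each keeps its own table and metadata.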

Using SQLAlchemy to define the data schema allows us to offload low-level read/write operations to the library. It also makes it easier to write test suites for these data models.
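As a rough sketch of what this looks like in practice, the example below writes and reads a record through a SQLAlchemy session backed by an in-memory SQLite database, in the form of a small test. It assumes SQLAlchemy 1.4 or later; the `Match` model and the helper names are hypothetical and are not the project's actual models or test suite.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Match(Base):
    """Hypothetical model used only for this illustration."""

    __tablename__ = "matches"
    id = Column(Integer, primary_key=True)
    home_team = Column(String, nullable=False)
    away_team = Column(String, nullable=False)
    home_goals = Column(Integer, nullable=False)
    away_goals = Column(Integer, nullable=False)


def make_session():
    """Create a fresh in-memory database and return a session bound to it."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)  # SQLAlchemy emits the CREATE TABLE statements
    return sessionmaker(bind=engine)()


def test_match_round_trip():
    """Write a match, read it back, and check the stored score."""
    session = make_session()
    session.add(Match(home_team="Home FC", away_team="Away United",
                      home_goals=2, away_goals=1))
    session.commit()  # the INSERT is generated by the library
    stored = session.query(Match).filter_by(home_team="Home FC").one()
    assert (stored.home_goals, stored.away_goals) == (2, 1)
    session.close()


test_match_round_trip()
```

Because every test can build its own throwaway database, the models can be exercised without touching a production server.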