119 commits
fd9fa44
gitignore
Rob-Sligter Apr 23, 2025
e29e412
This commit contains a f1 dataset from kaggle and explores some relat…
Rob-Sligter May 1, 2025
b7f2f68
sorted plots to show relation between average normalized laptime comp…
Rob-Sligter May 1, 2025
732fa66
this commit contains the code where the training and test datasets ar…
Rob-Sligter May 1, 2025
05eda4d
this commit contains a trained neural network on some basic features …
Rob-Sligter May 1, 2025
f3df08d
created PipFile with dependencies
Mario-04 May 2, 2025
6003478
tested environment: Success!
Mario-04 May 2, 2025
411963e
Installed matplotlib
Mario-04 May 2, 2025
e08b10c
insalled: Keras, tensorflow, scikit-learn
Mario-04 May 2, 2025
81c6ef0
first PCA Boii
Mario-04 May 2, 2025
8e94347
moved all .csv files to the data directory
Mario-04 May 4, 2025
69667bb
Track CSV files with Git LFS
Mario-04 May 4, 2025
76faddf
Removed Git LFS tracking for .psd files
Mario-04 May 4, 2025
5fb5344
Delete train_data.csv
Mario-04 May 4, 2025
0981d67
fixed pathing issue
Mario-04 May 4, 2025
33d0a02
notebook organisation
Mario-04 May 4, 2025
cf01efd
moved the dataset class to a new python file
Mario-04 May 4, 2025
b21e713
renamed train_network to train_model
Mario-04 May 4, 2025
6635a09
moved train_model to root directory
Mario-04 May 4, 2025
a3c7645
moved training_data to directory for preprocessing
Mario-04 May 4, 2025
e2239e9
renamed project_name to f1_predictor
Mario-04 May 4, 2025
1643037
created experimental directory
Mario-04 May 4, 2025
c86a4d5
moved plots.py to experimental
Mario-04 May 4, 2025
c67d1fc
Update README.md project file structure
Mario-04 May 4, 2025
2eefda9
Delete gitignore
Mario-04 May 4, 2025
34ccc06
This commit contains the proper feature calculation code including th…
Rob-Sligter May 4, 2025
9af6400
This commit contains the proper feature calculation code including th…
Rob-Sligter May 4, 2025
20be780
This commit contains the untested code to manage the models as propos…
Rob-Sligter May 8, 2025
3135960
this commit contains code to train and test a model, requiremnts.txt.…
Rob-Sligter May 9, 2025
cb2c9a1
Fixed feature importance xgboost
Rob-Sligter May 9, 2025
d79c38f
quick test
Rob-Sligter May 9, 2025
11b17e7
this code contains a new model new features and an optimized mlp
Rob-Sligter May 9, 2025
202805c
improved feature generation and pairplot
Rob-Sligter May 13, 2025
abb850d
Merge pull request #2 from 1-million-weed/rob-development
Mario-04 May 15, 2025
2df5e90
added tensorboard to mlp
Rob-Sligter May 15, 2025
55c9962
auto run tensorboard
Rob-Sligter May 15, 2025
6ddbb3d
Merge pull request #3 from 1-million-weed/Rob-Tensor-Board
Mario-04 May 15, 2025
2a8e766
This commit contains the driver elo calculator from the kaggle websit…
Rob-Sligter May 16, 2025
57e44c0
Complete remove of Git LFS metadata and CSV pointers
Mario-04 May 16, 2025
7181c8e
Complete remove of Git LFS metadata and CSV pointers
Mario-04 May 16, 2025
7766a05
This commit contains the code for calculating the drivers elo in expe…
Rob-Sligter May 18, 2025
696b0ee
added team points feature
Rob-Sligter May 18, 2025
124646b
This commit contains; A new way of interacting with the code base via…
Rob-Sligter May 18, 2025
a539d66
Merge branch 'development' of https://github.com/1-million-weed/Appli…
Mario-04 May 19, 2025
e4531a9
Merge pull request #4 from 1-million-weed/Rob-Tensor-Board
Mario-04 May 19, 2025
8d463b6
quickly made an api and streamlit app. Also added a docker file. Unte…
Rob-Sligter May 20, 2025
3f2d8e8
hot fix to make config work with pipeline
Rob-Sligter May 20, 2025
0a42d31
Merge pull request #5 from 1-million-weed/rob_app
Rob-Sligter May 21, 2025
d6934eb
Proper naming of eval. Also added a dockerignore
Rob-Sligter May 21, 2025
6825a97
first unit test for _convert_time_to_milliseconds
Mario-04 May 21, 2025
de42abb
test constructor and driver points passed
Mario-04 May 21, 2025
f7d72dc
added multilayerregression. and requirement.
Rob-Sligter May 22, 2025
0a67025
thoroughly tested min_max normalisation
Mario-04 May 22, 2025
984e066
split by year to prevent leakage
Rob-Sligter May 23, 2025
402b29d
update unit tests
Mario-04 May 23, 2025
6d4e581
Some methods were static
Mario-04 May 25, 2025
283405e
DatasetLoader test done
Mario-04 May 25, 2025
64d6b16
Merge branch 'mario-unit-tests' into rob_app
Mario-04 May 25, 2025
74df40c
added sphinx docstrings and type hints
fromDLA May 26, 2025
329c6d8
This commit contains the elo calculation code from kaggle in a more r…
Rob-Sligter May 27, 2025
3b27da2
Merge branch 'development' into rob_app
Rob-Sligter May 27, 2025
c64bf5b
Merge pull request #7 from 1-million-weed/rob_app
Rob-Sligter May 27, 2025
4269869
hot fix to fix the missing imports by Cordelia
Rob-Sligter May 27, 2025
2a55413
working api
Rob-Sligter May 27, 2025
9b068b4
hotfix requirements
Rob-Sligter May 28, 2025
3c305a2
pre changes for real time updates
Rob-Sligter May 29, 2025
f56d490
laptime, qualifying
Rob-Sligter May 29, 2025
385332e
code not done yet
Rob-Sligter May 29, 2025
c9489d7
code todo finish sample calculation
Rob-Sligter May 29, 2025
39dc922
Update api.py
Mario-04 May 29, 2025
96f17e5
removed unusable features
Mario-04 May 29, 2025
f8f122a
Merge branch 'real-time-crunch' into rob_2025
Mario-04 May 29, 2025
da1d77b
marinus please interperet
Rob-Sligter May 29, 2025
57873f7
Merge branch 'rob_2025' of https://github.com/1-million-weed/Applied-…
Rob-Sligter May 29, 2025
91c6e83
Merge pull request #9 from 1-million-weed/rob_2025
Mario-04 May 29, 2025
15b8c46
prediction
Rob-Sligter May 29, 2025
cfcedc0
meet informatio
Mario-04 May 29, 2025
ddb7c7b
Merge pull request #10 from 1-million-weed/rob_2025
Mario-04 May 29, 2025
4d07eb3
working code
Rob-Sligter May 29, 2025
068a502
now runs with pipeline
Mario-04 May 29, 2025
105b790
code for api pipeline
Rob-Sligter May 29, 2025
57d3ed6
Merge branch 'api' of https://github.com/1-million-weed/Applied-ML-GR…
Mario-04 May 29, 2025
05016b1
Merge pull request #11 from 1-million-weed/rob_2025
Mario-04 May 29, 2025
c600e9a
debugging
Mario-04 May 29, 2025
a5b5285
custom logging json config file
Mario-04 May 31, 2025
0f36a4f
custom logger class for proper parsing of logs
Mario-04 May 31, 2025
cf3975f
updates to handeling the 2025 data and rounding of the output from mo…
Rob-Sligter May 31, 2025
cd261a2
valid response. TODO handle missing values from feature generator. Pr…
Rob-Sligter May 31, 2025
b411a9c
had to make logger work for python 3.9
Mario-04 May 31, 2025
0bf4a47
Merge pull request #12 from 1-million-weed/logging
Mario-04 May 31, 2025
5998943
added logs for pipeline and api. editted api docstrings
Mario-04 May 31, 2025
1b0bfab
added random model to pipeline
Rob-Sligter May 31, 2025
d79cee1
Merge branch 'api' of https://github.com/1-million-weed/Applied-ML-GR…
Rob-Sligter May 31, 2025
e98f232
moved log setup to mylogger.py
Mario-04 May 31, 2025
4363d41
added Matplotlib debug filter
Mario-04 May 31, 2025
df790ee
Merge branch 'api' of https://github.com/1-million-weed/Applied-ML-GR…
Mario-04 May 31, 2025
e3ef699
forgot to initialise the filter
Mario-04 May 31, 2025
bc097a4
logger level now in config
Mario-04 May 31, 2025
ec75bb7
edited log statements
Mario-04 May 31, 2025
d31ac8b
fix zero indezing mlr
Rob-Sligter Jun 1, 2025
17d1763
Merge branch 'api' of https://github.com/1-million-weed/Applied-ML-GR…
Rob-Sligter Jun 1, 2025
926a696
streamlit page for api demo
Mario-04 Jun 1, 2025
4412354
Merge pull request #13 from 1-million-weed/streamlit
Mario-04 Jun 1, 2025
37e6670
layer changes MLR
Rob-Sligter Jun 1, 2025
ccce286
Merge branch 'api' of https://github.com/1-million-weed/Applied-ML-GR…
Rob-Sligter Jun 1, 2025
5b4ae8c
added meetings endpoint
Rob-Sligter Jun 1, 2025
9049c6d
Create NEW-README.md
Mario-04 Jun 1, 2025
ddb6643
README ready to go. can do with more refinement
Mario-04 Jun 1, 2025
961fa04
get max rounds api endpoint
Rob-Sligter Jun 2, 2025
56ed8c8
Merge branch 'api' of https://github.com/1-million-weed/Applied-ML-GR…
Rob-Sligter Jun 2, 2025
a14560d
added show plots option to evaluate option
Rob-Sligter Jun 2, 2025
66fddbf
added tensorboard option
Rob-Sligter Jun 2, 2025
2b17ae0
working api with properposition
Rob-Sligter Jun 2, 2025
edf06c6
fixed 0-indexing for confusion matrixes
Rob-Sligter Jun 2, 2025
e27b9a3
Merge pull request #14 from 1-million-weed/api
Rob-Sligter Jun 2, 2025
4619ed3
hotfix to readme
Rob-Sligter Jun 2, 2025
ee72c3f
horfix overfitting mlp
Rob-Sligter Jun 12, 2025
842cf5a
Final commit. Update Readme. More unit tests.
Mario-04 Jun 12, 2025
c82d223
Hotfix - unit tests in config
Mario-04 Jun 12, 2025
8 changes: 8 additions & 0 deletions .dockerignore
@@ -0,0 +1,8 @@
Pipfile.lock
.idea
__pycache__/
.DS_Store
.venv
personal_testing
.vscode
/logs
4 changes: 4 additions & 0 deletions .gitignore
@@ -2,3 +2,7 @@ Pipfile.lock
 .idea
 __pycache__/
 .DS_Store
+.venv
+personal_testing
+.vscode
+/logs
15 changes: 15 additions & 0 deletions Dockerfile
@@ -0,0 +1,15 @@
FROM python:3.9.6

WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY . .

EXPOSE 8000 6006 8501

# Command to run the application
CMD ["streamlit", "run", "main.py"]
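`EXPOSE` in the Dockerfile only documents which ports the container listens on. Docker does not publish them to the host unless `docker run` is given `-p host:container` (or `-P`).

The minimal Python sketch below builds such an argument list. It assumes the following port mapping:

- 8000 is the FastAPI server and 8501 is the Streamlit app, as described in the README.
- 6006 is TensorBoard, which is TensorBoard's default port.

The image tag `f1-predictor` and the helper `docker_run_args` are hypothetical names, not part of this PR.

```python
import shlex
from typing import Dict, List

# Ports from the Dockerfile's EXPOSE line. The service names are assumptions:
# 8000 and 8501 follow the README, 6006 is TensorBoard's default port.
EXPOSED_PORTS: Dict[int, str] = {
    8000: "FastAPI server",
    6006: "TensorBoard",
    8501: "Streamlit app",
}


def docker_run_args(image: str, ports: Dict[int, str] = EXPOSED_PORTS) -> List[str]:
    """Build a `docker run` argument list that publishes every exposed port
    on the same host port (host:container)."""
    args = ["docker", "run", "--rm"]  # --rm deletes the container when it exits
    for port in ports:
        args += ["-p", f"{port}:{port}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # "f1-predictor" is a hypothetical tag, e.g. from `docker build -t f1-predictor .`
    print(shlex.join(docker_run_args("f1-predictor")))
```

The list form can be passed directly to `subprocess.run`. `shlex.join` only renders it as a copy-pasteable shell command.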
26 changes: 26 additions & 0 deletions Pipfile
@@ -0,0 +1,26 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
numpy=="2.0.2"
keras=="3.9.2"
xgboost=="2.1.4"
matplotlib=="3.9.4"
pandas=="2.2.3"
scikit-learn=="1.6.1"
tensorflow=="2.19.0"
pyyaml=="6.0.2"
fastapi=="0.115.12"
streamlit=="1.45.1"
pytest=="8.3.5"
seaborn=="0.13.2"
httpx=="0.28.1"
uvicorn=="0.34.2"
fastf1=="3.5.3"

[dev-packages]

[requires]
python_version = "3.9"
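The Pipfile pins exact package versions and `python_version = "3.9"`. The sketch below is a small, hypothetical helper, not part of the repository. It compares some of those pins with what is actually installed, using the standard-library `importlib.metadata` module (Python 3.8+).

`PINS` copies four entries from the Pipfile and can be extended. `installed_version` and `find_mismatches` are illustrative names.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# A subset of the exact pins from the Pipfile above (extend as needed).
PINS: Dict[str, str] = {
    "numpy": "2.0.2",
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "tensorflow": "2.19.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def find_mismatches(pins: Dict[str, str],
                    installed: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map every pinned package whose installed version differs to a message."""
    problems = {}
    for name, wanted in pins.items():
        got = installed.get(name)
        if got != wanted:
            problems[name] = f"expected {wanted}, found {got or 'not installed'}"
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 9):
        print(f"Warning: the Pipfile targets Python 3.9, running {sys.version.split()[0]}")
    current = {name: installed_version(name) for name in PINS}
    for name, message in find_mismatches(PINS, current).items():
        print(f"{name}: {message}")
```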
171 changes: 80 additions & 91 deletions README.md
@@ -1,126 +1,115 @@
-# Applied ML Template 🛠️
+# 🏎️ Formula 1 Predictor
 
-**Welcome to Applied Machine Learning!** This template is designed to streamline the development process and boost the quality of your code.
+Welcome to our Formula 1 Predictor project! This project was made for the Applied Machine Learning course at the University of Groningen, part of the Artificial Intelligence Bachelor's program.
 
-Before getting started with your projects, we encourage you to carefully read the sections below and familiarise yourselves with the proposed tools.
+The goal is to predict the outcome of Formula 1 races using machine learning.
 
-## Prerequisites
-Make sure you have the following software and tools installed:
+## Getting Started
 
-- **PyCharm**: We recommend using PyCharm as your IDE, since it offers a highly tailored experience for Python development. You can get a free student license [here](https://www.jetbrains.com/community/education/#students/).
+Currently, our API is not publicly available, but you can still run the code locally to train and test our diverse selection of models.
 
-- **Pipenv**: Pipenv is used for dependency management. This tools enables users to easily create and manage virtual environments. To install Pipenv, use the following command:
-```bash
-$ pip install --user pipenv
-```
-For detailed installation instructions, [click here](https://pipenv.pypa.io/en/latest/installation.html).
+All changeable parameters are stored in the `config.py` file, so you can easily adjust them to your needs.
 
-- **Git LFS**: Instead of committing large files to your repository, you should store and manage them using Git LFS. For installation information, [click here](https://github.com/git-lfs/git-lfs?utm_source=gitlfs_site&utm_medium=installation_link&utm_campaign=gitlfs#installing).
+### Running the Project Locally
 
-## Getting Started
-### Setting up your own repository
-1. Fork this repository.
-2. Clone your fork locally.
-3. Configure a remote pointing to the upstream repository to sync changes between your fork and the original repository.
+> [WARNING] This project is built with Python 3.9. Make sure you have this version installed on your machine. If you are using a different version, you may encounter compatibility issues.
+
+1. Clone this repository to your local machine:
 ```bash
-git remote add upstream https://github.com/ivopascal/Applied-ML-Template
+git clone https://github.com/1-million-weed/Applied-ML-GROUP1.git
+cd Applied-ML-GROUP1
+```
+2. Set up a virtual environment (our team used a mix of `pipenv` and `venv`):
+```bash
+python -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
 ```
-**Don't skip this step.** We might update the original repository, so you should be able to easily pull our changes.
 
-To update your forked repo follow these steps:
-1. `git fetch upstream`
-2. `git rebase upstream/main`
-3. `git push origin main`
-
-Sometimes you may need to use `git push --force origin main`. Only use this flag the first time you push after you rebased, and be careful as you might overwrite your teammates' changes.
-### Git LFS
-1. Set it up for your user account (only once, not each time you want to use it).
-```bash
-git lfs install
-```
-2. Select the files that Git LFS should manage. To track all files of a certain type, you can use a wildcard as in the command below.
-```bash
-git lfs track "*.psd"
-```
-3. Add _.gitattributes_ to the staging area.
-```bash
-git add .gitattributes
-```
-That's all, you can commit and push as always. The tracked files will be automatically stored with Git LFS.
-
-### Pipenv
-This tool is incredibly easy to use. Let's **install** our first package, which you will all need in your projects.
-
-```bash
-pipenv install pre-commit
-```
-
-After running this command, you will notice that two files were created, namely, _Pipfile_ and _Pipfile.lock_. _Pipfile_ is the configuration file that specifies all the dependencies in your virtual environment.
-
-To **uninstall** a package, you can run the command:
-```bash
-pipenv uninstall <package-name>
-```
-
-To **activate** the virtual environment, run `pipenv shell`. You can now use the environment as you wish. To **deactivate** the environment run the command `exit`.
+#### OR
 
-If you **already have access to a Pipfile**, you can install the dependencies using `pipenv install`.
+2. If you prefer using `pipenv`, you can install the dependencies with:
+```bash
+pip install pipenv
+pipenv install
+```
+Ensure you are in the directory where the `Pipfile` is located when running this.
+3. To run just the API server, execute:
+```bash
+python main.py
+```
+Then head over to `http://localhost:8000/docs` to access the Swagger UI and test the API endpoints.
+4. To run the Streamlit application, execute:
+```bash
+streamlit run main.py
+```
+Then head over to `http://localhost:8501` to access our Streamlit API demo.
 
-For a comprehensive list of commands, consult the [official documentation](https://pipenv.pypa.io/en/latest/cli.html).
+## The Config File
 
-### Unit testing
-You are expected to test your code using unit testing, which is a technique where small individual components of your code are tested in isolation.
+Everything in this project is configurable through the `config.py` file. This includes:
+- The `model` to be used for predictions
+- Dataset **acquisition**
+  - Whether to acquire new data or use existing data
+  - Whether to `preprocess` the data
+  - Whether to `generate` new features from raw data
+  - Training and testing data `split`
+- Model **training**
+  - Whether to `train` a new model or use an existing one
+  - `Tensorboard` logging
+- Model **evaluation**
+  - Whether to `evaluate` the model on the test set
+  - Whether to show the evaluation `plots`
+- Model **inference**
+  - Whether to run `inference` on the model
+  - Whether to activate the `API server`
+  - Whether to run the `Streamlit` application
+- **Logging**
+  - The level of logging (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`)
 
-An **example** is given in _tests/test_main.py_, which uses the standard _unittest_ Python module to test whether the function _hello_world_ from _main.py_ works as expected.
+## API Endpoints
 
-To run all the tests developed using _unittest_, simply use:
-```bash
-python -m unittest discover tests
-```
-If you wish to see additional details, run it in verbose mode:
-```bash
-python -m unittest discover -v tests
-```
+The API provides several endpoints to interact with the model:
+- **GET /**: API health check
+- **GET /meetings**: List of all meetings from the current season
+- **GET /meetings/{meeting_id}/max-laps**: The maximum number of laps in a race
+- **GET /docs**: Swagger UI documentation
 
-### Pre-commit
-Another good coding practice is using pre-commit hooks. This is used to inspect the code before committing to ensure it matches your standards.
+- **POST /predict**: Predict the outcome
 
-In this course, we will be using two hooks (already configured in _.pre-commit-config.yaml_):
-- Unit testing
-- Flake8 (checks your code for errors, styling issues and complexity)
-
-Since we have already configured the hooks, all you need to do is run:
-```bash
-pre-commit install
-```
-Now `pre-commit` will automatically run whenever you want to commit something to the repository.
+## Notable References
 
-## Get Coding
-You are now ready to start working on your projects.
+- [Elo calculation code](https://www.kaggle.com/code/lorenzojayd/elo-system-in-formula-1/notebook)
 
-We recommend following the same folder structure as in the original repository. This will make it easier for you to have cleaner and consistent code, and easier for us to follow your progress and help you.
+## Project Structure
 
-Your repository should look something like this:
 ```bash
-├───data # Stores .csv
-├───models # Stores .pkl
-├───notebooks # Contains experimental .ipynbs
-├───project_name
-│ ├───data # For data processing, not storing .csv
-│ ├───features
+├───data # Stores raw .csv files
+├───models # Stores .pkl files for trained models
+├───experimental # Contains experimental .ipynbs & .py
+├───f1_predictor
+│ ├───app # Contains the Streamlit app
+│ ├───data # Stores processed .csv files
+│ ├───data_acquisition # For acquiring 2025 data from the FastF1 API
+│ ├───features # Scripts and logic for feature engineering
+│ ├───ml # Contains the machine learning logic (pipelines & managers)
 │ └───models # For model creation, not storing .pkl
-├───reports
+├───reports # For outputs and visualisations
 ├───tests
 │ ├───data
 │ ├───features
 │ └───models
+├───.dockerignore
 ├───.gitignore
 ├───.pre-commit-config.yaml
+├───config.py
+├───Dockerfile
 ├───main.py
+├───mylogger.py
 ├───train_model.py
 ├───Pipfile
 ├───Pipfile.lock
 ├───README.md
+├───requirements.txt
 ```
 
-**Good luck and happy coding! 🚀**
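The README lists the API routes but not their response formats. Below is a client sketch that uses only the Python standard library. It assumes:

- the server started with `python main.py` runs locally on port 8000, as described in the README;
- the GET endpoints return JSON.

`BASE_URL`, `endpoint_url` and `get_json` are illustrative names, not part of the project. `POST /predict` is left out because its request body is not documented here. The Swagger UI at `/docs` describes each route in full.

```python
import json
from typing import Any
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Assumed local address of the server started with `python main.py` (port from the README).
BASE_URL = "http://localhost:8000/"


def endpoint_url(path: str, base: str = BASE_URL, **params: Any) -> str:
    """Fill `{name}` placeholders in a route template and join it onto `base`.

    Values are percent-encoded so each one stays inside a single path segment.
    """
    filled = path.format(**{k: quote(str(v), safe="") for k, v in params.items()})
    return urljoin(base, filled.lstrip("/"))


def get_json(path: str, timeout: float = 10.0, **params: Any) -> Any:
    """GET an endpoint and decode the body, assuming the server returns JSON."""
    with urlopen(endpoint_url(path, **params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example calls (these need a running server; 1234 is a placeholder meeting id):
#   get_json("/")                                   # health check
#   get_json("/meetings")                           # meetings of the current season
#   get_json("/meetings/{meeting_id}/max-laps", meeting_id=1234)
```

Keeping URL construction separate from the network call makes the route templating testable without a server.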
58 changes: 58 additions & 0 deletions config.yaml
@@ -0,0 +1,58 @@
# ===============================
# Model Configuration
# ===============================
model:
  name: "MultiLayerRegression" # Options: ["RandomForestClassifier", "XGBClassifier", "XGBRegressor", "MultiLayerPerceptron", "MultiLayerRegression", "RandomModel"]

# ===============================
# Dataset Configuration
# ===============================
dataset:
  get_2025_data: false
  calculate_elo: false
  elo_plots: false
  generate: false
  test_size: 0.2
  random_state: 42
  empty_folder: true

# ===============================
# Training Configuration
# ===============================
training:
  enabled: true
  test_size: 0.2
  show_plot: true
  ground_truth: "finishing_position"
  training_features: ['normalized_lap', 'average_normalized_lap', 'lap_progress', 'current_position_norm', 'normalized_driver_standing', 'normalized_fastest_qualifying', 'position_quali', 'normalized_driver_elo', 'amount_of_wins', 'points_team']
  tensorboard: true

# ===============================
# Evaluation Configuration
# ===============================

evaluation:
  enabled: true
  show_plot: true

# ===============================
# Inference Configuration
# ===============================
inference:
  enabled: false
  api: true
  streamlit: true

# ===============================
# Logging Configuration
# ===============================

logger:
  level: "CRITICAL" # Options: ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


# ===============================
# Unit Tests Configuration
# ===============================

unit_tests: false
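The project's own code reads `config.yaml`, but that code is not shown on this page. Once the file is parsed, for example with PyYAML's `yaml.safe_load` (`pyyaml` is pinned in the Pipfile), a few sanity checks can catch typos before a long training run.

This is a hedged sketch only:

- The allowed values are copied from the comments in the YAML.
- The checks themselves are hypothetical, chosen for illustration.
- The stage names in `enabled_stages` are also hypothetical and do not describe the project's actual pipeline.

```python
from typing import Any, Dict, List

# Allowed values, copied from the comments in config.yaml.
MODEL_OPTIONS = {
    "RandomForestClassifier", "XGBClassifier", "XGBRegressor",
    "MultiLayerPerceptron", "MultiLayerRegression", "RandomModel",
}
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def _section(cfg: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return a top-level section, treating a missing or empty one as {}."""
    return cfg.get(name) or {}


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks OK."""
    errors = []
    name = _section(cfg, "model").get("name")
    if name not in MODEL_OPTIONS:
        errors.append(f"model.name {name!r} is not one of the listed options")
    for section in ("dataset", "training"):  # both sections define test_size
        size = _section(cfg, section).get("test_size")
        if size is not None and not 0 < size < 1:
            errors.append(f"{section}.test_size must be in (0, 1), got {size}")
    level = _section(cfg, "logger").get("level")
    if level not in LOG_LEVELS:
        errors.append(f"logger.level {level!r} is not a valid level")
    return errors


def enabled_stages(cfg: Dict[str, Any]) -> List[str]:
    """List hypothetical stage names whose on/off flag is true, in config order."""
    flags = [
        ("fetch_2025_data", "dataset", "get_2025_data"),
        ("calculate_elo", "dataset", "calculate_elo"),
        ("generate_features", "dataset", "generate"),
        ("train", "training", "enabled"),
        ("evaluate", "evaluation", "enabled"),
        ("inference", "inference", "enabled"),
    ]
    return [stage for stage, sec, key in flags if _section(cfg, sec).get(key)]


# A subset of the values in config.yaml, as yaml.safe_load would return them.
EXAMPLE = {
    "model": {"name": "MultiLayerRegression"},
    "dataset": {"get_2025_data": False, "calculate_elo": False,
                "generate": False, "test_size": 0.2},
    "training": {"enabled": True, "test_size": 0.2},
    "evaluation": {"enabled": True},
    "inference": {"enabled": False, "api": True, "streamlit": True},
    "logger": {"level": "CRITICAL"},
}

if __name__ == "__main__":
    print(validate_config(EXAMPLE))  # → []
    print(enabled_stages(EXAMPLE))   # → ['train', 'evaluate']
```

To check the real file, open `config.yaml` and pass the result of `yaml.safe_load(f)` to both functions.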