ivopascal · Rob-Sligter · Apr 23, 2025 · May 1, 2025 · May 1, 2025 · May 1, 2025
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,8 @@
+Pipfile.lock
+.idea
+__pycache__/
+.DS_Store
+.venv
+personal_testing
+.vscode
+/logs
diff --git a/.gitignore b/.gitignore
@@ -2,3 +2,7 @@ Pipfile.lock
 .idea
 __pycache__/
 .DS_Store
+.venv
+personal_testing
+.vscode
+/logs
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,15 @@
+FROM python:3.9.6
+
+WORKDIR /app
+
+# Copy the requirements file into the container
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy the rest of the application code into the container
+COPY . .
+
+EXPOSE 8000 6006 8501
+
+# Command to run the application
+CMD ["streamlit", "run", "main.py"]
diff --git a/Pipfile b/Pipfile
@@ -0,0 +1,26 @@
+[[source]]
+url = "https://pypi.org/simple"
+verify_ssl = true
+name = "pypi"
+
+[packages]
+numpy = "==2.0.2"
+keras = "==3.9.2"
+xgboost = "==2.1.4"
+matplotlib = "==3.9.4"
+pandas = "==2.2.3"
+scikit-learn = "==1.6.1"
+tensorflow = "==2.19.0"
+pyyaml = "==6.0.2"
+fastapi = "==0.115.12"
+streamlit = "==1.45.1"
+pytest = "==8.3.5"
+seaborn = "==0.13.2"
+httpx = "==0.28.1"
+uvicorn = "==0.34.2"
+fastf1 = "==3.5.3"
+
+[dev-packages]
+
+[requires]
+python_version = "3.9"
diff --git a/README.md b/README.md
@@ -1,126 +1,115 @@
-# Applied ML Template 🛠️
+# 🏎️ Formula 1 Predictor
 
-**Welcome to Applied Machine Learning!** This template is designed to streamline the development process and boost the quality of your code.
+Welcome to our Formula 1 Predictor project! This project was made for the Applied Machine Learning course at the University of Groningen, part of the Artificial Intelligence Bachelor's program. 
 
-Before getting started with your projects, we encourage you to carefully read the sections below and familiarise yourselves with the proposed tools.
+The goal is to predict the outcome of Formula 1 races using machine learning.
 
-## Prerequisites
-Make sure you have the following software and tools installed:
+## Getting Started
 
-- **PyCharm**: We recommend using PyCharm as your IDE, since it offers a highly tailored experience for Python development. You can get a free student license [here](https://www.jetbrains.com/community/education/#students/).
+Currently, our API is not publicly available, but you can still run the code locally to train and test our diverse selection of models.
 
-- **Pipenv**: Pipenv is used for dependency management. This tools enables users to easily create and manage virtual environments. To install Pipenv, use the following command:
-    ```bash
-    $ pip install --user pipenv
-    ```
-    For detailed installation instructions, [click here](https://pipenv.pypa.io/en/latest/installation.html).
+All changeable parameters are stored in the `config.py` file, so you can easily adjust them to your needs.
 
-- **Git LFS**: Instead of committing large files to your repository, you should store and manage them using Git LFS. For installation information, [click here](https://github.com/git-lfs/git-lfs?utm_source=gitlfs_site&utm_medium=installation_link&utm_campaign=gitlfs#installing).
+### Running the Project Locally
 
-## Getting Started
-### Setting up your own repository
-1. Fork this repository.
-2. Clone your fork locally.
-3. Configure a remote pointing to the upstream repository to sync changes between your fork and the original repository.
+> [WARNING] This project is built with Python 3.9. Make sure you have this version installed on your machine. If you are using a different version, you may encounter compatibility issues.
+
+1. Clone this repository to your local machine:
    ```bash
-   git remote add upstream https://github.com/ivopascal/Applied-ML-Template
+   git clone https://github.com/1-million-weed/Applied-ML-GROUP1.git
+   cd Applied-ML-GROUP1
+   ```
+2. Set up a virtual environment (our team used a mix of `pipenv` and `venv`):
+   ```bash
+   python -m venv venv
+   source venv/bin/activate
+   pip install -r requirements.txt
    ```
-   **Don't skip this step.** We might update the original repository, so you should be able to easily pull our changes.
-
-   To update your forked repo follow these steps:
-   1. `git fetch upstream`
-   2. `git rebase upstream/main`
-   3. `git push origin main`
-
-      Sometimes you may need to use `git push --force origin main`. Only use this flag the first time you push after you rebased, and be careful as you might overwrite your teammates' changes.
-### Git LFS
-1. Set it up for your user account (only once, not each time you want to use it).
-    ```bash
-    git lfs install
-    ```
-2. Select the files that Git LFS should manage. To track all files of a certain type, you can use a wildcard as in the command below.
-    ```bash
-   git lfs track "*.psd"
-    ```
-3. Add _.gitattributes_ to the staging area.
-    ```bash
-    git add .gitattributes
-    ```
-That's all, you can commit and push as always. The tracked files will be automatically stored with Git LFS.
-
-### Pipenv
-This tool is incredibly easy to use. Let's **install** our first package, which you will all need in your projects.
-
-```bash
-pipenv install pre-commit
-```
-
-After running this command, you will notice that two files were created, namely, _Pipfile_ and _Pipfile.lock_. _Pipfile_ is the configuration file that specifies all the dependencies in your virtual environment.
-
-To **uninstall** a package, you can run the command:
-```bash
-pipenv uninstall <package-name>
-```
 
-To **activate** the virtual environment, run `pipenv shell`. You can now use the environment as you wish. To **deactivate** the environment run the command `exit`.
+#### OR 
 
-If you **already have access to a Pipfile**, you can install the dependencies using `pipenv install`.
+2. If you prefer using `pipenv`, you can install the dependencies with:
+   ```bash
+   pip install pipenv
+   pipenv install
+   ```
+   Ensure you are in the directory where the `Pipfile` is located when running this.
+3. To run just the API server, execute:
+   ```bash
+   python main.py
+   ```
+   Then head over to `http://localhost:8000/docs` to access the Swagger UI and test the API endpoints.
+4. To run the Streamlit application, execute:
+   ```bash
+   streamlit run main.py
+   ```
+   Then head over to `http://localhost:8501` to access our Streamlit API demo.
 
-For a comprehensive list of commands, consult the [official documentation](https://pipenv.pypa.io/en/latest/cli.html).
+## The Config File
 
-### Unit testing
-You are expected to test your code using unit testing, which is a technique where small individual components of your code are tested in isolation.
+Everything in this project is configurable through the `config.py` file. This includes:
+- The `model` to be used for predictions
+- Dataset **acquisition**
+  - Whether to acquire new data or use existing data
+  - Whether to `preprocess` the data
+  - Whether to `generate` new features from raw data
+  - Training and testing data `split`
+- Model **training**
+  - Whether to `train` a new model or use an existing one
+  - `Tensorboard` logging
+- Model **evaluation**
+  - Whether to `evaluate` the model on the test set
+  - Whether to show the evaluation `plots`
+- Model **inference**
+  - Whether to run `inference` on the model
+  - Whether to activate the `API server`
+  - Whether to run the `Streamlit` application
+- **Logging**
+  - The level of logging (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`)
 
-An **example** is given in _tests/test_main.py_, which uses the standard _unittest_ Python module to test whether the function _hello_world_ from _main.py_ works as expected.
+## API Endpoints
 
-To run all the tests developed using _unittest_, simply use:
-```bash
-python -m unittest discover tests
-```
-If you wish to see additional details, run it in verbose mode:
-```bash
-python -m unittest discover -v tests
-```
+The API provides several endpoints to interact with the model:
+- **GET /**: API Health check
+- **GET /meetings**: List of all meetings from the current season
+- **GET /meetings/{meeting_id}/max-laps**: Get the maximum number of laps for the given meeting's race
+- **GET /docs**: Swagger UI documentation
 
-### Pre-commit
-Another good coding practice is using pre-commit hooks. This is used to inspect the code before committing to ensure it matches your standards.
+- **POST /predict**: Predict the outcome
 
-In this course, we will be using two hooks (already configured in _.pre-commit-config.yaml_):
-- Unit testing
-- Flake8 (checks your code for errors, styling issues and complexity)
 
-Since we have already configured the hooks, all you need to do is run:
-```bash
-pre-commit install
-```
-Now `pre-commit` will automatically run whenever you want to commit something to the repository.
+## Notable References
 
-## Get Coding
-You are now ready to start working on your projects.
+- [Elo calculation code](https://www.kaggle.com/code/lorenzojayd/elo-system-in-formula-1/notebook)
 
-We recommend following the same folder structure as in the original repository. This will make it easier for you to have cleaner and consistent code, and easier for us to follow your progress and help you.
+## Project Structure
 
-Your repository should look something like this:
 ```bash
-├───data  # Stores .csv
-├───models  # Stores .pkl
-├───notebooks  # Contains experimental .ipynbs
-├───project_name
-│   ├───data  # For data processing, not storing .csv
-│   ├───features
+├───data  # Stores raw .csv files
+├───models  # Stores .pkl files for trained models
+├───experimental  # Contains experimental .ipynbs & .py
+├───f1_predictor
+│   ├───app # Contains the Streamlit app
+│   ├───data  # Stores processed .csv files
+│   ├───data_acquisition # For acquiring 2025 data from the FastF1 API
+│   ├───features # Scripts and logic for feature engineering
+│   ├───ml # Contains the machine learning logic (pipelines & managers)
 │   └───models  # For model creation, not storing .pkl
-├───reports
+├───reports # For outputs and visualisations
 ├───tests
 │   ├───data
 │   ├───features
 │   └───models
+├───.dockerignore
 ├───.gitignore
 ├───.pre-commit-config.yaml
+├───config.py
+├───Dockerfile
 ├───main.py
+├───mylogger.py
 ├───train_model.py
 ├───Pipfile
 ├───Pipfile.lock
 ├───README.md
+├───requirements.txt
 ```
-
-**Good luck and happy coding! 🚀**
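Note on the README's "API Endpoints" section: the GET endpoints it lists can be exercised with a small client once the server is running. What follows is a minimal sketch using only the standard library. It assumes the server is reachable at `http://localhost:8000`, as the README states; `endpoint_url`, `get_json`, `demo`, and the meeting id `1234` are hypothetical names and values chosen for illustration. The response shapes are not documented in this diff, so the sketch simply prints whatever JSON comes back.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Assumption: the API server listens on the port mentioned in the README.
BASE_URL = "http://localhost:8000"


def endpoint_url(path_template: str, base_url: str = BASE_URL, **path_params) -> str:
    """Fill path parameters such as {meeting_id} and join the result with the base URL."""
    escaped = {key: quote(str(value), safe="") for key, value in path_params.items()}
    path = path_template.format(**escaped)
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body. Requires the API server to be running."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def demo() -> None:
    """Call the three GET endpoints listed in the README and print the raw JSON."""
    print(get_json(endpoint_url("/")))          # health check
    print(get_json(endpoint_url("/meetings")))  # meetings of the current season
    # 1234 is a placeholder meeting id, not a real one.
    print(get_json(endpoint_url("/meetings/{meeting_id}/max-laps", meeting_id=1234)))
```

Calling `demo()` only makes sense after starting the server with `python main.py`. The URL builder can be checked on its own without a running server.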
diff --git a/config.yaml b/config.yaml
@@ -0,0 +1,58 @@
+# ===============================
+# Model Configuration
+# ===============================
+model:
+  name: "MultiLayerRegression"        # Options: ["RandomForestClassifier", "XGBClassifier", "XGBRegressor", "MultiLayerPerceptron", "MultiLayerRegression", "RandomModel"]
+
+# ===============================
+# Dataset Configuration
+# ===============================
+dataset: 
+  get_2025_data: false
+  calculate_elo: false
+  elo_plots: false
+  generate: false
+  test_size: 0.2
+  random_state: 42
+  empty_folder: true
+
+# ===============================
+# Training Configuration
+# ===============================
+training:
+  enabled: true
+  test_size: 0.2
+  show_plot: true 
+  ground_truth: "finishing_position"
+  training_features: ['normalized_lap', 'average_normalized_lap', 'lap_progress', 'current_position_norm', 'normalized_driver_standing', 'normalized_fastest_qualifying', 'position_quali', 'normalized_driver_elo', 'amount_of_wins', 'points_team']
+  tensorboard: true
+
+# ===============================
+# Evaluation Configuration
+# ===============================
+
+evaluation:
+  enabled: true
+  show_plot: true 
+
+# ===============================
+# Inference Configuration
+# ===============================
+inference:
+  enabled: false
+  api: true
+  streamlit: true
+
+# ===============================
+# Logging Configuration
+# ===============================
+
+logger:
+  level: "CRITICAL"  # Options: ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
+
+
+# ===============================
+# Unit Tests Configuration
+# ===============================
+
+unit_tests: false
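Note on `config.yaml`: the allowed values for `model.name` and `logger.level` exist only as comments, so a typo would go unnoticed until runtime. The sketch below shows one way the file could be loaded and checked up front. It is not the project's actual loader, which this diff does not show. `load_config` and `validate_config` are hypothetical helper names; the option sets are copied from the comments in the file, and `yaml.safe_load` comes from PyYAML, which the Pipfile pins as `pyyaml`.

```python
from typing import Any, Dict, List

# Allowed values, copied from the "Options" comments in config.yaml.
MODEL_NAMES = {
    "RandomForestClassifier", "XGBClassifier", "XGBRegressor",
    "MultiLayerPerceptron", "MultiLayerRegression", "RandomModel",
}
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def load_config(path: str = "config.yaml") -> Dict[str, Any]:
    """Parse the YAML file into a dict (hypothetical helper)."""
    import yaml  # PyYAML; imported lazily so validation works without it

    with open(path, "r", encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def _is_fraction(value: Any) -> bool:
    """True for a number strictly between 0 and 1 (booleans excluded)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0 < value < 1
    )


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []

    name = (cfg.get("model") or {}).get("name")
    if name not in MODEL_NAMES:
        problems.append(f"unknown model name: {name!r}")

    level = (cfg.get("logger") or {}).get("level")
    if level not in LOG_LEVELS:
        problems.append(f"unknown logger level: {level!r}")

    # test_size appears under both dataset and training in this config.
    for section in ("dataset", "training"):
        size = (cfg.get(section) or {}).get("test_size")
        if not _is_fraction(size):
            problems.append(f"{section}.test_size must be between 0 and 1, got {size!r}")

    return problems
```

A caller could run `validate_config(load_config())` at startup and abort if the returned list is non-empty. Returning all problems at once, rather than raising on the first, lets every mistake in the file be fixed in a single pass.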