This project is the result of our participation in the Flaschenpost Challenge during the STADS Datathon 2025. Our goal was to develop a prediction model for the customer service time of the German company Flaschenpost.
We explored various models and found that Bayesian Models were particularly suitable for our needs. Our workflow included the following steps:
- Data Preprocessing: We cleaned and prepared our data in the `_1_Preprocessing` directory.
- Model Implementation: We implemented different models in the directories `_2_` through `_11_`.
- Model Evaluation: We evaluated the performance of our models using a dashboard located in the `_99_dash` directory.
A brief explanation of the models we used:
- Baseline: A simple model used as a reference point for evaluating the performance of the other models. It always predicts the average service time (see the first sketch below this list).
- Linear Regression (_2_LinearRegression.py): A basic predictive model that assumes a linear relationship between the input features and the target variable.
- XGBoost (_3_XGBoost.py): An optimized gradient boosting algorithm.
- Neural Network (_4_NeuronalesNetz.py): A model consisting of layers of interconnected nodes.
- Bayesian Ridge Regression (_5_BayesRidgeRegression.py): A linear regression model that incorporates Bayesian inference, providing probabilistic predictions and regularization (sketched below the list).
- BART (_7_BART.py): Bayesian Additive Regression Trees, a non-parametric model that combines the strengths of decision trees and Bayesian inference.
- LinexXGB (_8_LinexXGB.py): A combination of linear regression and XGBoost, leveraging the strengths of both models. Here we incorporated a custom cost function (see the objective sketch below the list).
- LightGBM (_9_LightGBM.py): A gradient boosting framework that uses tree-based learning algorithms.
- Deep Gaussian Processes (_10_DeepGP.py): A model that extends Gaussian processes to deep architectures.
- Hierarchical Bayesian Models (_11_HierarchicalBayes.py): Models that incorporate hierarchical structures, allowing for more flexible and accurate predictions by sharing information across different levels of the hierarchy (a partial-pooling sketch follows below).
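For illustration, the mean-predicting baseline can be written in a few lines with scikit-learn's `DummyRegressor`; the random data below is a toy stand-in, not the Flaschenpost dataset:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X = rng.random((100, 3))            # placeholder features
y = rng.uniform(1, 20, size=100)    # toy service times in minutes

baseline = DummyRegressor(strategy="mean")  # always predicts the training mean
baseline.fit(X, y)
print(mean_absolute_error(y, baseline.predict(X)))
```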
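The probabilistic side of Bayesian ridge regression is easy to demonstrate with scikit-learn's `BayesianRidge`, which returns a predictive standard deviation alongside the mean. This is a minimal sketch on toy data; the actual training code in `_5_BayesRidgeRegression.py` may differ:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

model = BayesianRidge().fit(X, y)

# return_std=True yields a predictive standard deviation for each point,
# which is what makes the predictions probabilistic.
mean, std = model.predict(X[:5], return_std=True)
print(mean, std)
```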
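Judging by the file name, the custom cost function in `_8_LinexXGB.py` is presumably a LinEx (linear-exponential) loss, which penalizes over- and under-prediction asymmetrically. The sketch below shows how such a loss can be plugged into XGBoost as a custom objective; the asymmetry parameter `A` and the toy data are illustrative assumptions, not taken from our script:

```python
import numpy as np
import xgboost as xgb

A = 0.3  # assumed asymmetry parameter; A > 0 penalizes over-prediction harder

def linex_objective(preds, dtrain):
    """Gradient and Hessian of the LinEx loss exp(A*r) - A*r - 1, r = pred - y."""
    r = preds - dtrain.get_label()
    grad = A * (np.exp(A * r) - 1.0)  # first derivative w.r.t. the prediction
    hess = A**2 * np.exp(A * r)       # second derivative, always positive
    return grad, hess

rng = np.random.default_rng(1)
X = rng.random((500, 4))
y = 10 + 3 * X[:, 0] + rng.normal(scale=1.0, size=500)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=linex_objective)
```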
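To illustrate the partial-pooling idea behind the hierarchical models, the PyMC sketch below gives each group (for example, each driver or delivery area) its own mean service time while the group means share a common prior, so sparsely observed groups are shrunk towards the global mean. The grouping and priors here are hypothetical, not taken from `_11_HierarchicalBayes.py`:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n_groups = 5                                    # e.g. drivers or delivery areas
group_idx = rng.integers(0, n_groups, size=200)
y = rng.normal(8 + group_idx, 1.5)              # toy observed service times

with pm.Model() as hierarchical_model:
    # Global level: shared prior for all group means.
    mu_global = pm.Normal("mu_global", mu=10, sigma=5)
    sigma_group = pm.HalfNormal("sigma_group", sigma=3)
    # Group level: each group's mean is shrunk towards the global mean.
    mu_group = pm.Normal("mu_group", mu=mu_global, sigma=sigma_group,
                         shape=n_groups)
    # Observation level.
    sigma_obs = pm.HalfNormal("sigma_obs", sigma=3)
    pm.Normal("obs", mu=mu_group[group_idx], sigma=sigma_obs, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=2)
```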
We significantly improved the prediction accuracy, reducing the Mean Absolute Error (MAE) from 4.330 minutes with the Baseline model to 2.316 minutes with the XGBoost model, a reduction of approximately 46.5%. The XGBoost model also achieved a Mean Squared Error (MSE) of 11.299 and an R² value of 0.769, indicating a strong fit to the data. Its confidence interval of (-9.759, 9.787) is slightly narrower than the Baseline model's interval of (-9.801, 9.801), reflecting the improved precision of our predictions.
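These metrics can be reproduced with a small evaluation helper like the one below. The exact confidence interval definition lives in `_12_evaluation`; the residual-percentile version here is an assumption:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Point metrics plus an empirical 95% interval of the residuals."""
    residuals = np.asarray(y_pred) - np.asarray(y_true)
    lower, upper = np.percentile(residuals, [2.5, 97.5])
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        "CI": (lower, upper),
    }
```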
Our final presentation, summarizing our results and findings, is available in the pitch_summary directory. With this pitch, we secured second place in the challenge.
Repository structure:
- `_0_...`: Exploratory data analysis.
- `_1_Preprocessing`: Data preprocessing scripts and notebooks.
- `_2_` to `_11_`: Implementation of the various prediction models.
- `_12_evaluation`: Definition of the confidence interval calculation.
- `_99_dash`: Dashboard for model evaluation.
- `pitch_summary`: Final presentation of our results.