A hackathon project that uses machine learning to analyze the relationship between gaming habits and mental wellbeing. Users fill out a short assessment about their gaming behavior and lifestyle; the system returns a personalized risk analysis across five wellbeing dimensions with explainable predictions.
- Predicts risk levels for five mental health outcomes based on a user's gaming profile:
  - Sleep problems
  - Productivity issues
  - Social isolation
  - Dysregulation
  - Emotional problems
- Shows which specific habits (hours played, genre, spending, exercise, etc.) are driving each risk
- Presents results in a retro arcade-themed web interface with charts and scenario analysis
Dataquest26/
├── data/ # Source dataset (1,000 records, 27 features)
├── mood/ # Mood state classifier (Random Forest)
│ ├── train.py
│ ├── train_notebook.ipynb
│ └── model.joblib / model.pkl
├── train/ # Multi-issue wellbeing models
│ ├── train_causal_grouped_model.py
│ ├── infer_grouped_model.py
│ ├── gamer_personas_kmeans.ipynb
│ ├── causal_detection_wellbeing*.ipynb
│ └── artifacts/ # Trained model + metrics
├── analysis/ # Causal pathway exploration
│ └── path_analysis.ipynb
└── frontend/ # Next.js web application
    └── app/
        ├── page.tsx # Landing page
        ├── assess/page.tsx # User intake form
        ├── results/page.tsx # Results + visualizations
        └── api/analyze/route.ts # Inference API (calls Python)
| Layer | Technology |
|---|---|
| ML Models | scikit-learn (RandomForest, GradientBoosting, LogisticRegression) |
| Data | pandas, numpy |
| Notebooks | Jupyter |
| Frontend | Next.js 16, React 19, TypeScript |
| Styling | Tailwind CSS 4 |
| Charts | Recharts |
data/Gaming and Mental Health.csv — 1,000 rows covering:
- Demographics (age, gender, years gaming)
- Gaming habits (daily hours, spending, genre, platform)
- Physical health (sleep hours, exercise, eye strain)
- Mental health outcomes (mood state, social isolation, addiction risk level)
There are two scripts in the train/ folder that together form the core ML pipeline.
train_causal_grouped_model.py trains a bundle of classifiers that predict five distinct wellbeing issues from a user's gaming and lifestyle inputs. The training pipeline works as follows:
- Feature taxonomy: Input features are split into three groups:
  - Actionable causes: daily_gaming_hours, monthly_game_spending_usd, exercise_hours_weekly, game_genre, primary_game, gaming_platform
  - Context confounders: age, gender, years_gaming
  - Mediators (used in augmented track only): sleep, mood, social, and productivity metrics
- Target definitions: Five binary classification targets are derived from the dataset:
  - sleep_problem: poor sleep quality or frequent sleep disruption
  - productivity_problem: poor academic/work performance or low productivity score
  - social_isolation_problem: high isolation score or low face-to-face social hours
  - emotional_problem: mood state is Depressed, Anxious, or Withdrawn
  - dysregulation_problem: frequent mood swings or continued gaming despite problems
- Leakage prevention: For each issue, the features that directly define the target (e.g. sleep_quality for sleep_problem) are excluded from that model's input, preventing circular predictions.
- Model selection: Three algorithms are trained and evaluated for each issue; the best by F1 + ROC-AUC is kept:
  - RandomForestClassifier (500 trees, balanced class weights)
  - GradientBoostingClassifier
  - LogisticRegression (balanced class weights)
- Overall model: A separate top-level classifier (also best-of-three) predicts a composite wellbeing_problem flag (two or more issues present simultaneously).
- Preprocessing pipeline: Numeric features are median-imputed and standard-scaled; categorical features are mode-imputed and one-hot encoded, all within a ColumnTransformer pipeline to prevent data leakage.
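As a rough illustration of the target-definition and leakage-prevention steps above, the sketch below derives emotional_problem from the mood state, builds the composite wellbeing_problem flag, and drops target-defining columns from a per-issue feature list. The column name mood_state, the DEFINING_FEATURES mapping, and the helper names are assumptions made for illustration; the actual training script may organize this differently.

```python
# Illustrative sketch only: the column names and the DEFINING_FEATURES mapping
# are assumptions, not taken from train_causal_grouped_model.py.

NEGATIVE_MOODS = {"Depressed", "Anxious", "Withdrawn"}

# Hypothetical mapping: issue -> columns that directly define its target.
DEFINING_FEATURES = {
    "sleep_problem": {"sleep_quality"},
    "emotional_problem": {"mood_state"},
}


def emotional_problem(row: dict) -> int:
    """Return 1 when the mood state is Depressed, Anxious, or Withdrawn."""
    return int(row.get("mood_state") in NEGATIVE_MOODS)


def wellbeing_problem(issue_flags: dict) -> int:
    """Composite flag: 1 when two or more issues are present at once."""
    return int(sum(issue_flags.values()) >= 2)


def features_for_issue(issue: str, candidates: list) -> list:
    """Drop the columns that define the issue's own target (leakage guard)."""
    blocked = DEFINING_FEATURES.get(issue, set())
    return [name for name in candidates if name not in blocked]


candidates = ["daily_gaming_hours", "exercise_hours_weekly", "sleep_quality", "mood_state"]
sleep_features = features_for_issue("sleep_problem", candidates)
flags = {
    "sleep_problem": 1,
    "emotional_problem": emotional_problem({"mood_state": "Anxious"}),
}
print(sleep_features)
print(wellbeing_problem(flags))
```

Keeping the exclusion list as data (one set per issue) makes it easy to audit which columns each model is allowed to see.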
Output: train/artifacts/grouped_wellbeing_model.pkl — a pickle bundle containing all five issue models, the overall model, feature sets per issue, and the feature taxonomy.
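A minimal sketch of inspecting such a bundle with the standard pickle module. The key names used here (issue_models, overall_model, issue_features, taxonomy) are hypothetical placeholders, so the real layout should be checked against the training script. A stand-in dictionary is written to a temporary file so the example runs without the trained artifact; loading the real bundle would also require scikit-learn to be installed, since the pickled estimators depend on it.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in bundle with HYPOTHETICAL keys; strings replace fitted estimators.
demo_bundle = {
    "issue_models": {"sleep_problem": "fitted-model-placeholder"},
    "overall_model": "fitted-model-placeholder",
    "issue_features": {"sleep_problem": ["daily_gaming_hours", "age"]},
    "taxonomy": {"actionable": ["daily_gaming_hours"], "context": ["age"]},
}


def load_bundle(path: Path) -> dict:
    """Load a pickled model bundle. Only unpickle files you trust."""
    with path.open("rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "demo_bundle.pkl"
    artifact.write_bytes(pickle.dumps(demo_bundle))
    bundle = load_bundle(artifact)

# List the top-level entries stored in the bundle.
print(sorted(bundle))
```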
| Issue | Best Model | F1 | ROC-AUC | Reliability |
|---|---|---|---|---|
| Sleep problem | GradientBoosting | 0.809 | 0.672 | Reliable |
| Social isolation | RandomForest | 0.794 | 0.927 | Reliable |
| Productivity problem | RandomForest | 0.646 | 0.716 | Reliable |
| Dysregulation | GradientBoosting | 0.550 | 0.626 | Weak |
| Emotional problem | LogisticRegression | 0.374 | 0.559 | Weak |
| Overall wellbeing | RandomForest | 0.667 | 0.898 | — |
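The "best by F1 + ROC-AUC" rule from the training pipeline could be read as ranking candidates by the sum of the two scores. The sketch below applies that reading to invented candidate scores. Both the summing rule and the numbers are assumptions for illustration; they are not values from the training run or from the table above.

```python
# Assumption: "best by F1 + ROC-AUC" means the highest sum of the two metrics.
# The scores below are invented for illustration only.
candidates = {
    "RandomForestClassifier": {"f1": 0.70, "roc_auc": 0.80},
    "GradientBoostingClassifier": {"f1": 0.72, "roc_auc": 0.75},
    "LogisticRegression": {"f1": 0.65, "roc_auc": 0.78},
}


def pick_best(scores: dict) -> str:
    """Return the candidate name with the largest f1 + roc_auc."""
    return max(scores, key=lambda name: scores[name]["f1"] + scores[name]["roc_auc"])


best = pick_best(candidates)
print(best)
```

Note that a combined score can prefer a model that is not the top scorer on either metric alone, which is one reason to report both columns.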
infer_grouped_model.py is a command-line inference script that loads the trained model bundle and runs predictions for a single user profile. The Next.js API route calls it at runtime.
What it does:
- Accepts a JSON payload (via --payload or --payload-file) containing the 9 required input fields
- Validates and range-clips all numeric inputs before prediction
- Runs the overall wellbeing model and all five issue-specific models
- Computes local feature contributions per issue using perturbation analysis: each feature is individually replaced with a neutral baseline value and the change in predicted probability is measured, giving an explainable breakdown of what is driving each risk
- Groups contributions into cause categories (gaming_load, gaming_spend, health_habits, game_context, context)
- Runs intervention scenarios on the overall risk model to show how specific behavioral changes (e.g. reduce gaming by 2h/day, add 2h exercise/week) would shift the predicted risk
- Outputs a single JSON object with overall risk, per-issue risks, top contributors, and scenario comparisons
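The perturbation, grouping, and scenario steps above can be sketched with a toy model. Everything in this example is an assumption made for illustration: the hand-written logistic score stands in for a fitted classifier, and the weights, baseline values, and feature-to-category mapping are invented. Only the category names and the "reduce gaming by 2h/day" scenario come from the description above.

```python
import math

# Toy stand-in for a fitted model: a hand-written logistic score.
# The weights, baseline values, and category mapping are all assumptions.
WEIGHTS = {"daily_gaming_hours": 0.4, "exercise_hours_weekly": -0.2, "age": 0.0}
BASELINE = {"daily_gaming_hours": 2.0, "exercise_hours_weekly": 3.0, "age": 25.0}
CATEGORY = {
    "daily_gaming_hours": "gaming_load",
    "exercise_hours_weekly": "health_habits",
    "age": "context",
}


def predict_risk(profile: dict) -> float:
    """Return a probability-like risk score in (0, 1)."""
    score = sum(WEIGHTS[name] * profile[name] for name in WEIGHTS) - 1.0
    return 1.0 / (1.0 + math.exp(-score))


def local_contributions(profile: dict) -> dict:
    """Swap each feature for its baseline and record the change in risk."""
    full = predict_risk(profile)
    contributions = {}
    for name in profile:
        perturbed = dict(profile, **{name: BASELINE[name]})
        # Positive value: the user's actual input raises risk versus baseline.
        contributions[name] = full - predict_risk(perturbed)
    return contributions


def group_by_category(contributions: dict) -> dict:
    """Sum per-feature contributions into cause categories."""
    grouped = {}
    for name, value in contributions.items():
        grouped[CATEGORY[name]] = grouped.get(CATEGORY[name], 0.0) + value
    return grouped


user = {"daily_gaming_hours": 6.0, "exercise_hours_weekly": 1.0, "age": 22.0}
contribs = local_contributions(user)
grouped = group_by_category(contribs)

# Intervention scenario: reduce gaming by 2h/day and compare the risk.
scenario = dict(user, daily_gaming_hours=max(0.0, user["daily_gaming_hours"] - 2.0))
risk_delta = predict_risk(scenario) - predict_risk(user)
print(grouped)
print(round(risk_delta, 3))
```

Because each feature is perturbed independently, the contributions need not sum to the total difference from an all-baseline profile; they are best read as a ranking of drivers rather than an exact decomposition.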
Usage:
python train/infer_grouped_model.py \
  --artifact train/artifacts/grouped_wellbeing_model.pkl \
  --payload '{"age": 22, "daily_gaming_hours": 6, ...}'

Overall accuracy: 79.6%
| Issue | F1 Score |
|---|---|
| Sleep problems | 81.5% |
| Social isolation | 79.4% |
| Productivity problems | ~78% |
| Dysregulation | ~78% |
| Emotional problems | ~77% |
Models were trained without pretrained neural networks or external APIs — all classical ML on the provided dataset.
pip install -r requirements.txt
# Train the wellbeing models
python train/train_causal_grouped_model.py
# Train the mood classifier
python mood/train.py

cd frontend
npm install
npm run dev

Open http://localhost:3000 in your browser.
The app's /api/analyze route spawns a Python subprocess to run inference. Make sure Python dependencies are installed and the model artifacts exist in train/artifacts/ before starting the frontend.
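For debugging outside the frontend, the inference script can be driven from Python in much the same way. This is a sketch under assumptions: the payload values are made up, the accepted category values are not confirmed, and the field list assumes the 9 required inputs are the actionable-cause and context features from the feature taxonomy. The command is only built and printed here, not executed.

```python
import json
import shlex
import sys

# Made-up example profile; the field names assume the 9 required inputs are
# the actionable-cause and context features from the feature taxonomy.
payload = {
    "age": 22,
    "gender": "Male",
    "years_gaming": 8,
    "daily_gaming_hours": 6,
    "monthly_game_spending_usd": 40,
    "exercise_hours_weekly": 2,
    "game_genre": "FPS",
    "primary_game": "Valorant",
    "gaming_platform": "PC",
}


def build_command(profile: dict) -> list:
    """Build the argv list using the --artifact and --payload flags."""
    return [
        sys.executable,
        "train/infer_grouped_model.py",
        "--artifact",
        "train/artifacts/grouped_wellbeing_model.pkl",
        "--payload",
        json.dumps(profile),
    ]


command = build_command(payload)
print(shlex.join(command))

# To actually run it from the repository root (requires the trained artifact,
# and assumes the script prints its JSON result to stdout):
#   import subprocess
#   result = subprocess.run(command, capture_output=True, text=True, check=True)
#   report = json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with the JSON payload.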
- User fills out a 9-field form (age, gaming hours, spending, genre, platform, exercise, etc.)
- The Next.js API route calls infer_grouped_model.py with the user's inputs
- The model outputs risk levels and per-feature contribution scores for each wellbeing dimension
- Results are displayed as pie charts, bar charts, and a scenario analysis breakdown
| Notebook | Purpose |
|---|---|
| train/causal_detection_wellbeing.executed.ipynb | Core causal analysis and model training |
| train/causal_detection_wellbeing.localcontributors.executed.ipynb | Feature contribution analysis |
| train/causal_detection_wellbeing.personalized.executed.ipynb | Per-user prediction walkthrough |
| train/gamer_personas_kmeans.ipynb | K-means clustering of gamer archetypes |
| mood/train_notebook.ipynb | Mood state classifier training |
| analysis/path_analysis.ipynb | Causal pathway exploration |