This repository contains an end-to-end Jupyter notebook and supporting artifacts for building and evaluating desktop activity recognition models from eye-tracking data.
- `eye_tracking_analysis.ipynb`: the main Jupyter notebook. It implements data loading, preprocessing, sliding-window feature extraction, event detection (I-DT fixation/saccade), model training and comparison (RandomForest and XGBoost), class-imbalance handling (SMOTE), hyperparameter tuning (GridSearchCV for RandomForest), and visualization cells that compare model performance.
- `dataset/`: CSV files for each recording (e.g., `P01_READ.csv`) used by the notebook. The notebook expects `x` and `y` columns at minimum; a `timestamp` column is supported if present.
- `rf_eye_tracking.joblib`, `xgb_eye_tracking.joblib`: example saved model artifacts (may be created after running notebook cells).
- `eye_tracking_analysis.py`: companion script (if present) with helper routines.
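For reference, a recording CSV might look like the following (column names match what the notebook expects; the sample values are purely illustrative):

```
timestamp,x,y
0.000,512.3,384.1
0.004,513.1,383.8
0.008,540.7,379.2
```

Only `x` and `y` are required; `timestamp` is optional.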
- Create and activate a Python virtual environment (recommended):

  ```powershell
  python -m venv .venv
  .venv\Scripts\Activate.ps1
  ```
- Install dependencies (the notebook uses scikit-learn, pandas, numpy, matplotlib, seaborn, xgboost, imbalanced-learn, joblib):

  ```powershell
  pip install -r requirements.txt
  ```
- Start Jupyter Lab or Notebook and open `eye_tracking_analysis.ipynb`:

  ```powershell
  jupyter lab
  ```
- Run notebook cells in order. Recommended order:
  - Run the imports & constants cell.
  - Run the data loading cell (`load_all_data`) to produce `df_raw`.
  - Run the preprocessing (velocity) and feature extraction cells that create `X_df`, `y`, and `feature_names`.
  - Run the encoding/train cell (RandomForest baseline).
  - Optionally run the SMOTE and comparison cells to compare `RandomForest` and `XGBoost` with/without SMOTE.
  - Run the visualization cells to produce accuracy bar charts, confusion matrices, and feature importances.
- Data ingestion: reads CSVs from `dataset/`, extracts `user` and `task` from filenames, and forms session ids (see the loading sketch after this list).
- Preprocessing: computes per-session differences (`dx`, `dy`) and a `velocity` feature (handles an optional `timestamp` column). Velocity is clipped to remove extreme outliers.
- Event detection (I-DT): detects fixations and saccades using a dispersion-threshold identification algorithm; events are used to create event-based window features (e.g., fixation count, mean fixation duration, saccade amplitude). A sketch of the core I-DT loop appears below.
- Sliding-window features: converts sessions into overlapping windows (configurable `window_size` and `step_size`) and computes statistical features (mean, std, min, max) for `x`, `y`, and `velocity` per window (windowing sketch below).
- Modeling: trains a `RandomForestClassifier` baseline; includes a cell that trains an XGBoost classifier for comparison.
- Class imbalance: includes a SMOTE cell that oversamples only the training set and prints class distributions before and after resampling (SMOTE sketch below).
- Model comparison & tuning: the comparison cell trains RF and XGB on original vs. SMOTE-resampled training data and reports accuracies and classification reports; a GridSearchCV cell is provided to tune RandomForest hyperparameters (tuning sketch below).
- Visualization: accuracy bar plots, normalized confusion matrices, and top-feature importance plots, with a human-readable summary and interpretation tips.
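The loading and velocity steps might look roughly like this sketch. The names `load_all_data`, `df_raw`, `dx`, `dy`, `velocity`, `user`, `task`, and `session` come from the notebook; everything else, including the clipping percentile, is an assumption:

```python
from pathlib import Path

import numpy as np
import pandas as pd

def load_all_data(data_dir: str = "dataset") -> pd.DataFrame:
    """Load every CSV, deriving user/task/session from names like P01_READ.csv."""
    frames = []
    for path in sorted(Path(data_dir).glob("*.csv")):
        user, task = path.stem.split("_", 1)   # "P01_READ" -> ("P01", "READ")
        df = pd.read_csv(path)
        df["user"], df["task"], df["session"] = user, task, path.stem
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

def add_velocity(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-session dx/dy and a clipped velocity feature."""
    df = df.copy()                       # rows are already time-ordered per file
    g = df.groupby("session")
    df["dx"] = g["x"].diff()
    df["dy"] = g["y"].diff()
    dist = np.hypot(df["dx"], df["dy"])
    if "timestamp" in df.columns:
        dt = g["timestamp"].diff().replace(0, np.nan)
        df["velocity"] = dist / dt
    else:
        df["velocity"] = dist            # per-sample displacement if no timestamps
    df["velocity"] = df["velocity"].fillna(0.0)
    # clip extreme outliers; the 99th percentile is an illustrative choice
    df["velocity"] = df["velocity"].clip(upper=df["velocity"].quantile(0.99))
    return df
```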
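I-DT classifies a run of consecutive samples as a fixation when its spatial dispersion, commonly `(max(x) - min(x)) + (max(y) - min(y))`, stays below a threshold for at least a minimum duration; samples between fixations are treated as saccades. A minimal sketch of the core loop (the threshold and duration defaults are illustrative, not the notebook's values):

```python
def idt_fixations(x, y, t, dispersion_threshold=30.0, min_duration=0.1):
    """Return (start_idx, end_idx) pairs of fixations found by I-DT.

    x, y, t are equal-length NumPy arrays of gaze coordinates and timestamps.
    """
    fixations = []
    i, n = 0, len(x)
    while i < n:
        # grow a window from i until it spans at least min_duration
        j = i
        while j < n and t[j] - t[i] < min_duration:
            j += 1
        if j >= n:
            break
        wx, wy = x[i:j + 1], y[i:j + 1]
        if (wx.max() - wx.min()) + (wy.max() - wy.min()) <= dispersion_threshold:
            # fixation: extend the window while dispersion stays under the threshold
            while j + 1 < n:
                wx, wy = x[i:j + 2], y[i:j + 2]
                if (wx.max() - wx.min()) + (wy.max() - wy.min()) > dispersion_threshold:
                    break
                j += 1
            fixations.append((i, j))
            i = j + 1
        else:
            i += 1   # not a fixation start; slide forward one sample
    return fixations
```

Event-based features (fixation count, mean fixation duration, saccade amplitude) can then be aggregated per window from these index pairs.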
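Window extraction follows the usual overlapping-slice pattern. A sketch assuming per-session DataFrames with `x`, `y`, `velocity`, and `task` columns (the function name, defaults, and feature-name scheme are illustrative):

```python
import pandas as pd

def window_features(session_df: pd.DataFrame,
                    window_size: int = 100,
                    step_size: int = 50) -> pd.DataFrame:
    """Slide an overlapping window over one session and compute per-window stats."""
    rows = []
    for start in range(0, len(session_df) - window_size + 1, step_size):
        win = session_df.iloc[start:start + window_size]
        feats = {}
        for col in ("x", "y", "velocity"):
            feats[f"{col}_mean"] = win[col].mean()
            feats[f"{col}_std"] = win[col].std()
            feats[f"{col}_min"] = win[col].min()
            feats[f"{col}_max"] = win[col].max()
        feats["task"] = win["task"].iloc[0]   # the window inherits the session's label
        rows.append(feats)
    return pd.DataFrame(rows)
```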
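The key point of the SMOTE cell is that resampling touches only the training split, so the test set keeps the true class distribution. Roughly (the split parameters are assumptions):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# X_df and y come from the feature-extraction cells
X_train, X_test, y_train, y_test = train_test_split(
    X_df, y, test_size=0.2, stratify=y, random_state=42
)

print("before:", Counter(y_train))
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("after: ", Counter(y_train_res))
# fit models on (X_train_res, y_train_res); always evaluate on the untouched X_test
```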
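The RandomForest tuning cell is a standard `GridSearchCV` run; the grid below is illustrative, not the notebook's exact grid, and it continues from the SMOTE sketch above (`X_train_res`, `y_train_res`):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# illustrative grid; the notebook's actual grid may differ
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train_res, y_train_res)
print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```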
- Run cells top-to-bottom to avoid `NameError` from missing variables (cells create and rely on globals like `X_df`, `y`, `X_train`, `X_test`).
- If you edit the notebook structure (add/remove cells), re-run the preprocessing cells to re-create `X_df` and `y` before the training cells.
- If `imbalanced-learn` or `xgboost` is missing, install it with `pip install imbalanced-learn xgboost`.
- Tune XGBoost with GridSearchCV, or use early stopping with a validation split (sketch after this list).
- Experiment with different SMOTE ratios, `window_size` and `step_size` values, and I-DT thresholds for event detection.
- Export the best model and build a small CLI script in `eye_tracking_analysis.py` to run inference on new recordings (see the CLI sketch below).
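For the early-stopping idea, a sketch using XGBoost's scikit-learn wrapper (in xgboost >= 2.0, `early_stopping_rounds` is a constructor argument; `X_train`/`y_train` come from the notebook's split, and the integer label encoding is shown because XGBoost expects numeric classes):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# XGBoost wants integer class labels
y_train_enc = LabelEncoder().fit_transform(y_train)

# carve a validation split out of the training data for early stopping
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train_enc, test_size=0.2, stratify=y_train_enc, random_state=42
)

model = XGBClassifier(
    n_estimators=1000,             # generous cap; early stopping picks the real count
    learning_rate=0.05,
    early_stopping_rounds=20,      # constructor argument in xgboost >= 2.0
    eval_metric="mlogloss",
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
print("best iteration:", model.best_iteration)
```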
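A minimal CLI for `eye_tracking_analysis.py` could look like this. It assumes the preprocessing helpers (`add_velocity` and `window_features`, the hypothetical names from the sketches above) live in the same module and that the saved model was trained on the same feature set:

```python
import argparse

import joblib
import pandas as pd

# add_velocity and window_features are the hypothetical helpers sketched above,
# assumed to be defined in this module alongside main().

def main() -> None:
    parser = argparse.ArgumentParser(description="Classify a new eye-tracking recording")
    parser.add_argument("csv_path", help="CSV with x, y (and optional timestamp) columns")
    parser.add_argument("--model", default="rf_eye_tracking.joblib", help="saved model artifact")
    args = parser.parse_args()

    df = pd.read_csv(args.csv_path)
    df["session"] = args.csv_path      # treat the whole file as one session
    df["task"] = "UNKNOWN"             # placeholder label; dropped before prediction
    df = add_velocity(df)              # same preprocessing as training
    X_new = window_features(df).drop(columns=["task"])

    model = joblib.load(args.model)
    preds = model.predict(X_new)
    # if labels were encoded at training time, map preds back with the saved encoder
    # majority vote over windows -> one label for the whole recording
    print(pd.Series(preds).mode().iloc[0])

if __name__ == "__main__":
    main()
```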
See LICENSE in the repository root.
Possible follow-up improvements to the notebook include reorganizing its markdown cells, adding automated text summaries under the plots, and exporting plots to PNG files.