This project analyzes how environmental factors shape a cyclist's emotional experience (affective response). The workflow uses results from a large online survey to inform a controlled lab study, with a key focus on comparing manually created and LLM-generated video descriptions for downstream analyses.
The core pipeline is as follows:
- Predictive Modeling & Candidate Selection: Analyzes cyclist ratings from a large survey to train a model that selects affectively diverse videos for the lab study.
- Video Content Description: Uses two parallel methods—manual "ground truth" labeling and automated LLM-based feature extraction—to describe video events and environmental features.
- Physiological Data Processing: Extracts heart rate (PPG) and skin conductance (EDA) metrics, calculating baseline-corrected "Deltas" to measure physiological reactivity.
- Lab Study & SEM Analysis: Performs general analysis of lab ratings and employs Structural Equation Modeling (SEM) to analyze how subjective and physiological responses are influenced by static (e.g., scenery) and dynamic (e.g., traffic) elements.
- Presence & Immersion: Analyzes the Igroup Presence Questionnaire (IPQ) to evaluate participant immersion during the lab study across different demographics.
```
cycling_experience/
├── input_data/
│   ├── context_data/        # Geospatial data layers (e.g., bike networks, traffic volume)
│   ├── video_traces/        # GPX traces for cycling routes
│   ├── video_candidates/    # Raw video files for analysis
│   ├── online_results/      # Raw survey data from the online study
│   └── lab_results/         # Raw data from the lab study
│
├── output_data/             # All outputs (e.g., processed data, predictions, plots)
│   └── video_data/          # Processed video data and extracted features
│   ...
│
├── utils/                   # Utility scripts for data processing, plotting, etc.
│   ├── clustering_utils.py
│   ├── helper_functions.py
│   ├── lmm_utils.py
│   ├── plotting_utils.py
│   └── processing_utils.py
│
├── build_ground_truth.py            # Processes manual labels and geospatial features into the ground truth dataset
├── llm_feature_extraction.py        # Extracts features from videos using LLMs
├── online_survey_analysis.py        # Analyzes online survey data
├── candidate_video_prediction.py    # Predictive modeling and candidate video selection
├── lab_study_analysis.py            # Analyzes lab study data
├── static_dynamic_analysis_SEM.py   # Analyzes static vs. dynamic video features using Structural Equation Modelling; corresponds to the publication "Understanding Subjective Cycling Experience with Static, Dynamic and Physiological Cues"
│
├── config.ini               # Configuration file for paths and models
├── constants.py             # Constant variables (e.g., column names, categories)
└── requirements.txt         # Python package dependencies
```
- Clone the repository:

  ```bash
  git clone https://github.com/mie-lab/cycling_experience.git
  cd cycling_experience
  ```

- Create and Activate a Virtual Environment:

  ```bash
  # For Unix/macOS
  python3 -m venv venv
  source venv/bin/activate

  # For Windows
  python -m venv venv
  .\venv\Scripts\activate
  ```

- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download Data:
  - Place datasets (geospatial data, video files, survey results) in their respective folders under `input_data/`.
  - Ensure the folder structure matches the directory description provided above.

- Update Paths in `config.ini`:
  - Update all file and directory paths to match your local machine.
  - Provide a `gemini_api_key` if you intend to run `llm_feature_extraction.py` (a config-reading sketch follows below).
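  For reference, the config can be read and checked with Python's standard `configparser`. The section and option names below are hypothetical, so align them with the actual `config.ini`:

  ```python
  import configparser
  from pathlib import Path

  config = configparser.ConfigParser()
  config.read("config.ini")

  # Hypothetical section/option names -- match them to the repository's config.ini
  for key in ("input_data", "output_data"):
      path = Path(config["paths"][key])
      assert path.exists(), f"{key} points to a missing directory: {path}"

  gemini_api_key = config["models"].get("gemini_api_key", "")  # needed only for llm_feature_extraction.py
  ```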
- Generate the Ground Truth:

  ```bash
  python build_ground_truth.py
  ```

  - Description: Aggregates geospatial data (traffic, greenery, bike networks) and runs semantic segmentation on video frames (frame sampling is sketched below).
  - Output: 30 frames per video, `segmentation_results.csv`, and the master `video_ground_truth.csv`.
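  The 30-frames-per-video sampling could look like the following OpenCV sketch; it is illustrative only, with a hypothetical function name, and is not the repository's implementation:

  ```python
  import cv2  # requires opencv-python

  def extract_frames(video_path: str, n_frames: int = 30):
      """Sample n_frames evenly spaced frames from a video (illustrative sketch)."""
      cap = cv2.VideoCapture(video_path)
      total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
      frames = []
      for i in range(n_frames):
          cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / n_frames))  # jump to the i-th sample point
          ok, frame = cap.read()
          if ok:
              frames.append(frame)
      cap.release()
      return frames

  # e.g. frames = extract_frames("input_data/video_candidates/example.mp4")
  ```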
- Run the Online Survey Analysis:

  ```bash
  python online_survey_analysis.py
  ```

  - Description: Processes online ratings to assess valence and arousal, and generates demographic summaries to establish 'bikeable' or 'non-bikeable' labels.
  - Output: Processed survey data and affect-grid visualizations in `output_data/` (an affect-grid plotting sketch follows below).
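  An affect grid places each rating in valence-arousal space. A minimal matplotlib sketch on synthetic data (the rating scale and output file name are assumptions):

  ```python
  import numpy as np
  import matplotlib.pyplot as plt

  # Synthetic stand-in for processed ratings; the real range depends on the survey scale
  rng = np.random.default_rng(0)
  valence, arousal = rng.normal(size=300), rng.normal(size=300)

  fig, ax = plt.subplots(figsize=(5, 5))
  ax.scatter(valence, arousal, alpha=0.3)
  ax.axhline(0, color="grey", lw=0.8)  # neutral arousal
  ax.axvline(0, color="grey", lw=0.8)  # neutral valence
  ax.set_xlabel("Valence")
  ax.set_ylabel("Arousal")
  ax.set_title("Affect grid")
  fig.savefig("affect_grid.png", dpi=150)
  ```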
- Predict Candidate Videos:

  ```bash
  python candidate_video_prediction.py
  ```

  - Description: Uses KNN clustering and RMSE optimization to predict valence for candidate videos based on geospatial and semantic features (see the sketch below).
  - Output: `candidate_predictions.csv` with predicted valence scores.
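  One plausible reading of "KNN with RMSE optimization" is choosing the neighbour count that minimizes cross-validated RMSE; the scikit-learn sketch below illustrates that idea on synthetic data and is an assumption, not the script's actual logic:

  ```python
  import numpy as np
  from sklearn.neighbors import KNeighborsRegressor
  from sklearn.model_selection import cross_val_score

  def best_k(X, y, k_range=range(1, 16)):
      """Pick the neighbour count that minimises 5-fold cross-validated RMSE."""
      rmses = [-cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                                scoring="neg_root_mean_squared_error", cv=5).mean()
               for k in k_range]
      return list(k_range)[int(np.argmin(rmses))]

  # Illustrative: rated-video features -> mean valence, then score unseen candidates
  rng = np.random.default_rng(0)
  X, y = rng.normal(size=(60, 5)), rng.normal(size=60)
  model = KNeighborsRegressor(n_neighbors=best_k(X, y)).fit(X, y)
  predicted_valence = model.predict(rng.normal(size=(10, 5)))
  ```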
- Analyze Lab Study Data:

  ```bash
  python lab_study_analysis.py
  ```

  - Description: Performs block-level analysis (Validation, Equal, Positive, and Negative scenarios) and tests positional effects of "spoilers" using Linear Mixed Models (LMMs; see the sketch below).
  - Output: Scenario-specific visualizations and statistical model comparisons.
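  A minimal LMM sketch with statsmodels, using a random intercept per participant; the formula and column names are hypothetical stand-ins for the models defined in `lab_study_analysis.py`:

  ```python
  import numpy as np
  import pandas as pd
  import statsmodels.formula.api as smf

  # Synthetic stand-in for lab ratings
  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "valence": rng.normal(size=200),
      "block": rng.choice(["Validation", "Equal", "Positive", "Negative"], size=200),
      "spoiler_position": rng.integers(1, 5, size=200),
      "participant": rng.integers(1, 21, size=200),
  })

  # Fixed effects for scenario block and spoiler position; random intercept per participant
  model = smf.mixedlm("valence ~ C(block) + spoiler_position", df, groups=df["participant"])
  print(model.fit().summary())
  ```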
- Extract Features Using LLMs:

  ```bash
  python llm_feature_extraction.py
  ```

  - Description: Sends video files to the Gemini 2.5 Flash API to extract environmental features (lane counts, surface material, motorized traffic speed) via a Pydantic-validated prompt (see the sketch below).
  - Output: `video_llm_info.csv` containing automated video features.
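  A minimal sketch of schema-constrained extraction with the `google-genai` SDK and Pydantic. The schema fields, file name, and prompt are illustrative, and SDK details may differ between versions; the actual prompt and schema live in `llm_feature_extraction.py`:

  ```python
  from google import genai
  from pydantic import BaseModel

  class VideoFeatures(BaseModel):
      # Illustrative fields only
      lane_count: int
      surface_material: str
      motorized_traffic_speed: str

  client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # the script reads this from config.ini
  video = client.files.upload(file="input_data/video_candidates/example.mp4")  # hypothetical file
  # Note: uploaded videos may need a short wait until the file becomes ACTIVE.

  response = client.models.generate_content(
      model="gemini-2.5-flash",
      contents=[video, "Describe the cycling environment in this video."],
      config={"response_mime_type": "application/json", "response_schema": VideoFeatures},
  )
  features = VideoFeatures.model_validate_json(response.text)
  ```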
- Process Physiological Data:

  ```bash
  python physiological_data_analysis.py
  ```

  - Description: Processes physiological signals to extract cleaned EDA (SCL, SCR) and PPG (HR, HRV) signals (see the sketch below).
  - Output: `physiological_results.csv` with event-related and tonic metrics.
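  One common toolchain for this kind of processing is NeuroKit2; whether the script uses it is an assumption. The sketch below runs on simulated signals and also illustrates the baseline-corrected "Delta" idea from the project overview:

  ```python
  import neurokit2 as nk  # assumption: a NeuroKit2-style pipeline

  fs = 64  # illustrative sampling rate
  raw_eda = nk.eda_simulate(duration=120, sampling_rate=fs)  # stand-in for recorded EDA
  raw_ppg = nk.ppg_simulate(duration=120, sampling_rate=fs)  # stand-in for recorded PPG

  eda, _ = nk.eda_process(raw_eda, sampling_rate=fs)  # yields tonic SCL and phasic SCR
  ppg, _ = nk.ppg_process(raw_ppg, sampling_rate=fs)  # yields beat-wise heart rate

  # Baseline-corrected "Delta": event-window mean minus pre-event baseline mean
  baseline, event = slice(0, 30 * fs), slice(30 * fs, 60 * fs)  # illustrative windows
  delta_scl = eda["EDA_Tonic"].iloc[event].mean() - eda["EDA_Tonic"].iloc[baseline].mean()
  delta_hr = ppg["PPG_Rate"].iloc[event].mean() - ppg["PPG_Rate"].iloc[baseline].mean()
  ```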
- Run SEM & Causal Analysis:

  ```bash
  python static_dynamic_analysis_SEM.py
  ```

  - Description: Fits Structural Equation Models (SEM) and runs LiNGAM causal discovery to evaluate how infrastructure, visual elements, and dynamic events drive subjective and physiological affect (see the sketch below).
  - Output: Path diagrams, model fit statistics (`SEM_model_comparison.csv`), and coefficient matrices.
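  A minimal sketch of both analysis families using the `semopy` and `lingam` packages (an assumption; the actual model specifications live in `static_dynamic_analysis_SEM.py`), with hypothetical variable names on synthetic data:

  ```python
  import numpy as np
  import pandas as pd
  import semopy
  import lingam

  # Synthetic stand-in for the processed lab dataset
  rng = np.random.default_rng(0)
  df = pd.DataFrame(rng.normal(size=(200, 5)),
                    columns=["valence", "arousal", "delta_scl", "greenery", "traffic_events"])

  # Hypothetical lavaan-style specification: a latent affect factor driven by the environment
  desc = """
  affect =~ valence + arousal + delta_scl
  affect ~ greenery + traffic_events
  """
  model = semopy.Model(desc)
  model.fit(df)
  print(model.inspect())           # path coefficients
  print(semopy.calc_stats(model))  # fit indices (CFI, RMSEA, ...)

  # DirectLiNGAM estimates a causal ordering among observed variables
  cd = lingam.DirectLiNGAM()
  cd.fit(df[["greenery", "traffic_events", "valence"]])
  print(cd.adjacency_matrix_)
  ```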
This project is licensed under the MIT License. See the LICENSE file for details.