This project applies Principal Component Analysis (PCA) and K-Means Clustering to a high-dimensional dataset (93 variables) to explore the relationship between psychographic traits—such as personality, phobias, and habits—and physical characteristics.
The analysis reduces the 93 initial variables to 5 interpretable factors that capture the dominant directions of variance in behavior and physical profiles. Clustering in this reduced space then groups respondents into profiles that pair psychological traits with physical characteristics.
- Data Preprocessing: Handling categorical data via Ordinal Encoding and normalizing numerical features using StandardScaler.
- Dimensionality Reduction: Implementing PCA to condense the feature space while retaining structural integrity.
- Unsupervised Learning: Utilizing K-Means Clustering to segment the population based on the principal components.
- Statistical Profiling: Analyzing cluster centroids to interpret the correlation between height/weight and behavioral factors (e.g., extraversion vs. anxiety).
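The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`q0`..`q9`, `height`, `weight`, `smoking`) and the helper `run_pipeline` are hypothetical, and the defaults of 5 components and 4 clusters are taken from the figures reported in this README.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey (hypothetical columns, not the real dataset).
n = 200
df = pd.DataFrame(
    rng.integers(1, 6, size=(n, 10)).astype(float),
    columns=[f"q{i}" for i in range(10)],
)
df["height"] = rng.normal(172, 9, n)
df["weight"] = rng.normal(66, 12, n)
df["smoking"] = rng.choice(["never", "former", "current"], n)


def run_pipeline(data, categorical_cols, n_components=5, n_clusters=4, seed=0):
    """Encode, scale, project with PCA, then cluster the component scores."""
    # OrdinalEncoder orders categories alphabetically unless `categories=` is given;
    # a real analysis would likely pass the meaningful order explicitly.
    encoded = OrdinalEncoder().fit_transform(data[categorical_cols])
    numeric = data.drop(columns=categorical_cols).to_numpy(dtype=float)
    X = np.hstack([numeric, encoded])

    # Standardize so that no variable dominates PCA through its scale alone.
    X_scaled = StandardScaler().fit_transform(X)

    pca = PCA(n_components=n_components, random_state=seed)
    scores = pca.fit_transform(X_scaled)

    # Cluster in the reduced space of principal component scores.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(scores)
    return scores, labels, pca


scores, labels, pca = run_pipeline(df, categorical_cols=["smoking"])
print(scores.shape)
print(pca.explained_variance_ratio_.cumsum())
```

The cumulative `explained_variance_ratio_` shows how much of the total variance the retained components account for, which is one common way to judge whether a given number of factors is adequate.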
The model identified four distinct psychophysical profiles:
- The Reckless Hedonist (Cluster 1): Linked to larger physical stature (avg. 178.6 cm, 73.3 kg) and sensation-seeking behaviors.
- The Anxious Conformist (Cluster 0): Characterized by smaller physical frames (avg. 169.3 cm, 59.3 kg).
- The Female-Dominant Profile (Cluster 2): 60.6% female with distinct psychological markers.
- The Heterogeneous Group (Cluster 3): High weight variability ($\sigma = 16.8$), representing a physically diverse segment.
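Summary figures like the ones above (average height and weight, weight standard deviation, share of female respondents) can be obtained by grouping on the cluster label. The sketch below assumes a DataFrame with hypothetical columns `cluster`, `height`, `weight`, and `gender`; the toy values are invented for illustration and are not the project's data.

```python
import pandas as pd

# Toy data with hypothetical values, only to show the shape of the computation.
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2],
    "height": [168.0, 170.0, 178.0, 180.0, 165.0, 175.0],
    "weight": [58.0, 60.0, 72.0, 76.0, 55.0, 75.0],
    "gender": ["female", "male", "male", "male", "female", "female"],
})


def profile_clusters(df):
    """Per-cluster mean and standard deviation of height/weight, plus composition."""
    # pandas computes the sample standard deviation (ddof=1) by default.
    stats = df.groupby("cluster")[["height", "weight"]].agg(["mean", "std"])
    stats.columns = [f"{col}_{stat}" for col, stat in stats.columns]

    # Share of respondents labeled "female" in each cluster.
    stats["female_share"] = df["gender"].eq("female").groupby(df["cluster"]).mean()
    stats["size"] = df.groupby("cluster").size()
    return stats


profile = profile_clusters(toy)
print(profile)
```

A large `weight_std` relative to the other clusters is the kind of signal that would mark a cluster as physically heterogeneous.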
- Language: Python 3
- Libraries:
  - Scikit-Learn (PCA, KMeans, Preprocessing)
  - Pandas & NumPy (Data Manipulation)
  - Matplotlib (Visualization)
- Environment: Jupyter Notebook
- Course: Computational Linear Algebra
- Academic Year: 2025/2026
- Authors: Lucio Baiocchi, Leonardo Passafiume