This project explores how Principal Component Analysis (PCA) and K-Means clustering perform on the Flags dataset from the UCI Machine Learning Repository. The study highlights how datasets with mixed feature types influence interpretability and model performance.
- flags_pca_clustering.ipynb: Main Jupyter notebook containing code and explanations.
- flags_pca_clustering.html: Exported HTML version of the notebook.
- flags_pca_clustering.pdf: Exported PDF version of the notebook.
- figures/: Contains exported plots, mainly for reference; all key figures are already embedded in the outputs.
- Perform PCA and K-Means clustering on the Flags dataset.
- Conduct exploratory data analysis to visualize trends and correlations.
- Assess the PCA projection onto the first two principal components.
- Analyze clustering results and discuss the tradeoff between parsimony and interpretability.
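The objectives above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names, the choice of three clusters, and the preprocessing (standardizing numeric columns, one-hot encoding categorical ones) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for a mixed-type table (NOT the real Flags data).
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "area": rng.gamma(2.0, 500.0, n),                    # continuous
    "colours": rng.integers(1, 8, n),                    # integer count
    "mainhue": rng.choice(["red", "green", "blue"], n),  # categorical
})

numeric_cols = ["area", "colours"]
categorical_cols = ["mainhue"]

# Put mixed feature types on a comparable numeric scale before PCA.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Project onto the first two principal components.
pca_pipeline = Pipeline([("prep", preprocess), ("pca", PCA(n_components=2))])
scores = pca_pipeline.fit_transform(df)
explained = pca_pipeline.named_steps["pca"].explained_variance_ratio_

# Cluster in the reduced space; k=3 is an arbitrary choice for illustration.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

print("Explained variance ratio:", explained)
print("Cluster sizes:", np.bincount(labels))
```

Clustering on only two components keeps the model parsimonious and easy to plot, but each component mixes scaled numeric columns with one-hot indicators, which is where interpretability becomes harder.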
- Python (3.10.16 recommended)
- Jupyter Notebook / Jupyter Lab
- Python packages: pandas, numpy, matplotlib, seaborn, altair, scikit-learn, ucimlrepo
You can install the required packages using:
pip install pandas numpy matplotlib seaborn altair scikit-learn ucimlrepo
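To confirm the installation succeeded before opening the notebook, a small check along these lines may help. The helper `missing_packages` is a hypothetical name introduced for this example; the only non-obvious detail is that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Map pip package names to the module names they are imported under.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "altair": "altair",
    "scikit-learn": "sklearn",
    "ucimlrepo": "ucimlrepo",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```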
- Clone or download this repository.
- Open flags_pca_clustering.ipynb in Jupyter Notebook or Jupyter Lab and load the dataset via the ucimlrepo package.
- Run all cells to reproduce results, figures, and exported HTML/PDF outputs.
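For reference, loading the dataset through ucimlrepo typically looks like the sketch below. The function name `load_flags` is hypothetical, and the dataset id of 40 is an assumption that should be checked against the Flags page on the UCI repository; the notebook itself may load the data differently.

```python
def load_flags(dataset_id=40):
    """Fetch the Flags dataset from the UCI repository (requires network access).

    dataset_id=40 is assumed to be the Flags dataset; verify it on the UCI site.
    """
    # Imported inside the function so the module loads even without ucimlrepo.
    from ucimlrepo import fetch_ucirepo

    flags = fetch_ucirepo(id=dataset_id)
    # features and targets are pandas DataFrames; targets may be None
    # if the dataset declares no target column.
    return flags.data.features, flags.data.targets
```

Calling `X, y = load_flags()` downloads the data on each call, so results depend on the repository being reachable.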