Exploration of PCA and K-Means clustering on the UCI Flags dataset, with EDA, dimensionality reduction, and interactive visualizations. This project was done in Python.

alan-c-lin/flags_pca_clustering

Unsupervised Learning on the Flags Dataset: PCA and K-Means Clustering

This project explores how Principal Component Analysis (PCA) and K-Means clustering perform on the Flags dataset from the UCI Machine Learning Repository. The study highlights how datasets with mixed feature types influence interpretability and model performance.

Project Structure

  • flags_pca_clustering.ipynb — Main Jupyter notebook containing code and explanations.
  • flags_pca_clustering.html — Exported HTML version of the notebook.
  • flags_pca_clustering.pdf — Exported PDF version of the notebook.
  • figures/ — Contains exported plots mainly for reference; all key figures are already embedded in the outputs.

Key Points

  • Perform PCA and K-Means clustering on the Flags dataset.
  • Conduct exploratory data analysis to visualize trends and correlations.
  • Assess the PCA projection of the data onto the first two principal components.
  • Analyze clustering results and discuss the tradeoff between parsimony and interpretability.
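
The steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the toy table, its column names (borrowed from a few Flags attributes), the choice of three clusters, and the decision to cluster on the two PCA scores are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for a mixed-type table (NOT the real Flags data).
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "area": rng.gamma(2.0, 500.0, size=n),         # continuous, skewed
    "colours": rng.integers(1, 8, size=n),         # small integer count
    "religion": rng.choice(["a", "b", "c"], size=n),  # categorical
})

numeric_cols = ["area", "colours"]
categorical_cols = ["religion"]

# Scale numeric columns and one-hot encode categorical ones so that
# no single feature dominates the variance that PCA picks up.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array, which PCA expects
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=2)),
])

# Project every row onto the first two principal components.
scores = pipeline.fit_transform(toy)
explained = pipeline.named_steps["pca"].explained_variance_ratio_

# Cluster in the reduced space; k=3 is an arbitrary choice for this sketch.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

print("explained variance ratio:", explained)
print("cluster sizes:", np.bincount(labels))
```

Clustering on two components keeps the result easy to plot, but it discards whatever variance the remaining components carry, which is one face of the parsimony versus interpretability tradeoff mentioned above.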

Requirements

  • Python (3.10.16 recommended)
  • Jupyter Notebook / Jupyter Lab
  • Python packages: pandas, numpy, matplotlib, seaborn, altair, scikit-learn, ucimlrepo

You can install the required packages using:

pip install pandas numpy matplotlib seaborn altair scikit-learn ucimlrepo

How to Use

  1. Clone or download this repository.
  2. Open flags_pca_clustering.ipynb in Jupyter Notebook or Jupyter Lab and load the dataset via the ucimlrepo package.
  3. Run all cells to reproduce the results and figures shown in the exported HTML and PDF versions.
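
For reference, loading the dataset through ucimlrepo might look like the sketch below. The helper name `load_flags` is hypothetical, and the dataset id 40 is an assumption that should be checked against the Flags page on the UCI repository; the notebook itself may load the data differently.

```python
def load_flags():
    """Fetch the Flags dataset and return (features, targets) as DataFrames.

    Hypothetical helper for illustration; requires network access.
    """
    # Imported lazily so that defining the helper does not require the package.
    from ucimlrepo import fetch_ucirepo

    flags = fetch_ucirepo(id=40)  # assumed id for the Flags dataset
    return flags.data.features, flags.data.targets
```

Calling `X, y = load_flags()` inside the notebook would then give the feature and target tables to pass on to the EDA and PCA steps.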
