
emirmasood/SpotifyClusteringAnalysis


From Tunes to Trends: Data and Clustering Analysis of My Liked Songs on Spotify

This repository explores my Spotify listening behavior with unsupervised learning and visual analytics. It analyzes the audio features of my Liked Songs to uncover listening trends, music preferences, and mood patterns using K-Means clustering, PCA, and a Power BI dashboard.


Overview

Spotify provides rich song-level audio features such as tempo, energy, valence, danceability, and acousticness, which can be analyzed to reveal listening preferences and moods. This project builds an end-to-end analytical pipeline, from data extraction to dashboard visualization, to understand how songs group together based on their musical attributes.

The analysis identifies:

  • Listening clusters (based on song audio features)
  • Genre and artist distribution patterns
  • Personal listening moods and preferences
  • Time-of-day and energy/valence trends
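The README does not say which timestamp the time-of-day trend uses. One possibility, assumed here, is the `added_at` field that the Spotify Web API returns for each saved track. That field is an ISO 8601 UTC string, for example `2023-05-01T08:15:00Z`. Note that it records when a song was liked, not when it was played. The bucket names and hour boundaries below are hypothetical:

```python
from collections import Counter
from datetime import datetime


def time_of_day(hour):
    """Map an hour (0-23) to a coarse, hypothetical time-of-day bucket."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def bucket_counts(timestamps):
    """Count ISO 8601 UTC timestamps (YYYY-MM-DDTHH:MM:SSZ) per time-of-day bucket.

    Hours are taken in UTC as-is; converting to the listener's local time zone
    first would be needed for a faithful "time of day".
    """
    hours = (datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").hour for ts in timestamps)
    return Counter(time_of_day(h) for h in hours)
```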

Data Description

The dataset was retrieved from the Spotify Web API and enriched with song-level audio features such as:

| Feature | Description |
|---|---|
| danceability | How suitable a track is for dancing, based on rhythm stability and tempo (0.0 to 1.0). |
| energy | Measure of intensity and activity, from 0.0 (calm) to 1.0 (energetic). |
| valence | Musical positiveness from 0.0 to 1.0; higher values represent happier, brighter tracks. |
| tempo | Estimated tempo in beats per minute (BPM). |
| acousticness | Confidence (0.0 to 1.0) that the track is acoustic. |
| instrumentalness | Likelihood (0.0 to 1.0) that the track contains no vocals. |
| speechiness | Presence of spoken words (e.g., podcasts vs. songs), from 0.0 to 1.0. |
| popularity | Spotify’s popularity index for each track, from 0 to 100. |

🔬 Methodology

1️⃣ Data Collection & ETL

  • Extracted personal “Liked Songs” data using the Spotify Web API.
  • Stored and cleaned data using SQL.
  • Transformed and aggregated metadata for analysis and visualization.
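The README does not publish the SQL used for cleaning. The following is a minimal sketch, assuming a hypothetical `liked_songs` table. The column names and toy rows are illustrative only and are not the project's data. Under those assumptions, a single query can remove duplicates and filter out missing values, shown here with Python's built-in `sqlite3` module:

```python
import sqlite3

# Hypothetical schema: the actual table layout is not given in the README.
SCHEMA = """
CREATE TABLE liked_songs (
    track_id TEXT,
    name     TEXT,
    artist   TEXT,
    energy   REAL,
    valence  REAL
)
"""

# Toy rows for illustration only: one exact duplicate and one missing value.
ROWS = [
    ("t1", "Song A", "Artist X", 0.80, 0.60),
    ("t1", "Song A", "Artist X", 0.80, 0.60),  # duplicate entry
    ("t2", "Song B", "Artist Y", None, 0.30),  # missing energy
    ("t3", "Song C", "Artist X", 0.40, 0.90),
]

# Keep one row per track_id and drop rows with missing audio features.
# MIN() just picks a deterministic value per column within each group.
CLEAN_QUERY = """
SELECT track_id, MIN(name), MIN(artist), MIN(energy), MIN(valence)
FROM liked_songs
WHERE energy IS NOT NULL AND valence IS NOT NULL
GROUP BY track_id
ORDER BY track_id
"""


def clean_tracks(rows):
    """Load raw rows into an in-memory SQLite database and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO liked_songs VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(CLEAN_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in clean_tracks(ROWS):
        print(row)
```

Grouping by `track_id` instead of using `SELECT DISTINCT` keeps exactly one row per track, even if a re-export changed some other column.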

2️⃣ Data Preprocessing

  • Handled missing values and duplicate songs.
  • Normalized numerical features to the [0, 1] range using Min–Max scaling.
  • Encoded categorical attributes (e.g., genres, artists).
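Below is a minimal NumPy sketch of the scaling and encoding steps. It assumes the features arrive as a numeric matrix with one row per track, and that each track has a single primary-genre label; that label is a hypothetical simplification. Min–Max scaling matters here because the features use different scales: `tempo` is in BPM and `popularity` runs from 0 to 100, while most other features already lie between 0.0 and 1.0.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for columns that hold a single repeated value.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


def one_hot(labels):
    """One-hot encode a list of category labels (e.g. one genre per track)."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded, categories


# Toy columns for illustration: tempo (BPM), energy, popularity.
X = [[120, 0.8, 60], [90, 0.4, 30], [150, 0.6, 90]]
scaled = min_max_scale(X)
genres, genre_names = one_hot(["pop", "rock", "pop"])
```

If scikit-learn is available, its `MinMaxScaler` and `OneHotEncoder` perform the same transformations.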

3️⃣ Clustering & Dimensionality Reduction

  • Applied Principal Component Analysis (PCA) for feature compression and visualization.
  • Implemented clustering algorithms:
    • K-Means
    • DBSCAN
    • Agglomerative Clustering
  • Identified meaningful clusters representing different musical “moods” or listening profiles.
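The README does not say which implementation was used. This sketch uses plain NumPy so that each step is visible. PCA is computed as a singular value decomposition of the centered feature matrix. K-Means is implemented as Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. DBSCAN and agglomerative clustering are not shown. The function names and the `n_iter` and `seed` parameters are illustrative assumptions:

```python
import numpy as np


def pca(X, n_components=2):
    """Project X onto its top principal components via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy 2-D points forming two obvious groups (illustrative only).
X = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5]]
labels, centroids = kmeans(X, k=2)
coords = pca(X, n_components=1)
```

In practice, clustering would run on the scaled features, and `pca(X, 2)` would supply the 2-D coordinates for a cluster scatterplot colored by the K-Means labels. scikit-learn's `PCA` and `KMeans` classes are the usual library choice; its `KMeans` uses k-means++ initialization by default, which tends to be more robust than the uniform random initialization used here.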

4️⃣ Visualization

  • Created interactive dashboards in Power BI:
    • Energy vs Valence heatmaps
    • Genre and artist frequency plots
    • Listening mood distribution and top tracks
  • Generated static visualizations:
    • Word cloud of the most frequent artists
    • Cluster scatterplots
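An energy vs valence heatmap is, at its core, a 2-D count of tracks per grid cell, and a word cloud scales each artist by frequency. Power BI computes these aggregations internally. The sketch below shows only the underlying counting. It assumes energy and valence are on their 0.0 to 1.0 scale and that artists come as a flat list of names. The 5x5 grid size and the function names are arbitrary, illustrative choices:

```python
from collections import Counter

import numpy as np


def energy_valence_grid(energy, valence, bins=5):
    """Count tracks per (valence, energy) cell on a bins x bins grid over [0, 1]."""
    counts, _, _ = np.histogram2d(valence, energy, bins=bins, range=[[0, 1], [0, 1]])
    return counts  # counts[i, j]: tracks in valence bin i and energy bin j


def top_artists(artists, n=3):
    """Most frequent artists, i.e. the weights a word cloud would scale by."""
    return Counter(artists).most_common(n)


# Toy values for illustration only.
grid = energy_valence_grid([0.9, 0.95, 0.1], [0.9, 0.85, 0.3])
leaders = top_artists(["A", "B", "A", "C", "A", "B"], n=2)
```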

Author

Amir Masoud Almasi

About

Spotify Data Exploration and Unsupervised Learning of My Listening Pattern
