Harmonic Clustering is a research project focused on identifying organic listener communities from large-scale music streaming data. Instead of traditional demographic-based clustering, we adopt a network-science approach by building user-user graphs derived from temporal listening patterns and applying community detection algorithms to uncover shared music interests.
Check out the full implementation in our main notebook.
Streaming platforms accumulate rich temporal listening histories. Rather than grouping users by genres or playlists alone, this project investigates how temporal overlaps in listening behavior can naturally lead to the formation of music communities. We aim to highlight the social dimensions of music discovery by mapping these overlaps into graph structures.
We use a filtered version of the Last.fm 1K users dataset, which includes detailed listening histories for 1,000 users over several years.
- Extracted listening records for a fixed 3-month duration.
- Grouped listening history into weekly pseudo-playlists.
- Sampled a subset of users with high listening activity for focused analysis.
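
As a rough illustration of the windowing and weekly grouping described above, the following minimal sketch uses only the Python standard library. The function names (`weekly_playlists`, `most_active_users`), the ISO-week keying, and the sample records are assumptions made for this example and are not taken from the project notebook.

```python
from collections import defaultdict
from datetime import datetime


def weekly_playlists(records, start, end):
    """Group (user, track, timestamp) records into weekly track sets per user.

    Only records with start <= timestamp < end are kept. The result maps
    user -> {(iso_year, iso_week): set of tracks}.
    """
    playlists = defaultdict(lambda: defaultdict(set))
    for user, track, ts in records:
        if start <= ts < end:
            iso_year, iso_week, _ = ts.isocalendar()
            playlists[user][(iso_year, iso_week)].add(track)
    return {user: dict(weeks) for user, weeks in playlists.items()}


def most_active_users(playlists, top_n):
    """Rank users by distinct tracks summed over their weeks; return the top_n."""
    activity = {
        user: sum(len(tracks) for tracks in weeks.values())
        for user, weeks in playlists.items()
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(activity, key=lambda u: (-activity[u], u))[:top_n]


# Hypothetical sample records; real rows would come from the dataset file.
records = [
    ("user_a", "track_1", datetime(2009, 1, 5, 10, 0)),
    ("user_a", "track_2", datetime(2009, 1, 6, 21, 30)),
    ("user_a", "track_3", datetime(2009, 1, 13, 8, 15)),
    ("user_b", "track_2", datetime(2009, 1, 7, 12, 0)),
    ("user_c", "track_9", datetime(2009, 4, 20, 9, 0)),  # outside the window
]

window_start = datetime(2009, 1, 1)
window_end = datetime(2009, 4, 1)  # end of the three-month window (exclusive)

playlists = weekly_playlists(records, window_start, window_end)
top_users = most_active_users(playlists, top_n=1)
```

Keying each pseudo-playlist by ISO year and week keeps weeks from different years apart, which matters if the chosen window crosses a year boundary.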
- Parsed the Last.fm-1K data to extract relevant fields (user, artist, track, timestamp).
- Created weekly pseudo-playlists per user based on timestamps.
- Filtered tracks to retain only frequent ones and active users.
- Computed pairwise user similarity using Jaccard index based on shared tracks in overlapping weeks.
- Created a weighted user-user similarity graph using NetworkX.
- Applied multiple community detection algorithms:
  - Louvain algorithm
  - Label Propagation
  - Girvan-Newman (edge betweenness)
- Compared modularity scores and cluster distributions.
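
The similarity step above can be read in more than one way. The sketch below assumes one plausible interpretation: for every week in which both users have a pseudo-playlist, compute the Jaccard index of their track sets, then average over those shared weeks. The helper names, the averaging rule, and the `threshold` parameter are assumptions for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_similarity(weeks_a, weeks_b):
    """Mean per-week Jaccard index over the weeks both users were active."""
    shared_weeks = weeks_a.keys() & weeks_b.keys()
    if not shared_weeks:
        return 0.0
    total = sum(jaccard(weeks_a[w], weeks_b[w]) for w in shared_weeks)
    return total / len(shared_weeks)


def similarity_edges(playlists, threshold=0.0):
    """Return {(user_u, user_v): score} for pairs scoring above threshold."""
    edges = {}
    for u, v in combinations(sorted(playlists), 2):
        score = user_similarity(playlists[u], playlists[v])
        if score > threshold:
            edges[(u, v)] = score
    return edges


# Hypothetical weekly pseudo-playlists keyed by (iso_year, iso_week).
playlists = {
    "user_a": {(2009, 2): {"t1", "t2"}, (2009, 3): {"t3"}},
    "user_b": {(2009, 2): {"t2", "t4"}, (2009, 3): {"t3"}},
    "user_c": {(2009, 5): {"t1"}},  # no week in common with the others
}

edges = similarity_edges(playlists)
```

Each resulting pair can then be added to a NetworkX graph as a weighted edge, for example with `Graph.add_edge(u, v, weight=score)`. Comparing all pairs is quadratic in the number of users, which is one reason to restrict the analysis to a sampled subset of active users.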
We visualized the user-user network using force-directed layouts. Nodes represent users and edges indicate similarity based on weekly listening overlap. Communities were detected using different algorithms and visualized below.
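
For readers who want to reproduce a comparable layout, the sketch below builds a small weighted graph and computes force-directed positions with NetworkX's `spring_layout` (a Fruchterman-Reingold layout). The edge list and the seed are invented for the example, and whether the project used this particular layout function is an assumption.

```python
import networkx as nx

# Hypothetical similarity edges: (user, user, Jaccard-based weight).
weighted_edges = [
    ("u1", "u2", 0.8),
    ("u2", "u3", 0.7),
    ("u1", "u3", 0.6),
    ("u3", "u4", 0.1),
    ("u4", "u5", 0.9),
]

G = nx.Graph()
G.add_weighted_edges_from(weighted_edges)  # stored under the "weight" attribute

# Heavier edges pull their endpoints closer together; the seed makes the
# layout reproducible between runs.
positions = nx.spring_layout(G, weight="weight", seed=42)

# positions maps each node to a 2D coordinate that can be passed to
# nx.draw(G, pos=positions) when matplotlib is available.
```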
Each visualization highlights distinct clustering behavior: Louvain and Label Propagation (LPA) produce communities of differing granularity, while Harmonic Clustering offers smoother transitions. The size comparison plot shows how cluster memberships are distributed across communities.
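
A comparison of this kind can be sketched with NetworkX's built-in community functions. The example assumes a recent NetworkX release that provides `louvain_communities`, uses `asyn_lpa_communities` as the label propagation variant, and takes only the first split produced by `girvan_newman`. These choices, the toy graph, and the `summarize` helper are assumptions, not the project's confirmed setup.

```python
import networkx as nx
from networkx.algorithms import community as nx_comm

# Toy graph: two tightly connected triangles joined by one weak edge.
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 0.9), ("b", "c", 0.9), ("a", "c", 0.9),
    ("d", "e", 0.9), ("e", "f", 0.9), ("d", "f", 0.9),
    ("c", "d", 0.1),
])


def summarize(graph, communities):
    """Return the weighted modularity and sorted community sizes of a partition."""
    communities = [set(c) for c in communities]
    return {
        "modularity": nx_comm.modularity(graph, communities, weight="weight"),
        "sizes": sorted((len(c) for c in communities), reverse=True),
    }


results = {
    "louvain": summarize(
        G, nx_comm.louvain_communities(G, weight="weight", seed=42)
    ),
    "label_propagation": summarize(
        G, nx_comm.asyn_lpa_communities(G, weight="weight", seed=42)
    ),
    # girvan_newman yields successively finer partitions; take the first one.
    "girvan_newman": summarize(G, next(nx_comm.girvan_newman(G))),
}

for name, summary in results.items():
    print(f"{name}: Q={summary['modularity']:.3f}, sizes={summary['sizes']}")
```

Girvan-Newman recomputes edge betweenness after every removal, so on larger user graphs it is considerably slower than the other two methods and may only be practical on a sampled subgraph.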
- Investigate whether users naturally cluster based on listening time and content overlaps.
- Build an interpretable and reproducible framework for temporal music graph analysis.
- Contribute a short research report demonstrating the effectiveness of this methodology.
Example visualizations of the user-user network, along with detected communities and representative artists, will be included in the report and repository.
- Data preprocessing and playlist formation
- User similarity computation
- Graph construction
- Community detection
- Visualization and analysis
- Report writing
This project is licensed under the MIT License.
Contributions, feedback, and suggestions are welcome. Please fork the repository and open a pull request for any additions or improvements.




