KingPhilip14/Data-Eng-Final-Proj

CSCI 422 Project - Ian King

This project focuses on analyzing the performance of music on Spotify. What is popular in the music industry is constantly changing, and Spotify does a wonderful job of keeping track of these changes. The purpose of this project is to analyze top-performing songs and albums to determine what makes certain music successful. This would also allow artists and listeners to compare the performance of other projects and potentially determine why a project was or was not successful.

Ingestion

During the ingestion stage of this project, two data sources are used: one for albums and one for singles.

Where the Data Comes From

The data comes from kworb.net, which lists the top 200 singles globally. The site refreshes every week with the most popular songs and displays information such as how many weeks a single has been on the chart, the streams it accrued that week, and its total streams.

How the Data for Singles is Collected

To collect the data for the top singles, web scraping is used. By reading the values in the table on the website, the artist name, song title, streams from the week, and total streams are collected. These values are then added to a CSV file.
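The scraping step can be sketched with Python's standard library. This is a minimal sketch, not the project's actual scraper: it assumes the chart is a plain HTML table whose artist and title share one cell in the form "Artist - Title", and the column positions TITLE_COL, WEEKLY_COL, and TOTAL_COL are hypothetical placeholders that would need to be checked against the real kworb.net page.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built only from <th> cells stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # also captures text inside nested <a> tags
            self._cell.append(data)


# Hypothetical column positions; check them against the real kworb.net table.
TITLE_COL, WEEKLY_COL, TOTAL_COL = 0, 1, 2


def to_int(text):
    """Convert a stream count such as '1,234' to an int."""
    return int(text.replace(",", ""))


def parse_chart(html):
    """Turn chart rows into dicts with artist, title, and stream counts."""
    parser = TableRowParser()
    parser.feed(html)
    singles = []
    for row in parser.rows:
        # Assumes the cell reads "Artist - Title"; splits on the first " - ".
        artist, _, title = row[TITLE_COL].partition(" - ")
        singles.append({
            "artist": artist.strip(),
            "title": title.strip(),
            "weekly_streams": to_int(row[WEEKLY_COL]),
            "total_streams": to_int(row[TOTAL_COL]),
        })
    return singles


def to_csv(singles):
    """Serialize the parsed rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["artist", "title", "weekly_streams", "total_streams"])
    writer.writeheader()
    writer.writerows(singles)
    return out.getvalue()
```

In practice the page HTML would be downloaded first (for example with urllib.request.urlopen) and the returned CSV text written to a file. Splitting on the first " - " is a simplification that would misread artist names that themselves contain " - ".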

How the Data for Albums is Collected

Collecting data for popular albums follows a similar process. Calls to Spotify's API can determine which album a song appears on. Since kworb.net already lists the most popular songs, finding the albums these songs belong to (if any) should yield a good set of popular albums.

This process first collects all the songs, just as when collecting the top singles. Next, API calls determine which albums contain the collected song(s). Once the albums are found, each album's full tracklist is collected so the album can be analyzed as a whole, and the length of each tracklist is also recorded. All of this information is saved in a CSV file.
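Collecting a complete tracklist means following Spotify's paginated responses. The sketch below uses the Web API's album-tracks endpoint, which returns a paging object with an items list and a next URL (None on the last page). It assumes an access token has already been obtained (the authorization flow is not shown), and the fetch parameter and album_record helper are hypothetical names added so the paging logic can be exercised without network access.

```python
import json
import urllib.request

API_ROOT = "https://api.spotify.com/v1"


def get_json(url, token):
    """GET a Spotify Web API URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_tracklist(album_id, token, fetch=get_json):
    """Follow the paging object's `next` links until every track is collected."""
    url = f"{API_ROOT}/albums/{album_id}/tracks?limit=50"  # 50 is the page maximum
    tracks = []
    while url:
        page = fetch(url, token)
        tracks.extend(page["items"])
        url = page.get("next")  # None once the last page has been read
    return tracks


def album_record(album_id, token, fetch=get_json):
    """Hypothetical CSV row: the album's track names plus its tracklist length."""
    tracks = fetch_tracklist(album_id, token, fetch)
    return {
        "album_id": album_id,
        "track_names": [t["name"] for t in tracks],
        "tracklist_length": len(tracks),
    }
```

Passing fetch explicitly keeps the HTTP call separate from the paging loop, so the loop can be checked with canned pages.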

Transformation

Cleaning Song Data

Data cleaning is minimal when retrieving data from kworb.net. While the data is being read, any entry without a valid artist name or song title (i.e., an entry labeled "Unknown Artist - Unknown Song") is discarded and not processed further.
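The cleaning rule amounts to a simple filter. This is a minimal sketch assuming the placeholder strings match the "Unknown Artist" and "Unknown Song" example above (compared case-insensitively) and that empty fields should also be rejected; the function names are hypothetical.

```python
# Placeholder values taken from the README's example; matched case-insensitively.
PLACEHOLDERS = {"unknown artist", "unknown song"}


def is_valid_entry(artist, title):
    """Reject entries with a missing or placeholder artist name or song title."""
    for value in (artist, title):
        cleaned = value.strip()
        if not cleaned or cleaned.lower() in PLACEHOLDERS:
            return False
    return True


def clean(singles):
    """Keep only the scraped rows that have a real artist and title."""
    return [s for s in singles if is_valid_entry(s["artist"], s["title"])]
```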

Cleaning Album Data

When mapping an individual song to an album, a song that was released only as a single and does not appear on an album is disregarded. Likewise, if the JSON object is malformed or null at any point while mapping a song to its album, that song is disregarded.
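Both rules can be expressed as a function that returns None whenever a song should be dropped. This sketch relies on the album field of a Spotify track object, whose album_type is "album", "single", or "compilation". Treating anything other than "album" as "not on an album" is an assumption (it also drops compilations), and parse_track and album_id_for_track are hypothetical names.

```python
import json


def parse_track(raw):
    """Decode a track's JSON text, returning None if it is malformed or missing."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None


def album_id_for_track(track):
    """Return the album ID for a track, or None if the track should be disregarded."""
    if not isinstance(track, dict):
        return None  # null or non-object JSON
    album = track.get("album")
    if not isinstance(album, dict):
        return None  # album data missing or malformed
    if album.get("album_type") != "album":
        return None  # released as a single (assumption: compilations dropped too)
    album_id = album.get("id")
    return album_id if isinstance(album_id, str) and album_id else None
```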

Serving

How Will Songs/Albums Be Compared?

Albums and songs will be compared by analyzing values retrieved from Spotify's API. Spotify provides metrics called audio features for each song that describe its character. These include danceability, loudness, the key the song was written in, the tempo, the time signature, and more. Most of these values are floating-point numbers, although a few, such as the key and the time signature, are integers.

For both singles and albums, API calls would be made on every song to collect these values. For a single, the raw values will be used, and singles will be compared only to singles. For an album, the average of each audio feature across all of its songs will be computed instead, and albums will be compared only to albums.
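The difference between the single and album profiles can be sketched as follows. The three feature names are real audio-feature fields, but the subset is an arbitrary choice for illustration, and the inputs are assumed to be already-fetched feature dictionaries (per the update below, the endpoint that supplied them is deprecated).

```python
from statistics import mean

# Illustrative subset of Spotify audio-feature names.
FEATURES = ["danceability", "loudness", "tempo"]


def single_profile(features):
    """A single is compared using its raw audio-feature values."""
    return {name: float(features[name]) for name in FEATURES}


def album_profile(tracks_features):
    """An album is compared using the mean of each feature across its tracks."""
    return {name: mean(float(t[name]) for t in tracks_features) for name in FEATURES}
```

Because the album profile has the same keys as the single profile, but is built from averages, the two kinds of profile are kept in separate comparison pools, as described above.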

UPDATE: Because Spotify's API has deprecated the endpoints that provided the data described above, it is no longer possible to retrieve audio features for songs. While this removed half of the data initially intended for use, extra functionality was added to the application to compensate.

To compare songs and albums, a user can provide a song or album title and the artist's name. If a match is found and its JSON is formatted correctly, the song or album is returned. The user's selection is then written to a CSV file so it can be compared against the average collected from the popular songs/albums on kworb.net.
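One way to compare a user's pick with the popular average is to subtract the per-column mean of the popular rows from the user's values. This is a minimal sketch, assuming rows are read back from CSV (so values arrive as strings) and that boolean columns are stored as "True"/"False" and averaged as 0/1 proportions; append_user_row, to_number, and compare are hypothetical names, not the project's code.

```python
import csv
import os
from statistics import mean


def append_user_row(path, row):
    """Append the user's song/album to a CSV file, writing a header if it is new."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)


def to_number(value):
    """Convert CSV text to a float; 'True'/'False' become 1.0/0.0."""
    if value in ("True", "False"):
        return 1.0 if value == "True" else 0.0
    return float(value)


def compare(user_row, popular_rows, columns):
    """Difference between the user's value and the popular average, per column."""
    averages = {c: mean(to_number(r[c]) for r in popular_rows) for c in columns}
    return {c: to_number(user_row[c]) - averages[c] for c in columns}
```

A positive difference means the user's pick is above the popular average for that column; for a boolean column, the average is the share of popular songs/albums with that property.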

The data extracted to compare songs will be the popularity value (calculated by Spotify), the season in which the song was released (summer, fall, winter, or spring), whether the song is explicit, and whether the song is a collaboration with one or more other artists.
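These song attributes map onto fields of a Spotify track object: popularity, explicit, artists, and the album's release_date with its release_date_precision ("year", "month", or "day"). The sketch below makes two assumptions that the README does not state: seasons follow Northern Hemisphere meteorological boundaries (December to February is winter), and a song counts as a collaboration when more than one artist is credited. Year-only release dates get no season.

```python
SEASONS = ("winter", "spring", "summer", "fall")


def release_season(release_date, precision):
    """Map a 'YYYY-MM[-DD]' date to a season; year-only dates return None."""
    if precision not in ("month", "day"):
        return None
    month = int(release_date.split("-")[1])
    # Dec, Jan, Feb -> 0 (winter); Mar-May -> 1; Jun-Aug -> 2; Sep-Nov -> 3.
    return SEASONS[month // 3 % 4]


def song_features(track):
    """Extract the comparison values from a Spotify track object."""
    album = track["album"]
    return {
        "popularity": track["popularity"],
        "season": release_season(album["release_date"],
                                 album.get("release_date_precision", "day")),
        "explicit": track["explicit"],
        "collab": len(track["artists"]) > 1,  # assumption: >1 credited artist
    }
```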

The data extracted to compare albums will be the popularity value, the season in which the album was released, whether the album contains any explicit songs, the tracklist length, whether the album is a collaboration between artists, and whether any songs have featured artists.
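The album attributes can be derived from a Spotify album object plus its full tracklist. This sketch assumes the same season mapping as for songs, treats an album as a collaboration when it credits more than one album artist, and counts a "feature" as any track artist who is not one of the album's artists; these definitions are assumptions, and album_features is a hypothetical name.

```python
SEASONS = ("winter", "spring", "summer", "fall")


def release_season(release_date, precision):
    """Map a 'YYYY-MM[-DD]' date to a season; year-only dates return None."""
    if precision not in ("month", "day"):
        return None
    return SEASONS[int(release_date.split("-")[1]) // 3 % 4]


def album_features(album, tracks):
    """`album` is a Spotify album object; `tracks` is its complete tracklist."""
    album_artist_ids = {a["id"] for a in album["artists"]}
    return {
        "popularity": album["popularity"],
        "season": release_season(album["release_date"],
                                 album.get("release_date_precision", "day")),
        "explicit": any(t["explicit"] for t in tracks),
        "tracklist_length": len(tracks),
        "collab": len(album_artist_ids) > 1,
        # A feature is any credited track artist who is not an album artist.
        "has_features": any(a["id"] not in album_artist_ids
                            for t in tracks for a in t["artists"]),
    }
```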

About

An application that helps users analyze the performance of songs and albums using Spotify's API. Developed for the final project in CSCI 422.