Lyrics

Mid-Bootcamp Project

This is the project I created for my Mid-Bootcamp project at [IronHack](http://www.ironhack.com).

I had some requirements and restrictions:

I had to collect the data myself (downloading ready-made datasets was not allowed).
The dataset should have between 30 and 100 observations (rows) and 5 to 10 features (columns).
I could enrich the dataset with information obtained by methods other than manual typing (for example, web scraping).
I needed to complete one analysis that answers the questions I set for this project, and to supplement that analysis with some hypotheses.

My project

My questions were about the lyrics of the songs we listen to every day:

Do the lyrics have an overall positive sentiment?
Are women's lyrics more positive than men's?
Are pop lyrics more positive than hip hop lyrics?

My solution

Python:

Creation of the dataset using Python: I decided to collect information from Spotify Charts because I wanted to analyze lyrics globally, so I chose the top artists of week 47 of 2022. I typed the artists from that chart into a file by hand and used Last.fm to add more information, such as gender, main genre and whether the artist is a band.

Then I completed the dataset by looking up the 10 most popular songs of every artist on Spotify through its API.
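
A minimal sketch of this step, assuming the spotipy client is used to call the Spotify API (the README does not say which client was used). The helper names `top_track_names` and `fetch_top_tracks` are hypothetical, and the response shape follows Spotify's "artist top tracks" endpoint.

```python
def top_track_names(response: dict, limit: int = 10) -> list[str]:
    """Return track names from a top-tracks response, most popular first."""
    tracks = sorted(
        response.get("tracks", []),
        key=lambda track: track.get("popularity", 0),
        reverse=True,
    )
    return [track["name"] for track in tracks[:limit]]


def fetch_top_tracks(artist_id: str, country: str = "US") -> list[str]:
    """Query Spotify through spotipy (an assumed client; needs credentials)."""
    # Imported lazily so the parsing helper works without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # SpotifyClientCredentials reads SPOTIPY_CLIENT_ID and
    # SPOTIPY_CLIENT_SECRET from the environment.
    client = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    return top_track_names(client.artist_top_tracks(artist_id, country=country))
```

Keeping the parsing separate from the network call makes it possible to check the logic against a saved response without credentials.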

After that I used the lyricsgenius library to connect to the Genius website and download the lyrics of 5 of each artist's 10 most popular songs on Spotify (only 5, because a song's name on Spotify does not always have a matching name on Genius). At this point I also wrote a function with Selenium that fetches the connection token, in case the token for connecting to Genius changes. Right now the token is stored in a file (secrets.txt), but it could be obtained without storing it.
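
The "5 out of 10" selection can be sketched as follows. This is an illustration under assumptions, not the notebook's actual code: `collect_lyrics` and `genius_lookup` are hypothetical names, and the only lyricsgenius calls used are `Genius(token)` and `search_song(title, artist)`.

```python
from typing import Callable, Optional


def collect_lyrics(
    titles: list[str],
    lookup: Callable[[str], Optional[str]],
    wanted: int = 5,
) -> dict[str, str]:
    """Try titles in order and keep the first `wanted` that return lyrics."""
    found: dict[str, str] = {}
    for title in titles:
        if len(found) == wanted:
            break
        lyrics = lookup(title)
        if lyrics:  # None or an empty string means no usable match
            found[title] = lyrics
    return found


def genius_lookup(token: str, artist: str) -> Callable[[str], Optional[str]]:
    """Build a lookup function backed by lyricsgenius (needs a valid token)."""
    import lyricsgenius  # imported lazily; only needed for real requests

    genius = lyricsgenius.Genius(token)

    def lookup(title: str) -> Optional[str]:
        song = genius.search_song(title, artist)  # None when nothing matches
        return song.lyrics if song else None

    return lookup
```

With a real token, `collect_lyrics(titles, genius_lookup(token, artist))` would walk the artist's top tracks until five songs with lyrics are found.
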
Once I had downloaded all the information, I had to preprocess it for the sentiment analysis (using the Flair library) and the natural language processing (NLTK).
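
A rough sketch of scoring one lyric with Flair, assuming its pre-trained `en-sentiment` classifier, which labels text as POSITIVE or NEGATIVE with a confidence. Folding the label and confidence into one signed number is my own illustrative choice, and `signed_score` and `lyric_sentiment` are hypothetical names.

```python
from functools import lru_cache


def signed_score(label: str, confidence: float) -> float:
    """Map (label, confidence) to a single value in [-1, 1]."""
    return confidence if label.upper() == "POSITIVE" else -confidence


@lru_cache(maxsize=1)
def _classifier():
    """Load the Flair model once; it is downloaded on first use."""
    from flair.models import TextClassifier  # lazy: heavy dependency

    return TextClassifier.load("en-sentiment")


def lyric_sentiment(text: str) -> float:
    """Return a signed sentiment score for one (English) lyric."""
    from flair.data import Sentence

    sentence = Sentence(text)
    _classifier().predict(sentence)
    label = sentence.labels[0]
    return signed_score(label.value, label.score)
```
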
Since the songs were in different languages, I decided to translate everything to English to simplify the analysis. For this I used the fastText library with its pre-trained model to detect the language of the lyrics, and then translated them to English with the translators library, which falls back to Google Translate if it cannot do the translation.
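
The detect-then-translate flow might look like the sketch below. It assumes a fastText language-identification model (for example one loaded with `fasttext.load_model("lid.176.bin")`) whose `predict` returns labels such as `__label__es`. The translator is passed in as a plain callable because the `translators` API differs between versions; all function names here are hypothetical.

```python
from typing import Callable

LABEL_PREFIX = "__label__"  # prefix fastText puts in front of predicted labels


def language_code(label: str) -> str:
    """Turn '__label__es' into 'es'."""
    return label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label


def detect_language(text: str, model) -> str:
    """Return the most likely language code for a lyric."""
    # fastText predicts one line at a time, so newlines are replaced first.
    labels, _probabilities = model.predict(text.replace("\n", " "))
    return language_code(labels[0])


def to_english(text: str, model, translate: Callable[[str], str]) -> str:
    """Translate only when the detected language is not English."""
    if detect_language(text, model) == "en":
        return text
    return translate(text)
```

Lyrics contain many line breaks, so the newline replacement matters: fastText's `predict` rejects input that contains `\n`.
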
I calculated the top words with NLTK functions and manually updated its stop word list to include more words that were not interesting for my analysis.
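
The idea of counting top words after removing stop words can be shown in plain Python. Note that this sketch swaps in `collections.Counter` and a simple regular expression in place of the NLTK functions used in the project; `top_words` is a hypothetical name, and the stop word set would come from NLTK's English list plus the custom additions.

```python
import re
from collections import Counter


def top_words(text: str, stop_words: set[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words that are not stop words."""
    # Lowercase first so 'Love' and 'love' are counted together.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


# Extending a base list with custom words is a set union:
base_stop_words = {"me", "the", "a"}
custom_stop_words = {"oh", "yeah"}
all_stop_words = base_stop_words | custom_stop_words
```
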
I also created a function that generates a word cloud with the wordcloud and PIL libraries, so the clouds can be drawn in different shapes taken from images I downloaded.
In another Jupyter notebook I developed the hypothesis analysis and prepared the dataset for the visual analysis with Tableau.
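
The README does not say which statistical test the hypothesis analysis uses. As one possible approach for a question like "are women's lyrics more positive than men's?", here is a sketch of a one-sided permutation test on two groups of sentiment scores. The function name and the choice of test are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(
    group_a: list[float],
    group_b: list[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided p-value for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the scores
        difference = mean(pooled[:size_a]) - mean(pooled[size_a:])
        if difference >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value suggests the observed difference in mean sentiment would be unlikely if group membership did not matter.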

Tableau

I used the public version of Tableau to create the presentation and the visual analysis of the project; you can find it on my Tableau Public profile.

Flask

The project includes a demo in Flask that lets you search for a song by an artist and shows the translated lyrics together with the sentiment analysis and a word cloud.
