A Temporal Analysis of Song Lyrics and Social Concerns in the U.S.

Project Description

This project explores the relationship between the lyrics of songs on the Billboard Top 100 list and the most popular Google searches in the U.S., as reported by Google Trends, from 2004 to 2024. The primary goal is to determine whether the two are connected and whether any notable patterns emerge over time.

Project Structure

Text_mining_final_project/
|-- documents/                  # Documents for the project
|   |-- Gentzkow (2010).pdf
|   |-- Hassan (2019).pdf
|   |-- hw02.pdf
|-- packages/                   # Package initialization file
|   |-- __pycache__.py          
|   |-- categories.py           # Create new categories
|   |-- preprocessing.py        # Data processing
|-- HW2_TEXT_MINING.pdf         # PDF with all our results and analysis
|-- hw02.ipynb                  # Main notebook
|-- README.md                   # Description of the project structure
|-- requirements.txt            # Dependencies required to run
|-- setup.py                    # Installation and setup script

Installation and Setup

Requirements

To install the required dependencies, run:

pip install -r requirements.txt

Running the Pipeline

Prepare the Corpus:
- Choose a dataset from class materials or other sources like Kaggle, Google Books, or scraped web content.
- Ensure the dataset covers diverse topics and contains metadata.
Preprocess the Text:
- Run the preprocessing script to clean and normalize the text:
```
python packages/preprocessing.py
```
- This step includes removing stopwords, lemmatization, tokenization, and metadata extraction.
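As a rough illustration of this step, the sketch below tokenizes a lyric line and removes stopwords using only the standard library. It is not the code in `preprocessing.py`: the `STOPWORDS` set and the `tokenize` and `preprocess` names are hypothetical, and lemmatization and metadata extraction are left out because they would normally rely on an NLP library such as NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "it", "my", "on", "i"}

def tokenize(text):
    """Lowercase the text and return alphabetic tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def preprocess(text, stopwords=STOPWORDS):
    """Tokenize the text and drop every token found in the stopword set."""
    return [token for token in tokenize(text) if token not in stopwords]

if __name__ == "__main__":
    print(preprocess("Dancing in the moonlight, and the money is gone"))
```

Keeping the stopword set as a parameter makes it easy to swap in a larger list without touching the tokenizer.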
Generate Dictionaries:
- Define dictionary categories in categories.py.
- Run the script to generate dictionaries:
```
python packages/categories.py
```
- The script uses TF-IDF and other statistical methods to extract meaningful vocabulary.
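To make the TF-IDF idea concrete, here is a minimal sketch that scores terms per document with plain, unsmoothed TF-IDF (term frequency times log of inverse document frequency). It is an assumption about the general technique, not a copy of `categories.py`; the `tf_idf` and `top_terms` helpers and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def top_terms(doc_scores, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

if __name__ == "__main__":
    # Hypothetical pre-tokenized lyrics, one list per song.
    docs = [
        ["love", "heart", "love", "night"],
        ["money", "job", "money", "night"],
        ["war", "peace", "night"],
    ]
    for doc_scores in tf_idf(docs):
        print(top_terms(doc_scores, k=2))
```

A term that appears in every document, such as "night" here, gets a score of zero, which is why TF-IDF tends to surface vocabulary that distinguishes one group of texts from another.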
Analyze the Data:
- Open the Jupyter Notebook and execute the analysis:
```
jupyter notebook hw02.ipynb
```
- This notebook visualizes dictionary distributions and metadata-based insights.
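One way to picture a dictionary distribution is as the share of tokens per year that fall into each category. The sketch below computes those shares; the `CATEGORIES` dictionary, the `category_shares_by_year` function, and the sample records are all hypothetical and may differ from what the notebook actually does.

```python
from collections import Counter, defaultdict

# Hypothetical dictionary categories; the project's real ones come from categories.py.
CATEGORIES = {
    "economy": {"money", "job", "rent"},
    "romance": {"love", "heart", "kiss"},
}

def category_shares_by_year(records, categories=CATEGORIES):
    """records: iterable of (year, tokens). Returns {year: {category: token share}}."""
    hits = defaultdict(Counter)   # hits[year][category] = matching token count
    totals = Counter()            # totals[year] = total token count
    for year, tokens in records:
        totals[year] += len(tokens)
        for token in tokens:
            for name, vocab in categories.items():
                if token in vocab:
                    hits[year][name] += 1
    return {
        year: {
            name: (hits[year][name] / totals[year]) if totals[year] else 0.0
            for name in categories
        }
        for year in sorted(totals)
    }

if __name__ == "__main__":
    records = [
        (2008, ["money", "job", "love", "night"]),
        (2008, ["rent", "heart"]),
        (2020, ["love", "kiss", "heart", "money"]),
    ]
    for year, shares in category_shares_by_year(records).items():
        print(year, shares)
```

The resulting per-year shares can be plotted as a time series and compared with search-interest series for the same period.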

Contributions

This project is an academic exercise in text mining and dictionary-based analysis. Ethical considerations apply when scraping and analyzing text data.
