This project analyzes Lady Gaga's song lyrics to examine how well they follow Zipf's Law. Zipf's Law is an empirical linguistic principle stating that the frequency of a word is approximately inversely proportional to its rank in the frequency table.
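As a rough illustration, the idealized form of the law can be written as follows, where f(r) is the frequency of the word at rank r. The constant C and exponent s are symbols introduced here for illustration only; s close to 1 is the classic case.

$$
f(r) = \frac{C}{r^{s}}, \qquad s \approx 1
$$

With s = 1, the second most frequent word would appear about half as often as the most frequent one, and the third about a third as often.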
This data analysis project:
- Processes and cleans Lady Gaga's song lyrics
- Analyzes word frequency distribution
- Tests whether the lyrics follow Zipf's Law
- Visualizes the results through various charts and plots
The project uses a dataset (LadyGaga.csv) containing Lady Gaga's song lyrics. The dataset includes:
- Song titles
- Album information
- Complete lyrics for each song
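A minimal sketch of reading such a dataset and skipping songs without lyrics. The column names (`title`, `album`, `lyrics`) and the sample rows are assumptions made for illustration; the actual headers in LadyGaga.csv may differ, and the notebook itself may use pandas rather than the standard `csv` module used here.

```python
import csv
import io

# Hypothetical sample standing in for LadyGaga.csv; real column names may differ.
SAMPLE_CSV = """title,album,lyrics
Song A,Album X,"la la la oh la"
Song B,Album X,
Song C,Album Y,"oh oh la"
"""


def load_lyrics(handle):
    """Return the rows that have non-empty lyrics, as a list of dicts."""
    reader = csv.DictReader(handle)
    return [row for row in reader if (row.get("lyrics") or "").strip()]


rows = load_lyrics(io.StringIO(SAMPLE_CSV))
print(len(rows), [row["title"] for row in rows])
```

To try this on the real file, something like `open("LadyGaga.csv", newline="", encoding="utf-8")` could be passed to `load_lyrics` instead of the in-memory sample, assuming the file is UTF-8 encoded.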
- Python 3.6+
- Jupyter Notebook
pip install pandas numpy matplotlib seaborn
- Clone this repository
- Open the Jupyter Notebook:
jupyter notebook Hackdev.ipynb
- Run the cells to reproduce the analysis
The analysis includes:
- Data Cleaning and Preprocessing:
  - Removing missing values
  - Tokenizing lyrics
  - Converting text to lowercase
- Word Frequency Analysis:
  - Counting word occurrences
  - Ranking words by frequency
- Zipf's Law Verification:
  - Comparing actual word frequencies with expected frequencies according to Zipf's Law
  - Log-log plotting of rank vs. frequency
- Visualization:
  - Bar charts of top words
  - Log-log plots showing the relationship between word rank and frequency
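The steps above can be sketched roughly as follows. This is not the notebook's code: the function names, the tokenization regex, and the toy lyrics are all assumptions made for illustration, and the expected frequencies use the simplest form of Zipf's Law (exponent 1, scaled to the top word's count).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into tokens of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def rank_frequencies(texts):
    """Count words across all texts; return (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common()


def zipf_expected(ranked):
    """Expected counts under an idealized Zipf's Law: f(r) = f(1) / r."""
    top = ranked[0][1]
    return [top / rank for rank in range(1, len(ranked) + 1)]


def loglog_slope(ranked):
    """Least-squares slope of log(count) against log(rank).

    A slope near -1 suggests Zipf-like behavior. Needs at least two ranked words.
    """
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy lyrics, made up for illustration only.
toy = ["La la la la, oh oh", "Oh la la, baby"]
ranked = rank_frequencies(toy)
expected = zipf_expected(ranked)
slope = loglog_slope(ranked)
print(ranked)
```

For the plots, the ranks and counts from `ranked` could be passed to matplotlib, for example with `plt.loglog` for the rank vs. frequency plot and `plt.bar` for the top words. A three-word toy sample is far too small to say anything about Zipf's Law; it only shows the mechanics.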
The analysis demonstrates how closely Lady Gaga's lyrics follow Zipf's Law, providing insights into linguistic patterns in popular music.
- Hackdev.ipynb: Jupyter notebook containing the complete analysis
- LadyGaga.csv: Dataset with Lady Gaga's song lyrics
- zipf_analysis.png: Log-log plot of word rank vs. frequency
- top_words.png: Visualization of the most frequent words
- top_words_chart.png: Bar chart of the top words
This project is available for educational purposes.
For questions or feedback about this analysis, please open an issue in this repository.