# 🎵 Spotify Data Engineering Project
This project demonstrates an **end-to-end data engineering pipeline** using the **Spotify API**, **Python**, **Supabase/MySQL**, and **Streamlit**.
It extracts track metadata, transforms it with simple business logic, loads it into a database, and provides analytics dashboards.
---
## 🚀 Tech Stack
* **Python** – Data extraction, transformation, and loading
* **Spotify API (Spotipy)** – Source of music metadata
* **Supabase (Postgres) / MySQL** – Data warehouse for storage
* **Pandas** – Data handling and cleaning
* **Matplotlib / Streamlit** – Visualizations and dashboard
* **SQL** – Schema design and analytics queries
---
## 📁 Project Structure
```
spotify_data_analytics/
├── dashboard.py               # Streamlit dashboard (analytics & visualizations)
├── spotify_mysql_urls.py      # ETL pipeline (Extract → Transform → Load)
├── spotify_schema_queries.sql # DB schema & SQL analytics queries
├── track_urls.txt             # Input file containing Spotify track URLs
├── spotify_tracks_data.csv    # Processed dataset (for quick demo)
├── requirements.txt           # Python dependencies
└── README.md                  # Project documentation
```
---
## 🔄 ETL Pipeline Flow
1. **Extract**
* Read track URLs from `track_urls.txt`.
* Fetch metadata (track name, artist, album, popularity, duration, etc.) using the **Spotify API**.
2. **Transform**
* Calculate `duration_minutes`.
* Categorize `popularity` as `High / Medium / Low`.
* Categorize `duration` as `Short / Medium / Long`.
* Add timestamp `inserted_at`.
3. **Load**
* Insert into **Supabase (Postgres)** or **MySQL**.
* Prevent duplicate inserts by checking `track_id`.
4. **Analytics & Visualization**
* Generate `.csv` for offline analysis.
* Run **SQL queries** for insights.
* Interactive dashboard using **Streamlit**.
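
The transform rules above can be sketched in plain Python. The helper names and category thresholds here are illustrative, not necessarily the exact ones used in `spotify_mysql_urls.py`:

```python
from datetime import datetime, timezone

def track_id_from_url(url: str) -> str:
    """Extract the track ID from a Spotify track URL,
    e.g. https://open.spotify.com/track/<id>?si=... -> <id>."""
    return url.rstrip("/").split("/")[-1].split("?")[0]

def categorize_popularity(popularity: int) -> str:
    # Thresholds are illustrative; the actual script may use different cutoffs.
    if popularity >= 70:
        return "High"
    if popularity >= 40:
        return "Medium"
    return "Low"

def categorize_duration(duration_minutes: float) -> str:
    if duration_minutes < 3:
        return "Short"
    if duration_minutes <= 5:
        return "Medium"
    return "Long"

def transform(track: dict) -> dict:
    """Apply the Transform-step business logic to one raw API record."""
    minutes = round(track["duration_ms"] / 60000, 2)
    return {
        **track,
        "duration_minutes": minutes,
        "popularity_category": categorize_popularity(track["popularity"]),
        "duration_category": categorize_duration(minutes),
        "inserted_at": datetime.now(timezone.utc).isoformat(),
    }

row = transform({"track_id": "abc", "popularity": 85, "duration_ms": 200000})
print(row["popularity_category"], row["duration_category"])  # High Medium
```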
---
## 📊 Example Queries (MySQL/Postgres)
* Most popular track:
```sql
SELECT track_name, artist, album, popularity
FROM spotify_tracks
ORDER BY popularity DESC
LIMIT 1;
```
* Average popularity:
```sql
SELECT AVG(popularity) AS average_popularity
FROM spotify_tracks;
```
* Track counts by popularity category:
```sql
SELECT
  CASE
    WHEN popularity >= 80 THEN 'Very Popular'
    WHEN popularity >= 50 THEN 'Popular'
    ELSE 'Less Popular'
  END AS popularity_range,
  COUNT(*) AS track_count
FROM spotify_tracks
GROUP BY popularity_range;
```
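
To try these queries without a Postgres/MySQL instance, the same table and CASE logic can be exercised against Python's built-in SQLite. The sample rows below are made up for illustration:

```python
import sqlite3

# In-memory SQLite stand-in for the Supabase/MySQL warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE spotify_tracks (
        track_id TEXT PRIMARY KEY,
        track_name TEXT, artist TEXT, album TEXT,
        popularity INTEGER, duration_minutes REAL
    )
""")
conn.executemany(
    "INSERT INTO spotify_tracks VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("t1", "Song A", "Artist X", "Album 1", 91, 3.5),
        ("t2", "Song B", "Artist Y", "Album 2", 62, 4.1),
        ("t3", "Song C", "Artist X", "Album 1", 34, 2.8),
    ],
)

# Same CASE-based bucketing as the third example query.
rows = conn.execute("""
    SELECT CASE
             WHEN popularity >= 80 THEN 'Very Popular'
             WHEN popularity >= 50 THEN 'Popular'
             ELSE 'Less Popular'
           END AS popularity_range,
           COUNT(*) AS track_count
    FROM spotify_tracks
    GROUP BY popularity_range
    ORDER BY popularity_range
""").fetchall()
print(rows)  # [('Less Popular', 1), ('Popular', 1), ('Very Popular', 1)]
```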
---
## 📈 Streamlit Dashboard Features
* View latest raw data records
* Top 5 tracks by popularity (bar chart)
* Popularity category distribution
* Duration category distribution
* Top 5 artists by average track duration
Run the dashboard:
```bash
streamlit run dashboard.py
```
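
A minimal sketch of how the top-5 view might be wired up. The column names and sample rows are assumptions, and the Streamlit calls only render when the script is launched via `streamlit run`:

```python
def top_tracks(rows, n=5):
    """Return the n most popular tracks; rows are dicts with
    'track_name' and 'popularity' keys (assumed column names)."""
    return sorted(rows, key=lambda r: r["popularity"], reverse=True)[:n]

# Made-up sample data; dashboard.py presumably reads from the database instead.
rows = [
    {"track_name": "Song A", "popularity": 91},
    {"track_name": "Song B", "popularity": 62},
    {"track_name": "Song C", "popularity": 34},
]

try:
    import pandas as pd
    import streamlit as st

    st.title("Spotify Analytics")
    df = pd.DataFrame(top_tracks(rows)).set_index("track_name")
    st.bar_chart(df["popularity"])  # top tracks by popularity
except ImportError:
    # Streamlit/pandas not installed: fall back to a plain printout.
    for r in top_tracks(rows):
        print(r["track_name"], r["popularity"])
```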
---
## ⚡ How to Run
1. Clone the repo:
```bash
git clone https://github.com/your-username/spotify_data_analytics.git
cd spotify_data_analytics
```
2. Create a virtual environment & install dependencies:
```bash
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
```
3. Add your Spotify API credentials in the script.
4. Add track URLs to `track_urls.txt`.
5. Run the ETL script:
```bash
python spotify_mysql_urls.py
```
6. Run the Streamlit dashboard:
```bash
streamlit run dashboard.py
```
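
One way to supply the credentials from step 3 without hard-coding them is environment variables. The variable names here are illustrative and would need to match whatever `spotify_mysql_urls.py` actually reads:

```python
import os

def load_spotify_credentials():
    """Read Spotify API credentials from the environment.
    The variable names are illustrative, not prescribed by the project."""
    client_id = os.environ.get("SPOTIFY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIFY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET before running the ETL."
        )
    return client_id, client_secret

# Spotipy (listed in the tech stack) would then consume them, e.g.:
#   import spotipy
#   from spotipy.oauth2 import SpotifyClientCredentials
#   cid, secret = load_spotify_credentials()
#   sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(cid, secret))
```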
---
## 💡 Future Improvements
* Automate pipeline with **Airflow/Prefect**
* Handle larger datasets with **Spark**
* Add streaming ingestion using **Kafka**
* Deploy dashboard to **Streamlit Cloud / Heroku**
---
## 🎯 Key Takeaways
This project simulates a **real-world data engineering workflow**:
* Connecting to APIs (Spotify)
* ETL (Extract, Transform, Load)
* Database design and SQL analytics
* Visualization with dashboards
It's designed to highlight **data engineering skills** in interviews and resumes.