This repository contains two hands-on Pandas projects focused on exploratory data analysis (EDA) of real-world movie datasets. The projects exercise the fundamentals of data wrangling, aggregation, filtering, and insight generation with Python (Pandas), skills essential for Data Analyst roles.
Notebook: Exercises-1_PANDAS.ipynb
To explore a movie cast dataset and answer analytical questions using core Pandas operations such as filtering, sorting, grouping, indexing, and aggregation.
- cast.csv
- Key columns include: title, year, character
- Reading CSV files using Pandas
- Data inspection (head, dtypes)
- Filtering rows using conditions
- Sorting and deduplication
- Indexing and performance optimization
- Aggregation using groupby
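
The operations listed above can be sketched on a few rows. This is a minimal illustration, not the notebook's actual code: the inline CSV text is a hypothetical stand-in for cast.csv, and its rows are placeholders rather than values taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for cast.csv (illustrative rows only).
CSV_TEXT = """title,year,character
Hamlet,1948,Hamlet
Hamlet,1996,Hamlet
Hamlet,1996,Ophelia
Macbeth,1948,Macbeth
Treasure Island,1950,Jim Hawkins
"""

# Reading CSV data; with the real file this would be pd.read_csv("cast.csv").
cast = pd.read_csv(io.StringIO(CSV_TEXT))

# Data inspection.
print(cast.head())
print(cast.dtypes)

# Filtering rows using a condition.
hamlet_rows = cast[cast["title"] == "Hamlet"]

# Sorting and deduplication: one row per (title, year) pair, oldest first.
films = (
    cast[["title", "year"]]
    .drop_duplicates()
    .sort_values(["year", "title"])
    .reset_index(drop=True)
)

# Aggregation using groupby: number of cast rows per film.
cast_size = cast.groupby(["title", "year"]).size()
```

Treating a (title, year) pair as one film is an assumption of this sketch; two different films sharing both values would be merged.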
- Total number of unique movies
- Earliest movies ever released
- Frequency analysis of popular movie titles (e.g., Hamlet)
- Timeline analysis of movies across years
- Year-based production trends
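
One possible way to answer these questions is sketched below. The DataFrame holds illustrative rows only, and the variable names are hypothetical; the notebook may approach each question differently.

```python
import pandas as pd

# Illustrative rows only, not values from the real cast.csv.
cast = pd.DataFrame(
    {
        "title": ["Hamlet", "Hamlet", "Hamlet", "Macbeth", "Treasure Island"],
        "year": [1948, 1996, 1996, 1948, 1950],
        "character": ["Hamlet", "Hamlet", "Ophelia", "Macbeth", "Jim Hawkins"],
    }
)

# Assumption: a (title, year) pair identifies one movie.
films = cast[["title", "year"]].drop_duplicates()

# Total number of unique movies.
unique_movies = len(films)

# Earliest movies: every film released in the minimum year.
earliest = films[films["year"] == films["year"].min()]

# Frequency and timeline of a popular title.
hamlet_count = int((films["title"] == "Hamlet").sum())
hamlet_years = films.loc[films["title"] == "Hamlet", "year"].sort_values().tolist()

# Year-based production trend: films per year in chronological order.
films_per_year = films["year"].value_counts().sort_index()
```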
- Certain movie titles (like Hamlet) have been remade multiple times across decades
- Setting and sorting an index can significantly improve the performance of repeated label lookups on large datasets
- Early cinema production trends can be identified using simple Pandas operations
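
The indexing point can be illustrated as follows. This is a sketch on placeholder rows: it shows that a sorted index returns the same rows as a boolean mask, and it makes no timing claim. Any speed-up depends on the dataset size and should be measured, for example with `%timeit` in the notebook.

```python
import pandas as pd

# Illustrative rows only, not values from the real cast.csv.
cast = pd.DataFrame(
    {
        "title": ["Macbeth", "Hamlet", "Treasure Island", "Hamlet", "Hamlet"],
        "year": [1948, 1996, 1950, 1948, 1996],
        "character": ["Macbeth", "Ophelia", "Jim Hawkins", "Hamlet", "Hamlet"],
    }
)

# Boolean mask: compares every value in the column on each query.
by_mask = cast[cast["title"] == "Hamlet"]

# Indexed lookup: set the index once, sort it, then select by label.
indexed = cast.set_index("title").sort_index()
by_index = indexed.loc[["Hamlet"]]
```

The one-time cost of `set_index` and `sort_index` only pays off when the same index is queried repeatedly.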
Notebook: Exercises-2_Pandas.ipynb
To perform time-based trend analysis on movie data and extract insights at a decade level, simulating real-world analytical reporting.
- titles.csv (movie metadata)
- cast.csv (movie cast information)
- Value frequency analysis
- Decade-based feature engineering
- Grouping and aggregation
- Analytical thinking for trend detection
- Preparing data for visualization
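
A minimal sketch of the decade feature and the frequency counts follows. It assumes titles.csv exposes `title` and `year` columns, which the description above does not state, and it uses placeholder rows instead of the real file.

```python
import pandas as pd

# Assumed schema for titles.csv: "title" and "year" (illustrative rows only).
titles = pd.DataFrame(
    {
        "title": ["Hamlet", "Hamlet", "Hamlet", "Macbeth", "Macbeth", "Carmen"],
        "year": [1921, 1948, 1996, 1948, 1971, 1983],
    }
)

# Decade-based feature engineering: 1948 -> 1940, 1996 -> 1990.
titles["decade"] = titles["year"] // 10 * 10

# Value frequency analysis: most common titles first.
most_common = titles["title"].value_counts().head(3)

# Grouping and aggregation: number of films per decade.
per_decade = titles.groupby("decade").size()
```

Integer floor division keeps `decade` numeric, so the grouped result sorts chronologically and can be passed to a plot without further conversion.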
- Most common movie titles of all time
- Peak movie production years during the 1930s
- Number of films released per decade
- Character-level trend analysis (e.g., Hamlet, Rustler)
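
Two of these questions, the peak year of the 1930s and the character-level trend, might be approached as sketched here. The rows are placeholders and the column names `year` and `character` are assumed to match the CSV files; the notebook's own code may differ.

```python
import pandas as pd

# Illustrative release years only, not values from the real titles.csv.
titles = pd.DataFrame({"year": [1931, 1936, 1936, 1936, 1939, 1939, 1942]})

# Peak production year within the 1930s.
thirties = titles[(titles["year"] >= 1930) & (titles["year"] <= 1939)]
peak_year = int(thirties["year"].value_counts().idxmax())

# Illustrative cast rows only, not values from the real cast.csv.
cast = pd.DataFrame(
    {
        "character": ["Hamlet", "Hamlet", "Hamlet", "Rustler", "Rustler", "Rustler"],
        "year": [1948, 1964, 1996, 1935, 1938, 1941],
    }
)
cast["decade"] = cast["year"] // 10 * 10

# Character-level trend: appearances per decade, one row per character.
trend = (
    cast[cast["character"].isin(["Hamlet", "Rustler"])]
    .groupby(["character", "decade"])
    .size()
    .unstack(fill_value=0)
)
```

The `unstack(fill_value=0)` step produces a character-by-decade table with zeros for missing decades, which is a convenient shape for plotting.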
- Movie production increased significantly post-1950s
- Some characters and titles persist across multiple generations
- Decade-level aggregation is powerful for long-term trend analysis
- Python
- Pandas
- Jupyter Notebook
- Matplotlib (inline plotting)
- Demonstrates real-world data analysis thinking, not just syntax
- Shows ability to translate raw data into meaningful insights
- Strong foundation for roles such as Data Analyst / Business Analyst / Junior Data Scientist
- Clean, modular, and interpretable Pandas code
- Add visualizations using Matplotlib / Seaborn
- Convert insights into a Power BI / Tableau dashboard
- Optimize performance for very large datasets
- Package analysis into reusable functions
Nikhil Jaiswal
Data Analyst | SQL | Python | Pandas | Power BI
Passionate about transforming raw data into actionable insights
📌 This repository is ideal for showcasing Pandas proficiency and analytical reasoning during interviews.