
Nikhil3107jaiswal/Data-Analysis-using-Pandas


🎬 Pandas Movie Data Analysis Projects

This repository contains two hands-on Pandas projects focused on exploratory data analysis (EDA) of real-world movie datasets. The projects cover data wrangling, aggregation, filtering, and insight generation with Python (Pandas), all core skills for Data Analyst roles.


📁 Projects Overview

1️⃣ Project 1: Movie Titles & Cast Analysis (Foundational Pandas)

Notebook: Exercises-1_PANDAS.ipynb

🔍 Objective

To explore a movie cast dataset and answer analytical questions using core Pandas operations such as filtering, sorting, grouping, indexing, and aggregation.

📊 Dataset Used

  • cast.csv

  • Key columns include:

    • title
    • year
    • character

🛠️ Key Skills Demonstrated

  • Reading CSV files using Pandas
  • Data inspection (head, dtypes)
  • Filtering rows using conditions
  • Sorting and deduplication
  • Indexing and performance optimization
  • Aggregation using groupby
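
The skills listed above can be sketched on a tiny stand-in for cast.csv. This is an illustrative sketch, not the notebook's actual code: the sample rows are made up, and only the column names (title, year, character) come from the dataset description.

```python
import io

import pandas as pd

# Hypothetical sample standing in for cast.csv (one row per role).
csv_text = """title,year,character
Hamlet,1948,Hamlet
Hamlet,1996,Ophelia
Macbeth,1948,Macbeth
Hamlet,1996,Hamlet
"""

# Reading a CSV; with the real file this would be pd.read_csv("cast.csv").
cast = pd.read_csv(io.StringIO(csv_text))

# Data inspection: first rows and column types.
print(cast.head())
print(cast.dtypes)

# Filtering rows with a boolean condition.
hamlet_rows = cast[cast["title"] == "Hamlet"]

# Deduplication and sorting: one row per (title, year) pair, oldest first.
movies = (
    cast[["title", "year"]]
    .drop_duplicates()
    .sort_values(["year", "title"])
    .reset_index(drop=True)
)

# Aggregation with groupby: number of cast rows per year.
rows_per_year = cast.groupby("year").size()
print(rows_per_year)
```

Deduplicating on (title, year) before counting matters here because a cast table repeats each movie once per role.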

📌 Business / Analytical Questions Answered

  • Total number of unique movies
  • Earliest movies ever released
  • Frequency analysis of popular movie titles (e.g., Hamlet)
  • Timeline analysis of movies across years
  • Year-based production trends
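
As a rough illustration of how these questions map to Pandas calls, the sketch below uses a small made-up sample in place of cast.csv. It assumes a movie is identified by its (title, year) pair, which is a simplification and may not match the notebook.

```python
import pandas as pd

# Hypothetical rows; years and characters are illustrative only.
cast = pd.DataFrame(
    {
        "title": ["Hamlet", "Hamlet", "Hamlet", "Macbeth", "Example Short"],
        "year": [1948, 1948, 1996, 1948, 1894],
        "character": ["Hamlet", "Ophelia", "Hamlet", "Macbeth", "Narrator"],
    }
)

# Assumption: a movie is identified by (title, year).
movies = cast[["title", "year"]].drop_duplicates()

# Total number of unique movies.
n_movies = len(movies)

# Earliest movie in the sample.
earliest = movies.sort_values("year").head(1)

# Frequency / timeline of a popular title: every year "Hamlet" appears.
hamlet_years = movies.loc[movies["title"] == "Hamlet", "year"].sort_values().tolist()

# Year-based production trend: movies released per year, in chronological order.
movies_per_year = movies["year"].value_counts().sort_index()

print(n_movies, hamlet_years)
print(movies_per_year)
```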

💡 Key Insights

  • Certain movie titles (like Hamlet) have been remade multiple times across decades
  • Setting (and sorting) an index on a frequently queried column such as title can speed up repeated lookups on large datasets
  • Early cinema production trends can be identified using simple Pandas operations
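
The indexing insight can be sketched as follows. The sample rows are hypothetical, and the choice of title as the index column is an assumption rather than something taken from the notebook; any speedup depends on data size and should be measured on the real file.

```python
import pandas as pd

# Hypothetical sample standing in for cast.csv.
cast = pd.DataFrame(
    {
        "title": ["Macbeth", "Hamlet", "Hamlet", "Othello", "Hamlet"],
        "year": [1948, 1996, 1948, 1951, 1990],
        "character": ["Macbeth", "Hamlet", "Hamlet", "Othello", "Hamlet"],
    }
)

# Without an index, every lookup scans the whole column.
scan_result = cast[cast["title"] == "Hamlet"]

# With a sorted index, label lookups can use a faster search on the index.
by_title = cast.set_index("title").sort_index()
indexed_result = by_title.loc[["Hamlet"]]  # list form always returns a DataFrame

# Both approaches select the same rows.
print(len(scan_result), len(indexed_result))
print(by_title.index.is_monotonic_increasing)
```

On a dataset this small the difference is not noticeable; the pattern pays off when the same large table is queried many times.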

2️⃣ Project 2: Advanced Movie Trend Analysis (Decade-Level Insights)

Notebook: Exercises-2_Pandas.ipynb

🔍 Objective

To perform time-based trend analysis on movie data and extract insights at a decade level, simulating real-world analytical reporting.

📊 Datasets Used

  • titles.csv (movie metadata)
  • cast.csv (movie cast information)

🛠️ Key Skills Demonstrated

  • Value frequency analysis
  • Decade-based feature engineering
  • Grouping and aggregation
  • Analytical thinking for trend detection
  • Preparing data for visualization
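
Decade-based feature engineering usually comes down to one derived column. The sketch below assumes titles.csv has title and year columns (the README does not list them) and uses made-up rows.

```python
import pandas as pd

# Hypothetical stand-in for titles.csv; column names are an assumption.
titles = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F"],
        "year": [1931, 1936, 1939, 1952, 1958, 1961],
    }
)

# Feature engineering: integer-divide by 10, then multiply back (1936 -> 1930).
titles["decade"] = titles["year"] // 10 * 10

# Grouping and aggregation: number of films per decade, in decade order.
films_per_decade = titles.groupby("decade").size()
print(films_per_decade)
```

The resulting Series is indexed by decade, so it is already in a shape that `films_per_decade.plot(kind="bar")` can draw directly in a notebook with Matplotlib installed.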

📌 Business / Analytical Questions Answered

  • Most common movie titles of all time
  • Peak movie production years during the 1930s
  • Number of films released per decade
  • Character-level trend analysis (e.g., Hamlet, Rustler)
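
One possible way to answer these questions is shown below. The data is invented for illustration, the titles.csv column names are assumed, and the notebook may take a different approach.

```python
import pandas as pd

# Hypothetical stand-ins for titles.csv and cast.csv.
titles = pd.DataFrame(
    {
        "title": ["Hamlet", "Hamlet", "Hamlet", "Macbeth", "Macbeth", "Film X"],
        "year": [1931, 1948, 1996, 1936, 1948, 1936],
    }
)
cast = pd.DataFrame(
    {
        "title": ["Hamlet", "Hamlet", "Hamlet", "Film X"],
        "year": [1948, 1990, 1996, 1936],
        "character": ["Hamlet", "Hamlet", "Hamlet", "Rustler"],
    }
)

# Most common movie titles: value_counts sorts by frequency, highest first.
most_common = titles["title"].value_counts().head(3)

# Peak production year within the 1930s.
thirties = titles[titles["year"].between(1930, 1939)]
peak_year = thirties["year"].value_counts().idxmax()

# Character-level trend: "Hamlet" roles per decade.
hamlet_roles = cast[cast["character"] == "Hamlet"]
roles_per_decade = (hamlet_roles["year"] // 10 * 10).value_counts().sort_index()

print(most_common)
print(peak_year)
print(roles_per_decade)
```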

💡 Key Insights

  • Movie production increased significantly post-1950s
  • Some characters and titles persist across multiple generations
  • Decade-level aggregation is powerful for long-term trend analysis

🧰 Tools & Technologies

  • Python
  • Pandas
  • Jupyter Notebook
  • Matplotlib (inline plotting)

🎯 Why This Project Matters (For Recruiters)

  • Demonstrates real-world data analysis thinking, not just syntax
  • Shows ability to translate raw data into meaningful insights
  • Strong foundation for roles such as Data Analyst / Business Analyst / Junior Data Scientist
  • Clean, modular, and interpretable Pandas code

🚀 Next Improvements (Optional Enhancements)

  • Add visualizations using Matplotlib / Seaborn
  • Convert insights into a Power BI / Tableau dashboard
  • Optimize performance for very large datasets
  • Package analysis into reusable functions

👤 Author

Nikhil Jaiswal

Data Analyst | SQL | Python | Pandas | Power BI

Passionate about transforming raw data into actionable insights


📌 This repository is ideal for showcasing Pandas proficiency and analytical reasoning during interviews.

