The Google Play Store hosts millions of Android apps across diverse categories. This project focuses on analyzing a Kaggle dataset of Play Store apps to explore ratings, installs, reviews, and categories. The aim is to identify what drives app popularity and quality, and to provide useful insights for developers and researchers.
The dataset contains details such as:
- App name
- Category
- Rating
- Number of reviews
- Install counts
- Size
- Price
- App type (free or paid)
- Content rating
Source: Google Play Store Apps Dataset on Kaggle
Several issues were addressed before analysis:
- Converted `Reviews`, `Installs`, and `Price` into numeric format.
- Standardized app `Size` into megabytes.
- Handled missing values by filling them with the mean, median, or mode.
- Removed 483 duplicate rows and dropped incomplete records.
- Final dataset size: 10,347 rows.
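The cleaning steps above can be sketched as small Python parsers applied column by column. This is a minimal sketch, not the project's actual code. It assumes the raw Kaggle strings look like `"10,000+"` for installs, `"$4.99"` or `"0"` for price, and `"19M"`, `"512k"`, or `"Varies with device"` for size. It also assumes `Reviews` is stored as plain digit strings, so `int()` is enough for that column. The helper names, the median fill, and the 1024 kB-per-MB conversion are illustrative assumptions.

```python
from statistics import median
from typing import Optional


def parse_installs(raw: str) -> int:
    """Convert an install bucket such as '10,000+' to 10000."""
    return int(raw.replace(",", "").rstrip("+"))


def parse_price(raw: str) -> float:
    """Convert a price such as '$4.99' (or '0' for free apps) to a float."""
    return float(raw.lstrip("$"))


def parse_size_mb(raw: str) -> Optional[float]:
    """Convert a size such as '19M' or '512k' to megabytes.

    Returns None for non-numeric entries like 'Varies with device'
    so they can be imputed afterwards.
    """
    if raw.endswith("M"):
        return float(raw[:-1])
    if raw.endswith("k"):
        return float(raw[:-1]) / 1024  # assumption: 1 MB = 1024 kB
    return None


def fill_with_median(values: list) -> list:
    """Replace None entries with the median of the known values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def drop_exact_duplicates(rows: list) -> list:
    """Keep the first occurrence of each identical row, preserving order.

    Rows must be hashable, e.g. tuples of column values.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```

With pandas, each parser could be applied through `Series.map`, and `DataFrame.drop_duplicates()` performs the same exact-row deduplication. Rows that do not match the assumed formats make these parsers raise `ValueError`, so such rows would need to be fixed or dropped first.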
The analysis included:
- Descriptive statistics, unique values, and data types.
- Distribution of ratings, reviews, installs, and app sizes.
- Average ratings by category.
- App type and content rating distribution.
- Top apps by installs and reviews.
- Boxplots of price, size, installs, and reviews to identify outliers.
- Relationships between numerical features such as reviews, installs, price, and rating.
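The per-category averages and the boxplot outlier check from the analysis steps above can each be written in a few lines. This is a minimal sketch that assumes the rows are already cleaned into `(category, rating)` pairs and plain numeric lists. The 1.5 × IQR threshold is the conventional boxplot whisker rule (matplotlib's default `whis=1.5`), not a value stated by the project. Quartiles use linear interpolation (`method="inclusive"`), which matches NumPy's default percentile.

```python
from collections import defaultdict
from statistics import mean, quantiles


def mean_rating_by_category(rows):
    """Average rating per category from (category, rating) pairs, skipping missing ratings."""
    buckets = defaultdict(list)
    for category, rating in rows:
        if rating is not None:
            buckets[category].append(rating)
    return {category: mean(ratings) for category, ratings in buckets.items()}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot whisker range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

In pandas, the first function corresponds to `df.groupby("Category")["Rating"].mean()`. The second flags the points a boxplot would draw beyond its whiskers.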
Key findings:
- Free apps dominate the Play Store and receive far more installs than paid apps.
- Categories like Games, Communication, and Tools have the highest number of apps.
- Ratings are generally high across most categories, but reviews and installs are heavily right-skewed, with a small number of apps reaching extremely large counts.
- Duplicate entries existed for popular apps like Instagram and Facebook, which were cleaned for analysis.
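The free-versus-paid comparison among the findings can be reproduced by aggregating installs per app type. This is a minimal sketch that assumes cleaned `(app_type, installs)` pairs, with `Installs` already converted to integers. Play Store install figures are lower-bound buckets such as `10,000+`, so the totals are approximate. Because installs are so skewed, the median is reported next to the total so that a handful of very popular apps does not dominate the picture.

```python
from collections import defaultdict
from statistics import median


def installs_by_type(rows):
    """Summarize installs per app type (e.g. 'Free', 'Paid') from (app_type, installs) pairs."""
    per_type = defaultdict(list)
    for app_type, installs in rows:
        per_type[app_type].append(installs)
    return {
        app_type: {
            "apps": len(counts),        # number of apps of this type
            "total": sum(counts),       # sum of lower-bound install counts
            "median": median(counts),   # typical app, robust to skew
        }
        for app_type, counts in per_type.items()
    }
```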
The Play Store dataset analysis shows how app type, category, and installs shape the success of apps. The findings can guide app developers in understanding user engagement and category trends, while also highlighting the importance of data cleaning for accurate insights.