A data analysis project that demonstrates how to clean, process, and analyze social network data to generate user recommendations, including “People You May Know” and “Pages You May Like.”
This project models a simplified version of a social media recommendation engine. Using user–friend relationships and page-like data, it performs data cleaning, builds user–page mappings, and produces recommendations based on mutual friends and shared interests.
All processing and logic are implemented using Jupyter Notebooks and JSON data files, making the workflow transparent and easy to follow.
Codebook/
│
├── data/
│ ├── data.json # Raw social network data
│ ├── data2.json # Intermediate processed data
│ └── cleaned_codebook_data.json # Final cleaned dataset used for analysis
│
├── notebooks/
│ ├── Data_Cleaning.ipynb # Data cleaning and preparation
│ ├── Data_conversion.ipynb # Data format conversion and preprocessing
│ ├── Friends_recommendation.ipynb # Logic for "People You May Know"
│ └── Pages_recommendations.ipynb # Logic for "Pages You May Like"
│
└── README.md
- Cleans and validates raw social network data.
- Identifies inactive or invalid user entries.
- Builds relationships between users and their friends/pages.
- Generates friend and page recommendations using set and dictionary operations.
- Fully implemented and visualized within Jupyter notebooks.
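As a rough illustration of the cleaning step, here is a minimal sketch. The JSON schema is not documented in this README, so the `users` list and the `id`, `name`, `friends`, and `liked_pages` fields are assumptions made for the example, as are the rules for what counts as "invalid" (blank name) and "inactive" (no friends and no liked pages). The actual logic in `Data_Cleaning.ipynb` may differ.

```python
def clean_data(data):
    """Return a cleaned copy of a hypothetical {'users': [...]} structure."""
    cleaned_users = []
    for user in data.get("users", []):
        name = str(user.get("name", "")).strip()
        if not name:
            continue  # assumed rule: a blank name marks an invalid entry

        # Drop duplicate friend and page IDs; sort for stable output.
        friends = sorted(set(user.get("friends", [])))
        pages = sorted(set(user.get("liked_pages", [])))

        if not friends and not pages:
            continue  # assumed rule: no friends and no likes marks an inactive user

        cleaned_users.append(
            {"id": user["id"], "name": name, "friends": friends, "liked_pages": pages}
        )
    return {"users": cleaned_users}


# Made-up sample records, not taken from data.json.
raw = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 2, 3], "liked_pages": [101]},
        {"id": 2, "name": " ", "friends": [1], "liked_pages": []},
        {"id": 3, "name": "Sara", "friends": [], "liked_pages": []},
    ]
}
cleaned = clean_data(raw)
print([u["id"] for u in cleaned["users"]])
```

Note that this sketch does not remove friend IDs that point at users dropped during cleaning; a fuller version would probably need a second pass for that.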
After cleaning and processing, the system can generate results such as:
People You May Know for User 1: [4]
Pages You May Like for User 1: ['AI Weekly', 'Tech Crunch']
These outputs are computed based on the number of mutual friends and the pages liked by a user's direct connections.
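The description above can be sketched with plain sets and `collections.Counter`. This is an illustrative approximation, not the notebooks' code: the function names, the `friends_of` and `pages_of` mappings, and the sample data are all hypothetical, and ordering candidates by count is an assumption of the sketch. The sample data was chosen only so that the result mirrors the example output shown above.

```python
from collections import Counter


def people_you_may_know(user_id, friends_of):
    """Suggest friends-of-friends, ordered by number of mutual friends."""
    direct = friends_of.get(user_id, set())
    mutual = Counter()
    for friend in direct:
        for candidate in friends_of.get(friend, set()):
            # Skip the user themselves and anyone already connected.
            if candidate != user_id and candidate not in direct:
                mutual[candidate] += 1
    # Highest mutual-friend count first; ties broken by ID.
    return [c for c, _ in sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))]


def pages_you_may_like(user_id, friends_of, pages_of):
    """Suggest pages liked by direct friends that the user has not liked."""
    own = pages_of.get(user_id, set())
    counts = Counter()
    for friend in friends_of.get(user_id, set()):
        for page in pages_of.get(friend, set()) - own:
            counts[page] += 1
    # Most-liked among friends first; ties broken alphabetically.
    return [p for p, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical sample data: user ID -> set of friend IDs / liked page names.
friends_of = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
pages_of = {
    1: {"Data Daily"},
    2: {"AI Weekly", "Tech Crunch"},
    3: {"AI Weekly", "Data Daily"},
    4: set(),
}

print("People You May Know for User 1:", people_you_may_know(1, friends_of))
print("Pages You May Like for User 1:", pages_you_may_like(1, friends_of, pages_of))
```

Here User 4 is suggested because they share two mutual friends (Users 2 and 3) with User 1, and 'AI Weekly' comes first because two of User 1's friends like it.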
- Open the notebooks in JupyterLab or VS Code.
- Run the notebooks in this order: Data_Cleaning.ipynb, Data_conversion.ipynb, Friends_recommendation.ipynb, Pages_recommendations.ipynb.
- Ensure that the cleaned dataset (cleaned_codebook_data.json) is present inside the data/ folder.
- All results and visualizations will be shown within the notebooks.
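Before running the recommendation notebooks, it can help to confirm that the cleaned dataset is where it is expected. The helper below is a hypothetical sketch (the name `load_cleaned_data` does not come from the notebooks) that fails with a clear message when the file is missing.

```python
import json
from pathlib import Path


def load_cleaned_data(path):
    """Load the cleaned JSON dataset, failing early if the file is absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Cleaned dataset not found at {path}; run the cleaning notebooks first."
        )
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

From a notebook inside notebooks/, the call would likely be `load_cleaned_data("../data/cleaned_codebook_data.json")`, assuming the folder layout shown in the project tree.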
- Python 3
- Jupyter Notebook
- JSON (data format)
- Core Python libraries: json, os, collections
- Rank recommendations by mutual connection strength.
- Add a visualization of the social graph using NetworkX.
- Develop a simple web or command-line interface to run recommendations dynamically.
This project was created to explore the fundamentals of:
- Data cleaning and preprocessing,
- Graph-based relationship modeling, and
- Recommendation logic in social network data.
It serves as an academic and portfolio demonstration of end-to-end data analysis and algorithm design.