This repository contains two implementations of a K-Nearest Neighbors (KNN) classifier for predicting online shopping behavior, based on the "Shopping" programming exercise from CS50's Introduction to Artificial Intelligence with Python (https://cs50.harvard.edu/college/2024/fall/).
- The project description is detailed in the file "Shopping - CS50's Introduction to Artificial Intelligence with Python.pdf".
The classifiers are implemented in Python and use different approaches for finding the nearest neighbors:
- Naive Implementation: A straightforward approach using a brute-force method to compute Euclidean distances.
- KDTree Implementation: An optimized approach using a KDTree for efficient nearest neighbor searches.
The dataset used in this project is `shopping.csv`, which contains information about online shopping sessions. Each session is described by several features, and the goal is to predict whether the user made a purchase (Revenue). The dataset is provided by Sakar et al. (2018) (https://link.springer.com/article/10.1007%2Fs00521-018-3523-0).
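Loading the data could be sketched roughly as follows. Note this is a minimal illustration, not the repository's actual loader: the column names below are a hypothetical two-feature miniature of `shopping.csv`, which has many more columns.

```python
import csv
import io

# Hypothetical miniature of shopping.csv: two numeric features plus
# the Revenue label. The real file has many more columns.
sample = """Administrative,ProductRelated,Revenue
0,12,FALSE
3,40,TRUE
"""

def load_data(f):
    """Parse rows into (evidence, labels): numeric features as floats,
    Revenue mapped to 1 (TRUE) / 0 (FALSE)."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    evidence, labels = [], []
    for row in reader:
        evidence.append([float(x) for x in row[:-1]])
        labels.append(1 if row[-1] == "TRUE" else 0)
    return evidence, labels

evidence, labels = load_data(io.StringIO(sample))
print(evidence)  # [[0.0, 12.0], [3.0, 40.0]]
print(labels)    # [0, 1]
```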
Make sure you have Python installed. You can download it from python.org.
- Clone this repository to your local machine:

  ```shell
  git clone https://github.com/RezaBN/KNN-classifier-for-predicting-customers-purchase-behavior.git
  cd KNN-classifier-for-predicting-customers-purchase-behavior
  ```

- (Optional) Create and activate a virtual environment:

  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install the required packages:

  ```shell
  pip install -r requirements.txt
  ```
- Ensure you have the `shopping.csv` dataset in the same directory as the code files.

- Run the naive implementation:

  ```shell
  python naive_knn.py
  ```

- Run the KDTree implementation:

  ```shell
  python kdtree_knn.py
  ```
The naive implementation uses a brute-force method to compute Euclidean distances between points and find the nearest neighbors.
- File: `naive_knn.py`
- Class: `KNNClassifier`
- Functionality: Loads data, splits it into training and testing sets, trains the KNN classifier, makes predictions, evaluates the results, and prints performance metrics.
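The core of the brute-force approach can be sketched as follows. This is a minimal standalone illustration, not the repository's `KNNClassifier`; the function names and the tiny toy dataset are invented for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, point, k=1):
    """Brute force: measure the distance from `point` to every training
    point, take the k closest, and return the majority label."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: euclidean(pair[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: class 0 clustered near the origin, class 1 near (10, 10).
train_X = [[0, 0], [1, 1], [9, 9], [10, 10]]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, [0.5, 0.5], k=3))  # -> 0
print(knn_predict(train_X, train_y, [9.5, 9.5], k=3))  # -> 1
```

Every prediction scans the full training set, so classifying m test points against n training points costs O(m·n) distance computations, which is what motivates the KDTree variant below.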
The KDTree implementation uses a KDTree for efficient nearest neighbor searches, which improves the performance for large datasets.
- File: `kdtree_knn.py`
- Classes: `KDTree`, `KDTreeNode`, `KNNClassifier`
- Functionality: Similar to the naive implementation but uses a KDTree for neighbor searches to optimize performance.
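The idea behind the KDTree search could be sketched like this; it is a simplified single-nearest-neighbor version, assuming a standard median-split tree, and is not the repository's actual `KDTree`/`KDTreeNode` code.

```python
import math

class KDTreeNode:
    def __init__(self, point, label, left=None, right=None):
        self.point, self.label = point, label
        self.left, self.right = left, right

def build(points, depth=0):
    """Build a KD-tree from (coords, label) pairs, splitting on the
    median along axes chosen cyclically by depth."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return KDTreeNode(points[mid][0], points[mid][1],
                      build(points[:mid], depth + 1),
                      build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Descend toward the target, then backtrack, pruning any branch
    whose splitting plane lies farther away than the best match so far."""
    if node is None:
        return best
    dist = math.dist(node.point, target)
    if best is None or dist < best[0]:
        best = (dist, node.point, node.label)
    axis = depth % len(target)
    diff = target[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, depth + 1, best)
    if abs(diff) < best[0]:  # the far side could still hold a closer point
        best = nearest(far, target, depth + 1, best)
    return best

tree = build([([0, 0], 0), ([1, 1], 0), ([9, 9], 1), ([10, 10], 1)])
dist, point, label = nearest(tree, [9.2, 8.8])
print(point, label)  # -> [9, 9] 1
```

Because whole subtrees are pruned during backtracking, a balanced tree answers queries in O(log n) on average instead of the O(n) scan the naive version performs.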
Both implementations evaluate the model using the following metrics:
- Correct (Number of correct predictions)
- Incorrect (Number of incorrect predictions)
- Sensitivity (True Positive Rate)
- Specificity (True Negative Rate)
- Precision (Positive Predictive Value)
- F1 Score (Harmonic mean of precision and recall)
- Accuracy
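The metrics above follow directly from the confusion-matrix counts. A minimal sketch (the function name and toy label lists are invented for the example, not taken from either script):

```python
def evaluate(labels, predictions):
    """Compute the evaluation metrics from true labels and predictions
    (1 = purchase, 0 = no purchase)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for l, p in pairs if l == 1 and p == 1)
    tn = sum(1 for l, p in pairs if l == 0 and p == 0)
    fp = sum(1 for l, p in pairs if l == 0 and p == 1)
    fn = sum(1 for l, p in pairs if l == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"correct": tp + tn, "incorrect": fp + fn,
            "sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1,
            "accuracy": (tp + tn) / len(labels)}

metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics["accuracy"])  # -> 0.6
```

Note that the divisions assume each denominator is nonzero, which holds whenever the test set contains both classes and at least one positive prediction.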
This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to contribute to this project by submitting issues or pull requests. For major changes, please open an issue first to discuss what you would like to change.
Happy coding!