ObayAlshaer/CSI4142_Assignment-2

CSI4142 - Fundamentals of Data Science

Assignment 2: Data Cleaning

Group Information

Group Number: 39
Members: Mohamed-Obay Alshaer (300170489), Samih Karroum (300188957)
Course: CSI4142 - Fundamentals of Data Science
Instructor: Caroline Barrière
Term: Winter 2025
Submission Date: February 25, 2025


About This Repository

This repository contains our submission for Assignment 2 of CSI4142 - Fundamentals of Data Science. The focus of this assignment is Data Cleaning, specifically:

  • Duplicate detection
  • Data validation processes
  • Imputation for missing values

The assignment is implemented entirely in Python and documented using Jupyter Notebooks to ensure transparency, reproducibility, and clarity in our analysis.


Repository Structure

This repository consists of three branches:

  1. Main: Contains only this README file.

  2. dataset-1: Contains the Jupyter Notebook for cleaning data using a "Clean Data Checker." This tool detects various types of errors such as:

    • Data type errors
    • Range errors
    • Format inconsistencies
    • Duplicate entries
    • Missing values
  3. dataset-2: Contains the Jupyter Notebook focusing on data imputation. This includes testing different imputation techniques such as:

    • Mean/Median/Mode imputation
    • Regression-based imputation
    • Correlation-based imputation
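The imputation techniques listed for dataset-2 can be illustrated with a small sketch. This is not the code from the notebook: the function names `impute_constant` and `impute_regression` and the sample values are hypothetical, and the sketch uses only the Python standard library so that it runs without extra dependencies (the notebook itself may rely on other libraries). It covers mean/median/mode imputation and a simple one-predictor linear regression. Correlation-based imputation is left out because this README does not describe how it is performed.

```python
from statistics import mean, median, mode


def impute_constant(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def impute_regression(x, y):
    """Replace None entries in y with predictions from a least-squares line y = a + b*x.

    The line is fitted only on the pairs where y is observed.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = mean([xi for xi, _ in pairs])
    mean_y = mean([yi for _, yi in pairs])
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
values = [1, 2, 2, None, 10]
print(impute_constant(values, "mean"))    # fills the gap with 3.75
print(impute_constant(values, "median"))  # fills the gap with 2.0
print(impute_constant(values, "mode"))    # fills the gap with 2

x = [1, 2, 3, 4, 5]
y = [2, 4, None, 8, 10]
print(impute_regression(x, y))            # fills the gap with 6.0
```

Mean and median imputation ignore the other columns, while regression-based imputation uses a related column to predict the missing value, which is why the outlier `10` pulls the mean fill up but has no such effect in the regression example.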

Each notebook explains its dataset and the processing applied to it in detail.
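As a rough illustration of the kinds of checks listed for dataset-1, the sketch below flags data type errors, range errors, format inconsistencies, duplicate entries, and missing values in a handful of records. It is not the "Clean Data Checker" from the notebook: the `check_rows` function, the column names, the age bounds, and the email pattern are all assumptions made for this example, and only the Python standard library is used.

```python
import re

# Hypothetical records; column names and values are illustrative only.
rows = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": -5, "email": "b@example.com"},       # range error
    {"id": 3, "age": "forty", "email": "c@example.com"},  # data type error
    {"id": 4, "age": 51, "email": "not-an-email"},        # format inconsistency
    {"id": 5, "age": None, "email": "e@example.com"},     # missing value
    {"id": 1, "age": 34, "email": "a@example.com"},       # duplicate entry
]

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_rows(rows, min_age=0, max_age=120):
    """Return a list of (row_index, issue) tuples describing detected problems."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        age = row["age"]
        if age is None:
            issues.append((i, "missing value: age"))
        elif not isinstance(age, int):
            issues.append((i, "data type error: age"))
        elif not min_age <= age <= max_age:
            issues.append((i, "range error: age"))

        if not EMAIL_RE.match(row["email"]):
            issues.append((i, "format error: email"))

        # A row counts as a duplicate if an identical row appeared earlier.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            issues.append((i, "duplicate entry"))
        seen.add(key)
    return issues


for index, issue in check_rows(rows):
    print(f"row {index}: {issue}")
```

Reporting issues as `(row_index, issue)` pairs instead of fixing them in place keeps detection separate from correction, so each flagged row can be reviewed before any cleaning is applied.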


Instructions for Running the Code

  1. Clone the repository:

    git clone <repo_url>
    cd <repo_directory>
  2. Check out the branch you want to work with (run one of the following):

    git checkout dataset-1   # For Clean Data Checker
    git checkout dataset-2   # For Data Imputation
  3. Open Jupyter Notebook:

    jupyter notebook
  4. Navigate to the corresponding notebook and run the cells sequentially.


Acknowledgment

This assignment was completed as part of the CSI4142 - Fundamentals of Data Science course under the guidance of Professor Caroline Barrière at the University of Ottawa.


Contact

For any questions or clarifications, please reach out via email.

malsh094@uottawa.ca


Mohamed-Obay Alshaer & Samih Karroum
Winter 2025
