kra268/Data-Cleaning-Methods

🧹 Data Cleaning and Preprocessing for Machine Learning & Deep Learning

This repository contains a well-documented Jupyter Notebook that walks through essential and commonly used techniques for cleaning and preprocessing data before feeding it into machine learning and deep learning models.

The notebook includes:

  • Handling missing values
  • Removing duplicates
  • Encoding categorical variables (including Entity Embeddings)
  • Normalization & Standardization
  • Outlier detection (IQR, Z-score, Isolation Forest)
  • Text preprocessing (Stemming, Lemmatization)
  • Image augmentation (Rotation, Affine & Perspective transforms)
  • Feature scaling
  • Data visualization for sanity checks
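The "Handling missing values" item can be illustrated with a minimal, dependency-free sketch. This is not the notebook's own code. Numeric columns are filled with their mean and all other columns with their most frequent value. It assumes a non-empty list of rows stored as plain Python dicts that share the same keys, with `None` marking a missing value. The function name `impute` is hypothetical. With pandas, `DataFrame.fillna` covers the same ground.

```python
from collections import Counter
from statistics import mean


def impute(rows):
    """Fill None values column by column: mean for numeric columns, mode otherwise.

    Assumes a non-empty list of dicts with identical keys, where None marks a
    missing value. Returns new dicts; the input rows are not modified.
    """
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        if not present:
            fill[col] = None  # nothing to learn from, so leave the column as-is
        elif all(isinstance(v, (int, float)) for v in present):
            fill[col] = mean(present)
        else:
            fill[col] = Counter(present).most_common(1)[0][0]
    return [{col: (fill[col] if r[col] is None else r[col]) for col in columns} for r in rows]


rows = [
    {"age": 30, "city": "Paris"},
    {"age": None, "city": "Lyon"},
    {"age": 40, "city": None},
    {"age": 20, "city": "Paris"},
]
print(impute(rows))
```

Mean imputation is sensitive to outliers. The median (`statistics.median`) is a common, more robust choice for skewed numeric columns.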
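For "Removing duplicates", the sketch below keeps the first occurrence of each row and preserves the original order. This is similar in spirit to pandas `DataFrame.drop_duplicates(subset=..., keep="first")`. It assumes dict rows with hashable values. The `drop_duplicates` shown here is a hypothetical helper, not the pandas method.

```python
def drop_duplicates(rows, subset=None):
    """Return rows with duplicates removed, keeping the first occurrence.

    `subset` optionally names the columns that define a duplicate; by default
    every column is compared. Values must be hashable.
    """
    seen = set()
    kept = []
    for row in rows:
        columns = subset if subset is not None else sorted(row)
        key = tuple((col, row[col]) for col in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"id": 1, "label": "a"},
    {"id": 1, "label": "a"},  # exact duplicate of the first row
    {"id": 2, "label": "a"},
    {"id": 1, "label": "b"},  # same id, different label
]
print(drop_duplicates(rows))                 # compares all columns
print(drop_duplicates(rows, subset=["id"]))  # compares only "id"
```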
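For "Encoding categorical variables", here is a minimal one-hot encoding sketch. The category vocabulary is learned once, ideally from the training data only, and then reused. Unseen categories map to an all-zeros vector. This mirrors what scikit-learn's `OneHotEncoder(handle_unknown="ignore")` does. The helper names are hypothetical. Entity embeddings, also listed above, replace these sparse vectors with dense vectors learned jointly with a neural network, and this sketch does not cover them.

```python
def fit_one_hot(values):
    """Learn a sorted category vocabulary from (training) values."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode each value as a 0/1 vector; unseen categories become all zeros."""
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        if v in index:
            vec[index[v]] = 1
        encoded.append(vec)
    return encoded


categories = fit_one_hot(["red", "green", "blue", "green"])
print(categories)                                # ['blue', 'green', 'red']
print(one_hot(["green", "purple"], categories))  # [[0, 1, 0], [0, 0, 0]]
```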
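The "Normalization & Standardization" and "Feature scaling" items come down to two formulas. Min-max scaling is (x - min) / (max - min). Z-score standardization is (x - mean) / std. Below is a dependency-free sketch of both, with hypothetical helper names. Constant features are mapped to zeros to avoid dividing by zero.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(x - lo) / (hi - lo) for x in values]


def standardize(values):
    """Z-score values using (x - mean) / std with the population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
print(min_max_scale(values))
print(standardize(values))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In a real pipeline, compute min, max, mean and standard deviation on the training split only, then reuse them for validation and test data to avoid leakage. scikit-learn's `MinMaxScaler` and `StandardScaler` follow this fit/transform pattern. `StandardScaler` also uses the population standard deviation.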
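For "Outlier detection", the IQR and Z-score rules can be sketched with the standard library alone. The sketch uses `statistics.quantiles(..., method="inclusive")` because it interpolates quartiles the same way as NumPy's default (linear) percentile method. The cut-offs of 1.5 (IQR multiplier) and 3.0 (Z-score threshold) are common conventions, not values taken from the notebook. Isolation Forest needs a fitted model, for example scikit-learn's `IsolationForest`, so it is not reproduced here.

```python
from statistics import mean, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lo or x > hi]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score (population std) exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [x for x in values if abs(x - mu) / sigma > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(data))                      # [95]
print(zscore_outliers(data))                   # []
print(zscore_outliers(data, threshold=2.0))    # [95]
```

On this sample, the IQR rule flags 95, but the Z-score rule at 3.0 flags nothing. The outlier inflates the standard deviation. With n = 8 points and the population standard deviation, no value can exceed |z| = sqrt(n - 1) ≈ 2.65. This is one reason the Z-score rule is usually reserved for larger, roughly normal samples.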
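The difference between stemming and lemmatization under "Text preprocessing" can be shown with a deliberately tiny sketch. It pairs a naive suffix-stripping stemmer, which is far simpler than the Porter algorithm, with a dictionary-lookup lemmatizer backed by a hypothetical four-word table. Both function names and the rule and lemma tables are illustrative assumptions. Real pipelines would typically use a library such as NLTK (`PorterStemmer`, `WordNetLemmatizer`) or spaCy instead.

```python
# Deliberately naive rules: (suffix, replacement), checked in order.
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

# Hypothetical miniature lemma table; real lemmatizers use a full lexicon
# (e.g. WordNet) and usually the word's part of speech.
LEMMAS = {"studies": "study", "running": "run", "mice": "mouse", "cats": "cat"}


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix, repl in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)] + repl
    return word


def lemmatize(word):
    """Map a word to its dictionary form; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["studies", "running", "mice", "cats"]:
    print(w, naive_stem(w), lemmatize(w))
```

The stemmer yields non-words such as `studi` and `runn`, and it cannot handle irregular forms like `mice`. The lemmatizer returns proper dictionary forms, but only for words it knows.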
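The rotation and affine transforms behind the image augmentation item can be sketched on a tiny grayscale image stored as a list of rows. Each output pixel is mapped back through an inverse affine matrix about the image centre and takes the value of the nearest source pixel. Pixels that land outside the source get a fill value. This is an illustrative sketch only. `warp_affine_nearest` and `rotate` are hypothetical names, and perspective transforms, which need a 3x3 homography, are not covered. In practice, libraries such as torchvision (`RandomRotation`, `RandomAffine`, `RandomPerspective`) provide these augmentations with proper interpolation.

```python
import math


def warp_affine_nearest(image, a, b, c, d, tx=0.0, ty=0.0, fill=0):
    """Inverse-map each output pixel through [[a, b], [c, d]] plus (tx, ty),
    about the image centre, and sample the nearest source pixel."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = round(a * dx + b * dy + cx + tx)
            sy = round(c * dx + d * dy + cy + ty)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def rotate(image, degrees, fill=0):
    """Rotate about the centre. With row 0 at the top (y pointing down),
    positive `degrees` turns the picture clockwise as displayed."""
    t = math.radians(degrees)
    # Inverse rotation matrix: maps output coordinates back to the source.
    return warp_affine_nearest(image, math.cos(t), math.sin(t),
                               -math.sin(t), math.cos(t), fill=fill)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(rotate(image, 90))                              # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(warp_affine_nearest(image, 1, 0, 0, 1, tx=1))  # shift content left by one pixel
```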

πŸ“ Contents

  • data_cleaning_and_preprocessing.ipynb: Main notebook with code examples and explanations.
  • images/: Sample images used by the transformation examples (if present).
  • requirements.txt: List of packages needed to run the notebook (optional, if provided).

🚀 Getting Started

1. Clone the repository

git clone https://github.com/yourusername/data-cleaning-preprocessing-ml.git
cd data-cleaning-preprocessing-ml

2. Install the required packages

pip install -r requirements.txt

🤝 Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository

2. Create a new branch (git checkout -b feature-name)

3. Make your changes: improve examples, fix bugs, add new cleaning methods, etc.

4. Commit your changes (git commit -m 'Add new cleaning method')

5. Push to your fork (git push origin feature-name)

6. Submit a pull request

Contribution Guidelines:

  • Keep explanations simple and beginner-friendly.
  • Follow PEP 8 style guidelines for Python code.
  • Provide comments in the notebook for new sections.
  • If adding new data types (e.g., audio, time-series), include minimal sample data.
