Skip to content

davidestrada1/AutomatedCleaning

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AutomatedCleaning

PyPI LinkedIn Instagram X YouTube

AutomatedCleaning is a Python library for automated data cleaning.It helps preprocess and analyze datasets by handling missing values, outliers, spelling corrections, and more.

Logo

Features

  • Supports both large (100+ GB) and small datasets
  • Detects and handles missing values and duplicate records
  • Identifies and corrects spelling errors in categorical values
  • Detect outliers
  • Detects and fixes data imbalance
  • Identifies and corrects skewness in numerical data
  • Checks for correlation and detects multicollinearity
  • Analyzes cardinality in categorical columns
  • Identifies and cleans text columns
  • Detect JSON-type columns
  • Performs univariate, bivariate, and multivariate analysis

Installation

pip install -q AutomatedCleaning==0.1.5```

Usage

It requires Claude API key which you can get it from here https://console.anthropic.com/settings/keys

import AutomatedCleaning as ac
df = ac.load_data("dataset.csv")
df_cleaned = ac.clean_data(df)

About

Automated Data Cleaning Library

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%