Project documentation

text_data_toolkit is a Python package designed to assist in handling and transforming natural language data. It provides a suite of tools for efficient text processing, analysis, and visualization.

Features

Data Cleaning: Functions to preprocess and clean text data, including removal of stopwords, punctuation, and special characters.
Text Transformation: Utilities for tokenization, stemming, lemmatization, and vectorization.
Visualization: Tools to generate word clouds and other graphical representations of text data.
File Operations: Functions to move, bulk rename, delete, and list files.
Sentiment Labeling: Label text data into categories: positive, negative, neutral.

Installation

To install the package, clone the repository and use pip:

git clone https://github.com/Thomas-Kulch/text_data_toolkit.git
cd text_data_toolkit
pip install .

Dependencies

The package requires the following Python libraries:

pandas>=2.2.0

wordcloud>=1.9.0

matplotlib>=3.10.0

seaborn>=0.13.0

python-Levenshtein>=0.27.0

nltk>=3.9.0

pytest>=8.3.0

scikit-learn>=1.6.0

These dependencies will be installed automatically when you install the package.

Usage

Here's a basic example of how to use the toolkit:

import text_data_toolkit as tdt

# Sample text data
text = "This is a sample sentence for text processing."

# Clean the text
cleaned_text = tdt.clean_text(text)

# Tokenize the text
tokens = tdt.tokenize_text(cleaned_text)

# Generate a word cloud
tdt.generate_wordcloud(tokens)

Testing

To run tests, use pytest from the root of the project:

pytest

Make sure you have all development dependencies installed. You can install them with:

pip install -e .[dev]

Authors

Thomas Kulch - kulch.t@northeastern.edu

Ben Lin - lin.benja@northeastern.edu

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 97 Commits
.idea		.idea
data		data
eda		eda
tests		tests
text_data_toolkit		text_data_toolkit
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project documentation

Features

Installation

Dependencies

The package requires the following Python libraries:

Usage

Testing

Authors

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Project documentation

Features

Installation

Dependencies

The package requires the following Python libraries:

Usage

Testing

Authors

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages