A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Updated Aug 13, 2025 - Python
Fully neural approach for text chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.
All-in-one solution for converting documents into data for fine-tuning LLMs
🤖 Automated Q&A Dataset Generation Pipeline powered by LLMs. Multi-stage pipeline that searches, filters, extracts and transforms web content into high-quality question-answer datasets for LLM training. Supports multiple LLM providers (Groq, Mistral, Ollama) and search engines.
Cutting-edge semantic text processing system that uses hierarchical clustering and advanced language models to automatically organize and summarize large volumes of text.
Lightweight, composable TypeScript library for semantic chunking, workflow pipelining, and LLM orchestration.
An exploration of advanced text splitting strategies in LangChain for RAG, from basic character splitting to state-of-the-art semantic chunking.
Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking
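Several of the projects above describe splitting text where the meaning shifts, using sentence embeddings. As a rough illustration of that general idea (not the method of any listed repository), here is a minimal sketch. It assumes a stand-in `embed` function based on word counts so that it runs with the standard library alone; a real setup would swap in a neural embedding model. The names `embed`, `cosine`, `semantic_chunks` and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Stand-in embedding: lowercase bag-of-words counts (NOT a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Start a new chunk when adjacent sentences fall below the similarity threshold."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])      # similarity dropped: open a new chunk
        else:
            chunks[-1].append(cur)    # still on topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    sample = (
        "Cats sleep a lot. Cats also purr a lot. "
        "Stock markets fell today. Stock markets may recover tomorrow."
    )
    for chunk in semantic_chunks(sample):
        print(chunk)
```

Comparing only adjacent sentences keeps the sketch short; the threshold is arbitrary and would need tuning for any real embedding model.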