Fully neural approach for text chunking
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
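The map-reduce loop described for llm-distillery can be pictured with a short sketch: each chunk is summarized independently (map), the summaries are merged (reduce), and the cycle repeats until the combined text fits a token budget. Everything below — the `reduce_to_target` name, the whitespace token count, and the fixed-width re-chunking — is illustrative and not llm-distillery's actual interface.

```python
from typing import Callable

def reduce_to_target(chunks: list[str],
                     summarize: Callable[[str], str],
                     target_tokens: int = 1000) -> str:
    """Map: summarize each chunk. Reduce: merge the summaries.
    Repeat until the merged text fits the token budget."""
    text = "\n\n".join(chunks)
    while len(text.split()) > target_tokens:      # crude whitespace "tokens"
        summaries = [summarize(chunk) for chunk in chunks]   # map step
        text = "\n\n".join(summaries)                        # reduce step
        # Re-chunk the merged text for the next pass (fixed width for brevity).
        chunks = [text[i:i + 4000] for i in range(0, len(text), 4000)]
    return text

# Toy run with a word-truncating stand-in for a real LLM summarizer.
demo = reduce_to_target(["First sentence. More detail here."] * 50,
                        summarize=lambda t: " ".join(t.split()[:5]),
                        target_tokens=40)
print(demo)
```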
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
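As a rough sketch of how this kind of embedding-based chunking typically works (a generic illustration, not this library's actual API): adjacent sentences are embedded with a Sentence Transformers model, and a new chunk starts wherever their cosine similarity drops below a threshold. The model name and the 0.5 threshold here are arbitrary choices for illustration.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

def semantic_chunks(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Group consecutive sentences; start a new chunk when the cosine
    similarity between adjacent sentence embeddings drops below `threshold`."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT model works
    embeddings = model.encode(sentences)

    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(cos_sim(embeddings[i - 1], embeddings[i])) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

print(semantic_chunks([
    "Transformers compute attention over token embeddings.",
    "Attention weights measure pairwise token relevance.",
    "My cat prefers to nap in the afternoon sun.",
]))
```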
Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.
🤖 Automated Q&A Dataset Generation Pipeline powered by LLMs. Multi-stage pipeline that searches, filters, extracts and transforms web content into high-quality question-answer datasets for LLM training. Supports multiple LLM providers (Groq, Mistral, Ollama) and search engines.
Cutting-edge semantic text processing system that uses hierarchical clustering and advanced language models to automatically organize and summarize large volumes of text.
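As a loose illustration of the clustering step such a system might use (not this project's code): documents are vectorized and grouped with agglomerative (hierarchical) clustering, and each resulting cluster could then be summarized separately. TF-IDF vectors stand in here for the language-model embeddings the description mentions, and the two-cluster setting is arbitrary.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Gradient descent minimizes the training loss.",
    "Stochastic gradient descent updates weights per batch.",
    "Sourdough bread needs a long, slow fermentation.",
    "Good sourdough bread depends on fermentation temperature.",
]

# Vectorize, then build a two-cluster hierarchy over the documents.
vectors = TfidfVectorizer().fit_transform(docs).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(vectors)

# Each cluster could now be passed to a language model for summarization.
for cluster in sorted(set(labels)):
    print(cluster, [d for d, label in zip(docs, labels) if label == cluster])
```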
Lightweight, composable TypeScript library for semantic chunking, workflow pipelining, and LLM orchestration.
Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking