Fully neural approach for text chunking
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
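The map-reduce loop described for llm-distillery can be pictured with a short sketch: each chunk is summarized independently (map), the summaries are merged (reduce), and the cycle repeats until the combined text fits a token budget. Everything below — the `reduce_to_target` name, the whitespace token count, and the fixed-width re-chunking — is illustrative and not llm-distillery's actual interface.

```python
from typing import Callable

def reduce_to_target(chunks: list[str],
                     summarize: Callable[[str], str],
                     target_tokens: int = 1000) -> str:
    """Map: summarize each chunk. Reduce: merge the summaries.
    Repeat until the merged text fits the token budget."""
    text = "\n\n".join(chunks)
    while len(text.split()) > target_tokens:      # crude whitespace "tokens"
        summaries = [summarize(chunk) for chunk in chunks]   # map step
        text = "\n\n".join(summaries)                        # reduce step
        # Re-chunk the merged text for the next pass (fixed width for brevity).
        chunks = [text[i:i + 4000] for i in range(0, len(text), 4000)]
    return text

# Toy run with a word-truncating stand-in for a real LLM summarizer.
demo = reduce_to_target(["First sentence. More detail here."] * 50,
                        summarize=lambda t: " ".join(t.split()[:5]),
                        target_tokens=40)
print(demo)
```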
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
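As a rough sketch of how this kind of embedding-based chunking typically works (a generic illustration, not this library's actual API): adjacent sentences are embedded with a Sentence Transformers model, and a new chunk starts wherever their cosine similarity drops below a threshold. The model name and the 0.5 threshold here are arbitrary choices for illustration.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

def semantic_chunks(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Group consecutive sentences; start a new chunk when the cosine
    similarity between adjacent sentence embeddings drops below `threshold`."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT model works
    embeddings = model.encode(sentences)

    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(cos_sim(embeddings[i - 1], embeddings[i])) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

print(semantic_chunks([
    "Transformers compute attention over token embeddings.",
    "Attention weights measure pairwise token relevance.",
    "My cat prefers to nap in the afternoon sun.",
]))
```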
Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.
🤖 Automated Q&A Dataset Generation Pipeline powered by LLMs. Multi-stage pipeline that searches, filters, extracts and transforms web content into high-quality question-answer datasets for LLM training. Supports multiple LLM providers (Groq, Mistral, Ollama) and search engines.
Cutting-edge semantic text processing system that uses hierarchical clustering and advanced language models to automatically organize and summarize large volumes of text.
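As a loose illustration of the clustering step such a system might use (not this project's code): documents are vectorized and grouped with agglomerative (hierarchical) clustering, and each resulting cluster could then be summarized separately. TF-IDF vectors stand in here for the language-model embeddings the description mentions, and the two-cluster setting is arbitrary.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Gradient descent minimizes the training loss.",
    "Stochastic gradient descent updates weights per batch.",
    "Sourdough bread needs a long, slow fermentation.",
    "Good sourdough bread depends on fermentation temperature.",
]

# Vectorize, then build a two-cluster hierarchy over the documents.
vectors = TfidfVectorizer().fit_transform(docs).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(vectors)

# Each cluster could now be passed to a language model for summarization.
for cluster in sorted(set(labels)):
    print(cluster, [d for d, label in zip(docs, labels) if label == cluster])
```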
Lightweight, composable TypeScript library for semantic chunking, workflow pipelining, and LLM orchestration.
Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking