Is it possible to use aws bedrock embeddings or Custom Embedding for HybridChunker? #3124
To process complex documents with tables and unstructured tables in the AWS ecosystem, can I use Docling with AWS embeddings via Bedrock?
Replies: 1 comment
Yes, you can use Docling with AWS Bedrock embeddings, but there's an important distinction to understand:
HybridChunker only handles chunking, not embedding generation. It uses a tokenizer to determine chunk boundaries (counting tokens to stay within limits), but outputs text chunks, not embeddings. The actual embedding step happens separately in your RAG pipeline. Built-in tokenizers include HuggingFace and OpenAI tiktoken. HybridChunker also handles complex tables well, automatically splitting large tables across chunks while preserving headers.
For AWS Bedrock integration, you have two approaches:
Use HybridChunker for chunking, then embed with Bedrock separately:
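A minimal sketch of that flow, not a definitive implementation. It assumes `docling` and `boto3` are installed, AWS credentials are configured, and the Amazon Titan Text Embeddings V2 model (`amazon.titan-embed-text-v2:0`) is enabled in your account. The helper names (`chunk_document`, `bedrock_embedder`, `embed_chunks`), the region, and the model choice are illustrative assumptions, not part of Docling or Bedrock.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Assumption: Titan Text Embeddings V2; swap in whichever Bedrock embedding model you use.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def chunk_document(source: str) -> List[str]:
    """Convert a document with Docling and return the text of each chunk."""
    # Imported lazily so the rest of the module works without docling installed.
    from docling.chunking import HybridChunker
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert(source).document
    # The chunker's tokenizer only sizes the chunks; it produces no embeddings.
    chunker = HybridChunker()
    return [chunk.text for chunk in chunker.chunk(dl_doc=doc)]


def bedrock_embedder(
    region_name: str = "us-east-1",  # assumption: pick your own region
    model_id: str = TITAN_MODEL_ID,
) -> Callable[[str], List[float]]:
    """Build a function that embeds one text via the Bedrock runtime API."""
    import boto3  # imported lazily, see above

    client = boto3.client("bedrock-runtime", region_name=region_name)

    def embed(text: str) -> List[float]:
        response = client.invoke_model(
            modelId=model_id,
            body=json.dumps({"inputText": text}),
            contentType="application/json",
            accept="application/json",
        )
        # Titan embedding models return the vector under the "embedding" key.
        return json.loads(response["body"].read())["embedding"]

    return embed


def embed_chunks(
    texts: Iterable[str], embed: Callable[[str], List[float]]
) -> List[Tuple[str, List[float]]]:
    """Pair each non-empty chunk text with its embedding vector."""
    return [(text, embed(text)) for text in texts if text.strip()]
```

With a hypothetical input file, usage would look like `pairs = embed_chunks(chunk_document("report.pdf"), bedrock_embedder())`. Because the default HybridChunker tokenizer is not the one Titan uses, the chunk token limits are only approximate for the Bedrock model, so leave some headroom below the model's input limit.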