Yes, you can use Docling with AWS Bedrock embeddings, but there's an important distinction to understand:

HybridChunker only handles chunking, not embedding generation. It uses a tokenizer to count tokens and keep each chunk within the limit, but its output is text chunks, not embeddings. The embedding step happens separately in your RAG pipeline.

For AWS Bedrock integration, you have two approaches:

  1. Use HybridChunker for chunking, then embed with Bedrock separately:

    from docling_core.transforms.chunker import HybridChunker
    
    chunker = HybridChunker(max_tokens=512)
    chunks = list(chunker.chunk(doc))
    
    # Then embed each chunk using your Bedrock client
    for chunk in chunks:
    …
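As a rough sketch of the separate embedding step, the helper below sends each chunk's text to Bedrock through the `bedrock-runtime` `invoke_model` call. The model ID (`amazon.titan-embed-text-v2:0`), the Titan-style request/response shape (`inputText` in, `embedding` out), the default region, and the helper names `embed_text`, `embed_chunks`, and `make_bedrock_client` are all assumptions for illustration and are not part of Docling; adjust them to whatever embedding model your account has enabled.

```python
import json
from typing import Any, Iterable, List

# Assumption: a Titan text embedding model; swap in the model enabled in your account.
MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(client: Any, text: str, model_id: str = MODEL_ID) -> List[float]:
    """Embed one string via a bedrock-runtime client (Titan-style payload assumed)."""
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing JSON with an "embedding" list.
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def embed_chunks(client: Any, texts: Iterable[str], model_id: str = MODEL_ID) -> List[List[float]]:
    """Embed each chunk text in order, one request per chunk."""
    return [embed_text(client, text, model_id) for text in texts]


def make_bedrock_client(region_name: str = "us-east-1") -> Any:
    """Hypothetical convenience wrapper; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    return boto3.client("bedrock-runtime", region_name=region_name)
```

With the `chunks` list from the snippet above, this could be called as `embed_chunks(make_bedrock_client(), [chunk.text for chunk in chunks])`. Passing the client in as an argument keeps the helpers easy to test with a stub and leaves region and credential handling up to you.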

Answer selected by princeGedeon