Guidance on Optimal Chunking Configuration for LLM-Based Processing of Financial PDFs #1371
              
Unanswered
igelfenbeyn asked this question in Q&A
              
            Replies: 0 comments
  
Hello,
I’m working on processing a large number of loosely related PDF files, primarily financial statements such as balance sheets, income statements, and similar documents. In this project, I’m not defining a fixed ontology upfront; instead, I’m relying on the LLM to determine how to interpret and extract information from each document.
Given this use case, I’d like to know: which chunking configurations work best for this kind of unstructured, heterogeneous input?
Additionally, is there any documentation or best-practice guide that explains the trade-offs between using larger vs. smaller chunk sizes? I’m particularly interested in how chunk size impacts context retention, accuracy of entity/relation extraction, and overall performance when using LLMs for knowledge graph construction.
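To make the question concrete, here is a minimal sketch of the two knobs I have in mind, chunk size and overlap. It is only an illustration: the function name `chunk_words` is hypothetical, it is not this project's chunker, and it counts whitespace-separated words as a rough stand-in for model tokens.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words.

    Illustrative only: words are a rough proxy for tokens, and real
    financial PDFs would likely need table-aware splitting as well.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made up only of overlap is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "Total assets increased while total liabilities decreased " * 50
    # Compare how many chunks different (size, overlap) settings produce.
    for size, overlap in [(50, 0), (50, 10), (200, 20)]:
        print(size, overlap, len(chunk_words(sample, size, overlap)))
```

Smaller windows mean more LLM calls and a higher chance of splitting a statement line from its header; larger windows keep more context per call at the cost of longer prompts. That is the trade-off I would like guidance on.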
Any advice or references would be greatly appreciated!
Thanks in advance.