sachinML

Refs #30200

  • Use split_documents(docs) after header-based splitting (preserves per-section metadata; overlap applied per document).
  • Overlap appears only when a single section exceeds chunk_size and is therefore split into more than one chunk.
  • Overlap does not cross section/document boundaries.
  • Consider strip_headers=True to avoid a tiny header-only chunk; keep "" as a fallback separator if text lacks newlines/spaces.
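
To make the overlap points above concrete, here is a minimal, library-free sketch. It is a toy model, not the text-splitters implementation: `Section`, `split_section`, and `split_sections` are hypothetical names, and a plain fixed-size character window stands in for the real separator-based splitting. It only illustrates that each section is split on its own, keeps its own metadata, and shares overlap only with chunks from the same section.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical stand-in for one header-delimited document."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_section(section, chunk_size, chunk_overlap):
    """Slide a fixed-size window over ONE section, copying its metadata to each chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    text = section.text
    if len(text) <= chunk_size:
        # Short section: a single chunk, so there is nothing to overlap with.
        return [Section(text, dict(section.metadata))]
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(Section(text[start:start + chunk_size], dict(section.metadata)))
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def split_sections(sections, chunk_size, chunk_overlap):
    """Split each section independently, so overlap never crosses a section boundary."""
    out = []
    for section in sections:
        out.extend(split_section(section, chunk_size, chunk_overlap))
    return out


sections = [
    Section("short intro", {"h1": "Intro"}),                   # fits in one chunk
    Section("abcdefghijklmnopqrstuvwxyz", {"h1": "Details"}),  # exceeds chunk_size
]
chunks = split_sections(sections, chunk_size=12, chunk_overlap=3)
for chunk in chunks:
    print(chunk.metadata, repr(chunk.text))
```

In this sketch only the "Details" chunks share characters with their neighbours, and the "Intro" chunk contributes nothing to the chunk that follows it. In the library, the analogous pattern would presumably be header-based splitting followed by `split_documents(docs)` on a character-based splitter such as `RecursiveCharacterTextSplitter`; treat that pairing as an assumption, since this PR does not name the second splitter.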

@github-actions github-actions bot added the documentation and text-splitters labels and removed the documentation label on Oct 13, 2025
@sachinML
Author

CI is green; CodeQL is pending. This is a docs-only change. Could a maintainer please trigger the CodeQL scan so this can be merged?

@sachinML sachinML force-pushed the docs/chunk-overlap-troubleshooting branch 6 times, most recently from 88e993b to 5e569a9, on October 18, 2025 09:44
@sachinML sachinML force-pushed the docs/chunk-overlap-troubleshooting branch from 5e569a9 to ae92263 on October 19, 2025 11:52
Collaborator

@ccurme ccurme left a comment

Thanks for this. I don't think this is the right level of documentation for this information. Can we add it to the dedicated docs site here?

You can submit a PR updating the source here.

@ccurme ccurme closed this Oct 21, 2025
@sachinML
Author

sachinML commented Oct 21, 2025

Thanks @ccurme, I’ve opened a PR to the docs site, updating the MarkdownHeaderTextSplitter page with the troubleshooting note:
langchain-ai/docs#1061.

If preferred, I can close or revert the README change here and keep the guidance centralized in the docs site.

@sachinML sachinML deleted the docs/chunk-overlap-troubleshooting branch October 21, 2025 19:00
Labels: documentation, text-splitters
