
feat: better benchmarks with swe-bench approach #34

Merged
aeneasr merged 10 commits into main from better-benchmarks on Mar 10, 2026

aeneasr (Member) commented Mar 9, 2026

No description provided.

aeneasr and others added 6 commits March 7, 2026 12:46
…hunk overlap

Improve chunk partitioning for oversized code chunks:

- findSplitPoint now recognizes block-ending patterns across language families:
  C-family (}, },, });, };), Ruby/Elixir (end), and a dedent heuristic for
  Python/YAML (detects indentation decreases at block boundaries)
- Add 5-line overlap between adjacent sub-chunks to improve search recall for
  queries matching concepts that span a split boundary
- Comprehensive test coverage: all boundary patterns, edge-of-lookback window,
  multiple consecutive splits, overlap content and line number verification
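
The splitting behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual findSplitPoint: the names `find_split_point` and `split_chunk`, the 20-line lookback window, and the exact regex are assumptions. Only the boundary patterns and the 5-line overlap come from the commit message.

```python
import re

OVERLAP_LINES = 5    # from the commit message
LOOKBACK_LINES = 20  # assumed window size, not stated in the commit

# Lines that close a block: C-family (}, },, });, };) and Ruby/Elixir (end).
BLOCK_END = re.compile(r"^\s*(\}|\},|\}\);|\};|end)\s*$")


def indent_of(line):
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())


def find_split_point(lines, target, lowest):
    """Return a cut index in [lowest, target]; the first chunk is lines[:cut].

    Walks backwards from target looking for a block boundary and falls
    back to target when none is found inside the window.
    """
    for i in range(target, lowest - 1, -1):
        prev = lines[i - 1]
        if BLOCK_END.match(prev):
            return i  # cut right after the block-ending line
        # Dedent heuristic (Python/YAML): the next line is less indented.
        if (i < len(lines) and prev.strip() and lines[i].strip()
                and indent_of(lines[i]) < indent_of(prev)):
            return i
    return target


def split_chunk(lines, max_lines, overlap=OVERLAP_LINES, lookback=LOOKBACK_LINES):
    """Split lines into sub-chunks of at most max_lines lines.

    Each sub-chunk after the first starts `overlap` lines before the
    previous cut. Returns (start_line, end_line, chunk_lines) tuples with
    1-indexed, inclusive line numbers.
    """
    assert max_lines > overlap, "overlap must be smaller than the chunk size"
    chunks, start = [], 0
    while len(lines) - start > max_lines:
        target = start + max_lines
        # Never cut inside the overlap region, so every pass makes progress.
        lowest = max(start + overlap + 1, target - lookback)
        cut = find_split_point(lines, target, lowest)
        chunks.append((start + 1, cut, lines[start:cut]))
        start = cut - overlap
    chunks.append((start + 1, len(lines), lines[start:]))
    return chunks
```

For a 16-line input made of two 8-line brace-delimited functions and `max_lines=10`, the first cut lands after the closing `}` of the first function, and the second sub-chunk begins five lines earlier, so a query matching text near the boundary can hit either sub-chunk.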

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of telling users to manually run /lumen:reindex, the doctor skill
now triggers force_reindex: true via semantic_search when the index is stale
or missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These results are superseded by the bench-swe pipeline and are no longer
referenced from documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aeneasr enabled auto-merge (squash) March 10, 2026 17:08
aeneasr and others added 4 commits March 10, 2026 18:09
Language fixture indexing for Go (31k lines), PHP (21k), Rust (15k),
Python (13k), etc. cumulatively exceeds the previous 20m limit. Raise it to
30m to accommodate the full language test matrix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
aeneasr merged commit c722d66 into main Mar 10, 2026
4 checks passed