fix(pampa): detect parse errors without the per-lex tree-sitter logger (bd-b7eb7)#246
Closed
cscheid wants to merge 2 commits into
…r (bd-b7eb7)

The qmd reader attached a tree-sitter `set_logger` callback on every parse. When any logger is set, tree-sitter formats a debug string with `snprintf` for every lex action; on macOS `snprintf` locks the global locale (`localeconv_l`). Under multithreaded project renders (rayon, ~16 workers) that locale lock serializes all parser threads.

Profiled on a large *healthy* project (401 docs / 8.2 MB, no errors): a fresh multithreaded render spent ~75% of samples in __ulock_wait2/__ulock_wake/os_unfair_lock, with the call chain ts_lexer__advance -> snprintf -> __vfprintf -> localeconv_l. 16 threads bought only ~1.2x wall-clock; sys ~37s. (It is NOT the allocator: mimalloc, both as #[global_allocator] and via ts_set_allocator, changed nothing. Identified with macOS `sample`, which symbolicates system libs unlike samply's sidecar.)

Fix: parse the happy path with no logger, and detect whether tree-sitter entered error-correction mode via its own public signal: the `parse_with_options` progress callback's `ParseState::has_error`, combined with the final tree's `Node::has_error` (the latter backstops an error in the last few ops before the parse ends). The logger is attached only for the diagnostic re-parse, which runs only when the document has an error. `MarkdownTree::had_parse_error` encapsulates this; `TreeSitterLogObserverFast` (the previous, ineffective optimization) is removed.

Why has_error is reliable here: tree-sitter-qmd declares no `conflicts:`, so the parser is deterministic LR and GLR multiple-stack speculation only happens during error recovery. has_error therefore means a genuine error, not a speculative branch. Documented this invariant in grammar.js and parser.rs so it isn't regressed.

Measured (large healthy project, release-perf): 16-thread render 8.04 -> 3.71s (2.2x); sys 36.83 -> 1.37s (27x); pass-1 2935 -> 426ms (now scales ~6.4x on 16 cores); single-thread 1.6x faster.

No diagnostic regression: error-corpus snapshot tests pass and the real 565-file corpus reports the same 140 error files. Full `cargo xtask verify` (incl. WASM/hub) passes.

Research notes: claude-notes/research/2026-05-31-q2-render-perf-qmd-plans.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
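The two-pass strategy described in this commit message can be illustrated with a minimal, self-contained sketch. It does not use tree-sitter or pampa code: `ParseOutcome`, `toy_parse`, `ReadResult`, and `read` are hypothetical names, and the "parser" is a toy bracket checker. It only shows the control flow, assuming a first pass with no logger, two error signals combined with OR, and a logged re-parse that runs only when an error was seen.

```rust
// Toy model only: these types are hypothetical stand-ins, not pampa's or
// tree-sitter's real API.

/// The two error signals gathered from one parse pass.
struct ParseOutcome {
    /// An error was observed while the parse was still running
    /// (stands in for the progress callback's `ParseState::has_error`).
    saw_error_during_parse: bool,
    /// The finished tree contains an error
    /// (stands in for the final tree's `Node::has_error`).
    root_has_error: bool,
}

impl ParseOutcome {
    /// Either signal is enough to trigger the diagnostic re-parse.
    fn had_parse_error(&self) -> bool {
        self.saw_error_during_parse || self.root_has_error
    }
}

/// Toy "parser": checks that square brackets balance. When `log` is `Some`,
/// it formats one message per character, mimicking the per-lex-action cost
/// that an attached logger adds.
fn toy_parse(input: &str, mut log: Option<&mut Vec<String>>) -> ParseOutcome {
    let mut depth = 0usize;
    let mut saw_error_during_parse = false;
    for (offset, ch) in input.char_indices() {
        if let Some(sink) = log.as_deref_mut() {
            sink.push(format!("lex offset={offset} char={ch:?}"));
        }
        match ch {
            '[' => depth += 1,
            ']' if depth == 0 => saw_error_during_parse = true, // stray closer
            ']' => depth -= 1,
            _ => {}
        }
    }
    // An unclosed '[' is only detectable once the input ends, so the final
    // check acts as the backstop for late errors.
    let root_has_error = saw_error_during_parse || depth != 0;
    ParseOutcome { saw_error_during_parse, root_has_error }
}

/// Result of reading one document.
struct ReadResult {
    had_error: bool,
    diagnostics: Vec<String>,
    passes: u32,
}

/// Happy path: one pass, no logger. Error path: a second, logged pass.
fn read(input: &str) -> ReadResult {
    let first = toy_parse(input, None);
    if !first.had_parse_error() {
        return ReadResult { had_error: false, diagnostics: Vec::new(), passes: 1 };
    }
    let mut log = Vec::new();
    let _ = toy_parse(input, Some(&mut log));
    ReadResult { had_error: true, diagnostics: log, passes: 2 }
}

fn main() {
    for doc in ["[a][b]", "a]b", "[a"] {
        let r = read(doc);
        println!(
            "{doc:?}: had_error={} passes={} log_lines={}",
            r.had_error,
            r.passes,
            r.diagnostics.len()
        );
    }
}
```

In this toy, a healthy document never pays for message formatting, while `"[a"` shows why the end-of-parse check is needed: nothing looks wrong until the input runs out.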
…b7eb7)

samply's syms sidecar doesn't symbolicate system libraries, so hot/lock-holding frames inside libsystem show up as bare addresses. Document using macOS `sample` to resolve them, with the bd-b7eb7 worked example (guessed malloc, was wrong; mimalloc swap as a decisive negative result; the real cause was snprintf -> localeconv_l, the locale lock). Also note that #[global_allocator] doesn't cover a vendored C library's direct libc malloc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Superseded by #247 — same commits, rebranched to |
Summary

The qmd reader (`crates/pampa/src/readers/qmd.rs::read`) attached a tree-sitter `set_logger` callback on every parse. When any logger is set, tree-sitter formats a debug string with `snprintf` for every lex action, and on macOS `snprintf` locks the global locale (`localeconv_l`). Under multithreaded project renders (rayon, ~16 workers) that locale lock serializes all parser threads.

How it was found

Profiled a large healthy project (401 docs / 8.2 MB, no errors), the case we most want fast. A fresh multithreaded render spent ~75% of samples in `__ulock_wait2`/`__ulock_wake`/`os_unfair_lock`, call chain:

`ts_lexer__advance -> snprintf -> __vfprintf -> localeconv_l`

16 threads bought only ~1.2× wall-clock; `sys` time ~37 s. It is not the allocator: swapping mimalloc both as `#[global_allocator]` and via `tree_sitter::set_allocator` changed nothing. The locale-lock culprit was found with macOS `sample` (which symbolicates system libraries, unlike samply's sidecar).

Fix

Parse the happy path with no logger, and detect whether tree-sitter entered error-correction mode via its own public signal: the `parse_with_options` progress callback's `ParseState::has_error`, combined with the final tree's `Node::has_error` (which backstops an error in the last few ops before parse end). The logger is attached only for the diagnostic re-parse, which runs only when the document has an error. Encapsulated in `MarkdownTree::had_parse_error`. The previous, ineffective `TreeSitterLogObserverFast` (its cheap callback didn't help: the `snprintf` is inside tree-sitter, before the callback) is removed.

Why `has_error` is reliable here: tree-sitter-qmd declares no `conflicts:`, so the parser is deterministic LR and GLR multiple-stack speculation only occurs during error recovery. `has_error` therefore means a genuine error, not a speculative branch that will be pruned. This invariant is now documented in `grammar.js` and `parser.rs`.

Results (large healthy project, `release-perf`)

- 16-thread render: 8.04 s -> 3.71 s (2.2×)
- `sys` time: 36.83 s -> 1.37 s (27×)
- pass-1: 2935 ms -> 426 ms (now scales ~6.4× on 16 cores)
- single-thread: 1.6× faster

Correctness

- Error-corpus snapshot tests pass, and the real 565-file corpus reports the same 140 error files.
- `cargo xtask verify` (Rust workspace + WASM/hub build + hub tests) passes.

Notes

- Profiling notes (macOS `sample` for system-lib frames; a Rust `#[global_allocator]` doesn't cover a vendored C library's libc `malloc`) in `claude-notes/instructions/performance-profiling.md`.
- Research notes: `claude-notes/research/2026-05-31-q2-render-perf-qmd-plans.md` (lands separately with the website experiment).

🤖 Generated with Claude Code
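
As a quick cross-check of the before/after figures quoted in the commit message, the short sketch below recomputes the ratios. The `speedup` helper is a hypothetical name introduced only for this illustration. The pass-1 ratio it prints is derived here and is not a figure stated in the PR; the "~6.4x on 16 cores" in the commit message is a scaling figure, which is a different quantity.

```rust
/// Ratio of a "before" measurement to an "after" measurement.
/// Hypothetical helper, used only for this cross-check.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // (label, before, after), copied from the commit message's "Measured" line.
    let rows = [
        ("16-thread render (s)", 8.04, 3.71),
        ("sys time (s)", 36.83, 1.37),
        ("pass-1 (ms)", 2935.0, 426.0),
    ];
    for (label, before, after) in rows {
        println!("{label}: {before} -> {after} ({:.1}x)", speedup(before, after));
    }
}
```

The first two ratios round to the 2.2x and 27x reported in the PR.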