fix(pampa): detect parse errors without the per-lex tree-sitter logger (bd-b7eb7) by cscheid · Pull Request #247 · quarto-dev/q2

cscheid · 2026-05-31T14:35:31Z

Summary

The qmd reader (crates/pampa/src/readers/qmd.rs::read) attached a tree-sitter set_logger callback on every parse. When any logger is set, tree-sitter formats a debug string with snprintf for every lex action — and on macOS snprintf locks the global locale (localeconv_l). Under multithreaded project renders (rayon, ~16 workers) that locale lock serializes all parser threads.

How it was found

Profiled a large healthy project (401 docs / 8.2 MB, no errors) — the case we most want fast. A fresh multithreaded render spent ~75% of samples in __ulock_wait2/__ulock_wake/os_unfair_lock, call chain:

ts_lexer__advance → snprintf → __vfprintf → localeconv_l → os_unfair_lock

16 threads bought only ~1.2× wall-clock; sys time ~37 s. It is not the allocator — swapping mimalloc both as #[global_allocator] and via tree_sitter::set_allocator changed nothing. The locale-lock culprit was found with macOS sample (which symbolicates system libraries, unlike samply's sidecar).

Fix

Parse the happy path with no logger, and detect whether tree-sitter entered error-correction mode via its own public signal — the parse_with_options progress callback's ParseState::has_error — combined with the final tree's Node::has_error (which backstops an error in the last few ops before parse end). The logger is attached only for the diagnostic re-parse, which runs only when the document has an error. Encapsulated in MarkdownTree::had_parse_error. The previous, ineffective TreeSitterLogObserverFast (its cheap callback didn't help — the snprintf is inside tree-sitter, before the callback) is removed.

Why has_error is reliable here: tree-sitter-qmd declares no conflicts:, so the parser is deterministic LR and GLR multiple-stack speculation only occurs during error recovery — has_error therefore means a genuine error, not a speculative branch that will be pruned. This invariant is now documented in grammar.js and parser.rs.

Results (large healthy project, `release-perf`)

metric	before	after
16-thread render	8.04 s	3.71 s (2.2×)
16-thread `sys`	36.83 s	1.37 s (27×)
pass-1 @ 16 threads	2935 ms	426 ms (scales ~6.4× on 16 cores)
single-thread render	9.71 s	6.0 s (1.6×)

Correctness

Error-corpus snapshot tests (json / text / ariadne / locations) pass.
The real 565-file qmd-plans corpus reports the same 140 error files — exact parity with the previous log-based detection.
Full cargo xtask verify (Rust workspace + WASM/hub build + hub tests) passes.

Notes

Also documents the profiling lesson (use macOS sample for system-lib frames; a Rust #[global_allocator] doesn't cover a vendored C library's libc malloc) in claude-notes/instructions/performance-profiling.md.
Full investigation: claude-notes/research/2026-05-31-q2-render-perf-qmd-plans.md (lands separately with the website experiment).

🤖 Generated with Claude Code

…r (bd-b7eb7) The qmd reader attached a tree-sitter `set_logger` callback on every parse. When any logger is set, tree-sitter formats a debug string with `snprintf` for every lex action; on macOS `snprintf` locks the global locale (`localeconv_l`). Under multithreaded project renders (rayon, ~16 workers) that locale lock serializes all parser threads. Profiled on a large *healthy* project (401 docs / 8.2 MB, no errors): a fresh multithreaded render spent ~75% of samples in __ulock_wait2/__ulock_wake/ os_unfair_lock — chain ts_lexer__advance -> snprintf -> __vfprintf -> localeconv_l. 16 threads bought only ~1.2x wall-clock; sys ~37s. (It is NOT the allocator: mimalloc, both as #[global_allocator] and via ts_set_allocator, changed nothing. Identified with macOS `sample`, which symbolicates system libs unlike samply's sidecar.) Fix: parse the happy path with no logger, and detect whether tree-sitter entered error-correction mode via its own public signal — the `parse_with_options` progress callback's `ParseState::has_error` — combined with the final tree's `Node::has_error` (the latter backstops an error in the last few ops before the parse ends). The logger is attached only for the diagnostic re-parse, which runs only when the document has an error. `MarkdownTree::had_parse_error` encapsulates this; `TreeSitterLogObserverFast` (the previous, ineffective optimization) is removed. Why has_error is reliable here: tree-sitter-qmd declares no `conflicts:`, so the parser is deterministic LR and GLR multiple-stack speculation only happens during error recovery — has_error therefore means a genuine error, not a speculative branch. Documented this invariant in grammar.js and parser.rs so it isn't regressed. Measured (large healthy project, release-perf): 16-thread render 8.04 -> 3.71s (2.2x); sys 36.83 -> 1.37s (27x); pass-1 2935 -> 426ms (now scales ~6.4x on 16 cores); single-thread 1.6x faster. No diagnostic regression: error-corpus snapshot tests pass and the real 565-file corpus reports the same 140 error files. Full `cargo xtask verify` (incl. WASM/hub) passes. Research notes: claude-notes/research/2026-05-31-q2-render-perf-qmd-plans.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…b7eb7) samply's syms sidecar doesn't symbolicate system libraries, so hot/lock-holding frames inside libsystem show up as bare addresses. Document using macOS `sample` to resolve them, with the bd-b7eb7 worked example (guessed malloc, was wrong; mimalloc swap as a decisive negative result; the real cause was snprintf -> localeconv_l, the locale lock). Also note that #[global_allocator] doesn't cover a vendored C library's direct libc malloc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cscheid · 2026-05-31T14:51:30Z

(it should be noted that these wall-clock numbers come from a really beefy M5 Max; typical users will see slower perf. We're working on it!)

…rcw) `native_visitor` declared `whitespace_re` as a function-local `Lazy<Regex>`, so a fresh `Lazy` was built and `\s+` recompiled on every call that reached the inline whitespace check. A samply profile of a full `q2 render` of the 565-file qmd-plans website attributed ~14% of self-time to the regex NFA/DFA compiler under `native_visitor` (20,806 compilations over the corpus). Hoist it to a module-level `static WHITESPACE_RE`, compiled once. Impact (median of 5, release builds): - end-to-end `q2 render`: 4720 ms -> 3540 ms (~-24%; render parses each file twice, Pass 1 profile + Pass 2 render, so the per-parse win doubles) - isolated parse path: -11.5% (1 thread), -9.9% (16 threads) - profile: regex-compile self-time 14.3% -> 0.02%, total CPU samples -17% Adds a regression test (`test_whitespace_re_compile_once`) that asserts the regex compiles <= 1x per process via a `WHITESPACE_RE_COMPILE_COUNT` counter, and a `parse-corpus` perf-harness driver used to measure the serial/parallel impact. The driver also surfaced that the parse path scales 7.1x on 16 threads with zero lock contention post-#247 — see bd-3gj56 (parallelize Pass 2). Plan + full numbers: claude-notes/plans/2026-06-01-render-perf-profiling.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cscheid and others added 2 commits May 31, 2026 09:31

cscheid mentioned this pull request May 31, 2026

fix(pampa): detect parse errors without the per-lex tree-sitter logger (bd-b7eb7) #246

Closed

cscheid merged commit 51a19ad into main May 31, 2026
9 checks passed

cscheid deleted the perf/tree-sitter-error-logging branch May 31, 2026 14:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pampa): detect parse errors without the per-lex tree-sitter logger (bd-b7eb7)#247

fix(pampa): detect parse errors without the per-lex tree-sitter logger (bd-b7eb7)#247
cscheid merged 2 commits into
mainfrom
perf/tree-sitter-error-logging

cscheid commented May 31, 2026

Uh oh!

cscheid commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cscheid commented May 31, 2026

Summary

How it was found

Fix

Results (large healthy project, release-perf)

Correctness

Notes

Uh oh!

cscheid commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Results (large healthy project, `release-perf`)