Skip to content

fix(index): prevent stack overflow via async pendingRebuild#100

Open
martinmarsian wants to merge 1 commit into
nark:mainfrom
martinmarsian:fix/index-async-rebuild
Open

fix(index): prevent stack overflow via async pendingRebuild#100
martinmarsian wants to merge 1 commit into
nark:mainfrom
martinmarsian:fix/index-async-rebuild

Conversation

@martinmarsian
Copy link
Copy Markdown

Summary

performFullRebuild() called itself synchronously from its own defer block when pendingRebuild was true. Under sustained file-system activity (pendingRebuild staying true across consecutive builds), this produces 400+ frames of recursive self-invocation and terminates the process with SIGBUS.

  • Replace the direct recursive call with indexQueue.async { [weak self] in self?.performFullRebuild() } so the next rebuild is scheduled as a fresh stack frame

Files changed

  • Sources/wired3/Controllers/IndexController.swift (1 file, 4 lines)

Test plan

  • Start server with a large files directory and trigger rapid file-system changes that keep re-indexing active — process should not crash
  • Normal re-index flow (SIGUSR1 or UI button) still works

🤖 Generated with Claude Code

performFullRebuild() called itself synchronously from its own defer block
when pendingRebuild was true. Under sustained rebuild pressure (continuous
file-system activity keeping pendingRebuild set), this produces 400+ frames
of recursive self-invocation and terminates the process with SIGBUS.

Use indexQueue.async instead so the next rebuild is scheduled as a new
stack frame, not a synchronous recursive call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nark nark closed this May 13, 2026
@nark nark reopened this May 13, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Claude finished @nark's task in 1m 15s —— View job


Code Review — fix(index): prevent stack overflow via async pendingRebuild

Tasks

  • Read changed file and diff
  • Analyse correctness, thread safety, and edge cases
  • Post review

Nice catch and clean fix. The root cause is well-diagnosed — defer calling performFullRebuild() synchronously from within itself creates unbounded recursion whenever finalisation keeps failing (the recoverIndexAndFTS5() path sets pendingRebuild = true unconditionally). Switching to indexQueue.async is exactly the right tool here: it unwinds the current stack before the next rebuild starts, while the serial queue guarantees that only one rebuild runs at a time and that isIndexing/pendingRebuild are always read/written in serial order.

Thread-safety walk-through (everything checks out):

  • isIndexing and pendingRebuild are accessed only on indexQueue
  • cancelCurrentRebuild goes through cancelLock
  • indexQueue is serial → no window where a concurrent indexFiles() dispatch can race with the newly queued rebuild ✓
  • [weak self] avoids a retain cycle ✓

💡 Suggestion — persistent-error retry loop (not a blocker, but worth thinking about):

The async dispatch fixes the SIGBUS, but if the SQLite error that triggered recoverIndexAndFTS5() is persistent, the server will now loop continuously:

performFullRebuild → finalisation error
  → recoverIndexAndFTS5()           (sets hasFTS5 = false on failure)
  → pendingRebuild = true
  → defer dispatches next rebuild
  → same error again …

In practice the loop is bounded by traversal time (seconds-to-minutes per iteration, not a CPU spin), and recoverIndexAndFTS5() dropping/recreating the FTS5 table should resolve the common case. But if the underlying DB is entirely unwritable, you get a perpetual rebuild storm. A small guard — e.g. a consecutive-failure counter that backs off exponentially or disables auto-rebuild after N failures — would make the recovery story airtight. That said, this behaviour pre-dates this PR and is clearly out of scope here.


💡 Minor — comment verbosity (nit, ignore if you like):

The added // IMPORTANT: block re-states what the code now self-evidently shows. The first two lines that already existed (// If cancelled, forceReindex() …) are the important ones. The new block could be condensed to one line explaining the overflow scenario, rather than restating the async solution. Purely cosmetic.


The fix is correct and safe to merge as-is. Good work tracking down a tricky crash path, @martinmarsian.

@github-actions github-actions Bot added the claude: ✅ ok Claude review: pas de bloqueur, code mergeable label May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude: ✅ ok Claude review: pas de bloqueur, code mergeable

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants