fix(core): keep daemon alive when a recompute's plugin load fails #35705

Merged: AgentEnder merged 19 commits into master from chore/spread-race-repro on Jun 5, 2026

@FrozenPandaz FrozenPandaz commented May 15, 2026

Current Behavior

A watcher-driven project-graph recompute that fails while loading plugins
takes the whole Nx daemon down.

kickOffRecompute (project-graph-incremental-recomputation.ts) builds
cachedSerializedProjectGraphPromise from an async IIFE.
processFilesAndCreateAndSerializeProjectGraph is wrapped in try/catch
and always resolves to an errorResult — but the prologue hoisted in
front of it for the freshness gate (readNxJson, getPluginsSeparated,
isStale) is not. getPluginsSeparated rejects when a plugin fails
to load.

scheduleProjectGraphRecomputation calls kickOffRecompute()
fire-and-forget, so a rejected myPromise has no awaiter — Node reports
an unhandled promise rejection and the daemon process exits. (The request
path, getCachedSerializedProjectGraphPromise, try/catches its await,
so only watcher-driven recomputes hit this.)

Observed downstream as a flaky e2e/nx/src/spread.test.ts: a test mutates
nx.json / tools/* / project.json, the watcher-driven recompute hits
a transient plugin-load failure, the daemon crashes mid-test, and the
following show project races a restarting daemon — surfacing as
project.targets.build being undefined (a graph whose plugin set never
ran). The captured daemon.log shows the AggregateError from
getPluginsSeparated followed by a fresh daemon process starting.

Expected Behavior

A plugin-load failure during a recompute resolves to a graph error
that the next requester surfaces — the daemon stays up. The IIFE body is
wrapped so it always resolves (never rejects), turning a prologue failure
into an errorResult, the same contract
processFilesAndCreateAndSerializeProjectGraph already honors. The next
getCachedSerializedProjectGraphPromise reads result.error, returns it
to the client, and clears the cached promise so the recompute retries.
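
The contract described above can be sketched as follows. This is a minimal illustration, not the Nx source: `GraphResult`, `kickOffRecomputeSketch`, `getGraphSketch`, and the injected `loadPlugins` callback are hypothetical names, and the real prologue also runs `readNxJson` and `isStale`.

```typescript
// Hypothetical result shape: the recompute promise always resolves to one of these.
type GraphResult =
  | { error: null; graph: string }
  | { error: Error; graph: null };

let cachedGraphPromise: Promise<GraphResult> | undefined;

// Safe to call fire-and-forget: the whole IIFE body sits inside try/catch,
// so the cached promise never rejects, even when the prologue throws.
function kickOffRecomputeSketch(loadPlugins: () => Promise<string[]>): void {
  cachedGraphPromise = (async (): Promise<GraphResult> => {
    try {
      // Prologue: stands in for the plugin load that may reject.
      const plugins = await loadPlugins();
      return { error: null, graph: `graph built with ${plugins.join(',')}` };
    } catch (e) {
      const error = e instanceof Error ? e : new Error(String(e));
      return { error, graph: null };
    }
  })();
}

// Request path: surface the error to the caller and clear the cached promise
// so that the next request starts a fresh recompute.
async function getGraphSketch(
  loadPlugins: () => Promise<string[]>
): Promise<GraphResult> {
  if (!cachedGraphPromise) kickOffRecomputeSketch(loadPlugins);
  const result = await cachedGraphPromise!;
  if (result.error) cachedGraphPromise = undefined;
  return result;
}
```

Under these assumptions a failed load costs one errored response; the request after it retries instead of being served a permanently rejected promise.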

Also in this PR (diagnostics / regression coverage for the flake):

  • Restored the afterEach daemon-log dump in spread.test.ts (removed in
    2f261d6903) so a failing run prints .nx/workspace-data/d/daemon.log
    next to the assertion in CI output.
  • Added a describe('rapid reconfiguration (race-condition stress)')
    block that mutates nx.json / project.json / tools/* in tight
    loops with no settle time and no reset between iterations, so the
    long-lived daemon is exercised hard — regression coverage for the crash.

Related Issue(s)

Follow-up to PR #35650, which hoisted getPluginsSeparated out of the
try/catch-wrapped compute and into the bare IIFE prologue for the
freshness gate.

No separate issue number.

netlify Bot commented May 15, 2026

Deploy Preview for nx-docs ready!

Name Link
🔨 Latest commit c6871b5
🔍 Latest deploy log https://app.netlify.com/projects/nx-docs/deploys/6a22f39606e37900088ec102
😎 Deploy Preview https://deploy-preview-35705--nx-docs.netlify.app

netlify Bot commented May 15, 2026

Deploy Preview for nx-dev ready!

Name Link
🔨 Latest commit c6871b5
🔍 Latest deploy log https://app.netlify.com/projects/nx-dev/deploys/6a22f3968db7110008a218c8
😎 Deploy Preview https://deploy-preview-35705--nx-dev.netlify.app

nx-cloud Bot commented May 15, 2026

View your CI Pipeline Execution ↗ for commit 077e165

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx affected --targets=lint,test,build,e2e,e2e-c... | ✅ Succeeded | 20m 11s | View ↗ |
| nx run-many -t check-imports check-lock-files c... | ✅ Succeeded | 3s | View ↗ |
| nx-cloud record -- pnpm nx-cloud conformance:check | ✅ Succeeded | 1m 2s | View ↗ |
| nx build workspace-plugin | ✅ Succeeded | <1s | View ↗ |
| nx-cloud record -- nx sync:check | ✅ Succeeded | 18s | View ↗ |
| nx-cloud record -- nx format:check | ✅ Succeeded | 6s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-06-05 16:29:11 UTC

@FrozenPandaz FrozenPandaz changed the title from "chore(testing): stress the daemon watcher race in spread.test.ts" to "fix(core): keep daemon alive when a recompute's plugin load fails" May 15, 2026
AgentEnder added a commit that referenced this pull request May 15, 2026
Cherry-picked from PR #35705 (Jason Jean). Wraps the kickOffRecompute
IIFE body in try/catch so a rejection from the prologue (readNxJson,
getPluginsSeparated, isStale) becomes an errorResult the next requester
surfaces instead of crashing the daemon with an unhandled rejection.

Orthogonal to the in-flight dedupe fix one commit ago — that addresses
a successful-but-stale plugin load returning the previous load's
SeparatedPlugins. This one addresses the load *rejecting* (e.g., a
plugin file failing to require). Both can be hit by the spread test
under tight write→query loops; the strongest branch ships both.
AgentEnder added a commit that referenced this pull request May 15, 2026
Cherry-picked from PR #35705 (Jason Jean). Adds a stress test that
matches the exact write shape of the original flake — plugin file +
nx.json (plugins + targetDefaults) + project.json all changing
together, then a single show project query — so CI reproduces the
race reliably instead of the 1-in-N single-shot test.

Pairs with the get-plugins dedupe and the kickOffRecompute try/catch
in this branch: this is the failing shape both fixes must cover.
@FrozenPandaz FrozenPandaz force-pushed the chore/spread-race-repro branch from c25c719 to c294d66 Compare May 17, 2026 18:20
@FrozenPandaz FrozenPandaz force-pushed the chore/spread-race-repro branch from 45440d8 to 9b09c0c Compare May 18, 2026 04:14
@nx-cloud nx-cloud Bot left a comment

Nx Cloud has identified a flaky task in your failed CI:

🔂 Since the failure was identified as flaky, we triggered a CI rerun by adding an empty commit to this branch.

AgentEnder added eight more commits that referenced this pull request
(four on May 20, 2026 and four on Jun 3, 2026). Each repeats, verbatim,
one of the two cherry-pick messages from May 15, 2026: the
kickOffRecompute try/catch and the stress test.
PR #35650 (freshness-gate + per-OS force-flush grace) is merged, but the
spread.test.ts "middle plugin" e2e still flakes. This change makes the
race reproduce reliably and restores the diagnostics for a failing run.

- Restore the afterEach daemon-log dump (removed in 2f261d6) so the
  always-on [watcher]/recompute lines land next to the failure on CI.
- Add a "rapid reconfiguration" describe with three stress tests that
  mutate nx.json / project.json in tight loops with no settle time and
  no reset between iterations, so the long-lived daemon must observe
  every change through its watcher before serving the graph. Each loop
  iteration is a fresh mutate-then-query race.
nx-cloud Bot and others added 16 commits June 5, 2026 10:45
The first CI run reproduced the flake, but in a sibling test rather
than the rapid-nx.json-swap stress tests: "...with target defaults
overriding" failed with project.targets.build undefined — the daemon
served a graph whose plugin set never ran.

The rapid-swap stress tests raced nx.json alone against a warm daemon
and passed. The real flake needs a plugin file, nx.json (plugins +
targetDefaults) and project.json all changing together, then a single
query. Add a stress test that loops that exact shape so CI reproduces
it far more reliably than the 1-in-N single-shot test.
kickOffRecompute builds cachedSerializedProjectGraphPromise from an
async IIFE. processFilesAndCreateAndSerializeProjectGraph is try/caught
and always resolves to an errorResult, but the prologue hoisted in
front of it for the freshness gate — readNxJson, getPluginsSeparated,
isStale — was not. getPluginsSeparated rejects when a plugin fails to
load.

scheduleProjectGraphRecomputation calls kickOffRecompute() fire-and-
forget, so a rejected myPromise has no awaiter and takes the daemon
down with an unhandled promise rejection. The request path
(getCachedSerializedProjectGraphPromise) try/catches its await, so
only watcher-driven recomputes hit this.

Wrap the IIFE body so it always resolves: a prologue failure now
becomes an errorResult the next requester surfaces, same contract
processFilesAndCreateAndSerializeProjectGraph already honors.
nx-plugin.test.ts "should be able to infer projects and targets" flakes
with "Cannot find project" — the same getPluginsSeparated local-plugin
load failure family as the spread.test.ts flake, but this file has no
daemon-log dump so a CI failure is opaque.

Add an afterAll that prints .nx/workspace-data/d/daemon.log before
teardown. The suite shares one long-lived daemon (no reset between
tests), so one dump captures everything; a failing test is located by
its Run Command / [REQUEST] lines. This makes the next run conclusive:
a daemon crash (Node.js vNN + fresh pid) vs a clean errorResult with
the local plugin unresolved.
The age_ms experiment answered its question — CI logs showed every
ingest event arriving 0-5ms old, so watcher delivery is prompt. At
debug the per-event line flooded every e2e daemon.log. Back to trace.
…ent loads

getPluginsSeparated set currentPluginsConfigurationHash in its synchronous
prologue but cachedSeparatedPlugins only after the await. With no mutual
exclusion, two overlapping recomputes — which a daemon restart triggers,
firing an initial recompute and a watcher recompute at once — could
interleave so the hash key and the cached plugin set described different
plugin sets. A later recompute then cache-hits the hash and is served the
wrong plugins, dropping inferred targets from listed local plugins.

Commit the hash and the cached value together, after the load, by the same
call, and only if that call's plugin set is still the latest requested.
Coalesce concurrent loads of the same plugin set so they share one load.
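
The commit-together and coalescing rules in this commit message can be sketched roughly as below. This is an illustration under assumptions, not the actual getPluginsSeparated code: `getPluginsSketch`, the string-array `Plugins` type, and the injected `load` callback are hypothetical.

```typescript
type Plugins = string[];

let committedHash: string | undefined; // hash the cached plugin set was loaded for
let committedPlugins: Plugins | undefined; // only ever written together with committedHash
let latestRequestedHash: string | undefined;
const inFlight = new Map<string, Promise<Plugins>>();

async function getPluginsSketch(
  configHash: string,
  load: () => Promise<Plugins>
): Promise<Plugins> {
  latestRequestedHash = configHash;

  // Cache hit: hash and value were committed by the same call, so they agree.
  if (committedHash === configHash && committedPlugins) return committedPlugins;

  // Coalesce: concurrent callers asking for the same plugin set share one load.
  let pending = inFlight.get(configHash);
  if (!pending) {
    pending = load().then((plugins) => {
      // Commit only if no newer plugin set was requested while this one loaded.
      if (latestRequestedHash === configHash) {
        committedHash = configHash;
        committedPlugins = plugins;
      }
      return plugins;
    });
    inFlight.set(configHash, pending);
    const cleanup = () => {
      inFlight.delete(configHash);
    };
    // Drop the in-flight entry on success or failure; callers still see the rejection.
    void pending.then(cleanup, cleanup);
  }
  return pending;
}
```

Because the hash is written only next to its value, an overlapping call for a different plugin set cannot leave a cache key that points at the wrong plugins.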
… reload

Local plugin resolution snapshots the workspace's project layout
(project configs, tsconfig path mappings, package entry points) into
module-level caches that were taken once and never invalidated. In the
long-lived daemon that snapshot can predate a newly added local plugin,
so findProjectForPath falls back to the catch-all root project and the
plugin resolves to the workspace root directory, which fails to import.

loadSpecifiedNxPlugins now drops those snapshots before resolving. It
runs only when the plugin set changed, which is exactly when a new
local plugin can have appeared, and in the same process that resolves
the plugin paths.
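
A rough sketch of the stale-snapshot problem and the reset follows. All names here (`projectRootsSnapshot`, `findProjectRootForPath`, `resetResolutionCaches`, `resolveLocalPluginSketch`) are hypothetical, and the real caches also cover tsconfig path mappings and package entry points.

```typescript
// Hypothetical module-level snapshot of project roots, taken once and then reused.
let projectRootsSnapshot: string[] | undefined;

function getProjectRoots(scan: () => string[]): string[] {
  if (!projectRootsSnapshot) projectRootsSnapshot = scan();
  return projectRootsSnapshot;
}

// Longest matching root wins; '.' plays the role of the catch-all root project.
function findProjectRootForPath(path: string, scan: () => string[]): string {
  const match = getProjectRoots(scan)
    .filter((root) => root !== '.' && (path === root || path.startsWith(root + '/')))
    .sort((a, b) => b.length - a.length)[0];
  return match ?? '.';
}

function resetResolutionCaches(): void {
  projectRootsSnapshot = undefined;
}

// Drop the snapshot before resolving, but only when the plugin set changed,
// which is when a new local plugin can have appeared.
function resolveLocalPluginSketch(
  pluginPath: string,
  scan: () => string[],
  pluginSetChanged: boolean
): string {
  if (pluginSetChanged) resetResolutionCaches();
  return findProjectRootForPath(pluginPath, scan);
}
```

Without the reset, a plugin added after the first scan keeps resolving to the root project `'.'`, which mirrors the failure mode described in the commit message.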
The afterEach/afterAll daemon.log dumps printed the whole file — thousands
of per-file watcher events and per-message bookkeeping lines that bury the
handful explaining a failure.

Add trimDaemonLog() which keeps only diagnostic lines (daemon
restarts, plugin loads, stale-graph discards, errors) plus the tail, and
collapses runs of dropped lines into a visible marker. Wire it into the
nx-plugin and spread e2e log dumps.
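
The filtering idea can be sketched as below. This is not the trimDaemonLog implementation: `trimLogSketch`, the `DIAGNOSTIC` pattern, the default tail length, and the marker text are all assumptions chosen for illustration.

```typescript
// Hypothetical pattern for lines worth keeping; the real patterns are not shown here.
const DIAGNOSTIC = /error|plugin|stale|started/i;

function trimLogSketch(log: string, tailLines = 3): string {
  const lines = log.split('\n');
  const tailStart = Math.max(0, lines.length - tailLines);
  const out: string[] = [];
  let dropped = 0;

  // Emit one visible marker for each run of dropped lines.
  const flush = () => {
    if (dropped > 0) {
      out.push(`... ${dropped} line(s) omitted ...`);
      dropped = 0;
    }
  };

  lines.forEach((line, i) => {
    if (i >= tailStart || DIAGNOSTIC.test(line)) {
      flush();
      out.push(line); // diagnostic lines and the tail are always kept
    } else {
      dropped++;
    }
  });
  flush();
  return out.join('\n');
}
```

Keeping the tail unconditionally means the last few lines before a failure survive even when they match no pattern.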
@FrozenPandaz FrozenPandaz force-pushed the chore/spread-race-repro branch from 776d958 to a417874 Compare June 5, 2026 15:00
@AgentEnder AgentEnder marked this pull request as ready for review June 5, 2026 17:26
@AgentEnder AgentEnder requested a review from a team as a code owner June 5, 2026 17:26
@AgentEnder AgentEnder self-requested a review June 5, 2026 17:26
@AgentEnder AgentEnder merged commit 57e37c3 into master Jun 5, 2026
18 checks passed
@AgentEnder AgentEnder deleted the chore/spread-race-repro branch June 5, 2026 17:27
vrxj81 pushed a commit to vrxj81/nx that referenced this pull request Jun 7, 2026
…wl#35705)

Co-authored-by: nx-cloud[bot] <71083854+nx-cloud[bot]@users.noreply.github.com>