fix(core): keep daemon alive when a recompute's plugin load fails #35705
Merged
AgentEnder added a commit that referenced this pull request on May 15, 2026:
Cherry-picked from PR #35705 (Jason Jean). Wraps the kickOffRecompute IIFE body in try/catch so a rejection from the prologue (readNxJson, getPluginsSeparated, isStale) becomes an errorResult the next requester surfaces instead of crashing the daemon with an unhandled rejection. Orthogonal to the in-flight dedupe fix one commit ago — that addresses a successful-but-stale plugin load returning the previous load's SeparatedPlugins. This one addresses the load *rejecting* (e.g., a plugin file failing to require). Both can be hit by the spread test under tight write→query loops; the strongest branch ships both.
AgentEnder added a commit that referenced this pull request on May 15, 2026:
Cherry-picked from PR #35705 (Jason Jean). Adds a stress test that matches the exact write shape of the original flake — plugin file + nx.json (plugins + targetDefaults) + project.json all changing together, then a single show project query — so CI reproduces the race reliably instead of the 1-in-N single-shot test. Pairs with the get-plugins dedupe and the kickOffRecompute try/catch in this branch: this is the failing shape both fixes must cover.
c25c719 to c294d66
45440d8 to 9b09c0c
AgentEnder added further commits carrying the same two cherry-picked messages that referenced this pull request on May 20, 2026 (four commits) and Jun 3, 2026 (four commits).
PR #35650 (freshness-gate + per-OS force-flush grace) is merged, but the spread.test.ts "middle plugin" e2e still flakes. This change makes the race reproduce reliably and restores the diagnostics for a failing run.
- Restore the afterEach daemon-log dump (removed in 2f261d6) so the always-on [watcher]/recompute lines land next to the failure on CI.
- Add a "rapid reconfiguration" describe with three stress tests that mutate nx.json / project.json in tight loops with no settle time and no reset between iterations, so the long-lived daemon must observe every change through its watcher before serving the graph. Each loop iteration is a fresh mutate-then-query race.
…f-Healing CI Rerun]
The first CI run reproduced the flake, but in a sibling test rather than the rapid-nx.json-swap stress tests: "...with target defaults overriding" failed with project.targets.build undefined — the daemon served a graph whose plugin set never ran. The rapid-swap stress tests raced nx.json alone against a warm daemon and passed. The real flake needs a plugin file, nx.json (plugins + targetDefaults) and project.json all changing together, then a single query. Add a stress test that loops that exact shape so CI reproduces it far more reliably than the 1-in-N single-shot test.
kickOffRecompute builds cachedSerializedProjectGraphPromise from an async IIFE. processFilesAndCreateAndSerializeProjectGraph is try/caught and always resolves to an errorResult, but the prologue hoisted in front of it for the freshness gate (readNxJson, getPluginsSeparated, isStale) was not.
getPluginsSeparated rejects when a plugin fails to load. scheduleProjectGraphRecomputation calls kickOffRecompute() fire-and-forget, so a rejected myPromise has no awaiter and takes the daemon down with an unhandled promise rejection. The request path (getCachedSerializedProjectGraphPromise) try/catches its await, so only watcher-driven recomputes hit this.
Wrap the IIFE body so it always resolves: a prologue failure now becomes an errorResult the next requester surfaces, the same contract processFilesAndCreateAndSerializeProjectGraph already honors.
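The wrap described above can be illustrated with a minimal sketch. It is not the daemon's actual code: `RecomputeResult`, `kickOffRecomputeSketch`, and the injected `prologue` / `compute` callbacks are hypothetical stand-ins for the real prologue (readNxJson, getPluginsSeparated, isStale) and compute step.

```typescript
// Hypothetical result shape; the real daemon types may differ.
type RecomputeResult =
  | { error: null; serializedProjectGraph: string }
  | { error: Error; serializedProjectGraph: null };

// Stand-in for the prologue, which may reject (e.g. a plugin fails to load).
type Prologue = () => Promise<void>;
// Stand-in for the compute step, which already resolves to an error result.
type Compute = () => Promise<RecomputeResult>;

function kickOffRecomputeSketch(
  prologue: Prologue,
  compute: Compute
): Promise<RecomputeResult> {
  // The whole IIFE body sits inside try/catch, so the returned promise
  // always resolves, even when nobody awaits it.
  return (async (): Promise<RecomputeResult> => {
    try {
      await prologue();
      return await compute();
    } catch (e) {
      // Turn the rejection into a value the next requester can surface.
      const error = e instanceof Error ? e : new Error(String(e));
      return { error, serializedProjectGraph: null };
    }
  })();
}
```

A fire-and-forget caller can now drop the returned promise without risking an unhandled rejection, because a failure travels as a resolved value instead.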
…lf-Healing CI Rerun]
nx-plugin.test.ts "should be able to infer projects and targets" flakes with "Cannot find project" — the same getPluginsSeparated local-plugin load failure family as the spread.test.ts flake, but this file has no daemon-log dump so a CI failure is opaque. Add an afterAll that prints .nx/workspace-data/d/daemon.log before teardown. The suite shares one long-lived daemon (no reset between tests), so one dump captures everything; a failing test is located by its Run Command / [REQUEST] lines. This makes the next run conclusive: a daemon crash (Node.js vNN + fresh pid) vs a clean errorResult with the local plugin unresolved.
…[Self-Healing CI Rerun]
The age_ms experiment answered its question: CI logs showed every ingest event arriving 0-5ms old, so watcher delivery is prompt. At debug level the per-event line flooded every e2e daemon.log, so it goes back to trace.
…ent loads
getPluginsSeparated set currentPluginsConfigurationHash in its synchronous prologue but cachedSeparatedPlugins only after the await. With no mutual exclusion, two overlapping recomputes (which a daemon restart triggers by firing an initial recompute and a watcher recompute at once) could interleave so the hash key and the cached plugin set described different plugin sets. A later recompute then cache-hits the hash and is served the wrong plugins, dropping inferred targets from listed local plugins.
Commit the hash and the cached value together, after the load, by the same call, and only if that call's plugin set is still the latest requested. Coalesce concurrent loads of the same plugin set so they share one load.
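One way to picture the commit-together and coalescing rules is the sketch below. It is an illustration under assumptions, not the code from this PR: `PluginCacheSketch`, `Plugins`, and the injected `load` function are hypothetical names, and a plain string array stands in for SeparatedPlugins.

```typescript
type Plugins = string[]; // hypothetical stand-in for SeparatedPlugins

class PluginCacheSketch {
  private committedHash: string | null = null;
  private committedPlugins: Plugins | null = null;
  private latestRequestedHash: string | null = null;
  private inFlight = new Map<string, Promise<Plugins>>();
  private load: (hash: string) => Promise<Plugins>;

  constructor(load: (hash: string) => Promise<Plugins>) {
    this.load = load;
  }

  get(hash: string): Promise<Plugins> {
    this.latestRequestedHash = hash;
    if (this.committedHash === hash && this.committedPlugins) {
      return Promise.resolve(this.committedPlugins);
    }
    // Coalesce: concurrent requests for the same plugin set share one load.
    const existing = this.inFlight.get(hash);
    if (existing) return existing;

    const pending = this.load(hash).then(
      (plugins) => {
        this.inFlight.delete(hash);
        // Commit hash and value together, after the load, and only if this
        // plugin set is still the latest one requested.
        if (this.latestRequestedHash === hash) {
          this.committedHash = hash;
          this.committedPlugins = plugins;
        }
        return plugins;
      },
      (err) => {
        this.inFlight.delete(hash);
        throw err;
      }
    );
    this.inFlight.set(hash, pending);
    return pending;
  }

  // Exposes the committed pair so callers (or tests) can inspect it.
  snapshot(): { hash: string | null; plugins: Plugins | null } {
    return { hash: this.committedHash, plugins: this.committedPlugins };
  }
}
```

Because the hash and the value are written in the same callback, a cache hit on the hash can only ever return the plugin set that was loaded for that hash.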
…ent loads [Self-Healing CI Rerun]
… reload
Local plugin resolution snapshots the workspace's project layout (project configs, tsconfig path mappings, package entry points) into module-level caches that were taken once and never invalidated. In the long-lived daemon that snapshot can predate a newly added local plugin, so findProjectForPath falls back to the catch-all root project and the plugin resolves to the workspace root directory, which fails to import.
loadSpecifiedNxPlugins now drops those snapshots before resolving. It runs only when the plugin set changed, which is exactly when a new local plugin can have appeared, and in the same process that resolves the plugin paths.
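The stale-snapshot failure and its fix can be reproduced in miniature. The sketch below is an assumption-laden illustration: `projectRootsSnapshot`, `resetResolutionCachesSketch`, and `findProjectForPathSketch` are hypothetical, and a list of project roots stands in for the real layout caches.

```typescript
// Module-level snapshot, taken lazily on first use (hypothetical stand-in
// for the project-layout caches described above).
let projectRootsSnapshot: string[] | undefined;

// Drops the snapshot so the next lookup re-reads the workspace layout.
function resetResolutionCachesSketch(): void {
  projectRootsSnapshot = undefined;
}

// Returns the longest project root containing `path`, or '.' (the
// catch-all root project) when no project matches.
function findProjectForPathSketch(
  path: string,
  readRoots: () => string[]
): string {
  if (!projectRootsSnapshot) {
    projectRootsSnapshot = readRoots();
  }
  let best = '.';
  for (const root of projectRootsSnapshot) {
    const contains = path === root || path.startsWith(root + '/');
    if (contains && (best === '.' || root.length > best.length)) {
      best = root;
    }
  }
  return best;
}
```

In a long-lived process, calling `resetResolutionCachesSketch()` before resolving a changed plugin set is what lets a newly added project root be found instead of falling back to `'.'`.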
The afterEach/afterAll daemon.log dumps printed the whole file — thousands of per-file watcher events and per-message bookkeeping lines that bury the handful explaining a failure. Add trimDaemonLog() which keeps only diagnostic lines (daemon restarts, plugin loads, stale-graph discards, errors) plus the tail, and collapses runs of dropped lines into a visible marker. Wire it into the nx-plugin and spread e2e log dumps.
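A trimming helper of this kind might look like the following sketch. It is not the `trimDaemonLog()` added in this PR: the function name, the `keep` patterns, the `tailLines` default, and the marker text are all assumptions chosen for illustration.

```typescript
// Keeps lines matching any `keep` pattern plus the last `tailLines` lines,
// and collapses each run of dropped lines into one visible marker.
// Pass non-global regexes: RegExp.test() is stateful with the `g` flag.
function trimDaemonLogSketch(
  log: string,
  keep: RegExp[],
  tailLines = 3
): string {
  const lines = log.split('\n');
  const tailStart = Math.max(0, lines.length - tailLines);
  const out: string[] = [];
  let dropped = 0;

  const flushMarker = (): void => {
    if (dropped > 0) {
      out.push(`... ${dropped} line(s) omitted ...`);
      dropped = 0;
    }
  };

  lines.forEach((line, i) => {
    const isDiagnostic = keep.some((re) => re.test(line));
    if (i >= tailStart || isDiagnostic) {
      flushMarker();
      out.push(line);
    } else {
      dropped++;
    }
  });
  flushMarker();
  return out.join('\n');
}
```

The marker keeps the omission visible, so a reader of the CI output can tell that lines were dropped and roughly how many.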
776d958 to a417874
… separate hash [Self-Healing CI Rerun]
AgentEnder approved these changes on Jun 5, 2026
vrxj81 pushed a commit to vrxj81/nx that referenced this pull request on Jun 7, 2026:
…wl#35705)

## Current Behavior

A watcher-driven project-graph recompute that fails while loading plugins takes the **whole Nx daemon down**.

`kickOffRecompute` (`project-graph-incremental-recomputation.ts`) builds `cachedSerializedProjectGraphPromise` from an async IIFE. `processFilesAndCreateAndSerializeProjectGraph` is wrapped in `try/catch` and always resolves to an `errorResult`, but the prologue hoisted in front of it for the freshness gate (`readNxJson`, `getPluginsSeparated`, `isStale`) is not.

`getPluginsSeparated` **rejects** when a plugin fails to load. `scheduleProjectGraphRecomputation` calls `kickOffRecompute()` fire-and-forget, so a rejected `myPromise` has no awaiter: Node reports an unhandled promise rejection and the daemon process exits. (The request path, `getCachedSerializedProjectGraphPromise`, `try/catch`es its `await`, so only watcher-driven recomputes hit this.)

Observed downstream as a flaky `e2e/nx/src/spread.test.ts`: a test mutates `nx.json` / `tools/*` / `project.json`, the watcher-driven recompute hits a transient plugin-load failure, the daemon crashes mid-test, and the following `show project` races a restarting daemon, surfacing as `project.targets.build` being `undefined` (a graph whose plugin set never ran). The captured `daemon.log` shows the `AggregateError` from `getPluginsSeparated` followed by a fresh daemon process starting.

## Expected Behavior

A plugin-load failure during a recompute resolves to a graph **error** that the next requester surfaces, and the daemon stays up. The IIFE body is wrapped so it always resolves (never rejects), turning a prologue failure into an `errorResult`, the same contract `processFilesAndCreateAndSerializeProjectGraph` already honors. The next `getCachedSerializedProjectGraphPromise` reads `result.error`, returns it to the client, and clears the cached promise so the recompute retries.

Also in this PR (diagnostics / regression coverage for the flake):

- Restored the `afterEach` daemon-log dump in `spread.test.ts` (removed in `2f261d6903`) so a failing run prints `.nx/workspace-data/d/daemon.log` next to the assertion in CI output.
- Added a `describe('rapid reconfiguration (race-condition stress)')` block that mutates `nx.json` / `project.json` / `tools/*` in tight loops with no settle time and no `reset` between iterations, so the long-lived daemon is exercised hard as regression coverage for the crash.

## Related Issue(s)

Follow-up to PR nrwl#35650, which hoisted `getPluginsSeparated` out of the `try/catch`ed compute and into the bare IIFE prologue for the freshness gate. No separate issue number.

---------

Co-authored-by: nx-cloud[bot] <71083854+nx-cloud[bot]@users.noreply.github.com>
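The requester-side half of the contract under Expected Behavior (read the error, return it, clear the cached promise so the next request retries) could be sketched roughly as follows. `GraphCacheSketch`, `GraphResult`, and the injected `recompute` function are hypothetical; the real `getCachedSerializedProjectGraphPromise` is not reproduced here.

```typescript
// Hypothetical result shape for illustration only.
type GraphResult = {
  error: Error | null;
  serializedProjectGraph: string | null;
};

class GraphCacheSketch {
  private cached: Promise<GraphResult> | undefined;
  private recompute: () => Promise<GraphResult>;

  constructor(recompute: () => Promise<GraphResult>) {
    this.recompute = recompute;
  }

  async get(): Promise<GraphResult> {
    if (!this.cached) {
      this.cached = this.recompute();
    }
    const mine = this.cached;
    const result = await mine;
    // On error, clear the cached promise so the next request retries,
    // but only if no newer promise has replaced it in the meantime.
    if (result.error && this.cached === mine) {
      this.cached = undefined;
    }
    return result;
  }
}
```

This relies on `recompute` always resolving, which is the property the wrapped IIFE is meant to guarantee.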