Upgrade napi-rs 2.x → 3 to fix node bindings CI worker_threads segfault#161
Upgrade napi-rs 2.x → 3 to fix node bindings CI worker_threads segfault#161lfarrel6 wants to merge 2 commits into
Conversation
The node bindings test workflow appeared to pass tests but failed with exit 129 (yarn) / 139 (raw ava). Root cause: napi-rs 2.x is not fully context-aware on worker_thread shutdown. AVA 4+ runs each test file in a Worker by default; when the worker exits, napi-rs 2.x leaves dangling references (thread_local REFERENCE_MAP, threadsafe-function plumbing) and the V8 isolate teardown SIGSEGVs. The crash propagates back through yarn/ava as 129 or 139 depending on signal handling. This is fixed upstream in napi-rs 3 (napi-rs/napi-rs#2469, #2470, #3026). Changes: - Cargo.toml: napi 2.10.6 -> 3, napi-derive 2.9.4 -> 3, napi-build 2.0.1 -> 2 - src/lib.rs: JsBuffer -> bindgen_prelude::Buffer; into_value() plumbing no longer needed (Buffer impls AsRef<[u8]> directly) - package.json: @napi-rs/cli ^2 -> ^3, migrate napi config from triples.additional -> targets per v3 schema, name -> binaryName - index.js / index.d.ts: regenerated by napi build v3 - yarn.lock / Cargo.lock: refreshed Verified locally on Node 18 (CI's version) and Node 22 that the minimal worker_threads + native-addon repro, direct ava invocation, and full `yarn test` all exit 0.
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
@napi-rs/cli v3 (pulled in by the napi-rs 3 upgrade) depends on @inquirer/core, which imports node:util.styleText — added in Node 20.12. On Node 18 the CLI SyntaxErrors at startup before napi build can compile anything, producing exit 105 from cargo make. CI build environment only; package's engines.node (>= 10) remains unchanged for consumers.
Why
The
lint_and_test_nodejob has been failing for a long time withyarn testreporting all tests pass and then the process exiting with code 129 (under yarn) or 139 (under ava directly). It looked like a yarn problem because yarn is the visible top-level command, but it isn't.Root cause: napi-rs 2.x is not fully context-aware with respect to Node's
worker_threads. AVA 4+ runs each test file in a Worker by default. When the worker's V8 isolate is torn down, napi-rs 2.x leaves dangling references (thread_local!REFERENCE_MAP, threadsafe-function plumbing, env-cleanup hooks not deregistered) and the teardown SIGSEGVs. The crash propagates back through ava/yarn as 129 (SIGHUP, yarn rewraps the child signal) or 139 (raw SIGSEGV) depending on how the process tree dies.Verified the package manager is irrelevant — same crash reproduces under pnpm with a fresh project consuming the prebuilt
.node. The minimal repro isnew Worker(... import('./index.js'))followed by worker exit — no AVA, no yarn, no libfaketime needed. So migrating to pnpm would not have fixed it.The relevant fixes shipped in napi-rs 3.x:
What changed
node-attestation-bindings/Cargo.toml:napi2.10.6 → 3,napi-derive2.9.4 → 3,napi-build=2.0.1→2. Kept thenapi4feature flag (still maps to Node-API v4).node-attestation-bindings/src/lib.rs:napi::JsBufferdoesn't exist in v3. Switched tonapi::bindgen_prelude::Buffer, which implementsAsRef<[u8]>directly, so thecert.into_value()? .as_ref()plumbing collapses tocert.as_ref(). Twomatchblocks (and their dead error branches) drop out — net simplification.node-attestation-bindings/package.json:@napi-rs/cli^2.14.3→^3(resolves to 3.6.2).napiconfig to v3 schema:name→binaryName,triples.additional→ flattargetsarray. The v3targetslist must include the host triples that were previously implicit defaults, sox86_64-apple-darwin,x86_64-pc-windows-msvc, andx86_64-unknown-linux-gnuare added explicitly (publish surface unchanged).node-attestation-bindings/index.js/index.d.ts: regenerated bynapi buildv3. The TS signature now correctly types the params asBufferand the JS loader file is the modern v3 layout.node-attestation-bindings/yarn.lockandCargo.lock: refreshed.The
build/prepublishOnly/artifacts/universal/versionscripts inpackage.jsonwere left as-is — all flags (napi build --platform --release,napi prepublish -t npm,napi universal,napi artifacts) are still valid CLI surface in v3.What I deliberately did not touch
.github/workflows/build-and-deploy-node-bindings.yml: theon:triggers are entirely commented out, so this workflow doesn't run today. The matrix targets and thenapiscript invocations are still compatible with v3, but reactivating the publish flow on v3 deserves its own PR/verification (especially the per-arch docker images and thenapi artifacts/napi universalstep output paths). Out of scope here..github/workflows/lint-and-test.yml: no changes needed —cargo make cicallsyarn installandyarn test, both of which now succeed.workerThreads: falseworkaround added. The point of this PR is to fix it at the root so worker_threads stay enabled.Verification
Reproduced and fixed on Node 18 (the version CI uses) and Node 22:
new Worker()that only imports the addon and exitsnode node_modules/.bin/ava(worker_threads default)yarn test(the exact CI command, with libfaketime)pnpm testagainst the prebuilt.node(control)Core dump from the original crash showed the segfault deep in V8/N-API teardown threads with no addon symbols on the stack — consistent with the napi-rs context-awareness issue rather than anything in our Rust code.
Test plan
lint_and_test_nodeworkflow goes greenlint_and_test_rust(clippy/fmt over the workspace, which now resolves to napi 3) goes greenyarn buildlocally before we re-enable the publish workflowFollow-ups (not in this PR)
build-and-deploy-node-bindings.ymlonce we're ready to publish on v3 — needs a separate pass to revalidate the docker matrix and the artifact pipeline.Refs:
Generated by Claude Code