feat(bundled-metadata): bundle vara runtime metadata for faster cold start #2450

Open
ukint-vs wants to merge 10 commits into main from feat/bundle-vara-metadata

@ukint-vs ukint-vs commented Apr 26, 2026

Summary

Threads @polkadot/api's ApiOptions.metadata through GearApiProvider and GearApi.create so cold start can skip the state_getMetadata WebSocket round-trip. The pre-fetched data is published as @gear-js/bundled-metadata — a new workspace package any Substrate-based gear-js consumer can opt into. Generated bundle ships for vara-mainnet + vara-testnet; weekly CI refresh keeps it current. Wired into gear-idea-frontend (dynamic-import chunk) and idea/gear/squid (static import) as the two demo consumers.

Why

GearApi.create blocks on state_getMetadata (~500-1500 ms on a public RPC). The blob is large but rarely changes — keyed by (genesisHash, specVersion). @polkadot/api already supports skipping the fetch when a key matches and falls back transparently otherwise. We just turn it on, then make the data shareable across consumers.
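
To make the keying concrete, here is a minimal sketch of a bundle lookup, assuming the map is keyed by the string `${genesisHash}-${specVersion}` (the key shape ApiOptions.metadata expects). The helper names `metadataKey` and `lookupBundled` are illustrative and not part of this PR.

```typescript
// Illustrative helpers only; they mirror the key shape, not the library internals.
type HexString = `0x${string}`;
type MetadataMap = Record<string, HexString>;

/** Build the lookup key from the chain's genesis hash and runtime spec version. */
function metadataKey(genesisHash: HexString, specVersion: number): string {
  return `${genesisHash}-${specVersion}`;
}

/** Return the bundled blob for this runtime, or undefined to signal "fetch over RPC". */
function lookupBundled(
  map: MetadataMap,
  genesisHash: HexString,
  specVersion: number,
): HexString | undefined {
  return map[metadataKey(genesisHash, specVersion)];
}
```

A runtime upgrade bumps specVersion, so the old key simply stops matching and the lookup returns undefined. That is the stale-key case in which the RPC fetch takes over.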

Profile

scripts/profile-cold-start.mjs wss://rpc.vara.network (5 trials, measures ApiPromise.create only):

unbundled: median=911ms  p95=1029ms
bundled  : median=329ms  p95=362ms

Saves ~580 ms median on cold load. Reloads benefit too: the browser HTTP cache reuses the metadata chunk, but state_getMetadata is not cacheable, so without the bundle every reload pays the same RPC cost.
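
For readers who want to reproduce the summary line, the following is a small sketch of the two statistics. It assumes a nearest-rank percentile; how scripts/profile-cold-start.mjs actually computes p95 is not shown in this PR, so treat the method as an assumption.

```typescript
/** Return a sorted copy, rejecting empty input so the helpers never return undefined. */
function sortedCopy(samples: number[]): number[] {
  if (samples.length === 0) throw new RangeError('need at least one sample');
  return [...samples].sort((a, b) => a - b);
}

/** Median: the middle value, or the mean of the two middle values for even sizes. */
function median(samples: number[]): number {
  const s = sortedCopy(samples);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
function percentile(samples: number[], p: number): number {
  const s = sortedCopy(samples);
  const rank = Math.max(1, Math.ceil((p / 100) * s.length));
  return s[rank - 1];
}
```

Applied to the figures above, 911 ms minus 329 ms gives 582 ms, in line with the ~580 ms quoted. Under the nearest-rank assumption, a p95 over five trials is simply the slowest trial, so the p95 column is best read as worst-of-five.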

Frontend impact

  • Eager-preloaded separate chunk (@gear-js/bundled-metadata, ~235 KB gzip) — fetch starts at JS parse time, races the WS handshake.
  • 3 s timeout fallback: a stalled CDN edge / proxy / flaky mobile network falls through to the RPC-fetch path instead of stranding the app on a blank render.
  • Entry chunk delta: +389 B.
  • Skipped on custom/dev RPCs (URL param, localhost) — chunk never downloads when the key wouldn't match.
  • Stale key (post-runtime upgrade): @polkadot/api silently falls back to RPC fetch. Tested.
  • Chunk load failure: render gate releases anyway, falls back to RPC fetch.
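
The custom/dev RPC guard from the list above could look roughly like this. It is a sketch under stated assumptions: the URL parameter name `node`, the local host list, and the `knownNodes` argument are all hypothetical, and the real check in Provider.tsx may differ.

```typescript
/** Hosts treated as local development nodes (assumed list). */
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1']);

/**
 * Decide whether downloading the bundled-metadata chunk is worthwhile.
 * Returns false whenever the bundle key is unlikely to match the target chain.
 */
function shouldUseBundle(
  nodeAddress: string,
  pageSearch: string,
  knownNodes: readonly string[],
): boolean {
  // A node supplied via URL parameter is treated as custom ('node' is a hypothetical name).
  if (new URLSearchParams(pageSearch).has('node')) return false;

  let host: string;
  try {
    host = new URL(nodeAddress).hostname;
  } catch {
    return false; // unparsable address: stay on the plain RPC path
  }
  if (LOCAL_HOSTS.has(host)) return false;

  // Only addresses the bundle was generated for are worth the download.
  return knownNodes.includes(nodeAddress);
}
```

Returning false here only skips the download. The connection itself proceeds as before and fetches metadata over RPC.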

Squid impact

idea/gear/squid/src/main.ts:94 uses GearApi to read genesisHash then disconnects (the actual indexer is SubstrateBatchProcessor with its own RPC path). Bundled metadata saves one state_getMetadata round-trip per squid restart (~500-1500 ms). api.disconnect() is now awaited so disconnect doesn't race polkadot-js's internal init RPCs.

What's in the PR

New workspace package @gear-js/bundled-metadata:

  • utils/bundled-metadata/ — private, pure-data package mirroring the utils/util/ layout. Dual ESM/CJS via rollup, with a closeBundle hook that writes lib/cjs/package.json containing {"type":"commonjs"} so Node CJS consumers don't hit ERR_REQUIRE_ESM (matches apis/gear/rollup.config.js).
  • utils/bundled-metadata/src/data.ts — generated, committed, biome-ignored.
  • utils/bundled-metadata/test/index.test.js — boundary smoke test imports the BUILT package (not source), covers ESM + CJS require interop. Test script chains yarn build && yarn jest.
  • utils/bundled-metadata/README.md — purpose, usage, refresh cadence, stale-key behavior, explicit Vara-only scope.
  • lerna.json + root package.json workspaces + biome ignores updated to include the new package.

Consumer wiring:

  • utils/gear-hooks/src/context/Api.tsx — metadata plumbed through switchNetwork, with stale-check guards in async paths and clearMarks/clearMeasures cleanup so long-lived tabs don't accumulate performance.* entries across network switches.
  • idea/gear/frontend/src/app/providers/api/Provider.tsx — eager-preload dynamic import from @gear-js/bundled-metadata, custom-RPC guard, 3 s timeout fallback.
  • idea/gear/squid/src/main.ts — static import + metadata arg, await api.disconnect().
  • idea/gear/frontend/package.json + idea/gear/squid/package.json — workspace deps.
  • Root package.json build:gear-idea-frontend / build:gear-idea-squid / build:gear-idea-backend scripts include --scope @gear-js/bundled-metadata.

Generation + refresh:

  • scripts/fetch-bundled-metadata.mjs — pinned to one finalized block hash via state.getMetadata(headHash) so a runtime upgrade between RPC calls can't produce a mismatched (key, hex) pair. Asserts expectedGenesis per target to guard against DNS hijack on the refresh path. Validates every entry against its own RPC (single-entry map forces polkadot-js to consume that exact blob rather than refetching) so a corrupt non-primary target blocks the bundle write instead of silently shipping.
  • scripts/test-bundled-metadata.mjs — 3 node:test smoke tests (stale-key fallback, blob reconstruction, timeout helper).
  • scripts/profile-cold-start.mjs — Node-only profiler.
  • scripts/bundled-metadata.config.json — vara-mainnet + vara-testnet with expectedGenesis per target.
  • .github/workflows/refresh-bundled-metadata.yml — Mon 06:00 UTC cron, opens chore PR.
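
The block-pinning and genesis assertion described for the fetch script can be sketched as follows. The `ChainRpc` interface is a hypothetical stand-in for the @polkadot/api calls the real script uses; only the ordering of the steps is the point.

```typescript
type Hex = `0x${string}`;

/** Hypothetical minimal RPC surface; the real script talks to @polkadot/api instead. */
interface ChainRpc {
  genesisHash(): Promise<Hex>;
  finalizedHead(): Promise<Hex>;
  specVersionAt(blockHash: Hex): Promise<number>;
  metadataAt(blockHash: Hex): Promise<Hex>;
}

interface BundleEntry {
  key: string;
  hex: Hex;
}

async function fetchPinnedEntry(rpc: ChainRpc, expectedGenesis: Hex): Promise<BundleEntry> {
  // Refuse to continue if the endpoint is not the chain we expect.
  const genesis = await rpc.genesisHash();
  if (genesis.toLowerCase() !== expectedGenesis.toLowerCase()) {
    throw new Error(`genesis mismatch: expected ${expectedGenesis}, got ${genesis}`);
  }

  // Pin one finalized block, then read both halves of the pair at that block.
  const head = await rpc.finalizedHead();
  const [specVersion, hex] = await Promise.all([
    rpc.specVersionAt(head),
    rpc.metadataAt(head),
  ]);

  return { key: `${genesis}-${specVersion}`, hex };
}
```

Because both reads take the same block hash, a runtime upgrade landing mid-run cannot pair an old spec version with new metadata or the reverse.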

Review trail

The branch was put through /simplify (3 parallel review agents), /plan-eng-review (with codex outside voice), /review (Claude adversarial + codex adversarial), and /codex review. Every cross-model finding was either fixed or has a recorded skip rationale. The structurally serious findings caught and fixed in the review loop:

  • Atomicity bug in the generator (was claimed atomic but wasn't) — fixed.
  • Boundary test imported source not the built package — rewritten.
  • Provider.tsx could hang on a stalled chunk — 3 s timeout added.
  • Squid api.disconnect() race — await-ed.
  • performance.mark/measure accumulation across switchNetwork — cleared.
  • New package not registered in lerna.json — added.
  • validateBundle only ran against the primary target — now every (entry, target) pair is validated.

Test plan

  • yarn workspace @gear-js/bundled-metadata test — 3/3 pass (ESM, key/value shape, CJS require interop)
  • node --test scripts/test-bundled-metadata.mjs — 3/3 (network-gated tests skip offline)
  • tsc --noEmit on frontend, gear-hooks, squid — clean
  • biome check — clean
  • node scripts/profile-cold-start.mjs — confirms ~580 ms median savings on a real RPC
  • Manual: load gear-idea-frontend in DevTools, verify no state_getMetadata WS frame post-handshake
  • Manual: switch networks via NodesSwitch, verify still no state_getMetadata
  • Manual: yarn workspace gear-idea-frontend build and inspect dist/assets/*.js — bundled-metadata must be a separate chunk
  • Manual: workflow_dispatch Refresh bundled metadata, expect "no change" first run

🤖 Generated with Claude Code

…getMetadata on cold start

Threads @polkadot/api's ApiOptions.metadata through GearApiProvider so cold
start can skip state_getMetadata (~500-1500ms saved on a public RPC). Adds a
generated bundled-metadata.ts for vara-mainnet + vara-testnet, a weekly CI
refresh that opens a chore PR, a Node-only profiler, and node:test smoke
tests for the stale-key fallback path.

Profile (wss://rpc.vara.network, 5 trials):
  unbundled: median=911ms p95=1029ms
  bundled  : median=329ms p95=362ms

Frontend: separate ~235KB gzip chunk preloaded eagerly, entry chunk delta
+389B. Skips on custom/dev RPCs (URL param, localhost). On stale key,
@polkadot/api transparently falls back to the RPC fetch.

release-gear-idea.yml fires on either common-package version bump or
bundled-metadata change so the chore PR redeploys staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to bundle and preload chain runtime metadata to optimize the initialization of the GearApiProvider. It includes scripts for fetching, validating, and profiling the metadata, along with updates to the ApiProvider to utilize this bundled data and track performance metrics. Review feedback identified a critical TypeError in the network switching logic where a subscription ID is incorrectly treated as an unsubscribe function, and noted that performance measures are cleared too early for effective observability.

const measureName = `gear-api:metadata-window:${sessionId}`;
mark(markConnectStart);

providerUnsubRef.current = provider.on('connected', async () => {

high

The on method of WsProvider and ScProvider from @polkadot/api returns a subscription ID (a number), not an unsubscribe function. Storing this value in providerUnsubRef.current and subsequently calling it as a function (as seen in the cleanup logic at line 77) will cause a TypeError at runtime when switching networks. To correctly unsubscribe, you should store the handler function and use provider.off('connected', handler) for cleanup, or wrap the subscription in a function that performs the cleanup.

Member Author

Verified against @polkadot/rpc-provider/types.d.ts:51, where on(type, sub): () => void returns an unsubscribe function, not a numeric subscription ID. This is documented behavior and the pattern existed pre-PR; the network switcher works correctly. False positive.
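
For context, the pattern under discussion can be sketched like this. `ProviderLike` is a structural stand-in that relies only on the `on(type, sub): () => void` signature quoted above, and `ConnectedSubscription` is a hypothetical helper, not code from Api.tsx.

```typescript
type Unsubscribe = () => void;

/** Structural stand-in for a provider whose on() returns an unsubscribe function. */
interface ProviderLike {
  on(type: 'connected', handler: () => void): Unsubscribe;
}

/** Holds at most one live subscription; re-attaching drops the previous one first. */
class ConnectedSubscription {
  private unsub: Unsubscribe | null = null;

  attach(provider: ProviderLike, handler: () => void): void {
    this.detach();
    // on() hands back the cleanup function, so it can be stored and called later.
    this.unsub = provider.on('connected', handler);
  }

  detach(): void {
    this.unsub?.();
    this.unsub = null;
  }
}
```

Calling the stored value is only safe because `on` returns a function. Had it returned a number, as the review assumed, `detach` would throw.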

Comment thread utils/gear-hooks/src/context/Api.tsx Outdated
performance.clearMarks(markConnectStart);
performance.clearMarks(markConnected);
performance.clearMarks(markReady);
performance.clearMeasures(measureName);

medium

Calling performance.clearMeasures(measureName) immediately after performance.measure removes the entry from the performance timeline. This prevents the measurement from being captured by external telemetry tools or inspected in the browser's DevTools 'Performance' tab. Consider clearing measures only when they are no longer needed (e.g., at the start of the next switchNetwork call) to ensure they remain observable for profiling purposes.

Member Author

Good catch — fixed in f15ad64. Dropped the clearMarks/clearMeasures calls so DevTools and external telemetry can still observe the metadata-window measurement. The buffer growth concern was theoretical; entries are bounded by switchNetwork call count, which is bounded by user behavior.
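
A rough sketch of the resulting behavior, with entries left on the timeline: the measure name follows the `gear-api:metadata-window:${sessionId}` pattern from the snippet under review, while the mark names and helper functions are assumptions made for illustration.

```typescript
const measureNameFor = (sessionId: string): string => `gear-api:metadata-window:${sessionId}`;
// Hypothetical mark names; the real ones live in Api.tsx.
const startMarkFor = (sessionId: string): string => `gear-api:connect-start:${sessionId}`;
const readyMarkFor = (sessionId: string): string => `gear-api:ready:${sessionId}`;

/** Mark the start of a connection attempt. */
function startWindow(sessionId: string): void {
  performance.mark(startMarkFor(sessionId));
}

/** Mark readiness, record the measure, and return its duration in milliseconds. */
function endWindow(sessionId: string): number {
  performance.mark(readyMarkFor(sessionId));
  performance.measure(measureNameFor(sessionId), startMarkFor(sessionId), readyMarkFor(sessionId));
  // Nothing is cleared here, so DevTools and telemetry can still read the entry.
  const entries = performance.getEntriesByName(measureNameFor(sessionId), 'measure');
  return entries[entries.length - 1].duration;
}
```

Each switchNetwork call would add two marks and one measure under this scheme, which matches the bounded, linear growth described in the reply.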

ukint-vs and others added 8 commits April 27, 2026 00:18
Drop the clearMarks/clearMeasures calls in switchNetwork — clearing
immediately after measure() removes the entries from the timeline before
DevTools or external telemetry can see them, defeating the purpose. The
buffer grows linearly with switchNetwork calls; that's bounded by user
behavior and the entries are tiny.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…data

# Conflicts:
#	.github/workflows/release-gear-idea.yml
#	package.json
#	yarn.lock
…nanoid

- Move withTimeout from fetch-bundled-metadata.mjs into scripts/common.mjs and
  drop the copy in test-bundled-metadata.mjs so the timeout test exercises the
  real helper.
- Export BundledMetadata from @gear-js/react-hooks; Provider.tsx imports it
  instead of redeclaring the same template-literal shape.
- Replace the module-scope sessionCounter + Date.now() id with nanoid(8) from
  the existing nanoid/non-secure dep so ids stay stable across HMR.
- Add a measure() helper alongside mark() and use it in place of the inline
  performance.mark / performance.measure block.

…rkspace package

Move the generated runtime metadata bundle out of idea/gear/frontend/src/shared/config
and into a dedicated workspace package so any Substrate-based gear-js consumer
(frontend, indexer, future CLI) can opt in without depending on a gear-idea
internal source path.

- Add utils/bundled-metadata/ with package.json, tsconfig.json, rollup.config.js,
  README.md, jest.config.ts, src/index.ts, test/index.test.js. Dual ESM/CJS build
  matches the apis/gear pattern, including a closeBundle hook that writes
  lib/cjs/package.json with {"type":"commonjs"} so Node CJS consumers don't hit
  ERR_REQUIRE_ESM.
- git mv idea/gear/frontend/src/shared/config/bundled-metadata.ts ->
  utils/bundled-metadata/src/data.ts so the file's history is preserved.
- Update scripts/bundled-metadata.config.json output path, the refresh workflow's
  add-paths, biome.json ignore, and root workspaces + build:gear-idea-frontend /
  build:gear-idea-squid / build:gear-idea-backend script scopes.
- Revert the stopgap BundledMetadata re-export from @gear-js/react-hooks now
  that the canonical home is @gear-js/bundled-metadata; the hook keeps its
  one-line structural alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switch gear-idea-frontend's dynamic import from the moved local path to the
new workspace package, and pass bundled metadata into squid's GearApi.create.

- idea/gear/frontend/src/app/providers/api/Provider.tsx: import the
  BundledMetadata type from @gear-js/bundled-metadata; dynamic-import the
  BUNDLED_METADATA map from the package so Vite still emits a separate chunk.
- idea/gear/squid/src/main.ts: static-import BUNDLED_METADATA and pass it to
  GearApi.create. The GearApi instance is used only to read genesisHash
  before disconnect, so this saves one state_getMetadata round-trip per squid
  (re)start, not indexer catch-up time.
- Add the workspace dep to both consumer package.json files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

biome.json already has vcs.useIgnoreFile: true, and every lib/ in the
monorepo is gitignored. The explicit **/lib glob added during the package
extraction was redundant and risked silently skipping any future lib/ source
directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two independent reviewers (Claude adversarial + Codex) converged on the same
set of issues. Address all five.

Critical (P1, cross-model confirmed):
- scripts/fetch-bundled-metadata.mjs: the script claimed atomic block-pinning
  but api.runtimeMetadata was loaded at connect time while runtimeVersion was
  fetched separately at headHash. During a runtime upgrade between those
  calls, the bundle could ship (oldSpec, newMeta) or vice versa - silent
  data corruption when polkadot-js matches the bad key. Switch to
  api.rpc.state.getMetadata(headHash) so both pin to the same finalized block.
- Add expectedGenesis per target in scripts/bundled-metadata.config.json; the
  fetch script asserts the live chain matches before writing. Guards against
  DNS hijack on the refresh path shipping a bundle keyed under the attacker's
  genesis.

Informational (P2-P3):
- utils/bundled-metadata/test/index.test.js: previously imported ../src/index
  (source), but the README claimed it tested the built package boundary. The
  CJS marker, exports map, and dual-build pipeline had zero coverage. Rewrite
  to import from @gear-js/bundled-metadata (built lib/) and add a createRequire
  smoke test that exercises the CJS path. Build script switched to
  yarn run -T rollup -c so yarn workspace ... test resolves rollup without
  going through lerna.
- idea/gear/frontend/src/app/providers/api/Provider.tsx: Promise.race the
  metadata chunk import against a 3s timeout so a stalled CDN edge / proxy
  / flaky mobile network can't strand the app on a blank render. Falls through
  to the RPC-fetch path on timeout.
- idea/gear/squid/src/main.ts: await api.disconnect() (was fire-and-forget).
  Bundled metadata removes the implicit state_getMetadata sync barrier;
  awaiting closes the race between disconnect and polkadot-js's internal
  init RPCs.
- utils/gear-hooks/src/context/Api.tsx: add clearSession() helper and call it
  after each performance.measure() and in the catch path. Long-running tabs
  no longer accumulate performance.mark/measure entries across switchNetwork
  churn. Also add a one-line comment explaining why BundledMetadata is
  redeclared inline (structural duplicate of @gear-js/bundled-metadata to
  avoid a runtime dep just for a type).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex review caught two issues that survived the prior review pass.

[P1] lerna.json:packages did not include utils/bundled-metadata. The repo's
Lerna package graph is sourced from this explicit list, so `lerna run build
--scope @gear-js/bundled-metadata` was a no-op on a clean checkout — frontend
and squid builds would fail to resolve @gear-js/bundled-metadata/lib because
the package was never built.

[P2] scripts/fetch-bundled-metadata.mjs validated only the validateAgainst
target (vara-mainnet). The vara-testnet entry was written without ever
reconstructing an ApiPromise from it, so a corrupt-blob with matching key
under testnet would ship silently. Refactor: validateEntry(entry, target)
runs once per (entry, target) pair using a SINGLE-ENTRY map, forcing
polkadot-js to consume that exact blob rather than refetching from chain.
Drop the unused validateAgainst field from the config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
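
The single-entry-map idea from the commit above can be illustrated with a short sketch. `toSingleEntryMaps` is a hypothetical helper name; the actual validateEntry implementation is not shown in this PR.

```typescript
type Hex = `0x${string}`;
type MetadataMap = Record<string, Hex>;

/**
 * Split a bundle into one map per entry. Validating against a map that holds
 * exactly one key leaves no other blob for the client to pick up by accident.
 */
function toSingleEntryMaps(bundle: MetadataMap): MetadataMap[] {
  return Object.entries(bundle).map(([key, hex]) => ({ [key]: hex }));
}
```

Each resulting map would then be passed as the metadata option when connecting to the matching target, so a corrupt blob fails its own validation rather than hiding behind a healthy neighbor.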
@ukint-vs ukint-vs changed the title from "feat(gear-idea/frontend): bundle vara runtime metadata for faster cold start" to "feat(bundled-metadata): bundle vara runtime metadata for faster cold start" May 15, 2026
@ukint-vs
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to bundle Vara runtime metadata, significantly reducing cold-start times for the frontend and indexer by skipping the state_getMetadata RPC round-trip. It adds a new utils/bundled-metadata package, along with scripts for fetching, validating, and profiling the metadata. The ApiProvider in the frontend and the squid indexer have been updated to utilize this bundled data when connecting to Vara networks. Feedback was provided regarding the withTimeout utility in the frontend, suggesting that the setTimeout timer should be explicitly cleared upon promise resolution to prevent unnecessary tasks in the event loop.

Comment on lines +22 to +26
const withTimeout = <T,>(p: Promise<T>, ms: number) =>
  Promise.race([
    p,
    new Promise<T>((_, rej) => setTimeout(() => rej(new Error('bundled-metadata chunk timed out')), ms)),
  ]);

medium

The current implementation of withTimeout does not clear the setTimeout timer if the main promise resolves before the timeout. While minor for a one-time initialization, it is better practice to clear the timer to avoid unnecessary tasks in the event loop, especially in long-lived single-page applications.

Suggested change
- const withTimeout = <T,>(p: Promise<T>, ms: number) =>
-   Promise.race([
-     p,
-     new Promise<T>((_, rej) => setTimeout(() => rej(new Error('bundled-metadata chunk timed out')), ms)),
-   ]);
+ const withTimeout = <T,>(p: Promise<T>, ms: number) => {
+   let timer: ReturnType<typeof setTimeout>;
+   const timeout = new Promise<T>((_, rej) => {
+     timer = setTimeout(() => rej(new Error('bundled-metadata chunk timed out')), ms);
+   });
+   return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
+ };

Member Author

Good catch — fixed in 022f7cf. withTimeout now stores the timer ref and calls clearTimeout via .finally(), matching the pattern already used in scripts/common.mjs. Long-lived SPA sessions no longer accumulate armed timers from each module re-eval on fast chunk fetches.
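
To show how the fixed helper fits the fallback behavior described in the PR, here is a minimal sketch. `withTimeout` follows the suggested change above; `loadBundleOrUndefined` is a hypothetical wrapper, and the 3000 ms default mirrors the 3 s figure from the description.

```typescript
/** Race a promise against a timer and always clear the timer afterwards. */
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((_, reject) => {
    timer = setTimeout(() => reject(new Error('bundled-metadata chunk timed out')), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

/**
 * Resolve to the loaded bundle, or to undefined on timeout or load failure,
 * so the caller can fall through to the RPC-fetch path.
 */
async function loadBundleOrUndefined<T>(
  load: () => Promise<T>,
  ms = 3000,
): Promise<T | undefined> {
  try {
    return await withTimeout(load(), ms);
  } catch {
    return undefined;
  }
}
```

In the frontend, the `load` argument would presumably be the dynamic import of @gear-js/bundled-metadata. An undefined result means connecting without the metadata option.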

…first

Per gemini-code-assist review on PR #2450: the inline withTimeout in
Provider.tsx never called clearTimeout when the main promise resolved before
the timeout fired. Each module-eval allocated a timer that stayed armed for
the full 3s window even on a fast chunk fetch. Mirror the pattern from
scripts/common.mjs:withTimeout: store the timer, race, .finally(() =>
clearTimeout(timer)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>