Migrate from Blacksmith to Namespace for CI runners and caching #4007
Runners: all warm/build/deploy jobs move to namespace-profile-default (upgrade the profile later for more cores + cache volumes; nothing else changes when you do).

Caching: /nix and the Pulumi plugin dir ride Namespace cache volumes (nscloud-cache-action, SHA-pinned), declared continue-on-error because volumes are strictly an optimization here: a cold or missing volume means setup-nix installs fresh and Cachix substitutes the warm layers (slower, never wrong). Stale or last-write-wins volume content is equally harmless for the same reason.

Handoff: build->deploy artifact transfer moves to run-scoped GitHub artifacts, which are strongly consistent across jobs. This was chosen deliberately after the Blacksmith sticky-disk snapshot-visibility race (empty clone 41s after a verified commit). Consistency-critical data no longer rides a primitive whose consistency we cannot verify; receipts (sha256+size on both sides) and the extract guard remain.

Docker: deploy jobs use the stock buildx builder for now (cold layer cache per job; convert-service pays ~2-3 min). Namespace remote builders are the follow-up alongside the profile upgrade.

The provider-neutral core (flake layers, hakari, pruned sources, Cachix, concurrency groups, on-push/release entry points) is untouched.

https://claude.ai/code/session_0144BqNCnE75AX6LSyEBoGtL
…docker builders

Handoff tars now ride 'nsc artifact upload/download' (attempt-scoped paths, 24h expiry) instead of the GitHub artifacts API: strongly consistent object storage on Namespace's network. The deploy job downloads into runner.temp (outside the workspace the composite's checkout cleans) and feeds the composite's existing guarded tar-path branch, so receipts and the extract guard apply unchanged.

The composite's Blacksmith builder branch becomes a Namespace remote builder branch (nscloud-setup-buildx-action, SHA-pinned): persistent layer cache so heavy bases (LibreOffice/Collabora) stay warm across deploys. Stock buildx remains the default for non-Namespace callers.

https://claude.ai/code/session_0144BqNCnE75AX6LSyEBoGtL
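For readers unfamiliar with the receipt idea mentioned above, here is a minimal Python sketch of a sha256+size receipt computed on the build side and checked again before extraction on the deploy side. This is not the composite action's actual code: the `Receipt` type and the `make_receipt` and `guard_before_extract` names are hypothetical, and the real guard may check more than this.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt shape: digest plus byte count of a handoff tar."""
    sha256: str
    size: int


def make_receipt(path: str, chunk_size: int = 1 << 20) -> Receipt:
    """Hash the file in chunks so a large tar is never loaded into memory whole."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return Receipt(sha256=digest.hexdigest(), size=size)


def guard_before_extract(path: str, expected: Receipt) -> None:
    """Fail loudly if the downloaded tar is empty or differs from the build-side receipt."""
    actual = make_receipt(path)
    if actual.size == 0:
        raise RuntimeError(f"{path}: empty handoff tar, refusing to extract")
    if actual != expected:
        raise RuntimeError(
            f"{path}: receipt mismatch, expected {expected}, got {actual}"
        )
```

In this sketch the build job would call `make_receipt` right after writing the tar and pass the result along with the upload; the deploy job would call `guard_before_extract` on the downloaded file before touching its contents, so an empty or truncated transfer stops the job instead of deploying stale output.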
CodeRabbit walkthrough: This PR migrates the deploy pipeline's artifact handoff mechanism from Blacksmith sticky disks to Namespace artifact storage. It adds a …
The first Namespace run failed in setup-nix: the Determinate installer
exits when systemd is not active ('systemd was not active ... consider
passing --init none'), and Namespace runners do not run systemd, unlike
the Blacksmith VMs this action was written against.
setup-nix now detects systemd (/run/systemd/system) and branches: with
systemd it behaves exactly as before; without it, it installs with
--init none and starts nix-daemon directly as a background process,
waiting for the daemon socket (with the daemon log surfaced on failure).
The warm path repairs the daemon the same way, and the substituter-append
path restarts whichever daemon flavour is running. teardown-nix
additionally pkills directly-started daemons so cache-volume unmounts
stay clean.
https://claude.ai/code/session_0144BqNCnE75AX6LSyEBoGtL
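The branching described in this commit message can be sketched in Python as follows. This is an illustration only, not the setup-nix action's code: the function names are hypothetical, and the socket path a caller would pass (commonly `/nix/var/nix/daemon-socket/socket`) is an assumption about the default Nix layout rather than something the commit states.

```python
import os
import time


def systemd_is_active(marker: str = "/run/systemd/system") -> bool:
    """The marker directory exists only when systemd is the running init system."""
    return os.path.isdir(marker)


def installer_args(systemd_active: bool) -> list[str]:
    """Extra installer arguments: none with systemd, '--init none' without it."""
    return [] if systemd_active else ["--init", "none"]


def wait_for_socket(path: str, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Poll until the daemon socket path appears; return False if the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

On a runner without systemd, a caller would append `installer_args(False)` to the installer invocation, start nix-daemon as a background process, and surface the daemon log if `wait_for_socket` returns `False`.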
Summary
Migrates the deploy pipeline from Blacksmith to Namespace: runners, caching, artifact handoff, and docker builders. Motivated by this week's operational record on Blacksmith (a regression shipped through the floating `stickydisk@v1` tag that emptied handoff disks, an identical consistency bug open upstream since April with no response, and today's major outage).

All three deploy entry points (push-to-dev, manual dispatches, prod releases) migrate together since they share `deploy-all-services.yml`. The provider-neutral core (flake layers, hakari, pruned per-artifact sources, Cachix, concurrency groups) is untouched.

Key Changes

- Runners: `namespace-profile-linux-small`. Note: smaller than Blacksmith's 32 vCPU, so compile-heavy runs will be slower until profiles are upgraded; substitution-heavy runs are bootstrap-bound and largely unaffected. Upgrading later is a label change.
- Caching: Namespace cache volumes (`nscloud-cache-action`, SHA-pinned) back `/nix` and `/pulumi/plugins`, declared `continue-on-error` and deliberately optimization-only. A cold or missing volume means setup-nix installs fresh and Cachix substitutes: slower, never wrong. Consistency-insensitive data only.
- Handoff: Namespace artifact storage (`nsc artifact upload/download`). Strongly consistent object storage replaces the sticky-disk handoff that raced snapshot visibility (empty clone 41s after a verified commit). Paths are per run-attempt and per service (`handoff/<run_id>-<attempt>/<service>/…`, 24h expiry), so re-runs can't collide and services can't cross-read. The deploy job downloads into `runner.temp` (outside the workspace the composite's checkout cleans) and feeds the composite's existing guarded tar branch; sha256 receipts on both sides and the fail-loud extract guard carry over unchanged.
- Docker: Namespace remote builder (`nscloud-setup-buildx-action`, SHA-pinned) for persistent layer caching (LibreOffice/Collabora bases stay warm). Stock buildx remains the default for non-Namespace callers (`reusable-deploy-service` unaffected).
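To make the path scheme concrete, the sketch below builds the `handoff/<run_id>-<attempt>/<service>/` prefix in Python and shows why re-runs and services land in disjoint locations. The `handoff_prefix` name and the service-name validation are assumptions added for illustration; in a workflow the two numbers would presumably come from `github.run_id` and `github.run_attempt`. Whatever follows the prefix is elided in the PR description and is not reconstructed here.

```python
import re

# Assumption: service names are simple slugs, so a name can never escape its own prefix.
_SAFE_SERVICE = re.compile(r"[A-Za-z0-9._-]+")


def handoff_prefix(run_id: int, attempt: int, service: str) -> str:
    """Build the per-attempt, per-service prefix described in the PR summary."""
    if run_id <= 0 or attempt <= 0:
        raise ValueError("run_id and attempt must be positive")
    if service in {".", ".."} or not _SAFE_SERVICE.fullmatch(service):
        raise ValueError(f"unsafe service name: {service!r}")
    return f"handoff/{run_id}-{attempt}/{service}/"
```

Because the attempt number is part of the prefix, a re-run of the same workflow run writes to a fresh location, and because the service name is the last path component, one service's download step never lists another service's tar.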