Migrate from Blacksmith to Namespace for CI runners and caching #4007

Merged: synoet merged 5 commits into main from synoet/migrate-namespace on Jun 11, 2026

@synoet synoet commented Jun 11, 2026

Contributor

Summary

Migrates the deploy pipeline from Blacksmith to Namespace: runners, caching, artifact handoff, and docker builders. Motivated by this week's operational record on Blacksmith (a regression shipped through the floating stickydisk@v1 tag that emptied handoff disks, an identical consistency bug open upstream since April with no response, and today's major outage).

All three deploy entry points (push-to-dev, manual dispatches, prod releases) migrate together since they share deploy-all-services.yml. The provider-neutral core — flake layers, hakari, pruned per-artifact sources, Cachix, concurrency groups — is untouched.

Key Changes

  • Runners: all warm/build/deploy jobs → namespace-profile-linux-small. Note: this profile is smaller than Blacksmith's 32-vCPU runners, so compile-heavy runs will be slower until profiles are upgraded; substitution-heavy runs are bootstrap-bound and largely unaffected. Upgrading later is a label change.
  • Cache volumes (nscloud-cache-action, SHA-pinned) back /nix and /pulumi/plugins, declared continue-on-error — deliberately optimization-only. A cold or missing volume means setup-nix installs fresh and Cachix substitutes: slower, never wrong. Consistency-insensitive data only.
  • Handoff via Namespace artifact storage (nsc artifact upload/download): strongly consistent object storage replaces the sticky-disk handoff that raced snapshot visibility (empty clone 41s after a verified commit). Paths are per run-attempt and per service (handoff/<run_id>-<attempt>/<service>/…, 24h expiry), so re-runs can't collide and services can't cross-read. The deploy job downloads into runner.temp (outside the workspace the composite's checkout cleans) and feeds the composite's existing guarded tar branch — sha256 receipts on both sides and the fail-loud extract guard carry over unchanged.
  • Docker: the composite's Blacksmith builder branch becomes a Namespace remote builder branch (nscloud-setup-buildx-action, SHA-pinned) for persistent layer caching (LibreOffice/Collabora bases stay warm). Stock buildx remains the default for non-Namespace callers (reusable-deploy-service unaffected).
  • All new third-party actions pinned by commit SHA — floating tags are how this week happened.
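
The per-attempt, per-service path layout described above can be sketched as a small shell helper. This is a minimal sketch, not the workflow's actual code: `handoff_path` is a hypothetical name, and it only builds the prefix up to the service component (whatever follows it in the real paths is not shown in this PR description).

```shell
# Hypothetical helper (not from the PR): build the handoff prefix
# handoff/<run_id>-<attempt>/<service> described in the Key Changes list.
handoff_path() {
  run_id="$1"; attempt="$2"; service="$3"
  # Refuse empty components so a missing variable cannot collapse
  # two different runs or services onto the same prefix.
  if [ -z "$run_id" ] || [ -z "$attempt" ] || [ -z "$service" ]; then
    echo "handoff_path: run_id, attempt and service are all required" >&2
    return 1
  fi
  printf 'handoff/%s-%s/%s\n' "$run_id" "$attempt" "$service"
}
```

In a GitHub Actions step this could be called as `handoff_path "$GITHUB_RUN_ID" "$GITHUB_RUN_ATTEMPT" convert-service`. Because the attempt number is part of the prefix, a re-run of the same workflow run writes to a different location than the first attempt.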

claude added 3 commits June 11, 2026 18:16
Runners: all warm/build/deploy jobs move to namespace-profile-default
(upgrade the profile later for more cores + cache volumes; nothing else
changes when you do).

Caching: /nix and the Pulumi plugin dir ride Namespace cache volumes
(nscloud-cache-action, SHA-pinned), declared continue-on-error because
volumes are strictly an optimization here — a cold or missing volume
means setup-nix installs fresh and Cachix substitutes the warm layers:
slower, never wrong. Stale or last-write-wins volume content is equally
harmless for the same reason.

Handoff: build->deploy artifact transfer moves to run-scoped GitHub
artifacts, which are strongly consistent across jobs — chosen
deliberately after the Blacksmith sticky-disk snapshot-visibility race
(empty clone 41s after a verified commit). Consistency-critical data no
longer rides a primitive whose consistency we cannot verify; receipts
(sha256+size on both sides) and the extract guard remain.

Docker: deploy jobs use the stock buildx builder for now (cold layer
cache per job; convert-service pays ~2-3 min). Namespace remote builders
are the follow-up alongside the profile upgrade.

The provider-neutral core — flake layers, hakari, pruned sources, Cachix,
concurrency groups, on-push/release entry points — is untouched.

https://claude.ai/code/session_0144BqNCnE75AX6LSyEBoGtL
…docker builders

Handoff tars now ride 'nsc artifact upload/download' (attempt-scoped
paths, 24h expiry) instead of the GitHub artifacts API — strongly
consistent object storage on Namespace's network. The deploy job
downloads into runner.temp (outside the workspace the composite's
checkout cleans) and feeds the composite's existing guarded tar-path
branch, so receipts and the extract guard apply unchanged.
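
The composite's guard is not shown in this PR. One way a fail-loud extract guard could look is sketched below, assuming the handoff tar has already been downloaded to a local path; `guarded_extract` and its arguments are hypothetical names for illustration.

```shell
# Hypothetical sketch of a fail-loud extract guard: never let an empty
# or missing handoff tar turn into a silently empty deploy directory.
guarded_extract() {
  tarball="$1"; dest="$2"
  # -s is true only for an existing file with size > 0.
  if [ ! -s "$tarball" ]; then
    echo "guarded_extract: $tarball is missing or empty" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  tar -xf "$tarball" -C "$dest" || {
    echo "guarded_extract: failed to extract $tarball" >&2
    return 1
  }
  # An archive that unpacks to nothing is treated as a failure too.
  if [ -z "$(ls -A "$dest")" ]; then
    echo "guarded_extract: $tarball produced no files in $dest" >&2
    return 1
  fi
}
```

In a workflow step the download target would sit under `$RUNNER_TEMP` rather than the workspace, for example `guarded_extract "$RUNNER_TEMP/handoff.tar" "$RUNNER_TEMP/handoff"` (both paths are made up for this example).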

The composite's Blacksmith builder branch becomes a Namespace remote
builder branch (nscloud-setup-buildx-action, SHA-pinned): persistent
layer cache so heavy bases (LibreOffice/Collabora) stay warm across
deploys. Stock buildx remains the default for non-Namespace callers.

https://claude.ai/code/session_0144BqNCnE75AX6LSyEBoGtL
coderabbitai Bot commented Jun 11, 2026

Contributor

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: db14f2e2-45d8-49d9-b5e7-3bd0e77eb5b6

📥 Commits

Reviewing files that changed from the base of the PR and between 2050f1f and 73d5733.

📒 Files selected for processing (4)
  • .github/actions/deploy-cloud-storage-pulumi/action.yml
  • .github/actions/setup-nix/action.yml
  • .github/actions/teardown-nix/action.yml
  • .github/workflows/deploy-all-services.yml

📝 Walkthrough

Summary by CodeRabbit

  • Chores
    • Transitioned build artifact handling from legacy sticky disks to modern artifact storage for improved reliability
    • Enhanced Nix environment setup with automatic systemd detection and fallback support for broader runner compatibility
    • Improved cache management with better cleanup and process handling in deployment teardown
    • Updated deployment workflow with modern artifact caching for more efficient CI/CD operations

Walkthrough

This PR migrates the deploy pipeline's artifact handoff mechanism from Blacksmith sticky disks to Namespace artifact storage. It adds a use-namespace-builder input to the deploy action to conditionally use Namespace Docker builders, refactors Nix daemon lifecycle management in setup-nix and teardown-nix to detect and handle non-systemd runners, updates warm jobs to mount Namespace cache volumes, and changes binary and lambda build jobs to package outputs as tarballs and upload them to Namespace artifact storage instead of writing to sticky disks. The deploy job now downloads these artifacts from Namespace storage and passes their locations to the deploy action.



claude added 2 commits June 11, 2026 18:32
The first Namespace run failed in setup-nix: the Determinate installer
exits when systemd is not active ('systemd was not active ... consider
passing --init none'), and Namespace runners do not run systemd, unlike
the Blacksmith VMs this action was written against.

setup-nix now detects systemd (/run/systemd/system) and branches: with
systemd it behaves exactly as before; without it, it installs with
--init none and starts nix-daemon directly as a background process,
waiting for the daemon socket (with the daemon log surfaced on failure).
The warm path repairs the daemon the same way, and the substituter-append
path restarts whichever daemon flavour is running. teardown-nix
additionally pkills directly-started daemons so cache-volume unmounts
stay clean.
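
The detection and wait logic described in this commit can be sketched roughly as follows. This is an illustrative sketch rather than the action's actual script: `detect_init` and `wait_for_socket` are hypothetical names, and the optional arguments exist only so the functions can be exercised outside a runner.

```shell
# Report "systemd" when systemd is the running init, otherwise "none".
# The presence of /run/systemd/system is the check named in the commit;
# the optional argument overrides the probed directory (for testing).
detect_init() {
  probe_dir="${1:-/run/systemd/system}"
  if [ -d "$probe_dir" ]; then
    echo systemd
  else
    echo none
  fi
}

# Poll once per second until the daemon socket exists. On timeout, print
# the daemon log (if one was given) to stderr and return non-zero.
wait_for_socket() {
  socket="$1"; timeout="${2:-30}"; log="${3:-}"
  waited=0
  while [ ! -S "$socket" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "wait_for_socket: $socket did not appear after ${timeout}s" >&2
      if [ -n "$log" ] && [ -f "$log" ]; then
        cat "$log" >&2
      fi
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

A caller would pass `--init none` to the installer only when `detect_init` prints `none`, start `nix-daemon` in the background with its output redirected to a log file, and then call `wait_for_socket` on the daemon socket (by default `/nix/var/nix/daemon-socket/socket` on a standard multi-user install) before running any `nix` command.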

https://claude.ai/code/session_0144BqNCnE75AX6LSyEBoGtL
@synoet synoet marked this pull request as ready for review June 11, 2026 19:21
@synoet synoet merged commit 944e906 into main Jun 11, 2026
157 checks passed
@synoet synoet deleted the synoet/migrate-namespace branch June 11, 2026 19:23

2 participants