Description
Context
The backend currently runs alembic upgrade head automatically on container startup.
This works under single-replica, recreate-style deployments but will fail under:
- Multiple replicas
- Rolling deployments
- Horizontal scaling
Deployment occurs on NERSC Spin (Kubernetes).
The database is behind a firewall and requires explicit service/network configuration to allow access from Spin workloads.
We must define a production-safe migration strategy that works within these network constraints.
Problem
Startup-based migrations introduce:
- Race conditions when multiple replicas start simultaneously
- Tight coupling between application boot and schema changes
- Crash loops if migration fails
- Implicit constraint of replicas=1
- Unsafe behavior during rolling updates
Additional constraints:
- Database access requires proper firewall/service configuration.
- Migration logic must run from within an allowed network boundary (e.g., Spin namespace).
Required Outcome
Define and implement a production-safe migration strategy for NERSC Spin that:
- Prevents concurrent schema migrations
- Decouples schema changes from application startup
- Supports multi-replica deployments
- Works within NERSC firewall/network constraints
- Clearly documents DB access requirements
Migration Strategies to Evaluate
Copilot should evaluate and propose one of the following:
- Dedicated Kubernetes Job in Spin namespace to run alembic upgrade head
- CI/CD migration step executed from within NERSC network boundary
- Explicit manual migration step from a controlled NERSC host
- Guarded startup migration using advisory DB locks (only if justified)
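For the dedicated Job option, a minimal sketch of what the manifest could look like, built as a Python dict and serialized to JSON (which kubectl accepts alongside YAML). The Job name alembic-migrate, the function name, and the image, namespace, and secret arguments are all hypothetical placeholders; nothing here reflects the project's actual manifests.

```python
import json


def migration_job_manifest(image: str, namespace: str, db_secret: str) -> dict:
    """Build a one-shot Kubernetes Job that runs `alembic upgrade head`.

    All argument values are supplied by the caller; the Job name and the
    secret-based environment wiring are assumptions for illustration.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "alembic-migrate", "namespace": namespace},
        "spec": {
            # Do not retry automatically: a failed migration should be inspected.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "migrate",
                            # Same image as the backend, so Alembic sees the same revisions.
                            "image": image,
                            "command": ["alembic", "upgrade", "head"],
                            # Database credentials come from a Secret (assumed to exist).
                            "envFrom": [{"secretRef": {"name": db_secret}}],
                        }
                    ],
                }
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest so it can be written to a file or piped to kubectl."""
    return json.dumps(manifest, indent=2)
```

Because the Job runs a single pod inside the Spin namespace, it executes exactly once per release and from inside the allowed network boundary, independent of how many backend replicas exist.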
The recommendation must:
- Address firewall/service access requirements
- Specify where migrations execute (Spin pod, login node, CI runner, etc.)
- Include operational tradeoffs
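To make the tradeoffs of the guarded-startup option concrete, here is a rough sketch of an advisory-lock guard. It assumes the database is PostgreSQL (the issue does not say so) and that conn is a DB-API connection from a driver using %s placeholders, such as psycopg. MIGRATION_LOCK_KEY and both function names are hypothetical.

```python
import subprocess
from typing import Callable

# Hypothetical lock key: any 64-bit integer that every replica agrees on.
MIGRATION_LOCK_KEY = 724001


def run_alembic_upgrade() -> None:
    """Run the same command the container currently runs at startup."""
    subprocess.run(["alembic", "upgrade", "head"], check=True)


def migrate_with_advisory_lock(
    conn, migrate: Callable[[], None] = run_alembic_upgrade
) -> None:
    """Run `migrate` while holding a session-level PostgreSQL advisory lock.

    pg_advisory_lock blocks until the lock is free, so replicas that start
    at the same time run the migration one after another. Replicas that
    arrive late find the schema already at head, and Alembic is a no-op.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT pg_advisory_lock(%s)", (MIGRATION_LOCK_KEY,))
        try:
            migrate()
        finally:
            # Always release, even if the migration raised.
            cur.execute("SELECT pg_advisory_unlock(%s)", (MIGRATION_LOCK_KEY,))
    finally:
        cur.close()
```

This removes the race between replicas but keeps schema changes coupled to application boot: a failing migration still crash-loops every pod, and session-level locks may not behave as expected behind a transaction-mode connection pooler.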
Network / Firewall Requirements
Document:
- How Spin pods reach the database (service name, host, port)
- Required firewall rules or network policies
- Whether a dedicated migration Job requires separate service account or network policy
- Any changes required to expose or allow DB connectivity
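When documenting connectivity, a small reachability probe run from inside a Spin pod can confirm that the firewall and any network policy actually permit the connection. The sketch below only tests that a TCP connection can be opened; it does not authenticate or speak the database protocol. The function name is hypothetical, and the host and port are whatever the documented service endpoint turns out to be.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    A False result covers refused connections, timeouts (typical of a
    firewall silently dropping packets), and DNS resolution failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running the same probe from the backend pod and from the migration Job pod would show whether the Job needs its own network policy or service account.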
Acceptance Criteria
- Selected migration strategy documented
- Deployment flow clearly defined: build → migrate → deploy
- Explicit scaling constraints documented (if any)
- Firewall / service configuration documented
- Kubernetes manifests updated if required
- Startup-time migration removed or gated appropriately
- Rollback strategy documented
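The build → migrate → deploy ordering could be enforced by a small driver along these lines. This is only a sketch: the commands in EXAMPLE_STEPS (image tag, file name, Job and Deployment names) are hypothetical, and a real pipeline would more likely live in the CI system's own configuration.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]

# Hypothetical commands; names and paths are placeholders, not project values.
EXAMPLE_STEPS: List[Step] = [
    ("build", ["docker", "build", "-t", "example/backend:TAG", "."]),
    ("migrate", ["kubectl", "apply", "-f", "migrate-job.json"]),
    ("wait-migrate", ["kubectl", "wait", "--for=condition=complete",
                      "job/alembic-migrate", "--timeout=600s"]),
    ("deploy", ["kubectl", "rollout", "restart", "deployment/backend"]),
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_in_order(
    steps: Sequence[Step], run: Callable[[Sequence[str]], int] = _run
) -> List[str]:
    """Run steps sequentially and stop at the first failure.

    Returns the names of the completed steps. Raising on failure means a
    failed migration can never be followed by a deploy of the new code.
    """
    done: List[str] = []
    for name, cmd in steps:
        if run(cmd) != 0:
            raise RuntimeError(f"step {name!r} failed; completed so far: {done}")
        done.append(name)
    return done
```

Waiting for the Job to complete before restarting the Deployment is what keeps the application rollout from racing the schema change.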
Deliverables
- Code changes (if required)
- Kubernetes manifest updates (Deployment / Job / NetworkPolicy)
- README / ops documentation update
- Clear summary of chosen strategy and rationale