[DevOps]: Production-Safe Alembic Migration Strategy (NERSC Spin)

## Context

The backend currently runs `alembic upgrade head` automatically on container startup.

This works for single-replica, recreate-style deployments but is unsafe, and can fail outright, under:

- Multiple replicas
- Rolling deployments
- Horizontal scaling

Deployment occurs on **NERSC Spin (Kubernetes)**.

The database is behind a firewall and requires explicit service/network configuration to allow access from Spin workloads.

We must define a production-safe migration strategy that works within these network constraints.

---

## Problem

Startup-based migrations introduce:

- Race conditions when multiple replicas start simultaneously
- Tight coupling between application boot and schema changes
- Crash loops if migration fails
- Implicit constraint of `replicas=1`
- Unsafe behavior during rolling updates

Additional constraints:

- Database access requires proper firewall/service configuration.
- Migration logic must run from within an allowed network boundary (e.g., Spin namespace).

---

## Required Outcome

Define and implement a production-safe migration strategy for NERSC Spin that:

- Prevents concurrent schema migrations
- Decouples schema changes from application startup
- Supports multi-replica deployments
- Works within NERSC firewall/network constraints
- Clearly documents DB access requirements

---

## Migration Strategies to Evaluate

Copilot should evaluate and propose one of the following:

1. Dedicated Kubernetes Job in Spin namespace to run `alembic upgrade head`
2. CI/CD migration step executed from within NERSC network boundary
3. Explicit manual migration step from a controlled NERSC host
4. Guarded startup migration using advisory DB locks (only if justified)
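
To make option 4 concrete, here is a minimal sketch of a lock-guarded upgrade. It assumes the database is PostgreSQL (this issue does not name the engine) and that `conn` is a DB-API connection using `%s` placeholders, such as one from psycopg2. `MIGRATION_LOCK_KEY`, `migration_lock`, `guarded_upgrade`, and `alembic_upgrade_head` are hypothetical names chosen for illustration.

```python
from contextlib import contextmanager

# Hypothetical, application-chosen key; every replica must use the same value.
MIGRATION_LOCK_KEY = 7301117


@contextmanager
def migration_lock(conn, key=MIGRATION_LOCK_KEY):
    """Hold a PostgreSQL session-level advisory lock for the duration of the block."""
    cur = conn.cursor()
    # Blocks until no other session holds the same key.
    cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        yield
    finally:
        # Release explicitly; the server also releases it when the session ends.
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.close()


def guarded_upgrade(conn, run_migrations):
    """Run `run_migrations` while holding the lock, so replicas migrate one at a time."""
    with migration_lock(conn):
        run_migrations()


def alembic_upgrade_head(ini_path="alembic.ini"):
    """Example migration callable; requires Alembic and an existing alembic.ini."""
    from alembic import command
    from alembic.config import Config

    command.upgrade(Config(ini_path), "head")
```

A replica would call `guarded_upgrade(conn, alembic_upgrade_head)` before serving traffic. Replicas that waited on the lock then run the upgrade against a schema that is already at head, which Alembic treats as a no-op. This serializes migrations but still couples them to application boot, which is why the option is listed as "only if justified".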

The recommendation must:

- Address firewall/service access requirements
- Specify where migrations execute (Spin pod, login node, CI runner, etc.)
- Include operational tradeoffs
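
As a starting point for evaluating option 1, the sketch below builds a Kubernetes `Job` manifest that runs `alembic upgrade head` once. The Job name, image, namespace, and secret name are placeholders, not values from this repository, and the sketch assumes the backend image already contains Alembic and reads its database URL from environment variables supplied by a Secret.

```python
import json


def build_migration_job(image, namespace, db_secret, revision="head"):
    """Return a batch/v1 Job manifest (as a dict) that runs one Alembic upgrade."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "alembic-migrate", "namespace": namespace},
        "spec": {
            # Do not retry automatically; a failed migration needs a human look.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "migrate",
                            "image": image,
                            "command": ["alembic", "upgrade", revision],
                            # Assumption: DB credentials live in this Secret.
                            "envFrom": [{"secretRef": {"name": db_secret}}],
                        }
                    ],
                }
            },
        },
    }


def render_manifest(manifest):
    """Serialize the manifest as JSON, which `kubectl apply -f -` accepts."""
    return json.dumps(manifest, indent=2)
```

For example, `render_manifest(build_migration_job("registry.example/backend:1.2.3", "my-namespace", "backend-db"))` produces JSON that can be piped to `kubectl apply -f -`. Because a Job's pod template cannot be changed after creation, a per-release name suffix (or deleting the previous Job first) is probably needed; that detail is left open here.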

---

## Network / Firewall Requirements

Document:

- How Spin pods reach the database (service name, host, port)
- Required firewall rules or network policies
- Whether a dedicated migration Job requires separate service account or network policy
- Any changes required to expose or allow DB connectivity
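
When documenting the items above, a small probe run from inside a Spin pod can help confirm that the firewall and service configuration actually allow a TCP connection to the database. The sketch below is one way to do that with the standard library only; the host and port are whatever the deployment uses, and nothing here is specific to NERSC.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A successful probe shows only that the network path is open; it says nothing about credentials or database permissions, which should be verified separately.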

---

## Acceptance Criteria

- Selected migration strategy documented
- Deployment flow clearly defined:

  build → migrate → deploy

- Explicit scaling constraints documented (if any)
- Firewall / service configuration documented
- Kubernetes manifests updated if required
- Startup-time migration removed or gated appropriately
- Rollback strategy documented
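
If startup-time migration is gated rather than removed, one possible shape is an opt-in environment flag that defaults to off. The variable name `RUN_MIGRATIONS_ON_STARTUP` and both function names are hypothetical; the backend's actual startup hook is not shown in this issue.

```python
import os

# Values treated as "enabled"; anything else, including unset, means disabled.
TRUTHY = {"1", "true", "yes"}


def startup_migrations_enabled(env=None):
    """Return True only when the opt-in flag is explicitly set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("RUN_MIGRATIONS_ON_STARTUP", "").strip().lower() in TRUTHY


def on_startup(run_migrations, env=None):
    """Call `run_migrations` if the flag is set; return whether it ran."""
    if startup_migrations_enabled(env):
        run_migrations()
        return True
    return False
```

With this gate, production Deployments leave the flag unset and rely on the separate migrate step, while local development can set it to `1`. For the rollback criterion, `alembic downgrade -1` steps back one revision, assuming each migration script defines a working `downgrade()`.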

---

## Deliverables

- Code changes (if required)
- Kubernetes manifest updates (Deployment / Job / NetworkPolicy)
- README / ops documentation update
- Clear summary of chosen strategy and rationale

[DevOps]: Production-Safe Alembic Migration Strategy (NERSC Spin) #117
