Merged
2 changes: 2 additions & 0 deletions backend/Dockerfile
@@ -56,6 +56,8 @@ ENV PYTHONUNBUFFERED=1 \
PYTHONPATH=/app \
PATH="/app/.venv/bin:$PATH" \
UV_SYSTEM_PYTHON=1 \
UV_CACHE_DIR=/tmp/.cache/uv \
HOME=/tmp \
ENV=${ENV}

WORKDIR /app
2 changes: 1 addition & 1 deletion backend/app/scripts/README.md
@@ -23,7 +23,7 @@ scripts/

### Domains

- **db/** — Database seeding and rollback utilities
- **db/** — Database migration, seeding, and rollback utilities
- **users/** — Administrative and service account management

---
14 changes: 2 additions & 12 deletions backend/entrypoint.sh
@@ -30,24 +30,14 @@ until pg_isready -d "${PG_URL}" -q; do
done
echo "✅ Database is ready"

# -----------------------------------------------------------
# Run Alembic migrations
# -----------------------------------------------------------
echo "🔄 Running Alembic migrations..."
if ! uv run alembic upgrade head; then
    echo "❌ Alembic migrations failed"
    exit 1
fi
echo "✅ Alembic migrations complete"

# -----------------------------------------------------------
# Start application
# -----------------------------------------------------------
if [ "$ENV" = "production" ]; then
echo "🚀 Starting SimBoard backend (production mode)..."
# In production, HTTPS is expected to be handled by a reverse proxy (e.g., Traefik).
# Uvicorn is started without SSL options here; do not enable HTTPS at the app layer in production.
exec uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
exec uvicorn app.main:app --host 0.0.0.0 --port 8000
else
echo "⚙️ Starting SimBoard backend (development mode with HTTPS + autoreload)..."

@@ -58,7 +48,7 @@ else
exit 1
fi

exec uv run uvicorn app.main:app \
exec uvicorn app.main:app \
--host 0.0.0.0 \
--port 8000 \
--ssl-keyfile "${SSL_KEYFILE}" \
12 changes: 8 additions & 4 deletions docs/README.md
@@ -8,10 +8,12 @@ Documentation for the SimBoard project.

```bash
docs/
├── README.md           # This file
└── cicd/               # CI/CD and deployment
    ├── README.md       # Quick start and overview
    └── DEPLOYMENT.md   # Complete reference guide
├── README.md           # This file
├── cicd/               # CI/CD and deployment
│   ├── README.md       # Quick start and overview
│   └── DEPLOYMENT.md   # Complete reference guide
└── deploy/             # Environment-specific deployment runbooks
    └── spin.md         # Spin backend migration rollout + frontend/db/ingress config
```

---
@@ -25,6 +27,7 @@ docs/
**Need deployment details?**

- [cicd/DEPLOYMENT.md](cicd/DEPLOYMENT.md) - Complete reference
- [deploy/spin.md](deploy/spin.md) - Spin backend/frontend/db/ingress workload runbook

---

@@ -34,6 +37,7 @@ All CI/CD and deployment documentation is in the [`cicd/`](cicd/) directory:

- **[cicd/README.md](cicd/README.md)** - Quick start, overview, and common operations
- **[cicd/DEPLOYMENT.md](cicd/DEPLOYMENT.md)** - Complete deployment guide with workflows, Kubernetes examples, and troubleshooting
- **[deploy/spin.md](deploy/spin.md)** - Spin-specific backend migration-first plus frontend/db/ingress runbook

---

89 changes: 78 additions & 11 deletions docs/cicd/DEPLOYMENT.md
@@ -11,6 +11,7 @@ Complete reference for CI/CD pipelines and NERSC Spin deployments.
- [Image Tagging Strategy](#image-tagging-strategy)
- [Development Deployment](#development-deployment)
- [Production Release Process](#production-release-process)
- [Database Migrations](#database-migrations)
- [Rollback Procedure](#rollback-procedure)
- [Manual Builds](#manual-builds)
- [Troubleshooting](#troubleshooting)
@@ -244,6 +245,8 @@ Update the image tags in the [Rancher UI](https://rancher2.spin.nersc.gov/dashbo
4. Set **Pull Policy** to `IfNotPresent`
5. Click **Save** — Rancher will roll out the new version

For backend releases, migrations run automatically in a backend initContainer during rollout. See [Database Migrations](#database-migrations).

### Step 5: Verify Production

1. In Rancher, check that pods are **Running** under **Workloads → Pods** in the prod namespace
@@ -254,19 +257,63 @@ Update the image tags in the [Rancher UI](https://rancher2.spin.nersc.gov/dashbo

## Database Migrations

Alembic database migrations run **automatically** when the backend container starts. No manual migration step is required during deployment.
Database migrations are executed by a backend Deployment initContainer during rollout, not on backend app startup.

### Runtime Behavior

- The backend container starts the API directly and does not run migrations at startup.
- The initContainer runs before the backend container starts and executes:
  - `test -n "$DATABASE_URL" || { echo "DATABASE_URL is required"; exit 1; }; alembic upgrade head`
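
The initContainer command can be easier to read when unpacked into a small script. The following is only a sketch of that idea: the function names `require_database_url` and `run_migrations` are hypothetical, and the PR itself keeps the command inline.

```shell
#!/bin/sh
# Sketch: expanded form of the initContainer command.
# Function names are illustrative and not part of the PR.

require_database_url() {
  # Fail when DATABASE_URL is unset or empty.
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is required"
    return 1
  fi
}

run_migrations() {
  # Stop before touching the database if the guard fails.
  require_database_url || return 1
  alembic upgrade head
}
```

Calling `run_migrations` as the last line of such a script would give the same outcome as the one-liner: a non-zero exit status when the variable is missing or when Alembic fails, which keeps the pod from starting the backend container.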

### Spin Workloads

### Startup Sequence
Reference runbook:

1. **Database readiness check** — the container waits (up to 30 seconds) for the PostgreSQL server to accept connections using `pg_isready`.
2. **`alembic upgrade head`** — applies any pending migrations. If the database is already up to date, this is a no-op.
3. **Application start** — Uvicorn launches only after migrations succeed.
- [`docs/deploy/spin.md`](../deploy/spin.md)

If either the database readiness check or migration step fails, the container exits immediately and does **not** start the application.
- Backend service/deployment baseline is defined for in-cluster API routing (`backend` on `8000`).
- Backend Deployment uses the image entrypoint directly (no app args required).
- Backend Deployment includes initContainer `migrate` using the same backend image tag to run Alembic before app start.
- Frontend service/deployment baseline is defined for UI routing (`frontend` on `80`).
- Frontend Deployment uses the frontend image default CMD (no explicit args).
- DB service/deployment baseline is defined for in-cluster Postgres (`db`).
- Ingress baseline (`lb`) terminates TLS via `simboard-tls-cert` and routes frontend/backend hosts.
- Backend and migration initContainer env values are sourced via `envFrom` from secret `simboard-backend-env`.
- DB container env values are sourced via `envFrom` from secret `simboard-db`.

### Deployment Order (Required)

1. Roll out the backend deployment with the target image tag.
2. Wait for the initContainer migration step to succeed.
3. Confirm that backend pods become `Running` and `Ready`.

If the initContainer migration fails, backend pods will not become ready, and the rollout should be treated as failed.
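
For operators who have `kubectl` access rather than only the Rancher UI, waiting on the rollout can be scripted. This is a minimal sketch under assumptions: the Deployment is named `backend`, the initContainer is named `migrate`, and the namespace argument is a placeholder. The function name `wait_for_backend_rollout` is hypothetical.

```shell
#!/bin/sh
# Sketch: wait for a backend rollout and surface migration logs on failure.
# Assumes kubectl access to the cluster; Deployment name `backend` and
# initContainer name `migrate` are taken from the runbook description.

wait_for_backend_rollout() {
  namespace="$1"
  timeout="${2:-300s}"

  # Blocks until the new pod is Ready, which requires the migrate
  # initContainer to have exited successfully first.
  if kubectl -n "$namespace" rollout status deployment/backend --timeout="$timeout"; then
    echo "rollout succeeded"
    return 0
  fi

  # On failure or timeout, print the initContainer logs to show the Alembic error.
  # kubectl picks one pod of the Deployment; pass a pod name instead if
  # pods from an older ReplicaSet are still present.
  echo "rollout failed; migrate initContainer logs follow" >&2
  kubectl -n "$namespace" logs deployment/backend -c migrate >&2
  return 1
}
```

For example, `wait_for_backend_rollout <namespace> 600s` would wait up to ten minutes before treating the rollout as failed.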

### Concurrency Note

The current deployment assumes a **single backend replica**. If horizontal scaling is introduced, migration execution should be separated into a one-time init container or deployment job to avoid race conditions.
InitContainers run per pod. If more than one backend pod is created simultaneously, migrations may execute concurrently.

Use an explicit rollout strategy that guarantees only one new pod (and therefore one migration initContainer) is created at a time:

```yaml
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
```

Why this is required: with default `RollingUpdate` settings, Kubernetes may create a surge pod during updates, which can run a second migration initContainer even when the steady-state replica count is `1`.
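
To confirm that the live Deployment actually carries the serialized strategy, the value can be read back with `kubectl`. This is a sketch under assumptions: the Deployment is named `backend`, the namespace is a placeholder, and the function name `check_serialized_rollout` is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the live Deployment uses the serialized rollout settings.
# Assumes kubectl access and a Deployment named `backend`.

check_serialized_rollout() {
  namespace="$1"

  # Read maxSurge from the live Deployment spec.
  max_surge=$(kubectl -n "$namespace" get deployment backend \
    -o jsonpath='{.spec.strategy.rollingUpdate.maxSurge}')

  if [ "$max_surge" = "0" ]; then
    echo "ok: maxSurge=0"
  else
    echo "warning: maxSurge=${max_surge:-unset}; a surge pod may run a second migration"
    return 1
  fi
}
```

Running this check before a backend release gives an early warning if the strategy has drifted back to the defaults.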

If you need `replicas > 1`, use a DB-level migration lock so only one initContainer can run Alembic at a time. For PostgreSQL, hold a session-level advisory lock on one connection for the duration of the migration (for example, `SELECT pg_advisory_lock(<fixed_key>); ... alembic upgrade head ...; SELECT pg_advisory_unlock(<fixed_key>);`). The transaction-scoped variant, `pg_advisory_xact_lock`, is released automatically at the end of the transaction and has no explicit unlock call.

Production-safe recommendation: apply both controls (serialized rollout strategy plus DB-level lock) for defense in depth.

### Rollback Caveat

Rolling back the backend container image does not roll back the database schema automatically. Use backward-compatible migrations (expand/contract pattern), and use a separate, explicit rollback migration only when needed.

## Rollback Procedure

@@ -296,13 +343,17 @@ Alternatively, use the built-in Rancher rollback:

## Manual Builds

For testing or emergency builds:
For testing or emergency builds, you can manually build and push images using Docker Buildx. This is not recommended for regular use, as it bypasses CI checks and versioning conventions.

First, log in to the NERSC registry:

```bash
# Login
docker login registry.nersc.gov
```

### Backend

# Backend
```bash
cd backend
docker buildx build \
--platform=linux/amd64,linux/arm64 \
@@ -311,7 +362,23 @@ docker buildx build \
--push \
.

# Frontend
```

### Frontend (with API URL override)

```bash
# Development
cd frontend
docker buildx build \
--platform=linux/amd64,linux/arm64 \
--build-arg VITE_API_BASE_URL=https://simboard-dev-api.e3sm.org \
-t registry.nersc.gov/e3sm/simboard/frontend:manual \
--push \
.
```

```bash
# Production
cd frontend
docker buildx build \
--platform=linux/amd64,linux/arm64 \