Skip to content

feat(transport): add transport-aware health check endpoints#1201

Open
dunglas wants to merge 5 commits intomainfrom
feat/transport-health
Open

feat(transport): add transport-aware health check endpoints#1201
dunglas wants to merge 5 commits intomainfrom
feat/transport-health

Conversation

@dunglas
Copy link
Copy Markdown
Owner

@dunglas dunglas commented Apr 15, 2026

Expose readiness and liveness probes via the Caddy admin API so Kubernetes (and other orchestrators) can detect when a hub's transport connection is actually healthy, not just that the process is running.

  • New TransportHealthChecker optional interface (Ready / Live) that transports can implement to report their state.
  • New admin.api.mercure_health Caddy module exposing /mercure/health/{ready,live} aggregate endpoints and /mercure/health/{name}/{ready,live} per-hub endpoints on the admin API (port 2019).
  • Caddyfile name directive for identifying hubs in per-hub endpoints and future metrics labels (defaults to "default").
  • Helm chart: transport-aware probes enabled by default, with fallback to the legacy /healthz on HTTP port when healthCheck.enabled=false. Admin port always exposed; metrics.port kept for backward compat.
  • /healthz on the HTTP port is deprecated (only checks that Caddy is running, not transport connectivity).

Transports that do not implement the interface (Bolt, Local) are considered always-healthy; the Enterprise transports (Redis, Postgres, Kafka, Pulsar) implement it in the Mercure Enterprise repository.

Expose readiness and liveness probes via the Caddy admin API so
Kubernetes (and other orchestrators) can detect when a hub's transport
connection is actually healthy, not just that the process is running.

- New TransportHealthChecker optional interface (Ready / Live) that
  transports can implement to report their state.
- New admin.api.mercure_health Caddy module exposing
  /mercure/health/{ready,live} aggregate endpoints and
  /mercure/health/{name}/{ready,live} per-hub endpoints on the admin
  API (port 2019).
- Caddyfile name directive for identifying hubs in per-hub endpoints
  and future metrics labels (defaults to "default").
- Helm chart: transport-aware probes enabled by default, with fallback
  to the legacy /healthz on HTTP port when healthCheck.enabled=false.
  Admin port always exposed; metrics.port kept for backward compat.
- /healthz on the HTTP port is deprecated (only checks that Caddy is
  running, not transport connectivity).

Transports that do not implement the interface (Bolt, Local) are
considered always-healthy; the Enterprise transports (Redis, Postgres,
Kafka, Pulsar) implement it in the Mercure Enterprise repository.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 15, 2026 16:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds transport-aware readiness/liveness health checks exposed via the Caddy admin API so orchestrators can distinguish “process is up” from “transport is healthy”, and updates the Helm chart/docs to use these endpoints (with /healthz marked deprecated).

Changes:

  • Introduces TransportHealthChecker (Ready/Live) and a new Caddy admin module admin.api.mercure_health exposing /mercure/health/* endpoints.
  • Adds a name directive for Mercure hubs to support per-hub health endpoints.
  • Updates Helm chart probes to use the admin health endpoints by default and updates docs/Caddyfiles to deprecate /healthz.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
transport.go Adds optional transport health-check interface (Ready/Live).
caddy/health.go Implements admin API health endpoints aggregating per-hub/per-transport status.
caddy/health_test.go Adds coverage for health endpoints (OK/404/405/unhealthy).
caddy/mercure.go Stores hub metadata (name/transport) for health aggregation and parses name directive.
charts/mercure/values.yaml Adds adminPort and new healthCheck values; deprecates metrics.port usage.
charts/mercure/templates/deployment.yaml Adds admin container port and switches probes to /mercure/health/* when enabled.
charts/mercure/templates/service.yaml Routes metrics Service to the admin container port.
docs/hub/config.md Documents new admin API health endpoints and deprecates /healthz.
Caddyfile Marks /healthz as deprecated in comments.
dev.Caddyfile Marks /healthz as deprecated in comments.
examples/chat/chart/mercure-example-chat/README.md Formatting update to values table (helm-docs output).
.github/workflows/lint.yml Workflow config tweak (per diff).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/hub/config.md
Comment thread charts/mercure/values.yaml Outdated
Comment thread caddy/health.go
Comment thread caddy/health.go Outdated
Comment thread charts/mercure/templates/deployment.yaml Outdated
Comment thread charts/mercure/templates/deployment.yaml
Comment thread charts/mercure/templates/service.yaml Outdated
dunglas and others added 2 commits April 15, 2026 18:39
The Caddy admin API binds to localhost:2019 by default for security,
so httpGet probes from the kubelet fail with connection refused (the
kubelet connects to the pod IP, not localhost inside the container).

Switch to exec probes that run curl from inside the container, which
matches the pattern already used by the preStop hook.

Also update the documentation to reflect this and note the alternative
(binding admin to 0.0.0.0, with the security caveat).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Return 404 when a per-hub health endpoint is queried for a hub name
  that doesn't exist, instead of silently responding 200 OK (which
  hid typos and misconfigurations).
- Don't expose internal transport error details in the HTTP response
  body; log them server-side and return a generic error message to
  avoid leaking connection details if admin is exposed beyond localhost.
- Chart: use adminPort (with metrics.port fallback) for preStop URL
  and metrics Service port so everything stays aligned when adminPort
  is customized.
- values.yaml: fix helm-docs annotation marker on adminPort.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

dunglas commented Apr 15, 2026

A few of the Copilot review threads on this PR look already addressed in the current head (e76fceb) but are still marked unresolved — flagging them here so they can be closed out:

  1. docs/hub/config.md#L107 — "admin API binds to localhost" note. Addressed: the diff already adds the "These endpoints are exposed on the admin API port, not the main HTTP port. The Caddy admin API binds to localhost:2019 by default for security…" paragraph plus the admin 0.0.0.0:2019 alternative.
  2. caddy/health.go#L126 — 404 for unknown hub names. Addressed: checkTransports now tracks matched and returns errHubNotFound, which handleHealth translates to http.StatusNotFound. Also covered by TestHealthEndpointUnknownHub.
  3. charts/mercure/templates/deployment.yaml#L113 — preStop hook hitting hardcoded :2019. Addressed: preStop now posts to http://localhost:{{ .Values.adminPort | default .Values.metrics.port }}/stop.

Separately, a likely root cause for the lint-test (ct install) failure: with healthCheck.enabled: true as the default, the probes use exec with curl, but caddy:2-alpine doesn't ship curl (only ca-certificates, libcap, mailcap). BusyBox wget is available though, consistent with the docker-compose example you already put in docs/hub/config.md. I've pushed a suggested fix to claude/fix-pr-1201-curl-to-wget — switches both probes and the preStop hook to wget --spider, and regenerates charts/mercure/README.md so the new adminPort / healthCheck.* values show up in the values table. Feel free to cherry-pick or ignore.


Generated by Claude Code

claude and others added 2 commits April 16, 2026 11:22
…pine

The caddy:2-alpine base image ships ca-certificates, libcap and mailcap
but does not include curl. With healthCheck.enabled=true (the new default)
the curl-based exec probes would always fail readiness and liveness,
blocking ct install and any real deployment. The preStop hook had the
same latent issue but failed silently.

Switch both probes and the preStop hook to BusyBox wget (which is
included in alpine), matching the wget-based Docker healthcheck already
recommended in docs/hub/config.md.

Also regenerate charts/mercure/README.md so the values table documents
the new adminPort and healthCheck.* fields.
The caddy:2-alpine base image ships wget, not curl. Update the
Kubernetes and Docker Compose examples in the docs accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants