-
Notifications
You must be signed in to change notification settings - Fork 180
Fix orchestrator and template manager health checks #1392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- use provider=nomad explicitly - grpc is only supported by consul-integrated nomad clusters, use http instead - add tcp health check for orchestrator proxy port - use network block to define orchestrator ports (similar to template manager)
| network { | ||
| mode = "host" | ||
|
|
||
| port "api" { | ||
| // todo: remove this once all API and client-proxy jobs | ||
| // can pull the port number from nomad. | ||
| static = "${port}" | ||
| } | ||
|
|
||
| port "proxy" { | ||
| // todo: remove this once all API and client-proxy jobs | ||
| // can pull the port number from nomad. | ||
| static = "${proxy_port}" | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This format is exactly the same as the one for template-manager, and allows us to refer to ports by name instead of number
| port = "${port}" | ||
| name = "orchestrator" | ||
| port = "api" | ||
| provider = "nomad" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enables the health checks to work. Without it, we default to consul, which doesn't function.
| provider = "nomad" | ||
|
|
||
| check { | ||
| type = "grpc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GRPC only works when provider=consul, but we expose an http health endpoint as well.
| grpc_use_tls = false | ||
| port = "${port}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need this, as we're using HTTP (not GRPC) and it inherits the port of the service block that surrounds the check.
| check { | ||
| type = "tcp" | ||
| name = "health" | ||
| interval = "30s" | ||
| timeout = "1s" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TCP to ensure that the port is accepting connections
| GRPC_PORT = "$${NOMAD_PORT_api}" | ||
| PROXY_PORT = "$${NOMAD_PORT_proxy}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refers to the ports defined in the network block on line 8.
Before:

After:

Note
Replace gRPC health checks with HTTP, add Nomad-labeled ports/provider, use NOMAD_PORT envs, and add TCP check for orchestrator proxy.
iac/provider-gcp/nomad/jobs/orchestrator.hcl):networkblock withmode = "host"and labeled portsapiandproxy(static).api,proxy) andprovider = "nomad".orchestratorfromgrpcto HTTPGET /health; add TCP check fororchestrator-proxy.GRPC_PORT=$${NOMAD_PORT_api},PROXY_PORT=$${NOMAD_PORT_proxy}.iac/provider-gcp/nomad/jobs/template-manager.hcl):apiand reference it in the service; setprovider = "nomad".grpcto HTTPGET /health.GRPC_PORT=$${NOMAD_PORT_api}.Written by Cursor Bugbot for commit 13bf326. This will update automatically on new commits. Configure here.