Nomad: recommendations for singleton deployments #1473
Conversation
Many users have a requirement to run exactly one instance of a given allocation because it requires exclusive access to some cluster-wide resource; we'll refer to this here as a "singleton allocation". This is challenging to implement, so this document describes an accepted design to publish as a how-to/tutorial.
aimeeu left a comment:
Thanks for the great technical content! I left some style guide and presentation suggestions.
| workload needs exclusive access to a remote resource like a data store. Nomad | ||
| does not support singleton deployments as a built-in feature. Your workloads | ||
| continue to run even when the Nomad client agent has crashed, so ensuring | ||
| there's at most one allocation for a given workload some cooperation from the |
| there's at most one allocation for a given workload some cooperation from the | |
| there's at most one allocation for a given workload requires some cooperation from the |
missing a verb between "workload" and "some" so guessing "requires"??
|
| ## Design Goals | ||
|
| The configuration described here meets two primary design goals: |
| The configuration described here meets two primary design goals: | |
| The configuration described here meets these primary design goals: |
you have more than 2 bullet points...
| * The design will prevent a specific process with a task from running if there | ||
| is another instance of that task running anywhere else on the Nomad cluster. | ||
| * Nomad should be able to recover from failure of the task or the node on which | ||
| the task is running with minimal downtime, where "recovery" means that the | ||
| original task should be stopped and that Nomad should schedule a replacement | ||
| task. | ||
| * Nomad should minimize false positive detection of failures to avoid | ||
| unnecessary downtime during the cutover. |
| * The design will prevent a specific process with a task from running if there | |
| is another instance of that task running anywhere else on the Nomad cluster. | |
| * Nomad should be able to recover from failure of the task or the node on which | |
| the task is running with minimal downtime, where "recovery" means that the | |
| original task should be stopped and that Nomad should schedule a replacement | |
| task. | |
| * Nomad should minimize false positive detection of failures to avoid | |
| unnecessary downtime during the cutover. | |
| - The design prevents a specific process with a task from running if there | |
| is another instance of that task running anywhere else on the Nomad cluster. | |
| - Nomad should be able to recover from failure of the task or the node on which | |
| the task is running with minimal downtime, where "recovery" means that Nomad should stop the | |
| original task and schedule a replacement | |
| task. | |
| - Nomad should minimize false positive detection of failures to avoid | |
| unnecessary downtime during the cutover. |
style nits: - instead of * for unordered lists; present tense, active voice
| faster you make Nomad attempt to recover from failure, the more likely that a | ||
| transient failure causes a replacement to be scheduled and a subsequent |
| faster you make Nomad attempt to recover from failure, the more likely that a | |
| transient failure causes a replacement to be scheduled and a subsequent | |
| faster you make Nomad attempt to recover from failure, the more likely that a | |
| transient failure causes Nomad to schedule a replacement and a subsequent |
active voice nit
| allocation in a distributed system. This design will err on the side of | ||
| correctness: having 0 or 1 allocations running rather than the incorrect 1 or 2 | ||
| allocations running. |
| allocation in a distributed system. This design will err on the side of | |
| correctness: having 0 or 1 allocations running rather than the incorrect 1 or 2 | |
| allocations running. | |
| allocation in a distributed system. This design errs on the side of | |
| correctness: having zero or one allocations running rather than the incorrect one or two | |
| allocations running. |
| ```hcl | ||
| job "example" { | ||
| group "group" { | ||
|
| disconnect { | ||
| stop_on_client_after = "1m" | ||
| } | ||
|
| task "lock" { | ||
| leader = true | ||
| config { | ||
| driver = "raw_exec" | ||
| command = "/opt/lock-script.sh" | ||
| pid_mode = "host" | ||
| } | ||
|
| identity { | ||
| env = true # make NOMAD_TOKEN available to lock command | ||
| } | ||
| } | ||
|
| task "application" { | ||
| lifecycle { | ||
| hook = "poststart" | ||
| sidecar = true | ||
| } | ||
|
| config { | ||
| driver = "docker" | ||
| image = "example/app:1" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` |
| ```hcl | |
| job "example" { | |
| group "group" { | |
| disconnect { | |
| stop_on_client_after = "1m" | |
| } | |
| task "lock" { | |
| leader = true | |
| config { | |
| driver = "raw_exec" | |
| command = "/opt/lock-script.sh" | |
| pid_mode = "host" | |
| } | |
| identity { | |
| env = true # make NOMAD_TOKEN available to lock command | |
| } | |
| } | |
| task "application" { | |
| lifecycle { | |
| hook = "poststart" | |
| sidecar = true | |
| } | |
| config { | |
| driver = "docker" | |
| image = "example/app:1" | |
| } | |
| } | |
| } | |
| } | |
| ``` | |
| <CodeBlockConfig lineNumbers highlight="9"> | |
| ```hcl | |
| job "example" { | |
| group "group" { | |
| disconnect { | |
| stop_on_client_after = "1m" | |
| } | |
| task "lock" { | |
| leader = true | |
| config { | |
| driver = "raw_exec" | |
| command = "/opt/lock-script.sh" | |
| pid_mode = "host" | |
| } | |
| identity { | |
| env = true # make NOMAD_TOKEN available to lock command | |
| } | |
| } | |
| task "application" { | |
| lifecycle { | |
| hook = "poststart" | |
| sidecar = true | |
| } | |
| config { | |
| driver = "docker" | |
| image = "example/app:1" | |
| } | |
| } | |
| } | |
| } | |
| ``` | |
| </CodeBlockConfig> |
add code highlight
| The easiest way to implement the locking logic is to use `nomad var lock` as a | ||
| shim in your task. The jobspec below assumes there's a Nomad binary in the | ||
| container image. |
| The easiest way to implement the locking logic is to use `nomad var lock` as a | |
| shim in your task. The jobspec below assumes there's a Nomad binary in the | |
| container image. | |
| We recommend implementing the locking logic with `nomad var lock` as a shim in | |
| your task. This example jobspec assumes there's a Nomad binary in the container | |
| image. |
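For reference, here is a minimal sketch of the shim approach described above, assuming the container image bundles a `nomad` binary. The variable path `nomad/jobs/example/lock` and the entrypoint `/bin/app` are illustrative assumptions, not taken from the diff:

```hcl
# Sketch only: wrap the application entrypoint in `nomad var lock` so the app
# runs only while the lock on the variable path is held. The path and the
# entrypoint are assumptions for illustration.
job "example" {
  group "group" {
    task "application" {
      driver = "docker"

      config {
        image   = "example/app:1"
        command = "nomad"
        args    = ["var", "lock", "nomad/jobs/example/lock", "/bin/app"]
      }

      identity {
        env = true # expose NOMAD_TOKEN so the lock call can authenticate
      }
    }
  }
}
```

With this layout, `nomad var lock` acquires the lock, runs the child command while maintaining the lock, and releases it when the command exits.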
| You set this policy on the job with `nomad acl policy apply -namespace prod -job | ||
| example example-lock ./policy.hcl`. | ||
|
| ### Using `nomad var lock` |
| ### Using `nomad var lock` | |
| ## Implementation | |
| ### Use `nomad var lock` |
Add an H2 so we have Overview and Implementation in the right-hand page TOC
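As an aside, the `./policy.hcl` referenced above is not shown in the diff. A hedged sketch of what it might contain, assuming the lock is stored at the variable path `nomad/jobs/example/lock` (the path and capability list are assumptions):

```hcl
# Sketch of an ACL policy granting the job's workload identity access to the
# Nomad variable used as the lock. Attach it to the job with:
#   nomad acl policy apply -namespace prod -job example example-lock ./policy.hcl
namespace "prod" {
  variables {
    path "nomad/jobs/example/lock" {
      capabilities = ["read", "write", "list"]
    }
  }
}
```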
| ### Sidecar Lock | ||
|
| If cannot implement the lock logic in your application or with a shim such as | ||
| `nomad var lock`, you'rll need to implement it such that the task you are | ||
| locking is running as a sidecar of the locking task, which has | ||
| [`task.leader=true`][] set. |
| ### Sidecar Lock | |
| If cannot implement the lock logic in your application or with a shim such as | |
| `nomad var lock`, you'rll need to implement it such that the task you are | |
| locking is running as a sidecar of the locking task, which has | |
| [`task.leader=true`][] set. | |
| ### Sidecar lock | |
| If you cannot implement the lock logic in your application or with a shim such | |
| as `nomad var lock`, you need to implement it such that the task you are locking | |
| is running as a sidecar of the locking task, which has [`task.leader=true`][] | |
| set. |
| * The locking task must be in the same group as the task being locked. | ||
| * The locking task must be able to terminate the task being locked without the | ||
| Nomad client being up (i.e. they share the same PID namespace, or the locking | ||
| task is privileged). | ||
| * The locking task must have a way of signalling the task being locked that it | ||
| is safe to start. For example, the locking task can write a sentinel file into | ||
| the /alloc directory, which the locked task tries to read on startup and | ||
| blocks until it exists. | ||
|
| If the third requirement cannot be met, then you’ll need to split the lock | ||
| acquisition and lock heartbeat into separate tasks: | ||
|
| ```hcl | ||
| job "example" { | ||
| group "group" { | ||
|
| disconnect { | ||
| stop_on_client_after = "1m" | ||
| } | ||
|
| task "acquire" { | ||
| lifecycle { | ||
| hook = "prestart" | ||
| sidecar = false | ||
| } | ||
| config { | ||
| driver = "raw_exec" | ||
| command = "/opt/lock-acquire-script.sh" | ||
| } | ||
| identity { | ||
| env = true # make NOMAD_TOKEN available to lock command | ||
| } | ||
| } | ||
|
| task "heartbeat" { | ||
| leader = true | ||
| config { | ||
| driver = "raw_exec" | ||
| command = "/opt/lock-heartbeat-script.sh" | ||
| pid_mode = "host" | ||
| } | ||
| identity { | ||
| env = true # make NOMAD_TOKEN available to lock command | ||
| } | ||
| } | ||
|
| task "application" { | ||
| lifecycle { | ||
| hook = "poststart" | ||
| sidecar = true | ||
| } | ||
|
| config { | ||
| driver = "docker" | ||
| image = "example/app:1" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
| If the primary task is configured to [`restart`][], the task should be able to | ||
| restart within the lock TTL in order to minimize flapping on restart. This | ||
| improves availability but isn't required for correctness. |
| * The locking task must be in the same group as the task being locked. | |
| * The locking task must be able to terminate the task being locked without the | |
| Nomad client being up (i.e. they share the same PID namespace, or the locking | |
| task is privileged). | |
| * The locking task must have a way of signalling the task being locked that it | |
| is safe to start. For example, the locking task can write a sentinel file into | |
| the /alloc directory, which the locked task tries to read on startup and | |
| blocks until it exists. | |
| If the third requirement cannot be met, then you’ll need to split the lock | |
| acquisition and lock heartbeat into separate tasks: | |
| ```hcl | |
| job "example" { | |
| group "group" { | |
| disconnect { | |
| stop_on_client_after = "1m" | |
| } | |
| task "acquire" { | |
| lifecycle { | |
| hook = "prestart" | |
| sidecar = false | |
| } | |
| config { | |
| driver = "raw_exec" | |
| command = "/opt/lock-acquire-script.sh" | |
| } | |
| identity { | |
| env = true # make NOMAD_TOKEN available to lock command | |
| } | |
| } | |
| task "heartbeat" { | |
| leader = true | |
| config { | |
| driver = "raw_exec" | |
| command = "/opt/lock-heartbeat-script.sh" | |
| pid_mode = "host" | |
| } | |
| identity { | |
| env = true # make NOMAD_TOKEN available to lock command | |
| } | |
| } | |
| task "application" { | |
| lifecycle { | |
| hook = "poststart" | |
| sidecar = true | |
| } | |
| config { | |
| driver = "docker" | |
| image = "example/app:1" | |
| } | |
| } | |
| } | |
| } | |
| ``` | |
| If the primary task is configured to [`restart`][], the task should be able to | |
| restart within the lock TTL in order to minimize flapping on restart. This | |
| improves availability but isn't required for correctness. | |
| - Must be in the same group as the task being locked. | |
| - Must be able to terminate the task being locked without the Nomad client being | |
| up. For example, they share the same PID namespace, or the locking task is | |
| privileged. | |
| - Must have a way of signalling the task being locked that it is safe to start. | |
| For example, the locking task can write a sentinel file into the `/alloc` | |
| directory, which the locked task tries to read on startup and blocks until it | |
| exists. | |
| If you cannot meet the third requirement, then you need to split the lock | |
| acquisition and lock heartbeat into separate tasks. | |
| <CodeBlockConfig lineNumbers highlight="8-20,22-32"> | |
| ```hcl | |
| job "example" { | |
| group "group" { | |
| disconnect { | |
| stop_on_client_after = "1m" | |
| } | |
| task "acquire" { | |
| lifecycle { | |
| hook = "prestart" | |
| sidecar = false | |
| } | |
| config { | |
| driver = "raw_exec" | |
| command = "/opt/lock-acquire-script.sh" | |
| } | |
| identity { | |
| env = true # make NOMAD_TOKEN available to lock command | |
| } | |
| } | |
| task "heartbeat" { | |
| leader = true | |
| config { | |
| driver = "raw_exec" | |
| command = "/opt/lock-heartbeat-script.sh" | |
| pid_mode = "host" | |
| } | |
| identity { | |
| env = true # make NOMAD_TOKEN available to lock command | |
| } | |
| } | |
| task "application" { | |
| lifecycle { | |
| hook = "poststart" | |
| sidecar = true | |
| } | |
| config { | |
| driver = "docker" | |
| image = "example/app:1" | |
| } | |
| } | |
| } | |
| } | |
| ``` | |
| </CodeBlockConfig> | |
| If you configured the primary task to [`restart`][], the task should be able to | |
| restart within the lock TTL in order to minimize flapping on restart. This | |
| improves availability but isn't required for correctness. |
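To make the restart-within-TTL point concrete, here is a hedged sketch of a group-level `restart` block; the specific values are assumptions chosen so that local restarts complete well inside a typical lock TTL, not recommendations from the diff:

```hcl
# Sketch only: keep the restart delay (and the total time needed to restart)
# much shorter than the lock TTL so a locally restarted task resumes before
# the lock expires and a replacement is scheduled elsewhere.
group "group" {
  restart {
    attempts = 2
    interval = "5m"
    delay    = "5s"  # well inside the lock TTL
    mode     = "fail"
  }
}
```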
Links
Jira: https://hashicorp.atlassian.net/browse/NMD-1039
Deploy previews: https://unified-docs-frontend-preview-qx8ebllbe-hashicorp.vercel.app/nomad/docs/job-declare/strategy/singleton