-
Couldn't load subscription status.
- Fork 53
Feature: --ebs-volumes option for ephemeral-storage init
#395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Feature: --ebs-volumes option for ephemeral-storage init
#395
Conversation
d9e1cf2 to
d14ca7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a patch that populates /dev/disk/ephemeral-ebs with symlinks for non-system EBS volumes in the xvdda - xvddx range.
In combination with my other suggestions, this would let you provide user-data like this to automatically configure up to 24 EBS volumes in a RAID-0 array, if present, and fall back to instance store otherwise:
[settings.bootstrap-commands.k8s-ephemeral-storage]
commands = [
["apiclient", "ephemeral-storage", "init", "--prefer", "ebs-volumes"],
["apiclient", "ephemeral-storage" ,"bind", "--dirs", "/var/lib/containerd", "/var/lib/kubelet", "/var/log/pods"]
]
essential = true
mode = "always"
| for ebs_volume in ebs_volumes.iter().flatten() { | ||
| let link_path = format!("/dev/disk/by-ebs-id/{ebs_volume}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an ephemeral_ebs_volumes function, like ephemeral_devices, that returns available EBS volumes.
| if known_disks.is_empty() && ebs_volumes.is_none() { | ||
| info!("no ephemeral disks found, skipping ephemeral storage initialization"); | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of design questions here that are worth considering, in the context of the patch that I shared that designates the xvdda - xvddx range for "ephemeral" EBS volumes.
Should we allow both EBS and instance store volumes in the same RAID-0 array? I would default to "no", and reject the API request if we're given both. We can change our minds later, but it seems weird to mix devices like this in the same array.
If both lists are empty, and both instance store disks and ephemeral EBS volumes are present, and we can't mix the two in the same array - then which devices do we include?
Some options:
- Add another API flag, like
--prefer-ebs-volumes, which controls the behavior when presented with two empty lists.- If it's true:
- If ephemeral EBS volumes are present: we'll use them in the array.
- If no ephemeral EBS volumes are present: we'll use ephemeral disks, if any.
- If it's false:
- If ephemeral disks are present: we'll use them in the array.
- If no ephemeral disks are present: we'll use ephemeral EBS volumes, if any.
- If it's true:
- Punt on the problem and don't support implicit array creation for EBS volumes.
--prefer-ebs-volumes could also solve the first issue - if we're given two lists and told which one to prefer, we can ignore the other one rather than return an error.
(For the actual implemention, I'd favor using an enum and having --prefer ephemeral-disks and --prefer ebs-volumes rather than a boolean. That will be easier to extend if we end up with another storage category. You can put it in models/src/ephemeral_storage.rs so that the client and server can share the same type definition.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the enum approach, we could also just have all four options as variants:
--prefer [ disks-only
| ebs-volumes-only
| disks-to-ebs-volumes
| ebs-volumes-to-disks ]
Then the default can be disks-only for backwards compatibility.
| let disk_name = disks.first().expect("non-empty").clone(); | ||
| let link_name = format!("{RAID_DEVICE_DIR}{RAID_DEVICE_NAME}"); | ||
| std::os::unix::fs::symlink(disk_name, &link_name) | ||
| .context(error::DiskSymlinkFailureSnafu)?; | ||
| link_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you're fixing a pre-existing bug - we effectively lost track of which disk we used during the "init" operation. That didn't matter in the case where we use all ephemeral disks, but could have mattered on an instance that had three (nvme1, nvme2, nvme3) where the "init" request specified only one (nvme2) so we didn't end up creating an array.
The symlink approach is a nice, cheap way to preserve the state we need, but it can cause trouble to create links in "owned" directories under /dev, like /dev/md (which mdadm "owns").
Instead I'd suggest using pathdiff::diff_paths and creating a relative symlink from /dev/disk/ephemeral-storage to either the single disk or EBS volume, or to the RAID-0 array.
|
Thank you kindly for the comments and suggestions. I had some doubts about the handling of system ebs volumes and some edge-cases involving mixed instance storage/ebs volume arrays, which are solved by the |
| --prefer PREFERENCE [PREFERENCE ..] | ||
| Epheremeral storage type preference, in descending order. Allowed | ||
| values: `ephemeral-disk`, `ebs-volume` or a combination of these | ||
| joined by `+`. Defaults to `ephemeral-disk` only. This option is | ||
| ignored if `--disks` or `--ebs-volumes` is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the enum solution I went with an ordered list of "preferences", which are sets of allowed storage types. This should be easier to expand in case more storage types are added in the future.
There are effectively five distinct preference options available:
--prefer ephemeral-disk: use only ephemeral instance storage (default)--prefer ebs-volume: use only ebs volumes--prefer ephemeral-disk ebs-volume: use ephemeral instance storage if available, otherwise ebs volumes--prefer ebs-volume ephemeral-disk: same as above except in the opposite order--prefer ephemeral-disk+ebs-volume: use all available storage of either type
The --disks and --ebs-volumes options override --prefer, which allows for a specific custom storage setup to be defined. This is also backwards compatible with the previous behaviour of the --disks option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense; I agree the enum variants would get unwieldy if more permutations of storage were added, while this will be easy to extend.
| match i { | ||
| "ephemeral-disk" => preference.ephemeral_disk = true, | ||
| "ebs-volume" => preference.ebs_volume = true, | ||
| x => return Err(x), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "" be allowed here? It would allow defining an empty set (not very useful) and superfluous +s, e.g. ephemeral-disk+. The main benefit would probably be enabling round-tripping through Display and TryFrom<&str> for the empty set case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The round-trip property is nice, though I don't feel strongly. Up to you!
|
Results from manual testing with the latest commit:
where
All test cases reported reasonable values for ephemeral storage capacity, except for two cases where the bootstrap command failed due to missing ebs volumes. A capacity of 20G corresponds to the default EBS volume added by Karpenter, which is used when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delayed review! LGTM.
|
Let me know if you're ready to mark this as "ready for review" and I can kick off CI tests. The only change I'd suggest is squashing the commits together and splitting them back out into a pair of apiclient and apiserver commits. |
54e6c8f to
8d2dd6a
Compare
8d2dd6a to
2ac3e15
Compare
|
I've refactored the PR to two commits, the apiserver commit also includes the changes to |
Issue number:
Closes #389
Description of changes:
Adds two new options to
ephemeral-storage init:--ebs-volumesspecifies EBS volumes to be included in the ephemeral storage array (e.g.--ebs-volumes /dev/disk/ephemeral-ebs/xvdda /dev/disk/ephemeral-ebs/xvddb--preferspecifies combinations of storage types to be included in the ephemeral storage array in order of preference. Currently supported storage types areephemeral-diskandebs-volume. Storage types can be combined with the+operator. For example,--prefer ephemeral-disk ebs-volumeuses ephemeral disks if available and ebs volumes otherwise. To use a mix of both types, specify--prefer ephemeral-disk+ebs-volume.Testing done:
Manual testing of a few different configurations on EKS (see this comment)
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.