Skip to content

Conversation

@jesseanttila-cai
Copy link

@jesseanttila-cai jesseanttila-cai commented Mar 8, 2025

Issue number:

Closes #389

Description of changes:

Adds two new options to ephemeral-storage init:

  • --ebs-volumes specifies EBS volumes to be included in the ephemeral storage array (e.g. --ebs-volumes /dev/disk/ephemeral-ebs/xvdda /dev/disk/ephemeral-ebs/xvddb

  • --prefer specifies combinations of storage types to be included in the ephemeral storage array in order of preference. Currently supported storage types are ephemeral-disk and ebs-volume. Storage types can be combined with the + operator. For example, --prefer ephemeral-disk ebs-volume uses ephemeral disks if available and ebs volumes otherwise. To use a mix of both types, specify --prefer ephemeral-disk+ebs-volume.

Testing done:

Manual testing of a few different configurations on EKS (see this comment)

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a patch that populates /dev/disk/ephemeral-ebs with symlinks for non-system EBS volumes in the xvdda - xvddx range.

In combination with my other suggestions, this would let you provide user-data like this to automatically configure up to 24 EBS volumes in a RAID-0 array, if present, and fall back to instance store otherwise:

[settings.bootstrap-commands.k8s-ephemeral-storage]
commands = [
    ["apiclient", "ephemeral-storage", "init", "--prefer", "ebs-volumes"],
    ["apiclient", "ephemeral-storage" ,"bind", "--dirs", "/var/lib/containerd", "/var/lib/kubelet", "/var/log/pods"]
]
essential = true
mode = "always"

Comment on lines 70 to 71
for ebs_volume in ebs_volumes.iter().flatten() {
let link_path = format!("/dev/disk/by-ebs-id/{ebs_volume}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an ephemeral_ebs_volumes function, like ephemeral_devices, that returns available EBS volumes.

Comment on lines 62 to 65
if known_disks.is_empty() && ebs_volumes.is_none() {
info!("no ephemeral disks found, skipping ephemeral storage initialization");
return Ok(());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of design questions here that are worth considering, in the context of the patch that I shared that designates the xvdda - xvddx range for "ephemeral" EBS volumes.

Should we allow both EBS and instance store volumes in the same RAID-0 array? I would default to "no", and reject the API request if we're given both. We can change our minds later, but it seems weird to mix devices like this in the same array.

If both lists are empty, and both instance store disks and ephemeral EBS volumes are present, and we can't mix the two in the same array - then which devices do we include?

Some options:

  1. Add another API flag, like --prefer-ebs-volumes, which controls the behavior when presented with two empty lists.
    1. If it's true:
      • If ephemeral EBS volumes are present: we'll use them in the array.
      • If no ephemeral EBS volumes are present: we'll use ephemeral disks, if any.
    2. If it's false:
      • If ephemeral disks are present: we'll use them in the array.
      • If no ephemeral disks are present: we'll use ephemeral EBS volumes, if any.
  2. Punt on the problem and don't support implicit array creation for EBS volumes.

--prefer-ebs-volumes could also solve the first issue - if we're given two lists and told which one to prefer, we can ignore the other one rather than return an error.

(For the actual implemention, I'd favor using an enum and having --prefer ephemeral-disks and --prefer ebs-volumes rather than a boolean. That will be easier to extend if we end up with another storage category. You can put it in models/src/ephemeral_storage.rs so that the client and server can share the same type definition.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the enum approach, we could also just have all four options as variants:

--prefer [ disks-only
         | ebs-volumes-only
         | disks-to-ebs-volumes
         | ebs-volumes-to-disks ]

Then the default can be disks-only for backwards compatibility.

Comment on lines 96 to 100
let disk_name = disks.first().expect("non-empty").clone();
let link_name = format!("{RAID_DEVICE_DIR}{RAID_DEVICE_NAME}");
std::os::unix::fs::symlink(disk_name, &link_name)
.context(error::DiskSymlinkFailureSnafu)?;
link_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you're fixing a pre-existing bug - we effectively lost track of which disk we used during the "init" operation. That didn't matter in the case where we use all ephemeral disks, but could have mattered on an instance that had three (nvme1, nvme2, nvme3) where the "init" request specified only one (nvme2) so we didn't end up creating an array.

The symlink approach is a nice, cheap way to preserve the state we need, but it can cause trouble to create links in "owned" directories under /dev, like /dev/md (which mdadm "owns").

Instead I'd suggest using pathdiff::diff_paths and creating a relative symlink from /dev/disk/ephemeral-storage to either the single disk or EBS volume, or to the RAID-0 array.

@jesseanttila-cai
Copy link
Author

Thank you kindly for the comments and suggestions. I had some doubts about the handling of system ebs volumes and some edge-cases involving mixed instance storage/ebs volume arrays, which are solved by the /dev/disk/ephemeral-ebs links and the --prefer flag suggestion.
Unfortunately I won't be able to (read: shouldn't) implement and test any of these changes until I return from holiday on the 25th, but now that I have the AMI build set-up it shouldn't take four months this time.

Comment on lines +276 to +280
--prefer PREFERENCE [PREFERENCE ..]
Epheremeral storage type preference, in descending order. Allowed
values: `ephemeral-disk`, `ebs-volume` or a combination of these
joined by `+`. Defaults to `ephemeral-disk` only. This option is
ignored if `--disks` or `--ebs-volumes` is set.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the enum solution I went with an ordered list of "preferences", which are sets of allowed storage types. This should be easier to expand in case more storage types are added in the future.

There are effectively five distinct preference options available:

  • --prefer ephemeral-disk: use only ephemeral instance storage (default)
  • --prefer ebs-volume: use only ebs volumes
  • --prefer ephemeral-disk ebs-volume: use ephemeral instance storage if available, otherwise ebs volumes
  • --prefer ebs-volume ephemeral-disk: same as above except in the opposite order
  • --prefer ephemeral-disk+ebs-volume: use all available storage of either type

The --disks and --ebs-volumes options override --prefer, which allows for a specific custom storage setup to be defined. This is also backwards compatible with the previous behaviour of the --disks option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense; I agree the enum variants would get unwieldy if more permutations of storage were added, while this will be easy to extend.

Comment on lines 56 to 61
match i {
"ephemeral-disk" => preference.ephemeral_disk = true,
"ebs-volume" => preference.ebs_volume = true,
x => return Err(x),
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "" be allowed here? It would allow defining an empty set (not very useful) and superfluous +s, e.g. ephemeral-disk+. The main benefit would probably be enabling round-tripping through Display and TryFrom<&str> for the empty set case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The round-trip property is nice, though I don't feel strongly. Up to you!

@jesseanttila-cai
Copy link
Author

Results from manual testing with the latest commit:

instance config A B C D E
m5.large 20G 20G 20G 20G X
m5.large+4x8GB 20G 32G 32G 32G 16G
m5.large+16x4GB 20G 64G 64G 64G 8G
m5d.large 70G 70G 70G 70G X
m5d.large+4x8GB 70G 32G 70G 102G 16G
m5d.large+16x4GB 70G 64G 70G 134G 8G

where

  • A: default (equivalent to --prefer ephemeral-disk)
  • B: --prefer ebs-volume ephemeral-disk
  • C: --prefer ephemeral-disk ebs-volume
  • D: --prefer ephemeral-disk+ebs-volume
  • E: --ebs-volumes /dev/disk/ephemeral-ebs/xvddb /dev/disk/ephemeral-ebs/xvddd

All test cases reported reasonable values for ephemeral storage capacity, except for two cases where the bootstrap command failed due to missing ebs volumes. A capacity of 20G corresponds to the default EBS volume added by Karpenter, which is used when ephemeral-storage init does not find any storage devices of an allowed type.

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delayed review! LGTM.

@bcressey
Copy link
Contributor

Let me know if you're ready to mark this as "ready for review" and I can kick off CI tests. The only change I'd suggest is squashing the commits together and splitting them back out into a pair of apiclient and apiserver commits.

@jesseanttila-cai jesseanttila-cai force-pushed the ebs-ephemeral-storage branch 2 times, most recently from 54e6c8f to 8d2dd6a Compare October 14, 2025 07:01
@jesseanttila-cai jesseanttila-cai marked this pull request as ready for review October 14, 2025 07:31
@jesseanttila-cai
Copy link
Author

I've refactored the PR to two commits, the apiserver commit also includes the changes to models & os.spec. There was also a very minor conflict with commit 3fd2ad1 which has now been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Allow EBS disks in ephemeral storage init

2 participants