
apply: Check the partition's free size of /boot before apply #344

Open
starnight wants to merge 2 commits into eos6.0 from fix-limited-esp

@starnight
Contributor

Systems that started out on a release before EOS 5.0.0 have only a 250 MB ESP. However, the EFI image composed of the kernel and initramfs has grown to almost 100 MB. PAYG systems mount the ESP at /boot. While OSTree deploys a new update, the ESP of a PAYG-provisioned system does not have enough space to hold a third EFI image. This change therefore introduces the check_boot_free_space() function, which checks the free space on the /boot partition early in apply.

Delete one unused OSTree deployment if the /boot partition does not have enough free space. The unused deployment could be the pending or the rollback deployment, but prefer removing the rollback (older) deployment over the pending one. Then there should be enough free space for the new OSTree deployment.
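
The preference order described above can be sketched in plain C. This is only an illustration under stated assumptions: the deployment_role enum and pick_deployment_to_remove() are hypothetical names invented for this sketch, and the patch itself works with OstreeDeployment objects rather than an array of roles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical roles for illustration only; the real code inspects
 * OstreeDeployment objects obtained from the sysroot. */
typedef enum {
  ROLE_BOOTED,
  ROLE_PENDING,
  ROLE_ROLLBACK
} deployment_role;

/* Return the index of the deployment to remove, or -1 if none is unused.
 * A rollback deployment is preferred; a pending one is the fallback.
 * The booted deployment is never chosen. */
int
pick_deployment_to_remove (const deployment_role *roles, size_t n)
{
  int pending = -1;

  assert (roles != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (roles[i] == ROLE_ROLLBACK)
        return (int) i;
      if (roles[i] == ROLE_PENDING && pending < 0)
        pending = (int) i;
    }

  return pending;
}
```

Returning -1 when only the booted deployment exists lets the caller skip the cleanup instead of treating it as an error.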

https://phabricator.endlessm.com/T33799

@starnight starnight requested review from dsd and wjt March 31, 2026 09:40
Comment on lines +239 to +240
/* Require at least 100 MiB free on /boot */
const guint64 boot_min_free_bytes = 100 * 1024 * 1024;
Member

It seems wrong to hardcode this number. And if I understand correctly each UKI is now more than 100 MiB so it's not sufficient to have 100 MiB free on /boot. Can we not calculate how much space will be required precisely?

Contributor Author

It is hard to get the size of the new update's UKI, because that update has not been deployed yet. However, we can get the UKI of the currently booted OSTree deployment, /boot/ostree/eos-xxxx/payg-image-$(uname -r).efi, which is copied from /lib/modules/$(uname -r)/payg-image.efi. I think we can calculate the minimum required free space as the size of /lib/modules/$(uname -r)/payg-image.efi * 1.1. The extra 10% allows for potential growth of the UKI.
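
A rough sketch of that calculation, in plain POSIX C rather than GLib so that it stands alone. The function names with_headroom(), booted_uki_path() and required_boot_free_bytes() are assumptions made for this example and are not taken from the patch; only the path pattern and the 1.1 factor come from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/utsname.h>

/* Add 10% headroom to a size, rounding up (integer form of size * 1.1). */
uint64_t
with_headroom (uint64_t size)
{
  return size + (size + 9) / 10;
}

/* Build /lib/modules/$(uname -r)/payg-image.efi for the running kernel.
 * Returns false if uname() fails or the buffer is too small. */
bool
booted_uki_path (char *buf, size_t len)
{
  struct utsname u;

  assert (buf != NULL);

  if (uname (&u) != 0)
    return false;

  int n = snprintf (buf, len, "/lib/modules/%s/payg-image.efi", u.release);
  return n > 0 && (size_t) n < len;
}

/* Estimate the free space /boot needs: size of the UKI at `path` plus 10%.
 * Returns false if the file cannot be stat:ed. */
bool
required_boot_free_bytes (const char *path, uint64_t *out_bytes)
{
  struct stat st;

  assert (path != NULL && out_bytes != NULL);

  if (stat (path, &st) != 0)
    return false;

  *out_bytes = with_headroom ((uint64_t) st.st_size);
  return true;
}
```

Keeping the headroom arithmetic in integers avoids floating-point rounding on byte counts. A caller would pass the result of booted_uki_path() to required_boot_free_bytes() and fall back to some default if either step fails.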

*/
if (!check_boot_free_space (error))
{
if (!g_error_matches (*error, G_IO_ERROR, G_IO_ERROR_NO_SPACE))
Member

You don't know that this function was not called with error set to NULL. You should pass &local_error to check_boot_free_space, and propagate it to error if needed. However...

Contributor Author

Thanks for the guidance on GLib's error design! :)

Fixed in the new commits.

Comment on lines +357 to +361
return FALSE;
g_clear_error (error);

if (!remove_one_unused_deployment (sysroot, cancellable, error))
return FALSE;
Member

This patch would make the upgrade operation fail in many more cases than before, e.g.:

  1. If /boot cannot be stat:ed for some reason
  2. If removing the rollback deployment fails
  3. etc.

I think that this check should never make the apply operation fail. At most it should log a warning. Otherwise, if there is some bug in the free-space-checking operation, then OS updates will never be applied, and we won't be able to fix the bug. It is better to allow Apply to proceed. (That's even true in the case where we believe that /boot is too full. We should still try to proceed - it's possible we are wrong.)

Contributor Author

Changed in the new commits.

Comment on lines +305 to +314
  deployments = ostree_sysroot_get_deployments (sysroot);
  new_deployments = g_ptr_array_new_with_free_func (g_object_unref);
  for (guint i = 0; i < deployments->len; i++)
    {
      OstreeDeployment *d = deployments->pdata[i];
      if (d != to_remove)
        g_ptr_array_add (new_deployments, g_object_ref (d));
    }

  if (!ostree_sysroot_write_deployments (sysroot, new_deployments, cancellable, error))
Member

I believe this will work:

Suggested change
-  deployments = ostree_sysroot_get_deployments (sysroot);
-  new_deployments = g_ptr_array_new_with_free_func (g_object_unref);
-  for (guint i = 0; i < deployments->len; i++)
-    {
-      OstreeDeployment *d = deployments->pdata[i];
-      if (d != to_remove)
-        g_ptr_array_add (new_deployments, g_object_ref (d));
-    }
-  if (!ostree_sysroot_write_deployments (sysroot, new_deployments, cancellable, error))
+  deployments = ostree_sysroot_get_deployments (sysroot);
+  g_ptr_array_remove (deployments, to_remove);
+  if (!ostree_sysroot_write_deployments (sysroot, deployments, cancellable, error))

Contributor Author

Nice! Thanks!

Comment on lines +233 to +234
static gboolean
check_boot_free_space (GError **error)
Member

I think it is confusing that the return value FALSE from this function can mean either "an actual error occurred" or "boot has insufficient space".

Contributor Author

The new commit passes an enough_space argument to check_boot_free_space() to report that information, so the return value only signals whether the check itself succeeded.

@starnight starnight requested a review from wjt April 1, 2026 11:01
Comment on lines +374 to +376
g_propagate_error (error, g_steal_pointer (&local_error));
g_warning ("%s", local_error->message);
g_clear_error (&local_error);
Member

This is guaranteed to crash because g_steal_pointer (&local_error) assigns NULL to local_error, so local_error->message will dereference a NULL pointer. Please test this code path.

Suggested change
-  g_propagate_error (error, g_steal_pointer (&local_error));
-  g_warning ("%s", local_error->message);
-  g_clear_error (&local_error);
+  g_warning ("Failed to check for free space: %s", local_error->message);
+  g_propagate_error (error, g_steal_pointer (&local_error));

Contributor Author

@starnight starnight Apr 14, 2026

I checked the description of g_steal_pointer():

Sets pp to NULL, returning the value that was there before.

Wow! It does not simply return the pointer. Thanks for the hint!
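
To make the quoted behaviour concrete, here is a plain-C analogue. The steal_pointer() function below is a hypothetical stand-in written for this note, not GLib's implementation; it only mirrors the documented effect of setting the pointer to NULL and returning its previous value.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C analogue of the documented behaviour: set *pp to NULL and
 * return the value that was there before. */
void *
steal_pointer (void **pp)
{
  void *old;

  assert (pp != NULL);

  old = *pp;
  *pp = NULL;
  return old;
}
```

After the call, the original variable is NULL, so anything that needs its fields (such as the error message) has to be read before the pointer is stolen, as in the suggested change.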

Contributor Author

It is hard to test the code path that enters this if branch, because check_boot_free_space() only returns FALSE when there is no /proc/cmdline or no /boot.

But the normal path works.

Contributor Author

Fixed in the new commit.

@starnight starnight requested a review from wjt April 14, 2026 06:08
Member

@wjt wjt left a comment

Sure, it's not trivial to test but also not impossible: rename /boot? Overmount something on /proc that hides /proc/cmdline? Temporarily patch the code to read the wrong paths?

Systems that started out on a release before EOS 5.0.0 have only a
250 MB ESP. However, the EFI image composed of the kernel and initramfs
has grown to almost 100 MB. PAYG systems mount the ESP at /boot. While
OSTree deploys a new update, the ESP of a PAYG-provisioned system does
not have enough space to hold a third EFI image. So, introduce the
check_boot_free_space() function to check the free space on the /boot
partition early in apply.

https://phabricator.endlessm.com/T33799
…ough space

Delete one unused OSTree deployment if the /boot partition does not have
enough free space. The unused deployment could be the pending or the
rollback deployment, but prefer removing the rollback (older) deployment
over the pending one. Then there should be enough free space for the new
OSTree deployment.

https://phabricator.endlessm.com/T33799
@starnight starnight requested a review from wjt April 14, 2026 11:34
@wjt
Member

wjt commented Apr 14, 2026

I notice that this is targeting eos6.0. What is the plan for the master branch?
