apply: Check the partition's free size of /boot before apply#344
apply: Check the partition's free size of /boot before apply#344
Conversation
eos-updater/apply.c
Outdated
| /* Require at least 100 MiB free on /boot */ | ||
| const guint64 boot_min_free_bytes = 100 * 1024 * 1024; |
There was a problem hiding this comment.
It seems wrong to hardcode this number. And if I understand correctly each UKI is now more than 100 MiB so it's not sufficient to have 100 MiB free on /boot. Can we not calculate how much space will be required precisely?
There was a problem hiding this comment.
It is hard to get the new update OSTree's UKI, because it has not deployed. But, we can get current booted OSTree's UKI /boot/ostree/eos-xxxx/payg-image-$(uname -r).efi, which is copied from /lib/modules/$(uname -r)/payg-image.efi. I think we can calculate the minimum size from the size of /lib/modules/$(uname -r)/payg-image.efi * 1.1. The more 10% is for the potential UKI image growing size.
eos-updater/apply.c
Outdated
| */ | ||
| if (!check_boot_free_space (error)) | ||
| { | ||
| if (!g_error_matches (*error, G_IO_ERROR, G_IO_ERROR_NO_SPACE)) |
There was a problem hiding this comment.
You don't know that this function was not called with error set to NULL. You should pass &local_error to check_boot_free_space, and propagate it to error if needed. However...
There was a problem hiding this comment.
Thanks for guiding the glib's error design! :)
Fixed in the new commits.
eos-updater/apply.c
Outdated
| return FALSE; | ||
| g_clear_error (error); | ||
|
|
||
| if (!remove_one_unused_deployment (sysroot, cancellable, error)) | ||
| return FALSE; |
There was a problem hiding this comment.
This patch would make the upgrade operation fail in many more cases than before, e.g.:
- If
/bootcannot be stat:ed for some reason - If removing the rollback deployment fails
- etc.
I think that this check should never make the apply operation fail. At most it should log a warning. Otherwise, if there is some bug in the free-space-checking operation, then OS updates will never be applied, and we won't be able to fix the bug. It is better to allow Apply to proceed. (That's even true in the case where we believe that /boot is too full. We should still try to proceed - it's possible we are wrong.)
There was a problem hiding this comment.
Modified as the new commits.
eos-updater/apply.c
Outdated
| deployments = ostree_sysroot_get_deployments (sysroot); | ||
| new_deployments = g_ptr_array_new_with_free_func (g_object_unref); | ||
| for (guint i = 0; i < deployments->len; i++) | ||
| { | ||
| OstreeDeployment *d = deployments->pdata[i]; | ||
| if (d != to_remove) | ||
| g_ptr_array_add (new_deployments, g_object_ref (d)); | ||
| } | ||
|
|
||
| if (!ostree_sysroot_write_deployments (sysroot, new_deployments, cancellable, error)) |
There was a problem hiding this comment.
I believe this will work:
| deployments = ostree_sysroot_get_deployments (sysroot); | |
| new_deployments = g_ptr_array_new_with_free_func (g_object_unref); | |
| for (guint i = 0; i < deployments->len; i++) | |
| { | |
| OstreeDeployment *d = deployments->pdata[i]; | |
| if (d != to_remove) | |
| g_ptr_array_add (new_deployments, g_object_ref (d)); | |
| } | |
| if (!ostree_sysroot_write_deployments (sysroot, new_deployments, cancellable, error)) | |
| deployments = ostree_sysroot_get_deployments (sysroot); | |
| g_ptr_array_remove (deployments, to_remove); | |
| if (!ostree_sysroot_write_deployments (sysroot, deployments, cancellable, error)) |
eos-updater/apply.c
Outdated
| static gboolean | ||
| check_boot_free_space (GError **error) |
There was a problem hiding this comment.
I think it is confusing that the return value FALSE from this function can mean either "an actual error occurred" or "boot has insufficient space".
There was a problem hiding this comment.
The new commit passes argument enough_space into check_boot_free_space() to get the information.
392c7e8 to
43808c7
Compare
eos-updater/apply.c
Outdated
| g_propagate_error (error, g_steal_pointer (&local_error)); | ||
| g_warning ("%s", local_error->message); | ||
| g_clear_error (&local_error); |
There was a problem hiding this comment.
This is guaranteed to crash because g_steal_pointer (&local_error) assigns NULL to local_error, so local_error->message will dereference a NULL pointer. Please test this code path.
| g_propagate_error (error, g_steal_pointer (&local_error)); | |
| g_warning ("%s", local_error->message); | |
| g_clear_error (&local_error); | |
| g_warning ("Failed to check for free space: %s", local_error->message); | |
| g_propagate_error (error, g_steal_pointer (&local_error)); |
There was a problem hiding this comment.
Check g_steal_pointer()'s description:
Sets pp to NULL, returning the value that was there before.
Wow! It is not simply returning the pointer. Thanks for the hint!
There was a problem hiding this comment.
It is a hard to test the code path get into this if yes condition. check_boot_free_space() only returns false when there is no /proc/cmdline, or no /boot.
But, normal path works.
There was a problem hiding this comment.
Fixed as the new commit.
43808c7 to
6c693ee
Compare
wjt
left a comment
There was a problem hiding this comment.
Sure, it's not trivial to test but also not impossible: rename /boot? Overmount something on /proc that hides /proc/cmdline? Temporarily patch the code to read the wrong paths?
The systems which start before EOS 5.0.0 have only 250 MB ESP partition. However, the kernel & initramfs composed EFI image has grown to almost 100 MB. The PAYG system mounts ESP partition to /boot. The space is not enough to hold the 3rd EFI image for PAYG provisioned system in the ESP during OSTree deploying the new updated OSTree. So, here introduces check_boot_free_space() function to check the partition's free size of /boot in the early apply. https://phabricator.endlessm.com/T33799
…ough space Delete one unused ostree deployment if the the partition's free size of /boot is not enough. The used ostree deployment could be pending, or rollback ostree deployment. But, prefer removing the rollback (older) over the pending deployment. Then, there should be enough free space for the new updated OSTree deployment. https://phabricator.endlessm.com/T33799
6c693ee to
33880e3
Compare
|
I notice that this is targeting |
The systems which start before EOS 5.0.0 have only 250 MB ESP partition. However, the kernel & initramfs composed EFI image has grown to almost 100 MB. The PAYG system mounts ESP partition to /boot. The space is not enough to hold the 3rd EFI image for PAYG provisioned system in the ESP during OSTree deploying the new updated OSTree. So, here introduces
check_boot_free_space()function to check the partition's free size of /boot in the early apply.Delete one unused ostree deployment if the the partition's free size of /boot is not enough. The used ostree deployment could be pending, or rollback ostree deployment. But, prefer removing the rollback (older) over the pending deployment. Then, there should be enough free space for the new updated OSTree deployment.
https://phabricator.endlessm.com/T33799