Skip to content

boot: bootutil: Refactor erase functionality to fix watchdog feeding, also fix swap scratch corrupt image issue #2275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 2, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions boot/boot_serial/src/boot_serial.c
Original file line number Diff line number Diff line change
Expand Up @@ -875,7 +875,7 @@ static off_t erase_range(const struct flash_area *fap, off_t start, off_t end)
BOOT_LOG_DBG("Erasing range 0x%jx:0x%jx", (intmax_t)start,
(intmax_t)(start + size - 1));

rc = boot_erase_region(fap, start, size);
rc = boot_erase_region(fap, start, size, false);
if (rc != 0) {
BOOT_LOG_ERR("Error %d while erasing range", rc);
return -EINVAL;
Expand Down Expand Up @@ -1019,7 +1019,7 @@ bs_upload(char *buf, int len)
/* Non-progressive erase erases entire image slot when first chunk of
* an image is received.
*/
rc = boot_erase_region(fap, 0, area_size);
rc = boot_erase_region(fap, 0, area_size, false);
if (rc) {
goto out_invalid_data;
}
Expand Down
2 changes: 1 addition & 1 deletion boot/boot_serial/src/boot_serial_encryption.c
Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,7 @@ decrypt_region_inplace(struct boot_loader_state *state,
(off + bytes_copied + idx) - hdr->ih_hdr_size, blk_sz,
blk_off, &buf[idx]);
}
rc = boot_erase_region(fap, off + bytes_copied, chunk_sz);
rc = boot_erase_region(fap, off + bytes_copied, chunk_sz, false);
if (rc != 0) {
return BOOT_EFLASH;
}
Expand Down
98 changes: 92 additions & 6 deletions boot/bootutil/src/bootutil_misc.c
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
* Copyright (c) 2017-2019 Linaro LTD
* Copyright (c) 2016-2019 JUUL Labs
* Copyright (c) 2019-2020 Arm Limited
* Copyright (c) 2025 Nordic Semiconductor ASA
*
* Original license:
*
Expand Down Expand Up @@ -562,18 +563,103 @@ boot_read_image_size(struct boot_loader_state *state, int slot, uint32_t *size)
* Erases a region of device that requires erase prior to write; does
* nothing on devices without erase.
*
* @param fap The flash_area containing the region to erase.
* @param fa The flash_area containing the region to erase.
* @param off The offset within the flash area to start the
* erase.
* @param sz The number of bytes to erase.
* @param size The number of bytes to erase.
* @param backwards If set to true will erase from end to start
* addresses, otherwise erases from start to end
* addresses.
*
* @return 0 on success; nonzero on failure.
*/
int
boot_erase_region(const struct flash_area *fap, uint32_t off, uint32_t sz)
boot_erase_region(const struct flash_area *fa, uint32_t off, uint32_t size, bool backwards)
{
if (device_requires_erase(fap)) {
return flash_area_erase(fap, off, sz);
int rc = 0;

if (off >= flash_area_get_size(fa) || (flash_area_get_size(fa) - off) < size) {
rc = -1;
goto end;
} else if (device_requires_erase(fa)) {
uint32_t end_offset = 0;
struct flash_sector sector;

if (backwards) {
/* Get the lowest page offset first */
rc = flash_area_get_sector(fa, off, &sector);

if (rc < 0) {
goto end;
}

end_offset = flash_sector_get_off(&sector);

/* Set boundary condition, the highest probable offset to erase, within
* last sector to erase
*/
off += size - 1;
} else {
/* Get the highest page offset first */
rc = flash_area_get_sector(fa, (off + size - 1), &sector);

if (rc < 0) {
goto end;
}

end_offset = flash_sector_get_off(&sector);
}

while (true) {
/* Size to read in this iteration */
size_t csize;

/* Get current sector and, also, correct offset */
rc = flash_area_get_sector(fa, off, &sector);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nordicjm Something is wrong with this function from mynewt. For the second sector starting at offset 131072 it returns sector start offset as 0, which causes the line 624 to assign 0 to offset and the loop never ends.


if (rc < 0) {
goto end;
}

/* Corrected offset and size of current sector to erase */
off = flash_sector_get_off(&sector);
csize = flash_sector_get_size(&sector);

rc = flash_area_erase(fa, off, csize);
Copy link
Contributor

@taltenbach taltenbach Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we still have an issue here as depending on the flash driver implementation, the sector returned by flash_area_get_sector might be a virtual sector, that is composed of multiple physical sectors. For example, I'm doing that on a project where I have an internal flash memory with 128 KiB physical sectors and an external flash memory with 64 KiB sector. My flash_map_backend implementation simply use a virtual sector size of 128 KiB for simplicity, but also to be able to support swap-move/offset on that device.

Perhaps we want to add a backwards argument to flash_area_erase then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flash_area_erase is an OS function, so it can't be changed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we still have an issue here as depending on the flash driver implementation, the sector returned by flash_area_get_sector might be a virtual sector, that is composed of multiple physical sectors. For example, I'm doing that on a project where I have an internal flash memory with 128 KiB physical sectors and an external flash memory with 64 KiB sector. My flash_map_backend implementation simply use a virtual sector size of 128 KiB for simplicity, but also to be able to support swap-move/offset on that device.

Perhaps we want to add a backwards argument to flash_area_erase then?

Whatever API returns as a sector is a sector, you do not care beyond that. So your backend just returns sectors. I do not understand what would be benefit of knowing what type of sector that really is.

I am working at making MCUboot funciton with software defined sectors, that may be considered virtual sectors, and reducing code that queries the info allover the place. Software sectors also allow devices with storage on RRAM/NVRAM/MRAM/EEPROM function more effectively as they drop the physical constrains like alignment. Actually, for the purpose to ramp up support of RRAM devices, we made them pretend to be flash and have paged layout, even though it does nothing for them and is only annoying for software above them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, let say my scratch area is 128 KiB and composed of two 64 KiB sectors. Imagine my flash backend is considering 128-KiB virtual sectors, so when trying to erase the scratch area backwards, the backend will return a single 128-KiB sector to boot_erase_region that we will erase with flash_area_erase. The backend doesn't know it has to erase the sectors backwards, so it is quite likely it will start by erasing the first 64 KiB sector of the scratch area. If a power cycle occurs at that moment, I guess we'd observe the same issue spotted by @nordicjm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's different, because MCUboot would see it as a single 128KiB sector, so if we assume this is being used for swap status, swap status is going to be at the beginning of that in the first sector, MCUboot will not split up the sectors. Though having though about it I guess that maybe the patch here does only work because of the limited number of sectors, if there were a lot of sectors, so much that the trailer section spanned multiple sectors and power was lost during erasing them (backwards) then I assume it means that the swap would resume earlier than it should do, but also creating a corrupt image... might need a new device in simulator for that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's different, because MCUboot would see it as a single 128KiB sector, so if we assume this is being used for swap status, swap status is going to be at the beginning of that in the first sector, MCUboot will not split up the sectors. Though having though about it I guess that maybe the patch here does only work because of the limited number of sectors, if there were a lot of sectors, so much that the trailer section spanned multiple sectors and power was lost during erasing them (backwards) then I assume it means that the swap would resume earlier than it should do, but also creating a corrupt image... might need a new device in simulator for that

Yeah, I was mentioning that problem with swap algorithms too, several times. If your log crosses to a next page it will be left there, in case of interruption, and once new swap starts you will end up in situation where next swap tries to overwrite existing log.
Solution would be to check on every boot, if trailer has erased magic or magic is somehow damaged, and try to erase the entire, presumed, log range. That check should happen on all boots.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, let say my scratch area is 128 KiB and composed of two 64 KiB sectors. Imagine my flash backend is considering 128-KiB virtual sectors, so when trying to erase the scratch area backwards, the backend will return a single 128-KiB sector to boot_erase_region that we will erase with flash_area_erase. The backend doesn't know it has to erase the sectors backwards, so it is quite likely it will start by erasing the first 64 KiB sector of the scratch area. If a power cycle occurs at that moment, I guess we'd observe the same issue spotted by @nordicjm

Well you are not sure about that. OK, so you may with that hardware, but if you have a device like SPI NOR, the driver may choose to erase data in bulk and select command it will send to device depending on range you are trying to erase. For example if you ask it to erase 4k, then it will send command to erase 4k, if you ask it to erase 8k, it will send two 4k commands, but if you ask it to erase 64k then it will erase it with one command. In all above cases, the spi nor will report page size as 4k, so you actually do not know how it does the erase in the end.
You may also have a modified driver that will jump out of erase every 4k to allow other software to function, so it is hard to relay on erase as atomic operation. And actually even the erase, as hardware does it, is not atomic, and may be interrupted leaving some data not erased.

Copy link
Contributor

@taltenbach taltenbach Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well you are not sure about that. OK, so you may with that hardware, but if you have a device like SPI NOR, the driver may choose to erase data in bulk and select command it will send to device depending on range you are trying to erase. For example if you ask it to erase 4k, then it will send command to erase 4k, if you ask it to erase 8k, it will send two 4k commands, but if you ask it to erase 64k then it will erase it with one command. In all above cases, the spi nor will report page size as 4k, so you actually do not know how it does the erase in the end.
You may also have a modified driver that will jump out of erase every 4k to allow other software to function, so it is hard to relay on erase as atomic operation. And actually even the erase, as hardware does it, is not atomic, and may be interrupted leaving some data not erased.

You're absolutely right, but in that case does that mean swap-scratch is broken by design? Indeed, if my understanding of the bug described by @nordicjm is correct, it seems swap-scratch assumes the trailer of the scratch area is erased before the other data contained in the scratch area, which wouldn't be the case in the situations you're describing, even though boot_erase_region is erasing backwards.

And in fact, we might even have an issue for swap-move or swap-offset, the swap_scramble_trailer_sectors routine erases the trailer backwards to ensure we cannot have a case where, after a reboot during the erase of the trailer, the magic number (at the end of the trailer) is valid but the remaining of the trailer has already been erased and is therefore corrupted. Since we cannot trust the flash driver to erase the last part of our trailer first and since the erase is not atomic at hardware-level, I'm wondering if there are some cases where we would have an issue.


if (rc < 0) {
goto end;
}

MCUBOOT_WATCHDOG_FEED();

if (backwards) {
if (end_offset >= off) {
/* Reached the first offset in range and already erased it */
break;
}

/* Move down to previous sector, the flash_area_get_sector will
* correct the value to real page offset
*/
off -= 1;
} else {
/* Move up to next sector */
off += csize;

if (off > end_offset) {
/* Reached the end offset in range and already erased it */
break;
}

/* Workaround for flash_sector_get_off() being broken in mynewt, hangs with
* infinite loop if this is not present, should be removed if bug is fixed.
*/
off += 1;
}
}
}
return 0;

end:
return rc;
}
6 changes: 2 additions & 4 deletions boot/bootutil/src/bootutil_priv.h
Original file line number Diff line number Diff line change
Expand Up @@ -346,11 +346,9 @@ int boot_copy_region(struct boot_loader_state *state,
/* Prepare for write device that requires erase prior to write. This will
* do nothing on devices without erase requirement.
*/
int boot_erase_region(const struct flash_area *fap, uint32_t off, uint32_t sz);
int boot_erase_region(const struct flash_area *fap, uint32_t off, uint32_t sz, bool backwards);
/* Similar to boot_erase_region but will always remove data */
int boot_scramble_region(const struct flash_area *fap, uint32_t off, uint32_t sz);
/* Similar to boot_scramble_region but works backwards */
int boot_scramble_region_backwards(const struct flash_area *fap, uint32_t off, uint32_t sz);
int boot_scramble_region(const struct flash_area *fap, uint32_t off, uint32_t sz, bool backwards);
/* Makes slot unbootable, either by scrambling header magic, header sector
* or entire slot, depending on settings.
* Note: slot is passed here becuase at this point there is no function
Expand Down
Loading
Loading