Skip to content

Conversation

@VivekYadav7272
Copy link

Please see issue #893 for why this change was needed.

I'm marking this as draft because I have some cleanup to do, but to give early visibility in case I've done something obviously dumb, I'm raising the draft PR anyways.

@VivekYadav7272
Copy link
Author

debug_image_info_table.rs's most of unsafe usage came from things I could not mutate, because either the shape was UEFI-spec mandated (like the union structure), or an operation was UEFI-spec mandated (like the volatile_{read|write}s on DebugImageInfoTableHeader). So I did not see a huge reduction in unsafe there.

// as that pointer is not shared with unprotected structures, and can only be accessed through the MutexGuard here.
unsafe impl<'a> Send for DebugImageInfoTableMetadata<'a> {}

static METADATA_TABLE: Mutex<Option<DebugImageInfoTableMetadata>> = Mutex::new(None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spin::Mutex is problematic in the UEFI context because of the kind of multi-threading that UEFI supports. Unlike most high-level operating systems, UEFI only supports single-threaded execution with interrupts. This means that there is no concurrent execution. As such, if one context acquires the lock, and then an interrupt occurs to a higher-priority context that also attempts to acquire the lock, then deadlock will occur because the lower-priority context will not regain control until the higher-priority context yields, which it will not do since it is blocked attempting to acquire the lock.

If you really need Mutex+Guard capabilities here, recommend using TplMutex instead (implemented in the tpl_lock.rs module). TplMutex raises the priority level of the executing context while the lock is held, preventing pre-emption by other levels that would potentially want to acquire the lock, making it a safer option for avoiding deadlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a lock here, this is only ever invoked when loading an image, which won't happen during an interrupt. Cell should be sufficient.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My DebugImageInfoTable changes seem to already have been merged in the commit that reworked atomics, although it uses RwLock instead of Cell. Perhaps that still exposes us to potential interrupt pre-emption deadlocks since there are a bunch of .write() calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - I forgot you had this PR outstanding - if there are additional improvements here that are not part of that commit, please feel free to rebase.

With respect to pre-emption - all the non-unit-test calls to .write() occur as part of an initialization flow (in initialize_debug_image_info_table) before the event subsystem is up, so there should be no danger of pre-emption at that point.

Copy link
Contributor

@os-d os-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I think the general approach is okay, just a few notes on the Mutex vs Cell/TplMutex.

// as that pointer is not shared with unprotected structures, and can only be accessed through the MutexGuard here.
unsafe impl<'a> Send for DebugImageInfoTableMetadata<'a> {}

static METADATA_TABLE: Mutex<Option<DebugImageInfoTableMetadata>> = Mutex::new(None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a lock here, this is only ever invoked when loading an image, which won't happen during an interrupt. Cell should be sufficient.

@VivekYadav7272 VivekYadav7272 force-pushed the dev/vivekyadav7272/config-tables-zerocopy branch from 2bf8e15 to 09b5634 Compare December 13, 2025 16:18
@VivekYadav7272 VivekYadav7272 marked this pull request as ready for review December 13, 2025 16:18
Reduce unsafe usage in config_tables module using more Rust standard
allocation APIs and `zerocopy` for struct initialisations.
@VivekYadav7272 VivekYadav7272 force-pushed the dev/vivekyadav7272/config-tables-zerocopy branch from 09b5634 to 1715c2e Compare December 13, 2025 16:22
@VivekYadav7272 VivekYadav7272 changed the title DRAFT(#893): config_tables: Use zerocopy crate to reduce unsafe code blocks #893: config_tables: Use zerocopy crate to reduce unsafe code blocks Dec 13, 2025
@VivekYadav7272 VivekYadav7272 changed the title #893: config_tables: Use zerocopy crate to reduce unsafe code blocks config_tables: Use zerocopy crate to reduce unsafe code blocks Dec 13, 2025

impl Default for MemoryAttributesTable {
fn default() -> Self {
// Self-note: BootServicesData is basically just the default allocator (see documentation for MemoryManager trait)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear how this comment is related to the code at hand? I think you are noting that the Box created here will be in EBSD bucket, so maybe something like // default MAT will be created in the Efi Boot Services Data memory type using the default allocator

Copy link
Author

@VivekYadav7272 VivekYadav7272 Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Will fix that

}
Ok(_) => {
// free the old MAT table if we have one
let current_mat = get_configuration_table(&efi::MEMORY_ATTRIBUTES_TABLE_GUID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to return the mat you just installed - i.e. I think you're going to end up freeing the new table, not the old one. The Ok case for core_install_configuration_table should return the old pointer, and that's the one I think you want to free:

Ok(old_ptr) => {
  let current_mat = get_configuration_table(&efi::MEMORY_ATTRIBUTES_TABLE_GUID);
  println!("Free this one: {:?} not this one: {:?}", old_ptr, current_mat);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ofcourse, that was a dumb mistake.

let mat = MemoryAttributesTable::from_mat_desc_list(mat_desc_list);
let mat_ptr = mat.as_ptr();

match core_install_configuration_table(efi::MEMORY_ATTRIBUTES_TABLE_GUID, mat_ptr as *mut c_void, st) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are putting mat_ptr into a public table (i.e. one that is visible outside of the patina core to e.g. drivers). However, because you don't leak the mat, I think it will get auto-dropped at the end of the function, meaning that the pointer you installed in the global config table is going to point to freed memory.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the rebase was a bit sloppy. Before I'd taken care of this because the struct was stored globally, so the lifetime was 'static.

let current_mat = get_configuration_table(&efi::MEMORY_ATTRIBUTES_TABLE_GUID);
log::info!("Dumping MAT: {mat:?}");
if let Some(current_mat) = current_mat
&& let Err(err) = core_free_pool(current_mat.as_ptr())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_mat was allocated using Box, it is invalid to free it here using core_free_pool. core_free_pool assumes that the pointer in question was allocated with core_allocate_pool and that an AllocationInfo structure is associated with the allocation. That would not be the case if it was directly allocated from the GlobalAllocator via Box. If you are going to allocate with Box, then it needs to be freed either by dropping the box directly or by calling the global allocator dealloc function directly with the pointer + layout.

I think your best bet is to create the raw pointer with Box::into_raw (this would also fix the issue where it gets dropped when it goes out of scope leaving a dangling pointer in the config table per my other comment), and then reclaim it here with Box::from_raw. However, the type info on the raw pointer is going to be erased when it goes into the config table (all pointers in there are typed c_void); so you need to engage in casting shenanigans to convert *mut c_void back to *mut [u8] with the right length that you need to reconstruct the box.

Copy link
Author

@VivekYadav7272 VivekYadav7272 Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's true, I forgot about this during the rebase. Previously with a global member, this was much easier, since I effectively had something to "remember" the type for me. So, there wouldn't have been a need for unsafe casting. Freeing would have been as simple as a swap operation, hence dropping the old one elegantly. Do you think that's worth re-introducing a private global member for? After all the goal is to avoid unsafe (taken care of by TplMutex?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is the case that if you rely on native rust allocations (via Box) that it is easier to go with a design where Rust always knows about the allocation - and that implies a global static.

The nature of the MAT table being an externally observable table in a system table structure means that it is inherently outside of the Rust memory model. For example, from a strictly API-centric viewpoint, it is entirely UEFI-spec legal (though a terrible idea and not in step with the expected design) for an external agent to delete this table from the global table and replace it with its own pointer, or even worse, modify the table contents. Neither of these are legitimate things to do with MAT specifically, but the system does not prevent them - so taking such a pointer and casting it back into the Box for dropping it is inherently unsafe because we rely on the good behavior of external code. And you could extend this argument to basically any approach - whatever way we handle this, it's always going to have an element of unsafety about it because of the external visibility of this table.

The question is how to best manage this unsafety. This is a matter for opinions :) My opinion is that maybe the Rust allocation apparatus is not helpful here, because it makes a lot of things (like drop) implicit in an environment where maybe not all of the assumptions baked into the Rust memory model hold. If you use direct allocations (via core_allocate_pool and core_free_pool for example), it's a bit more explicit what is being done to manage the memory, but it also means opting out of the Rust memory safety guarantees.

I can see different approaches working here; the important thing is to ensure we don't end up twisted around a broken hybrid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants