-
Notifications
You must be signed in to change notification settings - Fork 40
config_tables: Use zerocopy crate to reduce unsafe code blocks #1062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
config_tables: Use zerocopy crate to reduce unsafe code blocks #1062
Conversation
|
|
| // as that pointer is not shared with unprotected structures, and can only be accessed through the MutexGuard here. | ||
| unsafe impl<'a> Send for DebugImageInfoTableMetadata<'a> {} | ||
|
|
||
| static METADATA_TABLE: Mutex<Option<DebugImageInfoTableMetadata>> = Mutex::new(None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spin::Mutex is problematic in the UEFI context because of the kind of multi-threading that UEFI supports. Unlike most high-level operating systems, UEFI only supports single-threaded execution with interrupts. This means that there is no concurrent execution. As such, if one context acquires the lock, and then an interrupt occurs to a higher-priority context that also attempts to acquire the lock, then deadlock will occur because the lower-priority context will not regain control until the higher-priority context yields, which it will not do since it is blocked attempting to acquire the lock.
If you really need Mutex+Guard capabilities here, recommend using TplMutex instead (implemented in the tpl_lock.rs module). TplMutex raises the priority level of the executing context while the lock is held, preventing pre-emption by other levels that would potentially want to acquire the lock, making it a safer option for avoiding deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need a lock here, this is only ever invoked when loading an image, which won't happen during an interrupt. Cell should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My DebugImageInfoTable changes seem to already have been merged in the commit that reworked atomics, although it uses RwLock instead of Cell. Perhaps that still exposes us to potential interrupt pre-emption deadlocks since there are a bunch of .write() calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - I forgot you had this PR outstanding - if there are additional improvements here that are not part of that commit, please feel free to rebase.
With respect to pre-emption - all the non-unit-test calls to .write() occur as part of an initialization flow (in initialize_debug_image_info_table) before the event subsystem is up, so there should be no danger of pre-emption at that point.
os-d
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! I think the general approach is okay, just a few notes on the Mutex vs Cell/TplMutex.
| // as that pointer is not shared with unprotected structures, and can only be accessed through the MutexGuard here. | ||
| unsafe impl<'a> Send for DebugImageInfoTableMetadata<'a> {} | ||
|
|
||
| static METADATA_TABLE: Mutex<Option<DebugImageInfoTableMetadata>> = Mutex::new(None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need a lock here, this is only ever invoked when loading an image, which won't happen during an interrupt. Cell should be sufficient.
2bf8e15 to
09b5634
Compare
Reduce unsafe usage in config_tables module using more Rust standard allocation APIs and `zerocopy` for struct initialisations.
09b5634 to
1715c2e
Compare
|
|
||
| impl Default for MemoryAttributesTable { | ||
| fn default() -> Self { | ||
| // Self-note: BootServicesData is basically just the default allocator (see documentation for MemoryManager trait) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear how this comment is related to the code at hand? I think you are noting that the Box created here will be in EBSD bucket, so maybe something like // default MAT will be created in the Efi Boot Services Data memory type using the default allocator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Will fix that
| } | ||
| Ok(_) => { | ||
| // free the old MAT table if we have one | ||
| let current_mat = get_configuration_table(&efi::MEMORY_ATTRIBUTES_TABLE_GUID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to return the mat you just installed - i.e. I think you're going to end up freeing the new table, not the old one. The Ok case for core_install_configuration_table should return the old pointer, and that's the one I think you want to free:
Ok(old_ptr) => {
let current_mat = get_configuration_table(&efi::MEMORY_ATTRIBUTES_TABLE_GUID);
println!("Free this one: {:?} not this one: {:?}", old_ptr, current_mat);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ofcourse, that was a dumb mistake.
| let mat = MemoryAttributesTable::from_mat_desc_list(mat_desc_list); | ||
| let mat_ptr = mat.as_ptr(); | ||
|
|
||
| match core_install_configuration_table(efi::MEMORY_ATTRIBUTES_TABLE_GUID, mat_ptr as *mut c_void, st) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you are putting mat_ptr into a public table (i.e. one that is visible outside of the patina core to e.g. drivers). However, because you don't leak the mat, I think it will get auto-dropped at the end of the function, meaning that the pointer you installed in the global config table is going to point to freed memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the rebase was a bit sloppy. Before I'd taken care of this because the struct was stored globally, so the lifetime was 'static.
| let current_mat = get_configuration_table(&efi::MEMORY_ATTRIBUTES_TABLE_GUID); | ||
| log::info!("Dumping MAT: {mat:?}"); | ||
| if let Some(current_mat) = current_mat | ||
| && let Err(err) = core_free_pool(current_mat.as_ptr()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current_mat was allocated using Box, it is invalid to free it here using core_free_pool. core_free_pool assumes that the pointer in question was allocated with core_allocate_pool and that an AllocationInfo structure is associated with the allocation. That would not be the case if it was directly allocated from the GlobalAllocator via Box. If you are going to allocate with Box, then it needs to be freed either by dropping the box directly or by calling the global allocator dealloc function directly with the pointer + layout.
I think your best bet is to create the raw pointer with Box::into_raw (this would also fix the issue where it gets dropped when it goes out of scope leaving a dangling pointer in the config table per my other comment), and then reclaim it here with Box::from_raw. However, the type info on the raw pointer is going to be erased when it goes into the config table (all pointers in there are typed c_void); so you need to engage in casting shenanigans to convert *mut c_void back to *mut [u8] with the right length that you need to reconstruct the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's true, I forgot about this during the rebase. Previously with a global member, this was much easier, since I effectively had something to "remember" the type for me. So, there wouldn't have been a need for unsafe casting. Freeing would have been as simple as a swap operation, hence dropping the old one elegantly. Do you think that's worth re-introducing a private global member for? After all the goal is to avoid unsafe (taken care of by TplMutex?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is the case that if you rely on native rust allocations (via Box) that it is easier to go with a design where Rust always knows about the allocation - and that implies a global static.
The nature of the MAT table being an externally observable table in a system table structure means that it is inherently outside of the Rust memory model. For example, from a strictly API-centric viewpoint, it is entirely UEFI-spec legal (though a terrible idea and not in step with the expected design) for an external agent to delete this table from the global table and replace it with its own pointer, or even worse, modify the table contents. Neither of these are legitimate things to do with MAT specifically, but the system does not prevent them - so taking such a pointer and casting it back into the Box for dropping it is inherently unsafe because we rely on the good behavior of external code. And you could extend this argument to basically any approach - whatever way we handle this, it's always going to have an element of unsafety about it because of the external visibility of this table.
The question is how to best manage this unsafety. This is a matter for opinions :) My opinion is that maybe the Rust allocation apparatus is not helpful here, because it makes a lot of things (like drop) implicit in an environment where maybe not all of the assumptions baked into the Rust memory model hold. If you use direct allocations (via core_allocate_pool and core_free_pool for example), it's a bit more explicit what is being done to manage the memory, but it also means opting out of the Rust memory safety guarantees.
I can see different approaches working here; the important thing is to ensure we don't end up twisted around a broken hybrid.
Please see issue #893 for why this change was needed.
I'm marking this as draft because I have some cleanup to do, but to give early visibility in case I've done something obviously dumb, I'm raising the draft PR anyways.