-
Notifications
You must be signed in to change notification settings - Fork 39
patina_internal_collections: Fix UB memory read in Node fields #1152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
492ef76 to
7a57e63
Compare
| // Initialize all Cell fields to prevent reading uninitialized memory. | ||
| // Cell::set() internally uses mem::replace() which reads the old value before writing. | ||
| // We use ptr::write to initialize the fields without creating references to uninitialized data. | ||
| for i in 0..self.data.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be:
for node in buffer {
node.color = Cell::new(BLACK);
node.parent = Cell::new(null);
...
}i.e. we can avoid the pointer manipulation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, recommend making the values consistent with the defaults in Node::new(). i.e. Cell::new(RED).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be:
for node in buffer { node.color = Cell::new(BLACK); node.parent = Cell::new(null); ... }i.e. we can avoid the pointer manipulation?
But then in that case buffer still contains uninitialized memory, right? So for node in buffer creates &mut Node<D> references to uninitialized memory, which is still UB, even if immediately overwritten. node.color = Cell::new(BLACK) creates a reference &mut Node<D> and then the assignment operator reads the old value, just like mem::replace reads uninitialized memory. We need to get a raw pointer to the field without creating a reference to uninitialized memory, hence the pointer manipulation. I agree it's not very intuitive...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does miri complain on this? Maybe a case for MaybeUninit so that doesn't happen? I'm speculating.
| // Initialize all Cell fields to prevent reading uninitialized memory. | ||
| // Cell::set() internally uses mem::replace() which reads the old value before writing. | ||
| // We use ptr::write to initialize the fields without creating references to uninitialized data. | ||
| for i in 0..self.data.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, recommend making the values consistent with the defaults in Node::new(). i.e. Cell::new(RED).
| // SAFETY: node_ptr is derived from self.data which is a valid slice with length i+1. | ||
| // We use addr_of_mut! to get field pointers without creating intermediate references | ||
| // to uninitialized data, then ptr::write to initialize the fields. | ||
| unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be done in all cases, not just the capacity == 0 case. In fact, I think it might need to be done before the let buffer at the beginning of the call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I agree. We also need to update the test for this. Currently, it only takes the capacity==0 path,
| // Initialize all Cell fields to prevent reading uninitialized memory. | ||
| // Cell::set() internally uses mem::replace() which reads the old value before writing. | ||
| // We use ptr::write to initialize the fields without creating references to uninitialized data. | ||
| for i in 0..self.data.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look at this more in depth, but if this is the solution, it should probably be completed in Self::build_linked_list. This function is already looping through each uninitialized node to set up the linked list of available nodes for allocations. A simple solution would just be to initialize it there while setting up the linked list. It also stops the need to writing null_ptr to the left and right nodes because we are setting them (except for the first and last node in the linked list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought the same thing, but build_linked_list gets called when the list is expected to be initialized, it looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also @joschock I think this was an issue before your change. My implementation always had these nodes as only partially initialized because the logic was I only need the linked list of available nodes (so left and right ptr). Nothing else was necessary. When we go to use a node, we popped it and fully initialized it for use.
So theoretically it is safe, but if misused, would definitely be UB. So adding this initialization is for the best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought the same thing, but
build_linked_listgets called when the list is expected to be initialized, it looks like.
Nah, build_linked_list does not need to be initialized because build_linked_list is not reading any data from the initialized node. It is only setting the left and right pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code creates a mut ref to the slice without initializing the nodes within is technically UB already. So I don't think you can defer it to build_linked_list unless you defer creation of the slice to there.
let buffer = unsafe {
slice::from_raw_parts_mut::<'a, Node<D>>(
slice as *mut [u8] as *mut Node<D>,
slice.len() / mem::size_of::<Node<D>>(),
)
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garybeihl can you post the actual full miri test failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full backtrace miri test output is >250K. The gist is that resize calls build_linked_list on uninitialized data (buffer). build_linked_list calls Cell::set_right which calls Cell::set on uninitialized data. Cell::set does a read before write which triggers the UB.
To reproduce the error, install miri with
rustup component add miri --toolchain nightly-2025-09-19Run the test with Stacked Borrows disabled to see the definite UB:
MIRIFLAGS="-Zmiri-backtrace=full -Zmiri-disable-stacked-borrows" \
cargo +nightly-2025-09-19 miri test -p patina_dxe_core --lib \
allocate_deallocate_test │| // We use addr_of_mut! to get field pointers without creating intermediate references | ||
| // to uninitialized data, then ptr::write to initialize the fields. | ||
| unsafe { | ||
| let node_ptr = self.data.as_mut_ptr().add(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is you could require D implements default then just do this:
for i in 0..self.data.len() {
unsafe { self.data.as_mut_ptr().add(i).write(Node::new(D::default())); }
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work I think, but you'd be changing the API by adding the D: Default trait and you'd waste some CPU with all the new calls to Default. If that is what the consensus favors, however I can try to implement that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without something like Default, it's not clear how to initialize D. Since it's generic, we can't assume that filling it with zeros is legit, for example. The UB is not just about accessing uninitialized data; I suspect it would be possible to construct a D such that the compiler would make poor choices based on assumptions of initialization that are not upheld, even without a direct read to the corresponding uninit memory. I think the read is just where Miri happens to detect the UB.
If we don't do Default, we might have to do data: Option<D> so that we can make data the well-defined value of None, and then make the slice with uninitialized nodes:
let buffer = unsafe {
let unint_buffer = slice::from_raw_parts_mut::<<'a, Node<D>>(
slice as *mut [u8] as *mut <'a, Node<D>,
slice.len() / mem::size_of::<Node<D>>(),
)
for i in 0..unint_buffer .len() {
// new_empty makes a Node with data = None, and everything else defaults.
unint_buffer .as_mut_ptr().add(i).write(Node::new_empty()); }
}
};There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I redo the PR switching to the D: Default trait method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want other opinions, but mine is that Default is unnecessarily constraining on D when what we are trying to solve is essentially an internal problem of the Node implementation. I would go with the Option<D> approach instead.
I also need to study more, but I suspect we need to make the unint_buffer a slice of MaybeUninit<Node<D>> as well, and initialize it before transmuting it to a slice of Node<D>. I think (though my confidence is not great) that simply doing a slice from raw parts on uninitialized Node<D> might be UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many options I would like to investigate before considering Node<Option<D>>. Using option is going to have multiple issues:
- I'm guessing that is going to get you stuck with more
Cellwrappers because you may not have mutable access to the node to setDonce we've initialized it. - It's going to make you have to unwrap D even though we know it exists
- It is most likely going to change the memory layout as rust will need to add a discernment for the option.
I would like to see usage of MaybeUninit at all levels investigated, e.g. MaybeUninit<[Node<D>]>, &[MaybeUninit<Node<D>>], and even Node<MaybeUnit<D>>. At a bare minimum, MaybeUninit is guaranteed to keep the same memory layout and alignment.
f0178c1 to
b1385f8
Compare
ae5dcec to
7fcea9b
Compare
|
Hi @garybeihl , I noticed you've been keeping this PR up to date, but not working on it (which is completely fine). I just wanted to make sure you were not waiting on additional information from any of us that have commented on this PR (i.e. make sure we are not blocking you in any way)! |
|
No - not blocked - I uploaded a version of the changes that uses MaybeUninit and D: Default - just waiting for further comments or change requests. If you could have a look when you get a chance, that would be great - thanks! |
b329553 to
a3c3321
Compare
Fixes an undefined behavior issue where Cell::set() reads uninitialized memory during linked list creation in Storage::resize(). Root Cause: - Cell::set() internally uses mem::replace(), which reads the old value before writing the new one. - When Storage::resize() allocates new nodes and calls operations that use Cell::set(), the Cell fields contain uninitialized memory. - Reading uninitialized memory is undefined behavior, even if immediately overwritten. Unwanted compiler "optimizations" could follow. Impact: - Any package using patina_internal_collections - Affects BOTH resize paths: capacity == 0 AND capacity > 0 - Potential memory corruption and non-deterministic errors - Detected by Miri testing (issue OpenDevicePartnership#560) Fix: - Use MaybeUninit<Node<D>> to explicitly represent uninitialized memory - Add `D: Default` trait bound to Storage, Rbt, and Bst - Initialize all nodes using MaybeUninit::write(Node::new(D::default())) - Convert to initialized slice only after all nodes are initialized - This avoids creating references to uninitialized Node<D> (which could be UB) - Added Default implementations for MemoryBlock and IoBlock in patina_dxe_core Introduced by: - PR OpenDevicePartnership#1050 (Nov 13, 2025) Replace AtomicPtr with Cell in patina_internal_collections - AtomicPtr::store() writes without reading the old value, but Cell::set() uses mem::replace() which reads before writing Testing: - Tested with: cargo +nightly-2025-09-19 miri test -p patina_dxe_core - allocate_deallocate_test: passes under Miri with no UB errors - All 47 tests in patina_internal_collections pass - Verified compilation of patina_dxe_core with new trait bounds Addresses reviewer feedback: - Use MaybeUninit to avoid UB from references to uninitialized data - Combine with D: Default for clean initialization - Avoids complexity of Option<D> while maintaining safety guarantees Related to OpenDevicePartnership#560
a3c3321 to
490f0cf
Compare
Description
Fixes an undefined behavior issue where
Cell::set()reads uninitialized memory during linked list creation inStorage::resize().Root Cause
Cell::set()internally usesmem::replace(), which reads the old value before writing the new one.Storage::resize()allocates new nodes and callsbuild_linked_list(), the Cell fields contain uninitialized memory.Impact
Fix
ptr::write()beforebuild_linked_list()addr_of_mut!()to read field pointers without creating references to uninitialized dataIntroduced by
AtomicPtr::store()writes without reading the old value, butCell::set()usesmem::replace()which reads before writingRelated to #560
How This Was Tested
cargo +nightly-2025-09-19 miri test -p patina_dxe_core.Integration Instructions
N/A