Conversation

@garybeihl
Contributor

Description

Fixes an undefined behavior issue where Cell::set() reads uninitialized memory during linked list creation in Storage::resize().

Root Cause

  • Cell::set() internally uses mem::replace(), which reads the old value before writing the new one.
  • When Storage::resize() allocates new nodes and calls build_linked_list(), the Cell fields contain uninitialized memory.
  • Reading uninitialized memory is undefined behavior, even if immediately overwritten. Unwanted compiler "optimizations" could follow.
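To make the first bullet concrete: Cell::set() replaces the stored value and drops the old one, so the old contents are always observed. The following is a minimal sketch, not code from this PR. `Noisy` and `drops_caused_by_set` are hypothetical names, and a destructor counter stands in for "the old value was touched".

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter when dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns how many `Noisy` values were dropped by a single `Cell::set()`.
fn drops_caused_by_set() -> u32 {
    let counter = Rc::new(Cell::new(0));
    let slot = Cell::new(Noisy(Rc::clone(&counter)));
    // `set` moves the old value out (a read) and drops it before returning.
    slot.set(Noisy(Rc::clone(&counter)));
    let after_set = counter.get();
    // Keep `slot` alive until here so its own drop is not counted above.
    drop(slot);
    after_set
}
```

If the old value had been uninitialized memory, that same read would be performed on garbage, which is the situation described above.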

Impact

Fix

  • Initialize all Cell fields using ptr::write() before build_linked_list()
  • Use addr_of_mut!() to obtain field pointers without creating references to uninitialized data
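The two bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `Node` struct here is a simplified stand-in with assumed field names, and `init_node` and `demo_init_node` are hypothetical helpers.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ptr::{self, addr_of_mut};

/// Hypothetical stand-in for the crate's node type; field names are assumptions.
struct Node {
    color: Cell<bool>,
    parent: Cell<*mut Node>,
    left: Cell<*mut Node>,
    right: Cell<*mut Node>,
}

/// Initializes every field of an uninitialized `Node` without ever forming
/// a reference to uninitialized data.
fn init_node(slot: &mut MaybeUninit<Node>) -> &mut Node {
    let node_ptr = slot.as_mut_ptr();
    // SAFETY: `node_ptr` is valid and aligned for a `Node`. `addr_of_mut!`
    // yields raw field pointers, and `ptr::write` stores without reading
    // or dropping the old bytes.
    unsafe {
        ptr::write(addr_of_mut!((*node_ptr).color), Cell::new(false));
        ptr::write(addr_of_mut!((*node_ptr).parent), Cell::new(ptr::null_mut()));
        ptr::write(addr_of_mut!((*node_ptr).left), Cell::new(ptr::null_mut()));
        ptr::write(addr_of_mut!((*node_ptr).right), Cell::new(ptr::null_mut()));
        // All fields are now initialized, so handing out a reference is fine.
        slot.assume_init_mut()
    }
}

fn demo_init_node() -> bool {
    let mut slot = MaybeUninit::<Node>::uninit();
    let node = init_node(&mut slot);
    // `Cell::set` is fine now: the old value it reads is initialized.
    node.right.set(ptr::null_mut());
    node.color.set(true);
    node.color.get() && node.parent.get().is_null() && node.left.get().is_null()
}
```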

Introduced by

Related to #560

  • [x] Impacts functionality?
  • [x] Impacts security?
  • Breaking change?
  • Includes tests?
  • Includes documentation?

How This Was Tested

  • Tested with: cargo +nightly-2025-09-19 miri test -p patina_dxe_core.
  • 7 tests now pass (previously 0/469 due to this UB issue).

Integration Instructions

N/A

@github-actions github-actions bot added the impact:security Has a security impact label Dec 3, 2025
@garybeihl garybeihl added the type:bug Something isn't working label Dec 3, 2025
@codecov

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@garybeihl garybeihl force-pushed the fix-node-uninit branch 2 times, most recently from 492ef76 to 7a57e63 Compare December 3, 2025 16:57
// Initialize all Cell fields to prevent reading uninitialized memory.
// Cell::set() internally uses mem::replace() which reads the old value before writing.
// We use ptr::write to initialize the fields without creating references to uninitialized data.
for i in 0..self.data.len() {
Contributor

Couldn't this be:

for node in buffer {
  node.color = Cell::new(BLACK);
  node.parent = Cell::new(null);
  ...
}

i.e. we can avoid the pointer manipulation?

Contributor

Also, recommend making the values consistent with the defaults in Node::new(). i.e. Cell::new(RED).

Contributor Author

Couldn't this be:

for node in buffer {
  node.color = Cell::new(BLACK);
  node.parent = Cell::new(null);
  ...
}

i.e. we can avoid the pointer manipulation?

But then in that case buffer still contains uninitialized memory, right? So for node in buffer creates &mut Node<D> references to uninitialized memory, which is still UB, even if immediately overwritten. node.color = Cell::new(BLACK) creates a reference &mut Node<D> and then the assignment operator reads the old value, just like mem::replace reads uninitialized memory. We need to get a raw pointer to the field without creating a reference to uninitialized memory, hence the pointer manipulation. I agree it's not very intuitive...

Contributor

Does miri complain on this? Maybe a case for MaybeUninit so that doesn't happen? I'm speculating.

// SAFETY: node_ptr is derived from self.data which is a valid slice with length i+1.
// We use addr_of_mut! to get field pointers without creating intermediate references
// to uninitialized data, then ptr::write to initialize the fields.
unsafe {
Contributor

I think this needs to be done in all cases, not just the capacity == 0 case. In fact, I think it might need to be done before the let buffer at the beginning of the call.

Contributor Author

Yes - I agree. We also need to update the test for this. Currently, it only takes the capacity == 0 path.

// Initialize all Cell fields to prevent reading uninitialized memory.
// Cell::set() internally uses mem::replace() which reads the old value before writing.
// We use ptr::write to initialize the fields without creating references to uninitialized data.
for i in 0..self.data.len() {
Collaborator

@Javagedes Javagedes Dec 3, 2025

@garybeihl

I need to look at this more in depth, but if this is the solution, it should probably be done in Self::build_linked_list. That function already loops through each uninitialized node to set up the linked list of available nodes for allocations, so a simple solution would be to initialize each node there while setting up the linked list. It also removes the need to write null_ptr to the left and right pointers, because we are setting them anyway (except for the first and last node in the linked list).

Contributor

I had thought the same thing, but build_linked_list gets called when the list is expected to be initialized, it looks like.

Collaborator

also @joschock I think this was an issue before your change. My implementation always had these nodes as only partially initialized because the logic was I only need the linked list of available nodes (so left and right ptr). Nothing else was necessary. When we go to use a node, we popped it and fully initialized it for use.

So theoretically it is safe, but if misused, would definitely be UB. So adding this initialization is for the best.

Collaborator

@Javagedes Javagedes Dec 3, 2025

I had thought the same thing, but build_linked_list gets called when the list is expected to be initialized, it looks like.

Nah, build_linked_list does not need the nodes to be initialized, because it is not reading any data from the uninitialized nodes. It is only setting the left and right pointers.

Contributor

I think this code, which creates a mut ref to the slice without initializing the nodes within, is technically UB already. So I don't think you can defer it to build_linked_list unless you defer creation of the slice to there.

let buffer = unsafe {
    slice::from_raw_parts_mut::<'a, Node<D>>(
        slice as *mut [u8] as *mut Node<D>,
        slice.len() / mem::size_of::<Node<D>>(),
    )
};

Collaborator

@garybeihl can you post the actual full miri test failure?

Contributor Author

The full Miri backtrace output is >250K. The gist is that resize calls build_linked_list on uninitialized data (buffer). build_linked_list calls set_right, which calls Cell::set on uninitialized data. Cell::set does a read before the write, which triggers the UB.

To reproduce the error, install miri with

    rustup component add miri --toolchain nightly-2025-09-19

Run the test with Stacked Borrows disabled to see the definite UB:

MIRIFLAGS="-Zmiri-backtrace=full -Zmiri-disable-stacked-borrows" \
cargo +nightly-2025-09-19 miri test -p patina_dxe_core --lib \
allocate_deallocate_test

// We use addr_of_mut! to get field pointers without creating intermediate references
// to uninitialized data, then ptr::write to initialize the fields.
unsafe {
let node_ptr = self.data.as_mut_ptr().add(i);
Collaborator

Another option: you could require that D implements Default, then just do this:

for i in 0..self.data.len() {
    unsafe { self.data.as_mut_ptr().add(i).write(Node::new(D::default())); }
}

Contributor Author

@garybeihl garybeihl Dec 3, 2025

That would work, I think, but you'd be changing the API by adding the D: Default bound, and you'd spend some CPU on all the new calls to Default. However, if that is what the consensus favors, I can try to implement that path.

Contributor

Without something like Default, it's not clear how to initialize D. Since it's generic, we can't assume that filling it with zeros is legit, for example. The UB is not just about accessing uninitialized data; I suspect it would be possible to construct a D such that the compiler would make poor choices based on assumptions of initialization that are not upheld, even without a direct read to the corresponding uninit memory. I think the read is just where Miri happens to detect the UB.

If we don't do Default, we might have to do data: Option<D> so that we can make data the well-defined value of None, and then make the slice with uninitialized nodes:

        let buffer = unsafe {
            let unint_buffer = slice::from_raw_parts_mut::<'a, Node<D>>(
                slice as *mut [u8] as *mut Node<D>,
                slice.len() / mem::size_of::<Node<D>>(),
            );
            for i in 0..unint_buffer.len() {
                // new_empty makes a Node with data = None, and everything else defaults.
                unint_buffer.as_mut_ptr().add(i).write(Node::new_empty());
            }
            // Hand the now-initialized slice back as the value of the block.
            unint_buffer
        };
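A rough sketch of how the Option<D> idea could look end to end, assuming a simplified two-field `Node<D>` (the real type has more fields). `init_empty`, `NoDefault`, and `demo_empty_nodes` are hypothetical names, and the slots are typed as MaybeUninit so that no reference to an uninitialized Node<D> is formed.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical simplified node; the real `Node<D>` has more fields.
struct Node<D> {
    data: Option<D>,
    right: Cell<*mut Node<D>>,
}

impl<D> Node<D> {
    /// Assumed constructor, named after the comment above: no `D` is required.
    fn new_empty() -> Self {
        Node { data: None, right: Cell::new(ptr::null_mut()) }
    }
}

/// Fills every slot with an empty node, then exposes the slots as `Node<D>`.
fn init_empty<D>(slots: &mut [MaybeUninit<Node<D>>]) -> &mut [Node<D>] {
    for slot in slots.iter_mut() {
        slot.write(Node::new_empty());
    }
    // SAFETY: every element was written above, and `MaybeUninit<T>` has the
    // same layout as `T`, so the pointer cast is sound.
    unsafe { &mut *(slots as *mut [MaybeUninit<Node<D>>] as *mut [Node<D>]) }
}

/// A payload type with no `Default` impl still works with this approach.
struct NoDefault(u32);

fn demo_empty_nodes() -> (usize, bool, u32) {
    let mut storage: Vec<MaybeUninit<Node<NoDefault>>> =
        (0..4).map(|_| MaybeUninit::uninit()).collect();
    let nodes = init_empty(&mut storage);
    let all_empty = nodes.iter().all(|n| n.data.is_none() && n.right.get().is_null());
    nodes[0].data = Some(NoDefault(7));
    (nodes.len(), all_empty, nodes[0].data.as_ref().map_or(0, |d| d.0))
}
```

The trade-off is that every later access to `data` has to go through the Option, even where the node is known to be in use.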

Contributor Author

Shall I redo the PR switching to the D: Default trait method?

Contributor

@joschock joschock Dec 4, 2025

I want other opinions, but mine is that Default is unnecessarily constraining on D when what we are trying to solve is essentially an internal problem of the Node implementation. I would go with the Option<D> approach instead.

I also need to study more, but I suspect we need to make the unint_buffer a slice of MaybeUninit<Node<D>> as well, and initialize it before transmuting it to a slice of Node<D>. I think (though my confidence is not great) that simply doing a slice from raw parts on uninitialized Node<D> might be UB.

Collaborator

@Javagedes Javagedes Dec 4, 2025

There are many options I would like to investigate before considering Node<Option<D>>. Using option is going to have multiple issues:

  1. I'm guessing that is going to get you stuck with more Cell wrappers because you may not have mutable access to the node to set D once we've initialized it.
  2. It's going to make you have to unwrap D even though we know it exists
  3. It is most likely going to change the memory layout, as Rust will need to add a discriminant for the Option.

I would like to see usage of MaybeUninit at all levels investigated, e.g. MaybeUninit<[Node<D>]>, &[MaybeUninit<Node<D>>], and even Node<MaybeUninit<D>>. At a bare minimum, MaybeUninit<T> is guaranteed to have the same size and alignment as T.
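The layout point can be checked directly. A small sketch, with hypothetical helper names `layout_sizes` and `same_align`, comparing a type against its MaybeUninit and Option wrappers:

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Returns (size of `T`, size of `MaybeUninit<T>`, size of `Option<T>`).
fn layout_sizes<T>() -> (usize, usize, usize) {
    (size_of::<T>(), size_of::<MaybeUninit<T>>(), size_of::<Option<T>>())
}

/// True when `MaybeUninit<T>` has the same alignment as `T`.
fn same_align<T>() -> bool {
    align_of::<T>() == align_of::<MaybeUninit<T>>()
}
```

For a payload with no spare bit patterns, such as u64, the Option version is larger because it has to store a discriminant. For a payload with a niche, such as a reference, Option costs nothing extra, so whether Option<D> changes the node layout depends on D.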

@garybeihl garybeihl force-pushed the fix-node-uninit branch 4 times, most recently from f0178c1 to b1385f8 Compare December 9, 2025 00:26
@garybeihl garybeihl force-pushed the fix-node-uninit branch 2 times, most recently from ae5dcec to 7fcea9b Compare December 16, 2025 00:41
@Javagedes
Collaborator

Hi @garybeihl , I noticed you've been keeping this PR up to date, but not working on it (which is completely fine).

I just wanted to make sure you were not waiting on additional information from any of us that have commented on this PR (i.e. make sure we are not blocking you in any way)!

@garybeihl
Contributor Author

No - not blocked - I uploaded a version of the changes that uses MaybeUninit and D: Default - just waiting for further comments or change requests. If you could have a look when you get a chance, that would be great - thanks!
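For readers following along, the MaybeUninit plus D: Default combination might look roughly like this. It is a sketch under assumptions, not the uploaded change: `Node<D>` is simplified to two fields, `init_nodes` and `demo_init_nodes` are hypothetical names, and the explicit alignment handling is added here only so the example is sound for an arbitrary byte buffer.

```rust
use std::cell::Cell;
use std::mem::{self, MaybeUninit};
use std::ptr;
use std::slice;

/// Hypothetical simplified node; the real `Node<D>` has more fields.
struct Node<D> {
    data: D,
    right: Cell<*mut Node<D>>,
}

impl<D> Node<D> {
    fn new(data: D) -> Self {
        Node { data, right: Cell::new(ptr::null_mut()) }
    }
}

/// Carves a byte buffer into fully initialized nodes.
/// Note: the nodes are never dropped, since the backing storage is plain bytes.
fn init_nodes<D: Default>(bytes: &mut [u8]) -> &mut [Node<D>] {
    let align = mem::align_of::<Node<D>>();
    let size = mem::size_of::<Node<D>>();
    // Skip leading bytes until the address is aligned for `Node<D>`.
    let skip = (align - (bytes.as_ptr() as usize) % align) % align;
    let count = bytes.len().saturating_sub(skip) / size;
    if count == 0 {
        return &mut [];
    }
    // SAFETY: the region is in bounds, aligned, and exclusively borrowed.
    // `MaybeUninit<Node<D>>` places no validity requirement on its bytes.
    let uninit: &mut [MaybeUninit<Node<D>>] = unsafe {
        slice::from_raw_parts_mut(
            bytes.as_mut_ptr().add(skip).cast::<MaybeUninit<Node<D>>>(),
            count,
        )
    };
    for slot in uninit.iter_mut() {
        slot.write(Node::new(D::default()));
    }
    // SAFETY: every element was written in the loop above.
    unsafe { &mut *(uninit as *mut [MaybeUninit<Node<D>>] as *mut [Node<D>]) }
}

fn demo_init_nodes() -> (bool, bool) {
    let mut bytes = vec![0xAAu8; 256];
    let nodes = init_nodes::<u32>(&mut bytes);
    // Safe to call now: `Cell::set` reads an initialized old value.
    nodes[0].right.set(ptr::null_mut());
    let all_default = nodes.iter().all(|n| n.data == 0 && n.right.get().is_null());
    (!nodes.is_empty(), all_default)
}
```

The slice of Node<D> only comes into existence after the loop, so no reference to an uninitialized node is ever formed.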

@garybeihl garybeihl force-pushed the fix-node-uninit branch 2 times, most recently from b329553 to a3c3321 Compare December 22, 2025 13:57
Fixes an undefined behavior issue where Cell::set() reads uninitialized
memory during linked list creation in Storage::resize().

Root Cause:
- Cell::set() internally uses mem::replace(), which reads the old value
  before writing the new one.
- When Storage::resize() allocates new nodes and calls operations that use
  Cell::set(), the Cell fields contain uninitialized memory.
- Reading uninitialized memory is undefined behavior, even if immediately
  overwritten. Unwanted compiler "optimizations" could follow.

Impact:
- Any package using patina_internal_collections
- Affects BOTH resize paths: capacity == 0 AND capacity > 0
- Potential memory corruption and non-deterministic errors
- Detected by Miri testing (issue OpenDevicePartnership#560)

Fix:
- Use MaybeUninit<Node<D>> to explicitly represent uninitialized memory
- Add `D: Default` trait bound to Storage, Rbt, and Bst
- Initialize all nodes using MaybeUninit::write(Node::new(D::default()))
- Convert to initialized slice only after all nodes are initialized
- This avoids creating references to uninitialized Node<D> (which could be UB)
- Added Default implementations for MemoryBlock and IoBlock in patina_dxe_core

Introduced by:
- PR OpenDevicePartnership#1050 (Nov 13, 2025) Replace AtomicPtr with Cell in patina_internal_collections
- AtomicPtr::store() writes without reading the old value, but Cell::set()
  uses mem::replace() which reads before writing

Testing:
- Tested with: cargo +nightly-2025-09-19 miri test -p patina_dxe_core
- allocate_deallocate_test: passes under Miri with no UB errors
- All 47 tests in patina_internal_collections pass
- Verified compilation of patina_dxe_core with new trait bounds

Addresses reviewer feedback:
- Use MaybeUninit to avoid UB from references to uninitialized data
- Combine with D: Default for clean initialization
- Avoids complexity of Option<D> while maintaining safety guarantees

Related to OpenDevicePartnership#560