
Do typed copies of unions preserve "invalid" bytes? #555

Open
jswrenn opened this issue Feb 6, 2025 · 10 comments
Labels
A-unions Topic: Related to unions

Comments

@jswrenn
Member

jswrenn commented Feb 6, 2025

For zerocopy, @joshlf and I are interested in whether we can soundly round-trip values through a union that aren't bit-valid instances of a non-ZST field; e.g.:

```rust
union Tricky {
    a: bool,
    b: (),
}

fn main() {
    let src = 3u8;

    // Is it sound to do this? Or insta-UB a la transmuting `3` to `bool`?
    let dst: Tricky = unsafe {
        core::mem::transmute::<u8, Tricky>(src)
    };

    // Is it sound to do this? Or are we possibly reading an uninit byte here?
    assert_eq!(src,
        unsafe { core::mem::transmute::<Tricky, u8>(dst) }
    );
}
```

The reason for our concern is that we know typed copies don't have to preserve padding; but will they preserve initialized-but-invalid bytes?
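For contrast, the worry is specifically about *typed* copies of a `Tricky` value. The minimal sketch below is an illustration of our own, not part of the original question: it writes and reads the union's single byte only through raw `u8` pointers, so no typed copy of `Tricky` ever happens and the byte read back is simply whatever was last written to that memory. It reuses the `Tricky` definition above; `roundtrip_in_place` is a hypothetical helper name.

```rust
union Tricky {
    a: bool,
    b: (),
}

/// Hypothetical helper: stores `byte` into a `Tricky` in place and reads it back,
/// without ever moving or copying the union value itself.
fn roundtrip_in_place(byte: u8) -> u8 {
    // Start from the `b: ()` field; the union's one byte is uninitialized here.
    let mut t = Tricky { b: () };
    let p = core::ptr::addr_of_mut!(t).cast::<u8>();
    unsafe {
        // Raw write of an arbitrary byte into the union's storage.
        p.write(byte);
        // Raw read of the same byte; no typed copy of `Tricky` occurs.
        p.read()
    }
}

fn main() {
    assert_eq!(roundtrip_in_place(3), 3);
    println!("ok");
}
```

This sidesteps the question rather than answering it: as soon as `t` is moved or returned by value, the open question about typed copies applies again.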

@RalfJung RalfJung changed the title Do typed copies preserve invalid bytes? Do typed copies of unions preserve invalid bytes? Feb 8, 2025
@RalfJung RalfJung changed the title Do typed copies of unions preserve invalid bytes? Do typed copies of unions preserve "invalid" bytes? Feb 8, 2025
@RalfJung RalfJung added the A-unions Topic: Related to unions label Feb 8, 2025
@RalfJung
Member

RalfJung commented Feb 8, 2025

This is tied up with the broader discussion around the value representation of unions, #438 and #494.

I think for unions that look like MaybeUninit, we have general consensus that non-padding bytes are exactly preserved -- but we don't have stable guarantees in that area and we don't (yet) have a good framework for even making such guarantees.

@joshlf

joshlf commented Apr 28, 2025

What about the specific question on insta-UB? Is it the case that transmuting 3u8 into MaybeUninit<bool> (or MaybeUninit<Tricky> from the example above) is possibly insta-UB, or is it guaranteed to be sound?

I hope it's the latter since we already provide safe APIs for constructing MaybeUninit<T>s with bytes that might not be valid for the destination type (namely, MaybeUninit::zeroed, which can be combined with e.g. NonZeroU8).
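The safe-API argument can be made concrete. `MaybeUninit::zeroed` is a safe function, and for `NonZeroU8` the all-zero byte it produces is not a valid `NonZeroU8`. The sketch below (our own illustration; `zeroed_nonzero_byte` is a hypothetical helper) only observes that byte through a raw `u8` pointer and never calls `assume_init`.

```rust
use core::mem::MaybeUninit;
use core::num::NonZeroU8;

/// Hypothetical helper: returns the raw byte stored in a zeroed `MaybeUninit<NonZeroU8>`.
fn zeroed_nonzero_byte() -> u8 {
    // Safe API: the resulting single byte is 0, which is *not* a valid NonZeroU8.
    let m: MaybeUninit<NonZeroU8> = MaybeUninit::zeroed();
    // Reading the byte as a plain `u8` is fine; `m.assume_init()` here would be UB.
    unsafe { m.as_ptr().cast::<u8>().read() }
}

fn main() {
    assert_eq!(core::mem::size_of::<MaybeUninit<NonZeroU8>>(), 1);
    assert_eq!(zeroed_nonzero_byte(), 0);
    println!("ok");
}
```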

@elichai

elichai commented Apr 28, 2025

> What about the specific question on insta-UB? Is it the case that transmuting 3u8 into MaybeUninit<bool> is possibly insta-UB, or is it guaranteed to be sound?
>
> I hope it's the latter since we already provide safe APIs for constructing MaybeUninit<T>s with bytes that might not be valid for the destination type (namely, MaybeUninit::zeroed, which can be combined with e.g. NonZeroU8).

I don't think that's the question. I think the question is: after you store 3u8 in a MaybeUninit<bool>, can you transmute it back into a u8 and expect to get 3u8?

@joshlf

joshlf commented Apr 28, 2025

I'm specifically referring to this comment in the original post:

> // Is it sound to do this? Or insta-UB a la transmuting `3` to `bool`?

@ia0

ia0 commented Apr 29, 2025

> What about the specific question on insta-UB? Is it the case that transmuting 3u8 into MaybeUninit<bool> (or MaybeUninit<Tricky> from the example above) is possibly insta-UB, or is it guaranteed to be sound?

You're changing the question. In the OP, the first question is about transmuting 3u8 to Tricky. This is different from transmuting to MaybeUninit<bool> (or even MaybeUninit<Tricky>). You're not allowed to look at the definition of MaybeUninit; it's private. All MaybeUninit says is that it's currently a #[repr(transparent)] union (which is currently an unstable concept) and that this may change in the future.

So there would be 4 questions:

  • Transmute 3u8 to Tricky:
    Assuming Tricky is #[repr(transparent)] (which the OP seems to have omitted), and under any reasonable definition of #[repr(transparent)] union, this should be well-defined (not UB): as far as the Abstract Machine (AM) can tell, you could be transmuting to Tricky::b, in which case 3u8 is just a padding byte. In some sense this is a trivial instance of Adopt Minimum Union Validity Rules #494 (which tries to solve the case where a union value is valid even if no single field can explain it, and you have to choose possibly different fields for each byte of the union value).
  • Transmute 3u8 to MaybeUninit<bool>:
    This should follow from the documentation of MaybeUninit and thus be well-defined (although I don't find the documentation very clear about that).
  • What's the value when transmuting back to u8 through Tricky (again assuming it's a #[repr(transparent)])?
    Open question.
  • What's the value when transmuting back to u8 through MaybeUninit<bool>?
    This should follow from the documentation of MaybeUninit and could be well-defined to return 3u8 but I don't see any clear statement about that in the documentation.
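The argument in the first bullet can be phrased as a small model. The sketch below is a hypothetical toy, not the actual Rust rules (which are still being worked out in #494): each abstract byte is either initialized (`Some(b)`) or uninitialized (`None`), and a one-byte union value is accepted if at least one field accepts it. Because `Tricky::b` is a ZST, the union's only byte is padding from that field's point of view, so every byte, including `Some(3)` and `None`, is accepted.

```rust
/// One abstract byte: `Some(b)` if initialized, `None` if uninitialized.
type AbstractByte = Option<u8>;

/// Validity of the `bool` field: the byte must be an initialized 0 or 1.
fn bool_field_accepts(byte: AbstractByte) -> bool {
    matches!(byte, Some(0) | Some(1))
}

/// Validity of the `()` field: it covers no bytes, so the union's byte is
/// padding from its point of view and anything is accepted.
fn unit_field_accepts(_byte: AbstractByte) -> bool {
    true
}

/// Toy rule: a `Tricky` value is valid if *some* field accepts it.
fn tricky_accepts(byte: AbstractByte) -> bool {
    bool_field_accepts(byte) || unit_field_accepts(byte)
}

fn main() {
    // 3u8 is not a valid `bool`, but it is a valid `Tricky` under this toy rule.
    assert!(!bool_field_accepts(Some(3)));
    assert!(tricky_accepts(Some(3)));
    // An uninitialized byte is also accepted, via the `()` field.
    assert!(tricky_accepts(None));
    println!("ok");
}
```

Note that a validity model like this only speaks to the first question; it says nothing about which bytes a typed copy must preserve, which is the open third question.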

@RalfJung
Member

RalfJung commented Apr 29, 2025

It's definitely never UB to transmute anything to MaybeUninit<T>, that is the entire point of that type. It's just hard to fully spell out all those cases in the docs...

In an ideal world, transmuting back would always give you exactly what you started with. In practice, there are some hiccups, which I am trying to resolve but not everyone agrees that's worth a breaking change; see #518. For types T that have no padding, however, a transmute from T to MaybeUninit<T> and back is guaranteed to preserve the value exactly.

EDIT: oh wait, if T has padding that is lost in any copy of T anyway -- the "value" of type T does not contain the padding. So even without resolving #518, you will always get back the original value.
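A sketch of the round trip described above, assuming only what this comment states: for a padding-free `T` such as `u32` the bytes come back exactly, and for a `T` with padding (here a hypothetical `#[repr(C)]` struct `Padded`) the padding may not survive, but the field values, which are all that the value of `T` contains, do. The helper names are illustrative only.

```rust
use core::mem::{transmute, MaybeUninit};

/// Hypothetical example type: one padding byte sits between `a` and `b`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    a: u8,
    b: u16,
}

/// u32 has no padding, so all four bytes are preserved.
fn roundtrip_u32(x: u32) -> u32 {
    unsafe {
        let m = transmute::<u32, MaybeUninit<u32>>(x);
        transmute::<MaybeUninit<u32>, u32>(m)
    }
}

/// The padding byte of `Padded` is not part of its value, so losing it
/// does not change the result; `PartialEq` compares only the fields.
fn roundtrip_padded(p: Padded) -> Padded {
    unsafe {
        let m = transmute::<Padded, MaybeUninit<Padded>>(p);
        transmute::<MaybeUninit<Padded>, Padded>(m)
    }
}

fn main() {
    assert_eq!(roundtrip_u32(0xDEAD_BEEF), 0xDEAD_BEEF);
    let p = Padded { a: 1, b: 0x0203 };
    assert_eq!(roundtrip_padded(p), p);
    println!("ok");
}
```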

@joshlf

joshlf commented Apr 29, 2025

I've put up rust-lang/rust#140463 to document MaybeUninit's bit validity.

> In an ideal world, transmuting back would always give you exactly what you started with. In practice, there are some hiccups, which I am trying to resolve but not everyone agrees that's worth a breaking change; see #518. For types T that have no padding, however, a transmute from T to MaybeUninit<T> and back is guaranteed to preserve the value exactly.
>
> EDIT: oh wait, if T has padding that is lost in any copy of T anyway -- the "value" of type T does not contain the padding. So even without resolving #518, you will always get back the original value.

What about T -> MaybeUninit<U> -> T? I can think of two cases:

  • U contains padding where T doesn't (e.g. T = u32, U = #[repr(C)] (u8, u16))
  • U has stricter validity than T (e.g. T = u8, U = bool)
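The two cases differ in layout, and a quick check makes that visible. In this sketch `Pair` is a hypothetical stand-in for the `#[repr(C)] (u8, u16)` above: tuple types cannot themselves carry `#[repr(C)]`, so a named struct is used instead.

```rust
use core::mem::{offset_of, size_of};

/// Hypothetical stand-in for `#[repr(C)] (u8, u16)`.
#[allow(dead_code)]
#[repr(C)]
struct Pair {
    a: u8,
    b: u16,
}

fn main() {
    // Case 1: same size as u32, but byte 1 of `Pair` is padding,
    // whereas all four bytes of a u32 are initialized data.
    assert_eq!(size_of::<Pair>(), size_of::<u32>());
    assert_eq!(offset_of!(Pair, a), 0);
    assert_eq!(offset_of!(Pair, b), 2); // byte 1 is padding

    // Case 2: no padding difference, only a stricter validity invariant.
    assert_eq!(size_of::<bool>(), size_of::<u8>());
    println!("ok");
}
```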

@RalfJung
Copy link
Member

Validity of U does not matter, but sadly padding does at the moment.
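Taking that answer at face value, the second case (which is also elichai's question earlier in the thread) round-trips exactly, because `bool` has no padding and its stricter validity is never exercised. A minimal sketch relying on this comment rather than on any documented guarantee; `roundtrip_via_maybeuninit_bool` is a hypothetical helper.

```rust
use core::mem::{transmute, MaybeUninit};

/// Round-trips a byte through `MaybeUninit<bool>`. Values like 3 are not valid
/// `bool`s, but the value is never read as a `bool`, only stored and reloaded as `u8`.
fn roundtrip_via_maybeuninit_bool(x: u8) -> u8 {
    unsafe {
        let m = transmute::<u8, MaybeUninit<bool>>(x);
        transmute::<MaybeUninit<bool>, u8>(m)
    }
}

fn main() {
    for x in 0..=u8::MAX {
        assert_eq!(roundtrip_via_maybeuninit_bool(x), x);
    }
    println!("all 256 byte values survived");
}
```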

@joshlf

joshlf commented Apr 29, 2025

Okay, understood. So, in rust-lang/rust#140463 (comment), it should be valid for me to update the wording to clarify that, so long as U does not have uninit bytes where T has init bytes, the round-trip is guaranteed to preserve the value of T?

@RalfJung
Copy link
Member

Let's discuss there.


5 participants