Commit 311e6a3

Guarantee slice representation
1 parent b0e56db commit 311e6a3

1 file changed: +163 -0 lines changed

text/0000-guaranteed-slice-repr.md

- Feature Name: guaranteed_slice_repr
- Start Date: 2025-02-18
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC guarantees the in-memory representation of slice and `str` references.
Specifically, `&[T]` is guaranteed to have the same layout as:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}
```

The layout of `&str` is the same as that of `&[u8]`, and the layout of
`&mut str` is the same as that of `&mut [u8]`.
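The sketch below illustrates this correspondence from Rust's side. `to_repr` and `same_size_and_align` are hypothetical helpers, not proposed APIs: the first copies a slice's pointer and length into `Slice<T>` using only `as_ptr` and `len`, and the second checks that size and alignment agree on the current target, which this RFC would turn from an observation into a guarantee.

```rust
use std::mem::{align_of, size_of};

/// The layout this RFC guarantees for `&[T]` (same definition as above).
#[repr(C)]
pub struct Slice<T> {
    pub data: *const T,
    pub len: usize,
}

/// Hypothetical helper: split a slice reference into its pointer/length pair.
/// This field-by-field copy is sound today; the RFC would additionally let
/// foreign code (or a `transmute`) rely on `&[T]` having exactly this layout.
pub fn to_repr<T>(s: &[T]) -> Slice<T> {
    Slice { data: s.as_ptr(), len: s.len() }
}

/// Hypothetical helper: compare size and alignment of `&[T]` and `Slice<T>`
/// on this target. Under the RFC, both comparisons must always hold.
pub fn same_size_and_align<T>() -> bool {
    size_of::<&[T]>() == size_of::<Slice<T>>()
        && align_of::<&[T]>() == align_of::<Slice<T>>()
}
```

For example, `to_repr("hello".as_bytes())` yields a `data` pointer equal to `"hello".as_ptr()` and a `len` of 5, consistent with `&str` being laid out like `&[u8]`.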

# Motivation
[motivation]: #motivation

This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing
slices and to declare slice fields or locals.

For example, guaranteeing the representation of slices allows non-Rust code to
read from the `data` or `len` fields of `string` in the type below without
intermediate FFI calls into Rust:

```rust
#[repr(C)]
struct HasString {
    string: &'static str,
}
```

Note: without this RFC, the type above is not truly `repr(C)`, since the layout
(including the size and alignment) of slice references is not guaranteed.
However, the Rust compiler accepts the `repr(C)` declaration above without warning.
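To make the motivation concrete, the sketch below reads a `HasString` the way foreign code would: through an independently declared pointer/length struct rather than through Rust accessors. It is written in Rust to keep one language; `StrRef`, `HasStringMirror`, and `read_like_c` are hypothetical names standing in for a C header and C code. The pointer cast is only justified once this RFC guarantees the layout; current rustc happens to use this layout, but nothing promises it today.

```rust
/// The type from the motivation section.
#[repr(C)]
pub struct HasString {
    pub string: &'static str,
}

/// Hypothetical mirror of what a C header could declare once the layout is
/// guaranteed, e.g. `struct StrRef { const uint8_t *data; size_t len; };`.
#[repr(C)]
pub struct StrRef {
    pub data: *const u8,
    pub len: usize,
}

/// Hypothetical mirror of `HasString` with the slice spelled out.
#[repr(C)]
pub struct HasStringMirror {
    pub string: StrRef,
}

/// Reads the string bytes as foreign code would: by reinterpreting the
/// struct and reading the `data` and `len` fields directly.
///
/// # Safety
/// `h` must point to a live `HasString`. The cast relies on the layout
/// guarantee proposed in this RFC.
pub unsafe fn read_like_c(h: *const HasString) -> &'static [u8] {
    unsafe {
        let m = &*(h as *const HasStringMirror);
        std::slice::from_raw_parts(m.string.data, m.string.len)
    }
}
```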

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Slice references are represented as a pointer and length pair. Their in-memory
layout is the same as that of a `#[repr(C)]` struct like the following:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}
```

For such a pair to be a valid `&[T]`, it must meet the same requirements as [those
documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html).
Namely:

* `data` must be non-null, [valid] for reads for `len * mem::size_of::<T>()` many bytes,
  and it must be properly aligned. This means in particular:

  * The entire memory range of this slice must be contained within a single allocated object!
    Slices can never span across multiple allocated objects. See the
    [`from_raw_parts` documentation](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#incorrect-usage)
    for an example that incorrectly ignores this.
  * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One
    reason for this is that enum layout optimizations may rely on references
    (including slices of any length) being aligned and non-null to distinguish
    them from other data. You can obtain a pointer that is usable as `data`
    for zero-length slices using [`NonNull::dangling()`].

* `data` must point to `len` consecutive properly initialized values of type `T`.

* The memory referenced by the slice must not be mutated for the duration
  of its lifetime `'a` (as in `&'a [T]`), except inside an `UnsafeCell`.

* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,
  and adding that size to `data` must not "wrap around" the address space.
  See the safety documentation of [`pointer::offset`].

[valid]: https://doc.rust-lang.org/std/ptr/index.html#safety
[`NonNull::dangling()`]: https://doc.rust-lang.org/std/ptr/struct.NonNull.html#method.dangling
[`pointer::offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset
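Some of these requirements can be checked at runtime and some cannot. The sketch below, assuming the `Slice<T>` struct shown above and a hypothetical `checked_from_repr` helper (not a standard library API), rejects null or misaligned pointers, sizes above `isize::MAX`, and address-space wrap-around, and leaves the single-allocation, initialization, and no-mutation requirements to the caller.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Pointer/length pair with the layout described above.
#[repr(C)]
pub struct Slice<T> {
    pub data: *const T,
    pub len: usize,
}

/// Hypothetical helper: checks the runtime-checkable requirements, then
/// builds the slice. Returns `None` if `data` is null or misaligned, if
/// `len * size_of::<T>()` overflows or exceeds `isize::MAX`, or if adding
/// that size to the address would wrap around.
///
/// # Safety
/// Not checked here, so the caller must guarantee them: the range lies in a
/// single allocated object, holds `len` initialized values of type `T`, and
/// is not mutated (outside `UnsafeCell`) for the lifetime `'a`.
pub unsafe fn checked_from_repr<'a, T>(r: Slice<T>) -> Option<&'a [T]> {
    let addr = r.data as usize;
    if r.data.is_null() || addr % align_of::<T>() != 0 {
        return None;
    }
    let size = r.len.checked_mul(size_of::<T>())?;
    if size > isize::MAX as usize || addr.checked_add(size).is_none() {
        return None;
    }
    // SAFETY: the checks above plus the caller's guarantees cover every
    // requirement of `slice::from_raw_parts`.
    Some(unsafe { slice::from_raw_parts(r.data, r.len) })
}
```

With this helper, a zero-length slice built from `NonNull::dangling()` passes the checks, while a null `data` pointer is rejected even when `len` is 0.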

# Drawbacks
[drawbacks]: #drawbacks

## Zero-sized types

One could imagine representing `&[T]` as only `len` for zero-sized `T`.
This proposal would preclude that choice in favor of a standard representation
for slices regardless of the underlying type.

Alternatively, we could choose to guarantee that the data pointer is present if
and only if `size_of::<T>() != 0`. This could break existing code that smuggles
pointers through the `data` value in `from_raw_parts` / `into_raw_parts`.
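The sketch below shows the kind of code the length-only alternative would break: it stores an address in the data pointer of a `&[()]` and reads it back with `as_ptr()`. `smuggle`, `recover`, and `zst_slice_ref_size` are hypothetical helpers written for illustration; on current compilers `&[()]` is still two words wide, so the address survives the round trip.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical helper: stash the address of `addr` in the data pointer of a
/// zero-sized-element slice of length `len`.
pub fn smuggle<'a, U>(addr: &'a U, len: usize) -> &'a [()] {
    // SAFETY: `addr` comes from a live reference, so the pointer is non-null;
    // `()` has size 0 and alignment 1, so the slice covers zero bytes inside
    // `addr`'s allocation and every element is trivially initialized.
    unsafe { slice::from_raw_parts(addr as *const U as *const (), len) }
}

/// Hypothetical helper: recover the stashed address from the data pointer.
pub fn recover(s: &[()]) -> usize {
    s.as_ptr() as usize
}

/// Size of a `&[()]` on this target: data pointer plus length today.
pub fn zst_slice_ref_size() -> usize {
    size_of::<&[()]>()
}
```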

## Compatibility with C++ `std::span`

The largest drawback of this layout and set of validity requirements is that they
may preclude `&[T]` from being representationally equivalent to C++'s
`std::span<T, std::dynamic_extent>`.

* `std::span` does not currently guarantee its layout. In practice, pointer + length
  is the common representation. This is even observable using `is_layout_compatible`
  [on MSVC](https://godbolt.org/z/Y8ardrshY), though not
  [on GCC](https://godbolt.org/z/s4v4xehnG) or
  [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a
  different layout in the C++ standard (unlikely due to MSVC ABI stability
  requirements) could preclude matching the layout with `&[T]`.

* Unlike Rust, `std::span` allows its data pointer to be `nullptr`. One
  possible workaround for this would be to guarantee that `Option<&[T]>` uses
  `data: std::ptr::null()` to represent the `None` case, making `std::span<T>`
  equivalent to `Option<&[T]>` for non-zero-sized types.

* Rust uses a dangling pointer in the representation of zero-length slices.
  It's unclear whether

Note that C++ also does not support zero-sized types, so there is no naive way
to represent types like `std::span<SomeZeroSizedRustType>`.
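A minimal sketch of the `Option<&[T]>` workaround follows, assuming a hypothetical `SpanRepr<T>` mirror of `std::span<T>` whose pointer-then-length field order is an assumption rather than anything the C++ standard guarantees. `span_from_option` and `option_from_span` are illustrative names; they encode `None` as a null data pointer, matching a default-constructed `std::span` (null `data()`, `size() == 0`).

```rust
use std::ptr;
use std::slice;

/// Hypothetical pointer + length mirror of a C++ `std::span<T>`.
/// ASSUMPTION: field order; C++ does not guarantee span's layout.
#[repr(C)]
pub struct SpanRepr<T> {
    pub data: *const T,
    pub len: usize,
}

/// Encode `None` as a null data pointer, `Some(s)` as `s`'s pointer/length.
pub fn span_from_option<T>(s: Option<&[T]>) -> SpanRepr<T> {
    match s {
        None => SpanRepr { data: ptr::null(), len: 0 },
        Some(s) => SpanRepr { data: s.as_ptr(), len: s.len() },
    }
}

/// Decode a span; a null data pointer becomes `None`.
///
/// # Safety
/// If `span.data` is non-null, it must satisfy the requirements of
/// `std::slice::from_raw_parts` for the lifetime `'a`.
pub unsafe fn option_from_span<'a, T>(span: SpanRepr<T>) -> Option<&'a [T]> {
    if span.data.is_null() {
        None
    } else {
        Some(unsafe { slice::from_raw_parts(span.data, span.len) })
    }
}
```

In this sketch an empty Rust slice still maps to a non-null, dangling `data` pointer, so C++ code receiving it would see a non-null `data()` with `size() == 0`.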

## Flexibility

Additionally, guaranteeing the layout of Rust-native types limits the ability of
the compiler and standard library to change that layout to take advantage of new
optimization opportunities.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

* We could avoid committing to a particular representation for slices.

* We could try to guarantee layout compatibility with a particular target's
  `std::span` representation, though without standardization this may be
  impossible. Multiple different C++ stdlib implementations may be used on
  the same platform and could potentially have different span representations.
  In practice, current span representations also use pointer + length pairs.

* We could avoid storing a data pointer for zero-sized types. This would result
  in a more compact representation but would mean that the representation of
  `&[T]` depends on the type `T`.

# Prior art
[prior-art]: #prior-art

The layout in this RFC is already documented in
[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html).

# Unresolved questions
[unresolved-questions]: #unresolved-questions

* Should `&[T]` include a pointer when `T` is zero-sized?

# Future possibilities
[future-possibilities]: #future-possibilities

* Consider defining a separate Rust type which is repr-equivalent to the platform's
  native `std::span<T, std::dynamic_extent>` to allow for easier
  interoperability with C++ APIs. Unfortunately, the C++ standard does not
  guarantee the layout of `std::span` (though the representation may be known
  and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).
  Zero-sized types would also not be supported with a naive implementation of
  such a type.
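One possible shape for such a type is sketched below. `CxxSpan` is a hypothetical name, and its pointer-then-size layout is an assumption that would have to be checked against each C++ standard library. The sketch ties the borrow to a lifetime with `PhantomData`, rejects zero-sized element types at construction (since C++ has no equivalent), and stores empty slices with a null pointer, as a default-constructed `std::span` would have.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr;
use std::slice;

/// Hypothetical Rust counterpart of `std::span<T, std::dynamic_extent>`.
/// ASSUMPTION: the target C++ library stores the pointer first, then the size.
#[repr(C)]
pub struct CxxSpan<'a, T> {
    data: *const T,
    size: usize,
    // Ties the span to the borrowed slice without storing a reference.
    _borrow: PhantomData<&'a [T]>,
}

impl<'a, T> CxxSpan<'a, T> {
    /// Borrows a Rust slice as a span.
    ///
    /// Panics for zero-sized `T`, which C++ cannot express. Empty slices are
    /// stored with a null pointer, like a default-constructed `std::span`.
    pub fn new(s: &'a [T]) -> Self {
        assert!(size_of::<T>() != 0, "zero-sized element types are not supported");
        let data = if s.is_empty() { ptr::null() } else { s.as_ptr() };
        CxxSpan { data, size: s.len(), _borrow: PhantomData }
    }

    /// Views the span as a Rust slice, turning a null pointer back into an
    /// empty slice (which uses a dangling, non-null pointer internally).
    pub fn as_slice(&self) -> &'a [T] {
        if self.data.is_null() {
            &[]
        } else {
            // SAFETY: a non-null `data`/`size` pair only comes from a
            // `&'a [T]` passed to `new`.
            unsafe { slice::from_raw_parts(self.data, self.size) }
        }
    }
}
```

Mapping empty slices to null is a design choice made for this sketch; passing Rust's dangling pointer through instead would raise the open question about dangling pointers noted under the drawbacks.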
