
Commit 415125f

RFC: Promote the libc crate from the nursery
Move the `libc` crate into the `rust-lang` organization after applying changes such as:

* Remove the internal organization of the crate in favor of just one flat namespace at the top of the crate.
* Set up a large number of CI builders to verify FFI bindings across many platforms in an automatic fashion.
* Define the scope of libc in terms of bindings it will provide for each platform.
1 parent 0599646 commit 415125f

1 file changed (+308 additions, -0 deletions)


text/0000-promote-libc.md

- Feature Name: N/A
- Start Date: 2015-09-21
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Promote the `libc` crate from the nursery into the `rust-lang` organization
after applying changes such as:

* Remove the internal organization of the crate in favor of just one flat
  namespace at the top of the crate.
* Set up a large number of CI builders to verify FFI bindings across many
  platforms in an automatic fashion.
* Define the scope of libc in terms of bindings it will provide for each
  platform.

# Motivation

Unfortunately, the current `libc` crate is a bit of a mess, having long since
departed from its original organization and scope of definition. As more
platforms and more APIs in general have been added over time, both the internal
and the external-facing organization have become muddled. Some specific
concerns related to organization are:

* There is a vast amount of duplication between platforms that share common
  definitions. For example, all BSD-like platforms end up defining a similar set
  of networking structs and constants with the same definitions, duplicated in
  many locations.
* Only a subset of `libc` is reexported at the top level via globs; the rest of
  `libc` is not reexported in this fashion.
* When adding new APIs, it's unclear which modules they should be placed in. The
  API being added does not always conform to one of the existing standards that
  a module exists for, and it's not always easy to consult the standard itself
  to see whether the API is part of it.
* Adding a new platform to liblibc largely entails copying a huge amount of code
  from a similar existing platform and placing it at a new location in the file.

Additionally, on the technical and tooling side, some concerns are:

* None of the FFI bindings in this crate are verified by tests. They are neither
  automatically generated nor verified, so it's highly likely that there are a
  good number of mistakes throughout.
* It's very difficult to explore the documentation for libc on different
  platforms, even though this is often one of the more important libraries to
  have documented across all platforms.

This RFC largely proposes a reorganization of the libc crate, along with tweaks
to some of the more mundane details such as internal organization, CI
automation, and how new additions are accepted. These changes should all help
push `libc` to a more robust position where it can be trusted across all
platforms, both now and into the future!

# Detailed design

The whole design can be previewed in an [in-progress fork][libc] available on
GitHub. Additionally, all mentions of the `libc` crate in this RFC refer to the
external copy on crates.io, not the in-tree one in the `rust-lang/rust`
repository. No changes (e.g. stabilization) are being proposed for the in-tree
copy.

[libc]: https://github.com/alexcrichton/libc

### What is this crate?

The primary purpose of this crate is to provide all of the definitions
necessary to easily interoperate with C code (or "C-like" code) on each of the
platforms that Rust supports. This includes type definitions (e.g. `c_int`),
constants (e.g. `EINVAL`), and function headers (e.g. `malloc`).
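
As a rough illustration of these three kinds of items, the hand-written sketch below declares one of each, plus `free`. The module name `mini_libc` is hypothetical. It stands in for the crate so that the example is self-contained. The definitions assume an x86_64 Linux target and are not taken from the crate's actual source.

```rust
use std::ffi::c_void;

// Hypothetical stand-in for the `libc` crate, assuming x86_64 Linux.
mod mini_libc {
    use std::ffi::c_void;

    // Type definitions mirror C typedefs and primitive types.
    #[allow(non_camel_case_types)]
    pub type c_int = i32;
    #[allow(non_camel_case_types)]
    pub type size_t = usize;

    // Constants mirror `#define`s: Linux's <errno.h> defines EINVAL as 22.
    pub const EINVAL: c_int = 22;

    // Function headers are declarations only; the symbols are resolved
    // against the system C library that Rust programs already link to.
    extern "C" {
        pub fn malloc(size: size_t) -> *mut c_void;
        pub fn free(ptr: *mut c_void);
    }
}

/// Allocates a C buffer, writes `value` into its first byte, reads it
/// back, and frees the buffer again.
fn roundtrip_through_malloc(value: u8) -> u8 {
    unsafe {
        let p = mini_libc::malloc(16) as *mut u8;
        assert!(!p.is_null(), "malloc failed");
        p.write(value);
        let read = p.read();
        mini_libc::free(p as *mut c_void);
        read
    }
}

fn main() {
    println!("EINVAL = {}", mini_libc::EINVAL);
    println!("roundtrip = {}", roundtrip_through_malloc(42));
}
```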

One question that typically comes up with this sort of purpose is whether the
crate is "cross platform" in the sense that it basically just works across the
platforms it supports. The `libc` crate, however, **is not intended to be cross
platform**, but rather the opposite: an exact binding to the platform in
question. In essence, the `libc` crate is intended as a "replacement for
`#include` in Rust" for traditional system header files, and it makes no
effort to help portability by tweaking type definitions and signatures.
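
To make the distinction concrete, consider C's `long`. It is 64 bits on LP64 Unix targets but 32 bits on 64-bit Windows (LLP64) and on 32-bit targets. The illustrative, hand-written sketch below covers only 32- and 64-bit targets. It shows how an exact binding simply follows each platform and leaves portability to the caller:

```rust
// Hand-written illustration: the same C type gets a per-target definition.
#[allow(non_camel_case_types)]
#[cfg(all(target_pointer_width = "64", not(windows)))]
pub type c_long = i64; // LP64 Unix-like targets

#[allow(non_camel_case_types)]
#[cfg(any(target_pointer_width = "32", windows))]
pub type c_long = i32; // 32-bit targets and 64-bit Windows (LLP64)

/// Portability is the caller's job: widening into `i64` compiles on every
/// target above because both `i64: From<i32>` and `i64: From<i64>` hold.
fn widen(x: c_long) -> i64 {
    i64::from(x)
}

fn main() {
    println!("c_long is {} bytes here", std::mem::size_of::<c_long>());
    println!("{}", widen(-1));
}
```

Code that instead assumed `c_long` is always `i64` would compile on Linux but fail on Windows. That outcome is the expected consequence of an exact binding, not a bug in it.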

### The Home of `libc`

Currently this crate resides inside the main `rust` repo of the `rust-lang`
organization, but this unfortunately hinders its development somewhat: PRs take
a while to land, and releases aren't as quick as for external repositories. As
a result, this RFC proposes having the crate reside externally, in its own
repository in the `rust-lang` organization, so that additions can be made
through PRs that are tested much more quickly.

The main repository will have a submodule pointing at the external repository
so that it can continue building libstd.

### Public API

The `libc` crate will hide all of its internal organization from its users. All
items will be reexported at the top level as part of a flat namespace. This
brings with it a number of benefits:

* The internal structure can evolve over time to better fit new platforms
  while remaining backwards compatible.
* This design matches what one would expect from C, where there's only a flat
  namespace available.
* Finding an API is easy, as the answer is "it's always at the root".
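
In Rust, this kind of facade can be built from private modules whose public items are glob-reexported upwards. The following is a minimal sketch. The module names (`mini_libc`, `unix`, `linux`) are hypothetical, and the constant values are assumed from Linux on x86_64:

```rust
// Hypothetical crate root, written as a module so the example is self-contained.
mod mini_libc {
    // Internal modules are private, so users can never name them...
    mod unix {
        pub const EINVAL: i32 = 22;

        mod linux {
            pub const O_CLOEXEC: i32 = 0o2000000;
        }

        // ...but every module reexports the public contents of its children...
        pub use self::linux::*;
    }

    // ...and the root reexports everything, yielding one flat namespace.
    pub use self::unix::*;
}

fn main() {
    // `mini_libc::unix::EINVAL` would be a privacy error; the root path works.
    println!("{} {:#o}", mini_libc::EINVAL, mini_libc::O_CLOEXEC);
}
```

Because users only ever write `mini_libc::ITEM`, the private modules can be split, merged, or renamed later without breaking anyone.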

A downside of this approach, however, is that the public API of `libc` will be
platform-specific (i.e. the set of symbols it exposes differs across
platforms), which is uncommon in the rest of the Rust ecosystem today. This can
be mitigated by clearly documenting that this is a platform-specific library,
in the sense that it matches what you'd get if you were writing C code across
multiple platforms.

The API itself will include any number of definitions typically found in C
header files, such as:

* C types, e.g. typedefs, primitive types, structs, etc.
* C constants, e.g. `#define` directives
* C statics
* C functions (their headers)
* C macros (exported as `#[inline]` functions in Rust)
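
For example, `<sys/wait.h>` provides `WIFEXITED` and `WEXITSTATUS` as macros. The sketch below shows how they could be exposed as `#[inline]` functions. It is not the crate's actual code. It assumes glibc's encoding of the wait status, with the terminating signal in the low 7 bits and the exit code in bits 8 to 15. Other platforms may encode the status differently.

```rust
use std::os::raw::c_int;

/// Equivalent of `WIFEXITED(status)`: true if the child exited normally,
/// i.e. the low 7 bits (the terminating signal) are zero.
#[allow(non_snake_case)]
#[inline]
pub fn WIFEXITED(status: c_int) -> bool {
    (status & 0x7f) == 0
}

/// Equivalent of `WEXITSTATUS(status)`: the exit code stored in bits 8..16.
#[allow(non_snake_case)]
#[inline]
pub fn WEXITSTATUS(status: c_int) -> c_int {
    (status & 0xff00) >> 8
}

fn main() {
    // A status of 0x0300 encodes a normal exit with code 3.
    let status: c_int = 0x0300;
    if WIFEXITED(status) {
        println!("exited with {}", WEXITSTATUS(status));
    }
}
```

As functions rather than macros, these are type-checked like any other Rust item. `#[inline]` lets them compile down to the same bit operations the C macros expand to.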

As a technical detail, all `struct` types exposed in `libc` will be guaranteed
to implement the `Copy` and `Clone` traits. There will be an optional feature of
the library to implement `Debug` for all structs, but it will be turned off by
default.
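
The following sketch shows what this could look like for one struct. It assumes x86_64 Linux field types. The feature name `extra_traits` is a hypothetical placeholder for the optional `Debug` feature:

```rust
// Assumed for x86_64 Linux, where both `time_t` and C `long` are 64 bits.
#[allow(non_camel_case_types)]
pub type time_t = i64;

/// Mirrors `struct timespec` from <time.h>.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Copy, Clone)] // guaranteed for every struct
#[cfg_attr(feature = "extra_traits", derive(Debug))] // opt-in; feature name is hypothetical
pub struct timespec {
    pub tv_sec: time_t,
    pub tv_nsec: i64, // C `long` on the assumed target
}

fn main() {
    let a = timespec { tv_sec: 1, tv_nsec: 500 };
    // Because `timespec` is `Copy`, `a` is still usable after this assignment.
    let b = a;
    println!("{} {}", a.tv_sec + b.tv_sec, a.tv_nsec);
}
```

Without the feature enabled, formatting a `timespec` with `{:?}` would not compile. Only crates that opt in get the `Debug` implementations.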

### Changes from today

The [in-progress][libc] implementation of this RFC has a number of API changes
and breakages from today's `libc` crate. Almost all of them are minor and
targeted at making the bindings more correct in terms of faithfully
representing the underlying platforms.

There is, however, one notable large change from today's crate: the `size_t`,
`ssize_t`, `ptrdiff_t`, `intptr_t`, and `uintptr_t` types are all defined in
terms of `isize` and `usize` instead of fixed-size integer types. This change,
brought up by @briansmith on [#28096][isizeusize], reduces the number of casts
necessary in normal code and matches the existing definitions on all platforms
that `libc` supports today. If a platform is added in the future where these
definitions are not correct, different definitions will simply be provided for
that target platform (and casts will be necessary when targeting it).

[isizeusize]: https://github.com/rust-lang/rust/pull/28096
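
To illustrate the reduced casting, here is a sketch with a hand-written declaration of `strlen`. This is not the crate's actual source. Here `size_t` is an alias for `usize`, as proposed:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// As proposed: `size_t` is an alias for `usize` rather than for `u32`/`u64`.
#[allow(non_camel_case_types)]
pub type size_t = usize;

extern "C" {
    fn strlen(s: *const c_char) -> size_t;
}

/// Returns the length of `s` as computed by the C library.
fn c_strlen(s: &CStr) -> usize {
    // SAFETY: `s` is a valid NUL-terminated string for the whole call.
    // No `as usize` is needed because `size_t` *is* `usize`.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    let n = c_strlen(s);
    // The result can be used directly as a slice bound.
    println!("{:?}", &s.to_bytes()[..n]);
}
```

With a fixed-width definition such as `u64`, the call would instead need an explicit `as usize` before its result could be used as a slice index.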

Note that part of this change depends on removing the compiler's
warn-by-default lint about `isize` and `usize` being used in FFI definitions.
This lint is mostly a holdover from when the types were named `int` and `uint`,
which made them easy to confuse with C's `int` and `unsigned int` types.

The final change to the `libc` crate will be to bump its version to 1.0.0,
signifying both that breakage has happened (a bump from 0.1.x) and that the
interface will remain stable until 2.0.0.

### Scope of `libc`

The name "libc" is a little nebulous as to what it means across platforms. It
is clear, however, that this library must have a well-defined scope that limits
how far it can expand, to ensure that it doesn't start pulling in dozens of
runtime dependencies to bind every system API available.

Unfortunately, this library also can't be "just libc" in the sense of "just
libc.so on Linux", for example. On Linux this would omit common APIs like
pthreads, while on platforms like MUSL pthreads would still be included (there
it literally lives inside libc.a). Additionally, the purpose of libc isn't to
provide a cross-platform API, so there isn't necessarily one true definition of
the set of symbols that `libc` will export.

In order to have a well-defined scope while satisfying these constraints, this
RFC proposes that the crate's scope be defined separately for each platform
that it targets. The proposals are:

* Linux (and other Unix-like platforms) - the libc, libm, librt, libdl, and
  libpthread libraries. Other Unix-like platforms may also include additional
  libraries if those provide symbols that are found in these libraries on Linux.
* OSX - the common library to link to on this platform is libSystem, but this
  transitively brings in quite a few dependencies, so this crate will refine
  what it depends upon from libSystem a little further, specifically:
  libsystem\_c, libsystem\_m, libsystem\_pthread, libsystem\_malloc and libdyld.
* Windows - the VS CRT libraries. This library is intended to be distinct from
  the `winapi` crate as well as from bindings to common system DLLs found on
  Windows, so the current scope of `libc` will be pared back to just what the
  CRT contains. Notably, this means that a large amount of the current contents
  will be removed on Windows.

When a new platform is added to `libc`, the set of libraries that `libc` will
link to and bind on that platform can be decided at that time.
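
Within the bindings, a per-platform scope like this mostly shows up in which libraries get linked. Here is a minimal sketch, not taken from the crate. It assumes that on Linux `cos` is taken from libm. On macOS (libSystem) and Windows (the CRT), it assumes the libraries Rust already links by default provide `cos`.

```rust
// On Linux the math functions live in libm, so it is named explicitly;
// elsewhere this sketch relies on the default system libraries.
#[cfg_attr(target_os = "linux", link(name = "m"))]
extern "C" {
    fn cos(x: f64) -> f64;
}

/// Safe wrapper around the C library's `cos`.
fn c_cos(x: f64) -> f64 {
    // SAFETY: `cos` accepts any `double` and has no other preconditions.
    unsafe { cos(x) }
}

fn main() {
    println!("{}", c_cos(0.0));
}
```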

### Internal structure

The primary change being made is that the crate will no longer be one large file
sprinkled with `#[cfg]` annotations. Instead, the crate will be split into a
tree of modules, and all modules will reexport the entire contents of their
children. Unlike in most libraries, however, most modules in `libc` will be
hidden via `#[cfg]` at compile time. Each platform supported by `libc` will
correspond to a path from a leaf module to the root, picking up more
definitions, types, and constants as the tree is traversed upwards.

This organization provides a simple method of deduplication between platforms.
For example, `libc::unix` contains functions found across all Unix platforms,
whereas `libc::unix::bsd` is a refinement indicating that the APIs within are
common to all BSD-like platforms (they may or may not also be present on
non-BSD platforms). The benefits of this structure are:

* For any particular platform, it's easy to look up a definition's value in the
  source: simply trace the path from the leaf to the root (i.e. up the
  filesystem structure) until the definition is found.
* When adding an API, it's easy to know **where** the API should be added,
  because each node in the module hierarchy corresponds clearly to some subset
  of platforms.
* Adding new platforms should be a relatively simple and confined operation.
  New leaves of the hierarchy would be created, and some definitions higher up
  may be pushed down to lower levels if the APIs differ or aren't present on the
  new platform. Even so, it should be easy to audit that a new platform doesn't
  tamper with older ones.
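
The sketch below shows the shape of such a tree. The module names are hypothetical. It uses a single constant whose value differs between leaves: `EAGAIN` is 11 on Linux and in the Windows CRT, and 35 on macOS and the BSDs. `EINVAL`, which every Unix leaf here shares, sits one level up. Only the path for the current target is compiled, and Unix targets other than those listed are not covered.

```rust
// Hypothetical crate root; module names and coverage are illustrative only.
mod mini_libc {
    #[cfg(unix)]
    mod unix {
        // Shared by every Unix leaf in this sketch, so it lives at this level.
        pub const EINVAL: i32 = 22;

        // Leaf for Linux.
        #[cfg(target_os = "linux")]
        mod linux {
            pub const EAGAIN: i32 = 11;
        }
        #[cfg(target_os = "linux")]
        pub use self::linux::*;

        // Leaf for BSD-like targets (other BSDs omitted for brevity).
        #[cfg(any(target_os = "macos", target_os = "freebsd"))]
        mod bsd {
            pub const EAGAIN: i32 = 35;
        }
        #[cfg(any(target_os = "macos", target_os = "freebsd"))]
        pub use self::bsd::*;
    }
    #[cfg(unix)]
    pub use self::unix::*;

    // A separate subtree for Windows, using the CRT's <errno.h> values.
    #[cfg(windows)]
    mod windows {
        pub const EINVAL: i32 = 22;
        pub const EAGAIN: i32 = 11;
    }
    #[cfg(windows)]
    pub use self::windows::*;
}

fn main() {
    // Whatever the target, users see one flat namespace.
    println!("EINVAL = {}, EAGAIN = {}", mini_libc::EINVAL, mini_libc::EAGAIN);
}
```

On a Unix target, `std::io::Error::from_raw_os_error(mini_libc::EAGAIN).kind()` should report `ErrorKind::WouldBlock`. This gives a cheap way to spot-check a leaf's value against the standard library.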

### Testing

The current set of bindings in the `libc` crate suffers from the drawback that
they are not verified. This is often a pain point for new platforms: when
copying from an existing platform, it's easy to forget to update a constant
here or there. This lack of testing leads to problems like a [wrong definition
of `ioctl`][ioctl], which in turn led to [backwards compatibility
problems][backcompat] when the API was fixed.

[ioctl]: https://github.com/rust-lang/rust/pull/26809
[backcompat]: https://github.com/rust-lang/rust/pull/27762

To solve this problem altogether, the libc crate will be enhanced with the
ability to automatically test the FFI bindings it contains. Because the crate
will live in its own repository in `rust-lang` instead of in the `rust` repo
itself, it can leverage external CI systems like Travis CI and AppVeyor to
perform these tasks.

The [current implementation][ctest] of the binding tests verifies properties
such as type size and alignment, struct field offsets, struct field types,
constant values, and function definitions. Over time it can be enhanced with
more metrics and properties to test.

[ctest]: https://github.com/alexcrichton/ctest
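
Roughly speaking, ctest generates both C and Rust test code and compares what each side reports. The sketch below is not ctest's code. It only illustrates the Rust half of one such comparison, for `struct timeval`. The C-side numbers are hard-coded for x86_64 Linux as an assumption; a real harness would obtain them by compiling generated C code against the platform headers. `std::mem::offset_of!` requires Rust 1.77 or later.

```rust
use std::mem::{align_of, offset_of, size_of};

#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Copy, Clone)]
pub struct timeval {
    pub tv_sec: i64,  // time_t on x86_64 Linux
    pub tv_usec: i64, // suseconds_t on x86_64 Linux
}

/// Layout facts as a C compiler would report them (hard-coded in this sketch).
struct CLayout {
    size: usize,
    align: usize,
    offsets: &'static [(&'static str, usize)],
}

/// Compares the Rust definition of `timeval` against `expected`,
/// returning one message per mismatch.
fn check_timeval(expected: &CLayout) -> Vec<String> {
    let mut errors = Vec::new();
    if size_of::<timeval>() != expected.size {
        errors.push(format!("size: rust {} != c {}", size_of::<timeval>(), expected.size));
    }
    if align_of::<timeval>() != expected.align {
        errors.push(format!("align: rust {} != c {}", align_of::<timeval>(), expected.align));
    }
    for &(field, c_offset) in expected.offsets {
        let rust_offset = match field {
            "tv_sec" => offset_of!(timeval, tv_sec),
            "tv_usec" => offset_of!(timeval, tv_usec),
            other => {
                errors.push(format!("unknown field {}", other));
                continue;
            }
        };
        if rust_offset != c_offset {
            errors.push(format!("offset of {}: rust {} != c {}", field, rust_offset, c_offset));
        }
    }
    errors
}

fn main() {
    let c_side = CLayout { size: 16, align: 8, offsets: &[("tv_sec", 0), ("tv_usec", 8)] };
    for e in check_timeval(&c_side) {
        println!("mismatch: {}", e);
    }
}
```

Running one such check per type, constant, and function on every CI target is what turns silent copy-paste mistakes into build failures.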

In principle, adding a new platform to `libc` would be blocked until automation
can be set up to ensure that its bindings are correct. Unfortunately, it is not
easy to add this form of automation for all platforms, so this will not be a
requirement (beyond "tier 1 platforms"). There is, however, currently
automation for the following targets through Travis and AppVeyor:

* `{i686,x86_64}-pc-windows-{msvc,gnu}`
* `{i686,x86_64,mips,aarch64}-unknown-linux-gnu`
* `x86_64-unknown-linux-musl`
* `arm-unknown-linux-gnueabihf`
* `arm-linux-androideabi`
* `{i686,x86_64}-apple-{darwin,ios}`

# Drawbacks

### Loss of module organization

The loss of an internal organizational structure can be seen as a drawback of
this design. While perhaps not precisely true today, the principle behind the
structure was that it should be easy to constrain yourself to a particular C
standard or subset of C and, in theory, write "more portable programs by
default" by only using the contents of the respective module. Unfortunately,
in practice this does not seem to be used much, and it's also not clear
whether this can be expressed through headers alone in `libc`. For example,
many platforms have slight tweaks to common structures, definitions, or types
in terms of signedness or value, so even if you restricted yourself to a
particular subset, it's not clear that a program would automatically be more
portable.

That being said, it would still be useful to have these abstractions to *some
degree*, but the flip side is that it's easy to build this sort of layer
externally on crates.io, on top of `libc` as designed here. For example, a
`posix` crate could just depend on `libc` and reexport all the contents for the
POSIX standard, perhaps with tweaked signatures here and there to work better
across platforms.
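
Here is a minimal sketch of such a layer, assuming a Unix target. Both module names are hypothetical, and `mini_libc` stands in for the flat `libc` namespace:

```rust
// Stand-in for the flat `libc` namespace (assumes a Unix target).
mod mini_libc {
    #[allow(non_camel_case_types)]
    pub type pid_t = i32;
    pub const EINVAL: i32 = 22;

    extern "C" {
        pub fn getpid() -> pid_t;
    }
}

// Hypothetical `posix` layer built on top of it.
mod posix {
    // Items that need no adjustment are simply reexported...
    pub use crate::mini_libc::{pid_t, EINVAL};

    // ...while others get a tweaked signature. POSIX specifies that `getpid`
    // always succeeds, so a safe wrapper returning an unsigned id is sound.
    pub fn getpid() -> u32 {
        unsafe { crate::mini_libc::getpid() as u32 }
    }
}

fn main() {
    println!("pid = {}, EINVAL = {}", posix::getpid(), posix::EINVAL);
}
```

Because the layer lives in its own crate, it can make opinionated choices like this safe wrapper without constraining `libc`'s exact bindings.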

### Loss of Windows bindings

By only exposing the CRT functions on Windows, the contents of `libc` will be
trimmed down considerably, which means that crates accessing functions like
`send` or `connect` will be required to link to at least two libraries.

This is also a bit of a maintenance burden on the standard library itself, as
it means that all the bindings it uses must move to
`src/libstd/sys/windows/c.rs` in the immediate future.

# Alternatives

* Instead of *only* exporting a flat namespace, the `libc` crate could also,
  optionally, do what it does today and reexport modules corresponding to
  various C standards. The downside to this, unfortunately, is that it's
  unclear how much portability using these standards actually buys you.

* The crate could be split up into multiple crates that correspond exactly to
  system libraries, but this has the downside that using common functions
  available on both OSX and Linux would require at least two `extern crate`
  directives and dependencies.

# Unresolved questions

* The only platforms currently without automation are the BSD-like platforms
  (e.g. FreeBSD, OpenBSD, Bitrig, DragonFly, etc.), but if it were possible to
  set up automation for these, it would be plausible to actually require
  automation for any new platform. Is it possible to do this?

* What is the relationship between `std::os::*::raw` and `libc`? Given that the
  standard library will probably always depend on an in-tree copy of the `libc`
  crate, should `libc` define its own types in this case and have the standard
  library reexport them, with the out-of-tree `libc` in turn reexporting the
  standard library's definitions?

* Should Windows be supported to a greater degree in `libc`? Should this crate
  and `winapi` have a closer relationship?
