- Feature Name: N/A
- Start Date: 2015-09-21
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Promote the `libc` crate from the nursery into the `rust-lang` organization
after applying changes such as:

* Remove the internal organization of the crate in favor of just one flat
  namespace at the top of the crate.
* Set up a large number of CI builders to verify FFI bindings across many
  platforms in an automatic fashion.
* Define the scope of libc in terms of bindings it will provide for each
  platform.

# Motivation

The current `libc` crate is unfortunately a bit of a mess, having long since
departed from its original organization and scope of definition. As more
platforms have been added over time, as well as more APIs in general, the
internal as well as external facing organization has become a bit muddled. Some
specific concerns related to organization are:

* There is a vast amount of duplication between platforms with some common
  definitions. For example, all BSD-like platforms end up defining a similar set
  of networking struct constants with the same definitions, but duplicated in
  many locations.
* Some subset of `libc` is reexported at the top level via globs, but not all of
  `libc` is reexported in this fashion.
* When adding new APIs, it's unclear which modules they should be placed in. The
  API being added does not always conform to one of the existing standards that
  a module exists for, and it's not always easy to consult the standard itself
  to see whether the API is in it.
* Adding a new platform to liblibc largely entails just copying a huge amount of
  code from some previously similar platform and placing it at a new location in
  the file.

Additionally, on the technical and tooling side of things some concerns are:

* None of the FFI bindings in this module are verified by tests. They are
  neither automatically generated nor verified, so it's highly likely that
  there are a good number of mistakes throughout.
* It's very difficult to explore the documentation for libc on different
  platforms, even though this is often one of the more important libraries to
  have documentation for across all platforms.

The purpose of this RFC is largely to propose a reorganization of the libc
crate, along with tweaks to some of the mundane details such as internal
organization, CI automation, how new additions are accepted, etc. These changes
should all help push `libc` to a more robust position where it can be well
trusted across all platforms both now and into the future!

# Detailed design

The entire design can be previewed in an [in-progress fork][libc] available on
GitHub. Additionally, all mentions of the `libc` crate in this RFC refer to the
external copy on crates.io, not the in-tree one in the `rust-lang/rust`
repository. No changes (e.g. stabilization) are being proposed to the in-tree
copy.

[libc]: https://github.com/alexcrichton/libc

### What is this crate?

The primary purpose of this crate is to provide all of the definitions
necessary to easily interoperate with C code (or "C-like" code) on each of the
platforms that Rust supports. This includes type definitions (e.g. `c_int`),
constants (e.g. `EINVAL`), and function headers (e.g. `malloc`).
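
As a rough illustration of those three kinds of items, the sketch below declares one of each by hand instead of depending on `libc` itself, so it compiles on its own. The `EINVAL` value of 22 is an assumption that matches Linux. The helper `malloc_roundtrip` is hypothetical and exists only to exercise the declarations. The `unsafe extern` block syntax requires Rust 1.82 or newer.

```rust
use std::ffi::c_void;

// Type definition: C's `int`, 32 bits on the targets this RFC lists.
#[allow(non_camel_case_types)]
pub type c_int = i32;

// Type definition: C's `size_t`, declared as `usize`.
#[allow(non_camel_case_types)]
pub type size_t = usize;

// Constant: `EINVAL` from <errno.h>; 22 is the Linux value (assumed here).
pub const EINVAL: c_int = 22;

// Function headers: declarations only. The definitions live in the C library,
// which the Rust standard library typically links against already.
unsafe extern "C" {
    pub fn malloc(size: size_t) -> *mut c_void;
    pub fn free(ptr: *mut c_void);
}

/// Hypothetical helper: allocate `n` bytes through C's allocator, release
/// them again, and report whether the allocation succeeded.
pub fn malloc_roundtrip(n: size_t) -> bool {
    unsafe {
        let p = malloc(n);
        let ok = !p.is_null();
        free(p); // free(NULL) is a no-op in C, so this is fine either way
        ok
    }
}
```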

One question that typically comes up with this sort of purpose is whether the
crate is "cross platform" in the sense that it basically just works across the
platforms it supports. The `libc` crate, however, **is not intended to be cross
platform** but rather the opposite: an exact binding to the platform in
question. In essence, the `libc` crate is intended as a "replacement for
`#include` in Rust" for traditional system header files, but it makes no
effort to help with portability by tweaking type definitions and signatures.
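
To make the "exact binding" point concrete, consider C's plain `char`. It is unsigned on ARM and AArch64 Linux but signed on x86, so a faithful binding gives `c_char` a different Rust type on each target instead of hiding the difference. The sketch below distinguishes only those two cases, and its target list is illustrative rather than complete. The `char_is_signed` helper is hypothetical.

```rust
// Exact binding of C's plain `char`: its signedness follows the platform ABI.
// Only two cases are distinguished here; this target list is illustrative.
#[cfg(all(target_os = "linux", any(target_arch = "arm", target_arch = "aarch64")))]
#[allow(non_camel_case_types)]
pub type c_char = u8; // unsigned on ARM/AArch64 Linux

#[cfg(not(all(target_os = "linux", any(target_arch = "arm", target_arch = "aarch64"))))]
#[allow(non_camel_case_types)]
pub type c_char = i8; // signed on x86 and x86_64, among others

/// Hypothetical helper: code that depends on signedness has to handle each
/// platform separately, exactly as it would in C.
pub fn char_is_signed() -> bool {
    c_char::MIN != 0
}
```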

### The Home of `libc`

Currently this crate resides inside of the main `rust` repo of the `rust-lang`
organization, but this unfortunately somewhat hinders its development, as it
takes a while to land PRs and releases are not as quick as for external
repositories. As a result, this RFC proposes having the crate reside externally
(outside the `rust` repo) in the `rust-lang` organization so that additions can
be made through PRs that are tested much more quickly.

The main repository will have a submodule pointing at the external repository to
continue building libstd.

### Public API

The `libc` crate will hide all internal organization of the crate from users of
the crate. All items will be reexported at the top level as part of a flat
namespace. This brings with it a number of benefits:

* The internal structure can evolve over time to better fit new platforms
  while being backwards compatible.
* This design matches what one would expect from C, where there's only a flat
  namespace available.
* Finding an API is quite easy as the answer is "it's always at the root".
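
A minimal sketch of such a flat namespace follows. The module `mini_libc` is hypothetical and stands in for the real crate. Its internal modules are private, and glob reexports lift every item to the root. Callers only ever name `mini_libc::EXIT_FAILURE`, so the private `consts` module could later be split or moved without breaking them. The values 0 and 1 are the conventional `EXIT_SUCCESS` and `EXIT_FAILURE` values from `<stdlib.h>`.

```rust
// Hypothetical miniature of the flat public API.
pub mod mini_libc {
    // Internal organization: private modules, free to change between releases.
    mod types {
        #[allow(non_camel_case_types)]
        pub type c_int = i32;
    }

    mod consts {
        use super::types::c_int;
        pub const EXIT_SUCCESS: c_int = 0;
        pub const EXIT_FAILURE: c_int = 1;
    }

    // Public API: everything is reexported at the root, so users write
    // `mini_libc::c_int`, never `mini_libc::types::c_int`.
    pub use self::consts::*;
    pub use self::types::*;
}
```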

A downside of this approach, however, is that the public API of `libc` will be
platform-specific (e.g. the set of symbols it exposes is different across
platforms), which isn't seen very commonly throughout the rest of the Rust
ecosystem today. This can be mitigated, however, by clearly indicating that this
is a platform-specific library in the sense that it matches what you'd get if
you were writing C code across multiple platforms.

The API itself will include any number of definitions typically found in C
header files such as:

* C types, e.g. typedefs, primitive types, structs, etc.
* C constants, e.g. `#define` directives
* C statics
* C functions (their headers)
* C macros (exported as `#[inline]` functions in Rust)
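
The last bullet can be illustrated with the `WIFEXITED` and `WEXITSTATUS` macros from `<sys/wait.h>`. The sketch below shows how they might look as `#[inline]` functions. The bit layout is an assumption: it follows the Linux/glibc encoding of a wait status, with the terminating signal in the low 7 bits and the exit code in bits 8 to 15. Other platforms may encode the status differently.

```rust
// C macros from <sys/wait.h> rendered as `#[inline]` Rust functions.
// Assumed encoding (Linux/glibc): the low 7 bits hold the terminating signal
// (0 for a normal exit), and bits 8 to 15 hold the exit code.

/// Equivalent of `WIFEXITED(status)`: true if the child exited normally.
#[allow(non_snake_case)]
#[inline]
pub fn WIFEXITED(status: i32) -> bool {
    (status & 0x7f) == 0
}

/// Equivalent of `WEXITSTATUS(status)`: the child's exit code.
#[allow(non_snake_case)]
#[inline]
pub fn WEXITSTATUS(status: i32) -> i32 {
    (status >> 8) & 0xff
}
```

A caller would pass these functions the `status` value filled in by `waitpid`, just as it would pass it to the macros in C.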

As a technical detail, all `struct` types exposed in `libc` will be guaranteed
to implement the `Copy` and `Clone` traits. There will be an optional feature of
the library to implement `Debug` for all structs, but it will be turned off by
default.
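
The sketch below shows what a struct binding could look like under that rule. `Copy` and `Clone` are always derived, while `Debug` is derived only behind a cargo feature. The feature name `debug_impls` is hypothetical, since the RFC does not name the feature. Declaring both `timeval` fields as `c_long` is also an assumption, though it holds on common unix targets.

```rust
use std::os::raw::c_long;

/// Binding for C's `struct timeval`. `#[repr(C)]` gives it C's field layout.
#[repr(C)]
#[derive(Copy, Clone)]
// `Debug` is opt-in; "debug_impls" is a hypothetical feature name.
#[cfg_attr(feature = "debug_impls", derive(Debug))]
#[allow(non_camel_case_types)]
pub struct timeval {
    pub tv_sec: c_long,  // `time_t`, assumed to be `long` here
    pub tv_usec: c_long, // `suseconds_t`, assumed to be `long` here
}
```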

### Changes from today

The [in-progress][libc] implementation of this RFC has a number of API changes
and breakages from today's `libc` crate. Almost all of them are minor and
targeted at making bindings more correct in terms of faithfully representing the
underlying platforms.

There is, however, one large notable change from today's crate. The `size_t`,
`ssize_t`, `ptrdiff_t`, `intptr_t`, and `uintptr_t` types are all defined in
terms of `isize` and `usize` instead of known sizes. Brought up by @briansmith
on [#28096][isizeusize], this change helps decrease the number of casts
necessary in normal code and matches the existing definitions on all platforms
that `libc` supports today. In the future, if a platform is added where these
type definitions are not correct, then new ones will simply be available for
that target platform (and casts will be necessary if targeting it).

[isizeusize]: https://github.com/rust-lang/rust/pull/28096

Note that part of this change depends upon removing the compiler's
lint-by-default about `isize` and `usize` being used in FFI definitions. This
lint is mostly a holdover from when the types were named `int` and `uint` and it
was easy to confuse them with C's `int` and `unsigned int` types.
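
The reduction in casts can be illustrated with a small sketch. It declares `size_t` as `usize` and binds C's `strlen` by hand, standing in for `libc::strlen`. The length that `strlen` returns can then slice a Rust buffer directly, with no `as usize`. The helper `until_nul` is hypothetical. The `unsafe extern` syntax requires Rust 1.82 or newer.

```rust
use std::os::raw::c_char;

// The proposed definition: `size_t` is simply `usize` on supported targets.
#[allow(non_camel_case_types)]
pub type size_t = usize;

unsafe extern "C" {
    fn strlen(s: *const c_char) -> size_t;
}

/// Hypothetical helper: return the bytes before the first NUL terminator.
pub fn until_nul(bytes: &[u8]) -> &[u8] {
    // strlen reads until it finds a NUL, so one must be inside the slice.
    assert!(bytes.contains(&0), "input must contain a NUL byte");
    let n = unsafe { strlen(bytes.as_ptr().cast::<c_char>()) };
    // `n` is already a `usize`, so it indexes the slice without an `as` cast.
    &bytes[..n]
}
```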

The final change to the `libc` crate will be to bump its version to 1.0.0,
signifying that breakage has happened (a bump from 0.1.x) as well as having a
future-stable interface until 2.0.0.

### Scope of `libc`

The name "libc" is a little nebulous as to what it means across platforms. It
is clear, however, that this library must have a well-defined scope up to which
it can expand to ensure that it doesn't start pulling in dozens of runtime
dependencies to bind all the system APIs that are found.

Unfortunately, however, this library also can't be "just libc" in the sense of
"just libc.so on Linux," for example, as this would omit common APIs like
pthreads on Linux while including them on platforms like MUSL (where they are
literally inside libc.a). Additionally, the purpose of libc isn't to provide a
cross platform API, so there isn't necessarily one true definition in terms of
sets of symbols that `libc` will export.

In order to have a well-defined scope while satisfying these constraints, this
RFC proposes that this crate will have a scope that is defined separately for
each platform that it targets. The proposals are:

* Linux (and other unix-like platforms) - the libc, libm, librt, libdl, and
  libpthread libraries. Additional platforms can include libraries whose symbols
  are found in these libraries on Linux as well.
* OSX - the common library to link to on this platform is libSystem, but this
  transitively brings in quite a few dependencies, so this crate will refine
  what it depends upon from libSystem a little further, specifically:
  libsystem\_c, libsystem\_m, libsystem\_pthread, libsystem\_malloc and libdyld.
* Windows - the VS CRT libraries. This library is currently intended to be
  distinct from the `winapi` crate as well as bindings to common system DLLs
  found on Windows, so the current scope of `libc` will be pared back to just
  what the CRT contains. This notably means that a large amount of the current
  contents will be removed on Windows.

New platforms added to `libc` can decide the set of libraries `libc` will link
to and bind at that time.
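
On the Rust side, a per-platform scope mostly appears as per-platform `#[link]` attributes. In the minimal sketch below, the `c_cos` wrapper is hypothetical. On Linux, `cos` lives in libm, so the sketch names that library explicitly. On OSX, `cos` comes from libSystem, and on Windows it comes from the CRT. The standard library already links both, so no attribute is needed on those platforms. The `unsafe extern` syntax requires Rust 1.82 or newer.

```rust
// Linux: `cos` lives in libm, so the library is named explicitly.
// OSX (libSystem) and Windows (CRT): already linked by std, no attribute.
#[cfg_attr(target_os = "linux", link(name = "m"))]
unsafe extern "C" {
    fn cos(x: f64) -> f64;
}

/// Hypothetical safe wrapper around the C library's `cos`.
pub fn c_cos(x: f64) -> f64 {
    unsafe { cos(x) }
}
```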

### Internal structure

The primary change being made is that the crate will no longer be one large file
sprinkled with `#[cfg]` annotations. Instead, the crate will be split into a
tree of modules, and all modules will reexport the entire contents of their
children. Unlike most libraries, however, most modules in `libc` will be
hidden via `#[cfg]` at compile time. Each platform supported by `libc` will
correspond to a path from a leaf module to the root, picking up more
definitions, types, and constants as the tree is traversed upwards.

This organization provides a simple method of deduplication between platforms.
For example, `libc::unix` contains functions found across all unix platforms,
whereas `libc::unix::bsd` is a refinement saying that the APIs within are common
to only BSD-like platforms (these may or may not be present on non-BSD platforms
as well). The benefits of this structure are:

* For any particular platform, it's easy to look up the value of a definition
  in the source (simply trace the path from the leaf to the root, i.e. the
  filesystem structure, and the value can be found).
* When adding an API, it's easy to know **where** the API should be added because
  each node in the module hierarchy corresponds clearly to some subset of
  platforms.
* Adding new platforms should be a relatively simple and confined operation. New
  leaves of the hierarchy would be created, and some definitions further up may
  be pushed to lower levels if APIs need to be changed or aren't present on the
  new platform. It should be easy to audit, however, that a new platform doesn't
  tamper with older ones.
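
The sketch below compresses the tree into a few modules. The module names are hypothetical, loosely modeled on the `unix`/`bsd` example above. Each module glob-reexports its children, and `#[cfg]` prunes every branch except the one for the current target. The `FAMILY` constant is purely illustrative, and the list of BSD-like targets is a deliberately small subset.

```rust
// Hypothetical tree: only the branch matching the current target survives
// `#[cfg]`, and its items are reexported all the way up to the root.
#[cfg(unix)]
mod unix {
    // Shared by every unix-like target.
    pub const STDIN_FILENO: i32 = 0;

    // Refinement for BSD-like targets (subset chosen for illustration).
    #[cfg(any(target_os = "macos", target_os = "ios", target_os = "freebsd"))]
    mod bsd {
        pub const FAMILY: &str = "bsd";
    }
    #[cfg(any(target_os = "macos", target_os = "ios", target_os = "freebsd"))]
    pub use self::bsd::*;

    // Every other unix-like target (Linux, Android, ...) in this sketch.
    #[cfg(not(any(target_os = "macos", target_os = "ios", target_os = "freebsd")))]
    mod notbsd {
        pub const FAMILY: &str = "notbsd";
    }
    #[cfg(not(any(target_os = "macos", target_os = "ios", target_os = "freebsd")))]
    pub use self::notbsd::*;
}

#[cfg(windows)]
mod windows {
    pub const FAMILY: &str = "windows";
}

// Flat public namespace: whichever leaf-to-root path was compiled in.
#[cfg(unix)]
pub use self::unix::*;
#[cfg(windows)]
pub use self::windows::*;
```

Adding a platform in this scheme means adding a leaf and, where necessary, moving a definition down a level. Existing leaves stay untouched.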

### Testing

The current set of bindings in the `libc` crate suffers from a drawback in that
they are not verified. This is often a pain point when adding new platforms:
when copying from an existing platform, it's easy to forget to update a constant
here or there. This lack of testing leads to problems like a [wrong definition
of `ioctl`][ioctl], which in turn led to [backwards compatibility
problems][backcompat] when the API was fixed.

[ioctl]: https://github.com/rust-lang/rust/pull/26809
[backcompat]: https://github.com/rust-lang/rust/pull/27762

In order to solve this problem altogether, the libc crate will be enhanced with
the ability to automatically test the FFI bindings it contains. Because this
crate will live in the `rust-lang` organization instead of inside the `rust`
repo itself, it can leverage external CI systems like Travis CI and AppVeyor to
perform these tasks.

The [current implementation][ctest] of the binding testing verifies attributes
such as type size/alignment, struct field offset, struct field types, constant
values, function definitions, etc. Over time it can be enhanced with more
metrics and properties to test.

[ctest]: https://github.com/alexcrichton/ctest
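
The sketch below shows the kind of layout check described above. It is not ctest's actual code or API. A real harness obtains the C-side numbers by compiling generated C (for example with `sizeof` and `offsetof`). Here they are hard-coded for an LP64 unix target such as x86_64 Linux, which is an assumption. `Layout`, `rust_timeval`, `c_timeval_lp64`, and `diff` are hypothetical names. `std::mem::offset_of!` requires Rust 1.77 or newer.

```rust
use std::mem::{align_of, offset_of, size_of};
use std::os::raw::c_long;

/// The Rust binding being verified.
#[repr(C)]
#[derive(Copy, Clone)]
#[allow(non_camel_case_types)]
pub struct timeval {
    pub tv_sec: c_long,
    pub tv_usec: c_long,
}

/// Size, alignment, and field offsets of one struct, as seen by one compiler.
#[derive(Debug, PartialEq)]
pub struct Layout {
    pub size: usize,
    pub align: usize,
    pub offsets: Vec<(&'static str, usize)>,
}

/// What the Rust compiler computes for `timeval`.
pub fn rust_timeval() -> Layout {
    Layout {
        size: size_of::<timeval>(),
        align: align_of::<timeval>(),
        offsets: vec![
            ("tv_sec", offset_of!(timeval, tv_sec)),
            ("tv_usec", offset_of!(timeval, tv_usec)),
        ],
    }
}

/// Stand-in for the C side, hard-coded for an LP64 unix target (assumption).
/// A real harness would compile generated C and read these values back.
pub fn c_timeval_lp64() -> Layout {
    Layout {
        size: 16,
        align: 8,
        offsets: vec![("tv_sec", 0), ("tv_usec", 8)],
    }
}

/// List every property on which the two sides disagree (empty = verified).
/// Fields are matched by position.
pub fn diff(rust: &Layout, c: &Layout) -> Vec<String> {
    let mut errors = Vec::new();
    if rust.size != c.size {
        errors.push(format!("size: rust {} != c {}", rust.size, c.size));
    }
    if rust.align != c.align {
        errors.push(format!("align: rust {} != c {}", rust.align, c.align));
    }
    for (&(name, r_off), &(_, c_off)) in rust.offsets.iter().zip(c.offsets.iter()) {
        if r_off != c_off {
            errors.push(format!("offset of {}: rust {} != c {}", name, r_off, c_off));
        }
    }
    errors
}
```

On a matching target, `diff(&rust_timeval(), &c_timeval_lp64())` returns an empty list. A wrong constant or field type would produce one message per mismatched property.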

In theory, adding a new platform to `libc` would be blocked until automation can
be set up to ensure that the bindings are correct, but it is unfortunately not
easy to add this form of automation for all platforms, so this will not be a
requirement (beyond "tier 1 platforms"). There is currently automation for the
following targets, however, through Travis and AppVeyor:

* `{i686,x86_64}-pc-windows-{msvc,gnu}`
* `{i686,x86_64,mips,aarch64}-unknown-linux-gnu`
* `x86_64-unknown-linux-musl`
* `arm-unknown-linux-gnueabihf`
* `arm-linux-androideabi`
* `{i686,x86_64}-apple-{darwin,ios}`

# Drawbacks

### Loss of module organization

The loss of an internal organization structure can be seen as a drawback of this
design. While perhaps not precisely true today, the principle of the structure
was that it is easy to constrain yourself to a particular C standard or subset
of C to, in theory, write "more portable programs by default" by only using the
contents of the respective module. Unfortunately, in practice this does not seem
to be used much, and it's also not clear whether this can be expressed through
headers alone in `libc`. For example, many platforms will have slight tweaks to
common structures, definitions, or types in terms of signedness or value, so
even if you were restricted to a particular subset it's not clear that a program
would automatically be more portable.

That being said, it would still be useful to have these abstractions to *some
degree*, but the flip side is that it's easy to build this sort of layer on top
of `libc` as designed here externally on crates.io. For example, `extern crate
posix` could just depend on `libc` and reexport all the contents for the
POSIX standard, perhaps with tweaked signatures here and there to work better
across platforms.
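
Here is a sketch of such an external layer, with two hypothetical modules standing in for the crates. `libc_flat` plays the role of the flat `libc` namespace and is declared by hand so the example compiles alone. `posix` picks the items it treats as part of the standard. It reexports some of them unchanged and wraps others with a tweaked, safer signature. The `unsafe extern` syntax requires Rust 1.82 or newer.

```rust
// Stand-in for the flat `libc` namespace.
pub mod libc_flat {
    use std::os::raw::{c_char, c_int};

    pub const EXIT_FAILURE: c_int = 1;

    unsafe extern "C" {
        pub fn atoi(s: *const c_char) -> c_int;
    }
}

// Hypothetical `posix` layer built on top of it.
pub mod posix {
    use std::ffi::CStr;
    use std::os::raw::c_int;

    // Reexported unchanged.
    pub use super::libc_flat::EXIT_FAILURE;

    /// Tweaked signature: taking `&CStr` guarantees a valid NUL-terminated
    /// string, so the wrapper can be a safe function.
    pub fn atoi(s: &CStr) -> c_int {
        unsafe { super::libc_flat::atoi(s.as_ptr()) }
    }
}
```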

### Loss of Windows bindings

By only exposing the CRT functions on Windows, the contents of `libc` will be
quite trimmed down, which means that when accessing similar functions like
`send` or `connect`, crates will be required to link to at least two libraries.

This is also a bit of a maintenance burden on the standard library itself as it
means that all the bindings it uses must move to `src/libstd/sys/windows/c.rs`
in the immediate future.

# Alternatives

* Instead of *only* exporting a flat namespace, the `libc` crate could optionally
  also do what it does today with respect to reexporting modules corresponding
  to various C standards. The downside to this, unfortunately, is that it's
  unclear how much portability using these standards actually buys you.

* The crate could be split up into multiple crates which represent an exact
  correspondence to system libraries, but this has the downside that using
  common functions available on both OSX and Linux would require at least two
  `extern crate` directives and dependencies.

# Unresolved questions

* The only platforms without automation currently are the BSD-like platforms
  (e.g. FreeBSD, OpenBSD, Bitrig, DragonFly, etc), but if it were possible to
  set up automation for these then it would be plausible to actually require
  automation for any new platform. Is it possible to do this?

* What is the relation between `std::os::*::raw` and `libc`? Given that the
  standard library will probably always depend on an in-tree copy of the `libc`
  crate, should `libc` define its own types in this case, have the standard
  library reexport them, and then have the out-of-tree `libc` reexport the
  standard library's definitions?

* Should Windows be supported to a greater degree in `libc`? Should this crate
  and `winapi` have a closer relationship?