Commit 63e87b5

Protocol Buffer Team authored and committed

This documentation change includes the following:

* Fixes link in `proto3.md`
* Major updates to the `java-proto-names.md` topic
* Publishing several Rust-related topics

PiperOrigin-RevId: 689770222
Change-Id: I50541f33ccf56e5af21085f03fa833c7f7377275
1 parent f0c5f35 commit 63e87b5

7 files changed (+262 -97 lines changed)


content/programming-guides/proto3.md

Lines changed: 2 additions & 1 deletion
@@ -68,7 +68,8 @@ You must give each field in your message definition a number between `1` and
 implementation. The protocol buffer compiler will complain if you use one of
 these reserved field numbers in your message.
 - You cannot use any previously [reserved](#fieldreserved) field numbers or
-any field numbers that have been allocated to [extensions](#extensions).
+any field numbers that have been allocated to
+[extensions](/programming-guides/proto2#extensions).

 This number **cannot be changed once your message type is in use** because it
 identifies the field in the

content/reference/java/java-proto-names.md

Lines changed: 49 additions & 49 deletions
@@ -2,55 +2,72 @@
 title = "Java Proto Names"
 weight = 655
 linkTitle = "Generated Proto Names"
-description = "Names that are generated by the immutable API."
+description = "Names that are generated by the Java protoc plugin."
 type = "docs"
 +++

 This document contains information on what the fully-qualified Java name of a
 proto is, based on the different proto options. This name corresponds to the
 package you need to import to use that message.

-**NOTE:** The `java_package` and `java_alt_api_package` options are interpreted
-relative to the API indicated by `java_api_version`. For example, if
-`java_api_version` is 1, then the proto1 package will be `java_package` and the
-proto2 package (the "alternative" API) will be `java_alt_api_package`. And if
-`java_api_version` is 2, then `java_package` determines the proto2 package and
-`java_alt_api_package` determines the proto1 package.
+## Recommendation { #recommendation }
+
+* Set `option java_multiple_files = true;`
+* Set `option java_outer_classname = "FileNameProto";`
+* Set `option java_package = "com.google.package";`
+
+### Explanation {#explanation}
+
+#### Multiple Files {#multiple-files}
+
+With `java_multiple_files = true`, the generated Java class for each message
+will be placed in a separate `.java` file. This makes it much easier to move
+messages from one `.proto` file to another.
+
+#### Outer Classname {#outer-classname}
+
+There is a Java class generated for the `.proto` file itself. The name of the
+class for the file will be automatically generated if not specified. However,
+the rules for how that name is generated are overly complicated and non-obvious.
+The best policy is to explicitly set the `java_outer_classname` option to the
+`.proto` file name converted to PascalCase with the `'.'` removed. For example:
+
+* The file `student_record_request.proto` should set:
+
+```proto
+option java_outer_classname = "StudentRecordRequestProto";
+```
+
+#### Java Package {#java-package}
+
+The Java package for generated bindings will be automatically set to the proto
+package. However, this is usually not conformant with Java conventions. To
+ensure a conventional Java package name, we recommend explicitly setting the
+`java_package` option. For example, within Google, the convention is to prepend
+`com.google.` to the proto package.

 ## Immutable API Message Names { #immutable-api-message-names }

-The names for protos generated by the immutable API (`java_proto_library` BUILD
-target) are listed in the following table.
-
-java_api_version | java_multiple_files | java_alt_api_package | java_package | java_outer_classname | Generated full message name
-:--------------: | :-----------------: | -------------------- | ------------ | -------------------- | ---------------------------
-1 | true | Defined | - | | `$java_alt_api_package.$message`
-1 | true | Not defined | Not defined | | `com.google.protos.$package.proto2api.$message`
-1 | true | Not defined | Defined | | `$java_package.proto2api.$message`
-1 | false | Defined | - | Not defined | `$java_alt_api_package.$derived_outer_class.$message`
-1 | false | Defined | - | Defined | `$java_alt_api_package.$java_outer_classname.$message`
-1 | false | Not defined | Not defined | Not defined | `com.google.protos.$package.proto2api.$derived_outer_class.$message`
-1 | false | Not defined | Not defined | Defined | `com.google.protos.$package.proto2api.$java_outer_classname.$message`
-1 | false | Not defined | Defined | Not defined | `$java_package.proto2api.$derived_outer_class.$message`
-1 | false | Not defined | Defined | Defined | `$java_package.proto2api.$java_outer_classname.$message`
-2 | true | - | Not defined | - | `com.google.protos.$package.$message`
-2 | true | - | Defined | - | `$java_package.$message`
-2 | false | - | Not defined | Not defined | `com.google.protos.$package.$derived_outer_class.$message`
-2 | false | - | Not defined | Defined | `com.google.protos.$package.$java_outer_classname.$message`
-2 | false | - | Defined | Not defined | `$java_package.$derived_outer_class.$message`
-2 | false | - | Defined | Defined | `$java_package.$java_outer_classname.$message`
+The Java plugin for protoc will generate names according to this table.
+
+java_multiple_files | java_package | java_outer_classname | Generated full message name
+:-----------------: | ------------ | -------------------- | ---------------------------
+true | Not defined | *ignored* | `com.google.protos.$package.$message`
+true | Defined | *ignored* | `$java_package.$message`
+false | Not defined | Not defined | `com.google.protos.$package.$derived_outer_class.$message`
+false | Not defined | Defined | `com.google.protos.$package.$java_outer_classname.$message`
+false | Defined | Not defined | `$java_package.$derived_outer_class.$message`
+false | Defined | Defined | `$java_package.$java_outer_classname.$message`

 **Legend**

-* \- means either setting or not setting the option will not change the
-generated full message name.
 * `$message` is the actual name of the proto message.
 * `$package` is the name of the proto package. This is the name specified by
 the `package` directive in the proto file, which is usually at the top of
 the file.
 * `$derived_outer_class` is a name generated from the proto file name.
 Generally it's computed by removing punctuation from the file name and
-converting it to CamelCase. For example, if the proto is `foo_bar.proto`,
+converting it to PascalCase. For example, if the proto is `foo_bar.proto`,
 the `$derived_outer_class` value is `FooBar`.

 If the generated class name would be the same as one of the messages defined
@@ -60,22 +77,5 @@ java_api_version | java_multiple_files | java_alt_api_package | java_package | j
 true when using the v1 API, whether or not the class name would be the same
 as one of the messages defined.

-* All other `$names` are the values of the corresponding proto2 file options
-defined in the proto file.
-
-### Recommendation { #recommendation }
-
-The recommended option to use is:
-
-```proto
-option java_multiple_files = true;
-```
-
-With `java_multiple_files = true`, the generated Java class for each message
-will be placed in a separate `.java` file. This makes it much easier to move
-messages from one `.proto` file to another. There is also an outer Java class
-generated for the `.proto` file itself. (The legend above explains how this
-outer class name is generated.)
-
-The `java_api_version` option defaults to `2`, but you can manually set it to
-`1` when necessary.
+* All other `$names` are the values of the corresponding file options defined
+in the `.proto` file.
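
As a hedged illustration of how the new table resolves names, the following Rust sketch encodes its rows. `JavaOptions`, `java_full_name`, and `derived_outer_class` are hypothetical names, not part of protoc or any protobuf library. `derived_outer_class` only approximates the legend's description (remove punctuation, convert to PascalCase); it assumes the result does not clash with a message name in the file and ignores any other special cases the real plugin may apply.

```rust
// Hypothetical sketch of the naming table above; not protoc's implementation.

/// Java file options that affect naming; `None` means "not defined".
pub struct JavaOptions<'a> {
    pub java_multiple_files: bool,
    pub java_package: Option<&'a str>,
    pub java_outer_classname: Option<&'a str>,
}

/// Approximates `$derived_outer_class`: drop the directory and `.proto`
/// suffix, split on punctuation, and capitalize each remaining part.
/// Assumption: no clash with a message name, so no suffix is appended.
pub fn derived_outer_class(file_name: &str) -> String {
    let base = file_name.rsplit('/').next().unwrap_or(file_name);
    let stem = base.strip_suffix(".proto").unwrap_or(base);
    stem.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first character and keep the rest unchanged.
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// Returns the generated full message name, following the table row by row.
pub fn java_full_name(
    proto_package: &str,
    file_name: &str,
    message: &str,
    opts: &JavaOptions<'_>,
) -> String {
    // Java package: the explicit `java_package`, else the table's default.
    let package = match opts.java_package {
        Some(pkg) => pkg.to_string(),
        None => format!("com.google.protos.{proto_package}"),
    };
    if opts.java_multiple_files {
        // `java_outer_classname` is ignored for the message's full name.
        format!("{package}.{message}")
    } else {
        let outer = match opts.java_outer_classname {
            Some(name) => name.to_string(),
            None => derived_outer_class(file_name),
        };
        format!("{package}.{outer}.{message}")
    }
}
```

For example, with every option left undefined, a message `Msg` in `foo_bar.proto` declared under `package foo;` resolves to `com.google.protos.foo.FooBar.Msg`, which is the `false | Not defined | Not defined` row of the table.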

content/reference/rust/building-rust-protos.md

Lines changed: 1 addition & 4 deletions
@@ -60,9 +60,6 @@ other programming languages:
 )
 ```

-See google3/devtools/rust/examples/protobuf/ for the full example.
-
 **Note:** Don't use `rust_upb_proto_library` or `rust_cc_proto_library`
 directly. `rust_proto_library` checks the global build flag to choose the
-appropriate backend for you. See go/switching-rust-proto-library-backends if you
-want to learn more.
+appropriate backend for you.

content/reference/rust/index.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+++
title = "Rust Reference"
weight = 781
linkTitle = "Rust"
description = "Reference documentation for working with protocol buffer classes in Rust."
type = "docs"
toc_hide = "true"
+++
Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
+++
title = "Rust Proto Design Decisions"
weight = 782
linkTitle = "Design Decisions"
description = "Explains some of the design choices that the Rust Proto implementation makes."
type = "docs"
toc_hide = "true"
+++

As with any library, Rust Protobuf is designed considering the needs of both
Google's first-party usage of Rust as well as that of external users. Choosing a
path in that design space means that some choices made will not be optimal for
some users in some cases, even if they are the right choices for the
implementation overall.

This page covers some of the larger design decisions that the Rust Protobuf
implementation makes and the considerations which led to those decisions.
18+
19+
## Designed to Be ‘Backed’ by Other Protobuf Implementations, Including C++ Protobuf {#backed-by-cpp}
20+
21+
Protobuf Rust is not a pure Rust implementation of protobuf, but a safe Rust API
22+
implemented on top of existing protobuf implementations, or as we call these
23+
implementations: kernels.
24+
25+
The biggest factor that goes into this decision was to enable zero-cost of
26+
adding Rust to a preexisting binary which already uses non-Rust Protobuf. Bby
27+
enabling the implementation to be ABI-compatible with the C++ Protobuf generated
28+
code, it is possible to share Protobuf messages across the language boundary
29+
(FFI) as plain pointers, avoiding the need to serialize in one language, pass
30+
the byte array across the boundary, and deserialize in the other language. This
31+
also reduces binary size for these use cases by avoiding having redundant schema
32+
information embedded in the binary for the same messages for each language.
33+
34+
Protobuf Rust currently supports three kernels:
35+
36+
* C++ kernel - the generated code is backed by C++ Protocol Buffers (the
37+
"full" implementation, typically used for servers). This kernel offers
38+
in-memory interoperability with C++ code that uses the C++ runtime. This is
39+
the default for servers within Google.
40+
* C++ Lite kernel - the generated code is backed by C++ Lite Protocol Buffers
41+
(typically used for mobile). This kernel offers in-memory interoperability
42+
with C++ code that uses the C++ Lite runtime. This is the default for
43+
for mobile apps within Google.
44+
* upb kernel - the generated code is backed by
45+
[upb](https://github.com/protocolbuffers/protobuf/tree/main/upb),
46+
a highly performant and small-binary-size Protobuf library written in C. upb
47+
is designed to be used as an implementation detail by Protobuf runtimes in
48+
other languages. This is the default in open source builds where we expect
49+
static linking with code already using C++ Protobuf to be more rare.
50+
51+
The decision to support multiple non-Rust kernels significantly influences the
52+
our public API decisions, including the types used on getters (discussed later
53+
in this document).

### No Pure Rust Kernel {#no-pure-rust}

Given that we designed the API to be implementable by multiple backing
implementations, a natural question is why the only supported kernels today are
written in the memory-unsafe languages C and C++.

While Rust's memory safety can significantly reduce exposure to critical
security issues, no language is immune to security issues. The Protobuf
implementations that we support as kernels have been scrutinized and fuzzed to
the extent that Google is comfortable using those implementations to perform
unsandboxed parsing of untrusted inputs in our own servers and apps. A
greenfield binary parser written in Rust at this time would be understood to be
much more likely to contain critical vulnerabilities than the preexisting C++
Protobuf parser.

There are legitimate arguments for supporting a pure Rust implementation in the
long term, including toolchain difficulties for developers using our
implementation in open source.

It is a reasonable assumption that Google will support a pure Rust
implementation at some later date, but we are not investing in it today and have
no concrete roadmap for it at this time.

## View/Mut Proxy Types {#view-mut-proxy-types}

The Rust Proto API is designed with opaque "Proxy" types. For a `.proto` file
that defines `message SomeMsg {}`, we generate the Rust types `SomeMsg`,
`SomeMsgView<'_>`, and `SomeMsgMut<'_>`. The simple rule of thumb is that we
expect the View and Mut types to stand in for `&SomeMsg` and `&mut SomeMsg` in
all usages by default, while still getting all of the borrow checking/Send/etc.
behavior that you would expect from those types.

### Another Lens to Understand These Types {#another-lens}

To better understand the nuances of these types, it may be useful to think of
these types as follows:

```rust
struct SomeMsg(Box<cpp::SomeMsg>);
struct SomeMsgView<'a>(&'a cpp::SomeMsg);
struct SomeMsgMut<'a>(&'a mut cpp::SomeMsg);
```

Under this lens you can see that:

- Given a `&SomeMsg`, it is possible to get a `SomeMsgView` (similar to how,
given a `&Box<T>`, you can get a `&T`).
- Given a `SomeMsgView`, it is *not* possible to get a `&SomeMsg` (similar to
how, given a `&T`, you couldn't get a `&Box<T>`).

Just like with the `&Box` example, this means that for function arguments, it
is generally better to default to `SomeMsgView<'a>` rather than `&'a SomeMsg`,
as it will allow a superset of callers to use the function.
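
To make the lens concrete, here is a minimal, self-contained Rust sketch. It is a hypothetical model rather than the generated API: `cpp::SomeMsg` is a plain Rust struct standing in for the C++-owned message, and `as_view`, `view_of_raw`, and `describe` are illustrative helpers. Because `describe` takes a `SomeMsgView<'_>`, it serves both a caller that owns a `SomeMsg` and a caller that only has a view of data that no `SomeMsg` owns.

```rust
// Hypothetical model of the lens, not the generated API. `cpp::SomeMsg` is a
// plain Rust stand-in for the message that really lives in C++.
pub mod cpp {
    pub struct SomeMsg {
        pub value: i32,
    }
}

/// Owned message: like `Box<cpp::SomeMsg>`.
pub struct SomeMsg(Box<cpp::SomeMsg>);

/// Borrowed view: like `&cpp::SomeMsg`, and cheap to copy like a `&T`.
#[derive(Clone, Copy)]
pub struct SomeMsgView<'a>(&'a cpp::SomeMsg);

impl SomeMsg {
    pub fn new(value: i32) -> Self {
        SomeMsg(Box::new(cpp::SomeMsg { value }))
    }

    /// `&SomeMsg` -> `SomeMsgView`, analogous to `&Box<T>` -> `&T`.
    pub fn as_view(&self) -> SomeMsgView<'_> {
        SomeMsgView(&*self.0)
    }
}

impl<'a> SomeMsgView<'a> {
    pub fn value(self) -> i32 {
        self.0.value
    }
    // Deliberately no way back to `&'a SomeMsg`: the viewed data may not
    // live inside any `SomeMsg` box at all.
}

/// Hypothetical helper: a view over data that no `SomeMsg` owns
/// (for example, a nested message).
pub fn view_of_raw(raw: &cpp::SomeMsg) -> SomeMsgView<'_> {
    SomeMsgView(raw)
}

/// Taking a view accepts both kinds of caller; `&SomeMsg` would not.
pub fn describe(msg: SomeMsgView<'_>) -> String {
    format!("value = {}", msg.value())
}
```

Both `describe(owned.as_view())` and `describe(view_of_raw(&raw))` compile; a `fn describe(msg: &SomeMsg)` signature would accept only the first.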

### Why {#why}

There are two main reasons for this design: to unlock possible optimization
benefits, and as an inherent outcome of the kernel design.

#### Optimization Opportunity Benefit {#optimization}

Because Protobuf is such a core and widespread technology, it is unusually
prone both to every observable behavior being depended on by someone, and to
relatively small optimizations having an unusually large net impact at scale. We
have found that more opaque types give an unusually high amount of leverage:
they let us be more deliberate about exactly which behaviors are exposed, and
give us more room to optimize the implementation.

A `SomeMsgMut<'_>` provides those opportunities where a `&mut SomeMsg` would
not: namely, we can construct them lazily, and their internal representation
can differ from the owned message representation. It also inherently lets us
control certain behaviors that we otherwise couldn't limit: for example, any
`&mut` can be used with `std::mem::swap()`, which would place strong limits on
the invariants you are able to maintain between a parent and child struct if
`&mut SomeChild` were given to callers.
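
The `std::mem::swap()` concern can be sketched with hypothetical types; `Parent`, `Child`, `ChildMut`, and `child_with` are illustrative names, not generated code. `Parent` keeps a cached length that must match its child. Handing out `&mut Child` lets any caller swap the child out and silently break that invariant, whereas a Mut-style proxy that only exposes mutating methods can keep the cache in sync.

```rust
// Hypothetical illustration, not generated code: why handing out `&mut Child`
// limits the invariants a parent can maintain.

pub struct Child {
    items: Vec<i32>,
}

pub struct Parent {
    child: Child,
    // Invariant the parent wants to maintain: equals `child.items.len()`.
    cached_len: usize,
}

/// Mut-style proxy: mutation is only possible through its methods.
pub struct ChildMut<'a> {
    parent: &'a mut Parent,
}

impl Parent {
    pub fn new() -> Self {
        Parent { child: Child { items: Vec::new() }, cached_len: 0 }
    }

    /// Raw `&mut`: callers may `std::mem::swap` the whole child away.
    pub fn child_mut_ref(&mut self) -> &mut Child {
        &mut self.child
    }

    /// Proxy: every change is routed through code that updates the cache.
    pub fn child_mut(&mut self) -> ChildMut<'_> {
        ChildMut { parent: self }
    }

    pub fn invariant_holds(&self) -> bool {
        self.cached_len == self.child.items.len()
    }
}

impl ChildMut<'_> {
    pub fn push(&mut self, value: i32) {
        self.parent.child.items.push(value);
        self.parent.cached_len += 1; // keep the parent's cache in sync
    }
}

/// Builds a standalone child with the given items (demo helper).
pub fn child_with(items: Vec<i32>) -> Child {
    Child { items }
}
```

Pushing through `child_mut()` keeps `invariant_holds()` true, but `std::mem::swap(parent.child_mut_ref(), &mut other)` replaces the child wholesale and leaves `cached_len` stale.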

#### Inherent to Kernel Design {#kernel-design}

The other reason for the proxy types is more of an inherent limitation of our
kernel design: when you have a `&T`, there must be a real Rust `T` value in
memory somewhere.

Our C++ kernel design allows you to parse a message which contains nested
messages, and create only a small stack-allocated Rust object to represent the
root message, with all other memory being stored on the C++ heap. When you
later access a child message, there will be no already-allocated Rust object
which corresponds to that child, and so there's no Rust instance to borrow at
that moment.

By using proxy types, we're able to create Rust proxy values on demand that
semantically act as borrows, without any Rust memory being eagerly allocated
for those instances ahead of time.
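
A rough model of this constraint follows, assuming that a `Vec` of nodes can stand in for message data living on the C++ heap. `Arena`, `Node`, `Msg`, and `MsgView` are hypothetical names. The owned `Msg` is the only Rust-side message object, and `MsgView::child` builds a view for a nested message on demand; because no Rust `Msg` exists for the child, there is no value that a `&Msg` for it could point to.

```rust
// Hypothetical model, not the real kernel: a `Vec<Node>` stands in for
// message data that lives on the C++ heap rather than in Rust structs.

/// One message's data in the "foreign" storage.
pub struct Node {
    pub value: i32,
    /// Index of the nested message, if any.
    pub child: Option<usize>,
}

/// Stand-in for the C++ heap holding the whole parsed message tree.
pub struct Arena {
    nodes: Vec<Node>,
}

/// The owned root: the only Rust-side message object. Nested messages get
/// no Rust object of their own.
pub struct Msg {
    arena: Arena,
    root: usize,
}

/// A view is created on demand from (storage, index).
#[derive(Clone, Copy)]
pub struct MsgView<'a> {
    arena: &'a Arena,
    index: usize,
}

impl Msg {
    /// Simulates "parsing" by taking ownership of prebuilt node storage.
    pub fn from_nodes(nodes: Vec<Node>, root: usize) -> Self {
        Msg { arena: Arena { nodes }, root }
    }

    pub fn as_view(&self) -> MsgView<'_> {
        MsgView { arena: &self.arena, index: self.root }
    }
}

impl<'a> MsgView<'a> {
    pub fn value(self) -> i32 {
        self.arena.nodes[self.index].value
    }

    /// Builds the child proxy lazily; nothing is allocated for it.
    pub fn child(self) -> Option<MsgView<'a>> {
        self.arena.nodes[self.index]
            .child
            .map(|index| MsgView { arena: self.arena, index })
    }
}
```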

## Non-Std Types {#non-std}

### Simple Types Which May Have a Directly Corresponding Std Type {#corresponding-std}

In some cases, the Rust Protobuf API may define its own types where a
corresponding std type with the same name exists, and the current
implementation may even simply wrap the std type; for example,
`proto::Utf8Error`.

Using these types rather than std types gives us more flexibility in optimizing
the implementation in the future. While our current implementation uses Rust's
std UTF-8 validation today, defining our own `proto::Utf8Error` type enables us
to change the implementation to use the highly optimized UTF-8 validation from
C++ Protobuf, which is faster than Rust's std UTF-8 validation.
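
The pattern of owning the public error type can be sketched as follows. This is a hypothetical illustration, not the actual `proto::Utf8Error` definition: callers only ever see the crate's own `Utf8Error`, so the body of the illustrative `validate_utf8` function (which delegates to `std::str::from_utf8` here) could later be swapped for a different validator without changing the public signature.

```rust
// Hypothetical sketch of the pattern, not the actual `proto::Utf8Error`.
use std::fmt;

/// The crate's own public error type. Its fields are private, so the
/// validator behind it can change without breaking callers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Utf8Error {
    valid_up_to: usize,
}

impl Utf8Error {
    /// Number of leading bytes that were valid UTF-8.
    pub fn valid_up_to(&self) -> usize {
        self.valid_up_to
    }
}

impl fmt::Display for Utf8Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid UTF-8 starting at byte {}", self.valid_up_to)
    }
}

impl std::error::Error for Utf8Error {}

/// Today this delegates to std. Because only our `Utf8Error` appears in the
/// signature, the body could later call a different validator.
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes).map_err(|e| Utf8Error {
        valid_up_to: e.valid_up_to(),
    })
}
```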

### ProtoString {#proto-string}

Rust's `str` and `std::string::String` types maintain a strict invariant that
they only contain valid UTF-8, but C++ Protobuf and C++'s `std::string` type
generally do not enforce any such guarantee. `string`-typed Protobuf fields are
intended to only ever contain valid UTF-8, but the enforcement of this has many
holes through which a `string` field may end up containing invalid UTF-8 at
runtime.

To deliver on zero-cost message sharing between C++ and Rust while minimizing
costly validations or risk of undefined behavior in Rust, we chose not to use
the `str`/`String` types for `string` field getters, and introduced the types
`ProtoStr` and `ProtoString` instead, which are equivalent types except that
they could contain invalid UTF-8 in rare situations. Those types let the
application code choose whether to perform the validation on demand to get a
`&str`, or to operate on the raw bytes to avoid any validation.
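
The following hypothetical `ProtoStrSketch` type models the trade-off described above; it is not the real `ProtoStr` API, and its method names are assumptions. Wrapping raw bytes performs no validation; callers opt into UTF-8 checking with `to_str`, tolerate bad data with `to_string_lossy`, or skip validation entirely with `as_bytes`. For brevity it reuses std's `Utf8Error`.

```rust
// Hypothetical model of the idea behind `ProtoStr`, not the real type's API:
// a borrowed string that is usually, but not guaranteed to be, valid UTF-8.
use std::borrow::Cow;
use std::str::Utf8Error;

#[derive(Clone, Copy, Debug)]
pub struct ProtoStrSketch<'a> {
    bytes: &'a [u8],
}

impl<'a> ProtoStrSketch<'a> {
    /// Wrapping is free: no validation happens here.
    pub fn from_bytes(bytes: &'a [u8]) -> Self {
        ProtoStrSketch { bytes }
    }

    /// Raw access that never validates.
    pub fn as_bytes(self) -> &'a [u8] {
        self.bytes
    }

    /// Opt-in, on-demand validation to obtain a `&str`.
    pub fn to_str(self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.bytes)
    }

    /// Alternative: replace invalid sequences with U+FFFD.
    pub fn to_string_lossy(self) -> Cow<'a, str> {
        String::from_utf8_lossy(self.bytes)
    }
}
```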

We are aware that vocabulary types like `str` are very important to idiomatic
usage, and intend to keep evaluating whether this decision is the right one as
Rust usage evolves.
