diff --git a/content/programming-guides/editions.md b/content/programming-guides/editions.md index 1831a75c0..ff4e7d2c7 100644 --- a/content/programming-guides/editions.md +++ b/content/programming-guides/editions.md @@ -1356,7 +1356,7 @@ and unpack `Any` values in a typesafe manner – for example, in Java, the `Any` type will have special `pack()` and `unpack()` accessors, while in C++ there are `PackFrom()` and `UnpackTo()` methods: -```c++ +```cpp // Storing an arbitrary message type in Any. NetworkErrorDetails details = ...; ErrorStatus status; @@ -1425,7 +1425,7 @@ language in the relevant [API reference](/reference/). oneof. So if you set several oneof fields, only the *last* field you set will still have a value. - ```c++ + ```cpp SampleMessage message; message.set_name("name"); CHECK(message.has_name()); @@ -1452,7 +1452,7 @@ language in the relevant [API reference](/reference/). following sample code will crash because `sub_message` was already deleted by calling the `set_name()` method. - ```c++ + ```cpp SampleMessage message; SubMessage* sub_message = message.mutable_sub_message(); message.set_name("name"); // Will delete sub_message @@ -1463,7 +1463,7 @@ language in the relevant [API reference](/reference/). end up with the other's oneof case: in the example below, `msg1` will have a `sub_message` and `msg2` will have a `name`. - ```c++ + ```cpp SampleMessage msg1; msg1.set_name("name"); SampleMessage msg2; @@ -1644,223 +1644,11 @@ about, see the ## JSON Mapping {#json} -Protobuf supports a canonical encoding in JSON, making it easier to share data -between systems. The encoding is described on a type-by-type basis in the table -below. - -When parsing JSON-encoded data into a protocol buffer, if a value is missing or -if its value is `null`, it will be interpreted as the corresponding -[default value](#default). 
- -When generating JSON-encoded output from a protocol buffer, if a protobuf field -has the default value and if the field doesn't support field presence, it will -be omitted from the output by default. An implementation may provide options to -include fields with default values in the output. - -Singular fields that have a value set and that support field presence always -include the field value in the JSON-encoded output, even if it is the default -value. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Editions | JSON | JSON example | Notes |
-| --- | --- | --- | --- |
-| message | object | `{"fooBar": v, "g": null, ...}` | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the `json_name` field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the `json_name` option) and the original proto field name. `null` is an accepted value for all field types and treated as the default value of the corresponding field type. However, `null` cannot be used for the `json_name` value. For more on why, see Stricter validation for json_name. |
-| enum | string | `"FOO_BAR"` | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
-| map<K,V> | object | `{"k": v, ...}` | All keys are converted to strings. |
-| repeated V | array | `[v, ...]` | `null` is accepted as the empty list `[]`. |
-| bool | true, false | `true, false` | |
-| string | string | `"Hello World!"` | |
-| bytes | base64 string | `"YWJjMTIzIT8kKiYoKSctPUB+"` | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted. |
-| int32, fixed32, uint32 | number | `1, -10, 0` | JSON value will be a decimal number. Either numbers or strings are accepted. Empty strings are invalid. |
-| int64, fixed64, uint64 | string | `"1", "-10"` | JSON value will be a decimal string. Either numbers or strings are accepted. Empty strings are invalid. |
-| float, double | number | `1.1, -10.0, 0, "NaN", "Infinity"` | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Empty strings are invalid. Exponent notation is also accepted. |
-| Any | object | `{"@type": "url", "f": v, ... }` | If the `Any` contains a value that has a special JSON mapping, it will be converted as follows: `{"@type": xxx, "value": yyy}`. Otherwise, the value will be converted into a JSON object, and the `"@type"` field will be inserted to indicate the actual data type. |
-| Timestamp | string | `"1972-01-01T10:00:20.021Z"` | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
-| Duration | string | `"1.000340012s", "1s"` | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
-| Struct | object | `{ ... }` | Any JSON object. See struct.proto. |
-| Wrapper types | various types | `2, "2", "foo", true, "true", null, 0, ...` | Wrappers use the same representation in JSON as the wrapped primitive type, except that `null` is allowed and preserved during data conversion and transfer. |
-| FieldMask | string | `"f.fooBar,h"` | See field_mask.proto. |
-| ListValue | array | `[foo, bar, ...]` | |
-| Value | value | | Any JSON value. Check google.protobuf.Value for details. |
-| NullValue | null | | JSON null |
-| Empty | object | `{}` | An empty JSON object |
- -### JSON Options {#json-options} - -A protobuf JSON implementation may provide the following options: - -* **Always emit fields without presence**: Fields that don't support presence - and that have their default value are omitted by default in JSON output (for - example, an implicit presence integer with a 0 value, implicit presence - string fields that are empty strings, and empty repeated and map fields). An - implementation may provide an option to override this behavior and output - fields with their default values. - - As of v25.x, the C++, Java, and Python implementations are nonconformant, as - this flag affects proto2 `optional` fields but not proto3 `optional` fields. - A fix is planned for a future release. - -* **Ignore unknown fields**: The protobuf JSON parser should reject unknown - fields by default but may provide an option to ignore unknown fields in - parsing. - -* **Use proto field name instead of lowerCamelCase name**: By default the - protobuf JSON printer should convert the field name to lowerCamelCase and - use that as the JSON name. An implementation may provide an option to use - proto field name as the JSON name instead. Protobuf JSON parsers are - required to accept both the converted lowerCamelCase name and the proto - field name. - -* **Emit enum values as integers instead of strings**: The name of an enum - value is used by default in JSON output. An option may be provided to use - the numeric value of the enum value instead. +The standard protobuf binary wire format is the preferred serialization format +for communication between two systems that use protobufs. For communicating with +systems that use JSON rather than protobuf wire format, Protobuf supports a +canonical encoding in +[ProtoJSON](/programming-guides/json). 
## Options {#options} diff --git a/content/programming-guides/field_presence.md b/content/programming-guides/field_presence.md index 215a2c840..27e962fca 100644 --- a/content/programming-guides/field_presence.md +++ b/content/programming-guides/field_presence.md @@ -9,20 +9,24 @@ type = "docs" ## Background *Field presence* is the notion of whether a protobuf field has a value. There -are two different manifestations of presence for protobufs: *no presence*, where -the generated message API stores field values (only), and *explicit presence*, -where the API also stores whether or not a field has been set. +are two different manifestations of presence for protobufs: *implicit presence*, +where the generated message API stores field values (only), and *explicit +presence*, where the API also stores whether or not a field has been set. Historically, proto2 has mostly followed *explicit presence*, while proto3 -exposes only *no presence* semantics. Singular proto3 fields of basic types -(numeric, string, bytes, and enums) which are defined with the `optional` label -have *explicit presence*, like proto2 (this feature is enabled by default as -release 3.15). +exposes only *implicit presence* semantics. Singular proto3 fields of basic +types (numeric, string, bytes, and enums) which are defined with the `optional` +label have *explicit presence*, like proto2 (this feature is enabled by default +as of release 3.15). + +**NOTE:** We recommend always adding the `optional` label for proto3 basic +types. This provides a smoother path to editions, which uses explicit presence +by default. ### Presence Disciplines *Presence disciplines* define the semantics for translating between the *API -representation* and the *serialized representation*. The *no presence* +representation* and the *serialized representation*. 
The *implicit presence* discipline relies upon the field value itself to make decisions at (de)serialization time, while the *explicit presence* discipline relies upon the explicit tracking state instead. @@ -41,7 +45,7 @@ backward-compatible across changes to the message definition; however, this compatibility introduces some (perhaps surprising) considerations when deserializing wire-formatted messages: -- When serializing, fields with *no presence* are not serialized if they +- When serializing, fields with *implicit presence* are not serialized if they contain their default value. - For numeric types, the default is 0. - For enums, the default is the zero-valued enumerator. @@ -87,8 +91,8 @@ semantics of the wire format or TextFormat. - Notably, JSON *elements* are semantically unordered, and each member must have a unique name. This is different from TextFormat rules for repeated fields. -- JSON may include fields that are "not present," unlike the *no presence* - discipline for other formats: +- JSON may include fields that are "not present," unlike the *implicit + presence* discipline for other formats: - JSON defines a `null` value, which may be used to represent a *defined but not-present field*. - Repeated field values may be included in the formatted output, even if @@ -174,24 +178,24 @@ basic types (numeric, string, bytes, and enums), either. Oneof fields affirmatively expose presence, although the same set of hazzer methods may not be generated as in proto2 APIs. -Under the *no presence* discipline, the default value is synonymous with "not -present" for purposes of serialization. To notionally "clear" a field (so it -won't be serialized), an API user would set it to the default value. +Under the *implicit presence* discipline, the default value is synonymous with +"not present" for purposes of serialization. To notionally "clear" a field (so +it won't be serialized), an API user would set it to the default value. 
-The default value for enum-typed fields under *no presence* is the corresponding -0-valued enumerator. Under proto3 syntax rules, all enum types are required to -have an enumerator value which maps to 0. By convention, this is an `UNKNOWN` or -similarly-named enumerator. If the zero value is notionally outside the domain -of valid values for the application, this behavior can be thought of as -tantamount to *explicit presence*. +The default value for enum-typed fields under *implicit presence* is the +corresponding 0-valued enumerator. Under proto3 syntax rules, all enum types are +required to have an enumerator value which maps to 0. By convention, this is an +`UNKNOWN` or similarly-named enumerator. If the zero value is notionally outside +the domain of valid values for the application, this behavior can be thought of +as tantamount to *explicit presence*. -## Semantic Differences +## Semantic Differences {#semantic-differences} -The *no presence* serialization discipline results in visible differences from -the *explicit presence* tracking discipline, when the default value is set. For -a singular field with numeric, enum, or string type: +The *implicit presence* serialization discipline results in visible differences +from the *explicit presence* tracking discipline, when the default value is set. +For a singular field with numeric, enum, or string type: -- *No presence* discipline: +- *Implicit presence* discipline: - Default values are not serialized. - Default values are *not* merged-from. - To "clear" a field, it is set to its default value. @@ -200,6 +204,7 @@ a singular field with numeric, enum, or string type: the application-specific domain of values; - the field was notionally "cleared" by setting its default; or - the field was never set. + - `has_` methods are not generated (but see note after this list) - *Explicit presence* discipline: - Explicitly set values are always serialized, including default values. - Un-set fields are never merged-from. 
@@ -209,13 +214,17 @@ a singular field with numeric, enum, or string type: - A generated `clear_foo` method must be used to clear (i.e., un-set) the value. +**Note:** `has_` methods are not generated for implicit presence fields in most cases. +The exception to this behavior is Dart, which generates `has_` methods with +proto3 proto schema files. + ### Considerations for Merging -Under the *no presence* rules, it is effectively impossible for a target field -to merge-from its default value (using the protobuf's API merging functions). -This is because default values are skipped, similar to the *no presence* -serialization discipline. Merging only updates the target (merged-to) message -using the non-skipped values from the update (merged-from) message. +Under the *implicit presence* rules, it is effectively impossible for a target +field to merge-from its default value (using the protobuf's API merging +functions). This is because default values are skipped, similar to the *implicit +presence* serialization discipline. Merging only updates the target (merged-to) +message using the non-skipped values from the update (merged-from) message. The difference in merging behavior has further implications for protocols which rely on partial "patch" updates. If field presence is not tracked, then an @@ -228,14 +237,14 @@ values -- even default values -- will be merged into the target. ### Considerations for change-compatibility -Changing a field between *explicit presence* and *no presence* is a +Changing a field between *explicit presence* and *implicit presence* is a binary-compatible change for serialized values in wire format. However, the serialized representation of the message may differ, depending on which version of the message definition was used for serialization. Specifically, when a "sender" explicitly sets a field to its default value: -- The serialized value following *no presence* discipline does not contain the - default value, even though it was explicitly set. 
+- The serialized value following *implicit presence* discipline does not + contain the default value, even though it was explicitly set. - The serialized value following *explicit presence* discipline contains every "present" field, even if it contains the default value. @@ -324,7 +333,7 @@ syntax = "proto3"; package example; message MyMessage { - // No presence: + // implicit presence: int32 not_tracked = 1; // Explicit presence: @@ -346,7 +355,7 @@ tracking with protoc. The generated code for proto3 fields with *explicit presence* (the `optional` label) will be the same as it would be in a proto2 file. -This is the definition used in the "no presence" examples below: +This is the definition used in the "implicit presence" examples below: ```protobuf syntax = "proto3"; @@ -371,9 +380,9 @@ In the examples, a function `GetProto` constructs and returns a message of type #### C++ Example -No presence: +Implicit presence: -```c++ +```cpp Msg m = GetProto(); if (m.foo() != 0) { // "Clear" the field: @@ -399,7 +408,7 @@ if (m.has_foo()) { #### C# Example -No presence: +Implicit presence: ```c# var m = GetProto(); @@ -427,7 +436,7 @@ if (m.HasFoo) { #### Go Example -No presence: +Implicit presence: ```go m := GetProto() @@ -458,7 +467,7 @@ if m.Foo != nil { These examples use a `Builder` to demonstrate clearing. Simply checking presence and getting values from a `Builder` follows the same API as the message type. 
-No presence: +Implicit presence: ```java Msg.Builder m = GetProto().toBuilder(); @@ -486,7 +495,7 @@ if (m.hasFoo()) { #### Python Example -No presence: +Implicit presence: ```python m = example.Msg() @@ -512,7 +521,7 @@ else: #### Ruby Example -No presence: +Implicit presence: ```ruby m = Msg.new @@ -540,7 +549,7 @@ end #### Javascript Example -No presence: +Implicit presence: ```js var m = new Msg(); @@ -568,7 +577,7 @@ if (m.hasFoo()) { #### Objective-C Example -No presence: +Implicit presence: ```objective-c Msg *m = [[Msg alloc] init]; diff --git a/content/programming-guides/json.md b/content/programming-guides/json.md new file mode 100644 index 000000000..d09ca9439 --- /dev/null +++ b/content/programming-guides/json.md @@ -0,0 +1,238 @@ ++++ +title = "ProtoJSON Format" +weight = 62 +description = "Covers how to use the Protobuf to JSON conversion utilities." +type = "docs" ++++ + +Protobuf supports a canonical encoding in JSON, making it easier to share data +with systems that do not support the standard protobuf binary wire format. + +ProtoJSON Format is not as efficient as protobuf wire format. The converter uses +more CPU to encode and decode messages and (except in rare cases) encoded +messages consume more space. Furthermore, ProtoJSON format puts your field and +enum value names into encoded messages making it much harder to change those +names later. Removing fields is a breaking change that will trigger a parsing +error. In short, there are many good reasons why Google prefers to use the +standard wire format for virtually everything rather than ProtoJSON format. + +The encoding is described on a type-by-type basis in the table later in this +topic. + +When parsing JSON-encoded data into a protocol buffer, if a value is missing or +if its value is `null`, it will be interpreted as the corresponding +[default value](/programming-guides/editions#default). 
+ +When generating JSON-encoded output from a protocol buffer, if a protobuf field +has the default value and if the field doesn't support field presence, it will +be omitted from the output by default. An implementation may provide options to +include fields with default values in the output. + +Fields that have a value set and that support field presence always include the +field value in the JSON-encoded output, even if it is the default value. For +example, a proto3 field that is defined with the `optional` keyword supports +field presence and if set, will always appear in the JSON output. A message type +field in any edition of protobuf supports field presence and if set will appear +in the output. Proto3 implicit-presence scalar fields will only appear in the +JSON output if they are not set to the default value for that type. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Protobuf | JSON | JSON example | Notes |
+| --- | --- | --- | --- |
+| message | object | `{"fooBar": v, "g": null, ...}` | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the `json_name` field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the `json_name` option) and the original proto field name. `null` is an accepted value for all field types and treated as the default value of the corresponding field type. However, `null` cannot be used for the `json_name` value. For more on why, see Stricter validation for json_name. |
+| enum | string | `"FOO_BAR"` | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
+| map<K,V> | object | `{"k": v, ...}` | All keys are converted to strings. |
+| repeated V | array | `[v, ...]` | `null` is accepted as the empty list `[]`. |
+| bool | true, false | `true, false` | |
+| string | string | `"Hello World!"` | |
+| bytes | base64 string | `"YWJjMTIzIT8kKiYoKSctPUB+"` | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted. |
+| int32, fixed32, uint32 | number | `1, -10, 0` | JSON value will be a decimal number. Either numbers or strings are accepted. Empty strings are invalid. |
+| int64, fixed64, uint64 | string | `"1", "-10"` | JSON value will be a decimal string. Either numbers or strings are accepted. Empty strings are invalid. |
+| float, double | number | `1.1, -10.0, 0, "NaN", "Infinity"` | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Empty strings are invalid. Exponent notation is also accepted. |
+| Any | object | `{"@type": "url", "f": v, ... }` | If the `Any` contains a value that has a special JSON mapping, it will be converted as follows: `{"@type": xxx, "value": yyy}`. Otherwise, the value will be converted into a JSON object, and the `"@type"` field will be inserted to indicate the actual data type. |
+| Timestamp | string | `"1972-01-01T10:00:20.021Z"` | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
+| Duration | string | `"1.000340012s", "1s"` | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
+| Struct | object | `{ ... }` | Any JSON object. See struct.proto. |
+| Wrapper types | various types | `2, "2", "foo", true, "true", null, 0, ...` | Wrappers use the same representation in JSON as the wrapped primitive type, except that `null` is allowed and preserved during data conversion and transfer. |
+| FieldMask | string | `"f.fooBar,h"` | See field_mask.proto. |
+| ListValue | array | `[foo, bar, ...]` | |
+| Value | value | | Any JSON value. Check google.protobuf.Value for details. |
+| NullValue | null | | JSON null |
+| Empty | object | `{}` | An empty JSON object |
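
A few rows of the mapping table can be sketched in plain Python. This is only an illustration under stated assumptions, not the protobuf library's API: `to_lower_camel_case` and `encode_scalar` are hypothetical helpers, the field names are made up, and only the output side of a handful of scalar rows (64-bit integers, `bytes`, `float`/`double`) is covered.

```python
import base64
import json
import math


def to_lower_camel_case(field_name: str) -> str:
    """Hypothetical helper: map a snake_case proto field name to a JSON key."""
    head, *rest = field_name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def encode_scalar(proto_type: str, value):
    """Apply the output rule from the table to a single scalar value."""
    if proto_type in ("int64", "fixed64", "uint64"):
        # 64-bit integers are emitted as decimal strings.
        return str(value)
    if proto_type == "bytes":
        # Standard base64 with padding.
        return base64.b64encode(value).decode("ascii")
    if proto_type in ("float", "double"):
        # Non-finite values use the special string forms.
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return value
    # int32, bool, and string map directly to JSON number, boolean, and string.
    return value


# Made-up fields: (proto field name, proto type, value).
fields = [
    ("user_id", "int64", 10),
    ("raw_payload", "bytes", b"abc123!?$*&()'-=@~"),
    ("retry_count", "int32", 1),
]
encoded = {to_lower_camel_case(n): encode_scalar(t, v) for n, t, v in fields}
print(json.dumps(encoded))
# {"userId": "10", "rawPayload": "YWJjMTIzIT8kKiYoKSctPUB+", "retryCount": 1}
```

Real code would normally rely on the JSON utilities that ship with the protobuf runtime for its language rather than hand-rolled helpers like these; the sketch only makes the per-type rules concrete.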
+ +### JSON Options {#json-options} + +A conformant protobuf JSON implementation may provide the following options: + +* **Always emit fields without presence**: Fields that don't support presence + and that have their default value are omitted by default in JSON output (for + example, an implicit presence integer with a 0 value, implicit presence + string fields that are empty strings, and empty repeated and map fields). An + implementation may provide an option to override this behavior and output + fields with their default values. + + As of v25.x, the C++, Java, and Python implementations are nonconformant, as + this flag affects proto2 `optional` fields but not proto3 `optional` fields. + A fix is planned for a future release. + +* **Ignore unknown fields**: The protobuf JSON parser should reject unknown + fields by default but may provide an option to ignore unknown fields in + parsing. + +* **Use proto field name instead of lowerCamelCase name**: By default the + protobuf JSON printer should convert the field name to lowerCamelCase and + use that as the JSON name. An implementation may provide an option to use + proto field name as the JSON name instead. Protobuf JSON parsers are + required to accept both the converted lowerCamelCase name and the proto + field name. + +* **Emit enum values as integers instead of strings**: The name of an enum + value is used by default in JSON output. An option may be provided to use + the numeric value of the enum value instead. diff --git a/content/programming-guides/proto2.md b/content/programming-guides/proto2.md index c367ce3b9..eae4004c5 100644 --- a/content/programming-guides/proto2.md +++ b/content/programming-guides/proto2.md @@ -1802,218 +1802,10 @@ projects we know about, see the ## JSON Mapping {#json} -Protobuf supports a canonical encoding in JSON, making it easier to share data -between systems. The encoding is described on a type-by-type basis in the table -below. 
- -When parsing JSON-encoded data into a protocol buffer, if a value is missing or -if its value is `null`, it will be interpreted as the corresponding -[default value](#optional). - -Singular fields that have a value set and that support field presence always -include the field value in the JSON-encoded output, even if it is the default -value. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| proto2 | JSON | JSON example | Notes |
-| --- | --- | --- | --- |
-| message | object | `{"fooBar": v, "g": null, ...}` | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the `json_name` field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the `json_name` option) and the original proto field name. `null` is an accepted value for all field types and treated as the default value of the corresponding field type. However, `null` cannot be used for the `json_name` value. For more on why, see Stricter validation for json_name. |
-| enum | string | `"FOO_BAR"` | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
-| map<K,V> | object | `{"k": v, ...}` | All keys are converted to strings. |
-| repeated V | array | `[v, ...]` | `null` is accepted as the empty list `[]`. |
-| bool | true, false | `true, false` | |
-| string | string | `"Hello World!"` | |
-| bytes | base64 string | `"YWJjMTIzIT8kKiYoKSctPUB+"` | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted. |
-| int32, fixed32, uint32 | number | `1, -10, 0` | JSON value will be a decimal number. Either numbers or strings are accepted. |
-| int64, fixed64, uint64 | string | `"1", "-10"` | JSON value will be a decimal string. Either numbers or strings are accepted. |
-| float, double | number | `1.1, -10.0, 0, "NaN", "Infinity"` | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. -0 is considered equivalent to 0. |
-| Any | object | `{"@type": "url", "f": v, ... }` | If the `Any` contains a value that has a special JSON mapping, it will be converted as follows: `{"@type": xxx, "value": yyy}`. Otherwise, the value will be converted into a JSON object, and the `"@type"` field will be inserted to indicate the actual data type. |
-| Timestamp | string | `"1972-01-01T10:00:20.021Z"` | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
-| Duration | string | `"1.000340012s", "1s"` | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
-| Struct | object | `{ ... }` | Any JSON object. See struct.proto. |
-| Wrapper types | various types | `2, "2", "foo", true, "true", null, 0, ...` | Wrappers use the same representation in JSON as the wrapped scalar type, except that `null` is allowed and preserved during data conversion and transfer. |
-| FieldMask | string | `"f.fooBar,h"` | See field_mask.proto. |
-| ListValue | array | `[foo, bar, ...]` | |
-| Value | value | | Any JSON value. Check google.protobuf.Value for details. |
-| NullValue | null | | JSON null |
-| Empty | object | `{}` | An empty JSON object |
- -### JSON Options {#json-options} - -A protobuf JSON implementation may provide the following options: - -* **Always emit fields without presence**: Fields that don't support presence - and that have their default value are omitted by default in JSON output (for - example, an implicit presence integer with a 0 value, implicit presence - string fields that are empty strings, and empty repeated and map fields). An - implementation may provide an option to override this behavior and output - fields with their default values. - - As of v25.x, the C++, Java, and Python implementations are nonconformant, as - this flag affects proto2 `optional` fields but not proto3 `optional` fields. - A fix is planned for a future release. - -* **Ignore unknown fields**: The protobuf JSON parser should reject unknown - fields by default but may provide an option to ignore unknown fields in - parsing. - -* **Use proto field name instead of lowerCamelCase name**: By default the - protobuf JSON printer should convert the field name to lowerCamelCase and - use that as the JSON name. An implementation may provide an option to use - proto field name as the JSON name instead. Protobuf JSON parsers are - required to accept both the converted lowerCamelCase name and the proto - field name. - -* **Emit enum values as integers instead of strings**: The name of an enum - value is used by default in JSON output. An option may be provided to use - the numeric value of the enum value instead. +The standard protobuf binary wire format is the preferred serialization format +for communication between two systems that use protobufs. For communicating with +systems that use JSON rather than protobuf wire format, Protobuf supports a +canonical encoding in [JSON](/programming-guides/json). 
## Options {#options} diff --git a/content/programming-guides/proto3.md b/content/programming-guides/proto3.md index 7c1706e90..cd9ce5ba4 100644 --- a/content/programming-guides/proto3.md +++ b/content/programming-guides/proto3.md @@ -68,7 +68,8 @@ You must give each field in your message definition a number between `1` and implementation. The protocol buffer compiler will complain if you use one of these reserved field numbers in your message. - You cannot use any previously [reserved](#fieldreserved) field numbers or - any field numbers that have been allocated to [extensions](#extensions). + any field numbers that have been allocated to + [extensions](/programming-guides/proto2#extensions). This number **cannot be changed once your message type is in use** because it identifies the field in the @@ -1041,7 +1042,7 @@ and unpack `Any` values in a typesafe manner – for example, in Java, the `Any` type will have special `pack()` and `unpack()` accessors, while in C++ there are `PackFrom()` and `UnpackTo()` methods: -```c++ +```cpp // Storing an arbitrary message type in Any. NetworkErrorDetails details = ...; ErrorStatus status; @@ -1104,7 +1105,7 @@ language in the relevant [API reference](/reference/). oneof. So if you set several oneof fields, only the *last* field you set will still have a value. - ```c++ + ```cpp SampleMessage message; message.set_name("name"); CHECK_EQ(message.name(), "name"); @@ -1129,7 +1130,7 @@ language in the relevant [API reference](/reference/). following sample code will crash because `sub_message` was already deleted by calling the `set_name()` method. - ```c++ + ```cpp SampleMessage message; SubMessage* sub_message = message.mutable_sub_message(); message.set_name("name"); // Will delete sub_message @@ -1140,7 +1141,7 @@ language in the relevant [API reference](/reference/). end up with the other's oneof case: in the example below, `msg1` will have a `sub_message` and `msg2` will have a `name`. 
- ```c++ + ```cpp SampleMessage msg1; msg1.set_name("name"); SampleMessage msg2; @@ -1320,224 +1321,10 @@ about, see the ## JSON Mapping {#json} -Protobuf supports a canonical encoding in JSON, making it easier to share data -between systems. The encoding is described on a type-by-type basis in the table -below. - -When parsing JSON-encoded data into a protocol buffer, if a value is missing or -if its value is `null`, it will be interpreted as the corresponding -[default value](#default). - -When generating JSON-encoded output from a protocol buffer, if a protobuf field -has the default value and if the field doesn't support field presence, it will -be omitted from the output by default. An implementation may provide options to -include fields with default values in the output. - -A proto3 field that is defined with the `optional` keyword supports field -presence. Fields that have a value set and that support field presence always -include the field value in the JSON-encoded output, even if it is the default -value. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
proto3JSONJSON exampleNotes
messageobject{"fooBar": v, "g": null, ...}Generates JSON objects. Message field names are mapped to - lowerCamelCase and become JSON object keys. If the - json_name field option is specified, the specified value - will be used as the key instead. Parsers accept both the lowerCamelCase - name (or the one specified by the json_name option) and the - original proto field name. null is an accepted value for - all field types and treated as the default value of the corresponding - field type. However, null cannot be used for the - json_name value. For more on why, see - Stricter validation for json_name. -
enumstring"FOO_BAR"The name of the enum value as specified in proto is used. Parsers - accept both enum names and integer values. -
map<K,V>object{"k": v, ...}All keys are converted to strings.
repeated Varray[v, ...]null is accepted as the empty list [].
booltrue, falsetrue, false
stringstring"Hello World!"
bytesbase64 string"YWJjMTIzIT8kKiYoKSctPUB+"JSON value will be the data encoded as a string using standard base64 - encoding with paddings. Either standard or URL-safe base64 encoding - with/without paddings are accepted. -
int32, fixed32, uint32number1, -10, 0JSON value will be a decimal number. Either numbers or strings are - accepted. Empty strings are invalid. -
int64, fixed64, uint64string"1", "-10"JSON value will be a decimal string. Either numbers or strings are - accepted. Empty strings are invalid. -
float, doublenumber1.1, -10.0, 0, "NaN", "Infinity"JSON value will be a number or one of the special string values "NaN", - "Infinity", and "-Infinity". Either numbers or strings are accepted. - Empty strings are invalid. Exponent notation is also accepted. -
Anyobject{"@type": "url", "f": v, ... }If the Any contains a value that has a special JSON - mapping, it will be converted as follows: {"@type": xxx, "value": - yyy}. Otherwise, the value will be converted into a JSON object, - and the "@type" field will be inserted to indicate the - actual data type. -
Timestampstring"1972-01-01T10:00:20.021Z"Uses RFC 3339, where generated output will always be Z-normalized - and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are - also accepted. -
Durationstring"1.000340012s", "1s"Generated output always contains 0, 3, 6, or 9 fractional digits, - depending on required precision, followed by the suffix "s". Accepted - are any fractional digits (also none) as long as they fit into - nano-seconds precision and the suffix "s" is required. -
Structobject{ ... }Any JSON object. See struct.proto.
Wrapper typesvarious types2, "2", "foo", true, "true", null, 0, ...Wrappers use the same representation in JSON as the wrapped primitive - type, except that null is allowed and preserved during data - conversion and transfer. -
FieldMaskstring"f.fooBar,h"See field_mask.proto.
ListValuearray[foo, bar, ...]
ValuevalueAny JSON value. Check - google.protobuf.Value - for details. -
NullValuenullJSON null
Emptyobject{}An empty JSON object
- -### JSON Options {#json-options} - -A protobuf JSON implementation may provide the following options: - -* **Always emit fields without presence**: Fields that don't support presence - and that have their default value are omitted by default in JSON output (for - example, an implicit presence integer with a 0 value, implicit presence - string fields that are empty strings, and empty repeated and map fields). An - implementation may provide an option to override this behavior and output - fields with their default values. - - As of v25.x, the C++, Java, and Python implementations are nonconformant, as - this flag affects proto2 `optional` fields but not proto3 `optional` fields. - A fix is planned for a future release. - -* **Ignore unknown fields**: The protobuf JSON parser should reject unknown - fields by default but may provide an option to ignore unknown fields in - parsing. - -* **Use proto field name instead of lowerCamelCase name**: By default the - protobuf JSON printer should convert the field name to lowerCamelCase and - use that as the JSON name. An implementation may provide an option to use - proto field name as the JSON name instead. Protobuf JSON parsers are - required to accept both the converted lowerCamelCase name and the proto - field name. - -* **Emit enum values as integers instead of strings**: The name of an enum - value is used by default in JSON output. An option may be provided to use - the numeric value of the enum value instead. +The standard protobuf binary wire format is the preferred serialization format +for communication between two systems that use protobufs. For communicating with +systems that use JSON rather than protobuf wire format, Protobuf supports a +canonical encoding in [JSON](/programming-guides/json). 
## Options {#options} diff --git a/content/reference/java/java-proto-names.md b/content/reference/java/java-proto-names.md index 6d64e1f33..8233f8adf 100644 --- a/content/reference/java/java-proto-names.md +++ b/content/reference/java/java-proto-names.md @@ -2,7 +2,7 @@ title = "Java Proto Names" weight = 655 linkTitle = "Generated Proto Names" -description = "Names that are generated by the immutable API." +description = "Names that are generated by the Java protoc plugin." type = "docs" +++ @@ -10,47 +10,64 @@ This document contains information on what the fully-qualified Java name of a proto is, based on the different proto options. This name corresponds to the package you need to import to use that message. -**NOTE:** The `java_package` and `java_alt_api_package` options are interpreted -relative to the API indicated by `java_api_version`. For example, if -`java_api_version` is 1, then the proto1 package will be `java_package` and the -proto2 package (the "alternative" API) will be `java_alt_api_package`. And if -`java_api_version` is 2, then `java_package` determines the proto2 package and -`java_alt_api_package` determines the proto1 package. +## Recommendation { #recommendation } + +* Set `option java_multiple_files = true;` +* Set `option java_outer_classname = "FileNameProto";` +* Set `option java_package = "com.google.package";` + +### Explanation {#explanation} + +#### Multiple Files {#multiple-files} + +With `java_multiple_files = true`, the generated Java class for each message +will be placed in a separate `.java` file. This makes it much easier to move +messages from one `.proto` file to another. + +#### Outer Classname {#outer-classname} + +There is a Java class generated for the `.proto` file itself. The name of the +class for the file will be automatically generated if not specified. However, +the rules for how that name is generated are overly-complicated and non-obvious. 
+The best policy is to explicitly set the `java_outer_classname` option to the +`.proto` file name converted to PascalCase with the `'.'` removed. For example: + +* The file `student_record_request.proto` should set: + + ```proto + option java_outer_classname = "StudentRecordRequestProto"; + ``` + +#### Java Package {#java-package} + +The Java package for generated bindings will be automatically set to the proto +package. However, this is usually not conformant with Java conventions. To +ensure a conventional Java package name, we recommend explicitly setting the +`java_package` option. For example, within Google, the convention is to prepend +`com.google.` to the proto package. ## Immutable API Message Names { #immutable-api-message-names } -The names for protos generated by the immutable API (`java_proto_library` BUILD -target) are listed in the following table. - -java_api_version | java_multiple_files | java_alt_api_package | java_package | java_outer_classname | Generated full message name -:--------------: | :-----------------: | -------------------- | ------------ | -------------------- | --------------------------- -1 | true | Defined | - | | `$java_alt_api_package.$message` -1 | true | Not defined | Not defined | | `com.google.protos.$package.proto2api.$message` -1 | true | Not defined | Defined | | `$java_package.proto2api.$message` -1 | false | Defined | - | Not defined | `$java_alt_api_package.$derived_outer_class.$message` -1 | false | Defined | - | Defined | `$java_alt_api_package.$java_outer_classname.$message` -1 | false | Not defined | Not defined | Not defined | `com.google.protos.$package.proto2api.$derived_outer_class.$message` -1 | false | Not defined | Not defined | Defined | `com.google.protos.$package.proto2api.$java_outer_classname.$message` -1 | false | Not defined | Defined | Not defined | `$java_package.proto2api.$derived_outer_class.$message` -1 | false | Not defined | Defined | Defined | 
`$java_package.proto2api.$java_outer_classname.$message` -2 | true | - | Not defined | - | `com.google.protos.$package.$message` -2 | true | - | Defined | - | `$java_package.$message` -2 | false | - | Not defined | Not defined | `com.google.protos.$package.$derived_outer_class.$message` -2 | false | - | Not defined | Defined | `com.google.protos.$package.$java_outer_classname.$message` -2 | false | - | Defined | Not defined | `$java_package.$derived_outer_class.$message` -2 | false | - | Defined | Defined | `$java_package.$java_outer_classname.$message` +The Java plugin for protoc will generate names according to this table. + +java_multiple_files | java_package | java_outer_classname | Generated full message name +:-----------------: | ------------ | -------------------- | --------------------------- +true | Not defined | *ignored* | `com.google.protos.$package.$message` +true | Defined | *ignored* | `$java_package.$message` +false | Not defined | Not defined | `com.google.protos.$package.$derived_outer_class.$message` +false | Not defined | Defined | `com.google.protos.$package.$java_outer_classname.$message` +false | Defined | Not defined | `$java_package.$derived_outer_class.$message` +false | Defined | Defined | `$java_package.$java_outer_classname.$message` **Legend** -* \- means either setting or not setting the option will not change the - generated full message name. * `$message` is the actual name of the proto message. * `$package` is the name of the proto package. This is the name specified by the `package` directive in the proto file, which is usually at the top of the file. * `$derived_outer_class` is a name generated from the proto file name. Generally it's computed by removing punctuation from the file name and - converting it to CamelCase. For example, if the proto is `foo_bar.proto`, + converting it to PascalCase. For example, if the proto is `foo_bar.proto`, the `$derived_outer_class` value is `FooBar`. 
If the generated class name would be the same as one of the messages defined @@ -60,22 +77,5 @@ java_api_version | java_multiple_files | java_alt_api_package | java_package | j true when using the v1 API, whether or not the class name would be the same as one of the messages defined. -* All other `$names` are the values of the corresponding proto2 file options - defined in the proto file. - -### Recommendation { #recommendation } - -The recommended option to use is: - -```proto -option java_multiple_files = true; -``` - -With `java_multiple_files = true`, the generated Java class for each message -will be placed in a separate `.java` file. This makes it much easier to move -messages from one `.proto` file to another. There is also an outer Java class -generated for the `.proto` file itself. (The legend above explains how this -outer class name is generated.) - -The `java_api_version` option defaults to `2`, but you can manually set it to -`1` when necessary. +* All other `$names` are the values of the corresponding file options defined + in the `.proto` file. 
diff --git a/content/reference/protobuf/edition-2023-spec.md b/content/reference/protobuf/edition-2023-spec.md index a2c2c9cbf..5c86151cb 100644 --- a/content/reference/protobuf/edition-2023-spec.md +++ b/content/reference/protobuf/edition-2023-spec.md @@ -323,26 +323,26 @@ of the following are prohibited: ``` message MyMessage { - optional string foo = 1; + string foo = 1; message foo {} } message MyMessage { - optional string foo = 1; + string foo = 1; oneof foo { string bar = 2; } } message MyMessage { - optional string foo = 1; + string foo = 1; extend Extendable { - optional string foo = 2; + string foo = 2; } } message MyMessage { - optional string foo = 1; + string foo = 1; enum E { foo = 0; } @@ -362,7 +362,7 @@ Example: ```proto extend Foo { - optional int32 bar = 126; + int32 bar = 126; } ``` @@ -415,7 +415,7 @@ message Outer { } message Foo { message GroupMessage { - optional bool a = 1; + bool a = 1; } GroupMessage groupmessage = [features.message_encoding = DELIMITED]; } diff --git a/content/reference/rust/building-rust-protos.md b/content/reference/rust/building-rust-protos.md new file mode 100644 index 000000000..d1ad0be26 --- /dev/null +++ b/content/reference/rust/building-rust-protos.md @@ -0,0 +1,65 @@ ++++ +title = "Building Rust Protos" +weight = 783 +linkTitle = "Building Rust Protos" +description = "Describes using Blaze to build Rust protos." +type = "docs" +toc_hide = "true" ++++ + +The process of building a Rust library for a Protobuf definition is similar to +other programming languages: + +1. Use the language-agnostic `proto_library` rule: + + ```build + proto_library( + name = "person_proto", + srcs = ["person.proto"], + ) + ``` + +2. Create a Rust library: + + ```build {highlight="lines:1,8-11"} + load("//third_party/protobuf/rust:defs.bzl", "rust_proto_library") + + proto_library( + name = "person_proto", + srcs = ["person.proto"], + ) + + rust_proto_library( + name = "person_rust_proto", + deps = [":person_proto"], + ) + ``` + +3. 
Use the library by including it in a Rust binary: + + ```build {highlight="lines:1,14-20"} + load("//third_party/bazel_rules/rules_rust/rust:defs.bzl", "rust_binary") + load("//third_party/protobuf/rust:defs.bzl", "rust_proto_library") + + proto_library( + name = "person_proto", + srcs = ["person.proto"], + ) + + rust_proto_library( + name = "person_rust_proto", + deps = [":person_proto"], + ) + + rust_binary( + name = "greet", + srcs = ["greet.rs"], + deps = [ + ":person_rust_proto", + ], + ) + ``` + +**Note:** Don't use `rust_upb_proto_library` or `rust_cc_proto_library` +directly. `rust_proto_library` checks the global build flag to choose the +appropriate backend for you. diff --git a/content/reference/rust/index.md b/content/reference/rust/index.md new file mode 100644 index 000000000..0565f5da9 --- /dev/null +++ b/content/reference/rust/index.md @@ -0,0 +1,8 @@ ++++ +title = "Rust Reference" +weight = 781 +linkTitle = "Rust" +description = "Reference documentation for working with protocol buffer classes in Rust." +type = "docs" +toc_hide = "true" ++++ diff --git a/content/reference/rust/rust-design-decisions.md b/content/reference/rust/rust-design-decisions.md new file mode 100644 index 000000000..34d97a527 --- /dev/null +++ b/content/reference/rust/rust-design-decisions.md @@ -0,0 +1,184 @@ ++++ +title = "Rust Proto Design Decisions" +weight = 782 +linkTitle = "Design Decisions" +description = "Explains some of the design choices that the Rust Proto implementation makes." +type = "docs" +toc_hide = "true" ++++ + +As with any library, Rust Protobuf is designed considering the needs of both +Google's first-party usage of Rust as well as that of external users. Choosing a +path in that design space means that some choices made will not be optimal for +some users in some cases, even if it is the right choice for the implementation +overall.
+ +This page covers some of the larger design decisions that the Rust Protobuf +implementation makes and the considerations which led to those decisions. + +## Designed to Be ‘Backed’ by Other Protobuf Implementations, Including C++ Protobuf {#backed-by-cpp} + +Protobuf Rust is not a pure Rust implementation of protobuf, but a safe Rust API +implemented on top of existing protobuf implementations, or as we call these +implementations: kernels. + +The biggest factor that went into this decision was to enable zero-cost +addition of Rust to a preexisting binary which already uses non-Rust Protobuf. By +enabling the implementation to be ABI-compatible with the C++ Protobuf generated +code, it is possible to share Protobuf messages across the language boundary +(FFI) as plain pointers, avoiding the need to serialize in one language, pass +the byte array across the boundary, and deserialize in the other language. This +also reduces binary size for these use cases by avoiding having redundant schema +information embedded in the binary for the same messages for each language. + +Protobuf Rust currently supports three kernels: + +* C++ kernel - the generated code is backed by C++ Protocol Buffers (the + "full" implementation, typically used for servers). This kernel offers + in-memory interoperability with C++ code that uses the C++ runtime. This is + the default for servers within Google. +* C++ Lite kernel - the generated code is backed by C++ Lite Protocol Buffers + (typically used for mobile). This kernel offers in-memory interoperability + with C++ code that uses the C++ Lite runtime. This is the default for + mobile apps within Google. +* upb kernel - the generated code is backed by + [upb](https://github.com/protocolbuffers/protobuf/tree/main/upb), + a highly performant and small-binary-size Protobuf library written in C. upb + is designed to be used as an implementation detail by Protobuf runtimes in + other languages.
This is the default in open source builds where we expect + static linking with code already using C++ Protobuf to be more rare. + +The decision to support multiple non-Rust kernels significantly influences +our public API decisions, including the types used on getters (discussed later +in this document). + +### No Pure Rust Kernel {#no-pure-rust} + +Given that we designed the API to be implementable by multiple backing +implementations, a natural question is why the only supported kernels are +written in the memory-unsafe languages of C and C++ today. + +While Rust being a memory-safe language can significantly reduce exposure to +critical security issues, no language is immune to security issues. The Protobuf +implementations that we support as kernels have been scrutinized and fuzzed to +the extent that Google is comfortable using those implementations to perform +unsandboxed parsing of untrusted inputs in our own servers and apps. A +greenfield binary parser written in Rust at this time would be understood to be +much more likely to contain critical vulnerabilities than the preexisting C++ +Protobuf parser. + +There are legitimate arguments for supporting a pure Rust +implementation in the long term, including toolchain difficulties for developers using our +implementation in open source. + +It is a reasonable assumption that Google will support a pure Rust +implementation at some later date, but we are not investing in it today and have +no concrete roadmap for it at this time. + +## View/Mut Proxy Types {#view-mut-proxy-types} + +The Rust Proto API is designed with opaque "Proxy" types. For a .proto file that +defines `message SomeMsg {}`, we generate the Rust types `SomeMsg`, +`SomeMsgView<'_>` and `SomeMsgMut<'_>`. The simple rule of thumb is that we +expect the View and Mut types to stand in for `&SomeMsg` and `&mut SomeMsg` in +all usages by default, while still getting all of the borrow checking/Send/etc. +behavior that you would expect from those types.
+ +### Another Lens to Understand These Types {#another-lens} + +To better understand the nuances of these types, it may be useful to think of +these types as follows: + +```rust +struct SomeMsg(Box<cpp::SomeMsg>); +struct SomeMsgView<'a>(&'a cpp::SomeMsg); +struct SomeMsgMut<'a>(&'a mut cpp::SomeMsg); +``` + +Under this lens you can see that: + +- Given a `&SomeMsg` it is possible to get a `SomeMsgView` (similar to how + given a `&Box<T>` you can get a `&T`). +- Given a `SomeMsgView` it is *not* possible to get a `&SomeMsg` (similar to + how given a `&T` you couldn't get a `&Box<T>`). + +Just like with the `&Box<T>` example, this means that on function arguments, it is +generally better to default to using `SomeMsgView<'a>` rather than a `&'a +SomeMsg`, as it will allow a superset of callers to use the function. + +### Why {#why} + +There are two main reasons for this design: to unlock possible optimization +benefits, and as an inherent outcome of the kernel design. + +#### Optimization Opportunity Benefit {#optimization} + +Because Protobuf is such a core and widespread technology, it is unusually +prone to every possible observable behavior being depended on by someone, and +relatively small optimizations have an unusually major net impact at scale. We +have found that more opaque types give us an unusually high amount of +leverage: they permit us to be more deliberate about exactly what behaviors are +exposed, and give us more room to optimize the implementation. + +A `SomeMsgMut<'_>` provides those opportunities where a `&mut SomeMsg` would +not: namely that we can construct them lazily and with an implementation detail +which is not the same as the owned message representation.
It also inherently +allows us to control certain behaviors that we couldn't otherwise limit or +control: for example, any `&mut` can be used with `std::mem::swap()`, which is a +behavior that would place strong limits on what invariants you are able to +maintain between a parent and child struct if `&mut SomeChild` is given to +callers. + +#### Inherent to Kernel Design {#kernel-design} + +The other reason for the proxy types is more of an inherent limitation to our +kernel design; when you have a `&T` there must be a real Rust `T` type in memory +somewhere. + +Our C++ kernel design allows you to parse a message which contains nested +messages, and create only a small Rust stack-allocated object to represent +the root message, with all other memory being stored on the C++ heap. When you +later access a child message, there will be no already-allocated Rust object +which corresponds to that child, and so there's no Rust instance to borrow at +that moment. + +By using proxy types, we're able to create, on demand, Rust proxy types that +semantically act as borrows, without there being any eagerly allocated Rust +memory for those instances ahead of time. + +## Non-Std Types {#non-std} + +### Simple Types Which May Have a Directly Corresponding Std Type {#corresponding-std} + +In some cases the Rust Protobuf API may choose to create our own types where a +corresponding std type exists with the same name, where the current +implementation may even simply wrap the std type, for example +`proto::Utf8Error`. + +Using these types rather than std types gives us more flexibility in optimizing +the implementation in the future. Our current implementation uses Rust's +std UTF-8 validation today, but creating our own `proto::Utf8Error` type +enables us to change the implementation to use the highly optimized C++ +implementation of UTF-8 validation that we use from C++ Protobuf, which is faster +than Rust's std UTF-8 validation.
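The wrapping approach described above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the library's code: the `Utf8Error` struct and the `validate_utf8` function below are hypothetical stand-ins written for this example, and only `std::str::from_utf8` is a real API.

```rust
use std::fmt;

/// Hypothetical library-owned error type (a stand-in, not the real
/// `proto::Utf8Error`). The std error is kept in a private field, so
/// callers cannot come to depend on it being there.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Utf8Error {
    inner: std::str::Utf8Error,
}

impl fmt::Display for Utf8Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid UTF-8: {}", self.inner)
    }
}

impl std::error::Error for Utf8Error {}

/// Hypothetical validation entry point. Today it delegates to std; the body
/// could later call a different validator without changing this signature.
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes).map_err(|inner| Utf8Error { inner })
}
```

Because the public signature mentions only the wrapper type, swapping the validator behind it would not be a breaking change for callers.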
+ +### ProtoString {#proto-string} + +Rust's `str` and `std::string::String` types maintain a strict invariant that +they only contain valid UTF-8, but C++ Protobuf and C++'s `std::string` type +generally do not enforce any such guarantee. `string` typed Protobuf fields are +intended to only ever contain valid UTF-8, but the enforcement of this has many +holes where a `string` field may end up containing invalid UTF-8 contents at +runtime. + +To deliver on zero-cost message sharing between C++ and Rust while minimizing +costly validations or risk of undefined behavior in Rust, we chose not to use +the `str`/`String` types for `string` field getters, and introduced the types +`ProtoStr` and `ProtoString` instead, which are equivalent types except that they +could contain invalid UTF-8 in rare situations. Those types let the application +code choose whether to perform the validation on demand to get a `&str`, or +operate on the raw bytes to avoid any validation. + +We are aware that vocabulary types like `str` are very important to idiomatic +usage, and intend to keep an eye on whether this decision is the right one as usage +details of Rust evolve. diff --git a/content/reference/rust/rust-generated.md b/content/reference/rust/rust-generated.md new file mode 100644 index 000000000..91da233c2 --- /dev/null +++ b/content/reference/rust/rust-generated.md @@ -0,0 +1,567 @@ ++++ +title = "Rust Generated Code Guide" +weight = 782 +linkTitle = "Generated Code Guide" +description = "Describes the API of message objects that the protocol buffer compiler generates for any given protocol definition." +type = "docs" +toc_hide = "true" ++++ + +This page describes exactly what Rust code the protocol buffer compiler +generates for any given protocol definition. + +Any differences between proto2 and proto3 generated code are highlighted.
You +should read the +[proto2 language guide](/programming-guides/proto2.md) +and/or +[proto3 language guide](/programming-guides/proto3.md) +before reading this document. + +## Protobuf Rust {#rust} + +Protobuf Rust is an implementation of protocol buffers designed to be able to +sit on top of other existing protocol buffer implementations that we refer to as +'kernels'. + +The decision to support multiple non-Rust kernels has significantly influenced +our public API, including the choice to use custom types like `ProtoStr` over +Rust std types like `str`. See +[Rust Proto Design Decisions](/reference/rust/rust-design-decisions.md) +for more on this topic. + +## Generated Filenames {#filenames} + +Each `rust_proto_library` will be compiled as one crate. Most importantly, for +every `.proto` file in the `srcs` of the corresponding `proto_library`, one Rust +file is emitted, and all these files form a single crate. + +Files generated by the compiler vary between kernels. In general, the names of +the output files are computed by taking the name of the `.proto` file and +replacing the extension. + +Generated files: + +* C++ kernel: + * `.c.pb.rs` - generated Rust code + * `.pb.thunks.cc` - generated C++ thunks (glue code that Rust code calls, + and that delegates to the C++ Protobuf APIs). +* C++ Lite kernel: + * <same as C++ kernel> +* UPB kernel + * `.u.pb.rs` - generated Rust code. \ + (However, `rust_proto_library` relies on the `.thunks.c` file produced + by `upb_proto_aspect`.) + +If the `proto_library` contains more than one file, the first file is declared a +"primary" file and is treated as the entry point for the crate; that file will +contain both the gencode corresponding to the `.proto` file, and also re-exports +for all symbols defined in the files corresponding to all "secondary" files. + +## Packages {#packages} + +Unlike in most other languages, the `package` declarations in the `.proto` files +are not used in Rust codegen. 
Instead, each `rust_proto_library(name = +"some_rust_proto")` target emits a crate named `some_rust_proto` which contains +the generated code for all `.proto` files in the target. + +## Messages {#messages} + +Given the message declaration: + +```proto +message Foo {} +``` + +The compiler generates a struct named `Foo`. The `Foo` struct defines the +following methods: + +* `fn new() -> Self`: Creates a new instance of `Foo`. +* `fn parse(data: &[u8]) -> Result<Self, ParseError>`: Parses `data` + into an instance of `Foo` if `data` holds a valid wire format representation + of `Foo`. Otherwise, the function returns an error. +* `fn clear_and_parse(&mut self, data: &[u8]) -> Result<(), ParseError>`: Like + calling `.clear()` and `parse()` in sequence. +* `fn serialize(&self) -> Result<Vec<u8>, SerializeError>`: Serializes the + message to Protobuf wire format. Serialization can fail but rarely will. + Failure reasons include exceeding the maximum message size, insufficient + memory, and required fields (proto2) that are unset. +* `fn merge_from(&mut self, other)`: Merges `self` with `other`. +* `fn as_view(&self) -> FooView<'_>`: Returns an immutable handle (view) to + `Foo`. This is further covered in the section on proxy types. +* `fn as_mut(&mut self) -> FooMut<'_>`: Returns a mutable handle (mut) to + `Foo`. This is further covered in the section on proxy types. + +`Foo` implements the following traits: + +* `std::fmt::Debug` +* `std::default::Default` +* `std::clone::Clone` +* `std::ops::Drop` +* `std::marker::Send` +* `std::marker::Sync` + +#### Message Proxy Types {#message-proxy-types} + +As a consequence of the requirement to support multiple kernels with a single +Rust API, we cannot in some situations use native Rust references (`&T` and +`&mut T`), but instead, we need to express these concepts using types - `View`s +and `Mut`s.
These situations are shared and mutable references to: + +* Messages +* Repeated fields +* Map fields + +For example, the compiler emits structs `FooView<'a>` and `FooMut<'msg>` +alongside `Foo`. These types are used in place of `&Foo` and `&mut Foo`, and +they behave the same as native Rust references in terms of borrow checker +behavior. Just like native borrows, Views are `Copy` and the borrow checker will +enforce that you can either have any number of Views or at most one Mut live at +a given time. + +For the purposes of this documentation, we focus on describing all methods +emitted for the owned message type (`Foo`). A subset of these functions with +`&self` receiver will also be included on the `FooView<'msg>`. A subset of these +functions with either `&self` or `&mut self` will also be included on the +`FooMut<'msg>`. + +## Nested Types {#nested-types} + +Given the message declaration: + +```proto +message Foo { + message Bar { + enum Baz { ... } + } +} +``` + +In addition to the struct named `Foo`, a module named `foo` is created to +contain the struct for `Bar`. And similarly a nested module named `bar` to +contain the deeply nested enum `Baz`: + +```rs +pub struct Foo {} + +pub mod foo { + pub struct Bar {} + pub mod bar { + pub struct Baz { ... } + } +} +``` + +## Fields {#fields} + +In addition to the methods described in the previous section, the protocol +buffer compiler generates a set of accessor methods for each field defined +within the message in the `.proto` file. + +Following Rust style, the methods are in lower-case/snake-case, such as +`has_foo()` and `clear_foo()`. Note that the capitalization of the field name +portion of the accessor maintains the style from the original .proto file, which +in turn should be lower-case/snake-case per the +[.proto file style guide](/programming-guides/style). 
+ +### Optional Numeric Fields (proto2 and proto3) {#optional-numeric} + +For either of these field definitions: + +```proto +optional int32 foo = 1; +required int32 foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn has_foo(&self) -> bool`: Returns `true` if the field is set. +* `fn foo(&self) -> i32`: Returns the current value of the field. If the field + is not set, it returns the default value. +* `fn foo_opt(&self) -> protobuf::Optional<i32>`: Returns an optional with the + variant `Set(value)` if the field is set or `Unset(default value)` if it's + unset. +* `fn set_foo(&mut self, val: i32)`: Sets the value of the field. After + calling this, `has_foo()` will return `true` and `foo()` will return + `val`. +* `fn clear_foo(&mut self)`: Clears the value of the field. After calling + this, `has_foo()` will return `false` and `foo()` will return the default + value. + +For other numeric field types (including `bool`), `i32` is replaced with the +corresponding Rust type according to the +[scalar value types table](/programming-guides/proto3#scalar). + +### Implicit Presence Numeric Fields (proto3) {#implicit-presence-numeric} + +For this field definition: + +```proto +int32 foo = 1; +``` + +* `fn foo(&self) -> i32`: Returns the current value of the field. If the field + is not set, returns `0`. +* `fn set_foo(&mut self, val: i32)`: Sets the value of the field. After + calling this, `foo()` will return `val`. + +For other numeric field types (including `bool`), `i32` is replaced with the +corresponding Rust type according to the +[scalar value types table](/programming-guides/proto3#scalar).
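The explicit-presence accessor contract listed above (`has_foo`, `foo`, `foo_opt`, `set_foo`, `clear_foo`) can be modeled with a small hand-written struct. This is a sketch under stated assumptions, not generated code: the `Optional` enum is a simplified stand-in for `protobuf::Optional` with only the `Set` and `Unset` variants named in the text, and `Foo` stores its field in a plain `Option<i32>`, which the real kernels do not.

```rust
/// Simplified stand-in for `protobuf::Optional<T>` (assumption: only the two
/// variants named in the text are modeled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Optional<T> {
    /// The field is set and holds this value.
    Set(T),
    /// The field is unset; the payload is the field's default value.
    Unset(T),
}

/// Hand-written model of a message declaring `optional int32 foo = 1;`.
#[derive(Debug, Default)]
pub struct Foo {
    foo: Option<i32>,
}

impl Foo {
    /// Default for an int32 field with no explicit default.
    const FOO_DEFAULT: i32 = 0;

    pub fn new() -> Self {
        Self::default()
    }

    pub fn has_foo(&self) -> bool {
        self.foo.is_some()
    }

    /// Returns the value, or the default when the field is unset.
    pub fn foo(&self) -> i32 {
        self.foo.unwrap_or(Self::FOO_DEFAULT)
    }

    /// Distinguishes "set to the default" from "never set".
    pub fn foo_opt(&self) -> Optional<i32> {
        match self.foo {
            Some(v) => Optional::Set(v),
            None => Optional::Unset(Self::FOO_DEFAULT),
        }
    }

    pub fn set_foo(&mut self, val: i32) {
        self.foo = Some(val);
    }

    pub fn clear_foo(&mut self) {
        self.foo = None;
    }
}
```

The point of `foo_opt()` in this model is that `Set(0)` and `Unset(0)` are different values even though `foo()` returns `0` in both cases. An implicit-presence field has no such distinction, which is why it has no `has_foo()` accessor.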
+ +### Optional String/Bytes Fields (proto2 and proto3) {#optional-string-byte} + +For any of these field definitions: + +```proto +optional string foo = 1; +required string foo = 1; +optional bytes foo = 1; +required bytes foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn has_foo(&self) -> bool`: Returns `true` if the field is set. +* `fn foo(&self) -> &protobuf::ProtoStr`: Returns the current value of the + field. If the field is not set, it returns the default value. +* `fn foo_opt(&self) -> protobuf::Optional<&ProtoStr>`: Returns an optional + with the variant `Set(value)` if the field is set or `Unset(default value)` + if it's unset. +* `fn clear_foo(&mut self)`: Clears the value of the field. After calling + this, `has_foo()` will return `false` and `foo()` will return the default + value. + +For fields of type `bytes` the compiler will generate the `ProtoBytes` type +instead. + +### Implicit Presence String/Bytes Fields (proto3) {#implicit-presence-string-byte} + +For these field definitions: + +```proto +optional string foo = 1; +string foo = 1; +optional bytes foo = 1; +bytes foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn foo(&self) -> &ProtoStr`: Returns the current value of the field. If the + field is not set, returns the empty string/empty bytes. +* `fn foo_opt(&self) -> Optional<&ProtoStr>`: Returns an optional with the + variant `Set(value)` if the field is set or `Unset(default value)` if it's + unset. +* `fn set_foo(&mut self, value: IntoProxied<ProtoString>)`: Sets the field to + `value`. After calling this function, `foo()` will return `value` and + `has_foo()` will return `true`. +* `fn has_foo(&self) -> bool`: Returns `true` if the field is set. +* `fn clear_foo(&mut self)`: Clears the value of the field. After calling + this, `has_foo()` will return `false` and `foo()` will return the default + value. 
+ +For fields of type `bytes` the compiler will generate the `ProtoBytes` type +instead. + +### Singular String and Bytes Fields with Cord Support {#singular-string-bytes} + +`[ctype = CORD]` enables bytes and strings to be stored as an +[absl::Cord](https://github.com/abseil/abseil-cpp/blob/master/absl/strings/cord.h) +in C++ Protobufs. `absl::Cord` currently does not have an equivalent type in +Rust. Protobuf Rust uses an enum to represent a cord +field: + +```rust +enum ProtoStringCow<'a> { + Owned(ProtoString), + Borrowed(&'a ProtoStr) +} +``` + +In the common case, for small strings, an `absl::Cord` stores its data as a +contiguous string. In this case cord accessors return +`ProtoStringCow::Borrowed`. If the underlying `absl::Cord` is non-contiguous, +the accessor copies the data from the cord into an owned `ProtoString` and +returns `ProtoStringCow::Owned`. `ProtoStringCow` implements +`Deref<Target=ProtoStr>`. + +For any of these field definitions: + +```proto +optional string foo = 1 [ctype = CORD]; +string foo = 1 [ctype = CORD]; +optional bytes foo = 1 [ctype = CORD]; +bytes foo = 1 [ctype = CORD]; +``` + +The compiler generates the following accessor methods: + +* `fn foo(&self) -> ProtoStringCow<'_>`: Returns the current value of the + field. If the field is not set, returns the empty string/empty bytes. +* `fn set_foo(&mut self, value: IntoProxied<ProtoString>)`: Sets the + field to `value`. After calling this function, `foo()` returns `value` and + `has_foo()` returns `true`. +* `fn has_foo(&self) -> bool`: Returns `true` if the field is set. +* `fn clear_foo(&mut self)`: Clears the value of the field. After calling + this, `has_foo()` returns `false` and `foo()` returns the default value. + Cords have not been implemented yet. + +For fields of type `bytes` the compiler generates the `ProtoBytesCow` type +instead. 
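The borrow-or-copy behavior described above for cord accessors can be sketched with standard-library types. Everything in this example is a hand-written stand-in: `StrCow` plays the role of `ProtoStringCow` using `String` and `&str`, and `read_chunks` is a hypothetical function that models a cord as a slice of chunks. It is not part of the Protobuf Rust API.

```rust
use std::ops::Deref;

// Stand-in for `ProtoStringCow`, built on `String`/`&str` for illustration.
pub enum StrCow<'a> {
    Owned(String),
    Borrowed(&'a str),
}

// Dereferencing gives a borrowed string slice regardless of the variant,
// so callers can use the value without matching on it.
impl Deref for StrCow<'_> {
    type Target = str;

    fn deref(&self) -> &str {
        match self {
            StrCow::Owned(s) => s.as_str(),
            StrCow::Borrowed(s) => s,
        }
    }
}

// Hypothetical accessor: a "cord" is modeled as a slice of chunks.
// Contiguous data (zero or one chunk) is borrowed; fragmented data is
// copied into a newly allocated owned string.
pub fn read_chunks<'a>(chunks: &'a [&'a str]) -> StrCow<'a> {
    match chunks {
        [] => StrCow::Borrowed(""),
        [only] => StrCow::Borrowed(*only),
        many => StrCow::Owned(many.concat()),
    }
}

fn main() {
    let contiguous = ["hello"];
    let small = read_chunks(&contiguous);
    assert!(matches!(small, StrCow::Borrowed(_)));

    let fragmented = ["hel", "lo"];
    let large = read_chunks(&fragmented);
    assert!(matches!(large, StrCow::Owned(_)));

    // Both variants read the same through `Deref`.
    assert_eq!(&*small, &*large);
}
```

Because of the `Deref` implementation, code that only reads the value does not need to care which variant it received; the variant matters only when the cost of the copy does.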
+ +### Optional Enum Fields (proto2 and proto3) {#optional-enum} + +Given the enum type: + +```proto +enum Bar { + BAR_UNSPECIFIED = 0; + BAR_VALUE = 1; + BAR_OTHER_VALUE = 2; +} +``` + +The compiler generates a struct where each variant is an associated constant: + +```rust +pub struct Bar(i32); + +impl Bar { + pub const Unspecified: Bar = Bar(0); + pub const Value: Bar = Bar(1); + pub const OtherValue: Bar = Bar(2); +} +``` + +For either of these field definitions: + +```proto +optional Bar foo = 1; +required Bar foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn has_foo(&self) -> bool`: Returns `true` if the field is set. +* `fn foo(&self) -> Bar`: Returns the current value of the field. If the field + is not set, it returns the default value. +* `fn foo_opt(&self) -> Optional<Bar>`: Returns an optional with the variant + `Set(value)` if the field is set or `Unset(default value)` if it's unset. +* `fn set_foo(&mut self, val: Bar)`: Sets the value of the field. After + calling this, `has_foo()` will return `true` and `foo()` will return + `val`. +* `fn clear_foo(&mut self)`: Clears the value of the field. After calling + this, `has_foo()` will return `false` and `foo()` will return the default + value. + +### Implicit Presence Enum Fields (proto3) {#implicit-presence-enum} + +Given the enum type: + +```proto +enum Bar { + BAR_UNSPECIFIED = 0; + BAR_VALUE = 1; + BAR_OTHER_VALUE = 2; +} +``` + +For this field definition: + +```proto +Bar foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn foo(&self) -> Bar`: Returns the current value of the field. If the field + is not set, it returns the default value. +* `fn set_foo(&mut self, value: Bar)`: Sets the value of the field. After + calling this, `foo()` will return `value`. 
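Because the enum is described above as a struct with associated constants rather than a native Rust `enum`, code that matches on it needs a catch-all arm. The following is a minimal sketch that re-declares `Bar` by hand in the shape shown above; the derives, the `allow` attribute, and the `describe` helper are assumptions added for illustration and are not taken from generated code.

```rust
// Hand-written copy of the struct shape shown above, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bar(i32);

// The constants use the CamelCase names from the text, so the naming lint
// is silenced here.
#[allow(non_upper_case_globals)]
impl Bar {
    pub const Unspecified: Bar = Bar(0);
    pub const Value: Bar = Bar(1);
    pub const OtherValue: Bar = Bar(2);
}

// Hypothetical helper: associated constants can be used as match patterns,
// but the wrapped `i32` can hold other numbers, so a wildcard arm is needed.
pub fn describe(bar: Bar) -> &'static str {
    match bar {
        Bar::Unspecified => "unspecified",
        Bar::Value => "value",
        Bar::OtherValue => "other value",
        _ => "unrecognized",
    }
}

fn main() {
    assert_eq!(describe(Bar::Value), "value");
    // A number with no named constant still fits in the struct.
    assert_eq!(describe(Bar(7)), "unrecognized");
}
```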
+ +### Optional Embedded Message Fields (proto2 and proto3) {#optional-embedded-message} + +Given the message type: + +```proto +message Bar {} +``` + +For any of these field definitions: + +```proto +// proto2 +optional Bar foo = 1; + +// proto3 +Bar foo = 1; +optional Bar foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn foo(&self) -> BarView<'_>`: Returns a view of the current value of the + field. If the field is not set, it returns an empty message. +* `fn foo_mut(&mut self) -> BarMut<'_>`: Returns a mutable handle to the + current value of the field. Sets the field if it is not set. After calling + this method, `has_foo()` returns `true`. +* `fn foo_opt(&self) -> protobuf::Optional<BarView<'_>>`: If the field is set, + returns the variant `Set` with its `value`. Otherwise, returns the variant + `Unset` with the default value. +* `fn set_foo(&mut self, value: impl protobuf::IntoProxied<Bar>)`: Sets the + field to `value`. After calling this method, `has_foo()` returns `true`. +* `fn has_foo(&self) -> bool`: Returns `true` if the field is set. +* `fn clear_foo(&mut self)`: Clears the field. After calling this method, + `has_foo()` returns `false`. + +### Repeated Fields {#repeated} + +For any repeated field definition the compiler will generate the same three +accessor methods that deviate only in the field type. + +For example, given the below field definition: + +```proto +repeated int32 foo = 1; +``` + +The compiler will generate the following accessor methods: + +* `fn foo(&self) -> RepeatedView<'_, i32>`: Returns a view of the underlying + repeated field. +* `fn foo_mut(&mut self) -> RepeatedMut<'_, i32>`: Returns a mutable handle to + the underlying repeated field. +* `fn set_foo(&mut self, src: impl IntoProxied<Repeated<i32>>)`: Sets the + underlying repeated field to a new repeated field provided in `src`. + +For different field types only the respective generic types of the +`RepeatedView`, `RepeatedMut` and `Repeated` types will change. 
For example, +given a field of type `string` the `foo()` accessor would return a +`RepeatedView<'_, ProtoString>`. + +### Map Fields {#map} + +For this map field definition: + +```proto +map<int32, int32> weight = 1; +``` + +The compiler will generate the following three accessor methods: + +* `fn weight(&self) -> protobuf::MapView<'_, i32, i32>`: Returns an immutable + view of the underlying map. +* `fn weight_mut(&mut self) -> protobuf::MapMut<'_, i32, i32>`: Returns a + mutable handle to the underlying map. +* `fn set_weight(&mut self, src: protobuf::IntoProxied<Map<i32, i32>>)`: Sets + the underlying map to `src`. + +For different field types only the respective generic types of the `MapView`, +`MapMut` and `Map` types will change. For example, given a value type of +`string`, the `weight()` accessor would return a `MapView<'_, i32, ProtoString>`. + +## Any {#any} + +Any is not special-cased by Rust Protobuf at this time; it will behave as though +it was a simple message with this definition: + +```proto +message Any { + string type_url = 1; + bytes value = 2 [ctype = CORD]; +} +``` + +## Oneof {#oneof} + +Given a oneof definition like this: + +```proto +oneof example_name { + int32 foo_int = 4; + string foo_string = 9; + ... +} +``` + +The compiler will generate accessors (getters, setters, hazzers) for every field +as if the same field was declared as an `optional` field outside of the oneof. +So you can work with oneof fields like regular fields, but setting one will +clear the other fields in the oneof block. 
In addition, the following types are +emitted for the `oneof` block: + +```rust + #[non_exhaustive] + #[derive(Debug, Clone, Copy)] + + pub enum ExampleName<'msg> { + FooInt(i32) = 4, + FooString(&'msg protobuf::ProtoStr) = 9, + not_set(std::marker::PhantomData<&'msg ()>) = 0 + } +``` + +```rust + #[derive(Debug, Copy, Clone, PartialEq, Eq)] + + pub enum ExampleNameCase { + FooInt = 4, + FooString = 9, + not_set = 0 + } +``` + +Additionally, it will generate the two accessors: + +* `fn example_name(&self) -> ExampleName<_>`: Returns the enum variant + indicating which field is set and the field's value. Returns `not_set` if no + field is set. +* `fn example_name_case(&self) -> ExampleNameCase`: Returns the enum variant + indicating which field is set. Returns `not_set` if no field is set. + +## Enumerations {#enumerations} + +Given an enum definition like: + +```proto +enum Foo { + VALUE_A = 0; + VALUE_B = 5; + VALUE_C = 1234; +} +``` + +The compiler will generate: + +```rust + #[derive(Clone, Copy, PartialEq, Eq)] + + pub struct Foo(i32); + + impl Foo { + pub const ValueA: Foo = Foo(0); + pub const ValueB: Foo = Foo(5); + pub const ValueC: Foo = Foo(1234); + } +``` + +## Extensions (proto2 only) {#extensions} + +A Rust API for extensions is currently a work in progress. +Extension fields will be maintained through +parse/serialize, and in a C++ interop case any extensions set will be retained +if the message is accessed from Rust (and propagated in the case of a message +copy or merge). + +## Arena Allocation {#arena} + +A Rust API for arena allocated messages has not yet been implemented. + +Internally, Protobuf Rust on upb kernel uses arenas, but on C++ kernels it +doesn't. However, references (both const and mutable) to messages that were +arena allocated in C++ can be safely passed to Rust to be accessed or mutated. + +## Services {#services} + +A Rust API for services has not yet been implemented. 
diff --git a/content/reference/rust/rust-redaction.md b/content/reference/rust/rust-redaction.md new file mode 100644 index 000000000..e0ed06696 --- /dev/null +++ b/content/reference/rust/rust-redaction.md @@ -0,0 +1,24 @@ ++++ +title = "Redaction in Rust" +weight = 785 +linkTitle = "Redaction in Rust" +description = "Describes redaction in Rust." +type = "docs" +toc_hide = "true" ++++ + + + +Use the standard `fmt::Debug` ("`{:?}`" in format strings) on Protobuf messages +for human-readable strings for logging, error messages, exceptions, and similar +use cases. The output of this debug info is not intended to be machine-readable +(unlike `TextFormat` and `JSON`, which +[should not be used for debug output](/programming-guides/dos-donts#text-format-interchange)). + +Using `fmt::Debug` enables redaction of some sensitive fields. + +Note that under the upb kernel this redaction is not yet implemented, but it is +expected to be added.