
Allow users to pass an io.Writer to protojson #1673

Open
denysvitali opened this issue Jan 21, 2025 · 7 comments

@denysvitali

What language does this apply to?

Golang (protocolbuffers/protobuf-go)

Describe the problem you are trying to solve.
As part of open-telemetry/opentelemetry-collector#7095 we would like to switch to the new Protobuf Opaque API to increase the performance of opentelemetry-collector (specifically when exporting telemetry as JSON).

We would like to use encoding/protojson for this, but the current implementation doesn't allow users to pass an io.Writer.

The only method available to encode JSON is:

func Marshal(m proto.Message) ([]byte, error)

Describe the solution you'd like

We would like to see a solution similar to encoding/json's Encoder, where a new encoder can be created by passing an io.Writer and we can call Encode(obj).
This would avoid allocating an intermediate buffer, which only hurts encoding performance.

func NewEncoder(w io.Writer) *Encoder
func (enc *Encoder) Encode(v any) error

Describe alternatives you've considered

None

Additional context


cc/ @stapelberg, @els0r

Moved from protocolbuffers/protobuf#20049

@dsnet
Member

dsnet commented Jan 21, 2025

I suspect it would be more forward-thinking to match the prospective "encoding/json/v2" API, and thus provide:

func MarshalWrite(io.Writer, proto.Message) error
func MarshalEncode(*jsontext.Encoder, proto.Message) error
func UnmarshalRead(io.Reader, proto.Message) error
func UnmarshalDecode(*jsontext.Decoder, proto.Message) error

and similar methods hung off the Options type:

func (MarshalOptions) MarshalWrite(io.Writer, proto.Message) error
func (MarshalOptions) MarshalEncode(*jsontext.Encoder, proto.Message) error
func (UnmarshalOptions) UnmarshalRead(io.Reader, proto.Message) error
func (UnmarshalOptions) UnmarshalDecode(*jsontext.Decoder, proto.Message) error

The entirety of the internal/encoding/json package can be removed and replaced with the prospective encoding/json/jsontext package. In fact, internal/encoding/json can be thought of as an early prototype for what eventually became jsontext.

This has other benefits, such as making it easier to bridge "encoding/json/v2" with "protojson" for users who are trying to serialize Go values that happen to contain a proto.Message and want to use "protojson" for those values:

import "encoding/json/v2"

json.Marshal(v,
    json.WithMarshalers(json.MarshalToFunc(protojson.MarshalEncode)),
)

where v is a large Go value with proto.Message types somewhere within it, but is not itself a proto.Message.

@stapelberg

@denysvitali Would you be willing to send a (Gerrit) change that implements your suggestion (and also @dsnet’s comments)? See https://github.com/protocolbuffers/protobuf-go/blob/master/CONTRIBUTING.md

@denysvitali
Author

Sure thing!
I'll be able to start to work on this in roughly two weeks if no one else picks it up before then :)

@dsnet
Member

dsnet commented Jan 23, 2025

I'm remembering now that https://github.com/protocolbuffers/protobuf-go/blob/master/internal/encoding/json is currently implemented only in terms of []byte. Switching it to an io.Reader and io.Writer isn't trivial.

Option 1: Buffer entirely

  • For writing, buffer the entire []byte and then call Write once.
  • For reading, read the entire body into a []byte and then unmarshal that.
  • There's no good way to support unmarshaling back-to-back messages unless you invent a Decoder type, but declaring such a type wouldn't play nice with the future jsontext.Decoder, which aims to keep "protojson"-like packages from reinventing a Decoder type just to maintain buffer state between top-level JSON values.
  • In both writing and reading, we're not truly streaming and have lost the main advantage of supporting io.Reader and io.Writer. We could at least pool the []byte used, which could save on allocations, but you'll have to think about how to handle encoding/json: incorrect usage of sync.Pool go#27735.

Option 2: Modify "./internal/encoding/json" to be streaming

  • Modify writing to slowly spit out the []byte as it gets full enough. This isn't hard to do.
  • Modify reading to slowly pull in data, but parsing becomes tricky since you have to deal with torn buffers. Much of the complexity in jsontext comes from dealing with this problem. Naive approaches can become O(n^2), and v1 "encoding/json" is O(n^2) in some edge cases. This can result in a security bug in the form of DoS through adversarial inputs.

Option 2a: Depend on "github.com/go-json-experiment/json/jsontext"

  • Depend on the "github.com/go-json-experiment/json/jsontext" package, which does the above. The module's public API is unstable and not suitable for widely used code, but the implementation is battle-tested and used in production by services at Tailscale.
  • Unfortunately, I suspect marshaling will take a performance hit, since the jsontext.Encoder is a general-purpose implementation that must validate every WriteToken and WriteValue call against the JSON grammar. The internal Encoder implementation used by the protobuf module has no such grammar checking, since it assumes the user (i.e., "protojson") always uses it correctly.
  • On the plus side, you will get decent errors in both marshal and unmarshal. The existence of a state machine that validates the JSON grammar lends itself to producing a JSON Pointer (RFC 6901) referencing where an error occurred.

Option 2b: Vendor "github.com/go-json-experiment/json/jsontext"

  • Vendor in "github.com/go-json-experiment/json/jsontext", which avoids the unstable dependency problem. That module is under the same authors, so there are no legal issues (to my knowledge).
  • However, you won't be able to provide MarshalEncode or UnmarshalDecode yet since jsontext.Encoder and jsontext.Decoder are not publicly available types.
  • When "jsontext" lands in stdlib, you can replace the vendored copy with the official one (and adjust for any breaking API changes that may have happened).

Option 2c: Wait for "encoding/json/jsontext"

  • Wait for "jsontext" to land in stdlib. I'm planning on formally filing the v2 proposal in the upcoming weeks. Earliest possible release would be Go 1.25, but I suspect it will more realistically be some later release.
  • Unfortunately, the protobuf module needs to support some number of older Go releases. Fortunately, we could place a stable version of the v2 API in "golang.org/x/json/v2" or something similar. Thus, you could depend on that package, knowing that the Encoder and Decoder types are just aliases for the stdlib declarations if you're using a sufficiently new Go toolchain.

Option 1.5: Use streaming features in v1 "encoding/json"

  • Have MarshalWrite still buffer the entire []byte before calling Write (since v1 json.Encoder lacks a WriteToken method).
  • Have UnmarshalRead be implemented under-the-hood using the v1 json.Decoder by repeatedly calling the ReadToken method. Unfortunately, this is slow and produces lots of garbage. See proposal: encoding/json: garbage-free reading of tokens go#40128. Also, the lexical scanner in v1 is significantly slower than parsing directly from a []byte. The v2 "jsontext" package gets close to the speeds of directly parsing from a []byte.

@puellanivis
Collaborator

Also see #1203 for context from the last time we discussed this matter; that issue appears to still be open. It focused more on streaming decoding vs. singleton decoding, but it covered this topic as well.

Raising the most important point from that message here: json.NewDecoder(data).Decode(&v) is very commonly buggy, because it misuses a streaming reader as a singleton reader and thus does not treat trailing garbage as erroneous.

@stapelberg

Thanks for the analysis, @dsnet! Just a brief note: between 2a (dependency), 2b (vendor), 2c (wait), I have a preference for 2c. I don’t think this issue is important enough to warrant adding dependencies to Go Protobuf, and I think vendoring is cumbersome enough from a compliance perspective (even if the licenses match up) that I would want to avoid it, too.

@dsnet
Member

dsnet commented Feb 6, 2025

BTW, a formal proposal for "encoding/json/v2" along with "encoding/json/jsontext" has been filed at golang/go#71497.
