Skip to content

Conversation

LucasEby
Copy link
Contributor

@LucasEby LucasEby commented Oct 1, 2025

Fixes #24804

Motivation

In ProtobufNativeSchemaTest.testSchema, the json's contents and the FileDescriptorSet field's contents do not have a deterministic order but the hardcoded string assertion assumes a deterministic order. The json serializer did not guarantee attribute order and inside FileDescriptorSet the contents can also be in different orders due to different generation paths or environments producing the contents in different orders despite the logical content being the same. Since the original test compared the raw strings/trees "as-is", harmless re-ordering could flip the test from pass to fail without any real schema change.

Modifications

We no longer compare raw strings/trees "as-is". Instead the fileDescriptorSet field is base64-decoded into a FileDescriptorSet, then converted into a JSONObject. We compare the expected and actual descriptors with JsonAssert in NON_EXTENSIBLE mode, which ignores field ordering but forbids missing or extra fields.

Next, we remove FileDescriptorSet from the outer JSON and compare the remainder in the NON_EXTENSIBLE mode again. We implemented this two stage approach because JSON assertion does not re-order the inner contents of the FileDescriptorSet (compares directly as strings instead) even if though it's decoded contents can be re-ordered.

By decoding and normalizing fileDescriptorSet first, we ensure the test remains robust to nondeterministic ordering inside the descriptor set, while still validating the schema structure strictly. This change keeps the spirit of the original test while eliminating failures caused solely by allowed reordering.

NOTE: Some additional assertTrue and assertFalse assertions were also added to ensure the test remains robust in the event that the hardcoded string is ever modified for some reason. This will prevent someone from modifying the string and then forgetting to modify the test afterwards, thus causing the fileDescriptorSet assertion to incorrectly pass from comparing empty strings.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as org.apache.pulsar.client.impl.schema.ProtobufSchemaTest#testSchema.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: LucasEby#1

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 1, 2025
@LucasEby LucasEby force-pushed the testSchemaNondeterministic branch from 3146821 to 1fbb84e Compare October 1, 2025 17:52
@LucasEby LucasEby force-pushed the testSchemaNondeterministic branch 3 times, most recently from d39852c to c84ac39 Compare October 4, 2025 00:15
@LucasEby
Copy link
Contributor Author

LucasEby commented Oct 8, 2025

@lhotari I incorporated your suggestions and made the code easier to follow. All checks passed on my fork: LucasEby#1
The code should be good to merge

Comment on lines 112 to 139
String fdSetField = "fileDescriptorSet";
JsonNode actualFdsNode = fdsToJson(actualRoot.path(fdSetField).asText());
JsonNode expectedFdsNode = fdsToJson(expectedRoot.path(fdSetField).asText());
((ObjectNode) actualRoot).set(fdSetField, actualFdsNode);
((ObjectNode) expectedRoot).set(fdSetField, expectedFdsNode);

// Ensure the fields and values are the same, order does not matter
JSONAssert.assertEquals(
MAPPER.writeValueAsString(expectedRoot),
MAPPER.writeValueAsString(actualRoot),
JSONCompareMode.NON_EXTENSIBLE
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be easier to understand the logic if "fileDescriptorSet" would be ignored in the comparison of the schema json and compared separately. Assertions would optimally have a description message to explain what went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhotari I incorporated the requested changes. The changes passed CI checks on my personal fork: LucasEby#1

I additionally added some additional assertTrue and assertFalse assertions to ensure the test remains robust in the event that the hardcoded string is ever modified for some reason. The assertions ensure the retrieved fileDescriptorSet field contains data and the strings compared later won’t be empty, thus preventing the possible scenario of the fileDescriptorSet assertions incorrectly passing from comparing empty strings.

@LucasEby LucasEby force-pushed the testSchemaNondeterministic branch from c84ac39 to 3473283 Compare October 17, 2025 18:00
@LucasEby LucasEby force-pushed the testSchemaNondeterministic branch from 3473283 to 6f31a65 Compare October 17, 2025 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] ProtobufNativeSchemaTest.testSchema order-independent

2 participants