Remove unneeded version tags from Bin_prot
-serialized data produced
by ppx_version
. Generate version tags on an as-needed basis where
they're useful. Detect versioning violations in the CI version linter
in a more useful way.
Versioned data types are built from other versioned types and
primitive OCaml types. When generating Bin_prot
-serialized data from
instances of those type, a version tag is generated not only for the
containing type, but also for each constituent type. That strategy
allows determining the version of any piece of the serialized data.
The Jane Street Async
versioned RPC mechanism used in
Mina_networking
already provides versioning of data sent between
nodes. Therefore, version-tagging of the serialized data is not needed
for that use case. Removing the version tags will reduce the amount of
data sent over the network.
Some precomputed blocks from Mainnet show the amount of data currently used by version tags:
Bin_io size Number of tags Proportion
9956 930 9%
1682720 121776 7% 1826633 131088 7% 1803712 129340 7%
We may wish to save serialized data and be able to determine its version. For that use case, a single "top" tag for the containing type suffices.
Some user-visible data, such as public keys, are formatted using the current serialization mechanism, where all contained types have version tags. For those few cases, allow use of the existing mechanism.
The CI version linter currently compares changes to versioned types against the PR base branch. That's more fine-grained than needed, if the goal of preventing version changes is to allow interoperability with deployed software. Instead, compare the types against the branch of the latest released software, as well as the base branch. That strategy will reduce the amount of noise produced by the linter, and provide more useful results.
Currently, tagging a Stable
module with %versioned
produces
shadowing bin_read_t
and bin_write_t
functions that read and write
version tags, positive integers, prepended to serialized data, via the
ppx_version
library. The proposal is not to shadow those functions,
and instead use those generated by deriving bin_io
, directly. It is
still useful to use %versioned
to assure that serializations do not
change for a given versioned type.
For backwards-compatibility, we can add the annotation
[@@@with_all_version_tags]
to the legacy Stable.Vn
modules. For each such module, generate a module
within it:
module With_all_version_tags = struct
type t = ...
let bin_read_t = ... bin_read_t ...
let bin_write_t = ... bin_write_t ...
...
end
where the shadowing functions are generated as they are today. For
this case, the constituent types must also have the same annotation,
which is enforced by the construction of the type t
: the constituent
types of t
are of the form M.Stable.Vn.With_all_version_tags.t
.
Within Stable
modules, we can have the annotation
[@@@with_top_version_tag]
to generate:
module With_top_version_tag = struct
type t = ...
let bin_read_t = ... bin_read_t ...
let bin_write_t = ... bin_write_t ...
...
end
where the shadowing functions handle a single version tag to the
serializations of outermost type instances. The constituent types need
not have the annotation: the type t
will be identical to the type
t
in the containing Vn
module.
The existing versioning system generates a function in Stable
modules:
val bin_read_to_latest_opt : Bin_prot.Common.buf -> pos_ref:(int ref) -> Stable.Latest.t option
That function deserializes data by dispatching on version tags. Serializations may or may not have version tags. If there are neither all-tagged or top-tagged serializations, generate nothing. If there are all-tagged serializations, generate:
val bin_read_all_tagged_to_latest : Bin_prot.Common.buf -> pos_ref:(int ref) -> Stable.Latest.t Or_error.t
This function will know only about the version in the legacy Vn
modules
where we've maintained all-tagging.
If there are top-tagged serializations, generate:
val bin_read_top_tagged_to_latest : Bin_prot.Common.buf -> pos_ref:(int ref) -> Stable.Latest.t Or_error.t
This function will know about all the versions within a Stable
module.
Because the default serialization does not contain version tags, it no
longer makes sense to generate such a function in Stable
. Instead,
generate it inside With_all_version_tags
and With_top_version_tag
,
if they're created. The return type should be changed to an
Or_error.t
type, to describe errors, instead of using an option
type.
The CI version linter looks for changes to versioned types in a PR branch by comparing those types with counterparts in the base branch. The linter will complain if a new version is added, and later modified, even if there are no deployed nodes using the new version.
Instead, the linter should use both the PR base branch and a release branch for comparison, according to this table:
Base diff Release diff Significance Warn?
N N No version changes N Y Y Version changed Y N Y Previous force merge N Y N Modified new version N
Under this regime, a new version can be added and modified as needed, without triggering a linter warning. If a PR contains an innocuous change to a versioned type, such as a field renaming, that can be accomplished with a force-merge, without triggering warnings in later PRs.
It should be enough to change the script ci_diff_types.sh
to compare
the PR branch against the release branch, in addition to the
comparison made with $BASE_BRANCH_NAME
. There may need to be a
mechanism to automate that change when releases become available,
which is beyond the scope of this RFC.
The version linter had been using the text of type definitions for
comparisons, and the "Binable" linter had been using the text of the
uses of the Binable
functors. We can instead use Bin_prot
shape
digests for comparisons in both cases. For a type t
, the digest is
available as a string:
Bin_prot.Shape.eval_to_digest_string bin_shape_t
With this approach to comparisons, we can use a single linter task in CI, rather than separate version and Binable linters.
Using shape digests to detect changes to serializations also means we
no longer need unit tests for version-asserted types. ppx_version
was checking for a For_tests
module in that case, to remind that
such tests were needed; that check can be removed.
Shape digests will also allow detecting changes to serializations
provided by hand-written bin_io
code. We may wish to add a module
annotation like %versioned_custom
for that use case. If the
bin_shape_t
for a type with a custom serialization is built with
Bin_prot.Shape.basetype
, the shape should use a new UUID
if the serialization changes (effectively, a version), for change
detection to be effective.
The existing versioning mechanism is working acceptably well. The only
drawback of making changes to the version-tagging is the effort to
update ppx_version
. That should not be a major effort, since it's
mostly reorganizing existing code.
If these changes are not made, network performance will be less than it could be.
We could omit the top-tagging mechanism, since there isn't a specific use case for it now. Adding it later is possible, if it's needed for some purpose. Then again, adding that feature now while working on other code changes may be easier than deferring it.
We could phase out the all-tagged mechanism, by allowing two syntaxes for public keys and other affected types, and eventually disallowing the old syntax. That strategy may be too drastic to impose upon users.
The relevant prior art is the existing versioning mechanism.
- Do we need top-tagging?
- What types will require all-tagging? Any user-visible Base58Check-encoded data created from
Bin_prot
-serializations will need that. One way to enforce this would be to haveCodable.Make_base58_check
take a module containing a valueall_tagged
, which would be contained in the generatedWith_all_version_tags
modules.