[WIP/proof-of-concept] Convoluted little-endian support by valadaptive · Pull Request #1555 · googlefonts/fontations

valadaptive · 2025-07-07T12:05:37Z

The hvgl and hvpm tables, unlike everything else in OpenType, store everything in little-endian. font-codegen, font-types, and read-fonts currently don't support it at all.

There are a few paths we could take for implementing little-endian support:

Don't add codegen support for it at all, and parse little-endian tables manually.
Add second-class support for little-endian types in codegen and font-types. Continue treating big-endian as the default, but add enough little-endian support that we can automatically parse hvgl and hvpm. This is the approach that this PR takes.
Add full little-endian support in codegen and font-types. Codegen inputs will be treated as big-endian by default, but can be treated as little-endian on either field-level or table-level granularity. Things like the Scalar trait will require implementing conversions to and from big-endian and little-endian, and all types which currently assume big-endian will have to abstract over endianness.

This definitely shouldn't be merged as-is, but I'm creating it to help spark some discussion around the design details here.

For this PR, I implemented little-endian support with a new ScalarLE trait which is a subtrait of Scalar--that lets it piggyback off some of the byte-conversion stuff, while not requiring all Scalar types to implement little-endian conversions that will likely go unused. It's definitely more convoluted like this, though.

Things like ArrayOfOffsets currently use the BigEndian<T> wrapper type. This PR also adds LittleEndian<T>. In order to make ArrayOfOffsets work with either, I've introduced the BytesWrapper trait (name very not final), which represents an unaligned byte slice that can be parsed into its underlying type. This of course means that all methods that were previously on BigEndian<T> are now trait methods on BytesWrapper, and require importing that trait everywhere we want to read a BigEndian<T>. That's very annoying.

For codegen, I implemented little-endian support at field-level granularity, but that's currently unnecessary since hvgl and hvpm are entirely little-endian and all other tables are entirely big-endian. If I want to implement this for real, I think I'd go back and move the #[little_endian] attribute to tables, but I did want to get others' opinions here.

Again, this is mainly a proof-of-concept, but it highlights some of the details that we'll want to sort out.

valadaptive · 2025-07-07T12:08:11Z

/cc @cmyr, since you did a lot of the codegen and font-types work

rsheeter · 2025-07-07T13:46:40Z

Thank you for getting this started!

I would us to take the "full little-endian support in codegen and font-types" path unless there is a strong reason to do otherwise.

I tend to think we should 1.0 with the featureset live in Chrome (#1329) before we start landing this. I will take that up with the team.

I think I'd go back and move the #[little_endian] attribute to tables, but I did want to get others' opinions here.

+1 to assignment at broad scope, I expect that will simplify things and I don't think we anticipate mixed-endian structures at time of writing.

valadaptive · 2025-07-07T13:52:57Z

I tend to think we should 1.0 with the featureset live in Chrome (#1329) before we start landing this. I will take that up with the team.

There was some discussion in #1329 that the release schedule is expected to slow down after 1.0, but little-endian support will definitely be a breaking change and hence a 2.0 for read-fonts and font-types. How is versioning going to work post-1.0? Will the branch for the next major version be the main one, or will it be the branch for the current one? Or will the commits just stay linear?

I expect that little-endian support may still need some tweaking even after it first lands, once hvgl outlining is actually implemented in skrifa.

cmyr · 2025-07-09T22:30:28Z

I'll block out some time to take a look at this tomorrow!

cmyr

Okay so I've had time to look at this, and then have also played around with going all-in and just making little-endian support a first-class part of Scalar, and I think I prefer the latter approach, but it's pretty close.

Was there a reason you preferred this approach?

The advantage of the other approach, to my mind, is just that it means adding fewer new types, but I do also see the appeal of keeping these changes isolated to one extra file.

I would also love to find a way to get rid of BytesWrapper, but I'm not sure what the best approach is, I see the problem with the offset array types, which make assumptions about endianness.

I can think of some fancier possible approaches (like actually having endianness be represented by marker types, and replacing BigEndian<T> with type BigEndian = Endian<Big, T>, and then we could have ArrayOfOffsets<T: ReadArgs, O: Scalar = Offset16, S: Endian = Big> but... it's unclear that this is actually any better. (One advantage though is that perhaps it would mean having less duplicate code between the BigEndian and LittleEndian types?

In any case, I think that both approaches are ultimately okay, but i don't see any harm in folding the _le functions into the main Scalar trait.

cmyr · 2025-07-10T20:59:51Z

font-types/src/raw.rs

 /// we hide this trait; it isn't part of the public API, and this clarifies
 /// the guarantee that it is only implemented for [u8; N]
-mod sealed {
+pub(super) mod sealed {


if we keep this I think it's fine to rename the trait to just ByteArray?

cmyr · 2025-07-10T21:01:08Z

font-types/src/raw.rs

 }

+/// A trait for types that contain a byte slice which can be decoded to a scalar value.
+pub trait BytesWrapper: Sized {


If we keep this I would prefer to call this something like EndianWrapper or something to more clearly communicate that this is a common interface for the LittleEndian and BigEndian types. (I'd also clarify this in the docs)

The other thing that this type encodes is that the underlying byte slice doesn't have the same alignment requirements as the scalar value it decodes to. Not sure how important that is to mention, though.

valadaptive · 2025-07-10T21:21:14Z

I've actually played around with that approach in another branch. I also think I prefer it, although it might result in generating little-endian I/O functions for types that only exist in big-endian tables.

Right now I've hit a wall figuring out how to support writing little-endian tables--FontWrite is recursive, and I don't want to have to separately implement FontWriteBE and FontWriteLE traits or something. For that purpose, I think an endianness marker type would be beneficial--the FontWrite implementation for a nested structure would just pass the endianness marker down all the way the "leaves" (at compile time, as a type parameter), which would be the only types that need to care about endianness.

I'll take another pass over the API with marker types, and see if it makes things cleaner when it comes to implementing support for writing-little endian (and also if it could somehow help reduce boilerplace for newtypes which wrap a Scalar?)

One auxiliary thing I noticed: with the ArrayOfOffsets changes, I think it's possible to remove ArrayOfNullableOffsets. If ResolveOffset has a generic associated type Resolved<T>, it could just be T for Offset, but Option<T> for Nullable<impl Offset>. This would require swapping the resolved type for a nullable offset from Option<Result<T, ReadError>> to Result<Option<T>, ReadError>, which is a rather broad change and I'm not sure if it'd affect semantics at all. Not something we have to decide on now, but an interesting option for the future.

cmyr · 2025-07-11T02:03:10Z

I'll think about this more in the light of day, but generally I think you're on the right path.

One additional idea that behdad mentioned is that it would be nice if the annotation indicating endianness was on the table level, instead of the field level; and then maybe we set it on the fields after everything is parsed, around the same time we do the sanity_check pass?

Convoluted little-endian support

f9c2053

valadaptive mentioned this pull request Jul 9, 2025

Implement hvgl outlines #1559

Closed

cmyr reviewed Jul 10, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP/proof-of-concept] Convoluted little-endian support#1555

[WIP/proof-of-concept] Convoluted little-endian support#1555
valadaptive wants to merge 1 commit intogooglefonts:mainfrom
valadaptive:little-endian

valadaptive commented Jul 7, 2025

Uh oh!

valadaptive commented Jul 7, 2025

Uh oh!

rsheeter commented Jul 7, 2025

Uh oh!

valadaptive commented Jul 7, 2025

Uh oh!

cmyr commented Jul 9, 2025

Uh oh!

cmyr left a comment

Uh oh!

cmyr Jul 10, 2025

Uh oh!

cmyr Jul 10, 2025

Uh oh!

valadaptive Jul 10, 2025

Uh oh!

valadaptive commented Jul 10, 2025

Uh oh!

cmyr commented Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

valadaptive commented Jul 7, 2025

Uh oh!

valadaptive commented Jul 7, 2025

Uh oh!

rsheeter commented Jul 7, 2025

Uh oh!

valadaptive commented Jul 7, 2025

Uh oh!

cmyr commented Jul 9, 2025

Uh oh!

cmyr left a comment

Choose a reason for hiding this comment

Uh oh!

cmyr Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

cmyr Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

valadaptive Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

valadaptive commented Jul 10, 2025

Uh oh!

cmyr commented Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants