[WIP/proof-of-concept] Convoluted little-endian support#1555
[WIP/proof-of-concept] Convoluted little-endian support#1555valadaptive wants to merge 1 commit intogooglefonts:mainfrom
Conversation
|
/cc @cmyr, since you did a lot of the codegen and font-types work |
|
Thank you for getting this started! I would us to take the "full little-endian support in codegen and font-types" path unless there is a strong reason to do otherwise. I tend to think we should 1.0 with the featureset live in Chrome (#1329) before we start landing this. I will take that up with the team.
+1 to assignment at broad scope, I expect that will simplify things and I don't think we anticipate mixed-endian structures at time of writing. |
There was some discussion in #1329 that the release schedule is expected to slow down after 1.0, but little-endian support will definitely be a breaking change and hence a 2.0 for read-fonts and font-types. How is versioning going to work post-1.0? Will the branch for the next major version be the main one, or will it be the branch for the current one? Or will the commits just stay linear? I expect that little-endian support may still need some tweaking even after it first lands, once hvgl outlining is actually implemented in skrifa. |
|
I'll block out some time to take a look at this tomorrow! |
cmyr
left a comment
There was a problem hiding this comment.
Okay so I've had time to look at this, and then have also played around with going all-in and just making little-endian support a first-class part of Scalar, and I think I prefer the latter approach, but it's pretty close.
Was there a reason you preferred this approach?
The advantage of the other approach, to my mind, is just that it means adding fewer new types, but I do also see the appeal of keeping these changes isolated to one extra file.
I would also love to find a way to get rid of BytesWrapper, but I'm not sure what the best approach is, I see the problem with the offset array types, which make assumptions about endianness.
I can think of some fancier possible approaches (like actually having endianness be represented by marker types, and replacing BigEndian<T> with type BigEndian = Endian<Big, T>, and then we could have ArrayOfOffsets<T: ReadArgs, O: Scalar = Offset16, S: Endian = Big> but... it's unclear that this is actually any better. (One advantage though is that perhaps it would mean having less duplicate code between the BigEndian and LittleEndian types?
In any case, I think that both approaches are ultimately okay, but i don't see any harm in folding the _le functions into the main Scalar trait.
| /// we hide this trait; it isn't part of the public API, and this clarifies | ||
| /// the guarantee that it is only implemented for [u8; N] | ||
| mod sealed { | ||
| pub(super) mod sealed { |
There was a problem hiding this comment.
if we keep this I think it's fine to rename the trait to just ByteArray?
| } | ||
|
|
||
| /// A trait for types that contain a byte slice which can be decoded to a scalar value. | ||
| pub trait BytesWrapper: Sized { |
There was a problem hiding this comment.
If we keep this I would prefer to call this something like EndianWrapper or something to more clearly communicate that this is a common interface for the LittleEndian and BigEndian types. (I'd also clarify this in the docs)
There was a problem hiding this comment.
The other thing that this type encodes is that the underlying byte slice doesn't have the same alignment requirements as the scalar value it decodes to. Not sure how important that is to mention, though.
|
I've actually played around with that approach in another branch. I also think I prefer it, although it might result in generating little-endian I/O functions for types that only exist in big-endian tables. Right now I've hit a wall figuring out how to support writing little-endian tables-- I'll take another pass over the API with marker types, and see if it makes things cleaner when it comes to implementing support for writing-little endian (and also if it could somehow help reduce boilerplace for newtypes which wrap a One auxiliary thing I noticed: with the |
|
I'll think about this more in the light of day, but generally I think you're on the right path. One additional idea that behdad mentioned is that it would be nice if the annotation indicating endianness was on the table level, instead of the field level; and then maybe we set it on the fields after everything is parsed, around the same time we do the |
The
hvglandhvpmtables, unlike everything else in OpenType, store everything in little-endian. font-codegen, font-types, and read-fonts currently don't support it at all.There are a few paths we could take for implementing little-endian support:
hvglandhvpm. This is the approach that this PR takes.Scalartrait will require implementing conversions to and from big-endian and little-endian, and all types which currently assume big-endian will have to abstract over endianness.This definitely shouldn't be merged as-is, but I'm creating it to help spark some discussion around the design details here.
For this PR, I implemented little-endian support with a new
ScalarLEtrait which is a subtrait ofScalar--that lets it piggyback off some of the byte-conversion stuff, while not requiring allScalartypes to implement little-endian conversions that will likely go unused. It's definitely more convoluted like this, though.Things like
ArrayOfOffsetscurrently use theBigEndian<T>wrapper type. This PR also addsLittleEndian<T>. In order to makeArrayOfOffsetswork with either, I've introduced theBytesWrappertrait (name very not final), which represents an unaligned byte slice that can be parsed into its underlying type. This of course means that all methods that were previously onBigEndian<T>are now trait methods onBytesWrapper, and require importing that trait everywhere we want to read aBigEndian<T>. That's very annoying.For codegen, I implemented little-endian support at field-level granularity, but that's currently unnecessary since
hvglandhvpmare entirely little-endian and all other tables are entirely big-endian. If I want to implement this for real, I think I'd go back and move the#[little_endian]attribute to tables, but I did want to get others' opinions here.Again, this is mainly a proof-of-concept, but it highlights some of the details that we'll want to sort out.