
Refactor to remove State and make easier to use downstream #1


Merged
merged 56 commits into from
May 5, 2025

Conversation

cowlicks
Member

@cowlicks cowlicks commented Apr 2, 2025

This is a major rewrite to simplify the API of this crate. We have been using Vec<u8> (or Box<[u8]>) as a buffer plus a State struct to track where in the buffer we are encoding or decoding. Instead, now we just use a buffer, and take slices of it to track where we are encoding or decoding. It also changes how the CompactEncoding trait works. Instead of doing impl CompactEncoding<T> for State we now do impl CompactEncoding for T.
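
As a rough illustration of the slice-based approach, the sketch below encodes and decodes a fixed-width `u32` by handing back the unused remainder of the buffer instead of updating a separate position tracker. The names `encode_u32`, `decode_u32`, and `OutOfBounds` are hypothetical and are not this crate's API.

```rust
/// Hypothetical error type for this sketch (not the crate's `EncodingError`).
#[derive(Debug, PartialEq)]
pub struct OutOfBounds;

/// Write `value` as 4 little-endian bytes at the front of `buffer`
/// and return the part of the buffer that has not been written yet.
pub fn encode_u32(value: u32, buffer: &mut [u8]) -> Result<&mut [u8], OutOfBounds> {
    if buffer.len() < 4 {
        return Err(OutOfBounds);
    }
    let (head, rest) = buffer.split_at_mut(4);
    head.copy_from_slice(&value.to_le_bytes());
    Ok(rest)
}

/// Read a `u32` from the front of `buffer` and return it together
/// with the part of the buffer that has not been read yet.
pub fn decode_u32(buffer: &[u8]) -> Result<(u32, &[u8]), OutOfBounds> {
    if buffer.len() < 4 {
        return Err(OutOfBounds);
    }
    let (head, rest) = buffer.split_at(4);
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(head);
    Ok((u32::from_le_bytes(bytes), rest))
}
```

Each call consumes the front of the slice it is given, so chaining calls on the returned remainder plays the role the State struct's position used to play.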

This came about because, while working on downstream crates, I found the usage awkward. See HypercoreState, this Encoder trait, and the need for an object to be mutable in order to encode it here. These are removed by PRs to their respective repos, enabled by the fixes in this PR: hypercore PR and hypercore-protocol PR.

Further changes are explained below:

Implementing CompactEncoding For Vec<T> where T: CompactEncoding

There was a discussion in Discord about having the Vec<T> impls be automatically derived. There are problems with this.

First, we often don't want T to be encoded the same way inside Vec<T> as it is on its own. For example, a standalone u32 is encoded as a variable-width number, but within a Vec we want each u32 encoded as a fixed-width number.

Second, if we auto-derive CompactEncoding for Vec<T>, we end up with the same generic implementation for all T.
This causes a problem for the preencode/encoded_size step.
When T is always encoded with a fixed width, the result is just: preencode(vec.len()) + preencode(T) * vec.len().
This is O(1). But for variable-sized T (like Vec<String>) we must do something like: preencode(vec.len()) + vec.iter().fold(0, |acc, x| acc + preencode(x)).
Basically, we must calculate the size of every element in the vec and sum them. This is O(N).
To have a correct generic preencode we'd have to use the latter, and we'd miss out on the O(1) preencode for vectors of fixed-width T.
And we could not implement CompactEncoding for a specific Vec<T> without creating a duplicate implementation that would conflict with the blanket impl for Vec<T>.
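
The difference in cost can be sketched as follows. This is a minimal illustration, not the crate's code: `var_uint_size` assumes a length prefix of 1, 3, 5, or 9 bytes depending on magnitude, and all three function names are hypothetical.

```rust
/// Size in bytes of a variable-width length prefix.
/// The 1/3/5/9-byte scheme here is an assumption made for this sketch.
fn var_uint_size(n: usize) -> usize {
    if n <= 0xfc {
        1
    } else if n <= 0xffff {
        3
    } else if n <= 0xffff_ffff {
        5
    } else {
        9
    }
}

/// Fixed-width elements: the total size follows from the length alone, O(1).
pub fn fixed_width_vec_size(vec: &[u32]) -> usize {
    var_uint_size(vec.len()) + 4 * vec.len()
}

/// Variable-width elements: every element must be visited, O(N).
pub fn string_vec_size(vec: &[String]) -> usize {
    var_uint_size(vec.len())
        + vec
            .iter()
            .map(|s| var_uint_size(s.len()) + s.len())
            .sum::<usize>()
}
```

A single blanket implementation would have to use the second form for every T, which is why a per-type hook for the size calculation is useful.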

A solution: VecEncodable

Instead I've added VecEncodable.
You implement VecEncodable for MyType and you automatically get impl CompactEncoding for Vec<MyType>.
This also lets you avoid the problem of implementing external traits on external types, since you are implementing the trait on MyType, not Vec<MyType>.
I believe this orphan rule issue is why we added a HypercoreState(State) struct in the hypercore crate.

VecEncodable has default impls for encode/decode and only requires that you implement vec_encoded_size.
However, for cases like u32, where we want a variable-width encoding for a standalone u32 but a fixed-width encoding for Vec<u32>,
we do implement the VecEncodable::vec_encode and vec_decode methods to override the default variable-width encoding for u32.
Note: We use the vec_ prefix on methods to avoid having to disambiguate which encode/decode we want on types that implement CompactEncoding, which has its own encode and decode methods.

I've also added BoxedSliceEncodable, which works the same as VecEncodable but gives you Box<[MyType]>.
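
The pattern described in this section can be sketched with simplified traits. Everything below is a stand-in written for illustration: `Encode` plays the role of CompactEncoding, the method signatures are reduced to appending to a `Vec<u8>`, the length prefix is a single byte, and the byte layout for `u32` is an assumption. Only the shape (a per-element trait plus one blanket impl for `Vec<T>`) reflects the description above.

```rust
/// Simplified stand-in for `CompactEncoding` (hypothetical signatures).
pub trait Encode: Sized {
    fn encoded_size(&self) -> usize;
    fn encode(&self, out: &mut Vec<u8>);
}

/// Simplified stand-in for `VecEncodable`.
pub trait VecEncodable: Encode {
    /// Required: size of a whole vec of `Self`.
    fn vec_encoded_size(vec: &[Self]) -> usize;

    /// Default: length prefix, then each element's own encoding.
    /// Sketch only: the length is written as one byte, so len must be < 256.
    fn vec_encode(vec: &[Self], out: &mut Vec<u8>) {
        out.push(vec.len() as u8);
        for item in vec {
            item.encode(out);
        }
    }
}

/// The single blanket impl: any `T: VecEncodable` makes `Vec<T>` encodable.
impl<T: VecEncodable> Encode for Vec<T> {
    fn encoded_size(&self) -> usize {
        T::vec_encoded_size(self)
    }
    fn encode(&self, out: &mut Vec<u8>) {
        T::vec_encode(self, out)
    }
}

/// Standalone `u32`: variable width (1 byte if small, else marker + 4 bytes).
impl Encode for u32 {
    fn encoded_size(&self) -> usize {
        if *self <= 0xfc { 1 } else { 5 }
    }
    fn encode(&self, out: &mut Vec<u8>) {
        if *self <= 0xfc {
            out.push(*self as u8);
        } else {
            out.push(0xfe);
            out.extend_from_slice(&self.to_le_bytes());
        }
    }
}

/// `u32` inside a vec: override the default to use a fixed width of 4 bytes.
impl VecEncodable for u32 {
    fn vec_encoded_size(vec: &[Self]) -> usize {
        1 + 4 * vec.len() // O(1): no need to inspect the elements
    }
    fn vec_encode(vec: &[Self], out: &mut Vec<u8>) {
        out.push(vec.len() as u8);
        for item in vec {
            out.extend_from_slice(&item.to_le_bytes());
        }
    }
}
```

Because the blanket impl is written once against the element trait, a downstream crate only ever implements a trait on its own type, never on `Vec<MyType>`.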

name changes

  • CompactEncoding::preencode is replaced with CompactEncoding::encoded_size. I make this distinction because 'preencode' implies that it is doing some preparatory work. Previously, 'preencode' would update state by adding to the 'end' value. 'encoded_size' does not do this; it just returns the calculated encoded size.

  • I kept the function naming pattern: encode_, decode_, etc. for fixed width, and encode_var, decode_var for variable width.

  • Methods on BoxedSliceEncodable, VecEncodable, etc. are prefixed with boxed_slice_*, vec_* to avoid name collisions with the CompactEncoding trait's methods.

ergonomics

In my opinion, these changes make the library much easier to use. I've also included some macros that make encoding and decoding more declarative and concise. An example from the rewritten docs:

use compact_encoding::{map_decode, to_encoded_bytes};

let number = 41_u32;
let word = "hi";
// Encode the passed values to a buffer
let encoded_buffer = to_encoded_bytes!(number, word);
// decode a buffer based on the types passed to `map_decode`
let ((number_dec, word_dec), remaining_buffer) = map_decode!(&encoded_buffer, [u32, String]);

assert!(remaining_buffer.is_empty());
assert_eq!(number_dec, number);
assert_eq!(word_dec, word);

Here is a comparison between implementing CompactEncoding before and after this PR:

// PREVIOUSLY

impl CompactEncoding<RequestBlock> for HypercoreState {
    fn preencode(&mut self, value: &RequestBlock) -> Result<usize, EncodingError> {
        self.0.preencode(&value.index)?;
        self.0.preencode(&value.nodes)
    }

    fn encode(&mut self, value: &RequestBlock, buffer: &mut [u8]) -> Result<usize, EncodingError> {
        self.0.encode(&value.index, buffer)?;
        self.0.encode(&value.nodes, buffer)
    }

    fn decode(&mut self, buffer: &[u8]) -> Result<RequestBlock, EncodingError> {
        let index: u64 = self.0.decode(buffer)?;
        let nodes: u64 = self.0.decode(buffer)?;
        Ok(RequestBlock { index, nodes })
    }
}

Now with this PR:

impl CompactEncoding for RequestBlock {  // No more `HypercoreState` or `self.0.*`
    fn encoded_size(&self) -> Result<usize, EncodingError> {  // each fn is basically one line
        Ok(sum_encoded_size!(self.index, self.nodes))
    }

    fn encode<'a>(&self, buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError> {
        Ok(map_encode!(buffer, self.index, self.nodes))
    }

    fn decode(buffer: &[u8]) -> Result<(Self, &[u8]), EncodingError>
    where
        Self: Sized,
    {
        let ((index, nodes), rest) = map_decode!(buffer, [u64, u64]);
        Ok((RequestBlock { index, nodes }, rest))
    }
}

more notes

  • Default CompactEncoding impls - I've only included the implementations that existed in the existing compact_encoding crate. I noticed some that would be nice to have but were missing (and are not currently used downstream). I left these out; they could be added later.
  • Helper functions - I've added exports of a bunch of functions (write_array, take_array, etc.) which make implementing CompactEncoding in downstream libraries easier.
  • Fixed width encoding helpers are added for getting a fixed-width encoding/decoding for types that have a default variable-width encoding. See the FixedWidthEncoding trait for details.
  • The crate no longer exports nested modules; everything is at the root. This avoids requiring a breaking API change if we want to change the crate's internal module structure.
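
To give a feel for what helpers of this kind do, here is a minimal sketch of array write/take functions built on const generics. The names `write_array_sketch` and `take_array_sketch`, their signatures, and the use of `Option` instead of an error type are assumptions made for this example; they are not the crate's exported functions.

```rust
/// Copy a fixed-size array to the front of `buffer` and return the remainder.
/// Returns `None` if the buffer is too short.
pub fn write_array_sketch<'a, const N: usize>(
    source: &[u8; N],
    buffer: &'a mut [u8],
) -> Option<&'a mut [u8]> {
    if buffer.len() < N {
        return None;
    }
    let (head, rest) = buffer.split_at_mut(N);
    head.copy_from_slice(source);
    Some(rest)
}

/// Take a fixed-size array from the front of `buffer` and return it
/// together with the remainder. Returns `None` if the buffer is too short.
pub fn take_array_sketch<const N: usize>(buffer: &[u8]) -> Option<([u8; N], &[u8])> {
    if buffer.len() < N {
        return None;
    }
    let (head, rest) = buffer.split_at(N);
    let mut out = [0u8; N];
    out.copy_from_slice(head);
    Some((out, rest))
}
```

With building blocks like these, an encode or decode implementation for a struct becomes a chain of calls that each pass the remaining slice to the next.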

Note that these changes do add more code. But I think that without the added and updated tests and documentation, there would actually be a small decrease in lines of code.

I propose releasing this as a major version bump.

@cowlicks cowlicks marked this pull request as draft April 2, 2025 21:30
@cowlicks cowlicks force-pushed the hyperswarm-changes branch from ed96a38 to 4ec35b4 Compare April 30, 2025 17:33
@cowlicks cowlicks marked this pull request as ready for review May 2, 2025 22:10
@cowlicks cowlicks changed the title Hyperswarm changes Refactor to remove State and make easier to use downstream May 2, 2025
Member

@ttiurani ttiurani left a comment


Great work!

Feel free to merge and do the release procedure yourself!

@cowlicks cowlicks merged commit dc99341 into datrs:main May 5, 2025
6 checks passed
2 participants