Refactor to remove `State` and make easier to use downstream #1

cowlicks · 2025-04-02T21:30:11Z

This is a major rewrite to simplify the API of this crate. Previously we used a `Vec<u8>` (or `Box<[u8]>`) as a buffer plus a `State` struct to track where in the buffer we were encoding or decoding. Now we just use a buffer and take slices of it to track where we are encoding or decoding. This PR also changes how the `CompactEncoding` trait works: instead of `impl CompactEncoding<T> for State`, we now write `impl CompactEncoding for T`.
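To illustrate the difference, here is a minimal sketch using simplified, hypothetical types (a `State` with only a `start` cursor, and an `encode_u16` helper that returns `Option`); neither is the crate's actual API. The old style advances a separate cursor, while the new style writes to the front of the slice and hands back the unwritten remainder.

```rust
/// Old style (simplified, hypothetical): a separate cursor records where the
/// next write goes, so the state must be passed around and mutated.
struct State {
    start: usize,
}

impl State {
    // Panics if the buffer is too small; the old API returned an `EncodingError`.
    fn encode_u16(&mut self, n: u16, buffer: &mut [u8]) {
        buffer[self.start..self.start + 2].copy_from_slice(&n.to_le_bytes());
        self.start += 2;
    }
}

/// New style: write to the front of the slice and return the unwritten rest,
/// which becomes the buffer for the next value. `None` means "too small".
fn encode_u16(n: u16, buffer: &mut [u8]) -> Option<&mut [u8]> {
    if buffer.len() < 2 {
        return None;
    }
    let (head, rest) = buffer.split_at_mut(2);
    head.copy_from_slice(&n.to_le_bytes());
    Some(rest)
}
```

With the slice-based style there is no cursor to keep in sync: the remainder returned by one call is simply passed to the next.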

This came about because, while working on downstream crates, I found the usage awkward. See `HypercoreState`, and this `Encoder` trait, which needed an object to be mutable in order to encode it here. These workarounds are removed by PRs to their respective repos, which rely on the fixes in this PR: hypercore PR and hypercore-protocol PR.

Further changes are explained below:

Implementing `CompactEncoding` For `Vec<T>` where `T: CompactEncoding`

There was a discussion on Discord about having the `Vec<T>` impls be automatically derived. There are two problems with this.

First, we often don't want the same encoding for `T` inside a `Vec<T>` as for a standalone `T`. For example, a standalone `u32` is encoded as a variable-width number, but inside a `Vec` we want each `u32` encoded as a fixed-width number.

Second, if we auto-derive `CompactEncoding` for `Vec<T>`, we end up with the same generic implementation for every `T`.
This causes a problem for the `preencode`/`encoded_size` step.
When `T` always has a fixed-width encoding, the result is just `preencode(vec.len()) + preencode(T) * vec.len()`, which is O(1).
But for variable-width `T` (as in `Vec<String>`), we must do something like `preencode(vec.len()) + vec.iter().fold(0, |acc, x| acc + preencode(x))`; that is, we must calculate the size of every element in the vec and sum them, which is O(N).
A correct generic `preencode` would have to use the latter, so we would miss out on O(1) `preencode` for types like `Vec<u32>`.
And we could not implement `CompactEncoding` for a specific `Vec<T>`, because that would be a second implementation conflicting with the blanket impl for `Vec<T>`.
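A minimal sketch of the two size calculations, assuming the varint layout used by the JavaScript compact-encoding library (one byte for values up to `0xfc`, otherwise a marker byte followed by 2, 4 or 8 bytes). The function names are hypothetical and not part of the crate.

```rust
/// Size in bytes of a variable-width unsigned integer (assumed layout).
fn var_uint_size(n: u64) -> usize {
    if n <= 0xfc {
        1
    } else if n <= 0xffff {
        3
    } else if n <= 0xffff_ffff {
        5
    } else {
        9
    }
}

/// O(1): every element of a `Vec<u32>` is a fixed 4-byte value,
/// so the total depends only on the length.
fn vec_u32_size(vec: &[u32]) -> usize {
    var_uint_size(vec.len() as u64) + 4 * vec.len()
}

/// O(N): each `String` is length-prefixed, so every element must be visited.
fn vec_string_size(vec: &[String]) -> usize {
    vec.iter().fold(var_uint_size(vec.len() as u64), |acc, s| {
        acc + var_uint_size(s.len() as u64) + s.len()
    })
}
```

A single generic implementation would have to use the `vec_string_size` shape for every element type, including `u32`.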

A solution: `VecEncodable`

Instead, I've added `VecEncodable`.
Implement `VecEncodable` for `MyType` and you automatically get `impl CompactEncoding for Vec<MyType>`.
This also lets you avoid orphan-rule issues (implementing an external trait on an external type), since you implement the trait on `MyType`, not on `Vec<MyType>`.
I believe this orphan-rule issue is why the hypercore crate added the `HypercoreState(State)` struct.
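A minimal, self-contained sketch of the pattern. It assumes simplified stand-ins: a unit `EncodingError`, a one-byte length prefix instead of a varint, hypothetical `write_bytes`/`take_bytes` helpers, and a hypothetical `Flag` type. The trait signatures follow the `encoded_size`/`encode`/`decode` signatures shown in the comparison below, but this is not the crate's actual code.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub struct EncodingError;

pub trait CompactEncoding: Sized {
    fn encoded_size(&self) -> Result<usize, EncodingError>;
    fn encode<'a>(&self, buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError>;
    fn decode(buffer: &[u8]) -> Result<(Self, &[u8]), EncodingError>;
}

/// Hypothetical helper: copy `bytes` to the front of `buffer`, return the rest.
fn write_bytes<'a>(bytes: &[u8], buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError> {
    if buffer.len() < bytes.len() {
        return Err(EncodingError);
    }
    let (head, rest) = buffer.split_at_mut(bytes.len());
    head.copy_from_slice(bytes);
    Ok(rest)
}

/// Hypothetical helper: split `n` bytes off the front of `buffer`.
fn take_bytes(n: usize, buffer: &[u8]) -> Result<(&[u8], &[u8]), EncodingError> {
    if buffer.len() < n {
        return Err(EncodingError);
    }
    Ok(buffer.split_at(n))
}

pub trait VecEncodable: CompactEncoding {
    /// Required: the encoded size of a whole slice of `Self`.
    fn vec_encoded_size(vec: &[Self]) -> Result<usize, EncodingError>;

    /// Default: a length prefix (one byte here), then each element's own encoding.
    fn vec_encode<'a>(vec: &[Self], buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError> {
        let len = u8::try_from(vec.len()).map_err(|_| EncodingError)?;
        let mut rest = write_bytes(&[len], buffer)?;
        for item in vec {
            rest = item.encode(rest)?;
        }
        Ok(rest)
    }

    /// Default: read the length prefix, then decode that many elements.
    fn vec_decode(buffer: &[u8]) -> Result<(Vec<Self>, &[u8]), EncodingError> {
        let (len, mut rest) = take_bytes(1, buffer)?;
        let mut out = Vec::with_capacity(len[0] as usize);
        for _ in 0..len[0] {
            let (item, remaining) = Self::decode(rest)?;
            out.push(item);
            rest = remaining;
        }
        Ok((out, rest))
    }
}

/// The blanket impl: implementing `VecEncodable` for `T` yields `CompactEncoding` for `Vec<T>`.
impl<T: VecEncodable> CompactEncoding for Vec<T> {
    fn encoded_size(&self) -> Result<usize, EncodingError> {
        T::vec_encoded_size(self)
    }
    fn encode<'a>(&self, buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError> {
        T::vec_encode(self, buffer)
    }
    fn decode(buffer: &[u8]) -> Result<(Self, &[u8]), EncodingError> {
        T::vec_decode(buffer)
    }
}

/// A downstream type: one byte on the wire.
#[derive(Debug, PartialEq)]
pub struct Flag(pub u8);

impl CompactEncoding for Flag {
    fn encoded_size(&self) -> Result<usize, EncodingError> {
        Ok(1)
    }
    fn encode<'a>(&self, buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError> {
        write_bytes(&[self.0], buffer)
    }
    fn decode(buffer: &[u8]) -> Result<(Self, &[u8]), EncodingError> {
        let (byte, rest) = take_bytes(1, buffer)?;
        Ok((Flag(byte[0]), rest))
    }
}

// Only the size is written by hand, and in O(1); `Vec<Flag>` gets the rest.
impl VecEncodable for Flag {
    fn vec_encoded_size(vec: &[Self]) -> Result<usize, EncodingError> {
        Ok(1 + vec.len())
    }
}
```

Because the impl is on the local `Flag`, not on `Vec<Flag>`, no newtype wrapper is needed to satisfy the orphan rule.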

`VecEncodable` provides default implementations of `vec_encode`/`vec_decode` and only requires that you implement `vec_encoded_size`.
However, for cases like `u32`, where we want a variable-width encoding for a standalone `u32` but a fixed-width encoding for `Vec<u32>`,
we do implement the `VecEncodable::vec_encode` and `vec_decode` methods to override the default variable-width encoding of `u32`.
Note: we use the `vec_` prefix on methods to avoid having to disambiguate which `encode`/`decode` we want on types that implement `CompactEncoding`, which has its own `encode` and `decode` methods.
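A standalone sketch of the byte-level difference, assuming the JavaScript compact-encoding varint layout (values up to `0xfc` in one byte, otherwise a `0xfd` or `0xfe` marker followed by a little-endian `u16` or `u32`). `encode_var_u32` and `encode_u32_vec` are hypothetical names, not the crate's trait methods.

```rust
/// Variable-width encoding of a standalone `u32` (assumed layout).
fn encode_var_u32(n: u32, out: &mut Vec<u8>) {
    if n <= 0xfc {
        out.push(n as u8);
    } else if n <= 0xffff {
        out.push(0xfd);
        out.extend_from_slice(&(n as u16).to_le_bytes());
    } else {
        out.push(0xfe);
        out.extend_from_slice(&n.to_le_bytes());
    }
}

/// What an overridden `vec_encode` for `u32` does: a variable-width length
/// prefix, then every element as a fixed 4-byte little-endian value.
fn encode_u32_vec(vec: &[u32], out: &mut Vec<u8>) {
    // Sketch assumption: the length fits in a u32.
    encode_var_u32(vec.len() as u32, out);
    for n in vec {
        out.extend_from_slice(&n.to_le_bytes());
    }
}
```

The same value, `300`, takes 3 bytes on its own but 4 bytes inside a vec; in exchange, the vec's size is `prefix + 4 * len` without inspecting any element.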

I've also added `BoxedSliceEncodable`, which works the same as `VecEncodable` but gives you `Box<[MyType]>`.

name changes

`CompactEncoding::preencode` is replaced with `CompactEncoding::encoded_size`. I make this distinction because "preencode" implies that it does some preparatory work: previously, `preencode` updated the state by adding to its `end` value. `encoded_size` does not do this; it just returns the calculated encoded size.
I kept the function naming pattern: `encode_`, `decode_`, etc. for fixed width, and `encode_var`, `decode_var` for variable width.
Methods on `BoxedSliceEncodable`/`VecEncodable` etc. are prefixed with `boxed_slice_*` and `vec_*` to avoid name collisions with the `CompactEncoding` trait's methods.
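To make the naming rationale concrete, here is a simplified sketch contrasting a stateful `preencode` with a pure `encoded_size`. The `State` with only an `end` field, the `preencode_str`/`encoded_size_str` functions, and the compact-encoding varint sizes are all assumptions for illustration.

```rust
/// Size of a length prefix (assumed compact-encoding varint layout).
fn var_uint_size(n: usize) -> usize {
    if n <= 0xfc {
        1
    } else if n <= 0xffff {
        3
    } else if n <= 0xffff_ffff {
        5
    } else {
        9
    }
}

/// Old style (simplified): `preencode` mutates shared state by bumping `end`.
struct State {
    end: usize,
}

impl State {
    fn preencode_str(&mut self, s: &str) -> usize {
        self.end += var_uint_size(s.len()) + s.len();
        self.end
    }
}

/// New style: `encoded_size` is a pure query; the caller adds results up.
fn encoded_size_str(s: &str) -> usize {
    var_uint_size(s.len()) + s.len()
}
```

Both compute the same total, but only the pure version can be called repeatedly, or on a shared reference, without side effects.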

ergonomics

In my opinion, these changes make the library much easier to use. I've also included some macros that make encoding and decoding more declarative and concise. An example from the rewritten docs:

```rust
use compact_encoding::{map_decode, to_encoded_bytes};

let number = 41_u32;
let word = "hi";
// Encode the passed values to a buffer
let encoded_buffer = to_encoded_bytes!(number, word);
// Decode a buffer based on the types passed to `map_decode`
let ((number_dec, word_dec), remaining_buffer) = map_decode!(&encoded_buffer, [u32, String]);

assert!(remaining_buffer.is_empty());
assert_eq!(number_dec, number);
assert_eq!(word_dec, word);
```

Here is a comparison between implementing `CompactEncoding` before and after this PR:

```rust
// PREVIOUSLY

impl CompactEncoding<RequestBlock> for HypercoreState {
    fn preencode(&mut self, value: &RequestBlock) -> Result<usize, EncodingError> {
        self.0.preencode(&value.index)?;
        self.0.preencode(&value.nodes)
    }

    fn encode(&mut self, value: &RequestBlock, buffer: &mut [u8]) -> Result<usize, EncodingError> {
        self.0.encode(&value.index, buffer)?;
        self.0.encode(&value.nodes, buffer)
    }

    fn decode(&mut self, buffer: &[u8]) -> Result<RequestBlock, EncodingError> {
        let index: u64 = self.0.decode(buffer)?;
        let nodes: u64 = self.0.decode(buffer)?;
        Ok(RequestBlock { index, nodes })
    }
}
```

Now with this PR:

```rust
impl CompactEncoding for RequestBlock {  // No more `HypercoreState` or `self.0.*`
    fn encoded_size(&self) -> Result<usize, EncodingError> {  // each fn is basically one line
        Ok(sum_encoded_size!(self.index, self.nodes))
    }

    fn encode<'a>(&self, buffer: &'a mut [u8]) -> Result<&'a mut [u8], EncodingError> {
        Ok(map_encode!(buffer, self.index, self.nodes))
    }

    fn decode(buffer: &[u8]) -> Result<(Self, &[u8]), EncodingError>
    where
        Self: Sized,
    {
        let ((index, nodes), rest) = map_decode!(buffer, [u64, u64]);
        Ok((RequestBlock { index, nodes }, rest))
    }
}
```
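For readers curious how a variadic macro like `sum_encoded_size!` can work, here is a hypothetical `macro_rules!` sketch. `sum_sizes!` and the `EncodedSize` trait are illustrative names, not the crate's definitions, and unlike the real `encoded_size` (which returns a `Result`), this version returns a plain `usize`. The varint sizes assume the compact-encoding layout.

```rust
/// Hypothetical stand-in for the size part of `CompactEncoding`.
trait EncodedSize {
    fn encoded_size(&self) -> usize;
}

impl EncodedSize for u64 {
    // Assumed compact-encoding varint layout: 1, 3, 5 or 9 bytes.
    fn encoded_size(&self) -> usize {
        match *self {
            0..=0xfc => 1,
            0xfd..=0xffff => 3,
            0x1_0000..=0xffff_ffff => 5,
            _ => 9,
        }
    }
}

impl EncodedSize for String {
    // Length prefix plus the UTF-8 bytes.
    fn encoded_size(&self) -> usize {
        (self.len() as u64).encoded_size() + self.len()
    }
}

/// Expands `sum_sizes!(a, b)` to `0 + size_of(a) + size_of(b)`,
/// accepting one or more expressions and an optional trailing comma.
macro_rules! sum_sizes {
    ($($value:expr),+ $(,)?) => {
        0usize $(+ EncodedSize::encoded_size(&$value))+
    };
}
```

Each field becomes one `+ …` term at compile time, which is why the `encoded_size` body above fits on one line.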

more notes

Default `CompactEncoding` impls: I've only included the implementations that existed in the previous `compact_encoding` crate. I noticed some that would be nice to have but were missing (and are not currently used downstream); I left these out, and they could be added later.
Helper functions: I've added exports of several functions (`write_array`, `take_array`, etc.) that make implementing `CompactEncoding` in downstream libraries easier.
Fixed-width encoding helpers: these provide fixed-width encoding/decoding for types whose default encoding is variable width. See the `FixedWidthEncoding` trait for details.
The crate no longer exports nested modules; everything is at the root. This avoids requiring a breaking API change if we later change the crate's internal module structure.
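A minimal sketch of what array helpers in the spirit of `write_array`/`take_array` might look like, and how they give a fixed-width `u32` encoding. The signatures, the `EncodingError` enum, and the `encode_u32_fixed`/`decode_u32_fixed` names are assumptions for illustration, not the crate's API.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum EncodingError {
    OutOfBounds { needed: usize, available: usize },
}

/// Copy a fixed-size array to the front of `buffer`, returning the unwritten rest.
pub fn write_array<'a, const N: usize>(
    array: &[u8; N],
    buffer: &'a mut [u8],
) -> Result<&'a mut [u8], EncodingError> {
    if buffer.len() < N {
        return Err(EncodingError::OutOfBounds { needed: N, available: buffer.len() });
    }
    let (head, rest) = buffer.split_at_mut(N);
    head.copy_from_slice(array);
    Ok(rest)
}

/// Read a fixed-size array from the front of `buffer`, returning it and the rest.
pub fn take_array<const N: usize>(buffer: &[u8]) -> Result<([u8; N], &[u8]), EncodingError> {
    if buffer.len() < N {
        return Err(EncodingError::OutOfBounds { needed: N, available: buffer.len() });
    }
    let (head, rest) = buffer.split_at(N);
    let mut array = [0u8; N];
    array.copy_from_slice(head);
    Ok((array, rest))
}

/// Fixed-width `u32`: always 4 little-endian bytes, regardless of the value.
pub fn encode_u32_fixed(n: u32, buffer: &mut [u8]) -> Result<&mut [u8], EncodingError> {
    write_array(&n.to_le_bytes(), buffer)
}

/// Inverse of `encode_u32_fixed`.
pub fn decode_u32_fixed(buffer: &[u8]) -> Result<(u32, &[u8]), EncodingError> {
    let (bytes, rest) = take_array::<4>(buffer)?;
    Ok((u32::from_le_bytes(bytes), rest))
}
```

Using const generics keeps the array length in the type, so the bounds check happens once and the decoded value is a plain `[u8; N]`.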

Note that these changes do add more code. But I think that, without the added and updated tests and documentation, there would actually be a small decrease in lines of code.

I propose releasing this as a major version bump.


ttiurani

Great work!

Feel free to merge and do the release procedure yourself!

cowlicks added 6 commits November 8, 2024 00:06

Add CompactEncodable trait

e14e449

implement CompactEncodable for Vec<T>

dfaa143

just EncodingError

f5bf11c

make encoded_bytes take buffer

945e8aa

Add usize encoding for compactencodable

def5bae

Add IpAddr encodings

df76e71

cowlicks marked this pull request as draft April 2, 2025 21:30

cowlicks added 23 commits April 3, 2025 10:47

impl Clone, PartialEq, and Error for EncodingError

b81646f

docs

de6c4fd

take &[T] instead of &Vec<T> in funcs

23ddffc

handle errors, fix todo!()

2a28141

Add helper funcs for building EncodingError

0b6cadb

Add encodable module

de293d9

Add conversion for Encoding to Encodable

2caaf37

lints

ff969aa

refactoring

3636574

Remove EncodedSize trait

d72f8bd

fix todo!()

75020e3

begin adding encodable tests

5a23165

wip new encoding tests

9c2f085

Fix issues with implementing for String & str

108f7ca

Fix bug in u32

6625a60

redo all tests

37fe9c4

Add VecEncodable for u32

f24c455

add CompactEncodable::create_buffer

36ae9ec

lints

8ae2153

flush out macros and add doctests

9d0b6bd

better docs

2f53078

add box_ & vec_ prefix to VecEnc and BoxEnc trait methods

c7bf377

Rename encoded_bytes to encode

9431df9

cowlicks added 19 commits April 15, 2025 02:07

rewrite module documentation

5171bb8

cargo +nightly rustfmt

23b1359

replace lib.rs

58c54b5

rm unused

ff5a614

rm unused

e089aa9

fix problems from move

abcca34

rm unused

4b29f6d

rename types -> errors

a520454

rename and reorder

c633b5c

Add FixedWidthEncoding for uints

9dc20df

docs & rm debug & add as_array

eda12e1

dedup decode_usize

bab285f

more docs and clippy

f3ec0fb

Add EncodingError::external

2ed7ae3

Add Cenc::encode/decode_with_len

8ec7727

The same as Cenc::encode/decode except it also returns the # of bytes encoded and decoded

simplify summing in macros

d398358

Add fn for creating Box<[u8]> buffer

8234211

Fix benches to work with stuff

de0a24c

format doc comments code

4ec35b4

cowlicks force-pushed the hyperswarm-changes branch from ed96a38 to 4ec35b4 April 30, 2025 17:33

cowlicks added 6 commits May 1, 2025 15:33

BoxArrayEnc -> BoxedSliceEnc

ed621f4

Use Box<[u8]> for buffer instead of Vec<u8>

329bf32

better doc comments

70471c5

fix benches

1289a93

more doc tests

a7f67ac

don't re-export error module

d1d597f

cowlicks marked this pull request as ready for review May 2, 2025 22:10

cowlicks changed the title ~~Hyperswarm changes~~ Refactor to remove State and make easier to use downstream May 2, 2025

ttiurani approved these changes May 3, 2025


cowlicks merged commit dc99341 into datrs:main May 5, 2025
6 checks passed
