Make Parquet SBBF serialize/deserialize helpers public for external reuse #8727

@RoseZhang123

Which part is this question about

https://github.com/apache/arrow-rs/blob/55.2.0/parquet/src/bloom_filter/mod.rs

Describe your question

Hi team!

I’d love to reuse the existing Split Block Bloom Filter (SBBF) implementation in parquet to back a byte-level cache in one of our team’s downstream services.

The problem is that the two helpers that convert between raw bitsets and Sbbf instances are pub(crate), so external crates can’t create an Sbbf from bytes or emit its bytes without re-implementing your logic.

Would you be open to relaxing the visibility and making the following methods pub (without changing their signatures)?
pub(crate) fn new(bitset: &[u8]) -> Self;
pub(crate) fn write<W: Write>(&self, mut writer: W) -> Result<(), ParquetError>;

That would let us deserialize SBBFs straight from storage and re-serialize them after caching, while reusing the canonical implementation from this crate.
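To make the intended use concrete, here is a minimal sketch of the round trip we have in mind. `CachedSbbf` and `round_trip` are hypothetical stand-ins, not the crate's types: the sketch only mirrors the two signatures above, assumes the bitset is a sequence of 256-bit blocks stored as eight little-endian 32-bit words, and returns `io::Result` in place of `ParquetError`. It writes the raw bitset only, so any header the crate's own `write` may emit is out of scope here.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for parquet's `Sbbf` (not the crate's actual type).
/// Each block is eight 32-bit words, i.e. 256 bits.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CachedSbbf(Vec<[u32; 8]>);

impl CachedSbbf {
    /// Build a filter from a raw bitset, assuming little-endian words.
    /// Trailing bytes that do not fill a whole 32-byte block are ignored.
    pub fn new(bitset: &[u8]) -> Self {
        let blocks = bitset
            .chunks_exact(32)
            .map(|chunk| {
                let mut block = [0u32; 8];
                for (word, bytes) in block.iter_mut().zip(chunk.chunks_exact(4)) {
                    *word = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
                }
                block
            })
            .collect();
        Self(blocks)
    }

    /// Write the raw bitset back out; this sketch emits no header.
    pub fn write<W: Write>(&self, mut writer: W) -> io::Result<()> {
        for block in &self.0 {
            for word in block {
                writer.write_all(&word.to_le_bytes())?;
            }
        }
        Ok(())
    }
}

/// Load a filter from stored bytes, then serialize it again for the cache.
pub fn round_trip(stored: &[u8]) -> io::Result<Vec<u8>> {
    let filter = CachedSbbf::new(stored);
    let mut out = Vec::with_capacity(stored.len());
    filter.write(&mut out)?;
    Ok(out)
}
```

With the two methods public, our service could replace the stand-in with the real `Sbbf` and keep the same call shape, instead of maintaining a copy of the block layout logic.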

Thanks,

Additional context

Labels: question
