Error reading parquet FileMetaData with empty lists encoded as element-type=0 #8826

@pmarks

I have a collection of parquet files that started throwing ParquetError::General("Unexpected list/set element type0") when parsing FileMetaData. They hit this error path because the list header for FileMetaData.row_groups.columns.meta_data.key_value_metadata has an element type of 0 and a length of 0. In the spec, an element type of 0 is allowed for an empty map, but it doesn't seem to be explicitly allowed for lists. So perhaps these files are technically out of spec, but I haven't yet encountered another reader that rejects them, and I have used a variety of tools with these files. The parquet files were created with "fastparquet-python version 2024.2.0 (build 0)".
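To illustrate the header in question, here is a minimal Python sketch of decoding a Thrift compact-protocol list header, assuming the usual layout (size in the high nibble, element type in the low nibble, with a size nibble of 15 meaning a varint size follows). The names `read_varint`, `read_list_header`, and the `lenient` flag are hypothetical and only for illustration; this is not the arrow-rs implementation, just one way a reader might tolerate element type 0 when the list is empty.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def read_list_header(buf: bytes, pos: int = 0, lenient: bool = True) -> tuple[int, int, int]:
    """Return (element_type, size, next_pos) for a compact-protocol list header.

    `lenient` is a hypothetical switch: when set, an element type of 0 is
    accepted as long as the list is empty, since there are no elements to decode.
    """
    header = buf[pos]
    pos += 1
    elem_type = header & 0x0F  # low nibble: element type
    size = header >> 4         # high nibble: size, or 15 for "varint follows"
    if size == 0x0F:
        size, pos = read_varint(buf, pos)
    if elem_type == 0 and not (lenient and size == 0):
        raise ValueError(f"Unexpected list/set element type {elem_type}")
    return elem_type, size, pos
```

With this sketch, the single header byte 0x00 (type 0, length 0) is accepted in lenient mode and rejected in strict mode, which mirrors the difference in behavior described above.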

I don't have an easy way to share the files. One can be downloaded here (NOTE: 6GB download); extract the transcripts.parquet file from the zip archive.

I bisected this crash to #8530. @etseidl
