Difference files for identical data sets #1

When there are no differences, the differ produces an empty data set. Writing this out from Spark as CSV yields an empty file with no header row. Writing it out as Parquet yields n*3 segments; each is small and contains the schema but no data, yet together they take up a fair amount of space. Currently that is 200 segments * 3 * 26 kB = 15,600 kB, or about 15.6 MB.

What is the best way to represent "no changes" in both Parquet and CSV form?
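One possible answer for the CSV side, offered as a minimal sketch and not as the differ's actual behaviour: write a file that holds only the header row, so a reader can still tell which columns exist and that no rows changed. The sketch uses only Python's standard `csv` module and does not call Spark. The function names `write_header_only_csv` and `read_diff_csv`, and the column names in the usage note, are hypothetical.

```python
import csv
from pathlib import Path


def write_header_only_csv(path, columns):
    """Write a CSV file that contains the header row and no data rows."""
    path = Path(path)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(columns)
    return path


def read_diff_csv(path):
    """Return (columns, rows); rows is an empty list when nothing changed."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        columns = next(reader, [])  # an entirely empty file gives no columns
        rows = list(reader)
    return columns, rows
```

Calling `write_header_only_csv("diff.csv", ["id", "change", "value"])` (column names assumed for illustration) and then `read_diff_csv("diff.csv")` gives back the three column names and an empty row list. For the Parquet side, reducing the empty data set to a single partition before writing, for example with Spark's `DataFrame.coalesce(1)`, would presumably cut the number of segments, though that is not verified here.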
