Can the writer support the scenario where the output is a positional file with header and data? #843

@Il-Pela

Question

Can the writer support the scenario where the output is a positional file containing both a header and data?

Consider the header and the data as two sets of records with different schemas: the header is usually 1 row, while the data is N rows.
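To make the target layout concrete, here is a minimal plain-Python sketch of such a file. The field names, widths, and values are hypothetical stand-ins for two copybooks, and real copybook types (for example packed decimals) are not modelled.

```python
# Hypothetical layouts: (field name, width) pairs standing in for two copybooks.
HEADER_LAYOUT = [("record_type", 1), ("file_date", 8), ("record_count", 6)]
DATA_LAYOUT = [("record_type", 1), ("customer_id", 6), ("name", 8)]


def to_fixed_width(record: dict, layout: list) -> str:
    """Left-justify each field to its width, truncating values that are too long."""
    return "".join(str(record[name])[:width].ljust(width) for name, width in layout)


# One header record followed by N data records (made-up sample values).
header = {"record_type": "H", "file_date": "20240101", "record_count": 2}
rows = [
    {"record_type": "D", "customer_id": 1, "name": "Alice"},
    {"record_type": "D", "customer_id": 2, "name": "Bob"},
]

# The header line comes first and uses its own layout; the data lines follow.
lines = [to_fixed_width(header, HEADER_LAYOUT)]
lines += [to_fixed_width(row, DATA_LAYOUT) for row in rows]
```

The first element of `lines` follows the header layout and every later element follows the data layout, which is the shape the writer would need to produce.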

What I was considering:

  • Manage the header and the data in the same Spark DataFrame, whose schema is the union of the header columns and the data columns. Header columns hold values only in the first row and nulls in the remaining N rows, while data columns hold nulls in the first row and values in the N rows after it (where N is the number of data records).
  • Write the DataFrame with 2 different copybooks: one with the header schema (writing only the first row) and one with the data schema (writing every row except the first). On each write, columns that are not in the copybook would not be written to the output.
  • The result: 2 separate files.
  • Merge them in the order header > data.
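The final merge step could look roughly like the sketch below. It is plain Python with no Spark calls, and it assumes that each write produced a local directory of `part-*` files (as Spark file sinks typically do) and that the name order of those files matches the intended record order. The helper names `part_files` and `merge_header_and_data` are hypothetical.

```python
import shutil
from pathlib import Path


def part_files(output_dir: Path) -> list:
    """Return Spark-style part files in name order, skipping markers such as _SUCCESS."""
    return sorted(
        p for p in output_dir.iterdir()
        if p.is_file() and p.name.startswith("part-")
    )


def merge_header_and_data(header_dir: Path, data_dir: Path, target: Path) -> None:
    """Concatenate header part files, then data part files, into one binary file."""
    with target.open("wb") as out:
        for part in part_files(header_dir) + part_files(data_dir):
            with part.open("rb") as src:
                # Copy raw bytes so fixed-length records are not altered.
                shutil.copyfileobj(src, out)
```

Copying in binary mode keeps the records byte-for-byte intact. For output on HDFS or object storage, the same concatenation would have to go through the corresponding filesystem API instead of `pathlib`.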

Do you think this will work? Is there a smarter approach?
