Question
Can the writer support the case where the output is a positional file with a header and data?
Consider the header and the data as two different sets of records with different schemas; usually the header is 1 row while the data is N rows.
What I was considering:
- manage header and data in the same Spark DataFrame. The resulting df has a schema that is the union of the header columns and the data columns. The header columns have values only in the first row and nulls in the other N rows, while the data columns have nulls in the first row and values only in the N rows after it (where N is the number of data records in my df)
- write the df with 2 different copybooks: one with the header schema (writing only the first row), one with the data schema (writing all rows except the first one). When writing, any column that is not in the copybook is not written to the output
- the result: 2 separate files
- merge them together in the order header > data.
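
For the last step (the merge), here is a minimal sketch in Python of what I have in mind. It assumes each write produced a Spark-style output directory containing `part-*` files, and that plain byte concatenation is valid for the record format. The function names and paths are hypothetical, not part of the writer's API.

```python
import shutil
from pathlib import Path


def part_files(output_dir):
    """Return the part files of one Spark-style output directory, in name order."""
    # Skip markers such as _SUCCESS and hidden .crc checksum files.
    return sorted(
        p for p in Path(output_dir).iterdir()
        if p.is_file() and p.name.startswith("part-")
    )


def merge_header_and_data(header_dir, data_dir, merged_path):
    """Concatenate the header part files, then the data part files, into one file."""
    with open(merged_path, "wb") as merged:
        for part in part_files(header_dir) + part_files(data_dir):
            # Binary copy, so the fixed-width bytes are not re-encoded.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)
    return merged_path
```

Calling `merge_header_and_data("out/header", "out/data", "out/merged.dat")` would then give a single file with the header first. Note that the data rows end up in part-file name order, which only matches the DataFrame order if the write preserved it.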
Do you think it will work? Is there something smarter?