-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[FLINK-37729][flink-formats] ResultTypeQueryable for Avro serialization schema #26507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[FLINK-37729][flink-formats] ResultTypeQueryable for Avro serialization schema #26507
Conversation
…ion schema Signed-off-by: Pawel Leszczynski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the right approach to me.
* @return TypeInformation of the produced type | ||
*/ | ||
@Override | ||
public TypeInformation getProducedType() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be TypeInformation<T>
. You probably need to cast the returns though but I think it makes things more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common type for both cases is ResultTypeQueryable<GenericContainer>
as both SpecificRecord
and GenericRecord
extend GenericContainer
. However, I could not make it work for GenericRecordAvroTypeInfo
to be casted into TypeInformation<GenericContainer>
if (schema == null) { | ||
throw new IllegalStateException( | ||
"The produced type is not available before the schema is initialized."); | ||
} else { | ||
return new GenericRecordAvroTypeInfo(schema); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the produced type be of row? Naively I'd expect to have the same type bound to
SerializationSchema<RowData>, ResultTypeQueryable<GenericRecord>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know any convenient helper to convert RowData
into sth extending TypeInformation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that in the Avro deserialization it returns the type as TypeInformation<RowData>
.
I can see in the Parquet Avro and the Orc code it is returning a type.
It would be good if what we do here is consistent with the Parquet Avro case.
I also see that there is a comment at the top of this class saying to keep this class in step with AvroRowDataDeserializationSchema} and schema converter {@link AvroSchemaConverter}. AvroSchemaConverter does deal with Type info.
It would be useful for me to understand what how lineage is calling this and why we end up in this class. Maybe a lineage unit test bringing in an Avro schema and a Parquet Avro schema should be helpful.
Implement
ResultTypeQueryable
interface forAvroSerializationSchema
andAvroRowDataSerializationSchema
to make it exposed via existing lineage interfaces.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation