Allow Users to Provide Custom ArrayFormatters when Pretty-Printing Record Batches
#8829
/// /// correct formatter for an extension type on-demand.
/// struct MyFormatters {}
///
/// impl ArrayFormatterFactory for MyFormatters {
From the DataFusion perspective, this would be the trait that we would implement. It looks up any existing extension types in the registry and uses the custom printing implementation.
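To make the lookup idea concrete, here is a minimal sketch that does not use arrow-rs at all. `ValueFormatter`, `EuroFormatter`, `ExtensionRegistry`, and `format_value` are hypothetical stand-ins invented for illustration; they are not the `ArrayFormatterFactory` API from this PR. The sketch only assumes that a factory returns a custom formatter when the extension type is known and otherwise signals that default formatting should be used.

```rust
use std::collections::HashMap;
use std::fmt::{self, Write};

// Hypothetical stand-in for a per-value formatter (not the arrow-rs trait).
trait ValueFormatter {
    fn write(&self, value: i64, out: &mut dyn Write) -> fmt::Result;
}

// Formats an amount with a trailing euro sign.
struct EuroFormatter;

impl ValueFormatter for EuroFormatter {
    fn write(&self, value: i64, out: &mut dyn Write) -> fmt::Result {
        write!(out, "{} €", value)
    }
}

// Hypothetical registry mapping extension type names to formatters.
struct ExtensionRegistry {
    formatters: HashMap<String, Box<dyn ValueFormatter>>,
}

impl ExtensionRegistry {
    // Returns the custom formatter for a registered extension type,
    // or `None` so that the caller falls back to default formatting.
    fn create_formatter(&self, extension_name: Option<&str>) -> Option<&dyn ValueFormatter> {
        let name = extension_name?;
        self.formatters.get(name).map(|boxed| &**boxed)
    }
}

// Formats one value, preferring a registered custom formatter.
fn format_value(registry: &ExtensionRegistry, extension_name: Option<&str>, value: i64) -> String {
    let mut out = String::new();
    let result = match registry.create_formatter(extension_name) {
        Some(formatter) => formatter.write(value, &mut out),
        None => write!(out, "{}", value),
    };
    result.expect("writing to a String does not fail");
    out
}

fn main() {
    let mut formatters: HashMap<String, Box<dyn ValueFormatter>> = HashMap::new();
    formatters.insert("my_money".to_string(), Box::new(EuroFormatter));
    let registry = ExtensionRegistry { formatters };

    println!("{}", format_value(&registry, Some("my_money"), 5));
    println!("{}", format_value(&registry, None, 5));
}
```

Returning `Option` from the lookup keeps the default path untouched for every type the registry does not know about.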
///
/// /// A formatter for the type `my_money` that wraps a specific array and has access to the
/// /// formatting options.
/// struct MyMoneyFormatter<'a> {
From the DataFusion perspective, this would be a struct that users implement such that they can pretty-print their custom types.
We would also do something similar for canonical data types (e.g., formatting UUIDs).
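As a rough illustration of such a user-provided struct, the sketch below wraps a slice of values together with a reference to the formatting options. Everything here is a simplified stand-in: `Options`, the local `DisplayIndex` trait (modelled only on the `write(idx, f)` shape quoted in this PR), `MoneyFormatter`, and `render` are assumptions for illustration and are not the arrow-rs types.

```rust
use std::fmt::{self, Write};

// Hypothetical, reduced stand-in for the formatting options.
struct Options<'a> {
    // Text written for null values.
    null: &'a str,
}

// Local stand-in trait with the same shape as the `write` method in the diff.
trait DisplayIndex {
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result;
}

// Wraps the values of one column and has access to the options.
struct MoneyFormatter<'a> {
    values: &'a [Option<i32>],
    options: &'a Options<'a>,
}

impl DisplayIndex for MoneyFormatter<'_> {
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result {
        match self.values[idx] {
            // Non-null values get the currency suffix.
            Some(amount) => write!(f, "{} €", amount),
            // Null values use the string configured in the options.
            None => write!(f, "{}", self.options.null),
        }
    }
}

// Renders the first `len` values into one string per row.
fn render(formatter: &dyn DisplayIndex, len: usize) -> Vec<String> {
    (0..len)
        .map(|idx| {
            let mut cell = String::new();
            formatter
                .write(idx, &mut cell)
                .expect("writing to a String does not fail");
            cell
        })
        .collect()
}

fn main() {
    let options = Options { null: "<NULL>" };
    let values = [Some(1), None, Some(25)];
    let formatter = MoneyFormatter { values: &values, options: &options };
    for cell in render(&formatter, values.len()) {
        println!("{}", cell);
    }
}
```

The null handling is the reason the formatter needs access to the options at all: without it, a custom formatter could not honor the user's configured null string.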
    .unwrap();

let s = [
    "+--------+",
Pretty-printing result with a custom extension type

// No € formatting as in test_format_batches_with_custom_formatters
let s = [
    "+--------------+",
Overriding the pretty-printing result based on the DataType (append 32-Bit to Int32).
-trait DisplayIndex {
+pub trait DisplayIndex {
     /// Write the value of the underlying array at `idx` to `f`.
     fn write(&self, idx: usize, f: &mut dyn Write) -> FormatResult;
Maybe we should add a len method to make the trait more complete (?).
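For discussion, a sketch of what a length-aware variant could look like. `DisplayIndexWithLen`, `try_write`, `Words`, and `render_all` are hypothetical names that do not exist in arrow-rs, and plain `fmt::Result` is used in place of the crate's `FormatResult`. The point is only that a `len` method lets callers iterate and bounds-check without knowing the underlying array.

```rust
use std::fmt::{self, Write};

// Hypothetical variant of the trait that also reports its length.
trait DisplayIndexWithLen {
    /// Number of values that can be written.
    fn len(&self) -> usize;

    /// Write the value at `idx` to `f`.
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result;

    /// Bounds-checked write: returns an error instead of panicking
    /// when `idx` is out of range.
    fn try_write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result {
        if idx >= self.len() {
            return Err(fmt::Error);
        }
        self.write(idx, f)
    }
}

// Toy implementation backed by a vector of strings.
struct Words(Vec<&'static str>);

impl DisplayIndexWithLen for Words {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result {
        f.write_str(self.0[idx])
    }
}

// With `len` available, a caller can render every value on its own.
fn render_all(source: &dyn DisplayIndexWithLen) -> Result<Vec<String>, fmt::Error> {
    let mut rows = Vec::with_capacity(source.len());
    for idx in 0..source.len() {
        let mut cell = String::new();
        source.try_write(idx, &mut cell)?;
        rows.push(cell);
    }
    Ok(rows)
}

fn main() {
    let words = Words(vec!["a", "b"]);
    println!("{:?}", render_all(&words));
}
```

Without `len`, the caller has to carry the length separately, which is what the current trait requires.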
alamb left a comment
Thank you for this PR @tobixdev -- I think the basic idea looks great to me
I have a few suggestions on API design:
- Don't make fields of FormatOptions pub (instead make accessors)
- Consider putting the factory on the FormatOptions rather than passing in a new parameter
     /// If set to `true` any formatting errors will be written to the output
     /// instead of being converted into a [`std::fmt::Error`]
-    safe: bool,
+    pub safe: bool,
Make fields in FormatOptions public. This is necessary as the custom ArrayFormatter must also have access to the formatting options. (see in the test)
Rather than making these public, what about adding accessors for them? I think that would make it easier to change the underlying implementation in the future without causing breaking API changes
I think making all these fields pub means people can construct format options explicitly, like
let options = FormatOptions {
    safe: false,
    null: "",
    ...
};
So adding any new field to the struct will be a breaking API change.
If we keep the fields private, then we can add new fields without breaking existing users.
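The accessor approach can be sketched as follows. This is a reduced stand-in with only two fields: the module name `options`, the struct `Options`, and its default values are assumptions for illustration, not the actual `FormatOptions` definition.

```rust
mod options {
    // Fields are private to this module, so code outside cannot use a
    // struct literal and is unaffected when a new field is added later.
    #[derive(Debug, Clone)]
    pub struct Options<'a> {
        safe: bool,
        null: &'a str,
    }

    impl Default for Options<'_> {
        fn default() -> Self {
            // Illustrative defaults only.
            Self { safe: true, null: "" }
        }
    }

    impl<'a> Options<'a> {
        pub fn new() -> Self {
            Self::default()
        }

        // Builder-style setters take and return `self`.
        pub fn with_safe(mut self, safe: bool) -> Self {
            self.safe = safe;
            self
        }

        pub fn with_null(mut self, null: &'a str) -> Self {
            self.null = null;
            self
        }

        // Read-only accessors for code such as custom formatters.
        pub fn safe(&self) -> bool {
            self.safe
        }

        pub fn null(&self) -> &'a str {
            self.null
        }
    }
}

fn main() {
    // `options::Options { safe: false, null: "" }` would not compile here,
    // because the fields are private outside the `options` module.
    let opts = options::Options::new().with_safe(false).with_null("<NULL>");
    println!("safe = {}, null = {:?}", opts.safe(), opts.null());
}
```

A custom formatter still gets everything it needs through `safe()` and `null()`, while the struct stays free to grow.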
pub fn pretty_format_batches_with_options_and_formatters(
    results: &[RecordBatch],
    options: &FormatOptions,
    formatters: Option<&dyn ArrayFormatterFactory>,
Rather than needing a parameter, did you consider adding a new field to FormatOptions?
struct FormatOptions {
    ...
    formatter_factory: Option<&dyn ArrayFormatterFactory>,
}
That would reduce the new APIs and make it easier for existing code to use custom formatters (just need to update options)
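A small self-contained sketch of this design, under simplifying assumptions: `FormatterFactory`, `Options`, `with_formatter_factory`, `format_cell`, and `Int32Suffix` are hypothetical stand-ins, and the factory here formats a single value directly instead of building an `ArrayFormatter`. The "(32-Bit)" suffix is made up for illustration and is not the output of the test in this PR.

```rust
// Hypothetical stand-in for the factory trait.
trait FormatterFactory {
    // Returns `Some(text)` if the factory handles `type_name`, else `None`.
    fn format(&self, type_name: &str, value: i64) -> Option<String>;
}

// Reduced options struct that carries an optional factory reference.
#[derive(Clone, Copy, Default)]
struct Options<'a> {
    formatter_factory: Option<&'a dyn FormatterFactory>,
}

impl<'a> Options<'a> {
    fn with_formatter_factory(mut self, factory: &'a dyn FormatterFactory) -> Self {
        self.formatter_factory = Some(factory);
        self
    }
}

// Existing entry points keep their signature: they only consult the options.
fn format_cell(options: &Options<'_>, type_name: &str, value: i64) -> String {
    options
        .formatter_factory
        .and_then(|factory| factory.format(type_name, value))
        // Fall back to default formatting when no factory applies.
        .unwrap_or_else(|| value.to_string())
}

// Example factory that only overrides one data type.
struct Int32Suffix;

impl FormatterFactory for Int32Suffix {
    fn format(&self, type_name: &str, value: i64) -> Option<String> {
        (type_name == "Int32").then(|| format!("{} (32-Bit)", value))
    }
}

fn main() {
    let factory = Int32Suffix;
    let options = Options::default().with_formatter_factory(&factory);
    println!("{}", format_cell(&options, "Int32", 7));
    println!("{}", format_cell(&options, "Int64", 7));
}
```

Callers that never set a factory keep the default behavior, so no additional function parameter is needed.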
/// }
/// }
///
/// /// A formatter for the type `my_money` that wraps a specific array and has access to the
This is great -- thank you
Which issue does this PR close?
Rationale for this change
Allows users that require custom pretty-printing logic for batches to supply this implementation.
What changes are included in this PR?
Changes to existing code:
- Make fields in FormatOptions public. This is necessary as the custom ArrayFormatter must also have access to the formatting options. (see <NULL> in the test)
- Deprecate the types_info() method as the field is now public
- ArrayFormatter with a DisplayIndex implementation
- Make FormatError, FormatResult, and DisplayIndex public. I do have some second thoughts about DisplayIndex not having any concept of length even though it's taking an index as input. However, it may be fine for now.
New code:
- ArrayFormatterFactory: Allows creating ArrayFormatters with custom behavior
- pretty_format_batches_with_options_and_formatters: pretty printing with custom formatters
Are these changes tested?
Yes, existing tests cover the default formatting path.
Three new tests:
Int32)
Are there any user-facing changes?
Yes, multiple things become public, types_info() becomes deprecated, and there are new APIs for custom pretty printing of batches.