@tobixdev
Contributor

Which issue does this PR close?

Rationale for this change

Allows users who need custom pretty-printing logic for batches to supply their own implementation.

What changes are included in this PR?

Changes to existing code:

  • Make fields in FormatOptions public. This is necessary as the custom ArrayFormatter must also have access to the formatting options. (see <NULL> in the test)
  • Deprecate types_info() method as the field is now public
  • Allow directly creating ArrayFormatter with a DisplayIndex implementation
  • Make FormatError, FormatResult, and DisplayIndex public. I do have some second thoughts about DisplayIndex not having any concept of length even though it takes an index as input. However, it may be fine for now.

New code:

  • ArrayFormatterFactory: Allows creating ArrayFormatters with custom behavior
  • pretty_format_batches_with_options_and_formatters: Pretty-prints batches with custom formatters
  • A similar function for formatting a single column

Are these changes tested?

Yes, existing tests cover the default formatting path.

Three new tests:

  • Format record batch with custom type (append € sign)
  • Format column with custom formatter (append (32-Bit) for Int32)
  • Allow overriding the custom types with a custom schema (AFAIK this is not possible with the current API but might make sense).
  • Added a sanity check that the number of fields in a custom schema must match the number of columns in the record batch.

Are there any user-facing changes?

Yes, multiple things become public, types_info() becomes deprecated, and there are new APIs for custom pretty printing of batches.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 12, 2025
/// /// correct formatter for an extension type on-demand.
/// struct MyFormatters {}
///
/// impl ArrayFormatterFactory for MyFormatters {
Contributor Author

From the DataFusion perspective, this would be the trait that we would implement. It looks up any existing extension types in the registry and uses the custom printing implementation.
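
To make the lookup-with-fallback idea concrete, here is a minimal std-only sketch. It does not use the real Arrow types: the `create` signature, the `i64` slice standing in for an array, and the `format_column` helper are all assumptions made for illustration, not the API proposed in this PR.

```rust
use std::fmt::{self, Write};

/// Stand-in for the PR's `DisplayIndex`: writes the value at `idx` to `f`.
trait DisplayIndex {
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result;
}

/// Hypothetical factory shape: return a custom formatter, or `None` to
/// fall back to the default one.
trait ArrayFormatterFactory {
    fn create<'a>(
        &self,
        extension_name: Option<&str>,
        values: &'a [i64],
    ) -> Option<Box<dyn DisplayIndex + 'a>>;
}

/// Default formatting: print the number as-is.
struct DefaultFormatter<'a> {
    values: &'a [i64],
}

impl DisplayIndex for DefaultFormatter<'_> {
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result {
        write!(f, "{}", self.values[idx])
    }
}

/// Custom formatting for the `my_money` extension type: append a € sign.
struct MyMoneyFormatter<'a> {
    values: &'a [i64],
}

impl DisplayIndex for MyMoneyFormatter<'_> {
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result {
        write!(f, "{} €", self.values[idx])
    }
}

struct MyFormatters;

impl ArrayFormatterFactory for MyFormatters {
    fn create<'a>(
        &self,
        extension_name: Option<&str>,
        values: &'a [i64],
    ) -> Option<Box<dyn DisplayIndex + 'a>> {
        // A real implementation would consult an extension type registry here.
        match extension_name {
            Some("my_money") => Some(Box::new(MyMoneyFormatter { values })),
            _ => None,
        }
    }
}

/// Hypothetical helper: use the factory's formatter if it provides one,
/// otherwise the default formatter.
fn format_column(
    values: &[i64],
    extension_name: Option<&str>,
    factory: Option<&dyn ArrayFormatterFactory>,
) -> Result<Vec<String>, fmt::Error> {
    let formatter: Box<dyn DisplayIndex + '_> =
        match factory.and_then(|f| f.create(extension_name, values)) {
            Some(custom) => custom,
            None => Box::new(DefaultFormatter { values }),
        };
    let mut cells = Vec::with_capacity(values.len());
    for idx in 0..values.len() {
        let mut cell = String::new();
        formatter.write(idx, &mut cell)?;
        cells.push(cell);
    }
    Ok(cells)
}

fn main() {
    let factory: &dyn ArrayFormatterFactory = &MyFormatters;
    let money = format_column(&[1, 2], Some("my_money"), Some(factory)).unwrap();
    let plain = format_column(&[1, 2], None, Some(factory)).unwrap();
    println!("{money:?} {plain:?}");
}
```

Returning `None` from the factory keeps the default path untouched, so columns without a registered extension type render exactly as before.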

///
/// /// A formatter for the type `my_money` that wraps a specific array and has access to the
/// /// formatting options.
/// struct MyMoneyFormatter<'a> {
Contributor Author

From the DataFusion perspective, this would be a struct that users implement such that they can pretty-print their custom types.

We would also do something similar for canonical data types (e.g., formatting UUIDs).

.unwrap();

let s = [
"+--------+",
Contributor Author

Pretty-printing result with a custom extension type


// No € formatting as in test_format_batches_with_custom_formatters
let s = [
"+--------------+",
Contributor Author

Overriding the pretty-printing result based on the DataType (appends (32-Bit) to Int32 values).

-trait DisplayIndex {
+pub trait DisplayIndex {
/// Write the value of the underlying array at `idx` to `f`.
fn write(&self, idx: usize, f: &mut dyn Write) -> FormatResult;
Contributor Author

Maybe we should add a len method to make the trait more complete (?).
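
As a rough sketch of what that could look like, the following std-only example adds a `len` method to a stand-in trait. The `len` method, the `Int32Formatter` type, and the `format_all` helper are hypothetical and only illustrate the idea; they are not part of this PR.

```rust
use std::fmt::{self, Write};

/// Stand-in for `DisplayIndex` with a hypothetical `len` method added.
trait DisplayIndex {
    /// Number of values that can be written (hypothetical addition).
    fn len(&self) -> usize;
    /// Write the value of the underlying array at `idx` to `f`.
    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result;
}

/// Illustrative formatter over a plain slice instead of an Arrow array.
struct Int32Formatter<'a> {
    values: &'a [i32],
}

impl DisplayIndex for Int32Formatter<'_> {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn write(&self, idx: usize, f: &mut dyn Write) -> fmt::Result {
        write!(f, "{} (32-Bit)", self.values[idx])
    }
}

/// With `len` on the trait, a caller can iterate over all valid indices
/// without holding a separate reference to the underlying array.
fn format_all(formatter: &dyn DisplayIndex) -> Result<Vec<String>, fmt::Error> {
    let mut cells = Vec::with_capacity(formatter.len());
    for idx in 0..formatter.len() {
        let mut cell = String::new();
        formatter.write(idx, &mut cell)?;
        cells.push(cell);
    }
    Ok(cells)
}

fn main() {
    let formatter = Int32Formatter { values: &[1, 2, 3] };
    println!("{:?}", format_all(&formatter).unwrap());
}
```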

Contributor

@alamb left a comment

Thank you for this PR @tobixdev -- I think the basic idea looks great to me

I have a few suggestions on API design:

  1. Don't make fields of FormatOptions pub (instead make accessors)
  2. Consider putting the factory on the FormatOptions rather than passing in a new parameter

/// If set to `true` any formatting errors will be written to the output
/// instead of being converted into a [`std::fmt::Error`]
-safe: bool,
+pub safe: bool,
Contributor

"Make fields in FormatOptions public. This is necessary as the custom ArrayFormatter must also have access to the formatting options. (see <NULL> in the test)"

Rather than making these public, what about adding accessors for them? I think that would make it easier to change the underlying implementation in the future without causing breaking API changes.

Making all these fields pub means people can construct FormatOptions explicitly, like:

let options = FormatOptions {
  safe: false,
  null: "", 
...
};

So adding any new field to the struct would be a breaking API change.

If we keep the fields private, we can add new fields without breaking existing users.
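
A minimal sketch of the accessor approach, using a simplified model of `FormatOptions` with only two fields. The accessor names `safe()` and `null()` and the default values shown here are assumptions for illustration, not the final API.

```rust
mod display {
    /// Simplified model of `FormatOptions`. The fields are private, so code
    /// outside this module cannot use struct-literal construction.
    #[derive(Debug, Clone)]
    pub struct FormatOptions<'a> {
        safe: bool,
        null: &'a str,
    }

    impl Default for FormatOptions<'_> {
        fn default() -> Self {
            Self { safe: true, null: "" }
        }
    }

    impl<'a> FormatOptions<'a> {
        pub fn new() -> Self {
            Self::default()
        }

        /// Builder-style setter for `safe`.
        pub fn with_display_error(mut self, safe: bool) -> Self {
            self.safe = safe;
            self
        }

        /// Builder-style setter for the null placeholder.
        pub fn with_null(mut self, null: &'a str) -> Self {
            self.null = null;
            self
        }

        /// Read-only accessor (hypothetical name).
        pub fn safe(&self) -> bool {
            self.safe
        }

        /// Read-only accessor (hypothetical name).
        pub fn null(&self) -> &'a str {
            self.null
        }
    }
}

use display::FormatOptions;

fn main() {
    // Callers go through the builder and accessors, so adding another
    // private field later does not break this code.
    let options = FormatOptions::new().with_null("<NULL>");
    println!("safe={} null={}", options.safe(), options.null());
}
```

A custom formatter would still be able to read the null placeholder through `options.null()`, which covers the `<NULL>` case from the test without exposing the fields.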

pub fn pretty_format_batches_with_options_and_formatters(
results: &[RecordBatch],
options: &FormatOptions,
formatters: Option<&dyn ArrayFormatterFactory>,
Contributor

Rather than needing a parameter, did you consider adding a new field to FormatOptions?

struct FormatOptions<'a> {
  ...
  formatter_factory: Option<&'a dyn ArrayFormatterFactory>,
}

That would reduce the number of new APIs and make it easier for existing code to use custom formatters (callers would only need to update their options).
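
Roughly, that could look like the std-only sketch below. The factory trait is heavily simplified (a hypothetical `format_cell` method instead of creating an `ArrayFormatter`), and `with_formatter_factory` and `format_values` are assumed names used only to show the plumbing.

```rust
/// Hypothetical, heavily simplified factory: formats a single cell, or
/// returns `None` to request the default formatting.
trait ArrayFormatterFactory {
    fn format_cell(&self, value: i32) -> Option<String>;
}

/// Simplified options struct carrying an optional factory reference.
struct FormatOptions<'a> {
    null: &'a str,
    formatter_factory: Option<&'a dyn ArrayFormatterFactory>,
}

impl<'a> FormatOptions<'a> {
    fn new() -> Self {
        Self { null: "", formatter_factory: None }
    }

    fn with_null(mut self, null: &'a str) -> Self {
        self.null = null;
        self
    }

    /// Assumed builder name; not part of the PR.
    fn with_formatter_factory(mut self, factory: &'a dyn ArrayFormatterFactory) -> Self {
        self.formatter_factory = Some(factory);
        self
    }
}

/// The signature is the same with or without custom formatters: everything
/// travels inside `options`.
fn format_values(values: &[Option<i32>], options: &FormatOptions<'_>) -> Vec<String> {
    values
        .iter()
        .map(|value| match value {
            None => options.null.to_string(),
            Some(v) => options
                .formatter_factory
                .and_then(|factory| factory.format_cell(*v))
                .unwrap_or_else(|| v.to_string()),
        })
        .collect()
}

/// Example factory that appends a € sign to every value.
struct Euro;

impl ArrayFormatterFactory for Euro {
    fn format_cell(&self, value: i32) -> Option<String> {
        Some(format!("{value} €"))
    }
}

fn main() {
    let values = [Some(1), None];
    let plain = FormatOptions::new().with_null("<NULL>");
    let custom = FormatOptions::new()
        .with_null("<NULL>")
        .with_formatter_factory(&Euro);
    println!("{:?}", format_values(&values, &plain));
    println!("{:?}", format_values(&values, &custom));
}
```

Existing call sites keep passing `&options`; opting into custom formatting is a one-line change where the options are built.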

/// }
/// }
///
/// /// A formatter for the type `my_money` that wraps a specific array and has access to the
Contributor

This is great -- thank you

Development

Successfully merging this pull request may close these issues.

Custom Pretty-Printing Implementation for Column when Formatting Record Batches

2 participants