-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant][Shredding] Support typed_access for timestamp_micro/timestamp_nano #8401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Variant][Shredding] Support typed_access for timestamp_micro/timestamp_nano #8401
Conversation
984fbcb to
0748c0e
Compare
e003775 to
aa9f8a9
Compare
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, other than a panic that needs to be removed.
Your call whether to deal with timestamp precision widening in this PR or as a follow-on effort (maybe it depends on whether there's any controversy about the approach)
| ) | ||
| } | ||
| // Variant timestamp only support time unit with microsecond or nanosecond precision | ||
| _ => panic!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return an error instead of panicking?
Also, AFAIK all the other timestamp precisions can be widened losslessly to microsecond; should we consider doing that instead of blowing up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let's not panic here
ANother potential approach is to return a not yet implemented error -- maybe by matching the supported types explicitly, so like
DataType::Timestamp(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}instead of
DataType::Timestamp(timeunit, tz) => {
match (timeunit, tz) {
(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always a fan of less nesting, when possible...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. Yes, TimeUnit::Second and TimeUnit::Millisecond can convert to TimeUnit::MicroSecond losslessly. Using panic here because this value is located in typed_value of the Variant, and from the spec of Variant, there are only Micro and Nano units for Timestamp(with/without timezone), I assumed that this is an error(this could not happen?), I can add the second and Millisecond units if this could happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
panichere because this value is located intyped_valueof theVariant, and from the spec ofVariant, there are onlyMicroandNanounits forTimestamp(with/without timezone), I assumed that this is an error(this could not happen?)
Good point! There are definitely unit tests (parquet/tests/variant_integration.rs) that expect non-variant types in typed_value columns to produce an error (cases 127 and 137, specifically). So we should not widen on the read path, but rather reject.
Widening would seem to make sense on the shredding write path, tho?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also #8435, which I noticed while trying to add struct support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we need to widen to TimeUnit::MicroSecond when writing to a variant, as TimeUnit::MicroSecond is the minimum precision we support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. We have flexibility on the write path to decide whether/how we widen or convert values that don't exactly fit the variant type system, but readers expect the resulting types to always be valid variant shredding types.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ) | ||
| } | ||
| // Variant timestamp only support time unit with microsecond or nanosecond precision | ||
| _ => panic!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let's not panic here
ANother potential approach is to return a not yet implemented error -- maybe by matching the supported types explicitly, so like
DataType::Timestamp(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}instead of
DataType::Timestamp(timeunit, tz) => {
match (timeunit, tz) {
(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}| variant_test_case!(31); | ||
| // https://github.com/apache/arrow-rs/issues/8334 | ||
| variant_test_case!(32, "Unsupported typed_value type: Time64(Microsecond)"); | ||
| // https://github.com/apache/arrow-rs/issues/8331 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
|
|
958e8b0 to
3019a2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ]) | ||
| }); | ||
|
|
||
| partially_shredded_variant_array_gen!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for reducing the boilerplate
|
I merged up from main to resolve a conflict |
|
@klion26 -- this PR seems to have accumulated conflicts -- I apologize for that. Can you please resolve the conflicts so I can merge it in? |
69919eb to
bb4c3eb
Compare
|
@alamb I've rebased on main and resolved the conflicts. Please take another look, thanks. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @klion26 🙏 🙏
Which issue does this PR close?
Timestamp(Microsecond, _)andTimestamp(Nanosecond, _)#8331 .Rationale for this change
Timestamp(Micro, _)andTimestamp(Nano, -)What changes are included in this PR?
Arrayimpl forVariantArrayandShreddedVariantFieldArray#8392), rebase the master in the last commitTimestamp(Micro, _)andTimestamp(Nano, _)Timestamp(Micro, _)andTimestamp(Nano, _)Are these changes tested?
Covered by existing and added tests
Are there any user-facing changes?
No