timedelta64 #12
data-types/timedelta64/README.md
Outdated
## Fill value representation

`timedelta64` fill values are represented as one of:
- a JSON number with no fraction or exponent part that is within the range `[-2^63, 2^63 - 1]`.
- the string `"NaT"`, which denotes the value `NaT`.

> Note: the `NaT` value may optionally be encoded as the JSON number `-9223372036854775808`, i.e., `-2^63`.
I don't understand this part. Here it seems like the user can configure their own custom fill value? If so, shouldn't `fill_value` be in `configuration`? And what would be the use case for that?
Isn't it simpler if we just say that, like numpy, the integer `-2^63` represents `NaT`?
What I'm trying to say here is that the following two cases are the only acceptable fill values:
`"fill_value": <a JSON integer in the range [-2^63, 2^63 - 1]>`
`"fill_value": "NaT"`
With one degenerate case:
`"fill_value": "NaT"`
has the same meaning as
`"fill_value": -9223372036854775808`
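The two accepted forms described above could be checked with something like the following. This is only a sketch, not part of the spec: `normalize_fill_value`, `INT64_MIN`, `INT64_MAX`, and `NAT_SENTINEL` are hypothetical names, and the sketch assumes the metadata has already been decoded with Python's `json` module, which yields `int` only for numbers with no fraction or exponent part.

```python
import json

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NAT_SENTINEL = INT64_MIN  # numpy stores NaT as the minimum int64


def normalize_fill_value(raw):
    """Hypothetical helper: map a decoded JSON fill value to its int64 form."""
    if isinstance(raw, str):
        if raw == "NaT":
            return NAT_SENTINEL
        raise ValueError(f"unsupported string fill value: {raw!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    # json.loads yields float for numbers with a fraction or exponent part.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise TypeError(f"fill value must be an integer or 'NaT', got {raw!r}")
    if not INT64_MIN <= raw <= INT64_MAX:
        raise ValueError(f"fill value {raw} is outside the int64 range")
    return raw


# The degenerate case: both spellings of NaT normalize to the same integer.
as_string = json.loads('{"fill_value": "NaT"}')["fill_value"]
as_number = json.loads('{"fill_value": -9223372036854775808}')["fill_value"]
assert normalize_fill_value(as_string) == normalize_fill_value(as_number)
```

Note that `2^63` itself is rejected here, which matches the closed range `[-2^63, 2^63 - 1]` in the README text.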
The statement "`timedelta64` fill values are represented as one of" is intended to mean "there are two possible forms for the `"fill_value"` metadata". Maybe I should make this clear. I definitely don't want to convey that users can configure a custom fill value.
This PR looks good. Are you ready to have it merged or are you still looking for more feedback from the community?
I think I get it now. Thanks for walking me through the fill value stuff.
data-types/timedelta64/README.md
Outdated
| Y | year |
| M | month |
Just noting that year and month are super problematic as units because they don't actually have a fixed duration (leap years, variable months). I would hate to see us proliferating data with this encoding into the world. But I guess if the goal is numpy compatibility, we should leave them in.
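To make the concern concrete, here is a small standard-library sketch of why "year" and "month" have no fixed length in days. `days_in_year` and `days_in_month` are hypothetical helper names used only for illustration; nothing here is taken from the spec or from numpy.

```python
import calendar
from datetime import date


def days_in_year(year):
    """Number of days in a Gregorian calendar year."""
    return (date(year + 1, 1, 1) - date(year, 1, 1)).days


def days_in_month(year, month):
    """Number of days in a given month of a given year."""
    return calendar.monthrange(year, month)[1]


# A "year" is 365 or 366 days depending on which year it is.
print(days_in_year(2023), days_in_year(2024))

# A "month" ranges from 28 to 31 days, and February depends on the year.
print(days_in_month(2023, 2), days_in_month(2024, 2), days_in_month(2024, 1))
```

A duration stored as "1 month" therefore cannot be turned into an exact number of days or seconds without knowing which calendar month it refers to.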
100% agree that the numpy definition is problematic. But I think there's value in a data type that numpy users (or zarr v2 users) can adopt without thinking. We should specify a less problematic, more generally useful datetime data type in a separate PR.
Maybe it would be useful to rename this data type to `numpy.timedelta64` to signal the intent that it is only meant for compatibility?
`numpy.timedelta64` is actually my preferred name, but iirc @rabernat was not a fan.
since this naming concern affects all the numpy dtypes, we should resolve that conversation in #4.
For general use I'd suggest a more general "unit" mechanism rather than a data type, but this seems reasonable for numpy compatibility. Note that "year" and "month" still seem like very plausibly useful units even though they can't be precisely converted to seconds: for example, you may have a table listing the ages of people in years, or of children/infants in months. The source data may well not contain any more precise information anyway. Technically this issue also exists with every other unit because datetime64 excludes leap seconds.
I think we should keep this open for a few days at a minimum. I'm very open to feedback on certain things (e.g., should it be named |
this data type is now identified as |
This PR adds `timedelta64`, based on the data type with the same name defined in numpy.

Zarr v2 deferred to numpy's data type semantics, which means that Zarr v2 users could transparently create arrays using numpy's `timedelta64` data type. The data type defined in this PR enables the same usage pattern for Zarr v3. This will be valuable for Zarr v2 users who intend to migrate their data to Zarr v3, or numpy users who want a simple way to store their data using Zarr v3 arrays.

Thus, the goal of this PR is not to specify an excellent data type for representing temporal durations. We should evaluate this spec based on how well it captures the semantics already defined by the numpy `timedelta64` data type.

Partially addresses #11.