datetime64 and timedelta64 #11

Closed
d-v-b opened this issue May 5, 2025 · 7 comments

@d-v-b
Contributor

d-v-b commented May 5, 2025

Zarr-python 2.x supported numpy's datetime64 and timedelta64 dtypes, which are described in the numpy documentation. Both of these data types are parametrized by a step size (a positive integer) and a unit (one of the temporal units listed here).

We should add a description of these data types to zarr v3. I imagine the JSON metadata for these dtypes would look like this:

"name": "datetime64" | "timedelta64"
"configuration": {
    "unit": "h" | "m" | "s" ...
    "interval": <int>
}
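
As a rough illustration of the metadata shape proposed above, here is a minimal sketch that derives it from a numpy dtype using `np.datetime_data`. The helper names `dtype_to_metadata` and `metadata_to_dtype` are hypothetical, and the `"interval"` key simply follows the proposal in this comment; none of this is a finalized spec.

```python
import numpy as np

# numpy's dtype.kind codes for the two temporal dtypes
_KIND_TO_NAME = {"M": "datetime64", "m": "timedelta64"}


def dtype_to_metadata(dtype):
    """Hypothetical helper: map a numpy temporal dtype to the proposed JSON shape."""
    dtype = np.dtype(dtype)
    if dtype.kind not in _KIND_TO_NAME:
        raise ValueError(f"not a datetime64/timedelta64 dtype: {dtype}")
    # np.datetime_data returns (unit, count), e.g. ('s', 10) for 'datetime64[10s]'
    unit, count = np.datetime_data(dtype)
    return {
        "name": _KIND_TO_NAME[dtype.kind],
        "configuration": {"unit": unit, "interval": int(count)},
    }


def metadata_to_dtype(meta):
    """Hypothetical inverse: rebuild the numpy dtype from the proposed JSON shape."""
    cfg = meta["configuration"]
    return np.dtype(f"{meta['name']}[{cfg['interval']}{cfg['unit']}]")


print(dtype_to_metadata("datetime64[10s]"))
```

The sketch does not handle numpy's unit-less ("generic") datetime64 / timedelta64 dtypes; how those should be treated is a separate question.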

I think it would make sense to use a standard string datetime format for encoding fill values. There is a special value, called "not a time" or "NaT", which we could represent with a string literal "NaT".
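
To make the fill value idea concrete, here is a small sketch, assuming ISO 8601 strings for datetime64 values and the literal "NaT" for the special value. The helpers `encode_fill_value` and `decode_fill_value` are hypothetical names, and the sketch only covers datetime64, not timedelta64.

```python
import numpy as np


def encode_fill_value(value):
    """Hypothetical helper: render a datetime64 scalar as a JSON-friendly string."""
    if np.isnat(value):
        return "NaT"
    # str() on a datetime64 scalar gives an ISO 8601 style string
    return str(value)


def decode_fill_value(text, unit):
    """Hypothetical inverse: parse the string back into a datetime64 scalar."""
    # np.datetime64 accepts both ISO 8601 strings and the literal "NaT"
    return np.datetime64(text, unit)


original = np.datetime64("2025-05-05T12:30:00", "s")
encoded = encode_fill_value(original)
print(encoded, decode_fill_value(encoded, "s") == original)
print(encode_fill_value(np.datetime64("NaT", "s")))
```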

@normanrz
Member

normanrz commented May 5, 2025

I think it would be great to have date/time dtypes in v3.

> There is a special value, called "not a time" or "NaT", which we could represent with a string literal "NaT".

How would NaT be encoded in the binary chunk?

d-v-b changed the title from "datetime64 and timedeta64" to "datetime64 and timedelta64" May 5, 2025
@d-v-b
Contributor Author

d-v-b commented May 5, 2025

> I think it would be great to have date/time dtypes in v3.

We should be careful to emphasize that these data types are intended for applications where compatibility with zarr v2 or numpy is a high priority. For users who don't need zarr v2 / numpy compatibility, we could probably devise much better ways to represent dates / times than datetime64 / timedelta64. For example, we could define a metadata scheme for associating any data type with a unit (e.g., a time unit). This idea was suggested by @jbms.

> How would NaT be encoded in the binary chunk?

numpy uses a signed 64 bit integer for datetime64 and timedelta64; -2^63 is reserved for the NaT value. I think we could copy that behavior.
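
A quick way to check this with numpy is sketched below: it reinterprets datetime64 values as int64 and round-trips them through raw bytes. The little-endian byte order used for the "chunk" here is an assumption for illustration, not something decided in this thread.

```python
import numpy as np

NAT_INT = np.iinfo(np.int64).min  # -2**63, the value numpy reserves for NaT

stamps = np.array(["2025-05-05T00:00:00", "NaT"], dtype="datetime64[s]")

# Reinterpret the same 8-byte elements as signed 64-bit integers
raw = stamps.view(np.int64)
print(raw[1] == NAT_INT)

# Assumed chunk layout: little-endian int64, 8 bytes per element
chunk_bytes = raw.astype("<i8").tobytes()

# Decoding reverses the steps; NaT survives the round trip
decoded = np.frombuffer(chunk_bytes, dtype="<i8").view("<M8[s]")
print(np.isnat(decoded))

# timedelta64 uses the same sentinel
td_raw = np.array([np.timedelta64("NaT", "s")]).view(np.int64)
print(td_raw[0] == NAT_INT)
```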

@jbms
Contributor

jbms commented May 5, 2025

What does "interval" mean? Where is the corresponding numpy documentation for interval?

@d-v-b
Contributor Author

d-v-b commented May 5, 2025

In my example, "interval" meant "the smallest non-zero duration that scalars in a given timedelta / datetime data type can represent". E.g.:

>>> import numpy as np
>>> np.ones((1,), dtype='datetime64[10s]')
array(['1970-01-01T00:00:10'], dtype='datetime64[10s]')

Numpy does not provide extensive documentation for the interval attribute. It's informally called the "step size" or "count" in the documentation for np.datetime_data.

"interval" might not be the right word; we could also consider something like "step_size" or "duration" or "scale_factor".
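
To illustrate what the step size does to the stored integers, here is a short sketch: each stored int64 counts steps of 10 seconds, so converting to plain seconds multiplies by the count reported by `np.datetime_data`. The variable names are just for illustration.

```python
import numpy as np

# Stored integers: 1, 2 and 3 steps since the epoch
ticks = np.array([1, 2, 3], dtype=np.int64)

# Interpret each integer as a count of 10-second steps
stamps = ticks.view("datetime64[10s]")

# np.datetime_data reports the base unit and the step count
unit, count = np.datetime_data(stamps.dtype)
print(unit, count)

# Casting to plain seconds scales the stored integers by the step count
seconds = stamps.astype("datetime64[s]").view(np.int64)
print(seconds)
```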

@jbms
Contributor

jbms commented May 5, 2025

Oh, I didn't realize those numpy types supported an arbitrary scale factor. Effectively that is part of the unit --- you could call it unit_multiplier or something.

This was referenced May 6, 2025
@normanrz
Member

With #12 and #14 merged, I guess we can close this?

@d-v-b
Contributor Author

d-v-b commented May 16, 2025

sounds good to me!


3 participants