
TypeError when passing old numcodecs to zarr v3 #2964

Open
TomNicholas opened this issue Apr 7, 2025 · 9 comments
Labels
bug Potential issues with the zarr-python library

Comments

@TomNicholas
Member

TomNicholas commented Apr 7, 2025

Zarr version

v3.0.6

Numcodecs version

v0.16.0

Python Version

3.13

Operating System

mac

Installation

uv

Description

Passing an old-style numcodecs codec (e.g. numcodecs.Blosc) to zarr v3 raises a TypeError, even though this case could be detected and the codec upcast to its zarr-v3-compatible equivalent instead.

This has been reported by many xarray users (pydata/xarray#10032), as well as here: #2710 (comment).

Traceback (most recent call last):
  File "/Users/tom/Documents/Work/Code/experimentation/bugs/blosc/pure_zarr_mve.py", line 25, in <module>
    za = zarr.create_array(
        store,
    ...<4 lines>...
        compressors=compressors,
    )
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/api/synchronous.py", line 879, in create_array
    sync(
    ~~~~^
        zarr.core.array.create_array(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<19 lines>...
        )
        ^
    )
    ^
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/core/sync.py", line 163, in sync
    raise return_result
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/core/array.py", line 4146, in create_array
    result = await init_array(
             ^^^^^^^^^^^^^^^^^
    ...<16 lines>...
    )
    ^
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/core/array.py", line 3961, in init_array
    array_array, array_bytes, bytes_bytes = _parse_chunk_encoding_v3(
                                            ~~~~~~~~~~~~~~~~~~~~~~~~^
        compressors=compressors,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        dtype=dtype_parsed,
        ^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/core/array.py", line 4330, in _parse_chunk_encoding_v3
    out_bytes_bytes = tuple(_parse_bytes_bytes_codec(c) for c in maybe_bytes_bytes)
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/core/array.py", line 4330, in <genexpr>
    out_bytes_bytes = tuple(_parse_bytes_bytes_codec(c) for c in maybe_bytes_bytes)
                            ~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/Users/tom/.cache/uv/environments-v2/pure-zarr-mve-2145b34a8fc90dca/lib/python3.13/site-packages/zarr/registry.py", line 184, in _parse_bytes_bytes_codec
    raise TypeError(f"Expected a BytesBytesCodec. Got {type(data)} instead.")
TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead.

Steps to reproduce

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "numpy",
#     "zarr>=3",
# ]
# ///
import numpy as np
import zarr
import numcodecs

print(zarr.__version__)
print(numcodecs.__version__)

store = "/tmp/foo.zarr"
shape = (1024 * 1024 * 1024,)
chunks = (1024 * 1024 * 16,)
dtype = np.float64
fill_value = np.nan

# cname = "blosclz"
cname = "lz4"
compressors = [numcodecs.Blosc(cname=cname)]

za = zarr.create_array(
    store,
    shape=shape,
    chunks=chunks,
    dtype=dtype,
    fill_value=fill_value,
    compressors=compressors,
)

Additional output

No response

@TomNicholas TomNicholas added the bug Potential issues with the zarr-python library label Apr 7, 2025
@jhamman
Member

jhamman commented Apr 7, 2025

@normanrz - you know this part of the code best. Do you think it's reasonable for us to cast vanilla numcodecs codecs to zarr3 codecs? Seems like we have everything we need to make the right decisions here.

@TomNicholas
Member Author

Note that if I change the compressors line to the following, it works:

compressors = [zarr.codecs.BloscCodec(cname="zstd", clevel=3, shuffle="shuffle")]
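For anyone with an existing, fine-tuned numcodecs.Blosc, its settings can be carried over to the v3 codec by hand. A minimal sketch, assuming the input is the dict returned by `numcodecs.Blosc(...).get_config()` and that numcodecs' integer shuffle constants are 0 = no shuffle, 1 = byte shuffle, 2 = bit shuffle and -1 = auto; the helper name `blosc_kwargs_from_numcodecs` is hypothetical:

```python
# Integer shuffle constants used by numcodecs.Blosc, mapped to the string
# names accepted by zarr.codecs.BloscCodec's `shuffle` argument.
_SHUFFLE_NAMES = {0: "noshuffle", 1: "shuffle", 2: "bitshuffle"}
_AUTOSHUFFLE = -1


def blosc_kwargs_from_numcodecs(config: dict, itemsize: int) -> dict:
    """Hypothetical helper: turn a numcodecs Blosc config into BloscCodec kwargs.

    `config` is the dict returned by `numcodecs.Blosc(...).get_config()`;
    `itemsize` is the array dtype's size in bytes (8 for float64).
    """
    if config.get("id") != "blosc":
        raise TypeError(f"Expected a blosc config, got id={config.get('id')!r}")
    shuffle = config["shuffle"]
    if shuffle == _AUTOSHUFFLE:
        # Auto-shuffle: bit-shuffle for 1-byte items, byte-shuffle otherwise.
        shuffle = 2 if itemsize == 1 else 1
    return {
        "cname": config["cname"],
        "clevel": config["clevel"],
        "shuffle": _SHUFFLE_NAMES[shuffle],
        "blocksize": config["blocksize"],
        "typesize": itemsize,
    }


# Example config in the shape returned by numcodecs.Blosc(...).get_config().
cfg = {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1, "blocksize": 0}
print(blosc_kwargs_from_numcodecs(cfg, itemsize=8))
# {'cname': 'lz4', 'clevel': 5, 'shuffle': 'shuffle', 'blocksize': 0, 'typesize': 8}
```

Under those assumptions, `zarr.codecs.BloscCodec(**blosc_kwargs_from_numcodecs(codec.get_config(), itemsize))` could stand in for the old `numcodecs.Blosc` in `compressors=[...]`. The working line above confirms `cname`, `clevel` and `shuffle` as string keywords; passing `typesize` and `blocksize` the same way is an assumption worth checking against the zarr-python API docs.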

@normanrz
Member

normanrz commented Apr 7, 2025

@normanrz - you know this part of the code best. Do you think it's reasonable for us to cast vanilla numcodecs codecs to zarr3 codecs? Seems like we have everything we need to make the right decisions here.

Yes, upcasting is certainly possible. Whether to do that here in zarr or in numcodecs invokes the usual cyclic dependency issue. My gut feeling would be that a to_zarr3 function in numcodecs.zarr3 would be better placed, though.
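@normanrz's suggestion of a `to_zarr3` function in `numcodecs.zarr3` amounts to a lookup from a numcodecs codec id to its zarr v3 counterpart. A minimal sketch of that dispatch, assuming it operates on the plain config dicts returned by a numcodecs codec's `get_config()` and emits zarr v3 codec metadata of the form `{"name": ..., "configuration": ...}`. `to_zarr3`, `register` and `_UPCASTERS` here are hypothetical illustrations, not the actual numcodecs API; only gzip and zstd are covered, and the zstd layout (`level`, `checksum`) is assumed to mirror zarr-python's `ZstdCodec`:

```python
from typing import Callable

# Hypothetical registry: numcodecs codec id -> function building zarr v3 metadata.
_UPCASTERS: dict[str, Callable[[dict], dict]] = {}


def register(codec_id: str):
    """Register a converter for one numcodecs codec id (hypothetical API)."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        _UPCASTERS[codec_id] = fn
        return fn
    return decorator


@register("gzip")
def _gzip(config: dict) -> dict:
    return {"name": "gzip", "configuration": {"level": config["level"]}}


@register("zstd")
def _zstd(config: dict) -> dict:
    # Older numcodecs versions may not record a checksum flag; assume False.
    return {
        "name": "zstd",
        "configuration": {
            "level": config["level"],
            "checksum": config.get("checksum", False),
        },
    }


def to_zarr3(config: dict) -> dict:
    """Upcast a numcodecs config dict to zarr v3 codec metadata, or fail loudly."""
    codec_id = config.get("id")
    try:
        upcast = _UPCASTERS[codec_id]
    except KeyError:
        raise TypeError(
            f"No zarr v3 equivalent registered for numcodecs codec {codec_id!r}"
        ) from None
    return upcast(config)


print(to_zarr3({"id": "gzip", "level": 5}))
# {'name': 'gzip', 'configuration': {'level': 5}}
```

Raising a `TypeError` for unregistered ids keeps the current failure mode for codecs with no v3 equivalent, while a per-id registry lets each codec's parameter mapping (such as Blosc's integer shuffle constants) live next to that codec rather than in zarr itself, which sidesteps the cyclic-dependency concern.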

@fowlerovski

Just adding weight to this ticket (v3.0.6):
Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead.. Skipping.

Looking forward to the unified/aligned implementation, thanks dev team!

@TomNicholas
Member Author

We really need this to work: it currently prevents people from using this pattern to move their zarr v2 data to zarr v3 via xarray:

import xarray as xr

ds = xr.open_zarr('store-v2.zarr')
ds.to_zarr('store-v3.zarr')

@darothen

Bumping for priority: this is also affecting the upgrade of a production workflow that I'd like to migrate to Zarr v3 quickly.

@dcherian
Copy link
Contributor

ds.drop_encoding().to_zarr("store-v3.zarr") should work, as long as you're OK with the default compression.

@darothen

Confirmed: the defaults work fine when writing a new in-memory Dataset to Zarr with the latest releases of zarr-python and numcodecs.

Still looking for clarity on defining custom encodings/compressors. The workflow I'm migrating to Zarr v3 previously had some fine-tuning done to create a compression scheme that balanced output size and runtime. Setting this up the original way (e.g. instantiating a numcodecs.Blosc as in @TomNicholas's original post) still produces the error message in this comment.

@brokkoli71
Member

Yes, upcasting is certainly possible. Whether to do that here in zarr or in numcodecs invokes the usual cyclic dependency issue. My gut feeling would be that a to_zarr3 function in numcodecs.zarr3 would be better placed, though.

I made a draft for this in zarr-developers/numcodecs#741. Feel free to leave a comment.

Projects
None yet
Development

No branches or pull requests

7 participants