
Can't pickle numcodecs codecs #744

Open
Tracked by #557
TomNicholas opened this issue Apr 23, 2025 · 8 comments · May be fixed by #745

@TomNicholas
Member

TomNicholas commented Apr 23, 2025

Zarr version

3.0.6

Numcodecs version

0.16.0

Python Version

3.12.2

Operating System

Mac

Installation

conda

Description

The numcodecs codecs aren't pickleable.

This is a problem for me because it stops me from using AWS Lambda functions with the Lithops serverless framework. My understanding is that zarr-python tests its own objects for pickleability, so it's inconsistent that the numcodecs codecs can't be pickled.

The problem seems to be this weird dynamic creation of codec classes inside a factory function. Presumably if numcodecs just defined these classes in a more conventional, static way, they would be pickleable.
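The failure can be reproduced without zarr or numcodecs. The sketch below uses hypothetical names (make_codec_class, StaticCodec, DynamicCodec) and plain classes in place of the real codec base, and it assumes the snippet runs as a script so module-level classes live in __main__. Pickle stores a class as a reference to its module plus its __qualname__, so the module-level class round-trips, while the factory-made class keeps "<locals>" in its qualified name even after __name__ is reassigned, and pickling it fails. The exact exception type differs between Python versions, so the sketch catches both AttributeError and PicklingError.

```python
import pickle


class StaticCodec:
    """Module-level class: pickle can find it again via __module__ + __qualname__."""

    codec_name = "numcodecs.zlib"

    def __init__(self, **codec_config):
        self.codec_config = codec_config


def make_codec_class(codec_name: str, cls_name: str) -> type:
    """Hypothetical factory mirroring the dynamic-class pattern."""

    class _Codec:
        def __init__(self, **codec_config):
            self.codec_config = codec_config

    _Codec.codec_name = codec_name
    _Codec.__name__ = cls_name  # renames the class, but __qualname__ is unchanged
    return _Codec


DynamicCodec = make_codec_class("numcodecs.zlib", "DynamicCodec")

# Round-trips: the class is stored as a reference to __main__.StaticCodec.
restored = pickle.loads(pickle.dumps(StaticCodec(level=1)))
print(type(restored).__name__, restored.codec_config)

# Fails: the qualified name still contains "<locals>", so pickle cannot look it up.
print(DynamicCodec.__qualname__)  # make_codec_class.<locals>._Codec
pickle_error = None
try:
    pickle.dumps(DynamicCodec(level=1))
except (AttributeError, pickle.PicklingError) as exc:  # type varies by Python version
    pickle_error = exc
    print("pickling failed:", exc)
```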

Steps to reproduce

In [1]: from numcodecs.zarr3 import Zlib

In [2]: import zarr

In [3]: arr = zarr.create_array(store={}, shape=(10, 10), dtype='f8', compressors=[Zlib()])
/Users/tom/miniconda3/envs/lithops-coiled/lib/python3.12/site-packages/numcodecs/zarr3.py:133: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
  super().__init__(**codec_config)

In [4]: arr.info
Out[4]: 
Type               : Array
Zarr format        : 3
Data type          : DataType.float64
Shape              : (10, 10)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Store type         : MemoryStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (Zlib(codec_name='numcodecs.zlib', codec_config={}),)
No. bytes          : 800

In [5]: import pickle

In [6]: pickle.dumps(arr)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 pickle.dumps(arr)

AttributeError: Can't pickle local object '_make_bytes_bytes_codec.<locals>._Codec'

Additional output

No response

@d-v-b
Contributor

d-v-b commented Apr 23, 2025

Is it OK if I transfer this issue to numcodecs?

@jakirkham
Member

You probably need to use cloudpickle. Have you already tried it?

Or is this coming from some other Zarr code path?

I can't comment on the Numcodecs Zarr 3 code, as I haven't been involved in it.

@d-v-b
Contributor

d-v-b commented Apr 23, 2025

@jakirkham the cause is this function:

def _make_bytes_bytes_codec(codec_name: str, cls_name: str) -> type[_NumcodecsBytesBytesCodec]:
    # rename for class scope
    _codec_name = CODEC_PREFIX + codec_name

    class _Codec(_NumcodecsBytesBytesCodec):
        codec_name = _codec_name

        def __init__(self, **codec_config: JSON) -> None:
            super().__init__(**codec_config)

    _Codec.__name__ = cls_name
    return _Codec

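It defines a class inside function scope, so the class's __qualname__ is _make_bytes_bytes_codec.<locals>._Codec (reassigning __name__ does not change it). Pickle stores classes by module and qualified name, so it cannot look this class up again. This approach isn't needed to solve the underlying problem: since this function is called fewer than 10 times, we could instead write fewer than 10 separate, ordinary class definitions. That would be my recommended short-term fix.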
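A minimal sketch of that short-term fix, assuming a simplified stand-in for the real base class: _BytesBytesCodecBase below is hypothetical and only stores the config, whereas the real _NumcodecsBytesBytesCodec also handles encoding, decoding and serialization. CODEC_PREFIX is assumed to be "numcodecs.", which matches the codec_name shown in the reproduction above. Each codec becomes an ordinary module-level subclass, so its __qualname__ is just the class name and pickle can resolve it.

```python
import pickle
from typing import Any

CODEC_PREFIX = "numcodecs."  # assumption: the same prefix the factory prepends


class _BytesBytesCodecBase:
    """Hypothetical stand-in for the numcodecs bytes-to-bytes codec base class."""

    codec_name: str

    def __init__(self, **codec_config: Any) -> None:
        self.codec_config = codec_config

    def __eq__(self, other: object) -> bool:
        return type(self) is type(other) and self.codec_config == other.codec_config


# One ordinary module-level class per codec, instead of a factory function.
class Zlib(_BytesBytesCodecBase):
    codec_name = CODEC_PREFIX + "zlib"


class GZip(_BytesBytesCodecBase):
    codec_name = CODEC_PREFIX + "gzip"


codec = Zlib(level=5)
restored = pickle.loads(pickle.dumps(codec))  # works: pickle finds Zlib by name
print(type(restored).__qualname__, restored.codec_name, restored.codec_config)
```

The trade-off is a few repeated lines per codec in exchange for classes that pickle, type checkers and documentation tools can all see statically.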

@d-v-b
Contributor

d-v-b commented Apr 23, 2025

We could also use __init_subclass__ and make the codec name a parameter of the class definition.
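The __init_subclass__ variant could look roughly like this, again with a hypothetical stand-in base class (_BytesBytesCodecBase) and an assumed CODEC_PREFIX of "numcodecs.". The codec name is passed as a keyword in the class statement, and the base class prefixes it as each subclass is created; the resulting classes are still defined at module level, so they pickle normally.

```python
import pickle
from typing import Any, ClassVar

CODEC_PREFIX = "numcodecs."  # assumption: the same prefix the factory prepends


class _BytesBytesCodecBase:
    """Hypothetical stand-in base class; subclasses pass codec_name as a class keyword."""

    codec_name: ClassVar[str]

    def __init_subclass__(cls, *, codec_name: str, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition, e.g. `class Zlib(..., codec_name="zlib")`.
        cls.codec_name = CODEC_PREFIX + codec_name

    def __init__(self, **codec_config: Any) -> None:
        self.codec_config = codec_config


class Zlib(_BytesBytesCodecBase, codec_name="zlib"):
    pass


class Blosc(_BytesBytesCodecBase, codec_name="blosc"):
    pass


restored = pickle.loads(pickle.dumps(Zlib(level=1)))
print(restored.codec_name, restored.codec_config)
```

Because codec_name is a required keyword here, any further subclass (for example of Zlib) must also pass it; giving it a default of None and skipping the assignment in that case would let subclasses inherit the parent's name instead.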

@jakirkham
Member

jakirkham commented Apr 23, 2025

Yes, I understand; I read the same code. This is the kind of thing cloudpickle handles routinely.

That said, I don't know why we went with that design choice. That's what I meant when I said I can't comment on why it was added that way.

@d-v-b d-v-b transferred this issue from zarr-developers/zarr-python Apr 23, 2025
@d-v-b
Contributor

d-v-b commented Apr 23, 2025

@TomNicholas I transferred this to numcodecs, because it requires changes to numcodecs code.

@TomNicholas
Member Author

As this function gets called fewer than 10 times, we could instead have fewer than 10 separate normal class definitions. That would be my recommended short-term fix.
...
we could also use __init_subclass__ and make the codec name a parameter of the class definition.

I'm happy to submit a PR to do this. I had to make a janky branch to get what I'm working on to work anyway, so I might as well turn it into a proper PR.

@d-v-b
Contributor

d-v-b commented Apr 23, 2025

That would be great, @TomNicholas. Feel free to reach out if you want any pointers.

3 participants