
String arrays written with zarr python 3 zarr spec 2 cannot be read in zarr python 2 #3132

Closed
@msschwartz21

Zarr version

v2.18.7, v3.0.8

Numcodecs version

v0.15.1

Python Version

3.12.10

Operating System

Mac

Installation

Pixi from conda

Description

Current behavior:

  • Write an array of strings using zarr-python 3 with zarr format 2.
  • Reading the same array with zarr-python 2 fails with the following error:

Traceback (most recent call last):
  File "/Users/schwartzm10/Code/geff/sandbox/zarr_string.py", line 33, in <module>
    read(path)
  File "/Users/schwartzm10/Code/geff/sandbox/zarr_string.py", line 20, in read
    ids = z['strings'][:]
          ~~~~~~~~~~~~^^^
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 799, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 925, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 967, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 1342, in _get_selection
    self._chunk_getitems(
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 2187, in _chunk_getitems
    self._process_chunk(
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 2100, in _process_chunk
    chunk = self._decode_chunk(cdata)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/zarr/core.py", line 2370, in _decode_chunk
    chunk = chunk.view(self._dtype)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schwartzm10/Code/geff/.pixi/envs/zarr2/lib/python3.12/site-packages/numpy/_core/_internal.py", line 565, in _view_is_safe
    raise TypeError("Cannot change data-type for array of references.")
TypeError: Cannot change data-type for array of references.
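
The last two frames show the failure coming from `chunk.view(self._dtype)` inside NumPy's `_view_is_safe`. A minimal sketch of that error class in plain NumPy follows. It assumes the decoded chunk arrives as an object array of Python strings while the declared dtype is fixed-width; `decoded_chunk`, `declared_dtype`, and `try_view` are hypothetical names for illustration and are not taken from zarr, and `<U5` is only a stand-in for whatever dtype the array metadata actually declares.

```python
import numpy as np

# Assumption: the chunk reaches `.view(...)` as an object array of Python
# strings, while the array metadata declares a fixed-width dtype.
decoded_chunk = np.array(["1_0", "1_1", "1_2"], dtype=object)
declared_dtype = np.dtype("<U5")  # hypothetical stand-in for self._dtype


def try_view(chunk, dtype):
    """Return the error message if NumPy refuses the view, else None."""
    try:
        chunk.view(dtype)
    except TypeError as exc:
        return str(exc)
    return None


# Reinterpreting an object array's memory as another dtype is refused.
error = try_view(decoded_chunk, declared_dtype)
print(error)

# Converting (copying) the values instead of reinterpreting them succeeds.
converted = decoded_chunk.astype(declared_dtype)
print(converted.dtype, converted.tolist())
```

If this reading is right, the reader and the stored metadata disagree about whether the chunk holds object references or fixed-width text. The sketch only illustrates the NumPy behavior and does not confirm the root cause in zarr.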

Steps to reproduce

write.py

# /// script
# requires-python = ">=3.12,<3.13"
# dependencies = [
#   "zarr>3,<4",
# ]
# ///
import zarr
import numpy as np


if __name__ == "__main__":
    ids = np.array([f"1_{i}" for i in range(1000)])

    print(zarr.__version__)

    z = zarr.open('test.zarr', mode='a', zarr_format=2)
    z['strings'] = ids

read.py

# /// script
# requires-python = ">=3.12,<3.13"
# dependencies = [
#   "zarr>=2,<3",
# ]
# ///
import zarr

if __name__ == "__main__":
    z = zarr.open('test.zarr')
    ids = z['strings'][:]
    print(ids)
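
To narrow down where the two versions disagree, it may help to inspect the metadata that write.py produced without going through either zarr version. In zarr format 2, each array's metadata is stored as JSON under the key `.zarray`. The sketch below assumes `test.zarr` is a plain directory store on local disk; `describe_v2_array` is a hypothetical helper, not part of zarr.

```python
import json
from pathlib import Path


def describe_v2_array(array_dir):
    """Read a zarr format 2 `.zarray` file and return the fields relevant here."""
    meta = json.loads((Path(array_dir) / ".zarray").read_text())
    filters = meta.get("filters") or []  # "filters" may be null in the JSON
    compressor = meta.get("compressor") or {}  # "compressor" may be null too
    return {
        "dtype": meta.get("dtype"),
        "filter_ids": [f.get("id") for f in filters],
        "compressor_id": compressor.get("id"),
    }


if __name__ == "__main__":
    # Assumption: write.py was run in the current working directory.
    array_dir = Path("test.zarr") / "strings"
    if (array_dir / ".zarray").exists():
        print(describe_v2_array(array_dir))
    else:
        print(f"no .zarray found under {array_dir}")
```

Comparing the reported `dtype` and filter ids against what zarr-python 2 writes for the same input would show whether the stored metadata, rather than the chunk data, is what the older reader cannot handle.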

Additional output

No response

Labels: bug (Potential issues with the zarr-python library)
