Decoder for MultiIndexes fails if there are other variables, using a dimension which is part of the multiindex #461

@okz

First, thank you so much. Compression by gathering is an incredibly useful addition, which will hopefully end up in xarray one day to support ragged (or sparse) arrays in netCDF files.

#321 added support for encoding and decoding pandas MultiIndexes using "compression by gathering". However, if the dataset contains another variable that uses a dimension which is part of the MultiIndex, decoding fails.
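For readers unfamiliar with the technique, here is a minimal pure-Python sketch of what "compression by gathering" does conceptually: each (lat, lon) pair is stored as a single flat, row-major index into the full lat x lon grid, and decoding unravels that index again. The helper names `gather` and `scatter` are hypothetical and only for illustration; this is not cf_xarray's implementation.

```python
from itertools import product


def gather(points, lat, lon):
    """Encode (lat, lon) pairs as flat row-major indices into the lat x lon grid."""
    return [lat.index(la) * len(lon) + lon.index(lo) for la, lo in points]


def scatter(flat, lat, lon):
    """Decode flat row-major indices back into (lat, lon) pairs."""
    pairs = []
    for i in flat:
        row, col = divmod(i, len(lon))  # row indexes lat, col indexes lon
        pairs.append((lat[row], lon[col]))
    return pairs


lat = ["a", "b"]
lon = [1, 2]

# Same level values as the MultiIndex in the example from this issue.
points = list(product(lat, lon))
flat = gather(points, lat, lon)

# Decoding the flat indices recovers the original pairs.
assert scatter(flat, lat, lon) == points
```

The decoded pairs are what become the levels of the rebuilt MultiIndex along "landpoint".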

A minimal example is a single-line addition of var_with_lat to the example from the Encoding and decoding tutorial:

import cf_xarray as cfxr
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# Uncommenting the next line makes decoding fail.
# ds["var_with_lat"] = xr.DataArray([1, 2], dims="lat")

encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

Once var_with_lat is added, decoding fails:

---> 129 decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

File ~/devenv3/lib/python3.11/site-packages/cf_xarray/coding.py:116, in decode_compress_to_multi_index(encoded, idxnames)
    110     from xarray.indexes import PandasMultiIndex
    112     variables = {
    113         dim: encoded[dim].isel({dim: xr.Variable(data=index, dims=idxname)})
    114         for dim, index in zip(names, indices)
    115     }
--> 116     decoded = decoded.assign_coords(variables).set_xindex(
    117         names, PandasMultiIndex
    118     )
    119 except ImportError:
    120     arrays = [encoded[dim].data[index] for dim, index in zip(names, indices)]

File ~/devenv3/lib/python3.11/site-packages/xarray/core/dataset.py:4330, in Dataset.set_xindex(self, coord_names, index_cls, **options)
   4327 indexed_coords = set(coord_names) & set(self._indexes)
   4329 if indexed_coords:
-> 4330     raise ValueError(
   4331         f"those coordinates already have an index: {indexed_coords}"
   4332     )
   4334 coord_vars = {name: self._variables[name] for name in coord_names}
   4336 index = index_cls.from_variables(coord_vars, options=options)

ValueError: those coordinates already have an index: {'lat'}
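
The guard that raises here can be modelled in a few lines of plain Python. This is a toy sketch of the set intersection visible in the traceback, not xarray's actual code; the function name `check_not_indexed` is hypothetical, and it assumes (as the error message suggests) that "lat" still carries an index when `set_xindex` is called.

```python
def check_not_indexed(coord_names, existing_indexes):
    """Raise if any requested coordinate already has an index (toy model)."""
    indexed_coords = set(coord_names) & set(existing_indexes)
    if indexed_coords:
        raise ValueError(
            f"those coordinates already have an index: {indexed_coords}"
        )


# No overlap: the MultiIndex could be built from "lat" and "lon".
check_not_indexed(["lat", "lon"], existing_indexes=[])

# Assumed situation once var_with_lat is present: "lat" is already indexed.
try:
    check_not_indexed(["lat", "lon"], existing_indexes=["lat"])
except ValueError as err:
    message = str(err)
```

Under that assumption, a fix would need to drop or replace the existing index on "lat" before building the MultiIndex.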
