move("/", "/dataset") makes data unreachable via list_dir from root #2089

@TomNicholas

Description

After moving the root group to a subpath and committing, list_dir("/") returns nothing even though the data is still present in storage (visible via list()). Opening the store from the root with zarr or xarray would therefore show an empty store.

Reproducer

import asyncio
import tempfile
import icechunk as ic
import xarray as xr

async def main():
    air_temp = xr.tutorial.open_dataset("air_temperature")

    storage = ic.local_filesystem_storage(tempfile.TemporaryDirectory().name)
    repo = ic.Repository.create(storage)

    session = repo.writable_session("main")
    air_temp.to_zarr(session.store)
    session.commit("wrote tutorial dataset to root group")

    # Before the move — list_dir("/") works as expected
    session = repo.readonly_session(branch="main")
    print('--- before move, list_dir("/") ---')
    async for r in session.store.list_dir("/"):
        print(r)

    # Move the root group to /dataset
    session = repo.rearrange_session("main")
    session.move("/", "/dataset")
    session.commit("moved root to /dataset")

    session = repo.readonly_session(branch="main")

    print('--- after move, list_dir("/") ---')
    async for r in session.store.list_dir("/"):
        print(r)

    print('--- after move, list_dir("/dataset") ---')
    async for r in session.store.list_dir("/dataset"):
        print(r)

    print('--- after move, list_dir("dataset") ---')
    async for r in session.store.list_dir("dataset"):
        print(r)

    print("--- after move, list() ---")
    async for r in session.store.list():
        print(r)

asyncio.run(main())

Output

--- before move, list_dir("/") ---
zarr.json
air
lat
lon
time
--- after move, list_dir("/") ---
--- after move, list_dir("/dataset") ---
--- after move, list_dir("dataset") ---
zarr.json
air
lat
lon
time
--- after move, list() ---
dataset/zarr.json
dataset/air/zarr.json
dataset/lat/zarr.json
dataset/lon/zarr.json
dataset/time/zarr.json
dataset/air/c/0/0/0
dataset/air/c/0/0/1
dataset/air/c/0/1/0
dataset/air/c/0/1/1
dataset/air/c/1/0/0
dataset/air/c/1/0/1
dataset/air/c/1/1/0
dataset/air/c/1/1/1
dataset/air/c/2/0/0
dataset/air/c/2/0/1
dataset/air/c/2/1/0
dataset/air/c/2/1/1
dataset/air/c/3/0/0
dataset/air/c/3/0/1
dataset/air/c/3/1/0
dataset/air/c/3/1/1
dataset/lat/c/0
dataset/lon/c/0
dataset/time/c/0

Expected

list_dir("/") returns dataset as a child. list_dir("/dataset") and list_dir("dataset") behave identically.

Actual

list_dir("/") and list_dir("/dataset") return nothing. Only list_dir("dataset") (no leading slash) returns the group contents.
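To turn the comparison above into a regression check, the async listings need to be gathered into plain lists first. Below is a minimal sketch of such a helper; `collect`, `fake_list_dir` and `check` are hypothetical names for illustration and not part of icechunk. `fake_list_dir` only stands in for `session.store.list_dir(...)` so the snippet runs without a repository.

```python
import asyncio


async def collect(aiter):
    """Drain an async iterator into a sorted list so two listings can be compared."""
    return sorted([item async for item in aiter])


async def fake_list_dir(entries):
    # Hypothetical stand-in for session.store.list_dir(...): yields the given entries.
    for entry in entries:
        yield entry


async def check():
    expected = ["air", "lat", "lon", "time", "zarr.json"]
    # In a real test these would be list_dir("/dataset") and list_dir("dataset").
    with_slash = await collect(fake_list_dir(["zarr.json", "air", "lat", "lon", "time"]))
    without_slash = await collect(fake_list_dir(["time", "lon", "lat", "air", "zarr.json"]))
    return with_slash == without_slash == expected


result = asyncio.run(check())
```

Sorting before comparing keeps the check independent of the order in which the store yields entries.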

Notes

Possibly two bugs:

  1. move("/", ...) leaves no root group, so list_dir("/") finds no children even though storage keys exist under dataset/.
  2. Leading-slash path normalization in list_dir: "/dataset" and "dataset" should be equivalent.
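The expected behaviour can be sketched in plain Python over a flat key listing like the one printed by list(). This is only an illustration of the semantics asked for above, not how icechunk implements list_dir; `normalize_prefix` and this `list_dir` are hypothetical helpers, and the keys are a subset of the output shown earlier.

```python
def normalize_prefix(prefix: str) -> str:
    """Strip leading and trailing slashes so "/dataset" and "dataset" compare equal."""
    return prefix.strip("/")


def list_dir(keys, prefix):
    """Return the immediate children of `prefix`, derived from a flat list of keys."""
    norm = normalize_prefix(prefix)
    head = norm + "/" if norm else ""  # the root has an empty head
    children = []
    for key in keys:
        if not key.startswith(head):
            continue
        # First path segment after the prefix is the direct child.
        child = key[len(head):].split("/", 1)[0]
        if child and child not in children:
            children.append(child)
    return children


keys = [
    "dataset/zarr.json",
    "dataset/air/zarr.json",
    "dataset/air/c/0/0/0",
    "dataset/lat/zarr.json",
    "dataset/lat/c/0",
]

root_children = list_dir(keys, "/")
with_slash = list_dir(keys, "/dataset")
without_slash = list_dir(keys, "dataset")
```

Under these assumptions the root listing contains only `dataset`, and both spellings of the subpath give the same children, which is the behaviour described under Expected.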

Labels

bug 🐛 Something isn't working
