
Kvikio backend entrypoint with Zarr v3 #70

Draft
wants to merge 37 commits into base: main
Conversation

weiji14 (Member) commented on Mar 11, 2025:

Branched off the work at #10, this PR registers a kvikio backend that allows reading data from Zarr (v3) stores directly into GPU memory using NVIDIA GPU Direct Storage (GDS) without going through host (CPU) memory.

Preview demo at https://cupy-xarray--70.org.readthedocs.build/70/source/kvikio.html

Requires some patches from un-released versions:

Also requires:

TODO:

  • Wait for stable release of kvikio>=25.04 and xarray>=2025.3.0
  • Figure out where to insert zarr.config.enable_gpu() into the backend to read into cupy.ndarray by default?
  • Write more unit tests
  • Clean up API and tutorial documentation

Limitations:

References:

dcherian and others added 30 commits August 2, 2022 15:17
* main:
  Min xarray >= 0.19.0 (#25)
  Fix broken dask_array_type import (#24)
* upstream/main:
  Documentation Updates  📖 (#35)
  [pre-commit.ci] pre-commit autoupdate (#37)
  [pre-commit.ci] pre-commit autoupdate (#34)
  [pre-commit.ci] pre-commit autoupdate (#32)
  Update .pre-commit-config.yaml (#28)
  Expand installation doc (#27)
Allow it to be rendered under the User Guide section.
Will need it for the kvikio.zarr docs later.
Create new section in the API documentation page for the kvikIO engine. Added more docstrings to the kvikio.py file, and fixed some imports so things render nicely on the API page. Also added an intersphinx link to the kvikio docs at https://docs.rapids.ai/api/kvikio/stable.
Fixes error like `TypeError: ZarrArrayWrapper.__init__() takes 2 positional arguments but 3 were given`.
Fix `TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly` on https://github.com/pydata/xarray/blob/v2024.11.0/xarray/core/indexing.py#L578
Can directly rely on upstream xarray's ZarrStore.open_store_variable method since Zarr v3 compatibility was added in pydata/xarray#9552.
UserWarning: The `compressor` argument is deprecated. Use `compressors` instead.
weiji14 added 4 commits March 11, 2025 13:21
Only difference is to pass the Zarr store's root filepath to kvikio.zarr.GDSStore instead of xarray.backends.zarr.ZarrStore.
Dev version containing patch at pydata/xarray#10078 that fixes `TypeError: NumpyIndexingAdapter only wraps np.ndarray. Trying to wrap <class 'cupy.ndarray'>`.
Needed to return cupy.ndarray instead of numpy.ndarray objects. Should find a better place to put this `zarr.config.enable_gpu()` call later.
Rearranged some cells so the air-temperature.zarr store is created first. Added a couple of `zarr.config.enable_gpu()` statements in to ensure arrays are read to cupy.ndarray. Removed the flox section at the end.
@weiji14 weiji14 added the enhancement New feature or request label Mar 11, 2025
@weiji14 weiji14 added this to the 0.2.0 milestone Mar 11, 2025
@weiji14 weiji14 self-assigned this Mar 11, 2025

@pytest.mark.parametrize("indexer", [slice(None), slice(2, 4), 2, [2, 3, 5]])
def test_lazy_indexing(indexer, store):
    with zarr.config.enable_gpu(), xr.open_dataset(store, engine="kvikio") as ds:
weiji14 (Member, Author) commented:
Ideally, xr.open_dataset(store, engine="kvikio") should have zarr.config.enable_gpu() set already, so GPU-backed cupy arrays are returned when a user accesses the arrays. Unless we expect there to be users who want to use the kvikio engine while returning CPU-backed numpy arrays. Default should probably be cupy though?

TomAugspurger replied:

I'm not sure... The (v3) kvikio.zarr.GDSStore is written to accept either. I'm tempted to leave this up to the user (so make them configure Zarr correctly) rather than assuming they want to use GPU memory, but maybe engine="kvikio" is sufficient indication that they want stuff on the GPU.

weiji14 (Member, Author) replied:

Yeah, kvikio.zarr.GDSStore can return numpy (CPU) arrays too, and I guess there might be a case where someone wants nvCOMP decompression on the GPU (once zarr-developers/zarr-python#2863 is implemented), but have a numpy array returned in CPU memory? But I do feel that engine="kvikio" should default to putting things on GPU/cupy, while having a way to toggle it off as needed.

Looking at xr.open_dataset, I'm wondering if it might make sense to pass something into the backend_kwargs parameter to indicate the user's preference for GPU or CPU outputs. Or are the from_array_kwargs or chunked_array_type parameters (experimental API) actually the intended mechanism for this?

Reply:

I am inclined to agree that kvikio.zarr.GDSStore should be able to leave the GPU zarr config disabled even when someone wants data on the GPU, since they could use a different BufferPrototype that returns torch tensors instead of cupy arrays, for example.

But for ease of use, it might be better to default to zarr.config.enable_gpu() and that can be overridden by passing in a different context manager? So xr.open_dataset(store, engine="kvikio") will default to using cupy arrays, but xr.open_dataset(store, engine="kvikio", zarr_context=contextlib.nullcontext()) would return numpy arrays and xr.open_dataset(store, engine="kvikio", zarr_context=MyCustomTorchContext()) would return torch tensors?
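The zarr_context idea above could be prototyped roughly as below. Everything here is illustrative: open_with_context is a hypothetical helper, and RecordingContext merely stands in for zarr.config.enable_gpu(); neither is part of the real xarray or kvikio API.

```python
import contextlib

# Rough sketch of the zarr_context idea: the backend enters a caller-supplied
# context manager around the store-opening call. In the real backend the
# default would be zarr.config.enable_gpu(); passing contextlib.nullcontext()
# would opt out. `open_with_context` is a hypothetical helper, not a real
# xarray or kvikio API.
def open_with_context(opener, zarr_context=None):
    ctx = zarr_context if zarr_context is not None else contextlib.nullcontext()
    with ctx:
        return opener()

# Toy context manager standing in for zarr.config.enable_gpu(), showing that
# entry/exit brackets the open call.
class RecordingContext:
    def __init__(self):
        self.entered = False

    def __enter__(self):
        self.entered = True
        return self

    def __exit__(self, *exc):
        return False

ctx = RecordingContext()
ds = open_with_context(lambda: "dataset", zarr_context=ctx)
```

With this shape, xr.open_dataset(store, engine="kvikio", zarr_context=...) could default to the GPU context while still letting users swap in nullcontext() or a torch-oriented context.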

weiji14 (Member, Author) commented on Apr 1, 2025:
I think this is similar to the discussion at zarr-developers/zarr-python#2473. That zarr_context parameter is exactly what I'm thinking of, I'm trying to find out if there's a way to implement this right now by setting/monkeypatching the default prototype in kvikio.zarr.GDSStore's get method here:

https://github.com/rapidsai/kvikio/blob/a4170fc098e80d339a42c5da9a605796eb864c9f/python/kvikio/kvikio/zarr/_zarr_python_3.py#L98-L102

Currently the prototype: BufferPrototype is set dynamically by calling zarr.core.buffer.core.default_buffer_prototype(). But I'm wondering if we can have a pre-determined default buffer prototype that is set when the GDSStore instance is created?
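One way to sketch the "pre-determined default prototype" idea is below. This is purely illustrative: SketchStore and the string-valued prototype factories are stand-ins, not the real kvikio.zarr.GDSStore or zarr buffer API.

```python
import functools

# Stand-in prototype factories; in the real code the dynamic default is
# zarr.core.buffer.core.default_buffer_prototype(), and a GPU variant would
# build cupy-backed buffers. The strings here are placeholders.
def default_buffer_prototype():
    return "cpu-prototype"

def make_prototype(kind):
    return f"{kind}-prototype"

class SketchStore:
    """Toy store that fixes its buffer prototype at construction time,
    instead of calling a dynamic default inside every get()."""

    def __init__(self, prototype_factory=default_buffer_prototype):
        self._prototype_factory = prototype_factory

    def get(self, key):
        prototype = self._prototype_factory()  # pre-determined at __init__
        return (key, prototype)

cpu_store = SketchStore()
gpu_store = SketchStore(prototype_factory=functools.partial(make_prototype, "gpu"))
```

The functools.partial call is what a later commit in this PR experiments with: binding the GPU choice into the factory once, so get() no longer has to resolve the prototype dynamically.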

jacobtomlinson (Collaborator) commented:

cc @jakirkham @TomAugspurger @madsbk @ncclementi who may be interested in this

TomAugspurger left a comment:

High-level question: are these two functionally equivalent?

ds = xr.open_dataset(kvikio.zarr.GDSStore("data.zarr"), engine="zarr")
ds = xr.open_dataset("data.zarr", engine="kvikio")

Do people have a preference on what to recommend to users?


filename_or_obj = _normalize_path(filename_or_obj)
if not store:
    store = ZarrStore.open_group(
        store=kvikio.zarr.GDSStore(root=filename_or_obj),
weiji14 (Member, Author) replied on Mar 11, 2025:

High-level question: are these two functionally equivalent?

ds = xr.open_dataset(kvikio.zarr.GDSStore("data.zarr"), engine="zarr")
ds = xr.open_dataset("data.zarr", engine="kvikio")

Do people have a preference on what to recommend to users?

Yes, those two are functionally the same. The xr.open_dataset(kvikio.zarr.GDSStore("data.zarr"), engine="zarr") style works today (with the patched xarray>=2025.03.0 and kvikio=25.04.00a), while xr.open_dataset("data.zarr", engine="kvikio") is just a convenience syntax. I was hoping to also sneak zarr.config.enable_gpu() into the backend somewhere to get cupy arrays by default (but we can discuss that in the https://github.com/xarray-contrib/cupy-xarray/pull/70/files#r1988251560 thread).
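The equivalence can be illustrated with a toy sketch: the "kvikio" engine simply wraps a plain path in a GDSStore-like object before handing off to the same zarr-opening code path. FakeGDSStore and open_zarr_like are illustrative names, not the real cupy-xarray implementation.

```python
# Toy model of the two call styles. A plain path passed with engine="kvikio"
# gets wrapped in a GDSStore-like object; a pre-built store passes through.
# Either way, the downstream zarr-opening logic sees the same kind of store.
class FakeGDSStore:
    def __init__(self, root):
        self.root = root

def open_zarr_like(filename_or_obj, engine="zarr"):
    if engine == "kvikio" and isinstance(filename_or_obj, str):
        filename_or_obj = FakeGDSStore(root=filename_or_obj)
    # ...from here on, both call styles open the same kind of store...
    return filename_or_obj

a = open_zarr_like(FakeGDSStore("data.zarr"), engine="zarr")
b = open_zarr_like("data.zarr", engine="kvikio")
```

Both a and b end up as the same kind of store pointing at the same root, which is why engine="kvikio" is purely a convenience spelling.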


jakirkham (Collaborator) commented:

cc @akshaysubr

Using functools.partial to override the default buffer prototype to be a GPU buffer instead of a CPU buffer. Not quite working as expected yet, but hopefully gets the point across.