DOC: Missing page on layers of Zarr abstractions #2956

### Describe the issue linked to the documentation

**tl;dr: A page is missing from the Zarr docs which distinguishes between the different layers of specs, APIs, ABCs, formats etc.**

It took me a year of working closely with zarr (via VirtualiZarr) to fully understand that "Zarr" is a multi-layered thing, which you can opt in to and out of at many different levels.

### What is zarr?

My current understanding is that "Zarr" encompasses all of these things:

1) A canonical on-disk file format, for both file and object storage, sometimes known as "native zarr",
2) A specification for how to serialize and de-serialize array data and metadata as byte streams to an arbitrary key-value store,
3) A python ABC for `Store` subclasses, which are key-value stores with a standardized API implementing the spec, but otherwise can do whatever they like behind the scenes, including not writing using the "native zarr" format,
4) A set of canonical python `Store` implementations, which generally do write using the "native zarr" format,
5) A python API for interacting with those `Store` subclasses, allowing python client code to treat many storage systems as interchangeable,
6) A set of informal extensions, metadata standards, and a nascent framework for formalizing the extensions.
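
To make the split between layers (2) to (5) concrete, here is a minimal toy sketch. Every name in it (`ToyStore`, `ToyMemoryStore`, `ToyDirectoryStore`, `write_array`, `read_array`, and the `meta.json` / `chunks/0` keys) is hypothetical and invented for illustration; none of it is the real zarr-python API or the real Zarr spec. It only shows the shape of the idea: client code talks to a key-value interface, and whether the bytes end up in files laid out like the keys is the store's business.

```python
import json
import struct
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ToyStore(ABC):
    """Stand-in for layer (3): an abstract key-value interface (hypothetical)."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def set(self, key: str, value: bytes) -> None: ...


class ToyMemoryStore(ToyStore):
    """A store that never touches disk, so it has no on-disk format at all."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value


class ToyDirectoryStore(ToyStore):
    """Stand-in for layer (4): each key becomes a relative file path, as in (1)."""

    def __init__(self, root):
        self.root = Path(root)

    def get(self, key):
        return (self.root / key).read_bytes()

    def set(self, key, value):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)


def write_array(store, name, values):
    """Stand-in for layers (2) and (5): serialize to keys and bytes via any store."""
    meta = {"shape": [len(values)], "dtype": "<i4"}
    store.set(f"{name}/meta.json", json.dumps(meta).encode())
    store.set(f"{name}/chunks/0", struct.pack(f"<{len(values)}i", *values))


def read_array(store, name):
    """Read the array back using only the key-value interface."""
    meta = json.loads(store.get(f"{name}/meta.json"))
    count = meta["shape"][0]
    return list(struct.unpack(f"<{count}i", store.get(f"{name}/chunks/0")))


mem = ToyMemoryStore()
disk = ToyDirectoryStore(tempfile.mkdtemp())
for store in (mem, disk):
    write_array(store, "temperature", [1, 2, 3])
```

`write_array` and `read_array` never learn which store they were handed; only `ToyDirectoryStore` produces something that looks like a "file format".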

_I feel this nuance and hierarchy is not clearly documented anywhere._

### "Zarr" projects and how they fit in

It matters because there are now many projects which opt in to some of these layers but not others. For example:

- The "zarr specification" is (2) and only (2), it doesn't actually touch on any of the other layers.
- Much of the data in the wild today follows (1), even though AFAIK that layout isn't actually formally described anywhere official?! It obeys (2), but we're just lucky that the mapping from file/object storage to a KV store is so obvious that it's still easy to write readers implemented in any language without a formal description of (1).
- Zarr-python
    - Zarr-python's `zarr.abc` provides (3),
    - Zarr-python's `zarr.storage` provides (4), which writes to local and object storage using (1) but without explicitly noting that,
    - Zarr-python's `zarr.api` provides (5), and can only interact with implementations of (3), such as (4),
- Zarr implementations in other languages (such as `zarr-js`) generally use (1) as their format, following (2), and take vague inspiration from (4) and (5).
    - Tensorstore is included in that category, as it uses (1) on disk.
- VirtualiZarr's new [`ManifestStore`](https://github.com/zarr-developers/VirtualiZarr/blob/e7073750a82105672a7262bcba5871c499284e7f/virtualizarr/manifests/store.py#L169) class is a concrete implementation of (3), but it eschews (1), with the aim of allowing access to an extensible set of non-zarr data formats on disk via (5).
- Icechunk
    - Icechunk's python API implements (3) so that it can be used with (5),
    - Icechunk's spec is an alternative format to (1), but still follows (2),
    - Icechunk's rust client obeys (2) but otherwise nothing else IIUC,
    - A non-python library binding to Icechunk's rust client (as `zarrs` has done) would follow (2) and the Icechunk spec, potentially with API inspiration from (3) and (5) but otherwise nothing else formal.
- Xarray is a key user of (5), but also quietly does a few things that fall under (6) (e.g. the `coordinates` attribute it adds),
- GeoZarr and other extension efforts are only about (6),
- OME-Zarr is possibly also just (6)?
- Consolidated metadata is an example of (6), but one that's supported by (3), (4), and (5) (but not necessarily by other `Store` implementations).
- Kerchunk has its own replacement for (1), but to read it you need to use the `FsspecStore` implementation that's part of (4).
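
As a rough illustration of how a store can satisfy a key-value interface while eschewing (1), here is a toy read-only store whose chunk keys resolve to byte ranges inside an existing non-zarr file. This is a sketch under stated assumptions, not how VirtualiZarr, Kerchunk, or Icechunk are actually implemented: `ToyManifestStore`, the `legacy.bin` file, its 4-byte header, and the key names are all made up for the example.

```python
import struct
import tempfile
from pathlib import Path


class ToyManifestStore:
    """Hypothetical read-only store: chunk keys point into existing files."""

    def __init__(self, metadata, manifest):
        self._metadata = metadata  # key -> bytes held in memory
        self._manifest = manifest  # key -> (path, offset, length)

    def get(self, key):
        # Metadata documents are served straight from memory.
        if key in self._metadata:
            return self._metadata[key]
        # Chunk keys are resolved to a byte range in some other file.
        path, offset, length = self._manifest[key]
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)


# A pretend legacy file: a 4-byte header followed by four little-endian int32s.
legacy = Path(tempfile.mkdtemp()) / "legacy.bin"
legacy.write_bytes(b"HDR0" + struct.pack("<4i", 10, 20, 30, 40))

store = ToyManifestStore(
    metadata={"arr/meta.json": b'{"shape": [4], "dtype": "<i4"}'},
    manifest={
        "arr/chunks/0": (str(legacy), 4, 8),   # bytes 4..11 hold the first two ints
        "arr/chunks/1": (str(legacy), 12, 8),  # bytes 12..19 hold the last two ints
    },
)

first = struct.unpack("<2i", store.get("arr/chunks/0"))
second = struct.unpack("<2i", store.get("arr/chunks/1"))
```

Nothing on disk here follows the "native zarr" layout, yet a client that only calls `get` on keys cannot tell the difference.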

### Current docs

The main zarr homepage only says

> Zarr is a community project to develop specifications and software for storage of large N-dimensional typed arrays

which is true but insufficient.

This is a problem because it leads to confusion as to what zarr "is" - for example many people understandably but mistakenly think that (1) is Zarr, and that (2) is the specification for this file format. It also makes it harder for potential contributors (e.g. @nenb) to place their ideas within the framework of the zarr project.

I think this separation of layers is awesome, I just wish I could have understood it a year earlier via reading the docs, instead of having to have it explained to me one-to-one by the likes of @d-v-b, @jhamman, and @rabernat.

cc also @maxrjones @paraseba 

### Suggested fix for documentation

We should have a new page on the main Zarr docs explaining these layers, and a page on the zarr-python docs explaining how it fits into this framework. Other projects such as Icechunk and VirtualiZarr can then more easily explain their relationship to Zarr.
