Add best practices for STAC Zarr and N-Dimensional Arrays #29

emmanuelmathot · 2025-10-15T07:53:43Z

This first PR roughly captures the discussion from Day #1 of the STAC Sprint 2025 in Rome.

It needs to be refined and maybe split into more PRs.


lukasbindreiter · 2025-10-15T08:28:01Z

best-practices-zarr-ndarray.md

+- Zarr v2: `"application/vnd+zarr; version=2"`
+- Zarr v3: `"application/vnd+zarr; version=3"`


We should probably also add these to the table in the asset and link best practices.

We agreed to leave everything in the best-practices-zarr-ndarray.md for the time being until we consolidate the principles and then eventually move the sections to the appropriate other guides.
cc @m-mohr

Is the version documented in the official ZARR docs or did we invent the version parameter here/in STAC?

No reference was found in the official Zarr docs, but the media type is already adopted in pystac (https://pystac.readthedocs.io/en/stable/api/media_type.html)

For background on the pystac definition: stac-utils/pystac#1546
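To illustrate how a client could consume the `version` parameter discussed in this thread, here is a minimal sketch that splits a media type string into its base type and parameters. The helper names `parse_media_type` and `zarr_version` are hypothetical, and the `version` parameter is the convention proposed here, not an IANA-registered one.

```python
from __future__ import annotations


def parse_media_type(media_type: str) -> tuple[str, dict[str, str]]:
    """Split a media type string into its base type and a dict of parameters."""
    base, *raw_params = [part.strip() for part in media_type.split(";")]
    params: dict[str, str] = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        key, _, value = raw.partition("=")
        params[key.strip().lower()] = value.strip().strip('"')
    return base.lower(), params


def zarr_version(media_type: str) -> str | None:
    """Return the Zarr version parameter, or None if absent or not a Zarr type."""
    base, params = parse_media_type(media_type)
    if base != "application/vnd+zarr":
        return None
    return params.get("version")


print(zarr_version("application/vnd+zarr; version=3"))
```

A client that gets `None` back would have to fall back to inspecting the store itself to determine the format version.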

best-practices-zarr-ndarray.md

florianziemen

Thanks a lot for your efforts! The Climate and Weather example looks very good to me. I've suggested a few minor edits to match the original data and stac specs a bit better.


Scartography · 2025-10-15T09:37:28Z

Apart from specifying the href of the asset itself, the item should have information on how the asset paths are constructed, and the assets should have information about the "groups" that lead, via a standard template, to the assets themselves. So:

Template information like this could live at the item level, and the assets should stick to it:
self.item.assets[0].href = self.item.href/{group1}/{group2}/{group3}.../{asset/band_name/data_entry}

Thus:

https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202510-s02msil2a-eu/14/products/cpm_v256/S2C_MSIL2A_20251014T142151_N0511_R096_T25WET_20251014T161521.zarr/meassurement/reflectance/r10m/b02

where group1=meassurement, group2=reflectance, group3=r10m, data_entry=b02. This should work for any group names, which should be specified in the assets.

An example of the item's self href (link), for reference:

self.item.href = "https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202510-s02msil2a-eu/14/products/cpm_v256/S2C_MSIL2A_20251014T142151_N0511_R096_T25WET_20251014T161521.zarr"
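A minimal sketch of the template idea above, assuming the asset href is simply the store root followed by the group names and the data entry. The function name `build_asset_href` and the example URL are hypothetical and only serve the illustration.

```python
from __future__ import annotations


def build_asset_href(item_href: str, groups: list[str], data_entry: str) -> str:
    """Join the store root, the group path, and the array name into one href."""
    parts = [
        item_href.rstrip("/"),
        *(group.strip("/") for group in groups),
        data_entry.strip("/"),
    ]
    return "/".join(parts)


# Hypothetical store root standing in for the item's self href.
store = "https://example.com/products/S2_EXAMPLE.zarr"
href = build_asset_href(store, ["measurement", "reflectance", "r10m"], "b02")
print(href)
```

Because the group list has arbitrary length, the same helper covers shallow and deeply nested stores.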


mario-winkler · 2025-10-16T07:38:00Z

best-practices-zarr-ndarray.md

+
+1. **A Zarr asset SHALL reference a group containing one or more arrays or groups**
+
+   This is equivalent to an xarray Dataset or an xarray DataTree.


I would propose tagging such an asset with the role `group` (or similar) to make it easier for clients to find such assets programmatically.
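As a sketch of what "finding such assets programmatically" could look like, assuming items are handled as plain dictionaries: the role name `group` is only the proposal made in this comment, and the helper name, asset keys, and hrefs are made up for illustration.

```python
def assets_with_role(item: dict, role: str) -> dict:
    """Return the subset of an item's assets whose roles include `role`."""
    return {
        key: asset
        for key, asset in item.get("assets", {}).items()
        if role in asset.get("roles", [])
    }


# Hypothetical item with one Zarr group asset and one unrelated asset.
item = {
    "assets": {
        "reflectance": {
            "href": "https://example.com/product.zarr/measurements/reflectance",
            "type": "application/vnd+zarr; version=3",
            "roles": ["data", "group"],
        },
        "thumbnail": {
            "href": "https://example.com/thumbnail.png",
            "roles": ["thumbnail"],
        },
    }
}

print(list(assets_with_role(item, "group")))
```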


emmanuelmathot · 2025-10-16T09:18:49Z

@clausmichele @fabricebrito @Scartography Discussion for store moved here: radiantearth/stac-spec#1367

rhysrevans3 · 2025-10-16T09:56:12Z

best-practices-zarr-ndarray.md

+- The kerchunk reference file is considered as the data store and thus is reference as a link with `rel: store`
+- Assets include both the reference file and source data
+- Role `"reference"` indicates virtual/indirect data access
+- Role `"source"` indicates the underlying data files


I don't think these points match the example above, since the assets don't include a reference file. Should virtual Zarr files follow the same principle as normal Zarr files: if they reference a group of arrays they belong in the assets, and if they are a Zarr store they should be included in the links?

emmanuelmathot · 2025-10-16T10:12:22Z

linking for mime-type: stac-utils/pystac#1546

m-mohr · 2025-10-16T13:31:13Z

For the media type, there's no official registration at IANA yet. If OGC(?) registers it as part of the community standard process, we should probably make them aware that we'd appreciate a version parameter being registered as well.

jameshawkes · 2025-11-05T14:51:06Z

best-practices-zarr-ndarray.md

+
+   Individual arrays within the store SHOULD NOT be represented as separate assets.
+
+   The appropriate level depends on how users will access the data.


Weather and climate data typically have deep hierarchical nesting, with multiple layers of subgrouping. There is not really a single appropriate level for users as a whole.

This is something we already struggle with for representing datacube data in STAC, when we are trying to map STAC to our raw datacubes (non-Zarr). We proposed the linked templates extension, with the ability to apply this to child links, to handle this problem neatly:
stac-extensions/link-templates#1

I think the same issue is surfacing here (but instead of catalogs and items, it's assets and variables/bands). Flattening the multi-dimensional cube into a list of assets just doesn't do justice to the n-dimensional structure of a datacube. The size also becomes enormous: it can work for lower-dimensional or smaller datacubes, but not for larger ones.

Using the linkTemplate, as described further down this document, for individual arrays helps a little, but I think link templating should also be allowed at the asset/group level. Then we could do something like:

"assets": {
  "forecast": {
    "type": "application/vnd+zarr; version=3",
    "title": "Ensemble Forecast for <date>",
    "linkTemplate": {
      "rel": "data",
      "title": "Forecast field",
      "uriTemplate": "s3://bucket/path/forecast.zarr/{ensemble_member}/{step}/",
      "variables": {
        "ensemble_member": {
          "description": "Index or identifier of ensemble member",
          "type": "string",
          "enum": [ "1", "2", "3", ... "50" ]
        },
        "step": {
          "description": "Forecast lead time (e.g., 6h, 12h, 24h)",
          "type": "string",
          "enum": [ "1", "2", "3", ... "360" ]
        }
      }
    }
  }
}

This only shows a 2D example, but it would generalise pretty well to any N-dimensional structure. Plain enums are used here, but you could also use more appropriate template schemas.

Within these groups we still have a large number of variables/"bands" which also benefit from link templates.

What is the step level pointing to? Group or array?
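To make the asset-level link template proposal more concrete, here is a rough sketch of how a client might enumerate the hrefs it describes. It assumes only simple `{name}` placeholders (no other URI Template operators) and `enum`-valued variables; the helper names and the shortened enums are hypothetical.

```python
import re
from itertools import product


def expand(template: str, values: dict) -> str:
    """Substitute simple {name} placeholders; other URI Template operators are not handled."""
    return re.sub(r"\{(\w+)\}", lambda match: values[match.group(1)], template)


def enumerate_hrefs(link_template: dict):
    """Yield one href per combination of the enum values of all variables."""
    names = list(link_template["variables"])
    choices = [link_template["variables"][name]["enum"] for name in names]
    for combo in product(*choices):
        yield expand(link_template["uriTemplate"], dict(zip(names, combo)))


# Shortened, hypothetical version of the template proposed above.
link_template = {
    "uriTemplate": "s3://bucket/path/forecast.zarr/{ensemble_member}/{step}/",
    "variables": {
        "ensemble_member": {"type": "string", "enum": ["1", "2"]},
        "step": {"type": "string", "enum": ["6", "12"]},
    },
}

hrefs = list(enumerate_hrefs(link_template))
print(hrefs)
```

The number of hrefs is the product of the enum sizes, which is why the template stays compact while a flat asset list does not.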


m-mohr

I'm having an issue with this title / scoping:

Either we define best practices that apply generally for n-d arrays (aka datacubes) or it's a ZARR best practice. The mixture right now is not ideal, given that there is also another general datacube best practice evolving here: https://github.com/EOEPCA/datacube-access/blob/main/best_practices/stac_best_practices.md

If it's just for ZARR (and its ancestor netCDF), it should only claim that, but the document can stay as is.
If it's meant to be more generic, then we should merge the two best practices.

I'm open to both variants.

przell · 2025-11-19T16:13:05Z

If it's just for ZARR (and its ancestor netCDF), it should only claim that, but the document can stay as is.
If it's meant to be more generic, then we should merge the two best practices.

In a nutshell, in the evolving document mentioned above we tried to cover data cubes in general. It is split into:

"Datacubes"
- For datacube native formats (e.g. zarr)
- It's a stub - the best practices defined here are great and would replace this part.
"Raster Data"
- Construct data cubes from formats that are not "data cube native" (COGs, JPEG2000)
- Deconstruct data cubes into these formats.

Since many collections are based on non-n-d formats, these best practices are also relevant.
How to organize them is up for discussion: one document or two. A link between them (the two use cases) would be nice, though.


abarciauskas-bgse · 2025-11-19T16:24:00Z

best-practices-zarr-ndarray.md

+
+**Key Points**:
+
+- The kerchunk reference file is considered as the data store and thus is reference as a link with `rel: store`


The statement "The kerchunk reference file is considered as the data store" couples the convention to the kerchunk implementation. I think this should instead say that the collection should include a link with `rel: store` that points to the entrypoint of the virtual Zarr store.

abarciauskas-bgse · 2025-11-19T16:27:13Z

best-practices-zarr-ndarray.md

+**Key Points**:
+
+- The kerchunk reference file is considered as the data store and thus is reference as a link with `rel: store`
+- Assets include both the reference file and source data


Again, I think this ties the convention to the kerchunk implementation. What is common across Zarr stores is that there should be a single entrypoint (i.e. a URL), so I feel that should be the only standard practice. In my opinion, it's up to the implementer whether they also want to include source data as assets. For both virtual Zarr stores (where the data source may be netCDF files) and native Zarr stores (where the data source is Zarr chunks), how data is stored should be largely abstracted from the user.

Looking at the example, I would assume this is an example for an item, not a collection. Is that right? I think we should have an example for both items and collections and make sure the practice is not uniquely applicable to kerchunk.
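A small sketch of the implementation-agnostic reading suggested here: the client only looks for a link with `rel: store` and treats its href as the single entrypoint, whether the store is native or virtual. The `store` relation is still under discussion (see radiantearth/stac-spec#1367), and the helper name and example hrefs are hypothetical.

```python
def find_store_links(stac_object: dict) -> list:
    """Return all links of a collection or item whose relation type is 'store'."""
    return [
        link
        for link in stac_object.get("links", [])
        if link.get("rel") == "store"
    ]


# Hypothetical collection; the same lookup would work on an item.
collection = {
    "type": "Collection",
    "id": "example-collection",
    "links": [
        {"rel": "self", "href": "https://example.com/collection.json"},
        {
            "rel": "store",
            "href": "https://example.com/data/store.zarr",
            "type": "application/vnd+zarr; version=3",
        },
    ],
}

entrypoints = [link["href"] for link in find_store_links(collection)]
print(entrypoints)
```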


Add best practices for STAC Zarr and N-Dimensional Arrays

cb7465a

lukasbindreiter reviewed Oct 15, 2025

clausmichele reviewed Oct 15, 2025

fabricebrito reviewed Oct 15, 2025

clausmichele reviewed Oct 15, 2025

lukasbindreiter reviewed Oct 15, 2025

clausmichele reviewed Oct 15, 2025

lukasbindreiter reviewed Oct 15, 2025

jsignell reviewed Oct 15, 2025

clausmichele reviewed Oct 15, 2025

lukasbindreiter reviewed Oct 15, 2025

florianziemen reviewed Oct 15, 2025

lukasbindreiter reviewed Oct 15, 2025

clausmichele reviewed Oct 15, 2025

florianziemen reviewed Oct 15, 2025

jsignell reviewed Oct 15, 2025

lukasbindreiter reviewed Oct 15, 2025

emmanuelmathot and others added 10 commits October 15, 2025 11:00

Update best-practices-zarr-ndarray.md

78d0e06

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

aaad71c

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

7e44932

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

4cdb92a

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

384bbe8

Co-authored-by: Michele Claus <[email protected]>

Update best-practices-zarr-ndarray.md

c8e1e87

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

402faf1

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

92d89f6

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

c99feb9

Co-authored-by: Julia Signell <[email protected]>

Update best-practices-zarr-ndarray.md

7e7bd17

Co-authored-by: Julia Signell <[email protected]>

fabricebrito reviewed Oct 15, 2025


emmanuelmathot added 2 commits October 15, 2025 15:51

Fix formatting of linkTemplates section in best practices for Zarr

5a17583

Remove early draft section on multi-resolution and pyramid data from Zarr best practices

5a4899c

emmanuelmathot added the sprint label Oct 15, 2025

emmanuelmathot and others added 2 commits October 15, 2025 16:27

Update reflectance asset type and roles in Zarr best practices

0f77f2c

Update best-practices-zarr-ndarray.md

b1f18cb

Co-authored-by: Matthias Mohr <[email protected]>

mario-winkler reviewed Oct 16, 2025


emmanuelmathot mentioned this pull request Oct 16, 2025

What a Zarr asset should point to #30

Open

Restructure asset organization section, add example with updated datacube extension (#33)

71b1b2a

@emmanuelmathot this PR will merge the changes we discussed into your existing PR

emmanuelmathot mentioned this pull request Oct 16, 2025

Restructure asset organization section, add example with updated datacube extension #33

Merged

rhysrevans3 reviewed Oct 16, 2025


mario-winkler mentioned this pull request Oct 20, 2025

Add best practices for representing EOPF Zarr stores in STAC EOPF-Sample-Service/eopf-stac#54

Draft

emmanuelmathot mentioned this pull request Oct 24, 2025

Proposal for replacing variable object with bands stac-extensions/datacube#30

Open

emmanuelmathot marked this pull request as ready for review November 3, 2025 16:13

przell mentioned this pull request Nov 4, 2025

Finalize STAC Best Practices for Data Cubes - #3304 EOEPCA/datacube-access#67

Open


jameshawkes reviewed Nov 5, 2025


emmanuelmathot added 3 commits November 17, 2025 13:55

Major consolidation following discussions in stac-extensions/datacube#30

0be28ea

Fix typos and improve clarity in best practices for Zarr and N-Dimensional Arrays

0e0c44c

Add guidelines for multiple Zarr stores per collection and clarify asset href requirements

4834bf1

gadomski self-requested a review November 17, 2025 16:35

emmanuelmathot mentioned this pull request Nov 17, 2025

docs: clarify unit field usage in common metadata radiantearth/stac-spec#1373

Merged


m-mohr self-requested a review November 18, 2025 11:50

m-mohr requested changes Nov 19, 2025


abarciauskas-bgse reviewed Nov 19, 2025


abarciauskas-bgse reviewed Nov 19, 2025


Update best-practices-zarr-ndarray.md

5e856b3

Co-authored-by: Aimee Barciauskas <[email protected]>
