Enhance documentation for STAC integration and Zarr asset access #130
Open: emmanuelmathot wants to merge 2 commits into main from zarr-store
+114
−0
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -200,6 +200,116 @@ output.zarr/ | |||||||
| └── .zmetadata # Consolidated metadata | ||||||||
| ``` | ||||||||
|
|
||||||||
| ## STAC Integration and Zarr URL Resolution | ||||||||
|
|
||||||||
| ### Every Zarr Path Is Openable by Clients | ||||||||
|
|
||||||||
| Zarr is a **key/value store protocol**, not a file format. Crucially for clients, this means that **any Zarr group path is itself a valid store entry point**. A URL like: | ||||||||
|
|
||||||||
| ``` | ||||||||
| s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m | ||||||||
| ``` | ||||||||
|
|
||||||||
| is not a path that needs to be split or reverse-parsed to find some "real" store root. It *is* the store. The Zarr v3 spec defines the existence of a node by whether `{path}/zarr.json` resolves to valid metadata, and any valid group path satisfies this. Clients like xarray, zarr-python, GDAL, and OpenLayers should therefore **open the asset href directly as a Zarr store**, without needing to know anything about the hierarchy above it. | ||||||||
|
|
||||||||
| ```python | ||||||||
| import xarray as xr | ||||||||
|
|
||||||||
| # Open the asset href directly — no splitting or parsing needed | ||||||||
| asset_href = "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m" | ||||||||
| ds = xr.open_dataset(asset_href, engine="zarr") | ||||||||
| ``` | ||||||||
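The node-existence rule described above can be illustrated without any Zarr library. The following is a minimal sketch, assuming a toy in-memory key/value store (a plain `dict`) and a hypothetical helper `node_exists`; neither is part of zarr-python or of this project.

```python
import json


def node_exists(store: dict, path: str) -> bool:
    """Return True if `{path}/zarr.json` holds Zarr v3 node metadata.

    `store` is a toy stand-in for a key/value store: keys are object
    paths, values are the raw bytes stored under them (an assumption
    made for this illustration only).
    """
    prefix = path.strip("/")
    key = f"{prefix}/zarr.json" if prefix else "zarr.json"
    raw = store.get(key)
    if raw is None:
        return False
    try:
        meta = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(meta, dict) and meta.get("node_type") in ("group", "array")


# Toy store contents; real metadata documents carry more fields.
toy_store = {
    "zarr.json": b'{"zarr_format": 3, "node_type": "group"}',
    "measurements/reflectance/r10m/zarr.json": b'{"zarr_format": 3, "node_type": "group"}',
    "measurements/reflectance/r10m/b04/zarr.json": b'{"zarr_format": 3, "node_type": "array"}',
}

print(node_exists(toy_store, "measurements/reflectance/r10m"))
print(node_exists(toy_store, "measurements/reflectance/r20m"))
```

The check only looks at the key below the given path, which is why a client never has to walk up to a parent group to decide whether a path can be opened.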
|
|
||||||||
| This is the fundamental principle: **the STAC asset href is the URL to open, and it works as a complete, self-contained Zarr store**. | ||||||||
|
|
||||||||
| ### Consolidated Metadata Enables Standalone Group Access | ||||||||
|
|
||||||||
| A Zarr group becomes fully self-contained for clients when it carries [consolidated metadata](https://zarr.readthedocs.io/en/main/user-guide/consolidated_metadata.html). Consolidated metadata embeds the metadata of all descendant nodes inside the group's own `zarr.json` (Zarr v3) or `.zmetadata` (Zarr v2), so a client can discover the entire sub-hierarchy structure in a single request — no traversal, no requests to parent groups. | ||||||||
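To make the single-request discovery concrete, here is a minimal sketch that lists the descendants recorded in a group's `zarr.json`. It assumes the layout that zarr-python 3 writes (a `consolidated_metadata` object with a `metadata` mapping keyed by relative path); the helper name `list_descendants` and the trimmed example document are illustrative only.

```python
import json


def list_descendants(group_zarr_json: str) -> dict:
    """Map each descendant's relative path to its node type.

    Reads only the group's own metadata document; no other keys in the
    store are requested. Returns an empty dict when the group carries
    no consolidated metadata.
    """
    doc = json.loads(group_zarr_json)
    consolidated = doc.get("consolidated_metadata") or {}
    entries = consolidated.get("metadata", {})
    return {path: meta.get("node_type") for path, meta in entries.items()}


# Trimmed example document: real array entries also carry shape,
# data type, chunk grid, codecs, and so on.
example_zarr_json = json.dumps(
    {
        "zarr_format": 3,
        "node_type": "group",
        "attributes": {},
        "consolidated_metadata": {
            "kind": "inline",
            "must_understand": False,
            "metadata": {
                "b02": {"zarr_format": 3, "node_type": "array"},
                "b03": {"zarr_format": 3, "node_type": "array"},
                "b04": {"zarr_format": 3, "node_type": "array"},
            },
        },
    }
)

for path, node_type in list_descendants(example_zarr_json).items():
    print(path, node_type)
```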
|
|
||||||||
| All EOPF-produced Zarr groups pointed to by STAC assets **MUST** have consolidated metadata. This is indicated in the STAC asset using the [Zarr STAC Extension](https://github.com/stac-extensions/zarr) field `zarr:consolidated: true`. | ||||||||
|
|
||||||||
| ```json | ||||||||
| "assets": { | ||||||||
| "reflectance": { | ||||||||
| "href": "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m", | ||||||||
| "type": "application/vnd.zarr; version=3", | ||||||||
| "zarr:consolidated": true, | ||||||||
| "zarr:node_type": "group", | ||||||||
| "zarr:zarr_format": 3 | ||||||||
| } | ||||||||
| } | ||||||||
| ``` | ||||||||
|
|
||||||||
| With consolidated metadata present, a client opening `asset_href` directly has everything it needs to work with the group and its children — without any knowledge of the parent hierarchy. | ||||||||
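A client can check these fields before deciding how to open an asset. The sketch below is one possible check, assuming the asset has already been parsed into a Python `dict`; the helper name `is_standalone_group` is hypothetical and not part of any STAC library.

```python
def is_standalone_group(asset: dict) -> bool:
    """Return True if the asset is a Zarr group flagged as consolidated.

    Only the Zarr STAC Extension fields are inspected; the store itself
    is not contacted.
    """
    return (
        asset.get("zarr:node_type") == "group"
        and asset.get("zarr:consolidated") is True
    )


reflectance_asset = {
    "href": "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m",
    "type": "application/vnd.zarr; version=3",
    "zarr:consolidated": True,
    "zarr:node_type": "group",
    "zarr:zarr_format": 3,
}

if is_standalone_group(reflectance_asset):
    print("open directly:", reflectance_asset["href"])
```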
|
|
||||||||
| ### Role of the `rel: store` Link | ||||||||
|
|
||||||||
| The [STAC Zarr Best Practices](https://github.com/radiantearth/stac-best-practices/blob/main/best-practices-zarr.md#store-link-relationship) define a `"store"` link relationship that points to the root of the Zarr store containing the assets. All EOPF-produced STAC Items and Collections **MUST** include this link: | ||||||||
|
|
||||||||
| ```json | ||||||||
| "links": [ | ||||||||
| { | ||||||||
| "rel": "store", | ||||||||
| "href": "s3://bucket/S2A_MSIL2A_20251008T100041.zarr", | ||||||||
| "type": "application/vnd.zarr; version=3", | ||||||||
| "title": "Zarr Store Root" | ||||||||
| } | ||||||||
| ] | ||||||||
| ``` | ||||||||
|
|
||||||||
| Its purpose is **navigation and discovery**, not URL parsing: | ||||||||
|
|
||||||||
| - It lets clients traverse or inspect the **full Zarr hierarchy** above the asset group (siblings, parent groups, global attributes). | ||||||||
| - It provides a single stable reference to the underlying storage location, useful for tools that need to know where the data lives (e.g., to construct pre-signed URLs, or list all groups in a store). | ||||||||
| - It allows a client to verify that all assets in the STAC object are located under the same store. | ||||||||
|
|
||||||||
| !!! note | ||||||||
| Opening the `rel: store` href directly is equivalent to opening the top-level Zarr root — useful for exploring the complete dataset structure, but **not required** for using any individual asset. | ||||||||
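The verification use case can be sketched as a simple prefix check. This is an illustration under assumptions: the STAC object is a parsed `dict`, and the helper names `find_store_href` and `assets_outside_store` are hypothetical.

```python
from typing import Optional


def find_store_href(stac_object: dict) -> Optional[str]:
    """Return the href of the first `rel: store` link, if any."""
    for link in stac_object.get("links", []):
        if link.get("rel") == "store":
            return link.get("href")
    return None


def assets_outside_store(stac_object: dict) -> list:
    """Return the keys of assets whose href is not under the store root."""
    store = find_store_href(stac_object)
    if store is None:
        raise ValueError("STAC object has no rel: store link")
    root = store.rstrip("/")
    return [
        key
        for key, asset in stac_object.get("assets", {}).items()
        if asset["href"] != root and not asset["href"].startswith(root + "/")
    ]


item = {
    "links": [
        {"rel": "store", "href": "s3://bucket/S2A_MSIL2A_20251008T100041.zarr"}
    ],
    "assets": {
        "reflectance": {
            "href": "s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m"
        }
    },
}

print(assets_outside_store(item))
```

The store href is used here only for comparison; each asset href is still opened directly, as described above.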
|
|
||||||||
| ### URL Naming Constraint | ||||||||
|
|
||||||||
| Group names, array names, and any intermediate path segments **MUST NOT** end with `.zarr`. The `.zarr` suffix SHOULD appear at most once in a full URL — only at the store root level — as a human-readable convention. This avoids confusion when reading URLs, even though no client should rely on this suffix for parsing. | ||||||||
|
|
||||||||
| ``` | ||||||||
| ✅ s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m | ||||||||
| ❌ s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements.zarr/reflectance | ||||||||
|
Collaborator
This is confusing, because `s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m` does not have consolidated metadata. I'd suggest changing to
Suggested change
|
||||||||
| ``` | ||||||||
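A producer could lint URLs against this constraint before publishing. The following is a minimal sketch, assuming that the first path segment ending in `.zarr` is the store root; the helper name `zarr_suffix_violations` is hypothetical. It is a producer-side check only, since clients should not rely on the suffix for parsing.

```python
from urllib.parse import urlsplit


def zarr_suffix_violations(url: str) -> list:
    """Return path segments that end with `.zarr` below the store root.

    The first segment ending in `.zarr` is treated as the store root
    (an assumption of this sketch); any later one violates the rule.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    zarr_segments = [s for s in segments if s.endswith(".zarr")]
    return zarr_segments[1:]


good = "s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m"
bad = "s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements.zarr/reflectance"

print(zarr_suffix_violations(good))
print(zarr_suffix_violations(bad))
```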
|
|
||||||||
| ### EOPF Product URL Anatomy | ||||||||
|
|
||||||||
| For a Sentinel-2 L2A EOPF product, the store and asset relationship looks like this: | ||||||||
|
|
||||||||
| ``` | ||||||||
| rel: store → s3://bucket/S2A_MSIL2A_20251008T100041.zarr (top-level root) | ||||||||
| │ | ||||||||
| └── measurements/ | ||||||||
| └── reflectance/ ← asset href (open this directly as a store) | ||||||||
| ├── r10m/ ← sub-group: open directly, or path-join with band name | ||||||||
| │ ├── b02 ← array: asset_href + "/" + band_name | ||||||||
| │ ├── b03 | ||||||||
| │ └── b04 | ||||||||
| ├── r20m/ | ||||||||
| └── r60m/ | ||||||||
| ``` | ||||||||
|
|
||||||||
| The band `name` field in the STAC `bands` array is designed so that `asset_href + "/" + band_name` constructs the correct full Zarr array URL: | ||||||||
|
|
||||||||
| ```python | ||||||||
| import zarr | ||||||||
|
|
||||||||
| asset_href = "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m" | ||||||||
| band_name = "b04" | ||||||||
| red_band = zarr.open_array(asset_href + "/" + band_name, mode="r") | ||||||||
| ``` | ||||||||
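The same join can be applied to every entry of the `bands` array at once. The sketch below assumes the asset is a parsed `dict` whose `bands` entries each carry a `name`; the helper name `band_array_urls` and the example band list are illustrative.

```python
def band_array_urls(asset: dict) -> dict:
    """Map each band name to its Zarr array URL under the asset href."""
    base = asset["href"].rstrip("/")
    return {
        band["name"]: f"{base}/{band['name']}"
        for band in asset.get("bands", [])
    }


r10m_asset = {
    "href": "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m",
    "bands": [{"name": "b02"}, {"name": "b03"}, {"name": "b04"}],
}

for name, url in band_array_urls(r10m_asset).items():
    print(name, url)
```

Each resulting URL can then be passed to `zarr.open_array(..., mode="r")` as in the example above.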
|
|
||||||||
| ### Related Specifications | ||||||||
|
|
||||||||
| - **[Zarr v3 specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html)** — defines the abstract store interface, hierarchy paths, and `zarr.json` metadata documents | ||||||||
| - **[STAC Zarr Best Practices](https://github.com/radiantearth/stac-best-practices/blob/main/best-practices-zarr.md)** — defines the `rel: store` link, asset media types, band representation patterns, and consolidated metadata guidance | ||||||||
| - **[Zarr STAC Extension](https://github.com/stac-extensions/zarr)** — adds `zarr:node_type`, `zarr:zarr_format`, and `zarr:consolidated` fields to STAC assets | ||||||||
|
|
||||||||
| --- | ||||||||
|
|
||||||||
| ## Metadata Architecture | ||||||||
|
|
||||||||
| ### 1. CF Conventions Compliance | ||||||||
|
|
||||||||
This is confusing in the context of multiscales. Clients that support multiscales should receive asset urls with the available multiscales below them: