110 changes: 110 additions & 0 deletions docs/architecture.md
@@ -200,6 +200,116 @@ output.zarr/
└── .zmetadata # Consolidated metadata
```

## STAC Integration and Zarr URL Resolution

### Every Zarr Path Is Openable by Clients

Zarr is a **key/value store protocol**, not a file format. Crucially for clients, this means that **any Zarr group path is itself a valid store entry point**. A URL like:

```
s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m
```

> **Review comment (Collaborator):** This is confusing in the context of multiscales. Clients that support multiscales should receive asset urls with the available multiscales below them:
>
> `s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance`

is not a path that needs to be split or reverse-parsed to find some "real" store root. It *is* the store. The Zarr spec defines the existence of a node by whether `{path}/zarr.json` resolves to valid metadata — and any valid group path satisfies this. Clients like xarray, zarr-python, GDAL, and OpenLayers should therefore **open the asset href directly as a Zarr store**, without needing to know anything about the hierarchy above it.

```python
import xarray as xr

# Open the asset href directly — no splitting or parsing needed
asset_href = "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m"
ds = xr.open_dataset(asset_href, engine="zarr")
```

This is the fundamental principle: **the STAC asset href is the URL to open, and it works as a complete, self-contained Zarr store**.
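The node-existence rule described above can be illustrated without any Zarr library. The following is a minimal sketch, assuming a plain dictionary stands in for the key/value store; `node_type_at`, `_meta`, and the sample keys are hypothetical names chosen for illustration, and the check looks only at `zarr_format` and `node_type` rather than validating a full metadata document.

```python
import json


def _meta(node_type):
    # Build a stripped-down zarr.json document (real ones carry more fields).
    return json.dumps({"zarr_format": 3, "node_type": node_type}).encode()


# Toy key/value store: keys are object paths, values are raw bytes.
store = {
    "zarr.json": _meta("group"),
    "measurements/zarr.json": _meta("group"),
    "measurements/reflectance/zarr.json": _meta("group"),
    "measurements/reflectance/r10m/zarr.json": _meta("group"),
    "measurements/reflectance/r10m/b04/zarr.json": _meta("array"),
}


def node_type_at(store, path):
    """Return "group" or "array" if `{path}/zarr.json` holds v3 metadata, else None."""
    path = path.strip("/")
    key = f"{path}/zarr.json" if path else "zarr.json"
    raw = store.get(key)
    if raw is None:
        return None
    try:
        meta = json.loads(raw)
    except ValueError:
        return None
    if isinstance(meta, dict) and meta.get("zarr_format") == 3:
        node_type = meta.get("node_type")
        if node_type in ("group", "array"):
            return node_type
    return None


# Every group path answers the same question the same way as the root does.
for candidate in ("", "measurements/reflectance/r10m", "measurements/missing"):
    print(repr(candidate), node_type_at(store, candidate))
```

The lookup never inspects anything above `path`, which is the property that lets a client treat a nested group as its own entry point.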

### Consolidated Metadata Enables Standalone Group Access

A Zarr group becomes fully self-contained for clients when it carries [consolidated metadata](https://zarr.readthedocs.io/en/main/user-guide/consolidated_metadata.html). Consolidated metadata embeds the metadata of all descendant nodes inside the group's own `zarr.json` (Zarr v3) or `.zmetadata` (Zarr v2), so a client can discover the entire sub-hierarchy structure in a single request — no traversal, no requests to parent groups.

All EOPF-produced Zarr groups pointed to by STAC assets **MUST** have consolidated metadata. This is indicated in the STAC asset using the [Zarr STAC Extension](https://github.com/stac-extensions/zarr) field `zarr:consolidated: true`.

```json
"assets": {
  "reflectance": {
    "href": "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m",
    "type": "application/vnd.zarr; version=3",
    "zarr:consolidated": true,
    "zarr:node_type": "group",
    "zarr:zarr_format": 3
  }
}
```

With consolidated metadata present, a client opening `asset_href` directly has everything it needs to work with the group and its children — without any knowledge of the parent hierarchy.
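A client can carry the `zarr:consolidated` hint from the STAC asset through to the reader. The sketch below is one possible way to do that, assuming the asset is already parsed into a Python dictionary; `open_kwargs_for_asset` is a hypothetical helper, not part of xarray or any STAC library. The final `xr.open_dataset` call is left commented out because it needs xarray, zarr, and an S3-capable filesystem package installed, plus access to real data.

```python
def open_kwargs_for_asset(asset):
    """Translate Zarr STAC Extension fields into keyword arguments for xarray."""
    if asset.get("zarr:node_type", "group") != "group":
        raise ValueError("expected an asset that points to a Zarr group")
    kwargs = {"engine": "zarr"}
    # Pass `consolidated` only when the asset states it explicitly; when the
    # field is absent, xarray's default behaviour is left untouched.
    if "zarr:consolidated" in asset:
        kwargs["consolidated"] = bool(asset["zarr:consolidated"])
    return kwargs


asset = {
    "href": "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m",
    "type": "application/vnd.zarr; version=3",
    "zarr:consolidated": True,
    "zarr:node_type": "group",
    "zarr:zarr_format": 3,
}

kwargs = open_kwargs_for_asset(asset)
print(kwargs)

# import xarray as xr
# ds = xr.open_dataset(asset["href"], **kwargs)
```

Requesting consolidated metadata explicitly makes a missing or stale consolidated document surface as an error rather than a silent fallback to per-node requests.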

### Role of the `rel: store` Link

The [STAC Zarr Best Practices](https://github.com/radiantearth/stac-best-practices/blob/main/best-practices-zarr.md#store-link-relationship) define a `"store"` relationship for exactly this purpose. All EOPF-produced STAC Items and Collections **MUST** include this link:

```json
"links": [
  {
    "rel": "store",
    "href": "s3://bucket/S2A_MSIL2A_20251008T100041.zarr",
    "type": "application/vnd.zarr; version=3",
    "title": "Zarr Store Root"
  }
]
```

Its purpose is **navigation and discovery**, not URL parsing:

- It lets clients traverse or inspect the **full Zarr hierarchy** above the asset group (siblings, parent groups, global attributes).
- It provides a single stable reference to the underlying storage location, useful for tools that need to know where the data lives (e.g., to construct pre-signed URLs, or list all groups in a store).
- It allows a client to verify that all Zarr assets in the STAC object live under the same store.

!!! note
    Opening the `rel: store` href directly is equivalent to opening the top-level Zarr root, which is useful for exploring the complete dataset structure, but **not required** for using any individual asset.
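The discovery and verification uses listed above can be sketched with plain dictionaries. This is an illustrative example only, assuming the STAC Item is already parsed from JSON; `store_href` and `zarr_assets_outside_store` are hypothetical helper names, and the `stray` asset is invented to show a failing case.

```python
def store_href(stac_object):
    """Return the href of the first `rel: store` link, or None if absent."""
    for link in stac_object.get("links", []):
        if link.get("rel") == "store":
            return link["href"]
    return None


def zarr_assets_outside_store(stac_object):
    """List keys of Zarr assets whose href is not at or below the store root."""
    root = store_href(stac_object)
    if root is None:
        raise ValueError("no rel: store link found")
    root = root.rstrip("/")
    outside = []
    for key, asset in stac_object.get("assets", {}).items():
        # Only Zarr assets are checked; thumbnails and similar are ignored.
        if not asset.get("type", "").startswith("application/vnd.zarr"):
            continue
        href = asset.get("href", "").rstrip("/")
        if href != root and not href.startswith(root + "/"):
            outside.append(key)
    return outside


item = {
    "links": [
        {
            "rel": "store",
            "href": "s3://bucket/S2A_MSIL2A_20251008T100041.zarr",
            "type": "application/vnd.zarr; version=3",
        }
    ],
    "assets": {
        "reflectance": {
            "href": "s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m",
            "type": "application/vnd.zarr; version=3",
        },
        "stray": {
            "href": "s3://bucket/other.zarr/measurements",
            "type": "application/vnd.zarr; version=3",
        },
    },
}

print(store_href(item))
print(zarr_assets_outside_store(item))
```

The prefix comparison here is a consistency check on catalog metadata. It is not a step a client needs before opening an asset.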

### URL Naming Constraint

Group names, array names, and any intermediate path segments **MUST NOT** end with `.zarr`. The `.zarr` suffix SHOULD appear at most once in a full URL — only at the store root level — as a human-readable convention. This avoids confusion when reading URLs, even though no client should rely on this suffix for parsing.

```
✅ s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m
❌ s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements.zarr/reflectance
```

> **Review comment (Collaborator):** This is confusing, because `s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m` does not have consolidated metadata. I'd suggest changing to
>
> Suggested change:
>
> `s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements.zarr/reflectance`
>
> `s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance`
>
> `❌ s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements.zarr/reflectance`
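Producers may want to check the naming constraint mechanically before publishing. The following is a minimal sketch of such a lint, assuming the first path segment ending in `.zarr` is the store root; `extra_zarr_segments` is a hypothetical helper name. It is meant for validation on the producer side, since clients should not rely on the suffix for parsing.

```python
from urllib.parse import urlsplit


def extra_zarr_segments(url):
    """Return path segments ending in `.zarr` beyond the first such segment."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    zarr_segments = [s for s in segments if s.endswith(".zarr")]
    # The first match is treated as the store root; anything after it
    # violates the "at most once" convention.
    return zarr_segments[1:]


good = "s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements/reflectance/r10m"
bad = "s3://bucket/S2A_MSIL2A_20251008T100041.zarr/measurements.zarr/reflectance"

print(extra_zarr_segments(good))
print(extra_zarr_segments(bad))
```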

### EOPF Product URL Anatomy

For a Sentinel-2 L2A EOPF product, the store and asset relationship looks like this:

```
rel: store → s3://bucket/S2A_MSIL2A_20251008T100041.zarr   (top-level root)
                └── measurements/
                    └── reflectance/     ← asset href (open this directly as a store)
                        ├── r10m/        ← sub-group: open directly, or path-join with band name
                        │   ├── b02      ← array: asset_href + "/" + band_name
                        │   ├── b03
                        │   └── b04
                        ├── r20m/
                        └── r60m/
```

The band `name` field in the STAC `bands` array is designed so that `asset_href + "/" + band_name` constructs the correct full Zarr array URL:

```python
import zarr

asset_href = "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m"
band_name = "b04"
red_band = zarr.open_array(asset_href + "/" + band_name, mode="r")
```
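The same join can be applied to every entry of an asset's `bands` array. The sketch below assumes the asset is a parsed dictionary whose `bands` entries each carry a `name`; `band_urls` is a hypothetical helper, and the three band names are taken from the diagram above.

```python
def band_urls(asset):
    """Map each band name to its full Zarr array URL under the asset href."""
    base = asset["href"].rstrip("/")  # avoid a double slash in the result
    return {band["name"]: f"{base}/{band['name']}" for band in asset.get("bands", [])}


asset = {
    "href": "s3://bucket/S2A_MSIL2A.zarr/measurements/reflectance/r10m",
    "type": "application/vnd.zarr; version=3",
    "bands": [{"name": "b02"}, {"name": "b03"}, {"name": "b04"}],
}

urls = band_urls(asset)
for name, url in urls.items():
    print(name, url)
```

Each resulting URL can be passed to `zarr.open_array(..., mode="r")` in the same way as the single-band example.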

### Related Specifications

- **[Zarr v3 specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html)** — defines the abstract store interface, hierarchy paths, and `zarr.json` metadata documents
- **[STAC Zarr Best Practices](https://github.com/radiantearth/stac-best-practices/blob/main/best-practices-zarr.md)** — defines the `rel: store` link, asset media types, band representation patterns, and consolidated metadata guidance
- **[Zarr STAC Extension](https://github.com/stac-extensions/zarr)** — adds `zarr:node_type`, `zarr:zarr_format`, and `zarr:consolidated` fields to STAC assets

---

## Metadata Architecture

### 1. CF Conventions Compliance
3 changes: 3 additions & 0 deletions docs/examples.md
@@ -2,6 +2,9 @@

Practical examples demonstrating common use cases for the EOPF GeoZarr library.

!!! tip "Opening Zarr assets: no URL parsing required"
    Every STAC asset href pointing to a Zarr group (e.g. `s3://bucket/data.zarr/measurements/reflectance/r10m`) **is itself a valid Zarr store** and can be opened directly by any Zarr-compatible client. No reverse-parsing or store-root extraction is needed. See [STAC Integration and Zarr URL Resolution](architecture.md#stac-integration-and-zarr-url-resolution) in the Architecture docs for the full model, the role of the `rel: store` link, and consolidated metadata requirements.

## Basic Examples

### Simple Local Conversion
1 change: 1 addition & 0 deletions docs/index.md
@@ -15,6 +15,7 @@ Welcome to the EOPF GeoZarr library documentation. This library provides tools t
- **[API Reference](api-reference.md)** - Complete Python API documentation
- **[Examples](examples.md)** - Practical examples for common use cases
- **[Architecture](architecture.md)** - Technical architecture and design principles
- **[STAC Integration and Zarr URL Resolution](architecture.md#stac-integration-and-zarr-url-resolution)** - How to open Zarr group/array URLs from STAC assets directly, and what the `rel: store` link is for
- **[GeoZarr Mini Spec](geozarr-minispec.md)** - Implementation-specific GeoZarr specification

### Support