Open
18 commits
5f8c6fb
refactor terminology and structure in documentation
emmanuelmathot Aug 22, 2025
bb05f3f
Enhance documentation clarity and detail in the GeoZarr Unified Data …
emmanuelmathot Sep 2, 2025
561edd9
Refine descriptions of multiscale groups in documentation for clarity…
emmanuelmathot Sep 3, 2025
f99d742
Capitalize "Unified Data Model" for consistency across documentation …
emmanuelmathot Sep 3, 2025
b8c988b
Update section title to "Unified Data Model Structure" for consistenc…
emmanuelmathot Sep 3, 2025
8cac80c
Enhance documentation clarity by defining relationships to Zarr core …
emmanuelmathot Sep 4, 2025
08caa63
Refine Unified Data Model description to clarify adaptations for Zarr…
emmanuelmathot Sep 4, 2025
4db26fb
Fix relationship notation for Store and Hierarchy in Unified Data Mod…
emmanuelmathot Sep 4, 2025
1500a6d
Refine Unified Data Model relationships by correcting notation and en…
emmanuelmathot Sep 4, 2025
201ae3e
rename to GeoZarr standard
christophenoel Oct 10, 2025
e6dae7e
Simplified the specification to focus on the agreed intent of GeoZarr…
christophenoel Oct 10, 2025
10c637d
The Terms and Definitions section should include only common terms us…
christophenoel Oct 10, 2025
d10da75
Stressed out the core aspects of GeoZarr in the abstract
christophenoel Oct 10, 2025
4c94aae
refined dataset, and variable group.
christophenoel Oct 15, 2025
73bf157
GeoZArr data model refactoring
christophenoel Oct 15, 2025
ed0b386
adapted CF description to unidata feedback (mail)
christophenoel Oct 17, 2025
d53e08a
restrict to the minimum of what we do for the first version, to be ex…
christophenoel Oct 17, 2025
4f68cb6
adapted encodings to CDM model
christophenoel Oct 17, 2025
12 changes: 6 additions & 6 deletions standard/template/geozarr-spec.adoc
@@ -45,23 +45,23 @@ include::sections/clause_6_informative_text.adoc[]

include::sections/clause_7_unified_data_model.adoc[]

include::sections/clause_8_conformance.adoc[]
// Discarded: include::sections/clause_8_conformance.adoc[]

include::sections/clause_9_zarr_encoding.adoc[]

include::sections/clause_10_geotiff_encoding.adoc[]
// include::sections/clause_10_geotiff_encoding.adoc[]

////
add or remove annexes after "A" as necessary
////
include::sections/annex-a.adoc[]
//include::sections/annex-a.adoc[]

include::sections/annex-n.adoc[]
// include::sections/annex-n.adoc[]

////
Revision History should be the last annex before the Bibliography
Bibliography should be the last annex
////
include::sections/annex-history.adoc[]
// include::sections/annex-history.adoc[]

include::sections/annex-bibliography.adoc[]
//include::sections/annex-bibliography.adoc[]
18 changes: 9 additions & 9 deletions standard/template/sections/clause_0_front_material.adoc
@@ -1,21 +1,21 @@
.Preface

The GeoZarr Unified Data Model and Encoding Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. It integrates foundational specifications such as the Unidata Common Data Model (CDM), the CF Conventions, and selected OGC and community standards to enable semantic, structural, and operational interoperability across Earth observation platforms and geospatial ecosystems.

This Standard introduces a unified model that harmonises metadata structures, array-based data representations, coordinate referencing, and multiscale tiling semantics. It provides a coherent framework that facilitates encoding into Zarr v2 and v3, supporting scalable, cloud-native workflows.

The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.
The GeoZarr Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.

This Standard has been developed in collaboration with contributors from Earth observation, climate science, geospatial analysis, and cloud-native geodata infrastructure communities. Future work may extend this model to additional storage formats, API services, and semantic layers.

[abstract]
== Abstract

The GeoZarr Unified Data Model and Encoding Standard specifies a conceptual and implementation framework for representing multidimensional, geospatial datasets using the Zarr format. This Standard builds upon the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, and introduces interoperable constructs for tiling, georeferencing, and metadata integration.
Zarr provides efficient chunked storage for n-dimensional arrays but does not provide the semantic constructs required for geospatial and scientific data workflows.

The model defines core elements—dimensions, coordinate variables, data variables, attributes—and optional extensions for multi-resolution overviews, affine geotransforms, and STAC metadata. Encoding guidance is provided for Zarr Version 2 and Zarr Version 3, including chunking, group hierarchy, and metadata conventions.
GeoZarr defines an abstract data model and a set of conventions for representing geospatial and scientific datasets in the Zarr format:

GeoZarr aims to bridge scientific and geospatial communities by enabling round-trip transformations with formats such as NetCDF and GeoTIFF, and supporting compatibility with tools in the scientific Python and geospatial ecosystems. This Standard enables scalable, standards-compliant, and semantically rich data structures for cloud-native Earth observation applications.
- Bridges the Unidata Common Data Model (CDM) and the Zarr format by defining how the semantic constructs of the CDM are represented within Zarr’s storage model.
- Supports community metadata standards such as CF, GeoTIFF, and GDAL.
- Extends the CDM for geospatial use through multiscale overviews and affine transformations.

By providing a standardized framework for geospatial semantics, GeoZarr enables scientific and geospatial applications to fully utilize cloud-native storage architectures while maintaining the rich metadata and coordinate referencing required for Earth observation workflows. The result is a modern, scalable approach to storing and accessing geospatial data that meets the needs of both data providers and consumers.


I like this introduction.

== Submitters

@@ -29,4 +29,4 @@ All questions regarding this submission should be directed to the editor or the
|Brianna Pagán _(editor)_ | DevSeed
|Ryan Abernathey| EarthMover
| TBD | TBD
|===
|===
30 changes: 27 additions & 3 deletions standard/template/sections/clause_1_scope.adoc
@@ -1,7 +1,31 @@
== Scope

The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards.
The GeoZarr Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic data model, the specification of its encoding into Zarr Version 2 and Version 3, and a set of extensions to support affine transformations and overviews.

This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis.
These capabilities are necessary for geospatial data because Zarr does not provide semantic constructs for geospatial data interpretation. Applications need to understand not just array shapes and values, but coordinate meanings, projection parameters, and scientific metadata. GeoZarr fills this gap without compromising Zarr's performance characteristics.

Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets.
=== Why GeoZarr Exists


I think we may be missing an important clarification to justify the purpose of GeoZarr: conventions for geospatial data in Zarr already exist, as implemented in Xarray, NCZarr, and GDAL. Those conventions primarily translate aspects of the CF/NetCDF data model into Zarr encoding.

However:

  1. The CF/NetCDF data model itself may lack certain capabilities, such as support for multiscale overviews and affine transforms.
  2. The current encoding conventions to Zarr (for example, mapping all NetCDF attributes into Zarr string attributes) may not be optimal and could be revisited.


Zarr, by design, is a low-level container for storing n-dimensional arrays and metadata. While this simplicity is a strength for performance and interoperability, it means Zarr lacks higher-level concepts that geospatial applications require:

* *Coordinate Systems:* No native way to associate spatial or temporal meaning with array dimensions
* *Grid Mappings:* No standard mechanism for projection and coordinate reference system metadata
* *Semantic Metadata:* No conventions for units, standard names, or scientific attributes
* *Variable Relationships:* No formal distinction between coordinate variables and data variables

These concepts are essential for geospatial workflows but must be layered on top of Zarr's array storage. GeoZarr provides this semantic layer through proven standards (Common Data Model and CF conventions) while preserving Zarr's cloud-native advantages.
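
The distinction between coordinate variables, data variables, and grid mappings can be illustrated with a minimal sketch. It assumes a hypothetical in-memory description of variables (the `variables` dictionary and the `classify` helper are illustrative and not part of GeoZarr or any library) and applies the CF/netCDF rule that a coordinate variable is a one-dimensional variable named after its own dimension. The attribute names `standard_name`, `units`, `grid_mapping`, and `grid_mapping_name` come from the CF conventions.

```python
# Hypothetical description of the variables in one group: dimension names
# plus CF-style attributes. The structure itself is an assumption made
# for illustration only.
variables = {
    "time": {"dims": ("time",),
             "attrs": {"standard_name": "time", "units": "days since 2000-01-01"}},
    "y": {"dims": ("y",),
          "attrs": {"standard_name": "projection_y_coordinate", "units": "m"}},
    "x": {"dims": ("x",),
          "attrs": {"standard_name": "projection_x_coordinate", "units": "m"}},
    "temperature": {"dims": ("time", "y", "x"),
                    "attrs": {"standard_name": "air_temperature", "units": "K",
                              "grid_mapping": "crs"}},
    "crs": {"dims": (),
            "attrs": {"grid_mapping_name": "transverse_mercator"}},
}


def classify(variables):
    """Assign each variable a role: coordinate, grid_mapping, or data."""
    # Names referenced by a `grid_mapping` attribute hold CRS metadata.
    grid_mappings = {
        var["attrs"]["grid_mapping"]
        for var in variables.values()
        if "grid_mapping" in var["attrs"]
    }
    roles = {}
    for name, var in variables.items():
        if var["dims"] == (name,):
            # One-dimensional and named after its own dimension.
            roles[name] = "coordinate"
        elif name in grid_mappings:
            roles[name] = "grid_mapping"
        else:
            roles[name] = "data"
    return roles
```

Plain Zarr stores only the arrays and their attributes; the roles computed here are exactly the kind of information that a semantic layer has to make explicit.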

=== Relationship to Zarr Core Concepts

GeoZarr builds upon Zarr's foundational concepts of <<term-store,stores>> and <<term-hierarchy, hierarchies>>. A Zarr store provides the storage and retrieval interface (e.g., filesystem, cloud object storage), while a hierarchy defines the logical tree structure of groups and arrays within that store. GeoZarr specifies how to organize and structure hierarchies to support geospatial semantics, without modifying the underlying store interface.
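
As a rough illustration of the store and hierarchy split, the sketch below models a store as a plain Python dictionary of keys to values and recovers the hierarchy from it. It assumes the Zarr v3 layout, in which each group or array has a `zarr.json` metadata document and chunks use the default `c/...` key encoding. The group and array names, the `list_nodes` helper, and the abbreviated metadata documents (a real array document carries more fields, such as shape and data type) are assumptions for illustration.

```python
import json

# A store reduced to its essence: keys mapped to stored values.
store = {
    "zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}),
    "measurements/zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}),
    # Abbreviated on purpose; not a complete array metadata document.
    "measurements/temperature/zarr.json": json.dumps(
        {"zarr_format": 3, "node_type": "array"}
    ),
    # Placeholder bytes standing in for one encoded chunk.
    "measurements/temperature/c/0/0": b"\x00",
}


def list_nodes(store):
    """Return {hierarchy path: node type} for every metadata document."""
    nodes = {}
    for key, value in store.items():
        if key == "zarr.json" or key.endswith("/zarr.json"):
            # Strip the document name to obtain the node path.
            path = "/" + key[: -len("zarr.json")].rstrip("/")
            nodes[path] = json.loads(value)["node_type"]
    return nodes
```

The same `list_nodes` logic would apply whether the keys live in a dictionary, a filesystem, or an object store, which is why hierarchy conventions can be specified independently of the store interface.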

=== Use Cases and Applications

This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models with operational encoding formats suitable for cloud-native storage and analysis.

Typical use cases include:
* Storage and processing of raster and gridded data
* Management of data cubes with temporal or vertical dimensions
* Integration with catalogue systems through standardized metadata
* Multi-resolution tiling for efficient visualization and analysis
* Cloud-optimized access to large geospatial datasets
2 changes: 2 additions & 0 deletions standard/template/sections/clause_2_conformance.adoc
@@ -1,5 +1,7 @@
== Conformance

> WARNING: This section should be ignored and requirements classes should be designed and summarized here once the specification is completed.

The GeoZarr Unified Data Model is structured around a modular set of requirements classes. These classes define the conformance criteria for datasets and implementations adopting the GeoZarr specification. Each class provides a distinct set of structural or semantic expectations, facilitating interoperability across a broad spectrum of geospatial and scientific use cases.

The *Core* requirements class defines the minimal compliance necessary to claim conformance with the GeoZarr Unified Data Model. It is intentionally open and permissive, supporting incremental adoption and broad compatibility with existing Zarr tools and data models based on the Unidata Common Data Model (CDM).
37 changes: 28 additions & 9 deletions standard/template/sections/clause_4_terms_and_definitions.adoc
@@ -2,18 +2,34 @@

=== Terms and definitions

The GeoZarr specification inherits terms from the following sources:

* https://docs.unidata.ucar.edu/netcdf-java/5.2/userguide/common_data_model_overview.html#data-access-layer-object-model[Unidata Common Data Model]

* https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[Zarr concepts and terminology].


==== affine transformation

An affine transformation is a geometric mapping that preserves points, straight lines, and parallelism. It combines linear transformations (such as rotation, scaling, reflection, or shear) with translation.
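
A minimal sketch of such a mapping, assuming the six-coefficient ordering of the GDAL GeoTransform convention. The `pixel_to_world` helper and the example coefficients are hypothetical and only illustrate how grid indices are converted to georeferenced coordinates.

```python
from typing import Tuple

# Coefficients in GDAL GeoTransform order (assumed for this sketch):
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pixel_to_world(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a (col, row) grid position to (x, y) georeferenced coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# Hypothetical north-up grid: 10 m pixels, upper-left corner at (500000, 4100000).
example_gt: GeoTransform = (500000.0, 10.0, 0.0, 4100000.0, 0.0, -10.0)
```

With the rotation terms set to zero the transform reduces to a scale plus a translation; non-zero rotation terms describe rotated or sheared grids.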


==== array

A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band.

==== chunk

A sub-array representing a partition of a larger array, used to optimise data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
A sub-array representing a partition of a larger array, used to optimize data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
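
The partitioning can be sketched with simple integer arithmetic, assuming a regular chunk grid in which every chunk has the same shape (edge chunks may be only partially filled). The helper names are hypothetical.

```python
from typing import Tuple


def chunk_grid_shape(array_shape: Tuple[int, ...],
                     chunk_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(array_shape, chunk_shape))


def locate(index: Tuple[int, ...], chunk_shape: Tuple[int, ...]):
    """Return (chunk index, offset inside that chunk) for an array index."""
    chunk_index = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_index, offset
```

Reading a single element therefore requires fetching and decompressing only the one chunk that contains it.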

==== coordinate variable

A one-dimensional array whose values define the coordinate system for a dimension of one or more data variables. Typical examples include latitude, longitude, time, or vertical levels.

==== data model

A data model is an *abstract*, conceptual framework that defines how data is structured, organized, and interpreted, independent of any particular storage medium or implementation. In contrast, a file format represents a concrete realization of this model, defining how the data is stored on disk.

==== data variable

An array containing the primary geospatial or scientific measurements of interest (e.g., temperature, reflectance). Data variables are defined over one or more dimensions and associated with attributes.
@@ -22,29 +38,32 @@ An array containing the primary geospatial or scientific measurements of interes

An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`).

==== group
==== dataset

A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).
*Avoid using:* this term is overloaded and is avoided in this document. A dataset usually represents a self-contained group of variables within a hierarchical data structure. These variables often share one or more dimensions, and the dataset is the unit that can be opened by a data access library (see <<variable-group,variable group>>).

==== metadata

Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.

==== multiscale dataset
==== overview

A downscaled representation of a variable that facilitates rapid data display and efficient zooming. Overviews provide lower-resolution versions of the original data, enabling quick visualization and access without reading the full-resolution array. Multiple overview levels may be generated to support progressive rendering across different scales.
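
As an illustration, the shapes of successive overview levels can be derived from the full-resolution shape. The sketch assumes a decimation factor of 2 per level and a stopping rule based on a minimum size; neither value is prescribed by the text above, and the `overview_shapes` helper is hypothetical.

```python
from typing import List, Tuple


def overview_shapes(shape: Tuple[int, int],
                    factor: int = 2,
                    min_size: int = 256) -> List[Tuple[int, int]]:
    """Shapes of successive overview levels for a 2D (rows, cols) array."""
    rows, cols = shape
    levels = []
    # Keep reducing until the larger side fits within min_size.
    while max(rows, cols) > min_size:
        rows = -(-rows // factor)  # ceiling division keeps edge pixels
        cols = -(-cols // factor)
        levels.append((rows, cols))
    return levels
```

Each level holds roughly a quarter of the pixels of the previous one, so a viewer can select the coarsest level that still satisfies the requested display resolution.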

==== store

A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set.
A system that provides storage and retrieval operations for Zarr hierarchies, as defined in the https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#stores[Zarr core specification]. A store implements the abstract store interface and can be backed by various storage technologies such as filesystems, cloud object storage, or databases. GeoZarr hierarchies are stored within and accessed through Zarr stores.

==== tile matrix set

A spatial tiling scheme defined by a hierarchy of zoom levels and consistent grid parameters (e.g., scale, CRS). Tile Matrix Sets enable spatial indexing and tiling of gridded data.

==== transform
[[variable-group]]
==== variable group

An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention.
A variable group is a container holding a coherent collection of variables that share the same dimensional structure and coordinate system; it may also contain additional variables or subgroups. It is conceptually equivalent to an xarray Dataset.
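
A minimal sketch of the "same dimensional structure" idea, using hypothetical `Variable` and `VariableGroup` classes that are not part of GeoZarr, Zarr, or xarray. It checks that every variable in the group agrees on the size of each named dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Variable:
    dims: Tuple[str, ...]   # dimension names, e.g. ("time", "y", "x")
    shape: Tuple[int, ...]  # size along each dimension


@dataclass
class VariableGroup:
    variables: Dict[str, Variable] = field(default_factory=dict)

    def dimension_sizes(self) -> Dict[str, int]:
        """Collect the shared dimension sizes, rejecting any mismatch."""
        sizes: Dict[str, int] = {}
        for name, var in self.variables.items():
            for dim, size in zip(var.dims, var.shape):
                # setdefault records the first size seen for a dimension.
                if sizes.setdefault(dim, size) != size:
                    raise ValueError(
                        f"{name!r}: dimension {dim!r} has size {size}, "
                        f"expected {sizes[dim]}"
                    )
        return sizes
```

This mirrors the behaviour of data access libraries that refuse to assemble variables with conflicting dimension sizes into one unit.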

==== unified data model (UDM)

A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations.

=== Abbreviated Terms

28 changes: 6 additions & 22 deletions standard/template/sections/clause_6_informative_text.adoc
@@ -1,30 +1,14 @@
[[overview]]
== Overview

The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing multidimensional geospatial data using the Zarr format. Developed under the guidance of the OGC GeoZarr Standards Working Group (SWG), the Standard establishes conventions for encoding scientific and Earth observation datasets in a way that promotes scalability, interoperability, and compatibility with cloud-native infrastructure.
The **GeoZarr Standard** defines an **abstract data model** and a set of **conventions** for representing and describing geospatial and scientific datasets using the **Zarr** format.

GeoZarr is built on widely adopted community standards, including the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions. It introduces additional extensions and structural constructs to support multi-resolution tiling, geospatial referencing, and catalogue-enabled metadata integration (e.g., STAC).
Zarr provides efficient, chunked storage for n-dimensional arrays but does not include the semantic constructs required for geospatial and scientific data workflows. The **Unidata Common Data Model (CDM)** addresses this gap by introducing essential concepts that structure information through **variables**, **groups**, **coordinates**, and **metadata**. This abstract data model provides the semantic framework that enables structured interpretation of array-based data on top of Zarr’s storage foundation.

This Standard provides both:
The **primary objective** of GeoZarr is to specify how the **CDM** is encoded within Zarr. GeoZarr provides normative rules for encoding these CDM concepts in Zarr and thereby standardises the encoding practices already adopted by CDM-compatible libraries such as **xarray** and **nczarr**, promoting consistent interpretation and interoperability across tools and platforms.

* **Core requirements**, which define minimal compliance to represent array-based datasets using CDM constructs in Zarr, supporting open and permissive adoption across use cases.
* **Modular extension classes**, which define additional capabilities such as time series support, affine geotransform referencing, multi-resolution overviews, and projection coordinates, in line with OGC and community practices.
By defining an **abstract model** based on the **CDM** and a corresponding **encoding for Zarr**, GeoZarr establishes an explicit relationship between **the conceptual structure of the data** and **its physical storage representation**. Zarr defines how data are stored and accessed as chunked, hierarchical arrays, while GeoZarr specifies how this stored structure represents the scientific and geospatial meaning of the dataset.

These modular components enable GeoZarr to serve a wide range of applications—from basic EO data storage to high-performance, cloud-native visualisation and analytics workflows.

=== Encodings

GeoZarr supports encoding in both Zarr Version 2 and Zarr Version 3. Each version defines how arrays, groups, and metadata are stored within a directory-based structure. All metadata is encoded in JSON-compatible formats, ensuring both human readability and machine interoperability.

Encoding guidelines include:

* Hierarchical grouping of datasets via Zarr groups.
* Dimension indexing and binding via dimension metadata.
* Attribute-based metadata compliant with CF conventions.
* Multi-resolution overviews aligned with OGC Tile Matrix Sets.
* Optional integration of STAC metadata for discovery and cataloguing.

JSON is the primary format for metadata, attributes, and structural declarations. Implementations are encouraged to support standardised naming conventions, EPSG code references, and structured metadata to facilitate search, validation, and transformation across platforms.

GeoZarr does not prescribe a single interface for data access. Instead, it enables **serverless and cloud-native** data access strategies by aligning its model with chunked, parallelisable storage patterns that are optimised for use in object stores and analytical environments.
As a **secondary objective**, GeoZarr extends the **CDM base layer** with additional capabilities required for geospatial and cloud-native applications. These extensions include **multiscale overviews**, which enable the representation of data at multiple levels of detail, and **affine transformations**, which define the spatial relationship between data coordinates and real-world locations. All extensions remain fully aligned with the CDM framework.

The **CDM** base layer also provides a **generic framework** capable of hosting metadata from a wide range of community standards. GeoZarr encourages the use of the **Climate and Forecast (CF) Conventions**, which are themselves defined around the CDM model, without imposing them as mandatory. This flexibility also supports metadata from other domain-specific standards such as **GeoTIFF**, **GDAL**, and similar geospatial conventions.