GitHub - QianyeSu/GaussRegular: ECMWF reduced Gaussian (N/O) to regular Gaussian converter

A lightweight, high-performance Python library for converting ECMWF reduced Gaussian grids (N-grids and O-grids) to regular Gaussian grids. Designed for efficient in-memory processing with zero intermediate file I/O.

Overview

GaussRegular specializes in converting ECMWF reduced Gaussian grids to full (regular) Gaussian grids:

N-grids: Classical reduced Gaussian grids (e.g., N320, N640)
O-grids: Octahedral reduced Gaussian grids (e.g., O320, O640, O1280)

Both grid types carry a pl array (points per latitude row) and use identical row-wise interpolation logic, matching CDO's setgridtype,regular and setgridtype,regularnn behavior.

Features

Dual grid support: Handles both classical N-grids (e.g., N320, N640) and octahedral O-grids (e.g., O320, O640, O1280)
Two interpolation modes:
- Bilinear (default) — matching CDO setgridtype,regular
- Nearest-neighbor — matching CDO setgridtype,regularnn
C-accelerated: Core interpolation in compiled C, ~10-100x faster than pure NumPy
In-memory processing: Read GRIB2 data → convert → analyze → output, no intermediate files
Configurable precision: precision="auto" preserves float32/float64 input precision; single and double force the output precision
xarray/cfgrib integration: Direct support for GRIB2 data
Batch processing: Efficient multi-dimensional array handling (time, level, etc.)

Installation

pip install gaussregular

With xarray support:

pip install "gaussregular[xarray]"

Quick Start

Case 1: Direct Numpy Array Conversion

If you have a NumPy array with known grid metadata:

import numpy as np
import gaussregular as gr

# Your reduced Gaussian data (1D array, one value per grid point)
values = np.random.rand(1040640)  # Example data

# Gaussian row length array (from GRIB metadata, e.g., GRIB_pl)
pl = np.array([20, 27, 36, 40, 45, ..., 360, 360], dtype=np.int32)  # 161 latitude rows in this example

# Create regularizer
regularizer = gr.GaussRegularizer(method="linear")

# Convert to regular Gaussian (2D array)
regular_data, lon = regularizer.regularize_values(
    values=values,
    pl=pl,
    missval=np.nan
)

print(f"Output shape: {regular_data.shape}")  # e.g. (161, 320) in this example
print(f"Longitude points: {lon.size}")

Case 2: xarray DataArray (from cfgrib)

If you're reading GRIB2 files with cfgrib:

import xarray as xr
import gaussregular as gr

# Open GRIB2 file with cfgrib backend
ds = xr.open_dataset("era5_sample.grib", engine="cfgrib")
da = ds["temperature"]  # xarray.DataArray

# Create regularizer
regularizer = gr.GaussRegularizer(method="linear", cache=True)

# Convert directly
regular_da = regularizer.regularize_xarray(da)

print(f"Regular grid shape: {regular_da.shape}")
print(f"Coordinates preserved: {list(regular_da.coords.keys())}")

The library automatically extracts GRIB_pl, grid type, and other metadata from the DataArray attributes.

Case 3: Multi-dimensional Data (Time series, Levels)

Batch processing with leading dimensions:

import xarray as xr
import gaussregular as gr

# Multi-dimensional xarray: (time, level, values)
ds = xr.open_dataset("era5_multi_level.grib", engine="cfgrib")
da = ds["temperature"]  # Shape: (24, 137, 1040640) in this example

regularizer = gr.GaussRegularizer(method="linear", cache=True)
regular_da = regularizer.regularize_xarray(da)

print(f"Output shape: {regular_da.shape}")  # e.g. (24, 137, 161, 320) in this example

The regularizer automatically handles reshaping and maintains coordinate dimensions.

Case 4: xarray Dataset (Multiple Variables)

You can pass a full xr.Dataset directly. All convertible reduced-Gaussian variables are regularized.

import xarray as xr
import gaussregular as gr

ds = xr.open_dataset("era5_multi_vars.grib", engine="cfgrib")

regularizer = gr.GaussRegularizer(method="linear", cache=True)
regular_ds = regularizer.regularize_dataset(ds)

# Or use auto dispatch:
# regular_ds = regularizer.regularize(ds)

print(regular_ds)

Case 5: Regional Sub-area Data

For data covering only a portion of the globe (e.g., a regional cut from a model run), specify the longitude bounds and grid number:

import numpy as np
import gaussregular as gr

# Regional reduced Gaussian data
values = np.random.rand(342080)  # Subset of N320 (e.g., Europe region)
pl = np.array([...])  # Row lengths for the region

regularizer = gr.GaussRegularizer(method="linear")

# Specify regional bounds and grid number for proper output sizing
regular_data, lon = regularizer.regularize_values(
    values=values,
    pl=pl,
    missval=np.nan,
    grid_number=320,  # N320 grid
    xfirst=0.0,       # Start longitude (degrees east)
    xlast=40.0,       # End longitude (degrees east)
)

print(f"Regional grid shape: {regular_data.shape}")
print(f"Longitude range: {lon[0]:.2f}–{lon[-1]:.2f}°E")

Parameters for regional data:

grid_number (int): Gaussian truncation number (e.g., 320 for N320). Required to distinguish global vs. regional grids.
xfirst (float, default=0.0): Longitude of the first data point (degrees east, range 0–360).
xlast (float, default=359.9999): Longitude of the last data point (degrees east).

The library automatically infers the output longitude count based on the regional bounds.

Advanced: Fast Mode (Default)

By default fast=True is used for best performance. The linear interpolation kernel handles missing values (NaN / sentinel) correctly even in fast mode — if both interpolation neighbors are missing the output is missing, otherwise the valid neighbor is used. Set fast=False only if you need per-row missing-value scanning for strict correctness on exotic sentinel values:

import gaussregular as gr

engine = gr.GaussRegularizer(method="linear", cache=True)

# Default: fast=True — recommended for all ERA5/GRIB data
regular_ds = engine.regularize_dataset(ds)

# Safe mode: per-row missing-value scan (slower, ~1.8x)
regular_ds_safe = engine.regularize_dataset(ds, fast=False)

Advanced: Precision

By default precision="auto" is used. Float32 inputs use the single-precision C path and return float32 outputs; float64 inputs use the double-precision C path and return float64 outputs. Use precision="single" or precision="double" to force a specific output precision:

import gaussregular as gr

engine = gr.GaussRegularizer(method="linear", cache=True)

regular_auto = engine.regularize_xarray(da)  # follows da.dtype for float32/float64
regular_f32 = engine.regularize_xarray(da, precision="single")
regular_f64 = engine.regularize_xarray(da, precision="double")

API Reference

`GaussRegularizer`

Main converter class.

Parameters:

method (str, default="linear"): Interpolation method, either "linear" or "nearest"
precision (str, default="auto"): Precision policy, one of "auto", "single", or "double". Auto preserves float32/float64 input precision
grid_number (int, optional): Default Gaussian truncation number N (e.g., 320 for N320/O320). Used when GRIB_N is missing for xarray inputs and as a fallback when regularize_values is called without a per-call grid_number. When neither metadata nor this attribute provide N, a heuristic based on the equatorial row length is used
cache (bool, default=False): Enable xarray metadata plan caching for repeated calls on same-structure data
max_plan_cache (int, default=32): Maximum number of cached conversion plans

Methods:

regularize(data, nlon=None, grid_type_hint=None, method=None)

Auto-detects input type (xr.Dataset, xr.DataArray, or NumPy) and calls appropriate method
Returns regularized data in same type as input

regularize_values(values, pl, missval, nlon=None, method=None, fast=True)

values: 1D NumPy array (len(values) == sum(pl))
pl: 1D NumPy array of row lengths (from GRIB_pl)
missval: Missing-value sentinel (float). Points equal to this value (or NaN, if missval is NaN) are treated as missing and handled by the interpolation kernel
nlon (optional): Number of columns in output regular grid. Auto-inferred if not provided
method (optional): Per-call method override: "linear" or "nearest"
fast (default=True): Per-row missing-value scan toggle. True (recommended) lets the interpolation kernel handle NaN/sentinel natively. Set False for explicit per-row scanning
Returns: (out, lon) where out is NumPy array and lon is longitude vector

regularize_xarray(dataarray, nlon=None, grid_type_hint=None, method=None, fast=True)

dataarray: xarray.DataArray with GRIB metadata in .attrs
Requires: GRIB_pl attribute. If GRIB_missingValue is present it is used as missval; otherwise NaN values are treated as missing
method (optional): Per-call method override: "linear" or "nearest"
fast (default=True): Per-row missing-value scan toggle
Returns: xarray.DataArray with regularized data and preserved coordinates

regularize_dataset(dataset, nlon=None, grid_type_hint=None, method=None, fast=True)

dataset: xarray.Dataset with one or more reduced Gaussian variables
Converts each convertible variable and keeps non-convertible variables unchanged
method (optional): Per-call method override: "linear" or "nearest"
fast (default=True): Per-row missing-value scan toggle
Returns: xarray.Dataset

clear_cache()

Clears internal xarray plan cache

Module-level Functions

regularize_values(values, pl, missval, nlon=None, method="linear", fast=True)

Standalone function using default regularizer configuration

regularize_xarray(dataarray, grid_type_hint=None, method="linear", fast=True)

Standalone function using default regularizer configuration

regularize_dataset(dataset, grid_type_hint=None, method="linear", fast=True)

Standalone function using default regularizer configuration

Grid Specifics

N-grids (Classical Gaussian)

Regular spacing in longitude
Example: N320 → 4×N = 1280 columns at the equator in the regular grid
Total reduced-grid points: 4×N×(2N+1)

O-grids (Octahedral)

Refinement at equator for better accuracy
Example: O320, O640, O1280
Formula: nlon = 4×N + 16 for O-grid

The library automatically detects grid type from GRIB metadata. You can also override with grid_type_hint parameter.

Supported Grid Types

GaussRegular recognizes many GRIB metadata variants:

reduced_gg
reduced-gg
reduced gaussian
octahedral_gg
octahedral_reduced_gg
And several aliases

See GRID_TYPE_FULL_NAMES in the API for the full list.

Performance Tuning

Interpolation Method

method="linear" (default): Bilinear interpolation. Slightly slower but more accurate.
method="nearest": Nearest-neighbor. Faster, less smooth.

Missing-Value Detection

fast=True (default): Recommended mode. The linear interpolation kernel correctly propagates missing values at boundaries (both neighbors missing → output missing). Provides maximum throughput.
fast=False: Per-row missing-value scan before interpolation. Use only if you have exotic non-NaN sentinel values that require explicit detection. Typical slowdown: ~1.8x.

Caching

Enable cache=True when processing multiple DataArrays with identical structure to reuse parsed metadata.

Typical throughput:

On a typical ERA5 model-level example (4 time steps × 137 levels, N96/O96 grids), a full-field linear interpolation runs in ~0.24 s on a modern desktop CPU

Limitations

Missing values are propagated by the interpolation kernel — NaN/sentinel inputs produce NaN outputs at affected grid points by default (fast=True). For strict per-row scanning use fast=False
Requires GRIB metadata — pl array must be provided or in DataArray attrs
Supports only Gaussian grids — other map projections not supported
Precision: Outputs double-precision computation; results cast back to input dtype
No projection support — only Gaussian grid conversions

Testing & Validation

GaussRegular has been validated against CDO setgridtype,regular and setgridtype,regularnn with double-precision agreement to machine epsilon.

Test with real ERA5 data:

python examples/run_era5_demo.py

Requirements

Python: 3.9+
NumPy: ≥1.23
Optional: xarray ≥2023.1, cfgrib ≥0.9.10

Citation

If you use GaussRegular in research, please cite:

@software{gaussregular,
  author = {Qianye Su},
  title = {GaussRegular: ECMWF Reduced Gaussian to Regular Grid Converter},
  year = {2026},
  url = {https://github.com/QianyeSu/GaussRegular}
}

Troubleshooting

"Missing GRIB_pl in DataArray attrs"

Ensure you opened the file with engine="cfgrib"
Check that cfgrib correctly parsed the GRIB2 file

"Input does not look like reduced Gaussian grid"

Verify your file contains a reduced Gaussian grid (not full Gaussian, lat-lon, or other projection)
Pass grid_type_hint if metadata is nonstandard

"Results look wrong near missing values"

Ensure you are not using fast=True when missing values are present
Verify that missing values are either NaN (and missval is left as default) or match the GRIB_missingValue/missval you pass
If necessary, preprocess your data（例如插值或填充）以减少大块缺测区域对结果的影响

Performance is slower than expected

Use method="linear" unless you specifically want nearest-neighbour behavior
Use caching: regularizer.cache=True for repeated calls
Profile with larger batch operations instead of single conversions

Contributing

Contributions welcome! Please:

Read the source code and existing patterns
Write tests for new features
Ensure CI passes before submitting PR

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.github		.github
assets		assets
src/gaussregular		src/gaussregular
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
meson.build		meson.build
meson_options.txt		meson_options.txt
pyproject.toml		pyproject.toml
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Overview

Features

Installation

With xarray support:

Quick Start

Case 1: Direct Numpy Array Conversion

Case 2: xarray DataArray (from cfgrib)

Case 3: Multi-dimensional Data (Time series, Levels)

Case 4: xarray Dataset (Multiple Variables)

Case 5: Regional Sub-area Data

Advanced: Fast Mode (Default)

Advanced: Precision

API Reference

GaussRegularizer

Parameters:

Methods:

Module-level Functions

Grid Specifics

N-grids (Classical Gaussian)

O-grids (Octahedral)

Supported Grid Types

Performance Tuning

Interpolation Method

Missing-Value Detection

Caching

Limitations

Testing & Validation

Requirements

Citation

Troubleshooting

"Missing GRIB_pl in DataArray attrs"

"Input does not look like reduced Gaussian grid"

"Results look wrong near missing values"

Performance is slower than expected

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`GaussRegularizer`

Packages