A lightweight, high-performance Python library for converting ECMWF reduced Gaussian grids (N-grids and O-grids) to regular Gaussian grids. Designed for efficient in-memory processing with zero intermediate file I/O.
GaussRegular specializes in converting ECMWF reduced Gaussian grids to full (regular) Gaussian grids:
- N-grids: Classical reduced Gaussian grids (e.g., N320, N640)
- O-grids: Octahedral reduced Gaussian grids (e.g., O320, O640, O1280)
Both grid types carry a pl array (points per latitude row) and use identical row-wise interpolation logic, matching CDO's setgridtype,regular and setgridtype,regularnn behavior.
- Dual grid support: Handles both classical N-grids (e.g., N320, N640) and octahedral O-grids (e.g., O320, O640, O1280)
- Two interpolation modes:
- Bilinear (default) — matching CDO
setgridtype,regular - Nearest-neighbor — matching CDO
setgridtype,regularnn
- Bilinear (default) — matching CDO
- C-accelerated: Core interpolation in compiled C, ~10-100x faster than pure NumPy
- In-memory processing: Read GRIB2 data → convert → analyze → output, no intermediate files
- Configurable precision:
precision="auto"preserves float32/float64 input precision;singleanddoubleforce the output precision - xarray/cfgrib integration: Direct support for GRIB2 data
- Batch processing: Efficient multi-dimensional array handling (time, level, etc.)
pip install gaussregularpip install "gaussregular[xarray]"If you have a NumPy array with known grid metadata:
import numpy as np
import gaussregular as gr
# Your reduced Gaussian data (1D array, one value per grid point)
values = np.random.rand(1040640) # Example data
# Gaussian row length array (from GRIB metadata, e.g., GRIB_pl)
pl = np.array([20, 27, 36, 40, 45, ..., 360, 360], dtype=np.int32) # 161 latitude rows in this example
# Create regularizer
regularizer = gr.GaussRegularizer(method="linear")
# Convert to regular Gaussian (2D array)
regular_data, lon = regularizer.regularize_values(
values=values,
pl=pl,
missval=np.nan
)
print(f"Output shape: {regular_data.shape}") # e.g. (161, 320) in this example
print(f"Longitude points: {lon.size}")If you're reading GRIB2 files with cfgrib:
import xarray as xr
import gaussregular as gr
# Open GRIB2 file with cfgrib backend
ds = xr.open_dataset("era5_sample.grib", engine="cfgrib")
da = ds["temperature"] # xarray.DataArray
# Create regularizer
regularizer = gr.GaussRegularizer(method="linear", cache=True)
# Convert directly
regular_da = regularizer.regularize_xarray(da)
print(f"Regular grid shape: {regular_da.shape}")
print(f"Coordinates preserved: {list(regular_da.coords.keys())}")The library automatically extracts GRIB_pl, grid type, and other metadata from the DataArray attributes.
Batch processing with leading dimensions:
import xarray as xr
import gaussregular as gr
# Multi-dimensional xarray: (time, level, values)
ds = xr.open_dataset("era5_multi_level.grib", engine="cfgrib")
da = ds["temperature"] # Shape: (24, 137, 1040640) in this example
regularizer = gr.GaussRegularizer(method="linear", cache=True)
regular_da = regularizer.regularize_xarray(da)
print(f"Output shape: {regular_da.shape}") # e.g. (24, 137, 161, 320) in this exampleThe regularizer automatically handles reshaping and maintains coordinate dimensions.
You can pass a full xr.Dataset directly. All convertible reduced-Gaussian variables are regularized.
import xarray as xr
import gaussregular as gr
ds = xr.open_dataset("era5_multi_vars.grib", engine="cfgrib")
regularizer = gr.GaussRegularizer(method="linear", cache=True)
regular_ds = regularizer.regularize_dataset(ds)
# Or use auto dispatch:
# regular_ds = regularizer.regularize(ds)
print(regular_ds)For data covering only a portion of the globe (e.g., a regional cut from a model run), specify the longitude bounds and grid number:
import numpy as np
import gaussregular as gr
# Regional reduced Gaussian data
values = np.random.rand(342080) # Subset of N320 (e.g., Europe region)
pl = np.array([...]) # Row lengths for the region
regularizer = gr.GaussRegularizer(method="linear")
# Specify regional bounds and grid number for proper output sizing
regular_data, lon = regularizer.regularize_values(
values=values,
pl=pl,
missval=np.nan,
grid_number=320, # N320 grid
xfirst=0.0, # Start longitude (degrees east)
xlast=40.0, # End longitude (degrees east)
)
print(f"Regional grid shape: {regular_data.shape}")
print(f"Longitude range: {lon[0]:.2f}–{lon[-1]:.2f}°E")Parameters for regional data:
grid_number(int): Gaussian truncation number (e.g., 320 for N320). Required to distinguish global vs. regional grids.xfirst(float, default=0.0): Longitude of the first data point (degrees east, range 0–360).xlast(float, default=359.9999): Longitude of the last data point (degrees east).
The library automatically infers the output longitude count based on the regional bounds.
By default fast=True is used for best performance. The linear interpolation kernel handles missing values (NaN / sentinel) correctly even in fast mode — if both interpolation neighbors are missing the output is missing, otherwise the valid neighbor is used. Set fast=False only if you need per-row missing-value scanning for strict correctness on exotic sentinel values:
import gaussregular as gr
engine = gr.GaussRegularizer(method="linear", cache=True)
# Default: fast=True — recommended for all ERA5/GRIB data
regular_ds = engine.regularize_dataset(ds)
# Safe mode: per-row missing-value scan (slower, ~1.8x)
regular_ds_safe = engine.regularize_dataset(ds, fast=False)By default precision="auto" is used. Float32 inputs use the single-precision C path and return float32 outputs; float64 inputs use the double-precision C path and return float64 outputs. Use precision="single" or precision="double" to force a specific output precision:
import gaussregular as gr
engine = gr.GaussRegularizer(method="linear", cache=True)
regular_auto = engine.regularize_xarray(da) # follows da.dtype for float32/float64
regular_f32 = engine.regularize_xarray(da, precision="single")
regular_f64 = engine.regularize_xarray(da, precision="double")Main converter class.
method(str, default="linear"): Interpolation method, either"linear"or"nearest"precision(str, default="auto"): Precision policy, one of"auto","single", or"double". Auto preserves float32/float64 input precisiongrid_number(int, optional): Default Gaussian truncation number N (e.g., 320 for N320/O320). Used whenGRIB_Nis missing for xarray inputs and as a fallback whenregularize_valuesis called without a per-callgrid_number. When neither metadata nor this attribute provide N, a heuristic based on the equatorial row length is usedcache(bool, default=False): Enable xarray metadata plan caching for repeated calls on same-structure datamax_plan_cache(int, default=32): Maximum number of cached conversion plans
regularize(data, nlon=None, grid_type_hint=None, method=None)
- Auto-detects input type (
xr.Dataset,xr.DataArray, or NumPy) and calls appropriate method - Returns regularized data in same type as input
regularize_values(values, pl, missval, nlon=None, method=None, fast=True)
values: 1D NumPy array (len(values) == sum(pl))pl: 1D NumPy array of row lengths (from GRIB_pl)missval: Missing-value sentinel (float). Points equal to this value (or NaN, ifmissvalis NaN) are treated as missing and handled by the interpolation kernelnlon(optional): Number of columns in output regular grid. Auto-inferred if not providedmethod(optional): Per-call method override:"linear"or"nearest"fast(default=True): Per-row missing-value scan toggle.True(recommended) lets the interpolation kernel handle NaN/sentinel natively. SetFalsefor explicit per-row scanning- Returns:
(out, lon)whereoutis NumPy array andlonis longitude vector
regularize_xarray(dataarray, nlon=None, grid_type_hint=None, method=None, fast=True)
dataarray: xarray.DataArray with GRIB metadata in.attrs- Requires:
GRIB_plattribute. IfGRIB_missingValueis present it is used asmissval; otherwise NaN values are treated as missing method(optional): Per-call method override:"linear"or"nearest"fast(default=True): Per-row missing-value scan toggle- Returns: xarray.DataArray with regularized data and preserved coordinates
regularize_dataset(dataset, nlon=None, grid_type_hint=None, method=None, fast=True)
dataset: xarray.Dataset with one or more reduced Gaussian variables- Converts each convertible variable and keeps non-convertible variables unchanged
method(optional): Per-call method override:"linear"or"nearest"fast(default=True): Per-row missing-value scan toggle- Returns: xarray.Dataset
clear_cache()
- Clears internal xarray plan cache
regularize_values(values, pl, missval, nlon=None, method="linear", fast=True)
- Standalone function using default regularizer configuration
regularize_xarray(dataarray, grid_type_hint=None, method="linear", fast=True)
- Standalone function using default regularizer configuration
regularize_dataset(dataset, grid_type_hint=None, method="linear", fast=True)
- Standalone function using default regularizer configuration
- Regular spacing in longitude
- Example: N320 → 4×N = 1280 columns at the equator in the regular grid
- Total reduced-grid points: 4×N×(2N+1)
- Refinement at equator for better accuracy
- Example: O320, O640, O1280
- Formula: nlon = 4×N + 16 for O-grid
The library automatically detects grid type from GRIB metadata. You can also override with grid_type_hint parameter.
GaussRegular recognizes many GRIB metadata variants:
reduced_ggreduced-ggreduced gaussianoctahedral_ggoctahedral_reduced_gg- And several aliases
See GRID_TYPE_FULL_NAMES in the API for the full list.
method="linear"(default): Bilinear interpolation. Slightly slower but more accurate.method="nearest": Nearest-neighbor. Faster, less smooth.
fast=True(default): Recommended mode. The linear interpolation kernel correctly propagates missing values at boundaries (both neighbors missing → output missing). Provides maximum throughput.fast=False: Per-row missing-value scan before interpolation. Use only if you have exotic non-NaN sentinel values that require explicit detection. Typical slowdown: ~1.8x.
- Enable
cache=Truewhen processing multiple DataArrays with identical structure to reuse parsed metadata.
Typical throughput:
- On a typical ERA5 model-level example (4 time steps × 137 levels, N96/O96 grids), a full-field linear interpolation runs in ~0.24 s on a modern desktop CPU
- Missing values are propagated by the interpolation kernel — NaN/sentinel inputs produce NaN outputs at affected grid points by default (
fast=True). For strict per-row scanning usefast=False - Requires GRIB metadata —
plarray must be provided or in DataArray attrs - Supports only Gaussian grids — other map projections not supported
- Precision: Outputs double-precision computation; results cast back to input dtype
- No projection support — only Gaussian grid conversions
GaussRegular has been validated against CDO setgridtype,regular and setgridtype,regularnn with double-precision agreement to machine epsilon.
Test with real ERA5 data:
python examples/run_era5_demo.py- Python: 3.9+
- NumPy: ≥1.23
- Optional: xarray ≥2023.1, cfgrib ≥0.9.10
If you use GaussRegular in research, please cite:
@software{gaussregular,
author = {Qianye Su},
title = {GaussRegular: ECMWF Reduced Gaussian to Regular Grid Converter},
year = {2026},
url = {https://github.com/QianyeSu/GaussRegular}
}
- Ensure you opened the file with
engine="cfgrib" - Check that cfgrib correctly parsed the GRIB2 file
- Verify your file contains a reduced Gaussian grid (not full Gaussian, lat-lon, or other projection)
- Pass
grid_type_hintif metadata is nonstandard
- Ensure you are not using
fast=Truewhen missing values are present - Verify that missing values are either NaN (and
missvalis left as default) or match theGRIB_missingValue/missvalyou pass - If necessary, preprocess your data(例如插值或填充)以减少大块缺测区域对结果的影响
- Use
method="linear"unless you specifically want nearest-neighbour behavior - Use caching:
regularizer.cache=Truefor repeated calls - Profile with larger batch operations instead of single conversions
Contributions welcome! Please:
- Read the source code and existing patterns
- Write tests for new features
- Ensure CI passes before submitting PR