# XArray Environmental Data Services
### macOS

```bash
brew install netcdf4 hdf5 geos proj eccodes
```

Then install with `uv` in a virtualenv:
```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```

Or install with `pip` in a virtualenv:
```bash
virtualenv -p python3 env/
source env/bin/activate
pip install -r requirements.txt
```

### Build the React app
```bash
cd viewer/
npm install
npm run build
```

Run the following in the activated virtualenv:
```bash
DATASETS_MAPPING_FILE=./test.json python app.py
```

`DATASETS_MAPPING_FILE` is the path to the dataset key-value store described below. You can now navigate to http://localhost:8090/docs to see the supported operations.
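As a quick smoke test, a minimal mapping file can be generated programmatically before launching the app. This is a sketch: the dataset id and s3 path below are placeholders, not real datasets.

```python
import json

# A minimal dataset mapping: keys are dataset ids, values describe how
# to open each dataset. The id and path below are placeholders.
mapping = {
    "my_dataset": {
        "path": "s3://my-bucket/my_dataset_kerchunk.json",
        "type": "kerchunk",
        "chunks": {},
    }
}

# Write the mapping where DATASETS_MAPPING_FILE will point.
with open("test.json", "w") as f:
    json.dump(mapping, f, indent=2)

# Launch with: DATASETS_MAPPING_FILE=./test.json python app.py
```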
### Docker

The docker container for the app can be built with:
```bash
docker build -t xreds:latest .
```

There are also build arguments available when building the docker image:
- `ROOT_PATH`: The root path the app will be served from. Defaults to `/xreds/`.
- `WORKERS`: The number of gunicorn workers to run. Defaults to `1`.
Once built, the container needs a few things at runtime: port `8090` exposed, a volume mounted for the datasets to live in, and the environment variable pointing to the dataset json file.
```bash
docker run -p 8090:8090 -e "DATASETS_MAPPING_FILE=/path/to/datasets.json" -v "/path/to/datasets:/opt/xreds/datasets" xreds:latest
```

There are a few docker compose examples to get started with:

```bash
docker compose up -d
docker compose -f docker-compose-redis.yml up -d
docker compose -f docker-compose-nginx.yml up -d
```

### Datasets

Datasets are specified in a key-value manner, where the keys are the dataset ids and the values are objects with the path and access-control info for the datasets:
```json
{
  "gfswave_global": {
    "path": "s3://nextgen-dmac/kerchunk/gfswave_global_kerchunk.json",
    "type": "kerchunk",
    "chunks": {},
    "drop_variables": ["orderedSequenceData"],
    "storage_options": {
      "target_protocol": "s3",
      "target_options": {
        "anon": false,
        "key": "my aws key",
        "secret": "my aws secret"
      }
    }
  },
  "dbofs": {
    "path": "s3://nextgen-dmac/nos/nos.dbofs.fields.best.nc.zarr",
    "type": "kerchunk",
    "chunks": {
      "ocean_time": 1
    },
    "drop_variables": ["dstart"],
    "mask_variables": {
      "time": "time_mask"
    }
  }
}
```

Equivalent yaml is also supported:
```yaml
---
gfswave_global:
  path: s3://nextgen-dmac/kerchunk/gfswave_global_kerchunk.json
  type: kerchunk
  chunks: {}
  drop_variables:
    - orderedSequenceData
```

Currently `zarr`, `netcdf`, `virtual-icechunk`, and `kerchunk` dataset types are supported. This information should be saved in a file and specified at runtime via the environment variable `DATASETS_MAPPING_FILE`.
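A loader for the mapping file might look like the sketch below. `load_dataset_mapping` is a hypothetical helper, not part of the xreds codebase; the YAML branch assumes PyYAML is installed.

```python
import json
from pathlib import Path

# Dataset types the app supports, per the documentation above.
SUPPORTED_TYPES = {"zarr", "netcdf", "virtual-icechunk", "kerchunk"}


def load_dataset_mapping(path: str) -> dict:
    """Load a dataset key-value store from a JSON or YAML file."""
    text = Path(path).read_text()
    if path.endswith((".yml", ".yaml")):
        import yaml  # requires PyYAML; only needed for YAML mappings
        mapping = yaml.safe_load(text)
    else:
        mapping = json.loads(text)

    # Fail fast on unsupported dataset types rather than at request time.
    for dataset_id, spec in mapping.items():
        if spec.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"{dataset_id}: unsupported type {spec.get('type')!r}")
    return mapping
```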
The available options for each dataset entry are annotated below:

```jsonc
{
  // path to dataset - used in xr.open_dataset(path)
  "path": "s3://nextgen-dmac/kerchunk/gfswave_global_kerchunk.json",
  // type of dataset - supported options: zarr | kerchunk | netcdf | virtual-icechunk
  "type": "kerchunk",
  // (optional) engine used when opening dataset - only used when type=netcdf
  // [default: None]
  "engine": "netcdf4",
  // (optional) chunking strategy for dataset - see xr.open_dataset docs
  // [default: None]
  "chunks": {},
  // (optional) array of dataset variable names to drop - see xr.open_dataset docs
  // [default: None]
  "drop_variables": ["orderedSequenceData"],
  // (optional) when type=kerchunk|zarr - see fsspec ReferenceFileSystem
  // when type=virtual-icechunk - see virtualizarr/icechunk
  "storage_options": {
    // passed to fsspec ReferenceFileSystem
    "remote_protocol": "s3",
    // passed to fsspec ReferenceFileSystem
    "remote_options": {
      "anon": true
    },
    // passed to fsspec ReferenceFileSystem
    "target_protocol": "s3",
    // passed to fsspec ReferenceFileSystem
    "target_options": {
      "anon": false
    }
  },
  "extensions": {
    "vdatum": {
      // fsspec path to vdatum dataset
      "path": "s3://nextgen-dmac-cloud-ingest/nos/vdatums/ngofs2_vdatums.nc.zarr",
      // variable to use for water level
      "water_level_var": "zeta",
      // variable mapping to vdatum transformation
      "vdatum_var": "mllwtomsl",
      // name of the vdatum transformation
      "vdatum_name": "mllw"
    }
  }
}
```

### Configuration

The following environment variables can be set to configure the app:
- `DATASETS_MAPPING_FILE`: The fsspec-compatible path to the dataset key-value store described above.
- `PORT`: The port the app should run on. Defaults to `8090`.
- `WORKERS`: The number of worker threads handling requests. Defaults to `1`.
- `ROOT_PATH`: The root path the app will be served from. Defaults to being served from the root.
- `DATASET_CACHE_TIMEOUT`: The time in seconds to cache the dataset metadata. Defaults to `600` (10 minutes).
- `USE_MEMORY_CACHE`: Whether to save loaded datasets into worker memory. Defaults to `True`.
- `MEMORY_CACHE_NUM_DATASETS`: Number of datasets that are concurrently loaded into worker memory, with `0` being unlimited. Defaults to `0`.
- `EXPORT_THRESHOLD`: The maximum size of file allowed to be exported. Defaults to `500` mb.
- `USE_REDIS_CACHE`: Whether to use a redis cache for the app. Defaults to `False`.
- `REDIS_HOST`: (Optional) The host of the redis cache. Defaults to `localhost`.
- `REDIS_PORT`: (Optional) The port of the redis cache. Defaults to `6379`.
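The settings above might be collected in one place roughly like this. The `load_settings` helper is a sketch, not the app's actual configuration code; only the variable names and defaults come from the list above.

```python
import os


def _env_bool(name: str, default: bool) -> bool:
    """Interpret an environment variable as a boolean flag."""
    raw = os.environ.get(name)
    return default if raw is None else raw.strip().lower() in {"1", "true", "yes"}


def load_settings() -> dict:
    """Read app settings from the environment, applying the documented defaults."""
    env = os.environ
    return {
        "datasets_mapping_file": env.get("DATASETS_MAPPING_FILE"),
        "port": int(env.get("PORT", "8090")),
        "workers": int(env.get("WORKERS", "1")),
        "root_path": env.get("ROOT_PATH", ""),  # empty => served from the root
        "dataset_cache_timeout": int(env.get("DATASET_CACHE_TIMEOUT", "600")),
        "use_memory_cache": _env_bool("USE_MEMORY_CACHE", True),
        "memory_cache_num_datasets": int(env.get("MEMORY_CACHE_NUM_DATASETS", "0")),
        "export_threshold": int(env.get("EXPORT_THRESHOLD", "500")),  # megabytes
        "use_redis_cache": _env_bool("USE_REDIS_CACHE", False),
        "redis_host": env.get("REDIS_HOST", "localhost"),
        "redis_port": int(env.get("REDIS_PORT", "6379")),
    }
```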
### Deployment

First follow the instructions above to build the docker image tagged `xreds:latest`. Then the `xreds:latest` image needs to be tagged and deployed to the relevant docker registry:
```bash
# Auth with ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/m2c5k9c1

# Tag the image
docker tag xreds:latest public.ecr.aws/m2c5k9c1/nextgen-dmac/xreds:latest

# Push the image
docker push public.ecr.aws/m2c5k9c1/nextgen-dmac/xreds:latest
```