Extend CMEW recipe generation to support multi-run (N>2) ESMValTool datasets and plots #339

@mo-nikosbaltas

Description

Background
As part of investigation #315, CMEW was successfully refactored to support parallel standardisation of multiple model runs (e.g. ref, eval, eval2) using Cylc parametrisation and an external runs.json configuration. CDDS standardisation, restructuring, and workflow orchestration now correctly handle an arbitrary number of runs in parallel.
However, when running ESMValTool diagnostics (e.g. radiation_budget), only two plots are produced, even when three or more runs are configured and successfully standardised. This is because the generated ESMValTool recipe YAML still contains a fixed pair of dataset entries (historically REF/EVAL).

Problem

  • CMEW supports N runs operationally
  • The recipe generation logic (e.g. update_recipe_file.py) only emits datasets for REF/EVAL
  • ESMValTool therefore only processes those two datasets and produces plots only for them
  • Additional runs present on disk (and visible under share/work/GCModelDev/...) are ignored

This results in an inconsistency between:

  • the number of model runs standardised by CMEW, and
  • the number of datasets/plots processed by ESMValTool.

Scope of Work
This issue covers updating CMEW to make recipe generation fully multi-run aware.
Specifically:
1. Update recipe generation logic
Modify update_recipe_file.py to:

  • read the same run configuration used by standardisation (e.g. runs.json)
  • dynamically generate one datasets: entry per run
  • remove hard-coded REF/EVAL assumptions
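A minimal sketch of the dataset-generation step, assuming (hypothetically) that runs.json is a JSON object mapping each run label (ref, eval, eval2, ...) to that run's metadata. The runs.json field names in `FACETS_FROM_RUN`, the `project="CMIP6"` default and the helper names `load_runs`, `run_to_dataset` and `build_datasets` are illustrative assumptions, not the existing contents of update_recipe_file.py.

```python
import json

# Hypothetical mapping: ESMValTool dataset facet -> field name assumed to be
# stored per run in runs.json. Adjust to the real runs.json schema.
FACETS_FROM_RUN = {
    "dataset": "model_id",
    "ensemble": "variant_label",
    "grid": "grid_label",
    "exp": "experiment_id",
    "mip": "mip",
    "institute": "institution_id",
}


def load_runs(path):
    """Read runs.json, assumed to be a JSON object of {run_label: metadata}."""
    with open(path, encoding="utf-8") as handle:
        runs = json.load(handle)
    if not isinstance(runs, dict) or not runs:
        raise ValueError(f"{path}: expected a non-empty JSON object of runs")
    return runs


def run_to_dataset(label, run, project="CMIP6"):
    """Build one recipe ``datasets:`` entry from a single run's metadata."""
    missing = [field for field in FACETS_FROM_RUN.values() if field not in run]
    if missing:
        raise KeyError(f"run {label!r} is missing fields {missing}")
    entry = {"project": project}
    entry.update({facet: run[field] for facet, field in FACETS_FROM_RUN.items()})
    # The run label (e.g. "eval2") doubles as the plot label.
    entry["alias"] = label
    return entry


def build_datasets(runs, project="CMIP6"):
    """Return one entry per run, in runs.json order, with no REF/EVAL special case."""
    return [run_to_dataset(label, run, project) for label, run in runs.items()]
```

The returned list would then replace the recipe's top-level `datasets:` section before the YAML is written out (for example with PyYAML's `yaml.safe_dump`). Because every run goes through the same loop, two runs and ten runs follow one code path.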

2. Ensure dataset metadata consistency
Each generated dataset entry must correctly set:

  • dataset / model
  • ensemble (variant_label)
  • grid, exp, mip, institute
  • labels/aliases used in plots
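These metadata requirements could be enforced with a small check over the generated entries before the recipe is written. This sketch assumes the entries use the ESMValTool facet names listed above, with the run label stored as `alias`; `check_dataset_entries` and `REQUIRED_FACETS` are hypothetical names. It also flags duplicate aliases, since two datasets sharing a label would make plot legends ambiguous.

```python
# Facets every generated dataset entry is expected to carry (assumption
# based on the list above; "alias" holds the plot label).
REQUIRED_FACETS = ("dataset", "ensemble", "grid", "exp", "mip", "institute", "alias")


def check_dataset_entries(datasets):
    """Return a list of problems in the generated entries (empty if none)."""
    problems = []
    for index, entry in enumerate(datasets):
        for facet in REQUIRED_FACETS:
            value = entry.get(facet)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty {facet!r}")
    # Each run needs its own label, otherwise plots cannot tell runs apart.
    aliases = [entry.get("alias") for entry in datasets]
    duplicates = sorted(
        {a for a in aliases if isinstance(a, str) and aliases.count(a) > 1}
    )
    for alias in duplicates:
        problems.append(f"alias {alias!r} is used by more than one dataset")
    return problems
```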

3. Validate multi-dataset diagnostics
Confirm that diagnostics:

  • run correctly with 3+ datasets
  • produce the expected plots (multi-model or per-model, depending on the diagnostic)

Document any diagnostics that are inherently pairwise or REF/EVAL-specific.
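One way to make pairwise limitations explicit in code as well as in the docs is a guard consulted during recipe generation. `PAIRWISE_DIAGNOSTICS` and `runs_supported` are hypothetical; the registry is deliberately left empty, because deciding which diagnostics belong in it is exactly what this validation step should establish.

```python
import warnings

# Hypothetical registry of diagnostics that only make sense for exactly two
# datasets (e.g. REF-minus-EVAL style comparisons). Left empty on purpose.
PAIRWISE_DIAGNOSTICS = frozenset()


def runs_supported(diagnostic, n_runs, pairwise=PAIRWISE_DIAGNOSTICS):
    """Return True if ``diagnostic`` can be given ``n_runs`` datasets.

    Pairwise diagnostics need exactly two runs; all others are assumed to
    accept any number of runs >= 1.
    """
    if n_runs < 1:
        raise ValueError("at least one run is required")
    if diagnostic in pairwise and n_runs != 2:
        warnings.warn(
            f"{diagnostic} is pairwise (REF/EVAL only); got {n_runs} runs",
            stacklevel=2,
        )
        return False
    return True
```

Warning rather than raising is a design choice here: a single pairwise diagnostic would not block the rest of a multi-run recipe.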

4. Backward compatibility

  • Existing REF/EVAL configurations must continue to work unchanged
  • No breaking changes for users who still run two-model workflows
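A sketch of how backward compatibility might be kept: runs.json is used when present, and otherwise the existing two-run setup is wrapped into the same label-to-metadata mapping, so downstream recipe generation has a single code path. `resolve_runs` and its `legacy_ref`/`legacy_eval` arguments are hypothetical stand-ins for wherever the current REF/EVAL configuration is read from.

```python
import json
import os


def resolve_runs(runs_json_path, legacy_ref=None, legacy_eval=None):
    """Return {run_label: metadata}, falling back to the legacy REF/EVAL pair."""
    # runs.json, when it exists, is authoritative for any number of runs.
    if runs_json_path and os.path.exists(runs_json_path):
        with open(runs_json_path, encoding="utf-8") as handle:
            return json.load(handle)
    # Otherwise present the two legacy runs in the same shape as runs.json.
    if legacy_ref is None or legacy_eval is None:
        raise ValueError("no runs.json found and no legacy REF/EVAL runs given")
    return {"ref": legacy_ref, "eval": legacy_eval}
```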

5. Documentation
Update README / developer docs to clarify:

  • how multi-run datasets are defined
  • how plots scale with the number of runs
  • any diagnostic-specific limitations

Acceptance Criteria
  • A CMEW run configured with ≥3 runs produces:
      • ≥3 dataset entries in the generated recipe YAML
      • corresponding per-run plots or multi-model plots from ESMValTool
  • REF/EVAL workflows still function without modification
  • Code paths no longer assume exactly two runs
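The recipe-side acceptance criteria could be checked automatically against the generated recipe once it is parsed into a dict. This minimal sketch assumes each dataset entry stores its run label as `alias`; `missing_runs` is a hypothetical helper.

```python
def missing_runs(recipe, run_labels):
    """Return the run labels that have no dataset entry in the recipe.

    Assumes each entry under the recipe's top-level ``datasets`` key carries
    its run label as ``alias``.
    """
    aliases = {entry.get("alias") for entry in recipe.get("datasets", [])}
    return [label for label in run_labels if label not in aliases]
```

An empty result for a three-run configuration covers the dataset-count criterion; running the same check on an unchanged REF/EVAL configuration exercises the backward-compatibility criterion.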

Metadata

Labels

enhancement (New feature or request), needs refinement (This issue needs to be refined), recipe (Anything related to ESMValTool)
