Background
As part of investigation #315, CMEW was successfully refactored to support parallel standardisation of multiple model runs (e.g. ref, eval, eval2) using Cylc parametrisation and an external runs.json configuration. CDDS standardisation, restructuring, and workflow orchestration now correctly handle an arbitrary number of runs in parallel.
However, when running ESMValTool diagnostics (e.g. radiation_budget), only two plots are produced, even when three or more runs are configured and standardised successfully. This is because the generated ESMValTool recipe YAML still contains a fixed number of dataset entries (historically REF/EVAL).
Problem
- CMEW supports N runs operationally, but the recipe generation logic (e.g. update_recipe_file.py) only emits datasets for REF/EVAL
- ESMValTool therefore only processes those datasets and produces plots for them
- Additional runs present on disk (and visible under share/work/GCModelDev/...) are ignored
This results in an inconsistency between:
- the number of model runs standardised by CMEW, and
- the number of datasets/plots processed by ESMValTool.
Scope of Work
This issue covers updating CMEW to make recipe generation fully multi-run aware.
Specifically:
1. Update recipe generation logic
Modify update_recipe_file.py to:
- read the same run configuration used by standardisation (e.g. runs.json)
- dynamically generate one datasets: entry per run
- remove hard-coded REF/EVAL assumptions
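A minimal sketch of what per-run dataset generation could look like. The runs.json schema shown here (label, model_id, variant_label, start_year, end_year), the shared facet values, and the function name build_datasets are all assumptions for illustration; the real configuration and the existing structure of update_recipe_file.py may differ.

```python
import json

# Illustrative runs.json content. Every key name and value here is an
# assumption made for this sketch, not the actual CMEW schema.
RUNS_JSON = """
[
  {"label": "ref",   "model_id": "HadGEM3-GC31-LL", "variant_label": "r1i1p1f3",
   "start_year": 1993, "end_year": 1995},
  {"label": "eval",  "model_id": "UKESM1-0-LL",     "variant_label": "r1i1p1f2",
   "start_year": 1993, "end_year": 1995},
  {"label": "eval2", "model_id": "UKESM1-0-LL",     "variant_label": "r2i1p1f2",
   "start_year": 1993, "end_year": 1995}
]
"""

# Facets assumed to be identical for every run (hypothetical defaults).
COMMON_FACETS = {
    "project": "CMIP6",
    "exp": "amip",
    "grid": "gn",
    "institute": "MOHC",
}


def build_datasets(runs, common_facets):
    """Return one recipe dataset entry per configured run."""
    datasets = []
    for run in runs:
        entry = dict(common_facets)  # copy so runs do not share state
        entry.update(
            {
                "dataset": run["model_id"],
                "ensemble": run["variant_label"],
                "alias": run["label"],  # label shown in plots
                "start_year": run["start_year"],
                "end_year": run["end_year"],
            }
        )
        datasets.append(entry)
    return datasets


datasets = build_datasets(json.loads(RUNS_JSON), COMMON_FACETS)
for entry in datasets:
    print(entry["alias"], entry["dataset"], entry["ensemble"])
```

The resulting list would replace the fixed REF/EVAL entries under datasets: in the recipe, written out with whatever YAML handling the script already uses.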
2. Ensure dataset metadata consistency
Each generated dataset entry must correctly set:
- dataset / model
- ensemble (variant_label)
- grid, exp, mip, institute
- labels/aliases used in plots
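One way to check the metadata listed above before the recipe is written is a small validation pass. This is a sketch only: the key names follow the list above, the use of alias for the plot label is an assumption, and check_datasets is a hypothetical helper rather than an existing CMEW function.

```python
# Keys each dataset entry is expected to carry (taken from the list above;
# "alias" as the plot label is an assumption).
REQUIRED_FACETS = ("dataset", "ensemble", "grid", "exp", "mip", "institute", "alias")


def check_datasets(datasets):
    """Return a list of problems found; an empty list means the entries look consistent."""
    problems = []
    seen_aliases = set()
    for index, entry in enumerate(datasets):
        # Report keys that are absent or empty.
        missing = [key for key in REQUIRED_FACETS if not entry.get(key)]
        if missing:
            problems.append(f"dataset {index}: missing {', '.join(missing)}")
        # Aliases must be unique so that plots can tell runs apart.
        alias = entry.get("alias")
        if alias:
            if alias in seen_aliases:
                problems.append(f"dataset {index}: duplicate alias {alias!r}")
            seen_aliases.add(alias)
    return problems
```

Calling check_datasets on the generated entries and failing the task when the returned list is non-empty would surface missing or clashing metadata before ESMValTool runs.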
3. Validate multi-dataset diagnostics
Confirm that diagnostics:
- run correctly with 3+ datasets
- produce expected plots (multi-model or per-model, depending on diagnostic)
Document any diagnostics that are inherently pairwise or REF/EVAL-specific
4. Backward compatibility
- Existing REF/EVAL configurations must continue to work unchanged
- No breaking changes for users who still run two-model workflows
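One possible way to keep two-run workflows working is to fall back to a REF/EVAL pair when no run configuration file is present. The sketch below assumes a hypothetical load_runs helper, hypothetical legacy setting names (REF_MODEL_ID and so on), and a hypothetical run schema; none of these are confirmed CMEW names.

```python
import json
from pathlib import Path


def load_runs(runs_path, legacy_settings):
    """Load runs from a JSON file, or build a REF/EVAL pair from legacy settings.

    The setting names used in the fallback are placeholders for whatever the
    existing two-run configuration actually provides.
    """
    path = Path(runs_path)
    if path.is_file():
        return json.loads(path.read_text())
    # No run configuration found: reproduce the historical two-run behaviour.
    return [
        {
            "label": "ref",
            "model_id": legacy_settings["REF_MODEL_ID"],
            "variant_label": legacy_settings["REF_VARIANT_LABEL"],
        },
        {
            "label": "eval",
            "model_id": legacy_settings["MODEL_ID"],
            "variant_label": legacy_settings["VARIANT_LABEL"],
        },
    ]
```

With this shape, the rest of the recipe generation only ever sees a list of runs, so the two-run case is handled by the same code path as the N-run case.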
5. Documentation
Update README / developer docs to clarify:
- how multi-run datasets are defined
- how plots scale with number of runs
- any diagnostic-specific limitations
Acceptance Criteria
A CMEW run configured with ≥3 runs produces:
- ≥3 dataset entries in the generated recipe YAML
- corresponding plots or multi-model plots from ESMValTool
REF/EVAL workflows still function without modification
Code paths no longer assume exactly two runs