Conversation
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
// inputs
Maybe it will be more clear if the input parameters are also in an inputs block or something similar? It's pretty easy to migrate the current schema to that
That was the original plan, but we wanted to avoid breaking the JSON-schema validity of the file. Basically because it's useful to be able to load it and throw it into any JSON-schema validation library (unrecognised keys are typically ignored).
Maybe a higher level use of definitions could help here? Like having one definitions for inputs, one for outputs, one for required...
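A rough sketch of what that higher-level use of definitions might look like (hypothetical layout, not part of any agreed spec):

```json
{
  "$defs": {
    "inputs": {
      "type": "object",
      "properties": {
        "input": { "type": "string", "format": "file-path" }
      }
    },
    "outputs": {
      "type": "object",
      "properties": {
        "multiqc_report": { "type": "string" }
      }
    }
  },
  "allOf": [
    { "$ref": "#/$defs/inputs" }
  ]
}
```

Because unrecognised keys are ignored by standard validators, such a layout would stay a valid JSON schema while grouping inputs and outputs under separate definitions.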
Depending on how the auto-generation works out, it might be feasible to maintain an actual inputs section that mirrors the outputs, while also generating the parameter schema
In theory, the user wouldn't need to worry about the resulting duplication because they shouldn't modify those bits anyway. They would only be adding things to the JSON-schema part
Sounds good, let me know if you have something for me to test. I'd love to give it a go
- The `manifest` config options are effectively converted directly to JSON with only nominal changes, such as `manifest.name` -> `title` (preserving the structure of the original nf-core schema) and `nextflowVersion` -> `requires.nextflow` (leaving space for module versions in the future).
- The parameter schema follows the structure of the nf-core schema, which defines *parameter groups* under `$defs` and combines them using JSON schema properties such as `allOf`. This section should be generated with sensible defaults, since some properties (e.g. group name) cannot be specified in pipeline code.
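For reference, the grouped nf-core structure being described looks roughly like this (group and property names are illustrative):

```json
{
  "$defs": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "input": { "type": "string", "format": "file-path" },
        "outdir": { "type": "string" }
      },
      "required": ["input"]
    }
  },
  "allOf": [
    { "$ref": "#/$defs/input_output_options" }
  ]
}
```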
Ungrouped parameters are also allowed in the schema, do we still want to support these or would you prefer to have everything in definitions?
Good catch - I was thinking this yesterday and meant to leave a comment but forgot. Yes we need to accept top-level ungrouped properties (ungrouped params).
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/YOUR_PIPELINE/master/nextflow_schema.json",
"title": "Nextflow pipeline parameters",
"description": "This pipeline uses Nextflow and processes some kind of data. The JSON Schema was built using the nf-core pipeline schema builder.",
"type": "object",
"properties": {
"some_parameter": {
"type": "string"
}
}
}
Good to know. In that case I think Nextflow will generate ungrouped parameters by default and then preserve whatever groups the user adds
Yes that's how we do it in nf-core too
nvnieuwk left a comment
I really, really like this!
Will this be easy to automate using a built-in Nextflow command? Or will this require some manual work for the users? How will we make it easy for developers in that case?
Yes, the idea is that Nextflow will validate it at runtime and fail with an error if the schema doesn't match the internally defined spec. It will only check basic keys and it'll allow customisation. @bentsherman should we add this to the ADR?
Perfect, that would make the transition between nf-schema and built-in Nextflow function pretty seamless. I assume a migration script will also be available to migrate the JSON schema one way or another
Yes, there is a basic description of the spec generation / sync in the ADR but it would be good to flesh it out a bit more
{
  // metadata
I'm personally not a fan of this kind of information duplication in what are essentially all source files (assuming the jsonschema is still providing some of the validation logic and is not just a representation of it). If this is the route we are headed down I think it would be much more preferable for nextflow_spec.json to become the source of truth and this information be pulled over by nextflow when it's needed in the compiled source code itself (consider how e.g. Python's pyproject.toml or Rust's Cargo.toml work).
I'd almost go further and question a bit the logic of combining metadata and source-code validation logic (the params schema) into a single document? I guess it's mostly about a convenient single source for e.g. Seqera Platform launch forms?
The pipeline spec would be a source of truth for external systems, but not for Nextflow itself. The source-of-truth for Nextflow is the pipeline code
I suppose Nextflow could use the pipeline spec to perform additional validation that goes beyond what can be defined in the params block, but I would rather avoid that if possible
E.g. instead of using the pipeline spec to validate file extensions, we could add something like "blob types" to the Nextflow language that allow you to define specializations of the Path type such as Fastq, Bam, etc
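A sketch of how such "blob types" might read in the params block (entirely hypothetical syntax; `Fastq` and `Bam` are not existing Nextflow types):

```
params {
    // hypothetical: Fastq and Bam as specializations of Path,
    // letting Nextflow check file extensions from the type alone
    reads: Fastq
    alignment: Bam = null
}
```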
The problem as I see it right now is that using jsonschema via nf-schema allows us to validate cleanly things beyond the simple type of the param. We can for example use min and max to bound an integer param. We can check string lengths. We can check things like the length of an array.
Unless the intention is to add ways to constrain the param types in Nextflow like that, this validation will need to continue, and doing it through nf-schema will require the params part of this new spec document?
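For context, the constraints mentioned above map onto standard JSON Schema validation keywords (`minimum`/`maximum` rather than literal `min`/`max`), e.g.:

```json
{
  "num_iterations": {
    "type": "integer",
    "minimum": 1,
    "maximum": 1000
  },
  "sample_name": {
    "type": "string",
    "minLength": 1,
    "maxLength": 64
  },
  "tools": {
    "type": "array",
    "maxItems": 5
  }
}
```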
We have considered adding declarative ways to validate those kinds of things, for example:
params {
    num_iterations: Integer = 100 {
        min 1
        max 1000
    }
}

But Paolo and I weren't too keen on doing this, at least not yet, since it would add a lot of noise to the params block. Meanwhile, you can always do this kind of validation with regular code:
workflow {
    if( params.num_iterations < 1 || params.num_iterations > 1000 ) {
        error "Parameter `num_iterations` should be between 1 and 1000"
    }
    // ...
}

So it seems like we have the tools to validate everything that nf-schema validates through Nextflow code. But I'd like to wait and see how people use these features before we make any more drastic changes
The declarative method would be much better in my personal opinion.
I don't know how others feel but I was under the impression that part of the benefit of nf-schema was not having to use these kind of conditional checks in code - prior to the original nf-validation you would see huge chunks of if-else blocks at the start of pipelines.
I think this is a bigger consideration too if there is any intention for nextflow to use record types for native samplesheet validation and conversion to lists of records (which I got the sense from some teasers from Phil is something that could be coming) because there is often more of this sort of validation needed there, and do we really want to be doing something that potentially looks like:
inputs = samplesheetToRecords('samplesheet.json')
def errors = [:]
inputs.map { rec ->
    errors["${rec.id}"] = []
    if( rec.intval < 0 ) {
        errors["${rec.id}"].push("intval '${rec.intval}' must be greater than 0")
    }
    if( !(rec.someotherval in ["a", "b", "c"]) ) {
        errors["${rec.id}"].push("someotherval '${rec.someotherval}' must be one of 'a', 'b', or 'c'")
    }
    if( rec.finalval instanceof Integer && rec.finalval > 0 ) {
        errors["${rec.id}"].push("finalval '${rec.finalval}' must be less than 0 if an integer")
    } else if( rec.finalval instanceof String && !(rec.finalval in ["ont", "pacbio"]) ) {
        errors["${rec.id}"].push("finalval '${rec.finalval}' must be one of 'ont' or 'pacbio' if a string")
    }
    // <repeat for every necessary complex check>
}
This also has the effect of de-coupling the type level checking from the value level checking, meaning if you ever modified the type you'd need to then locate the value check and ensure it was still compatible.
Note also how in the last example where if a value can take multiple types (not sure if type unions are actually supported in the native nextflow type-casting yet) you would need to check the type before checking the value.
The above issues will also apply to params just without needing to be in the map call.
All fair points. We're just trying to take it one step at a time. The params block will provide much of the type-level validation, including samplesheets. But we have to find the appropriate line between declarative vs imperative validation in Nextflow, and that will take time, so I don't want to over-commit on anything yet
// outputs
"output": {
I might be a bit fuzzy here, but this looks like a sub-schema (multiqc_report looks to have standard keys for an item in properties) but only semi-defined as such - no top-level object type, no properties, etc.
What is the purpose of this section beyond documentation? If it's intended for e.g. validating outputs then it would be better if it was a properly defined sub-schema I think?
Right now, each output can declare either a type (for simple outputs like numbers or files) or a schema (for complex outputs like a collection of samples)
In theory, we could embed the schema (e.g. for samples) directly in the pipeline spec. But if it is useful for that schema to be used on its own then it might make more sense to keep it in a separate file, as we do for samplesheet schemas.
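A sketch of the two styles of output declaration being described, a simple type versus a reference to an external schema file (key names and paths are assumptions, not the finalized spec):

```json
"output": {
  "multiqc_report": {
    "description": "MultiQC report",
    "type": "string"
  },
  "samples": {
    "description": "Processed samples",
    "schema": "assets/schema_samples.json"
  }
}
```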
This PR adds an ADR for the pipeline spec based on the latest investigations.
Note that the schema for pipeline specs has already been bootstrapped based on the original nf-core schema: https://github.com/nextflow-io/schemas/blob/main/pipeline/v1/schema.json
nextflow-io/schemas: adding new properties to pipeline schema