ADR: Unified record syntax for process inputs/outputs#6912
pditommaso wants to merge 4 commits into master
Conversation
Propose using the record() function-call notation uniformly for both
process inputs and outputs, replacing the asymmetric Record { ... }
block syntax currently used in inputs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
```nextflow
process FASTQC {
    input:
        sample: Sample = record(id: String, fastq_1: Path, fastq_2: Path)
}
```

Worth mentioning that the `= record(...)` part here is redundant if you already declare the input with a record type (`Sample`)

From our discussion -- it could be useful to have both as a way to future-proof against changes to the `Sample` type. For example, if someone adds some `Path` fields to `Sample`, you don't necessarily want `FASTQC` to automatically start staging them as inputs

But at that point, there is no point in using `Sample` at all -- you might as well just use the generic `Record` type:

```nextflow
sample: Record = record(id: String, fastq_1: Path, fastq_2: Path)
```
> This asymmetry means the same concept (a record) is expressed with two different syntactic forms depending on context. The block syntax `Record { ... }` exists only in process input declarations and has no counterpart elsewhere in the language. Meanwhile, the `record()` function call used in outputs is already a general-purpose construct usable in any expression context.
> This asymmetry means the same concept (a record) is expressed with two different syntactic forms depending on context.

I think the asymmetry is intentional. It helps distinguish between two slightly different concepts -- an input vs an output. Inputs and outputs have slightly different behaviors, especially in a process.

- Inputs are receiving values from an external source, validating them against a declared structure, and staging them into the task environment
- Outputs are collecting values from the task environment and pushing them into an output structure

So, process inputs and outputs are similar in some ways and different in others. The question is whether it is better to highlight their similarities or their differences
For what it's worth, the language server actually does highlight their similarities. When hovering on a process call, the process hover hint will be rendered as:

```nextflow
process FASTQC {
    input:
        sample: Record {
            id: String
            fastq_1: Path
            fastq_2: Path
        }
    output:
        result: Record {
            id: String
            html: Path
            zip: Path
        }
}
```
I always parsed the difference as: the input is a type declaration (it has the same format, just without a leading `record`, I think?) that creates an anonymous record type for the input, while the output instantiates a generic record instance, which can optionally be assigned.

This makes sense to me, I think.
Agreed, I think this syntax is clearer and nicely separates out typing from instantiation.
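A minimal sketch of that separation, for a hypothetical process (the process name, fields, and the output assignment form are illustrative, assumed from the examples in this thread -- the input block declares a type, while the output instantiates a record value):

```nextflow
process ALIGN {
    input:
        sample: Record {       // type declaration: field names and types only
            id: String
            reads: Path
        }
    output:
        // instantiation: a generic record value, optionally assigned
        result = record(id: sample.id, bam: file("${sample.id}.bam"))

    script:
    """
    align ${sample.reads} > ${sample.id}.bam
    """
}
```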
I understand that's intentional, but my claim is that it's unneeded cognitive load. Why, as a user, should I have to think in two different notations to express the structure of inputs and outputs?

Above all, the central point is how the syntax can evolve while keeping some structural continuity with the existing syntax, so that as a Nextflow developer I feel comfortable with it without needing to learn new concepts.
> I feel comfortable with it without the need to learn new concepts.
Static typing and records are new concepts. There is no getting around that.
The question is, given that users will have to learn the new syntax for parameters, type annotations, record types, etc, is it better for the process inputs/outputs to be consistent with the new syntax everywhere else or consistent with the legacy process syntax?
If legacy continuity gets in the way of expressing inputs/outputs in the new system, surely the latter must take precedence
At the same time, we might be able to achieve both...
The best case I can see is to ditch the assignment and just have a destructor -> constructor pattern:
**legacy**

```nextflow
input:
    tuple val(id), path(fastq_1), path(fastq_2)
output:
    tuple val(id), path("fastqc_${id}_logs")
```

**typed (tuple)**

```nextflow
input:
    tuple(id: String, fastq_1: Path, fastq_2: Path)
output:
    tuple(id, file("fastqc_${id}_logs"))
```

**typed (record)**

```nextflow
input:
    record(
        id: String,
        fastq_1: Path,
        fastq_2: Path
    )
output:
    record(
        id: id,
        fastqc: file("fastqc_${id}_logs")
    )
```

This works well with static typing and the legacy continuity is decent.
My only concerns at that point would be:
- The commas can get ugly with type annotations, for example:

  ```nextflow
  input:
      record(
          id: String,
          single_end: Boolean,
          reads: List<Path>,  // ugly
          args: String?,      // ugly
          prefix: String?
      )
  ```

  This is partly what led me to the block syntax, which doesn't require commas.
- If the new syntax is too similar, that could cause its own confusion. It might be hard to distinguish between typed vs legacy syntax. This is the concern we had with typed workflows.
> - Good, because reuses existing assignment and type annotation patterns.
> - Good, because `record()` is already a general-purpose function, no new syntax needed.
> - Good, because type annotations follow standard rules — `sample: Sample = record(...)` works like any typed assignment.
> - Bad, because input `record()` arguments are types rather than values, which is a different usage of the function.
This is my main issue with Option 3. The double usage of record() here is subtle and more likely to confuse users and agents
Whereas the syntax I ended up using for record inputs is easy to explain as an "inline record type"
So I think with Option 3 you are ultimately trading one form of double usage for another, without much benefit
My counter-argument is that it's aligned with the semantics of the record constructor. Also, the same double usage is done for `file()`. Above all, the central point is the continuity with the existing notation: `val`, `path`, `tuple`, etc.
Function overloading like with file() is fine if done cautiously, but what I'm talking about is syntax overloading. You are hijacking the assignment and function call syntax for different purposes, which creates unnecessary cognitive load
The function call by itself could make sense as a reverse constructor pattern:

```nextflow
input:
    tuple(id: String, fastq_1: Path, fastq_2: Path)
    record(id: String, fastq_1: Path, fastq_2: Path)
```

That would also have better continuity with the legacy syntax
Great discussion overall. I would be keen to incorporate this somehow into the record type ADR, since we did discuss some of these options throughout the process but I didn't include them as alternatives in the original ADR. It would be good to document these alternatives so that we have a clear rationale for the final syntax.
> - Good, because symmetric — same block form on both sides.
> - Bad, because the output block mixes type declarations with value assignments (`Path = file(...)`).
Is this not more akin to a type definition with some defaults? I know this may not be something that really exists in Java/Groovy but it makes perfect sense when thinking of record types as akin to something like Pydantic, where you would declare a record with a default using similar syntax.
e.g.

```python
class DemoModel(BaseModel):
    ts: datetime = Field(default_factory=datetime.now)
```
Pydantic is an interesting comparison here. I think option 3 is essentially the same as Pydantic -- using function calls to create type definitions
Pydantic is constrained by Python syntax, so for them the best option is to use assignments and function calls, even though it conflates the meaning of this syntax (creating a value vs declaring a type). But this is probably still much better than creating a custom DSL that users would have to learn alongside Python
We have no such constraint in Nextflow, so we can differentiate these type declarations with different syntax
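The conflation is visible even with Python's standard-library dataclasses, which use the same pattern Pydantic borrows: `name: Type = expr` inside a class body is a field declaration, while the identical syntax at module level is an ordinary assignment that creates a value. A stdlib sketch (standing in for the Pydantic example above, so it runs without third-party dependencies):

```python
from dataclasses import dataclass, field, fields

# Inside a class body, "name: Type = expr" declares a typed field with a
# default; at module level the same syntax is a plain annotated assignment.
# Same syntax, two meanings -- the conflation discussed above.

@dataclass
class Sample:
    sample_id: str = "s1"
    reads: list = field(default_factory=list)

# field() did not produce a value for `reads` -- it configured the field
# declaration; the dataclass machinery supplies the actual list per instance.
s = Sample()
field_names = [f.name for f in fields(Sample)]
print(field_names)   # ['sample_id', 'reads']
print(s.reads)       # []
```

Nextflow, not being constrained by an existing expression grammar, can give the type-declaration case its own syntax instead of reusing assignment.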
Extend Option 3 to establish a uniform constructor notation `name = constructor(...)` that applies to both `record()` and `tuple()` across process inputs and outputs. Highlights the migration path from classic tuple syntax through typed tuples to records.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
4d22995 to 30b8440
Updated ADR: unified constructor notation for process inputs/outputs. Key changes in this update:

The core value: users learn one pattern and apply it everywhere. The only choice is positional (`tuple()`) vs named (`record()`).
Co-authored-by: Ben Sherman <bentshermann@gmail.com> Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Summary
- Propose using the `record()` function-call notation uniformly for both process inputs and outputs
- Replaces the asymmetric `Record { ... }` block syntax currently used only in input declarations
- Reuses the assignment (`=`) and type annotation (`: Type`) patterns already present in process I/O

Current (asymmetric):
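A sketch of the current asymmetric form, reconstructed from the examples earlier in this thread (field names and the output assignment form are illustrative):

```nextflow
input:
    sample: Record {
        id: String
        fastq_1: Path
        fastq_2: Path
    }
output:
    result = record(id: sample.id, html: file("*.html"))
```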
Proposed (uniform):
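A sketch of the proposed uniform form, assuming the `record()` call notation discussed in this thread on both sides (field names illustrative):

```nextflow
input:
    sample: Record = record(id: String, fastq_1: Path, fastq_2: Path)
output:
    result: Record = record(id: sample.id, html: file("*.html"))
```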
Test plan
🤖 Generated with Claude Code