Skip to content

ADR: Unified record syntax for process inputs/outputs #6912

Open
pditommaso wants to merge 4 commits into master from adr/record-syntax-unification


@pditommaso (Member) commented Mar 12, 2026

Summary

  • Proposes using the record() function-call notation uniformly for both process inputs and outputs
  • Replaces the asymmetric Record { ... } block syntax currently used only in input declarations
  • Aligns with existing assignment (=) and type annotation (: Type) patterns already present in process I/O

Current (asymmetric):

input:
sample: Record {         // block syntax, unique to inputs
    id: String
    fastq_1: Path
}

output:
record(id: sample.id, html: file('*.html'))   // function call

Proposed (uniform):

input:
sample: Sample = record(id: String, fastq_1: Path)

output:
result: OtherType = record(id: sample.id, html: file('*.html'))

Test plan

  • Review ADR content for accuracy and completeness
  • Discuss with team whether this direction aligns with typed syntax roadmap

🤖 Generated with Claude Code

Propose using the record() function-call notation uniformly for both
process inputs and outputs, replacing the asymmetric Record { ... }
block syntax currently used in inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify bot commented Mar 12, 2026

Deploy Preview for nextflow-docs-staging canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 2c51e4f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nextflow-docs-staging/deploys/69bd2f898f006f00073b81b3 |

```nextflow
process FASTQC {
input:
sample: Sample = record(id: String, fastq_1: Path, fastq_2: Path)
}
```

@bentsherman (Member) commented Mar 12, 2026:

Worth mentioning that the `= record(...)` part here is redundant if you already declare the input with a record type (`Sample`).

Reply (Member):

From our discussion -- it could be useful to have both as a way to future-proof against changes to the `Sample` type. For example, if someone adds some `Path` fields to `Sample`, you don't necessarily want FASTQC to automatically start staging them as inputs.

But at that point, there is no point to using `Sample` at all -- you might as well just use the generic `Record` type:

    sample: Record = record(id: String, fastq_1: Path, fastq_2: Path)

This asymmetry means the same concept (a record) is expressed with two different syntactic forms depending on context. The block syntax `Record { ... }` exists only in process input declarations and has no counterpart elsewhere in the language. Meanwhile, the `record()` function call used in outputs is already a general-purpose construct usable in any expression context.

@bentsherman (Member) commented Mar 12, 2026:

> This asymmetry means the same concept (a record) is expressed with two different syntactic forms depending on context.

I think the asymmetry is intentional. It helps distinguish between two slightly different concepts -- an input vs an output. Inputs and outputs have slightly different behaviors, especially in a process.

  • Inputs receive values from an external source, validate them against a declared structure, and stage them into the task environment

  • Outputs collect values from the task environment and push them into an output structure

So, process inputs and outputs are similar in some ways and different in others. The question is whether it is better to highlight their similarities or their differences.

@bentsherman (Member) commented Mar 12, 2026:

For what it's worth, the language server actually does highlight their similarities. When hovering on a process call, the process hover hint will be rendered as:

process FASTQC {
    input:
    sample: Record {
        id: String
        fastq_1: Path
        fastq_2: Path
    }

    output:
    result: Record {
        id: String
        html: Path
        zip: Path
    }
}

Reply:

I always read the difference as follows: the input is a type declaration (the same format, except without a leading `record`, I think?) used to create an anonymous record type for the input, while the output instantiates a generic record, which can optionally be assigned.

This makes sense to me, I think.

Reply:

Agreed, I think this syntax is clearer and nicely separates out typing from instantiation.

@pditommaso (Member, Author) replied:

I understand that it's intentional, but my claim is that it's unnecessary cognitive load. Why should I, as a user, have to think in two different notations to express the structure of inputs and outputs?

Above all, the central point is how the syntax can evolve while keeping some structural continuity with the existing syntax, so that as a Nextflow developer I feel comfortable with it without needing to learn new concepts.

Reply (Member):

> I feel comfortable with it without the need to learn new concepts.

Static typing and records are new concepts. There is no getting around that.

The question is, given that users will have to learn the new syntax for parameters, type annotations, record types, etc, is it better for the process inputs/outputs to be consistent with the new syntax everywhere else or consistent with the legacy process syntax?

If legacy continuity gets in the way of expressing inputs/outputs in the new system, surely the latter must take precedence.

At the same time, we might be able to achieve both...

The best case I can see is to ditch the assignment and just have a destructor -> constructor pattern:

legacy

    input:
    tuple val(id), path(fastq_1), path(fastq_2)

    output:
    tuple val(id), path("fastqc_${id}_logs")

typed (tuple)

    input:
    tuple(id: String, fastq_1: Path, fastq_2: Path)

    output:
    tuple(id, file("fastqc_${id}_logs"))

typed (record)

    input:
    record(
        id: String,
        fastq_1: Path,
        fastq_2: Path
    )

    output:
    record(
        id: id,
        fastqc: file("fastqc_${id}_logs")
    )

This works well with static typing and the legacy continuity is decent.

Reply (Member):

My only concerns at that point would be:

  • The commas can get ugly with type annotations, for example:
    input:
    record(
        id: String,
        single_end: Boolean,
        reads: List<Path>, // ugly
        args: String?, // ugly
        prefix: String?
    )

This is partly what led me to the block syntax which doesn't require commas.

  • If the new syntax is too similar, that could cause its own confusion. It might be hard to distinguish typed from legacy syntax. This is the same concern we had with typed workflows.

- Good, because it reuses existing assignment and type annotation patterns.
- Good, because `record()` is already a general-purpose function; no new syntax is needed.
- Good, because type annotations follow standard rules: `sample: Sample = record(...)` works like any typed assignment.
- Bad, because input `record()` arguments are types rather than values, which is a different usage of the function.

Reply (Member):

This is my main issue with Option 3. The double usage of `record()` here is subtle and more likely to confuse users and agents.

By contrast, the syntax I ended up using for record inputs is easy to explain as an "inline record type".

So I think with Option 3 you are ultimately trading one form of double usage for another, without much benefit.

@pditommaso (Member, Author) replied:

My counterargument is that it's aligned with the semantics of the record constructor. Also, the same double usage already exists for `file`. Above all, the central point is continuity with the existing notation: `val`, `path`, `tuple`, etc.

Reply (Member):

Function overloading like with `file()` is fine if done cautiously, but what I'm talking about is syntax overloading. You are hijacking the assignment and function call syntax for different purposes, which creates unnecessary cognitive load.

The function call by itself could make sense as a reverse constructor pattern:

input:
tuple(id: String, fastq_1: Path, fastq_2: Path)
record(id: String, fastq_1: Path, fastq_2: Path)

That would also have better continuity with the legacy syntax.

@bentsherman (Member) commented:
Great discussion overall. I would be keen to incorporate this somehow into the record type ADR, since we did discuss some of these options throughout the process but I didn't include them as alternatives in the original ADR.

It would be good to document these alternatives so that we have a clear rationale for the final syntax.

@bentsherman commented (comment marked as outdated).

@bentsherman changed the title from "ADR: Unified record syntax for process I/O" to "ADR: Unified record syntax for process inputs/outputs" on Mar 12, 2026

- Good, because it is symmetric: the same block form is used on both sides.
- Bad, because the output block mixes type declarations with value assignments (`Path = file(...)`).

Reply:

Is this not more akin to a type definition with some defaults? I know this may not be something that really exists in Java/Groovy, but it makes perfect sense when thinking of record types as akin to something like Pydantic, where you would declare a record with a default using similar syntax.

e.g.

from datetime import datetime
from pydantic import BaseModel, Field

class DemoModel(BaseModel):
    ts: datetime = Field(default_factory=datetime.now)

Reply (Member):

Pydantic is an interesting comparison here. I think Option 3 is essentially the same as Pydantic -- using function calls to create type definitions.

Pydantic is constrained by Python syntax, so for them the best option is to use assignments and function calls, even though this conflates the meaning of the syntax (creating a value vs. declaring a type). But that is probably still much better than creating a custom DSL that users would have to learn alongside Python.

We have no such constraint in Nextflow, so we can differentiate these type declarations with different syntax.
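The conflation described above can be illustrated with a minimal, runnable sketch. It swaps Pydantic's `Field` for the standard library's `dataclasses.field`. That swap is an assumption made only so the example runs without third-party packages; both use the same class-body `name: Type = call(...)` pattern. The annotated assignment *declares* a typed field with a default factory rather than producing a value, and a value only exists once the class is instantiated. `DemoModel` is reused from the comment above; `declared` and `has_factory` are hypothetical names for illustration.

```python
from dataclasses import MISSING, dataclass, field, fields
from datetime import datetime


@dataclass
class DemoModel:
    # Reads like "assign the result of a function call", but inside a class
    # body it declares a typed field whose default is produced per instance.
    ts: datetime = field(default_factory=datetime.now)


# Declaration side: field(...) returned field metadata, not a datetime.
declared = {f.name: f.type for f in fields(DemoModel)}
has_factory = fields(DemoModel)[0].default_factory is not MISSING

# Instantiation side: a separate step that actually creates a value.
instance = DemoModel()

print(declared, has_factory, type(instance.ts).__name__)
```

The same `name: Type = call(...)` line carries declaration semantics here. This is the overloading of assignment and function-call syntax that the thread weighs against a dedicated type-declaration syntax.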

Extend Option 3 to establish a uniform constructor notation
`name = constructor(...)` that applies to both record() and tuple()
across process inputs and outputs. Highlights the migration path
from classic tuple syntax through typed tuples to records.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso force-pushed the adr/record-syntax-unification branch from 4d22995 to 30b8440 on March 20, 2026 11:23
@pditommaso (Member, Author) commented:

Updated ADR: Unified constructor notation for record() and tuple()

Key changes in this update:

  1. Broadened scope from record-only to all structured types. The proposal now establishes name = constructor(...) as a uniform notation that applies to both record() and tuple(), not just records.

  2. Three-tier notation for both constructors:

    tuple(id, file('*.bam'))                          // bare
    out = tuple(id, file('*.bam'))                    // assignment
    out: Tuple<String,Path> = tuple(id, file('*.bam'))  // typed assignment

    Same pattern for record() — bare, assignment, typed assignment.

  3. Migration path from classic DSL2 through tuples to records:

    tuple val(id), path(fastq)             // classic DSL2
    in = tuple(id: String, fastq: Path)    // typed — uniform notation
    in = record(id: String, fastq: Path)   // record — just change the keyword

  4. Tuple input with inline named types: `in = tuple(id: String, fastq: Path)` gives tuples named components while preserving positional semantics, mirroring the record input notation exactly.

  5. Coexistence example showing both tuple() and record() in the same pipeline with identical notation shape.

The core value: users learn one pattern and apply it everywhere. The only choice is positional (tuple) vs named (record).

pditommaso and others added 2 commits March 20, 2026 12:27
Co-authored-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

4 participants