Support staging closure for input files in typed process by bentsherman · Pull Request #6905 · nextflow-io/nextflow

bentsherman · 2026-03-10T00:29:23Z

This PR adds the ability to stage input files with a staging closure instead of a glob pattern, which allows for more fine-grained control over individual file names.

netlify · 2026-03-10T00:29:28Z

✅ Deploy Preview for nextflow-docs-staging ready!

Name	Link
🔨 Latest commit	`8e619c6`
🔍 Latest deploy log	https://app.netlify.com/projects/nextflow-docs-staging/deploys/69b44086a3da750008f9cab1
😎 Deploy Preview	https://deploy-preview-6905--nextflow-docs-staging.netlify.app

bentsherman · 2026-03-10T00:37:33Z

@Lehmann-Fabian let me know if this PR meets your needs from #2622

bentsherman · 2026-03-10T00:40:05Z

modules/nextflow/src/main/groovy/nextflow/processor/TaskInputResolver.groovy

+        final result = new ArrayBag(files.size())
+        for( final holder : files ) {
+            final stageName = stagingClosure.call(holder.getStorePath())
+            result << holder.withName(stageName)
+        }
+        return result


@Lehmann-Fabian the main difference I saw with your PR is that it seems like you allowed the staging closure to still use glob patterns

I wasn't sure if this is actually needed, and the implementation is much simpler if we say that the staging closure must return a fully-resolved file name (no glob patterns), so I left it out

But let me know if you actually needed this
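To make the constraint above concrete, here is a minimal sketch in Python (not the PR's Groovy code) of a resolver that applies a mapping function to each source path and rejects any result that still contains a glob character. The names `resolve_stage_names` and `staging_fn` are hypothetical, and treating `*` and `?` as the only glob characters is an assumption made for illustration.

```python
from pathlib import PurePosixPath

# Assumption: only '*' and '?' count as glob characters in a stage name.
GLOB_CHARS = set("*?")


def resolve_stage_names(sources, staging_fn):
    """Map each source path to a fully resolved stage name.

    Returns a list of (source, stage_name) pairs, in input order.
    Raises ValueError if the mapping function returns a glob pattern.
    """
    staged = []
    for src in sources:
        stage_name = str(staging_fn(PurePosixPath(src)))
        if GLOB_CHARS & set(stage_name):
            raise ValueError(f"stage name must not contain glob characters: {stage_name!r}")
        staged.append((src, stage_name))
    return staged
```

For example, `resolve_stage_names(["/data/group1/sample1.txt"], lambda p: f"{p.parent.name}/{p.name}")` pairs the source with the stage name `group1/sample1.txt`.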

pditommaso

I'm skeptical there's a strong use case for dynamic resolution; on the other hand, it brings back bloated closures in process definitions, as we had (and still have) for publishDir

bentsherman · 2026-03-10T13:35:26Z

You can see Fabian's use case in #2622

I have come to see it as an alternative to staging via glob pattern, which doesn't give you as much control and forces certain conventions on you (e.g. numbering files)

The closure doesn't bother me so much because (1) it will be rarely used and (2) it is a mapping function rather than a closure with no arguments (which is redundant)

It's essentially the same thing we do for workflow outputs:

output {
    samples {
        path { sample -> "fastq/${sample.id}/" }
    }
}

It's the simplest way to map files from one environment to another

bentsherman · 2026-03-10T13:51:29Z

From Fabian's example (nf-core/rangeland):

    script:
    """
    # ...

    # Rename files: /trend/<Tile>/<Filename> to <Tile>_<Filename>, otherwise we can not reextract the tile name later
    results=`find trend -name '*.tif*'`
    parallel -j $task.cpus 'mv {} {//}_{/}' ::: \$results
    """

So the main use case is when you have a multi-dimensional file collection (e.g. grid data)

Many tools expect a particular file/directory structure, and while we are trying to get away from encoding metadata in file paths, the reality is that many tools still do it

Nextflow's pattern-based staging isn't good enough here, so you end up "re-staging" the files in your process script (like the mv command above). So I think an `input file -> stage name` staging closure is a better way to do this
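As a rough illustration of the `input file -> stage name` mapping for the rangeland case, the Python sketch below produces the same `<Tile>_<Filename>` name that the `mv` command builds. The function name `tile_stage_name` and the tile and file names in the usage note are hypothetical; this is not code from the PR or from nf-core/rangeland.

```python
from pathlib import PurePosixPath


def tile_stage_name(source: str) -> str:
    """Flatten .../<Tile>/<Filename> into <Tile>_<Filename>."""
    path = PurePosixPath(source)
    # The immediate parent directory carries the tile name.
    return f"{path.parent.name}_{path.name}"
```

With a hypothetical input such as `trend/X0001_Y0002/ndvi.tif`, this returns `X0001_Y0002_ndvi.tif`, so the tile name stays recoverable from the staged file name without renaming anything in the script block.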

pditommaso · 2026-03-12T08:18:23Z

Alternative: Declarative String Replacement Rule (sed-like)

Instead of a closure, we could use a declarative pattern-based renaming rule — a string that specifies how to transform the original file path into the staged name.

Option A: `sed`-style substitution syntax

stageAs slice, 's|.*/([^/]+)/([^/]+)$|$1/$2|'

Option B: Path template with named segments

Define a mini-language that references parts of the source path:

stageAs slice, '{parent}/{name}'

Where built-in tokens map to path components:

Token	Meaning	Example for `/data/tile1/sample.tif`
`{name}`	File name with extension	`sample.tif`
`{simpleName}`	File name without extension	`sample`
`{extension}`	File extension	`tif`
`{parent}`	Immediate parent dir name	`tile1`
`{parent2}`	Grandparent dir name	`data`
`{path:N}`	Last N path segments	`tile1/sample.tif`

Examples:

// Preserve parent directory structure
stageAs slice, '{parent}/{name}'
// tile1/sample.tif, tile2/sample.tif

// Flatten with prefix from parent
stageAs slice, '{parent}_{name}'
// tile1_sample.tif, tile2_sample.tif

// Keep last 2 path segments (same as parent/name)
stageAs slice, '{path:2}'
// tile1/sample.tif
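To check that the token table and the examples above are mutually consistent, here is a small Python sketch of how such a template could be expanded. It is only an illustration of the proposal: `expand_template` is a hypothetical name, leaving unknown tokens untouched is an assumption, and nothing here reflects an existing Nextflow feature.

```python
import re
from pathlib import PurePosixPath

# Tokens from the table above; {path:N} requires N >= 1 (assumption).
_TOKEN = re.compile(r"\{(name|simpleName|extension|parent2|parent|path:([1-9]\d*))\}")


def expand_template(template: str, source: str) -> str:
    """Expand path-template tokens against a source file path."""
    path = PurePosixPath(source)
    # Path segments without the leading root marker.
    parts = [p for p in path.parts if p != "/"]

    def replace(match):
        token = match.group(1)
        if token == "name":
            return path.name
        if token == "simpleName":
            # File name up to the first dot.
            return path.name.split(".", 1)[0]
        if token == "extension":
            return path.suffix.lstrip(".")
        if token == "parent":
            return path.parent.name
        if token == "parent2":
            return path.parent.parent.name
        # Remaining case: {path:N}, the last N path segments.
        n = int(match.group(2))
        return "/".join(parts[-n:])

    return _TOKEN.sub(replace, template)
```

For `/data/tile1/sample.tif`, the templates `{parent}/{name}` and `{path:2}` both expand to `tile1/sample.tif`, and `{parent}_{name}` expands to `tile1_sample.tif`, matching the examples.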

Option C: Regex substitution with explicit syntax

stageAs slice, from: '.*/([^/]+)/([^/]+)', to: '$1/$2'
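A minimal Python sketch of what Option C could do with its `from`/`to` pair is shown below. The function name `regex_stage_name`, the requirement that the pattern match the whole path, and the fallback to the base file name when it does not match are all assumptions; the proposal itself does not specify them.

```python
import re


def regex_stage_name(source: str, from_pattern: str, to: str) -> str:
    """Apply a from/to regex rule to a source path to get its stage name."""
    match = re.fullmatch(from_pattern, source)
    if match is None:
        # Assumption: fall back to the base file name when the rule does not apply.
        return source.rsplit("/", 1)[-1]
    # Translate $1-style group references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", to)
    return match.expand(template)
```

With the rule from the example, `/data/tile1/sample.tif` maps to `tile1/sample.tif`. The sketch also shows the readability cost noted in the comparison: the intent is carried entirely by the capture groups.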

Comparison

Criteria	Closure (PR)	Path Templates (B)	Regex Sub (A/C)
Readability	Medium — Groovy knowledge needed	High — self-documenting tokens	Low — regex is hard to read
Power/Flexibility	Maximum — arbitrary Groovy code	Covers 90%+ of use cases	Full regex power
Declarative	No — imperative code	Yes	Yes
Serializable	No — closures are opaque	Yes — plain string	Yes — plain string
Config override	Hard	Easy (`process.stageAs = ...`)	Possible but ugly
Learning curve	Low for Groovy users	Very low	High
Composable with existing `*`/`?` patterns	No — separate code path	Could extend existing pattern language	No

Recommendation: Path Templates (Option B)

The path template approach ('{parent}/{name}') seems the strongest alternative because:

Declarative & serializable — it's just a string, making it compatible with config overrides, caching keys, and serialization
Readable — '{parent}/{name}' is immediately understandable vs { file -> "${file.parent.name}/${file.name}" }
Consistent with Nextflow's existing pattern language — extends the glob-based */? patterns rather than introducing a completely different paradigm (closures)
Covers the real use cases — the rangeland case (tile/sample.tif preservation) and the PR's test case (group1/sample1.txt) are both expressible with simple token substitution
No closure serialization concerns — closures in Nextflow can cause issues with -resume caching and Kryo serialization

bentsherman · 2026-03-12T18:12:49Z

@FloWuenne can you describe the Hive use case you encountered?

One question I also have -- do you actually need to rename / remap the input files from their original structure, or just preserve them as they are?

Part of the issue is that stageAs ignores directories when staging input files. It just stages them in by their base file name. So if I provide a collection of files like this:

/.../group1/sample1.txt
/.../group1/sample2.txt
/.../group1/sample3.txt
/.../group2/sample1.txt
/.../group2/sample2.txt
/.../group2/sample3.txt
/.../group3/sample1.txt
/.../group3/sample2.txt
/.../group3/sample3.txt

They will be staged in as sample1.txt, sample2.txt, and sample3.txt, which means they will overwrite each other.
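The overwrite problem can be reproduced with a short Python sketch that groups source paths by base file name. `find_stage_collisions` is a hypothetical helper, and the `/data` root in the usage note stands in for the elided prefix purely for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_stage_collisions(sources):
    """Return {base_name: [sources]} for base names shared by several sources."""
    by_name = defaultdict(list)
    for src in sources:
        # Staging by base name discards every parent directory.
        by_name[PurePosixPath(src).name].append(src)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Given nine inputs of the form `/data/group{1..3}/sample{1..3}.txt`, the result has three entries (`sample1.txt`, `sample2.txt`, `sample3.txt`), each listing three conflicting sources.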

But the problem is that Nextflow doesn't know how many parent directories should be included when staging them into a task (i.e. which part belongs to the ...?)

1. If the files are coming from an upstream task output, then we can simply use the relative path against the task directory
2. If the files are coming from outside the work directory... ?

If we could find an elegant solution to (2), we might not need the staging closure (or any other alternative) at all. We could just make the default staging behavior do the right thing and then you don't have to think about it.
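A sketch of case (1) in Python, with case (2) left as the current base-name behavior, might look like the following. It assumes task directories sit two levels below the work directory (`<workDir>/<prefix>/<hash>/`); `default_stage_name` and the example paths are hypothetical, and this is not how Nextflow currently stages files.

```python
from pathlib import PurePosixPath


def default_stage_name(source: str, work_dir: str) -> str:
    """Pick a stage name that keeps task-relative directories when possible."""
    path = PurePosixPath(source)
    try:
        rel = path.relative_to(PurePosixPath(work_dir))
    except ValueError:
        # Case (2): outside the work directory, so only the base name is known.
        return path.name
    # Assumption: the first two segments are the task directory (<prefix>/<hash>).
    if len(rel.parts) > 2:
        return "/".join(rel.parts[2:])
    return path.name
```

Under that assumption, `/work/ab/cdef123/group1/sample1.txt` would be staged as `group1/sample1.txt`, while `/data/group1/sample1.txt` would still collapse to `sample1.txt`, which is exactly the open question in (2).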

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

bentsherman requested a review from jorgee March 10, 2026 00:29

bentsherman requested review from a team as code owners March 10, 2026 00:29

bentsherman mentioned this pull request Mar 10, 2026

[New Feature] Evaluate Closure for every input file #2622

Closed

bentsherman added this to the 26.04 milestone Mar 10, 2026

pditommaso reviewed Mar 10, 2026

bentsherman linked an issue Mar 10, 2026 that may be closed by this pull request

option to save/restore relative file paths #3631

Open

Support staging closure for input files in typed process

8e619c6

bentsherman force-pushed the typed-process-stageas-closure branch from 1139434 to 8e619c6 Compare March 13, 2026 16:51

bentsherman removed this from the 26.04 milestone Mar 16, 2026

pditommaso force-pushed the master branch from 6fe40e1 to ea1f4ea Compare March 17, 2026 19:46

bentsherman wants to merge 1 commit into master from typed-process-stageas-closure
