[New Feature] Evaluate Closure for every input file #2622

Closed
Lehmann-Fabian wants to merge 13 commits into nextflow-io:master from Lehmann-Fabian:ClosureMultipleInputs

@Lehmann-Fabian
Contributor

@Lehmann-Fabian Lehmann-Fabian commented Feb 3, 2022

Until now, Nextflow has evaluated a Closure that stages multiple input files only once per task.
Accordingly, it cannot produce individual stage names for different files in one channel or one task.
However, it can be helpful to evaluate the Closure for every file, as requested in #1998.
This PR solves the problem without changing the original logic.
If a Closure produces identical names for several files, an increasing counter is added to the colliding names.
I also considered applying this to the existing logic for the case where you stage files as * and multiple files have the same name. However, that would suppress the collision warnings that some users may expect and use for debugging.

For example, the following code shows how to keep folder structures for inputs.

fasta = Channel.fromPath( "/root/*/*.fa" ).buffer(size:10, remainder: true)
process blastThemAll {

    input:
    file {"${sourceObj.parent}/${sourceObj.name}.fa"} from fasta

    """
    find . -name "*"
    """

}
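
A minimal sketch of the per-file evaluation described above, written in Python rather than Nextflow's Groovy so that it runs standalone. The helper `stage_names` and the `.N` counter suffix are assumptions made for illustration; the PR's actual implementation and counter format may differ.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Tuple


def stage_names(
    files: Iterable[str],
    name_for: Callable[[PurePosixPath], str],
) -> List[Tuple[str, str]]:
    """Return (source, stage name) pairs, calling name_for once per file."""
    seen: Counter = Counter()
    staged = []
    for src in files:
        name = name_for(PurePosixPath(src))
        seen[name] += 1
        if seen[name] > 1:
            # Assumed suffix format; the PR may format the counter differently.
            name = f"{name}.{seen[name] - 1}"
        staged.append((src, name))
    return staged


files = ["/root/a/x.fa", "/root/b/x.fa", "/root/c/y.fa"]

# Names derived from the parent folder are unique, so no counter is needed.
by_folder = stage_names(files, lambda p: f"{p.parent.name}/{p.name}")

# Names derived from the file name alone collide, so a counter is appended.
by_name = stage_names(files, lambda p: p.name)

print(by_folder)
print(by_name)
```

The point of the sketch is that the naming function receives each source file individually, so the stage name can depend on where that file came from.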

Signed-off-by: Lehmann-Fabian <fabian.lehmann@informatik.hu-berlin.de>
@Lehmann-Fabian Lehmann-Fabian changed the title Evaluate Closure for every input file [New Feature] Evaluate Closure for every input file Apr 27, 2022
@davidfrantz

For datacube-structured Earth Observation datasets, this PR would be extremely helpful!

Signed-off-by: Fabian Lehmann <46564585+Lehmann-Fabian@users.noreply.github.com>
@Lehmann-Fabian
Contributor Author

Hi @pditommaso, I am reaching out about this PR, which has been open for over a year without any action but is still of great interest.
It allows you to name files dynamically when you stage a list of files into a process. This is particularly helpful if you want to create a dynamic folder structure.

To give an example of why this PR is needed:
We have been asked to port our Rangeland workflow to nf-core. In this workflow, we use FORCE, a tool that organizes files in folder structures, which is not compatible with Nextflow out of the box.
As a result, we had to rename files manually in some places, such as in the code snippet provided here.

I would greatly appreciate it if you could take some time to review this PR and suggest any changes that would improve it.

@pditommaso
Member

pditommaso commented Mar 28, 2023

Can you please remind me what you are trying to solve? Nextflow already supports dynamic file name resolution. For example, given this directory

» tree data/
data/
├── one
│   └── file.txt
├── three
│   └── file.txt
└── two
    └── file.txt

and using this script

process foo {
  debug true
  input: 
  tuple val(name), path("$name/*")

  '''
  tree .
  '''
}

workflow {
  channel.fromPath('data/**/*.txt').map { tuple(it.parent.name, it) } | foo 
}

It returns

.
└── three
    └── file.txt -> /Users/pditommaso/demo/data/three/file.txt

1 directory, 1 file

.
└── two
    └── file.txt -> /Users/pditommaso/demo/data/two/file.txt

1 directory, 1 file

.
└── one
    └── file.txt -> /Users/pditommaso/demo/data/one/file.txt

@Lehmann-Fabian
Contributor Author

Thank you very much for getting back to me on this.
Sure. I extended the case in your example so that it also works for more than one file.
Accordingly, you should be able to pass multiple files into a single task while keeping their original directory structure.
With path("$name/*"), the name is evaluated once, so it is fixed for all files when the task has more than one input file.

Let me extend your input:

tree data/
├── one
│   ├── file1.txt
│   ├── file2.txt
│   └── file3.txt
├── three
│   ├── file1.txt
│   ├── file2.txt
│   └── file3.txt
└── two
    ├── file1.txt
    ├── file2.txt
    └── file3.txt

Now, in your Nextflow script, I group the files by name: all file1.txt together, all file2.txt together, and so on.

workflow {
  channel.fromPath('/execution/data/**/*.txt').map { tuple(it.name, it) }.groupTuple().map{ it[1] } | foo 
}

With the current Nextflow version, I wouldn't be able to get the following:

[74/c871e1] process > foo (2) [100%] 3 of 3 ✔
.
├── one
│   └── file3.txt -> /execution/data/one/file3.txt
├── three
│   └── file3.txt -> /execution/data/three/file3.txt
└── two
    └── file3.txt -> /execution/data/two/file3.txt

3 directories, 3 files

.
├── one
│   └── file1.txt -> /execution/data/one/file1.txt
├── three
│   └── file1.txt -> /execution/data/three/file1.txt
└── two
    └── file1.txt -> /execution/data/two/file1.txt

3 directories, 3 files

.
├── one
│   └── file2.txt -> /execution/data/one/file2.txt
├── three
│   └── file2.txt -> /execution/data/three/file2.txt
└── two
    └── file2.txt -> /execution/data/two/file2.txt

3 directories, 3 files

But this works with my changes when the input is changed to:

input: 
path ("${sourceObj.parent.name}/*")

This way of organizing data is frequently used for data cubes in remote sensing, so supporting it would make Nextflow easier to use for remote sensing workflows that rely on data cubes.
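
The grouping and staging in this example can be sketched in plain Python to show which layout each task should receive. This is only an illustration: `group_by_name` and `stage_layout` are hypothetical helpers, not Nextflow functions, and the paths mirror the extended data/ tree above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_name(paths):
    """Group paths by file name, like map { tuple(it.name, it) }.groupTuple()."""
    groups = defaultdict(list)
    for p in map(PurePosixPath, paths):
        groups[p.name].append(p)
    return dict(groups)


def stage_layout(group):
    """Map '<parent folder>/<file name>' to the source path for one task."""
    return {f"{p.parent.name}/{p.name}": str(p) for p in group}


paths = [
    f"/execution/data/{folder}/file{i}.txt"
    for folder in ("one", "three", "two")
    for i in (1, 2, 3)
]

tasks = group_by_name(paths)
layout = stage_layout(tasks["file3.txt"])
for stage_name, source in layout.items():
    print(f"{stage_name} -> {source}")
```

Each of the three groups becomes one task, and within a task every file keeps its own parent folder, which is what a single fixed name per task cannot express.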

…puts

Signed-off-by: Lehmann_Fabian <fabian.lehmann@informatik.hu-berlin.de>

# Conflicts:
#	docs/process.rst
@netlify

netlify bot commented Jul 25, 2023

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit b526cf9
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/64bfd3e4add64d0008c58f26
😎 Deploy Preview https://deploy-preview-2622--nextflow-docs-staging.netlify.app/process

@bentsherman
Member

Looking at this PR again (after a certain reviewer referred me to it 😉), I think I have more clarity on how to incorporate this functionality.

In thinking about static types for processes, I have concluded that file staging needs to be separated from the input declaration, in order for the inputs to be typed in the same way as the rest of the language. For example, a separate stage: section could let you define more clearly how to map input files into a particular directory structure.

This is especially important when passing in directories, but even when we have record types, which I think will be a nicer way to model these directories, we will still have this problem of how to materialize the right directory structure.

Here is a possible syntax:

workflow {
  ch_inputs = channel.fromPath('/data/**/*.txt')
    .map { file -> tuple(file.name, file) }
    .groupTuple()
    .map { key, files -> files }

  find( ch_inputs )
}

process find {
    input:
    slice: Bag<Path>

    stage:
    stageAs(slice) { file -> "${file.parent}/${file.name}.txt" }

    script:
    """
    find . -name "*"
    """
}

The stageAs directive can be omitted to use some default staging behavior, but here you can provide a closure of type (Path) -> String.
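
To make the (Path) -> String contract concrete, here is a rough Python analogue. It is not the proposed Nextflow API: `resolve_stage_name` is a hypothetical helper, and staging a file under its own name when no function is given is an assumed default, since the default behavior is not specified here.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

# Python counterpart of a closure of type (Path) -> String.
StageNamer = Callable[[PurePosixPath], str]


def resolve_stage_name(
    file: PurePosixPath,
    stage_as: Optional[StageNamer] = None,
) -> str:
    """Return the stage name for one file, using stage_as when it is given."""
    if stage_as is None:
        # Assumed default: stage the file under its own name.
        return file.name
    name = stage_as(file)
    if not isinstance(name, str) or not name:
        raise TypeError("stage_as must return a non-empty string")
    return name


source = PurePosixPath("/data/one/file1.txt")
default_name = resolve_stage_name(source)
custom_name = resolve_stage_name(source, lambda f: f"{f.parent.name}/{f.name}")
print(default_name, custom_name)
```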

I don't know if we'll try to add this to the existing syntax, but I will definitely make it a priority for static types.

@Lehmann-Fabian
Contributor Author

Thanks for getting back to me, and I hope it was a helpful review 😉
Your suggestion looks very promising. Having static types is an interesting idea, and having the feature available in the future is excellent.
In the meantime, we succeeded in making our workflow an nf-core workflow, and using this feature will avoid the ugly renaming we had to do there.
I still think internally evaluating the name as a closure offers new possibilities without altering existing logic. But knowing that there is a feature on the horizon is enough. Thanks 😊

@bentsherman bentsherman requested review from bentsherman and removed request for pditommaso November 22, 2025 17:17
@github-actions github-actions bot removed the stale label Feb 13, 2026
@bentsherman
Member

Closing in favor of #6905
