[New Feature] Evaluate Closure for every input file #2622
Lehmann-Fabian wants to merge 13 commits into nextflow-io:master
Signed-off-by: Lehmann-Fabian <fabian.lehmann@informatik.hu-berlin.de>
|
For datacube-structured Earth Observation datasets, this PR would be extremely helpful! |
Signed-off-by: Fabian Lehmann <46564585+Lehmann-Fabian@users.noreply.github.com>
|
Hi @pditommaso, I am reaching out regarding this PR that has been open for over a year without any action but is still of great interest. To provide you with an example of the necessity of this PR: I would greatly appreciate it if you could take some time to review this PR and provide feedback on any changes that could be made to improve it. |
|
Can you please remind me what you are trying to solve? Nextflow already supports dynamic file name resolution. For example having this and using this script
process foo {
  debug true
  input:
  tuple val(name), path("$name/*")
  '''
  tree .
  '''
}
workflow {
  channel.fromPath('data/**/*.txt').map { tuple(it.parent.name, it) } | foo
}
It returns |
|
Thank you very much for getting back on this. Let me extend your input:
Now in your Nextflow script, I group the files by their name. All
With the current Nextflow version, I wouldn't be able to get the following:
But this worked with my adjustment and changing the input to:
This way of organizing data is frequently used for data cubes in remote sensing, so supporting it would make Nextflow easier to use for remote sensing workflows with data cubes. |
…puts Signed-off-by: Lehmann_Fabian <fabian.lehmann@informatik.hu-berlin.de> # Conflicts: # docs/process.rst
|
Looking at this PR again (after a certain reviewer referred me to it 😉), I think I have more clarity on how to incorporate this functionality. In thinking about static types for processes, I have concluded that file staging needs to be separated from the input declaration, in order for the inputs to be typed in the same way as the rest of the language. For example, a separate This is especially important when passing in directories, but even when we have record types, which I think will be a nicer way to model these directories, we will still have this problem of how to materialize the right directory structure. Here is a possible syntax:
workflow {
  ch_inputs = channel.fromPath('/data/**/*.txt')
    .map { file -> tuple(file.name, file) }
    .groupTuple()
    .map { key, files -> files }
  find( ch_inputs )
}
process find {
  input:
  slice: Bag<Path>
  stage:
  stageAs(slice) { file -> "${file.parent}/${file.name}.txt" }
  script:
  """
  find . -name "*"
  """
}
The I don't know if we'll try to add this to the existing syntax, but I will definitely make it a priority for static types. |
|
Thanks for getting back to me, and I hope it was a helpful review 😉 |
|
Closing in favor of #6905 |
Until now, Nextflow evaluates a Closure used to stage multiple input files only once.
Accordingly, it cannot produce individual staging names for different files in one channel/one task.
However, it can be helpful to evaluate the Closure for every file, as requested here: #1998.
This PR solves the problem without changing the original logic.
If the Closure produces the same name for several files, an increasing counter is added to the colliding names.
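The per-file evaluation and the collision counter described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the PR's Groovy implementation: the function `stage_names`, its parameters, and the `.1`, `.2` suffix format are all hypothetical and chosen only to show the idea.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, Set


def stage_names(files: Iterable[str],
                name_for: Callable[[PurePosixPath], str]) -> Dict[str, str]:
    """Map each source path to a stage name, calling name_for once per file."""
    used: Set[str] = set()
    counters: Dict[str, int] = {}
    staged: Dict[str, str] = {}
    for source in files:
        # The closure is evaluated for every file, not once per input declaration.
        base = name_for(PurePosixPath(source))
        name = base
        # Assumed suffix scheme: on a collision, append an increasing counter.
        while name in used:
            counters[base] = counters.get(base, 0) + 1
            name = f"{base}.{counters[base]}"
        used.add(name)
        staged[source] = name
    return staged


files = ["/data/a/x.txt", "/data/b/x.txt", "/data/a/y.txt"]
# A closure that keeps the parent folder yields distinct names for every file.
keep_folders = stage_names(files, lambda p: f"{p.parent.name}/{p.name}")
# A closure that drops the folder collides on x.txt, so the counter kicks in.
flat = stage_names(files, lambda p: p.name)
```

With these inputs, the folder-preserving closure needs no counter at all, while the flat closure gives the second `x.txt` a suffixed name instead of failing.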
I also considered applying this to the current logic for the case where you stage in as * and multiple files have the same name. But this would skip collision warnings, which some users may expect and use for debugging.
For example, the following code shows how to keep folder structures for inputs.