Skip to content

RFC: Stage module/project bin scripts as task inputs #6928

Draft
pinin4fjords wants to merge 1 commit into master from jm/stage-bin-scripts-as-inputs


@pinin4fjords
Contributor

This is a draft/thought experiment - not intended for immediate merge. Opening to get feedback on the approach and surface any concerns before investing further. Happy to close if the direction doesn't make sense.

Stage module bin scripts as task inputs

Problem

Module binaries (scripts under resources/usr/bin/ within a module directory) currently require nextflow.enable.moduleBinaries = true, plus either a local/shared filesystem or, on cloud executors, Wave containers. On local/HPC executors, the mechanism bind-mounts host directories into containers. This is fragile when paths are remapped, when containers run rootless, or when the filesystem is not shared. On cloud executors, module binaries don't work at all without Wave.

The workflow-level bin/ directory has the same bind-mount limitation for containerized tasks.

Solution

Stage bin scripts into the task work directory as implicit input files, using the same proven staging infrastructure that handles regular task inputs. A hidden .bin/ directory is created in each task's work directory, and $PATH is updated in the wrapper script to include it.

  • Local/HPC executors: bin scripts are symlinked per-task (zero-cost), with source directories mounted into containers for symlink resolution
  • Cloud executors: bin scripts are uploaded once to {workDir}/.nextflow/bin/ in cloud storage, then staged per-task via standard cloud download commands
  • Project bin scripts: only scripts referenced in the task's script block are staged (same tokenization approach as TaskHasher)

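This relies on one mechanism across all executors and container engines. It needs no feature flag, no Wave dependency, and no bin-specific bind mounts. Where a container needs the original files (symlinked scripts on local/HPC), they are mounted by the same logic that handles any other input file.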
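As a rough illustration of the wrapper-side step, the Python sketch below emits the kind of bash lines described above: `chmod +x` for each script staged in the hidden `.bin/` directory, followed by a `PATH` export. The helper name `bin_wrapper_fragment`, the use of `$PWD`, and prepending (rather than appending) to `PATH` are assumptions for illustration. This is not the actual BashWrapperBuilder output.

```python
import shlex


def bin_wrapper_fragment(bin_names, bin_dir=".bin"):
    """Return bash lines that make staged bin scripts executable and put them on PATH.

    Hypothetical helper for illustration; the real wrapper output may differ.
    """
    lines = []
    for name in sorted(bin_names):
        # Quote each path so names with spaces or shell metacharacters stay safe
        lines.append(f"chmod +x {shlex.quote(f'{bin_dir}/{name}')}")
    # Assumption: the task-local dir is prepended, so staged scripts take precedence
    lines.append(f'export PATH="$PWD/{bin_dir}:$PATH"')
    return "\n".join(lines)


print(bin_wrapper_fragment(["run_qc.py", "plot.R"]))
# chmod +x .bin/plot.R
# chmod +x .bin/run_qc.py
# export PATH="$PWD/.bin:$PATH"
```

`chmod` follows symbolic links, so on local executors where `.bin/` entries are symlinks, this step changes the mode of the source file rather than the link itself.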

How each environment works

| Environment | Bin file paths in inputFiles | Per-task staging | Container handling |
|---|---|---|---|
| Local/HPC, no container | Local paths | Symlink (zero-cost) | N/A |
| Local/HPC + Docker/Singularity | Local paths | Symlink + mount originals | addMountForInputs |
| Cloud + Fusion | Cloud paths (uploaded once) | Fusion remaps | Fusion mounts |
| Cloud, no Fusion | Cloud paths (uploaded once) | Cloud download commands | N/A (container-native) |
| Any + stageInMode = 'copy' | Same as above | Physical copy | No mounts needed |
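For the cloud rows, "uploaded once" amounts to a per-file memoized upload shared by all processes. On the JVM, `ConcurrentHashMap.computeIfAbsent` provides this pattern. The Python sketch below shows the same idea using one lock per source path. The class name `BinUploadCache`, the `upload` callback, and the hashed subdirectory under `.nextflow/bin/` are assumptions, not the PR's code.

```python
import hashlib
import threading
from pathlib import PurePosixPath


class BinUploadCache:
    """Hypothetical sketch: upload each bin script at most once, shared across processes."""

    def __init__(self, work_dir, upload):
        self._remote_root = f"{work_dir.rstrip('/')}/.nextflow/bin"
        self._upload = upload      # callable(local_path, remote_path), e.g. a storage copy
        self._guard = threading.Lock()
        self._key_locks = {}       # local path -> lock serializing that path's upload
        self._done = {}            # local path -> remote path, filled after a successful upload

    def remote_path(self, local_path):
        # Get (or create) the lock for this path; the guard is held only briefly
        with self._guard:
            key_lock = self._key_locks.setdefault(local_path, threading.Lock())
        # Concurrent callers for the same path wait here instead of uploading twice
        with key_lock:
            if local_path not in self._done:
                # Assumption: a digest of the source path keeps same-named scripts from
                # different modules apart while preserving the file name
                digest = hashlib.sha256(local_path.encode()).hexdigest()[:12]
                remote = f"{self._remote_root}/{digest}/{PurePosixPath(local_path).name}"
                self._upload(local_path, remote)  # if this raises, nothing is cached
                self._done[local_path] = remote
            return self._done[local_path]
```

Each task would then list the returned remote path under `.bin/<name>` in its input map, so per-task staging becomes an ordinary download or Fusion access.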

What changed vs. the old module binaries behavior

| | Before | After |
|---|---|---|
| Feature flag | nextflow.enable.moduleBinaries = true required | No flag needed (deprecated with warning) |
| Local/HPC | Bind-mounts host bin dirs into containers | Symlinks scripts into task work dir |
| Cloud executors | Requires Wave containers | Uploads once, stages via standard input mechanism |
| Supported bin paths | resources/usr/bin/ | resources/bin/, resources/usr/bin/, resources/usr/local/bin/ |
| Container compatibility | Depends on host path accessibility | Works with any container engine |
| Project bin staging | All scripts added to PATH via directory | Only scripts referenced in task script are staged |
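The expanded set of supported bin paths can be sketched as a scan of three fixed subdirectories of a module's `resources/` directory, keeping regular files that have an execute bit. This only loosely mirrors the described `ResourcesBundle.getBinFiles()`. Two behaviors are assumptions: scanning only the top level of each directory, and letting `bin/` take priority over `usr/bin/` and `usr/local/bin/` when names repeat. The function name `get_bin_files` is hypothetical.

```python
import stat
from pathlib import Path

# Subdirectories of a module's resources/ dir that are searched, in priority order
BIN_PREFIXES = ("bin", "usr/bin", "usr/local/bin")
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def get_bin_files(resources_dir):
    """Return {script name: Path} for executable regular files under the bin prefixes."""
    root = Path(resources_dir)
    found = {}
    for prefix in BIN_PREFIXES:
        bin_dir = root / prefix
        if not bin_dir.is_dir():
            continue
        for entry in sorted(bin_dir.iterdir()):
            # Skip subdirectories and files without any execute bit
            if entry.is_file() and entry.stat().st_mode & EXEC_BITS:
                # Earlier prefixes win when the same name appears more than once (assumption)
                found.setdefault(entry.name, entry)
    return found
```

Each returned entry would then be staged under `.bin/<name>`, which flattens the three directories into one namespace.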

Key changes

  • TaskProcessor: getModuleBinFiles() and getProjectBinFiles() collect bin scripts from module bundles and the project bin/ directory, prefixed with .bin/. For cloud work dirs, uploadBinFiles() uploads files once to {workDir}/.nextflow/bin/ using FileHelper.copyPath() with a shared ConcurrentHashMap cache to avoid duplicate uploads across processors
  • TaskBean: Merges bin files into the task's input files map for staging. Project bin files are filtered to only those referenced in the task script (via TaskProcessor.getReferencedProjectBinFiles()). Warns on filename collisions between module and project bins (project wins)
  • BashWrapperBuilder: Generates chmod +x and PATH export in the wrapper script; bin files are mounted into containers like any other input file
  • SimpleFileCopyStrategy: Uses the task's stageInMode for all files (no forced copy mode for bin files)
  • ResourcesBundle: getBinFiles() method returns executable files under bin/, usr/bin/, usr/local/bin/
  • TaskHasher: Includes module bin files in cache hash for correct invalidation
  • CmdRun: detectModuleBinaryFeature emits deprecation warnings when the flag is explicitly set
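The filtering and merging rules in the TaskProcessor and TaskBean bullets can be sketched together in two steps. First, keep only the project scripts whose name appears as a token of the task script. Second, merge them with module scripts under a `.bin/` prefix, warning on collisions and letting the project copy win. The delimiter set, the function names, and the use of Python's `warnings` module are assumptions. The actual tokenizer shared with TaskHasher may split differently.

```python
import re
import warnings

# Assumed delimiters: whitespace plus common shell punctuation
_TOKEN_SPLIT = re.compile(r"[\s()\[\]{};&|<>`]+")


def referenced_project_bins(script, project_bins):
    """Keep only project bin scripts whose file name appears as a token of the task script."""
    tokens = {tok for tok in _TOKEN_SPLIT.split(script) if tok}
    return {name: path for name, path in project_bins.items() if name in tokens}


def merge_bin_inputs(module_bins, project_bins):
    """Merge bin scripts into '.bin/<name>' input entries; project scripts win on collision."""
    staged = {f".bin/{name}": path for name, path in module_bins.items()}
    for name, path in project_bins.items():
        key = f".bin/{name}"
        if key in staged:
            warnings.warn(f"bin script '{name}' is defined by both a module and the project; "
                          "using the project copy")
        staged[key] = path
    return staged


project = {"qc.py": "/proj/bin/qc.py", "unused.sh": "/proj/bin/unused.sh"}
module = {"qc.py": "/mods/qc/resources/usr/bin/qc.py", "fmt.sh": "/mods/qc/resources/bin/fmt.sh"}
script = "qc.py --in sample.bam | gzip > report.gz"
inputs = merge_bin_inputs(module, referenced_project_bins(script, project))
# inputs == {'.bin/qc.py': '/proj/bin/qc.py', '.bin/fmt.sh': '/mods/qc/resources/bin/fmt.sh'}
```

With exact-token matching as in this sketch, a script invoked through a path (for example `./qc.py` or `bin/qc.py`) or through a shell variable would not be staged. This bears on the reviewer question about whether all project bins should always be staged.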

Backward compatibility

  • getBinDirs() on TaskProcessor is deprecated and returns empty; plugins referencing it (e.g. nf-k8s) continue to compile
  • Cloud executor remoteBinDir upload for project-level scripts is unchanged
  • nextflow.enable.moduleBinaries = true emits a deprecation warning (no longer needed)
  • nextflow.enable.moduleBinaries = false is honored for one release cycle with a deprecation warning, then will be ignored
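The compatibility rules for nextflow.enable.moduleBinaries reduce to a three-way check: the flag is unset, `true`, or `false`. The sketch below expresses them with a hypothetical `resolve_module_binaries` helper that returns whether module bin scripts are staged. The warning texts are illustrative, not the messages the PR emits.

```python
import warnings


def resolve_module_binaries(flag):
    """Decide whether module bin scripts are staged.

    flag: parsed value of nextflow.enable.moduleBinaries, or None when unset.
    """
    if flag is None:
        return True  # new default: staging needs no opt-in
    if flag:
        warnings.warn("nextflow.enable.moduleBinaries is deprecated and no longer needed",
                      DeprecationWarning)
        return True
    # Explicit opt-out is still honored for one release cycle
    warnings.warn("nextflow.enable.moduleBinaries = false is deprecated and will be ignored "
                  "in a future release", DeprecationWarning)
    return False
```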

Future direction

As @bentsherman pointed out, with typed processes the stageAs directive could offer a more explicit alternative:

```nextflow
nextflow.preview.types = true

process HELLO {
  stage:
  stageAs file("${moduleDir}/bin/script.py"), '*'

  script:
  """
  script.py ...
  """
}
```

The implicit approach here and explicit stageAs are complementary. Implicit staging handles the common case and stays backward compatible with existing modules, while stageAs could serve users who want fine-grained control.

Questions for reviewers

  • Is staging bin scripts as input files the right direction, or is there a simpler path?
  • Any concerns with the symlink approach on local executors vs the old bind-mount?
  • Does the project bin filtering (tokenize script, stage only referenced scripts) feel right, or should all project bins always be staged?
  • How should this interact with remoteBinDir on cloud executors long-term?

Replace the bind-mount mechanism for module binaries and project bin/
scripts with input file staging. Bin scripts are staged into a hidden
.bin/ directory in each task's work directory and made available via PATH.

On local/HPC executors, scripts are symlinked (zero-cost). On cloud
executors, scripts are uploaded once to {workDir}/.nextflow/bin/ and
staged per-task via standard cloud download commands.

Project bin scripts are filtered to only those referenced in the task
script, avoiding unnecessary staging for large bin/ directories.

The nextflow.enable.moduleBinaries feature flag is deprecated with a
warning. Setting it to false is honored for one release cycle.

Signed-off-by: Jonathan Manning <jonathan.manning@seqera.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@netlify

netlify bot commented Mar 16, 2026

Deploy Preview for nextflow-docs-staging ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | c65e0fb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nextflow-docs-staging/deploys/69b819ed701e5d000899e583 |
| 😎 Deploy Preview | https://deploy-preview-6928--nextflow-docs-staging.netlify.app |
