Default stageOutMode to 'copy' for Google Batch executor #6917

Open
rhassaine wants to merge 1 commit into nextflow-io:master from rhassaine:fix/unstage-move-overlapping-outputs-master

@rhassaine rhassaine commented Mar 13, 2026

Summary

  • Default stageOutMode to copy for Google Batch tasks when not explicitly set by the user
  • Fixes task failures during output unstaging when running on Google Batch with scratch enabled

Problem

On Google Batch, task outputs are unstaged from local scratch (local SSD) to a gcsfuse-mounted work directory, which is always a cross-device operation. The default move mode uses mv, which fails in two scenarios:

  1. Overlapping output declarations: When a process declares both a directory and files inside it (e.g. path("outdir/") and path("outdir/*.txt")), mv moves the directory first (with all contents), causing subsequent file moves to fail with No such file or directory

  2. Symlinked inputs: When staged input files are symlinks pointing back to the work directory, mv detects source and target as the same file and fails with 'X' and 'Y' are the same file
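The first scenario can be reproduced outside Nextflow. The following is a minimal sketch, assuming two temporary directories stand in for the scratch and work directories; the names `scratch`, `workdir`, `outdir` and `a.txt` are illustrative only, and the commands are a simplification rather than the actual unstaging code Nextflow generates. It does not cover the symlink scenario.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; directory and file names are illustrative.
set -u

scratch=$(mktemp -d)   # stands in for the local scratch directory
workdir=$(mktemp -d)   # stands in for the mounted work directory

mkdir -p "$scratch/outdir"
echo "hello" > "$scratch/outdir/a.txt"
cd "$scratch" || exit 1

# Output 1: path("outdir/") -- the whole directory is moved, contents included.
mv -f outdir "$workdir/"

# Output 2: path("outdir/*.txt") -- the file no longer exists in scratch.
if mv -f outdir/a.txt "$workdir/outdir/" 2>/dev/null; then
  move_status=ok
else
  move_status=failed
fi

# Same two outputs with copy semantics: the source is left in place.
rm -rf "$workdir/outdir"
mkdir -p "$scratch/outdir"
echo "hello" > "$scratch/outdir/a.txt"
if cp -fRL outdir "$workdir/" && cp -fRL outdir/a.txt "$workdir/outdir/"; then
  copy_status=ok
else
  copy_status=failed
fi

echo "move: $move_status, copy: $copy_status"
```

The second move fails because the first one already removed `outdir` from scratch, whereas the copy variant leaves the source tree intact for every declared output.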

Fix

After super(bean) in GoogleBatchScriptLauncher, default stageoutMode to copy on the SimpleFileCopyStrategy if the user hasn't explicitly set a stageOutMode. This uses cp -fRL instead of mv -f for unstaging, which handles both failure scenarios.
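The precedence rule described here can be illustrated with a small shell sketch. This is not the Groovy change made in GoogleBatchScriptLauncher; the function names `resolve_stage_out_mode` and `unstage_command` are hypothetical, and only the two modes named in this PR are mapped.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the defaulting rule; not the PR's Groovy code.

# Print the effective mode: an explicit user setting wins, otherwise 'copy'.
resolve_stage_out_mode() {
  local user_mode=${1:-}
  if [ -n "$user_mode" ]; then
    printf '%s\n' "$user_mode"
  else
    printf 'copy\n'
  fi
}

# Map a mode to the command named in the PR description.
unstage_command() {
  case "$1" in
    copy) printf 'cp -fRL\n' ;;
    move) printf 'mv -f\n' ;;
    *)    return 1 ;;          # other modes are outside this sketch
  esac
}

default_cmd=$(unstage_command "$(resolve_stage_out_mode)")
explicit_cmd=$(unstage_command "$(resolve_stage_out_mode move)")
echo "default: $default_cmd"
echo "explicit move: $explicit_cmd"
```

With no user setting the sketch resolves to `cp -fRL`, while an explicit `move` still resolves to `mv -f`, mirroring the intent that user configuration takes precedence.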

Test plan

  • Tested with a minimal pipeline on Google Batch with overlapping output declarations (path("outdir/") + path("outdir/*.txt")) — both tasks succeeded
  • Verified that explicitly setting stageOutMode in config still takes precedence (the fix only applies when bean.stageOutMode is not set)

🤖 Generated with Claude Code

netlify bot commented Mar 13, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit f03f527
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69b4090b51a40f000858d997

On Google Batch, task outputs are unstaged from local scratch to a
gcsfuse-mounted work directory, which is always a cross-device operation.
The default 'move' mode uses 'mv' which fails in two scenarios:

- When output declarations include both a directory and files inside it,
  the directory is moved first (with all contents), causing subsequent
  file moves to fail with 'No such file or directory'

- When staged input files are symlinks pointing back to the work
  directory, 'mv' detects source and target as the same file

Using 'copy' mode avoids both issues at no additional I/O cost since
cross-device 'mv' already performs a copy internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: rhassaine <r.hassaine@hartwigmedicalfoundation.nl>