Bug report
An example output for my failed GCP Batch jobs:
2025-05-14 11:08:52.610 PDT
mv: preserving times for '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_ExonOverIntron/raw.h5ad': No space left on device
2025-05-14 11:08:52.610 PDT
mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_ExonOverIntron/raw.h5ad': No space left on device
2025-05-14 11:08:52.835 PDT
mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_ExonOverIntron/Summary.csv': No space left on device
2025-05-14 11:08:53.027 PDT
mv: preserving times for '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_Ex50pAS/raw.h5ad': No space left on device
2025-05-14 11:08:53.027 PDT
mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_Ex50pAS/raw.h5ad': No space left on device
2025-05-14 11:08:53.287 PDT
mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/Gene/Summary.csv': No space left on device
2025-05-14 11:08:53.746 PDT
mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./versions.yml': No space left on device
2025-05-14 11:08:58.815 PDT
Task task/nf-13150368-174723-238fac2a-1e92-4c510-group0-0/0/0 runnable 0 exited with status 0
2025-05-14 11:08:58.815 PDT
Task task/nf-13150368-174723-238fac2a-1e92-4c510-group0-0/0/0 background runnables all exited on their own.
2025-05-14 11:08:58.815 PDT
Task task/nf-13150368-174723-238fac2a-1e92-4c510-group0-0/0/0 succeeded
I'm using:
process {
shell = ['/bin/bash', '-euo', 'pipefail']
}
... so I don't see why the mv commands are not causing the job to fail due to the No space left on device errors.
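As a sanity check of that expectation, here is a minimal sketch (not taken from the pipeline) that runs a failing mv under the same shell options. The run_strict helper and both paths are hypothetical; the source path is assumed not to exist so that mv fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command string with the same options as process.shell.
run_strict() {
  /bin/bash -euo pipefail -c "$1"
}

# The source path is assumed not to exist, so mv fails.
# With -e the script stops there and the echo is never reached.
status=0
run_strict 'mv /nonexistent/source-file /tmp/dest-file 2>/dev/null; echo "not reached"' || status=$?
echo "exit status: ${status}"
```

This should report a non-zero status, which is what makes the succeeded state in the Batch log above surprising.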
I'm guessing that the No space left on device errors mean the disk requested for the task is too small. My process:
process STAR_map {
label "STAR_env"
cpus Math.min(params.cpus_max, 20)
memory { 40.GB * (1 + 0.5 * task.attempt) }
time { 4.h + (4 * task.attempt).h }
disk {
def read1_size_gb = fastq_read1.size() / 1024 ** 3
def read2_size_gb = fastq_read2.size() / 1024 ** 3
def size_gb = (read1_size_gb + read2_size_gb) * 10
def ssd_count = Math.max(1, Math.ceil(size_gb / 375)).intValue()
println "${meta.id} => size_gb: ${size_gb.round(1)}; ssd_count: ${ssd_count}"
[request: (375 * ssd_count * task.attempt).GB, type: "local-ssd"]
}
input:
tuple val(meta), path(fastq_read1), path(fastq_read2), path(genome_dir)
each path(star_par_file)
output:
path "*", emit: all // all files/folders will be saved into $outdir/STAR directory
tuple val(meta), path("${meta.id}_Aligned.toTranscriptome.out.bam"), emit: trbam // Transcriptome alignments will be passed to Salmon (or other transcript quantification)
tuple val(meta), path("${meta.id}_Aligned.sortedByCoord.out.bam"), emit: bam // Genome alignments, sorted by coordinate by STAR
tuple val(meta), path("${meta.id}_ReadsPerGene.out.tab"), emit: reads_per_gene // for multiqc
tuple val(meta), path("Log.final.out"), emit: log_final // STAR log file
tuple val(meta), path("Solo.out"), emit: solo // Solo output directory
path "versions.yml", emit: versions
script:
"""
STAR ${params.extra_pars_star} \\
--runThreadN ${task.cpus} \\
--parametersFiles ${star_par_file} \\
--genomeDir ${genome_dir} \\
--readFilesIn ${fastq_read1} ${fastq_read2} \\
--outSAMattrRGline ID:${meta.id} SM:${meta.id} PL:ILLUMINA \\
--soloStrand ${meta.strandedness} \\
2>&1 | tee ${task.process}_${meta.id}.log
# remove temporary STAR files (sometimes they are not removed by STAR)
rm -rf _STARtmp
echo "# Compressing the output for the sake of scanpy" | tee -a ${task.process}_${meta.id}.log
find Solo.out -type f -name "*.mtx" | xargs -P ${task.cpus} gzip
find Solo.out -type f -name "*.tsv" | xargs -P ${task.cpus} gzip
echo "# Converting solo output to h5ad" | tee -a ${task.process}_${meta.id}.log
mtx-to-h5ad.py --output-dir Solo.out --sample "${meta.id}" Solo.out
# rename Aligned.toTranscriptome.out.bam by adding the sample name
mv Aligned.toTranscriptome.out.bam ${meta.id}_Aligned.toTranscriptome.out.bam
mv Aligned.sortedByCoord.out.bam ${meta.id}_Aligned.sortedByCoord.out.bam
mv ReadsPerGene.out.tab ${meta.id}_ReadsPerGene.out.tab
cat <<-END_VERSIONS > versions.yml
"${task.process}":
star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""
}
...but GCP Batch jobs that exit with status 0 despite these errors make the issue hard to troubleshoot.
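To check what the disk directive above requests for a given pair of FASTQ files, its arithmetic can be reproduced outside Nextflow. This is only a sketch: the ssd_count function name is hypothetical, input sizes are passed in bytes, and the 10x multiplier and 375 GB per local SSD are copied from the directive.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the disk directive:
#   size_gb   = (read1 + read2) in GiB, times 10
#   ssd_count = max(1, ceil(size_gb / 375))
ssd_count() {
  local read1_bytes=$1 read2_bytes=$2
  awk -v r1="$read1_bytes" -v r2="$read2_bytes" 'BEGIN {
    size_gb = (r1 + r2) / (1024 ^ 3) * 10
    n = int(size_gb / 375)
    if (n * 375 < size_gb) n++   # round up to a whole SSD
    if (n < 1) n = 1             # always request at least one SSD
    print n
  }'
}

# Example: two 20 GiB FASTQ files give size_gb = 400, which needs two 375 GB SSDs.
ssd_count 21474836480 21474836480
```

On the first attempt the request is then 375 * ssd_count GB, so whether that covers STAR's temporary files plus the Solo.out conversion depends on how well the 10x estimate fits the data.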
Expected behavior and actual behavior
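Expected: when a command in the task script fails, the GCP Batch job exits with a non-zero status. Actual: Nextflow does not seem to propagate all Bash errors, so the GCP Batch job exits with status 0 and is reported as succeeded.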
Steps to reproduce the problem
See above
Program output
The relevant log:
~> TaskHandler[id: 5; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/98/a4db46edf26cb46d29cdc66fd4bcdc]
~> TaskHandler[id: 6; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/dd/ecf421747f747dba9031d294bbc5a4]
May-14 10:54:22.668 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `STAR_map (LB_Brain_28-29_FLEX)` - terminated job=nf-eb1af425-1747237817762; task=0; state=SUCCEEDED
May-14 10:54:23.220 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Cannot read exit status for task: `STAR_map (LB_Brain_28-29_FLEX)` - For input string: ""
May-14 10:54:23.221 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3; name: STAR_map (LB_Brain_28-29_FLEX); status: COMPLETED; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/eb/1af4254a3364c13759c44d3f1f91bd]
May-14 10:54:23.222 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'TaskFinalizer' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[-1]; allowCoreThreadTimeout=false
May-14 10:54:23.330 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=STAR_map (LB_Brain_28-29_FLEX); work-dir=gs://arc-genomics-nextflow/work/eb/1af4254a3364c13759c44d3f1f91bd
error [nextflow.exception.ProcessFailedException]: Process `STAR_map (LB_Brain_28-29_FLEX)` terminated for an unknown reason -- Likely it has been terminated by the external system
May-14 10:54:23.344 [TaskFinalizer-1] INFO nextflow.processor.TaskProcessor - [eb/1af425] NOTE: Process `STAR_map (LB_Brain_28-29_FLEX)` terminated for an unknown reason -- Likely it has been terminated by the external system -- Execution is retried (1)
May-14 10:54:25.142 [Task submitter] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `STAR_map (LB_Brain_28-29_FLEX)` submitted > job=nf-eb0eaa6a-1747245263940; uid=nf-eb0eaa6a-174724-28d58387-b7f0-44df0; work-dir=gs://arc-genomics-nextflow/work/eb/0eaa6a7855d6f04b37093ad82ab212
May-14 10:54:25.143 [Task submitter] INFO nextflow.Session - [eb/0eaa6a] Re-submitted process > STAR_map (LB_Brain_28-29_FLEX)
May-14 10:55:12.115 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor google-batch > tasks to be completed: 6 -- submitted tasks are shown below
~> TaskHandler[id: 1; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/56/6e3ab805acfa393943326fc1d31bd9]
~> TaskHandler[id: 2; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7]
~> TaskHandler[id: 4; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/ab/335dbd229a567686123b109c0f79f3]
~> TaskHandler[id: 5; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/98/a4db46edf26cb46d29cdc66fd4bcdc]
~> TaskHandler[id: 6; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/dd/ecf421747f747dba9031d294bbc5a4]
~> TaskHandler[id: 7; name: STAR_map (LB_Brain_28-29_FLEX); status: SUBMITTED; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/eb/0eaa6a7855d6f04b37093ad82ab212]
May-14 11:00:12.121 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor google-batch > tasks to be completed: 6 -- submitted tasks are shown below
~> TaskHandler[id: 1; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/56/6e3ab805acfa393943326fc1d31bd9]
~> TaskHandler[id: 2; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7]
~> TaskHandler[id: 4; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/ab/335dbd229a567686123b109c0f79f3]
~> TaskHandler[id: 5; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/98/a4db46edf26cb46d29cdc66fd4bcdc]
~> TaskHandler[id: 6; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/dd/ecf421747f747dba9031d294bbc5a4]
~> TaskHandler[id: 7; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/eb/0eaa6a7855d6f04b37093ad82ab212]
May-14 11:05:12.125 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor google-batch > tasks to be completed: 6 -- submitted tasks are shown below
~> TaskHandler[id: 1; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/56/6e3ab805acfa393943326fc1d31bd9]
~> TaskHandler[id: 2; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7]
~> TaskHandler[id: 4; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/ab/335dbd229a567686123b109c0f79f3]
~> TaskHandler[id: 5; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/98/a4db46edf26cb46d29cdc66fd4bcdc]
~> TaskHandler[id: 6; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/dd/ecf421747f747dba9031d294bbc5a4]
~> TaskHandler[id: 7; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/eb/0eaa6a7855d6f04b37093ad82ab212]
May-14 11:09:02.345 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `STAR_map (LB_Brain_10_FLEX)` - terminated job=nf-13150368-1747237814724; task=0; state=SUCCEEDED
May-14 11:09:02.931 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Cannot read exit status for task: `STAR_map (LB_Brain_10_FLEX)` - For input string: ""
May-14 11:09:02.932 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: STAR_map (LB_Brain_10_FLEX); status: COMPLETED; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7]
May-14 11:09:03.273 [TaskFinalizer-2] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=STAR_map (LB_Brain_10_FLEX); work-dir=gs://arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7
error [nextflow.exception.ProcessFailedException]: Process `STAR_map (LB_Brain_10_FLEX)` terminated for an unknown reason -- Likely it has been terminated by the external system
May-14 11:09:03.274 [TaskFinalizer-2] INFO nextflow.processor.TaskProcessor - [13/150368] NOTE: Process `STAR_map (LB_Brain_10_FLEX)` terminated for an unknown reason -- Likely it has been terminated by the external system -- Execution is retried (1)
May-14 11:09:05.609 [Task submitter] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `STAR_map (LB_Brain_10_FLEX)` submitted > job=nf-02949505-1747246143812; uid=nf-02949505-174724-9e82d505-a03e-45560; work-dir=gs://arc-genomics-nextflow/work/02/9495054c5685ca39bee0dd5caffd2b
May-14 11:09:05.610 [Task submitter] INFO nextflow.Session - [02/949505] Re-submitted process > STAR_map (LB_Brain_10_FLEX)
May-14 11:10:12.139 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor google-batch > tasks to be completed: 6 -- submitted tasks are shown below
~> TaskHandler[id: 1; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/56/6e3ab805acfa393943326fc1d31bd9]
~> TaskHandler[id: 4; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/ab/335dbd229a567686123b109c0f79f3]
~> TaskHandler[id: 5; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/98/a4db46edf26cb46d29cdc66fd4bcdc]
~> TaskHandler[id: 6; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/dd/ecf421747f747dba9031d294bbc5a4]
~> TaskHandler[id: 7; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/eb/0eaa6a7855d6f04b37093ad82ab212]
~> TaskHandler[id: 8; name: STAR_map (LB_Brain_10_FLEX); status: SUBMITTED; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/02/9495054c5685ca39bee0dd5caffd2b]
May-14 11:15:12.144 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor google-batch > tasks to be completed: 6 -- submitted tasks are shown below
~> TaskHandler[id: 1; name: STAR_map (LB_Brain_10_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/56/6e3ab805acfa393943326fc1d31bd9]
~> TaskHandler[id: 4; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/ab/335dbd229a567686123b109c0f79f3]
~> TaskHandler[id: 5; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/98/a4db46edf26cb46d29cdc66fd4bcdc]
~> TaskHandler[id: 6; name: STAR_map (LB_Brain_32-33_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/dd/ecf421747f747dba9031d294bbc5a4]
~> TaskHandler[id: 7; name: STAR_map (LB_Brain_28-29_FLEX); status: RUNNING; exit: -; error: -; workDir: gs://arc-genomics-nextflow/work/eb/0eaa6a7855d6f04b37093ad82ab2
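The Cannot read exit status ... For input string: "" lines in the log suggest that the exit status Nextflow tried to parse was an empty string. Nextflow records the task exit status in a .exitcode file in the task work dir, so one way to confirm this is to look at that file. The sketch below assumes the work dir has first been copied to a local directory; the read_exitcode helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the .exitcode file in a local copy of a task work dir.
read_exitcode() {
  local file="$1/.exitcode"
  if [ ! -e "$file" ]; then
    echo "missing"              # the file was never written
  elif [ ! -s "$file" ]; then
    echo "empty"                # the file exists but has no content
  else
    tr -d '[:space:]' < "$file" # print the recorded status without whitespace
    echo
  fi
}
```

For example, read_exitcode ./workdir-copy would print empty if the file was created but nothing could be written to it, which would be consistent with the disk being full when the task finished.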
Environment
- Nextflow version: 24.10.5.5935
- Java version: openjdk 11.0.1 2018-10-1
- Operating system: Linux
- Bash version: 5.1.16
Additional context
See https://nfcore.slack.com/archives/C02T98A23U7/p1736629652539469?thread_ts=1667836041.736919&cid=C02T98A23U7 for more context