Fix --save_output_as_bam flag #2154
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.5.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
Resolve conflicts:
- CHANGELOG.md: keep entries from both sides
- fastq_preprocess_gatk/main.nf: keep our fix removing old CRAM_TO_BAM_RECAL block
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The process was removed in #2154, so the sample log output in usage.md should no longer list it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of always producing CRAM and converting back to BAM:
- Make GATK4_MARKDUPLICATES/GATK4SPARK_MARKDUPLICATES ext.prefix
conditional on save_output_as_bam (same pattern as APPLYBQSR)
- Emit unified `alignment` channel from bam_markduplicates subworkflows
- Remove CRAM_TO_BAM conversion step at markduplicates stage
- Fix BAM_TO_CRAM_MAPPING ext.when to skip conversion when
save_output_as_bam is set with skip_markduplicates
- Fix CSV create subworkflows to derive file type from actual
filenames instead of using fragile .minus(".cram") hack
- Add BAM publishDir patterns for markduplicates configs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
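The rule described in this commit can be modelled outside the pipeline. The following is a minimal Python sketch, not the pipeline's Nextflow config: the function names are hypothetical, the `.md.bam`/`.md.cram` naming follows the metrics filenames mentioned in the PR description, and the second function covers only the single BAM_TO_CRAM_MAPPING rule named above (the real ext.when condition may involve other parameters).

```python
def markduplicates_alignment_name(sample_id: str, save_output_as_bam: bool) -> str:
    """Return the alignment filename the markduplicates step writes directly.

    Hypothetical helper: models a prefix that is conditional on the flag,
    so no CRAM -> BAM conversion is needed afterwards.
    """
    extension = "bam" if save_output_as_bam else "cram"
    return f"{sample_id}.md.{extension}"


def runs_bam_to_cram_mapping(save_output_as_bam: bool, skip_markduplicates: bool) -> bool:
    """Model of the one rule from the commit message (an assumption, simplified):
    skip the mapping-stage BAM -> CRAM conversion when BAM output is requested
    and markduplicates is skipped."""
    return not (save_output_as_bam and skip_markduplicates)


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, markduplicates_alignment_name("sample1", flag))
```

With the flag set, the step names its output as BAM from the start, which is why the separate CRAM_TO_BAM step at this stage becomes unnecessary.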
- Remove dead BAM_TO_CRAM config block (no process uses this alias)
- Remove unused save_output_as_bam parameter from CHANNEL_MARKDUPLICATES_CREATE_CSV and CHANNEL_BASERECALIBRATOR_CREATE_CSV (type is now derived from filename, making the parameter redundant)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GATK4_MARKDUPLICATES does not auto-index BAM output (only indexes when converting to CRAM). Add explicit INDEX_MARKDUPLICATES step for the BAM path, matching the pattern already used in the spark variant.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The skip QC/recal/md test now correctly skips BAM_TO_CRAM_MAPPING when --save_output_as_bam is set, reducing process count from 10 to 9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- alignment_from_everything: Remove CRAM_TO_BAM, add INDEX_MARKDUPLICATES, update file listings and stable_content md5s for BAM output
- alignment_to_fastq: Same structural changes, update multiqc aggregate md5s
- save_output_as_bam: Fix warning field to match CI behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- alignment_from_everything/alignment_to_fastq: Fix .bam.metrics md5s (values were extracted from wrong side of CI diff)
- save_output_as_bam: Add missing variant calling snapshot, fix warnings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GATK4_MARKDUPLICATES .bam.metrics output includes timestamps, making md5 values change between CI runs. Add to .nftignore (same as .cram.metrics) and remove from snapshot stable_content.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
maxulysse
left a comment
All good except for the order of the PR in the changelog.
Hi, what's the time estimate for merging & releasing this fix? Thanks!
I asked the issue authors above to test. @amizeranschi ran into something that I haven't had time to investigate: #2064. It would be good to first rule out whether the changes in this PR are causing the stuck issue, and/or to validate that all the other scenarios are functioning now, if you have time to run any of them.
- Sentieon dedup now respects --save_output_as_bam: conditional ext.prefix produces dedup.bam/dedup.cram natively; BAM_SENTIEON_DEDUP exposes a unified `alignment` emit mixing the BAM and CRAM branches (#2078)
- Variant-calling-stage CRAM_TO_BAM no longer publishes converted BAMs to preprocessing/converted/cram_to_bam when --save_output_as_bam is unset; the conversion still runs for cnvkit/msisensor2/muse (#2148)
- Widen the --use_gatk_spark markduplicates + --save_mapped check to error regardless of --save_output_as_bam: Spark requires name-sorted input, so the saved mapped alignment (BAM or CRAM) is name-sorted and unindexable (#1949).
Updated nextflow_schema.json and docs/usage.md accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Align indentation of SAMTOOLS_STATS withName block with sibling
- Make SAMTOOLS_STATS ext.prefix respect --sentieon_consensus, matching the SENTIEON_DEDUP prefix that was already conditional
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
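The widened Spark check can be illustrated with a small Python model. This is a sketch under assumptions rather than the pipeline's validation code: the function name and error text are hypothetical, and it assumes --use_gatk_spark is passed as a comma-separated list of tool names.

```python
from typing import Optional


def validate_spark_save_mapped(use_gatk_spark: Optional[str], save_mapped: bool) -> None:
    """Raise if Spark markduplicates is combined with --save_mapped.

    Deliberately takes no save_output_as_bam argument: the check applies
    to both BAM and CRAM, since the saved mapped alignment is name-sorted
    and cannot be indexed in either format.
    """
    # Assumption: tools are given as a comma-separated string.
    spark_tools = {tool.strip() for tool in (use_gatk_spark or "").split(",") if tool.strip()}
    if "markduplicates" in spark_tools and save_mapped:
        raise ValueError(
            "--use_gatk_spark markduplicates cannot be combined with --save_mapped: "
            "the mapped alignment would be name-sorted and unindexable"
        )


if __name__ == "__main__":
    validate_spark_save_mapped("baserecalibrator", save_mapped=True)  # passes silently
    try:
        validate_spark_save_mapped("markduplicates", save_mapped=True)
    except ValueError as err:
        print(f"rejected: {err}")
```

Failing at parameter validation means the run stops before any alignment is produced, instead of publishing a file that cannot be indexed.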
Let's merge #2197 first to fix the tests.
The previous mix of bam/bai + cram/crai joins fails because SENTIEON_DEDUP.out.bai is non-optional in the module (sentieon driver always writes a .bai alongside .crai). With --save_output_as_bam unset, .out.bam is empty but .out.bai has an entry, so failOnMismatch on the bam side triggers a Join mismatch error before the pipeline can progress. Branch on params.save_output_as_bam instead: only the populated path runs, so the empty-side mismatch never happens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
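The failure mode in this commit message can be reproduced with plain lists. The sketch below is a Python model only: `join_by_key` loosely imitates a keyed channel join with fail-on-mismatch behaviour, and the filenames and the shape of `sentieon_dedup_outputs` are hypothetical stand-ins for the module's outputs as described above.

```python
def join_by_key(left, right, fail_on_mismatch=True):
    """Pair (key, value) items from two lists by key.

    With fail_on_mismatch, any key present on only one side is an error,
    which is the behaviour that broke the old bam/bai join.
    """
    right_by_key = dict(right)
    if fail_on_mismatch:
        unmatched = {key for key, _ in left}.symmetric_difference(right_by_key)
        if unmatched:
            raise ValueError(f"Join mismatch for keys: {sorted(unmatched)}")
    return [(key, value, right_by_key[key]) for key, value in left if key in right_by_key]


def sentieon_dedup_outputs(sample_id, save_output_as_bam):
    """Hypothetical model of the outputs: .bai is always emitted, the rest depends on the flag."""
    outputs = {"bam": [], "bai": [(sample_id, f"{sample_id}.dedup.bai")], "cram": [], "crai": []}
    if save_output_as_bam:
        outputs["bam"] = [(sample_id, f"{sample_id}.dedup.bam")]
    else:
        outputs["cram"] = [(sample_id, f"{sample_id}.dedup.cram")]
        outputs["crai"] = [(sample_id, f"{sample_id}.dedup.cram.crai")]
    return outputs


def unified_alignment_old(outputs):
    """Old approach: join both sides and mix. Fails when bam is empty but bai is not."""
    return join_by_key(outputs["bam"], outputs["bai"]) + join_by_key(outputs["cram"], outputs["crai"])


def unified_alignment_new(outputs, save_output_as_bam):
    """New approach: branch on the flag so only the populated pair is joined."""
    if save_output_as_bam:
        return join_by_key(outputs["bam"], outputs["bai"])
    return join_by_key(outputs["cram"], outputs["crai"])
```

Calling `unified_alignment_old` on the CRAM-mode outputs raises the mismatch error, while `unified_alignment_new` never evaluates the empty side.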
- sentieon_dedup: SAMTOOLS_STATS prefix now respects --sentieon_consensus
(test.consensus.cram.stats instead of test.dedup.cram.stats)
- variant_calling_{cnvkit,muse,all}: drop preprocessing/converted/cram_to_bam
paths since CRAM_TO_BAM no longer publishes when --save_output_as_bam is unset
- alignment_{from_everything,to_fastq}: refresh MultiQC samtools md5s
- save_output_as_bam: add MultiQC bcftools_stats + samtools_insert_size
entries now that variant calling actually runs (no longer silently skipped)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Derive file type from `file.name.endsWith('.bam')` like
channel_markduplicates_create_csv and channel_baserecalibrator_create_csv
already do, and drop the now-unused save_output_as_bam parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
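Deriving the type from the filename, as this commit describes, amounts to a suffix check. A minimal Python sketch follows, assuming only two alignment formats are possible; the function names are hypothetical, and the pipeline itself does this in Groovy with `file.name.endsWith('.bam')`.

```python
def alignment_file_type(filename: str) -> str:
    """Return 'bam' or 'cram' based on the file's actual name.

    Assumption: anything that is not a .bam is treated as CRAM, mirroring
    a two-way check on the suffix rather than a pipeline parameter.
    """
    return "bam" if filename.endswith(".bam") else "cram"


def index_file_type(filename: str) -> str:
    """Return the matching index type: .bai for BAM, .crai for CRAM."""
    return "bai" if alignment_file_type(filename) == "bam" else "crai"


if __name__ == "__main__":
    for name in ("sample1.recal.bam", "sample1.recal.cram"):
        print(name, alignment_file_type(name), index_file_type(name))
```

Because the answer comes from the file that was actually produced, the CSV stays correct even if the flag and the output ever disagree, which is why the save_output_as_bam parameter could be dropped.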
What this fixes
`--save_output_as_bam` was broken in several ways. This PR makes it work end-to-end and tightens an existing safeguard.
User-visible behaviour
- With `--save_output_as_bam`: markduplicates, sentieon dedup, and recalibration all publish BAM directly. No more silently-skipped variant calling, no more duplicate-emission errors.
- Without `--save_output_as_bam`: BAM files no longer appear in `preprocessing/converted/cram_to_bam/` as a side-effect of cnvkit/msisensor2/muse running. The conversion still happens for those tools (they need BAM input), it's just not published.
- `--use_gatk_spark markduplicates` + `--save_mapped` now errors at parameter validation, regardless of `--save_output_as_bam`. Previously it produced a name-sorted, unindexable mapped BAM/CRAM (Potentially incompatible outputs are generated with BAM output and Spark MarkDuplicates #1949).
How
Instead of always producing CRAM and converting back to BAM, the relevant processes now output the requested format directly:
- `APPLYBQSR`, `GATK4_MARKDUPLICATES`, `GATK4SPARK_MARKDUPLICATES`, and `SENTIEON_DEDUP` have `ext.prefix`/`ext.suffix` conditional on `params.save_output_as_bam`.
- `bam_applybqsr`, `bam_markduplicates`, `bam_markduplicates_spark`, and `bam_sentieon_dedup` each expose a single `alignment` channel that mixes their BAM and CRAM output branches. Only one is ever populated per sample, so there is no duplicate emission to join on.
- The CSV create subworkflows derive the file type from the actual filenames instead of using the fragile `.minus(".cram")` hack.
- The `CRAM_TO_BAM`/`CRAM_TO_BAM_RECAL` conversion steps at the preprocessing stages are gone. The variant-calling-stage `CRAM_TO_BAM` (needed for cnvkit/msisensor2/muse BAM input) still runs but now only publishes its output when `--save_output_as_bam` is set.
Issues closed
- `GATK4_APPLYBQSR` duplicate-emission on left channel
- `.recal.bam` but downstream looked for `.recal.cram`
- `preprocessing/converted/cram_to_bam/` without the flag
- `--save_output_as_bam --save_mapped` join mismatch
- `.md.cram.metrics` instead of `.md.bam.metrics` when saving as BAM
- `--step prepare_recalibration` with BAM input
- `--save_mapped` produced unindexable output
Test plan
- `nf-test test tests/save_output_as_bam.nf.test --profile debug,test,docker`: both scenarios pass
- `nf-test test tests/default.nf.test --profile debug,test,docker`: default test passes (no BAM artifacts without flag)
- `--save_output_as_bam`: BAM files in `preprocessing/markduplicates/` and `preprocessing/recalibrated/`, variant calling runs
- `preprocessing/converted/cram_to_bam/` artifacts
- `preprocessing/sentieon_dedup/` when flag is set
- `--use_gatk_spark markduplicates --save_mapped` errors out at parameter validation
🤖 Generated with Claude Code