Commit 61ce58a

pinin4fjords and claude authored
feat(fastq_qc_trim_filter_setstrandedness): Add Bowtie2 as alternative rRNA removal tool (#9474)
* feat(fastq_qc_trim_filter_setstrandedness): Add Bowtie2 as alternative rRNA removal tool

  Add bowtie2 as a third option for rRNA removal alongside sortmerna and ribodetector.

  Implementation details:
  - Paired-end: Uses samtools view -f 12 to filter pairs where BOTH mates are unmapped (bowtie2's --un-conc-gz incorrectly includes pairs where one mate aligned)
  - Single-end: Uses bowtie2's --un-gz directly via save_unaligned=true
  - Converts U→T in rRNA reference FASTAs (RNA sequences contain U, reads contain T)

  Changes:
  - Add BOWTIE2_ALIGN, BOWTIE2_ALIGN_PE, BOWTIE2_BUILD module imports
  - Add SAMTOOLS_VIEW and SAMTOOLS_FASTQ for paired-end filtering
  - Add ch_bowtie2_index input and make_bowtie2_index parameter
  - Update meta.yml with bowtie2 in ribo_removal_tool enum
  - Add paired-end and single-end bowtie2 test cases

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* refactor(fastq_qc_trim_filter_setstrandedness): Extract rRNA removal to fastq_remove_rrna subworkflow

  Factor out rRNA removal logic (sortmerna, ribodetector, bowtie2) into a dedicated fastq_remove_rrna subworkflow for better modularity and reusability.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Add snapshot

* fix(tests): Add missing tags for linting compliance

  - Add cat/fastq tag to fastq_remove_rrna tests
  - Add subworkflows/fastq_remove_rrna tag to fastq_qc_trim_filter_setstrandedness tests
  - Remove obsolete module tags (now covered by subworkflow)
  - Add test snapshots for fastq_remove_rrna

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* bump ribodetector

* Update ribodetector snaps

* Fix snapshot instability

* fix(ribodetector): Update to topics syntax for version collection

  - Convert ribodetector module to topics syntax for version collection
  - Update container to Wave container for ribodetector 0.3.2
  - Remove .out.versions access for modules using topics
  - Update test snapshots for topic-based version format
  - Clean up sed syntax and fix meta.yml quoting

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
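Two details from the implementation notes in the commit message can be illustrated outside Nextflow. The sketch below is a hedged Python illustration, not the code used by the subworkflow: `both_mates_unmapped` shows what `samtools view -f 12` selects (SAM FLAG bits 0x4 "read unmapped" and 0x8 "mate unmapped" must both be set), and `rna_to_dna_fasta` shows a U→T conversion that leaves FASTA header lines untouched. Both function names are hypothetical, and the actual pipeline may perform the conversion with a different tool.

```python
# SAM FLAG bits, as defined in the SAM specification.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def both_mates_unmapped(flag: int) -> bool:
    """Return True when a record would pass `samtools view -f 12`.

    `-f 12` keeps only records in which every bit of 12 (4 + 8) is set,
    i.e. the read itself and its mate are both unmapped.
    """
    required = READ_UNMAPPED | MATE_UNMAPPED  # 12
    return (flag & required) == required


def rna_to_dna_fasta(fasta_text: str) -> str:
    """Replace U with T on sequence lines; header lines (">...") are kept as-is."""
    converted = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            converted.append(line)
        else:
            converted.append(line.replace("U", "T").replace("u", "t"))
    return "\n".join(converted)


# Hypothetical FLAG values: 77 and 141 are the two mates of a fully unmapped pair,
# 73 is a mapped read whose mate is unmapped, 133 is an unmapped read whose mate mapped.
for flag in (77, 141, 73, 133):
    print(flag, both_mates_unmapped(flag))

print(rna_to_dna_fasta(">seq1 U-rich\nACGU\nuuga"))
```

Under this check, a pair in which only one mate aligned fails the filter for both records, which is the behaviour the commit message contrasts with `--un-conc-gz`.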
1 parent 0b24358 commit 61ce58a

15 files changed, +1057 -198 lines changed

modules/nf-core/ribodetector/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ channels:
   - conda-forge
   - bioconda
 dependencies:
-  - "bioconda::ribodetector=0.3.1"
+  - "bioconda::ribodetector=0.3.2"

modules/nf-core/ribodetector/main.nf

Lines changed: 4 additions & 14 deletions
@@ -4,8 +4,8 @@ process RIBODETECTOR {
 
     conda "${moduleDir}/environment.yml"
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/ribodetector:0.3.1--pyhdfd78af_0':
-        'biocontainers/ribodetector:0.3.1--pyhdfd78af_0' }"
+        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/4d/4de8fe74d21198e6fc8218cb3209d929b3d7dab750678501b096b0ccc324307b/data' :
+        'community.wave.seqera.io/library/ribodetector:0.3.2--cbe1c77fa14eeb53' }"
 
     input:
     tuple val(meta), path(fastq)
@@ -14,7 +14,7 @@ process RIBODETECTOR {
     output:
     tuple val(meta), path("*.nonrna*.fastq.gz"), emit: fastq
     tuple val(meta), path("*.log") , emit: log
-    path "versions.yml" , emit: versions
+    tuple val("${task.process}"), val('ribodetector'), eval('ribodetector --version | sed "s/ribodetector //"'), emit: versions_ribodetector, topic: versions
 
     when:
     task.ext.when == null || task.ext.when
@@ -35,11 +35,6 @@ process RIBODETECTOR {
         --log ${prefix}.log \\
         ${ribodetector_mem} \\
         ${args}
-
-    cat <<-END_VERSIONS > versions.yml
-    "${task.process}":
-        ribodetector: \$(ribodetector --version | sed 's/ribodetector //g')
-    END_VERSIONS
     """
 
     stub:
@@ -50,12 +45,7 @@ process RIBODETECTOR {
     echo $args
 
     echo | gzip > ${prefix}.nonrna.1.fastq.gz
-    echo | gzip > ${prefix}.nonrna.2.fastq.gz
+    echo | gzip > ${prefix}.nonrna.2.fastq.gz
     touch ${prefix}.log
-
-    cat <<-END_VERSIONS > versions.yml
-    "${task.process}":
-        ribodetector: \$(ribodetector --version | sed 's/ribodetector //g')
-    END_VERSIONS
     """
 }
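The new output line replaces the `versions.yml` heredoc with a three-element tuple: the process name, the tool name, and the result of `ribodetector --version | sed "s/ribodetector //"`. The Python sketch below mimics only the string handling, to show the shape of the value that ends up in the test snapshot. It assumes, based on the `sed` expression and the snapshot, that the tool prints something like `ribodetector 0.3.2`; the helper names are hypothetical.

```python
def strip_tool_prefix(version_output: str, tool: str = "ribodetector") -> str:
    """Mimic `sed "s/<tool> //"`: drop the first "<tool> " found on each line."""
    lines = [line.replace(f"{tool} ", "", 1) for line in version_output.splitlines()]
    return "\n".join(lines).strip()


def version_tuple(process: str, tool: str, version_output: str) -> tuple:
    """Build a (process, tool, version) triple like the one emitted to the topic."""
    return (process, tool, strip_tool_prefix(version_output, tool))


# Assumed output of `ribodetector --version`; not captured from a real run.
example = version_tuple("RIBODETECTOR", "ribodetector", "ribodetector 0.3.2\n")
print(example)
```

With that assumed input, the triple matches the `"RIBODETECTOR", "ribodetector", "0.3.2"` entries recorded in the updated snapshot file.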

modules/nf-core/ribodetector/meta.yml

Lines changed: 25 additions & 12 deletions
@@ -1,7 +1,6 @@
 # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json
 name: "ribodetector"
-description: Accurate and rapid RiboRNA sequences Detector based on deep
-  learning
+description: Accurate and rapid RiboRNA sequences Detector based on deep learning
 keywords:
   - RNA
   - RNAseq
@@ -16,10 +15,10 @@ keywords:
 tools:
   - ribodetector:
       description: Accurate and rapid RiboRNA sequences detector based on deep learning.
-        RiboDetector uses a deep learning approach to identify rRNA sequences in
-        ribosome profiling (Ribo-seq) data. It can be used to filter out rRNA reads
-        from Ribo-seq datasets, improving the quality of downstream analyses. As of version
-        0.3.1, Ribodetector doesn't support setting a random seed, so results may not be fully
+        RiboDetector uses a deep learning approach to identify rRNA sequences in ribosome
+        profiling (Ribo-seq) data. It can be used to filter out rRNA reads from Ribo-seq
+        datasets, improving the quality of downstream analyses. As of version 0.3.1,
+        Ribodetector doesn't support setting a random seed, so results may not be fully
         deterministic across runs.
       homepage: "https://github.com/hzi-bifo/RiboDetector"
       documentation: "https://github.com/hzi-bifo/RiboDetector"
@@ -67,13 +66,27 @@ output:
           description: Log file from RiboDetector
           pattern: "*.log"
           ontologies: []
+  versions_ribodetector:
+    - - ${task.process}:
+          type: string
+          description: Name of the process
+      - ribodetector:
+          type: string
+          description: Name of the tool
+      - ribodetector --version | sed "s/ribodetector //:
+          type: string
+          description: Version of ribodetector used
+topics:
   versions:
-    - versions.yml:
-        type: file
-        description: File containing software versions
-        pattern: versions.yml
-        ontologies:
-          - edam: http://edamontology.org/format_3750 # YAML
+    - - ${task.process}:
+          type: string
+          description: Name of the process
+      - ribodetector:
+          type: string
+          description: Name of the tool
+      - ribodetector --version | sed "s/ribodetector //:
+          type: string
+          description: Version of ribodetector used
 authors:
   - "@maxibor"
 maintainers:

modules/nf-core/ribodetector/tests/main.nf.test

Lines changed: 2 additions & 2 deletions
@@ -29,8 +29,8 @@ nextflow_process {
                 { assert process.success },
                 { assert process.out.fastq },
                 { assert process.out.log },
-                { assert path(process.out.log[0][1]).getText().contains("Writing output non-rRNA sequences") },
-                { assert snapshot(process.out.versions).match() }
+                { assert path(process.out.log[0][1]).getText().contains("Writing output non-rRNA sequences") }
+                // Note: versions collected via topic, not snapshotted
             )
         }
 

modules/nf-core/ribodetector/tests/main.nf.test.snap

Lines changed: 12 additions & 16 deletions
@@ -1,16 +1,4 @@
 {
-    "ribodetector - rnaseq PE input": {
-        "content": [
-            [
-                "versions.yml:md5,f98df8f0eaa704e4db74785adc9cc791"
-            ]
-        ],
-        "meta": {
-            "nf-test": "0.9.3",
-            "nextflow": "25.10.0"
-        },
-        "timestamp": "2025-11-07T13:20:15.909875"
-    },
     "ribodetector - stub rnaseq PE input": {
         "content": [
             {
@@ -36,7 +24,11 @@
                     ]
                 ],
                 "2": [
-                    "versions.yml:md5,f98df8f0eaa704e4db74785adc9cc791"
+                    [
+                        "RIBODETECTOR",
+                        "ribodetector",
+                        "0.3.2"
+                    ]
                 ],
                 "fastq": [
                     [
@@ -59,15 +51,19 @@
                         "test.log:md5,d41d8cd98f00b204e9800998ecf8427e"
                     ]
                 ],
-                "versions": [
-                    "versions.yml:md5,f98df8f0eaa704e4db74785adc9cc791"
+                "versions_ribodetector": [
+                    [
+                        "RIBODETECTOR",
+                        "ribodetector",
+                        "0.3.2"
+                    ]
                 ]
             }
         ],
         "meta": {
             "nf-test": "0.9.3",
             "nextflow": "25.10.0"
         },
-        "timestamp": "2025-11-07T13:20:26.026547"
+        "timestamp": "2025-11-29T20:07:13.509994907"
     }
 }

subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/main.nf

Lines changed: 18 additions & 84 deletions
@@ -1,13 +1,10 @@
 include { BBMAP_BBSPLIT } from '../../../modules/nf-core/bbmap/bbsplit'
 include { CAT_FASTQ } from '../../../modules/nf-core/cat/fastq/main'
-include { RIBODETECTOR } from '../../../modules/nf-core/ribodetector/main'
-include { SEQKIT_STATS } from '../../../modules/nf-core/seqkit/stats/main'
-include { SORTMERNA } from '../../../modules/nf-core/sortmerna/main'
-include { SORTMERNA as SORTMERNA_INDEX } from '../../../modules/nf-core/sortmerna/main'
 include { FQ_LINT } from '../../../modules/nf-core/fq/lint/main'
 include { FQ_LINT as FQ_LINT_AFTER_TRIMMING } from '../../../modules/nf-core/fq/lint/main'
 include { FQ_LINT as FQ_LINT_AFTER_BBSPLIT } from '../../../modules/nf-core/fq/lint/main'
 include { FQ_LINT as FQ_LINT_AFTER_RIBO_REMOVAL } from '../../../modules/nf-core/fq/lint/main'
+include { FASTQ_REMOVE_RRNA } from '../fastq_remove_rrna'
 include { FASTQ_SUBSAMPLE_FQ_SALMON } from '../fastq_subsample_fq_salmon'
 include { FASTQ_FASTQC_UMITOOLS_TRIMGALORE } from '../fastq_fastqc_umitools_trimgalore'
 include { FASTQ_FASTQC_UMITOOLS_FASTP } from '../fastq_fastqc_umitools_fastp'
@@ -84,29 +81,6 @@ def multiqcTsvFromList(tsv_data, header) {
     return tsv_string
 }
 
-//
-// Function that parses seqkit stats TSV output to extract the mean read length
-// for use with RiboDetector's -l parameter
-//
-def getReadLengthFromSeqkitStats(stats_file) {
-    def lines = stats_file.text.readLines()
-    if (lines.size() < 2) {
-        return 100 // Default fallback
-    }
-
-    def header = lines[0].split('\t')
-    def avgLenIdx = header.findIndexOf { it == 'avg_len' }
-    if (avgLenIdx < 0) {
-        return 100 // Default fallback if column not found
-    }
-
-    // Calculate mean avg_len across all files in the stats output
-    def avgLens = lines[1..-1].collect { it.split('\t')[avgLenIdx] as float }
-    def meanAvgLen = avgLens.sum() / avgLens.size()
-
-    return Math.round(meanAvgLen) as int
-}
-
 workflow FASTQ_QC_TRIM_FILTER_SETSTRANDEDNESS {
     take:
     // Input channels
@@ -116,8 +90,9 @@ workflow FASTQ_QC_TRIM_FILTER_SETSTRANDEDNESS {
     ch_gtf // channel: /path/to/genome.gtf
     ch_salmon_index // channel: /path/to/salmon/index/ (optional)
     ch_sortmerna_index // channel: /path/to/sortmerna/index/ (optional)
+    ch_bowtie2_index // channel: /path/to/bowtie2/index/ (optional)
     ch_bbsplit_index // channel: /path/to/bbsplit/index/ (optional)
-    ch_rrna_fastas // channel: one or more fasta files containing rrna sequences to be passed to SortMeRNA (optional)
+    ch_rrna_fastas // channel: one or more fasta files containing rrna sequences to be passed to SortMeRNA/Bowtie2 (optional)
 
     // Skip options
     skip_bbsplit // boolean: Skip BBSplit for removal of non-reference genome reads.
@@ -129,6 +104,7 @@ workflow FASTQ_QC_TRIM_FILTER_SETSTRANDEDNESS {
     // Index generation
     make_salmon_index // boolean: Whether to create salmon index before running salmon quant
     make_sortmerna_index // boolean: Whether to create a sortmerna index before running sortmerna
+    make_bowtie2_index // boolean: Whether to create a bowtie2 index before running bowtie2
 
     // Trimming options
     trimmer // string (enum): 'fastp' or 'trimgalore'
@@ -138,7 +114,7 @@ workflow FASTQ_QC_TRIM_FILTER_SETSTRANDEDNESS {
 
     // rRNA removal options
     remove_ribo_rna // boolean: true/false: whether to remove rRNA
-    ribo_removal_tool // string (enum): 'sortmerna' or 'ribodetector'
+    ribo_removal_tool // string (enum): 'sortmerna', 'ribodetector', or 'bowtie2'
 
     // UMI options
     with_umi // boolean: true/false: Enable UMI-based read deduplication.
@@ -294,64 +270,22 @@ workflow FASTQ_QC_TRIM_FILTER_SETSTRANDEDNESS {
     }
 
     //
-    // MODULE: Remove ribosomal RNA reads
+    // SUBWORKFLOW: Remove ribosomal RNA reads
     //
     if (remove_ribo_rna) {
-        if (ribo_removal_tool == 'sortmerna') {
-            ch_sortmerna_fastas = ch_rrna_fastas
-                .collect()
-                .map { [[id: 'rrna_refs'], it] }
-
-            if (make_sortmerna_index) {
-                SORTMERNA_INDEX(
-                    [[], []],
-                    ch_sortmerna_fastas,
-                    [[], []],
-                )
-                ch_sortmerna_index = SORTMERNA_INDEX.out.index.first()
-            }
-
-            SORTMERNA(
-                ch_filtered_reads,
-                ch_sortmerna_fastas,
-                ch_sortmerna_index,
-            )
-
-            SORTMERNA.out.reads.set { ch_filtered_reads }
-
-            ch_multiqc_files = ch_multiqc_files.mix(SORTMERNA.out.log)
-
-            ch_versions = ch_versions.mix(SORTMERNA.out.versions.first())
-        }
-        else if (ribo_removal_tool == 'ribodetector') {
-            // Run seqkit stats to determine average read length
-            SEQKIT_STATS(
-                ch_filtered_reads
-            )
-
-            ch_versions = ch_versions.mix(SEQKIT_STATS.out.versions.first())
-
-            // Join stats with reads and calculate read length for RiboDetector
-            ch_filtered_reads
-                .join(SEQKIT_STATS.out.stats)
-                .multiMap { meta, reads, stats ->
-                    def readLength = getReadLengthFromSeqkitStats(stats)
-                    reads: [meta, reads]
-                    length: readLength
-                }
-                .set { ch_reads_with_length }
-
-            RIBODETECTOR(
-                ch_reads_with_length.reads,
-                ch_reads_with_length.length,
-            )
-
-            RIBODETECTOR.out.fastq.set { ch_filtered_reads }
-
-            ch_multiqc_files = ch_multiqc_files.mix(RIBODETECTOR.out.log)
+        FASTQ_REMOVE_RRNA(
+            ch_filtered_reads,
+            ch_rrna_fastas,
+            ch_sortmerna_index,
+            ch_bowtie2_index,
+            ribo_removal_tool,
+            make_sortmerna_index,
+            make_bowtie2_index,
+        )
 
-            ch_versions = ch_versions.mix(RIBODETECTOR.out.versions.first())
-        }
+        ch_filtered_reads = FASTQ_REMOVE_RRNA.out.reads
+        ch_multiqc_files = ch_multiqc_files.mix(FASTQ_REMOVE_RRNA.out.multiqc_files)
+        ch_versions = ch_versions.mix(FASTQ_REMOVE_RRNA.out.versions)
 
         if (!skip_linting) {
             FQ_LINT_AFTER_RIBO_REMOVAL(
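The `getReadLengthFromSeqkitStats` helper removed from this file derives a read length for RiboDetector from the `avg_len` column of a `seqkit stats` TSV, falling back to 100 when the table is empty or the column is missing. The sketch below is a Python port for illustration only, not the code now living in the subworkflow. It takes the TSV text rather than a file object, and uses `floor(x + 0.5)` to mirror the half-up behaviour of Groovy's `Math.round`. The sample table is made up.

```python
import math


def read_length_from_seqkit_stats(stats_text: str, default: int = 100) -> int:
    """Return the rounded mean of the avg_len column, or `default` if unavailable."""
    lines = stats_text.splitlines()
    if len(lines) < 2:
        return default  # header only, or empty input

    header = lines[0].split("\t")
    if "avg_len" not in header:
        return default  # column not found

    idx = header.index("avg_len")
    # One row per input file (e.g. R1 and R2); average their avg_len values.
    avg_lens = [float(line.split("\t")[idx]) for line in lines[1:]]
    mean_avg_len = sum(avg_lens) / len(avg_lens)
    return math.floor(mean_avg_len + 0.5)  # round half up, like Math.round


# Made-up stats table with the tabular column layout of `seqkit stats -T`.
sample = (
    "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\tavg_len\tmax_len\n"
    "r1.fastq.gz\tFASTQ\tDNA\t1000\t100500\t90\t100.5\t101\n"
    "r2.fastq.gz\tFASTQ\tDNA\t1000\t101000\t95\t101.0\t101\n"
)
print(read_length_from_seqkit_stats(sample))
```

For the sample table the mean of 100.5 and 101.0 is 100.75, so the function returns 101.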

subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/meta.yml

Lines changed: 15 additions & 8 deletions
@@ -9,14 +9,9 @@ keywords:
   - strandedness
 components:
   - bbmap/bbsplit
-  - samtools/sort
-  - samtools/index
-  - cat
   - cat/fastq
   - fq/lint
-  - ribodetector
-  - seqkit/stats
-  - sortmerna
+  - fastq_remove_rrna
   - fastq_subsample_fq_salmon
   - fastq_fastqc_umitools_trimgalore
   - fastq_fastqc_umitools_fastp
@@ -79,6 +74,15 @@ input:
         - index:
             type: directory
             description: SortMeRNA index directory
+  - ch_bowtie2_index:
+      description: Directory containing bowtie2 index for rRNA removal
+      structure:
+        - meta:
+            type: map
+            description: Metadata for the Bowtie2 index
+        - index:
+            type: directory
+            description: Bowtie2 index directory
   - ch_bbsplit_index:
       description: Path to directory or tar.gz archive for pre-built BBSplit index
       structure:
@@ -90,7 +94,7 @@ input:
             description: BBSplit index directory or tar.gz archive
             pattern: "{*,*.tar.gz}"
   - ch_rrna_fastas:
-      description: Channel containing one or more FASTA files containing rRNA sequences for use with SortMeRNA
+      description: Channel containing one or more FASTA files containing rRNA sequences for use with SortMeRNA or Bowtie2
       structure:
         - meta:
             type: map
@@ -120,6 +124,9 @@ input:
   - make_sortmerna_index:
       type: boolean
       description: Whether to create sortmerna index before running sortmerna
+  - make_bowtie2_index:
+      type: boolean
+      description: Whether to create bowtie2 index before running bowtie2 for rRNA removal
   - trimmer:
       type: string
       description: Specifies the trimming tool to use
@@ -140,7 +147,7 @@ input:
   - ribo_removal_tool:
       type: string
       description: Specifies the rRNA removal tool to use
-      enum: ["sortmerna", "ribodetector"]
+      enum: ["sortmerna", "ribodetector", "bowtie2"]
   - with_umi:
       type: boolean
       description: Enable UMI-based read deduplication
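The `ribo_removal_tool` enum now has three values, and two of them are paired with a `make_*_index` switch. As a rough illustration of how a caller might check the value before launching the subworkflow, the Python sketch below validates the tool name and reports which index flag applies. The function and mapping names are hypothetical and are not part of nf-core. The assumption that `ribodetector` needs no index rests only on the absence of any such input in this diff.

```python
from typing import Optional

# Values taken from the `enum` in meta.yml; the flag pairing follows the input names.
INDEX_FLAG_BY_TOOL = {
    "sortmerna": "make_sortmerna_index",
    "ribodetector": None,  # assumption: no index input appears for this tool
    "bowtie2": "make_bowtie2_index",
}


def index_flag_for(tool: str) -> Optional[str]:
    """Return the index-building flag for `tool`, or raise on an unknown tool."""
    if tool not in INDEX_FLAG_BY_TOOL:
        allowed = ", ".join(sorted(INDEX_FLAG_BY_TOOL))
        raise ValueError(f"Unknown ribo_removal_tool '{tool}'; expected one of: {allowed}")
    return INDEX_FLAG_BY_TOOL[tool]


print(index_flag_for("bowtie2"))
```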
