Skip to content

Conversation

@pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Nov 28, 2025

Summary

Add bowtie2 as a third option for rRNA removal alongside sortmerna and ribodetector, providing users with more flexibility in choosing their preferred alignment-based rRNA filtering approach.

Implementation Details

Paired-end Read Handling

For paired-end reads, proper rRNA removal requires filtering pairs where either mate aligned to rRNA. The implementation:

  1. Aligns reads with BOWTIE2_ALIGN_PE
  2. Filters BAM with samtools view -f 12 (flag 12 = both mates unmapped)
  3. Converts filtered BAM back to FASTQ with samtools fastq

Single-end Read Handling

For single-end reads, bowtie2's --un-gz output is used directly via save_unaligned=true.

U→T Conversion

rRNA reference FASTAs may contain uracil (U) since they're RNA sequences, but sequencing reads contain thymine (T). The implementation converts U→T when building the bowtie2 index.

Changes

  • Add BOWTIE2_ALIGN, BOWTIE2_ALIGN_PE, and BOWTIE2_BUILD module imports
  • Add SAMTOOLS_VIEW and SAMTOOLS_FASTQ for paired-end filtering
  • Add ch_bowtie2_index input channel for pre-built indexes
  • Add make_bowtie2_index parameter for on-the-fly index building
  • Implement single-end/paired-end branching with appropriate filtering strategy
  • Update meta.yml with bowtie2 in ribo_removal_tool enum and new components
  • Add test cases for both paired-end and single-end bowtie2 rRNA removal
  • Update nextflow.config with bowtie2 and samtools process configurations

Test Results

Test Input After Trim rRNA Detected Output
PE fastp bowtie2 4169 pairs 1137 pairs 17 mates (10 pairs) 1127 pairs
SE fastp bowtie2 4169 reads 4162 reads 10 reads 4152 reads

The paired-end bowtie2 test achieves the same output (1127 pairs) as sortmerna, correctly removing all 10 synthetic rRNA pairs.

Test plan

  • Run existing sortmerna and ribodetector tests to ensure no regression
  • Run paired-end bowtie2 test
  • Run single-end bowtie2 test
  • Verify bowtie2 log output is captured for MultiQC

Proposed refactor before merge

Note: This subworkflow has grown substantially with three rRNA removal tools (sortmerna, ribodetector, bowtie2), each with their own index building and filtering logic. I propose we factor out the rRNA removal components into a dedicated fastq_remove_rrna subworkflow before merging this PR, once we confirm bowtie2 should be retained as a method.

🤖 Generated with Claude Code

…e rRNA removal tool

Add bowtie2 as a third option for rRNA removal alongside sortmerna and ribodetector.

Implementation details:
- Paired-end: Uses samtools view -f 12 to filter pairs where BOTH mates are unmapped
  (bowtie2's --un-conc-gz incorrectly includes pairs where one mate aligned)
- Single-end: Uses bowtie2's --un-gz directly via save_unaligned=true
- Converts U→T in rRNA reference FASTAs (RNA sequences contain U, reads contain T)

Changes:
- Add BOWTIE2_ALIGN, BOWTIE2_ALIGN_PE, BOWTIE2_BUILD module imports
- Add SAMTOOLS_VIEW and SAMTOOLS_FASTQ for paired-end filtering
- Add ch_bowtie2_index input and make_bowtie2_index parameter
- Update meta.yml with bowtie2 in ribo_removal_tool enum
- Add paired-end and single-end bowtie2 test cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@pinin4fjords pinin4fjords force-pushed the feat/fastq_qc_trim_filter_setstrandedness-add-bowtie2-rrna-removal branch from 2b3d4db to 73468c8 Compare November 28, 2025 11:12
Copy link
Contributor

@suhrig suhrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this great addition (at amazing speed)!

I agree that moving the rRNA removal into a separate space would be sensible, but if bandwidth is limited, then it should be okay the way it is.

pinin4fjords and others added 3 commits November 28, 2025 11:46
…to fastq_remove_rrna subworkflow

Factor out rRNA removal logic (sortmerna, ribodetector, bowtie2) into
a dedicated fastq_remove_rrna subworkflow for better modularity and
reusability.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add cat/fastq tag to fastq_remove_rrna tests
- Add subworkflows/fastq_remove_rrna tag to fastq_qc_trim_filter_setstrandedness tests
- Remove obsolete module tags (now covered by subworkflow)
- Add test snapshots for fastq_remove_rrna

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@pinin4fjords
Copy link
Member Author

pinin4fjords commented Nov 28, 2025

@suhrig Factoring out complete!

The failure in CI is because the factoring-out triggered the issue described in hzi-bifo/RiboDetector#59 for ribodetector (I think). I think we'll need that fix to propagate to Conda channels before we can set a seed for testing purposes and this can be completed.

Edit: I also hadn't sorted the reads

@pinin4fjords pinin4fjords marked this pull request as ready for review November 29, 2025 18:34
@pinin4fjords pinin4fjords requested review from a team as code owners November 29, 2025 18:34
@pinin4fjords pinin4fjords removed request for a team November 29, 2025 18:51
@pinin4fjords pinin4fjords force-pushed the feat/fastq_qc_trim_filter_setstrandedness-add-bowtie2-rrna-removal branch 2 times, most recently from f4f176f to 9e5dc0f Compare November 29, 2025 20:22
- Convert ribodetector module to topics syntax for version collection
- Update container to Wave container for ribodetector 0.3.2
- Remove .out.versions access for modules using topics
- Update test snapshots for topic-based version format
- Clean up sed syntax and fix meta.yml quoting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
{ assert path(process.out.log[0][1]).getText().contains("Writing output non-rRNA sequences") },
{ assert snapshot(process.out.versions).match() }
{ assert path(process.out.log[0][1]).getText().contains("Writing output non-rRNA sequences") }
// Note: versions collected via topic, not snapshotted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be snapshoting the versions. There's a snippet for that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it was set to auto-merge! Will pr a fix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pinin4fjords pinin4fjords added this pull request to the merge queue Nov 29, 2025
Merged via the queue into master with commit 61ce58a Nov 29, 2025
60 checks passed
@pinin4fjords pinin4fjords deleted the feat/fastq_qc_trim_filter_setstrandedness-add-bowtie2-rrna-removal branch November 29, 2025 21:01
@iraiosub iraiosub mentioned this pull request Dec 1, 2025
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants