-
Notifications
You must be signed in to change notification settings - Fork 941
feat(fastq_qc_trim_filter_setstrandedness): Add Bowtie2 as alternative rRNA removal tool #9474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fastq_qc_trim_filter_setstrandedness): Add Bowtie2 as alternative rRNA removal tool #9474
Conversation
…e rRNA removal tool Add bowtie2 as a third option for rRNA removal alongside sortmerna and ribodetector. Implementation details: - Paired-end: Uses samtools view -f 12 to filter pairs where BOTH mates are unmapped (bowtie2's --un-conc-gz incorrectly includes pairs where one mate aligned) - Single-end: Uses bowtie2's --un-gz directly via save_unaligned=true - Converts U→T in rRNA reference FASTAs (RNA sequences contain U, reads contain T) Changes: - Add BOWTIE2_ALIGN, BOWTIE2_ALIGN_PE, BOWTIE2_BUILD module imports - Add SAMTOOLS_VIEW and SAMTOOLS_FASTQ for paired-end filtering - Add ch_bowtie2_index input and make_bowtie2_index parameter - Update meta.yml with bowtie2 in ribo_removal_tool enum - Add paired-end and single-end bowtie2 test cases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
2b3d4db to
73468c8
Compare
suhrig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for this great addition (at amazing speed)!
I agree that moving the rRNA removal into a separate space would be sensible, but if bandwidth is limited, then it should be okay the way it is.
…to fastq_remove_rrna subworkflow Factor out rRNA removal logic (sortmerna, ribodetector, bowtie2) into a dedicated fastq_remove_rrna subworkflow for better modularity and reusability. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Add cat/fastq tag to fastq_remove_rrna tests - Add subworkflows/fastq_remove_rrna tag to fastq_qc_trim_filter_setstrandedness tests - Remove obsolete module tags (now covered by subworkflow) - Add test snapshots for fastq_remove_rrna 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
@suhrig Factoring out complete! The failure in CI is because the factoring-out triggered the issue described in hzi-bifo/RiboDetector#59 for ribodetector (I think). I think we'll need that fix to propagate to Conda channels before we can set a seed for testing purposes and this can be completed. Edit: I also hadn't sorted the reads |
…add-bowtie2-rrna-removal
f4f176f to
9e5dc0f
Compare
- Convert ribodetector module to topics syntax for version collection - Update container to Wave container for ribodetector 0.3.2 - Remove .out.versions access for modules using topics - Update test snapshots for topic-based version format - Clean up sed syntax and fix meta.yml quoting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
9e5dc0f to
a9083a6
Compare
| { assert path(process.out.log[0][1]).getText().contains("Writing output non-rRNA sequences") }, | ||
| { assert snapshot(process.out.versions).match() } | ||
| { assert path(process.out.log[0][1]).getText().contains("Writing output non-rRNA sequences") } | ||
| // Note: versions collected via topic, not snapshotted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be snapshoting the versions. There's a snippet for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, it was set to auto-merge! Will pr a fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
Add bowtie2 as a third option for rRNA removal alongside sortmerna and ribodetector, providing users with more flexibility in choosing their preferred alignment-based rRNA filtering approach.
Implementation Details
Paired-end Read Handling
For paired-end reads, proper rRNA removal requires filtering pairs where either mate aligned to rRNA. The implementation:
samtools view -f 12(flag 12 = both mates unmapped)samtools fastqSingle-end Read Handling
For single-end reads, bowtie2's
--un-gzoutput is used directly viasave_unaligned=true.U→T Conversion
rRNA reference FASTAs may contain uracil (U) since they're RNA sequences, but sequencing reads contain thymine (T). The implementation converts U→T when building the bowtie2 index.
Changes
BOWTIE2_ALIGN,BOWTIE2_ALIGN_PE, andBOWTIE2_BUILDmodule importsSAMTOOLS_VIEWandSAMTOOLS_FASTQfor paired-end filteringch_bowtie2_indexinput channel for pre-built indexesmake_bowtie2_indexparameter for on-the-fly index buildingribo_removal_toolenum and new componentsTest Results
The paired-end bowtie2 test achieves the same output (1127 pairs) as sortmerna, correctly removing all 10 synthetic rRNA pairs.
Test plan
Proposed refactor before merge
🤖 Generated with Claude Code