Propagate full cluster read support in discover #20
Merged
Previously, only the representative read ID was added to matched segments and novel segments. Now all cluster member read IDs are tracked, giving accurate read_coverage counts for BAM-based discovery.
Storing every read ID string on segments would blow up memory for large BAM datasets. The count alone is sufficient — read IDs can be recovered from the BAM file if needed. Breaking change: .ggx serialization format no longer includes supporting_reads (field removed from segment_feature).
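A minimal sketch of the count-only approach described above, assuming hypothetical stand-in types: the real segment_feature and cluster definitions are not shown here, so the struct layouts, the member_count field, and the apply_clusters helper are illustrative assumptions. Only add_read_support, read_count, and read_coverage are names taken from this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's segment type (assumption):
// it keeps only a counter, not a vector of read ID strings.
struct segment_feature {
    std::size_t read_coverage = 0;

    // Add n supporting reads to the running total.
    void add_read_support(std::size_t n) { read_coverage += n; }
};

// Hypothetical stand-in for a read cluster (assumption):
// member_count covers the representative read plus all other members.
struct read_cluster {
    std::size_t member_count;

    std::size_t read_count() const { return member_count; }
};

// Illustrative helper (not part of the PR): credit a segment with the
// full size of every cluster that matched it, so coverage accumulates.
inline void apply_clusters(segment_feature& seg,
                           const std::vector<read_cluster>& matched) {
    for (const read_cluster& c : matched) {
        seg.add_read_support(c.read_count());
    }
}
```

Under these assumptions, two clusters of 50 and 7 reads matching the same segment leave read_coverage at 57, and memory per segment stays constant regardless of cluster size.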
Tests verify that:
- Cluster read count is propagated to matched segments via update_grove()
- Read coverage accumulates across multiple clusters matching the same segment
Debug builds catch the same issues; Release only verifies the build compiles with optimizations.
Code Review: PR #20
Verdict: Approve
Diff: 5 files, +91/-35
No issues found
PR description is stale
Body still mentions "read IDs tracked" and
🤖 Generated with Claude Code
Summary
- Previously, read_coverage = 1 (representative read) per cluster. Now propagates the full cluster read count, giving accurate coverage from BAM input.
- Calls add_read_support(cluster.read_count()) instead of adding only the representative.
- Removed the supporting_reads vector: storing every read ID string would blow up memory for large BAMs. Only the count (read_coverage) is tracked. Read IDs can be recovered from the BAM if needed.
- segment_feature no longer serializes supporting_reads. Breaking change for .ggx files (pre-production, no impact).

Before this fix, a cluster of 50 reads matching an existing segment would record read_coverage = 1. Now it correctly records read_coverage = 50.

QC
🤖 Generated with Claude Code