Skip to content

Propagate full cluster read support in discover#20

Merged
riasc merged 5 commits intomainfrom
fix/discover-read-support
Apr 9, 2026
Merged

Propagate full cluster read support in discover#20
riasc merged 5 commits intomainfrom
fix/discover-read-support

Conversation

@riasc
Copy link
Copy Markdown
Collaborator

@riasc riasc commented Apr 8, 2026

Summary

  • Matched segments: Previously only recorded read_coverage = 1 (representative read) per cluster. Now propagates the full cluster read count, giving accurate coverage from BAM input.
  • Novel segments: Same fix — uses add_read_support(cluster.read_count()) instead of adding only the representative.
  • Removed supporting_reads vector: Storing every read ID string would blow up memory for large BAMs. Only the count (read_coverage) is tracked. Read IDs can be recovered from the BAM if needed.
  • Serialization format change: segment_feature no longer serializes supporting_reads. Breaking change for .ggx files (pre-production, no impact).
  • CI: Release builds skip tests (Debug catches the same issues).

Before this fix, a cluster of 50 reads matching an existing segment would record read_coverage = 1. Now it correctly records read_coverage = 50.

QC

  • I, as a human being, have checked each line of code in this pull request
  • Project builds successfully in CLion
  • All CI checks pass (GCC 13/14, Clang 18, macOS)
  • All tests pass (absorption, discover, query, serialization)

🤖 Generated with Claude Code

Previously, only the representative read ID was added to matched segments
and novel segments. Now all cluster member read IDs are tracked, giving
accurate read_coverage counts for BAM-based discovery.
@riasc riasc added the bug Something isn't working label Apr 8, 2026
riasc added 3 commits April 8, 2026 18:47
Storing every read ID string on segments would blow up memory for large
BAM datasets. The count alone is sufficient — read IDs can be recovered
from the BAM file if needed.

Breaking change: .ggx serialization format no longer includes
supporting_reads (field removed from segment_feature).
Tests verify that:
- Cluster read count is propagated to matched segments via update_grove()
- Read coverage accumulates across multiple clusters matching the same segment
Debug builds catch the same issues; Release only verifies the build
compiles with optimizations.
@riasc
Copy link
Copy Markdown
Collaborator Author

riasc commented Apr 9, 2026

Code Review — PR #20

Verdict: Approve

Diff: 5 files, +91/-35

File Assessment
include/genomic_feature.hpp supporting_reads vector removed. add_read_support(size_t count = 1) — clean, default arg maintains backward compat.
src/genomic_feature.cpp Serialization updated — no more string vector read/write. Breaking .ggx format change, appropriate since serialization is pre-production.
src/transcript_matcher.cpp Matched segments: seg.add_read_support(cluster.read_count()) — correct, propagates full count. Novel segments: same pattern, consistent.
tests/discover/sqanti_category_test.cpp Two new tests: single-cluster (5 reads → coverage=5) and accumulation (3+7 → coverage=10). Both verify the fix.
.github/workflows/ci.yml Skip tests on Release — correct, Debug catches the same issues.

No issues found

  • add_read_support(size_t count) correctly accumulates across multiple clusters
  • No remaining references to supporting_reads in the codebase
  • Serialization format change is safe (no production .ggx files exist)
  • Tests cover both single-cluster and multi-cluster accumulation

PR description is stale

Body still mentions "read IDs tracked" and supporting_reads — should be updated to reflect the count-only approach.

🤖 Generated with Claude Code

@riasc riasc merged commit b71fe23 into main Apr 9, 2026
6 of 8 checks passed
@riasc riasc deleted the fix/discover-read-support branch April 9, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant