Propagate full cluster read support in discover by riasc · Pull Request #20 · ylab-hi/atroplex

riasc · 2026-04-08T23:42:07Z

Summary

Matched segments: Previously only recorded read_coverage = 1 (representative read) per cluster. Now propagates the full cluster read count, giving accurate coverage from BAM input.
Novel segments: Same fix — uses add_read_support(cluster.read_count()) instead of adding only the representative.
Removed supporting_reads vector: Storing every read ID string would blow up memory for large BAMs. Only the count (read_coverage) is tracked. Read IDs can be recovered from the BAM if needed.
Serialization format change: segment_feature no longer serializes supporting_reads. Breaking change for .ggx files (pre-production, no impact).
CI: Release builds skip tests (Debug catches the same issues).

Before this fix, a cluster of 50 reads matching an existing segment would record read_coverage = 1. Now it correctly records read_coverage = 50.
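To make the before/after concrete, here is a minimal sketch of the counting change. The `segment_feature` and `read_cluster` types below are simplified, hypothetical stand-ins, not the project's real classes. Only `add_read_support(size_t count = 1)`, `read_coverage`, and `cluster.read_count()` are names taken from this PR; the two `record_*` helpers are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for the project's segment_feature.
struct segment_feature {
    std::size_t read_coverage = 0;

    // Add `count` supporting reads. The default of 1 keeps existing
    // single-read call sites compiling unchanged.
    void add_read_support(std::size_t count = 1) { read_coverage += count; }
};

// Hypothetical stand-in for a cluster of reads with one representative.
struct read_cluster {
    std::size_t members = 1;
    std::size_t read_count() const { return members; }
};

// Old behaviour: only the representative read was counted.
inline void record_representative_only(segment_feature& seg, const read_cluster&) {
    seg.add_read_support();
}

// New behaviour: every read in the cluster contributes to coverage.
inline void record_full_cluster(segment_feature& seg, const read_cluster& cluster) {
    seg.add_read_support(cluster.read_count());
}
```

With a 50-read cluster, `record_representative_only` leaves `read_coverage` at 1, while `record_full_cluster` leaves it at 50.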

QC

I, as a human being, have checked each line of code in this pull request
Project builds successfully in CLion
All CI checks pass (GCC 13/14, Clang 18, macOS)
All tests pass (absorption, discover, query, serialization)

🤖 Generated with Claude Code

Storing every read ID string on segments would blow up memory for large BAM datasets. The count alone is sufficient — read IDs can be recovered from the BAM file if needed. Breaking change: .ggx serialization format no longer includes supporting_reads (field removed from segment_feature).
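As a rough illustration of what count-only serialization could look like, the sketch below writes a single fixed-width integer per segment instead of a variable-length list of read ID strings. This is an assumption-laden sketch, not the actual .ggx layout: `segment_record`, `write_segment`, `read_segment`, and `round_trip` are hypothetical names, and the native-byte-order 64-bit encoding is chosen only for brevity.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical record holding only the field relevant here.
struct segment_record {
    std::uint64_t read_coverage = 0;
};

// Write the count as 8 raw bytes (native byte order, for illustration only).
inline void write_segment(std::ostream& out, const segment_record& seg) {
    const std::uint64_t n = seg.read_coverage;
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
}

// Read the count back; returns false if the stream ends early.
inline bool read_segment(std::istream& in, segment_record& seg) {
    std::uint64_t n = 0;
    if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return false;
    seg.read_coverage = n;
    return true;
}

// Serialize to an in-memory buffer and parse it again.
inline segment_record round_trip(const segment_record& seg) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    write_segment(buf, seg);
    segment_record out;
    read_segment(buf, out);
    return out;
}
```

The per-segment cost is constant (8 bytes in this sketch) no matter how many reads support the segment, which is the memory argument made above.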


riasc · 2026-04-09T02:22:09Z

Code Review — PR #20

Verdict: Approve

Diff: 5 files, +91/-35

| File | Assessment |
| --- | --- |
| `include/genomic_feature.hpp` | `supporting_reads` vector removed. `add_read_support(size_t count = 1)`: clean, default arg maintains backward compat. |
| `src/genomic_feature.cpp` | Serialization updated: no more string vector read/write. Breaking .ggx format change, appropriate since serialization is pre-production. |
| `src/transcript_matcher.cpp` | Matched segments: `seg.add_read_support(cluster.read_count())` is correct and propagates the full count. Novel segments: same pattern, consistent. |
| `tests/discover/sqanti_category_test.cpp` | Two new tests: single-cluster (5 reads → coverage=5) and accumulation (3+7 → coverage=10). Both verify the fix. |
| `.github/workflows/ci.yml` | Skip tests on Release: correct, Debug catches the same issues. |

No issues found

add_read_support(size_t count) correctly accumulates across multiple clusters
No remaining references to supporting_reads in the codebase
Serialization format change is safe (no production .ggx files exist)
Tests cover both single-cluster and multi-cluster accumulation
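The accumulation behaviour checked by the two new tests can be sketched independently of the real grove and matcher code. In the sketch below, `matched_cluster` and `accumulate_coverage` are hypothetical names and segments are identified by a plain string. The actual tests go through `update_grove()`, which is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: a cluster that has already been matched to a segment.
struct matched_cluster {
    std::string segment_id;
    std::size_t read_count;
};

// Sum cluster read counts per matched segment.
inline std::unordered_map<std::string, std::size_t>
accumulate_coverage(const std::vector<matched_cluster>& clusters) {
    std::unordered_map<std::string, std::size_t> coverage;
    for (const auto& c : clusters) {
        // operator[] value-initializes a missing entry to 0 before adding.
        coverage[c.segment_id] += c.read_count;
    }
    return coverage;
}
```

Feeding clusters of 3 and 7 reads that match the same segment yields a coverage of 10 for that segment, and a lone 5-read cluster yields 5, mirroring the two cases in the table above.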

PR description is stale

Body still mentions "read IDs tracked" and supporting_reads — should be updated to reflect the count-only approach.

🤖 Generated with Claude Code

Propagate full cluster read support to matched and novel segments

693e106

Previously, only the representative read ID was added to matched segments and novel segments. Now all cluster member read IDs are tracked, giving accurate read_coverage counts for BAM-based discovery.

riasc added the bug label Apr 8, 2026

riasc added 3 commits April 8, 2026 18:47

Add tests for read_coverage propagation to matched segments

c54cd57

Tests verify that:
- Cluster read count is propagated to matched segments via update_grove()
- Read coverage accumulates across multiple clusters matching the same segment

Skip tests on Release builds in CI

c86a9d0

Debug builds catch the same issues; Release only verifies the build compiles with optimizations.

Update CHANGELOG with PR #20 changes

170798d

riasc merged commit b71fe23 into main Apr 9, 2026
6 of 8 checks passed

riasc deleted the fix/discover-read-support branch April 9, 2026 02:24

riasc merged 5 commits into main from fix/discover-read-support
