Skip to content

Comments

fix off-by-ones, and implement variable fmt uniwig#230

Merged
nsheff merged 2 commits intodevfrom
uniwig-fixes
Feb 20, 2026
Merged

fix off-by-ones, and implement variable fmt uniwig#230
nsheff merged 2 commits intodevfrom
uniwig-fixes

Conversation

@nsheff
Copy link
Member

@nsheff nsheff commented Feb 18, 2026

No description provided.

@donaldcampbelljr
Copy link
Member

I have some hesitation regarding the changes because I see that the outputs in the test files have changed.

I spent some time last August doing a review on the counting function, making tweaks and confirming the output: 0f7ce42

Could you explain why this implementation is better for the output's intended use case? Or give a specific example where the original implementation was insufficient (or inaccurate)?

@nsheff
Copy link
Member Author

nsheff commented Feb 18, 2026

Right. the reason the test files changed is that I adjusted the 0-based output to 1-based to match wiggle file format.
I can give better documentation of the specific issues a bit later. the more important issues weren't captured by tests, so those changes didn't trigger test changes.

Copy link
Member

@donaldcampbelljr donaldcampbelljr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, these changes seem reasonable.

@nsheff
Copy link
Member Author

nsheff commented Feb 19, 2026

More explanation on the +1 coordinate fix in gtars uniwig:

Quick Summary

The gtars uniwig tool accepts two input formats: BED files (0-based coordinates) and BAM files (1-based coordinates). The +1 fix in reading.rs converts BED start positions from 0-based to 1-based so that downstream counting and WIG output are correct. BAM files, read via noodles alignment_start(), arrive already 1-based and bypass reading.rs entirely.

The Two Input Paths

BED Input Path

When the user provides -t bed, uniwig reads coordinates through reading.rs. BED format is 0-based, but WIG output is 1-based. The fix adds 1 at read time:

BED file        reading.rs              Internal        WIG output
(0-based)       (+1 conversion)         (1-based)       (1-based)
---------       ---------------         ---------       ----------
chr1  19  20 -> parsed_start + 1 = 20 -> starts: [20] -> position 20

The relevant code in reading.rs:

chromosome.starts.push((parsed_start + 1, default_score)); // Convert 0-based BED to 1-based

Without this fix, a BED start of 19 would be treated as position 19 internally, producing WIG output shifted one position too low.

BAM Input Path

When the user provides -t bam, uniwig takes a different code path. It never calls reading.rs. Instead, process_bam() in lib.rs dispatches to counting functions in counting.rs that read BAM records directly using the noodles library:

BAM file        noodles library         Internal        WIG/BW output
(1-based POS)   alignment_start()       (1-based)       (1-based)
-----------     -----------------       ---------       -------------
POS = 20     -> .get() = 20          -> value: 20   -> position 20

The relevant code in counting.rs:

first_record.alignment_start().unwrap().unwrap().get() as i32

The SAM/BAM specification defines the POS field as 1-based. The noodles Position type preserves this: .get() returns the 1-based integer directly. No conversion is needed.

Why PEPATAC Is Unaffected

PEPATAC calls uniwig with -t bam, passing aligned BAM files as input. This means:

  1. PEPATAC's data flows through process_bam() in lib.rs and the counting functions in counting.rs.
  2. It never touches reading.rs, where the +1 fix lives.
  3. The noodles alignment_start() call already returns 1-based positions from the BAM POS field.

The +1 fix only changes behavior for BED input. Any pipeline that uses BAM input -- including PEPATAC -- is unaffected.

@nsheff
Copy link
Member Author

nsheff commented Feb 19, 2026

for the edge clamping error, here's the explanation:

Input

single.bed (BED 0-based: position 3 in 1-based)

chr1 2 3

chrom.sizes

chr1 20

Command

gtars uniwig \
--file single.bed \
--chromref chrom.sizes \
--smoothsize 5 \
--stepsize 1 \
--fileheader output \
--outputtype wig \
--counttype start \
--filetype bed

Output Comparison

Position OLD (v0.6) NEW (v0.7)
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 0
10 1 0
11 1 0
12 0 0
  • OLD: 11 positions of coverage (1-11)
  • NEW: 8 positions of coverage (1-8)

The Bug

A read at position 3 with smoothsize=5 should produce a window from 3-5 to 3+5, i.e., positions -2 to 8. Clamped to chromosome bounds: positions
1-8 (8 positions).

OLD algorithm (broken):
clamped_start = max(1, 3 - 5) = 1
end_site = clamped_start + 1 + 2*smoothsize = 1 + 1 + 10 = 12
// Result: coverage from 1 to 11 (11 positions) ❌

NEW algorithm (fixed):
original_position = 3
clamped_start = max(1, 3 - 5) = 1
end_site = original_position + smoothsize + 1 = 3 + 5 + 1 = 9
// Result: coverage from 1 to 8 (8 positions) ✓

The fix: calculate the end position from the original read position, not the clamped start.

Copy link
Member

@donaldcampbelljr donaldcampbelljr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@donaldcampbelljr
Copy link
Member

Great. Thank you for the follow up explanations.

@nsheff nsheff merged commit e9c0edd into dev Feb 20, 2026
@nsheff nsheff deleted the uniwig-fixes branch February 20, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants