Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cram: Clarify the preservation map's reference required flag for unmapped slices #809

Open
zaeleus opened this issue Feb 4, 2025 · 0 comments

Comments

@zaeleus
Copy link

zaeleus commented Feb 4, 2025

This is in regard to CRAM format specification (version 3.1) (2024-09-04).

§ 8.4.1 "Compression header block: Preservation map" describes the reference required (RR) flag as "true if reference sequence is required to restore the data completely". If this is true and the records do not require a reference sequence to restore the data (e.g., an unmapped slice), is it considered an invalid state?

As I understand it, this flag should be false if the slice is unmapped, but some implementation don't set it as such, e.g., htslib:

$ samtools --version | head -2
samtools 1.21
Using htslib 1.21

$ (
samtools view --output-fmt cram <<EOF
@HD	VN:1.6
r1	4	*	0	0	*	*	0	0	NNNN	!!!!
EOF
) | cram_dump - | grep --after-context 4 "Preservation map" | head -5
      Preservation map:
        SM => 30398990 (0x1cfda0e)
        TD => 30398986 (0x1cfda0a)
        RN => 1 (0x1)
        AP => 1 (0x1)

All the records in this slice are unmapped (cram_dump: "Slice ref seq -1"), and this implicitly sets the reference required field to true, as per "The boolean values are optional, defaulting to true when absent..." However, a reference sequence is not required to decode this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant