Description
The NM tag records the edit distance between the read and the reference, not just the number of mismatches. According to the SAM specification, it is defined as:
Number of differences (mismatches plus inserted and deleted bases) between the sequence and reference, counting only (case-insensitive) A, C, G and T bases in sequence and reference as potential matches, with everything else being a mismatch. Note this means that ambiguity codes in both sequence and reference that match each other, such as ‘N’ in both, or compatible codes such as ‘A’ and ‘R’, are still counted as mismatches.
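To make the quoted rule concrete, here is a minimal sketch of a spec-style NM calculation. It rests on a few assumptions. `nm_tag` and `bases_match` are hypothetical names. The CIGAR is passed as a list of `(op, length)` pairs. `ref_seq` is taken to start at the alignment's leftmost reference position. The sketch also ignores the `=` shorthand that SAM allows in SEQ. Only identical A/C/G/T pairs count as matches, so N against N, or A against R, each add one to NM. Inserted and deleted bases are also added. Clips and reference skips are not counted.

```python
# Bases that may count as a match under the quoted NM definition.
ACGT = frozenset("ACGT")


def bases_match(read_base: str, ref_base: str) -> bool:
    """Hypothetical helper: identical A/C/G/T bases (case-insensitive) match.

    Ambiguity codes such as N or R never match, not even themselves.
    """
    r, f = read_base.upper(), ref_base.upper()
    return r == f and r in ACGT


def nm_tag(read_seq: str, ref_seq: str, cigar: list[tuple[str, int]]) -> int:
    """Hypothetical NM computation for a single alignment.

    ref_seq is assumed to start at the alignment's leftmost reference base;
    cigar is a list of (op, length) pairs, e.g. [("M", 3), ("I", 1)].
    """
    nm = 0
    qi = ri = 0  # current offsets into the read (query) and the reference
    for op, length in cigar:
        if op in ("M", "=", "X"):
            # Aligned bases: apply the spec's A/C/G/T-only match rule.
            for k in range(length):
                if not bases_match(read_seq[qi + k], ref_seq[ri + k]):
                    nm += 1
            qi += length
            ri += length
        elif op == "I":
            nm += length  # inserted bases are differences
            qi += length
        elif op == "D":
            nm += length  # deleted bases are differences
            ri += length
        elif op == "N":
            ri += length  # reference skip: consumes reference, not counted
        elif op == "S":
            qi += length  # soft clip: consumes read, not counted
        elif op in ("H", "P"):
            pass  # hard clip / padding consume neither sequence
        else:
            raise ValueError(f"unknown CIGAR operation {op!r}")
    return nm


# N-N and R-A are both mismatches under the spec rule, so NM is 2 here.
print(nm_tag("ACNTR", "ACNTA", [("M", 5)]))  # 2
```

Because matching is decided per aligned pair, the N-N behaviour described in the issue is handled entirely in `bases_match`. The CIGAR walk is the same one any NM calculator needs.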
Currently, matching ambiguity codes, such as N against N, are counted as matches. That is intuitive, but according to the spec they should be counted as mismatches. As a result, Picard's ValidateSamFile reports false NM errors. Note that this only happens if the read sequence contains ambiguity codes.
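The discrepancy can be reproduced with two counting rules on an ungapped alignment. This is a sketch only. `mismatches_current` is a hypothetical model of the behaviour described above, in which equal characters count as a match. It is not the project's actual code. `mismatches_spec` applies the quoted A/C/G/T-only rule.

```python
# Only these bases may count as a match under the quoted spec text.
ACGT = frozenset("ACGT")


def mismatches_current(read_seq: str, ref_seq: str) -> int:
    """Hypothetical model of the reported behaviour: equal characters match."""
    return sum(r.upper() != f.upper() for r, f in zip(read_seq, ref_seq))


def mismatches_spec(read_seq: str, ref_seq: str) -> int:
    """Spec rule: a pair matches only if both are the same A/C/G/T base."""
    return sum(
        not (r.upper() == f.upper() and r.upper() in ACGT)
        for r, f in zip(read_seq, ref_seq)
    )


# Read carries ambiguity codes that line up with the reference: the counts differ.
print(mismatches_current("ACGNNT", "ACGNNT"))  # 0
print(mismatches_spec("ACGNNT", "ACGNNT"))     # 2

# Read has only A/C/G/T: both rules agree, so no false validation error.
print(mismatches_current("ACGTAT", "ACGNNT"))  # 2
print(mismatches_spec("ACGTAT", "ACGNNT"))     # 2
```

The second pair shows why the problem only appears when the read itself contains ambiguity codes. If the read base is A, C, G or T, an ambiguous reference base is a mismatch under both rules.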