VCF specs - Clarification of <DUP> vs <DUP:TANDEM> #811

Open
davmlaw opened this issue Feb 5, 2025 · 5 comments


davmlaw commented Feb 5, 2025

Hi, thanks for all your hard work maintaining the specs.

I recently processed a Manta VCF that had alt=<DUP:TANDEM>

In the VCFv4.5 spec (also 4.2, and possibly earlier) the symbolic alts are defined as:

DUP Region of elevated copy number relative to the reference, or a tandem duplication breakpoint
DUP:TANDEM Tandem duplication

Are there any cases where <DUP> represents a duplication that is not tandem?

If not, is there a reason for having both? A downside of having multiple representations for the same molecular event is that tools and users can miss that two variants are the same.

@davmlaw davmlaw changed the title Clarification of <DUP> vs <DUP:TANDEM> VCF specs - Clarification of <DUP> vs <DUP:TANDEM> Feb 5, 2025
Contributor

d-cameron commented Feb 6, 2025 via email

Contributor

d-cameron commented Feb 6, 2025 via email

Author

davmlaw commented Feb 6, 2025

Thanks. I didn't know about SVCLAIM (though I haven't seen VCF 4.4+ in the wild).

I want to represent VCF variants from many different callers in one and only one way (a database constraint ensures uniqueness).

It looks like <DUP> with SVCLAIM=J and <DUP:TANDEM> with SVCLAIM=J describe the same thing?

If so, would it be right to normalize the first into the second, since it is more explicit?

Author

davmlaw commented Feb 7, 2025

Actually, thinking about it more, I'm always going to have VCF <= 4.3 records, so there is ambiguity in those <DUP>s. There might be ambiguity in any SVCLAIM call too (as it's all just best guesses, all the way down to the base call).

So if I want to group together records that may be the same, I'd be better off "downcasting" <DUP:TANDEM> to <DUP> so that it links with plain <DUP>s at the variant level, as they could be the same event between samples, while keeping the caller's claims (e.g. SVCLAIM or <DUP:TANDEM>) for each sample call.
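
A minimal sketch of that grouping idea, in plain Python with no VCF library. The `SvCall` type, its field names, and both helper functions are hypothetical, made up for illustration. It also assumes that calls match on exact coordinates, which real SV callers often will not give you.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class SvCall(NamedTuple):
    """Hypothetical per-sample call; not part of any VCF library."""
    sample: str
    chrom: str
    pos: int
    end: int
    alt: str                # symbolic ALT as written by the caller
    svclaim: Optional[str]  # SVCLAIM if the caller emitted one, else None


def base_symbolic_alt(alt: str) -> str:
    """Drop subtypes from a symbolic ALT, e.g. "<DUP:TANDEM>" -> "<DUP>"."""
    if alt.startswith("<") and alt.endswith(">"):
        return "<" + alt[1:-1].split(":")[0] + ">"
    return alt  # non-symbolic alleles are left untouched


def group_calls(calls):
    """Group calls by a coarse key; each call keeps its original claims."""
    groups = defaultdict(list)
    for call in calls:
        # Assumption: exact coordinate match is good enough for the key
        key = (call.chrom, call.pos, call.end, base_symbolic_alt(call.alt))
        groups[key].append(call)
    return dict(groups)
```

The uniqueness constraint would then sit on the coarse key, while the caller's original ALT and SVCLAIM stay attached to each sample call.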

Author

davmlaw commented Feb 7, 2025

Imagine you wanted to merge 3 VCF files:

A: <DUP>
B: <DUP> SVCLAIM=J
C: <DUP:TANDEM> SVCLAIM=J

I think they should be merged, but you can't do that without losing information, as you'd have to throw away SVCLAIM.

It seems that SVCLAIM should be a per-sample FORMAT field?
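
To make the information loss concrete, here is a rough sketch of merging the three records with plain dicts. The `merge_site` function and the `caller_alt` / `svclaim` keys are hypothetical. They are not fields defined by the VCF spec, only a stand-in for what a per-sample value could carry.

```python
# The three records from files A, B and C, reduced to ALT plus INFO
records = {
    "A": {"alt": "<DUP>", "info": {}},
    "B": {"alt": "<DUP>", "info": {"SVCLAIM": "J"}},
    "C": {"alt": "<DUP:TANDEM>", "info": {"SVCLAIM": "J"}},
}


def merge_site(records):
    """Merge records into one <DUP> site, keeping claims per input."""
    claims = {rec["info"].get("SVCLAIM") for rec in records.values()}
    # A single site-level value is only safe when every input agrees
    site_claim = next(iter(claims)) if len(claims) == 1 else None
    per_sample = {
        name: {
            "caller_alt": rec["alt"],               # hypothetical key
            "svclaim": rec["info"].get("SVCLAIM"),  # hypothetical key
        }
        for name, rec in records.items()
    }
    return {"alt": "<DUP>", "site_svclaim": site_claim, "samples": per_sample}


merged = merge_site(records)
```

Here the site-level claim comes out as `None`, because A made no claim while B and C claimed J, but the per-sample entries still record what each caller reported.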
