You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We have been doing testing and benchmarking with (clinical) trio data, and have found a corner case where the behaviour of the merge between the pileup calls and the full alignment call results in a real variant being discarded.
The pileup VCF file contains the row
chr12 132836482 . A ACTCACAGTGACAGGCTCCCAGCAGGGCGCACGGCACTCACAGTGACAGGCTCCCAGCACGGCGCACGGCACTCACAGTGACAGGCTCCCAGCACGGCGCTCGGCC 28.66 PASS P GT:GQ:DP:AD:AF:PL 1/1:28:32:3,29:0.9062:56,48,0
And the full alignment has the row
chr12 132836482 . A . 0.00 RefCall F GT:GQ:DP:AD:AF:PL 0/0:0:32:3:0.0938:990
The VCF produced by MergeVcf outputs the RefCall, which causes the insertion to be dropped.
It's a bit of a corner case, since the position of insertion variants in a VCF is the base before the inserted sequence, and that is ref, but in this case, we do want to report the insertion!
I don't fully understand how the positions of the RefCall variants are determined, so I can't tell if this is a freak event, or if there is a systematic process that means these collisions are likely to occur elsewhere. If you could explain how these RefCall events are generated, that would help us understand how the caller works.
), I think I understand how the merge code is dropping the insertion event. I can think of a few possible ways to fix the code, but I haven't thought through all the semantics in detail, to work out what the best way to fix the code would be.
Again, thanks for making a good tool, and I hope this report helps you make it even better.
Tom.
The text was updated successfully, but these errors were encountered:
Many thanks Tom. I have some ideas how to fix it but can't tell which one does less harm to other variant candidates. Is it possible that you send me a minibam containing only the reads covering chr12:132836482? I gonna look into the details.
I'll have to consult - as I indicated, it's a clinical sample.
The smallest expansion would be to track RefCall entries, either by adding to the tuple in the set, or perhaps keeping a separate set. Then, if the pileup variant is an indel, it "overrides" the refcall. As noted, indels are kind of special in the sense that they are recorded at a ref position immediately before the actual variant. If the variant from the full alignment is not a RefCall then it is probably ok to drop the variant from the pileup.
If you're keen, I would suggest refactoring the code to do a "classic" merge - iterating over the two VCFs at the same time. That way the cases where things collide will appear as a natural clause in the inner loop, and you don't need memory proportional to the size of the full alignment VCF.
aquaskyline
changed the title
RefCall overwrites variant
choose pileup call over full-alignment call if pileup call has a way higher QUAL
Jul 23, 2024
aquaskyline
changed the title
choose pileup call over full-alignment call if pileup call has a way higher QUAL
choose pileup call over full-alignment call if pileup call has 1) a way higher QUAL, 2) is an indel
Jul 23, 2024
Hi!
Thanks for making a good tool!
We have been doing testing and benchmarking with (clinical) trio data, and have found a corner case where the behaviour of the merge between the pileup calls and the full alignment call results in a real variant being discarded.
The pileup VCF file contains the row
And the full alignment has the row
The VCF produced by
MergeVcf
outputs theRefCall
, which causes the insertion to be dropped.It's a bit of a corner case, since the position of insertion variants in a VCF is the base before the inserted sequence, and that is ref, but in this case, we do want to report the insertion!
I don't fully understand how the positions of the
RefCall
variants are determined, so I can't tell if this is a freak event, or if there is a systematic process that means these collisions are likely to occur elsewhere. If you could explain how theseRefCall
events are generated, that would help us understand how the caller works.Looking at the code of
MergeVcf
(e.g.Clair3/preprocess/MergeVcf.py
Line 191 in b975475
Clair3/preprocess/MergeVcf.py
Line 228 in b975475
Again, thanks for making a good tool, and I hope this report helps you make it even better.
Tom.
The text was updated successfully, but these errors were encountered: