Describe the bug
We have observed that ivar variants can generate false-positive variant calls for SARS-CoV-2 genomes that contain insertions or deletions. Here is an example from a private genome that contains the 6 bp ORF8 deletion:
| CHROM | POS | REF | ALT | GENE | EFFECT | HGVS_C | HGVS_P | DP | REF_DP | ALT_DP | AF | sample | software | lineage | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NC_045512.2 | 28247 | AGATTTC | A | ORF8 | conservative_inframe_deletion | c.355_360delGATTTC | p.Asp119_Phe120del | 76166 | 48839 | 64275 | 0.84 | 218025 | ivar | AY.33 | 
Position 28247 has a well-supported deletion (nearly 70000x coverage), yet ivar variants calls a variant inside that deletion (IGV image and variant table):
| CHROM | POS | REF | ALT | GENE | EFFECT | HGVS_C | HGVS_P | DP | REF_DP | ALT_DP | AF | sample | software | lineage | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NC_045512.2 | 28253 | C | A | ORF8 | missense_variant | c.360C>A | p.Phe120Leu | 3954 | 61 | 3851 | 0.97 | 218025 | ivar | AY.33 | 
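With an AF of 0.97, this variant passes our quality criterion (variants with an AF > 0.75) and would therefore be included in the consensus. However, the AF appears to be overestimated because the depth at this position is miscalculated: the reported DP (3954) is far lower than the coverage of the region, apparently because of the deletion.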
| CHROM | POS | REF | ALT | GENE | EFFECT | HGVS_C | HGVS_P | DP | REF_DP | ALT_DP | AF | sample | software | lineage | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NC_045512.2 | 28247 | AGATTTC | A | ORF8 | conservative_inframe_deletion | c.355_360delGATTTC | p.Asp119_Phe120del | 75039 | 10741 | 63575 | 0.847 | 218025 | VarScan | AY.33 | 
| NC_045512.2 | 28248 | GATTTCA | G | ORF8 | disruptive_inframe_deletion | c.356_361delATTTCA | p.Asp119_Ile121delinsVal | 64673 | 126 | 189 | 0.003 | 218025 | VarScan | AY.33 | 
| NC_045512.2 | 28249 | ATTTC | A | ORF8 | frameshift_variant | c.357_360delTTTC | p.Asp119fs | 64614 | 98 | 79 | 0.001 | 218025 | VarScan | AY.33 | 
| NC_045512.2 | 28251 | TTCATC | T | ORF8 | frameshift_variant&stop_lost&splice_region_variant | c.360_364delCATCT | p.Phe120fs | 69573 | 5295 | 48 | 0.001 | 218025 | VarScan | AY.33 | 
| NC_045512.2 | 28252 | TC | T | ORF8 | frameshift_variant | c.360delC | p.Phe120fs | 69316 | 4895 | 151 | 0.002 | 218025 | VarScan | AY.33 | 
| NC_045512.2 | 28253 | C | A | ORF8 | missense_variant | c.360C>A | p.Phe120Leu | 69561 | 15 | 5009 | 0.072 | 218025 | VarScan | AY.33 | 
Other variant callers, such as VarScan (table above), detect this variant with an AF << 0.25 because the depth at that position is calculated taking the deletion reads into account. As a result, the AF differs substantially from the one reported by ivar variants.
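The discrepancy can be reproduced from the depth columns of the two tables. The following is a minimal sketch, assuming AF is computed as ALT_DP / DP (this is consistent with the values reported above, but it is an assumption about both callers); the function name `allele_frequency` and the constant `THRESHOLD` are hypothetical and only used for illustration.

```python
def allele_frequency(alt_dp: int, dp: int) -> float:
    """Return ALT_DP / DP (assumed definition of AF, see lead-in)."""
    if dp <= 0:
        raise ValueError("depth must be positive")
    return alt_dp / dp


# Consensus inclusion criterion quoted in this report (AF > 0.75).
THRESHOLD = 0.75

# (ALT_DP, DP) for position 28253 C>A, copied from the tables above.
calls = {
    "ivar": (3851, 3954),
    "VarScan": (5009, 69561),
}

for caller, (alt_dp, dp) in calls.items():
    af = allele_frequency(alt_dp, dp)
    print(f"{caller}: AF = {af:.3f}, passes threshold: {af > THRESHOLD}")
```

With the ivar depth (3954) the variant reaches an AF of about 0.974 and passes the threshold; with the VarScan depth (69561) the AF is about 0.072 and the variant is rejected.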
Expected behavior
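We suggest that ivar variants overestimates the AF of this variant because of how the depth is calculated, and that this can cause problems with variant calling within indels (insertions and deletions). We would expect the depth at positions inside a deletion to take the reads supporting the deletion into account, as other variant callers do.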
This issue may be related to #79, #83, #85, #103
To Reproduce
Run ivar variants with these params:
samtools mpileup \
        -a \
        --count-orphans \
        --no-BAQ \
        --ignore-overlaps \
        --max-depth 20 \
        --fasta-ref fasta \
        --min-BQ  | ivar variants -q 30 -t 0.75 -m 10 -r fasta gff -p sample
