Quality Metrics
Martin Maiers edited this page Jun 21, 2022
These metrics are meant to characterize data so that thresholds can be applied consistently, for example by considering only data with RES_TRS_COUNT > x or only data with SAM_SIZE > y.
Many of these depend on having the genotypes, or some aspect of the genotypes, available.
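As a rough illustration of applying such thresholds, the sketch below keeps only the data sets whose metrics exceed a set of minimum values. The function name `passes_thresholds`, the dictionary layout, the data set names, and all numbers are hypothetical and not part of any existing tool.

```python
def passes_thresholds(metrics, minimums):
    """Return True if every metric named in `minimums` is present
    in `metrics` and strictly greater than its minimum value."""
    return all(
        name in metrics and metrics[name] > low
        for name, low in minimums.items()
    )


# Hypothetical quality metrics for two data sets (illustrative values only).
datasets = {
    "pop_A": {"RES_TRS_COUNT": 0.8, "SAM_SIZE": 1500},
    "pop_B": {"RES_TRS_COUNT": 0.3, "SAM_SIZE": 80},
}

# Hypothetical thresholds x and y from the text above.
minimums = {"RES_TRS_COUNT": 0.5, "SAM_SIZE": 100}

kept = [name for name, m in datasets.items() if passes_thresholds(m, minimums)]
print(kept)
```

A data set that lacks one of the required metrics is rejected rather than silently accepted, which seems the safer default when a metric cannot be computed (for example, when genotypes are unavailable).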
QUAL_TYPE | GENOTYPE BASED | VALUE | DESCRIPTION |
---|---|---|---|
DIV_ALPHA | | Real, < 0 | Exponent of the power-law fit to the HTF distribution (called alpha in Slater et al., Power Laws for Heavy-Tailed Distributions) |
DIV_50 | X | integer | Number of haplotypes needed (in descending order of frequency) to have the cumulative sum be > 0.5 (Sample size sensitive!) |
DIV_50_REL | | Real, 0 <= x <= 1 | Number of haplotypes needed (in descending order of frequency) to have the cumulative sum be > 0.5, divided by the number of HT |
SAM_SIZE | X | integer | Number of GT |
DIV_PGD | X | Real, 0 <= x <= 1 | Population genetics diversity, (1 - sum f_i^2) * N/(N-1), where N = SAM_SIZE and f_i is the frequency of a specific haplotype |
DIV_HEAVY_TAIL | | Real, 0 <= x <= 1 | a is an independence parameter of the Bayesian SHF model that describes how allele frequency products correlate with haplotype frequencies (also correlates with the fraction of nonzero categories); from Yoram SHF MS |
RES_TRS_COUNT | | Real, 0 <= x <= 1 | Average number of possible genotypes per individual |
RES_TRS | | Real, 0 <= x <= 1 | Typing Resolution Score: average sum of squares of genotype probabilities (imputation method dependent; ideally imputed from the HF estimated) |
RES_SHARE_AMBIG | X | Real, 0 <= x <= 1 | Fraction of GT with a lower resolution than defined in the resolution tag |
RES_MISS_LOCI | X | Real, 0 <= x <= 1 | Fraction of GT with missing loci (separate qual_type per locus?) |
DEV_HWE | X | Real | Deviation from HWE (using HWE with ambiguity method) |
ERR_STD | | Real, 0 <= x <= 1 | Weighted average of standard errors across all haplotypes |
ERR_SAMP_80_100 | | Real | Laurent, Excoffier “If” between frequencies derived from 100% set and 80% training set |
SUM_FREQ_GAP | | Real | Sum of haplotype frequencies for unobserved haplotypes that are expected in the population by the SHF model |
ERR_OFFSET | | Real, 0 <= x <= 1 | 1 - sum f_i (Difference between predicted full HF distribution using SHF versus actual including test set?) |
LD_MEASURE | | Real | Define in Method section (Where is LD measured for quality?) |
KFOLD_IMPUTE | X | Real, 0 <= x <= 1 | % of imputable GT in 20% test set from HT generated in 80% training set |
KFOLD_PRED_ACTUAL | X | Real, 0 <= x <= 1 | Divergence between predicted and actual with Log Loss function (for test set predictions on simulated lower-resolution typings) |
KFOLD_N | | Integer | Number of independent training-test folds (k) |
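
A few of the metrics above are simple enough to sketch directly. The Python below is a minimal illustration, not a reference implementation: the function names and input layouts are hypothetical, DIV_PGD is read as (1 - sum f_i^2) * N/(N-1), and RES_TRS is read as the per-individual sum of squared genotype probabilities averaged over individuals.

```python
def div_50(freqs):
    """DIV_50: number of haplotypes, taken in descending order of
    frequency, needed for the cumulative sum to exceed 0.5.
    Returns None if the frequencies never sum past 0.5."""
    total = 0.0
    for count, f in enumerate(sorted(freqs, reverse=True), start=1):
        total += f
        if total > 0.5:
            return count
    return None


def div_50_rel(freqs):
    """DIV_50_REL: DIV_50 divided by the number of haplotypes."""
    n = div_50(freqs)
    return None if n is None else n / len(freqs)


def div_pgd(freqs, sam_size):
    """DIV_PGD, assuming the form (1 - sum f_i^2) * N / (N - 1)
    with N = SAM_SIZE."""
    return (1.0 - sum(f * f for f in freqs)) * sam_size / (sam_size - 1)


def res_trs(genotype_probs):
    """RES_TRS: for each individual, sum the squared probabilities of
    its possible genotypes, then average over individuals.
    `genotype_probs` is a list with one list of probabilities
    per individual (an assumed input layout)."""
    per_individual = [sum(p * p for p in probs) for probs in genotype_probs]
    return sum(per_individual) / len(per_individual)


# Illustrative haplotype frequencies, not real data.
freqs = [0.1, 0.4, 0.2, 0.3]
print(div_50(freqs), div_50_rel(freqs), div_pgd(freqs, sam_size=10))

# Two individuals: one unambiguous, one with two equally likely genotypes.
print(res_trs([[1.0], [0.5, 0.5]]))
```

With these inputs the two most frequent haplotypes (0.4 and 0.3) are enough to pass 0.5, so DIV_50 is 2 and DIV_50_REL is 0.5. An unambiguous individual contributes 1 to RES_TRS and an individual split evenly between two genotypes contributes 0.5, giving an average of 0.75.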