Quality Metrics

Quality metrics for haplotype frequency data

These are meant to characterize data so that thresholds can be applied consistently, for example: only considering data with RES_TRS_COUNT > x, or only data with SAM_SIZE > y.
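As a rough illustration of such thresholding, the sketch below keeps only datasets whose metrics exceed the chosen cutoffs. The dictionary layout, the dataset names, the metric values, and the `passes_thresholds` helper are all hypothetical; only the QUAL_TYPE names come from the table on this page.

```python
def passes_thresholds(metrics, minimums):
    """Return True if every metric named in `minimums` is present in
    `metrics` and strictly greater than its cutoff."""
    return all(
        name in metrics and metrics[name] > cutoff
        for name, cutoff in minimums.items()
    )


# Made-up example values, keyed by QUAL_TYPE (not real data).
datasets = {
    "pop_A": {"SAM_SIZE": 1200, "RES_TRS": 0.91},
    "pop_B": {"SAM_SIZE": 85, "RES_TRS": 0.97},
}

# Keep only datasets with SAM_SIZE > 100.
kept = [name for name, m in datasets.items()
        if passes_thresholds(m, {"SAM_SIZE": 100})]
```

The same call works for any combination of metrics, for instance `{"SAM_SIZE": 100, "RES_TRS": 0.9}`. A dataset that lacks a requested metric is treated as failing the threshold.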

Many of these depend on having the genotypes, or some aspect of the genotypes, available.

QUAL_TYPE	GENOTYPE BASED	VALUE	DESCRIPTION
DIV_ALPHA		Real, < 0	Exponent of the power-law fit to the HTF distribution (called alpha in Slater et al., Power Laws for Heavy-Tailed Distributions)
DIV_50	X	integer	Number of haplotypes needed (in descending order of frequency) to have the cumulative sum be > 0.5 (Sample size sensitive!)
DIV_50_REL		Real, 0 <= x <= 1	Number of haplotypes needed (in descending order of frequency) to have the cumulative sum be > 0.5 divided by the number of HT
SAM_SIZE	X	integer	Number of GT
DIV_PGD	X	Real, 0 <= x <= 1	Population genetics diversity, (1 - sum_i f_i^2) * N/(N-1), where N = SAM_SIZE and f_i is the frequency of a specific haplotype
DIV_HEAVY_TAIL		Real, 0 <= x <= 1	a is an independence parameter of the Bayesian SHF model that describes how allele frequency products correlate with haplotype frequencies (also correlates with the fraction of nonzero categories) – From Yoram SHF MS
RES_TRS_COUNT		Real, x >= 1	Average number of possible genotypes per individual
RES_TRS		Real, 0 <= x <= 1	Typing Resolution Score – Average sum of square of genotype probabilities (imputation method dependent; ideally imputed from the HF estimated)
RES_SHARE_AMBIG	X	Real, 0 <= x <= 1	Fraction of GT with a lower resolution than defined in the resolution tag
RES_MISS_LOCI	X	Real, 0 <= x <= 1	Fraction of GT with missing loci (separate qual_type per locus?)
DEV_HWE	X	Real	Deviation from HWE (using HWE with ambiguity method)
ERR_STD		Real, 0 <= x <= 1	Weighted average of standard errors across all haplotypes
ERR_SAMP_80_100		Real	Laurent Excoffier's I_F index between frequencies derived from the 100% set and the 80% training set
SUM_FREQ_GAP		Real	Sum of haplotype frequencies for unobserved haplotypes that are expected in population by SHF model
ERR_OFFSET		Real, 0 <= x <= 1	1-sum f_i (Difference between predicted full HF distribution using SHF versus actual including test set?)
LD_MEASURE		Real	Define in Method section – (Where is LD measured for quality?)
KFOLD_IMPUTE	X	Real, 0 <= x <= 1	Fraction of GT in the 20% test set that are imputable from HT generated in the 80% training set
KFOLD_PRED_ACTUAL	X	Real, 0 <= x <= 1	Divergence between predicted and actual with Log Loss function (for test set predictions on simulated lower-resolution typings)
KFOLD_N		integer	Number of independent training-test folds (k)
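A few of the metrics above are simple enough to sketch directly. The code below is a minimal illustration, not a reference implementation: the function names and input shapes are assumptions, DIV_PGD is read as (1 - sum_i f_i^2) * N/(N-1), and RES_TRS is read as the mean over individuals of the sum of squared genotype probabilities.

```python
def div_50(freqs):
    """DIV_50: number of haplotypes, taken in descending order of
    frequency, needed for the cumulative sum to exceed 0.5."""
    total = 0.0
    for count, f in enumerate(sorted(freqs, reverse=True), start=1):
        total += f
        if total > 0.5:
            return count
    return None  # the frequencies never sum to more than 0.5


def div_50_rel(freqs):
    """DIV_50_REL: DIV_50 divided by the number of haplotypes."""
    n = div_50(freqs)
    return None if n is None else n / len(freqs)


def div_pgd(freqs, sam_size):
    """DIV_PGD, assuming the form (1 - sum_i f_i^2) * N/(N-1)."""
    if sam_size < 2:
        raise ValueError("SAM_SIZE must be at least 2")
    homozygosity = sum(f * f for f in freqs)
    return (1.0 - homozygosity) * sam_size / (sam_size - 1)


def res_trs(genotype_probs):
    """RES_TRS: average over individuals of the sum of squared
    genotype probabilities. `genotype_probs` holds one list of
    probabilities per individual (an assumed input shape)."""
    scores = [sum(p * p for p in probs) for probs in genotype_probs]
    return sum(scores) / len(scores)


# Made-up haplotype frequencies and genotype probabilities.
freqs = [0.1, 0.4, 0.2, 0.3]
n50 = div_50(freqs)                      # 0.4 + 0.3 first exceeds 0.5
rel50 = div_50_rel(freqs)
pgd = div_pgd(freqs, sam_size=100)
trs = res_trs([[1.0], [0.5, 0.5]])       # one unambiguous, one two-way split
```

An individual with a single possible genotype contributes 1 to RES_TRS, while one split evenly between two genotypes contributes 0.5, so the score falls as typing ambiguity grows.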
