RT Journal Article
SR Electronic
T1 Hardy Weinberg Exact Test In Large Scale Variant Calling Quality Control
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 095521
DO 10.1101/095521
A1 Zhuoyi Huang
A1 Navin Rustagi
A1 Degui Zhi
A1 L. Adrienne Cupples
A1 Richard Gibbs
A1 Eric Boerwinkle
A1 Fuli Yu
YR 2016
UL http://biorxiv.org/content/early/2016/12/19/095521.abstract
AB The Hardy-Weinberg equilibrium (HWE) test is widely used as a quality control measure to detect sequencing artifacts such as mismapping, allelic dropout, and biases. However, in the high-throughput sequencing era, where sample sizes exceed a thousand, the utility of the HWE test in reducing the false positive rate remains unclear. In this paper, we demonstrate that the HWE test has limited power to identify sequencing artifacts when the variant allele frequency is lower than 1%, using a variant call set produced from more than five thousand whole-genome-sequenced samples from two homogeneous populations. We develop a novel strategy for HWE filtering that incorporates site frequency spectrum information and determines the p-value cutoff that optimizes the tradeoff between sensitivity and specificity. The novel strategy is shown to outperform the exact test of HWE with an empirical constant p-value cutoff, regardless of the sequencing sample size. We also present best-practice recommendations for identifying possible sources of false positives in large sequencing datasets, based on an analysis of intrinsic biases in the variant calling process. Our strategy of determining the HWE test p-value cutoff and applying the test to common variants provides a practical approach to variant-level quality control in upcoming sequencing projects with tens to hundreds of thousands of samples.
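
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It assumes a common formulation of the HWE exact test, one that conditions on the observed allele counts and sums the probabilities of all heterozygote counts no more likely than the observed one. Whether the paper uses exactly this variant is an assumption. The function names, the `p_cutoff` value of 1e-6, and the use of minor allele frequency for the 1% threshold are hypothetical placeholders. The paper derives its cutoff from the site frequency spectrum, which is not reproduced here.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact-test p-value for HWE from biallelic genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_fact(k):
        return lgamma(k + 1)

    def log_prob(het):
        # Log-probability of `het` heterozygotes given n and the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + log_fact(n)
                - log_fact(hom_a) - log_fact(het) - log_fact(hom_b)
                + log_fact(n_a) + log_fact(n_b) - log_fact(2 * n))

    # Heterozygote counts must share the parity of the minor allele count.
    minor = min(n_a, n_b)
    probs = [exp(log_prob(h)) for h in range(minor % 2, minor + 1, 2)]
    total = sum(probs)  # equals 1 up to rounding; used to renormalise
    observed = exp(log_prob(n_ab))
    # Sum every outcome that is no more likely than the observed one.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, tail / total)


def passes_hwe_filter(n_aa, n_ab, n_bb, p_cutoff=1e-6, min_af=0.01):
    """Apply the HWE filter to common variants only (hypothetical defaults)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return True
    n_a = 2 * n_aa + n_ab
    minor_af = min(n_a, 2 * n - n_a) / (2 * n)
    if minor_af < min_af:
        return True  # test is not applied to rare variants (assumption)
    return hwe_exact_pvalue(n_aa, n_ab, n_bb) >= p_cutoff
```

For example, `passes_hwe_filter(25, 50, 25)` keeps a site whose genotype counts match HWE expectations, while `passes_hwe_filter(50, 0, 50)` rejects a common site with no heterozygotes at all.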