TY  - JOUR
T1  - The multiple testing burden in sequencing-based disease studies of global populations
JF  - bioRxiv
DO  - 10.1101/053264
SP  - 053264
AU  - Sara L. Pulit
AU  - Sera A.J. de With
AU  - Paul I.W. de Bakker
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/05/13/053264.abstract
N2  - Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. Currently, studies of common disease are rapidly shifting towards the use of sequencing technologies. As the cost of sequencing drops, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples should practically employ a genome-wide significance threshold of p < 5 × 10⁻⁹, though the threshold does vary with ancestry. Studies of European or East Asian ancestry should set genome-wide significance at approximately p < 5 × 10⁻⁹, but similar studies of African or South Asian samples should be more stringent (p < 1 × 10⁻⁹). Because sequencing analysis brings with it many challenges (especially for rare variants), appropriate adoption of a revised multiple test correction will be crucial to avoid irreproducible claims of association.
ER  - 