RT Journal Article
SR Electronic
T1 Analysis of protein-coding genetic variation in 60,706 humans
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 030338
DO 10.1101/030338
A1 Lek, Monkol
A1 Karczewski, Konrad J
A1 Minikel, Eric V
A1 Samocha, Kaitlin E
A1 Banks, Eric
A1 Fennell, Timothy
A1 O’Donnell-Luria, Anne H
A1 Ware, James S
A1 Hill, Andrew J
A1 Cummings, Beryl B
A1 Tukiainen, Taru
A1 Birnbaum, Daniel P
A1 Kosmicki, Jack A
A1 Duncan, Laramie
A1 Estrada, Karol
A1 Zhao, Fengmei
A1 Zou, James
A1 Pierce-Hoffman, Emma
A1 Cooper, David N
A1 DePristo, Mark
A1 Do, Ron
A1 Flannick, Jason
A1 Fromer, Menachem
A1 Gauthier, Laura
A1 Goldstein, Jackie
A1 Gupta, Namrata
A1 Howrigan, Daniel
A1 Kiezun, Adam
A1 Kurki, Mitja I
A1 Moonshine, Ami Levy
A1 Natarajan, Pradeep
A1 Orozco, Lorena
A1 Peloso, Gina M
A1 Poplin, Ryan
A1 Rivas, Manuel A
A1 Ruano-Rubio, Valentin
A1 Ruderfer, Douglas M
A1 Shakir, Khalid
A1 Stenson, Peter D
A1 Stevens, Christine
A1 Thomas, Brett P
A1 Tiao, Grace
A1 Tusie-Luna, Maria T
A1 Weisburd, Ben
A1 Won, Hong-Hee
A1 Yu, Dongmei
A1 Altshuler, David M
A1 Ardissino, Diego
A1 Boehnke, Michael
A1 Danesh, John
A1 Elosua, Roberto
A1 Florez, Jose C
A1 Gabriel, Stacey B
A1 Getz, Gad
A1 Hultman, Christina M
A1 Kathiresan, Sekar
A1 Laakso, Markku
A1 McCarroll, Steven
A1 McCarthy, Mark I
A1 McGovern, Dermot
A1 McPherson, Ruth
A1 Neale, Benjamin M
A1 Palotie, Aarno
A1 Purcell, Shaun M
A1 Saleheen, Danish
A1 Scharf, Jeremiah
A1 Sklar, Pamela
A1 Sullivan, Patrick F
A1 Tuomilehto, Jaakko
A1 Watkins, Hugh C
A1 Wilson, James G
A1 Daly, Mark J
A1 MacArthur, Daniel G
YR 2015
UL http://biorxiv.org/content/early/2015/10/30/030338.abstract
AB Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.