PT - JOURNAL ARTICLE
AU - Cristopher V. Van Hout
AU - Ioanna Tachmazidou
AU - Joshua D. Backman
AU - Joshua X. Hoffman
AU - Bin Ye
AU - Ashutosh K. Pandey
AU - Claudia Gonzaga-Jauregui
AU - Shareef Khalid
AU - Daren Liu
AU - Nilanjana Banerjee
AU - Alexander H. Li
AU - O’Dushlaine Colm
AU - Anthony Marcketta
AU - Jeffrey Staples
AU - Claudia Schurmann
AU - Alicia Hawes
AU - Evan Maxwell
AU - Leland Barnard
AU - Alexander Lopez
AU - John Penn
AU - Lukas Habegger
AU - Andrew L. Blumenfeld
AU - Ashish Yadav
AU - Kavita Praveen
AU - Marcus Jones
AU - William J. Salerno
AU - Wendy K. Chung
AU - Ida Surakka
AU - Cristen J. Willer
AU - Kristian Hveem
AU - Joseph B. Leader
AU - David J. Carey
AU - David H. Ledbetter
AU - Geisinger-Regeneron DiscovEHR Collaboration
AU - Lon Cardon
AU - George D. Yancopoulos
AU - Aris Economides
AU - Giovanni Coppola
AU - Alan R. Shuldiner
AU - Suganthi Balasubramanian
AU - Michael Cantor
AU - Matthew R. Nelson
AU - John Whittaker
AU - Jeffrey G. Reid
AU - Jonathan Marchini
AU - John D. Overton
AU - Robert A. Scott
AU - Gonçalo Abecasis
AU - Laura Yerges-Armstrong
AU - Aris Baras
AU - on behalf of the Regeneron Genetics Center
TI - Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank
AID - 10.1101/572347
DP - 2019 Jan 01
TA - bioRxiv
PG - 572347
4099 - http://biorxiv.org/content/early/2019/03/09/572347.short
4100 - http://biorxiv.org/content/early/2019/03/09/572347.full
AB - The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data includes 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.