Abstract
Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we began by identifying a set of positive control genes for 12 common diseases and traits: genes that either cause a Mendelian form of the disease or are the target of a medicine used for disease treatment. We then identified a widely available set of genomic features that enrich GWAS-associated single nucleotide variants (SNVs) for these positive control genes. Using these features, we trained and validated the Effector Index (Ei), a causal gene mapping algorithm, on the 12 common diseases and traits. The area under Ei's receiver operating characteristic curve for identifying positive control genes was 80%, and the area under the precision-recall curve was 29%. Using an enlarged set of independently curated positive control genes for type 2 diabetes, which included genes identified by large-scale exome sequencing, these areas increased to 85% and 61%, respectively. The best predictors were coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. We provide the Ei algorithm for widespread use and have created a web portal to facilitate understanding of results. This work outlines a simple, understandable approach to prioritize genes at GWAS loci for functional follow-up and drug development.
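For readers less familiar with the two performance measures quoted above, the following is a minimal sketch of how an area under the receiver operating characteristic curve and an area under the precision-recall curve can be computed for a gene-ranking score. It is not the Ei implementation: the function names, the toy labels (1 = positive control gene, 0 = other gene at the locus) and the toy scores are all hypothetical, and the precision-recall area is approximated by average precision with ties in score left unhandled.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).
    Assumes no tied scores; ties would be broken by input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos


# Hypothetical toy data: 8 genes at GWAS loci, 2 of them positive controls.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

roc_area = auroc(labels, scores)
pr_area = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)  # expected PR area of a random ranking

print(f"AUROC={roc_area:.3f}  AUPRC={pr_area:.3f}  prevalence={prevalence:.3f}")
```

The prevalence is printed alongside because a random ranking has an expected ROC area of 0.5 regardless of class balance, whereas its expected precision-recall area is roughly the fraction of positives. A precision-recall area is therefore best read against how rare positive control genes are among the genes being ranked.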
Author summary
To derive biological insight or develop drugs based on genome-wide association study (GWAS) data, causal genes at associated loci need to be identified. GWAS usually identify large genomic regions containing many genes but seldom identify specific causal genes. We have developed an algorithm to predict which genes in a region of disease association are likely causal and have named this algorithm the Effector Index. The Effector Index was optimized on diseases that have known causal or drug target genes, and was further validated by predicting these types of genes in independent datasets. The Effector Index formalizes the predictive genomic features into a tool that can be used by researchers, and results from the traits and diseases studied here are available via the Accelerating Medicines Partnership web portal at http://hugeamp.org/effectorgenes.html.
Competing Interest Statement
The funding agencies had no role in the design, implementation or interpretation of this study. The views expressed in this article are those of the author(s) and not necessarily those of the funders. EF is an employee of Pfizer. MIM is funded by the NHS, the NIHR, and the Department of Health. MIM has served on advisory panels for Pfizer, Novo Nordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and has received research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. As of June 2019, MIM is an employee of Genentech and a holder of Roche stock. MIM has received funding from the NIH (U01-DK105535) and the Wellcome Trust (090532, 098381, 106130, 203141, 212259). The Greenwood lab acknowledges support from Compute Canada (RAPI: nzt-671-aa). MTM is partially funded by National Institutes of Health grant R35GM119703. The Richards research group is supported by the Canadian Institutes of Health Research (CIHR), the Lady Davis Institute of the Jewish General Hospital, the Canadian Foundation for Innovation, the NIH Foundation, Cancer Research UK and the Fonds de Recherche Quebec Sante (FRQS). JBR is supported by a FRQS Clinical Research Scholarship. JBR has served as an advisor to GlaxoSmithKline and Deerfield Capital. TwinsUK is funded by the Wellcome Trust, the Medical Research Council, the European Union, and the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. This research has been conducted using the UK Biobank Resource under project number 27449.