Improved exome prioritization of disease genes through cross-species phenotype comparison

  1. Damian Smedley (4, 10)
  1. Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany;
  2. Berlin Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany;
  3. Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany;
  4. Mouse Informatics group, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, United Kingdom;
  5. Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California 90089, USA;
  6. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA;
  7. Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany;
  8. Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences and Institute for Genetic and Metabolic Disorders, Radboud University Nijmegen Medical Centre, 6500 HB Nijmegen, The Netherlands;
  9. University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon 97239, USA

    Abstract

    Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved because of the sheer number of candidate variants that remain after common filtering strategies, such as removing low-quality variants, common variants, and variants deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identifying the causative mutation problematic when these strategies are used alone. We propose using the wealth of genotype-to-phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with the evaluation of variants by the allele frequency, pathogenicity, and mode-of-inheritance approaches of our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods, with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporating phenotype data can play a vital role in translational bioinformatics, and we propose that exome sequencing projects systematically capture clinical phenotypes to take advantage of the strategy presented here.
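    The abstract describes PHIVE only at a high level. As a rough illustration of how frequency, pathogenicity, and inheritance filters might be combined with a phenotype-similarity score to rank candidate genes, here is a minimal Python sketch. It is not the Exomiser/PHIVE implementation: the `Variant` fields, the thresholds (`max_freq=0.01`, `min_path=0.5`), the Jaccard term overlap standing in for the paper's cross-species phenotype similarity, and the unweighted mean used to combine the two scores are all illustrative assumptions, and the gene names and term labels are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float    # population allele frequency, 0..1
    pathogenicity: float  # hypothetical normalized deleteriousness score, 0..1
    genotype: str         # "het" or "hom" in the proband


def jaccard(a: set, b: set) -> float:
    """Toy phenotype similarity (shared terms / all terms); NOT the paper's measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fits_inheritance(variants: list, mode: str) -> bool:
    """Check whether a gene's variants are compatible with a mode of inheritance."""
    if mode == "AD":  # autosomal dominant: one heterozygous hit suffices
        return any(v.genotype == "het" for v in variants)
    if mode == "AR":  # autosomal recessive: homozygous, or >= 2 hets (phase ignored)
        return (any(v.genotype == "hom" for v in variants)
                or sum(v.genotype == "het" for v in variants) >= 2)
    raise ValueError(f"unsupported mode: {mode}")


def rank_genes(variants, patient_terms, mouse_terms, mode,
               max_freq=0.01, min_path=0.5):
    """Filter variants, group them by gene, and rank genes by a combined score.

    mouse_terms maps gene -> phenotype terms of a mouse model of that gene,
    assumed to be already mapped into the same vocabulary as patient_terms.
    """
    by_gene = {}
    for v in variants:
        # Frequency and pathogenicity filters: drop common or likely benign variants.
        if v.allele_freq <= max_freq and v.pathogenicity >= min_path:
            by_gene.setdefault(v.gene, []).append(v)

    ranked = []
    for gene, gene_variants in by_gene.items():
        # Mode-of-inheritance filter applied per gene.
        if not fits_inheritance(gene_variants, mode):
            continue
        variant_score = max(v.pathogenicity for v in gene_variants)
        pheno_score = jaccard(patient_terms, mouse_terms.get(gene, set()))
        # Hypothetical combination: unweighted mean of the two scores.
        ranked.append((gene, (variant_score + pheno_score) / 2))
    # Highest combined score first; gene name breaks ties deterministically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    variants = [
        Variant("GENE_A", 0.0001, 0.90, "het"),
        Variant("GENE_B", 0.0002, 0.95, "het"),
        Variant("GENE_C", 0.20, 0.99, "hom"),  # too common: removed by frequency filter
    ]
    patient = {"P1", "P2", "P3"}  # placeholder phenotype term labels
    mouse = {"GENE_A": {"P1", "P2"}, "GENE_B": {"P9"}}
    for gene, score in rank_genes(variants, patient, mouse, "AD"):
        print(gene, round(score, 3))
```

    In this toy run, GENE_A ranks above GENE_B even though its variant has the lower pathogenicity score, because its hypothetical mouse model shares two of the patient's three phenotype terms. That reordering is the kind of effect the phenotype comparison is meant to contribute on top of purely variant-based filtering.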

    Footnotes

    • 10 Corresponding authors

      E-mail peter.robinson{at}charite.de

      E-mail ds5{at}sanger.ac.uk

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.160325.113.

      Freely available online through the Genome Research Open Access option.

    • Received May 13, 2013.
    • Accepted October 24, 2013.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 3.0 Unported), as described at http://creativecommons.org/licenses/by/3.0/.
