Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation

  1. Slavé Petrovski1,8
  1. Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria 3010, Australia;
  2. Simcere Diagnostics, Nanjing, 210042, China;
  3. Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, Victoria 3084, Australia;
  4. Department of Mathematics, North Carolina A&T State University, Greensboro, North Carolina 27411, USA;
  5. Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, Victoria 3010, Australia;
  6. Centre for Systems Genomics, School of BioSciences and School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia

  • 7 These authors contributed equally to this work.

  • 8 Present address: Centre for Genomics Research, IMED Biotech Unit, AstraZeneca, Cambridge SG8 6HB, UK

  • Corresponding author: slavep{at}unimelb.edu.au
  • Abstract

    Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10⁻¹⁶). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
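
The abstract describes the MTR only in words, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes the ratio compares the observed missense proportion (missense over missense plus synonymous variants seen in population data) with the proportion expected from all possible single-nucleotide changes, computed in a sliding window of codons. The function name, the per-codon count inputs, and the default window of 31 codons are assumptions made for illustration.

```python
from typing import List, Optional


def missense_tolerance_ratio(
    obs_mis: List[int],   # observed missense variants per codon (assumed input)
    obs_syn: List[int],   # observed synonymous variants per codon
    poss_mis: List[int],  # possible missense changes per codon
    poss_syn: List[int],  # possible synonymous changes per codon
    window: int = 31,     # window size in codons (an assumption)
) -> List[Optional[float]]:
    """Return one windowed ratio per codon, or None where it is undefined."""
    n = len(obs_mis)
    half = window // 2
    ratios: List[Optional[float]] = []
    for i in range(n):
        # The window is centred on codon i and truncated at the gene ends.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        o_mis, o_syn = sum(obs_mis[lo:hi]), sum(obs_syn[lo:hi])
        p_mis, p_syn = sum(poss_mis[lo:hi]), sum(poss_syn[lo:hi])
        if o_mis + o_syn == 0 or p_mis == 0:
            # No observed variation, or no possible missense change.
            ratios.append(None)
            continue
        observed = o_mis / (o_mis + o_syn)
        expected = p_mis / (p_mis + p_syn)
        ratios.append(observed / expected)
    return ratios


# Toy data: four codons, window of three codons.
toy = missense_tolerance_ratio(
    obs_mis=[0, 0, 2, 2],
    obs_syn=[1, 1, 1, 1],
    poss_mis=[6, 6, 6, 6],
    poss_syn=[3, 3, 3, 3],
    window=3,
)
```

Under these assumptions, values below 1 mark windows with fewer observed missense variants than expected, which corresponds to the low MTR regions where the abstract reports that pathogenic variants cluster.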

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.226589.117.

    • Freely available online through the Genome Research Open Access option.

    • Received June 23, 2017.
    • Accepted August 8, 2017.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
