New Results

Interpretable genotype-to-phenotype classifiers with performance guarantees

Alexandre Drouin, Gaël Letarte, Frédéric Raymond, Mario Marchand, Jacques Corbeil, François Laviolette
doi: https://doi.org/10.1101/388348
Alexandre Drouin, Université Laval (for correspondence: alexandre.drouin.8@ulaval.ca)
Gaël Letarte, Université Laval
Frédéric Raymond, Université Laval
Mario Marchand, Université Laval
Jacques Corbeil, Université Laval
François Laviolette, Université Laval

Abstract

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potentially new ones. An open-source disk-based implementation that is both memory and computationally efficient is provided with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.
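To make the idea of an interpretable rule-based model concrete, here is a minimal sketch of a greedy learner that builds a conjunction of presence/absence rules. It is not the authors' implementation, and the abstract does not specify the algorithm or the features at this level of detail: the use of k-mers as features, the scoring heuristic, the toy sequences, and every name (`kmers`, `fit_conjunction`, `predict`) are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, bool]  # (k-mer, must_be_present); hypothetical representation


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rule_holds(rule: Rule, features: Set[str]) -> bool:
    """A rule holds when the k-mer's presence matches what the rule requires."""
    kmer, must_be_present = rule
    return (kmer in features) == must_be_present


def fit_conjunction(genomes: List[str], labels: List[bool],
                    k: int = 3, max_rules: int = 2) -> List[Rule]:
    """Greedily pick rules that reject negatives while keeping positives."""
    feats = [kmers(g, k) for g in genomes]
    candidates = [(km, p) for km in sorted(set().union(*feats))
                  for p in (True, False)]
    negatives = {i for i, y in enumerate(labels) if not y}
    positives = {i for i, y in enumerate(labels) if y}
    model: List[Rule] = []

    while negatives and len(model) < max_rules:
        def score(rule: Rule) -> int:
            # Negatives correctly rejected minus positives wrongly rejected.
            rejected = sum(not rule_holds(rule, feats[i]) for i in negatives)
            lost = sum(not rule_holds(rule, feats[i]) for i in positives)
            return rejected - lost

        best = max(candidates, key=score)
        if score(best) <= 0:
            break  # no remaining rule improves the model
        model.append(best)
        # Keep only the examples that still satisfy every chosen rule.
        negatives = {i for i in negatives if rule_holds(best, feats[i])}
        positives = {i for i in positives if rule_holds(best, feats[i])}
    return model


def predict(model: List[Rule], genome: str, k: int = 3) -> bool:
    """Predict the positive class only if every rule in the conjunction holds."""
    features = kmers(genome, k)
    return all(rule_holds(rule, features) for rule in model)


# Toy data (made up): the two "resistant" sequences share the k-mer GGG.
toy_genomes = ["ACGGGT", "TTGGGA", "ACGTAC", "TTATAA"]
toy_labels = [True, True, False, False]
toy_model = fit_conjunction(toy_genomes, toy_labels)
print(toy_model)
```

The learned model is a short list of human-readable rules, which is the sense in which such classifiers are interpretable: each rule points at a specific sequence whose presence or absence drives the prediction.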

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted August 09, 2018.

Subject Area

  • Genomics