bioRxiv
New Results

Gist – an ensemble approach to the taxonomic classification of metatranscriptomic sequence data

Samantha Halliday, John Parkinson
doi: https://doi.org/10.1101/081026
Samantha Halliday
1Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, M5G 1L7, Canada
2Department of Computer Science, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
John Parkinson
1Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, M5G 1L7, Canada
3Department of Molecular Genetics, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
4Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
  • For correspondence: john.parkinson@utoronto.ca

ABSTRACT

The study of whole microbial communities through RNA-seq, or metatranscriptomics, offers a unique view of the relative levels of activity for different genes across a large number of species simultaneously. To make sense of these sequencing data, it is necessary to assign both taxonomic and functional identities to each sequenced read. High-quality identifications are important not only for community profiling, but also to ensure that functional assignments of sequence reads are correctly attributed to their source taxa. Such assignments allow biochemical pathways to be appropriately allocated to discrete species, enabling the capture of cross-species interactions. Typically, read annotation is performed by a single alignment-based search tool such as BLAST. However, due to the vast extent of bacterial diversity, these approaches tend to be highly error-prone, particularly for taxonomic assignments. Here we introduce a novel program for generating taxonomic assignments, called Gist, which integrates information from a number of machine learning methods and the Burrows-Wheeler Aligner. Uniquely, Gist establishes the most appropriate weightings of methods for individual genomes, facilitating high classification accuracy on next-generation sequencing reads. We validate our approach using a synthetic metatranscriptome generator based on Flux Simulator, termed Genepuddle. Further, in contrast to previous taxonomic classifiers, we demonstrate that composition-based techniques can accurately inform on taxonomic origin without resorting to longer scanning windows that mimic alignment-based methods. Gist is made freely available under the terms of the GNU General Public License at compsysbio.org/gist.
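
As a rough illustration of what per-genome weighting of methods could look like, the sketch below combines the scores that several classifiers give to one read, using a separate weight vector for each candidate genome. It is not Gist's implementation: the function name `classify_read`, the method labels, the genome labels, and all numbers are hypothetical, and the weighted sum is simply one plausible way to combine scores.

```python
from typing import Dict

# Hypothetical per-method scores for one read: method -> genome -> score.
Scores = Dict[str, Dict[str, float]]
# Hypothetical per-genome weights: genome -> method -> weight.
Weights = Dict[str, Dict[str, float]]


def classify_read(scores: Scores, weights: Weights) -> str:
    """Return the genome with the highest weighted sum of method scores.

    Assumption: a plain weighted sum; the paper's actual scoring may differ.
    """
    combined = {}
    for genome, method_weights in weights.items():
        combined[genome] = sum(
            w * scores.get(method, {}).get(genome, 0.0)
            for method, w in method_weights.items()
        )
    return max(combined, key=combined.get)


# Illustrative numbers only; none are taken from the paper.
scores = {
    "composition": {"genome_a": 0.6, "genome_b": 0.7},
    "alignment": {"genome_a": 0.9, "genome_b": 0.2},
}
weights = {
    "genome_a": {"composition": 0.3, "alignment": 0.7},
    "genome_b": {"composition": 0.8, "alignment": 0.2},
}
best = classify_read(scores, weights)
```

In this toy case the alignment score is trusted more for `genome_a` and the composition score more for `genome_b`, so the same pair of classifiers contributes differently depending on the candidate genome.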

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted October 28, 2016.

Subject Area

  • Bioinformatics