bioRxiv
New Results

SQANTI: extensive characterization of long read transcript sequences for quality control in full-length transcriptome identification and quantification

Manuel Tardaguila, Lorena de la Fuente, Cristina Marti, Cécile Pereira, Hector del Risco, Marc Ferrell, Maravillas Mellado, Marissa Macchietto, Kenneth Verheggen, Mariola Edelmann, Iakes Ezkurdia, Jesus Vazquez, Michael Tress, Ali Mortazavi, Lennart Martens, Susana Rodriguez-Navarro, Victoria Moreno, Ana Conesa
doi: https://doi.org/10.1101/118083
Affiliations
  • 1 Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, USA (Manuel Tardaguila, Cécile Pereira, Hector del Risco, Marc Ferrell, Mariola Edelmann, Ana Conesa)
  • 2 Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), Valencia, Spain (Lorena de la Fuente, Cristina Marti, Ana Conesa)
  • 3 Neural Regeneration Laboratory, CIPF, Valencia, Spain (Maravillas Mellado, Victoria Moreno)
  • 4 Department of Developmental and Cell Biology, University of California, Irvine, CA, USA (Marissa Macchietto, Ali Mortazavi)
  • 5 VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium (Kenneth Verheggen, Lennart Martens)
  • 6 Department of Biochemistry, Ghent University, Ghent, Belgium (Kenneth Verheggen, Lennart Martens)
  • 7 Centro Nacional de Investigaciones Cardiovasculares CNIC, Madrid, Spain (Iakes Ezkurdia, Jesus Vazquez)
  • 8 Centro Nacional de Investigaciones Oncologicas CNIO, Madrid, Spain (Michael Tress)
  • 9 RNA transport and metabolism Laboratory, CIPF, Valencia, Spain (Susana Rodriguez-Navarro)
For correspondence: Ana Conesa, aconesa@ufl.edu, aconesa@cipf.es

ABSTRACT

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated organisms such as mice and humans. Nonetheless, there is a need for studies and tools that characterize these novel isoforms. Here we present SQANTI, an automated pipeline for the classification of long-read transcripts that computes over 30 descriptors, which can be used to assess the quality of the data and of the preprocessing pipelines. We applied SQANTI to a neuronal mouse transcriptome sequenced with PacBio long reads and illustrate how the tool readily describes the composition of the full-length transcriptome and characterizes its transcripts. We perform an extensive PCR-based evaluation of ToFU PacBio transcripts, which reveals that a substantial number of the novel transcripts are technical artifacts of the sequencing approach, and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection and often carry major protein changes with respect to the principal isoform of their genes. A comparison of Iso-Seq with classical RNA-seq approaches based solely on short reads demonstrates that the PacBio transcriptome not only succeeds in capturing the most robustly expressed fraction of transcripts, but also avoids quantification errors caused by unaccounted-for 3’ end variability in the reference. SQANTI allows users to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes. SQANTI is available at https://bitbucket.org/ConesaLab/sqanti.
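To make the idea of classifying long-read transcripts against a reference more concrete, here is a minimal, hypothetical sketch. It is not SQANTI's implementation. It assumes that each multi-exon transcript is reduced to its ordered chain of (donor, acceptor) splice junctions and that the reference isoforms of the overlapping gene are already known. The category labels are modelled on SQANTI's structural categories (full-splice match, incomplete-splice match, novel in catalog, novel not in catalog), but the rules are simplified. The names `classify` and `is_contiguous_subchain`, and all coordinates, are illustrative.

```python
# Hypothetical sketch, not SQANTI's implementation: classify a multi-exon
# long-read transcript by comparing its splice-junction chain with the
# reference isoforms of the gene it overlaps. Category names are modelled
# on SQANTI's structural categories; the rules are deliberately simplified.
from typing import Sequence, Tuple

Junction = Tuple[int, int]      # (donor, acceptor) genomic coordinates
Chain = Tuple[Junction, ...]    # ordered junctions of one transcript


def is_contiguous_subchain(query: Chain, ref: Chain) -> bool:
    """Return True if `query` occurs as a run of consecutive junctions in `ref`."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def classify(query: Chain, gene_refs: Sequence[Chain]) -> str:
    """Assign a simplified structural category to a multi-exon transcript.

    `gene_refs` holds the junction chains of the reference isoforms of the
    gene the query overlaps; locus assignment is assumed to happen upstream.
    """
    if not query:
        raise ValueError("mono-exonic transcripts are out of scope for this sketch")
    if any(query == ref for ref in gene_refs):
        return "full-splice_match"        # identical junction chain
    if any(is_contiguous_subchain(query, ref) for ref in gene_refs):
        return "incomplete-splice_match"  # consecutive subset of a reference chain
    known_donors = {d for ref in gene_refs for d, _ in ref}
    known_acceptors = {a for ref in gene_refs for _, a in ref}
    if all(d in known_donors and a in known_acceptors for d, a in query):
        return "novel_in_catalog"         # new combination of annotated splice sites
    return "novel_not_in_catalog"         # at least one unannotated splice site


if __name__ == "__main__":
    # Toy gene with two annotated isoforms (coordinates are made up).
    refs = [
        ((100, 200), (300, 400), (500, 600)),  # isoform A
        ((100, 200), (300, 600)),              # isoform B, skips the 400-500 exon
    ]
    print(classify(((100, 200), (300, 400), (500, 600)), refs))  # full-splice_match
    print(classify(((300, 400), (500, 600)), refs))              # incomplete-splice_match
    print(classify(((100, 400), (500, 600)), refs))              # novel_in_catalog
    print(classify(((100, 250), (300, 400)), refs))              # novel_not_in_catalog
```

Matching on junction chains rather than on exact exon boundaries keeps the comparison insensitive to variable transcript start and end positions. A realistic classifier would still have to handle strand, locus assignment, mono-exonic transcripts, and cases such as intergenic or antisense transcripts, all of which this sketch leaves out.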

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted March 18, 2017.

Subject Area

  • Bioinformatics