bioRxiv
New Results

AtRTD2: A Reference Transcript Dataset for accurate quantification of alternative splicing and expression changes in Arabidopsis thaliana RNA-seq data

Runxuan Zhang, Cristiane P. G. Calixto, Yamile Marquez, Peter Venhuizen, Nikoleta A. Tzioutziou, Wenbin Guo, Mark Spensley, Nicolas Frei dit Frey, Heribert Hirt, Allan B. James, Hugh G. Nimmo, Andrea Barta, Maria Kalyna, John W. S. Brown
doi: https://doi.org/10.1101/051938
Runxuan Zhang
1Informatics and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK
Cristiane P. G. Calixto
2Plant Sciences Division, College of Life Sciences, University of Dundee, Invergowrie, Dundee DD2 5DA, Scotland, UK
Yamile Marquez
3Max F. Perutz Laboratories, Medical University of Vienna, Dr. Bohrgasse 9/3, 1030, Vienna, Austria
9Current address: EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona 08003, Spain.
Peter Venhuizen
3Max F. Perutz Laboratories, Medical University of Vienna, Dr. Bohrgasse 9/3, 1030, Vienna, Austria
Nikoleta A. Tzioutziou
2Plant Sciences Division, College of Life Sciences, University of Dundee, Invergowrie, Dundee DD2 5DA, Scotland, UK
Wenbin Guo
1Informatics and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK
2Plant Sciences Division, College of Life Sciences, University of Dundee, Invergowrie, Dundee DD2 5DA, Scotland, UK
Mark Spensley
4The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario, Canada
Nicolas Frei dit Frey
5Institute of Plant Sciences Paris Saclay, INRA-CNRS-UEVE, Orsay 91405, France
10Current address: Laboratoire de Recherche en Sciences Végétales, UMR5546, University of Toulouse 3, CNRS, 24 chemin de Borde Rouge, Auzeville, BP42617, 31326, Castanet-Tolosan, France.
Heribert Hirt
5Institute of Plant Sciences Paris Saclay, INRA-CNRS-UEVE, Orsay 91405, France
11Current address: Center for Desert Agriculture, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
Allan B. James
6Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, Scotland
Hugh G. Nimmo
6Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, Scotland
Andrea Barta
3Max F. Perutz Laboratories, Medical University of Vienna, Dr. Bohrgasse 9/3, 1030, Vienna, Austria
Maria Kalyna
7Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences - BOKU, Muthgasse 18, 1190 Vienna, Austria
John W. S. Brown
2Plant Sciences Division, College of Life Sciences, University of Dundee, Invergowrie, Dundee DD2 5DA, Scotland, UK
8Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK

Abstract

Background: Alternative splicing is the major post-transcriptional mechanism by which gene expression is regulated, and it affects a wide range of processes and responses in most eukaryotic organisms. RNA-sequencing (RNA-seq) can generate genome-wide quantification of individual transcript isoforms to identify changes in expression and alternative splicing. RNA-seq is an essential modern tool, but its ability to quantify transcript isoforms accurately depends on the diversity, completeness and quality of the transcript information used as a reference.

Results: We have developed a new Reference Transcript Dataset for Arabidopsis (AtRTD2) for RNA-seq analysis. It contains more than 82,000 non-redundant transcripts, of which 74,194 originate from 27,667 protein-coding genes. A total of 13,524 protein-coding genes have at least one alternatively spliced transcript in AtRTD2, so about 60% of the 22,453 protein-coding, intron-containing genes in Arabidopsis undergo alternative splicing. More than 600 putative U12 introns were identified in more than 2,000 transcripts. AtRTD2 was generated from transcript assemblies of ca. 8.5 billion pairs of reads from 285 RNA-seq data sets obtained from 129 RNA-seq libraries; these assemblies were merged with the previous version, AtRTD, and with the Araport11 transcript assemblies. AtRTD2 increases the diversity of transcripts and, through the application of stringent filters, represents the most extensive and accurate transcript collection for Arabidopsis to date. We have demonstrated a generally good correlation between alternative splicing ratios from RNA-seq data analysed by Salmon and experimental data from high resolution RT-PCR. However, we have observed inaccurate quantification of transcript isoforms for genes with multiple transcripts that vary in the lengths of their UTRs. This variation is not effectively corrected in RNA-seq analysis programmes and will therefore affect RNA-seq analyses generally. To address this, we have tested different genome-wide modifications of AtRTD2 to improve transcript quantification and alternative splicing analysis. As a result, we release AtRTD2-QUASI specifically for use in Quantification of Alternatively Spliced Isoforms and demonstrate that it out-performs other available transcriptomes for RNA-seq analysis.
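The two quantities mentioned in the Results can be illustrated with a minimal Python sketch. The first function reproduces the "about 60%" figure from the gene counts quoted in the abstract. The second computes a splicing ratio as each transcript's share of its gene's total abundance; this definition, the function names, the transcript identifiers and the TPM values are all assumptions made for illustration and are not taken from the paper or from Salmon output.

```python
from collections import defaultdict


def fraction_alternatively_spliced(as_genes: int, intron_containing_genes: int) -> float:
    """Fraction of intron-containing genes with at least one AS transcript."""
    return as_genes / intron_containing_genes


def splicing_ratios(tpm_by_transcript: dict, gene_of: dict) -> dict:
    """Return each transcript's share of its gene's summed abundance.

    Assumed definition for illustration: ratio = transcript TPM / gene TPM.
    """
    gene_totals = defaultdict(float)
    for tx, tpm in tpm_by_transcript.items():
        gene_totals[gene_of[tx]] += tpm

    ratios = {}
    for tx, tpm in tpm_by_transcript.items():
        total = gene_totals[gene_of[tx]]
        # A gene with zero total abundance has no defined ratio.
        ratios[tx] = tpm / total if total > 0 else float("nan")
    return ratios


# Gene counts quoted in the abstract: 13,524 of 22,453 genes.
print(round(100 * fraction_alternatively_spliced(13524, 22453), 1))  # 60.2

# Hypothetical transcript IDs and TPM values (not real AtRTD2 data).
tpm = {"G1.1": 30.0, "G1.2": 10.0, "G2.1": 5.0}
gene_of = {"G1.1": "G1", "G1.2": "G1", "G2.1": "G2"}
print(splicing_ratios(tpm, gene_of))  # {'G1.1': 0.75, 'G1.2': 0.25, 'G2.1': 1.0}
```

Expressing each isoform as a proportion of its gene makes the values comparable with ratios measured by RT-PCR, which is the comparison the abstract describes.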

Conclusions: We have generated a new transcriptome resource for RNA-seq analyses in Arabidopsis (AtRTD2), designed to address quantification of different isoforms and alternative splicing in gene expression studies. Experimental validation of alternative splicing changes identified inaccuracies in transcript quantification due to UTR length variation. To solve this problem, we also release a modified reference transcriptome, AtRTD2-QUASI, for quantification of transcript isoforms, which shows high correlation with experimental data.
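As a rough illustration of what "correlation with experimental data" can mean in practice, the sketch below computes a Pearson correlation coefficient between paired splicing ratios. The abstract does not state which correlation measure the authors used, so Pearson is an assumption here, and the paired values are invented for the example rather than drawn from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired splicing ratios for the same isoforms (illustrative only).
rnaseq_ratios = [0.10, 0.35, 0.60, 0.85]  # e.g. derived from RNA-seq quantification
rtpcr_ratios = [0.12, 0.30, 0.65, 0.80]   # e.g. measured by high resolution RT-PCR

print(f"Pearson r = {pearson(rnaseq_ratios, rtpcr_ratios):.3f}")
```

A rank-based measure such as Spearman correlation would be a reasonable alternative when the relationship is monotonic but not linear.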

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted May 06, 2016.
Subject Area

  • Genomics