RT Journal Article
SR Electronic
T1 TranscriptomeReconstructoR: data-driven annotation of complex transcriptomes
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.12.10.418897
DO 10.1101/2020.12.10.418897
A1 Maxim Ivanov
A1 Albin Sandelin
A1 Sebastian Marquardt
YR 2020
UL http://biorxiv.org/content/early/2020/12/11/2020.12.10.418897.abstract
AB Background: The quality of gene annotation determines how the results of transcriptomic studies are interpreted. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package that implements a pipeline for automated transcriptome annotation. It integrates features from independent and complementary datasets: i) full-length RNA-seq to detect splicing patterns and ii) high-throughput 5’ and 3’ tag sequencing data to define gene borders accurately. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. As a proof of principle, we reconstructed de novo the transcriptional landscape of wild-type Arabidopsis thaliana seedlings. A comparison with existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes.
We combine the choice of library preparation methods and sequencing platforms with a dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline requires prior knowledge of the reference genomic DNA sequence only, not of the transcriptome. The package integrates seamlessly with Bioconductor packages for downstream analysis. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: RNA-seq, RNA sequencing; ONT, Oxford Nanopore; PacBio, Pacific Biosciences; NET-seq, Native Elongation Transcript sequencing; GRO-seq, Global Run-On sequencing; A. thaliana, Arabidopsis thaliana; cDNA, complementary DNA; TSS, transcription start site; PAS, polyadenylation site; lncRNA, long non-coding RNA; CAGE-seq, Cap Analysis of Gene Expression sequencing; PAT-seq, Poly(A) tag sequencing; CRAN, Comprehensive R Archive Network; BAM, Binary Alignment Map; BED, Browser Extensible Data; Iso-seq, isoform sequencing; HC, high confidence; MC, medium confidence; LC, low confidence; RT, read-through; plaNET-seq, plant Native Elongation Transcript sequencing; M, million; bp, base pair; ncRNA, non-coding RNA; TSS-seq, Transcription Start Site sequencing; 3’ DRS-seq, 3’ Direct RNA sequencing; TU, transcription unit; chrRNA-seq, chromatin-associated RNA sequencing; TIF-seq, Transcript Isoform sequencing; EST, expressed sequence tag.