TY  - JOUR
T1  - High-throughput annotation of full-length long noncoding RNAs with Capture Long-Read Sequencing
JF  - bioRxiv
DO  - 10.1101/105064
SP  - 105064
AU  - Julien Lagarde
AU  - Barbara Uszczynska-Ratajczak
AU  - Silvia Carbonell
AU  - Sílvia Pérez-Lluch
AU  - Amaya Abad
AU  - Carrie Davis
AU  - Thomas R. Gingeras
AU  - Adam Frankish
AU  - Jennifer Harrow
AU  - Roderic Guigo
AU  - Rory Johnson
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/10/09/105064.abstract
N2  - Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, resulting in novel transcript models for 3574 and 561 gene loci, respectively. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales. Abbreviations: bp, base pair; FL, full length; nt, nucleotide; ROI, read of insert (i.e. PacBio read); SJ, splice junction; SMRT, single-molecule real-time; TM, transcript model.
ER  - 