%0 Journal Article %A Julien Lagarde %A Barbara Uszczynska-Ratajczak %A Silvia Carbonell %A Carrie Davis %A Thomas R. Gingeras %A Adam Frankish %A Jennifer Harrow %A Roderic Guigo %A Rory Johnson %T High-throughput annotation of full-length long noncoding RNAs with Capture Long-Read Sequencing (CLS) %D 2017 %R 10.1101/105064 %J bioRxiv %P 105064 %X Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, current reference gene collections remain far from complete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the entire GENCODE intergenic lncRNA population in matched human and mouse tissues. CLS approximately doubles the annotated complexity of targeted loci, in terms of validated splice junctions and transcript models. The full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales. Abbreviations: bp, base pair; FL, full length; nt, nucleotide; ROI, read of insert, i.e. PacBio reads; SJ, splice junction; SMRT, single-molecule real-time; TM, transcript model. %U https://www.biorxiv.org/content/biorxiv/early/2017/02/01/105064.full.pdf