RT Journal Article
SR Electronic
T1 De novo Clustering Nanopore Long Reads of Transcriptomics Data by Gene
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 170035
DO 10.1101/170035
A1 Camille Marchet
A1 Lolita Lecompte
A1 Corinne Da Silva
A1 Corinne Cruaud
A1 Jean Marc Aury
A1 Jacques Nicolas
A1 Pierre Peterlongo
YR 2017
UL http://biorxiv.org/content/early/2017/07/30/170035.abstract
AB This work addresses the problem of assigning a set of long reads from a de novo transcriptomics study to clusters according to the genes they originate from. The different transcripts of a gene yield long reads that share similar sequences, and our work uses this fact to retrieve the correct cluster of reads for each gene from the similarity graph between reads. We propose a method based on the clustering coefficient (CC) and the search for a minimal cut in the graph, using a greedy procedure that favors nodes with a high degree and a high CC. Our approach compares favorably to state-of-the-art methods. We provide results on the mouse brain transcriptome showing that the approach achieves high precision and good recall despite not using any reference genome.