TY  - JOUR
T1  - De novo Clustering Nanopore Long Reads of Transcriptomics Data by Gene
JF  - bioRxiv
DO  - 10.1101/170035
SP  - 170035
AU  - Camille Marchet
AU  - Lolita Lecompte
AU  - Corinne Da Silva
AU  - Corinne Cruaud
AU  - Jean Marc Aury
AU  - Jacques Nicolas
AU  - Pierre Peterlongo
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/07/30/170035.abstract
N2  - This work addresses the problem of assigning a set of long reads issued from a de novo transcriptomics study to clusters by genes they originate from. The different transcripts of a gene give long reads sharing similar sequences and our work makes use of this fact to retrieve the right cluster of reads for each gene from the graph of similarity between reads. We propose a method based on the use of the clustering coefficient (CC) and the search of a minimal cut in the graph with a greedy procedure favoring nodes with a high degree and high CC. Our approach compares favorably to state of the art methods. We provide results on the mouse brain transcriptome which show that the approach achieves a high precision level and a good level of recall despite not using any reference genome.
ER  - 