TY  - JOUR
T1  - Exhaustive capture of biological variation in RNA-seq data through k-mer decomposition
JF  - bioRxiv
DO  - 10.1101/122937
SP  - 122937
AU  - Jérôme Audoux
AU  - Nicolas Philippe
AU  - Rayan Chikhi
AU  - Mikaël Saison
AU  - Marc Gabriel
AU  - Thérèse Commes
AU  - Daniel Gautheret
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/03/31/122937.abstract
N2  - Each individual cell produces its own set of transcripts, which is a combinatorial result of genetic, transcriptomic and post-transcriptomic variations. Due to this combinatorial nature, obtaining the exhaustive set of full-length transcripts for a given species is a never-ending endeavor. Yet each RNA deep sequencing experiment yields a variety of transcripts that depart from reference transcriptomes and should be properly identified. To address this challenge, we introduce a k-mer-based software protocol for capturing local transcriptional variation from a set of standard RNA-seq libraries, independently of a reference genome or transcriptome. Our software, called DE-kupl, analyzes k-mer contents and detects k-mers with differential abundance directly from the sequencing files, prior to assembly or mapping. This makes it possible to retrieve a virtually complete set of the unannotated variation present in an RNA-seq dataset. This variation can subsequently be assigned to lincRNAs, antisense RNAs, splice and polyadenylation variants, retained introns, expressed repeats, chimeric or circular RNA, foreign RNA and SNV-harbouring RNA. We applied DE-kupl to a published differential RNA-seq experiment carried out on a human cell line, and were able to discover highly significant unannotated transcript variations. We propose that DE-kupl could be a valuable tool for extracting in full the untapped transcript information contained in large-scale transcriptome projects.
ER  - 