Abstract
Motivation Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often reconstruct transcript isoforms inadequately. This impedes the study of alternative splicing, particularly for lowly expressed isoforms.
Results We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species and showed that it reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. An appreciable fraction of these isoforms were reconstructed to at least 95% of their length. We suggest that ClusTrast will be useful for studying alternative splicing in the absence of a reference genome.
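The cluster-then-assemble idea described above can be illustrated with a minimal sketch. It is not the ClusTrast implementation. It assumes that contig similarities and read alignments have already been computed by external tools, and it uses single-linkage grouping as a stand-in for whatever clustering ClusTrast actually applies. All function and variable names are hypothetical.

```python
from collections import defaultdict


def cluster_contigs(contigs, similar_pairs):
    """Group contigs so that any two linked by a similar pair share a cluster.

    Assumption: single-linkage grouping via union-find; the abstract does
    not specify the clustering method.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)
    return {c: find(c) for c in contigs}


def reads_per_cluster(alignments, contig_to_cluster):
    """Collect the reads aligned to any contig of each cluster."""
    groups = defaultdict(set)
    for read, contig in alignments:
        groups[contig_to_cluster[contig]].add(read)
    return groups


# Toy input (hypothetical): three guiding contigs, two of them similar,
# and one aligned read per contig.
contigs = ["c1", "c2", "c3"]
similar_pairs = [("c1", "c2")]
alignments = [("r1", "c1"), ("r2", "c2"), ("r3", "c3")]

clusters = cluster_contigs(contigs, similar_pairs)
groups = reads_per_cluster(alignments, clusters)
# Each read set in `groups` would then be passed to an assembler on its own.
```

In this toy input the reads aligned to c1 and c2 end up in one set and the read aligned to c3 in another, so each set could be assembled independently of the other.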
Availability and implementation The code and usage instructions are available at https://github.com/karljohanw/clustrast.
Contact olofem{at}kth.se
Supplementary information Supplementary material is available.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
westrin{at}kth.se, wk{at}warrenwk.com, olofem{at}kth.se