RT Journal Article
SR Electronic
T1 ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.01.02.473666
DO 10.1101/2022.01.02.473666
A1 Karl Johan Westrin
A1 Warren W. Kretzschmar
A1 Olof Emanuelsson
YR 2022
UL http://biorxiv.org/content/early/2022/11/22/2022.01.02.473666.abstract
AB Background: Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms. Results: We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. For recall, ClusTrast ranked highest at the lower end of expression levels (below the 15th percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35–69% for the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58–81%) were reconstructed with contigs that exhibited polymorphism, as measured on a subset of reliably predicted contigs. Conclusion: We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants. Competing Interest Statement: The authors have declared no competing interest.