Abstract
Motivation
Long-read assemblers struggle to distinguish closely related viral or bacterial strains and often collapse similar strains into a single sequence. This limitation hampers metagenome analysis, because different strains may carry crucial functional differences.
Results
We introduce HairSplitter, a new software tool that recovers strains from long reads and a partially or totally collapsed assembly. The method uses a custom variant-calling procedure designed to work with error-prone long reads, and a new read-binning algorithm that recovers a number of strains not known in advance. On noisy long reads, HairSplitter recovers more strains than state-of-the-art tools while also being faster, for both viruses and bacteria.
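The abstract does not describe HairSplitter's algorithms in detail. As a toy illustration only (not HairSplitter's actual method), the sketch below shows the two ideas in their simplest form. First, it keeps only alignment columns where an alternative allele is supported by several reads, so that isolated disagreements are treated as sequencing errors. Second, it greedily bins reads on those columns and opens a new bin whenever no existing bin is close enough, so the number of bins is not fixed beforehand. The pileup representation, the thresholds `min_support` and `max_dist`, and all function names are hypothetical.

```python
from collections import Counter

# Toy pileup (hypothetical data): each string is one read aligned to the same
# six reference columns; "-" marks a column the read does not cover.
pileup = [
    "GAAAAT",
    "GAAATT",
    "GCCCCT",
    "GCCGCT",
    "GA-AAT",
    "G-CCCA",  # the final "A" is an isolated, error-like disagreement
]


def candidate_sites(reads: list[str], min_support: int = 2) -> list[int]:
    """Columns where the second most common allele is seen in >= min_support reads.

    Disagreements supported by fewer reads are treated as sequencing errors.
    """
    sites = []
    for pos, column in enumerate(zip(*reads)):
        counts = Counter(c for c in column if c != "-").most_common()
        if len(counts) >= 2 and counts[1][1] >= min_support:
            sites.append(pos)
    return sites


def project(reads: list[str], sites: list[int]) -> list[str]:
    """Keep only the alleles each read carries at the candidate sites."""
    return ["".join(read[p] for p in sites) for read in reads]


def distance(a: str, b: str) -> float:
    """Fraction of disagreeing alleles over sites covered by both sequences."""
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not shared:
        return 1.0  # no overlap: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def consensus(reads: list[str]) -> str:
    """Majority allele at each site, ignoring uncovered positions."""
    out = []
    for column in zip(*reads):
        alleles = [c for c in column if c != "-"]
        out.append(Counter(alleles).most_common(1)[0][0] if alleles else "-")
    return "".join(out)


def bin_reads(reads: list[str], max_dist: float = 0.25) -> list[list[int]]:
    """Greedily assign each read to the closest bin consensus within max_dist.

    A new bin is opened when no existing bin is close enough, so the number
    of bins (putative strains) is not fixed in advance.
    """
    bins: list[list[int]] = []
    for i, read in enumerate(reads):
        best, best_d = None, max_dist
        for b, members in enumerate(bins):
            d = distance(read, consensus([reads[j] for j in members]))
            if d <= best_d:
                best, best_d = b, d
        if best is None:
            bins.append([i])
        else:
            bins[best].append(i)
    return bins


sites = candidate_sites(pileup)
print(sites)                              # [1, 2, 3, 4]
print(bin_reads(project(pileup, sites)))  # [[0, 1, 4], [2, 3, 5]]
```

This greedy rule depends on read order and on the chosen thresholds. It only conveys the intuition that the number of strains can emerge from the data rather than being set in advance.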
Availability
HairSplitter is freely available on GitHub at github.com/RolandFaure/HairSplitter.
Contact
roland.faure{at}irisa.fr
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This is now the version accepted by PCI Math Comp Biol (Peer Community In Mathematical and Computational Biology). Some badly formatted citations have also been corrected.