Abstract
Recent advances in long-read sequencing technologies have enabled accurate and contiguous de novo assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read length. This limitation reduces the contiguity of diploid human genome assemblies, since the two haplomes share many long identical regions. To generate telomere-to-telomere assemblies of diploid genomes, biologists now construct HiFi-based phased assemblies and use additional experimental technologies to transform these phased assemblies into more contiguous diploid assemblies. Barcoded linked-reads, generated using the inexpensive TELL-Seq technology, provide an attractive way to bridge unresolved repeats in phased assemblies of diploid genomes.
Here, we present SpLitteR, a tool for haplotype phasing and scaffolding in an assembly graph using barcoded linked-reads. We benchmark SpLitteR on assembly graphs produced by various long-read assemblers and show how TELL-Seq reads facilitate phasing and scaffolding in these graphs. This benchmarking demonstrates that SpLitteR improves on state-of-the-art linked-read scaffolders in both accuracy and contiguity metrics.
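The abstract does not describe SpLitteR's algorithm. As a toy illustration of why shared barcodes can bridge a repeat, the sketch below treats a repeat as a vertex in an assembly graph with equally many incoming and outgoing edges. Each edge carries the set of linked-read barcodes observed on it, and the sketch pairs every incoming edge with the outgoing edge whose barcodes overlap it most. The function names, the use of Jaccard similarity, and the `min_margin` threshold are hypothetical choices made for illustration. They are not SpLitteR's method.

```python
from __future__ import annotations

from itertools import permutations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two barcode sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve_repeat(
    incoming: dict[str, set[str]],
    outgoing: dict[str, set[str]],
    min_margin: float = 0.1,  # hypothetical ambiguity threshold
) -> list[tuple[str, str]] | None:
    """Pair each incoming edge with one outgoing edge across a repeat vertex.

    Tries every one-to-one pairing and keeps the pairing with the highest
    total barcode Jaccard similarity. Returns None (the repeat stays
    unresolved) if the best pairing does not beat the runner-up by at
    least min_margin, or if the in- and out-degrees differ.
    """
    ins, outs = sorted(incoming), sorted(outgoing)
    if len(ins) != len(outs):
        return None  # this toy only handles k-in/k-out repeats
    scored = []
    for perm in permutations(outs):
        pairs = list(zip(ins, perm))
        score = sum(jaccard(incoming[i], outgoing[o]) for i, o in pairs)
        scored.append((score, pairs))
    # Sort by score only, so the pair lists themselves are never compared.
    scored.sort(key=lambda t: t[0], reverse=True)
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None
    return scored[0][1]


if __name__ == "__main__":
    incoming = {"A": {"bc1", "bc2", "bc3"}, "B": {"bc7", "bc8"}}
    outgoing = {"C": {"bc2", "bc3", "bc4"}, "D": {"bc8", "bc9"}}
    print(resolve_repeat(incoming, outgoing))  # [('A', 'C'), ('B', 'D')]
```

Enumerating all permutations is exponential in the vertex degree. That is acceptable for the small 2-in/2-out repeats typical of two haplomes, but a real tool would need a more scalable matching and a statistical model of barcode sharing rather than a fixed margin.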
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Additional data were added to the benchmark of the SpLitteR tool. Zhoutao Chen, who provided the data, was added to the list of authors. Supplemental files were updated.