PT - JOURNAL ARTICLE
AU - Anton Bankevich
AU - Andrey Bzikadze
AU - Mikhail Kolmogorov
AU - Pavel A. Pevzner
TI - Assembling Long Accurate Reads Using de Bruijn Graphs
AID - 10.1101/2020.12.10.420448
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.12.10.420448
4099 - http://biorxiv.org/content/early/2020/12/11/2020.12.10.420448.short
4100 - http://biorxiv.org/content/early/2020/12/11/2020.12.10.420448.full
AB - Although the de Bruijn graphs represent the basis of many genome assemblers, it remains unclear how to construct these graphs for large genomes and large k-mer sizes. This algorithmic challenge has become particularly important with the emergence of long and accurate high-fidelity (HiFi) reads that were recently utilized to generate a semi-manual telomere-to-telomere assembly of the human genome using the alternative string graph assembly approach. To enable fully automated high-quality HiFi assemblies of various genomes, we developed an efficient jumboDB algorithm for constructing the de Bruijn graph for large genomes and large k-mer sizes and the LJA genome assembler that error-corrects HiFi reads and uses jumboDB to construct the de Bruijn graph on the error-corrected reads. Since the de Bruijn graph constructed for a fixed k-mer size is typically either too tangled or too fragmented, LJA uses a new concept of a multiplex de Bruijn graph with varying k-mer sizes. We demonstrate that LJA produces contiguous assemblies of complex repetitive regions in genomes including automated assemblies of various highly-repetitive human centromeres. Competing Interest Statement: The authors have declared no competing interest.