RT Journal Article SR Electronic T1 Simplitigs as an efficient and scalable representation of de Bruijn graphs JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.01.12.903443 DO 10.1101/2020.01.12.903443 A1 Břinda, Karel A1 Baym, Michael A1 Kucherov, Gregory YR 2020 UL http://biorxiv.org/content/early/2020/06/21/2020.01.12.903443.abstract AB De Bruijn graphs play an essential role in computational biology. However, despite their widespread use, they lack a universal scalable representation suitable for different types of genomic data sets. Here, we introduce simplitigs as a compact, efficient and scalable representation and present a fast algorithm for their computation. On examples of several model organisms and two bacterial pan-genomes, we show that, compared to the best existing representation, simplitigs provide a substantial improvement in the cumulative sequence length and their number, especially for graphs with many branching nodes. We demonstrate that this improvement is amplified with more data available. Combined with the commonly used Burrows-Wheeler Transform index of genomic sequences, simplitigs substantially reduce both memory and index loading and query times, as illustrated with large-scale examples of GenBank bacterial pan-genomes. Competing Interest Statement: The authors have declared no competing interest.