RT Journal Article
SR Electronic
T1 kmindex and ORA: indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2023.05.31.543043
DO 10.1101/2023.05.31.543043
A1 Lemane, Téo
A1 Lezzoche, Nolan
A1 Lecubin, Julien
A1 Pelletier, Eric
A1 Lescot, Magali
A1 Chikhi, Rayan
A1 Peterlongo, Pierre
YR 2023
UL http://biorxiv.org/content/early/2023/10/31/2023.05.31.543043.abstract
AB Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as one cannot efficiently search them for any sequence(s) of interest. We present kmindex, an innovative approach that can index thousands of highly complex metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. We demonstrate the scalability of kmindex by successfully indexing 1,393 complex marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server “Ocean Read Atlas” (ORA) at https://ocean-read-atlas.mio.osupytheas.fr/, which enables real-time queries on the Tara Oceans dataset. The open-source kmindex software is available at https://github.com/tlemane/kmindex. Competing Interest Statement: The authors have declared no competing interest.