Abstract
We present Raptor, a tool for approximate search of many queries in large collections of nucleotide sequences. Compared with similar tools such as Mantis and COBS, Raptor is 12 to 144 times faster and uses up to 30 times less memory. Raptor combines three components: winnowing minimizers to define a set of representative k-mers, an extension of the Interleaved Bloom Filter (IBF) as the set-membership data structure, and probabilistic thresholding for minimizers. The approach also allows the IBF to be compressed and partitioned, which enables effective use of secondary memory.
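To make the interplay of minimizers and an IBF concrete, the following is a minimal Python sketch, not Raptor's implementation. All names (`kmer_hash`, `minimizers`, `InterleavedBloomFilter`, `search`) are hypothetical. It assumes that the window size `w` counts consecutive k-mers, that BLAKE2b stands in for the hash functions, that one byte stores one bit for readability, and that a fixed count `threshold` replaces the probabilistic thresholding described above.

```python
import hashlib


def kmer_hash(kmer: str, seed: int = 0) -> int:
    """64-bit hash of a k-mer; the seed selects one of several hash functions."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Winnowing: from every window of w consecutive k-mers keep the one
    with the smallest hash (leftmost on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    hashes = [kmer_hash(x) for x in kmers]
    chosen = set()
    # A sequence with fewer than w k-mers is treated as one short window.
    for start in range(max(1, len(kmers) - w + 1)):
        window = range(start, min(start + w, len(kmers)))
        best = min(window, key=lambda i: (hashes[i], i))
        chosen.add(kmers[best])
    return chosen


class InterleavedBloomFilter:
    """One Bloom filter per bin, interleaved so that the bits of all bins
    for a given hash position are stored next to each other."""

    def __init__(self, bins: int, bits_per_bin: int, num_hashes: int = 2):
        self.bins = bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # Assumption: one byte per bit, for clarity rather than compactness.
        self.bits = bytearray(bins * bits_per_bin)

    def _rows(self, kmer: str) -> list[int]:
        return [kmer_hash(kmer, s) % self.bits_per_bin
                for s in range(self.num_hashes)]

    def insert(self, kmer: str, bin_index: int) -> None:
        for row in self._rows(kmer):
            self.bits[row * self.bins + bin_index] = 1

    def query(self, kmer: str) -> list[int]:
        """Return one 0/1 flag per bin: 1 means 'possibly present'."""
        result = [1] * self.bins
        for row in self._rows(kmer):
            block = self.bits[row * self.bins:(row + 1) * self.bins]
            result = [a & b for a, b in zip(result, block)]
        return result


def search(ibf: InterleavedBloomFilter, query: str, k: int, w: int,
           threshold: int) -> list[int]:
    """Bins in which at least `threshold` minimizers of the query are found.
    Assumption: a fixed threshold, not the paper's probabilistic one."""
    counts = [0] * ibf.bins
    for m in minimizers(query, k, w):
        for b, hit in enumerate(ibf.query(m)):
            counts[b] += hit
    return [b for b, c in enumerate(counts) if c >= threshold]
```

Because a Bloom filter never produces false negatives, a query identical to an indexed sequence is always reported for that sequence's bin. Other bins may also be reported as false positives, and how often depends on `bits_per_bin` and `num_hashes`.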
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- BD: Binning Directory
- FP: False Positive
- FPR: False Positive Rate
- IBF: Interleaved Bloom Filter
- MIBF: Minimizer Interleaved Bloom Filter
- PIBF: Partitioned Interleaved Bloom Filter
- SDSL: Succinct Data Structure Library
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.