RT Journal Article
SR Electronic
T1 VariantStore: A Large-Scale Genomic Variant Search Index
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2019.12.24.888297
DO 10.1101/2019.12.24.888297
A1 Prashant Pandey
A1 Yinjie Gao
A1 Carl Kingsford
YR 2020
UL http://biorxiv.org/content/early/2020/05/07/2019.12.24.888297.abstract
AB The ability to efficiently query genomic variants from thousands of samples is critical to achieving the full potential of many medical and scientific applications such as personalized medicine. Performing variant queries based on coordinates in the reference or sample sequences is at the core of these applications. Efficiently supporting variant queries across thousands of samples is computationally challenging. Most solutions only support queries based on the reference coordinates, and the ones that support queries based on coordinates across multiple samples do not scale to data containing more than a few thousand samples. We present VariantStore, a system for efficiently indexing and querying genomic variants and their sequences in either the reference or sample-specific coordinate systems. We show the scalability of VariantStore by indexing genomic variants from the TCGA-BRCA project, containing 8640 samples and 5M variants, in 4 hours and from the 1000 Genomes Project, containing 2500 samples and 924M variants, in 3 hours. Querying for variants in a gene takes between 0.002 and 3 seconds, using memory equal to only 10% of the size of the full representation. Competing Interest Statement: The authors have declared no competing interest.