TY - JOUR T1 - Privacy-Preserving Read Mapping Using Locality Sensitive Hashing and Secure Kmer Voting JF - bioRxiv DO - 10.1101/046920 SP - 046920 AU - Victoria Popic AU - Serafim Batzoglou Y1 - 2016/01/01 UR - http://biorxiv.org/content/early/2016/04/03/046920.abstract N2 - The recent explosion in the amount of available genome sequencing data imposes high computational demands on the tools designed to analyze it. Low-cost cloud computing has the potential to alleviate this burden. However, moving personal genome data analysis to the cloud raises serious privacy concerns. Read alignment is a critical and computationally intensive first step of most genomic data analysis pipelines. While significant effort has been dedicated to optimize the sensitivity and runtime efficiency of this step, few approaches have addressed outsourcing this computation securely to an untrusted party. The few secure solutions that have been proposed either do not scale to whole genome sequencing datasets or are not competitive with the state of the art in read mapping. In this paper, we present BALAUR, a privacy-preserving read mapping algorithm based on locality sensitive hashing and secure kmer voting. BALAUR securely outsources a significant portion of the computation to the public cloud by formulating the alignment task as a voting scheme between encrypted read and reference kmers. Our approach can easily handle typical genome-scale datasets and is highly competitive with non-cryptographic state-of-the-art read aligners in both accuracy and runtime performance on simulated and real read data. Moreover, our approach is significantly faster than state-of-the-art read aligners in long read mapping. ER -