bioRxiv
New Results

Privacy-Preserving Genotype Imputation in a Trusted Execution Environment

Natnatee Dokmai, Can Kockan, Kaiyuan Zhu, XiaoFeng Wang, S. Cenk Sahinalp, Hyunghoon Cho
doi: https://doi.org/10.1101/2021.02.02.429428
Natnatee Dokmai
1 Department of Computer Science, Indiana University, Bloomington, IN, USA
Can Kockan
1 Department of Computer Science, Indiana University, Bloomington, IN, USA
2 Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Kaiyuan Zhu
1 Department of Computer Science, Indiana University, Bloomington, IN, USA
2 Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
XiaoFeng Wang
1 Department of Computer Science, Indiana University, Bloomington, IN, USA
S. Cenk Sahinalp
2 Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
  • For correspondence: cenk.sahinalp@nih.gov hhcho@broadinstitute.org
Hyunghoon Cho
3 Broad Institute of MIT and Harvard, Cambridge, MA, USA

Abstract

Genotype imputation is an essential tool in genetics research, whereby missing genotypes are inferred based on a panel of reference genomes to enhance the power of downstream analyses. Recently, public imputation servers have been developed to allow researchers to leverage increasingly large-scale and diverse genetic data repositories for imputation. However, privacy concerns associated with uploading one’s genetic data to a third-party server greatly limit the utility of these services. In this paper, we introduce a practical, secure hardware-based solution for a privacy-preserving imputation service, which keeps the input genomes private from the service provider by processing the data only within a Trusted Execution Environment (TEE) offered by the Intel SGX technology. Our solution features SMac, an efficient, side-channel-resilient imputation algorithm designed for Intel SGX, which employs the hidden Markov model (HMM)-based imputation strategy also used by the state-of-the-art imputation software Minimac. SMac achieves imputation accuracies virtually identical to those of Minimac and provides protection against known attacks on SGX while maintaining scalability to large datasets. We additionally show the necessity of our strategies for mitigating side-channel risks by identifying vulnerabilities in existing imputation software and controlling their information exposure. Overall, our work provides a guideline for practical and secure implementation of genetic analysis tools in SGX, representing a step toward privacy-preserving analysis services that can facilitate data sharing and accelerate genetics research.
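
To make the HMM-based imputation strategy mentioned above concrete, the following is a minimal sketch of a textbook Li-Stephens-style forward-backward pass for a single haploid target. It is not SMac or Minimac, and it is not hardened against side channels. The function name `impute_haploid`, the constant recombination probability `recomb`, and the constant error rate `err` are assumptions made for illustration only.

```python
def impute_haploid(ref_panel, target, recomb=0.01, err=0.001):
    """Illustrative haploid imputation with a Li-Stephens-style HMM.

    ref_panel: list of K reference haplotypes, each a list of M alleles (0/1).
    target:    list of M entries; 0/1 at typed sites, None where missing.
    recomb, err: assumed constant switch and mismatch probabilities.
    Returns a list of M posterior probabilities that the allele is 1.
    """
    K, M = len(ref_panel), len(target)

    def emission(k, m):
        # Missing sites carry no information about the hidden state.
        if target[m] is None:
            return 1.0
        return 1.0 - err if ref_panel[k][m] == target[m] else err

    def step(vec):
        # Transition: stay on the same haplotype with probability 1 - recomb,
        # otherwise jump to a uniformly chosen haplotype. The matrix is
        # symmetric, so the same update serves both passes.
        total = sum(vec)
        return [(1.0 - recomb) * v + recomb * total / K for v in vec]

    def normalize(vec):
        s = sum(vec)
        return [v / s for v in vec]

    # Forward pass (rescaled at each site to avoid underflow).
    fwd = [None] * M
    fwd[0] = normalize([emission(k, 0) / K for k in range(K)])
    for m in range(1, M):
        pred = step(fwd[m - 1])
        fwd[m] = normalize([pred[k] * emission(k, m) for k in range(K)])

    # Backward pass.
    bwd = [None] * M
    bwd[M - 1] = [1.0] * K
    for m in range(M - 2, -1, -1):
        weighted = [bwd[m + 1][k] * emission(k, m + 1) for k in range(K)]
        bwd[m] = normalize(step(weighted))

    # Posterior over reference haplotypes, collapsed to an allele probability.
    dosages = []
    for m in range(M):
        post = normalize([fwd[m][k] * bwd[m][k] for k in range(K)])
        dosages.append(sum(post[k] for k in range(K) if ref_panel[k][m] == 1))
    return dosages


# Toy example: the target matches the second haplotype at both typed sites,
# so the missing middle site should be imputed as allele 1 with high probability.
panel = [[0, 0, 0], [1, 1, 1]]
dosages = impute_haploid(panel, [1, None, 1])
```

A side-channel-resilient version would additionally need memory accesses and branches that do not depend on the private target genotypes; the data-dependent branch in `emission` above is exactly the kind of pattern such an implementation would have to avoid.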

Availability: Our software is available at https://github.com/ndokmai/sgx-genotype-imputation.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted February 03, 2021.

Subject Area

  • Bioinformatics