bioRxiv
New Results

AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Nastaran Hajinazar, Mohammed Alser, Can Alkan, Onur Mutlu
doi: https://doi.org/10.1101/2021.02.16.431517
Jeremie S. Kim
1ETH Zurich, Rämistrasse 101, Zürich, Switzerland
Can Firtina
1ETH Zurich, Rämistrasse 101, Zürich, Switzerland
Meryem Banu Cavlak
2Bilkent University, Bilkent, Ankara, Turkey
Damla Senol Cali
3Carnegie Mellon University, 5000 Forbes Avenue, 15213, Pittsburgh, Pennsylvania, USA
Nastaran Hajinazar
1ETH Zurich, Rämistrasse 101, Zürich, Switzerland
4Simon Fraser University, 8888 University Dr, V5A 1S6, Burnaby, BC, Canada
Mohammed Alser
1ETH Zurich, Rämistrasse 101, Zürich, Switzerland
Can Alkan
2Bilkent University, Bilkent, Ankara, Turkey
Onur Mutlu
1ETH Zurich, Rämistrasse 101, Zürich, Switzerland
2Bilkent University, Bilkent, Ankara, Turkey
3Carnegie Mellon University, 5000 Forbes Avenue, 15213, Pittsburgh, Pennsylvania, USA
  • For correspondence: omutlu@gmail.com

Abstract

As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable sensitivity in read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should ideally be mapped to the latest available reference genome that represents the most relevant population. Unfortunately, the increasingly large amount of available genomic data makes it prohibitively expensive to fully re-map each read set to its respective reference genome every time the reference is updated. There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i.e., remapping) by 1) identifying regions that appear similarly between two references and 2) updating the mapping location of reads that map to any of the identified regions in the old reference to the corresponding similar region in the new reference. The main drawback of existing approaches is that if a read maps to a region in the old reference that does not appear with a reasonable degree of similarity in the new reference, the read cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations (i.e., coding regions in a genome) are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for remapping alignments from one genome to another. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces 1) the number of reads (out of the entire read set) that need to be fully mapped to the new reference by up to 99.99% and 2) the overall execution time to remap read sets between two reference genome versions by 6.7×, 6.6×, and 2.8× for large (human), medium (C. elegans), and small (yeast) reference genomes, respectively. We validate our remapping results with GATK and find that AirLift provides similar accuracy in identifying ground truth SNP and INDEL variants as the baseline of fully mapping a read set.
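
To make the two-step remapping idea described above concrete, the following is a minimal Python sketch of the generic approach used by existing remapping tools, not AirLift's actual implementation. It assumes the similar regions have already been identified and are given as gap-free, non-overlapping blocks; the names `Block`, `lift_position`, and `remap_reads` are hypothetical and chosen only for illustration. Reads that do not fall entirely inside one block are returned separately, which corresponds to the reads that existing tools cannot remap and that would otherwise require full mapping.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """A hypothetical gap-free region shared by the old and new reference."""
    old_start: int  # 0-based, inclusive start in the old reference
    old_end: int    # exclusive end in the old reference
    new_start: int  # start of the matching region in the new reference


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Translate one old-reference position; None if no block covers it.

    Assumes `blocks` is sorted by old_start and blocks do not overlap.
    """
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0 or pos >= blocks[i].old_end:
        return None
    return blocks[i].new_start + (pos - blocks[i].old_start)


def remap_reads(
    blocks: List[Block], reads: List[Tuple[str, int, int]]
) -> Tuple[Dict[str, int], List[str]]:
    """Split reads (name, old_pos, length) into remapped and unresolved ones."""
    blocks = sorted(blocks, key=lambda b: b.old_start)
    remapped: Dict[str, int] = {}
    unresolved: List[str] = []
    for name, pos, length in reads:
        start = lift_position(blocks, pos)
        end = lift_position(blocks, pos + length - 1)
        # Accept only reads whose first and last base land contiguously,
        # i.e. the whole read lies inside a single shared block.
        if start is not None and end is not None and end - start == length - 1:
            remapped[name] = start
        else:
            unresolved.append(name)  # would need full mapping to the new reference
    return remapped, unresolved
```

In this sketch, a read that spans a block boundary or starts in a region absent from the new reference ends up in the unresolved list, illustrating the limitation of existing approaches that the abstract highlights.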

Code Availability: The AirLift source code and a README describing how to reproduce our results are available at https://github.com/CMU-SAFARI/AirLift.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 17, 2021.

Subject Area

  • Bioinformatics