bioRxiv
New Results

Accelerating long-read analysis on modern CPUs

Saurabh Kalikar, Chirag Jain, Vasimuddin Md, Sanchit Misra
doi: https://doi.org/10.1101/2021.07.21.453294
Saurabh Kalikar
1Intel Labs, 560103 Bangalore, India
Chirag Jain
2Department of Computational and Data Sciences, Indian Institute of Science, 560012 Bangalore, India
Vasimuddin Md
1Intel Labs, 560103 Bangalore, India
Sanchit Misra
1Intel Labs, 560103 Bangalore, India
  • For correspondence: sanchit.misra@intel.com

Abstract

Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here, we present techniques to accelerate minimap2, a widely used mapping tool. We present multiple optimizations that use SIMD parallelization, efficient cache utilization and a learned index data structure to accelerate its three main computational modules: seeding, chaining and pairwise sequence alignment. Together, these optimizations reduce the end-to-end mapping time of minimap2 by up to 3.5× while maintaining identical output.
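
To illustrate the general idea behind a learned index, the sketch below fits a single linear model that predicts where a key sits in a sorted array and then corrects the prediction with a binary search bounded by the model's worst-case error. It is a minimal, hypothetical example and not the data structure used in this work: the function names `build_learned_index` and `lookup`, the single-segment linear model, and the integer keys are all assumptions made for illustration.

```python
import bisect


def build_learned_index(sorted_keys):
    """Fit position ~ slope * key + intercept and record the worst prediction error.

    Assumption: a single linear segment between the smallest and largest key.
    """
    n = len(sorted_keys)
    if n == 0:
        raise ValueError("sorted_keys must not be empty")
    lo, hi = sorted_keys[0], sorted_keys[-1]
    slope = (n - 1) / (hi - lo) if hi > lo else 0.0
    intercept = -slope * lo
    # Largest distance between a true position and its predicted position.
    max_err = max(
        abs(i - round(slope * k + intercept)) for i, k in enumerate(sorted_keys)
    )
    return slope, intercept, max_err


def lookup(sorted_keys, model, key):
    """Return the first index of key in sorted_keys, or -1 if it is absent."""
    slope, intercept, max_err = model
    n = len(sorted_keys)
    pred = round(slope * key + intercept)
    # Search only the window guaranteed to contain the key if it exists.
    left = min(max(0, pred - max_err), n)
    right = max(left, min(n, pred + max_err + 1))
    i = bisect.bisect_left(sorted_keys, key, left, right)
    if i < n and sorted_keys[i] == key:
        return i
    return -1


if __name__ == "__main__":
    keys = [3, 8, 8, 15, 21, 40, 41, 97]  # hypothetical sorted hash values
    model = build_learned_index(keys)
    print(lookup(keys, model, 21), lookup(keys, model, 9))
```

The search window shrinks as the model fits the key distribution more closely, which is the property that makes such an index attractive for lookup-heavy steps like seeding.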

Competing Interest Statement

SK, VM and SM are employees of Intel Corporation.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 23, 2021.

Subject Area

  • Genomics