bioRxiv
New Results

LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2

Sizhen Li, He Zhang, Liang Zhang, Kaibo Liu, Boxiang Liu, David H. Mathews, Liang Huang
doi: https://doi.org/10.1101/2020.11.23.393488
Sizhen Li
(a) School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
He Zhang
(b) Baidu Research, Sunnyvale, CA
(a) School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
Liang Zhang
(b) Baidu Research, Sunnyvale, CA
(a) School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
Kaibo Liu
(b) Baidu Research, Sunnyvale, CA
Boxiang Liu
(b) Baidu Research, Sunnyvale, CA
David H. Mathews
(d) Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
For correspondence: David_Mathews@urmc.rochester.edu, liang.huang.sh@gmail.com
Liang Huang
(a) School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
(b) Baidu Research, Sunnyvale, CA
For correspondence: David_Mathews@urmc.rochester.edu, liang.huang.sh@gmail.com

Abstract

The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold’s purely in silico prediction not only is close to experimentally-guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5’ and 3’ UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics.

Availability and implementation

Our source code is available at https://github.com/LinearFold/LinearTurboFold.

Significance Statement

Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19. However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
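
To give a rough sense of what cubic versus linear scaling means at genome length, the short Python sketch below compares how much a cost proportional to n^3 and a cost proportional to n each grow when the sequence length increases from a reference length to about 30,000 nt. It is purely illustrative arithmetic: the 1,000 nt reference length and the function name are assumptions made for this example, constant factors are ignored, and the code does not model LinearTurboFold or any other folding algorithm.

```python
def growth_factor(n_from: int, n_to: int, exponent: int) -> float:
    """Factor by which a cost proportional to n**exponent grows
    when the sequence length goes from n_from to n_to."""
    return (n_to / n_from) ** exponent


# Hypothetical reference length, chosen only for illustration.
SHORT_NT = 1_000
# Approximate SARS-CoV-2 genome length quoted in the abstract.
GENOME_NT = 30_000

cubic = growth_factor(SHORT_NT, GENOME_NT, 3)   # cost ~ n^3
linear = growth_factor(SHORT_NT, GENOME_NT, 1)  # cost ~ n

print(f"Length grows {GENOME_NT // SHORT_NT}x")
print(f"Cubic-cost growth:  {cubic:,.0f}x")
print(f"Linear-cost growth: {linear:,.0f}x")
```

Under these assumptions, a 30-fold increase in length multiplies a cubic cost by 27,000 but a linear cost by only 30, which is the gap behind the statement that cubic-time methods are infeasible for full coronavirus genomes.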

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/LinearFold/LinearTurboFold

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 29, 2021.

Subject Area

  • Bioinformatics