bioRxiv
Confirmatory Results

RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network

Sari Sabban, Mikhail Markovsky
doi: https://doi.org/10.1101/671552
Sari Sabban
¹ Department of Biological Sciences/Faculty of Science, King Abdulaziz University, Jeddah, Makka, Kingdom of Saudi Arabia
For correspondence: sari.sabban@gmail.com
Mikhail Markovsky
² North Caucasian Federal Scientific Center of Horticulture, Viticulture, Wine-making, Krasnodar, Russian Federation
Abstract

The ability to perform de novo protein design will allow researchers to expand the variety of available proteins. By designing synthetic structures computationally, they can utilise more structures than those available in the Protein Data Bank, design structures that are not found in nature, or direct the design of proteins towards a specific desired structure. While some researchers attempt to design proteins from first physical and thermodynamic principles, we tested whether it is possible to perform de novo design of helical protein backbones statistically, using machine learning, by building a model with a long short-term memory (LSTM) architecture. The LSTM model used only the ϕ and ψ angles of each residue from an augmented dataset containing only helical protein structures. Although the backbone structures generated by the network were not perfect, they were idealised and evaluated after generation: non-ideal structures were filtered out and adequate structures were kept. This approach successfully produced logical, rigid, compact, helical protein backbone topologies. This paper is a proof of concept showing that it is possible to generate novel helical backbone topologies with an LSTM neural network using only the ϕ and ψ angles as features. The next step is to sequence-design these backbone topologies to form complete protein structures.
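The abstract describes an LSTM that generates backbones from ϕ and ψ angles alone. The sketch below, written in plain NumPy rather than whatever framework RamaNet actually used, shows one way such a generator can run autoregressively: each residue's angles are encoded as (sin, cos) pairs so the periodic values have no jump at ±180°, passed through a single LSTM cell, and decoded with `arctan2`. The class name `TinyLSTMGenerator`, the hidden size, the helical seed angles (-57°, -47°) and the Gaussian sampling noise are all hypothetical choices, and the weights are random, so the output is not a meaningful backbone until the model is trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def wrap_degrees(a):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0


class TinyLSTMGenerator:
    """Hypothetical single-layer LSTM that emits one (phi, psi) pair per residue.

    Weights are random here; a real generator would be trained on
    phi/psi sequences taken from helical protein structures.
    """

    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        n_in = 4  # sin(phi), cos(phi), sin(psi), cos(psi)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden, n_in + hidden))
        self.b = np.zeros(4 * hidden)
        # Linear read-out from the hidden state back to the 4-d angle encoding.
        self.W_out = rng.normal(0.0, 0.1, (n_in, hidden))
        self.b_out = np.zeros(n_in)

    def step(self, x, h, c):
        """One standard LSTM update of the hidden state h and cell state c."""
        H = self.hidden
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def sample(self, n_residues, start_deg=(-57.0, -47.0), noise_deg=5.0, seed=1):
        """Autoregressively generate an (n_residues, 2) array of (phi, psi) in degrees."""
        rng = np.random.default_rng(seed)
        h, c = np.zeros(self.hidden), np.zeros(self.hidden)
        phi, psi = start_deg
        out = []
        for _ in range(n_residues):
            p, s = np.radians(phi), np.radians(psi)
            x = np.array([np.sin(p), np.cos(p), np.sin(s), np.cos(s)])
            h, c = self.step(x, h, c)
            y = self.W_out @ h + self.b_out
            # Decode each (sin, cos) pair with arctan2, then add sampling noise.
            phi = wrap_degrees(np.degrees(np.arctan2(y[0], y[1])) + rng.normal(0.0, noise_deg))
            psi = wrap_degrees(np.degrees(np.arctan2(y[2], y[3])) + rng.normal(0.0, noise_deg))
            out.append((phi, psi))
        return np.array(out)


angles = TinyLSTMGenerator(hidden=16).sample(10)
print(angles.shape)  # (10, 2)
```

Feeding each sampled pair back in as the next input is what makes the model generative; the resulting table of angles would still need the post-generation idealisation and filtering the abstract mentions before it could be treated as a usable backbone.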

Author summary

This research project stemmed from the desire to expand the pool of protein structures that can be used as scaffolds in computational vaccine development, since the Protein Data Bank did not contain enough structures to provide great diversity and thereby increase the probability of grafting a target motif onto a protein scaffold. A protein's backbone can be defined by the ϕ and ψ angles of each amino acid in the polypeptide, which effectively translates its 3D structure into a table of numbers. Since protein structures are not random, this numerical representation can be used to train a neural network to generalise mathematically what a protein structure is, and therefore to generate new protein backbones. Instead of using all proteins in the Protein Data Bank, a curated dataset was used, encompassing protein structures with specific characteristics that should, theoretically, allow them to be evaluated computationally. This paper details how a trained neural network was able to successfully generate helical protein backbones.
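The claim that a backbone reduces to a table of ϕ and ψ angles can be made concrete by going the other way, from angles back to coordinates. Below is a minimal sketch, assuming the standard NeRF (natural extension reference frame) atom-placement method, approximate ideal bond lengths and angles, and a fixed trans ω of 180°; none of these details come from the paper, and the names `place_atom` and `angles_to_backbone` are hypothetical.

```python
import numpy as np

# Approximate ideal backbone geometry (assumed values, in angstroms and degrees).
BOND_N_CA, BOND_CA_C, BOND_C_N = 1.458, 1.525, 1.329
ANGLE_N_CA_C, ANGLE_CA_C_N, ANGLE_C_N_CA = 111.2, 116.2, 121.7


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: place atom d so that |cd| = bond, angle(b, c, d) = angle
    and dihedral(a, b, c, d) = torsion."""
    angle, torsion = np.radians(angle_deg), np.radians(torsion_deg)
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    # Coordinates of d in the local frame (bc, m, n) centred on atom c.
    d_local = bond * np.array([-np.cos(angle),
                               np.sin(angle) * np.cos(torsion),
                               np.sin(angle) * np.sin(torsion)])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n


def angles_to_backbone(phi_psi, omega=180.0):
    """Convert an (L, 2) array of (phi, psi) in degrees to (L, 3, 3) N/CA/C coordinates.

    The phi of the first residue and the psi of the last residue do not affect the chain.
    """
    phi_psi = np.asarray(phi_psi, dtype=float)
    t = np.radians(ANGLE_N_CA_C)
    n0 = np.zeros(3)
    ca0 = np.array([BOND_N_CA, 0.0, 0.0])
    c0 = ca0 + BOND_CA_C * np.array([-np.cos(t), np.sin(t), 0.0])
    coords = [np.stack([n0, ca0, c0])]
    for i in range(1, len(phi_psi)):
        n_prev, ca_prev, c_prev = coords[-1]
        # psi(i-1) positions the next N, omega the next CA, phi(i) the next C.
        n_i = place_atom(n_prev, ca_prev, c_prev, BOND_C_N, ANGLE_CA_C_N, phi_psi[i - 1, 1])
        ca_i = place_atom(ca_prev, c_prev, n_i, BOND_N_CA, ANGLE_C_N_CA, omega)
        c_i = place_atom(c_prev, n_i, ca_i, BOND_CA_C, ANGLE_N_CA_C, phi_psi[i, 0])
        coords.append(np.stack([n_i, ca_i, c_i]))
    return np.stack(coords)


# An idealised alpha-helix: every residue at roughly (-57, -47) degrees.
helix = angles_to_backbone(np.tile([-57.0, -47.0], (20, 1)))
print(helix.shape)  # (20, 3, 3)
```

Because bond lengths and angles are held at fixed ideal values, the (ϕ, ψ) table is the only per-residue input, which is what allows a network trained on angles alone to describe a full backbone.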

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • After closer inspection of the code, we realised that the SenseGen neural network we adopted does not train the generator and discriminator networks adversarially, so we corrected the manuscript to reflect this. The manuscript now describes an LSTM network. All results are unchanged, since only the network's name changed; its code and architecture did not.

  • https://sarisabban.github.io/RamaNet/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted September 01, 2020.
Subject Area

  • Bioinformatics