bioRxiv
New Results

Predicting gene expression level in E. coli from mRNA sequence information

Linlin Zhao, Nima Abedpour, Christopher Blum, Petra Kolkhof, Mathias Beller, Markus Kollmann, Emidio Capriotti
doi: https://doi.org/10.1101/089102
Linlin Zhao
1 Institute for Mathematical Modeling of Biological Systems, Department of Biology, Heinrich Heine University Düsseldorf, Universitaetsstr. 1, 40225 Düsseldorf, Germany.
Nima Abedpour
2 Department of Translational Genomics, University of Cologne, Weyertal 115b, 50931 Cologne, Germany.
Christopher Blum
1 Institute for Mathematical Modeling of Biological Systems, Department of Biology, Heinrich Heine University Düsseldorf, Universitaetsstr. 1, 40225 Düsseldorf, Germany.
Petra Kolkhof
1 Institute for Mathematical Modeling of Biological Systems, Department of Biology, Heinrich Heine University Düsseldorf, Universitaetsstr. 1, 40225 Düsseldorf, Germany.
Mathias Beller
3 Systems Biology of Lipid Metabolism, Department of Biology, Heinrich Heine University Düsseldorf, Universitaetsstr. 1, 40225 Düsseldorf, Germany.
Markus Kollmann
1 Institute for Mathematical Modeling of Biological Systems, Department of Biology, Heinrich Heine University Düsseldorf, Universitaetsstr. 1, 40225 Düsseldorf, Germany.
Emidio Capriotti
4 BioFolD Unit, Department of Biological, Geological, and Environmental Sciences (BiGeA), University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy.

Abstract

Motivation Accurate characterization of the translational mechanism is crucial for understanding the relationship between genotype and phenotype. In particular, predicting the impact of genetic variants on gene expression will make it possible to optimize specific pathways and functions when engineering new biological systems. In this context, developing accurate methods for predicting translation efficiency from the nucleotide sequence is a key challenge in computational biology.

Methods In this work we present PGExpress, a binary classifier that discriminates between mRNA sequences with low and high translation efficiency in E. coli. The PGExpress algorithm takes as input 12 features corresponding to RNA folding and anti-Shine-Dalgarno hybridization free energies. The method was trained on a set of 1,772 sequence variants (WT-High) of 137 essential E. coli genes. For each gene, we considered 13 sequence variants of the first 33 nucleotides, all encoding the same amino acids, followed by the superfolder GFP. Each gene variant is represented by sequence blocks that include the Ribosome Binding Site (RBS), the first 33 nucleotides of the coding region (C33), the remaining part of the coding region (CC), and their combinations.
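The requirement that all variants of the first 33 nucleotides encode the same amino acids can be illustrated with a minimal sketch. It translates two in-frame nucleotide sequences with the standard genetic code and checks that the resulting peptides match. The function names `translate` and `is_synonymous` and the example sequences are hypothetical and are not taken from PGExpress.

```python
# Minimal sketch (not the authors' code): check that two coding-sequence
# variants are synonymous, i.e. translate to the same amino acids.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame DNA/RNA sequence into a one-letter peptide."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(wild_type, variant):
    """True if both sequences have equal length and encode the same peptide."""
    return len(wild_type) == len(variant) and translate(wild_type) == translate(variant)


# Hypothetical 33-nucleotide example (11 codons); the variant swaps
# synonymous codons only.
wt = "ATGAAAGCTCTGGGTCGTTCTACCGAAGTTTGG"
var = "ATGAAGGCACTCGGACGCTCAACGGAGGTATGG"
print(translate(wt), is_synonymous(wt, var))
```

A check of this kind only confirms that the protein sequence is preserved; the features used by the classifier (folding and hybridization free energies) would be computed separately with an RNA thermodynamics tool.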

Results Our logistic regression-based tool (PGExpress) was trained and tested using a 20-fold gene-based cross-validation procedure on the WT-High dataset. In this test PGExpress achieved an overall accuracy of 74%, a Matthews correlation coefficient of 0.49 and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.81. Tested on 3 sets of sequences with different Ribosome Binding Sites, PGExpress reaches a similar AUC. Finally, we validated our method by performing in-house experiments on five newly generated mRNA sequence variants. The predicted expression levels of the new variants agree with our experimental results in E. coli.
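The evaluation quantities named above can be sketched in a few lines of Python. The sketch assumes binary labels (1 = high, 0 = low translation efficiency) and a round-robin assignment of genes to folds; that assignment scheme and all function names are illustrative assumptions, not the procedure used by the authors. The point of a gene-based split is that all variants of one gene fall in the same fold, so no gene appears in both the training and the test portion.

```python
# Minimal sketch (assumptions labeled above): gene-based fold assignment
# plus accuracy, Matthews correlation coefficient, and ROC AUC.
import math


def gene_based_folds(gene_ids, n_folds=20):
    """Give every sample a fold index so that all variants of a gene share it.

    Genes are assigned round-robin in sorted order (an illustrative choice).
    """
    genes = sorted(set(gene_ids))
    fold_of_gene = {gene: i % n_folds for i, gene in enumerate(genes)}
    return [fold_of_gene[gene] for gene in gene_ids]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not results from the paper.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.6 else 0 for s in scores]
print(accuracy(y_true, y_pred), matthews_corrcoef(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a dataset of a few thousand variants; a rank-based formulation would be preferable for much larger sets.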

Availability http://folding.biofold.org/pgexpress

Contact markus.kollmann{at}hhu.de, emidio.capriotti{at}unibo.it

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted November 22, 2016.
Subject Area

  • Bioinformatics