Abstract
Predicting gene expression levels from any DNA sequence is a major challenge in biology. Using libraries with >25,000 random mutants, we developed a biophysical model that accounts for major features of σ70-binding bacterial promoters to accurately predict constitutive gene expression levels of any sequence. We experimentally and theoretically estimated that 10-20% of random sequences lead to expression and 82% of non-expressing sequences are one point mutation away from a functional promoter. Generating expression from random sequences is pervasive, such that selection acts against σ70-RNA polymerase binding sites even within inter-genic, promoter-containing regions. The pervasiveness of σ70– binding sites, which arises from the structural features of promoters captured by our biophysical model, implies that their emergence is unlikely the limiting step in gene regulatory evolution.
Competing Interest Statement
The authors have declared no competing interest.