Abstract
Functional genetic elements are among the most essential units in synthetic biology. However, neither knowledge-driven nor data-driven methods can efficiently accomplish the complex task of designing genetic elements, owing to the lack of explicit regulatory logic and of sufficient training samples. Here, we propose a knowledge-constrained deep learning model named PccGEO that automatically designs functional genetic elements with a high success rate and high efficiency. PccGEO uses a novel “fill-in-the-flank” strategy with a conditional generative adversarial network structure to optimize the flanking regions of known functional sequences derived from biological prior knowledge; restricting the optimization to these flanks reduces the search space and allows the model to capture implicit patterns efficiently. We applied PccGEO to the design of Escherichia coli promoters and found that implicit patterns in the flanking regions affect promoter properties such as expression level. In in vivo validation, the PccGEO-designed constitutive and inducible promoters achieved a success rate of more than 91.6%. We further applied PccGEO with a limit on the number of nucleotide modifications and, surprisingly, found that the expression level of E. coli sigma 70 promoters could increase up to 159.3-fold with only 10 bp of nucleotide modifications. These results support the importance of implicit patterns in the design of functional genetic elements and validate the strong capacity of our method for designing functional genetic elements efficiently.
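The “fill-in-the-flank” idea and the modification budget described in the abstract can be illustrated with a toy sketch. This is not PccGEO's implementation: the conditional GAN is replaced by a hypothetical `sample_base` callable (here, uniform random sampling via `uniform_base`), the helper names `fill_in_the_flank` and `limit_modifications` are hypothetical, the 30-bp template and its 10-bp "core" are made up, and capping edits by reverting randomly chosen changes is only one simple way to enforce a budget, assumed for illustration. The sketch shows that positions marked as known functional sequence are never altered and that only the flanking positions are searched.

```python
import random

BASES = "ACGT"


def fill_in_the_flank(template, fixed_mask, sample_base, rng):
    """Copy fixed positions from template; regenerate every flanking position.

    sample_base(position, rng) is a hypothetical stand-in for a trained
    conditional generator and must return one of "A", "C", "G", "T".
    """
    if len(template) != len(fixed_mask):
        raise ValueError("template and fixed_mask must have the same length")
    return "".join(
        base if fixed else sample_base(i, rng)
        for i, (base, fixed) in enumerate(zip(template, fixed_mask))
    )


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def limit_modifications(original, designed, max_edits, rng):
    """Revert randomly chosen edits until at most max_edits positions differ."""
    if max_edits < 0:
        raise ValueError("max_edits must be non-negative")
    result = list(designed)
    edited = [i for i, (x, y) in enumerate(zip(original, designed)) if x != y]
    rng.shuffle(edited)
    for i in edited[max_edits:]:  # keep only the first max_edits changes
        result[i] = original[i]
    return "".join(result)


def uniform_base(position, rng):
    """Placeholder generator: ignores position and samples a base uniformly."""
    return rng.choice(BASES)


rng = random.Random(0)
# Hypothetical 30-bp template; the middle 10 bp play the role of a known core.
template = "ACGTACGTAC" + "TTGACATATA" + "GCATGCATGC"
fixed_mask = [False] * 10 + [True] * 10 + [False] * 10

designed = fill_in_the_flank(template, fixed_mask, uniform_base, rng)
budgeted = limit_modifications(template, designed, max_edits=5, rng=rng)
print(designed, hamming(template, designed))
print(budgeted, hamming(template, budgeted))
```

Because only unmasked positions are sampled, the candidate space for a length-L sequence with k fixed positions shrinks from 4^L to 4^(L−k), which is the intuition behind the reduced search space mentioned in the abstract.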
Availability: https://github.com/WangLabTHU/PccGEO
Competing Interest Statement
The authors have declared no competing interest.