Summary
Transcription factors (TFs) affect the expression of mRNAs. In essence, the TFs form a large computation network that controls many aspects of cellular function. This article introduces a computational method to optimize TF networks. The method extends recent advances in artificial neural network optimization. In a simple example, computational optimization discovers a four-dimensional TF network that maintains a circadian rhythm over many days, successfully buffering strong stochastic perturbations in molecular dynamics and entraining to an external day-night signal that randomly turns on and off at intervals of several days. This work highlights the similar challenges in understanding how computational TF and neural networks gain information and improve performance, and in how large TF networks may acquire a tendency for genetic variation and disease.
Introduction
Transcription factors (TFs) influence mRNA production. Multiple TFs form inputs into a biochemical network that affects mRNA outputs. The TF networks govern the biochemical dynamics that control much of cellular function.
TF networks pose five key challenges. How do natural processes design TF networks? What TF network architectures commonly arise? What consequences follow from the particular architectures? How can human-engineered TF networks achieve particular design goals? How do TF networks compare with other input-output networks, such as artificial neural networks?
This article introduces a computational optimization method to design TF network models. The computation process differs from how natural selection designs actual biochemical TF networks but does provide insight into design by blind search.
On the technical side, the computational approach arises from the great recent advances in automatic differentiation algorithms.2,19 Automatic differentiation provided the key step that transformed modern
AI by allowing realistic optimization of large artificial neural networks.13 In the same way, it is now possible to optimize TF networks that depend on large numbers of parameters.
As often happens with novel computational optimization applications, the techniques from other fields do not work immediately without additional technical modifications and advances. This article introduces several small but essential technical steps. Such steps include expanding the thermodynamically motivated TF input-output function into a computational function for arbitrary network sizes, developing an inverse computational map from realistically motivated parameter bounds to computationally useful parameter ranges for optimization, and finding the various computational hyperparameters and initial conditions that allow successful optimization.
This article also illustrates the method with a simple example and lists promising directions for future work. Given the wide interest in such computational models and the potential for broad future development and insight, this first small step may motivate further progress.
A literature search did not turn up any prior methods for the general optimization of TF networks. Hiscock14 used machine learning algorithms to optimize differential equation models of biochemical dynamics but did not consider mRNAs, gene expression, or TF control networks. Lopatkin & Collins16 reviewed various related modeling approaches for microbial biology and their future potential. Much recent work focuses on the general optimization of differential equation models,22 which motivated my application to TF networks and biochemical dynamics.
Results
Dynamics of TF networks
The derivatives with respect to time for the numbers of mRNA molecules, x, and the TFs produced by those mRNAs, y, are

ẋᵢ = mᵢ fᵢ(y) − δᵢxᵢ
ẏᵢ = sᵢxᵢ − γᵢyᵢ,    (1)

for mᵢ as the maximum mRNA production rate, δᵢ as the mRNA decay rate, sᵢ as the TF production rate per mRNA, and γᵢ as the decay rate of TF i, for i = 1, …, n TFs.18
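These dynamics can be integrated directly. Below is a minimal Python sketch (the article's own implementation uses Julia and SciML); the single self-repressing TF and all parameter values here are hypothetical illustrations, not taken from the optimized system:

```python
def step(x, y, m, delta, s, gamma, f, dt=1.0):
    """One Euler step of eqn 1: dx/dt = m*f(y) - delta*x, dy/dt = s*x - gamma*y."""
    dx = m * f(y) - delta * x
    dy = s * x - gamma * y
    return x + dt * dx, y + dt * dy

# Illustrative one-TF system: the TF represses its own mRNA (Hill coefficient 2).
# All parameter values are hypothetical, chosen only to stay within the
# abundance scales discussed in the Methods.
f = lambda y: 1.0 / (1.0 + (y / 500.0) ** 2)

x, y = 1.0, 1.0
for _ in range(200_000):  # 200,000 seconds of simulated time, dt = 1 s
    x, y = step(x, y, m=1e-2, delta=1e-4, s=1e-1, gamma=1e-3, f=f)

# At equilibrium, x settles near (m/delta)*f and y near (s/gamma)*x.
```

With these rates, the mRNA relaxes on a ~1/δ = 10⁴ s timescale and the protein on a ~1/γ = 10³ s timescale, so the run above is long enough for the negative feedback to settle.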
TF network as input-output function
The function fᵢ transforms the numbers of TFs in the vector y into the production level of each mRNA, varying between 0 for complete repression and 1 for maximum production. The activation function arises from thermodynamic theory,5 leading to the calculation for a single TF as4,18

f = (α₀ + α₁ν) / (1 + ν),

in which ν = (y/k)^h is activation by TF abundance y relative to the TF–promoter binding dissociation constant, k, reshaped by the Hill coefficient, h. The parameters α₀ and α₁ weight the expression levels when the promoter is unbound or bound, respectively, with α varying between 0 and 1. For two TFs,

f = (α₀ + α₁ν₁ + α₂ν₂ + α₃rν₁ν₂) / (1 + ν₁ + ν₂ + rν₁ν₂),

in which each TF has its own intensity parameters, αⱼ and νⱼ, and r quantifies synergy between TFs. My computer code expands f for any number of TFs.
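A Python sketch of this activation function follows; the two-TF form uses the standard thermodynamic occupancy weighting described above, and all parameter values are illustrative:

```python
def nu(y, k, h):
    """Activation nu = (y/k)**h for TF abundance y, dissociation constant k, Hill h."""
    return (y / k) ** h

def f1(y, k, h, a0, a1):
    """Single-TF expression level: f = (a0 + a1*nu) / (1 + nu)."""
    v = nu(y, k, h)
    return (a0 + a1 * v) / (1.0 + v)

def f2(y1, y2, p1, p2, a, r):
    """Two-TF expression level with synergy r; a = (a0, a1, a2, a3) weights the
    four promoter occupancy states (empty, TF1 bound, TF2 bound, both bound)."""
    v1, v2 = nu(y1, *p1), nu(y2, *p2)
    a0, a1, a2, a3 = a
    return (a0 + a1 * v1 + a2 * v2 + a3 * r * v1 * v2) / (1.0 + v1 + v2 + r * v1 * v2)
```

At y = 0 the output reduces to α₀, and at large y it approaches the bound-state weight, matching the limits stated in the text.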
Maintaining circadian rhythm as a design challenge
To illustrate the optimization method, the design goal is for TF 1 to follow a 24 h period. TF abundance above 10³ molecules per cell corresponds to an “on” state for daytime. Below that threshold, the cell is in an “off” nighttime state.
For a system with n = 4 TFs, the differential equations for the system have 164 parameters (see Methods). In Fig. 1a, we seek a parameter combination that minimizes the loss measured as the distance between the target circadian pattern shown in the gold curve and the transformed abundance of TF 1 in the green curve.
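The loss can be illustrated with a simple sketch: compare a squashed version of TF 1 abundance against a 24 h square-wave target. The particular squashing transform and sampling grid below are assumptions for illustration; the article's exact loss is in its source code:

```python
import numpy as np

def target(t_hours):
    """24 h square-wave target: 1 during daytime (first 12 h), 0 at night."""
    return ((t_hours % 24.0) < 12.0).astype(float)

def transform(y1):
    """Squash TF 1 abundance toward 0/1 around the 1e3 molecules-per-cell
    threshold (illustrative choice of sigmoid on a log10 scale)."""
    return 1.0 / (1.0 + np.exp(-(np.log10(y1 + 1.0) - 3.0) * 4.0))

def loss(y1, t_hours):
    """Mean squared distance between transformed abundance and target."""
    return np.mean((transform(y1) - target(t_hours)) ** 2)

t = np.linspace(0.0, 48.0, 481)                 # two days, 0.1 h resolution
perfect = np.where(target(t) > 0.5, 1e4, 1e1)   # abundances matching the target
```

A trajectory tracking the target gives a near-zero loss, while an inverted trajectory gives a loss near one.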
Optimization of the deterministic system in eqn 1 often finds a nearly perfect fit between the system’s temporal trajectory and the target circadian pattern. Repeated computer runs with different random initialization and search components typically converge to different TF networks.
Stochastic molecular dynamics
Different optimized fits for the deterministic system in eqn 1 had widely varying sensitivities to perturbation by stochastic molecular dynamics. The real challenge is to optimize a stochastic system. To each derivative in eqn 1, I added a Gaussian noise process weighted by the square root of the molecular abundance. The updated dynamics fluctuate stochastically.
Random external light signal for entrainment
A stochastic system inevitably diverges from the target circadian trajectory. The system may use an external entrainment signal, such as daylight, to correct deviations. In this example, I added a strong boost to the production rate of TF 2 in proportion to the intensity of external light. Initially, the light signal is absent. The signal switches on and off randomly. In Fig. 1b, the gold curve shows the external light signal, which switches on in the middle of the third day and stays on for the remaining days shown. The blue curve traces the abundance of TF 2. The updated challenge is for TF 1 to track the circadian pattern, with stochastic molecular dynamics and a randomly occurring external entrainment signal.
Dynamics of an optimized system
Figure 1 shows the best-performing system obtained by optimization. In panel (a), the system’s trajectory (green) lags the target pattern (gold) during the first few days because of stochastic perturbations from molecular dynamics. When the external light signal switches on in the middle of day 3, the system quickly entrains to the circadian pattern and remains tightly synchronized for the remaining days shown. Panels (c) and (d) show the other two TFs, and panels (e-h) show the mRNAs for each matching TF.
Each stochastic trajectory of the system differs because of the stochastic molecular dynamics and the random switching of the external light signal. Panel (i) shows 20 system trajectories over 20 days, in which the average waiting time between the switching on and off of the light signal is w = 2 days. In this case, the signal comes on often enough for the system to correct most deviations caused by stochastic dynamics.
In panel (j), the average waiting time for the light switch is w = 1000 days. Because the light starts in the off state, it essentially never comes on. Thus, the trajectories show how well the system can maintain a circadian pattern in response to internal stochastic molecular perturbations, with no external signal to correct deviations. In this case, most trajectories remain remarkably close to their target over a long time period.
Figure 2a shows the deviations of system trajectories from the circadian target for 1000 sample trajectories. The numbers associated with the w labels show the average waiting time between random switches of the external light signal. In each vertical set of a circle and two lines, the circle shows the median deviation in hours between the target and system entry times into the daylight state. The upper line traces the range of the 75th to 95th percentile, and the lower line traces the 5th to 25th percentile. The left, middle, and right sets for each w value show the distribution of deviations at days 10, 20, and 30, respectively. With an entrainment signal of w ≤ 16, the system typically remains close to its target.
When there is effectively no external entrainment, for w = 1000, the system inevitably diverges from its target with an increase in the day of measurement. Nonetheless, the match remains very good given the highly stochastic molecular dynamics.
This system performs better than other optimization runs in my study. When I tried to improve the performance of this system further with additional optimization steps and altered optimization hyperparameters, the performance always decreased. The reason remains an open puzzle. Further study may provide insight into the geometry of the performance surface, analogous to the commonly discussed fitness landscape problems of evolutionary theory.17,24
TF logic of an optimized system
Figure 2b illustrates the TF network input-output function, f. This subsection links the observed TF logic to the design challenge of circadian pattern with a randomly fluctuating daylight entrainment signal.
The plots show the expression level output for the mRNA that makes TF 1. In each plot, the bottom axes show TF protein numbers, labeled as p1 and p2 for TFs 1 and 2, respectively. The scale is log₁₀(1 + y), in which y is the number of TF molecules per cell. Rows show increasing amounts of TF protein 3, labeled p3. Columns show increasing amounts of TF 4, labeled p4.
A high value of p2 associates with a strong external light signal, as in Fig. 1b. The TF network only strongly stimulates p1 production when the external light signal is strong and both p3 and p4 are high. From the plots in Fig. 1b-d, those conditions are only met from a couple of hours past the temporal transition into daytime through midday. Transition into daytime is marked by the vertical dotted lines in panels (a) and (b). Those requirements allow the system to entrain accurately to the external daylight signal when it is present.
In the absence of an external light signal, p2 rises to a lower level at the onset of daylight. Once again, with high levels for both p3 and p4, the rise of p2 increases expression of the mRNA that produces p1. That pattern creates the same temporal entrainment to daylight, but in this case solely by internal signals from the cell’s intrinsic dynamics. However, when there is no external light signal, the system lacks the very strong rise in expression of p1 in midday that seems to be the main entrainment force to an external daylight signal.
In this way, the cell entrains relatively weakly to its stochastic and less reliable internal circadian signals and entrains relatively strongly to an external daylight signal, when that external signal is present. The use of two internal signals, p3 and p4, may help to buffer the effects of stochastic perturbations.
The story outlined here gives a plausible interpretation of the system design. However, when I tried to further optimize this system, I observed significant reductions in performance. That decay raises a puzzle. What causes the sensitivity of the parameter tuning with respect to the dual challenges of buffering stochastic molecular dynamics and entraining to an external light signal when present? In other words, what is the geometry of the optimization surface with respect to the parameters?
Finally, Fig. 2b suggests that the TF logic tends to create steep sigmoid step changes in response to changes in TF input concentrations. That pattern matches a common theoretical assumption that TF logic can be modeled by a piecewise continuous pattern.7 Further optimization studies may show the conditions that favor such piecewise continuous outputs versus other patterns.
Discussion
Optimize a neural network and fit a TF network
A TF network is a function, f, that inputs n TFs and outputs N mRNA expression levels. Perhaps one could take a two-step approach in optimization modeling. First, use an artificial neural network model for the function, f. Optimize the system’s temporal trajectory with respect to a design challenge. Second, fit the TF parameters against the optimized neural network function, f.
This two-step approach allows one to use highly efficient neural network algorithms for the initial optimization. Then, in the fitting of the TF parameters to the optimized neural network function, one can explore the role of various biochemical constraints on the TFs and their effects. One may also gain insight into the geometry of optimization surfaces for TFs versus other types of computational networks.
The current neural network literature is actively exploring how and why various network architectures succeed or fail in gaining information and improving performance.1,8,21,23 Can we bring TF networks and cellular computations for information processing and control within this broader conceptual framework? TF optimization modeling will play an important role in answering this question.
Large networks, flat fitness surfaces, and disease
Greater dimensionality of computational networks and more parameters sometimes lead to better optimization. The reasons are not fully understood.21 It may be that more dimensions and parameters tend to smooth the optimization surface, perhaps also flattening the surface in many directions. With respect to TF networks, larger systems may adapt better to new challenges.6 In more highly parameterized and flatter optimization surfaces, a particular TF variation would have less average effect.
In a flatter optimization surface and fitness landscape, genetic variants and mutational effects may tend to be smaller. Smaller fitness effects associate with more genetic variation. There are some hints that, in theory, flatter landscapes and more genetic variation associate with increased heritability of failure.9,10 Thus, studying TF optimization models may lead to better understanding of fitness landscapes, genetic variation, and disease.
Declaration of interests
The author declares no competing interests.
Methods
I wrote the computer code in the Julia programming language.3 I used the SciML packages to optimize differential equations.22 Efficient optimization depends on automatic differentiation,2,19 which is built into the SciML system. The source code for this article provides the details for all calculations and plotting of results.11 The following sections highlight aspects of the computations.
Number of parameters
For n TFs, there are 4n rate parameters from the differential equations in eqn 1. I included the 2n initial conditions as parameters to be optimized, which improves the search process. In the activation function f, the n TFs potentially bind to n different promoters, adding n² values for each of k and h. There are n·2ⁿ values of α, and n(2ⁿ − (n + 1)) values of r. Simplifying, the total number of parameters is n(5 + n + 2ⁿ⁺¹). For example, n = 4 associates with 164 parameters.
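A quick check of the parameter count can be written directly from the tally above:

```python
def n_params(n):
    """Total parameter count for n TFs: 4n rates + 2n initial conditions
    + 2*n**2 (the k and h values) + n*2**n alphas + n*(2**n - (n+1)) synergies r."""
    return 4 * n + 2 * n + 2 * n**2 + n * 2**n + n * (2**n - (n + 1))

# The tally simplifies to n * (5 + n + 2**(n+1)); both give 164 for n = 4.
assert n_params(4) == 4 * (5 + 4 + 2**5) == 164
```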
As n rises above 5, the number of parameters quickly approaches the approximation n·2ⁿ⁺¹. For n = 10, there are approximately 2 × 10⁴ parameters, and for n = 20, approximately 4 × 10⁷. The large number of parameters favors study of the two-step optimization approach suggested in the Discussion.
Biological bounds on parameters
I set bounds on all parameters to be roughly compatible with realistic value ranges.20 When a parameter concerned the abundance of a molecule, I used number per cell. I list the rates m, δ, s, γ on a per second basis, using the notation aeb = a × 10ᵇ; the exact ranges appear in the source code.11 The rate parameters set bounds on the molecular abundances, with mRNA molecules approximately bounded on (0, 1e2) and TF protein molecules approximately bounded on (0, 1e3).
Optimization of bounded parameters
To keep the parameters bounded when using an intrinsically unbounded optimization procedure, I used an algorithmic transformation. The parameter vector used for automatic differentiation and optimization was unbounded. Before feeding the parameters into the differential equations of eqn 1, I transformed the unbounded values into the bounds of the prior section. For example, to transform an unbounded parameter value, p, into the range (0, 1), I used a sigmoid function with k₁ = d(1 + e^(−10d)) and k₂ = 10(1 − d) + log(d/(1 − d)), for d = 0.01. For the transformed parameter, θ, on the range (0, 1), we obtain a biologically meaningful range, (a, a + b), by a + bθ, which, for a = 0 or b ≫ a, is approximately (a, b).
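The bounded-unbounded round trip can be sketched with a plain logistic; this simplified version omits the article's k₁, k₂ adjustments and is only meant to show the mechanics of the transformation:

```python
import math

def to_bounded(p, a, b):
    """Map an unbounded optimizer value p to a biological range (a, a + b)
    via a plain logistic (a simplification of the article's adjusted sigmoid)."""
    theta = 1.0 / (1.0 + math.exp(-p))      # theta in (0, 1)
    return a + b * theta

def to_unbounded(x, a, b):
    """Inverse map: store a bounded parameter on the optimizer's scale."""
    theta = (x - a) / b
    return math.log(theta / (1.0 - theta))  # logit

# Round trip for, e.g., an mRNA production rate bounded on (0, 1e-2).
x = to_bounded(0.7, a=0.0, b=1e-2)
```

The inverse map is what allows biologically chosen initial parameters to be stored on the unbounded scale, as described in the next section.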
Initial parameters and hyperparameters
I chose parameters within their bounded range and then inverted the above transformation to store those parameters on the unbounded scale for optimization. The success of optimization depended on the initial choice of parameters, including the initial numbers of mRNAs and TFs.
In a typical run, I set the initial parameters and then multiplied each value by 1 + z for the Gaussian random variable z, with mean 0 and standard deviation typically in the range (0.1, 0.5). Common initial parameters are m from a uniform random variate on (1e–4, 1e–2), δ = 1.01e–4, s = 1.01e–1, γ = 1.01e–3, k = 5e2, h = 2, α from a random uniform variate on (0, 1), and r = 1. The computer code shows the exact details of all values and calculations.
Setting the decay rates δ and γ near their minimum seemed particularly important for successful optimization. If these values were initially high, the numbers of mRNAs and TFs often quickly decayed toward zero, providing little opportunity for discovering good parameter values.
Optimization success depends on many additional choices, traditionally called hyperparameters. For example, I used the Adam algorithm for updating parameter values given the gradient of performance with respect to the parameters.15 That learning algorithm has several hyperparameters that determine how new parameters are chosen. I typically used a learning rate of 0.002, and reduced that rate when attempting to refine a potentially good solution. I also began the initial optimization search with only a short period of the temporal target trajectory, and then slowly lengthened the fitting period.12 The computer code shows the full details for these and other choices.11
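The gradually lengthening fitting period can be sketched as a simple schedule; the stage count and durations below are illustrative, not the article's exact settings:

```python
def fit_windows(total_hours, n_stages, start_hours=24.0):
    """Linearly lengthen the fitted portion of the target trajectory,
    from start_hours up to total_hours (illustrative schedule)."""
    step = (total_hours - start_hours) / (n_stages - 1)
    return [start_hours + i * step for i in range(n_stages)]

# Fit one day first, then progressively longer spans up to ten days.
windows = fit_windows(total_hours=240.0, n_stages=5)
```

Starting with a short window lets the optimizer match the first circadian cycle before being penalized for divergence later in the trajectory.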
Stochastic fluctuations vary with abundance
I set stochastic fluctuations for a molecule with abundance z as √z·W, in which W is a standard normal variate with mean 0 and standard deviation 1; thus √z·W has a standard deviation of √z. As z drops, the ratio of the standard deviation relative to the mean increases. To prevent fluctuations from becoming too large relative to the abundance, which can cause negative abundance values in the numerical analysis, for z ≤ 16, I replaced √z with z/4.
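As a sketch, the abundance-dependent noise scale with its small-abundance replacement can be written as:

```python
import math

def noise_sd(z):
    """Standard deviation of the stochastic term for abundance z:
    sqrt(z), replaced by z/4 for z <= 16 so that fluctuations stay
    bounded relative to small abundances."""
    return math.sqrt(z) if z > 16 else z / 4.0
```

The two branches agree at the switch point (√16 = 16/4 = 4), so the noise scale is continuous in z and shrinks to zero as the abundance vanishes.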
Resource availability
All computer code, parameters and output used to generate the figures are available on Zenodo.11
Acknowledgments
The Donald Bren Foundation, National Science Foundation grant DEB-1939423, and DoD grant W911NF2010227 support my research.