Summary
Transcription factors (TFs) affect the expression of mRNAs. In essence, the TFs form a large computation network that controls many aspects of cellular function. This article introduces a computational method to optimize TF networks. The method extends recent advances in artificial neural network optimization. In a simple example, computational optimization discovers a four-dimensional TF network that maintains a circadian rhythm over many days, successfully buffering strong stochastic perturbations in molecular dynamics and entraining to an external day-night signal that randomly turns on and off at intervals of several days. This work highlights the similar challenges in understanding how computational TF and neural networks gain information and improve performance, and in how large TF networks may acquire a tendency for genetic variation and disease.
Introduction
Transcription factors (TFs) influence mRNA production. Multiple TFs form inputs into a biochemical network that affects mRNA outputs. The TF networks govern the biochemical dynamics that control much of cellular function.
TF networks pose five key challenges. How do natural processes design TF networks? What TF network architectures commonly arise? What consequences follow from the particular architectures? How can human-engineered TF networks achieve particular design goals? How do TF networks compare with other input-output networks, such as artificial neural networks?
This article introduces a computational optimization method to design TF network models. The computation process differs from how natural selection designs actual biochemical TF networks but does provide insight into design by blind search.
On the technical side, the computational approach arises from the great recent advances in automatic differentiation algorithms.2,19 Automatic differentiation provided the key step that transformed modern
AI by allowing realistic optimization of large artificial neural networks.13 In the same way, it is now possible to optimize TF networks that depend on large numbers of parameters.
As often happens with novel computational optimization applications, the techniques from other fields do not work immediately without additional technical modifications and advances. This article introduces several small but essential technical steps. Such steps include expanding the thermodynamically motivated TF input-output function into a computational function for arbitrary network sizes, developing an inverse computational map from realistically motivated parameter bounds to computationally useful parameter ranges for optimization, and finding the various computational hyperparameters and initial conditions that allow successful optimization.
This article also illustrates the method with a simple example and lists promising directions for future work. Given the wide interest in such computational models and the potential for broad future development and insight, this first small step may motivate further progress.
A literature search did not turn up any prior methods for the general optimization of TF networks. Hiscock14 used machine learning algorithms to optimize differential equation models of biochemical dynamics but did not consider mRNAs, gene expression, or TF control networks. Lopatkin & Collins16 reviewed various related modeling approaches for microbial biology and their future potential. Much recent work focuses on the general optimization of differential equation models,22 which motivated my application to TF networks and biochemical dynamics.
Results
Dynamics of TF networks
The derivatives with respect to time for the numbers of mRNA molecules, x, and the TFs produced by those mRNAs, y, are

ẋᵢ = mᵢ fᵢ(y) − δᵢxᵢ
ẏᵢ = sᵢxᵢ − γᵢyᵢ,    (1)

for mᵢ as the maximum mRNA production rate, δᵢ as the mRNA decay rate, sᵢ as the TF production rate per mRNA, and γᵢ as the decay rate of TF i, for i = 1, …, n TFs.18
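These dynamics can be integrated directly. Below is a minimal Python sketch (the article's own implementation uses Julia and SciML); the single self-repressing TF and all parameter values here are hypothetical illustrations, not taken from the optimized system:

```python
def step(x, y, m, delta, s, gamma, f, dt=1.0):
    """One Euler step of eqn 1: dx/dt = m*f(y) - delta*x, dy/dt = s*x - gamma*y."""
    dx = m * f(y) - delta * x
    dy = s * x - gamma * y
    return x + dt * dx, y + dt * dy

# Illustrative one-TF system: the TF represses its own mRNA (Hill coefficient 2).
# All parameter values are hypothetical, chosen only to stay within the
# abundance scales discussed in the Methods.
f = lambda y: 1.0 / (1.0 + (y / 500.0) ** 2)

x, y = 1.0, 1.0
for _ in range(200_000):  # 200,000 seconds of simulated time, dt = 1 s
    x, y = step(x, y, m=1e-2, delta=1e-4, s=1e-1, gamma=1e-3, f=f)

# At equilibrium, x settles near (m/delta)*f and y near (s/gamma)*x.
```

With these rates, the mRNA relaxes on a ~1/δ = 10⁴ s timescale and the protein on a ~1/γ = 10³ s timescale, so the run above is long enough for the negative feedback to settle.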
TF network as input-output function
The function fᵢ transforms the numbers of TFs in the vector y into the production level of each mRNA, varying between 0 for complete repression and 1 for maximum production. The activation function arises from thermodynamic theory,5 leading to the calculation for a single TF as4,18

f = (α₀ + α₁ν) / (1 + ν),

in which ν = (y/k)^h is activation by TF abundance y relative to the TF–promoter binding dissociation constant, k, reshaped by the Hill coefficient, h. The parameters α₀ and α₁ weight the expression levels when the promoter is unbound or bound, respectively, with α varying between 0 and 1. For two TFs,

f = (α₀ + α₁ν₁ + α₂ν₂ + α₃rν₁ν₂) / (1 + ν₁ + ν₂ + rν₁ν₂),

in which each TF has its own intensity parameters, αⱼ and νⱼ, and r quantifies synergy between TFs. My computer code expands f for any number of TFs.
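A Python sketch of this activation function follows; the two-TF form uses the standard thermodynamic occupancy weighting described above, and all parameter values are illustrative:

```python
def nu(y, k, h):
    """Activation nu = (y/k)**h for TF abundance y, dissociation constant k, Hill h."""
    return (y / k) ** h

def f1(y, k, h, a0, a1):
    """Single-TF expression level: f = (a0 + a1*nu) / (1 + nu)."""
    v = nu(y, k, h)
    return (a0 + a1 * v) / (1.0 + v)

def f2(y1, y2, p1, p2, a, r):
    """Two-TF expression level with synergy r; a = (a0, a1, a2, a3) weights the
    four promoter occupancy states (empty, TF1 bound, TF2 bound, both bound)."""
    v1, v2 = nu(y1, *p1), nu(y2, *p2)
    a0, a1, a2, a3 = a
    return (a0 + a1 * v1 + a2 * v2 + a3 * r * v1 * v2) / (1.0 + v1 + v2 + r * v1 * v2)
```

At y = 0 the output reduces to α₀, and at large y it approaches the bound-state weight, matching the limits stated in the text.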
Maintaining circadian rhythm as a design challenge
To illustrate the optimization method, the design goal is for TF 1 to follow a 24 h period. TF abundance above 10³ molecules per cell corresponds to an “on” state for daytime. Below that threshold, the cell is in an “off” nighttime state.
For a system with n = 4 TFs, the differential equations for the system have 164 parameters (see Methods). In Fig. 1a, we seek a parameter combination that minimizes the loss measured as the distance between the target circadian pattern shown in the gold curve and the transformed abundance of TF 1 in the green curve.
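The loss can be illustrated with a simple sketch: compare a squashed version of TF 1 abundance against a 24 h square-wave target. The particular squashing transform and sampling grid below are assumptions for illustration; the article's exact loss is in its source code:

```python
import numpy as np

def target(t_hours):
    """24 h square-wave target: 1 during daytime (first 12 h), 0 at night."""
    return ((t_hours % 24.0) < 12.0).astype(float)

def transform(y1):
    """Squash TF 1 abundance toward 0/1 around the 1e3 molecules-per-cell
    threshold (illustrative choice of sigmoid on a log10 scale)."""
    return 1.0 / (1.0 + np.exp(-(np.log10(y1 + 1.0) - 3.0) * 4.0))

def loss(y1, t_hours):
    """Mean squared distance between transformed abundance and target."""
    return np.mean((transform(y1) - target(t_hours)) ** 2)

t = np.linspace(0.0, 48.0, 481)                 # two days, 0.1 h resolution
perfect = np.where(target(t) > 0.5, 1e4, 1e1)   # abundances matching the target
```

A trajectory tracking the target gives a near-zero loss, while an inverted trajectory gives a loss near one.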
Optimization of the deterministic system in eqn 1 often finds a nearly perfect fit between the system’s temporal trajectory and the target circadian pattern. Repeated computer runs with different random initialization and search components typically converge to different TF networks.
Stochastic molecular dynamics
Different optimized fits for the deterministic system in eqn 1 had widely varying sensitivities to perturbation by stochastic molecular dynamics. The real challenge is to optimize a stochastic system. To each derivative in eqn 1, I added a Gaussian noise process weighted by the square root of the molecular abundance. The updated dynamics fluctuate stochastically.
Random external light signal for entrainment
A stochastic system inevitably diverges from the target circadian trajectory. The system may use an external entrainment signal, such as daylight, to correct deviations. In this example, I added a strong boost to the production rate of TF 2 in proportion to the intensity of external light. Initially, the light signal is absent. The signal switches on and off randomly. In Fig. 1b, the gold curve shows the external light signal, which switches on in the middle of the third day and stays on for the remaining days shown. The blue curve traces the abundance of TF 2. The updated challenge is for TF 1 to track the circadian pattern, with stochastic molecular dynamics and a randomly occurring external entrainment signal.
Dynamics of an optimized system
Figure 1 shows the best-performing system obtained by optimization. In panel (a), the system’s trajectory (green) lags the target pattern (gold) during the first few days because of stochastic perturbations from molecular dynamics. When the external light signal switches on in the middle of day 3, the system quickly entrains to the circadian pattern and remains tightly synchronized for the remaining days shown. Panels (c) and (d) show the other two TFs, and panels (e-h) show the mRNAs for each matching TF.
Each stochastic trajectory of the system differs because of the stochastic molecular dynamics and the random switching of the external light signal. Panel (i) shows 20 system trajectories over 20 days, in which the average waiting time between the switching on and off of the light signal is w = 2 days. In this case, the signal comes on often enough for the system to correct most deviations caused by stochastic dynamics.
In panel (j), the average waiting time for the light switch is w = 1000 days. Because the light starts in the off state, it essentially never comes on. Thus, the trajectories show how well the system can maintain a circadian pattern in response to internal stochastic molecular perturbations, with no external signal to correct deviations. In this case, most trajectories remain remarkably close to their target over a long time period.
Figure 2a shows the deviations of system trajectories from the circadian target for 1000 sample trajectories. The numbers associated with the w labels show the average waiting time between random switches of the external light signal. In each vertical set of a circle and two lines, the circle shows the median deviation in hours between the target and system entry times into the daylight state. The upper line traces the range of the 75th to 95th percentile, and the lower line traces the 5th to 25th percentile. The left, middle, and right sets for each w value show the distribution of deviations at days 10, 20, and 30, respectively. With an entrainment signal of w ≤ 16, the system typically remains close to its target.
When there is effectively no external entrainment, for w = 1000, the system inevitably diverges from its target with an increase in the day of measurement. Nonetheless, the match remains very good given the highly stochastic molecular dynamics.
This system performs better than other optimization runs in my study. When I tried to improve the performance of this system further with additional optimization steps and altered optimization hyperparameters, the performance always decreased. The reason remains an open puzzle. Further study may provide insight into the geometry of the performance surface, analogous to the commonly discussed fitness landscape problems of evolutionary theory.17,24
TF logic of an optimized system
Figure 2b illustrates the TF network input-output function, f. This subsection links the observed TF logic to the design challenge of circadian pattern with a randomly fluctuating daylight entrainment signal.
The plots show the expression level output for the mRNA that makes TF 1. In each plot, the bottom axes show TF protein numbers, labeled as p1 and p2 for TFs 1 and 2, respectively. The scale is log₁₀(1 + y), in which y is the number of TF molecules per cell. Rows show increasing amounts of TF protein 3, labeled p3. Columns show increasing amounts of TF 4, labeled p4.
A high value of p2 associates with a strong external light signal, as in Fig. 1b. The TF network only strongly stimulates p1 production when the external light signal is strong and both p3 and p4 are high. From the plots in Fig. 1b-d, those conditions are only met from a couple of hours past the temporal transition into daytime through midday. Transition into daytime is marked by the vertical dotted lines in panels (a) and (b). Those requirements allow the system to entrain accurately to the external daylight signal when it is present.
In the absence of an external light signal, p2 rises to a lower level at the onset of daylight. Once again, with high levels for both p3 and p4, the rise of p2 increases expression of the mRNA that produces p1. That pattern creates the same temporal entrainment to daylight, but in this case solely by internal signals from the cell’s intrinsic dynamics. However, when there is no external light signal, the system lacks the very strong rise in expression of p1 in midday that seems to be the main entrainment force to an external daylight signal.
In this way, the cell entrains relatively weakly to its stochastic and less reliable internal circadian signals and entrains relatively strongly to an external daylight signal, when that external signal is present. The use of two internal signals, p3 and p4, may help to buffer the effects of stochastic perturbations.
The story outlined here gives a plausible interpretation of the system design. However, when I tried to further optimize this system, I observed significant reductions in performance. That decay raises a puzzle. What causes the sensitivity of the parameter tuning with respect to the dual challenges of buffering stochastic molecular dynamics and entraining to an external light signal when present? In other words, what is the geometry of the optimization surface with respect to the parameters?
Finally, Fig. 2b suggests that the TF logic tends to create steep sigmoid step changes in response to changes in TF input concentrations. That pattern matches a common theoretical assumption that TF logic can be modeled by a piecewise continuous pattern.7 Further optimization studies may show the conditions that favor such piecewise continuous outputs versus other patterns.
Discussion
Optimize a neural network and fit a TF network
A TF network is a function, f, that inputs n TFs and outputs N mRNA expression levels. Perhaps one could take a two-step approach in optimization modeling. First, use an artificial neural network model for the function, f. Optimize the system’s temporal trajectory with respect to a design challenge. Second, fit the TF parameters against the optimized neural network function, f.
This two-step approach allows one to use highly efficient neural network algorithms for the initial optimization. Then, in the fitting of the TF parameters to the optimized neural network function, one can explore the role of various biochemical constraints on the TFs and their effects. One may also gain insight into the geometry of optimization surfaces for TFs versus other types of computational networks.
The current neural network literature is actively exploring how and why various network architectures succeed or fail in gaining information and improving performance.1,8,21,23 Can we bring TF networks and cellular computations for information processing and control within this broader conceptual framework? TF optimization modeling will play an important role in answering this question.
Large networks, flat fitness surfaces, and disease
Greater dimensionality of computational networks and more parameters sometimes lead to better optimization. The reasons are not fully understood.21 It may be that more dimensions and parameters tend to smooth the optimization surface, perhaps also flattening the surface in many directions. With respect to TF networks, larger systems may adapt better to new challenges.6 In more highly parameterized and flatter optimization surfaces, a particular TF variation would have less average effect.
In a flatter optimization surface and fitness landscape, genetic variants and mutational effects may tend to be smaller. Smaller fitness effects associate with more genetic variation. There are some hints that, in theory, flatter landscapes and more genetic variation associate with increased heritability of failure.9,10 Thus, studying TF optimization models may lead to better understanding of fitness landscapes, genetic variation, and disease.
Declaration of interests
The author declares no competing interests.
Methods
I wrote the computer code in the Julia programming language.3 I used the SciML packages to optimize differential equations.22 Efficient optimization depends on automatic differentiation,2,19 which is built into the SciML system. The source code for this article provides the details for all calculations and plotting of results.11 The following sections highlight aspects of the computations.
Number of parameters
For n TFs, there are 4n rate parameters from the differential equations in eqn 1. I included the 2n initial conditions as parameters to be optimized, which improves the search process. In the activation function f, the n TFs potentially bind to n different promoters, adding n² values for each of k and h. There are n·2ⁿ values of α, and n(2ⁿ − (n + 1)) values of r. Simplifying, the total number of parameters is n(5 + n + 2ⁿ⁺¹). For example, n = 4 associates with 164 parameters.
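A quick check of the parameter count can be written directly from the tally above:

```python
def n_params(n):
    """Total parameter count for n TFs: 4n rates + 2n initial conditions
    + 2*n**2 (the k and h values) + n*2**n alphas + n*(2**n - (n+1)) synergies r."""
    return 4 * n + 2 * n + 2 * n**2 + n * 2**n + n * (2**n - (n + 1))

# The tally simplifies to n * (5 + n + 2**(n+1)); both give 164 for n = 4.
assert n_params(4) == 4 * (5 + 4 + 2**5) == 164
```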
As n rises above 5, the number of parameters quickly approaches the approximation n·2ⁿ⁺¹. For n = 10, there are approximately 2 × 10⁴ parameters, and for n = 20, approximately 4 × 10⁷. The large number of parameters favors study of the two-step optimization approach suggested in the Discussion.
Biological bounds on parameters
I set bounds on all parameters to be roughly compatible with realistic value ranges.20 When a parameter concerned the abundance of a molecule, I used number per cell. I list the rates m, δ, s, γ on a per second basis, using the notation aeb = a × 10ᵇ; the exact ranges appear in the source code.11 The rate parameters set bounds on the molecular abundances, with mRNA molecules approximately bounded on (0, 1e2) and TF protein molecules approximately bounded on (0, 1e3).
Optimization of bounded parameters
To keep the parameters bounded when using an intrinsically unbounded optimization procedure, I used an algorithmic transformation. The parameter vector used for automatic differentiation and optimization was unbounded. Before feeding the parameters into the differential equations of eqn 1, I transformed the unbounded values into the bounds of the prior section. For example, to transform an unbounded parameter value, p, into the range (0, 1), I used a sigmoid function with k₁ = d(1 + e^(−10d)) and k₂ = 10(1 − d) + log(d/(1 − d)), for d = 0.01. For the transformed parameter, θ, on the range (0, 1), we obtain a biologically meaningful range, (a, a + b), by a + bθ, which, for a = 0 or b ≫ a, is approximately (a, b).
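The bounded-unbounded round trip can be sketched with a plain logistic; this simplified version omits the article's k₁, k₂ adjustments and is only meant to show the mechanics of the transformation:

```python
import math

def to_bounded(p, a, b):
    """Map an unbounded optimizer value p to a biological range (a, a + b)
    via a plain logistic (a simplification of the article's adjusted sigmoid)."""
    theta = 1.0 / (1.0 + math.exp(-p))      # theta in (0, 1)
    return a + b * theta

def to_unbounded(x, a, b):
    """Inverse map: store a bounded parameter on the optimizer's scale."""
    theta = (x - a) / b
    return math.log(theta / (1.0 - theta))  # logit

# Round trip for, e.g., an mRNA production rate bounded on (0, 1e-2).
x = to_bounded(0.7, a=0.0, b=1e-2)
```

The inverse map is what allows biologically chosen initial parameters to be stored on the unbounded scale, as described in the next section.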
Initial parameters and hyperparameters
I chose parameters within their bounded range and then inverted the above transformation to store those parameters on the unbounded scale for optimization. The success of optimization depended on the initial choice of parameters, including the initial numbers of mRNAs and TFs.
In a typical run, I set the initial parameters and then multiplied each value by 1 + z for the Gaussian random variable z, with mean 0 and standard deviation typically in the range (0.1, 0.5). Common initial parameters are m from a uniform random variate on (1e–4, 1e–2), δ = 1.01e–4, s = 1.01e–1, γ = 1.01e–3, k = 5e2, h = 2, α from a random uniform variate on (0, 1), and r = 1. The computer code shows the exact details of all values and calculations.
Setting the decay rates δ and γ near their minimum seemed particularly important for successful optimization. If these values were initially high, the numbers of mRNAs and TFs often quickly decayed toward zero, providing little opportunity for discovering good parameter values.
Optimization success depends on many additional choices, traditionally called hyperparameters. For example, I used the Adam algorithm for updating parameter values given the gradient of performance with respect to the parameters.15 That learning algorithm has several hyperparameters that determine how new parameters are chosen. I typically used a learning rate of 0.002, and reduced that rate when attempting to refine a potentially good solution. I also began the initial optimization search with only a short period of the temporal target trajectory, and then slowly lengthened the fitting period.12 The computer code shows the full details for these and other choices.11
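The gradually lengthening fitting period can be sketched as a simple schedule; the stage count and durations below are illustrative, not the article's exact settings:

```python
def fit_windows(total_hours, n_stages, start_hours=24.0):
    """Linearly lengthen the fitted portion of the target trajectory,
    from start_hours up to total_hours (illustrative schedule)."""
    step = (total_hours - start_hours) / (n_stages - 1)
    return [start_hours + i * step for i in range(n_stages)]

# Fit one day first, then progressively longer spans up to ten days.
windows = fit_windows(total_hours=240.0, n_stages=5)
```

Starting with a short window lets the optimizer match the first circadian cycle before being penalized for divergence later in the trajectory.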
Stochastic fluctuations vary with abundance
I set stochastic fluctuations for a molecule with abundance z as √z·W, in which W is a standard normal variate with mean 0 and standard deviation 1; thus √z·W has a standard deviation of √z. As z drops, the ratio of the standard deviation relative to the mean increases. To prevent fluctuations from becoming too large relative to the abundance, which can cause negative abundance values in the numerical analysis, for z ≤ 16, I replaced √z with z/4.
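As a sketch, the abundance-dependent noise scale with its small-abundance replacement can be written as:

```python
import math

def noise_sd(z):
    """Standard deviation of the stochastic term for abundance z:
    sqrt(z), replaced by z/4 for z <= 16 so that fluctuations stay
    bounded relative to small abundances."""
    return math.sqrt(z) if z > 16 else z / 4.0
```

The two branches agree at the switch point (√16 = 16/4 = 4), so the noise scale is continuous in z and shrinks to zero as the abundance vanishes.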
Resource availability
All computer code, parameters and output used to generate the figures are available on Zenodo.11
Acknowledgments
The Donald Bren Foundation, National Science Foundation grant DEB-1939423, and DoD grant W911NF2010227 support my research.