An enhanced transcription factor repressilator that buffers stochasticity and entrains to an erratic external circadian signal

How do cellular regulatory networks solve the challenges of life? This article presents computer software to study that question, focusing on how transcription factor networks transform internal and external inputs into cellular response outputs. The example challenge concerns maintaining a circadian rhythm of molecular concentrations. The system must buffer intrinsic stochastic fluctuations in molecular concentrations and entrain to an external circadian signal that appears and disappears randomly. The software optimizes a stochastic differential equation of transcription factor protein dynamics and the associated mRNAs that produce those transcription factors. The cellular network takes as inputs the concentrations of the transcription factors and produces as outputs the transcription rates of the mRNAs that make the transcription factors. An artificial neural network encodes the cellular input-output function, allowing efficient search for solutions to the complex stochastic challenge. Several good solutions are discovered, measured by the probability distribution for the tracking deviation between the stochastic cellular circadian trajectory and the deterministic external circadian pattern. The solutions differ significantly from each other, showing that overparameterized cellular networks may solve a given challenge in a variety of ways. The computation method provides a major advance in its ability to find transcription factor network dynamics that can solve environmental challenges. The article concludes by drawing an analogy between overparameterized cellular networks and the dense and deeply connected overparameterized artificial neural networks that have succeeded so well in deep learning. Understanding how overparameterized networks solve challenges may provide insight into the evolutionary design of cellular regulation.

This article focuses on the first problem, the functional solution to the biological challenge. Following a prior article, the challenge concerns maintaining a circadian rhythm of molecular concentrations (Frank, 2022c). The rhythm must buffer stochastic fluctuations in molecular concentrations.

An internal rhythm perturbed by stochasticity cannot retain perfect periodicity without an external signal. In this case, an external circadian signal appears and disappears randomly. That external signal provides an occasional opportunity for entrainment. However, the erratic external signal does not provide sufficient information for the cell to achieve good periodicity by simply mirroring the signal. Instead, the TF network must entrain to the external signal when available and otherwise buffer internal molecular stochasticity to maintain its internal rhythm in the absence of external information.

I use an artificial neural network to search for a TF input-output function that solves this circadian challenge. I embed the TF network in a stochastic differential equation that tracks the concentrations of mRNAs transcribed by gene expression and the associated TF proteins translated from the mRNAs. The TF network takes the TF protein concentrations as inputs and produces as outputs the mRNA transcription rates for the various TF genes.

I studied the same circadian challenge in the prior article (Frank, 2022c). That prior article [...]

Among many computer runs in that prior study, I found only one solution that provided reasonably good circadian tracking and entrainment based on explicit thermodynamic parameters for the TF network. That solution was very difficult to find computationally and was sensitive to changes in the thermodynamic parameters. Additionally, although the thermodynamic model I used is the most widely favored description for the molecular mechanism, it is very unlikely for that model to be an accurate and complete description of the actual molecular mechanism.

This article gains by focusing solely on the functional challenge of mapping inputs to outputs without concern for the mechanism. In particular, I use an artificial neural network to encode the functional mapping instead of the TF thermodynamic model. Once we have an understanding of the kinds of functional relations that solve the challenge, we can then search for candidate molecular mechanisms that could potentially encode those functional relations. [...]

Without these technical advantages, computational optimization of TF networks is difficult and has not previously been studied in a widely applicable way. The computer code provided with this article can easily be adapted to study other biological challenges. Additional studies will eventually give a sense of the kinds of input-output mappings required of TF networks to solve the demands of life.

As future studies accumulate, we may find that the evolutionary success of TF networks and the computational success of neural networks arise from a common foundation. Both networks may induce essentially the same geometric manifold of evolutionary or learning dynamics on which improving performance plays out (Frank, 2017). [...] (Frank, 2022a).

The goal is to optimize a TF regulatory control system in order to track an environmental target. I first describe the TF system and then describe the environmental target.

The deterministic component of the TF system dynamics is given by the temporal derivatives for numbers of mRNA molecules, $x$, and the TFs produced by those mRNAs, $y$, as

$$
\begin{aligned}
\dot{x}_i &= m_i f_i(y) - \delta_i x_i \\
\dot{y}_i &= s_i x_i - \gamma_i y_i
\end{aligned}
\tag{1}
$$

for $m_i$ as the maximum mRNA production rate, $\delta_i$ as the mRNA decay rate, $s_i$ as the TF production rate per mRNA, and $\gamma_i$ as the decay rate of the $i = 1, \dots, n$ TFs (Marbach et al., 2010).
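The deterministic dynamics can be illustrated with a minimal sketch. It is written in Python rather than the Julia used by the article's software, and it assumes the standard production-minus-decay form implied by the parameter definitions: $\dot{x}_i = m_i f_i(y) - \delta_i x_i$ and $\dot{y}_i = s_i x_i - \gamma_i y_i$. The function names `tf_derivatives` and `euler_integrate`, the forward Euler scheme, and the numerical values are illustrative assumptions, not part of the article's code.

```python
from typing import Callable, List, Sequence, Tuple

# A regulator maps the vector of TF numbers y to a level between 0 and 1.
Regulator = Callable[[Sequence[float]], float]


def tf_derivatives(
    x: Sequence[float],
    y: Sequence[float],
    f: Sequence[Regulator],
    m: Sequence[float],
    delta: Sequence[float],
    s: Sequence[float],
    gamma: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (dx/dt, dy/dt) under the ASSUMED production-minus-decay form."""
    n = len(x)
    # mRNA: regulated production minus first-order decay.
    dx = [m[i] * f[i](y) - delta[i] * x[i] for i in range(n)]
    # TF protein: translation from mRNA minus first-order decay.
    dy = [s[i] * x[i] - gamma[i] * y[i] for i in range(n)]
    return dx, dy


def euler_integrate(x, y, f, m, delta, s, gamma, dt: float, steps: int):
    """Forward Euler integration (illustrative only, not the article's solver)."""
    x, y = list(x), list(y)
    for _ in range(steps):
        dx, dy = tf_derivatives(x, y, f, m, delta, s, gamma)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        y = [yi + dt * dyi for yi, dyi in zip(y, dy)]
    return x, y


if __name__ == "__main__":
    # One gene that is always fully expressed: f_1(y) = 1 (hypothetical values).
    always_on = [lambda y: 1.0]
    x_end, y_end = euler_integrate(
        [0.0], [0.0], always_on, [10.0], [1.0], [2.0], [0.5], dt=0.01, steps=5000
    )
    print(x_end, y_end)
```

Under this assumed form, a constant $f_i = 1$ gives the steady state $x_i^* = m_i/\delta_i$ and $y_i^* = s_i m_i/(\delta_i \gamma_i)$, which provides a quick check on any implementation.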

The function $f_i$ transforms the numbers of TFs in the vector $y$ into the production level of the $i$th mRNA, varying between 0 for complete repression and 1 for maximum production. In this article, each $f_i$ is a separate neural network that takes $n$ inputs as $\log(y)$.

The neural network architecture follows a standard general form, with the following details.

The first layer of the network has $5n$ output nodes. Each of those nodes sums an affine transformation of each input, $z$, of the form $\alpha + \beta z$, in which each of those $5n^2$ transformations has its own $\alpha$ and $\beta$ parameters. The value of each output node for this first layer is transformed by the mish activation function (Misra, 2019)

$$
\mathrm{mish}(z) = z \tanh\left[\log\left(1 + e^{z}\right)\right].
$$

The $5n$ outputs from the first layer form the inputs for another neural network layer, which leads to the final single-valued output that controls the mRNA transcription rate for the associated TF [...] (2)

I calculated the trajectories of these stochastic differential equations with the Julia DifferentialE- [...]
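The layer structure described above can be sketched as follows, in Python rather than the article's Julia. The first layer follows the text: $5n$ nodes, each summing a separate affine transformation $\alpha + \beta z$ of every input $z = \log(y)$, followed by mish. The details of the second layer are not available in this excerpt, so the single affine output node and the logistic squashing to the interval between 0 and 1 are assumptions, as are the names `make_params` and `f_i` and the random initial parameter values.

```python
import math
import random
from typing import Dict, List, Sequence


def mish(z: float) -> float:
    """mish(z) = z * tanh(log(1 + e^z)), with a numerically stable softplus."""
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return z * math.tanh(softplus)


def make_params(n: int, rng: random.Random) -> Dict[str, object]:
    """Random parameters for one f_i (hypothetical initialization)."""
    hidden = 5 * n
    return {
        # One (alpha, beta) pair per input per hidden node: 5n * n = 5n^2 pairs.
        "alpha": [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(hidden)],
        "beta": [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(hidden)],
        # ASSUMED second layer: a single affine output node.
        "out_alpha": rng.gauss(0.0, 0.5),
        "out_beta": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
    }


def f_i(y: Sequence[float], params: Dict[str, object]) -> float:
    """Map TF numbers y (all > 0) to a production level between 0 and 1."""
    z = [math.log(v) for v in y]  # the network takes log(y) as inputs
    hidden: List[float] = []
    for a_row, b_row in zip(params["alpha"], params["beta"]):
        total = sum(a + b * zj for a, b, zj in zip(a_row, b_row, z))
        hidden.append(mish(total))
    out = params["out_alpha"] + sum(
        b * h for b, h in zip(params["out_beta"], hidden)
    )
    # ASSUMPTION: logistic squashing, written to avoid overflow for large |out|.
    if out >= 0.0:
        return 1.0 / (1.0 + math.exp(-out))
    e = math.exp(out)
    return e / (1.0 + e)


if __name__ == "__main__":
    rng = random.Random(0)
    params = make_params(3, rng)
    print(f_i([10.0, 1000.0, 1e5], params))
```

Taking logarithms of the inputs keeps the network inputs on a moderate scale even when TF numbers range over several orders of magnitude.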

The goal is for the stochastic TF system to maintain a circadian rhythm given only a sporadically present external circadian signal, as described in Frank (2022c) and summarized here. In particular, the design goal is for TF 1 ($y_1$) to follow a 24h period. TF abundance above $S = 10^3$ molecules per cell corresponds to an "on" state for daytime. Below that threshold, the cell is in an "off" nighttime state.

The optimization procedure seeks a parameter combination that minimizes the loss measured as the distance between the target circadian pattern shown in the gold curve and the transformed abundance of TF 1 in the green curve in Fig. 1a. In particular, the daily target rhythm follows [...]

The leftmost run, sde-4_1_t4, corresponds to the dynamics in Fig. 1 and to the more detailed summary of deviations in Fig. 4b. The deviation details in Fig. 4b show the distributions induced [...]

The y-axis is $\log_{10}(1 + y)$ for number of molecules per cell, $y$. In (a), the optimization procedure attempts to match the number of TF 1 molecules (blue curve) to a circadian rhythm by minimizing a loss value.
To calculate the loss value, begin with the number of TF 1 molecules, $y$, transformed by a Hill function, $\tilde{y} = y^2/(10^6 + y^2)$, to yield the green curve, which traces $1 + 4\tilde{y}$. The gold curve traces the target circadian pattern. The loss value to be minimized is the sum of the squared deviations between the gold and green curves at 50 equally spaced time points per day. The number of TF 2 proteins in (b) is influenced by the internal cellular dynamics and is also increased in response to an external daylight signal that switches on and off randomly (Frank, 2022c). It is initially off. The average waiting time for a random switch in the presence or absence of the signal is $w = 2$, measured in days. In this example, the signal turns on during the night of the third day and stays on for the remaining days shown. Because the switching is random, daylight can be present or absent for several days in a row, or it can switch on and off several times in one day. In panel (a), stochastic molecular perturbations push the cellular rhythm (green curve) behind the actual circadian pattern (gold curve) during the first few days in this particular realization of the stochastic dynamics. When the daylight signal appears in day 3, the system entrains to the external signal, closely matching the target circadian pattern for the remaining days shown. Panels (i,j) are described in the text.
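The loss calculation in the caption can be sketched as below, in Python rather than the article's Julia. The Hill transform, the green curve $1 + 4\tilde{y}$, and the sum of squared deviations at 50 equally spaced points per day follow the text. The formula for the gold target curve is not available in this excerpt, so `example_target` is a hypothetical sinusoid chosen only to span the same range of 1 to 5. All function names are illustrative.

```python
import math
from typing import Callable


def hill_transform(y: float, half_sat: float = 1e3) -> float:
    """y~ = y^2 / (10^6 + y^2); half_sat = 10^3 matches the on/off threshold S."""
    return y * y / (half_sat ** 2 + y * y)


def green_curve(y: float) -> float:
    """Transformed TF 1 abundance, 1 + 4*y~, ranging from 1 to 5."""
    return 1.0 + 4.0 * hill_transform(y)


def example_target(t_days: float) -> float:
    """HYPOTHETICAL stand-in for the gold curve: a 24h sinusoid between 1 and 5."""
    return 3.0 + 2.0 * math.sin(2.0 * math.pi * t_days)


def circadian_loss(
    y_of_t: Callable[[float], float],
    target: Callable[[float], float],
    days: int,
    points_per_day: int = 50,
) -> float:
    """Sum of squared deviations between target and green curve."""
    total = 0.0
    for k in range(days * points_per_day):
        t = k / points_per_day  # time in days at equally spaced points
        total += (target(t) - green_curve(y_of_t(t))) ** 2
    return total


if __name__ == "__main__":
    # A TF 1 trajectory stuck at the threshold S = 10^3 sits at the midpoint 3.
    print(circadian_loss(lambda t: 1e3, example_target, days=4))
```

The Hill transform saturates large abundances, so the loss rewards being clearly above or below the threshold at the right times rather than matching exact molecule counts.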