## Abstract

How changes in the different steps of protein synthesis – transcription, translation and degradation – contribute to differences of protein abundance among genes is not fully understood. There is however accumulating evidence that transcriptional divergence might have a prominent role. Here, we show that yeast paralogous genes are more divergent in transcription than in translation. We explore two causal mechanisms for this predominance of transcriptional divergence: an evolutionary trade-off between the precision and economy of gene expression and a larger mutational target size for transcription. Performing simulations within a minimal model of post-duplication evolution, we find that both mechanisms are consistent with the observed divergence patterns. We also investigate how additional properties of the effects of mutations on gene expression, such as their asymmetry and correlation across levels of regulation, can shape the evolution of duplicates. Our results highlight the importance of fully characterizing the distributions of mutational effects on transcription and translation. They also show how general trade-offs in cellular processes and mutation bias can have far-reaching evolutionary impacts.

## Introduction

Gene expression level – the steady-state cellular abundance of the corresponding protein – is a fundamental property of genes, as shown by extensive reports of fitness-expression dependencies across organisms and biological functions [1–3]. As such, understanding its evolution is of great biological importance. Gene expression is a multi-step process, involving the transcription of mRNAs and their translation into proteins, as well as the active degradation of both types of molecules and their dilution during cell division [4]. Accordingly, variation in protein abundance among genes within or between species can arise from changes at multiple levels. While many studies have described the crucial role of mutations within genes, their regulatory sequences and their regulators in generating expression variation, most reports focused on one of these aspects, for instance on the transcriptional component [5]. How each of these levels of regulation changes during evolution, independently or jointly, thus remains to be fully elucidated.

One intriguing possibility which has emerged following recent investigations is that transcription may evolve at a higher rate than translation, such that variation in the abundance of transcripts would accumulate faster than changes in their translation efficiencies. It has for instance been reported that expression divergence between humans and other primates occurred mostly at the level of mRNA abundance, with little translational contribution [6]. Similarly, virtually none of the expression divergence observed between lines of the bacterium *Escherichia coli* evolved for 50 000 generations involved changes in the translation efficiency of transcripts [7]. A similar observation has been made in yeasts, where more variants affecting the abundance of mRNAs than variants affecting their translation have been identified between *Saccharomyces cerevisiae* strains [8]. Further interspecies comparisons however revealed mostly equal contributions of transcriptional and translational changes [9–11]. Extensive reports that changes in transcription are partially buffered by variations in translation might also support a higher evolutionary rate for transcription. Such observations have been made both within mammals [12,13] and in yeasts [9–11], although the latter are more ambiguous and have been challenged [8,14].

Overall, it appears likely that transcriptional changes play a larger role than translational ones in the evolution of gene expression levels. Potential mechanisms underlying this discrepancy however remain to be elucidated. One powerful context in which such an investigation can be performed is that of gene duplication, an evolutionary process which creates a pair of gene copies – named paralogs – from an ancestral gene. Since the two resulting duplicates are usually identical, their expression levels are likely similar immediately after the duplication event. They would thus gradually diverge from a common starting point over millions of years of evolution by the accumulation of expression changes. As such, the expression differences which can currently be measured between two paralogs in transcription and in translation make it possible to approximate their relative evolution in both dimensions. Moreover, the numerous duplicate pairs present in a given organism constitute as many evolutionary replicates. Because a given variation in protein abundance can be obtained from infinitely many combinations of transcriptional and translational changes, any consistent pattern across paralog pairs may be informative about the underlying mutational process as well as the selective pressures potentially involved. Most importantly, a predominant role of changes at the level of mRNA abundance in the expression divergence of paralogous genes has previously been reported in two model plant species [15].

In addition to providing a model for the study of the evolution of gene expression levels, the divergence of paralogs is in itself of high biological relevance. It is estimated that between 30% and 65% of all genes are part of duplicate families in most eukaryotes [16,17], while new single-gene duplications may be more frequent than single-nucleotide mutations [18,19]. Besides their high frequency, gene duplication events also have far-reaching consequences. This phenomenon is often associated with the divergence of the resulting paralogs into two functionally distinct genes through processes known as neofunctionalization and subfunctionalization, respectively involving the acquisition of new function(s) [20] and the partitioning of ancestral functions [21]. Protein abundance changes frequently accompany and may even shape this divergence. Post-duplication expression reduction has for instance been reported within paralog pairs [22]. Moreover, a compensatory drift of expression levels may also occur, allowing both gene copies to diverge while maintaining a constant cumulative protein abundance [23,24]. As such, elucidating how the transcription and translation of paralogous genes jointly evolve is important to better understand both the general evolution of gene expression levels and the evolutionary impact of gene duplications.

To this end, we leveraged a published set of transcriptional and translational measurements for 4440 genes of the yeast *S. cerevisiae* [25], which is a well-recognized model for the study of gene duplication. These data present a major advantage: they are transcription and translation rates expressed in molecular terms, respectively transcripts synthesized per time unit and proteins produced per transcript per time unit. They thereby allow for the investigation of potential mechanisms at the molecular level. We first validated whether the paralog pairs included in this dataset show a significantly larger divergence in transcription than in translation. Having confirmed that the evolution of yeast duplicated genes is compatible with a higher evolutionary rate for transcription, we next investigated potential underlying mechanisms. We considered two hypotheses: an evolutionary trade-off in the optimization of gene expression levels [25] and the more parsimonious possibility that transcriptional mutations are more frequent and/or have larger effects. Using *in silico* evolution, we show that both explanations are consistent with the observed divergence patterns. We conclude by stressing the need for measuring the mutational parameters of genes in transcription and in translation in order to be able to further support one model or the other, as well as to fully understand the multi-level evolution of gene expression.

## Results

### Yeast duplicates mostly diverged in transcription

We first tested whether transcription changes played a larger role in the evolution of yeast paralogs by comparing the extent of transcriptional and translational divergence within duplicate pairs using transcription rates *β _{m}* and translation rates *β _{p}* – respectively in mRNAs per hour and in proteins per mRNA per hour – reported by Hausser et al. (2019) [25]. Because protein abundance is proportional to the product of these two rates, they contribute equally to overall expression and their relative changes are directly comparable.

Among the 4440 genes for which *β _{m}* and *β _{p}* were estimated, we identified 409 high-confidence paralog pairs, each derived from a whole-genome duplication (WGD; n=245) or a single small-scale duplication event (SSD; n=164). We assessed the contributions of transcriptional and translational changes by computing the magnitude of relative divergence in transcription and translation as follows, where *θ* represents the transcription or translation rates of paralogs 1 and 2 within a pair:

The distribution of these two measures across paralog pairs confirms that the evolution patterns of yeast duplicated genes are consistent with a higher evolutionary rate for the regulation of transcription than for that of translation, as relative divergence in *β _{m}* is significantly higher (Fig 1A). The median magnitude of relative divergence is 1.5 times larger in transcription than in translation, with only slight differences depending on duplication type (1.45× for WGD and 1.48× for SSD). In addition, about 70% (74.3% for WGD and 62.2% for SSD) of paralog pairs are more divergent transcriptionally than translationally.
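The comparison described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the divergence measure is taken to be the absolute log2-fold change between the two paralogs, which matches the "log2-fold changes" and "strictly positive" wording used in this section but is not copied from the original equation. The function names and the toy rates are hypothetical.

```python
import math
from statistics import median

def relative_divergence(theta_1, theta_2):
    # Assumed form of the measure: magnitude of the log2-fold change
    # between the rates of paralogs 1 and 2 (both must be positive).
    return abs(math.log2(theta_1 / theta_2))

def summarize(pairs):
    # pairs: list of ((bm_1, bp_1), (bm_2, bp_2)) tuples, one per paralog pair.
    d_m = [relative_divergence(a[0], b[0]) for a, b in pairs]
    d_p = [relative_divergence(a[1], b[1]) for a, b in pairs]
    # Fraction of pairs that diverged more in transcription than in translation.
    share_transcriptional = sum(m > p for m, p in zip(d_m, d_p)) / len(pairs)
    return median(d_m), median(d_p), share_transcriptional

# Made-up rates for three hypothetical paralog pairs (not data from [25]).
toy_pairs = [
    ((8.0, 100.0), (2.0, 200.0)),
    ((4.0, 50.0), (4.0, 200.0)),
    ((1.0, 300.0), (8.0, 150.0)),
]
median_bm, median_bp, share = summarize(toy_pairs)
```

On real data, the ratio `median_bm / median_bp` would correspond to the fold difference between the median divergences reported above, and `share` to the fraction of pairs that are more divergent transcriptionally.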

A potential caveat is that the per-gene transcription rates used have been obtained under the assumption that decay rates do not vary between transcripts [25]. When experimental measurements of mRNA decay [26] are employed to recalculate *β _{m}*, the observation that relative divergence is larger in transcription than in translation holds, although statistical significance is lost for SSD-derived paralog pairs (Fig 1B). Further analyses using three other sets of direct measurements of mRNA decay rates [27–29] – obtained through distinct experimental approaches – confirm the validity of our initial observation. In all cases, the relative expression divergence within paralog pairs is significantly higher in transcription than in translation (*p* < 0.05; Mann–Whitney–Wilcoxon two-sided test) for WGD- and SSD-derived duplicates, although the magnitude of the difference varies (Fig S1B). To further validate our observation, we also made sure that *β _{p}* log2-fold changes are representative of the true translational variation between paralogs. To this end, we compared the protein abundance log2-fold changes calculated from *β _{m}* and *β _{p}* divergence to experimental measurements of protein abundance, which revealed strong correlations (*r* = 0.63–0.80; Fig S1E).

An additional potential confounder in the calculation of the relative rates of change is experimental variation. If transcriptional measurements were noisier than translational ones, the magnitude of relative divergence in transcription could be artificially inflated. This is however unlikely to be the case, owing to how transcription and translation rates were inferred from mRNAseq and ribosome profiling data [30]. Whereas *β _{m}* was obtained through a normalization of the mRNA abundances measured by sequencing, *β _{p}* for a given gene was defined as the normalized ratio of its number of ribosomal footprints over its mRNA abundance [25]. Hence, translation rates could be inherently more noisy, since they compound any noise in mRNAseq measurements affecting *β _{m}* as well as additional variance introduced by the ribosome profiling experiment. We could thus expect our assessment of expression divergence to be conservative regarding any predominance of transcriptional changes. The analysis of simulated data supports this intuition, as the addition of noise leads to an underestimation of the relative contribution of transcription in most cases (Fig S2). The robustness of our initial observation is further confirmed by additional simulations combining the effect of experimental noise with that of unaccounted-for variations in transcript decay rates. While slightly overestimating the relative contribution of transcription changes appears plausible, in no instance does a predominantly translational divergence falsely appear mostly transcriptional (Fig S3).

To more thoroughly characterize the joint evolution of transcriptional and translational regulation within yeast paralog pairs, we also looked for correlations between the relative magnitudes of divergence in transcription and in translation. This revealed a weak but significant positive association (Fig 1C). We additionally repeated this analysis without first defining the log-fold changes as strictly positive, thus preserving information on the direction of the changes at both levels. In this case, a stronger positive correlation is observed (Fig S1D). Because mRNA abundances are used in the calculation of both *β _{m}* and *β _{p}* [25], spurious correlations between the magnitudes of relative divergence may occur (Fig S4A). Taking into account the very strong association between the mRNA abundance and ribosomal footprints of individual genes within the dataset (*r* = 0.981, *p* < 1 × 10^{−6}; Fig S4B), only very weak to nonexistent correlations are however expected between the magnitudes of relative divergence, both for absolute and signed values (Fig S4C). Accordingly, the relationships observed here among yeast paralogs suggest that duplicate pairs which diverged more transcriptionally also tend to have accumulated more changes at the level of translation – and usually in the same direction. This finding may reflect the action of gene-specific selective constraints on protein abundance or the existence of a correlation between transcriptional and translational mutations (see below).

Interestingly, a recent study examining the divergence of mRNA abundance and translation efficiency (analogous to transcription and translation rates, respectively) within paralog pairs in the model plants *Arabidopsis thaliana* and *Zea mays* reported patterns strikingly similar to our observations (Fig 1), with one notable exception [15]. In this study, a predominance of compensation between changes at the transcriptional and translational levels was observed, disagreeing with the positive correlation that we report between signed divergences in transcription and translation (Fig 1C). This discrepancy is consistent with predicted differences in the efficiency of selection, and may support an involvement of selective constraints. The effective population size of *A. thaliana* is indeed one to two orders of magnitude smaller than that of *Saccharomyces* yeasts [31–33]. These organisms however differ in many other ways, as plants are multicellular – meaning that paralogs could also diverge along other dimensions that are not captured in yeasts – and the regulation of their gene expression may feature additional layers of complexity.

### A minimal model of post-duplication expression evolution

Since yeast paralog pairs are more divergent transcriptionally than translationally, they constitute a suitable model system to investigate how a higher evolutionary rate for the regulation of transcription than for that of translation could potentially emerge. To this end, we considered two non-mutually exclusive hypotheses (Fig 2A).

First, transcriptional mutations might accumulate faster because they have more beneficial – or less deleterious – fitness effects than translational ones. Such a discrepancy could arise through a recently described evolutionary trade-off [25], according to which corresponding changes in transcription and in translation are equivalent in terms of amount of protein produced, but not in cost and precision (Fig 2A, left). Briefly, it stipulates that the precision and economy of gene expression cannot be maximized simultaneously because they inversely depend on the relative contributions of transcription and translation. While a larger transcriptional contribution reduces the magnitude of stochastic fluctuations in protein abundance [34,35] – or expression noise – and thus increases precision, it also incurs additional metabolic costs through the synthesis of more mRNA molecules [25,36,37]. We note that, within this framework, the sole relevant cost of gene expression is this latter transcription-dependent one, as the cost of protein production itself only depends on how many molecules are synthesized, regardless of whether translation is done from many or few mRNAs. Conversely to a higher reliance on transcription, a greater relative translational contribution increases economy (since fewer transcripts are synthesized) but amplifies noise, which thereby decreases precision. By having distinct effects on the economy and precision of gene expression, changes to *β _{m}* and *β _{p}* could therefore be differently favored by natural selection.

Second, the faster accumulation of transcriptional variation could also be explained in a more parsimonious manner – without any direct involvement of selection – if the rate of transcription was more likely to be altered by mutations. Our other hypothesis is thus that transcription has a larger mutational target size than translation (Fig 2A, right), meaning that mutations acting on this trait would be more frequent or have larger effects, or both. Under neutral evolution or even under stabilizing selection to maintain protein abundance, more changes would thereby accumulate at the transcriptional level.

In order to test these two hypotheses (precision-economy trade-off vs differences in mutational target sizes), we defined a minimal model of post-duplication evolution. Within this framework, natural selection solely acts to maintain the cumulative expression of a pair of duplicated genes at a certain level, as the two proteins are functionally equivalent. Such selection on total protein abundance is likely an important feature of the early evolution of most paralog pairs, as suggested by several observations. The retention of duplicates after WGD events is for instance higher for genes whose products are part of protein complexes [38–40], for which the lowered expression resulting from the loss of a gene copy would be more deleterious. In addition, reduction in the cumulative expression of paralog pairs, so as to more closely replicate the expression of their singleton ancestor, appears widespread [22]. More importantly, *quantitative subfunctionalization* – under which selection acts on the cumulative expression of both duplicates, such that the protein abundance of the ancestral gene is effectively subfunctionalized between the two copies – has been shown to likely be a major mechanism for the long-term maintenance of WGD-derived paralogs [23]. Under this model, as well as within our minimal framework, the individual expression levels of two duplicates can vary as long as their sum remains close enough to an optimum, resulting in a compensatory drift of expression levels as mutations accumulate [23,24]. This process of quantitative subfunctionalization can of course be accompanied by functional changes within proteins, either simultaneously [41] or sequentially [42]. To ensure the generality of our minimal model, we nonetheless chose to ignore all forms of divergence other than expression changes. Functional variation is indeed much more difficult to quantify and probably occurs through a greater variety of mechanisms. Within the scope of this work, this simplification appears reasonable, as analyses show that the transcriptional bias of expression divergence is at worst weakly related to various proxies for the divergence of molecular functions among yeast paralog pairs (Fig S5). Accordingly, we assume that the two copies comprising any paralog pair are identical and express the same protein, as could be expected early after a duplication event.

More formally, in accordance with this parsimonious view of post-duplication evolution, we model the evolution of the expression levels of two copies (paralogs) of the same gene across a landscape of diagonal fitness isoclines where the optimum is along a central diagonal of constant (optimal) cumulative expression (Fig 2B I). Such a landscape is obtained from a parabolic function of fitness according to the cumulative protein abundance of both paralogs. For this minimal model to be more directly applicable to the complete yeast genome, a family of functions with varying curvatures (Fig 2B II) – taken from a distribution inferred from [25] – is defined, so that distinct fitness landscapes can be obtained for different gene pairs. The model is additionally extended in two ways to implement the precision-economy trade-off, such that this hypothesis can be tested. First, expression noise (and thus the importance of precision) is explicitly taken into account by considering the mean fitness of a population of cells expressing two paralogs at a mean cumulative protein abundance *P _{tot}* with standard deviation *σ _{tot}*, which itself depends on the relative contribution of transcription to overall expression [25] (Fig 2B III; Methods). Second, economy considerations are implemented by the addition of a cost of transcription, in the form of a penalty *C* to fitness increasing linearly with the total number of transcribed nucleotides [25] (Fig 2B IV; Methods).
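To make the structure of this fitness function concrete, the following Python sketch combines the three ingredients named above: a parabolic fitness function, averaging over expression noise, and a linear transcription cost. It is a minimal illustration, not the implementation described in the Methods. The normalization of the parabola (maximum fitness of 1), the noise model (squared coefficient of variation inversely proportional to total transcription) and all parameter names (`curvature`, `noise_coeff`, `cost_per_nt`, `gene_length_nt`) are assumptions made for the example.

```python
import math

def mean_fitness(p_tot, sigma_tot, p_opt, curvature):
    # For f(P) = 1 - curvature * (P - p_opt)**2 and any distribution of P
    # with mean p_tot and standard deviation sigma_tot, the population mean is
    # E[f(P)] = 1 - curvature * ((p_tot - p_opt)**2 + sigma_tot**2).
    return 1.0 - curvature * ((p_tot - p_opt) ** 2 + sigma_tot ** 2)

def transcription_cost(total_bm, gene_length_nt, cost_per_nt):
    # Penalty growing linearly with the number of transcribed nucleotides.
    return cost_per_nt * total_bm * gene_length_nt

def precision_economy_fitness(bm, bp, p_opt, curvature,
                              noise_coeff, gene_length_nt, cost_per_nt):
    # bm, bp: transcription and translation rates of the two paralogs.
    # Protein abundance of each copy is taken as proportional to bm * bp.
    p_tot = sum(m * p for m, p in zip(bm, bp))
    total_bm = sum(bm)
    # Hypothetical noise model: CV^2 decreases as total transcription increases.
    sigma_tot = p_tot * math.sqrt(noise_coeff / total_bm)
    return (mean_fitness(p_tot, sigma_tot, p_opt, curvature)
            - transcription_cost(total_bm, gene_length_nt, cost_per_nt))

params = dict(p_opt=20.0, curvature=1e-3, noise_coeff=0.5,
              gene_length_nt=1000, cost_per_nt=1e-6)
balanced = precision_economy_fitness((2.0, 2.0), (5.0, 5.0), **params)
skewed = precision_economy_fitness((3.0, 1.0), (5.0, 5.0), **params)
```

In this sketch, `balanced` and `skewed` are equal: with identical translation rates, shifting transcription from one copy to the other changes neither the total protein abundance nor the total transcription, so both the noise term and the cost term are unaffected.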

To formally compare our two hypotheses, we performed *in silico* evolution according to the two models, minimal and precision-economy. Only transcription and translation rates themselves were evolved, while mRNA and protein decay rates were assumed to be constant (Methods). This restriction to only the two traits of interest had the advantage of reducing the parameter space which had to be explored, while being a reasonable simplification. Most of the variation in gene expression levels indeed occurs at these two regulatory levels [25]. In addition, taking into account changes in mRNA decay has little impact on the patterns of transcriptional divergence, as we show (Figs 1A-B and S1). The simulations were carried out following a sequential fixation approach [43], meaning that each successive mutation was instantaneously brought to fixation or rejected. This is of course a simplification of evolutionary processes (see Discussion).

To initialize each simulation run, random singleton genes are first generated by sampling protein abundances and fitness function curvatures from the complete yeast dataset [25] and assigning corresponding realistic combinations of transcription rate *β _{m}* and translation rate *β _{p}* (Methods). Then, each singleton is duplicated into two paralogs *P _{1}* and *P _{2}* which both have the ancestral *β _{m}* and *β _{p}* rates. We assume that a duplication event causes a doubling of total transcriptional output without affecting the translation rate of individual transcripts, which is realistic since the translation rates of most mRNA do not change upon gene copy number increase [44]. Previous descriptions of the quantitative subfunctionalization framework, which we use as a minimal model, have postulated that the initial post-duplication cumulative protein abundance is already optimal [23,24]. In order to be more general, we instead simply assume that the immediate loss of a newly created paralog is deleterious. Due to the high frequency of loss-of-function mutations, any duplicate whose loss was neutral or beneficial would rapidly be lost even if it did reach fixation [45]. Accordingly, only simulated gene pairs for which the instant loss of a paralog would be deleterious are considered, while the post-duplication optimum of cumulative protein abundance is set to slightly less than double the ancestral protein abundance (1.87×) – such that the duplication-induced doubling overshoots the optimum but still results in a positive fitness (Extended Methods).
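The initialization step can be illustrated with a short Python sketch. It assumes that protein abundance is proportional to the product of the two rates and takes the 1.87× optimum from the text; the function names, the dictionary layout and the simple parabolic `fitness` used in the example are hypothetical and stand in for whichever model (minimal or precision-economy) is being simulated.

```python
def duplicate(bm, bp, optimum_factor=1.87):
    # Both paralogs inherit the ancestral rates, which doubles total
    # transcription while leaving per-transcript translation unchanged.
    ancestral_protein = bm * bp  # up to a constant of proportionality
    pair = [{"bm": bm, "bp": bp}, {"bm": bm, "bp": bp}]
    p_opt = optimum_factor * ancestral_protein
    return pair, p_opt

def is_retained(pair, p_opt, fitness):
    # Keep a pair only if the doubled expression still yields a positive
    # fitness and if losing one copy immediately would be deleterious.
    p_pair = sum(c["bm"] * c["bp"] for c in pair)
    p_single = pair[0]["bm"] * pair[0]["bp"]
    return fitness(p_pair, p_opt) > 0.0 and fitness(p_single, p_opt) < fitness(p_pair, p_opt)

# Hypothetical parabolic fitness with an arbitrary curvature.
def fitness(p, p_opt):
    return 1.0 - 1e-3 * (p - p_opt) ** 2

pair, p_opt = duplicate(bm=2.0, bp=5.0)
retained = is_retained(pair, p_opt, fitness)
```

With these toy values the duplicated pair expresses 20 units for an optimum of 18.7, so it overshoots slightly, whereas a single remaining copy (10 units) would fall much further from the optimum.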

Once all paralog pairs have been generated, they are subjected to successive mutation-selection rounds so that their transcription and translation rates can evolve (Fig 2C). These are performed as follows. One relative mutational effect is first sampled for each pair from a normal distribution of mean 0 and standard deviation *σ _{mut}*. This relative expression change is then randomly assigned to transcription or translation according to relative probabilities *P _{βm}* and *P _{βp}*, which represent the relative mutational target sizes of the two traits. The transcriptional or translational mutation is next applied randomly to paralog *P _{1}* or *P _{2}*, increasing or decreasing its expression by a fraction of its initial level. Because each effect is applied to only one gene copy, all the simulated mutations can be considered as *cis*-acting ones occurring in one paralog’s regulatory or coding sequences. Changes in *trans*, which affect both duplicates simultaneously, are therefore ignored. Such changes would indeed not result in any expression divergence between two identical copies of the same gene, as we consider here. Once a mutational effect has been applied, a new cumulative protein abundance can be computed, from which a mutant fitness *F _{j}* is calculated either according to the minimal model or its precision-economy version. From the resulting *F _{j}* and the ancestral fitness *F _{i}*, a fixation probability is computed using a modified Metropolis criterion [46], according to which the attempted mutation instantly reaches fixation or is lost. This modified criterion can be adjusted for different levels of selection efficacy with a parameter *N* analogous to effective population size. This complete mutation-selection process is repeated numerous times within each simulation, until the median relative protein abundance divergence (Eq 1) for the simulated set of duplicate pairs is not significantly different from the empirical value observed for yeast paralogs (Methods).
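One mutation-selection round of this kind can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not taken from the Methods: the acceptance rule is a generic Metropolis-like criterion (always accept non-deleterious changes, otherwise accept with probability exp(−N·ΔF)) rather than the modified criterion of [46]; mutational effects are applied multiplicatively to the current rate; fitness follows the minimal parabolic model; and the two target-size probabilities are assumed to sum to 1, so that only `p_bm` is passed.

```python
import math
import random

def total_protein(pair):
    # Protein abundance of each copy is taken as proportional to bm * bp.
    return sum(copy["bm"] * copy["bp"] for copy in pair)

def minimal_fitness(pair, p_opt, curvature):
    # Parabolic fitness on cumulative protein abundance (minimal model).
    return 1.0 - curvature * (total_protein(pair) - p_opt) ** 2

def fixation_probability(f_i, f_j, n):
    # Illustrative Metropolis-like rule (assumption, not the criterion of [46]).
    if f_j >= f_i:
        return 1.0
    return math.exp(-n * (f_i - f_j))

def mutation_selection_round(pair, p_opt, curvature, sigma_mut, p_bm, n, rng):
    effect = rng.gauss(0.0, sigma_mut)             # relative mutational effect
    trait = "bm" if rng.random() < p_bm else "bp"  # relative target sizes
    index = rng.randrange(2)                       # cis mutation hits one copy
    mutant = [dict(copy) for copy in pair]
    mutant[index][trait] *= 1.0 + effect
    if mutant[index][trait] <= 0.0:
        return pair                                # discard non-physical rates
    f_i = minimal_fitness(pair, p_opt, curvature)
    f_j = minimal_fitness(mutant, p_opt, curvature)
    if rng.random() < fixation_probability(f_i, f_j, n):
        return mutant                              # instant fixation
    return pair                                    # mutation is lost

rng = random.Random(42)
pair = [{"bm": 2.0, "bp": 5.0}, {"bm": 2.0, "bp": 5.0}]
for _ in range(1000):
    pair = mutation_selection_round(pair, p_opt=18.7, curvature=1e-3,
                                    sigma_mut=0.025, p_bm=0.5, n=1e6, rng=rng)
```

Setting `p_bm` above 0.5 in this sketch would correspond to a larger mutational target size for transcription, while replacing `minimal_fitness` with a noise- and cost-aware function would correspond to the precision-economy version of the model.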

### The precision-economy trade-off is sufficient to promote transcriptional divergence

We first used the framework described above to perform a small-scale ‘mock’ simulation of 50 randomly generated duplicate pairs according to the precision-economy implementation of our minimal model of post-duplication evolution. Transcription and translation were assumed to have equal mutational target sizes (*P _{βm}* = *P _{βp}*), the standard deviation of mutational effects was set to an arbitrarily small magnitude (*σ _{mut}* = 0.025) and a scenario of high selection efficacy was considered (*N* = 10^{6}).

This simulation revealed a clear bias towards transcriptional changes, with the relative divergence in transcription accounting for almost all the total variation of protein abundance (Fig 3A). After ~ 6500 mutation-selection rounds, the median relative divergence was nearly twice as large for *β _{m}* as for *β _{p}* across the 50 simulated paralog pairs. The resulting evolutionary trajectories highlight that expression divergence was driven by transcriptional changes, as the most transcribed paralog is almost always associated with the highest protein abundance, while the same is not true for the most translated one (Fig 3C).

Interestingly, these patterns do not arise because the divergence of transcription levels is itself beneficial under the precision-economy trade-off, as illustrated by the fitness plateau observed from round 1000, while transcription rates are still actively diverging (Fig 3B). Indeed, since expression noise scales according to both protein abundance and transcription rate (Methods), variance in the cumulative expression of a duplicated gene depends on the total *β _{m}* and expression of its two copies. Similarly, the cost of transcription only depends on the total number of mRNAs synthesized. As such, neither the precision nor the economy of expression is impacted by the distribution of the relative transcriptional contributions between two paralogs, which thus has no effect on fitness. Rather, it is the ratio of the cumulative translation and transcription rates which dictates fitness under precision-economy constraints, as clearly shown by the early adaptation phase of the simulated evolutionary trajectories (Fig 3B). After an initial reduction of cumulative protein abundance to reach the new post-duplication expression optimum, the transcription of both duplicates is decreased while their translation is increased (Fig 3C), to rebalance the ratio [25] of the paralog pair. Following this first phase of post-duplication adaptation, compensatory drift of cumulative protein abundance takes place [23,24], but changes are almost exclusively biased towards transcriptional divergence. This occurs due to interactions between transcription- and translation-acting mutations introduced by the precision-economy trade-off. Further transcriptional changes can for instance be expected to be favored by selection after the fixation of a mutation altering *β _{m}*, as they have the potential to compensate effects on both precision and economy, while a change of translation can individually only act on precision.

This first simulation thus shows that the trade-off between the precision and economy of gene expression can lead to a mostly transcriptional expression divergence between two paralogs, potentially creating evolutionary patterns as observed in yeast.

### Selecting a biologically plausible parameter space

The precision-economy trade-off is sufficient to favor the transcriptional divergence of duplicated genes, but it might still not be the most likely explanation for the evolutionary patterns observed in yeast paralogs. To rigorously compare our two competing hypotheses, we performed a series of large-scale simulations under biologically realistic assumptions. All the corresponding *in silico* evolution experiments were carried out in three replicates of 2500 randomly generated paralog pairs. Throughout these computational experiments, the WGD-derived duplicates of *S. cerevisiae* were used as a reference set, both for identifying when to stop iterating through mutation-selection rounds and for assessing the results of each simulation. We chose to restrict our analysis to this group of paralogs, because the mechanism of quantitative subfunctionalization, which we use as a minimal evolutionary model, was initially proposed in the context of whole-genome duplication [23] and its applicability to other duplication events is thus less clear. An additional reason to focus on WGD-derived paralogs is that our initial observation is less certain for duplicates which originated from SSD. In one instance, controlling for variations in mRNA decay erased the difference between the magnitudes of transcriptional and translational divergence within this set of gene pairs (Fig 1B). Ensuring the biological plausibility of our simulations also required a careful examination of two important parameters – the efficacy of selection *N* and the standard deviation of mutational effects *σ _{mut}* – for which there are only partial estimations.

Many estimates of the effective population size of *Saccharomyces* yeasts, which is analogous to the *N* parameter of the evolutionary algorithm regarding the efficiency of selection [46], are available [32,33]. The relevance of such historical values to the immediate post-WGD context is however unclear, because the whole-genome doubling could have been accompanied by a strong founder event. Identifying the most realistic *N* is therefore difficult. Accordingly, we instead chose to consider two scenarios, respectively of high (*N* = 10^{6}) and reduced (*N* = 10^{5}) efficacy of selection, and compare our two hypotheses within both contexts.

Similarly, although the distribution of mutational effects on expression level has been studied in budding yeast, only limited information is available on the effect of mutations – and especially *cis*-occurring ones – on expression. Previous studies often focused on only a small subset of genes [47,48], did not differentiate between *cis* and *trans* mutations [48,49], assessed too few mutations [49] or were limited to substitutions occurring within a short segment of the promoter sequence [50]. Thus, instead of arbitrarily choosing a *σ _{mut}*, we performed simulations across a range of standard deviations to identify the most biologically plausible value. As most mutations affect the expression level of selected yeast genes by 20% or less [48], we considered values ranging from 0.01 to 0.35.

Our second hypothesis only stipulates that the regulation of transcription may have a larger mutational target size than that of translation without specifying the magnitude of this difference. Testing it therefore requires using various relative probabilities of transcriptional and translational mutations, *P _{βm}* and *P _{βp}* (Fig 2C). To make it more robust, the identification of the best-fitting *σ _{mut}* was combined with this screening into a grid search, which was performed separately for the two scenarios of selection efficacy. In each instance, the most plausible standard deviation of mutational effects was defined as the one resulting in the best overall fit to the empirical distributions of relative divergence – at the levels of transcription, translation and protein abundance – for any combination of model (minimal or precision-economy) and relative mutational target sizes (Methods). This approach identified best *σ _{mut}* of 0.025 and 0.075, respectively under high (*N* = 10^{6}) and reduced (*N* = 10^{5}) selection efficacies (Fig S6). Interestingly, the value obtained for *N* = 10^{6} is highly consistent with a previous experimental characterization of *cis*-regulatory mutations in the yeast *TDH3* promoter [47]. Whether the latter is representative of the typical *S. cerevisiae* gene is however unclear [48].
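The selection rule used in this grid search can be illustrated with a short Python sketch. It is not the authors' code: the `fit_scores` dictionary, its keys and all numerical values are hypothetical placeholders standing in for the fit scores that full simulations would produce.

```python
from collections import defaultdict

def best_sigma(fit_scores):
    """Pick the sigma_mut with the lowest fit score over all combinations.

    fit_scores maps (sigma_mut, model, target_size_ratio) to a fit score
    where lower is better (for instance a mean KS statistic).
    """
    best_per_sigma = defaultdict(lambda: float("inf"))
    for (sigma, _model, _ratio), score in fit_scores.items():
        # Keep, for each sigma, its best score across models and ratios.
        best_per_sigma[sigma] = min(best_per_sigma[sigma], score)
    return min(best_per_sigma, key=best_per_sigma.get)

# Hypothetical scores, for illustration only (not results from the study).
example_scores = {
    (0.01, "minimal", 1.0): 0.30,
    (0.01, "precision-economy", 1.0): 0.20,
    (0.05, "minimal", 4.0): 0.10,
    (0.05, "precision-economy", 1.0): 0.25,
}
print(best_sigma(example_scores))
```

Taking the minimum over every model and ratio before comparing standard deviations mirrors the "for any combination" wording above, so that no model is favored when the reference *σ _{mut}* is chosen.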

### A difference of mutational target sizes may better explain the observed divergence patterns

To thoroughly evaluate the validity of the two competing hypotheses, we assessed the extent to which our minimal model and its precision-economy version could replicate the main features of the divergence patterns of yeast paralogs (Fig 1): 1) a predominance of transcriptional changes, 2) a weak positive correlation between the magnitudes of relative divergence in transcription and in translation and 3) a high frequency of amplifying changes at the two regulatory levels.

We focused on simulations performed in three replicates of 2500 paralog pairs using the best-fitting *σ _{mut}* values identified previously, for the two selection efficacy regimes (*N* = 10^{6} and *N* = 10^{5}). Following each individual simulation, summary statistics were computed on the complete set of 2500 simulated gene pairs, in order to compare the resulting expression divergence to what we observed in WGD-derived paralogs (Fig 4A). The distance between the simulated and empirical distributions of relative divergence (*log*_{2}-fold changes) in transcription and translation rates as well as protein abundance was first quantified using the Kolmogorov-Smirnov (KS) statistic. For each replicate simulation, the three resulting measurements were combined into a mean KS statistic, between 0 and 1, for which a lower value indicates a better overall fit. In addition, the two types of divergence correlation – either between the transcription and translation *log*_{2}-fold changes or between their signed versions (Fig 1C-D) – were also computed for each replicate simulation, on the whole set of diverged duplicate pairs obtained. By comparing these two correlation coefficients to their empirical values and the associated 95% confidence interval, we could assess whether each individual simulation successfully replicated the corresponding feature of the expression divergence of yeast WGD-derived paralogs (Fig 4A).
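As an illustration of the mean KS statistic, the following Python sketch averages the two-sample KS statistic over the three expression properties. The dictionary layout, the property names and the synthetic data are assumptions made for this example; only the use of the KS statistic and its averaging come from the text.

```python
import numpy as np
from scipy.stats import ks_2samp

PROPERTIES = ("transcription", "translation", "protein_abundance")

def mean_ks(simulated, empirical):
    """Average the two-sample KS statistic over the three properties.

    Both arguments map a property name to a 1-D array of relative
    divergences (log2-fold changes) across paralog pairs.
    """
    stats = [ks_2samp(simulated[name], empirical[name]).statistic
             for name in PROPERTIES]
    return float(np.mean(stats))

# Synthetic placeholder data, not the empirical or simulated values of the study.
rng = np.random.default_rng(0)
empirical = {name: np.abs(rng.normal(0.0, 1.0, size=245)) for name in PROPERTIES}
simulated = {name: np.abs(rng.normal(0.0, 1.2, size=2500)) for name in PROPERTIES}
print(f"mean KS statistic: {mean_ks(simulated, empirical):.3f}")
```

Because each KS statistic lies between 0 and 1, so does their mean, with 0 reached only when the simulated and empirical distributions coincide for all three properties.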

The comparison of empirical and simulated relative divergence distributions using the mean KS statistic reveals the contrasting performance of the two models, depending on the choice of parameters (Fig 4B). When a high efficacy of selection is assumed, the minimal model is by far the most accurate, as shown by much lower mean KS statistics (as low as ~ 0.07, compared to values > 0.2 for the other model). The best fit is obtained when a higher probability of mutations affecting *β _{m}* is assumed, and especially when *P _{βm}/P _{βp}* is between 3 and 6, which supports the hypothesis of a larger mutational target size for transcription. The poor performance of the precision-economy model in these conditions is almost entirely due to its inability to produce a realistic translational divergence (Fig S7A-B). We thus performed additional simulations with *P _{βp}* > *P _{βm}*, which however did not result in a better fit to the empirical relative expression changes (Fig S8A-B). In contrast, when the efficacy of selection is reduced, the performance of the two models is much more similar, as illustrated by mean KS statistics below 0.1 in both cases (Fig 4B). The most accurate replication of the empirical relative divergence distribution is still observed for the minimal model, more precisely when transcriptional mutations are three to six times more likely than translational ones, but the precision-economy trade-off performs almost as well when mutations affecting *β _{p}* are as frequent as, or even twice as likely as, those affecting *β _{m}*. Supplemental simulations show that increasing the relative probability that mutations act on translation rates – up to *P _{βm}/P _{βp}* = 1/10 – does not further improve the performance of the precision-economy model (Fig S8C). Overall, this highlights how both hypothesized mechanisms might have shaped the expression divergence of yeast paralogs, although the mutational target size hypothesis appears much more robust to assumptions about the values of evolutionary parameters.

The replication of the empirical divergence correlations is also dependent on the choice of model and selection efficacy regime, but much less on the relative frequencies of transcriptional and translational mutations. When *N* = 10^{6}, the positive association between the absolute magnitudes of transcription and translation changes (Fig 1C) is only reasonably reproduced by the minimal model, for which almost all simulations result in a correlation that falls within the 95% confidence interval of the empirical value (Fig 4C). For *N* = 10^{5}, implying a reduced efficacy of selection, it is instead the precision-economy trade-off which is associated with the most realistic such correlations. All the corresponding simulation runs indeed produce correlation coefficients within the confidence interval. The minimal model is however also able to generate realistic correlations, for one or two replicates under most of the tested ratios of transcriptional and translational mutational target sizes (Fig 4C). As both models can replicate the observed correlation, this again highlights the plausibility of the two alternative hypotheses. Yet, it also further reveals the increased robustness of the minimal model.

If the two models accurately described the evolution of WGD-derived paralogs, they would also replicate the strong positive relationship observed between the signed *log*_{2}-fold changes in transcription and in translation (Fig 1D). That is however not the case, as neither model can realistically reproduce this correlation, across the selection efficacy regimes as well as the range of mutational target size ratios tested (Fig 4D). Nevertheless, all simulations are interestingly associated with strictly positive correlations, even the minimal model of post-duplication evolution, contrary to the intuitive expectation of a compensatory drift at both levels to maintain protein abundance [23,24]. This, as well as the inability to replicate the empirical correlations, may reveal an effect of the postulated post-duplication expression optimum. We thus performed additional simulations while varying this optimum, to consider the alternative possibilities that the duplication-induced expression doubling was perfectly optimal or did not sufficiently increase the expression level. These showed a limited dependency of the correlation between signed relative divergences on the posited expression optimum (Fig S9B), ensuring that we were not being misled by a potentially poor choice of this value. We note that the effect of the post-duplication optimum is slightly stronger for the correlation between the absolute magnitudes of the *log*_{2}-fold changes (Fig 4C; Fig S9A), but still insufficient to invalidate our previous conclusion on the replication of this feature.

Overall, these comparative analyses show that both mechanisms may realistically have contributed to the greater importance of transcriptional changes in the expression divergence of yeast WGD-derived paralogs. They however also reveal that the two hypotheses cannot fully explain the observed evolutionary patterns – since one of the three features of the empirical divergence can be replicated by neither model – and as such require refining. While both mechanisms may have shaped the evolution of duplicate pairs, the hypothesis of a larger mutational target size for transcription emerges as the preferred explanation. It indeed results in the best fit (lowest mean KS statistic in Fig 4B) while being more robust to assumptions about the efficacy of natural selection. Although the relative mutational target sizes of transcription and translation regulation are not known, the fact that the best agreement with our observations (lowest mean KS statistic) is obtained for a modest difference of relative mutation probability means that the bias need not be large to impact evolution.

### Revisiting the hypotheses when considering transcription-translation couplings and biased mutational effects distributions

Neither model can replicate the strong positive correlation observed between signed transcriptional and translational changes within paralog pairs (Fig 1D), but this may be because we made unrealistically simple assumptions about the mutational process. First, the extent to which mutations may independently act on transcription and translation is unclear. Second, mutational effects might not be distributed symmetrically.

Many mutations in the transcribed region of a gene may for instance simultaneously have transcriptional and translational effects, as the identity of the translated codons might affect both mRNA stability and translation itself [51–53]. In addition, mutations at one regulatory level may be associated with amplifying or buffering regulatory changes at the other – as suggested by stress responses [54–56]. The effects of random mutations on expression level might also often be distributed asymmetrically, as shown by the experimental characterization of mutations affecting ten yeast genes [48]. Recent work additionally predicted that mutations increasing expression are rarer for highly expressed promoters, and vice-versa for lowly expressed ones [50]. As such, there are many potential constraints on the effects of mutations which could create correlations between transcriptional and translational changes. Taking these effects into account may allow at least one of our models to fully replicate the expression divergence patterns of yeast WGD-derived paralogs.

It is not possible to include all of the complexity of gene regulation in a single model but we nevertheless examined these additional potential factors. We made two new versions of our simulation framework, respectively implementing an asymmetry in the distribution of mutational effects and a correlation between the transcriptional and translational effects of mutations. This second addition to the model, which allows mutations to act on both *β _{m}* and *β _{p}* at once, can potentially account for regulatory responses as well as for the coupling between mRNA decay and translation efficiency. Within our minimal model, under which identical changes to transcription and mRNA stability are entirely equivalent (as only their effect on protein abundance matters), correlated effects on *β _{m}* and *β _{p}* can indeed represent such a coupling.

Mutation asymmetry was implemented using a skew normal distribution of mutational effects with skewness parameter *α* ≠ 0, while correlations between transcriptional and translational mutations were added using a bivariate normal distribution of mutational effects. The latter modification also meant that relative mutational target sizes could not be modeled as mutation probabilities, as each mutation now affected transcription and translation. Instead, they were modeled as differences in mean (absolute) mutational effects. As such, the standard deviations of transcriptional and translational effects were scaled according to the ratio of target sizes. The precise standard deviations were chosen to obtain the same mean change in protein abundance per mutation as with a given reference *σ _{mut}* within our standard framework (Methods). Because this represents a significant departure from our previous approach, another grid search was performed in order to identify the best-fitting reference *σ _{mut}* under each selection efficacy scenario when using bivariate mutational effects (Fig S10).

Simulations were performed as previously across ranges of distribution asymmetry and correlation of mutational effects (Methods). While both negative and positive correlations between transcriptional and translational effects were assayed, only negative skewness values – biasing mutations towards a decrease of expression – were used, since a positive skew lengthened simulations too much by impeding the initial protein abundance reduction. The accuracy of each simulation run was again assessed using summary statistics computed on the complete set of 2500 paralog pairs simulated. As before, mean KS statistics were calculated to quantify the fit to the empirical distributions of relative divergence (*log*_{2}-fold changes) in transcription and translation rates, as well as protein abundance (Fig 5A). The two types of divergence correlation, using either absolute or signed relative divergences, were also computed. For all three metrics, the values obtained for each of three replicate simulations were combined into a mean – or grand mean in the case of KS statistics – which was then used in comparisons of model and parameter values combinations (Fig 5A). An additional set of summary statistics was also computed: p-values of Mood’s median test for the comparison of empirical and simulated distributions of relative divergence at the levels of transcription, translation and protein abundance. These allowed us to classify each simulation run as generating expression divergence of a realistic magnitude or not.
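The two modified distributions of mutational effects can be sketched with standard SciPy and NumPy samplers. This is a minimal illustration rather than the simulation code itself: the function names, the choice of a zero location parameter, and the example standard deviations and correlation are assumptions, and the scaling that keeps the mean change in protein abundance constant (Methods) is not reproduced.

```python
import numpy as np
from scipy.stats import skewnorm

def skewed_effects(n, sigma, alpha, rng):
    """Draw n mutational effects from a skew normal distribution.

    alpha < 0 biases effects toward decreases of expression. Using
    loc=0 and scale=sigma is an assumption made for this sketch.
    """
    return skewnorm.rvs(alpha, loc=0.0, scale=sigma, size=n, random_state=rng)

def correlated_effects(n, sigma_m, sigma_p, rho, rng):
    """Draw n pairs of (transcriptional, translational) effects from a
    bivariate normal distribution with correlation rho."""
    cov = [[sigma_m ** 2, rho * sigma_m * sigma_p],
           [rho * sigma_m * sigma_p, sigma_p ** 2]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

rng = np.random.default_rng(42)
# Hypothetical parameter values, chosen only to illustrate the behavior.
skewed = skewed_effects(10_000, sigma=0.05, alpha=-5.0, rng=rng)
pairs = correlated_effects(10_000, sigma_m=0.06, sigma_p=0.02, rho=0.8, rng=rng)
print(f"mean skewed effect: {skewed.mean():.4f}")
print(f"sample correlation: {np.corrcoef(pairs.T)[0, 1]:.3f}")
```

In the bivariate case, a larger `sigma_m` than `sigma_p` plays the role of a larger mutational target size for transcription, since every mutation now affects both rates.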

When considering our minimal model of post-duplication evolution in a context of high selection efficacy (*N* = 10^{6}), both tested mutational constraints are sufficient to create a strong positive correlation between transcriptional and translational signed *log*_{2}-fold changes, as observed within real WGD-derived paralog pairs (Fig 5D). A high negative skew on distributions of mutational effects or a strong positive correlation between effects on *β _{m}* and *β _{p}* can both result in Spearman’s correlation coefficients which fall within the empirical confidence interval, for a wide range of mutational target size ratios. This can even coincide with realistic relative divergence distributions, as shown by non-significant Mood’s median tests (*p* > 0.05) on all three properties and low grand mean KS statistics (Fig 5B). There is however no combination of parameters for which the other divergence correlation – between the absolute magnitudes of fold changes – can simultaneously be replicated (Fig 5C). As such, the addition of more realistic mutational constraints can rescue one type of divergence correlation at the expense of the other, which was in contrast properly replicated by our previous simulations assuming perfectly independent mutations and normally distributed mutational effects. An identical conclusion is reached when the efficacy of selection is reduced (Fig S11) or when the precision-economy model is instead considered (Fig S12).
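The classification of a simulation run by Mood's median test can be illustrated as follows. The sketch assumes that a run is labeled realistic when the test is non-significant for all three properties at the 0.05 threshold mentioned above; the data layout and the synthetic values are placeholders.

```python
import numpy as np
from scipy.stats import median_test

PROPERTIES = ("transcription", "translation", "protein_abundance")

def realistic_magnitude(simulated, empirical, alpha=0.05):
    """Return True when Mood's median test is non-significant
    (p > alpha) for every expression property."""
    p_values = [median_test(simulated[name], empirical[name])[1]
                for name in PROPERTIES]
    return all(p > alpha for p in p_values)

# Synthetic placeholder data, not values from the study.
rng = np.random.default_rng(1)
empirical = {name: rng.lognormal(0.0, 0.5, size=245) for name in PROPERTIES}
shifted = {name: values + 100.0 for name, values in empirical.items()}
print(realistic_magnitude(empirical, empirical))
print(realistic_magnitude(shifted, empirical))
```

A median-based test only checks that the typical magnitude of divergence is comparable, which is why it complements rather than replaces the KS statistic computed on whole distributions.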

While this attempt at more realistically modeling the effect of mutations did not clearly favor one hypothesis or the other, the fact that it rescued the replication of one feature of the expression divergence of yeast paralogs (Fig 1D) suggests that at least one of our two models may be adequate when combined with truly realistic mutational biases and correlations. A larger mutational target size for transcription and the evolutionary trade-off between the precision and economy of gene expression thus both appear as suitable non-mutually exclusive explanations for the predominance of transcriptional changes in the divergence of yeast paralogs.

## Discussion

Our analysis of published data [25] suggests that transcriptional changes played a greater role than translational ones in the divergence of paralogs in the yeast *S. cerevisiae*. Whether this is a general feature of the evolution of duplicated genes remains to be fully investigated, but a report of very similar evolutionary patterns in the plants *A. thaliana* and *Z. mays* [15] highlights the plausibility of such a generalization.

Focusing more specifically on WGD-derived paralogs, we used *in silico* evolution to investigate two potential mechanisms explaining this predominantly transcriptional divergence: an evolutionary trade-off between the precision and economy of gene expression [25], and a larger mutational target size for the regulation of transcription. Simulations revealed that both hypotheses may be consistent with the observed patterns of evolution. We turned to WGD-derived paralogs because the minimal model of post-duplication evolution we used was initially described for cases of whole-genome doubling [23]. It might however still be applicable to other types of duplication events. Our framework for instance also partially replicated the divergence patterns of SSD-derived paralog pairs, albeit with a lower overall agreement (Fig S13).

The precision-economy trade-off is sufficient for divergence to occur mostly in transcription, which highlights how interactions at two levels of expression regulation can shape evolutionary trajectories. The general relevance of this type of epistasis in the evolution of duplicate genes is however unclear. Due to the small magnitude of the fitness effects involved [25], precision-economy constraints may only be impactful in large populations where selection is particularly efficient. This might not be the case in the early evolution of WGD-derived paralogs, even in *S. cerevisiae*, as WGD events initially create a small polyploid population. We also note that, contrary to what the *N* parameter of 10^{5}-10^{6} would suggest, our simulations are not exactly representative of such a large finite population. Because any beneficial or perfectly neutral mutation is automatically accepted [46], even infinitesimal fitness gains are visible to selection, rather than only those larger than 1/*N*. In a more realistic scenario, weakly beneficial and mildly deleterious mutations acting on transcription or translation may both be close to neutrality and have similar fixation probabilities, limiting the ability of the precision-economy trade-off to favor transcriptional divergence.
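The contrast between automatic acceptance and a finite-population fixation probability can be made concrete with a small numerical sketch. Both functions are illustrative assumptions: the deleterious branch of the Metropolis-style rule uses an arbitrary exponential penalty, which may differ from the rule actually used in the simulations, and the comparison relies on Kimura's diffusion approximation for a haploid population.

```python
import math

def metropolis_acceptance(s, n):
    """Metropolis-style rule: beneficial or neutral mutations (s >= 0)
    are always accepted. The exp(2*n*s) penalty applied to deleterious
    mutations is an assumed form, used only for illustration."""
    if s >= 0:
        return 1.0
    return math.exp(2 * n * s)

def kimura_fixation(s, n):
    """Fixation probability of a new mutation with selection coefficient s
    in a haploid population of size n (diffusion approximation)."""
    if s == 0:
        return 1.0 / n
    if -2 * n * s > 700:  # strongly deleterious: avoid overflow
        return 0.0
    return math.expm1(-2 * s) / math.expm1(-2 * n * s)

n = 10 ** 6
s = 1e-9  # fitness gain far smaller than 1/n
print(metropolis_acceptance(s, n))
print(kimura_fixation(s, n))
```

For a gain much smaller than 1/*N*, the fixation probability stays close to the neutral value of 1/*N*, whereas the acceptance rule treats the mutation exactly like a strongly beneficial one.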

A larger mutational target size for transcription, modeled as a higher relative mutation probability, emerges as the preferred hypothesis, as it is robust to assumptions about selection efficacy. The high similarity between the expression divergence patterns of WGD- and SSD-derived paralogs (Figs 1 and S1; [15]), which have been shown to differ substantially in their initial properties and subsequent evolutionary trajectories [57–59], may also support such a more general mutational mechanism. In addition, when strictly considering *cis* mutations – the only changes which can cause two identical gene copies to diverge – as in the current work, a larger mutational target for transcription is intuitively likely. Because *cis* mutations acting on translation have to occur within the transcribed sequence while transcriptional mutations can also arise upstream or downstream of the gene, more nucleotide positions could potentially affect transcription. Determining whether a higher frequency of mutations affecting transcription truly contributes to the predominance of transcriptional changes within paralog pairs would however require direct measurements of the neutral evolutionary rates of transcriptional and translational efficiencies.

Since the current distribution of transcription and translation rates among *S. cerevisiae* genes has previously been attributed to the optimization of the precision-economy trade-off [25], suggesting that such constraints may not be needed to explain the divergence patterns of paralogs might appear contradictory. This is especially true considering the significant energetic costs of even small increases of transcription and translation [36,60]. Yet, even when precision-economy considerations are fully neglected under the minimal model, extended *in silico* evolution results in only minor deviations from the reported distribution of genes in the transcription-translation space (Fig S14). One plausible explanation could be that precision-economy constraints impact evolutionary trajectories on longer timescales and/or along greater ranges of variation, while mutational effects dominate on the shorter timescales and smaller expression changes associated with the divergence of duplicates.

While we have identified two potential underlying mechanisms for the preeminent contribution of transcriptional variation to the expression divergence of paralogs, their wider applicability to the evolution of gene expression levels remains to be investigated. Whether any of them could explain general trends of faster transcriptional evolution within and between species [6–8] is indeed unclear. If a larger mutational target size for transcription was involved, it would presumably affect singleton genes as well as paralog pairs – unless it were unique to duplicates, potentially because more nucleotide positions might affect transcription than translation in *cis*. In contrast, if the precision-economy trade-off was responsible for the greater magnitude of transcriptional divergence among yeast paralogs, it likely would not explain any general tendency for transcription to evolve at a faster rate. It is indeed interactions between mutations in the two paralogs that favor transcription divergence and bias expression changes within our simulation, which could hardly apply to singleton genes. The precision-economy trade-off may nevertheless have less intuitive effects on the evolution of expression levels in such genes if combined with other mechanisms, for instance a slight difference of mutational target sizes. How this trade-off [25] could affect the relative evolutionary rates of transcription and translation at the genome scale, particularly over long timescales, therefore warrants further investigation.

Besides providing explanations for the patterns of expression divergence observed in yeast paralogs – as well as suggesting hypotheses for a more general propensity for transcription changes to dictate protein abundance variations –, we also illustrate how biases in the mutation process can impact the multi-level evolution of expression levels. Asymmetry in the distribution of mutational effects and correlations between transcriptional and translational effects both markedly affect the correlation between transcription and translation changes within gene pairs in our simulations. This further shows that various types of mutational bias could impact the evolutionary trajectories of a duplicate pair under selection to maintain its cumulative protein abundance, as might be the case for singleton genes in the absence of selection [48]. Overall, our work thus highlights the importance of thoroughly characterizing the distributions of mutational effects on expression at multiple regulatory levels in order to fully understand the expression divergence of paralogs, and, more widely, the genome-wide evolution of expression levels. Future experiments could for instance perform large-scale measurements of the transcriptional and translational effects of mutations and assess what correlation(s) exist between them.

Irrespective of its cause(s), the predominance of transcriptional changes in the expression divergence of paralogs, whether yeast-specific or more general, might have significant evolutionary consequences. Due to the relationship between transcription rate and expression noise [25,34,35], it could result in a greater divergence of noise levels within duplicate pairs than symmetrical or mostly translational expression changes. Whether this could affect or even dictate the evolutionary paths followed by paralogs remains to be investigated. One intriguing possibility is that such early divergence of noise levels could favor the resolution of noise-control conflicts [61] while maintaining functional redundancy. This is exemplified by paralogous yeast transcription factors *MSN2*/*MSN4*, which appear to combine the benefits of low and high expression noise. While one gene is stably expressed with low noise across conditions, the other is expressed with high noise and is environmentally responsive [62]. Although the maintenance of this gene pair likely involved other mechanisms such as post-translational changes [63], it is conceivable that the benefit of such two-factor regulation was first revealed by early transcriptional divergence, and later refined through changes in promoter architecture. A high prevalence of such trajectories could help explain why 40% of WGD-derived transcription factor pairs in *S. cerevisiae* bind the same targets [64]. While speculative, this possibility underscores how a simple bias towards transcriptional divergence may have far-reaching impact.

We studied the expression divergence of paralogs while neglecting all types of changes to protein function and regulation. This simplification is supported by the absence of a clear continuous relationship between functional divergence and the relative contribution of transcriptional variation to expression changes, although stronger relationships with the magnitude of expression divergence advise caution (Fig S5). Such an approach would be particularly suitable if expression changes were not affected by functional divergence, which could for instance be true if expression diverged first during evolution. This was postulated by previous models, according to which early expression changes under selection to maintain cumulative protein abundance lengthen the retention of duplicates and allow function-altering mutations to arise and fix [23,42]. Whether the divergence of protein function really follows expression changes is however unclear, as it might also occur simultaneously or even precede it [41].

If expression divergence did occur first in the evolution of duplicated genes, it would also shape their subsequent functional divergence. Any mutation affecting the function of one of two identical paralogs needs to overcome the deleteriousness of the associated reduction in the abundance of the ancestral gene product [23], meaning that such changes might be restricted to new functions for which the protein abundance of the mutated gene copy is already close to optimal. The addition of precision-economy constraints suggests an intriguing extension of this model. If transcription cost and tolerance to expression noise dictate optimal transcription and translation rates for all genes as described by [25], functional changes may be further restricted to molecular functions which are compatible with the precision and cost of the expression of the affected paralog.

While this work provides insights on the expression divergence of duplicated genes – and thus more generally on the evolution of gene expression –, it presents limitations. A first one is the assumption that all paralog pairs were initially made of two identical gene copies. This is especially significant in the case of WGD-derived pairs, as the yeast WGD likely involved a hybridization event [65]. As such, some paralogs were already diverged and their expression fold changes may not be representative of the evolution of duplicates. The agreement between WGD- and SSD-derived pairs would however support the idea that a predominance of transcriptional changes is a general feature of the divergence of paralogs. Other simplifying assumptions made to simulate the evolution of *S. cerevisiae* WGD-derived duplicated genes also warrant examination. Sequential fixation, where mutants never coexist in the population [43], is likely not fully representative of evolutionary processes in yeasts with large population sizes. It is also unable to replicate the fixation dynamics of pairs of compensatory mutations [66], which could play a role in the expression divergence of duplicated genes. More importantly, the use of the Metropolis criterion to accelerate simulations may skew the resulting evolution patterns by equally valuing all positive fitness changes and artificially widening the gap between mildly beneficial and slightly deleterious mutations. Another important limitation of our approach is the absence of gene loss. Post-WGD paralog retention is known to be on the order of 15% in *Saccharomyces* yeasts after 100 million years of evolution [67], yet all randomly generated duplicate pairs are retained throughout our simulations. Including loss-of-function mutations, through which gene copies could have been inactivated when tolerated by selection, would have been more realistic, but tests showed that it made the end condition of the simulation an ever-moving target. 
We note that performing all tolerated loss-of-function mutations at the end of the simulations (Methods), prior to the calculation of summary statistics, produces qualitatively similar results (Fig S15), which strengthens our conclusions. An alternative could have been to restrict our simulations to duplicate pairs which were destined to be retained for an extensive period. Using the current transcription and translation rates of paralogs to infer the expression levels of ancestral singletons and then investigate the divergence of duplicates would however have proved circular. Caution is additionally in order before generalizing our observations, and the two underlying mechanisms investigated, to the evolution of all duplicated genes. The patterns of divergence among paralogs that motivated our work are by definition observed for pairs that survived to this day, such that we cannot infer what happened for pairs which returned to single-copy genes.

## Conclusion

The expression divergence of yeast paralogs mostly occurred at the transcriptional level, which may be due to two mechanisms: an evolutionary trade-off between the precision and economy of gene expression and a larger mutational target size for transcription than for translation. Whether these explanations also hold for more general observations that transcription may evolve at a faster rate within and between species remains to be elucidated. Interestingly, some features of the divergence of duplicate pairs can be replicated by either model only when mutational biases – asymmetry and transcription-translation correlations in the distributions of mutational effects – are added. This observation illustrates the importance of fully characterizing how mutations jointly affect the different traits contributing to gene expression. More importantly, our work highlights how measuring the neutral evolutionary rates of transcription and translation efficiencies is essential to a complete understanding of the evolution of expression levels. Such measurements would be pivotal in discriminating between the two alternative mechanisms, as well as in elucidating whether our findings apply only to duplicated genes. Further research will additionally help clarify the wider evolutionary implications of predominantly transcriptional expression divergence, especially regarding the impact of gene duplication events.

## Methods

### Expression divergence of yeast paralogs

#### Transcriptional and translational divergence

We downloaded the transcription and translation rates of 4440 yeast genes from [25]. These rates had been inferred from an mRNA-seq and ribosome profiling experiment by [30]. Transcription rates *β*_{m} for each gene *i* were obtained as follows (Eq 2). The fraction of mRNA-seq RPKMs associated with gene *i* (*r*_{i} relative to the sum over all genes *r*_{j}) was first converted into mRNA abundance *m*_{i} using the total number *N*_{m} of transcripts per cell, estimated at 60 000 molecules. Then, the transcription rate was inferred from the mRNA abundance using decay rate *α*_{m,i}, in hours^{−1}. The rates reported by [25] were obtained under the assumption that mRNA decay rate does not vary between genes, such that *α*_{m,i} becomes a constant *α*_{m}, equal to the median transcript decay rate of 5.10 h^{−1}.

To obtain the translation rate *β*_{p} for each gene *i*, the total translational flux (in proteins per h) was first estimated as the product of median protein decay rate *α*_{p}, equal to 1.34 h^{−1}, and total number of proteins per cell *N*_{p}, amounting to 1.1 × 10^{8} molecules [25]. The fraction of this synthesis flux directed to gene product *i* was then obtained from the corresponding fraction of ribosome profiling RPKMs (*s*_{i} relative to the sum over all genes *s*_{j}), which was divided by the abundance *m*_{i} of the transcript [25].

We identified paralogs originating from either whole-genome duplications (WGD) or small-scale duplication events (SSD) using annotations by [68]. Groups of more than two duplicates, derived from successive duplication events, were excluded by only considering genes annotated “WGD” and “SSD” in the cited work. We obtained a final set of 409 high-confidence paralog couples, of which 245 are WGD-derived and 164 have originated from SSD events.
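As a rough illustration of the rate calculations described above (Eq 2 and Eq 3), the following sketch converts RPKM fractions into transcription and translation rates using the constants quoted from [25]. The function names and the toy RPKM values are hypothetical, and this is not the code of [25] or of the present study.

```python
# Constants quoted in the text from [25].
N_M = 60_000      # transcripts per cell
ALPHA_M = 5.10    # median mRNA decay rate, h^-1
ALPHA_P = 1.34    # median protein decay rate, h^-1
N_P = 1.1e8       # proteins per cell


def mrna_abundance(mrna_rpkm):
    """Convert mRNA-seq RPKMs into transcripts per cell (m_i)."""
    total = sum(mrna_rpkm.values())
    return {g: r / total * N_M for g, r in mrna_rpkm.items()}


def transcription_rates(m):
    """beta_m,i = m_i * alpha_m, assuming one decay rate for all genes."""
    return {g: m_i * ALPHA_M for g, m_i in m.items()}


def translation_rates(ribo_rpkm, m):
    """beta_p,i = (share of footprints * total protein flux) / m_i."""
    total = sum(ribo_rpkm.values())
    flux = ALPHA_P * N_P  # proteins synthesized per hour, all genes combined
    return {g: s / total * flux / m[g] for g, s in ribo_rpkm.items()}


# Toy data (hypothetical values for two genes).
m = mrna_abundance({"geneA": 30.0, "geneB": 10.0})
beta_m = transcription_rates(m)
beta_p = translation_rates({"geneA": 20.0, "geneB": 20.0}, m)
```

In this toy case both genes receive the same share of ribosome footprints, so the gene with fewer transcripts ends up with the higher per-transcript translation rate.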

To assess the relative expression divergence of paralogs, we computed *log*_{2}-fold changes in transcription rates and translation rates within each gene pair, according to Eq 1. The two correlations between transcriptional and translational divergences within paralog pairs were calculated as Spearman’s *ρ*. For the correlation computed from signed *log*_{2}-fold changes, the latter were calculated similarly to Eq 1, but without defining the ratio as max/min. For each pair, the ratio was computed in the two possible orientations and a duplicated dataset was generated, on which the correlation was then calculated.

The 95% confidence intervals for each of the two correlations were obtained by bootstrapping, using 10000 resamplings with replacement of the 409 true pairs of observations (or 818 when using the duplicated dataset). The corresponding correlation was recomputed on each of the resampled datasets.
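The divergence metrics and the bootstrap described above could be sketched as follows. This is a dependency-free illustration: the study itself relied on SciPy, Spearman's *ρ* is re-implemented here from ranks, and the percentile interval is an assumption, since the text does not state which bootstrap interval was used. All function names are hypothetical.

```python
import math
import random


def abs_log2_fold(x1, x2):
    """Relative divergence as log2(max/min), as described for Eq 1."""
    return math.log2(max(x1, x2) / min(x1, x2))


def signed_log2_folds(pairs):
    """Signed log2 ratios in both orientations (the duplicated dataset)."""
    return ([math.log2(a / b) for a, b in pairs]
            + [math.log2(b / a) for a, b in pairs])


def _ranks(values):
    """Average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def bootstrap_ci(x, y, n_boot=10000, seed=0):
    """95% percentile interval of rho, resampling pairs with replacement."""
    rng = random.Random(seed)
    idx = range(len(x))
    rhos = []
    for _ in range(n_boot):
        sample = rng.choices(idx, k=len(x))
        rhos.append(spearman([x[i] for i in sample], [y[i] for i in sample]))
    rhos.sort()
    return rhos[int(0.025 * n_boot)], rhos[int(0.975 * n_boot) - 1]
```

Resampling whole pairs, rather than each variable separately, preserves the association between transcriptional and translational divergence within each resampled dataset.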

#### Taking into account gene-to-gene variation in mRNA decay

We recalculated *β*_{m} using experimental measurements of mRNA decay. The *α*_{m} constant was replaced with gene-specific decay rates, taken from four datasets [26–29]. All measurements were converted into hours^{−1}. As decay rates compound active degradation and dilution due to cell division within the framework of [25], the effect of this dilution was added when necessary [26,28,29]. As in [25], we assumed a cell division time of 99 minutes, such that the decay rate of transcript *i* is obtained from the experimental decay constant *γ*_{i} as follows: *α*_{m,i} = *γ*_{i} + ln(2)/(99/60 h). Relative divergences (*log*_{2}-fold changes) as well as the two correlations were recalculated with these datasets.
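A minimal sketch of the dilution correction is given below, assuming that dilution acts as a first-order process at rate ln(2) divided by the division time. The helper for half-lives is an illustrative assumption about how some datasets might be reported, and both function names are hypothetical.

```python
import math

DIVISION_TIME_H = 99 / 60  # cell division time assumed in the text, in hours


def total_decay_rate(gamma, division_time_h=DIVISION_TIME_H):
    """Add dilution by cell division to an active degradation constant (h^-1)."""
    return gamma + math.log(2) / division_time_h


def half_life_to_rate(half_life_min):
    """Convert a half-life in minutes into a first-order rate in h^-1."""
    return math.log(2) / (half_life_min / 60)
```

With a division time of 99 minutes, the dilution term alone is close to 0.42 h^{−1}, which matches the maximal growth rate used elsewhere in the Methods.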

#### Validating the translation rates

For the 818 paralogs previously identified, protein abundances were computed from transcription and translation rates reported by [25] using Eq 5. To better isolate whether the translation rates *β*_{p} are representative of translational flux, constant *α*_{p} was replaced by gene-specific protein decay rates *α*_{p,i} obtained from experiments [69,70]. As for transcript decay, the effect of dilution due to cell division was added when necessary. Variations in mRNA decay were not taken into account, simply because they could not have any effect on the calculation. Any constant or gene-specific transcript decay rate would indeed be present both at the numerator – to obtain transcription rate *β*_{m,i} (since *β*_{m,i} = *m*_{i}*α*_{m,i}) – and denominator. Only mRNA abundance *m*_{i} for each gene *i* (read counts obtained in mRNA-seq normalized into a number of transcripts per cell [25]) was thereby needed as transcript-level data. The abundance of each protein *i* was thus estimated as *p*_{i} = *m*_{i}*β*_{p,i}/*α*_{p,i}. Within each of the 409 duplicate pairs, an estimated *log*_{2}-fold change of protein abundance was computed from these values *p*_{i}. Measurements of the abundance of each protein [69,71,72] were also used to calculate an experimental *log*_{2}-fold change of protein abundance within each paralog pair. For each combination of datasets, the Pearson correlation between the estimated and experimental *log*_{2}-fold changes was computed. To assess the significance of the resulting correlations, we compared them to distributions of correlation coefficients obtained from 10000 repetitions of this process using randomly shuffled *β*_{p} rates.

#### Estimating the impact of experimental noise

We generated simulated mRNA and ribosomal footprints abundances for a range of measurement errors. These values, which can be directly inferred from the mRNA-seq and ribosome profiling RPKMs, correspond to the experimental component of the calculations of *β*_{m} and *β*_{p}.

Each simulated dataset was obtained as follows. Rates *β*_{m} and *β*_{p} were first sampled from the dataset produced by [25] for paralog *P*_{1} of each of *n* gene pairs. Then, sets of *β*_{m} and *β*_{p} *log*_{2}-fold changes were sampled randomly from two normal distributions of mean 0 and respective standard deviations *σ*_{Δβm} and *σ*_{Δβp}. The transcription and translation rates for all paralogs *P*_{2} were next computed by applying the selected fold changes to the rates previously sampled for the *P*_{1} of each pair. These first steps generate the true *β*_{m} and *β*_{p} for the simulated gene pairs, from which noisy experimental measurements were subsequently inferred. From Eq 2, the true *β*_{m} values can directly be used as (exact) measurements of mRNA abundance *m*, since mRNA decay rate *α*_{m} is assumed to be constant. Using Eq 3 and ignoring all constants, equally exact measurements of the abundance of ribosomal footprints *s*_{i} for gene *i* (ribosome profiling RPKMs) can be obtained as: *s*_{i} = *β*_{p,i}*m*_{i}. Following this calculation, experimental variation was added to the *m* and *s* measurements for each paralog as Gaussian noise. Relative measurement errors were sampled independently for both properties of each gene from normal distributions with means 0 and respective standard deviations *cv*_{βm} and *cv*_{βp}, which are a percentage of the corresponding *m* or *s*, and added to the exact experimental measurements previously calculated. Apparent *β*_{m} and *β*_{p} rates were finally computed from the noisy measurements, and used in the calculation of apparent *log*_{2}-fold changes, which were themselves compared to the true sampled fold changes.
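The procedure above can be sketched for a single gene pair as follows. This is a simplified illustration and not the authors' script: the function name and argument names are hypothetical, signed rather than absolute fold changes are returned, and the noise levels are assumed small enough that noisy measurements stay positive.

```python
import math
import random


def simulate_pair(beta_m1, beta_p1, sd_dm, sd_dp, cv_m, cv_s, rng):
    """Return (true, apparent) log2-fold changes in beta_m and beta_p."""
    # True divergence: log2-fold changes drawn from normal distributions.
    true_dm = rng.gauss(0.0, sd_dm)
    true_dp = rng.gauss(0.0, sd_dp)
    beta_m = (beta_m1, beta_m1 * 2 ** true_dm)
    beta_p = (beta_p1, beta_p1 * 2 ** true_dp)

    # Exact measurements, ignoring constants: m = beta_m and s = beta_p * m.
    m = list(beta_m)
    s = [bp * mi for bp, mi in zip(beta_p, m)]

    # Relative Gaussian measurement error applied to each value.
    m_obs = [v * (1 + rng.gauss(0.0, cv_m)) for v in m]
    s_obs = [v * (1 + rng.gauss(0.0, cv_s)) for v in s]

    # Apparent rates recomputed from the noisy measurements.
    app_dm = math.log2(m_obs[1] / m_obs[0])
    app_dp = math.log2((s_obs[1] / m_obs[1]) / (s_obs[0] / m_obs[0]))
    return (true_dm, true_dp), (app_dm, app_dp)
```

Because the noisy *m* appears in the denominator of the apparent translation rate, an error on mRNA abundance propagates to both apparent fold changes, with opposite signs.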

A slightly modified version of this approach was also used to estimate the combined impact of noise and gene-to-gene variations in mRNA decay. In this case, the initial sampling of *β*_{m} and *β*_{p} rates and the corresponding fold changes was accompanied by a draw of experimental mRNA decay rates and *log*_{2}-fold changes – both taken from one of the datasets previously used [26–29]. Then, when calculating the exact experimental measurements (before the addition of noise), mRNA abundance for gene *i* was computed as *m*_{i} = *β*_{m,i}/*α*_{m,i}. This way, simulated experimental measurements at both levels were impacted by variations in mRNA decay. Once Gaussian noise had been added as previously, apparent *β*_{m} and *β*_{p} values were computed under the assumption of invariable mRNA decay rates (using constant *α*_{m} = 5.10 h^{−1}), which were used to obtain the apparent fold changes. These were again compared to the true fold changes initially sampled.

#### Assessing the significance of divergence correlations

We looked at the correlations which could be expected from the fact that mRNA abundance *m* is used in the calculations of both *β*_{m} and *β*_{p}. We first considered a scenario in which *m* and the abundance *s* of ribosomal footprints are entirely independent. For both variables, pairs of values (for paralogs *P*_{1} and *P*_{2}) were sampled from normal distributions and converted to pseudo rates of transcription and translation (*β*_{m} = *m* and *β*_{p} = *s*/*m*), as previously. From these, absolute and signed *log*_{2}-fold changes were obtained, and then used to compute the two correlations of divergence. A similar approach was used to consider a situation in which *m* and *s* are very strongly correlated (*r* = 0.98), as in the dataset [30] used by [25]. In that case, *m* and *s* values for each paralog were sampled simultaneously from a bivariate normal distribution showing such a high correlation. Absolute and signed *log*_{2}-fold changes were then computed and used to calculate the divergence correlations, as described.

#### Measuring the relationship between expression divergence and functional changes

To investigate the relationship between the patterns of expression divergence and functional changes within paralog pairs, we introduced a divergence ratio *D*, measuring the bias towards transcriptional changes, and correlated it with proxies of functional divergence. This correlation analysis was also repeated to investigate how the magnitude of protein abundance divergence (*log*_{2}-fold changes of estimated protein abundance obtained from the *β*_{m} and *β*_{p} values; see Eq 5) and functional changes are related.

The matrix of pairwise genetic interaction profile similarities was downloaded (access: 2021-12-17) from TheCellMap.org [73]. Only results from the AllxAll genetic interaction screen were considered. All paralog pairs for which both duplicates were represented in this dataset were kept, leaving 377 gene pairs. When more than one unique mutant had been screened for one gene, the corresponding vectors of pairwise similarity were averaged. For each paralog pair, we kept the mean of the reported similarity (Pearson correlation) between the interaction profiles of genes *P*_{1} and *P*_{2}.

For GO overlap, the GO Slim mappings from the SGD project [74] were downloaded (http://sgd-archive.yeastgenome.org/curation/literature/, access: 2022-01-12). Annotations from the three ontology levels (Process, Function and Component) were combined and the Jaccard index was computed within each paralog pair as the ratio of the intersection over the union of GO terms.
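The Jaccard index used for GO overlap can be illustrated with a short function. The function name and the handling of unannotated genes are assumptions made for this sketch, and the term labels in the usage note are placeholders rather than real GO identifiers.

```python
def go_jaccard(terms_1, terms_2):
    """Jaccard index of two GO term collections (intersection over union)."""
    set_1, set_2 = set(terms_1), set(terms_2)
    union = set_1 | set_2
    if not union:
        return float("nan")  # neither gene is annotated: overlap is undefined
    return len(set_1 & set_2) / len(union)
```

For instance, two paralogs annotated with `{"term_a", "term_b"}` and `{"term_b", "term_c"}` share one of three distinct terms, giving an index of 1/3.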

Amino acid sequences for all *S. cerevisiae* ORFs were downloaded from the SGD project (http://sgd-archive.yeastgenome.org/sequence/S288C_reference/orf_protein/, access: 2022-01-12). Pairwise global alignments were performed within each of the 409 paralog pairs using BioPython (v 1.80) and the corresponding amino acid identities were computed.

These three proxies of functional divergence were correlated with the divergence ratios *D* and *log*_{2}-fold changes of protein abundance described above for each paralog pair using Spearman’s *ρ*.

### Minimal model of post-duplication evolution

#### Selection on cumulative protein abundance

We defined a minimal model of post-duplication evolution based on the idea of quantitative subfunctionalization [23]. Accordingly, selection acts to maintain the cumulative protein abundance of two paralogs near an optimal level.

In accordance with [25], an ancestral gene with a transcription rate *β*_{m} and a translation rate *β*_{p} is defined. The resulting steady-state protein abundance is obtained using Eq 5 [25], where *α*_{m} and *α*_{p} are the previously described constants set to median mRNA and protein decay rates of 5.10 h^{−1} and 1.34 h^{−1} [25].

This gene is also associated with a unique function *W*(*p*) of fitness according to protein abundance. This function is assumed to be a parabola of vertex (*p*_{opt}, *μ*), where *p*_{opt} is a gene-specific protein abundance optimum (in proteins per cell) and *μ* is the maximal growth rate of *S. cerevisiae* (0.42 h^{−1} [25]). This function is additionally described by a noise sensitivity parameter *Q*, which measures curvature relative to the value of *p*_{opt} [25] and is also gene-specific. A higher *Q* means that fitness decreases more sharply following any relative protein abundance variation away from the optimum (for instance ±5% of *p*_{opt}), and thereby represents a more stringent selection to maintain protein abundance. The three parameters *a*, *b* and *c* of the standard form of the parabolic fitness function *W*(*p*) are all obtained directly from *p*_{opt}, *μ* and *Q* (see Extended Methods).
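As an illustration of how a parabola with vertex (*p*_{opt}, *μ*) translates into standard-form coefficients, the sketch below takes the curvature at the optimum as a direct input. The conversion from *Q* to curvature is given by Eq 16 and is not reproduced here, so the `curvature` argument and both function names are assumptions of this example.

```python
def parabola_from_vertex(p_opt, mu, curvature):
    """Standard-form coefficients (a, b, c) of W(p) = a*p**2 + b*p + c.

    `curvature` is W''(p_opt); it must be negative for a fitness peak.
    Expanding W(p) = a*(p - p_opt)**2 + mu gives the coefficients below.
    """
    a = curvature / 2
    b = -2 * a * p_opt
    c = mu + a * p_opt ** 2
    return a, b, c


def fitness(p, coeffs):
    """Evaluate the parabolic fitness function at protein abundance p."""
    a, b, c = coeffs
    return a * p ** 2 + b * p + c
```

After duplication, the same construction applies with the optimum replaced by 1.87 times the ancestral *p*_{opt}, leaving the other parameters unchanged.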

Following the duplication of any ancestral gene, two paralogs *P*_{1} and *P*_{2} are considered. Each inherits the ancestral transcription and translation rates, meaning that *β*_{m} = *β*_{m,1} = *β*_{m,2} and *β*_{p} = *β*_{p,1} = *β*_{p,2}. The total number of mRNAs synthesized is thus doubled while the translation rate of each transcript remains unchanged, such that protein abundance is also doubled. The function *W*(*p*) becomes the function *W*(*p*_{1} + *p*_{2}) of fitness according to cumulative protein abundance for the duplicate pair. Because it seems unrealistic to assume that the new post-duplication protein abundance is perfectly optimal, the optimum of this function is set to 1.87*p*_{opt}, such that the gene doubling overshoots the optimal expression level. This value has been selected as it is the smallest multiple which ensures fitness *W*(*p*_{1} + *p*_{2}) > 0 immediately after the duplication event, even for the narrowest fitness function (Extended Methods). Apart from the optimum, the other parameters (*Q* and *μ*) of the parabolic function do not change following the duplication.

Within this minimal model, mutations affecting the *β*_{m} and/or *β*_{p} of a paralog are filtered by natural selection solely according to their effect on fitness *W*(*p*_{1} + *p*_{2}).

#### Addition of precision-economy constraints

To obtain the precision-economy model, we added precision-economy constraints to the minimal model described above. These implement the evolutionary trade-off between the precision and economy of gene expression that was conceptualized by [25]. This involves modifying the fitness calculations to account for stochastic fluctuations of protein abundance (expression noise) and transcription costs.

According to [25], the variance of protein abundance for a singleton gene within a population of isogenic cells can be approximated from the relative contribution of transcription to its expression. This is done according to Eq 6, where *c*_{v0} is a constant representing the minimal coefficient of variation for protein abundance observed within a clonal population, referred to as noise floor. In *S. cerevisiae*, its value is 0.1 [25].

From this relationship, we have obtained an equation for the variance on the cumulative abundance of a protein expressed from two identical paralogous genes, according to the relative expression levels and transcription rates of each of the two gene copies (Extended Methods).

Using this variance and the parabolic function *W*(*p*_{1} + *p*_{2}), it is possible to compute fitness while taking into account expression noise. In this case, the fitness *F* becomes the mean of function *W*(*p*_{tot}) for a population of cells expressing the paralog pair at expression levels *p*_{tot} distributed around the mean *P*_{tot} with this variance.

Because the variance of cumulative protein abundance *p*_{tot} and the function *W* linking its mean to fitness are known, the populational (mean) fitness *F* can be computed in an exact manner. The mean of *W*(*p*_{tot}) can be expressed as Eq 8, where *a*, *b* and *c* are the parameters of the corresponding parabolic function. Using the definition of variance as the difference between the mean of the squares and the square of the mean, the mean of the squared cumulative protein abundance can be obtained from variance and squared mean cumulative abundance (Eq 9). Plugging this expression into Eq 8, the fitness for any combination of expression levels when accounting for expression noise is thus given by Eq 10, where the variance of cumulative protein abundance depends on the relative contribution of transcription to the expression of the paralog pair (Extended Methods).
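The reasoning above (the mean of a quadratic function depends only on the mean and variance of its argument) can be written as a one-line function. This is a sketch with hypothetical names; how the variance itself is obtained from transcription rates is described in the Extended Methods and is not reproduced here.

```python
def mean_fitness(mean_p, var_p, a, b, c):
    """Exact mean of W(p) = a*p**2 + b*p + c over a population of cells.

    Uses E[p**2] = Var(p) + E[p]**2, so that
    E[W] = a*(var_p + mean_p**2) + b*mean_p + c,
    whatever the shape of the distribution of p.
    """
    return a * (var_p + mean_p ** 2) + b * mean_p + c
```

Since *a* is negative for a fitness peak, any expression noise lowers the mean fitness relative to a noiseless population with the same mean abundance, by an amount *a* times the variance.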

For a given protein abundance, the fitness cost of expression varies according to whether translation is done from few or many mRNAs. Following work by [25], the cost of transcription *C* for two paralogs is calculated from Eq 11, where *l*_{m} is the length of the pre-mRNA (identical for both copies), in nucleotides, and *c*_{m} is the transcription cost per nucleotide, in nt^{−1}. In the current model, *l*_{m} is considered to be a constant and set to the median yeast pre-mRNA length of 1350 nt [25]. The cost per nucleotide *c*_{m} is another constant, which has been estimated as 1.2 × 10^{−9} nt^{−1} under the assumption that transcriptional resources are limiting and that any increase in transcription level is done at the expense of other transcripts [25].

Taking expression noise and transcription costs into account, the population-level fitness *F* at mean cumulative protein abundance *P*_{tot} thus becomes the mean of fitness *W* under protein abundance variance minus the penalty *C* (Eq 12). Within the precision-economy model, mutations are favored or not by selection according to their effect on this fitness *F*.

### Simulating the expression divergence of paralogs

We used a sequential fixation approach [43] to simulate the expression divergence of paralogs, thus making the simplifying assumption that mutation rate is low enough for only two alleles (the ancestral state and a mutant) of a given duplicate pair to coexist simultaneously in the population.

#### Initialization

To obtain a set of *n* paralog pairs, the same number of ancestral singletons are first generated as an array of *n* rows containing ancestral *β*_{m} and *β*_{p} values. Thus, no nucleotide or amino acid sequences are modeled – only their expression phenotypes. Two distinct groups of ancestral genes are generated depending on whether their evolution is simulated according to the minimal model or to its precision-economy implementation.

Combinations of protein abundance *p*_{opt} – obtained from the reported transcription and translation rates using Eq 5 – and *Q* are first independently sampled *n* times with replacement from the 4440 individual genes (singletons as well as duplicates) included in the dataset. The full distribution of *Q* for yeast genes had been inferred beforehand in accordance with [25], using Eq 13. For this sampling, values of *Q* above the theoretical maximum reported by [25] (~ 6.8588 × 10^{−6}) are excluded. Combinations of *p*_{opt} and noise sensitivity resulting in a fitness function curvature below a specified threshold are also filtered out, to avoid cases where a two-fold reduction of cumulative expression immediately after the duplication would be neutral or beneficial (see Extended Methods).

For the first group of ancestral singletons, used in simulations under the minimal model, transcription and translation rates are set in accordance with Eq 13, which is used to calculate an optimal ratio set by precision-economy constraints [25]. The transcription rate *β*_{m} and translation rate *β*_{p} which satisfy both this ratio and the optimal expression *p*_{opt} are then computed. Strictly speaking, an infinity of combinations of *β*_{m} and *β*_{p} are optimal for any gene under the minimal model, as long as they result in the specified protein abundance *p*_{opt}. We thus use Eq 13 to reproducibly choose one realistic combination for each ancestral singleton. Because the minimal model should not account in any way for the precision-economy constraints on gene expression, *Q* values are subsequently resampled with replacement from the genomic distribution. As previously, values above the theoretical maximum are excluded and the sampling is repeated if the new *Q* results in a fitness function curvature below the threshold.

To define the second group of ancestral genes – for simulations under the precision-economy trade-off – the first combinations of *Q* and *p*_{opt} generated are again taken as a starting point. This time, ancestral *β*_{m} and *β*_{p} are set to the combination of rates which maximizes fitness *F* (Eq 12), computed according to gene-specific functions *W*(*p*) of fitness according to expression level. This optimization is performed using a differential evolution algorithm – the *differential_evolution* method of the *optimize* suite of the *SciPy* module [75]. Although Eq 13 already describes an optimal *β*_{m}-*β*_{p} pair, the latter may not correspond to the true optimum, as it is restricted to values which result in protein abundance *p*_{opt}. Since the magnitude of stochastic expression fluctuations (noise) scales with protein abundance (Eq 6), the true optimal expression under the precision-economy trade-off may indeed be slightly below the abundance optimum *p*_{opt}.

Once generated, all ancestral genes are duplicated as previously described into two paralogs *P*_{1} and *P*_{2}, both with the ancestral rates *β*_{m} and *β*_{p}. For each duplicate pair, a new function *W*(*p*_{tot}) of fitness according to cumulative protein abundance is defined and its optimum is set to the ancestral *p*_{opt} times Δ_{opt} = 1.87.

#### Mutation-selection approach

The two sets of duplicate pairs thereby generated are then used in a sequential fixation simulation, as previously described. One random mutation is first sampled individually for each of the *n* paralog pairs from a normal distribution with mean 0 and standard deviation *σ*_{mut}. These mutational effects are each assigned randomly to transcription or translation rates according to relative probabilities *P*_{βm} and *P*_{βp}, which represent the relative mutational target sizes of the two traits. For instance, *P*_{βm} = 2 and *P*_{βp} = 1 would be used in a simulation where the mutational target size of transcription is assumed to be twice that of translation. Once it has been defined as either transcriptional or translational, each mutation is assigned at random to one of paralogs *P*_{1} and *P*_{2} of the corresponding duplicate pair, which are both equally likely to be mutated. These steps of mutation generation and assignment are done at once for the two sets of simulated gene pairs. As such, the *n*th paralog couple of both simulations – respectively performed according to the minimal and precision-economy models – receives the exact same series of mutations.
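One round of mutation generation and assignment, as described above, might look like the following sketch. The function name, the tuple layout and the value of `sigma_mut` are hypothetical and chosen only for illustration. The authors' actual implementation is available in their repository.

```python
import random


def draw_mutations(n_pairs, sigma_mut, p_bm, p_bp, rng):
    """Draw one mutation per paralog pair: (effect, trait, paralog index)."""
    mutations = []
    for _ in range(n_pairs):
        effect = rng.gauss(0.0, sigma_mut)
        # Relative target sizes act as sampling weights for the affected trait.
        trait = rng.choices(["beta_m", "beta_p"], weights=[p_bm, p_bp])[0]
        paralog = rng.randrange(2)  # P1 and P2 are equally likely to mutate
        mutations.append((effect, trait, paralog))
    return mutations


# The same list can be applied under both models, so that the nth pair of
# each simulation receives identical mutations (sigma_mut is a made-up value).
shared = draw_mutations(2500, 0.025, p_bm=2, p_bp=1, rng=random.Random(1))
```

With weights of 2 and 1, about two thirds of the sampled mutations are expected to affect transcription.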

So that their impact on fitness can be assessed, the sampled mutational effects are applied to the transcription or translation rates of the designated gene copies across the minimal model and precision-economy simulations. Epistasis is assumed to be multiplicative, so that every mutational effect is relative to the current trait value as shown in Eq 14, where *δ*_{m} and *δ*_{p} respectively represent the transcriptional and translational magnitudes of the mutation. Because a given mutation affects *β*_{m} or *β*_{p} but not both simultaneously, at least one of *δ*_{m} and *δ*_{p} is null every time a mutational effect is applied.

The new transcription and translation rates are used to compute mutant fitness values *F*_{j} for each paralog pair of the two simulations, according to the specifications of the minimal and precision-economy models. Thus, while the same mutations are attempted across the two models, they do not necessarily have the same fitness effects in both scenarios.

Mutant fitness *F*_{j} is compared to ancestral fitness values *F*_{i} computed using the pre-mutation *β*_{m} and *β*_{p} of all duplicates and a fixation probability *P*_{fix} is calculated for each mutation. This is done using the Metropolis criterion to accelerate the simulation [46]. Following Eq 15, any beneficial or completely neutral mutation is automatically accepted, while deleterious mutations are fixed with a probability which decreases exponentially according to the magnitude of their fitness effect.

Prior to this fixation probability calculation, all fitness values – which are growth rates between 0 and 0.42 h^{−1} – are scaled between 0 and 1. The same set of randomly generated floats is used in every simulation to decide whether each mutation is rejected or reaches fixation according to *P*_{fix}. Any mutation resulting in *F*_{j} < 0 is automatically rejected. In addition, no mutation taking *β*_{m} or *β*_{p} to 0 or increasing either above the maximal value reported by [25] is tolerated, irrespective of its fitness effect. Once the fate of each mutation has been established, the new transcription and translation rates of all simulated paralogs are set and this process of mutation-selection is repeated.

Simulations are stopped as soon as they have resulted in a magnitude of protein abundance divergence consistent with what is observed for the extant pairs of yeast paralogs. Following each mutation-selection round, the *log*_{2}-fold change of protein abundance is calculated for every simulated duplicate pair using Eq 1. Mood’s median test is used to compare this distribution to its empirical equivalent for real paralog pairs, computed from estimated protein abundances obtained from reported *β*_{m} and *β*_{p} values (Eq 5). Once a p-value > 0.1 is obtained, the simulated protein abundance divergence is considered to have reached a realistic magnitude and the simulation is stopped. This is done separately for the two models, so that their respective simulations may not be completed in the same number of rounds.
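The acceptance step of the mutation-selection rounds can be sketched as below. The exact expression of Eq 15 is not reproduced in this section, so the exponent `n_eff * (f_mut - f_anc)` is only an assumed scaling with selection efficacy, and the function names are hypothetical.

```python
import math


def fixation_probability(f_anc, f_mut, n_eff):
    """Metropolis-style acceptance on fitness values scaled to [0, 1].

    The exponent n_eff * (f_mut - f_anc) is an assumption of this sketch;
    the form actually used in the study is given by its Eq 15.
    """
    if f_mut < 0:
        return 0.0  # mutations giving negative fitness are always rejected
    if f_mut >= f_anc:
        return 1.0  # beneficial or neutral mutations are always accepted
    return math.exp(n_eff * (f_mut - f_anc))


def accept(f_anc, f_mut, n_eff, u):
    """Compare the fixation probability to a pre-drawn uniform float u."""
    return u < fixation_probability(f_anc, f_mut, n_eff)
```

Passing the uniform float `u` as an argument, rather than drawing it inside the function, mirrors the reuse of one set of random floats across simulations.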

#### Implementing asymmetrically distributed mutational effects

So that mutations increasing or decreasing expression could occur with different frequencies, the mutation-selection framework described above was slightly modified. The normal distribution from which mutational effects are sampled was replaced by a skew normal distribution of mean 0, standard deviation *σ*_{mut} and skewness parameter *α* ≠ 0. No changes were made to the steps of mutation assignment and selection.
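A dependency-free way to draw such asymmetric effects is sketched below. It assumes that "mean 0" and "standard deviation *σ*_{mut}" refer to the actual moments of the distribution, rather than to location and scale parameters, which the text does not specify. The function name is hypothetical, and the study itself used SciPy.

```python
import math
import random


def skew_normal_effect(sigma_mut, alpha, rng):
    """Draw from a skew normal with mean 0, sd sigma_mut and shape alpha."""
    delta = alpha / math.sqrt(1 + alpha ** 2)
    # Standard skew-normal variate: delta*|Z0| + sqrt(1 - delta^2)*Z1.
    z = (delta * abs(rng.gauss(0, 1))
         + math.sqrt(1 - delta ** 2) * rng.gauss(0, 1))
    # Mean and sd of the standard variate, used to recentre and rescale it.
    mean_z = delta * math.sqrt(2 / math.pi)
    sd_z = math.sqrt(1 - 2 * delta ** 2 / math.pi)
    return sigma_mut * (z - mean_z) / sd_z
```

With a negative `alpha`, the distribution has a longer tail towards expression-decreasing effects while its mean stays at 0.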

#### Implementing correlated mutational effects

The sampling of mutational effects was modified to consider a bivariate normal distribution of means 0 and standard deviations *σ*_{βm} and *σ*_{βp}. Within this new framework, the step of assigning mutations to one level of regulation or the other with relative probabilities *P*_{βm} and *P*_{βp} has to be skipped, because each mutation affects transcription and translation simultaneously. Consequently, mutational target size differences were modeled differently: the magnitudes of transcriptional and translational mutational effects were used instead of their relative probabilities. A larger mutational target size was thus implemented as a higher corresponding standard deviation of effects. For instance, a target size twice as large for transcription as for translation was modeled as *σ*_{βm} = 2*σ*_{βp}.

The precise values of *σ*_{βm} and *σ*_{βp} were set to ensure a roughly constant expected protein abundance change per mutation across all simulations. To this end, an additional optimization step was added to the initialization of simulations. During this step, a brute-force optimization approach – the *brute* method of the *optimize* suite of the *SciPy* module [75] – is used to find the *σ*_{βm} and *σ*_{βp} values which satisfy the desired mutational target size ratio while resulting in the same mean absolute protein abundance change per mutation as a chosen reference *σ*_{mut} in the general framework. From the two resulting standard deviations, the covariance matrix of the bivariate normal distribution is computed, according to a specified correlation coefficient *r*_{mut} between *δ*_{m} and *δ*_{p}.

#### Assessment of the simulations

Once the two simulations are completed, summary statistics are computed for each of them. The resulting distributions of *log*_{2}-fold changes (Eq 1) in transcription rates, translation rates and protein abundance are compared to their empirical counterparts using the two-sample KS test and Mood’s median test. Depending on the simulation runs, this comparison is made with the set of WGD- or SSD-derived true paralog pairs, or both. The two previously described correlations are also computed for the gene pairs obtained from each simulation.

These steps are additionally repeated when considering only gene pairs which would have remained as duplicates even if loss-of-function mutations had been allowed. Summary statistics are thus recalculated for the subset of simulated pairs for which the loss of either paralog would be deleterious (decreases fitness by more than ).

### Simulation runs

Runs of the simulation script were parallelized on a computing cluster using *GNU parallel* [76]. Except in one instance (Fig 3), all simulations were done in three replicates of 2500 paralog pairs, using the same set of three random seeds throughout. In addition, except when noted otherwise (Fig S13), the WGD-derived paralogs of *S. cerevisiae* were taken as a reference for the evaluation of the end condition and for the calculation of summary statistics. Most simulations were repeated for the same set of mutational target size ratios (*P*_{βm}/*P*_{βp} ∈ {1/2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}), unless described differently. The details of each set of simulations are described in the corresponding figure legends and text.

#### Identification of the best-fitting variance of mutational effects

A grid search of *σ*_{mut} and mutational target sizes ratio *P*_{βm}/*P*_{βp} was performed, as previously described. This grid search process was done twice, under the assumption of high (*N* = 10^{6}) and reduced (*N* = 10^{5}) selection efficacy. The best-fitting *σ*_{mut} was identified individually for each value of *N*. It was defined as the value which, for any of the two models (minimal and precision-economy), resulted in the lowest overall grand mean KS statistic (mean KS value for *β*_{m}, *β*_{p} and protein abundance across the three replicate simulations) for any *P*_{βm}/*P*_{βp}.

#### Impact of asymmetrically distributed mutational effects

Further simulations were done using the modified mutation-selection approach which samples mutations from a skew normal distribution. These simulations, which considered a range of negative values for skewness parameter *α*, were performed at the best-fitting *σ*_{mut} initially identified for WGD-derived paralogs, successively for *N* = 10^{6} and for *N* = 10^{5}.

#### Impact of correlations between transcriptional and translational mutations

The grid search of evolutionary parameters was repeated for the mutation-selection framework adapted to use a bivariate distribution of mutational effects. The range of *σ*_{mut} values previously considered was used as reference, according to which the standard deviations *σ*_{βm} and *σ*_{βp} were calculated individually for each simulation. This grid search was repeated for the two levels of selection efficacy *N* (Fig S10). In each case, the best-fitting reference *σ*_{mut} was identified using the same definition as previously and used in subsequent simulations combining a range of correlation coefficients *r*_{mut} between the transcriptional and translational effects of mutations (Fig 5, S11–S12).

## Code availability

All data preparation, simulations and analysis were done in *Python* (v 3.10 for simulations and v 3.8.16 for analyses done in notebooks), with all statistical tests being performed using their *SciPy* (v 1.8.0 for simulations and v 1.7.3 for analyses done in notebooks) implementation. The corresponding code, as well as all command lines used to perform the simulations, have been deposited on GitHub: https://github.com/Landrylab/Aube_et_al_2022.

## Supplementary Material

### Supplementary figures

## Extended Methods

### Defining gene-specific fitness functions

We defined the relationship between fitness and the expression level (protein abundance) of any gene using parabolic functions. Each such function *W*(*p*) is described by its vertex (*p*_{opt}, *μ*), where the highest fitness is obtained for an optimal protein abundance, and a noise sensitivity *Q* [25] related to the curvature *W*″(*p*_{opt}) of the parabola (Eq 16).

Since the curvature of the function can be isolated from the previous equation, *Q* can be used to compute the *a* parameter of the parabola, and from there, obtain the full equation of the standard form *ax*^{2} + *bx* + *c*:

Following a duplication event, the new fitness function of cumulative protein abundance *W*(*p*_{1} + *p*_{2}) is defined from the *W*(*p*) function of the ancestral gene. Only one parameter is modified, *p*_{opt}, which is multiplied by 1.87 (see below). The post-duplication fitness function thus becomes:
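As a generic illustration of the parabolic form used in this subsection, the sketch below expands a vertex-form parabola with vertex (*p*_{opt}, *μ*) into the standard-form coefficients. The curvature coefficient `a` is taken as an input: its computation from *Q* follows Eq 16 and is not reproduced here. The function names are introduced only for this example.

```python
def standard_form(a, p_opt, mu):
    """Expand W(p) = a * (p - p_opt)**2 + mu into coefficients (a, b, c)."""
    b = -2.0 * a * p_opt
    c = a * p_opt**2 + mu
    return a, b, c


def fitness(p, a, p_opt, mu):
    """Evaluate the parabolic fitness function at protein abundance p."""
    a, b, c = standard_form(a, p_opt, mu)
    return a * p**2 + b * p + c


# Toy values: with a < 0, fitness is maximal (equal to mu) at p = p_opt.
print(standard_form(-2.0, 3.0, 5.0))  # (-2.0, 12.0, -13.0)
print(fitness(3.0, -2.0, 3.0, 5.0))   # 5.0
```

A post-duplication function can be evaluated in the same way by passing the rescaled optimum (1.87 times the ancestral *p*_{opt}) as `p_opt` and the cumulative abundance *p*_{1} + *p*_{2} as `p`, with `a` set to whatever value the post-duplication function prescribes.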

### Selecting the post-duplication change in optimal protein abundance

After a gene duplication event, the total transcription of the resulting gene pair is the ancestral transcription rate of the singleton times a factor Δ_{m}. Similarly, the optimal cumulative protein abundance for the two paralogs is Δ_{opt} times the original optimal expression *p*_{opt} of the ancestral singleton. From Eq 18, a strictly positive post-duplication fitness is thus obtained when:

Multiplying by Δ_{opt}, an expression of the form is obtained:

To solve for Δ_{opt}, we consider the most extreme case: an ancestral gene with the highest possible noise sensitivity and protein abundance. Accordingly, *Q* is set to the highest possible value within the framework of [25] (~ 6.8588 × 10^{−6}) and *p*_{opt} is set to the highest expression level observed in the dataset (~ 6.0649 × 10^{6} proteins per cell). Constant Δ_{m} is set to 2, meaning a doubling of total transcription, and a maximum growth rate *μ* of 0.42 h^{−1} is considered. The following bounds are obtained:

In accordance with this result, we used Δ_{opt} = 1.87 throughout the current work. For all the random seeds used in the simulations, this value, obtained for the minimal model, was also valid for the precision-economy model.

### Estimating expression noise for a protein expressed from a pair of paralogous genes

As mentioned in the main text, the variance of protein abundance for a single-copy gene can be estimated as:

In order to obtain a similar equation for a pair of identical paralogs expressing the same protein, the extrinsic and intrinsic components of noise must be treated separately. Because two duplicate genes are by definition present in the same cell, extrinsic fluctuations will be equal for both of them (as we assume they are identical and thereby share all regulators), while intrinsic fluctuations will independently affect the expression level of each copy. As it does not depend on any gene-specific property, the noise floor *c*_{v0} is chosen as the extrinsic component (Eq 22). Although recent modeling work indicates that this noise floor is extrinsic in nature [77], we note that it might still not fully represent extrinsic noise.

The variance on the cumulative protein abundance of *P*_{1} and *P*_{2} can be obtained from the variances of their individual protein abundances. In order to perform this calculation, the fluctuations from mean protein abundance across a population of cells can be seen as a random variable with mean 0 and variance *σ*^{2}. As shown above (Eq 22), this random variable is itself the sum of two other random variables representing the intrinsic and extrinsic components of these fluctuations. In turn, these two components are each a sum of the respective contributions of both paralogs. For intrinsic noise, the cumulative variance is the sum of the intrinsic variances respectively calculated for each duplicate gene. By definition, intrinsic fluctuations are uncorrelated between duplicates, meaning that the intrinsic components are two independent variables and that their variances can be summed. In contrast, extrinsic fluctuations are the same for two identical paralogs within the same cell, resulting in the extrinsic components of protein abundance variance for *P*_{1} and *P*_{2} being two perfectly positively correlated variables. Their cumulative variance is thus the square of the sum of their standard deviations. Accordingly, the variance of cumulative protein abundance for a duplicate couple is obtained using the following equation:
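The combination rule described in this paragraph can be sketched as follows. The function takes the intrinsic variances and extrinsic standard deviations of each paralog as inputs; how these are computed from expression parameters follows the equations of this section and is not reproduced here. The function and argument names are introduced only for this example.

```python
def cumulative_variance(var_int_1, var_int_2, sd_ext_1, sd_ext_2):
    """Variance of P1 + P2 for two identical paralogs in the same cell.

    Intrinsic components are independent, so their variances add.
    Extrinsic components are perfectly positively correlated, so their
    standard deviations add before squaring.
    """
    intrinsic = var_int_1 + var_int_2
    extrinsic = (sd_ext_1 + sd_ext_2) ** 2
    return intrinsic + extrinsic


# Toy values: intrinsic variances 4 and 9, extrinsic standard deviations 1 and 2.
print(cumulative_variance(4.0, 9.0, 1.0, 2.0))  # 13 + 3**2 = 22.0
```

Treating the extrinsic components as independent would instead give 4 + 9 + 1 + 4 = 18 for the same toy values, which shows how shared fluctuations inflate the variance of the cumulative abundance.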

### Selection of valid ancestral genes

During the generation of ancestral singletons, a minimal threshold of fitness function curvature is enforced. This ensures that all selected genes are sensitive enough to changes in protein abundance for the immediate post-duplication loss of a paralog to be deleterious. A duplicate pair for which this would not be the case would rapidly revert to the singleton state.

Classical population genetics theory indicates that a mutation needs to cause a loss of fitness greater than the inverse of the effective population size to be efficiently selected against. Accordingly, we want to identify conditions under which the loss of a paralog immediately after duplication would reduce fitness by more than 1/*N*. That is:
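As a simple illustration of this threshold, the sketch below compares a fitness loss to 1/*N* for the two effective population sizes considered in the simulations. The value of the fitness loss is hypothetical, and the function name is introduced only for this example.

```python
def is_efficiently_selected(fitness_loss, n_eff):
    """True if a fitness loss exceeds the 1/N threshold of selection efficacy."""
    return fitness_loss > 1.0 / n_eff


# Hypothetical fitness loss caused by the loss of one paralog.
loss = 5e-6
for n_eff in (10**6, 10**5):
    print(n_eff, is_efficiently_selected(loss, n_eff))
# 1000000 True
# 100000 False
```

The same loss is thus efficiently selected against at *N* = 10^{6} but not at *N* = 10^{5}, which is why the filtering of ancestral genes depends on the assumed selection efficacy.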

For the filtering of singleton genes, it is more convenient to express *p*_{tot} as twice the ancestral protein abundance optimum *p*_{opt}. Using the parabola of form that is the fitness function *W*(*p*_{opt}) and adding constants Δ_{m} and Δ_{opt} – describing the post-duplication change of total transcription and optimal cumulative protein abundance, respectively – to generalize to any duplication, we obtain:

Summing and simplifying, we obtain the following expression:

Accordingly, all ancestral singletons included in the current simulations combine a noise sensitivity *Q* and a protein abundance optimum *p*_{opt} which satisfy the following condition:

This condition is only valid when *c*_{n} > 0, which implies that . In accordance with this, all simulations presented in the current work are done under .

## Acknowledgments

We thank all Landry lab members, and especially Angel Cisneros, Philippe Després and Johan Hallin, for insightful journal club discussions which led to this project. This work was funded by an NSERC Discovery Grant to CRL (RGPIN-2020-04844). SA was supported by graduate scholarships from NSERC and FRQNT, as well as a PBLDD scholarship from Université Laval. LNT was supported by an Alexander Graham Bell doctoral scholarship from NSERC. CRL holds the Canada Research Chair in Cellular Systems and Synthetic Biology.

## Footnotes

Additional analyses were performed to validate the observation that yeast paralog pairs are more divergent in transcription than in translation, which is central to the modeling and simulation work presented in the manuscript. The paper was also largely rewritten, to make it clearer.