SUMMARY
The extent to which chance and contingency shaped the sequence outcomes of protein evolution is largely unknown. To directly characterize the causes and consequences of chance and contingency, we combined directed evolution with ancestral protein reconstruction. By repeatedly selecting a phylogenetic series of ancestral proteins in the B-cell lymphoma-2 family to evolve the same protein-protein interaction specificities that existed during history, we show that contingency and chance interact to make sequence evolution almost entirely unpredictable over the timescale of metazoan evolution. At any historical moment, multiple sets of mutations can alter or maintain specificity, and chance decides which ones occur. Contingency arises because historical sequence substitutions epistatically altered which mutations are compatible with new or ancestral functions. Evolutionary trajectories launched from different ancestors therefore lead to dramatically different outcomes over phylogenetic time, with virtually no mutations occurring repeatedly in distantly related proteins, even under identical selection conditions.
INTRODUCTION
Whether the world is a necessary outcome of fate-like processes has long fascinated philosophers, historians, and scientists (Aristotle, 1938; Gould, 1989; Jablonski, 2017; Ramsey and Pence, 2016; Travisano et al., 1995). In biology, if a single optimal state is accessible from all starting points, then natural selection is expected to predictably drive the realization of that state, deterministically and independently of initial conditions. Chance and contingency reduce the necessity of evolutionary outcomes in distinct ways and arise from different aspects of the genotype-phenotype relationship. If multiple adaptive outcomes are accessible from a given genetic starting point, then which one is realized depends on chance; a process free of chance is deterministic and predictably leads to the same outcome again and again (Lobkovsky and Koonin, 2012; Monod, 1972). If the accessibility of outcomes differs among starting genotypes, then the realized outcome of evolution is contingent on the particular starting point from which a historical trajectory is initiated; even in the absence of chance, the course of evolution can be predicted only given full knowledge of the starting genotype and the particular outcomes to which it can lead (Blount et al., 2018).
To systematically characterize how chance and contingency affected historical protein evolution, we would have to travel back in time, re-launch evolution repeatedly from each of various starting points that arose during history, and compare the outcomes realized in replicates from the same starting point (for chance) and from different starting points (for contingency) (Figure 1A). This design cannot be executed directly, but we can come close by reconstructing ancestral proteins as they existed in the deep past (Thornton, 2004) and using them to launch replicated evolutionary trajectories in the laboratory under selection to evolve the same molecular functions that they acquired during history. Studies to date have used less direct approaches to provide partial insights into the effects of chance and contingency on the repeatability of sequence evolution (Orgogozo, 2015; Storz, 2016). Experimental studies have typically replicated trajectories from just one or a few starting points (Bollback and Huelsenbeck, 2009; Counago et al., 2006; Dickinson et al., 2013; Kacar et al., 2017; Meyer et al., 2012; Zheng et al., 2019) – a design that can illuminate the effect of chance but is limited in addressing contingency; moreover, these studies have not used ancestral proteins ordered across time to make their findings historically relevant (Baier et al., 2019; Blount et al., 2012; Kryazhimskiy et al., 2014; Salverda et al., 2011; Wunsche et al., 2017). Mechanistic studies have established that particular historical mutations have different effects when introduced into different ancestral backgrounds, suggesting contingency, but history happened only once, so these studies do not elucidate the role of chance (Gong et al., 2013; Harms and Thornton, 2014; Ortlund et al., 2007; Starr et al., 2018). 
Case studies show that phenotypic convergence by closely related populations or species in nature sometimes involves the same gene, and occasionally the same mutations, suggesting some degree of necessity; however, these studies do not involve replicate lineages from multiple starting points, so they do not allow chance and contingency to be disentangled (Orgogozo, 2015; Storz, 2016). It is therefore clear that chance and contingency can affect the sequence outcomes of protein evolution, but the extent of their effects, interactions, and causes at historical timescales remains largely unknown.
Here we combine ancestral protein reconstruction with repeated experimental evolution in the B-cell lymphoma-2 (BCL-2) family of proteins. These proteins promote or inhibit apoptosis in eukaryotes (Chipuk et al., 2010; Danial and Korsmeyer, 2004) and are regulated by specific protein-protein interactions (PPIs) with coregulators (Chen et al., 2005; Lomonosova and Chinnadurai, 2008), the structural and biochemical basis of which is well understood (Kale et al., 2018; Petros et al., 2004). BCL-2 family proteins are found throughout Metazoa (Banjara et al., 2020; Lanave et al., 2004) and differ from each other in PPI specificity: proteins in the Myeloid Cell Leukemia Sequence 1 (MCL-1) class bind both the BID and NOXA coregulators, whereas proteins in the BCL-2 class (a subset of the larger BCL-2 protein family) bind BID but not NOXA (Figure 1B) (Certo et al., 2006). The two classes are structurally similar and use the same cleft to interact with their coregulators (Figures 1C and S1). Prior work has identified some determinants of coregulator binding and specificity (Dutta et al., 2010), but little is known about how the MCL-1 and BCL-2 classes evolved their distinct binding profiles during history. We used ancestral protein reconstruction to characterize the historical trajectory by which these proteins evolved their coregulator specificities and then applied a rapid experimental evolution technique to repeatedly select reconstructed ancestral family members to acquire these same functions. This approach allowed us to quantitatively dissect the roles of chance, contingency, and necessity in the evolution of PPI specificity in this protein family.
RESULTS
BID specificity is derived from an ancestor that bound both BID and NOXA
We first characterized the historical evolution of PPI specificity in the BCL-2 family using ancestral protein reconstruction. We inferred the maximum likelihood phylogeny of the family, recovering the expected sister relationship between the metazoan BCL-2 and MCL-1 classes (Figure 2, Figure S2A). We then reconstructed the most recent common ancestor (AncMB1) of the two classes, which represents a gene duplication that occurred before the last common ancestor of all animals, by inferring the posterior probability distribution of ancestral states and the maximum a posteriori (MAP) amino acid sequence (Supplementary Table S1). We also reconstructed 11 other ancestral proteins that existed along the lineages leading from AncMB1 to human BCL-2 (hsBCL-2) and to human MCL-1 (hsMCL-1).
We then synthesized genes coding for these proteins and experimentally assayed their ability to bind BID and NOXA using a proximity-dependent split RNA polymerase (RNAP) luciferase assay (Figures 1D and 1E) (Pu et al., 2017). AncMB1, the deepest ancestor reconstructed, bound both BID and NOXA, as did all ancestral proteins in the MCL-1 clade and extant MCL-1 from humans (Figure 2, Figure S2, Table S1). Ancestral proteins in the BCL-2 clade that existed before the last common ancestor of deuterostomes also bound both BID and NOXA, whereas BCL-2 ancestors within the deuterostomes bound only BID, just as human BCL-2 does. BID specificity therefore evolved when the ancestral ability to bind NOXA was lost. This event occurred between AncB2 (in the ancestral eumetazoan) and AncB4 (in the ancestral deuterostome): the precise timing of the loss of NOXA binding during this interval cannot be resolved by these experiments, because AncB3 (the protostome-deuterostome ancestor) had an intermediate phenotype, binding NOXA weakly.
To further test this history, we characterized the coregulator specificity of extant BCL-2 class proteins from taxonomic groups in particularly informative phylogenetic positions. Those from Cnidaria were activated by both BID and NOXA, whereas those from protostomes and invertebrate deuterostomes were BID-specific (Figure 2, Figure S2, Table S1). These results corroborate the inferences made from ancestral proteins and suggest that the loss of NOXA binding occurred in the BCL-2 class between the last common ancestor of Eumetazoa and the protostome-deuterostome ancestor. This reconstruction of history was also robust to uncertainty in the ancestral sequence reconstruction, because experiments on “AltAll” proteins at each ancestral node, which combine all plausible alternative amino acid states (PP>0.2) in a single “worst-case” alternative reconstruction, also showed that BID specificity arose between AncB2 and AncB4 (Table S2).
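The AltAll construction described above can be illustrated with a minimal sketch. It assumes that, at each site, the alternative reconstruction takes the second most probable state whenever its posterior probability exceeds 0.2 and otherwise keeps the MAP state. The function names and the toy posterior table are hypothetical and are not taken from the analysis code.

```python
from typing import Dict, List

# One dict per alignment site: amino acid -> posterior probability.
# The values used below are invented purely for illustration.
Posterior = List[Dict[str, float]]


def map_sequence(posterior: Posterior) -> str:
    """Return the most probable amino acid at every site (the MAP sequence)."""
    return "".join(max(site, key=site.get) for site in posterior)


def altall_sequence(posterior: Posterior, threshold: float = 0.2) -> str:
    """Return an AltAll-style sequence.

    At each site, use the second most probable state if its posterior
    probability exceeds `threshold`; otherwise keep the MAP state.
    """
    residues = []
    for site in posterior:
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[1][1] > threshold:
            residues.append(ranked[1][0])  # plausible alternative state
        else:
            residues.append(ranked[0][0])  # unambiguous site: keep MAP state
    return "".join(residues)


toy_posterior = [
    {"L": 0.95, "I": 0.05},             # unambiguous
    {"R": 0.60, "K": 0.35, "Q": 0.05},  # ambiguous: K is a plausible alternative
    {"F": 0.79, "Y": 0.21},             # ambiguous: Y just exceeds 0.2
    {"G": 1.00},                        # invariant
]

print(map_sequence(toy_posterior))     # MAP reconstruction
print(altall_sequence(toy_posterior))  # "worst-case" alternative reconstruction
```

Because every ambiguous site is switched at once, the AltAll protein is the reconstruction that differs most from the MAP sequence within the set of plausible alternatives, which is why agreement between the two is a conservative test of robustness.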
A directed continuous evolution system for the rapid, repeated selection of PPI specificity
We next developed a new phage-assisted continuous evolution (PACE) system (Esvelt et al., 2011) to rapidly evolve ancestral and extant BCL-2 family proteins to acquire the same PPI specificities that existed during the family’s history (Figures 3A, S3A, and S3B). In general, PACE links the life cycle of M13 bacteriophage to the evolution of a desired function through the inducible expression of an essential phage gene, gIII. By focusing variation and selection on a particular protein of interest, PACE typically allows evolution of a new function in very large, replicated populations over the course of just a few days.
Previous PACE experiments have evolved binding to new protein partners using a bacterial 2-hybrid approach (Badran et al., 2016). Evolving novel PPI specificity, however, requires simultaneous selection for a desired PPI and against an undesired PPI. We therefore developed a PACE system that uses two orthogonal proximity-dependent split RNAPs to impose both selection and counterselection on defined PPIs (Pu et al., 2017) (Figure 3A). In our system, the N-terminal portion of RNAP is fused to the BCL-2 family protein of interest. The target protein to which binding is desired is fused to the C-terminal portion of an RNAP that recognizes a promoter linked to gIII; binding of the BCL-2 family protein to the target protein reconstitutes this RNAP, causing gIII to be expressed and infectious phage to be produced. To achieve specificity, the protein to which no binding is desired is fused to the C-terminal portion of a different RNAP, which recognizes a promoter that drives expression of a dominant negative form of gIII (gIIIneg); binding of the BCL-2 family protein to this counterselection protein reconstitutes this second RNAP, producing gIIIneg and causing virtually all phage produced in the system to be incapable of infection (Carlson et al., 2014). To increase the mutation rate within the PACE system, an arabinose-inducible mutagenesis plasmid (MP) was included in host cells.
We optimized this system using activity-dependent plaque assays and phage growth assays to drive acquisition of the natural binding profiles of BCL-2 and MCL-1 proteins. We showed that phage carrying either hsBCL-2 or hsMCL-1 could replicate when BID binding was selected for. Conversely, phage carrying hsBCL-2 could replicate when BID binding was selected for and NOXA binding was selected against, but phage carrying hsMCL-1 could not (Figures 3B and 3C). Likewise, phage carrying hsMCL-1 could replicate when binding NOXA was selected for, whereas phage carrying hsBCL-2 could not. Finally, neither phage could replicate when NOXA binding was selected for and BID binding was selected against.
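The outcomes of these four selection regimes follow a simple Boolean rule, sketched below. This is an idealization rather than a model of the actual system: it treats binding as all-or-none, ignores phage growth kinetics, and uses hypothetical names (`phage_replicates`, `BINDING`). The binding profiles are those stated for the two classes: hsMCL-1 binds BID and NOXA, whereas hsBCL-2 binds BID only.

```python
from typing import Optional, Set

# Idealized all-or-none binding profiles of the two extant proteins.
BINDING = {
    "hsBCL-2": {"BID"},
    "hsMCL-1": {"BID", "NOXA"},
}


def phage_replicates(binds: Set[str],
                     selected_for: str,
                     selected_against: Optional[str] = None) -> bool:
    """Idealized dual-selection rule.

    Binding the selected-for partner drives gIII expression, so phage are
    infectious. Binding the counterselected partner drives gIIIneg, which
    is dominant, so phage are not infectious.
    """
    makes_giii = selected_for in binds
    makes_giii_neg = selected_against is not None and selected_against in binds
    return makes_giii and not makes_giii_neg


regimes = [("BID", None), ("BID", "NOXA"), ("NOXA", None), ("NOXA", "BID")]
for selected_for, selected_against in regimes:
    for protein, binds in BINDING.items():
        outcome = phage_replicates(binds, selected_for, selected_against)
        print(f"+{selected_for} / -{selected_against}: {protein} replicates = {outcome}")
```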
Characterizing chance, contingency, and necessity
To characterize the extent of chance, contingency, and necessity on the outcomes of evolution, we used our PACE system to drive extant and reconstructed ancestral proteins to repeatedly recapitulate or reverse the historical loss of NOXA binding. Three proteins with the ancestral phenotype—hsMCL-1, AncM6, and AncB1—were selected to acquire the derived BCL-2 phenotype, losing NOXA binding but retaining BID binding (Figure S3C). Conversely, hsBCL-2, AncB5, and AncB4 were evolved to revert to the ancestral phenotype, gaining NOXA binding (Figure S3D). For each starting genotype, we performed three to four replicate experimental evolution trajectories, for a total of 23 separate trajectories (Table S3). All trajectories produced the target phenotype, and we confirmed that the selected PPI specificity had been acquired by randomly isolated phage clones using activity-dependent plaque assays as well as in vivo and in vitro binding assays (Figures 3D and S3E-L).
We found very limited necessity in the sequence outcomes of directed evolution. High-throughput sequencing of the phage populations revealed that 100 mutant amino acid states (at 75 different sites) evolved to frequency >5% in at least one replicate (Figures 4A-B and S4A-D and Table S4). Of these acquired states, 73 occurred in only a single trajectory, although the number that appeared more than once (27) is significantly greater than expected by chance alone (P < 10⁻⁵ by permutation test, given the same number of mutations and replicates). Most of the repeated mutations were observed in replicate trajectories from the same starting genotype, indicating some degree of determinism in evolutionary outcomes from a given starting genotype (Figure 4C). Only four mutations were observed in more than one replicate from different starting genotypes, however, suggesting considerable contingency. Only a single mutation was observed in every replicate from more than one starting point, and this mutation was not experimentally sufficient to confer the desired phenotype (Figure S4E). When mapped onto the protein structure, all repeatedly mutated sites either directly contact the bound peptide or are on secondary structural elements that do so (Figures 4D and 4E), suggesting that evolutionary necessity in this protein family, to the extent that it exists, reflects a limited number of structural mechanisms by which PPIs can be altered.
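A permutation test of this kind can be sketched as follows. This is an illustrative assumption about the general approach, not the procedure used in the analysis: each trajectory keeps its observed number of mutant states, but the states are redrawn uniformly at random from a pool of possible states, and the number of states shared by more than one trajectory is compared with the observed count. The toy trajectories, the pool size, and the function names are all hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def count_repeated(trajectories: Sequence[Sequence]) -> int:
    """Number of distinct mutant states found in more than one trajectory."""
    counts = Counter(state for traj in trajectories for state in set(traj))
    return sum(1 for n in counts.values() if n > 1)


def permutation_p_value(trajectories: Sequence[Sequence],
                        pool_size: int,
                        n_perm: int = 2000,
                        seed: int = 0) -> Tuple[int, float]:
    """One-sided p-value for observing at least as many repeated states.

    Null model (an assumption): every trajectory draws its mutant states
    uniformly, without replacement, from `pool_size` possible states.
    """
    rng = random.Random(seed)
    observed = count_repeated(trajectories)
    sizes: List[int] = [len(set(traj)) for traj in trajectories]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        simulated = [rng.sample(range(pool_size), k) for k in sizes]
        if count_repeated(simulated) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Invented toy data: four trajectories, three mutant states each.
toy_trajectories = [
    ["v189G", "a10T", "k50E"],
    ["v189G", "r20S", "d70N"],
    ["v189G", "a10T", "e90K"],
    ["q5R", "d70N", "k50E"],
]
observed, p_value = permutation_p_value(toy_trajectories, pool_size=3000)
print(observed, p_value)
```

With this estimator, the smallest p-value that can be reported is 1/(n_perm + 1), so resolving a value below 10⁻⁵ would require more than 100,000 permutations.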
To quantify the effects of chance and contingency on the genetic outcomes of evolution, we analyzed the genetic variance (the probability that two alleles, chosen at random, are different in state) within and between replicates from the same and different starting genotypes. To estimate the effects of chance, we compared the genetic variance between replicates initiated from the same starting genotype (Vg) to the within-replicate genetic variance (Vr). We found that Vg was on average 1.3-fold greater than Vr, indicating that chance causes evolution to produce substantially divergent genetic outcomes (Figure 5A). We estimated the effects of contingency by comparing the genetic variance among replicates from different starting genotypes (Vt) to the genetic variance among replicates from the same starting genotype. Contingency had an even larger effect than chance, increasing Vt by an average of 1.8-fold compared to Vg. Together, chance and contingency had a multiplicative effect, increasing the genetic variance among replicates from different starting genotypes (Vt) by an average of 2.4-fold compared to Vr.
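These variance components can be made concrete with a minimal sketch. It assumes that each replicate is summarized by per-site allele frequencies, that the probability of two alleles differing is one minus the summed product of matching state frequencies, and that values are averaged over sites and over pairs of replicates. The data layout, the coding of states, the function names, and the toy frequencies are hypothetical and may differ from the analysis code.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

Site = Dict[str, float]   # state label -> frequency at one site
Replicate = List[Site]    # one entry per site


def prob_differ(p: Site, q: Site) -> float:
    """Probability that one allele drawn from p and one from q differ in state."""
    same = sum(freq * q.get(state, 0.0) for state, freq in p.items())
    return 1.0 - same


def pair_variance(rep_a: Replicate, rep_b: Replicate) -> float:
    """Average over sites of the probability that two alleles differ."""
    return mean(prob_differ(p, q) for p, q in zip(rep_a, rep_b))


def variance_components(data: Dict[str, List[Replicate]]) -> Tuple[float, float, float]:
    """Return (Vr, Vg, Vt): within replicates, between replicates from the
    same starting genotype, and between replicates from different ones."""
    reps = [(start, rep) for start, replicates in data.items() for rep in replicates]
    pairs = list(combinations(reps, 2))
    v_r = mean(pair_variance(rep, rep) for _, rep in reps)
    v_g = mean(pair_variance(a, b) for (sa, a), (sb, b) in pairs if sa == sb)
    v_t = mean(pair_variance(a, b) for (sa, a), (sb, b) in pairs if sa != sb)
    return v_r, v_g, v_t


# Invented toy data: two starting genotypes, two replicates each, two sites.
toy = {
    "startA": [
        [{"wt": 0.5, "m1": 0.5}, {"wt": 1.0}],
        [{"wt": 0.5, "m2": 0.5}, {"wt": 1.0}],
    ],
    "startB": [
        [{"wt": 1.0}, {"wt": 0.5, "m3": 0.5}],
        [{"wt": 1.0}, {"wt": 0.5, "m3": 0.5}],
    ],
}
v_r, v_g, v_t = variance_components(toy)
chance = v_g / v_r        # effect of chance
contingency = v_t / v_g   # effect of contingency
combined = v_t / v_r      # equals chance * contingency by construction
print(chance, contingency, combined)
```

For a single set of variances, the combined effect Vt/Vr is exactly the product of the chance effect Vg/Vr and the contingency effect Vt/Vg. The reported averages are consistent with this: 1.3 × 1.8 ≈ 2.3, close to the 2.4-fold combined effect; averages of ratios taken across starting genotypes need not multiply exactly.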
To determine the evolutionary tempo and mode of chance and contingency, we evaluated how chance and contingency changed as the phylogenetic distance between starting genotypes increased. Across all pairs of starting genotypes, the combined effects of chance and contingency increased significantly with phylogenetic distance (slope = 0.208, P = 1×10⁻⁴) (Figures 5B and S5A); this relationship remained statistically significant after accounting for phylogenetic non-independence by comparing only successive ancestors along phylogenetic lineages (P = 0.03). Across the metazoan evolutionary timescale of our experiments, contingency and chance together increased genetic variance more than three-fold. This relationship was driven almost entirely by an increase in the effects of contingency with phylogenetic distance (slope = 0.128, P = 5×10⁻⁴), with the effects of chance displaying a very weak increase (slope = 0.019, P = 0.002). The combined effect of chance and contingency increased faster with phylogenetic distance than contingency alone did. Chance and contingency therefore magnify each other’s effects, increasing unpredictability and reducing necessity in evolutionary outcomes as sequences diverge through history.
Sources of contingency
Contingency arises when sequence outcomes differ between genetic starting points; this conditionality arises when epistasis exists – that is, when the effects of mutations depend on the sequence background into which they are introduced. Epistasis could produce contingency if mutations that confer selected phenotypes in some backgrounds fail to confer that phenotype in others or are not tolerated at all; alternatively, epistasis might merely change the probability that selection favors one mutation over another by altering the relative magnitude of their effect on the phenotype. To experimentally distinguish between these possibilities, we transferred sets of mutations that arose repeatedly during experimental evolution into other starting genotypes and then measured their effects on BID and NOXA binding (Figures 6A and 6B). Eleven of the 12 “swaps” failed to confer the PPI specificity on other genetic backgrounds that they did on their evolved genetic background. Of the six swaps of mutation sets that arose when MCL-1-like proteins were evolved to lose NOXA binding and preserve BID binding, five compromised BID binding and three had no effect on NOXA binding when introduced into other backgrounds. Of the six swaps of mutation sets that caused BCL-2-like proteins to gain NOXA binding, four failed to gain any detectable NOXA binding and one compromised BID binding in other backgrounds. The only case in which mutations that conferred the target phenotype during experimental evolution had the same effect in another background was the swap of mutations evolved on AncB5 into AncB4, and these two genotypes are more similar to each other than any other pair. 
Thus, contingency arose because of strong epistatic interactions between the mutations that confer new specificities in our experiments and the historical substitutions that occurred during the intervals between ancestral proteins; these substitutions transiently opened and blocked routes to adaptive phenotypes by making specificity-changing mutations subsequently—and previously—deleterious or invisible to selection.
To identify when these epistatic effects arose, we mapped these incompatibilities onto the phylogeny (Figure 6C). If mutations that arise in PACE using some ancestral protein as a starting point decrease BID binding when introduced into one of that protein’s descendants, then restrictive substitutions must have occurred historically on the branches between the two proteins. Conversely, if PACE-derived mutations compromise or abolish BID activity when introduced into a protein’s ancestor, then historical permissive substitutions occurred between them. Similarly, if PACE-derived mutations confer the selected change in NOXA activity in one protein but fail to have that effect in its ancestors, then potentiating substitutions must have occurred on the phylogeny. Finally, if PACE-derived mutations fail to have the effect on NOXA activity in a protein’s descendants, then depotentiating substitutions occurred during history.
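The four inference rules above can be written as a small decision function. This is a sketch of the logic only: it covers swaps between a protein and its direct ancestor or descendant, and the function and argument names are hypothetical.

```python
from typing import Set


def classify_swap(recipient_is: str,
                  bid_compromised: bool,
                  noxa_effect_conferred: bool) -> Set[str]:
    """Infer which kinds of historical substitutions lie between two proteins.

    recipient_is: "descendant" or "ancestor", the relationship of the
        recipient background to the protein in which the PACE mutations arose.
    bid_compromised: the swapped mutations decrease BID binding in the recipient.
    noxa_effect_conferred: the swapped mutations produce the selected change
        in NOXA binding in the recipient.
    """
    if recipient_is not in ("descendant", "ancestor"):
        raise ValueError("recipient_is must be 'descendant' or 'ancestor'")
    inferred = set()
    if bid_compromised:
        # Tolerated where they arose, but not in the recipient background.
        inferred.add("restrictive" if recipient_is == "descendant" else "permissive")
    if not noxa_effect_conferred:
        # Effective where they arose, but not in the recipient background.
        inferred.add("depotentiating" if recipient_is == "descendant" else "potentiating")
    return inferred


# Mutations that arose in an ancestor, tested in its descendant, where they
# leave BID binding intact but no longer change NOXA binding.
print(classify_swap("descendant", bid_compromised=False, noxa_effect_conferred=False))
```

A single swap can return two labels, because effects on BID binding and on NOXA binding are assessed independently.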
We found that all four types of epistatic effects occurred, with multiple types of contingency-inducing substitutions present on most branches. The only exception—the branch from AncB4 to AncB5, on which only depotentiating substitutions occurred—is the branch immediately after NOXA function changed during history, the shortest of the branches examined, and the one with the smallest effect of contingency on genetic variance (Figure 5A). Even across this branch, the PACE mutations that restore the ancestral PPI specificity in AncB4 can no longer do so in AncB5. These results indicate that the paths through sequence space that allow new NOXA functions to evolve repeatedly changed during the BCL-2 family’s history, even during intervals when the proteins’ PPI binding profiles did not change.
Consistent with this conclusion, the mutations that altered NOXA binding during PACE almost never recapitulated or reversed substitutions that occurred during the historical interval in which NOXA binding changed (Figures 5C and S5B). The sole exceptions, f160L and y259F, arose when AncB4 was evolved to regain the ancestral NOXA binding and are reversals to ancestral amino acids observed in AncB1, prior to the historic loss of NOXA binding. No PACE trajectories for the loss of NOXA binding recapitulated substitutions that occurred during this interval. These observations suggest that the substitutions that changed PPI specificity during historical evolution had the capacity to confer this function—and, if reverted, to restore the ancestral function—only during a very limited temporal window.
Sources of chance and determinism
For chance to strongly influence the outcomes of adaptive evolution, multiple paths to a selected phenotype must be accessible with reasonably similar probabilities of being taken; such a situation could arise if several different mutations (or sets of mutations) can confer a new function, or if there are mutations that have no effect on function that can accompany function-changing mutations by chance. To distinguish between these possibilities, we measured the functional effects of sets of mutations that arose in different replicates when hsMCL-1 was evolved to lose NOXA binding (Figures 6D and S6A). One mutation (v189G) was found at high frequency in all four replicates, but it was always accompanied by other mutations, which varied among trajectories. We found that the apparently deterministic mutation v189G was a major contributor to the loss of NOXA binding, but it had this effect only in the presence of the other mutations, which did not decrease NOXA binding on their own. v189G therefore required permissive mutations, and there are multiple sets of mutations that can exert that effect, so which ones occur in any replicate is a matter of chance. All permissive mutations were located near the NOXA binding cleft (Figure 6E). Other starting genotypes showed a similar pattern of multiple key mutations capable of conferring the selected function (Figures S6B and S6C).
To better understand the genetic causes of chance’s effects on BCL-2 family protein evolution, we performed PACE experiments in which we evolved hsBCL-2 to retain its BID binding, without selection for or against NOXA binding, and then screened for variants that fortuitously gained NOXA binding by activity-dependent plaque assay (Figures S7A and S7B). This strategy allowed us to distinguish between the number of mutations that can confer this phenotype and the influence of selection in favoring a subset of mutations that most rapidly increase NOXA binding. All four replicate populations produced clones that gained NOXA binding at a frequency of about 0.1% to 1%, lower than when NOXA binding was selected for, but five orders of magnitude higher than when NOXA binding was selected against (Figure 7A). From each replicate, we sequenced three NOXA-binding clones and found that all but one of them contained mutation r165L (Figure 7B), a mutation that also occurred at high frequency in hsBCL-2 trajectories when NOXA binding was selected for. We introduced r165L into hsBCL-2 and found that it conferred significant NOXA binding with little effect on BID binding (Figure S7C). Several other mutations that were acquired under selection for NOXA binding also appeared repeatedly in clones that fortuitously acquired NOXA binding. A similar pattern of common mutations was observed in AncB4 and AncB5 clones that fortuitously or selectively evolved NOXA binding (Figures S7F-J). These observations indicate that the determinism observed in these experiments arises because there are few genotypes that can increase NOXA binding while retaining BID binding, rather than because there are many such genotypes, but under strong selection a few are strongly favored over others.
Chance and contingency alter accessibility of new functions
Although we found a strong influence of chance and contingency in the sequence outcomes of evolution across ancestral starting points, they had little influence on the acquisition of the historical functions per se, because all replicates from all starting points acquired the selected BCL-2- or MCL-1-like specificity. However, changes in function that did not evolve during history might be subject to more chance or contingency. Using PACE and several proteins that can bind both BID and NOXA as starting points, we selected for variants that could bind NOXA but not BID, a phenotype not known in nature. hsMCL-1 readily evolved the selected phenotype, but two experimentally-evolved variants of hsBCL-2 that had acquired NOXA binding went extinct under the same selection conditions (Figures 7C and S7K-M). The inability of the derived hsBCL-2 genotypes to acquire NOXA specificity was not attributable to a general lack of evolvability, because these same genotypes successfully evolved in a separate PACE experiment to lose their NOXA binding but retain BID binding (Figure S7N). These results illustrate that contingency can influence the ability to evolve new functions.
DISCUSSION
Without chance, contingency in history would be inconsequential, because all phylogenetic lineages launched from a common ancestor would always lead to the same intermediate steps and thus the same ultimate outcomes. On the other hand, without contingency, chance events would be inconsequential, because the mutations that happen to occur in any time interval would not affect the next set of available steps or ultimate outcomes; every path that was ever open would remain forever so. Our work shows that, across the timescale of BCL-2 sequence evolution, the interaction of chance and contingency eliminates virtually all traces of necessity and repeatability in sequence evolution under strong selection (except for sites that are constrained and never vary).
To systematically assess chance and contingency in BCL-2 evolution, we developed an efficient new method to simultaneously select for particular interactions and against others, which allowed us to drive the evolution of new PPI specificities in multiple replicates without severe bottlenecks in just days. By applying this technology to reconstructed ancestral proteins, our experiments directly illuminate how the sequence divergence that occurred during BCL-2 historical evolution generates chance and contingency in the experimental evolution of the same functional specificities that evolved during the family’s history. Our design does not directly reveal chance and contingency during the historical evolution of these proteins, because the selection pressures, environmental conditions, and population parameters that pertained in the deep past are unknown. It is likely, however, that chance and contingency were at least as significant during history as in our experiments for several reasons. First, our experiments likely favor determinism in the evolutionary process due to the very large population sizes, extremely strong selection pressures, and high mutation rates, all of which were directed at a single gene; if, as seems likely, BCL-2 historical evolution involved smaller populations, weaker selection, lower mutation rates, and a larger genetic “target size” for adaptation, then chance would have played an even larger role during history than in our experiments. Second, chance and contingency both arise from the relationship between genotype and function—chance from the number of accessible genotypes that confer a selected-for function, and contingency from epistasis-induced differences in the accessibility of paths among starting genotypes. 
The fold and structural basis for binding of coregulatory proteins is conserved across extant BCL-2 family members, so it is unlikely that these factors, and the ways that they lead to chance and contingency, were dramatically different between the evolutionary processes underlying changes in BCL-2 specificity during history and in our experiments.
Epistasis is a common feature of protein structure and function, so we expect that the accumulating effect of contingency that we observed across phylogenetic time among BCL-2 family members will be a general feature of protein evolution, although its rate and extent are likely to vary among folds and functions (Chandler et al., 2013; Harms and Thornton, 2013; Shah et al., 2015; Storz, 2018). The influence of chance is likely to depend on the particular function that is evolving: more determinism is expected for functions with very narrow sequence-structure-function constraints (e.g., Hawkins et al., 2018; Karageorgi et al., 2019; Menendez-Arias, 2010; Meyer et al., 2012; Salverda et al., 2011; Storz, 2016) than for those for which sequence requirements are less strict (e.g., Blount et al., 2012; Starr et al., 2017; Yokoyama et al., 2008; Zheng et al., 2019). In the extreme, when diffuse selection pressures are imposed on whole organisms, virtually no repeatability has been observed among replicates, because loci across the entire genome are potential sources of adaptive mutations (Kryazhimskiy et al., 2014; Wunsche et al., 2017).
Our results have implications for protein biochemistry, engineering, and evolution. First, we found no evidence that ancestral proteins were more “evolvable” than extant proteins: the selected-for phenotypes readily evolved from both extant and ancestral proteins; further, the effect of chance in these processes was virtually constant across about a billion years of evolution, indicating that the number of accessible mutations that can confer a selected-for function was no greater in the past than it is in the present. Second, the strong effect of contingency suggests that efforts to produce proteins with new functions by design or directed evolution will be most effective if they use multiple different protein sequences as starting points, ideally separated by long intervals of sequence evolution. Third, our finding that affinity for new partners can evolve fortuitously during selection to maintain existing binding indicates that new interactions can sometimes be acquired neutrally and then, if they become functionally significant, be amplified or preserved by selection; conversely, maintaining specificity during natural and directed evolution requires selection against off-target interactions (Levin et al., 2009).
Finally, our observations suggest that the sequence-structure-function associations apparent in sequence alignments are, to a significant degree, the result of shared but contingent constraints that were produced by chance events during history (Gong et al., 2013; Harms and Thornton, 2014; Starr et al., 2018; Starr et al., 2017). Present-day proteins are physical anecdotes of a particular history: they reflect the interaction of accumulated chance events during descent from common ancestors with necessity imposed by physics, chemistry and natural selection. Apparent “design principles” in extant or evolved proteins express not how things must be—or even how they would be best—but rather the contingent legacy of the constraints and opportunities that those molecules just happen to have inherited.
AUTHOR CONTRIBUTIONS
All authors contributed to conception of the project. BCD, VCX, and JP designed the PACE dual-selection system, which VCX and JP engineered, optimized, and implemented. VCX and JP performed PACE, biochemical assays, and sequencing experiments, with input from BPHM. BPHM and JWT developed and designed the evolutionary and genetic analyses. BPHM led and performed the phylogenetic, genetic, and evolutionary analyses, with input from VCX and JP. All authors contributed to writing the manuscript.
DECLARATION OF INTERESTS
J.P. and B.C.D. have a patent on the proximity-dependent split RNAP technology used in this work. The content is solely the responsibility of the authors, and the funders had no input on the study design, analysis, or conclusions.
DATA AVAILABILITY
Raw high-throughput sequencing data are available on SRA (accession number PRJNA647218). Processed HTS data are available on Dryad (https://datadryad.org/stash/share/Ty32n-2c8nUZuiFegxDdy7wNeGDEsqlwtL8tOKz1RWU).
CODE AVAILABILITY
Code used for analysis and phylogenetic data are available at https://github.com/JoeThorntonLab/BCL2.ChanceAndContingency
Table S1 (Excel File). Luciferase assay data for all experiments. Related to Figures 1, 2, 3, 6, 7, S4, S6, S7.
Table S2 (Excel File). Posterior probabilities for reconstructed ancestral sequences. Related to Figure 2. For each sequence, the site, maximum likelihood (ML) amino acid state, and posterior probability (PP) are given, along with the highest posterior probability alternative (ALT) state and the posterior probability for this alternative state. Locations of paralog-specific insertions are shown as gaps. For each reconstructed sequence, the average posterior probability for the maximum likelihood states and the alternative states is given, as is the number of sites where the posterior probability of a non-maximum likelihood state is greater than 0.2. Finally, the average, maximum, minimum, and variance among reconstructed ancestors are given for the average maximum likelihood posterior probability and for the number of non-maximum likelihood states with posterior probability greater than 0.2.
Table S3 (Excel File). List of PACE experiments, amino acid alignments of hsBCL-2 and hsMCL-1 with their structural global alignment, and mutations found in individual variants isolated from PACE. fs is frameshift, aa is amino acid, co is codon change. Related to STAR Methods.
Table S4 (Excel file). PACE library and high-throughput sequencing (HTS) data. Related to STAR Methods. PACE experiments are listed in the tab “Library-info”, which contains the name, purpose of the experiment, and HTS experiment numbers. The tab “Primers for HTS” lists all the primer sequences used for HTS library construction. The tab “MiSeq reads number” includes the read number of each library in this MiSeq run and the library sample information. The library samples are labeled as X*-end or X*-$$. “X” indicates the specific PACE experiment, “*” the experimental replicate, “end” means samples were collected after 96 hours when the experiment finished, and “$$” indicates the time point after removing chemostat A (e.g., “B2-24” is a sample from replicate 2 of evolution B, collected 24 hours after removing chemostat A, which is 72 hours from the start of PACE). The tab “genotype” includes the aligned protein sequences with corresponding residue numbers. The tab “Frequency” contains the non-wildtype amino acid frequency of each sample for each site.
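The sample-label convention described for Table S4 can be decoded programmatically. The following is a minimal sketch in Python, not code from the authors' repository. It assumes that experiment names consist of letters and replicates of digits, and that chemostat A is removed 48 hours after the start of PACE; the latter is inferred only from the “B2-24” example (24 hours after removal corresponds to 72 hours from the start). The names `parse_label` and `SampleLabel` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption inferred from the "B2-24" example in the legend:
# 24 h after removing chemostat A == 72 h from the start of PACE.
CHEMOSTAT_A_REMOVAL_H = 48
# Stated in the legend: "end" samples were collected after 96 hours.
END_H = 96


class SampleLabel(NamedTuple):
    experiment: str        # "X": the specific PACE experiment
    replicate: int         # "*": the experimental replicate
    hours_from_start: int  # collection time, in hours from the start of PACE


# Assumed label shape: letters, digits, a hyphen, then "end" or a number.
_LABEL = re.compile(r"^([A-Za-z]+)(\d+)-(end|\d+)$")


def parse_label(label: str) -> SampleLabel:
    """Decode a label of the form X*-end or X*-$$ (hypothetical helper)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized sample label: {label!r}")
    experiment, replicate, tail = match.groups()
    if tail == "end":
        hours = END_H
    else:
        # "$$" counts hours after chemostat A was removed.
        hours = CHEMOSTAT_A_REMOVAL_H + int(tail)
    return SampleLabel(experiment, int(replicate), hours)
```

For example, `parse_label("B2-24")` yields experiment `B`, replicate 2, and a collection time of 72 hours from the start, matching the example in the legend. Labels in the actual spreadsheet that deviate from the assumed pattern raise a `ValueError` rather than being guessed at.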
Table S5 (Excel file). Descriptions of plasmids and sequences used. Related to STAR Methods.
ACKNOWLEDGEMENTS
We thank members of the Dickinson and Thornton groups for helpful comments on the manuscript, S. Ahmadiantehrani for editing, and R. Ranganathan for the use of the Illumina MiSeq instrument. This work was supported by CAREER Award 1749364 from the National Science Foundation (NSF) to B.C.D. and National Institutes of Health grants R01GM131128 and R01GM121931 to J.W.T. V.C.X. was supported by an NSF Graduate Research Fellowship (DGE-1746045). B.P.H.M. was supported by an NIH NRSA award (F32GM122251).
Footnotes
* email: pujy{at}uchicago.edu, joet1{at}uchicago.edu, dickinson{at}uchicago.edu