Coordinated evolution at amino acid sites of SARS-CoV-2 spike

SARS-CoV-2 has adapted in a stepwise manner, with multiple beneficial mutations accumulating in a rapid succession at origins of VOCs, and the reasons for this are unclear. Here, we searched for coordinated evolution of amino acid sites in the spike protein of SARS-CoV-2. Specifically, we searched for concordantly evolving site pairs (CSPs) for which changes at one site were rapidly followed by changes at the other site in the same lineage. We detected 46 sites which formed 45 CSP. Sites in CSP were closer to each other in the protein structure than random pairs, indicating that concordant evolution has a functional basis. Notably, site pairs carrying lineage defining mutations of the four VOCs that circulated before May 2021 are enriched in CSPs. For the Alpha VOC, the enrichment is detected even if Alpha sequences are removed from analysis, indicating that VOC origin could have been facilitated by positive epistasis. Additionally, we detected nine discordantly evolving pairs of sites where mutations at one site unexpectedly rarely occurred on the background of a specific allele at another site, for example on the background of wild-type D at site 614 (four pairs) or derived Y at site 501 (three pairs). Our findings hint that positive epistasis between accumulating mutations could have delayed the assembly of advantageous combinations of mutations comprising at least some of the VOCs.


Introduction
Evolution of SARS-CoV-2 in human hosts before November 2020 was largely neutral, with little evidence for emergence of novel adaptation with the exception of fixation of D614G in the Spike protein (Dearlove et al., 2020;MacLean et al., 2021). However, since the end of 2020, evidence for adaptive viral evolution has started to accumulate, suggesting a change in the mode of evolution (Martin et al., 2021;Rochman et al., 2021b). The subsequent pandemic was characterized by emergence of multiple concurrently circulating lineages with increased fitness compared to the ancestral variant, including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), and Delta (B.1.617.2). These lineages are typically characterized by high divergence, compared to cocirculating strains; divergence is often particularly pronounced at nonsynonymous sites, suggesting positive selection at origin of these variants. Some of these sites are evident of a change in selection regime at the origin of VOCs. For example, out of the 34 lineage-defining amino acid changes in the S-protein at the origin of the Omicron BA.1 sublineage (Hodcroft, 2021), eleven were characterized by strong purifying selection against changes of ancestral amino acids (Martin et al., 2022), suggesting that Omicron lineagedefining mutations at these sites were previously individually deleterious. In turn, the obvious high fitness of Omicron suggested that the origin of this variant has been characterized by a change in the selection regime at least at these sites. The reasons for this change are unclear. Several non-exclusive explanations were proposed, including a distinct mode of evolution at variant origin, for example, in an immunosuppressed individual (Corey et al., 2021;Kupferschmidt, 2021) or a different host species (Wei et al., 2021) and/or cascades of substitutions at positively epistatically interacting sites (Moulana et al., 2022). Here, we focus on the latter possibility.
Several previous studies have attempted to infer possible epistatic interactions between sites of SARS-CoV-2 genome from sequence data or experimentally. In an early study, Zeng et al. used direct coupling analysis (DCA) to search for epistasis in SARS-CoV-2 genome and reported several pairs of putatively interacting sites (Zeng et al., 2020). No pairwise interactions between sites of the S-protein were identified. For DCA to accurately detect interacting sites, the analyzed sequences need to be highly divergent (Bisardi et al., 2022). SARS-CoV-2 has a recent common ancestor, and divergence of its lineages is relatively low, limiting the applicability of DCA for this virus. Rodriguez-Rivas et al. applied DCA to homologous protein sequences from genomes of other coronaviruses and successfully predicted variability of SARS-CoV-2 protein sites, thus showing that knowledge of covariation between sites in related viruses is relevant for predicting evolution of new pathogens . In the study of Rochman et al., a method based on counting of mutations on the phylogeny was used to look for strongly associated mutation pairs (Rochman et al., 2020). They found intra-and intergenic epistasis between positively selected mutations in the nuclear localization signal (NLS) of the N-protein and RBD in the S-protein. In RBD, many of the detected epistatically interacting mutations were among the lineage signature mutations. Another proposed approach to study epistatic interaction was to estimate the fitness effects of mutations that arose on different backgrounds relative to their effects on the wild-type background (Rochman et al., 2021a). Using molecular dynamics, Rochman et. al. estimated effects of all individual nonsynonymous mutations in the S-protein RBM on binding with host ACE2 receptor and on binding with neutralizing Ab (NAb). Effects of each mutation were estimated for the Wuhan ancestral background, Delta (452 R, 478 K), Gamma variants (417T, 484 K, 501Y), and Omicron (339D, 371 L, 373 P, 375 F, 417 N, 440 K, 446 S, 477 N, 478 K, 484 A, 493 R, 496 S, 498 R, 501Y, 505 H). On average, the epistatic effects of mutations weakly stabilized NAb binding for Delta and destabilized it for Gamma and Omicron variants relative to the ancestral background. The authors concluded that the Gamma and Omicron variants had a higher potential for emergence of immune escape mutations than Delta or Wuhan variants.
For some site pairs, epistasis had been demonstrated experimentally. For example, the Q498R mutation alone affected the affinity of Spike to ACE2 only slightly (Zahradník et al., 2021), but on the background of N501Y, the affinity of binding increased by a factor of 4-25 (Starr et al., 2022a;Zahradník et al., 2021), with both mutations together increasing the affinity by up to 387-fold compared to the wild type (Starr et al., 2022a). The very strong binding provided by the double mutant allows accumulation, at Omicron origin, of multiple immune escape mutations at other sites which by themselves destabilize ACE2 binding (Moulana et al., 2022;Starr et al., 2022a).
Here, we study the mutual distribution of spike mutations in SARS-CoV-2 phylogeny to infer the pairs of sites with evidence for concordant and discordant evolution, as manifested by the propensity of substitutions at these sites to occur rapidly one after the other (for concordant evolution), or to avoid each other (for discordant evolution). We detect 46 concordantly evolved sites combined into 13 coevolving clusters, and 12 discordantly evolved sites. Many of the concordantly evolved sites carry the characteristic mutations of VOC lineages, strongly arguing for the role of positive epistasis in VOC origin.

Detecting interdependently evolving pairs of sites
To find coevolving site pairs, we modified our previously developed phylogenetic approach (Kryazhimskiy et al., 2011;Neverov et al., 2021;Neverov et al., 2015) to improve the accuracy of detecting concordantly evolving site pairs (see Materials and methods). Similarly to our previous work, as a measure of concordance of evolution at two sites, we used the epistatic statistics calculated as the weighted sum of consecutive pairs of mutations at these two sites on the phylogeny, where each mutation pair was taken with exponential penalty for the waiting time for the later mutation (Kryazhimskiy et al., 2011).
We need to introduce some definitions for further explanation. Hereafter, unless specified otherwise, we use the term 'mutation' for defining a triple of a site, ancestral and derived amino acids identifiers. Using ancestral state reconstruction, we are able to infer the order in which two specific mutations occurred in an evolving lineage. For a pair of consecutive mutations at two sites, we call the mutation that occurs first a leading mutation, and the mutation that follows it, a trailing mutation. For an ordered pair of sites, we call the first site in a pair the background site, and the second site in the pair, the foreground site. The epistatic statistic for an ordered pair of sites summarizes the weights of consecutive pairs of mutations at these sites, such that mutations at the background site are leading and mutations at the foreground site are trailing. The epistatic statistics for an unordered pair of sites is a sum of statistics of the two corresponding ordered pairs (Neverov et al., 2021).
We introduced two significant changes to the original method (Kryazhimskiy et al., 2011;Neverov et al., 2021) which improved the power to infer epistasis ('revised method', see next section). First, as in Neverov et al., 2015, we modified the null model used to calculate the significance of the epistatic statistics in permutations. While previously (Kryazhimskiy et al., 2011;Neverov et al., 2021) we permuted the positions of mutations on the tree branches at each site independently of other sites, here, we fixed the positions of mutations for the background site and permuted just the positions of mutations at the foreground site. This change allowed us to account for the possible effects of leading mutations on the topology of the phylogenetic tree; for example, an advantageous leading mutation could give rise to a prolific clade (Neher, 2013) which in turn would carry a large number of trailing mutations, artefactually inflating the epistatic statistic for this pair of sites even in the absence of epistatic interactions.
Second, while our previous work (Kryazhimskiy et al., 2011;Neverov et al., 2021) treated all substitutions at a site equally, we now distinguished between substitutions into different amino acids. Therefore, the revised epistatic statistic accounts for the preference of a specific mutation at the foreground site to follow a specific mutation at the background site. For this, in calculation of the epistatic statistic, we now additionally scored each pair of consecutive mutations by the fraction of times that the specific type of mutation at the foreground site followed the specific type of mutations at the background site, among all occurrences of this type of mutations at the foreground site. Therefore, extra weight was given to those mutation types that became more frequent on a specific background.

Estimating the power of the method to detect epistasis
To demonstrate that our revised method improves inference of positive and negative epistasis, we used MimicrEE2 (Vlachos and Kofler, 2018) to simulate clonal evolution of linked sites. We simulated two modes of evolution: (i) under positive and negative selection without epistatic interactions ('multiplicative mode'), and (ii) under epistatic selection ('epistatic mode').
Specifically, we simulated independent forward-time evolution of a population of 50,000 genotypes consisting of 100 biallelic sites. For the multiplicative mode, at the start of the simulations, 20 sites of the gene carried the disfavored allele, and were therefore under positive selection; and 20 sites carried the favored allele, and were therefore under negative selection. For the epistatic mode, 20 sites constituted 10 site pairs such that the sites within each pair evolved under positive epistasis; and another 20 sites constituted 10 site pairs such that the sites within each pair evolved under negative epistasis. Under each mode, the remaining 60 sites evolved neutrally (see Materials and methods).
We used simulations to estimate how the changes to the method for inference of epistasis introduced in this work impacted method accuracy. For this, using simulated datasets, we compared the power of the four variants of our method, corresponding to the presence or absence of the two modifications introduced in the previous section (accounting for amino acid identities and unlinking the distributions of mutations on the tree branches for background and foreground for the null model).
To compare the specificity of the four variants of the method, we used the multiplicative mode of simulation (i.e. the absence of the epistasis), and asked how frequently concordant or discordant pairs were inferred under each model. Since there was no epistasis in the simulation, each such pair was spurious, and the best method would be the one with fewest such pairs. For each method, we counted the number of spuriously inferred concordant and discordant pairs at the lowest p-value threshold in our simulation trials (10 -4 ). There were 24 concordant pairs and 16 discordant pairs in the method of Neverov et al., 2021, but just 2 concordant pairs and 2 discordant pairs in the revised method, indicating that the modifications introduced here helped improve the specificity of our approach. We used the 10% FDR level for this analysis and for all its variants (see below). For the FDR 10%, no concordant or discordant pairs were inferred in this dataset by the revised method ((Appendix 1tables 1 and 2, Appendix 1- figure 1).
To study the accuracy of the four variants of the method, we used simulations with epistasis. The revised method detected all 10 positively epistatic site pairs as concordantly evolving; additionally, it spuriously detected five other site pairs as concordantly evolving (Appendix 1-tables 3 and 4). The revised method also detected 7 out of 10 negatively epistatic site pairs as discordantly evolving, and spuriously detected four other site pairs as discordantly evolving (Appendix 1-tables 5 and 6). The three other detection models were less accurate: the number of false predictions was greater than the number of true predictions for positive epistatic pairs for all other detection methods, and for negative epistatic pairs, for two out of three methods (Appendix 1-tables 3 and 5). Therefore, the revised method was the method of choice for subsequent analyses.

Phylogenetic analysis of SARS-CoV-2 spike
To obtain a phylogeny representative of SARS-CoV-2 diversity, we downloaded 3,299,439 complete genome sequences of SARS-CoV-2 aligned to the WIV04 reference genome from the GISAID EpiCov database on 07.09.2021. We ignored insertions and deletions relative to the reference sequence and removed sequences with inframe stop codons in the spike protein. We then clustered the remaining sequences by pairwise distances between S-protein subsequences, allowing up to three mutations in the S-protein within a cluster, which resulted in 7,348 clusters. For each cluster, we selected one representative sequence of the best quality with the earliest date of sampling. The median date of representative sequences was February 10, 2021. Therefore, the dataset covered approximately equally both characteristic periods of SARS-CoV-2 evolution: the neutral period between Jan and Nov 2020, and the period of antigenic drift between Dec 2020 and May 2021 Martin et al., 2021). We classified representative sequences according to pangolin lineages. Most sequences (5,721) were of the B.1.* sublineages. The representative sequences included some from the variants of concern (VOCs) Alpha (B.1.1.7+Q.*, 951 sequences), Beta (B.1.351.*, 192 sequences), Delta (B.1.617.2+AY.*, 24 sequences) and Gamma (P.1.*, 100 sequences). The phylogeny of representative sequences was reconstructed using IQ-TREE (Minh et al., 2020). The tree was rooted by the outgroup USA-WA1/2020 (EPI_ISL_404895) that matched the sequence of the putative SARS-CoV-2 progenitor (Bloom, 2021;Kumar et al., 2021). The ancestral sequences at internal tree nodes were reconstructed by TreeTime (Sagulenko et al., 2018). We extracted the part of the alignment that corresponded to the S gene, and collapsed the internal tree branches without mutations in the S gene. The final tree had 1,783 internal branches. For each internal branch, we listed the amino acid mutations that occurred at this branch.

Concordantly evolving site pairs
To study the concordant and discordant evolution of pairs of sites in SARS-CoV-2 spike, we applied our approach to the distribution of mutations in the S gene on the reconstructed SARS-CoV-2 phylogeny. 185 of the sites carried two or more mutations on internal tree branches. We considered all 17,020 unordered pairs of these sites.
We detected 45 concordantly evolving site pairs which comprised 46 sites ( Figure 1A, Appendix 1table 7, Appendix 1- figure 2A). Our phylogenetic approach for detecting concordantly and discordantly evolved site pairs relied on the assumption that the tree provided for analysis is correct. To check the robustness of our results to uncertainty of phylogenetic reconstruction, we repeated the analysis on the tree reconstructed for the same set of sequences by the UShER (Turakhia et al., 2021) method utilizing maximum parsimony approach (see Materials and methods). Among the 45 site pairs inferred to be concordantly evolving (Appendix 1-table 7), 33 were also concordantly evolving on the UShER tree at 50% FDR, including 28 at 10% FDR (Table 1, Figure 1A, Appendix 1-table 8). Thus, we conclude that for 73% (33/45) of the detected concordantly evolved site pairs, the statistical signal was strong enough to be insensitive to phylogenetic uncertainty. In what follows, we focus on the IQ-TREE results.
To characterize the detected concordantly evolving site pairs, we considered the positions of their sites on the S-protein trimer structure (PDB ID: 7JJJ). The mean distance between coevolving site  1-table 7). The graph consists of 13 connected components, 8 of which contain just a single edge. Site pairs that were among the set of the best scoring pairs predicted for the alternative UShER (Turakhia et al., 2021) topology (FDR <10%, Table 1) are marked by asterisks. (B) Concordantly evolving sites among the lineage-defining sites of Alpha, Beta, Gamma, Delta (B.1.617.2+AY.*) and Omicron (BA.1) VOCs (Hodcroft, 2021). Concordantly evolving sites are colored in accordance with the clusters in panel A. Sites with fewer than 2 mutations which were not included in the analysis are in gray. Table 1. Concordantly evolving sites of the SARS-CoV-2 S-protein with FDR less than 10% for both reconstructed phylogenies (see Text).
The following characteristics are shown: coordinates on the S-protein sequence, nominal p-values, the value of the epistatic statistics, the total number of consecutive mutation pairs for the two corresponding ordered site pairs, numbers of mutations in consecutive pairs at sites 1 and 2, total numbers of mutations at sites 1 and 2, and the distance in the protein structure (PDB ID: 7JJJ). Pairs of sites where non-consecutive mutations are further from each other than expected (suggesting both epistatic and episodic selection; p-value <0.05 after adjustment) are indicated in bold; pairs of sites where they are closer to each other than expected (suggesting episodic rather than epistatic selection) are indicated in italic (see  Table 1 continued pairs (20.17 A) was below that expected for random site pairs between the 185 sites with two or more mutations on internal branches (36.71 A, p=0.0004) as well as for random site pairs between the 46 sites involved in concordant evolution (36.64 A, p<1E-4). Among the 42 coevolving site pairs for which the distances on the protein structure were known, in 18 (43%), the two sites were in contact, that is, within 5 A from each other.
To visualize the detected concordantly evolving site pairs, we plotted them as a graph where vertices represented the 46 sites, and edges represented the 45 pairs formed by them ( Figure 1A). The graph has 13 connected components; five of them contain between 5 and 9 sites, and the remaining 8 each consist of a single site pair. Sites of three of the five subgraphs with multiple vertices formed dense clusters on the protein structure, and for each such cluster, all or almost all sites belonged to the same domain; sites from the other two components were distributed dispersedly ( Figure 2A). Hereafter, we referred to these clusters of sites by the Roman numerals I-V. All six sites of the first dense cluster 439-444 (I) were in the receptor binding motif (RBM) within the receptor binding domain (RBD). Mutations at sites of cluster (I) affect neutralization of SARS-CoV-2 by monoclonal and polyclonal antibodies (Barnes et al., 2020;Harvey et al., 2021). The four sites of the second dense cluster (II) 356, 357, 359, and 360 were in the RBD, while the fifth site 189 was in the NTD. Interestingly, in the open conformation (PDB ID: 7KL9), sites 357, 359, and 360 contacted the NTD domain of the adjacent subunit of the S-protein trimer, but in the closed conformation (PDB ID: 7JJJ) they were not in contacts (Figure 2A and B). The third dense cluster (III) 63, 64, 67, 69, 70, 144, 213, 259, and 261 was in the NTD domain in the region of binding of neutralizing antibodies (McCarthy et al., 2021). Four sites 18,20,26,and 190 in the first of the two dispersed clusters (IV) were in the NTD domain, and one site 417 was in the RDB. The second dispersed cluster (V) comprised sites from different domains: RBM (501), a position near the S1/S2 cleavage site (681), S2 (716 and 982) and the connecting domain (1118; Figure 3).   concordantly evolving site pair (155,157) comprises sites flanking the N3 loop. Yet another concordantly evolving pair (13, 152) comprises a site from the signal peptide and a site from the N3 loop. It has been previously shown that mutations at some sites of the signal peptide could abrogate virus neutralization by antibodies due to changes of the signal peptide cleavage site (McCallum et al., 2021). Changes of lengths and sequences of NTD loops mediate Spike membrane fusion, cell entry and extracellular stability (Cantoni et al., 2022;Qing et al., 2021). Loops N1 (positions 14-26), N3 and N5 contribute to NTD supersite of binding of neutralizing antibodies (Cerutti et al., 2021;McCallum et al., 2021).

Discordantly evolving site pairs
We detected nine pairs composed of 12 sites that exhibited discordant patterns of evolution ( Table 2, Appendix 1- figure 2B). Five of these site pairs were among the 16 discordantly evolving pairs between the 15 sites detected for the alternative UShER topology for the same level of FDR (Appendix 1-table 9). Similarly to concordant evolution, the signal of discordant evolution indicated that mutations at one site in a pair arose preferentially at specific allelic contexts of another site and avoided other contexts; e.g. mutations Q675H and Q677H occurred mostly on the wild-type background N in the 501 and rarely occurred on the background of 501Y (Figure 4), while mutations A653V and S982A occurred mostly on the derived background G at site 614 and rarely followed reversions to the wild-type D background.
Among the nine predicted discordantly evolving pairs, two, (501, 677) (shown on Figure 4) and (501, 675), are between the three sites whose effects of mutations were assessed experimentally or computationally. The N501Y mutation increases the binding affinity of Spike to ACE2 up to 15-fold (Starr et al., 2022a) and increases infectivity (Liu et al., 2022) the Q677H mutation increases infectivity, propensity to syncytium formation and escape of neutralization by serum of vaccinated people (Zeng et al., 2021) and the Q675H mutation is predicted to increase furin binding affinity (Bertelli et al., 2021). Although experiments suggested a positive effect of Q677H on the background of VOCs Alpha and Gamma both carrying N501Y (Zeng et al., 2021), this fact is in disagreement with the very low population frequencies of Q677H in these VOCs (Gangavarapu et al., 2022a, Gangavarapu Table 2. Discordantly evolving sites of the SARS-CoV-2 S-protein. The following characteristics are shown: coordinates on the S-protein sequence, nominal p-values, the value of the epistatic statistic, the total number of consecutive mutation pairs for the two corresponding ordered site pairs, numbers of mutations in consecutive pairs at site 1 and site 2, total numbers of mutations at site 1 and site 2, FDR value corresponding to the p-value of the site pair obtained for the alternative phylogeny reconstructed by UShER (Turakhia et al., 2021), and the distance in the protein structure (PDB ID: 7JJJ). Pairs of sites where non-consecutive mutations are closer to each other than expected (suggesting both epistatic and episodic selection; p-value <0.05 after adjustment) are in bold; those where they are further from each other than expected (suggesting episodic rather than epistatic selection) are in italic (see Appendix 1-tables 11 and 13).    al., 2022b;Gangavarapu, 2022c;Gangavarapu, 2022d;Khare et al., 2021). The same is true for the Q675H mutation: while it was observed in some isolates of VOCs with the N501Y lineage-defining mutation, the population frequencies of these strains were also very low (Bertelli et al., 2021).

Distinguishing between epistasis and non-epistatic episodic selection
The observed concordant and discordant evolution can stem from two sources: epistatic interactions between sites and episodic selection pressure affecting two or more sites at the same time. These two cases can be distinguished by patterns of phylogenetic distribution of non-consecutive mutations. Episodic selection simultaneously affecting two sites is expected to bias phylogenetic distances between all mutations at these sites, both consecutive and non-consecutive, so that substitutions are more likely to cooccur in more closely related lineages. By contrast, positive epistasis only leads to an excess of rapid consecutive mutations and does not bias distances between mutations in different lineages (Neverov et al., 2021). To check whether coordinated evolution of some pairs can be explained by concordant or discordant episodic selection alone, we applied the test described in Neverov et al., 2021, calculating the average distances between all pairs of non-consecutive substitutions for each concordantly or discordantly evolving pair of sites (Appendix 1-tables 10-13).
For the 12 concordant site pairs from 28 pairs that were detected on both phylogenies, we could not rule out that concordance stems from episodic selection alone, since their non-consecutive mutations are also evident of clustering (z-score <0, p-value after Benjamini-Hochberg adjustment <0.05). Specifically, for all concordantly evolving pairs within cluster I (sites 439-444, 10 pairs), non-consecutive mutations occur in more closely related lineages than expected, implying that their coordinated evolution could be a result of coincident episodic selection. Concordant evolution of the remaining 16 pairs cannot be explained without invoking positive epistasis, since their non-consecutive mutations, unlike consecutive ones, either tend to avoid each other [pairs (69, 70) and (70, 144) from cluster III, z-score >0, p-value after Benjamini-Hochberg adjustment <0.05] or at least do not show any signs of clustering (for 11 pairs, z-score >0; for the remaining 3 pairs, z-score <0 but p-value of clustering even before adjustment is greater than 0.15).
For the discordantly evolving pair (614, 653), non-consecutive mutations are more distant from each other than expected, suggesting that these pairs could also represent discordant episodic selection rather than epistatic interactions. By contrast, for discordant pairs (440, 681) and (501, 675) nonconsecutive substitutions tend to be closer to each other than expected (in contrast to consecutive ones which repulse each other). This can only result from negative epistasis, probably accompanied by concordant episodic selection.

Long-term coordinated evolution of Spike
We asked whether the concordantly evolving pairs of sites that we detect are also evident of longterm coordinated evolution on timescales of diversification within the larger group of sarbecoviruses. At such larger timescales, recombination, which is common between different sarbecoviruses (Boni et al., 2020;Hu et al., 2017;Starr et al., 2022a;Wells et al., 2021), becomes a major source of the evolutionary signal. Phylogenetic methods for inference of epistasis can be confounded by recombination (Neverov et al., 2015). Instead, we rely in this section on the DCA methods which are robust to moderate amounts of recombination (Gao et al., 2019). A previous study Rodriguez-  used DCA to infer interactions within the five PFAM domains of Spike: bCoV_S1_N (PF16451.6), bCoV_S1_RBD (PF09408.11), CoV_S1_C (PF19209.1), bCoV_S2 (PF01601.17), and CoV_S2_C (PF19214.1). For each domain, we ordered site pairs by descending DCA scores, and calculated the ranks of concordantly and discordantly evolving pairs located within the corresponding domain. For each list, we referred to N D pairs with highest scores as 'high scoring pairs', where N D was the number of sites in the corresponding domain D, namely N D = 305 for bCoV_S1_N, 178 for bCoV_S1_RBD, 57 for CoV_S1_C, 519 for bCoV_S2 and 40 for CoV_S2_C.
Some of the concordantly evolving same-domain site pairs detected by us were also evolving in a coordinated fashion in sarbecoviruses in general. Specifically, 3 out of the 14 pairs of concordantly evolving sites in the bCoV_S1_RBD domain ((439, 441), (440,442) and (441, 443), all from cluster I), 2 out of the 15 pairs of sites in the bCoV_S1_N domain ((63, 64) and (69, 70), both from cluster III) and one pair of sites in the Cov_S2_C domain (1258, 1259) were among the high scoring DCA pairs. The intersections of sets of concordantly evolving pairs of sites located within the same domain with the sets of DCA high scoring pairs for corresponding domains were higher than expected by chance: p=4.7e-4 for bCoV_S1_RBD, p=3.2e-3 for bCoV_S1_N, and p=0.051 for CoV_S2_C (Fisher's exact test; the latter domain carried just one concordantly evolving pair). For the five domains with DCA results, no discordantly evolving pairs had both sites within the same domain.

Coordinated evolution of sites carrying VOC mutations
Many of the concordantly evolving sites carried mutations that defined VOC lineages. In what follows, we discuss the sites of concordant mutations found among the sites carrying the characteristic mutations of specific VOCs ( Figure 1B). We refer to sites bearing lineage-defining mutations as lineage defining sites.
For the Alpha VOC,8 (69,70,144,501,681,716,982 и 1118) out of the 10 lineage defining sites (Hodcroft, 2021) were among the 46 concordantly evolving sites. Three of these sites, 69, 70 и 144, were within the dense cluster III of sites located in the NTD; the remaining five sites represented the dispersed cluster V.
For the Beta VOC, 3 sites (417, 501, and 484) out of the 10 lineage defining sites were among the concordantly evolving sites, but these sites did not form pairs with each other. For the Gamma VOC, 7 out of the 12 lineage-defining sites (18, 20, 26, 417, 484, 501, and 655) were among the concordantly evolving sites. Four of these sites (18, 20, 26 and 417) were within the dispersed cluster IV of sites. Two sites (484, 655) constituted a distinct cluster with a single pair ( Figure 5). Finally, site 501 had no concordantly evolving partners.
For the Delta VOC (B.1.617.2+AY.*), three out of the ten lineage defining sites (157, 681, and 950) were among the set of concordantly evolving sites, however, these sites belonged to different clusters and they did not form pairs with each other.
Finally, for the Omicron VOC (BA.1), 10 out of the 36 lineage defining sites (67, 69, 70, 144, 417, 440, 484, 501, 655, and 681) were among the concordantly evolving sites. Notably, no Omicron sequences were in the dataset used to assess concordance, so the signal observed at these sites is not due to clustering of substitutions in them at the origin of Omicron. Seven of these 10 sites, with the exception of the sites carrying deletions in Omicron (69, 70 and 144), were previously shown to evolve under positive selection (Martin et al., 2022). Among these 10 sites, four (67, 69, 70, and 144) were within the dense cluster III of sites located in the NTD. The sites 501 and 681 belonged to the highly dispersed cluster V of sites but did not form a pair, suggesting that any interactions between them could be indirect and mediated by other sites; site 501 is located within the RBM, while site 681 is near the S1/S2 furin cleavage site (FCS). Two sites 440 and 417 each were the only representatives of the corresponding cluster and had no concordantly evolving partners among the lineage-defining sites. However, site 440 was discordantly evolving with site 681 ( Table 2). Mutations N440K and P681H are lineage-defining mutations of BA.1, but our analysis suggests that these mutations avoided each other before the appearance of Omicron due to negative epistasis. This indeed can be the case: despite the fact that the N440K mutation increases binding affinity to ACE2 and affects Ab neutralization efficiency (Barnes et al., 2020;Harvey et al., 2021;Moulana et al., 2022), by the end of 2021 its frequency in BA.1 was relatively low (<75%), while P681H reached 98% frequency (Gangavarapu, 2022c). However, the worldwide prevalence of the mutation pattern N440K+P681H+N501 Y was increasing during the year and at the end of 2022, it was higher than 98% (Gangavarapu et al., 2022e). This may be explained by the acquisition of additional compensatory mutations (i.e. entrenchment of the N440K mutation on the background of P681H). To the best of our knowledge, there is no experimental confirmation of negative epistasis for these mutations; however, recent research by Moulana et al., 2022 has shown that N440K exhibits weak negative epistasis with another Omicron lineage-defining mutation, N501Y. We also detected the discordant evolution of sites 440 and 501 for the USHER tree (Appendix 1-table 9). Together with the fact that 501 and 681 belong to the same cluster of concordantly evolving sites (cluster V), this supports the possibility of negative epistasis between sites 440 and 681. Finally, the two remaining sites, 484 and 655, formed a pair of coevolving sites which represented a separate single-edge cluster ( Figures 1A  and 5). Again, the first site in this pair was within the RBM and the second site was within the FCS. While 484 A is the characteristic allele of the Omicron VOC, it was rare in previously circulating strains: indeed, mutation E484A occurred in just a few (8 out of 7348) of the terminal branches of our phylogenetic tree and always in the context of the ancestral histidine at site 655. As our analysis disregards substitutions at terminal branches, E484A thus could not have contributed to our epistatic statistic. Instead, the signal of epistasis between sites 484 and 655 was formed by other mutations, notably, the frequently occurring E484K.
These findings suggest that the lineage-defining sites are enriched in concordantly evolving site pairs. To formally test this, we generated 400 artificial datasets by randomly redistributing the mutations on the phylogeny while preserving the numbers of substitutions at each site and on each branch of the tree, and applied our method for inference of coordinated evolution for each such dataset. For each pair of sites, we estimated the upper p-value as the probability to obtain the value of the epistatic statistic for independently evolving sites at least as high as that observed for the data. We ordered all site pairs in ascending order of their upper p-values, and compared the difference of mean ranks of pairs of lineage-defining sites and mean ranks of other pairs of sites, both for the real and for the random distributions of substitutions on the phylogeny (see Methods). All VOCs except Omicron had a stronger signal of coordinated evolution (lower ranks) for pairs of lineage-defining sites than for the remaining site pairs. This could not be explained by a difference in numbers of substitutions in lineage-defining and other sites because we have preserved numbers of substitutions at sites when generating datasets (Appendix 1-table 14; Appendix 1-table 15; Appendix 1-table 16;  Appendix 1-table 17; Appendix 1-table 18). Thus, our findings indicate that the lineage-defining sites of VOCs comprised pairs with an unexpectedly strong signal of concordant evolution.
As the lineage-defining sites of a VOC are by definition those that carry mutations at the origin of this VOC, the signal of concordant evolution at lineage-defining sites could arise from clustering of mutations at these sites at VOC origin. To address this confounding factor, we asked whether concordant evolution of lineage-defining sites of a VOC is also observed when this VOC itself is excluded from analysis. For this, we separately pruned all isolates belonging to each of the four VOCs Alpha, Beta, Gamma, and Delta from the tree. Note that our datasets did not contain Omicron from the start. For each of the four pruned trees, we separately predicted the concordantly evolving site pairs, and tested for enrichment of pairs of lineage-defining sites. For Alpha, pairs of lineage-defining sites still had a stronger signal of concordant evolution, indicating that at least for this VOC, the observed concordance is not due to clustering of substitutions at VOC origin (Appendix 1-table 19). For the other three VOCs, Beta, Delta, and Gamma, no enrichment was detected (Appendix 1-table 20; Appendix 1-table 21; Appendix 1-table 22). In fact, for Gamma, after pruning of the VOC clade, pairs of lineage-defining sites became depleted among the pairs with a stronger signal of concordant evolution (P=0.0272, Appendix 1-table 22); in this VOC, the observed signal of concordant evolution of lineage-defining sites is mainly caused by reversions of lineage-defining mutations within the VOC clade ( Figure 5).

Discussion
In this work, we modified the phylogenetic approach for detection of interdependently evolving pairs of sites that had been previously successfully applied for influenza A (Kryazhimskiy et al., 2011;Neverov et al., 2015) and mitochondrial proteins (Neverov et al., 2021), and applied it to the Spike protein of SARS-CoV-2. Using simulations, we show that our revised method has a better specificity and sensitivity for detection of epistatically interacting site pairs than its original version, and that it is able to detect positive as well as negative epistasis between alleles. Our simulations involved multiple sites with unfavorable alleles, so there were multiple adaptive mutations, allowing competition between multiple clones and hitchhiking. Despite these confounders, our method produced no spurious signal of epistasis when epistasis was not a part of the simulation (Appendix 1-table 1 and Appendix 1-table 2, Appendix 1- figure 1) and was able to detect true interacting site pairs with a reasonable FDR for the epistatic mode of evolution of genotypes (Appendix 1-table 3;  Appendix 1-table 4).
Similarly to other methods for inference of factors of evolution from comparative genomics data, the accuracy of our method depends on validity of its assumptions. First, we assume that the observed changes in the ancestral reconstructed states between adjacent nodes of the phylogeny are not artefactual but correspond to actual mutations. Unfortunately, for large-scale sequencing projects such as that of SARS-CoV-2, some extent of mistakes in the called sequences is unavoidable, and these mistakes are not random. Specifically, as noted previously (Martin et al., 2022), the propensity to call the ancestral nucleotides (reference bias), particularly at sites of low NGS read coverage, may lead to reversions to wild-type alleles, in particular for lineage-defining mutations. Indeed, the main cause of calling of wild-type alleles at multiple sites is the selective preference of PCR or sequencing primers to some genotypes that leads to a high variation in coverage along the genome of different isolates. In periods of change of the dominant variant, two genetically different variants circulate with high population frequencies, and mixed infections or cross-contaminations of samples may result in artifactual hybrid sequences. This can be a problem for our method. Exclusion of terminal tree branches mitigates these problems by focusing the analysis on more frequent genotypes because spurious reversions are more likely in individual sequences than larger clades. This problem is also partially addressed by the fact that we count all trailing mutations following a single leading mutation as one, so our statistic favors site pairs with multiple phylogenetically unrelated pairs of consecutive mutations; this requirement decreases the impact of a systematic loss of mutations at some sites of specific variants of genotypes.
Second, our approach relies on the correctness of the phylogeny. The inference of the true phylogeny for SARS-CoV-2 is difficult due to the huge amount of data and limited sequence diversity (Morel et al., 2021;Rochman et al., 2021b). To assess the impact of phylogenetic uncertainty on the results of analysis, we compared the sets of concordantly and discordantly evolving site pairs inferred for two trees: the maximum likelihood (ML) tree obtained by IQ-TREE 2 (Minh et al., 2020) and maximum parsimony (MP) tree obtained by UShER (Turakhia et al., 2021). The choice of the method of phylogenetic reconstruction affected the set of predicted pairs ( Table 1, Appendix 1-table 7;  Appendix 1-table 8); however, over 60% of concordantly evolving pairs detected using the IQ-TREE phylogeny were also detected using the UShER tree.
Third, we assume that evolution is clonal, so that the phylogenetic tree reflects the true evolutionary history of the genome. If genotypes recombine, some genomic sites would be incompatible, that is, have different evolutionary histories (Bruen et al., 2006). On the whole genome phylogenetic tree, those sites with evolutionary history disagreeing with the majority would be enriched in spurious parallel or convergent substitutions, possibly affecting the signal of concordance. While the recombination frequency for SARS-CoV-2 is unknown, it has been estimated that about 3-5% of circulating genotypes are recombinants (Kozlakidis, 2022;Turkahia et al., 2021) and the frequency of occurrence of recombination breakpoints in the S gene is up to three times higher than in the rest of the genome (Turkahia et al., 2021). While our dataset included four sequences of the XB recombinant lineage, exclusion of these sequences did not affect our results. Nevertheless, some of the recombinants could remain unannotated, particularly those including just a few samples and/or originating from similar sequences.
The signals of concordant and discordant evolution that we observe could come from at least two natural sources. Firstly, they could arise from epistatic interactions between sites, an explanation which we favored in our previous works (Kryazhimskiy et al., 2011;Neverov et al., 2021;Neverov et al., 2015). Previously, Ruan et al., 2022 interpreted the stepwise accumulation of mutations at origin of Gamma and Delta VOCs as evidence for epistatic interactions between lineage defining mutations. Under this explanation, concordant evolution of a pair of sites is indicative of positive epistasis between the newly arising alleles, and discordant evolution, of negative epistasis between them. Recently, epistasis in the Spike protein has been demonstrated experimentally: at 15 of the RBD sites, the effects of mutations were shown to change on the background of N501Y (Starr et al., 2022a). Unfortunately, it is hard to cross-validate the epistatic interactions between the evolutionary and experimental analyses. In our dataset, these sites are conserved: for 14 out of the 15 sites, there are no mutations on internal tree branches, making them unfit for our analysis. The remaining site 449 carried a single mutation Y449H which by itself is known to strongly decrease ACE2 binding affinity (Starr et al., 2022a) this mutation co-occurred with N501Y on one internal branch which had two descendent leaves, leading to a p-value of 0.18 for this pair in our test.
Several lines of evidence indicate that the observed coordination of evolution at different sites is largely due to epistatic, rather than (or at least in addition to) episodic, selection. First, over a half of concordantly evolving site pairs that we detected lack any signs of clustering of non-consecutive substitutions, which are expected if concordance results from coincident episodes of selection (Neverov et al., 2021), and thus can only be interpreted as epistatic pairs. The same applies to some discordantly evolving pairs, where non-consecutive mutations, unlike those occurring in the same lineage, do not show any signs of repulsion. A second piece of evidence comes from the observation of discordant evolution at site pairs (501, 675) and (501, 677) carrying mutations which are likely individually beneficial -a pattern expected under epistasis but not episodic selection. Third, six of the concordantly evolving pairs also experience coordination over the long term evolution of sabrecoviruses. The fitness landscape experienced by this group at large evolutionary timescales is likely different from the short-term landscape of SARS-CoV-2, e.g. because the Spike affinity to ACE2 could be completely lost in some viral lineages or easily expanded to new host species (Starr et al., 2022b). On such long-term time scales, the conservation of the protein fold is likely the strongest evolutionary constraint, and the fact the same site pairs evolve in a coordinated fashion within SARS-CoV-2 suggests that this short-term evolution is also shaped by epistasis.
The second natural source of coordinated occurrence of mutations is differences in selection regimes between tree branches. The observed phylogenetic clustering of non-consecutive mutations at concordant site pairs, and phylogenetic repulsion of non-consecutive mutations at discordant site pairs, imply that many concordantly and discordantly evolving site pairs were subject to coordinated episodes of adaptation, coincident or separate. When the pattern is the same for both consecutive and non-consecutive mutations, it is impossible to determine whether there is any epistatic interaction between sites, or the coordinated evolution results from coordinated additive selection alone. Over the course of the pandemic, selection on the virus has changed due to the dynamics of herd immunity of the human population, with selection favoring immune escape mutations increasing with time. In particular, this was proposed to underlie the accelerated evolution and selective advantage of Gamma (Gräf et al., 2021, p. 1) and Omicron (Martin et al., 2022) VOCs. Additionally, distinct selection regimes could correspond to long-term infection in immunocompromised individuals or in non-human hosts under reverse zoonosis events.
Overall, we find that the VOCs at their origin have gained mutations at concordantly evolving sites (Appendix 1-table 14; Appendix 1-table 15; Appendix 1-table 16; Appendix 1-table  17; Appendix 1-table 18). Has this selection been episodic or epistatic? Both hypotheses have support. On the one hand, the set of the VOCs that we indicate as evolving in a coordinated manner, Alpha, Gamma, and Omicron, are the same three VOCs that are characterized by an increase in the rate of evolution at their origin and therefore have been suggested to had evolved in distinct evolutionary regimes, perhaps that of an immunocompromised patient (Hill et al., 2022;Martin et al., 2022). A possible explanation of constellations of lineage-defining mutation at the origin of VOCs is that selection in favor of these mutations increases in immunocompromised patients, leading to episodes of adaptation (Martin et al., 2022) during which the rate of their accumulation is increased compared to the baseline observed in the general population, making them clustered in the branches corresponding to such patients. However, it is unclear why the virus had to 'wait' for an immunocompromised individual to evolve the mutations that also additively increased its fitness in the general population. The late emergence of these constellations of mutations instead suggests that the selection had to be non-additive, that is, epistatic. Our observation that lineage-defining sites of Alpha evolve concordantly even outside the Alpha clade further supports the significance of epistasis at origin of at least this VOC (Appendix 1-table 19).
Finally, the signal of concordant evolution at the origin of VOCs could come from the combination of both factors: a distinct selection regime and epistasis. The role of immunocompromised patients in the evolution of SARS-CoV-2 is even higher for epistatic than for the non-epistatic mode of evolution (Smith and Ashby, 2022). If the viral fitness, as it has been previously proposed (Hill et al., 2022;Rochman et al., 2021a;Smith and Ashby, 2022), is a trade-off between transmission efficiency and ability to avoid herd immunity, and the transmission component of the fitness landscape has valleys of low-fitness genotypes, the immunocompromised individuals with prolonged infectious periods due to relaxed selection for transmission efficiency allow the virus to accumulate mutations and cross these valleys. Direct experimental studies of the possible epistatic interactions between coevolving sites will help elucidate the mechanism of origin of radically novel viral variants.

Constructing the set of sequences
Masked full genome sequence alignment and corresponding metadata were downloaded from https://gisaid.org/ on 07.09.2021 comprising sequences for 3,299,439 isolates (see data availability statement). Based on metadata, all sequences from nonhuman hosts, without collection dates, or with wrongly formatted collection dates were excluded. For each sample, the number of gaps, 'N's and ambiguous characters were computed for each genome sequence and separately for the S-gene in non-masked positions. For each sequence, we calculated the number of positions that contained a nongap, non-'N' symbol that were aligned to non-gap positions of the reference genome sequence WIV04. We excluded sequences from the analysis with fewer 29,000 aligned positions, or with <95% of sequence length in aligned positions. Next, we sorted sequences by sampling dates and then converted each sequence into a list of changes relative to the reference, treating consecutive gaps as one change. Then, we excluded sequences having too many changes for their sampling dates. For that, for each sampling date, we computed the mean and standard deviations of the number of changes for all sequences whose sampling dates were within a half year time interval centered at this date. All samples with the number of changes exceeding the mean value by two standard deviations or more in the corresponding time interval centered on the sampling date were filtered out. Samples with preliminary stop codons inside the S gene were also excluded. This left us with 2,676,884 sequences for further analysis.
We partitioned the remaining sequences into groups of equivalence such that all sequences in each group had the same list of mutations in the S gene, and selected the sequence with the earliest collection date as the class representative. To further cluster the sequences, we reimplemented the UCLUST (Edgar, 2010) algorithm in a custom python script to allow it to process the huge amount of SARS-CoV-2 data. In contrast to the original UCLUST implementation that receives nucleotide or amino acid sequences as input, our script clustered the lists of changes in sequences that occurred relative to the sequence of the reference genome. This sped up computation, because there were few changes in SARS-CoV-2 sequences compared to the genome length. The pairwise distance between the two lists of changes was defined as the number of changes unique to each list. At the start of the procedure, samples were ordered by the sampling date; the first sample was defined to be the centroid of the first cluster and added to the list of centroids. Next, the remaining samples were iteratively compared with the centroids in the list: if the distance to some centroid was less than three, the sample was added to the corresponding cluster, otherwise it was added as a new entry to the list of centroids. Thus, by construction, cluster centroids tend to be the earliest representatives of their members. For each cluster, all samples from the corresponding groups of equivalent sequences were pooled together, and the sample having the highest quality sequence was selected as the representative of the cluster. The highest-quality sequence was defined as the sequence having the minimal number of gaps or 'N' characters in the S gene. If several sequences complied with the previous condition, the sequence with the minimal number of ambiguous characters in the S gene was selected as the representative of the cluster. If there were more than one such sequence, the one additionally having the minimal number of gaps or 'N' characters in the whole genome sequence was selected. If still more than one sequence met all the previous conditions, the first of them which had the minimal number of ambiguous characters in the whole genome sequence was finally chosen. All gaps in the selected complete genome sequences were converted to reference characters, and insertions relative to reference sequence were ignored.

Phylogenetic analysis and inference of ancestral sequences
The phylogenetic tree was constructed with IQ-TREE (v. 2.1.2, model = GTR + I+G) (Minh et al., 2020). Additionally, we obtained an alternative topology for these sequences by UShER (Turakhia et al., 2021) for this, we inserted the sequences into a prebuilt global SARS-CoV-2 phylogeny of publicly available genome sequences (http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/ UShER_SARS-CoV-2/public-latest.all.masked.pb.gz accessed on 10.01.2022). Some of our selected GISAID sequences had already been in the global tree, so we inserted only those which were absent there. Finally, we extracted the subtree corresponding to the analyzed sequences from the global tree.
The trees were rooted by the outgroup sequence USA-WA1/2020 (EPI_ISL_404895). For both trees, ancestral sequences were reconstructed by TreeTime (v. 0.8.2) (Sagulenko et al., 2018) with default parameters. Because we focused on the evolution of the S gene, we removed from each tree the internal nodes which had S-gene sequences identical to their parental nodes. Finally, we obtained the list of mutations for each tree branch.

Concordantly and discordantly evolving pairs of sites
Our approach to detection of concordantly and discordantly evolving pairs of sites is a development of the phylogenetic method published earlier (Kryazhimskiy et al., 2011;Neverov et al., 2021;Neverov et al., 2015). It is based on counting consecutive pairs of mutations on the branches of a phylogenetic tree. A pair of mutations at two different sites is called consecutive if a mutation in one of the sites occurs in the subtree of the branch which carries a mutation in the other site, and there are no other mutations at these sites on the branches that constitute the path between them; if two sites mutate on the same branch, we assume that both orders are equiprobable (Kryazhimskiy et al., 2011). Here, we consider four models for detection of epistasis which are based on this approach. Two of these models account for identities of ancestral and derived amino acids at sites; the other two models disregard the identities of amino acids and account only for occurrences of mutations on the tree branches. We define the epistatic statistics in a general form which is used for all four models. The expression of the epistatic statistic for models that ignore identities of alleles could be straightforwardly obtained from the general form.
For an ordered pair of sites ( i, j ) , the epistatic statistics e ( i,j ) is the weighted number of consecutive pairs. It is formally defined as.
Here, α and β are the ancestral alleles, and A and B are the derived alleles at sites i and j ; E is the set of all tree edges; and E ( m ≽ l ) is the set of edges descendant to the edge l . Indicator functions m, β, B ) equal to one if on branches l and m , mutations at sites i and j occur from the specified ancestral alleles to the specified derived alleles. The indicator function δ i,j ( m ≽ l ) equals to one if a mutation occurs at branch l at site i , and a consecutive mutation occurs at branch m at site j . t l,m is the length of the shortest path between tree branches l and m . The function c ( is the fraction of all mutations from β to B at site j that occur on the background of the mutation from α to A at site i . In the models of detection of epistasis which ignore allele identities, the weights w ( are set equal to 1. The epistatic statistic for an unordered pair is a total of two statistics for ordered pairs: We used two different null models for the epistatic statistics. The first model randomly and independently reshuffles mutations at each site; the second model specifically accounts for distribution of mutations at each ordered pair of sites ( i, j ) . To do that, the phylogenetic positions of mutations at the first site of a pair (background site) i are considered fixed, and the positions of mutations at the second (foreground) site j are reshuffled, thus unlinking the background and foreground. For both null models, reshuffling of mutation positions between tree branches preserves the total number of mutations at each site and the total number of mutations at each branch by using BiRewire utility (Gobbi et al., 2014). Combinations of two variants of epistatic statistics (with and without alleles) and two variants of the null model (with linked and with unlinked background and foreground) provide four models for detection of site pair epistasis which are compared in this study.
For the two models that account for allele identities, we generate amino acid sequences for internal and terminal nodes of the tree. For that, starting from the root sequence and traversing the tree from root to tips, we generate derived alleles for each mutation on a tree branch from the empirical allele distributions at this site, conditioned on the allele at the parental node of the branch. For the null model with unlinked background and foreground, the sequences in the tree nodes for the background remain unchanged, while for the foreground, new sequences are generated. For generating random distributions of mutations on tree branches and for calculation of epistatic statistics, we used the Bio::-Phylo Perl module (Vos et al., 2011) for traversing phylogenetic trees.
For each model, we perform 50,000 permutations of positions of mutations; for the two alleleaware models, for each permutation, we also generate the amino acid sequences at tree nodes. For each permutation, we calculate the epistatic statistics for unordered pairs, together with two p-values: the fraction of statistics equal or greater than the observed value (upper p-value), and the fraction of statistics equal or less than the observed value (lower p-value).
To account for multiple testing, we estimated the false discovery rates (FDR). For this, we randomly selected 400 out of 50,000 permutations. For each of the 400 permutations, for each unordered pair of sites, we calculated the epistatic statistic and the upper and the lower p-values. For each p-value threshold, we calculate the corresponding number of findings for the real dataset (R -declared positives) and the average number of findings in the fake dataset (E[V] -false positives). The FDR is the ratio of E[V] to R.
To differentiate between epistasis and non-epistatic episodic selection, for each concordantly or discordantly evolving pair of sites, we analyzed phylogenetic distances between nonconsecutive substitutions as described in Neverov et al., 2021, with the only difference that 400 sets of mutations instead of 200 were generated for the null model to obtain p-values. We then adjusted the resulting p-values using the Benjamini-Hochberg correction (Benjamini and Hochberg, 1995).

Simulation of independent evolution of sites
To model independent evolution of sites, we used genome-wide forward simulator MimicrEE2 (Vlachos and Kofler, 2018). Under independent mode of evolution, MimicrEE2 multiplies the fitness changes caused by individual mutations to get the fitness of a genome. The initial population consisted of 50,000 identical haploid genotypes with 100 biallelic (a or A) sites. At the start of simulation, 20 sites were under positive selection, since the initial allele was deleterious: its fitness was equal to 0.9945, while fitness of the other variant was equal to 1. Another 20 sites evolved under negative selection, since the initial allele was beneficial. The remaining 60 sites evolved neutrally, with the two possible variants at each site having fitness of 1. We simulated evolution of the population for 5000 generations, with mutation rate 5e-4 mutations per site per generation. Each 250 generations, 50 genotypes were sampled from the population, resulting in 1000 sequences. These were further used to reconstruct the phylogeny and measure the signal of epistasis. For each of the four detection models, the minimal FDR threshold was obtained, such that if the desired level of FDR would be below the threshold, no false concordantly evolving pairs of sites would be predicted.

Simulation of positively and negatively epistatically evolved site pairs
MimicrEE2 allows modeling of epistatic interaction between a pair of sites, assigning fitness values to all possible combinations of binary variants at these sites (aA, aA, Aa and AA). The fitness of a genome in this case is the product of fitness values of individual changes and fitnesses of variant combinations for specified pairs. The initial population consisted of identical genotypes with lowercase alleles at each site, with a total of 100 sites: 20 sites (or 10 pairs of sites) in positive epistasis, 20 sites in negative epistasis and 60 neutrally evolving sites with no epistatic interactions. At neutrally evolving sites, all variants had fitness equal to 1. To model positive epistasis between a pair of sites, we assigned fitness 1 to variant combinations aa and AA and fitness 0.9945 to aA and Aa, so that the first mutation at one of the sites was deleterious, and the consequent mutation at the second site restored the initial fitness. To model negative epistasis between a pair of sites, we assigned fitness 1 to variant combinations aa, aA and Aa, and fitness 0.8 to AA, so that the first mutation at one of the sites was neutral, and the consequent mutation at the second site was deleterious. Again, we simulated evolution of a haploid population of size 50,000 for 5000 generations, with mutation rate 5e-4 mutations per site per generation, sampled 50 genotypes each 250 generations, and used the resulting 1000 sequences to reconstruct the phylogeny and measure the signal of epistasis.

Comparing the signal strength of coordinated evolution across site subsets
To compare the strength of concordant evolution for pairs of sites belonging to a specified site subset with other pairs of sites, we designed a test comparing the average ranks of pairs of sites in these two subsets of pairs. First, we ordered all site pairs from high to low strength of concordant evolution of their sites according to ascending upper p-values. Then, we calculated the mean ranks of site pairs in each of the two subsets: in a subset of pairs for which both sites belong to the specified subset of sites, and in the complementary subset of pairs. A direct comparison of ranks in these two subsets of pairs may be misleading, because a test for coordinated evolution of sites may assign better p-values to pairs of sites having some properties, for example those with higher evolutionary rates, which could be overrepresented in a specified subset of sites. Therefore, we need to compare the mean ranks for the bipartition of site pairs for the actual data with the distributions of mean ranks for the same bipartition for data simulating independent evolution of sites. As simulated data, we used a set of 400 permutations of mutations on the tree used for the FDR estimation. Calculating the ranks of pairs, we then assigned the average values of ranks for site pairs having the same values of the ordering statistic. As the test statistic, we used the difference of the mean ranks of pairs in two parts of the bipartition. To calculate the p-value of the test, the test statistic for the actual data was compared with the test statistic obtained for the simulated data.
We applied this procedure to separately tested subsets of lineage-defining sites of VOCs that circulated before May 2021: Alpha (B.1.1.7+Q.*), Beta (B.1.351.*), Delta (B.1.617.2+AY.*) and Gamma (P.1.*). The list of lineage-defining mutations in the S-gene was compiled according to Hodcroft, 2021Hodcroft, (accessed 23.07.2022. We considered only missense lineage-defining mutations and excluded from the analysis the site S:614, because the mutation D614G was fixed in all considered VOCs. To test whether the enrichment of pairs of lineage-defining sites for Alpha, Beta, Delta, and Gamma among concordantly evolving site pairs is due to mutations that occurred at origins of these VOCs and/or reversions of these mutations within the VOC clades, for each VOC, we obtained a pruned tree which did not carry the sequences of this VOC, as well as the sequences descendant from the VOC clade but not annotated as this VOC. For this, we removed from the phylogeny the isolates that were assigned by PANGOLIN (O'Toole et al., 2021) to the corresponding VOC in their metadata as well as the isolates of other lineages descendant to the ancestral node of the VOC that carried on its branch the earliest lineage-defining mutation. This procedure is conservative, in that it could exclude some of the non-VOC samples that carried a subset of VOC lineage-defining mutations. For each pruned tree, we then applied the procedure of finding concordantly evolving pairs of sites, and then the procedure of comparing the strength of concordant evolution for pairs of lineage-defining sites and the complementary subset of pairs.
Comparing the sets of concordantly evolving pairs and DCA high scoring pairs Alignments of protein sequences and DCA scores for pairs of alignment sites for the following PFAM domains of Spike bCoV_S1_N (PF16451.6), bCoV_S1_RBD (PF09408.11), CoV_S1_C (PF19209.1), bCoV_S2 (PF01601.17), CoV_S2_C (PF19214.1) were kindly provided by Dr. Rodriguez-Rivas upon our request. For each alignment, we identified the sequence most similar to the SARS-CoV-2 Spike protein encoded in the genome EPI_ISL_404895 using BLASTP []. The lengths of domains mapped on the SARS-CoV-2 Spike protein were 305 for bCoV_S1_N, 178 for bCoV_S1_RBD, 57 for CoV_S1_C, 519 for bCoV_S2 and 40 for CoV_S2_C. The total number of site pairs for each domain equals N D (N D -1)/2, where N D is the number of sites in the domain. For each domain, we considered only those concordantly evolving pairs of sites for which both sites were located within the mapped domains; there were 15 such pairs in bCoV_S1_N, fourteen in bCoV_S1_RBD, zero in CoV_S1_C, four in bCoV_ S2, and one in CoV_S2_C. We estimated the chance to find the observed numbers of concordantly evolving pairs of sites among the N D pairs having the highest DCA scores using Fisher's exact test.

Funder
Grant reference number Author The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. Additional files

Data availability
The codes and data files required to reproduce the analysis are available at https://github.com/gFedonin/EpiStat, (copy archived at swh:1:rev:6c528ec4991854467e98684aa471a9b8f095b875). The data for reproducing analyses from the paper are in the archive file -" sars-2. epistat. data. tgz". All genome sequences and associated metadata in the dataset EPI_SET_20220729hq (Appendix 1 - Number of lineage-defining sites: 14. P{delta(simulations)≤delta(data)}=0.635.