Jointly modeling deep mutational scans identifies shifted mutational effects among SARS-CoV-2 spike homologs

Deep mutational scanning (DMS) is a high-throughput experimental technique that measures the effects of thousands of mutations to a protein. These experiments can be performed on multiple homologs of a protein or on the same protein selected under multiple conditions. It is often of biological interest to identify mutations with shifted effects across homologs or conditions. However, it is challenging to determine if observed shifts arise from biological signal or experimental noise. Here, we describe a method for jointly inferring mutational effects across multiple DMS experiments while also identifying mutations that have shifted in their effects among experiments. A key aspect of our method is to regularize the inferred shifts, so that they are nonzero only when strongly supported by the data. We apply this method to DMS experiments that measure how mutations to spike proteins from SARS-CoV-2 variants (Delta, Omicron BA.1, and Omicron BA.2) affect cell entry. Most mutational effects are conserved between these spike homologs, but a fraction have markedly shifted. We experimentally validate a subset of the mutations inferred to have shifted effects, and confirm differences of > 1,000-fold in the impact of the same mutation on spike-mediated viral infection across spikes from different SARS-CoV-2 variants. Overall, our work establishes a general approach for comparing sets of DMS experiments to identify biologically important shifts in mutational effects.

this scenario, the use of α d is important for allowing the model to learn the full latent phenotypic distance between wildtype 23 sequences from different experiments. Otherwise, if all such mutations are sampled, α d can be locked at zero. 24 If there is an unmeasured mutation in the bundle, our strategy for removing this mutation from the summation term of 25 Eq. (1) is to completely ignore this site when analyzing the data. This involves ignoring all variants with mutations at this site. 26 Although this can result in throwing away data, it is necessary for the math to work out. For instance, in the above example 27 where Y30A is missing from the non-reference experiment and A30Y is missing from the reference experiment, α d will be used 28 to capture the effect of A30Y when modeling the latent phenotype of the wildtype sequence from the non-reference experiment. 29 If a variant from the non-reference experiment has a Y30G mutation, this mutation will be included in the summation term of 30 Eq. (1) as A30G. However, since α d is a constant offset used to compute the latent phenotype of all variants from experiment 31 d, and since α d includes the effect of A30Y, Eq. (1) would effectively be adding the effects of both A30G and A30Y when 32 computing the latent phenotype of this particular variant, which does not make sense. Because of this problem, we choose to 33 ignore all data at such sites. where θ scale is a scaling factor that allows the sigmoid to stretch or contract along the y-axis and θ bias is a factor that 39 allows it to translate along the y-axis. We do not include analogous factors for the x-axis, since these factors would be Hill coefficient of 1.

42
• Softplus: This function is a log-transformed version of a sigmoid function from above, with θ scale and θ bias parameters serving 45 similar roles. This function may be more appropriate for modeling functional scores in log space.

46
• Single-layer neural network with sigmoid activations: where n is the number of units in the hidden layer and is defined by the user; w (h) ∈ R n ≥0 and b (h) ∈ R n are the hidden that all functional scores are directly comparable.

66
Ideally one or more of the same sequences would be included in the DMS library of each experiment, so that all scores could 67 be normalized to the same sequence. However, if there are no common sequences, then it may be possible to computationally 68 estimate how to normalize scores so that they are more directly comparable.

69
To this end, the model includes an additional parameter γR D that allows functional scores from each experiment to be 70 normalized as follows: where γ d for the reference experiment is locked at zero. There is a theoretical basis for adding γ d to y v,d if functional scores are 73 log-enrichment ratios. As mentioned above, log-enrichment ratios are normalized so that the wildtype sequence from a given 74 experiment has a value of zero, according to the formula: Thus, adding γ d to y v,d is akin to renormalizing the log-enrichment ratios so that a different sequence has a functional score 77 of zero. γ can be used to recalibrate the functional scores of each non-reference experiment to the wildtype sequence of the 78 reference experiment. If these values are not experimentally measured, the model allows γ to be fit during optimization.

79
Alternatively, γ can be locked at zero if the user does not wish to implement this optional feature.  To this end, the software package provides an option to truncate predicted functional scores. To enable this, the mathematical 93 model passes predicted functional scores through an activation function, t. Users can choose between multiple options for t, 94 one of which (softplus) truncates predicted functional scores at a user-specified lower bound, and the second of which (identity) 95 leaves the functional scores unaltered. The functional forms of these options are as follows.

96
• Softplus: where l d is the floor of the dynamic range of the DMS experiment from experiment d, and is set by the user. Functionally  There is a small range of input values where the function smoothly transitions between a linear regime (where data is not 101 truncated) and a flat regime (where data is truncated). But, the 0.1 constants from above ensure a sharp transition 102 between regimes. This function is differentiable and thus allows for gradient-based optimization.
• Identity:   This is an experimental artifact that may partially account for why inferred shifts tend to be positive in Delta relative to BA.1. In both panels, functional scores are clipped at a lower bound of -3.5, which we use as an approximation of the lower bound of the assay's dynamic range. and each dot corresponds to a different mutation. The effects inferred for BA.1 were highly correlated. The shifts in effects were also well correlated, but showed variability between models. Thus, the results were partially dependent on the choice of reference experiment, and suggest that future users of this method should also compare results between different choices. But the mutations with large shifts in one model almost always had large shifts in the other models, suggesting that the overall results were robust to this choice in the case of this study.   S7. Patterns of mutational effects in BA.1 compared to shifts in Delta or BA.2 in this study and other studies. The top row of plots show data from this study, while the subsequent rows show data from other studies (see row labels on the right). In each row, the left scatter plot shows mutational effects in BA.1 (x-axis) compared to shifts in mutational effects in Delta relative to BA.1 (y-axis), where each dot corresponds to a unique mutation. The right scatter plot shows the same data, but for shifts in mutational effects in BA.2. In each plot, the red line shows the median of all shift parameters within a sliding window of size 1.0 along the x-axis, centered on a given x-value. If a shift occurs along the blue dashed line of y = −x, the shift completely neutralizes a mutation's effect. The second and third rows report data from DMS experiments on the RBD measuring the impact of mutations on ACE2 binding, where effects quantify log 10 fold change in the K d of binding, or expression on the surface of yeast, where effects quantify log 10 fold change in expression (4, 5). The fourth row reports data from a computational method for estimating mutational effects from millions of naturally occurring sequenced SARS-CoV-2 genomes, with effects independently estimated for each clade (6). For each of these datasets, we computed shifts by subtracting the effect of a given mutation in Delta or BA.2 by its effect in BA.1, only showing data for mutations that overlap between this study and the study in a given row, with row labels indicating the number of mutations being plotted. In this study, shifts in Delta tend to be positive, softening the effects of mutations that are deleterious in BA.1 (top row). As indicated by the red lines, this same qualitative pattern is also visible in the DMS data on RBD-ACE2 binding (second row) and estimates of mutational effects from natural-sequence variation (fourth row), supporting the hypothesis that Delta might be generally more tolerant of mutations than BA.1, though these datasets also contain a subset of mutations with negative shifts, suggesting that this increased mutational tolerance may not be as stark as the shifts from this study suggest. The DMS data on RBD expression (third row) shows a different pattern, where mutations with small deleterious effects in BA.1 tend to be positively shifted in Delta (in agreement with the above trend), though ones with larger deleterious effects in BA.1 tend to have strong negative shifts. This pattern does not support the hypothesis that Delta is more mutationally tolerant than BA.1, at least in terms of mutational effects on RBD expression. In contrast to Delta, the shifts in BA.2 inferred from other datasets do not show strong positive or negative biases as a function of mutational effects in BA.1 on the x-axis, as indicated by the mostly flat red lines at y-values of approximately zero. Note: in the fourth row, data was clipped to the plotted xand y-ranges to aid with the comparison between studies, though the red lines were not affected by this clipping. Fig. S8. Correlation in mutational effects between this study and other studies. The other studies are the same ones described in Figure S7. (A) Each plot compares the mutational effects in BA.1 measured in this study (x-axis) with effects measured from another study (y-axis), where each dot corresponds to a unique mutation present in both studies being compared. Each plot shows an intermediate correlation (r is Pearson's correlation coefficient and p is the corresponding p-value), suggesting that the effects estimated in this study capture biological signal. (B) Each plot compares the shifts in mutational effects in either Delta (left column) or BA.2 (right column) measured in this study compared to shifts computed from each dataset in panel A, which we computed here by subtracting the effect of a given mutation in Delta or BA.2 by its effect in BA.1. The shifts in Delta are correlated in all three rows of comparisons, further suggesting that the shifts estimated here capture biological signal, although the the correlations are weak. The shifts in BA.2 show little-to-no correlation with those from the RBD DMSs, perhaps because the largest shifts inferred from our study are outside of the RBD, whereas there is a somewhat higher correlation with shifts estimated from natural-sequence variation of full-length spike, though the overlapping mutations from these datasets still do not include most of the strongly shifted mutations in BA.2 from our study. The reason why these correlations are not higher could be due to noise, as well as differences in selective pressures between datasets (e.g., selection for spike-mediated entry of pseudoviruses into cells is shaped by an overlapping but distinct set of selective pressures compared to selection for ACE2 binding of RBD monomers on the surface of yeast  shows the inferred global-epistasis function, which is used to map a variant's predicted latent phenotype (x-axis) to a predicted functional score. Each dot corresponds to a single variant from a single experiment, with dots colored by experiment. The y-axis shows the observed functional score of each variant. All dots would be positioned along the sigmoid if predicted and observed functional scores were identical. Instead, the dots track with the sigmoid with some deviations, indicating good fit despite some combination of inaccurate predictions and experimental noise. Vertical lines indicate the inferred latent phenotype of each homolog, all of which cluster together. (B) Correlation between predicted and observed functional scores with dots colored by experiment. Scores were well correlated for each homolog and for each replicate fit, where r reports the corresponding Pearson's correlation coefficient for a given homolog (hue; see legend in panel A).