Harnessing Matrix Completion to Unify and Extend Viral Serology Studies

Characterizing the antibody response against large panels of viral variants provides unique insight into key processes that shape viral evolution and host antibody repertoires, and has become critical to the development of new vaccine strategies. Given the enormous diversity of circulating virus strains and antibody responses, exhaustive testing of all antibody-virus interactions is infeasible. However, prior studies have demonstrated that, despite the complexity of these interactions, their functional phenotypes can be characterized in a vastly simpler and lower-dimensional space, suggesting that matrix completion of relatively few measurements could accurately predict unmeasured antibody-virus interactions. Here, we combine available data from several of the largest-scale studies for both influenza and HIV-1 and demonstrate how matrix completion can substantially expedite experiments. We explore how prediction accuracy evolves as the number of available measurements changes and approximate the number of additional measurements necessary in several highly incomplete datasets (suggesting ∼250,000 measurements could be saved). In addition, we show how the method can be used to combine disparate datasets, even when the number of available measurements is below the theoretical limit for successful prediction. Our results suggest new approaches to improve ongoing experimental design, and could be readily generalized to other viruses or more broadly to other low-dimensional biological datasets.


Introduction
The constant evolution of viruses such as influenza and the human immunodeficiency virus (HIV-1) leads to viral escape and degraded immunity. As a result, the influenza vaccine must be periodically reformulated to focus our antibody response against currently circulating strains (Petrova & Russell, 2017). In the context of HIV-1, cocktails of antibodies are being developed to counteract the vast diversity of viral strains within each host (Wagh et al., 2018). Thus, the ability to characterize the spectrum of virus evolution against panels of antibodies or antisera has become critical to the development of new vaccine strategies and successful viral control.
While thousands of new variants emerge each year, only a small fraction are functionally characterized in terms of their effects upon the antibody repertoire. Virus quantification assays (typically assessing binding, neutralization, or hemagglutination inhibition (HAI)) are time- and resource-intensive, necessitating small antibody-virus panels that rarely exceed several dozen strains (Creanga et al., 2021; Georgiev et al., 2013; Kong et al., 2015; Li et al., 2012; Wrammert et al., 2011). Crucially, virus panels often differ between studies, making it difficult to translate the lessons learned in one context to predict new viral behavior.
Nonetheless, such small-scale studies have helped shed light on how viral infection or vaccination reshapes the antibody response (Amanna et al., 2007; Arevalo et al., 2020; Thompson et al., 2016) and constrains viral evolution (Dingens et al., 2019; J. M. Lee et al., 2019; Perelson et al., 2012). Moreover, an increasing number of large-scale studies have probed how our antibody response copes with the diverse array of viruses we encounter. For example, each year the influenza surveillance network surveys ~100,000 viruses against ~10 reference sera to determine whether a new antigenically distinct virus strain has emerged. Extending this surveillance by combining data and harnessing the insights from the small-scale studies could improve vaccine selection (Petrova & Russell, 2017). This ability to combine datasets is especially feasible in the context of antibody-virus interactions, where past studies have repeatedly shown that the data are low-dimensional (Lapedes & Farber, 2001; Ndifon, 2011). Strains within an antigenic cluster often elicit similar responses across experiments (Smith et al., 2004). Different antibodies targeting the same epitopes, or sera from individuals with similar exposure histories, can show correlated activity across viral strains (Fonville et al., 2014; Kucharski et al., 2015). The profile of serum inhibition from an individual exposed to two antigenically distinct viruses might be expressed as a combination of profiles from two individuals, each exposed to one of the viruses (Carter et al., 2016). Mathematically, these features are consistent with a low-rank structure (Figure S1).
The theory of compressed sensing, and the related field of matrix completion, describe how such structure can be leveraged in the setting of under-sampled experiments (Candes & Recht, 2009;Cleary & Regev, 2020). These techniques have been extensively studied theoretically and applied in many settings from online recommender systems to magnetic resonance imaging (Candes & Tao, 2010). With matrix completion, the goal is to use a relatively small number of observations (individual matrix entries; here, corresponding to a virus-antibody or virus-serum pair) to identify low-rank features that can be used to infer missing values.
In the context of antigenic characterization, matrix completion has been used to refine the calculation of antigenic distance between strains (Cai et al., 2010; Ndifon, 2011). However, prior work has not sufficiently addressed questions of sample complexity (describing how the magnitude of errors will change based on the number of available measurements), which is critical for the successful design of under-sampled experiments. Moreover, while earlier studies have performed matrix completion on the data within a given study (what we will call intra-table matrix completion, Figure 1A), the limits of inter-table matrix completion are not clear (Figure 1B). It remains to be demonstrated to what extent partially overlapping studies from varying geographic regions and points in time can be integrated; whether different types of studies (e.g., infection versus vaccination, or using fluorescence versus hemagglutination inhibition) can be combined (Fonville et al., 2014; Nguyen Vinh et al., 2021); and whether studies across species (particularly human versus ferret in the case of influenza) can successfully inform each other in the context of matrix completion. Successful inference in these tasks could vastly increase the amount of data used to study viral evolution or develop vaccination strategies, and could be applied to improve ongoing experimental design.
Here, we develop a low-rank matrix completion framework that can analyze the increasing number of serological studies of the antibody response against viruses. We focus on two of the largest such datasets: (1) The Fonville dataset consists of six influenza studies (1 ferret study, 1 human infection study, and 4 human vaccination studies), all measured using HAI (Fonville et al., 2014). (2) The Catnap dataset contains antibody-virus neutralization measurements from over one hundred HIV-1 studies (Yoon et al., 2015). Such experiments commonly contain a subset of missing measurements. Ideally, subsequent analyses can ignore these missing values and use all available data, yet in some cases, such as quantifying the dimensionality of the antibody response using singular value decomposition, entire rows or columns must be removed to obtain a complete subset of data (Figure 1A). Moreover, measurements across multiple studies are difficult to combine since virus panels tend to only partially overlap, further constraining the number of measurements (Figure 1B). For example, only 9/81 viruses were included in all six Fonville studies, and 0/933 of the Catnap viruses were included in all 157 studies from that database.
In this work, rather than trimming down to a complete subset of data, we instead harness the low dimensionality of the immune response and extrapolate missing antibody-virus measurements both within and between studies. Mathematically, missing values are imputed by minimizing the nuclear norm of the antibody-virus data to create a complete and low-rank matrix. This framework can be implemented with fewer than 10 lines of code in Python or Mathematica (see Methods and the linked GitHub repository). In the following sections, we illustrate this approach using progressively more complex forms of matrix completion, creating a framework that naturally incorporates antibody-virus measurements from multiple studies to inform future efforts.

Results
Antibody-virus data are consistent with a low-rank structure
We first quantified how well a low-rank approximation can characterize the influenza and HIV-1 datasets. Every dataset we evaluated is markedly low-rank relative to its naive dimensionality, with ~95% of the variance in entries explained by rank 6-23 approximations, depending on the dataset (Table S1). These low-rank approximations minimally shifted the existing measurements, with a root-mean-squared error (RMSE) ≤0.06 for the log10(titers) across all studies (i.e., un-logged titers are shifted by 10^0.06 ≈ 1.1-fold on average, representing near-perfect fidelity).
These results hold for the nearly complete influenza studies (each 160 sera × 20 viruses, rank 8-9) as well as the more incomplete studies (between 35-324 sera × 57-74 viruses, rank 6-8) [Table S1]. Interestingly, the low rank of the Fonville data is similar to the number of antigenic clusters (n=14) in the combined panel of 81 viruses (Fonville et al., 2014), although substantial differences between viruses in the same antigenic cluster are observed in the completed matrix (Figure S2). The low-rank result also holds for large tables with hundreds of antibodies and viruses, including the monoclonal antibody Catnap database (373 antibodies × 933 viruses, rank 23). The larger rank of the Catnap dataset reflects its greater number of viruses (~12x more than the next largest study) as well as the larger diversity among the viruses in the HIV-1 versus influenza data. This consistent, low-rank structure suggests that substantial information about missing entries may be contained in the already-measured values. According to established theoretical results, if we wish to recover the missing entries of an n1 × n2 matrix with minimal error by matrix completion, the number of available measurements (m) must be linearly proportional to the rank (r) of the complete matrix and proportional to n log(n) (with n = max(n1, n2)) (Candes & Tao, 2010; Recht, 2011). The number of necessary measurements is also proportional to the coherence, μ, of the low-rank projection, which quantifies how concentrated the low-rank factors are on individual entries. Matrices with coherent factors (which in this context may correspond to individual antibodies or viruses with unique behavior that is orthogonal to the behavior of all other antibodies or viruses) require more measurements for successful recovery. Together, at least

m > C μ r n log(n)    (Equation 1)

measurements are necessary, for some positive constant C (we approximate C below) (Candes & Tao, 2010).
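The "rank required to explain ~95% of the variance" can be read directly off the singular value spectrum. A minimal numpy sketch (the function name is ours, not from the paper's code):

```python
import numpy as np

def rank_for_variance(M, threshold=0.95):
    """Smallest rank r whose truncated SVD captures at least
    `threshold` of the variance (sum of squared singular values)
    of a complete matrix M of log-titers."""
    s = np.linalg.svd(M, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # searchsorted finds the first index where the cumulative
    # explained variance reaches the threshold (0-indexed).
    return int(np.searchsorted(explained, threshold) + 1)
```

For a complete table this is a one-liner; for tables with missing entries, the paper applies it after matrix completion (or to a complete sub-block).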
While these results establish the sample complexity necessary for recovery with minimal error, there may be practical scenarios that deviate from some underlying assumptions of the theory. Most importantly, the theoretical result assumes the available entries have been sampled uniformly at random from the matrix, but in reality, available data are likely to have been sampled in a biased (non-uniform) fashion. Moreover, when the number of samples is below the theoretical limit, the recovered values may nonetheless contain useful information about missing entries that might, for example, be used to effectively prioritize future experiments. Below we investigate how the error in completed entries evolves under increasing sample complexity, and under uniform and non-uniform sampling.
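For intuition, the coherence μ in Equation 1 can be estimated from the leading singular subspaces, following the standard definition of Candes & Recht; the paper's exact estimator is described in its Methods, and the sketch below (with our own names) is only illustrative:

```python
import numpy as np

def coherence(M, r):
    """Estimate the coherence mu of the rank-r subspaces of M:
    mu = (n / r) * max_i ||P_U e_i||^2, taking the larger value
    over the left and right singular subspaces. mu = 1 for
    maximally incoherent factors; large mu flags rows/columns
    (antibodies/viruses) with unique, orthogonal behavior."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    mu_left = M.shape[0] / r * np.max(np.sum(U[:, :r] ** 2, axis=1))
    mu_right = M.shape[1] / r * np.max(np.sum(Vt[:r, :] ** 2, axis=0))
    return max(mu_left, mu_right)
```

A rank-1 matrix of ones is maximally incoherent (μ = 1), whereas a matrix with one dominant, isolated entry approaches the worst case μ = n/r.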

Sample complexity under random uniform sampling
To quantify the accuracy of intra-table completed values with uniform random sampling, we split the available data in each study into observed measurements and a withheld validation set, and varied the fraction of entries withheld while completing the matrix using the observed entries as input ( Figure 1C). As the observed fraction approaches 100% of available entries, we obtain an approximated error for the truly missing values ( Figure 1D, purple boxes).
For instance, when 50% of measurements are observed in Fonville Study 1 (the first of six datasets within (Fonville et al., 2014)), matrix completion can impute the values of all missing and withheld titers (Figure 2A). Using the completed data, we can quantify how accurately the withheld validation set was recovered, with Pearson's correlation coefficient r²=0.8 (i.e., 80% of variance explained) and RMSE=0.4 of the log10(titers) (Figure 2B), and estimate the complexity of the data as the rank required to explain 95% of the variance, with rank=6 for the data shown (Figure 2C, Methods). This analysis is repeated for all six Fonville studies at varying fractions of observed data, with the accuracy of each completion quantified using Pearson's correlation coefficient (r²) and the RMSE of the log10(titers), averaged over 10 runs. For nearly complete studies with few missing values, we expect the final estimated error to approach the experimental accuracy of the assay, whereas the error for studies with a sizable fraction of missing values will depend on how the number of measurements compares to the bound discussed above (Equation 1). In the nearly complete tables (Fonville Studies 5 and 6), we indeed observe that the RMSE approaches 0.3 (in log10 units) (Figure 2D), comparable to the 10^0.3 ≈ 2-fold accuracy of the HAI assay (see Figure S8 of (Fonville et al., 2014)). As more entries are withheld, the error increases, as expected (Figure 2D). The same trend is observed in the more incomplete influenza datasets, as well as the Catnap database (Figure 3), with the final estimated RMSE for the log10(titers) across all tables varying from 0.2 in Fonville Studies 3-4 (90% of entries available) to 0.8 in Catnap (13% of entries available).
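The withhold-and-validate loop above can be sketched in a few lines: split the observed entries, feed the masked matrix to any completion routine, then score the withheld set. Function names and structure are our illustration, not the repository's code:

```python
import numpy as np

def split_observed(M, frac_withheld, rng):
    """Withhold a random fraction of the observed (non-NaN) entries
    of M, returning the masked training matrix (withheld entries set
    to NaN) and the withheld (row, col) index pairs for validation."""
    idx = np.argwhere(~np.isnan(M))
    rng.shuffle(idx)
    held = idx[: int(frac_withheld * len(idx))]
    train = M.copy()
    train[held[:, 0], held[:, 1]] = np.nan
    return train, held

def rmse(imputed, truth, held):
    """RMSE of imputed log10(titers) over the withheld entries."""
    diff = imputed[held[:, 0], held[:, 1]] - truth[held[:, 0], held[:, 1]]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Sweeping `frac_withheld` and plotting `rmse` against the observed fraction reproduces the shape of the curves in Figure 2D.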
Since all six Fonville studies impute the missing entries with an RMSE on the order of the experimental error, we can use these curves to determine how many fewer measurements would have sufficed for equally successful completions. In brief, because the experimental error may vary by dataset and assay, we determine when the slope of each RMSE curve bottoms out (when slope ≥ -0.5, see Methods). We find that in Fonville Study 1, 67% of measurements were needed (implying ~1,000 measurements could have been "saved"); in Fonville Study 2, 37% were needed (~12,000 saved measurements); in Studies 3 and 4, 40% were needed (~20,000 saved measurements); and in Studies 5 and 6, 55% were needed (~3,000 saved measurements). The total across all studies, compared with measuring every value (including those missing in the published datasets), is ~35,000 potentially saved measurements out of ~60,000 possible measurements (58%).
Next, we used these empirical measures of the sample complexity necessary for successful completion to approximate the number of additional measurements needed to achieve the lower-bound RMSE in the more incomplete Catnap database. In monoclonal antibody studies, we estimate that with matrix completion on all available data the missing values had an r²=0.8 and RMSE=0.8, with both metrics expected to rapidly improve as more antibody-virus interactions are measured (Figure 3). To approximate the minimum number of additional measurements necessary to achieve the lowest possible RMSE, we first use the available data in each Fonville study to estimate the coherence (Methods). We then use the number of necessary measurements empirically determined above, together with the theoretical lower bound in Equation 1, to find the value of the constant, C, in each study (Table S2), and average this value across the studies. Finally, given this constant, the size of the monoclonal antibody Catnap table, and estimates of its rank and coherence, we approximate the number of additional measurements necessary for successful completion (Table S2). This suggests that 27% of all possible measurements are necessary, implying that ~48,000 of the ~300,000 missing antibody-virus pairs need to be measured. We repeated this analysis for the polyclonal antibody Catnap data, which, like the Fonville tables, was sufficiently complete to get an empirical estimate of the fraction of measurements necessary for successful completion, and found good agreement between the empirical and theoretical results (44% and 41%, respectively). We stress, however, that the theoretical estimates are rough approximations, meant to identify the scale of potential under-sampling rather than to define a specific point at which measurements should stop.
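The Equation 1 bookkeeping, solving for C on the well-sampled tables and then reapplying it to Catnap, amounts to a one-line rearrangement. A sketch with illustrative variable names (the numbers below are placeholders, not the paper's estimates):

```python
import numpy as np

def fit_constant(m_sufficient, mu, r, n1, n2):
    """Solve Equation 1, m = C * mu * r * n * log(n), for C, given
    an empirically sufficient number of measurements m_sufficient
    for an n1 x n2 table with rank r and coherence mu."""
    n = max(n1, n2)
    return m_sufficient / (mu * r * n * np.log(n))

def required_measurements(C, mu, r, n1, n2):
    """Approximate measurements needed for successful completion."""
    n = max(n1, n2)
    return C * mu * r * n * np.log(n)
```

Averaging `fit_constant` over the Fonville studies and feeding the result, together with Catnap's dimensions, rank, and coherence, into `required_measurements` reproduces the workflow behind Table S2.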

Non-uniform sampling: Withholding a virus in inter-table completion
We next examined matrix completion in a practical scenario with non-uniform sampling. We asked if inter-table completion can be used to predict inhibition against viruses that were not included in a given study (Figure 4A, blue squares). This form of completion is especially useful for rapidly evolving pathogens such as influenza or HIV-1, where studies published in the same year will often have only partially overlapping virus panels (Figure 1B).
We evaluated the accuracy of inter-table completion by merging all the Fonville studies, resulting in a matrix with 1,147 serum samples and 81 viruses. While only 9 viruses were included in all six studies, every virus was present in at least two studies (Figure S3). For each of 311 virus-study pairs, we withheld from the merged matrix all available measurements for that virus in that study, and asked whether the withheld values could be successfully inferred using matrix completion on the remaining data. We found that, on average, the imputed titers had an RMSE=0.3±0.2 (Figure 4B), corresponding to roughly 2-fold error when predicting the behavior of a virus that is entirely absent from a given study.

[Figure 4 caption, panels B-C: (B) Swarm plots depict r² [top] and RMSE [bottom] when withholding all possible viruses from each study. Viruses whose predicted or measured log10(titers) have standard deviation ≤0.2 were excluded from the r² analysis since their correlation coefficient was artificially deflated (Methods). (C) Monoclonal antibody data and polyclonal serum data from Catnap against 71 shared viruses were combined. For each virus, all its measurements were withheld from either the monoclonal or polyclonal dataset and imputed via inter-table completion. These imputed values were ranked from most-to-least potent antibody interactions, and in that order the true measurements were revealed and characterized as strong interactions provided either the IC50<0.1 μg/mL (for antibody measurements) or ID50>1,000 (for sera). The gray lines represent results for individual viruses, and show how quickly the strong interactions are identified when using these matrix completion results, with the average over all viruses shown in blue. The fraction of measurements required to identify 80% of the potent antibodies is emphasized by the dashed lines.]
While the majority of inter-table completions led to accurate predictions, some virus-study pairs showed a large RMSE (Figure 4B). Inter-table completion performed best for the two pairs of Fonville studies (#3-4 and #5-6). Both pairs used the same virus panel, and each contained human vaccination data collected in sequential years. Therefore, these studies represent the ideal scenario for inter-table completion, since virus trends in one study should be highly informative of the trends in the other. Indeed, removing any virus from these four studies led to low reconstruction error (RMSE=0.2±0.1 for Studies 3-4 and RMSE=0.3±0.1 for Studies 5-6). To quantify how much effort could have been saved in each pair of studies, we removed multiple viruses and computed the average RMSE for the imputed values (Figure S4). In each case, removing half the viruses from one study increased the average RMSE by only 30%. This suggests that when two contemporary studies with similar experimental designs have partially overlapping virus panels, the two studies can be merged and accurately matrix-completed.
On the other hand, Fonville Study 1 (a ferret study) and Study 2 (a human infection study) did not have a paired study with an identical virus panel and experimental design. Hence, these two studies represent a harder but more common scenario for inter-table completion. While many of these viruses had accurate inter-table completions (average RMSE=0.5±0.2 for both studies), there were some extreme outliers. For example, the worst virus (A/Auckland/20/2003) appeared in Studies 1, 3, and 4; it exhibited a large RMSE=1.2 when removed from Study 1 but a small RMSE=0.3 when excluded from Study 3 or 4. Moreover, restricting the analysis to a subset of studies, such as imputing human infection data (Study 2) from human vaccination data (Studies 3-6) while excluding the ferret data (Study 1), leads to marginally worse completions; more generally, including additional and varied datasets leads to better completions (Figure S5). Indeed, when combining all Fonville studies, 81/131 ≈ 62% of the inter-table completions using viruses from Study 1 or 2 had an RMSE<0.5, demonstrating that the behavior of multiple viruses can be inferred across disparate studies.

Non-uniform sampling: Combining monoclonal and polyclonal antibody data
We next evaluated a fundamentally different form of inter-table completion, using monoclonal antibody data to impute polyclonal serum measurements (and vice versa). While both datasets measure the amount of antibodies required to inhibit a virus by 50%, monoclonal antibodies represent a single, defined antibody whereas polyclonal sera contain an unknown mixture of antibodies. Hence, monoclonal antibodies are measured in terms of the 50% inhibitory concentration (IC50 in μg/mL units; lower values represent a more potent antibody) while sera are measured by the dilution necessary to achieve 50% inhibition (ID50 in dilution units; higher values represent a more potent serum). This more difficult case of inter-table completion must combine these disparate types of data as well as compensate for the greater incompleteness of the data, with a far larger number of monoclonal antibody measurements (373 antibodies × 933 viruses) than polyclonal data (40 sera × 71 viruses; each of these viruses is also present in the monoclonal data).
Consistent with expectations, we observed higher inter-table RMSEs than in the Fonville studies. The average error when withholding viruses from the monoclonal study was RMSE=1.2±0.3, while the average error when withholding from the polyclonal study was RMSE=0.7±0.3, implying absolute prediction errors of roughly 16-fold and 5-fold, respectively. However, given that the correlation between the imputed and missing entries remained moderately high (average r² of 0.47 and 0.49 when withholding from the monoclonal and polyclonal studies, respectively), the imputed data may still approximately sort the missing measurements from weakest to strongest.
Hence, despite the larger magnitude of errors, we hypothesized that inter-table completion could identify the most potent antibodies or sera. For each virus-study pair in the merged monoclonal-polyclonal Catnap data, we sorted the inferred entries for the withheld virus from most inhibitory to least. We then asked whether prioritizing future experiments in this order would identify highly inhibitory antibodies (IC50<0.1 μg/mL for monoclonal; ID50>1,000 dilutions for polyclonal data) more effectively than performing future experiments in a random order. Averaging across viruses withheld from the monoclonal study, 80% of highly inhibitory antibodies were identified in just the top 36% of prioritized experiments ( Figure 4C). Similarly, averaging across viruses withheld from the polyclonal study, 80% were identified in the top 30% ( Figure 4C). Thus, even when the individual estimates have large errors, inter-table completed results across dramatically different studies can be used to substantially increase the efficiency of future experiments.

Non-uniform sampling: Using timestamps to predict Catnap measurements across time
The dozens of studies comprising the monoclonal antibody-virus interactions in the Catnap database together create the largest matrix we analyzed (348,009 total interactions). With 87% of entries missing, the imputed titers exhibited a relatively high estimated RMSE=0.8 (signifying 6-fold error). This large error not only arises from the large fraction of missing entries, but also from the considerable divergence between the HIV-1 viruses in the Catnap database (viruses differ by 200 amino acids on average, whereas the Fonville influenza viruses differed by 30 amino acids on average). It is easier to matrix complete a virus's titers when a similar virus exists in the dataset, and conversely, the accuracy of matrix completion decreases as the distance to the nearest virus increases ( Figure S6A). In the context of HIV-1 where all viruses are highly diverse, the baseline matrix completion error tends to be higher, but it also depends less on sequence similarity ( Figure S6B).
Yet although the matrix completion results had a large RMSE for the missing values, they also exhibited a large r²=0.7. As with the monoclonal-polyclonal data examined above, this large r² implies that the completed titers can accurately distinguish which of the missing values represent very weak or very potent interactions. Said another way, 6-fold error is more than sufficient to differentiate strong and weak antibody interactions, given that IC50 values span 6 orders of magnitude, from 10^-4 to 10^2 μg/mL.
To assess this approach, we examined how well N experiments, prioritized by matrix completion, could identify weak antibody-virus interactions (signaling viral escape) or the strongest interactions (representing a potent immune response), with N varying from 1 to the total number of experiments performed. We used the timestamp associated with each Catnap measurement and considered data taken before a given year (e.g., in 2017 or before) to predict the missing values that were subsequently measured (e.g., in 2018 or later; Figure 5A). In doing so, we only consider the antibodies and viruses measured both in 2017 or before and in 2018 or later (in the example of Figure 5A, we ignore the antibody in the right-most column and the virus in the bottom row). We then ordered the predictions from strongest to weakest (Figure 5B) and determined the number of strong interactions (defined as IC50≤0.1 μg/mL) identified by N experiments.
When guided by matrix completion, 160/200=80% of the strong interactions would have been found after performing only 260/1230=20% of all Catnap measurements carried out in 2018 or later ( Figure 5C). In contrast, searching through the antibody-virus pairs at random would have required 980 measurements on average before 80% of the strong interactions could have been found, necessitating 4x more work. A similar trend was found when searching for the weakest antibody-virus interactions or when analyzing the Catnap data before/after a different year ( Figure S7). In summary, using matrix completion can help sift through the wealth of available data to find new antibody-virus interactions with especially high or low titers.
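The prioritization scheme above reduces to sorting predicted titers and scanning until a target fraction of the truly strong interactions is recovered. A minimal sketch, where the 0.1 μg/mL cutoff follows the text and everything else (names, signature) is illustrative:

```python
import numpy as np

def experiments_needed(predicted_ic50, true_ic50,
                       strong_cutoff=0.1, target=0.8):
    """Number of measurements, performed from most to least potent
    predicted IC50, needed to recover `target` of the truly strong
    interactions (true IC50 <= strong_cutoff)."""
    order = np.argsort(predicted_ic50)        # lowest predicted IC50 first
    strong = true_ic50[order] <= strong_cutoff
    goal = int(np.ceil(target * strong.sum()))
    # First position at which the running count of discovered
    # strong interactions reaches the goal (1-indexed).
    return int(np.searchsorted(np.cumsum(strong), goal) + 1)
```

Perfect predictions find all strong interactions immediately; random predictions approach the random-search baseline described in the text.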

Discussion
The low dimensionality of antibody-virus data provides the mathematical foundation to fill in missing entries in a partially observed table of binding, neutralization, or HAI measurements. The same principle underlies the field of compressed sensing, where a signal that is sparse in some basis can be reconstructed from relatively few measurements, enabling dramatically faster data acquisition. In the context of immunology, low dimensionality enables us to readily combine existing studies and inform future efforts.
We suggest several ways to harness matrix completion and facilitate future serological studies. First, an experiment quantifying how a large number of antibody responses inhibit a large number of viruses does not need to measure all interactions. If the experiment can be performed in waves (e.g., randomly selecting 20% of the unmeasured antibody-virus pairs each time), then the error of imputing the missing values can be calculated after each wave (Figure 2D). When this error becomes comparable to the error of the assay (typically 2-fold for HAI or neutralization measurements), or when the error bottoms out, the remaining values can be filled in via matrix completion. In the context of the Fonville 2014 studies, this amounts to saving tens of thousands of measurements.
Second, given a limited supply of sera or viruses, the same number of measurements could generate larger datasets via matrix completion. For example, suppose an experiment measures how potently 10 sera inhibit 10 viruses (more viruses would be better, but the amount of serum is limited). Rather than measuring all 100 interactions, an experiment could measure each serum against 10 randomly selected viruses from a group of 20 viruses. This would require the same number of experiments (i.e., 100 measurements), but the resulting completed matrix would have double the data.
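The design above can be written as a random sampling mask. A toy sketch (in practice one should also verify that every virus receives at least a few measurements, since an entirely unmeasured column cannot be recovered):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sera, n_viruses, per_serum = 10, 20, 10

# Pair each serum with 10 of the 20 candidate viruses at random,
# spending the same 100-measurement budget across twice as many
# viruses as exhaustive 10 x 10 testing would cover.
mask = np.zeros((n_sera, n_viruses), dtype=bool)
for i in range(n_sera):
    mask[i, rng.choice(n_viruses, size=per_serum, replace=False)] = True
```

Entries where `mask` is True are measured; the remaining half of the 10 × 20 table is imputed by matrix completion.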
Third, in large-scale serology studies where thousands of viruses/sera are assessed against a small number (~10) of sera/viruses (Harvey et al., 2016;Nguyen Vinh et al., 2021), the few sera/viruses chosen should be as functionally orthogonal to one another as possible to maximize the information gained. The ability to successfully matrix complete with only a fraction of measurements would suggest that the viruses are functionally similar, and a more diverse virus panel could be chosen.
Beyond these applications of matrix completion on a single dataset (intra-table completion), utilizing completion across datasets (inter-table completion) remains largely unexplored. One of the simplest implementations is to merge datasets and infer measurements for viruses that were entirely missing from a study ( Figure 1B). This method of inter-table completion consistently worked between pairs of human vaccination studies carried out in consecutive years (Fonville Studies 3-6, Figure 4B), presenting a simple framework to combine contemporary studies and greatly magnify the amount of data. Predictably, this approach was not always successful when inferring human (or ferret) infection data from human vaccination data or vice versa (Fonville Studies 1-2 in Figure 4B, Figure S5). Unlike with intra-table completions, where a fraction of the data can be withheld to quantify the expected error of the imputed values, there is currently no analogous method to quantify the expected error from inter-table completion. Hence, a fraction of interactions need to be measured to verify the results of such matrix completions.
Even in cases where the error of imputed values is unacceptably large (for example, if the majority of measurements are missing or if the diversity of viruses/antibodies results in a large coherence), inter-table completion can nevertheless pinpoint the strongest or weakest interactions that are often of greatest interest. For example, while only 16% of all 1230 HIV-1 antibody-virus measurements from 2018-2020 had a strong IC50≤0.1 μg/mL, by matrix completing all data from 2017 and earlier we found 80% of these strong IC50 values using only 260 experiments (Figure 5C). In addition, we demonstrated that monoclonal and polyclonal HIV-1 data against a panel of identical viruses could be combined to identify the strongest neutralization measurements, even though the composition of antibodies within these sera is not known (Figure 4C). Thus, if a new viral variant emerges and is rapidly tested against a panel of sera, matrix completion can determine which previously characterized monoclonal antibodies potently inhibit the virus. In short, inter-table completion can rapidly sift through all missing antibody-virus combinations and prioritize efforts for groups seeking the strongest or weakest interactions.
Unlike earlier applications of matrix completion using multidimensional scaling or alternating gradient descent (Cai et al., 2010; Lapedes & Farber, 2001; Ndifon, 2011; Smith et al., 2004), our approach of minimizing the nuclear norm does not require a dataset's rank to be specified. Hence, it is straightforward to adapt this method to different studies and different viruses without needing to refit hyperparameters. For example, we used the same code to analyze the six influenza studies (with ranks between 6 and 9) and the HIV-1 Catnap dataset containing hundreds of studies (with rank 23).
Matrix completion can harness the low-rank behavior of antibody-virus interactions to greatly increase the amount of available data. In light of these results, it is worth considering how much data we actually have, given that patterns seen in one experiment can inform future efforts, even when the specific viruses or antibodies studied differ. Indeed, from the vantage of matrix completion, differences in antibody and virus panels between experiments become an asset, enabling us to explore a greater fraction of the immune landscape and extrapolate the behavior of novel antibody-virus pairs.

Availability of Code
The matrix-completed results for the Fonville and Catnap datasets are included in the supplementary files. We also include the code to perform matrix completion in both Mathematica and Python (https://github.com/cleary-lab/Inhibition-Neutralization-Completion).
In Python, nuclear norm minimization takes fewer than 10 lines of code with the cvxpy package (Agrawal et al., 2018; Diamond & Boyd, 2016).

Fonville Datasets
Each serum sample only appeared in a single study, whereas all viruses were included in at least two studies. Cumulatively, the six studies contained 1147 distinct serum samples and 81 distinct viruses, with the distribution of viruses across the six studies shown in Figure S3. The raw data annotated with the virus names is included in the GitHub repository associated with this paper.

Intra-Table Matrix Completion
Consider a table of measurements M where entry Mjk represents the interaction between antibody (or serum) j and virus k, with larger values always corresponding to a more potent antibody or serum that strongly inhibits the virus. Mjk represents either the hemagglutination inhibition (HAI) titer from the Fonville influenza studies or the inverse of the 50% inhibitory concentration (1/IC50) for monoclonal HIV-1 antibodies from Catnap; in both cases, potent antibody/serum interactions are denoted by larger values.
Influenza HAI is measured using progressive 2-fold dilutions, so that titers can equal "<10", 10, 20, …, 1280, or "≥1280". We first replaced lower or upper bounds ("<10"→5 and "≥1280"→2560) and then took log10 of all titers. To prevent the error from large values from artificially dominating the completion algorithm, we subtracted the minimum value m = min over (j,k) of Mjk from all entries in M, performed matrix completion, and then added m back to obtain the complete matrix of log-titers (to recover the original un-logged titers, raise 10 to each completed value). We note that, as an alternative, the mean of M could be subtracted to center the data in log-space; in this work, we chose to subtract the minimum titer because it results in slightly better predictions of the larger titer values (which are often the most important measurements).
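A minimal numpy sketch of this preprocessing (the function name and the handling of the bound strings are illustrative, not the authors' exact code):

```python
import numpy as np

def preprocess_hai(titers):
    """Sketch of the HAI preprocessing described above.
    titers : raw titer values, e.g. "<10", 40, or "≥1280".
    Returns (log10-titers with the minimum m subtracted, m)."""
    numeric = []
    for t in titers:
        if t == "<10":
            numeric.append(5.0)          # lower bound -> 5
        elif t in (">=1280", "≥1280"):
            numeric.append(2560.0)       # upper bound -> 2560
        else:
            numeric.append(float(t))
    logged = np.log10(numeric)
    m = logged.min()
    return logged - m, m

shifted, m = preprocess_hai(["<10", 10, 20, 1280, "≥1280"])
# After completion, add m back and raise 10 to the values to recover titers.
recovered = 10 ** (shifted + m)
```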
The goal of matrix completion is to find a filled-in low-rank matrix M̃ such that Mjk≈M̃jk for all pairs (j,k) that were measured. Note these two criteria are at odds with one another, since maintaining exact fidelity with the data may result in a matrix M̃ with larger rank. We control the relative importance of these two goals with a single hyperparameter λ described below.
Although minimizing matrix rank is NP-hard, minimizing the convex envelope of the rank (the nuclear norm ||M||⋆) is computationally efficient. Hence, the matrix M∈ℝ^(n×m) is completed as

min over M̃∈ℝ^(n×m) of ||M̃||⋆ + λ ||M̃_measured − M_measured||₂²,

where (M̃_measured)jk = M̃jk whenever entry (j,k) was measured and 0 otherwise, with an analogous definition for M_measured. The first term, ||M̃||⋆, minimizes the nuclear norm (and hence the rank) of M̃, whereas the second term, ||M̃_measured − M_measured||₂², is a quadratic penalty for deviating from the measured values (the prefactor λ is discussed below). Such minimizations have a known tendency to shrink values towards zero, so that the best-fit line of the measured vs. predicted titers has a slightly larger slope than the perfect experiment-theory match (purple vs. black lines in the scatterplots of Figures 2A and 3A). For the influenza serum titers in Figure 2, where the log10(titers) are matrix completed, the predicted values tend to be smaller than the measured titers; for the HIV-1 Catnap antibody data, where the -log10(IC50) values are matrix completed, the predicted values tend to be greater than the measured titers because of the negation (see the Catnap Database Methods section).
The hyperparameter λ dictates the relative importance of the two terms. Large values of λ indicate strong trust in the measured values and result in higher-rank completions that remain highly faithful to the data. Smaller values of λ are appropriate when the data is noisy and result in a lower-rank matrix that can deviate from the measurements. In this work, we set λ=1 in all analyses. With this value, over 95% of the completed Fonville titers were within 10% of the measured values, keeping high fidelity with the experimental values.
Because the predicted and measured titers span multiple decades and are presented on log-log plots, both Pearson's r² and the RMSE were computed on the log10 data. The RMSE can be recast into un-logged titers by exponentiating by 10 (e.g., an RMSE of 0.2 in log10 space becomes a 10^0.2 = 1.6-fold error in the titer). The dimensionality of the data was computed on the titers mean-centered in log space, Mcentered = log10(M) − mean(log10(M)), to prevent a change in the units (e.g., measuring an IC50 in μg/mL versus Molar) from affecting the dimensionality. The dimensionality r of Mcentered is defined as the minimum number of singular values of Mcentered that explain 95% of its variance (the minimum r such that Σ_(j=1..r) σj² / Σ_(k=1..n) σk² ≥ 0.95).
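This dimensionality calculation can be sketched with numpy (the helper name is hypothetical):

```python
import numpy as np

def dimensionality(M, var_explained=0.95):
    """Minimum number of singular values explaining 95% of the variance
    of the titer matrix after mean-centering in log space."""
    logM = np.log10(M)
    centered = logM - logM.mean()
    s = np.linalg.svd(centered, compute_uv=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    # index of the first singular value at which >= 95% variance is reached
    return int(np.searchsorted(frac, var_explained) + 1)

# Toy example: a matrix whose log-titers are rank 1 after centering.
M = 10.0 ** np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
r = dimensionality(M)
```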

Inter-Table Matrix Completion
Computationally, inter-table completion works identically to intra-table completion, except that virus and serum panels must first be combined. Note that some of the Fonville viruses had different capitalization or punctuation in different tables (e.g., "HN196_2009" and "JO/33/94" in Fonville Table S1 and their counterparts "HN196/2009" and "A/JOHANNESBURG/33/94" in Fonville Table S3) that are unified in our supplementary data files. Any virus that was not present in a study was defined as missing for all sera in that study (e.g., "A/AUCKLAND/20/2003" was contained in Studies 1, 3, and 4, so when combining all Fonville data its titers were defined as "Missing" against sera in Studies 2, 5, and 6).
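As a minimal sketch of this merging step (using hypothetical toy tables, not the actual Fonville data), stacking studies with pandas automatically marks every virus absent from a study as missing:

```python
import numpy as np
import pandas as pd

# Hypothetical titer tables from two studies (virus names already unified).
study1 = pd.DataFrame({"HN196/2009": [40, 80],
                       "A/JOHANNESBURG/33/94": [160, 20]},
                      index=["serum_A", "serum_B"])
study2 = pd.DataFrame({"HN196/2009": [10, 640]},
                      index=["serum_C", "serum_D"])

# Stacking the studies fills NaN for every virus a study did not measure —
# exactly the entries that inter-table completion will impute.
combined = pd.concat([study1, study2])
```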
In the few cases when all predicted or measured log10(titers) withheld from a dataset have nearly identical values, the correlation coefficient r² is artificially deflated. For example, if five (predicted, measured) titers for a virus were log10 of {(10,10), (10,10), (10,10), (10,20), (20,10)}, then any linear dependence is masked by the small range of the predictions and measurements. This leads to a small Pearson's r² = 0.06, even though the predictions were perfect for the first three points and only 2-fold off for the final two points. In contrast, when measurements span a wide dynamic range with this same error distribution, such as log10 of {(10,10), (20,20), (40,40), (80,160), (160,80)}, the resulting Pearson's r² = 0.81 matches the typical intuition for this metric. Hence, when all predicted or measured log10(titers) withheld had a standard deviation ≤0.2, we excluded such cases from the Pearson's r² analysis in Figures 4B, S3, and S5. We note that such cases only occurred for inter-table completion for viruses that showed minimal HAI against all sera (Figure 4B) and never occurred for intra-table completion (Figures 2 or 3).
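The two numerical examples above can be checked directly (the helper name is hypothetical):

```python
import numpy as np

def pearson_r2(pred, meas):
    """Pearson's r^2 between log10-transformed predicted and measured titers."""
    x, y = np.log10(pred), np.log10(meas)
    return np.corrcoef(x, y)[0, 1] ** 2

# Narrow dynamic range: nearly perfect predictions, yet r^2 is tiny.
r2_narrow = pearson_r2([10, 10, 10, 10, 20], [10, 10, 10, 20, 10])
# Wide dynamic range with the same 2-fold errors: r^2 matches intuition.
r2_wide = pearson_r2([10, 20, 40, 80, 160], [10, 20, 40, 160, 80])
# r2_narrow ≈ 0.06, r2_wide ≈ 0.81
```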

Catnap Database
The complete list of Catnap antibody-virus measurements is updated each month on the Downloads page (https://www.hiv.lanl.gov/components/sequence/HIV/neutralization/download_db.comp; click the file labeled "Assay"). We split each antibody-virus pair into one of four categories based on the antibody name and the IC50 and ID50 values:
1. Polyclonal sera (n=40) did not have IC50 values but instead had ID50 values (the dilution at which a virus was 50% neutralized).
2. Antibody mixtures (n=44) had a "+" in their antibody names to distinguish the monoclonal antibodies in each mixture.
3. Multispecific antibodies (n=50) had a "/" in their antibody name to denote the different monoclonal antibodies from which they were composed.
4. Monoclonal antibodies (n=373) comprised all remaining entries.
For simplicity, our Catnap analysis was restricted to the monoclonal antibodies (category #4 above) and polyclonal sera (category #1). We replaced values bounded from above by half the value of the bound and values bounded from below by double their value (e.g., "<20"→10, ">40"→80), and when multiple antibody-virus measurements were available we took their geometric mean. We note that these assumptions, especially our treatment of the bounded values, may influence the matrix completion results, given that the IC50 values in Catnap range from 10^-5 μg/mL to 344 μg/mL, while the upper bounds varied between 10^-3 and 10^0.
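A sketch of this bound handling and replicate averaging (the function name is illustrative, not the authors' code):

```python
import numpy as np

def clean_ic50(values):
    """Replace bounded IC50 values — halve upper bounds ("<x" -> x/2) and
    double lower bounds (">x" -> 2x) — then combine replicate measurements
    for an antibody-virus pair by their geometric mean."""
    numeric = []
    for v in values:
        v = str(v)
        if v.startswith("<"):
            numeric.append(float(v[1:]) / 2)
        elif v.startswith(">"):
            numeric.append(float(v[1:]) * 2)
        else:
            numeric.append(float(v))
    return float(np.exp(np.mean(np.log(numeric))))

# e.g., replicates "<20" and 40 for one antibody-virus pair:
ic50 = clean_ic50(["<20", "40"])  # geometric mean of 10 and 40 = 20
```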
The only minor difference between the Catnap analysis and the Fonville analysis accounts for the fact that potent antibodies have low IC50 values in Catnap, whereas potent sera have high HAI titers in Fonville. Therefore, we completed the matrix of -log10(IC50) values for the Catnap monoclonal antibody data, whereas we completed the matrix of log10(HAI titers) for Fonville and log10(ID50) for the Catnap polyclonal serum data. Note that in all cases, we represented the weak antibodies/sera by smaller values in the matrix, which is important for the stability of the completion algorithm (because nuclear norm minimization pulls values towards 0). When combining measurements from different studies, the accuracy of matrix completion may improve if each dataset is separately mean-centered in log-space. This can adjust for different protocols across labs or for systematic shifts in the data. Moreover, when combining datasets with different units (e.g., the Catnap monoclonal antibody data in IC50 units and polyclonal serum data in dilution units; Figure 4C), the log-titers of both datasets should be separately mean-centered before completion.

Number of Measurements Required for Accurate Matrix Completion
When missing values are randomly sampled across a matrix for intra-table completion, the theoretical bound in Equation 1 (m > C μ r n log(n)) quantifies the number of measurements m required to accurately reconstruct a matrix based on the number of antibodies or viruses n (equal to max(# of antibodies, # of viruses)), the matrix rank r (here, determined as the rank required to explain 95% of the variance of the completed matrix), the coherence μ (calculated from the complete matrix as explained below), and a constant C (inferred from the Fonville RMSE curves, where m can be determined empirically). Together, these values quantify how difficult it is to complete a given dataset.
To compute μ for all data within a study (Candes & Recht, 2009; Ndifon, 2011), we take all rows/columns of the data matrix with ≥50% of entries measured and calculate the completed matrix M, whose dimensions we denote by n1×n2. (Rows/columns with few measurements lead to unstable results for highly incomplete datasets such as Catnap and are hence dropped.) We compute the rank r of M as the minimum number of singular values of M that explain 95% of its variance. Using the singular value decomposition M = UΣV^T, we form the orthogonal projections onto the column and row spaces of M using the first r columns of U or V. For example, P_U = Σ_(j=1..r) u_j u_j^T, where u_j is the jth column of U, with an analogous expression for P_V. The coherence of M is given by

μ = max( (n1/r) max_(1≤j≤n1) ||P_U e_j||², (n2/r) max_(1≤k≤n2) ||P_V e_k||² ), (Equation 2)

where e_j and e_k represent standard basis vectors, with values shown in Table S2.
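Equation 2 can be sketched in numpy (the function name and the toy matrices below are illustrative):

```python
import numpy as np

def coherence(M, var_explained=0.95):
    """Coherence (Equation 2) of a completed n1 x n2 matrix M,
    using the rank r that explains 95% of the variance."""
    n1, n2 = M.shape
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(frac, var_explained) + 1)
    # ||P_U e_j||^2 is the squared norm of row j of the first r columns of U;
    # analogously for V with the first r rows of V^T.
    row_u = np.sum(U[:, :r] ** 2, axis=1)
    row_v = np.sum(Vt[:r, :] ** 2, axis=0)
    return max(n1 / r * row_u.max(), n2 / r * row_v.max())

# A flat matrix has minimal coherence (mu = 1); a matrix dominated by a
# single entry has maximal coherence (mu = n).
mu_flat = coherence(np.ones((4, 4)))
spiky = np.zeros((4, 4))
spiky[0, 0] = 1.0
mu_spiky = coherence(spiky)
```

Low-coherence matrices (singular vectors spread across all rows/columns) need the fewest measurements, which is why μ enters the bound in Equation 1.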
To compute the value of the constant C in the Fonville studies, we find the value of m at which the slope of the RMSE curves in Figure 2 first becomes greater than -0.5 (indicating a sufficient number of measurements for that study). We then solve for C in Equation 1 to obtain a separate value for each study (Table S2). Below, we use the averaged value, C=0.18.
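Equation 1 can then be evaluated directly. The helper below is a sketch (a natural logarithm is assumed, following Candes & Recht); C=0.18, n=933, and r=23 come from the text, while μ is left as a parameter since the per-study coherences appear in Table S2:

```python
import math

def measurements_required(C, mu, r, n):
    """Equation 1: accurate completion needs m > C * mu * r * n * log(n)
    randomly placed measurements (natural log assumed)."""
    return C * mu * r * n * math.log(n)

# Catnap monoclonal data: n = 933, r = 23, averaged C = 0.18 (from the text).
# mu = 1 below is an illustrative placeholder, not the Table S2 value.
m_needed = measurements_required(C=0.18, mu=1.0, r=23, n=933)
```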
To approximate when the Catnap RMSE and r² curves in Figures 2B and 3B will flatten out, we utilize the theoretical bound in Equation 1 that a matrix can be accurately completed when the number of (randomly observed) measurements is greater than C μ r n log(n), where n=933 is the larger dimension of the Catnap matrix and r=23 equals the rank of the completed matrix (Candes & Recht, 2009). To validate this result, we apply this same approach to the Catnap polyclonal sera dataset and predict that the RMSE curve will bottom out when 41% of measurements are observed. Since the polyclonal serum dataset has approximately half of all serum-virus interactions measured, we can directly test this prediction by performing intra-table completion using different fractions of observed data; in doing so, we find that the RMSE curve bottoms out when 44% of measurements are observed, close to our predicted value.

Table S2. Number of measurements needed to accurately reconstruct missing data. The theoretical lower bound, m > C μ r n log(n), implies that accurate imputation via matrix completion requires at least m measurements in an n1×n2 matrix. μ represents the coherence of the matrix (Candes & Recht, 2009), r its rank, n the larger of the matrix length or width, and C is a constant determined from the RMSE curves of the Fonville studies. The average value of C is then used to determine at what fraction of observed data the Catnap RMSE curves will bottom out (right-most column). The average value of the constant and the implied fraction of measurements necessary in the Catnap studies are shown in parentheses. The Catnap polyclonal serum data is sufficiently complete to determine when the RMSE bottoms out, providing a direct test of this approach.
Figure S1. Example of low-rank matrices. (A) A rank 1 matrix has rows (or columns) that are multiples of a single vector. If the bottom row (representing a new antibody or virus) is added with just a single measurement, all other values in this row can be determined by finding the multiple of the common vector, as shown by the right-hand side of the equality. (B) A rank 2 matrix has rows (or columns) that are linear combinations of two vectors. If a new row is added with just two measurements, the other values in the row can be determined. Note that although the matrix on the left-hand side is visually very complex, it nevertheless follows a simple mathematical structure that can be determined using the singular value decomposition or alternative methods.

Figure S2. Antigenic clustering in the combined Fonville data. All six Fonville studies were combined, with any missing entries inferred by matrix completion. Heatmap values depict log10(IC50) for each virus-serum pair (columns and rows, respectively). Columns are labeled by antigenic cluster (defined in Figure S1 of Fonville 2014) and ordered according to hierarchical clustering of their profiles across sera in the completed matrix using Ward's minimum variance dissimilarity. Although many viruses group together by antigenic cluster, there are substantial differences within antigenic clusters, for example in antigenic clusters 10-14.

Figure S3. Inter-table completion using a single virus. As shown in Figure 4A,B, each virus was withheld from one Fonville study and predicted via matrix completion. Collectively, the measurements of the 81 viruses across the six Fonville studies led to 311 combinations. Pearson's r² [top] and the RMSE [bottom] are shown for each case. Viruses absent from a study are colored tan. Viruses shown in orange have predicted or measured log10(titers) with standard deviation ≤0.2; since they exhibit no linear dependence, their r² is artificially deflated and they were excluded from the r² analysis (see Methods).

Figure S4. Inter-table completion using multiple viruses. (A) Fonville Studies 3 and 4 have identical virus panels with 70 viruses. Different subsets of viruses were entirely removed from Study 4 (e.g., the five viruses shown in this schematic using columns of gold boxes) and inter-table completed. This process was repeated using Study 5 to inter-table complete Study 6, which have identical virus panels with 20 viruses. (B) The number of viruses shown on the x-axis were observed in Study 4 or Study 6, the remaining virus titers were imputed via matrix completion, and the resulting RMSE was calculated.

Figure S5. Inter-table completion across different types of data. Inter-table completion was performed analogous to Figure 4B, but using different subsets of Fonville studies. From left to right, the columns represent: (1) using human data to impute ferret data [Studies 2-6 → Study 1] and (2) Figure 4B, where the jth column represents using all Fonville studies except Study X to impute the data in Study X. Hence, the first column in Figure 4B is identical to the left column in this figure [Studies 2-6 → Study 1]. The second column from the left in Figure 4B [Studies 1,3,4,5,6 → Study 2] is similar to the third column in this figure, and comparing the two demonstrates the effect of excluding the ferret data from the inter-table completions. In general, including more datasets improves the inter-table completion results. The right-most column in this figure demonstrates that inter-table completion works best between vaccination studies that are 1 year apart, and this effect dominates the results in the four right-most columns in Figure 4B.

Figure S6. The accuracy of matrix completion diminishes for highly divergent viruses. The root-mean-squared error for viral titers as a function of the amino acid distance (ΔAA) to the most similar virus in a dataset. (A) For each virus V in Fonville Study 4 (chosen because it has nearly-complete data), construct a matrix containing the titers of V together with the titers of the 10 most similar viruses whose HA sequences are ≥ΔAA from V. Withhold 20% of titers from virus V and impute them via intra-table completion. Each gray line shows the resulting curve for one virus, and the blue line shows the average response over all viruses. The RMSE slowly increases from 0.16 when ΔAA≤5 (1.4-fold error) to 0.44 when ΔAA=30 (2.8-fold error). (B) This analysis was repeated for the more diverse Catnap viruses, except now using viruses whose Envelope sequence differed by ≥ΔAA. The response remained relatively flat with an RMSE≈0.9 between ΔAA=100 and ΔAA=200.

to predict all future measurements. The predicted interactions were ordered from strongest-to-weakest [top row] or weakest-to-strongest [bottom row], and the true measurements were revealed in order. The top row shows the number of strong interaction measurements (IC50≤0.1 μg/mL) found after N measurements, either ordered by matrix completion or by a random search. The bottom row shows the analogous results when searching for weak interactions (IC50≥10 μg/mL). The black dashed line shows when 80% of the strong/weak interactions are discovered. The gray dotted line (y=x) represents perfect predictions for the strong/weak interactions.