A general model of multivalent binding with ligands of heterotypic subunits and multiple surface receptors

Multivalent cell surface receptor binding is a ubiquitous biological phenomenon with functional and therapeutic significance. Predicting the amount of ligand binding for a cell remains an important question in computational biology as it can provide great insight into cell-to-cell communication and rational drug design toward specific targets. In this study, we extend a mechanistic, two-step multivalent binding model. This model predicts the behavior of a mixture of different multivalent ligand complexes binding to cells expressing various types of receptors. It accounts for the combinatorially large number of interactions between multiple ligands and receptors, optionally allowing a mixture of complexes with different valencies and complexes that contain heterogeneous ligand units. We derive the macroscopic predictions and demonstrate how this model enables large-scale predictions on mixture binding and the binding space of a ligand. This model thus provides an elegant and computationally efficient framework for analyzing multivalent binding.


Introduction
Binding to extracellular ligands is among the most fundamental and universal activities of a cell. Many important biological activities, and cell-to-cell communication in particular, are based on recognizing extracellular molecules via specific surface receptors. For example, multivalent ligands are common extracellular factors in the immune system [1], and computational models have been applied to study IgE-FcεRI [2], MHC-T cell receptor [3], and IgG-FcγR interactions [4]. However, these models are specific to their biological applications, limited to a single homogenous ligand and receptor [5], or fail to scale with valency [6]. 10 Multivalent binding to various receptors on a cell can be accounted for by the kinetics of individual association reactions between each monomer-receptor pair. However, when the complexes contain multiple ligand monomers of either the same or different kinds, and when there is a mixture of complexes with either the same or different valencies, the system becomes complicated: different 15 binding orders of units on a complex creates a combinatorially large amount of possible reactions, and the competition among different kinds of ligands and complexes impedes intuitive understanding. In this case, enumerating all binding configurations and reactions become impractical.
In this study, we extend a simple two-step, multivalent binding model to 20 cases involving multiple receptors and ligand subunits [7,8,5,9,3]. By harnessing the power of combinatorics via applying the multinomial theorem and focusing on macrostates, we can predict the amount of binding for each ligand and receptor at equilibrium. We derive macroscopic quantities for both specifically arranged and randomly assorted complexes, and demonstrate how this 25 model enables large-scale predictions on mixture binding and the binding space of a ligand.
Our model provides both generality and computational efficiency, allowing large-scale predictions such as characterizing synergism of using a mixture of ligands and depicting the general binding behavior of a compound. The com-30 pactness and elegance of the formulae enable both analytical and numerical analyses, in turn allowing for the construction of higher-level computational tools. We expect this binding model will be widely applicable to many biological contexts.

Vector and matrix notation
In this work, we denote a vector in boldface letter and its entry in the same letter but with subscript and not in boldface, e.g. C = [C 1 , C 2 , . . . , C n ]. The sum of elements for a vector is denoted as |C| = n i=1 C i . For any matrix (A ij ) of size m × n, we denote the vector formed by its i-th 40 row as A i• = [A i1 , A i2 , · · · , A in ], and the vector formed by its j-th column as A •j = [A 1j , A 2j , · · · , A mj ]. The row sums of matrix (A ij ), therefore, can be written as |A 1• |, |A 2• |, · · · , |A m• |, and column sums |A •1 |, |A •2 |, · · · , |A •n |.
In this work, multinomial coefficients such as n choose k 1 , k 2 , · · · , k n will be written as 45 n k = n k 1 k 2 · · · k n = n! k 1 !k 2 ! · · · k n ! . (2.1) The implicit assumption here is that |k| = n, and each k i ∈ N.

Some useful theorems in combinatorics
From the binomial theorem, we know that Differentiating both sides by Φ, we get Throughout this work, we consolidate multiple summation symbols into one. In this case, we use |u|=m,|v|=n as a shorthand for |u|=m |v|=n . From Equation 2.5, we can derive the sum of a linear combination of two exponents from each multinomial term as 60 |u|=m |v|=n m u where k 1 and k 2 can be any constant. We can extend this to the product of N multinomial equations. Let q 1 , · · · , q N be N nonnegative integer vectors, each with |q i | = θ i , and ψ 1 , · · · , ψ N be N nonnegative vectors. The sum of any linear combination of exponent terms r k r q srtr , where k r 's can be any constant, and each q srtr is the t r -th element of q sr , can be calculated as (2.8)  Figure 1: General setup of the model. In this study, we investigate the binding behavior of complexes formed by monomer ligands in either a specific arrangement or by random assortment. We propose that the binding configuration between a complex and several receptors on a cell can be described as a matrix (q ij ). The construction of a complex can be written as a vector θ. The figure shows the dimensions of the model's parameters: C i , the monomer compositions, are in a vector of length N L ; R tot,j and R eq,j , the receptor expression and equilibrium levels are in vectors of length N R ; the binding affinities, K a,ij , are in a matrix of dimension N L × N R ; ϕ ij and ψ ij are in the matrices of dimension N L × (N R + 1). Θ is a set of all possible θ's, with C θ as each of their compositions. Each θ is a vector of length N L , and C θ should be in a vector of the same size as Θ.

Parameters and notations
In this study, we investigate the binding between multivalent ligand complexes and a cell expressing various surface receptors. As shown in Figure 1, we 70 consider N L types of distinct monomer ligands, namely L 1 , L 2 , ..., L N L , and N R types of distinct receptors expressed on a cell, namely R 1 , R 2 , ..., R N R .
The monovalent binding association constant between L i and R j is defined as K a,ij . A ligand complex consists of one or several monomer ligands, and each of them can bind to a receptor independently. Its construction can be described by a vector θ = [θ 1 , θ 2 , ..., θ N L ], where each entry θ i represents how many L i this complex contains. The sum of elements of vector θ, |θ|, is f , the valency of this complex.
The binding configuration at equilibrium between an individual complex and a cell expressing various receptors can be described as a matrix (q ij ) with N L 80 rows and (N R + 1) columns. For example, the complex bound as shown on the top left corner in Figure 1 can be described as the matrix below it. q ij represents the number of L i to R j binding, and q i0 , the entry on the 0-th column, is the number of unbound L i on that complex in this configuration. This matrix can be unrolled into a vector form q = [q 10 , q 11 , ..., q 1N R , q 20 , ..., q 2N R , q 30 , ..., q N L N R ] 85 of length N L (N R + 1). Note that this binding configuration matrix (q ij ) only records how many L i -to-R j pairs are formed, regardless of which exact ligand on the complex binds. For example, in Figure 1, swapping the two L 2 's binding to R 2 's will give us the same configuration matrix. Therefore, we will need to account for this combinatorial factor when applying the law of mass action. 90 We know from the conservation of mass that for this complex, θ i = q i0 + q i1 + q i2 + · · · + q iN R = |q i• | must hold for all i's. Mathematically, vector θ is the row sums of matrix (q ij ). The corresponding θ of a binding configuration q, θ(q), written in the format of a function of q, can be determined by this relationship. Also, the sum of elements in q, |q| = f , will be the valency.

95
The concentration of complexes in the solution is L 0 (not to be confused with L i , the name of ligands, when i = 1, 2, · · · , N L ). It is the concentration of all ligands at the equilibrium state. It is approximately the same as the initial ligand concentration if the amount of ligands is much greater than that of the receptors and binding event does not significantly deplete the ligand 100 concentration.
On the receptor side, R tot,i is the total number of R i expressed on the cell surface. This usually can be measured experimentally. R eq,i is the number of unbound R i on a cell at the equilibrium state during the ligand complex-receptor interaction, and usually must be calculated from R tot,i as we will explain later.

105
The binding of a ligand complex, a large molecule, is complicated. To simplify the matter, we will need to make some key thermodynamic assumptions. In this model, we make two assumptions on the binding dynamics: 1. The initial binding of L i on a free (unbound) complex to a surface receptor R j has the same affinity (association constant, K a,ij ) as that of a monomer 110 ligand L i ; 2. In order for the detailed balance to hold, the association constant of any subsequent binding event on the surface of a cell after the initial interaction must be proportional to their corresponding monovalent affinity. We assume the subsequent binding affinity in multivalent interactions between 115 L i and R j to be K * x K a,ij . K * x is a term coined as the crosslinking constant. It captures the difference between free and multivalent ligand-receptor binding, including but not limited to steric effects and local receptor clustering [10]. In practice this term is often fit to apply this model to a specific biological context. 120 We create two last variables that will help to simplify our equations. For all i in {1, 2, ..., N L }, we define ψ ij = R eq,j K a,ij K * x and ϕ ij = R eq,j K a,ij K * x C i where j = {1, 2, ..., N R }, and we define ψ i0 = 1, ϕ i0 = C i . Therefore, ϕ ij = ψ ij C i holds for all i and j. Then we define the sum of this new matrix (ϕ ij ) as of these definitions will become clear in future sections.

The amount of a specific binding configuration
With the definitions of our model we can now derive the amount of complexes bound with the configuration described as q on a cell at equilibrium, v q . We know that the composition of any complex can be described by a vector 130 θ of length N L , where each entry θ i represents the number of monomers L i within the complex. We can enumerate all possible binding configurations of θ complex by filling the matrix (q ij ) with any nonnegative integer values so long as its row sums equal θ. Conversely, for a certain binding configuration, q, the construction of the complex involved must be its row sum, θ(q), and the 135 concentration of this complex is L 0 C θ(q) . If the corresponding complex θ(q) does not exist in the solution, we set C θ(q) = 0. Since θ(q) is defined only by the ligand concentration at equilibrium, it will remain L 0 C θ(q) .
initial binding subsequent binding Figure 2: The scheme of cell-complex binding step by step. We assume the initial binding event has the same affinity as monomer binding, K a,ij , while subsequent binding has an association constant scaled by K * x , the crosslinking constant. Each binding configuration scheme above can be described by the q right below, if we ignore the statistical factors. θ(q) is the construction of the complex and can be implied from q. Initial binding. We start with the initial binding reaction of a complex, L i -to-R j . As shown in Figure 2, the reactants are the free complexes and the free 140 receptors R j (in this case R 2 ), and the product are the L i -to-R j (in this case L 2 -R 2 ) monovalently bound complexes q (1) . We denote the amount of this new complex as v q (1) . The concentration of free complexes is L 0 C θ(q (1) ) . The equilibrium constant for this reaction is K a,ij . Therefore, we have While the binding configuration of q (1) can be described by q a , the total 145 amount of complexes that bind as described as q a may not be the same as v q (1) , since q a does not consider the number of ways this binding L i can be chosen. An equivalent explanation is that, q (1) is only one possible microstate to achieve the q a configuration, and we need to count the total number of possible microstates for q a . Accounting for this statistical factor, we have . q a,i• is a vector formed by the i-th row of q a . For example, in Figure 2, q a,2• = [2, 0, 1, 0]. Conceptually, θi q a,i• can be understood as the number of ways to split θ i L i 's into q i0 of unbound, q i1 of R 1 -bound, q i2 of R 2 -bound, ..., and q iN R of R N R -bound units. In the initial binding reaction, only q i0 and q ij will be nonzero, with q i0 = θ i − 1 and q ij = 1, so it is effectively the 155 same as θi 1 . However, the multinomial coefficient expression can be generalized to subsequent binding steps.
Subsequent binding. For a subsequent binding between L i and R j (i and j are not necessarily the same as in initial binding), we have the reactants as a bound complex, q (1) , and a free receptor R j (in the case shown by Figure 2, R 2 ), 160 while the product is another bound complex, q (2) . The equilibrium constant is To account for the statistical factors (3.5) By recursion, we can solve v q for any q from these equations. It is In the next section, we will use this formula repeatedly.
Notice that this equation is only for surface-bound complexes, not suitable for calculating the concentration of unbound q, when every nonzero values are 170 on its 0-th column. The concentration of unbound ligands should always be L 0 C θ(q) . However, for algebraic convenience, we allow such definition but only to subtract them later, and will name it v 0,eq which equals L 0 C θ(q) /K * x .

Macroscopic equilibrium predictions
From here we will investigate the macroscopic properties of binding, such as the total amount of ligand bound and receptor bound on a cell surface at equilibrium. We consider two different ways complexes in the solution can be formed. First, complexes may come in a specific arrangement. In this case, the structure and exact concentration for each complex are designed and known. Alternatively, ligand monomers of known proportion can congregate into com-180 plexes with a fixed valency f . Through random assortment, any combination of f monomer ligands can form a complex, and their concentration will follow a multinomial distribution. We will explore these two cases separately.

Complexes formed in a specific arrangement
When complexes are specifically arranged, the structure and proportion of 185 each kind are well-defined. To formulate this mathematically, we assume that we have various kinds of complexes, and each of them can be described by a vector θ of length N L , with each entry θ i as the number of L i in this complex. The valency of each complex may be different, and for complex θ its valency is |θ|. The proportion of θ among all complexes is defined as C θ , and the concentration 190 of each θ complex will be L 0 C θ . For example, if we create a mixture of 20% of bivalent L 1 and 80% of bispecific L 1 − L 2 , then θ 1 = [2, 0], θ 2 = [1, 1], C θ1 = 20%, and C θ2 = 80%. If the mixture solution has a total concentration of 10 nM, then the concentration of θ 1 is 2 nM, and the concentration of θ 2 is 8 nM.
We further conceptualize that Θ is a set of all existing θ's. By this setting, we should have θ∈Θ C θ = 1. These complexes will bind in various configurations which can all be described as a q. We define Q as a set of all possible q's, and we borrow the notation q ⊆ θ to indicate any binding configuration q that can be achieved by complex θ. This is equivalent to |q i• | = θ i for all i, or θ is the 200 row sum of (q ij ).
Solve the amount of free receptors. A remaining problem in the model setup is that in practice, only R tot,j , the total receptor expressions of each kind of a cell, can be experimentally measured, while the amount of free receptors at equilibrium, R eq,j , though being used extensively, is unknown. To find R eq,j , 205 we first need to derive the amount of bound receptors of each kind, R bound,j , then use conservation of mass to solve R eq,j numerically.
To calculate the amount of bound ligand R bound,n , we can simply add up all entries in the n-th column for every q's: By the conservation of mass, we have In this equation, R tot,n are known, and any |ψ i• | is a function of every R eq,j , j = 1, 2, · · · , N R , so all R eq,j need to be solved together. This system of equations usually does not have a closed form and must be solved numerically. When implementing, we suggest taking the logarithm of both sides of these 215 equations so the exponents can be eliminated and the range is restricted to positive numbers.
As a side note, the total amount of bound receptors regardless of kind is The amount of bound ligand complexes. Our model makes many macroscopic predictions readily accessible. For example, the amount of ligand bound at equilibrium is a useful quantity when measuring the overall quantity of tagged 220 ligand. To compute this number, we can add up all v q except the q's that only have nonzero values on the 0-th column, v 0,eq . Consequently, the model prediction of bound ligand at equilibrium is j=0 ψ ij , and the predicted amount of bound complex θ (complex of each kind) is The amount of fully bound ligands. In multivalent complexes like bispecific antibodies, drug activity may require that all subunits be bound to their respective targets [11]. The predicted amount of ligand full-valently bound can be calculated as v full,eq = θ∈Θ q10,...,q N L 0 =0 with q * i• = (q i1 , . . . , q iN R ), the q i• vector without q i0 . In this equation, the 230 multinomial coefficient θi describes the number of ways one can allocate θ i receptors to any position in the i-th row of the (q ij ) matrix except the 0-th row which stands for unbound.
In fact, the predicted amount of any specific-valently bound ligands can be derived in such manner. For example, the amount of ligands that bind monovalently can be calculated as This can be used for estimating the amount of multimerized ligands, L multi = L bound − v 1,eq , and multimerized receptors, R multi = R bound − v 1,eq .

Complexes formed through random assortment
Another common mode of forming multivalent complexes in biology, such as in the formation of antibody-antigen complexes [4], is the stochastic assembly of monomer units to a common scaffold. Instead of a specific arrangement, we provide binding compounds of a fixed valency f and a mixture of monomer lig-240 ands, and complexes can form through random assortment. The concentration of these complexes, therefore, will follow a multinomial distribution.
To formulate this mathematically, we denote the proportion of L i as C i , and For example, say we have 40% L 1 and 60% L 2 in the solution to form dimers (f = 2), then C 1 = 40%, C 2 = 60%. As complex formation 245 follows a binomial distribution, there will be 16% bivalent L 1 , 36% bivalent L 2 , and 48% L 1 − L 2 complex. In general, the probability of complexes formed as described by θ in random assortment is Plugging this relationship between C θ and C i into Equation 3.6 for the 250 amount of a specific binding configuration, we have where ϕ ij = R eq,j K a,ij K * x C i and ϕ i0 = C i .
Solve the amount of free receptors. Similar to the specific arrangement case, we still need to solve R eq,n numerically from R tot,n . We first derive the amount of bound receptors of each kind at equilibrium as Then by the conservation of mass, we can numerically solve R eq,n as R tot,n = R eq,n + R bound,n Again, since Φ = N R j=1 |ϕ •j | is a function of every R eq,n , all R eq,n need to be solved together.
The amount of k-valently bound complexes. For randomly assorted complexes, we first derive the amount of ligands that bind k-valently. As we will show, it has 260 a useful expression that can used to find many other quantities conveniently. First, let's break q into two separate vectors, q = (q •0 , q •x ). We define the vector formed by the 0-th column of q which stand for unbound as q •0 , and the one formed by the other elements as q •x . By the model setup, |q| = f , |q •x | = k, and |q •0 | = f − k. We then have The amount of total bound ligands and receptors. Many macroscopic properties can be derived from v k,eq . For example, the amount of total bound ligands is simply the sum of ligands bound monovalently to fully, and can be simplified to Similarly, the total bound receptors should be The number of cross-linked receptors. In some biological contexts such as T cell receptor-MHC [3] or antibody-Fc receptor [4] interactions, signal transduction 275 is driven by receptor cross-linking due to multivalent binding. The amount of total cross-linked receptors can be derived from v k,eq as To find the number of crosslinked receptors of a specific kind, R n , requires extra consideration. Similar to how v k,eq was found, we break q into three separate vectors, q = (q •0 , q •n , q •x ). q •0 is the vector formed by the 0-th 280 column of q, q •n is the vector formed by the n-th column of q, and q •x contains all others. If the complex is s-valently bound, then |q •0 | = f − s. We further assume that |q •n | = t, then |q •x | = s − t. By this setup, we have This formula can useful when investigating the role of each receptor in a pathway that requires multimerized binding.

285
Of course, the macroscopic predictions provided in this section cannot exhaust many biological quantities one may wish to study, but with the ideas demonstrated here, the readers can derive their own formulae as needed.

Numerical implementation notes
In this model, most predictions can be calculated directly by closed form 290 formulae after R eq,n 's have been solved. For solving R eq,n 's numerically, we have not found issues with deriving numerical solutions generally, except at extreme affinities (e.g., less than 1 pM K d ) combined with very high valency (e.g., greater than 64), where floating point errors can cause problems with the termination conditions of the root-finding operation. We have learned through 295 experience that one need not set bounds on the root-finding, as the function is monotonic with a single root. While we use auto-differentiation of the model through the Python package Jax or Julia package ForwardDiff.jl during rootfinding (both packages available on GitHub), one can use numerical differencing with identical results. Root-finding of R eq,n is by far the slowest part of the 300 calculations.
Ligand concentration handling. Our model requires the total concentration of ligands at equilibrium, L 0 . There are at least four approaches one could take to rectify some measurement of ligand concentration with the model: (1) First, one could apply an assumption of no ligand depletion. This is an extreme 305 assumption in many cases but can be applicable in in vitro experiments where it is known that the ligand amount is many orders of magnitude greater than that of the receptors. (2) Alternatively, one might know the concentration of ligand in solution after some or all of any ligand depletion has occurred. (3) If one has an estimate of the absolute number of receptors and ligand units 310 before binding, L 0 can be solved numerically by conservation of mass along with R eq,n 's. We have where V is the effective volume of the ligand solution, L init is the total initial ligand concentration, and L init,θ is the initial concentration of complex θ.

315
L bound and L bound,θ may be found at Equations 4.14 and 4.5, depending on the occasion. (4) Finally, certain aspects of the system may not be sensitive to ligand concentration as an input parameter, or one could treat concentration as an unknown.

Application examples
In previous sections, we have shown how all macroscopic predictions made in this work can be written in closed form formulae. Therefore, some computationally expensive analyses are enabled by the efficiency of our model. Here, we provide two examples to demonstrate the utility of large-scale predictions made possible by this model.

Mixture binding prediction
Leveraging a synergistic effect among two or more drugs is of great interest in pharmaceutical development. A challenge in investigating synergy is to identify its underlying source. Most biological pathways follow a similar pattern: when the drug binds to certain surface receptors of a cell, a downstream pathway in 330 the cell is initiated, leading to some actions. Therefore in general, synergism can come from either the initial binding events themselves or downstream signal transduction interactions. Binding-level synergy means that merely using a combination of ligands boosts the amount of binding to the important receptors and thus intensifies the overall effect. Downstream effect synergy indicates 335 that the benefit of using mixtures arises from other cellular regulatory mechanisms two ligands can bring about. The binding model we propose here can help to investigate this issue by offering accurate predictions for the binding of multivalent complex mixtures. In case a (purple circles), since most data points are inside the confidence interval, we can assume the measurement error can explain these variations. In case b (orange crosses), however, the synergism of these complexes are beyond the binding level.
Mixture binding prediction can help us identify the source of synergy. To connect model predictions to experimental measurements, ligand binding might be measured by fluorescently-tagged ligands, while the number of bound re-355 ceptors of a specific type might associate with an indirect measurement such as cellular response. After making a series of measurements for different compositions of mixtures, we can fit the 100% of one complex cases (numbers on the two ends on the plot) first and then compare the mixture measurements to the predictions. Determining whether the downstream effect contributes to 360 the observed synergy (or antagonism) can be framed as a hypothesis testing problem: H 0 : The synergism of the mixture can be explained solely by binding.
The uncertainty of mixture binding prediction comes from measurement errors of receptor abundance and binding affinities. Usually, the receptor expres-365 sion of a cell population has an empirical distribution which can be measured. The confidence intervals in Figure 3 were drawn with the assumption that receptor expression fluctuates up and down by 10% (coinciding with log-normally distributed amounts). Also, due to the measurement technique, the binding affinities may be over-or underestimated [12]. The confidence interval of mix-370 ture prediction can be determined by the model with all these considered, and a p-value can be derived.
We arbitrarily drew some possible experimental results on Figure 3b for demonstration. If most mixture measurements fall within the confidence interval of the predictions (such as case a annotated by the purple circles in 375 Figure 3b), the synergy will very likely come from binding only. However, if the measurements are obviously beyond the confidence interval (case b, the orange crosses), it is reasonable to suspect a synergistic (or antagonistic) effect beyond binding alone. With the flexibility of the binding model, this method can also be extended to a mixture of more than two compounds.

Binding space of a ligand
When a dose of ligand (drug, hormone, cytokine, etc.) is released into the circulation system of an individual due to either physiological response or exogenous administration, the compound will spread and bind to many cell populations to varying extents. An essential question in pharmacology is how much 385 a compound will bind to their intended target populations compared to offtarget ones. This question is important for understanding basic biology as well as developing new therapeutics. For example, hormones and cytokines are important signaling molecules, and having a quantitative prediction of on-and off-target binding can help us understand their mechanism greatly. For drug development, binding prediction can guide optimization to improve specificity toward the intended targets [13]. A cell population can be defined by the receptors they express. Therefore, given the parameters of the dose and the receptor profile of each cell population, our model can make all the predictions discussed previously.

395
From the perspective of this binding model, there is nothing special about any specific cell population. If the local concentration is constant everywhere, our model can map any cell with a certain receptor expression to the amount of binding induced by this dose. If the biological activity of this compound on a cell is related to the quantity of binding to a certain ligand or receptor, the 400 effect of this dose can be written as a function f , with where R tot is a vector of nonnegative entries that describes the cell's expression of N R receptors, and f (R tot ) is the amount of binding. Here, we define the binding behavior of this dose (or any compound) as its binding space.  In Figure 4, we plot the binding space of a bivalent L 1 ligand θ = [2, 0] with 405 concentration 1 nM. The binding affinities are the same as described in the last subsection (5.1). In this binding space, we consider three receptors, R 1 , R 2 , and R 3 . We plot how the amount of binding relates to the cell expression profile, R tot . Here, the amount of R 1 and R 2 varies with the two axes, while R 3 is held constant at 2.0 × 10 3 cell −1 . In this plot, each point represents a cell with  The binding space can provide ample information about the compound. It is an intrinsic property of a ligand given its concentration and other ligand it mixes with, independent of any specific cell. The biological process of drug diffusion to a certain cell is analogous to sampling a point from this binding space. Its gradient indicates in which direction the binding level increases the 425 fastest, as well as to which receptor amount it is the most sensitive. An inactive antagonist that introduces binding competition with the ligand can distort its binding space, and we can visualize it by the change of shape in the contour lines. This plot can also intuitively demonstrate intrapopulation binding variance and interpopulation cell specificity of the compound. With the development of high-430 throughput single-cell methods such as flow cytometry, the expression profiles of a collection of cells can be identified en masse, and we can overlap their results onto a binding space plot (as in Figure 4c). This shows the promise of applying our model to single-cell data. Although we can only visualize two receptors in a plot, binding space applies to any N R types of receptors. Theoretically, 435 the concept of the binding space of a ligand is only complete when all relevant surface receptors are considered.

Discussion
In this work, we propose a mechanistic multivalent binding model that accounts for the interaction among multiple receptors and a mixture of ligand 440 complexes formed by binding monomers. The flexible framework allows a mixture of both homogeneous and heterogeneous ligand complexes, even when they don't have the same valency. We first derive the amount of ligand of a specific binding configuration at equilibrium through the law of mass action. Using this formula, we make macroscopic predictions by applying the multinomial theo-445 rem strategically. Our predictions cover cases where complexes are formed by specific arrangement or random assortment. Finally, we provide two practical examples of how this model can help with biological research.
Compared with previous approaches, the model here is a uniquely scalable and elegant approach to multivalent binding when considering multivalent com-450 plexes of heterogeneous monomer composition and/or multiple receptors. Scalability to higher valency complexes is essential as rule-based computational models become impractical due to a combinatorial explosion of binding states. By contrast, our model can make a large number of predictions easily, enabling mixture synergy analysis and binding space calculations across individual cells.
The mathematical elegance of the model welcomes analytical studies and incorporating it into higher-level computational frameworks. For example, we apply auto-differentiation to ensure accuracy in the root-finding operation when solving for unbound receptor. We have similarly used auto-differentiation to solve for the gradients of the model with respect to input quantities when fitting it 460 to data points. One could even feasibly derive analytical forms of the gradients. This enables one to build more complex computational models on top of this binding framework, such as inferring the composition of multivalent complexes in solution from indirect high-throughput assays. While differentiation of differential equation models is possible through adjoint state methods, solving can 465 be sensitive to the parameters of the system, is much less efficient, and requires trade-offs in accuracy for performance.
The assumptions made in this model may compromise its accuracy in some cases. Our setup has a single crosslinking constant, K * x , to reflect the multivalency effect. In practice, this model has worked well in predicting experimental 470 binding results [13,14,4]. However, the steric effects of a multivalent ligand can be more complicated and context-dependent. The complication of multivalency effect comes from the geometry of ligand complexes that introduced steric effect as well as the distribution of receptors on the cell surface. For instance, the length of the hinge region is needed to estimate the radius of area a molecule 475 can reach [15]. Receptor clustering can be play a big role in the behavior of ligand binding as well [16]. Accounting for these effects requires more in-depth studies than just measuring the monomer binding kinetics. Some other computational approaches investigate steric effects more meticulously, but inevitably introduce some added complexity. For example, previous work has conducted 480 a case-by-case exploration of how ligands bind when distributed randomly or ordered, arranged as a lattice, ring, or chain to give a better hindrance factor estimation [10]. When the actual situation is not known, our model can serve as an adequate starting point.
Although this model is very general purpose, it mainly focuses on the binding 485 dynamics on a cell surface, similar to the previous work on which it is based [7,8,5]. For ligands discordant with the two-step binding process shown in Figure  2, other model constructions might be necessary. For example, some previous work focuses on scaffold proteins as intracellular multivalent complexes [17], but these often lack independence between the individual monomer binding events.

490
In this case, various alternative computational models have been developed [18,19,20]. Surface receptor binding is a universal event in biology. A prevalent question calls for a general solution. We expect this model to be successfully applied to many contexts. Previously, we have used a simpler version of the random 495 assortment model to accurately predict IgG antibody-FcγR interactions [4], and also applied it to fit epithelial cell adhesion molecule binding data [13,14]. We are also working on applying the model to IL-2 immunocomplexes [21], for optimization of high-valency cytokines with specific cell targeting [22], design of cytokine-antibody bispecific antibody fusions, and as a factorization kernel 500 in dimensionality reduction of systems serology data [23]. With the arise of multispecific drugs in the recent decade [24], we expect this model to apply even more widely, exhibit its full competence and facilitate both basic scientific research and new therapy development.
Declaration of interest. This work was supported by NIH U01-AI-148119 to Data and software availability. A Python package of this model and the code for the plots can be found at https://github.com/meyer-lab/valentBind/. 510 We also provide a Julia package of the model at https://github.com/meyer-lab/polyBindingModel.jl/.