Abstract
Multivalent cell surface receptor binding is a ubiquitous biological phenomenon with functional and therapeutic significance. Predicting the amount of ligand binding for a cell remains an important question in computational biology as it can provide great insight into cell-to-cell communication and rational drug design toward specific targets. In this study, we extend a mechanistic, two-step multivalent binding model to account for multiple ligands and receptors, optionally allowing heterogeneous complexes. We derive the macroscopic predictions for both specifically arranged and randomly assorted complexes, and demonstrate how this model enables large-scale predictions on mixture binding and the binding space of a ligand. This model provides an elegant and computationally efficient framework for analyzing multivalent binding.
1 Introduction
Binding to extracellular ligands is among the most fundamental and universal activities of a cell. Many important biological activities, and cell-to-cell communication in particular, are based on recognizing extracellular molecules via specific surface receptors. For example, multivalent ligands are common extracellular factors in the immune system [8], and many computational models have been applied to study IgE-FcεRI [5], MHC-T cell receptor [7], and IgG-FcγR interaction [16].
In this study, we extend a simple two-step, multivalent binding model to cases involving multiple receptors and ligand subunits [1, 2, 3, 12, 7]. By harnessing the power of combinatorics via applying the multinomial theorem and focusing on macrostates, we can predict the amount of binding for each ligand and receptor at the equilibrium state. Our model provides both generality and computational efficiency, allowing large-scale predictions such as characterizing synergism of using a mixture of ligands and depicting the binding space of a compound. The compactness and elegance of the formulae enable both analytical and numerical analyses. We expect this binding model will be widely applicable to many biological contexts.
2 Preliminaries
2.1 Vector and matrix notation
In this work, we denote a vector by a boldface letter and its entries by the same letter with a subscript and not in boldface, e.g. C = [C1, C2, …, Cn]. The sum of the elements of a vector is denoted as $|\mathbf{C}| = C_1 + C_2 + \cdots + C_n$.
For any matrix (Aij) of size m × n, we denote the vector formed by its i-th row as Ai• = [Ai1, Ai2, …, Ain], and the vector formed by its j-th column as A•j= [A1j, A2j, …, Amj]. The row sums of matrix (Aij), therefore, can be written as |A1•|, |A2•|, …, |Am•|, and column sums |A•1|, |A•2|, …, |A•n|.
In this work, multinomial coefficients such as n choose k1, k2, …, kn will be written as
$$\binom{n}{\mathbf{k}} = \binom{n}{k_1, k_2, \ldots, k_n} = \frac{n!}{k_1!\, k_2! \cdots k_n!}.$$
The implicit assumption here is that |k| = n, and each ki ∈ ℕ.
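As a quick illustration of this notation, the multinomial coefficient can be computed directly from factorials. The sketch below is a minimal Python helper; the function name `multinomial` is ours rather than part of the original work, and it assumes that n is implied by the sum of the entries of k.

```python
from math import factorial

def multinomial(k):
    """Return n! / (k_1! k_2! ... k_m!) with n = sum(k), for nonnegative integers k."""
    n = sum(k)
    result = factorial(n)
    for ki in k:
        result //= factorial(ki)  # every partial quotient is an exact integer
    return result

# Ways to split 3 units into groups of sizes 2, 0, 1 and 0
print(multinomial([2, 0, 1, 0]))
```

Integer division keeps the result exact for large valencies, which floating-point factorials would not.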
2.2 Some useful theorems in combinatorics
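From the binomial theorem, we know that
$$(1 + \Phi)^n = \sum_{k=0}^{n} \binom{n}{k} \Phi^k.$$
Differentiating both sides by Φ, we get
$$n (1 + \Phi)^{n-1} = \sum_{k=0}^{n} k \binom{n}{k} \Phi^{k-1}.$$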
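We can derive a similar property from the multinomial theorem. Assume the elements of a nonnegative integer vector q add up to f, or |q| = f. Given another nonnegative vector φ with sum of elements |φ|, we have
$$\sum_{|\mathbf{q}| = f} \binom{f}{\mathbf{q}} \prod_{i} \varphi_i^{q_i} = |\boldsymbol{\varphi}|^{f}.$$
Differentiating both sides by φm, where φm can be any entry of φ, and rearranging, we have
$$\sum_{|\mathbf{q}| = f} q_m \binom{f}{\mathbf{q}} \prod_{i} \varphi_i^{q_i} = f\, \varphi_m\, |\boldsymbol{\varphi}|^{f-1}.$$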
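The identity obtained by differentiating the multinomial theorem, $\sum_{|\mathbf{q}|=f} q_m \binom{f}{\mathbf{q}} \prod_i \varphi_i^{q_i} = f \varphi_m |\boldsymbol{\varphi}|^{f-1}$, can be checked numerically by brute-force enumeration. The following is only a sanity-check sketch; the helper names and the test values of φ are arbitrary choices of ours.

```python
from itertools import product
from math import factorial, isclose

def multinomial(k):
    out = factorial(sum(k))
    for ki in k:
        out //= factorial(ki)
    return out

def compositions(f, m):
    """Yield every length-m tuple of nonnegative integers that sums to f."""
    for q in product(range(f + 1), repeat=m):
        if sum(q) == f:
            yield q

def weighted_sum(phi, f, m_index):
    """Left-hand side: sum over |q| = f of q_m * multinomial(q) * prod(phi_i ** q_i)."""
    total = 0.0
    for q in compositions(f, len(phi)):
        term = float(multinomial(q))
        for p, qi in zip(phi, q):
            term *= p ** qi
        total += q[m_index] * term
    return total

phi = [0.5, 1.2, 0.3]      # arbitrary nonnegative test values
f, m_index = 4, 1
lhs = weighted_sum(phi, f, m_index)
rhs = f * phi[m_index] * sum(phi) ** (f - 1)
print(lhs, rhs, isclose(lhs, rhs))
```

The enumeration grows combinatorially with f and the length of φ, which is exactly the cost that the closed form on the right-hand side avoids.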
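We can multiply two different multinomial theorem equations together, too. Let u and v be two nonnegative integer vectors and a and b be two nonnegative vectors, with |u| = m and |v| = n. Then we have
$$\sum_{|\mathbf{u}|=m,\,|\mathbf{v}|=n} \binom{m}{\mathbf{u}} \binom{n}{\mathbf{v}} \prod_i a_i^{u_i} \prod_j b_j^{v_j} = |\mathbf{a}|^{m}\, |\mathbf{b}|^{n}.$$
Throughout this paper, we consolidate multiple summation symbols into one. In this case, we use Σ|u|=m,|v|=n as a shorthand for Σ|u|=m Σ|v|=n. From Eq. (2), the differentiated multinomial identity, we can derive the sum of a linear combination of two exponents from each multinomial term as
$$\sum_{|\mathbf{u}|=m,\,|\mathbf{v}|=n} (k_1 u_s + k_2 v_t) \binom{m}{\mathbf{u}} \binom{n}{\mathbf{v}} \prod_i a_i^{u_i} \prod_j b_j^{v_j} = k_1\, m\, a_s\, |\mathbf{a}|^{m-1} |\mathbf{b}|^{n} + k_2\, n\, b_t\, |\mathbf{a}|^{m} |\mathbf{b}|^{n-1},$$
where k1 and k2 are constants, and $u_s$ and $v_t$ are any entries of u and v.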
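We can extend this to the product of N multinomial equations. Let q1, …, qN be N nonnegative integer vectors, each with |qi| = θi, and Ψ1, …, ΨN be N nonnegative vectors. Then, the sum of any linear combination of exponent terms $\sum_r k_r q_{s_r, t_r}$, where the kr's are constants and each $q_{s_r, t_r}$ is the tr-th element of $\mathbf{q}_{s_r}$, can be calculated as
$$\sum_{|\mathbf{q}_1|=\theta_1, \ldots, |\mathbf{q}_N|=\theta_N} \left(\sum_r k_r q_{s_r, t_r}\right) \prod_{i=1}^{N} \left[\binom{\theta_i}{\mathbf{q}_i} \prod_j \Psi_{i,j}^{q_{i,j}}\right] = \left(\sum_r \frac{k_r\, \theta_{s_r}\, \Psi_{s_r, t_r}}{|\boldsymbol{\Psi}_{s_r}|}\right) \prod_{i=1}^{N} |\boldsymbol{\Psi}_i|^{\theta_i}.$$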
3 Model setup
3.1 Parameters and notations
In this study, we investigate the binding between multivalent ligand complexes and a cell expressing various surface receptors. As shown in Figure 1, we consider NL types of distinct monomer ligands, namely $L_1, L_2, \ldots, L_{N_L}$, and NR types of distinct receptors expressed on a cell, namely $R_1, R_2, \ldots, R_{N_R}$. The monovalent binding association constant between Li and Rj is defined as Ka,ij. A ligand complex consists of one or several monomer ligands, and each of them can bind to a receptor independently. Its construction can be described by a vector $\boldsymbol{\theta} = [\theta_1, \theta_2, \ldots, \theta_{N_L}]$, where each entry θi represents how many Li this complex contains. The sum of elements of vector θ, |θ|, is f, the valency of this complex.
The binding configuration at equilibrium between an individual complex and a cell expressing various receptors can be described as a matrix (qij) with NL rows and (NR + 1) columns. For example, the complex bound as shown on the top left corner in Figure 1 can be described as the matrix below it. Entry qij represents the number of Li to Rj binding, and qi0, the entry on the 0-th column, is the number of unbound Li on that complex in this configuration. This matrix can be unrolled into a vector form of length NL(NR + 1). Note that this binding configuration matrix (qij) only records how many Li-to-Rj pairs are formed, regardless of which exact ligand on the complex binds. For example, in Figure 1, swapping the two L2’s binding to R2’s will give us the same configuration matrix. Therefore, we will need to account for this combinatorial factor when applying the law of mass action.
We know from the conservation of mass that for this complex, $\theta_i = \sum_{j=0}^{N_R} q_{ij} = |\mathbf{q}_{i\bullet}|$ must hold for all i. Mathematically, vector θ consists of the row sums of matrix (qij). The composition θ corresponding to a binding configuration q, written as a function θ(q), is determined by this relationship. Also, the sum of elements in q equals the valency, |q| = f.
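The bookkeeping between a configuration matrix (qij) and the composition θ(q) is easy to mirror in code. The sketch below uses a made-up configuration with two ligand types and three receptor types; it is not the configuration drawn in Figure 1.

```python
# Hypothetical configuration with N_L = 2 ligand types and N_R = 3 receptor types.
# Column 0 counts unbound units; columns 1..3 count units bound to R1..R3.
q = [
    [1, 1, 0, 0],   # L1: one unit unbound, one bound to R1
    [1, 0, 2, 0],   # L2: one unit unbound, two bound to R2
]

theta = [sum(row) for row in q]     # row sums recover the composition theta(q)
f = sum(theta)                      # valency |theta| = |q|
# Column sums (excluding column 0) give the receptors of each kind occupied by this complex
bound_per_receptor = [sum(row[j] for row in q) for j in range(1, len(q[0]))]

print(theta, f, bound_per_receptor)
```

Here the complex contains two L1 and three L2 units, has valency 5, and occupies one R1 and two R2.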
The concentration of complexes in the solution is L0 (not to be confused with Li, the name of ligands, when i = 1, 2, …, NL). The number of ligand complexes in the solution is usually much greater than that of the receptors and so it is a common practice to assume binding does not deplete the ligand concentration.
On the receptor side, Rtot,i is the total number of Ri expressed on the cell surface. This usually can be measured experimentally. Req,i is the number of unbound Ri on a cell at the equilibrium state during the ligand complex-receptor interaction, and it needs to be calculated from Rtot,i as we will explain later.
The binding of a ligand complex, a large molecule, is complicated. To simplify the matter, we will need to make some key thermodynamic assumptions. In this model, we make two assumptions on the binding dynamics:
(1) The initial binding between a free (unbound) complex and a surface receptor Rj has the same affinity (association constant, Ka,ij) as the monomer ligand Li;
(2) In order for detailed balance to hold, the affinity constant of any subsequent binding event on the surface of a cell after the initial interaction must be proportional to the corresponding monovalent affinity. We assume the subsequent binding affinity in multivalent interactions between Li and Rj to be $K_x^* K_{a,ij}$.
$K_x^*$ is a term coined as the crosslinking constant. It captures the difference between free and multivalent ligand-receptor binding, including but not limited to steric effects and local receptor clustering [4]. In practice, this term is often fit when applying this model to a specific biological context.
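We create two more variables that will help to simplify our equations throughout this work. For all i in {1, 2, …, NL}, we define $\Psi_{ij} = R_{eq,j} K_{a,ij} K_x^*$ and $\varphi_{ij} = R_{eq,j} K_{a,ij} K_x^* C_i$ where j = 1, 2, …, NR, and we define Ψi0 = 1, φi0 = Ci. Therefore, φij = ΨijCi holds for all i and j. Then we define the sum of this new matrix (φij) as $\Phi = \sum_{i=1}^{N_L} \sum_{j=0}^{N_R} \varphi_{ij}$, and its receptor-bound part as $\varphi_R = \sum_{i=1}^{N_L} \sum_{j=1}^{N_R} \varphi_{ij} = \Phi - \sum_{i} C_i$. The rationale of these definitions will become clear in later sections.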
3.2 The amount of a specific binding configuration
Now we will derive the amount of complexes bound with the configuration described as q on a cell at equilibrium, vq.
Within the definitions of our model, we know that the composition of any complex can be described by a vector θ of length NL, where each entry θi represents the number of monomer Li this complex consists of. We can enumerate all possible binding configurations of complex θ by filling the matrix (qij) with any nonnegative integer values so long as its row sums equal θ. Conversely, we can infer the complex composition given any binding configuration q by finding its row sums, θ(q). For a certain configuration q, its θ(q) is determined and has concentration L0Cθ(q). If the corresponding complex θ(q) does not exist in the solution, Cθ(q) = 0. Since we assumed that binding will not deplete the ambient concentration of any θ(q), it will remain L0Cθ(q) at equilibrium.
Initial binding
We start with the initial binding reaction of a complex, Li-to-Rj. As shown in Figure 2, the reactants of this reaction are the free complexes and the free receptors Rj (in this case R2), and the products are Li-to-Rj (in this case L2-R2) monovalently bound complexes q(1). We denote the concentration of this new complex as $v_{q^{(1)}}$. The concentration of free complexes is $L_0 C_{\theta(q^{(1)})}$. By the assumption of the model, the equilibrium constant for the reaction is Ka,ij. Therefore, we have
$$v_{q^{(1)}} = L_0\, C_{\theta(q^{(1)})}\, K_{a,ij}\, R_{eq,j}.$$
While the binding configuration of q(1) can be described by qa, the total amount of complexes that bind as described by qa may not be the same as vq(1), since qa does not consider the number of ways the binding Li can be chosen. An equivalent explanation is that q(1) is only one possible microstate to achieve the qa configuration, and we need to count how many microstates are possible for qa. Accounting for this statistical factor, we have
$$v_{q_a} = \binom{\theta_i}{\mathbf{q}_{a,i\bullet}} v_{q^{(1)}} = \binom{\theta_i}{\mathbf{q}_{a,i\bullet}} L_0\, C_{\theta(q_a)}\, K_{a,ij}\, R_{eq,j},$$
since θ(q(1)) = θ(qa). Here qa,i• is the vector formed by the i-th row of qa. For example, in Figure 2, qa,2• = [2, 0, 1, 0]. Conceptually, $\binom{\theta_i}{\mathbf{q}_{i\bullet}}$ can be understood as the number of ways to split θi Li's into qi0 unbound units, qi1 R1-bound, qi2 R2-bound, …, and $q_{iN_R}$ $R_{N_R}$-bound. Here, only qi0 and qij will be nonzero, with $q_{i0} = \theta_i - 1$ and $q_{ij} = 1$, so it is effectively the same as $\binom{\theta_i}{1} = \theta_i$. However, the multinomial coefficient expression can be generalized to more complicated cases.
Subsequent binding
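For a subsequent binding between Li and Rj (i and j are not necessarily the same as in initial binding), we have the reactants as a bound complex, q(1), and a free receptor Rj (in the case shown by Figure 2, R2), while the product is another bound complex, q(2). The equilibrium constant is $K_x^* K_{a,ij}$, then
$$v_{q^{(2)}} = v_{q^{(1)}}\, K_x^* K_{a,ij}\, R_{eq,j}.$$
To account for the statistical factors for $v_{q_b}$, we have $v_{q_b} = \binom{\theta_i}{\mathbf{q}_{b,i\bullet}} v_{q^{(2)}}$. For example, in Figure 2, qb,2• = [1, 0, 2, 0]. Putting these together for the case in Figure 2, where both binding events are Li-to-Rj, we have
$$v_{q_b} = \binom{\theta_i}{\mathbf{q}_{b,i\bullet}} L_0\, C_{\theta(q_b)}\, K_x^* \left(K_{a,ij}\, R_{eq,j}\right)^2.$$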
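By recursion, we can solve vq for any q from these equations. It is
$$v_q = \frac{L_0\, C_{\theta(q)}}{K_x^*} \prod_{i=1}^{N_L} \left[\binom{\theta_i}{\mathbf{q}_{i\bullet}} \prod_{j=0}^{N_R} \Psi_{ij}^{q_{ij}}\right]$$
if we define $\Psi_{ij} = R_{eq,j} K_{a,ij} K_x^*$ for j = 1, 2, …, NR and Ψi0 = 1 for all i. $\binom{\theta_i}{\mathbf{q}_{i\bullet}}$ is a shorthand for $\binom{\theta_i}{q_{i0}, q_{i1}, \ldots, q_{iN_R}}$. In the next section, we will use this formula repeatedly.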
Notice that this equation is not suitable for calculating the concentration of an unbound q, in which every nonzero value is in the 0-th column. The concentration of unbound ligands should always be L0Cθ(q). However, for algebraic convenience, we allow such a definition and name it v0,eq, which equals $L_0 C_{\theta(q)} / K_x^*$.
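The formula for the amount of a specific configuration, $v_q = (L_0 C_{\theta(q)} / K_x^*) \prod_i \binom{\theta_i}{\mathbf{q}_{i\bullet}} \prod_j \Psi_{ij}^{q_{ij}}$ with $\Psi_{ij} = R_{eq,j} K_{a,ij} K_x^*$ and Ψi0 = 1, translates almost line by line into code. The sketch below is illustrative only: the function name `v_q`, the crosslinking constant, and all numerical inputs are assumptions of ours, not values from this work.

```python
from math import factorial, isclose

def multinomial(k):
    out = factorial(sum(k))
    for ki in k:
        out //= factorial(ki)
    return out

def v_q(q, L0, C_theta, Kx, Ka, Req):
    """Amount of complexes bound in configuration q.

    q   : N_L x (N_R + 1) integer matrix, column 0 = unbound units
    Ka  : N_L x N_R monovalent association constants
    Req : free receptors of each kind at equilibrium
    """
    amount = L0 * C_theta / Kx
    for i, row in enumerate(q):
        amount *= multinomial(row)                 # ways to assign the theta_i units of L_i
        for j in range(1, len(row)):
            psi = Req[j - 1] * Ka[i][j - 1] * Kx   # Psi_ij; Psi_i0 = 1 contributes nothing
            amount *= psi ** row[j]
    return amount

# Hypothetical parameters (Kx in particular is an assumed value)
L0, Kx = 1e-9, 1e-12
Ka = [[1e8, 1e5], [3e5, 1e7]]
Req = [1e4, 2e4]

q_mono = [[1, 0, 0], [1, 0, 1]]    # theta = [1, 2]; one L2 unit bound to R2
print(v_q(q_mono, L0, 1.0, Kx, Ka, Req))
```

For a monovalently bound configuration the crosslinking constant cancels, so the result reduces to θi L0 Cθ Ka,ij Req,j, as expected from the initial binding step.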
4 Macroscopic equilibrium predictions
From here we will investigate the macroscopic properties of binding, such as the total amount of ligand bound and receptor bound on a cell surface at equilibrium. We consider two different ways in which complexes in the solution can be formed. First, complexes can be formed in a specific arrangement. In this case, the structure and exact concentration for each kind of complex are designed and known. Alternatively, we can set a fixed valency f for all complexes given the known proportion of each ligand monomer. Through random assortment, any combination of f monomer ligands can form a complex, and their concentration will follow a multinomial distribution. We will explore these two cases separately.
4.1 Complexes formed in a specific arrangement
When complexes are specifically arranged, the structure and proportion of each kind are well-defined. To formulate this mathematically, we assume that we have various kinds of complexes, and each of them can be described by a vector θ of length NL, with each entry θi as the number of Li in this complex. The valency of each complex may be different, and for complex θ its valency is |θ|. The proportion of θ among all complexes is defined as Cθ, and the concentration of each θ complex will be L0Cθ. For example, if we create a mixture of 20% of bivalent L1 and 80% of bispecific L1−L2, then θ1 = [2, 0], θ2 = [1, 1], $C_{\theta_1} = 0.2$, and $C_{\theta_2} = 0.8$. If the mixture solution has a total concentration of 10 nM, then the concentration of θ1 is 2 nM, and the concentration of θ2 is 8 nM.
We further conceptualize that Θ is a set of all existing θ’s. By this setting, we should have Σ θ∈Θ Cθ = 1. These complexes will bind in various configurations which can all be described as a q. We define Q as a set of all possible q’s, and we borrow the notation q ⊆ θ to indicate any binding configuration q that can be achieved by complex θ. This is equivalent to |qi • | = θi for all i, or θ is the row sum of (qij).
Solve the amount of free receptors
A remaining problem in the model setup is that in practice we can only experimentally measure the total amount of receptor of each kind expressed by a cell, Rtot,j, while the amount of free receptors at equilibrium, Req,j, though being used extensively in the model derivation, is unknown. To find Req,j, we first need to derive the amount of bound receptors of each kind, Rbound,j, then use conservation of mass to solve Req,j numerically.
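To calculate the amount of bound receptor Rbound,n, we can simply add up all entries in the n-th column for every q:
$$R_{bound,n} = \sum_{\theta \in \Theta} \sum_{q \subseteq \theta} |\mathbf{q}_{\bullet n}|\, v_q = \frac{L_0}{K_x^*} \sum_{\theta \in \Theta} C_\theta \left[\sum_{i=1}^{N_L} \frac{\theta_i \Psi_{in}}{|\Psi_{i\bullet}|}\right] \prod_{i=1}^{N_L} |\Psi_{i\bullet}|^{\theta_i},$$
where $\Psi_{in} = R_{eq,n} K_{a,in} K_x^*$, and $|\Psi_{i\bullet}| = 1 + \sum_{j=1}^{N_R} \Psi_{ij}$.
By the conservation of mass, we have
$$R_{tot,n} = R_{eq,n} + R_{bound,n} = R_{eq,n} + \frac{L_0}{K_x^*} \sum_{\theta \in \Theta} C_\theta \left[\sum_{i=1}^{N_L} \frac{\theta_i \Psi_{in}}{|\Psi_{i\bullet}|}\right] \prod_{i=1}^{N_L} |\Psi_{i\bullet}|^{\theta_i}.$$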
In this equation, Rtot,n are known, and any Ψi• is a function of every Req,j, j = 1, 2, …, NR, so all Req,j need to be solved together. This system of equations usually does not have a closed form and must be solved numerically. When implementing, we suggest taking the logarithm of both sides of these equations so the exponents can be eliminated and the range is restricted to positive numbers.
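As a side note, the total amount of bound receptors regardless of which kind is
$$R_{bound} = \sum_{n=1}^{N_R} R_{bound,n} = \frac{L_0}{K_x^*} \sum_{\theta \in \Theta} C_\theta \left[\sum_{i=1}^{N_L} \frac{\theta_i \left(|\Psi_{i\bullet}| - 1\right)}{|\Psi_{i\bullet}|}\right] \prod_{i=1}^{N_L} |\Psi_{i\bullet}|^{\theta_i}.$$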
The amount of bound ligand complexes
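Our model makes many macroscopic predictions readily accessible. For example, the amount of ligand bound at equilibrium is a useful quantity when measuring the overall quantity of tagged ligand. To compute this number, we can add up all vq except those of the q's that only have nonzero values in the 0-th column, v0,eq. Consequently, the model prediction of bound ligand at equilibrium is
$$L_{bound} = \frac{L_0}{K_x^*} \sum_{\theta \in \Theta} C_\theta \left[\prod_{i=1}^{N_L} |\Psi_{i\bullet}|^{\theta_i} - 1\right] = \frac{L_0}{K_x^*} \left[\sum_{\theta \in \Theta} C_\theta \prod_{i=1}^{N_L} |\Psi_{i\bullet}|^{\theta_i} - 1\right]$$
when $\sum_{\theta \in \Theta} C_\theta = 1$, and the predicted amount of bound complex θ (complex of each kind) is
$$L_{bound,\theta} = \frac{L_0\, C_\theta}{K_x^*} \left[\prod_{i=1}^{N_L} |\Psi_{i\bullet}|^{\theta_i} - 1\right].$$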
The amount of fully bound ligands
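In multivalent complexes like bispecific antibodies, drug activity may require that all subunits be bound to their respective targets [13]. The predicted amount of ligand bound f-valently can be calculated as
$$L_{full} = \sum_{\theta \in \Theta} \frac{L_0\, C_\theta}{K_x^*} \prod_{i=1}^{N_L} \sum_{|\tilde{\mathbf{q}}_{i\bullet}| = \theta_i} \binom{\theta_i}{\tilde{\mathbf{q}}_{i\bullet}} \prod_{j=1}^{N_R} \Psi_{ij}^{q_{ij}} = \frac{L_0}{K_x^*} \sum_{\theta \in \Theta} C_\theta \prod_{i=1}^{N_L} \left(|\Psi_{i\bullet}| - 1\right)^{\theta_i}$$
with $\tilde{\mathbf{q}}_{i\bullet} = [q_{i1}, q_{i2}, \ldots, q_{iN_R}]$, the qi• vector without qi0. In this equation, the multinomial coefficient describes the number of ways one can allocate θi receptors to any position in the i-th row of the (qij) matrix except the 0-th column, which stands for unbound.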
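In fact, the predicted amount of ligands bound with any specific valency can be derived in this manner. For example, the amount of ligands that bind monovalently can be calculated as
$$v_{1,eq} = \frac{L_0}{K_x^*} \sum_{\theta \in \Theta} C_\theta \sum_{i=1}^{N_L} \theta_i \left(|\Psi_{i\bullet}| - 1\right).$$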
This can be used for estimating the amount of multimerized ligands, Lmulti = Lbound − v1,eq, and multimerized receptors, Rmulti = Rbound − v1,eq.
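The closed forms for specifically arranged complexes can be cross-checked against an explicit enumeration of every configuration q. The sketch below does this for total bound, monovalently bound, and fully bound ligand, all in units of L0/Kx*. The Ψ values are hypothetical (as if Req had already been solved), the mixture is the 20%/80% example from above, and the helper names are ours.

```python
from itertools import product
from math import factorial, isclose, prod

def multinomial(k):
    out = factorial(sum(k))
    for ki in k:
        out //= factorial(ki)
    return out

def configurations(theta, n_cols):
    """All matrices q whose i-th row sums to theta_i (column 0 = unbound)."""
    rows = [[r for r in product(range(t + 1), repeat=n_cols) if sum(r) == t]
            for t in theta]
    return product(*rows)

def weight(q, psi):
    """v_q in units of L0 * C_theta / Kx, with psi[i][0] = 1."""
    w = 1.0
    for row, psi_row in zip(q, psi):
        w *= multinomial(row)
        for q_ij, p in zip(row, psi_row):
            w *= p ** q_ij
    return w

# Hypothetical Psi_ij values; column 0 holds Psi_i0 = 1
psi = [[1.0, 0.8, 0.1],
       [1.0, 0.05, 1.5]]
mixture = {(2, 0): 0.2, (1, 1): 0.8}       # theta -> C_theta
row_sum = [sum(r) for r in psi]            # |Psi_i.|

brute = {"bound": 0.0, "mono": 0.0, "full": 0.0}
for theta, c in mixture.items():
    for q in configurations(theta, len(psi[0])):
        n_bound = sum(sum(row[1:]) for row in q)
        w = c * weight(q, psi)
        if n_bound >= 1:
            brute["bound"] += w
        if n_bound == 1:
            brute["mono"] += w
        if n_bound == sum(theta):
            brute["full"] += w

closed = {
    "bound": sum(c * (prod(s ** t for s, t in zip(row_sum, th)) - 1)
                 for th, c in mixture.items()),
    "mono": sum(c * sum(t * (s - 1) for s, t in zip(row_sum, th))
                for th, c in mixture.items()),
    "full": sum(c * prod((s - 1) ** t for s, t in zip(row_sum, th))
                for th, c in mixture.items()),
}
print(brute)
print(closed)
```

Because both complexes in this mixture are bivalent, every bound complex is either monovalently or fully bound, so the "mono" and "full" amounts add up to the "bound" amount.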
4.2 Complexes formed through random assortment
Another common mode of forming multivalent complexes in biology, such as in the formation of antibody-antigen complexes [16], is engagement of monomer units to a common scaffold. Instead of resulting in a specific arrangement, we provide binding compounds of a fixed valency f and a litany of monomer ligands, and complexes can form through random assortment. The concentration of these complexes, therefore, will follow a multinomial distribution.
To formulate this mathematically, we denote the proportion of Li as Ci, and $\sum_{i=1}^{N_L} C_i = 1$. For example, if we have 40% L1 and 60% L2 in the solution to form dimers (f = 2), then C1 = 40%, C2 = 60%. Assuming complex formation follows a binomial distribution, there will be 16% bivalent L1, 36% bivalent L2, and 48% L1-L2 complex. When a complex is randomly assembled from the monomer ligands, the probability that it forms with the composition described by θ is
$$C_\theta = \binom{f}{\boldsymbol{\theta}} \prod_{i=1}^{N_L} C_i^{\theta_i}.$$
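Since $\theta_i = |\mathbf{q}_{i\bullet}|$ for all i, we know that
$$C_{\theta(q)} \prod_{i=1}^{N_L} \binom{\theta_i}{\mathbf{q}_{i\bullet}} = \binom{f}{\boldsymbol{\theta}} \prod_{i=1}^{N_L} \binom{\theta_i}{\mathbf{q}_{i\bullet}} \prod_{i=1}^{N_L} C_i^{\theta_i} = \binom{f}{\mathbf{q}} \prod_{i=1}^{N_L} \prod_{j=0}^{N_R} C_i^{q_{ij}}.$$
Plugging this relationship between Cθ and Ci into the equation for the amount of a specific binding configuration derived in the previous section, we have
$$v_q = \frac{L_0}{K_x^*} \binom{f}{\mathbf{q}} \prod_{i=1}^{N_L} \prod_{j=0}^{N_R} \varphi_{ij}^{q_{ij}},$$
where $\varphi_{ij} = C_i \Psi_{ij} = C_i R_{eq,j} K_{a,ij} K_x^*$ for j = 1, 2, …, NR and φi0 = Ci.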
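The dimer example above (40% L1 and 60% L2 with f = 2) can be reproduced by evaluating the multinomial distribution over all compositions θ. This is a small sketch with function names of our choosing.

```python
from itertools import product
from math import factorial, isclose

def multinomial(k):
    out = factorial(sum(k))
    for ki in k:
        out //= factorial(ki)
    return out

def complex_proportions(C, f):
    """Map each composition theta with |theta| = f to C_theta under random assortment."""
    out = {}
    for theta in product(range(f + 1), repeat=len(C)):
        if sum(theta) == f:
            p = float(multinomial(theta))
            for c_i, t_i in zip(C, theta):
                p *= c_i ** t_i
            out[theta] = p
    return out

props = complex_proportions([0.4, 0.6], f=2)
for theta, p in sorted(props.items()):
    print(theta, round(p, 4))
```

The three compositions (2, 0), (1, 1) and (0, 2) receive 16%, 48% and 36% of the complexes, and the proportions sum to one for any valency.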
Solve the amount of free receptors
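As in the specific arrangement case, we still need to solve Req,n numerically from Rtot,n. We first derive the amount of bound receptors of each kind at equilibrium as
$$R_{bound,n} = \sum_{|\mathbf{q}| = f} |\mathbf{q}_{\bullet n}|\, v_q = \frac{L_0\, f}{K_x^*} \left(\sum_{i=1}^{N_L} \varphi_{in}\right) \Phi^{f-1}.$$
Then by the conservation of mass, we have the equation to numerically solve for Req,n:
$$R_{tot,n} = R_{eq,n} + R_{bound,n} = R_{eq,n} + \frac{L_0\, f}{K_x^*} \left(\sum_{i=1}^{N_L} \varphi_{in}\right) \Phi^{f-1}.$$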
Again, since Φ is a function of every Req,n, all Req,n need to be solved together.
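One way to solve the coupled mass-balance equations is sketched below for the random assortment case, using Rbound,n = (L0 f / Kx*) (Σi φin) Φ^(f−1) with Φ = 1 + Σi Σj≥1 φij (which assumes Σi Ci = 1). A simple damped fixed-point iteration stands in here for a general-purpose root finder; it is our substitution, not the numerical method of this work, and it is only expected to converge for moderate binding such as in this example. The receptor abundances and affinities are the ones quoted in Section 5.1, while Kx* = 1e-12 cell·M and the 40/60 ligand proportions are assumed for illustration.

```python
def solve_req(Rtot, Ka, C, L0, Kx, f, n_iter=500, damping=0.5):
    """Damped fixed-point iteration for Rtot_n = Req_n + Rbound_n (random assortment)."""
    n_lig, n_rec = len(C), len(Rtot)
    # a[n] = sum_i C_i * Ka[i][n], so that sum_i phi_in = a[n] * Kx * Req_n
    a = [sum(C[i] * Ka[i][n] for i in range(n_lig)) for n in range(n_rec)]
    req = list(Rtot)                     # start from "nothing bound"
    for _ in range(n_iter):
        Phi = 1.0 + sum(a[n] * Kx * req[n] for n in range(n_rec))
        # Rbound_n / Req_n = L0 * f * a[n] * Phi**(f - 1)
        target = [Rtot[n] / (1.0 + L0 * f * a[n] * Phi ** (f - 1)) for n in range(n_rec)]
        req = [(1 - damping) * r + damping * t for r, t in zip(req, target)]
    return req

def rbound(req, Ka, C, L0, Kx, f):
    """Bound receptors of each kind given the free receptor amounts."""
    n_lig, n_rec = len(C), len(req)
    phi_n = [sum(C[i] * Ka[i][n] for i in range(n_lig)) * Kx * req[n] for n in range(n_rec)]
    Phi = 1.0 + sum(phi_n)
    return [L0 * f / Kx * p * Phi ** (f - 1) for p in phi_n]

Rtot = [2.5e4, 3e4, 2e3]                       # cell^-1
Ka = [[1e8, 1e5, 6e5], [3e5, 1e7, 1e6]]        # M^-1
C, L0, f = [0.4, 0.6], 1e-9, 2                 # assumed proportions, 1 nM, dimers
Kx = 1e-12                                     # assumed crosslinking constant

req = solve_req(Rtot, Ka, C, L0, Kx, f)
rb = rbound(req, Ka, C, L0, Kx, f)
print(req)
print(rb)
```

For stiffer parameter regimes, a root finder applied to the log-transformed equations, as suggested for the specific arrangement case, is likely the more robust choice.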
The amount of k-valently bound complexes
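For randomly assorted complexes, we first derive the amount of ligands that bind k-valently. As we will show, it has a nice expression that can be used to calculate many other quantities conveniently. First, let's break q into two separate vectors, q = (q•0, q•x). We define the vector formed by the 0-th column of q, which stands for unbound units, as q•0, and the one formed by the other elements as q•x. By the model setup, we know |q| = f, |q•x| = k, and |q•0| = f − k. We then have
$$v_{k,eq} = \sum_{|\mathbf{q}_{\bullet 0}| = f-k,\; |\mathbf{q}_{\bullet x}| = k} \frac{L_0}{K_x^*} \binom{f}{\mathbf{q}} \prod_{i,j} \varphi_{ij}^{q_{ij}} = \frac{L_0}{K_x^*} \binom{f}{k} \left(\sum_{i=1}^{N_L} C_i\right)^{f-k} \left(\sum_{i=1}^{N_L} \sum_{j=1}^{N_R} \varphi_{ij}\right)^{k} = \frac{L_0}{K_x^*} \binom{f}{k} \left(\Phi - 1\right)^{k},$$
where we used $\binom{f}{\mathbf{q}} = \binom{f}{k} \binom{f-k}{\mathbf{q}_{\bullet 0}} \binom{k}{\mathbf{q}_{\bullet x}}$ and $\sum_i C_i = 1$.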
The amount of total bound ligands and receptors
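Many macroscopic properties can be derived from vk,eq. For example, the amount of total bound ligands is simply the sum of ligands bound monovalently to fully, and can be simplified to
$$L_{bound} = \sum_{k=1}^{f} v_{k,eq} = \frac{L_0}{K_x^*} \left[\Phi^{f} - 1\right].$$
Similarly, the total bound receptors should be
$$R_{bound} = \sum_{k=1}^{f} k\, v_{k,eq} = \frac{L_0\, f}{K_x^*} \left(\Phi - 1\right) \Phi^{f-1}.$$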
As we show here, these quantities all have elegant closed-form solutions, and they depend only on Φ, a single value that incorporates all information about receptor amounts, monomer ligand compositions, and binding affinities.
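The reduction from the k-valent amounts to closed forms in Φ can be verified numerically. The sketch below assumes Σi Ci = 1, so that vk,eq = (L0/Kx*) C(f, k) (Φ − 1)^k, and compares the explicit sums over k with Lbound = (L0/Kx*)(Φ^f − 1) and Rbound = (L0 f/Kx*)(Φ − 1)Φ^(f−1). All numerical values are hypothetical.

```python
from math import comb, isclose

def v_k(k, Phi, f, L0, Kx):
    """Amount of complexes bound k-valently: (L0 / Kx) * C(f, k) * (Phi - 1)**k."""
    return L0 / Kx * comb(f, k) * (Phi - 1.0) ** k

Phi, f, L0, Kx = 2.07, 4, 1e-9, 1e-12      # hypothetical values

L_bound_sum = sum(v_k(k, Phi, f, L0, Kx) for k in range(1, f + 1))
R_bound_sum = sum(k * v_k(k, Phi, f, L0, Kx) for k in range(1, f + 1))

L_bound_closed = L0 / Kx * (Phi ** f - 1.0)
R_bound_closed = L0 * f / Kx * (Phi - 1.0) * Phi ** (f - 1)

print(L_bound_sum, L_bound_closed)
print(R_bound_sum, R_bound_closed)
```

Once Φ is known, every quantity here costs a handful of floating-point operations, regardless of how many ligand and receptor types contributed to Φ.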
The number of cross-linked receptors
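In some biological contexts such as T cell receptor-MHC [7] or antibody-Fc receptor [16] interactions, signal transduction is driven by receptor cross-linking due to multivalent binding. The amount of total cross-linked receptors can be derived from vk,eq as
$$R_{multi} = \sum_{k=2}^{f} k\, v_{k,eq} = R_{bound} - v_{1,eq} = \frac{L_0\, f}{K_x^*} \left(\Phi - 1\right) \left[\Phi^{f-1} - 1\right].$$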
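Finding the number of crosslinked receptors of a specific kind, Rn, requires extra consideration. Similar to how vk,eq was found, we break q into three separate vectors, q = (q•0, q•n, q•x). q•0 is the vector formed by the 0-th column of q, q•n is the vector formed by the n-th column of q, and q•x contains all others. If we assume that a complex is s-valently bound, then |q•0| = f−s. We further assume that |q•n| = t, then |q•x| = s−t. By this setup, we have
$$R_{multi,n} = \frac{L_0}{K_x^*} \sum_{s=2}^{f} \sum_{t=0}^{s} t\, \frac{f!}{(f-s)!\, t!\, (s-t)!} \left(\sum_{i=1}^{N_L} \varphi_{in}\right)^{t} \left(\Phi - 1 - \sum_{i=1}^{N_L} \varphi_{in}\right)^{s-t} = \frac{L_0\, f}{K_x^*} \left(\sum_{i=1}^{N_L} \varphi_{in}\right) \left[\Phi^{f-1} - 1\right].$$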
This formula can be useful when investigating the role of each receptor in a pathway that requires multimerized binding.
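The count of crosslinked receptors of one kind can be checked by summing over the bound valency s and the number t of units on Rn, as described above, and comparing with the closed form f (Σi φin)(Φ^(f−1) − 1) in units of L0/Kx*. The sketch assumes Σi Ci = 1 (so unbound units carry total weight 1), and the column sums of (φij) used here are hypothetical.

```python
from math import factorial, isclose

def trinomial(f, a, b, c):
    """Number of ways to split f units into groups of sizes a, b and c (a + b + c = f)."""
    return factorial(f) // (factorial(a) * factorial(b) * factorial(c))

def r_multi_n_bruteforce(phi_n, phi_rest, f):
    """Sum t * amount over complexes bound s-valently (s >= 2) with t units on R_n.

    Amounts are in units of L0 / Kx. phi_n is the n-th column sum of (phi_ij),
    phi_rest is the sum over all other receptor columns.
    """
    total = 0.0
    for s in range(2, f + 1):
        for t in range(0, s + 1):
            ways = trinomial(f, f - s, t, s - t)
            total += t * ways * phi_n ** t * phi_rest ** (s - t)
    return total

phi_n, phi_rest, f = 0.7, 0.4, 3           # hypothetical column sums and valency
Phi = 1.0 + phi_n + phi_rest

brute = r_multi_n_bruteforce(phi_n, phi_rest, f)
closed = f * phi_n * (Phi ** (f - 1) - 1.0)
print(brute, closed)
```

Summing the closed form over all receptor kinds recovers the total number of crosslinked receptors, f (Φ − 1)(Φ^(f−1) − 1) in the same units.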
Of course, the macroscopic predictions provided in this section cannot exhaust many biological quantities one may wish to study, but with the ideas we have demonstrated here, the readers can derive their own formulae as needed.
5 Application examples
In previous sections, we have shown how all macroscopic predictions made in this work can be written in closed form formulae. Therefore, many computational methods such as auto-differentiation and sensitivity analysis can be easily applied. These analyses will bring great insights into the complex behavior of multivalent binding. Here, we provide two examples to demonstrate the advantage of large-scale predictions made possible by this model.
5.1 Mixture binding prediction
Leveraging the synergistic effect among two or more drugs is of great interest in pharmaceutical development. A challenge in investigating synergy is to identify its underlying source. Most biological pathways follow a similar pattern: when the drug binds to certain surface receptors of a cell, a downstream pathway in the cell is initiated, leading to some actions. Therefore in general, synergism can come from either the initial binding events themselves or downstream processes. Binding-level synergy means that merely using a combination of ligands boosts the amount of binding to the important receptors and thus intensifies the overall effect. Downstream effect synergy indicates that the benefit of using mixtures arises from other cellular regulatory mechanisms two ligands can bring about. The binding model we introduced can help to investigate this issue by offering accurate predictions for the binding of multivalent complex mixtures.
In Figure 3, we provide an example of mixture binding predictions. We investigate a mixture of two types of ligand complexes, bivalent L1 (θ1 = [2, 0]) and bispecific L1−L2 (θ2 = [1, 1]). The crosslinking constant $K_x^*$ is set to a value similar to previous results [16]. We predict the amount of binding of this mixture to a cell expressing three types of receptors, with $R_{tot} = [2.5 \times 10^4, 3 \times 10^4, 2 \times 10^3]$ cell$^{-1}$. The affinity constants of L1 to these three receptors are $K_{a,1\bullet} = [1 \times 10^8, 1 \times 10^5, 6 \times 10^5]$ M$^{-1}$, and of L2, $K_{a,2\bullet} = [3 \times 10^5, 1 \times 10^7, 1 \times 10^6]$ M$^{-1}$. Figure 3 shows the predicted ligand bound (left panel) and R3 bound (right panel) for only θ1 or θ2 with L0 from 0 to 1 nM, and their mixtures in every possible composition with total concentration L0 = 1 nM (from 100% of one complex to 100% of the other).
Mixture binding prediction can help us identify the source of synergy. To connect model predictions to experimental measurements, ligand binding might be measured by fluorescently-tagged ligands, while the number of bound receptors of a specific type might associate with an indirect measurement such as cellular response. After making a series of measurements for different compositions of mixtures, we can fit the 100% of one complex cases (numbers on the two ends on the plot) first and then compare the mixture measurements to the predictions. Determining whether the downstream effect contributes to the observed synergy (or antagonism) can be framed as a hypothesis testing problem:
H0: The synergism of the mixture can be explained solely by binding.
The uncertainty of mixture binding prediction comes from measurement errors of receptor abundance and binding affinities. Usually, the receptor expression of a cell population has an empirical distribution which can be measured. The confidence interval in Figure 3 is drawn with the assumption that receptor expression fluctuates up and down by 10%, similar to the confidence interval of a log-normal distribution. Also, due to the measurement technique, the binding affinities may be over- or underestimated [14]. The confidence interval of mixture prediction can be determined by the model with all these considered, and a p-value can even be derived.
If most mixture measurements fall within the confidence interval of the predictions (such as case a annotated by the red circles in Figure 3, left panel), the synergy will very likely come from binding only. However, if the measurements are obviously beyond the confidence interval (case b, the red squares), it is reasonable to suspect a synergistic (or antagonistic) effect beyond binding alone. Because of the binding model’s flexibility, this method can also be extended to a mixture of more than two compounds.
5.2 Binding space of a ligand
When a dose of ligands (drug, hormone, cytokine, etc.) is released into the circulation system of an individual due to either physiological responses or exogenous administration, the compounds will spread and bind to many cell populations to varying extents. An essential question in pharmacology is how much a compound will bind to its intended target populations compared to off-target ones. This question is important for understanding basic biology as well as developing new therapeutics. For example, hormones and cytokines are important signaling molecules, and having a quantitative prediction of on- and off-target binding can greatly help us understand their mechanisms. For drug development, binding prediction can guide optimization to improve specificity toward the intended targets [18]. A cell population can be defined by the proteins it expresses, especially its surface receptors. Therefore, given the parameters of the dose and the receptor profile of a cell population, our model can make all the predictions discussed previously.
From the perspective of this binding model, there is nothing special about one specific cell population. If the local concentration is constant everywhere, our model can map any cell with a certain receptor expression to the amount of binding induced by this dose. If the biological activity of this compound on a cell is related to the quantity of binding to a certain ligand or receptor, the effect of this dose can be written as a function f, with
$$f: \mathbb{R}_{\ge 0}^{N_R} \to \mathbb{R}_{\ge 0}, \quad \mathbf{R}_{tot} \mapsto f(\mathbf{R}_{tot}),$$
where Rtot is a vector of nonnegative entries that describes the cell's expression of NR receptors, and f(Rtot) is the amount of binding. Here, we define the binding behavior of this dose (or any compound) as its binding space.
In Figure 4, we plot the binding space of a bivalent L1 ligand θ = [2, 0] with concentration 1 nM. The binding affinities are the same as described in the last subsection. In this binding space, we consider three receptors, R1, R2, and R3. We plot how the amount of binding relates to the cell expression profile, Rtot. Here, the amounts of R1 and R2 vary along the two axes, while R3 is held constant at $2.0 \times 10^3$ cell$^{-1}$. Then we use colors and contour lines to show the amount of binding. From these two plots, we can see that although both ligand binding and R2 binding increase with more receptors, ligand binding is more sensitive to R1 amounts, and R2 binding to R2 amounts. To consider any specific cell population, we only need to determine where its expression profile falls on the plot and read the predictions from the contour line. For example, on the left panel, the red cell population will have about $e^{5.2} \approx 181$ bound ligands per cell. The number of contour lines a population spans can also show intrapopulation variation. In this case, we expect the variation in ligand binding to fall between $e^{4.3} \approx 74$ and $e^{6.0} \approx 403$.
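A coarse version of such a binding space can be tabulated by evaluating the model on a grid of receptor abundances. The sketch below does so for a homotypic bivalent L1 ligand at 1 nM with R3 fixed at 2 × 10^3 per cell, using the L1 affinities quoted in Section 5.1. The crosslinking constant Kx* = 1e-12 cell·M, the grid values, and the damped fixed-point solver are assumptions of ours, so the numbers are not expected to match Figure 4.

```python
def bound_ligand(Rtot, Ka_row, L0=1e-9, Kx=1e-12, f=2, n_iter=500):
    """Predicted bound ligand per cell for a homotypic f-valent ligand.

    Kx is an assumed crosslinking constant, not a value taken from the text.
    """
    req = list(Rtot)                                   # start from "nothing bound"
    for _ in range(n_iter):
        Phi = 1.0 + sum(k * Kx * r for k, r in zip(Ka_row, req))
        # Mass balance: Req_n = Rtot_n / (1 + L0 * f * Ka_n * Phi**(f - 1))
        target = [rt / (1.0 + L0 * f * k * Phi ** (f - 1)) for rt, k in zip(Rtot, Ka_row)]
        req = [0.5 * (r + t) for r, t in zip(req, target)]   # damped update
    Phi = 1.0 + sum(k * Kx * r for k, r in zip(Ka_row, req))
    return L0 / Kx * (Phi ** f - 1.0)

Ka_L1 = [1e8, 1e5, 6e5]              # M^-1, affinities of L1 for R1, R2, R3
grid = [1e2, 1e3, 1e4, 1e5]          # hypothetical R1 and R2 abundances, cell^-1

# space[a][b]: bound ligand for R2 = grid[a], R1 = grid[b], R3 = 2e3
space = [[bound_ligand([r1, r2, 2e3], Ka_L1) for r1 in grid] for r2 in grid]
for row in space:
    print(["%.3g" % v for v in row])
```

Reading along a row shows how strongly ligand binding responds to R1, while reading down a column shows the much weaker response to R2 for this high-R1-affinity ligand.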
The binding space can provide ample information about the compound. It is an intrinsic property of a ligand given its concentration and the other ligands it mixes with, independent of any specific cell. The biological process of drug diffusion to a certain cell is analogous to sampling a point from this binding space. Its gradient indicates in which direction the binding level increases the fastest, as well as to which receptor it is more sensitive. An inactive antagonist that introduces binding competition with the ligand can distort its binding space, and we can visualize it by the change of shape in the contour lines. This plot can also intuitively demonstrate intrapopulation binding variance and interpopulation cell specificity of the compound. With the development of high-throughput single-cell methods such as flow cytometry, the expression profiles of a collection of cells can be identified en masse, and we can overlay their results onto a binding space plot (as in Figure 4, right panel). This shows the promise of applying our model to single-cell data. Although we can only visualize two receptors in a plot, binding space applies to any NR types of receptors. Theoretically, the concept of the binding space of a ligand is only complete when all relevant surface receptors are considered.
6 Discussion
In this work, we propose a mechanistic multivalent binding model that accounts for the interaction among multiple receptors and a mixture of ligand complexes formed by binding monomers. We first derive the amount of ligand of a specific binding configuration at equilibrium through the law of mass action. Using this formula, we make macroscopic predictions by applying the multinomial theorem strategically. Our predictions cover cases where complexes are formed by specific arrangement or random assortment. Finally, we provide two practical examples of how this model can help with biological research.
Compared with many previous approaches, this model has several clear advantages. First of all, it is computationally efficient, and it is capable of handling a large number of receptors, ligands, and complex types. This allows the model to make large-scale predictions easily, enabling mixture synergy analysis and binding space calculations. The closed-form expressions of the model also lend themselves to analytical studies and to incorporation into more complicated frameworks.
The assumptions made in this model may compromise its accuracy in some cases. For example, the steric effects of a multivalent ligand can be more complicated and context-dependent. Our setup has a single crosslinking constant, $K_x^*$, to reflect the multivalency effect. In practice, this model works well in predicting experimental binding results [18, 15]. Some other computational approaches investigate the steric effect more meticulously, but inevitably introduce considerable added complexity [4]. When the actual situation is not known, our model can serve as an adequate starting point.
Although this model is very general purpose, it mainly focuses on the binding dynamics on a cell surface, similar to the previous work on which it is based [1, 2, 3]. For intracellular ligands discordant with the multivalent velcro shape shown in Figure 2, this model may be less suitable. For example, some previous works focus on scaffold proteins in the cell signaling system for quantitative analysis [9], and various computational models different from ours have been developed [11, 6, 10].
Surface receptor binding is a universal event in biology. A prevalent question calls for a general enough solution. The model we present in this work can be successfully applied to many contexts, including predicting Fc-FcγR interaction [16] and fitting epithelial cell adhesion molecule binding data [18, 15]. With the rise of multispecific drugs in the recent decade [17], we expect this model to apply even more widely and to facilitate both basic scientific research and new therapy development.
Declaration of interest
This work was supported by NIH U01-AI-148119 to A.S.M. The authors declare no competing financial interests.
Author contributions
Z.C.T.: Methodology, Writing – original draft; A.S.M.: Funding acquisition, Writing – review & editing.