## Abstract

Germinal centers (GCs) are the key histological structures of the adaptive immune system, responsible for the development and selection of B cells producing high-affinity antibodies against antigens. Due to their level of complexity, unexpected malfunctioning may lead to a range of pathologies, including various malignant formations. One promising way to improve the understanding of malignant transformation is to study the underlying gene regulatory networks (GRNs) associated with cell development and differentiation. Evaluation and inference of the GRN structure from gene expression data is a challenging task in systems biology: recent achievements in single-cell (SC) transcriptomics allow the generation of SC gene expression data, which can be used to sharpen the knowledge on GRN structure. In order to understand whether a particular network of three key gene regulators (BCL6, IRF4, BLIMP1), influenced by two external stimuli signals (surface receptors BCR and CD40), is able to describe GC B cell differentiation, we used a stochastic model to fit SC transcriptomic data from a human lymphoid organ dataset. The model is defined mathematically as a piecewise-deterministic Markov process. We showed that after parameter tuning, the model qualitatively recapitulates mRNA distributions corresponding to GC and plasmablast stages of B cell differentiation. Thus, the model can assist in validating the GRN structure and, in the future, could lead to better understanding of the different types of dysfunction of the regulatory mechanisms.

## 1 Introduction

Adaptive immune response is a complex mechanism, relying on B and T lymphocytes, which protects the organism against a range of pathogens. Crucial elements of adaptive immune response, the germinal centers (GCs) are the structures in lymphoid organs where activated naive B cells are expanded (in a dark zone, DZ) and selected (in a light zone, LZ) and can have multiple exit fates, such as antibody production (plasmablasts and plasma cells, PB_PC), long term storage of antigen information (memory B cells, MC), or death via apoptosis [1, 2].

It is currently thought that B cell differentiation in GC is controlled by a small network of transcription factors (TFs) constituted by B-cell lymphoma 6 (BCL6), interferon regulatory factor 4 (IRF4) and PR domain zinc finger protein 1 (BLIMP1) [3]. BCL6 controls formation of GC, terminal differentiation of B cells and lymphomagenesis [4, 5]. BCL6 disturbance can be triggered by several mechanisms, including proteasome degradation by BCR, T-cell-mediated CD40-induced IRF4 repression of BCL6 [6, 5], or disruption of BCL6 autoregulation loop [5, 7]. Transcription factor IRF4 is involved in the termination of GC B cell differentiation, in immunoglobulin class switch recombination (CSR) and plasma cell development [8]. Impairment of IRF4 expression is tightly connected with the appearance of multiple malignancies [8]. BLIMP1 regulates pathways responsible for B cell lineage (e.g., PAX5) and GC proliferation and metabolism (e.g., MYC) [9, 10]. BLIMP1 is also involved in the induction of genes (e.g., XBP-1, ATF6, Ell2) facilitating antibody synthesis [11, 12, 13]. These three TFs interact, through various activation/inhibition processes: IRF4 represses BCL6 and activates BLIMP1 [14] (hence being essential for GC maturation and B cell differentiation into plasmablast), BLIMP1 and BCL6 mutually repress each other [15, 16, 17, 18].

Martinez et al. [3] developed a deterministic kinetic ODE model capable of simulating normal and malignant GC exits using a GRN based on these three transcription factors. For the normal differentiation of GC B cells towards PB_PC stage, the kinetic ODE model fits microarray data at two steady-states: the first one associated with the GC stage of B cell differentiation (with high levels of BCL6 and low levels of IRF4 and BLIMP1), and the second one associated with PB_PC stage (with low levels of BCL6 and high levels of IRF4 and BLIMP1).

Recently, multiple protocols for SC RNA-seq data generation have been developed and used to answer various questions in biology [19, 20]. At the same time, different groups showed that gene transcription in eukaryotes is a discontinuous process and follows bursting kinetics [21, 22, 23, 24]. Such results suggest that the stochastic nature of gene expression at the single cell (SC) level can be partly responsible for the phenotype variation in living organisms [25]. Thus, by gaining access to a stochastic behavior of gene expression, the SC viewpoint may lead to further improvement of the understanding of the biological systems and their variability.

Nevertheless, stochastic modeling of GRNs using SC gene expression data is still in its early stage [26, 27] and has never been studied for GC B cells. Here, we apply a particular class of stochastic models combining deterministic dynamics and random jumps, called piecewise-deterministic Markov processes (PDMPs) [28], to the description of GC B cell differentiation. It is a two-state model of gene expression introduced in [29] that allows a description of the system’s dynamics at the promoter, transcription and translation levels for a given GRN. We apply this model to the GRN made of the three key genes, BCL6, IRF4 and BLIMP1, and simulate single B cell mRNA data [30]. We show that the model can qualitatively simulate the SC mRNA patterns for normal B cell differentiation at GC and PB_PC stages.

## 2 Material, Methods and Models

### 2.1 Single-cell data

We used the B cell dataset from human lymphoid organs published by Milpied et al. [30]. The authors studied normal B cell subsets from germinal centers of the human spleen and tonsil and performed integrative SC analysis of gene expression. They used an adapted version of the integrative single-cell analysis protocol [31]. In short, the authors prepared cells for flow cytometry cell sorting. Then in every 96-well plate the authors sorted three to six ten-cell samples of the same phenotype as a single-cell. They performed multiplex qPCR analysis using the Biomark system (Fluidigm) with 96×96 microfluidic chips (fluidigm) and Taqmann assays (Thermofisher) [30]. They obtained results in the form of fixed fluorescence threshold to derive *Ct* values. We used *Ct* values to derive Expression threshold (*Et*) values: *Et* = 30 *Ct*. When there was an unreliably low or undetected expression (*Ct >* 30), *Et* was set to zero [30]. Using SC gene expression analysis of a panel of 91 preselected genes and pseudotime analysis (based on the cartesian coordinates of SC on the first and second principal components of the PCA), the authors separated GC DZ cells, GC LZ cells, memory cells and PB_PC cells.

Here we focused on three genes, BCL6, IRF4 and BLIMP1. We selected the SC gene expression values for BCL6, IRF4 and BLIMP1 for GC DZ cells (317 SC) and for PB_PC (104 SC) (see Figure 5). The experimental dataset includes at the GC B cell stage 30 cells with zero BCL6 mRNA amount, 292 cells with zero IRF4 mRNA amount and 292 cells with zero BLIMP1 mRNA amount. For the end of the B cell differentiation (PB_PC), there were 25 cells with zero BCL6 mRNA amount, 79 cells with zero IRF4 and 5 cells with zero BLIMP1 mRNA amount.

### 2.2 Kinetic ODE model

Martinez et al. [3] derived an ODE model that simulates B cell differentiation from mature GC cells towards PB_PC. Dynamics of each protein (BCL6, IRF4 and BLIMP1) are defined by a production rate (*μ*), a degradation rate (*λ*), a dissociation constant (*k*) and a maximum transcription rate (*σ*). Dynamics are described by System (1)-(3), where *p, b* and *r* account for proteins BLIMP1, BCL6 and IRF4, respectively:
In this model, CD40 and BCR act as stimuli on genes: BCR temporary represses BCL6 and CD40 temporary activates IRF4.

### 2.3 Stochastic model

The stochastic model that describes the coupled dynamics of gene *i* and the other genes of the GRN is defined by the series of equations:
where *E*_{i}(*t*), *M*_{i}(*t*) and *P*_{i}(*t*) are, respectively, the activation status of the promoter, the quantity of mRNA and the quantity of proteins of gene *i*, for *i* ∈ {1, 2, 3}. For *s* ∈{ 4, 5 }, *Q*_{s} accounts for external stimuli intensity. Each index *i* refers to one of the gene in the GRN, either BCL6, IRF4, or BLIMP1, and each index *s* to stimuli BCR and CD40 (see Table 1).

For each gene *i*, System (4) is defined by the promoter state switching rates *k*_{on,i} (*h*^{−1}) and *k*_{o, i} (*h*^{−1}), by a degradation rate of mRNA (*d*_{0,i}, *h*^{−1}), a protein degradation rate (*d*_{1,i}, *h*^{−1}), a transcription rate (*s*_{0,i}, mRNA *h*^{−1}), a translation rate (*s*_{1,i}, protein × mRNA^{−1} ×*h*^{−1}), and interaction parameters (*s*_{1},_{i}, protein × mRNAwith either gene or stimulus *j*. Interactions between genes are based on the assumption that *k*_{on,i} is a function of the proteins *P*_{1}, *P*_{2}, *P*_{3} and stimuli *Q*_{s} and is given by:
Where
In (6), *H*_{j,i} represents an interaction threshold for the protein *j* on gene *i*, while *β*_{i} is a scaling parameter. The structure of System (4)-(6) for the particular network considered in this paper is illustrated in Figure 1.

A detailed derivation of the model is presented in the supplementary material of [29]. Starting from a simple biochemical model of gene expression, the authors described higher-order interactions and took into consideration possible auto-activations. After normalization and simplifcation steps, Herbach et al. [29] and Bonnaffoux et al. [32] described the promoter switching rates *k*_{on,i} and *k*_{off, i} in the form of (5) and (6) by introducing the scaling parameter *β*_{i}.

It can be noted that the promoter state evolution of gene *i* between time *t* and *t* + *δt* in System (4)-(6) is defined, for small *δt*, as a Bernoulli-distributed random variable [29, 32]:
where probability π_{i}(*t*), derived by solving the master equation [29, 33], is given by
It follows that the promoter state of gene *i* averages to *k*_{on,i}*/*(*k*_{on,i} + *k*_{off, i}) in the fast promoter regime (*k*_{on,i} + *k*_{off, i} ≫1*/δt*). This quantity will be used to reduce System (4)-(6) into an ordinary differential equation (ODE) system in Section 3.1.

### 2.4 Simulating the stochastic model

During B cell differentiation in GC, B cells first receive BCR signal, through follicular dendritic cells interaction, that represses BCL6. Then, B cells integrate CD40 signals, through T follicular helper, that activate IRF4 [3, 6, 34].

In order to simulate these interactions, we assumed that BCR was acting on BCL6 from 0h until 25h, and CD40 was acting on IRF4 from 35h until 60h. Stimuli were implemented in three steps: first a linear increase (*t*_{BCR} ∈ [0.5*h*; 1.5*h*]; *t*_{CD40} ∈ [35*h*; 36*h*]), then a stable stimulus (*t*_{BCR} ∈ [1.5*h*; 24*h*]; *t*_{CD40 ∈} [36*h*; 60*h*]), finally a linear decrease (*t*_{BCR} ∈ [24*h*; 25*h*]; *t*_{CD40} ∈ [60*h*; 61*h*]) (see Supplementary Figure S1).

In all simulations, the system evolves for 500h so it can reach a steady state before applying the stimuli (at time *t* = 0h). After the first stimulus (BCR) is applied, the system is simulated for an additional 500h. For each simulation, the amounts of mRNA counts have been collected every 0.5h.

The stochastic system (4)-(6) is defined by 40 parameters, whose values are given in Tables 2 to 5.

### 2.5 Model execution in a computational center

All models were established as part of the WASABI pipeline [32] and were implemented in Python 3. All computations were performed using the computational center of IN2P3 (Villeurbanne/France).

### 2.6 Tuning of the PDMP model

#### 2.6.1 Parameters estimation for the ODE-reduced model

In Section 3.1, we use a reduced, deterministic version of System (4)-(6), namely System (11). Initial guess of each parameter has been chosen randomly in the same order of magnitude as in Bonnaffoux et al. [32]. Specifically, the initial value of *k*_{on} for IRF4 (*k*_{on, init, IRF4}) has been estimated by comparison with values of the kinetic model from Martinez et al. [3]. Initial values of *k*_{on} for BCL6 and BLIMP1 were selected in the same order of magnitude as *k*_{on, init, IRF4}.

#### 2.6.2 Estimation of the parameters for the stochastic model: Automatized approach

After we have established the parameters for the reduced model (11), and we have shown that (11) has two steady states, we used these values as initial guess for the stochastic model (4)-(6). The goal was then to further tune parameter values so the stochastic model (4)-(6) fits the experimental SC data.

We investigated a possible effect of *H*_{j,i} values, *θ*_{j,i} values and *k*_{on, init} values on the quality of the fitting (for each parameter combination, simulation was performed for 200 SC). We tested the values of interaction threshold *H*_{j,i} within the set {0.01, 0.1, 1} for *j, i* ∈ {1, 2, 3}, (*i* ≠ *j*), and the set {0.0001, 0.001, 0.1, 1, 100} for BCR repression stimuli on BCL6 node, CD40 activation stimuli on IRF4 and for *H*_{2,2}. We also tested the values of *θ*_{j,i} by multiplying by a factor *f* _{θ} ∈ {1, 5} for *j, i* ∈ {1, 2, 3}, (*i* ≠ *j*), and by multiplying by a factor *f*_{θ} {1, 10} for BCR repression stimuli on BCL6 node, CD40 activation stimuli on IRF4 and for *θ*_{2,2} (IRF4 autoactivation loop). In total we tested two different values of *θ*_{j,i} for 8 interactions (2^{8}), 3 values of *H*_{j,i} for 5 interactions (3^{5}), and 5 values of *H*_{j,i} for 3 interactions (5^{3}), generating 2^{8} × 3^{5} × 5^{3} ≈ 7.8 10^{6} combinations of parameters.

During this automatized tuning procedure, we selected a set of parameter values that allows the system to provide the best fit of the experimental mRNA values for BCL6, IRF4 and BLIMP1 at the GC stage, based on a quality-of-fit criterion. This criterion was defined as a comparison between the average model-derived values (ϒ) and the average experimental values (Ω), with an objective function (*OF*) to minimize for the set of genes *G* = {*BCL*6, *IRF* 4, *BLIMP* 1} and stages *ST* = {*GC, PB*_*PC*} defined by
The quality-of-fit criterion is then
where *PS* is the set of parameter values from Tables 2 to 5.

#### 2.6.3 Estimation of the parameters for the stochastic model: Semi-manual tuning

The automatized estimation procedure was followed by a semi-manual tuning of the parameters of the stochastic model (4)-(6) to improve the quality of the fit. Values of candidate parameters have been tested in an interval of interest and the rest of the parameter values have been fixed at this stage. After model execution, model-simulated SC values of gene expression were collected. Then we selected the values of the parameters that provided the best qualitative fitting (see Equation (10)) of the experimental SC data. Ranges of tested values are summarised in Table 6.

### 2.7 Evaluation of model variability using Kantorovich distance

To compare distributions and to evaluate model variability, we used the Kantorovich distance (KD, particular case of Wasserstein distance, WD), as defined by Baba et al. [35] and implemented in Python 3 by Bonnaffoux et al. [32].

Consider two discrete distributions *p* and *q*, defined on *N* bins of equal sizes, and denote by *x*_{k} the center of the *k*-th bin. Then the Kantorovich distance between *p* and *q* is given by
We chose WD because it suggested to be preferable over alternative methods such as Kullback-Leibler (KL) divergence or Jensen-Shannon (JS) divergence [36]. More specifically, WD does not require that distributions belong to the same probability space. At the same time, WD is more tractable and has higher performance compared to KL divergence [37]. JS divergence, in turn, does not provide a gradient for the distributions of non-overlapping domains, compared to WD [36]. Also, because WD is a metric and accounts both for the “cost” for the transfer (distance) and “the number of counts” to transfer, we selected its 1D case of WD (Kantorovich Distance, KD) for comparison of discrete experimental distributions versus model-derived distributions [38].

## 3 Results

### 3.1 Reduced model

In [3], Martinez et al. applied the kinetic ODE model (1)-(3) to the BCL6-IRF4-BLIMP1 GRN associated with B cell differentiation and successfully simulated GC B cell dynamics based on microarray data. Before using the complex, stochastic model (4)-(6) to fit SC data, we considered a reduced version of System (4)-(6) that can be compared to model (1)-(3), hence providing an initial guess for a key parameter of the model. Since model (1)-(3) is deterministic, it is necessary to simplify the stochastic model (4)-(6) to perform a comparison of both models dynamics. We assume, in this section, that the stochastic process *E*(*t*) (promoter status) in (4)-(6) equals its mean value, ⟨*E*(*t*) ⟩, given by *k*_{on}*/*(*k*_{on} + *k*_{off}). System (4)-(6) then reduces to
Comparing mathematical formulations of systems (1)-(3) and (11), one can see that it is possible to identifiy an initial value of the promoter state *E*(*t*) for IRF4 gene in System (11) that will correspond to GC differentiation stage (see Supplementary Material A.1). Indeed, after rewriting System (11) in terms of System (1)-(3), we obtained the candidate value of *k*_{on, init, IRF4} = 1.7 10^{−3}. Using this value of *k*_{on, init, IRF4}, System (11) successfully simulates two steady states for IRF4, i.e. it recapitulates the qualitative dynamics of System (1)-(3) (see Figure 2).

Before application of BCR and CD40 stimuli, the system is at a steady state (simulating GC B cell stage) that corresponds to a low amount of IRF4 and BLIMP1 and a high amount of BCL6 mRNA molecules. After application of both stimuli, the system has transitioned to a second steady state that corresponds to a high number of IRF4 and BLIMP1 mRNA molecules and a low number of BCL6 mRNA molecules. However, it can be noted that for the current parameter set (see Tables 2-5, version I), System (11) underestimates the amount of IRF4 mRNA at both steady states (see Figure 2). Dynamics of System (11) shows the existence of two steady-states for the parameter set from Tables 2-5, version I. Notably, if we test a random value of *k*_{on, init, IRF4} in combination with the parameters from Tables 2-5, version I (see Supplementary Table S1), System (11) has only one steady-state (see Supplementary Figure S2). To our knowledge, there may be more than one set of parameter values associated with two steady states of System (11).

We showed that for the parameter set from Tables 2-5, version I, the reduced model (11) is capable to qualitatively recapitulating the expected behavior of GC B cell differentiation GRN (see Figure 2). Next we wanted to understand if the stochastic system (4)-(6) can fit the experimental SC data.

### 3.2 Stochastic modeling of B cell differentiation

#### 3.2.1 Assessing the variability of the stochastic model

Due to the stochastic nature of the stochastic system (4)-(6), it is important to first evaluate the variability of the model-generated SC data, that is of model’s outputs. Indeed, when one repeatedly simulates a finite number of cells from the stochastic system (4)-(6) for the same parameter value set (Tables 2-5, version I), the resulting model-derived empirical distributions are slightly different between each run due to the stochasticity of the model. We investigated how strongly shapes of distributions of simulated SC mRNA molecules vary for different executions of model (4)-(6).

We evaluated the level of variability of model (4)-(6) using the Kantorovich distance (KD, see Section 2.7). We simulated 200 datasets, each containing 200 single cells, of System (4)-(6) with a fixed parameter set (see Tables 2-5, version I). We estimated the KD between pairs of simulated datasets (mRNA counts for three genes at GC and PB_PC stages for 200 simulated cells), and obtained a distribution of all KD that we call the model-to-model (m-t-m) distribution (Figure 3). Shapes of m-t-m distributions are different for each gene and stage of differentiation. For instance, for BLIMP1, long tails are observed. We can also notice that the mean value of IRF4 at GC stage is low compared to other genes.

In order to get a more accurate evaluation of the variability in model’s outputs, we plotted distributions of the number of mRNA molecules (model’s outputs) for each node of the GRN with the highest m-t-m distribution at both GC and PB_PC stages (Figure 4). Qualitatively, no difference is detected in the shapes of model-generated distributions. For all 6 nodes, the shapes of distributions are remarkably similar. These results suggest that it may be sufficient to perform parameter tuning of the stochastic model (4)-(6) using only one simulation run for each parameter value set.

#### 3.2.2 Initial estimation step based on an automatized approach

Variability of the stochastic model being assessed, and comparison of experimental data and a single model’s output in order to assess their closeness being validated, we now focus on the estimation of parameter values. Model (4)-(6) comprises 40 parameters, so we first apply a straightforward strategy, that we call automatized approach, which consists in solving the stochastic system (4)-(6) for a number of fixed parameter values and selecting the set of parameter values associated with the best fit (see Section 2.6.2) of experimental data [30].

Approximately 8 × 10^{6} combinations of parameter values have been tested (see Section 2.6.2), then the best set of parameter values has been selected based on the quality of BCL6, IRF4 and BLIMP1 fitting at the GC and PB_PC stages (see Equations (9)-(10)). Numbers of mRNA molecules estimated by the stochastic model (4)-(6) are in a similar range of magnitude as the experimental SC data (see Supplementary Figure S3). However, the selected parameter values (Tables 2-5, version II) generate model-derived mRNA distributions that have sufficient overlap with experimental data for GC stage but insufficient overlap for PB_PC stage (see Supplementary Figure S3). Indeed, distributions of numbers of mRNA molecules at PB_PC stage mostly underestimate the experimental SC data (see Supplementary Figure S3B, D and F).

Implementing an automatized approach for estimating parameter values helped to establish a set of parameter values that allows System (4)-(6) to correctly estimate the number of mRNA molecules for 3 out of 6 nodes of the GRN. In order to improve the quality of the fit, a more directed and sensitive tuning of the parameter set is then performed (see Section 2.6.3).

#### 3.2.3 Generation of simulated distributions of mRNA counts describing B cell differentiation

Due to the complexity of the stochastic model (4)-(6) that is made of 40 parameters, it is important to identifiy which parameters should be targeted to improve the quality of fit. To do so, we rely on the properties of the GRN (see Figure 1A). Thanks to the topological structure of the BCL6-IRF4-BLIMP1 GRN, where IRF4 activates BLIMP1 and autoactivates itself, we hypothesize that System (4)-(6) underestimates the experimental SC data at the PB_PC stage due to low values of the parameters responsible for IRF4 autoactivation (*θ*_{22}, and to a lesser extent *s*_{0, IRF4}) and BLIMP1 activation by IRF4 (*θ*_{23}). Further, we improved the quality of the fit, in particular of BLIMP1 distribution, by focusing on BLIMP1-related interaction parameters (*θ*_{13}, *θ*_{31}).

Indeed, if IRF4 autoactivation reaction is not efficient enough, there are not enough IRF4 molecules to affect BCL6 and BLIMP1 activity at PB_PC stage. Because IRF4 activity is only impacted by its autoactivation loop, we first modulated values of the parameter related to this reaction (*θ*_{22}). During preliminary tests, we noticed that this reaction is crucial for the transition from GC towards PB_PC stage and that when interaction parameter *θ*_{22} and transcription rate *s*_{0, IRF4} have low absolute values then the system cannot reach PB_PC stage, even after application of the stimuli. It can be explained by the insufficient amount of IRF4 molecules produced (see Supplementary Figure S3, C and D). On the other hand, when parameters *θ*_{22} and *s*_{0, IRF4} have high values, model (4)-(6) transitions from GC towards PB_PC stage even before application of stimuli, exhibiting an abnormal behavior.

After comparison of the stochastic system (4)-(6) outputs for a range of different *θ*_{22} and *s*_{0, IRF4} values (described in Table 6), we selected the parameter set for which model (4)-(6) correctly fits the IRF4 experimental data at both GC and PB_PC stages. Such model-derived SC pattern is obtained using the values (*θ*_{22} = 11 and *s*_{0, IRF4} = 2.1 molecule.*h*^{−1}) We additionally performed simulations to improve the quality of the fitting of BLIMP1 and BCL6 distributions by testing parameters that are directly responsible for the balance between BLIMP1 and BCL6, such as interaction parameters *θ*_{13}, *θ*_{31} and *θ*_{23}. We also tested parameters which can influence BCL6 and BLIMP1 indirectly, such as transcription rates (*s*_{0, BCL6} and *s*_{0, BLIMP1}), and degradation rates of mRNA (*d*_{0, BCL6}, *d*_{0, IRF4} and *d*_{0, BLIMP1}).

After comparison of the stochastic system (4)-(6) outputs, we selected the parameters which allow the model to have a qualitative fit of the experimental data for all nodes at GC and PB_PC stages (see Figure 5, and Tables 2-5, version III). For this tuned parameter set, we see that the model (4)-(6) can have a good qualitative fitting of experimental data for all nodes. Results also show that for this parameter set (version III), the stochastic model (4)-(6) fits SC data at the GC stage for BCL6 (see Figure 5A). The model-derived empirical distribution of BLIMP1 was capable of showing overlap with experimental data at the PB_PC stage (see Figure 5F), but it overestimated the number of BLIMP1 mRNA molecules at the GC stage (see Figure 5E).

The current parameter set (Tables 2-5, version III) has difficulties to correctly evaluate the number of zero values. The model (4)-(6) tends to overestimate the number of BCL6 mRNA molecules at PB_PC stage, as well as the number of IRF4 mRNA molecules at GC stage and number of BLIMP1 mRNA molecules at GC stage (see Figure 5). Nevertheless, this parameter set allowed the model to generate SC data with a similar level of magnitude of the amount of mRNA as experimentally observed.

## 4 Discussion

In this work, we applied a particular class of stochastic models combining deterministic dynamics and random jumps to the simulation of SC data from two stages of B cell differentiation in germinal centers.

We first defined a reduced model (11) whose dynamics were compared to the ones of the kinetic model (1)-(3) and we established an initial parameter value for the key parameter *k*_{on, init, IRF4}. We then showed that for a given parameter set (Table 2-5, Version I), the reduced model (11) admits two steady states. Secondly, we evaluated the effect of stochasticity on multiple independent generations of the number of mRNA molecules by the stochastic model (4)-(6) and we confirmed that for the same parameter set there is no noticeable difference between each model-generated outputs for BCL6-IRF4-BLIMP1 GRNs (see Figure 4). These results allow performing a combined parameter screening with the confidence that for each candidate parameter set, the algorithm needs to perform only one run of the model (4)-(6). Lastly, we showed that the model (4)-(6) can simulate distributions of the number of mRNA molecules for BCL6, IRF4, BLIMP1 at GC and PB_PC stages with the same order of magnitude as experimental data. However, as future scope of this work, a few strategies to improve the final parameter value set (Tables 2-5, version III) can be investigated.

Since in BCL6-IRF4-BLIMP1 GRN, IRF4 activity depends only on its autoactivation reaction, we have only succeeded, by writing the reduced model (11) in terms of the kinetic model (1)-(3), in estimating the value of *k*_{on, init, IRF4}. It would be advantageous to additionally estimate the values of *k*_{on, init, BCL6} and *k*_{on, init, BLIMP1}, using the same logic. However, because BLIMP1 depends on BLIMP1, IRF4 and BCL6 (see Equation (1)) and BCL6 depends on both IRF4 and BLIMP1 (see Equation (2)), the rewriting of system (4)-(6) in terms of (1)-(3) would require additional calculations and simplifications.

The effect of mutual repression between BCL6 and BLIMP1 could be evaluated by performing a more extensive parameter value search. The current parameter value set (Tables 2-5, version III) makes model (4)-(6) overestimate the number of mRNA molecules of BLIMP1 at GC stage. Increasing BCL6 repression of BLIMP1 could potentially decrease the quantity of BLIMP1 at the GC stage.

The effect of the duration of the BCR and CD40 stimuli on the differentiation from GC B cells towards PB_PC could be investigated. Multiscale modeling of GCs performed by Tejero et al. [39] showed that CD40 signalling in combination with the asymmetric division of B cells results in a switch from memory B cells to plasmablasts. It would be relevant to evaluate a possible application of the stochastic model to study the effect of combined CD40 and BCR signaling with different intensities and durations at the SC level.

Additionally, one can evaluate the impact of including additional genes into the BCL6-IRF4-BLIMP1 GRN on the quality of data fitting by the stochastic model. One of the possible candidates to incorporation in the GRN is PAX5, which plays an important role in directing lymphoid progenitors towards B cell development [40]. PAX5 positively regulates IRF8 and BACH2, which indirectly positively regulate IRF4 and negatively regulate BLIMP1 at an early stage of B cell differentiation. During further development, BLIMP1 starts to repress PAX5, consequently decreasing the expression of IRF8 and BACH2. The correct orchestration of PAX5-IRF8-BACH2 during B cell differentiation is important for the successful differentiation towards antigen producing cells (PB_PC), while its malfunction can cause aberration in GC B cell development [41].

CD40 stimulation of B cells initiates NF-*ŸB* signaling which is associated with cellular proliferation. In B cells, NF-*ŸB* activates IRF4, negatively regulates BACH2, what leads to positive regulation of BLIMP1 and consecutive repression of BCL6 [5, 34].

Another important transcription factor in GC development is MYC, which regulates B cell proliferation [42] and the DZ B cell phenotype [43]. MYC indirectly activates the histone methyltransferase enhancer of zeste homologue 2 (EZH2), which is responsible for the repression of IRF4 and BLIMP1 [44, 45, 46, 47].

The transcription factors mentioned above are present in the SC RT-qPCR dataset from Milpied et al. [30] that we used and could be used to extend the current GRN. Inclusion of additional transcription factors may have both positive and negative effects on the application of model (4)-(6). On one side, it can increase the computational time and the number of parameters required for simulating System (4)-(6). On the other side, because the inclusion of transcription factors can more precisely describe the biological system it could improve the quality of the fitting. However, any inclusion of new nodes to GRN should be carefully evaluated and only essential transcription factors should be added. For instance, there are no advantages in adding a transcrption factor that would only have one downstream output. As an example, MYC activates E2F1 and further activates EZH2. For this reason, incorporation of the chain MYC-E2F1-EZH2 should have a similar outcome, as the incorporation of the simplified MYC-EZH2 reaction. This is expected because in the modeling, intermediate elements of one-to-one redundant reactions can be omitted without significant changes in the quality of the simulations.

To further continue our study, we could also use SC RNA-seq dataset from Milpied et. al [30]. The authors have produced SC RNA-seq dataset from GC B cells and analysed the similarities between SC RNA-seq and SC RT-qPCR dataset. Even though the gene-gene correlation levels were lower in SC RNA-seq compared to SC RT-qPCR, SC RNA-seq analysis confirmed the observation obtained by SC RT-qPCR [30]. From the stochastic modeling perspective, combining the data from SC RT-qPCR and SC RNA-seq should improve our understanding of the SC dataset variability and the quality of the fitting. To summarise, the stochastic model (4)-(6) is capable of qualitatively simulating and depicting the stochasticity of experimental SC gene expression data of human B cell differentiation at the GC and PB_PC stages using a GRN made of three-key genes (BCL6, IRF4, BLIMP1). These results are encouraging, and suggest that our model may be used to test the different B cell exits from GC. Future steps may include testing of the model (4)-(6) on alternative SC datasets [48, 49, 50] and investigating the malignant formations, by evaluating differences of the associated GRN compared to the normal B cell differentiation from GC towards PB_PC.

## Funding

This work was supported by the COSMIC grant (www.cosmic-h2020.eu) which has received funding from European Union’s Horizon 2020 research and innovation program under the Marie Sk≥odowska-Curie grant agreement no. 765158.

## Acknowledgments

We thank Arnaud Bonnaffoux and Matteo Bouvier for their help with the WASABI framework and their critical reading of the manuscript. We thank the computational center of IN2P3 (Villeurbanne/France), especially Gino Marchetti and Renaud Vernet. We also thank Aurelien Pélissier and Elias Ventre for their scientific discussions.

## Footnotes

↵* olivier.gandrillon{at}ens-lyon.fr