Temporal mixture modelling of single-cell RNA-seq data resolves a CD4+ T cell fate bifurcation

Differentiation of naïve CD4+ T cells into functionally distinct T helper subsets is crucial for the orchestration of immune responses. Due to multiple levels of heterogeneity and multiple overlapping transcriptional programs in differentiating T cell populations, this process has remained a challenge for systematic dissection in vivo. By using single-cell RNA transcriptomics and computational modelling of temporal mixtures, we reconstructed the developmental trajectories of Th1 and Tfh cell populations during Plasmodium infection in mice at single-cell resolution. These cell fates emerged from a common, highly proliferative and metabolically active precursor. Moreover, by tracking clonality from T cell receptor sequences, we infer that ancestors derived from the same naïve CD4+ T cell can concurrently populate both Th1 and Tfh subsets. We further found that precursor T cells were coached towards a Th1 but not a Tfh fate by monocytes/macrophages. The integrated genomic and computational approach we describe is applicable for analysis of any cellular system characterized by differentiation towards multiple fates. One Sentence Summary Using single-cell RNA sequencing and a novel unsupervised computational approach, we resolve the developmental trajectories of two CD4+ T cell fates in vivo, and show that uncommitted T cells are externally influenced towards one fate by inflammatory monocytes.


Introduction
T helper (Th) cells, also known as CD4 + T cells, are key instructors of the immune system (1). They display extensive functional and phenotypic diversity in response to a spectrum of immune challenges, including viral, bacterial, fungal and parasitic infection, immunogenic cancers, and autoimmune and allergic stimuli. Th cell subsets are distinguished from each other most frequently by the cytokines they secrete. Th1 cells produce interferon-γ, leading to macrophage activation and enhanced killing of intracellular pathogens. Th2 cells produce IL-4, IL-5, and IL-13, prompting eosinophils to act against extracellular parasites and venom. Th17 cells produce IL-17 and IL-22, promoting neutrophilic responses against extracellular bacteria and fungi. Follicular T helper (Tfh) cells, a more recently defined Th subset, secrete IL-21, and drive somatic hypermutation of immunoglobulin genes in germinal centre B cells. This produces high affinity antibodies, upon which many licensed vaccines depend for efficacy. Since Th subsets can both control infections and drive immune-mediated diseases there remains tremendous interest in the molecular mechanisms that control their in vivo development.
In order for Th cells to develop, CD4 + T cells must first be raised from an immunologically naïve state by antigenic stimulation of their highly diverse T cell receptors (TCR), which is followed by processes of clonal proliferation and differentiation. Recent in vivo data suggested that the unique TCR sequence of a single naïve CD4 + T cell imparts a genetically programmed 3 preference towards a particular Th fate (2). However, co-stimulatory and cytokine signals can also profoundly influence both the magnitude of the response, and skewing towards particular Th fates. Several master transcription factors have been described in CD4 + T cells that drive and stabilize Th fates, which supports a view of Th development as a choice between clearly distinct states. However, the relationship between Th subsets, particularly between Tfh and other Th fates remains unclear in vivo.
In many cases, immune challenges, such as infection or vaccination, induce concurrent differentiation into two or more Th fates within the same individual. Indeed, by performing a limiting dilution single-cell adoptive transfer of naïve CD4 + T cells, it was suggested that daughter cells from a particular clone could bifurcate phenotypically to give rise to both Th1 and Tfh cells (2). However, it was not possible to visualize and pinpoint the bifurcation of Th1/Tfh cell fates in vivo.
Resolving Th cell fate decision-making in vivo using population-level approaches has been challenging, mainly due to extensive heterogeneity amongst differentiating cells. More specifically, CD4 + T cells at any given time point display a distribution of intermediate and transitional states, which blurs the dynamics of Th cell developmental progression (3). Tfh differentiation, in particular, has been difficult to elucidate since it involves multiple stages with potential overlap with transcriptional programs of other Th subsets. Of particular note, computational tools for modelling bifurcations in cellular decision-making have not been available.
Th cell fate decisions are driven by both intrinsic factors and external signalling cues from other cells. Conventional dendritic cells (cDCs) are important cellular sources of antigenic stimulation, co-stimulation and cytokines for Th differentiation in secondary lymphoid tissues. Intra-vital imaging in lymph nodes has demonstrated that cDCs make long-lasting stable contacts with naïve CD4 + T cells in order to initiate T cell priming (4). Once activated, CD4 + T cells continue to require antigenic stimulation via their TCR to optimize their proliferation and Th differentiation (5)(6)(7). Continued signalling has been reported to be important for Th1 responses, although the cell types providing this signal remain unknown (4). A recent report suggested that CXCR3 expression by activated CD4 + T cells facilitated continued interaction with adoptivelytransferred CXCL9 and CXCL10-expressing cDCs (8), however, interactions with endogenous myeloid cell populations, including cDC subsets and monocytes have not been studied in vivo. While Tfh cells are sustained, once generated, via multiple molecular interactions with B cells in developing germinal centres (9,10), possible roles for myeloid cells in providing early instruction towards a Tfh fate remain relatively unexplored. A recent study targeted antigens to two different cDC-subsets in vivo, and suggested that CD8α -cDCs displayed the greater propensity for generating Tfh responses (11). Whether Th1/Tfh fate bifurcation can result from differential interactions with cDC subsets or activated monocytes currently remains unknown.
Herein we have used single-cell RNA sequencing (scRNA-seq) to study the various transcriptional states of individual CD4 + T cells during blood-stage Plasmodium chabaudi infection in mice. This is an experimental model of malaria in which CD4 + T cells are essential 4 for controlling parasite numbers, and which is characterized by concurrent development of Th1 and Tfh cells (12). We have used Plasmodium-specific TCR transgenic CD4 + T (PbTII) cells to minimise the effects of TCR diversity on Th fate decisions.
Crucially, our approach builds on scRNA-seq profiling coupled with new computational strategies to reconstruct the differentiation trajectories of Th1 and Tfh cells at a single-cell resolution. Our data reveals, for the first time, the molecular detail of how a single antigenspecific CD4 + T cell clone can undergo parallel development into Th1 and Tfh states in vivo, and reveals the hierarchical regulation of genes involved in this cell fate decision. Finally, we investigated intercellular interactions using scRNA-seq, and predicted roles for inflammatory monocytes, after cDC-dependent T cell activation, in coaching uncommitted CD4 + T cells, specifically towards a Th1 fate.

scRNA-seq resolves Th1 and Tfh cell fates during Plasmodium infection in mice
To study concurrent progression towards Th1 and Tfh fates, and to characterize the heterogeneity associated with this process during an in vivo CD4 + T cell response, we performed scRNA-seq of PbTII cells during PcAS infection ( Figure 1A, Figure S1). We transferred naïve, proliferative dye-labeled PbTII cells into congenically marked wild-type mice, and recovered them at days 2, 3, 4, and 7 post-infection (p.i.) by fluorescence-activated cell sorting (FACS) of those expressing the early activation marker, CD69, or displaying dilution of the proliferative dye ( Figure S2). Flow cytometric measurements of the canonical Th1 markers, T-bet (coded by Tbx21) and Interferon-γ, and Tfh markers, CXCR5 and Bcl6, indicated that these subsets emerged in parallel by day 7 p.i. (13,14) ( Figure 1B-D). Notably, markers of Th2, Th17 or Treg subsets were not upregulated on the PbTII cells ( Figure S3).
We initially used Principal component analysis (PCA) to assess the overall heterogeneity of the PbTII cells ( Figure 1E, Figure S4A). In all time points, the first principal component was strongly associated with the number of detected transcripts, which is reflective of changes in cellular RNA content and, in general, is linked to proliferative status ( Figure S4B). As expected, the variability related to previously established Th1 and Tfh gene expression signatures became more prominent with the progression of time (15) (Figure S4C). Notably, at day 7 p.i., a PCA using these signature genes alone recapitulated the results of the genome-wide PCA (Spearman correlation -0.87) ( Figure S5). Amongst the cells from day 7 p.i., two distinct subpopulations were apparent, separated along PC2 ( Figure 1E). Notably, many of the genes associated with these subpopulations have been identified as associated with either Th1 or Tfh fates ( Figure 1F, Table S1). Results from a global PCA of the entire dataset were largely in accordance with the time point information, with the Th1/Tfh signature genes showing separation along multiple PCs ( Figure S6). Taken together, these results suggested a progressive commitment to Th1 and Tfh fates, and indicated that single-cell transcriptomes could be used for estimating both proliferative states and degrees of differentiation of individual cells. 5

Unbiased delineation of Th1 and Tfh trajectories using a Mixture of Gaussian Processes model
The results from the PCA suggest that gene expression variation in PbTII single-cell transcriptomes permit reconstruction of the transcriptional programs underlying Th1 and Tfh differentiation. To more explicitly model the temporal dynamics of the differentiation process, we developed and applied GPfates, a temporal mixture model that builds on the Gaussian Process Latent Variable Model (GPLVM) and Overlapping Mixtures of Gaussian Processes (OMGP) (16). This approach first reconstructs the differentiation trajectory from the observed data ("pseudotime", Figure 2A-B), thereby establishing an order for the cells. While our model uses the sample time as prior information, the inferred temporal orderings did not strictly adhere to these experimental time points ( Figure S7). For example, cells from day 4 p.i. were mixed with some of the cells from day 3 and day 7 at either end of the day 4 pseudotime distribution. This was consistent with the idea that bulk assessments of cells at specific time points fail to take into account the heterogeneity and differential kinetics of responses made by single cells. We also repeated this analysis without supplying the experimental sampling times to the model, finding overall consistent results (Comp. Supp. Figure 8).
In a second step, GPfates uses a time series mixture model, which we adapted from a model that was initially developed to deconvolve temporal data into independent separate trends, and which is related to previous time series models for bulk gene expression time series (16). Using this approach, we identified two simultaneous trends ( Figure 2C-D). These two alternative trajectories were in agreement with the Th1/Tfh signature genes identified by Hale et al. (15) ( Figure 3A-D), indicating that the fitted mixture components correspond to cells with Th1 and Tfh phenotypes. Notably, these trends could not be identified by other published methods for reconstructing single-cell trajectories (17,18) (Figure S8). Furthermore, the mixture modelling in GPfates could also successfully resolve bifurcation events in two other recently published scRNAseq datasets, which examined lung epithelial development in mice (Comp. Supp. Figure  11) (19) and primordial germ cell development in human embryos (Comp. Supp. Figure 12) (20). This suggests that pseudotime inference coupled with time series mixture modelling is applicable more generally for studying cellular differentiation in scRNAseq data.
Next, we sought to more clearly characterize the bifurcation time point. Using a change point model to annotate the inferred trajectories (see section 4.2 of the Computational Supplement), we could divide pseudotime into before and after bifurcation. We sought to characterize single cells that existed at the Th1/Tfh bifurcation point. Firstly, bifurcation initiated amongst cells from day 4 p.i. (see section 6.2 of computational supplement for a robustness analysis), specifically at a relatively early point in pseudotime compared with all day 4 p.i. cells ( Figure 4A). Bifurcating PbTII cells also expressed the largest number of genes compared to those at all other points in pseudotime.
High transcriptional activity correlated with upregulation of Mki67 and other known proliferation marker genes (21) (confirmed at the level of Ki-67, Figure 4B-C and S9A). It also correlated 6 with cell cycle activity, based on computational allocation of cells into cell cycle stages (22), and flow cytometric confirmation of DNA content and cell size ( Figure 4D-E). Bifurcating PbTII cells also had increased expression of genes associated with aerobic glycolysis (data not shown), an indication of increased metabolic requirements being met by glucose metabolism and increased mTORC1 activity. Consistent with this was the observed elevated levels of ribosomal protein S6 phosphorylation by day 4 p.i. (Figure 4F).
Taken together, our data indicate that bifurcating PbTII cells exhibit a highly proliferative and metabolically active state, coupled with the upregulation of thousands of genes. Importantly, progression from the Th1/Tfh bifurcation point to either fate was marked by widespread silencing of gene expression across the genome. Although this decrease in gene expression can be partially explained by a deceleration in cell cycle speed, it is also consistent with other cellular differentiation processes characterized at a single-cell resolution (19).

Detectable expression of endogenous T cell receptor loci reveals breadth of clonotype fates
Since previous reports have suggested a role for TCR sequences in determining Th cell fate (2), our TCR transgenic approach was designed to minimize this potential source of variability. Importantly however, PbTII cells were generated in mice with functional Rag1 and Rag2 genes, and therefore, retained natural expression of highly diverse endogenous TCR chains in addition to the transgenic TCR. Sequence analysis of TCR transcripts in single PbTII cells confirmed universal expression of the PbTII Vα2 and Vβ12 chains in all cells (Supplementary Tables 2 &  3). Moreover, it confirmed highly diverse, though lower levels of expression of endogenous TCRα chains in many cells ( Figure S10).
Given the vast combinatorial diversity of endogenous TCR sequences, we employed these as unique molecular barcodes to scan for PbTII cells that could be inferred with high confidence to have derived from a single common PbTII progenitor clone. Notably, we identified six clones comprising two or more sibling cells, while all other PbTII cells were individually unique. Of these six clones, two consisted of sibling cells that mapped close to the bifurcation point. For the remaining four clones, siblings exhibited highly diverging patterns of gene expression, with three sibling groups falling at the extremities of the Th1-Tfh phenotype spectrum ( Figure 5A). These results demonstrate that during an in vivo infection, the progeny of a single CD4 + T cell clone can differentiate into both Th1 and Tfh cells.

Transcriptional signatures associated with bifurcation of Th1 and Tfh fates
Next, we sought to identify genes whose expression followed the pattern of branching. We derived bifurcation statistic to estimate the concordance with the bifurcation for individual genes (see section 4.2 of the Computational Supplement text for details, Figure 5B). Among the highest-ranking bifurcating genes, the most common pattern was an increase in expression during progression to the Th1 fate. These genes were positively correlated with both pseudotime 7 and the Th1 trend assignment ( Figure 5B). This suggests that Tfh cells are in fact developmentally closer to the highly proliferative progenitor state than Th1 cells as the Th1 fate involves up-regulation of numerous genes not expressed in either the progenitor or Tfh states.
The highest-ranking transcription factors were Tcf7 for the Tfh fate, and Mxd1, Bhlhe40, Hopx, Pgs1 and Id2 for the Th1 fate ( Figure 5C). In addition, the hallmark Tfh transcription factor Bcl6 was strongly associated with the Tfh fate. Tcf7 is required for T cell development, and has been recently shown to be instrumental for Tfh differentiation (23,24). Notably, it represented one of the rare genes defined by a decrease in expression when moving towards the Th1 fate. Of the Th1-associated transcription factors, Mxd1 is a negative regulator of the proliferation-associated, proto-oncogene, Myc (25) and Bhlhe40 has been recently identified as a cofactor of T-bet (coded by Tbx21) (26). Id2 is known as an antagonist of Tcf7 (27) and as a regulator of effector CD8 + T cell responses. Notably, while this manuscript was under revision, the role of Id2 as a key driver of Th1 responses was independently shown by another study (28).
Our results strongly support reciprocal regulation of Id2 and Tcf7 as a key feature of the Th1/Tfh bifurcation process. Expression of Id2 and Ifng were highly correlated in the later stages of Th1 differentiation, and negatively correlated with Tcf7, both at a transcriptional and protein level ( Figure 5D-F, Figure S11). Notably, the hallmark Th1 transcription factor Tbx21 was induced before the bifurcation point, and showed only modest separation after bifurcation ( Figure S12).
To validate the robustness of these gene signatures and the timing of the bifurcation, we repeated the infection, and at days 0, 4 and 7 sequenced additional single PbTII-cells using the Smart-seq2 protocol (29) ( Figure 1A & S13A). Consistent with the original data, the cells from day 7 (but not day 4) segregated into two subpopulations correlating with Th1 and Tfh gene signatures ( Figure S13A). Subset-characteristic co-expression patterns of the bifurcating genes identified by GPfates emerged by day 7 ( Figure S13B). Notably, at this time, the cells from the different mice could be equally separated into distinct Th1-and Tfh-subpopulations using the top bifurcating genes ( Figure S13C). Taken together, this indicated that the gene expression patterns associated with the cell fate bifurcation were reproducible across experiments and sequencing platforms.
In Th1 cells, a large fraction of the bifurcating genes were cytokine and chemokine receptors, including the top-ranked gene, Cxcr6, confirmed at protein level (Fig S14A and S14B), other established Th1 markers, Ifngr1 and Il18rap (30, 31), and the chemokine receptors Ccr2 and Ccr5 ( Figure 5C). These data were consistent with the idea that Th1 cells can migrate to peripheral tissues and remain receptive to external signals. In contrast, the only bifurcating chemokine receptor associated with a Tfh fate was Cxcr5, a gene established to mediate migration of Tfh cells into B cell follicles (32, 33).
Cxcr5 was among an early wave of chemokine receptor genes, including Cxcr3 and Ccr4 ( Figure  5G) whose expression and translation into protein ( Figure S14C) was initiated before the Th1/Tfh bifurcation point had been reached. We hypothesized that differences in the timing of expression of receptors reflected their roles in controlling differentiation or effector function. We reasoned, for instance, that while Cxcr6, Ccr2 and Ccr5 served to mediate trafficking and 8 effector function of Th1 cells, others such as Cxcr3 and Cxcr5 controlled Th cell fate via interactions with other immune cells ( Figure 5H). Indeed, Cxcr5 allows T cell trafficking towards B cells (34, 35), while Cxcr3 has been associated with cDC-driven Th1 fates (8).

Myeloid cells support a Th1 but not Tfh fate
After activation and proliferation, PbTII cells reached an uncommitted state around the bifurcation and expressed chemokine receptors that indicated receptiveness to other chemokineexpressing cells. Given that B cells were essential, as expected, for supporting a Tfh fate in PbTII cells (Fig S15), we hypothesized that myeloid cells provided alternative, competing signals to promote a Th1 fate.
To study this, we performed scRNA-seq on splenic cDCs and inflammatory monocytes when activated PbTII cells were yet to bifurcate. We sorted CD8α + and CD11b + cDCs and Ly6C hi monocytes from naïve and infected mice ( Figure S16) and subjected these to single-cell analysis. PCA of cDCs firstly distinguished between the two naïve cell types, separating them along PC2 ( Figure 6A & S17) with an efficiency consistent with recent data (36), and further highlighting a number of expected and previously unknown cDC subset-specific genes (Figure S18A-C). We next compared naïve cDCs with those from infection ( Figure 6A & S16), and separated these along PC6 ( Figure 6A). Analysis of differential gene expression between cDCs from naive and infected mice identified 30 genes, 29 upregulated ( Figure 6B & S19), including interferonassociated transcription factors, Stat1 and Irf1, and CXCR3-attractant chemokine genes, Cxcl9 and Cxcl10. Notably, gene expression patterns amongst individual cDCs varied according to the gene. For example, Stat1 and Irf1 were heterogeneously expressed amongst individual naïve cDCs, and further upregulated during infection ( Figure 6C). This was similar for Cxcl9, which was expressed by CD8α + cDCs in naive mice, while Cxcl10 was induced only upon infection ( Figure 6C). These data revealed interferon-associated gene expression amongst individual cDCs, and also suggested interactions between cDCs and uncommitted CXCR3 + PbTII cells, consistent with a recent study (8). Next, PCA of Ly6C hi monocytes from naïve and infected mice distinguished them from each other along PC2 ( Figure 6D & S20). Differential gene expression analysis between naïve and infected groups uncovered ~100 genes, both up-and down-regulated during infection ( Figure 6E & S21). This illustrated a fundamental difference in the directionality of transcriptional changes in individual monocytes compared to cDCs during Plasmodium infection, with only monocytes exhibiting down-regulation of gene expression ( Figure 6B-C & E-F). Interestingly, a high proportion (~40%) of genes upregulated in cDCs were also induced in Ly6C hi monocytes, including transcription factors Stat1 and Irf1, and the chemokine Cxcl10 ( Figure 6E & F), suggesting possible overlapping biological functions between these cell types. In addition, monocyte-specific chemokines were also observed, including Cxcl2, Ccl2 and Ccl3 ( Figure 6E & F). Furthermore, specific examination of all immune cellular interaction genes ( Figure S22) revealed emerging variable expression of Tnf, Cd40, Pdl1, Ccl4, Ccl5, Cxcl16, Cxcl9, and Cxcl11 in monocytes, thus suggesting complex interactions and multiple roles for Ly6C hi monocytes during infection . 9 Given that Cxcl9-11, Ccl2, Ccl3 and Ccl5 signal through either Cxcr3 or Ccr4, which were expressed by activated but uncommitted PbTII cells, we next hypothesized that Ly6C hi monocytes, in addition to cDCs, might interact with PbTII cells, thereby influencing Th1/Tfh fate (8). To begin testing this, we first confirmed chemokine expression at protein level by Ly6C hi monocytes, focussing on CXCL9 ( Figure 6G). Kinetics of CXCL9 production was similar in cDCs and Ly6C hi monocytes, consistent with a possible role in interacting with CXCR3 + PbTII cells. To test whether monocytes could influence Th1/Tfh bifurcation in vivo, we employed LysMCre x iDTR mice, in which Ly6C hi monocytes could be depleted after PbTII cell activation, but before bifurcation ( Figure 6H, Figure S22A). We also noted a modest reduction in CD68 + macrophages using this approach, with no evidence for depletion of cDCs or marginal zone macrophages ( Figure S23). In this transgenic approach, Th1 fates, but not Tfh fates, were supported by monocytes/macrophages ( Figure 6H). Together, these data supported a model in which progression of activated, uncommitted PbTII cells towards a Tfh fate was dependent upon B cells ( Figure S14), and a Th1 fate was promoted by chemokine-expressing myeloid cells, including Ly6C hi inflammatory monocytes.

Discussion
By capturing single CD4 + T cell transcriptomes over time, and using a novel analysis approach to reconstruct the continuous course of events, we have resolved the bifurcation of naive CD4 + T cells into Th1 and Tfh cells at an unprecedented level of molecular detail, and illustrated that external cellular signals influence Th fate around the point of bifurcation. Importantly, the GPfates modelling of scRNA-seq data is not limited to immune cells or single bifurcation events. The mixture of time series model we used can also be combined with existing computational workflows (17, 37) (see section 5.2 of the Computational Supplement). Therefore, it provides the means for high-resolution analysis of differentiation in any cellular system, mainly towards two fates, as shown by our examination of existing embryonic development and lung tissue regeneration data (Comp. Supp. Figure 11), and, in principle, also for differentiation into multiple cell types (Comp. Supp. Figure 12), for example, during haematopoiesis. The filtered expression data and gaussian process models presented in this study can be found on our interactive web application at data.teichlab.org, where users can visualise their own genes of interest.
Our data reveals the developmental relationship between Th1 and Tfh cells on a genomic scale, and shows that the same naïve precursor can give rise to both fates simultaneously. It provides insights for the early stages of differentiation, and describes the order of transcriptional events before and after the bifurcation of Th1 and Tfh fates. To date, this process has remained incompletely characterised. Here, we use pseudo-temporal ordering of cells to reveal the hierarchy of transcriptional regulation of these events at an unprecedented resolution. Our data highlight the importance of stochastic expression of transcription factors as well as chemokine receptors, suggesting a role for noisy gene expression in Th development.
Transcriptomic profiling previously suggested developmental similarities between Tfh and Th1 cells (38), with in vitro studies suggesting relatively late bifurcation of Tfh and Th1 cells (39). However, highly immunogenic viral or bacterial infections induced CD4 + T cells to segregate into Bcl6 + (Tfh) or Blimp-1 + (Th1) subpopulations within two days, and by three days, fatecommitted Tfh cells had developed (40-42). In our parasitic model, single CD4 + T cell transcriptomes remained remarkably similar until four days of infection. Although it is difficult to directly compare viral or bacterial systems with our parasitic model, we speculate that due to infection-related differences in antigen-presenting cell function, antigen load and availability, Plasmodium infection in mice does not drive Th bifurcation as early as observed with highly immunogenic viruses or bacteria. Evidence of sub-optimal MHCII antigen-presenting cell function early during Plasmodium infection (43, 44) raises the hypothesis that Th bifurcation is sensitive to immune-suppression. Our data indicate that uncommitted, activated CD4 + T cells are heterogeneous, but nevertheless closely related at a transcriptional level, suggesting considerable flexibility throughout the proliferative phase of their response. Such plasticity during Th differentiation has been proposed to be beneficial as a means of countering evolution of immuneevasion strategies by pathogens (3).
As CD4 + T cells progress from immunological naivety towards a Th fate, they may experience different cellular microenvironments, even within the confines of secondary lymphoid tissue. The observation that bifurcation towards Th1 and Tfh cells was preceded by upregulation of chemokine receptors prompted us to investigate possible interactions with chemokine-expressing myeloid cells. Previous studies have highlighted the potential for cDCs in lymph nodes to produce Th1-associated chemokines (8). Our study, which focused on the spleen, was consistent with this concept, and, furthermore, implicated inflammatory monocytes in Th1 support. However, since our transgenic approach for depleting monocytes also removed a portion of splenic red pulp macrophages, we cannot discount the possibility that red pulp macrophages may partly contribute to a Th1 fate. Nevertheless, our data support a model in which myeloid cells in the spleen influence bifurcation, and support a Th1 fate during Plasmodium infection. Moreover, our studies emphasise that although cDCs are the predominant professional antigen-presenting cell for initiating CD4 + T cell activation in the spleen, other myeloid cells also exhibit a capacity to influence towards a Th1 fate. In contrast, Th bifurcation towards a Tfh fate was not supported by monocytes/macrophages. Instead, given that CXCR5 was the only chemokine receptor significantly associated with bifurcation towards a Tfh fate, cellular interaction with B cell follicles may be the primary mechanism for supporting activated CD4 + T cells towards a Tfh fate. Our model suggests that activated, uncommitted CD4 + T cells become receptive to competing chemoattractant signals from multiple cell types in different zones of the spleen. This model focuses on intercellular communication as the main driver of bifurcation. However, upstream of these processes, internal stochasticity in uncommitted CD4 + T cells may control the balance of chemokine receptor expression (45), thus mediating differential trafficking and variation in intercellular interactions. Future experiments combining our integrated single-cell genomics and computational approach with in vivo positional and trafficking data may reveal molecular relationships between internal stochasticity, migratory behaviour, Th fate and perhaps immunological memory. (D) Flow cytometry data indicate concurrent differentiation of Th1 (IFNγ + ) and Tfh (CXCR5 + ) PbTII CD4 + T cells within the spleen of PcAS-infected mice (n=4). Index expression is the product of MFI and proportion IFNγ + or CXCR5 + . These data are representative of two independent experiments. MFI, mean fluorescence intensity.
(E) PCA of single PbTII cells at 7 days post-infection with PcAS. The arrows represent the Pearson correlation with PC1 and PC2. Cell size refers to the number of detected genes. The size of the data points also represents cell size. "Th1 signature" and "Tfh signature" refer to cumulative expression of genes associated with Th1 or Tfh phenotypes (15). PC, Principal Component.
(F) Expression of top 50 genes with largest PC2 loadings of day 7 cells (D). The genes were annotated as Th1-or Tfh-associated based on public datasets (15,38,46,47). *Cdk2ap2 appears twice because two alternative genomic annotations exist. PC, Principal Component (A) Overview of analysis abilities from the framework of Gaussian Processes. Data is modelled and interpreted on the cellular level using the global genomic level data. Through downstream analysis from these models, it is possible to investigate individual genes to explain the drivers of the different models.
(B) Sketch of the analysis workflow. A low-dimensional model of the non-linear highdimensional data is inferred by Bayesian GPLVM. The low-dimensional representation is then modelled as an Overlapping Mixture of Gaussian Processes. This gives us a data-trend assignment per cell which can be used for interpretation. Since the models are all predictive, the low-dimensional model can be interpreted in the original high-dimensional space.

(C)
The low-dimensional representation of our data. The blue line depicts the progression of pseudotime. The text labels illustrate features of typical cells on that region of the pseudotime, and are provided purely as a visual aid. (A) Th1 and Tfh states were defined as cells with assignment probability of ≥0.8 for the respective trend. For each single cell, cumulative expression of Th1 and Tfh signature genes (15) was calculated as in Figure 1E.
(B) The effect of the probability threshold on the cumulative expression of Th1 and Tfh signature genes. The p-values were calculated using Wilcoxon rank sum test.   Tables 2 and 3).
(B) Identification of genes associated with the differentiation of Th1 or Tfh cells. For every gene, the correlation of its expression with pseudotime (x-axis) and Tfh trend assignment (y-axis) are shown. Statistical significance was determined using the bifurcating score (methods). Genes satisfying the significance threshold of FDR<0.002 are represented in colours according to the functional classification of the genes (methods and Supplementary Table 4). FDR, False Discovery Rate, estimated by performing the same analysis with permuted data. (C) The genes with strongest association with Th1 (left) or Tfh differentiation (right). The genes were filtered using the bifurcation score as in (B). The genes were then ranked in descending order of association with either Th1 or Tfh trend. Cdk2ap2 appears twice because two alternative genomic annotations exist.

Introduction
GPfates is based on a three-stage approach that first i) infers a low-dimensional representation of single-cell RNA-seq data, then ii) infers pseudotime to iii) model the temporal dynamics of gene expression profiles with a mixture model. These steps build on existing model-  Figure 2D of the main text (as well as Supp. Comp. Fig 1). In Sections 2 and 3 we describe the statistical models that underlie the components of GPfates. In Section 4 we describe downstream analysis methods for interpreting the fitted model. Finally, in Section 5, we present additional validation experiments using simulations, robustness analyses and by analyzing multiple existing data sets.

Gaussian Process Regression
A main component of GPfates is to model temporal transitions. We use the Gaussian process (GP) framework, thereby casting this problem as non-parametric regression. Let us begin by assuming that the developmental time t for each cell we observe is known. Then, the output y g (i.e. expression of gene g) is modelled as a continuous function of the input t (i.e. developmental progression) where is Gaussian distributed residual noise and f (t) denotes the unknown regression function. In this work y g is considered to be an N -dimensional vector of N cells with observed expression of the gene g. We denote the expression of g in an individual cell n as [y g ] n .
A GP can be interpreted as a function-valued prior on the elements of f , which is defined by a covariance function that in turn is parametrized by the input (developmental time) t: cov(f (t n1 ), f(t n2 )) = k(t n , t n2 ). Sup. Comp. Fig. 1: Illustration of the analysis workflow. A low dimensional parametrization of the data is found using Bayesian GPLVM. The low-dimensional representation is viewed as a mixture problem, and solved by an Overlapping Mixture of Gaussian Processes. This allows us to represent our cells as members of different smooth processes. But also interpret in terms of the high-dimensional space parametrized by the GPLVM.
The covariance function k(t n1 , t n2 ) encodes prior assumptions on the smoothness and lengthscales of the function f (t). The most widely used covariance function is the Squared Exponential (SE) covariance function, and this is the covariance function we will generally be used in this work. This covariance has the hyperparameters θ = (σ 2 SE , l 2 SE ), which parametrize the amplitude ( σ 2 SE ) and the lengthscale (l 2 SE ) of functions under the prior. Throughout the remainder of the text we will omit the hyperparameters from equations for the sake of brevity. Note that there is a whole compendium of valid covariance functions, which can also be combined using sum or multiplication; see [cite: Rasmussen, GP 2006] for an overview.
We write that a function f is Gaussian Process distributed by (t n1 , t n2 )).
This prior on the function f can be linked to the finite observed data using a Gaussian likelihood: Together with the prior on the corresponding (finite) elements of f , this results in the marginal likelihood p(y g |t) = N (y g |0, K t + σ 2 · I).
Here K t is an N × N matrix of pairwise evaluations of the covariance functions at the observed times t. I.e.
By considering the joint distribution of the observed data y g and an unseen function value f (t ⋆ ), it is possible to derive the predictive distribution for f (t ⋆ ): For a full review on Gaussian Processes, see Williams and Rasmussen [2006].
We use Y to compactly denote the N × G expression matrix of cells × genes, where The assumption that all genes are governed by similar functional relationships with t means we place the same GP prior (with shared covariance function): In the next section we will see the usefulness of considering multiple genes at once.

Pseudotime inference by Bayesian GPLVM with per -cell prior
The Gaussian Process regression framework described above assumes we know the time t of each cell. While in many single-cell RNA-seq experiments record a collection times over some time-course, these are rather sparse, and it has been pointed out [Trapnell et al., 2014] that cells are sampled from a population where responses are unsynchronized. Each cell has reached a certain stage in the differentiation process under investigation, which we do not observe directly. The progress in to this process is referred to as pseudotime. We can however infer this from the data. In the Gaussian Process Latent Variable Model (GPLVM) [Lawrence, 2006], we use the multiple output case of Gaussian Process regression (equation 4), but consider the values of t to be parameters which we wish to infer. The joint probability of the GPLVM is where p(Y |t) is defined in equation 5, and the prior p(t) is such that for cell n, p(t n ) = N (0, 1).
Following Reid and Wernisch [2016], we can also consider the prior p(t) to be informed about the experimental ordering of collection times of the cells, putting the mean of t n to correspond to the time point of cell n. If we use our Malaria time course as and example, we can put the prior on t so that where day n ∈ {1, 2, 3, 4, 5} correspond to the collection order of those cells. The parameter σ 2 prior alters the strength of the prior.
The objective of Bayesian GPLVM [Titsias and Lawrence, 2010], is to find the posterior probability distribution p(t|Y ) ∝ p(Y |t)p(t). This is intractable though, due to the t values appearing non-linearly in the matrix inverse [K t + σ 2 · I] −1 .
In Titsias and Lawrence [2010], a lower bound to the marginal likelihood is calculated by estimating the posterior p(t|Y ) by a variational distribution q(t). The distribution N (t n |µ n , S n ) is described in that paper, and Bayesian training of the model to maximize this lower bound. This is the method we use.
Because the scale of t is ambiguous, in particular if no priors are specified, we prefer to at times scale the inferred t to the range [0, 1] when reporting the pseudotime, to avoid confusion about negative "time". In these cases we refer to pseudotime as scaled pseudotime in the legends.

Dimensionality reduction
In many cases it is useful to work on a reduced representation of cellular expression profiles. For example, when modelling transcriptomic data, fitting a model to a low-dimensional representation can be preferable to fitting it to expression profiles of thousands of genes. Formally, the objective of dimensionality reductions is to find some M -dimensional representation of the Gdimensional expression measurements, where M << G. Typically M is 2 or 3, which aides visual interpretation. Analogous to the pseudotime inference, these latent cell states can also be inferred using the GPLVM. Say X is an M × N matrix so that each cell n correspond to an M -dimensional vector, We want to model the expression matrix Y so that Note that now the covariance function is evaluated as k(X n1 , X n2 ), where, in the Squared Exponential covariance function in equation 2, the operator |·| is evaluated as the Euclidean norm for vectors, rather than absolute value. Just as the t n values are inferred from data above, so can the X n vectors be inferred from the data.

Bifurcation inference using overlapping mixtures of Gaussian processes
In a continuous setting, a bifurcating process can be seen as one function, splitting apart into two functions over time. One approach to model this could be to consider two functions throughout time, but before the bifurcation happens, the two functions are identical. With this in mind, we can use a mixture model to tease apart the shared and bifurcated functions.

Mixture model
Mixture models are hierarchical models where an observation is assumed to be generated from one of C components, each of which is described by its own model. cells. This model was originally developed for the application of missile tracking, and in that setting an observation is e.g. a radar based location at a given real time point. As such, the main focus of the definition of the model is for the case of C completely independent components. The approach presented here is based on the realisation that the model would also be able to handle the case of branching trajectories. There would simply be a time interval where it does not matter which mixture trajectory data is sampled from. In

Original OMGP application
Bifurcating OMGP application Sup. Comp. Fig. 2: Comparison of the original OMGP use case (left) and our use case (right), in both cases where the number of trends K = 2. In the original use case trends are expected to be independent throughout time, albeit with some ambiguity in some locations. In our application, we interpret ambiguous cell assignment to be in a common precursor state.
our setting, an observation is a single cell, and the analog to real time is pseudotime (Supp. Comp. Fig. 2).
As an additional extension, we phrase a version of the OMGP model which is non-parametric in the number of trajectories.
In the original regression case described in equation 1, data is assumed to be generated by a single smooth unknown function. When modeling our gene expression data with the Overlapping Mixture of Gaussian Processes, data is considered to be generated by However, we are lacking information about which latent function f c generated any given observation (t n , X n ) of pseudotime and gene expression for the N observed cells. Here X correspond to some representation of the transcriptional state of the cells. It could be the expression of all genes (X = Y ), a single gene (X = y g ), or an Mdimensional inferred representation as discussed above. This is viewed as a mixture modelling problem, where each cell has a latent variable z i specifying to which component f c the cell should be allocated to. Write F for the collection of all latent functions. The covariance functions k c for each f c can be different from each other, though for the applications we discuss here, we take them as Squared Exponential covariance functions with different hyperparameter values.
In the OMGP formulation, the likelihood is We specify a multinomial prior on the latent variables Additionally, each of the latent functions f c has an independent Gaussian process prior: The covariance matrices K 1 t , . . . , K C t for the latent functions f 1 , . . . , f C are generated from a covariance functions k 1 (t n1 , t n2 ), . . . , k C (t n1 , t n2 ) like in equation 3. Now we rephrase this as a Dirichlet Process Gaussian Process mixture model [Hensman et al., 2015]. Let every latent function f c have an associated "stick-breaking length" v c , based on the "stick-breaking" formulation of the Dirichlet Process. Here V = [v 1 , · · · , v ∞ ] is the collection of stick-breaking lengths for constructing the Dirichlet process for the assignment. The joint distribution of the OMGP model is The value α is a parameter of the model which controls the expected concentrations of mixtures (which we in practice take as α = 1, a common default), and where Beta(·, ·) is the beta distribution. The prior distribution over the collection of Gaussian Processes is Following the stick-breaking formulation, The assignments between observations X and the latent functions F is given by a binary N × C matrix Z. The assignments to latent functions are considered as additional variational parameters. Let φ be an N × C matrix where φ nc is the approximate posterior probability of assigning the nth observation to the cth latent function. The φ parameters are inferred by collapsed variational inference as described in Hensman et al. [2012]. Overall, the likelihood of the model is (It should be noted that everything described generalizes to the case where the latent functions f c are vector valued, as long as all output dimensions of such a function share the same covariance function. In this case, probabilities factorize over output dimensions, but beyond that all calculations are the same.)

Parameter inference
In Lázaro-Gredilla et al. [2012] the latent variables Z in the parametric version of OMGP were inferred using an expectation-maximization scheme. Here we describe how we perform variational inference for the φparameters in the non-parametric version of the model.
To make the inference problem tractable, the variational distribution q(Z) is introduced with variational parameters φ, at a given truncation level C such that q(Z) = The second and third parts of L KL were derived in [Hensman2014] as For optimizing variational mixture assignment parameters we follow Hensman et al. [2012], and use natural gradient descent. For hyperparameters of the kernels, as well as the variance parameter σ 2 of the model, we perform gradient descent.
If we know ∂LKL ∂φ we can calculate the natural gradient by equation (22)

.1 Ranking genes by bifurcation
Once the OMGP model has been fit, it can be used to investigate individual genes in terms of their bifurcating trajectory.
The log-likelihood of the OMGP model depends on the covariance matrices K t = {K c t , c = 1, . . . , C}, the variational mixture parameter matrix φ, and the N observations (t, X). Let us assume that we have mixture parameters φ b which have been found to distinguish a bifurcating trend based on some X response variables. We can now keep the fitted parameters and evaluate the marginal likelihood of a model where the response variables X are replaced by gene expression values y g . We call this new model H bifurcating . We wish to find genes which fit this bifurcating model better than a model where this is no bifurcation. To this end, we make a third model H not bifurcating identical to the precious one, except we replace φ b with ambiguous assignments φ a . To asses whether a given gene g is better described by the bifurcating or the not bifurcating model, we evaluate the Bayes factor: BF g = log p(y g |H bifurcating ) − log p(y g |H not bifurcating ).
We refer to this ratio as the bifurcation statistic.
To estimate p-values, we used a permutation approach where we perform the same analysis for every gene g, except with permuted t values to estimate a null distribution.
As a proxy for effect size of bifurcation, we consider how well the expression values of a gene correlate with the trend assignments to a latent function. Strong positive correlation will mean the gene is particularly upregulated in the cells unambiguously belonging to the trend. Conversely, a strong negative correlation indicates the gene is down-regulated in the strongly assigned cells compared to all cells.

Inferring the bifurcation time point
It is possible to qualitatively appreciate from the GP assignment probability (φ c ) for each trajectory (f c (t)) of the OMGP model, which cells are ambiguous and which cells are exclusive to individual GP's. In the case of two trends, ambiguous cells have assignment probability (φ) close to 0.5. A model where the data can be described by two trends, but not by one, will have a higher likelihood. Similarly, if only a region of the φ parameters over time are replaced by ambiguous cell assignment values, the new model will have a lower likelihood.
For the sake of clarity, we make the assumption that the OMGP will begin as ambiguous, and then become less ambiguous over time, splitting into two trends, in this special case. To investigate these cases, we pick a time-point t b in an OMGP, then replace all φ values prior to t b with 0.5. We define this new φ as φ >t b : Now we can evaluate the model likelihood for this particular t b and define This procedure is repeated for multiple ts over the predictor variable of the OMGP model. In our implementation, we consider 30 evenly spaced bins by default, which has given enough resolution for the data investigated (though the number of bins can easily be changed).
The likelihood has to decrease by definition. However, after the bifurcation the decrease is much more pronounced. We use a break-point heuristic to detect this elbow, which is indicative of the bifurcation time.
To identify the region at which the likelihood decreases more rapidly, we fit a piece-wise linear curve to the log-likelihoods, defined by This curve consists of two linear pieces, broken up at the point p. When the curve is fitted, we consider the breakpoint p to be the point after which we can be confident that a bifurcation has occurred, see Supp. Comp. 5 Implementation and combination with existing workflows

Practical use of GPfates
The basis principle of GPfates is the combination of pseudotime and mixture modelling.
Sup. Comp. Fig. 3: Inferring bifurcation point. The plot illustrates how different points along the pseudotime are sampled. Ambigous assignment probabilities replaces trained assignment probabilities in the observations earler than the sampled points. The breakpoint model identifies the points where a decrease in likelihood differences becomes more extreme.
Input to the GPLVM is an expression table consisting of log scaled relative abundance values Transcripts Per Million, TPM, with a value of 1 added to handle cases where expression is 0. As relative abundance follow a log-normal distribution, the Gaussian likelihood used for Gaussian Process regression should be appropriate.
In practice, the pseudotime should represent the biological process of interest. If this process is clear, the expression data should be usable without pre-processing. In single cell time course experiments where the process of interest is less immediate, a strategy highlighted in Trapnell et al. [2014] is to select the gene set used could be to rank the genes by an ANOVA test over the time points, and select a larger number of significant genes.
Similarly, the low-dimensional representation of the transcriptomic cell state should represent the biological response of interest. It can be beneficial to select the parts of the representation which correspond to this. For example, in the analysis of CD4+ T cell time course, we use the second GPLVM latent variable as a representation of T cell response, and model this factor by the OMGP.
While the pseudotime can be inferred directly from the expression matrix Y , in many cases it helps interpretation to perform an intermediate step of dimensionality reduction. This process could also be beneficial if the data has a very complex structure.
Another practical consideration we must consider is that single cell expression vales can be quite noisy. This limits the time-scale at which we can expect to measure proper functional differences in expression levels. Due to this, we tend to put lower limits on the lengtscale l SE of the squares exponential covariance function.

Integration of existing methods
We have presented use of the GPfates method when pseudotime or low-dimensional representations have been based on the GPLVM. This is because the OMGP follows from this framework, and the statistical assumptions are consistent between the models.
In practice, other methods for inferring pseudotime or low-dimensional representations could also be considered. Here we briefly outline possible strategies for applications of GPfates downstream of popular singlecell analysis methods.
Recall that to perform the GPfates inference, we need pseudotime t and some representation of transcriptomic state X. These variables can be set as the output from other methods.
In Monocle [Trapnell et al., 2014], the low-dimensional representation X is found by independent component analysis, and the pseudotime t for each cell is defined by the path distance to a starting cell through a minimum spanning tree in the coordinates of X.
In Wanderlust, a heuristic is used to build a stable k Nearest Neighbor (kNN) graph of the data in the highdimensional space of protein measurements. The pseu-Sup. Comp. Fig. 4: OMGP is compatible with e.g. Wanderlust and Monocle, as demonstrated with a toy data set dotime t for a cell is then defined as the average shortest path from a known starting cell through the kNN graph. Note that for CyTOF data, which Wanderlust is designed for, only up to 40 analytes can be measured at once, so it could be feasible to take X to be the original expression matrix (Y in our notation).
Another dimensionality reduction technique which have been used for single cell RNA seq data is Diffusion Maps [Haghverdi et al., 2015]. Here X is a spectral embedding of the data manifold, based on the Laplace operator. It has been pointed out that these embeddings preserve branching structure in the data. Taking the pseudotime t as the Diffusion Pseudotime [Haghverdi et al., 2016], an approximation of geodesic distance over the data manifold (from a known starting cell), based on the diffusion map, GPfates modelling could be used downstream to quantify the branching structure of the data.
We list alternative compatible pseudotime methods in table 1.
As a demonstration, we generated a toy data set with three branches, and extracted the pseudotime using both the Monocle method and the Wanderlust method. Then fitted and OMGP with C = 3 on the output. The results can be seen in Comp. Supp. Fig 4, which illustrates the correct identification of the branching processes for either input.

Software availability
We have made a software package for using the GPfates method, which is available at https://github.com/ Teichlab/GPfates. It provides guidance and sensible defaults for the kind of analysis we have described here. It makes extensive use of the GPy 1 package, and the GPclust 2 package, where we implemented the nonparametric OMGP model. Sup. Comp. Fig. 5: Robustness of analysis steps by subsampling. Parameters inferred from a subsample of the data are compared to parameters inferred using the full data. The top panel indicates this analysis for independent steps assuming the previous step is known. The lower panel shows the result when running the workflow from start to end.
6 Assessment of GPfates on simulated and real data 6

.1 Sample-size robustness analysis
Our full analysis consists of several independent consecutive steps: first the GPfates method where we are i) finding a low-dimensional representation, ii) smoothing the data over a pseudotime, and iii) finding a trend mixture model. After this we perform downstream analysis where we are iv) dentifying the end states and bifurcation.
How much data do we need to accurately reconstruct trends from all four of the above steps, and how much data is needed for individual steps? We investigated both how stable the full procedure is, as well as the individual steps, by re-running it on sub-sampled datasets with fewer cells than the entire dataset.
To measure the stability of the methods, we consider absolute Pearson correlation of the parameters inferred for sub-sampled data relative to the results obtained from performing the analysis on the full data set.
We found that recovering a low-dimensional representation is extremely stable with respect to the number of cells (Supp. Comp. Fig. 5), with almost perfect correlations between analysis of the sub-sampled data and the original GPLVM values. (For example, the lowest absolute Pearson correlation for a run with 50 cells was 0.96). Similarly, the pseudotime inference is also very stable to sub-sampling.
Finding the entire OMGP mixture model over pseudotime requires a larger number of cells. We don't see Sup. Comp. Fig. 6: Complete reanalysis of our T-cell data excluding cells cellected at day 4. The bifurcation point is identified as being between Day 3 and Day 7, and is not forced in to either of the days.
any higher degrees of consistency until we reach 150 subsampled cells, with correlations around 0.5. It is rare to see single cell studies with so few cells, and in the study accompanying this text we had many more cells (408). Identifying only the end states is rather robust (but in many cases might be best analyzed as a cluster problem rather than a continuous value problem), where we start seeing a correlation of 0.9 at 150 cells.
The individual steps were in general very stable to sub-sampling, relative to the "gold standard" of using the full data set. When running the entire procedure, we see that smaller errors early on in the analysis will propagate and affect later steps.

Predicted bifurcation time is not biased by collection times
We consider the risk that the identified bifurcation point in the CD4+ T cell data potentially just reflects the time points at which we have collected data. We test the robustness of the prediction of the bifurcation as having happened at Day 4 by re-running the analysis after removing cells collected at Day 4. In this analysis we find that the bifurcation happens at some point between Day 3 and Day 7 where we don't have any observed cells. The alternate hypothesis would have been that the bifurcation would be found in either Day 3 or Day 7. This provides confidence both in the bifurcation point identification, and more generally in the meaningfulness of the low-dimensional GPLVM representation of the data (Supp. Comp. Fig. 6).

Assessment of the ability to select the number of trends in OMGP
In principle, the marginal log likelihood of the OMGP model should let us select the C number of trends which optimally explain the data. We investigated this by generating four synthetic data sets with between 1 and 4 underlying trends. For each of the data sets, we optimized OMGP models with the number of trends C varying Sup. Comp. Fig. 7: Attempts detecting number of trends with OMGP. Simulated data with expected numbers of trends where fitted with OMGP, where the C cutoff was set to a range of values. from 1 to 9 (three times per C value). We found that the marginal likelihood of the models corresponded to the correct number of trends in the cases of 3 and 4 ground truth trends, but not for the 1-trend and 2-trend synthetic data. For 1 trend, the likelihood was lowest for a larger number of trends, and for 2 trends, the likelihood was very similar for 2 and 3 trends. This suggests that the OMGP may have a tendency to overestimate the number of trends if there is a single progression. Supp. Comp. Fig. 7.
For our CD4+ T cell data, we found that the marginal likelihood continuously increased with the number C. We elected to keep the model simple and made the assumption that we could sufficiently explain the data with C = 2.
It is possible that the optimal likelihood for K is not well defined when we have trends branching off from a common trend. In the original application of the OMGP model, the assumption is that the trends will be completely independent of each other. As we are already to some extent failing to fit two models in the ambiguous case, this might cause the likelihood to reflect a poor fit. For quantitatively determining the number of trends in the data, further work is needed, probably with a model which explicitly considers branching from a common original trend. The marginal likelihood of the model is an indication, but the choice of C should also reflect the biological system under consideration.
Sup. Comp. Fig. 8: Comparison of pseudotime with and without per-cell priors. The upper left shows the fit of the pseudotime predicted in to the 2D GPLVM with and without priors. Below are the corresponding inferred covariance matrices. The right plot shows the relations between the two versions of pseudotime, clearly indicating that they have an approximate one-to-one mapping.
Sup. Comp. Fig. 9: Investigation of uncertainty of inferred pseudotimes. Left panel, since the Bayesian GPLVM fits the variance of the pseudotime for each cell, we can compare the assignments with each other. The bars correspond to 95% confidence intervals. On the right panel we see how the lengths of the confidence intervals globally decrease as the number of cells used increases.

Comparison of pseudotime inference with and without priors
For the 1-dimensional Bayesian GPLVM which we use to find the pseudotime of the data, we put priors on the cells based on their known time points. This is not strictly necessary, but helps to enhance interpretability as there is intrinsic invariance to the inferred values. If we do not use priors, qualitatively, the same trajectory is identified. Additionally, comparing the two versions of pseudotime against each other, we see that they correspond to a circular shift relative to each other. The covariance matrices inferred using either strategy have a very similar block structure (Frobenius norm ... of difference) indicating that neighbor relations are consistent. Supp. Comp. Fig. 8.

Pseudotime uncertainty
As pointed out in Campbell and Yau [2015], we can use the posterior distribution of pseudotime from the Bayesian GPLVM to assess how meaningful the order-Sup. Comp. Fig. 10: Stability of GPLVM representation, and prediction through GPLVM. Top row: Predicting cells from regions of show higher similarity with left out real cells from corresponding regions than noncorresponding regions. Bottom row: Predicting cells from unobserved regions potentially identifies antagonizing gene combinations.
ing is. By investigating the confidence intervals of the pseudotime for each cell compared to neighboring cells, we see that the ordering is quite meaningful (few cells overlap in confidence interval). (Supp. Comp. Fig. 9) We also investigated how the confidence of the pseudotime depends on the number of cells observed. As the number of observed cells increases, the distribution of variance per cell decreases towards zero. (Supp. Comp. Fig. 9) 6.6 Stability of the circular shape of the GPLVM representation We wanted to rule out the possibility that the latent variable representations of data which appear circular might be artifacts due to random noise, as suggested by Diaconis et al. [2008]. To make sure this was not the case for our CD4+ T cell data, we removed two 'slices' of cells from the circular 2D GPLVM pattern. Following this, we fitted a new GPLVM with this reduced data set. After optimizing the GPLVM, a representation was found which was again missing the same slices, Supp. Comp. Fig. 10A. (The correlation between the two representation for the common cells is very high as well, XX). This control experiment strongly suggests that the GPLVM learns the actual topology of the data.

Assessing the accuracy of imputing virtual cells
Unlike many other dimensionality reduction techniques, the GPLVM creates a model which maps into the high dimensional observed space. It is, however, not clear how meaningful this representation is. We assessed this by taking the "slice-less" model described above, and in the empty areas corresponding to the removed cells, predicting "virtual cells" (Supp. Comp. Fig 10A). Using an independent clustering technique, t-SNE [Van der Maaten and Hinton, 2008], on both the left out slices of cells and the predicted virtual cells, we find that single cell transcriptomes predicted from a given slice coincide with the real cells from the corresponding slice (Supp. Comp. Fig. 10). This indicates that GPLVM prediction in to high-dimensional spaces is not simply producing overfitted results. Following from this, we investigated the "hole" in our CD4+ T cell data. We create number of virtual cells from the hole region and compare which genes would be expressed in these cells compared to genes expressed in all cells (Supp. Comp. Fig. 10C). The underlying reasons for data being non-linear is that particular combinations of gene expression patterns do not occur together. If we find genes which are high in the virtual cells but are not observed at the same time in actual cells, this might indicate that they are incompatible with each other. This might be a good complementary tool for generating hypotheses about regulation. For instance, we identified the genes Hspe1 and Gm29216 which would be co-expressed in the hole, but are generally not co-expressed in observed cells (Supp. Comp. Fig. 10D).

Validating the BGPLVM and
OMGP approach by application to other data sets In order to further corroborate our analysis approach, we considered two recently published single cell data sets produced to investigate progression of single cells in two developmental contexts: mouse fetal lung and human fetal primordial germ cells.

Analysis of lung development data
We downloaded the data from Treutlein et al. [2014] and quantified the expression using Salmon. To smooth the data over pseudotime, we found genes that vary over the a priori known time points by a likelihood ratio test of an ANOVA model of the time points. The expression values for the top varying genes were run on a GPLVM. One of the factors of the optimized GPLVM was used as pseudotime, and the top two factors of the GPLVM were used to represent the entire data set. An OMGP was then optimized on this low-dimensional representation to identify the two trends corresponding to the AT1 and AT2 lung cell lineages without prior annotation. The bifurcation statistic for all expressed genes in these cells reconstituted many of the genes identified in a largely manual manner by Treutlein et al. [2014].

Analysis of human primordial germ cell data
The data from  was downloaded and quantified with Salmon as with the other data, but with an index based on the human transcriptome: Ensembl 78 annotationa of GRCh38, together with ERCC sequences and human specific repeats from RepBase. To smooth the time course data, we used a likelihood ratio test to find the top genes which were described linearly along the time points in the data. The expression of these genes were then used to fit a GPLVM. This lowdimensional representation of the data was then used to fit an OMGP, taking one of the latent factors as pseudotime.
In this data set, the ground truth about the sex of the cells is known, and thus we could have use a supervised approach such as GPTwoSample We applied the bifurcation statistic test to identify genes which follow the male and female development differently.
Unlike in the case of the lung development data, the majority of the genes we identify are not discussed in the original study. In the original study, the authors focused on genes specific to given conditions (e.g. Male PGC's from week 11 compared to all other cells). In our analysis, we consider the dynamics of gene expression over development. We find that in the male branch, the GAGE family is highly upregulated over development. Additionally we find the Y-linked gene ENSG00000279950. Also among the top male hits is RHOXF2, a gene linked to male reproduction [Niu et al., 2011]. Further down the list we also interestingly find PIWIL4, a gene with function in development and maintenance of germline stem cells [Sasaki et al., 2003]. On the female side, the top hit is MDK, a gene involved with fetal adrenal gland development (by similariry: UniProtKB P21741). Other top hits include MEIOB, a meiosis related gene, and the satellite repeat GSATII. Surprisingly, we also see upregulation of SPATA22, a gene associated with spermatogenesis.

Discussion
We have demonstrated our GPfates method, where we use latent variable modelling to infer temporal expression dynamics, and Gaussian process mixture modelling to identify diverging global trends. The method has been investigated in terms of robustness, and applied on several simulated and real data sets showing good results.
Of course there is no silver bullet for these sorts of problems, and it would not be surprising if other methods than the ones we have used work better for some biological systems. We have illustrated that the main component, the Gaussian process mixture modelling, is compatible with other methods in these cases.
A benefit from the methods we use is that diagnostics such as marginal likelihood can be used to aide the user with regards to the models to use. Still, the user will need to keep the biological system in mind, and be critical of results.

Infections
Plasmodium chabaudi chabaudi AS parasites were used after one in vivo passage in WT C57BL/6 mice. Mice were infected with 10 5 pRBCs i.v. and blood parasitemia was monitored by Giemsa-stained thin blood smears obtained from tail bleeds.
Flow cytometry and cell isolation Single-cell suspensions were prepared by homogenising spleens through 100 µm strainers and lysing erythrocytes using RBC lysis buffer (eBioscience). Fc receptors were blocked using anti-  Single-cell mRNA sequencing Single cell capture and processing with the Fluidigm C1 system was performed as described in (52). The cell suspension obtained from sorting was loaded onto the Fluidigm C1 platform using small-sized capture chips (5-10µm cells). 1 µl of a 1:4000 dilution of External RNA Control Consortium (ERCC) spike-ins (Ambion, Life Technologies) was included in the lysis buffer. Reverse transcription and pre-amplification of cDNA were performed using the SMARTer Ultra Low RNA kit (Clontech). For processing with the Smart-seq2 protocol (29), the cells were sorted into 96-well plates containing lysis buffer using either a MoFlo XDP (Beckman Coulter) or an Influx (Becton Dickinson) instrument. The Smart-seq2 amplification was performed as described in (29), with the lysis buffer containing Triton-X, RNase inhibitor, dNTPs, dT30 primer and ERCC spike-ins (Ambion, Life Technologies, final dilution 1:100 million). The cDNA amplification step was performed with 24 cycles. The sequencing libraries were prepared using Nextera XT DNA Sample Preparation Kit (Illumina), according to the protocol supplied by Fluidigm (PN 100-5950 B1). Libraries from up to 96 single cells were pooled and purified using AMPure XP beads (Beckman Coulter). Pooled samples were sequenced on an Illumina HiSeq 2500 instrument, using paired-end 100 or 125base pair reads.
Processing and QC of scRNA-Seq data Gene expressions were quantified from the paired end reads of the samples using Salmon (41), version 0.4.0. An example command for a one sample would be "salmon quant -i mouse_cdna_38.p3.78_repbase_ercc_index -l IU -p 4 -1 1771-026-195-H4_1.fastq -21771-026-195-H4_2.fastq -o1771-026-195-H4_salmon_out -g mouse_cdna38.78_repbase_ercc_index_gene_map.txt". The parameter libType=IU, and a transcriptome index built on Ensembl version 78 mouse cDNA sequences. We also had 30 sequences from the ERCC RNA spike-ins in the index, as well as 313 mouse specific repeat sequences from RepBase to potentially capture transcribed repeats. For quality control of the single-cell data we assessed the number of input read pairs, and the amount of mitochondrial gene content. For all cells, we considered samples with less than 100,000 reads or more than 35% mitochondrial gene content as failed. For T cells, we additionally considered cells where number of genes was less than 100 + 0.008 * (mitochondrial gene content) as failed. For the data generated using a 96-well plate-based Smart-seq2 protocol, which does not permit visual inspection of the captured cells, we additionally excluded lowquality cells from which fewer than 2000 genes were detected, motivated by negative control wells. To verify that that the cells sorted in the wells were PbTII cells, we only selected cells from which both the transgenic TCR alpha and beta chains were detected (Supplementary Tables  2 and 3). Excluded cells were removed from all further analyses, and the remainder of the samples were taken as individual single cells.
For expression values, the Transcripts Per Millions (TPM's) estimated by Salmon included ERCC spike-ins. Thus, for analysis of the cells, we removed ERCC's from the expression table and scaled the TPM's so they again summed to a million. This way we get endogenous TPM values, representing the relative abundance of a given gene within a cell. We also globally removed genes from analysis where less than three cells expressed the gene at minimum 1 TPM, unless stated otherwise.
Latent Variable Modelling of data We modelled the data using an unsupervised Bayesian Gaussian Process Latent Variable model (BGPLVM) (14) on log10 transformed TPM values (with a scaling factor 1 added). The BGPLVM was run with 5 latent variables. As we used an ARD (Automatic Relevance Determination) squared exponential covariance function, we could infer that two latent factors explained the data. All other parameters to the BayesianGPLVM model in GPy (version 1.0.9) were left as default. Upon inspection we noticed a circular pattern. This corresponds to a 1dimensional topology, which requires two dimensions for a faithful representation. Thus we inferred a new latent variable by 1-dimensional BGPLVM, with priors on the latent variable based on the cell collection times (see the Computational Supplement), where we used the 2D latent variables as input. This way we inferred smoothed "pseudotime" values for the every data point representing the progressive response to the malaria infection. In the 2-dimensional model of the data, we searched for genes that highly correlated with either of the two explanatory latent factors. Performing functional enrichment analysis using gProfiler (42) on the top genes revealed that factor 1 (which explained most of the variation) was related to cell proliferation. The second factor was largely explained by genes involved with immune response. Upon inspection, it seemed as the second latent factor terminated in two groups of cells. We investigated this in terms of a bifurcating time series.
Bifurcating time series analysis To study the cells in terms of a bifurcating time series, we implemented an Overlapping Mixture of Gaussian Processes (OMGP) model (16), see the Computational Supplement for details. The 31 model uses an optimization procedure to associate observed data with a given number of individual independent trends over a time variable. The model was run with pseudotime as input, and the immune response related latent variable as output. For the mixture model, we assumed two trends. The two trends were given squared exponential covariance functions, where we fixed the length scale to 1 based on our prior assumptions on smoothness over pseudotime. We also constrained the model variance to 0.05, which allows trends to share observed data points. Remaining hyperparameters were optimized by gradient descent. (See the Computational Supplement for details) Testing genes for bifurcated expression The output of the OMGP model is a soft assignment to each of two trends for every observed cell. The original model was fitted with the 2nd latent variable from the latent variable analysis. To find genes that significantly drive this bifurcation, we keep all parameters fixed but change the data to be individual genes expression levels and calculate the data likelihood. In order to get a null distribution to assess significance, we performed the same analysis but with randomly permuted pseudotime-values. This is described in detail in the Computational Supplement. To measure in which direction a gene is involved with the bifurcation, we used correlation between expression and trend assignment. For example, a gene's expression being strongly positively correlated with a trends assignment means it is being upregulated on that bifurcated branch.

Monocle
The Monocle analysis was performed with version 1.2.0 of the Monocle package, following the steps outlined in the original vignette (17). In brief, the analysis was performed using the sizenormalized data (TPM) including all genes expressed in ≥10 cells (11439 genes) with default parameters. The genes used for the ordering of cells were defined by carrying out a differential expression analysis of the time points using the differentialGeneTest embedded in the package. Following the original vignette, genes with q-value <0.01 were selected (7738 genes). The num_paths option was set as 2. SCUBA (https://github.com/gcyuan/SCUBA/tree/2ffa4fe5842dfe88db0207c82088bce0e5b97be7) was run using 3003 genes and provided information about time point. RNAseq_preprocess.m and SCUBA scripts were run according to instructions. SCUBA did not find any bifurcation points. Similarly, using 1000 most informative genes (SCUBA default), or scaling of the data (to aid the sensitivity), did not result in any bifurcation either. Changing the number of was done by the variable ngene_select in RNAseq_preprocess.m. All other variables were kept at default.
Annotation of cell-surface receptors, cytokines and transcription factors Genes likely to encode transcription factors, cell-surface receptors or cytokines were found by combining information from KEGG (http://www.genome.jp/kegg/), the Gene Ontology Consortium (http://geneontology.org/, PANTHER (http://www.pantherdb.org/) along with the more specific databases detailed below.     (A) Principal Component Analysis of the single cells from the replicate PcAS infection. The single cells were sorted on 96-well plates and cDNA was amplified using the Smart-seq2 protocol (29). The arrows represent the Pearson correlation with PC1 and PC2. Cell size refers to the number of detected genes. "Th1 signature" and "Tfh signature" refer to cumulative expression of genes associated with Th1 or Tfh phenotypes (15). PC, Principal Component.

(B)
The emergence of subset-specific gene patterns at day 7 of infection. For the top bifurcating genes (Fig S5C) pairwise gene-to-gene Spearman correlations were calculated. The rowside colours represent the association of the gene with either Th1 fate (red) or Tfh fate (blue).

(C)
The expression of top 20 Th1 and Tfh associated genes identified using GPfates in single PbTII cells at days 4 and 7. Cdk2ap2 appears twice because two alternative genomic annotations exist.