The decision to develop into or T cells is pre-programmed in distinct subpopulations 1 of DN 1 thymocytes 2 3

12 T cells are divided into the  and  lineages. It is currently thought that these lineages differentiate in the thymus 13 from uncommitted progenitor thymocytes. However, in this study we show that the decision to differentiate into 14 one or either lineage is in fact already fixed in these apparent uncommitted progenitors. Using single-cell RNA 15 sequencing, we reassemble de novo a model of early T cell development based on the transcriptional profiles of 16 individual CD4CD8 double negative and  thymocytes. We show that the earliest thymocyte stage, known as 17 CD4CD8 double negative 1 (DN1), is actually comprised of a mixture of transcriptionally distinct subpopulations. 18 Although not yet expressing definitive markers of  and  lineages, such as the lineage-defining T cell receptors, 19 specific DN subpopulations exhibit restricted developmental potential for either  or  lineage. Furthermore, 20 specific -primed DN1 subpopulations preferentially develop into IL-17 or IFN-producing  T cells. Thus, T 21 cell lineage decisions are hardwired from the earliest stages of T cell development. 22


23
T cells are divided into two principal lineages that are distinguished by the composition by their T cell receptors 24 (TCRs) (Carpenter & Bosselut, 2010). Those that express an  TCR include conventional CD4 + and CD8 + T 25 cells and these can be further divided into diverse subsets that have distinct immunologic functions (  Additionally, weak TCR signaling may promote an IL-17A phenotype in  T cells, while strong signaling 48 promotes an IFN phenotype (Sumaria, Grandjean, Silva-Santos, & Pennington, 2017). 49 50 However, the role of the TCR remains debatable because a lack of Notch signaling at the DN2 stage compromises 51 progression to DN3 and impairs  development but not  development (Tanigaki et al., 2004;Wolfer, Wilson, 52 Nemir, MacDonald, & Radtke, 2002). How can the stage prior to the apparent  branching be compromised and 53 yet  T cell development still proceed? Evidence suggest that this Notch-independence is at least partially due to 54 the transcription factor ID3 because a weak TCR signal in cooperation with Notch signaling is sufficient to inhibit 55 E-protein activity (Lauritsen et al., 2009). However, ID3 activity could explain the development of only a subset 56 of  T cells. This murkiness surrounding the αβ versus γδ decision has arisen partly due to a lack of markers to 57 define the development of thymocytes that have been selected to mature into γδ T cells (Pellicci, Koay, & Berzins, 58 2020). Together, these ambiguities raise the question as to whether the early stages of T cell development are 59 really bipotent homogenous populations that can give rise to both  and  T cells. Emerging evidence in other 60 developmental pathways suggests that progenitors can be heterogeneous with biases in lineage potential (Naik et 61 al., 2013). In this study, we perform single-cell RNA sequencing (scRNA-seq) of early thymocytes from mouse 62 thymus to determine cellular heterogeneity at single-cell resolution and to assemble a model of T cell development 63 de novo in order to address when the  versus  lineage decision is actually made. 64

Differences in the kinetics of  versus  T cell development from DN1 thymocytes 67
To better understand the development of the  lineage, we began by investigating the kinetics of  versus  T 68 cell development in OP9-DL1 cultures, which supports the development of hematopoietic progenitors in vitro 69 into either  T cells or  cells up to the CD4 + CD8 + double positive (DP) stage 70 2009). Total DN1 (CD44 + CD25 -Lin -) and DN2 (CD44 + CD25 + Lin -) thymocytes were sorted from C57BL/6 mice and cocultured on OP9-DL1 cells. The cultures were then analyzed after 9-23d. DP cells were detected by CD8 72 (and CD4) expression while  cells were detected by TCR expression (Supplementary Fig. 1a, b). 73 74 While DN1 and DN2 thymocytes gave rise to both lineages, surprisingly, there was a significant difference in the 75 kinetics in the development of DN1 thymocytes into  cells compared to  cells. Starting from DN1 thymocytes, 76  cells were already present at 9d, with the percentage and cell number peaking at 12d, waning and then 77 increasing again at 20d (Fig. 1a, b). In contrast,  cells only started appearing after 14d and peaked at 20d. The 78 earlier  peak suggests that either  development is either simply more rapid than  development or that there 79 may be different developmental pathways for the two lineages, with  cells potentially produced in two waves. 80

81
Unlike the discordant kinetics of the DN1 cultures, culturing DN2 thymocytes produced both  and  lineage 82 cells from 9d and in similar proportions throughout (Fig. 1a, b).  cells were produced at a consistently low 83 frequency of 2-6%, while between one-third to half the cells were  cells at each timepoint. Given that DN2 84 thymocytes are only one stage along from DN1, the consistent rate of both  and  production argues against 85  development being simply more rapid. 86

87
If the  and  lineages develop independently, we might expect to see each lineage being derived from distinct 88 DN1 thymocytes. To investigate this, we tagged individual DN1 thymocytes with unique genetic heritable 89 barcodes (Naik et al., 2013) and tracked the propagation of these barcodes in OP9-DL1 cultures. If an  cell and 90  cell inherits the same barcode sequences, it means that they were derived from the same progenitor. CD8 + 91 and TCR + cells were sorted after 14d and 20d of culture and analyzed for barcode composition (Supplementary 92 Fig. 2a). While there was some overlap in barcode usage between the four populations, the barcode profiles were 93 largely non-overlapping, with Pearson's correlations of r<0.6 for the first experiment (Fig. 1d, e) and even lower 94 in the second experiment ( Supplementary Fig. 2c, d). Together with the kinetic analysis, this suggest that the DN1 thymocytes may in fact be a heterogenous mixture consisting of distinct subpopulations that developed into either 96  cells or  cells. (CD44 + CD25 -Lin -) and DN2 (CD44 + CD25 + Lin -) thymocytes were sorted from the thymus of 01 C57BL/6 mice then analyzed for  versus  lineage differentiation in OP9-DL1 cultures. The 02 cultures were analyzed at the indicated time points by flow cytometry.  lineage cells were identified 03 as CD8 + while  lineage cells were identified as TCR + . CD4 expression was also analyzed but 04 was concomitant with CD8 expression (not shown). Lin = CD4, CD8, B220, CD11b and CD11c.  To better understand the potential heterogeneity of the early stages of T cell development, we next characterized 19 the transcriptional landscape of DN and  thymocytes at single-cell resolution. Cells were sorted from the thymus 20 of C57BL/6 mice for analysis by 10X scRNA-seq over three independent runs (Fig. 2a, b and Supplementary 21 Table 1). The first run consisted of total DN and TCR + cells, the second consisted only of DN1 and DN2 cells 22 and the third involved sorting DN1+2, DN3 and TCR + cells separately and mixing back together post-sort at a 23 ratio of 55% to 30% to 15%, respectively ( Fig. 2a and   strategies used to sort DN and gd thymocytes from C57BL/6 mice for 10X scRNA-seq. Three 29 separate runs were completed. The first run was total DN and TCR + cells. The second run consisted 30 only of DN1 and DN2 cells. The third run involved sorting DN1 plus DN2, DN3 and TCR + cells 31 separately, which were then mixed back together post-sort at a ratio of 55% to 30% to 15% 32 respectively. Dump = CD4, CD8, B220, CD11b, CD11c, TCR and TCR. (b) Following processing 33 of the 10X data on CellRanger, each dataset was analyzed for clustering based on the first 12 principal 34 components in Seurat. Shown is the t-SNE plot of each run color-coded to DN developmental stage,  were first integrated to assemble a global view of early T cell development. To recover biological distinction from 43 these different replicates and minimize batch-associated variability, the pooled data was normalized using 44 SCTransform. Following dimensional reduction, unsupervised clustering was performed using the first 12 45 principal components (PC). This identified 30 distinct clusters (Fig. 2c), which were assigned to a DN stage or 46 mature  thymocytes based on the expression of canonical markers ( Supplementary Fig. 1a, b). High Cd44 and 47 The trajectory analyses, as well as the OP9-DL1 kinetic studies, predicted that  T cell development might be 96 occurring in parallel with  development, rather than branching off at the later DN2b>DN3 point. To better 97 delineate the earlier stages of the developmental model, we analyzed the second 10X run that was performed 98 specifically on DN1 and DN2 thymocytes. Unsupervised clustering of the 8,851 cells that pass quality control checks 99 identified 26 distinct subpopulations (Fig. 4a). Analysis for marker genes identified eight DN1, five DN2a and 10 00 DN2b subpopulations (Fig. 4a). There were also three small clusters of non-T cells consisting of doublets and B 01 cells that, for simplicity, were excluded from downstream analysis. DN1s expressed high levels of progenitor 02 markers, including Hhex, Il7r and Cd44 but low levels of T-commitment markers, such as Il2ra, Tcf7, Notch1, 03 Cd24a and Myb ( Supplementary Fig. 5a, b). Late T-lineage commitment genes, including Rag1, Rag2, Notch3 04 and Ptcra, were used to separate DN2a and DN2b subpopulations ( Supplementary Fig. 5a, b). While heterogeneity 05 in DN1 thymocytes was reported previously (Porritt et al., 2004), the degree of heterogeneity among both DN1 06 and DN2 thymocytes was far more extensive than expected, which could be clearly seen by the differential 07 expression of many marker genes ( Supplementary Fig. 5b). 08 09

Identification of novel cell surface markers for delineating DN1 and DN2 subpopulations 10
In order to study these novel subpopulations of DN1 and DN2 thymocytes, we needed to identify useful cell 11 surface markers that could be used for flow cytometry. Differential expression analysis was performed to select 12 features (p<0.05 and 2-fold difference) that were cross-referenced to GO terms for cell surface proteins 13 (Supplementary Fig. 6a, b). We then tested commercial antibodies against these candidates. While not all 14 antibodies produced a staining pattern consistent with mRNA expression levels, we were able to identify a 15 minimal panel to delineate both DN1 and DN2 thymocytes. 16 The combination of five markers encoded by the genes Cd24a, Kit, Bst2, Ly6a and Klrk1 distinguished the eight 39 DN1 subpopulations (Fig. 4b). CD24 and c-Kit was previously shown to separate DN1 thymocytes into five 40 subpopulations, known as DN1a, DN1b, DN1c, DN1d and DN1e (Porritt et al., 2004). DN1a to DN1d each 41 corresponded to a single cluster (Fig. 4a). Bst2, Ly6a and Klrk1 were then used to delineate DN1e-1 to DN1e-4 42 (Fig. 4c). 43 44 The pair of DN2a clusters 7/15 could not be separated by cell surface markers as well as the pairs of DN2b clusters 45 4/23, 9/13 and 8/19 due to very similar gene expression profiles. Thus, these cluster pairs were combined. The 46 resulting 11 DN2 clusters were annotated as DN2a-1 to 4 and DN2b-1 to 7 based on the expression of genes 47 encoded by Thy1, Kit, Cd53, Ly6d and Cd3e (Fig. 4d, e). The heterogeneity of the DN1 thymocytes could potentially represent different progenitors of  and  lineages, 51 which would explain the early branch point inferred by pseudotime trajectory analyses (Fig. 3a-c). To test this, 52 we sorted the DN1 and DN2 subpopulations using our antibody panels and accessed their  versus  lineage 53 outcomes in OP9-DL1 cultures after 14d and 20d ( Fig. 5a-f).  Table 5). There was also a dramatic reduction in cell numbers in the DN2b cultures, which may 67 be a result of the cells undergoing cell death after reaching the DP stage (Fig. 5e). 68   In our initial clustering analyses of the DN and  thymocyte scRNA-seq data,  thymocytes were found to 92 segregate into two distinct but closely related clusters (Fig. 2c). We therefore wanted to determine how these 93 related to the different DN1 subpopulations. Firstly, differential expression analysis revealed substantial 94 differences between two  clusters, with 93 genes expressed at significantly higher levels by the -1 cluster 95 while 117 genes were expressed at significantly higher levels by -2 cluster (Fig. 6a). Genes that were highly 96 expressed by the -1 cluster included Gzma, Blk, Maf, Sox13, Etv5, Gata3, Ccr9, Rorc, Sox4, Tcf12, Lgals9, 97 Cmak4 and Bcl11b, which are all associated with IL-17A-producing  T cell subset (Sagar et al., 2020). This 98 suggests that the -1 cluster is likely to be  thymocytes that go on to mature into the IL-17-producing subset, while the -2 cluster are those that are likely to mature into the IFN producing  T cell subset. This is consistent 00 with a previous study that suggested the eventual effector function of  T cells, may already be acquired in the DN1c and all the DN1e subpopulations overlapped significantly with -2 thymocytes (Fig. 6b). 17 However, there were also clearly differences between the DN1e subpopulations. 18

19
We then focused on transcription factors because they are regulators of gene expression and could potentially play 20 key roles in the hardwiring  effector outcomes in DN1 thymocytes. Sox13 was highly expressed by DN1d cells 21 (Fig. 6b), which was previously shown to be important for the differentiation of a subset of DN1 thymocytes into 22 was highly expressed by both DN1e-1 and DN1d cells (Fig. 6b). We thus predict that  effector outcomes could 25 be pre-wired from the DN1 stage, with DN1d going on to develop into IL-17-producing cells and DN1e potentially 26 producing both  effector subsets. 27 28

Hardwiring of  subset outcomes in DN1 subpopulations. 29
To determine if different DN1 subpopulations give rise to different  T cell subsets, the TCR + cells that 30 developed from the different DN1 subpopulations in OP9-DL1 co-cultures were analyzed for effector phenotype 31 by staining for intracellular IL-17A and IFN expression and for expression of specific V chains (Fig. 7a). 32 Cytokine production is highly associated with specific V chain usage, with V1 + cells enriched for IFN and 33  DN1c thymocytes generated both V1 + and V2 + cells, with only a low percentage of IFN expression by the 36 V1 + cells (Fig. 7b-d and Supplementary Table 6). DN1d primarily produced V2 + cells that secreted IL-17A.
Similarly, DN1e-1 and DN1e-2 were restricted to a V2 + IL-17A + . On the other hand, DN1e-3 and DN1e-4 38 exhibited plasticity and could generate both IL-17A and IFN producing cells (Fig. 7b-d Table  39 6). This strongly suggests that not only is the  versus  lineage decision largely determined by the DN1 stage 40 in T cell development, but even  effector outcomes are already hardwired. The concept that the early stages of T cell development may consist of distinct subpopulation has previously been 68 considered. It was shown that DN1 thymocytes can be subdivided into five subpopulations (DN1a-e) based on 69 While it is clear that that the majority of DN1 thymocytes exhibited a restricted lineage outcome, the fact that 05

and Supplementary
DN1e-3 and DN1e-4 can differentiate into both IL-17A and IFN-producing cells, and that DN1c and DN2a still 06 displayed some  potential in addition to their dominant  outcome, means that we cannot entirely rule out that 07 at least some decisions could be made at the DN2b>3 stage. While largely hardwired, there remains some lineage 08 flexibility and there must be other pathways that can contribute to the  versus  lineage decision. TCR 09 signaling may have a role in these remaining undecided precursors. Moreover, appropriate TCR signaling may 10 still be required for maturation of early progenitors even if they are already hardwired to develop into either  11 or  cells. an IL-17A effector phenotype, while strong signaling promotes IFN phenotype . However, 20 it is still unclear whether TCR signal strength is dependent on instructive extrinsic signals or simply a result of 21 stochastic selection of TCR chains. The instructive model proposes that TCR signal competes with pre-TCR 22 signal and the lineage decision is determined by cell specific interactions that activate key transcription factors, 23 which in turn instructs gene expression program (Carpenter & Bosselut, 2010). In contrast, the stochastic model 24 suggests that there is random rearrangement of TCR genes, where thymocytes that successfully assemble a  25 chain progress along the  lineage, while those that assemble specific V chains differentiate into a specific 26 subset of  T cells following appropriate TCR signaling. It is also possible that there may a component of both 27 instruction and random selection to these decisions. Further analysis of whether TCR signaling occurs in a 28 deterministic manner or in a random manner will be important to provide mechanistic insight into lineage 29 commitment and subsequent  effector differentiation of those DN1c and DN2a. Moreover, as mentioned, 30 appropriate TCR signaling may still be important for maturation of the -restricted DN1 subpopulations, even if 31 lineage outcome has already been determined. In conclusion, we have made the novel discovery that the  versus  T lineage decision, as well as the effector 45 outcomes of  T cells, are already hardwired in DN1 thymocytes. Such hardwiring in apparently uncommitted 46 progenitor and prior to the apparent lineage decision, at DN2b>3 in the case of T cell development, could 47 potentially be a common feature of many cellular developmental pathways. It will thus be of interest to perform 48 a similar analysis of other immune developmental pathways, including other lymphocytes, in the future. 49 50

Mice and thymocyte preparations 52
Thymuses was harvested from C57BL/6 mice at around 6 weeks of age. All experiments were approved by the St Sigma) to deplete endogenous thymocytes. The depleted thymic lobes were then transferred onto new sponges 88 soaked in fresh media with supplements but without dGuo for 2d before repopulation with thymocyte progenitors. 89 To repopulate thymic lobes, they were placed in 20mL hanging drop cultures on Terasaki plates (Sigma-Aldrich) 90 containing 5×10 2 to 2×10 3 sorted thymocytes for 24h before returning to fresh sponges. The media was refreshed 91 every 3-4d. Single cell suspensions of the thymic lobes were generated by passing through a 70mm sieve for 92 analysis by flow cytometry. 93 Sorted total DN1 thymocytes were pre-cultured on OP9-DL1 for 24h. The cells were then transduced with barcode 96 lentivirus library (Naik et al., 2013) in StemSpan medium (Stem Cell Technologies) by centrifugation at 900 ×g 97 for 1.5h at room temp. A viral titer pre-determined to give 10% transduction efficiency was used to ensure that 98 the cells are not transduced with multiple barcodes. 2ng/mL murine IL-7 and 5ng/mL human FLT3L was then 99 added to each well and the cells were returned to the incubator. The following day, fresh MEM with supplements 00 was added.  (CD90.2 + CD8 + TCR -) and  (CD90.2 + CD8 -TCR + ) lineage cells were sorted after 14d 01 and 20d OP9-DL1 co-culture. 02 03

Barcode amplification and sequencing 04
Barcode library construction was performed as described previously (Naik et al., 2013). The cells were lysed in 05 0.5mg/ml Proteinase K (Invitrogen) in Direct PCR Lysis Buffer (Viagen) at 55 C for 2h. The Proteinase K was 06 then inactivated at 85C for 30min and 95 C for 5min. The lysate was split into 2 wells for technical replicate 07 PCRs. A first round of PCR was performed using 1× Standard-Taq magnesium free reaction buffer pack (NEB) 08   (Robinson, McCarthy, & Smyth, 2010). The barcode counts were then processed in the following 23 steps: 1) barcodes with 0 read counts in all sample/cell types were excluded from the analysis; 2) Barcodes that 24 were detected in the water control were also removed from the analysis; 3) The read count between technical 25 replicates of the same sample was averaged and the total read count in each sample was then normalized to 100%. based on their corresponding t-SNE coordinates. Heatmaps were plotted to visualize lineage output of barcodes 30 that were identified by transforming the normalized reads value for each barcode using logarithmic transformation. 31 Two-tailed Pearson's correlation analysis with 95% confidence interval was calculated to determine the 32 correlation between each lineage outcomes using Prism v9 (GraphPad Murrow, & Gartner, 2019). The expected number of doublets was calculated as 0.75% of every 1,000 targeted 52 cells of recovery. The remaining cells were re-clustered and were visualized with t-SNE or uniform manifold 53 approximation and projection (UMAP) dimensional reduction with a resolution of 2.0 (Table S1). Differential 54 expression between subclusters was carried out using FindAllMarkers function, with default parameters; 55 differentially expressed genes with adjusted p-value <0.05 and fold change >0.5 or <-0.5 (log2FC) were 56 considered unless otherwise stated. The heatmaps was generated using function DoHeatmap. Hierarchical