Distinct subpopulations of DN1 thymocytes exhibit preferential γδ T lineage potential

The αβ and γδ T cell lineages both differentiate in the thymus from common uncommitted progenitors. The earliest stage of T cell development is known as CD4-CD8- double negative 1 (DN1), which has previously been shown to be a heterogenous mixture of cells. Of these, only the CD117+ fraction has been proposed to be true T cell progenitors that progress to the DN2 and DN3 thymocyte stages, at which point the development of the αβ and γδ T cell lineages diverge. However, recently, it has been shown that at least some γδ T cells may be derived from a subset of CD117- DN thymocytes. Along with other ambiguities, this suggests that T cell development may not be as straightforward as previously thought. To better understand early T cell development, particularly the heterogeneity of DN1 thymocytes, we performed a single cell RNA sequence (scRNAseq) of mouse DN and γδ thymocytes and show that the various DN stages indeed comprise a transcriptionally diverse subpopulations of cells. We also show that multiple subpopulations of DN1 thymocytes exhibit preferential development towards the γδ lineage. Furthermore, specific γδ-primed DN1 subpopulations preferentially develop into IL-17 or IFNγ-producing γδ T cells. We show that DN1 subpopulations that only give rise to IL-17-producing γδ T cells already express many of the transcription factors associated with type 17 immune cell responses, while the DN1 subpopulations that can give rise to IFNγ-producing γδ T cell already express transcription factors associated with type 1 immune cell responses.


Introduction
The ab and gd T cell lineages both arise from common progenitors that seed the thymus from the bone marrow (1,2). In mice, the earliest stages of T cell development are termed doublenegative (DN) because they lack CD4 and CD8 expression. These are further subdivided into DN1 (CD44 + CD25 -), DN2 (CD44 + CD25 + ), DN3 (CD44 -CD25 + ), and DN4 (CD44 -CD25 -) stages. DN1 thymocytes are known to be a heterogenous mixture of cells, based on the expression of cell surface markers such as CD24 and CD117 (c-Kit), proliferative capacity, and expression of early T lineage genes (3). Of these, the CD117 + fraction, which is referred to as "early thymic progenitors" (ETPs), is able to differentiate into DN2 stage cells (3) and appear to be the most efficient at generating CD4 + CD8 + double-positive (DP) ab lineage thymocytes (4). These ETPs are derived from bone marrow progenitors (5), and they still have the capacity to differentiate into NK cells (3, 6, 7) suggesting that they remain uncommitted to the T cell lineage. These cells express transcriptional regulators that are associated with stemness as well early regulators of T cell identity (8). ETPs can be further divided into CD24and CD24 lo subpopulations, termed DN1a and DN1b cells, respectively, which are thought to have a precursor-progeny relationship (3). Interestingly, within these ETPs, there is also a CD63 + Ly6c + subpopulation that appears to be a granulocyte-committed precursor with no T cell potential (8). Thus, even within ETPs, there is significant heterogeneity.
The gd lineage has been proposed to bifurcate from the main developmental pathway at the DN2>DN3 transition, when Tcrb/g/d gene rearrangement occurs (9)(10)(11)(12), because clonal assays of ETPs and DN2 thymocytes show that a large proportion at these cells can give rise to both lineages, but this biopotency is lost by the DN3 stage (10). As thymocytes do not express cell surface TCR until late DN3, it suggests that the ab vs. gd commitment occurs independently of a TCR signal. There remains, however, some debate because strong TCR signals can divert TCRb-pTa (pre-TCR)-expressing DN3 thymocytes towards the gd lineage (13), suggesting that there remains some plasticity at the DN3 stage. Subsequently, another study showed that when DN3 thymocytes express both a pre-TCR and gdTCR, the pre-TCR contributes to generating a strong TCR signal that drives gd differentiation (14). Thus, rather than revealing plasticity, strong TCR signaling at the DN3 stage may in fact be re-enforcing the differentiation of gdcommitted cells.
Thus, the current favored model of early T cell development is that the ETP subpopulation of DN1 thymocytes is the most immature population in the thymus, which progress to the DN2 stage, at which point commitment toward either the ab and gd lineages is initiated. Completion of lineage commitment then occurs at the DN3 stage. The DN3 stage is also when cells are selected for expression of a functional TCRb in complex with pre-Ta, which is known as b-selection, or for expression of a complete gdTCR dimer (15).
However, in vitro clonal assays have shown that DN3 thymocytes can give rise to a far greater frequency of gd cells than when starting from DN2 thymocytes (10). This is inconsistent with a simple linear progression model, although this discrepancy could reflect differing survival or plating efficiencies. Moreover, a significant oversight of this model is that it ignores the large number of CD117 -/lo DN1 thymocytes that exist. These remaining CD117 -DN1 thymocytes can be separated into CD117 lo CD24 hi (DN1c), CD117 -CD24 hi (DN1d), and CD117 -CD24 -(DN1e) subpopulations. However, they were not previously considered part of the T cell developmental pathway because they do not expand as much as the DN1a and DN1b subpopulations when placed in culture (3). Moreover, they appear to differentiate faster than DN2 thymocytes when cultured on OP9-DL1 monolayers, whereas DN1a and DN2b thymocytes progress with kinetics consistent of being developmentally earlier than DN2 (3).
Potentially, these CD117 -/lo DN1 thymocytes may primarily be the progenitors of non-T lineages as it has been shown that dendritic cells can differentiate from these cells as well as from ETPs (16,17). However, it was recently shown that a subset of IL-17-producing gd T cells are in fact derived from Sox13-expressing DN1d thymocytes, not from ETPs (18), and therefore not via the canonical ETP>DN2>DN3 pathway.
This finding that IL-17-producing gd T cells can differentiate from DN1d thymocytes also points to another feature of gd T development that differs from the development of most ab T cells, that effector outcomes are determined in the thymus rather than in the periphery. First, the expression of IL-17A or IFNg by mature gd T cells correlates with Vg chain usage. Vg2 + gd T cells tend to produce IL-17A while Vg1.1 + gd T cells tend to produce IFNg (19,20). Additionally, weak TCR signaling may promote an IL-17A phenotype in gd T cells, whereas strong signaling promotes an IFNg phenotype (21).
While there is little controversy to ab T cell development progressing via the DN2 and DN3 stages and that at least some gd cells bifurcate at the DN3 stage, the ambiguities described suggest that the ab vs. gd decision lineage decision is not as strict as the prevailing model and that there is still much to be clarified at these early stages of T cell development. Notably, CD117 -/lo DN1 thymocytes cannot be simply omitted from a model of T cell development as highlighted by the finding that at least some IL-17-producing gd T cells develop from DN1d thymocytes. To better characterize the earliest stage of T cell development, particularly the composition of the DN1 population, we performed single-cell RNA sequencing (scRNAseq) of mouse DN and gd thymocytes to determine the transcriptional heterogeneity at singlecell resolution. By delineating transcriptionally distinct subpopulations and assessing their lineage potential, we better clarify the composition of the DN populations, particularly of DN1 thymocytes.

Mice and thymocyte preparations
Thymuses were harvested from C57BL/6 mice at 6-7 weeks of age. All experiments were approved by the St Vincent's Hospital Animal Ethics Committee and performed under the Australian code for the care and use of animals for scientific purposes. Total thymocytes were obtained by crushing the thymus through a metal sieve to generate a single cell suspension. The cells were washed with PBS and filtered through a 70 µm sieve to remove any clumps. CD4 + -and CD8 + -expressing thymocytes were depleted using anti-CD4 and CD8 magnetic-activated cell-sorting (MACS) beads (Miltenyi Biotec), according to the manufacturer's instructions. The depleted thymocyte preparation was stained with surface antibodies for sorting on a FACSAria (BD Biosciences).

Flow cytometry
For the analysis of cell surface phenotype, cells were simply stained with antibodies. For the analysis of intracellular cytokine expression, cells were first restimulated in vitro with 50 ng/mL PMA + 2 µg/mL ionomycin (both Sigma-Aldrich) in the presence of Monensin (BD Biosciences) at 37°C for 2.5 h before staining with cell surface antibodies. The cells were then fixed with the Intracellular Fixation and Permeabilization buffer set (eBioscience) and stained with antibodies against cytokines. All antibodies were purchased from eBiosciences except for the anti-CD53 and TCR Vg1.1 antibodies, which were purchased from BD Biosciences. The full list of antibodies can be found in Supplementary Table 1. Flow cytometry data were acquired on LSR Fortessa III (BD Biosciences) and analyzed with the FlowJo v10.7.0 (Treestar) software. When only analyzing cell surface phenotype, dead cells were excluded using DAPI. For intercellular cytokine analyses, the cells were not stained with DAPI but were gated on live cells determined by size.
The Illumina sequencing output was pre-processed with Seurat (v2.3 or v.3.2.2) on R (v3.6.3 and 4.1.0). Cells with <500 genes, >5000 genes, or >7% mitochondria gene expression were filtered out as low-quality cells. Following normalization and removal of confounders, highly variable genes were identified and selected using the VST selection method (23). Unsupervised linear principal component analysis (PCA) was performed on these highly variable genes to group them into 20 principal components. Cell clustering was implemented using the number of components that retain >90% of variance of gene expression in the data.
DoubletFinder (v2.0.3) was applied to remove likely sequencing doublets before downstream analyses (24). The expected number of doublets was calculated as 0.75% of recovered cells. The remaining cells were re-clustered and visualized with t-distributed stochastic neighbor embedding (t-SNE) or uniform manifold approximation and projection (UMAP) dimensional reduction. Differential expression between subclusters was carried out using the FindAllMarkers function, with default parameters; differentially expressed genes with adjusted p-value <0.05 and fold change >0.5 or <-0.5 (log2FC) were considered unless otherwise stated.
Cell cycle genes specific to the G1, S, or G2/M stages were used to perform cell cycle scoring and assign cells to their respective stage of the cell cycle (25). Cell cycle genes were regressed out using Seurat's built-in regression model.

Merging of multiple scRNAseq datasets
To compare cell types and proportions across three independent sequencing runs, the datasets were integrated as described at https://satijalab.org/seurat/archive/v3.0/ integration.html (26). The Seurat package (v.3.2.2) was used to assemble multiple distinct scRNAseq datasets into an integrated dataset and cell cycling scores were calculated. To remove technical variability, the datasets were pre-processed and normalized using SCTransform (27). To correct for experimental batch effect, integration anchors were identified between the experiments then merged using canonical correlation analysis. Linear dimensional reduction was applied and principal components that retain >90% of variance of gene expression in the data were included for downstream analysis. Unsupervised clustering was implemented on the integrated data.

Pseudotime trajectory construction
Filtered 10x data was imported into Monocle 2 by generating a cell dataset from the raw counts slot of the Seurat object. Cells were ordered into a branch pseudotime trajectory according to the procedure recommended in the Monocle 2 documentation (28). The highly variable genes identified by Seurat were chosen as ordering genes to recover pseudospatial trajectories using the setOrderingFilter, reduceDimension, and orderCells functions in Monocle 2 with default parameters. Differential expression between pseudotime states was determined using the Seurat function FindAllMarkers.
Slingshot (29) was also employed to infer developmental trajectories using the umap, clusterLabels = seurat_clusters, and start.clus = X functions, with DN1 cells fixed as the starting point.

Cellular barcoding
Sorted total DN1 thymocytes were pre-cultured on OP9-DL1 for 24 h. The cells were then transduced with barcode lentivirus library (31) in StemSpan medium (Stem Cell Technologies) by centrifugation at 900×g for 1.5 h at room temperature. A viral titer pre-determined to give 5-10% transduction efficiency was used to ensure that the cells were not transduced with multiple barcodes; 2 ng/mL murine IL-7 and 5 ng/mL human FLT3L were then added to each well and the cells were returned to the incubator. The following day, fresh aMEM with supplements was added. ab (CD90.2 + CD8a + TCRgd -) and gd (CD90.2 + CD8a -TCRgd + ) lineage cells were sorted after 14 d and 20 d of the OP9-DL1 co-culture.
Barcode library construction was performed as described previously (31). The cells were lysed in 0.5 mg/ml Proteinase K (Invitrogen) in Direct PCR Lysis Buffer (Viagen) at 55°C for 2 h. The Proteinase K was then inactivated at 85°C for 30 min and 95°C for 5 min. The lysate was split into two wells for technical replicate PCRs. A first round of PCR was performed using 1× Standard-Taq magnesium-free reaction buffer pack (NEB) with 2 mM MgCl2 (NEB), 0.2 mM dNTPs (made in house), 0.5 mM TopLiB forward primer (TGCTGCCGTCAACTAGAACA), and 0.5 mM of BotLiB reverse primer (GATCTCGAATCAGGCGCTTA) for 32 cycles (1 cycle at 95°C for 5 min, 30 cycles at 95°C for 15 s, 57.2°C for 15 s, and 72°C for 15 s followed by 1 cycle at 72°C for 10 min). A second round of PCR was then performed to add different Illumina index to each sample by amplifying the first-round PCR product with a sample specific Illumina forward index primer and a common Illumina reverse index primer for 32 cycles (1 cycle at 95°C for 5 min, 30 cycles at 95°C for 15 s, 57.2°C for 15 s, and 72°C for 15 s followed by 1 cycle at 72°C for 10 min). An aliquot of the PCR product was run on 2% agarose gel to check for barcode amplification, then the samples were pooled, and the DNA was cleaned using NucleoMag SPRI beads (Machery-Nagel).
The 75-cycle sequencing runs were performed on a NextSeq instrument (Illumina). The data was demultiplexed and aligned to the reference barcode library using the processAmplicons function from the edgeR package (32). The barcode counts were then processed in the following steps: 1) barcodes with less than two read counts in all sample/cell types were excluded from the analysis; 2) barcodes that were detected in the water control were also removed from the analysis; 3) the read count between technical replicates of the same sample was averaged and the total read count in each sample was then normalized to 100%. The normalized barcode profiles were checked for any biases by t-SNE clustering. DBSCAN (version 1.1-8) was then used to classify barcodes based on their corresponding t-SNE coordinates.

Fetal thymic organ culture (FTOC)
Fetal thymic lobes were isolated from embryos at gestational Day 14.5 following timed pregnancies of C57BL/6 female mice. They were cultured for 5 days on 0.8-mm isopore membranes (Millipore) atop surgical gelfoam sponge (Ferrosan Medical Devices) soaked in RPMI-1640 (Sigma) supplemented with 10% FCS (Bovogen Biologicals), 20 mM HEPES (Sigma-Aldrich), 50 mM 2-mercaptoethanol (Sigma), and 1.35 mM 2′-deoxygyanosine (dGuo, Sigma) to deplete endogenous thymocytes. The depleted thymic lobes were then transferred onto new sponges soaked in fresh media with supplements but without dGuo for 2 days before repopulation with thymocyte progenitors. To repopulate thymic lobes, they were placed in 20 mL hanging drop cultures on Terasaki plates (Sigma-Aldrich) containing 5×10 2 to 2×10 3 sorted thymocytes for 24 h before returning to fresh sponges. The media were refreshed every 3-4 days. After 14 days, single cell suspensions of the thymic lobes were generated by passing through a 70-mm sieve for analysis by flow cytometry.

Data availability
All scRNAseq datasets have been deposited in the NCBI Gene Expression Omnibus repository under accession number GSE188913.

A comprehensive transcriptional map of early T cell development at single-cell resolution
To better characterize the heterogeneity of the early stages of T cell development, we analyzed the transcriptional landscape of DN and gd thymocytes at single-cell resolution. Cells were sorted from the thymus of C57BL/6 mice for analysis by 10x scRNAseq over three independent runs (Figures 1A, B; Table 1; Supplementary Figure 1A). The first run consisted of total DN (defined as CD4 -CD8 -B220 -CD11b -CD11c -NK1.1 -TCRb -) and TCRgd + thymocytes, the second consisted only of DN1 and DN2 (CD4 -CD8 -CD44 + B220 -CD11b -CD11c -NK1.1 -TCRb -TCRgd -) cells, and the third involved sorting DN1+DN2, DN3 and TCRgd + cells separately and mixing back together post-sort at a ratio of 55% to 30% to 15%, respectively. The latter two runs were performed to ensure that a sufficient number of DN1 and DN2 cells were captured for high-resolution analysis that is not possible by analyzing total DN cells.
A total of 22,094 high-quality cells passed quality control checks across the three datasets. The datasets were first integrated by anchoring common cells (26) in order to assemble a global view of early T cell development. To recover biological distinction from the different replicates and minimize batch-associated variability, the pooled data was normalized using SCTransform. Following dimensional reduction, unsupervised clustering was performed using the first 12 PCs. Next, we performed clustering of the cells at different resolutions starting from 0.5 up to 3.0. We selected a minimum value that was sufficient to separate pre-and post-T lineage commitment cells (DN2a vs. DN2b) and pre-and post-b- Single-cell RNA sequencing analysis of early T cell development. (A) Shown are the gating strategies used to sort DN and gd thymocytes from C57BL/6 mice for 10x scRNAseq. Three separate runs were completed: 1st = total DN and TCRgd + cells; 2 nd = Only DN1 and DN2 cells; 3 rd = DN1 plus DN2, DN3 and TCRgd + cells were sorted separately, and then mixed back together post-sort at a ratio of 55% to 30% to 15% respectively. Dump = CD4, CD8, B220, CD11b, CD11c, NK1.1 and TCRb. (B) Following processing of the 10x data on CellRanger, each dataset was analyzed for clustering based on the first 12 principal components in Seurat. Shown is the UMAP visualization of each run color-coded to DN developmental stage, TCRgd + thymocytes or other (non-thymocytes).  Because there is a massive expansion of cell number between the ETP and DP stages, we also tested the effect of cell cycle gene expression on the clustering. Cell cycling scores were calculated from the integrated data and regressed using Seurat's built-in regression model (Supplementary Figure 3). There was a slight variation in the number of output clusters, but we did not observe any biological variability that could be explained by cell cycle status. The genetic profiles were nearly identical between the outputs with cell cycle genes left in and regressed out. Thus, the transcriptional heterogeneity of DN thymocytes observed is not simply a result of being in a different phase of the cell cycle. This also meant that exclusion of cell cycle genes was not necessary for downstream analyses.

Trajectory analysis implies that gd T cell development branches from DN1 thymocytes
We next employed Monocle 2 (28) to infer a potential developmental pathway from the scRNAseq data of DN and gd thymocytes by ordering the cells based on tracking gene expression in pseudotime analysis (Figure 2A). This assembled the cells along an asymmetric trajectory that divided into six states ( Figure 2B). Each state was then analyzed for the expression of signature genes ( Figure 2C) to assign to a stage in development. State 2, comprising DN1 and some DN2a cells, was identified as pre-T lineage committed cells, and therefore the starting point. State 3 contained T lineage committed cells and cells that had passed bselection, and therefore corresponded to the main ab pathway, with We also performed trajectory analyses with another algorithm, Slingshot (29), on the integrated dataset ( Figure 2D). This also suggested that gd cells develop directly from DN1. Moreover, the main ab pathway, ending with DN4, and the alternate branch that terminated with DN3 cells were also observed. However, this algorithm also predicted a DN1 branch.
We suspected that the alternate branch that terminated with DN3 cells represented cells that had not passed b-selection. We therefore performed differential gene expression analysis between the State 4/5/6 cells with the State 3 cells. First, Cd27 expression was upregulated in State 3 but not State 4/5/6 cells, which was previously shown to delineate pre-and post-b-selected DN3 cells (2). Furthermore, there was an enrichment of differentially expressed genes associated with "Apoptosis", "Senescence", "DNA replication" and "Cell Cycle" KEGG terms ( Figure 2E), among others. Together, these strongly suggest that this branch represents cells that had failed b-selection.
Our de novo assembly of a model of early T cell development based on scRNAseq data thus suggests that the current model where gd development branches from the DN2b>DN3a stage is likely to be incomplete. Cellular barcoding in OP9-DL1 cocultures suggests only a partial overlap of the DN1 thymocytes that give rise to ab and gd T cells If the ab and gd lineages can develop independently, we might expect to see each lineage being derived from distinct DN1 thymocytes. To investigate this, we sorted total DN1 (CD44 + CD25 -CD4 -CD8 -B220 -CD11b -CD11c -NK1.1 -TCRb -TCRgd -) thymocytes, which includes both CD117 + ETPs and the CD117 -/lo subpopulations. The cells were tagged with a lentiviral library of unique DNA barcodes (31). Once tagged, that barcode is inherited by the progeny of that cell and thus we can estimate how frequently ab cells and gd cells originate from the same starting DN1 cell ( Figure 3A). If an ab cell and a gd cell inherit the same barcode sequence, it means that they were derived from the same progenitor.
First, we performed a time-course of ab vs. gd differentiation from DN1 thymocytes to determine the optimal times for this analysis (Figures 3B, C). Cell surface CD8 was used as the marker of ab lineage cells because the fixation required to detect intracellular TCRb interferes with downstream analyses. Late DN4 to DP-staged cells and the few CD8 (TCRb + ) single positive cells that are generated were captured by this strategy. We also checked CD4 expression (not shown), which correlated entirely with the DP stage. CD4 (TCRb + ) single positive cells do not develop in these cultures due to lack of class II MHC presentation (33). TCRgd + cells first appeared in these DN1 OP9-DL1 cocultures on Day 9 and reached a maximum at Day 12. Cellular barcoding of DN1 thymocytes suggests that individual cells more frequently give rise to a single lineage than to both ab and gd cells in OP9-DL1 cocultures. (A) Overview of the experimental setup to track lineage outcomes of DN1 thymocytes. Total DN1 (CD25 -CD44 + CD4 -CD8 -B220 -CD11b -CD11c -NK1.1 -TCRb -TCRgd -) thymocytes were sorted from the thymus of C57BL/6 mice and tagged with unique genetic barcodes encoded in a lentiviral library. The cells were then differentiated on OP9-DL1 monolayers. (B) Shown is a representative of the ab versus gd lineage profiles over a time course in these OP9-DL1 cocultures. ab lineage cells were identified as CD8a + , which captures cells from late DN4 onwards, while gd lineage cells were identified as TCRgd + . CD4 expression was also analyzed and was largely concomitant with CD8a expression as most ab cells were DP (not shown). For the cellular barcoding analysis, ab (CD90.2 + CD8 + CD4 +/-TCRgd -) and gd (CD90.2 + TCRgd + ) lineage cells were sorted at Day 14 from half the culture. The remaining half was further differentiated out to Day 20 and then sorted. The DNA from the resulting ab and gd cells were amplified by PCR then Illumina-sequenced ( Figure 3D). We found that at Day 14, only 45% of the detected barcode species were sequenced in both ab and gd populations, while at Day 20, only 33% of detected barcode species were sequenced in both. At both time points, the majority of unique barcode species were detected in only the ab or gd population, suggesting that these were derived from DN1 thymocytes that gave rise to only one lineage. Together with the trajectory analysis of the scRNAseq data, this suggests that there are different DN1 thymocyte subpopulations, some of which have bipotency for both lineages, while others appear to preferentially develop into either ab cells or gd cells. Thus, it is possible that at least some gd cells can develop directly from DN1 thymocytes rather than following the canonical pathway and bifurcating from ab cells at the DN2>DN3 stage.

Transcriptional heterogeneity of DN1 and DN2 thymocytes
To better delineate the earlier stages of the developmental model, we focused the second 10x run that was performed specifically on DN1 and DN2 thymocytes. This allowed for more precise delineation of these two stages than if analyzing all DN thymocytes together. Unsupervised clustering of the 8,851 cells that pass quality control checks identified 26 clusters ( Figure 4A; Supplementary Figure 4A). This was based on a resolution of 2.0, which was the minimum number that resulted in clusters containing only DN1, DN2a, or DN2b thymocytes. Specifically, we checked that no cluster contained a mixture of cells with DN2a (pre-T-specification) or DN2b (post-T-specification) identity, at least based on expression of conventional stage marker genes. DN1 cells expressed high levels of progenitor markers, including Hhex, Il7r and Cd44 but low levels of T-commitment markers, such as Il2ra, Tcf7, Notch1, Cd24a, and Myb (Supplementary Figure 4B). Late T-lineage commitment genes, including Bcl11b, Rag1, Rag2, Notch3, and Ptcra, were used to separate the DN2a and DN2b subpopulations (Supplementary Figure 4B). This resulted in eight DN1, five DN2a, and 10 DN2b subpopulations. Some clusters formed distinct subpopulations, particularly DN1 thymocytes, while others appeared to be divisions within a continuum, particularly DN2 thymocytes. Such heterogeneity within these early T cell developmental stages is consistent with that previously reported, at least for DN1 thymocytes (3). There were also three small clusters of non-T cells consisting of doublets and B cells that, for simplicity, were excluded from downstream analysis.

Identification of novel cell surface markers for delineating DN1 and DN2 subpopulations
While various cell surface markers have been used to divide DN1 thymocytes into DN1a to DN1e, and DN2 thymocytes into DN2a and DN2b (3), these are insufficient to delineate the larger number of subpopulations that are suggested by the scRNAseq analysis. We therefore needed to identify useful cell surface markers that could be used for flow cytometry. Differential expression analysis was performed to select features (p < 0.05 and twofold difference) for cross-referencing with GO terms for cell surface proteins (not shown). We then tested commercial antibodies against these candidates. Protein does not always correlate with mRNA levels and therefore not all antibodies produced a staining pattern that matched the scRNAseq data. However, we were able to identify a minimal panel that could delineate the eight DN1 subpopulations and most of the DN2 subpopulations.
CD24 and CD117 have previously been shown to divide DN1 thymocytes into five subpopulations (3). DN1a, DN1b, DN1c, and DN1d each corresponded to a single cluster in the scRNAseq analysis, but we identified four clusters (#1, 3, 5, and 22) that corresponded to DN1e thymocytes ( Figure 4A). The inclusion of CD314 (NKG2D), CD317 (BST2), and Sca-1 delineated these four DN1e subpopulations (Figures 4B, C). Differential CD90 and CD117 expression distinguished DN2a from DN2b cells, which were then subdivided further based on CD53, Ly-6d, and CD3e expression (Figures 4D, E). However, unlike DN1 cells that appeared to form distinct subpopulations, DN2 cells did not clearly partition into subpopulations. The pair of DN2a clusters 7/15 could not be separated by cell surface markers, nor could the pairs of DN2b clusters 4/23, 9/13, and 8/19 due to very similar gene expression profiles. Thus, these cluster pairs were combined. The resulting 11 DN2 clusters were annotated as DN2a-1 to 4 and DN2b-1 to 7 for identification purposes (not an indication of developmental order). Figure 5), and thus, it was employed to sort subpopulations of DN1 and DN2 thymocytes for functional analyses.

Using this panel of antibodies, we reliably identified the eight subpopulations of DN1 cells and 11 DN2 cells over experiments (Supplementary
OP9-DL1 cultures reveal that specific subpopulations exhibit a bias towards the gd lineage The transcriptional heterogeneity of the DN1 thymocytes may be an indication that only some subpopulations are true progenitors of T cells, as previously suggested (3). Alternately, these different subpopulations could represent different progenitors of ab and gd lineages, which would be consistent with the early branch point inferred by pseudotime trajectory analyses (Figures 2A, B). To test this, we sorted the DN1 and DN2 subpopulations using our antibody panels and accessed their ab vs. gd lineage potential in OP9-DL1 cultures after 14 and 20 days.
At Day 14, there were primarily ab lineage or undifferentiated cells in DN1a, DN1b, and DN1c cultures, with very few TCRgd + cells produced (Figures 5A-C). DN1d and all DN1e cultures only generated TCRgd + cells. Notably, the number of TCRgd + cells produced in all DN1e cultures was substantially greater than the number of TCRgd + produced in DN1a and DN1b cultures ( Figure 5C). At 20 days, there was a substantial increase in the percentage and number of ab cells produced in DN1a and DN1b cultures, while DN1c had given rise to both gd cells and ab cells. DN1d and DN1e subpopulations continued to exhibit a bias toward the gd lineage, with DN1e-1 and DN1e-2 producing the highest percentage and number of TCRgd + cells. In terms of absolute numbers, DN1a and DN1b subpopulations produced ab cells at a greater rate than the production of gd cells from the DN1c, DN1d, or DN1e subpopulations. Starting from 10 3 DN1a thymocytes, almost 2 × 10 4 ab cells were present after 20 days, whereas only 2-3×10 3 gd cells had been produced in all four DN1e cultures after this time ( Figure 5C). This of course could be a reflection of gd cells differentiating and dying more quickly, but is also implies that DN1a and DN1b subpopulations are proliferating more quickly (3) and preferentially producing ab cells.
At Day 14, all DN2b subpopulations and DN2a-1 had produced a high percentage and number of ab cells and few TCRgd + cells, while the rest of DN2a subpopulations produced mostly TCRgd + cells and some ab cells (Figures 5D, E). By Day 20, all DN2 subpopulations had produced ab lineage cells, while TCRgd + cells had been lost from the DN2a cultures. There was also a dramatic reduction in cell numbers in the DN2b cultures, which may be a result of the cells undergoing cell death after reaching the DP stage because these cultures poorly support the later stages of ab maturation (33).

Fetal thymic organs cultures confirm the lineage bias of DN1 subpopulations
While OP9-DL1 cultures are a well-characterized system for analyzing T cell development, it is possible that the lineages biases observed for the different DN1 subpopulations may be exaggerated in this model. We therefore also assessed the differentiation of select DN1 subpopulations in fetal thymic organ cultures (FTOCs), by seeding the sorted subpopulations into dGuo-depleted fetal thymic lobes ( Figure 5F). The lobes were analyzed at Day 14 after repopulation. Like in the OP9-DL1 co-cultures, DN1b cells preferentially produced ab cells, DN1c cells primarily produced abcells, and some gd cells, while DN1d and DN1e-4 cells only produced TCRgd + cells ( Figure 5G).
Thus, both OP9-DL1 and FTOC systems suggest that the DN1d and DN1e subpopulations are biased toward the gd lineage. Moreover, although DN1a, DN1b, and DN1c can produce both ab cells and gd cells, they are heavily biased toward the ab lineage and produce these cells in large numbers. The DN1d and DN1e subpopulations, together, make up more than three-quarters of DN1 thymocytes. Thus, on a per cell basis, it appears that more gd cells are produced from CD117 -DN1 thymocytes than from what are traditionally considered the ETPs (DN1a and DN1b).
Gene expression analysis of DN1d and DN1e subpopulations suggest potential relationships with distinct mature gd subsets Consistent with two group of gd progenitors identified by Sagar and colleagues (34), gd thymocytes clearly segregate into two clusters (#11 and 19) in our scRNAseq dataset ( Figure 1C). We therefore wanted to determine how these relate to the different DN1 subpopulations that produced gd cells. Differential gene expression analysis revealed substantial differences between the two gd clusters, with 93 genes expressed at significantly higher levels by cluster 11 cells, while 117 genes were expressed at significantly higher levels by cluster 19 cells ( Figure 6A). Genes that were highly expressed by cluster 11 include Gzma, Blk, Maf, Sox13, Etv5, Gata3, Ccr9, Rorc, Sox4, Tcf12, Lgals9, Cmak4, and Bcl11b, which are all associated with the IL-17 effector phenotype (34). This suggests that cluster 11 cells are probably the gd thymocytes that mature into the gd17 subset in the periphery. Numerous interferon-related genes, such as Stat1, were highly expressed by cluster 19 cells, suggesting that these cells probably mature into the IFNg-producing gd subset. This is consistent with a previous study that suggested the eventual effector function of gd T cells may already be acquired in the thymus (21). Next, to investigate the relationship between these gd thymocyte subpopulations and the DN1 subpopulations that differentiate into TCRgd + cells, the DN1c, DN1d, and DN1e subpopulations were analyzed for the expression of the 210 differential genes that distinguished the two gd thymocyte populations. This revealed a similar transcriptional profile between DN1d and cluster 11 gd thymocytes, while DN1c and all the DN1e subpopulations overlapped significantly with cluster 19 gd thymocytes ( Figure 6B). However, there were clearly also differences between the DN1e subpopulations.
Differential expression of transcription factors was notable. These are likely to be important because as regulators of gene expression they could potentially play key roles in hardwiring gd effector outcomes in DN1 thymocytes. Sox13 was highly expressed by DN1d cells (Figure 6B), which was previously shown to be important for the differentiation of a subset of DN1 thymocytes into IL-17-producing gd T cells (18). Interestingly, DN1e subpopulations also express transcription factors associated with IL-17-producing gd T cells. Notably, DN1e-1 and DN1e-2 cells expressed high levels of Maf, while Gata3 was highly expressed by both DN1e-1 and DN1d cells ( Figure 6B). We thus predict that the foundation of the gd effector transcriptional network may already be in place in DN1 subpopulations, with DN1d going on to develop into IL-17-producing cells and DN1e potentially producing both effector subsets.

Different DN1 subpopulations can give rise to different gd effector subsets
To determine if different DN1d and DN1e subpopulations might differentiate into distinct effector gd T cell subsets, the TCRgd + cells that developed in OP9-DL1 co-cultures were analyzed for intracellular IL-17A and IFNg expression and for expression of specific Vg chains ( Figure 7A). Cytokine production is highly associated with specific Vg chain usage, with Vg1.1 + cells enriched for IFNg and Vg2 + cells enriched for IL-17A production (19,35).
DN1c thymocytes generated both Vg1.1 + and Vg2 + cells, with only a low percentage of Vg1.1 + cells expressing IFNg ( Figures 7B-D). DN1d primarily produced Vg2 + cells that express IL-17A. Similarly, DN1e-1 and DN1e-2 were biased towards a Vg2 + IL-17A + gd effector subset. On the other hand, only DN1e-3 and DN1e-4 exhibited the plasticity to generate both IL-17A and IFNg-expressing cells ( Figures 7B-D). This suggests that the foundation of gd effector programs is already in place in DN1 thymocytes. Indeed, the differential gene expression analysis clearly showed that each of A B FIGURE 6 The gene expression profiles of the different DN1 subpopulations correlate with distinct gd effector subsets. (A) Heatmap showing genes differentially expressed (p-value <0.05 and twofold difference) between the two gd thymocyte subpopulations (clusters 11 and 19) identified in the scRNAseq analysis of DN and gd thymocytes in Figure 1C. Each column is an individual cell in the dataset while each row is a differentially expressed gene. Genes associated with either type 17 immune responses (IL-17A-production) or type 1 immune responses (IFNg production) are indicated. gd-producing DN1 subpopulations are transcriptionally distinct from each other, although DN1e-3 and DN1e-4 cells did appear somewhat similar to each other ( Figure 6B).

Discussion
Our study has confirmed that, at a transcriptional and functional level, DN1 thymocytes are indeed a heterogenous population of cells (3). We also confirmed that DN1a and DN1b thymocytes (CD117 + DN1 fraction), which together have been considered the true ETPs, can give rise to both ab and gd lineages. However, we showed that gd cells can be derived from multiple progenitor sources, with the CD117 -DN1 subpopulations more efficient at producing gd cells than DN1a and DN1b subpopulations. That being said, ab cells are produced in greater numbers because DN1a and DN1b thymocytes proliferate much more than the other DN1 subpopulations (3), and these two DN1 subpopulations only produce very few TCRgd + cells compared to ab lineage cells. This would explain why TCRgd + thymocytes are greatly outnumbered by ab lineage thymocytes (DN4 onwards) within the thymus.
While CD117 + DN1 cells have been considered bipotent, a previous study demonstrated the existence of lineage-restricted precursors. Spidale and colleagues showed that neonatal IL-17producing gd T cells develop from a subset of Sox13-expressing DN1d thymocytes and that this lineage is determined by a cell intrinsic program that is independent of TCR signaling (18). Our study has now expanded the subdivision of DN1 thymocytes to define eight subpopulations. We showed that not only are DN1d  Evidence suggests that TCR signal strength is an important determinant of lineage outcome in bipotential precursors. Strong signals that activate the ERK-EGR-ID3 pathway have been shown to drive gd T cell differentiation, whereas a weak signal promotes the ab fate (37-39). However, it is still unclear whether TCR signal strength is dependent on instructive extrinsic signals or simply a result of stochastic selection of the TCR chains that are expressed. The instructive model proposes that TCRgd signals compete with pre-TCR(b) signals and that the lineage decision is determined by cell-specific interactions that activate key transcription factors, which in turn instructs the gene expression program (15). In contrast, the stochastic selective model postulates that lineage commitment is determined prior to the onset of TCR gene rearrangement, and it is only when a thymocyte receives the appropriate TCR signal that matches its hardwired identity that it actually progresses along the ab or gd developmental pathway. DN1d and DN1e thymocytes are clearly committing to gd lineage cells prior to the expression of the TCR. The fact that a DN3 thymocyte expressing both a pre-TCR and gdTCR results in an even stronger signal that drives gd differentiation (14) also points towards stochastic selection. This does not rule out the role of selection because the appropriate antigen presenting cell may still be required for gd maturation.
Not only do a large fraction of gd thymocytes appear to be derived from distinct DN1 thymocytes prior to TCR signaling, but we also observed compartmentalization of effector outcomes ( Figure 8). Unlike ab T cells, gd T cells are thought to acquire their effector potential in the thymus rather than upon antigen exposure in secondary lymphoid organs (40). Although none of the DN1 subpopulations expressed definitive markers of specific mature T cell populations, we observed substantial transcriptomic overlap between the IL-17-primed DN1 subpopulations with mature IL-17-expressing gd thymocytes and between the IFNgprimed DN1 subpopulations with mature IFNg-expressing gd thymocytes. Expression of key transcription factors was notable. We showed that DN1d thymocytes express many of the transcription factors that are expressed by mature IL-17producing gd thymocytes but not IFNg-producing thymocytes, like Bcl11b, Etv5, Sox13, Rcf7, Rorc, and Maf. SOX13 has previously been shown to be an important lineage determining factor for the neonatal IL-17A-producing cells (18). It is thought to act in concert with other transcription factors like BC11B, TCF7, RORgd and c-MAF to specify the IL-17A-effector program in gd T cells (18,41).
While DN1e-1 and DN1e-2 thymocytes were also biased towards a Vg2 + IL-17A-producing gd T cell fate, only DN1e-3 and DN1e-4 thymocytes displayed the plasticity to also produce Vg1.1 + IFNg + cells. This plasticity is likely to involve the differential expression of transcription factors that contribute to distinct effector fates. Although all four DN1e subpopulations expressed Stat1, DN1e-1 and DN1e-2 also expressed Maf. c-MAF is known to positively regulate IL-17A-producing gd T cell development (42), whereas a lack of c-MAF expression by gd T cells correlates with increased IFNg expression (42-44). Furthermore DN1e-1 cells were found to express Gata3, encoding another important regulator of IFNg expression (45). The interplay between transcription factors may thus be key to lineage decisions. Thus, critical components of the IL-17 or IFNg-producing gd T cell transcriptional programs appear to be already in place within distinct DN1 subpopulations, suggesting that predetermination contributes to gd effector subset differentiation. Interestingly, Shibata et al. previously showed that the DN2 thymocyte stage contains a heterogenous mixture of gd T cell precursors that give rise to either IL-17 producers or nonproducers (46), thus further suggesting that gd effector outcomes may indeed be hardwired from these stages in T cell development. Further analysis of chromatin states and epigenetic mechanisms associated with these transcription factors will likely be valuable for revealing the regulatory cascades that drive the different gd effector outcomes.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Ethics statement
The animal study was reviewed and approved by St Vincent's Hospital Animal Ethics Committee.