Antigen experience relaxes the organisational structure of the T cell receptor repertoire

The creation and evolution of the T cell receptor repertoire within an individual combine stochastic and deterministic processes. We systematically examine the structure of the repertoire in different T cell subsets in young, adult and LCMV-infected mice, from the perspective of variable gene usage, nucleotide sequences and amino acid motifs. Young individuals share a high level of organization, especially in the frequency distribution of variable genes and amino acid motifs. In adult mice, this structure relaxes and is replaced by idiotypic evolution of the effector and regulatory repertoire. The repertoire of CD4+ regulatory T cells was more similar to naïve cells in young mice, but became more similar to effectors with age. Finally, we observed a dramatic restructuring of the repertoire following infection with LCMV. We hypothesize that the stochastic processes of recombination and thymic selection initially impose a strong structure on the repertoire, which gradually relaxes following asynchronous responses to different antigens during life.

The ability to sustain effective T cell immunity relies on a diverse αβ heterodimeric T cell receptor (TCR) repertoire generated by the stochastic variable, diversity and joining (VDJ) recombination mechanism (Kohler et al., 2005). This diverse repertoire is shaped over time by recombination biases (Qi et al., 2014; Snook et al., 2018), thymic and extra-thymic selection (Kohler et al., 2005; Qi et al., 2014; Kavazović et al., 2018), selective migration and antigen-driven clonal expansion. The encounter with cognate peptide-MHC complex (pMHC) also drives the differentiation of the T cell. For example, the strength of TCR stimulation can skew differentiation of memory versus effector T cells (Snook et al., 2018; Kavazović et al., 2018) and of CD4+ regulatory (Treg) versus effector/memory CD4+ cells (Lee et al., 2012; Stritesky et al., 2012), linking TCR specificity to phenotype and function. The aim of this study is to document the influence of these diverse processes on the underlying structure and organization of the TCR repertoire, determined at a global level.

Several previous studies have used deep sequencing to explore the TCR repertoire in different T cell subsets. For example, significant differences can be found between the repertoires of CD4+ and CD8+ cells, presumably reflecting selection by different classes of MHC peptide complexes (Li et al., 2016) - (Gulwani-Akolkar et al., 1995). Similarly, the repertoire differences found between CD4+ Treg and conventional CD4+ cells (Pacholczyk et al., 2006; Wang et al., 2010) are presumed to be shaped by their recognition of self or foreign peptides. However, the processes driving repertoire diversification are probabilistic, rather than deterministic. As a result, identical TCR sequences can be found in multiple subsets, and can even be shared between CD4+ and CD8+ populations (Wang et al., 2010).
In young individuals, the majority of the T cell compartment is made up of naïve cells, and the repertoire is presumably shaped largely by stochastic recombination and thymic selection. However, as individuals age their immune system responds to an increasing number of foreign antigens, derived principally from microbial, allergen or altered-self (e.g. neoantigen) exposure. This drives a relative shift towards the memory/effector phenotype (Arnold et al., 2011), accompanied by increased clonal expansion. Interestingly, exposure to antigen in different individuals can drive both convergent and divergent repertoire evolution (Heather et al., 2016; Pogorelyy et al., 2018). At the repertoire level, clonal expansion results in a gradual decrease in overall repertoire diversity (Jörg J. et al., 2015).

The CD4+ T cell repertoire diversity is better preserved with age in the bone marrow than in the spleen (Shifrut et al., 2013), which may relate to the role of the bone marrow microenvironment in preservation of memory T cells (Di Rosa and Pabst, 2005; Baliu-Piqué et al., 2018). The Treg repertoire also changes with age, as production of thymic "natural" Tregs drops significantly, and these cells are replaced by a high proportion of Tregs with an active effector/memory phenotype (Smigiel et al., 2014; Thiault et al., 2015).

In this study, we combine multi-parameter fluorescence-activated cell sorting with high-throughput next-generation sequencing to undertake a comprehensive high-resolution analysis of the αβ TCR repertoire of various T cell compartments in young and adult mice, comparing naïve, central memory, effector and Treg populations of CD4+ and CD8+ T cells from the spleen and bone marrow. We illustrate the impact of strong antigen exposure on the global properties of the repertoire by analyzing the changes that follow infection with lymphocytic choriomeningitis virus.
We quantify the global parameters of the repertoire at different levels of dimensionality, spanning variable gene frequencies, amino acid motif frequencies and individual nucleotide sequences. We explore different ways to visualize the structure and order which underlie the superficially diverse and chaotic collections of different DNA and protein sequences which constitute the T cell repertoire. Finally, we interpret our observations from the perspective of the probabilistic, but not chaotic, processes which determine the development and evolution of the TCR repertoire. We hypothesize that these processes, operating on millions of T cells, impose a strong overall structure on the repertoire. This structure relaxes as a result of divergent responses to antigen exposure in different individuals.

appreciate that our antibody panel does not fully capture the complexity of the T cell compartment, and that more extensive panels would be required to fully differentiate between all the known sub-compartments. However, for the purpose of this high-level analysis, we simplify the nomenclature, and refer to the sorted populations as naïve, Treg, central memory and effector. After RNA extraction, we amplified the TCR repertoire using a previously published experimental pipeline which incorporates unique molecular identifiers (UMI) for each cDNA molecule to correct for PCR bias and sequencing error, allowing a robust and quantitative annotation of each sequence in terms of V gene, J gene, CDR3 sequence and frequency (Oakes et al., 2017; Uddin et al., 2019).

The number of cells and the number of TCR mRNAs (captured by the total UMI count) which were recovered varied widely between compartments and age groups.
For example, the splenic CD4+ and CD8+ naïve compartments from young mice both yielded the highest average UMI count (~415,000), while the splenic CD4+ central memory (CM) population yielded the lowest average UMI count (~44,000). As expected, the proportion of naïve cells in both spleen and bone marrow was higher in young than in adult mice, and this was balanced by an increase in memory and especially effector cells in the older mice (SI Table 1). The total UMI count was strongly correlated with the number of sorted cells across compartments and tissues (SI Fig 1C). The numbers of α and β UMIs were also highly correlated (SI Fig 1D). Both these correlations provide additional confidence in the robustness and quantitative output of the overall pipeline.

The clonal structure and diversity of the repertoire vary with compartment and age.

We first explored the changes in the clonality and diversity of the TCR repertoire across compartments and tissues. We estimated T cell clonotype size by the number of different UMIs associated with a unique TCR, and illustrated the clonal frequency distribution of the repertoire within each population (e.g. Figs 1B and 1C for spleen; SI Fig 2A and 2B for bone marrow). As a comparator in this and subsequent figures, we generated a set of synthetic TCRs using SONIA, a generative probabilistic model of TCR recombination which incorporates learnt parameters of the genomic TCR recombination process, without any subsequent selective expansion (Sethna et al., 2020). This serves as a useful baseline with which to compare real repertoires, in which the products of recombination have been shaped by selection and proliferation.

As expected, the naïve repertoires were dominated by rare TCRs (observed only once or twice in a sample) and had very few expanded clonotypes (expanded clones are represented by the darkest color in panel B, and by the points to the right in panel C).
The naïve repertoires were also most similar to the synthetic repertoires. In contrast, T effectors contained much larger numbers of expanded clonotypes, and this was more pronounced in CD8+ cells from the older mice. Consistent with these distributions, the Simpson index and the Shannon index, two commonly used measures of repertoire diversity, were highest in naïve populations from young individuals, and progressively lower in central memory and effectors (Fig 1D, SI Fig 2C). The Simpson and Shannon indices are examples (k = 2 and k = 1, respectively) of a series of diversity measurements, which are captured by the Renyi entropy of order k, where k can run from 0 to infinity. We calculated the Renyi diversities for k = 0, 0.25, 0.5, 1, 2, 4 for each repertoire and then plotted them in two dimensions using principal component analysis (PCA; Fig 1E and SI Fig 2D). In the young mice, the repertoires of naïve, central memory, effector and T regulatory cells are clearly separated by the diversity measurements alone, with almost all the variance captured in a single dimension (reflecting very consistent differences across the entire Renyi profile). In older mice, the distinction between the populations is still observed but is less clear cut, with greater variation between individual mice. All panels in Fig 1 show the results obtained for the TCRβ repertoires (spleen), because TCRβ repertoires are the most diverse and are more commonly studied. However, similar results were observed for the α repertoires, and the diversity of α and β repertoires was very highly correlated (SI Fig 1E).

In summary, the analysis of the repertoires of different populations captures the known decreasing diversity and increasing clonality of the naïve, central memory and effector compartments in both spleen and bone marrow, and the decrease in diversity observed with age.
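The Renyi profile described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: it assumes clonotype sizes are given as a plain list of counts (for example, UMIs per TCR), uses the natural logarithm, and the two toy repertoires are hypothetical. The function names are chosen here for illustration only.

```python
import math

# Orders of the Renyi entropy quoted in the text.
ORDERS = (0, 0.25, 0.5, 1, 2, 4)


def renyi_entropy(counts, k):
    """Renyi entropy of order k for a list of clonotype counts."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    if k == 1:
        # The limit k -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in freqs)
    # k = 0 gives log(richness); k = 2 gives -log(sum of squared frequencies).
    return math.log(sum(p ** k for p in freqs)) / (1 - k)


def renyi_profile(counts, orders=ORDERS):
    """Entropy at each order; the exponential of each value is the Hill diversity."""
    return [renyi_entropy(counts, k) for k in orders]


# Hypothetical toy repertoires (assumed data, for illustration only).
naive_like = [1] * 20                  # every clonotype seen once
effector_like = [12, 3, 2, 1, 1, 1]    # one expanded clone dominates

print(renyi_profile(naive_like))
print(renyi_profile(effector_like))
```

For a perfectly even repertoire the profile is flat at log(richness), whereas an expanded clone pulls the higher-order entropies down. This is why a set of orders, rather than a single index, separates compartments with different clonal structure.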
These results build further confidence in the reliability of the repertoire sequencing and analysis pipeline.

The TCRs in each repertoire were ranked according to frequency, and the proportion within each decile is illustrated (low abundance sequences in white, ranging to high abundance sequences in dark red). The percentage of the distribution represented by the top decile is shown in white text. (C) The sequence abundance distribution in each compartment. The plots show the proportion of the repertoire (y-axis) made up of TCR sequences observed once, twice etc. (x-axis). Repertoires from young mice are shown with red dots, repertoires from older mice with blue dots and synthetic repertoires in green. (D) Simpson and Shannon scores of repertoires of equal size (1000 CDR3NTs) from each compartment and mouse. Colors as in panel C. The mean is shown by black lines (n=3). (E) PCA of the Renyi diversities of order 0, 0.25, 0.5, 1, 2, 4.

As reported previously (Ndifon et al., 2012), both Vα and Vβ gene usage was non-uniform in all the repertoires examined, and also in the synthetic repertoire sequences, reflecting differential usage of V genes in the recombination process (Kohler et al., 2005) (SI Fig 3A). However, the distribution of V gene usage also differed between naïve T cell subsets (young vs. adult, adult vs. synthetic, young vs. synthetic). The pairwise similarity between V gene distributions of different repertoires was quantified using the cosine similarity between the distributions (see Methods). We also used the Horn similarity index (Greiff et al., 2015; Venturi et al., 2008) and found these two measures highly correlated (SI Fig 3B).
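As a minimal sketch of the two similarity measures named above, the following assumes each repertoire is summarised as a dictionary of V gene counts. The gene labels and counts are hypothetical, and the Horn index is written in its commonly used Morisita-Horn form on relative frequencies, which is an assumption about the exact variant applied in the study.

```python
import math


def to_freqs(counts, genes):
    """Convert a {gene: count} dict into a frequency vector over a fixed gene order."""
    total = sum(counts.get(g, 0) for g in genes)
    return [counts.get(g, 0) / total for g in genes]


def cosine_similarity(p, q):
    """Cosine of the angle between two V gene frequency vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def horn_similarity(p, q):
    """Morisita-Horn index on relative frequencies (assumed variant)."""
    dot = sum(a * b for a, b in zip(p, q))
    return 2 * dot / (sum(a * a for a in p) + sum(b * b for b in q))


# Hypothetical V gene counts for two repertoires (illustrative values only).
rep_a = {"TRBV1": 40, "TRBV2": 30, "TRBV3": 30}
rep_b = {"TRBV1": 10, "TRBV2": 45, "TRBV3": 45}
genes = sorted(set(rep_a) | set(rep_b))

p, q = to_freqs(rep_a, genes), to_freqs(rep_b, genes)
print(cosine_similarity(p, q), horn_similarity(p, q))
```

Both measures equal 1 for identical distributions and 0 for distributions with no genes in common. They share the same numerator and differ only in how the norms are combined, which is consistent with the two being highly correlated in practice.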
A hierarchically clustered heatmap summarizes the similarity between all pairwise combinations of repertoires for TCRβ V genes (Fig 2A). In young individuals there was a clear segregation between CD4+ and CD8+ repertoires, and between naïve, central memory, effector and Treg populations. Naïve, central memory and Treg repertoires were most similar, while effectors were mapped to a distinct branch. In contrast, there was little distinction between spleen and bone marrow within each sub-compartment. Repertoires from the same compartment but different individuals clustered together, demonstrating that each compartment had a distinct repertoire distribution, conserved between individuals.

In contrast to the strong hierarchical structure observed in the TCRβ repertoires in 12-week individuals, the Vβ gene usage in repertoires from older animals was much more heterogeneous. Although the distinction between CD4+ and CD8+ repertoires was mostly still retained, the sub-compartments were much more intermingled. The repertoires of the CD8+ effector compartments, in particular, showed little similarity between individuals.

The structure observed in the heatmap organization was further investigated by performing principal component analysis (PCA) on the pairwise similarity matrix for Vβ usage (Fig 2B) for young (top panels) and adult (bottom panels) mice. Each dot represents an individual repertoire and is colored by CD4+/CD8+ compartment (left panels), anatomical compartment (middle panels) and differentiation phenotype (right panels). In young mice there is a clear separation of CD4+ and CD8+ repertoires, and of repertoires from different functional compartments. We noted that the Treg populations lie closest to the naïve, while the biggest variance is seen between effector populations.
In adult mice, the separation between CD4+ and CD8+ repertoires is retained, but the distinction between functional compartments largely collapses.

In contrast to the TCRβ repertoires, the equivalent analysis for the α repertoires (SI Fig 3C and 3D) showed much less evidence of consistent structure in either heatmap or PCA. Furthermore, there was only limited correlation between the cosine similarities of α and β repertoires, especially in the older individuals (SI Fig 3E). The selective pressures which shape the repertoires of different CD4+ and CD8+ compartments therefore seem to be reflected differently in Vα and Vβ gene usage.

Since we observed that there were no systematic differences between spleen and bone marrow repertoires in terms of Vβ gene distribution, we estimated the degree of variation which could be attributed to idiosyncratic differences between mice, by comparing intra-individual (between bone marrow and spleen) differences with inter-individual differences (Fig 2C). The plots illustrate a clear hierarchy of variance, with naïve repertoires being closest to each other, followed by central memory and Tregs, and with effector repertoires showing the greatest divergence. CD8+ repertoires (right panels) showed greater divergence (smaller similarity indices) than CD4+ repertoires, and the adult repertoires showed greater variance than the young. Interestingly, the intra-individual variation was in general very similar to the inter-individual variation, the only exception being the effector CD8+ repertoires in the older animals. Thus, the high variance seen especially between effector T cell repertoires seems to be an intrinsic property of these repertoires, observed even between different compartments from the same individual.
This high variance was not simply a reflection of the different sizes of the different compartments, since synthetic repertoires of different sizes were very similar to each other (SI Fig 3F-G).
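The intra- versus inter-individual comparison can be illustrated with a short sketch. It assumes repertoires are keyed by a (mouse, tissue) pair and summarised as V gene counts, and it assumes that "inter-individual" means the same tissue compared across different mice; the source does not spell out that pairing. All labels and counts below are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two {gene: count} dicts (scale-invariant)."""
    genes = sorted(set(a) | set(b))
    x = [a.get(g, 0) for g in genes]
    y = [b.get(g, 0) for g in genes]
    dot = sum(i * j for i, j in zip(x, y))
    return dot / (math.sqrt(sum(i * i for i in x)) * math.sqrt(sum(j * j for j in y)))


def intra_inter_similarities(repertoires):
    """Split pairwise similarities into within-mouse and between-mouse groups."""
    intra, inter = [], []
    for (m1, t1), (m2, t2) in combinations(sorted(repertoires), 2):
        sim = cosine_similarity(repertoires[(m1, t1)], repertoires[(m2, t2)])
        if m1 == m2 and t1 != t2:
            intra.append(sim)   # same mouse, spleen vs bone marrow
        elif m1 != m2 and t1 == t2:
            inter.append(sim)   # different mice, same tissue (assumed pairing)
    return intra, inter


# Hypothetical V gene counts for one compartment in two mice and two tissues.
repertoires = {
    ("mouse1", "SP"): {"TRBV1": 50, "TRBV2": 30, "TRBV3": 20},
    ("mouse1", "BM"): {"TRBV1": 25, "TRBV2": 15, "TRBV3": 10},
    ("mouse2", "SP"): {"TRBV1": 20, "TRBV2": 30, "TRBV3": 50},
    ("mouse2", "BM"): {"TRBV1": 10, "TRBV2": 30, "TRBV3": 60},
}

intra, inter = intra_inter_similarities(repertoires)
print("intra-individual:", intra)
print("inter-individual:", inter)
```

Comparing the two lists for each compartment gives the kind of contrast summarised in Fig 2C: if within-mouse similarities are no higher than between-mouse similarities, the divergence is a property of the compartment rather than of the individual.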

Figure 2: Differential V gene usage defines different sub-populations of T cells in young individuals. (A) Cosine similarity was calculated between all pairs of repertoires in young (left) or adult (right) mice and displayed as a heatmap. Hierarchical clustering dendrograms showing the organization of the repertoires are attached to each plot, with branches colored by CD4+ and CD8+ group (grey and red, respectively) and labelled by compartment (text and symbol). Tissues are marked by symbol shape (SP = triangles, BM = circles). (B) PCA separates the Vβ usage of CD4+ and CD8+ compartments in an age-dependent manner (young in the upper and adult in the lower panels). Each dot represents one compartment from one mouse (e.g., CD8+ effectors, BM, mouse 1). See legend for symbols and color code. PC1 separates CD4+ and CD8+ classes in both age groups. PC2 separates cell compartments by Vβ usage in young mice. The Vβ genes with the highest influence (loading) are marked with arrows. (C) Cosine similarity of the Vβ gene usage between individuals (circles) or within individuals (between spleen and bone marrow, triangles). T cell compartments (colored dots) are divided into CD4+ (left) and CD8+ (right) from young or adult mice. The mean is shown by horizontal black lines.

The TCR V gene distributions analyzed above create a simplified abstraction of individual repertoires, and TCR repertoires can also be considered as a hyperdimensional feature space defined by the millions of individual nucleotide sequences which constitute each repertoire. In order to identify structure within this space, we first visualized the qualitative patterns of sharing between CD4+ and CD8+ sub-compartments, using Circos plots (Fig 3A). This analysis, which included only sequences shared by at least two compartments, reveals a distinctive pattern of sharing which is conserved between individuals, and is age specific.
In young individuals, CD4+ and CD8+ splenic naïve and CD8+ central memory repertoires contribute the highest

young mice a hierarchical structure was observed, with naïve and Treg repertoires clustered together, and effector and central memory repertoires for CD4+ and CD8+ T cells forming distinct clusters. In older individuals, this structure is perturbed. CD4+ and CD8+ repertoires remain distinct, but Tregs now cluster independently of naïve, and are closest to CD4+ effector repertoires. As was the case for V gene similarities, there was modest correlation between TCRα and β similarities, especially in the older individuals (SI Fig 4D). The synthetic repertoires show very little sharing or structure, consistent with clonal expansion being driven by selective forces which operate subsequent to recombination (Fig 3B-C, SI Fig 4B-C, right subplots).

Figure 6: T cell compartments of LCMV-infected mice express distinct top amino acid triplets in the β chain TCR repertoire. (A) Summary of the LCMV-induced T cell compartments and the isolation of epitope-specific cells for TCR repertoire sequencing and analysis. (B) Cosine similarity index of TRBV genes, CDR3βNT and top CDR3βAA 3-mer motifs (top 350), calculated between tissues and individuals. Colored dots reflect the groups of mice (red = young, blue = adult, green/purple = mice 8 and 40 days after acute LCMV infection, respectively). The mean is shown by horizontal black lines. (C) Triplets differentially expressed in CD8+ effectors are found 8 days after LCMV infection, and not in the young healthy mice. Each dot represents a single top triplet. The p-value (t-test) was calculated for each triplet across six to eight samples (three to four mice and two tissues) of CD8+ effectors from young and LCMV-infected mice. The y-axis shows FDR-adjusted p-values on a -log10 scale. The x-axis shows the log2 fold change, calculated between mean triplet frequencies from young and LCMV-infected mice (six to eight samples in each). Significance thresholds are marked by blue lines at y=1.3 (equivalent to a p-value of 0.05) and x=±1 (denoting a fold change of 2). Representative triplets above both thresholds are labeled with red text and dots. Significantly enriched triplets labeled in red text are found in the epitope-specific full CDR3βAA sequences (NP396, NP205 and GP92). Of the 36 significantly differentially expressed triplets, 30 (83%) are also found in the epitope-specific sequences.

not encoded in the germline, but are created de novo in each individual by a stochastic process of imprecise DNA recombination. A fundamental task for immunologists is to understand how this stochasticity and associated inter-individual heterogeneity can nevertheless result in a robust and regulated response to an enormous diversity of antigens in most individuals of a population. In this study we explore the balance between stochasticity and heterogeneity on the one hand, and order and consistency on the other.
We systematically analyze the TCR repertoire of different functional and anatomical compartments of the adaptive immune system, sampled from young (3 month) and adult (12 month) mice. From this perspective, we consider the immune system as evolving in a multi-dimensional selective space. The dimensions (selective pressures) include thymic selection, peripheral differentiation (along the naïve-memory-effector axis), migration (spleen - bone marrow) and aging (illustrated in Fig 7). We document the effects of these selective processes on different features of the repertoire, which span the range from the full hyper-dimensionality of individual nucleic acid sequences (>10^8 per mouse) through the enumeration of amino acid motifs (a few hundred), to the frequency of different V genes (20). We focus the analysis on quantitative measurements of similarity between repertoires, which reflect both convergent and divergent evolution of the repertoire. A recent study has reported systematic sequencing of the TCR repertoire of different human T cell subsets, but the focus of that analysis was on the biochemical characteristics of the TCR (Kasatskaya et al., 2020).

Figure 7: The TCR repertoire is considered as evolving in four dimensions, captured by the diagram above.
In the younger mice, the analysis of similarity revealed clear evidence of order, with a hierarchical structure of similarity between the different functional compartments. The most consistent feature was the clear separation between CD4+ and CD8+ repertoires, which was evident in all feature sets explored, in both TCRα and TCRβ repertoires, and presumably reflects the MHC/peptide selection process which operates in the thymus. Notably, however, the selection operates on a complex multi-feature construct, since no one feature (V gene, amino acid motif, or even individual CDR3 nucleotide sequence) could distinguish individual CD4+ from CD8+ TCRs. Within CD4+ or CD8+ compartments, the similarity from the perspective of V gene or amino acid motif frequency distributions was highest between naïve repertoires, with progressively decreasing similarity for memory and effector repertoires. Remarkably, this increasing heterogeneity was observed both between matched compartments of different mice and between the same compartment sampled in bone marrow and spleen. We hypothesize that this diversity is an intrinsic feature of the differentiation process shown in Fig 7, driven by clonal expansion in response to continuous exposure to a diverse set of self and non-self antigens. These selective forces must operate on the TCRα/β heterodimer, since the two genes are co-expressed as a single structure at the cell surface. However, the selection seems to operate rather independently on the α and β sequences, since the patterns of inter-repertoire sharing observed for α and β are only loosely correlated. Vβ genes are much more informative than Vα genes in terms of distinguishing functional compartments.

The tension between randomness and directed evolution is most evident when comparing the analysis of V gene frequencies and individual CDR3 nucleotide sequences.
Similarity in V gene usage is greatest in naïve, and decreases progressively in central memory and effector repertoires. In contrast, similarity in CDR3 frequencies is lowest in naïve, because of the extreme diversity of this compartment, and increases progressively in central memory and effector repertoires. The combination of recombination and selection therefore imposes a rigid pattern of V gene usage, which nevertheless encompasses an enormous diversity of TCR sequences. Memory and effector differentiation, presumably in response to antigen, drives some convergent evolution of the clonal repertoire, reflected by increasing similarity of nucleotide sequence repertoires, but paradoxically increasingly disturbs the rigid pattern of V gene usage.

In the older mice, elements of the structure remain, but aging and the much longer exposure to the antigenic environment significantly loosen the initial rigid structure evident in V gene and amino acid motif frequency. CD4+ and CD8+ repertoires, for example, remain clearly distinct in all feature sets. However, the clear segregation between naïve, central memory and effector repertoires becomes blurred, and the overall pattern of similarity is increasingly driven by the idiosyncratic effector repertoires, which differ at both V gene and amino acid motif level. The Treg population shows a distinctive distribution of similarities. In both young and adult mice, the Treg repertoires are more similar to each other than to any other compartment, confirming the distinct nature of the Treg repertoire, which has been hypothesized to arise from exposure to a distinct set of antigens (Wyss et al., 2016; Bolotin et al., 2017). However, the Treg repertoires are more similar to naïve repertoires in the younger individuals, but become more similar to effector repertoires with age.
The switch from a naïve-like to a more effector-like repertoire, which is also observed at a phenotypic level by increased expression of CD44 and decreased expression of CD62L, may reflect a life-long gradual recruitment of induced Tregs to the original natural Treg population emerging from the thymus (Darrigues et al., 2018). The switch of regulatory T cells to a more effector phenotype might also represent a weakening of regulatory activity, and hence be linked to the increase in autoimmunity associated with age.

The response to environmental antigens drives many of the differentiation and age-associated changes which we describe. Since the mice are housed in specific pathogen-free conditions, and are not germ-free, this may include a variety of microbial antigens present in the environment. However, although the mice are co-housed, the individual antigen exposure may be heterogeneous and asynchronous. We therefore investigated the impact of exposure to a strong synchronous exogenous antigenic stimulus, by infecting the mice with LCMV, which produces a strong but self-limiting infection in the C57BL/6 strain. The immune response to this virus has been studied extensively (Zhou et al., 2012), and is known to involve strong systemic clonal expansion by both CD4+ and CD8+ T cells. Indeed, as expected, the repertoires at 8 days post-infection, when the immune response is strongest (Murali-Krishna et al., 1998; Slifka et al., 1997), showed evidence of perturbation. Interestingly, LCMV induced a marked decrease in similarity in both V gene and amino acid motif usage in both CD4+ and CD8+ naïve repertoires, perhaps reflecting increased turnover and perturbation of this compartment in response to the infection. However, in contrast to the changes observed in response to chronic environmental antigen stimulation, LCMV drove an increased similarity of effector repertoires.
This was reflected not only in V gene and CDR3 nucleotide distributions, but was also evidenced by the existence of amino acid triplets highly enriched in the TCR repertoire of infected individuals. Remarkably, many of these triplets were found within the set of CDR3s of CD8+ TCRs which bound one specific epitope of LCMV, confirming the link between motifs and specific antigen recognition. Thus, exposure to a strong synchronous source of antigen, such as is provided by acute exposure to LCMV, drives strong convergent evolution and decreased diversity of the TCR effector repertoire, which relaxes partially towards the uninfected state at 40 days post-infection.

The study we present has a number of limitations. The number of individuals analysed was small, limiting the amount of robust statistical analysis which can be carried out. Thus, many of the conclusions we make are based on statistical trends rather than classical statistical significance thresholds. Furthermore, the analysis of the effects of aging is limited to two time points, and would benefit from extension to very young or very old mice. We also recognize that the functional sub-compartments we define are based on a rather simplistic and limited panel of antibody markers, and that in reality the populations we refer to as naïve, central memory and effector certainly contain further heterogeneity which could be explored in future studies.

In conclusion, we present a novel approach to the analysis of the TCR repertoire which we use to address the fundamental relationship between stochastic and deterministic processes which drive evolution of the adaptive repertoire. The adaptive immune system shows a remarkable capability to preserve high-order structure, as reflected by conserved frequency distributions of V gene and short amino acid linear motifs, while still allowing enormous diversity at individual sequence level.
This high-order structure is partially preserved but gradually weakened as the adaptive immune system ages. We speculate that this structure is key to maintaining a robust, consistent antigen-specific response across a population in the face of the randomness and heterogeneity imposed by the process of imprecise TCR recombination.

Animals: All experiments except for the LCMV infections were carried out using inbred female Foxp3-GFP (C57BL/6 background) mice sacrificed at three months (young) and one year (adults). All animals were handled according to regulations formulated by The Weizmann Institute's Animal Care and Use Committee and maintained in a pathogen-free environment.

constructed by binding biotinylated monomers with PE/APC-conjugated streptavidin (according to the NIH protocol). Purified T cells were stained with FITC anti-CD4 and PB anti-CD8, followed by tetramer staining (two tetramers together) for 30 min at room temperature (0.6 µg/ml). CD8+ epitope-specific cells were sorted from single-positive gates for one type of tetramer.

Library preparation for TCR-seq: All libraries in this work were prepared according to the published method (Oakes et al., 2017), with minor adaptations for mice. Briefly, we extracted total RNA from CD4+/CD8+/CD3+ T cells (from spleen or bone marrow) of Foxp3-GFP or C57BL/6 mice using the RNeasy Micro Kit (Qiagen) and removed excess DNA with DNase I (Promega). RNA samples were reverse transcribed to cDNA, and an anchor sequence was added at the variable part of the TCR using single-strand ligation. Ligation products were amplified by PCR in three reactions, using an extension PCR to add Illumina sequencing primers, indices and adaptors. Our modified protocol for mice included specific primers for the constant region of the TCR α or β chain ("GAGACCGAGGATCTTTTAACTGG", "GCTTTTGATGGCTCAAACAAGG", for α and β chain respectively). These primers are used in the reverse transcription (RT) and the first two PCR reactions (PCR1: "CAGCAGGTTCTGGGTTCTGGATG", "TGGGTGGAGTCACATTTCTCAGATCCT", for α and β chain respectively).
Primers in the second round of the PCR included the TCR constant region sequence, together with a six-base-pair Illumina index for multiplex sequencing, six random base pairs to improve cluster calling at the start of read 1, and the Illumina SP1 sequencing primer (PCR2: "ACACTCTTTCCCTACACGACGCTCTTCCGATCTHNHNNH-index-CAGCAGGTTCTGGGTTCTGGATG", "ACACTCTTTCCCTACACGACGCTCTTCCGATCTHNHNNH-index-GGTGGGAACACGTTTTTCAGGTCCTC", for α and β chain respectively). In the third round of the PCR, the primers were the SP1 and SP5 Illumina adaptors (PCR3: "CAAGCAGAAGACGGCATACGAGAT", "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCC", forward and reverse respectively). All PCR reactions were performed using KAPA HiFi high-fidelity proofreading polymerase (KAPA Biosystems). Libraries were sequenced on a NextSeq 550 (Illumina; 200 bp forward read, 100 bp reverse read).

Pre-Processing and Error Correction for Raw Reads: Data were processed using an in-house pipeline, coded in R. First, we transferred the UMI sequence from read 2 to read 1. Trimmomatic (Bolger et al., 2014) was used to filter out raw reads containing bases with Q-value ≤20 and to trim reads containing adaptor sequences. The remaining reads were separated according to their barcodes and filtered for the presence of the constant-region primer sequence of the α or β chain (CAGCAGGTTCTGGGTTCTGGATG / TGGGTGGAGTCACATTTCTCAGATCCT, for α and β chain respectively), allowing up to three mismatches. Bowtie 2 (Langmead and Salzberg, 2012), run with sensitive local alignment parameters, was used to align the reads to the germline V/J gene segments, as found in the IMGT germline reference. The CDR3 nucleotide sequences were translated to amino-acid sequences in two steps. The N-terminal cysteine was identified by matching it to the V-aligned region. Then the C-terminal phenylalanine was identified by matching it to the J-aligned region.
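The primer-based read filter described above (constant-region primer present, up to three mismatches) can be sketched as follows. This is an illustrative sketch only: the authors' pipeline was written in R and is not reproduced here, Python is used purely for illustration, and the function names `hamming`, `find_primer` and `assign_chain` are hypothetical. It also assumes that mismatches are substitutions only (no insertions or deletions) and that the primer appears in the read in the orientation quoted in the text.

```python
# Primer sequences quoted in the Methods text (PCR1, alpha and beta chain).
ALPHA_PRIMER = "CAGCAGGTTCTGGGTTCTGGATG"
BETA_PRIMER = "TGGGTGGAGTCACATTTCTCAGATCCT"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_primer(read: str, primer: str, max_mismatches: int = 3) -> int:
    """Return the first offset at which `primer` matches `read` with at most
    `max_mismatches` substitutions, or -1 if no window qualifies.

    Assumption: substitutions only; indels are not modelled in this sketch.
    """
    k = len(primer)
    for start in range(len(read) - k + 1):
        if hamming(read[start:start + k], primer) <= max_mismatches:
            return start
    return -1


def assign_chain(read: str, max_mismatches: int = 3):
    """Label a read 'alpha' or 'beta' by its constant-region primer.

    Returns None when neither primer is found, in which case the read
    would be discarded by a filter of this kind.
    """
    for chain, primer in (("alpha", ALPHA_PRIMER), ("beta", BETA_PRIMER)):
        if find_primer(read, primer, max_mismatches) != -1:
            return chain
    return None
```

A sliding-window Hamming comparison is the simplest way to express a mismatch budget; a production pipeline would more likely rely on an optimised pattern matcher, so the loop above should be read as a statement of the criterion rather than of the implementation.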
Up to one mismatch was allowed per 18-stretch sequence, ending with the Cys or starting at the Phe. CDR3AA sequences were defined according to the IMGT convention. To correct for possible sequencing errors, we clustered the UMI sequences in two steps: (1) UMIs were grouped, starting from those with the highest frequency, within a Levenshtein distance of 1; (2) within these groups, CDR3AA sequences (starting from the most frequent sequence in a group) were clustered using a Hamming distance (Hamming, 1950) threshold of 4. Finally, the UMIs of each CDR3 sequence were counted, and sequences with a UMI count of one were filtered out. For the entire analysis, sequences were used only if they were fully annotated (both V and J segments assigned), in-frame (i.e., they encode a functional peptide without stop codons) and had a copy number greater than one. In addition, we removed the invariant iNKT α chain CDR3 sequence ("CVVGDRGSALGRLHF" (Greenaway et al., 2013); 0.001% of all sequences in our data).

Statistical Analysis: All statistical analysis was performed using R Statistical Software. For the pre-processing pipeline we used the "ShortRead" package (Morgan et al., 2009). The package "vegan" (Dixon, 2003) was used to measure the Simpson and Shannon indices (Leinster and Cobbold, 2012), (Mehr et al., 2012). We also used it to compute the Horn similarity index (Greiff et al., 2015), (Venturi et al., 2008) and to project the Nonmetric Multidimensional Scaling (NMDS) (Faith, Minchin and Belbin, 1987). The Horn index relies on both the overlap and the abundance of sequences, as evaluated by the unique molecular identifier count (UMI count) (Friedensohn et al., 2017). For the PCA analysis we applied the "factoextra" package (Kassambara, 2017) and the "ggplot2" (Wickham, 2009) [...]

Figure 1: (A) Representative sorting gates for CD4+ cells of one young mouse. (B) Percentage of FACS-sorted cells in each compartment of young (red) or adult (blue) mice. Means are shown as black lines (n=3). Significant differences between age groups are denoted by asterisks (P-values: * <0.05, ** <0.01, *** <0.001, t-test). (C) The number of UMIs obtained correlates with the number of sorted cells. Colored dots correspond to the sum of UMI counts in the shown young mouse compartments vs.
the number of sorted cells. α and β chains are marked by circles and triangles, respectively. P-value = 1.83×10⁻⁴², R² = 0.9. (D) High correlation between α and β UMI counts. Colored dots correspond to the sum for each young mouse compartment (color and shape). (E) Shannon indices from α and β repertoires are highly correlated. Each point is the Shannon index of one SP or BM, CD4+ or CD8+ (dot shape) compartment from young or adult mice (upper or lower panel, respectively).

[...] (red), adult (blue), and synthetic (green) mice. Each bar represents the mean frequency of the V segment in grouped naïve T cells from both tissues. Error bars are SEM (n=6, three mice from CD4+ and CD8+ naïve). Significant differences between all pairs of groups (Young vs Adult = orange, Young vs Syn = black, Adult vs Syn = grey) in specific segments are detected both in TRBV genes and in TRAV gene families (P-values: * <0.05, ** <0.01, t-test with Benjamini & Hochberg correction). (B) A high correlation between Cosine and Horn similarity measurements was calculated for the TRBV usage. Each point is the Horn or Cosine score for the Vβ usage between all pairs of compartments. (C) The cosine similarity index of the TRAV usage was calculated between all pairs of repertoires in young (left) or adult (right) mice. Hierarchical clustering dendrograms show the organization of the compartments in each plot, colored by CD4+ and CD8+ groups (grey and red branches, respectively) and labelled by compartment (text and symbol). Tissues are indicated by symbol shape (SP = triangles, BM = circles). (D) PCA separates the Vα usage between the CD4+ and CD8+ classes of young (upper) or adult (lower) mice but not within their subgroup compartments. Each color represents one compartment from one mouse (e.g., CD8+ Effectors, BM, mouse 1). (E) Pairwise cosine similarities between Vα and Vβ usage show low correlation, especially in adult mice. Each point is the cosine similarity for the Vα and the Vβ usage.
(F-G) Uniform Vβ usage in synthetic TCRs, both in PCA analysis (F) and in pairwise cosine similarity scores (G).

[Figure axis labels: TRAV gene families TRAV1 to TRAV23 and TRBV genes TRBV1 to TRBV31]

[...] These values were compared across mice using another cosine score calculation. The dot color corresponds to the TCR chain (red = TCRα, grey = TCRβ). Significant differences between age groups are denoted by asterisks (P-values: * <0.05, ** <0.01, t-test). (B) Pairwise cosine similarity from representative young, adult, or synthetic ("Syn") mouse CDR3αNT sequences. Correlation levels are represented by color (high = light blue, low = dark blue). In color and text, hierarchical clustering dendrograms for all T cell compartments are plotted to the left of each heat map (CD4+ = circles, CD8+ = triangles). (C) The similarity matrices shown as heatmaps in B are represented in two dimensions by NMDS. (D) CDR3αNT vs. CDR3βNT pairwise cosine similarities between all pairs of compartments of young and adult mice. (E) Cosine index sharing levels between CDR3βNT of Tregs across tissues or naïve and CD4+ effector repertoires within each young (red), adult (blue) or synthetic-based (green) mouse. Comparisons between the different tissues (SP-SP, SP-BM, BM-BM, n=9). Means are shown by horizontal black lines. Significant differences are denoted by asterisks (P-values: * <0.05, ** <0.01, t-test) and calculated between the groups: Tregs across tissues and Treg CD4+ naïve cells.
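For readers who want the definitions behind the measures named under Statistical Analysis and used throughout the figure legends (Shannon and Simpson diversity, Horn similarity, cosine similarity), a minimal sketch follows. It is not the authors' code: the study used the R package "vegan", whereas this sketch is written in Python for illustration, and all function names are hypothetical. It assumes each repertoire is given as a mapping from sequence (or V gene) to UMI count, that Simpson diversity is reported in the 1 − Σp² form, and that the Horn index is the Morisita-Horn overlap expressed as a similarity (1 minus the corresponding dissimilarity).

```python
import math


def frequencies(counts: dict) -> dict:
    """Convert UMI counts to relative frequencies, dropping zero entries."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("repertoire must contain at least one UMI")
    return {key: n / total for key, n in counts.items() if n > 0}


def shannon(counts: dict) -> float:
    """Shannon index H = -sum(p * ln p), natural logarithm."""
    return -sum(p * math.log(p) for p in frequencies(counts).values())


def simpson(counts: dict) -> float:
    """Simpson diversity in the 1 - sum(p^2) form (assumed convention)."""
    return 1.0 - sum(p * p for p in frequencies(counts).values())


def morisita_horn(a: dict, b: dict) -> float:
    """Morisita-Horn similarity: 2*sum(p*q) / (sum(p^2) + sum(q^2)).

    Depends on both which sequences are shared and how abundant they are.
    """
    p, q = frequencies(a), frequencies(b)
    shared = sum(p[k] * q[k] for k in p.keys() & q.keys())
    return 2.0 * shared / (sum(v * v for v in p.values())
                           + sum(v * v for v in q.values()))


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two frequency vectors."""
    p, q = frequencies(a), frequencies(b)
    shared = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())
                     * sum(v * v for v in q.values()))
    return shared / norm
```

Both similarity measures equal 1 for identical frequency distributions and 0 for repertoires with no shared sequences. They differ only in the normalisation (arithmetic versus geometric mean of the squared norms), which is consistent with the high correlation between Cosine and Horn scores reported in the legend above.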