Single-cell reconstitution reveals persistence of clonal heterogeneity in the murine hematopoietic system

The persistence of patterns of monoallelic expression is a controversial matter. We report a genome-wide in vivo transcriptomics approach based on allelic expression imbalance to evaluate whether the transcriptional allelic patterns of single murine hematopoietic stem cells (HSC) are still present in the respective differentiated clonal B-cell populations. For 14 genes, we show conclusive evidence for a remarkable persistence in HSC-derived B clonal cells of allele-specific autosomal transcriptional states already present in HSCs. In a striking contrast to the frequency of genes with clonal allelic expression differences in clones expanded without differentiation (up to 10%), we find that clones that have undergone multiple differentiation steps in vivo are more similar to each other. These data suggest that most of the random allele-specific stable transcriptional states on autosomal chromosomes are established de novo during cell lineage differentiation. Given that allele-specific transcriptional states are more stable in cells not undergoing extensive differentiation than in the clones we assessed after full lineage differentiation in vivo, we introduce the “Punctuated Disequilibria” model: random allelic expression biases are stable if the cells are not undergoing differentiation, but may change during differentiation between developmental stages and reach a new stable equilibrium that will only be challenged if the cell engages in further differentiation. Thus, the transcriptional allelic states may not be a stable feature of the differentiating clone, but phenotypic diversity between clones of a population at any given stage of the cell lineage is still ensured.

One of the most remarkable features of multicellular organisms is the diversity of cellular 2 phenotypes within each body. Isogenic cells display distinct phenotypes due to different 3 epigenetic features or chromatin states that underlie specific gene expression programs. 4 Technical progress in next-generation sequencing (NGS) methods has produced a wealth of 5 data on the transcriptomics and genome-wide chromatin states of different lineages and 6 stages within each lineage. However, distinguishing stable and reversible modes of gene 7 regulation remains a challenge 1 . Likewise, the epigenetic and functional inter-clonal 8 diversity within cell lineages has been difficult to capture. One proxy for approaching these 9 questions is to explore the allelic differences in expression. 10 Diploid eukaryotic organisms inherit one allele from each parent and, in most cases, the two 11 alleles of each gene are expressed at the same time and roughly similar levels in each cell. 12 Exceptions to this biallelic expression pattern arise from asymmetries between the two 13 alleles, leading to unequal expression of two alleles, which can be quantified as allelic 14 imbalances (AI). AI can have a genetic basis, due to inherited differences in each allele's 15 cis-regulatory regions or acquired somatic DNA modifications or be caused by allele- 16 specific epigenetic differences accumulated by the somatic cell. Parent-of-origin genomic 17 imprinting 2 and X-chromosome inactivation (XCI) 3 , the most well-studied examples of AI 18 due to epigenetic differences, cannot shed light on inter-clonal lineage diversity; in the 19 former process, all somatic cells from the organism are virtually identical concerning the 20 genomic imprint; in the latter, only two different cell populations emerge in females 21 (differing in which X chromosome was inactivated). Potentially more useful are the random 22 epigenetic-based AI that have been identified in autosomal genes at frequencies ranging 23 from 2% per cell type to up to 15% of all genes 4-8 . Some cells may express mostly or 24 exclusively (monoallelically) one allele of these autosomal genes, whereas other cells 25 express mostly or exclusively the other allele, a phenomenon known as random monoallelic 26 expression (RME). These imbalances in heterozygous organisms establish clones within 27 each lineage with phenotypic and functional differences, as in the extensively studied 28 antigen and olfactory receptor gene 9,10 . However, it remains to be addressed if the concept 29 applies broadly at the functional level to more genes 4 and what is the real potential for clonal 30 diversity based on the combinations of genes with distinct allelic expression levels. 31 Most of the studies reporting measurable frequencies of autosomal genes with random AI 32 were performed in collections of clones expanded in vitro. In most cases these clones were 33 2 expanded without undergoing differentiation or under limited differentiation. Building upon 1 previous work 11 , here we report the first genome-wide analysis of B and T cell populations 2 emerging in vivo from a single hematopoietic stem cell (HSC) to evaluate whether regions 3 in the autosomal chromosomes can keep stable expression patterns after extensive 4 differentiation.

6
A single HSC with long-term reconstitution gives rise to myeloid and lymphoid cells in 7 the blood 8 This work's main goal is to study stable transcriptional states using the allelic transcriptional 9 states of readouts in a clonal system recreated in vivo. For this purpose, we introduced single 10 HSCs from a donor female mouse carrying the Ly5.2 pan-leukocyte marker in a sub-lethally 11 irradiated recipient female mouse carrying the Ly5.1 marker to distinguish recipient and 12 donor cells (Supplementary Fig. 1 transplanted per animal to generate oligoclonal or polyclonal control populations (Fig. 1A). 19 The HSC population is heterogeneous, and several protocols based on flow cytometry were 20 developed to distinguish between long-term HSCs (LT-HSCs) and short-term HSCs (ST-21 HSCs) 13 . We used CD150 + and CD48signaling lymphocyte activation molecule family 22 markers on lineage negative and Sca-1 + /cKit + (LSK) cells isolated from the bone marrow of 23 donor mouse 14 to single sort the LT-HSC population (Fig. 1B). Pure single HSCs were 24 introduced by intravenous retro-orbital injection into recipient mice. The presence of donor 25 cells was evaluated over 12 weeks by identifying the Ly5.2 + cells in the blood of recipient 26 mice (Fig. 1C). From 16 experiments, 12 weeks after injections, we were able to reconstitute 27 with a single HSC 6.6% of recipient mice with a percentage of blood chimerism in the 1-  Table 1). 31 Twelve weeks after injection, the animals with chimerism were sacrificed to isolate HSC 1 derived splenic donor B cells (CD19 + IgM + ), donor thymocytes (CD4 + CD8 + ), and myeloid 2 cell populations from monoclonal and polyclonal animals (Supplementary Fig. 3 and Fig.   3 1D). We used bone marrow cells to produce secondary reconstitutions (Fig. 1E), showing 4 that these CD150 + /CD48 -HSCs originate long-term and multilineage reconstitutions. RNA 5 isolation and whole transcriptome sequencing were performed for the HSC derived B and T 6 cell samples from the reconstituted animals and B cells and T cells from an unmanipulated 7 donor female, which were used as additional non-clonal controls. 8 To compare the populations of evolving lymphocytes in the single-HSC and control 9 reconstituted animals, we used MiXCR-3.0.12 15,16 to quantify the V(D)J rearrangement 10 clonotypes of sorted B and T cell samples. We observed roughly the same number of 11 rearrangements in the single-HSC reconstitution samples, the samples produced from 50-12 200 HSCs, and the non-clonal samples, suggesting that there is a substantial cellular 13 expansion in the single-HSC derived hematopoietic system before V(D)J rearrangement, 14 which first occurs in pro-B and pro-T cells (Fig. 1F). 15 Single HSC reconstitutions produce clonal hematopoietic systems 16 HSCs isolated from one donor mouse (F1[CAST Ly5/Ly5 x B6 Ly5.2/Ly5.2 ]) were injected in 17 multiple recipient animals (F1[CAST Ly5/Ly5 x B6 Ly5.2/Ly5.1 ]), and allowed to expand in vivo. 18 HSC-derived B cells from polyclonal and monoclonal animals for three different 19 experiments (E6, E13, and E15) were FACS-sorted and cDNA was sequenced (RNA-Seq); 20 for experiment 13, HSC-derived T cells were also sorted and sequenced. B and T cells from 21 one unmanipulated donor animal were used as non-clonal control populations ( Fig. 2A). We 22 took advantage of XCI to internally confirm the monoclonality vs. oligo or polyclonality of 23 the reconstitutions. A single HSC produced not only multilineage long-term reconstitutions 24 but also hematopoietic cell populations that are clonal. In a hematopoietic system derived 25 from a single female HSC, all cells must have inactivated the same X-chromosome, 26 producing a complete skewing of the maternal and paternal X-linked AI (maternal 27 allele/(maternal + paternal alleles)), which will be equal to 1 or 0; AI will tend to 0.5 as the 28 number of clones increases. Given that the Xist non-coding RNA is only expressed from the 29 inactivated X, we first performed Sanger sequencing on Xist cDNA, focusing on two strain-30 specific SNPs. As expected, chromatograms show two overlapping peaks for the control 31 animals, whereas, only one peak was observed in the chromatogram of single-HSC 32 reconstituted animals (Supplementary Fig. 4). We then deepened this analysis by 33 calculating the AI for the X-linked genes from the NGS transcriptomics data. As expected, 1 in the control animals, the AI values are not extreme and in some samples they are fairly 2 balanced (close to 0.5), whereas in the single-HSC derived hematopoietic system mice the 3 AI for the vast majority of the X-linked genes is extreme (Fig. 2B). Intriguingly, in samples 4 from some single-HSC reconstituted animals, notably E13.24_B and E13.29_B, the median 5 AI value is slightly below one. Three scenarios were considered to explain this puzzling 6 observation: 1) more than one HSC may have erroneously be injected in these mice; 2) XCI 7 could be leaky in the sorted lymphocytes, given that inactivated X of mature naïve T and B 8 cells has been reported to lack the typical heterochromatic modifications 17 ; 3) contaminating 9 recipient (polyclonal) cells were present in the sorting cells. To sort out these hypotheses, with single HSCs and that the data do not support the hypothesis that XCI in lymphocytes 16 is leaky. Thus, the dataset is composed of monoclonal samples with a low frequency of 17 contaminating cells and oligoclonal or polyclonal control samples. 19 Genes expressed from both the active and inactive X chromosomes are known as XCI 20 escapees. In mice, XCI escapees have been studied using three systems: 1) single-cell RNA-21 seq 18,19 ; 2) heterozygous female mice knockout for X-linked genes, such as Xist or Hprt 20,21 22 or heterozygous female mice for an X-linked gene linked to a reporter 22 ; 3) and clonal 23 female F1 hybrid cell lines 23-25 . We sought to determine whether single-HSC reconstitution 24 could be an additional strategy to identify hematopoietic lineage-specific X escapees. X-25 linked genes with expression from the Xi of at least 10% of total expression 26 were 26 identified taking into account the recipient cell contamination in each monoclonal sample 27 ( Fig. 2B; see Methods). We identified a total of eight escapees, which were escapees both 28 in B and T samples: 5530601H04Rik, Eif2s3x, Gm8822, Kdm5c, Kdm6a, Pbdc1, Utp14a, 29 and Xist (Supplementary Fig. 6). These genes were plotted along the X chromosome and, 30 as verified before 20 , they are not clustered (Fig. 2C). Considering the literature, 117 genes 31 have been described as XCI-escapees in different mouse tissues and cell lines 20-23 . Some of 32 these genes were excluded from our analysis for lack of expression (36 genes), insufficient 33 number of SNPs to estimate AI (2 genes), or for not being listed in the annotation reference 1 used in this work (1 gene). Overall, 79 genes known to escape XCI were considered. 7 of 2 the escapees identified in our B and T samples are in this group of 79 genes; the only 3 exception is Gm8822, which we have identified as an XCI pseudogene escapee and was not 4 the subject of investigation in other studies. According to our analysis, 71 of the known 5 escapees are not escapees in lymphocytes, which is consistent with the notion of tissue-6 specific XCI (Supplementary Table 2). Overall, we show that single-HSC transfer is an 7 effective method to study lineage-specific XCI in blood cells. The previous analysis would fail to detect a small percentage of genes with stable epigenetic 3 states. If a gene has clone-specific AI, then the dispersion of the AI values in monoclonal 4 samples would be higher than in the control group. To further scrutinize the dataset, we 5 plotted the AI standard deviations of B-cell monoclonal (x-axis) and polyclonal (y-axis) 6 samples. The plot highlighted 14 genes with higher dispersion values in the monoclonal set 7 than in the polyclonal set (Fig. 4A). The fact that, above a threshold of standard deviation, 8 no gene is found to have a standard deviation in the polyclonal set remarkably higher than 9 in the monoclonal set suggests that the identified genes are not exceptions due to the multiple 10 comparisons that were performed (p < 2.7 x 10 -6 , one-sided Wilcoxon test). The represented for these 14 genes (Fig. 4C). The data revealed no obvious LOH for any of the 20 genes involved. In addition, these 14 genes have not been associated with LOH or replication 21 fragile sites and lack the molecular features typically associated with these regions, such as 22 high expression levels and a large size 28,29 . We conclude that the high standard deviation of 23 the AI values for these 14 genes is not a result of LOH and is likely to reflect stable 24 transcriptional biases originally present in the cloned HSC. populations over multiple differentiation steps. Our analysis suggest that the incidence of 29 such stable states is much lower than was previously reported in clonal cells not undergoing 30 differentiation 4-8 . However, in this work we used a much more stringent statistical approach 31 to allele-specific analysis, relying on technical replicates for RNA-seq libraries to exclude 32 false positives 30 .This raises the possibility that the differences could be due, at least in part, 33 to the differences in experimental and statistical procedures compared to previous studies.

1
To exclude this potential source of discrepancy, we applied the same analytical pipeline to 2 RNA-seq data generated from clonal cells that grew without differentiation. We used v-Abl 3 pro-B clonal cell lines Abl.1, Abl.2, Abl.3 and Abl.4 which were derived previously from 4 129S1/SvImJ x Cast/EiJ F1 female mice 5 , with two replicate RNA-seq libraries prepared 5 and sequenced per sample. We found that all pairwise comparisons have at least fourfold 6 more genes with significant differences (Fig. 5A) than the pairwise comparison of CAST/EiJ 7 x C57BL/6 HSC-derived clones with the highest number of genes with significant 8 differences (Fig. 3B). Furthermore, the AI values in the collection of Abelson clones also 9 have a higher dispersion than the collection of the HSC-derived clones (Fig. 5B). It is 10 unlikely that these massive differences result from genetic differences between 129S1 and 11 C57BL/6 because the two strains share an ancestor after the split from CAST/EiJ 31 . The 12 data suggest that in clones undergoing differentiation there is erasure and intraclonal 13 reestablishment of AI.

15
There is an ongoing debate on whether phenotypic diversity due to epigenetics or somatic 16 DNA recombination is a general phenomenon that improves the function of defined cellular HSCs. We report two major findings. First, the analysis of these monoclonal and genetically 23 unmanipulated hematopoietic systems allowed us to conclude that after prolonged (more 24 than four months between HSC transfer and collection) and extensive cell division and 25 lineage differentiation, the percentage of autosomal genes displaying RME is much lower 26 than the estimates from collections of clones grown in vitro (<0.2% vs. ~2-15% 4-8 ). Second, 27 to our knowledge, we have identified for the first time rare regions in the autosomal 28 chromosomes that keep stable allelic transcriptional states along HSC differentiation stages.

29
Below we discuss the implications of the technique we used and the findings for XCI, 30 hematology, RME, and phenotypic diversity. The failure to observe an increased number of X escapees in lymphocytes is probably 16 explained by the low percentage of biallelic expression in the X-linked genes in lymphocytes 17 or the fact that the experiment was not designed to address this question.

18
Autosomal versus XCI parallels 19 XCI and RME in autosomal regions have in common the stochastic component leading to 20 expression vs. silencing. A number of parallels have been drawn between these phenomena 21 44,45 ; notably, at least one gene has been found to play a role in XCI and RME 46 and high 22 concentrations of long interspersed nuclear element sequences, which were implicated in 23 XCI 47 , have been proposed to characterize loci involved in RME 48 . Despite these possible 24 common mechanistic features, our study establishes a fundamental difference: during 25 lineage differentiation, RME lacks the stability of XCI.

26
Applications of stable imprints in the autosomal regions 27 Identifying a few regions in the autosomal chromosomes with stable epigenetic states in the 28 hematopoietic lineage could be explored in the future to develop clonality assays for the 29 hematopoietic system. These assays have typically relied on finding significant skewing of 30 the XCI ratio from the 1:1 ratio, which is limited to females and has a low resolution 49 . By 31 focusing on polymorphisms in the autosomal regions with stable epigenetic states, it should 32 be possible to design clonality assays for both sexes that are more sensitive to decreases in 1 clonality than the assays based on XCI. As a way to reconcile the lack of AI in extensively differentiated in vivo grown clones with 4 the data on in vitro grown clones that do not undergo differentiation in culture, we propose 5 that the evolutionary selection pressure shaping RME is at the level of the phenotypic 6 diversity displayed by a cellular population, which does not absolutely require the 7 persistence of the allelic biases at the deep memory clonal level. What should be crucial is 8 that, within a given developmental stage, the cells forming a population keep distinct allelic 9 biases, but these may change stochastically from one stage of differentiation to the next 10 (within the clone as it undergoes differentiation) (Fig. 6). We call this model "Punctuated  unique genes, particularly in the blood cells, which circulate permanently, may be better 31 described within each cell population and along lineage differentiation by punctuated 32 disequilibria rather than phenotypic clonal stability. In the future, we will address how the 33 allelic expression equilibrium of a given gene is disturbed by time, cell cycle, the extent of 1 differentiation, the changes in the expression levels throughout the development of the gene 2 and its neighbor genes, and we will also dissect the epigenetic and genetic components of 3 this process. For now, we propose that the phenotypic diversity of a given cell population 4 could rely less on clonal stability than on the independence of each cell during the stages of 5 gene (re)activation that punctuate lineage commitment and cell activation, which may set a 6 new expression balance for the alleles until next stage of differentiation.     and estimation of confidence intervals for differential AI analysis 30 . RNA-seq reads were 21 trimmed from nextera adapters with cutadapt.v.1.14 using the wrapper trim_galore. and at least one T cell sample. (Supplementary Fig. 6).

VDJ clonotypes
13 Immunoglobulin rearrangements were detected by alignment of RNA-Seq raw data with  . RNA-seq data analysis followed the same pipeline as for HSC derived clones in vivo, 14 with exception of the maternal reference genome which was 129S1. These data were 15 originally generated for the work described in bioRxiv by Gupta et al, 2020 Preprint 52 .     The cells were gated for PI -/ APC-Cy7to exclude dead cells and any remaining lineage-positive cells, then 7 for c-Kit + /Sca-1 + to obtain Lin -Sca + cKit + (LSK) cells, and finally gated for CD48 -/CD150 + to obtain LT-