Skewed X inactivation in genetically diverse mice is associated with recurrent copy number changes at the Xce locus

Kathie Y Sun; Daniel Oreper; Sarah A Schoenrock; Rachel McMullan; Paola Giusti-Rodríguez; Vasyl Zhabotynsky; Darla R Miller; Lisa M Tarantino; Fernando Pardo-Manuel de Villena; William Valdar

doi:10.1101/2020.11.13.380535

ABSTRACT

Female mammals are functional mosaics of their parental X-linked gene expression due to X chromosome inactivation (XCI). This process inactivates one copy of the X chromosome in each cell during embryogenesis and that state is maintained clonally through mitosis. In mice, the choice of which parental X chromosome remains active is determined by the X chromosome controlling element (Xce), which has been mapped to a 176 kb candidate interval. A series of functional Xce alleles has been characterized or inferred for classical inbred strains based on biased, or skewed, inactivation of the parental X chromosomes in crosses between strains. To further explore the function-structure basis and location of the Xce, we measured allele-specific expression of X-linked genes in a large population of F1 females generated from Collaborative Cross strains. Using published sequence data and applying a Bayesian “Pólya urn” model of XCI skew, we report two major findings. First, inter-individual variability in XCI suggests mouse epiblasts on average contain 20-30 cells contributing to brain. Second, NOD/ShiLtJ has a novel and unique functional allele, Xce^f, that is the weakest in the Xce allelic series. Despite phylogenetic analysis confirming that NOD/ShiLtJ carries a haplotype almost identical to the well-characterized C57BL/6J (Xce^b), we observed unexpected patterns of XCI skewing in females carrying the NOD/ShiLtJ haplotype within the Xce. Copy number variation is common at the Xce locus and we conclude that the observed allelic series is a product of independent and recurring duplications shared between weak Xce alleles.

Introduction

Although X chromosome inactivation (XCI) was first described in the early 1960s (Lyon 1961; Beutler et al. 1962), the genetic influences and molecular mechanisms underlying this phenomenon are still incompletely understood. Embryonic stem cells of female placental mammals undergo random XCI, a process that transcriptionally inactivates one of the two X chromosomes early in development (Avner and Heard 2001; Disteche and Berletch 2015). Subsequent daughter cells carry on the initial decision, forming clusters of cells in which either the maternal or paternal X is actively transcribed. Consequently, female mammals are unique mosaics of parental X chromosome activity. XCI ensures that expression of genes on the X chromosome is functionally equalized with those of males as a form of dosage compensation.

At the epiblast stage, each embryonic cell randomly and independently inactivates one of the parental X chromosomes and locks in its cellular fate (Nesterova et al. 2001; Okamoto et al. 2004). This random selection occurs at around embryonic day E5.5 (Takagi et al. 1982; Rastan 1982), prior to differentiation into the three major embryonic germ layers and when there are 120-250 cells comprising the epiblast (Snow 1977). The inactivated X chromosome (Xi) undergoes major reorganization and becomes condensed and heterochromatic, stabilizing gene repression in subsequent somatic cells (Wutz 2011; Nora et al. 2012). Regulation of XCI is carried out in part by Xist, a cis-acting long noncoding RNA (lncRNA) that is transcribed only from the inactivated X (Xi) (Brown et al. 1991). The major X inactivation center (Xic) extends across a 450 kb multi-function region containing many elements responsible for the complex molecular cascade orchestrating XCI, including Xist and other cis elements such as Tsix and Xite (Lee et al. 1996; Cattanach et al. 1970; Ogawa and Lee 2003).

The role played by Xist is necessary but not sufficient to fully explain XCI, leading researchers to explore the larger landscape of cis and trans regulators, chromatin modifiers, and protein complexes that may comprise the Xist interactome (Dossin et al. 2020; Penny et al. 1996; Brockdorff et al. 1991; Giorgetti et al. 2016; Minajigi et al. 2015). Control of XCI is inherently genetic and thus heterogeneity in the genetic architecture of these elements may affect the expression of Xist and its antisense counterpart, Tsix, leading to disruption of the machinery controlling the counting, choice, and silencing of the inherited X chromosomes. Xite is one such example of a region harboring both allelic heterogeneity and intergenic transcription start sites resulting in differential regulation of Tsix expression (Ogawa and Lee 2003). In turn, Tsix is monoallelically expressed from the active X (Xa) and blocks Xist accumulation, thus ensuring the future Xa (Lee et al. 1999a,b).

XCI is ostensibly random, so the a priori distribution of maternal and paternal Xa is expected to be 50:50. Nevertheless, non-random biases between mouse lines have been observed for decades (Cattanach and Isaacson 1967; Cattanach 1970; Cattanach et al. 1970), leading researchers to postulate that beyond the wholesale control of inactivation, preferential skewing for one parental set of X chromosomes over the other may also be under genetic control. Skewing can take two forms. Primary skewing is when the parental chromosomes are inactivated in unequal proportion from the outset (Percec et al. 2002). Secondary skewing arises as a form of selection, where paternal and maternal chromosomes are initially inactivated at random but the embryonic cells carrying them undergo unbalanced rates of replication or death (Minks et al. 2008; Takagi 1980). The hallmarks of secondary skewing also differ, in that it can be tissue-specific and occur at any point during development. In the event of a beneficial or deleterious mutation being carried on the chromosome inherited from one parent, secondary skewing could be advantageous.

Primary skewing in mice has been associated with an allelic series on the X chromosome named the X chromosome controlling element (Xce). Five known functional Xce alleles have been described from weakest to strongest, i.e. Xce^a > Xce^e > Xce^b > Xce^c > Xce^d (Cattanach et al. 1972; Cattanach and Papworth 1981). Under this paradigm, X chromosomes with Xce^a are the least likely to remain active, and when found in female heterozygotes alongside the Xce^c allele, skewing as extreme as 20:80 is expected (Figure 1). These allelic designations are well-recognized and have been consistently observed in inbred mouse strains exhibiting replicable skews in X inactivation ratio. Among the remaining unknown features of the Xce, however, are the exact size and location of the region and the nature of genetic variation that leads to the phenomenon.

Figure 1

Estimate of maternal Xa proportion given parental Xce alleles from previously published work (Calaway et al. 2013; Sheedy 2012; Thorvaldsen et al. 2012; Wang et al. 2010; Chadwick et al. 2006; Plenge et al. 2000)

A natural starting place to search for the Xce would be within the Xic. Control of XCI was initially mapped to a genomic region which overlaps the Xic, and Xist was an early candidate for the Xce. Using translocated coat color genes, Cattanach and collaborators placed the control region between two markers for Tabby (Ta) and Mottled (Mo) coat colors (Figure 2). Upon discovery of Tsix and Xite, allelic heterogeneity across the Xic was suggested as a candidate for Xce and as an explanation for the phenotypic breadth of skewing observed in mice (Ogawa and Lee 2003). However, more recent work in the last two decades demonstrate that the Xce does not overlap the Xic, suggesting that another separate region also participates in XCI. Further refinements over the the decades (Cattanach and Papworth 1981; Simmler et al. 1993; Chadwick et al. 2006; Calaway et al. 2013) have narrowed down the region to a 176 kb minimum interval about 500 kb proximal to Xic, rich with multiple structural variants including duplications and inversions.

Figure 2

Physical map of the mouse X chromosome, highlighting locations of historical candidate Xce intervals. Zoomed in region (bottom) depicts the segmental duplications (SD) and inversions (I) examined in this study.

Researchers have thus generally coalesced around the theory that the Xce, as first defined by Cattanach (1970), describes a region on the X chromosome proximal to the Xic that influences XCI and skewing phenotypes with a number of functional alleles. Nevertheless, there remains uncertainty about the nature, function, and mechanisms of how Xce influences XCI, or if XCI is entirely controlled by a single locus. A complete characterization of Xce and its influence on XCI remains elusive.

More recent work incorporating genetic diversity in F1 mouse crosses confirmed the broad patterns of the Xce alleles and support the importance and functionality of the region in XCI; this is despite difficulty in pinpointing the functional variation and specific gene(s) causing the skew. The narrowest putative Xce region to date was reported by our group in Calaway et al. (2013) using F1 crosses of classical inbred mouse strains, wild-derived strains, and other Mus species. Those results showed that the Xce region, localized to an at-minimum 176 kb candidate region consistent with previously described intervals, confers skewed XCI in patterns compatible with the known paradigm (Chadwick et al. 2006). The minimum Xce interval comprises a series of duplications and inversions and Calaway et al. (2013) proposed that copy number variations (CNVs) may play a role in XCI skewing (Figure 2). Increased genetic diversity made possible the discovery of another functional allele in the series, Xce^e, observed in inbred PWK/PhJ mice (Crowley et al. 2015; Calaway et al. 2013; Lenarcic et al. 2018).

In addition, Thorvaldsen et al. (2012) demonstrated that mice with recombinant breakpoints across Xce had significantly different XCI ratios than either control populations of homozygous or non-backcrossed F1 mice. This finding highlights the difficulty of identifying one single region that explains the entire phenomenon and confers the totality of XCI control.

In this study we take advantage of the fairly narrow putative Xce interval to explore XCI in a genetically heterogeneous mouse population. We further define and characterize the role of Xce, and in particular of CNVs, in XCI skewing using 266 female mice from 28 F1 crosses of the Collaborative Cross (CC) multiparental mouse population (Collaborative Cross Consortium 2012; Srivastava et al. 2017). The CC are a panel of replicable and genetically diverse inbred mouse strains, each derived from an independent cross of eight inbred strains representing the three major Mus musculus subspecies: domesticus (A/J, C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ, NZO/HlLtJ, WSB/EiJ), castaneus (CAST/EiJ) and musculus (PWK/PhJ). Each CC strain possesses genome-wide contributions from the founder strains due to mixing that occurred during rounds of breeding, leading to functional genetic variation and phenotypic breadth. Generations of sib-pair mating resulted in inbred haplotype blocks, allowing for replicates of each CC strain.

Most of the previous studies quantifying XCI make use of either 1) F1 hybrids of classical inbred mouse strains, or 2) back-crossed mouse populations on an inbred background with specific and deliberate introductions of one other strain to probe the boundaries of Xce. With increased heterozygosity in the genetic background of our CC-derived sample population, we can tease apart the effects of Xce independent from the genetic background. As a result, any observed XCI will be directly attributable to primary skewing due to Xce because other loci on the X chromosome will be shuffled among the crosses. Increased genetic heterogeneity in our sample population also allows us to describe further phenotypic heterogeneity in XCI ratios beyond the known Xce alleles (Figure 3).

Figure 3

Mus musculus strains and their observed or predicted Xce alleles. CC founder one-letter code and color corresponds to CC labeling convention. Alleles are ordered in terms of strength. NOD and NZO are presumed Xce^b.

Two of the inbred laboratory strains used in generating the CC, NOD/ShiLtJ and NZO/HILtJ (henceforth referred to as NOD and NZO, respectively), have not had their Xce alleles characterized through crosses; however, both were predicted to be Xce^b due to haplotype similarity with C57BL/6J based on dense genotyping (Calaway et al. 2013). Our results interrogate the accuracy of these predictions based on observed XCI skew in F1 females with sequence derived from NOD or NZO spanning the Xce.

Our estimation of XCI skewing is more precise and generalizable compared with much of the XCI literature for two reasons. First, we incorporate X chromosome-wide expression data by quantifying from global RNA-seq. Previous work, by contrast, has generally quantified XCI using allele-specific expression (ASE) measured at a few known genes, which may present biases and inaccurate ratios due to inactivation escape, cis regulatory elements, or various confounding variables that are not due to XCI itself. Second, we report precise measures of uncertainty about our estimates using a Bayesian hierarchical statistical model that accounts for multiple sources of information. Chromosome-wide ASE data presents more opportunities for sophisticated statistical modeling to assess XCI, and there are relatively few examples of XCI proportion modeled hierarchically as a beta-distributed random variable (Larson et al. 2017; Lenarcic et al. 2018). This allows us to largely account for other subtle factors that are known to play a role, such as parent-of-origin effects (POE) in XCI whereby the paternal X (Xp) is predisposed to slightly lower levels of activation regardless of Xce allele (Wang et al. 2010; Calaway et al. 2013; Lenarcic et al. 2018). The model also, in accounting for variability in XCI among genetically identical individuals, estimates the effective number of epiblast cells at the point of X inactivation that contribute to the organ on which the RNA-seq is collected.

Another key resource we take advantage of is recently-published high coverage whole genome sequences of the CC strains (Srivastava et al. 2017; Shorter et al. 2019), which we used to specifically and accurately quantify CNVs across the Xce. By quantifying targeted, short reads, we confirm that this region hosts highly recurring sequences which appears to have implications for Xce function, and consequently, skewed XCI proportions in mouse crosses. Our characterization of the Xce region utilizes the most genetically diverse mouse population to estimate XCI to date and incorporates data from next-generation sequencing to determine ASE, providing a comprehensive quantification of chromosome-wide skewing.

Materials and Methods

Notation

Throughout this article, we denote each F1 sample by Strain 1/Strain 2, where counts from Strain 1 comprise the numerator of the XCI proportion, i.e. . Reciprocal crosses are denoted a or b, for CC001♀ x CC011♂ and CC011♂ × CC001♀, respectively. These designations were made arbitrarily, but remain consistent throughout the study. Table S1 provides a summary of the CC strains and the F1 crosses.

Mouse breeding populations and sample collection

The process of generating CC strains has been previously described in detail by Collaborative Cross Consortium (2012). CC mice were purchased from the Systems Genetics Core Facility (SGCF) at the University of North Carolina (UNC). This study includes data from 266 samples derived from a total of 29 CC strains (Figure 4) used to produce 28 F1 recombinant inbred intercross lines (CC-RIX). Data for this study was generated from two CC-RIX sample populations (SP). Heterozygosity present in the RIX lines allows us to both precisely measure ASE by comparing the expression of transcripts with allele A versus transcripts with allele B from mice that inherit the genotype AB.

Figure 4

Breeding schemes for two populations of CC-RIX mice that contributed data to this study. A) SP1 was developed to study POE, hence the reciprocal RIX design. Schematics of the X chromosome from each founder are shown to illustrate the paired comparisons. B) SP2 provided more diverse pairings of CC strains, without considering reciprocity. CC strains were paired in a quasi-loop design to generate dozens of RIX crosses with maximum diversity, of which this figure only shows the pertinent subset with RNA-seq data. Arrows point from the dam to the sire used for the CC-RIX.

SP1

This population was developed to identify strain, POE and perinatal maternal diet effects on gene expression and behavioral phenotypes in adulthood by utilizing F1 crosses of CC-RIX and has been described in detail (Schoenrock et al. 2018). Nine genetically distinct reciprocal CC-RIX were bred from 18 nonoverlapping CC strains such that samples from CC1♀ x CC2♂ and CC2♀ x CC1♂ are each represented (Figure 4a). Strain-pair selection aimed to maximize several criteria, namely the number of known brain-imprinted loci, as defined from Crowley et al. (2015) and Williamson et al. (2013) that are heterozygous between haplotypes that are identical by descent with NOD and C57BL/6J (Oreper et al. 2018).

Females from the 18 CC strains were exposed to one of four experimental diets (vitamin D deficient, protein deficient, methyl donor enriched, or standard control chow; Dyets Inc., Bethlehem, PA) during the perinatal period from 5 weeks prior to mating until their pups were weaned 3 weeks after birth. Whole brain tissue was collected from 188 female CC-RIX mice at 60 days of age (65.1 ± 4.8 days (mean and st dev)). Mice used for gene expression studies were behaviorally naïve. Tissue was collected in 26 batches with a minimum of 2 RIX/diet combinations in a batch. Mice were euthanized and whole brain was immediately extracted. A sagittal cut was made to hemisection the left and right hemisphere and tissue was immediately flash frozen in liquid nitrogen and stored at −80° until pulverization. Right brain hemispheres of all samples were pulverized using a BioPulverizer unit (BioSpec Products, Bartlesville, OK).

SP2

In the second population, 21 CC strains, 10 of which overlap with the strains in SP1, produced 19 non-reciprocal RIX. These mice were part of a study to elucidate the genetic basis of antipsychotic-induced adverse drug reactions and has been previously described (Giusti-Rodríguez et al. 2020). The larger study comprised 840 mice, representing 62 CC strains and 73 RIX lines. The design of the RIX crosses formed a quasi-loop such that each maternal line was also the paternal line for another cross (see 4b). Only 85 female samples with RNA-seq data were relevant to our analysis so the number of replicates from SP2 is smaller than from SP1 with a median of four samples per CC-RIX (range: 2-7).

Starting at 8-weeks of age, the mice were subjected to a 30 day treatment protocol where half were implanted with slow-release haloperidol (antipsychotic drug) pellets (3.0 mg/kg/day) and the other half received placebo. Treated and untreated mice were matched between sexes, RIX cross, cage, and batch. After 30 days of exposure to drug or vehicle at 12 weeks of age, mice were sacrificed by cervical dislocation without anesthesia to avoid effects on gene expression. Complete description of this experiment is provided in an independent manuscript (FPMV, unpublished).

RNA-seq preparation

SP1

For 188 mice, total RNA was extracted from ~25 mg of powdered right brain hemisphere tissue using Maxwell 16 Tissue LEV Total RNA Purification Kit (AS1220, Promega, Madison, WI). UNC HTSF core performed RNA concentration and quality check using fluorometry (Qubit 2.0 Fluorometer, Life Technologies Corp., Carlsbad, CA) and a microfluidics platform (Bioanalyzer, Agilent Technologies, Santa Clara, CA). RNA-sequencing was performed in three sequencing batches spread out over the course of the two-year collection of brain tissue once 96 samples from F1 CC-RIX offpsring were obtained. There were a median of 20 samples per CC-RIX (range: 12-32), with 3-4 samples per diet and reciprocal direction.

RNA was prepped with the Illumina TruSeq Stranded mRNA protocol for 100 base pair, stranded, single-end reads at the UNC sequencing core. An initial round of RNA-seq was conducted in December 2014 and June 2015 on HiSeq 2500 machines, and quality control (QC) was conducted on the first few batches of RNA-seq output with fastqc/0.11.8. Reads with low “Per base sequence quality” and “Per sequence quality scores” were prioritized for a second library prep. This first round of RNA-seq was followed up with more sequencing in June 2019 on a HiSeq 4000 machine to boost average read depth for each sample. The final data for each sample were subjected to the same QC criteria and combined, for an average of 24.6 million (M) reads per sample (median 17.9 M, range 10.6-109 M). 7 samples were removed due to missing X chromosomes or low read count.

SP2

Detailed methods for RNA-seq sample preparation and processing are described in an independent manuscript (FPMV, unpublished). Briefly, RNA was extracted from striatum using the Total RNA Purification 96-Well Kit (Norgen Biotek, Thorold, ON, Canada) and prepared with the Illumina (San Diego, CA) TruSeq Stranded mRNA Library Preparation Kit v2 with polyA selection using 1 μg total RNA as input. Equal amounts of all barcoded samples were pooled, to account for lane and machine effects. Each of the three pools was sequenced on eight lanes of the Illumina HiSeq 2000 for 100 base pair, stranded, single-end reads.

Quality control filtered out lanes with significant issues in terms of duplication level, fraction of mapped reads (using TopHat2) and, after summarizing reads at a gene level, fraction of mapped reads among the reads that were mapped to an exon. We only considered samples that passed 3 cutoffs: filtering by duplication (at most 40% duplication), percentage of mapped reads (at most 25% reads not mapping) and percentage of mapped reads being mapped to a gene (at most 35% not being mapped to a gene). QC procedures also resulted in corrections or discarded samples due to mismatches in labeling for strain and sex. Principle component analysis identified an outlier that was also removed. Another sample was removed due to a missing X chromosome.

Demographic details about the 266 CC-RIX samples across study populations are compiled in File S1.

Genotyping in CC-RIX and haplotype reconstruction

To ensure accurate phasing of variants, each sample in SP1 was genotyped on the MiniMUGA platform (Sigmon et al. 2020). MiniMUGA is an array-based genetic QC platform with over 11,000 probes designed to perform robust discrimination between most classical and wild-derived laboratory mouse strains. Three X0 females from SP1 that were removed from subsequent analysis were confirmed using the MiniMUGA platform, serving as a useful negative control for our ASE quantification methods. Haplotypes corresponding to each CC founder strain were reconstructed using R/qtl2 v0.20 (Broman et al. 2019). Genotype- and allele-probabilities for SP2 were inferred from previous genotyping conducted on CC strains and two to four additional animals per strain known to be their most recent common ancestors using the MegaMUGA platform. MegaMUGA comprises up to 77,800 single nucleotide polymorphism (SNP) markers that were optimized for detecting heterozygous regions and discriminating between haplotypes in homozygous regions, with a special emphasis for markers that are informative in the CC (Morgan et al. 2016). Genotyping for MiniMUGA and MegaMUGA was performed at Neogen (Lincoln, NE). Cross-referencing RIX haplotype regions with known CC and CC founder variants for consistency was particularly important at heterozygous loci where the correct parental inheritance would be critical for determining ASE.

We defined the Xce in the data based on previously published intervals because all 8 CC founder strains are represented in every sample, instead of each mouse representing one single strain. In iterative stages we defined Xce, first, based on the interval described in Chadwick et al. (2006) from 101.6-103.6 Mb, and then, refined to the minimum interval described in Calaway et al. (2013) roughly from 102.75-102.92 Mb because the narrower interval was still consistent with both our results from the broader interval and previously observed XCI skews between strains. All base pair positions throughout the manuscript are derived from the Genome Reference Consortium Mouse Build 38 (GRCm38).

Measuring allele-specific expression (ASE) in F1 females

To detect allele- and chromosome-specific expression, we have developed a novel approach using direct k-mer matching to capitalize on known variants in the sequenced CC and founder mice. Key to this method is set of virtual 25-base genotyping probe sequences created from the forward and reverse complement sequences centered about both reference and alternate variants. The reference sequence was provided by the GRCm38 reference mouse genome, based on C57BL/6J, and alternate alleles were collected from sequence data of the other 7 CC founder strains obtained from the Sanger Institute’s Mouse Genomes Project (Keane et al. 2011).

The variant set was filtered to remove unusually high and low probe-sequence counts occurring in any of the sequenced samples. An initial set of approximately 866,000 genome-wide variants were verified across CC and founder strains and became the anchors for matched pairs of k-mers with either the reference or variant allele in the center base. Roughly 590,000 of these k-mers are present in sequences with the highest transcript support level (TSL1), and of those about 414,000 are unique. We filtered k-mers to exclude those that (1) contain multiple variants, and match to (2) duplicated sequences, (3) patterns that are missing from multiple founder strains, (4) loci close to exon start sites, and critically, (5) multiple genomic locations in any CC strain. Taking these criteria into account, between 40-60% of the remaining variants were usable per chromosome. The remaining 7,957 k-mers on the X chromosome comprise a set of paired 25-mers designed to uniquely identify if a sample contains the reference or alternate allele (File S2). We used the tool msbwt v0.3.0 (run on python/2.7.11) to transform our RIX RNA-seq reads into multi-string Burrows-Wheeler Transform (BWT) formatted files to perform efficient, exact k-mer searches to count instances of each k-mer in the RNA-seq reads, thereby quantifying gene expression corresponding to each CC parent in an allele-specific fashion (Holt and McMillan 2014).

Statistical modeling of X chromosome inactivation

We designed a Bayesian hierarchical model to estimate X inactivation proportion at the level of the gene, individual, and RIX, based on the RNA data above. The model also, as a byproduct of its use of beta distributions and their connection to Pólya urns, estimates the number of brain precursor cells in the epiblast at the point of X inactivation choice, at around E5.5 (Rastan 1982; Lenarcic et al. 2018). This section describes first the model for estimating the XCI proportion associated with a given RIX, and then the estimation of the number of brain precursor cells (hereafter, the day 5 brain precursor count) based both on a given RIX and on all RIXs combined. The main components of the model are summarized in Figure 5, with more detail in Figure S1.

Figure 5

Directed acyclic graph (DAG) showing the main parameters of the hierarchical model for XCI proportion at the gene-, individual mouse-, and RIX level. The y_kgi node is observed; all other nodes are parameters to be estimated. This model is applied to each RIX separately. Estimates for the number of day 5 brain precursor cells (α₀), across RIXs are then combined through a post-processing step.

Model for RIX-specific XCI proportion

The average XCI proportion inherent to a RIX is reflected by the XCI proportions of mice from that RIX. These mouse-level XCI proportions are in turn reflected by ASE at X chromosome genes. Our model estimates mouse-level XCI proportions for genes by counting k-mers from the allele of one parent vs that of the other and treating these as outputs from a binomial distribution controlled by overall XCI proportions at the gene-, mouse and RIX level.

Consider a given RIX of CC strains u and v, where strain u is expected to have a weaker Xce allele or, in the case where both are of the same strength, the maternal strain. For counts associated with Xist, which is expressed from the Xi and should therefore have the opposite XCI proportion, the assignment of u and v were reversed. For mouse i = 1, …, n, let N_kgi be the total number of counts for k-mer k of gene g and let y_kgi be the number of these counts specifically from strain u. Then, y_kgi is distributed where μ_gi is the expected proportion expressed from strain u vs strain v for gene g in mouse i. Different genes g = 1,2, … can have different proportions μ_1i, μ_2i, …, but we require these to be centered around a common individual-level proportion μ_i as where this corresponds to the conventional parameterization, Beta(μ_iα, (1 – μ_i)α). The individual-level proportion μ_i is modeled as where c[i] denotes the combination of experimental factors c that are relevant to mouse i, μ_c is the XCI proportion predicted for that combination, and α₀ models the day 5 brain precursor count (described later). The proportion μ_c is modeled through a logit link as the outcome of a linear predictor, where intercept β₀ models an overall value for the RIX, and θ_c incorporates the effects of experimental covariates.

The set of experimental covariates in θ_c was different for SP1 and SP2. For SP1, these were perinatal diet (diet), POE (recip), and their interaction, where diet_c is a categorical predictor indicating the perinatal diet to which mice in condition c was exposed, β_D is a n_diet-vector of diet effects constrained to sum to zero, recip_c indicates the reciprocal direction ( if the dam was u, if the dam was v), β_R is the POE, and β_DR is a n_diet-vector of treatment-by-POE, also constrained to sum to zero. Across the RIXs in SP1, n_diet ranged from 2-4, corresponding to a maximum of 4, 6, or 8 conditions per RIX. For RIXs where any condition level c contained only one sample, we set θ_c = 0.

For SP2, which did not include reciprocal crosses, we initially considered using where trt_c indicates the drug treatment assignment ( for haloperidol, for placebo) of condition level c. Treatment assignment was missing for 9 mice, and in these cases we used model-based imputation, with γ_c ~ Bin(1,0.5). The treatment effect, however, was observed to be zero (see File S3), which serves as a negative control for the model given the timing of the drug dose at 8 weeks after birth, well after XCI is established. Because of the zero effect, the lack of a strong biological rationale for its inclusion, and the relative instability of its estimation for some RIX, the final model for SP2 was θ_c = 0, ie, with treatment effect excluded.

Our primary target quantity for each RIX, regardless of its population, was the overall XCI proportion, μ, given by the inverse logit of β₀, i.e.,

We additionally report XCI proportions for each mouse, μ_i for i = 1, …, n.

Prior distributions for parameters were specified as follows. For parameters modeling RIX-wide XCI, we set β₀ ~ Logistic(0,1) such that μ ~ Unif(0,1), i.e., a flat prior on overall XCI proportion. The prior set on α₀ ~ Uniform(0,1000) reflected a reasonable number of cells in the whole embryo at around E5-6 (Snow 1977). Other parameters were modeled with weakly informative priors: β_R, β_T ~ N(0,10⁴); β_T, β_TR ~ N_stz (0,10⁴ × I), where N_stz() is the multivariate normal distribution constrained so that its variates sum to zero [after Crowley et al. (2014), Appendix A]; and α ~ Ga(0.01,0.01).

Posterior distributions for parameters were obtained using Markov chain Monte Carlo (MCMC). MCMC was performed over two separate chains each run with 5 × 10⁴ (SP1) or 10⁵ (SP2) iterations, discarding the initial 10% of the iterations as burnin and thinning every 5, thus providing 1.8 × 10⁴ or 3.6 × 10⁴ posterior samples in total. Estimates are reported as posterior means (modes and medians are supplied in Tables S1–2) with 95% highest posterior density (HPD) intervals. All models were written and implemented in JAGS 4.3.0 (Plummer 2003) and R version 3.5.2 (R Core Team 2017). Code to run the statistical model is available at https://github.com/kathiesun/XCI_analysis.

Pólya urn-based estimation of the day 5 brain precursor count

In our model for X inactivation, the individual-specific XCI proportion μ_i is modeled as a beta distribution with precision α₀ (Equation 1). This use of the beta distribution can be directly related to an idealized model of cell proliferation based on a Pólya urn (Lenarcic et al. 2018) (Figure S1). The Pólya urn is a hypothetical random process that begins with an urn containing a red balls and b blue balls. A ball is drawn at random and replaced by two balls of the same color. This is repeated an infinite number of times, after which the proportion of red balls p_red in the urn will be distributed as where the precision a + b is the total number of balls at the point the process began. To the extent that proliferation of embryonic cells in alternate XCI states is analogous to the proliferation of alternate color balls in the Pólya urn, our precision parameter α₀ models the (effective) number of brain-relevant cells at the point of the E5.5 XCI decision.

We estimated 1) an α₀ for each RIX, and 2) a global α₀, based on all RIX data. Posterior distributions of α₀ for each RIX is were obtained using MCMC as described above. These were similar to each other but individually somewhat vague (see Results). To obtain a more precise estimate, we assumed the α₀ was the same across RIXs and calculated a posterior given all RIX data as the normalized product of the individual posteriors, where denotes the posterior for RIX r = 1, …, R give RIX data , and the above relation holding only because the priors on α₀ are identical and uniform such that . In practice, this involved parametrically approximating each RIX posterior, , as gamma distribution with shape and rate using the fitdistr() function from the R package MASS v7.3-51.4 (Venables and Ripley 2002), and then calculating their renormalized product, which is equivalent to a gamma distribution with shape and rate .

Point and interval estimates from the aggregate posterior approach above were comparable to those from traditional random-effects meta-analysis on the per-RIX estimates, the latter conducted with the R package meta v4.14-0 (Balduzzi et al. 2019) using both inverse variance and DerSimonian-Laird estimators (DerSimonian and Laird 1986).

Whole genome sequences of CC strains

Over the last few years, high-coverage sequences of the CC strains have been made available to the research community. These whole genome sequences (WGS) improved upon the resolution of recombination breakpoints and haplotype assignment in 75 CC strains by sequencing paired-end short reads (150 bp) at 30× coverage for a single male per strain (Srivastava et al. 2017; Shorter et al. 2019). Deeper sequencing led to improved haplotype reconstruction in samples bred from CC strains, and allowed for identification of unique mutations private to a particular strain. We incorporated additional WGS of the CC founder strains from other previously published sources (Keane et al. 2011) and from the GRCm38 mouse reference genome.

The WGS described above for 75 CC strains, along with 24 replicates of C57BL/6J mice and one replicate each of the other seven CC founders, have been made publicly available in BWT-formatted DNA-seq reads http://csbio.unc.edu/CEGSseq/index.py. These multi string BWTs were built using the msBWT python tool (Holt and McMillan 2014) from all lanes and paired ends of the Illumina read sets for these genome sequences. Resources making use of the the BWT dataset for effecient k-mer searches have been previously described (Srivastava et al. 2017).

Haplotype analyses based on WGS

The resulting WGS from the CC strains were used to assemble 8 intervals totalling 8,215 bp across the Calaway et al. (2013) minimum Xce locus in each one of the 8 CC founders. The following CC strains represented the corresponding founder as follows: reference genome for C57BL/6J; CC055 as representative of the NOD haplotype; CC020 for A/J; CC024 for 129S1/SvImJ; CC051 for WSB/EiJ; CC032 for CAST/EiJ; CC003 for PWK/PhJ; and CC002 for NZO. We first identified intervals between 0.4 – 3 Kb in length, composed of contiguous 45-mers that are present only once in the reference genome. We used the most proximal of these 45-mers as a seed and assembled the sequence in the CC strains using the consensus of the read pileups. All bases used in the consensus were supported by at least two independent reads and, within each strain, lacked any evidence of SNPs or copy number differences. Once assembled, the sequences were aligned using the EMBL-EBI tool, Clustal Omega (Madeira et al. 2019), and alignments were optimized by manual inspection to reduce the number of variants. The location, length, and CC strains used for the assembly are shown in Table S3.

Phylogenetic analysis of CC founder strains

The 8 assembled intervals spanning the Xce region were used to estimate the phylogenetic relationship based on X chromosome sequence similarity among the 8 CC founders using BEAST 2.6.3, which performs Bayesian evolutionary analysis by sampling trees (Bouckaert et al. 2019). The tree model was based on a coalescent prior for a constant population and was simplified with linked site, clock, and tree parameters among the intervals. We assumed a strict clock and the HKY substitution model (Hasegawa et al. 1985). We generated 10⁷ MCMC samples from the posterior of coalescent trees, thinning every 10³ samples, over the course of three separate runs with different starting seeds for a total of 3 × 10⁴ recorded posterior samples. We visualized the resulting tree set using DensiTree.v2.2.7, which shows different topographies with varying level of support.

Quantifying copy number variations

The set of 106 WGS with BWT-formatted data described above was also previously used to develop an occurrence-count matrix of every sequential, non-overlapping 45-mer from the standard mouse reference (GRCm38). We used this count matrix to query 45-mers across CC strains containing different functional alleles in the putative Xce interval defined in Calaway et al. (2013). By comparing and quantifying differential k-mer counts between strains, we generated discrete evidence of CNVs in regions along the X chromosome. Samples were classified into eight groups corresponding to the CC founder strains at the Xce interval, roughly between 102.65-102.95 Mb when translated to GRCm38 coordinate space. The 24 C57BL/6J replicates comprised the baseline “reference” group and the remaining CC-derived samples that were homozygous for C57BL/6J across the Xce interval were separated into another group to provide a negative control.

Strain-wide copy numbers for each k-mer were first normalized per sample and then averaged across samples in each group. Segmental duplications (SD) and inversions (I) were defined as regions where the mean difference, Δ, between 45-mer counts in the comparison strain versus the inbred C57BL/6J mean were different than 0 after k-means clustering of Δ centered at 0, > 0, and < 0. K-mers that have an average of one copy in the reference group and zero copies in the comparison group were deemed to contain nucleotide polymorphisms in the non-reference strain. The relevant 45-mers spanning the Xce are compiled in File S4, along with the X chromosome positions, the number of copies present in the reference genome, and any SD or I assignments. Alignment boundaries for each SD were determined and visualized using Gepard v1.40 with word size of 45 (Krumsiek et al. 2007).

Data availability

The processed data and code to support the results reported here are available at Figshare [link]. These data include: R scripts to re-generate figures in this manuscript and intermediate data files; full demographic data for SP1 and SP2; curated lists of 25-mers used to detect reference and variant alleles in RNA-seq data from the X chromosome along with code to generate this list; k-mer counts of the curated 25-mers for both populations; k-mer counts of 45-mers from DNA-seq using CC and CC founder strains; haplotype probabilities based on genotyping data for SP1 and based on MRCA genotypes for SP2; full MCMC sampling outputs for the statistical model. The processed incident count matrices of contiguous 45-mers for the CC strains noted above, and BWT-formatted files of all RNA-seq data are available publicly at http://csbio.unc.edu/CEGSseq/index.py. Genotyping data for the CC MRCAs are available at http://csbio.unc.edu/CCstatus/index.py?run=FounderProbs and genotyping data for SP1 have been deposited in a UNC Data-verse repository (https://dataverse.unc.edu/dataverse/MiniMUGA) under DOI number 10.15139/S3/UYURKF. All R scripts to run the statistical model, and process and generate datasets are available at https://github.com/kathiesun/XCI_analysis.

Results

XCI ratio estimated for each mouse and RIX from RNA-seq allele-specific expression

The CC-RIX females comprising this study were genetically heterogeneous mosaics of the 8 CC founder strains with one copy of each chromosome inherited in its entirety from each CC parent. In order to quantify ASE, we relied on efficient multi-string BWT searching of k-mers to identify reference and alternate alleles in the RNA-seq reads. This is akin to a microarray-based quantification strategy where each k-mer represents a probe designed based on prior knowledge, allowing us to precisely target known SNPs to measure ASE.

Counts of k-mers containing reference and alternate alleles of variants were attributed to a particular CC parent according to the haplotype reconstruction derived from genotyping data. The relative frequency of summed reads across a gene originating from one of the CC parents, e.g. CC001 in a CC001/CC011 RIX, was modeled analogously to the frequency of heads when flipping a potentially biased coin, as a binomial count that depends on an underlying long-run proportion that may deviate from 0.5. This proportion was estimated for each gene; the proportions across genes were used to estimate an underlying XCI proportion for each mouse; and the XCI proportions across mice were used to estimate a proportion specific to the RIX. These estimations were performed simultaneously using a Bayesian hierarchical model, which also 1) incoporated, and thereby corrected for, potential effects of experimental or breeding-related factors, and 2) connects the variability of mouse-specific XCI proportions about their RIX-wide mean to the subset of epiblast cells at the point of the initial XCI decision contributing to the assayed tissue, in this case the brain.

XCI is relatively consistent across genes within an individual

Across an individual mouse, gene-level estimates of XCI proportion are stable, suggesting that our quantification methodology is reliable. Figure 6 shows XCI proportion estimates for a mouse each from three CC-RIXs (all 266 samples are shown in File S5). Our Bayesian model estimates posterior distributions for XCI proportions at the gene and RIX level; we report both the means of those distribution and their 95% highest posterior density (HPD) intervals. Gaps in the X chromosome position reflect the patchwork heterozygosity and homozygosity of the CC-RIX samples. In this example, the HPD intervals are fairly narrow around the means, indicating the precision of these estimates, and for two of the mice, the XCI proportion is far from 0.5, indicating strong XCI skew (File S5).

Figure 6

Proportions of parental X chromosome representation at the gene-level (points with 95% HPD bars) for one mouse in each of three separate CC-RIX. Mouse-level estimates summarized as line across the region and shaded 95% HPD interval. Counts from the first strain cross name contribute to the numerator of the proportion. Xce allele status: A) CC006 and CC026 are both Xce^f; B) CC023 (Xce^f) has a weaker allele than CC047 (Xce^b derived from NZO); C) CC041 (Xce^f) has a weaker allele than CC051 (Xce^b derived from WSB/EiJ).

These three example mice demonstrate the consistency of estimates for each sample at genes across the X chromosome, supporting our estimates of even fairly extreme XCI skews such as those shown in the Figure 6b-c. At the mouse level, this consistency is representative of all of the samples in the experiment overall.

Pattern of XCI skew in RIXs with known Xce allele is consistent with previous studies

Our results for XCI skew were largely consistent with previously published research, given our knowledge about the underlying haplotype structure of the CC strains and the known Xce subtypes corresponding to major Mus musculus strains (Figure 3). Estimates of XCI proportions for each sample and each CC-RIX are compiled in Table S1 and File S1.

Figure 7 shows the XCI proportion at the individual- and RIX-level for every cross in the study, divided as a) crosses between strains with previously phenotyped Xce alleles, b) crosses between strains with inferred alleles. Crosses with both CC strains sharing the same Xce allele had XCI ratios at roughly 50:50. The crosses demonstrate that Xce^a is weaker than any other known allele, as only roughly 30-35% of the cells have active chromosomes bearing Xce^a (Figure 7a). Xce^e and Xce^b are approximately of equal strength, which corroborates the similar pattern seen in Calaway et al. (2013).

Figure 7

XCI proportion for all 266 samples across 28 CC-RIXs. Crosses where both CC parent contain previously-observed Xce alleles (A), and crosses where at least one CC parent is NOD or NZO within the interval (B). The y-axis labels state the CC-RIX followed by the two founder haplotypes that overlap the Xce in that CC-RIX. The Xce comparison for each group of RIX crosses is noted on the left vertical axis. Square points show the mean estimate of XCI proportion with 95% HPD bars for individual samples. The size of the point reflects the total k-mer counts from the sample, corresponding to its total RNA-seq read count and informativeness. Each cross is summarized across the RIX with round points. Crosses in gray match our predicted estimates of XCI skew based on known or inferred Xce allele whereas crosses highlighted in red do not.

Unlike the narrow HPD intervals seen at the gene and individual level (Figure 6), there is greater variability across individuals within a RIX. Some RIX from SP2 have wide HPD intervals reflecting their smaller replicate groups overall and perhaps a smaller starting amount of cells relative to SP1 due to RNA-seq sample collection for SP2 that took tissue from the striatum as opposed to whole brain tissue for SP1. An additional caveat is that haplotype reconstruction for SP2 relied upon genotyping data from the CC resource and not the specific individual mouse, which may have led to errors in assigning haplotypes, particularly near segregation points. Therefore, some RIX from SP2 have HPD intervals that are less informative, e.g. CC015/CC005, CC015/CC011, and CC021/CC002 (Figure 7).

The width of the HPD intervals at the RIX level derives from the precision, α₀, of the overall RIX-wide XCI proportion. Though we described some legitimate experimental artifacts that may contribute to lower precision in certain RIX crosses, there are also true underlying biological reasons for this variation among samples in a cross. Inter-individual variability among the samples in a RIX can be interpreted as different amounts of starting cells that correspond to our precision estimate, α₀, as described next.

Estimated number of cells in pre-brain epiblast tissue range from 20 to 30

Our statistical model for sample-specific XCI proportion implies a Pólya-urn model for cell proliferation in which one of the estimated parameters, α₀, relates to the number of brain precursor cells in the epiblast at the onset of random X inactivation. Our estimates of α₀ were strikingly concordant between the two sample populations (Figure 8 and Table S2), and so we combined them to give a single, overall value. The combined posterior distribution for α₀ followed a gamma distribution with shape parameter 100.36 and rate parameter 4.10. This translates to a point estimate (posterior mean) for α₀ of 24.48 with standard error (posterior standard deviation) of 2.44 and a 95% HPD interval of 19.93 to 29.50. Our model thus suggests that the number of initial pluripotent cells in the epiblast that eventually form brain tissue in mature mice may be around 20-30. This is a reasonable figure given the number of total cells in the epiblast ranges from around 120 on E5.5 to 660 on E6.5 (Snow 1977).

Figure 8

Bayesian inference of parameter α₀, which estimates the number of brain precursor cells in the E5.5 epiblast. Fitted posterior curves are shown for each RIX (thin lines) from SP1 (blue) and SP2 (green), with consensus posteriors for SP1 and SP2 (thick blue and green), and an overall consensus posterior (dotted magenta) centered at 24.48 (95% HPD 19.93-29.50). The horizontal axis is given on the log scale for readability. Shape, rate, mean, and variance estimates from the posterior for each line are provided in Table S2.

Unexpected XCI skewing in RIX females with the NOD Xce allele

As well as corroborating earlier studies, the CC strain data also characterized the XCI (and thus Xce subtype) for two founder strains that had not been previously evaluated. Both founder strains, NOD and NZO, had been previously assigned to Xce^b due to sequence similarity with the reference genome.

We found a striking pattern of skewed XCI in crosses containing haplotypes derived from NOD at the Xce interval from 102.65-102.95 Mb. Crosses heterozygous at this locus between NOD and any other founder exhibited profoundly skewed XCI proportions, despite the expectation that skewing would behave similarly with other strains carrying Xce^b. Our results indicate that NOD harbors a novel Xce allele conferring a lower tendency to remain active, weaker even than Xce^a. Figure 6 shows examples of gene- and sample-wide estimates of XCI proportion in three different CC-RIX crosses, each with NOD contributing the Xce region for at least one of its inherited X chromosomes.

Chromosomes bearing the Xce derived from NOD were consistently more likely to be inactivated than any other Xce allele (Figure 7b). This consistency suggests that this observed skewing is due to underlying variation that is inherent to the NOD Xce haplotype and not CC strain-specific factors, leading us to establish Xce^f from NOD as new allele in the functional series. Unexpected skews were observed in 11 out of 12 CC-RIX where one parental chromosome inherits the NOD Xce allele. This concordance was irrespective of different CC and founder strains carrying the Xce^f, and transcended different Xce pairings, suggesting that this result is genuinely due to primary skewing.

Interestingly, chromosomes from NZO behave like they carry Xce^b which follows our a priori assumptions. This narrows our focus of inquiry because both NZO and NOD are identical-by-descent in this region and harbor few SNPs compared with the mouse reference genome. As a result, we investigated whether 1) the observed XCI skewing phenomenon in NOD—and by extension, other Xce functional alleles—may be driven by chromosomal rearrangements and not necessarily sequence variation; or 2), NOD and/or NZO were improperly categorized as Xce^b based on haplotype similarity.

SNP analysis in the Xce interval show that NOD and C57BL/6J have almost identical haplotypes

Given the unexpected patterns of XCI skewing in RIX females that carry the NOD Xce haplotype in heterozygosity, we decided to use recently released WGS from 75 CC strains to determine the extent of haplotype sharing between the CC founders. To ensure that we only compare orthologous sequences we limited this analysis to genomic regions spanning the Xce candidate interval that have copy number one in the reference genome and C57BL/6J, and likely copy number one in each of the other CC founders. For each region, we assembled the CC founder sequence using the CC strain with the corresponding haplo-type and deepest sequence coverage. After aligning each region, we used standard phylogenetic analysis to determine the relationships between the founder haplotypes (Figure 9). The results were fully consistent with the previously published haplotype sharing based on microarray genotyping (Calaway et al. 2013). Briefly, the eight founders are distributed in four well supported haplotypes: one represented by CAST/EiJ, the second by PWK/PhJ, a third that includes 129S1/SvImJ and A/J; and the fourth and last comprises C57BL/6J, NOD, NZO and WSB/EiJ. We conclude that the expectation that NOD should be Xce^b is supported by haplotype sharing.

Figure 9

Phylogenetic tree generated using 8 sequences from the Xce interval among the 8 CC founder strains. A) Trees sampled from a Bayesian posterior of phylogenies, with the most frequently occurring topologies in blue, followed by green and red, respectively. The maximum clade credibility tree is shown in a thick blue line and posterior probabilities for each node in this consensus tree are shown next to the branch break point. B) Comparison of the expected Xce functional alleles based on haplotype similiarity for each of the founder strain, along with observed Xce strength and CNV repeat structure. C) Table of the observed number of copies for each SD and I in the Xce interval. *The CNV landscape of WSB/EiJ is similar to that of C57BL/6J except for a small duplication at the proximal end of SD1, which does not appear to affect XCI skewing. The SD and I pattern follow that described by Calaway et al. (2013) and Sheedy (2012), expanded to allow for more complicated duplication structures observed across the CC founder strains.

Copy number variations distinguish weaker Xce alleles from stronger ones

The minimum Xce interval between 102,747,920-102,924,411 bp identified in Calaway et al. (2013) contain a series of recurring chromosomal rearrangements. These CNVs—segmental duplications (SD) and inversions (I)—were also verified in C57BL/6J with molecular assays by Sheedy (2012). We further corroborated the chromosomal architecture of this region in the mouse reference sequence with local nucleotide comparisons (Figure 10a) and optical mapping data (Figure S2). These rearrangements have been posited as a potential explanation for the effect of the Xce functional allele series.

Figure 10

A) Dotplot of the mouse reference X chromosome from 102.7-102.9 Mb generated from pairwise sequence concordance in the genome assembly. Diagonal lines slanting down from left to right (shaded in gray) are duplications, while diagonal lines from left to right (shaded in green) are inversions. B) Difference in average counts of genomic 45-mers between sequenced samples with Xce haplotypes derived from NOD (16 CC strains and 1 inbred NOD) and counts from 24 inbred C57BL/6J strains. Arrows signify duplications (SD1-4) and inversions (I5a-b). There is a clear increase in copy number in the interval marked in magenta, R1. CNV clusters are centered at −0.895, 0.107 (shaded in gray), and 1.311. C) Schematic showing the hypothesized architecture of recurrent duplications and inversions within the Xce. The arrows in blue comprise R1, in magenta.

Using direct searches of non-overlapping 45-mers, we discovered an additional copy of the X chromosome sequence from approximately 102,802,400-102,839,400 bp, forming a continuous, 37 kb repeat spanning SD3b, SD4, and the bridge sequence between these recurring regions that we denote SD6. As shown in Figure 10b, the pertinent duplicated region marked with a magenta band clearly spans 45-mers with a consistent increase of counts centered at one extra copy. Henceforth we refer to this novel CNV as R1. Furthermore, we demonstrate that both A/J and 129S1/SvlmJ, which both carry Xce^a, share the same duplicated region, R1, as NOD with a roughly increased copy number of one (see Figures S4-S5). Replicated experiments over decades (Johnston and Cattanach 1981; Simmler et al. 1993; Calaway et al. 2013) have demonstrated that Xce^a was the weakest known Xce allele, previous to our finding in NOD.

This strong molecular evidence establishes a distinction between the reference genome and strains with weak Xce alleles, supporting the idea that variations in copy number within the putative Xce region contributes to the functional allele. We hypothesize R1 is associated with a weak Xce allele, and that the chromosomal organization of CNVs in NOD, A/J, and 129S1/SvlmJ may be described with the schematic shown in Figure 10C.

Compared with NOD, both A/J and 129S1/SvlmJ have a markedly higher number of nucleotide variations relative to the reference (Figures S4-S5). Although all three strains share a similar pattern of repeats with R1, NOD has a weaker phenotype still compared with Xce^a. Both XCI skewing and genetic differences still remain between NOD and the two strains confirmed to possess Xce^a, leading us to establish NOD as its own allele in the functional series, Xce^f.

Strikingly, the CNV pattern seen in NZO contains notable departures from those in other strains. NZO appears to have a more complex series of nested repeats such that different portions of the “weak repeat,” i.e. R1, are replicated at different frequencies (Figure 11). It carries three additional copies of SD4, two additional copies of SD3b, and one additional copy of a sequence segment distal to SD4 that we denote SD7.

Figure 11

A) Difference in average counts of genomic 45-mers between sequenced samples with Xce haplotypes derived from NZO (7 CC strains and 1 inbred NZO) and counts from 24 inbred C57BL/6J strains. CNV clusters are centered at −0.878, 0.144 (shaded in gray), and 2.947. There are 62 SNPs across the 20 kb interval, 10 of which are in R1 (0.013% of the k-mers in the interval). B) Schematic showing the hypothesized architecture of recurrent duplications and inversions within the Xce.

We confirm that NZO has unique breakpoints between SDs that NOD and the reference sequence lack by querying matches of 45-mers at the SD boundaries. Neither the reference nor NOD contain repeats of SD7, so there is only one set of sequences flanking both sides of SD7, i.e. between SD4-SD7 and SD7-I5b. NZO, on the other hand, contains two distinct sets of k-mers on both the proximal and distal ends of SD7 (see Figure S9). This provides evidence that there are two copies of SD7 in NZO, one of which is a repeat flanked by sequences that form a pattern neither observed in the reference nor NOD. Although we are not able to verify the exact locations and pattern of the NZO duplications, shown as a hypothesized schematic in Figure S9B, we do see molecular evidence supporting the quantity of repeats in NZO and the presence of unique breakpoints between duplications and inversions. This suggests that NZO has a different chromosomal architecture in this region compared with other strains, though one that does not manifest in differences of XCI pattern compared to the reference strain.

Discussion

In a previous study by our group (Calaway et al. 2013), we used a diverse set of inbred strains and allele-specific gene expression to characterize a new Xce phenotype and to narrow the putative Xce interval. That study identified a set of recurrent duplications within the Xce and suggested that variation in their copy numbers may in fact be the functional variation driving the allelic series. In the present study, we examined that hypothesis and quantified the skewing phenotypes of two CC founder strains with inferred Xce alleles based on sequence similarity with C57BL/6J across the Xce locus.

Leveraging increased genetic diversity in CC-RIX identifies novel XCI patterns

Two important features of our methods are worth noting: 1) increased heterogeneity in the genetic composition of our F1 crosses of well-described CC strains, and 2) improved mapping resolution across the X chromosome from a novel method of quantifying ASE in CC-RIX mice and modeling the resulting counts in a hierarchical Bayesian manner. The animals represented in our study are each mosaics of 8 inbred mouse strains, with one X chromosome inherited entirely from each parent. Haplotype estimates across the genome in the CC strains are stable and replicable, thus allowing us to leverage previously collected genotyping and sequencing data to inform ASE estimates in our dataset. The complete haplotype reconstruction across the X chromosome for every cross used in this study is depicted in Figure 12.

Figure 12

Heterozygous regions in each CC-RIX line illustrated by the predicted haplotype for each line. Haplotype assignments and their probabilities are represented by the color corresponding to each CC founder and the transparency of the colors, respectively. The 14 crosses in the left, non-shaded area each contain the proposed Xce^f. The beige highlighted region between 102.5 and 105.6 Mb is consistent with heterozygous regions in these crosses where exactly one founder is shared, namely, NOD. Three crosses in the middle each contain the proposed Xce^NZO and the 11 crosses in the right, non-shaded region contain other Xce alleles. The highlighted Xce interval region is consistent with the expected allelic series across the 14 non-NOD crosses and previously proposed Xce intervals.

Our methods relied upon a novel way to quantify ASE across the X chromosome by querying a set of curated 25-mers among the RNA-seq reads from each of the 266 mice in our study population. The 25-mers specifically targeted reference and alternate alleles at known polymorphisms in coding regions, and fed into a hierarchical Bayesian model to quantify XCI proportion for each cross and sample. Among the Xce alleles that have previously been characterized, our estimated XCI proportions matched what we would expect based on data from the literature (see Figure 7a). This finding serves to corroborate historical observations and to provide validation for our Xce imputation method and statistical model.

We observed highly variable proportions in some crosses, potentially owing to multiple sources of variation. Some CC strains have segregating boundaries at or near the Xce interval, making the assignment of CC strain from which the haplotype derives more uncertain, such as near 102.5 Mb in CC062. As shown in Figure 12, CC062/CC035 defines the lower boundary of the maximum Xce interval because the data is consistent with the XCI ratio being 50:50, i.e. between two Xce^a functional alleles of equal strength. In reality, CC062 has a large recombination interval between 129S1/SvImJ and NOD near this proximal boundary. The broad range of proportions we actually observe suggests that Xce^a / Xce^a may not be an appropriate designation for every sample in this cross and that some may indeed be Xce^a / Xce^f.

In addition, we have few samples and crosses with Xce^e derived from PWK/PhJ. Our findings suggest that it is similar in strength to Xce^b and not demonstrably weaker, consistent with previous findings (Calaway et al. 2013). As noted in the Methods, samples from SP2 had fewer replicates because only the females were relevant to this study, potentially leading to more RIX-wide variability. Lastly, RNA-seq tissue collection for SP2 used the striatum as opposed to whole brain tissue, resulting in a smaller starting amount of cells relative to SP1. XCI proportions for individuals in SP2 thus had higher variance, leading to less stable estimates and larger HPD intervals.

Pólya urn-based approximation to the number of cells in pre-brain epiblast tissue

Inter-individual variability in XCI skew among genetically-identical samples within a RIX cross can be partitioned into experimental and biological variation. Although the two cannot be easily disentangled, we surmise that the biological variation derives, in part, from the precision of the beta distributed parameter for each estimate of mouse-specific XCI proportion. At the point of inactivation choice, the cells in the epiblast are akin to balls in a Pólya urn. The Pólya urn describes a random process in which an intial number of red and blue balls undergo successive rounds of randomly assigned duplications; after infinite rounds, the final proportion of red vs blue balls is a random number whose variability is a function of the total starting number. Urns that start with a greater number of balls are more stable against random fluctuations in the proportions of the red to blue balls, and have proportions more closely gathered around the starting proportion; urns starting with a smaller number lead to a final proportion that is more variable.

Analogously, the urn represents a RIX and α₀ represents the starting number of pluripotent cells that are involved in the decision to activate either the maternal or paternal chromosome at around E5.5 and will eventually form brain tissue (or whichever tissue undergoes an ASE assay) in the mature mouse. Though we first estimate α₀ in each RIX individually, we assume that the parameter should be similar in each individual cross, given the stability of biology underlying the XCI process.

Though we are unable to verify this quantity of 20-30 pre-brain epiblast cells, it does seem reasonable given the total number of cells in the epiblast is between 120-660 at E5-6 (Snow 1977).

Copy number of recurrent duplications may explain the weakness of Xce^a and novel Xce^f, found in NOD

Both NOD and NZO were previously predicted to express the Xce^b functional allele based on haplotype similarities to C57BL/6J. Our results do not support this conclusion in NOD. We characterize the Xce locus derived from NOD as a separate functional allele in the series, Xce^f, because we find it to be consistently weaker than all other known Xce haplotypes. Crosses involving 6 CC strains (CC012, CC023, CC026, CC028, CC041, and CC065) that contain the NOD-derived Xce region corroborate the weakness of the novel Xce^f (Figure 7). This continuity leads us to conclude that chromosomes carrying the NOD Xce allele contain sequence-level variation in this interval, manifesting in primary inactivation bias against keeping that parental copy of the X chromosome active.

We confirm that both NOD and NZO share sequence similarity in the Xce interval with the reference genome (Figure 9) using haplotype assembly from CC WGS and phylogenetic analysis. As a result, we conclude that CNV structure may be the causative factor for this phenomenon. CNV analysis (Figure 10b) reveals a large interval in which the normalized counts for all k-mers are consistent with the presence of an extra copy in NOD. We tentatively conclude that the repeat, R1, represents a genuine copy number increase of a contiguous 37 kb-long segment in NOD. R1 includes the entire SD3b and SD4, as well as the bridge sequence connecting them that is not duplicated in the reference (Figure 10b). The novel R1 appears to be recent; the last duplication found in the reference genome is that of SD3a-b inverting and inserting distally to form I5a-b, and is demonstrated by the sequence similarity between these two sets of sequences in both k-mer identity over sliding windows and optical mapping data. This general rearrangement structure is similar between two weak alleles, Xce^a and Xce^f. A/J and 129S1/SvlmJ express Xce^a, and both strains share with NOD evidence of the same SD3a-b to I5a-b inversion alongside the novel repeat, R1.

NZO expresses Xce^b despite complicated CNV organization

XCI estimates from crosses containing an Xce region derived from NZO do not deviate from our hypothesized ratios based on the strain carrying Xce^b. Whether the XCI proportions seen in NZO indicate that its Xce interval is the same molecular species with genuinely identical function as Xce^b, or if the two phenotypes have converged to appear similar is unclear. We would expect NZO to have a duplication structure akin to that of the reference mouse genome, or at least a different structure to that of NOD, A/J, and 129S1/SvlmJ. Our analysis of NZO is hampered by the lack of CC-RIX in our data with NZO in the putative Xce region. One of our main study populations, SP1, was designed to maximize heterozygous loci between C57BL/6J and NOD, which explains the predominance of both strains in our down-stream analysis. Nevertheless, the three RIX that contain NZO are consistent with the strain bearing Xce^b or at least a functional allele of the same strength.

In NZO, we find a more complex pattern of SD’s and I’s than seen in other strains. As shown in Figure 11, NZO appears to harbor one increased copy of SD7, two increased copies of SD3b and SD6, and three increased copies of SD4. We confirm the increased copy number of these elements by observing novel sets of boundaries between SD’s that are not present in C57BL/6J, NOD, or other strains. For example, NZO contains two distinct sets of sequences on the distal end of SD7, suggesting that there are two real copies of the segment in the NZO sequence: one of which leads into I5b and is present in the C57BL/6J sequence, and the other of which is novel (see Figure S9).

Thus, the copy number pattern observed in NZO is indeed different than what we observe in NOD, A/J and 129S1/SvlmJ, and the reference genome. NOD and NZO were predicted to share the same skewing phenotype as the reference based on sequence similarity at the SNP level. Our data demonstrates that the NOD Xce haplotype has a novel functional allele, distinct from NZO and any known Xce allele. CNVs can explain the difference between the functional Xce alleles present in NOD and C57BL/6J but they are not able to discriminate between NOD and strains with the Xce^a allele. This is not particularly surprising given that this simplified approach ignores the potential effect of variation outside of the recent NOD duplication and do not consider higher order factors associated with duplications such as location and orientation of the duplicated segment.

CNV abundance and organization, along with sequence variation, may all play a role in Xce strength and XCI skewing

We do not capture a full portrait of how duplicated segments are organized in this complicated region. Copy number may well play a role, but there seem to be additional factors distinguishing NOD from strains in Xce^a, as all of these strains appear to contain the novel R1. In addition, the duplication structure found in NZO is more complex than what we observe in other strains yet this does not translate to a detectably different phenotype compared with C57BL/6J. This suggests that alternate recurrent duplication structures, each containing variations relative to the reference mouse genome, may present technically different Xce species that converge in similar phenotypes. This is supported by phylogenetic analysis showing that A/J and 129S1/SvlmJ are more similar to each other, while NOD and C57BL/6J evolved separately along another branch. The larger region surrounding the Xce contains many other recurrent duplications and repeats, indicating that it is potential “hotspot” of copy number changes (Sheedy 2012). We provide evidence that CNV are important to the Xce phenotype and that simply looking at sequence variation is insufficient, but the exact cause and nature of those aberrations remain elusive.

Based on genotyping information collected on the animals in SP1 and from ancestral CC samples, we can map the control element associated with our observed results to a chromosomal location that is consistent with the historical interval as set forth by Calaway et al. (2013), Chadwick et al. (2006), Simmler et al. (1993), and others. In our CNV analysis, we focus on a small portion of this interval near the proximal end from 102.7-102.9 Mb. Our analysis does not preclude the possibility that more distal genomic elements—including sequence variations, CNVs, chromosomal structure, etc.—may also contribute to the function of the element. The immediate region surrounding our putative Xce interval is highly duplicated and carries convoluted patterns of CNVs, only a few of which we examined in this study. Further work into the nature of the Xce may explore the patterns and inheritance of those rearrangements. For example, some molecular evidence from Sheedy (2012) suggests that a distal duplication distinguishes CAST/EiJ from C57BL/6J. Broader molecular characterization of the extent that CNV plays a role in enacting this control will be required to fully understand the function of Xce.

Supporting information

File S1 List of all 266 mouse samples (CSV).

File S2 Complete data on 7,957 25-mers used to quantify gene expression on X chromosome (CSV).

File S3 Posterior mode, mean, median, and 95% highest posterior densities determined by the Bayesian hierarchical model for all covariates in GLM performed for SP1 and SP2 (CSV).

File S4 Complete data on non-overlapping genomic 45-mers used to determine CNV in Xce interval (CSV).

File S5 Individual XCI proportion estimates across X chromosome for all 266 samples in study, akin to Figure 6 (PDF).

Supplemental Figures and Tables

The following pages provide supplemental figures and tables for the manuscript “Skewed X inactivation in genetically diverse mice is associated with recurrent copy number changes at the Xce locus” by Sun et al.

Figure S1

DAG showing in detail the parameters of the hierarchical model for XCI proportion at the gene-, individual mouse-, and RIX level. Estimates for the number of day 5 brain precursor cells (α₀) across RIXs are then combined through a post-processing step. Arrows indicate dependencies between nodes. Dashed arrows are probabilistic dependencies, and labels denote the probability distributions linking the nodes. For distributions with multiple parameters, the star (*) indicates the parameter of the parent node, and the dot (·) is a placeholder for the other parameter. We use a parameterization of the beta distribution based on mean and precision, as described in the text. Solid arrows are deterministic dependencies, and labels denote the operation linking the nodes.

Figure S2

Bionano optical mapping alignment of the mouse reference sequence (labeled Ref 21) and CC005, which has a C57BL/6J-derived Xce region. Lines connecting the two sequences represent shared markers, and the red line linking one marker on CC005 and two on Ref 21 indicates that the marker on CC005 shows up twice in the reference. The shared lines in the marked region line up with the known duplication and inversion structure in C57BL/6J (blue arrows), and suggests that SD3a-b inverts and duplicates to form I5a-b.

Figure S3

Counts of genomic 45-mers spanning the proposed Xce: difference between average counts across the interval in sequenced mice with haplotypes derived from C57BL/6J (10 CC strains) and counts from 24 inbred C57BL/6J mice.

Figure S4

Counts of genomic 45-mers spanning the proposed Xce: difference between average counts across the interval in sequenced mice with haplotypes derived from A/J (4 CC strains and one inbred A/J representative) and counts from 24 inbred C57BL/6J mice.

Figure S5

Counts of genomic 45-mers spanning the proposed Xce: difference between average counts across the interval in sequenced mice with haplotypes derived from 129S1/SvlmJ (15 CC strains and one inbred 129S1/SvlmJ representative) and counts from 24 inbred C57BL/6J mice.

Figure S6

Counts of genomic 45-mers spanning the proposed Xce: difference between average counts across the interval in sequenced mice with haplotypes derived from CAST/EiJ (3 CC strains and one inbred CAST/EiJ representative) and counts from 24 inbred C57BL/6J mice.

Figure S7

Counts of genomic 45-mers spanning the proposed Xce: difference between average counts across the interval in sequenced mice with haplotypes derived from PWK/PhJ (2 CC strains and one inbred PWK/PhJ representative) and counts from 24 inbred C57BL/6J mice.

Figure S8

Counts of genomic 45-mers spanning the proposed Xce: difference between average counts across the interval in sequenced mice with haplotypes derived from WSB/EiJ (7 CC strains and one inbred WSB/EiJ representative) and counts from 24 inbred C57BL/6J mice.

Figure S9

Counts of k-mers spanning the proximal and distal ends of SD7 for both C57BL/6J and NZO. Sequence data from NZO clearly show at least two unique boundaries, suggesting that SD7 is indeed duplicated in this strain.

View this table:

Table S1

Summary of all RIX crosses phenotyped in this study, the number of samples per cross, the CC haplotype imputed in the Xce interval (first Xce 1-2 column), and the corresponding Xce alleles (second Xce 1-2 column).

View this table:

Table S2

Shape, rate, mean, variance, and 95% HPD estimates for gamma-distributed α₀ posterior distributions from MCMC run on individual CC-RIX.

View this table:

Table S3

Assembled 8 sequences spanning the Xce interval generated from high-coverage CC WGS used to infer the phylogenetic relationships between the CC strains based on this region.

Acknowledgements

This work was funded by a National Institute of Mental Health (NIMH) grant R01-MH100241 to WV and LMT, NIMH and National Human Genome Research Institute (NHGRI) grants (P50-MH090338, P50-HG006582, and U24-HG010100) to FPMV, and a National Institute of General Medical Sciences (NIGMS) grant R35-GM127000 to WV. KYS also received partial support from the UNC-CH Caroline H. and Thomas S. Royster Fellowship and NIGMS training grant, T32-GM067553. VZ is funded via National Institute of Environmental Health Sciences training grant, T32-ES007018. MiniMUGA was developed under a service contract to FPMV and other investigators at UNC-CH from Neogen Inc., Lincoln, NE. None of the authors have a financial relationship with Neogen Inc. apart from the service contract listed above. The authors have no other conflict of interest to declare. Members of Leonard McMillan’s lab at UNC-CH, including Maya Najarian and Sebastian Sigmon, provided guidance with the WGS data and building k-mers. James Xenakis was instrumental in accessing data from SP2. We thank Greg Keele and other members of the Valdar laboratory for helpful discussions and thoughtful comments on the manuscript.

Literature Cited

↵
Avner, P. and E. Heard, 2001 X-chromosome inactivation: Counting, choice and initiation. Nature Reviews Genetics 2: 59–67.
OpenUrl CrossRef PubMed Web of Science
↵
Balduzzi, S., G. Rücker, and G. Schwarzer, 2019 How to perform a meta-analysis with R: a practical tutorial. Evidence-Based Mental Health pp. 153–160.
↵
Beutler, E., M. Yeh, and V. F. Fairbanks, 1962 The normal human female as a mosaic of X-chromosome activity: studies using the gene for C-6-PD-deficiency as a marker. Proceedings of the National Academy of Sciences of the United States of America 48: 9–16.
OpenUrl FREE Full Text
↵
Bouckaert, R., T. G. Vaughan, J. Barido-Sottani, S. Duchêne, M. Fourment, et al., 2019 BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology 15: e1006650.
OpenUrl
↵
Brockdorff, N., A. Ashworth, G. F. Kay, P. Cooper, S. Smith, et al., 1991 Conservation of position and exclusive expression of mouse Xist from the inactive X chromosome. Nature 351: 329–331.
OpenUrl CrossRef PubMed Web of Science
↵
Broman, K. W., D. M. Gatti, P. Simecek, N. A. Furlotte, P. Prins, et al., 2019 R/qtl2: Software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics 211: 495–502.
OpenUrl Abstract/FREE Full Text
↵
Brown, C. J., A. Ballabio, J. L. Rupert, R. G. Lafreniere, M. Grompe, et al., 1991 A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349: 38–44.
OpenUrl CrossRef PubMed Web of Science
↵
Calaway, J. D., A. B. Lenarcic, J. P. Didion, J. R. Wang, J. B. Searle, et al., 2013 Genetic architecture of skewed X inactivation in the laboratory mouse. PLoS genetics 9: e1003853.
OpenUrl
↵
Cattanach, B. M., 1970 Controlling elements in the mouse X-chromosome: III. Influence upon both parts of an X divided by rearrangement. Genetical Research 16: 293–301.
OpenUrl CrossRef PubMed Web of Science
↵
Cattanach, B. M. and J. H. Isaacson, 1967 Controlling elements in the mouse X-chromosome. Genetics 57: 331–346.
OpenUrl FREE Full Text
↵
Cattanach, B. M. and D. Papworth, 1981 Controlling elements in the mouse. V. Linkage tests with X-linked genes. Genetical research 38: 57–70.
OpenUrl PubMed Web of Science
↵
Cattanach, B. M., J. N. Perez, and C. E. Pollard, 1970 Controlling elements in the mouse X-chromosome. II. Location in the linkage map. Genetical research 15.
↵
Cattanach, B. M., C. E. Williams, and C. W. BM Cattanach, 1972 Evidence of non-random X chromosome activity in the mouse. Genetics Research 19: 229–40.
OpenUrl
↵
Chadwick, L. H., L. M. Pertz, K. W. Broman, M. S. Bartolomei, and H. F. Willard, 2006 Genetic control of X chromosome inactivation in mice: Definition of the Xce candidate interval. Genetics 173: 2103–2110.
OpenUrl Abstract/FREE Full Text
↵
Collaborative Cross Consortium, 2012 The genome architecture of the collaborative cross mouse genetic reference population. Genetics 190: 389–401.
OpenUrl Abstract/FREE Full Text
↵
Crowley, J. J., Y. Kim, A. B. Lenarcic, C. R. Quackenbush, C. J. Barrick, et al., 2014 Genetics of adverse reactions to haloperidol in a mouse diallel: A drug-placebo experiment and Bayesian causal analysis. Genetics 196: 321–347.
OpenUrl Abstract/FREE Full Text
↵
Crowley, J. J., V. Zhabotynsky, W. Sun, S. Huang, I. K. Pakatci, et al., 2015 Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nature Genetics 47: 353–360.
OpenUrl CrossRef PubMed
↵
DerSimonian, R. and N. Laird, 1986 Meta-analysis in clinical trials. Controlled Clinical Trials 7: 177–188.
OpenUrl CrossRef PubMed Web of Science
↵
Disteche, C. M. and J. B. Berletch, 2015 X-chromosome inactivation and escape. Journal of Genetics 94: 591–599.
OpenUrl CrossRef PubMed
↵
Dossin, F., I. Pinheiro, J. J. Żylicz, J. Roensch, S. Collombet, et al., 2020 SPEN integrates transcriptional and epigenetic control of X-inactivation. Nature 578: 455–460.
OpenUrl CrossRef
↵
Giorgetti, L., B. R. Lajoie, A. C. Carter, M. Attia, Y. Zhan, et al., 2016 Structural organization of the inactive X chromosome in the mouse. Nature 535: 575–579.
OpenUrl CrossRef PubMed
↵
Giusti-Rodríguez, P., J. G. Xenakis, J. J. Crowley, R. J. Nonneman, D. M. DeCristo, et al., 2020 Antipsychotic Behavioral Phenotypes in the Mouse Collaborative Cross Recombinant Inbred Inter-Crosses (RIX). G3: Genes, Genomes, Genetics p. g3.400975.2020.
↵
Hasegawa, M., H. Kishino, and T. aki Yano, 1985 Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22: 160–174.
OpenUrl CrossRef PubMed Web of Science
↵
Holt, J. and L. McMillan, 2014 Merging of multi-string BWTs with applications. Bioinformatics 30: 3524–3531.
OpenUrl CrossRef PubMed
↵
Johnston, P. G. and B. M. Cattanach, 1981 Controlling elements in the mouse. IV. Evidence of non-random X-inactivation. Genetical research 37: 151–60.
OpenUrl CrossRef PubMed Web of Science
↵
Keane, T. M., L. Goodstadt, P. Danecek, M. A. White, K. Wong, et al., 2011 Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477: 289–294.
OpenUrl CrossRef PubMed Web of Science
↵
Krumsiek, J., R. Arnold, and T. Rattei, 2007 Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23: 1026–8.
OpenUrl CrossRef PubMed Web of Science
↵
Larson, N. B., Z. C. Fogarty, M. C. Larson, K. R. Kalli, K. Lawrenson, et al., 2017 An integrative approach to assess X-chromosome inactivation using allele-specific expression with applications to epithelial ovarian cancer. Genetic Epidemiology 41: 898–914.
OpenUrl
↵
Lee, J. T., L. S. Davidow, and D. Warshawsky, 1999a Tsix, a gene antisense to Xist at the X-inactivation centre. Nature Genetics 21: 400–404.
OpenUrl CrossRef PubMed Web of Science
↵
Lee, J. T., N. Lu, and N. L. JT Lee, 1999b Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell 99: 47–57.
OpenUrl CrossRef PubMed Web of Science
↵
Lee, J. T., W. M. Strauss, J. A. Dausman, and R. Jaenisch, 1996 A 450 kb transgene displays properties of the mammalian X-inactivation center. Cell 86: 83–94.
OpenUrl CrossRef PubMed Web of Science
↵
Lenarcic, A. B., J. D. Calaway, F. P.-M. de Villena, and W. Valdar, 2018 Bayesian Manifold-Constrained-Prior Model for an Experiment to Locate Xce. ArXiv p. 1812.08863.
↵
Lyon, M. F., 1961 Gene action in the X-chromosome of the mouse (mus musculus L.). Nature 190: 372–373.
OpenUrl CrossRef PubMed Web of Science
↵
Madeira, F., Y. M. Park, J. Lee, N. Buso, T. Gur, et al., 2019 The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Research 47: W636–W641.
OpenUrl
↵
Minajigi, A., J. E. Froberg, C. Wei, H. Sunwoo, B. Kesner, et al., 2015 A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349: aab2276.
OpenUrl Abstract/FREE Full Text
↵
Minks, J., W. P. Robinson, and C. J. Brown, 2008 A skewed view of X chromosome inactivation. Journal of Clinical Investigation 118: 20–23.
OpenUrl CrossRef PubMed Web of Science
↵
Morgan, A. P., C. P. Fu, C. Y. Kao, C. E. Welsh, J. P. Didion, et al., 2016 The mouse universal genotyping array: From substrains to subspecies. G3: Genes, Genomes, Genetics 6: 263–279.
OpenUrl
↵
Nesterova, T. B., S. C. Barton, M. A. Surani, and N. Brockdorff, 2001 Loss of Xist imprinting in diploid parthenogenetic preimplantation embryos. Developmental Biology 235: 343–350.
OpenUrl CrossRef PubMed Web of Science
↵
Nora, E. P., B. R. Lajoie, E. G. Schulz, L. Giorgetti, I. Okamoto, et al., 2012 Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485: 381–385.
OpenUrl CrossRef PubMed Web of Science
↵
Ogawa, Y. and J. T. Lee, 2003 Xite, X-inactivation intergenic transcription elements that regulate the probability of choice. Molecular Cell 11: 731–743.
OpenUrl CrossRef PubMed Web of Science
↵
Okamoto, I., A. P. Otte, C. D. Allis, D. Reinberg, and E. Heard, 2004 Epigenetic Dynamics of Imprinted X Inactivation during Early Mouse Development. Science 303: 644–649.
OpenUrl Abstract/FREE Full Text
↵
Oreper, D., S. A. Schoenrock, R. McMullan, R. Ervin, J. Farrington, et al., 2018 Reciprocal F1 Hybrids of Two Inbred Mouse Strains Reveal Parent-of-Origin and Perinatal Diet Effects on Behavior and Expression. G3: Genes, Genomes, Genetics 8: 3447–3468.
OpenUrl
↵
Penny, G. D., G. F. Kay, S. A. Sheardown, S. Rastan, and N. Brockdorff, 1996 Requirement for Xist in X chromosome inactivation. Nature 379: 131–137.
OpenUrl CrossRef PubMed Web of Science
↵
Percec, I., R. M. Plenge, J. H. Nadeau, M. S. Bartolomei, and H. F. Willard, 2002 Autosomal dominant mutations affecting X inactivation choice in the mouse. Science 296: 1136–1139.
OpenUrl Abstract/FREE Full Text
↵
Plenge, R. M., I. Percec, J. H. Nadeau, and H. F. Willard, 2000 Expression-based assay of an X-linked gene to examine effects of the X-controlling element (Xce) locus. Mammalian Genome 11: 405–408.
OpenUrl CrossRef PubMed Web of Science
↵
1. K. Hornik,
2. F. Leisch, and
3. A. Zeileis
Plummer, M., 2003 JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), edited by K. Hornik, F. Leisch, and A. Zeileis, pp. 1–10, Vienna, Austria.
↵
R Core Team, 2017 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
↵
Rastan, S., 1982 Timing of X-chromosome inactivation in postimplantation mouse embryos. Journal of embryology and experimental morphology 71: 11–24.
OpenUrl PubMed Web of Science
↵
Schoenrock, S. A., D. Oreper, J. Farrington, R. C. McMullan, R. Ervin, et al., 2018 Perinatal nutrition interacts with genetic background to alter behavior in a parent-of-origin-dependent manner in adult Collaborative Cross mice. Genes, Brain and Behavior 17: e12438.
OpenUrl CrossRef
↵
Sheedy, C. B., 2012 Genomic Characterization and Comparative Analysis of the Xce Candidate Region. Ph.D. thesis, Duke University.
↵
Shorter, J. R., M. L. Najarian, T. A. Bell, M. Blanchard, M. T. Ferris, et al., 2019 Whole Genome Sequencing and Progress Towards Full Inbreeding of the Mouse Collaborative Cross Population. G3: Genes, Genomes, Genetics 9: 1303–1311.
OpenUrl
↵
Sigmon, J. S., M. Blanchard, R. Baric, T. Bell, J. Brennan, et al., 2020 Content and performance of the MiniMUGA genotyping array, a new tool to improve rigor and reproducibility in mouse research. bioRxiv p. 2020.03.12.989400.
↵
Simmler, M. C., B. M. Cattanach, C. Rasberry, C. Rougeulle, and P. Avner, 1993 Mapping the murine Xce locus with (CA)n repeats. Mammalian Genome 4: 523–530.
OpenUrl CrossRef PubMed Web of Science
↵
Snow, M. H. L., 1977 Gastrulation in the mouse: Growth and regionalization of the epiblast. Development 42.
↵
Srivastava, A., A. P. Morgan, M. L. Najarian, V. K. Sarsani, J. S. Sigmon, et al., 2017 Genomes of the Mouse Collaborative Cross. Genetics 206: 537–556.
OpenUrl Abstract/FREE Full Text
↵
Takagi, N., 1980 Primary and secondary nonrandom X chromosome inactivation in early female mouse embryos carrying Searle’s translocation T(X; 16)16H. Chromosoma 81: 439–459.
OpenUrl CrossRef PubMed Web of Science
↵
Takagi, N., O. Sugawara, and M. Sasaki, 1982 Regional and temporal changes in the pattern of X-chromosome replication during the early post-implantation development of the female mouse. Chromosoma 85: 275–286.
OpenUrl CrossRef PubMed Web of Science
↵
Thorvaldsen, J. L., C. Krapp, H. F. Willard, and M. S. Bartolomei, 2012 Nonrandom X chromosome inactivation is influenced by multiple regions on the murine X chromosome. Genetics 192: 1095–1107.
OpenUrl Abstract/FREE Full Text
↵
Venables, W. N. and B. D. Ripley, 2002 Modern Applied Statistics with S. Springer, New York, fourth edition, ISBN 0-387-95457-0.
↵
Wang, X., P. D. Soloway, and A. G. Clark, 2010 Paternally biased X inactivation in mouse neonatal brain. Genome biology 11: R79.
OpenUrl CrossRef PubMed
↵
Williamson, C., A. Blake, S. Thomas, C. Beechey, J. Hancock, et al., 2013 Mouse Imprinting Data and References.
↵
Wutz, A., 2011 Gene silencing in X-chromosome inactivation: Advances in understanding facultative heterochromatin formation. Nature Reviews Genetics 12: 542–553.
OpenUrl CrossRef PubMed

View the discussion thread.

Posted November 14, 2020.

Download PDF

Citation Tools

Subject Area

Genetics

Subject Areas

All Articles

Animal Behavior and Cognition (5215)
Biochemistry (11752)
Bioengineering (8752)
Bioinformatics (29200)
Biophysics (14974)
Cancer Biology (12096)
Cell Biology (17411)
Clinical Trials (138)
Developmental Biology (9421)
Ecology (14182)
Epidemiology (2067)
Evolutionary Biology (18308)
Genetics (12245)
Genomics (16803)
Immunology (11869)
Microbiology (28097)
Molecular Biology (11594)
Neuroscience (60969)
Paleontology (451)
Pathology (1871)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2886)
Systems Biology (7340)
Zoology (1651)

[1] ↵
Avner, P. and E. Heard, 2001 X-chromosome inactivation: Counting, choice and initiation. Nature Reviews Genetics 2: 59–67.
OpenUrl CrossRef PubMed Web of Science

[2] ↵
Balduzzi, S., G. Rücker, and G. Schwarzer, 2019 How to perform a meta-analysis with R: a practical tutorial. Evidence-Based Mental Health pp. 153–160.

[3] ↵
Beutler, E., M. Yeh, and V. F. Fairbanks, 1962 The normal human female as a mosaic of X-chromosome activity: studies using the gene for C-6-PD-deficiency as a marker. Proceedings of the National Academy of Sciences of the United States of America 48: 9–16.
OpenUrl FREE Full Text

[4] ↵
Bouckaert, R., T. G. Vaughan, J. Barido-Sottani, S. Duchêne, M. Fourment, et al., 2019 BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology 15: e1006650.
OpenUrl

[5] ↵
Brockdorff, N., A. Ashworth, G. F. Kay, P. Cooper, S. Smith, et al., 1991 Conservation of position and exclusive expression of mouse Xist from the inactive X chromosome. Nature 351: 329–331.
OpenUrl CrossRef PubMed Web of Science

[6] ↵
Broman, K. W., D. M. Gatti, P. Simecek, N. A. Furlotte, P. Prins, et al., 2019 R/qtl2: Software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics 211: 495–502.
OpenUrl Abstract/FREE Full Text

[7] ↵
Brown, C. J., A. Ballabio, J. L. Rupert, R. G. Lafreniere, M. Grompe, et al., 1991 A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349: 38–44.
OpenUrl CrossRef PubMed Web of Science

[8] ↵
Calaway, J. D., A. B. Lenarcic, J. P. Didion, J. R. Wang, J. B. Searle, et al., 2013 Genetic architecture of skewed X inactivation in the laboratory mouse. PLoS genetics 9: e1003853.
OpenUrl

[9] ↵
Cattanach, B. M., 1970 Controlling elements in the mouse X-chromosome: III. Influence upon both parts of an X divided by rearrangement. Genetical Research 16: 293–301.
OpenUrl CrossRef PubMed Web of Science

[10] ↵
Cattanach, B. M. and J. H. Isaacson, 1967 Controlling elements in the mouse X-chromosome. Genetics 57: 331–346.
OpenUrl FREE Full Text

[11] ↵
Cattanach, B. M. and D. Papworth, 1981 Controlling elements in the mouse. V. Linkage tests with X-linked genes. Genetical research 38: 57–70.
OpenUrl PubMed Web of Science

[12] ↵
Cattanach, B. M., J. N. Perez, and C. E. Pollard, 1970 Controlling elements in the mouse X-chromosome. II. Location in the linkage map. Genetical research 15.

[13] ↵
Cattanach, B. M., C. E. Williams, and C. W. BM Cattanach, 1972 Evidence of non-random X chromosome activity in the mouse. Genetics Research 19: 229–40.
OpenUrl

[14] ↵
Chadwick, L. H., L. M. Pertz, K. W. Broman, M. S. Bartolomei, and H. F. Willard, 2006 Genetic control of X chromosome inactivation in mice: Definition of the Xce candidate interval. Genetics 173: 2103–2110.
OpenUrl Abstract/FREE Full Text

[15] ↵
Collaborative Cross Consortium, 2012 The genome architecture of the collaborative cross mouse genetic reference population. Genetics 190: 389–401.
OpenUrl Abstract/FREE Full Text

[16] ↵
Crowley, J. J., Y. Kim, A. B. Lenarcic, C. R. Quackenbush, C. J. Barrick, et al., 2014 Genetics of adverse reactions to haloperidol in a mouse diallel: A drug-placebo experiment and Bayesian causal analysis. Genetics 196: 321–347.
OpenUrl Abstract/FREE Full Text

[17] ↵
Crowley, J. J., V. Zhabotynsky, W. Sun, S. Huang, I. K. Pakatci, et al., 2015 Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nature Genetics 47: 353–360.
OpenUrl CrossRef PubMed

[18] ↵
DerSimonian, R. and N. Laird, 1986 Meta-analysis in clinical trials. Controlled Clinical Trials 7: 177–188.
OpenUrl CrossRef PubMed Web of Science

[19] ↵
Disteche, C. M. and J. B. Berletch, 2015 X-chromosome inactivation and escape. Journal of Genetics 94: 591–599.
OpenUrl CrossRef PubMed

[20] ↵
Dossin, F., I. Pinheiro, J. J. Żylicz, J. Roensch, S. Collombet, et al., 2020 SPEN integrates transcriptional and epigenetic control of X-inactivation. Nature 578: 455–460.
OpenUrl CrossRef

[21] ↵
Giorgetti, L., B. R. Lajoie, A. C. Carter, M. Attia, Y. Zhan, et al., 2016 Structural organization of the inactive X chromosome in the mouse. Nature 535: 575–579.
OpenUrl CrossRef PubMed

[22] ↵
Giusti-Rodríguez, P., J. G. Xenakis, J. J. Crowley, R. J. Nonneman, D. M. DeCristo, et al., 2020 Antipsychotic Behavioral Phenotypes in the Mouse Collaborative Cross Recombinant Inbred Inter-Crosses (RIX). G3: Genes, Genomes, Genetics p. g3.400975.2020.

[23] ↵
Hasegawa, M., H. Kishino, and T. aki Yano, 1985 Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22: 160–174.
OpenUrl CrossRef PubMed Web of Science

[24] ↵
Holt, J. and L. McMillan, 2014 Merging of multi-string BWTs with applications. Bioinformatics 30: 3524–3531.
OpenUrl CrossRef PubMed

[25] ↵
Johnston, P. G. and B. M. Cattanach, 1981 Controlling elements in the mouse. IV. Evidence of non-random X-inactivation. Genetical research 37: 151–60.
OpenUrl CrossRef PubMed Web of Science

[26] ↵
Keane, T. M., L. Goodstadt, P. Danecek, M. A. White, K. Wong, et al., 2011 Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477: 289–294.
OpenUrl CrossRef PubMed Web of Science

[27] ↵
Krumsiek, J., R. Arnold, and T. Rattei, 2007 Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23: 1026–8.
OpenUrl CrossRef PubMed Web of Science

[28] ↵
Larson, N. B., Z. C. Fogarty, M. C. Larson, K. R. Kalli, K. Lawrenson, et al., 2017 An integrative approach to assess X-chromosome inactivation using allele-specific expression with applications to epithelial ovarian cancer. Genetic Epidemiology 41: 898–914.
OpenUrl

[29] ↵
Lee, J. T., L. S. Davidow, and D. Warshawsky, 1999a Tsix, a gene antisense to Xist at the X-inactivation centre. Nature Genetics 21: 400–404.
OpenUrl CrossRef PubMed Web of Science

[30] ↵
Lee, J. T., N. Lu, and N. L. JT Lee, 1999b Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell 99: 47–57.
OpenUrl CrossRef PubMed Web of Science

[31] ↵
Lee, J. T., W. M. Strauss, J. A. Dausman, and R. Jaenisch, 1996 A 450 kb transgene displays properties of the mammalian X-inactivation center. Cell 86: 83–94.
OpenUrl CrossRef PubMed Web of Science

[32] ↵
Lenarcic, A. B., J. D. Calaway, F. P.-M. de Villena, and W. Valdar, 2018 Bayesian Manifold-Constrained-Prior Model for an Experiment to Locate Xce. ArXiv p. 1812.08863.

[33] ↵
Lyon, M. F., 1961 Gene action in the X-chromosome of the mouse (mus musculus L.). Nature 190: 372–373.
OpenUrl CrossRef PubMed Web of Science

[34] ↵
Madeira, F., Y. M. Park, J. Lee, N. Buso, T. Gur, et al., 2019 The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Research 47: W636–W641.
OpenUrl

[35] ↵
Minajigi, A., J. E. Froberg, C. Wei, H. Sunwoo, B. Kesner, et al., 2015 A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349: aab2276.
OpenUrl Abstract/FREE Full Text

[36] ↵
Minks, J., W. P. Robinson, and C. J. Brown, 2008 A skewed view of X chromosome inactivation. Journal of Clinical Investigation 118: 20–23.
OpenUrl CrossRef PubMed Web of Science

[37] ↵
Morgan, A. P., C. P. Fu, C. Y. Kao, C. E. Welsh, J. P. Didion, et al., 2016 The mouse universal genotyping array: From substrains to subspecies. G3: Genes, Genomes, Genetics 6: 263–279.
OpenUrl

[38] ↵
Nesterova, T. B., S. C. Barton, M. A. Surani, and N. Brockdorff, 2001 Loss of Xist imprinting in diploid parthenogenetic preimplantation embryos. Developmental Biology 235: 343–350.
OpenUrl CrossRef PubMed Web of Science

[39] ↵
Nora, E. P., B. R. Lajoie, E. G. Schulz, L. Giorgetti, I. Okamoto, et al., 2012 Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485: 381–385.
OpenUrl CrossRef PubMed Web of Science

[40] ↵
Ogawa, Y. and J. T. Lee, 2003 Xite, X-inactivation intergenic transcription elements that regulate the probability of choice. Molecular Cell 11: 731–743.
OpenUrl CrossRef PubMed Web of Science

[41] ↵
Okamoto, I., A. P. Otte, C. D. Allis, D. Reinberg, and E. Heard, 2004 Epigenetic Dynamics of Imprinted X Inactivation during Early Mouse Development. Science 303: 644–649.
OpenUrl Abstract/FREE Full Text

[42] ↵
Oreper, D., S. A. Schoenrock, R. McMullan, R. Ervin, J. Farrington, et al., 2018 Reciprocal F1 Hybrids of Two Inbred Mouse Strains Reveal Parent-of-Origin and Perinatal Diet Effects on Behavior and Expression. G3: Genes, Genomes, Genetics 8: 3447–3468.
OpenUrl

[43] ↵
Penny, G. D., G. F. Kay, S. A. Sheardown, S. Rastan, and N. Brockdorff, 1996 Requirement for Xist in X chromosome inactivation. Nature 379: 131–137.
OpenUrl CrossRef PubMed Web of Science

[44] ↵
Percec, I., R. M. Plenge, J. H. Nadeau, M. S. Bartolomei, and H. F. Willard, 2002 Autosomal dominant mutations affecting X inactivation choice in the mouse. Science 296: 1136–1139.
OpenUrl Abstract/FREE Full Text

[45] ↵
Plenge, R. M., I. Percec, J. H. Nadeau, and H. F. Willard, 2000 Expression-based assay of an X-linked gene to examine effects of the X-controlling element (Xce) locus. Mammalian Genome 11: 405–408.
OpenUrl CrossRef PubMed Web of Science

[46] ↵
K. Hornik,
F. Leisch, and
A. Zeileis
Plummer, M., 2003 JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), edited by K. Hornik, F. Leisch, and A. Zeileis, pp. 1–10, Vienna, Austria.

[47] K. Hornik,

[48] F. Leisch, and

[49] A. Zeileis

[50] ↵
R Core Team, 2017 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

[51] ↵
Rastan, S., 1982 Timing of X-chromosome inactivation in postimplantation mouse embryos. Journal of embryology and experimental morphology 71: 11–24.
OpenUrl PubMed Web of Science

[52] ↵
Schoenrock, S. A., D. Oreper, J. Farrington, R. C. McMullan, R. Ervin, et al., 2018 Perinatal nutrition interacts with genetic background to alter behavior in a parent-of-origin-dependent manner in adult Collaborative Cross mice. Genes, Brain and Behavior 17: e12438.
OpenUrl CrossRef

[53] ↵
Sheedy, C. B., 2012 Genomic Characterization and Comparative Analysis of the Xce Candidate Region. Ph.D. thesis, Duke University.

[54] ↵
Shorter, J. R., M. L. Najarian, T. A. Bell, M. Blanchard, M. T. Ferris, et al., 2019 Whole Genome Sequencing and Progress Towards Full Inbreeding of the Mouse Collaborative Cross Population. G3: Genes, Genomes, Genetics 9: 1303–1311.
OpenUrl

[55] ↵
Sigmon, J. S., M. Blanchard, R. Baric, T. Bell, J. Brennan, et al., 2020 Content and performance of the MiniMUGA genotyping array, a new tool to improve rigor and reproducibility in mouse research. bioRxiv p. 2020.03.12.989400.

[56] ↵
Simmler, M. C., B. M. Cattanach, C. Rasberry, C. Rougeulle, and P. Avner, 1993 Mapping the murine Xce locus with (CA)n repeats. Mammalian Genome 4: 523–530.
OpenUrl CrossRef PubMed Web of Science

[57] ↵
Snow, M. H. L., 1977 Gastrulation in the mouse: Growth and regionalization of the epiblast. Development 42.

[58] ↵
Srivastava, A., A. P. Morgan, M. L. Najarian, V. K. Sarsani, J. S. Sigmon, et al., 2017 Genomes of the Mouse Collaborative Cross. Genetics 206: 537–556.
OpenUrl Abstract/FREE Full Text

[59] ↵
Takagi, N., 1980 Primary and secondary nonrandom X chromosome inactivation in early female mouse embryos carrying Searle’s translocation T(X; 16)16H. Chromosoma 81: 439–459.
OpenUrl CrossRef PubMed Web of Science

[60] ↵
Takagi, N., O. Sugawara, and M. Sasaki, 1982 Regional and temporal changes in the pattern of X-chromosome replication during the early post-implantation development of the female mouse. Chromosoma 85: 275–286.
OpenUrl CrossRef PubMed Web of Science

[61] ↵
Thorvaldsen, J. L., C. Krapp, H. F. Willard, and M. S. Bartolomei, 2012 Nonrandom X chromosome inactivation is influenced by multiple regions on the murine X chromosome. Genetics 192: 1095–1107.
OpenUrl Abstract/FREE Full Text

[62] ↵
Venables, W. N. and B. D. Ripley, 2002 Modern Applied Statistics with S. Springer, New York, fourth edition, ISBN 0-387-95457-0.

[63] ↵
Wang, X., P. D. Soloway, and A. G. Clark, 2010 Paternally biased X inactivation in mouse neonatal brain. Genome biology 11: R79.
OpenUrl CrossRef PubMed

[64] ↵
Williamson, C., A. Blake, S. Thomas, C. Beechey, J. Hancock, et al., 2013 Mouse Imprinting Data and References.

[65] ↵
Wutz, A., 2011 Gene silencing in X-chromosome inactivation: Advances in understanding facultative heterochromatin formation. Nature Reviews Genetics 12: 542–553.
OpenUrl CrossRef PubMed

Skewed X inactivation in genetically diverse mice is associated with recurrent copy number changes at the Xce locus

ABSTRACT

Introduction

Materials and Methods

Notation

Mouse breeding populations and sample collection

SP1

SP2

RNA-seq preparation

SP1

SP2

Genotyping in CC-RIX and haplotype reconstruction

Measuring allele-specific expression (ASE) in F1 females

Statistical modeling of X chromosome inactivation

Model for RIX-specific XCI proportion

Pólya urn-based estimation of the day 5 brain precursor count

Whole genome sequences of CC strains

Haplotype analyses based on WGS

Phylogenetic analysis of CC founder strains

Quantifying copy number variations

Data availability

Results

XCI ratio estimated for each mouse and RIX from RNA-seq allele-specific expression

XCI is relatively consistent across genes within an individual

Pattern of XCI skew in RIXs with known Xce allele is consistent with previous studies

Estimated number of cells in pre-brain epiblast tissue range from 20 to 30

Unexpected XCI skewing in RIX females with the NOD Xce allele

SNP analysis in the Xce interval show that NOD and C57BL/6J have almost identical haplotypes

Copy number variations distinguish weaker Xce alleles from stronger ones

Discussion

Leveraging increased genetic diversity in CC-RIX identifies novel XCI patterns

Pólya urn-based approximation to the number of cells in pre-brain epiblast tissue

Copy number of recurrent duplications may explain the weakness of Xcea and novel Xcef, found in NOD

NZO expresses Xceb despite complicated CNV organization

CNV abundance and organization, along with sequence variation, may all play a role in Xce strength and XCI skewing

Supporting information

Supplemental Figures and Tables

Acknowledgements

Literature Cited

Citation Manager Formats

Subject Area

Copy number of recurrent duplications may explain the weakness of Xce^a and novel Xce^f, found in NOD

NZO expresses Xce^b despite complicated CNV organization