History of the Collaborative Cross

The Collaborative Cross (CC) is a large, multiparental, recombinant inbred (RI) strain panel that was motivated by the need among the mouse genetics community for a high-precision genetic resource that could serve as a common integration point for the multitude of mouse genetic studies that were sure to follow in the wake of the complete sequencing of the mouse and human genomes. The concept for this common mouse genetic reference population was first proposed at the Edinburgh meeting of the International Mouse Genome Conference in October of 2001 and in print by founding members (Threadgill et al. 2002) of the Complex Trait Consortium (CTC).

An RI strain panel provides significant advantages as a resource because it is a reproducible population for cumulative data integration. Existing RI panels have limited statistical power due to their small size, and they capture only limited allelic diversity because each current RI set originates from only two inbred progenitor strains. A large RI panel derived from multiple strains could capture significantly more genetic diversity and would provide sufficient power and resolution for genetic dissection of polygenic traits and construction of systems genetic networks. The CC breeding design was proposed as a strategy to rapidly and randomly mix the genomes of eight founder strains to create independent breeding lines (Churchill et al. 2004). Five classical inbred strains (A/J, C57BL/6J, 129S1/SvImJ, NOD/LtJ, NZO/H1LtJ) and three wild-derived strains (CAST/EiJ, PWK/PhJ, and WSB/EiJ) were selected to be the eight founders of the CC. Analysis of the allelic variation in mouse inbred strains demonstrates that the eight CC founder strains capture on average 90% of the known allelic diversity across all 1-Mb intervals spanning the entire mouse genome (Roberts et al. 2007). Simulations of the power (Valdar et al. 2006a) and precision (Broman 2005) of genetic mapping with the CC population indicated superior performance to alternative strategies and provided guidelines for sample sizes.

Initial proposals for implementing the CC called for a widely distributed breeding effort. However, there are logistical and scientific advantages to breeding and distributing the CC from a small number of locations with well-defined and consistent husbandry practices, which minimize confounding by environment-specific selection. Because of the proposed size of the CC, attaining the ideals originally envisioned for the design requires a large, dedicated facility capable of consistent randomized matings. In May 2005, breeding began at the Oak Ridge National Laboratory (ORNL), supported through major funding from The Ellison Medical Foundation and The Department of Energy. Additional support for production and expansion of the CC resource at ORNL has been provided by the National Institutes of Health. Here we report on the current status of the CC lines established at the ORNL William R. and Liane B. Russell Vivarium. To date, 650 lines have been initiated in two cohorts. The first set of 474 CC lines was initiated in 2005, and a second set of 176 lines was initiated in 2007 (Fig. 1).

Fig. 1

Birth records of all CC mice born in the ORNL colony. Each mouse is represented by a single bar. Two groups of lines have been started at ORNL. The first, consisting of 474 started lines, is approaching G2:F12. The second is in the initiation phases. Time to fertility is gradually slowing to a typical pace for inbred lines. This display can be regenerated at http://mouse.ornl.gov/projects/cc_breeding_progress.html

Construction of the Collaborative Cross at ORNL

To initiate construction of the CC, the eight progenitor strains, referred to as the G0 generation, were first intercrossed to generate the 56 possible G1 hybrid combinations. G1 progeny are crossed to create the four-way G2 generation, and a G2 × G2 cross yields the first eight-way progeny, the G2:F1s. G2:F1s are then propagated by sib-mating through the G2:Fn generations until they are fully inbred, at approximately G2:F22 (Broman 2005). Each resulting independent CC breeding funnel will be a unique and independent combination of the eight founder genomes. A major goal of the ORNL implementation of this breeding scheme has been to use the breeding software described below to minimize two effects: the clustering of recombination sites that results from strain pair-specific hotspots found in most mating designs (Kelmenson et al. 2005), and selection for single or multiple loci associated with viability, behavioral, and fertility traits. The G0 progenitor strains were obtained from The Jackson Laboratory (TJL), and the G1 animals were produced either at TJL or at ORNL from stock obtained directly from TJL. The colony is restocked periodically to avoid drift in the progenitor lines.

Design for balance

An eight-way CC line can be defined by the order of the initial G0 through G2:F1 matings. Thus, if we represent the eight founder strains by letters A–H, one possible breeding scheme is

$$ (({\text{A }} \times {\text{ B}}) \times ({\text{C }} \times {\text{ D}})) \times (({\text{E }} \times {\text{ F}}) \times ({\text{G }} \times {\text{ H}})), $$

where the strain on the left of each pair is the female parent. This notation summarizes the four crosses yielding four two-way hybrids (G1), followed by two crosses yielding two four-way hybrids (G2), followed by a cross yielding an eight-way hybrid (G2:F1). There are more than 40,000 possible unique orderings (8! = 40,320), so a subset must be chosen for production. Each line is initiated with a unique breeding order, which we refer to as a breeding “funnel.”
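As a quick check on the counts quoted above, the following Python sketch enumerates the orderings directly. Treating a funnel as a sequence of eight position labels is an assumption made here for illustration; it is not taken from the CC8scheme software.

```python
from itertools import permutations
from math import factorial

FOUNDERS = "ABCDEFGH"  # placeholder letters for the eight founder strains

# Every assignment of the eight founders to the eight funnel positions.
n_orderings = sum(1 for _ in permutations(FOUNDERS))

# Ordered (female, male) pairs of distinct founders, i.e. possible G1 crosses.
n_g1_crosses = sum(1 for _ in permutations(FOUNDERS, 2))

print(n_orderings, factorial(8))  # 40320 40320
print(n_g1_crosses)               # 56
```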

G0 or G1 animals from the same strain or mating, respectively, are genetically identical. Therefore, two independent reciprocal funnels can be established from the same litters of G1 hybrids, doubling the number of lines obtained from a given set of G1 hybrids without introducing shared recombination sites. Thus, the funnels are set up as reciprocal pairs:

$$ (({\text{A }} \times {\text{ B}}) \times ({\text{C }} \times {\text{ D}})) \times (({\text{E }} \times {\text{ F}}) \times ({\text{G }} \times {\text{ H}})) $$

and

$$ (({\text{E }} \times {\text{ F}}) \times ({\text{G }} \times {\text{ H}})) \times (({\text{A }} \times {\text{ B}}) \times ({\text{C }} \times {\text{ D}})) $$

With this systematic breeding design, the genetic contribution of each of the eight CC founder strains to each line is equivalent, and when averaged across all CC lines, the allele frequency at each locus will ideally be 0.125 (1/8). While this is automatically true for autosomal loci, it is not necessarily true for mitochondrial genomes and sex chromosomes. Therefore, of the possible funnels, a balanced set is chosen to produce equal contributions for each single factor (X, Y, and mitochondria) as well as for all pairwise combinations of factors across all the breeding funnels.

The location occupied by any particular strain in the initial crosses impacts the genetic composition of the resultant CC line (Fig. 2). One reason for this is that the female parent of a cross contributes mitochondrial DNA through cytoplasm and the male parent of each cross contributes the Y chromosome. In the cross denoted A × B, the strain in the A position contributes mitochondrial DNA, and the strain in the B position contributes the Y chromosome to offspring. When these offspring are crossed to offspring of the cross denoted C × D, mitochondria from the strain in the A position and the Y chromosome from the strain in the D position are retained.

Fig. 2

A depiction of the Collaborative Cross funnel design showing the progenitor source of autosomal, sex chromosomal, and mitochondrial DNA. The strain in the A position contributes the mitochondrial DNA, and the strain in the H position contributes the Y chromosome. X chromosomal DNA in the finished line comes from strains in the A, B, C, E, and F positions.

A second way in which progenitor position determines genetic contribution is through the contribution of X chromosomal material. Males contribute their X chromosome to female offspring and their Y chromosome to male offspring. Females contribute an X chromosome to all offspring. In the cross denoted above, a female is chosen from the A × B cross, and a male is chosen from the C × D cross. The female carries X chromosomal material from the parents in both the A and B positions, but the male inherits an X chromosome only from the parent in the C position.
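The position rules described above can be summarized in a short Python sketch. The function names `trace_funnel` and `reciprocal` are hypothetical and are introduced only for illustration. The sketch assumes the funnel is read as ((A × B) × (C × D)) × ((E × F) × (G × H)) with the female on the left of every cross.

```python
def trace_funnel(order):
    """Return the strains supplying mitochondria, Y and X for one funnel.

    `order` holds eight strain names in funnel positions A through H.
    """
    a, b, c, d, e, f, g, h = order
    return {
        # Mitochondria follow the maternal line: A -> (A x B) female -> G2 female.
        "mito": a,
        # The Y follows the paternal line: H -> (G x H) male -> G2 male.
        "Y": h,
        # The G2 female carries X material from A, B (her mother) and C (her
        # father); the G2 male carries X material from E and F (his mother).
        "X": {a, b, c, e, f},
    }


def reciprocal(order):
    """Swap the two four-way halves of a funnel to get its reciprocal."""
    order = tuple(order)
    return order[4:] + order[:4]


funnel = tuple("ABCDEFGH")
print(trace_funnel(funnel))
print(trace_funnel(reciprocal(funnel)))
```

In the reciprocal funnel the same function reports mitochondria from the strain originally in the E position and the Y chromosome from the strain originally in the D position.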

Finally, the pairing of progenitors in the earliest generations may confer specific biases in the location and density of recombinations. Meiotic recombination events are cumulative. Those recombinations that occur in early generations are retained in subsequent generations. Additional recombinations occur in each subsequent generation and have the potential to accumulate detectably in each generation prior to inbreeding. In a particular segment of DNA, if alleles are similar in the two progenitors of a cross (identical by state) or fixed through inbreeding (identical by descent), recombination events are not detectable because identical strands of DNA are being broken and recombined. The accumulation of genetic recombination is influenced by genetic background (Kelmenson et al. 2005), and recombinations do not occur with equal probability across the genome. Consequently, each of the eight progenitor strains should occupy the A through H positions an approximately equal number of times when the entire CC panel is considered. Furthermore, it is desirable to balance higher-order combinations of mitochondrial and Y-chromosomal material by balancing all two-way combinations of these factors. Lastly, balance of the combinations of parents in the first generation of crosses will minimize potential bias in recombination accumulation.

Pairwise balance of the founder strain combinations is achieved when all 56 pairwise combinations occur at equal frequency across the set of funnels. This will reduce the impact of systematic allele incompatibilities within and across loci. The variance of the number of lines across 56 pairwise combinations, therefore, gives a single number by which to evaluate pairwise balance in a set of funnels. The controlled pairwise combinations are discussed in the following subsections.
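The variance criterion can be written down in a few lines. The sketch below is a minimal illustration, not the CC8scheme implementation. It assumes that the 56 combinations are ordered pairs of distinct strains and that pairs absent from a design count as zero. The name `pairwise_balance` is hypothetical.

```python
from collections import Counter
from itertools import permutations
from statistics import pvariance


def pairwise_balance(pairs, strains):
    """Variance of how often each ordered strain pair occurs (0 is ideal).

    `pairs` is an iterable of (first, second) strain tuples collected from a
    set of funnels; all 56 ordered pairs are scored, including absent ones.
    """
    counts = Counter(pairs)
    return pvariance([counts[p] for p in permutations(strains, 2)])


strains = "ABCDEFGH"
balanced = list(permutations(strains, 2))   # each pair used exactly once
lopsided = [("A", "B")] * 56                # one pair used every time
print(pairwise_balance(balanced, strains))  # 0
print(pairwise_balance(lopsided, strains))  # 55
```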

Chr Y-mitochondria combinations

Progenitors A and H, respectively, contribute mitochondria and Y chromosome to the funnel line, and progenitors E and D, respectively, contribute mitochondria and Y chromosome to the reciprocal line. Y-mitochondria combinations will be balanced if all 56 pairwise combinations of strains appear with equal frequency in the progenitor pairs (A:H) and (E:D).

Chr X-Chr Y combinations

Since five progenitors (A, B, C, E, or F in the funnel) may contribute X chromosomes to a line, strain frequency must be averaged over ten two-strain progenitor combinations, with double weight for two combinations (C:E and C:F in the funnel), to evaluate the balance of X-Y combinations.

Autosomal combinations

Genomes of strains paired in the first generation have an early opportunity for recombination (Broman 2005). To balance any effect of this opportunity, pairwise strain combinations can be balanced over the four matings in the first generation involving progenitor pairs (A:B), (C:D), (E:F), and (G:H).

Balanced CC design schemes are created using customized software (CC8scheme) which systematically tests all available funnel pairs to optimize over the above parameters. CC8scheme builds a design scheme by stepwise addition of the funnel pair that would most improve the balance of the existing set of funnel pairs. The resulting designs avoid using strain combinations known to be infertile or unproductive while still achieving the best possible balance. The current design avoids (NZO × CAST) and (NZO × PWK) hybrids, which are reproductively incompatible, and (PWK × 129) males, which are infertile.
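The stepwise addition described above is a greedy search, which might be sketched as follows. This is not the CC8scheme code. The scoring function, the random candidate pool, the short strain labels, and the treatment of avoided crosses as ordered (female, male) pairs are all assumptions made for illustration, and the X-Y factor is left out for brevity.

```python
import random
from collections import Counter
from itertools import permutations
from statistics import pvariance

STRAINS = ["A/J", "B6", "129", "NOD", "NZO", "CAST", "PWK", "WSB"]  # short labels
ALL_PAIRS = list(permutations(STRAINS, 2))
FORBIDDEN = {("NZO", "CAST"), ("NZO", "PWK")}  # assumed (female, male) direction


def factors(funnel):
    """Strain pairs contributed by a funnel and its reciprocal, by factor."""
    a, b, c, d, e, f, g, h = funnel
    return {
        "mito_Y": [(a, h), (e, d)],              # funnel, then reciprocal
        "G1": [(a, b), (c, d), (e, f), (g, h)],  # first-generation crosses
    }


def score(funnels):
    """Sum of pair-count variances over the tracked factors (lower is better)."""
    total = 0.0
    for key in ("mito_Y", "G1"):
        counts = Counter(p for fn in funnels for p in factors(fn)[key])
        total += pvariance([counts[p] for p in ALL_PAIRS])
    return total


def greedy_design(candidates, n_pairs, forbidden=frozenset()):
    """Repeatedly add the candidate funnel pair that gives the best balance."""
    pool = [fn for fn in candidates
            if not any(p in forbidden for p in factors(fn)["G1"])]
    chosen = []
    for _ in range(n_pairs):
        best = min(pool, key=lambda fn: score(chosen + [fn]))
        chosen.append(best)
        pool.remove(best)
    return chosen


rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = [tuple(rng.sample(STRAINS, 8)) for _ in range(200)]
design = greedy_design(candidates, 10, FORBIDDEN)
print(len(design), score(design))
```

A greedy search of this kind does not guarantee a globally optimal design, but each step is cheap and the avoided crosses can be filtered out before the search begins.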

At each generation, two to seven matings are started from a randomly selected litter from the previous generation within each funnel. Normally, a litter from the first-priority mating will provide the next generation. If the first-priority mating fails to produce a litter by the time that one of the other matings has produced a second litter, then the lower-priority litter will be used for the next generation. Thus, litter selection is pseudorandom, since selection for fecundity is allowed only if the survival of the line is threatened. Once a litter is chosen, mice within the litter are randomly assigned mates from available siblings, avoiding inadvertent selection for docility or other behavioral characteristics.
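The litter-selection rule and the random assignment of mates can be expressed as a small Python sketch. It is illustrative only and is not drawn from the CCDB code. The function names are hypothetical, and two details are assumptions: that the fallback goes to the highest-priority mating with two litters, and that the order of the returned pairs sets the priority of the new matings.

```python
import random


def choose_litter(litter_counts):
    """Pick the mating whose litter founds the next generation.

    `litter_counts` lists the litters produced so far by each mating, in
    priority order (index 0 is the first-priority mating). Returns the index
    of the chosen mating, or None if no decision can be made yet.
    """
    if litter_counts[0] >= 1:
        return 0  # the first-priority mating has produced a litter
    for index, n_litters in enumerate(litter_counts[1:], start=1):
        if n_litters >= 2:
            return index  # fallback: another mating reached its second litter
    return None


def assign_mates(females, males, n_matings, rng=random):
    """Randomly pair siblings, returning at most `n_matings` (female, male) pairs."""
    females, males = list(females), list(males)
    rng.shuffle(females)
    rng.shuffle(males)
    return list(zip(females, males))[:n_matings]


print(choose_litter([1, 0, 0]))  # 0
print(choose_litter([0, 2, 1]))  # 1
print(choose_litter([0, 1, 1]))  # None
print(assign_mates(["f1", "f2", "f3"], ["m1", "m2", "m3"], 7, random.Random(1)))
```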

Breeding of the Collaborative Cross is controlled and documented by custom software called Collaborative Cross Database (CCDB), which supports the task of maintaining a randomized mating scheme in the breeding colony based on a specified balanced mating design. CCDB is a three-tier Web application comprising a MySQL database and a Python application that provides a browser-based user interface. Husbandry technicians access CCDB directly from laptops in the mouse handling rooms. The user interface is tightly integrated with the workflow of weaning one generation of mice and setting up matings to create the next generation. CCDB was designed with the following goals: (1) to ensure the randomization of progeny selection and mate choice at each generation, (2) to allow data entry as mice are weaned and mated, (3) to minimize data entry time and data entry errors, (4) to monitor and maintain data integrity, and (5) to allow data entry, monitoring, and reporting from multiple locations. The result is a fully traceable breeding history for each mouse (Fig. 3). These custom software tools are available at http://sourceforge.net/projects/cc8works/.

Fig. 3

Visual representation of a completely traceable funnel in CCDB. This display shows funnel #55 and its reciprocal #232. The gray highlight indicates the mating cage that was randomly chosen to produce the next generation.

Status of the Collaborative Cross at ORNL

A total of 650 CC funnels were initiated from a balanced set of funnels identified by CC8scheme. As of this writing, 452 funnels are extant, with the remaining 198 lost during the breeding process. The lost funnels include 41 that required the crosses (NZO/H1LtJ × CAST/EiJ) and (NZO/H1LtJ × PWK/PhJ), which are not easily obtained or are often infertile. Because these combinations are now avoided in the first-generation crosses, the breeding success rate should improve. The majority of lost funnels (126 of 198) were lost during generations G2:F4–G2:F6 (i.e., after three to five generations of inbreeding) (Fig. 4). Currently, the most advanced lines have reached 12 generations of inbreeding, with the bulk of funnels distributed through generations G2:F6–G2:F8. Therefore, the population is theoretically inbred at approximately 75% of all loci (Fig. 5), based on a simulation of allelic identity-by-descent performed using the R/ricalc package (Broman 2005). In reality, the proportion of loci with alleles identical by state will be higher due to shared haplotypes among the founder strains.
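The figure of roughly 75% can be compared with the classical full-sib mating recursion for the inbreeding coefficient, F[t] = (1 + 2 F[t-1] + F[t-2]) / 4. The Python sketch below applies that textbook recursion to autosomal loci. It is not the R/ricalc calculation cited above. It assumes that the G2 parents and the G2:F1 animals are non-inbred, and it tracks expected homozygosity by descent in an individual rather than fixation within a line.

```python
def expected_homozygosity(n_generations):
    """Expected homozygosity by descent for G2:F1 through G2:Fn.

    Uses the full-sib recursion F[t] = (1 + 2*F[t-1] + F[t-2]) / 4, with the
    G2 parents and the G2:F1 animals taken as non-inbred (F = 0).
    """
    f_prev2, f_prev1 = 0.0, 0.0  # G2, then G2:F1
    values = [f_prev1]
    for _ in range(n_generations - 1):
        f_next = (1 + 2 * f_prev1 + f_prev2) / 4
        values.append(f_next)
        f_prev2, f_prev1 = f_prev1, f_next
    return values


for generation, f in enumerate(expected_homozygosity(8), start=1):
    print(f"G2:F{generation}: {f:.3f}")
```

Under these assumptions the recursion gives about 0.67 at G2:F6, 0.73 at G2:F7, and 0.79 at G2:F8.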

Fig. 4

Summary of ORNL funnels generated to date. The solid line indicates the total number of funnels that have reached the given generation, and the dotted line shows the maximum number of funnels that can potentially reach it, based on a projection of the current number of extant funnels. Some loss is anticipated. The two subgroups of lines have been separated. Infertile G1s are not represented.

Characterizing the Collaborative Cross

During generation of the CC lines, many measurements are routinely collected to support intermediate phenotypic analysis. Breeding records allow analysis of time to fertility, gestation period, sex ratio, and litter size (Fig. 6). Each retired breeder is also characterized through a phenotyping protocol, which includes dissection and storage of tissue samples. A panel of phenotypes selected to broadly index morphology, organ function, and behavior is collected from the retired breeders of each generation and from each line. Tail samples for DNA have been obtained for all breeders in all lines, and a DNA bank has been made for many of the breeders, including the entire G2:F7 generation. Phenotype data and imported husbandry records from CCDB are stored and integrated in the MouseTrack system (Baker et al. 2004), which consists of an ORACLE database and SAS client software for genetic analysis. Software for QTL mapping of interim generations (Mott et al. 2000; Valdar et al. 2006b) is being incorporated into this system.

Fig. 5

Estimate of heterozygosity within the CC. This figure shows the mean estimated identical by descent allele frequency at each generation in 10,000 simulated CC lines.

Phenotype distributions and heritability

Breeding and phenotype data are used to monitor heritabilities and phenotypic diversity as inbreeding progresses (Fig. 6). The female and male breeding pair that serves as parents for the subsequent generation within each line are removed from breeding and used in the phenotyping screen after two generations of offspring have been produced for that line. A variety of biological systems is under current investigation, including morphologic, behavioral, and physiologic phenotypes. The panel of phenotypes currently collected includes behavioral wildness, anxiety, activity, sleep, nociception, body weight, tail length, bone density, gastrointestinal microflora, chromosomal aging, fasting plasma glucose, blood chemistry, and the weights of kidney, heart, and gonadal fat pads. For each transition across generations, heritability is computed in the MouseTrack system using parent-offspring regression on either mid-parent values or single-parent values. Heritability analysis tests the utility of the CC as a genetic mapping tool for a given trait, and the dispersion of phenotypes in interim generations tests its validity as a population-based model system. These ongoing studies will provide a wealth of parametric data for simulation, power analysis, and computational tool development (Fig. 7).
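Parent-offspring regression can be illustrated with a short Python sketch. It uses the textbook estimators, in which heritability is the slope of offspring values on mid-parent values, or twice the slope on single-parent values. It is not the MouseTrack implementation, which runs on ORACLE and SAS. The function names and the numbers in the example are invented for illustration, and the sketch does not address how inbreeding across generations affects the estimate.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def h2_midparent(mothers, fathers, offspring):
    """Heritability estimate: slope of offspring on the mid-parent value."""
    midparent = [(m + f) / 2 for m, f in zip(mothers, fathers)]
    return regression_slope(midparent, offspring)


def h2_single_parent(parent, offspring):
    """Heritability estimate: twice the slope of offspring on one parent."""
    return 2 * regression_slope(parent, offspring)


# Invented body weights (g) for four breeding pairs and their offspring means.
mothers = [20.0, 22.0, 24.0, 26.0]
fathers = [22.0, 24.0, 26.0, 28.0]
offspring = [21.5, 22.5, 23.5, 24.5]
print(h2_midparent(mothers, fathers, offspring))
```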

Fig. 6

CC funnel litter size by generation. Litter sizes have reached an asymptote at approximately 4, which is typical for RI lines.

Fig. 7

MouseTrack calculation and display of heritability of two complex phenotypes, fasting plasma glucose and behavioral wildness (Wahlsten et al. 2003), through seven generations of inbreeding of the Collaborative Cross.

Genotyping the Collaborative Cross

Several genotyping efforts are underway, including an effort funded by the Tennessee Mouse Genome Consortium and DOE to genotype and characterize the G2:F7 generation. In this project, one female and one male of each line will be genotyped on a custom array of 13,000 SNPs that uniquely identify all eight progenitor haplotypes at over 1200 regions of the genome. This single-generation cross section will enable the analysis of population structure and recombination rate and the detection of systematically linked loci. Any such loci that are identified should be the result of actual biological selection, because the CCDB software-assisted breeding has eliminated many effects of human selection for docility and reproductive behavior. A second effort underway will entail in-depth genotyping and phenotyping of individuals from all extant strains at various generations.

The future of the Collaborative Cross at ORNL

The CC mice and derivatives of their breeders are currently available through collaborative arrangements with the ORNL Mouse Genetics Research Facility. Plans are being made to ensure that finished lines will be available from a network of phenotyping centers, each of which will have all inbred CC lines on site. CC funnels will continue to be initiated and maintained until the target population size of 1000 CC lines has been met or exceeded. Archiving of strains through cryopreservation will be performed as inbreeding is advanced. The early studies performed on intermediate generations will provide a valuable resource for integrative genetics and genomics, and will yield a demonstration of the utility of the CC. As the CC genotypes are generated and regions of residual heterozygosity identified, mapping studies can be undertaken in their progeny.