Abstract
Evolutionary studies are often limited by missing data that are critical to understanding the history of selection. Selection experiments, which reproduce rapid evolution under controlled conditions, are excellent tools to study how genomes evolve under strong selection. Here we present a genomic dissection of the Longshanks selection experiment, in which mice were selectively bred over 20 generations for longer tibiae relative to body mass, resulting in 13% longer tibiae in two replicate lines. We synthesized evolutionary theory, genome sequences and molecular genetics to understand the selection response and found that it involved both polygenic adaptation and discrete loci of major effect, with the strongest loci likely to be selected in parallel between replicates. We show that selection may favor de-repression of bone growth through inactivation of two limb enhancers of an inhibitor, Nkx3-2. Our integrative genomic analyses thus show that it is possible to connect individual base-pair changes to the overall selection response.
Main Text
Understanding how populations adapt to a changing environment is an urgent challenge of global significance. The problem is especially acute for mammals, which often feature small and fragmented populations due to widespread habitat loss. Small populations often have limited capacity for rapid adaptation due to loss of diversity through inbreeding (1). From a genetic perspective, adaptation depends on the availability of beneficial alleles, which in mammals typically come from standing genetic variation, rather than new mutations (as in clonal microbes). Mammals nonetheless respond readily to selection in many traits, both in nature and in the laboratory (2-6). This is interpreted in quantitative genetics as evidence for a large set of nearly neutral loci; data from selection experiments are largely indistinguishable from this infinitesimal model, which is the basis for commercial breeding (7). However, it remains unclear what type of genomic change is associated with rapid response to selection, especially in small populations where allele frequency changes can be dominated by random drift. While a large body of theory exists to describe the birth, rise and eventual fixation of adaptive variants under diverse selection scenarios (8-13), few empirical datasets capture sufficient detail on the founding conditions and selection regime to allow full reconstruction of the selection response. This is particularly problematic in nature, where historical samples, environmental measurements and replicates are often missing. Selection experiments, which reproduce rapid evolution under controlled conditions, are therefore excellent tools to understand response to selection—and by extension—adaptive evolution in nature (4).
Here we describe an integrative, multi-faceted investigation into an artificial selection experiment, called Longshanks, in which mice were selected for increased tibia length, relative to body mass (14). The mammalian limb is an ideal model to study the dynamics of complex traits under selection. It is both morphologically complex and functionally diverse, reflecting its adaptive value; limb development has been studied extensively in mammals, birds and fishes as a genetic and evolutionary paradigm (15). The Longshanks selection experiment thus offers the opportunity to study selection response not only from a quantitative and population genetics perspective, but also from a developmental (16) and genomic perspective.
By design, the Longshanks experiment preserves a nearly complete archive of the phenotype (trait measurements) and genotype (via tissue samples) in the pedigree. By sequencing the initial and final genomes, we can study this example of rapid evolution with unprecedented detail and resolution. Here, with essentially complete information, we wish to answer a number of important questions regarding the factors that determine and constrain rapid adaptation. We ask whether the observed changes in gene frequency are due to selection, or random drift: does rapid selection response of a complex trait proceed through innumerable loci of infinitesimally small effect, or through a few loci of large effect? If so, what signature of selection do we expect? Finally, when the same trait changes occur independently, do these depend on changes in the same gene(s) or the same pathways (parallelism)?
Longshanks selection for longer tibiae
At the start of the Longshanks experiment, we established three base populations with 14 pairs each by sampling from a genetically diverse, commercial mouse stock [Hsd:ICR, also known as CD-1; derived from mixed breeding of classical laboratory mice (17)]. In two replicate “Longshanks” lines (LS1 and LS2), we selectively bred mice with the longest tibia relative to the cube root of body mass, 15–20% of offspring being selected for breeding [see (14) for details]. We kept a third Control line (Ctrl) using an identical breeding scheme, except that breeders were selected at random. In LS1 and LS2, we observed a strong and significant response to selection of tibia length [0.29 and 0.26 Haldane or standard deviations (s.d.) per generation, from a selection differential of 0.73 s.d. in LS1 and 0.62 s.d. in LS2]. Over 20 generations, selection for longer relative tibia length produced increases of 5.27 and 4.81 s.d. in LS1 and LS2, respectively (or 12.7% and 13.1% in tibia length), with negligible changes in body mass (Fig. 1B & C; fig. S1). By contrast, Ctrl showed no directional change in tibia length or body mass (Fig. 1C; Student’s t-test, P > 0.05). This approximately 5 s.d. change in 20 generations is rapid compared to typical rates observed in nature [(18), but see (19)] but is in line with responses seen in selection experiments (3, 6, 20, 21).
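As a rough check on these figures, the ratio of the per-generation response to the selection differential gives a realized heritability under the breeder's equation R = h²S. The sketch below is a back-of-envelope illustration using only the values quoted above; the function name is hypothetical and the calculation is not taken from the original analysis.

```python
def realized_heritability(response_sd, differential_sd):
    """Breeder's equation R = h^2 * S, rearranged to h^2 = R / S."""
    if differential_sd <= 0:
        raise ValueError("selection differential must be positive")
    return response_sd / differential_sd

# Per-generation response and selection differential quoted in the text (s.d. units).
lines = {"LS1": (0.29, 0.73), "LS2": (0.26, 0.62)}
h2 = {name: realized_heritability(r, s) for name, (r, s) in lines.items()}

for name, value in h2.items():
    print(f"{name}: realized h^2 ~ {value:.2f}")
```

Both lines give a ratio of roughly 0.4, which is only an approximate summary since the per-generation values are themselves averages.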
Simulating selection response: infinitesimal model with linkage
We cannot use a neutral null model, since we know that strong selection was applied. We therefore developed a simulation that faithfully recapitulates the artificial selection experiment by integrating the trait measurements, selection regime, pedigree and genetic diversity of the Longshanks selection experiment, in order to generate an accurate expectation for the genomic response. Using the actual pedigree and trait measurements, we mapped fitness onto tibia length T and body mass B as a single composite trait ln(TB^θ). We estimated θ from actual data as −0.57, such that the ranking of breeders closely matched the actual composite ranking used to select breeders in the selection experiment, based on T and B separately (14) (fig. S2A). We assumed a maximally polygenic genetic architecture using an “infinitesimal model with linkage” (abbreviated here as HINF), under which the trait is controlled by very many loci, each of infinitesimally small effect (see Supplementary Notes for details). Results from simulations seeded with actual genotypes or haplotypes showed that overall, the predicted increase in inbreeding closely matched the observed data (fig. S2B). We tested models with varying selection intensity and initial linkage disequilibrium (LD), and for each, ran 100 simulated replicates to determine the significance of changes in allele frequency (fig. S2C–E). This flexible quantitative genetics framework allows us to explore possible changes in genetic diversity over 17 generations of breeding, under strong selection.
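The composite trait can be illustrated with a short sketch. It assumes the index is ln(T·B^θ) with θ = −0.57 as stated above; the function names, the 20% cut-off and the animal measurements are made-up placeholders, and the ranking here ignores family structure and sex, which the actual breeding scheme (14) took into account.

```python
import math

THETA = -0.57  # exponent on body mass, as estimated in the text

def selection_index(tibia_mm, mass_g, theta=THETA):
    """Composite trait ln(T * B**theta) for tibia length T and body mass B."""
    return math.log(tibia_mm * mass_g ** theta)

def pick_breeders(animals, fraction=0.2):
    """Return ids of the top `fraction` of animals ranked by the composite trait.

    `animals` is a list of (id, tibia_mm, mass_g) tuples.
    """
    ranked = sorted(animals, key=lambda a: selection_index(a[1], a[2]), reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return [a[0] for a in ranked[:n_keep]]

# Hypothetical animals (illustrative numbers only, not data from the experiment).
animals = [
    ("m1", 18.0, 30.0),
    ("m2", 18.5, 38.0),
    ("m3", 17.5, 25.0),
    ("m4", 19.0, 33.0),
    ("m5", 18.2, 29.0),
]
top = pick_breeders(animals, fraction=0.2)
print(top)
```

In this toy set the animal with the longest absolute tibia is not the one chosen, because the negative exponent on body mass rewards tibia length relative to size.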
In simulations, we followed blocks of genome as they were passed down the pedigree. In order to compare with observations, we seeded the initial genomes with single nucleotide polymorphisms (SNPs) in the same number and initial frequencies as the data. We observed much more variation between chromosomes in overall inbreeding (fig. S2B) and in the distribution of allele frequencies (fig. S3B) than expected from simulations in which the ancestral SNPs were initially in linkage equilibrium. This can be explained by linkage disequilibrium (LD) between the ancestral SNPs, which greatly increases random variation. Therefore, we based our significance tests on simulations that were seeded with SNPs drawn with LD consistent with the initial haplotypes (fig. S2C & E; see Supplementary Notes).
Because our simulations assume infinitesimal effects of loci, allele frequency shifts exceeding this stringent threshold suggest that discrete loci contribute significantly to the selection response. An excess of such loci in either a single LS replicate or in parallel would thus imply a mixed genetic architecture of a few large effect loci amid an infinitesimal background.
Sequencing the Longshanks mice reveals genomic signatures of selection
To detect the genomic changes in the actual Longshanks experiment, we sequenced all individuals of the founder (F0) and 17th generation (F17) to an average of 2.91-fold coverage (range: 0.73–20.6×; n = 169 with <10% missing F0 individuals; Table S1). Across the three lines, we found similar levels of diversity, with an average of 6.7 million (M) segregating SNPs (approximately 0.025%, or 1 SNP per 4 kbp; Table S1; figs. S3A & S4). We confirmed negligible divergence between the three founder populations (across-line FST on the order of 1 × 10⁻⁴), which increased to 0.18 at F17 (Table S2). The low initial divergence is consistent with random sampling from an outbred breeding stock. By F17, the number of segregating SNPs dropped to around 5.8 M (Table S1). This 13% drop in diversity (0.9 M SNPs genome-wide) closely matched our simulation results, from which we could determine that the drop was mostly due to inbreeding, with only a minor contribution from selection (Supplementary Notes, fig. S2B, D).
We conclude that despite the strong selection on the LS lines, there was little perturbation to genome-wide diversity. Indeed, the changes in diversity during the 17 generations were remarkably similar in all three lines, despite Ctrl not having experienced selection on relative tibia length (fig. S3A). Hence, and consistent with our simulation results (fig. S2B, D), changes in global genome diversity had little power to distinguish selection from neutral drift despite the strong phenotypic selection response.
We next asked whether specific loci reveal more definitive differences between the LS replicates and Ctrl (and from infinitesimal predictions). We calculated Δz², the square of the arcsine-transformed allele frequency difference between F0 and F17; this has an expected variance of 1/(2Ne) per generation, independent of starting frequency, and ranges from 0 to π². We averaged Δz² within 10 kbp windows (see Methods for details), and found 169 windows belonging to eight clusters that had significant shifts in allele frequency in LS1 and/or LS2 (corresponding to 9 and 164 clustered windows respectively at P ≤ 0.05 under HINF, max LD; Δz² ≥ 0.33 π²; genome-wide Δz² = 0.02 ± 0.03 π²; Fig. 2; figs. S2D, S5, S6; see Methods for details) and in 3 clusters in Ctrl (8 windows; genome-wide Δz² = 0.01 ± 0.02 π²). The eight loci overlapped between 2 and 179 genes and together contain 11 candidate genes with known roles in bone, cartilage and/or limb development (e.g., Nkx3-2 and Sox9; Table 1). Four out of the eight loci contain genes with a “short tibia” or “short limb” knockout phenotype (Table 1; P ≤ 0.032 from 1000 permutations, see Methods for details). Of the broader set of genes at these loci with any limb knockout phenotypes, only fibrillin 2 (Fbn2) is polymorphic for SNPs coding for different amino acids, suggesting that for the majority of loci with large shifts in allele frequency, gene regulatory mechanism(s) were likely important in the selection response (fig. S7; Table S3; see Supplementary Notes for further analyses on enrichment in gene functions, protein-coding vs. cis-acting changes and clustering with loci affecting human height).
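The Δz² statistic can be sketched as follows. This is a minimal illustration that assumes the transform z = 2·arcsin(√p), a form inferred from the stated range of 0 to π² rather than quoted from the Methods, and that assumes non-overlapping windows; the function names and toy frequencies are hypothetical.

```python
import math

def arcsine_z(p):
    """Variance-stabilising transform z = 2*asin(sqrt(p)), so z lies in [0, pi]."""
    return 2.0 * math.asin(math.sqrt(p))

def delta_z2(p_start, p_end):
    """Squared shift in transformed allele frequency, in [0, pi**2]."""
    return (arcsine_z(p_end) - arcsine_z(p_start)) ** 2

def window_means(positions, p_start, p_end, window_bp=10_000):
    """Average delta-z^2 over the SNPs falling in each fixed-size window.

    Returns a dict keyed by window start coordinate (bp).
    """
    sums, counts = {}, {}
    for pos, p0, p1 in zip(positions, p_start, p_end):
        w = pos // window_bp
        sums[w] = sums.get(w, 0.0) + delta_z2(p0, p1)
        counts[w] = counts.get(w, 0) + 1
    return {w * window_bp: sums[w] / counts[w] for w in sorted(sums)}

# Toy example: two unchanged SNPs in the first window, one complete shift in the second.
means = window_means([1_200, 8_900, 15_000], [0.2, 0.5, 0.0], [0.2, 0.5, 1.0])
print(means)
```

With this transform, the drift variance of z per generation is approximately 1/(2Ne) regardless of p, which is why shifts can be compared across SNPs with different starting frequencies.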
Taken together, two major observations stand out from our genomic survey. One, a polygenic, infinitesimal selection model with strong LD amongst marker SNPs best fits the observed data; and two, we nevertheless find more discrete loci in LS1 and LS2 than in Ctrl, beyond the significance threshold set by the infinitesimal model (Fig. 2; fig. S5). Thus, we conclude that the genetic basis of the selection response in the Longshanks experiment has a significant contribution from discrete loci with major effect, but the remaining response may be indistinguishable from a polygenic response.
Table 1. Major loci likely contributing to the selection response. These 8 loci show significant allele frequency shifts in Δz² and are ordered according to their estimated selection coefficients. Shown for each locus are the full hitchhiking spans, peak location and their size covering the core windows, the overlapping TAD and the number of genes found in it. The two top-ranked loci show shifts in parallel in both LS1 & LS2, with the remaining six showing line-specific response (LS1: 1; LS2: 5). Candidate genes found within the TAD with limb, cartilage or bone developmental knockout phenotypes are shown, with asterisks (*) marking those with a “short tibia” knockout phenotype (see also fig. S6 and Table S3 for full table).
We next tested the repeatability of the selection response at the gene/locus level using the two LS replicates. If the founding populations shared the same selectively favored variants, we may observe parallelism or co-incident selective sweeps, as long as selection could overcome random drift. Indeed, the Δz² profiles of LS1 & LS2 were more similar to each other than to Ctrl (Fig. 2 & 3A; fig. S8; Pearson’s correlation in Δz² from 10 kbp windows: LS1–LS2: 0.21, vs. LS1–Ctrl: 0.063 and LS2–Ctrl: 0.045). Whereas previous genomic studies with multiple natural or artificial selection replicates focused mainly on detecting parallel loci (23-26), here we have the possibility to quantify parallelism and determine the selection value of a given locus. Six out of eight significant loci at the HINF, max LD threshold were line-specific, even though the selected alleles were always present in the F0 generation in both lines. This prevalence of line-specific loci was consistent, even if we used different thresholds. However, the two remaining loci that ranked first and second by selection coefficient were parallel, both with s > (Fig. 3B; note that as outliers, the selection coefficient may be substantially overestimated, but their rank order should remain the same), supporting the idea that the probability of parallelism can be high among those loci with the greatest selection advantage (27). In contrast to changes in global diversity over 17 generations, where we could only detect a slight difference between the LS lines and Ctrl, we found the signature of parallelism to be significantly different between the selected LS1 and LS2 replicates, as opposed to comparisons with Ctrl, or between simulated replicates (fig. S8; χ² test, LS1–LS2: P ≤ 1 × 10⁻¹⁰; Ctrl–LS1 and Ctrl–LS2, P > 0.01 and P > 0.2, respectively, both non-significant after correcting for multiple testing; see Supplementary Notes for details).
As such, the parallel selected loci between LS1 and LS2 provide the strongest evidence for the role of discrete major loci; and represent prime candidates for molecular dissection (see fig. S9 and Supplementary Notes “Molecular dissection of Gli3” for an additional locus).
Molecular dissection of the Nkx3-2 locus highlights cis-acting changes
Of the two major parallel loci, we chose the locus on chromosome 5 (Chr5) at 41–42 Mbp for functional validation because it showed the strongest estimated selection coefficient and both Longshanks lines shared a clear and narrow selection signature. Crucially, it contains only three genes, including Nkx3-2 (also known as Bapx1), a known regulator of bone maturation (Fig. 2 & 4A) (29). At this locus, the pattern of variation resembles a selective sweep spanning 1 Mbp (Fig. 5A). Comparison between F0 and F17 individuals revealed no recombinant in this entire region (fig. S11A, top panel), precluding fine-mapping using recombinants. We then analyzed the genes in this region to identify the likely target(s) of selection. First, we determined that no coding changes existed for either Rab28 or Nkx3-2, the two genes located within the topologically associating domain (TAD; TADs mark chromosome segments with shared gene regulatory logic) (22). We then performed in situ hybridization and detected robust expression of Nkx3-2 and Rab28 in the developing fore- and hindlimb buds of Ctrl, LS1 and LS2 E12.5 embryos, in a domain broadly overlapping the presumptive zeugopod, the region including the tibia (fig. S10B). A third gene, Bod1l, straddled the TAD boundary with its promoter located in the neighboring TAD, making its regulation by sequences in the selected locus unlikely. Consistent with this, Bod1l showed only weak or undetectable expression in the developing limb bud (fig. S10A). We next combined ENCODE chromatin profiles and our own ATAC-Seq data to identify limb enhancers in the focal TAD. Here we found 3 novel enhancer candidates (N1, N2 and N3) carrying 3, 1 and 3 SNPs respectively, all of which showed significant allele frequency shifts in LS1 & LS2 (Fig. 4B & C; fig. S11A).
Chromosome conformation capture assays showed that the N1–N3 sequences formed long-range looping contacts with the Nkx3-2 promoter—a hallmark of enhancers—despite nearly 600 kbp of intervening sequence (Fig. 4B). We next used transgenic reporter assays to determine whether these sequences could drive expression in the limbs. Here, we were not only interested in whether the sequence encoded enhancer activity, but specifically whether the SNPs would affect the activity (Fig. 4D). We found that the F0 alleles of the N1 and N3 enhancers (3 SNPs each in about 1 kbp) drove robust and consistent lacZ expression in the developing limb buds (N1 and N3) as well as in expanded trunk domains (N3) at E12.5 (Fig. 4E). In contrast, transgenic reporters carrying the selected F17 alleles of N1 and N3 from the Chr 5: 41 Mbp locus showed consistently weak, nearly undetectable lacZ expression (Fig. 4E). Thus, switching from the F0 to the F17 enhancer alleles led to a nearly complete loss in activity (“loss-of-function”). This is consistent with the role of Nkx3-2 as a repressor in long bone maturation (29). We hypothesize that the F17 allele causes de-repression of bone formation by reducing enhancer activity and Nkx3-2 expression. Crucially, the F0 N1 enhancer showed activity that presages future long bone cartilage condensation in the limb (Fig. 4D). This pattern recalls previous results that suggest that undetected early expression of Nkx3-2 may mark the boundaries and size of limb bone precursors, including the tibia (30) (fig. S10C). Conversely, over-expression of Nkx3-2 has been shown to cause shortened tibia (even loss) in mice (31, 32). In humans, homozygous frameshift mutations in NKX3-2 cause the rare disorder spondylo-megaepiphyseal-metaphyseal dysplasia (SMMD; OMIM: 613330) that is characterized by short-trunk, long-limbed dwarfism and bow-leggedness (33). This broadly corresponds to the expression domains of the two novel N1 (limbs) and N3 (limbs and trunk) enhancers. 
Instead of wholesale loss of Nkx3-2 expression, which would have been lethal in mice (34) or likely cause major defects similar to SMMD patients (33), our in situ hybridization data did not reveal qualitative differences in Nkx3-2 expression domains between Ctrl or LS embryos (fig. S10B). Taken together, our results recapitulate the key features of cis-acting mode of adaptation: Nkx3-2 is a broadly expressed pleiotropic transcription factor, which causes lethality when knocked out (34). We found no amino acid changes that could impact its protein function. Rather, changes of tissue-specific expression by modular enhancers likely played a more important role. By combining population genetics, functional genomics and developmental genetic techniques, we were able to dissect a megabase-long locus and present data supporting the identification of up to 6 candidate quantitative trait nucleotides (QTNs). In mice, this represents a rare example of genetic dissection of a trait to the base-pair level.
Linking molecular mechanisms to evolutionary consequence
We next aimed to determine the evolutionary relevance of the Nkx3-2 enhancer variants at the molecular and the population level. At the strongly expressed N3/F0 “trunk and limb” enhancer, we note that the SNPs in the F17 selected allele lead to disrupted Nkx3-1 and Nkx3-2 binding sites [Fig. 4C & 5A; UNIPROBE database (35)]. This suggests that the selected SNPs may disrupt an auto-feedback loop to decrease Nkx3-2 activity in the limb bud and trunk domain (Fig. 5A). Using a GFP transgenic reporter assay in stickleback fish embryos, we found that the mouse N1/F0 enhancer allele was capable of driving expression in the distal cells but not in the fin rays of the developing fins (Fig. 5A). This pattern recapitulates fin expression of nkx3.2 in fish, which gives rise to endochondral radials (homologous to ulna/tibia in mice) (36). Our results suggest that despite its deep functional conservation, strong selection may have favored the weaker N1/F17 and N3/F17 enhancer alleles in the context of the Longshanks selection regime.
Using theory and simulations, we can go beyond the qualitative level and quantitatively estimate the selection coefficient and the contribution of the Nkx3-2 locus to the total selection response in the Longshanks mice. We retraced the selective sweep of the Nkx3-2 N1 and N3 alleles through targeted genotyping in 1569 mice across all 20 generations. The selected allele steadily increased from around 0.17 to 0.85 in LS1 and 0.98 in LS2 but fluctuated around 0.25 in Ctrl (Fig. 5B). We estimate that such a change of around 0.8 in allele frequency would correspond to a selection coefficient s of ∼0.24 ± 0.12 at this locus (Fig. 5C; see Supplementary Notes section on “Estimating selection coefficient”). By extending our simulation framework to allow for a major locus against an infinitesimal background, we find that the Nkx3-2 locus would have to contribute 9.4% of the total selection response (limits 3.6 – 15.5%; see Supplementary Notes section “Estimating selection coefficient” for details) in order to produce a shift of 0.8 in allele frequency over 17 generations.
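For intuition only, a deterministic approximation gives a selection coefficient of similar magnitude. The sketch below is not the simulation-based estimator described in the Supplementary Notes: it assumes simple genic selection, under which logit(p) increases by roughly s per generation, and it ignores drift, dominance and pedigree structure. The function names and the choice of 20 generations as the interval are assumptions.

```python
import math

def logit(p):
    """Log-odds of an allele frequency p in (0, 1)."""
    return math.log(p / (1.0 - p))

def s_from_trajectory(p0, pt, generations):
    """Deterministic genic-selection estimate: logit(p) rises by about s per generation."""
    return (logit(pt) - logit(p0)) / generations

# Start and end frequencies quoted in the text; the interval length is an assumption.
s_ls1 = s_from_trajectory(0.17, 0.85, 20)
s_ls2 = s_from_trajectory(0.17, 0.98, 20)
print(f"LS1: s ~ {s_ls1:.2f}; LS2: s ~ {s_ls2:.2f}")
```

Both crude values fall within the interval reported above (0.24 ± 0.12), although the agreement should not be over-interpreted given how much the approximation leaves out.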
Discussion
A defining task of our time is to understand the factors that determine and constrain how small populations respond to sudden environmental changes. Here, we analyze the genomic changes in the Longshanks experiment, which was conducted under replicated and controlled conditions, to characterize the genomic changes that occur as small experimental populations respond to selection.
An important conclusion from the Longshanks experiment is that tibia length increased readily and repeatedly in response to selection even in an extremely bottlenecked population with as few as 14-16 breeding pairs. This is because the lines were founded with enough standing variation, and generation 17 was still only a fraction of the way to the characteristic time for the selection response at ∼2Ne generations (37), estimated here to be around 90 (fig. S2B; see Supplementary Notes on simulation). This Ne of 46, while small, is comparable to those in natural populations like the Soay sheep (35.3), Darwin’s finches (38–60) or Tasmanian Devils (26–37; this last study documents a rapid and parallel evolutionary response to transmissible tumor) (38-40). Our results underscore the importance of standing genetic variation in rapid adaptive response to a changing environment, a recurrent theme in natural adaptation (24, 39, 41) and breeding (42).
By combining pedigree records with sequencing of founder individuals, our data had sufficient detail to allow precise modeling of trait response with predicted shifts in allele frequency distribution that closely matched our results, and with specific loci that we functionally validated. Our results imply a mixed genetic architecture with a few discrete loci of large effect, amid an infinitesimal background. This finding highlights another advantage of evolve-and-resequence (E&R) experiments over quantitative trait loci (QTL) mapping crosses: by sampling a much broader pool of alleles and continually competing them against each other, the inferred genetic architecture and distribution of effect sizes are more likely to be representative of the population at large.
Parallel evolution is often seen as a hallmark for detecting selection (25, 43-45). We investigated the factors favoring parallelism by contrasting the two Longshanks replicates against the Control line. We observed little to no parallelism between selected lines and Ctrl, or between simulated replicates of selection, even though the simulated haplotypes were sampled directly from actual founders. This underscores that parallelism depends on both shared selection pressure (absent in Ctrl) and the availability of large effect loci that confer a substantial selection advantage (absent under the infinitesimal model; Fig. 3A&B; fig. S8).
Through in-depth dissection of the Nkx3-2 locus, our data show in fine detail how the selective value of standing variants depends strongly on the selection regime: the originally common F0 variant of the N1 enhancer shows deep functional conservation and can evidently recapitulate fin nkx3.2 expression in fishes (Fig. 5A). Yet in the Longshanks experiment selection strongly favored the inactive allele (Fig. 5B). Similarly, our molecular dissection of two loci shows that both gain-of-function (Gli3) and loss-of-function (Nkx3-2) variants could be favored by selection (Fig. 4E, 5A; fig. S9D). Through synthesis of multiple lines of evidence, our work uncovered the key role of Nkx3-2, which was not an obvious candidate gene like Gli3 due to the lack of limb phenotype in the Nkx3-2 knockout mice. To our surprise, the same loss of NKX3-2 function in human SMMD patients manifests itself in opposite ways in different bone types as short trunk and long limbs (33). This matches the expression domains of our N1 (limb) and N3 (limb and trunk) enhancers (Fig. 5A). In the absence of any lethal coding mutations, evidently the F17 haplotype carried beneficial alleles at both enhancers for the limb and potentially also trunk target tissues, and was therefore strongly favored under the novel selection regime in the Longshanks selection experiment. We estimate that these enhancer variants, along with any other tightly linked beneficial SNPs, segregate as a single locus and contribute ∼10% of the overall selection response.
Conclusion
Using the Longshanks selection experiment, we show that by synthesizing theory, empirical data and molecular genetics, it is possible to identify some of the individual SNPs that have contributed to the response to selection on morphology. In particular, discrete, large effect loci are revealed by their parallel response. Further work should focus on dissecting the mechanisms behind the dynamics of selective sweeps and/or polygenic adaptation by resequencing the entire selection pedigree; testing how the selection response depends on the genetic architecture; and the extent to which linkage places a fundamental limit on our inference of selection. Improved understanding in these areas may have broad implications for conservation, rapid adaptation to climate change and quantitative genetics in medicine, agriculture and in nature.
Author Contributions
C.R. designed and initiated the Longshanks selection experiment. M.M. and C.R. performed the selection, phenotyping and collected tissue samples for sequencing. Y.F.C. and C.R. designed the sequencing strategy. W.H.B. and M.K. prepared the samples and performed sequencing. S.B. and N.B. performed simulations and analyzed data. M.M., S.B., N.B., C.R. and Y.F.C. analyzed the pedigree data. J.P.L.C., M.N.Y., M.K., I.S., J.C., C.R. and Y.F.C. designed, performed and analyzed results from functional experiments. J.P.L.C., R.N. and Y.F.C. planned, performed and analyzed the mouse transgenic experiments. S.B., M.N.Y., C.R., N.B. and Y.F.C. analyzed the genomic data. All authors discussed the results and implications, wrote and commented on the manuscript at all stages.
Competing Financial Interests
The authors declare no competing interests. The Max Planck Society, IST Austria, the Natural Sciences and Engineering Research Council of Canada, and the Faculty of Veterinary Medicine of the University of Calgary provide funding for the research but have no other competing interests.
Materials and Methods
Animal Care and Use
All experimental procedures described in this study have been approved by the applicable University institutional ethics committee for animal welfare at the University of Calgary (HSACC Protocols M08146 and AC13-0077); or local competent authority: Landesdirektion Sachsen, Germany, permit number 24-9168.11-9/2012-5.
Reference genome assembly
All co-ordinates in the mouse genome refer to Mus musculus reference mm10, which is derived from GRCm38.
Code and data availability
Sequence data have been deposited in the GEO database under accession number [X]. Non-sequence data have been deposited at Dryad under accession number [Y]. Analytical code and additional notes have been deposited in the following repository: https://github.com/evolgenomics/Longshanks.
Pedigree data
Tibia length and body weight phenotypes were measured as previously described (14). A total of 1332 Control, 3054 LS1, and 3101 LS2 individuals were recorded. Five outlier individuals with a skeletal dysplasia of unknown etiology were removed from LS2 and excluded from further analysis. Missing data in LS2 were filled in with random individuals that best matched the pedigree. Trait data were analyzed to determine response to selection based on the measured traits and their rank orders based on the selection index.
Simulations
Simulations were based on the actual pedigree and selection scheme, following one chromosome at a time. Each chromosome was represented by a set of junctions, which recorded the boundaries between genomes originating from different founder genomes; at the end, the SNP genotype was reconstructed by seeding each block of genome with the appropriate ancestral haplotype. This procedure is much more efficient than following each of the very large number of SNP markers. Crossovers were uniformly distributed, at a rate equal to the map length (46). Trait value was determined by a component due to an infinitesimal background (Vg); a component determined by the sum of effects of 10⁴ evenly spaced discrete loci (Vs); and a Gaussian non-genetic component (Ve). The two genetic components had variance proportional to the corresponding map length, and the heritability was estimated from the observed trait values (see Supplementary Notes under “Simulations”). In each generation, the actual number of male and female offspring were generated from each breeding pair, and the male and female with the largest trait value were chosen to breed.
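The junction representation described above can be sketched in a few lines. This is an illustrative toy, not the code in the project repository: it assumes a haplotype is stored as a sorted list of (start position in Morgans, founder label) pairs, and that crossovers follow a Poisson process with one expected crossover per Morgan and no interference. All function names are hypothetical.

```python
import random

def founder_at(hap, x):
    """Founder label carried by haplotype `hap` at map position x."""
    label = hap[0][1]
    for start, founder in hap:
        if start > x:
            break
        label = founder
    return label

def slice_hap(hap, lo, hi):
    """Junctions of `hap` restricted to the interval [lo, hi)."""
    out = [(lo, founder_at(hap, lo))]
    out += [(s, f) for s, f in hap if lo < s < hi]
    return out

def merge_adjacent(hap):
    """Drop junctions that do not change the founder label."""
    merged = [hap[0]]
    for start, founder in hap[1:]:
        if founder != merged[-1][1]:
            merged.append((start, founder))
    return merged

def make_gamete(hap_a, hap_b, length, rng):
    """Recombine two parental haplotypes of map length `length` (Morgans)."""
    parents = (hap_a, hap_b)
    current = rng.randrange(2)       # which parental haplotype is copied first
    gamete, lo = [], 0.0
    x = rng.expovariate(1.0)         # position of the first crossover
    while True:
        hi = min(x, length)
        gamete += slice_hap(parents[current], lo, hi)
        if x >= length:
            break
        lo, current = x, 1 - current  # switch template at the crossover
        x += rng.expovariate(1.0)
    return merge_adjacent(gamete)

rng = random.Random(1)
gamete = make_gamete([(0.0, "F1")], [(0.0, "F2")], length=1.0, rng=rng)
print(gamete)
```

Only the junctions are tracked through the pedigree; SNP genotypes would be filled in at the end by looking up each founder label's alleles, which is what makes the approach cheap relative to carrying millions of markers.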
SNP genotypes were assigned to the founder genomes with their observed frequencies. However, to reproduce the correct variability requires that we assign founder haplotypes. This is not straightforward, because low-coverage individual genotypes cannot be phased reliably, and heterozygotes are frequently mis-called as homozygotes. We compared three procedures, which were applied within intervals that share the same ancestry: assigning haplotypes in linkage equilibrium (LE, or “no LD”); assigning heterozygotes to one or other genome at random, which minimises linkage disequilibrium, given the diploid genotype (“min LD”); and assigning heterozygotes consistently within an interval, which maximises linkage disequilibrium (“max LD”) (fig. S2C). For details, see Supplementary Notes.
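The difference between the “min LD” and “max LD” assignments can be illustrated with a toy function. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) and interprets “consistently” as always placing the alternate allele of a heterozygote on the same haplotype within an interval; that reading, the function name and the mode labels are assumptions, and the actual procedure is described in the Supplementary Notes.

```python
import random

def split_genotypes(genotypes, mode, rng):
    """Turn unphased genotypes (0/1/2 alt-allele counts) into two haplotypes.

    mode="max_ld": the alt allele of every heterozygote goes to haplotype 0.
    mode="min_ld": the alt allele of each heterozygote goes to a random haplotype.
    """
    if mode not in ("max_ld", "min_ld"):
        raise ValueError("mode must be 'max_ld' or 'min_ld'")
    hap0, hap1 = [], []
    for g in genotypes:
        if g == 1:
            first = 1 if mode == "max_ld" else rng.randrange(2)
            hap0.append(first)
            hap1.append(1 - first)
        else:
            # Homozygotes are unambiguous: both haplotypes carry the same allele.
            hap0.append(g // 2)
            hap1.append(g // 2)
    return hap0, hap1

rng = random.Random(0)
genotypes = [1, 1, 0, 2, 1]
max0, max1 = split_genotypes(genotypes, "max_ld", rng)
min0, min1 = split_genotypes(genotypes, "min_ld", rng)
```

Either assignment preserves the diploid genotype at every site; the two modes differ only in how strongly alleles at neighbouring heterozygous sites are associated.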
Significance thresholds
To obtain significance thresholds, we summarized the genome-wide maximum Δz² shift for each replicate of the simulated LS1 and LS2 lines, averaged within 10 kbp windows, and grouped by the selection intensity and extent of linkage disequilibrium (LD). From this distribution of genome-wide maximum Δz² we obtained the critical value for the corresponding significance threshold (typically the 95th percentile, or P = 0.05) under each selection and LD model (Fig. 3A; fig. S2E). This procedure controls for the effect of linkage and hitchhiking, line-specific pedigree structure, and selection strength.
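A minimal sketch of the maximum-statistic threshold is given below. It assumes that each simulated replicate is available as a list of window-averaged values; the function name `max_stat_threshold` and the input layout are hypothetical and do not describe the actual analysis scripts.

```python
import math

def max_stat_threshold(replicate_windows, alpha=0.05):
    """Critical value from the distribution of per-replicate genome-wide maxima.

    replicate_windows: one list of window-averaged statistics per simulated replicate.
    Returns the smallest observed maximum with at least (1 - alpha) of maxima at or below it.
    """
    maxima = sorted(max(windows) for windows in replicate_windows)
    # The small offset guards against floating-point error before rounding up.
    rank = math.ceil((1.0 - alpha) * len(maxima) - 1e-9)
    return maxima[max(rank, 1) - 1]

# Toy input: 20 replicates whose genome-wide maxima are 1.0, 2.0, ..., 20.0.
toy = [[0.1, float(i)] for i in range(1, 21)]
print(max_stat_threshold(toy, alpha=0.05))
```

Because the threshold is taken over the genome-wide maximum of each replicate rather than over individual windows, it accounts for the number of windows tested and for their correlation through linkage.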
Sequencing, genotyping and phasing pipeline
Sequencing libraries for high-throughput sequencing were generated using TruSeq or Nextera DNA Library Prep Kit (Illumina, Inc., San Diego, USA) according to manufacturer’s recommendations or using equivalent Tn5 transposase expressed in-house as previously described (47). Briefly, genomic DNA was extracted from ear clips by standard Proteinase K digestion (New England Biolabs GmbH, Frankfurt am Main, Germany) followed by AmpureXP bead (Beckman Coulter GmbH, Krefeld, Germany) purification. Extracted high-molecular weight DNA was sheared with a Covaris S2 (Woburn, MA, USA) or “tagmented” by commercial or purified Tn5-transposase according to manufacturer’s recommendations. Each sample was individually barcoded (single-indexed as N501 with N7XX variable barcodes; all oligonucleotides used in this study were synthesized by Integrated DNA Technologies, Coralville, Iowa, USA) and pooled for high-throughput sequencing by a HiSeq 3000 (Illumina) at the Genome Core Facility at the MPI Tübingen Campus. Sequenced data were pre-processed using a pipeline consisting of data clean-up, mapping, base-calling and analysis based upon fastQC v0.10.1 (52); trimmomatic v0.33 (48); bwa v0.7.10-r789 (49); GATK v3.4-0-gf196186 modules BQSR, MarkDuplicates, IndelRealignment (50, 51). Genotype calls were performed using the GATK HaplotypeCaller under the GENOTYPE_GIVEN_ALLELES mode using a set of high-quality SNP calls made available by the Wellcome Trust Sanger Centre [Mouse Genomes Project version 3 dbSNP v137 release (52)], after filtering for sites segregating among inbred lines that may have contributed to the original 7 female and 2 male CD-1 founders, namely 129S1/SvImJ, AKR/J, BALB/cJ, BTBR T+ Itpr3tf/J, C3H/HeJ, C57BL/6NJ, CAST/EiJ, DBA/2J, FVB/NJ, KK/HiJ, MOLF/EiJ, NOD/ShiLtJ, NZO/HlLtJ, NZW/LacJ, PWK/PhJ and WSB/EiJ based on (17). We consider a combined ∼100x coverage sufficient to recover any of the 18 CD-1 founding haplotypes still segregating at a given locus.
The raw genotypes were phased with Beagle v4.1 (53) based on genotype posterior likelihoods using a genetic map interpolated from the mouse reference map (46) and imputed from the same putative CD-1 source lines as the reference panel. The site frequency spectra (SFS) were evaluated to ensure genotype quality (fig. S3A).
Population genetics summary statistics
Summary statistics of the F0 and F17 samples were calculated genome-wide (Weir–Cockerham FST, π, heterozygosity), in adjacent 10 kbp windows (Weir–Cockerham FST, π, allele frequencies p and q), or on a per-site basis (Weir–Cockerham FST, π, p and q) using VCFtools v0.1.14 (54). The summary statistic Δz² was the squared within-line difference in arcsine square root transformed minor allele frequency (MAF) q; it ranges from 0 to π². The resulting data were further processed by custom bash, Perl and R v3.2.0 (55) scripts.
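As a worked illustration of the allele frequency statistic, the sketch below computes the squared change in arcsine square root transformed frequency between two time points. The factor of 2 in the transform is an assumption inferred from the stated upper bound of π squared; the function name `delta_z2` is hypothetical.

```python
import math

def delta_z2(q_start, q_end):
    """Squared change in transformed allele frequency, with z = 2 * asin(sqrt(q)).

    The factor of 2 is assumed here so that the statistic spans 0 to pi**2.
    """
    z_start = 2.0 * math.asin(math.sqrt(q_start))
    z_end = 2.0 * math.asin(math.sqrt(q_end))
    return (z_end - z_start) ** 2

# Express a shift from q = 0.1 to q = 0.9 as a fraction of the maximum, pi**2.
print(delta_z2(0.1, 0.9) / math.pi ** 2)
```

The transform stabilises the sampling variance of allele frequencies, so that a given amount of drift produces a similar expected shift whether the starting frequency is intermediate or close to 0 or 1.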
Peak loci and filtering for hitchhiking windows
Peak loci were defined by a descending rank ordering of all 10 kbp windows, and from each peak signal the windows were extended by 100 SNPs to each side, until no single SNP rising above a Δz² shift of 0.2 π² was detected. A total of 810 peaks were found with a Δz² shift ≥ 0.2 for LS1 and LS2. Following the same procedure, we found 766 peaks in Ctrl.
Candidate genes
To determine whether genes with related developmental roles were associated with the selected variants, the topologically associating domains (TADs) derived from mouse embryonic stem cells as defined elsewhere (22) were re-mapped onto mm10 co-ordinates. Genes within the TAD overlapping within 500 kbp of the peak window (“core span”) were then cross-referenced against annotated knockout phenotypes (Mouse Genome Informatics, http://www.informatics.jax.org). This broader overlap was chosen to account for genes whose regulatory sequences, such as enhancers, but not their gene bodies fall close to the peak window. We highlight candidate genes showing limb- and bone-related phenotypes, e.g., with altered limb bone lengths or epiphyseal growth plate morphology, as observed in Longshanks mice (16), of the following categories (along with their Mammalian Phenotype Ontology term and the number of genes): “abnormal tibia morphology/MP:0000558” (212 genes), “short limbs/MP:0000547” and “short tibia/MP:0002764” (223 genes), “abnormal cartilage morphology/MP:0000163” (321 genes), “abnormal osteoblast morphology/MP:0004986” (122 genes). Note that we exclude compound mutants or those conditional mutant phenotypes involving transgenes. To determine if the overlap with these genes is significant, we performed 1000 permutations of the core span using bedtools v2.22.1 shuffle with the -noOverlapping option (56) and excluding ChrY, ChrM and the unassembled scaffolds. We then followed the exact procedure as above to determine the number of genes in the overlapping TAD belonging to each category. We reported the quantile rank as the P-value, ignoring ties. To determine other genes in the region, we list all genes falling within the entire hitchhiking window (Table S3).
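The permutation P-value can be sketched as the quantile rank of the observed gene count within the permuted counts. The function name `permutation_p_value` is hypothetical, and treating "ignoring ties" as counting only permutations that strictly exceed the observed value is an assumption.

```python
def permutation_p_value(observed, permuted):
    """Fraction of permuted counts strictly greater than the observed count.

    Ties with the observed value are not counted (an assumed reading of "ignoring ties").
    """
    exceed = sum(1 for value in permuted if value > observed)
    return exceed / len(permuted)

# Toy example with 10 permutations; the study used 1000.
toy_permuted = [1, 2, 5, 7, 9, 3, 4, 5, 6, 8]
print(permutation_p_value(5, toy_permuted))
```

With 1000 permutations the smallest non-zero value this estimator can return is 0.001; an observed count above every permuted count yields 0 and is better reported as P < 0.001.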
Identification of putative limb enhancers
We downloaded publicly available chromatin profiles, derived from E14.5 limbs, for the histone H3 lysine-4 (K4) or lysine-27 (K27) mono-/tri-methylation or acetylation marks (H3K4me1, H3K4me3 and H3K27ac) generated by the ENCODE Consortium (57). We intersected the peak calls for the enhancer-associated marks H3K4me1 and H3K27ac and filtered out those overlapping promoters [H3K4me3 and promoter annotation according to the FANTOM5 Consortium (58)].
Enrichment analysis
To calculate enrichment through the whole range of Δz², a procedure similar to that in Candidate genes above was followed. For knockout gene functions, genes contained in TADs within 500 kbp of peak windows were included in the analysis. We use the complete database of annotated knockout phenotypes for genes or spontaneous mutations, after removing phenotypes reported under conditional or polygenic mutants. For gene expression data, we retained all genes which have been reported as being expressed in any of the limb structures, by tracing each anatomy ontological term through its parent terms, up to the top level groupings, e.g., “limb”, in the Mouse Genome Informatics Gene Expression Database (59). For E14.5 enhancers, we used a raw 500 kbp overlap with the peak windows, because enhancers, unlike genes, may not have intermediaries and may instead represent direct selection targets.
For coding mutations, we first annotated all SNPs for their putative effects using snpEff v4.0e (60). To accurately capture the per-site impact of coding mutations, we used per-site Δz² instead of the averaged 10 kbp window. For each population, we divided all segregating SNPs into up to 0.02 bands based on per-site Δz². We then tracked the impact of coding mutations in genes known to be expressed in limbs, as above. We reported the sum of all missense (“moderate” impact), frame-shift, stop codon gain or loss sites (“high” impact). A linear regression was used to evaluate the relationship between Δz² and the average impact of coding SNPs (the ratio of SNPs with high or moderate impact to all coding SNPs).
For regulatory mutations, we used the same bins spanning the range of Δz², but focused on the subset of SNPs falling within the ENCODE E14.5 limb enhancers. We then obtained a weighted average conservation score based on an averaged phastCons (61) or phyloP (62) score in ±250 bp flanking the SNP, calculated from a 60-way alignment between placental mammal genomes [downloaded from the UCSC Genome Browser (63)]. We reported the average conservation score of all SNPs within the bin and fitted a linear regression on a log scale. In particular, phastCons scores range from 0 (un-conserved) to 1 (fully conserved), whereas phyloP is the |log10| of the P-value of the phylogenetic tree, expressed as a positive score for conservation and a negative score for lineage-specific accelerated change. We favored using phastCons for its simpler interpretation.
Impact of coding variants
Using the same SNP effect annotations described in the section above, we checked whether any specific SNP with significant site-wise Δz² in either LS1 or LS2 causes an amino acid change or protein disruption in a gene known to cause limb defects when knocked out. For each position we examined outgroup sequences using the 60-way placental mammal alignment to determine the ancestral amino acid state and whether the selected variant was consistent with purifying vs. diversifying selection. The resulting 12 genes that match these criteria are listed in Table S4.
Association with human height loci
To test if loci known to be associated with human height are clustered with the selected loci in the LS lines, we downloaded the set of previously published 697 SNPs (64). In order to facilitate mapping to mouse co-ordinates, each SNP was expanded to 100 kbp centering on the SNP and converted to mm10 positions using the liftOver tool with the multiple mapping option disabled (63). We were able to assign positions to 655 out of the 697 total SNPs. Then, for each of the 810 loci above the HINF, no LD threshold, the minimal distance to any of the mapped human loci was determined using bedtools closest with the -d option (56). Should a region actually overlap, a distance of 0 bp was assigned. To generate the permuted set, the 810 loci were randomly shuffled across the mouse autosomes using the bedtools shuffle program with the -noOverlapping option. The same procedure as for the actual data was then followed to determine the closest interval. The resulting permuted intervals follow an approximately normal distribution, with the actual observed results falling completely below the range of permuted results.
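The distance calculation can be illustrated with a simplified stand-in for the bedtools step. The function name `closest_distance`, the restriction to a single chromosome and the treatment of intervals as half-open (start, end) pairs are assumptions; in particular, directly abutting intervals are given a distance of 0 here, which may differ from the convention used by bedtools.

```python
def closest_distance(query, targets):
    """Smallest gap in bp between a query interval and any target interval.

    Intervals are half-open (start, end) pairs on the same chromosome.
    Overlapping (or directly abutting) intervals give a distance of 0.
    """
    q_start, q_end = query
    best = None
    for t_start, t_end in targets:
        if t_end <= q_start:        # target lies entirely to the left
            gap = q_start - t_end
        elif t_start >= q_end:      # target lies entirely to the right
            gap = t_start - q_end
        else:                       # intervals overlap
            gap = 0
        if best is None or gap < best:
            best = gap
    return best

height_loci = [(0, 50), (250, 300)]
print(closest_distance((100, 200), height_loci))
```

Applying the same function to the observed and to each shuffled set of loci gives the two distance distributions that are compared in the test.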
In situ hybridization
Detection of specific gene transcripts was performed as previously described in (65). Probes against Nkx3-2, Rab28, Bod1l and Gli3 were amplified from cDNA from wildtype C57BL/6NJ mouse embryos (Table S5). Amplified fragments were cloned into pJET1.2/blunt plasmid backbones in both sense and anti-sense orientations using the CloneJET PCR Kit (Thermo Fisher Scientific, Schwerte, Germany) and confirmed by Sanger sequencing using the included forward and reverse primers. Probe plasmids have also been deposited with Addgene. In vitro transcription from the T7 promoter was performed using the MAXIscript T7 in vitro Transcription Kit (Thermo Fisher Scientific) supplemented with Digoxigenin-11-UTP (Sigma-Aldrich) (MPI Tübingen), or with T7 RNA polymerase (Promega) in the presence of DIG RNA labelling mix (Roche) (University of Calgary). Following TURBO DNase (Thermo Fisher Scientific) digestion, probes were cleaned using SigmaSpin Sequencing Reaction Clean-Up columns (Sigma-Aldrich) (MPI Tübingen), or using Illustra MicroSpin G-50 columns (GE Healthcare) (University of Calgary). During testing of probe designs, sense controls were used in parallel reactions to establish background non-specific binding.
ATAC-seq library preparation and sequencing pipeline
ATAC-seq was performed on dissected C57BL/6NJ E14.5 forelimb and hindlimb. Nuclei preparation and tagmentation were performed as previously described in (28), with the following modifications. To minimize endogenous protease activity, cells were strictly limited to 5 + 5 minutes of collagenase A treatment at 37 °C, with frequent pipetting to aid dissociation into single-cell suspensions. Following wash steps and cell lysis, 50 000 nuclei were tagmented with expressed Tn5 transposase. Each tagmented sample was then purified by MinElute columns (Qiagen) and amplified with Q5 High-Fidelity DNA Polymerase (New England Biolabs) using a uniquely barcoded i7-index primer (N701-N7XX) and the N501 i5-index primer. PCR thermocycler programs were 72°C for 4 min, 98°C for 30 s, 6 cycles of 98°C for 10 s, 65°C for 30 s, 72°C for 1 min, and final extension at 72°C for 4 min. PCR-enriched samples were taken through a double size selection with PEG-based SPRI beads (Beckman Coulter) first with 0.5X ratio of PEG/beads to remove DNA fragments longer than 600 bp, followed by 1.8X PEG/beads ratio in order to select for Fraction A as described in (66). Pooled libraries were run on the HiSeq 3000 (Illumina) at the Genome Core Facility at the MPI Tübingen Campus to obtain 150 bp paired end reads, which were aligned to mouse mm10 genome using bowtie2 v.2.1.0 (67). Peaks were called using MACS14 v.2.1 (68).
Multiplexed chromosome conformation capture (4C-Seq)
Chromosome conformation capture (3C) template was prepared from pooled E14.5 liver, forelimb and hindlimb buds (n = 5–6 C57BL/6NJ embryos per replicate), with improvements to the primer extension and library amplification steps following (69). The template was amplified with Q5 High-Fidelity Polymerase (New England Biolabs GmbH, Frankfurt am Main, Germany) using a 4C adapter-specific primer and a pool of 6 Nkx3-2 enhancer viewpoint primers [and, in a separate experiment, a pool of 8 Gli3 enhancer-specific viewpoint primers; Table S6]. Amplified fragments were prepared for Illumina sequencing by ligation of TruSeq adapters, followed by PCR enrichment. Pooled libraries were sequenced by a HiSeq 3000 (Illumina) at the Genome Core Facility at the MPI Tübingen Campus with single-end, 150 bp reads. Sequence data were processed using a pipeline consisting of data clean-up, mapping, and analysis based upon cutadapt v1.10 (70); bwa v0.7.10-r789 (49); samtools v1.2 (71); bedtools (56) and R v3.2.0 (55). Alignments were filtered for ENCODE blacklisted regions (72) and those with MAPQ scores below 30 were excluded from analysis. Filtered alignments were binned into genome-wide BglII fragments, normalized to Reads Per Kilobase of transcript per Million mapped reads (RPKM), and plotted and visualized in R.
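The RPKM normalization of fragment counts can be written out as a short sketch. It assumes that the fragment length takes the place of transcript length in the usual RPKM formula; the function name `rpkm` and its arguments are hypothetical.

```python
def rpkm(read_count, fragment_length_bp, total_mapped_reads):
    """Reads per kilobase of fragment per million mapped reads."""
    return read_count * 1e9 / (fragment_length_bp * total_mapped_reads)

# 10 reads on a 1 kbp fragment in a library of 1 million mapped reads.
print(rpkm(10, 1000, 1_000_000))
```

Dividing by fragment length corrects for the uneven spacing of restriction sites, and dividing by library size makes viewpoints and tissues sequenced to different depths comparable.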
Plasmid construction
Putative limb enhancers corresponding to the F0 and F17 alleles of the Gli3 G2 and Nkx3-2 N1 and N3 enhancers were amplified from genomic DNA of Longshanks mice from the LS1 F0 (9 mice) and F17 (10 mice) generations and sub-cloned into pJET1.2/blunt plasmid backbone using the CloneJET PCR Kit (Thermo Fisher Scientific) and alleles were confirmed by Sanger sequencing using the included forward and reverse primers (Table S7). Each allele of each enhancer was then cloned as tandem duplicates with junction SalI and XhoI sites upstream of a β-globin minimal promoter in our reporter vector (see below). Constructs were screened for the enhancer variant using Sanger sequencing. All SNPs were further confirmed against the rest of the population through direct amplicon sequencing.
The base reporter construct pBeta-lacZ-attBx2 consists of a β-globin minimal promoter followed by a lacZ reporter gene derived from pRS16, with the entire reporter cassette flanked by double attB sites. The pBeta-lacZ-attBx2 plasmid and its full sequence have been deposited at Addgene and are available there.
Pronuclear injection of F0 and F17 enhancer-reporter constructs in mice
The reporter constructs containing the appropriate allele of each of the 3 enhancers were linearized with ScaI (or BsaI in the case of the N3 F0 allele due to the gain of a ScaI site) and purified. Microinjection into mouse zygotes was performed essentially as described (73). At 12 d after the embryo transfer, the gestation was terminated and embryos were individually dissected, fixed in 4% paraformaldehyde for 45 min and stored in PBS. All manipulations were performed by R.N. or under R.N.’s supervision at the Transgenic Core Facility at the Max Planck Institute of Cell Biology and Genetics, Dresden, Germany. Yolk sacs from embryos were separately collected for genotyping and all embryos were stained for lacZ expression as previously described (74). Embryos were scored for lacZ staining, with positive expression assigned if the pattern was consistently observed in at least two embryos.
Genotyping of time series at the Nkx3-2 N3 locus
Allele-specific primers terminating on SNPs that discriminate the F0 from the F17 N3 enhancer alleles were designed (rs33219710 and rs33600994; Table S8). The amplicons were optimized as a qPCR reaction to give allele-specific, present/absent amplifications (typically no amplification for the absent allele, otherwise average ΔCt > 10). Genotyping on the entire breeding pedigree of LS1 (n = 602), LS2 (n = 579) and Ctrl (n = 389) was performed in duplicate for each allele on a Bio-Rad CFX384 Touch instrument (Bio-Rad Laboratories GmbH, Munich, Germany) with SYBR Select Master Mix for CFX (Thermo Fisher Scientific) and the following qPCR program: 50°C for 2 min, 95°C for 2 min, 40 cycles of 95°C for 15 s, 58°C for 10 s, 72°C for 10 s. In each qPCR run we included individuals of each genotype (LS F17 selected homozygotes, heterozygotes and F0 major allele homozygotes). For the few samples with discordant results between replicates, DNA was re-extracted and re-genotyped or otherwise excluded.
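The present/absent genotype calls can be sketched as a simple decision rule. This is an illustration under assumptions, not the scoring procedure actually used: the function names `call_genotype` and `consensus_call`, the use of `None` for a reaction without amplification, and the decision to call any sample with ΔCt of 10 or less as heterozygous are all hypothetical.

```python
def call_genotype(ct_f0, ct_f17, min_delta_ct=10.0):
    """Call a genotype from one pair of allele-specific Ct values.

    None means that the allele-specific reaction did not amplify.
    An allele is treated as absent if it fails to amplify or lags by more than min_delta_ct.
    """
    if ct_f0 is None and ct_f17 is None:
        return None  # both reactions failed: no call
    if ct_f17 is None or (ct_f0 is not None and ct_f17 - ct_f0 > min_delta_ct):
        return "F0/F0"
    if ct_f0 is None or ct_f0 - ct_f17 > min_delta_ct:
        return "F17/F17"
    return "F0/F17"  # assumed: both alleles amplify at comparable cycles

def consensus_call(replicate_calls):
    """Return the shared call if all replicates agree, otherwise None (discordant)."""
    calls = set(replicate_calls)
    return calls.pop() if len(calls) == 1 else None

duplicates = [call_genotype(22.1, None), call_genotype(22.4, 36.0)]
print(consensus_call(duplicates))
```

Samples for which `consensus_call` returns `None` correspond to the discordant cases that were re-extracted and re-genotyped or excluded.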
Transgenic reporter assays in stickleback fish
In sticklebacks, transgenic reporter assays were carried out using the reporter construct pBHR (44). The reporter consists of a zebrafish heat shock protein 70 (Hsp70) promoter followed by an eGFP reporter gene, with the entire reporter cassette flanked by tol2 transposon sequences for transposase-directed genomic integration. The Nkx3-2 N1/F0 enhancer allele was cloned as tandem duplicates using the NheI and EcoRV restriction sites upstream of the Hsp70 promoter. Enhancer orientation and sequence were confirmed by Sanger sequencing. Transient transgenic stickleback embryos were generated by co-microinjecting the plasmid (final concentration: 10 ng/µl) and tol2 transposase mRNA (40 ng/µl) into freshly fertilized eggs at the one-cell stage as described in (44).
Acknowledgements
We thank Felicity Jones for input into experimental design, helpful discussion and for improving the manuscript. We thank the Rolian, Barton, Chan and Jones Labs members for support, insightful scientific discussion and improving the manuscript. We thank the Rolian lab members, the Animal Resource Centre staff at the University of Calgary and MPI Dresden Animal Facility staff for animal husbandry. We thank Derek Lundberg for help with library preparation automation. We thank Christa Lanz, Rebecca Schwab and Ilja Bezrukov for assistance with high-throughput sequencing and associated data processing; Andre Noll for high-performance computing support; the MPI Tübingen IT team for computational support. We thank Felicity Jones and the Jones Lab for help with stickleback microinjections. pRS16 was a gift from François Spitz. We thank Mirna Marinič for creating an earlier version of the transgenic reporter plasmid. We are indebted to Gemma Puixeu Sala, William G. Hill, Peter Keightley for input and discussion on data analysis and simulation. We are also indebted to Stefan Mundlos, Przemko Tylzanowki, Weikuan Gu for suggested experiments and sharing unpublished data. We thank Sean B. Carroll, Andrew Clark, Jonathan Pritchard, Matthew Rockman, Gregory Wray and David Kingsley for thoughtful input that has greatly improved our manuscript. J.P.L.C. is supported by the International Max Planck Research School “From Molecules to Organisms”. S.B. and N.B. are supported by IST Austria. C.R. is supported by Discovery Grant #4181932 from the Natural Sciences and Engineering Research Council of Canada and by the Faculty of Veterinary Medicine at the University of Calgary. Y.F.C. is supported by the Max Planck Society.
References and Notes
The following references appeared in Material and Methods only