The rate and effect of de novo mutations in natural populations of Arabidopsis thaliana

Moises Exposito-Alonso; Claude Becker; Verena J. Schuenemann; Ella Reitter; Claudia Setzer; Radka Slovak; Benjamin Brachi; Jörg Hagmann; Dominik G. Grimm; Chen Jiahui; Wolfgang Busch; Joy Bergelson; Rob W. Ness; Johannes Krause; Hernán A. Burbano; Detlef Weigel

doi:10.1101/050203

SUMMARY

Like many other species, the plant Arabidopsis thaliana has been introduced in recent history from its native Eurasian range to North America, with many individuals belonging to a single lineage. We have sequenced 100 genomes of present-day and herbarium specimens from this lineage, covering the time span from 1863 to 2006. Within-lineage recombination was nearly absent, greatly simplifying the genetic analysis, allowing direct estimation of the mutation rate and an introduction date in the early 17th century. The comparison of substitution rates at different sites throughout the genome reveals that genetic drift predominates, but that purifying selection in this rapidly expanding population is nevertheless evident even over short historical time scales. Furthermore, an association analysis identifies new mutations affecting root development, a trait important for adaptation in the wild. Our work illustrates how mutation and selection rates can be observed directly by combining modern genetic methods and historic samples.

HIGHLIGHTS

A historically young colonizing lineage of Arabidopsis thaliana allows observation of contemporary evolutionary forces.
Genomes from specimens collected over 150 years support direct calculation of mutation rates occurring in nature.
Drift predominates, but purifying selection is evident genome-wide over historical time scales.
New mutations with phenotypic effects can be identified and traced back in time and space.

INTRODUCTION

If we want to understand evolution and especially adaptation, we need to know rates of mutation and selection, which together determine the substitutions that can be observed in a population. Typically, one tries to infer evolutionary parameters from patterns of genetic diversity in extant individuals of a species. Unfortunately, demographic and genetic factors such as migration, fluctuating population sizes, recombination and gene conversion greatly complicate such inferences. Many scientists have therefore chosen to focus on mutations only, measuring their accumulation in artificial conditions, using mutation accumulation lines grown in the laboratory (Halligan and Keightley, 2009), or over very short time scales, for example in human parent-offspring trios (Roach et al., 2010).

An alternative approach is the use of older, but still simple lineages with limited genetic diversity, such as colonizing populations that have undergone a recent, strong genetic bottleneck. Such populations can be considered natural experiments in which one can test ecological or evolutionary hypotheses (Gauze, 1934; Maron et al., 2004; Sax et al., 2007). Recent colonization events permit the quantification of evolutionary forces related to adaptation – mutation, selection, genetic drift, recombination – that are still not well understood in invasion ecology (Barrett, 2014; Bock et al., 2015; Lee, 2002).

Through planned or serendipitous dissemination, humans have increasingly moved species beyond their native ranges, blurring biogeographical boundaries. While the exact reasons for success or failure in alien environments remain unclear, many species can become established in new areas, with North America being the continent with the highest number of naturalized plants (van Kleunen et al., 2015). Among these is the model plant Arabidopsis thaliana, which is native to Eurasia but has recently colonized and spread throughout much of North America (Platt et al., 2010). Although A. thaliana is not an invasive species, it has traits typical for successful colonizers, such as a high selfing rate, a durable seed bank and a short generation time (Baker, 1965).

Colonizing populations often start with very few individuals and therefore have low genetic diversity. The N. American A. thaliana population is much less diverse than what is seen in the native range, with one predominant lineage, named haplogroup-1 (HPGI), accounting for about half of all N. American individuals (Platt et al., 2010). The success of an isolated, selfing lineage that is genetically very uniform seems to contradict the common idea that such lineages are evolutionary dead-ends because they can adapt only through de novo mutations, a process predicted to be much slower than adaptation from standing variation (Barrett and Schluter, 2008), although we cannot know how long this lineage will last.

Ideally, to evaluate all evolutionary trajectories, including unsuccessful ones, one should have access not only to the evolved extant individuals, but also to their “unevolved” ancestors. The power of temporal transects has been aptly demonstrated with the genetic analysis of historical and archaeological samples of humans and microbes, relying on advances in the study of ancient DNA (aDNA) (Orlando et al., 2015; Shapiro and Hofreiter, 2014). Natural history collections that cover the past several hundred years offer an exciting, underused resource for such studies (Martin et al., 2013; Staats et al., 2013; Vandepitte et al., 2014; Weiß et al., 2015; Yoshida et al., 2013).

There is a rich history of sampling plants and storing them in herbaria. Importantly, herbaria do not merely house exotic, rare species collected in the more distant past, but also common plants that have been sampled for many decades over and over again, making them powerful tools for monitoring recent colonization events in space and time (Crawford and Hoagland, 2009; Lankaua et al., 2009). Such a resource exists for N. American A. thaliana. Here, we compare genomes from herbarium specimens, collected between 1863 and 1993, and from live individuals, collected between 1993 and 2006, to date the origin of this lineage, and infer mutation rates, selection, demography and migration routes. We also identify de novo mutations in this lineage that are associated with phenotypes likely to be under selection in the wild, which in turn correlate with climatic variables. Our analyses of a colonizing A. thaliana lineage serve as a blueprint for future studies of similar colonizing or otherwise recently bottlenecked populations, in order to understand mutation, selection and rapid adaptation in nature.

RESULTS AND DISCUSSION

Herbarium and modern HPGI genomes

When analyzed with 149 genome-wide, intermediate-frequency SNP markers, about half of over 2,000 North American A. thaliana individuals collected between 1993 and 2006 were found to be very similar (Platt et al., 2010). A recent study of 13 individuals from this collection confirmed that their genomes were indeed almost identical (Hagmann et al., 2015). We selected 74 additional individuals for Illumina whole-genome sequencing, aiming for broad geographic representation and, where available, at least two accessions per collection site (Fig. 1; Table S1).

Table S1. Sample information, Related to Figure 1.

Latitude and longitude for historic samples were imputed from the geographic centroid of the most accurate toponym described in the herbarium specimen label.

Key of abbreviations of herbarium collections or seed sources:

UCONN = University of Connecticut herbarium; CFM = Chicago Field Museum; NY = New York Botanical Garden; ABRC = Arabidopsis Biological Resources Center; OSU = Ohio State University. H* indicates herbarium samples that cluster with the modern HPGI clade rather than the historic HPGI clade in Fig. 3.

Geographic location highlighted in Fig. 1.

Figure 1. Geographic location and temporal distribution of HPGI samples.

(A) Sampling locations of herbarium specimens (blue) and modern individuals (green). (B) Temporal distribution of samples (randomly jittered along the y axis for visualization). Stars indicate four herbarium accessions that nest in the clade of modern accessions; see Fig. 3.

Diversity and relationships within HPGI

Among the 87 modern individuals, seven clearly did not belong to the HPGI lineage, which could be due to errors in the initial genotyping or to the limited resolution of only 149 SNPs. Four additional individuals that were identical to the rest of the HPGI lineage at the 149 genotyped SNPs (Fig. S2A) appeared to carry small stretches of introgression from other lineages, as indicated by several methods (e.g., Fig. S2B), and were therefore classified as non-HPGI. Of the 36 herbarium samples, nine turned out to be non-HPGI lines (Fig. S2A and S2B). In total, 76 modern and 27 herbarium samples, including the 12 oldest herbarium specimens, were identified as HPGI by means of neighbor-joining trees and multidimensional scaling (MDS) (Fig. S2C). The obvious homogeneity and abundance of HPGI compared to other N. American lineages greatly simplified its classification.

After removal of non-HPGI lines, the HPGI neighbor-joining tree reconstruction resulted in a star-like phylogeny (Fig. 2A). MDS could not differentiate samples within the HPGI group, with the first and second dimensions each explaining only small amounts of variance, 8.8% and 8.0% (Fig. 2B). A parsimony network identified a small fraction of reticulations indicative of intra-HPGI recombination (Fig. 2C). Removing three potential intra-HPGI recombinants resolved the reticulations (Fig. 2D). The remaining 73 modern and 27 herbarium samples (Table S1) appeared to constitute a clonal lineage devoid of effective recombination and population structure, with no SNPs detected in either the chloroplast or the mitochondrial genome, and with very low genome-wide nuclear diversity (π = 0.000002, θ_w = 0.00001, 4,368 segregating sites), which is two orders of magnitude lower than in the native range (θ_w = 0.008) (Cao et al., 2011; Nordborg et al., 2005). The enrichment of low-frequency variants (Tajima’s D = −2.84) and the low levels of polymorphism in the surveyed genomes are consistent with a recent bottleneck followed by population expansion. We hypothesize that the bottleneck corresponds to a colonization founder event, likely involving one or only a few very closely related individuals.
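The diversity statistics above can be made concrete with a minimal Python sketch. It computes per-site π, per-site Watterson's θ_w and Tajima's D from derived-allele counts at segregating sites, using Tajima's standard formulas. The sketch assumes that each accession contributes one effectively homozygous genome, which is reasonable for a highly selfing lineage, and that there is no missing data. The function and argument names are illustrative and are not part of this study's pipeline.

```python
import math

def diversity_stats(derived_counts, n, accessible_sites):
    """Per-site pi, per-site Watterson's theta and Tajima's D.

    derived_counts: number of sampled genomes carrying the derived allele at
    each segregating site; n: number of sampled genomes (one per selfing
    accession, an assumption); accessible_sites: number of callable sites.
    """
    S = len(derived_counts)
    if S == 0 or n < 2:
        raise ValueError("need at least one segregating site and two samples")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Mean number of pairwise differences, summed over segregating sites
    pi_total = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in derived_counts)
    theta_w_total = S / a1
    # Constants for the variance of (pi - theta_w) under neutrality
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    tajima_d = (pi_total - theta_w_total) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return pi_total / accessible_sites, theta_w_total / accessible_sites, tajima_d
```

If most segregating sites are singletons, π falls below θ_w and D becomes negative. This is the pattern expected after a bottleneck followed by expansion.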

Figure 2. Relationship among herbarium and modern HPGI samples.

(A) Neighbor-joining tree. Consensus of 1,000 bootstrap replicates. Branch lengths indicate number of base substitutions. (B) First two dimensions of a multidimensional scaling plot based on pairwise identity-by-state distances. Fraction of variance explained given in parentheses. Phylogenetic network of all samples using the parsimony splits algorithm, before (C) and after (D) removing intra-HPGI recombinants.

Estimates of mutation rate and spectrum in the wild

To estimate the substitution rate in the HPGI lineage, we used a distance- and a phylogeny-based method, both of which take advantage of the collection dates of our samples. It is necessary to distinguish between substitutions and mutations. The substitution rate is the observed cumulative change in DNA that results from several evolutionary forces, such as demography and natural selection. These forces act in concert on the new mutations produced by DNA damage, repair and replication errors, which are presumed to be constant over time (Barrick and Lenski, 2013).

In the distance method, the substitution rate is first calculated from the relationship between differences in collection dates and genetic distances (number of substitutions) among samples, and then scaled to the size of the genome accessible to Illumina sequencing (Fig. 3C). With this method, we estimated a rate of 3.3 × 10⁻⁹ substitutions site⁻¹ year⁻¹ (95% bootstrap confidence interval [CI]: 2.9 to 3.6 × 10⁻⁹). Changing the thresholds for base calling affects both the number of called SNPs and the fraction of the genome that is interrogated for variants. We therefore explored how more relaxed or more stringent base calling affected our substitution rate estimates. We used three quality thresholds of increasing stringency (see Experimental Procedures for details) and found the impact to be small, with mean substitution rate estimates ranging from 3.0 to 4.0 × 10⁻⁹, compared to 3.3 × 10⁻⁹ substitutions site⁻¹ year⁻¹ with our standard threshold.
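One simple way to turn dated samples into a per-site rate is a root-to-tip style regression. The steps are:

1. Count the substitutions separating each sample from an inferred ancestral (root) genotype.
2. Regress these counts on collection year.
3. Divide the slope by the number of accessible sites.
4. Resample samples with replacement to obtain a bootstrap interval.

The sketch below illustrates that idea under these assumptions. It stands in for, and is not necessarily identical to, the distance procedure used here. The function names and the accessible-genome size in the example are hypothetical.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def rate_per_site_year(years, subs_from_root, accessible_sites, n_boot=1000, seed=1):
    """Slope of substitutions vs. collection year, per accessible site,
    with a percentile bootstrap CI over resampled samples."""
    point = ols_slope(years, subs_from_root) / accessible_sites
    rng = random.Random(seed)
    n = len(years)
    boots = []
    for _ in range(n_boot):
        pick = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xs = [years[i] for i in pick]
        if len(set(xs)) < 2:
            continue  # slope undefined if all resampled years coincide
        ys = [subs_from_root[i] for i in pick]
        boots.append(ols_slope(xs, ys) / accessible_sites)
    boots.sort()
    lo = boots[int(0.025 * len(boots))]
    hi = boots[min(len(boots) - 1, int(0.975 * len(boots)))]
    return point, (lo, hi)

# Hypothetical toy data: 1e8 accessible sites is an illustrative value only
years = [1863, 1900, 1950, 1993, 2006]
subs = [2 * (y - 1600) for y in years]
estimate, ci = rate_per_site_year(years, subs, accessible_sites=1e8)
```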

The Bayesian phylogenetic approach uses the collection years as tip-calibration points; it yielded a very similar estimate, 4.0 × 10⁻⁹ substitutions site⁻¹ year⁻¹ (95% highest posterior density [HPD]: 3.2 to 4.7 × 10⁻⁹). We confirmed MCMC chain convergence on demographic and tree topology parameters by repeating the analysis with this rate. The stability of all parameters indicated that, under a low-complexity scenario with no population structure or recombination, phylogenetic and population genetic methods generate congruent evolutionary rates.

Under neutral evolution, substitution and mutation rates should be the same, but substitution rates are typically expressed per year, whereas mutation rates are expressed per generation, among other conceptual differences (Barrick and Lenski, 2013; Kimura, 1967). Although A. thaliana has an annual life cycle, its generation time in nature has been estimated to average 1.3 years (Lundemo et al., 2009), because seeds could potentially survive 3 to 5 years in a seed bank (Montesinos et al., 2009). To compare the substitution rates from our study with those of mutation accumulation lines propagated in the greenhouse (Ossowski et al., 2010), we multiplied the estimated substitution rate by the 1.3-year average generation time, resulting in 4.2 × 10⁻⁹ substitutions site⁻¹ generation⁻¹ (95% CI 3.7 to 4.7 × 10⁻⁹) (Fig. 3E, Table S3).
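The conversion can be written as a short worked equation. Here \bar{T}_{gen} denotes the 1.3-year average generation time. The numbers below use the rounded yearly estimate quoted above, so they differ slightly from the reported 4.2 × 10⁻⁹ (3.7 to 4.7 × 10⁻⁹). The difference presumably reflects rounding of the yearly rate.

$$\mu_{\text{gen}} = \mu_{\text{year}} \times \bar{T}_{\text{gen}} = 3.3 \times 10^{-9} \times 1.3 \approx 4.3 \times 10^{-9}\ \text{site}^{-1}\,\text{generation}^{-1}$$

The interval bounds scale in the same way: 2.9 × 1.3 ≈ 3.8 and 3.6 × 1.3 ≈ 4.7, both × 10⁻⁹.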

To obtain the best possible estimate of short-term mutation rates for comparison, we reanalyzed a recent re-sequencing dataset of mutation accumulation lines grown in the greenhouse (Hagmann et al., 2015); from this, we confirmed a rate of 7.1 × 10⁻⁹ mutations site⁻¹ generation⁻¹ (95% CI 6.3 to 7.9 × 10⁻⁹) (see Table S2 and Extended Experimental Procedures). In several species, including Escherichia coli (Sniegowski et al., 1997) and A. thaliana (Jiang et al., 2014), growth under abiotic stress can increase mutation rates. Although wild conditions can be considered moderately stressful environments compared to standard greenhouse conditions, we found the generation-corrected substitution rate in the HPGI lineage to be lower than the mutation rate in greenhouse lines. The mutation spectrum was, however, closer to that of greenhouse lines exposed to salt stress (Jiang et al., 2014) than to the greenhouse lines grown under standard conditions (Ossowski et al., 2010) (Fig. 3D). One possible contributor to a shift in mutation spectrum is DNA methylation, since methylated cytosines are more likely to undergo substitutions than unmethylated cytosines, something that has been observed in other natural accessions (Cao et al., 2011; Hagmann et al., 2015).
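To compare mutation spectra such as those in Fig. 3D, single-base substitutions are usually collapsed into six strand-symmetric classes. The sketch below shows one way to do that. Its assumptions are:

- Class labels are written from the pyrimidine (C or T) strand, a common convention.
- The input is a hypothetical list of (reference, alternative) base pairs.

When asking whether cytosine methylation contributes to a shift in spectrum, C:G>T:A transitions are the class to inspect first, because deamination of methylated cytosine yields thymine.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def spectrum_class(ref, alt):
    """Collapse a single-base substitution into one of six strand-symmetric
    classes, written from the pyrimidine (C or T) strand, e.g. 'C:G>T:A'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # report purine changes on the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"

def spectrum(substitutions):
    """Relative frequency of each class among (ref, alt) pairs."""
    counts = Counter(spectrum_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: k / total for cls, k in counts.items()}
```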

Genome-wide inference of selection

One likely explanation for the unexpected differences between the greenhouse mutation rate and our estimate from the HPGI population (Fig. 3E) is purifying selection, which should slow the accumulation of mutations in the wild. In other organisms, including humans, short- and long-term estimates of mutation rates differ considerably and have prompted a lively debate (Ho et al., 2005; Scally and Durbin, 2012; Ségurel et al., 2014; Subramanian and Lambert, 2011). In humans, counterintuitively, pedigree-based short-term estimates of nuclear mutation rates are lower (Kong et al., 2012; Roach et al., 2010) than long-term estimates based on interspecific phylogenies (Nachman and Crowell, 2000). Recently, the use of DNA retrieved from dated fossils (Fu et al., 2014) and new methods incorporating recombination map scaling (Lipson et al., 2015) have produced more concordant, intermediate mutation rate estimates. Long-term rates would be expected to be lower, since purifying selection has had more time to remove deleterious mutations from the population. Indeed, older calibration points in human-great ape phylogenies have yielded lower substitution rate estimates (Subramanian and Kumar, 2003). Alternatively, long- and short-term rates may truly differ because of changes in generation times or fluctuating mutation rates (Green and Shapiro, 2013). The discrepancy could also stem from intraspecific variation in mutation rates (e.g., the effect of genetic background), which has been reported to exceed 7-fold across genotypes of Chlamydomonas reinhardtii (Ness et al., 2015). This, however, does not seem to apply when comparing natural and greenhouse populations. Phylogenetic and regression-based methods produced very similar estimates for the HPGI population, and these were similar to mutation rate measurements in a greenhouse population with an exactly known number of generations. We attribute the small differences between the A. thaliana populations either to the efficiency of purifying selection over different temporal and environmental scales or to imperfect knowledge of generation time.

To test the purifying selection hypothesis, we compared rates in differently annotated portions of the genome. Ideally, one would compare synonymous substitutions at four-fold degenerate sites with non-synonymous substitutions, but there were too few such substitutions in our data set to achieve adequate statistical power (on average 0.9 four-fold and 2.7 non-synonymous mutations per 30 generations in mutation accumulation lines). We therefore used the net distances method to compare rates in intergenic regions with whole-genome rates. The comparison across annotations supported the hypothesis that purifying selection causes the different rate estimates in the HPGI and greenhouse populations. In the HPGI lineage, the whole-genome rate was 33.59% (95% CI 33.59 to 33.60%) lower than the intergenic rate, compared to 26.04% (95% CI 21.44 to 29.31%) in the greenhouse population (Fig. 3E). In addition, in the HPGI lineage, medium-frequency variants (4% ≤ allele frequency ≤ 50%) were more strongly depleted in the whole-genome set than in intergenic regions (Fisher’s exact test, p = 0.03).
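The two comparisons above can be sketched with standard-library Python:

- the relative deficit of the whole-genome rate relative to the intergenic rate;
- a two-sided Fisher's exact test on a 2×2 table. For the test above, the rows could be the whole-genome and intergenic sets, and the columns medium-frequency and other variants.

The HPGI counts are not given here, so the example table is the textbook "lady tasting tea" table, not data from this study. Function names are illustrative. `scipy.stats.fisher_exact` offers the same test if SciPy is available.

```python
from math import comb

def relative_deficit(intergenic_rate, genome_rate):
    """Fraction by which the genome-wide rate falls below the intergenic rate."""
    return 1.0 - genome_rate / intergenic_rate

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so ties in probability count as "as extreme"
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

p_example = fisher_exact_two_sided(3, 1, 1, 3)  # textbook table, not HPGI data
```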

The observed rate at which new mutations accumulate in populations, the substitution rate, depends both on the number of genome copies in the population in which mutations can occur (2N_e for diploid species) and on the selection coefficient s, which affects the probability that a mutation becomes fixed. When selection is negligible and only genetic drift operates, the probability of fixation of a new mutation equals its initial frequency, 1/(2N_e). Under neutrality, therefore, the observed rate at which mutations accumulate equals the rate at which they arise. If we assume that intergenic substitutions behave close to neutrally, we can use the intergenic rate as the reference mutation rate, μ, compare it with the genome-wide substitution rate, k, and solve the fixation probability equation of Kimura (1967) for the genome-wide selection coefficient. The coefficient responsible for the genome-wide deficit in substitutions was N_e s = −0.76. Only a coefficient scaled by population size is meaningful in our context, since theory predicts that selection is efficient only when N_e|s| > 1, where |s| is the absolute value of a hypothetical semi-dominant genome-wide selection coefficient. Our estimate is negative, suggesting a net effect of purifying selection, but its absolute value is smaller than 1, indicating that the number of substitutions is largely determined by genetic drift (Charlesworth and Charlesworth, 2010).
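The inversion described above can be sketched under the standard diploid approximation for a new semidominant mutation that starts at frequency 1/(2N_e). In that approximation, k/μ = S/(1 − e^(−S)), with population-scaled coefficient S = 4N_e s. The function names and the bisection bracket are illustrative choices. The sketch does not specify exactly how the reported N_e s was scaled (for instance, whether and how selfing or a factor of 4 enters). So treat the agreement with the reported value as approximate: with the 33.59% whole-genome deficit, the sketch returns S of about −0.77.

```python
import math

def fixation_ratio(S):
    """k / mu = S / (1 - exp(-S)) for a new semidominant mutation with
    population-scaled selection coefficient S (diploid approximation)."""
    if abs(S) < 1e-12:
        return 1.0  # neutral limit: substitution rate equals mutation rate
    return S / -math.expm1(-S)

def solve_scaled_s(k_over_mu, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection for S given the observed k/mu (the ratio increases with S)."""
    if not fixation_ratio(lo) < k_over_mu < fixation_ratio(hi):
        raise ValueError("k/mu outside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fixation_ratio(mid) < k_over_mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

S = solve_scaled_s(1.0 - 0.3359)  # whole-genome rate 33.59% below intergenic
```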

We were curious whether our result of net inefficient purifying selection is related to the mating system, namely predominant selfing, or the recent genetic bottleneck of the HPGI lineage. A previous point estimate of the coefficient of selection N_e s in A. thaliana was ∼ −0.8, using an approach based on polymorphism within A. thaliana and divergence between A. thaliana and its close relative A. lyrata in 12 nuclear genes (Bustamante et al., 2002). The same study reported that in the genus Drosophila N_e s was positive and greater than one, indicative of widespread and effective selection. The authors hypothesized that in highly selfing species, N_e decreases due to inbreeding, reducing the ability of selection to purge slightly to moderately deleterious mutations, consistent with other studies (Charlesworth and Wright, 2001; Ness et al., 2010; Wright et al., 2008).

We recognize that averaging selection coefficients across the entire genome may be inappropriate if different genomic features are under very different selection regimes, resulting in a highly dispersed or even bimodal distribution of selection coefficients. Point estimates should therefore be treated with caution. Keightley and Eyre-Walker (2007) showed that this is the case in humans, by estimating the distribution of purifying selection coefficients using the distribution of predicted fitness effects of various polymorphisms. They found, however, that this did not apply to Drosophila melanogaster, where almost the entire genome was under strong purifying selection, with N_e s > 100 (Keightley and Eyre-Walker, 2007). A case that may more closely resemble HPGI evolution is that of the plant Eichhornia paniculata, which experienced a recent intra-species transition to selfing. As a consequence, purifying selection coefficients have become more broadly distributed: the proportion of almost neutral coefficients has increased due to low N_e, and the proportion of strongly negative coefficients has also increased due to homozygosity, which uncovers recessive deleterious sites (Arunkumar et al., 2015). Given these studies and our average selection coefficient estimate, we hypothesize that a combination of brief evolutionary history and low N_e has reduced the efficiency of natural selection, with only highly deleterious mutations being eliminated. More information could be obtained by developing new models and performing simulations of site frequency spectra that include different demographic scenarios in combination with selfing.

Phenotypic effect and spatio-temporal context of de novo mutations

In the HPGI lineage, drift seems to determine genome-wide polymorphism patterns, but there is some evidence for purifying selection. We wondered whether, in addition, we would be able to find signals of adaptive, positive selection, expected to be much rarer and thus much more difficult to detect. Selection scans based on population divergence or haplotype sharing decay are inappropriate when divergence between samples is low and/or when there is high intra- and inter-chromosomal linkage disequilibrium. We therefore adopted an association approach in an effort to link segregating mutations to climatic variables as well as to phenotypic variation in several traits of likely ecological relevance: flowering phenology, fruit set (fecundity), seed size, root growth and morphology. Replicated measurements of phenotypic traits in controlled conditions showed significant quantitative variation between lines, as described by broad-sense heritability (Table S4). HPGI individuals resemble near-isogenic lines (NILs) in that they share large segments of the genome. Formally, genetic mapping with NILs seeks to associate phenotypes with large blocks of linked variants. It has been successfully used to examine the genetic basis of many different traits in crop species (Brouwer and St Clair, 2004; Stec et al., 2013; Szalma et al., 2007; Xie et al., 2006) and also in A. thaliana (Bentsink et al., 2010; Fletcher et al., 2013; Keurentjes et al., 2007; Swarup et al., 1999; Weigel, 2012). Our approach has the advantage that it can discern the phenotypic effects of a limited number of mutations free from confounding population structure (see Extended Experimental Procedures). In association analyses, statistical power relies on variants with a certain minimum frequency, hence we only considered ∼400 variants with at least 5% allele frequency. These are, however, not independent due to linkage disequilibrium, and thus effectively represent haplotypes (Templeton et al., 1988).
Focusing on intermediate frequency variants not only increases statistical power, but is also more likely to reveal adaptive mutations, because intermediate frequency variants will be on average older and less likely to be deleterious.

Table S4.

Description of phenotypic and climatic variables for association mapping analyses. Related to Figure 5. Mean and standard deviation across accessions for each phenotypic and climatic variable.

Broad-sense heritabilities (H²) were calculated from between-line and within-line (between-replicate) variance in an ANOVA framework. Narrow-sense heritability (h²) was calculated with linear mixed models and a kinship matrix from mean accession values.
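The ANOVA-based broad-sense heritability described above can be sketched with the standard expected-mean-squares estimator. It sets σ²_G = (MS_between − MS_within)/r and H² = σ²_G/(σ²_G + σ²_E), where MS_within estimates σ²_E. The sketch assumes a balanced design with r replicates per line, and its function name is illustrative. The narrow-sense estimate from mixed models is not shown.

```python
def broad_sense_h2(lines):
    """H^2 from a one-way ANOVA with lines as groups.

    lines: list of replicate measurements per line (balanced design assumed).
    """
    k, r = len(lines), len(lines[0])
    if k < 2 or r < 2 or any(len(reps) != r for reps in lines):
        raise ValueError("need a balanced design with >= 2 lines and >= 2 replicates")
    means = [sum(reps) / r for reps in lines]
    grand = sum(means) / k
    ms_between = r * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2 for reps, m in zip(lines, means) for x in reps) / (k * (r - 1))
    var_g = max(0.0, (ms_between - ms_within) / r)  # between-line (genetic) variance
    total = var_g + ms_within                       # ms_within estimates environmental variance
    if total == 0:
        raise ValueError("no phenotypic variance")
    return var_g / total
```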

With permutation tests to assess significance, we found several root phenotypes to be significantly associated with 79 SNPs. Thirty-six of these were in protein-coding genes and nine resulted in non-synonymous substitutions. Nineteen other SNPs were associated with climate variables (www.worldclim.org/bioclim) even after correction for latitude and longitude. Eight of these were located in genes, and four resulted in non-synonymous substitutions (Table 1; Tables S4 and S5). We did not find SNPs significantly associated with flowering, fecundity or seed size. We additionally applied two stricter thresholds. The first was a Bonferroni correction for the number of traits tested, applied on top of permutation testing. The second, used as an alternative to the permutation approach, was a threshold corrected for both the number of traits and the number of SNPs tested. Even with these two very conservative approaches, 13 and four genic SNPs remained significant for root phenotypes and climate variables, respectively (Table 1).
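A permutation threshold of this kind can be sketched by shuffling phenotypes among accessions and recording, in each permutation, the maximum association statistic across SNPs. The sketch also applies the 5% minor-allele-frequency filter mentioned earlier. It makes simplifying assumptions: 0/1 genotypes without missing data, a simple difference-in-means statistic, and no kinship or mixed-model correction. The study's association model may therefore differ, and all names are hypothetical.

```python
import random

def maf_filter(snps, min_freq=0.05):
    """Keep SNPs (name -> list of 0/1 genotypes) with minor allele frequency >= min_freq."""
    kept = {}
    for name, g in snps.items():
        f = sum(g) / len(g)
        if min(f, 1.0 - f) >= min_freq:
            kept[name] = g
    return kept

def assoc_stat(genotypes, phenotype):
    """Absolute difference in mean phenotype between derived (1) and ancestral (0) carriers."""
    der = [p for g, p in zip(genotypes, phenotype) if g == 1]
    anc = [p for g, p in zip(genotypes, phenotype) if g == 0]
    return abs(sum(der) / len(der) - sum(anc) / len(anc))

def permutation_threshold(snps, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Threshold from the null distribution of the maximum statistic across
    SNPs, obtained by shuffling phenotypes among accessions."""
    rng = random.Random(seed)
    perm = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        maxima.append(max(assoc_stat(g, perm) for g in snps.values()))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]
```

Taking the maximum across SNPs in each permutation controls the family-wise error rate for one trait. Correcting further for several traits is a separate step.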

Table S5.

SNP hits from association mapping. Related to Table 1.

Table S6.

Supplemental Graphic Table 6

Table 1. Genic SNPs associated with different traits.

Most SNPs first appeared in sample JK2530, collected in 1922 in Indiana. For non-synonymous SNPs, the amino acid change and the Grantham score (ranging from 0 to 215) are reported. All SNPs in the table were significant (p < 0.05) after permutation correction of raw p-values. # marks SNPs whose permutation-corrected p-values remained significant when the threshold was also corrected for multiple traits (p < 0.002). * marks SNPs whose raw p-values passed the threshold corrected for both multiple SNPs and multiple traits (p < 0.0001). See Table S4 for details on phenotypes and climatic variables, and Table S5 for information on all significant SNPs.

The most common climate variable with significant SNP associations was precipitation during the warmest quarter of the year, followed by mean temperature during the wettest quarter, and precipitation during the wettest quarter and month. Some SNPs were associated with both climate variables and root phenotypes, with the caveat that these traits can be correlated, for example root growth-related traits with precipitation-related variables, and root gravitropism-related traits with temperature-related variables. Because of this non-independence, our multiple-testing corrections, which count each trait separately, are if anything overly stringent. Compared with random SNPs segregating at similar allele frequencies, SNPs associated with root variables and/or climate variables were first observed in older herbarium samples (Fig. 5A). This suggests an older origin for variants associated with relevant phenotypes, which could point to positive selection having maintained them for over a century.
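The age comparison in Fig. 5A can be sketched as a resampling test. For each focal SNP, take the earliest collection year of a sample carrying the derived allele. Then compare the mean of these years with the same quantity for background SNPs drawn at random. As a simplifying assumption, the sketch matches background SNPs on exact derived-allele count rather than on frequency bins. The data layout and function names are hypothetical.

```python
import random
from collections import defaultdict

def first_year(genotypes, years):
    """Earliest collection year among samples carrying the derived allele (1)."""
    return min(y for g, y in zip(genotypes, years) if g == 1)

def matched_age_test(focal, background, years, n_draws=1000, seed=0):
    """Mean first-appearance year of focal SNPs and an empirical p-value for
    focal SNPs appearing at least this early, relative to background SNPs
    matched on derived-allele count."""
    pool = defaultdict(list)
    for g in background:
        pool[sum(g)].append(first_year(g, years))
    if any(not pool[sum(g)] for g in focal):
        raise ValueError("no background SNP with a matching allele count")
    obs = sum(first_year(g, years) for g in focal) / len(focal)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        null = sum(rng.choice(pool[sum(g)]) for g in focal) / len(focal)
        hits += null <= obs
    return obs, (hits + 1) / (n_draws + 1)
```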

Figure 5. Spatial and temporal emergence of mutations associated with root morphology phenotypes and/or climate variables.

(A) Age distribution of the oldest herbarium sample with the derived allele of each SNP with a significant trait association, compared with genome-wide SNPs with at least 5% minor allele frequency (black), or without frequency cutoff (grey). (B) Spatial centroid of all samples carrying derived-allele SNPs shown in (A).

Population demography and migrations

The substitution rate estimate immediately allows dating of the HPGI colonization of North America. We first inferred the root of the HPGI phylogenetic tree using Bayesian methods. The mean estimate was the year 1597 (95% HPD: 1519 to 1660) (Fig. 3A, B). We also used a non-phylogenetic method that exploits the relationship between the genetic distance of two individuals, their average divergence time, and the mutation rate. The average divergence d between sequences can be approximated by the mutation rate μ multiplied by twice the divergence time L, since mutations accumulate on both branches of diverging sequences:

$$d \approx 2\mu L$$

We used our previously estimated substitution rate and the average pairwise genetic distance to calculate a divergence time of 363 years. Subtracting this age from the average collection date of our samples gave a point estimate of 1615, very close to the Bayesian estimate of 1597. Both agree with a colonization in the post-Columbian era. We believe that the substitution rate in the wild reported here is more appropriate for dating evolutionary events in Arabidopsis thaliana than the higher greenhouse mutation rate, from which we had previously inferred a more recent colonization of N. America by HPGI (Hagmann et al., 2015).
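The non-phylogenetic dating step follows from the relation d ≈ 2μL described above. It can be sketched as computing the mean pairwise distance per accessible site, dividing by twice the substitution rate, and subtracting the result from the mean collection year. The sketch assumes the inputs are equal-length allele strings at called variable sites plus a separately known number of accessible sites. The function names are illustrative.

```python
from itertools import combinations

def mean_pairwise_distance(sequences, accessible_sites):
    """Average number of differences per accessible site over all sample pairs.
    sequences: equal-length strings of alleles at the called variable sites."""
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / accessible_sites

def divergence_time(d, mu):
    """Years since the common ancestor, from d ~= 2 * mu * L."""
    return d / (2.0 * mu)

def colonization_year(mean_collection_year, d, mu):
    """Point estimate of the founding date: mean collection year minus L."""
    return mean_collection_year - divergence_time(d, mu)
```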

Knowing both the mutation rate μ and the average pairwise differences π, we can obtain an approximate estimate of the effective population size (N_e). Bounding the population-scaled mutation rate as π < 4N_eμ < θ_w places N_e somewhere between 152 and 758. A single N_e value represents the harmonic mean of N_e over time, and is thus much closer to the historic N_e minimum than to the arithmetic average over time (Wright, 1940). That N_e is so small is consistent with the recent HPGI founder bottleneck. Pairwise genetic distances between samples within the same decade, an approximate measure of diversity, increased over time (Fig. S5), which supports a trend of historic population growth. A more detailed inference of N_e through time came from our dated phylogeny and its coalescent model (Fig. 3B). However, our model had no resolution at the root of the tree, where the population size could have been N_e = 1, since HPGI may have been founded by a single individual or a few almost identical individuals. Until the early 19th century, the model suggested exponential population growth, followed by slight shrinkage (Fig. 3B). The shrinkage during the last century is reflected in the time-calibrated phylogenies (Fig. 3A, B), which showed that modern samples descended from a very limited number of historic sublineages, with only four 20th-century herbarium samples being closely related to modern samples. Altogether, population size fluctuations and the disjoint distribution of A. thaliana today (Platt et al., 2010) suggest that the N. American population has passed through recurrent bottlenecks since the initial colonization.
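The N_e bracket above is simple arithmetic, sketched below together with the harmonic mean that makes a single N_e sensitive to the population minimum. With the yearly substitution rate of 3.3 × 10⁻⁹, the sketch reproduces the 152 to 758 range. The generation sizes in the harmonic-mean example are hypothetical.

```python
def ne_bracket(pi, theta_w, mu):
    """Bounds on N_e from pi < 4 * N_e * mu < theta_w."""
    return pi / (4.0 * mu), theta_w / (4.0 * mu)

def harmonic_mean(values):
    """Long-term effective size over generations with census-like sizes `values`."""
    return len(values) / sum(1.0 / v for v in values)

low, high = ne_bracket(pi=0.000002, theta_w=0.00001, mu=3.3e-9)
# Hypothetical history: one founder generation followed by three large ones.
# The harmonic mean stays below 4, far from the arithmetic mean of about 750.
long_term_ne = harmonic_mean([1, 1000, 1000, 1000])
```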

Since we knew both the collection years and origins of the HPGI samples, we could also analyze the migration dynamics of HPGI. The phylogeographic models suggested that HPGI dispersed over much of its modern range soon after its introduction to N. America (Fig. S5 A, B). Based on the collection dates and sites of the herbarium samples, we postulate that the oldest populations were established in the Northeast, from where they migrated west through discrete long-distance dispersal events, likely aided by humans. Corroborating this hypothesis, we found a significant correlation between collection date and either latitude (linear regression coefficient r = 0.32; p = 3.5 × 10⁻¹⁰) or longitude (r = 0.20; p = 3.7 × 10⁻⁶) (Fig. 4A), which we interpret as a net, yet highly dispersed, movement in a northwestern direction over time. Additional support comes from an isolation-by-distance signal, which is most consistent with a historic westward dispersal and a more recent reverse eastward migration (Fig. 4B, C; see Extended Experimental Procedures). The Lake Michigan area, where major populations are found today, was both the apparent source of new migrants and the region where most derived alleles of SNPs associated with root and climate traits first appeared (Fig. 5B). The coincidence between these patterns of HPGI diversity and land use change for agricultural purposes in the last two centuries (Goldewijk and Ramankutty, 2004) is striking, although historical sampling biases are unknown. We hypothesize that agricultural changes could have driven the initial establishment of HPGI in N. America, since most current A. thaliana habitats are used agriculturally or are cultivated by humans in other ways.
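The date-versus-coordinate test in Fig. 4A can be sketched as an ordinary least-squares fit. Here x holds collection years and y holds latitude or longitude. The sketch returns the slope, the correlation r, and the t statistic of the slope. The p-value then follows from a t distribution with n − 2 degrees of freedom, which `scipy.stats.linregress` would also report. The function name is illustrative.

```python
import math

def slope_r_t(x, y):
    """OLS slope of y on x, Pearson's r, and the t statistic for the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    # t = r * sqrt((n - 2) / (1 - r^2)) equals slope / SE(slope), df = n - 2
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return slope, r, t
```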

Figure 4. Migration dynamics of HPG1.

(A) Linear regression of longitude and latitude on collection year. The p-value was obtained from a t-test of the slope. (B) Origins of the geographic spread of herbarium and modern samples, determined using separate heuristic searches of isolation-by-distance patterns. Three locations of modern samples and four of herbarium samples showed a significant slope (p < 0.05) in the isolation-by-distance pattern; that is, genetic distance increased with distance from those geographic locations. For one sample of each subset, a likely migration trajectory is depicted by an arrow. (C) Isolation-by-distance patterns of the herbarium (left) and modern (right) samples from which the hypothetical trajectories in (B) were inferred.

CONCLUSIONS

We have exploited whole-genome information from historic and contemporary collections to understand fine-scale genome evolutionary dynamics in the context of a recent colonization by Arabidopsis thaliana. By deriving a rigorously supported estimate of the mutation rate in the wild, we have answered the long-standing question of how rapidly diversity is generated in natural plant populations. We have presented evidence that purifying selection explains the discrepancy between short- and long-term mutation rate estimates. Furthermore, even though rapidly expanding populations such as the one studied here are severely affected by drift, limited in diversity, and likely constrained by purifying selection, we found de novo mutations with apparent phenotypic effects that could have been subject to Darwinian, adaptive selection. Recent invasion and colonization events such as the A. thaliana HPGI example are natural experiments ideally suited for analyzing adaptation to new environments. Finally, our work should encourage others to unlock the potential of herbarium specimens for the study of evolution in action.

EXPERIMENTAL PROCEDURES

Additional details are given in the Extended Experimental Procedures in the Supplemental Information.

Sample collection and DNA sequencing

Modern A. thaliana accessions were from the collection described by Platt and colleagues (2010); HPGI candidates were identified based on 149 genome-wide SNPs (Table S1). Herbarium specimens (collection dates 1863-1993) were directly sampled by our colleagues Jane Devos and Gautam Shirsekar, or sent to us by collection curators from various herbaria (Table S1). DNA from herbarium specimens was extracted as described (Yoshida et al., 2013) in a clean room facility at the University of Tübingen. Two sequencing libraries with sample-specific barcodes were prepared following established protocols, with and without repair of deaminated sites using uracil-DNA glycosylase and endonuclease VIII (Briggs et al., 2010; Kircher, 2012; Meyer and Kircher, 2010). DNA from modern individuals was extracted from pools of eight siblings of each inbred line. Genomic DNA libraries were prepared using the TruSeq DNA Sample Prep kit or the TruSeq Nano DNA Sample Prep kit (Illumina, San Diego, CA), and sequenced on Illumina HiSeq and MiSeq instruments. Reads were mapped with GenomeMapper v0.4.5s (Schneeberger et al., 2009) against an HPGI pseudo-reference genome (Hagmann et al., 2015) and against the Col-0 reference genome. Samples JK2509 to JK2531 were mapped only to the HPGI pseudo-reference genome. Coverage, number of covered positions in the genome, and number of SNPs identified per accession relative to HPGI are reported in Table S1. We also re-sequenced the genomes of twelve mutation accumulation (MA) lines (Becker et al., 2011; Shaw et al., 2000) (Table S2).

Phylogenetic methods and genome-wide statistics

We used four methods to estimate the relationships among modern accessions, and between modern and herbarium samples: (i) multidimensional scaling (MDS) analysis; (ii) construction of a neighbor-joining tree with the adegenet package in R (Jombart, 2008), with branch support assessed with 1,000 bootstrap iterations; (iii) construction of a parsimony network using SplitsTree v4.12.3 (Huson and Bryant, 2006), with confidence values calculated with 1,000 bootstrap iterations; and (iv) Bayesian phylogenetic analysis using BEAST v1.8 (Bouckaert et al., 2014; Drummond et al., 2012) (see below).

We estimated genetic diversity as Watterson’s θ (Watterson, 1975) and nucleotide diversity π, and the normalized difference between these two statistics as Tajima’s D (Tajima, 1989), using DnaSP v5 (Librado and Rozas, 2009). We calculated the folded site frequency spectrum (SFS) as well as the unfolded SFS, for which we assigned the ancestral state using the Arabidopsis lyrata genome (Hu et al., 2011). We estimated pairwise linkage disequilibrium (LD) between all possible combinations of informative sites, ignoring singletons, by computing the r², D and D’ statistics. For the modern individuals, we calculated the recombination parameter ρ = 4N_e r and performed the four-gamete test (Hudson and Kaplan, 1985) to identify the minimum number of recombination events. All LD- and recombination-related statistics were determined using DnaSP v5 (Librado and Rozas, 2009).
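The diversity statistics above were computed with DnaSP; the sketch below only illustrates the standard Watterson (1975) and Tajima (1989) formulas in pure Python, assuming aligned sequences of equal length with no missing data. The values it returns are per alignment, not per site, and it is not DnaSP's implementation.

```python
import math
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, pi, theta_w, tajimas_d) for aligned sequences of equal length.

    pi and theta_w are per-alignment values; divide by the number of accessible
    sites to obtain per-site estimates. Missing data are not handled.
    """
    n = len(seqs)
    length = len(seqs[0])
    # Segregating sites: alignment columns with more than one allele.
    S = sum(1 for col in range(length) if len({s[col] for s in seqs}) > 1)
    # Nucleotide diversity: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1
    if S == 0:
        return S, pi, theta_w, None  # Tajima's D is undefined without variation
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return S, pi, theta_w, d


S, pi, theta_w, d = diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"])
print(S, round(pi, 3), round(theta_w, 3))
```

A negative D (π < θ_w) indicates an excess of rare variants, the pattern expected after a bottleneck followed by expansion.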

Substitution and mutation rate analyses

We used genome-wide nuclear SNPs to calculate pairwise “net” genetic distances using the equation D’_ij = D_ic − D_jc, where D’_ij is the net distance between a modern sample i and a herbarium sample j; D_ic is the distance between the modern sample i and the reference genome c; and D_jc is the distance between the herbarium sample j and the reference genome c. We calculated a pairwise time distance in years, T_ij, from the collection dates, and fitted the linear regression D’_ij = a + b T_ij. The slope coefficient b describes the number of substitutions per year. We used either all SNPs or subsets of SNPs at different annotations, each scaled by the corresponding accessible genome length.
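The net-distance regression can be sketched in a few lines. This assumes each sample is summarized as a (collection year, distance to reference) tuple and that a single accessible genome length is used for scaling; the function name and the toy numbers are hypothetical.

```python
def net_distance_rate(modern, historic, accessible_sites):
    """Substitutions per year (slope b) and per site per year from net distances.

    modern, historic: lists of (collection_year, distance_to_reference) tuples.
    """
    t, d = [], []
    for year_i, d_ic in modern:
        for year_j, d_jc in historic:
            t.append(year_i - year_j)  # T_ij, time distance in years
            d.append(d_ic - d_jc)      # D'_ij = D_ic - D_jc
    n = len(t)
    mt, md = sum(t) / n, sum(d) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    b = sum((ti - mt) * (di - md) for ti, di in zip(t, d)) / sxx
    return b, b / accessible_sites


# Toy data in which every sample gains 0.5 substitutions per year.
modern = [(2000, 50.0), (2010, 55.0)]
historic = [(1900, 0.0), (1950, 25.0)]
b, rate = net_distance_rate(modern, historic, accessible_sites=1e8)
print(b, rate)
```

Only the slope is used. The intercept a absorbs any offset between the reference genome and the lineage ancestor.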

The second approach used Bayesian phylogenetics with the tip-calibration method implemented in BEAST v1.8 (Drummond et al., 2012). Using Markov chain Monte Carlo (MCMC), our analysis jointly and iteratively optimized tree topology, branch lengths, substitution rate, and a Skygrid demographic model. The demographic model is a Bayesian nonparametric model that is optimized for multiple loci and allows for complex demographic trajectories by estimating population sizes in time bins across the tree, based on the number of coalescent events per bin (Gill et al., 2012). We also performed a second run using a fixed prior for the substitution rate of 3.3 × 10⁻⁹ substitutions site⁻¹ year⁻¹, which we had estimated empirically using the net-distance method, to confirm that the MCMC converged on the same parameters (e.g., tree topology) as the first run, in which all parameters were estimated.

Inference of genome-wide selection parameters

We analyzed sequences at different annotations separately, since some regions should be under a different selection regime (less evolutionary constraint) than others. We estimated the average strength of genome-wide selection by contrasting substitution rates in the entire genome and in intergenic regions. We used the latter as a near-neutral contrast because, in our sample with limited diversity, it provides more statistical power than the more usual contrast between synonymous (or fourfold degenerate) and non-synonymous sites. Selection was estimated based on the equation k = μ × Q × 2N_e, where Q is the fixation probability of a new mutation (Barrick and Lenski, 2013; Kimura, 1967), and the approximation Q ≈ s / (1 − e^(−2N_e s)) for a semidominant mutation with selection coefficient s (Charlesworth and Charlesworth, 2010).

Association analyses and dating of new mutations

We collected flowering, seed and root morphology phenotypes for 63 modern accessions. For associations with climate parameters, we followed a rationale similar to that described by Hancock et al. (2011). We extracted information from the publicly available bioclim database (http://www.worldclim.org/bioclim) as a raster at 2.5 degrees resolution and intersected it with the geographic locations of HPGI samples (n = 100). We performed association analyses under several models and p-value corrections using the R package GenABEL (Aulchenko et al., 2007), with phenotypes and climatic variables as response variables, SNPs as explanatory variables, and appropriate covariates for correction. Significance was corrected using 1,000 permuted datasets or by Bonferroni correction.
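One common way to derive a permutation-based genome-wide threshold is the maximum-statistic approach: shuffle the phenotype, record the strongest association across all SNPs, and take an upper quantile of those maxima. The sketch below assumes that approach and uses a simple absolute correlation instead of GenABEL's mixed models, so it illustrates the idea only; all names and data are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(snps, phenotype, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide |r| threshold from the max statistic over permuted phenotypes."""
    rng = random.Random(seed)
    perm = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # breaks any genotype-phenotype association
        maxima.append(max(abs_corr(g, perm) for g in snps))
    maxima.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[idx]


# Hypothetical data: 20 lines, one SNP tracking the phenotype and one unrelated SNP.
pheno = [0.0] * 10 + [1.0] * 10
snps = [list(pheno), [float(i % 2) for i in range(20)]]
thr = permutation_threshold(snps, pheno, n_perm=200)
print(round(thr, 3), abs_corr(snps[0], pheno) > thr)
```

Taking the maximum across SNPs in each permutation controls the family-wise error rate while accounting for correlation among SNPs, which Bonferroni correction ignores.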

Accession numbers

Short reads have been deposited in the European Nucleotide Archive under the accession number TO BE UPDATED UPON ACCEPTANCE.

SUPPLEMENTAL INFORMATION

Supplemental Information includes Extended Experimental Procedures, six supplemental figures and six tables, and can be found online at TO BE UPDATED UPON ACCEPTANCE.

SUPPLEMENTAL INFORMATION FOR

Exposito-Alonso, Becker et al.: THE RATE AND EFFECT OF DE NOVO MUTATIONS IN NATURAL POPULATIONS OF ARABIDOPSIS THALIANA

Supplemental Tables

Tables S1 to S5 in file Exposito-Alonso_2016_TABLES_S1_to_S5.xlsx

Table S1. Sample information. Related to Figure 1.

Table S2. Sample information for greenhouse-grown mutation accumulation lines. Related to Figure 3.

Table S3. Mutation rate estimates for different annotations in HPGI and greenhouse-grown mutation accumulation lines. Related to Figure 3.

Table S4. Description of phenotypic and climatic variables for association analyses. Related to Figure 5.

Table S5. SNP hits from association analyses. Related to Table 1.

Table S6. Trait distributions and QQ plots of association analyses. Related to Figure 5.

For each trait used in the association analyses, we report the histogram of its distribution and the QQ plot of p-values, to verify that no trait departs strongly from a normal distribution and that p-values are not inflated (a genomic inflation factor λ ≤ 1 indicates no inflation of false positives).
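The inflation factor λ is conventionally the median of the observed 1-df chi-square statistics divided by the median of the null chi-square distribution (about 0.455). The sketch below assumes two-sided p-values from 1-df tests, all strictly greater than 0, and uses only the standard library.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor lambda from two-sided, 1-df association p-values."""
    z = NormalDist()
    # Convert each p-value back to a 1-df chi-square statistic (z squared).
    chi2 = [z.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of the chi-square(1) distribution, approximately 0.455.
    null_median = z.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(uniform_p), 2))
```

Uniformly distributed p-values give λ close to 1; systematically small p-values, for example from uncorrected population structure, push λ above 1.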

Supplemental Figures

Figure S1. Ancient DNA-like characteristics of herbarium-derived libraries not treated with uracil glycosylase. Related to Figure 1.

Figure S2. Separation between HPGI and other North American lineages. Related to Figure 2.

Figure S3. Substitution spectrum and relationship between methylation and substitutions. Related to Figure 3.

Figure S4. Density of SNPs along all chromosomes and location of SNP hits. Related to Figure 5.

Figure S5. Bayesian phylogeographic inference using continuous trait models, and HPGI genetic diversity in time and space. Related to Figure 4.

Figure S6. Linkage disequilibrium and SNPs with significant trait associations and correlations between SNP effects, frequency and age. Related to Figure 5.

Supplemental Experimental Procedures

Supplemental References

SUPPLEMENTAL TABLES

See the separate .xlsx file for Tables S1-S5 and the separate .pdf file for graphic Table S6.

Figure S1. Ancient DNA-like characteristics of herbarium-derived libraries not treated with uracil glycosylase.

(A) Percentage of endogenous Arabidopsis thaliana DNA. (B) Median length of merged reads. (C) Percentage of cytosine-to-thymine (C-to-T) substitutions at the first base (5’ end). (D) Relative enrichment of purines (adenine and guanine) at 5’-end break points. Position −1 is compared with position −5. Numbers indicate positions in the genomic context upstream of the reads’ 5’ ends.

Related to Figure 1.

Figure S2. Separation between HPGI and other North American lineages.

(A) Neighbor-joining tree built using Illumina-based SNP calls at the 149 genotyping markers originally used to identify HPGI candidates (consensus of 1,000 replicates). HPGI accessions are shown in black, whereas other North American lineages are depicted in red. (B) Neighbor-joining tree based on genome-wide SNPs (consensus of 1,000 replicates). Accessions are colored as in (A). Note that three accessions originally classified as HPGI based on 149 SNPs (A) are placed outside this clade. A further accession (BRRR7) within the main HPGI branch turned out to be a recombinant and was removed from the analysis. (C) First two dimensions of a multidimensional scaling plot based on identity-by-state pairwise distances. The black dot results from overplotting multiple almost identical HPGI samples (grey dots). Numbers in parentheses indicate the percentage of the variance explained by each dimension.

Related to Figure 2.

Figure S3. Substitution spectrum and relationship between methylation and substitutions.

(A) “Unfolded” site frequency spectrum, using Arabidopsis lyrata as outgroup, for all transitions and transversions. (B) Substitution spectrum for all transitions and transversions divided by genomic annotation. (C, D) Fraction of intergenic SNPs (C) and coding sequence (CDS) SNPs (D) that correspond to methylated cytosines in the HPGI pseudo-reference. Methylation data were taken from Hagmann et al. (2015). (E, F) Fraction of intergenic (E) and CDS SNPs (F) that correspond to methylated cytosines in the Col-0 reference genome (methylation data from Becker et al., 2011). Blue and red lines indicate fractions for SNPs segregating within the HPGI population. Red and blue histograms indicate fractions for random subsets of SNPs fixed within the HPGI population. Grey histograms indicate fractions for invariant positions, i.e., cytosines that have not undergone substitution. See Extended Experimental Procedures for details.

Related to Figure 3.

Figure S4. Density of SNPs along all chromosomes and location of SNP hits.

The line shows the number of SNPs per 100 kb window. Centromere locations are indicated by grey background. Vertical lines indicate SNPs associated with root phenotypes (red) and climatic variables (blue).

Related to Figure 5.

Figure S5. Bayesian phylogeographic inference using continuous trait models, and HPGI genetic diversity in time and space.

(A, B) The model infers the most probable geographic location of each node of the phylogeny in Figure 3. (A) Ancestral distribution map summarizing the first ∼100 years of the phylogenetic tree (green). The clouds represent the 95% highest posterior density (HPD) regions of the inferred locations. (B) Current distribution map (blue) summarizing the last ∼100 years. Clouds as in (A). (C, D) Diversity in time and space. (C) Diversity in time. Each point represents the average Hamming genetic distance among samples collected within a decade. The black line shows the fit of a generalized additive model and the grey shaded area its 95% confidence interval. (D) Diversity in space. Each point represents the average Hamming genetic distance to the 10 geographically closest neighbors. Local genetic diversity is shown as a scaled gradient from red (low) to blue (high).

Related to Figures 3 and 4.

Figure S6. Linkage disequilibrium and SNPs with significant trait associations and correlations between SNP effects, frequency and age.

(A-F) Linkage disequilibrium and SNPs with significant trait associations. (A) Histogram of genetic distances between samples when evaluating only coding regions at a 5% minimum allele frequency. Linkage disequilibrium between SNP hits measured as r² (B) and D’ (C). Three significant SNPs were further studied to exemplify the power of association analyses with HPGI. For each, phenotypic differences between accessions that differ at the focal SNP but are otherwise virtually genetically identical are compared both with all pairs of accessions and with pairs of accessions completely identical in coding regions. Below each violin plot is the histogram of linkage disequilibrium between the focal SNP and all other SNP hits. The three focal SNPs lie within the genes AT5G19330 (D), AT1G54440 (E) and AT2G16580 (F). (G-J) Correlation between SNP effects, frequency and age: SNP frequency and p-value (G), frequency and effect (H), age and p-value (I), and age and effect (J).

Related to Figure 5.

SUPPLEMENTAL EXPERIMENTAL PROCEDURES

Sample collection

Modern A. thaliana accessions were chosen from the collection described by Platt and colleagues (2010); HPGI candidates were identified based on 149 genome-wide SNPs (Table S1). Seeds were bulked at the University of Chicago. Progeny for DNA extraction were grown at the Max Planck Institute for Developmental Biology. Herbarium specimens (collection dates 1863-1993) were directly sampled by our colleagues Jane Devos and Gautam Shirsekar, or sent to us by collection curators (Table S1). We used 2 to 8 mm² of dried tissue for destructive sampling.

DNA extraction, library preparation and sequencing

DNA from herbarium specimens was extracted in a clean room facility at the University of Tübingen as described (Yoshida et al., 2013). Two sequencing libraries were prepared for each specimen, without and with repair of deaminated sites using uracil-DNA glycosylase and endonuclease VIII (Briggs et al., 2010). DNA from modern, live samples was extracted from rosette leaves pooled from 8 individual plants using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). Genomic DNA libraries were prepared using the TruSeq DNA Sample Prep kit or the TruSeq Nano DNA Sample Prep kit (Illumina, San Diego, CA). Unrepaired herbarium libraries were screened for authenticity by sequencing at low coverage on Illumina HiSeq 2500 or MiSeq instruments. Production sequencing (101 bp paired-end) was carried out on an Illumina HiSeq 2000 instrument.

Read processing

Paired-end reads from modern samples were trimmed and quality-filtered before mapping using the SHORE pipeline v0.9.0 (Hagmann et al., 2015; Ossowski et al., 2008). Because ancient DNA fragments are short (Fig. S1B), forward and reverse reads from herbarium samples were merged after trimming, requiring a minimum overlap of 11 bp (Yoshida et al., 2013), and were treated as single-end reads. Reads were mapped with GenomeMapper v0.4.5s (Schneeberger et al., 2009) against an HPGI pseudo-reference genome (Hagmann et al., 2015) and against the Col-0 reference genome. Samples JK2509 to JK2531 were mapped only to the HPGI pseudo-reference genome. Coverage, number of covered positions in the genome, and number of SNPs identified per accession relative to HPGI are reported in Table S1.

We also sequenced the genomes of twelve greenhouse-grown mutation accumulation (MA) lines (Becker et al., 2011; Shaw et al., 2000) (Table S2). We called SNPs, indels and structural variants (SVs) following the workflow and parameters described previously (Hagmann et al., 2015), but without repeated iterations. This procedure yielded 2,203 polymorphisms shared by all lines, reflecting either errors in the reference sequence (12% of these variants replaced N’s in the TAIR9 genome) or genetic differences between the founder plant of the MA population and the Col-0 individual used to generate the reference genome. In addition, we identified 388 variants segregating across the twelve lines (Table S2), of which 350 were singletons. On average, each 31st-generation line carried 25.5 SNPs, 4.9 deletions and 3.2 insertions (Table S2), compared with 19.6 SNPs, 2.4 deletions and 1.0 insertions previously detected in the 30th generation with shorter reads and lower read depth (Ossowski et al., 2010). The genome length accessed in this sequencing effort, 115,954,227 bp, was used to scale the number of point mutations to a rate of 7.1 × 10⁻⁹ mutations site⁻¹ generation⁻¹ (Table S3).

Identification of bona fide HPGI accessions and HPGI phylogeny

We established the relationships among samples at three levels of resolution: (i) the original 149 nuclear SNP genotyping calls on which identification of the HPGI haplogroup had been based (Platt et al., 2010), (ii) SNPs in the chloroplast genome (where we did not find any variants), and (iii) all nuclear genome SNPs. At these three levels, we performed a multidimensional scaling (MDS) analysis and built a neighbor-joining tree using the adegenet package in R (Jombart, 2008).

We used four methods to estimate the relationships among modern accessions, and between modern accessions and historic specimens: (i) multidimensional scaling (MDS) analysis; (ii) construction of a neighbor-joining tree with the adegenet package in R (Jombart, 2008), with branch support assessed with 1,000 bootstrap iterations; (iii) construction of a parsimony network using SplitsTree v4.12.3 (Huson and Bryant, 2006), with confidence values calculated with 1,000 bootstrap iterations; and (iv) Bayesian phylogenetic analysis using BEAST v1.8 (Bouckaert et al., 2014; Drummond et al., 2012) (see below).

Descriptive genome-wide statistics

We estimated genetic diversity as Watterson’s θ and nucleotide diversity π, and the normalized difference between these two statistics as Tajima’s D, using DnaSP v5 (Librado and Rozas, 2009), both for the entire dataset and separately for modern and herbarium specimens. We calculated the folded and unfolded site frequency spectrum (SFS) for the whole dataset. For the unfolded SFS, we assigned the ancestral state using the Arabidopsis lyrata genome (Hu et al., 2011). We estimated pairwise linkage disequilibrium (LD) between all possible combinations of informative sites, ignoring singletons, by computing the r², D and D’ statistics. LD decay was estimated using a linear regression approach. For the modern individuals, we calculated the recombination parameter R and performed the four-gamete test (Hudson and Kaplan, 1985) to identify the minimum number of recombination events. All LD- and recombination-related statistics were determined using DnaSP v5 (Librado and Rozas, 2009).
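For two biallelic sites in inbred (effectively haploid) lines, the three LD statistics follow directly from allele and haplotype frequencies. This is a sketch, assuming alleles coded 0/1 with missing genotypes and singletons already removed; DnaSP's conventions, for example whether D’ is reported as an absolute value, may differ.

```python
def pairwise_ld(site_a, site_b):
    """Return (D, D_prime, r2) for two biallelic sites coded 0/1 in inbred lines."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of the haplotype carrying allele 1 at both sites.
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, d * d / denom


print(pairwise_ld([0, 0, 1, 1], [0, 0, 1, 1]))
```

The D’ returned here is signed (between −1 and 1); r² is the squared correlation between the two sites.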

Substitution and mutation rate analyses

Greenhouse-grown mutation accumulation lines

The mutation rate in greenhouse-grown mutation accumulation lines (Becker et al., 2011) was calculated per line, and the mean and confidence intervals are reported. For each 31st-generation MA line, the number of point mutations detected was divided by 31 and by the total genome length. The genome length was defined as all base pairs with a coverage of at least 3 and a SHORE mapping quality score of at least 32 in one sample (Table S2).
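The per-line calculation amounts to dividing mutation counts by generations and by callable sites, then averaging across lines. In the sketch below the confidence interval uses a normal approximation, which is an assumption since the interval method is not specified here; the final line is only an approximate check using the pooled averages reported above, not the per-line values.

```python
import math


def ma_rate(n_mutations, generations, callable_sites):
    """Point mutations per site per generation for one MA line."""
    return n_mutations / generations / callable_sites


def mean_and_ci(rates, z=1.96):
    """Mean and normal-approximation 95% CI across lines (assumed CI method)."""
    n = len(rates)
    mean = sum(rates) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in rates) / (n - 1))
    half = z * sd / math.sqrt(n)
    return mean, mean - half, mean + half


# Check against the reported averages: 25.5 SNPs, 31 generations, 115,954,227 bp.
print(f"{ma_rate(25.5, 31, 115_954_227):.2e}")  # 7.09e-09, close to 7.1e-09
```

With per-line counts and per-line genome lengths, `mean_and_ci([ma_rate(...) for each line])` gives the summary across the twelve lines.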

Natural populations of HPGI

To estimate the number of nucleotide changes per year in natural populations of HPGI, we took advantage of the known collection years of the samples. We used genome-wide nuclear SNPs to calculate pairwise “net” genetic distances between historic and modern HPGI samples using the equation D’_ij = D_ic − D_jc, where D’_ij is the net distance between a modern sample i and a historic sample j; D_ic is the distance between the modern sample i and the reference genome c; and D_jc is the distance between the historic sample j and the reference genome c. We calculated a pairwise time distance in years, T_ij, between all modern and historic pairs using the collection dates, and fitted the linear regression D’_ij = a + b T_ij.

The slope coefficient b describes the number of substitutions per year. However, because different lines share part of their evolutionary history, the points in the regression are not independent, and standard regression confidence intervals would be overconfident. We therefore calculated more rigorous 95% confidence intervals from 1,000 bootstrap resamples (Drummond et al., 2003). We used either all SNPs or SNPs at specific annotations. To scale the genome-wide substitution rate to a per-base rate, we used all positions that passed the SNP or reference call quality thresholds, rather than a single value of genome length.
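A percentile bootstrap for the slope can be sketched as follows. The resampling unit here is an assumption: modern and historic samples are resampled with replacement and all pairs are then rebuilt, so shared samples stay linked across pairs. This may differ from the exact scheme used in the analysis; the function names and toy data are hypothetical.

```python
import random


def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return None  # slope undefined when all time distances are equal
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def slope_from_samples(modern, historic):
    """Regress D'_ij on T_ij over all modern x historic pairs of (year, distance)."""
    t = [yi - yj for yi, _ in modern for yj, _ in historic]
    d = [di - dj for _, di in modern for _, dj in historic]
    return ols_slope(t, d)


def bootstrap_slope_ci(modern, historic, n_boot=1000, seed=1):
    """Percentile 95% CI for the slope, resampling samples (not pairs)."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_boot:
        m = [rng.choice(modern) for _ in modern]
        h = [rng.choice(historic) for _ in historic]
        b = slope_from_samples(m, h)
        if b is not None:  # skip degenerate resamples with no time variation
            slopes.append(b)
    slopes.sort()
    return slopes[int(0.025 * n_boot)], slopes[int(0.975 * n_boot) - 1]


modern = [(2000, 50.0), (2005, 52.5), (2010, 55.0)]
historic = [(1900, 0.0), (1950, 25.0)]
print(bootstrap_slope_ci(modern, historic, n_boot=200))
```

With the perfectly clock-like toy data every resample returns a slope of 0.5, so the interval collapses to a point; real data produce a spread that reflects sample-level variation.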

The second approach to estimating a substitution rate was based on Bayesian phylogenetics, using the tip-calibration approach implemented in BEAST v1.8 (Drummond et al., 2012). After systematic runs and chain convergence assessment of different demographic and molecular clock models, we determined that the Skygrid demographic model and the lognormal relaxed molecular clock were the most appropriate. Our analysis simultaneously optimized tree topology and branch lengths, substitution rate, and the demographic model. BEAST estimates a molecular clock from the relationship between the time separating two sequences and the difference in their branch lengths in the tree. Under a relaxed molecular clock, the substitution rate is allowed to vary across branches following a lognormal distribution. The prior used for the molecular clock was a Continuous-Time Markov Chain (CTMC) prior (Ferreira and Suchard, 2008). The demographic model is a Bayesian nonparametric model that is optimized for multiple loci and allows for complex demographic trajectories by estimating population sizes in time bins (of 10 years in our case) across the tree, based on the number of coalescent events per bin (Gill et al., 2012). In addition, to confirm that demography and root dating converged on the same parameters, we performed a second estimate using a fixed substitution rate of 3.3 × 10⁻⁹ substitutions site⁻¹ year⁻¹ that we had estimated empirically using the net-distance method.

The analysis was run remotely on the CIPRES portal (v3.1, www.phylo.org) using uninformative priors. The run took about 1,344 CPU hours and performed 1,000 million Markov chain Monte Carlo (MCMC) steps, sampling every 100,000 steps. Burn-in was set to 10% of steps. To visualize the tree output, we produced a Maximum Clade Credibility (MCC) tree with a minimum posterior probability threshold of 0.8 and a 10% burn-in using TreeAnnotator (part of the BEAST package), and visualized the MCC tree using FigTree (tree.bio.ed.ac.uk/software/figtree/). Additionally, we used DensiTree (Bouckaert, 2010) to draw the 10,000 BEAST trees with the highest posterior probability simultaneously. Because all trees were drawn transparently, agreement in both topology and branch lengths appears as densely colored regions (Fig. 3A), while areas with little agreement appear lighter.

Demography and migration of HPGI

From the Bayesian phylogenetic analyses described in the previous sections, we studied the demographic model estimated via Skygrid. We reconstructed a skyline plot that depicts changes through time in effective population size, a measure of relative diversity (Bouckaert et al., 2014; Drummond et al., 2012). Non-phylogenetic methods for demographic inference exist, e.g. the multiple sequentially Markovian coalescent (MSMC) (Schiffels and Durbin, 2014), but after exploring them we concluded that their resolution is insufficient for analyses of the last several centuries, as in our case.
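Skygrid is considerably more elaborate, but the intuition behind all skyline methods is visible in the simpler classic skyline estimator (Pybus et al., 2000). For each inter-coalescent interval with k lineages, it estimates population size as the interval length times k(k − 1)/2. The sketch below names that simpler technique plainly. It assumes all tips were sampled at the same time, which does not hold for our heterochronous herbarium data, and it returns sizes in units of N_e × generation time.

```python
def classic_skyline(coalescent_times, n_tips):
    """Classic skyline estimates for an isochronous tree (all tips sampled at time 0).

    coalescent_times: node ages (time before present) of the n_tips - 1 coalescences.
    Returns (start, end, estimate) per interval; with k lineages the estimate is
    interval_length * k * (k - 1) / 2, in units of N_e * generation time.
    """
    times = sorted(coalescent_times)
    if len(times) != n_tips - 1:
        raise ValueError("expected n_tips - 1 coalescent times")
    out = []
    prev, k = 0.0, n_tips
    for t in times:
        out.append((prev, t, (t - prev) * k * (k - 1) / 2.0))
        prev, k = t, k - 1
    return out


for start, end, ne in classic_skyline([1.0, 3.0, 6.0], n_tips=4):
    print(start, end, ne)
```

Short intervals between coalescences translate into small population sizes. Skygrid smooths such noisy per-interval estimates across fixed time bins and across loci.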

We performed another Bayesian phylogenetic analysis incorporating a geographic location trait (Lemey et al., 2010; Wilson and Barton, 1995). In this analysis, Brownian diffusion parameters are estimated by fitting a continuous gradient of geographic locations along tree branches, starting from the tips of the tree, whose geographic locations are known (i.e., the collection sites of our samples). We excluded three samples from the West Coast of the United States, since Brownian diffusion over such large distances is an unrealistic model. We ran this analysis with the parameters described in the previous sections and sliced the resulting three-dimensional (temporal and geographic) phylogeny at the early 16th century and the late 18th century using SPREAD (Bielejec et al., 2011).

We used a heuristic search based on isolation-by-distance patterns, inspired by Handley et al. (2007), to find the origin of HPGI diffusion in North America, and compared the result with the phylogeographic analyses. We first tested the relationship between pairwise genetic and geographic distances using a linear regression. We then decomposed the isolation-by-distance pattern for each sample (i.e., each row of both distance matrices) and tested whether the slope still held, that is, whether the remaining samples showed a gradual increase in genetic distance with distance from the presumed origin. This heuristic search covered all sampled locations, regressing genetic distance on Euclidean geographic distance in each case. The sample locations with the steepest and most significant slopes were assumed to be closest to the origin of HPGI diffusion. Because there are indications that more than a single spread might have occurred, we performed the isolation-by-distance analyses separately for modern accessions and herbarium specimens. These two analyses allowed us to locate the origins of the modern and historic diffusions of HPGI in North America, respectively.
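The row-by-row search can be sketched as follows, assuming planar (x, y) coordinates with Euclidean distances and a square genetic distance matrix. For brevity it ranks candidate origins by slope only and omits the significance test used in the analysis; the names and the three-sample example are hypothetical.

```python
import math


def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return None
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def rank_origins(coords, gen_dist):
    """Rank candidate origins by the isolation-by-distance slope of each matrix row.

    coords: list of (x, y) locations; gen_dist: square genetic distance matrix.
    Returns (slope, index) pairs, steepest positive slope first.
    """
    scores = []
    for i, (xi, yi) in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        geo = [math.hypot(xi - coords[j][0], yi - coords[j][1]) for j in others]
        gen = [gen_dist[i][j] for j in others]
        b = ols_slope(geo, gen)  # genetic distance ~ geographic distance, row i only
        if b is not None:
            scores.append((b, i))
    return sorted(scores, reverse=True)


coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
gen = [[0.0, 1.0, 3.0], [1.0, 0.0, 0.5], [3.0, 0.5, 0.0]]
print(rank_origins(coords, gen))
```

In practice each slope would also be tested, for example with the t statistic of the regression, before a location is accepted as a plausible origin.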

Analysis of the methylation status of mutated sites

As in many other species, the spectrum of de novo mutations in greenhouse-grown A. thaliana mutation accumulation lines is biased towards G:C→A:T transitions (Ossowski et al., 2010), leading to an inflated transition-to-transversion ratio (Ts/Tv). This bias is less pronounced among recent mutations in a Eurasian collection of natural accessions (Cao et al., 2011) and in HPGI accessions (Fig. 3D). A recent multigenerational salt stress experiment in the greenhouse also found a more balanced Ts/Tv (Jiang et al., 2014). These findings suggest that less benign conditions might promote a lower Ts/Tv.
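Classifying SNPs for the Ts/Tv ratio only requires knowing whether the two alleles are both purines or both pyrimidines. The sketch assumes SNPs given as (ancestral, derived) base pairs on one strand with distinct alleles. It counts C→T and G→A as the two strand orientations of a G:C→A:T transition.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine change."""
    if ref == alt:
        raise ValueError("ref and alt alleles must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv(snps):
    """Counts of transitions and transversions, and their ratio."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts, tv, (ts / tv if tv else float("inf"))


def gc_to_at_fraction(snps):
    """Fraction of SNPs that are G:C->A:T transitions (C->T or G->A)."""
    hits = sum(1 for ref, alt in snps if (ref, alt) in {("C", "T"), ("G", "A")})
    return hits / len(snps)


snps = [("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ts_tv(snps), gc_to_at_fraction(snps))
```

An excess of C→T and G→A changes, as expected from deamination of methylated cytosines, raises both the Ts/Tv ratio and the G:C→A:T fraction.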

The mechanisms underlying a high Ts/Tv ratio are unknown, but could include spontaneous deamination of methylated cytosines (5-methyl-C → T). In agreement with this possibility, we found previously that ancestral cytosines methylated in the A. thaliana reference strain had a more than two-fold higher polymorphism rate than unmethylated cytosines, with the highest rate found in CHG sites (where H is A, C or T) (Table 1, Cao et al., 2011).

We investigated the putative evolutionary role of cytosine methylation in the mutability of cytosines in the HPGI accessions. As reference DNA methylation data, we used previously generated bisulfite-sequencing data from HPGI strains (Hagmann et al., 2015) and from Col-0 lines (Becker et al., 2011). Our rationale was that if methylation affected mutability, the proportion of sites classified as methylated in the reference datasets should differ between mutated and non-mutated sites. To determine the ancestral state of a given site, we only considered positions for which that state could be established by alignment to the A. lyrata genome (Hu et al., 2011).

The test set of genomic positions consisted of the n sites that were invariant cytosines in A. lyrata and the A. thaliana Col-0 reference genome and whose derived allele was present in at least one HPGI accession (i.e., SNPs segregating within the HPGI population). For these sites, we determined the fraction of methylated cytosines as the number of corresponding sites classified as ‘methylated’ in the HPGI and Col-0 reference datasets, respectively, divided by n.

As a first control set of sites, henceforth called ‘neutral’, we selected cytosines that were invariant between A. lyrata, Col-0, and all HPGI accessions. A second control set, called ‘fixed’, consisted of cytosines that were invariant between A. lyrata and Col-0 but had mutated and become fixed in all HPGI accessions. For both control sets, we generated empirical distributions of the fraction of sites that were methylated in the HPGI and Col-0 reference datasets. To this end, we randomly selected n positions with sequence information in the methylation datasets, and repeated this process 1,000 times.
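The resampling scheme can be sketched as follows, assuming sites are hashable identifiers such as (chromosome, position) tuples and the methylation calls are a set of methylated sites. The one-sided empirical p-value with a +1 correction is an illustrative choice not described above, and the example data are hypothetical.

```python
import random


def methylated_fraction(sites, methylated):
    """Fraction of sites present in the set of methylated positions."""
    return sum(1 for s in sites if s in methylated) / len(sites)


def empirical_null(control_sites, methylated, n, reps=1000, seed=1):
    """Methylated fractions in `reps` random draws of n control sites (no replacement)."""
    rng = random.Random(seed)
    return [methylated_fraction(rng.sample(control_sites, n), methylated)
            for _ in range(reps)]


def empirical_p(observed, null):
    """One-sided p-value: how often a control draw is at least as methylated."""
    return (sum(1 for x in null if x >= observed) + 1) / (len(null) + 1)


control = [("1", pos) for pos in range(100)]
methylated = {("1", pos) for pos in range(0, 100, 10)}  # 10% of control sites
null = empirical_null(control, methylated, n=20)
print(min(null), max(null), empirical_p(0.5, null))
```

Drawing the same number of control sites as test sites (n) makes the null distribution directly comparable to the observed fraction.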

Ancestral cytosines with higher methylation proportion in both A. thaliana and HPGI methylome datasets were more likely to mutate to thymines (Fig. S3C-F). Surprisingly, not only C→T but also C→A/G segregating sites were more likely to have been methylated compared to the fixed and neutral positions, which cannot be explained by higher deamination rates of methylated vs. unmethylated cytosines.

There is an ongoing debate on how epigenetics, i.e. environmentally induced modifications with non-Mendelian inheritance, could contribute to adaptation (Mirouze and Paszkowski, 2011; Nicotra et al., 2010). This result could constitute a genetically based hypothesis for a role of epigenetics in adaptation, perhaps in favor of the “adaptive mutation” argument, for which there is extensive evidence in bacteria (Al Mamun et al., 2012).

Inference of genome-wide selection parameters

We estimated the average strength of genome-wide selection from the deviation between whole-genome and intergenic substitution rates. We used intergenic regions as the neutral reference because mutations there are not expected to have direct phenotypic or biochemical effects, yet these regions provide abundant sites for comparison. This is based on the well-known relationship described by Kimura (1967):

$$k = 2N_e \mu Q,$$

where k is the substitution rate, μ the mutation rate, N_e the effective population size, and Q the fixation probability of a new mutation. Under neutrality, $Q = 1/(2N_e)$, so the effective population size term 2N_e cancels out and the substitution and mutation rates are equal ($k = \mu$). With a semidominant genome-wide selection coefficient s acting on a new mutation, $Q \approx s / (1 - e^{-2N_e s})$ (Charlesworth and Charlesworth, 2010), so that

$$\frac{k}{\mu} = \frac{2N_e s}{1 - e^{-2N_e s}}.$$

We used the intergenic substitution rate as a proxy for the mutation rate μ and the whole-genome substitution rate as a proxy for the substitution rate k, and solved the equation for 2N_e s, known as the population selection parameter.
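Solving for the population selection parameter has no closed form, but it is a one-dimensional root-finding problem. A minimal sketch, assuming the relation k/μ = 2N_e Q with Q ≈ s/(1 − e^{−2N_e s}), i.e. k/μ = S/(1 − e^{−S}) with S = 2N_e s. This function is strictly increasing in S, so simple bisection finds the unique root; the bracket of ±700 is an arbitrary choice that keeps `math.expm1` from overflowing. Names and the input ratio are hypothetical.

```python
import math


def relative_substitution_rate(S):
    """k/mu = S / (1 - exp(-S)) for S = 2*N_e*s; equals 1 in the neutral limit."""
    if S == 0.0:
        return 1.0  # neutral: Q = 1/(2 N_e), so k = mu
    return S / -math.expm1(-S)  # -expm1(-S) == 1 - exp(-S), accurate near 0


def solve_population_selection(k_over_mu, lo=-700.0, hi=700.0, tol=1e-12):
    """Invert k/mu = S / (1 - exp(-S)) for S = 2*N_e*s by bisection."""
    if not (relative_substitution_rate(lo) < k_over_mu < relative_substitution_rate(hi)):
        raise ValueError("k/mu outside the solvable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if relative_substitution_rate(mid) < k_over_mu:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


# Hypothetical ratio of whole-genome to intergenic substitution rate.
S = solve_population_selection(0.8)
print(round(S, 2))  # -0.43 (negative: purifying selection)
```

A ratio below 1 yields a negative 2N_e s (net purifying selection), a ratio of exactly 1 yields 0 (neutrality), and a ratio above 1 a positive value.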

Association analyses and dating of newly arisen mutations

For 63 modern accessions, we measured time to bolting and flowering with four replicates, and fecundity (as seed set) with one replicate, in growth chambers at the University of Chicago. Additionally, using ≥ 10 replicates, we analyzed primary root phenotypes at the Gregor Mendel Institute in Vienna, describing growth and morphological traits extracted from images as described by Slovak et al. (2014) (see Phenotyping section for details of phenotypic characterization). For associations with climate parameters, we followed the rationale of Hancock et al. (2011). We extracted information from the publicly available bioclim database (http://www.worldclim.org/bioclim) as a raster at 2.5 degrees resolution and intersected it with the geographic locations of HPGI samples (n = 103).
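The climate intersection amounts to looking up, for each sample's coordinates, the raster cell that contains it. A minimal sketch, assuming a north-up global latitude/longitude grid with its top-left corner at 90°N, 180°W and square cells of `res_deg` degrees, already loaded as a 2D list (file reading is omitted; `raster_cell` and `extract_values` are hypothetical names). Real bioclim files carry their own extent and cell size, which should be read from the file rather than assumed.

```python
import math


def raster_cell(lat, lon, res_deg, north=90.0, west=-180.0):
    """Row and column of the grid cell containing (lat, lon); row 0 is northernmost."""
    row = math.floor((north - lat) / res_deg)
    col = math.floor((lon - west) / res_deg)
    return row, col


def extract_values(grid, samples, res_deg):
    """Value of one climate layer (2D list, rows x columns) at each sample location.

    samples: dict sample_id -> (latitude, longitude) in decimal degrees.
    """
    values = {}
    for sample_id, (lat, lon) in samples.items():
        row, col = raster_cell(lat, lon, res_deg)
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            raise ValueError(f"{sample_id} falls outside the raster")
        values[sample_id] = grid[row][col]
    return values


# Toy 2 x 4 global grid with 90-degree cells (real layers are far finer).
toy_grid = [[1, 2, 3, 4],
            [5, 6, 7, 8]]
toy_samples = {"sample_1": (41.84, -86.63), "sample_2": (-10.0, 100.0)}
print(extract_values(toy_grid, toy_samples, res_deg=90.0))  # {'sample_1': 2, 'sample_2': 8}
```

This is nearest-cell extraction without interpolation, repeated once per bioclim variable.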

We performed association analyses using the R package GenABEL (Aulchenko et al., 2007), with measured phenotypes (p = 25) and climatic variables (c = 18) as response variables and SNPs as explanatory variables. A minor allele frequency cutoff of 5% was used. We assessed 391 SNPs in a dataset of modern samples only, in which missing genotypes were imputed using Beagle v4.0 (Browning and Browning, 2009), and 456 SNPs in a dataset of both modern and historic samples, without imputation. For all associations, whether phenotypic or climatic, at least 63 individuals were genotyped for each SNP. All phenotypic variables were measured in common chamber or common garden experiments.

We first estimated the broad-sense heritability (H2) of each trait using an ANOVA partition of variance between and within lines based on replicates (Table S4), with significance obtained by a standard F test. Second, we used the polygenic_hglm function in GenABEL to fit a genome-wide kinship matrix in order to calculate a narrow-sense heritability estimate (h2). Significance was calculated with a likelihood ratio test against a null model. In principle, h2 is a component of H2, so its values should always satisfy h2 ≤ H2. Our results cannot be interpreted in this framework, because we used genotype means for the h2 calculation and replicate measurements for the H2 calculation. Averaging reduced environmental and developmental noise and thus inflated h2 (Table S4). Using genotype means, however, also allowed us to calculate h2 for climatic variables. Seed size had a particularly high heritability, a pattern we attribute to highly accurate and replicated measurements (see Phenotyping section).

For association analyses, we first employed a linear mixed model that fitted the kinship matrix using the mmscore function, and only three significant SNP hits were discovered at a 5% significance threshold after false discovery rate (FDR) correction. This was expected, because we have very few variants and these arose along an approximately tree-like genealogy, so they largely mirror the kinship structure. We concluded that fitting the kinship matrix in our model was not appropriate, since it would leave no residual variation for association with specific SNPs. We therefore employed a fixed-effects linear model using the qtscore function from GenABEL. To reduce the false-positive rate, we took a conservative permutation strategy that consisted of carrying out association analyses over 1,000 random datasets (permuting phenotypes across individuals) and using the resulting p-value distribution to correct the p-values estimated with the original dataset. SNPs with p-values below 5% in the empirical p-value distribution were considered significant (Table S5). In climatic models, we additionally included longitude and latitude as covariates to correct for any spurious association between SNPs and climate gradients created by the migratory pattern of isolation by distance.

Significant SNPs were interspersed throughout the genome (Fig. S4), and neither their p-values nor their phenotypic effects correlated with the putative age or the frequency of the SNPs; such a correlation could have indicated that significance was merely driven by the higher statistical power for intermediate-frequency variants (Fig. S5 G-J). Using QQ plots to assess inflation or deflation of p-values, we generally observed that permutation-corrected p-values were deflated. Straight series of points in QQ plots indicate identical p-values for multiple SNPs, a pattern that we attributed to long-range LD, i.e. lack of independence (see Graphical Table S6 for trait distributions and QQ plots from each association analysis). We therefore applied two additional correction procedures: (1) Bonferroni-correcting the significance threshold for permutation-corrected p-values from 5% to 5% / number of traits, i.e. 0.2% for phenotype associations and 0.27% for climatic associations; and (2) Bonferroni-correcting the significance threshold for raw p-values from 5% to 5% / (number of SNPs + number of traits), i.e. 0.01% for phenotype and climatic associations (Table 1 and S5).
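A minimal sketch of the permutation and Bonferroni steps, under stated assumptions. We read "using the permuted p-value distribution to correct p-values" as a max-statistic (Westfall–Young-style) correction: each SNP's observed statistic is compared with the genome-wide maximum obtained in each phenotype permutation. The GenABEL workflow may differ. The squared genotype–phenotype correlation stands in for the qtscore statistic, and all names and toy data are hypothetical.

```python
import random
import statistics


def r_squared(genotypes, phenotypes):
    """Squared Pearson correlation between 0/1 genotypes and a phenotype."""
    mg, mp = statistics.fmean(genotypes), statistics.fmean(phenotypes)
    sxy = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((p - mp) ** 2 for p in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype
    return sxy * sxy / (sxx * syy)


def permutation_corrected_p(snps, phenotypes, n_perm=1000, seed=1):
    """Compare each SNP's statistic with the genome-wide maximum under permutation.

    snps: dict snp_id -> list of 0/1 genotypes, same individual order as phenotypes.
    """
    observed = {sid: r_squared(g, phenotypes) for sid, g in snps.items()}
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotypes across individuals
        max_null.append(max(r_squared(g, shuffled) for g in snps.values()))
    return {sid: (1 + sum(m >= stat for m in max_null)) / (n_perm + 1)
            for sid, stat in observed.items()}


def bonferroni(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy data: 20 lines, one SNP tracking the phenotype and one unrelated SNP.
causal = [0] * 10 + [1] * 10
unrelated = [i % 2 for i in range(20)]
phenotype = [5.0 * g + 0.1 * (i % 3) for i, g in enumerate(causal)]
p = permutation_corrected_p({"causal": causal, "unrelated": unrelated}, phenotype, n_perm=200)
print(p)
print(bonferroni(0.05, 25))  # threshold for 25 traits, i.e. 0.2%
```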

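The broad-sense heritability step (ANOVA partition of variance between and within lines) can be sketched as follows. This is a standard one-way random-effects estimate: the between-line variance component is (MS_between − MS_within)/n0, where n0 is the usual average replicate number for unbalanced designs, and H2 is its share of the total. The F statistic is MS_between/MS_within; its p-value would come from an F distribution (e.g. `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free. Truncating negative variance estimates at zero is our assumption; names and data are hypothetical.

```python
def broad_sense_heritability(groups):
    """One-way ANOVA estimate of H2 from replicate measurements of each line.

    groups: dict line_id -> list of replicate phenotype values (>= 2 lines).
    Returns (H2, F, df_between, df_within).
    """
    means = {line: sum(reps) / len(reps) for line, reps in groups.items()}
    values = [x for reps in groups.values() for x in reps]
    N, a = len(values), len(groups)
    grand_mean = sum(values) / N
    ss_between = sum(len(reps) * (means[line] - grand_mean) ** 2
                     for line, reps in groups.items())
    ss_within = sum((x - means[line]) ** 2
                    for line, reps in groups.items() for x in reps)
    df_between, df_within = a - 1, N - a
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    # n0: average replicate number, adjusted for unbalanced designs.
    n0 = (N - sum(len(reps) ** 2 for reps in groups.values()) / N) / df_between
    var_genetic = max((ms_between - ms_within) / n0, 0.0)  # truncate negatives at 0
    h2 = var_genetic / (var_genetic + ms_within)
    F = ms_between / ms_within
    return h2, F, df_between, df_within


# Toy replicates for two lines (made-up values).
H2, F, df1, df2 = broad_sense_heritability({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
print(round(H2, 3), F, df1, df2)  # 0.806 13.5 1 4
```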
For each SNP in our dataset, we determined the direction of mutation, i.e. ancestral vs. derived alleles, by identifying which state was found in the oldest herbarium samples. We compared the time of emergence and the centroid of the geographic distribution of the alternative alleles of SNP hits with those of random draws of SNPs subjected to the same minor allele frequency filter (5%).
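A minimal sketch of the polarization and of the per-allele summaries, assuming each genotyped sample is recorded as (collection year, allele, latitude, longitude). SNPs whose oldest samples disagree are left unpolarized here, which is our assumption. The centroid is a plain mean of coordinates, adequate only at regional scales. Names and toy records are hypothetical.

```python
from statistics import fmean


def polarize(calls):
    """Assign ancestral/derived states using the oldest dated samples.

    calls: list of (year, allele, lat, lon) tuples for samples genotyped at one SNP.
    Returns (ancestral, derived), or None if the SNP is not biallelic in these
    samples or the oldest samples carry different alleles.
    """
    alleles = {a for _, a, _, _ in calls}
    if len(alleles) != 2:
        return None
    oldest = min(year for year, _, _, _ in calls)
    oldest_alleles = {a for year, a, _, _ in calls if year == oldest}
    if len(oldest_alleles) != 1:
        return None
    ancestral = oldest_alleles.pop()
    derived = (alleles - {ancestral}).pop()
    return ancestral, derived


def first_year_and_centroid(calls, allele):
    """Earliest collection year and mean (lat, lon) of samples carrying `allele`."""
    carriers = [(y, lat, lon) for y, a, lat, lon in calls if a == allele]
    first_year = min(y for y, _, _ in carriers)
    centroid = (fmean(lat for _, lat, _ in carriers), fmean(lon for _, _, lon in carriers))
    return first_year, centroid


# Toy records (made-up years and coordinates).
calls = [(1863, "C", 40.0, -75.0), (1900, "C", 41.0, -80.0),
         (1950, "T", 42.0, -85.0), (2006, "T", 44.0, -87.0)]
print(polarize(calls))                      # ('C', 'T')
print(first_year_and_centroid(calls, "T"))  # (1950, (43.0, -86.0))
```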

In addition to the phenotypic and climatic associations of SNP hits, we also report putative protein effects, using a commonly used amino acid matrix of physicochemical distances (Grantham, 1974). Gene names and Gene Ontology categories of SNPs inside annotated transcriptional units were extracted with the online tool www.arabidopsis.org/tools/bulk/go/.

Association analyses – proof of concept examples

We argue that the power of our association approach stems from the fact that HPGI lines resemble near-isogenic lines (NILs) produced by experimental crosses (Weigel, 2012). As in genome-wide association (GWA) studies, power depends on a number of factors, namely phenotypic noise, the genetic architecture of the trait, genotyping quality, population structure, sample diversity, sample size, allele frequency, and recombination. Association analyses in NILs suffer from large linkage blocks, but confident results can be achieved thanks to accurate phenotyping, limited genetic differences between any two lines, and high-quality genotypes. In typical GWA studies, such as those in humans, there are multiple confounding effects. Among the confounders are (1) that any two samples differ in hundreds of thousands of SNPs and (2) that historical and geographic stratification produces non-random correlations among those SNP differences. This considerably complicates the identification of phenotypic effects at specific genes, and power relies greatly on large samples and frequent recombination between markers.

We illustrate the confidence of the association analysis with some examples. To provide support for the nonsynonymous SNP on chromosome 5 at position 6,508,329 in AT5G19330, we looked for pairs of lines that carry the ancestral and the derived allele but differ in few (or no) other SNPs in the genome. When considering all genic substitutions with a minor allele frequency of at least 5% (Fig. S6A), we identified 20 pairs of lines differing only in the AT5G19330 SNP and one other linked SNP (which is located on a different chromosome and had an association p-value > 0.4). The differences in mean gravitropic score between these almost-identical pairs were significantly larger than the differences among all pairs of HPGI lines and among pairs that were identical with respect to genic substitutions (Fig. S6D). Furthermore, this SNP was not in linkage disequilibrium with any other SNP hit (r² < 0.6) (Fig. S6D). A similar approach was used to examine the SNPs in AT2G02220 (Fig. S6E) and AT2G16580 (Fig. S6F).
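The pair search described above can be sketched as follows, assuming complete 0/1 genotype calls for every line at a fixed SNP set (missing-data handling is omitted). The threshold `max_other_diffs` and all names are hypothetical; the comparison against all pairs of lines would reuse `mean_abs_difference` on a different pair list.

```python
from itertools import combinations


def near_isogenic_pairs(genotypes, focal, max_other_diffs=1):
    """Line pairs with different alleles at `focal` and few other differences.

    genotypes: dict line_id -> dict snp_id -> allele (0/1); every line is assumed
        to be genotyped at the same SNPs (no missing data).
    Returns (line_a, line_b, n_other_differences) tuples.
    """
    pairs = []
    for a, b in combinations(sorted(genotypes), 2):
        ga, gb = genotypes[a], genotypes[b]
        if ga[focal] == gb[focal]:
            continue  # same allele at the focal SNP: not informative
        n_other = sum(1 for snp in ga if snp != focal and ga[snp] != gb[snp])
        if n_other <= max_other_diffs:
            pairs.append((a, b, n_other))
    return pairs


def mean_abs_difference(pairs, phenotype):
    """Mean absolute phenotypic difference over the selected pairs."""
    return sum(abs(phenotype[a] - phenotype[b]) for a, b, _ in pairs) / len(pairs)


# Toy genotypes and scores (made-up).
lines = {
    "L1": {"focal": 0, "s2": 0, "s3": 0},
    "L2": {"focal": 1, "s2": 0, "s3": 0},
    "L3": {"focal": 1, "s2": 1, "s3": 1},
    "L4": {"focal": 0, "s2": 1, "s3": 0},
}
pairs = near_isogenic_pairs(lines, "focal")
print(pairs)  # [('L1', 'L2', 0), ('L2', 'L4', 1), ('L3', 'L4', 1)]
score = {"L1": 1.0, "L2": 3.0, "L3": 2.0, "L4": 2.5}
print(mean_abs_difference(pairs, score))  # 1.0
```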

Phenotyping

Root phenotypes

Fifteen root phenotypes were scored for three replicates per genotype over a time-series experiment via image analysis, as described in detail by Slovak et al. (2014). We used the mean and standard deviation of the time-series values for association analyses.

Seed size phenotype

We dispersed the seeds of each genotype on separate 12 × 12 cm square plastic Petri dishes. For faster image acquisition, we used a cluster of eight Epson V600 scanners, operated by the BRAT Multiscan image acquisition tool (https://www.gmi.oeaw.ac.at/research-groups/wolfgang-busch/resources/brat/). The resulting 1600 dpi images were analyzed in Fiji. Scans were converted to 8-bit binary images and thresholded (parameters: setAutoThreshold(“Default dark”); setThreshold(20, 255)), and particles were analyzed (inclusion parameters: size=0.04-0.25 circularity=0.70-1.00). The 2D seed size was measured in square millimeters (parameters: distance=1600 known=25.4 pixel=1 unit=mm) on > 500 replicates per genotype.
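Fiji performs the calibration and filtering itself; the sketch below only restates the arithmetic implied by the settings. At 1600 dpi a pixel is 25.4/1600 ≈ 0.0159 mm wide, so its area is about 2.52 × 10⁻⁴ mm², and a particle is kept if its calibrated area lies in 0.04–0.25 mm² and its circularity (4π·area/perimeter², the ImageJ shape descriptor) in 0.70–1.00. Function names are hypothetical.

```python
import math

DPI = 1600
MM_PER_PIXEL = 25.4 / DPI  # Set Scale: 1600 pixels = 25.4 mm (one inch)


def pixels_to_mm2(n_pixels):
    """Convert a particle's pixel count to an area in square millimeters."""
    return n_pixels * MM_PER_PIXEL ** 2


def circularity(area, perimeter):
    """ImageJ-style circularity 4*pi*area/perimeter**2 (1.0 for a perfect circle)."""
    return 4.0 * math.pi * area / perimeter ** 2


def is_seed(area_mm2, perimeter_mm, size=(0.04, 0.25), circ=(0.70, 1.00)):
    """Apply the size and circularity inclusion ranges used for particle analysis."""
    c = circularity(area_mm2, perimeter_mm)
    return size[0] <= area_mm2 <= size[1] and circ[0] <= c <= circ[1]


print(pixels_to_mm2(400))  # about 0.1008 mm^2
print(is_seed(0.10, 1.2))  # True: in size range, circularity ~0.87
print(is_seed(0.10, 2.0))  # False: too elongated (circularity ~0.31)
print(is_seed(0.50, 2.6))  # False: too large for a seed
```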

Flowering time in growth chambers

We estimated flowering time in growth chambers under 4 vernalization treatments. For each treatment, we grew 6 replicates per accession divided between two randomized complete blocks. Seeds were sown on a 1:1 mixture of Premier Pro-Mix and MetroMix and cold-stratified for 6 days (6°C, no light). We then let plants germinate and grow at 18°C, 14 hours light, 65% humidity. After 3 weeks, we transferred the plants to vernalization conditions (6°C, 8 hours light, 65% humidity). The 4 treatments consisted of 0, 14, 28 and 63 days of vernalization, respectively. After the vernalization treatment, plants were transferred back to the previous long-day growth conditions. Trays were rotated around the growth chambers every other day throughout the experiments, under both vernalization and growth conditions. Germination, bolting and flowering dates were recorded every other day until all plants had flowered. Days to bolting and days to flowering were calculated from the germination date until the first flower bud had developed and until the first flower opened, respectively. The average flowering and bolting times per genotype were used for association analyses.

Flowering time and fecundity in the field

To investigate variation in flowering time and fecundity under natural conditions, we grew 3 replicates of each of the 78 accessions in a field experiment following a randomized complete block design. Seeds were sown between 09/20/2012 and 09/22/2012 in 66-well trays (well diameter = 4 cm) on soil from the field site to which plants were to be transplanted. The trays were cold-stratified for seven days before being placed in a cold frame at the University of Chicago, IL, USA (outdoors, with no additional light or heat, but watered as needed and protected from precipitation). Seedlings were transplanted directly into tilled ground at the Warren Wood field station (41.84° North, 86.63° West), Michigan, USA, on 10/13/2012 and 10/14/2012. Seedlings were watered in and left to overwinter without further intervention. Plants were scored for bolting and flowering in spring 2013. Upon maturation of all fruits, stems were harvested and stored between sheets of newsprint. To estimate fecundity, stems were photographed on a black background, and the size of each plant was estimated as the number of pixels occupied by the plant in the image. This measure correlates well with the total length of siliques produced, a classical estimator of fecundity in A. thaliana (Spearman’s rho = 0.84, p-value < 0.001, data not shown).
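The reported agreement between plant pixel area and total silique length is a Spearman rank correlation. A dependency-free sketch is shown below: values are converted to ranks (ties share their average rank) and the Pearson correlation of the ranks is computed. The p-value would need an additional permutation or t approximation, which is omitted. The numbers are made up.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up plant areas (pixels) and total silique lengths (mm).
pixel_area = [1200, 3400, 800, 5600, 2100]
silique_length_mm = [150, 410, 95, 600, 260]
print(spearman_rho(pixel_area, silique_length_mm))  # 1.0 (same ordering)
```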

ACKNOWLEDGEMENTS

For providing and retrieving herbarium specimens, we thank Robert Capers (University of Connecticut), Jane Devos and Gautam Shirsekar (MPI), Michael S. Dossmann (Arnold Arboretum), John Freudenstein (Ohio State University), Cathy M. Herring (Agricultural Research Station, North Carolina State University), Christine Niezgoda (Field Museum), Carol Ann McCormick (University of North Carolina), John Peter (New York Botanical Garden), and Marco Thines (Goethe University). We thank Xiaohui Zhao and Ian Henderson (SLCU) for recombination estimates in Eurasia, Christa Lanz (MPI) for support with sequencing, Christian Goeschl and Bettina Zierfuss (GMI) for assistance in root imaging, and Bonnie Wohlrab (GMI) for amplifying the seeds for root assays. We thank Magnus Nordborg for discussions and pointing us to the work of Templeton, Kay Pruefer for input on data analysis, Patricia Karlsson and Danelle Seymour for thorough proofreading and comments, and various present and past members of the Department of Molecular Biology of the Max Planck Institute for Developmental Biology for further comments on the manuscript. This work was supported by ERC Advanced Grant IMMUNEMESIS and the Max Planck Society.

AUTHOR CONTRIBUTIONS

H.A.B. and D.W. conceived and supervised the project, and coordinated the collaborative effort. J.B. coordinated the collection of modern seed samples. C.J., B.B., and J.B. performed and analyzed flowering time and seed set greenhouse experiments. R.S. and W.B. conceived and analyzed root assays. C.S. and R.S. performed the root assays and seed size phenotyping. C.B. and J.H. sequenced and curated modern samples. H.A.B. coordinated the collection and analysis of herbarium samples. J.K. coordinated the extraction of DNA and library preparation of herbarium samples. V.J.S. and E.R. prepared sequencing libraries from herbarium specimens. C.B. called variants in HPGI. J.H. called variants in mutation accumulation lines. M.E.A. performed the population and quantitative genomic analyses with supervision of R.N., C.B., and H.A.B. The paper was written by M.E.A., C.B., H.A.B., and D.W. with comments from all coauthors.

REFERENCES

Arunkumar, R., Ness, R.W., Wright, S.I., and Barrett, S.C. (2015). The evolution of selfing is accompanied by reduced efficacy of selection and purging of deleterious mutations. Genetics 199, 817–829.
Aulchenko, Y.S., Ripke, S., Isaacs, A., and van Duijn, C.M. (2007). GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296.
Baker, H.G. (1965). Characteristics and modes of origin of weeds. In The Genetics of Colonizing Species, H.G. Baker and G.L. Stebbins, eds. (New York: Academic Press), pp. 147–168.
Barrett, R.D.H., and Schluter, D. (2008). Adaptation from standing genetic variation. Trends Ecol Evol 23, 38–44.
Barrett, S.C.H. (2014). Foundations of invasion genetics: the Baker and Stebbins legacy. Mol Ecol 24, 1927–1941.
Barrick, J.E., and Lenski, R.E. (2013). Genome dynamics during experimental evolution. Nat Rev Genet 14, 827–839.
Becker, C., Hagmann, J., Müller, J., Koenig, D., Stegle, O., Borgwardt, K., and Weigel, D. (2011). Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480, 245–249.
Bentsink, L., Hanson, J., Hanhart, C.J., Blankestijn-de Vries, H., Coltrane, C., Keizer, P., El-Lithy, M., Alonso-Blanco, C., de Andres, M.T., Reymond, M., et al. (2010). Natural variation for seed dormancy in Arabidopsis is regulated by additive genetic and molecular pathways. Proc Natl Acad Sci USA 107, 4264–4269.
Bock, D.G., Caseys, C., Cousens, R.D., Hahn, M.A., Heredia, S.M., Hübner, S., Turner, K.G., Whitney, K.D., and Rieseberg, L.H. (2015). What we still don’t know about invasion genetics. Mol Ecol 24, 2277–2297.
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M.A., Rambaut, A., and Drummond, A.J. (2014). BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537.
Briggs, A.W., Stenzel, U., Johnson, P.L., Green, R.E., Kelso, J., Prüfer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M., et al. (2007). Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci USA 104, 14616–14621.
Briggs, A.W., Stenzel, U., Meyer, M., Krause, J., Kircher, M., and Pääbo, S. (2010). Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res 38, e87.
Brouwer, D.J., and St Clair, D.A. (2004). Fine mapping of three quantitative trait loci for late blight resistance in tomato using near isogenic lines (NILs) and sub-NILs. Theor Appl Genet 108, 628–638.
Bustamante, C.D., Nielsen, R., Sawyer, S.A., Olsen, K.M., Purugganan, M.D., and Hartl, D.L. (2002). The cost of inbreeding in Arabidopsis. Nature 416, 531–534.
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., et al. (2011). Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet 43, 956–963.
Charlesworth, B., and Charlesworth, D. (2010). Elements of Evolutionary Genetics (Greenwood Village, CO: Roberts and Company).
Charlesworth, D., and Wright, S.I. (2001). Breeding systems and genome evolution. Curr Opin Genet Dev 11, 685–690.
Choi, K., Zhao, X., Kelly, K.A., Venn, O., Higgins, J.D., Yelina, N.E., Hardcastle, T.J., Ziolkowski, P.A., Copenhaver, G.P., Franklin, F.C., et al. (2013). Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters. Nat Genet 45, 1327–1336.
Crawford, P.H.C., and Hoagland, B.W. (2009). Can herbarium records be used to map alien species invasion and native species expansion over the past 100 years? J Biogeogr 36, 651–661.
Drummond, A.J., Suchard, M.A., Xie, D., and Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29, 1969–1973.
Fletcher, R.S., Mullen, J.L., Yoder, S., Bauerle, W.L., Reuning, G., Sen, S., Meyer, E., Juenger, T.E., and McKay, J.K. (2013). Development of a next-generation NIL library in Arabidopsis thaliana for dissecting complex traits. BMC Genomics 14, 655.
Fu, Q., Li, H., Moorjani, P., Jay, F., Slepchenko, S.M., Bondarev, A.A., Johnson, P.L.F., Aximu-Petri, A., Prüfer, K., de Filippo, C., et al. (2014). Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449.
Gauze, G.F. (1934). The Struggle for Existence (Baltimore: Williams & Wilkins).
Gill, M.S., Lemey, P., Faria, N.R., Rambaut, A., Shapiro, B., and Suchard, M.A. (2012). Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol 30, 713–724.
Goldewijk, K.K., and Ramankutty, N. (2004). Land cover change over the last three centuries due to human activities: The availability of new global data sets. Geojournal 61, 335–334.
Green, R.E., and Shapiro, B. (2013). Human evolution: turning back the clock. Curr Biol 23, R286–288.
Hagmann, J., Becker, C., Müller, J., Stegle, O., Meyer, R.C., Wang, G., Schneeberger, K., Fitz, J., Altmann, T., Bergelson, J., et al. (2015). Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage. PLoS Genet 11, e1004920.
Halligan, D.L., and Keightley, P.D. (2009). Spontaneous mutation accumulation studies in evolutionary genetics. Annu Rev Ecol Evol Syst 40, 151–172.
Hancock, A.M., Brachi, B., Faure, N., Horton, M.W., Jarymowycz, L.B., Sperone, F.G., Toomajian, C., Roux, F., and Bergelson, J. (2011). Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86.
Ho, S.Y.W., Phillips, M.J., Cooper, A., and Drummond, A.J. (2005). Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol 22, 1561–1568.
Hofreiter, M., Jaenicke, V., Serre, D., von Haeseler, A., and Pääbo, S. (2001). DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res 29, 4793–4799.
Hu, T.T., Pattyn, P., Bakker, E.G., Cao, J., Cheng, J.F., Clark, R.M., Fahlgren, N., Fawcett, J.A., Grimwood, J., Gundlach, H., et al. (2011). The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43, 476–481.
Hudson, R.R., and Kaplan, N.L. (1985). Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164.
Huson, D.H., and Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23, 254–267.
Jiang, C., Mithani, A., Belfield, E.J., Mott, R., Hurst, L.D., and Harberd, N.P. (2014). Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res 24, 1821–1829.
Jombart, T. (2008). adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405.
Keightley, P.D., and Eyre-Walker, A. (2007). Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177, 2251–2261.
Keurentjes, J.J., Bentsink, L., Alonso-Blanco, C., Hanhart, C.J., Blankestijn-De Vries, H., Effgen, S., Vreugdenhil, D., and Koornneef, M. (2007). Development of a near-isogenic line population of Arabidopsis thaliana and comparison of mapping power with a recombinant inbred line population. Genetics 175, 891–905.
Kim, S., Choi, H.I., Ryu, H.J., Park, J.H., Kim, M.D., and Kim, S.Y. (2004). ARIA, an Arabidopsis arm repeat protein interacting with a transcriptional regulator of abscisic acid-responsive gene expression, is a novel abscisic acid signaling component. Plant Physiol 136, 3639–3648.
Kimura, M. (1967). On the evolutionary adjustment of spontaneous mutation rates. Genet Res 9, 23.
Kircher, M. (2012). Analysis of high-throughput ancient DNA sequencing data. Methods Mol Biol 840, 197–228.
Kong, A., Frigge, M.L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S.A., Sigurdsson, A., Jonasdottir, A., Jonasdottir, A., et al. (2012). Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475.
Krause, J., Briggs, A.W., Kircher, M., Maricic, T., Zwyns, N., Derevianko, A., and Pääbo, S. (2010). A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr Biol 20, 231–236.
Lankau, R.A., Nuzzo, V., Spyreas, G., and Davis, A.S. (2009). Evolutionary limits ameliorate the negative impact of an invasive plant. Proc Natl Acad Sci USA 107, 1253.
Lee, C.E. (2002). Evolutionary genetics of invasive species. Trends Ecol Evol 17, 386–391.
Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.
Lipson, M., Loh, P.R., Sankararaman, S., Patterson, N., Berger, B., and Reich, D. (2015). Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes. PLoS Genet 11, e1005550.
Lundemo, S., Falahati-Anbaran, M., and Stenoien, H.K. (2009). Seed banks cause elevated generation times and effective population sizes of Arabidopsis thaliana in northern Europe. Mol Ecol 18, 2798–2811.
Markakis, M.N., Boron, A.K., Van Loock, B., Saini, K., Cirera, S., Verbelen, J.P., and Vissenberg, K. (2013). Characterization of a small auxin-up RNA (SAUR)-like gene involved in Arabidopsis thaliana development. PLoS One 8, e82596.
Maron, J.L., Vila, M., Bommarco, R., Elmendorf, S., and Beardsley, P. (2004). Rapid evolution of an invasive plant. Ecol Monogr 74, 261–280.
Martin, M.D., Cappellini, E., Samaniego, J.A., Zepeda, M.L., Campos, P.F., Seguin-Orlando, A., Wales, N., Orlando, L., Ho, S.Y., Dietrich, F.S., et al. (2013). Reconstructing genome evolution in historic samples of the Irish potato famine pathogen. Nat Commun 4, 2172.
Meyer, M., and Kircher, M. (2010). Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010, pdb.prot5448.
Montesinos, A., Tonsor, S.J., Alonso-Blanco, C., and Pico, F.X. (2009). Demographic and genetic patterns of variation among populations of Arabidopsis thaliana from contrasting native environments. PLoS One 4, e7213.
Nachman, M.W., and Crowell, S.L. (2000). Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297–304.
Ness, R.W., Morgan, A.D., Vasanthakrishnan, R.B., Colegrave, N., and Keightley, P.D. (2015). Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii. Genome Res.
Ness, R.W., Wright, S.I., and Barrett, S.C. (2010). Mating-system variation, demographic history and patterns of nucleotide diversity in the Tristylous plant Eichhornia paniculata. Genetics 184, 381–392.
Nordborg, M., Hu, T.T., Ishino, Y., Jhaveri, J., Toomajian, C., Zheng, H., Bakker, E., Calabrese, P., Gladstone, J., Goyal, R., et al. (2005). The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol 3, e196.
Orlando, L., Gilbert, M.T., and Willerslev, E. (2015). Reconstructing ancient genomes and epigenomes. Nat Rev Genet 16, 395–408.
Ossowski, S., Schneeberger, K., Lucas-Lledo, J.L., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94.
Platt, A., Horton, M., Huang, Y.S., Li, Y., Anastasio, A.E., Mulyati, N.W., Agren, J., Bossdorf, O., Byers, D., Donohue, K., et al. (2010). The scale of population structure in Arabidopsis thaliana. PLoS Genet 6, e1000843.
Prüfer, K., and Meyer, M. (2015). Comment on “Late Pleistocene human skeleton and mtDNA link Paleoamericans and modern Native Americans”. Science 347, 835.
Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., et al. (2010). Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639.
Sax, D.F., Stachowicz, J.J., Brown, J.K., Bruno, J.F., Dawson, M.N., Gaines, S.D., Grosberg, R.K., Hastings, A., Holt, R.D., Mayfield, M.M., et al. (2007). Ecological and evolutionary insights from species invasions. Trends Ecol Evol 22, 465–471.
Scally, A., and Durbin, R. (2012). Revising the human mutation rate: implications for understanding human evolution. Nat Rev Genet 13, 745–753.
Schneeberger, K., Hagmann, J., Ossowski, S., Warthmann, N., Gesing, S., Kohlbacher, O., and Weigel, D. (2009). Simultaneous alignment of short reads against multiple genomes. Genome Biol 10, R98.
Ségurel, L., Wyman, M.J., and Przeworski, M. (2014). Determinants of mutation rate variation in the human germline. Annu Rev Genomics Hum Genet 15, 47–70.
Shapiro, B., and Hofreiter, M. (2014). A paleogenomic perspective on evolution and gene function: new insights from ancient DNA. Science 343, 1236573.
↵
Shaw, R.G., Byers, D.L., and Darmo, E. (2000). Spontaneous mutational effects on reproductive traits of Arabidopsis thaliana. Genetics 155, 369–378.
OpenUrl Abstract/FREE Full Text
↵
Sniegowski, P.D., Gerrish, P.J., and Lenski, R.E. (1997). Evolution of high mutation rates in experimental populations of E. coli. Nature 387, 703–705.
OpenUrl CrossRef PubMed Web of Science
↵
Spartz, A.K., Lee, S.H., Wenger, J.P., Gonzalez, N., Itoh, H., Inze, D., Peer, W.A., Murphy, A.S., Overvoorde, P.J., and Gray, W.M. (2012). The SAUR19 subfamily of SMALL AUXIN UP RNA genes promote cell expansion. Plant J 10, 978–990.
OpenUrl
↵
Staats, M., Erkens, R.H.J., van de Vossenberg, B., Wieringa, J.J., Kraaijeveld, K., Stielow, B., Geml, J., Richardson, J.E., and Bakker, F.T. (2013). Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLoS ONE 8, e69189.
OpenUrl CrossRef PubMed
↵
Stec, A.O., Bhaskar, P.B., Bolon, Y.T., Nolan, R., Shoemaker, R.C., Vance, C.P., and Stupar, R.M. (2013). Genomic heterogeneity and structural variation in soybean near isogenic lines. Front Plant Sci 4, 104.
OpenUrl PubMed
↵
Subramanian, S., and Kumar, S. (2003). Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res 13, 838–844.
OpenUrl Abstract/FREE Full Text
↵
Subramanian, S., and Lambert, D.M. (2011). Time dependency of molecular evolutionary rates? Yes and no. Genome Biology and Evolution 3, 1324–1328.
OpenUrl CrossRef PubMed
↵
Swarup, K., Alonso-Blanco, C., Lynn, J.R., Michaels, S.D., Amasino, R.M., Koornneef, M., and Millar, A.J. (1999). Natural allelic variation identifies new genes in the Arabidopsis circadian system. Plant J 20, 67–77.
OpenUrl CrossRef PubMed Web of Science
↵
Szalma, S.J., Hostert, B.M., Ledeaux, J.R., Stuber, C.W., and Holland, J.B. (2007). QTL mapping with near-isogenic lines in maize. Theor Appl Genet 114, 1211–1228.
OpenUrl CrossRef PubMed Web of Science
↵
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595.
OpenUrl Abstract/FREE Full Text
↵
Templeton, A.R., Sing, C.F., Kessling, A., and Humphries, S. (1988). A Cladistic-Analysis of Phenotype Associations with Haplotypes Inferred from Restriction Endonuclease Mapping .2. The Analysis of Natural-Populations. Genetics 120, 1145–1154.
OpenUrl
↵
van Kleunen, M., Dawson, W., Essl, F., Pergl, J., Winter, M., Weber, E., Kreft, H., Weigelt, P., Kartesz, J., Nishino, M., et al. (2015). Global exchange and accumulation of non-native plants. Nature 525, 100–103.
OpenUrl CrossRef PubMed
↵
Vandepitte, K., de Meyer, T., Helsen, K., van Acker, K., Roldán-Ruiz, J., Mergeay, J., and Honnay, O. (2014). Rapid genetic adaptation precedes the spread of an exotic plant species. Mol Ecol 23, 2157–2164.
Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theor Pop Biol 7, 256–276.
Weigel, D. (2012). Natural variation in Arabidopsis: from molecular genetics to ecological genomics. Plant Physiol 158, 2–22.
Weiss, C.L., Dannemann, M., Prufer, K., and Burbano, H.A. (2015). Contesting the presence of wheat in the British Isles 8,000 years ago by assessing ancient DNA authenticity from low-coverage data. eLife 4.
Weiß, C.L., Schuenemann, V.J., Devos, J., Shirsekar, G., Reiter, E., Gould, B.A., Stinchcombe, J.R., Krause, J., and Burbano, H.A. (2015). Temporal patterns of damage and decay kinetics of DNA retrieved from plant herbarium specimens. bioRxiv http://dx.doi.org/10.1101/023135.
Wright, S.I., Ness, R.W., Foxe, J.P., and Barrett, S.C.H. (2008). Genomic consequences of outcrossing and selfing in plants. Int J Plant Sci 169, 105–118.
Xie, X., Song, M.H., Jin, F., Ahn, S.N., Suh, J.P., Hwang, H.G., and McCouch, S.R. (2006). Fine mapping of a grain weight quantitative trait locus on rice chromosome 8 using near-isogenic lines derived from a cross between Oryza sativa and Oryza rufipogon. Theor Appl Genet 113, 885–894.
Yoshida, K., Schuenemann, V.J., Cano, L.M., Pais, M., Mishra, B., Sharma, R., Lanz, C., Martin, F.N., Kamoun, S., Krause, J., et al. (2013). The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine. eLife 2, e00731.
Zhang, H., Tang, K., Qian, W., Duan, C.G., Wang, B., Zhang, H., Wang, P., Zhu, X., Lang, Z., Yang, Y., et al. (2014). An Rrp6-like protein positively regulates noncoding RNA levels and DNA methylation in Arabidopsis. Mol Cell 54, 418–430.

SUPPLEMENTAL REFERENCES

Al Mamun, A.A., Lombardo, M.J., Shee, C., Lisewski, A.M., Gonzalez, C., Lin, D., Nehring, R.B., Saint-Ruf, C., Gibson, J.L., Frisch, R.L., et al. (2012). Identity and function of a large gene network underlying mutagenic repair of DNA breaks. Science 338, 1344–1348.
Aulchenko, Y.S., Ripke, S., Isaacs, A., and van Duijn, C.M. (2007). GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296.
Becker, C., Hagmann, J., Müller, J., Koenig, D., Stegle, O., Borgwardt, K., and Weigel, D. (2011). Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480, 245–249.
Bielejec, F., Rambaut, A., Suchard, M.A., and Lemey, P. (2011). SPREAD: spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics 27, 2910–2912.
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M.A., Rambaut, A., and Drummond, A.J. (2014). BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537.
Briggs, A.W., Stenzel, U., Meyer, M., Krause, J., Kircher, M., and Pääbo, S. (2010). Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res 38, e87.
Browning, B.L., and Browning, S.R. (2009). A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 84, 210–223.
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., et al. (2011). Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet 43, 956–963.
Charlesworth, B., and Charlesworth, D. (2010). Elements of Evolutionary Genetics (Greenwood Village, CO: Roberts and Company).
Drummond, A., Pybus, O.G., and Rambaut, A. (2003). Inference of viral evolutionary rates from molecular sequences. Adv Parasitol 54, 331–358.
Drummond, A.J., Suchard, M.A., Xie, D., and Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29, 1969–1973.
Ferreira, M.A.R., and Suchard, M.A. (2008). Bayesian analysis of elapsed times in continuous-time Markov chains. Can J Stat 36, 355–368.
Gill, M.S., Lemey, P., Faria, N.R., Rambaut, A., Shapiro, B., and Suchard, M.A. (2012). Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol 30, 713–724.
Grantham, R. (1974). Amino acid difference formula to help explain protein evolution. Science 185, 862–864.
Hagmann, J., Becker, C., Müller, J., Stegle, O., Meyer, R.C., Wang, G., Schneeberger, K., Fitz, J., Altmann, T., Bergelson, J., et al. (2015). Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage. PLoS Genet 11, e1004920.
Hancock, A.M., Brachi, B., Faure, N., Horton, M.W., Jarymowycz, L.B., Sperone, F.G., Toomajian, C., Roux, F., and Bergelson, J. (2011). Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86.
Handley, L.J.L., Manica, A., Goudet, J., and Balloux, F. (2007). Going the distance: human population genetics in a clinal world. Trends Genet 23, 432–439.
Hu, T.T., Pattyn, P., Bakker, E.G., Cao, J., Cheng, J.F., Clark, R.M., Fahlgren, N., Fawcett, J.A., Grimwood, J., Gundlach, H., et al. (2011). The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43, 476–481.
Hudson, R.R., and Kaplan, N.L. (1985). Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164.
Huson, D.H., and Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23, 254–267.
Jiang, C., Mithani, A., Belfield, E.J., Mott, R., Hurst, L.D., and Harberd, N.P. (2014). Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res 24, 1821–1829.
Jombart, T. (2008). adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405.
Lemey, P., Rambaut, A., Welch, J.J., and Suchard, M.A. (2010). Phylogeography takes a relaxed random walk in continuous space and time. Mol Biol Evol 27, 1877–1885.
Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.
Mirouze, M., and Paszkowski, J. (2011). Epigenetic contribution to stress adaptation in plants. Curr Opin Plant Biol 14, 267–274.
Nicotra, A.B., Atkin, O.K., Bonser, S.P., Davidson, A.M., Finnegan, E.J., Mathesius, U., Poot, P., Purugganan, M.D., Richards, C.L., Valladares, F., et al. (2010). Plant phenotypic plasticity in a changing climate. Trends Plant Sci 15, 684–692.
Ossowski, S., Schneeberger, K., Clark, R.M., Lanz, C., Warthmann, N., and Weigel, D. (2008). Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18, 2024–2033.
Ossowski, S., Schneeberger, K., Lucas-Lledo, J.I., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94.
Platt, A., Horton, M., Huang, Y.S., Li, Y., Anastasio, A.E., Mulyati, N.W., Agren, J., Bossdorf, O., Byers, D., Donohue, K., et al. (2010). The scale of population structure in Arabidopsis thaliana. PLoS Genet 6, e1000843.
Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nat Genet.
Schneeberger, K., Hagmann, J., Ossowski, S., Warthmann, N., Gesing, S., Kohlbacher, O., and Weigel, D. (2009). Simultaneous alignment of short reads against multiple genomes. Genome Biol 10, R98.
Shaw, R.G., Byers, D.L., and Darmo, E. (2000). Spontaneous mutational effects on reproductive traits of Arabidopsis thaliana. Genetics 155, 369–378.
Slovak, R., Goschl, C., Su, X., Shimotani, K., Shiina, T., and Busch, W. (2014). A scalable open-source pipeline for large-scale root phenotyping of Arabidopsis. Plant Cell 26, 2390–2403.
Weigel, D. (2012). Natural variation in Arabidopsis: from molecular genetics to ecological genomics. Plant Physiol 158, 2–22.
Wilson, I., and Barton, N.H. (1995). Genealogies and geography. Philos Trans R Soc Lond B Biol Sci 349, 49–59.
Yoshida, K., Schuenemann, V.J., Cano, L.M., Pais, M., Mishra, B., Sharma, R., Lanz, C., Martin, F.N., Kamoun, S., Krause, J., et al. (2013). The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine. eLife 2, e00731.

Posted April 25, 2016.

Subject Area: Genomics

Subject Areas

All Articles

Animal Behavior and Cognition (5214)
Biochemistry (11745)
Bioengineering (8751)
Bioinformatics (29195)
Biophysics (14971)
Cancer Biology (12095)
Cell Biology (17411)
Clinical Trials (138)
Developmental Biology (9421)
Ecology (14178)
Epidemiology (2067)
Evolutionary Biology (18306)
Genetics (12245)
Genomics (16801)
Immunology (11867)
Microbiology (28083)
Molecular Biology (11592)
Neuroscience (60965)
Paleontology (451)
Pathology (1870)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2885)
Systems Biology (7339)
Zoology (1651)

[1] ↵
Arunkumar, R., Ness, R.W., Wright, S.I., and Barrett, S.C. (2015). The evolution of selfing is accompanied by reduced efficacy of selection and purging of deleterious mutations. Genetics 199, 817–829.
OpenUrl Abstract/FREE Full Text

[2] ↵
Aulchenko, Y.S., Ripke, S., Isaacs, A., and van Duijn, CM. (2007). GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296.
OpenUrl CrossRef PubMed Web of Science

[3] ↵
H.G. Baker, and
G.L Stebbins, eds
Baker, H.G. (1965). Characteristics and modes of origin of weeds. In The Genetics of Colonizing Species, H.G. Baker, and G.L Stebbins, eds. (New York: Academic Press), pp. 147–168.

[4] H.G. Baker, and

[5] G.L Stebbins, eds

[6] ↵
Barrett, R.D.H., and Schluter, D. (2008). Adaptation from standing genetic variation. Trends Ecol Evol 23, 38–44.
OpenUrl CrossRef PubMed Web of Science

[7] ↵
Barrett, S.C.H. (2014). Foundations of invasion genetics: the Baker and Stebbins legacy. Mol Ecol 24, 1927–1941.
OpenUrl

[8] ↵
Barrick, J.E., and Lenski, R.E. (2013). Genome dynamics during experimental evolution. Nat Rev Genet 14, 827–839.
OpenUrl CrossRef PubMed

[9] ↵
Becker, C., Hagmann, J., Müller, J., Koenig, D., Stegle, O., Borgwardt, K., and Weigel, D. (2011). Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480, 245–249.
OpenUrl CrossRef PubMed Web of Science

[10] ↵
Bentsink, L., Hanson, J., Hanhart, C.J., Blankestijn-de Vries, H., Coltrane, C., Keizer, P., El-Lithy, M., Alonso-Blanco, C., de Andres, M.T., Reymond, M., et al. (2010). Natural variation for seed dormancy in Arabidopsis is regulated by additive genetic and molecular pathways. Proc Natl Acad Sci USA 107, 4264–4269.

[11] ↵
Bock, D.G., Caseys, C., Cousens, R.D., Hahn, M.A., Heredia, S.M., Hübner, S., Turner, K.G., Whitney, K.D., and Rieseberg, LH. (2015). What we still don’t know about invasion genetics. Mol Ecol 24, 2277–2297.
OpenUrl CrossRef

[12] ↵
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M.a., Rambaut, A., and Drummond, A.J. (2014). BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS CompBiol 10, e1003537.
OpenUrl

[13] ↵
Briggs, A.W., Stenzel, U., Johnson, P.L., Green, R.E., Kelso, J., Prüfer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M., et al. (2007). Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci USA 104, 14616–14621.

[14] ↵
Briggs, A.W., Stenzel, U., Meyer, M., Krause, J., Kircher, M., and Pääbo, S. (2010). Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res 38, e87.
OpenUrl CrossRef PubMed

[15] ↵
Brouwer, D.J., and St Clair, D.A. (2004). Fine mapping of three quantitative trait loci for late blight resistance in tomato using near isogenic lines (NILs) and sub-NILs. Theor Appl Genet 108, 628–638.
OpenUrl CrossRef PubMed Web of Science

[16] ↵
Bustamante, C.D., Nielsen, R., Sawyer, S.A., Olsen, K.M., Purugganan, M.D., and Hartl, D.L. (2002). The cost of inbreeding in Arabidopsis. Nature 416, 531–534.
OpenUrl CrossRef PubMed Web of Science

[17] ↵
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., et al. (2011). Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet 43, 956–963.
OpenUrl CrossRef PubMed

[18] ↵
Charlesworth, B., and Charlesworth, D. (2010). Elements of Evolutionary Genetics (Roberts and Company: Greenwood Village, CO, 2010).

[19] ↵
Charlesworth, D., and Wright, S.I. (2001). Breeding systems and genome evolution. Curr Opin Genet Dev 11, 685–690.
OpenUrl CrossRef PubMed Web of Science

[20] ↵
Choi, K., Zhao, X., Kelly, K.A., Venn, O., Higgins, J.D., Yelina, N.E., Hardcastle, T.J., Ziolkowski, P.A., Copenhaver, G.P., Franklin, F.C., et al. (2013). Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters. Nat Genet 45, 1327–1336.
OpenUrl CrossRef PubMed

[21] ↵
Crawford, P.H.C., and Hoagland, B.W. (2009). Can herbarium records be used to map alien species invasion and native species expansion over the past 100 years? J Biogeography 36, 651–661.
OpenUrl

[22] ↵
Drummond, A.J., Suchard, M.A., Xie, D., and Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29, 1969–1973.
OpenUrl CrossRef PubMed Web of Science

[23] ↵
Fletcher, R.S., Mullen, J.L., Yoder, S., Bauerle, W.L., Reuning, G., Sen, S., Meyer, E., Juenger, T.E., and McKay, J.K. (2013). Development of a next-generation NIL library in Arabidopsis thaliana for dissecting complex traits. BMC Genomics 14, 655.
OpenUrl CrossRef PubMed

[24] ↵
Fu, Q., Li, H., Moorjani, P., Jay, F., Slepchenko, S.M., Bondarev, A.A., Johnson, P.L.F., Aximu-Petri, A., Prüfer, K., de Filippo, C., et al. (2014). Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449.
OpenUrl CrossRef PubMed Web of Science

[25] ↵
Gauze, G.F. (1934). The Struggle for Existence (Baltimore: Williams & Wilkins).

[26] ↵
Gill, M.S., Lemey, P., Faria, N.R., Rambaut, A., Shapiro, B., and Suchard, M.a. (2012). Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol 30, 713–724.
OpenUrl PubMed Web of Science

[27] ↵
Goldewijk, K.K., and Ramankutty, N. (2004). Land cover change over the last three centuries due to human activities: The availability of new global data sets. Geojournal 61, 335–334.
OpenUrl CrossRef

[28] ↵
Green, R.E., and Shapiro, B. (2013). Human evolution: turning back the clock. Curr Biol 23, R286–288.
OpenUrl CrossRef PubMed

[29] ↵
Hagmann, J., Becker, C., Müller, J., Stegle, O., Meyer, R.C., Wang, G., Schneeberger, K., Fitz, J., Altmann, T., Bergelson, J., et al. (2015). Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage. PLoS Genet 11, e1004920.
OpenUrl CrossRef PubMed

[30] ↵
Halligan, D.L., and Keightley, P.D. (2009). Spontaneous mutation accumulation studies in evolutionary genetics. Annu Rev Ecol Evol S 40, 151–172.
OpenUrl

[31] ↵
Hancock, AM., Brachi, B., Faure, N., Horton, M.W., Jarymowycz, LB., Sperone, F.G., Toomajian, C., Roux, F., and Bergelson, J. (2011). Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86.
OpenUrl Abstract/FREE Full Text

[32] ↵
Ho, S.Y.W., Phillips, M.J., Cooper, A., and Drummond, A.J. (2005). Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol 22, 1561–1568.
OpenUrl CrossRef PubMed Web of Science

[33] ↵
Hofreiter, M., Jaenicke, V., Serre, D., Haeseler Av, A., and Pääbo, S. (2001). DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res 29, 4793–4799.
OpenUrl CrossRef PubMed Web of Science

[34] ↵
Hu, T.T., Pattyn, P., Bakker, E.G., Cao, J., Cheng, J.F., Clark, R.M., Fahlgren, N., Fawcett, J.A., Grimwood, J., Gundlach, H., et al. (2011). The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43, 476–481.
OpenUrl CrossRef PubMed Web of Science

[35] ↵
Hudson, R.R., and Kaplan, N.L. (1985). Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164.
OpenUrl Abstract/FREE Full Text

[36] ↵
Huson, D.H., and Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23, 254–267.
OpenUrl CrossRef PubMed Web of Science

[37] ↵
Jiang, C., Mithani, A., Belfield, E.J., Mott, R., Hurst, L.D., and Harberd, N.P. (2014). Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res 24, 1821–1829.
OpenUrl Abstract/FREE Full Text

[38] ↵
Jombart, T. (2008). adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405.
OpenUrl CrossRef PubMed Web of Science

[39] ↵
Keightley, P.D., and Eyre-Walker, A. (2007). Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177, 2251–2261.
OpenUrl Abstract/FREE Full Text

[40] ↵
Keurentjes, J.J., Bentsink, L., Alonso-Blanco, C., Hanhart, C.J., Blankestijn-De Vries, H., Effgen, S., Vreugdenhil, D., and Koornneef, M. (2007). Development of a near-isogenic line population of Arabidopsis thaliana and comparison of mapping power with a recombinant inbred line population. Genetics 175, 891–905.
OpenUrl Abstract/FREE Full Text

[41] ↵
Kim, S., Choi, H.I., Ryu, H.J., Park, J.H., Kim, M.D., and Kim, S.Y. (2004). ARIA, an Arabidopsis arm repeat protein interacting with a transcriptional regulator of abscisic acid-responsive gene expression, is a novel abscisic acid signaling component. Plant Physiol 136, 3639–3648.
OpenUrl Abstract/FREE Full Text

[42] ↵
Kimura, M. (1967). On the evolutionary adjustment of spontaneous mutation rates. Genet Res 9, 23.
OpenUrl CrossRef Web of Science

[43] ↵
Kircher, M. (2012). Analysis of high-throughput ancient DNA sequencing data. Methods Mol Biol 840, 197–228.
OpenUrl CrossRef PubMed Web of Science

[44] ↵
Kong, A., Frigge, M.L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S.a., Sigurdsson, A., Jonasdottir, A., Jonasdottir, A., et al. (2012). Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475.
OpenUrl CrossRef PubMed Web of Science

[45] ↵
Krause, J., Briggs, A.W., Kircher, M., Maricic, T., Zwyns, N., Derevianko, A., and Pääbo, S. (2010). A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr Biol 20, 231–236.
OpenUrl

[46] ↵
Lankaua, R.A., Nuzzo, V., Spyreasa, G., and Davisc, A.S. (2009). Evolutionary limits ameliorate the negative impact of an invasive plant. Proc Natl Acad Sci USA 107, 1253.

[47] ↵
Lee, C.E. (2002). Evolutionary genetics of invasive species. Trends Ecol Evol 17, 386–391.
OpenUrl CrossRef Web of Science

[48] ↵
Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.
OpenUrl CrossRef PubMed Web of Science

[49] ↵
Lipson, M., Loh, P.R., Sankararaman, S., Patterson, N., Berger, B., and Reich, D. (2015). Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes. PLoS Genet 11, e1005550.
OpenUrl CrossRef PubMed

[50] ↵
Lundemo, S., Falahati-Anbaran, M., and Stenoien, H.K. (2009). Seed banks cause elevated generation times and effective population sizes of Arabidopsis thaliana in northern Europe. Mol Ecol 18, 2798–2811.
OpenUrl CrossRef PubMed

[51] ↵
Markakis, M.N., Boron, A.K., Van Loock, B., Saini, K., Cirera, S., Verbelen, J.P., and Vissenberg, K. (2013). Characterization of a small auxin-up RNA (SAUR)-like gene involved in Arabidopsis thaliana development. PLoS One 8, e82596.
OpenUrl CrossRef PubMed

[52] ↵
Maron, J.L., Vila, M., Bommarco, R., Elmendorf, S., and Beardsley, P. (2004). Rapid evolution of an invasive plant. Ecol Monogr 74, 261–280.
OpenUrl CrossRef Web of Science

[53] ↵
Martin, M.D., Cappellini, E., Samaniego, J.A., Zepeda, M.L., Campos, P.F., Seguin-Orlando, A., Wales, N., Orlando, L., Ho, S.Y., Dietrich, F.S., et al. (2013). Reconstructing genome evolution in historic samples of the Irish potato famine pathogen. Nat Commun 4, 2172.
OpenUrl CrossRef PubMed

[54] ↵
Meyer, M., and Kircher, M. (2010). Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010, pdb prot5448.

[55] ↵
Montesinos, A., Tonsor, S.J., Alonso-Blanco, C., and Pico, F.X. (2009). Demographic and genetic patterns of variation among populations of Arabidopsis thaliana from contrasting native environments. PLoS One 4, e7213.
OpenUrl CrossRef PubMed

[56] ↵
Nachman, M.W., and Crowell, S.L. (2000). Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297–304.
OpenUrl Abstract/FREE Full Text

[57] ↵
Ness, R.W., Morgan, A.D., Vasanthakrishnan, R.B., Colegrave, N., and Keightley, P.D. (2015). Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii. Genome Res.

[58] ↵
Ness, R.W., Wright, S.I., and Barrett, S.C. (2010). Mating-system variation, demographic history and patterns of nucleotide diversity in the Tristylous plant Eichhornia paniculata. Genetics 184, 381–392.
OpenUrl Abstract/FREE Full Text

[59] ↵
Nordborg, M., Hu, T.T., Ishino, Y., Jhaveri, J., Toomajian, C., Zheng, H., Bakker, E., Calabrese, P., Gladstone, J., Goyal, R., et al. (2005). The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol 3, e196.
OpenUrl CrossRef PubMed

[60] ↵
Orlando, L., Gilbert, M.T., and Willerslev, E. (2015). Reconstructing ancient genomes and epigenomes. Nat Rev Genet 16, 395–408.
OpenUrl CrossRef PubMed

[61] ↵
Ossowski, S., Schneeberger, K., Lucas-Lledo, J.L., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94.
OpenUrl Abstract/FREE Full Text

[62] ↵
Platt, A., Horton, M., Huang, Y.S., Li, Y., Anastasio, A.E., Mulyati, N.W., Agren, J., Bossdorf, O., Byers, D., Donohue, K., et al. (2010). The scale of population structure in Arabidopsis thaliana. PLoS Genet 6, e1000843.
OpenUrl CrossRef PubMed

[63] ↵
Prüfer, K., and Meyer, M. (2015). Comment on “Late Pleistocene human skeleton and mtDNA link Paleoamericans and modern Native Americans”. Science 347, 835.
OpenUrl Abstract/FREE Full Text

[64] ↵
Roach, J.C., Glusman, G., Smit, A.F., Huff, CD., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., et al. (2010). Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639.
OpenUrl Abstract/FREE Full Text

[65] ↵
Sax, D.F., Stachowicz, J.J., Brown, J.K., Bruno, J.F., Dawson, M.N., Gaines, S.D., Grosberg, R.K., Hastings, A., Holt, R.D., Mayfield, M.M., et al. (2007). Ecological and evolutionary insights from species invasions. Trends EcolEvol 22, 465–471.
OpenUrl CrossRef PubMed Web of Science

[66] ↵
Scally, A., and Durbin, R. (2012). Revising the human mutation rate: implications for understanding human evolution. Nat Rev Genet 13, 745–753.
OpenUrl CrossRef PubMed

[67] ↵
Schneeberger, K., Hagmann, J., Ossowski, S., Warthmann, N., Gesing, S., Kohlbacher, O., and Weigel, D. (2009). Simultaneous alignment of short reads against multiple genomes. Genome Biol 10, R98.
OpenUrl CrossRef PubMed

[68] ↵
Ségurel, L., Wyman, M.J., and Przeworski, M. (2014). Determinants of mutation rate variation in the human germline. Annu Rev Genomics Hum Genet 15, 47–70.
OpenUrl CrossRef PubMed

[69] ↵
Shapiroa, B., and Hofreiter, M. (2014). A paleogenomic perspective on evolution and gene function: new insights from ancient DNA. Science 343, 1236573.
OpenUrl Abstract/FREE Full Text

[70] ↵
Shaw, R.G., Byers, D.L., and Darmo, E. (2000). Spontaneous mutational effects on reproductive traits of Arabidopsis thaliana. Genetics 155, 369–378.
OpenUrl Abstract/FREE Full Text

[71] ↵
Sniegowski, P.D., Gerrish, P.J., and Lenski, R.E. (1997). Evolution of high mutation rates in experimental populations of E. coli. Nature 387, 703–705.
OpenUrl CrossRef PubMed Web of Science

[72] ↵
Spartz, A.K., Lee, S.H., Wenger, J.P., Gonzalez, N., Itoh, H., Inze, D., Peer, W.A., Murphy, A.S., Overvoorde, P.J., and Gray, W.M. (2012). The SAUR19 subfamily of SMALL AUXIN UP RNA genes promote cell expansion. Plant J 10, 978–990.
OpenUrl

[73] ↵
Staats, M., Erkens, R.H.J., van de Vossenberg, B., Wieringa, J.J., Kraaijeveld, K., Stielow, B., Geml, J., Richardson, J.E., and Bakker, F.T. (2013). Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLoS ONE 8, e69189.
OpenUrl CrossRef PubMed

[74] ↵
Stec, A.O., Bhaskar, P.B., Bolon, Y.T., Nolan, R., Shoemaker, R.C., Vance, C.P., and Stupar, R.M. (2013). Genomic heterogeneity and structural variation in soybean near isogenic lines. Front Plant Sci 4, 104.
OpenUrl PubMed

[75] ↵
Subramanian, S., and Kumar, S. (2003). Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res 13, 838–844.
OpenUrl Abstract/FREE Full Text

[76] ↵
Subramanian, S., and Lambert, D.M. (2011). Time dependency of molecular evolutionary rates? Yes and no. Genome Biology and Evolution 3, 1324–1328.
OpenUrl CrossRef PubMed

[77] ↵
Swarup, K., Alonso-Blanco, C., Lynn, J.R., Michaels, S.D., Amasino, R.M., Koornneef, M., and Millar, A.J. (1999). Natural allelic variation identifies new genes in the Arabidopsis circadian system. Plant J 20, 67–77.
OpenUrl CrossRef PubMed Web of Science

[78] ↵
Szalma, S.J., Hostert, B.M., Ledeaux, J.R., Stuber, C.W., and Holland, J.B. (2007). QTL mapping with near-isogenic lines in maize. Theor Appl Genet 114, 1211–1228.
OpenUrl CrossRef PubMed Web of Science

[79] ↵
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595.
OpenUrl Abstract/FREE Full Text

[80] ↵
Templeton, A.R., Sing, C.F., Kessling, A., and Humphries, S. (1988). A Cladistic-Analysis of Phenotype Associations with Haplotypes Inferred from Restriction Endonuclease Mapping .2. The Analysis of Natural-Populations. Genetics 120, 1145–1154.
OpenUrl

[81] ↵
van Kleunen, M., Dawson, W., Essl, F., Pergl, J., Winter, M., Weber, E., Kreft, H., Weigelt, P., Kartesz, J., Nishino, M., et al. (2015). Global exchange and accumulation of non-native plants. Nature 525, 100–103.
OpenUrl CrossRef PubMed

[82] ↵
Vandepitte, K., de Meyer, T., Helsen, K., van Acker, K., Roldán-Ruiz, J., Mergeay, J., and Honnay, O. (2014). Rapid genetic adaptation precedes the spread of an exotic plant species. Mol Ecol 23, 2157–2164.
OpenUrl

[83] ↵
Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theor Pop Biol 7, 256–276.
OpenUrl CrossRef PubMed Web of Science

[84] ↵
Weigel, D. (2012). Natural variation in Arabidopsis: from molecular genetics to ecological genomics. Plant Physiol 158, 2–22.
OpenUrl FREE Full Text

[85] ↵
Weiss, C.L., Dannemann, M., Prufer, K., and Burbano, H.A. (2015). Contesting the presence of wheat in the British Isles 8,000 years ago by assessing ancient DNA authenticity from low-coverage data. eLife 4.

[86] ↵
Weiß, C.L., Schuenemann, V.J., Devos, J., Shirsekar, G., Reiter, E., Gould, B.A., Stinchcombe, J.R., Krause, J., and Burbano, H.A. (2015). Temporal patterns of damage and decay kinetics of DNA retrieved from plant herbarium specimens. bioRxiv http://dx.doi.org/1O.11O1/023135.

[87] ↵
Wright, S.I., Ness, R.W., Foxe, J.P., and Barret, S.C.H. (2008). Genomic consequences of outcrossing and selfing in plants. Int J Plant Sci 169, 105–118.
OpenUrl CrossRef Web of Science

[88] ↵
Xie, X., Song, M.H., Jin, F., Ahn, S.N., Suh, J.P., Hwang, H.G., and McCouch, S.R. (2006). Fine mapping of a grain weight quantitative trait locus on rice chromosome 8 using near-isogenic lines derived from a cross between Oryza sativa and Oryza rufipogon. Theor Appl Genet 113, 885–894.
OpenUrl CrossRef PubMed Web of Science

[89] ↵
Yoshida, K., Schuenemann, V.J., Cano, L.M., Pais, M., Mishra, B., Sharma, R., Lanz, C., Martin, F.N., Kamoun, S., Krause, J. et al. (2013). The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine. eLife 2, e00731.
OpenUrl CrossRef PubMed

[90] ↵
Zhang, H., Tang, K., Qian, W., Duan, C.G., Wang, B., Zhang, H., Wang, P., Zhu, X., Lang, Z., Yang, Y., et al. (2014). An Rrp6-like protein positively regulates noncoding RNA levels and DNA methylation in Arabidopsis. Mol Cell 54, 418–430.
OpenUrl CrossRef PubMed Web of Science

The rate and effect of de novo mutations in natural populations of Arabidopsis thaliana

SUMMARY

INTRODUCTION

RESULTS AND DISCUSSION

Herbarium and modern HPGI genomes

Diversity and relationships within HPGI

Estimates of mutation rate and spectrum in the wild

Genome-wide inference of selection

Phenotypic effect and spatio-temporal context of de novo mutations

Population demography and migrations

CONCLUSIONS

EXPERIMENTAL PROCEDURES

Sample collection and DNA sequencing

Phylogenetic methods and genome-wide statistics

Substitution and mutation rate analyses

Inference of genome-wide selection parameters

Association analyses and dating of new mutations

Accession numbers

SUPPLEMENTAL INFORMATION

SUPPLEMENTAL INFORMATION FOR

SUPPLEMENTAL EXPERIMENTAL PROCEDURES

Sample collection

DNA extraction, library preparation and sequencing

Read processing

Identification of bona fide HPGI accessions and HPGI phylogeny

Descriptive genome-wide statistics

Substitution and mutation rate analyses

Greenhouse.grown mutation accumulation lines

Natural populations of HPGI

Demography and migration of HPGI

Analysis of the methylation status of mutated sites

Inference of genome-wide selection parameters

Association analyses and dating of newly arisen mutations

Association analyses – proof of concept examples

Phenotyping

Root phenotypes

Seed size phenotype:

Flowering time in growth chambers

Flowering time and fecundity in the field

ACKNOWLEDGEMENTS

AUTHOR CONTRIBUTIONS

REFERENCES

SUPPLEMENTAL REFERENCES

Citation Manager Formats

Subject Area