Summary
The tree of life is highly reticulate, with the history of population divergence buried amongst phylogenies deriving from introgression and lineage sorting. In this study, we test the hypothesis that there are regions of the oak (Quercus, Fagaceae) genome that are broadly informative about phylogeny and investigate global patterns of oak diversity.
We utilize fossil data and restriction-site associated DNA sequencing (RAD-seq) for 632 individuals representing ca. 250 oak species to infer a time-calibrated phylogeny of the world’s oaks. We use reversible-jump MCMC to reconstruct shifts in lineage diversification rates, accounting for among-clade sampling biases. We then map the > 20,000 RAD-seq loci back to a recently published oak genome and investigate genomic distribution of introgression and phylogenetic support across the phylogeny.
Oak lineages have diversified among geographic regions, followed by ecological divergence within regions, in the Americas and Eurasia. Roughly 60% of oak diversity traces back to four clades that experienced increases in net diversification due to climatic transitions or ecological opportunity.
The support we find for the phylogeny contrasts with high genomic heterogeneity in phylogenetic signal and introgression. Oaks are phylogenomic mosaics, and their diversity may in fact depend on the gene flow that shapes the oak genome.
Introduction
The tree of life exhibits reticulation from its base to its tips (Folk et al., 2018; Quammen, 2018). Oaks (Quercus L., Fagaceae) are no exception (Hipp, 2018): the genus is rife with case studies in localized gene flow (e.g. Hardin, 1975; Whittemore & Schaal, 1991; McVay et al., 2017a; Kim et al., 2018) and ancient introgression (Crowl et al., In review; McVay et al., 2017b; Kim et al., 2018). Oaks have in fact been held up as a paradigmatic syngameon (Hardin, 1975; Van Valen, 1976; Dodd & Afzal-Rafii, 2004; Cannon & Scher, 2017; Boecklen, 2017), a system of interbreeding species in which incomplete reproductive isolation may facilitate adaptive gene flow and species migration (Petit et al., 2003; Dodd & Afzal-Rafii, 2004). The oak genome (Plomion et al., 2018) consequently tracks numerous unique species-level phylogenetic histories that result from lineage sorting and differential rates of introgression (Anderson, 1953; Eaton et al., 2015; McVay et al., 2017b; Edelman et al., 2018). Oak genomes are mosaics of disparate phylogenetic histories (cf. Pääbo, 2003). Given the prevalence of hybridization in trees globally (Petit & Hampe, 2006; Cannon & Lerdau, 2015), understanding how these histories line up with one another, and whether there are regions of the genome that track a common history, is essential to understanding the prevalence of adaptive gene flow and the phylogenetic history of forest trees.
Restriction-site associated DNA sequencing (RAD-seq; Miller et al., 2007a,b; Lewis et al., 2007; Baird et al., 2008; Ree & Hipp, 2015) has revolutionized our understanding of oak phylogeny in the past five years (Jiang et al., In review; Hipp et al., 2014, 2018; Cavender-Bares et al., 2015; Eaton et al., 2015; Hipp, 2017; Fitz-Gibbon et al., 2017; Pham et al., 2017; Ortego et al., 2018; Deng et al., 2018; Kim et al., 2018). Its ties to the genome, however, have not been fully exploited because of the lack of an assembled genome. While earlier studies have explored the effects of gene identity on phylogenetic informativeness (Hipp et al., 2014) and genomic heterogeneity in phylogenetic vs. introgressive signals (McVay et al., 2017b,a), they have not had access to the oak genome sequence. As a consequence, we do not understand the distribution of genomic breakpoints between introgressive and divergent histories. Moreover, no studies to date have brought together a comprehensive sampling of taxa to investigate the history of diversification across the genus.
In this paper, we integrate data from the recently published Quercus robur genome (Plomion et al., 2016, 2018) with previously published RAD-seq data for 427 sequenced oak individuals across the tree of life and new RAD-seq data for an additional 205 individuals to investigate the global oak phylogenomic mosaic for approximately 60% of the world’s Quercus species. We test the hypothesis that there are regions of the genome that are uniformly informative about Quercus phylogeny, regions that make oak lineages what they are. Furthermore, using a time-calibrated one-tip-per species tree novel to this study for ca. 60% of known species, we test the hypothesis that the high diversity of oaks in Mexico and eastern China is a consequence of high diversification rates. Finally, we show that the consensus of the evolutionary histories of more than 20,000 RAD-seq loci matches our understanding of oak evolution based on morphological information from extant and fossil species in spite of broadly conflicting individual locus genealogies.
Materials and Methods
Previously published RAD-seq and new RAD-seq: sequencing and clustering
Data from previously published RAD-seq phylogenies were analyzed alongside new RAD-seq data for a total of 632 individuals (Table S1). RAD-seq data were generated as described in the previous studies. New data were from library preparations conducted at Floragenex, Inc. (Portland, OR, USA) following the methods of Baird et al. (2008) with PstI, barcoded by individual, and sequenced on an Illumina Genome Analyzer IIx at Floragenex, or an Illumina HiSeq 2500 or HiSeq 4000 at the University of Oregon Genomic Facility.
FASTQ files were demultiplexed and filtered to remove sequences with more than 5 bases of quality score < 20 and assembled into loci for phylogenetic analysis using ipyrad 0.7.23 (Eaton, 2014) at 85% sequence similarity. Consensus sequences for each individual for each locus were then clustered across individuals, retaining loci present in at least 4 individuals and possessing a maximum of 20 SNPs and 8 indels across individuals. The dataset was filtered to loci with a minimum of 15 individuals each, for a total of 58,985 loci. Data were imported into R using the RADami package (Hipp et al., 2014) for downstream analysis.
RAD-seq loci were mapped back to the latest version of the Quercus robur haploid genome (haplome 2.3; https://urgi.versailles.inra.fr/Data/Genome/Genome-data-access) (Plomion et al., 2018). The oak genome is made of 12 pseudomolecules (i.e. chromosomes) and a set of 538 unassigned scaffolds. Mapping was performed using Blast+ 2.8.1 (Camacho et al., 2009). We filtered alignments based on expect (E) values (E-value ≤ 10⁻⁵), alignment length (≥ 80% of the length of the locus) and percent identity (≥ 80%). For each locus, the best alignment was kept. All sequence data analyzed in this paper are available as FASTQ files from NCBI’s Short Read Archive (Table S1), and aligned loci and additional data and scripts for all analysis are available from https://github.com/andrew-hipp/global-oaks-2019. Analysis details are in the Supplement (Methods S1).
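The alignment filters described above can be sketched in a few lines of Python. This is an illustration only, not the script archived with the paper: the record layout (`Hit` and its field names) is hypothetical, and ranking alignments by bit score is an assumption, since the text says only that the best alignment per locus was kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST alignment of a RAD-seq locus to the genome (hypothetical layout)."""
    locus: str       # RAD-seq locus identifier
    target: str      # pseudomolecule or scaffold name
    evalue: float    # BLAST expect value
    aln_len: int     # alignment length in bases
    pident: float    # percent identity, 0-100
    bitscore: float  # used here to rank alignments (an assumption)


def filter_hits(hits, locus_lengths, max_evalue=1e-5, min_cover=0.80, min_pident=80.0):
    """Return {locus: best passing hit} under the three thresholds given in the text."""
    best = {}
    for hit in hits:
        passes = (
            hit.evalue <= max_evalue
            and hit.aln_len >= min_cover * locus_lengths[hit.locus]
            and hit.pident >= min_pident
        )
        if not passes:
            continue
        current = best.get(hit.locus)
        # Keep only the highest-scoring alignment for each locus.
        if current is None or hit.bitscore > current.bitscore:
            best[hit.locus] = hit
    return best
```

Loci with no alignment passing all three thresholds simply drop out of the returned dictionary.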
Phylogenetic analysis
Maximum likelihood phylogenetic analyses were conducted in RAxML v8.2.4 (Stamatakis, 2014) using the GTRCAT implementation of the general time reversible model of nucleotide evolution (Stamatakis, 2006), with branch support assessed using RELL bootstrapping (Minh et al., 2013). For the phylogeny including all tips (Fig. S1), analysis was unconstrained, and we used the taxonomic disparity index (TDI) of Pham et al. (2016) to identify the extent of non-monophyly by species. Topology within the white oaks of sections Ponticae, Virentes, and Quercus (hereafter in the paper “white oaks s.l.,” contrasted with “white oaks s.str.” for just section Quercus) was observed to be at odds with previous close studies (Crowl et al., In review; McVay et al., 2017b,a; Hipp et al., 2018) that have shown the topology of the white oaks s.l. to be sensitive to taxon and locus sampling. For dating, samples were pruned to one sample per named species, favoring samples with the most loci, except for species in which variable position of samples from different populations was deemed to represent cryptic diversity, in which case more than one exemplar was retained. The singletons tree was estimated in RAxML using a phylogenetic constraint (Manos, 2016; McVay et al., 2017b; Hipp et al., 2018) available in the supplemental methods and supplemental data. The remainder of the tree was unconstrained and conforms closely to previous topologies.
We utilized neighbor-net (Bryant & Moulton, 2004) to visualize overall patterns of molecular genetic diversity. Likelihood-based methods (e.g., Solís-Lemus & Ané, 2016; Solís-Lemus et al., 2017; Wen et al., 2018; Zhang et al., 2018) that we have utilized on smaller oak datasets (Crowl et al., In review; Eaton et al., 2015; Hauser et al., 2017; McVay et al., 2017b,a) proved computationally intractable for the current dataset. Consequently, we utilized a splits network inferred with SPLITSTREE v. 14.3 (Huson & Bryant, 2006) based on the maximum-likelihood (GTR+gamma) pairwise distance matrix estimated in RAxML and the same datasets utilized for the singletons tree. Full phylogenetic analysis details are in the Supplement (Methods S1).
Calibration of singletons tree
Branch lengths on the tree were inferred using penalized likelihood under both a relaxed model, where rates are uncorrelated among branches (Paradis, 2013), and a correlated rates model (which corresponds to the penalized likelihood approach of Sanderson, 2002), as implemented in the chronos function of ape v 5.1 (Paradis et al., 2004) of R v 3.4.4 (“Someone to Lean On”) (R-Development-Core-Team, 2004). Nodes were calibrated in two different ways, either using eight fossil calibrations, corresponding to the crown of the genus and seven key clades (Fig. S2a; Table 1), or more conservatively as stem ages, using a subset of five fossils (Fig. S2b; Table 1). The two calibrations (referred to as the ‘crown calibration’ and ‘stem calibration’ respectively) bracket what we consider to be plausible age ranges for the tree. A separate estimate of the best fit λ for the correlated clock model was made using cross-validation as implemented in the chronopl function of ape, and that value of λ was used for both the relaxed and correlated clocks. Comparison of ΦIC was used to identify the best fit model for each value of λ. Analysis details are in the Supplement (Methods S1).
Transitions in lineage diversification rates were estimated using the speciation-extinction model implemented in Bayesian Analysis of Macroevolutionary Mixtures (BAMM) (Rabosky, 2014); the BAMMtools R package was used for configuration and analysis of MCMC. Priors were set using the setBAMMpriors function. Analyses were run for 4E06 generations, saving every 2000 generations, with four chains per MCMC analysis. To visualize changes in standing diversity over time for the different sections, we plotted lineage through time (LTT) plots by section against δ18O levels reported in Zachos et al. (2001) as a temperature proxy. Analysis details are in the Supplement (Methods S1).
Investigating the genomic landscape of oak evolutionary history
Introgressive status of loci for two known introgression events involving the Eurasian white oaks (McVay et al., 2017b) and the western North American lobed-leaf white oaks (McVay et al., 2017a) was assessed by calculating the likelihood of phylogenies inferred for each locus under the constraint of the inferred divergence history (species tree) and the gene flow history at odds with that divergence history, as inferred in the studies cited above. These two cases are of particular interest because they are well studied, and lineage sorting has been ruled out in the above studies as an explanation of incongruence between the alternative topologies we test. Position of loci with a relative support of at least 2 log-likelihood points for one history relative to the other were mapped back to the Quercus robur genome (Plomion et al., 2018). Analysis details are in the Supplement (Methods S1).
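The per-locus decision rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the category labels are hypothetical, and the inputs are assumed to be the log-likelihoods of one locus under each of the two constrained topologies.

```python
def classify_locus(lnl_divergence, lnl_introgression, threshold=2.0):
    """Label a locus by which constrained history it favors.

    A locus counts as informative only if one history beats the other by at
    least `threshold` log-likelihood units (2.0 in the text).
    """
    diff = lnl_divergence - lnl_introgression
    if diff >= threshold:
        return "divergence"
    if diff <= -threshold:
        return "introgression"
    return "uninformative"
```

For example, `classify_locus(-1000.0, -1003.5)` favors the divergence history by 3.5 log-likelihood units, whereas a difference of 1.0 in either direction leaves the locus unclassified.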
To identify relative phylogenetic informativeness of loci, two tests were conducted based on the singletons tree. First, the ML topology was estimated in RAxML for each of 2,762 mapped, rootable loci of at least 10 individuals that resolved at least one bipartition. Overall, locus trees resolved an average of 4.48 (± 1.83 s.d.) nodes, with a maximum of 15 and a median of 4. These were compared with the total-evidence tree by quartet similarity, computed with the tqDist algorithm (Sand et al., 2014) in the Quartet package (Smith, 2019). We used as our similarity metric the number of quartets resolved the same way for both the locus tree and the whole singletons tree divided by the sum of quartets resolved the same or differently. Then, these same locus trees were mapped back to the singletons tree using phyparts (Smith et al., 2015), which identifies for all branches on a single tree how many individual locus trees support or reject that branch. We tested for genomic autocorrelation in phylogenetic signal using spline correlograms (Bjørnstad & Falck, 2001; Bjørnstad, 2008), with each chromosome tested independently. Analysis details are in the Supplement (Methods S1).
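The similarity metric defined above (quartets resolved the same way, divided by quartets resolved the same way or differently) can be illustrated with a small sketch. It does not reproduce tqDist or the Quartet package: it assumes the quartet resolutions of each tree have already been tabulated, and the helper `split` and the dictionary layout are hypothetical.

```python
import math


def split(a, b, c, d):
    """Encode the resolved quartet ab|cd as an unordered pair of unordered pairs."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def quartet_similarity(locus_quartets, reference_quartets):
    """same / (same + different), ignoring quartets unresolved in either tree.

    Each argument maps a frozenset of four taxa to its resolution (from
    `split`) or to None when the tree leaves that quartet unresolved.
    """
    same = different = 0
    for taxa, resolution in locus_quartets.items():
        reference = reference_quartets.get(taxa)
        if resolution is None or reference is None:
            continue  # not informative in both trees
        if resolution == reference:
            same += 1
        else:
            different += 1
    if same + different == 0:
        return math.nan  # the trees share no resolved quartets
    return same / (same + different)
```

Because unresolved quartets are excluded from the denominator, a poorly resolved locus tree is not penalized for the quartets it cannot resolve, only for those it resolves in conflict with the reference tree.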
Results
RAD-seq data matrix
RAD-seq library preps and sequencing yielded a mean of 1.685E06 ± 1.104E06 (s.d.) raw reads per individual; of these, > 99.8% (1.683E06 ± 1.104E06) passed quality filters. The total number of clusters per individual prior to clustering across individuals was 101,895 ± 58,810, with a mean depth of 17.2 ± 11.2 sequences per individual and cluster. Clusters with more than 10,000 sequences per individual were discarded. Mean estimated heterozygosity by individual was 0.0135 ± 0.0027, and sequencing error rate was 0.0020 ± 0.0004. After clustering, a total of 49,991 loci were present in at least 15 individuals each. Each individual in the final dataset possesses 6.48% ± 2.48% of all clustered loci. The total data matrix is 4.352 × 10⁶ aligned nucleotides in width. The singletons dataset is composed of 22,432 loci present in at least 15 individuals, making up a dataset of 1.970 × 10⁶ aligned nucleotides.
All-tips tree
The all-tips tree (Fig. S1) comprises 246 named Quercus species, of which 99 have a single sample. The remaining 147 species have an average of 3.54 ± 2.72 (s.d.) samples each. 97 of the 147 species with more than one sample cohere for all samples, and only 13 have a taxonomic disparity index (TDI, Pham et al., 2016) of 10 or more (Table S3), suggesting taxonomic problems beyond difficulties distinguishing very close relatives. All but four are Mexican species or species split between the southwestern U.S. and Mexico (see Discussion). Of the others, the largest TDI values are for Q. stellata and Q. parvula of North America, Q. hartwissiana and Q. petraea of western Eurasia, all with a complicated taxonomic history.
The topology of the all-tips tree closely matches previous analyses based on fewer taxa (McVay et al., 2017b; Hipp et al., 2018; Deng et al., 2018) for all sections except sections Quercus and Virentes. Unlike previous analyses, the all-tips topology embeds the long-branched section Virentes within section Quercus, sister to a clade comprising the SW US and Mexican clade and the Stellatae clade. This appears to be an artefact of clustering, as prior analyses of the same taxa do not reveal this topology, and unconstrained analysis of these taxa also recovers this aberrant topology. As a consequence, we consider the large-scale topology of the white oaks s.l. not to be reliable in the all-tips tree, and as this topology is well resolved in prior works (McVay et al., 2017b,a), we constrain the singletons topology as described in the methods section.
Topology and timing of the oak phylogeny
Between the correlated and relaxed models of molecular rate heterogeneity, the correlated rates model (i.e., the penalized likelihood approach of Sanderson 2002) is consistently favored using ΦIC except at λ of 0, when the models are identical (Table S4). Though dating estimates differ little from λ = 0 to λ = 10 (not shown, but reproducible using code archived for this paper), cross-validation shows lowest sensitivity of taxon-removal on dating estimates at λ = 1.
Analyses with the crown-age calibrations (Fig. 1, S3a) suggest an older origin of most sections than proposed in prior studies (e.g., Cavender-Bares et al., 2015; Hipp et al., 2018; Deng et al., 2018), in part because in the current study we had access to a more comprehensive picture of the fossil record in oaks, including fossils used as age priors that predate those used in earlier studies. Section Virentes in our analysis has a crown age of ca. 30 Ma, whereas Cavender-Bares et al. (2015) estimated the crown age at 11 Ma. Even under the stem-age calibrations (Fig. 1; Fig. S3b, c), we estimate the crown age of Virentes at close to the Oligocene-Miocene boundary (ca. 23 Ma), nearly twice as old as prior estimates. Sections Quercus and Lobatae had an Oligocene crown constraint (31 Ma) in our previous work (Hipp et al., 2018); in the current study, they were constrained to a mid-Eocene origin (45–48 Ma) for the crown calibration, whereas the stem calibration recovers a late-Eocene origin for the red oaks (39 Ma) and the white oaks float down to a mid-Oligocene crown age (28 Ma). In the previous study of section Cyclobalanopsis, a minimum age of 33 Ma was set as a constraint at the root of subgenus Cerris, leading to a late Oligocene crown age for section Cyclobalanopsis (Deng et al., 2018); by contrast we recover an early Eocene crown age (38 Ma) for the group under the crown calibration, late Eocene (36 Ma) under the stem calibration. Given the high fossil density in Quercus (Table 1 and references therein; also reviewed in part in Denk & Grimm, 2009; Grímsson et al., 2015; Denk et al., 2017), the potential for alternative interpretations of their placement, and disparity among alternative methods for modeling (Paradis, 2013; Donoghue & Yang, 2016), we leave an investigation of a broader range of dating scenarios to later studies.
White oaks s.str. are estimated in the crown-calibration analysis to have arrived in Eurasia some point in the Oligocene, close to the split between the section Ponticae sisters, which despite their morphological similarity appear to have diverged from one another nearly twice as long ago as the crown age of the Mexican white oaks; under the stem-calibration, the Eurasian white oaks are approximately half the crown age of the Ponticae. By contrast with the two species of sect. Ponticae, the Mexican white oak ancestor gave rise to an estimated 80 species in approximately half the time. The Roburoids had divided into a European and an East Asian clade by the early Miocene under the crown calibration, the late Miocene under the stem calibration.
Under the diversification scenarios implied by both the crown and the stem calibrations (Fig. 1, 2), there are four relatively recent and nearly simultaneous upticks in diversification: white oaks of Mexico and Central America; the red oaks of Mexico and Central America; the Eurasian (Roburoid) white oaks; and the Glauca, Semiserrata, and Acuta clades of section Cyclobalanopsis. In addition, the Eurasian white oaks and the southeastern U.S. white oaks (the Stellatae clade) and red oaks (the Laurofoliae clade) show a lesser increase in diversification rate in both analyses, and the clade of section Ilex that includes the Himalayan and Mediterranean species shows an uptick in diversification rate in the stem calibration. This result is robust to missing taxa, as we find essentially the same clades increasing in rate even assuming the 40% of missing taxa in our study were missing at random from the tree (Fig. S3a-c), with the addition of a portion of section Ilex and some of the eastern North American taxa as high-rate clades under the global sampling proportions model.
Genomic arrangement of RAD-seq loci
A total of 39,860 loci aligned to at least one position on the oak genome. The 12 “pseudochromosomes” (inferred linkage groups, corresponding to the 12 Quercus chromosomes) as well as 360 scaffolds that did not map to the linkage groups were targeted by these loci. A total of 19,468 loci mapped to a unique position on a scaffold placed to one of the 12 oak genome pseudochromosomes, an average of 1,622.3 ± 575.4 (s.d.) per chromosome. Of these, 31.7% ± 8.1% overlapped with the boundaries of a gene model (Fig. 3), despite the fact that only 10.1% of the 716 Mb of the Quercus robur genome that fall within the 12 pseudochromosomes fall within the endpoints of a gene model.
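The contrast drawn above between loci and genome can be made explicit with a short calculation. This is a back-of-the-envelope sketch using only the figures quoted in the text; the variable names are hypothetical, and the fold value is approximate because 31.7% is a mean across chromosomes rather than a pooled proportion.

```python
# Figures quoted in the text
mapped_loci = 19_468          # loci mapped to a unique pseudochromosome position
n_chromosomes = 12
frac_loci_in_genes = 0.317    # mean share of mapped loci overlapping a gene model
frac_genome_in_genes = 0.101  # share of the 716 Mb lying within gene models

# Mean number of uniquely mapped loci per chromosome
mean_loci_per_chromosome = mapped_loci / n_chromosomes

# Approximate over-representation of RAD-seq loci in gene models
fold_enrichment = frac_loci_in_genes / frac_genome_in_genes

print(f"{mean_loci_per_chromosome:.1f} loci per chromosome")
print(f"about {fold_enrichment:.1f}-fold enrichment in gene models")
```

Under these figures, mapped RAD-seq loci overlap gene models roughly three times as often as expected if they were distributed in proportion to the genic share of the genome.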
For the tests of introgression, 2,422 loci had taxon sampling appropriate to testing for introgression involving Q. macrocarpa and Q. lobata (the Dumosae alternative topologies); 2,228 were suitable to testing for introgression involving the Roburoid white oaks and Q. pontica (the Roburoid alternative topologies); and 728 were suitable to testing both. Because we were interested in investigating genomic overlap in support for different areas of the species tree, we limited ourselves to the 728 loci that were potentially informative about both situations. Of these, 418 mapped to one position on one of the Quercus robur pseudochromosomes; and of these, 297 exhibited a log-likelihood difference of at least 2.0 between the better and more poorly supported topology for the Dumosae hypothesis or the Roburoid hypothesis, or both (Fig. 4). There was no correlation between the Roburoid and Dumosae hypotheses (r = −0.0286, p = 0.4878), meaning that loci that support or reject either of the Roburoid hypotheses do not correlate with a particular Dumosae hypothesis. Moreover, whether or not a locus is located within one of the Q. robur gene models has no effect on whether it recovers the introgression or the divergence history for the Roburoid oaks (F(1,366) = 0.6494, p = 0.4209) or the Dumosae (F(1,415) = 0.0377, p = 0.8461).
Quartet similarity (the number of taxon quartets with a topology shared between trees over the total number of quartets that both trees are informative about) between the RAD-seq individual-locus trees and the singletons tree (Fig. S4) is similarly uninfluenced by presence in one of the gene models presented in the Quercus robur genome (Plomion et al., 2018) (F(1,2542) = 0.0495, p = 0.8239) and shows no evidence of genomic autocorrelation (Fig. S5). Rather, loci that support the tree are distributed across the genome. The same is true using locus trees to investigate the support for selected nodes of the phylogeny, all strongly supported (bootstrap support > 95% for all nodes tested; Fig. S1) (Fig. 5). The 2,762 RAD-seq locus trees made 4,745 branch-level support claims and 27,283 conflict claims on the singletons tree, of which 6,409 total claims pertain to the nodes investigated, ranging from 107 to 1,055 per node (427.3 ± 273.7; Fig. 5). The locus-by-locus incongruence is high at this level: the proportion of loci concordant with each node averages 0.2395 ± 0.2523, but the range is wide, from 0.6879 for the genus as a whole to as low as 0.0075 for the Mexican red oaks and 0.0088 for the Mexican white oaks (Table S5). There is no genomic autocorrelation in support vs. rejection of nodes in the singletons tree by individual locus trees (as inferred using phyparts; Smith et al., 2015) (Fig. S6), but the correlation between the crown age of clades investigated and the proportion of loci concordant with the corresponding node is positive and marginally significant (r = 0.4996, p = 0.0579; Fig. S7). Three clades stand out as outliers for high proportion of loci supporting divergence (outside the 95% regression CI): the genus as a whole, and sections Cerris and Ilex. This widespread genomic incongruence is reflected in broad network-like reticulation in the neighbor-net tree at the base of most clades (Fig. 6).
Discussion
Our analyses demonstrate that the diversity of oaks we observe today reflects deep geographic separation of major clades within the first 15 million years after the origin of the genus, and that standing species diversity arose mostly within the last 10 million years, predominantly in four rapidly diversifying clades that together account for ca. 60% of the diversity of the genus. Previous work has demonstrated American oak diversity was shaped in large part by ecological opportunity, first by the space left by tropical forests as they receded from North America, then by migration into the mountains of Mexico (Hipp et al., 2018; Cavender-Bares et al., 2018). The current study deepens this understanding by demonstrating two increases in diversification rates in Eurasia: one in the Eurasian white oaks, which arrived from eastern North America 7.5 to 18 Ma to low continental oak diversity, and no closely related oaks; and one in the southeast Asian section Cyclobalanopsis, driven by changing climates and the Himalayan orogeny (Deng et al., 2018). At the same time, our work demonstrates widespread genomic incongruence in phylogenetic history, with alternative phylogenetic histories interleaved across all linkage groups. Contrary to our hypothesis at the outset of this study, there appear to be no regions of the genome that on their own define the entire oak phylogeny. Instead, the primary divergence history of oaks (Crowl et al., In review; McVay et al., 2017b) knits together and emerges from a patchwork of histories that comprise the oak genome.
Topology and timing of the global oak phylogeny
Our work indicates that by the mid-Eocene (45 Ma), all Quercus sections (fide Denk et al., 2017), representing eight major clades of the genus, had originated with the possible exception of section Quercus, which under the stem calibrations scenario arose at the Eocene-Oligocene boundary (33 Ma). Following this compressed interval of crown radiation, diversification rates spiked in the late Miocene to Pliocene, ca. 10 Ma (Fig. 2), primarily in southeast Asia, Mexico, and the white oaks of Eurasia. The eight fossil calibrations that we utilize here, and the two alternative methods of calibrating the tree (Fig. S3a-c), bracket what we consider to be a wide range of the plausible diversification times for the genus; so that while additional calibrations and a wider range of rate models bear investigation, we consider this overall finding for the shape and timing of oak diversification to be reasonable.
While Quercus arose at around the early Eocene climatic optimum (the earliest known Quercus fossil is pollen from Sankt Pankratz, Austria, 47°45’ N latitude, ca. 56 Ma; Hofmann et al., 2011), early fossils range as far north as Axel Heiberg Island in far northern Canada, which at 79° (both modern and paleolatitude in early Eocene; Scotese, 2014) is 20° further north than the northernmost oak populations today. As it followed the cooling climate southward, the genus remained largely a lineage of the northern temperate zone with some species of sections Virentes, Lobatae, and Quercus inhabiting tropical climates; but even these possess physiological adaptations that reflect their temperate ancestry (Cavender-Bares, 2019). In Eurasia, section Cyclobalanopsis dominates in subtropical evergreen broadleaf forests (Deng et al., 2018), but the sister sections Cerris and Ilex are temperate to Mediterranean. This climatic conservatism structures the geographic distribution of oak clades at several levels. Geographic patterns among and within major clades in the American oaks (subg. Quercus) have already been studied in detail, with geographic differentiation among the western U.S., the eastern U.S., and the southwestern U.S. and Mexico / Central America in each of two sections approximately simultaneously (Hipp et al., 2018). The current phylogeny makes clear that in the Eurasian white oaks of sect. Quercus, the Roburoid clade, the morphologically distinctive Mediterranean, dry-adapted species often treated as subsection Galliferae (e.g., Tschan & Denk, 2012) are distributed among all four subclades, suggesting that adaptations to the Mediterranean climate are convergent within the Roburoid clade; as discussed below under Rapid diversification of the Eurasian white oaks, it is geography rather than ecology or morphology that defines clades: species within clades are mostly separated by ecology, not geography. 
Likewise, the western Eurasian members of section Ilex form an inclusive subtree, in which the two widespread Mediterranean species Q. coccifera and Q. ilex are clearly separated and placed sister to the montane Asian clade. The geographically most distant species of the section are also genetically most distinct (Fig. 6). Even within clades, geographic structuring is evident. In section Cerris, for example, the east and west Eurasian species group in sister clades; within these latter, the western Mediterranean Q. crenata and Q. suber ‘corkish oaks’, the Near East ‘Aegilops’ oaks (Q. brantii, Q. ithaburensis, Q. macrolepis), and the remaining central-eastern Mediterranean members of the section are clearly separated. Within sect. Quercus, the North American Prinoids and Albae form a grade, reflecting diversification in North America predating dispersal of the Roburoid ancestor back to Eurasia. Once established in Eurasia, this lineage then diverged into East Asian and western Eurasian sister clades, ca. 10 My after isolation from its North American ancestors. Geography is imprinted in the oak phylogeny across clades, time periods, and continents.
Despite the older crown-age inferences in the current study in comparison to the RAD-seq studies of 2015–2018, relative dates in the present study confirm earlier results that the American oaks increased in diversification rate as they entered Mexico (in both red oaks and white oaks). The present study broadens this perspective with a global sample, providing evidence that the relative diversification rate of the Glauca, Acuta, and Semiserrata clades of the semitropical southeast Asian section Cyclobalanopsis is comparable to if not higher than the Mexican diversification. To a lesser extent, the Eurasian white oaks (the Roburoid clade) also show an increased rate of diversification. It is worth noting that the crown age of the Roburoid clade as a whole may be younger than our inferences, as fossil data raise some questions as to whether the Old World Roburoids were already isolated by the early Oligocene. Eocene sect. Quercus from Axel Heiberg Island (Canada), for example, appears to be closely allied with East Asian white oaks, and Quercus furuhjelmi from the Paleogene of Alaska and central Asia might belong to any of the modern New World or Old World white oak lineages, as might the early Oligocene Quercus kodairae and Q. kobatakei from Japan (Camus, 1936, 1938; Tanai & Uemura, 1994; Menitsky, 2005; Denk & Grimm, 2010; Tschan & Denk, 2012). Whereas previous analysis of Fagus (Fagaceae) found an unambiguous deep split between North American and Eurasian beech species that was also backed by fossils (Renner et al., 2016), the fossil data we have to date do not conclusively pin down the divergence between the North American and Eurasian white oaks. By contrast, the inferred early Miocene split between western Eurasian and East Asian white oaks is compatible with fossil evidence (Denk & Grimm, 2010), lending support to the increase in diversification rates observed in this study.
Taxonomy of the Mexican and Central American oaks
The generally high species coherence we observe in the all-tips tree provides strong evidence that oak species, in general, are genetically coherent biological entities. The fact that 97 of the 147 species with more than one sample cohere for all samples provides the broadest test to date of species coherence in oaks. Among the species that do not exhibit coherence, the majority are from Mexico. Two sets of examples suggest that the Mexican oaks, while having been the focus of extensive taxonomic study (e.g., Trelease, 1924; Spellenberg & Bacon, 1996; Spellenberg et al., 1998; González-Villarreal, 2003; Valencia-A., 2004; de Beaulieu & Lamant, 2010), may harbor even higher species diversity than current estimates. The examples of Quercus laeta (González-Elizondo et al., In prep.) and Q. conzattii (McCauley et al., In revision; McCauley & Oyama, In prep.) exemplify a problem likely to be common in Mexican oaks. Both species have samples from northern and central to southern Mexico. Researchers working with them have noticed that northern and southern populations differ and may constitute separate species as our molecular data suggest. These samples are from two centers of Mexican oak diversity (Torres-Miranda et al., 2011, 2013; Rodríguez-Correa et al., 2015) and may reflect even higher species diversity in areas already known for high diversity. Interestingly, the observed divergence between northern Mexico and the Jalisco and Oaxaca samples in these examples appears to correlate with the formation of the Tepic-Zacoalco rift 5.5 Ma in the Jalisco block (Ferrari & Rosas-Elguera, 2000) and not with climatic transitions during the Pleistocene, which has been argued to be more a period of population movement than of speciation in the neotropics (Bennett et al., 2012). Notably, one of the youngest groups in the white oaks is located in the Sierra Madre Occidental, which harbors great habitat diversity in relatively small areas (Torres-Morales et al., 2010).
The rugged and relatively young topography, a product of magmatism and subduction processes that lasted up through 12 Ma (Ferrari et al., 2018), and the convergence of temperate and tropical climates shaped the high diversification rates.
Several other cases of confusing taxonomy involving Mexican and Central American species are less clear. For example, the sect. Lobatae complex involving Q. eugeniifolia, Q. benthamii, Q. cortesii and Q. lowilliamsii has a history of extensive taxonomic complication (Quezada Aguilar et al., 2016). The current work provides evidence that the species constitute a complex meriting more attention and draws attention to the possibility that Central American oak diversity and the role of Central American geology in Neotropical oak diversification have been underestimated (Cárdenes-Sandí et al., 2019), overshadowed as they have been by interest in the Mexican oak diversification (Quezada Aguilar et al., 2017). In the white oaks s.str. (sect. Quercus), cases such as Q. insignis and Q. corrugata seem even more obscure. Field observations (HG-C) suggest subtle differences between Q. insignis, a species of conservation concern from Jalisco, Oaxaca, Chiapas and Veracruz (Jerome, 2018), and Q. corrugata (from Chiapas and Oaxaca), but our molecular data are inconclusive. In general, taxonomy of the recently diverged or still-diverging Mexican species is particularly complicated because of extensive hybridization and introgression, even among relatively distantly related species (Spellenberg, 1995; Bacon & Spellenberg, 1996; González-Rodríguez et al., 2004; Bacon et al., 2011), and because of the dynamics of recent or ongoing speciation.
Rapid diversification of Eurasian white oaks
Among the long-studied oaks of Eurasia (e.g., Camus, 1936, 1938, 1952; Schwarz, 1993; Menitsky, 2005), the data presented here point to the important role of ecological and morphological convergence among unrelated oaks. Phylogeny of the Eurasian white oaks (the Roburoid clade of section Quercus) has not previously been addressed in detail, despite their importance to our understanding of oak biodiversity and biology (cf. Kremer et al., 1991; Dumolin-Lapegue et al., 1997; Petit et al., 1997; Leroy et al., 2017 and references therein). Previous work sampled a maximum of 14 Roburoid species (Hubert et al., 2014) but did not recover the monophyly of the clade, much less relationships among species. Our study includes 23 of the estimated 25 Roburoid white oak species, the strongest sampling to date. The late Miocene increase in diversification rate inferred in our study at the base of the western Eurasian white oak clade is a particularly exciting finding, as it is one of only four major upticks in diversification inferred in our study. Our sampling of northern temperate white and red oaks is almost complete, and we have accounted for sampling bias in our diversification analyses, making it unlikely that the increase in diversification rate detected here is artefactual. The fact that the Roburoids are a northern temperate clade makes their radiation notable.
The unexpected increase in diversification rate in the Roburoids parallels the sympatric diversification of red and white oaks in North America, with divergence within clades and geographic regions accompanying convergence between clades (Cavender-Bares et al., 2018). As in the Mexican oak diversification (Torres-Miranda et al., 2011; Rodríguez-Correa et al., 2015), the western Eurasian white oaks are ecologically diverse, ranging from lowland swamp to Mediterranean scrub and steppe, and from mesic lowland forests to subalpine timberline (de Beaulieu & Lamant, 2010). The European Roburoid clades are not readily diagnosable morphologically, and the morphological and ecological convergence among clades has led to taxonomic confusion. The morphologically distinctive Mediterranean, dry-adapted species (subsection Galliferae; cf. Tschan & Denk, 2012), for example, are distributed among all four subclades. Conversely, Roburoid clades 1 to 4 show geographic sorting, whereas differentiation within clades commonly reflects ecological and climatic niche evolution along with morphological adaptations (e.g. from deciduous, large, lobed leaves to small, brevideciduous, unlobed leaves). Our study thus demonstrates that, across the genus, ecological divergence within clades has shaped diversification.
Genomic landscape of the global oak phylogeny
The current study uses mapped phylogenomic markers to demonstrate that the oak tree of life is etched broadly across the genome. Previous work demonstrated that approximately 19% of RAD-seq loci were associated with ESTs (Hipp et al., 2014). Analyzed alone, however, the EST-associated RAD-seq loci did not yield a topology that was different from, or differently supported than, the topology recovered from RAD-seq loci not associated with EST markers. Nor were they differently apportioned to the base or the tips of the phylogeny, which might have suggested that RAD-seq loci associated with coding regions were more or less conservative or more or less homoplasious than the remainder. In the current study, 6,099 (31.3%) of the RAD-seq loci in our dataset that map uniquely to one position in the genome do so in or overlapping with a predicted gene in the Quercus robur genome (as expected from a methylation-sensitive restriction enzyme; Rabinowicz et al., 2005; Pegadaraju et al., 2013). Our work demonstrates that gene-based RAD-seq loci do not differ from non-gene-based RAD-seq loci in similarity to the consensus tree or in introgression rates in the Roburoids and the Dumosae. Gene identity tells us little or nothing about how reliably a region of the genome records phylogenetic history.
At the same time, the non-significant correlation between the loci that strongly differentiate alternative topologies in the Dumosae and those that do so in the Roburoids suggests that these stories segregate nearly independently on the genome. There is also no evidence of genomic autocorrelation of phylogenetic informativeness in our study, despite the fact that our study has more mapped markers that significantly differentiate topologies in at least one of these parts of the tree than a previous study investigating genomic architecture of differentiation at the species level (N = 158 mapped markers with known GST; Scotti-Saintagne et al., 2004). Our hypothesis that there are particular genes or regions of the genome that define the oak phylogeny globally appears to be false. Rather, the phylogenetic history of oaks is defined by different genes in different lineages, making the evolutionary history of oaks a phylogenetic and genomic mosaic. The effort to find a single best suite of genes for phylogenetic or population genetic inference across the oak genus is thus unlikely to be successful, though markers can clearly be designed for individual clades (Guichoux et al., 2011; Fitzek et al., 2018). What is perhaps most remarkable is that this heterogeneity of histories varying independently along the oak genome yields, in aggregate, an evolutionary history of the complex genus that mirrors the morphological and ecological diversity of living and fossil oak species.
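The kind of test behind a statement that locus-level signal in two clades is uncorrelated can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the per-locus support scores are simulated, the variable names (dumosae_support, roburoid_support) are hypothetical, and the choice of a Pearson correlation with a permutation test is an assumption made for illustration only.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Shuffling y breaks any pairing between loci, giving the distribution
    of |r| expected if the two per-locus signals were independent.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: one support score per mapped locus for each clade,
# e.g. a difference in log-likelihood between two alternative topologies.
# Drawn independently here, so no true correlation is built in.
rng = random.Random(1)
dumosae_support = [rng.gauss(0.0, 1.0) for _ in range(200)]
roburoid_support = [rng.gauss(0.0, 1.0) for _ in range(200)]

r = pearson(dumosae_support, roburoid_support)
p = permutation_pvalue(dumosae_support, roburoid_support)
print(f"r = {r:.3f}, permutation p = {p:.3f}")
```

Under these assumptions, a correlation near zero with a large p-value would be consistent with loci that are informative in one clade carrying no information about which loci are informative in the other.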
Conclusion
Questions about the genomic architecture of population differentiation and speciation are generally asked at fine scales (Leroy et al., 2017, 2018), at the point at which population-level processes directly shape genomic differentiation. But microevolution, comprising processes at the population level, leaves an imprint in the phylogeny; when such impressions persist, they can often be detected using topological methods that may be sensitive even to introgression along internal phylogenetic branches (Eaton et al., 2015; Solís-Lemus & Ané, 2016; McVay et al., 2017b). With multiple Fagaceae genomes now becoming available (Staton et al., 2015; Plomion et al., 2016, 2018; Sork et al., 2016; Ramos et al., 2018), we may soon be able to disentangle the mosaic history of oaks and understand what story each gene tells. The current study makes clear that the phylogeny we unravel will neither be unitary nor told by a small subset of the genome, as the regions of the genome capturing the divergence history for one clade are not the regions capturing the divergence history of another. Understanding phylogenetic history in the face of this variation is only one problem. It will be followed by a greater one: how do we interpret the history of oak diversification in space and time if it is really a collection of diverse histories from different regions of the genome, all reflecting different evolutionary pathways, all equally real?
Author Contributions
ALH, PSM, JCB, MD, AK, CP, and AG-R conceived and designed the study. ALH, PSM, MH, MA, JCB, MD, TD, OG, MSG-E, AG-R, GWG, X-LJ, JDM, HR-C, MCS, VLS, and SV-A collected, identified, and curated samples. ALH, PSM, MH, JCB, AC, MD, TD, AG-R, GWG, X-LJ, JDM, and VLS generated and analyzed phylogenetic data. CB, AK, IL, and CP generated and analyzed genomic data. ALH, PSM, TD and GWG drafted the manuscript. All authors wrote and edited the manuscript.
Acknowledgements
Funding for this project was provided by U.S. National Science Foundation Awards 1146488 to ALH, 1146102 to PSM and 1146380 to JCB; Swedish Research Grants 2015-03986 to TD and 2009-00000 to GWG; support of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig funded by the German Research Foundation (FZT 118) to E-DS and GWG; and The Morton Arboretum Center for Tree Science. This paper is dedicated to the memory of Michael Avishai (1935-2018), founder of the Jerusalem Botanical Gardens and cherished colleague.