Abstract
The use of diverse datasets in phylogenetic studies that aim to understand the evolutionary histories of species can yield conflicting inferences. Phylogenetic conflicts observed in animal and plant systems have often been explained by hybridization, incomplete lineage sorting (ILS), or horizontal gene transfer. Here, we employed target enrichment data together with species tree and species network approaches to infer the backbone phylogeny of the family Caprifoliaceae, while distinguishing among sources of incongruence. We used 713 nuclear loci and 46 protein-coding sequences of plastome data from 43 samples representing 38 species from all major clades to reconstruct the phylogeny of the group using concatenation and coalescence approaches. We found significant nuclear gene tree conflict as well as cytonuclear discordance. Additionally, coalescent simulations and phylogenetic species network analyses suggest putative ancient hybridization among subfamilies of Caprifoliaceae, which seems to be the main source of phylogenetic discordance. Ancestral state reconstruction of six morphological characters revealed some homoplasy for each character examined. By dating the branching events, we inferred the origin of Caprifoliaceae at approximately 69.38 Ma in the late Cretaceous. By integrating evidence from molecular phylogeny, divergence times, and morphology, we herein recognize Zabelioideae as a new subfamily in Caprifoliaceae. This work shows the need to combine multiple approaches to identify the sources of gene tree discordance. Our study also highlights the importance of using data from both nuclear and chloroplast genomes to reconstruct deep and shallow phylogenies of plants.
1 Introduction
Gene tree discordance is common in the phylogenomic era (Galtier and Daubin, 2008; Degnan and Rosenberg, 2009; Szöllősi et al., 2015; Sun et al., 2015; Lin et al., 2019). Many studies have shown that incomplete lineage sorting (ILS), hybridization, and other processes such as horizontal gene transfer, gene duplication, or recombination may contribute to discordance among gene trees (Degnan and Rosenberg, 2009; Linder and Naciri, 2015). Among these potential sources of discordance, hybridization is undoubtedly a research hotspot in plant systematics (e.g., Morales-Briones et al., 2018; Lee-Yaw et al., 2019; Morales-Briones et al., 2020a; Stull et al., 2020). Hybridization, which is especially prevalent in rapidly radiating groups, is increasingly recognized as a major force in evolutionary biology, in many cases leading to new species and lineages (Mallet, 2007; Abbott et al., 2010; Yakimowski and Rieseberg, 2014; Konowalik et al., 2015). ILS is one of the prime sources of gene tree discordance and has attracted increasing attention in the past decades as phylogenetic reconstruction methods began to allow its modeling (Edwards, 2009; Liu et al., 2015). Despite that, distinguishing ILS from hybridization is still challenging (Linder and Naciri, 2015). More recently, several methods that account simultaneously for ILS and hybridization have been developed to estimate phylogenetic networks (Solís-Lemus and Ané, 2016; Wen et al., 2018). At the same time, empirical studies using phylogenetic networks to identify the sources of gene tree discordance are increasing (e.g., Morales-Briones et al., 2018, 2020a; Widhelm et al., 2019; Feng et al., 2020).
Caprifoliaceae s.l. sensu Angiosperm Phylogeny Group (APG) IV (APG, 2016; hereafter as Caprifoliaceae) is a woody family in the order Dipsacales containing 41 genera and ca. 960 species, with most genera restricted to eastern Asia and eastern North America (Manchester and Donoghue, 1995; Bell, 2004; APG, 2016). The family has long been the focus of studies of character evolution, especially regarding its tremendous diversity in reproductive structures (Backlund 1996; Donoghue et al. 2003). Caprifoliaceae has five corolla lobes and five stamens as ancestral states, which are retained in Diervilleae C. A. Mey., Heptacodium Rehd., and Caprifolieae (though in some Symphoricarpos Duhamel and Lonicera L. there are four corolla lobes and four stamens). However, for other genera, the number of stamens is reduced to four or even one. Caprifoliaceae shows even greater variation in fruit types (e.g., achene in Abelia R. Br., berry in Lonicera, drupe in Viburnum L.; Manchester and Donoghue, 1995; Donoghue et al., 2003). Some genera bear highly specialized morphological characters (e.g., the spiny leaf of Acanthocalyx (DC.) Tiegh., Morina L. and Dipsacus L.) that have likely played key roles in lineage-specific adaptive radiation (Blackmore and Cannon, 1983; Caputo and Cozzolino, 1994; Donoghue et al., 2003) (Fig. 1).
Floral diversity of Dipsacales. (A) Kolkwitzia amabilis; (B) Zabelia integrifolia; (C) Dipsacus asper; (D) Valeriana flaccidissima; (E) Acanthocalyx nepalensis subsp. delavayi; (F) Lonicera fragrantissima var. lancifolia; (G) Weigela coraeensis; (H) Viburnum opulus subsp. calvescens.
Phylogenetic relationships within Caprifoliaceae have been studied during the past two decades using plastid and nuclear DNA data (Fig. 2), but the placement of Zabelia (Rehder) Makino has never been resolved confidently using either morphological characters (Backlund, 1996; Donoghue et al., 2003) or molecular data (Donoghue et al., 1992; Jacobs et al., 2010; Smith et al., 2010; Landrein et al., 2012; Stevens, 2019; Xiang et al., 2019; Wang et al., 2020). Caprifoliaceae includes seven major clades: Linnaeoideae, Zabelia, Morinoideae, Valerianoideae, Dipsacoideae, Caprifolioideae and Diervilloideae (Donoghue et al., 1992; Jacobs et al., 2010; Smith et al., 2010; Landrein et al., 2012; APG, 2016; Stevens, 2019; Xiang et al., 2019; Wang et al., 2020). Based on nuclear (ITS) and chloroplast DNA (cpDNA) data (trnK, matK, atpB-rbcL, trnL-F) of 51 taxa, Jacobs et al. (2010) found moderate support (bootstrap support [BS] = 62%) for the placement of Zabelia (formerly part of Abelia) in a clade with Morinoideae, Dipsacoideae, and Valerianoideae. Based on the same data set, Jacobs et al. (2010) raised Abelia sect. Zabelia to the genus level as Zabelia, and more recent studies have confirmed the distinctiveness of Zabelia (Landrein et al., 2012; Wang et al., 2015), often finding it sister to Morinoideae, although with low (BS ≤ 50%) to moderate support (BS = 50-70%) (Donoghue et al., 1992; Jacobs et al., 2010; Tank and Donoghue, 2010; Wang et al., 2015). Based on cpDNA data (rbcL, trnL-K, matK and ndhF) of 14 taxa, Landrein et al. (2012) suggested that Zabelia and Diabelia Landrein (Linnaeoideae) had similar “primitive” inflorescences of reduced simple thyrses. Landrein et al. (2012) also conducted phylogenetic analyses of Caprifoliaceae based on the structural characters of reproductive organs. In these analyses, Zabelia was sister to the clade of Morinoideae, and Valerianoideae + Dipsacoideae. Recently, Xiang et al. (2019) carried out analyses of complete plastomes of 32 species in this clade, demonstrating that Heptacodium and Triplostegia Wall. ex DC. are members of Caprifoliaceae s. str. and Dipsacaceae, respectively. Furthermore, Zabelia was found to be sister to Morinaceae in all analyses (Xiang et al., 2019). Moreover, using complete plastomes from 56 accessions representing 47 species of Caprifoliaceae, Wang et al. (2020) recovered the clade composed of Linnaeoideae and Morinoideae + Zabelia as sister to Dipsacoideae + Valerianoideae with strong support (BS = 100%).
Alternative relationships for the Caprifoliaceae s.l. backbone based on previous analyses. (A) Donoghue et al. (2001); parsimony analyses based on chloroplast rbcL sequences and morphological characteristics; (B) Bell et al. (2001); maximum likelihood tree from the combined chloroplast DNA data; (C) Zhang et al. (2003); maximum likelihood tree based on chloroplast trnL-F and ndhF sequences; (D) Jacobs et al. (2010); maximum parsimony Dipsacales phylogeny based on nuclear and chloroplast sequence data; (E) Wang et al. (2020); maximum likelihood tree based on 68 complete plastomes; (F) this study; species tree based on the nuclear concatenated data set.
In this study, we assembled and analyzed a custom target enrichment dataset of Caprifoliaceae to: (1) evaluate sources of gene tree discordance, in order to clarify the backbone phylogeny of Caprifoliaceae with special attention to positions of recalcitrant taxa (i.e., Zabelia and Morinoideae); and (2) determine the evolutionary patterns of key morphological characters of Caprifoliaceae.
2 Materials and methods
2.1 Taxon sampling
We sampled 43 individuals from 38 species of Caprifoliaceae, including representatives of all seven major clades (including Zabelia) of Caprifoliaceae sensu Stevens (2019) and Wang et al. (2020). Additionally, three species of Adoxaceae were included as outgroups. Most samples (38) were collected in the field where leaf tissue was preserved in silica gel. The remaining samples were obtained from the United States National Herbarium (US) at the Smithsonian Institution (Table S1). Vouchers of newly collected samples were deposited in the herbarium of the Institute of Tropical Agriculture and Forestry (HUTB), Hainan University, Haikou, China. Complete voucher information is listed in Supporting Information Table S1.
2.2 DNA extraction, target enrichment, and sequencing
We extracted total genomic DNA from silica gel-dried tissue or herbarium tissue using the CTAB method of Doyle and Doyle (1987). We checked the quantity of each extraction with a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and sonicated 400 ng of DNA using a Covaris S2 (Covaris, Woburn, MA) to produce fragments ~150-350 bp in length for library preparations. To ensure that genomic DNA was sheared at approximately the selected fragment size, we evaluated all samples on a 1.2% (w/v) agarose gel.
We identified putative single copy nuclear (SCN) genes with MarkerMiner v.1.2 (Chamala et al., 2015) with default settings, using the transcriptomes of Dipsacus asper, Lonicera japonica, Sambucus canadensis, Valeriana officinalis, and Viburnum odoratissimum from 1KP (Matasci et al., 2014), and the genome of Arabidopsis thaliana (L.) Heynh. (Gan et al., 2011) as a reference. SCN genes identified with MarkerMiner were further filtered using GoldFinder (Vargas et al., 2019) requiring loci with at least 400 bp and a coverage of at least three species. This resulted in 428 SCN for phylogenetic analyses. A custom set of 80 bp MYbaits biotinylated RNA baits based on exon sequences were manufactured by Arbor Biosciences (Ann Arbor, MI, USA), with a 2× tiling density. The bait sequences are available as a supplemental file (Appendix 1).
Library preparation was done with the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, MA, USA) following the manufacturer’s protocol. Library concentrations were quantified using a Qubit 2.0, with a dsDNA HS Assay Kit (Thermo Fisher Scientific). Fragment size distribution was determined with a High Sensitivity D1000 ScreenTape run on the Agilent 2200 TapeStation system (Agilent Technologies, Inc., Santa Clara, California, United States). Solution-based hybridization and enrichment with MYbaits followed Weitemier et al. (2014). We conducted a library spiking of 40% of unenriched library and 60% of enriched library for each sample. The final spiked library pools were sequenced by Novogene Corporation (Sacramento, California, U.S.A.) on one lane using the Illumina HiSeq X sequencing platform (Illumina Inc, San Diego, California, U.S.A.), producing 150 bp paired-end reads.
2.3 Read processing and assembly
Sequencing adapters and low-quality bases were removed with Trimmomatic v0.36 (ILLUMINACLIP:TruSeq_ADAPTER:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25; Bolger et al., 2014). Assembly of nuclear loci was carried out with HybPiper v.1.3.1 (Johnson et al., 2016). Assemblies were carried out on an exon basis to avoid chimeric sequences in multi-exon genes that result from potential paralogy (Morales-Briones et al., 2018). Only exons with a reference length of ≥ 150 bp were assembled (1220 exons from 442 genes). Paralog detection was carried out for all exons with the ‘paralog_investigator’ option of HybPiper. All assembled loci (with and without paralogs detected) were processed following Morales-Briones et al. (2020b) to obtain ‘monophyletic outgroup’ (MO) orthologs (Yang and Smith, 2014).
Off-target reads from target enrichment were used for de novo assemblies of plastomes with Fast-Plast (McKain, 2017). Resulting contigs from SPAdes v3.9.0 (Bankevich et al., 2012) were mapped to Kolkwitzia amabilis (GenBank accession no. NC_029874.1), with one copy of the Inverted Repeat removed. Mapped contigs were manually edited in Geneious v.11.1.5 (Kearse et al., 2012) to produce the final oriented contigs. Contigs were further annotated using K. amabilis as a reference, and coding sequences (CDS) were extracted using Geneious.
2.4 Phylogenetic analyses
We used concatenation and coalescent-based methods to reconstruct the phylogeny of Caprifoliaceae. We performed phylogenetic analyses on the nuclear and plastid CDS, separately. Individual nuclear exons were aligned with MAFFT version 7.407 (Katoh and Standley, 2013) and aligned columns with more than 90% missing data were removed using Phyutility (Smith and Dunn, 2008). A maximum likelihood (ML) tree was estimated from the concatenated matrix, partitioning by gene, using RAxML version 8.2.12 (Stamatakis, 2014) and the GTRGAMMA model for each partition. Clade support was assessed with 100 rapid bootstrap replicates. We also estimated a species tree with ASTRAL v5.7.1 (Zhang et al., 2018) from individual ML gene trees inferred using RAxML with a GTRGAMMA model. Local posterior probabilities (LPP; Sayyari and Mirarab, 2016) were used to assess clade support. Gene tree discordance was evaluated using two approaches. First, we mapped the 713 nuclear gene trees onto the species-tree phylogeny and calculated the internode certainty all (ICA; Salichos et al., 2014) and number of conflicting and concordant bipartitions on each node of the species trees using Phyparts (Smith et al., 2015). Then we used Quartet Sampling (QS; Pease et al., 2018) to distinguish strong conflict from weakly supported branches in the nuclear tree. We carried out QS with 1000 replicates.
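As an illustration of how an internode certainty all (ICA) score summarizes conflict at a single branch, the following minimal Python sketch applies the entropy-based definition of Salichos et al. (2014) to bipartition counts. It is not the Phyparts implementation: the function name and the input counts are hypothetical, any frequency threshold for including conflicting bipartitions is left out, and the negative-sign rule (applied when a conflicting bipartition is more frequent than the one in the species tree) is stated here as an assumption about the convention.

```python
import math


def internode_certainty_all(concordant, conflicting):
    """ICA for one branch from gene-tree bipartition counts.

    concordant  -- number of gene trees containing the species-tree bipartition
    conflicting -- counts for each distinct conflicting bipartition
    Both inputs are hypothetical summaries, not output parsed from Phyparts.
    """
    counts = [c for c in [concordant] + list(conflicting) if c > 0]
    if not counts:
        raise ValueError("no informative gene trees for this branch")
    n = len(counts)
    if n == 1:
        # Only one bipartition observed: full certainty for or against it.
        return 1.0 if concordant > 0 else -1.0
    total = sum(counts)
    # ICA = log_n(n) + sum_i p_i * log_n(p_i), with log_n(n) = 1.
    score = 1.0 + sum((c / total) * math.log(c / total, n) for c in counts)
    # Assumed convention: negative when a conflicting bipartition dominates.
    if concordant < max(conflicting, default=0):
        score = -score
    return score


if __name__ == "__main__":
    # Hypothetical counts: 80 concordant trees, two alternatives with 10 each.
    print(round(internode_certainty_all(80, [10, 10]), 3))
```

Values near 1 indicate that almost all informative gene trees agree with the branch, values near 0 indicate that the alternatives are about equally frequent, and negative values flag branches where an alternative is more common than the species-tree bipartition.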
Plastid CDS were aligned with MAFFT and then concatenated into a supermatrix. We reconstructed a phylogenetic tree from the chloroplast data using RAxML. We also used QS to investigate potential conflict in the chloroplast data set. QS was carried out with 1000 replicates.
2.5 Assessment of hybridization
To test whether ILS alone could explain cytonuclear discordance, we used coalescent simulations similar to Folk et al. (2017) and García et al. (2017). We simulated 10,000 gene trees under the coalescent with DENDROPY v.4.1.0 (Sukumaran & Holder, 2010) using the ASTRAL species tree as a guide tree, with branch lengths scaled by four to account for organellar inheritance. We summarized the simulated gene trees on the cpDNA tree. Under a scenario of ILS alone, any relationship in the empirical chloroplast tree should be present in the simulated trees and have a high frequency; under a hybridization scenario, relationships unique to the cpDNA tree should be at low (or zero) frequency (García et al., 2017).
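The summarizing step described above amounts to asking, for each clade of the observed chloroplast tree, in what fraction of the simulated gene trees that clade appears. The sketch below shows this logic in plain Python, assuming rooted trees written as nested tuples of taxon names; the tree encoding, the function names, and the toy trees are illustrative assumptions and do not reproduce the DendroPy workflow used in the study.

```python
from collections import Counter


def clades(tree):
    """Return the set of non-trivial clades (frozensets of tip names)
    in a rooted tree given as nested tuples of strings."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    all_tips = walk(tree)
    found.discard(all_tips)  # the root clade carries no information
    return found


def clade_frequencies(reference, simulated):
    """Fraction of simulated trees that contain each clade of the reference."""
    counts = Counter()
    for tree in simulated:
        counts.update(clades(tree))
    n = len(simulated)
    return {clade: counts[clade] / n for clade in clades(reference)}


if __name__ == "__main__":
    # Toy example with hypothetical taxa A-D, not data from this study.
    observed = ((("A", "B"), "C"), "D")
    simulated = [
        ((("A", "B"), "C"), "D"),
        ((("A", "C"), "B"), "D"),
        (("A", "B"), ("C", "D")),
    ]
    for clade, freq in clade_frequencies(observed, simulated).items():
        print(sorted(clade), round(freq, 2))
```

A clade of the observed tree that is frequent among the simulated trees is compatible with ILS alone, whereas a clade with a frequency at or near zero is the pattern expected under hybridization.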
2.6 Species network analysis
We inferred species networks using a maximum pseudo-likelihood approach (Yu et al., 2012). Due to computational restrictions and given our main focus on potential reticulation among major clades of Caprifoliaceae (i.e. along the backbone), we reduced our 46-taxon data set to one outgroup and nine ingroup taxa to represent all major clades. Species network searches were carried out with PHYLONET v.3.6.1 (Than et al., 2008) with the command ‘InferNetwork_MPL’ and using the individual gene trees. Network searches were performed using only nodes in the gene trees that had BS support of at least 50%, allowing for up to four hybridization events and optimizing the branch lengths and inheritance probabilities of the returned species networks under the full likelihood. To estimate the optimal number of hybridizations and test whether the species network fits our gene trees better than a strictly bifurcating species tree, we computed the likelihood scores of concatenated RAxML, ASTRAL and plastid DNA trees, given the individual gene trees, as implemented in Yu et al. (2012), using the command ‘CalGTProb’ in PHYLONET. Finally, we performed model selection using the bias-corrected Akaike information criterion (AICc; Sugiura, 1978). The number of parameters was set to equal the number of branch lengths being estimated, the number of hybridization probabilities being estimated, and the number of gene trees used to estimate the likelihood, to correct for finite sample size.
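For reference, the bias-corrected criterion used in the model selection step is AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1), where k is the number of estimated parameters and n is the sample size (here, the number of gene trees). The short Python sketch below evaluates this formula; the function name and the example numbers are hypothetical and are not values from Table 2.

```python
def aicc(log_likelihood, k, n):
    """Bias-corrected Akaike information criterion.

    log_likelihood -- ln L of the species tree or network
    k              -- number of estimated parameters (branch lengths
                      plus inheritance probabilities)
    n              -- sample size (number of gene trees)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical comparison: a tree versus a network with more parameters.
    candidates = {"tree": (-1050.0, 17), "network": (-1010.0, 23)}
    for name, (lnl, k) in candidates.items():
        print(name, round(aicc(lnl, k, n=500), 2))
```

The model with the lowest AICc is preferred; the correction term grows as k approaches n, which penalizes networks with many reticulations when few gene trees are available.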
2.7 Divergence time estimation
Divergence times were inferred using BEAST v.2.4.0 (Bouckaert et al., 2014). Because of potential ancient hybridization in Caprifoliaceae, we estimated ages from the nuclear and chloroplast gene trees separately. We constrained the root age to be 78.9 Ma based on the analysis of Li et al. (2019). We selected two fossils as calibration points. (1) The fossil seeds of Weigela from the Miocene and Pliocene in Poland (Łańcucka-Środoniowa, 1967) and the Miocene in Denmark (Friis, 1985) were used to constrain its stem age to at least 23 Ma (Wang et al., 2015). (2) The stem age of Diplodipelta was constrained to be at least 34.07 Ma based on the fruit fossil from the late Eocene Florissant flora of Colorado (34.07±0.1 Ma; Manchester, 2000). All dating analyses were performed with an uncorrelated lognormal relaxed clock (Drummond et al., 2012), GTR + G substitution model (Posada, 2008), gamma site heterogeneity model, estimated base frequencies, and a ML starting tree. A Yule process was specified as the tree prior. Two independent MCMC analyses of 300,000,000 generations with 10% burn-in and sampling every 3000 generations were conducted to evaluate the credibility of posterior distributions of parameters. BEAST log files were analyzed with Tracer v.1.7 (Drummond et al., 2012) for convergence, with the first 10% removed as burn-in. Parameter convergence was assessed using an effective sample size (ESS) threshold of 200. Log files were combined with LogCombiner, and a maximum clade credibility tree with median heights was generated with TreeAnnotator v.1.8.4 (Drummond et al., 2012).
2.8 Analysis of character evolution
Character states were coded from the literature, particularly from Backlund (1996), Donoghue et al. (2003), Jacobs et al. (2011) and Landrein (2017). The number of stamens was scored as: (0), 1; (1), 2; (2), 3; (3), 4; (4), 5. Two character states were scored for the style exertion: (0), not exceeding corolla; (1), exceeding corolla. Four fruit types were scored: (0), achene; (1), capsule; (2), berry; (3), drupe. The number of carpels was scored as: (0), 2; (1), 3; (2), 4. The number of seeds was scored as: (0), 1; (1), 2; (2), 4-5; (3), 6-20; (4), 20+. Two states were scored for the epicalyx: (0), absent; (1), present. All the morphological characters analyzed here are presented in Supplementary Fig. S1. Ancestral character state reconstruction was performed using the Maximum Likelihood approach as implemented in Mesquite v.3.51 (Maddison and Maddison, 2018) with the ‘Trace character history’ option based on the topology of the chloroplast trees. To explore the differences caused by alternative topologies, we also reconstructed ancestral character states based on the topology of the nuclear trees. The Markov k-state one-parameter model of evolution for discrete unordered characters (Lewis, 2001) was used.
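Under the Markov k-state one-parameter (Mk1) model, every change between two distinct states has the same instantaneous rate, so the probability of observing a state after a branch of length t has a closed form: P_ii(t) = 1/k + ((k − 1)/k)·exp(−kqt) and P_ij(t) = 1/k − (1/k)·exp(−kqt) for i ≠ j. The Python sketch below evaluates these expressions; it assumes q is the rate from one state to each other state, which may differ from the rate parameterization reported by Mesquite, and the numerical values are purely illustrative.

```python
import math


def mk1_transition_matrix(k, rate, t):
    """Transition probabilities for the Mk1 model.

    k    -- number of character states (e.g., 4 for the fruit types above)
    rate -- assumed instantaneous rate q from a state to each other state
    t    -- branch length in the same time units as 1/rate
    """
    decay = math.exp(-k * rate * t)
    same = 1.0 / k + (k - 1.0) / k * decay   # probability of no net change
    diff = 1.0 / k - decay / k               # probability of ending elsewhere
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Illustrative values only: four states, rate 0.05, branch length 10.
    for row in mk1_transition_matrix(4, 0.05, 10.0):
        print([round(p, 3) for p in row])
```

Each row sums to one, a zero-length branch returns the identity matrix, and very long branches converge to 1/k for every state, which is why deep nodes tend to be reconstructed with more uncertainty than shallow ones.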
2.9 Data accessibility
Raw Illumina data from sequence capture is available at the Sequence Read Archive (SRA) under accession SUB7674585 (see Table S1 for individual sample SRA accession numbers). DNA alignments, phylogenetic trees and results from all analyses and datasets can be found in the Dryad data repository.
3 Results
3.1 Exon assembly
The assembly resulted in sequences of up to 793 exons (≥ 150 bp) per species. The number of exons per sample varied from 380 to 1500, with an average of 1068 exons per sample. HybPiper identified paralogous copies for up to 284 exons per species. We found up to six paralogs per exon in Caprifoliaceae. After paralog pruning and removal of exons with poor coverage across samples (≤ 24 samples), we kept 713 exons from 196 different genes. Additionally, 63 of those exons showed the presence of two (60) or three (three) paralog copies that met the pruning requirements, giving us a total of 713 loci. The resulting concatenated matrix had an aligned length of 343,609 bp with 21,004 parsimony-informative sites, a minimum locus size of 277 bp, and a maximum locus size of 5,739 bp.
3.2 Phylogenetic reconstruction
We retrieved 87 CDS from off-target plastome reads, which after concatenation resulted in a matrix of 78,531 bp (Table 1). Overall, both nuclear and plastid data strongly support (1) Diervilloideae as sister to the rest of Caprifoliaceae, followed successively by Caprifolioideae, and (2) five monophyletic groups: Diervilloideae, Caprifolioideae, Valerianoideae, Zabelia and Morinoideae (Figs. 3 and 4).
Species tree of the nuclear concatenated dataset inferred with ASTRAL-III. Local posterior probabilities support values and internode certainty all scores are shown above and below branches, respectively. Pie charts next to the nodes present the proportion of gene trees that supports that clade (blue), the proportion that supports the main alternative for that clade (green), the proportion that supports the remaining alternatives (red), and the proportion (conflict or support) that has < 50% bootstrap support (gray). Numbers next to pie charts indicate the number of gene trees concordant/conflicting with that node in the species tree. Major taxonomic groups or main clades in the family as currently recognized are indicated by branch colors as a visual reference to relationships.
Tanglegram of the nuclear concatenated (left) and plastid (right) phylogenies. Dotted lines connect taxa between the two phylogenies. Maximum likelihood bootstrap support values are shown above branches. The asterisks indicate maximum likelihood bootstrap support of 100%. Major taxonomic groups or main clades in the family as currently recognized are indicated by branch colors as a visual reference to relationships.
Nuclear dataset
The ASTRAL topology (Fig. 3) was largely congruent with the tree inferred by RAxML from the concatenated dataset (Fig. 4), and the main clades were maximally supported. Our concatenated nuclear phylogeny recovered full support for the monophyly of the seven major clades (Fig. 4). The clade Valerianoideae + Dipsacoideae was sister to Linnaeoideae, with Zabelia + Morinoideae close to the clade Valerianoideae + Dipsacoideae + Linnaeoideae. All relationships among major clades received strong support (Fig. 4).
The coalescent nuclear phylogeny recovered moderate to strong support (local posterior probabilities [LPP] ≥ 0.7) for all of the major clades and relationships within and among them. The clade of Valerianoideae + Dipsacoideae + Linnaeoideae was sister to Zabelia + Morinoideae, and both together constituted the sister clade to Caprifolioideae. Within Linnaeoideae, all analyses and data sets recovered a clade of Vesalea M. Martens & Galeotti + Linnaea Gronov. ex L. as sister to a clade of all other Linnaeoideae, with strong support (Fig. 3).
The coalescent analyses and the ICA scores on the ASTRAL tree of the nuclear dataset revealed that most gene trees conflicted with the species tree (Fig. 3). ICA values along the backbone ranged from 0.1 to 0.58, while ICA values in many of the nested clades were lower and ranged from −0.05 to 0.25. The node of Kolkwitzia amabilis showed the lowest value (ICA = −0.05). The ICA values calculated here are notably low, indicating a great deal of underlying gene tree conflict (Figs. 3 and S2). Based on the PhyParts analysis, for Zabelia + Morinoideae, 417 of the 713 gene trees were concordant with this relationship while 181 were in conflict (Fig. 3). Furthermore, multiple conflicting placements were observed, suggesting that ILS is likely also at play here.
For the ASTRAL tree, the Quartet Concordance (QC) values of the backbone clades were positive (0.17-1), indicating moderate to strong support for the relationships among backbone clades. However, the Quartet Differential (QD) scores were close to 0, which meant that there was no skew in the proportions of discordant trees. In addition, Quartet Informativeness (QI) was high for these clades (QI = 1 or close to 1), indicating that nearly all replicates were informative for the given branch (Fig. S3). A similar pattern was found in the nuclear concatenated RAxML tree, with positive QC values (0.23 to 1), QD values close to 0, and high QI values (0.99-1) for the backbone clades (Fig. S3). Slight differences in some relationships were observed between the concatenation and ASTRAL analyses of the nuclear genes (e.g., the positions of Kolkwitzia amabilis and Dipsacus japonicus; Figs. 3 and 4), and these differences were largely confined to areas of strong support.
Plastid dataset
Phylogenetic analysis of the cpDNA dataset also recovered the seven major clades in Caprifoliaceae with high support (Fig. 4). There was substantial conflict between the cpDNA and nuclear trees (cf. Figs. 3 and 4). The clade Zabelia + Morinoideae and the clade Linnaeoideae were recovered as sister with relatively strong support (BS = 87%), and together these were sister to the clade Valerianoideae + Dipsacoideae (Fig. 4). For the chloroplast tree, QS showed strong QC values (0.37-1), low QD values (close to 0) and high QI values (0.94-1) (Fig. S5). This indicates that a strong majority of quartets support the focal branches, with low skew in discordant frequencies and nearly all replicates informative for the relationships of these clades.
The plastid analyses (Fig. 4) placed Diabelia sister to Dipelta Maxim. (consistent with Wang et al., 2020) with moderate support (BS = 85, Fig. 1), and together these were sister to Kolkwitzia amabilis (Fig. 4). However, the nuclear concatenation tree is consistent with the species tree in placing Kolkwitzia amabilis sister to Abelia + Diabelia (Fig. 1). Additional instances of cytonuclear discordance included the placements of Zabelia, Morinoideae, Dipsacoideae and Valerianoideae (Fig. 4).
3.3 Coalescent simulations analysis
Coalescent simulations under the organellar model did not produce gene trees that resembled the observed chloroplast tree. When the simulated gene trees were summarized on the observed chloroplast tree, most clade frequencies were near zero; for instance, Kolkwitzia amabilis and the clade Valerianoideae + Dipsacoideae, Zabelia + Morinoideae and the clade Linnaeoideae, Valerianoideae and Dipsacoideae (Fig. S6). This suggested that ILS alone cannot explain the high level of cytonuclear discordance observed in Caprifoliaceae.
3.4 Species network analysis
A network with three reticulation events had the lowest AICc value (Table 2), suggesting hybridization events in the ancestor of Zabelia biflora, the ancestor of Morina longifolia, and the ancestor of Scabiosa techiliensis (Figs. 5 and S7). The inferred inheritance probabilities showed that Morinoideae (represented by Morina longifolia) had a genetic contribution of 24.2% from an ancestral lineage of Vesalea and Linnaea (Fig. 5). They also indicate that Zabelia (represented by Zabelia biflora) received 26.6% of its genome from an ancestral lineage of Morinoideae (Fig. 5), and that Dipsacoideae (represented by Scabiosa techiliensis) had a genomic contribution of 59.2% from an ancestral lineage of Vesalea and Linnaea (Fig. 5).
Best supported species network of the selective nuclear dataset inferred with PHYLONET. Numbers next to the hybrid branches indicate inheritance probabilities. Red lines represent minor hybrid edges (edges with an inheritance contribution < 0.50).
3.5 Divergence time estimation
Divergence time estimates based on the nuclear gene tree suggested that the deepest divergences in Caprifoliaceae occurred in the late Cretaceous, whereas most generic-level diversification occurred in the middle Eocene (Fig. 6). The divergence between Dipsacoideae and Valerianoideae was dated to 46.88 Ma (95% Highest Posterior Density (HPD) = 37.44–57.35 Ma). The diversification of Linnaeoideae was inferred to be at 53.83 Ma (95% HPD = 37.37–55.73 Ma). Within Linnaeoideae, Abelia and Kolkwitzia originated almost contemporaneously. The onset of Zabelia and Morinoideae diversification occurred between 26.51 and 53.72 Ma. The divergence times estimated from the different data matrices were not completely consistent (Figs. 6 and S8). Time estimates based on the plastid gene tree are shown in Fig. S8. For instance, the age of the split between Dipsacoideae and Valerianoideae was estimated at 53.61 Ma (95% HPD = 41.44–65.57 Ma), and the diversification of Linnaeoideae was inferred to begin at 47.81 Ma (95% HPD = 34.21–52.55 Ma).
BEAST analysis of divergence times based on the nuclear alignment. Calibration points are indicated by A, B, and C. Numbers 1–11 represent major divergence events in Caprifoliaceae; mean divergence times and 95% highest posterior densities are provided for each.
3.6 Character evolution
The likelihood inference of character evolution using the cpDNA tree detected some homoplasy in each of the six morphological characters examined (Figs. 7, 8 and 9), and the style exertion relative to the corolla showed particularly high homoplasy (Fig. 7). Morphological trait mapping suggested six hypotheses for character evolution in Caprifoliaceae: (1) Except for Caprifolioideae and Valerianoideae, most subfamilies of Caprifoliaceae have four stamens. The number of stamens changed in a relatively parsimonious manner, from five in most Caprifolioideae and Diervilloideae to four in the bulk of Caprifoliaceae, with further reductions to three and one within Valerianoideae (Fig. 7); (2) The style exertion character showed a high level of homoplasy in the early diversification of the family. Even in the broad Linnaeoideae, the state “not exceeding corolla” originated twice, once in Vesalea and once in the Diabelia-Dipelta-Kolkwitzia-Abelia clade; (3) The ancestral fruit type for Caprifoliaceae is uncertain, but it is most likely an achene (Fig. 8). Nevertheless, the distribution of fruit types among the basal lineages was complex, and inclusion of broader outgroup taxa is needed to test the achene fruit type as the most likely ancestral state; (4) The state of three carpels is common within Caprifoliaceae. There was much variation among the early diverged lineages, but the state of three carpels was inferred to be ancestral in the family and other states were largely derived from the three-carpel state; (5) One seed is common and was inferred as the ancestral character state for Caprifoliaceae. Similar to the number of carpels, variation of this character is high in Caprifoliaceae, with five character states. Nevertheless, the character evolution was relatively parsimonious with only low levels of homoplasy; (6) The epicalyx occurs only in two major lineages, showing a case of convergent evolution (Figs. 7, 8 and 9).
Likelihood inference of character evolution in Caprifoliaceae using Mesquite v.2.75 based on plastid matrix. Left, Number of stamens; Right, Style exertion.
Likelihood inference of character evolution in Caprifoliaceae using Mesquite v.2.75 based on plastid matrix. Left, fruit type; Right, Number of carpels.
Likelihood inference of character evolution in Caprifoliaceae using Mesquite v.2.75 based on plastid matrix. Left, number of seeds; Right, epicalyx presence/absence.
A summary of character states relevant to the taxonomy of the group, mapped on the nuclear gene tree, is shown in Figs. S9, S10 and S11. The patterns of character evolution inferred from the cpDNA tree and the nuclear gene tree were similar.
4 Discussion
4.1 Phylogenetic incongruence and putative hybridization
Although both our nuclear and plastid phylogenies supported the same seven major clades of Caprifoliaceae, the relationships among these clades are incongruent between data sets (Figs. 3 and 4). For instance, in the nuclear ASTRAL tree, Linnaeoideae is recovered as sister to Dipsacoideae (except for Dipsacus japonicus) + Valerianoideae (Fig. 3), while in the plastid tree Linnaeoideae is sister to Zabelia + Morinoideae (Fig. 4). In contrast, in the nuclear RAxML concatenated tree (Fig. 3), Linnaeoideae is recovered as sister to Dipsacoideae + Valerianoideae. Some of these points of conflict pertain to areas of Caprifoliaceae phylogeny that have long been problematic, for example, the relationships between Zabelia and other subfamilies. Our results reevaluated Caprifoliaceae phylogeny with more extensive evidence from the nuclear genome, because many previous results inferred from the plastome may be incorrect or incompletely understood due to evolutionary processes such as ILS or organellar capture via hybridization.
Three main processes can lead to gene tree heterogeneity and cytonuclear discordance: gene duplication/extinction, horizontal gene transfer/hybridization, and ILS. Many methods are currently available to detect gene tree discordance (e.g., Smith et al., 2015; Pease et al., 2018); however, the sources of such discordance remain hard to disentangle, especially when multiple processes co-occur (e.g., Morales-Briones et al., 2020a).
In previous studies, Zabelia was long thought to be closely related to Abelia (Hara, 1983; Tang & Lu, 2005). However, based on molecular datasets, Tank and Donoghue (2010) and Jacobs et al. (2011) found that Zabelia was sister to Morinaceae. Using six molecular loci and inflorescence morphology, Landrein et al. (2012) concluded that the position of Zabelia remained unclear. The molecular investigation of Xiang et al. (2019) found that the sister relationship between Zabelia + Morinaceae and Linnaeaceae + Valerianaceae + Dipsacaceae was not highly supported. Such phylogenetic incongruence provides the opportunity to test causal hypotheses of cytonuclear discordance, e.g., ILS or hybridization. Further, in our analyses (Fig. 4), widespread cytonuclear discordance exists across Caprifoliaceae, especially at the genus level, with a high level of conflict within genera. Regarding deep Caprifoliaceae relationships, the results from the nuclear analyses (Figs. 3 and 4) showed multiple instances (at least two) of well-supported conflict with the results from the plastome (Fig. 4), and the plastid results were largely consistent with previous plastid and large-scale analyses of Caprifoliaceae (Wang et al., 2020).
It is worth mentioning that Dipsacoideae was recovered as non-monophyletic only in the species tree (Fig. 3), in which Dipsacus japonicus was sister to Linnaeoideae. The nodes with strong LPP (LPP = 1) also had low ICA scores (ICA = 0.1), which suggests that ILS and/or unidentified hybrid lineages continue to obscure our understanding of relationships in Dipsacoideae. Our ICA scores and QS analyses of the nuclear dataset revealed strong signals of gene tree discordance among the seven major clades of Caprifoliaceae.
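To make the ICA values quoted above easier to interpret, the following is a minimal sketch of the internode certainty all (ICA) calculation for a single internode, assuming the commonly cited entropy-style definition (one minus the normalized entropy of the relative frequencies of the competing bipartitions). The function name and the input format (a plain list of gene tree counts) are illustrative assumptions, not the software used in this study, and the sketch omits refinements such as frequency thresholds or negative values assigned when the reference bipartition is not the most frequent one.

```python
import math


def internode_certainty_all(counts):
    """Sketch of ICA for one internode.

    counts: numbers of gene trees supporting each competing (mutually
    conflicting) bipartition at this internode, in any order.
    Returns a value near 1 when one bipartition dominates and near 0
    when all competing bipartitions are equally frequent.
    """
    counts = [c for c in counts if c > 0]  # ignore unsupported bipartitions
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one bipartition with support")
    if n == 1:
        return 1.0  # no conflicting bipartition observed
    total = sum(counts)
    probs = [c / total for c in counts]
    # Entropy is taken with logarithm base n so the result is scaled to [0, 1].
    return 1.0 + sum(p * math.log(p, n) for p in probs)


if __name__ == "__main__":
    print(round(internode_certainty_all([90, 10]), 3))
    print(round(internode_certainty_all([50, 50]), 3))
```

Under this definition, an ICA near 0.1 corresponds to competing bipartitions occurring at nearly equal frequencies, which is why a node can combine high LPP with low ICA.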
The concordance analysis and ICA scores showed a large amount of conflict between individual gene trees and the species trees. Our coalescent simulations also suggested that the observed cytonuclear discordance cannot be explained by ILS alone. Previous studies reported that hybridization has shaped the evolutionary history of Caprifoliaceae (e.g., Heptacodium miconioides) (Zhang et al., 2003; Landrein et al., 2002). The extensive analyses performed here revealed a similar pattern of cytonuclear discordance, e.g., some species were recovered in different positions between the nuclear and plastid phylogenies (see Fig. 4 for the positions of Dipsacus japonicus, Kolkwitzia amabilis, and the clade of Zabelia and Morinoideae).
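For intuition about what "explained by ILS alone" means, the sketch below evaluates the standard three-taxon multispecies coalescent result: with an internal branch of length t in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)exp(-t), and each of the two discordant topologies occurs with probability (1/3)exp(-t). This is a textbook special case, not the simulation procedure used in this study, and the function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities under ILS only (three taxa).

    t: length of the single internal branch in coalescent units
       (generations divided by twice the effective population size
       for a diploid nuclear locus).
    Returns (p_concordant, p_each_discordant).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    p_each_discordant = math.exp(-t) / 3.0
    p_concordant = 1.0 - 2.0 * p_each_discordant
    return p_concordant, p_each_discordant


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        conc, disc = three_taxon_gene_tree_probs(t)
        print(f"t={t}: concordant={conc:.3f}, each discordant={disc:.3f}")
```

Under ILS alone the two discordant topologies are expected at equal frequency and discordance fades quickly as the internal branch lengthens, so a conflicting topology that is rarely or never produced by simulated gene trees points to an additional process such as hybridization.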
Our analyses showed that both ancient reticulation and ILS might be at play in the initial radiation of Caprifoliaceae. The results indicated that the parental contributions to the reticulation events were unequal. Solís-Lemus et al. (2017) suggest that inheritance probabilities of ~0.10 from a parental population to a reticulate node may indicate introgression, and that inheritance probabilities close to 0.50 may indicate that the hypothesized reticulate node is the product of hybrid speciation between parental populations. With regard to the Zabelia clade, the inheritance contributions (0.266 and 0.734) support a hybridization event between Zabelia and the ancestral lineage of the Morina clade (Fig. 5). The second and third reticulation events reveal that there has been extensive gene flow between the Scabiosa clade and the Morina clade as well as the Vesalea-Linnaea clade (Fig. 5). The network analyses inferred Zabelia, Morina and Scabiosa to be putative hybrid lineages. Furthermore, the coalescent simulations indicated extensive ILS, which could be the product of a rapid radiation in the backbone of Caprifoliaceae.
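As a reading aid for the inheritance probabilities reported above, the following sketch takes one inheritance probability at a reticulate node, derives the minor parental contribution, and reports which of the two reference values mentioned in the text (about 0.10 or about 0.50) it lies closer to. The function name and the nearest-reference rule are assumptions made purely for illustration; they are not a published classification rule, and intermediate values such as those found for Zabelia remain a matter of interpretation.

```python
def summarize_inheritance(gamma, introgression_ref=0.10, hybrid_ref=0.50):
    """Crude illustrative summary of an inheritance probability.

    gamma: inheritance probability from one parent (between 0 and 1).
    Returns (minor_contribution, nearest_reference_value).
    The nearest-reference comparison is an assumption for illustration only.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    minor = min(gamma, 1.0 - gamma)  # the smaller parental contribution
    if abs(minor - introgression_ref) < abs(minor - hybrid_ref):
        nearest = introgression_ref
    else:
        nearest = hybrid_ref
    return minor, nearest


if __name__ == "__main__":
    # Values reported in the text for the Zabelia reticulation.
    print(summarize_inheritance(0.734))
```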
4.2 Temporal divergences of Caprifoliaceae
Our estimated ages using nuclear and chloroplast trees are generally younger than those of Wang et al. (2015) and Wang et al. (2020) based on two reliable fossils (Li et al., 2019). We found that the diversification and global spread of the subfamilies of Caprifoliaceae occurred during the late Cretaceous, Paleocene and Eocene (Fig. 6), similar to the results of Beaulieu et al. (2013). Our results showed a very short node connecting Linnaeoideae with Zabelia + Morinoideae in the backbone, which was supported in both the nuclear and cpDNA trees (Figs. 6 and S8). Linnaeoideae diverged from Zabelia + Morinoideae after the K-Pg boundary. Our results are congruent with the phenomena reported in several other plant groups such as Amaranthaceae s.l. (Morales-Briones et al., 2020a) and legumes (Koenen et al., 2020), and in lichenized fungi such as Lobariaceae (Ascomycota) (Widhelm et al., 2019). It is generally accepted that soon after the K-Pg boundary, due to mass extinctions, new habitats became available and diverse organisms experienced rapid diversifications (Schulte et al., 2010). Our results therefore also reveal a wave of evolutionary radiation shortly after the K-Pg boundary (Fig. 6). As a result of tectonic movements and historical climate fluctuations from the Paleocene to the Eocene, the Caprifoliaceae lineages subsequently underwent rapid diversifications. The stem lineages of most genera were dated to the Oligocene and Miocene, and most within-genus diversifications were dated to the Miocene and Pliocene (Fig. 6). Our results may be explained by the hypothesis that members of the Caprifoliaceae are well adapted to relatively cool environments (Friis, 1985; Manchester and Donoghue, 1995; Manchester, 2000), and an increase in the Earth’s temperature may have forced them to move to higher altitudes or latitudes. As plants moved to higher altitudes, their distributions were likely to become fragmented, resulting in isolation between populations.
We have some evidence to support this hypothesis: (1) this family is mainly distributed in the north temperate zone, and some genera (such as Linnaea) even reach areas near the Arctic Circle; (2) there are numerous species (such as Valeriana officinalis, Lonicera rupicola, and L. spinosa) with island-like distributional patterns at relatively high altitudes. Lineages surviving in isolation may have flourished after the late Oligocene, particularly during the Miocene with a shift into new, especially mountainous, geographic areas, and then struggled again during recent climatic cooling and glacial activities (Moore and Donoghue, 2007). Global events (e.g., ancient orogenic and monsoon-driven events) might have led to the diversification of Caprifoliaceae, as reported in other taxa (Lu et al., 2018; Ding et al., 2020). For example, some genera or taxa (e.g., Linnaea, Lonicera myrtillus) may have benefited from the global cooling and drying of the Miocene and Pliocene, and these taxa usually possess tiny, narrow or needle-like leaves, while certain lineages (Abelia, Diabelia, and Dipelta) may be more adapted to the wetter, warmer parts of the world and may not have benefited from the global cooling of the past 30 million years.
4.3 Evolution of morphological characters
The characters were traced on the cpDNA phylogeny using the ML method (Figs. 7, 8 and 9) because of the potential hemiplasy and xenoplasy produced by the discordance and hybridization detected in the nuclear backbone (Avise and Robinson, 2008; Robinson et al., 2008; Copetti et al., 2017; Wang et al., 2020). A consequence of this discordance is elevated levels of apparent homoplasy in the species tree (Copetti et al., 2017; Hahn and Nakhleh, 2017).
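The idea behind likelihood-based tracing of a discrete character can be illustrated with a minimal sketch, assuming a one-parameter Markov (Mk1-type) model in which every change between distinct states has the same rate. The example computes proportional likelihoods of each state at the ancestor of just two tips. The function names, the rate value and the branch lengths are hypothetical; the reconstructions reported here were carried out in Mesquite on the full tree, with the rate estimated from the data.

```python
import math


def mk_transition_prob(same_state, k, rate, t):
    """Transition probability under an equal-rates k-state Markov model.

    same_state: True for the probability of ending in the starting state,
                False for ending in one particular different state.
    rate: instantaneous rate of change between any two distinct states.
    t: branch length.
    """
    decay = math.exp(-k * rate * t)
    if same_state:
        return 1.0 / k + (k - 1.0) / k * decay
    return 1.0 / k - decay / k


def root_proportional_likelihoods(tip_a, tip_b, t_a, t_b, k, rate):
    """Proportional likelihood of each state at the ancestor of two tips.

    tip_a, tip_b: observed states coded 0..k-1; t_a, t_b: branch lengths.
    """
    raw = [
        mk_transition_prob(s == tip_a, k, rate, t_a)
        * mk_transition_prob(s == tip_b, k, rate, t_b)
        for s in range(k)
    ]
    total = sum(raw)
    return [x / total for x in raw]


if __name__ == "__main__":
    # Two tips sharing state 0 of a binary character (hypothetical values).
    print(root_proportional_likelihoods(0, 0, 1.0, 1.0, k=2, rate=0.1))
```

When both descendants share a state and branches are short relative to the rate, that state dominates at the ancestor; when the tips differ and the branches are equal, the states are equally likely, which is the kind of ambiguity seen at some deep nodes.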
Stamen number, fruit type, style exertion, number of carpels, number of seeds and epicalyx presence have traditionally been used for generic recognition within Caprifoliaceae (Backlund, 1996; Donoghue et al., 2003; Yang and Landrein, 2011; Landrein et al., 2020). Discordance among morphological traits might plausibly arise from either variable convergent selection pressures or other phenomena such as hemiplasy. The evidence indicates that the probability of hemiplasy is high for the four character traits in Caprifoliaceae: the branches leading to lineages with derived character states are uniformly short and show high levels of gene tree discordance. It is possible that gene flow contributes to these patterns. For example, the ancestral stamen number states (i.e., 2 and 4) found in Morina longifolia and Acanthocalyx alba within the Morinoideae clade could be due to introgressed alleles, as we identified putative introgression events between those lineages (Fig. 5). Morphological and anatomical studies showed that the earliest Caprifoliaceae had monosymmetric flowers (probably weakly so at first) with larger calyx lobes, tubular corollas, elongate styles, and capitate stigmas (Donoghue et al., 2003). Within Caprifoliaceae, the main change in stamen number is a reduction from five to four stamens. Subsequently, there was a reduction to two stamens within Morinaceae and to three, two, and one within Valerianaceae (Figs. 7 and S9). These variations may be related to an underlying change in floral symmetry (Donoghue et al., 2013). Increasing symmetry may relate to carpel abortion or to differences in the arrangement of flowers at the level of the inflorescence.
Similar to the other five character traits, our data suggest that carpel number is also affected by hemiplasy: in most relevant internodes, the inferred ancestral state of carpel number is inconsistent with carpel number transitions generally following phylogenetic relationships. Our results suggest that multiple independent events of carpel evolution have occurred in Caprifoliaceae (Figs. 8 and S10). In Caprifoliaceae, the abortion of two of the three carpels and the development of just a single ovule within the remaining fertile carpel was evidently correlated with fruit type (Wilkinson, 1949). For some subfamilies of Caprifoliaceae, carpel abortion occurs at a relatively late stage of ovary development, so many species have two empty chambers at fruit maturity (e.g., Linnaeoideae, Morinoideae, and Valerianoideae). In fact, in some species, these empty compartments have been co-opted in various ways in connection with dispersal (e.g., inflated for water dispersal in some Valeriana).
Caprifoliaceae shows great variation in fruit types. Fleshy, bird-dispersed fruits are limited to the Caprifolieae Dumort. (Donoghue et al., 2003). It is important to note that the ancestral carpel number for Caprifoliaceae is most likely three. Lonicera has berries, though generally with just a few seeds embedded in copious pulp. There is programmed carpel abortion, and the number of seeds corresponds to the number of fertile carpels. In Symphoricarpos, two of the four carpels abort, and there are two stones. The mesocarp in these cases is rather dry and mealy in texture. In the Caprifoliaceae, achenes with a single seed are present in Heptacodium and in the large Linnaeoideae clade (though in Dipelta and in Linnaea there are two seeds at maturity). From the standpoint of fruit evolution, the linkage of Heptacodium within Caprifolioideae implies either the independent evolution of achenes or a transition from achenes to fleshy fruits in the line leading to Caprifolioideae. Among the achene-producing Caprifoliaceae, there are various adaptations for wind dispersal. One of the most striking of these modifications is enlargement of the calyx lobes into wings as the fruits mature (e.g., in Abelia, Dipelta, and Diabelia). Especially well known is the production of a feathery pappus-like structure in species such as Valeriana officinalis and Centranthus ruber in Valerianoideae. This modification facilitates passive external transport by animals. A similar case is also found in Kolkwitzia.
The reconstruction of character evolution thus shows that some characters that were once considered important for taxonomy within the family have been inferred to be the results of homoplasious evolution (Gould 2000; Pyck, 2001; Bell 2001, 2004; Carlson et al., 2009; Zhai et al., 2019). In character evolution analysis, homoplasy is regarded as evolutionary noise that, if not properly accommodated, jeopardizes phylogenetic reconstructions using morphological characters. At the same time, hemiplasy is one of the causes of homoplasy (Copetti et al., 2017). The phenomenon of hemiplasy is most plausible when the internodal distances in a phylogenetic tree are short (relative to effective population sizes) (Robinson et al., 2008). This may explain why it has been difficult to reconstruct the relationships among the major lineages and genera of the family. Eventually, more extensive sampling and developmental studies will be needed to elucidate the mechanisms underlying the morphological evolutionary patterns outlined here.
4.4 Recognition of Zabelioideae as a new subfamily in Caprifoliaceae
Despite the strong signals of gene tree discordance, our nuclear and plastid phylogenies strongly supported seven major clades in Caprifoliaceae: Linnaeoideae, Zabelia, Morinoideae, Valerianoideae, Dipsacoideae, Caprifolioideae and Diervilloideae, and show Zabelia as the sister to the morphologically highly distinct Morinoideae (Figs. 3 & 4). Our analyses supported reticulate evolution concerning the origins of both the Zabelia lineage as well as the Morinoideae. Based on the phylogenomic and morphological analyses, we herein propose to recognize Zabelia as representing a new subfamily of Caprifoliaceae.
Zabelioideae B. Liu & S. Liu ex H.F. Wang, D.F. Morales-B, M.J. Moore & J. Wen, subfam. nov.
Type: Zabelia (Rehder) Makino.
Description: Shrubs, deciduous; old branches often with six deep longitudinal grooves. Leaves opposite, entire or dentate at margin; estipulate; petioles of opposite leaf pairs dilated and connate at base, enclosing axillary buds. Inflorescence a congested thyrse of cymes; cymes 1-3-flowered. Calyx 4- or 5-lobed, persistent, spreading. Corolla 4- or 5-lobed, hypocrateriform, ± zygomorphic; corolla tube cylindrical. Stamens 4, included, didynamous. Ovary 3-locular, 2 locules with 2 series of sterile ovules and 1 locule with a single fertile ovule; stigmas green, capitate, mucilaginous. Fruit an achene crowned with persistent and slightly enlarged sepals. Basic chromosome number x = 9.
One genus and six species distributed in China, Japan, Korea, Afghanistan, NW India, Kyrgyzstan, Nepal, and Russian Far East.
Zabelioideae is highly distinct morphologically from its sister Morinoideae. They can be easily distinguished by their habit (Zabelioideae are shrubs, Morinoideae are herbs), the six distinct longitudinal grooves on twigs and branches of Zabelioideae (absent in Morinoideae), and the epicalyx (absent in Zabelioideae and present in Morinoideae). Zabelioideae and Morinoideae share some similarities in pollen micromorphology, as both have psilate pollen grains with an endocingulum (Verlaque, 1983; Kim et al., 2001; Jacobs et al., 2011). The two subfamilies diverged in the early-mid Eocene (Figs. 6, S7), and their long evolutionary history, associated with deep hybridization events, ILS and extinctions, has likely made it difficult to determine their phylogenetic placements.
5 Conclusions
Gene tree discordance has been commonly observed in phylogenetic studies. Increasing evidence has shown that the species tree method is inconsistent in the presence of gene flow (Solís-Lemus et al., 2016; Long and Kubatko, 2018), which suggests that ILS and gene flow need to be considered simultaneously when reconstructing phylogenetic relationships. First, our results show clear evidence of cytonuclear discordance and extensive conflict between individual gene trees and species trees in Caprifoliaceae. Second, the short node connecting Linnaeoideae with Zabelioideae + Morinoideae was dated to after the K-Pg boundary, which supports a rapid radiation of Caprifoliaceae species at that time, as reported in other plant taxa. Third, the temporal diversification of Caprifoliaceae provides a good case of the evolutionary radiation and adaptation of a dominantly north temperate plant family to climatic changes from the late Cretaceous to the late Cenozoic. Finally, based on evidence from molecular phylogeny, divergence times, and morphological characters, we herein recognize the Zabelia clade as representing a new subfamily, Zabelioideae, in Caprifoliaceae. The phylogenetic framework also provides important insights into character evolution in Caprifoliaceae.
Author contributions
H.F.W. and J.W. conceived the study. H.F.W. and D.F.M-B. performed the research and analyzed the data. H.X.W. and H.F.W. wrote the manuscript; D.F.M-B. and J.W. revised the manuscript.
Simplified ML tree generated from the nuclear gene data showing the distribution of selected character states. The asterisks indicate Maximum likelihood bootstrap support of 100%.
ASTRAL-II species tree; node label indicates internode certainty all (ICA) scores.
Results of simulation testing of the Quartet Sampling of the ASTRAL trees. Node labels indicate QC/Quartet Differential (QD)/Quartet Informativeness (QI) scores.
Results of simulation testing of the Quartet Sampling of the nuclear concatenated RAxML trees. Node labels indicate QC/Quartet Differential (QD)/Quartet Informativeness (QI) scores.
Results of simulation testing of the Quartet Sampling of the chloroplast trees. Node labels indicate QC/Quartet Differential (QD)/Quartet Informativeness (QI) scores.
Phylogeny of the plastid DNA dataset; numbers above branches represent clade frequencies of the simulated gene trees.
Best species networks of the selective nuclear dataset estimated with PhyloNet with one (A), two (B), three (C) and four (D) hybridization events. Blue branches connect the hybrid nodes. Numbers next to blue branches indicate inheritance probabilities.
BEAST analysis of divergence times based on the cpDNA data. Calibration points are indicated by A, B, and C. Numbers 1–10 represent major divergence events in Caprifoliaceae; mean divergence times and 95% highest posterior densities are provided for each.
Likelihood inference of character evolution in Caprifoliaceae using Mesquite v.2.75 based on nuclear matrix. Left, Number of stamens; Right, Style exertion.
Likelihood inference of character evolution in Caprifoliaceae using Mesquite v.2.75 based on nuclear matrix. Left, Style of fruit; Right, Number of carpels.
Likelihood inference of character evolution in Caprifoliaceae using Mesquite v.2.75 based on nuclear matrix. Left, number of seeds; Right, epicalyx presence/absence.
Appendix 1. The baits developed based on the transcriptomes of Lonicera japonica, Valeriana officinalis, Viburnum odoratissimum, Sambucus canadensis, Symphoricarpos sp. and Dipsacus asper from Caprifoliaceae in 1KP used in this study.
Acknowledgement
The work was funded by National Scientific Foundation of China (31660055). We appreciate Gabriel Johnson for his help with the target enrichment experiment, and the United States National Herbarium for permission for sampling some collections. We acknowledge the staff in the Laboratories of Analytical Biology at the National Museum of Natural History, the Smithsonian Institution for support and assistance.