Phylogenomic analyses using genomes and transcriptomes do not “resolve” relationships among major clades in Phrymaceae

Premise of the study Phylogenomic datasets using genomes and transcriptomes provide rich opportunities beyond resolving bifurcating phylogenetic relationships. Monkeyflower (Phrymaceae) is a model system for evolutionary ecology. However, it lacks a well-supported phylogeny for a stable taxonomy and for macroevolutionary comparisons. Methods We sampled 24 genomes and transcriptomes in Phrymaceae and closely related families, including eight newly sequenced transcriptomes. We reconstructed the phylogeny using IQ-TREE and ASTRAL, evaluated gene tree discordance using PhyParts, Quartet Sampling, and cloudogram, and carried out phylogenetic network analyses using PhyloNet and HyDe. We searched for whole genome duplication (WGD) events using chromosome numbers, synonymous distance, and gene duplication events. Key results Most gene trees support the monophyly of Phrymaceae and each of its tribes. Most gene trees also support the tribe Mimuleae being sister to Phrymeae + Diplaceae + Leucocarpeae, with extensive gene tree discordance among the latter three. Despite the discordance, polyphyly of Mimulus s.l. is strongly supported, and no particular reticulation event among the Phrymaceae tribes is well supported. Reticulation likely occurred among Erythranthe bicolor and close relatives. No ancient WGD event was detected in Phrymaceae. Instead, small-scale duplications are among potential drivers of macroevolutionary diversification of Phrymaceae. Conclusions We show that analysis of reticulate evolution is sensitive to taxon sampling and methods used. We also demonstrate that genome-scale data do not always fully “resolve” phylogenetic relationships. They present rich opportunities to investigate reticulate evolution, and gene and genome evolution involved in lineage diversification and adaptation.


INTRODUCTION 54
Phylogenomic analyses using genomes and transcriptomes often focus on "resolving" relationships that 55 previously show conflict among markers or low statistical support using Sanger sequencing. With 56 thousands of genes, these datasets are rich in insights beyond providing a bifurcating phylogenetic 57 relationship, pointing to reticulate evolution, gene/genome duplications, and adaptation. However, 58 detecting these events are difficult due to computational limitations, and studies often do not fully 59 interrogate the data. 60

102
In this study we sampled transcriptomes and genomes covering four of the five tribes in 103 Phrymaceae, including eight newly generated transcriptomes, to 1) provide a phylogenetic backbone and 104 examine gene tree discordance and 2) investigate patterns of gene and genome duplication in the family. 105 We found that most gene trees support Mimuleae being sister to Phrymeae+Diplaceae+Leucocarpeae, 106 with the relationship among the latter three showing extensive gene tree discordance. However, no 107 particular reticulation event among the Phrymaceae tribes was strongly supported. Instead, we found 108 evidence for reticulation events among closely related species contributing to Erythranthe bicolor. Our 109 analyses did not support any ancient WGD events in Phrymaceae; instead, small-scale gene duplications 110 in defense, stress response, growth and development, and certain biochemistry pathways are candidates 111 for potential drivers underlie macroevolutionary diversification of Phrymaceae. 112

114
Taxon sampling-Transcriptomes were newly generated from eight accessions representing seven 115 ingroup Phrymaceae species for this study (Appendix S1). Seeds were collected from natural populations 116 and were cold-treated in soil for a week at 4°C in the dark before growing with 15-hour daylight in a 117 greenhouse. Young leaves and flower buds were flash frozen in liquid nitrogen. RNA extraction, library 118 preparation (rRNA removal or poly-A enrichment), and sequencing procedures are detailed in Appendix 119 S1. In addition, we included three genomes and five transcriptomes from Phrymaceae that are publicly 120 Mirarab, 2018) with the 'per-gene' mode and a false positive error rate threshold (α) of 0.001. The 139 resulting trees were visually inspected, and deep paralogs producing internal branch lengths longer than 140 0.25 were cut apart retaining subclades with a minimum of 20 taxa to obtain final homolog trees. 141 Orthology inference was carried out using the 'monophyletic outgroup' approach from Yang and 142 Smith (2014). The approach filters unrooted homolog trees, requiring outgroups to be monophyletic and 143 single-copy. It then traverses the ingroup clade rooted by outgroups from root to tip and trims the side 144 with less taxa each time a gene duplication event was detected, until every taxon is represented by a 145 single sequence. We set the three Lamiaceae genomes as outgroups (Zhang et al., 2020), keeping ortholog 146 groups that included at least 15 taxa. 147

Species tree inference and evaluation of support -Sequences from individual ortholog groups 149
were aligned using OMM_MACSE. Columns with more than 20% missing data were trimmed with Phyx, 150 and only alignments with at least 1000 characters and all 24 taxa were retained. We first estimated a 151 To explore discordance among gene trees, we calculated the number of concordant and discordant 159 bipartitions on each node of the species tree using PhyParts (Smith et al., 2015). We mapped bipartitions 160 from gene trees with local BS support of at least 50% against the IQ-TREE tree from the concatenated 161 supermatrix (identical to the ASTRAL tree; see results). Next, to distinguish conflict from poorly 162  The resulting plastomes with one Inverted Repeat removed were aligned with MAFFT and 182 columns with more than 50% missing data were trimmed with Phyx. An ML tree was inferred with IQ-183 TREE with automated extended model selection and 1000 rapid BS replicates. Additionally, we used QS 184 with 1,000 replicates to evaluate branch support. For each of the two areas, we first ran PhyParts using all taxa from the reduced dataset. We then 191 removed one taxon at a time to determine which taxon produced the highest gene tree conflict. We  Fig. 2B) and the backbone relationships were identical to the nuclear results (Fig.  276 2A, Appendix S5). However, relationships among closely related ingroup taxa differ in two places (Fig. 2,  277 A vs. B): 1) The two Erythranthe cardinalis accessions were sister to each other in the nuclear tree but 278 were paraphyletic with E. bicolor nested among them in the cpDNA tree; 2) relationships among 279 among nuclear genes, low quartet concordance, and dominant secondary topologies in the cpDNA tree, 281 and conflicting topology between cpDNA and nuclear trees. In addition, extensive nuclear gene tree 282 conflict and discordance between nuclear and cpDNA trees are present among other Lamiales families 283 The second instance of conflict we focused on was among Erythranthe bicolor, E. cardinalis, and 306 E. lewisii. Consistent with the cloudogram (Fig. 2C) where some gene trees supported E. bicolor being 307 sister to E. lewisii, removing either E. bicolor or E. lewisii (Fig. 3B) removed most of the gene tree 308 conflicts. PhyloNet ML analysis (Appendix S6) recovered networks with 38-48% of E. bicolor genes 309 from E. lewisii or its close relatives. AIC or AICc did not prefer any network, while BIC preferred the 1-310 reticulation network (Appendix S7). PhyloNet MCMC searches recovered a 95% credible set of three 1-311 reticulation networks with gene flow towards E. bicolor, but the source of gene flow varied among 312 networks (Appendix S9). The MPP (54% of the credible set) showed that E. bicolor had 26% genes from 313 E. lewisii, similar to the 1-reticulation ML network (Fig. 3B). HyDe, on the other hand, did not recover 314 any significant hybridization events.

327
Although we also detected extensive gene tree conflict among Erythranthe pardalis, E. nasuta, E. 328 guttata, and E. glaucescens, as branches among them were short and we lacked any intraspecific 329 sampling, we did not carry out additional analyses on reticulate evolution. 330 331 Gene and Genome Duplications -Mapping gene duplication events did not reveal any node 332 with more than 4.3% of gene duplications in Phrymaceae (Appendix S10). Similarly, the Ks plots 333 (Appendix S11) did not support any Phrymaceae-specific WGD that involved more than one species 334 sampled. All 24 datasets included in this study shared two optimal mixing components (i.e. Ks peaks). 335 The first component had a Ks mean of 1.8-2.2, corresponding to a whole-genome triplication event early 336 The ten Phrymaceae genes (Appendix S12) with the highest numbers of copies were involved in 348 defense/immune response (Serine protease inhibitor, aspartyl protease, MLP-like protein), stress response 349 (HSP20-like chaperones, Ribosomal protein L10 family protein), mitochondria organization (prohibitin 350 2), regulating plant growth (small auxin up-regulated RNA-like auxin-responsive protein family), cell 351 wall architecture (Glycosyl hydrolase), and various other biochemical processes (S-adenosyl-L-352 methionine-dependent methyltransferases, hydroxymethyltransferase 4). Since we reduced sequences 353 from the same sample that formed monophyletic or paraphyletic groups, we ignored isoforms from 354 alternative splicing, assembly artifacts, and recent copy number increase involving only a single sample, 355 as these are difficult to quantify using de novo assembled transcriptomes. Therefore, only gene 356 duplication events involving more than two taxa in our sampling contributed to our copy number counts. 357 Given our much denser taxon sampling in Leucocarpeae and Diplaceae, the top ten are heavily influenced 358 by genes that had multiple rounds of gene duplications in these two tribes. phylogenomic analyses recovered strong support of the monophyly of Phrymaceae and each of its tribes 363 sampled. We also recovered extensive and well-supported gene tree discordance along the backbone of 364 Phrymaceae. The discordance is not an artifact of gene and genome duplications; nor is any particular 365 reticulation event well supported by phylogenetic network analyses and hypothesis testing. Therefore, 366 phylogenetic uncertainty, ILS, and population structure likely contributed to the extensive gene tree 367 discordance among Phrymaceae tribes. 368 Among closely related species, our phylogenetic network analyses support introgression from E. 369 lewisii or close relatives towards E. bicolor. In addition, our plastome analysis (Fig. 2B)  In summary, our phylogenetic analyses suggest that: 1) network inferences are sensitive to the 378 methods used, sources of data, and taxon sampling, including the choice of both ingroups and outgroups. Phrymaceae. Our study provides initial insights into the gene space of species across Phrymaceae and 395