## Abstract

A major goal of evolutionary biology is to identify key evolutionary transitions that correspond with shifts in speciation and extinction rates. Here we test the association of transitions in plant mating system with shifts in diversification rates using a novel stochastic character mapping method that identifies the timing and nature of both transitions in trait evolution and diversification rate shifts over evolutionary trees. Utilizing a state-dependent speciation and extinction (SSE) model and a densely sampled fossil-calibrated phylogeny of the plant family Onagraceae, we confirm long standing theory that self-compatible lineages have higher extinction rates and lower net diversification rates compared to self-incompatible lineages. Furthermore, our results provide the first empirical evidence for the “senescing” diversification rates predicted in highly selfing lineages: our mapped character histories show that the loss of self-incompatibility is followed by a short-term spike in speciation rates, which declines after a time lag of several million years resulting in negative net diversification. Lineages that have long been self-compatible such as *Fuchsia* and *Clarkia* are in a previously unrecognized and ongoing evolutionary decline.

## 1 Introduction

Evolutionary biologists have long sought to identify key evolutionary transitions that drive the diversification of life (Szathmary and Smith 1995; Sanderson and Donoghue 1996). One such evolutionary transition is the loss of self-incompatibility (SI) in flowering plants (Stebbins 1974; Grant 1981; Barrett 2002). The majority of flowering plants are hermaphrodites, and SI is the genetic system that encourages outcrossing and prevents self-fertilization. Independent transitions to self-compatibility (SC) have occurred repeatedly across the angiosperm phylogeny (Igic et al. 2008) and within the plant family Onagraceae (Raven 1979). Despite the repeated loss of SI, outcrossing is widespread and prevalent in plants, an observation that led Stebbins to hypothesize that SC was an evolutionary dead-end (Stebbins 1957). Stebbins proposed that over evolutionary time SC lineages will have higher extinction rates due to reduced genetic variation and an inability to adapt to changing conditions. However, Stebbins also speculated that SC is maintained by providing a short-term advantage in the form of reproductive assurance. The ability of SC lineages to self reproduce has long been understood to be potentially beneficial in droughts and other conditions where pollinators are rare (Darwin 1876) or after long distance dispersal when a single individual can establish a new population (Baker 1955).

One method that could be used to test such hypotheses about the short-term versus long-term consequences of evolutionary transitions is stochastic character mapping on a phylogeny (Nielsen 2002; Huelsenbeck et al. 2003). While most ancestral state reconstruction methods estimate states only at the nodes of a phylogeny, stochastic character mapping explicitly infers the timing and nature of each evolutionary transition along the branches of a phylogeny. However, current approaches to stochastic character mapping have two major limitations: the commonly used rejection sampling approach proposed by Nielsen (2002) is inefficient for characters with large state spaces (Huelsenbeck et al. 2003; Hobolth and Stone 2009), and more importantly current methods only apply to models of character evolution that are finite state substitution processes. While the first limitation has been partially overcome through uniformization techniques (Rodrigue et al. 2008; Irvahn and Minin 2014), a novel approach is needed for models with infinite state spaces, such as models that specifically test the association of character state transitions with shifts in diversification rates. These models describe the joint evolution of both a character and the phylogeny itself, and define a class of widely used models called state-dependent speciation and extinction models (SSE models; Maddison et al. 2007; FitzJohn 2012; Goldberg and Igić 2012; Freyman and Höhna 2017).

In this work we introduce a method to sample character histories directly from their joint distribution, conditional on the observed tip data and the parameters of the model of character evolution. The method is applicable to standard finite state Markov processes of character evolution and also more complex SSE models that are infinite state Markov processes. The method does not rely on rejection sampling and does not require complex data augmentation (Van Dyk and Meng 2001) schemes to handle unobserved speciation/extinction events. Our implementation directly simulates the number, type, and timing of diversification rate shifts and character state transitions on each branch of the phylogeny. Thus, when applying our method together with a Markov chain Monte Carlo (MCMC; Metropolis et al. 1953) algorithm we can sample efficiently from the posterior distribution of both character state transitions and shifts in diversification rates over the phylogeny.

To illustrate the usefulness of our method to sample stochastic character maps from SSE models, we applied the method to examine the association of diversification rate shifts with mating system transitions over a densely sampled fossil calibrated phylogeny of Onagraceae. Recent studies have reported higher net diversification rates for SI lineages, supporting Stebbins’ dead-end hypothesis (Goldberg et al. 2010; Ferrer and Good 2012). However, explicit phylogenetic tests for increased extinction rates in SC lineages are limited to the plant family Solanaceae, where increased rates of speciation in SC lineages were offset by higher extinction rates, leading to lower overall rates of net diversification in SC lineages compared to SI lineages (Goldberg et al. 2010). In the study by Goldberg et al., the association of mating system transitions with shifts in extinction and speciation rates was tested using the Binary State Speciation and Extinction model (BiSSE; Maddison et al. 2007). More recently, BiSSE has been shown to be prone to falsely identifying a positive association when diversification rate shifts are associated with another character *not* included in the model (Maddison and FitzJohn 2015; Rabosky and Goldberg 2015). One approach to reduce the possibility of falsely associating a character with diversification rate heterogeneity is to incorporate a second, unobserved character into the model (i.e., a Hidden State Speciation and Extinction (HiSSE) model; Beaulieu and O’Meara 2016). The changes in the unobserved character’s state represent background diversification rate changes that are not correlated with the observed character. Our work here is the first to apply a HiSSE-type model to test Stebbins’ dead-end hypothesis. We additionally use simulations and Bayes factors (Kass and Raftery 1995) to evaluate the false positive error rate of our model.
Most notably, we employ our novel stochastic character mapping method to reconstruct the timing of both diversification rate shifts and transitions in mating system over the Onagraceae phylogeny. We test the hypothesis that SC lineages have higher extinction and speciation rates yet lower net diversification rates compared to SI lineages, and investigate the short-term versus long-term macroevolutionary consequences of the loss of SI.

## 2 Methods

### 2.1 Stochastic Character Mapping Method

The primary steps of the novel stochastic character mapping algorithm introduced here are illustrated in Fig. 1. A pseudocode formulation of the algorithm is provided in the Supporting Information (Alg. S1). Additionally, Supporting Information Fig. S1 gives a side by side comparison of the standard stochastic character mapping algorithm as originally described by Nielsen (2002) and the approach introduced in this work. In standard stochastic character mapping the first step is to traverse the tree post-order (tip to root) calculating the conditional likelihood of the character being in each state at each node using Felsenstein’s pruning algorithm (Felsenstein 1981). Transition probabilities are computed along each branch using matrix exponentiation. Ancestral states are then sampled at each node during a pre-order (root to tip) traversal. Finally, character histories are repeatedly simulated using rejection sampling for each branch of the tree.
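The final step of the standard approach, rejection sampling of a branch history, can be sketched as follows. This is a minimal illustration rather than the RevBayes implementation: the function names, the rate matrix values, and the `max_tries` safeguard are all hypothetical. A history is simulated forward from the sampled start state and kept only if it ends in the sampled end state.

```python
import random


def simulate_branch_history(Q, start_state, branch_length, rng):
    """Simulate one continuous-time Markov chain history along a branch.

    Q is a rate matrix (rows sum to zero). Returns (events, end_state),
    where events is a list of (time, new_state) pairs.
    """
    t = 0.0
    state = start_state
    events = []
    while True:
        rate_out = -Q[state][state]
        if rate_out <= 0.0:
            break  # absorbing state: no further changes are possible
        t += rng.expovariate(rate_out)  # waiting time to the next change
        if t >= branch_length:
            break
        # Choose the new state in proportion to the off-diagonal rates.
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        events.append((t, state))
    return events, state


def rejection_sample_history(Q, start_state, end_state, branch_length, rng,
                             max_tries=100000):
    """Repeat forward simulation until the history ends in end_state."""
    for _ in range(max_tries):
        events, final_state = simulate_branch_history(
            Q, start_state, branch_length, rng)
        if final_state == end_state:
            return events
    raise RuntimeError("no history accepted within max_tries")


# Hypothetical two-state example: one accepted history from state 0 to state 1.
example_Q = [[-0.5, 0.5], [0.3, -0.3]]
example_history = rejection_sample_history(example_Q, 0, 1, 2.0, random.Random(1))
```

The acceptance probability equals the transition probability between the two end states, which is why this scheme becomes inefficient when that probability is small, as happens with large state spaces.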

In our new stochastic character mapping algorithm we begin similarly by traversing the tree post-order and calculating conditional likelihoods. However, instead of using matrix exponentiation we calculate the likelihood using a set of ordinary differential equations. We numerically integrate these equations for every arbitrarily small time interval along each branch and store a vector of conditional likelihoods for the character being in each state for every small time interval. The two functions we must numerically integrate are $D_{N,i}(t)$, which is defined as the probability that a lineage in state $i$ at time $t$ evolves into the observed clade $N$, and $E_i(t)$, which is the probability that a lineage in state $i$ at time $t$ goes extinct before the present, or is not sampled at the present. The equations for these two probabilities are given as Supporting Information Eq. S1 and Eq. S2. Note these equations are identical to the ones describing the Cladogenetic State Speciation and Extinction model (ClaSSE; Goldberg and Igić 2012), which all other discrete SSE models are nested within.
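The backward-time integration step can be sketched under simplifying assumptions. The sketch below does not reproduce Eq. S1 and Eq. S2: it assumes the nested special case in which speciation never changes the character state (as in BiSSE), and it uses a fixed-step fourth-order Runge-Kutta integrator. The function names and the parameter values in the usage note are hypothetical.

```python
def sse_derivatives(D, E, lam, mu, Q):
    """Backward-time derivatives of D_i(t) and E_i(t) for an SSE model in
    which speciation does not change the character state (assumption)."""
    n = len(D)
    dD = [0.0] * n
    dE = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total_rate = lam[i] + mu[i] + sum(Q[i][j] for j in others)
        dD[i] = (-total_rate * D[i]
                 + sum(Q[i][j] * D[j] for j in others)
                 + 2.0 * lam[i] * E[i] * D[i])
        dE[i] = (mu[i] - total_rate * E[i]
                 + sum(Q[i][j] * E[j] for j in others)
                 + lam[i] * E[i] * E[i])
    return dD, dE


def integrate_branch(D0, E0, lam, mu, Q, branch_length, n_steps):
    """Integrate from the younger end of a branch towards the root.

    Returns a list of (D, E) pairs, one per small time interval boundary,
    so the conditional likelihoods are stored along the whole branch.
    """
    n = len(D0)
    h = branch_length / n_steps

    def f(y):
        dD, dE = sse_derivatives(y[:n], y[n:], lam, mu, Q)
        return dD + dE

    y = list(D0) + list(E0)
    history = [(y[:n], y[n:])]
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = f([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (b + 2.0 * c + 2.0 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        history.append((y[:n], y[n:]))
    return history
```

For example, `integrate_branch([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [[0.0, 0.0], [0.0, 0.0]], 1.0, 100)` returns 101 stored `(D, E)` pairs for a tip observed in the first state with complete sampling. Storing every interval, rather than only the value at the older end of the branch, is what later allows states to be drawn interval by interval.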

At the tips of the phylogeny (time $t = 0$) the extinction probabilities are $E_i(0) = 1 - \rho$ for all $i$, where $\rho$ is the sampling probability of including that lineage. For lineages with the observed state $i$, the initial condition is $D_{N,i}(0) = \rho$. The initial condition for all other states $j$ is $D_{N,j}(0) = 0$. When a node $L$ is reached, the probability of it being in state $i$ is calculated by combining the probabilities of its descendant nodes $M$ and $N$ as such:

$$D_{L,i}(t) = \sum_{j} \sum_{k} \lambda_{ijk} D_{M,j}(t) D_{N,k}(t),$$

where the rate of a lineage in state $i$ splitting into two lineages in states $j$ and $k$ is $\lambda_{ijk}$. Letting $\mathcal{X}$ represent the observed tip data, $\Psi$ an observed phylogeny, and $\theta_q$ a particular set of character evolution model parameters, then the likelihood is given by:

$$P(\mathcal{X}, \Psi \mid \theta_q) = \sum_{i} \pi_i D_{R,i}(t),$$

where $\pi_i$ is the root frequency of state $i$ and $D_{R,i}(t)$ is the likelihood of the root node being in state $i$ conditional on having given rise to the observed tree $\Psi$ and the observed tip data $\mathcal{X}$ (Maddison et al. 2007; FitzJohn 2012).
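The combination of descendant probabilities at a node and the weighting by root frequencies can be sketched as follows. The function names and the nested-list layout `lam[i][j][k]` for the cladogenetic speciation rates are assumptions made for illustration only.

```python
def combine_at_node(D_M, D_N, lam):
    """Conditional likelihoods at a node from its two descendants.

    lam[i][j][k] is the rate at which a lineage in state i splits into
    daughters in states j and k (hypothetical nested-list layout).
    """
    n = len(D_M)
    return [sum(lam[i][j][k] * D_M[j] * D_N[k]
                for j in range(n) for k in range(n))
            for i in range(n)]


def root_likelihood(D_R, pi):
    """Likelihood as the root-frequency-weighted sum of root probabilities."""
    return sum(p * d for p, d in zip(pi, D_R))
```

With two states and no state change at speciation, only `lam[0][0][0]` and `lam[1][1][1]` are non-zero, and each node probability reduces to the product of the two descendant probabilities for that state times its speciation rate.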

We then sample a complete character history during a pre-order tree traversal in which the root state is first drawn from the marginal likelihoods at the root, and then states are drawn for each small time interval moving towards the tips of the tree, conditioned on the state of the previous small time interval. We must again numerically integrate over a set of differential equations during this root-to-tip tree traversal. This integration, however, is performed in forward-time, thus a different and new set of differential equations must be used. These forward-time equations are expressed in terms of the rate of anagenetic change from state $i$ to $j$, $Q_{ij}$, and the rate of extinction in state $i$, $\mu_i$. In the Supporting Information we derive these forward-time differential equations. We demonstrate how the forward-time equations correctly handle non-reversible models of character evolution and validate the forward-time computation of $D_{N,i}(t)$ and $E_i(t)$. With this approach we can directly sample character histories from an SSE process in forward-time, resulting in a complete stochastic character map sample without the need for rejection sampling or uniformization (Fig. 1).

#### 2.1.1 Implementation

The stochastic character mapping method described here is implemented in C++ in the software RevBayes (Höhna et al. 2014, 2016). The RevGadgets R package (available at `https://github.com/revbayes/RevGadgets`) can be used to generate plots from RevBayes output. Scripts to run all RevBayes analyses presented here can be found in the repository at `https://github.com/wf8/onagraceae`.

### 2.2 Onagraceae Phylogenetic Analyses

DNA sequences for Onagraceae and Lythraceae were mined from GenBank using SUMAC (Freyman 2015). Lythraceae was selected as an outgroup since previous molecular phylogenetic analyses place it sister to Onagraceae (Sytsma et al. 2004). Information about the alignments and GenBank accessions used can be found in the Supporting Information. Phylogeny and divergence times were inferred using RevBayes (Höhna et al. 2016). Details regarding the fossil and secondary calibrations, the model of molecular evolution, and MCMC analyses are given in the Supporting Information.

### 2.3 Analyses of Mating System Evolution

The mating systems of Onagraceae species were scored as either self-compatible or self-incompatible following Wagner et al. (2007).

#### 2.3.1 HiSSE Model

To test whether diversification rate heterogeneity is associated with shifts in mating system or changes in other unmeasured traits, we used a model with 4 states that describes the joint evolution of mating system as well as an unobserved character with hidden states *a* and *b* (Fig. 2). Since the RNase-based gametophytic system of self-incompatibility found in Onagraceae is ancestral for all eudicots (Steinbachs and Holsinger 2002), we used an irreversible model that only allowed transitions from self-incompatible to self-compatible. For each of the 4 states we estimated speciation (λ) and extinction (μ) rates. While estimating diversification rates, we accounted for uncertainty in phylogeny and divergence times by sampling 200 trees from the posterior distribution of trees. For details on priors used and the MCMC analyses see the Supporting Information.
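The structure of the anagenetic part of this 4-state model can be sketched as a rate matrix. This is an illustration under stated assumptions: the state ordering, the function name, and the numerical rates used in the usage note are hypothetical, and the sketch assumes that mating system and hidden state never change simultaneously and that the hidden-state switching rates are shared between SI and SC lineages.

```python
# Hypothetical state ordering: mating system (i = SI, c = SC) plus hidden state.
STATES = ["ia", "ib", "ca", "cb"]


def build_rate_matrix(q_ic, q_ab, q_ba):
    """Rate matrix for an irreversible SI -> SC model with hidden states."""
    idx = {s: k for k, s in enumerate(STATES)}
    Q = [[0.0] * 4 for _ in range(4)]
    for mating in "ic":
        # Hidden-state switching within each mating system.
        Q[idx[mating + "a"]][idx[mating + "b"]] = q_ab
        Q[idx[mating + "b"]][idx[mating + "a"]] = q_ba
    for hidden in "ab":
        # Loss of self-incompatibility; the reverse rate stays at zero.
        Q[idx["i" + hidden]][idx["c" + hidden]] = q_ic
    for r in range(4):
        Q[r][r] = -sum(Q[r][c] for c in range(4) if c != r)
    return Q
```

Calling `build_rate_matrix(0.1, 0.2, 0.3)` gives a matrix whose SC rows contain no rate back to either SI state, which is what makes the model irreversible.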

#### 2.3.2 Model Comparisons and Error Rates

To test whether diversification rate heterogeneity was *not* associated with shifts in mating system, we calculated a Bayes factor (Kass and Raftery 1995) to compare the mating system dependent diversification model described above with a mating system independent diversification model. The independent model had 4 states and the same parameters as the dependent model, except that the speciation and extinction rates were fixed so they only varied between the hidden states *a* and *b*. Hence, $\lambda_{ca}$ was fixed to equal $\lambda_{ia}$, $\lambda_{cb}$ was fixed to $\lambda_{ib}$, $\mu_{ca}$ was fixed to $\mu_{ia}$, and $\mu_{cb}$ was fixed to $\mu_{ib}$.

To evaluate the false positive error rate we performed a series of simulations that tested the power of our models to reject false associations between shifts in mating system and diversification rate shifts. Trees were simulated under a BiSSE model, and then diversification *independent* binary characters representing mating system were simulated over the trees. For each simulation replicate Bayes factors were calculated to compare the fit of the mating system dependent diversification model and mating system independent diversification model. Details on the simulations are provided in the Supporting Information.

All Bayes factors were calculated using the stepping stone method (Xie et al. 2010; Höhna et al. 2017) as implemented in RevBayes. Marginal likelihood estimates were run for 50 path steps and 19000 generations within each step. The Bayes factor was then calculated as twice the difference in the natural log marginal likelihoods (Kass and Raftery 1995).
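The last step can be written out explicitly. In this small sketch the function names and the marginal likelihood values in the usage note are hypothetical; the cutoffs of 6 and 10 are the "strong" and "very strong" thresholds of Kass and Raftery (1995).

```python
def two_ln_bayes_factor(ln_ml_dependent, ln_ml_independent):
    """Twice the difference in natural log marginal likelihoods."""
    return 2.0 * (ln_ml_dependent - ln_ml_independent)


def support_category(two_ln_bf):
    """Coarse label using the cutoffs of Kass and Raftery (1995)."""
    if two_ln_bf > 10.0:
        return "very strong"
    if two_ln_bf > 6.0:
        return "strong"
    return "below strong"
```

For instance, hypothetical log marginal likelihoods of -1000.0 (dependent) and -1009.95 (independent) give a value of about 19.9, which falls in the "very strong" category. Positive values favour the dependent model.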

## 3 Results

### 3.1 Onagraceae Phylogeny

In our estimated phylogeny, all currently recognized Onagraceae genera (Wagner et al. 2007) were strongly supported to be monophyletic with posterior probabilities > 0.98. The crown age of Onagraceae was estimated to be 98.8 Ma (94.0 Ma – 107.3 Ma 95% HPD; Fig. 3), and a summary of the divergence times of major clades within Onagraceae can be found in Supporting Information Table S3.

### 3.2 Stochastic Character Maps

Under the state-dependent diversification model, repeated independent losses of SI across the Onagraceae phylogeny were found to be associated with shifts in diversification rates (Fig. 3). Additionally, transitions between the unobserved character states *a* and *b* were also associated with diversification rate heterogeneity. Uncertainty in the timing of diversification rate shifts and character state transitions was generally low, but increased along long branches where there was relatively little information regarding the exact timing of transitions (Fig. 4). Following the loss of self-incompatibility, there was an evolutionary time lag (mean 1.97 My) until net diversification (speciation minus extinction) turned negative (Fig. 5). In many cases the loss of self-incompatibility occurred in an ancestral lineage followed by multiple shifts to negative net diversification in descendant lineages. To account for these non-independent time lags we divided the time during ancestral lineages evenly among the individual shifts to negative net diversification in descendant lineages.

### 3.3 Diversification Rate Estimates

Within either hidden state (*a* or *b*) SC lineages had generally higher speciation and extinction rates compared to SI lineages (Fig. 3). SC lineages in state *a* had a speciation rate of 0.12 (0.02 – 0.23 95% HPD) compared to 0.16 (0.09 – 0.24 95% HPD) in SI lineages in state *a*. For SC lineages in state *b* the speciation rate was 1.66 (0.98 – 2.41 95% HPD) compared to 0.65 (0.45 – 0.85 95% HPD) in SI lineages in state *b*. Similarly, SC lineages in state *a* had an extinction rate of 0.35 (0.25 – 0.48 95% HPD) compared to 0.04 (0.00 – 0.09 95% HPD) in SI lineages in state *a*. For SC lineages in state *b* the extinction rate was 1.36 (0.65 – 2.19 95% HPD) compared to 0.10 (0.00 – 0.29 95% HPD) in SI lineages in state *b*.

Despite higher speciation and extinction rates, SC lineages had lower net diversification compared to SI lineages. Net diversification was found to be negative for most but not all extant SC lineages. The net diversification rate for SC lineages in state *a* was −0.23 (−0.32 – −0.14 95% HPD), compared to 0.13 (0.05 – 0.19 95% HPD) in SI lineages in state *a*. For SC lineages in state *b* the net diversification rate was 0.30 (0.15 – 0.46 95% HPD), compared to 0.55 (0.39 – 0.71 95% HPD) in SI lineages in state *b*.

### 3.4 Model Comparisons and Error Rates

The state-dependent diversification model of mating system evolution (Fig. 2) was “decisively” supported over the state-independent diversification model with a Bayes factor (2*ln*BF) of 19.9 (Jeffreys 1961). Bayes factors calculated using simulated datasets showed that the false positive error rate was low (Figure 6). The false positive rate for “strong” support (2*ln*BF > 6; Kass and Raftery 1995) was 0.05, and the false positive rate for “very strong” support (2*ln*BF > 10; Kass and Raftery 1995) was 0.0.

## 4 Discussion

The stochastic character map results reveal that the loss of SI has different short-term and long-term macroevolutionary consequences. Lineages with relatively recent losses of SI like *Epilobium* are undergoing a burst in both speciation and extinction rates with a positive net diversification rate. However, lineages that have long been SC such as *Fuchsia* (Tribe Circaeeae) and *Clarkia* are in a previously unrecognized evolutionary decline. These lineages went through an increase in both speciation and extinction rates a long time ago, after the loss of SI, but now only the extinction rates remain elevated and the speciation rates have declined, resulting in negative net diversification. The stochastic character maps quantify the speed of this evolutionary decline in SC lineages; while the mean time until evolutionary decline was 1.97 My, there was a large amount of variation in time estimates (Fig. 5). This variation could be due to differences in realized selfing/outcrossing rates of different lineages. Lineages with higher selfing rates likely build up load due to weakly deleterious mutations more quickly, leading to a more rapid mutational meltdown and eventual evolutionary decline. Furthermore, even if mutational load is low, the loss of genetic variation in highly selfing lineages will reduce the probability that such lineages can respond adequately to natural selection, such as imposed by a changing or new environment, thus increasing potential for extinction.

These results confirm long-standing theory about the macroevolutionary consequences of SC (Darwin 1876; Stebbins 1957). These consequences include the increased probability of going extinct due to the accumulation of harmful mutations (Lynch et al. 1995a,b) and an increased rate of speciation which may be driven by higher among-population differentiation and reproductive assurance that facilitates colonization of new habitats (Baker 1955; Hartfield 2016). The advantages of reproductive assurance may explain why transitions to SC occur repeatedly (Igic et al. 2008; Lande and Schemske 1985). However, our results reveal that this advantage is short-lived; the burst of increased speciation following the loss of SI eventually declines, possibly due to failing to adapt to changing conditions and the accumulation of deleterious mutations. The overall macroevolutionary pattern is one in which SC lineages undergo rapid bursts of increased speciation that eventually decline, doomed by intensified extinction and thus supporting Stebbins’ hypothesis of SC as an evolutionary dead-end (Stebbins 1957). These results provide the first empirical evidence for the “senescing” diversification rates predicted in highly selfing lineages by Ho and Agrawal (2017), who proposed that primarily selfing lineages may at first diversify at higher rates than outcrossing lineages but over time slow down due to elevated extinction rates.

Our findings corroborate previous analyses performed in the plant family Solanaceae (Goldberg et al. 2010), where SC lineages were also found to have higher speciation and extinction rates yet lower net diversification. Our results, however, are the first to show that this pattern is supported even when other unmeasured factors affect diversification rate heterogeneity. Intuitively it is clear that no single factor drives all diversification rate heterogeneity in diverse and complex clades such as Onagraceae. Indeed, in some lineages of *Oenothera*, the loss of sexual recombination and segregation due to extensive chromosome translocations (a condition called Permanent Translocation Heterozygosity) is associated with increased diversification rates (Johnson et al. 2011). Furthermore, other factors such as polyploidy and shifts in habitat, growth form, or life cycle may impact diversification rates (Mayrose et al. 2011; Donoghue 2005; Eriksson and Bremer 1992).

Stochastic character mapping of state-dependent diversification can be a powerful tool for examining the timing and nature of both shifts in diversification rates and character state transitions on a phylogeny. Character mapping reveals which stages of the unobserved character a lineage goes through; e.g. after the loss of self-incompatibility transitions are predominantly from hidden state *b* to *a*, representing shifts from positive net diversification to negative net diversification. Furthermore, character mapping infers the state of the lineages in the present and so reveals which tips of the phylogeny are currently undergoing positive or negative net diversification. If used with an SSE model in which all states are hidden (no observed states) our method will “paint” the location of shifts in diversification rate regimes over the tree. Distributions of character map samples could be used for posterior predictive assessments of model fit (Nielsen 2002; Bollback 2006; Höhna et al. 2017) and for testing whether multiple characters coevolve (Huelsenbeck et al. 2003; Bollback 2006). Our hope is that these approaches enable researchers to examine the macroevolutionary impacts of the diverse processes shaping the tree of life with increasing quantitative rigor.

## S2 Onagraceae Phylogenetic Analyses

### S2.1 Methods

#### S2.1.1 Supermatrix Assembly

DNA sequences for Onagraceae and Lythraceae were mined from GenBank using SUMAC (Freyman 2015). Lythraceae was selected as an outgroup since previous molecular phylogenetic analyses place it sister to Onagraceae (Sytsma et al. 2004). SUMAC assembled an 8 gene supermatrix (7 chloroplast loci plus the nuclear ribosomal internal transcribed spacer region) representing a total of 340 taxa. Table S1 summarizes the genes used, their length, and the percent of missing data. Sequences were aligned using MAFFT v7.123b (Katoh and Standley 2013). The default settings in MAFFT were used except that proper sequence polarity was ensured by using the direction adjustment option. Alignments were then concatenated resulting in chimeric operational taxonomic units (OTUs) that do not necessarily represent a single individual.

#### S2.1.2 Phylogenetic Analyses

Divergence times and phylogeny were jointly estimated using RevBayes (Höhna et al. 2014, 2016). Estimates were time calibrated using six node calibrations: four stem fossil calibrations, one crown fossil calibration, and a secondary calibration for the root split between Onagraceae and Lythraceae (Table S2). An uncorrelated lognormal relaxed clock model was used, and each of the eight gene partitions was assigned an independent GTR substitution model (Tavaré 1986; Rodriguez et al. 1990). Rate variation across sites was modeled under a gamma distribution approximated by four discrete rate categories (Yang 1994). The constant rate birth-death-sampling tree prior (Nee et al. 1994; Yang and Rannala 1997) was used with the probability of sampling species at the present (*ρ*) set to 0.27. *ρ* was calculated by dividing the number of extant species sampled in the supermatrix (340) by the sum of the number of species recognized in Onagraceae (~ 650) and in Lythraceae (~ 620).
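As a quick check of the sampling fraction, using the approximate species counts quoted above:

$$\rho = \frac{340}{650 + 620} = \frac{340}{1270} \approx 0.27$$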

Four independent MCMC analyses were performed. Each MCMC ran for 15000 generations, where each generation consisted of 837 randomly scheduled Metropolis-Hastings moves. This resulted in four chains that each performed a total of 12,555,000 MCMC steps. Samples of the posterior distribution were drawn every 10 generations, and the first 50% of samples from each chain were discarded as burnin resulting in 750 trees sampled from each of the 4 independent chains. Convergence was assessed by ensuring the effective sample size of each parameter was over 200 for each independent chain. The maximum a posteriori (MAP) tree was then calculated from the combined 3000 tree samples of all 4 chains.
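The sample counts in this paragraph follow from simple bookkeeping, reproduced here as a short check. The variable names are illustrative, and all numbers are taken from the text above.

```python
generations = 15000          # generations per chain
moves_per_generation = 837   # Metropolis-Hastings moves per generation
sample_every = 10            # sampling interval in generations
burnin_fraction = 0.5        # fraction of samples discarded per chain
n_chains = 4

total_steps = generations * moves_per_generation            # 12,555,000
samples_per_chain = generations // sample_every             # 1500
kept_per_chain = samples_per_chain - int(burnin_fraction * samples_per_chain)
combined_samples = kept_per_chain * n_chains                # 3000
```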

### S2.2 Results

All Onagraceae genera described in Wagner et al. (2007) were recovered as monophyletic clades in the MAP summary tree with posterior probabilities > 0.95 (Figure S6). Onagraceae was found to diverge from Lythraceae at 111.3 My (95% HPD interval 106.0 - 116.6 My). Divergence time estimates of other major clades and 95% HPD intervals can be seen in Table S3.

## S3 Mating System Evolution Analyses

### S3.1 Model Priors

Model parameter priors are listed in Table S4. The rate of loss of self-incompatibility ($q_{ic}$), and the rates of switching between hidden states *a* and *b* ($q_{ab}$ and $q_{ba}$) were each given an exponential distribution with a mean of $n/\Psi_l$, where $\Psi_l$ is the length of the tree $\Psi$ and $n$ is the expected number of transitions. $n$ was given an exponential hyperprior with a mean of 20.
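A draw from this hierarchical prior can be sketched as follows. This is not the RevBayes model specification: the function name and the tree length in the usage note are hypothetical, and Python's `random.expovariate` takes a rate, so each mean is inverted.

```python
import random


def draw_transition_rate(tree_length, rng, hyper_mean=20.0):
    """Draw (n, rate): n from the hyperprior, then a rate with mean n / tree_length."""
    n = rng.expovariate(1.0 / hyper_mean)    # expected number of transitions
    prior_mean = n / tree_length             # mean of the rate prior
    rate = rng.expovariate(1.0 / prior_mean)
    return n, rate
```

For a hypothetical tree length of 5000 time units, `draw_transition_rate(5000.0, random.Random(0))` returns one joint draw of the expected number of transitions and the corresponding rate. Scaling by tree length keeps the prior expectation in units of transitions over the whole tree.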

The speciation and extinction rates were drawn from exponential priors with a mean equal to an estimate of the net diversification rate $\hat{d}$. Under a constant rate birth-death process not conditioning on survival of the process, the expected number of lineages at time $t$ is given by:

$$E(N_t) = N_0 e^{td},$$

where $N_0$ is the number of lineages at time 0 and $d$ is the net diversification rate $\lambda - \mu$ (Nee et al. 1994; Höhna 2015). Therefore, we estimate $\hat{d}$ as:

$$\hat{d} = \frac{\ln N_t - \ln N_0}{t},$$

where $N_t$ is the number of lineages in the clade that survived to the present, $t$ is the age of the root, and $N_0 = 2$. The root state probabilities $\pi$ were set to start the process equally in either self-incompatible hidden state *a* or self-incompatible hidden state *b*.
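The estimate of the net diversification rate used for the prior mean follows from solving the expected number of lineages under a constant rate birth-death process for the rate. A minimal sketch, with a hypothetical function name and hypothetical input values in the usage note:

```python
import math


def net_diversification_estimate(n_surviving, root_age, n_start=2):
    """Net diversification rate implied by growth from n_start lineages to
    n_surviving lineages over root_age time units."""
    return (math.log(n_surviving) - math.log(n_start)) / root_age
```

For example, a hypothetical clade of 200 surviving lineages with a root age of 50 time units gives `net_diversification_estimate(200, 50.0)`, which equals ln(100)/50, roughly 0.092 per time unit. Substituting this value back into the expected-growth formula with 2 starting lineages recovers 200.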

### S3.2 MCMC Analyses

To account for uncertainty in phylogeny and divergence times 200 independent MCMC analyses were performed, each sampling a tree from the posterior distribution of trees generated during the phylogenetic analyses. All outgroup (Lythraceae) lineages were pruned off. Each MCMC run drew 10000 samples from the posterior distribution, with 190 randomly scheduled Metropolis-Hastings moves per sample. The first 10% of samples from each run were discarded as burnin. For each run, all parameters had effective sample sizes greater than 200, and the mean effective sample size of the posterior across all 200 tree samples was 1161.6. Estimates of the diversification rates were made by combining samples from all 200 independent runs.

## S4 Simulations

### S4.1 Simulated Datasets

100 datasets were simulated under a model where the observed binary character was diversification rate independent yet an unobserved binary character drove background diversification rate heterogeneity. First, trees were simulated under BiSSE (Maddison et al. 2007) as implemented in the R package Diversitree (FitzJohn 2012). The binary character represented hidden states *a* and *b* with diversification rates $\lambda_a = 1.0$, $\lambda_b = 2.0$, $\mu_a = 0.4$, and $\mu_b = 0.1$. The rate of change between hidden states *a* and *b* was set to $q_{ab} = q_{ba} = 0.1$. This resulted in trees that were qualitatively similar in shape to the empirically estimated Onagraceae tree, with a mix of early diverging depauperate clades and more rapidly radiating recent clades (Figure S6). To simulate incomplete sampling, 55% of the extant tips were randomly pruned off the tree. After pruning, tree samples were discarded unless they had between 100 and 200 sampled lineages that survived to the present. This restriction ensured that the simulated datasets were not too small for reliable inference and yet not so large as to be computationally infeasible. Furthermore, we discarded datasets that had fewer than 20% of the tips in either hidden state to ensure that the trees were generated under a sufficiently heterogeneous process.

Once the trees were simulated, diversification independent binary characters were simulated over the trees. These characters represented the observed character (mating system) and so were simulated under an irreversible model where the allowed transition occurred with the rate $10/\Psi_s$, where $\Psi_s$ is the length of the simulated tree. This represents an expected 10 irreversible transitions over the length of the tree, and resulted in simulated datasets with a proportion of either state similar to the proportion of self-compatible/self-incompatible in the empirical Onagraceae dataset. These diversification independent characters were then used to calculate Bayes factors that compared the fit of the diversification dependent model to the diversification independent model of mating system. For details on how Bayes factors were calculated see the main text. The false positive error rate was calculated as the percent of simulation replicates in which the Bayes factor supported the false dependent model over the true independent model.
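The false positive calculation can be sketched as the share of replicates whose Bayes factor exceeds a chosen support threshold. The function name and the replicate values in the usage note are hypothetical and are not results from this study; the sketch returns a fraction, which is multiplied by 100 to give a percent.

```python
def false_positive_rate(two_ln_bfs, threshold):
    """Fraction of replicates whose 2lnBF exceeds the threshold, i.e. that
    support the (false) dependent model at that level."""
    exceeding = sum(1 for value in two_ln_bfs if value > threshold)
    return exceeding / len(two_ln_bfs)
```

For a made-up list of ten replicate values `[-3.0, 1.2, 7.5, 0.4, -8.0, 2.2, 11.0, 5.9, -1.1, 0.0]`, the rate is 0.2 at a threshold of 6 and 0.1 at a threshold of 10.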

## 5 Acknowledgements

Thank you to Bruce Baldwin, John Huelsenbeck, Emma Goldberg, Michael Landis, Seema Sheth, and Carl Rothfels and his lab group for discussions that have improved our work. W.A.F. was supported by grants from the National Science Foundation (GRFP DGE 1106400 and DDIG DEB 1601402). Computations were performed on the Savio computational cluster provided by the Berkeley Research Computing program at the University of California, Berkeley.