## Abstract

Hidden Markov models (HMMs) have emerged as an important tool for understanding the evolution of characters that take on discrete states. Their flexibility and biological sensibility make them appealing for many phylogenetic comparative applications.

Previously available packages placed unnecessary limits on the number of observed and hidden states that can be considered when estimating transition rates and inferring ancestral states on a phylogeny.

To address these issues, we expanded the capabilities of the R package `corHMM` to handle *n*-state and *n*-character problems and provide users with a streamlined set of functions to create custom HMMs for any biological question of arbitrary complexity. We show that increasing the number of observed states increases the accuracy of ancestral state reconstruction. We also explore the conditions for when an HMM is most effective, finding that an HMM outperforms a Markov model when the degree of rate heterogeneity is moderate to high.

Finally, we demonstrate the importance of these generalizations by reconstructing the morphology of the ancestral angiosperm flower. Exactly opposite to previous results, we find the most likely state to be a spiral perianth, spiral androecium, and whorled gynoecium. The difference between our analysis and previous studies was that our modeling allowed for the correlated evolution of several flower characters.

## 1. Introduction

Hidden Markov models (HMMs) are important centerpieces in many biological applications (Eddy, 2004; Yang Lou, 2017). They provide a natural framework for comparative biologists, particularly for relaxing assumptions about homogeneous evolution through time and across taxa without vastly increasing the number of parameters (e.g., Felsenstein & Churchill, 1996; Galtier, 2001; Penny, McComish, Charleston, & Hendy, 2001; Beaulieu, O’Meara, & Donoghue, 2013; Beaulieu & O’Meara, 2016). For instance, simple models of binary character evolution make sense for small, young clades, because a single set of transition rates seems like a reasonable assumption. However, homogeneous rates are unlikely to explain the evolution of the same character across a much larger and older clade in which transition rates may differ dramatically among subclades, perhaps due to correlations with traits that were not included in the model. This observation was the motivation for the development of the hidden rate model (HRM) of Beaulieu *et al.* (2013), which uses a hidden Markov approach to objectively locate regions of a phylogeny where hidden factors have either promoted or constrained the evolutionary process for a binary character.

Within comparative biology, HMMs have been applied as both standalone models (Beaulieu et al., 2013) and in combination with other phylogenetic models (e.g., hidden state-dependent speciation and extinction models, Beaulieu & O’Meara, 2016). Hidden Markov models can be used to address many problems in comparative biology (Siepel & Haussler, 2005) and their flexibility allows biologists to create models tailored to their specific hypotheses. However, previous implementations of HMMs for comparative methods have placed limitations on the number of observed and hidden states. For instance, the implementation of the HRM model of Beaulieu et al. (2013) is restricted only to the analysis of binary characters. There is no mathematical basis for limiting the number of observed states or hidden states in an HMM, and such constraints necessitate a simplification of datasets and candidate models.

We describe a new version of `corHMM` that implements *n*-state HMMs. This does not require new algorithms or a different likelihood function. Instead, we optimized and generalized existing code so users can create custom HMMs for any biological question of arbitrary complexity. We have also added a number of “quality of life” improvements that make `corHMM` much easier to use and interpret, including an implementation of stochastic character mapping (simmap; Bollback, 2006). Additionally, we demonstrate the effectiveness of HMMs to identify rate heterogeneity when it is present, and we outline the informational advantages of increasing the number of observed and hidden states in discrete character data sets. Finally, to demonstrate the importance of this generalization, we apply `corHMM` to reconstruct the morphology of the ancestral angiosperm flower.

## 2. Materials and Methods

### 2.1 Generalizing HMMs

From a technical standpoint, hidden Markov models have a hierarchical structure that can be broken down into two components: a “state-dependent process” (Fig. 1a,b) and an unobserved “parameter process” (Fig. 1c) (Zucchini, MacDonald, & Langrock, 2017). In comparative biology, for characters that take on discrete states the standard “state-dependent process” is a continuous-time Markov chain with finite state-space (CTMC-FS). The benefit of a Markov model is its simplicity: to calculate the probabilities of observed discrete states at the tips of a phylogeny, all that is required is a tree, a transition model describing transitions among a set of observed states, and frequencies at the root (O’Meara, 2012; Fig. 1a,b). The observed states could be any discretized trait such as presence or absence of extrafloral nectaries (Marazzi et al., 2012), woody or herbaceous growth habit (Beaulieu et al., 2013), or diet state across all animals (Román-Palacios, Scholl, & Wiens, 2019). However, a simple Markov process that assumes homogeneity through time and across taxa is often not adequate to capture the variation of real datasets (e.g., Beaulieu *et al.*, 2013). Under an HMM, observations are generated by a given state-dependent process, which in turn depends on the state of the parameter process. In other words, the observed data are the product of several processes occurring in different parts of a phylogeny and the parameter process is a way of linking them. It is initially unknown what the parameter process corresponds to biologically, hence the moniker “hidden” state. Nevertheless, the information for detecting hidden states comes from the differences in how the observed states change. As long as the transitions between observed states of different lineages are more adequately described by several Markov processes rather than a single process, there will be information to detect hidden states (see *3.1 Performance in Simulation*).

The likelihood of any HMM is obtained by maximizing the standard likelihood formula, *L* = *P*(*D*|**Q**, *T*), for observing character states, *D*, across a set of extant taxa, given the continuous-time Markov model **Q**, and a fixed topology with a set of branch lengths (denoted by *T*). For a binary character, **Q** is a 2×2 transition matrix whose entries define the transition rates between the character states, 0 and 1. To form an HMM, we expand **Q** to accommodate both observed and hidden states. Formally, the HMM can be generalized to include any number of observed states (e.g., 0, 1, 2), and hidden states (e.g., A, B, C). Following Beaulieu and O’Meara (2016), the state space is defined with *o* being the index of the observed state, *o* ∈ {0, 1, …, *α*}, and *h* the index of the hidden state, *h* ∈ {*A*, *B*, …, *β*}. Thus, a given model will have, in general, |*o*| × |*h*| states. In `corHMM`, the model **Q** is defined by amalgamating each of the state-dependent processes and the parameter process specified in the model. For example, if we have state-dependent matrices, **R**_{1} and **R**_{2}, that are related by a parameter process, **P**, where entries *r*_{R1→R2} and *r*_{R2→R1} define transition rates between the state-dependent processes, we can extend Eq. 2 of Tarasov (2019) to amalgamate these processes into a single matrix, **Q**.

This matrix can be understood as a block matrix where the diagonal blocks are the state-dependent processes **R**_{1} and **R**_{2}, and the off-diagonal blocks are the parameter process, **P**, which describes transitions between **R**_{1} and **R**_{2}. The resulting **Q** can simply be evaluated using the same likelihood formula as before. It should be noted that the tip probabilities are set to 1 for every state that is consistent with the observed data, and 0 otherwise. For example, we would set the probability to 1 for both 1A and 1B for all species exhibiting state 1.
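As a rough illustration of this block structure, the base R sketch below assembles a four-state **Q** from two hypothetical 2×2 state-dependent matrices and a pair of made-up rate-class transition rates. The helper `amalgamate`, the state labels, and all numeric values are assumptions chosen for illustration. In particular, the sketch assumes that a change of rate class leaves the observed state unchanged, so each off-diagonal block is a rate multiplied by an identity matrix. It is not the code used inside `corHMM`.

```r
# Hypothetical state-dependent processes for a binary character (states 0 and 1).
# Off-diagonal entries are transition rates; diagonals are filled in later.
R1 <- matrix(c(0,   0.1,
               0.2, 0), nrow = 2, byrow = TRUE)
R2 <- matrix(c(0,   1.0,
               2.0, 0), nrow = 2, byrow = TRUE)
r_12 <- 0.05  # assumed rate of moving from R1 to R2
r_21 <- 0.05  # assumed rate of moving from R2 to R1

# Illustrative helper (not part of corHMM): put R1 and R2 on the diagonal
# blocks and the parameter process on the off-diagonal blocks.
amalgamate <- function(R1, R2, r_12, r_21) {
  k <- nrow(R1)
  I <- diag(k)  # assumption: switching rate class keeps the observed state
  Q <- rbind(cbind(R1, r_12 * I),
             cbind(r_21 * I, R2))
  diag(Q) <- 0
  diag(Q) <- -rowSums(Q)  # each row of a rate matrix sums to zero
  states <- paste0(rep(0:(k - 1), times = 2), rep(c("A", "B"), each = k))
  dimnames(Q) <- list(states, states)
  Q
}

Q <- amalgamate(R1, R2, r_12, r_21)
print(Q)
```

In this layout a species observed in state 1 would be compatible with both columns `1A` and `1B`, which is why both receive a tip probability of 1.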

The general formulation of an HMM can easily be extended to examine the correlated evolution of multiple characters (Pagel, 1994). For example, consider a case of two binary characters where trait 1 defines the presence or absence of fleshiness of fruits, and trait 2 defines whether or not the fruits are animal-dispersed. At most there are four binary combinations of these characters (i.e., 00, 01, 10, and 11). However, the same data can also be coded as a single multistate character, where 1 = dry fruits not dispersed by animals, 2 = dry fruits dispersed by animals, 3 = fleshy fruits not dispersed by animals, and 4 = fleshy fruits dispersed by animals. The same transformation applies to two characters with different numbers of observed states. For instance, one character could be binary (e.g., dry vs. fleshy fruit) and the other could be multistate (e.g., fruits dispersed mechanically, by wind, or by animal).
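The recoding of two binary characters into one multistate character can be sketched in a few lines of base R. The species names and trait scores below are invented for illustration, and the column names are hypothetical.

```r
# Hypothetical tip data: two binary characters scored for five species.
traits <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  fleshy  = c(0, 0, 1, 1, 1),  # 0 = dry fruit, 1 = fleshy fruit
  animal  = c(0, 1, 0, 1, 1)   # 0 = not animal-dispersed, 1 = animal-dispersed
)

# Combine the two columns into one label per species ("00", "01", "10", "11").
combo <- paste0(traits$fleshy, traits$animal)

# Map each combination onto a single multistate character (1 to 4).
all_combos <- c("00", "01", "10", "11")
traits$multistate <- match(combo, all_combos)
print(traits)
```

The same pattern would extend to a binary character paired with a three-state character by listing the six possible combinations in `all_combos`.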

### 2.2 Simulation Study

We conducted a set of simulations to address two broad goals. The first was to demonstrate, from an informational standpoint, the advantage of increasing the number of observed states by comparing two-state, three-state, and four-state datasets. The second goal was to demonstrate the ability of hidden Markov models to detect varying degrees of rate heterogeneity. We then link these goals together by demonstrating that while HMMs naturally address rate heterogeneity in discrete characters, they can also recover some of the informational content of unobserved characters through the use of hidden states. These simulations are in no way exhaustive, but represent a set of reasonable questions that many beginning users might have about the behavior of HMMs.

#### 2.2.1 Increasing the number of observed characters or states

To test the behavior of two-state, three-state, and four-state datasets we relied on ancestral state reconstruction (ASR) at nodes. ASR is a widely-utilized feature of corHMM, and it is important to know the accuracy of multistate ancestral reconstructions. Additionally, using ancestral states gives us a direct means to compare models with different datasets. A 500-tip phylogeny was simulated (birth rate set to 1 event Myr^{-1}, and death rate of 0.5 events Myr^{-1}) to be used as a fixed tree. Datasets were simulated using transition rates sampled from a truncated normal distribution (μ = 1, σ = 0.5), which were then scaled to have mean rates of 0.1, 1.0, or 10 transitions Myr^{-1} by dividing the rate matrix by the sum of the diagonal and then multiplying by the desired scalar. This resulted in a range of evolutionary rates. For each transition model, 100 datasets were simulated. The transition rates of each dataset were then estimated and their maximum likelihood estimates were used to infer marginal probabilities of each character state across the tree. This procedure was repeated 10 times.
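The rate-sampling step can be sketched as follows. Several details are assumptions rather than the procedure actually used: the normal distribution is truncated at zero by rejection sampling, and the matrix is rescaled so that the mean off-diagonal rate equals the target, which is only one possible reading of the scaling described above. The function names are hypothetical.

```r
set.seed(1)

# Draw n values from a normal distribution truncated below at `lower`
# using simple rejection sampling (the truncation point is an assumption).
rtruncnorm_simple <- function(n, mean = 1, sd = 0.5, lower = 0) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rnorm(n, mean, sd)
    out <- c(out, draw[draw > lower])
  }
  out[seq_len(n)]
}

# Build a k-state rate matrix whose mean off-diagonal rate equals `target`.
make_scaled_Q <- function(k, target) {
  Q <- matrix(0, k, k)
  off <- row(Q) != col(Q)
  Q[off] <- rtruncnorm_simple(sum(off))
  Q <- Q / mean(Q[off]) * target  # assumed form of the rescaling step
  diag(Q) <- -rowSums(Q)          # rows of a rate matrix sum to zero
  Q
}

Q_slow <- make_scaled_Q(k = 3, target = 0.1)
Q_fast <- make_scaled_Q(k = 3, target = 10)
```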

An underappreciated concern with evaluating models that differ in the number of observed states is that the probability of guessing the correct state without any additional information is simply 1/*k* for a character with *k* states. This could, in theory, inflate the accuracy of datasets with fewer states even though the tip states themselves provide no information about the ancestral states when the rates are high (Schultz, Cocroft, & Churchill, 1996; Sober & Steel, 2011, 2014). To deal with this issue, we also calculated the mutual information, measured in bits, about ancestral states from each dataset and model (Cover & Thomas, 1991; Sober & Steel, 2011). Specifically, mutual information is a measure of how much ancestral state uncertainty is reduced by knowing the tip states. The initial uncertainty, or unconditional entropy, is set by the model: given a model of evolution and no knowledge of the extant tips, how uncertain is the best guess of the ancestral states? The remaining uncertainty after ASR, or conditional entropy, is given by the combination of the model and the tip states: given the model of evolution and knowledge of the extant tips, how uncertain is the best guess of the ancestral states? It is important to note that information, just like ancestral state reconstruction, depends strongly on the model of evolution, and thus any results related to information will take on the assumptions of the model.

We define information as the difference between the unconditional entropy of the node states, *H*(*X*_{v}), and the entropy of the node states conditioned on the data, *H*(*X*_{v}|*X*_{h} = *D*) (Cover & Thomas, 1991). The unconditional entropy of node *v* is defined as:

$$H(X_v) = -\sum_{i=1}^{k} \pi[X_v = i] \log_2 \pi[X_v = i],$$

where *π*[*X*_{v} = *i*] is the prior probability of a node taking a particular state. For the root, the prior depends on user choice, as there are several options (Yang, Kumar, & Nei, 1995; Pagel, 1999; FitzJohn, Maddison, & Otto, 2009). Here we assume the prior probability on the root node is the expected equilibrium frequency, *π*, which is calculated directly from the transition model by solving *π***Q** = 0. This aligns our expectation of the root node with all other internal nodes such that, in the absence of information from the tips, the probability of a particular state is assumed to be drawn from the equilibrium frequencies. In other words, the information of the tip states decreases as rates increase and, ultimately, the probability of a node state becomes completely determined by the model. We define the conditional entropy as:

$$H(X_v \mid X_h = D) = -\sum_{i=1}^{k} P[X_v = i \mid X_h = D] \log_2 P[X_v = i \mid X_h = D],$$

where *P*[*X*_{v} = *i*|*X*_{h} = *D*] is the conditional probability that a node is fixed as being in state *i* given the probability of observing the tip data (which is just the marginal probability of state *i*). In particular, we are interested in the average entropy of a node for all states *i* = 1, …, *k*, given we observe a particular dataset, *X*_{h} = *D*. Thus, the conditional entropy will vary by node, but the unconditional entropy is set by the model. To produce a measure of mutual information between the observations at the tips and estimates at internal nodes, we take the difference between the unconditional entropy and the conditional entropy and average across all nodes. However, the unconditional entropies will be greater for datasets that include more states because unconditional entropy sets the upper limit of what is possible to learn. This alone could contribute to large informational differences between models with different numbers of observed states. Therefore, we also measure the proportion of maximum information gained, i.e., the mutual information divided by the unconditional entropy.
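The calculation described in this section can be sketched in base R as follows. The function names, the symmetric two-state **Q**, and the marginal probabilities at three internal nodes are all invented for illustration. Entropy is taken in base 2 because information is reported in bits, and the equilibrium frequencies are obtained by solving *π***Q** = 0 together with the constraint that the frequencies sum to one.

```r
# Shannon entropy in bits; zero-probability states contribute nothing.
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Equilibrium frequencies: solve pi %*% Q = 0 subject to sum(pi) = 1.
equilibrium_freqs <- function(Q) {
  k <- nrow(Q)
  A <- t(Q)
  A[k, ] <- 1  # replace one redundant equation with the sum-to-one constraint
  b <- c(rep(0, k - 1), 1)
  solve(A, b)
}

# Hypothetical symmetric two-state model and made-up marginal probabilities
# at three internal nodes (rows = nodes, columns = states).
Q <- matrix(c(-1,  1,
               1, -1), nrow = 2, byrow = TRUE)
marginals <- rbind(c(0.99, 0.01),
                   c(0.80, 0.20),
                   c(0.50, 0.50))

H_uncond <- entropy_bits(equilibrium_freqs(Q))  # set by the model alone
H_cond   <- apply(marginals, 1, entropy_bits)   # varies from node to node
info     <- mean(H_uncond - H_cond)             # mean information in bits
prop     <- info / H_uncond                     # proportion of the maximum
```

In this toy case the third node has marginal probabilities equal to the equilibrium frequencies, so the tips contribute no information about it.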

#### 2.2.2 Evaluating hidden Markov models

We evaluated the ability to detect rate heterogeneity by simulating data under an HMM.

As outlined above (see *2.1 Generalizing HMMs*), there are two major axes along which an HMM differs from standard Markov models. First, we varied the magnitude of the difference in the state-dependent process by simulating data under a model where there was: (1) no difference between the state-dependent processes (**R**_{1} = **R**_{2}), (2) a 2-fold difference in rates between the state-dependent processes (e.g., if the mean rate of **R**_{1} was 1 Myr^{-1}, the mean rate of **R**_{2} would be 2 Myr^{-1}), (3) a 10-fold difference between the state-dependent processes, and (4) a covarion-like trait model in which transitions among states occur freely in one state-dependent process, whereas in the other all transition rates are zero and evolution is essentially “turned off” (Penny et al., 2001). For all simulation scenarios, we set the parameter-process to have equal transition rates between state-dependent processes. In addition to examining ancestral state reconstruction at nodes, we also used the new `makeSimmap` function to assess how well the model captures the expected number of character changes within and among all branches in the tree. For each of the 150 datasets simulated above, we evaluated 100 simmaps per model by counting the number of transitions for a given simmap.

Next, we tested the impact of the magnitude of the asymmetry in the underlying parameter-process. We simulated data where the state-dependent process always differed by 2-fold, but for the underlying parameter-process there was: (1) no difference in transition rate (*r*_{R1→R2}=*r*_{R2→R1}), (2) 1.5× faster transition rate to the slower rate class (*r*_{R1→R2}>*r*_{R2→R1}), (3) 2× faster transition rate to the slower rate class, (4) 10× faster transition rate to the slower rate class. For each of the models described, we used the same 500-tip phylogeny as before to simulate 150 datasets.
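One consequence of these asymmetry scenarios can be worked out directly. For a two-class parameter process at equilibrium, the long-run fraction of time spent in the class that is entered faster is r_in / (r_in + r_out). The sketch below applies this standard two-state result to the four bias factors listed above. The baseline rate is an arbitrary assumption, and equilibrium of the parameter process is assumed.

```r
# Bias factors from the four scenarios: rate into the favoured rate class
# divided by the rate out of it.
bias  <- c(1, 1.5, 2, 10)
r_out <- 0.1           # arbitrary baseline rate (assumption)
r_in  <- bias * r_out

# Equilibrium fraction of time in the favoured class of a two-state chain.
frac_favoured <- r_in / (r_in + r_out)
print(round(frac_favoured, 3))
```

Under these assumptions the fraction rises from one half with no bias to roughly 0.91 with a 10-fold bias, independent of the baseline rate chosen.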

Finally, we examined how much information is available when we allow for hidden states to be observed at the tips. We used the same data generated from simulations examining state-dependent differences, but this time we did not remove the hidden state. We then fit a Markov model to this full dataset and compared it to models in which the “second character” remained unobserved.

### 2.3 Case study: reconstructing the ancestral angiosperm flower

#### 2.3.1 Background

Understanding the origin of flowering plants is widely considered to be one of the most important goals of systematic botany. In a recent study, Sauquet et al. (2017) compiled an extensive database of floral characteristics and attempted to reconstruct the morphology of the ancestral angiosperm flower. The paper’s conclusions proved controversial. Sokoloff, Remizowa, Bateman, & Rudall (2018) disputed the original study’s claim that the ancestral flower had a whorled perianth, whorled androecium, and spiral gynoecium, instead preferring the hypothesis that the ancestral flower was either entirely whorled or entirely spiraled. We do not dispute the biological claims put forth by either side. However, a major limitation of the original study, by the authors’ own admission, was the fact that “no comparative method exists yet to account for the potential correlation of more than two discrete characters” (although see Beaulieu & Donoghue, 2013). This represents a major problem for the successful reconstruction of the ancestral state because flowers are highly integrated structures with potentially several developmental constraints (Sauquet et al., 2017, 2018; Sokoloff et al., 2018). Treating the phyllotaxy of the perianth, androecium, and gynoecium as independent represents a major assumption with potentially large consequences on the ancestral state reconstruction.

#### 2.3.2 Worked example and methods

We limited ourselves to including only the characters related to the phyllotaxy of the perianth, androecium, and gynoecium. Although it is possible to include other characters, given the corresponding increase of parameter space, we suspect that we would not have the power to accurately infer the model and ancestral states. An additional simplification was the exclusion of polymorphic species. Once again, corHMM is capable of analyzing such a dataset, but the rarity of these species would make maximum likelihood estimation far more uncertain. Thus, we treat the phyllotaxy of the perianth, androecium, and gynoecium as either “whorled” or “spiral”. We use the C series phylogeny of Sauquet et al. (2017) in which *Amborella* is constrained as the sister of angiosperms and *Monocotyledoneae, Ceratophyllaceae*, and *Eudicotyledoneae* are constrained to form a monophyletic group. In total, 291 taxa have a complete description of our focal characters.

In our case, we have three data columns each with two observed states. Because this dataset contains two or more columns of trait information, each column is automatically interpreted as an evolving character. In these cases, `corHMM` will also automatically remove dual transitions from the model since that would constitute two or more evolutionary events (Pagel, 1994; Maddison, Midford, Otto, & Oakley, 2007). For example, a lineage with a whorled perianth, whorled androecium, and whorled gynoecium cannot evolve directly to have a spiral perianth, spiral androecium, and spiral gynoecium. It is forced to gain the spiral state as three separate evolutionary events. In our analysis without hidden states we include three different model structures: `model="ER"` (equal rates), `model="SYM"` (symmetric rates), and `model="ARD"` (all rates differ). The other options used (`rate.cat=1` and `nstarts=10`) specify that no hidden states are to be used and that the maximum likelihood search will be performed 10 additional times with different initial parameters. For example, the `corHMM` call specifying an all rates differ model without hidden states would be:

    library(corHMM)
    CorRes_1RC.ARD <- corHMM(phy = phy, data = data, rate.cat = 1, nstarts = 10, model = "ARD")
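The removal of dual transitions can be illustrated without `corHMM` itself. The base R sketch below enumerates the eight combinations of three binary characters and marks a transition as allowed only when exactly one character changes. The coding of 0 as whorled and 1 as spiral, and the object names, are assumptions made for this example.

```r
# All combinations of three binary characters (0 = whorled, 1 = spiral).
states <- expand.grid(perianth = 0:1, androecium = 0:1, gynoecium = 0:1)
labels <- apply(states, 1, paste0, collapse = "")

# Number of characters that differ between every pair of combined states.
n_diff <- as.matrix(dist(states, method = "manhattan"))
dimnames(n_diff) <- list(labels, labels)

# A transition is allowed only if exactly one character changes.
allowed <- n_diff == 1
print(sum(allowed))              # number of permitted transitions
print(allowed["000", "111"])     # FALSE: needs three simultaneous changes
```

Each combined state has three single-step neighbours, so an all rates differ model on these eight states would carry 24 transition rates instead of the 56 implied by a fully connected matrix.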

We also include a set of analyses in which hidden states are present because it is likely that there are unobserved characters which influence the evolution of the angiosperm flower. We include four hidden state models: *ER/ER*, *SYM/SYM*, *ARD/ARD*, and *ER/ARD*. Each of these models allows for the possibility of rate heterogeneity through the inclusion of a hidden state; however, the state-dependent processes differ. The *ER/ER* model can be thought of as a drift model where differences between **R**_{1} and **R**_{2} represent different rates of change, but ultimately state change is random. The *SYM/SYM* model specifies that within-character changes are equally probable, but some characters may change faster than others. The *ARD/ARD* model specifies that there could be an optimal phenotype, but the optimal state may differ depending on whether the lineage is in **R**_{1} or **R**_{2}. Finally, *ER/ARD* is a hybrid model which includes aspects of random drift and selection towards an optimal phenotype. To create these models we use tools available in `corHMM` and specify our model through the `rate.mat` option. To create the *ER/ARD* model we first obtain a generic `rate.mat` object using the new `getStateMat4Dat` function:

    LegendAndRateMat <- getStateMat4Dat(data)

The `getStateMat4Dat` function produces a list that contains `$legend`, which indicates the unique character combinations recognized by `corHMM`, and `$rate.mat`, which is a rate index matrix describing a single rate class. We can modify `$rate.mat` to customize transitions for our desired model. Any of the state-dependent processes can be based on the initial rate index matrix (from `getStateMat4Dat`) and then subsequently modified to specify the differences between the rate categories. In **R**_{1} we assume that all transition rates are equal, and in **R**_{2} we assume that all rates differ. This model describes a mixture of a drift-like process (**R**_{1}) and a process in which there is an optimal phenotype (**R**_{2}).

    # State-dependent processes: equal rates in R1, all rates differ in R2
    R1 <- getStateMat4Dat(data, model = "ER")$rate.mat
    R2 <- getStateMat4Dat(data, model = "ARD")$rate.mat

We can group all of our rate classes together in a list. The first element of the list corresponds to **R**_{1}, the second to **R**_{2}, and so on:

    StateMats <- list(R1, R2)

To obtain the parameter process matrix, we have implemented a separate function `getRateCatMat`. The only input is the number of hidden states to include in the model. By default, this function will assume that all transitions among the specified number of rate classes occur independently. In our example, we will generate a matrix that specifies how transitions between **R**_{1} and **R**_{2} occur. Note that **R**_{1} and **R**_{2} could represent a biologically relevant but unmeasured factor, such as temperate or tropical environments, island or mainland, or presence or absence of a trait (e.g., woody vs. herbaceous life forms). For illustrative purposes, we will specify that the transition rate from **R**_{1} to **R**_{2} is the same as the rate from **R**_{2} to **R**_{1}:

    # Two rate classes, with one shared rate for R1 -> R2 and R2 -> R1
    RateClassMat <- getRateCatMat(2)
    RateClassMat <- equateStateMatPars(RateClassMat, c(1, 2))

We now have all the components necessary to create the full model using the `getFullMat` function. This function requires that the first input be a list of the state-dependent processes and the second argument be the parameter process:

    ER.ARD_rate.mat <- getFullMat(StateMats, RateClassMat)

Three additional features warrant brief discussion. First, it is important to note that users are not limited to the default models (*ER, SYM, ARD*). Any state-dependent matrix can be modified to create specific models that suit the user’s hypothesis. For example, a model in which transitions between states are irreversible is not included within the default models but is nonetheless straightforward to create. Second, to help ensure that the user-specified rate matrix is consistent with their verbal model, we have implemented a new function for plotting a decomposed version of the model: `plotMKModel`. The user can input either the rate matrix they intend to use for modeling, or the resulting `corHMM` object. In both cases, the plotting output will be in two parts: (1) ball-and-stick diagrams of the state-dependent processes (Fig. 1a,b) and the parameter process (Fig. 1c), and (2) a set of rate matrices that describe the model in matrix form (Fig. 1d-f). Finally, we have implemented a new function, `makeSimmap`, which is based on Bollback (2006). Although simmaps are closely related to ancestral state reconstruction, a character history not only generates a hypothesis about the ancestral states but is also an effective way to understand the tempo of evolution. This is particularly important for HMMs because the rates of evolution can vary drastically across the tree (see *3.1 Performance in Simulation*). We chose not to implement any new plotting functions. Instead, `makeSimmap` produces a `simmap` object which is formatted such that it can be used with other R packages such as phytools (Revell, 2012). For additional capabilities, options, and biological examples we refer readers to the detailed vignette now provided as part of the R package.
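To give a sense of what an irreversible model looks like as a rate index matrix, the sketch below builds one by hand in base R. It deliberately does not use any `corHMM` helper functions. The three-state layout, the convention that 0 marks a disallowed transition, and the row-to-column reading of each entry are assumptions made for illustration.

```r
# Illustrative 3-state index matrix: each positive integer labels one free
# rate parameter and 0 marks a disallowed transition (assumed convention).
states <- c("1", "2", "3")
index_mat <- matrix(0L, 3, 3, dimnames = list(states, states))
off_diag <- row(index_mat) != col(index_mat)
index_mat[off_diag] <- seq_len(sum(off_diag))  # all rates differ: 6 parameters

# Make evolution irreversible by disallowing every transition from a
# higher-numbered state (row) back to a lower-numbered state (column).
irrev <- index_mat
irrev[lower.tri(irrev)] <- 0L

# Renumber the remaining parameters consecutively.
keep <- irrev > 0
irrev[keep] <- seq_len(sum(keep))
print(irrev)
```

The resulting matrix keeps three free parameters (1 to 2, 1 to 3, and 2 to 3) and no reverse transitions.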

## 3. Results

### 3.1 Performance in Simulation

#### 3.1.1 Increasing the number of observed characters or states

Overall, the accuracy of an ancestral state reconstruction is a function of the transition rates as well as the number of states allowed in the model (Fig. 2a). For example, all datasets generally inferred the correct ancestral state at low rates. However, when viewed in terms of information, datasets that contained just two states showed detectable informational loss when compared to the three- and four-state datasets. In fact, across all scenarios (low, intermediate, and especially the highest rates), datasets with more states consistently showed more informational gain relative to the maximum information content for a given number of states (Fig. 2b). We suspect this largely reflects the impacts of homoplasy when the number of character states is restricted in the model (Sanderson & Donoghue, 1989; Steel & Penny, 2005). This is not to say that more character states are always necessary for accurate ASR. Rather, we demonstrate that there are cases when additional characters or character states enhance the accuracy of an ancestral state reconstruction and those datasets have a signal of increased information.

#### 3.1.2 Evaluating hidden Markov models

The accuracy of ancestral state estimation, based solely on reconstructing character states at nodes, appears largely unaffected by the inclusion of hidden states regardless of differences in the state-dependent processes (Fig. 3a). However, the amount of information gained depends on both the use of an HMM and the presence of strong differences between the state-dependent processes (Fig. 3b). These seemingly contradictory results are best explained by examining the average number of transitions based on the simmap character histories. Datasets fit under an HMM more accurately estimated the tempo of evolution compared to the standard Markov model. We also found that when the generating model does not have state-dependent differences, the HMM does not pick up significant rate variation (Fig. 4a-c) and resembles the character history implied by the standard Markov model. These findings are reassuring and suggest that HMMs are supported in datasets where rate heterogeneity is present. Although a standard Markov model performs well when there are no state-dependent differences (Fig. 3c & Fig. 4b), it clearly underperforms compared to the HMM when there are state-dependent differences (Fig. 4d-f).

Surprisingly, we found little effect of altering the transition rate bias of the parameter process on either ancestral state reconstruction (Fig. 5a) or information content (Fig. 5b). The apparent association between ancestral state accuracy and bias is likely a consequence of asymmetric direction of evolution towards the slower rate class (as the bias increased, the time spent in a slower rate class increased). This is evident from fewer transitions occurring as the asymmetry was increased (Fig. 5c).

Unsurprisingly, observing the “second character” states increased the amount of information (Fig. 6). However, as the state-dependent processes became more distinguishable, the informational gap between fitting an HMM and directly observing the second character decreased. In other words, when the evolution of an observed character changes across the phylogeny, an HMM is able to extract additional information from a dataset.

### 3.2 Case study: reconstructing the ancestral angiosperm flower

The best fitting model was one without hidden states and all rates differing (Table 1). The most likely state at the root is a spiral perianth, spiral androecium, and whorled gynoecium (Fig. 7a). This result differs from that of Sauquet et al. (2017), which suggested a whorled perianth, whorled androecium, and spiral gynoecium as the most likely state. The difference between our result and those of Sauquet et al. (2017) is likely a direct consequence of allowing for correlated evolution among several characters. However, neither result matched the expectations of Sokoloff et al. (2018), who predicted that the ancestral state of the angiosperm flower was likely to be either entirely whorled or entirely spiral. Although our best fitting model did not match those expectations, when we included hidden states we found the root state to be either a whorled perianth, spiral androecium, and spiral gynoecium or entirely spiral (Fig. 7b). It is likely that the hidden rates model is more biologically appropriate because there are several characters which influence flower evolution not included in our analyses. However, in order to find support for a hidden rate model it is often necessary to have several examples of each state-dependent process within the dataset. This is more likely as a dataset grows larger and the appropriateness of a hidden state, as well as the power to detect rate heterogeneity, increases.

## 4. Discussion

Hidden Markov models are an essential tool for inferring character states across phylogenies. The new version of `corHMM` expands the array of potential uses of HMMs by increasing the number of possible character states and allowing users to construct custom models. In addition, we demonstrated the informational advantages of using hidden Markov models versus simple Markov models. Users interested in hypothesis-driven model construction are encouraged to read through the vignette associated with the `corHMM` package. This vignette fully describes how to use the package and includes several examples of how to take a biological hypothesis and codify it into an explicit HMM.

Information theory has mainly been discussed in a theoretical context and rarely used in practice to understand empirical trait evolution (Mossel, 2003; Mossel & Peres, 2003; Townsend & Naylor, 2007; Sober & Steel, 2011, 2014; Gascuel & Steel, 2014). In this paper we have introduced a measure for the amount of information that the tips provide the nodes during ancestral state reconstruction. Two important caveats of this measure of information are that the data and the model are both taken as fixed. These are not uncommon assumptions in phylogenetic comparative methods. For example, interpreting an ancestral state reconstruction comes with the implicit assumption that the model accurately describes the evolution of the traits (Beaulieu & O’Meara, 2019). Mutual information, as we have defined it, only provides information relative to the specified model and specified tips. A model which is more uncertain about any ancestral state, such as an equal rates model, is liable to have a more informative ancestral state reconstruction because any deviation from an uninformed ancestral state is due to the particular values of the tip states. This does not make the equal rates model better than alternatives, nor do we advocate for the use of information to assist in model selection. Instead, mutual information provides insight into the interaction between the model and tip states. Higher information at particular nodes could be indicative of an area of a phylogeny where the model is poorly suited and thus the tips provide the major explanation of the ancestral state. Mutual information is also highly correlated with the rates of evolution and has the intuitive property that as rates of evolution (or time) increase, the information that the tips provide to the nodes decreases (Sober & Steel, 2011).

It is important to have a biologically realistic model of trait evolution when conducting an ancestral state reconstruction. With the generalizations made to `corHMM`, we have provided two distinct ways to increase the realism of phylogenetic comparative modeling. First, we have allowed for the correlated evolution of several characters and states. Whether traits are correlated because of underlying pleiotropy leading to genetic correlation (Conner et al., 2011) or similar ecological contexts leading to convergent morphological evolution (Mahler, Revell, Glor, & Losos, 2010), at the macroevolutionary scale they are better understood in a holistic context rather than as independently evolving subunits. Second, the inclusion of hidden states allows for more detailed descriptions of the evolutionary process. State-dependent processes can differ in both rate and structure and thus provide a description of heterogeneity in the tempo and mode of evolution.
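To make the role of hidden states concrete, the following minimal sketch builds the rate matrix for a binary character with two hidden rate classes, giving four states in total (`0A`, `1A`, `0B`, `1B`). It is written in plain Python for illustration and does not use or reproduce the `corHMM` interface. The function name, the argument names, and the numerical rates are hypothetical, and the sketch assumes that the observed state and the hidden class never change in the same instant.

```python
def hidden_rate_matrix(rates_a, rates_b, switch_ab, switch_ba):
    """Build a 4x4 rate matrix over the states 0A, 1A, 0B, 1B.

    rates_a, rates_b : (q01, q10) transition rates of the observed
                       character within hidden class A and class B.
    switch_ab        : rate of moving from class A to class B.
    switch_ba        : rate of moving from class B to class A.
    Simultaneous changes of observed state and hidden class (for
    example 0A -> 1B) are fixed at zero (an assumption of this sketch).
    """
    labels = ["0A", "1A", "0B", "1B"]
    q = [[0.0] * 4 for _ in range(4)]

    # Observed-state transitions within each hidden class.
    q[0][1], q[1][0] = rates_a
    q[2][3], q[3][2] = rates_b

    # Hidden-class switches that keep the observed state unchanged.
    q[0][2] = q[1][3] = switch_ab
    q[2][0] = q[3][1] = switch_ba

    # Diagonal entries make every row sum to zero.
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return labels, q

if __name__ == "__main__":
    # Hypothetical values: class B evolves ten times faster than class A.
    labels, q = hidden_rate_matrix((0.1, 0.2), (1.0, 2.0), 0.05, 0.03)
    for label, row in zip(labels, q):
        print(label, [round(x, 3) for x in row])
```

Because the two classes can hold different values and different patterns of allowed transitions, a single matrix of this form can describe both a slow and a fast regime for the same observed character.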

We have demonstrated the informational and accuracy advantages of including additional states and characters in a simulation setting (*3.1.1 Increasing the number of observed characters or states*). However, it remained to be seen whether additional characters would impact the ancestral state in an empirical example and whether those results match biological expectations. The controversy surrounding the phyllotaxy of the ancestral angiosperm flower is a particularly appropriate case study for the generalized version of `corHMM`, as it not only allows for the dependent evolution of several discrete characters but also includes hidden states as a fitting addition to help describe the heterogeneous evolution of angiosperms. The effects of including correlated character evolution or hidden states are evident from our ancestral state reconstruction of the angiosperm flower. The results of our best model were exactly opposite to the original study’s finding (Sauquet et al., 2017). Additionally, although models that included hidden states were not best supported, the ancestral state reconstruction produced by these models matched the expectations put forward based on developmental data (Sokoloff et al., 2018). This demonstrates that reconstructions of ancestral states change when correlated trait evolution is incorporated and might be more biologically realistic when hidden states are included.

## 5. Conclusion

Although there is a growing consensus that phylogenies and their associated methods are being used in ways that exceed what they can infer (Losos, 2011; Maddison & FitzJohn, 2015; Rabosky & Goldberg, 2015; Cooper et al., 2016), we have shown that there is still under-utilized information in phylogenetic comparative datasets. First, HMMs extract signals of rate heterogeneity when it is present and, equally important, do not falsely locate signals where they are absent. Second, increased trait depth adds new information and consistently improves ancestral state reconstruction estimates. Indeed, as datasets continue to grow, so will the analytical power that biologists have for testing complex models of evolution. Finally, the inclusion of correlated trait evolution and hidden states is relevant beyond theoretical considerations, and we have shown that these generalizations can completely change the results of an ancestral state reconstruction in empirical datasets. Although hidden Markov models are not a perfect substitute for real observation of a hidden character, they make for a tractable and biologically reasonable description of heterogeneity in the evolutionary process over long time scales.

## Availability

This open-source software is written entirely in the R language and is freely available through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/.

## Conflict of Interest

None declared.

## Author Contributions

J.D.B. and J.M.B. designed research; J.D.B. performed research and analyzed data; and J.D.B. and J.M.B. wrote the paper.

## Acknowledgements

We thank members of the Beaulieu lab and colleagues at the University of Arkansas for their comments and for general discussions of the ideas presented here. We would also like to specifically thank Andrew Alverson, Brian O’Meara, and Adam Siepielski for their insightful critiques and helpful edits on an earlier version of this manuscript.