## Abstract

This paper investigates model congruence (= asymptotic unidentifiability) in phylogenetics for continuous-time Markov chains (CTMCs) that include models for DNA, protein, discrete trait evolution, and state-dependent diversification. Without exception, all CTMCs have infinite classes of congruent models. Congruent models vary in the number of parameters from one to infinity and may have drastically different evolutionary dynamics, and standard model selection criteria are not applicable to them. We classify the structure of a congruence class and show how the “best” model can be selected based on mathematical and biological reasoning. Thus, congruent models in CTMCs do not suffer from the model selection problem previously identified for the time-dependent diversification process. Moreover, we demonstrate that congruence may serve to explain some evolutionary phenomena, specifically linking macro- and microevolution. We also discuss other types of congruence that may occur in phylogenetics and ways to handle them.

## Introduction

Most studies suggest that organismal evolution at different organizational levels, be it DNA, proteins, phenotypes, or species diversification, is driven by heterogeneous processes with rates variable across time and lineages. A central goal of evolutionary biology is to disentangle biotic and abiotic factors that explain this rate variation. A recent study of Louca and Pennell (2020) [hereafter, LP] raised an important issue of model congruence occurring in the birth-death models used to reconstruct diversification. LP showed that any diversification scenario associated with extant timetrees could be explained by alternative scenarios, equally likely to have generated the observed data. In other words, for any observed extant timetree there is an infinite set of congruent models with different dynamics of extinction and speciation but identical likelihood scores. These congruent models are asymptotically unidentifiable (Morlon et al., 2022), which makes them indistinguishable given the observed data. Nevertheless, they suggest different biological interpretations for explaining rate heterogeneity.

Currently, there is no consensus within the phylogenetic community on how to deal with congruence and correctly treat alternative scenarios (Morlon et al., 2022; O’Meara and Beaulieu, 2021; Helmstetter et al., 2021). LP demonstrated the existence of congruent models only for the time-dependent diversification process, where speciation and extinction rates are constant across lineages. While they conjectured that the congruence issue might also arise in State-dependent Speciation and Extinction (SSE) models, where the diversification rate is allowed to vary across lineages [e.g., BiSSE, HiSSE (Maddison et al., 2007; Beaulieu and O’Meara, 2016)], a recent study proposed that SSE models are identifiable (O’Meara and Beaulieu, 2021).

This paper discusses the congruence issue for general continuous-time Markov chains (CTMCs), not investigated by LP. CTMCs lie at the heart of phylogenetics and embrace the most common models for DNA (e.g., JC, GTR), protein (e.g., Dayhoff), discrete trait evolution, and SSE models for diversification. We show a general and mathematically tractable approach for constructing congruent models using hidden state expansion that relies on the lumpability property of Markov chains. Our approach demonstrates that, without exception, model congruence occurs in all CTMC models, regardless of whether they are modeled on a known phylogenetic tree, or the tree is a parameter to be estimated. Moreover, there is an infinite set of congruent models, called a congruence class, for any single CTMC, as was shown for the birth-death models by LP. We find that CTMC models in a congruence class substantially vary in the number of parameters – from one to infinity. Thus, any CTMC, regardless of its level of rate heterogeneity, can be represented by a congruent model with one parameter or vice versa – a model with one parameter can be represented using many parameters. Such transformations come at the cost of expanding the model’s state space.

Naturally, congruent models in CTMCs may have drastically different evolutionary dynamics and may support alternative or even incoherent hypotheses in hypothesis-driven research. As noted by LP, this ambiguity imposes a model selection problem. We investigate the structure of a CTMC congruence class by classifying its constituent elements and demonstrate that a biologically and mathematically reasonable approach exists for unambiguously selecting the “best” model within a class. It is called the irreducible model and it is unique in each congruence class. Our selection approach is based on penalizing the model’s state complexity. The necessity for penalizing complexity was emphasized earlier for handling congruent models (Morlon et al., 2022). We also provide practical recommendations for its use. Thus, we show that CTMCs do not suffer from a model selection problem in the same way as birth-death models. Additionally, we discuss how a similar approach can be applied to address unidentifiability in time-dependent birth-death models. However, we admit that potential problems of model selection may arise when comparing models from similar congruence classes, in which models have similar but different likelihoods.

Generally, our findings indicate that model congruence does not pose a problem to phylogenetics. However, additional research is needed to understand relationships between congruence classes and develop general methods to handle congruence across different types of phylogenetic models.

Moreover, congruence might be an exciting tool for studying links between micro- and macroevolution. As we discuss below, it may explain how a constant-rate microevolutionary process generates variable rates at the macroevolutionary level due to the curvature of the adaptive landscape. In the end, we overview other types of congruence, not based on the hidden expansion, which may exist in CTMCs. Thus far, we see no evidence that they should challenge phylogenetic practice, but this requires further investigation. Most of our theoretical findings are supported by R scripts; see “Materials and Methods” for more information.

## Theoretical Background and Notations

### Hidden Markov models

Hidden Markov models (HMMs) elaborate traditional CTMCs by splitting the model states into two layers – observable and hidden (Tarasov, 2019). The transitions between states are allowed only within the hidden layer that, in essence, represents a traditional CTMC. The observable states exhibit one-to-many mapping with the hidden states. Thus, the probability of an observable state in an HMM is integrated over the probabilities of it belonging to each of its possible hidden states.

Originally, HMMs were proposed to model rate heterogeneity across lineages (Tuffley and Steel, 1998; Beaulieu et al., 2013) and were interpreted as different rate categories influencing character evolution or hidden extrinsic environmental factors. A recent study (Tarasov, 2019) showed that their interpretation is, in fact, multifaceted. The hidden states can be any of the following: time-heterogeneity, hidden genetic elements, environmental factors, and subjectivity associated with trait discretization. Thus, hidden states must be interpreted with caution because, as in diversification models, different processes degenerate onto the same patterns of trait evolution.

### Lumpability

The property of lumpability is central to the present study. Two or more states of an initial CTMC can be aggregated (lumped) into one state that produces the aggregated process with fewer states. If the aggregated process is still Markovian, the initial CTMC is called lumpable for the given state aggregation. If the process is not Markovian, the initial CTMC is called non-lumpable. The aggregation of states is equivalently applicable to three levels of a CTMC representation: rate matrix, Markov model topology, and a stochastic realization of the process. To train an intuition about the aggregation, suppose we have an initial CTMC that characterizes the evolution of a four-state character $\{s_1, s_2, s_3, s_4\}$ and has the following rate matrix $Q$ (written here with generic rates $q_{ij}$ for the transition $s_i \to s_j$, and $q_i = \sum_{j \neq i} q_{ij}$):

$$
Q = \left(\begin{array}{cc|cc}
-q_{1} & q_{12} & q_{13} & q_{14} \\
q_{21} & -q_{2} & q_{23} & q_{24} \\ \hline
q_{31} & q_{32} & -q_{3} & q_{34} \\
q_{41} & q_{42} & q_{43} & -q_{4}
\end{array}\right) \tag{1}
$$

The aggregation at the level of the rate matrix means constructing a smaller matrix with fewer states and transitions. Suppose that we are willing to aggregate $Q$ into a two-state process $\hat{Q}$, with states $\{\hat{s}_1, \hat{s}_2\}$, using the following partition of the initial states: $\{\{s_1, s_2\}, \{s_3, s_4\}\}$. This partition can be visualized on $Q$ by splitting the rates into four partition blocks shown with a horizontal and a vertical line within the matrix in the equation (1). Each block includes four initial rates whose function should yield new rates in the aggregated matrix. So, the aggregated matrix should be:

$$
\hat{Q} = \begin{pmatrix}
-\hat{q}_{12} & \hat{q}_{12} \\
\hat{q}_{21} & -\hat{q}_{21}
\end{pmatrix} \tag{2}
$$

Before turning to the relationships between the original and the aggregated rates, let us consider other levels of aggregation. The same aggregation at the level of Markov model topology (Fig. 1A-B) implies collapsing a four-state transition graph of *Q* into the two-state graph for the aggregated process $\hat{Q}$. Furthermore, the aggregation at the level of a stochastic realization means “recoloring” the realized initial states with fewer colors according to the state’s partition (Fig. 1C-D).

The lumpability property guarantees that the transition rates of the aggregated CTMC can be derived from the initial CTMC and thereby modeled using the lumped process. There are several classes of lumpability, of which the “strong lumpability” (Kemeny and Snell, 1960) is the most interesting to us. This is because of its mathematical tractability and the fact that it holds in CTMCs on phylogenetic trees (Tarasov, 2019). Below, we refer to “strong lumpability” as simply “lumpability”. This property depends on certain rate symmetries in the rate matrix, which must obey the “row-wise sum rule” (RWR).

The RWR implies the following: *the original CTMC is lumpable with respect to a given partition of states when the row-wise sum of rates within one partition block in the original Q is the same for all rows within the given partition block, and this property holds for all blocks in Q (Kemeny and Snell, 1960). The rates in the aggregated matrix represent simply the row-wise sums of the original rates*.

For *Q* in the equation (1), with $q_{ij}$ denoting the rate of the transition $s_i \to s_j$, the RWR is maintained if the following two equalities hold: $q_{13} + q_{14} = q_{23} + q_{24}$ and $q_{31} + q_{32} = q_{41} + q_{42}$. These equalities imply lumpability of *Q* for the given state partition; the sums $q_{13} + q_{14}$ and $q_{31} + q_{32}$ represent the new rates in the lumped process shown in the equation (2). The same aggregation for other levels of the Markov process is shown in Fig. 1B-D. Note, to prove lumpability, it is enough to show that the RWR holds for the off-diagonal blocks because it implies that the main diagonal blocks maintain the RWR too. In the lumped process, the initial probability vector at the tree root (π) should be aggregated, too, by adding up state probabilities from the original process belonging to the same partition subset.
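The row-wise sum rule can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the function names (`fill_diagonal`, `is_lumpable`, `lump`) and the numeric rates are hypothetical and chosen only so that the partition {{s1, s2}, {s3, s4}} satisfies the RWR.

```python
def fill_diagonal(rates):
    """Return a rate matrix whose diagonal makes every row sum to zero."""
    Q = [row[:] for row in rates]
    for i in range(len(Q)):
        Q[i][i] = -sum(Q[i][j] for j in range(len(Q)) if j != i)
    return Q


def is_lumpable(Q, partition, tol=1e-12):
    """Check the row-wise sum rule (RWR) on all off-diagonal blocks."""
    for a, block_from in enumerate(partition):
        for b, block_to in enumerate(partition):
            if a == b:
                continue  # off-diagonal blocks are sufficient
            sums = [sum(Q[i][j] for j in block_to) for i in block_from]
            if max(sums) - min(sums) > tol:
                return False
    return True


def lump(Q, partition):
    """Aggregate Q over the partition; new rates are row-wise block sums."""
    if not is_lumpable(Q, partition):
        raise ValueError("Q is not lumpable for this partition")
    k = len(partition)
    lumped = [[0.0] * k for _ in range(k)]
    for a, block_from in enumerate(partition):
        i = block_from[0]  # any row of the block gives the same sums
        for b, block_to in enumerate(partition):
            if a != b:
                lumped[a][b] = sum(Q[i][j] for j in block_to)
        lumped[a][a] = -sum(lumped[a])
    return lumped


# Hypothetical rates for s1..s4 (diagonal placeholders are overwritten).
Q = fill_diagonal([
    [0.00, 0.50, 0.10, 0.20],
    [0.40, 0.00, 0.25, 0.05],
    [0.15, 0.05, 0.00, 0.70],
    [0.10, 0.10, 0.30, 0.00],
])
partition = [[0, 1], [2, 3]]  # {{s1, s2}, {s3, s4}}
Q_hat = lump(Q, partition)    # approximately [[-0.3, 0.3], [0.2, -0.2]]

# Changing a single rate breaks the equality q13 + q14 = q23 + q24.
Q_bad = [row[:] for row in Q]
Q_bad[1][3] = 0.5
print(is_lumpable(Q, partition), is_lumpable(Q_bad, partition))
```

Only the off-diagonal blocks are inspected, in line with the remark that the RWR on those blocks implies the RWR on the main-diagonal blocks.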

The lumpability property is general and applies to all types of CTMCs. Lumpability does not depend on the time over which a CTMC evolves nor the initial probability vector at the tree root; if a CTMC is lumpable, it is lumpable under any value of the initial vector. If the lumpability property does not hold, then the simple form for the rate matrix for the aggregated process does not exist since it is not Markovian.

We denote the operation of lumping a CTMC, if the RWR holds, as *Lump*(·). For a lumpable CTMC, the following equality holds:

$$
\mathrm{Lump}\left[P(Q, \pi, t)\right] = P\left[\mathrm{Lump}(Q), \mathrm{Lump}(\pi), t\right] \tag{3}
$$

where $P$ is the probability of the Markov process after some time $t$, that is $P(Q, \pi, t) = \pi e^{Qt}$.
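The equality between lumped and directly computed state probabilities can be illustrated numerically. This is a sketch under stated assumptions: the four-state matrix, the root vector, and the time `t` are hypothetical values, and the matrix exponential is a simple scaling-and-squaring Taylor approximation written for this illustration rather than a library routine.

```python
def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def expm(Q, t, terms=30, squarings=10):
    """Approximate exp(Q t): truncated Taylor series plus repeated squaring."""
    n = len(Q)
    A = [[q * t / 2 ** squarings for q in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[r + x for r, x in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def state_probs(Q, pi, t):
    """P(Q, pi, t) = pi * exp(Q t)."""
    P = expm(Q, t)
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical four-state chain, lumpable for {{s1, s2}, {s3, s4}}.
Q = [[-0.80, 0.50, 0.10, 0.20],
     [0.40, -0.70, 0.25, 0.05],
     [0.15, 0.05, -0.90, 0.70],
     [0.10, 0.10, 0.30, -0.50]]
pi = [0.1, 0.2, 0.3, 0.4]

Q_hat = [[-0.3, 0.3], [0.2, -0.2]]           # Lump(Q)
pi_hat = [pi[0] + pi[1], pi[2] + pi[3]]      # Lump(pi)

t = 1.7
full = state_probs(Q, pi, t)
lumped_after = [full[0] + full[1], full[2] + full[3]]  # Lump[P(Q, pi, t)]
lumped_before = state_probs(Q_hat, pi_hat, t)          # P[Lump(Q), Lump(pi), t]
print(lumped_after, lumped_before)
```

The two printed vectors should agree up to floating-point error for any `t` and any root vector, which is the sense in which lumpability does not depend on time or on the initial probabilities.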

### Correlated CTMC evolution

When dealing with a set of evolving CTMCs, it is sometimes convenient to consider their evolution as a joint process instead of a set of individual processes (Tarasov, 2020). Both such representations are equivalent. To make the joint process, one needs to amalgamate a set of CTMCs into one CTMC by combining their states and rate matrices. The state-space of the amalgamated CTMC consists of all possible state combinations from the individual processes. For example, the amalgamation of two binary CTMCs with the states {0,1} produces a CTMC with four states {00, 01,10,11}. The rate matrix of the amalgamated process has a specific structure where only one-step transitions across state space are allowed while simultaneous transitions are prohibited.

If the initial CTMCs evolve independently, their joint rate matrix has certain symmetries. Specifically, if one partitions its state space with respect to states of some initial CTMC, then this aggregation would be lumpable, and the lumped process would yield the selected initial CTMC. One can construct an amalgamated CTMC that describes the correlated evolution of initial CTMCs in which the transitions are dependent. In that case, the rate symmetries are broken, and the amalgamated process is not lumpable with respect to all or some initial chains. The CTMC amalgamation is often used to test (un)correlated character evolution by fitting both correlated and independent matrices and then choosing the best scenario using model selection (Pagel, 1994).

In this study, we consider amalgamated matrices constructed from a set of the initial CTMCs with an equal number of states and identical transition rates, $q$. We define an initial CTMC as $Q^{s}$, where $s$ indicates the number of states. We denote the amalgamation of two rate matrices as independently evolving through the Kronecker sum ($\oplus$). Then, the formula for it is $Q_1 \oplus Q_2 = Q_1 \otimes I_{Q_2} + I_{Q_1} \otimes Q_2$, where $I_{Q_2}$ and $I_{Q_1}$ are the identity matrices for the respective $Q$’s, and $\otimes$ denotes the Kronecker product. To amalgamate $n$ identical matrices $Q^{s}$ as independently evolving, we should apply the Kronecker sum operation successively:

$$
\oplus^{n} Q^{s} = \underbrace{Q^{s} \oplus Q^{s} \oplus \cdots \oplus Q^{s}}_{n} \tag{4}
$$

Note, below we refer to a rate matrix constructed from $n$ identical and independent CTMCs with $s$ states as $\oplus^{n} Q^{s}$.
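The successive Kronecker sum can be sketched in a few lines of plain Python. The helper names (`kron`, `kron_sum`, `amalgamate`) and the rate value `q = 0.1` are assumptions made for this illustration; the construction itself follows the formula for the Kronecker sum of two rate matrices given above.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product: entry ((i, k), (j, l)) equals A[i][j] * B[k][l]."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def kron_sum(A, B):
    """Kronecker sum: A (x) I_B + I_A (x) B."""
    left = kron(A, identity(len(B)))
    right = kron(identity(len(A)), B)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


def amalgamate(Q, n):
    """Independent amalgamation of n identical chains."""
    result = Q
    for _ in range(n - 1):
        result = kron_sum(result, Q)
    return result


q = 0.1                       # hypothetical shared rate
Q2 = [[-q, q], [q, -q]]       # a binary CTMC
Q_cube = amalgamate(Q2, 3)    # 8 states, indexed by three bits

# Only one-step changes (Hamming distance 1 between bit labels) get rate q.
one_step = [(i, j) for i in range(8) for j in range(8)
            if i != j and abs(Q_cube[i][j]) > 0]
print(len(Q_cube), len(one_step))
```

With three binary chains the result has eight states and each state has exactly three outgoing transitions, which is the cube topology described for dimension three on two elements.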

We can slightly modify the independent amalgamations to make all or some identical $Q^{s}$ evolve in correlation. For this, we can amalgamate them as previously ($\oplus^{n} Q^{s}$) and set some allowed transitions (those denoting one-step changes) in that rate matrix to zero. Let us define this operation of the correlated amalgamation via $\odot$. Then, $\odot^{n} Q^{s}$ indicates the amalgamation of $n$ CTMCs, each with $s$ states, where all or some CTMCs evolve in correlation.

The number of states in the amalgamated matrices grows exponentially as $s^{n}$. The processes $\oplus^{n} Q^{s}$ and $\odot^{n} Q^{s}$ have specific topologies in the state space because only one-step transitions are allowed. We generally refer to those topologies as a space of dimension $n$ on $s$ elements. For example, the space of dimension $n$ on two elements ($\oplus^{n} Q^{2}$ and $\odot^{n} Q^{2}$) is a hypercube, with dimension $n = 3$ being simply a cube. If $s > 2$, we refer to those amalgamations as a hyperspace on $s$ elements.

## Materials and Methods

We support most of our theoretical findings with simulations or computational examples in R using the following packages: *corHMM* (Beaulieu et al., 2013), *diversitree* (FitzJohn, 2012), *hisse* (Beaulieu and O’Meara, 2016), and *phytools* (Revell, 2012). The details on simulations are given, where needed, throughout the text, or in the scripts which are available in Supplementary Material at GitHub https://github.com/sergeitarasov/Congruent_CTMCs. The functions to reproduce the congruent transformations on the Markov models discussed in this paper are provided in the R package *rphenoscate* https://github.com/uyedaj/rphenoscate.

## Results

This section summarizes the main types of Hidden Expansions (HEs) used throughout the paper to build and study congruent models for CTMCs. To show practical aspects of HEs, we use several motivating examples with simple CTMCs. One can think of them as those CTMCs which describe the evolution of some discrete character on a given tree. These simple processes were specifically selected for explanatory purposes. In a separate section, we discuss model congruence in SSE models, which is the same as in other simple CTMCs. Different types of HEs are necessary for understanding the structure of a congruence class, which we also discuss in a separate section. In the Discussion, we show that the structure of the congruence class is crucial for assessing model selection methods.

### Hidden Expansion (HE)

Instead of aggregating and thereby constructing rate matrices with fewer states, the lumpability property can be used in the reverse direction. The state-space of any CTMC can be expanded by adding hidden states in a way that preserves lumpability. So, the expanded process can be collapsed back to the original one by lumping. The hidden states should be added to the observable ones without changing the number of observables. Hereafter, we refer to this transformation as hidden expansion (HE).

Consider a two-state CTMC $\{s_1, s_2\}$ defined by the matrix $Q_{asy}$ with asymmetric rates shown in the equation (5). For example, we can think of it as an evolutionary process that generates characters on a known phylogeny. Suppose we expand $s_1$ with two hidden states, $s_1 = \{s_{11}, s_{12}\}$; then the new matrix $Q_{he}$, which has two observable and three hidden states, may look like the one shown on the right-hand side of the equation (5). Obviously, lumping $Q_{he}$ with respect to the partition $\{\{s_{11}, s_{12}\}, s_2\}$ produces $Q_{asy}$ because the transitions $s_{11} \to s_2$ and $s_{12} \to s_2$ maintain the RWR.
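Equation (5) is not reproduced in this text. As a hedged reconstruction, the following pair of matrices is consistent with the values quoted later (the rates 0.2 and 0.3 in $Q_{asy}$, the two and three hidden states of its equal-rate expansion, and the relationship $\alpha_2 = 2\beta_2$). The ordering of states ($s_1, s_2$ and $s_{11}, s_{12}, s_2$) and the zero within-block rates are assumptions, and the published layout may differ.

$$
Q_{asy} = \begin{pmatrix}
-0.3 & 0.3 \\
0.2 & -0.2
\end{pmatrix},
\qquad
Q_{he} = \left(\begin{array}{cc|c}
-0.3 & 0 & 0.3 \\
0 & -0.3 & 0.3 \\ \hline
0.1 & 0.1 & -0.2
\end{array}\right)
$$

In this form both rows of the $\{s_{11}, s_{12}\}$ block sum to 0.3 toward $s_2$, and the row of $s_2$ sums to $0.1 + 0.1 = 0.2$ toward the block, so lumping returns $Q_{asy}$.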

There are infinitely many ways to expand $Q_{asy}$ with hidden states and obtain a matrix that would be lumpable back into $Q_{asy}$ because there are infinitely many ways to maintain the RWR. Both $Q_{asy}$ and $Q_{he}$ exhibit congruent processes (denoted by ≅). The congruence means that the likelihoods for $Q_{asy}$ and $Q_{he}$ are identical for any set of data because the probabilities for their observable states are the same [the equation (3), Supplementary Material]. In other words, the two processes are phylogenetically indistinguishable – they have identical observable state space and yield identical likelihood scores. The congruence property is symmetric and transitive – if a process $X$ is congruent to $Y$, and $Y$ is congruent to $Z$, then $X$ is congruent to both $Y$ and $Z$. The set of congruent processes (or models) forms a congruence class (Louca and Pennell, 2020). Clearly, HE can be applied to any CTMC to produce congruent CTMCs with hidden states because any rate matrix can always be expanded to maintain the RWR. Thus, any particular CTMC process is a member of an infinite congruence class.

### Classification of HEs and Parametric Representation

#### Parametric forms

To derive parametric forms for congruent processes, we classify HEs into three main types depending on the number of parameters in a HE model compared to the original CTMC used to generate the expanded model. For this classification, we count only parameters located in the off-diagonal blocks (hereafter, off-diagonal parameters) of the HE matrix, not the total number of parameters in the model, because the off-diagonal parameters define the parametric form.

The three types of HEs are (1) equivalent expansion, (2) super-expansion, and (3) sub-expansion. They generate models with the same, greater, and fewer off-diagonal parameters, respectively. The parametric form for the equivalent and super-expansion models derives from the original model by maintaining the RWR, as shown in the Lumpability section above. For the sub-expansion model, the parametric form is conditional on the specific realization of the rate values, and the general form does not exist.

To derive a parametric form, we simply substitute the same values in a numeric form with a unique parameter. For example, the natural parametric form for the $Q_{asy}$ process from the equation (5) is a two-parameter model $M_{asy}$ shown in the equation (6). The equivalent expansion of $Q_{asy}$ produces $Q_{he}$ [the equation (5)], which is also a two-parameter model ($M_{eq}$) shown in the equation (6). Note, the parametric form indicates that the congruence between $M_{asy}$ and $M_{eq}$ holds for any value of $\alpha_1$ and $\alpha_2$. The example of a super-expansion for $M_{asy}$ is shown in the equation (6) as well; it is a three-parameter model $M_{s}$. There are the following relationships between the original parameters and those in HEs: $\alpha_1 = \beta_1 = \gamma_1$, $\alpha_2 = 2\beta_2 = \gamma_2 + \gamma_3$. The parametric forms for the sub-expansions are discussed in the following sections.
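Equation (6) is likewise not reproduced here. One set of parametric forms that satisfies the stated relationships $\alpha_1 = \beta_1 = \gamma_1$ and $\alpha_2 = 2\beta_2 = \gamma_2 + \gamma_3$ is sketched below. The placement of parameters, the state ordering ($s_{11}, s_{12}, s_2$ in the expanded models), and the zero within-block rates are assumptions made for illustration.

$$
M_{asy} = \begin{pmatrix}
-\alpha_1 & \alpha_1 \\
\alpha_2 & -\alpha_2
\end{pmatrix},
\quad
M_{eq} = \left(\begin{array}{cc|c}
-\beta_1 & 0 & \beta_1 \\
0 & -\beta_1 & \beta_1 \\ \hline
\beta_2 & \beta_2 & -2\beta_2
\end{array}\right),
\quad
M_{s} = \left(\begin{array}{cc|c}
-\gamma_1 & 0 & \gamma_1 \\
0 & -\gamma_1 & \gamma_1 \\ \hline
\gamma_2 & \gamma_3 & -(\gamma_2 + \gamma_3)
\end{array}\right)
$$

Under these forms the RWR holds for every parameter value: each row of the $\{s_{11}, s_{12}\}$ block sends the same total rate to $s_2$, and the single row of $s_2$ sends $2\beta_2$ (or $\gamma_2 + \gamma_3$) to the block.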

Clearly, there are infinitely many ways to construct congruent equivalent- and super-expansion models by adding hidden states and/or parameters while maintaining the RWR. Thus, any CTMC model belongs to an infinite congruence class consisting of expanded models.

#### Waiting-Times Preserving Models

Also, we define two other categories of HE models and processes that are essential for analyzing the CTMC dynamics. The Waiting-Time Preserving (WTP) hidden expansion does not change the expected waiting times between the original model and HE. Note that the expected waiting time in a state is determined by the negative main-diagonal entry of the rate matrix (it is the reciprocal of that entry's absolute value). Alternatively, the opposite is a non-Waiting-Time Preserving (nWTP) expansion that changes the expected waiting times between the original and HE models. For example, $M_{eq}$ and $M_{s}$ [the equation (6)] are WTP expansions. However, zeros in those rate matrices can be substituted with a separate parameter. This substitution would maintain the lumpability but would make them nWTP.

### Sub-Expansions: Equal Rate Hidden Expansion (EHE)

Let us proceed further with hidden expansion and continue adding hidden states to the observable ones. Eventually, it is possible to reach a point when all allowed transitions have the same rate value. We call this transform Equal Rate Hidden Expansion (EHE). For the process *Q _{asy}* from the equation (5), it results in the following equal rate matrix (Fig. 2B):

The observable states in this particular matrix are represented by two and three hidden states respectively; the transitions in the partition blocks located along the main diagonal are prohibited (i.e., 0), and all allowed transitions have the same rate equal to 0.1. Lumping $Q_{ehe}$ with respect to the partition $\{\{s_{11}, s_{12}\}, \{s_{21}, s_{22}, s_{23}\}\}$ produces $Q_{asy}$ from the equation (5). Like before, $Q_{asy}$ and $Q_{ehe}$ are congruent CTMCs (Supplementary Material).

The EHE transform holds for any CTMC regardless of the observable states and rate values because it stems from a general property of rational numbers. The proof is as follows. Imagine some original rate matrix $Q$ whose off-diagonal rates form a set $\{q_1, q_2, \ldots, q_n\}$. These rates are some specific values that we assume to be rational numbers; if they are not, we approximate them by rational numbers to any desired precision. Every rational number in the set can be represented as a product of an integer ($z$) and a rational multiplier ($q_{sin}$) shared by all elements of the set, which gives $\{q_1 = z_1 q_{sin}, q_2 = z_2 q_{sin}, \ldots, q_n = z_n q_{sin}\}$. The shared multiplier $q_{sin}$ is the equal rate, while the integers ($z_i$) give the numbers of hidden states in EHE. To calculate them, we should represent those rates as irreducible fractions of two integers and find their least common denominator ($lcd$). So, the equal rate is $q_{sin} = 1/lcd$. The number of required hidden states for an observable state $s_i$ can be calculated by taking the maximum rate value from the $s_i$’s column in $Q$ and dividing it by $q_{sin}$. For example, in $Q_{asy}$ from the equation (5), the two rates {0.2, 0.3} can be represented as fractions: 2/10 and 3/10; their $lcd = 10$, hence $q_{sin} = 1/lcd = 0.1$. So, the number of hidden states per observable state is 0.2/0.1 = 2 and 0.3/0.1 = 3, respectively. Finally, the EHE matrix can be constructed using the calculated number of hidden states and appropriately arranging $q_{sin}$ within the off-diagonal blocks to maintain the RWR [the equation (7)]. The redundant transitions within off-diagonal blocks and the transitions within main diagonal blocks should be set to zero.

The EHE transform exists for any CTMC process because all rate values in $Q$ are rational (or can be approximated by rational numbers with high precision) and thereby have an $lcd$. The EHE transform for any given $Q$ produces an infinite set of congruent processes because of the infinite number of common denominators for any set of fractions. To construct equal rate expansions, one can choose another common denominator (not the $lcd$) and follow the aforementioned algorithm.
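The EHE algorithm described above can be sketched with exact rational arithmetic. This is an illustrative implementation under assumptions, not the code from the accompanying R scripts: the function name `ehe`, the cyclic arrangement of equal rates inside each off-diagonal block, and the orientation of $Q_{asy}$ (rate 0.3 for $s_1 \to s_2$ and 0.2 for $s_2 \to s_1$, inferred from the column rule) are choices made here.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def ehe(Q):
    """Equal Rate Hidden Expansion of a rate matrix given as Fractions.

    Diagonal entries of Q are ignored. Returns (q_sin, hidden, E), where
    hidden[j] is the number of hidden states of observable state j and
    E is the expanded equal-rate matrix.
    """
    n = len(Q)
    rates = [Q[i][j] for i in range(n) for j in range(n)
             if i != j and Q[i][j] > 0]
    # Fractions are stored in lowest terms, so this is the lcd of the text.
    lcd = reduce(lambda a, b: a * b // gcd(a, b),
                 (r.denominator for r in rates), 1)
    q_sin = Fraction(1, lcd)
    z = [[int(Q[i][j] / q_sin) if i != j else 0 for j in range(n)]
         for i in range(n)]
    # Column maximum divided by q_sin; at least one state is always kept.
    hidden = [max(1, max(z[i][j] for i in range(n))) for j in range(n)]
    offsets = [sum(hidden[:j]) for j in range(n)]
    size = sum(hidden)
    E = [[Fraction(0)] * size for _ in range(size)]
    for i in range(n):
        for k in range(hidden[i]):
            row = offsets[i] + k
            for j in range(n):
                # z[i][j] equal-rate transitions keep the row-wise sum Q[i][j].
                for m in range(z[i][j]):
                    E[row][offsets[j] + (k + m) % hidden[j]] = q_sin
            E[row][row] = -sum(E[row])
    return q_sin, hidden, E


# Assumed orientation of Q_asy: s1 -> s2 at 3/10, s2 -> s1 at 2/10.
Q_asy = [[None, Fraction(3, 10)],
         [Fraction(2, 10), None]]
q_sin, hidden, Q_ehe = ehe(Q_asy)
print(q_sin, hidden)

# The alternative values alpha_1 = 0.3 and alpha_2 = 0.1 need other counts.
_, hidden_alt, _ = ehe([[None, Fraction(3, 10)],
                        [Fraction(1, 10), None]])
print(hidden_alt)
```

Choosing a larger common denominator in place of `lcd` (for example, a multiple of it) would yield another member of the same family of equal-rate expansions, with proportionally more hidden states.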

Note that it is not necessary to have zero-valued transitions along the main diagonal blocks to maintain lumpability. Within each such block, zeros can be substituted with any value, which changes the expected waiting times of the hidden states and makes the process nWTP. Yet another congruent nWTP might be constructed by substituting the main diagonal zeros to have equal expected times for all hidden states.
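A small numeric sketch of this point follows. The five-state equal-rate matrix uses the rate 0.1 from the running example, while the within-block values (0.7, 0.05, 1.2) and the helper names are hypothetical. The block-wise row sums, which define the lumped rates, are unaffected by the within-block entries, whereas the diagonal entries, and hence the waiting times of hidden states, change.

```python
def with_diagonal(off):
    """Fill the main diagonal so that every row sums to zero."""
    Q = [row[:] for row in off]
    for i, row in enumerate(Q):
        Q[i][i] = -sum(x for j, x in enumerate(row) if j != i)
    return Q


def lumped_rates(Q, partition):
    """For each row, the summed rates into every block other than its own."""
    return [[sum(Q[i][j] for j in block) for block in partition
             if i not in block]
            for i in range(len(Q))]


partition = [[0, 1], [2, 3, 4]]  # two and three hidden states
q = 0.1
wtp_off = [[0, 0, q, q, q],
           [0, 0, q, q, q],
           [q, q, 0, 0, 0],
           [q, q, 0, 0, 0],
           [q, q, 0, 0, 0]]

# Replace some within-block zeros by arbitrary (hypothetical) rates.
nwtp_off = [row[:] for row in wtp_off]
nwtp_off[0][1] = 0.7
nwtp_off[1][0] = 0.05
nwtp_off[2][3] = 1.2

Q_wtp = with_diagonal(wtp_off)
Q_nwtp = with_diagonal(nwtp_off)

print(lumped_rates(Q_wtp, partition) == lumped_rates(Q_nwtp, partition))
print(Q_wtp[0][0], Q_nwtp[0][0])  # diagonal entries differ
```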

#### Parametric representation

The parametric form for $Q_{ehe}$ from the equation (7) can be written by substituting all 1s with an unknown parameter ($\beta$), which yields a one-parameter model $M_{ehe}$ [the equation (8)]. So, EHE is a sub-expansion. Note, this model is congruent with the original process $Q_{asy}$ [the equation (5)] but not the original model $M_{asy}$ [the equation (6)]. If one takes the original model $M_{asy}$ but different rate values (e.g., $\alpha_1 = 0.3$ and $\alpha_2 = 0.1$), then this setting would require an EHE with a different number of hidden states (e.g., three and one respectively). Thus, the parametric form of EHE or any other sub-expansion is not defined for an original model in general since the number of hidden states depends on specific rate values. It happens because sub-expansions imply stricter constraints on the rate parameters compared to the original model. For example, if we lump $M_{ehe}$ with respect to the observable states, then the resulting model is $M_{lump}$ [the equation (8)].

The model $M_{lump}$ is a one-parameter model where rates are scaled by specific factors; it is a submodel of the original two-parameter model $M_{asy}$ [the equation (6)]. The lack of a defined parametric form and stricter rate constraints is a general property of sub-expansion, where the number of the off-diagonal parameters is fewer than that in the original model. Although the parametric forms are undefined for an original model, they are defined when inferring likelihood since data exhibit the realization of some particular parameter values.

### Sub-Expansions: Correlated Hyperspace Expansion (CHE)

We further elaborate the EHE process from the equation (7), by applying our favorite operation – adding hidden states. However, now we add them without altering the rate values. For example, we can add another two and one hidden states to each observable state, respectively. To avoid rows and columns with all rates equal to zero (i.e., prohibited), we rearrange non-zero rates to preserve the RWR. There are many ways to do this, and all of them produce equivalent results for our purpose. One such rearrangement gives the following process (*Q _{che}*) where each observable state consists of four hidden ones:

Lumping all hidden states in this matrix produces the initial CTMC, $Q_{asy}$, from the equation (5). Clearly, both $Q_{asy}$ and $Q_{che}$ are congruent processes (Supplementary Material).

One may notice that $Q_{che}$ defines a correlated evolution of three binary CTMCs (i.e., $\odot^{3} Q^{2}$). To demonstrate it, we need to rearrange the rows and columns in $Q_{che}$ accordingly, which is visualized in Fig. 2C. The correlation with respect to the binary CTMCs arises due to the prohibited transitions marked with “*” in the equation (9). These transitions break the independent evolution between the three CTMCs but are necessary to maintain the lumpability of $Q_{che}$. If they were allowed (with a rate equal to 0.1), it would imply an independent evolution for the binary CTMCs but violate the congruence of CHE with $Q_{asy}$. Note, each observable state of $Q_{che}$ mismatches the observable states corresponding to the binary CTMCs (Fig. 1C). So, those three CTMCs can be thought of as correlated hidden factors (or characters) whose interaction generates the new observable state space (Fig. 2D-E).

Hereafter, we refer to the proposed transform as Correlated Hyperspace Expansion (CHE). To find CHE for some binary matrix $Q$, apply EHE to $Q$, and find an observable state with the maximum number of hidden states; denote this number by $N_{max}$. Any amalgamation of binary CTMCs ($\oplus^{n} Q^{2}$), where $n \gg N_{max}$, embeds the EHE matrix. Use this $\oplus^{n} Q^{2}$ by populating allowed transitions with the equal rates found for EHE. Prohibit certain rates in $\oplus^{n} Q^{2}$ to construct $\odot^{n} Q^{2}$ and maintain the RWR (as shown above). This yields the desired CHE transform. The derivation of CHE for $Q$ with more than two states is less trivial.
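One way to realize a CHE for the running example can be sketched on the cube of three binary characters. Everything specific here is an assumption made for illustration: observable states are assigned by the parity of the three bits, and for every odd-parity vertex the transition that flips the first character is prohibited. The prohibited transitions in the published equation (9) may be different ones; the sketch only shows that such a choice exists and lumps back to rates 0.3 and 0.2.

```python
from itertools import product

q = 0.1
states = list(product((0, 1), repeat=3))        # vertices of the cube
index = {s: k for k, s in enumerate(states)}


def flip(state, pos):
    """Change one binary character, i.e. take a one-step transition."""
    return tuple(b ^ 1 if p == pos else b for p, b in enumerate(state))


Q_che = [[0.0] * 8 for _ in range(8)]
prohibited = []
for s in states:
    for pos in range(3):
        target = flip(s, pos)
        if sum(s) % 2 == 1 and pos == 0:
            # Prohibiting this step makes character 0 depend on the others.
            prohibited.append((s, target))
            continue
        Q_che[index[s]][index[target]] = q
for k in range(8):
    Q_che[k][k] = -sum(Q_che[k])

# Observable states: even-parity vertices (s1) and odd-parity vertices (s2).
s1_block = [index[s] for s in states if sum(s) % 2 == 0]
s2_block = [index[s] for s in states if sum(s) % 2 == 1]

rates_s1_to_s2 = [sum(Q_che[i][j] for j in s2_block) for i in s1_block]
rates_s2_to_s1 = [sum(Q_che[i][j] for j in s1_block) for i in s2_block]
print(rates_s1_to_s2, rates_s2_to_s1)
```

Each observable state consists of four hidden vertices, every row of the first block sums to about 0.3 and every row of the second block to about 0.2, so the RWR holds. Allowing the four prohibited transitions at rate 0.1 would restore the independent amalgamation but raise the second row sum to 0.3, breaking the congruence with the asymmetric two-state process.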

In fact, any CTMC process transforms via CHE into a congruent and equal-rate matrix that defines the correlated evolution of certain underlying CTMCs (Appendix A1). The number of those underlying CTMCs characterizes the dimension of the CHE process and depends on the initial rate values. If the initial CTMC has $s$ states, then its CHE transform results in a hyperdimensional CTMC on $s$ elements ($\odot^{n} Q^{s}$). For example, transforming a three-state CTMC via CHE means amalgamating a certain number of three-state CTMCs ($\odot^{n} Q^{3}$). One way to interpret CHE is to think of any given CTMC coming from some higher dimensional space where the unobserved hidden factors evolve in a correlated fashion and generate the observable CTMC.

The parametric representation for CHE is analogous to that for EHE since CHE is a sub-expansion.

### HEs and State-dependent Speciation and Extinction (SSE) Models

All aforementioned types of HEs apply to CTMCs with infinite state space, as in simple birth-death or SSE models, because lumpability is a general property of Markov chains (Appendix A2). Thus, an infinite set of congruent models exists for any SSE model used for reconstructing lineage-specific diversification. The general SSE model consists of two components: (i) a birth-death process that generates extinctions and speciations on a tree; and (ii) a CTMC that generates evolution in observed and hidden traits, which may or may not affect the birth-death process. The hidden expansions can be constructed for each of these components. The properties of these expansions are the same as described above for the finite state CTMCs. For example, any complex SSE process with variable rates for trait evolution and diversification can be represented by a congruent process with equal rates for both components (i.e., as EHE or CHE, Appendix A2.1). Note that the hidden expansion for the diversification process requires a dual transition: a simultaneous speciation/extinction event and change of the trait’s state. The standard SSE models currently used in phylogenetics (e.g., BiSSE, HiSSE) do not allow it, but can be extended to include it.

### The Structure of a Congruence Class

Every congruence class (the set of all congruent models) generated by HEs consists of a unique non-lumpable model (*M*_{0}) that cannot be lumped further and an infinite set of lumpable models generated by expanding *M*_{0} with hidden states (Fig. 3). We call *M*_{0} the irreducible model and its states the irreducible states; *M*_{0} can be with or without hidden states. Clearly, if *M*_{0} has hidden states, then it characterizes a process that does not have a simpler representation (i.e., without the hidden states). Thus, all other models in the class have more hidden states than *M*_{0}. Note that the congruence class can be reconstructed for any given model. For example, if *M*_{0} is known, then the class is all HEs of *M*_{0}. If some other model is known, lump it into *M*_{0} and follow the previous step.

The lumpable congruent models fall into three types – equivalent, super-expansion, and sub-expansion – depending on the type of HE that generates them. The parametric forms for the equivalent and super-expansion models can be derived directly from *M*_{0} using the RWR. The total number of parameters in these models is the same as in *M*_{0} (e.g., WTP equivalent type models) or greater (e.g., nWTP equivalent or super-expansion models). Since lumpability imposes no upper bound, the number of parameters in super-expansion models can grow to infinity.

The parametric forms for the sub-expansion models are conditional on the specific parameter values of *M*_{0}. This means that every particular point in the parameter space of *M*_{0} has some unique sub-expansion model. The total number of parameters in sub-expansion models might be: (1) less than in *M*_{0} (e.g., WTP models), all the way down to a single parameter (EHE, CHE), indicating that such models are nested within *M*_{0}; or (2) the same as or greater than in *M*_{0} (e.g., nWTP models).

The irreducible model *M*_{0} is congruent to all equivalent and super-expansion models. However, it is not congruent directly with any particular sub-expansion model (e.g., EHE or CHE) because the parameter space of *M*_{0} is larger than that in the sub-expansion models. Instead, *M*_{0} is congruent to some sets of them, for example, the set (denoted by [·]) of CHE and EHE models, *M*_{0} ≅ [*EHE*] ≅ [*CHE*]. Thus, the congruence class has an infinite number of paths that connect congruent points in parameter space across different models. These points assemble into models whose congruence depends on their type. Each congruence class has an invariant quantity that is constant across all its models – this is the row-wise sum of rates with respect to the irreducible states in *M*_{0} that obeys the RWR.
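As a minimal sketch of the invariant described above, the following Python fragment checks the RWR for a small rate matrix and lumps it onto its observable states. The function names (`block_rates`, `is_lumpable`, `lump`) and the numeric rates are illustrative assumptions rather than part of the original analysis; the check implements the standard strong-lumpability condition for a rate matrix, which is how the RWR is read here.

```python
# Hypothetical helpers illustrating the row-wise rule (RWR).
# Names and rate values are assumptions chosen for illustration.

def block_rates(Q, partition):
    """For every state i, sum its off-diagonal rates into each block of the partition."""
    return [[sum(Q[i][j] for j in block if j != i) for block in partition]
            for i in range(len(Q))]

def is_lumpable(Q, partition, tol=1e-9):
    """RWR check: states of one block must send equal summed rates into every other block."""
    rates = block_rates(Q, partition)
    for a, block_a in enumerate(partition):
        for b in range(len(partition)):
            if a == b:
                continue
            reference = rates[block_a[0]][b]
            if any(abs(rates[i][b] - reference) > tol for i in block_a):
                return False
    return True

def lump(Q, partition):
    """Collapse a lumpable rate matrix onto its blocks (assumes is_lumpable is True)."""
    rates = block_rates(Q, partition)
    k = len(partition)
    M = [[0.0] * k for _ in range(k)]
    for a, block_a in enumerate(partition):
        for b in range(k):
            if a != b:
                M[a][b] = rates[block_a[0]][b]
        M[a][a] = -sum(M[a])  # diagonal makes the row sum to zero
    return M

# Hidden expansion of a two-state model: observable state 1 is split into 1a and 1b.
Q_he = [[-1.0, 0.5, 0.5],
        [2.0, -3.0, 1.0],
        [2.0, 1.0, -3.0]]
observed = [[0], [1, 2]]  # blocks of state indices, one block per observable state

print(is_lumpable(Q_he, observed))  # True
print(lump(Q_he, observed))         # [[-1.0, 1.0], [2.0, -2.0]]
```

In this toy case both hidden states of the second observable state send the same summed rate (2) to the first one, so the three-state expansion collapses to a two-state model with rates 1 and 2.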

## Discussion

### Model Congruence in Phylogenetics

We have shown the general technique for constructing congruent models by hidden expansion – adding hidden states to a CTMC and maintaining its lumpability property through the RWR. Any CTMC has an infinite set of congruent models, which can be constructed by different types of HEs (equivalent, super-, and sub-expansions). These expansions result in congruent models, in which the number of parameters can vary from one to infinity for any original model. The sub-expansion models have a narrower parameter space, and their parametric form depends on the specific rate values in the original model. The number of hidden states and the original rate values are confounded in the sub-expansion.

Congruent models are a general property of Markov chains and, thereby, apply to all kinds of CTMCs used in phylogenetics for modeling discrete traits, DNA, proteins, and species diversification (SSE models). Given this generality, the proposed congruence holds in models treating phylogeny as an unknown parameter. Thus, the model congruence discovered in the time-dependent diversification process by Louca and Pennell (2020) also exists in the general CTMCs.

We need to admit that other types of congruence also exist in CTMCs. We discuss them at the end of this paper. They are less tractable mathematically but we did not identify any problems they pose for phylogenetic analysis as in, for example, model selection.

#### Comparison of HE and LP Congruence

Based on the current state of knowledge, our congruence classes by HEs seem quite similar to those described by LP. The congruent models result in the same likelihood regardless of the data. They both contain an infinite set of models. They have an invariant quantity in their rates – pulled speciation and diversification rates in LP class and sums of the transition rates (the RWR) in HE class. Importantly, like diversification models, the dynamics of alternative models within the same class varies substantially (see the next section). Moreover, standard information criteria are not applicable for model selection in both classes (see next section). However, the LP classes are mathematically challenging since they do not have closed-form expressions. Conversely, the HE classes can be systematically constructed by using lumpability and hidden states. Therefore, we do not know whether the general structure of the LP and HE classes is similar enough to apply the same principles to both. For example, whether in LP classes, there exists a unique element that would be analogous to our irreducible *M*_{0}. This uncertainty emphasizes that congruence properties require further research in phylogenetics.

### Alternative Evolutionary Scenarios

The congruent models are asymptotically unidentifiable but, as noted by LP, may exhibit markedly different evolutionary dynamics, which is also the rule for the HE models. They have two behaviors depending on whether the model preserves the waiting times (WTP) or not (nWTP) compared to the irreducible model (*M*_{0}) from its congruence class. We exemplify these dynamics by mapping the evolutionary history of the simple two-state character, *Q*_{asy} [the equation (5)], on a known phylogeny (Fig. 4). Here, *Q*_{asy} is the irreducible model. All WTP models within a class have the same number of state changes over a tree compared to *M*_{0}, indicating that they and *M*_{0} have the same average evolutionary rate (Fig. 4A-B). However, WTP models have a larger state space because of the hidden states. Recoloring the hidden states into the original ones (i.e., lumping) returns the character history for *M*_{0}.

The nWTP models have more state transitions than *M*_{0} and thereby a faster evolutionary rate, which can be arbitrarily fast since there is no upper limit (Fig. 4C). Moreover, this rate can be unbalanced between the observable states, for example, slow in one and ultrafast in the other (Fig. 4D). However, recoloring the hidden states also returns the history for *M*_{0}. This distinct scenario may indeed lead to drastically different biological interpretations of the underlying evolutionary rates.
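The difference between the two behaviors can be sketched numerically. The fragment below assumes that a model is waiting-time preserving when every hidden state is left at the same total rate as its observable state in *M*_{0}; this reading of WTP, the function names, and the example matrices are assumptions made for illustration only.

```python
# Hypothetical illustration: both expansions lump to the same M0,
# but only one keeps the total exit rate of every state unchanged.

def exit_rates(Q):
    """Total rate of leaving each state (the negated diagonal of a rate matrix)."""
    return [-Q[i][i] for i in range(len(Q))]

def preserves_waiting_times(Q_he, labels, Q0, tol=1e-9):
    """True if every hidden state has the same exit rate as its observable state in Q0."""
    base = exit_rates(Q0)
    return all(abs(rate - base[labels[i]]) <= tol
               for i, rate in enumerate(exit_rates(Q_he)))

M0 = [[-1.0, 1.0],
      [2.0, -2.0]]
labels = [0, 1, 1]  # observable state of each state in the expanded models

# No transitions between the hidden states 1a and 1b: exit rates match M0.
Q_wtp = [[-1.0, 0.5, 0.5],
         [2.0, -2.0, 0.0],
         [2.0, 0.0, -2.0]]

# Extra transitions between 1a and 1b: more state changes, same lumped model.
Q_nwtp = [[-1.0, 0.5, 0.5],
          [2.0, -3.0, 1.0],
          [2.0, 1.0, -3.0]]

print(preserves_waiting_times(Q_wtp, labels, M0))   # True
print(preserves_waiting_times(Q_nwtp, labels, M0))  # False
```

Raising the rate between the two hidden states in `Q_nwtp` leaves the lumped model untouched while making the expanded process change state as often as desired.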

Analogous behavior would hold for SSE models and diversification rates as they can be similarly mapped onto phylogeny with time intervals on tree branches colored according to speciation and extinction rates. However, the available SSE software does not allow dual transitions; hence, full HEs cannot be applied to SSE models for this demonstration.

Another type of congruence between alternative scenarios may mislead hypothesis testing. In SSE models, two alternative hypotheses are usually tested. One implies that diversification does not depend on the observable trait, known as the character-independent (CID) model (Beaulieu and O’Meara, 2016). The other implies that diversification is character-dependent. Interestingly, the CID model is congruent with some (but not all) character-dependent models, and they have the same number of parameters (Appendix A2.2 and Supplementary Material). Thus, the standard model selection procedure cannot distinguish the two hypotheses.

### Selecting among congruent models

LP suggested that the models within the same congruence class are not identifiable and equally possible, precluding reliable inference of the diversification dynamics in time-dependent birth-death models.

Given a similar identifiability issue in CTMCs, can we select among the alternative models within an HE congruence class? We believe that we can, but not by using standard model selection. Standard model selection criteria such as the Akaike Information Criterion [AIC, (Akaike, 1981)] or the Bayesian Information Criterion [BIC, (Schwarz, 1978)] do not work for this problem. They would always select sub-expansion models because these models have the fewest parameters in a congruence class. Although these models look simple (e.g., one-parameter EHE or CHE), their parameter space is very narrow and their state space is complex, rendering them biased and leaving them with little explanatory power.
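A short numerical sketch shows why the information criteria behave this way. The log-likelihood, the sample size, and the parameter counts below are hypothetical values chosen only for illustration; the formulas are the usual AIC = 2*k* − 2 ln *L* and BIC = *k* ln *n* − 2 ln *L*.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion for k free parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion for k free parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

log_lik = -120.0  # hypothetical value, identical for all congruent models
n_obs = 50        # hypothetical number of observations

# Hypothetical parameter counts for three members of one congruence class.
candidates = {"EHE (sub-expansion)": 1, "M0 (irreducible)": 2, "super-expansion": 6}

scores = {name: (aic(log_lik, k), bic(log_lik, k, n_obs))
          for name, k in candidates.items()}
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])

print(best_aic)  # EHE (sub-expansion)
print(best_bic)  # EHE (sub-expansion)
```

Because the likelihood term is shared, the ranking is decided by the parameter count alone, so both criteria favor the one-parameter model regardless of how many hidden states it needs.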

If the EHE or CHE were selected based on AIC or BIC, we would have a situation where every single dataset is described by a separate model with an almost unique and large number of hidden states. If we were to follow this route, we would need to discuss why one binary character has a hundred hidden states and another character has four hundred hidden states. Such questions may not make sense biologically unless we have access to alternative means for studying the underlying state space. Instead, explaining this process using variable transition rates in a binary character (without hidden states) will likely represent much more interpretable evolutionary information. Therefore, we can rule out the sub-expansion models from the candidate list.

Using similar logic, we can straightforwardly remove the equivalent and super-expansion models from the selected candidates list because they have more parameters or hidden states than *M*_{0}. Thus, the only reasonable candidate model becomes *M*_{0} itself.

The same principles apply to hypothesis testing in SSE. We may prefer selecting the CID model (that is *M*_{0}) since it provides a simpler explanation of the data than the congruent character-dependent model. Thus, following the parsimony principle that embraces the penalty on the number of hidden states and parameters, we can unambiguously select *M*_{0} – the unique irreducible model in a congruence class. This parsimony principle is also backed by the biological reasoning outlined above.

Our main recommendation is to always use irreducible models for model comparison. When comparing models between different congruent classes, one needs to ensure that they are irreducible. It is trivial if a model has no hidden states because all such models are irreducible. However, one should always check if a smaller irreducible model may exist for the models with hidden states using the RWR.
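One way to carry out such a check is sketched below, assuming that the model is given as a rate matrix together with the observable state of each (possibly hidden) state. The function name `coarsest_lumping` and the example matrices are hypothetical; the procedure is a simple partition refinement that repeatedly splits groups of states whose summed rates into the current groups differ, so it addresses strong lumpability only.

```python
# Hypothetical sketch: find the smallest model obtainable by lumping hidden states
# without ever merging states that belong to different observable states.

def coarsest_lumping(Q, labels, digits=9):
    """Return a block index per state for the coarsest lumpable partition refining `labels`."""
    n = len(Q)
    block_of = list(labels)  # start from the observable states
    while True:
        blocks = list(dict.fromkeys(block_of))
        signatures = []
        for i in range(n):
            # Summed rates (diagonal included) from state i into every current block.
            into = tuple(round(sum(Q[i][j] for j in range(n) if block_of[j] == b), digits)
                         for b in blocks)
            signatures.append((block_of[i],) + into)
        ids = {}
        refined = [ids.setdefault(sig, len(ids)) for sig in signatures]
        if len(ids) == len(blocks):  # nothing was split: the partition obeys the RWR
            return refined
        block_of = refined

labels = [0, 1, 1]  # one observable state 0 and two hidden states of observable state 1

Q_reducible = [[-1.0, 0.5, 0.5],
               [2.0, -3.0, 1.0],
               [2.0, 1.0, -3.0]]
Q_irreducible = [[-1.0, 0.5, 0.5],
                 [2.0, -3.0, 1.0],
                 [1.0, 1.0, -2.0]]

print(coarsest_lumping(Q_reducible, labels))    # [0, 1, 1]
print(coarsest_lumping(Q_irreducible, labels))  # [0, 1, 2]
```

If the returned partition has fewer blocks than the model has states, a smaller congruent model exists and should be used for comparison; otherwise the model with hidden states is already irreducible.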

In fact, irreducible models have been used in phylogenetics for decades (i.e., our standard models) and seem to often have been quite successful in explaining the observed data. A priori selection of an irreducible model means that even if the true process was evolving under some other model (non *M*_{0}), we would never be able to identify it. But do we really need it? Possibly not; if following parsimony, we assume that other models are less likely to occur in nature (Morlon et al., 2022).

General considerations on model penalization and statistical regularization for congruent models are overviewed in Morlon et al. (2022). We agree that this is the way forward to handle the congruence issue described by LP.

#### Solving LP congruence with the HE principles?

If we assume that the HE congruence is similar enough to the LP congruence, what would the analogous parsimony principle look like for the LP problem?

Imagine that the rate matrices for our congruent models are unknown. We do not even know how to write them on paper. However, we have an algorithm that can reconstruct mapped evolutionary histories for our congruent models given data, as shown in Fig. 4. We observe that congruent histories have different dynamics but the same likelihood. In this case, the complexity penalty would refer to the total number of states and the frequency of state changes in a mapped history, which is the same as the penalty derived using the rate matrices. Thus, the analogous metric of complexity for a time-dependent birth-death process in LP might be some kind of curvature for the extinction and speciation functions. This metric would select a model with the smoothest functions in a congruence class [see Magee et al. (2020) for one implementation of this philosophy].

### Curvature of Adaptive Landscape Explains Macroevolutionary Rate Variation

We believe that the provided hidden expansions may open up new avenues for studying evolutionary processes. For example, CHE transformation may explain rate variation at the macroevolutionary scale through a prism of microevolutionary processes, data, and biological knowledge. The “holey adaptive landscape” model (Gavrilets, 1997; Gavrilets and Gravner, 1997) suggests that genotypic space is a high dimensional graph where viable genotypes form clusters connected by a chain of one-step mutations and surrounded by “holes” – volumes of the space with low fitness (Fig. 5A). Evolution might be seen as a random walk of a species or population on such multidimensional graphs over time. Topologically, these graphs are identical to those that result from CHE transformation [e.g., Fig. 2C and Gavrilets and Gravner (1997)]. Interpreting hidden factors in CHE as individual genetic elements, while its prohibited transitions as holes, suggests that CHE exhibits an analog of a holey adaptive landscape. Although the “mutation” rate in CHE is constant, lumping it for the observable states returns a heterogeneous Markov process. It happens because the evolutionary walk of species or population between the hidden factors proceeds at a constant rate. However, this walk should constantly go around the holes (i.e., prohibited transitions), rendering its trajectory a hyperdimensional curve. The observable states exhibit a projection of the high-dimensional hidden space into our low-dimensional world. Projecting the curved walk onto the observable states forms a process that evolves at variable rates (Polly, 2008).

A simple metaphor may help build intuition for the curved walk. Imagine a taxicab driving around downtown at a constant speed (Fig. 5B). Due to road construction (= holes), the taxicab can only travel on certain streets to reach the final destination. Its speed within the two-dimensional downtown landscape is constant by our assumption. However, if one observes its movement in one dimension alone, the south-north direction, then it appears to happen at different speeds due to the curved driving path. This is similar to what the CHE transformation implies. Thus, the constant-rate dynamics on the holey adaptive landscape at the microevolutionary scale may generate rate heterogeneity over larger time intervals at the macroevolutionary scale. Given the generality of CHE, this phenomenon may equally apply to molecular evolution, traits, and diversification.
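The taxicab metaphor can be written down in a few lines. The route below is a hypothetical detour around closed streets, with one city block travelled per time step; nothing in it comes from Fig. 5B.

```python
# Hypothetical route: the cab moves exactly one block per time step.
route = ["N", "E", "E", "N", "W", "N", "N", "E"]
steps = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

# Speed in the two-dimensional street grid (blocks per time step).
speed_2d = [abs(steps[move][0]) + abs(steps[move][1]) for move in route]

# Speed seen by an observer who records only the south-north coordinate.
speed_north = [steps[move][1] for move in route]

print(speed_2d)     # [1, 1, 1, 1, 1, 1, 1, 1]
print(speed_north)  # [1, 0, 0, 1, 0, 1, 1, 0]
```

The two-dimensional speed never changes, yet the one-dimensional projection alternates between moving and standing still, which is the sense in which a constant-rate walk can look rate-heterogeneous after projection.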

In the literature it is common to explain macroevolutionary rate variation by temporal changes in biotic and abiotic factors. Our theoretical finding indicates that rate variation at macroevolutionary scale might be explained by the topology or “evolutionary architecture” of adaptive landscape alone.

At the macroevolutionary scale, multiple factors will change over time – the selective environment, genetic architecture, development, mutation rates, life history, etc. Each of these may map onto additional hidden factors that may alter transition rates between states or expand the state space of evolutionary change in our framework. As the environment changes, so too does the “holey adaptive landscape,” expanding the dimensionality of the state space further. Over millions of years of macroevolutionary change, evolutionary walks traverse this vast “macroevolutionary landscape” during trait evolution, shedding light on its larger topology and accessibility. When using only macroevolutionary data, many of these different factors, including the observational process by which researchers construct characters [see the Two-Scientist Paradox in Tarasov (2019)], are confounded, and many of them may lead to the appearance of heterogeneity in rate. However, integrating knowledge of “microevolutionary topologies” known through developmental biology, mutation screening, trait variation within and among populations, and selection experiments, for example, can begin to constrain congruence classes to the set of macroevolutionary topologies that are consistent with the observed data. We believe this will provide a pathway for integrating microevolutionary data into macroevolutionary studies by using micro- and macroevolutionary data jointly to “learn” the overarching topology of macroevolutionary landscapes. If macroevolutionary dynamics change, for example, with rate shifts in particular clades, candidate factors can be investigated within groups to identify plausible causes that may further expand the state space of macroevolution. Such causes can range from trivial (improperly lumping distinct states into a single state, e.g., the wings of birds and insects) to elucidating dramatic changes in the evolvability of a character due to developmental reorganization.

Thus, we see two primary advantages to recognizing and exploring the congruence classes of Markov models. First, we can better understand the limits to macroevolutionary inference and enumerate the large family of candidate processes that can explain a given macroevolutionary pattern by exhaustively or heuristically exploring higher dimensional state spaces [as in Höhna et al. (2022)]. But second, within those congruence classes can be found topologies that map onto specific biological hypotheses for the structure of evolutionary change, each of which may make specific predictions, for example, for how traits are constructed developmentally or evolve in response to selection at the microevolutionary scale. Thus, while the apparent widespread existence of congruence classes across comparative models is a substantial challenge to macroevolutionary inference, full recognition and understanding of this higher dimensional state space may nevertheless provide the key insights needed to connect macroevolutionary models to the causes and underlying processes of evolutionary change.

### Are there other congruence classes?

CTMCs have other types of asymptotically unidentifiable congruent models besides those reviewed here, which are based on the lumpability property. Some such models can be constructed given that *e*^{X}*e*^{Y} = *e*^{(X+Y)} if the matrices commute: *XY* = *YX*. For example, imagine a homogeneous CTMC, which we denote as *Q*, with all transitions equal to one (i.e., *q* = 1); it evolves over a tree branch of length *τ* with probability *e*^{Qτ}. This CTMC is congruent to a heterogeneous CTMC consisting of two rate matrices *Q*_{1} and *Q*_{2}, in which the parameters (*q*_{1} and *q*_{2}, respectively) are different between the matrices but equal within them, following the relationship *q*_{1} + *q*_{2} = 2; for example, the rates can be *q*_{1} = 0.5 and *q*_{2} = 1.5. *Q*_{1} and *Q*_{2} commute because both are scalar multiples of the same matrix. The heterogeneous CTMC evolves under *Q*_{1} from the start (*t* = 0) to the half-time (*t* = *τ*/2), and under *Q*_{2} from the half-time to *τ*; its probability is *e*^{Q_{1}τ/2}*e*^{Q_{2}τ/2}. Clearly, the probabilities are the same for these two processes, *e*^{Qτ} = *e*^{Q_{1}τ/2}*e*^{Q_{2}τ/2}, and hence the likelihoods are the same too. There are infinite ways to construct a congruent heterogeneous process from a homogeneous one, following the properties of matrix commutativity. Obviously, in such trivial cases, the heterogeneous CTMC will always have more parameters than the homogeneous CTMC, which does not pose any problem of model selection since the homogeneous model can always be chosen due to its simplicity. Even if the true process was heterogeneous, we would never be able to detect it. We must assume that the homogeneous CTMC represents the average rate of the true process, which we consider the simplest description. Therefore, this trivial congruence is less interesting than the HE congruence at this moment.
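The equality of the two probabilities in this example can be checked numerically. The sketch below assumes a three-state CTMC and a branch length of 0.8, neither of which is specified in the text, and uses a plain truncated Taylor series for the matrix exponential so that it needs no external libraries; the helper names are hypothetical.

```python
# Hypothetical check that a homogeneous process (q = 1) and a two-epoch process
# (q1 = 0.5, then q2 = 1.5) give the same transition probabilities.

def equal_rates(n, q):
    """n-state rate matrix with every off-diagonal rate equal to q."""
    return [[-(n - 1) * q if i == j else q for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm(Q, t, terms=40):
    """exp(Q*t) from a truncated Taylor series; adequate for the small matrices used here."""
    n = len(Q)
    A = [[Q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

tau = 0.8                 # assumed branch length
Q = equal_rates(3, 1.0)   # homogeneous process
Q1 = equal_rates(3, 0.5)  # first half of the branch
Q2 = equal_rates(3, 1.5)  # second half of the branch

P_hom = expm(Q, tau)
P_het = mat_mul(expm(Q1, tau / 2), expm(Q2, tau / 2))
max_diff = max(abs(P_hom[i][j] - P_het[i][j]) for i in range(3) for j in range(3))
print(max_diff < 1e-12)
```

Any other pair of rates with *q*_{1} + *q*_{2} = 2 can be substituted for 0.5 and 1.5 without changing the result, which is the sense in which infinitely many heterogeneous processes are congruent to the homogeneous one.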

Things become more complicated if the matrices do not commute (i.e., *XY* ≠ *YX*). Then, the homogeneous rate matrix may or may not exist. This issue is related to the multiplicative closure of Markov models (Sumner et al., 2012b) that has been studied using interesting methods of Lie group theory (Sumner et al., 2012a; Woodhams et al., 2015). These methods might be promising for studying model congruence in general. In this type of congruence, the problem of model selection may arise if it is possible to show that some homogeneous process can be represented by a set of rate matrices that together have fewer parameters than the homogeneous rate matrix itself. In that case, model selection would become ambiguous. We do not know any such examples, but this issue requires further research.

Another class of congruent models for general CTMCs can be constructed using weak lumpability that holds under specific values of the initial probability vector (Kemeny and Snell, 1960). The earlier study (Tarasov, 2019) showed that the weak lumpability does not generally hold in the CTMCs operating on phylogenetic trees. However, weak lumpability is less tractable mathematically, and it is possible, but unlikely, that some specific kind of it may apply to phylogenetics. Thus, model congruence needs additional research to identify cases that may pose ambiguity for model selection or prove that such cases do not exist.

### Future Perspectives

This paper showed that congruent models exist in all types of CTMCs used in modeling various phylogenetic phenomena. The HE can be used to construct infinite congruence classes that include models with a varying number of parameters. However, we show that model selection within the same congruence class is not particularly problematic when model complexity is penalized based on mathematical and biological reasoning. We also believe that similar methods that penalize for complexity, use priors, or apply other regularization techniques (Morlon et al., 2022; Magee et al., 2020) should be adopted for the congruence occurring in the time-dependent diversification process discussed in LP. Thus, model congruence does not generally pose a problem but a challenge that helps us better understand statistical models and the underlying structure of evolutionary change. It can potentially be used as a tool for understanding evolutionary processes, as we demonstrated using the holey adaptive landscape evolution. Thus, we call for additional research in congruent models to better understand relationships between congruence classes and classify other types of congruence that may occur in CTMCs.

## Funding

ST was supported by the Academy of Finland grant: 339576, and the three-year grant from University of Helsinki; ST partially conducted this work while a Postdoctoral Fellow at the National Institute for Mathematical and Biological Synthesis sponsored by the NSF Award DBI-1300426.

JCU was funded by NSF-DEB-1942717 and NSF-DBI-1661516.

## Appendix

### A1. Proofs for CHE

In the proofs below, think of any CTMC as a graph with states (nodes) and transitions between them (edges). Every such graph is specified by a rate matrix. Coloring graph states corresponds to splitting the states into groups that refer to the observable states (different colors mean different observable states). We say that a graph can be colored (or coloring is possible) if no two adjacent vertices (i.e., those connected by the same edge) are of the same color. We provide two proofs for the existence of CHE, one in *Q*^{2} (trivial) and another in *Q*^{s}.

#### A1.1 Proving CHE for a two-state CTMC

*Any two-state CTMC with asymmetric and defined rates can be represented by a hyperdimensional CTMC with hidden states (*⊙^{n} *Q*^{2}). *The hyperdimensional CTMC characterizes correlated evolution of n two-state CTMCs (Q*^{2}). *The two-state CTMC and* ⊙^{n} *Q*^{2} *are congruent processes*.

Suppose ⊕^{n} *Q*^{2} describes evolution of *n* independent CTMCs with equal transition rate *q*. The state space of ⊕^{n} *Q*^{2} is a hypercube with dimension *n*. The number of states in ⊕^{n} *Q*^{2} is 2^{n}, every state is labeled by the combination of states (i.e., 0s and 1s) from *n* initial *Q*^{2}. For example, for ⊕^{2} *Q*^{2}, the state labels are {00, 01, 10, 11}.

Every hypercube graph can be colored with two colors because all hypercube graphs are bipartite. The colors are the two observable states. This coloring may be constructed by giving one color to those labels that sum up to an even number and the other color to those that sum up to an odd number. Arranging the rows and columns in the rate matrix ⊕^{n} *Q*^{2} by colors splits the matrix into four blocks that correspond to transitions between the observable states (i.e., colors). The main diagonal blocks in the arranged ⊕^{n} *Q*^{2} have all transitions equal to 0 (by the coloring condition). Transitions between the hidden states are allowed only in the off-diagonal blocks; the number of possible transitions per hidden state equals the dimension *n*. Obviously, ⊕^{n} *Q*^{2} is lumpable, with respect to the two colors, into a binary CTMC with symmetric transition rate *nq* (by the RWR). It is always possible to modify one off-diagonal partition block in ⊕^{n} *Q*^{2} by prohibiting certain allowed transitions (i.e., making them 0) while, at the same time, maintaining the RWR within this block; this implies setting the number of allowed transitions for each hidden state in the modified block to *k* (*k* < *n*). Such a transformation results in ⊙^{n} *Q*^{2}, which describes the evolution of *n* correlated binary CTMCs. Obviously, ⊙^{n} *Q*^{2} is also lumpable (by the RWR) with respect to the two colors and produces some binary CTMC with asymmetric rates *nq* and *kq*. Apparently, the construction of ⊙^{n} *Q*^{2} is flexible: the variables *n*, *q*, and *k* can be chosen so that the rates *nq* and *kq* become any desired rational numbers. Moreover, this construction is not unique for *Q*_{2}. This implies that any such two-state CTMC is congruent to ⊙^{n} *Q*^{2}.
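The construction in this proof can be sketched in code for small *n*. The function names and the particular choice of which transitions to prohibit (even-parity states keep only the flips of their first *k* components) are assumptions made for illustration; any other choice that leaves exactly *k* allowed transitions per state in the modified block would serve equally well.

```python
from itertools import product

# Hypothetical construction of the correlated hypercube model for n binary CTMCs.

def hypercube_che(n, q, k):
    """Rate matrix on the n-cube in which even-parity states keep only k of their n exits."""
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for s in states:
        row = Q[index[s]]
        even = sum(s) % 2 == 0
        for bit in range(n):
            if even and bit >= k:
                continue  # prohibited transition in the modified off-diagonal block
            flipped = s[:bit] + (1 - s[bit],) + s[bit + 1:]
            row[index[flipped]] = q
        row[index[s]] = -sum(row)  # diagonal makes the row sum to zero
    colors = [sum(s) % 2 for s in states]  # parity coloring = observable state
    return Q, colors

def lumped_rates(Q, colors):
    """Summed rate of every state into the opposite color, collected per color."""
    out = {0: set(), 1: set()}
    for i, row in enumerate(Q):
        total = sum(r for j, r in enumerate(row) if colors[j] != colors[i])
        out[colors[i]].add(round(total, 9))
    return out

Q, colors = hypercube_che(n=3, q=0.1, k=1)
print(lumped_rates(Q, colors))  # {0: {0.1}, 1: {0.3}}
```

Each color reports a single value, so the RWR holds and the eight-state model lumps to a binary CTMC with asymmetric rates *kq* = 0.1 and *nq* = 0.3; setting *k* = *n* recovers the symmetric case with rate *nq* in both directions.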

#### A1.2 Proving CHE for a CTMC with any number of states

*An s-state CTMC with variable and defined rates can be represented by a hyperdimensional CTMC with hidden states (*⊙^{n} *Q*^{s}). *The hyperdimensional CTMC characterizes correlated evolution of n s-state CTMCs (Q*^{s}). *The s-state CTMC and* ⊙^{n} *Q*^{s} *are congruent processes*.

The proof is similar to the one above. Suppose there is ⊕^{n} *Q*^{s} that describes the evolution of *n* independent CTMCs (*Q*^{s}) with equal transition rate *q*; the number of states in ⊕^{n} *Q*^{s} is *s*^{n}. The main idea is to show that the amalgamated ⊕^{n} *Q*^{s} can be colored by *s* colors that correspond to the *s* observable states. Since the coloring is possible (see the next section), arranging the rate matrix of ⊕^{n} *Q*^{s} by colors results in *s*^{2} matrix blocks. The main diagonal blocks have all transitions equal to 0 (except for the negative transitions); the off-diagonal blocks have an equal number of transitions per hidden state, which is *n*. Lumping with respect to them produces an *s*-state rate matrix with equal transition rate *qn*. Obviously, ⊕^{n} *Q*^{s} can be transformed into ⊙^{n} *Q*^{s} by prohibiting some transitions to match the rate values of the target *s*-state CTMC and maintain the RWR (as in the previous proof). Thus, ⊙^{n} *Q*^{s} can be lumped into the target *s*-state CTMC. This implies that any such *s*-state CTMC is congruent to ⊙^{n} *Q*^{s}.

Constructing ⊕^{n} *Q*^{s} means amalgamating *Q*^{s} with itself *n* – 1 times (by the definition of ⊕^{n} *Q*^{s}), which can be done recursively. To make ⊕^{2} *Q*^{s}, take *Q*^{s} and uniquely enumerate each state with one of *s* natural numbers {1, 2, …, *s*}. This classifies states in enumerated classes. Make *s* – 1 copies of *Q*^{s}, which in total gives *s* copies of *Q*^{s} and *s* copies of states within each enumerated class. To complete the amalgamation, create edges (i.e., transitions with rate *q*) between all states within the same enumerated class (because all state transitions are allowed in *Q*^{s}). This gives ⊕^{2} *Q*^{s}.

Now, we show that coloring exists for this amalgamation. Color each copy of *Q*^{s} identically by *s* colors {*c*_{1}, *c*_{2}, …, *c*_{s}}; this coloring maps the set {*c*_{1}, *c*_{2}, …, *c*_{s}} to the enumerated state classes {1, 2, …, *s*}. Leave one copy of *Q*^{s} intact and permute colors in the remaining *s* – 1 copies. For this, construct a permutation *σ* = (*c*_{1}, *c*_{2}, …, *c*_{s}) that sends *c*_{1} to *c*_{2}, …, *c*_{s-1} to *c*_{s}, and *c*_{s} to *c*_{1}. Note that the permutation power (*σ*^{k}) means applying *σ* sequentially *k* times. For example, applying *σ* and *σ*^{2} to the set of the three colors {*c*_{1}, *c*_{2}, *c*_{3}} gives the following reordered sets: {*c*_{2}, *c*_{3}, *c*_{1}} and {*c*_{3}, *c*_{1}, *c*_{2}}. The permutation *σ* has a cycle of length *s*, indicating that *σ*^{s} returns the identity set. Thus, applying *σ*^{k} while incrementing *k* from *k* = 1 to *k* = *s* – 1 produces *s* – 1 sets with unique (up to permutation) color order. Now, apply the set of permutations {*σ*, *σ*^{2}, …, *σ*^{s-1}} to the set of the remaining copies of *Q*^{s} (there are *s* – 1 copies). The first copy (left intact) will preserve the original coloring, while the remaining *s* – 1 copies will have colors permuted. This will result in each of the *s* copies of *Q*^{s} having a unique order of colors, which also implies that the states within the same enumerated class (across the copies) are differently colored. Note that in the amalgamated ⊕^{2} *Q*^{s} the states are adjacent only if they are within the same copy of *Q*^{s} or if they belong to the same enumerated class. All states within the same copy have different colors (by the initial condition) and all states within the same enumerated class have different colors too, due to the permutation. Thus, no two adjacent states have the same color and coloring is possible.

The same recursive technique can be used to construct higher dimensional amalgamations. For ⊕^{d+1} *Q*^{s}, take ⊕^{d} *Q*^{s}, create a new enumeration for its *s*^{d} states, create *s* – 1 copies of ⊕^{d} *Q*^{s}, link edges within the same enumerated class, and use the coloring obtained for ⊕^{d} *Q*^{s} but permute it for the *s* – 1 copies using the powers of *σ*. This will yield the appropriately colored ⊕^{d+1} *Q*^{s}. Since the permutation *σ* always exists for any ⊕^{n} *Q*^{s}, it implies that coloring of ⊕^{n} *Q*^{s} is always possible.
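The coloring argument can be verified by brute force for small cases. The sketch below labels the states of each component CTMC 0, …, *s* – 1 instead of 1, …, *s*, and it assumes that applying the cyclic permutation *σ* once per copy amounts to adding the component states modulo *s*; the function names are hypothetical.

```python
from itertools import product

# Hypothetical brute-force check of the coloring used for the amalgamation of n s-state CTMCs.

def color(state, s):
    """Color of a composite state: each copy shifts the colors cyclically, so they add modulo s."""
    return sum(state) % s

def neighbors(state, s):
    """States reachable in one step: exactly one component CTMC changes its state."""
    for pos, value in enumerate(state):
        for other in range(s):
            if other != value:
                yield state[:pos] + (other,) + state[pos + 1:]

def coloring_is_proper(n, s):
    """True if no adjacent states share a color and each state has n neighbors of every other color."""
    for state in product(range(s), repeat=n):
        counts = [0] * s
        for nb in neighbors(state, s):
            counts[color(nb, s)] += 1
        own = color(state, s)
        if counts[own] != 0:
            return False
        if any(c != n for i, c in enumerate(counts) if i != own):
            return False
    return True

print(coloring_is_proper(2, 3), coloring_is_proper(3, 4))  # True True
```

The second condition checked by `coloring_is_proper` corresponds to the statement that every off-diagonal block has *n* transitions per hidden state, which is what makes the amalgamation lumpable with equal rate *qn*.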

### A2. Hidden Expansions in State-Speciation and Extinction Process (SSE)

#### A2.1. Hidden Expansions

We define a birth-death process (*Q*_{bd}) for lineage speciation (λ) and extinction (*μ*), and a binary CTMC for trait evolution (*Q*_{1}) as:

The simplest joint process for *Q*_{bd} and *Q*_{1}, known as the BiSSE model (*Q*_{bi}), is shown in the equation (11). A more complex model, HiSSE, that includes hidden traits is given in the equation (12). The capital integers {1, 2, 3, …} in the state names indicate the number of born species, the numerical subscripts {0, 1} denote states in the observed character, and the letters {*a*, *b*} denote hidden states.

These are the conventional SSE models used in phylogenetics. Note, in both models every trait’s state is associated with only one speciation and extinction event. The hidden expansions for traits can be constructed by adding extra hidden states to them. However, the hidden expansions for diversification process cannot be constructed on those conventional models. To allow it, the conventional SSE models should be extended to permit a dual transition – speciation/extinction event and change in trait’s state. The simplest of such models with dual transitions is:

So, the birth-death dual transitions can be used to construct any HE. For example, suppose there is a BiSSE process with variable diversification: λ_{1} = 0.1, λ_{2} = 0.2, *μ* = 0.1, and asymmetric transitions between the trait’s states: *q*_{1} = 0.2, *q*_{2} = 0.1, shown on the left-hand side of the equation (14). The EHE of this process is shown on the right-hand side of the equation (14); it is characterized by the equal rate value *q*_{sin} = 0.1. Note that the EHE is lumpable by the RWR with respect to the aggregation of states {{1_{0a}}, {1_{1a}, 1_{1b}}, {2_{0a}}, {2_{1a}, 2_{1b}}, …, {*S*_{0a}}, {*S*_{1a}, *S*_{1b}}} that produces the original BiSSE matrix. The same logic of allowing dual transitions applies to other hidden expansions of SSE processes.

#### A2.2. Congruence between correlated and independent models

Consider the character-independent (CID) model, in which the diversification process depends on the hidden trait but is independent of the observed trait. The CID model requires the following constraints on the rates in equation (12) to hold simultaneously: (1) *q*_{14} = *q*_{23} = *q*_{32} = *q*_{41} = 0; (2) *q*_{13} = *q*_{24}, *q*_{12} = *q*_{34}, *q*_{21} = *q*_{43}, *q*_{31} = *q*_{42}; (3) λ_{1} = λ_{3} > λ_{2} = λ_{4}; and (4) *μ*_{1} = *μ*_{3}, *μ*_{2} = *μ*_{4}. The submatrix (*B*_{1}) describing trait evolution in the CID model is shown in equation (15); the submatrix notation is used here for brevity. Note that it is lumpable with respect to the observed trait (*Q*_{ob}) by the aggregation {{1_{0a}, 1_{0b}}, {2_{1a}, 2_{1b}}}, which means that *Q*_{ob} can be modeled separately without any information on the hidden trait or the diversification process. There are many ways to maintain lumpability in *B*_{1} with respect to the observed trait. For example, a lumpable submatrix *B*_{2} can be constructed by allowing dual transitions in the top-right block shown in equation (15). Note that *B*_{2} breaks the independence constraints of the CID model. It therefore implies that the observed and hidden traits evolve in correlation and both affect the diversification process, so the latter depends on the observed trait too. The two models, *B*_{1} and *B*_{2}, are congruent and have the same number of parameters but imply drastically different evolutionary interpretations. They cannot be distinguished phylogenetically. Apparently, any CID model can be modified in a similar way by allowing dual transitions, resulting in a congruent model that implies state-dependent diversification. Thus, any CID model is congruent with a state-dependent model.
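The lumpability argument for the trait submatrices can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the implementation behind equation (15): it assumes the state order 0a, 0b, 1a, 1b (inferred from constraints (1) and (2)), uses hypothetical rate values, and builds one hypothetical *B*_{2} in which every 0 → 1 change in the observed trait is accompanied by a change in the hidden trait. It covers only the trait submatrix, so it illustrates lumpability with respect to the observed trait rather than the full congruence of the SSE models.

```python
# Assumed state order (inferred from the CID constraints): 0a, 0b, 1a, 1b.
# Aggregation blocks: observed state 0 -> {0a, 0b}, observed state 1 -> {1a, 1b}.
PARTITION = [[0, 1], [2, 3]]

# Hypothetical rate values, chosen only for illustration.
Q01, Q10 = 0.3, 0.1   # observed trait: 0 -> 1 and 1 -> 0
QAB, QBA = 0.2, 0.4   # hidden trait: a -> b and b -> a


def rate_matrix(off_diag):
    """Copy off_diag and set each diagonal entry so that the row sums to zero."""
    q = [row[:] for row in off_diag]
    for i, row in enumerate(q):
        row[i] = -sum(x for j, x in enumerate(row) if j != i)
    return q


def lump(q, partition, tol=1e-9):
    """Aggregate q over partition; return None if q is not strongly lumpable."""
    lumped = []
    for src in partition:
        lumped_row = []
        for dst in partition:
            # Total rate from each state of the source block into the target block.
            totals = [sum(q[i][j] for j in dst) for i in src]
            if max(totals) - min(totals) > tol:
                return None  # the totals differ within a block: not lumpable
            lumped_row.append(totals[0])
        lumped.append(lumped_row)
    return lumped


# B1: CID-style submatrix, no dual transitions (q14 = q23 = q32 = q41 = 0).
B1 = rate_matrix([
    [0.0, QAB, Q01, 0.0],
    [QBA, 0.0, 0.0, Q01],
    [Q10, 0.0, 0.0, QAB],
    [0.0, Q10, QBA, 0.0],
])

# B2 (hypothetical): dual transitions replace the single ones in the top-right block.
B2 = rate_matrix([
    [0.0, QAB, 0.0, Q01],
    [QBA, 0.0, Q01, 0.0],
    [Q10, 0.0, 0.0, QAB],
    [0.0, Q10, QBA, 0.0],
])

print(lump(B1, PARTITION))
print(lump(B2, PARTITION))
```

Under these assumptions, both matrices aggregate to the same two-state matrix for the observed trait (up to floating-point rounding), although only the second one contains dual transitions. `lump` returns `None` whenever the summed rates into a block differ between states of the same source block.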