Abstract
In this paper we analyze the effect of substitution rate heterogeneity on the sample complexity of species tree estimation. We consider a model based on the multi-species coalescent (MSC), with the addition that gene trees exhibit random i.i.d. rates of substitution. Our first result is a lower bound on the number of loci needed to distinguish 2-leaf trees (i.e., pairwise distances) with high probability when substitution rates satisfy a growth condition. In particular, we show that distinguishing two distances that differ by length f with high probability requires O(f^{-2}) loci, a significantly higher bound than in the constant-rate case. Our second main result is a lower bound on the amount of data needed to reconstruct a 3-leaf species tree with high probability when mutation rates are gamma distributed. In this case as well, we show that the number of gene trees must grow as O(f^{-2}).
Competing Interest Statement
The authors have declared no competing interest.