RT Journal Article
SR Electronic
T1 Distinguishing coalescent models - which statistics matter most?
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 679498
DO 10.1101/679498
A1 Fabian Freund
A1 Arno Siri-jégousse
YR 2019
UL http://biorxiv.org/content/early/2019/06/22/679498.abstract
AB Modelling genetic diversity needs an underlying genealogy model. To choose a fitting model based on genetic data, one can perform model selection between classes of genealogical trees, e.g. Kingman’s coalescent with exponential growth or multiple merger coalescents. Such selection can be based on many different statistics measuring genetic diversity. We use a random forest based Approximate Bayesian Computation to disentangle the effects of different statistics on distinguishing between various classes of genealogy models. For the specific question of inferring whether genealogies feature multiple mergers, we introduce a new statistic, the observable minimal clade size, which corresponds to the minimal allele count of non-private mutations in an individual.