PT - JOURNAL ARTICLE
AU - Fabian Freund
AU - Arno Siri-Jégousse
TI - Distinguishing coalescent models - which statistics matter most?
AID - 10.1101/679498
DP - 2019 Jan 01
TA - bioRxiv
PG - 679498
4099 - http://biorxiv.org/content/early/2019/06/22/679498.short
4100 - http://biorxiv.org/content/early/2019/06/22/679498.full
AB - Modelling genetic diversity needs an underlying genealogy model. To choose a fitting model based on genetic data, one can perform model selection between classes of genealogical trees, e.g. Kingman’s coalescent with exponential growth or multiple merger coalescents. Such selection can be based on many different statistics measuring genetic diversity. We use a random forest based Approximate Bayesian Computation to disentangle the effects of different statistics on distinguishing between various classes of genealogy models. For the specific question of inferring whether genealogies feature multiple mergers, we introduce a new statistic, the observable minimal clade size, which corresponds to the minimal allele count of non-private mutations in an individual.
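
The abstract's verbal definition of the observable minimal clade size can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it rests on several assumptions. The data are taken to be a 0/1 matrix with one row per sampled individual and one column per segregating site, where 1 marks the derived allele. A mutation is treated as non-private when at least two individuals carry it. An individual with no non-private mutation gets `None`, because the abstract does not say how that case is handled. The function name `observable_minimal_clade_sizes` is hypothetical.

```python
from typing import List, Optional, Sequence


def observable_minimal_clade_sizes(
    genotypes: Sequence[Sequence[int]],
) -> List[Optional[int]]:
    """For each individual, return the smallest allele count among the
    non-private mutations that individual carries.

    Assumptions (not taken from the source): rows are individuals, columns
    are segregating sites, 1 = derived allele, and "non-private" means the
    mutation is carried by at least two individuals. Individuals without
    any non-private mutation get None.
    """
    if not genotypes:
        return []
    n_sites = len(genotypes[0])

    # Allele count of each mutation = number of individuals carrying it.
    allele_counts = [sum(row[j] for row in genotypes) for j in range(n_sites)]

    sizes: List[Optional[int]] = []
    for row in genotypes:
        # Allele counts of the shared (non-private) mutations of this individual.
        shared = [
            allele_counts[j]
            for j in range(n_sites)
            if row[j] == 1 and allele_counts[j] >= 2
        ]
        sizes.append(min(shared) if shared else None)
    return sizes


# Toy data: 4 individuals, 4 segregating sites with allele counts 3, 2, 1, 1.
example = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
print(observable_minimal_clade_sizes(example))  # [2, 2, 3, None]
```

In the toy data, the first two individuals share a mutation of allele count 2, the third carries only one shared mutation (allele count 3) besides a private one, and the fourth carries only a private mutation.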