PT - JOURNAL ARTICLE
AU - Trefzer, Axel
AU - Stamatakis, Alexandros
TI - Compressing Streams of Phylogenetic Trees
AID - 10.1101/440644
DP - 2018 Jan 01
TA - bioRxiv
PG - 440644
4099 - http://biorxiv.org/content/early/2018/10/15/440644.short
4100 - http://biorxiv.org/content/early/2018/10/15/440644.full
AB - Bayesian Markov-Chain Monte Carlo (MCMC) methods for phylogenetic tree inference, that is, inference of the evolutionary history of distinct species from their molecular sequence data, typically generate large sets of phylogenetic trees. The trees generated by the MCMC procedure are samples from the posterior probability distribution that MCMC methods approximate. Thus, these methods generate a stream of correlated binary trees that needs to be stored. Here, we adapt state-of-the-art algorithms for binary tree compression to phylogenetic tree data streams and extend them to also store the required meta-data. On a phylogenetic tree stream containing 1,000 trees with 500 leaves, including branch length values, we achieve a compression ratio of 5.4 relative to the uncompressed tree files and of 1.8 relative to bzip2-compressed tree files. For the same trees without branch length values, our compression method is approximately an order of magnitude better than bzip2. A prototype implementation is available at https://github.com/axeltref/tree-compression.git.