RT Journal Article
SR Electronic
T1 Compressing Streams of Phylogenetic Trees
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 440644
DO 10.1101/440644
A1 Trefzer, Axel
A1 Stamatakis, Alexandros
YR 2018
UL http://biorxiv.org/content/early/2018/10/15/440644.abstract
AB Bayesian Markov-Chain Monte Carlo (MCMC) methods for phylogenetic tree inference, that is, inference of the evolutionary history of distinct species using their molecular sequence data, typically generate large sets of phylogenetic trees. The trees generated by the MCMC procedure are samples of the posterior probability distribution that MCMC methods approximate. Thus, they generate a stream of correlated binary trees that need to be stored. Here, we adapt state-of-the-art algorithms for binary tree compression to phylogenetic tree data streams and extend them to also store the required meta-data. On a phylogenetic tree stream containing 1,000 trees with 500 leaves including branch length values, we achieve a compression rate of 5.4 compared to the uncompressed tree files and of 1.8 compared to bzip2-compressed tree files. For compressing the same trees, but without branch length values, our compression method is approximately an order of magnitude better than bzip2. A prototype implementation is available at https://github.com/axeltref/tree-compression.git.