TY  - JOUR
T1  - Sparsification of Large Ultrametric Matrices: Insights into the Microbial Tree of Life
JF  - bioRxiv
DO  - 10.1101/2022.08.21.504697
SP  - 2022.08.21.504697
AU  - Evan D. Gorman
AU  - Manuel E. Lladser
Y1  - 2022/01/01
UR  - http://biorxiv.org/content/early/2022/11/21/2022.08.21.504697.abstract
N2  - Strictly ultrametric matrices appear in many domains of mathematics and science; nevertheless, they can be large and dense, making them difficult to store and manipulate, unlike large but sparse matrices. In this manuscript, we exploit that strictly ultrametric matrices can be represented as binary trees to sparsify them via an orthonormal base change based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in random but large strictly ultrametric matrices remain non-zero after the base change; and develop an algorithm to sparsify such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the Haar-like wavelets and supply a sufficient condition to approximate the spectrum of strictly ultrametric matrices outside this subclass. Our methods give computational access to the covariance matrix of the microbiologists’ Tree of Life, which was previously inaccessible due to its size, and motivate introducing a new wavelet-based (beta-diversity) metric to compare microbial environments. Unlike the established (beta-diversity) metrics, the new metric may be used to identify internal nodes (i.e., splits) in the Tree that link microbial composition and environmental factors in a statistically significant manner. MSC codes: 05C05, 15A18, 42C40, 65F55, 92C70. Competing Interest Statement: The authors have declared no competing interest.
ER  - 