Methods in Description and Validation of Local Metagenetic Microbial Communities

David Molik; Michael Pfrender; Scott Emrich

doi:10.1101/198614

Abstract

We propose MinHash (as implemented in Mash) and non-negative matrix factorization (NMF) as alternative methods for estimating similarity between metagenetic samples. We further characterize the resulting similarities with cluster analysis and with correlations against independent ecological metadata.
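As a rough illustration of the MinHash idea, the Python sketch below builds a bottom-s sketch (the s smallest hashes of a sample's canonical k-mers), estimates the Jaccard index j of two samples from their sketches, and converts j into the distance Mash reports, D = -(1/k) ln(2j / (1 + j)). It is a minimal sketch under stated assumptions, not Mash itself: the function names are hypothetical, and SHA-1 truncated to 64 bits stands in for the MurmurHash3 hash that Mash uses.

```python
import hashlib
import math

# Complement table for building reverse complements of DNA k-mers.
_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def kmer_hash(kmer):
    """Stable 64-bit hash (hypothetical stand-in; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest distinct canonical k-mer hashes."""
    hashes = {kmer_hash(canonical(seq[i:i + k])) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def jaccard_estimate(sk_a, sk_b, s):
    """Estimate the Jaccard index of two k-mer sets from their bottom-s sketches."""
    # The s smallest hashes of the union of the sketches equal the s smallest
    # hashes of the union of the full k-mer sets.
    union_sketch = sorted(set(sk_a) | set(sk_b))[:s]
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in union_sketch if h in shared) / len(union_sketch)


def mash_distance(j, k):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); infinite when j == 0."""
    if j == 0:
        return math.inf
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    k, s = 5, 100
    a = sketch("ACGTACGTTGCAAGGCTTAACG", k, s)
    b = sketch("ACGTACGTTGCAAGGCTTAACC", k, s)
    j = jaccard_estimate(a, b, s)
    print(f"Jaccard estimate {j:.3f}, Mash distance {mash_distance(j, k):.4f}")
```

Because only s hashes per sample are stored, pairwise comparisons cost O(s) regardless of sample size; Mash's defaults are k = 21 and s = 1000, and the toy values above are chosen only so that short strings yield k-mers.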
Species and k-mer abundance information is used to compute similarities and build clusters, both to better understand how communities interact and to relate them to known environmental variables such as pH and soil conductivity.
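One common way to turn NMF into clusters, shown here only as an assumption since the abstract does not specify the procedure, is to factor a non-negative abundance matrix V (rows are species or k-mers, columns are samples) as V ≈ WH with rank r, then assign each sample to the factor with the largest coefficient in its column of H. The pure-Python sketch below uses the Lee-Seung multiplicative updates for squared error; the function names are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]


def nmf(V, r, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (features x samples) as W (features x r) times
    H (r x samples) with multiplicative updates minimizing ||V - WH||^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(r)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(n)]
    return W, H


def nmf_clusters(H):
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    return [max(range(len(H)), key=lambda i: H[i][j]) for j in range(len(H[0]))]


if __name__ == "__main__":
    # Toy abundances: features 0-1 dominate samples 0-1, features 2-3 samples 2-3.
    V = [[3, 3, 0, 0], [2, 2, 0, 0], [0, 0, 1, 1], [0, 0, 4, 4]]
    W, H = nmf(V, 2)
    print(nmf_clusters(H))
```

The updates keep every entry non-negative, which is what lets each column of H be read as a sample's mixture of "community" factors. For real data a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would be the practical choice, and r is a modelling decision that the silhouette analysis can help compare.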
We use cluster silhouettes to assess different approaches to clustering metagenetic samples, and analysis of variance (ANOVA) to uncover links between the metagenetic samples and the known environmental variables.
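Both assessment steps reduce to short formulas. The sketch below, offered as an illustration rather than the authors' code, computes per-sample silhouettes s(i) = (b - a) / max(a, b) from a precomputed distance matrix (for example, pairwise Mash distances), where a is the mean distance to the sample's own cluster and b the smallest mean distance to any other cluster, and a one-way ANOVA F statistic for one environmental variable grouped by cluster. The function names and the grouping of a variable such as pH by cluster label are assumptions.

```python
from collections import defaultdict


def silhouette_scores(D, labels):
    """Per-sample silhouettes from a precomputed distance matrix D.
    Samples in singleton clusters get s(i) = 0 by convention."""
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        members[lab].append(i)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in members[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(D[i][j] for j in own) / len(own)  # mean intra-cluster distance
        b = min(sum(D[i][j] for j in idx) / len(idx)  # nearest other cluster
                for other, idx in members.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def mean_silhouette(D, labels):
    """Average silhouette over all samples; higher means better-separated clusters."""
    s = silhouette_scores(D, labels)
    return sum(s) / len(s)


def one_way_anova_f(groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (N - k)),
    where each group holds one variable's values for the samples in a cluster."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    pts = [0, 1, 10, 11]
    D = [[abs(x - y) for y in pts] for x in pts]
    print(mean_silhouette(D, [0, 0, 1, 1]), one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```

The F statistic alone is not a test result; a p-value requires the F distribution with (k - 1, N - k) degrees of freedom, which `scipy.stats.f_oneway` provides, and `sklearn.metrics.silhouette_score` accepts a precomputed distance matrix via `metric="precomputed"`.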
By analyzing data from the Atacama Desert and determining the relationship between ecological factors and group membership, we demonstrate the applicability of these methods.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.