Abstract
We propose MinHash (as implemented by Mash) and non-negative matrix factorization (NMF) as alternative methods for estimating similarity between metagenetic samples. We further characterize the resulting similarities with cluster analysis and correlations with independent ecological metadata.
Species and k-mer abundance information is used to determine similarities between samples and to group them into clusters, in order to better understand how communities interact and how they relate to known environmental variables such as pH and soil conductivity.
We use cluster silhouettes to assess various approaches to clustering metagenetic samples, and ANOVA to uncover links between the samples and the known environmental variables.
We demonstrate the applicability of these methods by analyzing data from the Atacama Desert and determining the relationship between ecological factors and group membership.
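
As an illustration of the MinHash similarity estimate referred to above, the following is a minimal sketch of a bottom-s sketch comparison between two sequences. It is not the Mash implementation: the function names, the toy sequences, and the use of SHA-1 as the hash function are assumptions made for this example (Mash itself uses MurmurHash3 and operates on full read sets or assemblies).

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    # Stand-in hash (an assumption for this sketch): first 64 bits of SHA-1.
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(kmer_set, s):
    """Keep the s smallest hash values of a k-mer set (a bottom-s sketch)."""
    return sorted(hash_kmer(km) for km in kmer_set)[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard index of two k-mer sets from their sketches.

    The s smallest hashes of the merged sketches form a bottom-s sketch of
    the union; the fraction of those hashes present in both input sketches
    estimates |A and B| / |A or B|.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Toy usage with hypothetical sequences and parameters.
a = bottom_sketch(kmers("ACGTACGTGA", 3), s=100)
b = bottom_sketch(kmers("ACGTACGTCC", 3), s=100)
similarity = jaccard_estimate(a, b, s=100)
```

When the sketch size s is at least as large as the number of distinct k-mers, the estimate coincides with the exact Jaccard index; smaller sketches trade accuracy for speed and memory. A pairwise matrix of such estimates is the kind of input that could then be passed to clustering and silhouette analysis.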