Abstract
We propose MinHash (as implemented by MASH) and non-negative matrix factorization (NMF) as alternative methods for estimating similarity between metagenetic samples. We further characterize the resulting similarities with cluster analysis and by correlating them with independent ecological metadata.
Using sample-to-sample similarities computed with MinHash, we generate clusters by hierarchical clustering; in parallel, we generate groups with NMF. We compare the groups derived from the MinHash-based clusters and from NMF with those determined by the environment, using Silhouette Width to assess cluster quality.
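As an illustration of the MinHash, clustering, and Silhouette Width steps, the following is a minimal sketch and not the MASH implementation. It assumes a bottom-k sketch over k-mers, a BLAKE2 hash as a stand-in for the hash function MASH uses, 1 - J as the dissimilarity, and average linkage for the hierarchical clustering; none of these choices is stated in the text above. The function names and the toy reads are hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    # Stable 64-bit hash (assumption: a stand-in, not the hash MASH uses).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seqs, k=4, size=64):
    """Bottom-`size` sketch: the smallest hash values over all k-mers of a sample."""
    hashes = {hash64(km) for s in seqs for km in kmers(s, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(a, b, size=64):
    """Estimate the Jaccard index of two samples from their bottom sketches."""
    union = sorted(set(a) | set(b))[:size]  # bottom sketch of the union
    if not union:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering (average linkage) on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters.pop(y)  # y > x, so index x is unaffected
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette_width(dist, labels):
    """Mean silhouette width from a square distance matrix and cluster labels."""
    n = len(labels)
    widths = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)  # singleton cluster or single cluster: use 0
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


# Toy reads (made up for illustration; not data from the study).
samples = {
    "s1": ["ACGTACGTTTGACCGA", "TTGACCGAACGTACGT"],
    "s2": ["ACGTACGTTTGACCGA", "TTGACCGAACGTAGGT"],
    "s3": ["GGCATTCAGGCATTCA", "CATTCAGGCCATTCAG"],
    "s4": ["GGCATTCAGGCATTCA", "CATTCAGGCAATTCAG"],
}
names = list(samples)
sketches = [sketch(samples[n]) for n in names]
dist = [[1.0 - jaccard_estimate(a, b) for b in sketches] for a in sketches]
labels = average_linkage(dist, n_clusters=2)
print(dict(zip(names, labels)), silhouette_width(dist, labels))
```

The same `silhouette_width` function can be applied to labels from any grouping (MinHash-derived clusters, NMF groups, or environmental categories), which is what makes the comparison between them possible on a common distance matrix.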
We analyze existing data from the Atacama Desert to determine the relationship between ecological factors and group membership. Using the groups generated by MASH and NMF, we run an ANOVA to uncover links between metagenetic samples and known environmental variables, such as pH and soil conductivity.
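The ANOVA step can be sketched as a one-way test of whether an environmental variable differs between groups. The example below is a minimal sketch assuming a one-way design with one variable at a time; the function name is hypothetical and the pH values are made up for illustration, not taken from the Atacama data. It returns only the F statistic and degrees of freedom; a p-value additionally requires the F distribution (for example via `scipy.stats.f_oneway`, if SciPy is available).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical pH measurements, grouped by cluster label (illustrative values only).
ph_by_group = [
    [7.1, 7.3, 7.0, 7.2],
    [7.9, 8.1, 8.0, 7.8],
    [6.5, 6.7, 6.4, 6.6],
]
f_stat, df_b, df_w = one_way_anova_f(ph_by_group)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that the variable differs more between groups than within them, which is the sense in which group membership is linked to an environmental variable.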