Abstract
Methods for remediating PCR and sequencing artifacts in 16S rRNA gene sequence collections are in continuous development and have significant ramifications for the inferences that can be drawn. A common approach is to remove rare amplicon sequence variants (ASVs) from datasets. However, the definition of rarity is generally selected without regard for the number of sequences in the samples or the variation in sequencing depth across samples within a study. I analyzed the impact of removing rare ASVs on metrics of alpha and beta diversity using samples collected across 12 published datasets. Removal of rare ASVs significantly decreased the number of ASVs and operational taxonomic units as well as their diversity. Furthermore, their removal increased the variation in community structure between samples. When simulating a known effect size, removal of rare ASVs reduced the power to detect the effect relative to not removing rare ASVs. Removal of rare ASVs did not affect the false detection rate when samples were randomized to simulate a null model. However, when samples were drawn from a null distribution and assigned to simulated treatment groups according to their sequencing depth, the false detection rate increased if rare ASVs were removed. The false detection rate did not vary when rare ASVs were retained. This analysis demonstrates the problems inherent in removing rare ASVs. Researchers are encouraged to retain rare ASVs, to select approaches that minimize PCR and sequencing artifacts, and to use rarefaction to control for uneven sequencing effort.
Importance
Removing rare amplicon sequence variants (ASVs) from 16S rRNA gene sequence collections is an approach that has grown in popularity for limiting PCR and sequencing artifacts. Yet, it is unclear what impact an abundance-based filter has on downstream analyses. To investigate the effects of removing rare ASVs, I analyzed the community distributions found in the samples of 12 published datasets. Analysis of these data and simulations based on them showed that removal of rare ASVs distorts the representation of microbial communities. This distortion artificially makes it more difficult to detect differences between treatment groups. Also of concern was the observation that when sequencing depth was confounded with the treatment, the probability of falsely detecting a difference between the treatment groups increased with the removal of rare ASVs. The practice of removing rare ASVs should stop, lest researchers adversely affect the interpretation of their data.