%0 Journal Article
%A Ethan B. Linck
%A C.J. Battey
%T Minor allele frequency thresholds strongly affect population structure inference with genomic datasets
%D 2017
%R 10.1101/188623
%J bioRxiv
%P 188623
%X Across the genome, the effects of different evolutionary processes and historical events can result in different classes of genetic variants (or alleles) characterized by their relative frequency in a given population. As a result, population genetic inference can be strongly affected by biases in laboratory and bioinformatics treatments that affect the site frequency spectrum, or SFS. Yet despite the widespread use of reduced-representation genomic datasets with nonmodel organisms, the potential consequences of these biases for downstream analyses remain poorly examined. Here, we assess the influence of minor allele frequency (MAF) thresholds implemented during variant detection on inference of population structure. We use simulated and empirical datasets to evaluate the effect of MAF thresholds on the ability to discriminate among populations and quantify admixture with both model-based and non-model-based clustering methods. We find model-based inference of population structure is highly sensitive to choice of MAF, and may be confounded by either including singletons or excluding all rare alleles. In contrast, non-model-based clustering is largely robust to MAF choice. Our results suggest that model-based inference of population structure can fail due to either natural demographic processes or assembly artifacts, with broad consequences for phylogeographic and population genetic studies. We propose a simple hypothesis to explain this behavior and recommend a set of best practices for researchers seeking to describe population structure using reduced-representation libraries.
%U https://www.biorxiv.org/content/biorxiv/early/2017/09/14/188623.full.pdf