RT Journal Article SR Electronic T1 How Array Design Affects SNP Ascertainment Bias JF bioRxiv FD Cold Spring Harbor Laboratory SP 833541 DO 10.1101/833541 A1 Johannes Geibel A1 Christian Reimer A1 Steffen Weigend A1 Annett Weigend A1 Torsten Pook A1 Henner Simianer YR 2019 UL http://biorxiv.org/content/early/2019/11/07/833541.abstract AB Single nucleotide polymorphisms (SNPs), genotyped with SNP arrays, have become the most widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in the populations involved in the development of the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. A sequential reduction of rare alleles during the development process was shown, with the main influencing factors being the identification of SNPs in a limited set of discovery populations and a within-line selection of SNPs when aiming for equidistant spacing. These effects were shown to be less severe with a larger discovery panel. Additionally, a generally massive overestimation of expected heterozygosity for the ascertained SNP sets was shown. For the original array, this overestimation was 28% higher for populations involved in the discovery process than for populations not involved. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations to the equal spacing process can mask this effect but also adds uncertainty. 
Finally, we make suggestions for the design of specialized arrays for large scale projects where whole genome re-sequencing techniques are still too expensive. ABBREVIATIONS: Hexp, expected heterozygosity; MAF, minor allele frequency; OHE, overestimation of expected heterozygosity; SNP, single nucleotide polymorphism; WGS, whole genome re-sequencing