RT Journal Article
SR Electronic
T1 Developing a 670k genotyping array to tag ∼2M SNPs across 24 horse breeds
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 112979
DO 10.1101/112979
A1 Robert J. Schaefer
A1 Mikkel Schubert
A1 Ernest Bailey
A1 Danika L. Bannasch
A1 Eric Barrey
A1 Gila Kahila Bar-Gal
A1 Gottfried Brem
A1 Samantha A. Brooks
A1 Ottmar Distl
A1 Ruedi Fries
A1 Carrie J. Finno
A1 Vinzenz Gerber
A1 Bianca Haase
A1 Vidhya Jagannathan
A1 Ted Kalbfleisch
A1 Tosso Leeb
A1 Gabriella Lindgren
A1 Maria Susana Lopes
A1 Nuria Mach
A1 Artur da Câmara Machado
A1 James N. MacLeod
A1 Annette McCoy
A1 Julia Metzger
A1 Cecilia Penedo
A1 Sagi Polani
A1 Stefan Rieder
A1 Imke Tammen
A1 Jens Tetens
A1 Georg Thaller
A1 Andrea Verini-Supplizi
A1 Claire M. Wade
A1 Barbara Wallner
A1 Ludovic Orlando
A1 James R. Mickelson
A1 Molly E. McCue
YR 2017
UL http://biorxiv.org/content/early/2017/07/05/112979.abstract
AB Background: To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next-generation, high-density equine SNP array. Results: Using whole genome sequence from 153 individuals representing 24 distinct breeds collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging genotype data from individuals with both whole genome sequence and genotypes from lower-density legacy SNP arrays, a subset of ∼5 million high-quality, high-density array candidate SNPs was selected based on breed representation and uniform spacing across the genome. Considering probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific), a set of ∼2 million SNPs was selected for a next-generation high-density SNP chip (MNEc2M).
Genotype data were generated using the MNEc2M array from a cohort of 332 horses from 20 breeds, and a lower-density array consisting of ∼670 thousand SNPs (MNEc670k) was designed for genotype imputation. Conclusions: Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse. Abbreviations: SNP, single nucleotide polymorphism; MNEc2M, the 2 million SNP array developed here; MNEc670k, the 670 thousand SNP array developed here; WGS, whole genome sequencing; bp, base pair; GATK, Genome Analysis Toolkit; QUAL, generic quality score output by GATK and SAMtools; MOR, the Morgan horse breed; STD, the Standardbred horse breed; VQSLOD, variant quality score log-odds; Gb, giga-bases; MHC, major histocompatibility complex; ECA, Equus caballus chromosome; VIP, very important probe; MAF, minor allele frequency; ALT, alternate allele; KDE, kernel density estimation; LD, linkage disequilibrium; ChrUn1, Equus caballus unknown chromosome