RT Journal Article
SR Electronic
T1 SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 148171
DO 10.1101/148171
A1 M.C. McClure
A1 J. McCarthy
A1 P. Flynn
A1 J. McClure
A1 K. O’Connell
A1 J.F. Kearney
YR 2017
UL http://biorxiv.org/content/early/2017/06/09/148171.abstract
AB A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) based verification in Bos taurus cattle has been the ISAG 100 and 200 SNP panels. While these SNP sets have provided an increased level of parentage accuracy over microsatellite markers (MS), they can validate the wrong parent for an animal at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. With rapidly increasing numbers of cattle being genotyped in Ireland, representing 61 Bos taurus breeds from a wide range of farm types (beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd size), the Irish Cattle Breeding Federation (ICBF) analysed different SNP densities to determine that a minimum of 500 SNP are needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction ICBF uses 800 SNP selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require sample and SNP quality control (QC). Most publications only deal with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not sample QC.
We report here a genomic sample QC pipeline to deal with the unique challenges of >1,000,000 genotypes from a national herd such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise.