RT Journal Article
SR Electronic
T1 Quality Control and Integration of Genotypes from Two Calling Pipelines for Whole Genome Sequence Data in the Alzheimer’s Disease Sequencing Project
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 318857
DO 10.1101/318857
A1 Adam C. Naj
A1 Honghuang Lin
A1 Badri N. Vardarajan
A1 Simon White
A1 Daniel Lancour
A1 Yiyi Ma
A1 Michael Schmidt
A1 Fangui Sun
A1 Mariusz Butkiewicz
A1 William S. Bush
A1 Brian W. Kunkle
A1 John Malamon
A1 Najaf Amin
A1 Seung Hoan Choi
A1 Kara L. Hamilton-Nelson
A1 Sven J. van der Lee
A1 Namrata Gupta
A1 Daniel C. Koboldt
A1 Mohamad Saad
A1 Bowen Wang
A1 Alejandro Q. Nato, Jr.
A1 Harkirat K. Sohi
A1 Amanda Kuzma
A1 Alzheimer’s Disease Sequencing Project (ADSP)
A1 Li-San Wang
A1 L. Adrienne Cupples
A1 Cornelia van Duijn
A1 Sudha Seshadri
A1 Gerard D. Schellenberg
A1 Eric Boerwinkle
A1 Joshua C. Bis
A1 Josée Dupuis
A1 William J Salerno
A1 Ellen M. Wijsman
A1 Eden R. Martin
A1 Anita L. DeStefano
YR 2018
UL http://biorxiv.org/content/early/2018/05/11/318857.abstract
AB The Alzheimer’s Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed “consensus calling,” to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted.
Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available. Abbreviations: AD, Alzheimer’s disease; QC, Quality Control; LSSAC, Large-Scale Sequencing and Analysis Center; Broad, Broad Institute Genomics Service; Baylor, Baylor College of Medicine Human Genome Sequencing Center; WashU, Washington University-St. Louis McDonnell Genome Institute; WGS, whole genome sequencing; WES, whole exome sequencing; indel, insertion-deletion variant; VCF, variant call format; MI, Mendelian inconsistency; MC, Mendelian consistency; GWAS, genome-wide association study; VR, referent allele read depth; DP, overall read depth; MS, mapping score; GQ, genotype quality score; Ti/Tv, Transition/Transversion; CS, concordance code