TY - JOUR
T1 - Whole-genome characterization in pedigreed non-human primates using Genotyping-By-Sequencing and imputation
JF - bioRxiv
DO - 10.1101/043240
SP - 043240
AU - Ben N Bimber
AU - Michael J Raboin
AU - John Letaw
AU - Kimberly Nevonen
AU - Jennifer E Spindel
AU - Susan R McCouch
AU - Rita Cervera-Juanes
AU - Eliot Spindel
AU - Lucia Carbone
AU - Betsy Ferguson
AU - Amanda Vinson
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/03/12/043240.abstract
N2 - Background: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still undeveloped. Whole-genome sequence (WGS) data in pedigreed macaque colonies could provide substantial experimental power, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for whole-genome sequencing, and imputes these dense marker data into all remaining individuals having sparse marker data, obtained using Genotyping-By-Sequencing (GBS). Results: We developed GBS for the macaque genome using a single digest with PstI, followed by sequencing to 30X coverage. From GBS sequence data collected on all individuals in a 16-member pedigree, we characterized an optimal 22,455 sparse markers spaced ~125 kb apart. To characterize dense markers for imputation, we performed WGS at 30X coverage on 9 of the 16 individuals, yielding ~10.2 million high-confidence variants. Using the approach of “Genotype Imputation Given Inheritance” (GIGI), we imputed alleles at an optimized dense set of 4,920 variants on chromosome 19, using 490 sparse markers from GBS.
We assessed changes in accuracy of imputed alleles 1) across 3 different strategies for selecting individuals for WGS, i.e., a) using “GIGI-Pick” to select informative individuals, b) sequencing the most recent generation, or c) sequencing founders only; and 2) when using 1-9 WGS individuals for imputation. We found that accuracy of imputed alleles was highest using the GIGI-Pick selection strategy (median 92%), and improved very little when using >4 individuals with WGS for imputation. We used this ratio of 4 WGS to 12 GBS individuals to impute an expanded set of ~14.4 million variants across all 20 macaque autosomes, achieving ~85-88% accuracy per chromosome. Conclusions: We conclude that an optimal tradeoff exists at the ratio of 1 individual selected for WGS using the GIGI-Pick algorithm, per 3-5 relatives selected for GBS, a cost savings of ~67-83% over WGS of all individuals. This approach makes feasible the collection of accurate, dense genome-wide sequence data in large pedigreed macaque cohorts without the need for expensive WGS data on all individuals. List of abbreviations: WGS, whole-genome sequencing; GBS, Genotyping-By-Sequencing; SNV, single-nucleotide variant; ONPRC, Oregon National Primate Research Center; MAF, minor allele frequency; GIGI, Genotype Imputation Given Inheritance; CNV, copy number variant; BWA, Burrows-Wheeler Aligner; VCF, variant call format; GATK, Genome Analysis Toolkit; MCMC, Markov Chain Monte Carlo; ML, most likely genotype calling method; THR, threshold genotype calling method.
ER -