PT - JOURNAL ARTICLE
AU - Justin M. Zook
AU - Jennifer McDaniel
AU - Hemang Parikh
AU - Haynes Heaton
AU - Sean A. Irvine
AU - Len Trigg
AU - Rebecca Truty
AU - Cory Y. McLean
AU - Francisco M. De La Vega
AU - Marc Salit
TI - Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials
AID - 10.1101/281006
DP - 2018 Jan 01
TA - bioRxiv
PG - 281006
4099 - http://biorxiv.org/content/early/2018/03/13/281006.short
4100 - http://biorxiv.org/content/early/2018/03/13/281006.full
AB - Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we improve and simplify the methods we use to integrate multiple sequencing datasets, with the intention of deploying a reproducible cloud-based pipeline for application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. Our new methods produce 17% more SNPs and 176% more indels than our previously published calls for HG001. We also phase 99.5% of the variants in HG001 and call about 90% of the reference genome with high confidence, increased from 78% previously. Our calls contain only 108 differences from the Illumina Platinum Genomes calls in GRCh37, only 14 of which are ambiguous or likely to be errors in our calls. By comparing several callsets to our new calls, our previously published calls, and Illumina Platinum Genomes calls, we highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls.
Our new calls address some of these challenges, but performance metrics should always be interpreted carefully. Benchmarking tools from the Global Alliance for Genomics and Health are useful for stratifying performance metrics by variant type and genome context to elucidate strengths and weaknesses of a method. We also explore how results differ when comparing to high-confidence calls for each of the 5 GIAB genomes, and show that performance metrics for one pipeline are largely similar, but not identical, across the 5 genomes. Finally, to explore the applicability of our methods to genomes that have fewer datasets, we form high-confidence calls using only Illumina and 10x Genomics data, and find that the resulting callsets contain more high-confidence calls but have a higher error rate. These newly characterized genomes have a broad, open consent with few restrictions on the availability of samples and data, enabling a uniquely diverse array of applications.