PT  - JOURNAL ARTICLE
AU  - Gunjan Baid
AU  - Maria Nattestad
AU  - Alexey Kolesnikov
AU  - Sidharth Goel
AU  - Howard Yang
AU  - Pi-Chuan Chang
AU  - Andrew Carroll
TI  - An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development
AID  - 10.1101/2020.12.11.422022
DP  - 2020 Jan 01
TA  - bioRxiv
PG  - 2020.12.11.422022
4099  - http://biorxiv.org/content/early/2020/12/11/2020.12.11.422022.short
4100  - http://biorxiv.org/content/early/2020/12/11/2020.12.11.422022.full
AB  - Accurate standards and extensive development datasets are the foundation of technical progress. To facilitate benchmarking and development, we sequence 9 samples, covering the Genome in a Bottle truth sets on multiple instruments (NovaSeq, HiSeqX, HiSeq4000, PacBio Sequel II System) and sample preparations (PCR-Free, PCR-Positive) for both whole genome and multiple exome kits. We benchmark pipelines, quantifying strengths and limitations for sequencing and analysis methods. We identify variability within and between instruments, preparation methods, and analytical pipelines, across various sequencing depths. We discuss the relevance of this variability to downstream analyses, and strategies to reduce variability. Competing Interest Statement: All authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. This study was funded by Google LLC.