PT - JOURNAL ARTICLE
AU - Gunjan Baid
AU - Maria Nattestad
AU - Alexey Kolesnikov
AU - Sidharth Goel
AU - Howard Yang
AU - Pi-Chuan Chang
AU - Andrew Carroll
TI - An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development
AID - 10.1101/2020.12.11.422022
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.12.11.422022
4099 - http://biorxiv.org/content/early/2020/12/11/2020.12.11.422022.short
4100 - http://biorxiv.org/content/early/2020/12/11/2020.12.11.422022.full
AB - Accurate standards and extensive development datasets are the foundation of technical progress. To facilitate benchmarking and development, we sequence 9 samples, covering the Genome in a Bottle truth sets on multiple instruments (NovaSeq, HiSeqX, HiSeq4000, PacBio Sequel II System) and sample preparations (PCR-Free, PCR-Positive) for both whole genome and multiple exome kits. We benchmark pipelines, quantifying strengths and limitations for sequencing and analysis methods. We identify variability within and between instruments, preparation methods, and analytical pipelines, across various sequencing depths. We discuss the relevance of this variability to downstream analyses, and strategies to reduce variability. Competing Interest Statement: All authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. This study was funded by Google LLC.