RT Journal Article
SR Electronic
T1 Benchmarking challenging small variants with linked and long reads
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.07.24.212712
DO 10.1101/2020.07.24.212712
A1 Justin Wagner
A1 Nathan D Olson
A1 Lindsay Harris
A1 Jennifer McDaniel
A1 Ziad Khan
A1 Jesse Farek
A1 Medhat Mahmoud
A1 Ana Stankovic
A1 Vladimir Kovacevic
A1 Byunggil Yoo
A1 Neil Miller
A1 Jeffrey A. Rosenfeld
A1 Bohan Ni
A1 Samantha Zarate
A1 Melanie Kirsche
A1 Sergey Aganezov
A1 Michael Schatz
A1 Giuseppe Narzisi
A1 Marta Byrska-Bishop
A1 Wayne Clarke
A1 Uday S. Evani
A1 Charles Markello
A1 Kishwar Shafin
A1 Xin Zhou
A1 Arend Sidow
A1 Vikas Bansal
A1 Peter Ebert
A1 Tobias Marschall
A1 Peter Lansdorp
A1 Vincent Hanlon
A1 Carl-Adam Mattsson
A1 Alvaro Martinez Barrio
A1 Ian T Fiddes
A1 Chunlin Xiao
A1 Arkarachai Fungtammasan
A1 Chen-Shan Chin
A1 Aaron M Wenger
A1 William J Rowell
A1 Fritz J Sedlazeck
A1 Andrew Carroll
A1 Marc Salit
A1 Justin M Zook
YR 2021
UL http://biorxiv.org/content/early/2021/10/06/2020.07.24.212712.abstract
AB Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling and sequencing methods. Here, we use accurate linked reads and long reads to expand the prior benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). For HG002, we include 92% of the autosomal GRCh38 assembly, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and reference errors) that should not have been in the previous version, which included 85% of GRCh38.
By including difficult-to-map regions, this benchmark identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We have demonstrated the utility of this benchmark to reliably identify false positives and false negatives across technologies in more challenging regions, which enables continued technology and bioinformatics development.
Competing Interest Statement: AMW and WJR are employees and shareholders of Pacific Biosciences. AMB and ITF were employees and shareholders of 10X Genomics. FJS has received sponsored travel from Oxford Nanopore and Pacific Biosciences, and received a 2018 sequencing grant from Pacific Biosciences. AS and VK are employees of Seven Bridges. AC is an employee of Google Inc. and is a former employee of DNAnexus. AF and C-SC are employees of DNAnexus. SMES is an employee of Roche.