PT - JOURNAL ARTICLE
AU - Christopher I. Cooper
AU - Delia Yao
AU - Dorota H. Sendorek
AU - Takafumi N. Yamaguchi
AU - Christine P’ng
AU - Cristian Caloian
AU - Michael Fraser
AU - SMC-DNA Challenge Participants
AU - Kyle Ellrott
AU - Adam A. Margolin
AU - Robert G. Bristow
AU - Joshua M. Stuart
AU - Paul C. Boutros
TI - Valection: Design Optimization for Validation and Verification Studies
AID - 10.1101/254839
DP - 2018 Jan 01
TA - bioRxiv
PG - 254839
4099 - http://biorxiv.org/content/early/2018/01/28/254839.short
4100 - http://biorxiv.org/content/early/2018/01/28/254839.full
AB - Background: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. In disciplines that rely heavily on high-throughput data generation, such as genomics, reducing the impact of false positive and false negative rates in results is a top priority. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. To determine how to create subsets of predictions for validation that maximize inference of global error profiles, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. Results: To evaluate these selection strategies, we obtained 261 sets of somatic mutation calls from a single-nucleotide variant caller benchmarking challenge where 21 teams competed on whole-genome sequencing datasets of three computationally-simulated tumours. By using synthetic data, we had complete ground truth of the tumours’ mutations and, therefore, we were able to accurately determine how estimates from the selected subset of verification candidates compared to the complete prediction set. We found that selection strategy performance depends on several verification study characteristics.
In particular, the verification budget of the experiment (i.e. how many candidates can be selected) is shown to influence estimates. Conclusions: The Valection framework is flexible, allowing for the implementation of additional selection algorithms in the future. Its applicability extends to any discipline that relies on experimental verification and will benefit from the optimization of verification candidate selection.
Abbreviations: SNV, single-nucleotide variant; NGS, next-generation sequencing; ICGC, International Cancer Genome Consortium; TCGA, The Cancer Genome Atlas; DREAM, Dialogue for Reverse Engineering Assessments and Methods; SMC-DNA, Somatic Mutation Calling DNA Challenge; TP, true positive; FP, false positive; FN, false negative