New Results

Valection: Design Optimization for Validation and Verification Studies

Christopher I. Cooper, Delia Yao, Dorota H. Sendorek, Takafumi N. Yamaguchi, Christine P’ng, Cristian Caloian, Michael Fraser, SMC-DNA Challenge Participants, Kyle Ellrott, Adam A. Margolin, Robert G. Bristow, Joshua M. Stuart, Paul C. Boutros
doi: https://doi.org/10.1101/254839
Christopher I. Cooper
1Ontario Institute for Cancer Research, Toronto, Canada
Delia Yao
1Ontario Institute for Cancer Research, Toronto, Canada
Dorota H. Sendorek
1Ontario Institute for Cancer Research, Toronto, Canada
Takafumi N. Yamaguchi
1Ontario Institute for Cancer Research, Toronto, Canada
Christine P’ng
1Ontario Institute for Cancer Research, Toronto, Canada
2Department of Medical Biophysics, University of Toronto, Toronto, Canada
Cristian Caloian
1Ontario Institute for Cancer Research, Toronto, Canada
Michael Fraser
3Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
Kyle Ellrott
4Computational Biology Program, Oregon Health & Science University, Portland, OR, USA
5Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
6Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
Adam A. Margolin
4Computational Biology Program, Oregon Health & Science University, Portland, OR, USA
5Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
7Sage Bionetworks, Seattle, WA, USA
Robert G. Bristow
2Department of Medical Biophysics, University of Toronto, Toronto, Canada
3Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
Joshua M. Stuart
6Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
Paul C. Boutros
1Ontario Institute for Cancer Research, Toronto, Canada
2Department of Medical Biophysics, University of Toronto, Toronto, Canada
8Department of Pharmacology & Toxicology, University of Toronto, Toronto, Canada
  • For correspondence: Paul.Boutros@oicr.on.ca

Abstract

Background Platform-specific error profiles necessitate confirmatory studies, in which predictions made on data generated using one technology are verified by processing the same samples on an orthogonal technology. In disciplines that rely heavily on high-throughput data generation, such as genomics, reducing the impact of false positives and false negatives on results is a top priority. However, verifying all predictions can be costly and redundant, so a subset of findings is often tested to estimate the true error profile. To determine how to create subsets of predictions for validation that maximize inference of global error profiles, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates.
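
To make the idea of a selection strategy concrete, the following is a minimal sketch of one possible approach: spending a fixed verification budget evenly across several callers. It is an illustration only and is not taken from Valection; the function name `select_equal_per_caller`, the round-robin allocation, and the toy call sets are all assumptions.

```python
import random


def select_equal_per_caller(calls_by_caller, budget, seed=0):
    """Pick up to `budget` unique candidates, taking turns between callers.

    Hypothetical illustration; not Valection's actual algorithm.
    """
    rng = random.Random(seed)
    # Shuffle each caller's predictions so picks are random within a caller.
    pools = {
        caller: rng.sample(sorted(calls), len(calls))
        for caller, calls in calls_by_caller.items()
    }
    callers = sorted(pools)
    selected, seen = [], set()
    # Keep cycling through callers until the budget is spent or nothing is left.
    while len(selected) < budget and any(pools[c] for c in callers):
        for caller in callers:
            if len(selected) >= budget:
                break
            pool = pools[caller]
            # Skip candidates that were already chosen via another caller.
            while pool:
                candidate = pool.pop()
                if candidate not in seen:
                    seen.add(candidate)
                    selected.append(candidate)
                    break
    return selected


# Toy call sets: integers stand in for predicted variant positions.
toy_calls = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {6}}
picked = select_equal_per_caller(toy_calls, budget=4)
```

Under this allocation a caller with few predictions (such as `C` above) still contributes a candidate, whereas plain random sampling over the pooled calls would tend to favour callers with many predictions.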

Results To evaluate these selection strategies, we obtained 261 sets of somatic mutation calls from a single-nucleotide variant caller benchmarking challenge in which 21 teams competed on whole-genome sequencing datasets of three computationally simulated tumours. Because the data were synthetic, we had complete ground truth for the tumours’ mutations and could therefore determine accurately how estimates from the selected subset of verification candidates compared to those from the complete prediction set. We found that selection strategy performance depends on several verification study characteristics. In particular, the verification budget of the experiment (i.e., how many candidates can be selected) influences the estimates.
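
The comparison described above, between an estimate from a verified subset and the value computed on the complete prediction set, can be sketched as follows. This is a hedged illustration using simple random sampling and precision as the metric; the function names, the toy truth and prediction sets, and the choice of metric are assumptions rather than the evaluation used in the study.

```python
import random


def true_precision(predictions, truth):
    """Precision computed on the complete prediction set."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return len(predictions & truth) / len(predictions)


def estimated_precision(predictions, truth, budget, seed=0):
    """Precision estimated from a random subset of at most `budget` candidates."""
    if not predictions or budget < 1:
        raise ValueError("need at least one prediction and a positive budget")
    rng = random.Random(seed)
    chosen = rng.sample(sorted(predictions), min(budget, len(predictions)))
    # "Verification" here is a lookup in the known ground truth.
    confirmed = sum(1 for candidate in chosen if candidate in truth)
    return confirmed / len(chosen)


# Toy data: integers stand in for mutation positions in a simulated tumour.
truth = set(range(0, 800))
predictions = set(range(200, 1000))  # 600 of these 800 calls are in the truth set

reference = true_precision(predictions, truth)
for budget in (10, 100, 800):
    estimate = estimated_precision(predictions, truth, budget)
    print(budget, abs(estimate - reference))
```

When the budget covers every prediction, the estimate equals the reference value exactly; with smaller budgets the estimate depends on which candidates happen to be drawn.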

Conclusions The Valection framework is flexible, allowing additional selection algorithms to be implemented in the future. It is applicable to any discipline that relies on experimental verification and that would benefit from optimizing the selection of verification candidates.

  • List of abbreviations

    SNV: single-nucleotide variant
    NGS: next-generation sequencing
    ICGC: International Cancer Genome Consortium
    TCGA: The Cancer Genome Atlas
    DREAM: Dialogue for Reverse Engineering Assessments and Methods
    SMC-DNA: Somatic Mutation Calling DNA Challenge
    TP: true positive
    FP: false positive
    FN: false negative
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted January 28, 2018.
    Subject Area

    • Bioinformatics