New Results

CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction

Rafael Massahiro Yassue, José Felipe Gonzaga Sabadin, Giovanni Galli, Filipe Couto Alves, Roberto Fritsche-Neto
doi: https://doi.org/10.1101/2020.11.11.376343
1 Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil (Rafael Massahiro Yassue, José Felipe Gonzaga Sabadin, Giovanni Galli, Roberto Fritsche-Neto)
2 Departments of Epidemiology and Biostatistics, Statistics and Probability and Institute of Quantitative Health Science and Engineering, Michigan State University, East Lansing, USA (Filipe Couto Alves)
For correspondence: roberto.neto@usp.br (Roberto Fritsche-Neto)

Abstract

Genomic prediction models are usually compared using validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. However, the design of the training and validation sets strongly affects how models are compared and how subjective that comparison is. In both procedures, the sets overlap across replicates, which can lead to overestimated values, non-independent residuals, and less accurate results. Furthermore, ANOVA-based post hoc tests are not recommended because the assumption of independent residuals is not met. We therefore propose a new way of sampling observations to build training and validation sets, based on a cross-validation alpha-based design (CV-α). CV-α was designed to create several validation scenarios (replicates × folds) regardless of the number of treatments. With CV-α, the number of genotypes sharing the same fold across replicates was much lower than with K-fold, indicating greater residual independence. As a proof of concept, this allowed us to use ANOVA to compare the proposed methodology with RRS and K-fold, applying four genomic prediction models to a simulated and a real dataset. All validation methods performed similarly in terms of predictive ability and bias. For mean squared error and coefficient of variation, however, CV-α performed best under the evaluated scenarios. Because it adds no cost or complexity, is more reliable, and allows non-subjective methods to be used to compare models and factors, CV-α can be considered a more precise validation methodology for model selection.
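
The overlap argument in the abstract can be made concrete with a small sketch. The code below counts how many genotype pairs share a fold in more than one replicate, first for ordinary repeated K-fold and then for a toy square-lattice layout (rows as folds in one replicate, columns in the next). The lattice is only a simple stand-in for a designed layout and is an assumption of this example; it is not the alpha design used by CV-α, and the function names are hypothetical rather than taken from the authors' repository.

```python
import itertools
import random
from collections import Counter


def pair_concurrence(replicates):
    """Count, for each genotype pair, in how many replicates it shares a fold.

    `replicates` is a list of replicates; each replicate is a list of folds;
    each fold is a list of genotype ids.
    """
    counts = Counter()
    for folds in replicates:
        for fold in folds:
            for pair in itertools.combinations(sorted(fold), 2):
                counts[pair] += 1
    return counts


def random_kfold(genotypes, k, rng):
    """One replicate of ordinary K-fold: shuffle, then deal into k folds."""
    shuffled = genotypes[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def square_lattice(genotypes, k):
    """Two replicates from a k x k grid: folds are rows, then columns.

    Toy stand-in for a designed layout (NOT the alpha design of the paper).
    Requires len(genotypes) == k * k.
    """
    if len(genotypes) != k * k:
        raise ValueError("square_lattice needs exactly k * k genotypes")
    rows = [genotypes[r * k:(r + 1) * k] for r in range(k)]
    cols = [genotypes[c::k] for c in range(k)]
    return [rows, cols]


def repeated_pairs(counts):
    """Number of genotype pairs that share a fold in more than one replicate."""
    return sum(1 for c in counts.values() if c > 1)


rng = random.Random(1)  # fixed seed so the K-fold split is reproducible
genotypes = list(range(25))
kfold_reps = [random_kfold(genotypes, 5, rng) for _ in range(2)]
lattice_reps = square_lattice(genotypes, 5)

print("K-fold pairs repeated across replicates:",
      repeated_pairs(pair_concurrence(kfold_reps)))
print("Lattice pairs repeated across replicates:",
      repeated_pairs(pair_concurrence(lattice_reps)))
```

In the lattice layout, two distinct genotypes cannot share both a row and a column, so no pair is repeated across the two replicates. Independent random K-fold splits offer no such guarantee, which is the kind of overlap the abstract refers to.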

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/allogamous/CV-Alpha

  • Abbreviations

    CV: cross-validation
    CV-α: cross-validation alpha-based design
    GP: genomic prediction
    RRS: Repeated Random Subsampling
    TGV: true genetic value
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted November 12, 2020.
    Subject Area

    • Genomics