bioRxiv
New Results

Validation of the linear regression method to evaluate population accuracy and bias of predictions for non-linear models

Haipeng Yu, Rohan L Fernando, Jack CM Dekkers
doi: https://doi.org/10.1101/2022.10.02.510518
Haipeng Yu
1 Department of Animal Sciences, University of Florida, Gainesville, FL, USA 32611
  • For correspondence: haipengyu@ufl.edu
Rohan L Fernando
2 Department of Animal Science, Iowa State University, Ames, IA, USA 50011
Jack CM Dekkers
2 Department of Animal Science, Iowa State University, Ames, IA, USA 50011

Abstract

Background The linear regression method (LR) was proposed to estimate population bias and accuracy of predictions, while addressing the limitations of commonly used cross-validation methods. The validity of the LR method has been proven, and its behavior studied, for linear model predictions but not for non-linear models. The objectives of this study were to 1) provide a mathematical proof for the validity of the LR method when predictions are based on the conditional mean, 2) explore the behavior of the LR method in estimating bias and accuracy of predictions when the model fitted is different from the true model, and 3) provide guidelines on how to appropriately partition the data into training and validation sets such that the LR method can detect bias and estimate the accuracy of predictions.

Results We present a mathematical proof for the validity of the LR method to estimate bias and accuracy of predictions based on the conditional mean, including for non-linear models. Using simulated data, we show that, even when an incorrect model is fitted, the LR method can accurately detect bias and estimate accuracy of predictions, provided the data are partitioned such that the values of relevant predictor variables differ between the training and validation sets. The LR method fails when the data are not partitioned in that manner.

Conclusions The LR method was proven to be valid for evaluating the population bias and accuracy of predictions based on the conditional mean, regardless of whether the conditional mean is a linear or non-linear function of the data. The ability of the LR method to detect bias and estimate accuracy of predictions when the model fitted is incorrect depends on how the data are partitioned. To appropriately test the predictive ability of a model using the LR method, the values of the relevant predictor variables need to be different between the training and validation sets.
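
The abstract does not spell out which statistics the LR method computes. As a minimal sketch, assuming the comparison statistics commonly associated with the LR method (the difference in mean predictions, the slope of the regression of whole-data predictions on partial-data predictions, and their correlation), the calculation for the validation individuals might look as follows. The function name `lr_statistics` and the argument names `u_partial` and `u_whole` are hypothetical and are not taken from the preprint.

```python
import numpy as np


def lr_statistics(u_partial, u_whole):
    """Compare predictions for the same validation individuals.

    u_partial: predictions obtained from the partial (training-only) data.
    u_whole:   predictions obtained from the whole (training + validation) data.
    The three statistics below are an assumed, commonly used set; the
    abstract itself does not define them.
    """
    u_p = np.asarray(u_partial, dtype=float)
    u_w = np.asarray(u_whole, dtype=float)
    if u_p.shape != u_w.shape or u_p.ndim != 1 or u_p.size < 2:
        raise ValueError("need two 1-D arrays of equal length >= 2")

    # Sample covariance between whole-data and partial-data predictions.
    cov_wp = np.cov(u_w, u_p, ddof=1)[0, 1]

    return {
        # Difference in means; a value near 0 suggests no bias.
        "bias": u_p.mean() - u_w.mean(),
        # Slope of whole on partial; a value near 1 suggests no
        # over- or under-dispersion of the partial-data predictions.
        "slope": cov_wp / u_p.var(ddof=1),
        # Correlation between the two sets of predictions.
        "correlation": np.corrcoef(u_w, u_p)[0, 1],
    }


# Toy usage with made-up numbers (not data from the study).
stats = lr_statistics(u_partial=[3.0, 5.0, 7.0, 9.0],
                      u_whole=[1.0, 2.0, 3.0, 4.0])
print(stats)
```

In this toy case the partial-data predictions are shifted and stretched versions of the whole-data predictions, so the bias statistic is non-zero and the slope departs from 1 even though the correlation is perfect. Whether such statistics flag a misspecified model in practice depends, per the abstract, on how the training and validation sets are partitioned.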

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted October 05, 2022.
Subject Area

  • Genetics