RT Journal Article
SR Electronic
T1 Validation of the linear regression method to evaluate population accuracy and bias of predictions for non-linear models
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.10.02.510518
DO 10.1101/2022.10.02.510518
A1 Haipeng Yu
A1 Rohan L Fernando
A1 Jack CM Dekkers
YR 2022
UL http://biorxiv.org/content/early/2022/10/05/2022.10.02.510518.abstract
AB Background: The linear regression (LR) method was proposed to estimate population bias and accuracy of predictions while addressing the limitations of commonly used cross-validation methods. The validity of the LR method has been established and its behavior studied for linear model predictions, but not for non-linear models. The objectives of this study were to 1) provide a mathematical proof for the validity of the LR method when predictions are based on the conditional mean, 2) explore the behavior of the LR method in estimating bias and accuracy of predictions when the fitted model differs from the true model, and 3) provide guidelines on how to partition the data into training and validation sets such that the LR method can detect bias and estimate accuracy of predictions. Results: We present a mathematical proof for the validity of the LR method to estimate bias and accuracy of predictions based on the conditional mean, including for non-linear models. Using simulated data, we show that, when an incorrect model is fitted, the LR method can accurately detect bias and estimate accuracy of predictions provided the data are partitioned such that the values of relevant predictor variables differ between the training and validation sets. The LR method fails when the data are not partitioned in that manner. Conclusions: The LR method was proven to be a valid method to evaluate the population bias and accuracy of predictions based on the conditional mean, regardless of whether the conditional mean is a linear or non-linear function of the data. The ability of the LR method to detect bias and estimate accuracy of predictions when the fitted model is incorrect depends on how the data are partitioned. To appropriately test the predictive ability of a model using the LR method, the values of the relevant predictor variables need to differ between the training and validation sets. Competing Interest Statement: The authors have declared no competing interest.