Evaluating predictive biomarkers for a binary outcome with linear versus logistic regression – Practical recommendations for the choice of the model

A predictive biomarker can forecast whether a patient benefits from a specific treatment under study. To establish the predictiveness of a biomarker, a statistical interaction between the biomarker status and the treatment group with respect to the clinical outcome needs to be shown. In clinical trials with a binary outcome, linear or logistic regression models may be used to evaluate the interaction, but the effects estimated by the two models differ and are interpreted differently. Specifically, the effects are estimated as absolute risk reductions (ARRs) in the linear model and as odds ratios (ORs) in the logistic model, thus measuring the effect on an additive and a multiplicative scale, respectively. We derived the relationship between the effects of the linear and the logistic regression models, which allows effect estimates to be translated from one model to the other. In addition, we performed a comprehensive simulation study to compare the power of the two models under a variety of scenarios in different study designs. In general, the differences in power to detect an interaction were minor; noticeable differences were found in rather unrealistic combinations of effect sizes and usually favored the logistic model. Based on our results and theoretical considerations, we recommend 1) estimating logistic regression models because of their statistical properties, 2) testing for interaction effects, and 3) calculating and reporting both ARRs and ORs from these models using the formulae provided.

subgroup and indeed includes the effect observed in patients with normal serum albumin. Consequently, it cannot be ruled out that the effect was merely not detected in the smaller group, and no interaction between the treatment and albumin can be demonstrated. A second reason against claiming predictiveness based on subgroup analyses alone is that, even if there are effects in both subgroups, the biomarker may still be predictive, because the therapeutic effect might be weaker (quantitative interaction) or in the opposite direction (qualitative interaction) in the second subgroup.
In the following, we describe the statistical methods to evaluate the biomarker-by-treatment interaction that needs to be shown to establish the predictiveness of a biomarker.
The statistical method of choice to evaluate the biomarker-by-treatment interaction depends on the data, i.e., the scale of the outcome variable and any additional covariables that are to be included in the model. In the following, we focus on the simple setting of a dichotomous outcome without further covariables. As a first approach, a linear regression framework can be used in which the risk or probability of the dichotomous outcome y (e.g. […]
[…] biomarker interaction as the odds ratio (OR) on the multiplicative scale. One advantage of this model is that the predicted outcome probabilities are guaranteed to lie between 0 and 1. Furthermore, the logit link is the canonical link of the binomial distribution, i.e., it corresponds to the natural parameter of this member of the linear exponential family, which gives the model favorable statistical properties.
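For the simplest case of a binary treatment indicator T and a binary biomarker status B, the two models can be sketched as follows. The notation (β, γ, and the cell risks p_tb) is our own and may not match the symbols used elsewhere in the paper; it is the standard saturated 2×2 parameterization, shown here only to make explicit what the two interaction coefficients measure.

```latex
% p_{tb} = P(y = 1 \mid T = t, B = b), \quad t, b \in \{0, 1\} (illustrative notation)
\begin{aligned}
\text{linear:}   &\quad P(y = 1 \mid T, B) = \beta_0 + \beta_1 T + \beta_2 B + \beta_3\, T B,\\
\text{logistic:} &\quad \log\frac{P(y = 1 \mid T, B)}{1 - P(y = 1 \mid T, B)} = \gamma_0 + \gamma_1 T + \gamma_2 B + \gamma_3\, T B,\\[4pt]
\beta_3 &= (p_{11} - p_{01}) - (p_{10} - p_{00}),\\
e^{\gamma_3} &= \frac{p_{11}/(1 - p_{11})}{p_{01}/(1 - p_{01})} \Bigg/ \frac{p_{10}/(1 - p_{10})}{p_{00}/(1 - p_{00})}.
\end{aligned}
```

Thus β3 is the difference between the treatment risk differences (the ARRs, up to sign) in the two biomarker groups, whereas e^γ3 is the ratio of the treatment ORs. In general, β3 = 0 and γ3 = 0 are different hypotheses.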

Statistical evaluation of biomarker-by-treatment interaction
The linear and the logistic models differ and therefore yield different effect sizes. This can be seen from S1 Appendix, in which we derived the relation between ARRs from the linear probability model and ORs from the logistic regression model.
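As a small numerical illustration (not the S1 Appendix derivation itself, whose formulae are not reproduced here), the sketch below computes the interaction on both scales from four assumed cell probabilities. It also converts an OR back to a risk for a given control-group risk p0 using the standard identity p1 = OR·p0 / (1 − p0 + OR·p0). The probabilities are made up, and the function names are hypothetical.

```python
import math

def odds(p):
    """Odds corresponding to a probability 0 < p < 1."""
    return p / (1.0 - p)

def interaction_both_scales(p):
    """Interaction on the additive and on the multiplicative scale.

    p maps (treatment, biomarker) -> outcome probability, e.g. p[(1, 0)] is the
    risk under treatment in biomarker-negative patients (illustrative layout).
    """
    # Treatment risk difference per biomarker group (an ARR up to its sign).
    rd = {b: p[(1, b)] - p[(0, b)] for b in (0, 1)}
    # Treatment odds ratio per biomarker group.
    orr = {b: odds(p[(1, b)]) / odds(p[(0, b)]) for b in (0, 1)}
    beta3 = rd[1] - rd[0]               # interaction coefficient, linear model
    gamma3 = math.log(orr[1] / orr[0])  # interaction coefficient, logistic model
    return rd, orr, beta3, gamma3

def risk_from_or(p0, odds_ratio):
    """Risk in the treated group implied by control risk p0 and an odds ratio."""
    return odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)

# Made-up cell probabilities: the same risk difference of 0.2 in both
# biomarker groups, i.e. no interaction on the additive scale.
example = {(0, 0): 0.2, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 0.7}
rd, orr, beta3, gamma3 = interaction_both_scales(example)
print("risk differences:", rd, "odds ratios:", orr)
print("beta3:", beta3, "ratio of ORs:", math.exp(gamma3))
print("treated risk recovered from OR:", risk_from_or(0.2, orr[0]))
```

With these numbers, β3 is zero up to floating-point rounding, while the ratio of ORs is (7/3)/(8/3) = 0.875, so the same risks show no interaction on the additive scale but an interaction on the multiplicative scale.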

Given that interactions on both scales can occur, are relevant, and should be analyzed, we need to know how powerful the statistical analyses are. More specifically, if there is an additive interaction, how likely is it to be detected using the "false" model, i.e., the logistic regression? Conversely, how likely is it that a multiplicative interaction is detected when using the linear regression? To answer these questions, we performed a simulation study, which is described in the following.
The code is available in the supplement (S2 Appendix).
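A minimal sketch of such a power simulation is given below; it is not the code from S2 Appendix. It assumes a balanced design with the same number of patients in each of the four treatment-by-biomarker cells and, for brevity, replaces the deviance test with Wald tests of the interaction coefficient in the saturated 2×2 models: ordinary least squares for the linear probability model, and the log ratio of ORs for the logistic model, with a 0.5 continuity correction when a cell contains no events or no non-events. All function names and scenario values are hypothetical.

```python
import math
import random

CELLS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (treatment, biomarker)

def simulate_events(probs, n_per_cell, rng):
    """Number of outcome events in each cell of a balanced 2x2 design."""
    return {c: sum(rng.random() < probs[c] for _ in range(n_per_cell)) for c in CELLS}

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def linear_interaction_z(events, n):
    """Wald statistic for the interaction in the saturated linear probability model."""
    p = {c: events[c] / n for c in CELLS}
    beta3 = p[(1, 1)] - p[(0, 1)] - p[(1, 0)] + p[(0, 0)]
    # OLS residual variance of the cell-means model with 4n - 4 degrees of freedom.
    sigma2 = sum(n * p[c] * (1.0 - p[c]) for c in CELLS) / (4 * n - 4)
    se = math.sqrt(sigma2 * 4.0 / n)
    return beta3 / se if se > 0 else 0.0

def logistic_interaction_z(events, n):
    """Wald statistic for the interaction (log ratio of ORs) in the saturated logistic model."""
    counts = [(events[c], n - events[c]) for c in CELLS]
    # Add 0.5 to all counts if any cell has zero events or zero non-events.
    k = 0.5 if any(a == 0 or b == 0 for a, b in counts) else 0.0
    log_odds = [math.log((a + k) / (b + k)) for a, b in counts]
    gamma3 = log_odds[3] - log_odds[2] - log_odds[1] + log_odds[0]
    se = math.sqrt(sum(1.0 / (a + k) + 1.0 / (b + k) for a, b in counts))
    return gamma3 / se

def estimate_power(probs, n_per_cell, reps=1000, alpha=0.05, seed=1):
    """Share of simulated trials in which each model rejects 'no interaction'."""
    rng = random.Random(seed)
    hits_linear = hits_logistic = 0
    for _ in range(reps):
        events = simulate_events(probs, n_per_cell, rng)
        hits_linear += two_sided_p(linear_interaction_z(events, n_per_cell)) < alpha
        hits_logistic += two_sided_p(logistic_interaction_z(events, n_per_cell)) < alpha
    return hits_linear / reps, hits_logistic / reps

if __name__ == "__main__":
    null = {c: 0.3 for c in CELLS}  # no interaction on either scale
    strong = {(0, 0): 0.2, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.6}
    print("type I error (linear, logistic):", estimate_power(null, 200, reps=500))
    print("power (linear, logistic):", estimate_power(strong, 200, reps=500))
```

Because both tests are applied to the same simulated trials, comparing the two returned proportions scenario by scenario mirrors, in a simplified form, the comparison of the linear and the logistic model described above; under the null scenario both proportions estimate the type I error rate.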
Results
Table 2 shows the estimated frequency of type I errors of the interaction test, i.e., the restricted deviance test, in logistic and linear regression models when testing for an interaction effect […] (Table 2). Other scenarios meeting these restrictions but not displayed are redundant in the sense that the effects , , or have opposite signs or are permuted.

For an overview, Table 9 compares the estimated power across the considered scenarios. It gives the number of scenarios in which the power of the linear and the logistic regression models is comparable (less than 3% difference), in which one of the models is slightly better (difference between 3% and 10%), and in which one of the models is better