Using the Weibull accelerated failure time regression model to predict time to health events

We describe a statistical protocol for using a Weibull accelerated failure time (AFT) model to predict time to a health-related event. This prediction method is common in engineering reliability research but rarely used for medical predictions such as survival time. A worked example showing how to perform the prediction with a published dataset is included.

In the medical literature, most prediction models predict the probability of an event occurring or a condition developing over a specified time period, such as the Framingham 10 Year Risk of General Cardiovascular Disease [1] and FRAX: a Tool for Estimating 10 year Fracture Risk. [2] In engineering reliability research, it is common to use Weibull accelerated failure time (AFT) models to predict 'time to failure', for example the lifespan of a machine, when a component will need replacement, and an optimal maintenance schedule that maximizes the reliability of the entire system. [3] Weibull AFT models are also used to predict the shelf-life of perishable foods and the warranty period of goods. [4,5] This statistical method estimates when the event will occur without being bound to a defined time period (i.e. absolute time, when a component will need replacement, vs. relative time, the 10-y risk of needing replacement). In addition to predicting engineering or mechanical events, this statistical method might also be useful for predicting medical events such as fracture, myocardial infarction and death.

In this paper, we have not intended to develop and present a prediction tool. Rather, we aim to illustrate how to use the Weibull AFT model and assess point accuracy in a [...]

[...] the shape parameter γ. The location parameter µ is predetermined as the minimum value in the distribution. In survival or failure analysis, µ = 0 is usually selected to produce a two-parameter distribution.

The cumulative distribution function (CDF) for a two-parameter Weibull distributed random variable T ∼ W(ρ, γ) is given as

$$F(t; \rho, \gamma) = 1 - \exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right]$$

for t ≥ 0, ρ > 0, γ > 0 and F(t; ρ, γ) = 0 for t < 0.

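The probability density function (PDF) of the Weibull distribution is given as

$$f(t; \rho, \gamma) = \frac{\gamma}{\rho}\left(\frac{t}{\rho}\right)^{\gamma-1} \exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right], \quad t \ge 0 \tag{2}$$

The survival function of the Weibull distribution is given as

$$S(t) = 1 - F(t) = \exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right] \tag{3}$$

The mean survival time or mean time to failure (MTTF) is given as

$$E(T) = \rho\,\Gamma\left(1 + \frac{1}{\gamma}\right) \tag{4}$$

The log-Weibull distribution is also called the Gumbel distribution, or type I extreme value distribution. [7] Let T ∼ W(ρ, γ). Then Y = g(T) = log T is a one-to-one transformation with inverse t = e^y and dt/dy = e^y. The probability density function of Y is then derived from (2):

$$
\begin{aligned}
f_Y(y) &= f_T(e^{y})\, e^{y} \\
&= \frac{\gamma}{\rho}\left(\frac{e^{y}}{\rho}\right)^{\gamma-1} \exp\left[-\left(\frac{e^{y}}{\rho}\right)^{\gamma}\right] e^{y} \\
&= \gamma\, e^{\gamma y} e^{-\gamma \log\rho} \exp\left(-e^{\gamma y} e^{-\gamma \log\rho}\right) \\
&= \gamma\, e^{\gamma y - \gamma\log\rho} \exp\left(-e^{\gamma y - \gamma\log\rho}\right) \\
&= \gamma\, e^{\gamma (y - \log\rho)} \exp\left(-e^{\gamma (y - \log\rho)}\right) \\
&= \frac{1}{b}\, e^{(y-a)/b} \exp\left(-e^{(y-a)/b}\right) \quad \left(\text{let } \gamma = \tfrac{1}{b} \text{ and } \log\rho = a\right)
\end{aligned}
\tag{5}
$$

This shows that the log-Weibull distribution is a Gumbel distribution G(a, b), where a = log ρ and b = 1/γ. The CDF of the log-Weibull distribution can be derived from the above PDF or directly from the definition:

$$F_Y(y) = 1 - \exp\left(-e^{(y-a)/b}\right) \tag{6}$$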
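The survival function of Y = log(T) is given by

$$S_Y(y) = 1 - F_Y(y) = \exp\left(-e^{(y-a)/b}\right) \tag{7}$$

The hazard function is given by

$$h_Y(y) = \frac{f_Y(y)}{S_Y(y)} = \frac{1}{b}\, e^{(y-a)/b}$$

The Weibull AFT model relates log(T) to the covariates x through

$$\log(T) = x^{\top}\beta + \sigma\varepsilon \tag{8}$$

where β = (β0, ..., βp) are the regression coefficients of interest, σ is a scale parameter, and ε is an error term with PDF and CDF

$$f(\varepsilon) = \exp\left(\varepsilon - e^{\varepsilon}\right) \tag{9}$$

$$F(\varepsilon) = 1 - \exp\left(-e^{\varepsilon}\right) \tag{10}$$

Note that Eq(9) and Eq(10) are obtained by setting a = 0, b = 1 in Eq(5) and Eq(6). We denote this as a G(0, 1) distribution or a standard Gumbel distribution.

Now, let us find the PDF of T from Eq(8):

$$\varepsilon = \frac{\log t - x^{\top}\beta}{\sigma} \tag{11}$$

$$\frac{d\varepsilon}{dt} = \frac{1}{\sigma t} \tag{12}$$

Putting Eq(11) and Eq(12) into Eq(9) we get

$$
\begin{aligned}
f_T(t) &= \frac{1}{\sigma t} \exp\left(\frac{\log t - x^{\top}\beta}{\sigma}\right) \exp\left[-\exp\left(\frac{\log t - x^{\top}\beta}{\sigma}\right)\right] \\
&= \frac{1}{\sigma}\,\frac{1}{\exp(x^{\top}\beta)} \left(\frac{t}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}-1} \exp\left[-\left(\frac{t}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}\right]
\end{aligned}
\tag{13}
$$

Comparing Eq(13) with Eq(2) and letting γ = 1/σ and ρ = exp(x′β), we can see that T has a Weibull distribution T ∼ W(exp(x′β), 1/σ).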
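The link between the standard Gumbel error and the Weibull survival time can be checked numerically. The following is a minimal sketch, not part of the original analysis: it draws ε by inverting the CDF F(ε) = 1 − exp(−e^ε), forms T = exp(x′β + σε), and compares the empirical survival proportion with exp[−(t/exp(x′β))^(1/σ)]. The values of `xb` and `sigma`, and all function names, are hypothetical choices made for illustration.

```python
import math
import random


def standard_gumbel_draw(rng):
    """Draw epsilon with CDF F(e) = 1 - exp(-exp(e)) by inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the inversion below
        u = rng.random()
    return math.log(-math.log(1.0 - u))


def simulate_times(xb, sigma, n, seed=0):
    """Simulate T = exp(xb + sigma * epsilon), where xb stands for x'beta."""
    rng = random.Random(seed)
    return [math.exp(xb + sigma * standard_gumbel_draw(rng)) for _ in range(n)]


def weibull_aft_survival(t, xb, sigma):
    """Survival function of W(exp(xb), 1/sigma) evaluated at t."""
    return math.exp(-((t / math.exp(xb)) ** (1.0 / sigma)))


def empirical_survival(times, t):
    """Proportion of simulated times that exceed t."""
    return sum(1 for value in times if value > t) / len(times)


# Hypothetical linear predictor and scale, chosen only for illustration.
xb, sigma = 1.2, 0.5
times = simulate_times(xb, sigma, n=20000)
for t in (1.0, 3.0, 5.0):
    print(t, empirical_survival(times, t), weibull_aft_survival(t, xb, sigma))
```

With a large simulated sample the two columns should agree closely, which is what the comparison of Eq(13) with Eq(2) implies.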

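Referring to Eq(3), the survival function of T ∼ W(exp(x′β), 1/σ) can be written as

$$S(t) = \exp\left[-\left(\frac{t}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}\right] \tag{14}$$

Referring to Eq(4), the expected survival time of W(exp(x′β), 1/σ) is given as

$$E(T) = \exp(x^{\top}\beta)\,\Gamma(1 + \sigma) \tag{15}$$

Since most statistical software uses log(T) to calculate these parameters, let us show the distribution and characteristics of log(T). Let Y = log(T), so that

$$\varepsilon = \frac{y - x^{\top}\beta}{\sigma} \tag{16}$$

$$\frac{d\varepsilon}{dy} = \frac{1}{\sigma} \tag{17}$$

Putting Eq(16) and Eq(17) into Eq(9), we get

$$f_Y(y) = \frac{1}{\sigma} \exp\left(\frac{y - x^{\top}\beta}{\sigma}\right) \exp\left[-\exp\left(\frac{y - x^{\top}\beta}{\sigma}\right)\right] \tag{18}$$

Comparing Eq(18) to Eq(5), we can see that Y, i.e. log(T), has a G(x′β, σ) distribution. The error term ε has a G(0, 1) distribution and plays a role similar to that of the error term in a simple linear regression, which has a N(0, σ²) distribution.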

Referring to Eq(13) and Eq(18), we can see that in the Weibull AFT model T has a Weibull W(exp(x′β), 1/σ) distribution and log(T) has a Gumbel G(x′β, σ) distribution. From Eq(7), the survival function of Y, i.e. log(T), is given as

$$S_Y(y) = \exp\left[-\exp\left(\frac{y - x^{\top}\beta}{\sigma}\right)\right]$$

and the expectation of Y, i.e. log(T), is calculated as

$$E(Y) = x^{\top}\beta - \sigma\xi$$

where ξ ≈ 0.5772 is the Euler-Mascheroni constant. Note that, since log(x) is a concave function, Jensen's inequality gives E[log(T)] ≤ log E[T], so (15) is the correct formula to calculate the expected survival time rather than exp(x′β − σξ).
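The size of the gap implied by Jensen's inequality can be illustrated numerically. The sketch below is only an illustration under assumed values: it evaluates the MTTF from Eq(15) and the naive back-transform exp(x′β − σξ) for a hypothetical linear predictor `xb` and several hypothetical values of `sigma`. The function names are not from the original paper.

```python
import math

# Euler-Mascheroni constant, written as xi in the text.
EULER_GAMMA = 0.5772156649015329


def mttf(xb, sigma):
    """Expected survival time exp(x'beta) * Gamma(1 + sigma), as in Eq(15)."""
    return math.exp(xb) * math.gamma(1.0 + sigma)


def naive_backtransform(xb, sigma):
    """exp(E[log T]) = exp(x'beta - sigma * xi), which underestimates E[T]."""
    return math.exp(xb - sigma * EULER_GAMMA)


# Hypothetical linear predictor, for illustration only.
xb = 2.0
for sigma in (0.25, 0.5, 1.0, 2.0):
    print(sigma, mttf(xb, sigma), naive_backtransform(xb, sigma))
```

For every σ > 0 the first value is the larger of the two, and the gap widens as σ grows.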

Parameter estimation for the Weibull AFT model

The parameters of the Weibull AFT model can be estimated by the maximum likelihood method. The likelihood function of the n observed log(t) times, y1, y2, ..., yn, is given by

$$L(\beta, \sigma) = \prod_{i=1}^{n} \left[f_Y(y_i)\right]^{\delta_i} \left[S_Y(y_i)\right]^{1-\delta_i} = \prod_{i=1}^{n} \left[\frac{1}{\sigma} \exp\left(z_i - e^{z_i}\right)\right]^{\delta_i} \left[\exp\left(-e^{z_i}\right)\right]^{1-\delta_i}, \quad z_i = \frac{y_i - x_i^{\top}\beta}{\sigma} \tag{20}$$

where δi is the event indicator for the ith subject, with δi = 1 if an event has occurred and δi = 0 if the event has not occurred. Maximum likelihood estimates the p + 1 parameters σ, β1, ..., βp. We can take the log of the likelihood function and use the [...]

After we calculate the MTTF, we can use the Delta method to calculate the confidence interval for the MTTF. We will treat the predicted MTTF as a function of β̂ and σ̂.
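As an illustration of the estimation step, the censored log-likelihood on the log(t) scale can be written in a few lines. This is a minimal sketch that assumes the likelihood takes the standard right-censored form, with f_Y from Eq(18) for events and S_Y for censored observations. The function name, argument layout and toy data are hypothetical, and no optimizer is included.

```python
import math


def weibull_aft_loglik(beta, sigma, X, y, delta):
    """Log-likelihood of a Weibull AFT model on the log-time scale.

    beta  : list of regression coefficients (include the intercept if X has a 1)
    sigma : scale parameter, must be positive
    X     : list of covariate rows, one list per subject
    y     : list of log observed times
    delta : list of event indicators (1 = event, 0 = censored)
    """
    total = 0.0
    for x_i, y_i, d_i in zip(X, y, delta):
        linear_predictor = sum(b * v for b, v in zip(beta, x_i))
        z = (y_i - linear_predictor) / sigma
        log_density = -math.log(sigma) + z - math.exp(z)  # log f_Y(y_i)
        log_survival = -math.exp(z)                        # log S_Y(y_i)
        total += d_i * log_density + (1 - d_i) * log_survival
    return total


# Hypothetical toy data: intercept plus one binary covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [math.log(5.0), math.log(9.0), math.log(3.0), math.log(12.0)]
delta = [1, 1, 0, 1]
print(weibull_aft_loglik([1.5, 0.6], 0.8, X, y, delta))
```

In practice this function would be passed to a numerical optimizer, or the model would be fitted with existing survival software. The sketch only makes the contribution of events and censored observations explicit.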

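The standard error of the MTTF can be calculated as

$$SE\left(\widehat{MTTF}\right) = \sqrt{\left(\frac{\partial\, \widehat{MTTF}}{\partial (\hat\beta, \hat\sigma)}\right)^{\top} \Sigma_{\hat\sigma\hat\beta} \left(\frac{\partial\, \widehat{MTTF}}{\partial (\hat\beta, \hat\sigma)}\right)}$$

where Σσ̂β̂ is the variance-covariance matrix of β̂ and σ̂. The variance-covariance matrix can be estimated by the observed Fisher information of the Weibull AFT model. The 100(1 − α)% confidence interval is given as

$$\widehat{MTTF} \pm z_{1-\alpha/2}\; SE\left(\widehat{MTTF}\right)$$

where α is the type I error and z is the corresponding quantile of the standard normal distribution.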
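The Delta method calculation can be sketched as follows. This is an illustration rather than the authors' implementation: the gradient is approximated by central finite differences instead of being derived analytically, and the parameter estimates, covariance matrix and covariate vector are hypothetical numbers, not results from the worked example.

```python
import math


def mttf(theta, x):
    """MTTF = exp(x'beta) * Gamma(1 + sigma), with theta = [beta..., sigma]."""
    *beta, sigma = theta
    xb = sum(b * v for b, v in zip(beta, x))
    return math.exp(xb) * math.gamma(1.0 + sigma)


def numerical_gradient(func, theta, h=1e-6):
    """Central finite-difference gradient of func at theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    return grad


def delta_method_se(func, theta, cov):
    """sqrt(g' * cov * g), where g is the gradient of func at theta."""
    g = numerical_gradient(func, theta)
    n = len(g)
    variance = sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical estimates in the order (beta0, beta1, sigma) with their covariance.
theta_hat = [2.0, -0.3, 0.8]
cov_hat = [[0.040, -0.010, 0.002],
           [-0.010, 0.020, 0.001],
           [0.002, 0.001, 0.010]]
x = [1.0, 1.0]  # intercept and one covariate value

estimate = mttf(theta_hat, x)
se = delta_method_se(lambda th: mttf(th, x), theta_hat, cov_hat)
z = 1.959963984540054  # standard normal quantile for a 95% interval
print(estimate, se, (estimate - z * se, estimate + z * se))
```

Because the MTTF is positive, one could alternatively apply the same approach to log(MTTF) and back-transform the interval. That choice is not discussed in the text and is mentioned only as a possible variation.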
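For the Weibull AFT model, we use Eq(14) to calculate the pth percentile of survival time for an individual i.

After we obtain β̂ and σ̂ from Eq(20), and using the invariance property of the MLE, the median survival time is estimated by

$$\hat{t}_{\text{median}} = \exp\left(x_i^{\top}\hat\beta\right) (\log 2)^{\hat\sigma}$$

We can treat the estimated survival time percentile as a function of σ̂ and β̂ when p is fixed, and use the Delta method to calculate the standard error of the predicted pth [...]

Both mean and median survival time estimates are biased when the sample size is small and the model includes censoring. To attenuate bias, Henderson et al. proposed a method to find the optimum prediction time with the minimum prediction error. [9] They specified that if an observed survival time t falls in the interval p/k < t < kp, where p is the predicted survival time and k > 1, then the prediction is accurate. The probability of prediction error E_k conditional on the predicted time p is given as

$$P(E_k \mid p) = P(T < p/k) + P(T > kp) \quad \text{(observed time falls outside the bounds)}$$

The minimum probability of prediction error P(E_k | p) is achieved when its derivative with respect to p is zero, i.e. when

$$\frac{1}{k} f\left(\frac{p}{k}\right) - k f(kp) = 0 \quad\Longleftrightarrow\quad f\left(\frac{p}{k}\right) = k^{2} f(kp) \tag{25}$$

Now let us calculate the minimum prediction error for the Weibull AFT model. From Eq(13) we get

$$f\left(\frac{p}{k}\right) = \frac{1}{\sigma}\,\frac{1}{\exp(x^{\top}\beta)} \left(\frac{p}{k\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}-1} \exp\left[-\left(\frac{p}{k\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}\right], \qquad f(kp) = \frac{1}{\sigma}\,\frac{1}{\exp(x^{\top}\beta)} \left(\frac{kp}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}-1} \exp\left[-\left(\frac{kp}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}\right]$$

Applying Eq(25) and canceling the common parts, we get

$$k^{1-\frac{1}{\sigma}} \exp\left[-k^{-\frac{1}{\sigma}} \left(\frac{p}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}\right] = k^{1+\frac{1}{\sigma}} \exp\left[-k^{\frac{1}{\sigma}} \left(\frac{p}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}\right]$$

Taking the log of both sides we get

$$\left(1 - \frac{1}{\sigma}\right)\log k - k^{-\frac{1}{\sigma}} \left(\frac{p}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}} = \left(1 + \frac{1}{\sigma}\right)\log k - k^{\frac{1}{\sigma}} \left(\frac{p}{\exp(x^{\top}\beta)}\right)^{\frac{1}{\sigma}}$$

Finally, rearranging these terms we get

$$p = \exp(x^{\top}\beta) \left[\frac{2 \log k}{\sigma\left(k^{\frac{1}{\sigma}} - k^{-\frac{1}{\sigma}}\right)}\right]^{\sigma}$$

Here p is the minimum prediction error survival time. We may use the Delta method to get the standard error of the minimum prediction error survival time. Bootstrap methods could also be used to get a confidence interval.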
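The median and the minimum prediction error time can be computed directly from the fitted linear predictor and scale. The sketch below is illustrative only: it evaluates p = exp(x′β)[2 log k / (σ(k^(1/σ) − k^(−1/σ)))]^σ, the closed form that follows from setting the derivative of P(E_k | p) to zero for a Weibull density, and then compares the error probability at that point with nearby predictions. The values of `xb`, `sigma` and `k`, and the function names, are hypothetical.

```python
import math


def median_time(xb, sigma):
    """Median survival time exp(x'beta) * (log 2)^sigma."""
    return math.exp(xb) * math.log(2.0) ** sigma


def min_pred_error_time(xb, sigma, k=2.0):
    """Prediction p that minimizes P(T < p/k) + P(T > k*p) for a Weibull AFT model."""
    g = 1.0 / sigma  # Weibull shape parameter
    return math.exp(xb) * (2.0 * g * math.log(k) / (k ** g - k ** (-g))) ** sigma


def prediction_error_prob(p, xb, sigma, k=2.0):
    """P(E_k | p): probability that the observed time falls outside (p/k, k*p)."""
    def survival(t):
        return math.exp(-((t / math.exp(xb)) ** (1.0 / sigma)))
    return (1.0 - survival(p / k)) + survival(k * p)


# Hypothetical fitted values, for illustration only.
xb, sigma, k = 1.2, 0.5, 2.0
p_star = min_pred_error_time(xb, sigma, k)
print("median:", median_time(xb, sigma))
print("minimum prediction error time:", p_star)
for factor in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(factor, prediction_error_prob(factor * p_star, xb, sigma, k))
```

The error probability printed for factor 1.0 should be the smallest of the five, which is a quick check that the closed form is a minimum for these parameter values.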

If we use SAS software, we can directly get the variance-covariance matrix of β̂ and σ̂ by using the following statements [...] The right side vector of Eq(27) is calculated as [...] This standard error differs slightly from our calculations as R uses Greenwood's formula to calculate the standard error of the survival function. [11]

Assess the point prediction accuracy

Two methods of assessing the accuracy of predicted survival time have been proposed, by Parkes and by Christakis and Lamont. [12,13] Parkes suggests letting t be the observed survival time and p the predicted time. If p/k ≤ t ≤ kp then the point prediction p is defined as "accurate", and any value outside the interval is "inaccurate". Parkes proposes k = 2 as suitable. As a stricter assessment, Christakis and Lamont proposed a 33% rule to measure accuracy, where the observed time is divided by the predicted survival time and the prediction is defined as "accurate" if this quotient lies between 0.67 and 1.33. Values less than 0.67 or greater than 1.33 are defined as "errors". We compared the accuracy of our example using Parkes' method (k = 2) and Christakis and Lamont's method. The accuracy rate (i.e. the proportion of "accurate" predictions over the total sample size) is presented in Table 1.
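The two accuracy rules can be expressed compactly. The following is a minimal sketch with made-up observed and predicted times. It assumes every observed time is an uncensored event, since the handling of censored observations is not described in this section. The function names are hypothetical.

```python
def parkes_accurate(observed, predicted, k=2.0):
    """Parkes' rule: accurate if predicted/k <= observed <= k * predicted."""
    return predicted / k <= observed <= k * predicted


def christakis_lamont_accurate(observed, predicted):
    """33% rule: accurate if observed / predicted lies between 0.67 and 1.33."""
    return 0.67 <= observed / predicted <= 1.33


def accuracy_rate(observed_times, predicted_times, rule):
    """Proportion of predictions classified as accurate by the given rule."""
    hits = sum(1 for t, p in zip(observed_times, predicted_times) if rule(t, p))
    return hits / len(observed_times)


# Made-up survival times (same time unit), for illustration only.
observed = [10.0, 4.0, 30.0, 12.0]
predicted = [16.0, 10.0, 14.0, 11.0]

print("Parkes (k = 2):", accuracy_rate(observed, predicted, parkes_accurate))
print("Christakis and Lamont:", accuracy_rate(observed, predicted, christakis_lamont_accurate))
```

Because the 33% interval lies inside the k = 2 interval, the Christakis and Lamont accuracy rate can never exceed the Parkes rate on the same predictions.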

In this paper, we introduced how to use a Weibull AFT model to predict when a health-related event will happen. Expected survival time, median survival time and minimum prediction error survival time (MPET) from baseline to event were estimated, and prediction accuracy was assessed using Parkes' and Christakis and Lamont's methods. When we fixed k = 2 the accuracy was 55.6% for the median, 50% for the MTTF and 51.1% for the MPET. When we used the method suggested by Christakis and Lamont, the accuracy rate decreased to 30.0%, 37.8% and 33.3% (data not shown), respectively. Our example is limited as the sample is small and we only used two predictors. However, with a larger sample size and more predictors, accuracy may improve. In this example, we did not [...]

Parametric survival models are advantageous in predicting survival time compared to semi-parametric Cox regression models. The Cox regression model, which can be specified as S_i(t|x_i) = S_0(t)^exp(x_i′β), cannot predict time directly but calculates the probability of an event occurring within a specified timeframe. However, one disadvantage of parametric survival models such as the Weibull AFT model is the need to make stronger assumptions than semi-parametric models. [14]

Currently, most clinical prediction models describe a patient's likelihood of having or developing a certain disease as a traditional probability value or a risk score that is based on a calculated probability. [15] However, probabilities are not intuitive to the general population and probability itself can be defined in different ways. [16] In practice, the time axis remains the most natural measure for both clinicians and patients. It is much easier to understand a survival time than a probability of survival at a certain timepoint. Predicting when an event will happen provides a practical and interpretable guide for clinicians, health care providers and patients, and can help with decision making over the remaining lifespan.

Having recognized that the Weibull AFT model can be adapted from an engineering reliability framework to a medical framework, the next step involves developing a real prediction tool for medical purposes, applying models to larger datasets and performing rigorous internal and external validation procedures. Such steps are outlined in the book Clinical prediction models: a practical approach to development, validation, and updating. [17]