Revisiting the Natural History of Pulmonary Tuberculosis: a Bayesian Estimation of Natural Recovery and Mortality rates

Background: Tuberculosis (TB) natural history remains poorly characterised and new investigations are impossible as it would be unethical to follow up TB patients without treating them. Estimates of TB burden and mortality rely heavily on TB self-recovery and mortality rates, as around 40% of individuals with TB are never detected, making their prognosis entirely dependent on the disease natural history.
Methods: We considered the reports identified in a previous systematic review of studies from the prechemotherapy era, and extracted detailed data on mortality over time. We used a continuous-time Markov model in a Bayesian framework to estimate the rates of TB-induced mortality and self-cure. A hierarchical model was employed to allow estimates to vary by cohort. Inference was performed separately for smear-positive TB (SP-TB) and smear-negative TB (SN-TB).
Results: We included 41 cohorts of SP-TB patients and 19 cohorts of pulmonary SN-TB patients in the analysis. No data were available on extrapulmonary TB. The posterior median estimates of the TB-specific mortality rates were 0.390 year−1 (0.329-0.452, 95% credible interval) and 0.025 year−1 (0.016-0.036) for SP-TB and SN-TB patients, respectively. The estimates for self-recovery rates were 0.233 year−1 (0.179-0.293) and 0.147 year−1 (0.087-0.248) for SP-TB and SN-TB patients, respectively. These rates correspond to average durations of untreated TB of 1.57 years (1.37-1.81) and 5.35 years (3.42-8.23) for SP-TB and SN-TB, respectively, when assuming a natural mortality rate of 0.014 year−1 (i.e. a 70-year life expectancy).
Conclusions: TB-specific mortality rates are around 15 times higher for SP-TB than for SN-TB patients. This difference was underestimated dramatically in previous TB modelling studies that parameterised models based on the ratio of 3.3 between the 10-year case fatality of SP-TB and SN-TB. Our findings raise important concerns about the accuracy of past and current estimates of TB mortality and predicted impact of control interventions on TB mortality.


Introduction
The TB plague has threatened mankind since prehistory. Hippocrates described the disease (then referred to as "phthisis") as "the worst of the diseases that occurred, alone responsible for the great mortality" around 2,400 years ago (1). Today, TB is still the world's most lethal disease from a single infectious agent (2). One obstacle to efficient TB control is the lack of knowledge concerning some of the most fundamental aspects of TB epidemiology, such as the natural history of TB. This phenomenon is impossible to investigate using modern data, as antibiotics should be systematically provided to all individuals diagnosed with TB (3). And yet, the prognosis of untreated individuals remains of critical importance, as currently around 40% of diseased individuals are never identified and this proportion reaches even higher levels in settings where TB control functions poorly and where TB is the most likely to strike (4,5).
Characterising TB natural history accurately is all the more important because it is a central component of the methodology currently used to produce disease burden estimates. In particular, TB incidence is often estimated by dividing the disease prevalence (generally obtained from field surveys) by the estimated average duration of a TB episode, the latter relying significantly on estimates of spontaneous recovery and TB mortality rates (6). The poor characterisation of these parameters is extremely concerning, as TB incidence is used as the primary burden indicator in global TB control and by the main global health donors when allocating funding between countries (7). Moreover, as TB control policies increasingly rely on predictions based on mathematical models, it is essential to ensure that the natural history of the disease is accurately captured by such systems (8).
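The prevalence-to-incidence calculation described above can be illustrated with a minimal sketch. It assumes a steady state in which incidence is approximately prevalence divided by mean episode duration. The function name and the prevalence figure are hypothetical and chosen only for illustration; durations used in practice would also reflect detected and treated cases, not only untreated disease.

```python
def incidence_from_prevalence(prevalence_per_100k, mean_duration_years):
    """Approximate annual incidence (per 100,000) at steady state.

    Assumes incidence = prevalence / mean episode duration, which only
    holds when the epidemic is roughly stable over the period considered.
    """
    if mean_duration_years <= 0:
        raise ValueError("mean_duration_years must be positive")
    return prevalence_per_100k / mean_duration_years


# Hypothetical survey prevalence of 200 per 100,000, combined with two
# candidate episode durations, to show how sensitive incidence is to duration.
for duration in (1.0, 2.0, 4.0):
    estimate = incidence_from_prevalence(200.0, duration)
    print(f"duration {duration:.1f} y -> incidence {estimate:.1f} per 100,000 per year")
```

Halving the assumed duration doubles the estimated incidence, which is why the recovery and mortality rates that determine duration matter so much for burden estimates.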
A systematic review aimed at better defining the prognosis of untreated TB was published in 2011 (9). The review only included TB literature from the pre-chemotherapy era. This review represents a landmark for TB epidemiology, as it provided a comprehensive overview of all the available reports on untreated TB patients and presented quantitative estimates of TB case fatality.
Specifically, this study reported a 70% 10-year case fatality for smear-positive TB (SP-TB) patients and 20% for smear-negative TB (SN-TB) patients. Since its publication, this review has constituted the main source of parameterisation for the natural history of TB in mathematical models and is used by WHO to estimate TB incidence. However, because this study was not designed to provide modelling guidance specifically, the optimal approach to interpreting these results and incorporating them into mathematical systems remains unclear.
For example, the case fatality proportions cited above provide the final outcome but do not describe mortality over time or the spontaneous recovery rates, both of which are critical in disease modelling. Moreover, while these estimates were obtained from aggregation across several cohorts, the variability around the reported values due to the heterogeneity between cohorts has not been quantified and such information is crucial for modellers to produce informative estimates. Finally, it is to be noted that these rates include both the TB-induced mortality and natural mortality, as they are based on overall fatality rates estimated from survival proportions of TB patients over time.
Since the contribution from natural mortality is not explicitly reported, these estimates cannot be updated to reflect changes in natural mortality rates over time (10).
In the current paper, we propose a re-investigation of the TB prognosis data from the prechemotherapy era by taking a mathematical modelling approach to estimate TB-specific mortality rates and spontaneous recovery rates.

Literature review and data extraction
We considered the manuscripts that were identified in the previous systematic review of studies from the prechemotherapy era (9), and extracted data on mortality over time. The reports present groups of SP-TB patients and 23 groups of SN-TB patients ("closed TB cases"). Only 19 SN-TB cohorts could be included in the quantitative analysis, as the number of patients was not reported in four cohorts. All cohorts reported on pulmonary TB and we employed the classification that was used in the previous review to distinguish SP-TB from SN-TB (9).
All studies originated from Western Europe, with four reports from England, three from Denmark, two from the Netherlands, and one each from Poland, Norway, Switzerland, Sweden, Germany and Iceland. Six studies reported on sanatorium patients, six on officially notified individuals with TB and three on hospital or dispensary patients. Cohort sizes varied between eight and 2,382, with a median size of 379 patients. The detailed cohort profiles are provided in the Supplement. Figure 1 presents the raw mortality proportions over time extracted from the different reports.

Model and parameters
To estimate the TB mortality and recovery rates, we used a continuous-time homogeneous Markov chain that mimics the progression between the main stages that comprise the natural history of TB: active TB, self-recovery and death. Using this approach, transitions between the different states are allowed at any time and governed by constant transition rates. Diseased individuals may spontaneously recover (at rate γ) or die (at rate μ + μ_t) from TB, while recovered individuals die only from natural causes at rate μ. An illustration of this model is shown in Figure 2. The main objective of this study was to estimate the parameters γ and μ_t, where γ is the rate of spontaneous recovery, μ is the natural mortality rate and μ_t is the additional mortality rate due to TB disease.
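The three-state structure described above has a simple closed-form solution, sketched below under the assumption that everyone starts in the active TB state at time zero. The function and argument names are illustrative and do not come from the original analysis. The example plugs in the SP-TB posterior medians and the natural mortality rate of 0.014 per year quoted in the abstract, so it is only a rough point calculation and not the hierarchical model fit.

```python
import math


def state_probabilities(t, recovery_rate, tb_mortality_rate, natural_mortality_rate):
    """Occupancy probabilities at time t (years) for a cohort starting with active TB.

    Transitions: active -> recovered at recovery_rate,
                 active -> dead at natural_mortality_rate + tb_mortality_rate,
                 recovered -> dead at natural_mortality_rate.
    Returns (p_active, p_recovered, p_dead).
    """
    exit_rate = recovery_rate + tb_mortality_rate + natural_mortality_rate
    p_active = math.exp(-exit_rate * t)
    # Solution of dR/dt = recovery_rate * A(t) - natural_mortality_rate * R(t), R(0) = 0.
    denom = recovery_rate + tb_mortality_rate
    if denom == 0:
        p_recovered = 0.0
    else:
        p_recovered = (recovery_rate / denom) * (
            math.exp(-natural_mortality_rate * t) - p_active
        )
    p_dead = 1.0 - p_active - p_recovered
    return p_active, p_recovered, p_dead


# Point values for SP-TB taken from the abstract (posterior medians).
active, recovered, dead = state_probabilities(
    t=10.0,
    recovery_rate=0.233,
    tb_mortality_rate=0.390,
    natural_mortality_rate=0.014,
)
print(f"10-year cumulative mortality (all causes): {dead:.2f}")
```

With these inputs the 10-year all-cause mortality comes out close to the 70% case fatality reported for SP-TB in the earlier review, which is a useful sanity check on the model structure.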

Parameter estimation
We used a Bayesian approach to estimate the parameters characterising TB natural history. We used a hierarchical approach to allow the self-recovery rate γ and the TB-specific mortality rate μ_t to vary by cohort (11). That is, the associated cohort-specific parameters (then γ_i and μ_t,i for cohort i) were assumed to be drawn from zero-truncated normal prior distributions γ_i ~ N(m_γ, s_γ) and μ_t,i ~ N(m_μ, s_μ). In a sensitivity analysis we replaced the normal priors with gamma distributions γ_i ~ Gamma(a_γ, b_γ) and μ_t,i ~ Gamma(a_μ, b_μ). All hyperparameters were assigned improper uniform prior distributions.
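As an illustration of what drawing cohort-specific rates from a zero-truncated normal prior means, the sketch below samples such rates by rejection. It is not the inference code used in the study. The hyperparameter values are hypothetical, and the second parameter is treated as a standard deviation, which is an assumption.

```python
import random


def sample_zero_truncated_normal(mean, sd, rng):
    """Draw one value from a normal(mean, sd) restricted to positive values.

    Uses simple rejection sampling: redraw until the value is above zero.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    while True:
        value = rng.gauss(mean, sd)
        if value > 0:
            return value


def sample_cohort_rates(n_cohorts, mean, sd, seed=0):
    """Draw one rate per cohort from a shared zero-truncated normal prior."""
    rng = random.Random(seed)
    return [sample_zero_truncated_normal(mean, sd, rng) for _ in range(n_cohorts)]


# Hypothetical hyperparameters (mean 0.39, sd 0.15 per year) for 41 cohorts.
cohort_rates = sample_cohort_rates(41, mean=0.39, sd=0.15)
print(f"min {min(cohort_rates):.3f}, max {max(cohort_rates):.3f}")
```

In the hierarchical model the shared hyperparameters are themselves estimated, so cohorts with little data are pulled towards the overall mean while well-informed cohorts can depart from it.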
Particular attention was paid to the estimation of the natural mortality rate (μ), as this parameter was anticipated to influence the results of the analysis. Although all cohorts originated from similar geographical and socio-economic settings (being all from Western Europe), variations were observed in their recruitment years, as well as in their demographic characteristics (see Figures S1 and S2 in Supplement). To account for this and to incorporate uncertainty around the parameter value, μ was included as an estimated cohort-specific parameter. We used more informative priors on μ for the cohorts with available information about demographics (see Supplement for details). The average disease duration and TB case fatality ratios in the absence of treatment also depend on the value of the natural mortality rate (μ).
Table 1. Results are presented as median estimates and 95% simulation intervals. Inference was performed separately for the different subgroups.

Sensitivity analysis
The estimates obtained using gamma prior distributions for the cohort-specific parameters were very similar to estimates obtained in the primary analysis. In this sensitivity analysis, the posterior median estimates of TB-specific mortality rates were 0.

Discussion
This study is the first to produce quantitative estimates of disease-induced mortality and self-recovery rates for pulmonary TB patients. It suggests that TB mortality rates are around 15 times higher for SP-TB than for SN-TB patients, while self-recovery rates are comparable between the two categories of patients.
In order to compare our estimates to the parameter values previously employed, we conducted a  The current study is also likely to have critical implications for the estimation of local and global TB incidence and mortality (6). In particular, using our new estimates to re-estimate disease burden indicators in places with poor case detection may result in marked discrepancies with the previously reported estimates, as TB natural history is expected to play a major role in such settings.
The hierarchical approach allowed us to highlight the marked heterogeneity in cohort-specific estimates of TB mortality of smear-positive patients. This may be explained by the different diagnostic methods employed to identify TB and the various types of care used in the different cohorts. For smear-positive cases, we were able to examine the impact of some factors. In particular, we found that patients who did not attend sanatoria had significantly higher mortality rates (50% increase) than those treated in such facilities. In contrast, no difference by sex was noted in the prognosis of untreated pulmonary TB patients.
An important strength of this study is that we were able to distinguish mortality that is specifically induced by TB from that linked to natural causes. That is, the rates presented in this report should be interpreted as TB-specific mortality rates that have to be added to natural mortality in order to capture the overall death rate. The main advantage of this distinction is that our estimates can be applied directly to any setting without further adjustment: the TB-specific mortality rate is simply added to the setting-specific natural mortality rate.
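To make this additivity concrete, the sketch below combines the reported posterior medians with a setting-specific natural mortality rate to obtain the mean duration of untreated disease and the proportion of episodes ending in death. The function names are illustrative. Plugging in medians gives only approximate values, which can differ slightly from the posterior summaries reported in the abstract.

```python
def mean_untreated_duration(recovery_rate, tb_mortality_rate, natural_mortality_rate):
    """Mean time (years) spent with active TB before recovery or death."""
    total_exit_rate = recovery_rate + tb_mortality_rate + natural_mortality_rate
    if total_exit_rate <= 0:
        raise ValueError("total exit rate must be positive")
    return 1.0 / total_exit_rate


def proportion_dying_while_diseased(recovery_rate, tb_mortality_rate, natural_mortality_rate):
    """Share of untreated episodes that end in death (any cause) rather than recovery."""
    death_rate = tb_mortality_rate + natural_mortality_rate
    return death_rate / (recovery_rate + death_rate)


# Posterior medians from the abstract; natural mortality of 0.014 per year
# corresponds to the 70-year life expectancy assumed there.
natural = 0.014
for label, recovery, tb_mortality in (("SP-TB", 0.233, 0.390), ("SN-TB", 0.147, 0.025)):
    duration = mean_untreated_duration(recovery, tb_mortality, natural)
    dying = proportion_dying_while_diseased(recovery, tb_mortality, natural)
    print(f"{label}: duration {duration:.2f} y, episodes ending in death {dying:.0%}")
```

Replacing `natural` with a value appropriate for another setting is the only change needed to transfer the estimates, which is the practical benefit of separating the two mortality components.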
Our study has several limitations. First, we were unable to provide estimates for extrapulmonary TB patients, as the required data were not available. For the same reason, the analysis could not be performed separately for male versus female patients or for sanatorium versus non-sanatorium patients in the case of smear-negative patients. In addition, no age-specific estimates could be obtained due to the limited number of cohorts with information about age distribution. Also, it is to be noted that most patients included in the analysis were aged 15 and over, such that the rates presented here are representative of adult individuals with pulmonary TB.
Another limitation is that the diagnostics used in these historical cohorts differ substantially from current state-of-the-art diagnostics, especially in the case of SN-TB patients. As culture-based diagnostics were only invented in the early 1930s, many SN-TB patients were probably diagnosed clinically or on the basis of X-ray. Moreover, the classification into SP-TB or SN-TB patients may be unclear, as it depends very much on the quality of the microscopy, the number of slides examined or how sputum was collected. A final caveat is a possible lack of representativeness of the cohorts included, as undiagnosed patients by definition were not considered. Moreover, our finding that patients treated in sanatoria had a better prognosis is difficult to interpret, as there may be selection bias driven by factors influencing the decision of whether to admit TB patients to these institutions (16,17).
Unfortunately, most of the limitations listed above could not be addressed by future studies, as it would be unethical to follow up TB patients without providing them with treatment. Accordingly, no other data than those used to inform the present analysis and previously reported by Tiemersma and colleagues could be used to estimate TB mortality and self-recovery rates (9).
In conclusion, this study presents detailed estimates for the parameters that govern the natural history of pulmonary TB patients. It highlights that the gap between smear-positive and smear-negative TB patients in terms of mortality rates is much wider than previously thought. The parameter values reported in this study should improve the accuracy of disease burden estimations and make future TB modelling works more reliable.