A Bayesian analysis of the association between Leukotriene A4 Hydrolase genotype and survival probability of tuberculous meningitis patients treated with adjunctive dexamethasone

Tuberculous meningitis (TBM) remains the most devastating form of tuberculosis (TB) with high mortality despite effective antimicrobial treatment. As mortality has been linked to excessive inflammation, anti-inflammatory glucocorticoids are now routinely used as adjunctive treatment with antimicrobial therapy. However, they reduce mortality by only ~ 30%, raising the possibility that only a subset of TBM deaths are caused by inflammatory pathophysiology. Studies in Vietnam found that the survival benefit of adjunctive glucocorticoids was limited to individuals with a common promoter variant in the leukotriene A4 hydrolase (LTA4H) gene encoding an enzyme that regulates inflammatory eicosanoid expression. The variant constitutes a C/T transition with TT homozygotes having increased expression over CT heterozygotes and CC homozygotes. In Vietnam, the LTA4H TT genotype predicted survival, consistent with dexamethasone benefiting only those individuals with a dysregulated hyper-inflammatory response. However, a study of TBM patients in Indonesia did not find the LTA4H TT genotype to confer a significant survival benefit. Given the potential of personalized life-saving anti-inflammatory therapies guided by LTA4H genotype, we have used Bayesian methods to analyze the data from both studies. Bayesian analysis reveals that the LTA4H TT genotype confers survival benefit in both the Vietnam and Indonesia cohorts that begins within days and continues long-term. However, its benefit is nullified in the most severe cases where other factors cause early mortality. LTA4H TT genotype is associated with increased survival in HIV-positive patients also. Thus, our analysis extends the association of LTA4H genotype with TBM survival to populations outside of Vietnam and to HIV-positive patients. 
Patient LTA4H genotyping used in conjunction with disease severity assessment may help to target glucocorticoids to patients most likely to benefit from this broadly-acting immunosuppressive regimen despite its significant adverse effects.


INTRODUCTION
Tuberculous meningitis (TBM) is the most severe form of tuberculosis. Despite effective antimicrobial therapy, it results in 20-25% mortality in HIV-negative individuals and ~40% mortality in HIV-positive individuals (Stadelman et al., 2020, in press; Thwaites, van Toorn, & Schoeman, 2013). A long-standing hypothesis that an excessive intracerebral inflammatory response underlies TBM mortality (Shane & Riley, 1953) led to multiple trials of adjunctive anti-inflammatory treatment with glucocorticoids (e.g. dexamethasone) (Prasad, Singh, & Ryan, 2016; Wilkinson et al., 2017). The finding from a randomized controlled trial (RCT) in Vietnam that adjunctive dexamethasone improved survival by ~30% led to its becoming standard-of-care treatment (Thwaites et al., 2004). However, the modest benefit of adjunctive dexamethasone treatment suggested heterogeneity in glucocorticoid responsiveness (Donald & Schoeman, 2004; Schoeman & Donald, 2013). Findings in a zebrafish model of TB provided a testable hypothesis for a mechanism underlying this heterogeneity (Thwaites et al., 2004; Tobin et al., 2012; Tobin et al., 2010). The zebrafish findings suggested that either deficiency or excess of leukotriene A4 hydrolase (LTA4H), a key enzyme that regulates the balance of pro- and anti-inflammatory eicosanoids, increases susceptibility to TBM for opposite reasons: too little or too much inflammation (Tobin et al., 2012; Tobin et al., 2010). It became possible to test the prediction when a common human functional LTA4H promoter variant (rs17525495) was identified, comprising a C/T transition that controls LTA4H expression, with the T allele causing increased expression (Tobin et al., 2012). A retrospective analysis of patient LTA4H rs17525495 genotypes in the Vietnam dexamethasone RCT cohort confirmed the prediction (Thwaites et al., 2004; Tobin et al., 2012).
Among HIV-negative patients, the survival benefit of dexamethasone was restricted to patients with the hyper-inflammatory TT genotype, with CC patients potentially harmed by this treatment (Tobin et al., 2012).
These findings supported the model that mortality from TBM was due to two distinct inflammatory states, and that LTA4H genotype might be a critical determinant of inflammation and consequently of the response to adjunctive anti-inflammatory treatment. If true, then personalized genotype-directed adjunctive glucocorticoid treatment would be warranted, with the drug given only to TT patients. This would be particularly important given the possible harm to the hypo-inflammatory CC group, as well as the adverse effects of long-term high dose treatment with a broadly acting immunosuppressant.
To further these findings, two new studies of the association of LTA4H genotype with TBM survival in HIV-negative patients were performed in Vietnam and Indonesia, respectively (Thuong et al., 2017;van Laarhoven et al., 2017). Because glucocorticoid adjunctive therapy had become standard of care owing to the benefit observed in the randomised controlled trial (Thwaites et al., 2004), all patients received it in both studies.
Therefore, the prediction that could be tested was that TT mortality is less than CC+CT mortality. Whereas the Vietnam study confirmed this prediction, the Indonesia study did not. The Vietnam cohort had an overall mortality of 18.8%, similar to that reported in the literature (Stadelman et al., 2020, in press). A striking feature of the Indonesia cohort was its more than two-fold increased mortality in comparison with the Vietnam cohort.
Moreover, most of the Indonesia cohort deaths occurred early with a median time to death of eight days versus 50 days in Vietnam (Table 1). This high early mortality raised the possibility that the impact of the LTA4H variant differs by disease severity, and may not be relevant in more severe disease (Fava & Schurr, 2017). If so, then the effects of the LTA4H genotype were being masked by the preponderance of extremely severe cases in the Indonesia cohort (Fava & Schurr, 2017).

Both studies used Cox regression modelling as the primary metric of significance testing, an approach that assumes that the ratio of hazard rates between groups is constant throughout the observed period (Bradburn, Clark, Love, & Altman, 2003; Greenland et al., 2016). Therefore, it could miss important differences in these studies of TBM, a disease that can present acutely yet have a prolonged time course with vastly differing mortality risks over time (Thuong et al., 2017; Thwaites et al., 2013; van Laarhoven et al., 2017). Moreover, testing the hypothesis that LTA4H effects are limited to specific disease severity grades requires subgroup analysis. The use of frequentist statistics would limit the ability to perform such subgroup analyses because the penalties it sets for multiple comparisons do not reflect real-world situations (Gelman & Loken, 2013; Smith & Ebrahim, 2002). Bayesian analysis is ideally suited to simultaneously estimating treatment effects in multiple subgroups because its expected number of false positives depends on the number of apparent positive results rather than on the number of comparisons made, a dependence that is much less problematic than the one arising in frequentist analysis (Box 1). Therefore, we used a Bayesian approach to analyze data from the two cohorts (Gelman & Loken, 2013; MacKay, 2003; Zampieri et al., 2020) (see also Methods).
Bayesian analysis also enables the detection of significant differences that might be limited to just a part of the time course and therefore would allow analysis to be independent of the kinetics of death in the Vietnam and Indonesia cohorts. Finally, medical management decisions are guided by an assessment of the probabilities of outcome. In TBM, the question faced by the clinician is how likely glucocorticoid therapy is to help or harm a patient. Bayesian paradigms, unlike frequentist ones, treat probability in a real-world way, using it to indicate the plausibility of a particular conclusion (MacKay, 2003; Zampieri et al., 2020).
The severity grade-specific analyses, coupled with temporal analyses made possible by Bayesian methods, reveal that the LTA4H TT genotype is associated with survival in both HIV-negative cohorts and that this association extends to HIV-positive patients as well.

METHODS
The anonymized patient cohort data used here have been previously described in detail (Heemskerk et al., 2016; Thuong et al., 2017; van Laarhoven et al., 2017). All patients were treated with adjunctive dexamethasone for the first 6-8 weeks with the regimen adjusted to disease severity on presentation (Thwaites et al., 2004).
Patient cohorts were compared overall as well as stratified into disease severity groups based on the TBM grade and by LTA4H genotypes, into the TT group (previously linked with response to steroids) and the combined CC and CT genotypes (non-TT group). The Bayesian analysis methods used are detailed in Supplementary Methods. We limited analysis to the first 9 months of the one-year observation period in Indonesia to be compatible with the 9-month observation period in Vietnam.

The age range of patients was similar in the Vietnam and Indonesia cohorts with Indonesia patients tending to be younger (Table 1). We compared the cohorts for disease severity on presentation using both measures used in the studies, the Glasgow Coma Score (GCS) and the modified British Medical Research Council TBM grade (TBM grade) (Box 2). The Indonesia cohort had more severe disease on presentation by both measures (Table 1 and Figure S1). We used the TBM grade for further analyses because it divides patients into just three severity groups, making comparisons more feasible.
Importantly, it also provides clinically relevant separation of GCS 15 patients, the most highly represented in both cohorts (Figure S1), into those with and without focal neurological signs.

LTA4H TT genotype association with survival becomes stronger with increasing disease severity in Vietnam HIV-negative patients
Because the Indonesia cohort was skewed towards more severe disease on presentation, one explanation for the lack of an LTA4H genotype association with survival in Indonesia was that the underlying association is overridden by severe disease, a strong independent correlate of mortality (Fava & Schurr, 2017; Wang et al., 2019). Indeed, a detailed comparison of the Vietnam and Indonesia cohorts showed that 76% of Indonesia patients presented in Grade 2 versus 47% of the Vietnam patients (Table 1). This increase was driven entirely by a shift from Grade 1 (9% vs 37% in Vietnam). The cohorts had nearly equal proportions of Grade 3 patients (15% each). Therefore, the ~2-fold increased overall mortality in Indonesia could be largely accounted for by a corresponding increase in Grade 2 patients (1.6-fold higher than in Vietnam). Disease severity as assessed by BMRC Grade or GCS at presentation is also a strong predictor of earlier death (Fava & Schurr, 2017; Wang et al., 2019), and the Indonesia patients died sooner (median time to death 8 days versus 50 days in Vietnam) (Table 1).
If LTA4H genotype associates with survival most strongly in mild disease, then the association seen for the entire Vietnam cohort (Thuong et al., 2017) should be strongest in Grade 1 patients. We tested this prediction with Bayesian analysis using a prior that was intentionally uninformative and very wide, while still being centered on clinician-expected survival curves and hazard rates. We included additional parameters to allow for the possibility that not all patient deaths would be linked to the same mode of death. Importantly, the model and priors used allowed us to incorporate our preexisting knowledge that mortality risk to a population of TBM patients varies smoothly with time, rather than occurring at a number of discrete times common to all patients as is implied by the maximum likelihood solution illustrated by a Kaplan-Meier plot. The details of the model and the priors are in Supplementary methods. The definitions of terms and abbreviations used throughout the paper are in Box 3.
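To illustrate what a posterior probability statement of this kind means, the following minimal Python sketch estimates the posterior probability that one group's survival probability at a single fixed time point exceeds another's. It is a deliberate simplification and not the model described in Supplementary Methods: it assumes independent uniform Beta(1, 1) priors, ignores the smooth time-varying hazard, and uses a hypothetical function name and hypothetical patient counts.

```python
import random


def prob_a_survives_better(alive_a, dead_a, alive_b, dead_b,
                           draws=20000, seed=0):
    """Monte Carlo estimate of P(survival_A > survival_B | data).

    Assumption (not from the paper): each group's survival probability
    at one fixed time point has a uniform Beta(1, 1) prior, so the
    posterior is Beta(1 + alive, 1 + dead).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        s_a = rng.betavariate(1 + alive_a, 1 + dead_a)
        s_b = rng.betavariate(1 + alive_b, 1 + dead_b)
        if s_a > s_b:
            wins += 1
    return wins / draws


# Hypothetical counts, for illustration only (not cohort data).
p = prob_a_survives_better(alive_a=18, dead_a=2, alive_b=60, dead_b=40)
print(f"Posterior probability that group A survives better: {p:.3f}")
```

Under the convention of Box 1, such a difference would be called significant when the returned probability is at least 0.95.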
In the original Vietnam study, the TT genotype was associated with survival and the CC and CT genotypes had similarly increased mortality over TT (Thuong et al., 2017), so we compared TT survival to that of CT and CC combined (non-TT). We first confirmed that the TT variant distribution did not differ by grade on presentation (Table 2). Bayesian analysis confirmed that the TT genotype was associated with a survival advantage in the overall cohort (Figure 1A). Moreover, the analysis revealed that this survival advantage manifested early and persisted over most of the observation period (Figure 1A). The Bayesian method enables a more detailed evaluation of mortality risk over time through a hazard rate analysis. This analysis reinforced the significantly higher hazard rate for non-TT starting at 4 days and persisting through 120 days (Figure 1A, inset).
When we stratified the Vietnam patients by grade and LTA4H genotype, we obtained a surprising result. The LTA4H TT association with survival was barely present in Grade 1, somewhat stronger in Grade 2, and strongest in Grade 3, where it reached significance (Figure 1B-D). As in the overall cohort, the increased survival probability for TT in Grade 3 also manifested early and persisted throughout (Figure 1D). Hazard rate analysis again showed that non-TT patients had a greatly increased risk of mortality very early (Figure 1D, inset). The non-TT over TT hazard rate ratio peaked at 16 on day 3 (Figure 1D, inset). This early high peak dropped only gradually over time; it was 5 at day 100 and remained >1 throughout (data not shown).
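A hazard rate ratio that changes over time, as described above, is exactly the situation in which the constant-ratio assumption of Cox regression does not hold. The following minimal Python sketch shows how two groups with an early excess hazard of different size produce a ratio that is high early and decays towards 1. The functional form, function names and all parameter values are hypothetical and chosen only for illustration; they are not fitted to either cohort.

```python
import math


def hazard(t_days, early_peak, decay_days, baseline):
    """Hypothetical daily hazard: an early excess decaying to a baseline."""
    return baseline + early_peak * math.exp(-t_days / decay_days)


def hazard_ratio(t_days):
    """Ratio of the two hypothetical group hazards at time t_days."""
    # Illustrative parameters only; group A has the larger early excess.
    h_a = hazard(t_days, early_peak=0.08, decay_days=10.0, baseline=0.002)
    h_b = hazard(t_days, early_peak=0.01, decay_days=10.0, baseline=0.002)
    return h_a / h_b


# The ratio is far from constant across the observation period.
for day in (1, 10, 50, 200):
    print(day, round(hazard_ratio(day), 2))
```

A single summary hazard ratio fitted under the proportional hazards assumption would average over these very different early and late values.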
In sum, our analysis revealed that in Vietnam, LTA4H TT was associated with survival, not in mild disease as suggested earlier (Fava & Schurr, 2017), but rather in the most severe disease grade. In fact, the bulk of the overall association was being driven by Grade 3 patients who constituted only 15.9% of the cohort (Table 1). Non-TT patients were at greatest risk of dying within days of admission, a risk that diminished with time but remained greater than the TT patients throughout.

In Indonesia HIV-negative patients, the LTA4H TT genotype effect does not extend beyond Grade 2
Bayesian analysis found that, in the overall Indonesia cohort, survival of the TT patient group was higher than that of non-TT, though falling short of significance (maximum probability 0.92) (Figure 1E). Moreover, this analysis detected that the hazard rate for non-TT patients was higher than that for TT patients from day 1 to day 13; the non-TT over TT ratio reached significance on days 2 and 3, at which time the non-TT hazard rate was twice that of TT. Thus, while the TT beneficial effect was weaker than in Vietnam (compare Figure 1E to 1A), hazard rate analysis showed that, as in Vietnam, TT benefit manifested early (Figure 1E inset, compare to Figure 1A inset). For the grade-stratified cohorts, the analysis of Grade 1 patients was uninformative as there was only one TT patient in this group, who survived throughout (Table 2 and Figure 1F; also see Supplementary Methods section 4). In Grade 2, the TT survival effect was significant, in contrast to Grade 2 Vietnam (compare Figure 1G to 1H). Rather, the pattern of the Grade 2 association was similar to Vietnam Grade 3 with a significant early TT survival benefit. As in Vietnam, the TT survival benefit started within days with an early hazard rate peak for the non-TT group. In Grade 3, the LTA4H TT effect was again absent.
Since Grade 2 patients constitute the bulk of the Indonesia cohort (75.5%), why was the LTA4H genotype effect in this grade not reflected in the overall cohort analysis?
This was particularly curious given that in Vietnam the overall significant effect was being driven very substantially by Grade 3 patients who constituted only 15.9% of the cohort. We saw that the LTA4H TT benefit was weaker and less prolonged in Indonesia Grade 2 than in Vietnam Grade 3 (compare Figure 1G to 1D). Non-TT patients had similar mortality in Indonesia Grade 2 and Vietnam Grade 3 (compare Figure 1G to 1D).

Thus, Bayesian analysis revealed an LTA4H TT survival association in Indonesia as well. The association being only in Grade 2 and not in Grade 3 patients suggested an upper limit of disease severity for its efficacy.

specific LTA4H TT effects in Vietnam and Indonesia
Why might LTA4H effects stop after Grade 2 in Indonesia? A closer comparison of the overall survival between the two sites suggested that there were LTA4H-independent mortality differences between the two cohorts, both for all grades combined and in grade-for-grade comparisons (Figure 1). For instance, Indonesia Grade 2 non-TT patients had a mortality risk similar to that of Vietnam Grade 3 non-TT patients (compare Figure 1D and 1G). We confirmed by Bayesian analysis that within each cohort, mortality risk increased with grade severity (Thuong et al., 2017; van Laarhoven et al., 2017) (Figure 2). From early on, Grade 1 survival was significantly greater than Grade 2, which was significantly greater than Grade 3 (Figure 2A and B). The hazard rate ratios highlighted that while the risk of increased mortality with higher grade was highest early, it was sustained long-term (Figure 2C and D). Importantly, both the survival and hazard rate analyses again pointed to an increased grade-for-grade risk of mortality in Indonesia over Vietnam.
Next, we directly compared grade for grade mortality between the cohorts.
Vietnam survival probability was higher in all grades, and significantly so for Grades 2 and 3 with survival gaps of 18% and 24%, respectively (Figure 3A, C and E). Similarly, hazard rates did not differ significantly between the cohorts in Grade 1 (Figure 3B), but there was a significant and substantial increase in early hazard rates for Indonesia Grades 2 and 3 (Figure 3D and 3F).
In sum, these analyses show that the inherent higher mortality associated with more severe disease on presentation was sharply accentuated in Indonesia. Indonesia Grade 2 patients experienced a mortality risk similar to that of Vietnam Grade 3 patients, with the Indonesia Grade 3 patients experiencing far greater mortality. This higher mortality could potentially explain the finding that the LTA4H TT survival advantage did not extend to Indonesia Grade 3 patients. It may be that the TT genotype advantage, in response to corticosteroid treatment, is overridden by other factors that cause extreme mortality.

LTA4H TT genotype is associated with increased survival in HIV-positive patients
HIV-positive TBM patients suffer approximately twice the mortality of their HIV-negative counterparts, and the original RCT found that dexamethasone did not confer a significant survival benefit in HIV-positive patients (Stadelman et al., 2020, in press; Thwaites et al., 2004). The Vietnam study analyzed here also examined LTA4H genotype association with survival in dexamethasone-treated HIV-positive patients of similar ages and TBM grades to the HIV-negative cohort (Thuong et al., 2017) (Table 1). The TT association with survival was small, failing to reach statistical significance (Thuong et al., 2017). We noted that HIV-positive patients suffered the expected increase in overall mortality, twice that of HIV-negative patients (Table 1). Therefore, it was possible that in this cohort, as in the Indonesia HIV-negative cohort, a weaker LTA4H TT survival effect that did not extend to Grade 3 was being masked in the overall cohort analysis.
We first analyzed overall (LTA4H-independent) mortality differences in the HIV-positive patients. As in their HIV-negative counterparts, mortality rates were grade-dependent: Grade 3 > Grade 2 > Grade 1 (Figure 4A and B). A direct grade-for-grade comparison of the HIV-positive and HIV-negative patients showed that HIV-positive mortality was significantly higher in all grades (Figure 4C, E, G). While both groups had an early peak in hazard rate, this peak was higher in HIV-positive patients (Figure 4D, F, H). The HIV-positive hazard rate remained higher long-term, consistent with HIV-positive patients being broadly immunosuppressed. Thus, HIV-positive patients experienced an increased grade-for-grade mortality risk over HIV-negative patients.
Bayesian analysis of LTA4H genotype effects in the overall HIV-positive cohort confirmed that TT patients did not survive significantly better than non-TT (Figure 5A).
When we asked if the LTA4H effect was limited to less severe grades, we found that it was present in both Grades 1 and 3. LTA4H TT patients in both Grades 1 and 3 had a higher survival probability than non-TT (Figure 5C and G).
However, Grade 2 analysis gave an unexpected result. TT patients had lower survival probability than non-TT (Figure 5E). The patients had been admitted to one of two hospitals (Table S1 and Methods), and it appeared that this reversal in non-TT versus TT deaths was being driven substantially by three of the eight Grade 2 TT patients in Hospital 1 dying on days 7-8. In contrast, only 7 of the 68 non-TT patients admitted to Hospital 1 had died by day 8.
In sum, ignoring the likely spurious Grade 2 result, the LTA4H TT genotype was also associated with survival in HIV-positive patients, albeit with a weaker effect.
Intriguingly, the effect persisted and indeed was strongest in Grade 3 despite this subgroup having a greater mortality risk than Indonesia Grade 3 HIV-negative patients (compare Figure 5G to Figure 1H). Rather, the increasing effect strength with grade mirrored the picture seen in Vietnam HIV-negative patients.

DISCUSSION
In a commentary published alongside the Indonesia and second Vietnam studies, Fava and Schurr suggested that background genetic differences between the cohorts might account for the lack of an LTA4H association (Fava & Schurr, 2017). However, noting that the Indonesia cohort presented with more severe disease and died earlier, they postulated that rather than invoking an unknown genetic phenomenon, it was more likely that the beneficial effects of dexamethasone for the LTA4H TT genotype were nullified in more severe disease. This concept of the disease being too severe for outcomes to be influenced by intervention has precedent. The beneficial effect of fluoroquinolones in TBM is present in Grade 1 or 2 disease, but lost in Grade 3 disease (Thwaites et al., 2011).
To test if increased grade severity on presentation in Indonesia was sufficient to account for the loss of the LTA4H TT survival effect, a grade for grade analysis within and between the two cohorts was necessary, while taking into account the temporal changes in mortality risk over the several months-long observation period. Moreover, both the magnitude and the time to mortality seemed to vary dramatically between the cohorts (Table 1). We realized that Bayesian methods were ideal for these complex and intrinsically multivariate comparisons and possibly the only path to a biologically and clinically relevant understanding.
When we analyzed the Vietnam cohort separated by grade severity, we saw that there was indeed a relationship between LTA4H effect and grade severity. However, it was opposite to what had been predicted (Fava & Schurr, 2017). The TT survival benefit became stronger, not weaker, with increasing grade. The analysis also suggested the reason for this. Patients with mild disease on presentation did well regardless of LTA4H genotype, so that the added benefit of the TT genotype was small. In Indonesia, the LTA4H TT effect was present in Grade 2 and completely absent in Grade 3, a finding that was perplexing until we analyzed the LTA4H-independent mortality risk of the two cohorts. The Indonesia cohort did not just contain a greater number of patients presenting at a more severe grade as had been noted earlier (Fava & Schurr, 2017), but patients in this cohort had substantially higher early mortality than their Vietnam counterparts even grade for grade (Table 1). Indonesia patients had nearly twice the mortality risk in Grade 2 and ~50% higher risk in Grade 3. Tellingly, Grade 2 Indonesia overall mortality risk was virtually identical to Grade 3 Vietnam (38% versus 37.9%), suggesting that this level of overall mortality represents the boundary of the beneficial effect of LTA4H TT. Thus, LTA4H TT efficacy was limited by other factors that cause mortality. These factors appear independent of severity grade on presentation, and if they exceed a threshold (represented by ~40% mortality) then the beneficial effect of LTA4H TT is lost.
Analysis of mortality risk enabled by temporal hazard rate analyses provided further insight. For all grades in both cohorts, the risk of death was greatest within days. So also was the excess grade for grade mortality risk in Indonesia. Likewise, the survival benefit of LTA4H TT also occurred early. Thus the LTA4H TT survival benefit was being nullified in Indonesia Grade 3 by unrelated factors increasing mortality risk in the same time period.
What might these factors be? Perhaps dysregulated inflammation had reached the point of no return in Indonesia Grade 3 TT patients in a manner not revealed by standard metrics of judging disease severity. If this were the case then glucocorticoids might no longer be beneficial. Since both TT and non-TT patients suffered identical excess mortality risk, its cause would be LTA4H-independent. Indonesia patients tended to be younger than Vietnam patients in all grades (Table 1), and perhaps were more prone to develop such a response. The more likely possibility is that better ancillary care was possible in Vietnam, where all patients were enrolled into a clinical trial versus only 17% in Indonesia (Thuong et al., 2017; van Laarhoven et al., 2017). Optimized respiratory support, in particular, would be essential to keep patients alive through the early high-risk stage in order to allow the anti-inflammatory effects of glucocorticoids to benefit the TT patients.
Why was the LTA4H TT effect misunderstood to be limited to the least severe patients in both cohorts (Fava & Schurr, 2017)? Two reasons might explain this. First, it was not appreciated that apart from having fewer Grade 1 patients, Indonesia patients also suffered higher grade for grade mortality risk (Fava & Schurr, 2017). Second, because of a paucity of less severe patients, a subgroup analysis was performed in an attempt to tease out an effect in this group. Patients were divided into GCS 14-15 (less severe) versus < 13 (more severe), and a nonsignificant TT survival benefit was found in the GCS 14-15 group only, supporting the idea that TT effects, if any, were limited to the less severe group (Supplementary Figure 2A

It is difficult to formulate a pathophysiological mechanism that would explain this reversal in the intermediate grade exclusively in HIV-positive patients, while still maintaining the TT survival advantage in both lower and higher severity. We note that, having reported on a long series of Bayesian significant results where each result gives a 0.95 posterior probability that a particular hypothesis is true, we might expect on average one in twenty of those hypotheses to be false, despite the result (Box 1). This particular result (with a posterior probability of 0.962) seems most likely to be one such, given that we have ~30 significant results in this analysis, and that the data for this group may be skewed by a small cluster of deaths at a single time point in one hospital. Therefore, the TT effect is most likely to occur across all grades, and by extension glucocorticoids are likely to provide benefit to TT individuals in all grades, albeit with a weaker effect than in their HIV-negative counterparts. That the TT effect would be present in HIV-positive Grade 3 patients might seem surprising at first blush because this group has an even higher mortality risk than Indonesia Grade 3 HIV-negative patients (75.2% versus 63.8%) (Table 1). The time of mortality in these groups provides a likely explanation. The median time to mortality was more than four times longer (37 days for Vietnam HIV-positive vs. 8 days for Indonesia). These analyses suggest that HIV-positive Grade 3 patients have a more prolonged risk of mortality for reasons that are independent of corticosteroid treatment and LTA4H genotype. Thus, the use of Bayesian methods has identified HIV-negative subgroups of TT patients that gain the most survival benefit from dexamethasone treatment.
HIV-positive TT patients also have an early survival benefit, albeit weaker than in their HIV-negative counterparts. This may reflect a distinct pathophysiology from early on. This idea is supported by a subgroup analysis in the original Vietnam study which found that LTA4H TT did not associate with survival in severely immunocompromised individuals with a CD4 T lymphocyte count <150. Rather it appeared to be limited to those with CD4 >150, although the small number of patients in this group precluded finding significance. Together, our findings set the stage for using Bayesian methods for subgroup analyses of two ongoing trials where 1) the survival of HIV-negative CC+CT patients randomized to getting dexamethasone or placebo is being compared to that of TT patients who are all getting dexamethasone (Donovan, Phu, Thao, et al., 2018), and 2) the benefit of dexamethasone is being examined for HIV-positive patients of all three genotypes by randomizing them to get dexamethasone or not (Donovan, Phu, Mai, et al., 2018).
Finally, in addition to providing guidance for TBM pharmacogenomic approaches, we hope that our analyses highlight the unique value of Bayesian methods for providing guidance for other complex diseases with difficult treatment decisions. The vital importance of defining the patient populations and subgroups which will benefit the most from specialised interventions and treatments is increasingly appreciated (Sadée & Dai, 2005). This is not only to target such treatments to those who will benefit, but to avoid their adverse effects in those individuals who have little chance of experiencing a clinically relevant benefit from them. Our finding that the LTA4H TT genotype's salutary role is incumbent on the optimization of other factors that maximize patient survival has broad implications for pharmacogenomic approaches.

Acknowledgments
We thank R. Troll for evaluating the two published cohort studies, realizing that Bayesian analysis could provide answers and initiating the collaboration with the Bayesian statistician R.S. This work was supported by the NIH (NIAID ULTIMATE project 1R01AI145781-01) to A.L. and R.C., by the Wellcome Trust to N.T., T.T. and L.R., and by an NIH MERIT award to L.R.
… provided raw data, discussed results, reviewed paper; G.T. participated in prior selection, provided raw data, discussed results, guided writing of, and edited, paper; M.T., P.E. participated in prior selection, reviewed data analysis, provided critical input, edited paper; R.S. checked code and added output, plotted data, performed housekeeping data management functions, analysed and interpreted data, wrote paper; L.R. conceived and oversaw project, participated in prior selection, guided data analysis, interpreted data, wrote paper.
II: multivariate data analysis--an introduction to concepts and methods. Br J Cancer, 89 (3) (MacKay, 2003)

Box 1. Contrast of "95% significant" in Bayesian and frequentist paradigms
Bayes: "A is significantly greater than B" = The posterior probability that A is greater than B is at least 0.95.
Frequentist: "A is significantly greater than B" = For any circumstance where A is at most B, the probability of getting data in this critical region, as we did, was at most 0.05.
Therefore:
1. We expect 1 in 20 of Bayesianly (95%) significant results to be truly negative and therefore false positives;
2. We expect up to 1 in 20 of truly negative results to be frequentistly significant (at the 95% level) and therefore false positives.
Therefore (assuming all positives are at 95% level):
- In the frequentist paradigm, the expected number of false positive results is proportional to the number of comparisons done on true negatives;
- In the Bayesian paradigm, the expected number of false positive results is proportional to the number of apparent positive results, and unaffected by any vast number of accompanying apparent negative results.
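The Bayesian half of Box 1 can be made concrete with a few lines of arithmetic. The sketch below takes the figure of roughly 30 significant results mentioned in the Discussion; the variable names are ours, and the second quantity additionally assumes that the results are independent and sit exactly at the 0.95 threshold, which the analysis does not claim.

```python
# Illustrative arithmetic only: names and the independence assumption are ours.
n_results = 30          # approximate number of Bayesian-significant results
posterior_prob = 0.95   # minimum posterior probability for "significant"

# Each significant hypothesis is false with probability at most 1 - 0.95,
# so by linearity of expectation the expected number of false ones is bounded.
max_expected_false = n_results * (1 - posterior_prob)

# ASSUMPTION: independent results, each exactly at the 0.95 threshold.
prob_at_least_one_false = 1 - posterior_prob ** n_results

print(f"expected false positives (upper bound): {max_expected_false:.2f}")
print(f"P(at least one false), under independence: {prob_at_least_one_false:.3f}")
```

With about 30 significant results the bound is about 1.5 false positives, so finding one result that looks implausible is in line with expectation.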

Box 2. TBM disease severity classification
Glasgow Coma Score (GCS)
A general measure of consciousness used for a wide range of neurological deficits, particularly brain trauma, by scoring eye opening and verbal and motor responses to stimuli, to assign a numerical value from 3 to 15 corresponding to decreasing severity, where 3 corresponds to completely unresponsive, deep coma and 15 to fully conscious (Teasdale & Jennett, 1974).

Modified British Medical Research Council (BMRC) TBM Grade
A classification scheme specifically tailored to assess TBM severity. It is derived from the GCS, and additionally incorporates the presence of focal neurological signs. The TBM grade is scored between 1-3 corresponding to increasing severity, converse to the GCS classification.

Definitions
• Posterior probability - the probability after seeing the data
• Mean posterior survival probability at time T - the expectation after seeing the data of the fraction of patients that will still be alive at time T
• Hazard rate - the fraction of those still surviving that will die per unit time. A high hazard rate at a particular time indicates that patients are at high risk of dying at that time
• Mean posterior hazard rate at time T - the expectation after seeing the data of the hazard rate at time T

• Onwards - for the rest of the 270-day observation period
• Throughout - for the entire 270-day observation period
• "A was significantly greater than B at time T" - the posterior probability that A was greater than B, at time T, was at least 0.95
• "A was not significantly different from B" - the posterior probability that A was greater than B was between 0.05 and 0.95 throughout
• "Group A survival was 30% greater than group B" - the mean posterior survival probability at 270 days, $p_A$, was 30% absolute greater than the corresponding probability $p_B$ for group B ("absolute" here meaning that $p_A = p_B + 0.3$, and not that $p_A = p_B \times 1.3$)
• "Probability that group A survival was better than group B at time T was 0.97" - the posterior probability that group A survival probability at time T was greater than group B survival probability at time T was 0.97 (note that this is not a reference to the mean posterior probability)
• "The hazard rate ratio for group A over group B peaked at X at time T and remained greater than Y throughout" or "Group A had an X-fold higher relative risk of death at time T which remained greater than Y throughout" - the mean posterior hazard rate for group A, divided by that for group B, peaked at a value of X at time T and remained greater than Y at all times up to 270 days
• "The probability that hazard rate for group A was greater than that of group B was 0.97 at time T" - the posterior probability was 0.97 that the hazard rate for group A at time T was greater than that for group B at time T (note that this is not a reference to the mean posterior hazard rate)
• "The hazard rate for group A was greater than that for group B at time T" - the mean posterior hazard rate for group A was greater than that for group B at time T
• "survival gap" - the difference in mean posterior survival probability between the two groups being considered at 270 days

In Vietnam, overall (A), TT survival was significantly higher than non-TT from day 39 onwards with maximum probability 0.98, survival gap 11%; non-TT hazard rate was significantly higher than TT from day 4 to day 120, with their ratio peaking at 3 on day 6 and remaining >1 until day 223. (D) Grade 3 TT survival was significantly higher from day 3 onwards with maximum probability 0.97, survival gap 30%. The TT hazard rate dropped from the start, while the non-TT hazard peaked at 16 times higher than TT on day 3; non-TT over TT hazard rate ratio remained >1 throughout. In Indonesia, overall (E), TT survival was nonsignificantly higher than non-TT (maximum probability 0.92); the non-TT hazard rate was greater than the TT hazard rate from day 1 to day 13, significantly so (and by 2-fold) on days 2 and 3 (maximum probability 0.97). (F) Grade 1 comparisons were uninformative due to TT sample size (n=1). (G) Grade 2 TT survival was significantly higher on days 4-32 with maximum probability 0.99, survival gap 9%. The TT hazard rate dropped from the start, while the non-TT hazard peaked at 5 times higher than TT on day 3. The non-TT over TT hazard rate ratio remained >1 until day 15.

The non-TT hazard rate was significantly lower than TT from day 48 onwards; the non-TT over TT hazard rate ratio was >1 from 14 days onwards, peaking at 8 at day 250. (E, F) Grade 2 non-TT survival was significantly higher than TT from day 252 onwards (maximum probability 0.95, survival gap 21% in favour of non-TT), but hazard rates did not differ significantly. (G, H) Grade 3 TT survival was significantly higher than non-TT from days 12-72 with maximum probability 0.97, survival gap 18%; hazard rate ratio for non-TT over TT peaked at 3.7 on day 3 and remained >1 throughout.

Figure S1.

Overview
We adopt the Bayesian paradigm. Accordingly we define below a probabilistic generative model for patient lifetime x given model parameters θ and a suitable prior distribution on θ. We assume that, when observed, such lifetime data may be censored at known times t that are independent of the underlying lifetimes and the model parameters, so that for each patient we observe either a time of death x or a time of censoring t at which the patient was alive.
We then collect a data set $\tilde{x}$ of values of $x$ or $t$ for a set of patients, and apply Bayes' theorem to deduce the posterior distribution $P(\theta \mid \tilde{x})$ of the model parameters.

Figure 1: Output of test run using synthetic data for which the right answer is known. The true survival probability curve is shown in green, with the Kaplan-Meier plot for the generated data in black. In blue are shown many samples from the posterior distribution on the survival probability curve, calculated from $P(\theta \mid \tilde{x})$, which indicate the uncertainty in the inferred distribution. The synthetic dataset comprised 300 hypothetical patients of whom the time of death of 153 was censored.
Since this distribution is hard to visualise, we draw samples of $\theta$ from it using Markov chain Monte Carlo (MCMC) methods described below. For each such sample of $\theta$ we calculate the distribution of lifetime $x$ given $\theta$ and hence the survival probability $q(t \mid \theta)$ and hazard rate $h(t \mid \theta)$ functions against patient time since admission, using the standard relationships $q(t \mid \theta) = P(x > t \mid \theta)$ and $h(t \mid \theta) = -\frac{d}{dt} \log q(t \mid \theta)$.
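As a minimal sanity check of how survival probability and hazard rate relate, the sketch below uses the textbook definitions (survival as the probability of being alive at time t, hazard as the negative time derivative of log survival) on an exponential lifetime, for which the hazard should be constant. The exponential model, the rate value and the function names are illustrative assumptions, not the lifetime model used in this analysis.

```python
import math

def survival(t, rate):
    """q(t) = P(x > t) for an exponential lifetime with the given rate."""
    return math.exp(-rate * t)

def hazard(t, rate, dt=1e-6):
    """h(t) = -d/dt log q(t), approximated by a central finite difference."""
    log_q_plus = math.log(survival(t + dt, rate))
    log_q_minus = math.log(survival(t - dt, rate))
    return -(log_q_plus - log_q_minus) / (2 * dt)

rate = 0.02  # hypothetical: deaths per surviving patient per day
for day in (1, 10, 100):
    # Survival falls with time while the hazard stays at the chosen rate.
    print(day, round(survival(day, rate), 4), round(hazard(day, rate), 4))
```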
An example resulting set of survival probability curves is shown in Figure 1, with the corresponding hazard rate curves being shown in Figure 2. Now, given a set of such curves, we can calculate the mean posterior survival probability (resp. hazard rate) at each time point and plot the posterior mean survival probability (resp. hazard rate) against time. Similarly we can find the 2.5% and 97.5% centiles. Examples of the posterior mean and centile curves for survival probability corresponding to Figure 1 are shown in Figure 3.
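The pointwise summary described above can be sketched as follows. The curves here are synthetic stand-ins for posterior samples (exponential survival with an uncertain rate), the function name is ours, and the centiles use a simple order-statistic rule rather than whatever interpolation the authors' software applied.

```python
import math
import random
import statistics

def summarise_curves(curves):
    """Return pointwise mean, 2.5% and 97.5% centiles of equal-length curves."""
    means, lower, upper = [], [], []
    for values in zip(*curves):            # all sampled values at one time point
        ordered = sorted(values)
        n = len(ordered)
        means.append(statistics.fmean(ordered))
        lower.append(ordered[int(0.025 * (n - 1))])
        upper.append(ordered[int(0.975 * (n - 1))])
    return means, lower, upper

rng = random.Random(0)
times = [1, 10, 100, 270]                  # days since admission
# ASSUMPTION: stand-in "posterior samples", one survival curve per sampled rate.
curves = [[math.exp(-rate * t) for t in times]
          for rate in (rng.uniform(0.001, 0.005) for _ in range(1000))]

mean_q, lower_q, upper_q = summarise_curves(curves)
for t, m, lo, hi in zip(times, mean_q, lower_q, upper_q):
    print(t, round(m, 3), round(lo, 3), round(hi, 3))
```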
Further, given two such sets of survival probability or hazard rate curves (such as the survival probability curves shown in Figure 4 and zoomed in in Figure 5), we can also calculate at each time point the probability that a value taken at that time from a random curve from set 1 is greater than a similar value taken from a random curve from set 2. As we can see from these two figures, this probability is higher at 10 days than at 1 day or 100 days. These probabilities, at each time point, can then be plotted giving in this example Figure 6, showing the probability that underlying survival in group 1 is greater than that in group 2 at each time point.
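The comparison probability described above can be computed exactly over all pairs of curves, one from each set. The following is a minimal sketch with a hypothetical function name and toy numbers; it is not the authors' code.

```python
from bisect import bisect_left

def prob_greater(set1, set2):
    """Pointwise P(value from set 1 > value from set 2), over all curve pairs."""
    probs = []
    for vals1, vals2 in zip(zip(*set1), zip(*set2)):
        sorted2 = sorted(vals2)
        # bisect_left gives how many set-2 values lie strictly below v.
        wins = sum(bisect_left(sorted2, v) for v in vals1)
        probs.append(wins / (len(vals1) * len(vals2)))
    return probs

# Toy example: two curves per set, each evaluated at two time points.
set1 = [[0.90, 0.80], [0.95, 0.60]]
set2 = [[0.85, 0.70], [0.80, 0.75]]
probs = prob_greater(set1, set2)
significant = [p >= 0.95 for p in probs]   # the paper's 0.95 threshold
print(probs, significant)
```

Sorting one set and using binary search keeps the cost near n log n per time point, which matters when each set holds thousands of sampled curves.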
In order to carry out this process, we need to specifically define the lifetime distribution model P (x|θ), and we now turn to this.

Lifetime model
We suppose that there exist an unknown number $J$ of different mechanisms causing death, and that each such mode of death has a different lifetime distribution.

Figure 2: Output of test run using synthetic data for which the right answer is known. The true hazard rate curve is shown in green. In blue are shown many samples from the posterior distribution on the hazard rate curve, which indicate the uncertainty in the inferred distribution. The synthetic dataset comprised 300 hypothetical patients of whom the time of death of 153 was censored.

Figure 3: Output of test run using synthetic data for which the right answer is known. The true survival probability curve is shown in green, with the Kaplan-Meier plot corresponding to the generated data in black. In blue is the posterior mean survival probability against time, and in dotted lines the 2.5% and 97.5% centiles, which indicate the uncertainty in the inferred distribution. The synthetic dataset comprised 300 hypothetical patients of whom the time of death of 153 was censored.

Figure 4: Comparison of two runs generated independently from two subsets of patients, subset 1 (red) consisting of 20 patients and subset 2 (green) consisting of 182 patients. The Kaplan-Meier plot for subset 1 is in solid black and that for subset 2 in dot-dashed black. Since there were many more patients in subset 2 than in subset 1, we expect greater variance in the inferred survival probabilities for subset 1 than for subset 2.

Combination of different modes of death
Let $x$ denote a lifetime, i.e. the time until a patient dies. Let $j \in \{1, 2, \ldots, J\}$ denote a particular mechanism (or mode) of death. Let $x_j$ denote the time at which mode $j$ would kill the patient; we set $x_j = \infty$ to denote the possibility that that mode would never have killed the patient.
Then the patient's time of death is given by $x = \min_{j \in \{1, \ldots, J\}} x_j$. In particular $x = \infty$ denotes the situation that the patient never dies (unlikely as this is).
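One consequence worth spelling out: because the patient dies at the earliest of the per-mode death times, and the modes are treated as independent given the parameters (see the next subsection), the overall survival probability factorises and the overall hazard rate is a sum. The per-mode symbols $q_j$ and $h_j$ below are our notation, introduced only for this illustration.

```latex
\begin{align*}
q(t \mid \theta) &= P\Bigl(\min_{j} x_j > t \Bigm| \theta\Bigr)
                  = \prod_{j=1}^{J} P(x_j > t \mid \theta)
                  = \prod_{j=1}^{J} q_j(t), \\
h(t \mid \theta) &= -\frac{d}{dt} \log q(t \mid \theta)
                  = \sum_{j=1}^{J} \Bigl(-\frac{d}{dt} \log q_j(t)\Bigr)
                  = \sum_{j=1}^{J} h_j(t).
\end{align*}
```

So each additional mode of death can only lower survival and raise the hazard, and a mode whose death time is infinite with high probability contributes little to either.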

Model of a single mode of death
We now drop the subscripts $j$, but assume that this subsection will be repeated $J$ times with the subscript $j$ added to every random variable, with each of the repetitions being independent as far as the model is concerned before being conditioned on observed data. When later we want to refer to the complete set of $J$ values of e.g. $p$, we will use bold face, e.g. $\mathbf{p} = (p_1, p_2, \ldots, p_J)$.
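Thus we will set $P(x \mid p, k, m, r)$, i.e. $P(x_j \mid p_j, k_j, m_j, r_j)$, to be such that with probability $p$, $x^k$ is Gamma distributed with shape parameter $m$ and rate parameter $m r^k$, and otherwise $x = \infty$. Thus we have
$$P(x \mid p, k, m, r) = \begin{cases} p \, |k| \, \dfrac{(m r^k)^m}{\Gamma(m)} \, x^{km-1} e^{-m (r x)^k} & (0 < x < \infty) \\ 1 - p & (x = \infty) \end{cases}$$
Here $p \in [0, 1]$, $k \in \mathbb{R} \setminus \{0\}$, $m, r > 0$.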
Note that we have here a distribution which has both a discrete and a continuous part, so that $P(x \mid \ldots)$ is used as notation both for a probability and for a probability density: in other words, we have a continuous distribution for finite positive $x$, given by a density function, whose interpretation is that its integral from $x_1$ to $x_2$ is the probability that $x_1 < x < x_2$; but as we have a non-zero probability $1 - p$ that $x = \infty$, the integral from 0 (inclusive) to $\infty$ (exclusive) of the density given by the first line of the above formula for $P(x \mid p, k, m, r)$ must be $p$. On the other hand we have a discrete distribution for $x = \infty$, and $1 - p$ is a probability, not a density.
By way of very approximate intuition: $p$ is the probability that a particular mode of death would kill the patient at a finite time; $r$ is the reciprocal of the overall timescale to deaths of those patients who die; $m$ governs how variable those times of death are (the smaller $m$ is, the more variable are the times of death); and the sign of $k$ plays a part in determining whether the hazard rate for this mode of death is increasing or decreasing, while the magnitude of $k$ governs how abruptly the spread of death time is cut off in the less spread out direction. Specifically, $k = 0$ makes no sense, as then we would have $x^k = 1$ for all $x$, and an invalid distribution would result (so it should not be a surprise that the prior on $k$ is bimodal with zero density at zero).
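The single-mode model can be simulated directly from its verbal definition, which also gives a quick check of the intuition above. The sketch assumes the Gamma distribution is parameterised by shape $m$ and rate $m r^k$, so that the mean of $x^k$ is $r^{-k}$; the function name and all parameter values are hypothetical.

```python
import math
import random

def sample_mode_lifetime(p, k, m, r, rng):
    """Draw one lifetime for a single mode of death.

    With probability p, x**k ~ Gamma(shape=m, rate=m * r**k); otherwise x = inf.
    """
    if rng.random() >= p:
        return math.inf                  # this mode never kills the patient
    scale = 1.0 / (m * r ** k)           # gammavariate takes a scale, not a rate
    y = rng.gammavariate(m, scale)       # y plays the role of x**k
    return y ** (1.0 / k)

rng = random.Random(42)
p, k, m, r = 0.6, 1.5, 2.0, 0.05         # hypothetical values, not the paper's
draws = [sample_mode_lifetime(p, k, m, r, rng) for _ in range(20000)]
finite = [x for x in draws if math.isfinite(x)]

fraction_finite = len(finite) / len(draws)                # should be close to p
mean_x_to_k = sum(x ** k for x in finite) / len(finite)   # close to r ** -k
print(round(fraction_finite, 3), round(mean_x_to_k, 2), round(r ** -k, 2))
```

A patient's overall lifetime under several modes would then be the minimum of one such draw per mode.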

Priors on the parameters
We specify the priors on the parameters in two stages. First, we specify their general form, and second we choose specific values for the hyperparameters that then specify a unique prior.

General form of the priors
The total number J of modes is itself to be considered a random variable, on which we put the prior ..} and for some fixed α J ∈ [0, 1).
The priors for the parameters $p, m, r, k$ of each mode of death are taken to be independent, and are as follows.
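We take the prior on $p$ to be Beta, with positive real parameters $\alpha_p, \beta_p > 0$, so that
$$P(p \mid \alpha_p, \beta_p) = \frac{\Gamma(\alpha_p + \beta_p)}{\Gamma(\alpha_p)\,\Gamma(\beta_p)} \, p^{\alpha_p - 1} (1 - p)^{\beta_p - 1}.$$
We take the prior on $r$ to be Gamma, with parameters $m_r, r_r > 0$, so that
$$P(r \mid m_r, r_r) = \frac{r_r^{m_r}}{\Gamma(m_r)} \, r^{m_r - 1} e^{-r_r r}.$$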
We take the prior on each of the parameters k and m to be the conjugate prior on each with respect to this parameterisation. Thus for positive real parameters a m , b m we have Similarly for parameters N k ∈ N, a k ∈ R + and b  These result in the following depicted distributions for J, p j , m j , r j , k j , and hence for the depicted samples from the distributions for survival probability and hazard rate against time as well as the mean and 2.5% and 97.5% centiles for the last two: see Figures 7 to 15.

Summary
Thus the parameters $\theta$ consist of $\theta = (J, \mathbf{p}, \mathbf{m}, \mathbf{r}, \mathbf{k})$, and the prior is a combination of densities of different dimensionalities. This can be regarded as a mixture of models, one for each possible value of $J$, weighted by the prior on $J$: the model for $J = 1$ has only one possible mode of death, while that for $J = 2$ has two possible modes of death, which are independent, and so on for higher values of $J$.
In particular, when comparing two subsets A and B of patients, we use the same prior to infer the posterior distribution of $\theta$ given the observed data $\tilde{x}$ for each subset of the patients independently. Thus a priori the probability at each time $t$ that $q_A(t) > q_B(t)$ is $\frac{1}{2}$, the probability that $q_A(t) < q_B(t)$ is $\frac{1}{2}$, and the probability that $q_A(t) = q_B(t)$ is zero (though it has non-zero probability density). In other words we are asking how sure we are that $q_A(t) > q_B(t)$ given the data, reckoning the alternative to be that $q_A(t) < q_B(t)$, and taking for granted that the probability that the two are exactly the same is zero.

Rationale
The basic Gamma model is a frequently used model of a failure mode that goes through several stages of failing, each with an exponentially distributed lifetime, before final failure occurs. The additional effect of the parameter k is to incorporate Weibull-type failure time models, allowing for both increasing and decreasing hazard rates. The parameter p allows for the possibility that some modes of dying may not be relevant for all patients, while the combination of the J submodels allows for a number of different types of mechanisms of death to be relevant.
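The staged-failure reading of the Gamma model can be checked by simulation: the sum of several independent exponential stage durations with a common rate should behave like a single Gamma draw with that many stages as its shape. The number of stages, the stage rate and the function name below are hypothetical and chosen only for illustration.

```python
import random
import statistics

def staged_failure_time(n_stages, stage_rate, rng):
    """Time to pass through n_stages, each lasting an exponential time."""
    return sum(rng.expovariate(stage_rate) for _ in range(n_stages))

rng = random.Random(0)
n_stages, stage_rate = 3, 0.1            # hypothetical: 3 stages, mean 10 days each

staged = [staged_failure_time(n_stages, stage_rate, rng) for _ in range(20000)]
# gammavariate(shape, scale): shape = number of stages, scale = 1 / stage rate.
direct = [rng.gammavariate(n_stages, 1.0 / stage_rate) for _ in range(20000)]

# Both samples should have mean near 30 days and variance near 300 days^2.
print(round(statistics.fmean(staged), 1), round(statistics.fmean(direct), 1))
print(round(statistics.variance(staged)), round(statistics.variance(direct)))
```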
In particular, we specifically used independent priors on the parameters of the subsets being compared because:
• When offered the choice the users (namely LW, MT, PE, LR, GT of whom PE, LR, and GT are infectious diseases clinicians with GT having extensive experience with TB meningitis patients) indicated unanimously that this correctly represented their prior beliefs;
• Because we restrict our use of posterior probabilities to those of the form $P(U > V \mid \text{data})$ rather than $P(U > V + \alpha \mid \text{data})$ for some $\alpha > 0$, and because the additional effect of any non-independent factors in the prior would be symmetric either side of the difference between the two subsets being zero, the effect of any non-independent factors would be expected to be very small;
• If they did not believe the priors on the relevant disjoint subsets were independent, the clinicians involved would find it very hard to specify exactly how similar each pair of subsets being compared should be expected to be.

MCMC methodology
We introduce additional variables $j_i$ for each patient $i$ which indicate whether the time of death was censored (value 0) or was caused by a particular mode $j$ of death (value $j \neq 0$, unknown). We also introduce variables $x_{i,j}$ of unknown values giving for each patient the time of death that would have resulted from mode $j$ if no other modes had killed the patient first. These variables take the specific value $x_{i,j} = \infty$ if mode $j$ would in fact not have killed patient $i$ at any finite time.
We initialise the parameters $J, \mathbf{p}, \mathbf{m}, \mathbf{r}, \mathbf{k}$ from the prior and initialise the additional variables $\mathbf{j}$ and $\mathbf{x}$ randomly to any set compatible with those and the observed data $\tilde{x}$. These variables then form $\theta_1 = (J, \mathbf{p}, \mathbf{m}, \mathbf{r}, \mathbf{k}, \mathbf{j}, \mathbf{x})$, the first of a sequence of samples $(\theta_n)_{n = 1, \ldots}$ to be drawn.

Sampling methods
A thorough review of all the following methods is available either in [1] or in [2] except where otherwise indicated.
The key point is that if we resample each variable by a method that satisfies detailed balance, and given other weak conditions which are here fulfilled, Feller's theorem [1] then guarantees that the sequence of samples $(\theta_n)$ will eventually converge to a sequence of samples from the desired posterior distribution $P(\theta \mid \tilde{x})$ given the observed data $\tilde{x}$. The samples in this sequence will not be independent of each other, though the conditional distribution of $\theta_{n_1}$ given $\theta_{n_0}$ will also converge to $P(\theta \mid \tilde{x})$ as $n_1 \to \infty$ with $n_0$ fixed, i.e. to independence.
Sampling from the posterior was done by the MCMC technique of Gibbs sampling, i.e. sampling from the following distributions palindromically (here $\tilde{x}$ denotes the observed data and $\mathbf{x}$ the latent per-mode death times $x_{i,j}$):
1. $P(\mathbf{k} \mid \tilde{x}, \mathbf{x}, J, \mathbf{j}, \mathbf{m}, \mathbf{r}, \mathbf{p})$. This distribution has two parts ($k_j > 0$ and $k_j < 0$), each of which is log-concave. We therefore first resample the sign of each $k_j$ using the Metropolis-Hastings algorithm [1], then use adaptive rejection sampling [3] to resample the magnitude of $k_j$ given its sign, then resample the sign again to maintain detailed balance.
2. $P(\mathbf{p} \mid \tilde{x}, \mathbf{x}, J, \mathbf{j}, \mathbf{m}, \mathbf{r}, \mathbf{k})$. In this case the conditional distribution is from the Beta family, and standard methods [2] are available to sample from it.
3. $P(\mathbf{m} \mid \tilde{x}, \mathbf{x}, J, \mathbf{j}, \mathbf{r}, \mathbf{k}, \mathbf{p})$. This distribution is log-concave, so we may use adaptive rejection sampling [3] to sample from it.
4. $P(\mathbf{r} \mid \tilde{x}, \mathbf{x}, J, \mathbf{j}, \mathbf{m}, \mathbf{k}, \mathbf{p})$. For each $j$, this distribution is in general a product of a Gamma distribution on $r_j$ and a much narrower Gamma distribution on $r_j^{k_j}$. We therefore sample from the Gamma relevant to the latter [2], using this as a proposal distribution for the Metropolis-Hastings algorithm [1], resulting in the Hastings ratio coming from the Gamma on $r_j$.
5. $P(\mathbf{j} \mid \tilde{x}, J, \mathbf{m}, \mathbf{r}, \mathbf{k}, \mathbf{p})$ then $P(\mathbf{x} \mid \tilde{x}, J, \mathbf{j}, \mathbf{m}, \mathbf{r}, \mathbf{k}, \mathbf{p})$. The first of these is a discrete distribution which is trivial to sample from, and the second reduces to a truncated Gamma distribution. To sample from the latter we divide into two cases: if the shape parameter is $\geq 1$ the distribution is log-concave and we can use adaptive rejection sampling [3]; otherwise we use Metropolis-Hastings [1] with either an exponential or a Gamma proposal distribution, depending on which is estimated to be likely to be quicker given the other parameters.
6. $P(J \mid \tilde{x}, \mathbf{x}, \mathbf{j}, \mathbf{m}, \mathbf{r}, \mathbf{k}, \mathbf{p})$ (where only values of $j$ unused in $\mathbf{j}$ are allowed to be removed) followed, if $J$ has increased, by sampling the new elements of $\mathbf{m}, \mathbf{r}, \mathbf{k}, \mathbf{p}$ from the prior distributions on these variables. Resampling of $J$ uses a discrete conditional distribution, and is done using a proposal to either increase or decrease $J$ by 1, and applying the appropriate Hastings ratio [1] to reject the proposal in such a way as to achieve detailed balance.
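Of the steps above, the update for $p$ is the simplest to illustrate. The sketch below assumes, following the single-mode model, that $p_j$ enters the likelihood only through whether each patient's latent death time for mode $j$ is finite, so that a Beta prior gives a Beta conditional with the counts added to its parameters. The function name and the toy latent times are ours; this is not the authors' implementation.

```python
import math
import random

def resample_p(latent_times, alpha_p, beta_p, rng):
    """Gibbs update for one mode's p given that mode's latent death times.

    ASSUMPTION: the conditional is Beta(alpha_p + #finite, beta_p + #infinite).
    """
    n_finite = sum(1 for x in latent_times if math.isfinite(x))
    n_infinite = len(latent_times) - n_finite
    return rng.betavariate(alpha_p + n_finite, beta_p + n_infinite)

rng = random.Random(0)
# Hypothetical latent times for six patients under one mode of death (days).
latent = [3.0, math.inf, 12.5, math.inf, math.inf, 40.0]
draws = [resample_p(latent, 1.0, 1.0, rng) for _ in range(5000)]

# Three finite and three infinite times with a flat prior give Beta(4, 4),
# whose mean is 0.5.
print(round(sum(draws) / len(draws), 3))
```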
10,000 samples were drawn from the posterior for each subset of the data considered (e.g. for Indonesian TT patients). The first 1,500 samples were discarded and the remainder kept for analysis. To check that the software was correct we undertook two types of check:
1. The inference code was reviewed by somebody (RFS) different from its author (JC) looking for bugs, and those found were removed after RFS and JC had conferred to reach agreement on them.
2. Multiple sets of synthetic data were generated (for which the true values of x, J, j, m, r, k, p were therefore known) and the posterior distributions were compared with the true values (as for example in Figure 1 above).

Convergence checks
In addition we checked for convergence of the Markov chains by starting two runs from different random initial values of x, J, j, m, r, k, p. The two corresponding sets of output samples were then compared as if they had come from two different sets of patients, giving plots analogous to Figures 4 and 6 above, such as those shown in Figures 16 and 17 below, which show that the two distributions are sufficiently close as to be in practice indistinguishable.

Example of inference
Because, in the Results section of this paper, one particular specific example of Bayesian inference occurs whose interpretation is slightly tricky, it seems appropriate to discuss it specifically here. This corresponds precisely to the comparison of TT and non-TT genotypes in Grade 1 Indonesia patients.
We refer to Figure 18. The situation before knowledge of the data is described by the prior mean survival probability curve in solid magenta and its 2.5% and 97.5% centiles in dot-dash magenta, constructing the 95% prior confidence interval. (You may think, in the light of the data, that this prior is too pessimistic, but the point is that this is what was thought before knowledge of the data.) Note that the only parts of the plot outside this confidence interval are a small piece at the bottom left and an extremely thin sliver along the right-hand part of the top edge.

Figure 16: Samples captured from two runs on the same data started from different random values of the parameters, illustrating that the resulting distributions are essentially identical.

Figure 17: Comparison probabilities (analogous to Figure 6) for survival probability against time from two runs on the same data (and the same priors) started from different random values of the parameters. If the two distributions are identical (as they should be up to uncertainty caused by the non-infinite number of samples drawn during the MCMC runs), then at each time the probability that the "red" distribution is greater than the "green" (see Figure 16) should be 0.5. Thus this plot, together with Figure 16, shows that the two distributions are essentially identical, and that the runs have converged to a common distribution.
We now collect the TT patients' data: there is, however, only 1 TT patient, who survives until 1 year before being censored. A single patient, however, has only a small effect on the prior (just as a single head-toss would not convince you a coin was biased): this shifts the posterior for the TT group upwards to the red lines, mean (solid) and centiles (dot-dash).
On the other hand when we collect the non-TT patients' data, there are 33 of them, so they have a bigger effect, both raising the mean and narrowing the 95% posterior confidence interval to the corresponding green plots. Even though these 33 patients survive less well than the single TT patient, they lift the posterior mean more than does the single TT patient, but the green 95% posterior confidence interval is much narrower than the red one.
Finally, analogous to Figures 4 and 6, we can calculate the probability that the TT population survives better than the non-TT population at each time point, getting Figure 19: the conclusion is that it has become very slightly less probable that TT survives better than non-TT than it was before (before it was 0.5 precisely), but in essence the posterior probability that TT survives better than non-TT remains not far off 0.5 throughout the time-course.
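The logic of this example can be mimicked with a deliberately simplified toy: instead of the full lifetime model, put a Beta prior on the survival probability at one fixed time and update it with survival counts. The flat Beta(1, 1) prior and the 24/9 split of the 33 non-TT patients are hypothetical numbers chosen for illustration, so the output does not reproduce the values behind Figures 18 and 19.

```python
import random

def posterior_params(a, b, survivors, deaths):
    """Beta(a, b) prior updated with survival counts at one fixed time point."""
    return a + survivors, b + deaths

a, b = 1.0, 1.0                                            # hypothetical flat prior
tt = posterior_params(a, b, survivors=1, deaths=0)         # the single TT patient
non_tt = posterior_params(a, b, survivors=24, deaths=9)    # hypothetical split of 33

mean_tt = tt[0] / sum(tt)                # prior mean 0.5 rises to 2/3
mean_non_tt = non_tt[0] / sum(non_tt)    # prior mean 0.5 rises to 25/35

# Monte Carlo estimate of P(TT survival probability > non-TT survival probability).
rng = random.Random(1)
n_draws = 50000
wins = sum(rng.betavariate(*tt) > rng.betavariate(*non_tt) for _ in range(n_draws))
prob_tt_better = wins / n_draws

print(round(mean_tt, 3), round(mean_non_tt, 3), round(prob_tt_better, 3))
```

In this toy the 33 non-TT patients raise their group's posterior mean above that of the lone TT survivor even though their observed survival fraction is lower, and the comparison probability ends up a little below 0.5, which is the qualitative pattern described above.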
Of course, in most examples in the paper, there are more patients in both groups being compared, and we are more likely to get a more definite conclusion.

Sensitivity to choice of priors
Specifically for comparisons of TT and non-TT subsets, where the subsets consist of very different numbers of patients, there is particular scope for otherwise unexpected sensitivity to the choice of uninformative priors. To assess this we initially checked, for one such comparison, the effect of using a different prior, namely $a_J = 0.8$, $a_p = b_p = 1$

Figure 19: Example of comparison probabilities for inference whose interpretation is explained in detail in section 4 of this document. See also Figure 18.

As can be seen by comparing this with Figure 12, this alternative prior does not envisage nearly as many early deaths in the first few days as the chosen prior, but equally believes it to be more unlikely that survival would be near 100% at much later times.
The effect of this on a comparison of a small subset and a large subset would be expected to be to shift the posterior on the small subset upwards in the early period and downwards in the late period compared with the large subset, increasing the significance of the early comparison if the small subset survived better than the large subset at that time, and reducing it if in the other direction (and vice versa at late times).
In the case of the comparison shown in Figure 6 above, switching to the alternative prior gives Figure 21, indeed confirming this expectation: the peak significance, at around 10 days, increases from 0.994 to 0.996, while at 1 year the comparison probability reduces from 0.893 to 0.886.