Neutrophil counts and cancer prognosis: an umbrella review of systematic reviews and meta-analyses of observational studies

Meghan A. Cupp; Margarita Cariolou; Ioanna Tzoulaki; Evangelou Evangelos; Antonio J. Berlanga-Taylor

doi:10.1101/330076

ABSTRACT

OBJECTIVE To evaluate the strength and validity of evidence on the association between the neutrophil to lymphocyte ratio (NLR) or tumour-associated neutrophils (TAN) and cancer prognosis.

DESIGN Umbrella review of systematic reviews and meta-analyses of observational studies.

DATA SOURCES Medline, EMBASE, and Cochrane Database of Systematic Reviews.

ELIGIBILITY CRITERIA Systematic reviews or meta-analyses of observational studies evaluating the association between NLR or TAN and specific cancer outcomes related to disease progression or survival.

DATA SYNTHESIS The available evidence was graded as strong, highly suggestive, suggestive, or weak through the application of pre-set grading criteria. For each included meta-analysis, the grading criteria considered the significance of the random effects estimate, the significance of the largest included study, the number of studies and individuals included, the heterogeneity between included studies, the 95% prediction intervals, presence of small study effects, excess significance and credibility ceilings.

RESULTS 239 meta-analyses investigating the association between NLR or TAN and cancer outcomes were identified from 57 published studies meeting the eligibility criteria, with 81 meta-analyses from 36 studies meeting the criteria for inclusion. No meta-analyses found a hazard ratio (HR) in the opposite direction of effect (HR<1). When assessed for significance and bias related to heterogeneity and small study effects, only three (4%) associations between NLR and outcomes in gastrointestinal and nasopharyngeal cancers were supported by strong evidence.

CONCLUSION Despite many publications exploring the association between NLR and cancer prognosis, the evidence is limited by significant heterogeneity and small study effects. There is a lack of evidence on the association between TAN and cancer prognosis, with all nine associations identified arising from the same study. Further research is required to provide strong evidence for associations between both TAN and NLR and poor cancer prognosis.

REGISTRATION This umbrella review is registered on PROSPERO (CRD42017069131)

FUNDING Medical Research Council

COPYRIGHT Open access article under terms of CC BY

SHORT TITLE Neutrophils and cancer prognosis: an umbrella review

KEY RESULT When assessed for significance and bias related to heterogeneity and small study effects, only three (4%) associations between NLR and overall survival and progression-free survival in gastrointestinal and nasopharyngeal cancers were supported by strong evidence.

WHAT IS ALREADY KNOWN ON THE TOPIC

Neutrophil counts have been linked to the progression of cancer due to their tumourigenic role in the cancer microenvironment.
Numerous meta-analyses and individual studies have explored the association between neutrophil counts and cancer outcomes for a variety of cancer sites, leading to a large body of evidence with variable strength and validity.
Uncertainty exists around the association between neutrophils and cancer outcomes, depending on the site, outcome and treatments considered.

WHAT THIS STUDY ADDS

All meta-analyses included in this review indicated an association between high neutrophil counts and poor cancer prognosis.
There is strong evidence supporting the association between the neutrophil to lymphocyte ratio and poor cancer prognosis in some respiratory and gastrointestinal cancers.
Further research is required to strengthen the existing body of evidence, particularly for the association between tumour-associated neutrophils and cancer outcomes.

INTRODUCTION

Cancer is the second leading cause of death worldwide(1), contributing to over 8.7 million deaths globally(2). Cancer incidence is increasing(2), due in part to the epidemiological transition of increasing mortality and morbidity from chronic diseases(3). This increase highlights the importance of identifying prognostic indicators associated with cancer progression(4). C-reactive protein (CRP)(5), serum albumin(6), fibrinogen and differential leukocyte counts(7–9) are all indicators of inflammation that have been linked to cancer prognosis. In recent years, the role of neutrophils in the tumour microenvironment has been explored due to their paradoxical role in both the prevention and facilitation of tumour progression(10).

Neutrophils are the most abundant white blood cells (WBCs)(11), making up 50-70% of the body’s circulating leukocytes(12). Neutrophil counts, particularly the neutrophil to lymphocyte ratio (NLR), have emerged as indicators of cancer prognosis and several systematic reviews and meta-analyses have explored their potential as a prognostic indicator in cancer(13). The NLR was first recognised for its association with systemic inflammation in the critically ill(14) and meta-analyses on the association between elevated NLR and poor prognosis have reported a wide range of effect sizes depending on the site of cancer(13). It is currently unclear how the association between NLR and poor prognosis varies depending on the site of cancer or the treatment considered. The close association between inflammation and cancer progression indicates that elevated tumour-associated neutrophils (TAN), also known as neutrophils which infiltrate tumours(15), are a potential prognostic indicator(10,16,17). Many systematic reviews and meta-analyses explore the association between neutrophils and cancer prognosis. However, the myriad of different cancer sites, stages, treatments and survival outcomes measured complicates the interpretation of this body of evidence.

Umbrella reviews allow for the analysis of broad subject areas to examine the strength and credibility of associations using the results of published systematic reviews and meta-analyses(18,19). Through umbrella review methods, the strength and consistency of the literature are assessed to evaluate bias and identify which associations are supported by strong evidence(18). Here we carried out an umbrella review of systematic reviews and meta-analyses with the aim of comprehensively evaluating the validity and strength of reported associations between NLR or TAN and cancer prognosis and identifying potential biases in the relevant literature.

METHODS

Literature search

Searches were conducted in Medline, Embase and the Cochrane Database (Appendix 1) and aimed to include all systematic reviews and meta-analyses published in English from inception up to 23 June 2017. Indicators of neutrophil counts included NLR and TAN (intratumoural neutrophils, peritumoural neutrophils and stromal neutrophils). Overall survival (OS), cancer-specific survival (CSS), progression-free survival (PFS), disease-free survival (DFS) and recurrence-free survival (RFS) were considered as cancer outcomes. Articles were initially screened by title and abstract to determine eligibility for full text screening and inclusion using the RefWorks web-based bibliography and database manager(20).

Inclusion and exclusion criteria

Included studies were systematic reviews and meta-analyses of individual observational studies in humans with any cancer diagnosis and NLR or TAN measurements taken around the time of diagnosis. Systematic reviews which did not include a meta-analysis were excluded. Meta-analyses were excluded if they did not assess a cancer outcome in our inclusion criteria, included more than one outcome in a single analysis, or either did not specify the cancer site studied or included all sites in a single analysis (e.g. analyses of cancers grouped as “other” were excluded). Meta-analyses were also excluded if they did not provide sufficient detail for replication, such as the individual hazard ratio (HR), 95% confidence interval and total sample size of each included study. If a single study included multiple meta-analyses, all meta-analyses were individually assessed for eligibility.

When more than one meta-analysis was identified for a single association at a specific site, the duplicates were assessed for concordance in the direction, magnitude and significance of their effect estimates. If the duplicate meta-analyses agreed in significance, magnitude and direction of effect, the meta-analysis with the greatest number of studies was included. If the duplicates disagreed in any respect, all meta-analyses for that association were excluded.

Data extraction

Data extraction forms were generated to record information from each meta-analysis and the included individual studies. From each meta-analysis the study’s first author, year of publication, outcome measure, indicator and cancer diagnosis were extracted. For each included individual study, the first author, year of publication, total population, epidemiological design, HR and 95% confidence interval were extracted. Each meta-analysis was allocated to one of six categories according to cancer site as follows: all cancers, gastrointestinal, gynaecological, hepatocellular, respiratory and urinary cancers.

Data analysis

Estimation of summary effects

The weighted inverse variance method was used to reproduce all included meta-analyses in R(21) with the “meta” package(22) and “metagen” command. For each cancer site specific indicator and outcome pair, the summary effect size and 95% confidence interval were calculated using fixed and random effects methods. The random effects model was used to compute summary effect size estimates taking into account the observed heterogeneity, since cancer is a highly heterogeneous disease(23,24). Estimates from the fixed effects model are also presented.
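
The pooling step described above can be illustrated with a minimal sketch. It is written in Python rather than the R “meta” package used in this review, and it assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify. The function names and the example studies are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_hr_and_se(hr, ci_lower, ci_upper):
    """Convert a reported HR and its 95% CI to a log HR and standard error."""
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_975)
    return math.log(hr), se


def pool_inverse_variance(log_hrs, ses):
    """Inverse variance pooling with fixed and random effects.

    The between-study variance tau2 uses the DerSimonian-Laird estimator,
    which is an assumption of this sketch.
    """
    k = len(log_hrs)
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum_w
    # Cochran's Q around the fixed effects estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effects weights add tau2 to each study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, log_hrs)) / sum_w_re
    return {
        "fixed_hr": math.exp(fixed),
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random_hr": math.exp(random),
        "random_se": math.sqrt(1.0 / sum_w_re),
        "tau2": tau2,
    }


# Hypothetical studies as (HR, lower, upper); not taken from this review.
studies = [(1.8, 1.2, 2.7), (1.4, 1.0, 1.96), (2.2, 1.1, 4.4)]
pairs = [log_hr_and_se(*s) for s in studies]
result = pool_inverse_variance([p[0] for p in pairs], [p[1] for p in pairs])
```

When the estimated between-study variance is zero, the random effects weights reduce to the fixed effects weights and the two summary estimates coincide.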

Assessment of reproducibility

Each included meta-analysis was reproduced to yield both fixed and random effects estimates. Reproduced random or fixed effect estimates which did not match the results reported in the original study were assessed for absolute and percent difference. Meta-analyses with a difference in HR of only 0.01 were attributed to rounding errors. Studies with larger discrepancies were investigated to determine the source of disagreement. Where there were issues with reproducibility, the calculated values of the random effects model were used to assess the evidence for the association.
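
A minimal sketch of this comparison is given below. The function name and labels are hypothetical, and expressing the percent difference relative to the reported HR is an assumption.

```python
def compare_reproduced(reported_hr, reproduced_hr, rounding_tol=0.01):
    """Classify the gap between a reported and a reproduced HR.

    Returns the absolute difference, the percent difference relative to
    the reported HR (an assumption), and a label.
    """
    abs_diff = abs(reproduced_hr - reported_hr)
    pct_diff = 100.0 * abs_diff / reported_hr
    # Round before comparing so floating point noise does not push a
    # difference of exactly 0.01 over the tolerance.
    rounded = round(abs_diff, 6)
    if rounded == 0:
        label = "match"
    elif rounded <= rounding_tol:
        label = "rounding"
    else:
        label = "investigate"
    return abs_diff, pct_diff, label
```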

Assessment of heterogeneity

Heterogeneity within each meta-analysis was assessed with Cochran’s Q test and quantified using the I² statistic(25). Cochran’s Q test detects a departure from homogeneity in the effect sizes of individual studies when p<0.10(25). Because of common limitations associated with Cochran’s Q test, the I² statistic was also used to quantify the percentage of variation which can be attributed to heterogeneity(25). Values exceeding 50% or 75% are considered to show large or very large heterogeneity respectively. The 95% confidence intervals around each I² value were also included to evaluate the uncertainty around estimates of heterogeneity(26). Large measures of heterogeneity, representing true heterogeneity or inconsistency due to bias(27), were further assessed through prediction intervals and Egger’s test for funnel plot asymmetry.
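
As an illustration, the Q statistic and I² can be computed from study-level log HRs and standard errors as sketched below. The function name is hypothetical. The p-value of Q, which comes from a chi-square distribution with k-1 degrees of freedom, is omitted to keep the sketch free of external dependencies.

```python
def q_and_i2(log_hrs, ses):
    """Return Cochran's Q, its degrees of freedom, and I² as a percentage."""
    k = len(log_hrs)
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    # Q is the weighted sum of squared deviations from the fixed effects mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    df = k - 1
    # I² is the share of Q in excess of its expectation, truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```

With two studies of equal precision and log HRs of 0 and 2 (standard error 0.5 each), Q is 8 on one degree of freedom and I² is 87.5%, which falls in the very large category described above.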

Estimation of prediction intervals

In order to assess the impact of heterogeneity, 95% prediction intervals were calculated for the summary random effect estimates(28). Prediction intervals account for the uncertainty caused by heterogeneity when estimating the distribution of true effect sizes in an association and yield an interval which predicts the effect size of future studies investigating the same association(28). In studies with large amounts of heterogeneity the prediction interval may be wide enough to include the null value (HR=1), suggesting that the range of true effect sizes may include the null value.
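
A minimal sketch of a 95% prediction interval follows. It assumes the commonly used form based on a t distribution with k-2 degrees of freedom, where k is the number of studies, so at least three studies are needed. The critical value is supplied by the caller (for example 2.776 for k=6) so that the sketch needs no external library. Function names are hypothetical.

```python
import math


def prediction_interval(pooled_log_hr, se_pooled, tau2, t_crit):
    """95% prediction interval on the HR scale.

    pooled_log_hr and se_pooled come from the random effects model, tau2 is
    the between-study variance, and t_crit is the 97.5th percentile of a
    t distribution with k-2 degrees of freedom (passed in by the caller).
    """
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    lower = math.exp(pooled_log_hr - half_width)
    upper = math.exp(pooled_log_hr + half_width)
    return lower, upper


def includes_null(lower, upper):
    """True when the interval contains the null value HR=1."""
    return lower <= 1.0 <= upper
```

For a fixed summary estimate, a larger between-study variance widens the interval, which is why heterogeneous meta-analyses more often have prediction intervals that include HR=1.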

Assessment of small study effects

Small study effects and funnel plot asymmetry were quantified through Egger’s test using the command “metabias” from the R package “meta”(22) to determine if heterogeneity occurred due to chance(29). The presence of small study effects was confirmed by a low significance value in Egger’s test (p<0.10) indicating bias or true heterogeneity(30). Since Egger’s test is underpowered in meta-analyses including fewer than ten individual studies(31), further assessment was carried out in these meta-analyses to determine if the summary effect size estimate was greater than the point estimate of the largest included study, indicating potential small study effects(32).
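
The two checks described above can be sketched as follows. The first function uses the classic formulation of Egger’s regression (standardised effect regressed on precision) and is not the “metabias” implementation. It returns the intercept and its standard error, leaving the p-value to a t distribution with k-2 degrees of freedom. The second function is the comparison with the largest study. All names are hypothetical.

```python
import math


def egger_regression(log_hrs, ses):
    """Ordinary least squares of y/se on 1/se; the intercept measures asymmetry."""
    k = len(log_hrs)
    if k < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / se for se in ses]                    # precision
    z = [y / se for y, se in zip(log_hrs, ses)]     # standardised effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = resid_ss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    return {"intercept": intercept, "se_intercept": se_intercept,
            "slope": slope, "df": k - 2}


def summary_exceeds_largest(random_hr, hrs, sample_sizes):
    """True when the summary HR is greater than the HR of the largest study."""
    largest = max(range(len(hrs)), key=lambda i: sample_sizes[i])
    return random_hr > hrs[largest]
```

When every study reports the same effect, the standardised effects lie on a line through the origin and the intercept is zero. An intercept far from zero indicates that less precise studies report systematically different effects.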

Evaluation of excess significance

The test for excess significance (TES)(33) was used to determine if the number of observed significant results differed significantly from the expected number of significant results. TES results can reveal reporting bias if the number of observed studies with significant results in each meta-analysis is significantly larger than the expected number using a two-tailed binomial probability test (p<0.10)(34). The expected number of significant results in each meta-analysis was calculated as the sum of the statistical power estimates of its individual studies, where power is the probability that a study will find a significant result(33,34). The estimated power for each individual study was calculated in Stata 14(35), using the “power cox” command to calculate the power of each test given its sample size, effect size and significance level(36). The estimation of power for each individual study also requires an estimation of the true effect size, so the effect size of the largest study was used to give the most conservative estimation of true effect. Summary estimates from both fixed and random effects models were also used as the true effect size for sensitivity analysis. The “binom.test” command in R was used to assess the significance of differences in the number of observed versus expected significant studies through an exact binomial test(37).
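
The logic of the test can be sketched as below. Two simplifications are assumptions of this sketch and differ from the procedure above: power is approximated from each study’s standard error with a two-sided Wald test instead of Stata’s “power cox”, and the binomial success probability is taken as the mean power across studies. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def wald_power(true_log_hr, se, alpha=0.05):
    """Approximate power of a two-sided Wald test (not Stata's power cox)."""
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(true_log_hr) / se
    return _NORMAL.cdf(shift - z) + _NORMAL.cdf(-shift - z)


def binom_two_sided(x, n, p):
    """Exact two-sided binomial p-value: sum of outcomes no more likely than x."""
    pmf = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[x] * (1 + 1e-7)  # small tolerance for floating point ties
    return min(1.0, sum(v for v in pmf if v <= cutoff))


def excess_significance(log_hrs, ses, true_log_hr, alpha=0.05):
    """Return observed and expected significant studies, p-value, and a flag."""
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    n = len(log_hrs)
    observed = sum(1 for y, se in zip(log_hrs, ses) if abs(y / se) > z)
    expected = sum(wald_power(true_log_hr, se, alpha) for se in ses)
    p_value = binom_two_sided(observed, n, expected / n)
    flagged = observed > expected and p_value < 0.10
    return observed, expected, p_value, flagged
```

Here `true_log_hr` would be the log HR of the largest study in the main analysis, or the fixed or random effects summary estimate in the sensitivity analyses.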

Credibility ceilings

Credibility ceilings were utilised for sensitivity analysis and to test the methodological limitations of using observational studies to calculate combined effect estimates(38,39). Credibility ceiling calculations inflate the variance of each study included in a meta-analysis to account for the probability c that the true effect size is in the opposite direction of effect of the observed point estimate(39). Inflated variances were calculated in Stata 14(35,38). The summary effect size and heterogeneity of each meta-analysis were assessed with ceiling values ranging from 5 to 20%.
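
One reading of this variance inflation is sketched below, as an assumption and not a reproduction of the Stata code used in this review. If the observed estimate implies a probability smaller than c that the true effect lies in the opposite direction, the standard error is enlarged until that probability equals c. Otherwise the standard error is left unchanged. The function name is hypothetical.

```python
from statistics import NormalDist

_NORMAL = NormalDist()


def ceiling_se(log_hr, se, c):
    """Return the standard error after applying a credibility ceiling c.

    Assumes a normal approximation on the log HR scale and 0 < c < 0.5.
    """
    if not 0.0 < c < 0.5:
        raise ValueError("c must lie strictly between 0 and 0.5")
    # Probability that the true effect is in the opposite direction
    prob_opposite = _NORMAL.cdf(-abs(log_hr) / se)
    if prob_opposite >= c:
        return se  # the study is already no more certain than the ceiling allows
    # Inflate so that cdf(-|log_hr| / new_se) equals c
    return abs(log_hr) / _NORMAL.inv_cdf(1 - c)
```

The inflated standard errors would then be pooled again with the random effects model to see whether the summary estimate remains significant at each ceiling.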

Grading the evidence

Associations between neutrophil counts and cancer prognosis were categorised into strong, highly suggestive, suggestive, or weak through assessment of the strength and validity of the evidence for each meta-analysis, according to pre-defined criteria outlined in Supplementary Figure 1(40,41). In order for an association to be considered strong, the meta-analysis must yield a p-value of less than 10⁻⁶ in the random effects model(42), include more than 1,000 individuals, show significance at p<0.05 in the largest included study, find no heterogeneity (p>0.10) through the Q test, detect less than 50% variance due to heterogeneity through the I² statistic, yield a prediction interval excluding the null value (HR=1), display no evidence of small study effects or excess significance, and the association must maintain significance at p<0.05 with the application of a credibility ceiling of 10%. The number of studies in each meta-analysis was also included as an eligibility criterion for strong evidence since a sample size greater than three is required for reliable assessment of heterogeneity and small study effects(25,31,43).
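
The criteria for strong evidence listed above can be expressed as a single check, sketched below. Only the strong category is covered because the remaining categories are defined in Supplementary Figure 1. The class and field names are hypothetical, and the requirement of more than three studies follows the wording of the text.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysisSummary:
    p_random: float            # p-value of the random effects estimate
    n_individuals: int         # total sample size
    n_studies: int             # number of included studies
    p_largest_study: float     # p-value of the largest included study
    p_q_test: float            # p-value of Cochran's Q test
    i2_percent: float          # I² statistic
    pi_lower: float            # 95% prediction interval, HR scale
    pi_upper: float
    small_study_effects: bool
    excess_significance: bool
    p_ceiling_10: float        # p-value under a 10% credibility ceiling


def is_strong_evidence(m: MetaAnalysisSummary) -> bool:
    """Apply every criterion for strong evidence named in the text."""
    pi_excludes_null = m.pi_lower > 1.0 or m.pi_upper < 1.0
    return (
        m.p_random < 1e-6
        and m.n_individuals > 1000
        and m.n_studies > 3
        and m.p_largest_study < 0.05
        and m.p_q_test > 0.10
        and m.i2_percent < 50.0
        and pi_excludes_null
        and not m.small_study_effects
        and not m.excess_significance
        and m.p_ceiling_10 < 0.05
    )
```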

Quality assessment

Studies with meta-analyses categorised as providing either highly suggestive or strong evidence underwent quality assessment through AMSTAR 2, a tool for assessing the methodological quality of systematic reviews(44). Studies were assessed by two reviewers (MAC and MC) and consensus was reached on any disagreements in quality. Statistical analyses were carried out in R(21), including the packages “meta”(22) version 4.8-4 and “ggplot2”(45) version 2.2.1, and Stata 14(35).

Patient involvement

No patients were involved in development of the study design nor were they asked to advise on interpretation. No ethical approval was required for this review since it relied entirely on anonymised, published data.

RESULTS

Characteristics of included meta-analyses

The 57 published articles meeting the criteria for inclusion contained 239 meta-analyses (Appendix 2). The 81 meta-analyses meeting the eligibility criteria arose from 36 of these articles, published between 2014 and 2017 (Figure 1)(46–81). These meta-analyses included individual studies which presented NLR or TAN categorically as either high or low and investigated 40 associations for 27 different cancer diagnoses, including nine subtypes for treatment and four for cancer stage (Supplementary Table 1). The meta-analyses were grouped as all cancers (n=8), gynaecological (n=6), gastrointestinal (n=24), hepatocellular (n=11), respiratory (n=10) and urinary cancers (n=22) (Figures 2A and 2B). Included meta-analyses summarised effect size estimates from 693 individual studies, with OS as the most frequently assessed outcome (n=41). In 51 meta-analyses (63%) total sample size exceeded 1,000 individuals, and the median number of studies per meta-analysis was five. However, 57 meta-analyses (70%) included fewer than ten studies and 17 (21%) included only two studies. The characteristics of included meta-analyses are summarised in Supplementary Table 2.

Figure 1 Flowchart of study and meta-analysis selection.

Figure 2 Assessment of consistency in meta-analyses.

A, B - Box plots of random effects HR estimates for each meta-analysis by cancer site and outcome. The Y-axis labelled “HR” details the effect size for each meta-analysis describing an association between NLR or TAN and cancer prognosis for each site grouping. The X-axis labelled “Site” in Figure 2 A represents each site group meta-analyses have been sorted into. The multiple subgroup contains all cancers, defined as a grouping of cancer diagnosis unrelated to site, stage or treatment. The X-axis labelled “Outcome” in Figure 2 B represents the prognostic outcome assessed in each meta-analysis. The outlier of HR=14 for NLR and OS in rectal cancer has been excluded from these figures. C - Log(HR) of largest study versus log(HR) of random effects estimates for each meta-analysis. The Y-axis labelled “log(HR) Largest Study” represents the log of the HR of the largest study included in each analysis. The X-axis labelled “log(HR) Random Effects” represents the log of the HR of the random effects estimate calculated in each meta-analysis. D - Random effects estimates versus inverse variance. The Y-axis labelled “Random Effects Estimates Hazard Ratio” represents the HR of random effects estimate for each meta-analysis. The X-axis labelled “Inverse Variance” represents the inverse of the variance for each meta-analysis.

A total of 74 duplicate meta-analyses were excluded. Nineteen meta-analyses assessing six associations were excluded due to disagreement in significance between duplicates. A further 55 duplicate meta-analyses that agreed in significance, magnitude and direction of effect were excluded for 31 associations and only the meta-analysis with the largest number of studies was included for each association (Appendix 3).

Summary effect size

All estimated summary effect sizes for both fixed and random effects estimates are shown in Supplementary Figures 2-82.

Using a threshold of p<0.05 for statistical significance, 72 of the 81 meta-analyses (89%) were significant with random effects. At a more stringent threshold of p<10⁻⁶, the number of statistically significant meta-analyses for random effects dropped to 38 (47%) (Supplementary Table 2). The 38 meta-analyses with significance at p<10⁻⁶ assessed both NLR and intratumoural neutrophils as indicators of poor prognosis in 19 cancer sites. Thirty-five of these 38 meta-analyses assessed NLR as an indicator of poor prognosis in gynaecologic, gastrointestinal, hepatocellular, respiratory, urinary, and all cancers. Intratumoural neutrophils were assessed as an indicator of poor prognosis in three of the 38 meta-analyses (8%), including urinary and all cancers.

In 20 meta-analyses (25%), the largest study included was not statistically significant at p<0.05. However, 16 of these meta-analyses still had a statistically significant summary random effects estimate. In two meta-analyses, the largest study included had an effect size in the opposite direction to the random effects estimate (HR<1). The largest study effect sizes tended to be more conservative estimates of effect than the random effect estimates, with 58 meta-analyses (72%) yielding an HR which was greater than the point estimate of the largest included study. However, there was a correlation between the log(HR) of the summary random effects and the largest study for each meta-analysis, indicating consistency in the results (Figure 2C).

In order to determine the impact of study size on the magnitude of the summary effect size, random effects estimates were plotted against inverse variance for each meta-analysis. When compared to meta-analyses with large variances, those with smaller variances produced more conservative estimates, displaying a smaller range of HR estimates and a slight tendency toward the null value (HR=1). Meta-analyses with large variance displayed a wide range of random effects HR and included an increased number of HR estimates greater than two (Figure 2D).

Reproducibility

In 21 of the 81 included meta-analyses, the HR was reproduced imperfectly. Nine of the 21 imperfectly reproduced meta-analyses were within 0.01 of the reported HR, and the differences were attributed to rounding errors. The remaining 12 meta-analyses were within 10% of the reported HR, with six meta-analyses (50%) reporting an HR with less than a 2% difference from the calculated HR (Appendix 4).

Heterogeneity between studies

Cochran’s Q test was significant at p<0.10 in 40 of the 81 included meta-analyses (50%). In 20 meta-analyses (25%), the I² statistic was greater than 75%, indicating a high level of variability between studies due to heterogeneity. An additional 19 meta-analyses yielded I² values between 50% and 75% (Appendix 5). In 30 meta-analyses (37%), the I² statistic was less than 50%, but only six (9%) yielded an I² statistic with a 95% CI that did not include 50%. The confidence interval of the I² statistic was not used as a criterion for grading evidence, since large fluctuations in I² occur when meta-analyses include fewer than 15 studies(43).

Prediction intervals were not calculated for 17 (21%) meta-analyses which had included only two individual studies. The prediction intervals of 46 meta-analyses (57%) included the null value of HR=1. Of 64 meta-analyses (79%) including at least three individual studies, 18 had prediction intervals which excluded the null value (HR=1). The 12 meta-analyses (15%) including exactly three individual studies yielded very wide prediction intervals, all of which included the null value of HR=1.

Small study effects

Sixty-five (80%) of the 81 included meta-analyses were judged to have evidence of small study effects (Appendix 5). Sixty-four meta-analyses included three or more studies and were eligible for further assessment through Egger’s test for asymmetry(22). Thirty-seven (58%) of these 64 meta-analyses yielded significant p-values (p<0.10). However, only 24 (30%) meta-analyses included ten or more individual studies, the cut-off required to give Egger’s test sufficient power(31,32). Twenty-one (88%) of the 24 meta-analyses including ten or more individual studies yielded a significant Egger’s test, indicating funnel plot asymmetry(30).

Forty meta-analyses analysed between three and nine studies and Egger’s test was significant (p<0.1) in 16 of these (40%). In 11 of the remaining 24 meta-analyses (46%), the summary effects estimate from the random effects model was larger than the point estimate of the largest study and they were considered to have evidence of small study effects.

Excess significance

Seventeen meta-analyses (21%) showed evidence of excess significance bias according to the TES when the effect size of the largest included study was utilised as an estimate of true effect size (Appendix 6). When the fixed summary effect sizes were utilised as an estimation of true effect size, fifteen meta-analyses (16%) showed evidence of excess significance. No meta-analyses showed evidence of excess significance when the random summary effect sizes were used.