Dissecting tumor cell programs through group biology estimation in clinical single-cell transcriptomics

Given the growing number of clinically integrated cancer single-cell transcriptomic studies, robust differential enrichment methods for gene signatures to dissect tumor cellular states for discovery and translation are critical. Current analysis strategies neither adequately represent the hierarchical structure of clinical single-cell transcriptomic datasets nor account for the variability in the number of recovered cells per sample, leading to results potentially confounded by sample-driven biology with high false positives instead of accurately representing true differential enrichment of group-level biology (e.g., treatment responders vs. non-responders). This problem is especially prominent for single-cell analyses of the tumor compartment, because high intra-patient similarity (as opposed to inter-patient similarity) results in stricter hierarchical structured data that confounds enrichment analysis. Furthermore, to identify signatures which are truly representative of the entire group, there is a need to quantify the robustness of otherwise statistically significant signatures to sample exclusion. Here, we present a new nonparametric statistical method, BEANIE, to account for these issues, and demonstrate its utility in two cancer cohorts stratified by clinical groups to reduce biological hypotheses and guide translational investigations. Using BEANIE, we show how the consideration of sample-specific versus group biology greatly decreases the false positive rate and guides identification of robust signatures that can also be corroborated across different cell type compartments.


Abstract 23
Given the growing number of clinically integrated cancer single-cell transcriptomic studies, robust 24 differential enrichment methods for gene signatures to dissect tumor cellular states for discovery 25 and translation are critical. Current analysis strategies neither adequately represent the 26 hierarchical structure of clinical single-cell transcriptomic datasets nor account for the variability 27 in the number of recovered cells per sample, leading to results potentially confounded by sample-28 driven biology with high false positives instead of accurately representing true differential 29 enrichment of group-level biology (e.g., treatment responders vs. non-responders). This problem 30 is especially prominent for single-cell analyses of the tumor compartment, because high intra-31 patient similarity (as opposed to inter-patient similarity) results in stricter hierarchical structured 32 data that confounds enrichment analysis. Furthermore, to identify signatures which are truly 33 representative of the entire group, there is a need to quantify the robustness of otherwise 34 developed a nonparametric statistical group biology estimation method (group Biology EstimAtion 72 iN sIngle cEll, "BEANIE") inspired from He et al., 9 addressing the above-mentioned issues (Fig.  73 1, see Methods). This method first estimates the statistical significance (empirical p-value) of the 74 test signatures through a Monte Carlo approximation of the test signatures' p-value distribution 75 (test distribution) and that of the random signatures' p-value distribution (background distribution), 76 followed by contextualisation of the former with respect to the latter. It then uses the leave-one-77 out cross-validation approach (sample exclusion) to infer robustness of the gene signatures (see 78 Methods). We used publicly available datasets to demonstrate the utility of this method, and 79 present suggested guidelines for the design of clinically embedded single-cell transcriptomic 80 studies in oncology. 81 82

83
We evaluated single-cell transcriptomic data in two cancer types (melanoma and lung cancer) 84 that have the following clinical groups for comparison: (i) response to treatment; and (ii) disease 85 progression. 1,2,10-12 We contextualised their tumor compartments with signatures from the 86 our finding in the tumor compartment (differential HALLMARK_IL2_STAT5_SIGNALING). We 117 also identified the top constituent genes (ranked according to log2 fold change and robustness to 118 sample exclusion, see Methods) for these three signatures, and found that these genes were 119 differential in the tumor compartment uniformly across samples of a given group (Fig. 2e, Table  120 S2). Together, these results describe the tumor microenvironment of the ICB-exposed group 121 (consisting of treatment-resistant patients) as one depleted of T cells, with reduced IL2-STAT5 122 signaling, TNFA-NFKB signaling, and inflammatory response relative to the ICB-naive group. 123 124 Additionally, we found that the signature for genes upregulated by IL6 via STAT3 125 (HALLMARK_IL6_JAK_STAT3_SIGNALING) was upregulated in the ICB-naive group and 126 statistically significant and robust to sample exclusion. Using a MWU test, we found differential 127 STAT3 expression in the tumor compartment for the ICB-naive group (p-value = 3.28e-36). 128 Furthermore, we also found a positive correlation between the STAT3 expression and 129 HALLMARK_IL6_JAK_STAT3_SIGNALING signature score in the tumor cells on an individual 130 cell basis (Fig. 2f). This observation supports the finding that IL6 could potentially induce 131 downstream signaling via STAT3 in the tumor cells of the ICB-naive group. 15,16 132 133 We further examined the cause for non-robustness of the signatures that were identified as 134 statistically significant but not robust to sample exclusion by BEANIE (Fig. 2c). We found that the 135 exclusion of one or more samples led to statistically non-significant results, in contrast to when 136 the sample was included, by shifting the empirical p-value to greater than 0.05 as a result of an 137 overlap between the test distribution and the background distribution as shown in Fig. 2d (see 21 other signatures. This variability due to sample exclusion was also not explained by any of the 141 other available clinical variables (e.g., age, sex). Therefore, these signatures were driven by 142 sample-specific biology, and were consequently not representative of the group-level biology, but 143 would have otherwise been considered differentially enriched with statistical significance using 144 either of the conventional MWU test or GLM approaches. 145 146 We next investigated the methodological stability with respect to subsample size (Fig. 2g), and 147 accordingly repeated BEANIE's workflow using smaller subsample sizes. We found that a smaller 148 subsampling of cells led to fewer signatures that were identified as statistically significant and 149 non-robust to sample exclusion, and even fewer that were identified as both statistically significant 150 and robust to sample exclusion. However, the number of statistically significant and robust 151 signatures identified by BEANIE reached saturation around the subsample size of 30 cells per 152 sample, indicating that the subsample size of 60, which had been used for all of the 153 aforementioned results, could successfully capture all statistically significant signatures from the 154 test signature sets that were also robust to sample exclusion. 155

156
To assess the ability to detect noise from a true signal, we additionally used a curated set of all of the immune cell surface marker signatures as both statistically non-significant and non-164 robust to sample exclusion (Table S3)

Group biology analysis of distinct clinical states in non-small cell lung carcinoma 174
In an effort to demonstrate the applicability of BEANIE for a meta-analysis composed of multiple 175 single-cell transcriptomic datasets, we next analyzed the tumor compartments from four published 176 lung cancer studies 2,10-12 to evaluate potential differentially enriched signatures between early-177 vs. late-stage samples. We selected patient samples which satisfied the following criteria: (i) had 178 more than 50 tumor cells; (ii) were classified as adenocarcinoma; (iii) were staged as either I, II, 179 or IV (early-stage = I and II; late-stage = IV); and (iv) had received no prior treatment at the time 180 of sample collection. Filtering according to these criteria yielded a total of 18251 malignant cells 181 across 17 patients (11 early-stage, 6 late-stage) (Fig. 3a). 182 183 We sought to characterise the tumor compartment with Hallmark and Oncogenic gene sets from 184 MSigDb (Fig. 3b, Table 1). Again, a large number of gene sets predicted as differentially enriched 185 with statistical significance by a MWU test with a BH correction and GLMs were identified as statistically non-significant and non-robust to sample exclusion by BEANIE (Fig. 3c, Table S1). 187 We found a signature composed of genes important for mitotic spindle assembly 188 (HALLMARK_MITOTIC_SPINDLE) to be statistically significant and robust to sample exclusion 189 for early-stage lung tumors with BEANIE, consistent with prior studies. 18 Another signature 190 comprised of genes encoding proteins involved in glycolysis and gluconeogenesis 191 (HALLMARK_GLYCOLYSIS) was also found to be statistically significant and robust to sample 192 exclusion for the early-stage tumors with BEANIE, which is in agreement with a prior study 19 193 describing an association between TKI treatment and its effect on decreased activity of glycolysis.  Table 1, Table S1). Among the signatures that were found to be statistically 207 significant and robust to sample exclusion with BEANIE, signatures upregulated in the TKI-208 exposed group included a signature for genes upregulated in response to IFNG 209 (HALLMARK_INTERFERON_GAMMA_RESPONSE) and a signature for genes upregulated by the overexpression of WNT1 (ONCOGENIC_WNT_UP.V1_UP). We identified the top constituent 211 genes of both signatures, and found them to be consistently upregulated across all samples in 212 the TKI-exposed group (Fig. 4c, Table S2). Interferon gamma response has been described to be 213 associated with response to TKI treatment in non-small cell lung cancers. 20 Using a MWU test, 214 we observed that genes encoding the IFNG receptors (IFNGR1, IFNGR2) were differentially 215 expressed with statistical significance in the tumor cells of the TKI-exposed group (p-value 216 [IFNGR1] = 3.22e-104, p-value [IFNGR2] = 1.81e-13, Fig. S3). WNT signaling has also been 217 extensively studied in the context of cancer development, and increased WNT signaling has been 218 associated with tumor progression and metastasis in many different cancers. 21 We assessed 219 potential intratumoral differential gene expression of WNT1 in the tumor cells and found an 220 absence of intratumoral WNT1 expression altogether. We then assessed potential WNT1 221 differential gene expression in specific immune cell compartments (NK cells, macrophages, and 222 T cells) and found a statistically significant differential expression of WNT1 for the TKI- (ICB-naive vs. -exposed melanoma, early-vs. late-stage lung cancer, and TKI-naive vs. -exposed 245 lung cancer) for the signatures that had been classified by BEANIE as both statistically significant 246 and robust to sample exclusion. 247

248
We observed that across all datasets, the MWU test with a BH correction had a high average 249 false positive rate, followed by GLMs which exhibited a moderately high average false positive 250 rate. By contrast, BEANIE had the lowest average false positive rate, that in some cases also 251 approached the significance level (alpha) of 5% (Table 2). Individual false positive rates calculated 252 for all robust and statistically significant signatures can be found in Table S4.  253   254 To evaluate how a smaller number of cells being tested, and thereby reduced statistical power, 255 would impact the false positive rate for a MWU test with a BH correction and GLMs, we 256 subsampled cells from each sample being tested to a number equivalent to BEANIE's subsample 257 size and repeated the type I error estimation. We found that subsampling decreased the false positive rates for both a MWU test with a BH correction and GLMs, but that their false positive 259 rates were still relatively higher than those calculated with BEANIE, corroborating BEANIE's 260 aptitude for detecting robust and true signals as compared to the other two methods, and also 261 reinforcing the need to incorporate robustness estimation into differential enrichment testing. 262 263

Group biology estimation in immune cells 264
While this strategy was primarily developed to overcome challenges for differential enrichment 265 testing specifically within the tumor cell compartment, we also evaluated whether the subsampling 266 and sample exclusion approach implemented within BEANIE would likewise yield biological 267 insights in immune cell compartments, as well as to further validate some of the initial hypotheses 268 from the tumor compartment analyses. As a test case, in continuation of the preliminary evaluation 269 of the CD8+ T cell compartment as described in the earlier tumor compartment analysis, we more 270 comprehensively dissected the CD8+ T cell compartment in ICB-naive vs. -exposed melanoma 271 patients in an isolated context here (Fig. S1). 272 273 We filtered out samples which had fewer than 50 CD8+ T cells, yielding a total of 1292 cells across 274 11 patients (5 ICB-naive, 6 ICB-exposed). We evaluated the potential statistically significant 275 differential enrichment and robustness to sample exclusion of various signatures representing a 276 range of CD8+ T cell subtypes and states from Oliveira et al. 23 between the ICB-naive and -277 exposed patient groups. We found that previously reported signatures for early activated CD8+ T 278 cells (Sade-Feldman_5 24 ) and memory precursor effector CD8+ T cells (Joshi_MPEC, 25 murine-279 derived) were statistically significant and robust to sample exclusion, and upregulated for the ICB- Conventional differential enrichment methods, such as a MWU test with a BH correction and 291 GLMs, are limited in correctly estimating differential biology in clinical tumor single-cell 292 transcriptomic datasets in two aspects. First, they have an appreciably high false positive rate, 293 which can be attributed in part to an increased power of statistical tests (due to high cell numbers) 294 to detect differences between groups. However, increased power does not necessarily signify a 295 biologically relevant difference. Consequently, interpretation of these differences in a group 296 biology context is requisite to correctly distinguish genuine group biological differences from 297 technical artifacts (such as variation in cell numbers). We also observed that subsampling alone 298 is insufficient to tackle this problem, and it is important to use a background distribution for 299 contextualisation. Second, conventional differential enrichment methods do not assess the 300 robustness of a signature to sample exclusion, and as a consequence, these methods may lead 301 to results being sample-driven and of uncertain translational importance. This issue is particularly 302 relevant in clinical contexts, and especially for tumor cell compartments which demonstrate higher 303 intra-patient similarity than inter-patient similarity, as hypotheses based on group comparison 304 (about treatment effects, disease progression, etc.) may impact future clinical trials.

306
To address the shortcomings of conventional differential enrichment methods, we developed 307 BEANIE, a nonparametric statistical method for estimating group biology in clinical single-cell 308 transcriptomic datasets. We demonstrated its application on publicly available datasets from six 309 clinical single-cell transcriptomic studies, and illustrate its aptitude to detect statistically significant 310 and robust gene signatures as compared to conventional methods, through its low false positive 311 rate as compared to its counterparts (MWU test followed by a BH correction and GLMs). We 312 illustrated its extensive application in the tumor compartment, and its potential utility for the 313 immune compartment as well. It may likewise be used to identify differential enrichment of gene 314 signatures in the stromal compartment. Finally, we demonstrated that BEANIE is adept at 315 distinguishing sample-driven signatures from group-driven signatures, whereas conventional 316 differential enrichment methods fail to do so. Alternate models for representation of tumor single-317 cell data include hierarchical linear models; however, unlike BEANIE, they are parametric and 318 therefore assume normality and homogeneity of variance for the data. size. This random sampling is repeated multiple times to generate different random signatures 410 (Rkl, l = 1, 2, … , nr, where nr = the total number of times subsampling is repeated). The rationale 411 for generating the random signatures is that they should not represent any biologically meaningful 412 gene signature, and as a consequence, their differential expression can be used as a null 413 distribution (background distribution) for interpretation of the results in a biological context. (iii) 414 Each cell ci is scored for Rkl's using the aforementioned signature scoring method.

Folds and Subsampling 417
To accomplish BEANIE's two-fold aim of having equal sample representation and quantifying 418 robustness for Sjs, two statistical techniques, Monte Carlo approximations (subsampling) and 419 leave-one-out cross-validation (sample exclusion), are coupled. First, the data is divided into folds 420

Identification of Differentially Enriched Signatures 432
A multi-step strategy is adopted to identify differentially enriched signatures. First, for each 433 subsample belonging to the fold f q , a MWU test is performed between the two groups for every 434 Sj. Additionally, for each fold fq, a null p-value distribution is generated by a MWU test between 435 the two groups for every Rkl. The null distribution generated is fold-specific to ensure that the 436 sample excluded from the fold is also excluded for the generation of the null distribution. The subsamples to represent the p-value for a given fold, followed by a median across all folds to 440 represent the cell's p-value. To quantify the robustness of Sj to sample exclusion, a ratio 441 (henceforth referred to as the Fold Rejection Ratio (FRR)) is defined, and calculated for every fold 442 scores are used as covariates (exog variable), and the group labels (e.g., treatment-naive or -474 exposed and early-stage or late-stage) as the response variable to be modelled (exog variable). 475

Calculation of False Positive Rate (Type I Error) 476
False positive rate (type I error) refers to the probability of detecting a result by chance. To 477 calculate this, we permute the patient ID and group label in such a way that roughly equal numbers 478 of samples from the original group labels are placed in both comparison groups. We then repeat 479 the BEANIE workflow on these permuted datasets for signatures which are classified by BEANIE 480 as statistically significant and robust to sample exclusion in the original dataset to evaluate the 481 type I error rates for our predictions. In addition, we also run a MWU test followed by a BH correction and GLMs for these signatures to compare the type I error rates across the three 483 methods. Finally, to investigate whether Monte Carlo subsampling (with equivalent statistical 484 power to that of BEANIE's workflow) would affect the false positive rate. For this, we subsample 485 a random set of cells equal to the number of cells subsampled in the BEANIE workflow and repeat 486 the MWU test and GLM methods. 487 For the ICB-naive vs. ICB-exposed melanoma dataset (14 samples, 7 in each group) and early-488 stage vs. late-stage lung cancer dataset (17 samples, 11 early-stage and 6 late-stage), we ran 489 1000 simulations per gene signature with the above workflow to estimate the false positive rate. 490 For the TKI-naive vs. TKI-exposed lung cancer dataset (10 samples, 6 TKI-naive and 4 TKI-  BEANIE, for all datasets (ICB-naive vs. -exposed melanoma, early-vs. late-stage lung cancer, 716 and TKI-naive vs. -exposed lung cancer). 717 718 Table S2: Top genes for Hallmark and Oncogenic gene sets for all datasets (ICB-naive vs. -719 exposed melanoma, early-vs. late-stage lung cancer, and TKI-naive vs. -exposed lung cancer). 720 721