Abstract
Precision oncology has made significant advances in the last few years, mainly by targeting actionable mutations in cancer driver genes. However, the proportion of patients whose tumors can be targeted therapeutically remains limited. Recent studies have begun to explore the benefit of analyzing tumor transcriptomics data to guide patient treatment, raising the need for new approaches for systematically accomplishing that. Here we show that computationally derived genetic interactions can successfully predict patient response. Assembling a broad repertoire of 32 datasets spanning more than 1,500 patients and including both tumor transcriptomics and response data, we predicted the response in 17 out of 21 targeted and 8 out of 11 checkpoint therapy datasets across 8 different cancer types with considerable accuracy, without ever training on these datasets. Analyzing the recently published multi-arm WINTHER trial, we show that the fraction of patients benefitting from transcriptomic-based treatments could potentially be markedly increased from 15% to about 85% by targeting synthetic lethal vulnerabilities in their tumors. In summary, this is the first computational approach to obtain considerable predictive performance across many different targeted and immunotherapy datasets, providing a promising new way for guiding cancer treatment based on the tumor transcriptomics of cancer patients.
Introduction
There have been significant advances in precision oncology, with an increasing adoption of sequencing tests that identify targetable mutations in cancer driver genes. Aiming to complement these efforts by considering genome-wide tumor alterations at additional “-omics” layers, recent studies have begun to explore the utilization of transcriptomics data to guide cancer patients’ treatment1-4. These studies have reported encouraging results, testifying to the potential of such approaches to complement mutation panels and increase the likelihood that patients will benefit from genomics-guided, precision treatments. However, current approaches have still been of a heuristic exploratory nature, raising the need for developing and testing new systematic approaches for utilizing tumor transcriptomics data.
Here we present a novel precision oncology framework aimed at stratifying patients to existing cancer targeted and immunotherapy drugs, based on gene expression data from their tumors. Our main goal is to extend the scope of current approaches from a few hundred driver genes to genomic and transcriptomic alterations occurring across the whole exome. Unlike the recent tumor transcriptome-based precision oncology approaches resorting mainly to the expression of drug targets, our approach is based on identifying and utilizing the broader scope of genetic interactions (GI) of the drug target genes. We focus on two major types that are highly relevant to cancer therapies: (1) Synthetic lethal (SL) interactions, which describe the relationship between two genes whose concomitant inactivation is required to reduce the viability of the cell (e.g., an SL interaction that is widely used in the clinic is of PARP inhibitors on the background of disrupted DNA repair)5. (2) Synthetic rescue (SR) interactions, which denote a type of genetic interaction where a change in the activity of one gene reduces the cell’s fitness but an alteration of another gene (termed its SR partner) rescues cell viability (e.g., the rescue of Myc alterations by BCL2 activation in lymphomas6). When a gene is targeted by a small molecule inhibitor or an antibody, the tumor may respond by changing the activity of its rescuer gene(s), conferring resistance to therapies.
To identify the SL and SR partners of cancer drugs, we leverage two recently published computational pipelines, ISLE7 and INCISOR8, respectively, which extract genetic dependencies that are supported by multiple layers of omics data, including in vitro functional screens, patient tumor DNA and RNA sequencing data, and phylogenetic similarity across different species. The ISLE pipeline has recently successfully identified a Gq-driver mutation as a marker for FAK inhibitor treatment in uveal melanoma9 and a synergistic combination for treating melanoma and pancreatic tumors with asparaginase and MAPK inhibitors10. Similarly, INCISOR analysis has identified SR interactions that mediate resistance to checkpoint therapies in melanoma8. Here, we set out to evaluate comprehensively for the first time the clinical utility of the computationally inferred GIs as tumor transcriptome-based companion diagnostics for stratifying cancer patients to a host of targeted and immunotherapy drugs in new, ‘unseen’ and independent, patients’ treatment cohorts.
Results
Overview of the analysis
We collected cancer patients’ pre-treatment transcriptomics profiles together with therapy response information from numerous publicly available databases, surveying Gene Expression Omnibus (GEO), ArrayExpress and the literature, and a new unpublished cohort of anti-PD1 treatment in lung adenocarcinoma. Overall, we found 32 such datasets that include both transcriptomics and clinical response data, spanning 21 targeted therapies and 11 immunotherapy datasets across 10 different cancer types.
For each drug whose response we aim to predict, we first applied the ISLE7 and INCISOR8 pipelines to identify the clinically relevant pan-cancer GIs (the interactions found to be shared across many cancer types) of its target genes11. Here we briefly describe the process for identifying and generating predictions based on SLs, and the process for using SRs is analogous (see Figure 1A and Methods): (A) SL inference: For each drug we compile a list of initial candidate SL pairs of its targets by analyzing large-scale in vitro functional screens performed with RNAi, CRISPR/Cas9, or pharmacological inhibition in DepMap12,13. Among these candidate SL pairs, we then select those that are more likely to be clinically relevant by analyzing the TCGA data. Finally, among the remaining candidate pairs we select those pairs that are supported by a phylogenetic profiling analysis7. The top 25 SL partners that pass all three filters are designated as the SL partners of that drug. (B) SL-based response prediction: the identified SL partners of the drug are then used to predict a given patient’s response to a given treatment based on her/his tumor’s gene expression. This is based on the notion that a drug will kill a tumor more effectively when its SL partners are down-regulated, because when the drug inhibits its targets more SL interactions will become jointly down-regulated and hence ‘activated’ (Figure 1B). To quantify the extent of such predicted lethality, we assign an SL-score denoting the fraction of down-regulated SL partners of a drug in a given tumor (Methods). The larger the fraction of SL partners being down-regulated, the higher the SL-score and the more likely the patient is to respond to the given therapy. 
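The SL-score described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ISLE implementation: the function names and the input layout (a dictionary of per-gene expression lists) are hypothetical, a gene is called down-regulated when its expression falls below the bottom tertile across the samples of the same dataset (the rule given in the Methods), and the target-gene expression indicator mentioned in the Methods is omitted for brevity.

```python
from statistics import quantiles


def bottom_tertile_cutoff(values):
    """Return the lower tertile boundary of a gene's expression across samples."""
    # quantiles(..., n=3) returns the two cut points that split the data into thirds.
    return quantiles(values, n=3, method="inclusive")[0]


def sl_scores(expression, sl_partners):
    """Fraction of a drug's SL partners that are down-regulated, per sample.

    expression  : dict mapping gene name -> list of expression values (one per sample)
    sl_partners : list of gene names (hypothetical SL partners of the drug's targets)
    """
    # Keep only partners that were actually measured in this dataset.
    partners = [g for g in sl_partners if g in expression]
    if not partners:
        raise ValueError("none of the SL partners are present in the expression data")

    cutoffs = {g: bottom_tertile_cutoff(expression[g]) for g in partners}
    n_samples = len(expression[partners[0]])

    scores = []
    for i in range(n_samples):
        # Count partners whose expression in sample i is below the gene's cutoff.
        n_down = sum(expression[g][i] < cutoffs[g] for g in partners)
        scores.append(n_down / len(partners))
    return scores
```

Calling `sl_scores(expression, partners)` returns one value in [0, 1] per sample; under the framework above, samples with larger values would be predicted as more likely to respond.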
Analogously for the case of immunotherapy, we use an SR-score for predicting treatment outcomes, which quantifies the differences in the fraction of up or down regulated predicted SR partner genes based on the patient’s tumor transcriptomics, and hence the likelihood of resistance to the given therapies (Methods).
We emphasize that the treatment outcome information available in the 31 test datasets we analyzed is never used in making the response predictions. The treatment outcomes (typically provided in the form of RECIST or progression-free survival (PFS)) were only used to evaluate the resulting prediction accuracy (Figure 1B). Importantly, throughout the paper we use fixed sets of parameters in making predictions for targeted therapies and immunotherapies, respectively. Taken together, these measures markedly reduce the well-known risk of obtaining over-fitted predictors that fail to predict on datasets other than those on which they were originally built (a failure that has been observed with many existing machine learning drug response predictors).
SL-based prediction of response to targeted cancer therapies
As a proof-of-principle to assess the performance of our approach, we begin by analyzing four melanoma cohorts treated with BRAF inhibitors, where pre-treatment transcriptomics data and response information are available14-17. Applying the ISLE pipeline to the TCGA cohort, we identified the 25 most significant SL partners of BRAF (Methods). Reassuringly, we find that, as expected, responders have higher SL-scores than non-responders in the 3 different melanoma-BRAF cohorts for which therapy response data is available (Figure 1C). Quantifying the predictive power via the standard area under the receiver operating characteristic curve (AUC of the ROC curve) measure, we find AUCs greater than 0.7 in all three datasets (Figure 1D). As some datasets do not have a balanced number of responders and non-responders, we additionally quantified the resulting performance via precision-recall curves (often used as a supplement to the routinely used ROC curves to obtain a fuller picture when evaluating prediction performance, Figure S1A). As evident from the latter, one can choose a single classification threshold capturing most true responders while misclassifying fewer than half of the non-responders. Notably, these SL-based prediction accuracy levels are better overall compared to those obtained by several published transcriptomics-based predictors, including the proliferation index18, IFNg signature19, cytolytic score20, or the expression of the drug target gene itself (BRAF in this case). They are also better than interaction-based scores, such as the fraction of down-regulated randomly selected genes, the fraction of in vitro experimentally determined SL partners, the fraction of the identified SL partners of other drugs, or the fraction of down-regulated protein-protein interaction partners (both of size similar to the SL set; empirical P<0.001) (Figure 1E).
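To make the AUC measure used above concrete, the following sketch computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen responder has a higher score than a randomly chosen non-responder, with ties counted as one half. The function name and inputs are hypothetical; this is an illustration of the standard measure, not the evaluation code used in the study.

```python
def roc_auc(scores, is_responder):
    """Area under the ROC curve via pairwise comparison of responders and non-responders.

    scores       : list of SL- or SR-scores, one per patient
    is_responder : list of booleans of the same length (True = responder)
    """
    pos = [s for s, r in zip(scores, is_responder) if r]
    neg = [s for s, r in zip(scores, is_responder) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # responder correctly ranked above non-responder
            elif p == n:
                wins += 0.5   # ties contribute one half
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 0.7 threshold quoted in the text should be read.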
The fourth melanoma BRAF dataset21 lacks annotated RECIST response information, but we observe that patients with higher SL-scores showed significantly better treatment outcome in terms of overall survival (Figure 1F), as expected. BRAF’s SL partners are enriched with the functional annotation ‘regulation of GTPase mediated signal transduction’ (Fisher exact test P<0.002).
We next tested the prediction accuracy of the SL-based approach on an array of different targeted therapies and cancer types. We collected an additional cohort of 17 publicly available datasets from clinical trials of targeted therapies in cancer, each one containing both pre-treatment transcriptomics data and therapy response information. This compendium includes breast cancer patients treated with lapatinib22, gefitinib23, letrozole24, doxorubicin25, trastuzumab26, everolimus27 and cetuximab28; ovarian cancer patients treated with dasatinib29; colorectal cancer patients treated with irinotecan30; multiple myeloma patients treated with bortezomib31; and non-small cell lung cancer patients treated with sorafenib32. We identified the SL interaction partners of the drug targets in these datasets (Methods) and computed an SL-score for each sample using the SL partners of the corresponding drugs. Importantly, we find that higher SL-scores are associated with better response in 11 of the 15 datasets where RECIST response information is available (Figure 2A), with AUCs greater than 0.7 (and see precision-recall curves in Figure S1B), while the predictive performance of a variety of expression-based control predictors is random (Figures 2B-C). In the three datasets (one among the 15 datasets above and 2 additional datasets) where we have a sufficient number of samples with patient survival data, we observe that patients with higher SL-scores also have increased overall survival (Figures 2D-F). Taken together, these results indicate that SL interactions of the drug target genes can serve as effective biomarkers for predicting drug response across a wide range of drugs in different cancer types.
SR-based prediction of response to checkpoint blockade
We next studied the ability of our GI-based framework to predict clinical response to checkpoint inhibitors. To this end, we introduced a few modifications into the published ISLE and INCISOR pipelines7,8 (Methods). We used the protein expression levels of PD1 and CTLA4 because they are likely to better reflect their activity than their mRNA levels, and we used co-culture CRISPR data to capture immune interactions modifying T-cell killing instead of the original cancer cell line screens used in the first step of ISLE and INCISOR. Using these modified versions of ISLE and INCISOR, we analyzed TCGA data to identify the SL and SR partners of PD1 and of CTLA4. We did not identify statistically significant SL interaction partners of these checkpoint genes but did find significant pan-cancer SR interactions, which were then used for further analysis. In particular, we identified two types of SR interactions, DU and DD, as we have previously defined7,8: SRs of the DU type denote interactions where the inactivation of the target gene is compensated by the upregulation of the partner rescuer gene, while SRs of the DD type denote interactions where inactivation of the target gene is compensated by the downregulation of the partner rescuer gene. Given a drug and a patient’s tumor expression data, we quantify the fraction f of DU partners that are upregulated in the tumor plus the fraction of DD partners that are downregulated. We define 1-f as the SR-score, where tumors with higher SR scores are likely to respond better to the given checkpoint therapy (Methods).
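The SR-score defined above can be sketched as follows. This is a minimal illustration with hypothetical function names and inputs. Two choices are assumptions rather than statements of the published method: f is taken as a single pooled fraction over all DU and DD partners (so that 1-f lies between 0 and 1), and a gene is called up- or down-regulated when its expression is above the top tertile or below the bottom tertile across samples in the same dataset, as stated in the Methods.

```python
from statistics import quantiles


def tertile_cutoffs(values):
    """Return the (lower, upper) tertile boundaries of a gene's expression."""
    low, high = quantiles(values, n=3, method="inclusive")
    return low, high


def sr_scores(expression, du_partners, dd_partners):
    """SR-score (1 - f) per sample.

    expression  : dict mapping gene name -> list of expression values (one per sample)
    du_partners : rescuer genes that rescue when up-regulated (DU type)
    dd_partners : rescuer genes that rescue when down-regulated (DD type)
    """
    du = [g for g in du_partners if g in expression]
    dd = [g for g in dd_partners if g in expression]
    if not du and not dd:
        raise ValueError("none of the SR partners are present in the expression data")

    cut = {g: tertile_cutoffs(expression[g]) for g in du + dd}
    n_samples = len(expression[(du + dd)[0]])

    scores = []
    for i in range(n_samples):
        # DU partners count toward rescue when overactive (above the upper tertile).
        rescued = sum(expression[g][i] > cut[g][1] for g in du)
        # DD partners count toward rescue when inactive (below the lower tertile).
        rescued += sum(expression[g][i] < cut[g][0] for g in dd)
        f = rescued / (len(du) + len(dd))  # assumption: pooled fraction over all partners
        scores.append(1.0 - f)
    return scores
```

A tumor in which many rescuer genes are already in their rescuing state receives a low SR-score and would be predicted as more likely to resist checkpoint therapy.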
To evaluate the accuracy of SR-based predictions, we collected a set of 11 immune checkpoint therapy datasets that included both pre-treatment transcriptomics data and therapy response information (either by RECIST or PFS). Our collection includes five melanoma datasets35-39 and glioma40, and renal cell carcinoma41 cohorts treated with anti-PD1 or anti-CTLA4. Figure 3A shows that higher SR-scores are indeed associated with better response to immune checkpoint blockade, with AUCs greater than 0.7 in 7 out of the 10 datasets where RECIST information is available (Figure 3B), and Figure S2 shows the corresponding precision-recall curves. As shown in Figure 3C, the prediction accuracy of SR-scores is superior to a variety of expression-based controls. Notably, the SR-based framework is also predictive for a new unpublished dataset of lung adenocarcinoma (LUAD) patients treated with pembrolizumab, an anti-PD1 checkpoint inhibitor, at Samsung Medical Center (Methods, Figure 3B-C (denoted as ‘new SMC dataset’)). In the two datasets (including one additional dataset) where we had sufficient patient survival data, we observe that patients with higher SR-scores show significantly better prognosis for anti-PD1 therapy39 (Figure 3D) and anti-PD1/anti-CTLA4 combination38 (Figure 3E). The SR partners of PD1 and CTLA4 are enriched for IFN-gamma and antigen presentation pathways (Figure S3), including key immune genes such as IFNAR2, IFNGR1, and B2M, and protein-protein interaction (PPI) partners of PD1 and CTLA4 such as B2M, RAE1, and C3. Taken together, these results testify that the SR partners of PD1 and CTLA4 serve as effective biomarkers for checkpoint response across a wide range of cancer types.
Next, we computed the SR scores for anti-PD1 therapy for each tumor sample in the TCGA compendium based on the expression of the SR partners (Methods). Based on the optimal classification threshold determined in the analysis presented above, we computed the fraction of responders in each cancer type and compared that to the actual response rates reported in anti-PD1 clinical trials for 16 cancer types42. Notably, we find that these two measures are significantly correlated (Figure 3F), showing that the inferred checkpoint SRs are also predictive and associated with the clinical response observed across different cancer types to these immunotherapies. Taken together, these results show that, adding to the existing determinants of response and resistance to checkpoint therapy in melanoma43,44, SR scores are robust predictors of response to checkpoint therapy across many different cancer types.
Summing up over all the targeted and immunotherapies we studied, our genetic interaction-based approach achieves greater than 0.7 AUC predictive performance levels in 21 out of 28 datasets containing RECIST response information, spanning 14 out of 18 targeted therapy cohorts and 7 out of 10 immunotherapy cohorts (including our new SMC dataset) (Figure 3G). Adding the 4 datasets where SL/SR-score is predictive of progression-free survival, our GI-based framework shows considerable predictive signals in 25 out of 32 cohorts (>78%). Notably, these accuracies are markedly better than those obtained using a range of control predictors.
Finally, we investigated if there are additional expression-based variables that could be potentially used to further refine GI based predictions in the future. Pooling together targeted therapy data to obtain a larger cohort we find that, in the patients who were predicted as responders based on their SL scores, the IFNg signature scores are significantly lower in actual non-responders vs responders (Figure S4A). Analogously, in a pooled analysis of checkpoint response datasets, we find that the non-responders have significantly lower IFNg and cytolytic scores than the responders (this time among the patients predicted as non-responders) (Figure S4B). These observations indicate that, as larger datasets are accrued, it may be worthwhile to combine GI scores and transcriptomics-based measures of immune response to enhance response prediction.
A retrospective analysis of the WINTHER trial
To evaluate our approach in a multi-arm basket clinical trial setting we performed a retrospective analysis of the recent WINTHER trial data, the first large-scale basket clinical trial that has incorporated transcriptomics data for cancer therapy in adult patients with advanced solid tumors1,2. This multi-center study had two arms, one recommending treatment based on actionable mutations in a panel of cancer driver genes and another, doing so based on the patients’ transcriptomics data. We considered the gene expression data of 86 patients with 69 different targeted treatments (single or combinations) that were available. One patient had a complete response, 10 had a partial response and 12 were reported to have stable disease (labeled as responders), while 63 had a progressive disease (labeled as non-responders).
We first identified the SL partners for each of the drugs prescribed in the study. We confirmed that the resulting SL-scores of the therapies used in the trial are significantly higher in responders than non-responders (Wilcoxon ranksum P<0.05, Figure 4A). Notably, the SL scores of the drugs given to each patient are predictive of the actual responses observed in the trial (AUC=0.73, Figure 4B, with an SL-score of 0.32 as the optimal threshold that best balances precision and recall (Figure S5)). As shown in Figure 4C, the prediction accuracy of SL-score is superior to that of control expression-based predictors. This reassuring predictive signal led us to ask the following, fundamental question: how many patients would have likely benefited from the set of drugs employed in the trial, if the treatment choices for patients would have been guided by the SL-based scores of each drug in each patient, as defined and described above?
To answer this question in a systematic manner we considered all FDA-approved targeted anti-cancer therapies from the Drugbank database11 and identified the SL partners of their target genes by running the ISLE pipeline (Methods). Having predicted the SL partners for each of these drugs, we computed the SL-scores of all drugs in every patient based on their tumor transcriptomics and ranked the drugs accordingly to identify the top drugs that are predicted to be most suitable for each patient. The resulting analysis shows that for more than 94% (81/86) of the patients we could identify alternative therapies that have higher SL-scores than the drugs prescribed to them in the WINTHER trial (Figure 4D). But given these SL-based treatment recommendations, how many patients may actually respond? Based on the optimal classification threshold identified earlier, 86% (74/86) of the patients are predicted to respond based on our recommendations, compared to the 15% who were reported to respond in the original trial. Out of the non-responders reported in the WINTHER trial, 87% (55/63) of the patients are predicted to be matched with effective therapies through our recommendations (Figure 4D). The most frequently recommended drug with the highest SL-scores in the WINTHER cohort is dasatinib, followed by venetoclax and cobimetinib (Figure 4E). Reassuringly, an SL-based drug coverage analysis in TCGA (Figure 4F), focusing on the same cancer types and drugs as those studied in the WINTHER trial, shows a similar pattern of top recommended drugs (the drug rankings show Spearman R=0.63, P<0.01). This points to the robustness of the SL-based predictions across different independent patients’ datasets.
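The per-patient recommendation step described above amounts to ranking candidate drugs by their SL-score and comparing the best score against the prescribed drug and against the classification threshold. The sketch below illustrates this under stated assumptions: the function name, the returned dictionary keys and the drug labels are hypothetical, the default threshold of 0.32 is the optimal cutoff reported for the WINTHER analysis, and treating a score equal to the threshold as a predicted response is our assumption.

```python
def recommend_drugs(sl_by_drug, prescribed, threshold=0.32, top_k=3):
    """Rank candidate drugs for one patient by SL-score.

    sl_by_drug : dict mapping drug name -> SL-score of that drug in this patient
    prescribed : name of the drug the patient actually received (must be a key)
    threshold  : SL-score cutoff above which a response is predicted
    top_k      : number of top-ranked drugs to report
    """
    if prescribed not in sl_by_drug:
        raise KeyError(f"no SL-score available for prescribed drug {prescribed!r}")

    # Highest SL-score first; ties keep their original order.
    ranked = sorted(sl_by_drug.items(), key=lambda item: item[1], reverse=True)
    best_drug, best_score = ranked[0]

    return {
        "top_drugs": ranked[:top_k],
        # Is there any candidate scoring strictly higher than the prescribed drug?
        "has_better_alternative": best_score > sl_by_drug[prescribed],
        # Assumption: a score at or above the threshold counts as a predicted response.
        "predicted_responder": best_score >= threshold,
    }
```

Aggregating `has_better_alternative` and `predicted_responder` over all patients gives cohort-level fractions of the kind quoted in the text.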
Figure 4D reveals an interesting emerging pattern, where samples that display a strong SL vulnerability to one drug usually tend to have SL-mediated vulnerabilities to many other drugs. We thus introduced the mean SL-score (computed across all drugs considered) of a given sample as a measure of its aggregate vulnerability. To investigate the molecular features that are associated with the aggregate SL vulnerability at the sample level, we considered several transcriptomics and genomics-based measures of tumor samples in both WINTHER and TCGA cohorts (Figure 4G,H). These include cytolytic activity, IFNg signature, proliferation signature, and transcriptomic instability (TI) (Methods). We additionally considered tumor mutational burden (TMB) and genomic instability (G-Inst) when analyzing the TCGA cohort, where whole exome sequencing (WES) and copy number profiles are available. Our analysis shows that transcriptomic and genomic instabilities are significantly higher in the samples with high mean SL-score in both cohorts. Notably, this finding indicates that SL-based treatment opportunities may actually increase in advanced tumors, which have increased G-Inst and TI levels.
Discussion
We have demonstrated that by mining large-scale “-omics” data from patients’ tumors one can computationally infer candidate pairs of genetic interactions that can be used as predictive companion diagnostics for many targeted and immunotherapy treatments, across multiple cancer types. The resulting prediction accuracy is of potential translational value for many of the drugs tested. Furthermore, as shown in the analysis of the WINTHER trial, its application may significantly increase the number of patients that can benefit from precision-based treatments.
Among the drugs that were commonly predicted to be of clinical utility in our analysis of the latter trial, dasatinib and venetoclax are approved for hematopoietic cancers, but their potential use has also been investigated in multiple solid tumors, including colorectal and breast cancer45,46. Cobimetinib is approved for the treatment of melanoma, and clinical trials are ongoing for multiple tumors including craniopharyngioma, non-small cell lung cancer, pancreatic cancer, and ovarian cancer. Notably, olaratumab, ranked 6th in our analysis, is a monoclonal antibody developed for the treatment of solid tumors, directed against the platelet-derived growth factor receptor alpha (PDGFRA)47. However, olaratumab has been removed from the US and the European markets due to insufficient proof of efficacy. These drugs also attain wide coverage in the independent analysis of the TCGA cohort. Taken together, these results lay a basis for future prospective studies based on SL-based stratification of patients’ treatments from their tumor transcriptomics.
Our predictor is fundamentally different from previous efforts for therapy response prediction in several important ways: (1) The GIs underlying the prediction are inferred from analyzing pre-treatment data from the TCGA ensemble without considering any treatment labels; thus, the resulting predictors are likely to be less prone to the risk of overfitting arising when applying machine learning to build predictors based on the relatively small training datasets that are currently available to us. Furthermore, the GIs used in this study are inferred from a pan cancer analysis of the TCGA – that is, they are common across many different cancer types; as such, they are less context-sensitive and more likely to be predictive in different cancer types. (2) The interactions enabling the predictions have clear biological meanings as functional GIs and their scoring is simple and intuitive, differing from the typical “black-box” solutions characteristic of machine learning approaches. (3) Finally, the resulting predictive performance is far superior to alternative predictors. As far as we are aware, no single model has previously been shown to obtain such predictive accuracies across so many targeted and immunotherapy datasets.
Importantly, we anticipate that the current levels of predictive performance can and will be further improved in the foreseeable future. This will likely be achieved by analyzing even larger patient cohorts of different cancer types as those accumulate, by considering the accumulating tumor proteomic data and by developing computational approaches for inferring genetic interactions from single cell sequencing data, which will also enable one to identify genetic interactions that are grounded on specific cell types (e.g., between tumor cells and CD8+ T cells in their environment). As more data accumulates, we may learn how to combine SL and SR interactions together to further boost prediction performance. Finally, the GI-based stratification signatures of each drug are biologically interpretable and in many cases are rather small, so they can be measured in future trials in a cost-effective manner.
In summary, our work presents a new paradigm harnessing genetic interaction networks to advance precision cancer medicine, by systematically analyzing patients’ tumor transcriptomics data. Our results show that computationally inferred genetic interaction partners of the drug targets are predictive of clinical outcomes of cancer treatment when applied to a large collection of patient transcriptomics datasets, without any training on the latter. Combined with the present precision oncology approaches, the genetic interaction-based strategy presented here may provide a transformative way to further advance precision oncology, motivating prospective transcriptomics-based clinical trials to further test its value for treating patients with advanced malignancies.
Methods
Data collection
We collected the patients’ pre-treatment gene expression data with therapy response information from public databases. Our search, performed in March 2019, focused on GEO and ArrayExpress but also included a general literature search using different combinations of the following search terms: ‘drug response patient cancer pre-treatment expression transcriptomics therapy resistance signature’. We charted 21 targeted therapy datasets (drugs with specific targets) and 10 immune checkpoint therapy datasets covering a total of 10 cancer types treated with 15 drugs. In addition, we analyzed a new unpublished dataset of lung adenocarcinoma patients treated with pembrolizumab at Samsung Medical Center.
Identifying genetic interaction networks
To identify clinically relevant SL interactions for targeted therapies, we followed the three-step procedure described previously7. First, we created an initial pool of SL pairs identified in cell lines via RNAi/CRISPR-Cas912,13 or pharmacological screens48,49. Second, among the candidate gene pairs from the first step, we selected those gene pairs whose co-inactivation is associated with better prognosis in patients, testifying that they may hamper tumor progression. Third, we prioritized SL paired genes with similar phylogenetic profiles across different species. The top 25 clinically relevant partners were selected among the significant partners for each drug. For the cases where a drug has multiple targets, we selected the top partners that are significant among all SL partners of the target genes of the drug. We have confirmed that the predictive signal does not sensitively depend on the number of the top SL partners.
To identify GIs for immunotherapy, we introduced a few modifications to our previously introduced general GI inference pipeline7,8: (1) We used the protein expression levels of PD1 and CTLA4, as these are more likely to represent the protein activity than their mRNA levels, and this data for PD1 and CTLA4 is available in TCGA (as reverse phase protein lysate microarray (RPPA) values). We considered the level of expression of PD1 (or CTLA4) on CD8+ T cells, which is computed as the ratio between PD1 (or CTLA4) RPPA values and the computationally estimated CD8+ T-cell abundance50. For the GI partner levels, we resorted to their gene expression and somatic copy number alterations (SCNA) data as in previous studies7,8 because protein expression was measured only for a small subset of genes. (2) Instead of using the genome-scale functional screens in cancer cell lines (DepMap12,13) in the first step of ISLE or INCISOR, we analyzed genome-wide CRISPR screens performed in co-cultures of cancer cells and T cells, identifying genes whose knock-out enhances or prevents T-cell mediated killing. To this end we incorporated four such publicly available genome-wide CRISPR screens51-54. Since two of them involve treatment with anti-PD1/PDL1, we used these two as the basis for identifying GIs involved in anti-PD1 response prediction52,54, whereas the remaining two screens, which identify genes involved in generic immune response51,53, were used to infer GIs for anti-CTLA4 response prediction. (3) We focused on the mediators of resistance to immune checkpoint therapies using synthetic rescue (SR) interactions, as no statistically significant SL interaction partners were identified via ISLE for either PD1 or CTLA4. The top clinically relevant partners for each of DU and DD type SR interactions were selected among the significant partners for immune-checkpoint predictions.
We have confirmed that the predictive signal does not sensitively depend on the number of the top SR partners.
We analyzed TCGA data applying this immune-adapted INCISOR pipeline to identify pan-cancer SR interactions that are more likely to be clinically relevant across many cancer types. In particular, we considered two types of SR interactions, DU and DD, defined and first studied in our earlier work8: SRs of the DU type denote interactions where inactivation of the target gene is compensated by the upregulation of the partner rescuer gene, while SRs of the DD type denote interactions where inactivation of the target gene is compensated by the downregulation of the partner rescuer gene. For the melanoma cohort37 where the expression of only a subset of genes was measured by NanoString, we performed our analysis only for the subset of genes available in that cohort.
Predicting drug response in patients using GI partners
We used the identified SL or SR partners for drug response prediction. For targeted therapy, we define the SL-score as the fraction of inactive SL partners in a given sample, multiplied by an indicator variable that denotes whether the target gene expression is within the 70th percentile in the given sample. For immunotherapy, we define the SR-score as 1-f, where f is the fraction of DU-type SR partners that are overactive and DD-type SR partners that are inactive. A higher SL- or SR-score is predictive of response to therapy. In each patient drug response dataset, a gene is determined to be inactive (or overactive) if its expression is below the bottom tertile (or above the top tertile) across samples in the same dataset, following the previous studies7,8. Using the computed SL/SR-scores, we either solved a classification problem to predict responders or performed Kaplan-Meier analysis for predicting patient survival, depending on the availability of the data. For the datasets where the response information is available in the form of RECIST criteria, we solved a classification problem. For the cases where progression-free survival time is available for all patients (with no censoring event), we used the median progression-free survival from the relevant literature as the cutoff to distinguish the responders from non-responders and solved a classification problem. For the datasets where we only have overall or progression-free survival with censoring information, we performed Kaplan-Meier analysis.
TCGA anti-PD1 coverage analysis
We predicted the objective response rates of anti-PD1 therapy for each cancer type in TCGA using the SR interaction partners of PD1 identified above. We computed the SR-score of each tumor sample in the TCGA compendium from its transcriptomic profile, following the above definition of the SR-score, and labeled the sample as a responder or non-responder using as threshold the precision-recall break-even point across the 8 immune checkpoint datasets in which our SR-score is predictive. Using this fixed cutoff, we then computed the fraction of predicted responders for each cancer type and compared these fractions, using Spearman rank correlation, with the actual response rates reported in anti-PD1 clinical trials for the 16 cancer types where data are available42. As above, a gene is determined to be inactive (or overactive) if its expression is below the bottom tertile (or above the top tertile) across samples in the same dataset, following previous studies7,8.
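The thresholding and comparison steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code: the function names are hypothetical, the break-even point is approximated as the score at which the number of predicted responders equals the number of true responders (where precision and recall coincide), and Spearman correlation is computed as the Pearson correlation of average ranks.

```python
import numpy as np

def break_even_threshold(scores, is_responder):
    """Score cutoff at which precision equals recall.

    Precision and recall coincide when the number of predicted responders
    equals the number of true responders, so we take the n_pos-th highest score.
    """
    scores = np.asarray(scores, dtype=float)
    n_pos = int(np.sum(is_responder))
    if n_pos < 1:
        raise ValueError("at least one responder is required")
    return float(np.sort(scores)[::-1][n_pos - 1])

def predicted_response_rates(scores, cancer_types, threshold):
    """Fraction of samples with score >= threshold, per cancer type."""
    scores = np.asarray(scores, dtype=float)
    cancer_types = np.asarray(cancer_types)
    return {str(ct): float(np.mean(scores[cancer_types == ct] >= threshold))
            for ct in np.unique(cancer_types)}

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Toy data (illustrative values only, not from any cohort).
threshold = break_even_threshold([0.9, 0.8, 0.4, 0.3, 0.1], [1, 0, 1, 0, 0])
rates = predicted_response_rates([0.9, 0.2, 0.85, 0.81], ["A", "A", "B", "B"], threshold)
rho = spearman([1, 2, 3], [10, 20, 15])
```

The per-type fractions returned by predicted_response_rates would then be paired with the reported clinical response rates and passed to spearman.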
Retrospective analysis of WINTHER trial
The WINTHER trial involved 10 different cancer types, mostly colon, lung, and head and neck cancer. The trial has two arms, one recommending treatment based on actionable mutations in a panel of cancer driver genes and the other based on the patients’ transcriptomics data. We considered the Agilent microarray data that were available to us, covering 86 patients who received 69 different targeted treatments (single agents or combinations). One patient had a complete response, 10 had a partial response and 12 were reported to have stable disease (all labeled as responders in our analysis), while 63 had progressive disease (labeled as non-responders). For each drug used in the trial, we identified the SL partners of its target genes following the computational pipeline described above. We calculated the SL-score of the therapy given to each patient in the WINTHER trial and quantified the accuracy of the SL-score in predicting response across the 86 patients via ROC analysis. We computed the AUC of the ROC curves on the patients who received monotherapy, since our subsequent analysis focuses on monotherapies. To determine alternative treatment options recommended by the SL-based approach, we considered as candidates all FDA-approved targeted therapies obtained from the DrugBank database and the literature. We identified the significant SL partners of the target genes of each drug. We then computed the SL-score of each drug for every patient using the identified SL network and the patient’s transcriptomic data, and ranked the drugs by SL-score to identify the top drugs predicted to be most suitable for each patient. To evaluate the robustness of our approach, we repeated the same analysis using the matching cancer types in TCGA cohorts. We define the SL vulnerability of a given tumor as the mean of its SL-scores across the different drugs.
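As a rough illustration of the ranking, vulnerability and ROC steps, consider the sketch below. It assumes a precomputed drugs x patients matrix of SL-scores; the function names, drug labels and numbers are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any specific software used in the study.

```python
import numpy as np

def rank_drugs(sl_scores, drug_names, top_k=3):
    """Top-k drugs per patient by descending SL-score (drugs x patients matrix)."""
    sl_scores = np.asarray(sl_scores, dtype=float)
    order = np.argsort(-sl_scores, axis=0, kind="stable")
    return [[drug_names[i] for i in order[:top_k, p]]
            for p in range(sl_scores.shape[1])]

def sl_vulnerability(sl_scores):
    """Mean SL-score of each tumor across all candidate drugs."""
    return np.asarray(sl_scores, dtype=float).mean(axis=0)

def roc_auc(scores, is_responder):
    """AUC as the probability that a responder outscores a non-responder.

    Ties between a responder and a non-responder count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    is_responder = np.asarray(is_responder, dtype=bool)
    diff = scores[is_responder][:, None] - scores[~is_responder][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy example: 3 hypothetical drugs x 2 patients.
drugs = ["drugA", "drugB", "drugC"]
scores = np.array([[0.2, 0.9],
                   [0.7, 0.1],
                   [0.5, 0.5]])
top2 = rank_drugs(scores, drugs, top_k=2)
vulnerability = sl_vulnerability(scores)
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [True, False, True, False])
```

Here top2 lists, for each patient, the two drugs with the highest SL-scores, and vulnerability gives one value per tumor.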
We checked the association between different tumor molecular phenotypes and the SL vulnerability by dividing the tumor samples into two groups, with high SL vulnerability (top tertile) and low SL vulnerability (bottom tertile), and comparing them with a Wilcoxon rank-sum test. We considered the following phenotypes in the WINTHER and TCGA cohorts: cytolytic activity, IFNg signature, proliferation signature and transcriptomic instability. Transcriptomic instability quantifies the genome-wide extent of transcriptomic dysregulation as the fraction of over- or under-expressed genes among all protein-coding genes. A gene is defined as over- (or under-) expressed if its expression is in the top (or bottom) tertile of the expression of that gene in the reference population. If the dataset contained a sufficient number of samples of the same cancer type, those samples were used as the reference; otherwise, we used TCGA samples of the same cancer type as the reference. Genomic instability and tumor mutational burden were considered only in TCGA, where whole-exome sequencing and copy number profiles are available.
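A minimal sketch of the transcriptomic instability measure and the tertile grouping is given below. The function names and toy values are hypothetical, and the reference matrix is assumed to already contain the appropriate reference population (same-dataset or TCGA samples of the same cancer type).

```python
import numpy as np

def transcriptomic_instability(sample_expr, reference_expr):
    """Fraction of genes over- or under-expressed relative to a reference.

    sample_expr: expression vector (one value per gene).
    reference_expr: genes x reference-samples matrix for the same genes.
    A gene is over- (under-) expressed if it lies above the top tertile
    (below the bottom tertile) of that gene in the reference population.
    """
    sample_expr = np.asarray(sample_expr, dtype=float)
    reference_expr = np.asarray(reference_expr, dtype=float)
    low = np.quantile(reference_expr, 1 / 3, axis=1)
    high = np.quantile(reference_expr, 2 / 3, axis=1)
    dysregulated = (sample_expr < low) | (sample_expr > high)
    return float(dysregulated.mean())

def split_by_vulnerability(vulnerability):
    """Boolean masks for the low (bottom tertile) and high (top tertile) groups."""
    vulnerability = np.asarray(vulnerability, dtype=float)
    low, high = np.quantile(vulnerability, [1 / 3, 2 / 3])
    return vulnerability < low, vulnerability > high

# Toy example: 4 genes, 3 reference samples (illustrative values only).
reference = np.array([[1.0, 2.0, 3.0],
                      [10.0, 20.0, 30.0],
                      [5.0, 5.0, 5.0],
                      [0.0, 1.0, 2.0]])
instability = transcriptomic_instability([0.5, 20.0, 5.0, 3.0], reference)
low_group, high_group = split_by_vulnerability([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

The phenotype values of the two groups could then be compared with a rank-sum test, for example with scipy.stats.ranksums.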
Acknowledgements
This research is supported in part by the Intramural Research Program of the National Institutes of Health (NIH), National Cancer Institute (NCI), Center for Cancer Research (CCR). This work utilized the computational resources of the NIH HPC Biowulf cluster. We thank Peng Jiang, Alejandro A. Schäffer, E. Michael Gertz, and Sanna Madan at Cancer Data Science Lab, CCR/NCI/NIH for insightful comments.