Facilitating the design of combination therapy in cancer using multipartite network models: Emphasis on acute myeloid leukemia

From the drug discovery perspective, combination therapy is recommended in cancer due to efficiency and safety compared to the common cytotoxic and single-targeted monotherapies. However, identifying effective drug combinations is time- and cost-consuming. Here, we offer a novel strategy of predicting potential drug combinations and patient subclasses by constructing multipartite networks using drug response data on patient samples. In the present study, we used Beat AML and GDSC, two comprehensive datasets based on patient-derived and cell line-based samples, to show the potential of multipartite network modeling in cancer combinatorial therapy. We used the median values of cell viability to compare drug potency and reconstruct a weighted bipartite network, which models the interaction of drugs and biological samples. Then, clusters of network communities were identified in two projected networks based on the topological structure of networks. Chemical structures, drug-target networks, protein-protein interactions, and signaling networks were used to corroborate the intra-cluster homogeneity. We further leveraged the community structures within the drug-based multipartite networks to discover effective multi-targeted drug combinations, and the synergy levels which were supported with more evidence using the DrugComb and the ALMANAC databases. Furthermore, we confirmed the potency of selective combinations of drugs against monotherapy in vitro experiment using three acute myeloid leukemia (AML) cell lines. Taken together, this study presents an innovative data-driven strategy based on multipartite networks to suggest potential drug combinations to improve treatment of AML.


Introduction
Studies on cases with advanced cancers has shown that less than 10% of patients have actionable mutations, and improvement of outcomes was not observed in a randomized trial of precision medicine based on genomic profiles (1). The current limitation of genomics-centric personalized medicine falls short on the enormous heterogeneity and lack of actionable and sustainable treatment options. With a few exceptions, patient genomic signatures with clinical pathology do not typically predict drug responses. More precisely, cancer can principally be considered as a signaling disease, not a genetic disease. There is a wealth of data that has validated this hypothesis, including signaling behaviors involved in growth factor and nutrient responses, the process of entering and exiting the cell cycle, ensuring that chromosomes are segregated in an orderly, efficient and accurate manner during mitosis and apoptosis (2,3). On the other hand, the complexity of crosstalk between signaling pathways necessitates to modulate multiple targets in cancer cells, otherwise lack of complete response, resistance, and relapse will emerge during the course of treatment.
Despite the fact that large amounts of small molecules or drugs have been tested on many cell lines or patient-derived samples, using single drugs as monotherapies to cure cancer might not be a promising strategy, as it is known that the complex interactions of various biological components can lead to drug resistance during the treatment of cancer (4)(5)(6). As a matter of fact, monotherapy and the slogan of "one target one drug" is inefficient to cure complex diseases such as cancer (7,8). Combination therapy or polytherapy with synergistic drugs may achieve a more effective and safer outcome by targeting several targets in the same or separate pathways of the complex system (4). To better identify the synergistic drug combination based on precision medicine, we need ex-vivo drug screening to decipher the functional impact of cancer genomics at the phenotypic level to understand their interactions in the context of biological networks (9,10). Therefore, understanding network biology may provide a unique opportunity to leverage the rich source of drug response data to offer network-based models for combinatorial therapy.
These network models have shown promises for the development of clinical decision support tools to discriminate functional patient subclasses (11,12). Even though there are networks reconstructed to model biological mechanisms of diseases and predict drug combination synergies based on molecular data (13)(14)(15)(16), network models have not been systematically applied to patient data, i.e., drug response data of patient-derived samples to predict patientcustomized drug combinations (14). Instead, the ex vivo drug response data are straightforwardly translated into the clinic for patient treatment since these individualized experiments represent the efficiency of some approved drugs on patient-derived primary cultures (17,18).
In 2018, the Beat AML program reported a cohort of 672 tumor specimens collected from 531 patients, analyzing the ex vivo sensitivity for 122 drugs, as well as the mutational status and the gene expression signatures of the samples (19). Despite the dearth of large patient-related drug response datasets, some large cell line-based datasets, such as GDSC and ALMANAC, can offer a strong source of supporting evidence for predictions. The GDSC (Genomics of Drug Sensitivity in Cancer) database contains the response of 1001 cancer cell lines to 265 anti-cancer drugs, providing a rich source of information to connect genotypes with cellular phenotypes and to identify cancer-specific therapeutic options (20). The largest publicly accessible dataset for cancer combination drugs, i.e., ALMANAC, was recently published by the U.S. National Cancer Institute. This data collection contains more than 5,000 combinations of 104 investigational and licensed drugs, with synergies calculated against 60 cancer cell lines, resulting in more than 290,000 synergy scores (21). Moreover, DrugComb (https://drugcomb.org/), a web-based portal for the storage and study of drug combination screening datasets, offers a comprehensive visualization of drug combination susceptibility and synergy, which can significantly aid understanding of drug interactions at unique dosage levels. Drugcomb now has 751,498 drug combinations and 717,684 single drug screens from 37 trials, which relate to 2040 cell lines and 216 cancer forms (22).
In this study, we developed a network pharmacology approach to predict potential drug combinations for AML based on BeatAML dataset . We proposed a drug combination strategy using multipartite network modeling of ex vivo drug screening data. By ex vivo drug response data, we directly accessed the individual phenotypes of the patients' cancer cells, and by network modeling, we demonstrated the similarity of drugs and AML patients. Then, we used the community structures within the drug-based multipartite networks to discover effective multitargeted drug combination regimens. Our predicted combinations of drugs were only suggested on the basis of the phenotypic interactions of the cancer cells or patient samples to the drugs without prior understanding of the genetic origin or molecular understanding of the disease.

Methods
The entire workflow of the present study is illustrated in Fig. 1. The weighted bipartite network is constructed using BeatAML data set. This data set is a collaborative research program of 11 academic medical centers providing data on AML samples offering genomics, clinical, and drug responses. It includes a cohort study of 672 tumor specimens collected from 531 patients analyzing 122 drug responses. In order to construct a weighted bipartite network, the best response read out of drug potency was defined using information-based measures. Then two unipartite networks obtained using network projection on samples and drugs. In the next step, communities of two projected networks were extracted and intra-cluster homogeneity analysis was performed using the similarity of drugs and patients/ cell members based on available gene expression profiles for patients and protein-protein interaction network and biological pathways.
The drug candidates for drug combination were selected at two different communities and a high-throughput drug screening was used to assess their synergetic effects.
Defining the response read-out for drug screening experiments Pharmacogenomic studies require extensive standardization, to avoid inconsistency of drug responses data for further research and unbiased predictions (23,24). Therefore, first, we controlled the quality of cell viability data to select the potent compounds. To aim this, we examined the raw datasets in terms of the availability of replicated data and outlier detection, followed by assessment of distribution, pairwise correlation, and homoscedasticity analyses to select the best response read-out or "measure" of drug potency. This analysis was performed  projected the bipartite network into the two similarity networks, i.e., drug similarity network and sample similarity network. In the network projection, two unipartite graphs are derived from a bipartite graph, which result in deducing the node's relationships of the same type. In this work, we projected similarity networks, which take into account the edge weights in the bipartite network. Then, we studied the general properties of the networks such as network heterogeneity, centralization, and clustering coefficients. The critical step was community detection within the projected networks in order to discern functionally similar drugs and similar cells or patients in terms of drug response. The modularity index was used to determine the best community detection algorithms, including infomap (26), fast greedy (27), and spinglass (28). In the next step, we explored the network modules to propose a strategy for drug combination design.

Computational corroboration
Multiple computational methods were applied to validate the predictions of the drug combinations and patient or cell stratification. The validation of the community structures is similar to the general cluster quality assessment method, and we assessed the clustering performance by matching clustering structures to prior knowledge. This validation will be a basis to support the possible drug combination designs. In other words, the combination of distinct drugs in terms of chemical structure, target profile, and implicated biological pathways is most likely more efficient than similar drugs (7). Therefore, we used the drug-target network, protein- n is the number of genes. In all cases, the similarity or distance scores were compared with the random grouping of small molecules or biological samples to perform statistical testing.
The synergy scores provided by the DrugComb database (34) were used to corroborate synergistic combinations of our network-based predictions, including HSA, Bliss, Loewe, ZIP, CSS and S. Let's assume that drug 1 at dose x1 and drug 2 at dose x2 is used to produce effects of y1 and y2 , and yc is the effect of their combination. Drug's effect is usually measured as percentage of cell death and a drug combination is classified as synergetic, antagonistic or non-interactive (35). The expected effect denoted by ye represents a non-interactive level and it is quantified based on a reference model. Several mathematical models have been introduced to calculate the expected effect by assuming specific principals. The HSA model (36) considers the expected combination effect as the maximum of single drug effects, i.e. ye = max(y1,y2). The loewe model (37) assumes that an individual drug produces ye at a higher dose than in the combination. In the Bliss model (38), ye is the effect of the two drugs when they act independently. The ZIP model (35) consider the assumptions of the Loewe and Bliss models by assuming that at reference model two drugs do not potentiate each other. CSS determines the sensitivity of a drug pair and S synergy is based on the difference between the drug combination and the single drug dose response curves (39).

Defining the edge weight of bipartite networks
In the Beat AML dataset, a set of 122 inhibitor drugs were used against 531 patient-derived AML samples. The spectra of low to high potency of drugs were observed across the patientderived samples. However, this panel of small molecule inhibitors was selected according to their activity against the proteins involved in tyrosine-dependent and non-tyrosine kinase pathways particularly for AML (19). In the first step, we determined a weight value of drug-sample interaction to be used in the bipartite network reconstruction. This value should describe the most potent compounds for inhibiting tumor cells based on the drug sensitivity analysis. In addition to the relative and absolute IC50 , RI value (39), and AUC, we calculated the median of cell

Analysis of bipartite networks
In the next step, the maximum square submatrix of patient samples and small molecules was used as the incidence matrix of the bipartite network. To be specific, we selected the listwise deletion strategy to remove missing values and we used the complete cases of both variables. The downstream analysis was done on an undirected weighted bigraph consisting of 176 (88+88) nodes and 7744 edges (Fig. 3A). The distribution of the min-max normalized edge weights indicated positive skewness , indicating that the cells were not highly sensitive to most drugs. All the performed analyses were also carried out for the GDSC dataset as a proof of concept. The undirected weighted bigraph of the GDSC dataset consisted of 532 (266+266) nodes and 70,756 edges (Fig. 3C). The distribution of the min-max normalized edge weights showed positive skewness in this dataset as well (Fig 3D), indicating again low potency for most of the drugs. Therefore, exploring the best combination is not straightforward, and categorizing drug-sample interactions seems to be required. Following the projection of these bigraphs as outlined in Fig. 3, two projected graphs called the patient similarity network (PSN) and drug similarity network (DSN) were reconstructed, such that each edge was obtained by the multiplication of weighted incidence matrix. Thus, the edge weights of the projected graphs indicate the profile similarities of patient samples in PSN and small molecule inhibitors in DSN. Note that the edge weight values in DSNs and PSNs are distinct due to different matrix multiplications. The PSN and DSN of the Beat AML dataset contained 88 nodes and 3828 edges (Fig. 4), while in the GDSC projected similarity networks, there were 266 nodes and 35,378 edges (data not shown). In Fig. 4, the larger node size, the more sensitive patient-derived samples and more potent drugs. In this subset of Beat AML dataset without missing data, patient 16-00627 was found to be the most sensitive and SNS-032 was the most potent inhibitor (See Supplementary   Fig. 1). The community detection was subsequently done for both similarity networks via optimizing a modularity score, resulting in two communities for DSN with 50 and 38 small molecules, and two communities for PSN with 39 and 49 patient samples. In other words, we identified two clusters of patients with distinctive drug response profiles, suggesting two subcategories of the disease. Also, we detected two clusters of small molecules, which pointed disparate inhibiting patterns on the patient samples. In the following steps, we presented evidence of the consistency of cluster members in both networks using prior knowledge.

Drug similarity network
Focusing on small molecules, we presumed that inhibitory molecules with correlated effects on cell survival were likely to have similar structures, purposes, and functions (40)(41)(42)(43). Therefore, we evaluated the similarity of SMILES structures and the analogy of protein targets and biological pathways of detected clusters in the DSNs against random groupings of molecules.
The distribution of the Dice similarity of SMILES structures differed significantly between the random grouping and the clusters based on network topology (Fig. 5A). The statistical test of the median difference also resulted in the lowest p-values for the both pairwise two-sample Wilcoxon and Kruskal-Wallis rank sum test (p-value < 2e-16). To evaluate their target similarities, we explored the protein targets of the small molecule inhibitors and examined the number of the target intersections of small molecule pairs within the clusters. In this analysis, drug target commons (DTC) and OmniPath were applied to explore the binding targets of small molecules and second-order node neighbors (secondary target) in the signaling network, respectively. Assuming that proteins usually correspond to multiple signaling pathways, the KEGG database was used to check the number of pathway intersections of the protein targets for each pair of small molecules. The median similarity measures of the intersections within the network clusters were significantly higher than those for a large set of random pairs of small molecules (Fig. 5B-D) (p-value < 2.2e-16, Kruskal-Wallis rank sum test). Analysis of the GDSC dataset gave similar results (Fig. 6) (p-value < 2.2e-16, Kruskal-Wallis rank sum test), suggesting that our method is also reproducible for the analysis of cell line-based datasets.

Patient and cell-line similarity network
Next, we examined the member consistency of the patient clusters in the PSN using other available data from the patient samples in the Beat AML dataset. The gene expression data including RPKM and CPM of the samples were utilized to check pairwise similarity of the cluster members. The similarity measures were also computed for a large set of random pairs of patient samples to compare with our patient stratification using network clustering. When we compared the harmonic mean similarities of the RPKM values, the pairwise similarities of patients within the clusters were significantly larger than those of the randomly selected patients (p-value < 2.2e-16, Kruskal-Wallis rank sum test) (Fig. 7A). For the CPM dataset, the distributions of Jaccard distance were shown, where the distances within the clusters were found to be statistically lower than those in the random group (p-value = 4.655e-05, Kruskal-Wallis rank sum test) (Fig. 7B).
For the GDSC dataset, we used the expression profiles of signature genes provided by the SPEED platform (44). Then, differentially expressed genes have been used to provide gene signatures of perturbed cancer-related pathways. In this dataset, there are 11 activity scores to represent the activity level of 11 well-known pathways for each cell line. Therefore, we compared the distance distributions of cell line pairs in the clusters to a set of random pairs of cells lines. Our findings indicated that the distances within the clusters were much lower than those in the random grouping (p-value = 6.94e-08, Kruskal-Wallis rank sum test) (Fig. 7C). The Beat AML study also provided the first detailed view of mutational landscape in AML (19). Here, we used the dataset of non-benign gene mutations to characterize both clusters of patient samples. As shown in Fig. 7D, both clusters of patients demonstrate a distinct profile of gene mutations in terms of involved genes and the ranks of genes based on frequency. Previously, Tyner et al. have highlighted the importance of TP53 and ASXL gene mutations, both responsible for a broad drug resistance pattern. They further showed that mutations of certain genes may identify disease subgroups sensitive to certain inhibitors. For example, they found that patients with FLT3-ITD and NPM1 mutations were sensitive to SYK inhibitors. Interestingly, our molecular-independent network-based approach to characterize patient samples also captured the significance of the mutations above. Furthermore, our findings indicate that TP53, DNMT3A, and NRAS were the most frequently mutated genes in one of the patient clusters, while TET2 and NPM1 were the most frequently mutated genes in the other cluster along with the FLT3-ITD mutation. These results suggested that the phenotype-level of information in drug response data can clearly corroborate the genotype-level information to stratify patients more effectively.
Inter-cluster design strategy for drug combinations We assumed that the best drug combination strategy is the selection of one drug from each cluster to block potential drug resistance mechanisms and cancer recurrence. A common drug combination design could be the use of the most effective drugs of each cluster to prohibit cancer cells more effectively. However, other pharmacologic evidence can encourage the choice of the best combination of drugs more specifically. As the focus in drug combination studies also lies in finding the most synergistic drug combinations, previously reported studies were used to explore the synergy values (i.e. the degree of interactions) of drug combinations. We checked first whether combinations of the top five drugs (based on the median values of cell viability) of each cluster in the Beat AML and GDSC datasets (Table1), are found in the DrugComb database.
However, there were no reports regarding the 25 possible combinations of these drugs, so we aimed to compare the average of synergy values for these ten drugs in the whole database. Beat AML and GDSC datasets (p-value = 2.96e-02 and p-value = 3.56e-02, Wilcox rank sum test, respectively).

Synergy analysis of inter-cluster combination of drugs
For further validation of our strategy of predicting synergistic drug combinations using network modeling, we focused on the ALMANAC dataset (21), which has 1,892,650 combinations of 103 inhibitors tested on 60 cell lines. The same procedure as described in Fig. 1

Discussion
The availability of single drug response datasets for cancer cell lines has prompted us to develop methods for predicting and selecting the most effective combination therapy. Several AIbased combination prediction approaches have recently been introduced, which combine highthroughput molecular profiling data with drug response data to improve prediction and However, we expounded that predication accuracy of inter-cluster design strategy of drug combinations based on multipartite networks can be achieved independently of the high-quality training dataset.
Strictly speaking,, in the present study, we revisited the analysis of nominal variables, namely drug name and sample id, in drug screening results for data mining using graph theory, which we termed the nominal data mining approach. We first considered data quality control, such as outlier detection, outlier treatment, and biological and technical replicates. Because of the discrete explanatory independent variable (i.e., drug doses) (47), we assumed that regressionbased measurements might even be discarded; hence, we demonstrated that median values can represent an appropriate weight score to compare drug functionality for network reconstruction.
These values were used to quantify and weight the bipartite network, which reflect the interaction strength of drugs and biological samples. Then, two similarity networks were provided by weighted network projection to detect the topological structure of networks, i.e., network communities. We showed that network communities represent a rationale starting point to propose a combinational drug regimen. Our computational and experimental validation steps amplified the logic of our proposed platform. As a result, while training datasets were not required in this method to predict drug combination, drug response data alone was sufficient for prediction without integrating with prior knowledge of biochemical profiling.
Noting that the occurrence of synergistic toxicities, which can arise from additive toxicities when targets are shared by the combined drugs, is a major barrier to applying combination therapy in the clinic (48). If drug screening data on healthy cells is available, we suggest that a similar strategy for predicting toxicity without losing efficacy is also essential before the future translational experiment. Ianevski et al. previously illustrated the importance of a desired synergy-efficacy-toxicity balance for predicting patient-customized drug combinations (18).
Hence, a drug-response data on healthy cells is demanded to complement synergistic interactions of drug combinations with toxicity predictions Indeed, where drug synergy and toxicity data are optimally matched for combinatorial therapy, stronger and longer-lasting outcomes of drug combinations can be predicted.
Furthermore, prospective works would necessitate the provision of further patient-derived experimental validations. Despite the fact that our prediction is solely dependent on the drug sensitivity dataset, our suggested combinations address the common mutational assigned etiology of AML. Remarkably, this combination was proposed purely on the grounds of the phenotypic response of cancer cells or patient samples to the drugs, with no previous knowledge of the disease's genetic origin.