Abstract
The amplification of the gene MYCN (v-myc myelocytomatosis viral-related oncogene, neuroblastoma derived) has been a well-documented indicator of poor prognosis in neuroblastoma, a childhood cancer. Unfortunately, there has been limited success in understanding MYCN functionality in the landscape of neuroblastoma and, more importantly, given that MYCN has been deemed "undruggable," there is great interest in illuminating key opportunities that indirectly target MYCN. To this end, this work employs an emerging quantitative technique from network science, namely network curvature, to quantify the biological robustness of MYCN and its surrounding neighborhood. In particular, when amplified in Stage IV cancer, MYCN exhibits higher curvature (more robust) than in those samples with underexpressed MYCN levels. When examining the surrounding neighborhood, the above argument still holds for network curvature, but is lost when only analyzing differential expression, a common technique amongst oncologists and computational/molecular biologists. This finding points to the problem (and possible solution) of drug targeting in the context of complexity and indirect cell signaling effects that have often been obfuscated through traditional techniques.
1. Introduction
Neuroblastoma, the most common extra-cranial tumor in childhood, is an embryonic tumor that is derived from the neural crest [2]. It can occur anywhere along the sympathetic nervous system, most commonly the adrenal glands in addition to the neck, chest, abdomen, pelvis, and spine, and the clinical manifestations vary depending on location and severity. The prognosis for low and intermediate risk neuroblastoma is excellent, whereas the survival rate for high risk disease remains around 40% despite intensive chemotherapy and myeloablative therapy with stem cell transplantation [1]. Other novel approaches have emerged in the treatment of high risk disease; however, drug resistance continues to be the main obstacle in finding a cure for this disease. One of the main indicators for high risk neuroblastoma is MYCN amplification, found in approximately 25% of cases and correlated with poor prognosis [3]. Although directly targeting MYCN is not currently feasible, efforts are underway to inhibit targets directly or indirectly associated with the gene. Motivated by such work, this note aims to better understand MYCN in a quantitative manner via biological robustness. In turn, targets of opportunity that indirectly counteract MYCN (e.g., specific pathways) may be uncovered and furthered in the clinical setting with respect to drug development.
To do so, we use the geometric network method proposed in our previous work [4, 5]. These papers are centered around several key concepts. Firstly, cell signaling cascades within neuroblastoma can be viewed as a complex biological network (weighted graph) whereby the nodes in the network represent genes and the edges characterize the "strength" of interaction between such genes. Secondly, the underlying network may be alternatively viewed as a discrete statistical manifold for which those edge weights represent a one-step random walk. In doing so, we are then able to show that Ricci curvature (from geometry) is positively correlated with a network's robustness [4]. Biologically speaking, this has allowed us to differentiate cancer tissue networks from their normal counterparts on several tumor types, i.e., we were able to quantitatively show that cancer is more robust than normal tissue (given a precise definition of robustness as given in Section 2.3). Using the same methodology, we were further able to show that drug resistant samples in Ewing Sarcoma were more robust than untreated and drug sensitive samples and that drug sensitive samples exhibited fragility as compared to untreated samples [5]. Whereas these previous works took a more global view in the analysis of the cancer network, in the present work we consider a more localized analysis centered around MYCN for neuroblastoma.
The remainder of the present note is outlined as follows: In the next section, we revisit the concept of Ricci curvature in general and for the discrete case of graphs. We further expound upon the flexibility of providing a local proxy for robustness by introducing discrete scalar curvature that is capable of accounting for inherent cell-signaling complexity. Section 3 then presents localized curvature results to illuminate that MYCN amplification is robust and that it also promotes robustness in the neighboring signaling region. This is particularly compelling as differential expression is unable to elucidate such representative information in this same neighborhood, further stressing the need for network science principles when performing a quantitative analysis. We conclude this brief note with Section 4 discussing future work.
2. Preliminaries
In this section, we provide background on the geometric concept of curvature. We note that much of this background can be similarly found in previous works [4, 5, 6] and is simply provided here to make this note more self-contained.
2.1. Introduction to Ricci Curvature
We begin by introducing some background on Ricci curvature [7]; we refer the reader to [8] for all the rigorous mathematical details. Accordingly, let X be a Riemannian manifold (the generalization of a smooth surface valid in any dimension). One can naturally measure distances on X and thus define the length of a given curve. Geodesics are curves that locally are the shortest distance between two given points on the manifold X, and are essential to defining curvature.
More specifically, let x ∈ X, let Tx denote the tangent space at x with ux, w ∈ Tx denoting two orthogonal unit vectors. Then if we move along the geodesic curve γ at x in the direction of w in an infinitesimal manner, we let y ∈ X be the endpoint of the traversal. A pictorial representation of this is given in Figure 1A-B. The traversal is carried out via parallel transport, which roughly allows us to connect the geometry of nearby points in a canonical manner [8]. We denote by uy ∈ Ty the result of parallel transporting ux to the point y. On the plane, this would exactly correspond to moving a given vector in a parallel manner along a straight line (the geodesics of Euclidean space). We call uy the parallel transport of ux.
On a curved space, geodesics defined along ux and uy (denoted by expx tux and expy tuy, respectively) may converge towards one another (positive curvature) or diverge from one another (negative curvature); see Figure 1C. This is called geodesic deviation. With this in mind, we are able to define Ricci curvature via sectional curvature [7, 8]. Again for uy the parallel transport of u := ux from point x to y in the direction w, with δ = d(x, y), we have for sufficiently small ε, δ > 0,
$$ d\big(\exp_x(\varepsilon u_x), \exp_y(\varepsilon u_y)\big) = \delta\left(1 - \frac{\varepsilon^2}{2} K(u, w) + O(\varepsilon^3 + \varepsilon^2\delta)\right). $$
The term K(u, w) denotes the sectional curvature at x in the tangent plane (u, w).
Classical Ricci curvature in the direction w is obtained, up to a dimension-dependent factor, by averaging K(u, w) over all unit vectors u in the tangent space Tx. Of course, the above construction is defined in the continuous setting to provide intuitive insight; we now turn our attention to the discrete setting needed for graphs, and to how we can measure the fragility/complexity of MYCN within neuroblastoma.
2.2. Graph Ricci Curvature
Given that we are working on a discrete graph/network where ordinary notions of smoothness are not applicable, we need to extend notions of Ricci curvature to such a setting. We should note that there are a number of possibilities [11, 12, 13, 14, 15, 16, 17, 18] that are geared towards extending the notion of Ricci curvature to more general metric spaces. Here, we employ a notion of Ricci curvature, due to Ollivier [7, 9, 10], based on a synthetic coarse geometric approach. We will call this notion Ollivier-Ricci curvature, and Ricci curvature will always be taken to be in this sense.
In order to define the Ollivier-Ricci curvature, we will need to define the Wasserstein 1-distance (also called the Earth Mover's Distance) [19, 20] on a discrete metric measure space X = {x1,…,xn}. Let μ1 and μ2 denote two distributions having the same total mass, and denote by d(x, y) the distance between x, y ∈ X (for graphs, taken as the hop metric). Then W1(μ1, μ2) is defined as follows [24]:
$$ W_1(\mu_1, \mu_2) = \min_{\mu} \sum_{x, y \in X} d(x, y)\, \mu(x, y), $$
where μ(x, y) is a coupling (i.e., a distribution on X × X) subject to the following constraints:
$$ \mu(x, y) \ge 0, \qquad \sum_{y \in X} \mu(x, y) = \mu_1(x), \qquad \sum_{x \in X} \mu(x, y) = \mu_2(y). $$
The cost above finds the optimal coupling of moving mass defined by distribution μ1 to μ2 with minimal "work." Very importantly, the Wasserstein 1-distance may be computed as a linear program allowing for an efficient, highly parallelizable algorithm.
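As a small worked illustration (the two-point space and the masses below are chosen here for the example and do not come from the data), take X = {a, b} with d(a, b) = 1, let μ1 be concentrated at a, and let μ2 be uniform on X. The marginal constraints force any coupling to keep half of the mass at a and move the other half to b, so the minimal work, taken as the total of d(x, y) μ(x, y), is 1/2:

```latex
\[
\mu(a,a) = \tfrac12,\quad \mu(a,b) = \tfrac12,\quad \mu(b,a) = \mu(b,b) = 0
\quad\Longrightarrow\quad
W_1(\mu_1,\mu_2) = d(a,a)\cdot\tfrac12 + d(a,b)\cdot\tfrac12 = \tfrac12 .
\]
```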
We can now define the Ollivier-Ricci curvature. The intuition underpinning the approach is motivated by the observation (in the classical continuous case) that the distance between two small (geodesic) balls is less than the distance between their centers on a positively curved space (and greater than the distance between the centers on a negatively curved one). Accordingly, if we let (X, d) be a metric space equipped with a family of probability measures {μx : x ∈ X}, we define the Ollivier-Ricci curvature κ(x, y) along the geodesic connecting nodes x and y via
$$ \kappa(x, y) := 1 - \frac{W_1(\mu_x, \mu_y)}{d(x, y)}, $$
where W1 is the Wasserstein distance defined above [19, 20, 21, 22, 23] and d is the distance on X. For the case of weighted graphs, we set
$$ d_x = \sum_{y} w_{xy}, \qquad \mu_x(y) := \frac{w_{xy}}{d_x}, $$
where the sum defining dx is taken over all neighbors of node x and where wxy denotes the weight of an edge connecting node x and node y (wxy = 0 if d(x, y) ≥ 2).
The measure μx may be regarded as the distribution of a one-step random walk starting from x, with the weight wxy quantifying the strength of interaction between nodal components or the diffusivity across the corresponding link (edge).
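The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy graph (two triangles joined by a bridge, unit weights) and all function names are hypothetical, and the transport problem is solved with a small successive-shortest-path (min-cost-flow) routine rather than a general linear-programming solver, purely to keep the example dependency-free. Exact rational arithmetic is used so that the curvature values come out without rounding.

```python
from collections import deque
from fractions import Fraction


def hop_distances(adj, src):
    """Hop (shortest-path) distance from src to every reachable node, by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def random_walk_measure(adj, x):
    """mu_x(y) = w_xy / d_x, where d_x sums the weights of the edges at x."""
    d_x = Fraction(sum(adj[x].values()))
    return {y: Fraction(w) / d_x for y, w in adj[x].items()}


def wasserstein1(mu1, mu2, d):
    """W1(mu1, mu2) for costs d[s][t], by successive shortest augmenting paths."""
    left, right = list(mu1), list(mu2)
    n, m = len(left), len(right)
    S, T = n + m, n + m + 1                  # super-source and super-sink
    edges, out = [], [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, cost):
        out[u].append(len(edges))
        edges.append([v, cap, cost])         # forward edge (even index)
        out[v].append(len(edges))
        edges.append([u, Fraction(0), -cost])  # paired reverse edge (odd index)

    total = sum(mu1.values())
    for i, s in enumerate(left):
        add_edge(S, i, mu1[s], 0)
        for j, t in enumerate(right):
            add_edge(i, n + j, total, d[s][t])
    for j, t in enumerate(right):
        add_edge(n + j, T, mu2[t], 0)

    work = Fraction(0)
    while True:
        best, via = {S: 0}, {}
        for _ in range(n + m + 1):           # Bellman-Ford relaxation rounds
            for u in list(best):
                for e in out[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and best[u] + cost < best.get(v, float("inf")):
                        best[v], via[v] = best[u] + cost, e
        if T not in best:                    # all mass has been shipped
            return work
        path, v = [], T
        while v != S:                        # walk the path back from T to S
            path.append(via[v])
            v = edges[via[v] ^ 1][0]
        push = min(edges[e][1] for e in path)
        for e in path:
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            work += push * edges[e][2]


def ollivier_ricci(adj, x, y):
    """kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y) under the hop metric."""
    mu_x, mu_y = random_walk_measure(adj, x), random_walk_measure(adj, y)
    d = {s: hop_distances(adj, s) for s in mu_x}
    return 1 - wasserstein1(mu_x, mu_y, d) / hop_distances(adj, x)[y]


# Toy network (an assumption): two triangles joined by the bridge c-d.
adj = {}
for u, v in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
             ("d", "e"), ("d", "f"), ("e", "f")]:
    adj.setdefault(u, {})[v] = 1
    adj.setdefault(v, {})[u] = 1

print(ollivier_ricci(adj, "a", "b"))   # edge inside a triangle
print(ollivier_ricci(adj, "c", "d"))   # bridge between the two triangles
```

On this toy graph the edge inside a triangle has curvature 1/2, while the bridge has curvature -2/3, in line with the intuition that densely connected regions are positively curved and bottlenecks are negatively curved.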
2.3. Curvature and robustness
We summarize here the central relationship of curvature and robustness, namely their positive correlation. We express this as
$$ \Delta R \times \Delta \mathrm{Ric} \ge 0, $$
where ΔR denotes change in robustness and ΔRic change in curvature. This is strongly related to the Fluctuation Theorem, which shows that changes in entropy are positively correlated with changes in robustness [25], and this relationship has been exploited in [27] for cancer as well. The relation between entropy and curvature is described in [11] using deep results in the theory of optimal transport.
We just note here that robustness is characterized by the ability of a system to functionally adapt to changes in the environment. The formal definition is based on the theory of large deviations [26]. One considers random fluctuations of a given network that result in perturbations of some observable. We let pε(t) denote the probability that the mean deviates by more than ε from the original (unperturbed) value at time t. Since pε(t) → 0 as t → ∞, we want to measure its relative rate, that is, we set
$$ R := \lim_{t \to \infty} \left( -\frac{1}{t} \log p_\varepsilon(t) \right). $$
Therefore, large R means not much deviation and small R large deviations.
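To make the rate concrete, here is a short worked case under an assumed form of the deviation probability (the constants C and r are illustrative and not taken from the source), reading R as the exponential decay rate of pε(t):

```latex
\[
p_\varepsilon(t) = C e^{-rt},\quad C, r > 0
\quad\Longrightarrow\quad
R = \lim_{t\to\infty}\left(-\frac{1}{t}\log p_\varepsilon(t)\right)
  = \lim_{t\to\infty}\left(r - \frac{\log C}{t}\right) = r .
\]
```

Under this assumption, a faster return to the unperturbed value (larger r) corresponds directly to a larger R.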
Because of the positive correlation of curvature and robustness, one can use curvature as a proxy for robustness. This gives a major advantage since curvature may be computed via a linear program while the computation for robustness may be quite challenging.
2.4. Weighted Graph Scalar Curvature
Based on our preceding discussion, the above measure of Ollivier-Ricci curvature provides a proxy for edge (gene-to-gene interaction) robustness. This is a local property capable of analyzing a pathway that connects any two given genes or proteins in a biological network. While this allows us to examine interactions involving MYCN, we also seek to understand MYCN as a stand-alone gene, in the context of robustness, compared to other genes involved in our network. The important caveat here is that a single gene (MYCN), when measured in a given neighborhood (or globally), must account for all direct and indirect interactions. This is a problem of complexity: how can we quantitatively account for all such pathways?
We approach the above issue via the related notion of scalar curvature, which, for intuitive purposes, can be considered as the average of the aforementioned Ricci curvature. Specifically, scalar curvature at a given node x can be defined on a discrete graph as a weighted contraction with respect to the probability distribution μx:
$$ S(x) := \sum_{y} \mu_x(y)\, \kappa(x, y). $$
However, as mentioned, we would like to provide a metric that is capable of measuring network robustness in a localized region with respect to a particular gene and that, in turn, measures all pathways that interact with that gene. Accordingly, we alternatively define the local-to-global "scalar" curvature Slg(x) as follows:
$$ S_{lg}(x) := \frac{1}{N_y} \sum_{y \in \hat{y}} \kappa(x, y), $$
where ŷ are those nodes that fall within a particular ξ-geodesic neighborhood (i.e., ŷ = {y ∈ X: d(x, y) ≤ ξ}) and Ny is the number of nodes in ŷ. One could also weight the above contraction with respect to the measure.
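A minimal sketch of the neighborhood average follows. It reads Slg(x) as the plain average of κ(x, y) over the ξ-geodesic neighborhood of x, which is an assumption about the intended formula; the node x itself is excluded because κ(x, x) is undefined. The graph, the curvature values, and the function names are all hypothetical and serve only to show the bookkeeping.

```python
from collections import deque


def geodesic_neighborhood(adj, x, xi):
    """Nodes y != x whose hop distance from x is at most xi (BFS to depth xi)."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if dist[u] == xi:
            continue                      # do not expand beyond radius xi
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[x]                           # kappa(x, x) is undefined
    return sorted(dist)


def local_to_global_scalar_curvature(adj, kappa, x, xi):
    """Average of kappa(x, y) over the xi-neighborhood of x (assumed form)."""
    nbhd = geodesic_neighborhood(adj, x, xi)
    return sum(kappa[(x, y)] for y in nbhd) / len(nbhd)


# Hypothetical topology and precomputed curvature values kappa(x, y).
adj = {"x": ["u", "v"], "u": ["x", "w"], "v": ["x"], "w": ["u"]}
kappa = {("x", "u"): 0.5, ("x", "v"): 0.25, ("x", "w"): -0.25}

s_direct = local_to_global_scalar_curvature(adj, kappa, "x", xi=1)
s_wider = local_to_global_scalar_curvature(adj, kappa, "x", xi=2)
print(s_direct, s_wider)
```

With ξ = 1 only direct interactions contribute; with ξ = 2 the primary indirect interaction with w is included as well, which here lowers the average.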
3. Results
Prior work [4] has shown that cancer networks exhibit a higher degree of robustness as compared to normal tissue networks. While we highlighted the importance of understanding biological pathway fragility, much of the analysis was conducted at a macroscopic level over several cancer types. Here, we provide a localized analysis of a single cancer study (neuroblastoma) with an even more specific focus on a known poor prognosis indicator (amplification of MYCN). To the best of our knowledge, this is the first result to show that MYCN exhibits a higher degree of biological robustness in Stage IV samples when amplified as opposed to non-amplified Stage IV samples. However, before doing so, we first present details on the obtained data and corresponding network construction.
3.1. Data and Network Construction
In this study, we obtained raw gene expression data from ArrayExpress (url: http://www.ebi.ac.uk/arrayexpress, study: E-MTAB-1781). In particular, 709 neuroblastoma samples were available, including Stage I, Stage II, and Stage IV tumor grades. There was a total of 161 Stage IV samples for which MYCN was considered "non-amplified" and 95 samples that were considered "amplified." In addition to this, 355 Stage I/II samples were available and used.
With regard to constructing the underlying interactome (topology of the graph/network), we utilized three separate databases: Human Protein Reference Database (HPRD), Human Interactome Project (LIT) and String v10. In particular, to ensure a highly accurate construction of the MYCN neighborhood (e.g., those direct and primary indirect interactions), we first sought out all possible interactions that were listed in the HPRD and LIT databases. Then, such interactions were re-examined and reconstructed by String v10 / GeneCards, taking only those interactions with a high confidence (> 0.7) to "clean" the neighborhood. The final result was further verified by our collaborators in the Division of Pediatric Hematology/Oncology/Stem Cell Transplantation at Columbia University, and the resulting neighborhood can be seen in Figure 2. For the remaining interactions, we used HPRD as our primary source for the underlying network. Using the above raw expression data, we were able to compute the correlation between any two given genes, and these values served as the weights in our corresponding network. Note: Given that correlation takes values in [-1, 1], we utilized an affine transformation to ensure positive weights. Accordingly, we built four separate biological networks composed of ≈ 3.5K genes: MYCN Amplified Stage IV, MYCN Non-Amplified Stage IV, Stage IV (irrespective of MYCN amplification), and Stage I/II (irrespective of MYCN amplification).
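The weighting step can be sketched as follows. This is a hedged illustration only: the expression profiles and gene labels are made up, the function names are hypothetical, and the particular affine map (1 + r) / 2 is an assumption, since the exact transformation used is not stated here. Note that this choice sends a perfectly anti-correlated pair to weight 0, so a different shift would be needed if strictly positive weights are required.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def edge_weights(expression, interactions):
    """Weight each interaction (g, h) by the shifted correlation (1 + r) / 2."""
    weights = {}
    for g, h in interactions:
        r = pearson(expression[g], expression[h])
        weights[(g, h)] = (1 + r) / 2     # assumed affine map from [-1, 1] to [0, 1]
    return weights


# Hypothetical expression profiles (four samples each) and interactome edges.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
interactions = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]
weights = edge_weights(expression, interactions)
print(weights)
```

Only gene pairs listed in the interactome receive a weight, so the topology comes from the databases while the expression data determine the strength of each edge.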
3.2. Robustness of MYCN
We begin by computing Ricci curvature κ(x, y) on Stage IV samples of both amplified and non-amplified MYCN cases. In particular, while it is known that MYCN amplification in "high risk" (Stage IV) patients signifies a poor outcome in terms of patient survivability, there has been limited success in being able to properly target MYCN and to further expound upon the functionality of MYCN with respect to neuroblastoma. This said, using equation (9), we are able to compute the changes in curvature (signifying changes in robustness) on such samples. These results are seen in Table 1. Interestingly, the neighborhood of MYCN, defined to be only those interactions that are direct and primary indirect interactions, exhibits higher scalar curvature (more robust) in the amplified samples than in the non-amplified samples. In contrast, if one were to compute the average differential expression of this neighborhood, information regarding MYCN significance is seemingly "lost." The significance of this result can be seen in terms of gene ranking with respect to scalar curvature and differential expression. Examining MYCN more globally, i.e., taking all interactions involving MYCN, we see that curvature is able to aptly characterize its significance (via rank) as compared to simply differential expression. This quantification is consistent with previous thinking and current on-going clinical work.
In addition, we also examined samples that comprise Stage IV and Stage I/II data, regardless of MYCN amplification. These results can be seen in Table 2. While MYCN is still considered to be "robust" in Stage IV as compared to Stage I/II, the results are much less pronounced. Biologically speaking, this would make sense as cure rates for Stage I/II are considerably high and do not generally involve targeting MYCN, i.e., MYCN is not considered to be a key factor contributing to the robustness of neuroblastoma. Even further, in Stage IV, MYCN has only been the focus in those samples for which it has been amplified. Indeed, the considerable risk that has been noted in clinical outcomes primarily relies only on MYCN amplification in Stage IV cancer. Together, these two studies, albeit short, provide an interesting path forward to understand how MYCN functions in the context of drug resistance.
4. Future Work
To the best of our knowledge, this work provides the first quantitative localized analysis of the (biological) robustness of MYCN in neuroblastoma. In particular, through a network science approach, we were able to show that those samples in Stage IV neuroblastoma for which MYCN is amplified also exhibit higher network curvature. Through existing work [4], which suggests that increases in network curvature are positively correlated with increases in network robustness, we were able to deduce that such findings point to the robustness of MYCN for "high risk" patients. This is particularly compelling in that, while differential expression is unable to elucidate the importance of these indirect cell signaling effects, our network curvature-based approach is able to do so. From this, the next step, which will be a subject of future research, is to quantify which feedback pathways contribute to MYCN robustness and, more importantly, which drug targets may be considered (e.g., AURKA) that will decrease such robustness.
Acknowledgements
This project was supported in part by grants from the National Center for Research Resources (P41-RR-013218) and the National Institute of Biomedical Imaging and Bioengineering (P41-EB-015902) of the National Institutes of Health. This work was also supported by NIH grant 1U24CA18092401A1 as well as AFOSR grants FA9550-12-1-0319 and FA9550-15-1-0045.