bioRxiv
New Results

RoDiCE: Robust differential protein co-expression analysis for cancer complexome

Yusuke Matsui, Yuichi Abe, Kohei Uno, Satoru Miyano
doi: https://doi.org/10.1101/2020.12.22.423973
Yusuke Matsui
1Biomedical and Health Informatics Unit, Department of Integrated Health Science, Nagoya University Graduate School of Medicine
  • For correspondence: matsui@met.nagoya-u.ac.jp
Yuichi Abe
2Division of Molecular Diagnostics, Aichi Cancer Center Research Institute
Kohei Uno
1Biomedical and Health Informatics Unit, Department of Integrated Health Science, Nagoya University Graduate School of Medicine
Satoru Miyano
3Department of Integrated Data Science, M&D Data Science Center, Tokyo Medical and Dental University

Abstract

Motivation The full picture of abnormalities in protein complexes in cancer remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and normal groups could help us understand the cancer-specific dysfunction of proteins. However, the technical limitations of mass spectrometry-based proteomics and biological variation contaminate protein expression with noise, leading to non-negligible over- (or under-) estimation of co-expression.

Results We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy on noisy data relative to a conventional linear correlation-based approach. As an application, we show that important protein complexes can be identified along with regulatory signaling pathways, and even drug targets can be identified using large-scale proteomics data from renal cancer. The proposed approach goes beyond traditional linear correlations to provide insights into higher order differential co-expression structures.

Availability and Implementation https://github.com/ymatts/RoDiCE.

Contact matsui@met.nagoya-u.ac.jp

1 Introduction

Cancer is a complex system. Many molecular events, such as genomic mutations and epigenetic and transcriptomic dysregulations, were identified as cancer drivers (Hoadley et al., 2018). However, our knowledge of how they characterize the downstream mechanisms with proteomic phenotypes remains scarce (Clark et al., 2019; Liu et al., 2016; Mertins et al., 2016; Zhang et al., 2016). Protein complexes are responsible for most cellular activities. Recent studies (Ori et al., 2016; Romanov et al., 2019; Ryan et al., 2017) have demonstrated that protein subunits tend to show co-expression patterns in proteome profiles; furthermore, the subunits of a complex are simultaneously down-/up-regulated with the genomic mutations (Ryan et al., 2017). However, we know little about the changes in the co-regulatory modes of protein complexes between the tumor and normal tissues.

We propose a novel algorithm for differential co-expression of protein abundances to identify the tumor-specific abnormality of protein complexes. Differential co-expression (DC) analysis is a standard technique of gene expression analysis to find differential modes of co-regulation between conditions, and numerous methods already exist (Bhuva et al., 2019). Correlation is one of the most common measures of co-expression. For example, differential correlation analysis (DiffCorr) (Fukushima, 2013) and gene set co-expression analysis (GSCA) (Choi and Kendziorski, 2009) are two-sample tests of Pearson’s correlation coefficients. However, studies report that protein expression levels have greater variability than gene expression levels because of the regulatory mechanism of post-translational modifications (Gunawardana et al., 2015; Liu et al., 2016). This variability can affect the estimation of co-expression as an outlier and can significantly impact DC results.

We developed a robust DC framework, called Robust Differential Co-Expression Analysis (RoDiCE), via two-sample randomization tests with the empirical copula. The notable advantage of RoDiCE is noise robustness. Our main contributions are as follows: 1) we develop an efficient algorithm for robust copula-based statistical DC testing; 2) we overcome the computational hurdles of the copula-based permutation test by incorporating extreme value theory; 3) we demonstrate the effective application of the copula to cancer complexome analysis; and 4) we provide a computationally efficient multi-threaded implementation as an R package.

1.1 Motivational example from the CPTAC / TCGA dataset

First, using an actual dataset, we explain why there is a need for robustness in protein co-expression analysis. We analyzed a cancer proteome dataset of clear cell renal cell carcinoma from CPTAC/TCGA with 110 tumor tissue samples. We measured co-expression using Pearson’s correlation coefficient and compared the coefficients before and after removing outliers. To identify outlier samples, we applied robust principal component analysis using the R package ROBPCA (Hubert et al., 2005) with default parameters. Among 49,635,666 pairs of 9,964 proteins, the correlation coefficients of 7,541,853 (15.2%) pairs changed by more than 0.2 after removing outlier samples (Figure 1). This result implies that a non-negligible proportion of protein co-expression estimates would be overestimated or underestimated. To compare co-expression structures correctly, it is necessary to compare them while minimizing the over-/under-estimation of co-expression.

Fig 1. Actual example of effects of outliers on co-expression.

Difference in Pearson’s correlation before and after removing outlier samples. The left panel shows a histogram of the differences in correlation. The right panel shows a scatter plot of the original correlation against the correlation computed without outlier samples.

2 Methods

Figure 2 outlines RoDiCE. We decompose the expression levels of the subunits in a protein complex into a structure representing co-expression and one representing the expression level of each subunit, using a function called an empirical copula (Nelsen, 2010); the empirical copula rank-transforms the scale of the original data. By comparing the empirical copula functions between conditions through statistical hypothesis testing, we derive a p-value for the difference in co-expression structures. We describe our method in detail in the following sections.

Fig 2. Overview of RoDiCE.

a) Objective of the analysis via RoDiCE. The proposed method aims to identify abnormal protein complexes by comparing two groups (here, tumor and normal). An abnormal complex is one in which the co-expression structure differs between at least two subunits. b) Protein co-expression and outliers. The protein expression levels measured through LC/MS/MS contain some outliers because noise is added from several sources. These can cause over- (or under-) estimation of the co-expression structure. c) Copula decomposition. The RoDiCE model decomposes the observed joint distribution of protein expression into marginal distributions representing the behavior of each protein and an empirical copula function representing the latent co-expression structure between proteins. This allows us to extract latent co-expression structures and compare them robustly against outliers. The figure shows an example where the co-expression structure estimated by the copula is actually the same for two apparently different joint distributions of protein expression. d) Copula robustness. A copula is a function that expresses dependency in a rank-transformed space of the data. One advantage of transforming the original scale into a rank scale is robustness to outliers. The example in the figure compares Pearson’s linear correlation on the original scale with the correlation computed after conversion to a rank scale by a copula function (Spearman’s rank correlation). Pearson’s linear correlation drops from 0.74 to 0.44 because of outliers, whereas the correlation on the rank scale is affected relatively little (0.72 to 0.62). e) RoDiCE is a copula-based two-sample test. RoDiCE is an efficient method for testing differences in copula functions between two groups. Rather than a summary measure such as a correlation coefficient, we compare copula functions expressing the overall dependence between groups. This allows us to identify differences in the co-expression structures of protein complexes between two groups in a manner that is robust to outliers.

Fig 3. Simulated dataset.

Generated samples in the numerical experiments for the bivariate case. To mimic the noise in proteome abundance datasets, an outlier population was assumed in addition to the tumor and normal populations.

2.1 RoDiCE model

Suppose there are n samples, and g (g = g1, g2) represents each condition. We compare two conditions and assume that g1 and g2 represent the normal group and the tumor group, respectively. Let Xg = (X1g, X2g, …, XPg) be the abundances of P subunits in group g. Given a protein complex, we represent the overall behavior of the subunits with a joint distribution Xg ∼ Hg(x1, x2, …, xP). The distribution function Hg carries two pieces of information: the subunit expression levels and the structure of co-expression between subunits. The copula Cg is a function that decomposes those two pieces of information into a form that can be handled separately, as follows:

$$H_g(x_1, x_2, \ldots, x_P) = C_g\bigl(F_{1g}(x_1), F_{2g}(x_2), \ldots, F_{Pg}(x_P)\bigr) \qquad (1)$$

The behavior of each subunit Fpg(xp) is represented by a distribution function. The copula function itself is a multivariate distribution with uniform marginals. The copula function includes all dependency information among the subunits (Nelsen, 2010; Rémillard and Scaillet, 2009; Seo, 2020).

We use the empirical copula to estimate the copula Cg non-parametrically because it is widely applicable to various situations. It can be represented using pseudo-copula samples defined via rank-transformed subunit abundance Embedded Image; Embedded Image where R(·) is a rank-transform function, and we represent transformed pseudo-sample variables as Embedded Image and Embedded Image. The empirical copula is robust to noise because it represents co-expression structures based on rank-transformed subunit expression levels, which is the so-called scale-invariance property in the context of copula theory (Nelsen, 2010).
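The rank transformation and its invariance can be illustrated with a minimal Python sketch. It assumes pseudo-observations scaled as rank / (n + 1) and an empirical copula defined as the fraction of samples lying componentwise below a point u; both are common conventions and may differ in detail from the definitions in equation (2). The function names are hypothetical and are not part of the RoDiCE package.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pseudo_observations(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rank-transform each protein (column) to (0, 1) via rank / (n + 1).

    The rank / (n + 1) scaling is an assumption of this sketch.
    """
    n = len(columns[0])
    scaled = [[r / (n + 1) for r in ranks(col)] for col in columns]
    # Transpose so that each row corresponds to one sample.
    return [list(row) for row in zip(*scaled)]


def empirical_copula(pseudo: Sequence[Sequence[float]], u: Sequence[float]) -> float:
    """Fraction of samples whose pseudo-observations are all <= u."""
    hits = sum(all(x <= t for x, t in zip(row, u)) for row in pseudo)
    return hits / len(pseudo)


# Two proteins measured in five samples; the last sample is an extreme value.
protein_a = [1.0, 2.0, 3.0, 4.0, 50.0]
protein_b = [1.1, 1.9, 3.2, 4.1, 60.0]
u_obs = pseudo_observations([protein_a, protein_b])

# Shrinking the extreme value, or applying a monotone change of scale,
# leaves the ranks (and hence the empirical copula) unchanged.
u_tamed = pseudo_observations([[1.0, 2.0, 3.0, 4.0, 5.0], protein_b])
u_cubed = pseudo_observations([[x ** 3 for x in protein_a], protein_b])
```

Because only the ordering of the values enters the pseudo-observations, an extreme measurement moves a sample by at most a few rank positions, whereas it can dominate a moment-based summary such as Pearson's correlation.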

To perform DC analysis between groups g and g′, we consider the following statistical hypotheses:

$$H_0: C_g = C_{g'} \quad \text{versus} \quad H_1: C_g \neq C_{g'} \qquad (3)$$

We use the following Cramér-von Mises type test statistic to perform statistical hypothesis testing (Rémillard and Scaillet, 2009): Embedded Image where Embedded Image represents the pseudo-observations in group g. Note that the computational cost is O(n²), where Embedded Image. To assess the significance of test statistic (4), we also derived the p-value using an algorithm based on Monte Carlo calculations (Rémillard and Scaillet, 2009); however, the computational complexity of that algorithm makes it difficult to apply to proteome-wide differential co-expression analysis (see the results of the simulation experiments described below).
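As a hedged illustration of where the quadratic cost comes from, the sketch below assumes the statistic is the integrated squared difference between the two empirical copulas, scaled by (1/n1 + 1/n2)^-1 as in Rémillard and Scaillet (2009). Integrating products of indicator functions over the unit cube gives a closed form built from terms prod_p (1 - max(U_ip, V_jp)), so every pair of samples is visited once. The function names are hypothetical, and the exact form of equation (4) may differ.

```python
from typing import Sequence

Rows = Sequence[Sequence[float]]  # pseudo-observations, one row per sample


def cross_term(a: Rows, b: Rows) -> float:
    """Mean over all pairs (i, j) of prod_p (1 - max(a[i][p], b[j][p]))."""
    total = 0.0
    for row_a in a:
        for row_b in b:
            prod = 1.0
            for x, y in zip(row_a, row_b):
                prod *= 1.0 - max(x, y)
            total += prod
    return total / (len(a) * len(b))


def cvm_statistic(u1: Rows, u2: Rows) -> float:
    """Cramer-von Mises type distance between two empirical copulas.

    Assumed form: (1/n1 + 1/n2)^-1 * integral of (C1(u) - C2(u))^2 du,
    evaluated exactly through three double sums.
    """
    n1, n2 = len(u1), len(u2)
    integral = cross_term(u1, u1) - 2.0 * cross_term(u1, u2) + cross_term(u2, u2)
    return integral / (1.0 / n1 + 1.0 / n2)


# Toy pseudo-observations: perfectly increasing versus perfectly decreasing pairs.
grid = [0.2, 0.4, 0.6, 0.8]
u_increasing = [[u, u] for u in grid]
u_decreasing = [[u, 1.0 - u] for u in grid]

same = cvm_statistic(u_increasing, u_increasing)       # identical structures
different = cvm_statistic(u_increasing, u_decreasing)  # opposite structures
```

The three double sums make the O(n²) cost explicit: with n1 + n2 = n samples, the number of pairwise products grows quadratically, which is what makes repeated evaluation inside a randomization loop expensive.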

2.2 Derivation of statistical significance

Using a permutation test, we derive the p-value using the following steps:

  1. Randomizing concatenated variable from the two groups; Embedded Image

  2. Constructing a new randomized variable Embedded Image and Embedded Image with randomized index r(i).

  3. Replacing copula functions Embedded Image and Embedded Image in (3) with re-estimated empirical copula function Embedded Image and Embedded Image from the randomized samples Embedded Image and Embedded Image.

  4. Deriving test statistics s′(g1,g2)based on (4) with Embedded Image and Embedded Image.

Note that steps 2 and 3 are indispensable for deriving the null distribution correctly. Deriving the null distribution by randomizing Embedded Image alone distorts the distribution and prevents correct control of the type I error (Seo, 2020).
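The randomization procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RoDiCE implementation: pseudo-observations use rank / (n + 1), ties are broken by sample order, the statistic takes the Cramér-von Mises form of Rémillard and Scaillet (2009), and the p-value is the plain proportion of randomized statistics at least as large as the observed one. The key point it shows is that the raw samples are pooled and shuffled, and the empirical copulas are re-estimated (re-ranked) inside every randomized group.

```python
import random
from typing import List, Sequence

Sample = Sequence[float]  # one sample: abundances of the P subunits


def pseudo_obs(samples: Sequence[Sample]) -> List[List[float]]:
    """Rank-transform each subunit within one group to rank / (n + 1)."""
    n, p = len(samples), len(samples[0])
    out = [[0.0] * p for _ in range(n)]
    for col in range(p):
        order = sorted(range(n), key=lambda i: samples[i][col])
        for rank, i in enumerate(order, start=1):
            out[i][col] = rank / (n + 1)
    return out


def cross_term(a, b) -> float:
    total = 0.0
    for row_a in a:
        for row_b in b:
            prod = 1.0
            for x, y in zip(row_a, row_b):
                prod *= 1.0 - max(x, y)
            total += prod
    return total / (len(a) * len(b))


def cvm_from_raw(x1: Sequence[Sample], x2: Sequence[Sample]) -> float:
    """Statistic from raw samples; the copulas are estimated inside the call."""
    u1, u2 = pseudo_obs(x1), pseudo_obs(x2)
    integral = cross_term(u1, u1) - 2.0 * cross_term(u1, u2) + cross_term(u2, u2)
    return integral / (1.0 / len(x1) + 1.0 / len(x2))


def permutation_pvalue(x1, x2, n_perm: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    observed = cvm_from_raw(x1, x2)
    pooled = list(x1) + list(x2)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                           # step 1: randomize pooled samples
        r1, r2 = pooled[:len(x1)], pooled[len(x1):]   # step 2: form randomized groups
        if cvm_from_raw(r1, r2) >= observed:          # steps 3-4: re-rank, recompute
            exceed += 1
    return exceed / n_perm


# Toy data: increasing dependence in one group, decreasing in the other.
group_1 = [(float(i), float(i)) for i in range(1, 16)]
group_2 = [(i + 0.5, float(-i)) for i in range(1, 16)]
p_value = permutation_pvalue(group_1, group_2)
```

With a finite number of randomizations, the smallest nonzero p-value this scheme can report is 1 / n_perm, which is the resolution limit that motivates the tail approximation in the next section.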

2.3 Approximation of p-value

The empirical p-value is derived as follows: Embedded Image where M is the number of randomizations and Si is the test statistic from the null distribution in the i-th (i = 1, 2, …, M) randomization trial. The accuracy of the p-value in (5) is bounded by p(M) ≥ 1/M. As mentioned, calculating the test statistic requires a computational cost of O(n²); therefore, an efficient computational algorithm is needed to derive accurate p-values in data with a large number of samples. For instance, proteomic cohort projects such as CPTAC/TCGA have more than n = 100 samples. To address this problem, we introduced an approximation algorithm for p-values based on extreme value theory (Knijnenburg et al., 2009) and devised a way to calculate accurate p-values even with a small number of trials.

An observed test statistic that lies beyond the range resolvable with M randomization trials is regarded as an “extreme value,” and the tail of its null distribution can be estimated via a generalized Pareto distribution (GPD), as follows: Embedded Image where N′ is the number of randomized test statistics exceeding the threshold t, which has to be estimated via a goodness-of-fit (GoF) test (Knijnenburg et al., 2009), and G is the cumulative distribution function of the generalized Pareto distribution, Embedded Image for k≠0 and Embedded Image for k = 0. To estimate the threshold t in (6), the GoF test determines whether the excesses come from the distribution G(x) via a bootstrap-based maximum likelihood estimator (Villaseñor-Alva and González-Estrada, 2009). As we do not know a priori the number of samples sufficient to estimate the underlying GPD with threshold t, we must decide the initial number of samples to use. We begin with a large number of samples and decrease this number until the GoF test is no longer rejected, following Knijnenburg et al. (2009). As initial samples, we start with those above the 80% quantile and decrease the samples by 1% while the GoF test is rejected.
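The idea of the tail approximation can be sketched in a few lines of Python. This is a simplified illustration under loudly stated assumptions, not the procedure used in RoDiCE: the threshold is fixed at the 80% quantile with no goodness-of-fit search, the GPD is fitted by the method of moments instead of maximum likelihood, the plain empirical p-value is used whenever at least 10 randomized statistics reach the observed value, and the shape parameter follows the (xi, sigma) convention, whose sign may differ from the k used in the text. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gpd_survival(z: float, shape: float, scale: float) -> float:
    """P(Z > z) for a generalized Pareto excess Z >= 0 (shape xi, scale sigma)."""
    if z <= 0.0:
        return 1.0
    if abs(shape) < 1e-12:
        return math.exp(-z / scale)
    base = 1.0 + shape * z / scale
    if base <= 0.0:  # beyond the finite upper end point (negative shape)
        return 0.0
    return base ** (-1.0 / shape)


def fit_gpd_moments(excesses: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (shape, scale); swapped in for ML here."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (ratio + 1.0)


def tail_pvalue(observed: float, null_stats: Sequence[float],
                quantile: float = 0.8) -> float:
    """Empirical p-value, or a GPD tail estimate when exceedances are scarce."""
    m = len(null_stats)
    n_reached = sum(s >= observed for s in null_stats)
    if n_reached >= 10:  # enough exceedances: the empirical estimate is adequate
        return n_reached / m
    ordered = sorted(null_stats)
    threshold = ordered[int(quantile * m)]
    excesses = [s - threshold for s in ordered if s > threshold]
    shape, scale = fit_gpd_moments(excesses)
    # (fraction of statistics above the threshold) * (GPD tail beyond the observed value)
    return len(excesses) / m * gpd_survival(observed - threshold, shape, scale)


# Deterministic stand-in for M = 1000 randomized statistics (exponential quantiles).
null_stats = [-math.log(1.0 - (i + 0.5) / 1000) for i in range(1000)]
p_extreme = tail_pvalue(20.0, null_stats)  # larger than every randomized statistic
p_typical = tail_pvalue(1.0, null_stats)   # well inside the null distribution
```

For the extreme observation, the empirical estimate would be zero, while the fitted tail returns a value below the 1/M resolution; for the typical observation, the function falls back to the ordinary proportion of exceedances.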

2.4 Identification of protein complex alteration

As protein complexes show co-expression among multiple subunits (Kerrigan et al., 2011), we hypothesized that the difference in the co-expression structure of the tumor group compared to the normal group is a characteristic quantity of protein complex abnormality. In previous studies of the cancer transcriptome, differential co-expression analysis has revealed abnormalities associated with protein complexes (Amar et al., 2013; Srihari et al., 2014). Therefore, we define a protein complex as abnormal when it is differentially co-expressed in at least one pair of subunits. Accordingly, we applied RoDiCE to every subunit pair (p = 2) of each protein complex and identified as abnormal those protein complexes that showed a statistically significant difference in at least one subunit pair.

2.5 Protein membership with protein complex

As we do not know which proteins belong to which protein complexes, we must predict the membership via some method. There are two main approaches. One is membership prediction focusing on the modular structure in PPI networks (Adamcsek et al., 2006; Nepusz et al., 2012) and the other is a knowledge-based method using a curated database. We adopt the latter approach, which is based on already validated protein complex membership information, using CORUM (ver. 3.0)(Giurgiu et al., 2019) as a database (see the Supplementary Data for details).

2.6 R implementation with multi-thread parallelization

To further accelerate the computation of test statistic (4) in the randomization steps, we used RcppParallel (Allaire et al., 2019). We utilize the portable, high-level parallel function “parallelFor,” which uses Intel TBB, a C++ library, as a backend on systems that support it and TinyThread on other platforms.

2.7 Copula-based simulation model for protein co-expression

We outline a method for simulating co-expression structures using a copula. We simulated protein expression levels showing differential co-expression patterns, with outliers, in the tumor group and the normal group. We represented the co-expression structure by the covariance parameter in the following bivariate Gaussian copula: Embedded Image where Φg is the p-dimensional Gaussian distribution parameterized by a p×p covariance matrix (or correlation matrix) in group g, denoted as Embedded Image, and ϕ(xi) is a univariate distribution. Using the model, we generated the dependency structure for two groups; one group has high correlations and the other has low ones, Embedded Image and Embedded Image, respectively. We then generated the co-expression structure using a Gaussian copula with ϕ(x) = N(0,1). We obtained protein expressions via Embedded Image, where we simply set F_ig ∼ N(μ,σ) for i = 1,2 and g = g1,g2 with μ∼N(2,1) and σ∼gamma(2,1). Furthermore, we added outliers that could affect the co-expression structure. Using the model in (6) and (7), we set the outlier population in both groups as Embedded Image and Embedded Image for i = 1,2 and g = g1,g2 (Figure 3).
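A minimal sketch of this kind of simulation, using only the Python standard library, is given below. It follows the description above for the Gaussian copula and the normal marginals with μ∼N(2,1) and σ∼gamma(2,1), but several specifics are assumptions made purely for illustration: the correlation values 0.8 and 0.2, the sample size, and especially the outlier population (independent draws from a wide normal distribution), since the original parameter values are not recoverable here. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_group(n: int, rho: float, seed: int):
    """Draw n bivariate samples whose dependence follows a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Marginals F_i ~ N(mu, sigma) with mu ~ N(2, 1) and sigma ~ gamma(2, 1).
    marginals = [NormalDist(rng.gauss(2.0, 1.0), rng.gammavariate(2.0, 1.0))
                 for _ in range(2)]
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Copula sample: uniform marginals that keep the Gaussian dependence.
        u = [min(max(std_normal.cdf(z), 1e-12), 1.0 - 1e-12) for z in (z1, z2)]
        # Map each uniform through the inverse marginal distribution function.
        samples.append(tuple(m.inv_cdf(ui) for m, ui in zip(marginals, u)))
    return samples


def add_outliers(samples, fraction: float, seed: int):
    """Replace a fraction of samples by draws from a hypothetical outlier population."""
    rng = random.Random(seed)
    k = int(round(fraction * len(samples)))
    noisy = list(samples)
    for i in range(k):
        # Assumed outlier population: independent N(2, 10) draws per protein.
        noisy[i] = (rng.gauss(2.0, 10.0), rng.gauss(2.0, 10.0))
    return noisy


def pearson(samples) -> float:
    xs, ys = zip(*samples)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


high = simulate_group(500, 0.8, seed=1)    # strongly co-expressed group
low = simulate_group(500, 0.2, seed=2)     # weakly co-expressed group
high_noisy = add_outliers(high, 0.10, seed=3)
```

Because the marginals are applied after the dependence has been generated, the same copula can be paired with any choice of marginal distributions, which is what allows the co-expression structure and the expression levels to be varied independently.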

3 Results

3.1 Benchmarking RoDiCE with simulation dataset

We now describe the features of RoDiCE using a simulation model. First, to confirm whether RoDiCE could correctly derive the p-value, we performed a test on two groups, with no differences in co-expression structure without outliers, and confirmed the null rejection rate. We performed 100 tests with the proposed method and calculated the null rejection rate at the 1%, 5%, and 10% levels of significance. The same simulation was repeated 10 times to calculate the standard deviations. The results show that the proposed method can control type I errors (Table 1).

Table 1. Type I error controls of the proposed method

We then simulated a case in which the co-expression structure differed between the two groups and the data included outliers, and we examined the sensitivity of the method in identifying a broken co-expression structure in tumor tissue relative to normal tissue. To demonstrate the advantages of the proposed method, we examined the sensitivity while increasing the percentage of outliers in 2% increments from 0% to 20% and compared it with DiffCorr and GSCA, two-group co-expression test methods based on Pearson’s linear correlation (Figure 4). The proposed method showed robust co-expression test results in the presence of outliers, with an accuracy of more than 85% up to a percentage of outliers of approximately 15%. Conversely, the sensitivity of the methods based on linear correlation starts to decline from 2% of outliers, and for data containing 15% of outliers, the sensitivity drops to around 30%.

Fig 4. Sensitivities and ratio of outliers.

The percentage of outliers is taken on the horizontal axis, and the sensitivity of the co-expression differences by each method (5% level of significance) is shown on the vertical axis.

To investigate the relationship between sample size and identification accuracy, we simulated the sensitivity of RoDiCE as we increased the number of samples in increments of 10 from 30 to 100 samples (Figure 5). All other settings were the same as those in Figure 4, except that the percentage of outliers was set at 5%.

Fig 5. Sensitivities and sample size.

The horizontal axis shows the sample size, while the vertical axis shows the sensitivity of the co-expression differences by each method (5% level of significance).

Finally, we also examined the computational speed, comparing it with the R package TwoCop, which implements the Monte Carlo-based method (Rémillard and Scaillet, 2009) for the two-group comparison of copulas (Table 2). The proposed method is 68 times faster than TwoCop and is sufficiently efficient as a copula-based two-group comparison test method. In contrast, the proposed method required more computational time than the linear correlation coefficient-based methods because of the computational complexity of estimating the copula function.

Table 2. Computation time for 10 replicates

3.2 Application to cancer complexome analysis

We demonstrate RoDiCE with actual data using the clear cell renal cell carcinoma (ccRCC) dataset published by CPTAC/TCGA (Clark et al., 2019). The data are available from the CPTAC data portal (https://cptac-data-portal.georgetown.edu) in the CPTAC Clear Cell Renal Cell Carcinoma (CCRCC) discovery study. The data labeled “CPTAC_CompRef_CCRCC_Proteome_CDAP_Protein_Report.r1” were used. In the following analysis, only protein expression data for proteins present both in the human protein complexes in CORUM and in CPTAC were used. Missing values were imputed based on principal component analysis with 10 principal components, using the pca function in the pcaMethods package.

For the complete data, RoDiCE was applied to the normal and cancer groups for each protein complex. FDR was calculated by correcting the p-value for each complex using the Benjamini–Hochberg method. We identified anomalous protein complexes in protein expression data from 110 tumor and 84 normal samples; out of 3,364 protein complexes in CORUM, 1,244 complexes contained at least one co-expression difference between subunits with FDR ≤5% (Supplementary Data).
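The correction and the complex-level call can be sketched as follows. The Benjamini–Hochberg adjustment itself is standard; how the pairwise p-values are grouped before correction is an assumption of this sketch (here, the subunit pairs of one complex are corrected together), and the complex names, subunit labels, and p-values are invented purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def abnormal_complexes(pair_pvalues: Dict[str, Dict[Tuple[str, str], float]],
                       fdr: float = 0.05) -> List[str]:
    """Flag complexes with at least one subunit pair significant after correction."""
    flagged = []
    for name, pairs in pair_pvalues.items():
        adjusted = benjamini_hochberg(list(pairs.values()))
        if any(q <= fdr for q in adjusted):
            flagged.append(name)
    return flagged


# Hypothetical pairwise p-values for two made-up complexes.
example = {
    "complex_1": {("A", "B"): 0.001, ("A", "C"): 0.04, ("B", "C"): 0.03},
    "complex_2": {("D", "E"): 0.2, ("D", "F"): 0.6, ("E", "F"): 0.9},
}
adjusted_demo = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
flagged_demo = abnormal_complexes(example)
```

A complex is reported as soon as one of its subunit pairs passes the threshold, which mirrors the definition of an abnormal complex in Section 2.4.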

The proposed method has identified several protein complexes containing driver genes on regulatory signaling pathways in ccRCC (Figure 6a) (Li et al., 2019). The identified pathways included known regulatory pathways important for cancer establishment and progression, starting with chromosome 3p loss, regulation of the cellular oxygen environment (VHL), chromatin remodeling, and disruption of DNA methylation mechanisms (PBRM1, BAP1). They also included abnormalities in regulatory signals involved in cancer progression (AKT1). Moreover, several identified complexes also included key proteins, for example, MET, HGF, and FGFR proteins, which could be inhibited by targeting them with drugs such as Cabozantinib and Lenvatinib directly. Because a previous study reported that sensitivity to knockdowns of several genes was well associated with expression levels of protein complexes (Nusinow et al., 2020), co-expression information on protein complexes containing druggable genes might be useful to optimize drug selection.

Fig 6. Identified protein complexome related to driver genes.

a) Dysregulated protein complexes with known driver and druggable genes. Red lines show the pairs with differential co-expression between the subunits of the protein complexes (5% level of significance). The thickness of each line is proportional to −log10(p-value). Blue lines are the non-significant pairs. Yellow nodes represent proteins whose expression was actually measured by LC/MS/MS in this study, and gray nodes represent proteins that were not measured. b) Examples of the VHL-TBP1-HIF1A complex and the PBAF complex with their co-expression structures. Blue and red represent the tumor and normal groups, respectively, and the density distribution of protein expression is shown on the diagonal. The lower triangle illustrates the co-expression pattern before the copula transformation, and the upper triangle illustrates the co-expression pattern after the copula transformation.

A close examination of the identified protein complexes allows us to partially understand how protein complexes are dysregulated. For example, in the VHL-TBP1-HIF1A complex, there was a co-expression abnormality between VHL and TBP1. The upregulation of TBP1 is known to induce dysregulation of downstream HIF1A molecules in a VHL-dependent manner (Corn et al., 2003). In fact, the protein expression of TBP1 increased in the tumor group. We also examined the PBAF complex containing the driver gene PBRM1, whose alteration is thought to occur following VHL abnormalities. Along with a decrease in PBRM1 protein expression, there was a tumor group-specific loss of co-expression structure among many subunits involved with PBRM1 levels.

4 Discussion

In this study, we developed an algorithm for robustly identifying protein complex aberrations based on differential co-expression structure using protein abundance. Protein expression data measured through LC/MS/MS contain a non-negligible percentage of outliers because of technical limitations and biological variation such as post-translational modifications. This causes over- (or under-) estimation of co-expression. The copula-based DC approach is a powerful statistical framework for addressing this problem.

In addition to noise robustness, the copula has several other key properties, not examined in this study, that are important in capturing the co-expression structure. The first is self-equitability (Chang et al., 2016; Ding et al., 2017). Copulas can capture nonlinear structures between variables, and self-equitability allows us to evaluate the degree of dependency equally between variables in linear and nonlinear relations. Therefore, the copula allows us to compare a much broader range of co-expression structures than conventional linear and nonlinear correlations.

Second, we can also model simultaneous co-expression structures among three or more proteins. Although this study only identified pairwise co-expression differences, equation (4) allows the identification of simultaneous co-expression differences across three or more proteins. However, high-dimensional estimation of the copula remains limited; at present, in our simulations, comparing the simultaneous co-expression structures of 15 proteins is the practical performance limit for about 100 samples.

As described, the copula-based co-expression analysis approach is a powerful modeling method for data sets where noise is expected, although there remain challenges in high-dimensional estimation. In particular, it could be useful for modeling proteome-wide protein expression patterns. The proposed approach is useful for understanding the abnormalities in the protein complexes of cancer. Studies focusing on protein complexes in large-scale cancer proteomics are in their infancy. We believe that this approach will provide valuable insights into the molecular mechanisms of cancer and the search for new drug targets.

Funding

This work was supported by the MEXT Program for Promoting Researches on the Supercomputer Fugaku and by JSPS KAKENHI Grant Numbers JP18H04899, JP18K18151, and JP20H04282.

Conflict of Interest

None declared.

Data Availability Statement

The data underlying this article are available in the CPTAC data portal and CORUM 3.0. The datasets were derived from sources in the public domain: https://cptac-data-portal.georgetown.edu and http://mips.helmholtz-muenchen.de/corum/.

Supplementary information

Supplementary data are available online.

Acknowledgements

The super-computing resource was provided by the Human Genome Center (the Univ. of Tokyo).

Footnotes

  • https://github.com/ymatts/RoDiCE

References

  1. Adamcsek, B. et al. (2006) CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 22(8), 1021–1023.
  2. Allaire, J., Francois, R., Ushey, K., Vandenbrouck, G., Geelnard, M. and Intel (2019) RcppParallel: Parallel Programming Tools for ‘Rcpp’. R package version 4.4.4.
  3. Amar, D. et al. (2013) Dissection of regulatory networks that are altered in disease via differential co-expression. PLoS Comput Biol, 9(3), e1002955.
  4. Bhuva, D.D. et al. (2019) Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer. Genome Biol, 20(1), 236.
  5. Chang, Y. et al. (2016) A robust-equitable copula dependence measure for feature selection. In: Gretton, A. and Robert, C.C. (eds), Proceedings of the 19th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, PMLR, pp. 84–92.
  6. Choi, Y. and Kendziorski, C. (2009) Statistical methods for gene set co-expression analysis. Bioinformatics, 25(21), 2780–2786.
  7. Clark, D.J. et al. (2019) Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell, 179(4), 964–983 e931.
  8. Corn, P.G. et al. (2003) Tat-binding protein-1, a component of the 26S proteasome, contributes to the E3 ubiquitin ligase function of the von Hippel–Lindau protein. Nat Genet, 35(3), 229–237.
  9. Ding, A.A. et al. (2017) A robust-equitable measure for feature ranking and selection. J Mach Learn Res, 18(1), 2394–2439.
  10. Fukushima, A. (2013) DiffCorr: an R package to analyze and visualize differential correlations in biological networks. Gene, 518(1), 209–214.
  11. Giurgiu, M. et al. (2019) CORUM: the comprehensive resource of mammalian protein complexes-2019. Nucleic Acids Res, 47(D1), D559–D563.
  12. Gunawardana, Y. et al. (2015) Outlier detection at the transcriptome-proteome interface. Bioinformatics, 31(15), 2530–2536.
  13. Hoadley, K.A. et al. (2018) Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell, 173(2), 291–304.e296.
  14. Hubert, M. et al. (2005) ROBPCA: A new approach to robust principal component analysis. Technometrics, 47(1), 64–79.
  15. Kerrigan, J.J. et al. (2011) Production of protein complexes via co-expression. Protein Expr Purif, 75(1), 1–14.
  16. Knijnenburg, T.A. et al. (2009) Fewer permutations, more accurate P-values. Bioinformatics, 25(12), i161–i168.
  17. Li, Q.K. et al. (2019) Challenges and opportunities in the proteomic characterization of clear cell renal cell carcinoma (ccRCC): A critical step towards the personalized care of renal cancers. Semin Cancer Biol, 55, 8–15.
  18. Liu, Y. et al. (2016) On the dependency of cellular protein levels on mRNA abundance. Cell, 165(3), 535–550.
  19. Mertins, P. et al. (2016) Proteogenomics connects somatic mutations to signalling in breast cancer. Nature, 534(7605), 55–62.
  20. Nelsen, R.B. (2010) An introduction to copulas. Springer Publishing Company, Incorporated.
  21. Nepusz, T. et al. (2012) Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods, 9(5), 471–472.
  22. Nusinow, D.P. et al. (2020) Quantitative proteomics of the cancer cell line encyclopedia. Cell, 180(2), 387–402.e316.
  23. Ori, A. et al. (2016) Spatiotemporal variation of mammalian protein complex stoichiometries. Genome Biol, 17(1), 47.
  24. Rémillard, B. and Scaillet, O. (2009) Testing for equality between two copulas. J Multivar Anal, 100(3), 377–386.
  25. Romanov, N. et al. (2019) Disentangling genetic and environmental effects on the proteotypes of individuals. Cell, 177(5), 1308–1318.e1310.
  26. Ryan, C.J. et al. (2017) A compendium of co-regulated protein complexes in breast cancer reveals collateral loss events. Cell Syst, 5(4), 399–409 e395.
  27. Seo, J. (2020) Randomization tests for equality in dependence structure. J Bus Econ Stat, 1–35.
  28. Srihari, S. et al. (2014) Complex-based analysis of dysregulated cellular processes in cancer. BMC Syst Biol, 8(4), S1.
  29. Villaseñor-Alva, J.A. and González-Estrada, E. (2009) A bootstrap goodness of fit test for the generalized Pareto distribution. Comput Stat Data Anal, 53(11), 3835–3841.
  30. Zhang, H. et al. (2016) Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell, 166(3), 755–765.
Posted December 23, 2020.