Multi-omic and multi-view clustering algorithms: review and cancer benchmark

Nimrod Rappoport; Ron Shamir

doi:10.1101/371120

Abstract

High throughput experimental methods developed in recent years have been used to collect large biomedical omics datasets. Clustering of such datasets has proven invaluable for biological and medical research, and helped reveal structure in data from several domains. Such analysis is often based on investigation of a single omic. The decreasing cost and development of additional high throughput methods now enable measurement of multi-omic data. Clustering multi-omic data has the potential to reveal further systems-level insights, but raises computational and biological challenges. Here we review algorithms for multi-omics clustering, and discuss key issues in applying these algorithms. Our review covers methods developed specifically for multi-omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types.

In addition, using cancer data from TCGA, we perform an extensive benchmark spanning ten different cancer types, providing the first systematic benchmark comparison of leading multi-omics and multi-view clustering algorithms. The results highlight several key questions regarding the use of single- vs. multi-omics, the choice of clustering strategy, the power of generic multi-view methods and the use of approximated p-values for gauging solution quality. Due to the rapidly increasing use of multi-omics data, these issues may be important for future progress in the field.

1 Introduction

Deep sequencing and other high throughput methods measure a large number of molecular parameters in a single experiment. The measured parameters include DNA genome sequence [1], RNA expression [2, 3], DNA methylation [4] etc. Each such kind of data is termed “omic” (genomics, transcriptomics, methylomics, respectively). As costs decrease and technologies mature, larger and more diverse omic datasets are available. Computational methods are imperative for analyzing such data. One fundamental analysis is clustering - finding coherent groups of samples in the data, such that samples within a group are similar, and samples in different groups are dissimilar [5]. This analysis is often the first step done in data exploration. Clustering has many applications for biomedical research, such as discovering modules of co-regulated genes and finding subtypes of diseases in the context of precision medicine [6]. Clustering is a highly researched computational problem, investigated by multiple scientific communities, and a myriad algorithms exist for this task.

While clustering each omic separately reveals patterns in the data, integrative clustering using several omics for the same set of samples has the potential to expose more fine-tuned structures that are not revealed by examining only a single data type. For example, cancer subtypes can be defined based on both gene expression and DNA methylation together. Multi-omics clustering can also reduce the effect of experimental and biological noise in the data, and find structures that involve different cellular mechanisms.

A problem akin to multi-omics clustering was investigated independently by the machine learning community, and is termed “multi-view clustering” [7, 8]. Multi-view clustering algorithms can be used to perform clustering of multi-omic data. In the past, methods developed within the machine learning community have proven useful in the analysis of biomedical datasets. However, by and large, multi-view clustering has not yet penetrated bioinformatics.

In this paper, we review methods for multi-omics clustering, and benchmark them on real cancer data. The data source is TCGA (The Cancer Genome Atlas) [9] - a large multi-omic repository of data on thousands of cancer patients. We survey both multi-omics and multi-view methods, with the goal of exposing computational biologists to these algorithms. Throughout this review, we use the terms view and multi-view instead of omic and multi-omics in the context of Machine Learning algorithms.

Several recent reviews discussed multi-omics integration. [10], [11] and [12] review methods for multi-omics integration, and [13] reviews multi-omics clustering for cancer applications. These reviews do not include a benchmark, and do not focus on multi-view clustering. [14] reviews only dimension reduction multi-omics methods. To the best of our knowledge, [15] is the only benchmark performed for multi-omics clustering, but it does not include machine learning methods. Furthermore, we believe the methods tested in that benchmark do not represent the current state of the art for multi-omics clustering. Finally, [7] is a thorough review of multi-view methods, directed to the Machine Learning community. It does not discuss algorithms developed by the bioinformatics community, and does not cover biological applications.

2 Review of multi-omics clustering methods

We divide the methods into several categories based on their algorithmic approach. Early integration is the simplest approach. It concatenates omic matrices to form a single matrix with features from multiple omics, and applies single-omic clustering algorithms to that matrix. In late integration, each omic is clustered separately and the clustering solutions are integrated to obtain a single clustering solution. Other approaches try to build a model that incorporates all omics, and are collectively termed intermediate integration. Those include: (1) methods that integrate sample similarities, (2) methods that use joint dimension reduction for the different omics datasets, and (3) methods that use statistical modeling of the data.

The categories we present here are not clear-cut, and some of the algorithms presented fit into more than one category. For example, iCluster [16] is an early integration approach that also uses probabilistic modeling to project the data to a lower dimension. The algorithms are described in the categories where we consider them to fit most.

Multi-omics clustering algorithms can also be distinguished by the set of omics that they support. General algorithms support any kind of omics data, and are therefore easily extensible to novel future omics. Omic-specific algorithms are tailored to a specific combination of data types, and can therefore utilize known biological relationships (e.g. the correlation between copy number and expression). A mixture of these two approaches is to perform feature learning in an omic-specific way, but then cluster those features using general algorithms. For example, one can replace a gene expression omic with an omic that scores expression in cellular pathways, and thus take advantage of existing biological knowledge.

Throughout this review, we use the following notation: a multi-omic dataset contains M omics. n is the number of samples (or patients for medical datasets), p_m is the number of features in the m’th omic, and X^m is the n x p_m matrix with measurements from the m’th omic. X^m_ij is therefore the value of the j’th feature for the i’th patient in the m’th omic. p = Σ_m p_m is the total number of features, and X is the n x p matrix formed by the concatenation of all X^m matrices.

Figure 1 summarizes pictorially the different approaches to multi-omics clustering. A summary table of the methods reviewed here is given in Table 1.

Figure 1: Overview of multi-omics clustering approaches.

Table 1: Multi-omic clustering methods. DR: dimension reduction; EM: expectation maximization; MV: multi-view; NMF: non-negative matrix factorization. •: methods included in the benchmark. Single-omic k-means and spectral clustering were also included in the benchmark.

2.1 Alternate optimization

Early research for integration of two views was performed in [77]. This work improved classification accuracy for semi-supervised data with two views using an approach termed co-training, and inspired others to analyze multi-view data. One of the first attempts to perform multi-view clustering was [17]. In this work, EM and k-means, which are widely used single-omic clustering algorithms, were adapted for multi-view clustering. Both EM and k-means are iterative algorithms, where each iteration improves the objective function value. The suggested multi-view versions perform optimization in each iteration with respect to a different omic in an alternating manner. This approach loses theoretical guarantees for convergence, but was found to outperform algorithms that use each view separately, and also algorithms that cluster the concatenated matrix of the two views. Interestingly, [17] report improved results using the multi-view clustering algorithms on single-view datasets that were randomly split to simulate multi-view data. This was the first evidence for improved clustering using multiple views, and for the utility of a multi-view algorithm in clustering single-view data.

2.2 Early integration

Early integration is an approach that first concatenates all omic matrices, and then applies single-omic clustering algorithms on that concatenated matrix. It therefore enables the use of existing clustering algorithms. However, this approach has several drawbacks. First, without proper normalization, it may give more weight to omics with more features. Second, it does not consider the different distribution of data in the different omics. Finally, it increases the data dimension (the number of features), which is a challenge even in some single-omic datasets.
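
The first drawback above can be illustrated with a minimal sketch of early integration that includes one possible normalization. The function name `early_integration` and the specific scaling (z-scoring every feature and dividing each omic by the square root of its number of features) are assumptions made for illustration, not a procedure prescribed by any of the reviewed methods.

```python
import numpy as np

def early_integration(omics):
    """Concatenate omic matrices (each n x p_m) into a single n x p matrix.

    Assumed normalization: every feature is z-scored, and every omic is
    divided by sqrt(p_m), so each omic contributes the same total variance
    regardless of how many features it has.
    """
    blocks = []
    for X in omics:
        X = np.asarray(X, dtype=float)
        std = X.std(axis=0)
        std[std == 0] = 1.0                      # constant features become all zeros
        Z = (X - X.mean(axis=0)) / std           # z-score each feature
        blocks.append(Z / np.sqrt(X.shape[1]))   # down-weight omics with many features
    return np.hstack(blocks)
```

Any single-omic clustering algorithm can then be run on the returned matrix. Without the division by sqrt(p_m), an omic with ten times more features would carry ten times more weight in Euclidean distances.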

One way to handle the high dimension of the data is by using regularization, i.e., adding additional constraints to a problem to avoid overfitting [78]. Specifically, LASSO regularization creates models where the number of features with non-zero effect on the model is low [79], and regularization of the nuclear norm is often used to induce sparsity in the singular values of a matrix, i.e. low rank. Indeed, LASSO regularization is used by iCluster [16] (reviewed in a later section), and LRAcluster uses nuclear norm regularization (reviewed in this section). While any clustering algorithm can be applied using early integration, we highlight here algorithms that were specifically developed for this task.

LRAcluster [18] uses a probabilistic model, where numeric, count and binary features have distributions determined by a latent representation of the samples Θ. For example, the distribution of X^m_ij is parametrized by Θ^m_ij, where Θ^m is of the same dimensions as X^m. The latent representation matrix is encouraged to be of low rank by adding a regularization term on its nuclear norm. The objective function for the algorithm is −log(model’s likelihood) + μ · |Θ|_*, where Θ is the concatenation of all Θ^m matrices, and |·|_* is the nuclear norm. This objective is convex and therefore has a globally optimal solution, which is found using a fast gradient-ascent algorithm. Θ is subsequently clustered using k-means. This method was used to analyze pan-cancer TCGA data from eleven cancer types using four different omics, and to further find subtypes within these cancer types.

In [19], all omics are concatenated to a matrix X and the algorithm minimizes the following objective: ||XW + 1_n b^t − F||_F^2 + γ||W||_G₁. W is a p x k projection matrix, F is an n x k cluster indicator matrix such that F^tF = I_k, 1_n is a column vector of length n of 1’s, b is an intercept column vector of dimension k and γ is a scalar. The algorithm therefore seeks a linear transformation such that the projected data are as close to a cluster indicator matrix as possible. That indicator matrix is subsequently used for clustering. The regularization term uses the G₁ norm, which is the l₂ norm for W entries associated with a specific cluster and view, summed over all views and clusters. Therefore, features that do not contribute to the structure of a cluster will be assigned low coefficients in W.

2.3 Late integration

Late integration is another approach that allows the use of existing single-omic clustering algorithms. First, each omic is clustered separately using a single-omic algorithm. Different algorithms can be used for each omic. Then, the different clusterings are integrated. The strength of late integration lies in that any clustering algorithm can be used for each omic. Algorithms that are known to work well on a particular omic can therefore be used, without having to create a model that unifies all of these algorithms. However, by utilizing only clustering solutions in the integration phase, we can lose signals that are weak in each omic separately.

COCA [20] was applied to pan-cancer TCGA data, to investigate how tumors from different tissues cluster, and whether the obtained clusters match the tissue of origin. The algorithm first clusters each omic separately, such that the m’th omic has c_m clusters. The clustering of sample i for omic m is encoded in a binary vector υ_im of length c_m, where υ_im(j) = 1 if i belongs to cluster j and 0 otherwise. The concatenation of the υ_im vectors across all omics results in a binary cluster indicator vector for sample i. The n x c binary matrix B of these indicator vectors, where c = Σ_m c_m, is used as input to consensus clustering [80] to obtain the final clustering of the samples. Alternatively, in [21] a model based on Probabilistic Latent Semantic Analysis [81] was proposed for clustering B.
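
The construction of the indicator matrix B can be sketched as follows. This is an illustration of the encoding step only; the function name `coca_indicator_matrix` is hypothetical, and the per-omic clusterings are assumed to be given as integer label vectors. The consensus clustering step that COCA applies to B is not shown.

```python
import numpy as np

def coca_indicator_matrix(labelings):
    """Build the n x c binary matrix B from M per-omic clusterings.

    labelings: list of M label arrays of length n; the m'th array assigns
    each sample to one of c_m clusters. The result has c = sum_m c_m columns.
    """
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        clusters = np.unique(labels)                       # the c_m cluster ids of this omic
        one_hot = labels[:, None] == clusters[None, :]     # n x c_m indicator block
        blocks.append(one_hot.astype(int))
    return np.hstack(blocks)
```

Each row of the result contains exactly M ones, one per omic.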

PINS [22] integrates clusters by examining their connectivity matrices for the different omics. Each such matrix S^m is a binary n x n matrix, where S^m_ij = 1 if patients i and j are clustered together in omic m, and 0 otherwise. These S^m matrices are averaged to obtain a single connectivity matrix, which is then clustered using different methods based on whether the different S^m matrices highly agree with each other or not. The obtained clusters are then tested to determine whether they can be further split into smaller clusters. To determine the number of clusters for each omic and for the integrated clustering, perturbations are performed on the data by adding Gaussian noise to it, and the number of clusters is chosen such that the resulting clustering is robust to the perturbations.
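
The connectivity matrices and their average can be sketched as below. Only this first step is illustrated; the function names are hypothetical, and the perturbation analysis and the subsequent clustering of the averaged matrix used by PINS are not reproduced here.

```python
import numpy as np

def connectivity_matrix(labels):
    """Binary n x n matrix with entry (i, j) = 1 iff samples i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def average_connectivity(labelings):
    """Average the per-omic connectivity matrices into one n x n matrix."""
    return np.mean([connectivity_matrix(labels) for labels in labelings], axis=0)
```

Entries equal to 0 or 1 in the averaged matrix indicate pairs of samples on which all omics agree, while intermediate values indicate disagreement between omics.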

Several methods for ensemble clustering were developed over the years, and are reviewed in [82]. While these were not originally developed for this purpose, they can be used for late multi-omics clustering as well.

2.4 Similarity-based methods

Similarity-based methods use similarities or distances between samples in order to cluster data. These methods compute the similarities between samples in each omic separately, and vary in the way these similarities are integrated. The integration step uses only similarity values. Since in current multi-omic datasets, the number of samples is much smaller than the number of features, these algorithms are usually faster than methods that consider all features while performing integration. However, in such methods it may be more difficult to interpret the output in terms of the original features. An additional advantage of similarity-based methods is that they can easily support diverse omic types, including categorical and ordinal data. Each omic only requires a definition of a similarity measure.

2.4.1 Spectral clustering generalizations

Spectral clustering [83] is a widely used similarity-based method for clustering single-view data. The objective function for single-view spectral clustering is max_U trace(U^tLU) s.t. U^tU = I, where L is the Laplacian [84] of the similarity matrix of dimension n x n, and U is of dimension n x k, where k is the number of clusters in the data. Intuitively, it means that samples that are similar to one another have similar row vectors in U. This problem is solved by taking the k first eigenvectors of L (details vary between versions that use the normalized and the unnormalized graph Laplacian), and clustering them with a simple algorithm such as k-means. The spectral clustering objective was shown to be a relaxation of the discrete normalized cut in a graph, providing an intuitive explanation for the clustering. Several multi-view clustering algorithms are generalizations of spectral clustering.
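
The embedding step of single-view spectral clustering can be sketched as follows. The sketch assumes one common convention, in which L is taken to be the symmetrically normalized similarity matrix D^(-1/2) W D^(-1/2) (D being the diagonal degree matrix), so that the objective is a maximization solved by the eigenvectors with the k largest eigenvalues. Other versions use a different Laplacian and the smallest eigenvalues instead. The function name `spectral_embedding` is hypothetical, and every sample is assumed to have positive degree.

```python
import numpy as np

def spectral_embedding(W, k):
    """Return the n x k matrix U maximizing trace(U^t L U) s.t. U^t U = I.

    Assumption: L = D^(-1/2) W D^(-1/2), where W is a symmetric
    non-negative similarity matrix with positive row sums.
    """
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    return eigvecs[:, -k:]                  # eigenvectors of the k largest eigenvalues
```

The rows of the returned matrix are then clustered with a simple algorithm such as k-means. In the idealized case of a similarity matrix with k disconnected groups of mutually identical samples, rows of U belonging to the same group coincide.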

An early extension to two views performs clustering by computing a new similarity matrix, using the two views’ similarities [23]. Denote by W₁ and W₂ the similarity matrices for the two views. Then the integrated similarity, W, is defined as W₁W₂. Spectral clustering is performed on the 2n x 2n block matrix [0, W; W^t, 0], whose first block row is (0, W) and whose second block row is (W^t, 0).

Note that each eigenvector of this matrix is of length 2n. Either one half of each eigenvector, or the average of its two halves, is used instead of the whole eigenvector for clustering with k-means.

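[24] generalizes spectral clustering for more than two views. Instead of finding a global U matrix, a matrix U^m is defined for each omic. The optimization problem is:

max_{U^1, …, U^M} Σ_m trace((U^m)^t L^m U^m) + λ · Reg s.t. (U^m)^t U^m = I for every m

L^m is the graph Laplacian for omic m, λ is a scalar weight, and Reg is a regularization term equal to either Σ_{m1≠m2} trace(U^{m1} (U^{m1})^t U^{m2} (U^{m2})^t) or Σ_m trace(U^m (U^m)^t U^* (U^*)^t), with the additional constraint in the latter case that U^* is an n x k matrix such that (U^*)^t U^* = I.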

The first regularization allows each omic to have a different low rank U^m representation, but requires that these representations are close to each other. The second regularization requires that the U^m matrices are close to a consensus matrix U^*. Each of the U^m matrices, or U^*, can then be used for clustering.

[25] uses a different formulation, which does not require a different U^m for each omic, but instead uses the same U for all matrices. The following objective function is used: max_U Σ_m trace(U^tL^mU) s.t. U^tU = I.

This is equivalent to performing spectral clustering on the Laplacian Σ_mL^m. The obtained clusters are then further improved in a greedy manner, by changing the assignment of samples to clusters, while looking directly at the discrete normalized cut objective, rather than the continuous spectral clustering objective.
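
The stated equivalence follows from the linearity of the trace. Assuming the objective is the sum of the per-omic spectral clustering objectives with a shared U (an assumption consistent with the summed Laplacian mentioned above), a short derivation is:

```latex
\sum_{m=1}^{M} \operatorname{trace}\left(U^{t} L^{m} U\right)
  = \operatorname{trace}\left(\sum_{m=1}^{M} U^{t} L^{m} U\right)
  = \operatorname{trace}\left(U^{t} \Big(\sum_{m=1}^{M} L^{m}\Big) U\right),
\qquad \text{s.t. } U^{t} U = I .
```

The right-hand side is the single-view objective applied to the matrix Σ_m L^m, so the same eigenvector-based solution applies.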

[26] suggests a runtime improvement over [24]. Instead of looking at the similarity matrix for all the samples, a small set of “representative” vectors, termed salient points, is calculated by running k-means on the concatenation of all omics and selecting the cluster centers. A similarity matrix is then computed between all samples in the data and their s nearest salient points. Denote this similarity matrix for the m’th omic by W^m, and let Z^m be its normalization such that rows sum to 1. These matrices are of dimension n x (the number of salient points). Next, the matrices are given as input to an algorithm with the same objective as [25]. This way, similarities are not computed between all pairs of samples.

[?] views similarity matrices as networks, and examines random walks on these networks. Random walks define a stationary distribution on each network, which captures its similarity patterns [85]. Since that stationary distribution is less noisy than the original similarity measures, [?] uses them instead to integrate the networks. [27] also examines random walks on the networks, but argues that the stationary distribution in each network can still be noisy. Instead, the authors compute a consensus transition matrix, that has minimum total distance to the per-omic transition matrices and is of minimal rank.

2.4.2 Similarity Network Fusion

SNF (Similarity Network Fusion) first constructs a similarity network for every omic separately [28]. In each such network, the nodes are samples, and the edge weights measure the sample similarity. The networks are then fused together using an iterative procedure based on message passing [86]. The similarity between samples is propagated between each node and its k nearest neighbors.

More formally, denote by W^(m) the similarity matrix for the m’th omic. Initially, a transition probability matrix between all samples is defined by P^(m)(i, j) = W^(m)(i, j) / (2 Σ_{l≠i} W^(m)(i, l)) for j ≠ i and P^(m)(i, i) = 1/2, and a transition probability matrix between nearest neighbors is defined by S^(m)(i, j) = W^(m)(i, j) / Σ_{l∈N_i} W^(m)(i, l) for j ∈ N_i and S^(m)(i, j) = 0 otherwise, where N_i are i’s k nearest neighbors in the input X^m matrices. The P matrices are updated iteratively using message passing between the nearest neighbors: P^(m)_{t+1} = S^(m) · (Σ_{q≠m} P^(q)_t / (M − 1)) · (S^(m))^t, where P^(m)_t is the matrix for omic m at iteration t. This process converges to a single similarity network, summarizing the similarity between samples across all omics. This network is partitioned using spectral clustering.
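
The fusion procedure can be sketched in a few lines. The sketch below follows the commonly stated form of SNF: a full kernel P with 1/2 on the diagonal, a sparse kernel S restricted to each sample's k nearest neighbors, and the update P^(m) ← S^(m) · (average of the other omics' P) · (S^(m))^t. Several points are assumptions made for brevity: neighbors are chosen by largest similarity and exclude the sample itself, all similarities are assumed positive, and the normalization steps used by published implementations between iterations are omitted. Function names are hypothetical.

```python
import numpy as np

def full_kernel(W):
    """P(i, j) = W(i, j) / (2 * sum_{l != i} W(i, l)) for j != i, and P(i, i) = 1/2."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))                 # ignore self-similarity
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def knn_kernel(W, k):
    """S(i, j) = W(i, j) / sum_{l in N_i} W(i, l) if j is in N_i, and 0 otherwise."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(-W[i])                 # most similar samples first
        neighbors = order[order != i][:k]         # assumption: exclude the sample itself
        S[i, neighbors] = W[i, neighbors]
    return S / S.sum(axis=1, keepdims=True)

def snf(similarities, k=3, n_iter=20):
    """Fuse M >= 2 similarity matrices (each n x n) into one symmetric n x n matrix."""
    M = len(similarities)
    P = [full_kernel(W) for W in similarities]
    S = [knn_kernel(W, k) for W in similarities]
    for _ in range(n_iter):
        P_next = []
        for m in range(M):
            # average of the current P matrices of all other omics
            others = sum(P[q] for q in range(M) if q != m) / (M - 1)
            P_next.append(S[m] @ others @ S[m].T)
        P = P_next
    fused = sum(P) / M
    return (fused + fused.T) / 2.0                # symmetrize before spectral clustering
```

The fused matrix can be passed to any similarity-based clustering algorithm. Because each omic is updated only through its own nearest-neighbor kernel, weak similarities that are not supported by local structure tend to be attenuated.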

In [28], SNF is used on gene expression, methylation and miRNA expression data for several cancer subtypes from TCGA. In addition to partitioning the graph to obtain cancer subtypes, the authors show that the fused network can be used for other computational tasks. For example, they show how to fit Cox proportional hazards [87], a model that predicts prognosis of patients, with a constraint such that similar patients in the integrated network will have similar predicted prognosis.

2.4.3 Multiple Kernel Learning

Kernel functions implicitly map samples to a high (possibly infinite) dimension, and can efficiently measure similarity between the samples in that dimension. Multiple kernel learning uses several kernels (similarity measures), and is often used in supervised analysis. [29] developed rMKL-LPP, which uses multiple kernel learning in unsupervised settings. The algorithm performs dimension reduction on the input omics such that similarities (defined using multiple kernels) between each sample and its nearest neighbors are maintained in low dimension. This representation is subsequently clustered with k-means. rMKL-LPP allows the use of diverse kernel functions, and even multiple kernels per omic. A regularization term is added to the optimization problem to avoid overfitting. The authors run the algorithm on five cancer types from TCGA, and show that using multiple kernels per omic improves the prognostic value of the clustering, and that regularization improves robustness.

2.5 Dimension reduction-based methods

Dimension reduction-based methods assume the data have an intrinsic low dimensional representation, with the dimension often corresponding to the number of clusters. The views that we observe are all transformations of that low dimensional data to a higher dimension, and the parameters for the transformation differ between views. This general formulation was proposed by [30], which suggests minimizing Σ_m w_m · l(X^m, f_m(B)), where B is a matrix of dimension n x p, f_m are the parametrized transformations, w_m are weights for the different views, and l is a loss function. The work further provides an optimization algorithm when the f_m transformations are given by matrix multiplication. That is, f_m(B) = BP^m, and l is the squared Frobenius norm applied to X^m − BP^m. Once B is calculated, a single-omic clustering algorithm can be applied to it. This general framework is widely used. Since the transformation is often assumed to be linear, many of the dimension reduction methods are based on matrix factorization. Dimension reduction methods work with real-valued data. Applying these methods to discrete binary or count data is technically possible but often inappropriate.
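
For the linear case, f_m(B) = BP^m with a squared Frobenius loss, the weighted objective Σ_m w_m ||X^m − BP^m||_F^2 can be minimized by alternating least squares. The sketch below is a generic alternating scheme written for illustration; it is not claimed to be the optimization algorithm of [30], and the function name, the random initialization and the fixed iteration count are assumptions.

```python
import numpy as np

def joint_factorization(omics, weights, k, n_iter=50, seed=0):
    """Alternating least squares for sum_m w_m * ||X^m - B P^m||_F^2.

    omics: list of n x p_m arrays; weights: list of positive scalars w_m;
    k: dimension of the shared representation B (n x k).
    Returns B, the list of P^m (k x p_m) and the objective after each iteration.
    """
    rng = np.random.default_rng(seed)
    omics = [np.asarray(X, dtype=float) for X in omics]
    B = rng.standard_normal((omics[0].shape[0], k))
    history = []
    for _ in range(n_iter):
        # P-step: with B fixed, each P^m solves an ordinary least squares problem
        Ps = [np.linalg.lstsq(B, X, rcond=None)[0] for X in omics]
        # B-step: with all P^m fixed, B solves B (sum_m w_m P^m P^mt) = sum_m w_m X^m P^mt
        lhs = sum(w * P @ P.T for w, P in zip(weights, Ps))
        rhs = sum(w * X @ P.T for w, X, P in zip(weights, omics, Ps))
        B = np.linalg.lstsq(lhs, rhs.T, rcond=None)[0].T   # lhs is symmetric
        history.append(sum(w * np.linalg.norm(X - B @ P) ** 2
                           for w, X, P in zip(weights, omics, Ps)))
    return B, Ps, history
```

Each step minimizes the objective over one block of variables while the other is held fixed, so the recorded objective values do not increase. The rows of B can then be clustered with a single-omic algorithm such as k-means.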

2.5.1 JIVE

[31] assumes that the variation in each omic can be partitioned into a variation that is joint between all omics, and an omic-specific variation: X^mt = J^m + A^m + E^m, where E^m are error terms. Let J and A be the concatenated J^m and A^m matrices, respectively. The model assumes that JA^t = 0, that is, the joint and omic-specific variations are uncorrelated, and that rank(J) = r and rank(A^m) = r_m for each omic, so that the structure of each omic and the total joint variation are of low rank. In order for the weight of the different omics to be equal, the input omic matrices are normalized to have equal Frobenius norm. A penalty term is added to encourage variable sparsity. This method was applied to gene expression and miRNA data of Glioblastoma Multiforme brain tumors, and identified the joint variation between these omics.

2.5.2 Correlation and covariance-based

Two of the most widely used dimension reduction methods are Canonical Correlation Analysis (CCA) [33] and Partial Least Squares (PLS) [45]. Given two omics X¹ and X², in CCA the goal is to find two projection vectors u¹ and u² of dimensions p₁ and p₂, such that the projected data has maximum correlation: argmax_{u¹, u²} corr(X¹u¹, X²u²).

These projections are called the first canonical variates, and are the axes with maximal correlation between the omics. The k’th pair of canonical variates, u¹_k and u²_k, are found such that the correlation between X¹u¹_k and X²u²_k is maximal, given that the new pair is uncorrelated (that is, orthogonal) to the previous canonical variates. [88] proved and showed empirically that if the data originate from normal or log concave distributions, the canonical variates can be used to cluster the data. CCA was formulated in a probabilistic framework such that the optimization solutions are maximum likelihood estimates [89], and further extended to a Bayesian framework [34]. An additional expansion to perform CCA in high dimension is Kernel CCA [35]. A deep learning-based CCA method, DeepCCA, was recently developed [36]. Rather than maximize the correlation between linear projections of the data, the projections are taken to be functions of the data calculated using neural networks, and the optimization process optimizes the parameters for these networks.

Solving CCA requires inversion of the covariance matrix for the two omics. Omics data usually have a higher number of features than samples, and these matrices are therefore not invertible. To apply CCA to omics data, and to increase the interpretability of CCA’s results, sparsity regularization was added [37, 38].
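
The role of the covariance inversion can be seen in a minimal sketch of the first pair of canonical vectors. To keep the covariance matrices invertible when features outnumber samples, the sketch adds a small ridge term to their diagonals. This ridge is a substitute chosen for simplicity and is not the sparsity regularization of [37, 38]; the function name and the default ridge value are assumptions.

```python
import numpy as np

def _inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def first_canonical_pair(X1, X2, ridge=1e-3):
    """First canonical vectors u1, u2 of ridge-regularized CCA.

    Returns u1 (length p1), u2 (length p2) and the regularized correlation.
    """
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1.T @ X1 / (n - 1) + ridge * np.eye(X1.shape[1])   # regularized covariances
    C22 = X2.T @ X2 / (n - 1) + ridge * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / (n - 1)                                 # cross-covariance
    R1, R2 = _inv_sqrt(C11), _inv_sqrt(C22)
    A, s, Bt = np.linalg.svd(R1 @ C12 @ R2)                   # whitened cross-covariance
    return R1 @ A[:, 0], R2 @ Bt[0], s[0]
```

The projections X¹u¹ and X²u² are the first canonical variates. Further pairs are obtained from the subsequent singular vectors in the same way.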

CCA supports only two views. Several works extend it to more than two views, including MCCA [38] which maximizes the sum of pairwise correlations between projections and CCA-RLS [39]. [40] generalize CCA to tensors in order to support more than two views.

Another line of work on CCA, with high relevance for omics data, investigated relationships between the features while performing the dimension reduction. ssCCA (structure constrained sparse CCA) allows incorporating into the model known relationships between features in one of the input omics, and forces entries in the uⁱ vector for that view to be close for similar features. This model was developed by [41], who used the microbiome’s phylogeny as the feature structure. Another model that considers relationships between features was developed in [42]. In this work, rather than defining similarities between features, the features are partitioned into groups. Regularization is performed such that both irrelevant groups and irrelevant features within relevant groups are removed from the model. Finally, [43] extended CCA to support count data, which are common in biological datasets.

PLS also follows a linear dimension reduction model, but maximizes the covariance between the projections, rather than the correlation. More formally, given two omics X¹ and X², PLS computes a sequence of vectors u¹_k and u²_k for k = 1, 2, … such that cov(X¹u¹_k, X²u²_k) is maximal, given that (u¹_k)^t u¹_k = 1, (u²_k)^t u²_k = 1, and cov(X¹u¹_k, X¹u¹_l) = 0 for l < k. That is, new projections are not correlated with previous ones. PLS can be applied to data with more features than samples even without sparsity constraints. A sparse solution is nonetheless desirable, and one was developed [46, 47]. O2-PLS increases the interpretability of PLS by partitioning the variation in the datasets into joint variation between them, and variations that are specific for each dataset and that are not correlated with one another [48]. While PLS and O2-PLS were originally developed for chemometrics, they were recently used for omics data as well [90, 91]. PLS was also extended to use the kernel framework [49], and a combined version of kernel PLS and O2-PLS was developed [50].
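
For the first pair of PLS vectors, maximizing the covariance of the projections under unit-norm constraints reduces to a singular value decomposition of the cross-covariance matrix, which requires no matrix inversion. The sketch below illustrates only this first pair; the deflation steps that produce later pairs differ between PLS variants and are not shown. The function name is hypothetical.

```python
import numpy as np

def first_pls_pair(X1, X2):
    """Unit vectors u1, u2 maximizing cov(X1 u1, X2 u2), and the attained covariance."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C = X1.T @ X2 / (X1.shape[0] - 1)               # p1 x p2 cross-covariance matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, 0], Vt[0], s[0]                     # top singular vectors and value
```

Because only the cross-covariance is decomposed, the computation is well defined even when the number of features exceeds the number of samples, in line with the remark above.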

Like CCA, PLS was developed for two omics. MBPLS (Multi Block PLS) extends the model to more than two omics [92], and sMBPLS adds sparsity constraints. sMBPLS was developed specifically for omics data [51]. It looks for a linear combination of projections of non-gene-expression omics that has maximal correlation with a projection of the gene expression omic. An extension of O2-PLS also exists for multi-view datasets [52].

An additional method that is based on maximizing covariance in low dimension is MCIA [53], an extension of co-inertia analysis to more than two omics [93]. It aims to find projections for all the omics such that the sum of squared covariances with a global variation axis is maximal: argmax_{u^1, …, u^M, v} Σ_m cov²(X^m u^m, v), where u^m is the projection vector for omic m and v is the global variation axis. The projections of different omics can be used to evaluate the agreement between the different omics (the distance between projections reflects the level of disagreement between omics). Each of the projections can be used as a representation for clustering.

2.5.3 Non-negative Matrix Factorization

Non-negative Matrix Factorization (NMF) assumes that the data have an intrinsic low dimensional non-negative representation, and that a non-negative matrix projects it to the observed omic [94]. It is therefore only suitable for non-negative data. For a single omic, denote by k the low dimension. The formulation is X ≈ WH, where X is the n x p observed omic matrix, W is n x k and H is k x p. The objective function is ||X − WH||_F^2, and it is minimized by updating W and H in an alternating manner, using multiplicative update rules, such that solutions remain non-negative after each update [95]. The low dimensional representation W can be clustered using a simple single-omic algorithm.
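
The alternating multiplicative updates can be sketched as follows, using the standard update rules for the squared Frobenius objective. The function name, the random initialization, the fixed number of iterations and the small constant `eps` that guards against division by zero are assumptions made for this illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Factorize a non-negative n x p matrix X as W H with W (n x k), H (k x p) >= 0."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    errors = []
    for _ in range(n_iter):
        # each factor is multiplied by a non-negative ratio, so it stays non-negative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H) ** 2)   # squared Frobenius error
    return W, H, errors
```

Since the objective is not convex, the result depends on the initialization, and in practice several random restarts are often compared.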

Several methods generalize this model to multi-omic data. MultiNMF [54] uses the following generalization: Each omic X^m is factorized into W^mH^m. By itself, this model is equivalent to performing NMF on each omic separately. Integration between the omics is done by adding a constraint that the W^m matrices are close to a “consensus” matrix W^*. The objective function is therefore: Σ_m ||X^m − W^mH^m||_F^2 + λ Σ_m ||W^m − W^*||_F^2. [55] generalizes this method to support weights for features’ and samples’ similarity. [56] extend MultiNMF by further requiring that the low dimensional representation W^* maintains similarities between samples (samples that are close in the original dimension must be close in W^*). This approach combines factorization and similarity-based methods.

Joint NMF [57] uses a different formulation, where a sample has the same low dimensional representation for all omics: X^m ≈ WH^m. Note that by writing X = WH where X and H are obtained by matrix concatenation, this model is equivalent to early integration. Joint NMF is not directly used for clustering. Rather, the data are reduced to a large dimension (k = 200) and high values in W and H^m are used to associate samples and features with modules that are termed “md-modules”. The authors applied Joint NMF to miRNA, gene expression and methylation data from ovarian cancer patients, and showed that functional enrichment among features associated with md-modules is more significant than the enrichment obtained in single-omic modules. In addition, patients in certain modules have significantly different prognosis compared to the rest of the patients. Much like [56] extends MultiNMF, [58] extends Joint NMF such that similarities in the original omics are maintained in lower dimension. [59] extends NMF to the case where different views can contain different samples, but constrains certain samples from different views to belong to the same cluster based on prior knowledge. Finally, PVC [60] performs partial multi-view clustering. In this setting, not all samples necessarily have measurements for all views.

2.5.4 Matrix tri-factorization

An alternative factorization approach presented in [61] is tri-matrix factorization. In this framework, each input omic is viewed as describing a relationship between two entities, which are its rows and columns. For example, in a dataset with two omics, gene expression and DNA methylation of patients, there are three entities which are the patients, the genes and the CpG loci. The gene expression matrix describes a relationship between patients and genes, while the methylation matrix describes a relationship between patients and CpG loci.

Each omic matrix R_ij of dimension n_i x n_j that describes the relationship between entities i and j is factorized as R_ij ≈ G_i S_ij G_j^t, where G_i and G_j provide a low dimensional representation for entities i and j respectively and are of dimensions n_i x k_i and n_j x k_j, and S_ij is an omic-specific matrix of dimension k_i x k_j. As in NMF, the G_i matrices are non-negative. The same G_i matrix is used in all omics with entity i, and in this way data integration is achieved. In the above example, both the gene expression and DNA methylation omics will use the same G matrix to represent patients, but different matrices to represent genes and CpG loci. In this model, an additional matrix describing the relationship between genes and CpGs could optionally be used. [61] adds constraints to the formulation that can encourage entities to have similar representations. This framework was applied to diverse problems in bioinformatics, including in supervised settings: it was used to perform gene function prediction [61], and for patient survival regression [96].

2.5.5 Convex formulations

A drawback of most factorization-based methods is that their objective functions are not convex, and therefore optimization procedures do not necessarily reach a global optimum, and highly depend on initialization. One solution to this issue is to formulate dimension reduction as a convex problem. [62] relaxes CCA’s conditions and defines a convex variant of it. Performance was assessed on reducing noise in images, but the method can also be used for clustering. However, like CCA, the method only supports two views. [63] present a different convex formulation for dimension reduction, for the general factorization framework presented earlier, whose objective uses the l_2,1 norm, namely the sum of the Euclidean norms of the matrix rows. This relaxation therefore supports multiple views. LRAcluster [18] also uses matrix factorization and has a convex objective function.
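For reference, the l_2,1 norm mentioned above can be computed in a few lines. This is a generic sketch of the norm itself, not of the optimization in [63]; the function name `l21_norm` and the toy matrix are illustrative.

```python
import math


def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (l_2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


M = [[3.0, 4.0],    # row norm 5
     [0.0, 0.0],    # row norm 0
     [5.0, 12.0]]   # row norm 13

print(l21_norm(M))  # 18.0
```

An all-zero row contributes nothing to the norm, so penalizing it tends to zero out entire rows rather than individual entries.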

2.5.6 Tensor-based methods

A natural extension of factorization methods for multi-omic data is to use tensors, which are higher order matrices. One such method is developed in [64]. This method writes each omic matrix as X^m = Z^mX^m + E^m, diag(Z^m) = 0, where Z^m is an n x n matrix and E^m are error matrices. The idea is that each sample in each omic can be represented as a linear combination of other samples (hence the diag(Z^m) = 0 constraint), and that its representation in that basis (Z^m) can then be used for clustering. To integrate the different views, the different Z^m matrices are merged into a third-order tensor, Z. The objective function encourages Z to be sparse, and the E^m error matrices to have a small norm.

2.6 Statistical methods

Statistical methods model the probabilistic distribution of the data. Some of these methods view samples as originating from different clusters, where each cluster defines a distribution for the data, while other methods do not explicitly use the cluster structure in the model. An advantage of the statistical approach is that it allows biological knowledge to be included as part of the model when determining the distribution functions. This can be done either using Bayesian priors or by choosing probabilistic functions, e.g. using normal distribution for gene expression data. For most formulations, parameter estimation is computationally hard, and different heuristics are used. Several models under the Bayesian framework allow for samples to belong to different clusters in different omics.

2.6.1 iCluster and iCluster+

iCluster [16] assumes that the data originate from a low dimension representation, which determines the cluster membership for each sample: (X^m)^T = W^mZ + ϵ^m, where Z is a k x n matrix, W^m is an omic specific p_m x k matrix, k is the number of clusters and ϵ^m is a normally distributed noise matrix. This model resembles other dimension reduction models, but here the distribution of noise is made explicit. Under this model iCluster maximizes the likelihood of the observed data with an additional regularization for sparse W^m matrices. Optimization is performed using an EM-like algorithm, and subsequently k-means is run on the lower dimension representation of the data Z to get the final clustering assignments. iCluster was applied to breast and lung cancer, using gene expression and copy number variations. iCluster was also recently used to cluster more than ten thousand tumors from 33 cancers in a pan-cancer analysis [97]. Note that by concatenating all W^m matrices to a single W matrix, and rewriting the model as X^T = WZ + ϵ, iCluster can be viewed as an early integration approach.

iCluster’s runtime grows fast with the number of features, and therefore feature selection is essential before using it [28]. [16] only use genes located on one or two chromosomes in their analysis.

Since iCluster’s model uses matrix multiplication, it requires real-valued features. An extension called iCluster+ [65] includes different models for numeric, categorical and count data, but maintains the idea that data originate from a low dimension matrix Z. For categorical data, iCluster+ assumes a multinomial logistic model in which the probability of each category is determined by the sample’s low dimensional representation in Z, while for numeric data the model remains linear in Z with normal error.

A regularization term encouraging a sparse solution is added to the likelihood, and a Monte-Carlo Newton-Raphson algorithm is used to estimate parameters. The Z matrix is used as in iCluster for the clustering. The latest extension of iCluster, which builds on iCluster+, is iClusterBayes [66]. This method replaces the regularization in iCluster+ with full Bayesian regularization. This replacement results in faster execution, since the algorithm no longer needs to fine tune parameters for iCluster+’s regularization.

2.6.2 PARADIGM

PARADIGM [67] is the most explicit approach to modeling cellular processes and the relations among different omics. For each sample and each cellular pathway, a factor graph that represents the state of different entities within that pathway is created. As a degenerate example, a pathway may include nodes representing the mRNA levels of each gene in that pathway, and nodes representing those genes’ copy number. Each node in the factor graph can be either activated, nominal or deactivated, and the factor graph structure defines a distribution over these activation levels. For example, if a gene has high copy number it is more likely that it will be highly expressed. However, if a repressor for that gene is highly expressed, that gene is more likely to be deactivated. PARADIGM infers the activity of non-measured cellular entities to maximize the likelihood of the factor graph, and outputs an activity score for each entity per patient. These scores are used to cluster cancer patients from several tissues.

PARADIGM’s model can be used for more than clustering. For example, PARADIGM-shift [98] predicts loss-of-function and gain-of-function mutations, by finding genes whose expression value as predicted based on upstream entities in the factor graph is different from their predicted expression value using downstream entities. PARADIGM relies heavily on known interactions, and requires specific modeling for each omic. It is also largely limited to the cellular level; for example, it is not clear how to incorporate into the model an omic describing the microbiome composition of each patient.

2.6.3 Combining omic-specific and global clustering

All the methods discussed so far assume that there exists a consistent clustering structure across the different omics, and that analyzing the clusters in an integrative way will reveal this structure more accurately than analyzing each omic separately. However, this is not necessarily the case for biomedical datasets. For example, it is not clear that the methylation and expression profiles of cancer tumors really represent the same underlying cluster structure. Rather, it is possible that each omic represents a somewhat different cluster structure. Several methods take this viewpoint using Bayesian statistics.

[68] defines a hierarchical Dirichlet process model, which supports clustering on two omics. Each sample can be either fused or unfused. Fused samples belong to the same cluster in both omics, while unfused samples can belong to different clusters in different omics. Patterns of fused and unfused samples reveal the concordance between the two datasets. This model is extended in PSDF [69] to include feature selection. [68] applies the model to cluster genes using gene expression and ChIP-chip data, while [69] clusters cancer patients using expression and copy number data.

In MDI [70] each sample can have different cluster assignments in different omics. However, a prior is given such that the stronger an association between two omics is, the more likely a sample will belong to the same cluster in these two omics. This association strength adjusts the prior clustering agreement between two omics. In addition to these priors, MDI’s model uses a Dirichlet mixture model, and explicitly represents the distribution of the data within each cluster and omic. Since samples can belong to different clusters in different omics, no global clustering solution is returned by the algorithm. Instead, the algorithm outputs sets of samples that tend to belong to the same cluster.

A different Bayesian formulation is given by BCC [71]. Like MDI, BCC assumes a Dirichlet mixture model, where the data originate from a mixture of distributions. However, BCC does assume a global clustering solution, where each sample maps to a single cluster. Given that a sample belongs to a global cluster, its probability to belong to that cluster in each omic is high, but it can also belong to a different cluster in that omic. Parameters are estimated using Gibbs sampling [99]. BCC was used on gene expression, DNA methylation, miRNA expression and RPPA data for breast cancer from TCGA.
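The link between global and omic-specific clusters in such a model can be illustrated with a small conditional distribution. The sketch below is an assumption-laden illustration and not the BCC implementation: it introduces a hypothetical adherence parameter `alpha` for the probability that a sample keeps its global cluster in a given omic, and assumes that the remaining probability is spread uniformly over the other clusters.

```python
def omic_cluster_probs(global_cluster, n_clusters, alpha):
    """Distribution of a sample's omic-specific cluster given its global cluster.

    alpha (assumed parameter): probability of agreeing with the global cluster.
    The remaining mass 1 - alpha is split evenly among the other clusters.
    """
    off = (1.0 - alpha) / (n_clusters - 1)
    return [alpha if k == global_cluster else off for k in range(n_clusters)]


# A sample in global cluster 1 out of 4, with high adherence in this omic.
probs = omic_cluster_probs(global_cluster=1, n_clusters=4, alpha=0.85)
print(probs)
```

With `alpha` close to 1 the omic is forced to follow the global clustering, while with `alpha` equal to 1 / `n_clusters` the omic-specific assignment carries no information about the global one.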

Like MDI and BCC, Clusternomics [72] uses a Dirichlet mixture model. Clusternomics suggests two different formulations. In the first, each omic has a different clustering solution, and the global clusters are represented as the Cartesian product of clusters from each omic. This approach does not perform integration of the multi-omic datasets. In the second formulation, global clusters are explicitly mapped to omic-specific clusters. That way, not all possible combinations of clusters from different omics are considered as global clusters.
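The first formulation can be pictured as labeling each sample with the tuple of its omic-specific clusters. The following sketch uses hypothetical patient identifiers and hand-written assignments; it is not Clusternomics code and performs no inference.

```python
from itertools import product

# Hypothetical omic-specific cluster assignments for four patients.
expr = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}   # gene expression clusters
meth = {"p1": 0, "p2": 1, "p3": 1, "p4": 1}   # DNA methylation clusters

# Global cluster = combination of the per-omic clusters.
global_cluster = {p: (expr[p], meth[p]) for p in expr}

# All combinations allowed by the Cartesian product vs. those actually occupied.
possible = set(product(set(expr.values()), set(meth.values())))
occupied = set(global_cluster.values())

print(len(possible), len(occupied))  # 4 3
```

The number of possible combinations grows multiplicatively with the number of omics, which is one motivation for the second formulation, where only some combinations are used as global clusters.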

2.6.4 Survival-based clustering

One of the areas multi-omics clustering is widely used for is discovering disease subtypes. In this context, we may expect different disease subtypes to have a different prognosis, and this criterion is often used to assess clustering solutions. [73] develop a Bayesian model for multi-omics clustering that considers patient prognosis while clustering the data. Patients within a cluster have both similar feature distribution and similar prognosis. [74] also develop a probabilistic clustering method that considers survival, and that supports a large number of features compared to [73], which only uses a few dozen features. As the survival data are used as input to the model, it is not surprising that this approach gives clusters with more significantly different survival than other approaches. This was demonstrated on Glioblastoma Multiforme data by [73] and for data from several cancer types by [74], both from TCGA.

2.7 Deep multi-view methods

A recent development in machine learning is the advent of deep learning algorithms [100]. These algorithms use multi-layered neural networks to perform diverse computational tasks, and were found to improve performance in several fields such as image recognition [101] and text translation [102]. Neural networks and deep learning have also proven useful for multi-view applications [103], including unsupervised feature learning [36], [104]. Learned features can be used for clustering, as described earlier for DeepCCA. Deep learning is already used extensively for biomedical data analysis [105].

Recent deep learning uses for multi-omics data include [75] and [76]. [75] use an autoencoder, which is a deep learning method for dimension reduction. The authors ran it on RNA-seq, methylation and miRNA-seq data in order to cluster Hepatocellular Carcinoma patients. The architecture implements an early integration approach, concatenating the features from the different omics. The autoencoder outputs a representation for each patient. Features from this representation are tested for association with survival, and significantly associated features are used to cluster the patients. The clusters obtained have significantly different survival. This result is compared to a similar analysis using the original features, and features learned with PCA rather than autoencoders. However, the analysis in this work is not unsupervised, since the feature selection is based on patient survival.

[76] use a different approach. They analyze expression, methylation and miRNA ovarian cancer data using Deep Belief Networks [106] which explicitly consider the multi-omic structure of the data. The architecture contains separate hidden layers, each having inputs from one omic, followed by layers that receive input from all the single-omic hidden layers, thus integrating the different omics. A 3-dimensional representation over {0, 1} is learned for each patient, partitioning the patients into 2³ = 8 clusters. The clustering results are compared to k-means clustering on the concatenation of all omics, but not to other multi-omics clustering methods.
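The step from a binary representation to a partition is simple to state in code. The sketch below assumes the 3-bit codes are already given; the patient names and codes are hypothetical, and the network that would learn them in [76] is not modeled.

```python
def code_to_cluster(bits):
    """Interpret a tuple of 0/1 values as a binary number (cluster index)."""
    idx = 0
    for b in bits:
        idx = idx * 2 + b
    return idx


# Hypothetical learned codes, one 3-dimensional {0, 1} vector per patient.
codes = {
    "patient_a": (0, 0, 0),
    "patient_b": (1, 0, 1),
    "patient_c": (1, 0, 1),
    "patient_d": (1, 1, 1),
}

clusters = {p: code_to_cluster(c) for p, c in codes.items()}
print(clusters)
```

Patients with identical codes fall into the same cluster, so the number of clusters is bounded by 2³ = 8 and is fixed by the width of the representation rather than chosen from the data.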

Deep learning algorithms usually require many samples and few features. They use a large number of parameters, which makes them prone to overfitting. Current multi-omic datasets have the opposite characteristics: they have many features and at least one order of magnitude fewer samples. The works presented here use only a few layers in their architectures to overcome this limitation, in comparison to the dozens of layers used by state-of-the-art architectures for imaging datasets. As the number of biomedical samples increases, deep multi-view learning algorithms might prove more beneficial for biomedical datasets.

3 Benchmark

In order to test the performance of multi-omics clustering methods, we compared nine algorithms on ten cancer types available from TCGA. We also compared the performance of the algorithms on each one of the single-omic datasets that make up the multi-omic datasets, for algorithms that are applicable to single-omic data. The nine algorithms were chosen to represent diverse approaches to multi-omics clustering. Three algorithms are early integration methods: LRAcluster, and k-means and spectral clustering on the omics concatenated into a single matrix. For similarity-based algorithms we used SNF and rMKL-LPP. For dimension reduction we used MCCA [38] and MultiNMF. We chose iClusterBayes as a statistical method, and PINS as a late integration approach.

The ten datasets contain cancer tumor multi-omics data, where each dataset is a different cancer type. All datasets contain three omics: gene expression, DNA methylation and miRNA expression. The number of patients ranges from 170 for AML to 621 for BIC. Full details on the datasets and cancer type acronyms appear in Supplementary File 2.

To assess the performance of a clustering solution, we used three metrics. First, we measured differential survival between the obtained clusters using the logrank test [107]. Using this test as a metric assumes that if clusters of patients have significantly different survival, they are different in a biologically meaningful way. Second, we tested for the enrichment of clinical labels in the clusters. We chose six clinical labels for which we tested enrichment: gender, age at diagnosis, pathologic T, pathologic M, pathologic N and pathologic stage. The four latter parameters are discrete pathological parameters, measuring the progression of the tumor (T), metastases (M) and cancer in lymph nodes (N), and the total progression (pathologic stage). Enrichment for discrete parameters was calculated using the χ² test for independence, and for numeric parameters using Kruskal-Wallis test. Not all clinical parameters were available for all cancer types, so a total of 41 clinical parameters were available for testing. Finally, we recorded the runtime of each method. We did not consider in the assessment computational measures for clustering quality, such as heterogeneity, homogeneity or the silhouette score [108], since the different methods perform different normalization on the features (and some even perform feature selection). Full details about the survival and phenotype data appear in Supplementary File 2.
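To illustrate the first metric, the sketch below computes the logrank statistic for the simplified case of two clusters. It is a textbook two-group version written for illustration, not the code used in the benchmark (which handles any number of clusters), and the follow-up times are made up. It returns only the test statistic; under the χ² approximation with one degree of freedom, values above about 3.84 correspond to p < 0.05.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-group logrank chi-square statistic (1 degree of freedom).

    times_*: follow-up times; events_*: 1 if death was observed, 0 if censored.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [g for (u, _, g) in samples if u >= t]
        n = len(at_risk)                                   # at risk, both groups
        n_a = sum(1 for g in at_risk if g == 0)            # at risk, group A
        d = sum(1 for (u, e, _) in samples if u == t and e == 1)
        d_a = sum(1 for (u, e, g) in samples if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n                          # expected deaths in A
        if n > 1:                                          # hypergeometric variance
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return (observed_a - expected_a) ** 2 / variance


# Made-up example: cluster A dies early, cluster B late, no censoring.
stat = logrank_statistic([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
print(round(stat, 3))
```

Even with such well separated survival times, the tiny sample size keeps the statistic modest, which hints at why the accuracy of the p-value computation matters for small clusters.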

To derive a p-value for the logrank test, the χ² test for independence, and the Kruskal-Wallis test, the statistic for these three tests is assumed to have a χ² distribution. However, for the logrank test and χ² test this approximation is not accurate for small sample sizes and unbalanced cluster sizes, especially for large values of the test statistic (this was shown for example in [109] for the logrank test in the case of two clusters). Indeed, we encountered in our analysis cases where the approximation gave extreme p-values (< 10⁻¹⁰) for very small clusters (n = 3). Instead, we ran permutation tests for each clustering (where we permuted the cluster labels between samples) and used the test statistic to obtain an empirical p-value. The obtained empirical p-values were significantly different from the p-values returned by the χ² approximation. In fact, for the logrank test on multi-omics data, the approximation-based p-values for 54 out of 89 clustering solutions (PINS crashed on BIC dataset, giving a total of 89 solutions) were not within their 95% confidence intervals constructed using the permutation test. This inaccuracy was exacerbated for small p-values: 32 out of the 35 significant (< 0.05) approximated p-values did not fall within their 95% confidence intervals. In all these cases, the p-value was higher (less significant) for the permutation-based computation. An extreme case is MCCA’s solution for KIRC dataset, where the p-value from the approximation was reported to equal 0, while the permutation test estimated it at 1.4e-4. In several cases, results that were significant according to the approximation were actually not significant according to the permutation tests. We observed a similar problem with the approximate p-values computed for the clinical parameters. The p-values we report here are therefore estimated using the permutation tests. More details on the permutation tests appear in Supplementary File 1.
After permutation testing, the p-values for the clinical labels were corrected for multiple hypotheses (since several labels were tested) using Bonferroni correction for each cancer type and method at significance level 0.05. Results for the statistical analyses are in Supplementary File 3.
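The permutation procedure and the subsequent correction can be sketched as follows for a discrete clinical label. This is an illustrative sketch and not the benchmark code (the actual procedure is described in Supplementary File 1): the number of permutations, the add-one estimator of the p-value, the numerical tolerance and the toy data are all assumptions.

```python
import random


def chi2_statistic(clusters, labels):
    """Pearson chi-square statistic of the cluster-by-label contingency table."""
    n = len(clusters)
    row_tot, col_tot, cell = {}, {}, {}
    for c, l in zip(clusters, labels):
        row_tot[c] = row_tot.get(c, 0) + 1
        col_tot[l] = col_tot.get(l, 0) + 1
        cell[(c, l)] = cell.get((c, l), 0) + 1
    stat = 0.0
    for c, r in row_tot.items():
        for l, k in col_tot.items():
            expected = r * k / n
            stat += (cell.get((c, l), 0) - expected) ** 2 / expected
    return stat


def permutation_p_value(clusters, labels, n_perm=1000, seed=0):
    """Empirical p-value: shuffle cluster labels among samples, recompute the statistic."""
    rng = random.Random(seed)
    observed = chi2_statistic(clusters, labels)
    shuffled = list(clusters)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi2_statistic(shuffled, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one estimator, never exactly 0


def bonferroni(p_values):
    """Bonferroni-adjusted p-values for a family of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy data: two clusters perfectly aligned with a binary clinical label.
clusters = [0] * 10 + [1] * 10
labels = ["F"] * 10 + ["M"] * 10
p_emp = permutation_p_value(clusters, labels)
adjusted = bonferroni([p_emp, 0.5, 0.03])
print(p_emp < 0.05, adjusted[1])
```

With the add-one estimator the empirical p-value cannot be smaller than 1 / (`n_perm` + 1), so the number of permutations bounds the significance that can be reported.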

We applied all nine methods to the ten multi-omics datasets, and to the thirty single-omic matrices comprising them. The only exceptions were MCCA, which we could not apply to single-omic data, and PINS, which crashed consistently on all BIC datasets. All methods were run on a Windows machine, except for iClusterBayes, which was run on a Linux cluster utilizing up to 15 nodes in parallel. Details on hardware, data preprocessing and application of the methods appear in Supplementary File 1. Full clustering results appear in Supplementary File 4. All the processed raw data are available at http://acgt.cs.tau.ac.il/multi_omic_benchmark/download.html, and all software scripts used are available at https://github.com/Shamir-Lab/Multi-Omics-Cancer-Benchmark/.

Figure 2 depicts the performance of the benchmarked methods on the different cancer datasets, and Figures 3 and 4 summarize the performance for multi-omics data and for each single-omic separately across all cancer types. No algorithm consistently outperformed all others in either differential survival or enriched clinical parameters. With respect to survival, MCCA had the best total prognostic value (sum of −log10 p-values = 18.19), while MultiNMF was second (16.04) and rMKL-LPP third (14.18). The sum of −log10 p-values can be biased due to outliers, so we also counted the number of datasets for which a method’s solution obtains significantly different survival. These results are reported in Table 2. Here, with the exception of iClusterBayes, all methods that were developed for multi-omics or multi-view data had four cancer types with significantly different survival. These four cancer types are not identical for all the algorithms.
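The two summaries used here, the sum of −log10 p-values and the count of significant datasets, can be computed as in the sketch below. The p-values are invented for illustration and are not benchmark results; the 0.05 threshold follows the one used in Figure 2.

```python
import math


def summarize(p_values, alpha=0.05):
    """Return (sum of -log10 p-values, number of p-values at or below alpha)."""
    total = sum(-math.log10(p) for p in p_values)
    n_significant = sum(1 for p in p_values if p <= alpha)
    return total, n_significant


# Invented logrank p-values of one method on five datasets.
p_values = [0.001, 0.04, 0.2, 0.5, 0.01]
total, n_significant = summarize(p_values)
print(round(total, 3), n_significant)  # 7.398 3

# A single extreme p-value dominates the sum but adds only one to the count.
total_outlier, n_outlier = summarize([1e-10, 0.5, 0.5, 0.5, 0.5])
print(round(total_outlier, 3), n_outlier)
```

The second call shows the outlier effect: one very small p-value yields a larger sum than three moderately significant results, while the count of significant datasets ranks the two methods the other way around.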

Figure 2:

Performance of the algorithms on ten multi-omics cancer datasets. For each plot, the x-axis measures the differential survival between clusters (−log10 of logrank’s test p-value), and the y-axis is the number of clinical parameters enriched in the clusters. Red vertical lines indicate the threshold for significantly different survival (p-value ≤ 0.05).

Figure 3:

Mean performance of the algorithms on ten multi-omics cancer datasets. The x-axis measures the differential survival between clusters (mean −log10 of logrank’s test p-value), and the y-axis is the mean number of clinical parameters enriched in the clusters.

Figure 4:

Summarized performance of the algorithms across ten cancer datasets. For each plot, the x-axis measures the total differential prognosis between clusters (sum across all datasets of −log10 of logrank’s test p-value), and the y-axis is the total number of clinical parameters enriched in the clusters across all cancer types. A-C: results for single-omic datasets. D: results when each method uses the single omic that achieves the highest significance in survival. E: same with respect to enrichment of clinical labels.

Table 2:

Cancer types with significant results per algorithm. For each benchmarked algorithm, the table shows the number of cancer types for which its clustering had significantly different prognosis (first row) and had at least one enriched clinical label (second row).

rMKL-LPP achieved the highest total number of significant clinical parameters, with 16 parameters. Spectral clustering came second with 14 and LRAcluster had 13. MCCA and MultiNMF, which had good results with respect to survival, had only 11 and 10 enriched parameters, respectively. rMKL-LPP did not outperform all other methods for all cancer types. For example, it had one enriched parameter for SKCM, while several other methods had two or three. We also considered the number of cancer types for which an algorithm had at least one enriched clinical label (Table 2). rMKL-LPP, spectral clustering and MCCA had enrichment in 8 cancer types, despite MCCA having a total of only 11 enriched parameters. Overall, rMKL-LPP outperformed all methods except MCCA and MultiNMF with respect to both survival and clinical enrichment. MCCA and MultiNMF had better prognostic value, but found fewer enriched clinical labels.

Each method determines the number of clusters for each dataset. These numbers are presented in Table 3. The numbers vary drastically among methods, from 2 or 3 (iClusterBayes and MultiNMF) to more than 10 on average (MCCA). Both MCCA and rMKL-LPP partitioned the data into a relatively high number of clusters (average of 11.1 and 6.7 respectively), and both had good performance, which may indicate that clustering cancer patients into more clusters improves prognostic value and clinical significance. The higher number of clusters is controlled for in the logrank and clinical enrichment tests by the larger number of degrees of freedom of their χ² statistics.

Table 3:

Number of clusters chosen by the benchmarked algorithms on ten multi-omics cancer datasets. The right column is the average number of clusters across all cancer types.

The runtime of the different methods is reported in Table 4. Note that as mentioned earlier, iClusterBayes was run on a cluster, while the other methods were run on a desktop computer. All methods except for LRAcluster and iClusterBayes took less than five minutes per dataset on average. LRAcluster and iClusterBayes took about 56 and 72 minutes per dataset, respectively.

Table 4:

Runtime in seconds of the algorithms on ten multi-omics cancer datasets. The right column is the average runtime across all cancer types. *For iClusterBayes, the numbers are elapsed time on a multi-core platform.

Figure 4 also shows the performance of the benchmarked methods for single-omic data. While several methods had worse performance on single-omic datasets, some achieved better performance. For example, the highest number of enriched clinical parameters for both single and multi-omic datasets (18) was achieved by rMKL-LPP on gene expression. The gene expression solution also had better prognostic value than the multi-omic solution. LRAcluster on gene expression data had the most significant prognostic value across all single-omic and multi-omic experiments, except for MCCA on multi-omics data (sums of −log10 p-values are 18.15 and 18.19, respectively).

To further test how analysis of single-omic datasets compares to multi-omic datasets, we chose for each dataset and method the single omic that gave the best results for survival and clinical enrichment. In this analysis, rMKL-LPP had the highest total number of enriched clinical parameters (21), and the highest total survival significance was for LRAcluster (22.89). The runtime, number of clusters, and survival and clinical enrichment analysis for single-omic datasets appear in Supplementary Files 1 and 3. These results suggest that analysis of multi-omics data does not consistently provide better prognostic value and clinical significance compared to analysis of single-omic data alone, especially when a different single omic is used for each cancer type.

4 Discussion

We have reviewed methods for multi-omics and multi-view clustering. In our tests on ten cancer datasets, overall, rMKL-LPP performed best in terms of clinical enrichment, and outperformed all methods except MCCA and MultiNMF with respect to survival. The high performance of MCCA and MultiNMF is remarkable, as these are multi-view methods that were not specifically developed for omics data (though MCCA was applied to it).

Careful consideration should be given when applying multi-view clustering methods to multi-omic data, since these data have characteristics that multi-view methods do not necessarily consider. The most prominent of these characteristics is the large number of features relative to the number of samples. For example, CCA inverts the covariance matrix of each omic. This matrix is not invertible when there are more features than samples, and sparsity regularization is necessary. Another feature of multi-omic data is the dependencies between features in different omics, but several multi-view algorithms assume conditional independence of the omics given the clustering structure. This dependency is rarely considered, since it greatly increases the complexity of models. An additional characteristic of current omic data types is that due to cellular regulation, they have an intrinsic lower dimensional representation. This characteristic is utilized by many methods.

In our benchmark, single-omic data alone sometimes gave better results than multi-omics data. This was intensified when for each algorithm the “best” single-omic for each cancer type was chosen. These results question the current assumptions underlying multi-omics analysis in general and multi-omics clustering in particular.

Several approaches may lead to improved results for multi-omics analysis. First, methods that suggest different clusterings in different omics were developed and reviewed here, but were not included in the benchmark, since it is not clear how to compare algorithms that do not output a global clustering solution to those that do. These methods may be more sensitive to strong signals appearing in only some of the omics. Second, future algorithms can perform omic selection in the same manner that algorithms today perform feature selection. In the benchmark, we let each method choose a single-omic for each cancer type given the results of the analysis, which are usually not available for real data. Methods that filter omics with contradicting signals might obtain a clearer clustering. Finally, most methods for multi-omics clustering do not incorporate prior biological knowledge, and especially the relationship between omics. A notable exception is PARADIGM, which formulates both the relationships between different omics and between genes using known pathways. Other statistical methods also include some form of biological modeling by describing the distribution of the omics, and MDI tunes the similarity of clustering solutions in different omics based on the omics similarity. However, these methods do not model the biological relationship between omics. Methods that model such relations might benefit from additional biological knowledge, even without modeling whole pathways. For example, one can incorporate in a model the fact that promoter methylation is anti-correlated with gene expression. As far as we know, such methods were only developed for copy-number variation and gene expression data (e.g. [110]), and not in the context of clustering.

We detected large differences between the p-values derived from the χ² approximation compared to the p-values derived from the permutation tests in the statistical tests we used. The differences were especially large due to the small sample size, small cluster sizes (in solutions with a high number of clusters) and due to a low number of events (high survival) for the logrank test. These p-values are used by single and multi-omic methods to assess their performance, and the logrank p-value is often the main argument for an algorithm’s merit. The large differences between the p-values question the validity of analyses that are based on the χ² approximation, at least for TCGA data. Future work must use exact or permutation-based calculations of the p-value in datasets with similar characteristics to those used here for the benchmark.

The benchmark we performed is not without limitations. Gauging performance using patient survival is somewhat biased to known cancer subtypes, which may have been used in treatment decisions. Additionally, cancer subtypes that are biologically different may have similar survival. This is also true for enrichment of clinical parameters, although we attempted to choose parameters that would not lead to this bias. However, these measures are widely used for clustering assessment, including in the papers describing some of the benchmarked methods. Another limitation of the benchmark is that it only examines clustering, while some of the methods have additional goals and output. For example, in dimension reduction algorithms, the low dimensional data can be used to analyze features, and not only patients, e.g., by calculating axes of variation common to several omics. With respect to feature analysis, multi-omic algorithms can have an advantage over single-omic algorithms that we did not test. Finally, though we selected the parameters of each benchmarked method according to the guidelines given by the authors, judicious fine-tuning of the parameters may improve results.

Funding

This research was supported in part by a grant from the United States - Israel Binational Science Foundation (BSF), Jerusalem, Israel and the United States National Science Foundation (NSF) and by the Bella Walter Memorial Fund of the Israel Cancer Association. N.R. was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University.

Acknowledgements

The results published here are based upon data generated by The Cancer Genome Atlas managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov. We thank Nora K. Speicher for providing the rMKL-LPP tool and Ron Zeira for helpful comments.

References

[1] Sara Goodwin, John D. McPherson, and W. Richard McCombie. Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17(6):333–351, 2016.
[2] Fatih Ozsolak and Patrice M. Milos. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics, 12(2):87–98, 2011.
[3] David B. Allison, Xiangqin Cui, Grier P. Page, and Mahyar Sabripour. Microarray data analysis: From disarray to consolidation and consensus. Nature Reviews Genetics, 7(1):55–65, 2006.
[4] Wai-Shin Yong, Fei-Man Hsu, and Pao-Yang Chen. Profiling genome-wide DNA methylation. Epigenetics & Chromatin, 9(1):26, 2016.
[5] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, 1999.
[6] Vinay Prasad, Tito Fojo, and Michael Brada. Precision oncology: origins, optimism, and potential. The Lancet Oncology, 17(2):e81–e86, 2016.
[7] Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38:43–54, 2017.
[8] G. Chao, S. Sun, and J. Bi. A survey on multi-view clustering. ArXiv e-prints, 2017.
[9] The Cancer Genome Atlas Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, 2008.
[10] Sijia Huang, Kumardeep Chaudhary, and Lana X Garmire. More is better: Recent progress in multiomics data integration methods. Frontiers in Genetics, 8:84, 2017.
[11] Matteo Bersanelli, Ettore Mosca, Daniel Remondini, Enrico Giampieri, Claudia Sala, Gastone Castellani, and Luciano Milanesi. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics, 17(S2):S15, 2016.
[12] Yifeng Li, Fang-Xiang Wu, and Alioune Ngom. A review on machine learning principles for multi-view biological data integration. Briefings in Bioinformatics, pages 325–340, 2016.
[13] Dongfang Wang and Jin Gu. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quantitative Biology, 4(1):58–67, 2016.
[14] Chen Meng, Oana A. Zeleznik, Gerhard G. Thallinger, Bernhard Kuster, Amin M. Gholami, and Aedín C. Culhane. Dimension reduction techniques for the integrative analysis of multi-omics data. Briefings in Bioinformatics, 17(4):628–641, 2016.
[15] Giulia Tini, Luca Marchetti, Corrado Priami, and Marie-Pier Scott-Boyer. Multi-omics integration—a comparison of unsupervised clustering methodologies. Briefings in Bioinformatics, 2017.
[16] Ronglai Shen, Adam B. Olshen, and Marc Ladanyi. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics, 25(22):2906–2912, 2009.
[17] Steffen Bickel and Tobias Scheffer. Multi-view clustering. Proc. ICDM 2004, pages 19–26, 2004.
[18] Dingming Wu, Dongfang Wang, Michael Q. Zhang, and Jin Gu. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: Application to cancer molecular classification. BMC Genomics, 16(1):1022, 2015.
[19] Hua Wang, Feiping Nie, and Heng Huang. Multi-view clustering and feature learning via structured sparsity. Proc. ICML ’13, 28:352–360, 2013.
[20] Katherine A. Hoadley, Christina Yau, Denise M. Wolf, Andrew D. Cherniack, David Tamborero, Sam Ng, Max D.M. Leiserson, Beifang Niu, Michael D. McLellan, Vladislav Uzunangelov, Jiashan Zhang, Cyriac Kandoth, Rehan Akbani, Hui Shen, Larsson Omberg, Andy Chu, Adam A. Margolin, Laura J. van’t Veer, Nuria Lopez-Bigas, Peter W. Laird, Benjamin J. Raphael, Li Ding, A. Gordon Robertson, Lauren A. Byers, Gordon B. Mills, John N. Weinstein, Carter Van Waes, Zhong Chen, Eric A. Collisson, Christopher C. Benz, Charles M. Perou, and Joshua M. Stuart. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell, 158(4):929–944, 2014.
[21] Eric Bruno and Stéphane Marchand-Maillet. Multiview clustering: A late fusion approach using latent models categories and subject descriptors. In Proc. ACM SIGIR ’09, pages 736–737, New York, New York, USA, 2009. ACM Press.
[22] Tin Nguyen, Rebecca Tagett, Diana Diaz, and Sorin Draghici. A novel approach for data integration and disease subtyping. Genome Research, 27(12):2025–2039, 2017.
[23] Virginia R de Sa. Spectral clustering with two views. In Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML, pages 20–27, 2005.
[24] Abhishek Kumar, Piyush Rai, and Hal Daumé III. Co-regularized multi-view spectral clustering. In Proc. NIPS ’11, pages 1413–1421, USA, 2011.
[25] Nacim Fateh Chikhi. Multi-view clustering via spectral partitioning and local refinement. Information Processing & Management, 52(4):618–627, 2016.
[26] Yeqing Li, Feiping Nie, Heng Huang, and Junzhou Huang. Large-scale multi-view spectral clustering with bipartite graph. In Proc. AAAI ’15, pages 2750–2756, 2015.
[27] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. AAAI Conference on Artificial Intelligence, pages 2149–2155, 2014.
[28] Bo Wang, Aziz M Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains, and Anna Goldenberg. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3):333–337, 2014.
[29] Nora K. Speicher and Nico Pfeifer. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics, 31(12):i268–i275, 2015.
[30] Bo Long, Philip S. Yu, and Zhongfei (Mark) Zhang. A general model for multiple view unsupervised learning. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 822–833. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2008.
[31] Eric F Lock, Katherine A Hoadley, J S Marron, and Andrew B Nobel. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Annals of Applied Statistics, 7(1):523–542, 2013.
[32] Michael J. O’Connell and Eric F. Lock. R.jive for exploration of multi-source molecular data. Bioinformatics, 32(18):2877–2879, 2016.
[33] Harold Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321, 1936.
[34] A Klami, S Virtanen, and S Kaski. Bayesian canonical correlation analysis. Journal of Machine Learning Research, 13(1):723–773, 2013.
[35] P. L. Lai and C. Fyfe. Kernel and nonlinear canonical correlation analysis. International Journal of Neural Systems, 10(05):365–377, 2000.
[36] Galen Andrew, Raman Arora, Jeff Bilmes, and Karen Livescu. Deep canonical correlation analysis. In Proc. ICML ’13, volume 28, pages 1247–1255, 2013.
[37] Elena Parkhomenko, David Tritchler, and Joseph Beyene. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1):1–34, 2009.
[38] Daniela M Witten and Robert J Tibshirani. Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology, 8(1):Article 28, 2009.
[39] Javier Vía, Ignacio Santamaría, and Jesús Pérez. A learning algorithm for adaptive canonical correlation analysis of several data sets. Neural Networks, 20(1):139–152, 2007.
[40] Yong Luo, Dacheng Tao, Kotagiri Ramamohanarao, Chao Xu, and Yonggang Wen. Tensor canonical correlation analysis for multi-view dimension reduction. In Proc. ICDE 2016, pages 1460–1461, 2016.
[41] Jun Chen, Frederic D Bushman, James D Lewis, Gary D Wu, and Hongzhe Li. Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2):244–58, 2013.
[42] Dongdong Lin, Jigang Zhang, Jingyao Li, Vince D Calhoun, Hong Wen Deng, and Yu Ping Wang. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics, 14(1):245, 2013.
[43] A. Podosinnikova, F. Bach, and S. Lacoste-Julien. Beyond CCA: Moment matching for multi-view models. ArXiv e-prints, 2016.
[44] Florian Rohart, Benoît Gautier, Amrit Singh, and Kim-Anh Lê Cao. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLOS Computational Biology, 13(11):e1005752, 2017.
[45] Svante Wold, Michael Sjöström, and Lennart Eriksson. PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2):109–130, 2001.
[46] Kim-Anh Lê Cao, Debra Rossouw, Christèle Robert-Granié, and Philippe Besse. A sparse PLS for variable selection when integrating omics data. Statistical Applications in Genetics and Molecular Biology, 7(1):Article 35, 2008.
[47] Kim-Anh Lê Cao, Pascal GP Martin, Christèle Robert-Granié, and Philippe Besse. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics, 10(1):34, 2009.
[48] Johan Trygg. O2-PLS for qualitative and quantitative analysis in multivariate calibration. Journal of Chemometrics, 16(6):283–293, 2002.
[49] Roman Rosipal, Leonard J Trejo, Nello Cristianini, John Shawe-Taylor, and Bob Williamson. Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research, 2:97–123, 2001.
[50] Mattias Rantalainen, Max Bylesjö, Olivier Cloarec, Jeremy K Nicholson, Elaine Holmes, and Johan Trygg. Kernel-based orthogonal projections to latent structures (K-OPLS). Journal of Chemometrics, 21(7-9):376–385, 2007.
[51] Wenyuan Li, Shihua Zhang, Chun-Chi Liu, and Xianghong Jasmine Zhou. Identifying multi-layer gene regulatory modules from multi-dimensional genomic data. Bioinformatics (Oxford, England), 28(19):2458–66, 2012.
[52] Tommy Löfstedt and Johan Trygg. OnPLS-a novel multiblock method for the modelling of predictive and orthogonal variation. Journal of Chemometrics, 25(8):441–455, 2011.
[53] Chen Meng, Bernhard Kuster, Aedín C Culhane, and Amin Gholami. A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics, 15(1):162, 2014.
[54] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In Proc. ICDM ’13, pages 252–260. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2013.
[55] Mahdi M Kalayeh, Haroon Idrees, and Mubarak Shah. NMF-KNN: Image annotation using weighted multi-view non-negative matrix factorization. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 184–191, 2014.
[56] Jin Huang, Feiping Nie, Heng Huang, and Chris Ding. Robust manifold nonnegative matrix factorization. ACM Transactions on Knowledge Discovery from Data, 8(3):1–21, 2014.
[57] Shihua Zhang, Chun-Chi Liu, Wenyuan Li, Hui Shen, Peter W Laird, and Xianghong Jasmine Zhou. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Research, 40(19):9379–91, 2012.
[58] D. Hidru and A. Goldenberg. EquiNMF: Graph regularized multiview nonnegative matrix factorization. ArXiv e-prints, 2014.
[59] X. Zhang, L. Zong, X. Liu, and H. Yu. Constrained NMF-based multi-view clustering on unmapped data. In Proc. AAAI ’15, volume 4, pages 3174–3180, 2015.
[60] Shao-Yuan Li, Yuan Jiang, and Zhi-Hua Zhou. Partial multi-view clustering. In Proc. AAAI ’14, pages 1968–1974. AAAI Press, 2014.
[61] Marinka Žitnik and Blaž Zupan. Data fusion by matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):41–53, 2015.
[62] Martha White, Yaoliang Yu, Xinhua Zhang, and Dale Schuurmans. Convex multi-view subspace learning. In Proc. NIPS ’12, pages 1673–1681, USA, 2012.
[63] Yuhong Guo. Convex subspace representation learning from multi-view data. AAAI 2013, pages 387–393, 2013.
[64] Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, and Xiaochun Cao. Low-rank tensor constrained multiview subspace clustering. In Proc. ICCV ’15, pages 1582–1590. IEEE, 2015.
[65] Qianxing Mo, Sijian Wang, Venkatraman E Seshan, Adam B Olshen, Nikolaus Schultz, Chris Sander, R Scott Powers, Marc Ladanyi, and Ronglai Shen. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proceedings of the National Academy of Sciences of the United States of America, 110(11):4245–50, 2013.
[66] Qianxing Mo, Ronglai Shen, Cui Guo, Marina Vannucci, Keith S Chan, and Susan G Hilsenbeck. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics, 19(1):71–86, 2018.
[67] Charles J. Vaske, Stephen C. Benz, J. Zachary Sanborn, Dent Earl, Christopher Szeto, Jingchun Zhu, David Haussler, and Joshua M. Stuart. Inference of patient-specific pathway activities from multidimensional cancer genomics data using PARADIGM. Bioinformatics, 26(12):i237–i245, 2010.
[68] Richard S. Savage, Zoubin Ghahramani, Jim E. Griffin, Bernard J. de la Cruz, and David L. Wild. Discovering transcriptional modules by Bayesian data integration. Bioinformatics, 26(12):i158–i167, 2010.
[69] Yinyin Yuan, Richard S. Savage, and Florian Markowetz. Patient-specific data fusion defines prognostic cancer subtypes. PLoS Computational Biology, 7(10):e1002227, 2011.
[70] Paul Kirk, Jim E. Griffin, Richard S. Savage, Zoubin Ghahramani, and David L. Wild. Bayesian correlated clustering to integrate multiple datasets. Bioinformatics, 28(24):3290–3297, 2012.
[71] Eric F. Lock and David B. Dunson. Bayesian consensus clustering. Bioinformatics, 29(20):2610–2616, 2013.
[72] Evelina Gabasova, John Reid, and Lorenz Wernisch. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets. PLOS Computational Biology, 13(10):e1005781, 2017.
[73] Ashar Ahmad and Holger Fröhlich. Towards clinically more relevant dissection of patient heterogeneity via survival-based Bayesian clustering. Bioinformatics, 33(22):3558–3566, 2017.
[74] Pietro Coretto, Angela Serra, and Roberto Tagliaferri. Robust clustering of noisy high-dimensional gene expression data for patients subtyping. Bioinformatics, page bty502, 2018.
[75] Kumardeep Chaudhary, Olivier B Poirion, Liangqun Lu, and Lana X Garmire. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clinical Cancer Research, 24(6):1248–1259, 2018.
[76] Muxuan Liang, Zhizhong Li, Ting Chen, and Jianyang Zeng. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(4):928–937, 2015.
[77] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proc. COLT ’98, pages 92–100, New York, New York, USA, 1998. ACM Press.
[78] Peter J. Bickel, Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, Teófilo Valdés, Carlos Rivero, Jianqing Fan, and Aad van der Vaart. Regularization in statistics. Test, 15(2):271–344, 2006.
[79] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58(1):267–288, 1996.
[80] Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1/2):91–118, 2003.
[81] Thomas Hofmann. Probabilistic latent semantic analysis. In Proc. UAI ’99, pages 289–296, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[82] Sandro Vega-Pons and José Ruiz-Shulcloper. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(03):337–372, 2011.
[83] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[84] Bojan Mohar. The Laplacian spectrum of graphs. Graph Theory, Combinatorics, and Applications, 2:871–898, 1991.
[85] L. Lovász. Random walks on graphs: A survey. Combinatorics, (2):1–46, 1993.
[86] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, 1988.
[87] D. R. Cox and David Oakes. Analysis of Survival Data. Chapman and Hall, 1984.
[88] Kamalika Chaudhuri, Sham M Kakade, Karen Livescu, and Karthik Sridharan. Multi-view clustering via canonical correlation analysis. In Proc. ICML ’09, pages 1–8, 2009.
[89] Francis R Bach and Michael I Jordan. A probabilistic interpretation of canonical correlation analysis. Dept Statist Univ California Berkeley CA Tech Rep, 688:1–11, 2006.
[90] Max Bylesjö, Daniel Eriksson, Miyako Kusano, Thomas Moritz, and Johan Trygg. Data integration in plant biology: The O2-PLS method for combined modeling of transcript and metabolite data. The Plant Journal, 52(6):1181–1191, 2007.
[91] Said el Bouhaddani, Jeanine Houwing-Duistermaat, Perttu Salo, Markus Perola, Geurt Jongbloed, and Hae-Won Uh. Evaluation of O2-PLS in omics data integration. BMC Bioinformatics, 17(S2):S11, 2016.
[92] D. Hwang, G. Stephanopoulos, and C. Chan. Inverse modeling using multi-block PLS to determine the environmental conditions that provide optimal cellular function. Bioinformatics, 20(4):487–499, 2004.
[93] Stéphane Dray, Daniel Chessel, and Jean Thioulouse. Co-inertia analysis and the linking of ecological data tables. Ecology, 84(11):3078–3089, 2003.
[94] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
[95] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. Adv in Neural Inf Proc Syst, (February):535–541, 2001.
[96] Marinka Žitnik and Blaž Zupan. Survival regression by data fusion. Systems Biomedicine, 2(3):47–53, 2015.
[97] Katherine A. Hoadley, Christina Yau, Toshinori Hinoue, Denise Wolf, Alexander Lazar, Esther Drill, Ronglai Shen, Alison M. Taylor, Andrew Cherniack, Vésteinn Thorsson, Rehan Akbani, Reanne Bowlby, Christopher K. Wong, Maciej Wiznerowicz, Francisco Sánchez-Vega, Gordon Robertson, Barbara G. Schneider, Michael Lawrence, Houtan Noushmehr, and Armaz Mariamidze. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell, 173:291–304.e6, 2018.
[98] Sam Ng, Eric A Collisson, Artem Sokolov, Theodore Goldstein, Abel Gonzalez-Perez, Nuria Lopez-Bigas, Christopher Benz, David Haussler, and Joshua M Stuart. PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis. Bioinformatics, 28(18):i640–i646, 2012.
[99] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984.
[100] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[101] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS ’12, volume 1, pages 1097–1105, 2012.
[102] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Proc. NIPS ’14, pages 3104–3112, Cambridge, MA, USA, 2014. MIT Press.
[103] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. Multimodal deep learning. Proc. ICML ’11, pages 689–696, 2011.
[104] Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On deep multi-view representation learning: Objectives and optimization. Proc. ICML ’16, pages 1083–1092, 2016.
[105] Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, Christopher A. Lavender, Srinivas C. Turaga, Amr M. Alexandari, Zhiyong Lu, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H. S. Segler, Simina M. Boca, S. Joshua Swamidass, Austin Huang, Anthony Gitter, and Casey S. Greene. Opportunities and obstacles for deep learning in biology and medicine. Journal of The Royal Society Interface, 15(141):20170387, 2018.
[106] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[107] David W. Hosmer, Stanley Lemeshow, and Susanne May. Applied survival analysis: regression modeling of time-to-event data. Wiley-Interscience, 2008.
[108] Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1):53–65, 1987.
[109] Fabio Vandin, Alexandra Papoutsaki, Benjamin J. Raphael, and Eli Upfal. Accurate computation of survival statistics in genome-wide studies. PLOS Computational Biology, 11(5):1–18, 2015.
[110] Miriam Ragle Aure, Israel Steinfeld, Lars Oliver Baumbusch, Knut Liestøl, Doron Lipson, Sandra Nyberg, Bjørn Naume, Kristine Kleivi Sahlberg, Vessela N. Kristensen, Anne-Lise Børresen-Dale, Ole Christian Lingjærde, and Zohar Yakhini. Identifying in-trans process associated genes in breast cancer by integrated analysis of copy number and expression data. PLOS ONE, 8(1):1–15, 2013.

Posted July 19, 2018.


[1] [1].↵
Sara Goodwin, John D. McPherson, and W. Richard McCombie. Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17(6):333–351, 2016.
OpenUrl CrossRef PubMed

[2] [2].↵
Fatih Ozsolak and Patrice M. Milos. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics, 12(2):87–98, 2011.
OpenUrl CrossRef PubMed Web of Science

[3] [3].↵
David B. Allison, Xiangqin Cui, Grier P. Page, and Mahyar Sabripour. Microarray data analysis: From disarray to consolidation and consensus. Nature Reviews Genetics, 7(1):55–65, 2006.
OpenUrl CrossRef PubMed Web of Science

[4] [4].↵
Wai-Shin Yong, Fei-Man Hsu, and Pao-Yang Chen. Profiling genome-wide DNA methylation. Epigenetics & Chromatin, 9(1):26, 2016.
OpenUrl CrossRef PubMed

[5] [5].↵
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, 1999.
OpenUrl

[6] [6].↵
Vinay Prasad, Tito Fojo, and Michael Brada. Precision oncology: origins, optimism, and potential. The Lancet. Oncology, 17(2):e81–e86, 2016.
OpenUrl

[7] [7].↵
Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38:43–54, 2017.
OpenUrl

[8] [8].↵
G. Chao, S. Sun, and J. Bi. A survey on multi-view clustering. ArXiv e-prints, 2017.

[9] [9].↵
The Cancer Genome Atlas Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, 2008.
OpenUrl CrossRef PubMed Web of Science

[10] [10].↵
Sijia Huang, Kumardeep Chaudhary, and Lana X Garmire. More is better: Recent progress in multiomics data integration methods. Frontiers in Genetics, 8:84, 2017.
OpenUrl

[11] [11].↵
Matteo Bersanelli, Ettore Mosca, Daniel Remondini, Enrico Giampieri, Claudia Sala, Gastone Castellani, and Luciano Milanesi. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics, 17(S2):S15, 2016.
OpenUrl

[12] [12].↵
Yifeng Li, Fang-Xiang Wu, and Alioune Ngom. A review on machine learning principles for multi-view biological data integration. Briefings in Bioinformatics, pages 325–340, 2016.

[13] [13].↵
Dongfang Wang and Jin Gu. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quantitative Biology, 4(1):58–67, 2016.
OpenUrl

[14] [14].↵
Chen Meng, Oana A. Zeleznik, Gerhard G. Thallinger, Bernhard Kuster, Amin M. Gholami, and Aedín C. Culhane. Dimension reduction techniques for the integrative analysis of multi-omics data. Briefings in Bioinformatics, 17(4):628–641, 2016.
OpenUrl CrossRef PubMed

[15] [15].↵
Giulia Tini, Luca Marchetti, Corrado Priami, and Marie-Pier Scott-Boyer. Multi-omics integration—a comparison of unsupervised clustering methodologies. Briefings in Bioinformatics, 2017.

[16] [16].↵
Ronglai Shen, Adam B. Olshen, and Marc Ladanyi. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics, 25(22):2906–2912, 2009.
OpenUrl CrossRef PubMed Web of Science

[17] [17].↵
Steffen Bickel and Tobias Scheffer. Multi-view clustering. Proc. ICDM 2004, pages 19–26, 2004.

[18] [18].↵
Dingming Wu, Dongfang Wang, Michael Q. Zhang, and Jin Gu. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: Application to cancer molecular classification. BMC Genomics, 16(1):1022, 2015.
OpenUrl

[19] [19].↵
Hua Wang, Feiping Nie, and Heng Huang. Multi-view clustering and feature learning via structured sparsity. Proc. ICML ’13, 28:352–360, 2013.
OpenUrl

[20] [20].↵
Katherine A. Hoadley, Christina Yau, Denise M. Wolf, Andrew D. Cherniack, David Tamborero, Sam Ng, Max D.M. Leiserson, Beifang Niu, Michael D. McLellan, Vladislav Uzunangelov, Jiashan Zhang, Cyriac Kandoth, Rehan Akbani, Hui Shen, Larsson Omberg, Andy Chu, Adam A. Margolin, Laura J. van’t Veer, Nuria Lopez-Bigas, Peter W. Laird, Benjamin J. Raphael, Li Ding, A. Gordon Robertson, Lauren A. Byers, Gordon B. Mills, John N. Weinstein, Carter Van Waes, Zhong Chen, Eric A. Collisson, Christopher C. Benz, Charles M. Perou, Joshua M. Stuart, and Joshua M Stuart. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell, 158(4):929–944, 2014.
OpenUrl CrossRef PubMed Web of Science

[21] [21].↵
Eric Bruno and Stéphane Marchand-Maillet. Multiview clustering: A late fusion approach using latent models categories and subject descriptors. In Proc. ACM SIGIR ’09, pages 736–737, New York, New York, USA, 2009. ACM Press.

[22] [22].↵
Tin Nguyen, Rebecca Tagett, Diana Diaz, and Sorin Draghici. A novel approach for data integration and disease subtyping. Genome Research, 27(12):2025–2039, 2017.
OpenUrl Abstract/FREE Full Text

[23] [23].↵
Virginia R de Sa. Spectral clustering with two views. In Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML, pages 20–27, 2005.

[24] [24].↵
Abhishek Kumar, Piyush Rai, and Hal Daumé, III.. Co-regularized multi-view spectral clustering. In Proc. NIPS ’11, pages 1413–1421, USA, 2011.

[25] [25].↵
Nacim Fateh Chikhi. Multi-view clustering via spectral partitioning and local refinement. Information Processing & Management, 52(4):618–627, 2016.
OpenUrl

[26] [26].↵
Yeqing Li, Feiping Nie, Heng Huang, and Junzhou Huang. Large-scale multi-view spectral clustering with bipartite graph. In Proc. AAAI 15, pages 2750–2756, 2015.

[27] [27].↵
Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. AAAI Conference on Artificial Intelligence, pages 2149–2155, 2014.

[28] [28].↵
Bo Wang, Aziz M Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin HaibeKains, and Anna Goldenberg. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3):333–337, 2014.
OpenUrl

[29] [29].↵
Nora K. Speicher and Nico Pfeifer. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics, 31(12):i268–i275, 2015.
OpenUrl CrossRef PubMed

[30] [30].↵
Bo Long, Philip S. Yu, and Zhongfei (Mark) Zhang. A general model for multiple view unsupervised learning. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 822–833. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2008.

[31] [31].↵
Eric F Lock, Katherine A Hoadley, J S Marron, and Andrew B Nobel. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Annals of Applied Statistics, 7(1):523–542, 2013.
OpenUrl

[32] [32].
Michael J. O’Connell and Eric F. Lock. R.jive for exploration of multi-source molecular data. Bioinformatics, 32(18):2877–2879, 2016.
OpenUrl CrossRef PubMed

[33] [33].↵
Harold Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321, 1936.
OpenUrl CrossRef

[34] [34].↵
A Klami, S Virtanen, and S Kaski. Bayesian canonical correlation analysis. The Journal of Machine Learning, 13(1):723–773, 2013.
OpenUrl

[35] [35].↵
P. L. Lai and C. Fyfe. Kernel and nonlinear canonical correlation analysis. International Journal of Neural Systems, 10(05):365–377, 2000.
OpenUrl CrossRef PubMed

[36] [36].↵
Galen Andrew, Raman Arora, Jeff Bilmes, and Karen Livescu. Deep canonical correlation analysis. In Proc. ICML ’13, volume 28, pages 1247–1255, 2013.
OpenUrl

[37] [37].↵
Elena Parkhomenko, David Tritchler, and Joseph Beyene. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1):1–34, 2009.
OpenUrl

[38] [38].↵
Daniela M Witten and Robert J Tibshirani. Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology, 8(1):Article28, 2009.
OpenUrl

[39] [39].↵
Javier Vía, Ignacio Santamaría, and Jesuś Pérez. A learning algorithm for adaptive canonical correlation analysis of several data sets. Neural Networks, 20(1):139–152, 2007.
OpenUrl PubMed

[40] [40].↵
Yong Luo, Dacheng Tao, Kotagiri Ramamohanarao, Chao Xu, and Yonggang Wen. Tensor canonical correlation analysis for multi-view dimension reduction. In Proc. ICDE 2016, pages 1460–1461, 2016.

[41] [41].↵
Jun Chen, Frederic D Bushman, James D Lewis, Gary D Wu, and Hongzhe Li. Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2):244–58, 2013.
OpenUrl CrossRef PubMed Web of Science

[42] [42].↵
Dongdong Lin, Jigang Zhang, Jingyao Li, Vince D Calhoun, Hong Wen Deng, and Yu Ping Wang. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics, 14(1):245, 2013.
OpenUrl CrossRef PubMed

[43] [43].↵
A. Podosinnikova, F. Bach, and S. Lacoste-Julien. Beyond CCA: Moment matching for multi-view models. ArXiv e-prints, 2016.

[44] Florian Rohart, Benoît Gautier, Amrit Singh, and Kim-Anh Lê Cao. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLOS Computational Biology, 13(11):e1005752, 2017.

[45] Svante Wold, Michael Sjöström, and Lennart Eriksson. PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2):109–130, 2001.

[46] Kim-Anh Lê Cao, Debra Rossouw, Christèle Robert-Granié, and Philippe Besse. A sparse PLS for variable selection when integrating omics data. Statistical Applications in Genetics and Molecular Biology, 7(1):Article 35, 2008.

[47] Kim-Anh Lê Cao, Pascal GP Martin, Christèle Robert-Granié, and Philippe Besse. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics, 10(1):34, 2009.

[48] Johan Trygg. O2-PLS for qualitative and quantitative analysis in multivariate calibration. Journal of Chemometrics, 16(6):283–293, 2002.

[49] Roman Rosipal, Leonard J Trejo, Nello Cristianini, John Shawe-Taylor, and Bob Williamson. Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research, 2:97–123, 2001.

[50] Mattias Rantalainen, Max Bylesjö, Olivier Cloarec, Jeremy K Nicholson, Elaine Holmes, and Johan Trygg. Kernel-based orthogonal projections to latent structures (K-OPLS). Journal of Chemometrics, 21(7-9):376–385, 2007.

[51] Wenyuan Li, Shihua Zhang, Chun-Chi Liu, and Xianghong Jasmine Zhou. Identifying multi-layer gene regulatory modules from multi-dimensional genomic data. Bioinformatics (Oxford, England), 28(19):2458–66, 2012.

[52] Tommy Löfstedt and Johan Trygg. OnPLS-a novel multiblock method for the modelling of predictive and orthogonal variation. Journal of Chemometrics, 25(8):441–455, 2011.

[53] Chen Meng, Bernhard Kuster, Aedín C Culhane, and Amin Gholami. A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics, 15(1):162, 2014.

[54] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In Proc. ICDM ’13, pages 252–260. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2013.

[55] Mahdi M Kalayeh, Haroon Idrees, and Mubarak Shah. NMF-KNN: Image annotation using weighted multi-view non-negative matrix factorization. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 184–191, 2014.

[56] Jin Huang, Feiping Nie, Heng Huang, and Chris Ding. Robust manifold nonnegative matrix factorization. ACM Transactions on Knowledge Discovery from Data, 8(3):1–21, 2014.

[57] Shihua Zhang, Chun-Chi Liu, Wenyuan Li, Hui Shen, Peter W Laird, and Xianghong Jasmine Zhou. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Research, 40(19):9379–91, 2012.

[58] D. Hidru and A. Goldenberg. EquiNMF: Graph regularized multiview nonnegative matrix factorization. ArXiv e-prints, 2014.

[59] X. Zhang, L. Zong, X. Liu, and H. Yu. Constrained NMF-based multi-view clustering on unmapped data. In Proc. AAAI ’15, volume 4, pages 3174–3180, 2015.

[60] Shao-Yuan Li, Yuan Jiang, and Zhi-Hua Zhou. Partial multi-view clustering. In Proc. AAAI ’14, pages 1968–1974. AAAI Press, 2014.

[61] Marinka Žitnik and Blaž Zupan. Data fusion by matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):41–53, 2015.

[62] Martha White, Yaoliang Yu, Xinhua Zhang, and Dale Schuurmans. Convex multi-view subspace learning. In Proc. NIPS ’12, pages 1673–1681, USA, 2012.

[63] Yuhong Guo. Convex subspace representation learning from multi-view data. In Proc. AAAI ’13, pages 387–393, 2013.

[64] Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, and Xiaochun Cao. Low-rank tensor constrained multiview subspace clustering. In Proc. ICCV ’15, pages 1582–1590. IEEE, 2015.

[65] Qianxing Mo, Sijian Wang, Venkatraman E Seshan, Adam B Olshen, Nikolaus Schultz, Chris Sander, R Scott Powers, Marc Ladanyi, and Ronglai Shen. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proceedings of the National Academy of Sciences of the United States of America, 110(11):4245–50, 2013.

[66] Qianxing Mo, Ronglai Shen, Cui Guo, Marina Vannucci, Keith S Chan, and Susan G Hilsenbeck. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics, 19(1):71–86, 2018.

[67] Charles J. Vaske, Stephen C. Benz, J. Zachary Sanborn, Dent Earl, Christopher Szeto, Jingchun Zhu, David Haussler, and Joshua M. Stuart. Inference of patient-specific pathway activities from multidimensional cancer genomics data using PARADIGM. Bioinformatics, 26(12):i237–i245, 2010.

[68] Richard S. Savage, Zoubin Ghahramani, Jim E. Griffin, Bernard J. de la Cruz, and David L. Wild. Discovering transcriptional modules by Bayesian data integration. Bioinformatics, 26(12):i158–i167, 2010.

[69] Yinyin Yuan, Richard S. Savage, and Florian Markowetz. Patient-specific data fusion defines prognostic cancer subtypes. PLoS Computational Biology, 7(10):e1002227, 2011.

[70] Paul Kirk, Jim E. Griffin, Richard S. Savage, Zoubin Ghahramani, and David L. Wild. Bayesian correlated clustering to integrate multiple datasets. Bioinformatics, 28(24):3290–3297, 2012.

[71] Eric F. Lock and David B. Dunson. Bayesian consensus clustering. Bioinformatics, 29(20):2610–2616, 2013.

[72] Evelina Gabasova, John Reid, and Lorenz Wernisch. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets. PLOS Computational Biology, 13(10):e1005781, 2017.

[73] Ashar Ahmad and Holger Fröhlich. Towards clinically more relevant dissection of patient heterogeneity via survival-based Bayesian clustering. Bioinformatics, 33(22):3558–3566, 2017.

[74] Pietro Coretto, Angela Serra, and Roberto Tagliaferri. Robust clustering of noisy high-dimensional gene expression data for patients subtyping. Bioinformatics, page bty502, 2018.

[75] Kumardeep Chaudhary, Olivier B Poirion, Liangqun Lu, and Lana X Garmire. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clinical Cancer Research, 24(6):1248–1259, 2018.

[76] Muxuan Liang, Zhizhong Li, Ting Chen, and Jianyang Zeng. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(4):928–937, 2015.

[77] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proc. COLT ’98, pages 92–100, New York, New York, USA, 1998. ACM Press.

[78] Peter J. Bickel, Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, Teófilo Valdés, Carlos Rivero, Jianqing Fan, and Aad van der Vaart. Regularization in statistics. Test, 15(2):271–344, 2006.

[79] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58(1):267–288, 1996.

[80] Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1/2):91–118, 2003.

[81] Thomas Hofmann. Probabilistic latent semantic analysis. In Proc. UAI ’99, pages 289–296, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.

[82] Sandro Vega-Pons and José Ruiz-Shulcloper. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(03):337–372, 2011.

[83] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

[84] Bojan Mohar. The Laplacian spectrum of graphs. Graph Theory, Combinatorics, and Applications, 2:871–898, 1991.

[85] L. Lovász. Random walks on graphs: A survey. Combinatorics, (2):1–46, 1993.

[86] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, 1988.

[87] D. R. Cox and David Oakes. Analysis of Survival Data. Chapman and Hall, 1984.

[88] Kamalika Chaudhuri, Sham M Kakade, Karen Livescu, and Karthik Sridharan. Multi-view clustering via canonical correlation analysis. In Proc. ICML ’09, pages 1–8, 2009.

[89] Francis R Bach and Michael I Jordan. A probabilistic interpretation of canonical correlation analysis. Dept Statist Univ California Berkeley CA Tech Rep, 688:1–11, 2006.

[90] Max Bylesjö, Daniel Eriksson, Miyako Kusano, Thomas Moritz, and Johan Trygg. Data integration in plant biology: The O2-PLS method for combined modeling of transcript and metabolite data. The Plant Journal, 52(6):1181–1191, 2007.

[91] Said el Bouhaddani, Jeanine Houwing-Duistermaat, Perttu Salo, Markus Perola, Geurt Jongbloed, and Hae-Won Uh. Evaluation of O2-PLS in omics data integration. BMC Bioinformatics, 17(S2):S11, 2016.

[92] D. Hwang, G. Stephanopoulos, and C. Chan. Inverse modeling using multi-block PLS to determine the environmental conditions that provide optimal cellular function. Bioinformatics, 20(4):487–499, 2004.

[93] Stéphane Dray, Daniel Chessel, and Jean Thioulouse. Co-inertia analysis and the linking of ecological data tables. Ecology, 84(11):3078–3089, 2003.

[94] H. Sebastian Seung and Daniel D. Lee. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.

[95] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. Adv in Neural Inf Proc Syst, (February):535–541, 2001.

[96] Marinka Žitnik and Blaž Zupan. Survival regression by data fusion. Systems Biomedicine, 2(3):47–53, 2015.

[97] Katherine A. Hoadley, Christina Yau, Toshinori Hinoue, Denise Wolf, Alexander Lazar, Esther Drill, Ronglai Shen, Alison M. Taylor, Andrew Cherniack, Vésteinn Thorsson, Rehan Akbani, Reanne Bowlby, Christopher K. Wong, Maciej Wiznerowicz, Francisco Sánchez-Vega, Gordon Robertson, Barbara G. Schneider, Michael Lawrence, Houtan Noushmehr, and Armaz Mariamidze. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell, 173:291–304.e6, 2018.

[98] Sam Ng, Eric A Collisson, Artem Sokolov, Theodore Goldstein, Abel Gonzalez-Perez, Nuria Lopez-Bigas, Christopher Benz, David Haussler, and Joshua M Stuart. PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis. Bioinformatics, 28(18):i640–i646, 2012.

[99] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984.

[100] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[101] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS ’12, volume 1, pages 1097–1105, 2012.

[102] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Proc. NIPS ’14, pages 3104–3112, Cambridge, MA, USA, 2014. MIT Press.

[103] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. Multimodal deep learning. In Proc. ICML ’11, pages 689–696, 2011.

[104] Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On deep multi-view representation learning: Objectives and optimization. In Proc. ICML ’16, pages 1083–1092, 2016.

[105] Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, Christopher A. Lavender, Srinivas C. Turaga, Amr M. Alexandari, Zhiyong Lu, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H. S. Segler, Simina M. Boca, S. Joshua Swamidass, Austin Huang, Anthony Gitter, and Casey S. Greene. Opportunities and obstacles for deep learning in biology and medicine. Journal of The Royal Society Interface, 15(141):20170387, 2018.

[106] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[107] David W. Hosmer, Stanley Lemeshow, and Susanne May. Applied survival analysis: regression modeling of time-to-event data. Wiley-Interscience, 2008.

[108] Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1):53–65, 1987.

[109] Fabio Vandin, Alexandra Papoutsaki, Benjamin J. Raphael, and Eli Upfal. Accurate computation of survival statistics in genome-wide studies. PLOS Computational Biology, 11(5):1–18, 2015.

[110] Miriam Ragle Aure, Israel Steinfeld, Lars Oliver Baumbusch, Knut Liestøl, Doron Lipson, Sandra Nyberg, Bjørn Naume, Kristine Kleivi Sahlberg, Vessela N. Kristensen, Anne-Lise Børresen-Dale, Ole Christian Lingjærde, and Zohar Yakhini. Identifying in-trans process associated genes in breast cancer by integrated analysis of copy number and expression data. PLOS ONE, 8(1):1–15, 2013.
