Abstract
Omics data contains signal from the molecular, physical, and kinetic inter- and intra-cellular interactions that control biological systems. Matrix factorization techniques can reveal low-dimensional structure in high-dimensional data that reflects these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in topics ranging from pathway discovery to time course analysis. We review exemplary applications of matrix factorization for systems-level analyses. We discuss the appropriate application of these methods and their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with matrix factorization enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
Glossary
- Amplitude matrix
- The matrix learned from MF that contains molecules in rows and factors in columns. Each column represents the relative contribution of each molecule (e.g., gene) to a factor, which can be used to define a molecular signature for a CBP.
- Complex biological process (CBP)
- The coregulation or coordinated effect of multiple molecular species resulting in one or more phenotypes. Examples range from the activation of multiple proteins in a single cellular signaling pathway to the epistatic regulation of development.
- Computational microdissection
- A computational method to learn the composition of a heterogeneous sample, e.g., the cell types in a tissue sample.
- Independent Component Analysis (ICA)
- An MF technique that learns statistically independent factors.
- Matrix Factorization (MF)
- A technique to approximate a data matrix by the product of two matrices (see Box 1), one of which we call the amplitude matrix and the other the pattern matrix.
- Non-negative Matrix Factorization (NMF)
- An MF technique for which all elements of the amplitude and pattern matrices are greater than or equal to zero.
- Pattern matrix
- The matrix learned from MF that contains factors in rows and samples in columns. Each row represents the relative contribution of the samples in a factor, which can be used to define the relative activity of CBPs in each sample.
- Principal Component Analysis (PCA)
- An MF technique that learns orthogonal factors ordered by the relative amount of variation of the data that they explain.
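
To make the glossary terms concrete, the following minimal sketch factorizes a small synthetic molecules-by-samples matrix into an amplitude matrix and a pattern matrix in two ways: PCA computed through a singular value decomposition, and NMF computed with the standard multiplicative updates for squared error. It is an illustration under stated assumptions, not the procedure used in any study reviewed here. The function names (`pca_factorize`, `nmf_factorize`), the synthetic data, the number of factors, and the iteration count are all hypothetical choices; ICA is not shown.

```python
import numpy as np


def pca_factorize(data, k):
    """Rank-k PCA of a molecules x samples matrix via SVD (illustrative)."""
    # Center each molecule (row) across samples before decomposing.
    centered = data - data.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    amplitude = u[:, :k] * s[:k]  # molecules x factors; entries may be negative
    pattern = vt[:k, :]           # factors x samples; rows are orthonormal
    return amplitude, pattern, centered


def nmf_factorize(data, k, n_iter=500, seed=0, eps=1e-9):
    """NMF by multiplicative updates for squared error (illustrative)."""
    rng = np.random.default_rng(seed)
    n_molecules, n_samples = data.shape
    # Random positive starting point; the result depends on this seed.
    amplitude = rng.random((n_molecules, k)) + eps
    pattern = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        pattern *= (amplitude.T @ data) / (amplitude.T @ amplitude @ pattern + eps)
        amplitude *= (data @ pattern.T) / (amplitude @ pattern @ pattern.T + eps)
    return amplitude, pattern


# Synthetic example (assumed): 20 molecules, 6 samples, 2 underlying factors.
rng = np.random.default_rng(42)
true_amplitude = rng.random((20, 2))
true_pattern = rng.random((2, 6))
data = true_amplitude @ true_pattern

pca_amp, pca_pat, centered = pca_factorize(data, k=2)
nmf_amp, nmf_pat = nmf_factorize(data, k=2)

# Reconstruction errors: PCA approximates the centered data, NMF the raw data.
pca_error = np.linalg.norm(centered - pca_amp @ pca_pat)
nmf_error = np.linalg.norm(data - nmf_amp @ nmf_pat)
print("PCA amplitude shape:", pca_amp.shape, "pattern shape:", pca_pat.shape)
print("NMF amplitude shape:", nmf_amp.shape, "pattern shape:", nmf_pat.shape)
print("PCA reconstruction error:", pca_error)
print("NMF reconstruction error:", nmf_error)
```

In both cases the amplitude matrix has one row per molecule and one column per factor, and the pattern matrix has one row per factor and one column per sample, matching the definitions above. The two methods differ in their constraints: the PCA pattern rows are orthogonal and ordered by explained variation, whereas the NMF factors are constrained only to be non-negative and can vary with the random initialization.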