RT Journal Article
SR Electronic
T1 Enter the matrix: factorization uncovers knowledge from omics
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 196915
DO 10.1101/196915
A1 Stein-O’Brien, Genevieve L.
A1 Arora, Raman
A1 Culhane, Aedin C.
A1 Favorov, Alexander V.
A1 Garmire, Lana X.
A1 Greene, Casey S.
A1 Goff, Loyal A.
A1 Li, Yifeng
A1 Ngom, Aloune
A1 Ochs, Michael F.
A1 Xu, Yanxun
A1 Fertig, Elana J.
YR 2018
UL http://biorxiv.org/content/early/2018/04/02/196915.abstract
AB Omics data contain signal from the molecular, physical, and kinetic inter- and intra-cellular interactions that control biological systems. Matrix factorization techniques can reveal low-dimensional structure in high-dimensional data that reflects these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in topics ranging from pathway discovery to time course analysis. We review exemplary applications of matrix factorization for systems-level analyses. We discuss appropriate application of these methods and their limitations, and focus on analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with matrix factorization enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
Amplitude matrix: The matrix learned from MF that contains molecules in rows and factors in columns. Each column represents the relative contribution of the genes in a factor, which can be used to define a molecular signature for a CBP.
Complex biological process (CBP): The coregulation or coordinated effect of multiple molecular species resulting in one or more phenotypes. Examples range from activation of multiple proteins in a single cellular signaling pathway to epistatic regulation of development.
Computational microdissection: A computational method to learn the composition of a heterogeneous sample, e.g., the cell types in a tissue sample.
Independent Component Analysis (ICA): An MF technique that learns statistically independent factors.
Matrix Factorization (MF): A technique to approximate a data matrix by the product of two matrices (see Box 1), one of which we call the amplitude matrix and the other the pattern matrix.
Non-negative Matrix Factorization (NMF): An MF technique for which all elements of the amplitude and pattern matrices are greater than or equal to zero.
Pattern matrix: The matrix learned from MF that contains factors in rows and samples in columns. Each row represents the relative contribution of the samples in a factor, which can be used to define the relative activity of CBPs in each sample.
Principal Component Analysis (PCA): An MF technique that learns orthogonal factors ordered by the relative amount of variation of the data that they explain.
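The MF and NMF definitions above can be made concrete with a small numerical sketch. The Python block below is only an illustration under stated assumptions, not the method of any work reviewed in the article: the function name `nmf`, its parameters, and the synthetic data are hypothetical, and the multiplicative update rule (Lee and Seung, squared-error objective) is just one common way to fit NMF. It shows a molecules-by-samples data matrix being approximated by the product of a non-negative amplitude matrix (molecules by factors) and a non-negative pattern matrix (factors by samples).

```python
import numpy as np


def nmf(data, n_factors, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix `data` (molecules x samples)
    by amplitude @ pattern, keeping both factors non-negative.

    Hypothetical helper for illustration; uses multiplicative updates
    that minimize the squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_molecules, n_samples = data.shape
    # Random non-negative starting points for both factor matrices.
    amplitude = rng.random((n_molecules, n_factors)) + eps
    pattern = rng.random((n_factors, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so the
        # entries can never become negative.
        pattern *= (amplitude.T @ data) / (amplitude.T @ amplitude @ pattern + eps)
        amplitude *= (data @ pattern.T) / (amplitude @ pattern @ pattern.T + eps)
    return amplitude, pattern


# Synthetic, assumed data: 50 molecules, 12 samples, 3 underlying factors.
rng = np.random.default_rng(1)
true_amplitude = rng.random((50, 3))
true_pattern = rng.random((3, 12))
data = true_amplitude @ true_pattern

amplitude, pattern = nmf(data, n_factors=3)

# Relative reconstruction error of the low-dimensional approximation.
rel_error = np.linalg.norm(data - amplitude @ pattern) / np.linalg.norm(data)
print("amplitude shape:", amplitude.shape)  # molecules x factors
print("pattern shape:", pattern.shape)      # factors x samples
print("relative error:", rel_error)
```

In this sketch, each column of `amplitude` plays the role of a molecular signature and each row of `pattern` gives the relative activity of that factor across samples, matching the glossary's row and column conventions. The number of factors is fixed by hand here; choosing it for real data is a separate question that this example does not address.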