PT - JOURNAL ARTICLE
AU - Young Hwan Chang
AU - Jim Korkola
AU - Dhara N. Amin
AU - Mark Moasser
AU - Jose M. Carmena
AU - Joe W. Gray
AU - Claire J. Tomlin
TI - Disentangling Multidimensional Spatio-Temporal Data into their Common and Aberrant Responses
AID - 10.1101/004259
DP - 2014 Jan 01
TA - bioRxiv
PG - 004259
4099 - http://biorxiv.org/content/early/2014/04/28/004259.short
4100 - http://biorxiv.org/content/early/2014/04/28/004259.full
AB - With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and face challenges in organizing and processing them and in extracting information as meaningful structures. Multidimensional spatio-temporal biological data sets, such as time series gene expression with various perturbations over different cell lines, or neural spike trains across many experimental trials, have the potential to yield insight across multiple dimensions. For this potential to be realized, we need a suitable representation to understand the data. Since a wide range of experiments and the unknown complexity of the underlying system contribute to the heterogeneity of biological data, we propose a method based on Robust Principal Component Analysis (RPCA), which is well suited for extracting principal components when there are corrupted observations. The proposed method provides a new representation of these data sets in terms of a common and an aberrant response. This representation may help users gain new insight from the data.

Author Summary

One of the most exciting trends and important themes in science and engineering involves the use of high-throughput measurement data. Because they span different dimensions, for example various perturbations, different drug doses, or cell line characteristics, such multidimensional data sets enable us to understand commonalities and differences across multiple dimensions.
A general question is how to organize the observed data into meaningful structures and how to find an appropriate similarity measure. A natural way of viewing these complex high-dimensional data sets is to examine and analyze the large-scale features first and then to focus on the interesting details. With this notion, we propose an RPCA-based method that models common variations as an approximately low-rank component and anomalies as a sparse component. We show that, by separating common responses from abnormal responses, the proposed method is able to find distinct subtypes and classify data sets robustly without any prior knowledge.
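
To make the low-rank plus sparse idea concrete, the following is a minimal sketch of standard RPCA (principal component pursuit), which splits a data matrix M into L + S by minimizing the nuclear norm of L plus lam times the entrywise l1 norm of S. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which RPCA variant or solver is used, so the inexact augmented Lagrangian solver, the default lam = 1/sqrt(max(m, n)), the step-size schedule, and the function names `shrink`, `svd_threshold`, and `rpca` are all choices made here for the example.

```python
import numpy as np


def shrink(X, tau):
    """Entrywise soft-thresholding: proximal operator of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into a low-rank part L (common response) and a sparse part S
    (aberrant response) with an inexact augmented Lagrangian iteration.

    Assumed, not taken from the source: the default lam, the initial
    penalty mu, and the growth factor 1.5 are conventional choices.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    norm_M = np.linalg.norm(M)  # Frobenius norm
    if norm_M == 0.0:
        return L, S  # nothing to decompose
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(M, 2)  # penalty parameter, grows each step
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S  # constraint residual
        Y = Y + mu * R
        mu *= 1.5
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

In this sketch, one might arrange the data so that each row of M is an experimental condition (for example a cell line or perturbation) and each column a time point; calling `L, S = rpca(M)` then gives a candidate common response in L, while the nonzero entries of S flag condition and time-point pairs that deviate from it.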