PT - JOURNAL ARTICLE
AU - Young Hwan Chang
AU - Jim Korkola
AU - Dhara N. Amin
AU - Mark Moasser
AU - Jose M. Carmena
AU - Joe W. Gray
AU - Claire J. Tomlin
TI - Disentangling Multidimensional Spatio-Temporal Data into their Common and Aberrant Responses
AID - 10.1101/004259
DP - 2014 Jan 01
TA - bioRxiv
PG - 004259
4099 - http://biorxiv.org/content/early/2014/04/23/004259.short
4100 - http://biorxiv.org/content/early/2014/04/23/004259.full
AB - With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and face the challenge of organizing and processing them and extracting meaningful structure from them. Multidimensional spatio-temporal biological data sets, such as time-series gene expression measured under various perturbations across different cell lines, or neural spike recordings across many experimental trials, have the potential to yield insight across multiple dimensions. For this potential to be realized, we need a suitable representation that turns data into insight. Because the wide range of experiments and the (unknown) complexity of the underlying systems make biological data more heterogeneous than data in other fields, we propose a method based on Robust Principal Component Analysis (RPCA), which is well suited to extracting principal components from corrupted observations. The proposed method provides a new representation of these data sets that consists of their common and aberrant responses. This representation may help users gain new insight from the data.

Author Summary
One of the most exciting and important trends in science and engineering is the use of high-throughput measurement data. When such data span multiple dimensions, for example various perturbations, different drug doses, or cell-line characteristics, they let us study commonalities and differences across those dimensions.
A general question is how to organize the observed data into meaningful structures and how to find an appropriate similarity measure. A natural way to view such complex, high-dimensional data sets is to first examine and analyze their large-scale features and then focus on the interesting details. With this in mind, we propose an RPCA-based method that models common variation as an approximately low-rank component and anomalies as a sparse component. We show that the proposed method can find distinct subtypes and robustly classify data sets by separating common responses from abnormal responses, without any prior knowledge.
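The record does not spell out the optimization behind the low-rank plus sparse split. What follows is a minimal sketch, assuming the standard Principal Component Pursuit formulation of RPCA (minimize the nuclear norm of L plus λ times the entrywise L1 norm of S, subject to L + S = M), solved with an inexact augmented Lagrange multiplier scheme. The function names (`rpca`, `shrink`, `svd_shrink`), the layout (one column per condition, rows as features such as gene by time point), the default λ = 1/√max(n₁, n₂), the column-scoring rule, and the synthetic data are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def shrink(x, tau):
    """Elementwise soft-thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_shrink(x, tau):
    """Singular value thresholding: soft-threshold the singular values of x."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(s, tau)) @ vt


def rpca(m, lam=None, tol=1e-7, max_iter=500):
    """Split m into low-rank L plus sparse S (Principal Component Pursuit).

    Approximately solves  min ||L||_* + lam * ||S||_1  s.t.  L + S = m
    using the inexact augmented Lagrange multiplier (IALM) scheme.
    """
    m = np.asarray(m, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))  # standard PCP weight (assumption)
    spectral = np.linalg.norm(m, 2)  # largest singular value
    y = m / max(spectral, np.abs(m).max() / lam)  # dual variable init
    mu, mu_max, rho = 1.25 / spectral, 1.25e7 / spectral, 1.5
    sparse = np.zeros_like(m)
    norm_m = np.linalg.norm(m, "fro")
    for _ in range(max_iter):
        low = svd_shrink(m - sparse + y / mu, 1.0 / mu)   # low-rank update
        sparse = shrink(m - low + y / mu, lam / mu)       # sparse update
        residual = m - low - sparse
        y = y + mu * residual                             # dual ascent step
        mu = min(mu * rho, mu_max)                        # grow penalty, capped
        if np.linalg.norm(residual, "fro") <= tol * norm_m:
            break
    return low, sparse


# Hypothetical data: 100 features by 40 conditions.
rng = np.random.default_rng(0)
common = np.outer(rng.normal(size=100), rng.normal(size=40))  # shared rank-1 response
aberrant = np.zeros_like(common)
aberrant[0:5, 3] = 10.0     # condition 3 deviates on features 0-4
aberrant[50:55, 7] = -10.0  # condition 7 deviates on features 50-54
data = common + aberrant

low_rank, sparse = rpca(data)
col_score = np.abs(sparse).sum(axis=0)  # size of each condition's aberrant part
flagged = np.flatnonzero(col_score > 0.5 * col_score.max())
print(flagged)
```

Here the low-rank part stands in for the common response, and scoring each column by the magnitude of its sparse part is one simple, hypothetical way to single out conditions with aberrant responses. The weight λ controls the trade-off: larger values penalize S more heavily and push more of the signal into the low-rank part.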