ABSTRACT
Gene expression measurements taken over multiple time points are useful for describing dynamic biological phenomena such as tissue development, regeneration, and cancer. However, because these phenomena involve multiple processes occurring in parallel, for example differentiation and proliferation, it is difficult to discern their respective contributions to a measured gene expression profile at any given point in time. Here, we demonstrate the use of unsupervised machine learning techniques to identify and “de-convolve” processes occurring in parallel in a simple model system. We first downloaded a published dataset of RNA-seq measurements from synchronized HeLa cells that were sampled at 14 consecutive time points. We then used Fourier analysis and topic modeling to identify two concurrent processes: a periodic process corresponding to cell division, and a transient process related to HeLa cell identity (e.g., cervical cancer) that is presumably required for recovery from cell cycle arrest. These results illustrate how unsupervised machine learning techniques can be used to identify hidden states and processes in the cell.
Competing Interest Statement
The authors have declared no competing interest.