Deep learning-enhanced morphological profiling predicts cell fate dynamics in real-time in hPSCs

Predicting how stem cells become patterned and differentiated into target tissues is key for optimising human tissue design. Here, we established DEEP-MAP - for deep learning-enhanced morphological profiling - an approach that integrates single-cell, multi-day, multi-colour microscopy phenomics with deep learning and allows to robustly map and predict cell fate dynamics in real-time without a need for cell state-specific reporters. Using human pluripotent stem cells (hPSCs) engineered to co-express the histone H2B and two-colour FUCCI cell cycle reporters, we used DEEP-MAP to capture hundreds of morphological- and proliferation-associated features for hundreds of thousands of cells and used this information to map and predict spatiotemporally single-cell fate dynamics across germ layer cell fates. We show that DEEP-MAP predicts fate changes as early or earlier than transcription factor-based fate reporters, reveals the timing and existence of intermediate cell fates invisible to fixed-cell technologies, and identifies proliferative properties predictive of cell fate transitions. DEEP-MAP provides a versatile, universal strategy to map tissue evolution and organisation across many developmental and tissue engineering contexts.


INTRODUCTION
How to define cellular states and state transitions remains a fundamental question in stem cell biology. In recent years, approaches that can quantify properties of stem cells and their differentiated derivatives have become broadly recognised for their pivotal roles in predictable and robust synthetic tissue design (Brassard and Lutolf, 2019;Del Sol et al., 2017;Prochazka et al., 2017). Although RNAseq-and DNA barcoding-based single-cell technologies are used extensively for cell state mapping and lineage analysis (Cahan et al., 2014;Kinney et al., 2019;Rackham et al., 2016;VanHorn and Morris, 2021), these have critical limitations: cells are killed and their characteristics are measured only at that point in time. Hence, historical cell state information that might be key to fate evolution -e.g. transient cell states that disappear before the endpoint observation, dying cells at the time of observation, cell tissue context and density prior fate acquisition -is lost. Furthermore, for single-cell technologies time is an implicit variable that can only be inferred mathematically. This is important, because the true dynamical and temporal variables (individual and collective/reciprocal cell movements, the actual duration and longevity of cell states, and how cell heterogeneity changes and co-evolves with fate) is lost as well (Villoutreix, 2021).
In contrast, continuous live cell imaging provides a non-interventional interrogation of cell state dynamics, including historical information of cell states and cell state transitions, in real-time (Chessel and Carazo Salas, 2019). 'Live' tracking of cell states has so far relied on using fluorescently-tagged transcription factors to reveal cell fate/state (Etzrodt and Schroeder, 2017;Filipczyk et al., 2015;Strebinger et al., 2019;Wolff et al., 2018), although newer strategies using indirect reporters of transcription factor status are emerging (Kim et al., 2021).
Crucially, each differentiation or other experimental scenario requires the establishment of a tailor-made reporter. This limitation becomes prohibitive when aiming to study cell differentiation into multiple target fates, as the possible reporters that can be co-expressed and imaged 'live' simultaneously in cells are relatively few (Kim et al., 2021). Developing technologies to quantitatively monitor cell state dynamics in real-time and without the need for specialised cell state reporters would overcome these limitations and provide a powerful solution to study and predict cell state transitions.
In the past 15 years, high-dimensional, image-based morphological profiling enhanced by machine learning has been used successfully by many groups including ours in a variety of model systems to systematically identify and characterise genes/gene-network states (Chong et al., 2015;Collinet et al., 2010;Fuchs et al., 2010;Graml et al., 2014;Neumann et al., 2010), to identify and enhance signalling networks (Bakal et al., 2007;Evans et al., 2013;Horn et al., 2011), to characterise small molecule treatments (Bray et al., 2017;Bray et al., 2016) and to infer context-dependent gene functions (Sailem et al., 2020). To date, however, machine learning-enhanced 'live' morphological profiling has not been used to identify and predict cell fate transitions in real-time.
Here, we established DEEP-MAP, an approach that combines optimised multi-day, multicolour microscopy phenomics with deep neural networks, and allows to harness highdimensional morphological profiling information obtained from hundreds of thousands of 'live' cells, to map and predict cell fate dynamics in real-time. We applied DEEP-MAP to human pluripotent stem cells (hPSCs) co-expressing broadly used cell proliferation and cell cycle reporters and asked whether phenotypic profiling alongside proliferation information can be used to identify cell states and monitor multi-fate dynamics. We show that based only on morphological and proliferative features DEEP-MAP can reliably map and spatiotemporally predict, at a single-cell level, cells' fate and fate dynamics during differentiation into the three basic germ layer fates, without a need for customised cell state reporters specific to the cell fates and lineages being monitored. We show that DEEP-MAP predicts fate changes with high temporal sensitivity -comparable to what is currently technically possible using fluorescentlylabelled transcription factor reporters. Moreover, we show that DEEP-MAP can reveal the timing of phenotypic transitions associated with cell fate conversions and the existence of cell fate intermediates, and be used to identify proliferative properties predictive of cell fate transitions.

Establishing a multi-day, multi-colour microscopy phenomics pipeline for dynamical phenotyping of hPSCs in real-time and at single-cell resolution.
To monitor hPSC morphology and proliferation dynamics at single-cell resolution, we generated a CRISPR knock-in three-colour hPSC line co-expressing a fluorescently-tagged histone H2B reporter, H2B-miRFP670 (Kim et al., 2021;Shcherbakova et al., 2016) (far red fluorescence emitting), and the two-colour FUCCI cell cycle reporter (Pauklin and Vallier, 2013;Sakaue-Sawano et al., 2008) (red/green fluorescence emitting) (Fig. 1A) (see STAR Methods for detailed descriptions from here on). When considering choice of reporters, we took into account that (1) hPSCs form very compact colonies; therefore fluorescent nuclear reporters would give the best shot at unequivocal single-cell identification and morphological profiling (2) FUCCI and H2B are robust cell proliferation reporters that together enable quantitative monitoring of detailed aspects of cell proliferation, cell cycle progression, mitosis and cell death and (3) the H2B signal specifically never disappears from cells, allowing uninterrupted cell visualisation. We then set out to establish a multi-colour, time-lapse highcontent microscopy pipeline enabling us to image cells continuously over time, in a way that would be compatible with daily media change, high-temporal resolution imaging (to allow automated cell detection and tracking) and multi-day imaging, required to observe fate transitions occurring on a time scale of days as cells continue to proliferate normally (Fig. 1B).
Frequent and extended time-lapse imaging is highly phototoxic (Loeffler and Schroeder, 2019;Piltti et al., 2018;Schroeder, 2011), particularly for stem cells, prohibiting imaging many reporters with high temporal frequency. To overcome this limitation, we developed an optimised imaging modality by which we imaged the H2B-miRFP670 signal every 5 minutes and the FUCCI signal only every 30 minutes. This scheme allowed us to achieve multi-day imaging of healthily proliferating cells for at least 3-5 days or more, limited only by cell confluence. In this manner, we captured multi-colour time-lapse images of cell populations across multiple image fields over time, and then used a customised image analysis pipeline to (A) digitally stitch neighbouring fields into larger images containing multiple hPSC colonies (B) detect all cells in all colonies through time (using nuclei as proxies; from here onwards cells/nuclei are used interchangeably) (C) where possible, track individual cells through time, and (D) detect and track all colonies through time (Fig. 1C). Typically, time series captured in this way consisted of 800-1000 images (timesteps), with each imaging field leading to >100,000 detected nuclei data points throughout the entire multi-day imaging sequence. After data capture, we used a trained multi-class Support Vector Machine (SVM) to classify and assign probabilities by machine learning to different types of nuclear events (interphase, metaphase, anaphase, cell death) (Fig. 1D), and then extracted, in addition to the SVMderived features (i.e. probability assignments), >550 different morphological, intensity and texture features across the multi-channel signals on a single-cell basis (Fig. 1E). Ultimately, this analysis yielded >550-dimensional phenoprints for hundreds of thousands of cells for each experimental condition analysed, capturing the morphological and spatiotemporal phenotype of that condition (Fig. 1F).
Live morphological profiling of hPSCs during pluripotency and early germ layer differentiation.
Next, we used this strategy to dynamically phenotype the evolution of hPSCs, either while maintaining pluripotency or when triggered to undergo early directed differentiation. Given that hPSCs have the capacity to differentiate into all three basic germ layers, we chose to trigger cells to undergo directed differentiation into primitive neural stem cells (NSCs), cardiac mesoderm-induced cells, and definitive endoderm, mimicking early ectoderm (EC), mesoderm (ME) and endoderm (EN), respectively ( Fig. S1; hereafter referred to as EC, ME and EN for shorthand). In our conditions, pluripotent hPSCs co-expressing H2B-miRFP670 and FUCCI formed colonies that grew exponentially and divided with a typical cell doubling time of 18 hours, in agreement with an average cell cycle duration of ~15 hours (Pauklin and Vallier, 2013), with most cells in S/G2/M (FUCCI-green) phases of the cell cycle ( Fig. S1A and Movie S1). Similarly, hPSCs triggered to initiate EC differentiation initially grew exponentially with a typical doubling rate of 18 hours in the first 48 hours, after which they visibly slowed their doubling time while gradually increasing the proportion of cells in G1 (FUCCI-red) ( Fig. S1B and Movie S2), coincident with differentiation onset. By contrast, cells triggered to differentiate into ME ( Fig. S1C and Movie S3) and EN ( Fig. S1D and Movie S4) changed their appearance after 24h of imaging, becoming more spread out and altering their nuclear shape and cell cycle reporter characteristics, especially in EN triggered cells (Figs. S1C-S1D).
To compare the phenotypic evolution of the four different cell populations quantitatively, we sought to map the high-dimensional phenoprints of the four cell populations. We found that linear dimensionality reduction by principal component analysis (PCA, (Pearson, 1901) did not separate the four populations well. This outcome suggested that population variance in feature space was overall comparable and overlapping ( Fig. 2A), possibly due to the low signal-tonoise (SNR) ratio imposed by the low exposure times necessary in our time-lapse imaging to keep cells healthy (Weigert et al., 2018). Similarly, when we used non-linear embeddings commonly used for single-cell transcriptomics analysis (Kobak and Berens, 2019), tdistributed stochastic neighbour embedding (tSNE (van der Maaten and Hinton, 2008)) and Uniform Manifold Approximation and Projection (UMAP, (McInnes et al., 2018), we found that (with the exception a subset of EN triggered cells) the four cell populations largely overlapped in those embedding spaces (Figs. 2B-2C).

Using neural networks to map phenotypic diversity.
To overcome these methodological limitations, we developed a supervised neural network (NN) embedding model that first uses the >550-dimensional feature datasets to learn to predict and spatially embed different cell state classes as separate as possible on a plane knowing their class labels, and can then be used subsequently to map new data points based on their feature phenoprints without knowledge of the class labels (Fig. 2D). We trained the model with four input classes: 'pluripotent', 'ectoderm', 'mesoderm' and 'endoderm'. 'Pluripotent' consisted of 2,675 early pluripotent datapoints (t=0h-6h, i.e. 0 days, when colonies are small) and 23,492 late pluripotent datapoints (t=60h40min-66h40min, i.e. 2.5 days, when colonies are visually large and dense). Importantly, the 'pluripotent' training class contained both early and late pluripotent cell datapoints to eliminate cell density as a potential confounding factor.
'Ectoderm' consisted of 23,659 late EC datapoints (t=77h20min-83h20min i.e. 3.5 days, when cells acquire SOX1+, denoting early EC fate (Kim et al., 2021)). 'Mesoderm' consisted of 15,278 late ME datapoints (t=77h20min-83h20min i.e. 3.5 days, when cells have been reported to exit the T+ state en route to becoming early cardiac mesoderm fate (Rao et al., 2016)). 'Endoderm' consisted of 1'296 late EN datapoints (t=60h40min-66h40min i.e. 2.5 days, when cells have been shown to become SOX17+ denoting early EN fate (Ogawa et al., 2013;Teo et al., 2011)). This approach proved to be pivotal for clearly separating the different cell states (Fig. 2E). Crucially, we were able to reproducibly do so across different datasets As expected, at early timepoints (0-1 days), all four cell populations were close to each other and located around the pluripotent embedding area, corresponding to the fact that they were all phenotypically pluripotent (Fig. 2F, bottom half). However, as time progressed, their temporal trajectories diverged, coincident with differential cell fate acquisition (Fig. 2F, top half). Similarly, by plotting the trajectories of single-cells tracked early in differentiation as vectors on the embedding (Fig. 2G), we observed that individual cells' trajectories started in the pluripotent area of the embedding and gradually diverged on the fate map as time progressed. Thus, multi-day, high-dimensional, image-based morphological profiling enhanced by deep learning allows to robustly map and predict cell fate evolution toward multiple, different cell fate outcomes, both at the population and at the single-cell level. We call our approach DEEP-MAP, for deep learning-enhanced morphological profiling.

Proliferative features predictive of cell fate.
With the DEEP-MAP NN model in place, we asked what features allows discrimination among different cell fate classes. Focusing on a small subset of biologically interpretable features, we carried out Bayesian network statistics and linear correlation analysis of the features and found that several features correlated directly with the cell fate class/label, indicating that they may be linked with the phenotypic states associated with the different cell fates (Fig. 3A). We then computed mutual information to quantitatively measure the mutual dependency between every feature and cell fate, as a way to estimate the importance of each feature to cell fate.
Overall, we found that cell density and G1 status (as assessed by FUCCI-red signal intensity) were the features with the highest importance for predicting the different cell state classes ( Fig. 3B), with different features displaying different degrees of importance across the different cell states. Specifically, cell density, G1 status and cell speed had high importance in both pluripotent and EC cells, with cell speed and nuclear area taking on increased importance in EC cells relative to pluripotent cells. These data suggest that cell speed and nuclear area are features that are discriminant of EC cells with respect to pluripotent cells, and that changes in cell migration control and shape/size may accompany pluripotency exit and onset of early ectodermal fate (Fig. 3C, top). By contrast, we found in ME cells that G1 status and cell speed had lower importance, and that the average distance of cells from the border of colonies (another measure of cell density) had a higher importance relative to pluripotent cells. This suggests that G1 status, cell speed and cell density discriminate ME cells, and that changes in cell cycle progression, migration and adhesion may accompany the onset of early mesodermal fate (Fig. 3C, bottom left). Finally, in EN cells we found that G1 status and cell speed had higher importance, and the average distance of cells from the border of colonies and cell density had a lower importance, relative to pluripotent cells, suggesting that those Next, we sought to exploit the predictive capacity of DEEP-MAP to observe single-cell fate dynamics in real-time. To this end, we used the NN cell fate class predictions to visually augment image sequences by displaying on the images the predominant (highest probability) predicted class for each single-cell through time. As expected, we observed that the overwhelming majority of pluripotent, EC, ME and EN triggered cells were predicted initially to be pluripotent ( Fig. 4A-4D, left most image panels and Movies S7-S10), and that as time proceeded pluripotent cells maintained that predicted fate even as colonies grew larger and denser (Fig. 4A). By contrast, after the trigger, EC cells began losing the pluripotent phenotype ~1 day later and began acquiring ectoderm phenotype by 2.5 days (Fig. 4B). Importantly, both of these timepoints are earlier than those at which pluripotency exit (e.g. OCT4 and NANOG loss, ~2-3 days (Li et al., 2011)) and ectodermal fate onset (e.g. Nestin and PAX6 gain (Li et al., 2011) and SOX1 gain (Kim et al., 2021), ~3-5 days) have been detected by quantitatively monitoring transcription factor levels. Hence, by using solely indirect proliferative readouts, DEEP-MAP is capable of detecting cell fate changes in 'live' cells earlier than what is currently technically possible. We were surprised to see that within only a few hours of receiving the differentiation trigger, ME and EN cells began losing the pluripotent phenotype ( Fig. 4C-4D). ME cells began to acquire the mesoderm phenotype within 1 day of receiving the differentiation trigger and increasingly acquired that predicted fate for multiple days (Fig. 4C).
By contrast, EN cells acquired the endoderm phenotype 2.5 days after trigger (Fig. 4D). Both of these differentiation onset timings are as early and possibly earlier than those at which early mesoderm (e.g. T gain, ~1-2 days; (Rao et al., 2016)) and early endoderm (e.g. FOXA2 and SOX17 gain, ~3 days; (Ogawa et al., 2013;Teo et al., 2011)) fate onset have been reported to occur. Taken together, these data indicate that DEEP-MAP can detect 'live' and dynamical cell fate-associated phenotypic changes earlier than previously possible, with unprecedented sensitivity and predictive power.
To gain further insights into the dynamics of lineage differentiation, we used the DEEP-MAP NN embedding to map real-time phenotypic trajectories of different EC colonies, by computing the average coordinates of each cell colony at every point in time through the 3.5 days of imaging ( Fig. 4E). While most EC colonies' trajectories began in the pluripotency domain and later evolved toward the ectoderm domain, they appeared to do so heterogeneously. By computing the predominant (highest probability) cell fate assignment for each separate colony as a function of time, we confirmed that colonies differed in the time of departure from pluripotency as well as in the time of acquisition of the majority ectodermal phenotype (Fig.   4F). By contrast, we found that EN colonies' trajectories proceeded much more homogeneously through the DEEP-MAP fate map, suggestive of a much more tightly controlled response of cells and colonies to differentiation triggers (Fig. 4G). Strikingly, all EN trajectories appeared to go through the mesoderm domain en route to acquisition of the endoderm phenotype, suggesting that acquisition of an early mesoderm-like state could be an intermediate in acquiring endodermal fate. This was evident when we looked at the colonies' predominant cell fate assignment as a function of time, which showed that EN-triggered colonies departed pluripotency early and almost synchronously, then acquired and maintained a predicted mesodermal phenotype for 1-1.5 days, and subsequently acquired an endodermal phenotype 40 hours later (Fig. 4H). Altogether, our findings demonstrate that, by integrating high-dimensional, image-based morphological profiling with deep learning, DEEP-MAP can predict cell fate dynamics 'live' in real-time with high temporal sensitivity across multiple fates.
Hence, this technology can be leveraged to reveal differences in cell fate dynamics between lineages, cells and colonies and can reveal the history, timing, and existence of cell fate intermediates, which can be elusive to fixed-cell technologies.

DISCUSSION
We have established DEEP-MAP, a deep learning-enhanced morphological profiling approach, which enables 'live' large-scale microscopy imaging and phenotyping of hPSC populations at single-cell level and predicts cell fate dynamics and transitions in real-time over several days. We generated by CRISPR knock-in hPSC lines stably co-expressing 'live' cell proliferation reporters and then established multicolour time-lapse imaging compatible with long-term cell viability and health, and compatible with proliferation and differentiation under the microscope. Our pipeline integrates image processing, machine learning and statistical analysis workflows to enable time-resolved, high-dimensional phenotyping of 100'000s of single-cells, as well as deep learning methods allowing robust visualisation, mapping, phenotypic clustering and prediction of cell fate dynamics toward multiple fate outcomes. We show that based on two very general and commonly used cell proliferation reporters -FUCCI and fluorescent H2B -DEEP-MAP yields rich, deep and continuous information about morphological and proliferative state that can predict and reveal cell fate dynamics 'live' in real-time and at single-cell level without a need for customised cell state reporters specific to the fates and lineages being monitored. The fact that the reporters used here are universally present and visible across developmental and cell differentiation contexts makes our approach highly versatile and broadly applicable.
Our choice to use nuclear-localised fluorescent reporters for morphological profiling was not fortuitous: hPSC colonies are very compact and therefore we used nuclear reporters to be able to unequivocally detect and, where possible, track cells through time. However, DEEP-MAP could be adapted to use other 'live' Cell Painting-type (Bray et al., 2016) morphological reporters, both genetically encoded and membrane permeable (e.g. fluorogenic probes (Mishra et al., 2019)), and applied to other cell types of choice. Similarly, DEEP-MAP could form the basis of methodologies to carry out label-free cell fate mapping and prediction (Christiansen et al., 2018;Ounkomol et al., 2018), by applying it to cells that are spatially sparse and can be unequivocally identified without the need for fluorescent reporters (Al-Zaben et al., 2019) or by using label-free imaging optical modalities (Kallepitis et al., 2017;Marrison et al., 2013).
Previous efforts have used deep learning to detect and predict cell states 'live' before. For instance, convolutional neural networks (CNNs) can detect morphological differences between unlabelled pluripotent and early epiblast-like differentiating cell patches of mouse embryonic stem cells just 20 minutes after onset of differentiation (Waisman et al., 2019).
Similarly, CNNs and recurrent neural networks (RNNs) can predict lineage choices of individual, disaggregated primary mouse hematopoietic stem and progenitor cells (HSPCs) up to three generations before molecular marker annotation, based on their morphological and displacement characteristics (Buggenthin et al., 2017). By combining morphological profiling applied to simple nuclear-localised cell proliferation readouts with supervised neural networks, DEEP-MAP goes significantly further. Our method captures the evolving dynamics of cell fate transitions over several days across entire colonies of highly compact human stem cells, demonstrating the suitability of our approach to large-scale, tissue-level fate dynamics' prediction with single-cell resolution.
Using DEEP-MAP we found that, upon receiving a differentiation trigger, hPSCs initiate pluripotency exit between 8 and 24 hours. This is as early and possibly earlier than previously detected using 'live' fluorescently-labelled transcription factor reporters in hPSCs (Kim et al., 2021;Wolff et al., 2018). Furthermore we found that early endoderm-triggered colony trajectories appear to transiently go through a predicted mesoderm phenotype, suggesting that acquisition of an early mesoderm-like fate could be an intermediate in acquiring endodermal fate. This is in agreement with previous observations in mouse embryonic stem cells (Kubo et al., 2004;Mahmood and Aldahmash, 2015;Tada et al., 2005), and suggests that our approach can reveal the existence of cell fate intermediates that would be invisible without real-time phenomics information.
In addition, our analysis also resolved visible differences in the real-time cell fate dynamics within and between early EC and EN lineages. This suggests that DEEP-MAP could become a general framework to quantify the heterogeneity, noise and speed of different cell fate transitions. Accordingly, DEEP-MAP could help identify the origins of variability in synthetic tissue generation, as well as give insights into how to engineer tissues predictably and robustly (Brassard and Lutolf, 2019;Prochazka et al., 2017).
In sum, DEEP-MAP provides a framework to quantitatively investigate the dynamics of cell fate decisions at single-cell level within tissues in real-time and at scale. By enabling the generation of spatiotemporal, predictive maps of tissue formation, this approach can help to measure, benchmark and, ultimately, predict and control complex tissue design.

Limitations of Study
In this study we used differentiation triggers known to result in high cell fate conversion rates.
This allowed us to assume at the endpoint that all cells acquired pseudo-homogeneously the intended fates. This assumption simplified the deep learning strategy, as it enabled us to use a common neural network architecture to both learn to spatially map cells as well predict cell fate dynamics. Such a strategy -where we constrain the network to simultaneously learn cell fate prediction while generating a spatial map -could be limiting when investigating more complex tissues or heterogenous cell populations, where differentiation may be less efficient or specific and might lead to generation of a variety of differentiation products (including cells of unknown fate). In that case using separate network architectures to learn mapping and visualisation could significantly help increase the cell fate predictive power required by the biology. In addition, combination of the approach described here with spatially resolved transcriptomics approaches (Marx, 2021) might also become possible in the future, providing a way to obtain a more detailed landscape of the resulting cell types generated, as well as quantitative, deep readouts on the transcriptional status of cells. Such a combined approach could lead to the generation of 'time machine'-type models of cell fate dynamics with increasingly precise, diverse, temporally-resolved and sensitive predictive power.
For donor constructs of H2B-miRFP670 and FUCCI, the fragments of 5' and 3' homology arms of ROSA26 locus were subcloned into H2B-670 (modified from pmiRFP670-N1, #79987, Addgene) and FUCCI (kind gift from Ludovic Vallier's lab, U. of Cambridge). All of the cloning procedures were performed using In-fusion HD Cloning kit (639650, Takara) for seamless DNA cloning.

Differentiation of hESCs into multiple lineages
To differentiate into neuroecto lineages, hESCs were applied to PSC neural induction medium Briefly, cells were grown in PSC definitive endoderm induction medium A for 1 day, then changed to definitive endoderm induction medium B till the end of experiment.

Time-lapse imaging
Established FUCCI/H2B hESC lines were plated onto Geltrex-coated CellCarrier-96 Ultra Microplates (6055302, Perkin Elmer) a day before imaging. Differentiation trigger was applied by gently changing to the differentiation medium before imaging. Cells were imaged using a Yokogawa CV7000 high throughput confocal microscope (Wako). FUCCI signal was captured every 30 min using both 488 and 561 nm at 150 and 350 ms of exposure time and H2B-miRFP670 signal was captured using 640 nm at 500 ms of exposure time every 5 min, respectively. Time-lapse imaging was performed for 5 days.

LEVER Processing
The open source LEVER software package Wait et al., 2014; 2016) (https://leverjs.net) is utilised to segment and track cells from hPSC time-lapse image sequences. The H2B channel TIFF files generated from the microscopy experiments are imported into the LEVER file format, and an ensemble-based segmentation algorithm and a cell tracking algorithm are then applied to the image sequences. The segmentation algorithm separates foreground and background regions through combining an adaptive intensity thresholding with a Laplacian of Gaussian filter. The only parameter to the algorithm is the minimum cell radius, here set at 2.5μm. Following the segmentation, cells are tracked and mitotic events identified as described previously (Winter et al., 2015).

Feature Extraction
Feature extraction is carried out using a custom script MATLAB (MathWorks). The segmentation results generated by LEVER are read into MATLAB and correlated with the original TIFF images generated by the microscopy experiment across the H2B and FUCCI channels. The script then advances frame by frame, using the segmentation results as a registration method to record features from the original images. Feature extraction only occurs at timepoints where all 3 channels are present. Images undergo local background subtraction using a 50x50 pixel sized rolling average filter in order to correct for spatial variations in microscopy illumination.
Colonies of cells are identified using the DBScan algorithm (Sander et al., 1998) (Flusser, 2000;Marchant, 2021;Ming-Kuei, 1962), Zernike moments (Saki et al., 2013;Tahmasbi et al., 2011), Gray Level Run Length Matrices (Galloway, 1975;Wei, 2021), Gray Level Size Zone Matrices (Thibault et al., 2013), Neighborhood Grey Tone Difference Matrices (Amadasun and King, 1989;Vallières et al., 2015), shape descriptors, colony based features such as distance from the colony centroid and border, and cell density features. Machine (Allwein et al., 2000). The Support Vector Machine assigns the most likely class to each segmented cell from 5 different classes: 'interphase', chromatin is decondensed and occupies a mostly round nucleus, implying that the cell is in G1, S or G2 phases of the cell cycle; 'metaphase', condensed chromatin with chromosomes aligned on a metaphase plaate; 'anaphase', two nearby masses of condensed chromatin corresponding to the cell having undergone the metaphase to anaphase transition; ; 'apoptosis', nuclear debris, i.e. fragments of apparently condensed chromatin typical of cell death; and 'mis-segmented', the H2B signal segmentation is poor and corresponds possibly to more than one cell's nucleus (Note: this latter class is not shown in Figure 1 or mentioned in Figure 1 legend

Neural Network Embeddings
After feature pre-processing, datasets are fed into the NN embedding pipeline (see later), generating an XY coordinate for each datapoint that is recorded along with the cell ID, frame, label, and classification probabilities of belonging to each differentiation class. Neural network embeddings are plotted using MATLAB using the scatter function, with the data points being The features are standardised using the Z-score method and the selected feature is used to individually colour the scatter points.

Cell Fate Probability Tracks
Cell fate probability tracks for individual colonies are plotted using the classification probabilities assigned to each data point in that colony by the NN embedding. To compare the level of differentiation each colony has undergone, the classification probabilities of the nonpluripotent states are summed together to represent the degree of differentiation. The classification probabilities of all data points at a given frame within a colony is averaged together and the mean probability is plotted over time. The differentiated probability plot is coloured according to which differentiated state has the highest classification accuracy at that given frame. A rolling average window of 2.5 hours is applied to the classification probabilities in order to minimise the amount of noise in the trajectories.

Embedding and prediction by Deep Neural Network
Overview of the idea Although our framework is pipeline-based involving multiple stages with engineering-based feature extraction, the DL embedding model refines the extracted features by learning informative/discriminative representations. Its multi-layers progressively reduce the features' dimensionality with each layer as it learns higher-levels of feature abstraction and finally outputs a two dimensional representation at the last layer. This final representation serves as our 2D visualisable embedding of the input features.
Unlike unsupervised DL embedding extraction methods (David and James, 1987;Vincent et al., 2008) that use reconstruction loss for training, our loss exploits the available label providing sharper training guidance. That is, our learning approach is fully supervised with labels used to project the embedding of different classes in different separable clusters/regions. This is done by replacing the standard hyper-plane classification layer by a clustering classification one. That is, a cluster is assigned to each class and the input's class is predicted with probability proportional to its distance from the clusters' centre. This alters the loss function to penalise clusters close to each other and embeddings not falling in its assigned cluster (more details further below).
Detailed Pipeline Fig. 2D shows the DL-based embedding pipeline, which can be divided into three stages. First, the raw input image data are segmented and tracked using LEVER (see LEVER Processing section) to produce cell patches, which are further processed to extract morphological, fluorescence intensity and texture features at single-cell resolution (see Feature Extraction section). We then established a multi-layer DL NN to learn an informative/discriminative representation from the features. Although DL can be used to learn features from raw data in a fully automated way without experts' involvement, in our case, we used DL to refine the features by further reducing the dimensionality and noise before generating a 2D embedding at its last layer.
As stated above, training this embedding pipeline was done in a supervised way where labels were used to establish the objective function of the optimisation problem. To enforce separable embedding outputs in cloud shape, we replaced the classification layer normally attached to the NN by a clustering one. Nevertheless, our embedding-clustering network can still provide predictions as well as an embedding representation. In the following, we provide details of our NN architecture including both the embedding and the clustering components.

Network architecture
Our embedding network consists of 5 fully connected layers (the number of layers was chosen based on optimal performance), each followed by a ReLu non-linear function. The first layer reduces the input dimensions to hidden_dim dimensions that are then progressively reduced by 2 and finally to 2D at the last layer. Thus, the outputs of layers 2,3,4, and 5 are hidden_dim/2, hidden_dim/4, hidden_dim /8 and 2 respectively. These multi-layers of nonlinear transformations represent multi-levels of automatic feature extractions, where the input features at each layer are further refined so as to end up with an optimal features' output for the purpose of embedding representation expressed in the objective function. Denote ∈ 2 the embedding of input ∈ where is the number of input features. We express the embedding network transformation by governed by the network layers weight . The embedding = ( ) are then fed to the clustering network serving two purposes, prediction and to build the learning objective function. To force the embedding of a given class to fall in an isolated cloud, the clustering layer assigns a cluster centre to each class and expresses a probability distribution for each embedding over the classes, which is proportional to the distance from the classes' centres. Denote " ∈ # the centre of the cluster corresponding to class , thus, the prediction ∈ {1, … , } of embedding is distributed according to a categorical distribution governed by parameters = ( 1 , … ) where is the number of classes and " ∝ 1 − || − || such that ∑

Training method
So far, we have focused on the inference/feed-forward part of the model assuming the embedding parameters and the clustering ones , fixed. Next, we present the learning part (third stage in Fig 2D) of the method. As stated above, our approach is supervised, that is the ground truth (labels) are presented during learning. Thus, considering the model parameters as variables, we can use cross entropy as a loss function between the ground truth and the prediction outputs. Cross entropy loss is useful for training classification problems when the predicted output is governed by a multinomial/categorical distribution: where is the true probability.
Since, our ground truth is deterministic, we can write the loss as − ∑ log ( " Having defined the objective function, the learning is achieved by solving this optimisation problem with respect to embedding and clustering parameters. We used the Adam optimiser (Kingma and Ba, 2014) with weight decay regularisation.

Datasets and Implementation
We split our dataset into three portions, 80 % for training, 15 % for validation and 5 % for testing. The validation is used for parameter tuning. We performed a grid search over different learning rates, weight decays, number of network layers, hidden dimensions and batch size.
The parameters that gave the best results on the validation set were taken, namely, 0.0001 learning rate, 0.0001 weight decays, 512 hidden dimension, 64 batch size and 5 total fully connected layers. Our embedding-prediction code is implemented in Python 3.8 and using Pytorch 1.6. We trained on RTX 2080TI GPU on 53,120 (80 % of 66,400 total datapoints) datapoints consisting of four classes 20,933 Pluri, 18,927 Ecto, 1,036 Endo and 2,222 Meso.
The quantitative prediction results on the testing set as well as on data from an independent biological experiment are shown below:

Correlation and features importance by Bayesian statistics
Given the black-box nature of NN outputs, we sought to recall biological interpretability for our prediction/embedding pipeline. Supported by the pipeline's hybrid approach involving both manual feature extraction and DL feature learning components, we employed Bayesian statistics to provide declarative representation of how the input extracted features interact with each other. For this analysis, we only selected biologically interpretable features.
The methodology is divided into two main steps. Firstly, we learn a Bayesian network (i.e, Directed Acyclic Graph (DAG) representation) (Koller and Friedman, 2009) consisting of nodes representing the biologically interpretable features as random variables and edges expressing the conditional independency among these features. Doing so makes it computationally feasible to perform/approximate inference across the variables. Secondly, we use the trained network to perform inference, allowing us to compute linear correlations (Benesty et al., 2009) among the variables as well as the mutual information with the different fates.
The step of learning the Bayesian network is divided into two tasks. First, we learn the Bayesian network structure from the data using Hill-Climb Search (Koller and Friedman, 2009). The learned network structure expresses the conditional independency among features, but it does not encompass the optimal numerical parameters needed to compute the joint distributions and perform inference. To learn the network parameters, we maximize the expected log likelihood (Koller and Friedman, 2009) of the joint distribution defined by the learned network structure over the data with respect to its parameters.
Having learned the network structure and parameters, we are ready to infer needed information about the features' correlations as well as their importance to the cell fate. Fig 2A shows linear correlation among all interpretable features plus the cell fate. To compute this correlation, we first convert the Bayesian network to a Markov network using moralization (Koller and Friedman, 2009), and we then compute the covariance matrix using the marginal distribution induced by the network and take the values of connected variables. Linear correlation shows how features interact; however, they do not express the features importance to the cell fate. We used mutual information as a metric to measure the mutual dependency between every feature and the cell fate. The lower the mutual importance, the more independent the feature is from the cell fate, implying that less information is conveyed by the feature to the cell fate. That is, it expresses the feature's importance to the cell fate. Computing the mutual information between two variables entails knowing the joint and marginal distribution of these variables. We used Belief Propagation method to compute these distributions. Fig. 2B presents this feature importance measure for all the biologically interpretable features. In this figure, we do not set the cell fate variable and measure the importance to cell fate in general. In Fig. 2C, we set the cell fate variable as evidence and provide the feature importance to each of the four cell fate classes.
The Bayesian analysis code is implemented in Python.3.8 using the pgmpy 0.1 package. We adopt discrete Bayesian Network whose structure and parameters are learned using 391,725 discretised datapoints. Discretisation was performed using sklearn pre-processing package.     3.5 day time-lapse video sequence of hPSCs triggered to undergo ectoderm (EC) diffferentiation at time 0, corresponding to the same EC cells and colonies shown in Figure   S1. Images are fake coloured to display the feature value levels for cell density -a feature shown in Figure 3 to have high importance in distinguishing different cell fates -through time at single-cell level for each cell detected. Note that EC cells show high density through time.

Supplemental Video 6
3.5 day time-lapse video sequence of hPSCs triggered to undergo endoderm (EN) diffferentiation at time 0, corresponding to the same EN cells and colonies shown in Figure   S1. Images are fake coloured to display the feature value levels for cell density -a feature shown in Figure 3 to have high importance in distinguishing different cell fates -through time at single-cell level for each cell detected. Note that in EN cells cell density changes dramatically through time.

Supplemental Video 7
Visually augmented 3.5 day time-lapse video sequence of hPSCs cultured in pluripotency maintaining conditions, displaying predicted single-cell fate in real-time. The sequence corresponds to the the same pluripotent cells and colonies shown in Figure S1 and Video S1.
The cells' FUCCI signal is not shown, only H2B signal is shown faintly in the images, which instead show overlaid on the original image sequences the predominant (highest probability) fate predicted by the DEEP-MAP neural network through time at single-cell level for each cell detected. Magenta/blue/green/red: predicted hPSC/EC/ME/EN fate, correspondingly (colour code shown as image inset). hPSCs are robustly predicted correctly as hPSCs through time regardless of changes in colony size and density.

Supplemental Video 8
Visually augmented 3.5 day time-lapse video sequence of ectodermal-triggered hPSCs, displaying predicted single-cell fate in real-time. The sequence corresponds to the the same ectodermal (EC) triggered cells and colonies shown in Figure S1 and Video S2. The cells' FUCCI signal is not shown, only H2B signal is shown faintly in the images, which instead show overlaid on the original image sequences the predominant (highest probability) fate predicted by the DEEP-MAP neural network through time at single-cell level for each cell detected.
Magenta/blue/green/red: predicted hPSC/EC/ME/EN fate, correspondingly (colour code shown as image inset). EC triggered cells lose predicted hPSC status after just ~1 day following differentiation trigger and evolve gradually toward the EC fate.

Supplemental Video 9
Visually augmented 3.5 day time-lapse video sequence of mesodermal-triggered hPSCs, displaying predicted single-cell fate in real-time. The sequence corresponds to the the same mesodermal (ME) triggered cells and colonies shown in Figure S1 and Video S3. The cells' FUCCI signal is not shown, only H2B signal is shown faintly in the images, which instead show overlaid on the original image sequences the predominant (highest probability) fate predicted by the DEEP-MAP neural network through time at single-cell level for each cell detected.
Magenta/blue/green/red: predicted hPSC/EC/ME/EN fate, correspondingly (colour code shown as image inset). ME triggered cells lose predicted hPSC status after just ~1 day following differentiation trigger and evolve gradually toward the ME fate.

Supplemental Video 10
Visually augmented 3.5 day time-lapse video sequence of endodermal-triggered hPSCs, displaying predicted single-cell fate in real-time. The sequence corresponds to the the same endodermal (EN) triggered cells and colonies shown in Figure S1 and Video S4. The cells' FUCCI signal is not shown, only H2B signal is shown faintly in the images, which instead show overlaid on the original image sequences the predominant (highest probability) fate predicted by the DEEP-MAP neural network through time at single-cell level for each cell detected.
Magenta/blue/green/red: predicted hPSC/EC/ME/EN fate, correspondingly (colour code shown as image inset). EN triggered cells lose predicted hPSC status after just ~1 day following differentiation trigger and evolve gradually toward the EN fate apparently transiting through an intermediate ME fate.