Predicting cell fate commitment of embryonic differentiation by single-cell graph entropy

Cell fate commitment occurs during early embryonic development; that is, embryonic differentiation sometimes undergoes a critical phase transition, or “tipping point”, of cell fate commitment, at which the cell populations shift drastically or qualitatively. In this study, we present a novel computational approach, the single-cell graph entropy (SGE), to explore the gene-gene associations among cell populations based on single-cell RNA sequencing (scRNA-seq) data. Specifically, by transforming the sparse and fluctuating gene expression data into stable local network entropy, the SGE score quantitatively characterizes the criticality of gene regulatory networks among cell populations, and thus can be employed to predict the tipping point of cell fate or lineage commitment at the single-cell level. The proposed SGE method was applied to five scRNA-seq datasets. For all these datasets of embryonic differentiation, SGE effectively captures the signal of the impending cell fate transitions, which cannot be detected from gene expression alone. Some “dark” genes that are non-differential in expression but sensitive to SGE values were revealed. The successful identification of the critical transition in all five datasets demonstrates the effectiveness of our method in analyzing scRNA-seq data from a network perspective, and the potential of SGE to track the dynamics of cell differentiation.

these embryonic time-course differentiation datasets, the predicted cell fate transitions agree with the observations in the original experiments. In these applications, from the dynamic perspective, it is also demonstrated that SGE performs better than the original gene expression in temporal clustering of cells; that is, the clustering analysis based on SGE score accurately distinguishes the cell heterogeneity over time while the gene expression fails. Based on the temporal clustering by SGE, the cell-lineage trajectories can be presented to further study the cell differentiation paths. Besides, in the analysis of these single-cell datasets, SGE uncovers a few "dark" genes, which are non-differential in gene expression but sensitive to SGE score and may play important roles in embryonic development (Fig. 1D). Therefore, the SGE method provides a new way to analyze scRNA-seq data, and helps to track the dynamics of biological systems from the perspective of network entropy. The successful application of SGE validated its effectiveness in single-cell analysis.

18], this dynamical process is modeled as three states or stages (Figure 1C): (1) a before-transition stage with high resilience; (2) a critical stage, which is the tipping point or cell fate transition with low resilience; (3) an after-transition stage, which is another stable state with high resilience.

In this study, the cell-specific networks were constructed based on a recently proposed statistical model [12], which provides a statistical dependency index (defined as Eq. (1)) to determine the gene associations at a single-cell level in a reliable manner. The statistical index ranges between -1 and 1. A positive statistical dependency value implies a statistically interacting relation between two genes, i.e., there is an edge between the two genes in the cell-specific network.
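The edge rule stated above (a positive dependency index implies an edge) can be illustrated with a minimal sketch. It assumes that the dependency index values for one cell have already been computed and stored as a symmetric matrix; the function name `edges_from_dependency`, the gene labels, and the toy values are hypothetical and serve only as an illustration, not as the implementation used in this study.

```python
def edges_from_dependency(rho, genes):
    """Return undirected edges (gene_a, gene_b) with a positive dependency index.

    Assumption: rho is a symmetric matrix (list of lists) holding the
    statistical dependency index of ONE cell, with every value in [-1, 1].
    """
    edges = []
    n = len(genes)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, no self-loops
            if rho[i][j] > 0:      # positive index -> edge in the cell-specific network
                edges.append((genes[i], genes[j]))
    return edges


# Toy values, made up for illustration only.
genes = ["g1", "g2", "g3"]
rho = [
    [0.0, 0.12, -0.05],
    [0.12, 0.0, 0.0],
    [-0.05, 0.0, 0.0],
]
cell_edges = edges_from_dependency(rho, genes)
```

In this toy case only the pair (g1, g2) has a positive index, so the cell-specific network contains a single edge.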
Given the time series of single-cell RNA sequencing (scRNA-seq) data, the following algorithm is carried out to predict the critical transition.

constant is the number of neighbors in the local network. Clearly, the local SGE value (Eq. (2)) has been normalized to the number of nodes in a local network. After this step, the sparse gene expression matrix from the scRNA-seq data is transformed into a non-sparse graph entropy matrix
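To make the idea of a size-normalized local entropy concrete, a minimal sketch is given here. Since Eq. (2) is not reproduced in this excerpt, the sketch assumes a generic form: a Shannon entropy over the neighbor weights of a local network, divided by the logarithm of the number of neighbors so that local networks of different sizes are comparable. The function name `normalized_local_entropy` and the choice of weights are assumptions and may differ from the actual definition of the local SGE.

```python
import math


def normalized_local_entropy(weights):
    """Shannon entropy of neighbor weights, scaled to [0, 1] by log(M).

    Assumption: `weights` are non-negative values, one per neighbor of the
    center gene in a local network; M = len(weights) is the neighbor count.
    """
    m = len(weights)
    total = sum(weights)
    if m < 2 or total <= 0:
        return 0.0  # entropy is undefined or trivially zero in these cases
    h = 0.0
    for w in weights:
        if w > 0:  # terms with p = 0 contribute nothing
            p = w / total
            h -= p * math.log(p)
    return h / math.log(m)  # normalization by network size


uniform_score = normalized_local_entropy([1.0, 1.0, 1.0, 1.0])
concentrated_score = normalized_local_entropy([1.0, 0.0, 0.0])
```

Under this assumed form, equal weights give the maximal score of 1 and a single dominant neighbor gives 0, regardless of how many neighbors the local network has.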

For MEF-to-Neurons data, the mean SGE score abruptly increases from day 5 to day 20, as shown by the red curve in Fig. 2A. This significant change of SGE score provides the early-warning signal to

For hESCs-to-DECs data, the peak of the SGE score (the red curve in Fig. 2C)

shown by the median values of the box plot (the red box plot in Fig. 2C). Moreover, in terms of mean gene expression, there is no significant difference among the six time points (the blue curve in Fig. 2C).
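The way the mean SGE curve is read in these results (an abrupt increase or a peak across time points) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: per-cell SGE scores are assumed to be already grouped by time point, the helper names `mean_sge_by_time` and `largest_increase` are hypothetical, and the toy scores are made up and are not taken from any of the five datasets.

```python
from statistics import mean


def mean_sge_by_time(scores_by_time):
    """Return [(time_label, mean SGE)] in the given temporal order."""
    return [(t, mean(scores)) for t, scores in scores_by_time]


def largest_increase(curve):
    """Return (t_before, t_after, jump) for the largest rise between consecutive time points."""
    best = None
    for (t0, m0), (t1, m1) in zip(curve, curve[1:]):
        jump = m1 - m0
        if best is None or jump > best[2]:
            best = (t0, t1, jump)
    return best


# Toy per-cell SGE scores at four time points (illustrative values only).
toy_scores = [
    ("t1", [0.10, 0.12]),
    ("t2", [0.11, 0.13]),
    ("t3", [0.40, 0.44]),
    ("t4", [0.20, 0.22]),
]
curve = mean_sge_by_time(toy_scores)
t_before, t_after, jump = largest_increase(curve)
peak_time = max(curve, key=lambda item: item[1])[0]
```

In the toy series the steepest rise occurs between t2 and t3 and the curve peaks at t3, which is the kind of pattern interpreted above as an early-warning signal. Whether a rise is significant would still require a statistical test, which this sketch does not include.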

As the red curve in Fig. 2D shows, for MHCs-to-HCCs data, the drastic increase of average SGE appears from E11.5 to E12.5 and reaches its peak at E12.5, after which the hepatoblast-to-hepatocyte and cholangiocyte transition occurs [22]. Moreover, the median values of the red box plot of SGE score

The SGE method has been applied to mESCs-to-MPs data, which were obtained from an experiment on retinoic acid (RA)-driven differentiation of pluripotent mouse embryonic stem cells (mESCs) to lineage commitment [23]. As seen from the red curve in Fig. 2E, the mean SGE score reaches its peak

The data transformation from the gene expression matrix to the SGE matrix not only helps to detect the critical transitions of embryonic development, but also provides a better way to perform clustering analysis on cells during a biological process and thus explore dynamical information of cell populations.

The t-distributed stochastic neighbor embedding (t-SNE), a nonlinear method to perform dimension-

from MHCs to HCCs (Fig. 4F). The MHCs-to-HCCs transition occurs immediately after embryonic day 12.5 (E12.5), which is consistent with the original experimental observation [22].

some genes were also discovered as the "dark" genes, which were non-differential in gene expression but sensitive to SGE scores. These genes show a significant difference between the critical point and

hESCs-to-DECs data. Other "dark genes" for these three datasets are presented in Supplementary_material_3, Supplementary_material_4, and Supplementary_material_5, respectively. The results for the mESCs-to-MPs data and NPCs-to-Neurons data are provided in Supplementary_material_6 and Supplementary_material_7, respectively. There are no significant differential changes at the gene expression level, but there are significant differential changes at the network entropy (SGE) level. Some "dark genes" have been reported to be associated with embryonic development, which suggests that these "dark genes" play important roles in embryonic development.
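The selection criterion for "dark genes" described above (non-differential in expression, differential in SGE) can be sketched as a simple filter. The statistical test is not specified in this excerpt, so the sketch assumes a Welch t statistic compared against a hypothetical cutoff `t_cut`; the function names, the cutoff, and the toy values are illustrative assumptions only.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two samples (each needs at least two values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def find_dark_genes(expr_crit, expr_other, sge_crit, sge_other, t_cut=2.0):
    """Genes with |t| < t_cut for expression but |t| >= t_cut for SGE.

    Each argument maps gene -> list of per-cell values, at the critical
    point (crit) or at the other time points (other). t_cut is hypothetical.
    """
    dark = []
    for gene in expr_crit:
        t_expr = welch_t(expr_crit[gene], expr_other[gene])
        t_sge = welch_t(sge_crit[gene], sge_other[gene])
        if abs(t_expr) < t_cut and abs(t_sge) >= t_cut:
            dark.append(gene)
    return dark


# Toy values (made up): geneA changes only in SGE, geneB only in expression.
expr_crit = {"geneA": [5.0, 5.2, 4.9, 5.1], "geneB": [1.0, 1.1, 0.9, 1.0]}
expr_other = {"geneA": [5.1, 5.0, 5.2, 4.9], "geneB": [3.0, 3.1, 2.9, 3.0]}
sge_crit = {"geneA": [0.80, 0.82, 0.79, 0.81], "geneB": [0.30, 0.31, 0.29, 0.30]}
sge_other = {"geneA": [0.30, 0.32, 0.29, 0.31], "geneB": [0.30, 0.29, 0.31, 0.30]}
dark = find_dark_genes(expr_crit, expr_other, sge_crit, sge_other)
```

Here only geneA passes the filter: its expression is unchanged between the two groups while its SGE differs strongly. A conventional differential expression analysis would report geneB and miss geneA.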

For these three datasets, the "dark genes" that are associated with embryonic development are listed in Tables 1-3, respectively.

3.5 Revealing vital biological signals by common dark genes.

Based on genes with differential SGE values, we found 6 common signaling genes (CSGs) for human embryo development between NPCs-to-Neurons data and hESCs-to-DECs data (Figure S6A

As an important transcription factor, HLTF has both helicase and E3 ubiquitin ligase activities. We have noticed that it is directly involved in Ras activation upon Ca2+ influx through the NMDA

development. Besides, SGE helps to uncover "dark genes", which are non-differential in gene expression but sensitive to SGE score. Such non-differential genes were often ignored by the traditional differential gene expression analyses. However, some non-differential genes may also be involved in

(2) and then get local SGE scores corresponding to local networks.

(C) Critical transition can be predicted through the significant increase of SGE, i.e., the SGE stays smooth when the system is in the before-transition stage, while it increases abruptly when the system approaches the critical stage. (D) Different from the traditional biomarkers based on differential-expression genes, our SGE method uncovers some "dark genes", which are sensitive to network entropy (SGE) but non-differential at the gene expression level.

This is a provisional file, not the final typeset article

BAG1 | Cytoplasm | other | BAG1 is essential for differentiation and survival of hematopoietic and neuronal cells [33].
PPP2R2D | Nucleus | other | PPP2R2D is correlated with embryonic growth and development [34].
CDK6 | Nucleus | kinase | CDK6 has sub-type specific and cell cycle regulation-independent functions utilized during embryonic development and differentiation of stem cells [39].
CASP3 | Cytoplasm | peptidase | CASP3 promotes the differentiation of murine embryonic stem cells by cleaving the pluripotency factor Nanog [40].
UAP1 | Nucleus | enzyme | Defective FANCD2 regulated by UAP1 leads to the increase in chromosomal instability in mESCs and mouse embryonic lethality [42].