## Abstract

Cell type-specific gene expression patterns are outputs of transcriptional gene regulatory networks (GRNs) that connect transcription factors and signaling proteins to target genes. These networks reconfigure during dynamic processes such as cell fate specification to drive diverse cellular states. Single-cell transcriptomic technologies, such as single cell RNA-sequencing (scRNA-seq) and single cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), can examine the transcriptional state of individual cells, allowing the study of cell-type specific gene regulation at unprecedented detail. However, current approaches to infer cell type-specific gene regulatory networks from these datasets are limited in their ability to integrate scRNA-seq and scATAC-seq measurements and to model network dynamics on a cell lineage. To address this challenge, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework to infer the gene regulatory network for each cell type on a lineage from scRNA-seq and scATAC-seq data. Using simulated, published and newly collected single cell omic datasets, we show that scMTNI is able to accurately infer gene regulatory networks and captures meaningful network dynamics that identify GRN components associated with cell type transitions. Application of our method to mouse cellular reprogramming identified key regulators associated with cell populations that reprogram versus those that are stalled. Taken together, scMTNI is a powerful framework to infer cell type-specific gene regulatory networks and their dynamics from scRNA-seq and scATAC-seq datasets.

## Introduction

Transcriptional gene regulatory networks (GRNs) specify connections between regulatory proteins and target genes and determine the spatial and temporal expression patterns of genes ^{1,2}. These networks reconfigure during dynamic processes such as development or disease progression to specify cell type-specific expression levels. Recent advances in single cell omic techniques such as single cell RNA-sequencing (scRNA-seq) and single cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) ^{3} enable collecting high resolution molecular phenotypes of a developing system and offer unprecedented opportunities for the discovery of cell type-specific regulatory networks and their dynamics. However, computational methods that systematically leverage these datasets to identify regulatory networks driving cell type-specific expression patterns are limited.

Existing methods of network inference from single cell omic data ^{4–16} have primarily used transcriptomic measurements and have low recovery of experimentally verified interactions ^{17,18}. Recently, a small number of methods have attempted to integrate scRNA-seq and scATAC-seq datasets ^{19,20} to examine gene regulation. However, the primary focus of these methods is to define cell clusters, and the network is defined entirely based on accessible sequence-specific motif matches. This restricts the class of regulators that can be incorporated into the regulatory network to those with known motifs. Furthermore, existing methods either infer a single GRN for the entire dataset or do not model the cell population structure, which is important for discerning dynamics and transitions in the inferred cell type-specific networks.

To overcome the limitations of existing methods, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework that integrates the cell lineage structure with scRNA-seq and scATAC-seq measurements to enable joint inference of cell type-specific GRNs. scMTNI takes as input a cell lineage tree, scRNA-seq data and scATAC-seq based prior networks for each cell type. scMTNI uses a novel probabilistic prior to incorporate the lineage structure during network inference and outputs GRNs for each cell type on a cell lineage. We performed a comprehensive benchmarking study of multi-task learning approaches, including scMTNI, on simulated data and show that incorporation of multi-task learning and tree structure is beneficial for GRN inference.

We applied scMTNI to a novel scRNA-seq and scATAC-seq time course dataset for cellular reprogramming in mouse and a published scRNA-seq and scATAC-seq cell type-specific dataset for human hematopoietic differentiation. We demonstrate the advantage of integrating scATAC-seq and scRNA-seq datasets for inferring cell type-specific GRNs and their dynamics. We examined how the inferred networks change along the trajectory and identified regulators and network components specific to different parts of the lineage tree. Our predictions include known as well as novel regulators of cell populations transitioning to different lineage paths, providing insight into regulatory mechanisms associated with hematopoietic specification and reprogramming efficiency.

## Results

### Single-cell Multi-Task learning Network Inference (scMTNI) for defining regulatory networks on cell lineages

We developed scMTNI, a multi-task graph learning framework for inferring cell type-specific gene regulatory networks from scRNA-seq and scATAC-seq datasets (**Figure 1A**), where a cell type is defined by a cluster of cells with a distinct transcriptional and accessibility profile. scMTNI models a GRN as a dependency network ^{21}, a probabilistic graphical model with random variables representing genes and regulators, such as transcription factors (TFs) and signaling proteins. scMTNI takes as input cell clusters with gene expression and accessibility profiles and a lineage structure linking the cell clusters (**Figure 1**). Such inputs can be obtained from existing methods for integrative clustering ^{22} and lineage construction ^{23}. scMTNI uses the scATAC-seq data for each cell cluster to define cell type-specific sequence motif-based TF-target interactions, which are used as a prior to guide network inference (**Methods**). For example, a motif for a particular TF that is accessible only in specific cell types will result in a TF-target interaction only in those cell types. The output of scMTNI is a set of cell type-specific GRNs, one for each cell cluster in the lineage tree. scMTNI’s multi-task learning framework incorporates a novel lineage tree prior, which uses the lineage tree structure to influence the similarity of gene regulatory networks on the lineage. This prior models the change of a GRN from a start state (e.g., a progenitor cell state) to an end state (e.g., a more differentiated state) as a series of individual edge-level probabilistic transitions. While scMTNI was developed to incorporate both scRNA-seq and scATAC-seq data, it can also be applied when scATAC-seq data, and therefore a cell type-specific prior network, are not available. We refer to the two versions of our approach as scMTNI+prior and scMTNI, depending on whether or not prior knowledge is used.
The output networks of scMTNI are analyzed using two dynamic network analysis methods: edge-based k-means clustering and topic models (**Figure 1B**). These approaches identify key regulators and subnetworks associated with a particular cell cluster or a set of cell clusters on a branch.
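The idea of edge-level probabilistic transitions along a lineage can be illustrated with a minimal sketch. It assumes that the presence of an edge in a child cell type depends only on its presence in the parent cell type. The parameter names `p_root`, `p_gain` and `p_maintain` and their values are hypothetical and chosen for illustration; they are not taken from the scMTNI implementation.

```python
import math

def edge_log_prior(tree, root, present, p_root=0.2, p_gain=0.1, p_maintain=0.8):
    """Log-probability of one edge's presence/absence pattern on a lineage tree.

    tree:    dict mapping each cell type to a list of its children
    present: dict mapping each cell type to True/False (edge present or not)
    All probabilities are illustrative assumptions.
    """
    logp = math.log(p_root if present[root] else 1.0 - p_root)
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            # Probability that the child has the edge, given the parent's state.
            p_on = p_maintain if present[parent] else p_gain
            logp += math.log(p_on if present[child] else 1.0 - p_on)
            stack.append(child)
    return logp

# Toy lineage: one progenitor with two descendant cell types.
tree = {"progenitor": ["A", "B"]}
conserved = {"progenitor": True, "A": True, "B": True}
scattered = {"progenitor": False, "A": True, "B": False}

lp_conserved = edge_log_prior(tree, "progenitor", conserved)
lp_scattered = edge_log_prior(tree, "progenitor", scattered)
```

Under these assumed parameters, an edge that is maintained along the lineage receives a higher prior probability than one that appears in a single descendant only, which is the sense in which a tree prior encourages related cell types to share network structure.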

### Multi-task learning algorithms outperform single-task algorithms for single cell network inference

To evaluate scMTNI and other existing algorithms against known ground truth networks on single-cell transcriptomic data, we set up a simulation framework that entailed creating a cell lineage and generating synthetic networks and corresponding single-cell expression datasets for each cell type on the lineage (**Figure 2A**). We used a probabilistic process of network structure evolution to simulate the network structure for three cell types, each containing 15 regulators, 65 genes and between 202 and 239 edges (**Methods**). Next, we applied BoolODE ^{17} to simulate the *in silico* single-cell expression data using each cell type’s simulated network. To mimic the sparsity in single-cell expression data, we set 80% of the values to 0. We created three datasets with different numbers of cells (2000, 1000 and 200), referred to here as dataset 1, dataset 2 and dataset 3, respectively.
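The sparsification step can be sketched as follows. This is a minimal illustration that assumes the zeroed entries are chosen uniformly at random, which the text does not specify; the function name `add_dropout` and the toy matrix are hypothetical.

```python
import random

def add_dropout(matrix, zero_fraction=0.8, seed=0):
    """Return a copy of `matrix` with `zero_fraction` of its entries set to 0.

    Entries are picked uniformly at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    positions = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_zero = round(zero_fraction * len(positions))
    masked = [row[:] for row in matrix]  # leave the input untouched
    for i, j in rng.sample(positions, n_zero):
        masked[i][j] = 0.0
    return masked

# Toy genes x cells matrix with strictly positive values.
expression = [[1.0 + g + c for c in range(10)] for g in range(5)]
sparse = add_dropout(expression)
n_zeros = sum(value == 0.0 for row in sparse for value in row)
```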

We asked whether multi-task learning is beneficial compared to single-task learning for network inference from scRNA-seq data. To this end, we compared scMTNI and four other multi-task learning algorithms, MRTLE ^{24}, GNAT ^{25}, Ontogenet ^{26}, and AMuSR ^{27}, to three single-task algorithms, LASSO regression ^{28}, INDEP, and SCENIC ^{29} (**Methods**). Of these methods, only SCENIC uses a non-linear regression model, while the others are based on linear models. INDEP is similar to scMTNI but does not incorporate the lineage prior. Each algorithm was applied within a stability selection framework and evaluated with the Area under the Precision-Recall curve (AUPR) and the F-score of the top *k* edges, where *k* is the number of edges in the true network (**Figure 2B, C**). On dataset 1, based on AUPR, scMTNI, MRTLE and AMuSR recover the network structure (**Figure 2B**) better than the other multi-task learning and single-task learning algorithms. Ontogenet performs better than the single-task learning algorithms in at least two cell types. Finally, GNAT performs comparably to the single-task learning algorithms. When comparing algorithms based on the F-score of the top *k* edges, we similarly observe that scMTNI and MRTLE perform better than the other algorithms (**Figure 2C**). Ontogenet performs better than LASSO and INDEP in at least two cell types and is comparable to SCENIC, except in cell type 3, where Ontogenet is worse than SCENIC. GNAT is comparable to the single-task learning algorithms for at least two of the cell types. The low F-score of AMuSR arises because its inferred networks are too sparse, with fewer than 100 edges, while the other algorithms inferred a number of edges similar to that of the true networks. These results remain consistent for datasets 2 and 3, which have fewer cells (1000 and 200, respectively): scMTNI and MRTLE remain superior to the other algorithms as measured by both AUPR and F-score (**Figure 2B, C**).
scMTNI is expected to perform well here because the network simulation procedure is similar to its model of network change, although the expression data are generated by a different and independent process. Finally, we aggregated the results across all three cell types and datasets to obtain an overall comparison of the algorithms. Here we considered algorithms across all parameter settings tested as well as the best parameter setting determined by the best F-score or AUPR. Based on the AUPR of the “all parameter setting”, we found that multi-task learning methods, especially scMTNI and MRTLE, are generally better than single-task learning methods, with higher AUPRs (**Supplementary Figure 1A, C**). AMuSR also outperformed the single-task algorithms based on AUPR, although less significantly than MRTLE and scMTNI. When considering the “best parameter setting”, the methods were not significantly different in AUPR, though MRTLE and scMTNI had the highest AUPR (**Supplementary Figure 1B, D**). When using the F-score, scMTNI and MRTLE remained the top performing algorithms for the “all parameter setting” (**Supplementary Figure 2A, C**) and the “best parameter setting” (**Supplementary Figure 2B, D**). Further, GNAT and Ontogenet had a higher F-score than the single-task learning method LASSO for both the “all parameter” and “best parameter” settings. AMuSR suffered on the F-score metric due to the high sparsity of its inferred networks. Among the single-task algorithms, LASSO had the worst performance. Overall, the results on the simulated networks suggest that multi-task learning algorithms perform better than single-task algorithms for network inference on sparse datasets that resemble single-cell transcriptomic data. Furthermore, scMTNI and MRTLE infer networks more accurately than the other multi-task learning algorithms.
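The F-score of the top *k* edges used in this comparison can be sketched as follows. This is a minimal illustration that assumes directed (regulator, target) edges and a dictionary of edge confidences; the function name `top_k_fscore` and the toy data are hypothetical.

```python
def top_k_fscore(scored_edges, true_edges):
    """F-score of the k highest-confidence edges, with k = len(true_edges).

    scored_edges: dict mapping (regulator, target) -> confidence
    true_edges:   set of (regulator, target) pairs in the ground truth
    """
    k = len(true_edges)
    ranked = sorted(scored_edges, key=scored_edges.get, reverse=True)
    predicted = set(ranked[:k])
    hits = len(predicted & true_edges)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / k
    return 2 * precision * recall / (precision + recall)

# Toy ground truth with three edges and an inferred network with four.
truth = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")}
inferred = {
    ("TF1", "g1"): 0.9,
    ("TF2", "g3"): 0.8,
    ("TF2", "g1"): 0.7,  # false positive inside the top 3
    ("TF1", "g2"): 0.1,  # true edge ranked below the cutoff
}
score = top_k_fscore(inferred, truth)
```

Because the cutoff is tied to the size of the true network, a method that returns far fewer than *k* edges has low recall and therefore a low F-score, even when its few predictions are precise. This is consistent with the behavior reported for AMuSR.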

### Inference of gene regulatory networks of somatic cell reprogramming to induced pluripotent stem cells

Cellular reprogramming is the process of converting cells in a differentiated state to a pluripotent state and is important in regenerative medicine as well as for generating patient-specific disease models. However, this process is inefficient, as only a small fraction of cells are reprogrammed to the pluripotent state ^{30}. To gain insight into the gene regulatory networks that govern the dynamics of this process, we profiled single cell accessibility (scATAC-seq) during reprogramming of mouse embryonic fibroblasts (MEFs) to the induced pluripotent state, covering the starting state, four intermediate timepoints (day 3, day 6, day 9 and day 12) and the end state, for a total of 6 timepoints. We used LIGER to integrate the scRNA-seq and scATAC-seq datasets (**Figure 3A, B**) and identified 8 clusters (**Methods**). Of these clusters, C4 is MEF-specific while C5 is ESC-specific (**Figure 3C, D**), and both showed good integration of the scRNA-seq and scATAC-seq profiles. We removed C6 because it did not contain scRNA-seq cells and applied a minimum spanning tree (MST ^{23}) approach to construct the cell lineage tree from the 7 cell clusters with both scRNA-seq and scATAC-seq cells (**Methods, Figure 3E**). The MEF-specific cluster (C4) is at one end of the tree, while the ESC-specific cluster (C5) is at the other end. This is consistent with the start and end states of the reprogramming process, and we considered C4 to represent the root of the tree.
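A minimum spanning tree over cell clusters can be sketched with Prim's algorithm. This illustration assumes that each cluster is summarized by a centroid and that clusters are compared by Euclidean distance; both choices, the function name and the toy coordinates are assumptions of the sketch and are not details of the procedure used in this study.

```python
import math

def minimum_spanning_tree(centroids, root):
    """Prim's algorithm; returns {child: parent} for a tree rooted at `root`.

    centroids: dict mapping cluster name -> coordinate tuple
    """
    parent = {}
    in_tree = {root}
    while len(in_tree) < len(centroids):
        best = None
        # Find the shortest link from the current tree to an unvisited cluster.
        for u in in_tree:
            for v in centroids:
                if v in in_tree:
                    continue
                d = math.dist(centroids[u], centroids[v])
                if best is None or d < best[0]:
                    best = (d, u, v)
        _, u, v = best
        parent[v] = u
        in_tree.add(v)
    return parent

# Toy centroids for four hypothetical clusters in a 2-D embedding.
centroids = {
    "start": (0.0, 0.0),
    "mid1": (1.0, 0.0),
    "mid2": (2.0, 0.5),
    "end": (3.0, 1.0),
}
lineage = minimum_spanning_tree(centroids, root="start")
```

Rooting the tree at the cluster that represents the known starting population, as done here with C4, gives each remaining cluster a single parent, which is the form of input a lineage-based prior requires.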

We applied scMTNI, scMTNI+prior (scMTNI with prior network), INDEP, INDEP+prior (INDEP with prior network) and SCENIC to this dataset (**Figure 3F**). We used the matched scATAC-seq clusters to obtain the transcription factor (TF)-target prior interactions for each scRNA-seq cluster that are needed for INDEP+prior and scMTNI+prior (**Methods**). We assessed the quality of the inferred networks by comparing them to three gold standard datasets in mouse embryonic stem cells (mESCs, **Table 2**): one derived from ChIP-seq experiments from the ESCAPE or ENCODE databases (referred to as “ChIP”) ^{31,32}, one from regulator perturbation experiments (referred to as “Perturb”) ^{31,33}, and the third from the intersection of edges in ChIP and Perturb (referred to as “ChIP+Perturb”). We compared the performance of the methods using the F-score on the top 500, 1k and 2k edges (**Figure 3F, Supplementary Figure 3, 4**). On Perturb and ChIP+Perturb, scMTNI+prior had a higher average performance, significantly outperforming the other methods on Perturb. On ChIP, SCENIC was generally better than the other methods. To examine the poorer performance of scMTNI+prior on the ChIP gold standard, we compared the regulators and targets in the inferred networks from each method. The number of regulators is similar between SCENIC and scMTNI, but SCENIC’s networks have more target genes and therefore recovered more targets from the gold standard datasets, resulting in a higher F-score. scMTNI+prior outperformed scMTNI on all but the ChIP dataset, and INDEP+prior outperformed INDEP, indicating that the addition of priors based on scATAC-seq data was beneficial.

To gain an initial assessment of the network dynamics on the cell lineage, we computed the F-score between each pair of inferred networks defined by the top 4k edges (**Figure 3G**). Both scMTNI and scMTNI+prior networks diverged in a manner consistent with the lineage structure. scMTNI networks formed three groups of cell types: (C4, C8, C1, C7), (C2, C3) and (C5 (ESC)). scMTNI+prior found similar groupings but placed C5 (ESC) closer to the (C1, C7, C8, C4) branch. Both methods showed that C5 is closest to C1, which could be an important transitioning state of cells during reprogramming. SCENIC showed similarity among C1, C4 and C7 but had lower similarity scores for most pairwise comparisons, which made it difficult to discern a clear lineage structure. The networks inferred by the other methods were very divergent, which is not biologically realistic because the reprogramming system is heterogeneous, with a number of transitioning populations. Overall, these results suggest that the regulatory networks recovered by scMTNI+prior are of high quality and exhibit a gradual rewiring of structure from the MEF to the pluripotent state.

### scMTNI predicts key regulatory nodes and GRN components that are rewired during reprogramming

To gain insight into which cell populations successfully reprogram versus those that do not, and to further characterize these different cell clusters, we examined the specific rewired network components in each cell type-specific network inferred by scMTNI+prior using two complementary approaches: k-means edge clustering and Latent Dirichlet Allocation (LDA, **Methods**). In the k-means edge clustering approach, we represented each edge in the top 4k confidence set of any cell cluster by a vector of its confidence scores in each cell cluster-specific network (an edge that is not inferred in a network is assigned a weight of 0). Next, we clustered edges based on their edge confidence patterns into 20 clusters, a number determined by optimizing the Silhouette Coefficient (**Figure 4A**). The largest “edge clusters” exhibited interactions specific to one cell cluster (e.g., E4, E6, E7, E11, E13, E15 and E16), while smaller clusters exhibited edges conserved across more than one cell cluster (e.g., E2, E5, E12). To interpret these edge clusters, we identified the top regulators associated with each of the edge clusters (**Figure 4B**). E16, which was MEF-specific (C4), had Npm1, Nme2, Thy1, Ddx5 and Loxl2 as the top regulators, which are known MEF-specific genes. In contrast, E11, which was ESC-specific (C5), had Klf4, Lhx2 and Elf4, which have known roles in stem cell maintenance (Klf4) and in differentiation into the neural (Lhx2^{34}) and hematopoietic (Elf4^{35}) lineages. Edge clusters that shared edges across multiple cell clusters, e.g., E5 (C4, C8 and C1), shared some of the top-ranking regulators, such as Npm1 and Thy1, with the MEF-specific cluster and also identified other fibroblast-specific genes such as Col5a2 and Ybx1. Finally, E2, which comprised edges shared between cell clusters C1 and C5, contained Esrrb as its top regulator (**Figure 4B**). Esrrb plays an important role in establishing naive pluripotency.
This further supports the lineage structure that C1 likely represents a population of cells that are committed to becoming pluripotent.
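The edge representation used for k-means edge clustering can be sketched as follows. The sketch builds one confidence vector per edge, with 0 for networks in which the edge was not inferred, and groups the vectors with a plain Lloyd's k-means. The fixed number of clusters, the deterministic initialization and all names are simplifications for illustration; in the analysis above the number of clusters was chosen with the Silhouette Coefficient.

```python
import math

def edge_vectors(networks, clusters):
    """One confidence vector per edge; edges missing from a network get 0."""
    edges = sorted(set().union(*(networks[c] for c in clusters)))
    return {e: [networks[c].get(e, 0.0) for c in clusters] for e in edges}

def kmeans(vectors, k, n_iter=20):
    """Plain Lloyd's algorithm, seeded with the first k vectors (toy choice)."""
    points = list(vectors.values())
    centers = [p[:] for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every edge vector to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its assigned vectors.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(vectors, labels))

# Toy networks for two hypothetical cell clusters: edge -> confidence.
clusters = ["c1", "c2"]
networks = {
    "c1": {("TFa", "g1"): 0.9, ("TFa", "g2"): 0.8, ("TFb", "g3"): 0.7},
    "c2": {("TFb", "g3"): 0.9, ("TFc", "g4"): 0.8, ("TFc", "g5"): 0.9},
}
vectors = edge_vectors(networks, clusters)
labels = kmeans(vectors, k=2)
```

Each edge is an independent data point in this representation, which makes cell cluster-specific patterns easy to detect. It also means that no information about shared regulators or targets enters the clustering.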

While the k-means analysis identified regulatory hubs specific to individual cell clusters, it was challenging to identify sub-network components that rewired at specific branch points likely because it treats each edge independently. We developed an approach by adopting Latent Dirichlet Allocation (LDA) that was recently used to study regulatory network rewiring from transcription factor ChIP-seq datasets ^{36} (**Methods**). In this approach, each TF is treated as a “document” and target genes are treated as “words” in the document. Each document (TF) is assumed to have words (genes) from a mixture of topics, each topic in turn interpreted as a pathway. TFs across cell clusters are treated as separate documents. We applied LDA with *k* = 10 topics (**Figure 4C, D, Supplementary Figure 5**,**6, 7**), and examined each of the topics based on their Gene Ontology process enrichment (**Supplementary Figure 8**), and the tendency and identity of specific regulators to rewire across the cell clusters (**Methods**). Topic 3 networks were among the most divergent networks across the cell populations and identified several known regulators for the pluripotency fate (**Figure 4C**). In particular, Esrrb was a hub in C5 (ESC) and C1 (closest to ESC) but absent in the other cell clusters. Topic 3 is enriched for cell cycle and developmental terms (**Supplementary Figure 8**). Comparison of the regulators in the (C1,C5) branch and (C7, C3, C2) branch showed that the latter branch had regulators such as Wt1. Wt1 was a major regulator in the starting MEF cluster as well suggesting the incomplete suppression of the MEF-specific program in the C7-C3-C2 branch. Wt1 is an important regulator of cellular developmental processes and can act both as a tumor suppressor and an oncogene ^{37}. Topic 9 was also interesting in that it identified the persistence of the regulators Ccng1 and Nme2 from the MEF-specific cell cluster (C4) in the C7-C3-C2 branch. 
Ccng1 is a cyclin that is part of the p53 pathway, which has been previously identified to be associated with the inefficiency of cellular reprogramming ^{38,39}. Nme2 is known to regulate Myc, which is an oncogene and also one of the four reprogramming factors ^{40}. The cellular reprogramming process has been considered to be similar to tumorigenesis which is supported by the identification of regulators associated with cancer signaling pathways for populations that do not reprogram. Inhibition of these regulators could potentially improve the reprogramming process. In total, using scMTNI and network rewiring analysis we identified known cell population-specific regulators and also predicted new regulators that can be perturbed to examine the impact on cellular reprogramming efficiency.
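The document construction used for the topic model can be sketched as follows: every TF in every cell cluster becomes a separate document whose words are its target genes. The sketch only builds the bag-of-words count matrix that an LDA implementation would take as input; the function names and the toy networks are hypothetical.

```python
def build_documents(networks):
    """Map each (cell cluster, TF) pair to the list of its target genes."""
    docs = {}
    for cluster, edges in networks.items():
        for tf, target in edges:
            docs.setdefault((cluster, tf), []).append(target)
    return docs

def count_matrix(docs):
    """Bag-of-words counts: one row per document, one column per gene."""
    vocab = sorted({gene for words in docs.values() for gene in words})
    index = {gene: j for j, gene in enumerate(vocab)}
    rows = []
    for words in docs.values():
        row = [0] * len(vocab)
        for gene in words:
            row[index[gene]] += 1
        rows.append(row)
    return list(docs), vocab, rows

# Toy networks for two hypothetical cell clusters: lists of (TF, target) edges.
networks = {
    "c1": [("TFa", "g1"), ("TFa", "g2")],
    "c2": [("TFa", "g2"), ("TFb", "g3")],
}
doc_ids, vocab, counts = count_matrix(build_documents(networks))
```

A matrix of this form could then be passed to a standard LDA implementation, for example scikit-learn's `LatentDirichletAllocation` with `n_components=10`. Because the same TF appears as a separate document in each cell cluster, changes in its topic memberships across clusters can be read as rewiring of its targets.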

### Inferring gene regulatory networks in human hematopoietic differentiation

To examine the utility of scMTNI in a different cell fate specification system, we applied scMTNI to a published scATAC-seq and scRNA-seq dataset for human hematopoietic differentiation ^{41}. This dataset profiled the accessibility and transcriptomic state of immunophenotypic populations that were sorted based on cell surface markers in hematopoietic differentiation and enabled studies of how multipotent progenitors transition into lineage-restricted cell states. We considered the cell populations measured with both scATAC-seq and scRNA-seq: hematopoietic stem cells (HSC), common myeloid progenitors (CMP), granulocyte-macrophage progenitors (GMP) and monocytes (Mono). These populations are known to be heterogeneous, comprising multiple sub-populations ^{41}. To identify these sub-populations, we again applied LIGER ^{22} and identified 10 integrated clusters of RNA and accessibility (**Figure 5A-D**). Most clusters exhibited a mixed composition: C8 is mainly composed of HSCs but also includes CMP0 cells; C6 and C9 are composed of GMP and CMP0 cells. C1 (73 cells) and C4 (37 cells) were mainly composed of Mono cells and were combined into C1. C5 had too few RNA cells (22 cells) and was excluded from further analysis. We next inferred a cell lineage tree from the remaining 8 cell clusters using a minimum spanning tree approach ^{23}, as described in the reprogramming study (**Figure 5E, Methods**). As C8 is largely made up of HSCs and HSC is the starting cell type, we treat C8 as the root of the lineage.

We applied the same set of network inference algorithms to this dataset as to the reprogramming dataset: scMTNI, scMTNI+prior, INDEP, INDEP+prior and SCENIC. We assessed the quality of the inferred networks from each method by comparing them to gold standard edges from published ChIP-seq and regulator perturbation assays from several human hematopoietic cell types. These included ChIP-seq datasets from the UniBind database (Unibind ^{42}), ChIP-seq (Cus ChIP) and regulator perturbation (Cus KO) experiments in the GM12878 lymphoblastoid cell line from Cusanovich et al ^{43}, and the intersections of ChIP and perturbation studies (Cus KO+Cus ChIP, Cus KO+Unibind). In total we had five gold standard networks. We used the F-score of the top 500, 1k and 2k edges in the inferred network (**Methods, Figure 5F, Supplementary Figure 9**). The relative performance of the algorithms depended upon the gold standard. Algorithms that did not use priors (INDEP, SCENIC and scMTNI) performed comparably (with no significant difference) on three of the five gold standards. On Unibind and Cus KO+Unibind, SCENIC is significantly better than INDEP and scMTNI (**Supplementary Figure 10**). Methods that used priors (INDEP+prior and scMTNI+prior) were generally better than methods without priors. INDEP+prior and scMTNI+prior are comparable across the gold standard datasets, with no significant difference in performance. For the Unibind dataset, we had ChIP-seq based gold standard edges for different blood cell types, with 1 to 48 transcription factors (**Table 3**). When comparing to these cell type-specific gold standards, prior-based methods perform better, especially for datasets with more TFs, among the top 500 and 1k edges (**Supplementary Figure 11, 12**). Furthermore, INDEP+prior had the best overall performance, indicating that incorporation of accessibility priors is more important than the lineage information on these gold standards.
However, these gold standards were much smaller and can therefore assess only a smaller portion of the inferred networks.

We next examined the inferred networks for the extent of change on the lineage structure (**Figure 5G**). The single-task learning methods INDEP and INDEP+prior exhibited a low overlap between each pair of cell clusters and did not follow the lineage structure. SCENIC recovers part of the lineage structure but placed C7 (common myeloid) close to C6 (granulocyte-macrophage progenitors (GMP)) rather than to C10, which has a sample composition similar to that of C7. In contrast, scMTNI and scMTNI+prior were able to find two groups of cell types, one corresponding to the HSC and CMP2 branch consisting of C8, C3 and C2, and the second corresponding to the CMP0, CMP1 and GMP branch (C6, C9, C10 and C7). The excessive divergence identified by the single-task learning methods makes it difficult to identify and prioritize specific network-level changes driving cell fate decisions.

### Inferring shared and lineage-specific regulators for hematopoietic differentiation

Similar to our cellular reprogramming study, we examined the scMTNI+prior networks to identify cell type-specific regulators and network components (**Figure 6**). We applied k-means edge clustering to the top 5k edges in any of the cell clusters and identified 19 edge clusters (**Methods**). Compared to the reprogramming study, a larger portion (94% vs 86%) of the edges are specific to one cell cluster (**Figure 6A**). We used these edge clusters to identify differences among cell clusters that had similar compositions of the initial cell types identified based on cell surface markers, e.g., C7 and C10 had similar compositions of CMP0, CMP1 and CMP2 cells, and C6 and C9 had similar compositions of GMP and CMP cells. Edge cluster E2 had edges specific to cell cluster C7 and was associated with PLEK, YBX1, EEF1A1 and TSC22D3. PLEK and YBX1^{44} are known to be involved in directing the fate of HSCs, while both EEF1A1 and TSC22D3 have immune-related functions. In contrast, E8, which had edges specific to C10, had different regulators, namely KLF7, ETV5, MBD2, ZNF202, EPM2A and ULK4. Of these, KLF7, ETV5 and MBD2 have known regulatory roles in hematopoiesis, with ETV5 regulating a population of Th9 cells ^{45} and KLF7 suppressing the formation of myeloid cells ^{46}. Edge cluster E11, which was specific to C6, ranked SP4, TYRO3, ZNF417 and MNDA highly. MNDA is associated with the granulocyte-monocyte lineage ^{47}. In contrast, E6, which was specific to C9, had a different set of top regulators including L3MBTL4, GABPA, ELF4 and RGS14. Both GABPA and ELF4 have important roles in hematopoiesis ^{48,49}. A few edge clusters represented shared network components, e.g., E19 had edges from C6, C9, C10 and C7, which represent the GMP and CMP populations, and E12 had edges from C10 and C7. Both E19 and E12 had YBX1 and TSC22D3 as top regulators (**Figure 6B**).
YBX1 is known to have high expression in myeloid progenitor cells ^{44} and regulates CCL5 expression during monocyte/macrophage differentiation ^{50}. TSC22D3, which encodes a glucocorticoid-induced leucine zipper protein ^{51}, is involved in the differentiation of hematopoietic stem cells ^{52}. Taken together, the k-means edge clustering approach helped identify key regulators with known or plausible roles in hematopoiesis that could explain the differences among the cell clusters.

To identify cell type-specific network rewiring that is associated with lineage decisions, we again examined the regulatory networks of each cell cluster using LDA (**Methods, Figure 6C, D**). The topics were enriched for diverse biological processes such as cell cycle (Topics 1 and 8, **Supplementary Figure 16**) and blood-related processes (Topic 9), and represented subnetworks with different extents of conservation across the lineage. For example, Topic 2 showed a gradual rewiring from an ID2-specific network in the HSC populations (C8, C3, C2) to KLF1- and MYC-centered networks in C7 and C10, which represented the CMP0 population. ID2 is known to negatively regulate differentiation, which is consistent with its presence in the C8, C3, C2 branches. KLF1 is an essential regulator of the erythroid lineage ^{53,54}, which is derived from the myeloid progenitor cells, and therefore the association of KLF1 with these cells is consistent with the literature. Topics 1, 6 and 10 exhibited a conserved core around HMGB2, TSC22D3, and YBX1, respectively, across all cell clusters (**Supplementary Figure 13, 14, 15**). HMGB2 is an important regulator of HSCs ^{55}. Both YBX1 and TSC22D3, which were also identified in our k-means analysis, have known roles in hematopoiesis ^{44}. Topic 8 was associated with various cell cycle and chromatin remodeling regulators such as TOP2A, CDC20 and CCNB1 (**Supplementary Figure 15, 16**). Taken together, the LDA analysis identified differential subnetworks centered on candidate cell fate drivers in hematopoiesis that could be followed up with functional studies.

## Discussion

Single-cell technologies have transformed our ability to study cellular heterogeneity and cell type-specific gene regulation of known and novel cell populations. Defining gene regulatory networks from scRNA-seq data of developmental systems has remained challenging, as most existing methods assume a static view of the GRN and do not leverage accessibility to inform the GRN structure. To address this need, we developed single-cell Multi-Task Network Inference (scMTNI), a probabilistic graphical model-based approach that uses multi-task learning to infer cell type-specific GRNs on a cell lineage tree by integrating scRNA-seq and scATAC-seq data and to model the dynamics of these regulatory interactions on a lineage.

Multi-task learning is well-suited for the inference of cell type-specific GRNs. However, a key question is how to implement multi-task learning for GRN inference. A number of multi-task learning algorithms were developed for inferring GRNs and functional networks from bulk transcriptomic data but have not been systematically compared for their effectiveness on single-cell transcriptomic data. Some approaches, such as AMuSR ^{27}, have used a flat hierarchy where all the tasks are considered equally related. For heterogeneously related datasets, a hierarchy or a tree is well-suited to model the dependence across datasets. Such hierarchies can be implemented as a phylogenetic tree with observed data at the tips of the tree, as in GNAT ^{25} and MRTLE ^{24}, or as a cell lineage tree with observations at all nodes in the tree. scMTNI and MRTLE both use a tree-based structure prior, whereas AMuSR, GNAT and Ontogenet use a regularized regression parameter to implement multi-task learning. scMTNI and MRTLE have better performance in predicting gene regulatory relationships than single-task learning algorithms. The performance of Ontogenet is better than that of the single-task learning algorithms LASSO and INDEP in at least two cell types, and comparable to that of SCENIC. GNAT performed worst among the multi-task learning algorithms and was comparable to the single-task learning algorithms. A prominent factor contributing to the differences in performance was whether a model inferred a directed or an undirected graph, and we speculate that the undirected relationships in the graphical model of GNAT are one reason for its lower performance. We also examined the performance of the algorithms across different parameter settings that control sparsity as well as the sharing of information.
We found that the algorithms were generally robust to the setting for sharing and more sensitive to the extent of sparsity. Nevertheless, multi-task learning algorithms generally outperformed single-task learning algorithms, indicating that this is a useful direction for methodological development for GRN inference from single cell omic datasets. Importantly, single-task learning infers very different networks across cell types, which makes it challenging to study transitions across the networks.

Once GRNs are inferred across multiple cell types, the next challenge is to examine which components of the GRNs change along the lineage. We developed two complementary techniques to study dynamics. Our k-means edge clustering method was able to find regulatory connections that were unique to each cell cluster, while our topic model-based dynamic network analysis highlighted subnetworks that were activated or deactivated along the lineage. We applied our tools to study GRN dynamics in hematopoietic cell differentiation and in reprogramming from mouse embryonic fibroblasts to embryonic stem cells. We found that the two systems exhibited different dynamics: the reprogramming system had more edges shared across populations, whereas in the hematopoietic system most edges were cell cluster-specific. In both systems, our analysis identified known and novel regulators. For example, in the reprogramming system, we found that cells that were closer to the end point pluripotent state already had an Esrrb-centered GRN component active. In contrast, cells that were on an alternate trajectory had several oncogenes, such as Wt1, as key regulators. In the hematopoietic system, our analysis examined immuno-phenotypically similar populations by identifying different sets of hematopoietic regulators associated with such populations.

scMTNI currently assumes that the input lineage structure is accurate. However, lineage construction, especially from integrated scRNA-seq and scATAC-seq datasets, is a challenging problem. One direction of future work is to assume the initial lineage structure is inaccurate and incorporate the refinement of the lineage structure as part of the GRN inference procedure. A second direction of work is to model more fine-grained transitions within each cell population, for example using RNA velocity or pseudotime, which will complement the coarse-grained dynamics that scMTNI currently handles. Studies from bulk RNA-seq data have shown that estimating hidden transcription factor activity (TFA) ^{56} can further improve the performance of network inference. Thus, another direction of future work is to estimate hidden TFA and incorporate it to improve the accuracy of the inferred networks. Finally, SCENIC performs very well among the single-task learning algorithms, which is likely because its regression-tree based model captures non-linear dependencies and is less sensitive to the sparsity of the dataset. While scMTNI’s stability selection framework can capture some non-linearities, another direction of future work is to extend scMTNI to model more non-linear dependencies.

In summary, scMTNI is a tool to infer cell type-specific regulatory networks and their dynamics on a cell lineage which combines scRNA-seq and scATAC-seq data. As single cell multi-omic datasets become increasingly available, we expect scMTNI to be broadly applicable to predict GRNs and identify important regulators associated with regulatory network dynamics across cell types in diverse cell-fate specification processes.

## Methods

### Single-cell Multi-Task Network Inference (scMTNI)

Single-cell Multi-Task Network Inference (scMTNI) is a probabilistic graphical model-based approach that uses multi-task learning to infer gene regulatory networks for cell types related on a cell lineage tree (**Figure 1**). We define a cell type to be a group of cells with similar transcriptome and accessibility levels as defined by existing cell clustering methods. Each task learns the gene regulatory network (GRN), **G**^{(d)} for each cell type or cell cluster *d*. Given cell type-specific datasets for *M* cell types, 𝒟 = *{D*^{(1)}, …, *D*^{(M)}*}*, our task is to find the set of graphs 𝒢 = *{***G**^{(1)}, …, **G**^{(M)}*}* and parameters **Θ** = *{θ*^{(1)}, …, *θ*^{(M)}*}* for each of the cell types. **G**^{(d)} is modeled as a dependency network ^{21}, a class of probabilistic graphical models for inferring directed, predictive relationships among random variables (regulators and genes). Each gene is modeled as a random variable which encodes the expression level of gene *i* in each cell. A conditional probability distribution models the relationship between gene *i* and its set of regulators, in cell type *d*. In a dependency network, GRN inference entails estimating the regulators for each gene *i* in each cell type *d*. To enable joint learning of these cell type-specific networks, our goal is to find the set 𝒢 = *{***G**^{(1)}, …, **G**^{(M)}*}* and parameters **Θ** = *{θ*^{(1)}, …, *θ*^{(M)}*}* by estimating the posterior distribution of these two sets and finding their maximum a posteriori values:

$$P(\mathcal{G}, \mathbf{\Theta} \mid \mathcal{D}) \propto P(\mathcal{D} \mid \mathcal{G}, \mathbf{\Theta})\, P(\mathbf{\Theta} \mid \mathcal{G})\, P(\mathcal{G}) \tag{1}$$

*P*(𝒟|𝒢, **Θ**) is the data likelihood, expanded as Π_{d} *P*(*D*^{(d)}|**G**^{(d)}, *θ*^{(d)}). In a dependency network, pseudo likelihood ^{21} is used to approximate the data likelihood for each cell type, defined as the product of the conditional distributions of each random variable *X*_{i}^{(d)} given its neighbor set in cell type *d*, **R**_{i}^{(d)}. Thus, the likelihood can be written as:

$$P\big(D^{(d)} \mid \mathbf{G}^{(d)}, \theta^{(d)}\big) = \prod_{i} P\big(X_i^{(d)} \mid \mathbf{R}_i^{(d)}, \theta_i^{(d)}\big)$$

Given the neighbor set **R**_{i}^{(d)}, the above quantity can be computed efficiently. We assume that each variable *X*_{i}^{(d)} and its neighbor set **R**_{i}^{(d)} in cell type *d* are from a multi-variate Gaussian distribution. Thus, *P*(*X*_{i}^{(d)}|**R**_{i}^{(d)}) can be modeled using a conditional Gaussian distribution with mean and variance which can be estimated in closed form. **R**_{i}^{(d)} is selected from the input list of regulators using a greedy search algorithm, executed in parallel across all cell types (see **Supplementary Methods**). The second term *P*(**Θ**|𝒢) in **Equation** (1) is estimated using the maximum likelihood settings of the parameters. The third term *P*(𝒢) = *P*(**G**^{(1)}, · · ·, **G**^{(M)}) in the objective function is the structure prior and is defined in a way to capture the state of an edge across all cell types modeled, where 𝒢 = *{***G**^{(1)}, …, **G**^{(M)}*}*. We assume that *P*(𝒢) is composed of two priors: one is the cell-type specific prior *P*(*T*), where *T* = *{T*^{(1)}, …, *T*^{(M)}*}*, and the other is a cell lineage structure prior *P*(*S*), which captures the similarity between related cell types along the cell lineage tree, where *S* = *{S*^{(1)}, …, *S*^{(M)}*}*.

*P*(*T*) is the cell-type specific prior, which decomposes over a product of cell-type specific graphs: *P*(*T*) = Π_{d} *P*(*T*^{(d)}). The *P*(*T*^{(d)}) decomposes over a product of individual edge configurations, *P*(*T*^{(d)}) = Π_{u,v} *P*(*I*_{u→v}^{(d)}), where *I*_{u→v}^{(d)} is an indicator function that represents whether there exists an edge from regulator *u* to target gene *v* in cell type *d*, *X*_{u} *→ X*_{v}, as follows:

$$I_{u \to v}^{(d)} = \begin{cases} 1, & \text{if } X_u \to X_v \in \mathbf{G}^{(d)} \\ 0, & \text{otherwise} \end{cases}$$

As in Roy et al ^{57}, we model the prior probability using a logistic function:

$$P\big(I_{u \to v}^{(d)} = 1\big) = \frac{1}{1 + \exp\big(-(\beta_0 + \beta_1\, m_{u,v}^{(d)})\big)}$$

The *β*_{0} parameter is a sparsity prior that controls the penalty of adding a new edge to the network, and takes a negative value (*β*_{0} *<* 0). A smaller value of *β*_{0} will result in a higher penalty on adding new edges and will therefore infer sparser networks. The *β*_{1} parameter controls how strongly motifs are incorporated as prior (*β*_{1} *≥* 0). A higher value of *β*_{1} will result in motif presence being valued more strongly to select an edge. *β*_{1} is set to 0 when there is no cell type-specific motif information available. *m*_{u,v}^{(d)} is the weight of the edge from regulator *u* to target *v* in the prior network and is computed based on the motif instance score if gene *v* has a motif of regulator *u* in its promoter region that overlaps an ATAC-seq peak. Thus, we have

$$P(T) = \prod_{d} \prod_{u,v} P\big(I_{u \to v}^{(d)}\big)$$

The cell lineage structure prior *P*(*S*) is constructed to make use of multi-task learning. We define that *P*(*S*^{(1)}, …, *S*^{(M)}) can be rewritten as a product over a set of edges between regulators and target genes:

$$P\big(S^{(1)}, \dots, S^{(M)}\big) = \prod_{u,v} P\big(I_{u \to v}^{(1)}, \dots, I_{u \to v}^{(M)}\big)$$

Under the assumption that the prior probability of the edge state in one cell type is only dependent upon its state in the predecessor cell type, we have:

$$P\big(I_{u \to v}^{(1)}, \dots, I_{u \to v}^{(M)}\big) = P\big(I_{u \to v}^{(r)}\big) \prod_{d \neq r} P\big(I_{u \to v}^{(d)} \mid I_{u \to v}^{(pa(d))}\big)$$

where *pa*(*d*) denotes the predecessor cell of cell type *d* on the cell lineage tree and *r* denotes the starting root cell. *P*(*I*_{u→v}^{(d)}|*I*_{u→v}^{(pa(d))}) is a measure of overall regulatory gain and loss of regulatory connections between related cell types, and is assumed to be the same across the set of edges. Thus, it can be specified by three parameters: the probability of gaining a regulatory edge in the starting cell, *p*_{r} = *P*(*I*_{u→v}^{(r)} = 1); the probability of gaining a regulatory edge in cell type *d* given that the edge does not exist in its predecessor cell, *p*_{g}^{d} = *P*(*I*_{u→v}^{(d)} = 1|*I*_{u→v}^{(pa(d))} = 0); and the probability of maintaining a regulatory edge in cell type *d* given its presence in its predecessor cell, *p*_{m}^{d} = *P*(*I*_{u→v}^{(d)} = 1|*I*_{u→v}^{(pa(d))} = 1). These parameters of the priors can be set by the user or estimated empirically by analyzing different configurations and selecting those values with the best agreement with existing biological knowledge of the system. scMTNI uses a greedy score-based structure learning algorithm. Please refer to **Supplementary Methods** for details.
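The two components of the structure prior described above can be illustrated with a minimal Python sketch. It is not the scMTNI implementation: the function names, the dictionary-based lineage encoding, and the use of one gain and one maintenance probability shared by all non-root cell types are assumptions made for illustration.

```python
import math


def edge_prior(beta0, beta1, motif_weight):
    """Logistic prior probability that an edge is present in one cell type.

    beta0 < 0 is the sparsity term, beta1 >= 0 scales the prior-network weight.
    """
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * motif_weight)))


def lineage_log_prior(edge_state, parent, root, p_r, p_g, p_m):
    """Log prior of one edge's presence/absence pattern along the lineage.

    edge_state: dict mapping cell type -> 0/1 (edge absent/present)
    parent:     dict mapping each non-root cell type -> its predecessor
    Assumption: p_g and p_m are shared across all non-root cell types.
    """
    logp = 0.0
    for cell_type, present in edge_state.items():
        if cell_type == root:
            p_on = p_r                      # edge present in the root
        elif edge_state[parent[cell_type]] == 1:
            p_on = p_m                      # edge maintained from predecessor
        else:
            p_on = p_g                      # edge gained relative to predecessor
        logp += math.log(p_on if present else 1.0 - p_on)
    return logp


# Hypothetical three-cell-type linear lineage A -> B -> C.
parent = {"B": "A", "C": "B"}
pattern = {"A": 1, "B": 1, "C": 0}
log_p = lineage_log_prior(pattern, parent, "A", p_r=0.2, p_g=0.1, p_m=0.8)
```

In this toy pattern the edge is present in the root, maintained in B and lost in C, so the log prior is the sum of log(0.2), log(0.8) and log(1 − 0.8).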

### Input Datasets

#### Simulated Datasets

To benchmark the performance of different multi-task and single-task learning algorithms, we simulated single cell expression data from a lineage resembling a linear differentiation process for three cell types (**Figure 2A**). We simulated network dynamics on a lineage tree and controlled the extent of similarity with the three prior parameters: *p*_{r}, the probability of having an edge in the starting/root cell type; *p*_{g}^{d}, the probability of gaining an edge in cell type *d* that is not in the predecessor cell type; and *p*_{m}^{d}, the probability of maintaining an edge in cell type *d* from the predecessor cell type. We set and simulated three networks from a linear lineage tree, one for each of the three cell types, each with 15 regulators and 65 genes. Next, we applied BoolODE to the simulated gene regulatory networks and generated single cell expression data for 2,000 cells for each cell type. To mimic the dropouts in scRNA-seq data, we added 80% sparsity uniformly to all genes in the simulated data. We refer to this simulated dataset as data 1, consisting of 65 genes and 2,000 cells for each of the three cell types. We generated datasets with smaller sample sizes by downsampling data 1 to 1,000 cells (data 2) and 200 cells (data 3). We applied each of the algorithms to these three datasets within a stability selection framework and evaluated their performance based on AUPR and F-score as described in the **Evaluation** section.
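The network-simulation and dropout steps can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the parameter values are placeholders rather than the values used in the study, dropout is approximated by an independent Bernoulli mask, and the BoolODE expression simulation itself is not reproduced.

```python
import numpy as np


def simulate_lineage_networks(n_regs, n_genes, n_types, p_r, p_g, p_m, rng):
    """Simulate boolean regulator x gene adjacency matrices on a linear lineage."""
    networks = [rng.random((n_regs, n_genes)) < p_r]   # root cell type
    for _ in range(1, n_types):
        prev = networks[-1]
        u = rng.random(prev.shape)
        # Existing edges are kept with probability p_m,
        # absent edges are gained with probability p_g.
        networks.append(np.where(prev, u < p_m, u < p_g))
    return networks


def add_dropout(expr, sparsity, rng):
    """Zero out each entry independently with probability `sparsity`."""
    out = expr.copy()
    out[rng.random(expr.shape) < sparsity] = 0.0
    return out


rng = np.random.default_rng(0)
# Placeholder prior parameters, chosen only for illustration.
nets = simulate_lineage_networks(15, 65, 3, p_r=0.2, p_g=0.05, p_m=0.8, rng=rng)
toy_expr = rng.gamma(2.0, 1.0, size=(2000, 65))       # stand-in for BoolODE output
sparse_expr = add_dropout(toy_expr, sparsity=0.8, rng=rng)
```

Higher values of the maintenance probability make consecutive networks more similar, which is how the extent of sharing along the lineage is controlled in this sketch.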

#### Human hematopoietic differentiation data

Buenrostro et al. ^{41} measured single-cell accessibility (scATAC-seq) and single-cell RNA sequencing (scRNA-seq) data to study the regulatory dynamics during human hematopoietic differentiation for multiple immuno-phenotypic cell types: hematopoietic stem cells (HSCs), common myeloid progenitors (CMPs), granulocyte-macrophage progenitors (GMPs) and monocytes (Monos). We downloaded fragment files for the scATAC-seq data and processed scRNA-seq data for each cell type. For the scATAC-seq data, we mapped the fragments into 23,347,540 bins of length 1000bp. Next, we mapped the 1kb bins to the nearest gene and extracted cells with cell barcodes labeled as HSC, CMP, GMP and Mono cells. We then filtered out genes whose sum of counts across all samples was less than 100, producing a processed scATAC-seq dataset with 54,344 genes and 1,315 cells across the four cell types. We extracted the scRNA-seq count matrix for these four cell types. After filtering out genes with non-zero expression in fewer than 5 cells, the scRNA-seq data had 12,558 genes and 4,165 cells. We normalized the count matrix for depth and variance stabilization based on the pagoda pipeline ^{58}. We kept the 12,393 genes common to the scATAC-seq and scRNA-seq data and applied LIGER ^{22} to define integrated cell populations. We applied LIGER with *k* ∈ {8, 10, 12, 15, 20} and found 10 cell subpopulations to be most appropriate. C8 was mainly composed of HSCs, C6 was mainly composed of GMP cells, C7 was mainly composed of CMP cells, C1 was composed of Mono cells, and the remaining clusters were a combination of several cell types. C5 had too few RNA cells (22 cells), so we excluded it from further analysis. Since the compositions of C1 (73 cells) and C4 (37 cells) were very similar, mainly GMP and Mono cells, we combined these two clusters as C1. We inferred a cell lineage tree from the 8 cell clusters using a minimum spanning tree approach (python package scipy.sparse.csgraph).

To derive the prior network for each cell cluster we created cluster-specific bam files from the scATAC-seq data using the LIGER clusters. We pooled these bam files to generate pseudo bulk accessibility coverage and applied MACS2 to identify scATAC-seq peaks for each cell cluster ^{59}. We obtained sequence-specific motifs from the Cis-BP database ^{60} and used the script pwmmatch.exact.r available from the PIQ toolkit ^{61} to identify significant motif instances genome-wide using the human genome assembly of hg19. We mapped motifs to each scATAC-seq peak and mapped the peak to a gene if it was within *±*5000bp of the transcription start site (TSS) of a gene. In this case, we connect motifs to TSS that are mapped to the same scATAC-seq peak. We used the max motif score from pwmmatch.exact.r for each motif-TSS pair and took the maximum value among all TSSs of a gene as the value for each motif-gene pair. The motif instance score is the log ratio of the PWM to a uniform background. Finally, to generate the edge weight for each TF-gene pair, we used the max score among all motifs mapped to the same TF. To normalize the edge weights across TFs, we converted these weights into percentile scores and selected the top 20% of edges as prior edges.
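The last two steps of the prior construction (taking the maximum motif score per TF-gene pair, converting weights to percentiles, and keeping the top 20% of edges) can be sketched as below. The function name and input layout are hypothetical, and the sketch ranks all TF-gene pairs together; whether the original analysis computed percentiles globally or per TF is not specified here, so that choice is an assumption.

```python
import numpy as np


def prior_edges(motif_scores, top_fraction=0.2):
    """Turn motif instance scores into percentile-ranked prior edges.

    motif_scores: dict mapping (tf, gene) -> list of motif instance scores
                  (several motifs or TSSs may map to the same pair).
    Returns a dict (tf, gene) -> percentile weight for the retained edges.
    Ties are broken arbitrarily by the sort order.
    """
    edges = list(motif_scores)
    weights = np.array([max(motif_scores[e]) for e in edges], dtype=float)
    ranks = weights.argsort().argsort() + 1            # 1 = weakest, n = strongest
    percentiles = ranks / len(edges)
    n_keep = int(np.ceil(top_fraction * len(edges)))
    keep = ranks > len(edges) - n_keep
    return {e: float(p) for e, p, k in zip(edges, percentiles, keep) if k}


# Hypothetical scores for five TF-gene pairs.
toy_scores = {
    ("TF_A", "gene1"): [1.0, 3.0],
    ("TF_A", "gene2"): [2.0],
    ("TF_B", "gene1"): [0.5],
    ("TF_B", "gene3"): [9.0, 4.0],
    ("TF_C", "gene2"): [2.5],
}
toy_prior = prior_edges(toy_scores)
```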

#### Mouse reprogramming data

We generated a novel scATAC-seq time course dataset for cellular reprogramming from mouse embryonic fibroblasts (MEFs) to induced pluripotent stem cells (iPSCs). The dataset has a total of 6 time points corresponding to the starting MEF, the end pluripotent state (mESC), and four intermediate time points of day 3, day 6, day 9 and day 12. We downloaded scRNA-seq datasets (GEO: GSE108222) for the same time points from Tran et al ^{62}. The scATAC-seq data was first processed through the CellRanger ATAC pipeline to provide the frags.txt file. We binned the genome into non-overlapping 1kb bins and computed the number of fragments mapped to each 1kb bin. Next, we mapped the 1kb bins to the nearest gene for all of the samples. For the scRNA-seq data, we concatenated the expression data from two replicates at each time point and normalized the concatenated matrix for depth and variance stabilization based on the pagoda pipeline ^{58}. Next, for each time point, we removed genes with expression in fewer than 5 cells. We took the union of genes among all time points and concatenated the expression data across all time points as our final scRNA-seq data matrix. The processed scATAC-seq data contains 25,824 genes and 30,344 cells. The processed scRNA-seq dataset contains 14,953 genes and 3,460 cells. We had a total of 11,926 genes in common between the two datasets, which were used for downstream analysis. We applied LIGER with *k* ∈ {8, 10, 12, 15, 20} and found *k* = 8 to provide the optimal clustering of the scRNA-seq and scATAC-seq data, determined based on the clustering of the accessibility and transcriptome of the MEF and ESC time points. We used the mean expression profiles across samples of these cell clusters and computed the Euclidean distance between every pair of cell clusters. Then, we inferred a minimum spanning tree from the distance matrix using scipy.sparse.csgraph in python and used it as the cell lineage tree. The motif prior was generated in the same way as for the hematopoietic differentiation dataset, using motifs for mouse from the Cis-BP database ^{60}. We used the mouse genome assembly mm10 for this analysis.
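The lineage construction from cluster mean profiles can be sketched with SciPy as follows. The helper name and the toy input are hypothetical; the sketch returns an undirected tree, and choosing the root (for example the starting MEF cluster) is a separate step that is not shown.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def lineage_edges(cluster_means):
    """Undirected lineage edges from a clusters x genes matrix of mean expression.

    Note: minimum_spanning_tree treats zero entries as missing edges, so two
    clusters with identical mean profiles would not be connected directly.
    """
    dist = squareform(pdist(cluster_means, metric="euclidean"))
    mst = minimum_spanning_tree(dist).tocoo()
    return sorted((int(min(i, j)), int(max(i, j))) for i, j in zip(mst.row, mst.col))


# Toy example: three clusters whose mean profiles lie along a line.
toy_means = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
toy_tree = lineage_edges(toy_means)
```

For the toy input the tree links cluster 0 to cluster 1 and cluster 1 to cluster 2, since the direct 0 to 2 distance is the largest of the three.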

### Application of network inference algorithms on simulated datasets

We used the simulated datasets to perform extensive benchmarking of the different network inference algorithms. We also used this dataset to study the sensitivity of the algorithms to the different parameter settings. Below we describe each of the algorithms as well as the parameters used for each of the algorithms for the simulated datasets. For all three simulation datasets, we applied all algorithms other than SCENIC within a stability selection framework to estimate the confidence score for each edge in the predicted networks. For stability selection we subsampled each dataset 20 times randomly using half of the cells and all genes. SCENIC has its own internal sub-sampling and directly outputs the edge confidence.
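The stability selection procedure described above (20 random subsamples, each with half of the cells and all genes) can be sketched as follows. The function name is hypothetical, `infer_edges` stands for any network inference routine that returns a set of (regulator, target) edges, and defining edge confidence as the fraction of subsamples in which an edge is inferred is an assumption of this sketch.

```python
import numpy as np


def stability_selection(expr, infer_edges, n_subsamples=20, fraction=0.5, seed=0):
    """Estimate edge confidence by repeated inference on random cell subsamples.

    expr:        cells x genes expression matrix
    infer_edges: callable taking a subsampled matrix and returning a set of edges
    """
    rng = np.random.default_rng(seed)
    n_cells = expr.shape[0]
    counts = {}
    for _ in range(n_subsamples):
        # Sample cells without replacement; all genes are kept.
        idx = rng.choice(n_cells, size=int(n_cells * fraction), replace=False)
        for edge in infer_edges(expr[idx]):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_subsamples for edge, c in counts.items()}
```

Edges can then be ranked by this confidence to compute precision-recall based metrics.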

#### scMTNI

scMTNI has five hyper-parameters: *p*_{r}, the probability of having an edge in the starting cell type; *p*_{g}^{d}, the probability of gaining an edge in a child cell type *d*; *p*_{m}^{d}, the probability of maintaining an edge in *d* from its immediate predecessor cell type; *β*_{0}, a sparsity penalty that controls the penalty for adding edges; and *β*_{1}, which controls the strength of incorporating the prior network. We tried different configurations of the hyper-parameters: *p*_{r} ∈ {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}, and , and . *β*_{1} was set to 0 as there is no prior network in the simulations. If the size of the predicted network for a parameter setting was smaller than the size of the simulated network, we disregarded this parameter setting for comparison. We used the area under the precision-recall curve (AUPR) to compare the scMTNI inferred networks to the simulated networks. We also computed the F-score on the top K edges ranked by the confidence score (where K is the number of edges in the simulated network, see **Table 1**). The overall performance of scMTNI was stable across different parameter configurations (**Supplementary Figure 17, Supplementary Methods**). To compare against other methods, we used values from the best parameter setting for each dataset and cell type as well as all parameter settings (**Supplementary Figures 1, 2**).
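The two evaluation metrics can be sketched in plain NumPy as below. The function names are hypothetical, and AUPR is approximated here by average precision (a step-wise area under the precision-recall curve), which may differ slightly from the interpolation used in the original evaluation; edges with tied confidence are ordered arbitrarily.

```python
import numpy as np


def aupr(confidence, true_edges):
    """Average-precision estimate of the area under the precision-recall curve.

    confidence: dict mapping edge -> confidence score
    true_edges: set of edges in the reference (simulated) network
    """
    ranked = sorted(confidence, key=confidence.get, reverse=True)
    hits = np.array([e in true_edges for e in ranked], dtype=float)
    precision = np.cumsum(hits) / np.arange(1, len(ranked) + 1)
    return float((precision * hits).sum() / len(true_edges))


def fscore_top_k(confidence, true_edges):
    """F-score of the top K predicted edges, with K = number of true edges."""
    k = len(true_edges)
    top = set(sorted(confidence, key=confidence.get, reverse=True)[:k])
    tp = len(top & true_edges)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(top), tp / k
    return 2 * precision * recall / (precision + recall)


toy_conf = {("r1", "g1"): 0.9, ("r1", "g2"): 0.8, ("r2", "g1"): 0.1}
toy_truth = {("r1", "g1"), ("r2", "g1")}
toy_aupr = aupr(toy_conf, toy_truth)
toy_f = fscore_top_k(toy_conf, toy_truth)
```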

#### MRTLE

Multi-species regulatory network learning (MRTLE) ^{24} is a probabilistic graphical model-based algorithm that uses phylogenetic structure, transcriptomic data for multiple species, and sequence-specific motifs to infer the genome-scale regulatory networks across these species simultaneously. It was developed for bulk transcriptomic data. It uses a dependency network model to specify the directed relationships from regulators to target genes. Sequence-specific motif instances can be incorporated as prior knowledge to favor edges supported by the presence of motifs. The multi-task learning framework is embedded in the phylogenetic prior, which captures the evolutionary dynamics of regulatory edge gain and loss guided by the phylogenetic structure. The MRTLE algorithm has four parameters: *p*_{g}, the probability of gaining an edge in a child species *s* that is not in the ancestor species; *p*_{m}, the probability of maintaining an edge in a species *s* given that it is also present in the immediate ancestor of *s*; *β*_{0}, a sparsity penalty that controls the penalty for adding edges; and *β*_{1}, a penalty that controls the strength of the motif prior. In the simulation case, we examined different parameter configurations: *p*_{g} ∈ {0.05, 0.1, 0.15, 0.2, 0.3, 0.4}, *p*_{m} ∈ {0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85}, and *β*_{0} ∈ {−0.005, −0.01, −0.05, −0.1, −0.5, −1}. *β*_{1} was set to 0. The overall performance of MRTLE was stable across different parameter configurations (**Supplementary Figure 18**). Similar to scMTNI, we used the AUPR and F-score of top K edges to select the best parameter setting. The best parameter setting and all parameter settings were used to compare against other algorithms.

#### GNAT

The GNAT ^{25} algorithm uses a hierarchy of tissues to share information between related tissues and infers tissue-specific gene co-expression networks. It was developed for bulk transcriptomic data. GNAT models each network using a Gaussian Markov Random Field (GMRF). It has two parameters: the *L*_{1} penalty *λ*_{s}, which controls the sparsity of the network, and the *L*_{2} penalty *λ*_{p}, which encourages the precision matrix of each child to be similar to that of its parent. It initially learns a co-expression network for each leaf tissue. Then it infers the networks in internal nodes using the networks in the leaf nodes and updates the networks in the leaf nodes for several iterations until convergence. Since GNAT learns undirected networks, we transformed them to directed networks by adding edges from a regulator to a target. If the nodes of an edge are both candidate regulators, we output the edge in both directions. We tried different parameter configurations of *λ*_{s} and *λ*_{p}. For data 1 (n=2000), *λ*_{s} was set to {30, 31, 32,…, 37}, and *λ*_{p} was set to {30, 31, 32,…, 40}. For data 2 (n=1000), *λ*_{s} was set to {18, 19,…, 22}, and *λ*_{p} was set to {18, 19,…, 25}. For data 3 (n=200), *λ*_{s} was set to {5, 6, 7, 8}, and *λ*_{p} was set to {5, 6, 7, 8}. We found that *λ*_{s} dominates the performance and that, for a fixed *λ*_{s}, changing *λ*_{p} has little effect on performance (**Supplementary Figure 19**). If the size of the predicted network for a parameter setting was smaller than the size of the simulated network, we removed this parameter setting. As a result, the ranges of *λ*_{s} and *λ*_{p} differ slightly across datasets. We used AUPR and F-score of top K edges to select the best parameter settings. We compared the algorithms using these and all parameter settings.
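The conversion of GNAT's undirected edges into directed regulator-to-target edges can be sketched as follows. The function name is hypothetical, and dropping edges in which neither node is a candidate regulator is an assumption implied by, but not stated in, the description above.

```python
def direct_edges(undirected_edges, regulators):
    """Orient undirected co-expression edges from regulators to targets.

    undirected_edges: iterable of (node_a, node_b) pairs
    regulators:       set of candidate regulator names
    Edges between two regulators are emitted in both directions.
    """
    directed = set()
    for a, b in undirected_edges:
        if a in regulators:
            directed.add((a, b))
        if b in regulators:
            directed.add((b, a))
    return directed


toy_edges = [("TF1", "g1"), ("g2", "TF2"), ("TF1", "TF2"), ("g1", "g2")]
toy_directed = direct_edges(toy_edges, {"TF1", "TF2"})
```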

#### Ontogenet

The Ontogenet ^{26} algorithm was developed to reconstruct lineage-specific regulatory networks using cell type-specific gene expression data across cell lineages. It was developed for bulk transcriptomic data. To infer the regulatory networks for each cell type, it uses a fused LASSO framework combined with an additional *L*_{2} penalty. The *L*_{1} penalty is introduced to control the sparsity of regulators, while the *L*_{2} penalty is used to select correlated predictors. The multi-task learning comes from the fused LASSO framework, which places an additional *L*_{1} penalty on the difference of the regression weights of related cell types and thereby encourages consistency of regulatory programs between related cell types. Ontogenet was applied to the same subsamples of the three simulation datasets within a stability selection framework to estimate the confidence score for each edge in the networks. The Ontogenet algorithm has three parameters: the *L*_{1} penalty *λ*, which controls the sparsity of the network; the *L*_{2} penalty *κ*, which handles correlated predictors; and *γ*, which encourages the similarity of regulatory programs between related cell types. We tried different parameter configurations of *λ*, *γ* and *κ*. For data 1 (n=2000), *λ* was set to {1000, 1250, 1500, 1750, 2000, 2250, 2500}, and *γ* was set to {1000, 1250, 1500, 1750, 2000, 2250, 2500}. For data 2 (n=1000), *λ* was set to {500, 1000, 2000, 3000}, and *γ* was set to {500, 1000, 2000, 3000}. For data 3 (n=200), *λ* was set to {475, 500, 525}, and *γ* was set to {475, 500, 525}. *κ* was set to {1, 5, 10} for each of the datasets. We found that *λ* and *γ* dominate the performance, while changing *κ* does not change the performance significantly (**Supplementary Figure 20**). If the size of the predicted network for a parameter setting was smaller than the size of the simulated network, we removed this parameter setting. The ranges of *λ* and *γ* differ slightly across datasets in order to infer similarly sized networks for the different datasets. We used AUPR and F-score of top K edges to select the best parameter settings. We compared the algorithms using these and all parameter settings.

#### AMuSR

The Inferelator-AMuSR ^{27} algorithm uses sparse block-sparse regression to estimate the activities of transcription factors and infer gene regulatory networks from expression datasets. The multi-task learning approach decomposes the model coefficient matrix into a dataset-specific component using a sparse penalty and a conserved component using a block-sparse penalty, to capture both conserved interactions and dataset-unique interactions. It is able to incorporate prior knowledge from multiple resources and is robust to false interactions in the prior network. For our simulation setting, we applied AMuSR without TFA estimation by setting worker.set_tfa(tfa_driver=False) in the SingleCellWorkflow from the Inferelator 3.0 package. To be comparable across different algorithms, AMuSR was applied to the same subsamples of the three simulation datasets within a stability selection framework to estimate the confidence score for each edge in the AMuSR networks. The AMuSR algorithm has two sparsity parameters: *λ*_{s}, which controls the sparsity of the network for each dataset, and the block-sparse penalty *λ*_{b}, which controls the sparsity of the conserved network across all datasets. AMuSR has its own parameter selection framework (see ^{27} for details) and uses the extended Bayesian information criterion (EBIC) to select the optimal (*λ*_{s}, *λ*_{b}). We additionally externally tuned the parameters by setting *c* to {0.01, 0.02154435, 0.04641589, 0.1, 0.21544347, 0.46415888, 1., 2.15443469, 4.64158883, 10} and set as suggested in the paper, where *d* is the number of cell types, *n* is the number of samples and *p* is the number of genes. However, by setting *λ*_{b} to 0 and *λ*_{s} to 0, we found that the inferred networks are too sparse with 7-100 edges for data 1, and 71-129 edges for data 2. We kept two settings for AMuSR, one using our criteria to select the best setting based on AUPR and F-scores among different *c* settings (AMuSR tuned) and another using AMuSR’s default optimal parameter selection (AMuSR default). We computed AUPR and F-score of top K edges (where K is the number of edges in the simulated network) for AMuSR inferred networks with optimal parameter settings for comparison with other algorithms. We compared the algorithms using the optimal and all parameter settings.
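The ten values of *c* listed above coincide, to the printed precision, with ten logarithmically spaced points between 0.01 and 10. A short NumPy sketch reproduces the grid; how each *c* is then converted into the penalties follows the AMuSR paper and is not reconstructed here.

```python
import numpy as np

# Ten log-spaced values from 10**-2 to 10**1, matching the listed grid for c.
c_grid = np.logspace(-2, 1, num=10)

listed = np.array([0.01, 0.02154435, 0.04641589, 0.1, 0.21544347,
                   0.46415888, 1.0, 2.15443469, 4.64158883, 10.0])
grid_matches = bool(np.allclose(c_grid, listed, rtol=1e-6))
```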

#### INDEP

The INDEP algorithm is the single-task version of scMTNI: it does not have the prior for sharing information across cell types and infers a regulatory network for each cell type independently. Like scMTNI, it models each network as a dependency network. INDEP learns the graph for each cell type using a greedy graph learning algorithm with a score-based search, where the score contains only the data likelihood. At each iteration, the algorithm computes the change in data likelihood score ^{21} for all candidate regulators for each target gene, selects the best regulator for the target gene and adds this (regulator, target) edge to the current graph. INDEP has two parameters in the model: a sparsity penalty *β*_{0} that controls the penalty for adding edges, and a penalty *β*_{1} that controls the strength of the motif prior. In the simulation case, *β*_{0} was set to {−0.005, −0.01, −0.05, −0.1, −0.5, −1}, and *β*_{1} was set to 0. AUPR and F-score of top *K* edges were used to select the best parameter settings (**Supplementary Figure 21**). If the size of the predicted network for a parameter setting was smaller than the size of the simulated network, we removed this parameter setting. As above, we compared INDEP to other algorithms using the best and all parameter settings for a dataset.
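A greedy, likelihood-based regulator search of this kind can be sketched for a single target gene as follows. This is an illustrative sketch rather than the INDEP implementation: the function names, the linear-Gaussian conditional fitted by least squares, the `max_regs` cap, and the way *β*_{0} is added to each likelihood gain as an edge penalty are all assumptions.

```python
import numpy as np


def gaussian_loglik(expr, target, regs):
    """Maximized log-likelihood of the target under a linear-Gaussian
    conditional on the chosen regulators (plus an intercept)."""
    y = expr[:, target]
    design = np.column_stack([np.ones(len(y))] + [expr[:, r] for r in regs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    var = max(float(np.mean((y - design @ coef) ** 2)), 1e-12)
    return -0.5 * len(y) * (np.log(2 * np.pi * var) + 1.0)


def greedy_regulators(expr, target, candidates, beta0, max_regs=5):
    """Add, one at a time, the regulator with the largest penalized gain."""
    chosen = []
    current = gaussian_loglik(expr, target, chosen)
    while len(chosen) < max_regs:
        best_gain, best_reg, best_ll = 0.0, None, current
        for r in candidates:
            if r == target or r in chosen:
                continue
            ll = gaussian_loglik(expr, target, chosen + [r])
            gain = (ll - current) + beta0      # beta0 < 0 penalizes each new edge
            if gain > best_gain:
                best_gain, best_reg, best_ll = gain, r, ll
        if best_reg is None:                   # no candidate improves the score
            break
        chosen.append(best_reg)
        current = best_ll
    return chosen
```

A more negative *β*_{0} requires a larger likelihood improvement before an edge is added, which yields sparser networks.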

#### LASSO

LASSO regression is linear regression with *L*_{1} regularization. For each gene, we use the expression profiles of candidate regulators to predict the expression profile of this gene. The regulators with non-zero coefficients are inferred as the regulators for this gene and these edges are added to the gene regulatory network. We used the MATLAB implementation of LASSO regression. Similar to scMTNI and MRTLE, LASSO was run on the same subsamples of the three simulation datasets within a stability selection framework to estimate the confidence score for each edge in the networks. LASSO has only the *L*_{1} penalty *λ*, which controls the sparsity of the network. In the simulation case, *λ* was set to {0.01, 0.02, 0.03, 0.04, 0.05, 0.06}. AUPR and F-score of top K edges were used to select the best parameter settings (**Supplementary Figure 22**). If the size of the predicted network for a parameter setting was smaller than the size of the simulated network, we removed this parameter setting. We compared LASSO to other algorithms using the best and all parameter settings.

#### SCENIC

The SCENIC ^{29} algorithm infers TF-target relationships using GENIE3 or GRNBoost2, both available as part of the Arboreto framework ^{63}. We used the GRNBoost2 algorithm with default parameters for network inference. SCENIC is based on ensemble models with its own bootstrapping and hence was directly applied to each cell type-specific dataset in the simulation. SCENIC uses the feature importance score of each edge to rank the edges in the inferred network. We computed AUPR and F-score of top *K* edges (where *K* is the number of edges in the simulated network) for SCENIC inferred networks for comparison with other algorithms.

### Application of network inference algorithms to cellular reprogramming data

We applied scMTNI, scMTNI+prior, INDEP, INDEP+prior and SCENIC to this dataset. The scMTNI and INDEP algorithms were applied within a stability selection framework to estimate edge confidence. SCENIC has its own subsampling framework, which estimates an edge importance. In the stability selection framework, we subsampled the data 50 times, each with 12,216 genes and of the cells, applied the algorithms to each subsample and used the inferred networks to estimate the confidence score for each TF-target edge in the predicted networks. In both scMTNI and scMTNI+prior, we used the following hyper-parameter settings for the lineage structure prior and . For the sparsity prior we set *β*_{0} = −0.9 for scMTNI, and *β*_{0} ∈ {−0.9, −2, −3, −4} for scMTNI+prior. To generate the prior network, we used the matched scATAC-seq clusters to obtain TF-target prior interactions for each scRNA-seq cluster. For scMTNI+prior, which uses the scATAC-seq prior, we set *β*_{1} ∈ {2, 4}. INDEP and INDEP+prior were applied to the same subsampled data followed by edge confidence estimation. We used the same settings for *β*_{0} and *β*_{1} for INDEP as for scMTNI. Final results of scMTNI+prior use *β*_{0} = −4 and *β*_{1} = 4, which was determined by the distribution of edges at different confidences. Final results for INDEP+prior use *β*_{0} = −4 and *β*_{1} = 4. SCENIC was applied to the entire dataset with default parameter settings.

### Application of network inference algorithms to human hematopoietic differentiation data

We used a similar workflow for the human hematopoietic differentiation dataset as for the reprogramming system. We subsampled the scRNA-seq data for each cell cluster 50 times, each with 11,994 genes and of the cells, and applied scMTNI, scMTNI+prior, INDEP and INDEP+prior to each subsample to estimate the edge confidence of the GRNs. For scMTNI and scMTNI+prior, the lineage structure prior parameters were set as follows: . The sparsity prior *β*_{0} was set to −0.9 for scMTNI. For scMTNI+prior, we set *β*_{0} ∈ {−0.9, −2, −3, −4} and *β*_{1} ∈ {2, 4}. For INDEP and INDEP+prior, we used the same settings for *β*_{0} and *β*_{1} as for scMTNI and scMTNI+prior, respectively. Final results of scMTNI+prior use *β*_{0} = −4 and *β*_{1} = 4, and final results for INDEP+prior use *β*_{0} = −4 and *β*_{1} = 4. SCENIC was applied to the entire dataset with default parameter settings.

### Evaluation

#### Gold standard datasets

To evaluate the networks predicted by the different inference algorithms on real data, we downloaded and processed several gold standard datasets (**Tables 2, 3**). For human hematopoietic cell types, we used five gold standard datasets. The first two were a ChIP-based (Cus ChIP) and a regulator knockdown-based (Cus KO) gold standard in the GM12878 lymphoblastoid cell line, downloaded from Cusanovich et al. ^{43}. For the Cus KO dataset, TF-target relationships were available at two p-value thresholds, 0.01 and 0.05; we used 0.01 to obtain a more stringent gold standard. The third gold standard was derived for human hematopoietic cell types from the UniBind database (https://unibind.uio.no/) ^{42}, which contains high-confidence TF binding site predictions from ChIP-seq experiments. To obtain the TF-gene network, we mapped each TF binding site to the nearest gene if the binding site overlapped the promoter of the gene, defined as *±*5000 bp. If multiple ChIP-seq datasets were available for the same TF in a given cell type, we took the union of TF-gene edges for that cell type. We then took the union of these individual cell type-specific gold standards to create our UniBind gold standard (UniBind). Finally, we took the intersection of each ChIP-based gold standard with the knockdown-based gold standard, giving UniBind+Cus KO and CusChIP+Cus KO as the fourth and fifth gold standards. The statistics of the gold standard datasets are provided in **Table 3**.
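The promoter-overlap mapping described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the promoter is the closed interval of 5000 bp on either side of a single transcription start site (TSS) per gene, and that "nearest" means the smallest distance between the binding site and the TSS. The function name, input layout and coordinates are hypothetical.

```python
def map_sites_to_genes(sites, tss, window=5000):
    """Map TF binding sites to genes via promoter overlap (illustrative sketch).

    sites: list of (tf, chrom, start, end) binding sites.
    tss:   dict gene -> (chrom, position); one TSS per gene is an assumption.
    Returns a set of (tf, gene) edges.
    """
    edges = set()
    for tf, chrom, start, end in sites:
        best_gene, best_dist = None, None
        for gene, (g_chrom, pos) in tss.items():
            if g_chrom != chrom:
                continue
            # The site must overlap the promoter [pos - window, pos + window].
            if start <= pos + window and end >= pos - window:
                # Distance from the site to the TSS (0 if the site spans it).
                dist = 0 if start <= pos <= end else min(abs(start - pos), abs(end - pos))
                if best_dist is None or dist < best_dist:
                    best_gene, best_dist = gene, dist
        if best_gene is not None:
            edges.add((tf, best_gene))
    return edges


# Toy coordinates (made up for illustration).
tss = {"geneA": ("chr1", 10000), "geneB": ("chr1", 16000), "geneC": ("chr2", 10000)}
dataset_1 = [("TF1", "chr1", 12000, 12200), ("TF2", "chr1", 30000, 30100)]
dataset_2 = [("TF1", "chr2", 9900, 10100)]

# Union of TF-gene edges across ChIP-seq datasets for the same cell type.
gold_edges = map_sites_to_genes(dataset_1, tss) | map_sites_to_genes(dataset_2, tss)
```

In this toy case the first TF1 site overlaps the promoters of both geneA and geneB and is assigned to geneA, whose TSS is closer; the TF2 site overlaps no promoter and is dropped.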

For the mouse reprogramming study, we curated multiple experimentally derived networks of regulatory interactions from the literature and existing databases. The statistics of the gold standard datasets are provided in **Table 2**. The first is a ChIP-based gold standard (referred to as “ChIP”) from the ESCAPE or ENCODE databases ^{31,32}, which contains ChIP-chip or ChIP-seq experiments in mouse ESCs. The second is a knockdown-based gold standard (referred to as “Perturb”), which is derived from regulator perturbation followed by global transcriptome profiling ^{31,33}. We took the union of the LOGOF (loss or gain of function) based gold standard networks from the ESCAPE database ^{31} and the networks from Nishiyama et al. ^{33} as the perturbation interactions. Finally, we took the intersection of the ChIP and knockdown-based gold standards to create the third gold standard network, referred to as “ChIP+Perturb”.

#### Area Under the Precision Recall Curve

To evaluate the performance of scMTNI and other algorithms, we compared the inferred networks to the simulated networks or to interactions from the gold standard datasets based on the area under the precision recall curve (AUPR). Edge weights for all but the SCENIC algorithm were obtained using stability selection. In our stability selection framework, we generated *N* random subsamples of the data, inferred a network for each subsample, and calculated a confidence score for each edge as the fraction of subsamples in which the edge was present in the inferred network. Next, we ranked the edges by confidence score and estimated precision and recall at different confidence thresholds ranging from 0 to 1. Precision *P* is defined as the fraction of predicted edges that are true positives. Recall *R* is defined as the fraction of true edges that are predicted. We then plotted the precision recall curve and estimated the area under this curve using the AUCCalculator package developed by Davis et al. ^{64}. The AUPR provides an overall assessment of the inferred networks compared to the “true” networks: the higher the AUPR, the better the performance. For the real scRNA-seq datasets, we filtered the inferred networks to include only TFs and targets that were in the gold standard.
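As a rough illustration of the confidence score and the precision-recall computation, the sketch below uses plain Python on toy edge sets. The function names and the gene names are hypothetical, and the area is a simple step-wise sum over thresholds; it is not the interpolation performed by the AUCCalculator package and will generally give somewhat different values.

```python
from collections import Counter


def edge_confidence(subsample_networks):
    """Fraction of subsample networks in which each edge appears."""
    n = len(subsample_networks)
    counts = Counter(edge for net in subsample_networks for edge in set(net))
    return {edge: c / n for edge, c in counts.items()}


def precision_recall_points(confidence, true_edges):
    """(threshold, precision, recall) at each distinct confidence, high to low."""
    true_edges = set(true_edges)
    points = []
    for t in sorted(set(confidence.values()), reverse=True):
        predicted = {e for e, c in confidence.items() if c >= t}
        tp = len(predicted & true_edges)
        points.append((t, tp / len(predicted), tp / len(true_edges)))
    return points


def aupr_step(points):
    """Step-wise area under the PR curve (an approximation, see lead-in)."""
    area, prev_recall = 0.0, 0.0
    for _, precision, recall in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


# Toy example: networks inferred on N = 3 subsamples, as (TF, target) edges.
nets = [
    {("TF1", "g1"), ("TF1", "g2")},
    {("TF1", "g1"), ("TF2", "g3")},
    {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")},
]
gold = {("TF1", "g1"), ("TF2", "g3")}

conf = edge_confidence(nets)
points = precision_recall_points(conf, gold)
aupr = aupr_step(points)
```

Here the edge TF1→g1 appears in all three subsamples and receives confidence 1, while the other two edges each receive 2/3.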

#### F-score

While AUPR uses a ranking of the edges, F-score is a metric to compare a set of predicted edges to a set of “true” edges. F-score is defined as the harmonic mean of the precision (*P*) and recall (*R*):

$$F = \frac{2PR}{P + R}$$

F-score enables us to control for the number of edges, which can vary significantly across network inference algorithms. To do so, we ranked the edges of each predicted network by confidence score or edge weight, selected the top *K* edges and computed the F-score against the simulated or gold standard networks. In the simulated datasets, *K* corresponded to the size of the simulated networks. For the real datasets, we considered the top 500, 1000 and 2000 edges. We obtained the top *K* edges after filtering the inferred networks based on the TFs and targets in the gold standard networks.
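The top-*K* F-score computation can be sketched as below. This is a minimal illustration under stated assumptions: edges are (TF, target) tuples, ties in confidence are broken alphabetically (the tie-breaking rule is not specified in the text), and the function names and toy values are hypothetical.

```python
def f_score(predicted, true_edges):
    """Harmonic mean of precision and recall for two edge sets."""
    predicted, true_edges = set(predicted), set(true_edges)
    tp = len(predicted & true_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


def top_k_f_score(confidence, gold, k):
    """F-score of the top-k edges after filtering to gold standard TFs and targets."""
    gold = set(gold)
    gold_tfs = {tf for tf, _ in gold}
    gold_targets = {target for _, target in gold}
    # Keep only edges whose TF and target both occur in the gold standard.
    filtered = {
        e: c for e, c in confidence.items()
        if e[0] in gold_tfs and e[1] in gold_targets
    }
    # Rank by decreasing confidence; alphabetical tie-break is an assumption.
    ranked = sorted(filtered, key=lambda e: (-filtered[e], e))
    return f_score(ranked[:k], gold)


# Toy example with made-up confidences.
confidence = {
    ("TF1", "g1"): 1.0,
    ("TF3", "g9"): 0.95,  # removed by the filter: TF3 and g9 are not in the gold standard
    ("TF1", "g2"): 0.9,
    ("TF2", "g1"): 0.8,
}
gold = {("TF1", "g1"), ("TF2", "g2")}
score = top_k_f_score(confidence, gold, k=2)
```

With *K* = 2, the selected edges are TF1→g1 and TF1→g2, one of which is in the gold standard, so precision and recall are both 0.5.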

### Examining network dynamics on cell lineages

We used several global and subnetwork-level methods to examine how regulatory networks change on a cell lineage. These include F-score based comparison of all pairs of networks on the lineage, k-means based edge clustering and Latent Dirichlet Allocation (LDA).

#### F-score based analysis of inferred network change along cell lineage tree

To examine the overall conservation and divergence between the inferred cell type-specific networks along the cell lineage tree, we computed F-score on the predicted networks between each pair of cell types and applied hierarchical clustering on the inferred networks based on the F-score. To compute F-score, we selected top X edges ranked by confidence score to obtain a reliable network for each cell type, where X was close to the median of the number of 80% confident edges across all cell types. This was 4k in the mouse reprogramming dataset and 5k in the hematopoietic differentiation dataset. We visualized the dendrogram obtained from the hierarchical clustering and compared this to the original cell lineage tree.
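A minimal sketch of the pairwise comparison is shown below, assuming each cell type's network is a dictionary from edge to confidence. For two edge sets *A* and *B*, the F-score reduces to 2|*A* ∩ *B*| / (|*A*| + |*B*|), which is symmetric. The cell type names, edge labels and helper names are hypothetical, and the hierarchical clustering step itself is not shown.

```python
from itertools import combinations
from statistics import median


def top_edges(confidence, x):
    """Top-x edges by confidence (alphabetical tie-break is an assumption)."""
    ranked = sorted(confidence, key=lambda e: (-confidence[e], e))
    return set(ranked[:x])


def pairwise_f_scores(networks):
    """F-score between every pair of cell type networks given as edge sets."""
    scores = {}
    for a, b in combinations(sorted(networks), 2):
        shared = len(networks[a] & networks[b])
        total = len(networks[a]) + len(networks[b])
        scores[(a, b)] = 2 * shared / total if total else 0.0
    return scores


# Toy confidences per cell type (made up for illustration).
confidences = {
    "A": {"e1": 0.9, "e2": 0.85, "e3": 0.5},
    "B": {"e1": 0.95, "e2": 0.6, "e4": 0.9},
    "C": {"e1": 0.3, "e4": 0.9, "e5": 0.88},
}

# X: median number of edges with at least 80% confidence across cell types.
n_confident = [sum(c >= 0.8 for c in conf.values()) for conf in confidences.values()]
x = int(median(n_confident))

networks = {ct: top_edges(conf, x) for ct, conf in confidences.items()}
scores = pairwise_f_scores(networks)
```

The resulting pairwise scores can be converted to distances, for example 1 − F, and passed to any hierarchical clustering routine to obtain the dendrogram.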

#### k-means based edge clustering

For each cell cluster, we selected the top *K* edges, where *K* was close to the median number of edges with at least 80% confidence across all cell types. This corresponded to 4k edges for the mouse reprogramming dataset and 5k edges for the hematopoietic differentiation dataset. We merged the confidence scores of each edge across all cell types into an edge by cell type matrix, with each entry corresponding to the edge confidence and with as many rows as there are edges in the union of the top *K* edges from any cell type. We applied k-means clustering to this matrix to find subnetworks with different patterns of conservation. We tried a range of cluster numbers and selected the one with the highest silhouette coefficient.
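The construction of the edge by cell type matrix can be sketched as follows. This is an illustration only: it assumes that an edge never inferred in a given cell type receives confidence 0 there, which the text does not state explicitly, and the function name and toy values are hypothetical. The k-means and silhouette steps are not shown.

```python
def edge_by_celltype_matrix(confidences, k):
    """Build the matrix of edge confidences over the union of top-k edges.

    confidences: dict cell_type -> dict edge -> confidence.
    Returns (edges, cell_types, matrix) with one row per edge and one
    column per cell type.
    """
    cell_types = sorted(confidences)
    union = set()
    for ct in cell_types:
        conf = confidences[ct]
        ranked = sorted(conf, key=lambda e: (-conf[e], e))
        union.update(ranked[:k])
    edges = sorted(union)
    # Missing edges are filled with 0.0 (assumption, see lead-in).
    matrix = [[confidences[ct].get(e, 0.0) for ct in cell_types] for e in edges]
    return edges, cell_types, matrix


# Toy example with two cell types and K = 2.
confidences = {
    "A": {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.85, ("TF2", "g3"): 0.5},
    "B": {("TF1", "g1"): 0.95, ("TF1", "g2"): 0.6, ("TF3", "g4"): 0.9},
}
edges, cell_types, matrix = edge_by_celltype_matrix(confidences, k=2)
```

Each row of `matrix` is the conservation profile of one edge across cell types, which is the input that a k-means routine would cluster.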

#### Latent Dirichlet Allocation (LDA) model for regulatory network rewiring

We adopted Latent Dirichlet Allocation (LDA) to examine subnetwork-level rewiring, as done in TopicNet ^{36}. LDA was originally developed to cluster documents based on their word distributions. Each document *i* is assumed to have a certain composition of topics, captured by a parameter *θ*_{i}, and each topic *k* is assumed to have a specific distribution of words, captured by a parameter *ϕ*_{k}. To apply LDA to a regulatory network, we first concatenated the TF by target networks across cell types to have as many rows as there are TFs times the number of cell types. Each TF in a cell type is treated as a document and its targets are treated as the words in the document. The topic distribution for all documents constitutes an *M × K* matrix for the document-topic distribution, where *M* is the total number of TFs in any of the networks and *K* is the total number of topics. The distribution of words (genes) in each topic is captured by a *K × V* matrix for *V* genes. Each gene can be assigned to a topic based on its maximum probability across topics. We applied the LDA model to the 80% confidence networks of all cell clusters inferred by scMTNI with 10 or 15 topics and found 10 topics to be suitable for both datasets. We extracted the subnetworks in each cell type associated with each topic by obtaining the induced graph for the genes and regulators associated with each topic, and visualized the giant components of each network to identify changes across cell clusters within the same topic.
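To make the document construction concrete, the sketch below builds one document per (cell type, TF) pair from toy networks and assigns genes to topics from a given topic-word matrix. It does not fit an LDA model; the matrix `phi` is a made-up stand-in for the *K × V* output of whichever LDA implementation is used, and all names are hypothetical.

```python
def build_documents(networks):
    """One document per (cell type, TF); its words are the TF's targets.

    networks: dict cell_type -> iterable of (tf, target) edges.
    """
    docs = {}
    for cell_type, edges in networks.items():
        for tf, target in edges:
            docs.setdefault((cell_type, tf), set()).add(target)
    return {doc: sorted(targets) for doc, targets in docs.items()}


def assign_genes_to_topics(phi, genes):
    """Assign each gene to the topic in which it has maximum probability.

    phi: K x V list of lists (topic-word probabilities); genes: list of V names.
    """
    n_topics = len(phi)
    return {
        gene: max(range(n_topics), key=lambda k: phi[k][j])
        for j, gene in enumerate(genes)
    }


# Toy 80%-confidence networks for two cell types (made-up names).
networks = {
    "ct1": [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")],
    "ct2": [("TF1", "g1"), ("TF2", "g2"), ("TF2", "g3")],
}
documents = build_documents(networks)

# Hypothetical topic-word matrix with K = 2 topics and V = 3 genes.
genes = ["g1", "g2", "g3"]
phi = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
gene_topic = assign_genes_to_topics(phi, genes)
```

The same TF yields separate documents in different cell types, which is what allows a topic's subnetwork to be compared across cell clusters.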

For the mouse reprogramming dataset, we used the results of LDA application with 10 topics on the 80% confidence networks of all cell clusters (**Supplementary Figure 5, 6, 7**). To interpret the topics in each cell type, we tested the genes in the cell type-specific subnetwork for each topic for enrichment of gene ontology (GO) ^{65} processes using a hypergeometric test with FDR correction. We used an FDR *<*0.01 to determine significant enrichment (**Supplementary Figure 8**). For the hematopoiesis dataset, we also used LDA results with 10 topics on the 80% confidence networks of all cell clusters (**Supplementary Figure 13, 14, 15**) and used FDR *<*0.01 to determine significantly enriched terms (**Supplementary Figure 16**).
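The enrichment test can be illustrated with the sketch below. The text specifies a hypergeometric test with FDR correction but not the correction procedure; the Benjamini-Hochberg procedure used here is an assumption, and the function names and numbers are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes among n drawn genes,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 3 of 4 subnetwork genes carry a GO term that annotates
# 5 of 20 background genes.
p = hypergeom_pvalue(k=3, n=4, K=5, N=20)

# Adjust a made-up list of p-values and keep terms with FDR < 0.01.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.01]
```

In this toy list no term passes the FDR < 0.01 cutoff, since the smallest adjusted value is 0.02.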

## Data and code availability

Pre-processed datasets are available on the scMTNI supplementary website at https://github.com/Roy-lab/scMTNI. The reprogramming scATAC-seq dataset has been deposited in the Gene Expression Omnibus (GEO). The scMTNI code and associated MATLAB, Python and R scripts to compute various validation metrics are available at https://github.com/Roy-lab/scMTNI.

## Author contributions

S.Z. and S.R. designed the scMTNI algorithm and experiments. S.Z. implemented the code and performed most of the experiments. S.P. contributed towards creation of the gold standards and evaluating selected algorithms. S.P. and R.S. generated the scATAC-seq data for the reprogramming experiments. All authors contributed towards writing the manuscript.

## Competing Interests

The authors declare no competing interests.

## Acknowledgements

We thank the Center for High Throughput Computing at the University of Wisconsin-Madison for computational resources. This work is supported by the National Institutes of Health NIGMS grant 1R01GM117339.

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].