## Abstract

Understanding gene regulation is a fundamental step towards understanding how cells function and respond to environmental cues and perturbations. An important step in this direction is the ability to infer the transcription factor (TF)-gene regulatory network (GRN). However, gene regulatory networks are typically constructed disregarding the fact that regulatory programs are conditioned on tissue type, developmental stage, sex, and other factors. Lacking biological context specificity, these context-agnostic networks may not reveal the precise actions of genes in the specific biological system of interest. Collecting the multitude of features required for reliable construction of GRNs, such as physical features (TF binding, chromatin accessibility) and functional features (correlation of expression or chromatin patterns), for every context of interest is costly. Therefore, we need methods that can utilize knowledge about a context-agnostic network (or a network constructed in a related context) to construct a context-specific regulatory network.

To address this challenge, we developed a computational approach that utilizes expression data obtained in a specific biological context (such as a particular developmental stage, sex, or tissue type) together with a GRN constructed in a different but related context (alternatively, an incomplete or noisy network for the same context) to construct a context-specific GRN. Our method, NetREX, is inspired by network component analysis (NCA), which estimates TF activities and their influences on target genes given a predetermined topology of a TF-gene network. To predict a network under a different condition, NetREX removes the restriction that the topology of the TF-gene network is fixed and allows edges to be added to and removed from that network. To solve the corresponding optimization problem, which is non-convex and non-smooth, we provide a general mathematical framework that allows use of the recently proposed Proximal Alternating Linearized Minimization (PALM) technique, and we prove that our formulation has the properties required for convergence.

We tested NetREX on simulated data and subsequently applied it to gene expression data in adult females from 99 hemizygotic lines of the *Drosophila* deletion (DrosDel) panel. The networks predicted by NetREX showed higher biological consistency than those of alternative approaches. In addition, we used the list of recently identified targets of the Doublesex (DSX) transcription factor to demonstrate the predictive power of our method.

## 1 Introduction

Cell function, fitness, and survival depend on a complex regulatory program involving interactions between genes and their regulators. Regulatory relationships between transcription factors (TFs) and the genes they regulate constitute a gene regulatory network (GRN) that is often represented as a directed bipartite graph. Several experimentally and computationally derived types of evidence can be used to infer the topology of such a regulatory network, including genome-wide chromatin immunoprecipitation (ChIP), gene expression profiling, and motif analysis. Complementing the topology of a GRN, a further level of understanding can be obtained by modeling the quantitative relation between TF activities and the expression of their target genes. Specifically, given expression data obtained by engineered perturbations of a reference state, or by tracing expression changes over a number of naturally occurring conditions, the goal is to model the expression changes as a function of changes in the activities of TFs when the underlying GRN topology is available. In particular, network component analysis (NCA) has proven to be a powerful method for such modeling [1,2,3,4]. The essence of the NCA approach is the assumption that the expression of genes can be modeled by a linear combination of TF activities [5]. TF activity is a hidden parameter of each TF that the method infers from the data. While the assumption of linearity of the effects is obviously a simplifying one, it provides a good first approximation and makes the problem tractable.

An important drawback of the NCA approach is that it requires the topology of the GRN to be known. Several computational methods that integrate diverse functional genomics data sets were developed to infer GRNs and investigate gene regulation at the systems level [6,7]. Yet current knowledge of the topology of regulatory networks is not complete, even for simple unicellular organisms such as yeast [8], and for most organisms construction of a regulatory network has not even been attempted. In addition, these networks are typically context-agnostic; namely, they were constructed without considering tissue type, developmental stage, and other relevant conditions. However, for very closely related organisms, the regulatory networks can be assumed to be rather similar due to evolutionary conservation. Similarly, for any specific organism, the regulatory interactions in different tissues are expected to overlap significantly. This motivates the need for a method that can utilize a network constructed for a closely related organism, stage, or tissue as a starting point for constructing a tissue-, stage-, or organism-specific network. Indeed, there is increasing recognition of the importance of tissue-specific analyses and tissue-specific networks [9,10].

To address this challenge we developed Network Rewiring using EXpression (NetREX), a new mathematically rigorous method that builds on the linear model utilized by the NCA method, but without the assumption that the topology of the regulatory network is fixed. That is, unlike previous methods, NetREX does not restrict the structure of the regulatory network to be hardwired. Instead, it utilizes expression data from a set of perturbations performed in a given context, together with a prior network that is assumed to differ from the target network by a limited number of topology changes, to construct a context-specific network. We remark that allowing for rewiring in the topology of the prior network adds a whole new level of complexity. Specifically, we use the *ℓ*_{0} norm to directly control the number of removed and newly added edges as well as to induce sparse solutions in our formulation. Unlike the widely used strategy of replacing the non-convex *ℓ*_{0} norm by its convex relaxation, the *ℓ*_{1} norm, we focus on the harder problem involving the *ℓ*_{0} norm and provide a number of rigorous derivations and results allowing us to adapt the recently proposed Proximal Alternating Linearized Minimization (PALM) algorithm [11]. In addition, we prove the convergence of the NetREX algorithm.

To evaluate the method's performance we first tested NetREX on simulated data. Specifically, we analyzed its performance as a function of the noise in the prior network and in the gene expression data. We found that, under the assumptions of the model, NetREX is able to dramatically improve the accuracy of the regulatory networks as long as the prior network and the gene expression data are not very noisy. After testing the method on simulated data, we used NetREX to construct regulatory networks for adult female flies. We used the network constructed in [6] as the prior network. This network was built by integrating diverse data sets including TF binding, evolutionarily conserved sequence motifs, gene expression across a developmental time-course, and chromatin modification data sets. The topology of the network provides an initial wiring diagram that includes TF-target gene interactions predicted from data available at the time of network construction, disregarding the context. Starting with this network, we utilized a new expression data set that we collected for adult female flies, where perturbations in expression were achieved by genetic deletions. Specifically, the gene expression data of adult females are from 99 hemizygotic lines (deletion/+) of the *Drosophila* deletion collection (DrosDel) project [12,13]. To evaluate the resulting networks we used a previously applied method [6] to assess the biological relevance of the networks using Gene Ontology (GO) annotations [14] and physical protein-protein interactions (PPIs). We compared NetREX with several methods including a correlation-based algorithm [15] and GENIE3 [16], the best performer in the DREAM4 *In Silico Multifactorial challenge* [15]. We observed dramatic improvements in fold enrichment compared to all competing algorithms.
Subsequently we asked how well the method predicts targets of a TF that has specific roles in adult flies, since such targets would be difficult to identify from the cell line or embryonic data that were predominantly used in the construction of the prior network. For this analysis we focused on the Doublesex transcription factor (DSX), whose targets have recently been elucidated [17]. We showed that the target genes predicted by NetREX are in good agreement with the experimentally identified targets.

Our method addresses an important challenge in the analysis of gene regulation. It can be applied in many diverse settings, including the construction of condition-specific GRNs and of networks for organisms related to a model organism for which a preliminary regulatory network exists. As a spin-off of these studies we also developed the mathematical underpinning that allows the Proximal Alternating Linearized Minimization (PALM) algorithm to be adapted to the context of the *ℓ*_{0} elastic net.

## 2 Mathematical Underpinning of the NetREX Method

Before describing the mathematical foundations behind the NetREX method, we provide a brief overview of the traditional (static) NCA method and its various implementations. Next we introduce the objective function of our NetREX method. Importantly, the objective function is non-convex and non-smooth because it uses the *ℓ*_{0} norm. Rather than relaxing the problem by replacing the non-convex *ℓ*_{0} norm with the convex *ℓ*_{1} norm, we directly solve the more challenging problem with the *ℓ*_{0} norm by adapting the recently proposed Proximal Alternating Linearized Minimization (PALM) algorithm [11] to the original formulation of the problem.

### 2.1 Network Component Analysis (NCA) and Its Implementations

The main principle of NCA is to explain the expression of each gene as a linear combination of the activities of the TFs regulating that gene, weighted by the control strength that each TF exerts on the gene. In NCA, the topology of the bipartite GRN is provided as part of the input. Formally, let *E ∈* ℝ^{N ×L} be the matrix of expression data of *N* genes in *L* experiments. NCA is a special case of a more general problem, which is to express *E* as

$$E = SA + \Gamma \tag{1}$$

where *S ∈* ℝ^{N ×M} is a weighted adjacency matrix of the bipartite GRN *𝒢*(*TF, TG, S*) such that the edges of *𝒢* in the edge set *S* connect transcription factors in the *M* element set *TF* to target genes from the *N* element set *TG*. Specifically, for target gene *i* and transcription factor *j*, weight *S*(*i, j*) defines the control strength that transcription factor *j* exerts on gene *i*. The rows of *A ∈* ℝ^{M ×L} represent the (hidden) activities of each TF over the set of experiments, and *Γ ∈* ℝ^{N ×L} represents the noise (Fig. 1).

Many mathematical techniques, such as principal component analysis (PCA), independent component analysis (ICA), non-negative matrix factorization (NMF) [18] and sparse coding (SP) [19], can be used to determine the decomposition of *E* specified in (1) (for NMF, *E* needs to be normalized to a non-negative matrix). However, PCA and NMF [20] are unable to find a decomposition of *E* when *M > L* (i.e. the number of TFs is larger than the number of experiments). Moreover, PCA and ICA hinge on the assumptions of orthogonality and independence between the signals, which may not hold for TF activities (rows of *A*). In addition, none of them can utilize prior knowledge of the GRN *𝒢*. In contrast, NCA [5,21,22,23,24,25] can deal with the situation when *M > L*, makes no assumptions on TF activities, and is able to take full advantage of prior knowledge of the GRN *𝒢*. Specifically, NCA aims to uncover the matrix *A* describing the hidden regulatory activities of TFs and the matrix *S* describing the control strengths of each TF on target genes, assuming that the structure *S*_{0} (unweighted adjacency matrix) of the underlying GRN *𝒢*_{0} = (*TF, TG, S*_{0}) is known. That is, only the entries of *S* that correspond to edges in *S*_{0} can be non-zero (formally Support(*S*) = Support(*S*_{0}), where Support(*S*) denotes the support of *S*, i.e. the positions of its non-zero entries). Thus NCA recovers the TF activities *A* and their control strengths *S* by solving the following optimization problem with only the expression data *E* and the structure *S*_{0} of *𝒢*_{0} as inputs:

$$\min_{S,A}\ \tfrac{1}{2}\|E - SA\|_F^2 \quad \text{s.t.}\quad \mathrm{Support}(S) = \mathrm{Support}(S_0),\quad \|S\|_\infty \le a,\quad \|A\|_\infty \le b \tag{2}$$

where ‖S‖_{∞} = max_{i,j} *|S*(*i, j*)*|*. The first constraint in the above formulation restricts the structure of the regulatory network *𝒢* represented by matrix *S* to be exactly the same as that of the input regulatory network *𝒢*_{0}. The remaining constraints ensure that the elements of *A* and *S* stay within the domain of biologically sensible values.
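
To make the fixed-topology setting concrete, the following is a minimal sketch of an alternating least-squares scheme for the NCA problem. It is an illustration under stated assumptions rather than the algorithm of [5]: the function name `nca_als` is hypothetical, the bound constraints on *S* and *A* are omitted, and the weights on the allowed edges are initialized at random.

```python
import numpy as np

def nca_als(E, S0, n_iter=50, seed=0):
    """Alternating least squares for E ~ S @ A with Support(S) tied to Support(S0).

    Illustrative only: the infinity-norm bounds on S and A are not enforced.
    """
    rng = np.random.default_rng(seed)
    mask = S0 != 0
    # Random starting weights, placed only on edges allowed by the prior topology.
    S = mask * rng.normal(size=S0.shape)
    for _ in range(n_iter):
        # Step 1: with S fixed, the TF activities A solve a linear least-squares problem.
        A = np.linalg.lstsq(S, E, rcond=None)[0]
        # Step 2: with A fixed, refit each gene using only the TFs allowed to regulate it.
        for i in range(E.shape[0]):
            tfs = np.flatnonzero(mask[i])
            if tfs.size:
                S[i, tfs] = np.linalg.lstsq(A[tfs].T, E[i], rcond=None)[0]
    return S, A
```

Entries of *S* outside the support of *S*_{0} are never touched, which is exactly the restriction that NetREX later removes.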

The first method [5] to solve problem (2) can provide a unique solution only if the following conditions are met: (i) the matrix *S* has full column rank; (ii) each column of *S* has at least *M -* 1 zeros; (iii) the matrix *A* has full row rank. Under these conditions, *S* and *A* are estimated using an iterative two-step least-squares algorithm [5]. Tran et al. [21] expanded NCA by allowing the specification of the zero pattern of *A* as well as *S*. Galbraith et al. [22] modified the NCA method by revising the third criterion, which cannot be tested before solving the problem. Chang et al. [23] treated NCA as an unconstrained optimization problem and employed singular value decomposition (SVD) to find a closed form solution for *S* without time-consuming iterations. Jacklin et al. [24] also proposed a non-iterative algorithm for NCA, resorting to convex optimization methods. All these methods are vulnerable to the presence of a small number of outliers in the expression data. To deal with these outliers, Noor et al. [25] proposed ROBust Network Component Analysis (ROBNCA), in which an additional sparse matrix is used to model the outliers explicitly.

### 2.2 The Formulation of the Optimization Problem Behind NetREX

Regardless of the variant, the assumption that the GRN must be known in advance is a significant drawback of the NCA method. NetREX relaxes this restriction under the assumption that a prior regulatory network that is not too far from the underlying true regulatory network is given. Therefore, it is possible to recover the underlying regulatory network by making limited changes to the prior network. Note that this is a very reasonable assumption in many practical applications, where the prior network could come from a related organism or a related tissue, or even from the same organism but without sufficient data. Additionally, to guide the network reconstruction, we assume that genes with highly correlated expression are likely to be regulated by the same TFs. The correlations between genes can be encoded in the gene correlation network *G*^{E} constructed from the gene expression data *E*. Thus in the new optimization problem (3) we remove the constraint that the structure of the network is fixed (Support(*S*) = Support(*S*_{0})) but introduce a penalty term that limits the number of added and removed edges with respect to the prior network, along with terms encouraging consistent treatment of co-expressed genes and network sparsity. We devote the rest of this subsection to explaining the roles of the added terms.
where *λ, κ, η, ξ, μ* are the parameters controlling the strength of the corresponding terms.

The term controlled by *λ* restricts the number of edge changes. Here *S̄*_{0} is the adjacency matrix of the complement graph of *𝒢*_{0} (so every TF-gene pair is an edge in exactly one of *S*_{0} and *S̄*_{0}), ‖*X*‖_{0} is the *ℓ*_{0} norm that counts the number of non-zero entries in *X*, and ∘ is the Hadamard product. We note that ‖*S*_{0}‖_{0} − ‖*S* ∘ *S*_{0}‖_{0} is the exact number of regulations removed from *𝒢*_{0} and ‖*S* ∘ *S̄*_{0}‖_{0} is the number of regulations added to the prior network *𝒢*_{0}. Thus *λ* controls the change in topology of the regulatory network: a larger *λ* allows only a small number of edges to be added or removed, which limits how far the predicted network *𝒢* is from the prior network *𝒢*_{0}.
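
The two counts in the *λ* term can be computed directly from their definitions. The snippet below is a small illustration; the function name `edge_changes` and the example matrices are hypothetical and not taken from the paper.

```python
import numpy as np

def edge_changes(S, S0):
    """Return (removed, added) edge counts of S relative to the prior topology S0."""
    S0 = (S0 != 0).astype(int)       # unweighted prior adjacency matrix
    S0_bar = 1 - S0                  # adjacency matrix of the complement graph
    l0 = np.count_nonzero            # the l0 "norm": number of non-zero entries
    removed = l0(S0) - l0(S * S0)    # prior edges whose weight became zero
    added = l0(S * S0_bar)           # non-zero weights on pairs absent from the prior
    return removed, added

S0 = np.array([[1, 0],
               [1, 1]])
S = np.array([[0.5, 0.2],
              [0.0, -1.3]])
print(edge_changes(S, S0))  # one prior edge removed, one new edge added
```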

The term controlled by *κ* (the graph embedding term [26]) encourages *S*(*i, k*) and *S*(*j, k*) to have similar control strength if genes *i* and *j* are correlated. In Supplementary Materials A.1 we provide derivations demonstrating that:
where tr() is the trace of a matrix and *W* and *L* are the adjacency matrix and the Laplacian matrix of the correlation network *G*^{E}, respectively.
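
The link between pairwise row differences and the Laplacian trace form can be checked numerically. The sketch below verifies the standard identity Σ_{i,j} *W*(*i, j*)‖*S*(*i*, :) − *S*(*j*, :)‖² = 2 tr(*S*^{T}*LS*) for a symmetric *W* with *L* = *D* − *W*; the exact scaling used in equation (4) is given in Supplementary Materials A.1 and may differ by a constant factor. Function names are hypothetical.

```python
import numpy as np

def pairwise_penalty(S, W):
    """Sum over gene pairs of W[i, j] * ||S[i, :] - S[j, :]||^2."""
    total = 0.0
    for i in range(S.shape[0]):
        for j in range(S.shape[0]):
            total += W[i, j] * np.sum((S[i] - S[j]) ** 2)
    return total

def laplacian(W):
    """Graph Laplacian L = D - W of a symmetric weighted adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
W = rng.random((5, 5))
W = (W + W.T) / 2.0            # symmetric gene-gene correlation weights
np.fill_diagonal(W, 0.0)
S = rng.normal(size=(5, 3))    # 5 genes, 3 TFs

lhs = pairwise_penalty(S, W)
rhs = 2.0 * np.trace(S.T @ laplacian(W) @ S)
print(np.isclose(lhs, rhs))
```

The trace form is what makes the term smooth with a simple gradient, which is convenient for the optimization in Section 2.3.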

The term of equation (3) that is controlled by parameter *η* encourages sparsity of the final network (note that the *ℓ*_{0} norm counts the number of non-zero elements). However, there might exist correlations between TF activities (rows of *A*), which would imply relations between TFs, and enforcing sparsity might weaken them. In particular, for a given gene, sparsity alone may select only one TF from a group of TFs whose activities are highly correlated, even though all TFs in the group regulate the gene. Therefore, we include an additional term (controlled by parameter *ξ*) that uses the Frobenius norm to encourage all regulating TFs to have non-zero values in *S*. For the reader familiar with the elastic net model, we point out that the combination of these two terms is analogous to the *ℓ*_{1} elastic net [27], and we refer to it as the *ℓ*_{0} elastic net.

Finally, the last term, controlled by the parameter *μ*, enforces smoothness of the activities in *A* by discouraging many elements of *A* from reaching the limits *{-b, b}*.

After some linear algebra (Supplementary Materials A.2), we obtain our final formulation as follows. We require *η - λ ≥* 0; otherwise the above formulation would preserve all regulations in *𝒢*_{0}.
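
The regrouping behind this requirement can be sketched as follows. This is a reconstruction from the term-by-term description above, not a copy of equations (3) and (5): it assumes the data-fit term is ½‖*E* − *SA*‖²_F, that the graph embedding term enters as *κ* tr(*S*^{T}*LS*), and that the bound constraints of the NCA problem carry over unchanged.

```latex
\begin{aligned}
& \min_{S,A}\ \tfrac{1}{2}\|E - SA\|_F^2
  + \lambda\left(\|S_0\|_0 - \|S \circ S_0\|_0 + \|S \circ \bar{S}_0\|_0\right)
  + \kappa\,\mathrm{tr}(S^{T} L S) + \eta\|S\|_0 + \xi\|S\|_F^2 + \mu\|A\|_F^2 \\
& \text{s.t.}\quad \|S\|_\infty \le a,\quad \|A\|_\infty \le b \\[4pt]
& \|S\|_0 = \|S \circ S_0\|_0 + \|S \circ \bar{S}_0\|_0
  \quad \text{(each entry lies on exactly one of } S_0,\ \bar{S}_0\text{)} \\[4pt]
& \lambda\left(\|S_0\|_0 - \|S \circ S_0\|_0 + \|S \circ \bar{S}_0\|_0\right) + \eta\|S\|_0
  = \lambda\|S_0\|_0 + (\eta - \lambda)\|S \circ S_0\|_0 + (\eta + \lambda)\|S \circ \bar{S}_0\|_0
\end{aligned}
```

The constant *λ*‖*S*_{0}‖_{0} does not affect the minimizer. If *η − λ* were negative, every non-zero entry on a prior edge would lower the objective, so no prior edge would ever be removed; this is why *η − λ ≥* 0 is needed. Under this grouping, keeping a prior edge costs *η − λ* while adding a new edge costs *η + λ*.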

### 2.3 Solving the Optimization Problem Underlying the NetREX Algorithm

Our algorithm to solve optimization problem (5) relies on the recently proposed proximal alternating linearized minimization (PALM) algorithm [11]. The PALM method can solve a general optimization problem formulated as

$$\min_{S \in \Upsilon,\ A \in \Omega}\ \Phi(S) + \Psi(A) + \mathcal{L}(S, A) \tag{6}$$

where ℒ(*S, A*) has to be smooth but *Φ*(*S*) and *Ψ*(*A*) need not be convex or smooth. *ϒ* and *Ω* are constraint sets for *S* and *A*, respectively. The PALM algorithm alternately applies a technique known as the proximal forward-backward scheme to *S* and *A*. Specifically, at iteration *k*, the proximal forward-backward mappings of *Φ*(*S*) and *Ψ*(*A*) on *S ∈ ϒ* and *A ∈ Ω* for given *S*^{k} and *A*^{k} are the solutions of the following sub-problems, respectively:

$$S^{k+1} \in \arg\min_{S \in \Upsilon}\ \left\langle S - S^{k},\ \nabla_S \mathcal{L}(S^{k}, A^{k}) \right\rangle + \frac{c^{k}}{2}\|S - S^{k}\|_F^2 + \Phi(S) \tag{7a}$$

$$A^{k+1} \in \arg\min_{A \in \Omega}\ \left\langle A - A^{k},\ \nabla_A \mathcal{L}(S^{k+1}, A^{k}) \right\rangle + \frac{d^{k}}{2}\|A - A^{k}\|_F^2 + \Psi(A) \tag{7b}$$

where 〈*X, Y*〉 = tr(*X*^{T} *Y*), *c*^{k} and *d*^{k} are positive real numbers, ∇_{S}ℒ(*S*^{k}, *A*^{k}) is the derivative of ℒ(*S, A*^{k}) with respect to *S* at the point *S*^{k} for fixed *A*^{k}, and ∇_{A}ℒ(*S*^{k+1}, *A*^{k}) is the derivative of ℒ(*S*^{k+1}, *A*) with respect to *A* at the point *A*^{k} for fixed *S*^{k+1}. It has been proven that the sequence (*S*^{k}, *A*^{k})_{k∈N} generated by PALM converges to a critical point when it is bounded [11].

Casting our optimization problem (5) into the PALM framework (6) introduced above, we have ℒ(*S, A*) = ½‖*E* − *SA*‖_{F}^{2} + *κ* tr(*S*^{T}*LS*) (with *L* the Laplacian of *G*^{E}), *Φ*(*S*) = (*η* − *λ*)‖*S* ∘ *S*_{0}‖_{0} + (*η* + *λ*)‖*S* ∘ *S̄*_{0}‖_{0} + *ξ*‖*S*‖_{F}^{2}, and *Ψ*(*A*) = *μ*‖*A*‖_{F}^{2}. The constraint sets *ϒ* and *Ω* are respectively *ϒ* = *{S |* ‖*S*‖_{∞} *≤ a}* and *Ω* = *{A |* ‖*A*‖_{∞} *≤ b}*. We note that ℒ(*S, A*), *Ψ*(*A*) and *Φ*(*S*) satisfy the requirements of the PALM algorithm. Namely, ℒ(*S, A*) is smooth, *Ψ*(*A*) is convex and smooth but, as allowed in the PALM approach, *Φ*(*S*) is non-convex and non-smooth. Hence, we can apply the PALM algorithm to our problem as long as we can efficiently solve the proximal forward-backward mappings for our specific *Φ*(*S*) and *Ψ*(*A*). Proving that this is possible is mathematically the most challenging component of the development of the method. Because the derivations are technical, we leave most of them to the supplement, and in what follows we only point to the most critical components of the argument.

It is easy to confirm that the NetREX problem (5) can be solved by alternately applying the following proximal forward-backward mappings (8a) and (8b), which are derived from (7a) and (7b) by substituting our specific ℒ(*S, A*), *Φ*(*S*), *Ψ*(*A*), *ϒ* and *Ω* and applying some linear algebra (derivations can be found in Supplementary Materials A.3):
where

The derivatives ∇_{S}ℒ(*S*^{k}, *A*^{k}) and ∇_{A}ℒ(*S*^{k+1}, *A*^{k}) can be computed by

$$\nabla_S \mathcal{L}(S^{k}, A^{k}) = (S^{k}A^{k} - E)(A^{k})^{T} + 2\kappa L S^{k}, \qquad \nabla_A \mathcal{L}(S^{k+1}, A^{k}) = (S^{k+1})^{T}(S^{k+1}A^{k} - E)$$

which are Lipschitz continuous with *L*(*A*^{k}) = ‖*A*^{k}(*A*^{k})^{T}‖_{F} + 2*κ*‖*L*‖_{F} and *L*(*S*^{k+1}) = ‖(*S*^{k+1})^{T}*S*^{k+1}‖_{F} as Lipschitz constants, respectively. As suggested by [11], we set *c*^{k} = max{*v, L*(*A*^{k})}, *v >* 0 and *d*^{k} = max{*v, L*(*S*^{k+1})}, *v >* 0 to make sure the formulas in (9) are well defined.
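
The gradients and step sizes can be sketched in a few lines of NumPy. The code assumes the smooth part is ½‖*E* − *SA*‖²_F + *κ* tr(*S*^{T}*LS*), which is the form consistent with the Lipschitz constants stated above; the function names and the default value of `v` are hypothetical.

```python
import numpy as np

def smooth_part(S, A, E, Lap, kappa):
    """Assumed smooth term: 0.5 * ||E - S A||_F^2 + kappa * tr(S^T Lap S)."""
    R = E - S @ A
    return 0.5 * np.sum(R * R) + kappa * np.trace(S.T @ Lap @ S)

def grad_S(S, A, E, Lap, kappa):
    """Gradient of the smooth term with respect to S (Lap symmetric)."""
    return (S @ A - E) @ A.T + 2.0 * kappa * Lap @ S

def grad_A(S, A, E):
    """Gradient of the smooth term with respect to A."""
    return S.T @ (S @ A - E)

def step_sizes(S_next, A, Lap, kappa, v=1e-3):
    """c^k and d^k: Lipschitz constants, floored at v > 0 so divisions are safe."""
    c = max(v, np.linalg.norm(A @ A.T, "fro") + 2.0 * kappa * np.linalg.norm(Lap, "fro"))
    d = max(v, np.linalg.norm(S_next.T @ S_next, "fro"))
    return c, d
```

A finite-difference comparison against `smooth_part` is a quick way to confirm that both gradient expressions match the assumed objective.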

The closed form solution of the proximal forward-backward mapping (8a) can be obtained from Proposition 1 (Proximal Mapping of the *ℓ*_{0} Elastic Net Under the ‖·‖_{∞} Constraint) and its corollary (Corollary 1). The proposition, the corollary, and their proofs can be found in Supplementary Materials B.1 and B.2. We emphasize that Proposition 1 provides the closed form solution for the proximal mapping of the *ℓ*_{0} elastic net under the ‖·‖_{∞} constraint and thus has broader applications to diverse feature selection approaches [28,29].

With the help of Proposition 1 and Corollary 1, (8a) can be efficiently computed by (11a). And (8b) can be computed by (11b). where the definitions of and can be found in Corollary 1 and Proposition 1, respectively. The derivations of (11a) and (11b) can be found in the Supplementary Materials A.4.
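
To illustrate why these mappings admit closed forms, the sketch below solves the entrywise sub-problems directly. It is not the statement of Proposition 1 or Corollary 1: it assumes each entry of *S* minimizes (*c*/2)(*s* − *u*)² + *α*·[*s* ≠ 0] + *ξs*² over |*s*| ≤ *a*, where *u* is the gradient-step point and *α* ≥ 0 is the per-entry *ℓ*_{0} weight, and that each entry of *A* minimizes (*d*/2)(*x* − *v*)² + *μx*² over |*x*| ≤ *b*. All names are hypothetical.

```python
import numpy as np

def prox_l0_elastic_net_box(U, c, alpha, xi, a):
    """Entrywise minimizer of (c/2)(s-u)^2 + alpha*[s != 0] + xi*s^2 subject to |s| <= a."""
    alpha = np.broadcast_to(alpha, U.shape)
    # Best non-zero candidate: clipped minimizer of the convex quadratic part.
    s = np.clip(c * U / (c + 2.0 * xi), -a, a)
    cost_nonzero = 0.5 * c * (s - U) ** 2 + xi * s ** 2 + alpha
    cost_zero = 0.5 * c * U ** 2
    # Keep the candidate only when it beats setting the entry to zero.
    return np.where(cost_nonzero < cost_zero, s, 0.0)

def prox_ridge_box(V, d, mu, b):
    """Entrywise minimizer of (d/2)(x-v)^2 + mu*x^2 subject to |x| <= b."""
    return np.clip(d * V / (d + 2.0 * mu), -b, b)
```

In a NetREX-style update one would pass `U = S_k - grad_S / c` and `V = A_k - grad_A / d`, with `alpha` set to *η − λ* on prior edges and *η + λ* elsewhere, under the regrouped objective assumed here.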

We now have all the ingredients for our NetREX algorithm, which is described in Algorithm 1 in Supplementary Materials C. We note that the constraints on both *S* and *A* (‖*S*‖_{∞} *≤ a* and ‖*A*‖_{∞} *≤ b*) ensure that the sequence (*S*^{k}, *A*^{k})_{k∈N} is bounded. Thus the sequence produced by the NetREX algorithm converges to a critical point of the optimization problem (5), as stated in Proposition 2 in Supplementary Materials B.3.

## 3 Validation and Experimental Results

### 3.1 Results on Simulated Data

To validate our approach, we applied NetREX to simulated data generated from the linear model (1). We first randomly generated the ground truth adjacency matrix *S* of the regulatory network *𝒢*(*TF, TG, S*) and the TF activities *A*. Then, the simulated expression data were generated as follows:

$$E(i, j) = \sum_{p} S(i, p)\,A(p, j) + \Gamma(i, j)$$

where ∑_{p} *S*(*i, p*)*A*(*p, j*) is the noiseless data arising from the known *A* and *S* matrices and the noise *Γ* (*i, j*) *∼* **N**(0*, σ*^{2}) follows a normal distribution with mean 0 and variance *σ*^{2}. We assigned the prior network *𝒢*_{0} the same number of edges as the ground truth network *𝒢*, but only a fraction *θ* of the edges in *𝒢*_{0} are true edges. We can tune the difficulty of the network rewiring task by using different *σ* and *θ*.
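
A simulation of this kind can be sketched as follows. The problem sizes, the edge density, and the use of standard normal draws for the edge weights and TF activities are assumptions made for illustration; the text above only fixes the linear model, the Gaussian noise, and the construction of the prior network.

```python
import numpy as np

def simulate(N=50, M=5, L=20, density=0.2, sigma=0.1, theta=0.7, seed=0):
    """Return expression E, ground truth (S, A) and a prior topology S0."""
    rng = np.random.default_rng(seed)
    truth = rng.random((N, M)) < density          # ground truth edge set
    S = truth * rng.normal(size=(N, M))           # control strengths on true edges
    A = rng.normal(size=(M, L))                   # hidden TF activities
    E = S @ A + rng.normal(scale=sigma, size=(N, L))

    # Prior network: same number of edges as the truth, a fraction theta of them correct.
    true_idx = np.flatnonzero(truth)
    false_idx = np.flatnonzero(~truth)
    n_keep = int(round(theta * true_idx.size))
    keep = rng.choice(true_idx, size=n_keep, replace=False)
    wrong = rng.choice(false_idx, size=true_idx.size - n_keep, replace=False)
    S0 = np.zeros(N * M, dtype=int)
    S0[keep] = 1
    S0[wrong] = 1
    return E, S, A, S0.reshape(N, M)
```

Lowering `theta` or raising `sigma` makes the rewiring task harder, mirroring the two noise axes explored in the experiments.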

We compared NetREX with its two natural variants on the simulated data. The first variant is NetREX NP (NetREX with No edge Perturbation term), which has the same formulation as NetREX but with *λ* = 0. The difference between NetREX and NetREX NP is that NetREX penalizes the number of edges added to and removed from the prior network but NetREX NP does not. We note that NetREX NP and sparse coding have similar formulations (Supplementary Materials E.1). The other related algorithm in our comparison is NetREX *ℓ*_{1}, which replaces the *ℓ*_{0} norm in NetREX with the *ℓ*_{1} norm. We note that substituting the *ℓ*_{1} norm for the *ℓ*_{0} norm makes the sub-problems convex and thus easier to solve. The implementations of these two algorithms are described in Supplementary Materials E.1.

We evaluated the performance of the compared algorithms in terms of F-measure (defined in Supplementary Materials D.1). To avoid the effect of parameter selection, for each algorithm and each noise level (*σ, θ*), we first found its optimal parameters in terms of F-measure on one simulated data set through grid search. Then we ran the algorithm on another 50 randomly generated simulated data sets under the same (*σ, θ*) using these parameters. To test whether one method is statistically better than another under a specific noise level, we computed the p-value of a one-sided paired t-test between their 50 paired F-measures.
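
For reference, an edge-level F-measure can be computed as below. This assumes the usual definition as the harmonic mean of precision and recall over predicted versus true edges; the precise definition used in the evaluation is the one in Supplementary Materials D.1, and the function name is hypothetical.

```python
import numpy as np

def f_measure(S_pred, S_true):
    """Harmonic mean of edge precision and recall, comparing non-zero patterns."""
    pred = S_pred != 0
    true = S_true != 0
    tp = np.count_nonzero(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / np.count_nonzero(pred)
    recall = tp / np.count_nonzero(true)
    return 2.0 * precision * recall / (precision + recall)

S_true = np.array([[1.0, 0.0], [0.7, 0.0]])
S_pred = np.array([[0.4, 0.2], [0.0, 0.0]])
print(f_measure(S_pred, S_true))  # precision 1/2 and recall 1/2
```

Given two lists of 50 paired F-measures, a one-sided paired t-test can then be run with, for example, `scipy.stats.ttest_rel(f_a, f_b, alternative="greater")` in recent SciPy versions.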

The comparisons between NetREX and the other methods are shown in Fig. 2. Fig. 2A shows the comparison between the networks predicted by NetREX and the prior networks. We found that when the expression data are less noisy (*σ* is small) and the prior network is closer to the ground truth (*θ* is large), the network predicted by NetREX achieves a higher F-measure than the prior network. Additionally, we note that the margin by which NetREX exceeds the prior networks in F-measure is larger for *θ ≥* 0.3. For *θ <* 0.3, however, the networks predicted by NetREX are only marginally better than the prior network, which also implies that if we use random networks that have little overlap with the ground truth as prior networks, we cannot obtain promising results. The comparison between NetREX and NetREX NP is displayed in Fig. 2B. We note that NetREX significantly outperforms NetREX NP for *θ >* 0.1. In Fig. 2C, we observe that NetREX *ℓ*_{1} performs better in certain cases where the noise in the expression data is large (*σ* is large) because the *ℓ*_{1} norm is robust to noise. However, for most noise levels, NetREX achieved significantly higher F-measures than NetREX *ℓ*_{1}, demonstrating that the *ℓ*_{0} norm is superior to the *ℓ*_{1} norm for selecting sparse contributing components.

### 3.2 Results on Real Experimental Data from DrosDel Study

Next we applied NetREX to gene expression data in adult female flies from 99 hemizygotic lines (deletion/+) of the *Drosophila* deletion collection (DrosDel) project covering 68% of chromosome 2L. Specifically, in each of the DrosDel lines, a different chromosomal fragment has been deleted, leaving the organism with only one copy of the genes in the deleted region [13]. We used the network constructed in [6] as the prior network. This network was constructed by integrating diverse functional genomics data sets (such as transcription factor (TF) binding and evolutionarily conserved sequence motifs) in a supervised learning framework. The data used in [6] typically come from cell lines and expression profiles of developmental stages. NetREX predicted regulatory networks for the adult female flies, and we verified these networks using GO functional annotations [14], physical protein-protein interactions (PPIs) and *Drosophila* Doublesex transcription factor (DSX) target genes [17].

We compared our predicted networks with the prior networks and with TF-Gene correlation networks built from the Pearson correlation coefficient between the expression of TFs and genes, using expression measurements in the DrosDel data [30]. Furthermore, we compared with GENIE3 [16], the best performer in the DREAM4 *In Silico Multifactorial challenge* [15], which predicts GRNs using only expression data. We also tried to compare with NMF-based algorithms, but these cannot be applied because the dimensional requirement of NMF [20] is not satisfied (i.e. the number of TFs in the target GRN is larger than the number of hemizygotic lines in the DrosDel data). To demonstrate the performance of NetREX under different parameters, and to choose the parameters in a manner that does not depend on the test data (GO annotations and PPIs), we developed a simple heuristic that ranks the models' performance using a quality score based on the objective function (5) (Supplementary Materials E.3). We then used the top twenty models with respect to this ranking. In addition, since the performance of the correlation-based algorithm and GENIE3 might depend on the chosen cutoff, in the evaluation we show the performance of the TF-Gene correlation networks and of the networks predicted by GENIE3 at different cutoffs (Supplementary Materials E.3). Finally, we note that, unlike the other networks in this comparison, the co-expression network is not a regulatory network. However, since we embedded the information about this network in our objective function, we need to show that the good performance of our method is not merely a reflection of embedding this information. All details of the parameter settings used in the comparison are given in Supplementary Materials E.3.

#### Functional Enrichment of the Predicted Regulatory Networks

We assessed the biological relevance of the predicted regulatory networks through checking whether genes co-regulated by similar TFs exhibit similar functional properties. We used the measures (a brief review is in Supplementary Materials D.2 and D.3) proposed in [6] to evaluate the enrichment of co-regulated genes in terms of GO functional annotations [14] and experimentally derived physical Protein Protein interactions (PPIs) extracted from DroID database [31], respectively. In addition to the prior networks, the TF-Gene correlation networks and the networks predicted by GENIE3, we also computed the same measures for the co-expression network inputted as the graph embedding term in NetREX.

First, we examined whether co-regulated genes have similar GO annotations. The comparison results are illustrated in Fig. 3A. NetREX clearly outperforms all other approaches demonstrating benefit of using both the prior network and the condition specific gene expression data.

Next, we evaluated whether physically interacting genes tend to be co-regulated in the respective networks. Fig. 3B shows the PPI enrichment comparison. Using this enrichment as a measure of network quality, as proposed in [6], we observed that NetREX also outperforms all other methods.

#### Agreement with DSX Targets

To further validate the predictions obtained from the different methods, we concentrated on target genes of the *Drosophila* Doublesex transcription factor (DSX), which is involved in the sex determination system of flies as different isoforms [32]. Recently, Clough et al. [17] reported a rich set of DSX targets based on a series of genome-wide experiments and analyses. Thus, we checked how well the DSX targets predicted by NetREX agree with the genes identified in [17]. For each compared method, we selected the network giving the best fold enrichment in both GO annotations and physical PPIs (Supplementary Materials E.4). As shown in Fig. 4A, NetREX outperforms the other methods, and there is statistically significant overlap between the targets predicted by NetREX and the targets inferred from experimental data (p-values are computed by the hypergeometric test).
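
The overlap significance can be illustrated with an upper-tail hypergeometric probability. The sketch below is generic: the function name is hypothetical and the numbers in the usage line are made up for illustration, not taken from the DSX analysis.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(overlap >= k) when n predicted targets are drawn from N genes, K of them known targets."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Illustrative numbers only: 1000 genes, 50 known targets, 40 predictions, 8 in common.
print(hypergeom_tail(1000, 50, 40, 8))
```

A small tail probability indicates that an overlap at least this large is unlikely if the predicted targets were chosen at random from the gene universe.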

#### Prediction and Validation of DSX Regulators

We then used NetREX to predict the regulators of the *doublesex* gene (*dsx*). Fig. 4B illustrates the results of our prediction for *dsx* in female flies. Our predictions include multiple plausible transcription factors. For example, the *Trl* (*Trithorax-like*) gene encodes the *Drosophila* GAGA factor, which has reported roles in sex chromosome dosage compensation [33]. In addition, the *retn* locus is required to repress male courtship behaviors in females [34], while the *bab1* and *bab2* genes have overlapping functions controlling sex-specific morphology [35]. We also observed that many other predicted genes in the list have sex-specific expression, such as *bbx* (*bobby sox*) and *CG6175*, which are repressed specifically in testis but expressed in ovary [36].

## 4 Conclusion

Regulatory networks embed key information needed for modeling and interpreting experimental data. Currently, regulatory networks are constructed by combining information from various tissues, stages, and conditions [6,37,38] to obtain a context-agnostic network. However, the importance of constructing tissue- and stage-specific networks is increasingly recognized [9,10]. Constructing such a network from scratch for every relevant tissue and/or condition is not realistic. In addition, data obtained from different tissues and conditions might provide additional information that a context-specific analysis alone is not sufficiently powered to detect. For these reasons, it is fundamental to be able to use regulatory networks constructed in a context-independent way as a starting point for context-specific network construction. The NetREX method proposed in this paper fulfills this critical need. Importantly, our construction is mathematically rigorous. We proved convergence of the method and validated its performance on both simulated and real-world data. The experimental results demonstrate that NetREX is able to recover biologically meaningful condition-specific TF-gene regulatory networks.

## Supplementary Materials II: Implementation Details

### C The NetREX Algorithm

#### C.1 The Details of the NetREX Algorithm

#### C.2 The Initialization Algorithm for NetREX

To ensure that the starting point is consistent with the prior network, (*S*^{0}*, A*^{0}) has to be inferred from our prior network *𝒢*_{0}. To do this, we compute (*S*^{0}*, A*^{0}) by solving problem (38), which is obtained from the original NetREX formulation by dropping the constraints and disregarding the non-smooth regularization term.

Problem (38) can be solved by the standard Gauss-Seidel scheme [39], which alternately solves the multi-variable optimization problem with respect to one variable while fixing the rest of the variables. Specifically, we can fix *S* and solve (38) with respect to *A* in closed form, as shown in line 4 of Algorithm 2. Then, we fix *A* and solve (38) with respect to *S*; the solution is that of a Sylvester equation (derived by setting the gradient of the objective with respect to *S* to zero). The Sylvester equation is solved by the standard Bartels-Stewart algorithm. We alternate between lines 4 and 5 *K* times. In the end, we project the solutions *S* and *A* onto the feasible space of Eq. (5) using the projection operator (21), as shown in lines 7 and 8. Algorithm 2 gives the details of obtaining (*S*^{0}*, A*^{0}).

### D Evaluation Metrics

#### D.1 F-measure

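F-measure is defined as

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}},$$

where

$$\text{precision} = \frac{|\mathcal{E} \cap \mathcal{E}^{p}|}{|\mathcal{E}^{p}|}, \qquad \text{recall} = \frac{|\mathcal{E} \cap \mathcal{E}^{p}|}{|\mathcal{E}|},$$

and $\mathcal{E}$ and $\mathcal{E}^{p}$ are the edge sets of the underlying regulatory network *𝒢* and the predicted regulatory network, respectively. F-measure ranges from 0 to 1, where 1 indicates that the underlying *𝒢* is fully recovered and 0 means the opposite.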
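A minimal sketch of this metric, assuming the standard definition of F-measure as the harmonic mean of precision and recall computed over edge sets. Edges are represented as hypothetical (TF, gene) tuples; the function name and toy networks are illustrative only.

```python
def f_measure(true_edges, predicted_edges):
    """Harmonic mean of precision and recall over two edge sets."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    hits = len(true_edges & predicted_edges)
    if hits == 0:
        return 0.0  # also covers empty edge sets
    precision = hits / len(predicted_edges)
    recall = hits / len(true_edges)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical edges): 2 of 4 predictions are correct,
# and 2 of 3 true edges are recovered.
truth = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")}
pred = {("TF1", "g1"), ("TF2", "g3"), ("TF2", "g4"), ("TF3", "g1")}
score = f_measure(truth, pred)
print(round(score, 4))
```

Here precision is 1/2 and recall is 2/3, so the F-measure is 4/7.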

#### D.2 Fold Enrichment for GO annotations

We consider two genes to be co-regulated if the Jaccard similarity coefficient between the TF set regulating the first gene and the TF set regulating the second gene is larger than 0.5. The Jaccard similarity coefficient between two sets is the ratio of the size of their intersection to the size of their union. Then, for each co-regulated gene pair, we again use the Jaccard similarity coefficient to measure the similarity between the GO annotation set of the first gene and the GO annotation set of the second gene. In the end, we compute the average of this coefficient over all co-regulated gene pairs. The same procedure was applied to 100 randomized networks, and the enrichment is the ratio of the average coefficient of the original network to the average over the randomized networks. The randomized networks are generated by permuting the node labels of the original network. Hence, all randomized networks have the same topology as the original network but different node labels. Statistical significance is assessed at a level of 0.05 using a one-sided unpaired t-test comparing the Jaccard coefficients from the original network with the coefficients from the 100 randomized networks.
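The procedure above can be sketched as follows. This is a simplified illustration, not the code used in the study: the network is given as a hypothetical mapping from each gene to its set of regulating TFs, only gene labels are permuted, and the t-test is omitted.

```python
import random
from itertools import combinations


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coregulated_pairs(regulators, cutoff=0.5):
    """Gene pairs whose regulator sets have Jaccard similarity above cutoff."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(regulators), 2)
        if jaccard(regulators[g1], regulators[g2]) > cutoff
    ]


def mean_annotation_similarity(pairs, annotations):
    if not pairs:
        return 0.0
    sims = [jaccard(annotations.get(g1, set()), annotations.get(g2, set()))
            for g1, g2 in pairs]
    return sum(sims) / len(sims)


def go_fold_enrichment(regulators, annotations, n_random=100, seed=0):
    rng = random.Random(seed)
    genes = sorted(regulators)
    observed = mean_annotation_similarity(coregulated_pairs(regulators), annotations)
    random_means = []
    for _ in range(n_random):
        shuffled = genes[:]
        rng.shuffle(shuffled)
        # Permute gene labels: the topology is kept, only the labels move.
        relabeled = {new: regulators[old] for old, new in zip(genes, shuffled)}
        random_means.append(
            mean_annotation_similarity(coregulated_pairs(relabeled), annotations))
    baseline = sum(random_means) / len(random_means)
    return observed / baseline if baseline > 0 else float("inf")


# Toy data (hypothetical): two co-regulated pairs that share annotations.
regulators = {"g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"C"},
              "g4": {"C"}, "g5": {"D"}, "g6": {"E"}}
annotations = {"g1": {"x"}, "g2": {"x"}, "g3": {"y"},
               "g4": {"y"}, "g5": {"z"}, "g6": {"w"}}
pairs = coregulated_pairs(regulators)
fold = go_fold_enrichment(regulators, annotations)
print(pairs, fold)
```

Because each permutation only relabels genes, every randomized network keeps the same number of co-regulated pairs as the original, which keeps the baseline directly comparable.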

#### D.3 Fold Enrichment for PPIs

Enrichment of co-regulated genes for PPIs was computed analogously to enrichment for GO annotations. Specifically, we computed the ratio of the number of PPIs for co-regulated gene pairs to the average number of such PPIs in 100 randomized networks, using the same definition for co-regulation and network randomization.
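A minimal sketch of this ratio, assuming the co-regulated gene pairs of the original and randomized networks have already been computed. All names and the toy pairs are hypothetical.

```python
def count_ppi_pairs(gene_pairs, ppi_edges):
    """Number of gene pairs that are also physical interactions (unordered)."""
    ppis = {frozenset(edge) for edge in ppi_edges}
    return sum(1 for pair in gene_pairs if frozenset(pair) in ppis)


def ppi_fold_enrichment(observed_pairs, randomized_pair_lists, ppi_edges):
    observed = count_ppi_pairs(observed_pairs, ppi_edges)
    random_counts = [count_ppi_pairs(p, ppi_edges) for p in randomized_pair_lists]
    baseline = sum(random_counts) / len(random_counts)
    return observed / baseline if baseline > 0 else float("inf")


# Toy data (hypothetical): 2 PPIs among the observed co-regulated pairs,
# versus 1 and 0 PPIs in two randomized networks.
ppi = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]
observed_pairs = [("g2", "g1"), ("g3", "g4"), ("g1", "g5")]
randomized = [[("g1", "g3"), ("g5", "g6"), ("g2", "g4")],
              [("g1", "g4"), ("g2", "g6"), ("g3", "g5")]]
fold = ppi_fold_enrichment(observed_pairs, randomized, ppi)
print(fold)
```

Using `frozenset` makes the comparison independent of the order in which the two genes of a pair are listed.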

### E Parameters

#### E.1 The NetREX NP and NetREX *ℓ*_{1} Algorithms

The NetREX NP algorithm is the same as Algorithm 1 with *λ* = 0. The formulation of NetREX NP is

The formulation is similar to sparse coding [19] if we remove the graph embedding term. The NetREX *ℓ*_{1} formulation is as follows.

For a fair comparison, we also use the PALM algorithm to solve it, in a manner analogous to Algorithm 1. The only difference is that in line 6 of Algorithm 1 we use the proximal mapping of the *ℓ*_{1} elastic net given in [40] instead of the proximal mapping of the *ℓ*_{0} elastic net.
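To illustrate the difference between the two proximal mappings, the sketch below gives scalar versions for a generic elastic-net penalty. It is an assumption-laden simplification: the penalties are taken to be `weight*|x| + 0.5*ridge*x**2` and `weight*[x != 0] + 0.5*ridge*x**2`, without the prior-network weighting or the bound constraints of the NetREX formulation, and the scaling conventions may differ from those in [40]. All function and parameter names are hypothetical.

```python
import math


def prox_l1_elastic_net(v, step, l1_weight, ridge_weight):
    """argmin_x 0.5*(x - v)**2 + step*(l1_weight*|x| + 0.5*ridge_weight*x**2).

    Soft-thresholding followed by a ridge shrinkage.
    """
    shrunk = max(abs(v) - step * l1_weight, 0.0)
    return math.copysign(shrunk, v) / (1.0 + step * ridge_weight)


def prox_l0_elastic_net(v, step, l0_weight, ridge_weight):
    """argmin_x 0.5*(x - v)**2 + step*(l0_weight*[x != 0] + 0.5*ridge_weight*x**2).

    Hard-thresholding: keep the ridge-shrunk value only if it pays for
    the fixed cost of a nonzero entry.
    """
    scale = 1.0 + step * ridge_weight
    threshold = math.sqrt(2.0 * step * l0_weight * scale)
    return v / scale if abs(v) > threshold else 0.0


for v in (0.5, 1.5, 2.0):
    print(v,
          prox_l1_elastic_net(v, 1.0, 1.0, 0.5),
          prox_l0_elastic_net(v, 1.0, 1.0, 0.5))
```

The *ℓ*_{1} mapping shrinks every surviving entry toward zero, whereas the *ℓ*_{0} mapping either removes an entry or keeps it with ridge shrinkage only.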

#### E.2 Parameter Settings for Simulated Data

The parameters used to generate the simulated data are *L* = 60, *N* = 500, and *M* = 100. The density of the ground truth GRN is 0.1. The noise level in the simulated expression data *E* is controlled by *σ* ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. The percentage of true edges in *𝒢*_{0} is controlled by *θ* ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.

The NetREX algorithm has seven parameters: *λ*, *η*, *κ*, *ξ*, *μ*, *a* and *b*. We applied grid search to find the optimal parameters, with the following settings: *η* - *λ* ∈ [0.2, 5] with interval 0.2, *η* + *λ* ∈ [1, 50] with interval 1, *κ* ∈ [0.1, 0.5] with interval 0.1, *ξ* ∈ {0.1, 1}, *μ* ∈ {0.1, 1} and *a* = *b* = max_{i,j} |*E*(*i, j*)|. We used the same parameter settings for NetREX NP, except that *λ* = 0 and *η* ∈ [1, 50] with interval 1. For the NetREX *ℓ*_{1} algorithm, we used exactly the same parameters as for the NetREX algorithm.
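The grid for the simulated data can be enumerated as in the sketch below, which recovers *η* and *λ* from their sum and difference. The helper names are hypothetical, and the sketch enumerates all combinations as written without any additional filtering.

```python
from itertools import product


def frange(start, stop, step):
    """Inclusive grid start, start+step, ..., stop, rounded to avoid float drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n + 1)]


def netrex_grid(diffs, sums, kappas, xis, mus):
    for diff, total, kappa, xi, mu in product(diffs, sums, kappas, xis, mus):
        eta = (total + diff) / 2.0  # from eta + lam = total, eta - lam = diff
        lam = (total - diff) / 2.0
        yield {"eta": eta, "lambda": lam, "kappa": kappa, "xi": xi, "mu": mu}


simulated_grid = list(netrex_grid(
    diffs=frange(0.2, 5.0, 0.2),    # eta - lambda
    sums=frange(1.0, 50.0, 1.0),    # eta + lambda
    kappas=frange(0.1, 0.5, 0.1),
    xis=[0.1, 1.0],
    mus=[0.1, 1.0],
))
print(len(simulated_grid))
```

The grid contains 25 × 50 × 5 × 2 × 2 = 25,000 parameter combinations, each of which corresponds to one run of the algorithm per data set.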

To test the potential of the competing algorithms, for each noise level we first applied grid search for all algorithms to find their optimal parameters, based on F-measure, on a single simulated data set. We then applied the optimal parameters to 50 other simulated data sets generated under the same noise level, and compared the performance of the algorithms based on their F-measures.

#### E.3 Parameter Settings for DrosDel Data

We set *η* - *λ* ∈ [0.01, 0.2] with interval 0.01, *η* + *λ* ∈ [0.5, 10] with interval 0.5, *κ* ∈ {0.05, 0.1}, *ξ* ∈ {0.1, 1, 5}, *μ* ∈ {0.1, 1} and *a* = *b* = max_{i,j} |*E*(*i, j*)|. Because we do not know the ground truth regulatory networks, we propose a heuristic score to rank our predicted networks. The score can be computed as

We reasoned that promising networks should describe the underlying regulatory system (making ‖*E* - *SA*‖_{F} small) while containing only the contributing regulations (the number of edges in the network, ‖*S*‖_{0}, is small). We raised the fitting error ‖*E* - *SA*‖_{F} to the power of 4 because, to build a condition-specific GRN, fitting the condition-specific expression data *E* is more important. A smaller *R* implies that we can fit the expression data using a network with fewer edges. We ranked all networks predicted under different parameters by *R* in ascending order and show the performance of the top 20 networks.

To construct the TF-gene correlation networks, we used Pearson coefficient cutoffs of 0.6, 0.7, 0.8 and 0.9, and we show the performance of the networks under the different cutoffs.
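A correlation network of this kind can be sketched as follows. The sketch assumes that an edge is kept when the absolute Pearson coefficient reaches the cutoff; whether the sign is taken into account is not stated in the text. The function names and toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treat as no edge
    return cov / math.sqrt(var_x * var_y)


def correlation_network(tf_expr, gene_expr, cutoff):
    """Edges (tf, gene) whose absolute Pearson correlation reaches the cutoff."""
    return {
        (tf, gene)
        for tf, tf_profile in tf_expr.items()
        for gene, gene_profile in gene_expr.items()
        if tf != gene and abs(pearson(tf_profile, gene_profile)) >= cutoff
    }


# Toy expression profiles over four samples (hypothetical values).
tf_expr = {"TF1": [1, 2, 3, 4], "TF2": [4, 1, 3, 2]}
gene_expr = {"g1": [2, 4, 6, 8.5], "g2": [1, 0, 1, 0]}
networks = {c: correlation_network(tf_expr, gene_expr, c)
            for c in (0.6, 0.7, 0.8, 0.9)}
for cutoff, edges in networks.items():
    print(cutoff, sorted(edges))
```

Raising the cutoff can only remove edges, so the networks obtained at the four cutoffs are nested.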

GENIE3 has a single parameter, *K*. Two settings for *K* are suggested in [16], one of which is *K* = *M* - 1. We compared the results obtained with the two settings; *K* = *M* - 1 performed better, so we use *K* = *M* - 1 in the comparison. We also need a cutoff to obtain the final GRN. We ranked the weighted edges predicted by GENIE3 and used the top 100,000, 200,000, 300,000, 400,000 and 500,000 edges as output, respectively.

The co-expressed gene pairs used in the comparison shown in Fig. 3 are the same as those we used as input to the graph embedding term in our formulation. The Pearson coefficient cutoff used here is 0.88.

#### E.4 The Best Network Based On Fold Enrichment

For the correlation-based method, GENIE3 and NetREX, we show their performance under different parameters in the fold enrichment analysis in Fig. 3. When comparing their agreement with DSX targets, we use only the network with the best fold enrichment for each method. We select the best network as follows, taking the networks predicted by NetREX as an example. First, we ranked all networks by fold enrichment of GO annotations and stored the ranks in *R*_{GO}. Then we ranked all networks by fold enrichment of PPIs and stored the ranks in *R*_{PPIs}. Finally, we ranked the networks by the sum of the two ranks (*R*_{GO} + *R*_{PPIs}) and treated the top network as the best network.
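The rank-sum selection can be sketched as below. Network identifiers and enrichment values are hypothetical, and tie handling (ties keep input order) is an assumption, since the text does not specify it.

```python
def rank_descending(scores):
    """Map each network id to its rank (1 = highest score); ties keep input order."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def best_network(go_enrichment, ppi_enrichment):
    r_go = rank_descending(go_enrichment)
    r_ppi = rank_descending(ppi_enrichment)
    # The best network has the smallest sum of the two ranks.
    return min(r_go, key=lambda name: r_go[name] + r_ppi[name])


# Toy fold enrichments (hypothetical values) for four candidate networks.
go = {"net_a": 2.1, "net_b": 3.4, "net_c": 2.8, "net_d": 3.0}
ppi = {"net_a": 1.7, "net_b": 1.2, "net_c": 1.8, "net_d": 1.9}
best = best_network(go, ppi)
print(best)
```

In this toy example `net_d` ranks second for GO and first for PPIs, giving the smallest rank sum, even though `net_b` has the highest GO enrichment.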

## 5 Acknowledgments

This work was supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine (YW, TMP, DYC) and National Institute of Diabetes and Digestive and Kidney Diseases (HK, BO). The authors thank Roded Sharan and the members of the Przytycka and Oliver groups for helpful discussions.