## Abstract

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately paving the way for regulatory network re-engineering. Network inference from transcriptional time series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance but additionally infers whether the causal effects are activating or inhibitory. We apply BETS to transcriptional time series data of 2,768 differentially expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2,768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is freely available as an open source software package at https://github.com/lujonathanh/BETS.

## 1. Introduction

The recent availability of gene expression measurements over time has enabled the search for interpretable statistical models of gene regulatory dynamics [1]. These time series data present a unique opportunity to use the coordinated transcriptional response to environmental exposure to infer causal relationships between genes. However, there are several challenges to overcome in the analysis of time series transcriptomic data. These data are generally high-dimensional: the number of quantified gene transcripts—approximately 20,000 in human samples—often dramatically exceeds the number of available time points and samples. Many classical statistical assumptions fail to hold in this high-dimensional regime [2, 3]. Moreover, the large number of gene transcripts poses a computational burden, as the number of possible edges in a gene network grows quadratically. Finally, a transcriptional time series often has a small number of time points, and those time points are often not uniformly spaced; furthermore, because transcriptional time series data often quantify transcription post exposure, the time series is not stationary, and genes respond to the exposure and return to baseline at different rates [4, 5].

In this work, we develop an approach that uses the gene transcriptional time series following glucocorticoid (GC) exposure to build a directed gene network [6]. GCs play an essential role in regulating stress response, and are widely used as anti-inflammatory and immunosuppressive medication [6, 7]. Despite clinical benefits, prolonged exposure to GCs has been linked to increased risk for type 2 diabetes mellitus (T2DM) [8] and obesity [9]. Here, we develop a method to accurately, interpretably, and efficiently infer a directed gene network using transcriptional time series data. We focus our analysis of this network on immune-related genes, metabolism-related genes, and transcription factors (TFs) to study the inferred coordinated response of these systems to GCs.

Our method, Bootstrap Elastic net regression from Time Series (BETS), uses vector autoregression with elastic net regularization to accurately infer directed edges between genes. Stability selection, which assesses the robustness of an edge to perturbations in the data, leads to improvements over baseline vector autoregression methods in this high-dimensional context [3]. Furthermore, BETS is biologically interpretable because estimated coefficients provide the direction (sign) and effect size of the causal relationship between a pair of genes. Finally, BETS’s parallelization makes inference of networks with millions of possible edges computationally tractable.

We use the causal network inferred by BETS on the GC time series data to study the relationships between TFs, immune genes, and metabolic genes. We validate our network using two approaches: ten measurements of the same GC system with a specific TF overexpressed, and an expression quantitative trait loci (eQTL) study [10]. Although our framework is motivated by transcriptional response to GC exposure, our approaches are general, and BETS is applicable to inferring directed networks from arbitrary transcriptional time series.

## 2. Related Work

Several methods have been developed to estimate directed gene networks from transcriptional time series data (Figure S1) [11, 12, 13, 14, 15, 16, 17, 18, 19]. These methods estimate directed networks in which the directed edges between nodes—representing genes—indicate a cause-effect relationship between genes, i.e., perturbing expression of the *causal gene* would lead to changes in expression of the *effect gene* [20]. We briefly overview these methods; for detailed discussion, see Supplemental Information. Here, we take *g*′ to be the causal gene and *g* to be the effect gene, and quantify support for a causal edge *g*′ → *g* in the data.

Mutual information (MI) methods assess the MI between the expression of *g*′ at the previous time point and the expression of *g* at the current time point (Figure S1A) [21, 22, 23, 24, 25, 26]. A causal edge *g*′ → *g* is included in the network if the MI of the two genes across time exceeds a threshold.

Granger causality methods determine if including the expression of *g*′ at the previous time point improves our ability to predict the expression of *g* at the current time point above using the expression of *g* at the previous time point [27]. A common way to implement Granger causality is through a vector autoregression (VAR) model, which assumes a linear relationship between all genes’ expression at the previous time point and the expression of *g* at the current time point. A causal edge *g*′ → *g* is included in the network when *g*′ has a statistically significant coefficient in the VAR.

Ordinary differential equations (ODEs) fit the derivative of the expression of *g* as a function of all genes’ expression at a single time point (Figure S1C) [11, 28, 29]. ODE methods typically assume linearity, as small sample sizes make it challenging to infer the parameters of nonlinear functions. A causal edge *g*′ → *g* is included when *g*′ has a statistically significant coefficient in the ODE.

Decision trees (DTs) are a type of nonparametric function based on partitioning the data [30, 31]. DT methods fall either under VAR or ODE; either the DTs fit the expression of *g* at the current time as a function of all genes’ expression at the previous time point (VAR), or they fit the derivative of the expression of *g* as a function of all genes’ expression at a single time point (ODE) (Figure S1D) [32, 33]. A causal edge *g*′ → *g* is included in the network when an importance score for *g*′ exceeds some threshold, where importance scores are typically the reduction in variance of *g* when *g*′ is included as a predictor.

Dynamic Bayesian networks (DBNs) search the space of possible directed acyclic graphs between previous and current expression levels to identify the network structure with the highest posterior probability of each edge given the data (Figure S1E) [34, 35, 36, 37, 38]. DBNs typically assume a linear relationship between previous and current expression. A causal edge *g*′ → *g* is included in the network when its marginal posterior probability of existence exceeds some threshold.

A Gaussian process (GP) is a distribution over continuous, nonlinear functions. GPs are often used in the context of nonlinear DBNs, where GP regression is used to model a nonlinear relationship between previous expression and current expression (Figure S1F) [39, 40]. A causal edge *g*′ → *g* is included in the network based on its posterior probability of existence exceeding some threshold.

## 3. Results

First, we briefly describe the approach in BETS to infer a directed gene network. Next, we compare results from BETS to those from twenty other methods on the 100-gene time series data from the DREAM4 Network Inference Challenge [41]. Then, we describe the network estimated from the GC transcriptional time series data. Finally, we validate the inferred network using two different frameworks: overexpression experiments on the same system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project [10].

### 3.1. BETS: A vector autoregressive approach to causal inference of gene regulatory networks

Directed networks represent causal relationships among diverse interacting variables in complex systems. We developed a robust, scalable approach based on ideas from Granger causality to construct these directed networks from short, high-dimensional time series observations of gene expression levels.

Let *G* be the set of all *p* = |*G*| genes in the data set and *g ∈ G* be a gene. Let *¬g* be *G* with *g* removed. Let *t* be a single time point, ranging over {1, 2, *…, T*}. Let $X_t^g$ be the expression of gene *g* at time *t*. Let *L* be the time lag, or the number of previous time point observations; so *L* = 2 means that we use two previous time points, *t −* 1 and *t −* 2, to predict expression at time *t*.

(Granger causality). For lag *L*, a gene *g*′ is said to *Granger-cause* another gene *g* if using $X_{t-1}^{g'}, \dots, X_{t-L}^{g'}$, the expression values of *g*′ at times *t −* 1 to *t − L*, improves prediction of $X_t^g$, the expression value of *g* at time *t*, beyond the prediction using $X_{t-1}^{g}, \dots, X_{t-L}^{g}$ alone.

To test for Granger causality from *g*′ to *g*, we first preprocessed the gene expression time series data (STAR Methods). For every potential effect gene *g*, we fit all other genes *g*′ *∈ ¬g* simultaneously (Equation 1), echoing ideas from the graphical lasso for undirected network inference [42]. Intuitively, this adapts the idea of Granger causality to conditional Granger causality, where we consider how gene *g*′ Granger causes *g* conditioning on the effects of all other genes. This approach uses the regression:
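$$X_t^g = \sum_{\ell=1}^{L} \alpha_\ell^{g} X_{t-\ell}^{g} + \sum_{g' \in \neg g} \sum_{\ell=1}^{L} \beta_\ell^{g',g} X_{t-\ell}^{g'} + \varepsilon_t \qquad (1)$$
where $X_t^g$ is the expression of gene *g* at time *t* and $\varepsilon_t \sim \mathcal{N}(0, 1)$. For BETS, we set *L* = 2. To test for an edge, if $\beta_\ell^{g',g} \neq 0$, then we say *g*′ conditionally Granger-causes *g* at lag *ℓ*. We build the directed network by including a directed edge to *g* from every gene *g*′ that has been inferred to conditionally Granger-cause *g*.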

Robustly building this network is difficult due to the high dimensionality of the problem: the number of genes that could Granger-cause a given *g* far exceeds the available time points and technical replicates. To address this challenge, BETS regularizes the VAR model parameters using an *elastic net* penalty (STAR Methods, Figure 1A). Elastic net regression encourages sparsity and performs automatic variable selection on the genes being tested for causal influence [43]. The elastic net penalty, unlike the lasso penalty [44], is able to select groups of correlated variables and allows the number of selected variables to be greater than the number of samples. This is particularly important for gene expression assays where gene expression levels are often well-correlated and there are far more genes than samples.
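To make the penalty concrete, the following minimal sketch fits an elastic net by cyclic coordinate descent in plain Python. It assumes the common parameterization (1/2N)‖y − Xb‖² + λρ‖b‖₁ + (λ(1 − ρ)/2)‖b‖², where ρ mixes the lasso and ridge terms. The names `elastic_net_cd`, `lam`, and `rho` are illustrative and are not taken from the BETS software, whose exact objective and solver may differ.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, rho, n_iter=200):
    """Cyclic coordinate descent for the (assumed) objective
    (1/2N)||y - X b||^2 + lam*rho*||b||_1 + lam*(1-rho)/2*||b||_2^2.
    X is a list of N rows (each a list of p predictors), y a list of N values."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; equals y because b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam * (1.0 - rho)
            if denom == 0.0:
                continue  # all-zero column with no ridge term: leave b[j] at 0
            # Column j against the partial residual (b[j]'s own contribution added back).
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(z, lam * rho) / denom
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Toy data: y depends only on the first predictor.
X_toy = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y_toy = [2.0, 4.0, 6.0, 8.0]
coef_unpenalized = elastic_net_cd(X_toy, y_toy, lam=0.0, rho=0.5)
coef_penalized = elastic_net_cd(X_toy, y_toy, lam=1.0, rho=0.5)
```

In this toy example the penalized fit shrinks the first coefficient toward zero and keeps the irrelevant second coefficient at exactly zero, which is the variable selection behavior described above.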

In BETS, we fit the same VAR model to a data set in which causal genes have their expression permuted over time to generate a null distribution of edge coefficients. The coefficients are thresholded to produce a causal network with each edge at edge false discovery rate (FDR) ≤ 0.05 (Figure 1A). We then apply this network inference procedure to multiple (here, 1,000) bootstrapped samples of the original data set (Figure 1B). Each edge has a *selection frequency*, or the frequency that the edge appears in networks inferred from the bootstrapped samples. Inspired by stability selection, this approach assesses if network edges are robust to perturbations of the data [3]. Finally, we run this overall procedure on a permuted version of the original data set to obtain a null distribution of selection frequencies (Figure 1C). The selection frequency threshold for including each edge is chosen to control the stability FDR ≤ 0.2. As a baseline, we compare BETS against Enet, which runs elastic net regression without stability selection to produce a causal network with each edge at edge FDR ≤ 0.05 (Figure 1A).
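The selection frequency and its threshold can be sketched as follows. This is a minimal illustration, not the BETS implementation: the function names are hypothetical, and the FDR estimate used here (the number of null edges at or above a threshold divided by the number of real edges at or above it) is an assumption about how the permuted-data frequencies are used.

```python
def selection_frequencies(bootstrap_networks):
    """bootstrap_networks: list of sets of (cause, effect) edges, one per bootstrap sample.
    Returns the fraction of bootstrap networks in which each edge appears."""
    counts = {}
    for edges in bootstrap_networks:
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    n = len(bootstrap_networks)
    return {edge: c / n for edge, c in counts.items()}


def frequency_threshold(real_freqs, null_freqs, fdr=0.2):
    """Smallest threshold tau such that (#null >= tau) / (#real >= tau) <= fdr.
    The ratio is an assumed plug-in estimate of the stability FDR.
    Returns None if no threshold satisfies the bound."""
    for tau in sorted(set(real_freqs)):
        n_real = sum(f >= tau for f in real_freqs)  # at least 1, since tau is a real frequency
        n_null = sum(f >= tau for f in null_freqs)
        if n_null / n_real <= fdr:
            return tau
    return None


# Toy example with four bootstrap networks.
networks = [
    {("a", "b"), ("c", "b")},
    {("a", "b")},
    {("a", "b")},
    {("a", "b"), ("d", "b")},
]
freqs = selection_frequencies(networks)
tau = frequency_threshold(list(freqs.values()), null_freqs=[0.25, 0.25], fdr=0.2)
```

Here the edge a → b appears in every bootstrap network, while the other two edges appear as rarely as edges in the permuted data, so only a → b passes the threshold.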

### 3.2. Leading Performance on DREAM Network Inference Challenge

We evaluated BETS against other directed network inference methods. We used the DREAM4 Network Inference Challenge [41], a community benchmark for directed network inference using gene time series data. This benchmark consisted of five data sets, each with ten time series measurements for 100 genes across 21 time points [41]. Evaluation was previously done by looking at the average of the area under the precision recall curve (AUPR) or the area under the receiver operating characteristic (AUROC) over the five data sets [33, 41]. Any method that provides a ranking of possible network edges could be evaluated in this framework.
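Both metrics only require a score for every candidate edge and a gold-standard label. A minimal sketch in plain Python follows; it assumes at least one true edge and one non-edge, and it computes AUPR as step-wise average precision, which may differ slightly from the interpolation used by the DREAM4 evaluation scripts. The function names are illustrative.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen true edge is scored above a
    randomly chosen non-edge (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each true edge."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits


# Four candidate edges ranked by score; the first and third are true edges.
edge_scores = [0.9, 0.8, 0.7, 0.1]
edge_labels = [1, 0, 1, 0]
roc = auroc(edge_scores, edge_labels)
aupr = average_precision(edge_scores, edge_labels)
```

For a random ranking, AUROC is expected to be 0.5 and AUPR is expected to be close to the fraction of candidate edges that are true, which is why the two random baselines differ so much on sparse networks.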

We tested BETS and Enet against 20 other methods on the DREAM challenge [32, 33, 36, 45, 46]. We ran CSId, Jump3, CLR, MRNET, and ARACNE in-house and found our results consistent with those reported in the literature. All 20 methods reported AUPR, but only 15 reported AUROC.

BETS ranked 6th out of 22 in AUPR with an average AUPR of 0.128 (Figure 2A, Table S1) and 3rd out of 17 in AUROC with an average AUROC of 0.688 (Figure 2B, Table S2). BETS was the top performer of all VAR methods, and Enet was second best. All 22 methods outperformed random selection of edges, which achieved an average AUPR of 0.002 and average AUROC of 0.50 [45]. We also found that BETS and Enet had similar performance to the DBN methods in AUPR, and outperformed most of them in AUROC. Ranked by the top AUPR of each class of methods, the best performing class was GP, followed by DT, MI, VAR, DBN, and ODE [32, 36, 45]. The VAR method used in BETS produces edge signs (indicating excitatory or inhibitory causal effects) and effect sizes. While other methods based on GPs (e.g., CSId), MI (e.g., tl-CLR) or DTs (e.g., dynGENIE3) had marginally better overall network inference, they do not provide insight into the causal relationships because they only output a positive measure of a causal interaction [28, 33, 40].

Next, we compared the speed of BETS and two other top-performing methods: CSId and Jump3 (Table S3). BETS was the fastest at 4.8 hours while CSId took 9.8 hours and Jump3 took 45 hours. Thus, while BETS had a slightly lower AUPR compared with CSId and Jump3, it was substantially faster.

BETS improved upon Enet using stability selection. To quantify this improvement, we compared three other models: elastic net with lag 1, ridge regression with lag 2, and lasso with lag 2 (Table S4). In each case, the stability selection version outperformed the original version in average AUPR and AUROC. The improvement in average AUPR ranged between 0.016 and 0.03 (+20% to +31%), while the improvement in average AUROC ranged between 0.012 and 0.04 (+1.8% to +6.1%). Hence, our stability selection procedure leads to improved performance for multiple versions of VAR.

We also found that stability selection performance is robust to the number of bootstrap samples (Table S5). Decreasing the number of bootstrap samples from 1,000 to 100 caused minor decreases of −0.004 in AUPR and −0.008 in AUROC, within the standard deviation across the networks. It also resulted in a 10-fold decrease in memory usage and 3-fold decrease in run time, due to a constant-time hyperparameter search. If users face computational constraints, we recommend that they use 100 bootstrap samples for nearly equivalent performance.

### 3.3. Application to gene transcription response to glucocorticoids

To infer the causal relationships in the GC response network, we analyzed RNA-seq data collected from the human adenocarcinoma and lung model cell line, A549. This consisted of two data sets. In an *original exposure* data set, cells were exposed to the synthetic GC dexamethasone (dex) for 0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10, and 12 hours [6]. In an *unperturbed* data set, the cells were first exposed to dex for 12 hours, after which the media was replaced and dex removed, and then measurements were taken at the same intervals 0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10, and 12 hours. BETS was fit jointly over the two data sets. In total there were 7 technical replicates (4 from *original exposure* and 3 from *unperturbed*). A single VAR was fit on 70 samples: Each of the 7 replicates had 10 samples, because using a lag 2 VAR model turns 12 time points into 10 samples.
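The sample count can be illustrated with a short sketch that stacks lagged predictors across replicates. The data layout (one dictionary of gene name to time series per replicate) and the function name `build_var_samples` are assumptions made for illustration; lags are counted in observation indices rather than hours.

```python
def build_var_samples(replicates, effect, L=2):
    """replicates: list of dicts mapping gene name -> list of T expression values.
    Returns (X, y, columns): one row per (replicate, time point with L predecessors),
    with one column per (gene, lag) pair."""
    genes = sorted(replicates[0])
    columns = [(g, lag) for g in genes for lag in range(1, L + 1)]
    X, y = [], []
    for rep in replicates:
        T = len(rep[effect])
        for t in range(L, T):  # 0-based index; the first L time points have no full history
            X.append([rep[g][t - lag] for g, lag in columns])
            y.append(rep[effect][t])
    return X, y, columns


# Toy data: 7 replicates, 12 time points, 3 genes.
toy_replicate = {
    "A": [float(t) for t in range(12)],
    "B": [10.0 * t for t in range(12)],
    "C": [0.0] * 12,
}
X, y, columns = build_var_samples([toy_replicate] * 7, effect="A", L=2)
n_samples = len(y)  # 7 replicates x (12 - 2) time points
```

With 12 time points and lag 2, each replicate contributes 10 rows, so 7 replicates give the 70 samples stated above.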

We applied BETS to the GC-mediated expression responses to infer a causal network (Figure 3A). Edges with selection frequency (frequency of appearance among bootstrap networks) at least 0.097 were declared significant (FDR ≤ 0.2; Figure 3B). The network contained 2,768 nodes representing distinct genes and 31,945 directed edges (0.4% of possible edges). Of these, 466 genes were causes (had an outward directed edge) and all 2,768 genes were effects (had an incoming directed edge). The out-degree distribution was heavy-tailed and skewed right (Figure 3C) while the in-degree distribution was lighter-tailed and more symmetric (Figure 3D). The network’s edge in-degree had a heavier left tail and lighter right tail than a normal distribution (Figure 3E). This suggests that causal genes were relatively rare (only 1/6th of network genes were causes) and a fifth of those only affected a single gene, whereas genes that were effects tended to have multiple causes. The network was inferred efficiently due to parallelization across genes, taking six days in real time and 292 days in CPU time to perform 5.5 million elastic net model fits.

To study the network with respect to the glucocorticoid system, we annotated specific genes as transcription factor (TF), immune-related, and metabolism-related [47, 48, 49, 50]. First, we inspected enrichment of each category among the causal genes (Figure 3F). At FDR ≤ 0.05, we found enrichment for TFs among causes: there were 226 TFs, representing 8.2% of the 2,768 input genes, and 62 of these TFs were causal, representing 13% of all causal genes (odds ratio (OR) = 2.0, Fisher’s exact test (FET) adjusted *p* ≤ 2.9 *×* 10^{−5}). Similarly, we found an enrichment among immune-related genes as causes: of 109 immune genes, representing 3.9% of the input genes, 39 were causes, representing 8.4% of all causal genes (OR = 2.9, FET adjusted *p* ≤ 2.5 *×* 10^{−6}). There was no enrichment among metabolism-related genes: there were 120 metabolism genes, representing 4.3% of input genes; 19 of these metabolism genes were causes, representing 4.1% of all causes (OR = 0.93, FET adjusted *p* ≤ 0.66).
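The reported odds ratios follow from the 2 × 2 table of class membership against causal status, using 466 causal genes out of 2,768. The sketch below recomputes them in plain Python and adds a one-sided hypergeometric tail probability as an illustration of Fisher's exact test. The function names are hypothetical, and the unadjusted one-sided p-value is not expected to match the adjusted values reported above.

```python
from math import comb


def odds_ratio(k, n_class, n_causal, n_total):
    """Odds ratio for the 2x2 table: in class vs. not, causal vs. not."""
    a = k                           # in class, causal
    b = n_causal - k                # not in class, causal
    c = n_class - k                 # in class, not causal
    d = n_total - n_causal - c      # not in class, not causal
    return (a * d) / (b * c)


def hypergeom_upper_tail(k, n_class, n_causal, n_total):
    """One-sided Fisher's exact p-value: P(at least k class members among causal genes)."""
    denom = comb(n_total, n_causal)
    num = sum(
        comb(n_class, i) * comb(n_total - n_class, n_causal - i)
        for i in range(k, min(n_class, n_causal) + 1)
    )
    return num / denom


or_tf = odds_ratio(62, 226, 466, 2768)
or_immune = odds_ratio(39, 109, 466, 2768)
or_metabolism = odds_ratio(19, 120, 466, 2768)
p_tf_one_sided = hypergeom_upper_tail(62, 226, 466, 2768)
```

Rounded, the three odds ratios are 2.0, 2.9, and 0.93, in agreement with the values in the text.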

To study the interactions among gene classes inferred by our network, we quantified enrichment for edges between each of the four gene classes – immune, metabolic, TF, and other gene types (*any*) (Figure 3G, Table S6). We found enrichment of 12 of the 16 possible edge types (FDR ≤ 0.05). The network was enriched for edges from any causal genes to immune genes; causal TFs to any genes, TFs, and immune genes; causal immune genes to any genes, TFs, immune genes, and metabolic genes; and causal metabolic genes to any genes, TFs, immune genes, and metabolic genes. This suggests that our network is enriched for causal TFs, immune genes, and metabolic genes.

Our network identified known biological interactions between genes with immune, metabolic, and TF roles; we highlighted 16 of the gene pairs with experimentally validated interactions (Figure 4, Table S7). *SOCS1* and *SOCS3* bind *IRS2* and promote its degradation, leading to reduced insulin signalling [51, 52]; furthermore, *SOCS1* represses *IL-4*-induced *IRS2* signalling [53]. *NR4A1* heterodimerizes with *RXRA* to activate it to promote gene expression under vitamin A signaling [54]; *NR4A1* also inhibits *p*300-induced *RXRA* acetylation [55]. Eleven of the 16 edges had the correct interaction direction; the five that were reversed are *TNFAIP3* → *IRAK2, SOCS3* → *HIVEP1, ATF3* → *MDM2, E2F1* → *CDH1*, and *FOS* → *EGFR*. These results suggest that BETS infers biologically meaningful relationships, but transcriptional data, absent other assays on protein abundance and cellular dynamics, are often underpowered to resolve the direction of the edge.

### 3.4. Validation of inferred network on overexpression data

We asked whether our inferred network edges validated on overexpression versions of the same experimental system, in which each of ten TFs was separately overexpressed over the same 12 hours of observations. Specifically, we assessed the concordance between inferred network edges *g′* → *g* and their coefficient in the overexpression data set under a VAR model (STAR Methods).

We first evaluated how well network edges replicated on individual overexpression data sets. We performed linear regression of a one-hot encoding of the original network’s edge sign (i.e., positive versus no edge or negative sign; negative versus positive or no edge) as the predictor against the VAR model edge coefficients estimated from each of the overexpression time series as the response (Figure 5A-B, STAR Methods). Of the ten data sets, 9 showed enriched positive effect sizes among positive edges at FDR ≤ 0.2 (*CEBPB, CEBPD, FOSL2, FOXO1, FOXO3, KLF6, KLF9, KLF15, OCT4*; Figure 5A). Three data sets showed enriched negative effect sizes among negative edges (*OCT4, TFCP2L1, CEBPD*) and four showed enriched positive effect sizes among negative edges (*CEBPB, FOSL2, KLF9, KLF15*; Figure 5B). Taken together, the positive edges inferred by BETS validate on the overexpression data, but the negative edges do not, indicating repressive effects may have inconsistent signs or feedback loops.

Next, we checked whether the 123 inferred causal edges from the TF *TFCP2L1* validated in the *TFCP2L1* overexpression data set (there were only about 10 causal edges from each of the other 9 TFs). We regressed the original network’s edge sign (+1 for positive edges, 0 for no edge, and −1 for negative edge) as the predictor against the overexpression VAR model edge coefficients as the response (Figure 5C). We found a positive relationship between the edge sign and overexpression coefficient (slope 0.17, two-sided t-test *p* ≤ 5 × 10^{−5}). This shows that causal edges from *TFCP2L1* are enriched for matched effect directions in the *TFCP2L1* overexpression data.

### 3.5. Validation of network edges through lung trans-eQTLs

We validated our network edges on an expression quantitative trait loci (eQTL) study. A single nucleotide polymorphism (SNP) *S* is an eQTL for a gene *g′* if it is associated with *g′*’s expression level within a population. Given a true causal edge *g′* → *g*, if a SNP *S* is a local (cis-) eQTL for *g′, S* might also be a distal (trans-) eQTL for *g* [56]. We used gene expression levels in primary lung tissue (*n* = 278) from the Genotype-Tissue Expression (GTEx) project v6p [10]. We observed an enrichment of low trans-eQTL association p-values from the directed network compared to shuffling the SNP labels (Figure 6A-B). This suggests our network captures more valid causal effects than expected by chance.

We next inspected specific associations and their corresponding edges. We found 341 trans-eQTL pairs in lung samples corresponding with 130 network edges (q-value FDR ≤ 0.2). There are more trans-eQTLs than edges because there are multiple cis-eQTLs for some causal genes *g′*. The 341 trans-eQTLs greatly improved upon the 2 identified in the GTEx v6 trans-eQTL study [57], demonstrating the utility of transcriptional time series for prioritizing promising associations. The top trans-associations were rs2302178-*CLDN1* (q-value FDR ≤ 0.095, extended from the cis-association rs2302178-*HS3ST6*), rs590429-*ADAMTS* (q-value FDR ≤ 0.11, extended from the cis-association rs2302178-*OLR1*), and rs2072783-*CLIP2* (q-value FDR ≤ 0.11, extended from the cis-association rs2302178-*GMPR*) (Figure 6C).

We searched for validated associations between immune-related genes, metabolic-related genes, and TFs. One association was *OLR1* → *ITGAV*, where we see that the known association between SNP rs4329754 and *OLR1* extends to an association between the same SNP and effect gene *ITGAV* (q-value FDR ≤ 0.13) [10]. *OLR1* plays key roles in immunity and metabolism [58, 59]. It is associated with metabolic syndrome [60] and atherosclerosis [60], and modulates inflammatory and humoral immune responses [61, 62]. Meanwhile, *ITGAV* plays a key role in the motility of *CD*4^{+} T cells during inflammation [63].

Another association was between the TF *SNAI2* and gene *PTPN6*, where we find that the known association between SNP rs56800165 and *SNAI2* extends to an association between the same SNP rs56800165 and effect gene *PTPN6* (q-value FDR ≤ 0.17) [10]. *SNAI2* is a direct target of the glucocorticoid receptor *GR* to regulate cell migration in breast cancer [64], while *PTPN6* is involved in glucose homeostasis via negative regulation of insulin signalling [65]. *PTPN6* is also associated with inflammatory phenotypes in multiple diseases [66, 67]. Finally, both *SNAI2* and *PTPN6* are involved in the cell-cell adherens junctions pathway, as *SNAI2* represses transcription of cadherin, while *PTPN6* positively regulates the cadherin-catenin complex [68]. Thus, for several eQTL-validated edges for gene pairs, we find that the genes are involved in related biological processes, but further experimentation is required to confirm direct interactions.

Finally, as A549 cells are models for lung tissue [69], we quantified enrichment of validated edges in lung compared to enrichment in four other tissues: subcutaneous adipose (*n* = 298), transformed fibroblasts (*n* = 272), tibial artery (*n* = 285), and thyroid (*n* = 278). We validated 341 unique network edges across the five tissues (FDR ≤ 0.2). 130 edges validated for lung, 4 for subcutaneous adipose, 125 for transformed fibroblasts, 3 for tibial artery, and 82 for thyroid tissues. More network edges validated in primary lung than in other tissues, suggesting that A549 cells most closely match lung samples among GTEx tissues; this is consistent with their tissue of tumor origin.

## 4. Discussion

We described an approach, BETS, to build directed networks using short time series observations of high-dimensional transcriptional data. BETS combined ideas from elastic net regression, graphical lasso, stability selection and VAR models to infer Granger causality relationships in high dimensional transcriptional time series data. Our method achieved competitive performance on the DREAM4 100-Gene Network Inference Challenge, ranking 6th out of 22 methods in AUPR and 3rd out of 17 methods in AUROC; it was also faster than several methods with similar or better performance and infers effect size and sign, unlike the other top performing methods. Stability selection resulted in consistent improvement to VAR models across different hyperparameter settings.

Next, we applied BETS to time series RNA-seq data from human A549 cells exposed to glucocorticoids and identified a directed network of 31,945 edges (FDR ≤ 0.2), capturing the causal relationships among genes after exposure to GCs. In our network, we found enrichment of immune genes and TFs among causal genes. We also found enrichment of causal edges from TFs, immune genes, and metabolic genes. We validated our network first in ten overexpression data sets, replicating positive edges with overexpression effects. Validating network edges by searching for trans-eQTLs in GTEx, we found an enrichment of associations with genetic variants across network edges. Finally, we discovered 341 trans-eQTLs, dramatically improving from the GTEx trans-eQTL study without filtering tests for association [57].

While BETS has demonstrated effective inference of causal relations, there are interesting future directions to explore. All methods that infer networks from transcriptional time series face several difficulties. Transcript levels are sometimes an imperfect proxy for protein levels, especially when transcript dynamics are changing [11, 70]; the scarcity of time point samples causes statistical challenges for inferring millions of possible causal interactions between genes, let alone non-additive interactions among causes [3, 71, 72]; transcriptional data do not capture the complete regulatory context, including chromatin structure and epigenetic regulation [11]; transcriptional relationships are often nonstationary, in that the relationship may change over time due to responses from the environment [4, 5]; and inferred networks are often sensitive to the choice of preprocessing and parameters [73]. Single cell data also implicitly include transcriptional time series information when pseudotime is inferred, making ideas from Granger causality exceptionally relevant. Finally, experimental followup is key to establishing causality; BETS can only generate promising, interpretable hypotheses. Indeed, by discovering hundreds more trans-eQTLs than the GTEx study (a 170-fold increase) [57], BETS demonstrates its potential to prioritize biologically meaningful associations.

## Author Contributions

TER and BEE designed and funded the study. LKH coordinated all genomic data production. LKH and SML collected RNA-seq data. BD, JL, BJ and BEE developed the methods and validation approaches. BD and JL applied these to data. ICM, TER, BJ, BD, JL, and AB analyzed the data. BJ and JL performed validation in the GTEx data. BD, JL, and BEE drafted the manuscript, and all authors contributed to revision.

This work was funded by the following grants: CZI AWD1005664, CZI AWD1005667, NIH R01 HL133218, NIH U01 HG007900, a Sloan Faculty Fellowship, and an NSF CAREER 1750729.

## 5. STAR Methods

### 5.1. Method details

#### Bootstrap Elastic net regression from Time Series (BETS)

Bootstrap Elastic net regression from Time Series (BETS) is a vector-autoregressive approach to causal inference from gene expression time series data. It is based on the principle of Granger causality [27]: a gene *g′* Granger-causes another gene *g* if previous information from gene *g′* improves our current predictions of gene *g*, beyond using previous information of other genes.

BETS first preprocesses the data. BETS fits an elastic net vector autoregression model to handle the high dimensionality of the time series, inferring a network (Figure 1A). It infers one network for each of 1,000 bootstrapped samples of the original data set and computes each edge’s selection frequency: its frequency of appearance among the bootstrapped networks (Figure 1B) [3]. Finally, BETS includes an edge in the network using the selection frequencies (Figure 1B). Our baseline comparison, Enet, only preprocesses the data and fits an elastic net vector autoregression model from the original data (Figure 1A; Section 3.2).

#### Preprocessing temporal time series data

For a **gene temporal profile** (i.e., one gene’s expression values across time for a single replicate), we used zero-mean unstandardized normalization, which centers each gene temporal profile to have mean zero across time. BETS uses this approach because gene temporal profiles range from staying almost constant to having drastic fluctuations, and a unit-variance normalization would over-represent the weak causal effects of genes with lower variability.
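As a minimal sketch of this normalization (the function name `center_profile` is illustrative and not taken from the BETS code), centering subtracts the temporal mean and leaves the scale of each profile untouched:

```python
def center_profile(profile):
    """Zero-mean, unstandardized normalization of one gene temporal profile:
    subtract the mean across time; do not rescale the variance."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]


flat_gene = center_profile([1.0, 2.0, 3.0])       # small fluctuations stay small
variable_gene = center_profile([0.0, 10.0])       # large fluctuations stay large
```

Because the variance is not rescaled, a nearly constant gene keeps a small dynamic range after preprocessing instead of being inflated to the same scale as a strongly responding gene.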

#### Vector autoregression model

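Let *G* be the set of all genes in the data, let *p* = |*G*| be the number of genes, and let *g* be a gene. Let ¬*g* be *G* with *g* removed. Let there be *T* time points total, and let *t* ∈ {1, 2, …, *T*} be a single time point. Let there be *R* replicates of the gene expression time series.

Let $x^{g}_{t,r}$ be the expression of gene *g* at time *t* for replicate *r*. Let $X^{g}_{t} = (x^{g}_{t,1}, \ldots, x^{g}_{t,R})^{\top}$ be the *R* × 1 vector of gene expression levels of gene *g* across *R* replicates at time *t*. The rest of the paper does not mention replicates for simplicity, but here we discuss replicates for completeness.

Let *g′* be the gene we are testing to be causal for gene *g* and let ℓ refer to the time lag of the causal edge *g′→g*. Let *L* be the maximum lag. In BETS, *L* = 2.

We model each gene *g* as

$$X^{g}_{t} = \sum_{\ell=1}^{L} A^{g}_{\ell} X^{g}_{t-\ell} + \sum_{g' \in \neg g} \sum_{\ell=1}^{L} B^{g',g}_{\ell} X^{g'}_{t-\ell} + \epsilon_{t} \tag{2}$$

where ϵ_{t} ∼ 𝒩(0, 1). In other words, the expression of each gene *g* is modelled as a linear function of its and other genes’ *L* previous expression values, under independent Gaussian noise. $A^{g}_{\ell}$ represents the (scalar) effect size of gene *g*’s ℓth previous value, $X^{g}_{t-\ell}$, on its current value, $X^{g}_{t}$. $B^{g',g}_{\ell}$ represents the (scalar) effect size of the ℓth previous value of gene *g′*, $X^{g'}_{t-\ell}$, on gene *g*’s current value, $X^{g}_{t}$. Equation 2 requires that *t* > ℓ for the ℓth previous value, $X^{g}_{t-\ell}$, to exist.

To demonstrate how our model is fit in practice, we reformulate Equation 2 using matrix notation. Each row represents one time point for one replicate. There are *T – L* time points with *t > L* and *R* replicates, so there are *R*(*T – L*) samples, or rows, in total. Let *N* = *R*(*T – L*).

Define $\mathbf{X}^{g}$, an *N* × 1 vector, as:

$$\mathbf{X}^{g} = \begin{bmatrix} X^{g}_{L+1} \\ X^{g}_{L+2} \\ \vdots \\ X^{g}_{T} \end{bmatrix}$$

We can similarly write $\mathbf{X}^{g}_{-\ell}$, which is $\mathbf{X}^{g}$ with each entry replaced by its ℓth previous value. Define an *N × L* matrix $\mathbf{X}^{g}_{\mathrm{lag}}$ consisting of the first *L* previous vectors, i.e., $\mathbf{X}^{g}_{-\ell}$ for ℓ ranging in {1, …, *L*}:

$$\mathbf{X}^{g}_{\mathrm{lag}} = \begin{bmatrix} \mathbf{X}^{g}_{-1} & \mathbf{X}^{g}_{-2} & \cdots & \mathbf{X}^{g}_{-L} \end{bmatrix}$$

Let $\mathbf{A}^{g} = (A^{g}_{1}, \ldots, A^{g}_{L})^{\top}$ be an *L* × 1 vector of the *L* lagged coefficients.

Next, let us formulate the part of Equation 2 involving the genes *g′* in matrix notation. Let $\mathbf{X}^{\neg g}_{\mathrm{lag}}$ be an *N × L*(|*G*| − 1) predictor matrix of the vectors $\mathbf{X}^{g'}_{-\ell}$, for *g′ ≠ g* and ℓ ∈ {1, …, *L*}. Note that the number of columns is *L*(|*G*| − 1): *L* lagged vectors for each of the |*G*| − 1 genes *g′ ≠ g*. Writing $g'_{1}, \ldots, g'_{|G|-1}$ for the genes in ¬*g*:

$$\mathbf{X}^{\neg g}_{\mathrm{lag}} = \begin{bmatrix} \mathbf{X}^{g'_{1}}_{-1} & \cdots & \mathbf{X}^{g'_{1}}_{-L} & \cdots & \mathbf{X}^{g'_{|G|-1}}_{-1} & \cdots & \mathbf{X}^{g'_{|G|-1}}_{-L} \end{bmatrix}$$

Let $\mathbf{B}^{g}$ be an *L*(|*G*| − 1) × 1 vector of the causal coefficients $B^{g',g}_{\ell}$, where *g′ ≠ g*.

We then fit the model:

$$\mathbf{X}^{g} = \mathbf{X}^{g}_{\mathrm{lag}} \mathbf{A}^{g} + \mathbf{X}^{\neg g}_{\mathrm{lag}} \mathbf{B}^{g} + \epsilon_{t}$$

where ϵ_{t} is an *N* × 1 vector with each element ϵ_{t,n} ∼ 𝒩(0, 1). To write in the most compact form, we can write

$$\mathbf{X}^{G}_{\mathrm{lag}} = \begin{bmatrix} \mathbf{X}^{g}_{\mathrm{lag}} & \mathbf{X}^{\neg g}_{\mathrm{lag}} \end{bmatrix}, \qquad \boldsymbol{\beta}^{g} = \begin{bmatrix} \mathbf{A}^{g} \\ \mathbf{B}^{g} \end{bmatrix}$$

Note that $\mathbf{X}^{G}_{\mathrm{lag}}$ is an *N × L*|*G*| matrix and $\boldsymbol{\beta}^{g}$ is an *L*|*G*| × 1 vector. Thus the final matrix formulation of Equation 2 is:

$$\mathbf{X}^{g} = \mathbf{X}^{G}_{\mathrm{lag}} \boldsymbol{\beta}^{g} + \epsilon_{t} \tag{10}$$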
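As an illustration of the matrix formulation, the sketch below stacks one row per (time point, replicate) pair and places the effect gene's own lags before the lags of every other gene. The function name `lagged_design`, the (R, G, T) array layout, and the column ordering are assumptions for this example rather than the BETS code.

```python
import numpy as np

def lagged_design(expr, g, max_lag=2):
    """Build the target vector and predictor matrix for effect gene g.

    expr: array of shape (R, G, T) = (replicates, genes, time points).
    Returns (target, predictors) with N = R * (T - max_lag) rows; the
    predictor columns hold gene g's own lags first, then every other
    gene's lags, giving max_lag * G columns in total.
    """
    R, G, T = expr.shape
    order = [g] + [h for h in range(G) if h != g]   # own lags first
    targets, rows = [], []
    for t in range(max_lag, T):                      # 0-based index, i.e. t > L
        targets.append(expr[:, g, t])                # R values at time t
        # For each gene in `order`, its values at lags 1..max_lag.
        cols = [expr[:, h, t - lag] for h in order
                for lag in range(1, max_lag + 1)]
        rows.append(np.stack(cols, axis=1))          # shape (R, max_lag * G)
    return np.concatenate(targets), np.concatenate(rows, axis=0)

rng = np.random.default_rng(0)
expr = rng.normal(size=(3, 5, 12))                   # toy data: R=3, G=5, T=12
y, Z = lagged_design(expr, g=0)
print(y.shape, Z.shape)                              # (30,) (30, 10)
```

With *R* = 3, *T* = 12 and *L* = 2 the example yields *N* = 3 × 10 = 30 rows and *L*|*G*| = 10 columns, matching the dimensions stated in the text.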

#### Elastic net penalty

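Because of the large number of predictors as compared to the small number of samples, we use the elastic net penalty, which is a generalization of both ridge and lasso penalties. The elastic net fits the following objective:

$$\hat{\boldsymbol{\beta}}^{g} = \arg\min_{\boldsymbol{\beta}^{g}} \; \frac{1}{2N} \left\lVert \mathbf{X}^{g} - \mathbf{X}^{G}_{\mathrm{lag}} \boldsymbol{\beta}^{g} \right\rVert_{2}^{2} + \lambda \left( a \left\lVert \boldsymbol{\beta}^{g} \right\rVert_{1} + \frac{1-a}{2} \left\lVert \boldsymbol{\beta}^{g} \right\rVert_{2}^{2} \right)$$

Here ‖·‖_{1} represents the ℓ_{1}-norm and ‖·‖_{2} represents the ℓ_{2}-norm; $\mathbf{X}^{g}$ is the *N* × 1 target vector and $\mathbf{X}^{G}_{\mathrm{lag}}$ the *N × L*|*G*| predictor matrix of Equation 10; *λ* ≥ 0 sets the overall penalty strength; and *a* ∈ [0, 1] sets the balance between the lasso (*a* = 1) and ridge (*a* = 0) penalties.

For the elastic net, we used the following ranges of hyperparameter values: *λ* ∈ {10^{−4}, 10^{−3}, …, 1}, *a* ∈ {0.1, 0.3, …, 0.9}. For lasso, we used *λ* ∈ {10^{−5}, …, 1}. For ridge, when we used *λ* ∈ {10^{−5}, …, 1}, we found that the optimal value selected in some cases was the maximum value of *λ* = 1. We thus expanded the range to {10^{−5}, …, 10^{6}} to ensure that we were not missing better hyperparameters at larger values. With the expanded range, the optimal *λ* was found to be 100.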
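To make the role of the two hyperparameters concrete, here is a minimal coordinate-descent sketch for an elastic net objective of the form (1/(2*N*))‖*y* − *Zb*‖² + *λ*(*a*‖*b*‖₁ + ((1 − *a*)/2)‖*b*‖²). The scaling constants in that objective, the function names, and the toy data are assumptions for illustration; the solver actually used by BETS may differ.

```python
import numpy as np

def soft_threshold(x, k):
    """Shrink x toward zero by k (the lasso proximal step)."""
    return np.sign(x) * max(abs(x) - k, 0.0)

def elastic_net_cd(Z, y, lam, a, n_iter=200):
    """Coordinate descent for the assumed objective
    (1/(2N))*||y - Z b||^2 + lam*(a*||b||_1 + (1-a)/2*||b||_2^2)."""
    N, P = Z.shape
    b = np.zeros(P)
    col_sq = (Z ** 2).sum(axis=0) / N             # (1/N) * z_j^T z_j
    resid = y.astype(float).copy()                # residual y - Z b for b = 0
    for _ in range(n_iter):
        for j in range(P):
            # Correlation of column j with the partial residual.
            rho = Z[:, j] @ (resid + Z[:, j] * b[j]) / N
            new_bj = soft_threshold(rho, lam * a) / (col_sq[j] + lam * (1 - a))
            resid += Z[:, j] * (b[j] - new_bj)    # keep residual in sync
            b[j] = new_bj
    return b

# Hyperparameter grids matching the elastic net ranges quoted above.
lambdas = [10.0 ** k for k in range(-4, 1)]       # 1e-4, ..., 1
mixes = [0.1, 0.3, 0.5, 0.7, 0.9]

# Toy sparse problem (hypothetical data).
rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 6))
true_b = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])
y = Z @ true_b + 0.1 * rng.normal(size=40)
b_hat = elastic_net_cd(Z, y, lam=0.1, a=0.9)
```

The ℓ_{1} term (weight *λa*) drives coefficients exactly to zero, which is what yields a sparse network, while the ℓ_{2} term (weight *λ*(1 − *a*)) stabilizes the fit when predictors are correlated.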

#### Hyperparameter tuning

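Hyperparameters were selected using leave-one-out cross-validation (LOOCV). The hyperparameter (or pair of hyperparameters, for elastic net) that minimizes the mean squared error on the held-out data points is selected. More specifically, we first fix a hyperparameter (*λ, a*). Then, for a given gene *g* and row index *i*, we extract the *i*-th row of the target vector $\mathbf{X}^{g}$ and of the predictor matrix $\mathbf{X}^{G}_{\mathrm{lag}}$ (Equation 10). Refer to this extracted validation set as $\mathbf{X}^{g}_{[i]}$ (target) and $\mathbf{X}^{G}_{\mathrm{lag},[i]}$ (predictors). The remaining data is the training set, $\mathbf{X}^{g}_{[-i]}$ (target) and $\mathbf{X}^{G}_{\mathrm{lag},[-i]}$ (predictors).

First, let $\hat{\boldsymbol{\beta}}^{g}_{[-i]}$ be the $\boldsymbol{\beta}^{g}$ that is fit from the training set.

We then compute prediction error on the validation set,

$$e^{g}_{i}(\lambda, a) = \left( \mathbf{X}^{g}_{[i]} - \mathbf{X}^{G}_{\mathrm{lag},[i]} \hat{\boldsymbol{\beta}}^{g}_{[-i]} \right)^{2}.$$

We repeat the fit and error for every row index *i* of $\mathbf{X}^{g}$ and for every gene *g*. The mean held-out cross-validation error for (*λ, a*) is:

$$\mathrm{CV}(\lambda, a) = \frac{1}{|G|\,N} \sum_{g \in G} \sum_{i=1}^{N} e^{g}_{i}(\lambda, a) \tag{13}$$

The (*λ, a*) that minimizes the error in Equation 13 is selected.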
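The selection loop can be sketched as below. To keep the example short and self-contained, a closed-form ridge fit stands in for the elastic net fit (a deliberate substitution, so only *λ* is tuned here); `ridge_fit`, `loocv_error`, `select_lambda`, and the toy data are hypothetical names and values.

```python
import numpy as np

def ridge_fit(Z, y, lam):
    """Closed-form ridge fit, used here only as a stand-in for elastic net."""
    P = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(P), Z.T @ y)

def loocv_error(Z, y, lam, fit=ridge_fit):
    """Mean squared error over held-out rows for one effect gene."""
    N = Z.shape[0]
    errors = []
    for i in range(N):
        keep = np.arange(N) != i                 # training rows
        b = fit(Z[keep], y[keep], lam)
        errors.append((y[i] - Z[i] @ b) ** 2)    # error on held-out row i
    return float(np.mean(errors))

def select_lambda(datasets, grid, fit=ridge_fit):
    """Pick the grid value minimizing LOOCV error averaged over all genes.

    datasets: list of (predictors, target) pairs, one per effect gene.
    """
    mean_err = [float(np.mean([loocv_error(Z, y, lam, fit)
                               for Z, y in datasets]))
                for lam in grid]
    return grid[int(np.argmin(mean_err))], mean_err

rng = np.random.default_rng(2)
datasets = []
for _ in range(3):                               # three toy effect genes
    Z = rng.normal(size=(20, 4))
    y = Z @ np.array([1.0, 0.0, -1.0, 0.0]) + 0.5 * rng.normal(size=20)
    datasets.append((Z, y))
grid = [10.0 ** k for k in range(-5, 7)]         # 1e-5, ..., 1e6
best_lam, errors = select_lambda(datasets, grid)
```

Averaging the held-out error over genes before taking the minimum means a single hyperparameter setting is shared by all effect genes, as the text describes.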

#### Permuted coefficients

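We evaluate the significance of any given edge *g′ → g* through permutation. In detail, we remove the time dependency between *g′* and *g* via permutations of individual gene temporal profiles over time.

We first generate a single permuted data set $\tilde{X}$. We independently shuffle the temporal profile of each gene *g* ∈ {1, …, |*G*|} across time (Figure 1A). This is done separately for distinct replicates.

We wish to model the hypothesis of no causal relations from any gene *g′* ∈ ¬*g* upon a given effect gene *g*. We use the unpermuted values of the effect gene and the permuted values of all other causal genes *g′* ∈ ¬*g*, as

$$\tilde{\mathbf{X}}^{G}_{\mathrm{lag}} = \begin{bmatrix} \mathbf{X}^{g}_{\mathrm{lag}} & \tilde{\mathbf{X}}^{\neg g}_{\mathrm{lag}} \end{bmatrix}$$

where $\mathbf{X}^{g}_{\mathrm{lag}}$ holds the *L* lagged vectors of the unpermuted effect gene and $\tilde{\mathbf{X}}^{\neg g}_{\mathrm{lag}}$ holds the lagged vectors of all other genes taken from $\tilde{X}$. The effect gene *g* remains unpermuted, as we do not consider self-regulatory loops.

Permutation-based causal coefficients $\tilde{\mathbf{B}}^{g}$ are then fit as

$$\mathbf{X}^{g} = \mathbf{X}^{g}_{\mathrm{lag}} \tilde{\mathbf{A}}^{g} + \tilde{\mathbf{X}}^{\neg g}_{\mathrm{lag}} \tilde{\mathbf{B}}^{g} + \epsilon_{t}.$$

We use these coefficients to perform FDR calibration.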
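A minimal sketch of the permutation step follows, assuming an expression array of shape (replicates, genes, time points). The helper names `permute_profiles` and `null_input_for` are hypothetical.

```python
import numpy as np

def permute_profiles(expr, rng):
    """Independently shuffle every gene temporal profile across time.

    expr: array of shape (R, G, T). Each (replicate, gene) profile gets
    its own permutation, which breaks time dependence between genes.
    """
    R, G, T = expr.shape
    out = np.empty_like(expr)
    for r in range(R):
        for g in range(G):
            out[r, g] = expr[r, g, rng.permutation(T)]
    return out

def null_input_for(expr, permuted, g):
    """Null data for effect gene g: g stays unpermuted, all others permuted."""
    mixed = permuted.copy()
    mixed[:, g, :] = expr[:, g, :]
    return mixed

rng = np.random.default_rng(3)
expr = rng.normal(size=(2, 4, 12))                 # toy data: R=2, G=4, T=12
permuted = permute_profiles(expr, rng)
mixed = null_input_for(expr, permuted, g=1)
```

Lagged predictors built from `mixed` then give the null fit for effect gene 1: its own history is intact, while every candidate causal gene has had its temporal ordering destroyed.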

#### Edge FDR

The result of the elastic net VAR model is a complete network whose edges are weighted according to the estimated regression coefficients.

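For each lag ℓ ∈ {1, …, *L*} and effect gene *g*, we control the edge FDR at ≤ 0.05 by finding the threshold $\tau^{g}_{\ell}$ such that

$$\frac{\left| \left\{ g' \in \neg g : |\tilde{B}^{g',g}_{\ell}| \ge \tau^{g}_{\ell} \right\} \right|}{\left| \left\{ g' \in \neg g : |\hat{B}^{g',g}_{\ell}| \ge \tau^{g}_{\ell} \right\} \right|} \le 0.05,$$

where $\hat{B}^{g',g}_{\ell}$ is the lag-ℓ coefficient of *g′* on *g* fit from the original data and $\tilde{B}^{g',g}_{\ell}$ is the corresponding permutation-based coefficient.

For each gene pair (*g′, g*), *g′* ∈ ¬*g*, a directed edge *g′→ g* exists if $|\hat{B}^{g',g}_{\ell}| \ge \tau^{g}_{\ell}$ for at least one of the lags ℓ ∈ {1, …, *L*}.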
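A permutation-calibrated threshold of this kind can be sketched as follows. The example assumes the empirical FDR at a cutoff is estimated as the number of permuted coefficients at or above the cutoff divided by the number of real coefficients at or above it; the function name `fdr_threshold` and the toy coefficients are hypothetical.

```python
import numpy as np

def fdr_threshold(real, null, fdr=0.05):
    """Smallest candidate cutoff t with #(|null| >= t) / #(|real| >= t) <= fdr.

    Candidates are the absolute real coefficients. Returns inf when no
    candidate satisfies the bound, in which case no edge is called.
    """
    real = np.abs(np.asarray(real, dtype=float))
    null = np.abs(np.asarray(null, dtype=float))
    for t in np.sort(real):                       # candidates, ascending
        n_real = np.sum(real >= t)                # at least 1 by construction
        n_null = np.sum(null >= t)
        if n_null / n_real <= fdr:
            return float(t)
    return float("inf")

# Toy coefficients for one effect gene and one lag (hypothetical values).
real = np.array([0.9, -0.8, 0.05, 0.02, 0.0])
null = np.array([0.1, -0.03, 0.0, 0.0, 0.0])
thr = fdr_threshold(real, null)
edges = np.abs(real) >= thr
print(thr, edges)   # 0.8 [ True  True False False False]
```

Taking the smallest passing cutoff keeps as many edges as the FDR bound allows; coefficients that the elastic net set to zero never pass, because a cutoff of zero admits every permuted coefficient as well.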

#### Stability selection

Stability selection is used to ensure the robustness of BETS to small sample size. Stability selection is a method for high-dimensional graph estimation that uses bootstrap samples [74]. While the authors prove finite sample control for the family-wise error rate (FWER), we are interested in controlling the false discovery rate (FDR).

First, we draw *B* = 1,000 bootstrap samples, where each sample consists of *N* = *R*(*T – L*) rows drawn with replacement from $\mathbf{X}^{G}_{\mathrm{lag}}$, the predictors, and $\mathbf{X}^{g}$, the target (Equation 10).

For each bootstrap sample, we infer a network using BETS. Each edge *g′ → g*’s selection frequency, *π*_{g′,g} (the frequency of *g′→ g* among the bootstrap networks), is computed (Figure 1B).
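The bootstrap loop can be sketched as below. The callable `edge_fn` is a hypothetical stand-in for the full fit-and-threshold step; here it is a toy rule based on ordinary least squares, chosen only so the example runs on its own.

```python
import numpy as np

def selection_frequency(edge_fn, Z, y, n_boot=1000, seed=0):
    """Fraction of bootstrap fits in which each predictor is selected.

    edge_fn(Z, y) -> boolean array over predictor columns (hypothetical
    callable standing in for the fit-and-threshold step).
    """
    rng = np.random.default_rng(seed)
    N = Z.shape[0]
    counts = np.zeros(Z.shape[1])
    for _ in range(n_boot):
        rows = rng.integers(0, N, size=N)         # N rows, with replacement
        counts += edge_fn(Z[rows], y[rows])       # same rows for Z and y
    return counts / n_boot

def toy_edge_fn(Z, y, cutoff=0.5):
    """Toy selection rule: least-squares coefficient above a fixed cutoff."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.abs(coef) >= cutoff

rng = np.random.default_rng(4)
Z = rng.normal(size=(40, 3))
y = 2.0 * Z[:, 0] + 0.5 * rng.normal(size=40)     # only column 0 is causal
freq = selection_frequency(toy_edge_fn, Z, y, n_boot=200)
```

Rows of the predictors and the target are resampled together, so each sampled target value keeps its own lagged predictors.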

#### Stability FDR

To determine the appropriate cutoff for the selection frequency of each edge (*π*_{g′,g}), we generate a null distribution of selection frequencies using permutations. First, we generate a second permuted data set in which we again independently shuffle the temporal profile of each gene *g* ∈ {1, …, |*G*|} across time. This is done separately for distinct replicates. We run the selection frequency procedure on this permuted data set to get the null selection frequency of each edge, $\tilde{\pi}_{g',g}$.

We control the stability FDR at 0.2 by finding the threshold *T*_{b} such that

$$\frac{\left| \left\{ (g', g) : \tilde{\pi}_{g',g} \ge T_{b} \right\} \right|}{\left| \left\{ (g', g) : \pi_{g',g} \ge T_{b} \right\} \right|} \le 0.2.$$

Because the maximum lag is 2, each edge *g′ → g* has two possible lags and thus two selection frequencies. The lag with the larger absolute value of the average coefficient across the 1,000 networks is considered in both the permuted and the real empirical distributions. So, if the absolute average lag-1 coefficient exceeds the absolute average lag-2 coefficient, the lag is said to be 1 and the lag-1 selection frequency is used.
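The lag choice for a single edge can be sketched as follows; `pick_lag` and the toy numbers are hypothetical, and ties are resolved in favor of the smaller lag here, which the text does not specify.

```python
import numpy as np

def pick_lag(coefs, freqs):
    """Choose the lag and selection frequency reported for one edge.

    coefs: array of shape (n_boot, L), the edge's coefficient per
           bootstrap network and lag.
    freqs: array of shape (L,), the edge's selection frequency per lag.
    Returns the 1-based lag whose mean coefficient has the largest
    absolute value, together with that lag's selection frequency.
    """
    mean_abs = np.abs(np.mean(coefs, axis=0))
    best = int(np.argmax(mean_abs))               # first maximum on ties
    return best + 1, float(freqs[best])

# Two bootstrap networks, L = 2 (hypothetical values).
coefs = np.array([[0.4, -0.1],
                  [0.6,  0.1]])
freqs = np.array([0.8, 0.3])
print(pick_lag(coefs, freqs))                     # (1, 0.8)
```

The same rule is applied to the permuted data, so that real and null selection frequencies are compared on equal terms when the threshold is chosen.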

#### Network inference performance metrics

Refer to every network edge inferred by a method as a positive and every missing edge as a negative. Let *TP* be True Positives, *FP* be False Positives, *TN* be True Negatives, and *FN* be False Negatives. Let *TPR* be True Positive Rate (i.e., recall), and *FPR* be False Positive Rate. Then, we have

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}.$$

In the DREAM benchmark, each network inference method is evaluated by comparing the true network (i.e., the network used to generate the synthetic data) with the inferred network at different thresholds for edge inclusion. The two main evaluation metrics are Area Under the Receiver Operating Characteristic curve (AUROC) and Area Under the Precision-Recall curve (AUPR). The ROC curve plots TPR on the *y* axis and FPR on the *x* axis. The precision-recall curve plots precision on the *y* axis and recall on the *x* axis. When the number of negatives greatly exceeds the number of positives, as with gene networks, which are typically sparse, AUPR is a more relevant metric [75].
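These metrics can be computed on a toy ranking as sketched below. AUROC is obtained from its rank interpretation (the probability that a true edge outscores a non-edge), and AUPR is approximated by average precision, a common step-wise estimate; both choices, and all names and values, are assumptions for illustration and need not match the DREAM evaluation scripts.

```python
import numpy as np

def confusion_rates(truth, pred):
    """TPR (recall), FPR and precision for one set of called edges."""
    truth, pred = np.asarray(truth, bool), np.asarray(pred, bool)
    tp = np.sum(truth & pred)
    fp = np.sum(~truth & pred)
    fn = np.sum(truth & ~pred)
    tn = np.sum(~truth & ~pred)
    return {"TPR": tp / (tp + fn), "FPR": fp / (fp + tn),
            "precision": tp / (tp + fp)}

def auroc(truth, scores):
    """P(score of a true edge > score of a non-edge), ties counted as 1/2."""
    truth, scores = np.asarray(truth, bool), np.asarray(scores, float)
    diff = scores[truth][:, None] - scores[~truth][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def average_precision(truth, scores):
    """Mean of the precision values at the rank of each true edge."""
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    hits = np.asarray(truth, bool)[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())

truth = [1, 0, 1, 0, 0]                           # toy ground-truth edges
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.1])      # toy edge scores
rates = confusion_rates(truth, scores >= 0.75)    # TPR 0.5, FPR 1/3, precision 0.5
roc_area = auroc(truth, scores)                   # 5/6
pr_area = average_precision(truth, scores)        # 5/6
```

For BETS the scores would be selection frequencies, and for Enet the absolute coefficients, since those are the quantities used to rank edges.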

### 5.2 Software

BETS is available for download on GitHub at https://github.com/lujonathanh/BETS. The software is licensed under the terms of the Apache License, version 2.0.

### 5.3 Data sets and Processing

#### DREAM Network Inference Challenge

There were 5 data sets in the DREAM4 Network Inference Challenge, each consisting of 10 time series of 21 time points and 100 genes [41, 76]. For the first half of the time series, a “drug perturbation” was applied; this affected about 1/3 of genes. For the second half, the perturbation was removed and the system was allowed to relax back to the wild-type state.

#### Glucocorticoid gene expression data

We analyzed RNA sequencing data from a set of experiments developed to study glucocorticoid receptors (GRs) in the human adenocarcinoma and lung model cell line, A549 [6]. There was an *original exposure* data set of 4 replicates in which cells were stimulated by the glucocorticoid dexamethasone (dex), and gene expression was profiled at {0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12} hours of dex stimulation. There was also an *unperturbed* data set of 3 replicates in which cells were exposed to dex for 12 hours, after which the conditioned media was replaced and dex removed. Gene expression was profiled at {0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12} hours after dex removal. We integrated the *original exposure* and *unperturbed* data into a joint data set with 7 replicates.

We selected 2,768 genes for analysis, which had average expression >2 TPM and were differentially expressed in the *original exposure* data. A gene was called differentially expressed if its expression at any time point differed from its expression at time 0, ascertained by running edgeR (FDR ≤ 0.05) [6]. We added *NR3C1*, which encodes the glucocorticoid receptor (GR). *NR3C1* was not found to be differentially expressed at FDR ≤ 0.05.

After genes were selected, gene expression Transcripts Per kilobase Million (TPM) were log-normalized and corrected for surrogate variables using SVAseq [77]. Each gene’s temporal profile was centered to have mean zero across time. In the *original exposure data*, all replicates besides replicate 1 had a measurement for each time point. Replicate 1 was missing time points 5 and 6 hrs, so we imputed these values using a linear interpolation from time points 4 and 7 hrs in the log-transformed, surrogate-corrected space.
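The imputation of the two missing time points can be sketched with `numpy.interp`, which interpolates linearly between the nearest observed time points (here 4 and 7 hrs). The profile values below are hypothetical and stand for already log-transformed, surrogate-corrected expression.

```python
import numpy as np

# Sampling times of the original exposure data, in hours.
hours = np.array([0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12])
missing = np.isin(hours, [5, 6])                  # time points absent in replicate 1

def impute_linear(profile, hours, missing):
    """Fill missing entries of one profile by linear interpolation in time."""
    filled = np.array(profile, dtype=float)
    filled[missing] = np.interp(hours[missing], hours[~missing],
                                filled[~missing])
    return filled

# Hypothetical profile with NaN at the missing time points.
profile = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0,
                    np.nan, np.nan, 4.0, 4.2, 4.4, 4.6])
filled = impute_linear(profile, hours, missing)
print(filled[6], filled[7])                       # 2.0 3.0
```

Interpolating against the actual sampling times, rather than array positions, respects the uneven spacing of the time points.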

#### Overexpression transcriptional time series data

There were 10 *overexpression* data sets, in which each of the transcription factors *CEBPB*, *CEBPD*, *FOSL2*, *FOXO1*, *FOXO3*, *KLF6*, *KLF9*, *KLF15*, *POU5F1*, and *TFCP2L1* was separately overexpressed across 12 hours of dex stimulation. Each overexpression data set had three replicates; gene expression was profiled after {0, 1, 4, 8, 12} hours of dex stimulation. The same 2,768 genes were selected and the same normalization and SVAseq correction as earlier was performed.

### 5.4 Application of methods to the data

#### DREAM benchmarking

We ran the methods BETS, Enet, CSId [40], Jump3 [32], CLR [23], MRNET [24], and ARACNE [25] on the DREAM challenge. In BETS, inferred edges were ranked by their selection frequency for calculating AUPR and AUROC. In Enet, edges were ranked by the absolute value of their coefficient. The Python3 version of CSId was run after obtaining it from correspondence with Dr. Penfold.

Jump3 required setting the “systematic noise” and “observational noise” parameters. We used Dr. Huynh-Thu’s settings on the DREAM challenge, with systematic noise at 10^{−4} and observational noise at 0.01 times the value of the gene’s expression. ARACNE, MRNET, and CLR were run using the minet R library. BETS, Enet, CSId, and Jump3 were run on a single node without parallelization. The node had 28 cores, 128 GB of memory, and 2.4 GHz processor speed. ARACNE, MRNET, and CLR were run on a 4 GB RAM, Intel Core i5 1.3 GHz laptop.

#### Network analysis: Gene annotations

We considered genes with three possible labels: immune system, metabolism, or transcription factor. Immune genes were labeled as such using two sources. The first source is the Gene Ontology (GO) annotation “Immune” (*GO:0002376*) [47]. We applied this label when the evidence codes were EXP, IDA, IGI, IMP, IPI, IC, TAS. The second source is the Gene Ontology Consortium’s curated, ranked list of immune-related genes based on multiple databases and experimental evidence [49]. For the GO annotation, we selected all genes with score ≥7. This resulted in 616 immune genes overall, and 109 immune genes in our list of 2,768 genes.

Metabolic genes were called using two sources. The first source is the GO annotation “carbohydrate metabolic process” *GO:0005975* [47]. We applied this label when the evidence codes were EXP, IDA, IGI, IMP, IPI, IC, TAS. The second source is the Gene Set Enrichment Analysis (GSEA)-curated list of metabolic-related genes [48]. We searched only among those with experimental evidence: the Canonical, KEGG, BIOCARTA, and Reactome pathways. We used the following four search queries: “gluconeogenesis OR (glucose AND metabolism) OR glycolysis,” “lipid AND metabolism,” “Diabetes,” “Obesity.” This resulted in 544 metabolic genes overall, of which 120 were in our gene list. 65 genes were both immune and metabolic overall; 12 of these were in our gene list.

Transcription factors (TFs) were called using the Bioguo database of human TFs [50]. There were 1463 TFs overall, of which 226 were present in our gene list.

#### Experimental interactions

We created a list of experimentally validated interactions from the BIOGRID Homo sapiens Protein-Protein Interactions database [78] and the STRING database [79]. Proteins were mapped to genes using BioMart from Ensembl 94 [80]. Among genes in our gene list, there were 17,990 BIOGRID interactions and 13,148 STRING interactions.

#### Validation on overexpression data

The overexpression data had four time points with 1 to 4 hour time gaps, unlike the original 12 time points with 0.5 to 2 hour time gaps. On the overexpression data, we used a VAR model that regressed each effect gene’s expression level on its previous expression level and the causal gene’s previous expression level, assuming normal noise ϵ_{t} ∼ 𝒩(0, 1):

$$X^{g}_{t} = A^{g} X^{g}_{t-1} + B^{g',g} X^{g'}_{t-1} + \epsilon_{t}$$

where $X^{g}_{t}$ is the expression of gene *g* at time point *t*, and $A^{g}$ and $B^{g',g}$ are scalar coefficients. No regularization was included, and ordinary least squares was used to fit the equation. Each causal gene *g*′ is fit as a single predictor, without the expression of the other genes. Lag 1, not 2, is used due to the larger time gaps.
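This pairwise fit can be sketched with `numpy.linalg.lstsq`. The example assumes centered profiles (so no intercept is included) and arrays of shape (replicates, time points); `pairwise_var_ols` and the noise-free toy data are hypothetical.

```python
import numpy as np

def pairwise_var_ols(effect, cause):
    """OLS fit of effect[t] on effect[t-1] and cause[t-1].

    effect, cause: arrays of shape (R, T) for one gene pair.
    Returns (autoregressive coefficient, causal coefficient).
    """
    effect, cause = np.asarray(effect, float), np.asarray(cause, float)
    y = effect[:, 1:].ravel()                     # current values
    Z = np.column_stack([effect[:, :-1].ravel(),  # effect gene, lag 1
                         cause[:, :-1].ravel()])  # causal gene, lag 1
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(coef[0]), float(coef[1])

# Noise-free toy data generated with known coefficients 0.5 and 2.0.
rng = np.random.default_rng(5)
cause = rng.normal(size=(3, 5))                   # 3 replicates, 5 time points
effect = np.zeros((3, 5))
effect[:, 0] = rng.normal(size=3)
for t in range(1, 5):
    effect[:, t] = 0.5 * effect[:, t - 1] + 2.0 * cause[:, t - 1]
a_hat, b_hat = pairwise_var_ols(effect, cause)
```

With only two predictors per fit, ordinary least squares is well determined even with the few samples available in each overexpression data set, which is why no regularization is needed here.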

#### Validation on lung trans-eQTLs in GTEx v6

Trans-eQTLs were discovered using the Genotype Tissue Expression (GTEx) v6 data [10, 57]. First, we mapped our genes from hg38 to hg19. For every edge *g*′→*g*, we tested the set of single nucleotide polymorphisms (SNPs) within 20 kilobases of *g*′ for trans-eQTL association with *g* [81]. Specifically, we computed the p-value for linear association of each SNP with the corresponding effect gene *g* using MatrixEQTL [82]. A null distribution was generated by taking every edge *g*′→*g*, permuting the effect gene *g*’s expression values, and repeating the linear association test. FDR over test statistics was calculated using q-value [83]. Because not every causal gene *g*′ had a cis-eQTL, only 26,839 edges (84% of the original 31,945 edges) were tested.

## 6. Supplemental Information

### 6.1 Network inference methods

Several methods have been developed to estimate directed graphs of genes from transcriptional time series data (Figure S1). Broadly, these methods estimate directed networks in which the directed edges between nodes—representing genes—indicate a cause-effect relationship between those genes, such that perturbing the expression levels of the causal gene would lead to changes in expression of the effect gene [20].

Let *G* be the set of all genes and *g* be a single gene. Let ¬*g* be *G* with *g* removed. Let there be *T* time points total, and let *t* be a single time point ranging over {1, 2, …, *T*}. Let $X^{g}_{t}$ be the expression of gene *g* at time *t*. Let ϵ_{t} be the residual noise at time *t*. Let : denote a range of values; for example, $X^{g}_{1:T}$ would denote all the values $X^{g}_{1}$ through $X^{g}_{T}$. Let $\mathrm{Pa}(X^{g}_{t})$ refer to the causal parents of gene *g* at time *t* in dynamic Bayesian networks. For example, $\mathrm{Pa}(X^{g}_{t})$ may include $X^{g'}_{t-1}$. Let *g*′ be the gene we are testing to be causal for gene *g*. Let ℓ be the time lag of the causal interaction. We are testing the existence of the edge *g*′ → *g* at lag ℓ.

#### 6.1.1 Mutual information

Mutual information (MI) methods assess the MI between the expression of *g*′ at the ℓ-th previous time point and the expression of *g* at the current time point (Figure S1A) [21, 22, 23, 24, 25]:

$$I^{\ell}(g', g) = I\left( X^{g'}_{t-\ell};\, X^{g}_{t} \right)$$

where $X^{g}_{t}$ denotes the expression of gene *g* at time *t*. A causal edge *g*′→*g* is included if *I*^{ℓ}(*g*′, *g*) exceeds a threshold. MI methods have the advantage of being simple and fast. However, they do not give insight into the sign of two genes’ relationship (i.e., activation or repression) because MI is an unsigned metric [24, 26].
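A lagged MI score can be estimated in several ways; the sketch below uses a simple histogram (plug-in) estimator over pooled time points, which is an assumption for illustration and not necessarily what the cited methods use. The function name, bin count, and toy series are hypothetical.

```python
import numpy as np

def lagged_mutual_information(cause, effect, lag=1, bins=4):
    """Histogram (plug-in) estimate of I(cause[t-lag]; effect[t]) in nats.

    cause, effect: 1-D series of equal length; lag must be >= 1.
    """
    x = np.asarray(cause, float)[:-lag]           # values at time t - lag
    y = np.asarray(effect, float)[lag:]           # values at time t
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                     # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)           # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)           # marginal of y, shape (1, bins)
    nz = pxy > 0                                  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(6)
cause = rng.normal(size=500)
effect = np.roll(cause, 1)                        # effect[t] = cause[t-1]
unrelated = rng.normal(size=500)
mi_dep = lagged_mutual_information(cause, effect, lag=1)
mi_indep = lagged_mutual_information(cause, unrelated, lag=1)
```

The score is non-negative regardless of whether the dependence is activating or repressing, which illustrates why MI cannot recover the sign of an edge.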

#### 6.1.2 Granger causality

Granger causality methods determine if including the expression of *g*′ at the previous time point improves our ability to predict the expression of *g* at the current time point above using the expression of *g* at the previous time point [27]. A common way to implement a Granger causality approach uses a vector autoregression (VAR) model, which usually assumes a linear relationship between all genes’ previous expression and *g*’s current expression (Figure S1B) [84]:

$$X^{g}_{t} = \sum_{g' \in G} \sum_{\ell=1}^{L} \beta^{g',g}_{\ell} X^{g'}_{t-\ell} + \epsilon_{t}$$

where $X^{g}_{t}$ is the expression of gene *g* at time *t* and *L* is the maximum lag. A causal edge *g*′→*g* is included in the network if $\beta^{g',g}_{\ell}$ is significantly different from 0 for some ℓ. While older VAR analyses did not fit the causal predictors simultaneously [84, 85, 86], newer analyses fit them simultaneously, using regularization techniques such as lasso [13, 87] or ridge regression [88] to handle the high dimensionality of genome-wide sequencing assays. Nonlinear, kernel-valued functions have also been used to implement ideas in Granger causality [89].

#### 6.1.3 Ordinary differential equations

Ordinary differential equations (ODEs) fit the derivative of the expression of *g* as a function of all genes’ expression at a single time point (Figure S1C) [11, 28, 29]:

$$\frac{dX^{g}_{t}}{dt} = \sum_{g' \in G} \beta^{g',g} X^{g'}_{t} + \epsilon_{t}$$

where $X^{g}_{t}$ is the expression of gene *g* at time *t*. Although complex dynamics are often nonlinear, ODE methods typically assume linearity, as small sample sizes make it challenging to infer the parameters of nonlinear functions. A causal edge *g*′→*g* is included in the network if *g*′ has a significant coefficient in the ODE.

These methods are often combined with additional methods such as spline interpolation and piecewise linear functions to improve performance [28, 29].

#### 6.1.4 Decision trees

Decision trees (DT) are a type of nonparametric function based on partitioning the data [30, 31]. DT methods fall either under VAR or ODE methods. Either the DTs fit the expression of *g* at the current time as a function of all genes’ expression at the previous time point (VAR), or they fit the derivative of the expression of *g* as a function of all genes’ expression at a single time point (ODE) (Figure S1D) [32, 33]:

$$X^{g}_{t} = f^{g}_{\mathrm{DT}}\left( X^{G}_{t-1} \right) + \epsilon_{t} \quad \text{(VAR)}, \qquad \frac{dX^{g}_{t}}{dt} = f^{g}_{\mathrm{DT}}\left( X^{G}_{t} \right) + \epsilon_{t} \quad \text{(ODE)}$$

where $X^{G}_{t}$ denotes the expression of all genes at time *t* and $f^{g}_{\mathrm{DT}}$ is a tree-based function. A causal edge *g*′→*g* is included in the network when an importance score for *g*′ (typically, the reduction in variance of *g* from including *g*′ as a predictor) exceeds some threshold. One limitation of DT methods is that they only produce a ranking of edges, without specifying the sign of the relationship between the genes [32, 33].

#### 6.1.5 Dynamic Bayesian networks

Dynamic Bayesian networks (DBNs) search the space of possible directed acyclic graphs between previous and current expression levels and identify the network structure with the highest posterior probability of each edge given the data (Figure S1E) [34, 35, 36, 37, 38]. DBNs typically assume a linear relationship between previous expression values and current expression values. A causal edge *g’→g* is included in the network when its marginal posterior probability of existence exceeds some threshold. The joint probability of all the genes’ expression across the time points 1 : *T* factorizes as:

$$P\left( X^{G}_{1:T} \right) = \prod_{t=1}^{T} \prod_{g \in G} P\left( X^{g}_{t} \mid \mathrm{Pa}(X^{g}_{t}) \right)$$

where $X^{g}_{t}$ is the expression of gene *g* at time *t* and $\mathrm{Pa}(X^{g}_{t})$ is its set of causal parents. Each gene’s expression has a linear relationship with its parents:

$$X^{g}_{t} = \sum_{X^{g'}_{t-\ell} \in \mathrm{Pa}(X^{g}_{t})} \beta^{g',g}_{\ell} X^{g'}_{t-\ell} + \epsilon_{t}$$

While DBNs have been shown to be effective on smaller data sets [90], they scale poorly due to the super-exponential growth of possible causal graph structures [56, 88]. Even after limiting the number of possible parents per gene to two, this results in cubic scaling of the search space. One exception is ScanBMA, which uses a pruning method based on Occam’s window to limit the search space and gain a speedup [36].

#### 6.1.6 Gaussian process

The Gaussian process (GP) is a distribution over continuous, nonlinear functions. GPs are often used in the context of nonlinear DBNs, where GP regression is used to model a nonlinear relationship between previous expression levels and current expression levels (Figure S1F) [39, 40]. A causal edge *g’→g* is included in the network based on its posterior probability of existence, i.e., the sum of the posterior probabilities of those networks that contain the edge. Each gene’s expression has a nonlinear relationship with its parents:

$$X^{g}_{t} = f^{g}\left( \mathrm{Pa}(X^{g}_{t}) \right) + \epsilon_{t}, \qquad f^{g} \sim \mathcal{GP}$$

where $X^{g}_{t}$ is the expression of gene *g* at time *t* and $\mathrm{Pa}(X^{g}_{t})$ is its set of causal parents. By allowing nonlinear relationships between genes, GPs have proven highly effective. However, like DBNs, they perform a search over causal graphs, and therefore suffer from the same scalability issues [39, 40].

### 6.2 Validation of inferred network on overexpression data

Our analyses regressed network edge signs as predictors against the VAR model edge coefficients from the overexpression data as response (Figure 5). We sought to assess the strength of these associations across the 10 data sets, compared against shuffled edges. We compared the effect sizes of all 10 regressions of positive edge one-hot encodings on the overexpression coefficients with the effect sizes estimated similarly after shuffling the edge labels; we did the same for negative edges. At FDR *≤* 0.2, there was a substantial enrichment of effect sizes of positive edges among the original network (Common Language Effect Size (CLES) = 0.93, two-sided Mann-Whitney U-test (MWU) adjusted *p ≤* 0.0026); there was no enrichment of effect sizes for negative edges in the original network (CLES = 0.55, two-sided MWU adjusted *p ≤* 0.73). Thus, the positive edges inferred by BETS validate on the overexpression data, but the negative edges do not, indicating repressive effects may have inconsistent signs.
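For readers unfamiliar with the Common Language Effect Size, the sketch below computes it as the probability that a value from one group exceeds a value from the other, counting ties as one half; this equals the Mann-Whitney U statistic divided by the number of pairs. How CLES was computed in the analysis above is not stated, so this definition, the function name, and the toy values are assumptions.

```python
import numpy as np

def common_language_effect_size(x, y):
    """P(X > Y) + 0.5 * P(X = Y) over all pairs; equals U / (n_x * n_y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diff = x[:, None] - y[None, :]                # all pairwise differences
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# Hypothetical effect sizes: original network vs. shuffled edge labels.
original = [3.0, 4.0, 5.0]
shuffled = [1.0, 2.0, 3.5]
cles = common_language_effect_size(original, shuffled)   # 8 of 9 pairs, about 0.89
```

A CLES of 0.5 indicates no separation between the two groups, which is why the value of 0.55 for negative edges reads as no enrichment while 0.93 for positive edges reads as a strong one.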

### 6.3 Supplementary Tables

## Acknowledgments

The authors would like to thank Gregory Darnell, Derek Aguiar, Ariel Gewirtz, Allison Chaney, Isabella Grabski, Cristina Anastase, and Genna Gliner for helpful discussion, feedback, and generosity in running cluster jobs; and Jian Peng for productive discussion and helpful comments.

The authors gratefully acknowledge that this work was performed using the Princeton Research Computing resources sponsored by the Princeton Institute for Computational Science and Engineering (PICSciE) at Princeton University.

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].
- [93].
- [94].
- [95].
- [96].
- [97].
- [98].
- [99].
- [100].
- [101].
- [102].
- [103].
- [104].
- [105].
- [106].
- [107].