## Abstract

To identify causation, model-free inference methods, such as Granger Causality, have been widely used due to their flexibility. However, they have difficulty distinguishing synchrony and indirect effects from direct causation, leading to false positive predictions. To overcome this, model-based inference methods were developed that infer causality by testing whether a specific mechanistic model can reproduce the data. However, they can only be applied to systems described by that specific model, greatly limiting their applicability. Here, we address this limitation by deriving an easily testable condition for a general ODE model to reproduce time-series data. We built a user-friendly computational package, GOBI (General ODE-Based Inference), which is applicable to nearly any system described by ODEs. Unlike existing model-free methods, GOBI successfully inferred positive and negative regulations in various networks at both the molecular and population levels. Thus, this accurate and broadly applicable inference method is a powerful tool for understanding complex dynamical systems.

## I. INTRODUCTION

Identifying causal interactions is crucial for understanding the mechanisms underlying natural systems. A recent surge in time-series data collection with advanced technology offers opportunities to uncover causation computationally [1]. Various model-free methods, such as Granger Causality (GC) [2] and Convergent Cross Mapping (CCM) [3], have been widely used to infer causation from time-series data. Although they are easy to implement and broadly applicable [4–10], they usually struggle to distinguish synchrony (i.e., similar periods among components) from causality [11–15] and direct from indirect causation [16–20]. For instance, when oscillatory time-series data are given, nearly all-to-all connected networks are inferred [12]. To prevent such false positive predictions, model-free methods have been improved (e.g., Partial Cross Mapping (PCM) [20]), but further investigation is needed to establish their general validity.

Alternatively, model-based methods infer causality by testing whether time-series data can be reproduced by mechanistic models. Although testing reproducibility is computationally expensive, model-based inference is accurate even in the presence of synchrony and indirect effects, as long as the underlying model is accurate [21–27]. However, the inference results strongly depend on the choice of model, and imposing an inaccurate model can result in false positive predictions, limiting their applicability. To overcome this limit, inference methods using flexible models were developed [28–34]. In particular, the most recent method, ION [12], infers causation from *X* to *Y* described by a general ODE model between two components, i.e., $\dot{Y} = f(X, Y)$. However, ION is applicable only when every component is affected by at most one other component.

Here, we develop a model-based method that infers interactions among multiple components described by the general ODE model:

$$\frac{dY}{dt} = f(X_1, X_2, \ldots, X_N, Y), \tag{1}$$

where *f* can be any smooth function that is monotonically increasing or decreasing in each *X*_{i} and in *Y*. That is, Eq. (1) describes the regulation from **X** = (*X*_{1}, ⋯, *X*_{N}) to *Y* in the presence of self-regulation. Thus, our approach completely resolves the fundamental limit of model-based inference: strong dependence on a chosen model. Furthermore, we derive a simple condition for the reproducibility of time series with Eq. (1), which, unlike previous model-based approaches, does not require computationally expensive fitting. To facilitate our approach, we develop a user-friendly computational package, GOBI (General ODE-Based Inference). GOBI successfully infers causal relationships in gene regulatory networks, ecological systems, and the link between air pollution and cardiovascular disease from synchronous time-series data, for which popular model-free methods fail. Furthermore, GOBI can distinguish between direct and indirect causation even from noisy time-series data. Because GOBI is both accurate and broadly applicable, a combination not achieved by previous model-free or model-based inference methods, it can be a powerful tool for understanding complex dynamical systems.

## II. RESULTS

### A. Inferring regulation types from time-series

We first illustrate, with simple examples, the common properties of time series generated by either positive or negative causation. When the input signal *X* positively regulates *Y* (*X* → *Y*) (Fig. 1a), the rate of change $\dot{Y}$ increases whenever *X* increases. Thus, for any pair of time points *t* and *t** such that $X^d(t,t^*) := X(t) - X(t^*) > 0$, we have $\dot{Y}^d(t,t^*) := \dot{Y}(t) - \dot{Y}(t^*) > 0$. Similarly, when *X* negatively regulates *Y* (*X* ⊣ *Y*) (Fig. 1c left), if $X^d(t,t^*) < 0$, then $\dot{Y}^d(t,t^*) > 0$. Thus, in the presence of either positive (*σ* = +) or negative (*σ* = –) regulation, the following *regulation-detection function* is always positive (Fig. 1b and c):

$$I^{\sigma}_{X}(t,t^*) := \dot{Y}^d(t,t^*),$$

defined on (*t,t**) such that $\sigma X^d(t,t^*) > 0$.
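To make the positivity argument concrete, the following minimal Python sketch (not GOBI's MATLAB code) evaluates the 1D regulation-detection function on a sampled time grid. It assumes that the regulation-detection function is $\dot{Y}(t) - \dot{Y}(t^*)$ restricted to pairs with $\sigma X^d(t,t^*) > 0$, and it approximates the normalized integral that defines the score by a ratio of sums over grid pairs. The input signal `x` and the increasing function $f(X) = \tanh(2X)$ are illustrative assumptions.

```python
import numpy as np

def rd_score_1d(x, dy, sign):
    """Regulation-detection score for a 1D regulation of type `sign` (+1 or -1).

    x  : samples of the cause X(t) on a time grid
    dy : samples of dY/dt on the same grid
    Assumes I(t, t*) = dY(t) - dY(t*) on the region sign * (X(t) - X(t*)) > 0,
    and S = sum(I) / sum(|I|) as a discrete stand-in for the normalized integral.
    """
    xd = x[:, None] - x[None, :]      # X^d(t_i, t_j) for every pair of grid points
    dyd = dy[:, None] - dy[None, :]   # dY^d(t_i, t_j)
    region = sign * xd > 0            # regulation-detection region
    vals = dyd[region]
    denom = np.abs(vals).sum()
    if denom == 0:
        return np.nan                 # empty region or constant dY: score undefined
    return vals.sum() / denom

t = np.linspace(0, 4 * np.pi, 200)
x = np.sin(t) + 0.5 * np.sin(3.1 * t)  # input signal X
dy = np.tanh(2 * x)                    # dY/dt = f(X) with f increasing, i.e. X -> Y

s_plus = rd_score_1d(x, dy, +1)        # true regulation type
s_minus = rd_score_1d(x, dy, -1)       # wrong regulation type
print(s_plus, s_minus)
```

Because *f* is increasing, every pair in the *σ* = + region contributes a nonnegative value, so the score is one; restricting to *σ* = – instead flips every sign, and the score becomes –1.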

This idea can be extended to cases with multiple causes. For instance, when *X*_{1} and *X*_{2} positively regulate *Y* together (Fig. 1d), if both $X_1^d(t,t^*) > 0$ and $X_2^d(t,t^*) > 0$, then $\dot{Y}^d(t,t^*) > 0$. This leads to the positivity of the regulation-detection function for *σ* = (+,+), $I^{++}_{X_1X_2}(t,t^*) := \dot{Y}^d(t,t^*)$, defined for (*t,t**) such that $X_1^d(t,t^*) > 0$ and $X_2^d(t,t^*) > 0$ (Fig. 1e). Similarly, if *X*_{1} and *X*_{2} positively and negatively regulate *Y*, respectively (Fig. 1g), the regulation-detection function for *σ* = (+,–), $I^{+-}_{X_1X_2}$, is positive for (*t,t**) such that $X_1^d(t,t^*) > 0$ and $X_2^d(t,t^*) < 0$ (Fig. 1i). Note that the regulation-detection function of a regulation type that differs from the true one is not always positive (Fig. 1f, h). Thus, the failure of the regulation-detection function to be positive can be used to infer the absence of that regulation type. The same positivity holds for the other types of 2D regulation (Supplementary Fig. 1).

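The positivity of the regulation-detection function thus reflects the presence of a regulation, and its failure to be positive reflects the absence of that regulation. The sign of $I^{\sigma}_{\mathbf X}$ can be quantified with its normalized integral, the *regulation-detection score* $S^{\sigma}_{\mathbf X}$ (Eq. (4)). Thus, $S^{\sigma}_{\mathbf X} = 1$ in the presence of regulation type *σ*, since the regulation-detection function is positive (see Supplementary Information for details). However, even in the absence of regulation type *σ*, $S^{\sigma}_{\mathbf X}$ can often be one. For instance, when *X*_{1} positively regulates *Y* and *X*_{2} does not regulate *Y* (Fig. 1j), $\dot{Y}$ increases whenever *X*_{1} increases, regardless of *X*_{2}. Thus, both $I^{++}_{X_1X_2}$ and $I^{+-}_{X_1X_2}$ are positive (Fig. 1k and l), and $S^{++}_{X_1X_2} = S^{+-}_{X_1X_2} = 1$. Here, the equality $S^{++}_{X_1X_2} = S^{+-}_{X_1X_2}$ reflects that *X*_{2} does not affect the regulation *X*_{1} → *Y*. Thus, to quantify the effect of a new component (e.g., *X*_{2}) on an existing regulation (e.g., *X*_{1} → *Y*), we develop a *regulation-delta function* Δ:

$$\Delta^{\sigma}_{\mathbf X}(X_{new}) := S^{\sigma+}_{\mathbf X X_{new}} - S^{\sigma-}_{\mathbf X X_{new}},$$

where $S^{\sigma\pm}_{\mathbf X X_{new}}$ is the regulation-detection score when *X*_{new} is added positively (+) or negatively (–) to the regulation type *σ* from **X**. In the example above, $\Delta^{+}_{X_1}(X_2) = S^{++}_{X_1X_2} - S^{+-}_{X_1X_2} = 0$.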
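The example of Fig. 1j-l can be reproduced numerically. The sketch below is an illustrative Python toy, not the authors' implementation. It assumes that the regulation-detection function is $\dot{Y}(t) - \dot{Y}(t^*)$ on the region where every $\sigma_i X_i^d > 0$, and that the score is a ratio of sums over grid pairs. The signals and the forms of $\dot{Y}$ are hypothetical choices. It compares an irrelevant second input with one that genuinely regulates *Y*.

```python
import numpy as np

def rd_score(causes, signs, dy):
    """Regulation-detection score S for causes X_1..X_N with sign pattern `signs`."""
    region = np.ones((len(dy), len(dy)), dtype=bool)
    for x, s in zip(causes, signs):
        region &= s * (x[:, None] - x[None, :]) > 0
    vals = (dy[:, None] - dy[None, :])[region]
    denom = np.abs(vals).sum()
    return np.nan if denom == 0 else vals.sum() / denom

def delta(existing, existing_signs, new, dy):
    """Regulation-delta function: S with `new` added positively minus S with it added negatively."""
    s_pos = rd_score(existing + [new], existing_signs + [+1], dy)
    s_neg = rd_score(existing + [new], existing_signs + [-1], dy)
    return s_pos - s_neg

t = np.linspace(0, 4 * np.pi, 200)
x1, x2 = np.sin(t), np.cos(1.7 * t)

dy_1d = np.tanh(2 * x1)                    # X1 -> Y only; X2 plays no role
dy_2d = np.tanh(2 * x1) + np.tanh(2 * x2)  # X1 -> Y and X2 -> Y

print(delta([x1], [+1], x2, dy_1d))  # about 0: X2 adds nothing to X1 -> Y
print(delta([x1], [+1], x2, dy_2d))  # positive: X2 adds a positive regulation
```

In the first case both 2D scores equal one, so Δ vanishes even though the 2D scores alone would suggest a 2D regulation. In the second case only the (+,+) score stays at one, so Δ is positive.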

If $\Delta^{\sigma}_{\mathbf X}(X_{new}) = 0$, then $S^{\sigma+}_{\mathbf X X_{new}} = 1$ (or $S^{\sigma-}_{\mathbf X X_{new}} = 1$) does not indicate the presence of the corresponding higher-dimensional regulation. Thus, $S^{\sigma}_{\mathbf X} = 1$ together with Δ ≠ 0 can be used as an indicator of regulation type *σ* from **X** to *Y*.

### B. Inferring regulatory network structure

Based on this, we construct a framework for inferring a regulatory network from time-series data (Fig. 2a). To illustrate this, we obtain multiple time series simulated with a random input signal *A* and different initial conditions of *B* and *C* randomly selected from [–1,1].

From each time series, the regulation-detection score $S^{\sigma}_{\mathbf X}$ is calculated for every 1D regulation type *σ* (Step 1). Here, for each regulation, **X** are the causes and *Y* is the target among *A*, *B* and *C*. Because only *A* ⊣ *B* satisfies the criterion $S^{\sigma}_{\mathbf X} = 1$ for every time series, only *A* ⊣ *B* is inferred as a 1D regulation. Note that even for the other regulations, $S^{\sigma}_{\mathbf X} = 1$ can occur for a few time series, which would lead to a false positive prediction if only a single time series were used. Using multiple time series prevents this. Next, $S^{\sigma}_{\mathbf X}$ is calculated for every 2D regulation type *σ* (Step 2). Three types of 2D regulation satisfy the criterion for every time series. Among these, we identify false positive regulations by using the regulation-delta function (Step 3). The regulation-delta function for the new component *C*, Δ(*C*), is equal to zero for every time series, indicating that two of these three regulations are false positives. Thus, the remaining regulation is the only inferred 2D regulation, as it satisfies the criterion for the regulation-delta function (Δ ≠ 0). By merging the inferred 1D and 2D regulations, the regulatory network is successfully inferred. Since there are three components in this system, we infer up to 2D regulations. If there are *N* components in the system, we go up to (*N* – 1)D regulations (Supplementary Fig. 2).

We applied the framework to infer regulatory networks from simulated time-series data of various biological models. In most biological systems, the degradation rate of a molecule increases as its own concentration increases; thus, we assume that self-regulation is negative for every component in the system. Accordingly, to detect an *N*D regulation, we use the (*N* + 1)D regulation-detection function and score that include negative self-regulation. For example, to infer a 1D positive regulation from *X* to *Y*, the criterion $S^{+-}_{XY} = 1$ is used.
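The step-wise search described above (1D, then 2D, up to (*N* – 1)D, with negative self-regulation appended as an extra cause) can be organized by enumerating candidate regulation types. The helper below, `candidate_regulations`, is hypothetical and written only for illustration. It lists candidates as (causes, signs, target) tuples and does not score them.

```python
from itertools import combinations, product

def candidate_regulations(components, dim, negative_self=False):
    """Enumerate candidate regulations (causes, signs, target) of dimension `dim`.

    If negative_self is True, the target itself is appended as an extra cause
    with sign -1, so an ND regulation is scored with an (N+1)D score.
    """
    out = []
    for target in components:
        others = [c for c in components if c != target]
        for causes in combinations(others, dim):
            for signs in product((+1, -1), repeat=dim):
                if negative_self:
                    out.append((causes + (target,), signs + (-1,), target))
                else:
                    out.append((causes, signs, target))
    return out

comps = ["A", "B", "C"]
print(len(candidate_regulations(comps, 1)))  # 3 targets x 2 causes x 2 signs = 12
print(len(candidate_regulations(comps, 2)))  # 3 targets x 1 cause pair x 4 sign patterns = 12
# The criterion S^{+-}_{XY} = 1 corresponds to the candidate (("X", "Y"), (1, -1), "Y").
print(candidate_regulations(["X", "Y"], 1, negative_self=True))
```

The number of candidates grows combinatorially with dimension, which is one reason the text raises the dimension only as far as (*N* – 1)D.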

From the time series simulated with the Kim-Forger model (Fig. 2b left), which describes the negative feedback loop of the mammalian circadian clock [35], using the criterion $S^{\sigma-}_{\mathbf X Y} = 1$, two positive 1D regulations (*M* → *P*_{C} and *P*_{C} → *P*) and one negative 1D regulation (*P* ⊣ *M*) are inferred (Fig. 2b middle). None of the six types of 2D regulation that satisfy the criterion for all the time series passes the Δ test (i.e., Δ = 0 for each of them) (Fig. 2b middle). Thus, no 2D regulation is inferred. By merging the three inferred 1D regulations, the negative feedback loop structure is recovered (Fig. 2b right). Our method also successfully inferred the negative feedback loop structures of the *Frzilator* [36] (Fig. 2c) and the 4-state Goodwin oscillator [37] (Fig. 2d). Furthermore, our framework correctly inferred systems having 2D regulations: the Goldbeter model describing the *Drosophila* circadian clock [38] (Fig. 2e) and the regulatory network of the cAMP oscillator of *Dictyostelium* [39] (Fig. 2f) (see Supplementary Information for the equations and parameters of the models and Supplementary Data 1 for detailed inference results).

### C. Inference with noisy time series

In the presence of noise in the time-series data, the regulation-detection score may not be one even if there is a regulation type *σ* from **X** to *Y*. For example, for an Incoherent Feed-forward Loop (IFL) that contains *A* ⊣ *B* (Fig. 3a), the regulation-detection score of *A* ⊣ *B* is not one in the presence of noise (Fig. 3b blue). Thus, for noisy data, we relax the criterion to $S^{\sigma}_{\mathbf X} \geq S^{thres}$, where $S^{thres} < 1$ is a threshold. Because $S^{\sigma}_{\mathbf X}$ moves farther away from one as the noise level increases, $S^{thres}$ also needs to decrease. We choose $S^{thres} = 0.9 - 0.005 \times (\text{noise level})$, with which true and false regulations can be distinguished in the majority of cases for our examples (Fig. 3b and Supplementary Fig. 3e). For instance, $S^{thres}$ (green dashed line, Fig. 3b) broadly separates true regulation (Fig. 3b blue) from false regulation (Fig. 3b red). However, $S^{\sigma}_{\mathbf X} \geq S^{thres}$ is not always satisfied for a true regulation type *σ* from **X** to *Y*, and it is sometimes satisfied for a false one (Fig. 3b). Thus, we further use a *Total Regulation Score (TRS)*, $\mathrm{TRS}^{\sigma}_{\mathbf X}$, defined as the fraction of time-series data satisfying $S^{\sigma}_{\mathbf X} \geq S^{thres}$ (Fig. 3c left). Then, we use the criterion $\mathrm{TRS}^{\sigma}_{\mathbf X} \geq \mathrm{TRS}^{thres}$ to infer the regulation. Similar to $S^{thres}$, $\mathrm{TRS}^{thres}$ also decreases as the noise level increases. Thus, we use $\mathrm{TRS}^{thres} = 0.9 - 0.01 \times (\text{noise level})$, which successfully distinguishes between the true and false regulations of the IFL (Fig. 3c right) and of the other systems (Supplementary Fig. 3f). Note that $\mathrm{TRS}^{\sigma}_{\mathbf X}$ combines the regulation-detection scores with weights that reflect the size of the domain of each regulation-detection function (see Supplementary Information for details). See Methods for how the noise level is quantified.
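The two thresholds and the TRS can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The noise level is taken to be in percent (so 20% noise gives $S^{thres} = 0.8$). TRS is computed as a plain fraction of the series that can be scored, as in Methods, without the domain-size weighting mentioned above. A NaN score marks a series whose regulation-detection region was too small. The scores in the usage example are made up.

```python
import numpy as np

def thresholds(noise_level):
    """S^thres and TRS^thres as linear functions of the noise level (assumed in %)."""
    return 0.9 - 0.005 * noise_level, 0.9 - 0.01 * noise_level

def total_regulation_score(scores, s_thres):
    """Fraction of scorable time series whose regulation-detection score reaches S^thres.

    NaN entries mark series whose regulation-detection region was too small to score;
    they are excluded from both the numerator and the denominator.
    """
    scores = np.asarray(scores, dtype=float)
    valid = scores[~np.isnan(scores)]
    if valid.size == 0:
        return np.nan
    return float(np.mean(valid >= s_thres))

s_thres, trs_thres = thresholds(20)             # 20% noise: approximately (0.8, 0.7)
scores = [0.95, 0.85, 0.78, np.nan, 0.91]       # hypothetical per-series scores
trs = total_regulation_score(scores, s_thres)   # 3 of the 4 scorable series pass: 0.75
print(trs, trs >= trs_thres)                    # regulation inferred if True
```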

Next, we investigate whether the Δ test can distinguish direct and indirect regulations using examples of Coherent Feed-forward Loop (CFL, Fig. 3d) and Single Feed-forward Loop (SFL, Fig. 3e). In CFL, direct regulation of *A* ⊣ *C* exists. On the other hand, in SFL, only indirect negative regulation from *A* to *C*, induced from a regulatory chain *A* ⊣ *B* → *C*, exists.

In the presence of noise, the regulation-delta function often fails to distinguish these direct and indirect regulations from *A* to *C* in CFL and SFL. Specifically, for both CFL and SFL with 20% multiplicative noise, the regulation-detection score of the 2D regulation from *A* and *B* to *C* is larger than *S*^{thres}, and Δ(*A*) is strictly negative in most cases (Fig. 3f and g). Here, the sign of Δ is quantified using a one-tailed Wilcoxon signed-rank test (Supplementary Fig. 4a). Thus, the regulation *A* ⊣ *C* is inferred not only from CFL but also from SFL. This indicates that, in the presence of noise, the regulation-delta function can be biased toward a specific regulation type even for indirect regulation. To prevent such false positive predictions, we developed an additional criterion. Specifically, we use a surrogate time series of *A* (*A*_{shuffled}, Fig. 3h) to destroy the dependence of *C* on *A* in the presence of direct regulation (*A* ⊣ *C*). As a result, the regulation-detection score computed with *A*_{shuffled} is significantly reduced compared to the score computed with *A* (Fig. 3i top). On the other hand, if *A* does not directly regulate *C*, the regulation-detection score does not decrease much (Fig. 3i bottom), and the score with *A* is not significantly larger than the score with *A*_{shuffled}. When multiple time series are given, we calculate a *p*-value for each time series and combine them using Fisher's method. The criterion (the combined *p*-value is smaller than the value obtained by combining *p* = 0.001 for every time series) successfully distinguishes between direct and indirect regulation even when the noise level varies (Supplementary Fig. 4b).

From the noisy time series, using the criterion $\mathrm{TRS}^{\sigma}_{\mathbf X} \geq \mathrm{TRS}^{thres}$, all potential 1D (Fig. 3h upper-left) and 2D (Fig. 3h upper-right) regulations are inferred. Then, among the inferred regulations, we need to identify indirect regulations. Unlike IFL, CFL and SFL contain a potential indirect regulation: *A* ⊣ *C* may be indirect, since there is a regulatory chain *A* ⊣ *B* → *C*. In this case, we use a surrogate time series of the potential source of indirect regulation (*A*) to test whether the regulation-detection score computed with *A* is significantly larger than that computed with *A*_{shuffled}. This reveals that *A* ⊣ *C* is a direct regulation in CFL, but not in SFL. Then, merging the 1D and 2D results successfully recovers the network structures of IFL, CFL, and SFL even from noisy time series.

Based on TRS and post-filtering tests (Δ test, surrogate test), we develop a user-friendly computational package, General ODE-Based Inference (GOBI), which can be used to infer regulations for systems described by Eq. (1). GOBI successfully infers regulatory networks from simulated time series using ODE models (Fig. 2b-g) in the presence of noise. Here, the *F*_{2} score, the weighted harmonic mean of precision and recall, is nearly one, indicating the nearly perfect recovery of all regulations (Fig. 3k).
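For reference, the *F*_{β} score with β = 2 is $(1+\beta^2)PR/(\beta^2 P + R)$, where *P* is precision and *R* is recall, so recall counts more than precision. The short Python function below simply evaluates this formula. The precision and recall values are illustrative and are not results from Fig. 3k.

```python
def f_beta(precision, recall, beta=2.0):
    """F_beta score; beta = 2 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(1.0, 1.0))  # 1.0: every regulation recovered, no false positives
print(f_beta(0.5, 1.0))  # about 0.833: half the inferred regulations are false positives
print(f_beta(1.0, 0.5))  # about 0.556: missing half the true regulations costs more
```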

### D. Successful network inferences from experimentally measured time series

We use GOBI to infer regulatory networks from experimentally measured time series. From the population data of two unicellular ciliates *Paramecium aurelia* (*P*) and *Didinium nasutum* (*D*) [3, 40] (Fig. 4a left), the network between the prey (*P*) and predator (*D*) is successfully inferred (Fig. 4a and Supplementary Fig. 6a).

Next, we apply GOBI to the time series of a synthetic genetic oscillator, which consists of *Tetracycline repressor* (*TetR*) and RNA polymerase sigma factor (*σ*^{28}) [41] (Fig. 4b left). Although the time series were measured under different conditions (after adding purified *TetR* or inactivating intrinsic *TetR*), our method consistently infers the negative feedback loop formed by two direct regulations, *σ*^{28} → *TetR* and *TetR* ⊣ *σ*^{28}, in all cases (Fig. 4b middle and Supplementary Fig. 6b). This indicates that our method can infer regulations even when the data are obtained under different conditions, since Eq. (1) does not require specifying the equations or their parameters. Here, since the depletion rate of a component typically increases as its own concentration increases, self-regulation is assumed to be negative (Fig. 4b right, dashed arrow).

We next investigate time-series data from a slightly more complex synthetic oscillator, the three-gene repressilator [42] (Fig. 4c left). Assuming negative self-regulation, the criterion $\mathrm{TRS}^{\sigma}_{\mathbf X} \geq \mathrm{TRS}^{thres}$ infers three negative 1D regulations and three 2D regulations (Fig. 4c middle). Among the 2D regulations, the positive regulations are inferred as indirect because they do not pass the surrogate test (Fig. 4c middle, dashed arrow). Thus, among the inferred 2D regulations, only the negative regulations, which are consistent with the inferred 1D regulations, are inferred as direct. Gathering these results, GOBI successfully infers the network structure of the repressilator (Fig. 4c right and Supplementary Fig. 6c). Note that although our method infers the regulations among proteins as direct, *mRNA* in fact exists as an intermediate step in the negative regulations among the proteins. This happens because of the short translation time in *E. coli* [44], which makes the shapes and phases of the mRNA and protein profiles similar. This indicates that our method may infer indirect regulations with a short intermediate step as direct regulations.

From the time series measuring the amounts of four cofactors present at the estrogen-sensitive *pS*2 promoter after treatment with estradiol [43, 45] (Fig. 4d left), four 1D regulations (*HDAC* ⊣ *hER*, *TRIP1* ⊣ *hER*, *HDAC* ⊣ *POLII* and *hER* → *POLII*) satisfy the criterion $\mathrm{TRS}^{\sigma}_{\mathbf X} \geq \mathrm{TRS}^{thres}$. However, we exclude them at this stage because *hER* and *POLII* each have two candidate causes, which would form 2D regulations, whereas the 1D criterion assumes a single cause (Fig. 4d middle, dashed box). If both regulations are effective, they will be identified as 2D regulations. Indeed, most of the 10 candidate 2D regulations include the four inferred 1D regulations. Via the Δ test and surrogate test, two 1D regulations (*hER* → *POLII* and *HDAC* ⊣ *hER*) and one 2D regulation are inferred (Supplementary Fig. 6d). While we cannot infer 3D regulations due to the limited amount of data, the inferred regulations are supported by experiments. That is, estradiol triggers the binding of *hER* to the *pS*2 promoter to recruit *POLII* [43], supporting *hER* → *POLII*. Also, inhibition of *POLII* phosphorylation blocks the recruitment of *HDAC* but does not affect *APIS* engagement at the *pS*2 promoter [43], supporting *POLII* → *HDAC* and the absence of regulation from *POLII* to *TRIP1*, which is a surrogate measure of *APIS*. Without inhibition of *POLII*, *HDAC* is recruited after *APIS* engagement, and when *HDAC* occupancy is maximal, the *pS*2 promoter becomes refractory to *hER* [43], supporting *TRIP1* → *HDAC* ⊣ *hER*. Interestingly, the inferred network contains a negative feedback loop, which is required to generate sustained oscillations [46].

Finally, we investigate five time series of air pollutants and cardiovascular disease occurrence in Hong Kong from 1994 to 1997 [47] (Fig. 4e left). Since our goal is to identify which pollutants cause cardiovascular disease, we fix the disease as the target. We also assume negative self-regulation of the disease, reflecting death. While two positive causal links from *NO*_{2} and respirable suspended particulates (*Rspar*) to the disease are identified as 1D regulations (Fig. 4e middle), we exclude them because they share the same target (Fig. 4e middle, dashed box). Of the two inferred 2D regulations, one passes the Δ test and surrogate test (Fig. 4e middle). Furthermore, no 3D or 4D regulation is inferred (Supplementary Fig. 6e). The inferred network indicates that both *NO*_{2} and *Rspar* are major causes of cardiovascular disease (Fig. 4e right). Indeed, *NO*_{2} and *Rspar* have been reported to be associated with hospital admissions and mortality due to cardiovascular disease, respectively [48].

### E. Comparison between our framework and other model-free inference methods

Here, we compare our framework with popular model-free methods, i.e., GC, CCM and PCM, using the experimental time-series data from the previous section (Fig. 4a-e). Unlike our method, the model-free methods can infer only the presence of a regulation, not its type (i.e., positive or negative). Thus, the arrows represent inferred regulations, which could be either positive or negative.

For the prey-predator system and the genetic oscillator (Fig. 4a,b), we made the cases more challenging: each time series is duplicated and shifted by about half of its period to increase the number of components. While our method successfully detects two independent negative feedback loops (Fig. 5a,b), model-free methods make false positive predictions (e.g., *P* to *D*^{shift} in Fig. 5a) because they usually misidentify synchrony as causality.

For a similar reason, synchrony obscures the inference of the model-free methods for the repressilator (Fig. 5c). Moreover, the model-free methods fail to distinguish between direct and indirect regulations. For example, they infer the indirect causation *TetR* → λ*cl* induced by the regulatory chain *TetR* ⊣ *LacI* ⊣ λ*cl* unlike our method. Similarly, due to synchrony and indirect effect, for the system of cofactors at the *pS*2 promoter, model-free methods infer an almost fully connected causal network unlike our method (Fig. 5d).

When we use three years of data (full-length data) of air pollutants and cardiovascular disease, PCM infers the same structure as GOBI, i.e., only *NO*_{2} and *Rspar* cause the disease (Fig. 5e grey) [20]. On the other hand, when only part of the data (i.e., two years) is used, only GOBI infers this structure (Fig. 5e purple). This indicates that GOBI is more reliable and accurate than the model-free methods.

## III. DISCUSSION

We developed an inference method that does not suffer from the weaknesses of model-free and model-based inference methods. We derive conditions for interactions satisfying the general ODE (Eq. (1)). Because these conditions allow us to check whether given time-series data can be reproduced by the general ODE (i.e., whether an ODE satisfying the data exists) without fitting, the computational cost is dramatically reduced compared to previous model-based approaches. Importantly, as our method can be applied to any system described by the general ODE (Eq. (1)), it does not suffer from the fundamental limit of the model-based approach (i.e., the requirement of an a priori model that accurately describes the system). In addition, unlike previous model-free approaches, our method does not run a serious risk of misidentifying synchrony as causality. Furthermore, our method successfully distinguishes direct from indirect causal relations by adopting the surrogate test (Fig. 3). In this way, our framework dramatically reduces false positive predictions, which are an inherent flaw of model-free inference methods (Fig. 5). Taken together, we developed an accurate and broadly applicable inference method that can uncover unknown functional relationships underlying a system from its output time-series data (Fig. 4).

In our approach, we assumed that when *X* causes *Y*, it does so either positively or negatively. Thus, our approach cannot capture causation when *X* causes *Y* both positively and negatively, or when the type of causation changes over time. Deriving a reproducibility condition without assuming a fixed causation type (i.e., the monotonicity of *f* in Eq. (1)) would be interesting future work. Because our method tests the reproducibility of time-series data using necessary conditions, false positive causal links can be inferred. To resolve this, we used multiple time-series data and performed post-filtering tests (i.e., the Δ test and surrogate test). Thus, inferring high-dimensional regulations requires a large amount of data. Lastly, while we considered the general form of an ODE, an interesting future direction would be to extend our work to models that describe interactions including time delays.

## IV. METHODS

### A. Computational package for inferring regulatory network

Here, we describe the key steps of our computational package, GOBI (Github link will be provided upon acceptance). For the experimental time-series data **X**(*t*) = (*X*_{1}(*t*), *X*_{2}(*t*), ⋯, *X*_{N}(*t*)), **X**(*t*) can be interpolated with either the ‘spline’ or the ‘fourier’ method, chosen by the user. The derivative of **X**(*t*) is computed using the MATLAB function ‘gradient’.
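A rough Python analogue of this preprocessing step is sketched below; it is not GOBI's code. `scipy.interpolate.CubicSpline` stands in for the ‘spline’ option, and the ‘fourier’ option is not reproduced. `numpy.gradient` is used because, like MATLAB's ‘gradient’, it takes central differences in the interior and one-sided differences at the ends. The grid size `n_points` is a hypothetical parameter.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_and_differentiate(t_obs, x_obs, n_points=200):
    """Resample a time series on a uniform grid and estimate its time derivative.

    Cubic-spline interpolation stands in for the 'spline' option; np.gradient,
    like MATLAB's gradient, uses central differences in the interior and
    one-sided differences at the two ends.
    """
    t = np.linspace(t_obs[0], t_obs[-1], n_points)
    x = CubicSpline(t_obs, x_obs)(t)
    dx = np.gradient(x, t)
    return t, x, dx

t_obs = np.linspace(0, 2 * np.pi, 25)  # sparse "measurements"
t, x, dx = interpolate_and_differentiate(t_obs, np.sin(t_obs))
print(np.max(np.abs(dx - np.cos(t))))  # small: the derivative of sin is cos
```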

#### 1. Regulation-detection region

For the *N*D regulation (Eq. (1)) with regulation type *σ* = (*σ*_{1}, ⋯, *σ*_{N}), the regulation-detection region $R^{\sigma}_{\mathbf X}$ is defined as the set of (*t,t**) in the domain of the time series [0, *τ*)^{2} satisfying $\sigma_i X_i^d(t,t^*) > 0$ for all *i*. For example, for the positive 1D regulation *X* → *Y* (*σ* = +), $R^{+}_{X}$ is the set of (*t,t**) where $X^d(t,t^*) > 0$. For the 2D regulation with *σ* = (+,–), $R^{+-}_{X_1X_2}$ is the set of (*t,t**) satisfying both $X_1^d(t,t^*) > 0$ and $X_2^d(t,t^*) < 0$. The size of the regulation-detection region, size($R^{\sigma}_{\mathbf X}$), is the fraction of the domain [0, *τ*)^{2} occupied by $R^{\sigma}_{\mathbf X}$. In the presence of noise, we only consider regions that are not small (i.e., size($R^{\sigma}_{\mathbf X}$) > *R*^{thres}) to avoid errors caused by the noise. The value of *R*^{thres} can be chosen from 0 to 0.1, and the choice of *R*^{thres} does not significantly affect the results (Supplementary Fig. 3a). However, a small value of *R*^{thres} is recommended for inferring high-dimensional regulations, since the average of size($R^{\sigma}_{\mathbf X}$) decreases exponentially as the dimension increases (see Supplementary Information for details).
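The region and its size can be computed on a sampled grid as a boolean mask over time-point pairs. The Python sketch below illustrates this; it is not GOBI's implementation. The random signals are hypothetical and serve only to show that, for unrelated inputs, each added cause roughly halves size($R^{\sigma}_{\mathbf X}$), which is the exponential decrease mentioned above.

```python
import numpy as np

def regulation_detection_region(causes, signs):
    """Boolean mask over grid pairs (t_i, t_j) with sign_k * (X_k(t_i) - X_k(t_j)) > 0 for every k."""
    n = len(causes[0])
    region = np.ones((n, n), dtype=bool)
    for x, s in zip(causes, signs):
        x = np.asarray(x, dtype=float)
        region &= s * (x[:, None] - x[None, :]) > 0
    return region

def region_size(region):
    """Fraction of the sampled domain [0, tau)^2 that the region occupies."""
    return float(region.mean())

t = np.linspace(0, 1, 101)
s_rising = region_size(regulation_detection_region([t], [+1]))            # pairs with t_i > t_j
s_conflict = region_size(regulation_detection_region([t, -t], [+1, +1]))  # empty: X2 falls when X1 rises

rng = np.random.default_rng(0)
signals = [rng.standard_normal(400) for _ in range(3)]   # unrelated hypothetical inputs
sizes = [region_size(regulation_detection_region(signals[:d], [+1] * d)) for d in (1, 2, 3)]
print(s_rising, s_conflict, sizes)  # sizes roughly 1/2, 1/4, 1/8
```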

#### 2. Regulation-detection function and score

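When the regulation type *σ* from **X** = (*X*_{1}, *X*_{2}, ⋯, *X*_{N}) to *Y* exists, the following regulation-detection function, defined on the regulation-detection region $R^{\sigma}_{\mathbf X}$, is always positive:

$$I^{\sigma}_{\mathbf X}(t,t^*) := \dot{Y}(t) - \dot{Y}(t^*), \qquad (t,t^*) \in R^{\sigma}_{\mathbf X}.$$

Thus, the following regulation-detection score is one:

$$S^{\sigma}_{\mathbf X} := \frac{\iint_{R^{\sigma}_{\mathbf X}} I^{\sigma}_{\mathbf X}(t,t^*)\,dt\,dt^*}{\iint_{R^{\sigma}_{\mathbf X}} \left|I^{\sigma}_{\mathbf X}(t,t^*)\right|\,dt\,dt^*} \tag{4}$$

(see Supplementary Information for details). However, this no longer holds in the presence of noise. Thus, we relax the criterion from $S^{\sigma}_{\mathbf X} = 1$ to $S^{\sigma}_{\mathbf X} \geq S^{thres}$. Among the data with a non-negligible $R^{\sigma}_{\mathbf X}$ (i.e., size($R^{\sigma}_{\mathbf X}$) > *R*^{thres}), the fraction of data satisfying this criterion is called the Total Regulation Score, $\mathrm{TRS}^{\sigma}_{\mathbf X}$. Finally, we infer the regulation from noisy time-series data using the criterion $\mathrm{TRS}^{\sigma}_{\mathbf X} \geq \mathrm{TRS}^{thres}$. We use $S^{thres} = 0.9 - 0.005 \times (\text{noise level})$ and $\mathrm{TRS}^{thres} = 0.9 - 0.01 \times (\text{noise level})$ (Fig. 3a-c and Supplementary Fig. 3). The noise level of a time series is approximated using the mean square of the residual between the noisy and fitted time series (Supplementary Fig. 5).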
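A general *N*D version of the score, including the *R*^{thres} rule and optional negative self-regulation, might look like the Python sketch below. It assumes a uniform time grid, so the double integrals in Eq. (4) reduce to a ratio of sums over grid pairs. It also assumes that self-regulation is handled by passing *Y* itself as an extra cause with sign –1. The demonstration model $\dot{Y} = \tanh(2X) - Y$, integrated with forward Euler, is hypothetical.

```python
import numpy as np

def regulation_detection_score(causes, signs, dy, r_thres=0.0):
    """Discrete regulation-detection score S for regulation type `signs` from `causes` to Y.

    causes : list of 1D arrays X_k(t) on a common uniform grid
    signs  : list of +1/-1, one per cause (append Y with -1 for negative self-regulation)
    dy     : dY/dt on the same grid
    Returns NaN when size(R) <= r_thres, i.e. the region is too small to score.
    """
    n = len(dy)
    region = np.ones((n, n), dtype=bool)
    for x, s in zip(causes, signs):
        region &= s * (x[:, None] - x[None, :]) > 0
    if region.mean() <= r_thres:
        return np.nan
    vals = (dy[:, None] - dy[None, :])[region]
    denom = np.abs(vals).sum()
    return np.nan if denom == 0 else float(vals.sum() / denom)

# Hypothetical system: dY/dt = tanh(2X) - Y (X -> Y plus negative self-regulation).
t = np.linspace(0, 8 * np.pi, 400)
dt = t[1] - t[0]
x = np.sin(t) + 0.5 * np.sin(3.1 * t)
y = np.zeros_like(t)
for k in range(t.size - 1):
    y[k + 1] = y[k] + dt * (np.tanh(2 * x[k]) - y[k])  # forward Euler step
dy = np.tanh(2 * x) - y                                # exact right-hand side at the samples

s_with_self = regulation_detection_score([x, y], [+1, -1], dy)  # criterion S^{+-}_{XY}
s_without_self = regulation_detection_score([x], [+1], dy)      # ignores self-regulation
print(s_with_self, s_without_self)
```

With *Y* included as a negatively signed cause, every pair in the region has a nonnegative $\dot{Y}^d$, so the score is one. Omitting self-regulation can lower the score, because changes in *Y* itself also move $\dot{Y}$.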

#### 3. Δ test

When any regulation is added to an existing true regulation, the regulation-detection score is always one (Fig. 1j-l). Thus, to test whether the additional regulation is effective, we consider $\Delta^{\sigma}_{\mathbf X}(X_{new}) := S^{\sigma+}_{\mathbf X X_{new}} - S^{\sigma-}_{\mathbf X X_{new}}$, where $S^{\sigma+}_{\mathbf X X_{new}}$ ($S^{\sigma-}_{\mathbf X X_{new}}$) is the regulation-detection score when the new component (*X*_{new}) is positively (negatively) added to the existing regulation type *σ*. Because Δ = 0 reflects that the new component (*X*_{new}) has no regulatory role, the newly added regulation is inferred only when Δ ≠ 0 for some data. In particular, Δ > 0 (Δ < 0) indicates that the new component adds a positive (negative) regulation. In the presence of noise, a positive (negative) regulation is inferred if Δ ≥ 0 (Δ ≤ 0) consistently for all time series. If the number of time series is greater than 25, the sign of Δ is quantified by a one-tailed Wilcoxon signed-rank test. We set the significance level to 0.01, but it can be chosen by the user.
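The decision rule for the sign of Δ can be sketched with `scipy.stats.wilcoxon`, which supports one-sided alternatives. In this Python sketch, `delta_sign` is a hypothetical helper. For 25 or fewer time series, it applies the "consistent sign for all time series" rule and also requires Δ ≠ 0 for at least one series. The Δ values in the usage example are synthetic.

```python
import numpy as np
from scipy.stats import wilcoxon

def delta_sign(deltas, alpha=0.01, min_n_for_test=26):
    """Return +1, -1 or 0 for the inferred sign of Delta across time series."""
    deltas = np.asarray(deltas, dtype=float)
    if deltas.size < min_n_for_test:
        # Few time series: require a consistent sign that is not identically zero.
        if np.all(deltas >= 0) and np.any(deltas > 0):
            return +1
        if np.all(deltas <= 0) and np.any(deltas < 0):
            return -1
        return 0
    # Many time series: one-tailed Wilcoxon signed-rank tests against zero.
    if wilcoxon(deltas, alternative="greater").pvalue < alpha:
        return +1
    if wilcoxon(deltas, alternative="less").pvalue < alpha:
        return -1
    return 0

rng = np.random.default_rng(0)
neg = -0.05 + 0.02 * rng.standard_normal(40)  # synthetic, consistently negative Deltas
pos = 0.05 + 0.02 * rng.standard_normal(40)   # synthetic, consistently positive Deltas
print(delta_sign(neg), delta_sign(pos), delta_sign([0.0, 0.1, 0.02]))
```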

#### 4. Surrogate test

Indirect regulation is induced by a chain of direct regulations. For example, in SFL (Fig. 3e), the regulatory chain *A* ⊣ *B* → *C* induces the indirect negative regulation *A* ⊣ *C*. In the presence of noise, the Δ test sometimes fails to distinguish between direct and indirect regulations (Fig. 3d-g). Thus, after the Δ test, if an inferred regulation has the potential to be indirect, we additionally perform the surrogate test to determine whether it is direct or indirect. Specifically, for each candidate indirect regulation, we shuffle the time series of the cause using the MATLAB function ‘randperm’ and then calculate the regulation-detection scores. We then test whether the original regulation-detection score is significantly larger than the shuffled ones using a one-tailed *Z* test. Given *k* time series, we obtain *k* *p*-values (*p*_{i}, *i* = 1, 2, ⋯, *k*). We combine them into one test statistic using Fisher's method, $\chi^2_{2k} = -2\sum_{i=1}^{k} \ln p_i$. We set the critical value of Fisher's method to the value obtained by combining *p*_{i} = 0.001 for all the data, but it can also be chosen by the user.
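The surrogate test and Fisher's method can be sketched in Python as follows. This sketch makes several assumptions that the text does not fix. The *Z* statistic is formed as (original score – mean of shuffled scores) / standard deviation of shuffled scores. The number of shuffles (`n_shuffles`) is a hypothetical parameter. `numpy.random.Generator.permutation` plays the role of MATLAB's ‘randperm’. Fisher's statistic is referred to a χ² distribution with 2*k* degrees of freedom. The 1D score helper and the toy signals are illustrative only.

```python
import numpy as np
from scipy.stats import chi2, norm

def score_1d(x, dy, sign):
    """1D regulation-detection score (ratio-of-sums approximation)."""
    region = sign * (x[:, None] - x[None, :]) > 0
    vals = (dy[:, None] - dy[None, :])[region]
    return vals.sum() / np.abs(vals).sum()

def surrogate_pvalue(score_fn, cause, n_shuffles=200, rng=None):
    """One-tailed Z test: is the score with the real cause larger than with shuffled copies?"""
    rng = np.random.default_rng() if rng is None else rng
    original = score_fn(cause)
    shuffled = np.array([score_fn(rng.permutation(cause)) for _ in range(n_shuffles)])
    z = (original - shuffled.mean()) / shuffled.std(ddof=1)
    return float(norm.sf(z))

def fisher_combined_pvalue(pvalues):
    """Fisher's method: -2 * sum(ln p_i), chi-squared with 2k degrees of freedom."""
    p = np.asarray(pvalues, dtype=float)
    return float(chi2.sf(-2.0 * np.log(p).sum(), df=2 * p.size))

def is_direct(pvalues, p_each=0.001):
    """Direct if the combined p-value is below the one obtained from p_each in every series."""
    cutoff = fisher_combined_pvalue([p_each] * len(pvalues))
    return fisher_combined_pvalue(pvalues) < cutoff

t = np.linspace(0, 4 * np.pi, 120)
a = np.sin(t) + 0.5 * np.sin(3.1 * t)
dy = np.tanh(2 * a)  # A directly drives Y in this toy example
p_direct = surrogate_pvalue(lambda c: score_1d(c, dy, +1), a, rng=np.random.default_rng(0))
print(p_direct, is_direct([1e-6] * 5), is_direct([0.2] * 5))
```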

#### 5. Model-free methods

For GC, we rejected the null hypothesis that *Y* does not Granger-cause *X* using the *F* statistic at the 95% confidence level (i.e., a significance level of 0.05), and thereby inferred direct regulations [2]. For Convergent Cross Mapping (CCM) [3] and Partial Cross Mapping (PCM) [20], we choose an appropriate embedding dimension using the false nearest neighbor algorithm. We also select the time lag producing the first minimum of the delayed mutual information. Specifically, we used embedding dimension 2 for the prey-predator, genetic oscillator and estradiol data-sets, and 3 for the repressilator and the air pollutants and cardiovascular disease data-sets. We used time lag 2 for the prey-predator data-set; 3 to 10 for the genetic oscillator (there are eight different time-series data-sets); 10 for the repressilator; 15 for the estradiol data-set; and 3 for the air pollutants and cardiovascular disease data-set.

### B. *in silico* time-series data

With the ODE describing the system, we simulate the time-series data using the MATLAB function ‘ode45’. The sampling rate is 100 points per period for all the examples (Fig. 1, 2, 3). For the multiple time-series data (Fig. 2, 3), we generate 100 different time series with different initial conditions. Then, before applying our method, we normalize each time series by rescaling it to have minimum 0 and maximum 1. For noisy time series, we add multiplicative noise sampled randomly from a normal distribution with mean 0 and standard deviation given by the noise level. For example, for 10% multiplicative noise, we add the noise $X(t_i)\,\epsilon$ to $X(t_i)$, where $\epsilon \sim N(0, 0.1^2)$. Before applying our method, all the simulated noisy time series are fitted using the MATLAB Curve Fitting Toolbox model ‘fourier4’. However, if the noise level is too high, ‘fourier4’ tends to overfit and capture the noise. Thus, in the presence of a high level of noise, ‘fourier2’ is recommended for smoothing.
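The min-max normalization and the multiplicative noise model can be written as two small Python functions. The text does not state whether noise is added before or after normalization. This sketch assumes the noise is applied to the raw trajectory first, because multiplicative noise would otherwise vanish wherever the normalized series equals 0. The sine trajectory is a stand-in for an ‘ode45’ solution, and the subsequent Fourier smoothing is not reproduced.

```python
import numpy as np

def add_multiplicative_noise(x, noise_level, rng):
    """Return X(t_i) + X(t_i) * eps_i with independent eps_i ~ N(0, noise_level^2)."""
    x = np.asarray(x, dtype=float)
    eps = rng.normal(0.0, noise_level, size=x.shape)
    return x + x * eps

def minmax_normalize(x):
    """Rescale a time series to have minimum 0 and maximum 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)  # 100 points per period
x_raw = np.sin(t) + 2.0                             # stand-in for a simulated trajectory
x = minmax_normalize(add_multiplicative_noise(x_raw, 0.1, rng))  # 10% noise, then rescale
print(x.min(), x.max())
```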

### C. Experimental time-series data

For the experimental data, we first estimate the period of the data from the first peak of the autocorrelation. Then, we cut the time series into periods (Fig. 4a,b). Specifically, we cut the prey-predator time series every five days to generate seven different time series (Fig. 4a). When the number of cycles in the data is low (<5), we generate enough time series (Fig. 4c-e) using a moving-window technique. That is, we choose a window whose size is the period of the time series and shift it along the time series so that consecutive windows overlap by 90%. The time series in every window is then used for our approach. We did this for the repressilator (Fig. 4c), the estradiol data-set (Fig. 4d), and the air pollution and cardiovascular disease data (Fig. 4e). For instance, we used time-series data of air pollutants and cardiovascular disease with a window size of one year and an overlap of 11 months (i.e., shifting the window by one month) to generate 23 data-sets. Before this, the time series of disease admissions are smoothed using a simple moving average with a window width of seven days to remove day-of-week effects. Each time series is interpolated using the MATLAB function ‘spline’ (Fig. 4a-d) or the ‘fourier2’ fit model (Fig. 4e), depending on the noise level of the time-series data.
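The preprocessing of experimental data can be sketched in Python as three hypothetical helpers, written for illustration only. `estimate_period` takes the lag of the first local maximum of the (unnormalized) autocorrelation after lag 0. `moving_windows` cuts overlapping windows. `moving_average` is a simple boxcar smoother that keeps only the fully overlapping part. The pure sine with a 50-sample period is a synthetic test signal, not experimental data.

```python
import numpy as np

def estimate_period(x):
    """Lag of the first local maximum of the autocorrelation after lag 0."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # lags 0, 1, 2, ...
    for k in range(1, ac.size - 1):
        if ac[k - 1] < ac[k] >= ac[k + 1]:
            return k
    return None

def moving_windows(x, window, overlap=0.9):
    """Split x into windows of length `window`; consecutive windows overlap by `overlap`."""
    step = max(1, int(round(window * (1 - overlap))))
    return [x[s:s + window] for s in range(0, len(x) - window + 1, step)]

def moving_average(x, width=7):
    """Simple moving average (valid part only), e.g. to remove day-of-week effects."""
    return np.convolve(x, np.ones(width) / width, mode="valid")

n = np.arange(500)
x = np.sin(2 * np.pi * n / 50)       # synthetic signal with a period of 50 samples
p = estimate_period(x)               # first autocorrelation peak
wins = moving_windows(x, window=p)   # 90% overlap: the window moves 5 samples at a time
print(p, len(wins))
```

With a one-year window and 90% overlap, the step would be about five weeks. The 11-month overlap described for the air-pollution data corresponds to a slightly larger overlap (a one-month step), so `overlap` would be set accordingly.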

## AUTHOR CONTRIBUTIONS

S.H.P., S.H. and J.K.K. designed the research. S.H.P. and S.H. developed the method. S.H.P. performed computation. S.H.P. analyzed data. J.K.K. supervised the project. All authors wrote the manuscript.

## COMPETING INTERESTS

The authors declare no competing interests.

## ACKNOWLEDGMENTS

We thank Seokjoo Chae, Hyukpyo Hong and Yun Min Song for valuable comments. This work was supported by Samsung Science and Technology Foundation SSTF-BA1902-01 (to J.K.K.) and Institute for Basic Science IBS-R029-C3 (to J.K.K.).