## Abstract

Timescales characterize the pace of change for many dynamic processes in nature: radioactive decay, metabolization of substances, memory decay in neural systems, and the spread of epidemics. Measuring timescales from experimental data can reveal underlying mechanisms and constrain theoretical models. Timescales are usually estimated by fitting the autocorrelation of sample time-series with exponential decay functions. We show that this standard procedure often fails to recover the correct timescales, exhibiting large estimation errors due to a statistical bias in autocorrelations of finite data samples. To overcome this bias, we develop a method using adaptive Approximate Bayesian Computations. Our method estimates the timescales by fitting the autocorrelation of sample data with a generative model based on a mixture of Ornstein-Uhlenbeck processes. The method accounts for finite sample size and noise in data and returns a posterior distribution of timescales quantifying the estimation uncertainty. We demonstrate how the posterior distribution can be used for model selection to compare alternative hypotheses about the dynamics of the underlying process. Our method accurately recovers the correct timescales on synthetic data from various processes with known ground truth dynamics. We illustrate its application to electrophysiological recordings from the primate cortex.

## 1 Introduction

Dynamic changes in many stochastic processes occur over typical periods known as timescales. The timescales of different processes in nature range broadly from milliseconds in protein folding [1, 2] or neural signalling [3–5], to days in spreading of infectious diseases [6–8], to years in demographic changes of populations [9, 10], and up to millennia in climate change [11, 12]. Accurate measurements of timescales from experimental data are necessary to uncover mechanisms controlling the dynamics of underlying processes and to determine the operating regime of these dynamics [13]. Inferring the operating regime, e.g., how close the system is to a critical point, can reveal general principles that explain the collective behavior of complex systems [14–16] and predict their future evolution [17–19]. Thus, experimentally measured timescales constrain theoretical models and enable accurate predictions in practical applications.

The timescales of a stochastic process are defined by the exponential decay rates of its autocorrelation function. Accordingly, timescales are usually estimated by fitting the autocorrelation of a sample time-series with exponential decay functions [3, 13, 20–23]. However, the shape of autocorrelations can be altered by many factors unrelated to the dynamics of the processes under study. For example, autocorrelations of *in vivo* neural activity can contain components arising from a specific trial structure of a behavioral task or slow drifts in the average activity. To correct for these irrelevant factors, several techniques were developed based on data resampling, in particular, trial shuffling and spike jittering methods [24–27]. These methods remove from the autocorrelation the average autocorrelation of surrogate data, which are designed to match the irrelevant factors in the real data but are otherwise random. The success of these methods critically depends on the ability to construct the appropriate surrogate data that exactly reproduce the irrelevant factors.

The shape of empirical autocorrelations is also affected by the fact that sample autocorrelation is a biased estimator. The values of sample autocorrelation computed from a finite time series systematically deviate from the true autocorrelation [28–33]. This statistical bias deforms the shape of empirical autocorrelations and hence can affect the estimates of timescales obtained by direct fitting of exponential functions. The magnitude of the bias mainly depends on the length of the sample time-series, but also on the value of the true autocorrelation at each time-lag. The expected value and variance of the autocorrelation bias can be derived analytically in some simple cases, such as a Markov process with a single timescale [28, 31]. However, the analytical derivation is not tractable for more general processes that involve multiple timescales or have additional temporal structure. Moreover, since the bias depends on the true autocorrelation itself, which is unknown, it cannot be corrected by constructing appropriate surrogate data as in shuffling or jittering methods. How the autocorrelation bias affects the estimation of timescales has not been studied systematically.

We show that the autocorrelation bias due to a finite sample size significantly affects the timescales estimated by the direct fitting of exponential functions, resulting in large systematic errors. To correct for this bias, we develop a method based on adaptive Approximate Bayesian Computations (aABC) that estimates timescales by fitting the autocorrelation with a generative model. ABC is a family of likelihood-free inference algorithms for estimating model parameters when the likelihood function cannot be calculated analytically [34]. The aABC algorithm approximates the multivariate posterior distribution of parameters of a generative model using population Monte-Carlo sampling [35]. Our generative model is based on a mixture of Ornstein-Uhlenbeck processes—one for each estimated timescale—which have exponentially decaying autocorrelations. The generative model can be further augmented with the desired noise model (e.g., a spike generation process) and additional temporal structure to match the statistics of the data. Our method accurately recovers the correct timescales from finite data samples for various synthetic processes with known ground truth dynamics. We demonstrate how the inferred posterior distributions can be used for model selection to compare alternative hypotheses about the dynamics of the underlying process. To illustrate an application of our method, we estimate timescales of ongoing spiking activity in the primate visual cortex during a behavioral task. The general method presented here can be adapted to various types of data and can find broad applications in neuroscience, epidemiology, physics, and other fields.

## 2 Results

### 2.1 Bias in timescales estimated by exponential fitting

Timescales of a stochastic process *A*(*t′*) are defined by the exponential decay rates of its autocorrelation function. The autocorrelation is the correlation between the values of the process at two time points separated by a time lag *t*. For stationary processes, the autocorrelation function only depends on the time lag:

$$
AC(t) = \frac{\mathrm{E}\left[(A(t') - \mu)(A(t' + t) - \mu)\right]_{t'}}{\sigma^2}. \tag{1}
$$

Here *µ* and *σ*^{2} are, respectively, the mean and variance of the process, which are constant in time, and E[⋅]_{t′} is the expectation over *t′*. Different normalizations of autocorrelation are also used in literature, but our results do not depend on a specific choice of normalization.

In experiments or simulations, the autocorrelation needs to be estimated from a finite sample of empirical data. A data sample from the process *A*(*t′*) constitutes a finite time-series *A*_{i} measured at discrete times *t′*_{i} (*i* = 1 … *N*, where *N* is the length of the time-series). For example, the sample time-series can be spike-counts of a neuron in discrete time bins, or a continuous voltage signal measured at a specific sampling rate. Accordingly, the sample autocorrelation is defined for a discrete set of time lags *t*_{j}. For empirical data, the true values of *µ* and *σ* are unknown. Hence, several estimators of the sample autocorrelation were proposed, which use different estimators for the sample mean and sample variance [28, 29]. One possible choice is:

$$
AC(t_j) = \frac{1}{(N - j)\,\hat{\sigma}^2} \sum_{i=1}^{N-j} \left(A_i - \hat{\mu}_1\right)\left(A_{i+j} - \hat{\mu}_2\right), \tag{2}
$$

with the sample variance $\hat{\sigma}^2$ and two different sample means $\hat{\mu}_1 = \frac{1}{N-j}\sum_{i=1}^{N-j} A_i$ and $\hat{\mu}_2 = \frac{1}{N-j}\sum_{i=j+1}^{N} A_i$. However, for any of these choices, the sample autocorrelation is a biased estimator: for a finite length time-series the values of the sample autocorrelation systematically deviate from the true autocorrelation [28–33]. This statistical bias deforms the shape of the sample autocorrelation and therefore may affect the estimation of timescales by direct fitting of exponential decay functions.
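For illustration, an estimator in the spirit of Eq. 2 can be computed directly; the sketch below is our own minimal NumPy implementation (the function name and the 1/*N* normalization of the sample variance are our choices):

```python
import numpy as np

def sample_autocorrelation(a, max_lag):
    """Sample autocorrelation of a 1-D time-series, in the spirit of Eq. 2:
    lag-dependent sample means for the two windows, one overall sample variance."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    var = np.var(a)                       # sample variance (1/N normalization)
    ac = np.empty(max_lag + 1)
    for j in range(max_lag + 1):
        x, y = a[: n - j], a[j:]          # the two windows separated by lag j
        ac[j] = np.mean((x - x.mean()) * (y - y.mean())) / var
    return ac
```

For white noise this estimator returns 1 at lag zero and values fluctuating around zero elsewhere; for correlated data of finite length it carries the negative bias discussed in the text.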

To investigate how the autocorrelation bias affects the timescales estimated by exponential fitting, we tested how accurately this procedure recovers the correct timescales on synthetic data with a known ground truth. We generated synthetic data from several stochastic processes for which the autocorrelation function can be computed analytically. The exponential decay rates of the analytical autocorrelation provide the ground-truth timescales. Each synthetic dataset consisted of 500 independent realizations of the process (i.e. trials) with fixed duration. Such trial-based data are typical in neuroscience, although experimental datasets usually contain fewer trials. We computed the sample autocorrelation for each trial using Eq. 2, averaged them to reduce the noise, and then fitted the average autocorrelation with the correct analytical functional form to estimate the timescale parameters. We repeated the entire procedure 500 times to obtain a distribution of estimated timescales over multiple independent realizations of the data.

We considered three ground truth processes which differed in the number of timescales, additional temporal structure and noise. First, we used an Ornstein–Uhlenbeck (OU) process (Fig. 1A):

$$
\dot{A}_{\mathrm{OU}}(t') = -\frac{1}{\tau} A_{\mathrm{OU}}(t') + \sqrt{2D}\,\xi(t'), \tag{3}
$$

where *ξ*(*t′*) is a Gaussian white noise with zero mean, and the diffusion parameter *D* sets the variance Var[*A*_{OU}(*t′*)] = *Dτ* [36, 37]. The autocorrelation of the OU process is an exponential decay function [38]

$$
AC(t) = e^{-t/\tau}. \tag{4}
$$

Accordingly, the parameter *τ* provides the ground-truth timescale.
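The bias can be reproduced in a few lines: simulate short OU trials, average the per-trial sample autocorrelations, and fit an exponential decay directly. The sketch below is our own construction (a simple grid-search fit is used to avoid extra dependencies); with 100-step trials the fitted timescale falls well below the ground truth:

```python
import numpy as np

def simulate_ou_trials(tau, n_steps, n_trials, rng):
    """Unit-variance OU trials via exact AR(1) updates; columns initially hold
    the innovation noise and are overwritten in place."""
    phi = np.exp(-1.0 / tau)
    a = rng.standard_normal((n_trials, n_steps))   # column 0 = stationary draw
    for i in range(1, n_steps):
        a[:, i] = phi * a[:, i - 1] + np.sqrt(1 - phi**2) * a[:, i]
    return a

def trial_average_ac(trials, max_lag):
    """Average the per-trial sample autocorrelations (cf. Eq. 2)."""
    acs = []
    for a in trials:
        n, var = len(a), np.var(a)
        acs.append([np.mean((a[: n - j] - a[: n - j].mean())
                            * (a[j:] - a[j:].mean())) / var
                    for j in range(max_lag + 1)])
    return np.asarray(acs).mean(axis=0)

def fit_exp_timescale(ac, lags, taus=np.linspace(0.5, 100.0, 400)):
    """Least-squares fit of exp(-t/tau) by a grid search over tau."""
    errors = [np.sum((ac - np.exp(-lags / t)) ** 2) for t in taus]
    return taus[int(np.argmin(errors))]

rng = np.random.default_rng(0)
tau_true = 20.0
trials = simulate_ou_trials(tau_true, n_steps=100, n_trials=500, rng=rng)
ac = trial_average_ac(trials, max_lag=50)
tau_fit = fit_exp_timescale(ac, np.arange(51))
# with 100-step trials, tau_fit falls well below tau_true: the negative bias
```

Increasing `n_steps` moves `tau_fit` toward `tau_true`, mirroring the dependence on trial duration described in the text.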

Second, we used a linear mixture of an OU process and an additional oscillatory component (Fig. 1B):

$$
A(t') = \sqrt{c_1}\, A_{\mathrm{OU}}(t') + \sqrt{2(1 - c_1) D \tau}\, \sin\!\left(2\pi f t' + \varphi\right). \tag{5}
$$

Here the coefficient *c*_{1} ∈ [0,1] sets the relative weights of the two components without changing the total variance of the process, and the phase *φ* is drawn on each trial from a uniform distribution on [0, 2*π*]. This process resembles the firing rate of a neuron modulated by a slow oscillation [39, 40]. The autocorrelation of this process is given by

$$
AC(t) = c_1 e^{-t/\tau} + (1 - c_1)\cos(2\pi f t), \tag{6}
$$

hence the ground-truth timescale is defined by the OU parameter *τ*. When fitting this functional form to the sample autocorrelation, we assumed that the frequency *f* is known and only estimated the timescale *τ* and the coefficient *c*_{1}.

Finally, we used an inhomogeneous Poisson process with the instantaneous rate modeled as a linear mixture of two OU processes with different timescales (Eq. 11, Fig. 1C). The OU mixture was shifted, scaled and rectified to produce the Poisson rate with the desired mean and variance (Eq. 15). Similar doubly-stochastic processes are often used to model spiking activity of neurons [41]. The total variance of this process consists of two contributions: the rate variance and the Poisson process variance (Eq. 17) [42]. We simulated this process in discrete time bins by sampling event-counts from a Poisson distribution with the given instantaneous rate (see Methods 4.1.3). The autocorrelation at all lags *t*_{j} (*j* > 0) arises only from the autocorrelation of the rate, since the Poisson process has no temporal correlations. The drop of autocorrelation between *t*_{0} and *t*_{1} mainly reflects the Poisson-process variance, and for all non-zero lags *t*_{j} (*j* > 0) the autocorrelation is given by

$$
AC(t_j) = \frac{\sigma^2 - \mu}{\sigma^2}\left[c_1 e^{-t_j/\tau_1} + (1 - c_1)\, e^{-t_j/\tau_2}\right], \quad j > 0. \tag{7}
$$

Here *τ*_{1} and *τ*_{2} are the ground-truth timescales defined by the parameters of two OU processes, *c*_{1} is the mixing coefficient, and *µ* and *σ*^{2} are the mean and variance of the event-counts, respectively. The variance of event-counts is related to the variance of the rate via Eq. 18. We fitted this analytical formula to the sample autocorrelation starting from the lag *t*_{1}. We assumed that the mean and variance of the rate are known and only fitted the timescales *τ*_{1} and *τ*_{2} and the coefficient *c*_{1}.
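Such a doubly-stochastic process is straightforward to simulate. The sketch below is our own construction: a unit-variance mixture of two OU processes is shifted by the target mean, scaled by the target rate standard deviation, rectified, and used as the instantaneous rate of a Poisson count process:

```python
import numpy as np

def two_timescale_poisson(tau1, tau2, c1, mu, var_rate, n_steps, n_trials, seed=0):
    """Event-counts of an inhomogeneous Poisson process whose rate is a
    shifted, scaled and rectified mixture of two OU processes."""
    rng = np.random.default_rng(seed)

    def ou(tau):  # unit-variance OU realizations via exact AR(1) updates
        phi = np.exp(-1.0 / tau)
        x = rng.standard_normal((n_trials, n_steps))
        for i in range(1, n_steps):
            x[:, i] = phi * x[:, i - 1] + np.sqrt(1 - phi**2) * x[:, i]
        return x

    mix = np.sqrt(c1) * ou(tau1) + np.sqrt(1 - c1) * ou(tau2)  # unit variance
    rate = np.maximum(mu + np.sqrt(var_rate) * mix, 0.0)       # shift, scale, rectify
    return rng.poisson(rate)

counts = two_timescale_poisson(tau1=5.0, tau2=50.0, c1=0.6, mu=1.0,
                               var_rate=0.25, n_steps=200, n_trials=100)
```

Note that the rectification slightly perturbs the nominal mean and variance when the rate frequently hits zero; for the mild parameters above the effect is small.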

For all three processes, the sample autocorrelation exhibits a negative bias: the values of the sample autocorrelation are systematically below the ground-truth autocorrelation function (Fig. 1, left). This bias is clearly visible in the logarithmic-linear scale, where the ground-truth exponential decay turns into a straight line. The sample autocorrelation deviates from the straight line and even becomes systematically negative at intermediate lags (hence not visible in logarithmic scale) for processes with a strictly positive ground-truth autocorrelation (Fig. 1A,C). The deviations from the ground truth are larger when the timescales are longer or when multiple timescales are involved. The negative bias decreases for longer trial durations (Fig. 1, left inset), but it is still substantial for realistic trial durations such as in neuroscience data.

Due to the negative bias, a direct fit of the sample autocorrelation with the correct analytical function cannot recover the ground-truth timescales, systematically underestimating them (Fig. 1, middle, right). When increasing the duration of each trial, the timescales obtained from the direct fits become closer to the ground-truth values (Supplementary Fig. 1). This observation indicates that timescales estimated from datasets with different trial durations cannot be directly compared, as differences in the estimation bias may lead to misleading interpretations. Thus direct fitting of a sample autocorrelation, even when the correct analytical form of the autocorrelation is known, is not a reliable method for measuring timescales in experimental data.

### 2.2 Estimating timescales by fitting generative models with Approximate Bayesian Computations

Since direct fitting of the sample autocorrelation cannot estimate timescales reliably, we developed an alternative method based on fitting the sample autocorrelation with a generative model. Using a generative model with known ground-truth timescales, we can generate synthetic data that match the essential statistics of the observed data, i.e. with the same duration and number of trials, mean and variance. Hence, when the shapes of their autocorrelations match, the synthetic and observed data are affected by a similar statistical bias. As a generative model, we chose a linear mixture of OU processes—one for each estimated timescale—which, if necessary, can be augmented with additional temporal structure (e.g., oscillations) and noise. The advantage of using a mixture of OU processes for modelling the observed data is that the analytical autocorrelation function of this mixture explicitly defines the timescales. We set the number of components in the generative model in accordance with our hypothesis about the shape of the autocorrelation in the data, e.g., the number of timescales, additional temporal structure and noise; see Methods 4.1. Then, we optimize the parameters of the generative model to match the shape of the autocorrelation between the synthetic and observed data. The timescales of the optimized generative model provide an approximation for the timescales in the observed data, without the corruption by the statistical bias.

For complex generative models, calculating the likelihood function can be computationally expensive or even intractable. Therefore, we optimize the generative model parameters using adaptive Approximate Bayesian Computations (aABC) [35] (Fig. 2). aABC is an iterative algorithm that minimizes the distance between the summary statistics of synthetic and observed data. We choose the autocorrelation as the summary statistics and define the distance *d* between the autocorrelations of synthetic and observed data as

$$
d = \frac{1}{m} \sum_{j=0}^{m} \left( AC_{\mathrm{obs}}(t_j) - AC_{\mathrm{syn}}(t_j) \right)^2, \tag{8}
$$

where *t*_{m} is the maximum time-lag considered in computing the distance. Due to the stochasticity of the observed data, point estimates (i.e. assigning a single value to a timescale) are not reliable, as different realizations of the same stochastic process lead to slightly different autocorrelation shapes (Supplementary Fig. 2). aABC overcomes this problem by estimating the joint posterior distribution of the generative model parameters, which quantifies the estimation uncertainty. The uncertainty depends on the maximum time-lag *t*_{m}. A smaller *t*_{m} excludes noisy values in the tail of the autocorrelation, which can produce narrower posterior distributions in fewer iterations. On the other hand, a larger *t*_{m} may be necessary for estimating long timescales.

The aABC algorithm operates iteratively (Fig. 2). First, we choose a multivariate prior distribution over the parameters of the generative model and set an initial error threshold *ε* at a rather large value. On each iteration, we draw a sample of the parameter-vector *θ*. On the first iteration, *θ* is drawn directly from the prior distribution. On subsequent iterations, *θ* is drawn from a proposal distribution defined based on the prior distribution and the parameter samples accepted on the previous iteration. We use the generative model with the sample parameters *θ* to generate synthetic data. If the distance *d* between the autocorrelations of the observed and synthetic data (Eq. 8) is smaller than the error threshold, the sample *θ* is accepted. We then repeatedly draw new parameter samples and evaluate *d* until a fixed number of parameter samples are accepted. For each iteration, the fraction of accepted samples out of all drawn parameter samples is recorded as the acceptance rate *accR*. Next, the proposal distribution is updated using the samples accepted on the current iteration, and the new error threshold is set at the first quartile of the distance *d* for the accepted samples. The iterations continue until the acceptance rate reaches a specified value. The last set of accepted parameter samples is treated as an approximation for the posterior distribution (see Methods 4.2). To visualize the posterior distribution of a parameter (e.g., a timescale), we marginalize the obtained multivariate posterior distribution over all other parameters of the generative model.
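The loop above can be sketched in code for the simplest case of a single scalar parameter (the timescale of a one-timescale OU generative model). This is an illustration of the algorithmic skeleton, not the authors' implementation: the function names, the uniform prior, and the Gaussian perturbation kernel with twice the weighted sample variance of the previous population are our assumptions:

```python
import numpy as np

def ou_ac_summary(tau, rng, n_trials=5, n_steps=80, max_lag=20):
    """Generative model + summary statistics: trial-averaged sample AC of an
    OU process (columns initially hold the innovation noise, overwritten in place)."""
    phi = np.exp(-1.0 / tau)
    a = rng.standard_normal((n_trials, n_steps))
    for i in range(1, n_steps):
        a[:, i] = phi * a[:, i - 1] + np.sqrt(1 - phi**2) * a[:, i]
    acs = []
    for row in a:
        n, var = n_steps, np.var(row)
        acs.append([np.mean((row[: n - j] - row[: n - j].mean())
                            * (row[j:] - row[j:].mean())) / var
                    for j in range(max_lag + 1)])
    return np.asarray(acs).mean(axis=0)

def distance(ac_obs, ac_syn):
    return np.mean((ac_obs - ac_syn) ** 2)   # cf. Eq. 8

def aabc(ac_obs, simulate, lo=1.0, hi=80.0, n_particles=100,
         min_acc=0.1, max_iter=20, seed=1):
    """Adaptive ABC with population Monte-Carlo sampling for one scalar
    parameter with a uniform prior on [lo, hi]."""
    rng = np.random.default_rng(seed)
    # iteration 1: large initial threshold -> accept everything drawn from the prior
    particles = rng.uniform(lo, hi, n_particles)
    dists = np.array([distance(ac_obs, simulate(t, rng)) for t in particles])
    weights = np.full(n_particles, 1.0 / n_particles)
    for _ in range(max_iter):
        eps = np.quantile(dists, 0.25)       # new threshold: first quartile
        kernel_sd = float(np.sqrt(2.0 * np.cov(particles, aweights=weights)))
        accepted, acc_d, n_drawn = [], [], 0
        while len(accepted) < n_particles:
            n_drawn += 1
            theta = (particles[rng.choice(n_particles, p=weights)]
                     + kernel_sd * rng.standard_normal())
            if not lo <= theta <= hi:        # zero prior density -> reject
                continue
            d = distance(ac_obs, simulate(theta, rng))
            if d < eps:
                accepted.append(theta)
                acc_d.append(d)
        acc_rate = n_particles / n_drawn
        accepted = np.array(accepted)
        # importance weights: (flat) prior over the proposal mixture density,
        # up to a constant that cancels in the normalization
        mix = np.sum(weights * np.exp(-0.5 * ((accepted[:, None] - particles[None, :])
                                              / kernel_sd) ** 2), axis=1)
        weights = (1.0 / mix) / np.sum(1.0 / mix)
        particles, dists = accepted, np.array(acc_d)
        if acc_rate <= min_acc:              # stop at the target acceptance rate
            break
    return particles, weights                # approximate posterior samples
```

The returned weighted particles approximate the posterior; marginal statistics such as the weighted mean or the MAP estimate can be read off directly.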

We tested our method on the same synthetic data from the processes with known ground-truth timescales that we used to demonstrate the bias of direct exponential fitting (Fig. 3, cf. Fig. 1). For all three processes, the shape of the sample autocorrelation of the observed data is accurately reproduced by the autocorrelation of synthetic data generated using the maximum *a posteriori* (MAP) estimate of parameters from the multivariate posterior distribution (Fig. 3, left). The posterior distributions inferred by the aABC algorithm include the ground-truth timescales (Fig. 3, middle). The variance of the posterior distributions quantifies the uncertainty of the estimates. In our simulations, the number of trials controls the signal-to-noise ratio in the sample autocorrelation, and consequently the width of the posterior distributions (Supplementary Fig. 3). The width of the posterior distributions also depends on the final acceptance rate (Fig. 3, right). A smaller final acceptance rate results in a narrower posterior distribution but requires a larger number of iterations.

### 2.3 Estimating the timescale of activity in a branching network model

So far, we demonstrated that our aABC method accurately recovers the ground-truth timescales within the same model class, i.e. when the generative model and the process that produced the observed data are the same. However, the timescale inference based on OU processes is broadly applicable to data outside this model class, i.e. when the mechanism that generated the exponential decay of autocorrelation in the data is different from an OU process.

We tested our inference method on data from a different model class with a known ground-truth autocorrelation function. Specifically, we applied our method to estimate the timescale of the global activity in a branching network model [43–45]. A branching network consists of *k* interconnected binary neurons, each described by the state variable *x*_{i} ∈ {0, 1}, where *x*_{i} = 1 indicates that neuron *i* is active, and 0 that it is silent (Fig. 4A). We considered a fully-connected network. Each active neuron can activate other neurons with the probability *p* = *m/k* and then, if not activated by other neurons, it becomes inactive again in the next time-step. Additionally, at every time-step, each neuron can be activated with a probability *h* by an external input. For a small input strength *h*, the dynamics of the network are governed by the branching parameter *m* (*m* = 1 corresponds to the critical state). The autocorrelation function of the global activity *A*(*t′*) = Σ_{i} *x*_{i}(*t′*) in this network is known analytically: *AC*(*t*_{j}) = exp(*t*_{j} ln *m*) = *m*^{*t*_{j}} [13]. Thus the ground-truth timescale of this activity is *τ* = −1/ln *m*.
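The branching network takes only a few lines to simulate. In this sketch (our own; the small probability that a neuron activates itself is ignored), each neuron is activated with the probability of receiving at least one input:

```python
import numpy as np

def branching_activity(m, h, k=1000, n_steps=5000, seed=2):
    """Global activity A(t') = sum_i x_i(t') of a fully-connected branching network.

    Each neuron is activated with probability p = m/k by every active neuron
    and with probability h by the external input, independently.
    """
    rng = np.random.default_rng(seed)
    activity = np.empty(n_steps, dtype=np.int64)
    activity[0] = rng.binomial(k, h)
    for t in range(1, n_steps):
        # probability that a given neuron receives at least one input
        p_act = 1.0 - (1.0 - m / k) ** activity[t - 1] * (1.0 - h)
        activity[t] = rng.binomial(k, p_act)
    return activity

a = branching_activity(m=0.9, h=0.01)
# ground-truth timescale for small h: tau = -1/ln(m) ≈ 9.5 time-steps at m = 0.9
```

Note that a non-negligible input strength `h` slightly saturates the dynamics and shortens the effective timescale relative to −1/ln *m*, consistent with the small-*h* condition stated above.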

We simulated the binary network model to generate the time-series data of the global activity. We then used aABC to fit the sample autocorrelation of these data with a one-timescale OU process as the generative model. The inferred posterior distribution is centered on the theoretically predicted timescale of the branching network, and the MAP estimate parameters accurately reproduce the shape of the sample autocorrelation of the network activity (Fig. 4B,C). These results show that our method can be used to estimate timescales in diverse types of data with different underlying mechanisms.

### 2.4 Model selection with Approximate Bayesian Computations

For experimental data, the correct generative model is usually unknown. For example, it may not be obvious *a priori* how many timescales the generative model should contain. Several alternative hypotheses may be plausible, and we need a procedure to select which one is more likely to explain the observed data. Assuming that the autocorrelation function is a sufficient summary statistic for estimating timescales, we can use aABC to approximate the Bayes factor for selecting between alternative models [46–49].

We developed a procedure for selecting between two alternative models *M*_{1} and *M*_{2} based on their aABC fits. Models are compared using a goodness-of-fit measure that describes how well each model fits the data. The goodness of fit can be measured by the distance *d* between the autocorrelations of synthetic and observed data (Eq. 8). However, *d* is a noisy measure because of the finite sample-size of synthetic data and the uncertainty in the model parameters. Therefore, we compare the distributions of distances generated by the two models with parameter uncertainty defined by their posterior distributions. To approximate the distributions of distances, we generate multiple samples of synthetic data from each model with parameters drawn from its posterior distribution and compute the distance *d* for each sample. If the distributions of distances are significantly different (e.g., according to a Wilcoxon rank-sum test), then we continue with the model selection; otherwise, the summary statistic does not allow us to distinguish these models.

For selecting between *M*_{1} and *M*_{2}, we use the distributions of distances to estimate the Bayes factor, which is the ratio of marginal likelihoods of the two models [50, 51]. Assuming both models are *a priori* equally probable (*p*(*M*_{1}) = *p*(*M*_{2})), the Bayes factor can be approximated using the models' acceptance rates for a specific error threshold *ε* [46, 48, 52]:

$$
B_{21}(\varepsilon) = \frac{accR_2(\varepsilon)}{accR_1(\varepsilon)}. \tag{9}
$$

*B*_{21}(*ε*) > 1 indicates that the model *M*_{2} is more likely to explain the observed data, and vice versa. To eliminate the dependence on a specific error threshold, we compute the acceptance rates and the Bayes factor with a varying error threshold. For each error threshold *ε*, the acceptance rate is given by the cumulative distribution function of the distances, *accR*_{i}(*ε*) = CDF_{d_i}(*ε*). Hence, the ratio between the CDFs of the two models gives the value of the Bayes factor for every error threshold: *B*_{21}(*ε*) = CDF_{d_2}(*ε*)/CDF_{d_1}(*ε*). We select the model *M*_{2} if *B*_{21}(*ε*) > 1, i.e. if CDF_{d_2}(*ε*) > CDF_{d_1}(*ε*) for all *ε*, and vice versa.
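In code, the threshold-dependent Bayes factor reduces to a ratio of empirical cumulative distribution functions of the two distance samples; a minimal sketch (the function and variable names are ours):

```python
import numpy as np

def bayes_factor_curve(d1, d2, n_grid=50):
    """Approximate B21(eps) = CDF_{d2}(eps) / CDF_{d1}(eps) on a grid of
    error thresholds spanning both distance samples."""
    d1, d2 = np.sort(np.asarray(d1)), np.sort(np.asarray(d2))
    eps_grid = np.linspace(min(d1[0], d2[0]), max(d1[-1], d2[-1]), n_grid)
    # empirical CDF: fraction of distances <= eps
    cdf1 = np.searchsorted(d1, eps_grid, side="right") / d1.size
    cdf2 = np.searchsorted(d2, eps_grid, side="right") / d2.size
    with np.errstate(divide="ignore", invalid="ignore"):
        b21 = np.where(cdf1 > 0, cdf2 / cdf1, np.inf)
    return eps_grid, b21
```

If `b21` stays above 1 over the whole range of thresholds, model 2 is selected; if it stays below 1, model 1 is selected.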

We evaluated our model selection method using synthetic data from three example processes with known ground truth, so that the correct number of timescales is known. Specifically, we used an OU process with a single timescale (Fig. 5A) and two different examples of an inhomogeneous Poisson process with two timescales. In the first example of an inhomogeneous Poisson process, the ground-truth timescales were well separated, so that the shape of the data autocorrelation suggested that the underlying process had multiple timescales (Fig. 5D). In the second example of an inhomogeneous Poisson process, we chose the ground-truth timescales to be more similar, so that a simple visual inspection of the data autocorrelation could not clearly suggest the number of timescales in the underlying process (Fig. 5G). For all three example processes, we fitted the data with one-timescale (*M*_{1}) and two-timescale (*M*_{2}) generative models using aABC and selected between these models using Bayes factors. The one- and two-timescale models were based on a single OU process or a linear mixture of two OU processes, respectively. For the data from inhomogeneous Poisson processes, the generative model also incorporated an inhomogeneous Poisson noise.

For the example OU process with a single timescale, the one- and two-timescale models fitted the shape of the data autocorrelation almost equally well (Fig. 5A). The marginal posterior distributions of the two timescales estimated by the two-timescale model are not significantly different from each other (Fig. 5B), which indicates that the one-timescale model possibly better describes the data. To select between the two models, we compare the cumulative distributions of distances (Fig. 5C). Although the two-timescale model has more parameters, it produces significantly larger distances than the one-timescale model (Wilcoxon rank-sum test, *P* = 0.002). The two-timescale model has a larger average distance because its posterior distribution has a larger variance (larger uncertainty), which increases the probability of sampling a parameter combination with a large distance. Since CDF_{d_2}(*ε*) < CDF_{d_1}(*ε*) (i.e. *B*_{21}(*ε*) < 1) for all *ε*, the one-timescale model is preferred over the two-timescale model, in agreement with the ground-truth generative process.

For both example inhomogeneous Poisson processes with two timescales, the shape of the data autocorrelation is better matched by the two-timescale than by the one-timescale model (the difference is subtle for the second example, Fig. 5D,G). The marginal posterior distributions of the two timescales estimated by the two-timescale model are clearly separated and include the ground-truth values, whereas the timescale estimated by the one-timescale model is in between the two ground-truth values (Fig. 5E,H). The two-timescale model has significantly smaller distances (Wilcoxon rank-sum test, Fig. 5F: *P* < 10^{−10}; Fig. 5I: *P* < 10^{−10}). Since CDF_{d_2}(*ε*) > CDF_{d_1}(*ε*) (i.e. *B*_{21}(*ε*) > 1) for all error thresholds, the two-timescale model provides a better description of the data for both examples, in agreement with the ground truth. Thus, our method selects the correct generative model even in a challenging case where the shape of the data autocorrelation does not suggest the existence of multiple timescales.

### 2.5 Estimating timescales of ongoing neural activity

To illustrate an application of our method to experimental data, we inferred the timescales of ongoing spiking activity in the primate visual cortex. The spiking activity was recorded from the visual area V4 with a 16-channel microelectrode array [53]. During recordings, a monkey fixated a central dot on a blank screen for 3 s on each trial. To estimate the timescales, we pooled the spiking activity recorded across all channels and computed the spike-count autocorrelation of the pooled activity with a bin-size of 1 ms.

Previously, the autocorrelation of neural activity in several brain regions was modeled as an exponential decay with a single timescale [3]. To determine whether a single timescale is sufficient to describe the temporal dynamics of neural activity in our data, we compared the one-timescale (*M*_{1}) and two-timescale (*M*_{2}) models and selected the model that better describes the data. As a generative model for neural spike-counts, we used a doubly-stochastic process [41, 42], where spike-counts are generated from an instantaneous firing rate modelled as one OU process (*M*_{1}) or a mixture of two OU processes (*M*_{2}). To account for the non-Poisson statistics of the spike-generation process, we sampled spike-counts from a gamma distribution [54] (see Methods 4.1.3). We fitted both models with the aABC algorithm and selected between the models using Bayes factors (Fig. 6A-C). The two-timescale model provided a better description of the data, since it had smaller distances and CDF_{d_2}(*ε*) > CDF_{d_1}(*ε*) for all error thresholds (Fig. 6C, Wilcoxon rank-sum test, *P* < 10^{−10}).

We further compared our method with a direct exponential fit of the sample autocorrelation, which is usually employed to infer the timescales of neural activity [3, 13, 20–23]. We fitted the sample autocorrelation with a double exponential function

$$
AC(t_j) = c_1 e^{-t_j/\tau_1} + (1 - c_1)\, e^{-t_j/\tau_2}, \tag{10}
$$
and compared the result with the two-timescale aABC fit (Fig. 6B). Similar to what we observed with synthetic data, the direct exponential fit produced timescales that were systematically smaller than the MAP estimates with the aABC fit. Since the ground-truth timescales are not available for biological data, we used a sampling procedure to evaluate whether the data is better described by the timescales from the direct fit or from the MAP estimates with the aABC fit. We generated multiple samples of synthetic data using the two-timescale doubly-stochastic generative model with the parameters from either the direct fit or MAP estimates from aABC. For each sample, we measured the distance *d* between the autocorrelation of synthetic and neural data to obtain the distribution of distances for both types of fits (Fig. 6D). The distances were significantly smaller for synthetic data generated with the parameters from the MAP estimate with aABC than from the direct exponential fit (Wilcoxon rank-sum test, *P* < 10^{−10}, mean distance of MAP parameters 10^{−4}, mean distance of exponential fit parameters 3 × 10^{−4}). Our method performs better because it accounts for the bias in the autocorrelation shape which is ignored by the direct fit. These results suggest that our method estimates the timescales of neural activity more accurately than a direct exponential fit and, moreover, allows for comparing alternative hypotheses about the underlying dynamics.

## 3 Discussion

We demonstrated that direct exponential fitting often fails to recover the correct timescales due to a statistical bias in the shape of sample autocorrelation. To overcome this problem, we developed a method based on adaptive Approximate Bayesian Computations. Our method fits the sample autocorrelation of the data with a generative model which is based on a mixture of Ornstein-Uhlenbeck processes (one for each estimated timescale) and can incorporate additional temporal structure and noise. Our method infers a posterior distribution of timescales consistent with the observed data. The width of the posterior distribution depends on the noise in the data and quantifies the estimation uncertainty, e.g., it can be used as a confidence interval. The posterior distributions can be used for model selection to compare alternative hypotheses about the dynamics of the underlying process. Our method is suitable for comparing timescales between datasets with different durations and between processes that exhibit different temporal dynamics (e.g., oscillations).

The statistical bias in sample autocorrelation arises primarily due to the deviation of the sample mean from the ground-truth mean. If the ground-truth mean of the process is known, then using the true mean instead of the sample mean for computing the autocorrelation largely eliminates the bias. In this case, direct fitting of exponential decay functions can produce satisfactory estimates, but our method additionally provides confidence intervals. When the true mean is unknown but it is known to be the same across all trials, it is beneficial to estimate a single sample mean from the whole dataset instead of estimating the mean for each trial individually. However, this assumption does not always hold. For example, the mean of spiking neural activity can change across trials because of changes in the animal's arousal state. If the assumption of a constant mean is violated in the data, estimating the sample mean from the whole dataset leads to strong distortions of the autocorrelation shape, introducing additional slow timescales [55].
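This can be checked directly in simulation: for short unit-variance OU trials, an estimator that uses the known true mean (zero) and true variance (one) stays close to the ground-truth autocorrelation, while the sample-mean estimator falls below it. A small self-contained check (our own construction):

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n_steps, n_trials, lag = 20.0, 100, 500, 10
phi = np.exp(-1.0 / tau)

# unit-variance OU trials (true mean 0, true variance 1) via exact AR(1) updates;
# columns initially hold the innovation noise and are overwritten in place
a = rng.standard_normal((n_trials, n_steps))
for i in range(1, n_steps):
    a[:, i] = phi * a[:, i - 1] + np.sqrt(1 - phi**2) * a[:, i]

x, y = a[:, : n_steps - lag], a[:, lag:]
# estimator using the known true mean (0) and true variance (1): nearly unbiased
ac_true_mean = np.mean(x * y)
# estimator subtracting per-trial sample means and normalizing by sample variance
ac_sample_mean = np.mean([np.mean((xi - xi.mean()) * (yi - yi.mean())) / np.var(ai)
                          for xi, yi, ai in zip(x, y, a)])
# ground truth at this lag: exp(-10/20) ≈ 0.61; the sample-mean estimate sits below it
```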

One design choice in our method is the definition of the distance between the autocorrelations of the synthetic and observed data. Here we used linear differences, but distances can also be computed on a logarithmic-linear scale. The latter choice better resolves the behavior of the autocorrelation tail at large time-lags. However, the tail is also subject to the largest noise, and emphasizing it slows down the convergence of the aABC algorithm.
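To make the two distance definitions concrete, here is a minimal sketch (the function name and the handling of non-positive autocorrelation values in the log-scale branch are our own choices, not from the original implementation):

```python
import numpy as np

def ac_distance(ac_data, ac_model, t_m, log_scale=False):
    """Mean squared deviation between two sample autocorrelations
    up to time-lag index t_m (cf. Eq. 8).

    With log_scale=True, deviations are computed between the
    log-transformed autocorrelations, which emphasizes the tail at
    large time-lags; lags where either autocorrelation is non-positive
    are dropped, since the logarithm is undefined there.
    """
    a = np.asarray(ac_data[:t_m], dtype=float)
    b = np.asarray(ac_model[:t_m], dtype=float)
    if log_scale:
        mask = (a > 0) & (b > 0)
        a, b = np.log(a[mask]), np.log(b[mask])
    return np.mean((a - b) ** 2)
```

For two exponentially decaying autocorrelations, the log-scale distance weighs the slowly decaying tail more heavily than the linear distance, illustrating the trade-off described above.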

We validated our method using data from synthetic processes, where the exponential decay rates of the autocorrelation are known analytically by design. We further illustrated our method on spiking neural activity data, where the underlying ground-truth process is unknown. In this case, the posterior distributions estimated by our method can be used to approximate the Bayes factor for selecting between different generative models that represent alternative hypotheses about neural dynamics. Since the Bayes factor takes model complexity into account, it can be used to compare models with different numbers of parameters, e.g., to infer the number of timescales in the data.

Our method reliably estimates the correct timescales when the standard direct exponential fitting fails, but this advantage comes at the price of higher computational cost. Thus, when long time-series data are available and the statistical bias does not corrupt the results, the direct exponential fit of the sample autocorrelation may be preferred. To test whether this is the case, a generative model (e.g., based on a mixture of OU processes) with parameters obtained from the direct exponential fit can be used to generate multiple samples of synthetic data, each with the same amount of data as in the experiment. For each sample of synthetic data, timescales are estimated again by direct exponential fitting, and the obtained distribution of timescales is compared to the original timescales estimated from the experimental data. If there is no significant difference, the standard direct exponential fit may be sufficiently accurate.
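This parametric bootstrap check can be sketched for a single-timescale OU process, with SciPy's `curve_fit` standing in for the direct exponential fit (function names, trial counts, and the fitted value `tau_fit` are illustrative assumptions, not values from the paper):

```python
import numpy as np
from scipy.optimize import curve_fit

def simulate_ou(tau, n_trials, n_steps, dt=1.0, seed=None):
    """Euler simulation of OU trials with unit variance (D = 1/tau)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_trials, n_steps))
    x = np.empty((n_trials, n_steps))
    x[:, 0] = noise[:, 0]                      # stationary initial condition
    for t in range(1, n_steps):
        x[:, t] = x[:, t - 1] * (1 - dt / tau) + np.sqrt(2 * dt / tau) * noise[:, t]
    return x

def fit_timescale(trials, max_lag, dt=1.0):
    """Direct exponential fit of the trial-averaged sample autocorrelation."""
    lags = np.arange(max_lag)
    acs = []
    for x in trials:
        x = x - x.mean()                       # per-trial sample mean (the bias source)
        c = np.correlate(x, x, "full")[x.size - 1:x.size - 1 + max_lag]
        acs.append(c / c[0])
    ac = np.mean(acs, axis=0)
    (tau_hat,), _ = curve_fit(lambda t, tau: np.exp(-t / tau), lags * dt, ac,
                              p0=[10.0], bounds=(0.1, 200.0))
    return tau_hat

# Parametric bootstrap: re-estimate the timescale on synthetic datasets of the
# same size; a systematic shift of the bootstrap distribution away from the
# fitted value indicates that the direct fit is biased at this data amount.
tau_fit = 20.0                                 # timescale from the direct fit (assumed)
boot = [fit_timescale(simulate_ou(tau_fit, 50, 100, seed=s), 30) for s in range(20)]
```

With short trials (here 100 time-steps, i.e. five times the timescale), the bootstrap estimates fall systematically below `tau_fit`, reproducing the bias discussed in the paper.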

The general approach of inferring timescales with aABC based on OU processes can be adapted to various types of data and different generative models. Our method can find broad applications in different fields including neuroscience, epidemiology, and physics. This general approach is particularly favorable for data organized in short trials or trials of different durations, when standard exponential fitting is unreliable, and it allows for comparing timescales between datasets with different trial durations and data amount.

## 4 Methods

### 4.1 Generative models

We used several generative models based on a linear mixture of OU processes—one for each estimated timescale—sometimes augmented with additional temporal structure (e.g., oscillations) and noise.

#### 4.1.1 Ornstein–Uhlenbeck process with multiple timescales

We define an OU process with multiple timescales *A*(*t′*) as a linear mixture of OU processes *A*_{k}(*t′*) with timescales *τ*_{k}, *k* ∈ {1, …, *n*}, zero mean and unit variance:

*A*(*t′*) = ∑_{k=1}^{n} *c*_{k} *A*_{k}(*t′*). (11)

Here *n* is the number of timescales in the mixture and *c*_{k} are the mixing coefficients. We simulate each OU process *A*_{k} by iterating its time-discrete version using the Euler scheme [38]

*A*_{k}(*t′* + ∆*t′*) = *A*_{k}(*t′*)(1 − ∆*t′*/*τ*_{k}) + √(2*D*_{k}∆*t′*) *η*(*t′*), (12)

where ∆*t′* is the discretization time-step and *η*(*t′*) is a random number drawn from the standard normal distribution. We set the unit variance for each OU process, Var(*A*_{k}) = *D*_{k}*τ*_{k} = 1, by fixing *D*_{k} = 1/*τ*_{k}. The parameter-vector *θ* for a linear mixture of *n* OU processes consists of 2*n* − 1 values: *n* timescales *τ*_{k} and *n* − 1 mixing coefficients *c*_{k} (see Eq. 11).

We match the mean and variance of the multi-timescale OU process to the sample mean and sample variance of the observed data using a linear transformation:

*A*_{trans}(*t′*) = *σ*_{data} *A*(*t′*)/√(Var[*A*]) + *µ*_{data}, (13)

where *µ*_{data} and *σ*^{2}_{data} are the sample mean and variance of the observed data.

We use the process *A*_{trans}(*t′*) as a generative model for data fitting and hypothesis testing (Methods 4.2).
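A minimal sketch of the mixture and the moment-matching step follows (the function names are ours; the transformation below standardizes by the sample moments of the simulated trial, a simplification of the exact linear transformation described above):

```python
import numpy as np

def multi_ou(taus, coeffs, n_steps, dt=1.0, rng=None):
    """Linear mixture of zero-mean, unit-variance OU processes (cf. Eq. 11).

    Each component is iterated with the Euler scheme; D_k = 1/tau_k
    fixes Var(A_k) = D_k * tau_k = 1.
    """
    rng = rng or np.random.default_rng()
    mix = np.zeros(n_steps)
    for tau, c in zip(taus, coeffs):
        a = np.empty(n_steps)
        a[0] = rng.standard_normal()           # stationary initial condition
        noise = rng.standard_normal(n_steps)
        for t in range(1, n_steps):
            a[t] = a[t - 1] * (1 - dt / tau) + np.sqrt(2 * dt / tau) * noise[t]
        mix += c * a
    return mix

def match_moments(a, data_mean, data_var):
    """Linear transformation matching the sample mean and variance of the data."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean()) / a.std() * np.sqrt(data_var) + data_mean
```

After `match_moments`, the synthetic trial has exactly the requested sample mean and variance, as required for comparing its autocorrelation against the observed data.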

#### 4.1.2 Multi-timescale Ornstein–Uhlenbeck process with an oscillation

To obtain a generative model with an oscillation, we add to a multi-timescale OU process (Eq. 11) an oscillatory component with the weight *c*_{n+1}:

*A*_{osc}(*t′*) = ∑_{k=1}^{n} *c*_{k} *A*_{k}(*t′*) + *c*_{n+1} sin(2*πf t′* + *φ*). (14)

Here we assume that the frequency *f* is known and only fit the weight *c*_{n+1}. In applications, the frequency *f* can be estimated from the power spectrum of the data time-series, or it can be fitted with aABC as an additional parameter. For each trial, we draw the phase *φ* independently from a uniform distribution on [0, 2*π*]. We use the linear transformation Eq. 13 to match the mean and variance of this generative process to the observed data.
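The per-trial oscillatory component can be sketched as follows (the function name and argument names are illustrative; the final moment-matching transformation is omitted here):

```python
import numpy as np

def ou_with_oscillation(ou_trial, c_osc, f, dt=1.0, rng=None):
    """Add an oscillatory component with weight c_osc to one OU trial.

    The phase is drawn independently per trial from U[0, 2*pi], so the
    oscillation contributes to the trial-averaged autocorrelation but
    averages out of the trial-averaged signal itself.
    """
    rng = rng or np.random.default_rng()
    t = np.arange(ou_trial.size) * dt          # time axis of the trial
    phase = rng.uniform(0.0, 2.0 * np.pi)      # random phase for this trial
    return ou_trial + c_osc * np.sin(2.0 * np.pi * f * t + phase)
```

Drawing a fresh phase per trial is what distinguishes this model from simply adding a deterministic sinusoid to every trial.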

#### 4.1.3 Doubly stochastic process with multiple timescales

The doubly stochastic process with multiple timescales is generated in two steps: first generating a time-varying rate, and then generating event-counts from this rate. To generate the time-varying rate, we scale, shift and rectify a multi-timescale OU process (Eq. 11) using the transformation

*A*_{trans}(*t′*) = max[*σ′A*(*t′*) + *µ′*, 0]. (15)

The resulting rate *A*_{trans}(*t′*) is non-negative and for *µ′* ≫ *σ′* it has the mean *µ′* and variance *σ′*^{2}. We then draw event-counts *s* for each time-bin from an event-count distribution *P*(*s*|*λ*), where *λ* = *A*_{trans}(*t′*)∆*t′* is the mean event-count and ∆*t′* is the bin size (in our simulations ∆*t′* = 1 ms). A frequent choice of *P*(*s*|*λ*) is a Poisson distribution

*P*(*s*|*λ*) = *λ*^{s} e^{−λ}/*s*!, (16)

which results in an inhomogeneous Poisson process.
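The two-step construction can be sketched as follows (the function name is ours; `ou` stands for any zero-mean, unit-variance multi-timescale OU sample):

```python
import numpy as np

def doubly_stochastic_counts(ou, mu, sigma, dt=1.0, rng=None):
    """Event-counts from a rectified, rescaled OU rate (inhomogeneous Poisson).

    `ou` is a zero-mean, unit-variance OU sample; the rate is scaled,
    shifted, and rectified at zero, so for mu >> sigma it has mean ~mu
    and variance ~sigma**2. Counts are drawn independently per time-bin.
    """
    rng = rng or np.random.default_rng()
    rate = np.maximum(sigma * np.asarray(ou, dtype=float) + mu, 0.0)
    return rng.poisson(rate * dt)              # one count per time-bin
```

Because the counts are conditionally independent given the rate, only the rate carries the temporal correlations, which is exactly the structure exploited below when separating the two variance contributions.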

To match the mean and variance of the doubly stochastic process to the observed data, we need to estimate the mean rate *µ′*, the variance of the rate *σ′*^{2}, and the variance of the event-count distribution Var(*s*|*λ*). According to the law of total expectation, the mean rate *µ′* is given by the sample mean of the observed event-counts divided by the bin size ∆*t′*. According to the law of total variance [42], the total variance of event-counts *σ*^{2} arises from two contributions: the variance of the rate and the variance of the event-count distribution:

*σ*^{2} = Var(*λ*) + ⟨Var(*s*|*λ*)⟩ = *σ′*^{2}∆*t′*^{2} + ⟨Var(*s*|*λ*)⟩. (17)

For the Poisson distribution, the variance of the event-count distribution is equal to its mean: Var(*s*|*λ*) = *λ*. However, the condition of equal mean and variance does not always hold in experimental data [54]. Therefore, we also use other event-count distributions, in particular the Gaussian and gamma distributions. We define *α* as the variance-to-mean ratio of the event-count distribution, *α* = Var(*s*|*λ*)/*λ*. For the Poisson distribution, *α* = 1 always holds. For other distributions, we assume that *α* is constant (i.e. does not depend on the rate). With this assumption, the law of total variance Eq. 17 becomes

*σ*^{2} = *σ′*^{2}∆*t′*^{2} + *αµ′*∆*t′*, (18)

where *µ′* is the mean rate. From Eq. 18 we find the rate variance

*σ′*^{2} = (*σ*^{2} − *αµ′*∆*t′*)/∆*t′*^{2}, (19)

where *σ*^{2} is estimated by the sample variance of event-counts in the observed data. We find that with the correct values of *µ′*, *σ′*^{2}, and *α*, both the Gaussian and gamma distributions of event-counts produce comparable estimates of timescales (Supplementary Fig. 4).

To calculate the rate variance with Eq. 19, we first need to estimate *α* from the observed data. We estimate *α* from the drop of the autocorrelation of event-counts between the time-lags *t*_{0} and *t*_{1}. Since event-counts in different time-bins are drawn independently, this drop mainly reflects the difference between the total variance of event-counts and the variance of the rate (Eq. 17; we neglect the small decrease of the rate autocorrelation between *t*_{0} and *t*_{1}) and does not depend on timescales. Thus, we find *α* with a grid search that minimizes the distance between the autocorrelations at *t*_{1} of the observed and synthetic data generated from the model with fixed timescales. Alternatively, *α* can be fitted together with all other parameters of the generative model using aABC. We find that since *α* is almost independent of the other parameters, aABC finds the correct value of *α* first and then fits the rest of the parameters. The MAP estimate of *α* converges to the same value as estimated by the grid search, but aABC requires more iterations to obtain posterior distributions of the estimated timescales with a similar variance (Supplementary Fig. 5). Therefore, for practical purposes, it is preferable to first find *α* with the grid search and then obtain narrower posteriors in fewer aABC iterations.
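A simplified version of this grid search can be sketched directly from the law-of-total-variance argument above: because counts in different bins are conditionally independent, the autocorrelation at lag *t*₁ is approximately the rate-variance fraction of the total variance. The function below (our own simplification, which predicts the lag-*t*₁ autocorrelation analytically instead of simulating synthetic data) scans a grid of candidate *α* values:

```python
import numpy as np

def estimate_alpha(counts, t1=1, alpha_grid=np.linspace(0.5, 2.0, 151)):
    """Grid search for the variance-to-mean ratio alpha of the event-count
    distribution, from the drop of the count autocorrelation at lag t1.

    Given the rate, counts in different bins are independent, so
    AC(t1) ~ rate variance / total variance = (var - alpha * mean) / var,
    neglecting the small decay of the rate autocorrelation up to t1.
    """
    # Pool trials after per-trial mean subtraction.
    x = np.concatenate([c - np.mean(c) for c in counts])
    ac = np.correlate(x, x, "full")[x.size - 1:]
    ac = ac / ac[0]
    var = np.var(x)
    mean = np.mean([np.mean(c) for c in counts])   # mean event-count per bin
    predicted = (var - alpha_grid * mean) / var
    return alpha_grid[np.argmin(np.abs(predicted - ac[t1]))]
```

For Poisson counts (where variance equals mean), the search should return a value close to 1, consistent with *α* = 1 holding exactly for the Poisson distribution.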

### 4.2 Optimizing generative model parameters with adaptive Approximate Bayesian Computations

We optimize the parameters of generative models with adaptive Approximate Bayesian Computations (aABC), following the algorithm from Refs. [35, 56]. aABC is an iterative algorithm that approximates the multivariate posterior distribution of model parameters. It uses population Monte-Carlo sampling to minimize the distance between the summary statistics of the observed data and of synthetic data from the generative model. We use the sample autocorrelation as the summary statistic and define the distance *d* as the mean squared deviation between the observed and synthetic autocorrelations up to the time-lag *t*_{m} (Eq. 8).

On the first iteration of the algorithm, the parameters of the generative model are drawn from the prior distribution. We use a multidimensional uniform prior distribution *π*(*θ*) over the fitted parameters (e.g., timescales and their weights). The domain of the prior distribution for the timescales is chosen to include a broad range below and above the timescales estimated by direct exponential fits of the data autocorrelation. For the weights of timescales, we use uniform prior distributions on [0, 1]. The model with parameters *θ* is used to generate synthetic time-series *A*(*t′*) with the same duration and number of trials as in the observed data. Next, we compute the distance *d* between the autocorrelations of the synthetic and observed data. If *d* is smaller than the error threshold *ε* (initially set to 1), the parameters are accepted and added to the multivariate posterior distribution. Each iteration of the algorithm is repeated until 500 parameter samples are accepted.

On subsequent iterations, the same steps are repeated but with parameters drawn from a proposal distribution and with an updated error threshold. On each iteration, the error threshold is set to the first quartile of the accepted sample distances from the previous iteration. The proposal distribution for iteration *ξ* is computed based on the prior distribution and the accepted samples from the previous iteration:

*π̃*_{ξ}(*θ*) ∝ ∑_{j} *w*_{j}^{(ξ−1)} *K*_{ξ}(*θ* | *θ*_{j}^{(ξ−1)}). (20)

Here *w*_{j}^{(ξ−1)} is the importance weight of the accepted sample *θ*_{j}^{(ξ−1)} from the previous iteration:

*w*_{j}^{(ξ−1)} ∝ *π*(*θ*_{j}^{(ξ−1)}) / ∑_{l} *w*_{l}^{(ξ−2)} *K*_{ξ−1}(*θ*_{j}^{(ξ−1)} | *θ*_{l}^{(ξ−2)}). (21)

*K*_{ξ} is the random-walk kernel of the population Monte Carlo algorithm: a multivariate Gaussian with mean *θ*_{j}^{(ξ−1)} and covariance equal to twice the covariance of all accepted samples from the previous iteration, Σ = 2Cov[*θ*^{(ξ−1)}]:

*K*_{ξ}(*θ* | *θ*_{j}^{(ξ−1)}) = (2*π*)^{−κ/2} |Σ|^{−1/2} exp(−½ (*θ* − *θ*_{j}^{(ξ−1)})^{T} Σ^{−1} (*θ* − *θ*_{j}^{(ξ−1)})). (22)

Here *κ* is the number of fitted parameters, and |Σ| is the determinant of Σ.

The convergence of the algorithm is defined based on the acceptance rate *accR*, which is the number of accepted samples divided by the total number of drawn samples on each iteration. The algorithm terminates when the acceptance rate reaches *accR*_{min}, which is set to *accR*_{min} = 0.003 in our simulations. Smaller *accR*_{min} leads to narrower posterior distributions and smaller final error threshold, but requires more iterations to converge, i.e. longer simulation time. To find the MAP estimates, we smooth the final joint posterior distribution with a multivariate Gaussian kernel and find its maximum with a grid search.
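The full loop can be condensed into a one-parameter sketch (our own simplification: a single fitted parameter, so the Gaussian kernel is one-dimensional, the acceptance count per iteration is reduced, and `simulate` is any user-supplied function returning a summary statistic such as a sample autocorrelation):

```python
import numpy as np

def aabc(simulate, observed, prior_bounds, n_accept=100, acc_min=0.01, rng=None):
    """Adaptive ABC with population Monte-Carlo sampling (1D parameter sketch).

    simulate(theta) -> summary statistic; the distance is the mean squared
    deviation from `observed`. Returns the accepted samples of the final
    iteration, approximating the posterior.
    """
    rng = rng or np.random.default_rng()
    lo, hi = prior_bounds
    eps = 1.0                                      # initial error threshold
    thetas, weights, kernel_sd = None, None, None
    while True:
        new_thetas, dists, drawn = [], [], 0
        while len(new_thetas) < n_accept:
            drawn += 1
            if thetas is None:                     # iteration 1: sample the prior
                theta = rng.uniform(lo, hi)
            else:                                  # later: perturb an accepted sample
                theta = rng.choice(thetas, p=weights) + rng.normal(0.0, kernel_sd)
                if not lo <= theta <= hi:          # zero prior density: reject
                    continue
            d = np.mean((simulate(theta) - observed) ** 2)
            if d < eps:
                new_thetas.append(theta)
                dists.append(d)
        new_thetas = np.array(new_thetas)
        if thetas is None:
            new_w = np.ones(n_accept)
        else:
            # Importance weight: uniform prior over the mixture proposal density
            # (Gaussian normalization constants cancel after normalization).
            prop = np.array([np.sum(weights *
                                    np.exp(-(t - thetas) ** 2 / (2 * kernel_sd ** 2)))
                             for t in new_thetas])
            new_w = 1.0 / prop
        thetas, weights = new_thetas, new_w / new_w.sum()
        kernel_sd = np.sqrt(2.0 * np.var(thetas))  # kernel variance: 2x sample variance
        eps = np.quantile(dists, 0.25)             # shrink threshold to first quartile
        if n_accept / drawn < acc_min:             # convergence: low acceptance rate
            return thetas
```

Run on a toy summary statistic (an exponential autocorrelation with additive noise), the returned samples concentrate around the generating timescale, with their spread reflecting the noise level.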

### 4.3 Description of neural recordings

Experimental procedures and data pre-processing were described previously [53]. In brief, a monkey was trained to fixate a central dot on a blank screen for 3 s on each trial. Spiking activity was recorded with a 16-channel micro-electrode array inserted perpendicularly to the cortical surface to record from all layers in the visual area V4. For fitting, we used a recording session with 81 trials. We pooled the activity across all channels and calculated the population spike-counts in bins of 1 ms. Then, we computed the autocorrelation of spike-counts using Eq. 2.
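Eq. 2 itself is not reproduced in this excerpt; the sketch below uses a standard normalized autocorrelation estimator with per-trial mean subtraction (the function name and array layout are illustrative assumptions):

```python
import numpy as np

def pooled_count_autocorrelation(spike_counts, max_lag):
    """Trial-averaged autocorrelation of pooled spike-counts.

    spike_counts: array of shape (n_trials, n_bins), e.g., population
    counts in 1-ms bins pooled across channels.
    """
    acs = []
    for x in spike_counts:
        x = x - x.mean()                       # per-trial sample mean
        c = np.correlate(x, x, "full")[x.size - 1:x.size - 1 + max_lag]
        acs.append(c / c[0])                   # normalize to AC(0) = 1
    return np.mean(acs, axis=0)
```

For independent counts the estimate fluctuates around zero at all non-zero lags, while correlated rates produce the decaying autocorrelation shapes fitted in the main text.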

### 4.4 Parameters of simulations and aABC fits in figures

For all fits, the initial error threshold was set to *ε* = 1. The aABC iterations continued until *accR* ≤ 0.003 was reached. All datasets (except for the branching network) consisted of 500 trials, each of 1 s duration. The dataset for the branching network (Fig. 4) consisted of 100 trials with 500 time-steps.

## Author contributions

RZ, TAE and AL designed the research, discussed results and wrote the paper. RZ wrote the computer code, performed simulations, and analyzed the data.

## 5 Supplementary figures

## Acknowledgments

This work was supported by a Sofja Kovalevskaja Award from the Alexander von Humboldt Foundation, endowed by the Federal Ministry of Education and Research (RZ, AL), **SMART***START* 2 program provided by Bernstein Center for Computational Neuroscience and Volkswagen Foundation (RZ), NIH grant R01 EB026949 (TAE), and the Pershing Square Foundation (TAE). We acknowledge the support from the BMBF through the Tübingen AI Center (FKZ: 01IS18039B). We thank N.A. Steinmetz and T. Moore for sharing the electrophysiological data, which are presented in Ref. [53] and are archived at the Stanford Neuroscience Institute server at Stanford University.

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].