## ABSTRACT

Intrinsic timescales characterize dynamics of endogenous fluctuations in neural activity. Variation of intrinsic timescales across the neocortex reflects functional specialization of cortical areas, but less is known about how intrinsic timescales change during cognitive tasks. We measured intrinsic timescales of local spiking activity within columns of area V4 while monkeys performed spatial attention tasks. The ongoing spiking activity unfolded across at least two distinct timescales, fast and slow. The slow timescale increased when monkeys attended to the receptive fields location and correlated with reaction times. By evaluating predictions of several network models, we found that spatiotemporal correlations in V4 activity were best explained by the model in which multiple timescales arise from recurrent interactions shaped by spatially arranged connectivity, and attentional modulation of timescales results from an increase in the efficacy of recurrent interactions. Our results suggest that multiple timescales arise from the spatial connectivity in the visual cortex and flexibly change with the cognitive state due to dynamic effective interactions between neurons.

The brain processes information and coordinates behavioral sequences over a wide range of timescales^{1–3}. While sensory inputs can be processed as fast as tens of milliseconds^{4–7}, cognitive processes such as decision making or working memory require integrating information over slower timescales from hundreds of milliseconds to minutes^{8–10}. These differences are paralleled by the timescales of intrinsic fluctuations in neural activity across the hierarchy of cortical areas. The intrinsic timescales are defined by the exponential decay rate of the autocorrelation of activity fluctuations. The intrinsic timescales are faster in sensory areas, intermediate in association cortex, and slower in prefrontal cortical areas^{11}. The hierarchy of intrinsic timescales is observed across different recording modalities including spiking activity^{11, 12}, intracranial electrocorticography (ECoG)^{13, 14}, and functional magnetic resonance imaging (fMRI)^{15, 16}. The hierarchy of intrinsic timescales reflects the specialization of cortical areas for behaviorally relevant computations, such as the processing of rapidly changing sensory inputs in lower cortical areas and long-term integration of information (e.g., for evidence accumulation, planning, etc.) in higher cortical areas^{17}.

In addition to ongoing fluctuations characterized by intrinsic timescales, neural firing rates also change in response to sensory stimuli or behavioral task events. These stimulus or task-induced dynamics are characterized by the timescales of trial-average neural response^{18, 19} or encoding various task events over multiple trials^{12, 20}. The task-induced timescales also increase along the cortical hierarchy^{12, 14, 20–22}. However, task-induced and intrinsic timescales are not correlated across individual neurons in any cortical area^{12}, suggesting they may arise from different mechanisms. Indeed, the timescales of trial-average response increase through the mouse visual cortical hierarchy, whereas the intrinsic timescales do not change^{22}. Moreover, the task-induced and intrinsic timescales can depend differently on task conditions. For example, for a fixed trial-average response in a specific task condition, the intrinsic timescale of neural dynamics varies substantially across trials and these changes are predictive of the reaction time in a decision-making task^{23}. While task-induced timescales relate directly to task execution, less is known about how intrinsic timescales change during cognitive tasks. Intrinsic timescales measured with ECoG exhibit a widespread increase across multiple cortical association areas during working memory maintenance, consistent with the emergence of persistent activity in this period^{13}. However, whether intrinsic timescales can change with temporal and spatial specificity in local neural populations processing specific information during a task has not been tested. It is also unclear whether intrinsic timescales can flexibly change in sensory cortical areas and in cognitive processes other than memory maintenance.

The mechanism underlying the diversity of intrinsic timescales across cortical areas can be related to differences in the connectivity. The hierarchical organization of timescales correlates with the gradients in the strength of neural connections in different cortical areas^{24, 25}. These gradients exhibit an increase through the cortical hierarchy in the spine density on dendritic trees of pyramidal neurons^{26, 27}, gray matter myelination^{13, 28}, expression of N-methyl-D-aspartate (NMDA) and gamma-aminobutyric acid (GABA) receptor genes^{13, 29}, strength of structural connectivity measured using diffusion MRI^{16}, or strength of functional connectivity^{15, 16, 30–32}.

The relation between the connectivity and timescales is further supported by computational models. Differences in timescales across cortical areas can arise in network models from differences in the strength of recurrent excitatory connections^{27, 33}. These models matched the strength of excitatory connections to the spine density of pyramidal neurons^{27} or to the strength of structural connectivity^{33} in different cortical areas. Moreover, models demonstrate that the topology of connections in addition to the connection strength can affect the timescales of network dynamics. For example, slower timescales emerge in networks with clustered connections compared to random networks^{34}, or heterogeneity in the strength of inter-node connections gives rise to diverse localized timescales in a one dimensional network^{35}. Thus, network models can relate dynamics to connectivity and generate testable predictions to identify mechanisms underlying the generation of intrinsic timescales in the brain.

We examined how the intrinsic timescales of spiking activity in visual cortex were affected by the trial-to-trial alterations in the cognitive state due to visual spatial attention. We analyzed spiking activity recorded from local neural populations within cortical columns in primate area V4 during two different spatial attention tasks and a fixation task. In all tasks, the autocorrelation of intrinsic activity fluctuations showed at least two distinct timescales, one fast and one slow. The slow timescale was longer on trials when monkeys attended to the receptive fields of the recorded neurons and correlated with the monkeys’ reaction times. We used recurrent network models to test several alternative mechanisms for generating the multiplicity of timescales and their flexible modulation. We established analytically that spatially arranged connectivity generates multiple timescales in local population activity and found support for this theoretical prediction in our V4 recordings. In contrast, heterogeneous biophysical properties of individual neurons alone cannot account for both temporal and spatial structure of V4 correlations. Thus, the V4 timescales arise from spatiotemporal population dynamics shaped by the local spatial connectivity structure. The model indicates that modulation of timescales during attention can be explained by a slight increase in the efficacy of recurrent interactions. Our results suggest that multiple intrinsic timescales in local population activity arise from the spatial network structure of the neocortex and the slow timescales can flexibly adapt to trial-to-trial changes in the cognitive state due to dynamic effective interactions between neurons.

## Results

### Multiple timescales in fluctuations of local neural population activity

We analyzed spiking activity of local neural populations within cortical columns of visual area V4 from monkeys performing a fixation task (FT) and two different spatial attention tasks (AT1, AT2)^{36, 37} (Fig. 1a-c, Supplementary Fig. 1). The activity was recorded with 16-channel linear array microelectrodes from vertically aligned neurons across all cortical layers such that the receptive fields (RFs) of neurons on all channels largely overlapped. In FT, the monkey was rewarded for fixating on a blank screen for 3 s on each trial (Fig. 1a). During AT1, the monkeys were trained to detect changes in the orientation of a grating stimulus in the presence of three distractor stimuli and to report the change with a saccade to the opposite location (antisaccade, Fig. 1b). On each trial, a cue indicated the stimulus that was most likely to change, which was the target of covert attention, and the stimulus opposite to the cue was the target of overt attention due to the antisaccade preparation. During AT2, the monkey was rewarded for detecting a small luminance change in a grating stimulus in the presence of a distractor stimulus placed in the opposite hemifield. The monkey reported the change by releasing a bar. An attentional cue on each trial indicated the stimulus where the change should be detected, which was the target of covert attention (Fig. 1c).

We analyzed the timescales of fluctuations in local spiking activity by computing the autocorrelations (ACs) of spike counts in 2 ms bins. Previous laminar recordings showed that the neural activity is synchronized across cortical layers alternating spontaneously between synchronous phases of high and low firing rates^{36, 38}. Therefore, we pooled the spiking activity across all layers (Fig. 1d) to obtain more accurate estimates of the spike-count autocorrelations. The shape of spike-count autocorrelations in our data deviated from a single exponential decay. In logarithmic-linear coordinates, the exponential decay corresponds to a straight line with a constant slope. The spike-count autocorrelations exhibited more than one linear slope, with a steep initial slope followed by shallower slopes at longer lags (Fig. 1e). The multiple decay rates in the autocorrelations indicate the presence of multiple timescales in the fluctuations of local population spiking activity.

To verify the presence of multiple timescales and to accurately estimate their values from autocorrelations, we used a method based on adaptive Approximate Bayesian Computations (aABC, Methods)^{39}. This method overcomes the statistical bias in autocorrelations of finite data samples, which undermines the accuracy of conventional methods based on direct fitting of the autocorrelation with exponential decay functions. The aABC method estimates the timescales by fitting the spike-count autocorrelation with a generative model that can have single or multiple timescales and incorporates spiking noise. The method accounts for the finite data amount, non-Poisson statistics of the spiking noise, and differences in the mean and variance of firing rates across experimental conditions. The aABC method returns a posterior distribution of timescales that quantifies the estimation uncertainty and allows us to compare alternative hypotheses about the number of timescales in the data.

We fitted each autocorrelation with a one-timescale (*M*_{1}) and a two-timescale (*M*_{2}) generative model and selected the optimal number of timescales by approximating the Bayes factor obtained from the posterior distributions of the fitted models (Fig. 2a, Supplementary Fig. 2, Methods). The majority of autocorrelations were better described by the model with two distinct timescales (*M*_{2}) than with the one-timescale model (Fig. 2a,b). The presence of two distinct timescales (fast *τ*_{1} and slow *τ*_{2}) was consistent across both spontaneous (i.e. in the absence of visual stimuli, *τ*_{1,MAP} = 8.87 ± 0.78 ms, *τ*_{2,MAP} = 85.82 ± 15.9 ms, mean ± s.e.m. across sessions, MAP: Maximum *a posteriori* estimate from the multivariate posterior distribution) and stimulus-driven activity (*τ*_{1,MAP} = 5.05 ± 0.51 ms, *τ*_{2,MAP} = 135.87 ± 9.35 ms, mean ± s.e.m.), and across all monkeys, while the precise values of timescales were heterogeneous reflecting subject- or session-specific characteristics (Fig. 2c). Although it is possible that autocorrelations contained more than two timescales, with our data amount, the three-timescale model did not provide a better fit than the two-timescale model (Supplementary Fig. 3). Thus, the two-timescale model provided a parsimonious description of neural dynamics in our data.

### Slow timescales are modulated during spatial attention

Next, we examined whether the intrinsic timescales of spiking activity were modulated during spatial attention. We compared the timescales estimated from the stimulus-driven activity on trials when the monkeys attended toward the RFs location of the recorded neurons (attend-in condition, covert or overt) versus the trials when they attended outside the RFs location (attend-away condition). In this analysis, we included recording sessions in which the autocorrelations were better fitted with two timescales in both attend-away and attend-in (covert or overt) conditions. We compared the MAP estimates of the fast *τ*_{1} and slow *τ*_{2} timescales between attend-in and attend-away conditions across recording sessions.

We found that the slow timescale was significantly longer during both covert and overt attention relative to the attend-away condition (covert: mean *τ*_{2,att-in} = 140.69 ms, mean *τ*_{2,att-away} = 115.07 ms, *p* = 3×10^{−4}, *N* = 32; overt: mean *τ*_{2,att-in} = 141.31 ms, mean *τ*_{2,att-away} = 119.58 ms, *p* = 7×10^{−4}, *N* = 26; two-sided Wilcoxon signed-rank test) (Fig. 3), while there was no significant change in the fast timescale during attention (covert: mean *τ*_{1,att-in} = 5.53 ms, mean *τ*_{1,att-away} = 5.54 ms, *p* = 0.75, *N* = 32; overt: mean *τ*_{1,att-in} = 3.42 ms, mean *τ*_{1,att-away} = 4.12 ms, *p* = 0.39, *N* = 26; two-sided Wilcoxon signed-rank test). The increase in the slow timescale with attention was evident on individual recording sessions when comparing the marginal posterior distributions of *τ*_{2} for attend-in versus attend-away conditions (Fig. 3a,d). The significant increase of *τ*_{2} was observed in 24 out of 32 individual sessions during covert attention, and 22 out of 26 individual sessions during overt attention. Both fast and slow timescales varied across sessions, but were not significantly different between covert and overt attention (*p >* 0.05 for both *τ*_{1} and *τ*_{2}, two-sided Wilcoxon signed-rank test, Supplementary Fig. 4). The increase in *τ*_{2} was not due to increase in the firing rate with attention, since the aABC method accounts for the differences in the firing rate across behavioral conditions (Methods), and *τ*_{2} was not correlated with the mean firing rate of population activity (Supplementary Fig. 5). The increase of slow timescales during attention is consistent with the reduction in the power of low-frequency fluctuations in local field potentials^{37, 40–42} and spiking activity^{43} (Supplementary Note 1, Supplementary Fig. 6, 7). The modulation of the slow timescale was consistent across both attention tasks (AT1 and AT2) and each monkey, and appeared in response to trial-to-trial changes in the cognitive state of the animal directed by the attention cue. These results suggest that different mechanisms control the fast and slow timescales of ongoing spiking activity, and the mechanisms underlying the slow timescale can flexibly adapt according to the animal’s behavioral state.

To test whether attentional modulation of timescales was relevant for behavior, we analyzed the relationship between timescales and monkeys’ reaction times in the attention tasks. We quantified the relationship between the average reaction times of monkeys’ responses in each session (see Supplementary Fig. 1 for details of experiment) and the MAP estimated timescales of spiking activity using linear mixed-effects models fitted separately in attend-in and attend-away conditions (Fig. 4, Methods, Supplementary Table 1, 2). The linear mixed-effects models had a separate intercept for each monkey to account for individual differences between the monkeys and attention tasks (AT1 and AT2). The reaction times were negatively correlated with the slow timescales in attend-in condition (combined covert and overt) (slope = −0.16 ± 0.066, mean ±95% CIs; *p* = 9 × 10^{−6}, F-test; *N* = 58, *R*^{2} = 0.62), but not in attend-away condition (slope = 0.015 ± 0.12, *p* = 0.79, *N* = 32, *R*^{2} = 0.69). Fast timescales were not correlated with the reaction times (attend-in: slope = 0.0016 ± 0.86, *p* = 0.997, *N* = 58, *R*^{2} = 0.46; attend-away: slope = 0.53 ± 0.94, *p* = 0.26, *N* = 32, *R*^{2} = 0.70). Thus, on average monkeys responded to a stimulus change faster in sessions with longer slow timescales of neurons with the receptive fields in the attended location. The spatial selectivity of this effect suggests that the increase in the slow timescale may contribute to behavioral benefits of selective spatial attention.

### Mechanisms for generating multiple timescales in local population dynamics

What mechanisms can generate multiple timescales in the local population activity? One possibility is that multiple timescales reflect biophysical properties of individual neurons within a local population. For example, two timescales can arise from mixing heterogeneous timescales of different neurons^{44, 45} or combining different biophysical processes, such as a fast membrane time constant and a slow synaptic time constant^{46}. Alternatively, multiple timescales in local population activity can arise from spatiotemporal population dynamics in networks with spatially arranged connectivity^{47}.

Analyses of well-isolated single-unit activity (SAU) would be ideal for testing whether multiple timescales in local V4 population activity reflect mixing heterogeneous timescales of individual neurons or dynamics shared by the population. However, due to low firing rates, SUA did not yield sufficient data for conclusive model comparison. We fitted autocorrelations of SUA during the fixation task (which had the longest trial duration of 3 s and thus the largest data amount) and performed the model comparison to determine the number of timescales. While some single units clearly showed two distinct timescales, the model comparison was inconclusive for most units because autocorrelations were dominated by noise due to low data amount (Supplementary Note 2, Supplementary Fig. 8). We therefore turned to computational modeling for testing possible alternative mechanisms for generating multiple timescales.

To determine which mechanism, local biophysical properties or spatial network interactions, is consistent with neural dynamics in V4, we developed three recurrent network models each with a different mechanism for timescale generation (Fig. 5). We implemented all mechanisms within the same modeling framework. The models consist of binary units arranged on a two-dimensional lattice corresponding to lateral dimensions in the cortex (Fig. 5a-c). Each unit represents a small population of neurons, such as a cortical minicolumn^{48, 49}, and is connected to 8 other units in the network. The activity of unit *i* at time-step *t*′ is described by a binary variable *S*_{i}(*t*′) ∈ {0, 1} representing high (1) and low (0) firing-rate states of a local population^{36}. The activity *S*_{i}(t′) stochastically transitions between states driven by the self-excitation (probability *p*_{s}), excitation from the connected units (probability *p*_{r}), and the stochastic external excitation (probability *p*_{ext} « 1) delivered to each unit (Methods). The self-excitation probability describes intrinsic dynamics of a unit in the absence of network interactions, arising from biophysical properties of neurons or reverberation within a local population (via the vertical connectivity within a minicolumn). The self-excitation generates a timescale *τ*_{self}, which is the autocorrelation timescale of a two-state Markov process: *τ*_{self} = (−ln(*p*_{s}))^{−1} (Methods, Supplementary Note 3). The recurrent excitation *p*_{r} accounts for horizontal interactions between units. The sum of all interaction probabilities is the local branching parameter: BP = *p*_{s} + 8*p*_{r}, describing the expected number of units activated by a single active unit *i*.

The models differ in the mechanism generating multiple timescales in the local population activity. In two models, connectivity is random and multiple timescales arise locally from biophysical properties of individual units. In the third model, connectivity is spatially organized and multiple timescales arise from recurrent interactions between units^{47}.

The first model assumes that two timescales in local population activity reflect aggregated activity of different neuron types with distinct (fast and slow) biophysical timescales (e.g., membrane time constants), which we modeled as two types of units (A and B) each with a different self-excitation probability (*p*_{s,A}, *p*_{s,B}, Fig. 5a). We placed two units, A and B, on each vertex of the lattice and summed their activity to obtain a local population activity as in the columnar recordings. Connections between units of any type are random. As expected, the autocorrelation of local population activity exhibits two distinct timescales corresponding to the self-excitation timescales of the two unit types (Fig. 5d).

The second model assumes that two timescales arise from two local biophysical processes, e.g., a fast membrane time constant and a slow synaptic time constant (Fig. 5b)^{46}. We modeled the membrane time constant with the fast self-excitation timescale, and the synaptic time constant as a low-pass filter of the input to each unit with a slow time-constant *τ*_{synapse} (Methods)^{46}. The connectivity between units is random. The autocorrelation of individual unit’s activity in this model exhibit two timescales corresponding to the membrane (*τ*_{self}) and synaptic (*τ*_{synapse}) time constants (Fig. 5e).

Finally, in the third model, multiple timescales arise from recurrent dynamics shaped by the spatial network connectivity, akin to the horizontal connectivity in primate visual cortex^{49}. Each model unit is connected to 8 nearby units (Fig. 5c). Although each unit has only a single self-excitation timescale, the unit’s autocorrelation exhibit multiple timescales with a fast decay at short time-lags and a slower decay at longer time-lags (Fig. 5f). The fast initial decay corresponds to the self-excitation timescale. The slow autocorrelation decay is generated by recurrent interactions among units in the network. In simulations, the slow autocorrelation decay closely matches the autocorrelation of the net recurrent input received by a unit from its neighbors (excluding the self-excitation input).

To understand how recurrent interactions generate slow timescales, we analytically computed the autocorrelation timescales of the unit’s activity in the network with spatial connectivity, using the master equation for binary units with Glauber dynamics^{50} (Methods, Supplementary Note 4, details in^{47}). We found that the slow decay of the autocorrelation contains a mixture of interaction timescales *τ*_{int,k}. Each *τ*_{int,k} arises from recurrent interactions on a different spatial scale, characterized by the modes of correlated fluctuations with different spatial frequencies **k** in the Fourier space (Methods). For each spatial frequency **k**, the interaction timescale depends on both the probability of horizontal interactions (*p*_{r}) and the self-excitation probability (*p*_{s}) (Methods, Eq. 24). Shorter interaction timescales arise from higher spatial frequency modes (larger **k**) which correspond to persistent activity in local neighborhoods, and longer timescales are generated by more global interactions (smaller **k**)^{47}. The longest timescale in the network is characterized by the global interaction timescale related to the zero spatial frequency mode (Methods, Eq. 25). We can approximate the slow decay of the autocorrelation with a single effective interaction timescale (*τ*_{int}) defined as a weighted average of all interaction timescales (Methods, Eq. 27). Therefore, the autocorrelation shape is well approximated with two timescales: the fast self-excitation timescale and the slow effective interaction timescale.

Generating multiple timescales in spatial networks does not require strictly structured connectivity. Systematically changing the connectivity from structured to random reveals that networks with an intermediate level of local connectivity also exhibit multiple timescales in local dynamics (Fig. 6, Supplementary Note 5). However, by getting closer to a random connectivity, most interaction timescales become smaller and close to the self-excitation timescale, and only the global timescale does not depend on the network structure. Hence networks with different connectivity have the same global timescale (Fig. 6, inset). In fully random networks, the autocorrelation of a unit’s activity effectively exhibits only two distinct timescales: the self-excitation timescale and the global interaction timescale. However, the global timescale has a very small relative contribution in local autocorrelations (scaled with the inverse number of neurons in the network) and is hard to observe empirically as it requires data with excessively long trial duration.

While all three mechanisms account for multiple timescales in V4 autocorrelations, they can be distinguished in cross-correlations between local population activity at different spatial distances. In models with random connectivity, cross-correlations do not depend on distance between units on the lattice (Fig. 5g,h). In contrast, the model with spatial connectivity predicts that both the strength and timescales of cross-correlations depend on distance (Fig. 5i). Specifically, the zero time-lag cross-correlations decrease with distance. Moreover, cross-correlations contain multiple timescales equal to the interaction timescales in autocorrelations (Methods), but no self-excitation timescale since self-excitation is independent across units. With increasing distance, the weights of timescales generated by local interactions (high spatial frequency modes) decrease, and timescales generated by more global interactions (low spatial frequency modes) dominate cross-correlations. Thus, cross-correlations become weaker and dominated by slower timescales at longer distances (analytical derivations in Methods, details in^{47}). Approximating the shape of auto- and cross-correlations with two effective timescales, the theory predicts that both timescales in cross-correlations are larger than in the autocorrelation and increase with distance. Therefore, by measuring timescales of cross-correlations at different distances, we can determine which mechanism, spatial network interactions or local biophysical properties, is more consistent with neural dynamics in V4.

To test these model predictions in our V4 recordings, we computed cross-correlations between population activity on different channels during spontaneous activity (monkey G in FT, monkey R in AT2), which had the longest trial durations for better detection of slow timescales (Methods). Columnar recordings generally exhibit slight horizontal displacements which manifest in a systematic shift of receptive fields (RFs) across channels^{51}. We used distances between the RF centers (RF-center distance) as a proxy for horizontal cortical distances^{51}. For each monkey, we divided the cross-correlations into two groups with larger (*d*_{RF,L}) and smaller (*d*_{RF,S}) RF-center distances than the median distance (monkey G: 0 *< d*_{RF,S} *<* 2.08, 2.08 *< d*_{RF,L} *<* 5, monkey R: 0 *< d*_{RF,S} *<* 0.77, 0.77 *< d*_{RF,L} *<* 2.25, all distances are in degrees of visual angle, dva) and averaged the cross-correlations within each group. For comparison, we also computed the average auto-correlation of population activity on individual channels (i.e. without pooling spikes across channels). The differences between auto- and cross-correlations of V4 data appeared smaller than in the model since horizontal displacements between channels were relatively small, sampling mainly within the same or nearby columns^{51}.

The cross-correlations of V4 activity exhibited distinct fast and slow decay rates as predicted by the spatial network model (Fig. 5j, Supplementary Fig. 9, left). In agreement with the spatial network model, zero time-lag cross-correlations decreased with increasing RF-center distance (monkey G: mean for *d*_{RF,S} = 0.047, *d*_{RF,L} = 0.040, *p* = 4×10^{−4}, *N* = 152; monkey R: mean for *d*_{RF,S} = 0.022, *d*_{RF,L} = 0.013, *p* = 0.001, *N* = 128, two-sided Wilcoxon rank-sum test), consistent with the reduction of pairwise noise correlations with lateral distance in V4^{51, 52}. The shapes of V4 auto- and cross-correlations were well approximated by fitted two-timescale generative models (Fig. 5j, Supplementary Fig. 9, left), and the estimated posterior distributions allowed us to compare auto- and cross-correlation timescales at different distances (Fig. 5k, Supplementary Fig. 9, right). Both fast and slow timescales were smaller in autocorrelations than in cross-correlations (Fast timescale: monkey G, mean *τ*_{1,AC} = 10.11 ms, *τ*_{1,CC,S} = 12.24 ms, *τ*_{1,CC,L} = 14.19 ms; monkey R, mean *τ*_{1,AC} = 4.93 ms, *τ*_{1,CC,S} = 12.18 ms, *τ*_{1,CC,L} = 12.34 ms; Slow timescale: monkey G, mean *τ*_{2,AC} = 75.46 ms, *τ*_{2,CC,S} = 83.94 ms, *τ*_{2,CC,L} = 101.94 ms; monkey R, mean *τ*_{2,AC} = 26.53 ms, *τ*_{2,CC,S} = 358.07 ms, *τ*_{1,CC,L} = 552.70 ms; number of samples in each posterior *N* = 100, all p-values *<* 10^{−10}, two-sided Wilcoxon rank-sum test). Both fast and slow timescales of cross-correlations increased with the RF-center distance in both monkeys, but the increase in the fast timescale did not reach statistical significance in monkey R (*τ*_{2}: *p <* 10^{−10}, *τ*_{1}: *p*_{G} *<* 10^{−10}, *p*_{R} = 0.36, two-sided Wilcoxon rank-sum test), possibly due to narrower range of RF-center distances in monkey R compared to monkey G (median *d*_{RF,R} = 0.77, *d*_{RF,G} = 2.08 dva). Thus, predictions of the spatial network model, but not the models with random connectivity, were borne out by the data.

These results suggest that multiple timescales in local population activity in V4 arise from the recurrent dynamics shaped by the spatial connectivity of the primate visual cortex and not from local biophysical processes alone. Local biophysical mechanisms can also contribute to generating multiple neural timescales. For example, spatial connectivity combined with synaptic filtering can give rise to multiple autocorrelation timescales (Supplementary Fig. 10). The dependence of cross-correlation timescales on distance indicates that dominant timescales in the local population activity reflect the spatial network structure.

### Changes in the efficacy of network interactions modulate local timescales

We used the spatial network model to investigate which mechanisms can underlie the modulation of the slow timescales during attention. We matched the timescales between the model with local connectivity (*r* = 1) and experimental data to determine which changes in the model parameters can explain the attentional modulation of timescales in V4. We matched the self-excitation and effective interaction timescales of a model unit to, respectively, the fast and slow timescales of V4 activity (mean timescale ± s.e.m., Methods) for both the attend-away and attend-in (averaged over covert and overt) conditions (Fig. 7). We used a combination of analytical approximations and model simulations to find parameters that produce timescales similar to the V4 data (Methods).

We found that to reproduce the timescales in V4, the model needs to operate close to the critical point BP = 1 (Fig. 7b). At the critical point, each unit activates one other unit on average resulting in self-sustained activity^{53}. Close to this regime, the timescales are flexible, such that small changes in the network excitability give rise to significant changes in timescales. To increase the slow timescale during attention, the total excitability of the network interactions should increase, shifting the network dynamics closer to the critical point. The overall increase in the interaction strength can be achieved by increasing the strength of either the self-excitation (*p*_{s}) or the recurrent interactions (*p*_{r}). Increasing *p*_{r} while keeping *p*_{s} constant allows for substantial changes in the slow timescale and a nearly unchanged fast timescale consistent with the V4 data. The increase of *p*_{s} in the model produces a slight increase in the fast timescale (*τ*_{1}) (∼0.4 ms on average), but such small changes in *τ*_{1} would be undetectable with our available data amount (the uncertainty of *τ*_{1} MAP estimate is ±0.9 ms on average, Fig. 3b,e). The increase in *p*_{s} can also be counterbalanced by a reduction in *p*_{r} to produce the observed changes of timescales.

Several mechanisms can account for changes in the strength of recurrent interactions during attention. For example, the increase in *p*_{s} is consistent with the observation that interactions between cortical layers in V4 increase during attention^{42}, when *p*_{s} is interpreted as the strength of vertical recurrent interactions within cortical mini-columns. A reduction in *p*_{r} can be mediated by neuromodulatory effects that reduce the efficacy of lateral connections in the cortex during attention^{54}. In addition, our analytical derivations show that in the model with non-linear recurrent interactions, the effective strengths of recurrent interactions can also change by external input (Methods, details in^{51}). The input alters the operating regime of network dynamics changing the effective strength of recurrent interactions. Thus, with non-linear interactions, timescales can be modulated by the input to the network, such as top-down inputs from higher cortical areas during attention^{51, 55}. Altogether, our model suggests that attentional modulation of timescales can arise from changes in the efficacy of recurrent interactions in visual cortex that can be mediated by neuromodulation or top-down attentional inputs.

## Discussion

We found that ongoing spiking activity of local neural populations within columns of the area V4 unfolded across fast and slow timescales, both in the presence and absence of visual stimuli. The slow timescale increased when monkeys attended to the receptive fields location, showing that local intrinsic timescales can change flexibly from trial to trial according to selective attention. Furthermore, the slow timescales of neurons with RFs in the attended location correlated with the monkeys’ reaction times suggesting that the increase in the slow timescale may contribute to behavioral benefits of selective spatial attention. To understand the mechanisms underlying the multiplicity and flexible modulation of timescales, we developed network models linking intrinsic timescales to biophysical properties of individual neurons or the spatial connectivity structure of the visual cortex. Only the spatial network model correctly predicted the distance-dependence of spatiotemporal correlations in V4, indicating that multiple timescales in V4 dynamics arise from the spatial connectivity of primate visual cortex. The model suggests that slow timescales increase with the effective strength of recurrent interactions.

### Multiple intrinsic timescales in neural activity

Previous studies characterized the autocorrelation of ongoing neural activity with a single intrinsic timescale^{11, 13, 15, 16}. The intrinsic timescale was usually measured for neural populations either by averaging autocorrelations of single neurons in one area^{11} or using coarse-grained measurements such as ECoG^{13} or fMRI^{15, 16}. Thus, ongoing dynamics in each area were described with a single intrinsic timescale that varied across areas. We extended this view by showing that, within one area, local population activity exhibits multiple intrinsic timescales. These timescales reflect ongoing dynamics on single trials and are not driven by task events. Our results suggest that the multiplicity of timescales is an intrinsic property of neural activity arising from inherent cellular and network properties of the cortex.

We show that multiple timescales in local dynamics can emerge from the spatial connectivity structure in a recurrent network model. The presence of two dominant timescales (*τ*_{self}, *τ*_{int}) in local dynamics depends on the combination of the structured connectivity and strong, mean-driven interactions between units. Networks with random connectivity (Fig. 6,b) or weak, diffusion-type interactions^{51} exhibit only one dominant timescale in local activity (Supplementary Note 6). Moreover, local biophysical properties alone cannot explain the dependence of spatiotemporal neural correlations on lateral distance in the cortex, highlighting the importance of spatial network interactions for generating multiple timescales in local population activity.

In our network model with local spatial connectivity, recurrent interactions across different spatial scales induce multiple slow timescales. To generate multiple slow timescales, our network operates close to a critical point. Spiking networks with spatial connectivity can generate fast correlated fluctuations that emerge from instability at particular spatial frequency modes^{56}. Slow fluctuations of firing rates can also arise in networks with clustered random connectivity, but interactions between clusters induce only a single slow timescale^{34}. We show that more local spatial connectivity (smaller *r*) leads to slower dynamics and modifies the weights and composition of timescales in the local activity. The timescale of the global activity, on the other hand, is the same across networks with distinct local timescales and different connectivity structures. These results show that local temporal and spatial correlations of neural dynamics are closely tied together.

In our model, integrating activity over larger spatial scales leads to disappearance of faster interaction timescales (higher spatial frequencies) leaving only slower interaction timescales (lower spatial frequencies) in the coarse-grained activity. At the extreme, the global network activity exhibits only the slowest interaction timescale (the global timescale). This mechanism may explain the prominence of slow dynamics in meso- and macroscale measures of neural activity such as LFP or fMRI^{57}, while faster dynamics dominate in local measures such as spiking activity. The model predicts that the slowest interaction timescales have very small weights in the autocorrelation of local neural activity and thus can be detected in local activity only with excessively long recordings. Indeed, infraslow timescales (on the order of tens of seconds and minutes) are evident in the cortical spiking activity recorded over hours^{58}.

### Functional relevance of neural activity timescales

Intrinsic timescales are thought to define the predominant role of neurons in the cognitive processes^{17}. For example, in the orbitofrontal cortex, neurons with long intrinsic timescales are more involved in decision-making and the maintenance of value information^{44}. In the prefrontal cortex (PFC), neurons with short intrinsic timescales are primarily involved in the early phases of working memory encoding^{31}, while neurons with long timescales play a significant role in coding and maintaining information during the delay period^{31, 45}. Our finding that intrinsic timescales can flexibly change from trial to trial (and across epochs within a trail^{13}) suggests a possibility that task-induced timescales may correspond with intrinsic timescales only during specific task phases. These results may explain why the task-induced timescales of single neurons do not correlate with intrinsic timescales measured over the entire task duration^{12}.

We found that timescales of local neural activity changed from trial to trial depending on the attended location. A previous ECoG study found that the intrinsic timescale of neural activity in cortical association areas increased after engagement in a working memory task^{13}. Our findings go beyond this earlier work by showing that the modulation of timescales can be functionally specific as it selectively affects only neurons representing the attended location within the retinotopic map. While changes in timescale due to task engagement could be mediated by slow global processes such as arousal, the retinotopically precise modulation of timescales requires local changes targeted to task-relevant neurons. Our results further show that the modulation of timescales also occurs in sensory cortical areas and cognitive processes other than memory maintenance^{13} which explicitly requires temporal integration of information. The correlation of slow timescales with reaction times during attention may be functionally relevant, potentially allowing neurons to integrate information over longer durations.

Longer timescales during attention in the model are associated with shifting the network dynamics closer to a critical point. Shifting closer to criticality was also suggested as a mechanism for the increase in gamma-band synchrony and stimulus discriminability during attention^{59}. Furthermore, strong recurrent dynamics close to the critical point can flexibly control the dimensionality of neural activity^{60}. Hence, operating closer to the critical point during attention might help to optimize neural responses to environmental cues and improve information processing^{61}.

### Mechanisms for attentional modulation of timescales

Changes in the slow timescale of neural activity due to attention occurred from one trial to another. Such swift changes cannot be due to significant changes in the underlying network structure and require a fast mechanism. Our model suggests that the modulation of slow timescales during attention can be explained with a slight increase in the network excitability mediated by an increase in the efficacy of horizontal recurrent interactions, or by an increase in the efficacy of vertical interactions accompanied by a decrease in the strength of horizontal interactions.

Several physiological processes may underlie these network mechanisms in the neocortex. Top-down inputs during attention can enhance the local excitability in cortical networks^{55}. Our analytical derivations show that inputs can increase the effective strength of recurrent interactions between neurons in networks with non-linear interactions, similar to previous models^{18, 62}. Similar modulation of timescales during covert and overt attention suggests that top-down attentional inputs arrive from brain areas that represent both attention-related and saccade-related information. Frontal eye field (FEF) can be a possible source for such modulations^{37, 63, 64}. Furthermore, feedback connections from higher visual areas like PFC or the temporal-occipital area (TEO) to lower visual areas have broader terminal arborizations than the size of the receptive fields in lower areas^{65, 66}. These feedback inputs can coordinate activity across minicolumns in V4. Moreover, vertical interactions in V4 measured with local field potentials (LFPs) increase during attention^{42}, while neuromodulatory mechanisms can reduce horizontal interactions. The level of Acetylcholine (ACh) can modify the efficacy of synaptic interactions during attention in a selective manner^{54}. Increase in ACh strengthens the thalamocortical synaptic efficacy by affecting nicotinic receptors and reduces the efficacy of horizontal recurrent interactions by affecting muscarinic receptors. Decrease in horizontal interactions is also consistent with the proposed reduction of spatial correlations length during attention^{51}. These observations suggest that an increase in vertical interactions and a decrease in horizontal interactions is a likely mechanism for modulation of the slow timescale during attention.

To identify biophysical mechanisms of timescales modulation, experiments with larger number of longer trials are required to provide tighter bounds for estimated timescales. Additionally, detailed biophysical models can help distinguish different mechanisms, since biophysical and cell-type specific properties of neurons might also be involved in defining neural timescales^{67, 68}. In particular, diverse timescales observed across single neurons within one area^{17, 31, 44, 45, 69} require models considering a heterogeneous parameter space and can have computational implications for the brain^{70}. Here, we used the RF-center distances as a proxy for spatial distances in the cortex. Experiments with spatially organized recording sites would allow to study the relation between temporal and spatial correlations more directly. Furthermore, developing recurrent network models that perform the selective attention task can help to find direct links between the modulation of dynamics and task performance. Finally, perturbation experiments that modulate selectively top-down inputs or neuromodulatory levels can provide the most direct test of the underlying mechanisms.

Our findings reveal that targeted neural populations can integrate information over variable timescales following changes in the cognitive state. Our model suggests that local interactions between neurons via the spatial connectivity of primate visual cortex can underlie the multiplicity and flexible modulation of intrinsic timescales. Our experimental observations combined with the computational model provide a basis for studying the link between the network structure, functional brain dynamics, and flexible behavior.

## Methods

### Behavioral tasks and electrophysiology recordings

Experimental procedures were described previously^{36, 37}. Experimental procedures for the fixation task and attention task 1 were in accordance with NIH Guide for the Care and Use of Laboratory Animals, the Society for Neuroscience Guidelines and Policies, and Stanford University Animal Care and Use Committee. Experimental procedures for the attention task 2 were in accordance with the European Communities Council Directive RL 2010/63/EC, and Use of Animals for Experimental Procedures, and the UK Animals Scientific Procedures Act.

In brief, on each trial of the fixation task (FT, monkey G), the monkey was rewarded for fixating a central dot on a blank screen for 3 s. In attention task 1 (AT1, monkeys G, B), the monkey detected orientation changes in one of the four peripheral grating stimuli while maintaining central fixation. Each trial started by fixating a central fixation dot on the screen and after several hundred milliseconds (170 ms for monkey B and 333 ms for monkey G), four peripheral stimuli appeared. Following a 200 − 500 ms period, a central attention cue indicated the stimulus that was likely to change with ∼90% validity. Cue was a short line from fixation dot pointing toward one of the four stimuli, randomly chosen on each trial with equal probability. After a variable interval (600 − 2200 ms), all four stimuli disappeared for a brief moment and reappeared. Monkeys were rewarded for correctly reporting the change in orientation of one of the stimuli (50% of trails) with an antisaccade to the location opposite to the change, or maintaining fixation if none of the orientations changed. Due to the anticipation of antisaccade response, the cued stimulus was the target of covert attention, while the stimulus in location opposite to the cue was the target of overt attention. In attend-in conditions, the cue pointed either to the stimulus in the RFs of the recorded neurons (covert attention) or to the stimulus opposite to the RFs (overt attention). The remaining two cue directions were attend-way conditions.

In attention task 2 (AT2, monkey R), the monkey detected a small luminance change within the white phase of a square wave static grating. The monkey initiated a trial by holding a bar and visually fixating a fixation point. The color of the fixation point indicated the level of spatial certainty (red: narrow focus, blue: wide focus). After 500 ms a cue appeared indicating the location and focus of the visual field to attend to. The cue was switched off after 250 ms. After another second two gratings appeared, one in the center of the RFs and one diametrically opposite with respect to the fixation point. The grating at the position indicated by the cue was the test stimulus. The other grating served as the distractor. After at least 500 ms a small luminance change (dimming) occurred either in the center of the grating (narrow focus) or in one of 12 peripheral positions (wide focus). If the dimming occurred in the distractor grating first the monkey had to ignore it. The monkey was rewarded for a bar release within 750 ms of the dimming in the test grating. The faster the monkey reacted, the larger reward it received. Two grating sizes (small and large) were used in this experiment. We analyzed trials with the small grating to avoid surround-suppression effects created by the large grating sizes extending beyond the neurons’ summation area^{71}.

Recordings were performed in the visual area V4 with linear array microelectrodes inserted perpendicularly to the cortical layers. Arrays were placed such that receptive fields of recorded neurons largely overlapped. Each array had 16 channels with 150 *μ*m center-to-center spacing. In AT1 and FT, all 16 channels were visually responsive. In AT2, the number of visually-responsive channels per recording ranged between 8 and 12 with the median at 9.

### Computing autocorrelations of neural activity

We computed autocorrelations from multi-unit (MUA) spiking activity recorded in the presence (stimulus-driven) and absence (spontaneous) of visual stimuli (brown and yellow frames in Supplementary Fig. 1). For spontaneous activity, we analyzed spikes during the 3s fixation epoch in FT, and during the 800 ms epoch from 200 ms after the cue offset until the stimulus onset in AT2. For stimulus-driven activity, we analyzed spikes in the epoch from 400 ms after the cue onset until the stimulus offset in AT1, and from 200 ms after the stimulus onset until the dimming in AT2. For the stimulus-driven activity, trials in both attention tasks had variable durations (500 − 2200 ms). Thus, we computed autocorrelations in non-overlapping windows of 700 ms for AT1 and 500 ms for AT2. On long trials, we used as many windows as would fit within the trial duration, and we discarded trials that were shorter than the window size. The duration of windows were selected such that we had at least 50 windows for each condition in each session. 3 out of 25 recording sessions in monkey G (AT1) were excluded due to short trial durations. For spontaneous activity, the windows were 3 s in FT and 800 ms in AT2.

We computed the average spike-count autocorrelation for each recording session. On each trial we pooled the spikes from all visually-responsive channels and counted the pooled spikes in 2 ms bins. For each behavioral condition (stimulus orientation, attention condition), we averaged spike-counts at each time-bin across trials, and subtracted the trial-average from the spike-counts at each bin^{11} to remove correlations due to changes in firing rate locked to the task events. We segmented the mean-subtracted spike-counts into windows of the same length *N*, where indexes bins within a window. We then computed the autocorrelation in each window as a function of time-lag *t*_{j}^{39}:

Here is the sample variance, and and are two different sample means. In Eq.(1) for autocorrelation, we subtracted window-specific mean to remove correlations due to slow changes in firing rate across trials, such as slow fluctuations related to changes in the arousal state. Thus, the range of timescales was limited to the trial duration. These timescales reflect the intrinsic neural dynamics within single trials. Finally, we averaged the autocorrelations over windows of the same behavioral condition separately for each recording session. The exact method of computing autocorrelations does not affect the estimated timescales, since we use the same method for computing autocorrelations of synthetic data when fitting generative models with the aABC method^{39}.

In AT1, we averaged autocorrelations over trials with different stimulus orientation for each attention condition, since all attention conditions contained about the same number of trials with each orientation. For stimulus-driven activity in AT2, we first estimated timescales separately for focus wide and narrow conditions and found no significant differences (two-sided Wilcoxon signed rank test between MAP estimates, *p >* 0.05). Thus, we averaged autocorrelations of the focus narrow and wide conditions and refitted the average autocorrelations. The same procedure was applied to the spontaneous activity in AT2, and since there was no significant differences in timescales between different focus or attention conditions (two-sided Wilcoxon signed rank test between MAP estimates for the two-by-two conditions, *p >* 0.05), we averaged the autocorrelations over all conditions and refitted the average autocorrelation.

For estimating the timescales, we excluded sessions with autocorrelations dominated by noise or strong oscillations that could not be well described with a mixture of exponential decay functions. We excluded a session if the autocorrelation fell below 0.01 (log(*AC*) fell below −2) in lags smaller or equal to 20 ms (Supplementary Fig. 11). Based on this criterion, we excluded 3 out of 22 sessions for monkey G in AT1, 8 out of 21 sessions during covert attention and 9 out of 21 during overt attention for monkey B in AT1, 2 out 20 sessions for spontaneous activity and 8 out 20 sessions for stimulus-driven activity for monkey R in AT2. The difference in the number of excluded sessions for monkey R during spontaneous and stimulus-driven activity is explained by the larger amount of data available for computing autocorrelations during spontaneous activity due to averaging over attention conditions and longer window durations (800 ms vs. 500 ms).

For visualization of autocorrelations, we omitted the zero time-lag (*t* = 0 ms) (examples with the zero time-lag are shown in Supplementary Fig. 11). The autocorrelation drop between the zero and first time-lag (*t* = 2 ms) reflects the difference between the total variance of spike counts and the variance of instantaneous rate according to the law of total variance for a doubly stochastic process^{39}. This drop is fitted by the aABC algorithm when estimating the timescales.

### Estimating timescales with adaptive Approximate Bayesian Computations

We estimated the autocorrelation timescales using the aABC method that overcomes the statistical bias in empirical autocorrelations and provides the posterior distributions of unbiased estimated timescales^{39}. The width of inferred posteriors indicates the uncertainty of estimates. For more reliable estimates of timescales (i.e. narrower posteriors), we selected epochs of experiments with longer trial durations (brown and yellow frames in Supplementary Fig. 1).

The aABC method estimates timescales by fitting the spike-count autocorrelation with a generative model. We used a generative model based on a doubly stochastic process with one or two timescales. Spike-counts were generated from a rate governed by a linear mixture of OrnsteinUhlenbeck (OU) processes (one OU process for each timescale *τ*_{k})
where *n* is the number of timescales and *c*_{k} are their weights. The aABC algorithm optimizes the model parameters to match the spike-count autocorrelations between V4 data and synthetic data generated from the model. We generated synthetic data with the same number of trials, trial duration, mean and variance of spike counts as in the experimental data. By matching these statistics, the empirical autocorrelations of the synthetic and experimental data are affected by the same statistical bias when their shapes match. Therefore, the timescales of the fitted generative model represent the unbiased estimate of timescales in the neural data.

The spike-counts *s* are sampled for each time-bin from a distribution , where is the mean spike-count and is the bin size. To capture the possible non-Poisson statistics of the recorded neurons, we introduce a dispersion parameter *α* defined as the variance over mean ratio of the spike-counts distribution . For a Poisson distribution *α* is equal to 1. We allow for non-Poisson statistics by sampling the spike counts from a gamma distribution and optimize the value of *α* together with the timescales and the weights.

On each iteration of the aABC algorithm, we draw sample parameters from a prior distribution (first iteration) or a proposal distribution (subsequent iterations) defined based on the prior distribution and parameters accepted on the previous iteration. Then, we generate synthetic data from the sampled parameters and compute the distance *d* between the autocorrelations of synthetic and experimental data:
where *t*_{m} is the maximum time-lag considered in computing the distance. We set *t*_{m} to 100 ms to avoid over-fitting the noise in the tail of the autocorrelations. If the distance is smaller than a predefined error threshold *ε*, the sample parameters are accepted and added to the posterior distribution. Each iteration continued until 100 sample-parameters were accepted. The initial error threshold was set to *ε*_{0} = 0.1, and in subsequent iterations, the error threshold was updated to the first quartile of the distances for the accepted samples. The fraction of accepted samples out of all drawn parameter samples is recorded as the acceptance rate *accR*. The algorithm stops when the acceptance rate reaches *accR <* 0.0007. The final accepted samples are considered as an approximation for the posterior distribution. We computed the MAP estimates by smoothing the final joint posterior distribution with a multivariate Gaussian kernel and finding its maximum with a grid search.

The choice of summary statistic (e.g., autocorrelations in the time domain or power spectra in the frequency domain and the fitting range) does not does not affect the accuracy of estimated timescales and only changes the width of the estimated posteriors^{39}. The frequency-domain fitting converges faster in wall-clock time than time-domain fitting^{39}. As a control, we also estimated timescales by fitting the whole shape of power spectral density in the frequency domain. The results of these fits (Supplementary Fig. 7) were in agreement with the time-domain fits with a limited fitting range (Fig. 3).

We used a multivariate uniform prior distribution over all parameters. For the two-timescale generative model (*M*_{2}), the priors’ ranges were set to
and for the one-timescale generative model (*M*_{1}) they were set to

### Model comparison with adaptive Approximate Bayesian Computations

We used the inferred posteriors from the aABC fit to determine whether the V4 data autocorrelations were better described with the one-timescale (*M*_{1}) or the two-timescale (*M*_{2}) generative models^{39}. First, we measured the goodness of fit for each model based on the distribution of distances between the autocorrelation of synthetic data from the generative model and the autocorrelation of V4 data. We approximated the distributions of distances by generating 1000 realizations of synthetic data from each model with parameters drawn from the posterior distributions and computing the distance for each realization. If the distributions of distances were significantly different (two-sided Wilcoxon ranksum test), we approximated the Bayes factor, otherwise the summary statistics were not sufficient to distinguish these two models^{72}.

Bayes factor is the ratio of marginal likelihoods of the two models and takes into account the number of parameters in each model^{73}. In the aABC method, the ratio between the acceptance rates of two models for a given error threshold *ε* approximates the Bayes factor (BF) for that error threshold^{39}:

Acceptance rates can be computed using the cumulative distribution function (CDF) of the distances for a given error threshold *ε*,
where *pM*_{i} (d) is the probability distribution of distances for the model *M*_{i}. Thus, the ratio between the CDF of distances approximates the Bayes factor for every chosen error threshold. To eliminate the dependence on a specific error threshold, we computed the acceptance rates and the Bayes factor for varying error thresholds. Since only small errors indicate a well-fitted model, we computed the Bayes factor for all error thresholds that were smaller than the largest median of distance distributions of two models.

The *M*_{2} model was selected if its distances were significantly smaller than for the *M*_{1} model (two-sided Wilcoxon ranksum test) and , i.e. BF *>* 1, for all [median(*ε*)] (Supplementary Fig. 2). The same procedure was applied for selecting the *M*_{1} model. Although the Bayes factor threshold was set at 1, in most cases we obtained BF » 1, indicating strong evidence for the two-timescale model. If the distribution of distances for the two models were not significantly different or the condition for the ratio between CDFs did not hold for all selected *ε* (CDFs were crossing), we classified the outcome as inconclusive, meaning that data statistics were not sufficient to make the comparison.

### Timescales of auto- and cross-correlations of spiking activity on individual channels

We computed the average auto- and cross-correlations of the multi-unit spiking activity recorded on individual channels during spontaneous activity (monkey G in FT, monkey R in AT2). We computed the autocorrelation of each channel’s activity using the same procedure described above and then averaged the auto-correlations across channels and recording sessions for each monkey. We computed the cross-correlations between spike counts on every pair of channels (*A*_{a} and *A*_{b}) that were at least two channels apart (|*a* − *b*| ≥ 2 e.g., channels 1 and 3) as a function of time-lag *t*_{j}

Here and are the sample variances, and and are the sample means for the activity on each channel. Then, we divided the cross-correlations for each monkey in two groups based on the monkey-specific median RF-center distance and averaged over the cross-correlations within each group.

The mapping of RFs was described previously^{36}. RFs were measured by recording spiking responses to brief flashes of stimuli on an evenly spaced 6 × 6 grid covering the lower left visual field (FT) or an evenly spaced 12×9 grid centered on the RF (AT2). Spikes in the window 0−200 ms (FT) or 50−130 ms (AT2) relative to the stimulus onset were averaged across all presentations of each stimulus. First, we assessed the statistical significance of a given RF^{74} and only included channels with a significant RF. Then, we found the RF center as the center of mass of the response map, and estimated the horizontal displacements between the channels by computing the distances between their RF centers.

We estimated the timescales of auto- and cross-correlations using the aABC method. We assumed the correlation between channels’ activity can be modeled as a two-timescale OU process shared between the two channels. We fitted the cross-correlation shape by the unnormalized autocorrelation of the shared OU process, such that the variance of the OU process (i.e. the autocorrelation at lag zero) defines the strength of correlations. Thus, we used a two-timescale OU process as the generative model and applied the aABC method to optimize the model parameters by minimizing the distance between the autocorrelation of synthetic data from the OU process and V4 cross-correlations. The aABC method returned a multivariate posterior distribution for timescales, their weights and the variance of the OU process. We computed the distances starting from the first time-lag *t* = 2 ms up to *t*_{m} = 100 ms. For a fair comparison between the auto- and cross-correlations timescales, we used the same procedure to estimate the timescales of individual channels’ autocorrelations. For fitting the autocorrelation of monkey G, we additionally excluded the second time-lag *t* = 2 ms, since *AC*(*t* = 2) *< AC*(*t* = 4), potentially related to refractory period of neurons (similar to^{11, 31, 44}).

### Testing correlation between timescales and reaction times with linear mixed-effects models

To compute the reaction times for each attention condition, we separated the trials between attend-in (separate covert and overt) and attend-away conditions. We computed the average reaction times of the monkeys for each recording session and each condition as the average duration between the reappearance of the stimuli and initiation of the anti-saccade response (AT1, only trials with a change in stimuli orientation) or the average duration between dimming in the target stimulus and the bar release (AT2), across trials with the same attention condition.

We quantified the relationship between average reaction times and MAP estimates of the fast and slow timescales in each session for two different attention conditions (attend-in and attend-away). For this analysis, we pulled the data across covert and overt attend-in conditions, resulting in more samples for the attend-in than attend-away condition. For each attention condition, we fitted a separate linear mixed-effects model using the “*fitlm*” function in the MATLAB R2021a. In these models, we consider data from each monkey as a separate group (i.e. a random effect) with a separate intercept to account for individual differences between the monkeys and between the two response types in the attention tasks (anti-saccade versus bar release).

We fitted two different models that considered either one or two fixed effects for each attention condition. First, we fitted models that considered as the fixed effect, either the slow timescale (*τ*_{2,cond})
or the fast timescale (*τ*_{1,cond}),

Here the index cond denotes attend-in or attend-away condition, *RT* indicates the reaction time, *i* is the session index, and *m* ∈ {*G, B, R*} indicates three different monkeys. *ω*_{0} and *ω*_{1} give the intercept and slope of the fixed effect with a given p-value. Ω _{0,m} and *ε*_{i,m} are the random effects, where Ω _{0,m} gives a monkey specific intercept and *ε*_{i,m} gives the residuals. We also fitted models that considered both fast and slow timescales as fixed effects simultaneously,

These models return two fixed-effect coefficients *ω*_{1,2} with p-values, one for each timescale. The resulting statistics for the two fitted models were consistent (Supplementary Table 1, 2). In the main text, we reported statistics from the first model type (Fig. 4, Supplementary Table 1).

### Network model with spatially structured connections

The network model operates on a two-dimensional square lattice of size 100 × 100 with periodic boundary conditions. Each unit in the model is connected to 8 other units taken either from its direct Moore neighborhood (local connectivity, Fig. 6a, top) or randomly selected within the connectivity radius *r* (dispersed connectivity, Fig. 6a, bottom). Activity of each unit is represented by a binary state variable *S*_{i} ∈ {0, 1} (*i* = 1 … *N*, where *N* = 10^{4} is the number of units). The units act as probabilistic integrate-and-fire units^{75} following linear or non-linear integration rules. States of the units are updated in discrete time-steps *t*′ based on a self-excitation probability (*p*_{s}), probability of excitation by the connected units (*p*_{r}), and the probability of external excitation (*p*_{ext} « 1). The transition probabilities for each unit *S*_{i} at time-step *t*′ are either governed by additive interaction rules (linear model):
or multiplicative interaction rules (non-linear model):

Here, ∑ _{j} *S*_{j} indicates the number of active neighbors of unit *S*_{i} at time-step *t*′. For the analysis in the main text, we used the linear model. The non-linear model generates similar local temporal dynamics (Supplementary Fig. 12). In the linear model, the sum of connection probabilities BP = *p*_{s} + 8*p*_{r} is the branching parameter that defines the state of the dynamics relative to a critical point at BP = 1^{53, 75}.

To compute the average local autocorrelation in the network, we simulated the model for 10^{5} time-steps and averaged the autocorrelations of individual units. The global autocorrelations were computed from the pooled activity of all units in the network. To compute the autocorrelation of horizontal inputs for a unit *i*, we simulated the network with an additional “shadow” unit, which was activated by the same horizontal inputs (*p*_{r}) as the unit *i* but without the inputs *p*_{s} and *p*_{ext}. The shadow unit did not activate other units in the network. The autocorrelation of horizontal recurrent inputs was computed from the shadow unit activity. We computed the cross-correlations between the activity of each pair of units in the network and averaged the cross-correlations over pairs with the same distance *d* between units. To have the same number of sample cross-correlations for each distance, we randomly selected 4 × 10^{4} pairs per distance. The spatial distance in the model is defined as the Chebyshev distance on the lattice (e.g., *d* = 1 is the Moore neighborhood). Each simulation started with a random configuration of active units based on the analytically computed steady-state mean activity (Eq. 21). Running simulations for long periods allowed us to avoid the statistical bias in the model autocorrelations. We set *p*_{ext} = 10^{−4}, but the strength of external input in the linear model does not affect the autocorrelation timescales.

### Network model with different unit types

In this model, two unit-types A and B are placed at each node of a two-dimensional square lattice (Fig. 5a). The connectivity between the units is random and each unit is connected to 8 other units of any type.

The activity of each unit is given by a binary state variable *S*_{i} ∈ {0, 1} with transition probabilities as in the spatial linear model (Eq. 12), but with different probabilities for the self-excitation (*p*_{self,A}, *p*_{self,B}) and recurrent interactions (*p*_{r,A}, *p*_{r,B}) for each unit type. In order for both unit types to operate in the same dynamical regime, we set *p*_{self,A} + 8 *p*_{r,A} = *p*_{self,B} + 8 *p*_{r,B} = BP. Simulations were performed as for the spatial network, but auto- and cross-correlations were computed using the summed activity of two units *A* and *B* at each lattice node.

### Network model with synaptic filtering

The model operates on a two-dimensional square lattice, where each unit on the lattice is connected to 8 randomly selected units (Fig. 5b). We define the discrete-time dynamics of units in this model based on a previously proposed continuous rate model with synaptic filtering^{46}. The transition probabilities for each binary unit *S*_{i} ∈ {0, 1} at time-step *t*′ are governed by

Here, *f* (∑ _{j} *S*_{j}) is a low-pass filter on recurrent inputs to each unit with the time constant *τ*_{synapse}, which evolves in discrete time-steps:
where Δ*t*′ = 1 ms is the duration of each time step. Simulations and computation of auto- and cross-correlations were the same as for the spatial network.

### Analytical derivation of local timescales in the spatial network model

For analytical derivations, we derived a continuous-time rate model corresponding to the linear probabilistic network model (Eq. 12), with the transition rates defined as

These equations contain two non-interaction terms and , and two interaction terms , where Δ*t*′ = 1 ms is the duration of each time step (details in^{51}). For this model, the probability of units to stay in a certain configuration {*S*} = {*S*_{1}, *S*_{2}, …, *S*_{N} } at time *t*′ is denoted as *P* ({*S*}, *t*′). The master equation describing the time evolution of *P* ({*S*}, *t*′) is given by^{50}:
where {*S*}^{i*} = {*S*_{1}, *S*_{2}, …, 1 − *S*_{i}, …, *S*_{N} }. Using the master equation, we can write the time evolution for the first and second moments as
and for the time-delayed quadratic moment at time-lag *t* as

By setting the right side of Eq. 18 to zero and averaging across all units, we can compute the steady-state mean activity
where *n* = 8 is the number of incoming connections to each unit.

We compute the timescales analytically for the network with local connections (*r* = 1). From Eq. 20, we can derive the equation for the average autocorrelation of each unit *AC*(*t*) as

Here *CC*(**x**, *t*) is the cross-correlation between each unit at location (*i, j*) and its 8 nearest neighbors **x** = (*i* ± 1, *j* ± 1). The cross-correlation term in this equation gives rise to the interaction timescales in the autocorrelation. By neglecting the cross-correlation term, we can solve the Eq. 22 to get the self-excitation timescale

Solving the dynamical equation for the time-delayed cross-correlation (Eq. 20) in the Fourier domain gives the interaction timescales (Supplementary Note 4, details in^{47}):
where **k** = (*k*_{1}, *k*_{2}) are the spatial frequencies in the Fourier space. For each **k** we get a different interaction timescale. Smaller **k** (low spatial frequencies) correspond to interactions on larger spatial scales, whereas larger **k** (high spatial frequencies) correspond to interactions on more local spatial scales. The largest interaction timescale (the global timescale) is defined based on the zero spatial frequency mode:

In these derivations, we defined distances between units as Euclidean distances and discarded the contributions from third and higher moments.

Considering the self-excitation and interaction (i.e. cross-correlation) terms, we can write down the analytical form of the autocorrelation function as
where *A* is the normalization constant to get *AC*(*t* = 0) = 1. *N* ^{i} is the number of units in each dimension: *N* ′ × *N* ′ = *N*. This equation shows that the autocorrelation function contains self-excitation timescale *τ*_{self} and *N* ′^{2}*/*4 interaction timescales weighted by the amplitude of cross-correlation function for the given spatial frequency mode (*k*_{1}, *k*_{2}). We can approximate the slow decay of the autocorrelation with an effective interaction timescale *τ*_{int} given by the weighted average of all interaction timescales created by different spatial frequency modes^{47}:

Here *CC*(0, 0) is given by .

The analytical approximation of the effective interaction timescale is more accurate when the dynamics are away from the critical point. Close to the critical point (BP → 1), the mean-field approximations are not valid.

The self-excitation timescale for the discrete time network model can also be obtained analytically using the autocorrelation of a two-state Markov process driven by the self-excitation and external input. Using the transition matrix (considering the linear model)
we can compute the autocorrelation of the Markov process at time-lag *t* (Supplementary Note 3):

The decay timescale of this autocorrelation is equivalent to the self-excitation timescale in the network model
which for Δ*t*′ = 1 is equivalent to Eq. 23.

### Analytical derivation of timescales for nonlinear interactions

We can write down the general form of transition rates described previously in Eq. 16 as
ℱ (*x*) is a non-linear activation function that is a monotonically increasing function of *x* and satisfies ℱ (0) = 0, ℱ (∞) = 1. Here we consider ℱ of the form:
where *θ* is a positive constant that controls the gain of recurrent inputs, and *n* is the number of connected neighbors to each target unit. The activation function with a constant global input current *I* ⩾ 0 can be written as:
where is the steady-state mean activity. Here *I* is a constant input current that uniformly increases activation of all units, which is different from *p*_{ext} that provides stochastic and spatially random activation of units. We interpret *I* as the attentional input (e.g., from FEF) to V4 area.

To compute the timescales in the presence of non-linearity and external input current, we can perform Taylor expansion of the interaction terms around the mean activity
where ℱ ′ denotes the derivative of ℱ and ℱ _{0} is defined as

Using these expansions, we can rewrite the transition rates as where

Hence, all non-interaction and interaction terms, as well as the mean activity depend on the external input. Consequently, the self-excitation and interaction timescales become input dependent.

The explicit form of the self-excitation timescale and the global interaction timescale are given by and

When , increasing the external input *I* would lead to an increase in the mean activity and the self-excitation timescale. This conditions implies that already active units are more excitable in the next time step compared to silent units. Moreover, if in addition to , we have , the global timescale would also increase. Other interaction timescales increase with the input when (details in^{47}). The changes in the fast timescale are smaller than in the slow timescale and can remain undetected with the limited data amount.

### Matching the timescales of the network model to neural data

To match the timescales between the model and V4 data, we used the activity autocorrelation of one unit in the network model with local connections (*r* = 1). We searched for model parameters such that the model timescales fell within the range of timescales observed in the V4 activity, which was the mean ± s.e.m of the MAP timescale-estimates across recording sessions. We computed the range for the fast timescales from the pooled attend-in and attend-away conditions, since they were not significantly different: *τ*_{1,att-away} = *τ*_{1,att-in} = 4.74 ± 0.42 ms. We used this range for the fast timescale in both the attend-in and attend-away conditions. For the slow timescales, we computed the ranges separately for the attend-in (averaged over covert and overt) and attend-away conditions: *τ*_{2,att-away} = 117.09 ± 10.58 ms, *τ*_{1,att-in} = 140.97 ± 11.51 ms.

We fitted the self-excitation and dominant interaction timescales obtained from the autocorrelation of an individual unit’s activity in the model to the fast and slow timescales of V4 data estimated from the aABC method. Using Eq. 30 and Eq. 27, we found an approximate range of parameters *p*_{s} and *p*_{r} that reproduce V4 timescales. Then, we performed a grid search within this parameter range to identify the model timescales falling within the range of V4 timescales during attend-away and attend-in conditions. We used model simulations for grid search since the analytical results for dominant timescale are approximate. We used very long model simulations (10^{5} time-steps) to obtain unbiased autocorrelations and then estimated the model timescales by fitting a double exponential function
directly to the empirical autocorrelations. We fitted the exponential function up to the time-lag *t*_{m} = 100 ms, the same as used for fitting the neural data autocorrelations with the aABC method.

## Data availability

All behavioral and electrophysiological data from FT and AT1 are available on Fighshare, respectively, at https://doi.org/10.6084/m9.figshare.19077875.v1 and https://doi.org/10.6084/m9.figshare.16934326.v3.

## Code availability

Codes for the timescale estimation and Bayesian model comparison with the aABC method are available as a Python package at: https://github.com/roxana-zeraati/abcTau. Codes for simulating the spatial network model are available at: https://github.com/roxana-zeraati/spatial-network.

## Author Contributions

R.Z., A.L., and T.A.E. designed the study. N.A.S., M.A.G, A.T., and T.M. designed the experiments. N.A.S. and M.A.G performed the experiments and spike sorting. R.Z., Y.S., A.L., and T.A.E. developed the analysis methods and mathematical models. R.Z. analyzed the data and performed model simulations. Y.L. performed the analytical calculations for the network model. R.Z., Y.S., N.A.S., M.A.G, A.T., T.M., A.L., and T.A.E. discussed the findings and wrote the paper.

## Supplementary Information

Supplementary Notes 1–6

Supplementary Tables 1–2

Supplementary Figures 1–14

## Competing interests

The authors declare no competing interests.

## Acknowledgements

This work was supported by a Sofja Kovalevskaja Award from the Alexander von Humboldt Foundation, endowed by the Federal Ministry of Education and Research (R.Z., A.L.), SMARTSTART2 program provided by Bernstein Center for Computational Neuroscience and Volkswagen Foundation (R.Z.), the NIH grant R01 EB026949 (T.A.E.), the Swartz Foundation (Y.S.), the Pershing Square Foundation (T.A.E.), the Sloan Research Fellowship (Y.S.,T.A.E.), the NIH grant EY014924 (T.M.), the MRC grant MR/P013031/1 (M.A.G., A.T.). The authors acknowledge the support from the BMBF through the Tübingen AI Center (FKZ: 01IS18039B) and the International Max Planck Research School for the Mechanisms of Mental Function and Dysfunction (IMPRS-MMFD).

## Footnotes

We performed new modeling and data analyses to strengthen our main conclusion that neural timescales in the visual cortex reflect spatial connectivity. In addition, we found that slow timescales predict animals' task performance, suggesting that the increase in slow timescales may contribute to behavioral benefits of selective spatial attention. To better convey the scope of our new results, we modified the manuscript title which now reads "Intrinsic timescales in the visual cortex change with selective attention and reflect spatial connectivity".