Generating surrogates for significance estimation of spatio-temporal spike patterns

Spatio-temporal spike patterns were suggested as indications of active cell assemblies. We developed the SPADE method to detect significant spatio-temporal patterns (STPs) with ms accuracy. STPs are defined as identically repeating spike patterns across neurons with temporal delays between the spikes. The significance of STPs is derived by comparison to the null-hypothesis of independence implemented by surrogate data. SPADE binarizes the spike trains and examines the data for STPs by counting repeated patterns using frequent itemset mining. The significance of STPs is evaluated by comparison to pattern counts derived from surrogate data, i.e., modifications of the original data with destroyed potential spike correlations but under conservation of the firing rate profiles. To avoid erroneous results, surrogate data are required to retain the statistical properties of the original data as much as possible. A classically chosen surrogate technique is Uniform Dithering (UD), which displaces each spike independently according to a uniform distribution. We find that binarized UD surrogates of our experimental data (motor cortex) contain fewer spikes than the binarized original data. As a consequence, false positives occur. Here, we identify the reason for the spike reduction, which is the lack of conservation of short ISIs. To overcome this problem, we study five alternative surrogate techniques and examine their statistical properties such as spike loss, ISI characteristics, effective movement of spikes, and arising false positives when applied to different ground truth data sets: first, on stationary point process models, and then on non-stationary point processes mimicking statistical properties of experimental data. We conclude that trial-shifting best preserves the features of the original data and has a low expected false-positive rate. Finally, the analysis of the experimental data provides consistent STPs across the alternative surrogates. Author summary The generation of surrogate data, i.e., the modification of original data to destroy a certain feature, is used for the implementation of a particular null-hypothesis if an analytical approach is not feasible. Thus, surrogate data generation has been extensively used to assess significance of spike correlations, e.g., by displacing spikes locally and uniformly (uniform dithering, UD). One of the challenges of surrogate methods is to properly construct the desired null-hypothesis distribution, i.e., to not bias the null-hypothesis through altering the spike train statistics. Here we compare five surrogate techniques to UD (two newly introduced) in the context of spatio-temporal spike pattern detection for their performance, first on controlled spike trains based on point process models, and second on modeled artificial data serving as ground truth to assess the pattern detection in a more complex and realistic setting. UD fails as an appropriate surrogate because it leads to a loss of spikes and thus to a large number of false-positive patterns. The other surrogates achieve a better performance in detecting precisely timed higher-order correlations. Based on these insights, we analyze experimental data from pre-/motor cortex of macaque monkeys during a reaching-and-grasping task for spatio-temporal spike patterns.


Introduction
1 Surrogate generation is a standard tool for the significance evaluation of precise spike 2 time correlation [1][2][3][4]. As in recent years the methods for identification of higher-order 3 correlations in multiple parallel spike trains are being further developed [5], also 4 surrogate techniques need to be understood in more depth [6]. Surrogate generation is 5 an alternative to the analytical derivation of a null-hypothesis when the definition of an 6 exact test distribution is difficult or even impossible [7,8]. Such an approach is also 7 referred to as a bootstrap test of hypothesis and is widely used and effective [9]. When 8 the null-hypothesis is rejected, the results obtained are said to be statistically 9 significant. In the context of temporal coding, the goal is to test whether the present 10 temporal correlations (with millisecond precision) are given by chance or represent a 11 feature of the network dynamics. This corresponds to formulating a null-hypothesis 12 where the fine temporal correlations are not present ("independent" model). 13 We are interested in detecting significant spatio-temporal spike patterns as 14 signatures of active assemblies. Precisely timed higher-order correlations have been 15 studied by numerous authors [10][11][12][13][14][15][16]. However, these correlation-based studies were not 16 well suited to find all possible spatio-temporal patterns in recordings of hundreds of 17 neurons. Therefore, we developed a method (called SPADE) that detects and evaluates 18 spatio-temporal spike patterns at a precision level of milliseconds [17][18][19]. In contrast to 19 other methods aiming at the detection of spatio-temporal patterns, it does not build on 20 pairwise analysis and enables to derive the significance of the patterns [5,20]. SPADE 21 detects patterns of action potentials between different neurons with particular time 22 intervals between the spikes that repeat identically. It was originally conceived as a 23 technique for detecting synchronous spike patterns [17,21], but has been further 24 developed for detecting delayed spike patterns [18,19,22] and for testing them 25 statistically [19]. The null-hypothesis for the significance test is generated through 26 surrogate data that implement independence between the spike trains given the varying 27 firing rates. 28 In electro-physiological experiments, nowadays the activity of hundred(s) of neurons 29 is recorded. Therefore, the number of pattern candidates can be very high -in an 30 example data set at our hand we find up to 10 9 patterns. As a consequence, testing 31 each pattern individually is not feasible [23][24][25], as this would result in massive multiple 32 testing. Moreover, some methods for pattern detection rely on the assumption of that the surrogates contain fewer spikes than the original data. This could yield 48 false-positive detection [4,35]. 49 Thus, this study comprises the evaluation of this effect, and a careful and detailed 50 analysis of alternative surrogate methods to be used as a solution. So, after describing 51 the SPADE method and the role of surrogates therein in more detail (in section 2), we 52 analyze the origin of spike count reduction in UD surrogates (section 3.1) and identify 53 the destruction of the original inter-spike interval (ISI) distribution by UD [27,32] as the 54 cause for it. We show by simulation of defined stochastic processes that the subsequent 55 discretization (binning followed by clipping) of this surrogate data removes more spikes 56 than of the original data. 57 To avoid this we describe alternative surrogates -mostly from the literature [32,36], 58 but also newly defined ones -that aim to preserve the ISI distribution, and explain their 59 concrete implementation (section 3.2). In section 3.3, these alternative surrogates are 60 compared in their statistical properties and compared to UD. This is done by applying 61 the surrogates to well-defined ground truth data which do not contain patterns, i.e., 62 stationary point processes with ISI distribution with dead-time (Poisson process with 63 dead-time, PPD), or with a peaked ISI distribution (Gamma process) and for 64 comparison a Poisson process. The statistical aspects that are explored are the effects 65 on the potential loss of spikes, on the ISI distributions, auto-correlations, and the 66 coefficient of variation (CV), but also the effective movement of the spikes in the 67 surrogate as compared to the original data. As a summary, we provide a table with 68 preserved and destroyed features and give an outline of their expected impact on the 69 appropriate estimation of significance. 70 In a further step, we evaluate the performance of the surrogates in respect to 71 false-positive patterns (FPs) when applying them to ground truth data that are 72 generated to closely model the experimental data (section 3.4) which are later also 73 analyzed (section 3.5). The data are modeled as independent PPD or Gamma processes 74 with instantaneous firing rates estimated on a trial-by-trial and neuron-by-neuron level 75 from the experimental data. The PPD includes the same dead-time as the experimental 76 data through the spike sorting process, and the Gamma process is generated with a 77 shape factor using the CV of the real data. The modeled data are then segmented in 78 behaviorally relevant epochs (as the experimental data), which are analyzed separately 79 by SPADE. As a result, we get the amount of FPs per data set (PPD and Gamma) and 80 observe that UD generates in both cases a large number of FPs as compared to the 81 alternative surrogates. The number of FPs increases with the firing rates. 82 Finally, in the last section 3.5 we analyze two experimental data sets, each from a 83 different monkey, employing all explored surrogates. We indeed find STPs, for monkey 84 N in almost all epochs, for monkey L in less. SPADE with UD finds many more STPs, 85 which we mostly interpret as FPs, as concluded from the previous analyses. In contrast, 86 the other analyses of SPADE with the alternative surrogates yield fewer STPs but more 87 than expected by the null-hypothesis. In addition, we obtain almost the same results 88 (identical patterns and pattern counts) in the various epochs across each monkey, 89 August 6, 2021 3/35 supporting the robustness of the analysis. In the discussion, we integrate the results of 90 all analyses and conclude on trial-shifting as the surrogate of choice, since it keeps the 91 spike trains intact, leads to the smallest change of the statistical properties, thus it is 92 expected to neither over-nor underestimate the significance of the patterns. 93 The software and data resources are made available and in the supplementary 94 information we provide analytical and other details of the analyses.  the sequence of analysis steps of the original data until the pattern spectrum is derived. The bottom branch of the workflow starts with the generation of the surrogate data from the original data, followed by the same analysis steps as for the original data. The multiple overlapping panels in the lower branch indicate that this surrogate procedure is repeated many times, by which the p-value spectrum is built up. This then serves for the extraction of significant patterns through pattern spectrum filtering. After the application of the pattern set reduction, the significant STPs are provided as a result. 'Surrogate method 2' indicates that the part 'Destroy correlation' and 'surrogate data' are replaced by another surrogate method, but the other steps stay the same.
The spike train data are first discretized into exclusive time intervals (bins).

100
Typically, the bin length consists of a few milliseconds, which at the same time defines 101 the allowed temporal imprecision of the neuronal coordination. The procedure of 102 discretization counts the number of spikes within each bin (binning, [3,17]), followed by 103 reducing the bin content to 1 if a bin contains more than 1 spike (clipping). In the their number of spikes z, the number of pattern repetitions c, and the temporal extent 112 from first to last spike d (see Fig 2C). We call the triplet {z, c, d} a signature.

113
FIM efficiently collects and counts pattern candidates, nonetheless, the probability 114 (thus the statistical significance) of each of the mined patterns has still to be evaluated. 115 We aim at testing whether the patterns emerge as an effect of precisely timed neuronal 116 coordination, or merely by the firing rate profile of independently firing neurons. For 117 this reason, the null-hypothesis is that spike trains are mutually independent given their 118 firing rate (co-)modulations and that the occurring patterns in the data are given by 119 chance. All patterns that have a p-value lower than the threshold (after multiple testing 120 correction) reject the null-hypothesis and are classified as significant.

121
To implement the null-hypothesis of independence of the spike trains, we resort to 122 the generation of surrogate data. The goal is to generate data from the original spike 123 trains in such a way that putative precise spike correlations in the original data are 124 destroyed, while the remaining statistical characteristics of the spike trains, such as the 125 single neuron firing rate modulations are preserved. The classical approach regarding 126 surrogate generation is uniform dithering [1,28,29]: to obtain a surrogate data set from 127 the original data, uniform dithering displaces each spike independently by a random 128 amount δ from its original position. δ is sampled from a uniform distribution 129 U ([−∆, +∆]), and the maximum displacement ∆ is usually a multiple of the bin 130 size [36].

131
Each surrogate data set is subject to the same procedure as the original data: 132 binarization, followed by FIM, which results in pattern counts per signature. To 133 generate reliable statistics, we generate and analyze many surrogates (in the range of 134 thousands: here 5000). From these 5000 sets of pattern counts, we compile a p-value 135 spectrum, i.e., a 3-dimensional matrix containing for each signature the fraction of 136 surrogates data sets containing patterns with that signature. This yields for each 137 signature {z, c, d} a p-value ( Fig 2C). Due to the large amount of tests performed, we 138 correct the significance level by the False Discovery Rate correction [39]. The number of 139 tests considered is the number of occupied entries of the pattern spectrum having the 140 highest number of occurrences per size and duration. We call this step of testing the 141 significance pattern spectrum filtering (PSF). By comparing the pattern counts of the 142 original data to the p-value spectrum, potentially significant patterns are extracted, and 143 are further filtered by the pattern set reduction (PSR) [17]. Pattern set reduction 144 consists of conditional tests of each pattern given any other pattern surviving the PSF, 145 in order to remove spurious false positives resulting as a by-product of the overlap of 146 true patterns and chance spikes.

147
When analyzing large-size experimental data, the search for all possible patterns can 148 result into obtaining millions, if not billions, of putative patterns. Thus, the analysis 149 can quickly become infeasible due to memory or time requirements. In order to optimize 150 our analyses, we run the FIM algorithm separately per pattern size (from size 2 to size 151 10+ in steps of one). For each pattern size, we set a minimum number of occurrences 152 min occ a pattern has to occur to be further considered after the mining step. This occurrences are considered as spurious as they would be rejected anyway by the neurons, the lags between the pattern spikes, and the times of occurrences. The SPADE method is developed to be applied to experimental data. We started to 168 analyze parallel spike trains recorded during a reach-to-grasp experiment in monkey 169 pre-/motor cortex [34,40]. The data were recorded with a 100-electrode Utah array and 170 resulted in about 100-150 single units per session. The firing rates of the neurons and 171 the regularity of their inter-spike intervals (ISIs) were shown to be highly variable and 172 behavior-dependent [41,42].

173
Hence, we checked if the statistical features of the spike trains are conserved after 174 discretization and surrogate generation within our SPADE analysis. In particular, we 175 tested the spike counts per neuron before and after the binarization steps for both the 176 original and surrogate data. To our surprise, we noticed that for some neurons the total 177 spike counts of the surrogates were much lower than those of the original data. Further 178 analysis of this aspect showed, as a function of the mean firing rate of each neuron, that 179 the higher the firing rate of the neuron the larger the spike count reduction and the 180 corresponding mismatch. Fig 2A, shows that for two different data sets, each from a 181 different monkey (data are published and freely accessible in [34]), we find a spike count 182 mismatch of up to 10% between the surrogates and the original data (Fig 2A, bottom, 183 gray). Such a difference in the spike count is troublesome, since it is expected to lead to 184 a reduced pattern count of the surrogates as compared to the original data and, thus, is 185 expected to yield an overestimation of the significance of the patterns. "jittering" or "teetering" and is a classical choice for surrogate generation and was 191 employed in several experimental studies [2,21,29,32,33,43]. It is widely used for its 192 simplicity and computational speed for synchrony detection of pairwise (i.e., 193 cross-correlogram significance estimation, [1,4]), higher-order synchrony, and pattern 194 detection [2,44,45]. In particular, it was chosen as the surrogate generation technique 195 for synchrony and pattern detection using SPADE [18,19,21]. Nonetheless, in [35] the 196 authors already demonstrated in the context of analyzing pairwise synchrony, that 197 uniform dithering (UD) may lead to false positives in the case of regular firing 198 properties (CV < 1).

199
The dither parameter of the method ∆ > 0 determines the maximal displacement of 200 a spike from its original position. Typically, it is chosen to be a multiple of the bin size 201 parameter (e.g., 15 − 25ms), and must be chosen appropriately. If ∆ is too small, it 202 causes insufficient displacement of the correlated spikes and may lead to an 203 underestimation of significance, whereas if ∆ is too large, it yields a strongly smoothed 204 firing rate profile [31] and thus to an inappropriate null-hypothesis. In how far it also 205 affects other intrinsic statistical properties of the spike trains will be studied in this in blue). Neurons represented are, with (channel-id, unit-id) notation, (7,1) and (16,1) for monkey N,

Origin of spike count reduction 208
The UD procedure as such is not deleting any spike. However, the SPADE workflow 209 Fig 1 illustrates that the analysis steps of the original data and the surrogate data are 210 identical from binning and clipping on. The only difference between the two data sets is 211 the manipulation of the original data to generate the surrogate data set. Therefore, in 212 the following, we will carefully analyze the reason for the spike count reduction.

213
Only the binarization step, i.e., binning followed by clipping, may reduce the spike 214 count content in a bin. This step, applied to the original and to the surrogate data, 215 leads to different results. A potential reason may rest on the inter-spike interval (ISI) 216 statistics of the spike trains with and without dithering. Fig 2B, shows the ISI 217 distribution for two example neurons of the original data (in blue; right for monkey N, 218 left for monkey L) and for comparison, the ISI distributions of the uniform dithered 219 surrogates (gray). In the original data, the ISI distribution is peaked at a certain ISI, 220 here between 5 and 10ms, but the ISI distributions of the surrogate data are decaying 221 exponentially, and thus also fill small ISIs. This indicates the fact that the original spike 222 trains are more regular than the surrogates and small ISIs have a lower probability.

223
This is confirmed by the measurements of the coefficient of variation, here CV2 which 224 compensates for non-stationarity firing rates [46]. Indeed, the CV2 distribution of all 225 neurons of each of the two data sets (Fig 2C, left subpanels) is rather below 1, i.e., more 226 regular than Poisson. This raises the question if the surrogates lead indeed not only to 227 different ISI distributions, but also to different CV2s, which will be investigated in 228 section 3.3.

229
In addition, the distributions of the minimal ISI of each neuron per data set ( Fig 2C, 230 right subpanels) exhibit a minimal ISI of 1.3 ms for monkey N and of 1.6ms for monkey 231 L. This corresponds to the dead-times of the spike sorting algorithm that cannot resolve 232 overlapping spikes. The different dead-times for the two monkeys are due to a different 233 number of sample data points considered for spike sorting [34]. The corresponding ISI 234 distributions of the surrogate data ( Fig 2B) show that there are ISIs smaller than the 235 minimum ISI of the respective experimental data. Thus the dithering procedure 236 generates shorter ISIs in the UD surrogates than in the original data.

237
To verify our interpretation that the combination of UD and binarization causes the 238 spike count reduction, we perform a similar analysis on well-defined ground truth data, 239 i.e., simulated spike trains of a well-defined point process model and parameters. We 240 choose two types of stochastic point processes that model spike trains exhibiting similar 241 ISI distributions as the experimental data. One is a Poisson process with a hard 242 dead-time (PPD, [47]), and the other is a Gamma process. For simplicity, we now 243 choose both of a constant rate but adapt the dead-times or shape factors, respectively, 244 to account for the ISI features of the experimental data. The PPD is a variation of the 245 classical Poisson process wherein no spike is generated within an interval to the previous 246 spike smaller than the dead-time d. The Gamma process has a shape parameter γ 247 which is related to the intrinsic regularity/irregularity of the spike train [48]. If γ > 1, 248 the process is regular (i.e.,CV < 1); if instead γ = 1, it coincides with the classical 249 Poisson process with an exponential ISI distribution, which here serves for reference. 250 We do not consider here Gamma processes with CV > 1, i.e., with bursty spike trains. 251 Fig 3 shows the spike count reduction (expressed as 1 − N clip /N , where N clip is the 252 number of clipped spikes and N the total number of spikes) that results after binning 253 (5ms) of the PPD (left) and Gamma process (right) with (dashed) and without (solid) 254 application of uniform dithering, here obtained through analytical derivation (see 255 Supplementary Information S1). For the PPD and Gamma models, we vary the 256 dead-time parameter d and the shape factor γ, respectively. The graphs show the spike 257 count reduction as a function of the firing rate of the processes. The PPD model shows 258 a higher spike loss the higher the firing rate. The spike reduction is lower for larger In contrast, due to the regular ISIs of the original process, its binned data are hardly losing spikes in comparison to the dithered version. (B). Spike count reduction (after binning in 5ms intervals and clipping) shown as analytical results (see Supplementary Information S1) for renewal point process models (PPD, left and Gamma process, right; solid lines, respectively), each with 4 different parameter sets (PPD: d = 1.5, 2.0, 2.5, 3.0ms, Gamma: γ = 1, 1.5, 2, 2.5, different colors). The dashed lines show the same quantity for their UD surrogates. The firing rate of the processes is also varied and shown along the x-axis. The spike count reduction is shown on the y-axis, expressed as 1 − N clip /N, where N clip is the number of clipped spikes and N is the total number of spikes. (C). P-value spectrum of the original data (top) and of their UD surrogates (bottom) for mined patterns (of size 3 only) for a range of different pattern durations d (y-axis), and pattern counts (x-axis). The p-values are expressed by colors ranging from dark blue to light blue (see the color bar, identical for both spectra). The bin size is 5ms. The PPD data are of 100 realizations of n = 20 parallel independent spike trains with parameters λ = 60Hz, d = 1.6ms. The p-value spectrum for its uniformly dithered surrogates of the PPD data are derived from 5000 surrogates, dither parameter ∆ = 25ms.
dead-times. The uniform dithered PPD show for all dead-times an increase in the spike 260 reduction with higher firing rates, but to a larger extent than for the original PPD 261 processes. The Gamma process (right) for γ > 1 shows a similar result as for the PPD: 262 an increase of spike count reduction for increasing firing rates, and the larger the shape 263 factor, the lower the spike count reduction. The increase is rather parabolical compared 264 to the PPD. The Poisson process (blue, γ = 1) shows a much larger and linear increase 265 of spike reduction with rate, more strongly than for Gamma processes with γ > 1 266 (orange, green, red).

267
So, a) why does a Poisson-like process lose more spikes through binarization than a 268 process with a non-exponential ISI distribution, and b) why does uniform dithering lead 269 to a loss of spikes compared to the original experimental data? As shown above 270 (Fig 2B), uniform dithering generates a more Poisson-like ISI distribution. Such a 271 process contains spikes that follow each other in short intervals. Such a cluster of spikes 272 may fall within a bin, and then the content of the bin is reduced to 1 by clipping (see for 273 illustration Fig 3). A PPD process has a strict dead-time, leading to the fact that the 274 probability of more than one spike in a bin is reduced, and thus there is a smaller spike 275 loss. In fact, the goal of preserving the dead-time is to reduce the difference between the 276 binned and clipped spike counts of the original spike train and its surrogates. The closer 277 the dead-time to the bin size, the more unlikely it is that two spikes are dithered into  Through the spike count reduction in the surrogate data also the expected occurrences 282 of patterns may be reduced. This we prove by analyzing independent data comparing to 283 UD surrogates. In Fig 3C, we represent the results of such a comparison. We generate 284 20 parallel artificial independent PPD spike trains with a stationary firing rate of 285 λ = 60Hz, and with a dead-time of d = 1.6ms, and a total length of 1s. From these 286 spike trains, we generate 5000 surrogates, each with uniform dithering, discretize them 287 and count the number of patterns using FIM. Based on the generated pattern spectrum, 288 we calculate the corresponding p-values. In addition, we aim to compare the p-value 289 spectrum of the surrogate data ( Fig 3C, bottom panel) with the p-value spectrum of the 290 ground truth data. In order to do so, we generate a large number of the independent 291 PPD processes and analyze these by FIM to extract the patterns and generate the 292 p-value spectrum. In other words, knowing the ground truth, we can use new spike train 293 realizations as surrogates. Fig 3C, top panel, shows the p-value spectrum of the PPD 294 processes, for patterns of size 3, across different pattern durations. Since these data are 295 independent, the resulting patterns are occurring by chance. The p-value spectrum of 296 the surrogates of the PPD processes is shown below. The two p-value spectra illustrate 297 that the p-values of the UD surrogates are smaller and UD surrogates have fewer 298 pattern counts as compared to the ground truth data. In other words, pattern counts of 299 the ground truth (PPD) data would become significant if compared to its UD surrogates 300 although the ground truth data sets are independent. As a consequence, patterns in the 301 independent original data would be classified as significant while being false positives. 302 From this analysis, we conclude that uniform dithering is not an appropriate 303 surrogate method for spike data that either contain a hard dead-time or have a regular 304 spiking behavior, as motor cortex data tend to have [42]. Therefore, we will next 305 explore various surrogate techniques that are appropriate for such data. Here, we test five alternative surrogate methods, three known from the literature and 308 two newly developed by us, and evaluate their applicability to our type of data. The 309 goal is to find a surrogate type that preserves relevant features of the data, e.g., the 310 spike count.   [32,35]) aims 344 to keep the distribution of the preceding and following inter-spike intervals relative to a 345 spike according to the joint-ISI probability distribution. This probability distribution is 346 derived for each neuron from its spike train by calculating the corresponding joint-ISI 347 histogram (with a default bin size of 1 ms). Dithering one spike according to such a 348 two-dimensional histogram corresponds to moving the spike along the anti-diagonal of 349 the joint-ISI distribution [32,35]. The details of the surrogate procedure are described in 350 the Supplementary Information S2.

351
Unfortunately, the recordings are often non-stationary and too short to comprise 352 enough spikes to estimate the underlying joint-ISI probability distribution. Therefore, 353 we apply on the joint-ISI histogram a 2d-Gaussian smoothing with variance σ 2 , with σ 354 of the order of milliseconds [35,49] artificially, e.g., spontaneous/ongoing data.

377
TR-SHIFT has the benefit of keeping the entire spike train structures intact during 378 each trial, i.e., firing rates, ISI distributions, and auto-correlation (1). 379 380 We also introduce window shuffling (WIN-SHUFF), which divides the spike train into 381 successive and exclusive small windows of predefined duration ∆ WS , and further divides 382 the windows into bins of length b (∆ WS should then chosen to be a multiple of b). The 383 bins are then shuffled within each window, and spike times are randomized within each 384 bin. This method conserves the number of spikes of the original data and the number of 385 occupied bins in the original but already discretized data for each surrogate, i.e., there 386 is no risk of spike count reduction (1). The bin size b is typically chosen to be equal to 387 the bin size used in the SPADE analysis. The firing rate profile is modified by the local 388 shuffling of the spikes, and its degree of resemblance to the original profile depends, 389 similarly as for any other dithering technique, from the parameter ∆ WS , i.e., the 390 window of the randomization. To facilitate the comparison to the other methods, we fix 391 throughout the paper ∆ WS = 2∆.  (Table 1), we compare the features numerically on stationary and 395 independent data. For these data, we simulate point process models with well-defined 396 properties: a Poisson process as a reference, Poisson process with dead-time (PPD), and 397 a Gamma process. The latter two are chosen to mimic the ISI distributions of the 398 experimental data (Fig 2B). Here the processes are chosen to be stationary, in order to 399 be able to exclude yet another statistical aspect. In the next section (3.4), we will 400 include non-stationary firing rates. We explore the effect of all the six surrogate methods 401 onto the statistical properties of the stationary and independent data ('original data'), 402 such as the loss of spikes, ISI properties, effective moving of spikes. The parameters of 403 the data models are adjusted to be close to the experimental data at our hand and thus 404 to enable the transfer to experimental data to be analyzed later (section 3.5).   The auto-correlations (Fig 5D, left)  reference spike is close to the following and not to the preceding, it will be more likely 438 displaced backwards in time than forwards. This is also slightly visible in a difference of 439 the ISI distributions (Fig 5B, left), compared to the other methods. Moreover, the 440 increase towards ∆ above baseline is due to the shift of dither probability to higher time 441 intervals.

442
PPD process as original data. Next, we discuss the same aspects for the PPD 443 process and its surrogates, which is a process that includes a dead-time (chosen as 444 d = 1.6ms as in the experimental data). First of all, we observe that the loss of spikes 445 (Fig 5A, middle) is considerably reduced for the PPD compared to the other models.

446
This is true also for its surrogates, besides UD (orange), that loses more spikes. This  Non-preservation of the CV in the surrogate data as compared to the original data can 479 be a potential source of false positives, in particular for very small CVs or CVs > 1 [27]. 480 To facilitate the comparison, we also show the diagonal (blue). UD -as expected -481 changes the CV the most, from original 0.4 to 0.75, i.e., losing strongly its high 482 regularity, and increases even more, with a low slope, to a maximum slightly over 1.0 for 483 the original CV of 1.25, so here burstiness is reduced. WIN-SHUFF and UDD behave 484 similarly to UD, but start at a lower CV for CV = 0.4 of the original data; moreover, 485 UDD stays below UD for all CVs. WIN-SHUFF has a slightly higher slope and reaches 486 a maximum still below the original Gamma process. false negatives. Therefore, we measure the ratio of the number of spikes that are 500 displaced from their original bin position relative to the total number of spikes. We 501 generate original data and its surrogate data and vary the firing rate (from 10 to 100Hz 502 in steps of 10Hz, Fig 5E). If two spikes exchange their bin positions they are both 503 considered as not moved. As a reference the spike ratio is also shown for two 504 independent realizations of a Gamma spike train (γ = 1.23, blue line).

505
With increasing firing rates, the ratio of moved spikes decreases for the surrogates. 506 Ideally, the surrogates should be similar to the effect attained on the original process, 507 i.e., the colored lines in Fig 5E, should be as close as possible to the blue line. However, 508 none of the surrogate techniques meets this ideal setting, and the difference to the 509 independent case is almost constant around 10%. Nonetheless, we observe for all 510 surrogates that the ratio of moved spikes decreases with increasing firing rates, which 511 corresponds to the fact that increasingly more bins are already occupied and thus the 512 resulting binned surrogate spike train is more similar to the binned original. UDD, 513 ISI-D, and JISI-D displace less spikes, in particular for higher firing rates as compared 514 to UD, TR-SHIFT, and WIN-SHUFF. The fewer the spikes that are not effectively 515 displaced, the higher the peak at zero-delay of the cross-correlation of the original and 516 the surrogate data ( Fig 5C). Almost 50% of JISI-D, ISI-D, and UDD are not moved at 517 100Hz, and, for lower rates, they remain below the ratios of WIN-SHUFF, UD, and 518 TR-SHIFT. As a consequence, we can expect that JISI-D, ISI-D, and UDD in general 519 tend to yield more false negatives than WIN-SHUFF, UD, and TR-SHIFT. 520 Rate change in surrogates Changes in the firing rate profile of the surrogates 521 compared to the original data may be a source for false positives [4]. An optimal 522 surrogate method should follow as closely as possible the original firing rate profile. 523 Therefore, we test here an extreme case where the original data have a rate step (as 524 in [35]), jumping from 10Hz to 80Hz (Fig 4G). We observe that for all surrogates but 525 WIN-SHUFF the firing rate step is converted into a linear increase, which starts at 526 25ms (dither parameter ∆) before the step and ends at 25ms after the rate step. This 527 behavior has already been derived analytically and observed in [35] for UD: it 528 corresponds to the convolution of the original firing rate profile with the dither (boxcar) 529 function. WIN-SHUFF introduces a second step in the firing rate profile, which is due 530 to the fact that it generates a locally-stationary firing rate within the shuffling window 531 (here 50ms). We conclude that all surrogate techniques smooth the original firing rate 532 profile, whereas WIN-SHUFF creates an additional intermediate rate step. Thus, for 533 steep increases in the firing rate profiles, we have to expect the occurrence of 534 false-positive patterns due to this smoothing. We explored different aspects of the statistics of the surrogate spike data as compared 537 to its original process. In general, surrogate data are not identical to the original data, 538 but change to a different degree. The effects for the three data models are summarized 539 and not differentiated, since they anyway are similar. These are listed in Table 1.

540
Features that are preserved are indicated by 'yes', approximately preserved ('approx.'), 541 and not preserved ('no'). Given the insights obtained through the statistics considered, 542 we expect that surrogates generated with UD, as they evidence a strong spike count 543 reduction, lead to a large number of FPs. UDD surrogates might lead to FPs in case of 544 regular data that do not exhibit a dead-time (e.g., Gamma spike trains). The study of 545 the similarity of the surrogates to the original processes shows that JISI-D, ISI-D, and 546 UDD might lead to fewer patterns detected, i.e., an underestimation of significance.

551
Next, we apply the six surrogate techniques to artificial data sets that are generated 552 based on two experimental data sets [34] to study the effect of the various surrogate 553 methods on the occurrence of false positives [35]. The experimental data are two sets of 554 recordings from approximately 100 parallel spike trains from macaque monkey motor 555 and premotor cortex during performance of a reach-to-grasp behavior [40]. We model 556 the data with the same firing rate profiles as the experimental data, and use as point 557 process models a) the PPD to mimic the dead-time of the data (due to spike sorting), 558 and also b) Gamma processes to account for their CVs. Note that the modeled spike  Experiment and data The experimental data were recorded during a reaching and 564 grasping task from the pre-/motor cortex of two macaque monkeys. Both monkeys were 565 chronically implanted with a 100-electrode Utah array (Blackrock Microsystems). The 566 experimental protocol is schematized in Fig 6A and was also published in [34] and 567 in [40]: monkeys N and L were trained to self-initiate a trial by pressing a start button 568 (registered as trial start, TS). Then, after a fixed time of 400ms, a visual signal (yellow 569 LED) was shown, to attract the attention of the monkey (waiting signal, WS). After 570 another 400ms-long waiting period, a first visual cue (2 LEDs on) was presented to the 571 monkey for a period of 300ms (from CUE-ON to CUE-OFF) indicating the grip type: 572 full-hand side grip (SG) or two-fingers precision grip (PG). Followed by another waiting 573 period of 1000ms, the GO-signal was presented, containing also the information of the 574 expected grip force (high, HF, or low, LF, by LEDs). The behavioral conditions were 575 selected in a randomized fashion. The start of the monkey movement is recorded as the 576 release of a switch (SR). Subsequently, the object touch (OT) and the beginning of the 577 holding period (HS) are indicated. After 500ms of holding the object in place , the 578 reward (RW) was given to the monkey and the trial finished.

579
The experimental data sets consist of two sessions (i140703-001 and l101210-001 ) of 580 15 min of electrophysiological recordings containing around 35 trials per trial type, i.e., 581 combinations of grip and force type. Each session is spike sorted using the Plexon  Preprocessing for analysis We only consider neurons satisfying the following 585 constraints: SNR < 2.5 (signal-to-noise ratio of spike shapes), average firing rate across 586 trials < 70Hz. Hypersynchronous (artifact) spikes across electrodes arising at the 587 sampling resolution are detected automatically, classified as artifacts, and finally 588 removed as in [21]. Only successful trials are retained. Both experimental sessions are 589 analyzed separately, further differentiating between the different trial types. For the 590 analysis, trials are segmented into six 500ms-long separate epochs, to account for the 591 behaviorally relevant events (as in [21], and represented in Fig 6B). Segments of the 592 same epochs in the same trial type are concatenated and yield 24 (4 trial types ×6 593 epochs) data sets per session.  595 We create the artificial data by generating as many spike trains as in the concatenated 596 data sets, and use the original firing rate profiles of the individual neurons, estimated 597 with an optimized kernel density estimation as designed in [50,51] on a single 598 trial-by-trial basis. To account for dead-time and regularity of the data, we model 599 separately the spike trains with a PPD and a Gamma process. For the PPD data, we 600 estimate the dead-time for each neuron and each combination of epoch and trial type by 601 taking their minimum ISI (fourth inset, panel A of Fig 7). For the data modeled as a 602 Gamma process, we instead fix the shape factor for each neuron and each combination 603 of epoch and trial type by estimating the CV of the process in operational time [48] and 604 then transform the CV into the shape factor (γ = 1 CV 2 , [52]). The process is generated 605 in operational time and then transformed back into real time. The resulting CV2 606 distribution of all neurons of the data, simulated as a Gamma process, is close to the 607 one of the experimental data (third inset, panel A of Fig 7). Note that the Gamma 608 process does not have an absolute dead-time but for γ < 1 , it has a low probability for 609 small ISIs and can be regarded as containing a relative dead-time [48]. The resulting 610 firing rate of the artificial data is, for both processes, close to the one of the original 611 spike train (first inset, panel A of Fig 7).

612
For the two experimental sessions, each of the 24 sets of concatenated data is 613 modeled using the two point process models, resulting in a total of 2 × 24 × 2 = 96 data 614 sets. The SPADE analysis is performed on all data sets, separately for each of the six 615 surrogate techniques. We set the bin size to 5 ms, the maximum pattern duration to 616 60ms, the significance level to α = 0.05, and use 5000 surrogates. For all surrogate 617 techniques, we set the dither parameter to ∆ = 25ms. Each resulting pattern is counted 618 as a false positive.  To get an understanding of the rate properties of neurons that contribute to the FPs, 641 we consider their average firing rates (over time and trials) and the group that they 642 belong to (Fig 7C and D). In general, we find FPs in all analyzed data sets, but one 643 (monkey L, movement, Gamma, all conditions). We observe that almost all neurons 644 involved in FPs have an average firing rate higher than 20Hz. Neurons belonging to the 645 first group are the largest set, and are present for both monkeys and data models, and 646 in almost all data sets. The second group is present for both monkeys and models, but 647 is more represented for PPD. The third group is present for both monkeys only for the 648 Gamma model. This was already expected, given the higher spike count reduction, for 649 UD and UDD in the case of Gamma spike trains (section 3.3.1). 650 We also inspect the CV2, averaged over trials, of units involved in FPs (S1 Fig). FPs 651 occur in neurons with a relatively low CV2 (CV2 < 1), but, importantly, this is not the 652 case for neurons with very low CV2s (especially for monkey N). In fact, given the results 653 of [27], the most regular spike trains are expected to be involved in false-positive 654 patterns, which is not the case here. In almost all cases, bursty neurons (CV2 > 1) are 655 not involved in FPs.

656
In summary, we observe that the surrogate technique leading to most FPs is UD, 657 followed by UDD (only in the case of Gamma data). Neurons exhibiting an average 658 firing rate higher than 20Hz, and having a CV2 < 1 are predominantly involved in FPs. 659 Moreover, there is a small amount of FPs detected using all surrogate techniques, which 660 is expected given a certain significance threshold. Nonetheless, regular and high-rate 661 neurons are more prone to raise the false-positive rate [53]. As a last step, we apply SPADE with the six surrogate techniques to the two sessions of 664 experimental data introduced in section 3.4.1. Here our goal is to analyze with SPADE 665 experimental data, for which we do do not know the ground truth (i.e., the presence  TR-SHIFT we find the same patterns, 1 for PGHF and 1 for PGLF. We do not detect 702 any significant patterns in start, cue, and early delay epochs.

712
Here we have shown that the usage of uniform dithering (UD) as a surrogate technique, 713 followed by binarization (binning and clipping) of the spike train, leads to a mismatch 714 in the spike counts between the original and the surrogate data (Fig 2). The spike count 715 reduction is significant and increases monotonically with the firing rate, which we 716 verified analytically and through simulations (Fig 3B, Fig 5A). Moreover, we showed 717 that two factors play a large role in the spike count reduction: the neuronal dead-time 718 and the firing regularity ( Fig 3B). These two components are contained in the 719 auto-structure of the experimental data from pre-/motor cortex of macaque monkey we 720 aimed to analyze here [34,41].
In the context of spatio-temporal spike pattern detection using SPADE, we observed 722 that the spike reduction in the UD surrogates in combination with binarization of the 723 spike data leads to an underestimation of patterns in the surrogate data compared to 724 the original data, which in turn leads to an overestimation of pattern significance in the 725 original data (Fig 3C). The ultimate consequence of this problem is the occurrence of 726 false positives. Fortunately, SPADE is a modular method: different types of surrogates 727 can be used, while the mining algorithm and testing steps stay identical. For this 728 reason, we were able to analyze the same data sets by using different surrogates, to then 729 compare the results.

730
In order to verify and solve the issue of the spike count mismatch, we compared five 731 alternative surrogate techniques to UD: uniform dithering with dead-time (UDD), 732 window shuffling (WIN-SHUFF) -both newly introduced-, joint inter-spike interval 733 dithering (JISI-D; [32]), inter-spike interval dithering (modified from [32]), and trial 734 shifting [36] (Fig 4). All surrogate approaches have the goal of destroying the exact 735 spike timing relations across the neurons. We quantified which statistical features of the 736 original spike trains are conserved in a surrogate, examining the spike counts, the ISI 737 distribution, the auto-correlation, the cross-correlation, the firing rate modulations, the 738 ratio of moved spikes, and the regularity (Table 1). These were evaluated on artificial 739 data (Poisson, PPD, and Gamma spike trains) that contained partly the main features 740 of the data (dead-time and regularity) (Fig 5). We observed that UD does not preserve 741 the spike count, and leads to a very strong spike count reduction for the PPD model, 742 not for the Poisson model, and not (or very small) for the Gamma model. As 743 experimental data typically exhibit a dead-time or it is introduced by spike sorting, we 744 concluded that UD is not an adequate method to estimate correlations within our 745 context, especially as the effect of clipping increases with the firing rate.

746
Further we tested the surrogate alternatives in respect to false positives on ground 747 truth data, that included in addition realistic firing rate modulations as observed in the 748 experimental data. These were based on two experimental sessions of motor cortex of 749 macaque monkey involved in a reaching and grasping task [34,40]. We stress the 750 importance of generating test data that are very similar to experimental data, in order 751 to closely model all features that typically lead to complications when trying to robustly 752 estimate the null-hypothesis of independence [4]. Such realistic data serve as ground 753 truth to identify the strengths and weaknesses of the tested surrogate techniques. In 754 this case, we modeled both experimental sessions with PPD and with Gamma processes, 755 with firing rate modulations, dead-times, and regularities estimated for each neuron 756 from the original data ( Fig 7A). The analysis of these data with SPADE led to a large 757 number of FPs when employing UD. However, all other surrogate techniques showed a 758 considerably low number of FPs. A minimal number of false positives is to be expected, 759 as it is inherent to any statistical test. Due to this result, we conclude that UD is not 760 appropriate for its employment within the context of the SPADE analysis, whereas all 761 other techniques can be considered valid.

762
Finally, having validated the five alternative surrogate techniques against UD, we 763 analyzed experimental data from [34]. UD in this context leads to a large number of 764 detected patterns (Fig 8) in contrast to the other surrogates. Still, the number of 765 patterns detected by the other surrogate methods is larger than for the analysis of 766 artificial data, that were independently generated, but otherwise very similar. Given the 767 results obtained from the previous sections, analytically and through simulations, we 768 consider the patterns detected by UD as putative false positives, taking into 769 consideration that in the case of experimental data we have no ground truth at hand. 770 Moreover, we consider the patterns detected in the experimental data by the other 771 surrogate methods as significant, i.e., not resulting from any overestimation of the 772 significance, since we also found that the patterns retrieved for UDD, JISI-D, ISI-D, conservative as the other methods that we propose, and, 5) employs fewer parameters 785 than the other techniques with the same performance. 786 We stress that the surrogate techniques do not have to be restricted to the particular 787 context of the SPADE method, and are actually used in other studies for the evaluation 788 of correlation [4,29,31,33,35,45,[53][54][55][56][57][58][59][60]. Still the particular surrogate needs to be chosen 789 appropriately and cautiously case by case. Not only because the statistical test might 790 produce false positives (or false negatives), but also because the concrete null-hypothesis 791 distribution represents the model assumed for brain functioning. The degree of how 792 conservative or liberal the statistical analysis can be, through the choice of the surrogate 793 technique, becomes then not only a feature of the test but more a scientific question per 794 se regarding neural coding.

795
Several studies have already investigated the impact of different surrogate techniques 796 in the context of spike time correlations. For example, in [1], the authors evaluated the 797 influence of surrogate techniques on cross-correlation analysis of two parallel spike 798 trains; in [4,35], the focus was on the effect of surrogate techniques on synchronous 799 events in the context of the Unitary Events analysis [3,27,55,56,58,61]. Due to the 800 results of these studies, we have concentrated on surrogate techniques that preserve the 801 firing rate profile of the original neurons, but methods such as spike train randomization 802 (within single trials, [26]), spike exchange (across neurons or trials, [59,62]), ISI shuffling 803 (within and across trials, [63][64][65][66]), spike shuffling across neurons (within-trial, [63,65]) 804 do not fulfill our requirements [4]. Other methods are designed to preserve the 805 auto-correlation of a spike train, with the assumption of stationarity and Markovianity 806 of a process [67,68]. Some studies also already evidenced problems arising from the 807 application of uniform dithering, such as the non-preservation of the ISI 808 distribution [35], in particular in the case of the Poisson process [69], but not in the 809 context of multiple parallel spike trains, or in the context of binarization. We extended 810 here studies of comparisons of surrogate techniques to the context of spatio-temporal 811 patterns, i.e., including delays between spikes of multiple neurons. A similar comparison 812 of statistical methods have been done only at the level of pairwise correlations [35]. The 813 relevance of this further step is sensible, as it has been argued that the processing of 814 information may be reflected in the presence of delayed higher-order correlations in 815 parallel spike trains [70][71][72][73][74]. 816 Other methods discuss the employment of surrogates as a liability more than an 817 asset [75,76], pointing in particular to the computational burden of the approach, and 818 motivating closed-form testing (assuming a point process model for the spike train) as a 819 lighter alternative. Nonetheless, the estimation of an analytical form of the 820 null-hypothesis often involves assumptions or approximations that can also provoke the 821 detection of false positives [4,20,27,77]. In addition, treating non-stationary processes 822 analytically is cumbersome and mostly there is no possibility of an analytical treatment, 823 for example in the case with time-varying firing rates. Moreover, regarding the 824 computational costs, several improvements have been made for the performance of both 825 (matlab, [78]), TISEAN (fortran, [79]), and SpiSeMe (C++, matlab and python, [68]). 834 Our study concentrated on the problems arising from the application of UD together 835 with binarization, and false-positive patterns resulting as a consequence. Another aspect 836 to study would be to concentrate on false negatives (FNs), i.e., true positive patterns 837 present in data and not detected by SPADE. Some steps have been already done in this 838 direction [49], where results showed that the employment of UDD and JISI-D do not 839 result in a large number of FNs. Having showed that the proposed surrogate techniques 840 lead to almost the same significance level, we expect that the results obtained for UDD 841 and JISI-D can extend to ISI-D, WIN-SHUFF, and TR-SHIFT. In addition, artificial 842 data with inserted patterns testing the specificity of SPADE had already been studied 843 in [17][18][19], albeit only for the Poisson data model with stationary firing rates.

844
A number of sessions of the experiment considered in this paper have already been 845 analyzed in [21]. The authors found many patterns with SPADE when evaluating only 846 for synchronous spikes, using UD as the surrogate technique of choice. Thus, one might 847 question the findings of [21], given the results of this article. However, in [21] the 848 authors employed a version of the FIM algorithm called CoCoNAD [23][24][25] that mined 849 patterns in continuous time, without resorting to clipping. As we see in S2 Fig, we find 850 more synchronous spike events using CoCoNAD as compared to using FIM with all 851 surrogate methods even with UD (although there is a small reduction of detected 852 patterns for UD). Unfortunately, we were not (yet) able to transform CoCoNAD such 853 that it is also applicable for the detection of spatio-temporal patterns, but had to stick 854 to time discretized data. 855 Currently, we are applying SPADE to a large set of experimental data to investigate 856 the presence of spatio-temporal patterns in relation to behavior. We aim to study their 857 statistics and features, to test for their behavioral relevance.

858
Software and Data resources 859 The code to perform and reproduce the analyses presented in this study can be found at 860 (https://github.com/INM-6/SPADE_surrogates), along with the code to reproduce 861 Fig 2, Fig 3, Fig 5, Fig 6B, Fig 7, Fig 8 and Supplementary Figures 1 and 2 Fig 1, Fig 4, Fig 6A

Supporting Information 1: Derivation of spike count after binarization
The spike count after binarization (N clip ) for renewal processes can be obtained directly from the inter-spike interval (ISI) distribution ρ(τ ), τ > 0. The probability of a single spike to be discarded depends on the interval to the previous one as: P clip (τ ) = 1 − τ b for τ < b, where b is the bin size [49]. Thus, the probability of a single spike to be clipped away decreases linearly with increasing interval to the previous spike. In the extreme case of τ = 0 (if an ISI is equal to zero), the next spike is discarded, and if τ = b, the next spike is not discarded. Averaging over the ISI distribution, the firing rate after binarization λ clip becomes For a PPD process with dead-time d, the ISI distribution is with the effective rateλ = λ 1−λd [47,49], and Θ(x) being the Heaviside function. Thus, the firing rate after binarization (λ clip ) follows as For a Gamma process of shape γ, we have ρ γ (τ ) = γλ (γλτ ) γ−1 Γ(γ) e −γλτ Θ(τ ).
distribution is symmetric with respect to the diagonal, thus coincides to the outer product of the ISI distribution with itself [49]. As we analyze in Fig 5, the Poisson Process, PPD, and Gamma are all renewal processes, thus, the results coincide for joint-ISI and ISI dithering. Figure 1