Optimal burstiness in populations of spiking neurons facilitates decoding of decreases in tonic firing

A stimulus can be encoded in a population of spiking neurons through any change in the statistics of the joint spike pattern, yet we commonly summarize single-trial population activity by the summed spike rate across cells: the population peri-stimulus time histogram (pPSTH). For neurons with low baseline spike rate that encode a stimulus with a rate increase, this simplified representation works well, but for populations with high baseline rates and heterogeneous response patterns, the pPSTH has limited utility in capturing the neural representation of the stimulus. We simulated populations of spiking neurons that varied in size, baseline rate, burst statistics, and correlation, and we measured how these populations represent decreases (gaps) in spike rate. We introduce a different representation of the population spike pattern which we call an “information train,” and we show that it is more flexible and robust than the pPSTH in capturing stimulus information across different types of neuronal populations. In particular, we use this tool to study populations with varying levels of burstiness in their spiking statistics. We find that there is an optimal level of burstiness for gap detection that is robust to several other parameters of the population. Next, we consider this theoretical result in the context of experimental data from different types of retinal ganglion cells and determine that the baseline spike statistics of a particular, recently identified type support nearly optimal detection of both the onset and strength of a contrast step.

) ISI distributions resulting from a Poisson process (exponential), renewal process (gamma), and nested renewal process (NR; well fit by sum of gammas). (C) ISI distribution of bSbC and closest NR process match in the simulated data (as determined by a Kolmogorov-Smirov test) with parameters 1 = 6, 1 = 180 bursts/s, 2 = 3, 2 = 600 spikes/s.(D) ISI distribution of OFFsA and closest NR process match (as determined by a K-S test) with parameters 1 = 3, 1 = 150 bursts/s, 2 = 3, 2 = 500 spikes/s. (E) ISI distributions resulting from NR processes with the same firing rate of 100 Hz, ranging from regular to bursty. (Top: = 3, 1 = 300 bursts/s, 2 = 3, 2 = 300 spikes/s. Middle: 1 = 3, 1 = 100 bursts/s, 2 = 5, 2 = 1500 spikes/s. Bottom: 1 = 3, 1 = 30 bursts/s, 2 = 3, 2 = 3000 spikes/s.) Bottom ISI distribution extends out to 500 ms (not shown). Note: ISI distributions are shown on a semilog scale to emphasize differences in their tails. (F) Range of firing rates and burstiness in four RGC types: bSbC and OFFsA as defined in the text, sSbC27 (sustained suppressed-by-contrast type EW27) and sSbC28 (sustained suppressed-by-contrast type EW28). Details of RGC classification can be found in    142 Our next goal was to quantify burstiness by a single measure. Because our method of generat-143 ing spike trains necessarily places each spike within a burst window, we could not use a standard 144 definition of burstiness, such as the total number of spikes contained within bursts divided by the total number of spikes (Oswald et al., 2004). Instead, we reasoned that a cell should intuitively be 146 classified as more bursty if it contains, on average, more spikes within its burst windows. However, 147 increasing the firing rate will automatically increase the number of spikes per burst window, so 148 we normalized by the mean firing rate. Therefore we defined the burstiness factor, , as the aver-149 age number of spikes per burst window normalized by the firing rate (measured in seconds/burst 150 window). Figure 1E illustrates the effect of different levels of burstiness on the ISI distribution for 151 a fixed firing rate. We prefer this definition to one that thresholds the number of spikes within 152 a certain period of time, since our definition of burstiness is easily computed from the model pa-153 rameters and eliminates the need to threshold, thus reducing both the number of parameters and 154 arbitrariness. Our quantification of burstiness is dependent upon the parameters of the nested 155 renewal process and cannot be applied to experimental data; therefore, we matched ISI distribu-156 tions from experimental data with simulated data (Figure 1C,D) via a Kolmogorov-Smirov test in 157 order to measure burstiness in spike trains from recorded RGCs ( Figure 1F). In addition to vari-158 able burstiness, the nested renewal process allowed us to independently modulate the firing rate  (B) Population response to varying levels of stimulus strength, modeled with a drop and subsequent rise in firing rate. The green "rate" line refers to firing rate. Population peri-stimulus-time-histogram (PSTH) is shown in black. Every neuron in each population has the same recovery time constant, but the recovery time constant varies from top to bottom (shortest for the top population and longest for the bottom) (C) Populations of varying responsivity to the stimulus. Here the recovery time constant is the same for each population and only the responsivity is varied. Population PSTH is shown in black.
We modeled a stimulus-dependent drop in firing as an instantaneous drop to a firing rate of zero followed by an exponential rise back to the baseline firing rate, controlled by a variable time 166 constant of recovery ( Figure 2B). This stimulus model was chosen because it is consistent with 167 how both bSbC and OFFsA RGCs respond to positive contrast; longer time constants of firing re-168 covery correspond to higher contrast stimuli (Wienbar and Schwartz, 2022;Jacoby et al., 2015). 169 Although this stimulus model was chosen with bSbCs and OFFsAs in mind, it generalizes to any 170 type of neuron which decreases its spontaneous firing rate proportionally to a stimulus. We also 171 introduced heterogeneity into the population of neurons by varying responsivity to the stimulus, or 172 the proportion of the population that responds to the stimulus, since this is often less than 100% in 173 experimental studies. In other words, we made a subset of the neurons unresponsive to the stim-174 ulus by allowing them to continue spiking with baseline statistics after stimulus onset ( Figure 2C). 175 Analogous to models for detection of the onset and duration of an increase in firing, this stimulus 176 allowed us to measure performance in decoding the onset and duration of the gap in firing for 177 each simulated population of neurons.

178
The information train enables aggregation of heterogeneous spike trains. 179 By aggregating the spike trains of the population of neurons into a one-dimensional signal over   of the information about the higher-order statistics of each individual spike train that could, in prin-196 ciple, be useful to the decoder. Therefore, we developed a new aggregation method based on the 197 information content of the spike trains. We called the resulting signal the "information train."

198
The information train method was inspired by neural self-information analysis (Li and Tsien,199 2017). In order to build a continuous signal of the information content in a single cell over time,   . pPSTH detected stimulus onset 50 ms after stimulus presentation; stimulus offset was detected 2 ms after stimulus onset. Information train detected stimulus onset 40 ms after stimulus presentation; stimulus offset was detected 175 ms after stimulus onset. The filter on the pPSTH and the threshold on the information train were both chosen so that there was one false detection in the prestim time (the pPSTH touched 0 once and the information train crossed the threshold once in the prestim time). Note that this is a different false detection rate than used in our study (which was 0.1 errors/s), but for demonstration purposes we wanted to show what a false detection looks like. (G) pPSTH (left) and population information train (right) readout on stimulus onset detection task as responsivity decreases in a population of 30 neurons. Top: proportion of trials with readout vs. responsivity, where readout does not exist whenever the signal fails to cross the threshold after stimulus onset. Bottom: reaction time vs. responsivity, where reaction time is the time between stimulus onset and stimulus onset detection. Error bars are interquartile range across 10 trials. case, the current ISI is longer than the most probable one. However, we do not know that it is longer 215 until the point in time when it has passed the length of the most probable ISI and the cell has not 216 yet spiked. The SI train reflects this by beginning at baseline information and staying constant for 217 the duration of the most probable ISI, then rising according to the SI curve and stopping at the SI 218 value indicated by the total current ISI (blue sections in Figure 3C bottom). Finally, in the third case, 219 the current ISI is shorter than the most probable one. In this case, we know that it is shorter as 220 soon as it ends, so the SI train stays at baseline information until the very end of the ISI, then rises 221 instantaneously to the value indicated by the SI curve (green sections in Figure 3C bottom). At the 222 start of each ISI, the SI train resets to baseline information, meaning that it has no memory of its 223 history and considers subsequent ISIs independently. what is actually relevant to the brain) and high correlations.

259
As with any decoding mechanism, the biological plausibility of the information train needs to  Therefore we suggest that the information train decoding scheme is biologically plausible like a 268 pPSTH, but captures more information.

269
A critical benefit of the population information train over the pPSTH is that it should automati-  In each case, the pPSTH could potentially be modified to get around problems caused by popula-282 tion heterogeneity (such as a responsivity-weighted pPSTH), but the modifications would have to 283 be different in every case. In contrast, the information train is both robust and flexible without 284 tuning.

285
There exists optimal burstiness for minimizing reaction time.

286
Having developed a readout mechanism -the population information train -we then used it to 287 decode the onset of a gap in firing by measuring reaction time. We were interested in the effects 288 of multiple parameters on reaction time: burstiness, firing rate, population size, and correlation.

289
For three of these parameters, we had an intuition for how they should affect performance based 290 on one key insight: when trying to decode the onset of a gap, temporal precision is key to getting 291 accurate estimates. Increasing the firing rate of a population increases temporal precision, so we 292 expected that increasing the firing rate would improve performance (i.e. decrease reaction time).

293
Another way to gain temporal precision is to increase the size of a population -therefore we also We may write any one of these dimensionless quantities as a function of the rest, but it is 318 difficult to fit functions of three variables, so we fix population size and correlation so that the 319 function no longer depends on them, and then we have Eq (2) immediately reveals that both reaction time and burstiness are inversely proportional to 321 the firing rate. While burstiness was defined in such a way that it must be inversely proportional to 322 firing rate (see Methods), it is illuminating that reaction time is inversely proportional as well. This basic theoretical result of dimensional analysis gives us much more information than our intuition, 324 which simply told us that reaction time should decrease with firing rate.

325
The practical implication of the Buckingham Pi Theorem (Buckingham, 1914) is that we may 326 collapse all the data for different firing rates together, with no pattern, so we can analyze them 327 together. The functional form of is now possible to find by fitting ( Figure 4B). There is a clear 328 minimum in the data, demonstrating that there is some level of burstiness which is optimal for 329 minimizing reaction time across all firing rates. We chose a polynomial fit to describe the data, 330 since that is a reliable way to find the minimum. We want to emphasize here that we are not 331 claiming that the data has a polynomial form; we are only concerned with finding the minimum, 332 and any other good fit would give the same minimum. Now we may separately vary population size 333 and correlation (Figure 4C,D), illustrating that the existence of optimal burstiness for minimizing  There exists optimal burstiness for discriminating stimulus strength. 343 Besides "when," another fundamental question to ask about a stimulus is: how strong is it? In By fitting, it is clear that there is an exponential relationship between these variables ( Figure 5B). 360 Now, for each combination of parameters, for which we have several trials of gap length measure-361 ments, we chose the performance metric to be the scatter in the data around the exponential 362 function suggested by dimensional analysis (Figure 5C) which we quantified with normalized root While burstiness is still inversely proportional to the firing rate, Eq (4) reveals that NRMSE is 372 constant with firing rate. In other words, the ability to decode the duration of a gap in firing rate 373 does not depend on the spontaneous firing rate. Plotting reveals that NRMSE decays to a nonzero 374 constant ( Figure 6B). Interestingly, there is a large range of burstiness that optimizes NRMSE, in 375 contrast to how there was a single optimal value of burstiness for minimizing reaction time. We 376 chose to fit an asymptotic decay function to the data (see Methods) in order to characterize the 377 optimal value of NRMSE. By the nature of an asymptotic function, the fitting function continues to 378 decay very slightly where the data has already become constant. We chose to study the "elbow" of 379 fit because if we can describe the lowest level of burstiness that optimizes NRMSE, we will know 380 that any burstiness larger than that will also be optimal.

381
Next we separately varied population size and correlation (Figure 6C,D), which showed us that 382 the existence of asymptotic NRMSE and a large range of burstiness resulting in this optimal per-383 formance was robust for both parameters. Asymptotic NRMSE decreases with population size and 384 increases with correlation, while optimal burstiness is relatively constant with both parameters, 385 implying that the level(s) of burstiness optimal for discriminating stimulus strength is not affected 386 by population size or correlation.

387
bSbC RGCs have close to optimal burstiness for gap detection. 388 Having established that optimal burstiness exists for decoding the onset and duration of a gap 389 in firing, our next question was how the baseline spike statistics of the RGC types we studied re-390 late to this optimum. Recall that we fit functions that describe how reaction time (Figure 4) and information train and demonstrated that this method is more robust to unresponsive cells than 417 a population PSTH (Figure 3). Using the information trains of different simulated populations, we 418 discovered that there is an optimal level of burstiness for detecting the onset of a firing gap that is 419 inversely proportional to firing rate and relatively independent of population size and correlation 420 (Figure 4). There is also an optimal range of burstiness for detecting gap duration that is relatively 421 independent of all of these other parameters (Figure 5, Figure 6). Finally, we considered the base-422 line spike statistics of four RGC types in the context of these theoretical results and revealed that 423 the burst patterns of bSbC RGCs make them nearly optimal for detecting both the onset and the 424 strength of a contrast step (Figure 7). 425 Baseline spike statistics can be optimized for different decoding tasks.

426
The fact that some burstiness higher than minimum is optimal for identifying the time of stimulus 427 presentation and stimulus strength leads us to a principle that has a long history in population 428 spike analysis (Kepecs and  in firing, since it is also dependent on temporal precision. Now we will lay out our intuition for 436 why there exists optimal burstiness for both tasks, which will explain why it makes sense that 437 specific spiking patterns are optimal for encoding different stimulus features. We again attribute 438 increased performance on these tasks to increased temporal precision. As explained earlier, a 439 population could theoretically increase its temporal precision by increasing its firing rate or its size, there is an optimal level of burstiness for identifying the timing and strength of a stimulus. 456 We found that the baseline firing statistics of bSbC RGCs lie close to the optimum for perfor-457 mance on two gap detection tasks (Figure 7), but what about the other 3 high-baseline-rate RGC 458 types we analyzed and the many lower baseline rate ones we did not analyze? It is important 459 to consider (1) that the gap detection tasks we defined are only two out of many decoding tasks 460 that must be performed simultaneously across the population of RGCs, and (2) that biological con-  The information train as a way to track population-level changes in spike patterns.

472
Our work has practical as well as theoretical implications: we proposed the information train read- information train will reflect a stimulus change in both of these cases even when the whole popula-482 tion is analyzed together. This is because any ISI length out of the ordinary (i.e. different from the 483 most probable one) causes a positive deflection in the information train, so no matter whether a 484 cell's firing rate increases or decreases in response to a stimulus, the population information rises.

485
Therefore the information train is a convenient readout mechanism to use because it does not re-486 quire any assumptions to cluster cells, even when the population recording is heterogeneous. It is 487 also easy to implement by simply fitting the observed ISI distributions in the absence of a stimulus 488 to a gamma (many neural circuits have ISI patterns well fit by a gamma distribution (Li et al., 2017)) 489 or sum of gammas distribution (depending on whether the cells in question are bursty).

490
Limitations and future directions. but it is still difficult to estimate the joint probabilities -it is also necessary to be able to accurately 501 extrapolate those joint ISI distributions in order to deduce the stimulus response. Namely (still 502 assuming a stimulus which depresses firing rate), we need to be able to accurately estimate the 503 very tails of the joint ISI distributions. We were able to do this in our study by fitting our observed 504 ISI distributions with a sum of gammas distribution (Figure 2-figure Supplement 2), but it could 505 be dangerous to first estimate the joint ISI distributions and then estimate their tail values. Even 506 small errors in the density estimation would lead to drastically different results, since PMI (much 507 like SI) amplifies the significance of extremely improbable events. In other words, small differences 508 in lead to large differences in 2 ( ) for low . Calculating the true PMI of the population over 509 time is an important future direction, but it has to be done carefully.

510
Another limitation of this study is that we have not formally seen how sensitive the information 511 train readout is to more complex tasks/stimuli. We posit that the information train should be 512 able to flexibly reflect any change in spiking patterns at the population level since every observed 513 ISI different from the most common one registers a change. However, we only compared the 514 information train to the standard pPSTH for one task; we do not know if the information train 515 gives a more accurate or robust readout than a pPSTH for all stimuli.

516
Not only is there potential follow up research stemming directly from this study, such as con-517 structing the population information train using PMI and exploring more complicated stimuli, but 518 in addition, the information train will hopefully be used to investigate more complex statistics of if is 1, the ISI distribution is (1, ) = ( ). The average ISI length is and therefore the 541 average firing rate is .

542
We extended the renewal process again, nesting one renewal process inside another. The outer 543 renewal process with parameters 1 (a shape), 1 (a rate) determines the number and placement 544 of burst windows, with an average of 1 1 burst windows/second. We allowed 1 to vary between 30 545 and 600 bursts/second, and 1 to vary between 3 and 6 in order to obtain a range of 10-100 burst 546 windows/second. The inner renewal process with parameters 2 , 2 determines the number and 547 placement of spikes within each burst window, with an average of 2 2 spikes/second within each 548 burst window. We let 2 range from 300 to 6,000 spikes/second, and 2 range from 3 to 6. Thus with a range of 10-100 Hz. Population readout mechanisms.
In order to measure reaction time and gap length in the information train, we set a threshold 581 on the information train such that the error rate was 0.1 false crossings per second.

582
We set the threshold for the pPSTH readout mechanism at 0, and set the filter length such that 583 the pPSTH reached 0 with a rate of 0.1 errors/second during the prestimulus time.
In order to obtain a function of one variable which relates gap length and the time constant, we 602 set dimensionless burstiness, population size, and correlation to be constant and obtained Eq (3).

603
Once we defined a performance metric (NRMSE) for the gap length decoding task (Figure 5C), 604 we used dimensional analysis again to find its dependence on burstiness and the other model pa-605 rameters (Figure 6A,B) Fixing population size and correlation so that the function no longer depends on them, we 610 obtain Eq (4).
From dimensional analysis, we obtained Eq (4). Plotting against revealed that is an asymp-613 totically decaying function ( Figure 6B). We therefore let be of the form The range of obtained from this fitting routine was 0.5-4 with no systematic trends, so was 615 taken as 2, which results in essentially the same good fits and reduces the number of fitting param-616 eters to two. We also considered an exponential fit, but ultimately rejected it because the fit was 617 not as good as the fit given by Eq (10) with fixed , and it requires three fitting parameters instead 618 of two. Goodness of fit of Eq (10) with variable as well as fixed , and the exponential fit, were all 619 assessed with root mean square error (RMSE) returned by the fitting routine. 620 We fit the exponential relationship between the dimensionless time constant and the dimen-

636
Data and code availability.

637
All simulated data reported in this paper will be available from a GIN database upon manuscript 638 acceptance. Experimental data reported will be shared by the lead contact, Greg Schwartz 639 (greg.schwartz@northwestern.edu), upon request. All code written in support of this publication 640 will be available from a GitHub repository upon manuscript acceptance.

641
Acknowledgments 642 We are grateful to all members of the Schwartz Lab for their feedback and support on this project. 643 We would like to thank Stephanie Palmer and Sophia Weinbar for their comments on the manuscript,   Figure 1C with sum of gammas fit. Goodness of fit is reported as RMSE. (B) ISI distribution of experimentally recorded OFFsA in Figure 1D with sum of gammas fit.  Responsivity is fixed at 60%, population is fixed at 5 cells, correlation is fixed between 0-0.2, and total firing rate is fixed at 30, 60, and 100 Hz. (B) Dimensional analysis collapses the data for different firing rates. Dimensionless reaction time is plotted against dimensionless burstiness. Responsivity is fixed at 60%, population is fixed at 5 cells, and correlation is fixed between 0-0.2. Trial-averaged (median) data is shown. Error bar = standard error estimate of the data around the fit. (C) Existence of a minimum is robust across population sizes. Responsivity is fixed at 60% and correlation is fixed between 0-0.2. (D) Top: minimum (dimensionless) reaction time against population size. Bottom: optimal (dimensionless) burstiness against population size. Responsivity is fixed at 60% and correlation is fixed between 0-0.2.