Abstract
Calcium imaging can monitor the activity of large neuronal populations simultaneously. However, it lacks the signal quality of the spike recordings obtained with traditional electrophysiology. To address this issue, we developed a supervised, data-driven approach to extract spike information from calcium signals. We propose the ENS2 (effective and efficient neural networks for spike inference from calcium signals) system, which makes spike-rate and spike-event predictions from raw calcium inputs using a U-Net deep neural network. When tested on a large public ground-truth database, it consistently outperformed state-of-the-art algorithms in both spike-rate and spike-event predictions with a reduced computational load. We further demonstrated that ENS2 improves analyses of orientation selectivity in primary visual cortex neurons. We conclude that optimizing our system for spike-event prediction produces a versatile inference system that can benefit diverse neuroscience studies.
Introduction
One key to understanding the complex functions of the brain is to simultaneously measure the activity of neurons across different layers and brain areas. Electrophysiological recordings, such as patch-clamp 1,2 and multielectrode extracellular recording 3, have long been the major method to record neuronal spiking events. These recordings are typically of high temporal resolution and high signal-to-noise ratio (SNR). However, it is technically challenging with these methods to acquire recordings from a large number of neurons stably in vivo 4.
In recent decades, the optical-based calcium imaging technique has increasingly been used for in vivo neuroscience research 5-12. This imaging technique enables simultaneous monitoring of activities of thousands of neurons over a considerable period of time. Moreover, as more effective fluorescent calcium indicators 13-17 and imaging devices 18-20 have become available, it is now possible to localize and extract the individual activities of a large number of neurons in various subcellular structures 21.
Nevertheless, calcium imaging is only an indirect measurement of neuronal activities. In brief, the concentration of intracellular calcium evoked by neuronal firings undergoes non-linear changes. These fluctuations in calcium are again non-linearly reflected by calcium indicators, whose fluorescent intensities could be imaged. Afterward, the locations of individual neurons or compartments (region of interest, ROI) are identified on images, and the time-varying fluctuations of fluorescence signals in the ROIs are extracted as a surrogate of neuronal activities. Another limitation of calcium imaging is that the signals can commonly have a low SNR 21, especially for those recorded in deep brain regions in vivo or at low light conditions. Furthermore, the high spatial resolution of calcium imaging trades off with indicators’ slow temporal dynamics up to hundreds of milliseconds 14,22, resulting in low-pass filtered activities. These indicators come in different types, typically synthetic dyes or genetically encoded calcium indicators (GECIs), and their different dynamics further complicate the task to convert the calcium signal into neuronal signals.
Previous work has shown that spike inference plays a crucial role in interpreting calcium data and dissecting neural circuits 5-8. In the past decades, researchers have developed various algorithms to recover multiunit neuronal spikes. These algorithms can generally be divided into two major categories: model-based systems 23-34 and data-driven systems 35-39. Model-based systems typically build physiologically constrained models, assuming that the calcium concentration rises with neuronal firing and decays exponentially afterward. With these models, calcium traces can be simulated from estimated spike trains and additive noise. They include systems based on template matching 24,25,27 (e.g. peeling 25), deconvolution 23,26,30,32-34 (e.g. OASIS 30) and Bayes’ theorem 26,28,29,31 (e.g. MLspike 29). For example, MLspike, one of the state-of-the-art algorithms, uses a physiologically constrained model and maximum a posteriori (MAP) estimation to infer the most likely spike trains from noisy calcium signals. However, these model-based methods typically require model parameters to be tuned for each new recording, either manually or by auxiliary algorithms. Moreover, when likelihood optimization is involved, they become rather computationally expensive to use. On the other hand, data-driven systems based on supervised learning have emerged with promising performance. A supervised deep learning algorithm called CASCADE has been reported recently, which delivered top-ranking spike inference performance when the training data were selected to match the noise level of the testing dataset 38. Earlier data-driven models had limited generalization ability because high-quality paired data for training were scarce 35,36,39. 
An extensive public database of paired data (simultaneously recorded calcium fluorescence signals and electrophysiology ground truths) has been compiled alongside the development of CASCADE. It has helped such data-driven approaches generalize better, although some re-training is still necessary for noise matching 38. Other works use feature extraction and thresholding 40,41 (e.g. GDspike 41) to tackle the problem of spike inference.
For inferring unpaired calcium signals from in vivo imaging, a calibration-free inference system that could generalize on un-seen recordings with high performance is necessary. In fact, neural networks have shown satisfactory performance in processing bio-signals with severe inter-record variability, including electrocardiogram 42-44 and electromyography 45. Provided with sufficient amount of paired data, the generalization ability of neural networks makes it a promising approach for inferring spikes from calcium signals. In this work, we performed thorough research on the impact of each component in the neural networks on the spike inference tasks. The preferred configurations of input types, network architectures, and cost functions were investigated. Here, we developed the ENS2 (effective and efficient neural networks for spike inference from calcium signals) system (Fig. 1) with state-of-the-art performance and generalization ability but with lower computational complexity. To further demonstrate the validity of the ENS2 system, we deployed the ENS2 system on a set of calcium imaging data from primary visual cortex and showed how the spike inference could improve the analyses of the experimental data. Lastly, we conducted additional simulations to address factors in the calcium data that could benefit the performance of deep learning based models for spike inference. These analyses provided useful insights on how to prepare calcium data that will favor future algorithm development, which could help us understand the complex process in the brain.
The ENS2 system contains a neural network to infer spike-rate from calcium inputs (A) and an unsupervised greedy algorithm for estimating spike-events from spike-rate predictions (B). The neural network is trained with calcium trace inputs paired with ground truth (GT) spike-events, while it could test on calcium traces alone after training for obtaining predicted (PD) spike-rates. For a given calcium recording, our ENS2 system will first predict the corresponding spike-rate with the procedures in (A). Inputs are segmented and fed to the U-Net based model, whose spike-rate predictions are gathered through a voting strategy. Afterward, spike-events are estimated by the four-step algorithm in (B). In brief, valid fragments of spike-rate prediction are extracted by thresholding. Then, spike-event is inserted tentatively to approximate the extracted spike-rate fragment. The resultant spike-events sequence that achieves the minimal MSE is regarded as the final spike-events prediction. Details are explained in the Methods section.
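The four-step spike-event estimation summarized in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel used to turn one spike into a spike-rate bump, the threshold value, and the helper names `gaussian_kernel`, `place`, `extract_fragments` and `greedy_spike_events` are all assumptions made for illustration, and the actual procedure is the one given in the Methods section.

```python
import numpy as np

def gaussian_kernel(sigma_frames):
    """Unit-area Gaussian: the ASSUMED spike-rate footprint of one spike."""
    half = int(np.ceil(4 * sigma_frames))
    t = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (t / sigma_frames) ** 2)
    return k / k.sum()

def place(kernel, pos, n):
    """Kernel centred on frame `pos`, clipped to a window of n frames."""
    half = kernel.size // 2
    out = np.zeros(n)
    lo, hi = max(0, pos - half), min(n, pos + half + 1)
    out[lo:hi] = kernel[lo - pos + half:hi - pos + half]
    return out

def extract_fragments(rate, threshold):
    """(start, stop) pairs of contiguous runs where rate > threshold."""
    above = np.concatenate(([False], rate > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    return list(zip(edges[::2], edges[1::2]))

def greedy_spike_events(rate, kernel, threshold=0.02, max_spikes=100):
    """Greedily insert spikes while the fragment-wise MSE keeps decreasing."""
    rate = np.asarray(rate, dtype=float)
    counts = np.zeros(rate.size, dtype=int)
    for start, stop in extract_fragments(rate, threshold):
        target = rate[start:stop]
        n = stop - start
        templates = [place(kernel, p, n) for p in range(n)]
        approx = np.zeros(n)
        best = np.mean((target - approx) ** 2)
        for _ in range(max_spikes):
            # Tentatively add one spike at every frame of the fragment.
            errors = [np.mean((target - approx - tpl) ** 2) for tpl in templates]
            pos = int(np.argmin(errors))
            if errors[pos] >= best:  # no insertion lowers the MSE: stop
                break
            approx += templates[pos]
            best = errors[pos]
            counts[start + pos] += 1
    return counts

# Toy example: a noiseless spike-rate trace built from two known spikes.
kernel = gaussian_kernel(sigma_frames=2.0)
rate = sum(place(kernel, f, 120) for f in (30, 80))
counts = greedy_spike_events(rate, kernel)
```

In this toy case `counts` is non-zero only at the two frames used to build the trace. The sketch evaluates the error inside each thresholded fragment only, a simplification that ignores the tails of the kernel outside the fragment.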
Results
Raw calcium input with MSE loss is adequate for spike inferring tasks
For selecting the best method to infer spikes, we evaluated all potential configurations of models across all 20 datasets (see Methods), where they are expected to generalize to un-seen datasets for achieving high performance. First, we tested how differently pre-processed calcium inputs would benefit the inference task. When measured in Corr (correlation, Fig. 2A), raw inputs hold clear leads over either normalized inputs or de-noised inputs. However, no clear advantage was observed when using the other evaluation metrics (Fig. 2B-D & Extended Data Fig. 1A-B).
Performances of our ENS2 system are measured in (A) correlation, (B) van Rossum distance, (C) Victor-Purpura distance, and (D) error rate, respectively. Generally, the configuration of U-Net, MSE loss and raw input provides the best overall performance for both spike-rate prediction and spike-event prediction. (E) Examples of different pre-processed calcium inputs (colored) and ground truth spike-events (black). (F) Examples of spike-rates and spike-events predicted by our proposed method and state-of-the-art methods. (G-I) Examples of spike-rates and spike-events predicted by our proposed method with different configurations. Bar plots present means with 95% confidence intervals. Metrics shown in F-I measure the performance on the whole dataset. Orange shaded areas represent regions of interest where discrepancies in predictions are observed among different methods.
Next, we assessed whether the choice of loss function would affect the performance of our models. Here, we compared MSE (mean square error), vRD (van Rossum distance) and Corr as loss functions. Intuitively, we expected that using vRD loss and Corr loss would benefit the performance evaluated with vRD and Corr, respectively. Nevertheless, our results show that this is true only in some cases. In fact, the benefit of Corr loss on Corr evaluation is not statistically significant (Fig. 2A). Moreover, using Corr loss degrades vRD, VPD, and ER with either U-Net or FC-Net (Fig. 2B-D), and the performance (in terms of vRD, VPD, and ER) is always worse than with vRD loss or MSE loss. A similar observation holds when evaluating with Error and Bias (Extended Data Fig. 1A-B). The main reason is that the Corr loss function fails to differentiate predictions with a similar temporal pattern but different amplitudes (Fig. 2H), so that it tends to produce predictions with a minimal spike-rate amplitude compared to other configurations (Fig. 2F-I). As a consequence, spike-events could not be reliably estimated from the predicted spike-rate due to the low signal-to-noise ratio. On the other hand, using vRD loss shows no clear advantage over MSE loss in terms of performance, even when evaluated with vRD (Fig. 2B). In addition, vRD loss is more costly to use than MSE loss since its computation requires ground truth spike-events in addition to the ground truth spike-rate. As a result, we adopted raw calcium input with the MSE loss function in our proposed ENS2.
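The amplitude blindness of a correlation-based loss can be checked numerically. The snippet below is only an illustration with made-up numbers, not the training code: `target` is a toy spike-rate trace, and `shrunk` is the same trace scaled down tenfold. Pearson correlation scores the two predictions identically, whereas MSE penalizes the shrunken one.

```python
import numpy as np

def mse(pred, target):
    """Mean square error between two traces."""
    return float(np.mean((pred - target) ** 2))

def pearson(pred, target):
    """Pearson correlation coefficient between two traces."""
    return float(np.corrcoef(pred, target)[0, 1])

# Toy spike-rate trace (illustrative values only).
target = np.array([0.0, 0.2, 1.0, 0.4, 0.1, 0.0, 0.6, 0.3])
full = target.copy()     # prediction with the correct amplitude
shrunk = 0.1 * target    # same temporal pattern, ten times smaller

corr_full, corr_shrunk = pearson(full, target), pearson(shrunk, target)
mse_full, mse_shrunk = mse(full, target), mse(shrunk, target)
```

Both correlations equal 1 up to rounding, so a loss of the form 1 - Corr cannot tell the two predictions apart, while `mse_shrunk` is strictly larger than `mse_full`.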
U-Net achieves the best overall performance in spike inferring tasks
We then compared three architectures of neural networks (U-Net, Le-Net, and FC-Net, see Methods). As shown in Fig. 2, U-Net delivers the best overall performance, especially for Corr evaluation. When measured by vRD and ER, slight advantages with U-Net could also be observed over the other two networks, in particular with MSE loss or vRD loss. This also holds when we evaluated the models with Error and Bias (Extended Data Fig. 1A-B).
To show that the difference in performance came from the network architecture rather than specific hyper-parameter settings, we repeated the simulation using the configuration of U-Net, raw input, and MSE loss with various hyper-parameters (Extended Data Fig. 2). The filled bars represent the default hyper-parameter combination used in this study, as described in the Methods section. These hyper-parameters all had little effect on the final performance: Corr approaches 70% and vRD remains below 3 in all cases, while VPD and ER are around 0.6 and 50%, respectively.
We also verified that our models were trained adequately under our early-stopping criteria (see Methods). Extended Data Fig. 2G illustrates the MSE training losses for all 20 datasets. The losses generally decrease with more iterations and have stabilized by the time training stops. Moreover, Extended Data Fig. 2 demonstrates that the patience (number of iterations, see Methods) before early-stopping has little influence on performance. Together, these results indicate that our models were trained and regularized sufficiently by iterating over only thousands of batches of data, which avoids over-fitting.
Comparison to state-of-the-art studies
Based on our investigations above, we selected the configuration of U-Net, raw input, and MSE loss as our proposed ENS2 system. We took this further to compare it with two representative state-of-the-art studies: CASCADE 38 and MLspike 29. We selected these two methods as they are the top performing systems within the two major categories: data-driven systems and model-based systems, respectively. Moreover, both of them have already shown surpassing performance over previous methods using various datasets and evaluation metrics in their studies. Results are summarized in Fig. 3.
Dataset-wise performance is measured in (A) correlation, (B) van Rossum distance, (C) Victor-Purpura distance, and (D) error rate, respectively. Inference performance on each neuron is summarized in (E-H) histograms and (I-L) heatmap. Empty dots in A-D denote the performance for each individual dataset. Boxplots show the median performance of the respective algorithms. Vertical dashed lines in E-H mark the median performance of ENS2.
We first benchmarked their performance across all 20 datasets (see Methods). When measured by Corr, vRD, VPD, and ER, Fig. 3A-D shows that the data-driven systems (i.e. our proposed ENS2 and CASCADE) generally perform better than the model-based system (i.e. MLspike). We showed here that data-driven models could perform better even in VPD and ER (up to 10%) than the model-based method, which is specially designed for spike-event prediction. This may originate from the generalization abilities of data-driven methods, while model-based method relies on auto-calibration of parameters. When compared to CASCADE, our systems also showed superior performance for both spike-rate prediction (Fig. 3A-B, Extended Data Fig. 1C-D) and spike-event prediction (Fig. 3C-D). In particular, our ENS2 shows 10% higher in Corr and 5% lower in ER than CASCADE.
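For readers unfamiliar with the spike-event metrics, the Victor-Purpura distance can be computed with a short dynamic program. The sketch below implements the standard, unnormalized definition (cost 1 to insert or delete a spike, cost `q` per unit time to shift one); the cost parameter `q` and any normalization used in this benchmark are specified in the Methods and are not reproduced here, so the values it returns are not directly comparable to the reported VPD.

```python
def victor_purpura(a, b, q):
    """Unnormalized Victor-Purpura distance between spike-time lists a and b.

    q is the cost per unit time of shifting a spike; inserting or
    deleting a spike costs 1.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # d[i][j]: distance between the first i spikes of a and first j of b.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)          # delete i spikes
    for j in range(1, m + 1):
        d[0][j] = float(j)          # insert j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                                  # delete
                d[i][j - 1] + 1.0,                                  # insert
                d[i - 1][j - 1] + q * abs(a[i - 1] - b[j - 1]),     # shift
            )
    return d[n][m]
```

With `q = 1.0`, a spike displaced by 0.5 time units costs 0.5, whereas a spike displaced by 4 units costs 2, because deleting and re-inserting it is cheaper than shifting it.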
We took a deeper look into these results by pooling all the neurons and examining the distribution of the performance indices (Fig. 3E-H). For spike-rate predictions, we found that ENS2 reached over 80% in Corr and below 2 in vRD for a notable number of neurons, whereas far fewer neurons reached these levels with CASCADE and MLspike (Fig. 3E-F). Similar observations can be made for spike-event inference (Fig. 3G-H). This indicates that our ENS2 system yielded consistently better performance for most neurons. We also investigated how the performance of each algorithm varied for each single neuron (or dataset) (Fig. 3I-L). The comparison shows that the inference performance for a given neuron (or dataset), regardless of algorithm, depends significantly on its own properties: neurons that were predicted well or badly by one algorithm were usually predicted (relatively) well or badly by the other algorithms as well (e.g. dataset 8/20).
We highlighted several specific segments of recording from four datasets with different frame rates, calcium indicators and species as examples to further understand how the different models and metrics impact the spike predictions (Fig. 4). For dataset 9 with the GCaMP6f indicator and a high sampling rate of up to 160Hz (Fig. 4A), CASCADE tended to output broad spike-rate predictions, resulting in over-estimation with shifted spike-events (shaded in orange). In contrast, MLspike tended to under-estimate, such that quite a number of spike-events were missed. Our system (ENS2) showed better predictions than both methods for all evaluation metrics (bold values on the right in Fig. 4A). For dataset 12, recorded at a similarly high frame rate but with the GCaMP6s indicator (Fig. 4B), the calcium signals possess diverse dynamics upon spiking. The prediction task becomes challenging because some spike-events could hardly be identified even by visual inspection. In this case, our ENS2 system predicted the spike-rates and spike-events that are closest to the ground truth. Similarly, in dataset 13 (Fig. 4C), we found excessive spike-event predictions by both CASCADE and MLspike, probably due to the additive noise in the noise matching input or shot noise in the raw input. For dataset 6 recorded from zebrafish (Fig. 4D), the frame rate is lower (30Hz). The slow calcium dynamics caused over-estimation in all three methods. However, we found that the ENS2 system best recovered the temporal firing pattern, predicting three more precise spike-rate windows (shaded in orange) with fewer over-estimated spike-events. Similar advantages of our system could also be observed in Fig. 2E-F, where CASCADE and MLspike again tend to over-estimate and under-estimate in dataset 11, respectively. Here, we show that our proposed system could maintain robust inference capability under varying conditions.
In each subfigure, raw calcium inputs and noise matching inputs are shown with the ground truth spike-events on top. The predicted spike-rates and corresponding spike-events by various methods are shown below. Metrics on the right measure the performance on the whole dataset. Orange shaded areas represent regions of interest where discrepancies in predictions are significant among different methods.
Fig. 2-4 also show that pre-processing the input data did not noticeably improve the performance of our models. For instance, when predicting on dataset 11, explicitly canceling noise (e.g. de-noised input, Fig. 2I) or matching the noise levels of the input data (e.g. CASCADE, Fig. 2F) did not gain any advantage in the benchmark. Both consistently showed worse performance in spike-rate and spike-event predictions than the ENS2 system with raw input (Fig. 2F). Instead, we found that using raw inputs for our proposed model achieves the best inference results in most cases. We suggest that our data-driven neural network could implicitly handle a considerable range of noise fluctuation by itself, since it performs equally well or even better with raw inputs. This reduces the workload of input pre-processing and feature engineering that was compulsory in traditional machine learning tasks.
We would also like to point out that although our proposed ENS2 is data-driven, it is less computationally demanding than the previous method (e.g., CASCADE, Fig. 5). For a specific sampling rate (e.g. 60Hz), CASCADE trains a series of noise matching models to cover different noise levels. Each of these noise matching models consists of 5 identical networks for ensemble learning to boost performance. In contrast, our ENS2 requires only a single network to predict data at each sampling rate under various conditions. As a result, the ENS2 with U-Net requires 20k fewer trainable parameters, and only a maximum of 5.12 million data segments are fed for training, which is fewer than that of CASCADE by two orders of magnitude (Fig. 5A). In particular, even using an entry-level GPU (Nvidia GTX1650), training of ENS2 on millions of samples from 19 datasets completed in less than 2.5 minutes on average, and inference on a testing dataset took just seconds. These features are not only beneficial to off-the-shelf usage for spike inference, but also enable cost-effective re-training or fine-tuning when more paired datasets become available to improve our model further. Our spike-event estimation algorithm (see Methods) is also more computationally efficient than that used in CASCADE (Fig. 5B-D). When applied to the same spike-rate predictions of all 20 datasets from the ENS2 system, our greedy algorithm ran faster by one order of magnitude (Fig. 5B), while maintaining similar accuracy (Fig. 5C-D). In fact, while CASCADE took 26 s for spike-event estimation alone, this time is sufficient for the ENS2 system to complete both spike-rate prediction and spike-event estimation. As such, our ENS2 demonstrated higher performance and better computational efficiency for inferring spikes from calcium data than the state-of-the-art methods.
(A) Comparison of neural networks adopted in ENS2 and CASCADE. Our method requires two orders of magnitude fewer samples to compute, and could generalize to all noise levels. (B-D) Comparison of the spike-event estimation algorithm (Fig. 1B) in ENS2 and CASCADE. (B) Comparison of runtime between the spike-event estimation algorithms in ENS2 and CASCADE, on estimating spike-events for all 20 datasets at 60Hz. The greedy estimation algorithm in ENS2 is over one order of magnitude faster than that in CASCADE. (C) VPD measured from spike-events estimated by the two different algorithms, showing comparable estimation performance. Both methods used the same spike-rate predictions from ENS2 as inputs in this comparison. (D) Same as (C), but for ER measurement. Results are presented for each dataset. The runtime was measured on a PC with an Intel Core i7-4770 CPU.
Application to information encoding in primary visual cortex
In the above benchmark, we showed that ENS2 achieved relatively high performance and high efficiency for inference on un-seen recordings. Next, we asked whether our system indeed helps with real neuroscience problems as previous models did 5,7,8. Here, we trained our full ENS2 system with all 20 datasets available, and then deployed it to unseen calcium imaging data recorded from the primary visual cortex (V1) (Fig. 6). We collected in vivo calcium fluorescence images from V1 of mice that were shown drifting grating stimuli of four unique orientations, each moving in two opposing directions (8 directions in total) (Fig. 6A, see Methods). Responsive neurons were selected, and their fluorescence signals were processed into calcium traces (ΔF/F0) for further analyses (Fig. 6B1-2 & 6C1-2, see Methods). We then used our ENS2 system to predict the spike-rate (Fig. 6B3 & 6C3) and spike-event (Fig. 6B4 & 6C4) accordingly. We compared our analyses with these different inputs and examined whether spike inference improves the understanding of information encoding in V1 over the raw ΔF/F0 trace alone. Here, two representative neurons are shown in Fig. 6B-C.
(A) An example image (A1) recorded from the binocular zone of the left V1 (A2) of mice subject to visual grating stimuli (A3). Visually responsive neurons are labeled in (A1) and are considered for further analyses (see Methods). The inner color denotes the response type of the neurons. Contra and ipsi refer to the neurons that are responsive to contralateral (right) and ipsilateral (left) eye inputs, respectively. Both means the neurons are responsive to inputs from both eyes. The outer color denotes the OSI computed from the raw ΔF/F0 signal for that neuron. (B-C) Examples of recorded calcium signals and the predicted spike-rates and spike-events by ENS2 (B1-B4, C1-C4). Tuning curves and selectivity indexes (B5-B10, C5-C10) are computed based on three types of inputs (B2-B4, C2-C4), respectively. (D-E) Comparisons of OSI/DSI computed from the raw ΔF/F0 signal and after spike inference with ENS2 for all the 192 contra and 141 ipsi neurons considered.
We first constructed the tuning curves (see Methods) for each neuron (Fig. 6B5-7 & 6C5-7). The resultant tuning curves differ markedly when computed using ΔF/F0 (Fig. 6B5 & 6C5) and using spike-rate or spike-event (Fig. 6B6-7 & 6C6-7). In particular, the spike-rate/spike-event tuning curves show a sharpened preferred orientation for these cells, which is beneficial for calcium data analysis and interpretation. The diffuse tuning curve obtained from ΔF/F0 arises mainly from the long “tail” in the ΔF/F0 signal after each peak (Fig. 6B1 & 6C1), which is caused by the slow dynamics of calcium indicators. As a result, the preferred orientation is shifted, which leads to a misinterpretation of the neuronal behavior. On the other hand, by predicting the spike-rate/spike-event with our ENS2 system, we reliably eliminated these long tails in the signal. We further quantified the preferred orientations by computing the orientation selectivity index (OSI) and direction selectivity index (DSI) from the tuning curves for each neuron (see Methods). The neuron in Fig. 6B is a sample cell expected to have high OSI but low DSI, while the neuron in Fig. 6C is expected to have high OSI and high DSI 46 (see Methods & Extended Data Fig. 7). Our results show that OSI computed from ΔF/F0 was low for both neurons in Fig. 6B-C but was higher when computed from spike-rate/spike-event (from 0.47 to 0.77/0.83 and from 0.26 to 0.76/0.71, respectively). Similarly, DSI computed from ΔF/F0 was low for the neuron in Fig. 6C, but increased when computed from spike-rate/spike-event (from 0.36 to 0.62/0.54). Extended Data Fig. 3A-B show two neurons expected to have a preferred orientation at 0°, but with a strong baseline drift in their ΔF/F0. Due to the shift in baseline, the OSI or DSI tends to be higher when using ΔF/F0. In contrast, the predicted spikes were able to filter out the shift in baseline, which resulted in a more accurate readout of the OSI/DSI measurement (such as Extended Data Fig. 3B). In all these cases, we verified that the OSI and DSI computed from the spike-rate/spike-event predicted by our ENS2 system can better discriminate the response pattern of these neurons than those computed from the raw ΔF/F0.
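To make the two indexes concrete, the sketch below computes OSI and DSI from an 8-direction tuning curve using one common textbook definition: (R_pref - R_orth)/(R_pref + R_orth) and (R_pref - R_null)/(R_pref + R_null). This definition is an assumption; the exact formulas used in this study are given in the Methods and may differ. The function name `selectivity_indexes` and the example tuning curves are hypothetical.

```python
import numpy as np

def selectivity_indexes(tuning):
    """Return (OSI, DSI) from responses to evenly spaced directions.

    ASSUMED definitions (see lead-in):
      OSI = (R_pref - R_orth) / (R_pref + R_orth)
      DSI = (R_pref - R_null) / (R_pref + R_null)
    where R_orth averages the two directions orthogonal to the preferred
    one and R_null is the response to the opposite direction.
    """
    r = np.asarray(tuning, dtype=float)
    n = r.size                      # e.g. 8 directions, 45 degrees apart
    pref = int(np.argmax(r))
    null = (pref + n // 2) % n
    orth = [(pref + n // 4) % n, (pref - n // 4) % n]
    r_pref, r_null, r_orth = r[pref], r[null], r[orth].mean()
    osi = (r_pref - r_orth) / (r_pref + r_orth) if r_pref + r_orth > 0 else 0.0
    dsi = (r_pref - r_null) / (r_pref + r_null) if r_pref + r_null > 0 else 0.0
    return float(osi), float(dsi)

# Hypothetical orientation-tuned cell: equal responses at 0 and 180 degrees.
sharp = np.array([10, 1, 0, 1, 10, 1, 0, 1], dtype=float)
# Same cell with a constant pedestal added to every direction.
pedestal = sharp + 5.0

osi_sharp, dsi_sharp = selectivity_indexes(sharp)
osi_ped, dsi_ped = selectivity_indexes(pedestal)
```

Under these assumed definitions, the pedestal halves the OSI of the toy cell (from 1.0 to 0.5) while leaving its DSI at 0. This is a crude stand-in for residual fluorescence that adds to the response at every direction, which is the kind of contamination that spike inference is reported to remove.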
In fact, this holds true in general for the neuron population we have recorded (192 contra and 141 ipsi neurons (see Methods)). Fig. 6D1-2 show that OSI computed with spike-event was higher than that computed with ΔF/F0 in >80% of the cells (89.06% for contra neurons and 82.98% for ipsi neurons). Also, Fig. 6D3-4 show that DSI computed with spike-event was higher than that computed with ΔF/F0 in >68% of the cells (68.23% for contra neurons and 68.09% for ipsi neurons). Overall, the median OSI/DSI for both contra and ipsi neurons are higher with spike-event/spike-rate than ΔF/F0 (Fig. 6E). The higher OSI/DSI can increase the sensitivity to discriminate the response pattern of these neurons for their orientation preference. The wider range of OSI/DSI distribution can also increase the dynamic range for analyzing these neurons’ characteristics. As such, the improved analytical power using the predicted spike-event/spike-rate from our ENS2 could benefit the understanding of the complex information processing in the visual cortex.
It is also worth noting how spike inference could benefit analyses of neurons that have weak responses or a low signal-to-noise ratio (SNR) (Extended Data Fig. 3C-D). In Extended Data Fig. 3C, the ΔF/F0 of the neuron varied only over a range of 0.2, so the SNR is very low. The resultant noisy tuning curves suggested no orientation or direction selectivity for this neuron. In contrast, spike inference increased the SNR of the data such that the tuning curve is much sharper, showing preferred orientations at 0° and 180° and hence an increased OSI. Conversely, in Extended Data Fig. 3D, the tuning curve from ΔF/F0 resulted in a sizable OSI (0.43), which may (falsely) suggest that this neuron has an orientation preference at 0°. After spike inference with our ENS2, however, the small peaks in the raw ΔF/F0 signal were filtered out, resulting in negligible OSI and DSI. This suggests that this neuron in fact has very weak selectivity and may not be a truly responsive cell. In this sense, spike inference by our ENS2 increased the SNR of the recording, which not only improves the sensitivity in detecting the orientation selectivity of the neurons, but also helps to screen out marginally responsive neurons.
Neuropil correction is a common pre-processing step for improving SNR of calcium imaging signal 12,14,47. We conducted additional simulations to examine how this pre-processing step blends with the spike inference in analyzing these data. Extended Data Fig. 4A shows that neuropil correction increased OSI and DSI computed from ΔF/F0 in more neurons than that computed from predicted spike-event. After neuropil correction, when spike inference with our ENS2 was performed, more than 60% of the neurons had larger OSI and more than 45% of the neurons had larger DSI than when ΔF/F0 was used (Extended Data Fig. 4B). Consequently, the median OSI with spike-rate/spike-event were again higher than that with ΔF/F0, while the three median DSIs were similar in this case (Extended Data Fig. 4C). Nevertheless, the median OSI and DSI with spike-rate/spike-event showed a minor increase between before (Fig. 6E) and after (Extended Data Fig. 4C) neuropil correction. These results suggest that the spike inference with our ENS2 is by itself effective enough to improve the SNR in the recorded data for better analyses of orientation selectivity in these neurons. As such, additional pre-processing for neuropil correction may be unnecessary.
Factors affecting inference performance
To understand further what contributes to good spike inference performance for data-driven methods, we investigate how the dataset itself affects the performance of these models. These insights may further facilitate data collection and preparation for improving data-driven models (e.g. our ENS2).
Fig. 7A-D show the performance on individual datasets achieved by different configurations of models (the configurations are numbered horizontally in the same order as in Fig. 2A-D). The results show that performance indeed depends strongly on the dataset. For instance, regardless of the networks used, datasets 19 and 20 yielded notably poorer vRD than the other datasets (Fig. 7B), and some datasets (e.g., 2-3, 14-15, and 19-20) showed considerably worse ER than the others (Fig. 7D). These patterns were also observed when comparing our ENS2 to state-of-the-art methods (Fig. 3I-L). We extracted various quantities from each dataset, such as noise level, frame rate, firing rate and AP amplitude, and examined how they may affect the inference performance (Fig. 7E-H, Extended Data Fig. 1M-N). The AP amplitude of the dataset turned out to be the key predictor of inference performance (including Corr, ER, Error, and Bias). We also noticed that the raw AP amplitudes are of critical importance for inference, as they relate to the number of spike events for a given calcium indicator. For example, normalization of calcium inputs might be beneficial when calcium signals fluctuate significantly (e.g. Fig. 2E), but would also cause under-estimation or over-estimation since the original AP amplitudes are altered. On the other hand, de-noising the inputs may improve the signal-to-noise ratio (SNR), but it will also smooth out the calcium traces, causing broader spike-rate outputs and less precise spike-event predictions. Matching the inputs’ noise level with the testing dataset is another potentially useful approach to improve inference performance, but it is not always straightforward to distinguish the AP amplitude from the noise amplitude (e.g. Fig. 2E, Fig. 4B-C). A previous model-based study also showed that an accurate estimate of AP amplitude improved performance 48. The AP amplitude in turn depends strongly on the calcium indicators’ sensitivity. One major bottleneck of inference algorithms therefore lies in the calcium indicators.
(A-D) Performance of ENS2 with different configurations (see Fig. 2A-D) (x-axis) on each dataset (y-axis). (E-H) Pearson correlation coefficients between the four properties of each dataset and the corresponding spike inference performances. (I-M) Performance of spike inference for different types of calcium indicators. (I) An example illustrating how the datasets were divided by calcium indicator when a GCaMP6f dataset was used as the testing dataset. All refers to all the other 19 datasets, Same refers to the datasets that also used GCaMP6f, and Different refers to the other datasets that used calcium indicators other than GCaMP6f. (N-Q) Performance of spike inference with different amounts of training data. Shaded areas denote means with 95% confidence intervals.
Since Fig. 7A-H suggests that the sensitivity of the calcium indicator is a strong predictor of inference performance, we next examined how mixing datasets with different types of calcium indicators, as done here, may affect the inference performance. We partitioned the training data according to their calcium indicator type relative to the testing data. Here, “All” is the same as the leave-one-dataset-out setting used in the rest of this paper. “Same” includes only those training datasets with the same calcium indicator as the testing data. “Different” includes only those training datasets whose calcium indicators differ from that of the testing data. Fig. 7I-M and Extended Data Fig. 1I-J show that there is no consistent advantage in training only on datasets with the same calcium indicator. A previous study 38 also reported that grouping datasets by calcium indicator for training showed no advantage. This may suggest that the neural network can calibrate itself over different calcium dynamics and thereby generalize to unseen testing data to a certain extent.
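The three training partitions can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `partition_training_sets` and the mapping `indicator_of` are hypothetical, and the indicator labels in the usage lines are placeholders rather than the actual assignments listed in Extended Data Table 1.

```python
def partition_training_sets(test_id, indicator_of):
    """Split the non-test datasets into the "All", "Same" and "Different" groups.

    indicator_of maps a dataset id to the name of its calcium indicator
    (hypothetical input; the real mapping is in Extended Data Table 1).
    """
    others = [d for d in sorted(indicator_of) if d != test_id]
    test_indicator = indicator_of[test_id]
    same = [d for d in others if indicator_of[d] == test_indicator]
    different = [d for d in others if indicator_of[d] != test_indicator]
    return {"All": others, "Same": same, "Different": different}


# Placeholder mapping, for illustration only.
example = {1: "GCaMP6f", 2: "GCaMP6f", 3: "GCaMP6s", 4: "OGB-1"}
groups = partition_training_sets(1, example)
print(groups)
```

By construction, "Same" and "Different" are disjoint and together make up "All", so any performance difference between them cannot come from overlapping training data.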
Since our ENS2 is a data-driven model, we asked how much training data is needed to achieve good inference performance. We randomly sampled different numbers of segments from the total of over 20 hours of available paired data. Here, we ignored the overlap between these segments and simply assumed that around 68,000 segments of 64 samples at 60Hz were equivalent to 20 hours of recordings. When all available paired data are supplied to the model, the nominal total duration is approximately 20 × 64 hours, since the paired data are segmented with a step of 1 (see Method). Not surprisingly, inference performance increased with the amount of training data, but it plateaued at roughly 5 hours of paired data (Fig. 7N-Q, Extended Data Fig. 1G-H).
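The segment counts quoted here can be checked with simple arithmetic. The sketch below uses hypothetical helper names, and the only inputs are the figures stated in the text (64-sample segments, 60Hz, about 20 hours of paired data).

```python
SAMPLING_RATE_HZ = 60
SEGMENT_LENGTH = 64


def segments_to_hours(n_segments, segment_length=SEGMENT_LENGTH, rate_hz=SAMPLING_RATE_HZ):
    """Duration covered by n_segments when they are treated as non-overlapping."""
    return n_segments * segment_length / rate_hz / 3600.0


def hours_to_step1_segments(hours, rate_hz=SAMPLING_RATE_HZ):
    """Approximate number of segments when a new one starts at every sample."""
    return int(hours * 3600 * rate_hz)


print(round(segments_to_hours(68_000), 2))   # 20.15
print(hours_to_step1_segments(20))           # 4320000
```

The second figure is consistent with the more than 4 million training segments reported in the Methods, and multiplying it by 64 samples per segment gives the nominal 20 × 64 hours.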
Discussion
In this work, we developed a high-performance inference system (ENS2) through extensive empirical research, and showed its usefulness in inferring both spike-rates and spike-events.
We found that networks with convolutional layers (e.g. U-Net and Le-Net) typically outperformed the others (e.g. FC-Net). This may be partly due to the regularization capability of the convolutional layers. Moreover, it is quite intuitive for humans to examine calcium segments fraction by fraction to identify spike-events, much as an artificial neural network slides a kernel for convolution. In fact, a recent data-driven model (CASCADE 38) with state-of-the-art performance also used a network with convolutional layers. Surprisingly, although inferring spikes from calcium signals is a typical sequence-to-sequence translation task, recurrent neural networks (e.g., LSTM) did not show notable advantages 36. In this work, however, we used deep convolutional networks with sequence-to-sequence translation ability (e.g., 1D U-Net), and showed that they outperformed other state-of-the-art systems. The advantages of U-Net may come from its skip-connecting architecture (see Method), which is beneficial to temporal prediction. Recently, a 3D U-Net based model has also been proposed to improve SNR in calcium images and facilitate calcium signal extraction 49. In addition, our results revealed that MSE loss could readily regulate the optimization of such models. While Corr is indisputably a major evaluation metric for spike inference, we suggest that using Corr as the sole loss function for deep learning models (e.g. in S2S 39) might be problematic in real world tasks. For example, the inferred spike-rates are poorly scaled in amplitude and cannot recover spike-events faithfully.
Several other factors may also have a significant impact on inference performance, such as the sampling rate (resolution of prediction), the size of the smoothing window (for spike-rate prediction), and the hyper-parameters of the evaluation metrics (e.g. ER window size). The comparisons are summarized in Fig. 8A-D. Fig. 8A shows that Corr increased consistently with larger smoothing windows. Similar observations can be found in several recent studies 35,36,50. This is because the GT spike-rates obtained by convolving the GT spike-events with larger smoothing windows have smoother and broader patterns, which favors the Corr measure. Fig. 8F shows the GT spike-rates obtained by convolving the GT spike-events in Fig. 8E with varying smoothing window sizes (25ms to 200ms) and their corresponding predictions. The smoother and broader waveform of the GT spike-rate (with larger smoothing windows) simplifies the prediction task, and it is easier to obtain a high Corr with such a simpler and smoother PD spike-rate waveform. This also holds for the spike-rate evaluation by vRD and Error (Fig. 8B & Extended Data Fig. 1E). However, we argue that such apparently “better” performance (e.g. high Corr) does not guarantee meaningful predictions as reflected in the PD spike-events, since multiple GT spike-events could be merged into a single peak of spike-rate (Fig. 8E-F).
(A-D) Performance of correlation, van Rossum distance, Victor-Purpura distance, and error rate, respectively, when measured under different sampling rates and smoothing window sizes. White and red squares denote the evaluation schemes adopted by ENS2 and CASCADE, respectively. (E) Example of calcium signals under 60Hz with paired spike-events. (F) Examples of ground truth spike-rates convolved with different smoothing window sizes (from 200ms to 25ms). The resultant spike-rate and spike-event predictions are also shown. (G) Performance of error rate when measured under different sampling rates and error rate window sizes.
Instead, the temporal firing patterns could be better predicted with narrower smoothing windows. Spike-event predictions (VPD & ER, Fig. 8C-D & Fig. 8G), in turn, generally improve with higher sampling rates. This is reasonable, as smaller bin sizes allow more precise estimation of spike-events from spike-rate predictions. Moreover, when high sampling rates are used (e.g. 30 or 60Hz), VPD and ER also decrease as the smoothing window shrinks, indicating improved spike-event predictions. With large windows, the spike-event inference performance is likely restricted by the overly smoothed spike-rates (e.g. Fig. 8F). We also analyzed the effect of ER window sizes on ER evaluation (Fig. 8G). As expected, smaller ER window sizes impose a more demanding evaluation on the algorithm and hence result in higher ER, which is similar to the findings reported in a previous study 29.
Given these analyses, we suggest that our ENS2 system should be trained with a sampling rate of 60Hz and 25ms smoothing windows for practical use (labeled with white dashed boxes in Fig. 8A-D & Extended Data Fig. 1E-F). First, preparing calcium inputs at 60Hz reduces information loss (see Method), and adopting the 25ms smoothing window achieves near-optimal spike-event inference (Fig. 8C-D). We decided to optimize our system for spike-event prediction rather than spike-rate prediction for the reasons discussed above. Second, the performance of spike-event prediction starts to saturate with this scheme. A further increase in the sampling rate would add computational overhead, while a further reduction of the smoothing window size might be harmful to training neural networks with gradient descent. We also repeated our benchmark using causal smoothing kernels as in CASCADE 38, and obtained similar performance with our system (Extended Data Fig. 5). It is worth noting that the CASCADE algorithm 38 was originally benchmarked at 7.5Hz with 200ms smoothing windows. We also trained our ENS2 under these conditions (labeled with red dashed boxes in Fig. 8A-D & Extended Data Fig. 1E-F), and found that it consistently outperformed the CASCADE algorithm for both spike-rate and spike-event predictions (Extended Data Fig. 6). These results support that our ENS2 is a versatile and highly effective algorithm for spike inference from calcium signals.
Importantly, we have demonstrated that our spike inference algorithm could improve the analyses of real world calcium data, such as in the study of neuronal orientation preference in primary visual cortex (Fig. 6, Extended Data Fig. 3 & Extended Data Fig. 4). Our results suggest that our algorithm helps bridge the gap between calcium imaging and traditional electrophysiological recording, such that analyses can combine high throughput (from calcium imaging) with high precision (from electrophysiology) in the study of the brain.
Methods
Benchmark database
In this study, we used publicly available datasets containing both calcium imaging signals and simultaneously recorded electrophysiological signals from excitatory neurons 13-17,35,38,51-53. For benchmarking and algorithm development purposes, they were recently compiled in 38 into an extensive database with 21 datasets. Specifically, we adopted datasets #2 to #21 following 38 for a fair comparison, and they are labeled as datasets 1 to 20 in this study as shown in Extended Data Table 1. These 20 datasets cover eight different kinds of calcium indicators, a wide range of frame rates (7.7Hz to 500Hz), and various firing rates (0.2Hz to 5.8Hz on average). Over 20 hours of paired ground truth data (calcium signals and spike-events) were recorded from a total of 230 neurons in either mouse or zebrafish brains.
In each dataset, raw calcium signals are provided as the percentage changes of fluorescence amplitude against baseline (ΔF/F0), while individual timestamps label spike-events. We also computed the noise-levels as defined in 38 and listed them in Extended Data Table 1. Furthermore, we presented the increase in ΔF/F0 induced by one action potential (AP amplitude) for each dataset. The AP amplitude is computed using the averaged calcium kernel, which was extracted from paired ground truth data using the deconvolution function with regularized filter in MATLAB.
Data preparation
1) Re-sampling data
To develop and validate the spike inference algorithms, we first re-sampled the input data (both training set and testing set) from their different frame rates to a common sampling rate. In this work, we refer to the original frequencies at which calcium signals were captured as frame rates, and to the re-sampled frequencies as sampling rates. Given that most of the datasets were captured with frame rates not higher than 60Hz (Extended Data Table 1), we re-sampled all calcium signals to 60Hz. All the inference systems were then benchmarked at this same sampling rate. We also tested our system at 7.5Hz as suggested by CASCADE 38. The impact of sampling rates on inference results is discussed in this work.
2) Pre-processing of calcium signals
Aside from the raw calcium inputs (where only re-sampling is performed), we also considered several pre-processed inputs. First, since normalization is beneficial in back-propagation 54, we prepared the “normalized inputs” by rescaling the amplitude of the raw inputs to [0,1] on a record-by-record basis. Second, considering that calcium signals are intrinsically noisy 21, we computed the “de-noised inputs” by down-sampling the raw inputs to 6Hz and then up-sampling them to the required sampling rates. This should be sufficient to preserve the spiking properties as we noted that the firing rates of all datasets are well below 6Hz (Extended Data Table 1). Moreover, the “noise matching inputs” were obtained from CASCADE 38 to reproduce results with the CASCADE algorithm, where the algorithm is designed to have the noise-levels of training calcium data matching with those of the testing data.
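The two pre-processing variants defined in this work can be illustrated as follows. This is a rough sketch, not the implementation used in the study: the text does not specify the resampling method, so block averaging for down-sampling and linear interpolation for up-sampling are assumptions, as are the function names.

```python
def normalize_record(trace):
    """Rescale one recording to [0, 1] (min-max, record by record)."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return [0.0] * len(trace)      # flat trace: nothing to rescale
    return [(x - lo) / (hi - lo) for x in trace]


def denoise_record(trace, factor=10):
    """Down-sample by block averaging, then up-sample by linear interpolation.

    factor=10 corresponds to 60Hz -> 6Hz -> 60Hz. Both resampling choices
    are assumptions made for this sketch.
    """
    n = len(trace)
    centers, means = [], []
    for start in range(0, n, factor):
        block = trace[start:start + factor]
        centers.append(start + (len(block) - 1) / 2.0)
        means.append(sum(block) / len(block))
    out, j = [], 0
    for i in range(n):
        if i <= centers[0]:
            out.append(means[0])       # hold the first block mean at the start
        elif i >= centers[-1]:
            out.append(means[-1])      # hold the last block mean at the end
        else:
            while centers[j + 1] < i:
                j += 1
            w = (i - centers[j]) / (centers[j + 1] - centers[j])
            out.append((1 - w) * means[j] + w * means[j + 1])
    return out


print(normalize_record([2.0, 4.0, 6.0]))   # [0.0, 0.5, 1.0]
noisy = [0.0, 2.0] * 30                    # fast alternation around a mean of 1
smoothed = denoise_record(noisy)
```

The alternating trace collapses to its block mean, which shows the same low-pass effect that makes spike-rate outputs broader when de-noised inputs are used.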
3) Pre-processing of spike-rate
For a pre-defined sampling rate (e.g. 60Hz), the raw timestamps of ground truth spikes (spike-events) are allocated to their corresponding time bins. We then compute the sequence of spike counts by counting the firing events in each time bin. Note that the different pre-processed calcium inputs (raw/normalized/de-noised/noise matching) considered here share identical spike count sequences. The sequences are then smoothed with Gaussian filters to facilitate gradient descent. The smoothing window size τ of the Gaussian kernels was set to 25ms, which in general produces the best spike-event predictions with high temporal resolution. The choice of smoothing window size for deep learning based systems is examined in the Discussion (Fig. 8). The convolved spike counts are denoted as “spike-rate” in this work.
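The conversion from spike timestamps to a smoothed spike-rate might look like the sketch below. It assumes that the smoothing window size τ is the standard deviation of the Gaussian and that the kernel is normalized to unit sum so that the total spike count is preserved; neither detail is stated in the text, and all function names are illustrative.

```python
import math


def bin_spikes(spike_times_s, duration_s, rate_hz=60.0):
    """Count the spikes falling into each time bin of width 1 / rate_hz."""
    n_bins = int(round(duration_s * rate_hz))
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t * rate_hz)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def gaussian_kernel(tau_s=0.025, rate_hz=60.0, n_sigma=4.0):
    """Unit-sum Gaussian kernel; tau_s is taken as the standard deviation (assumed)."""
    sigma_bins = tau_s * rate_hz
    half = int(math.ceil(n_sigma * sigma_bins))
    weights = [math.exp(-0.5 * (i / sigma_bins) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(counts, kernel):
    """Convolve spike counts with the kernel, keeping the original length."""
    half = len(kernel) // 2
    out = [0.0] * len(counts)
    for i, c in enumerate(counts):
        if c == 0:
            continue
        for j, w in enumerate(kernel):
            pos = i + j - half
            if 0 <= pos < len(out):
                out[pos] += c * w
    return out


counts = bin_spikes([0.5, 0.51, 1.25], duration_s=2.0)
spike_rate = smooth(counts, gaussian_kernel())
print(counts[30], counts[75], round(sum(spike_rate), 6))   # 2 1 3.0
```

Two spikes that fall within the same 16.7ms bin simply produce a count of 2, which is why the later spike-event estimation has to allow several spikes per bin.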
4) Data segmentation
To train the neural networks properly, paired sequences of calcium signals and spike-rates were segmented with a moving step of 1 (Fig. 1A). The length of each segment was set to 64 samples, ensuring that each contains at least 1sec of information (e.g. for a sampling rate of 60Hz, 1sec of data consists of 60 data points). In the case of a sampling rate of 60Hz, a total of >4 million segments of paired data were obtained for training.
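A minimal sketch of the segmentation step is given below, assuming the calcium signal and spike-rate are stored as equal-length sequences; the function name is illustrative.

```python
def segment_pairs(calcium, spike_rate, length=64, step=1):
    """Cut paired sequences into aligned windows of `length` samples."""
    if len(calcium) != len(spike_rate):
        raise ValueError("calcium and spike_rate must have the same length")
    last_start = len(calcium) - length
    return [(calcium[i:i + length], spike_rate[i:i + length])
            for i in range(0, last_start + 1, step)]


calcium = [float(i) for i in range(100)]
spike_rate = [0.0] * 100
pairs = segment_pairs(calcium, spike_rate)
print(len(pairs))   # 37, i.e. 100 - 64 + 1
```

With a step of 1, a recording of T samples yields T - 63 segments, so the number of training segments is close to the number of samples.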
Network architectures
We tested three different neural network architectures to evaluate their effectiveness in the spike inference task: U-Net 55, Le-Net 56, and FC-Net (fully-connected network). They are typical representatives of their respective categories, namely deep neural networks (DNN), convolutional neural networks (CNN), and conventional neural networks (NN). We adopted these existing models with minor modifications for 1-D calcium signal inputs. The network architectures are summarized in Extended Data Table 2.
The U-Net used in this study contains three contracting blocks and three expanding blocks. The input information from the contracting blocks passes through the bottleneck block to the expanding blocks. In addition, skip-connections from the contracting blocks to the corresponding expanding blocks allow direct and localized information flow 55. Within each contracting/expanding block and the bottleneck block, two convolution layers with kernels of size 3 are deployed. Instead of batch normalization, we used instance normalization 57 for regularization, since calcium signals with various dynamics may co-exist in the same batch of data. We observed that this regularization helped model convergence. The Le-Net consists of three convolution layers with kernel sizes of 3, 3, and 6, respectively. Average pooling layers with kernels of size 2 are applied between the three convolution layers. A dropout layer 58 is included before the output layer for regularization. A typical fully-connected network with four hidden layers and two dropout layers is adopted as FC-Net. All three networks are designed to take calcium signal inputs of 64 samples, and output spike-rate vectors of 64 samples in a sequence-to-sequence translation manner. During prediction, given that the input data are segmented with a step of 1, each time point is predicted up to 64 times, once by each of the overlapping segments that contain it. We were thus able to average these predictions to obtain a robust final spike-rate output for each time point (Fig. 1A). Spike-event output could then be estimated from this final spike-rate sequence as introduced below.
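The averaging of overlapping segment outputs can be sketched as follows. To keep the example self-contained, the usage lines replace the network by an identity "model" that returns its input segment, which is purely an illustration device; the function name is hypothetical.

```python
def average_overlapping_predictions(segment_outputs, total_length, step=1):
    """Average the per-segment outputs that cover each time point.

    segment_outputs[k] is the output for the segment starting at sample k * step.
    Returns the averaged sequence and the number of predictions per time point.
    """
    sums = [0.0] * total_length
    hits = [0] * total_length
    for k, segment in enumerate(segment_outputs):
        start = k * step
        for offset, value in enumerate(segment):
            sums[start + offset] += value
            hits[start + offset] += 1
    averaged = [s / h if h else 0.0 for s, h in zip(sums, hits)]
    return averaged, hits


signal = [float(i % 7) for i in range(200)]
segments = [signal[i:i + 64] for i in range(len(signal) - 64 + 1)]
averaged, hits = average_overlapping_predictions(segments, len(signal))
print(max(hits), hits[0])   # 64 1
```

Time points in the interior of a recording receive 64 predictions, while those near the two ends receive fewer, which is why the text says "up to 64 times".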
We kept all three networks at similar numbers (under 150k) of trainable parameters for comparison (Fig. 5A). Their weights are randomly initialized with zero mean and a standard deviation of 0.02. Leaky ReLU (rectified linear unit) with a slope of 0.2 is used as the activation function for all layers except the output layers, where ReLU is used for non-negative spike-rate prediction.
Loss functions and optimization
For each type of network, we optimized the model with each of three different loss functions for comparison. First, mean square error (MSE) loss is used, which is one of the most commonly used loss functions and is applicable to a wide variety of machine learning tasks. The models are expected to minimize the MSE between predicted spike-rates and ground truth spike-rates, penalizing errors in both timing and amplitude. In addition, we used two of the evaluation metrics (see below), correlation (Corr) and van Rossum distance (vRD) 59, as loss functions. Through gradient descent, the models are expected to maximize the performance measured by these two metrics.
The Adam optimizer 60 with a default learning rate of 1e-3 is used for all models. Each model is allowed to update for a maximum of 5000 iterations. In each iteration, a batch of 1024 paired segments is drawn randomly and fed to the model for training. The training losses are recorded, and early stopping is triggered when the loss has not improved over the past 500 iterations (patience = 500). Under these criteria, we observed that most models completed training within 2500 iterations. The resultant models are then ready for prediction. Other details of hyper-parameters and operational environment are summarized in Extended Data Table 3.
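The stopping rule can be written independently of any deep learning framework. In the sketch below, `train_step` is a hypothetical callable standing in for one optimizer update on a random batch of 1024 segments; it returns the training loss of that iteration. How the original implementation tracks the best loss is not stated, so this is only one plausible reading.

```python
def train_with_early_stopping(train_step, max_iterations=5000, patience=500):
    """Run train_step until the loss has not improved for `patience` iterations.

    Returns the number of iterations performed and the best loss seen.
    """
    best_loss = float("inf")
    since_best = 0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        loss = train_step(iteration)
        if loss < best_loss:
            best_loss = loss
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return iteration, best_loss


# Toy loss curve: improves for 100 iterations, then stays flat.
iterations_run, best = train_with_early_stopping(lambda it: max(100 - it, 0))
print(iterations_run, best)   # 600 0
```

In the toy run the loss stops improving at iteration 100, so training ends 500 iterations later, well before the cap of 5000.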
Estimation of spike-events from spike-rate predictions
To reliably convert the spike-rates output by the neural networks to spike-event predictions, we propose an unsupervised greedy algorithm that is simple and straight-forward (Fig. 1B). The workflow is briefly introduced here.
Step 1: Fragments of spike-rate predictions (pd_rates) with non-zero spike-rate are identified by thresholding the spike-rate sequence output with an epsilon value. We do not use zero threshold to avoid including any fragment with overly low peak amplitude (i.e. those showing extremely small spiking probabilities or background noise), where no spike should be estimated.
Step 2: For each pd_rate of length L (in terms of number of data points), we initialize a zero-filled vector (est_spike) with L bins.
Step 3: One spike is tentatively assigned to one bin of est_spike at a time, and est_spike is convolved into a spike-rate vector (est_rate) in the same way as described above. The MSE between the resultant est_rate and pd_rate is then calculated. This step is carried out in parallel for all L bins to determine the most suitable bin (i.e. the one with the smallest MSE) for assigning the spike.
We then repeat step 3, each time assigning another spike to the most suitable bin in a greedy manner, until the MSE can no longer be reduced by adding a spike at any location in est_spike. The updated est_spike is then regarded as the final estimate of spike-events for the pd_rate fragment concerned. The timestamp of a spike is defined as the center time of the corresponding bin within est_spike. If multiple spikes are predicted in the same bin, the same timestamp is repeated accordingly.
For a spike-rate sequence output with N fragments of pd_rates, this algorithm executes in O(N×L×k) time, where k is the maximum number of spikes in any one bin. In practice, considering the typically slow dynamics of calcium signals and the relatively low firing rates of the neurons imaged, this estimation method operates in time that is linear in the duration of the recordings. We have validated this spike-event estimation algorithm against the Monte-Carlo importance sampling based algorithm proposed in CASCADE 38.
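The greedy procedure (steps 1 to 3 and the repeat rule) can be sketched in plain Python. This is a simplified, sequential illustration rather than the parallel implementation described in the text. The Gaussian kernel with a standard deviation of 1.5 bins (25ms at 60Hz), the truncation of the kernel at fragment edges, the default epsilon, and all function names are assumptions made for the example.

```python
import math


def gaussian_kernel(sigma_bins=1.5, n_sigma=4.0):
    """Unit-sum Gaussian kernel; sigma_bins=1.5 is 25ms at 60Hz (assumed)."""
    half = int(math.ceil(n_sigma * sigma_bins))
    weights = [math.exp(-0.5 * (i / sigma_bins) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def add_spike(rate, bin_index, kernel):
    """Return a copy of rate with one kernel-shaped spike centred on bin_index."""
    half = len(kernel) // 2
    out = list(rate)
    for j, w in enumerate(kernel):
        pos = bin_index + j - half
        if 0 <= pos < len(out):          # kernel is truncated at the edges
            out[pos] += w
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def find_fragments(rate, eps):
    """Step 1: [start, stop) index ranges where the rate exceeds eps."""
    fragments, start = [], None
    for i, value in enumerate(rate):
        if value > eps and start is None:
            start = i
        elif value <= eps and start is not None:
            fragments.append((start, i))
            start = None
    if start is not None:
        fragments.append((start, len(rate)))
    return fragments


def estimate_fragment(pd_rate, kernel):
    """Steps 2-3: add spikes greedily while the MSE keeps decreasing."""
    est_spike = [0] * len(pd_rate)
    est_rate = [0.0] * len(pd_rate)
    best_mse = mse(est_rate, pd_rate)
    while True:
        best_bin, best_rate = None, None
        for b in range(len(pd_rate)):
            candidate = add_spike(est_rate, b, kernel)
            candidate_mse = mse(candidate, pd_rate)
            if candidate_mse < best_mse:
                best_mse, best_bin, best_rate = candidate_mse, b, candidate
        if best_bin is None:             # no bin lowers the MSE any further
            return est_spike
        est_spike[best_bin] += 1
        est_rate = best_rate


def estimate_spike_counts(rate, kernel, eps=5e-4):
    """Per-bin spike counts estimated from a full spike-rate sequence."""
    counts = [0] * len(rate)
    for start, stop in find_fragments(rate, eps):
        counts[start:stop] = estimate_fragment(rate[start:stop], kernel)
    return counts


# Usage: build a noise-free spike-rate from known counts, then recover them.
kernel = gaussian_kernel()
truth = [0] * 100
truth[20], truth[60] = 1, 2
rate = [0.0] * 100
for b, c in enumerate(truth):
    for _ in range(c):
        rate = add_spike(rate, b, kernel)
recovered = estimate_spike_counts(rate, kernel)
```

On this noise-free input the greedy search places one spike in bin 20 and two in bin 60. On real network outputs the recovered counts depend on how well the predicted spike-rate matches the kernel shape, so the sketch should not be read as a performance claim.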
Evaluation metrics
How to reliably assess the performance of the spike inference tasks remains an open topic, where a single evaluation metric could be biased in certain aspects 35,36,38,39. In this regard, recent studies proposed to employ multiple metrics to supplement each other 29,35-39. In this work, we used four metrics to examine spike-rates prediction and two others for spike-events prediction.
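Firstly, the Pearson correlation coefficient (Corr) is used as the primary metric for comparing the similarity of spike-rates, as follows:
$$\mathrm{Corr} = \frac{\sum_t \left(GT_t - \overline{GT}\right)\left(PD_t - \overline{PD}\right)}{\sqrt{\sum_t \left(GT_t - \overline{GT}\right)^2}\,\sqrt{\sum_t \left(PD_t - \overline{PD}\right)^2}}$$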
where GT and PD stand for ground truth and prediction, respectively. Secondly, we use the van Rossum distance (vRD) 59 for the evaluation of spike rates prediction:
where the time constant τ is the normalizing factor (smoothing window size) for smoothing spike-events into spike-rates (e.g., τ = 0.025s for our proposed system). Moreover, Error and Bias proposed in 38 are also used to evaluate spike-rates:
On the other hand, for measuring spike-event prediction, we adopt the Victor-Purpura distance (VPD) 61. It is defined as the minimal cost to transform the PD spike-events to the GT spike-events. The cost for either inserting or deleting a spike equals 1, while shifting a spike by Δt costs q|Δt|. We use the default value q = 1 in this work. To make comparison across different datasets, we present the VPD as the minimal total cost divided by the total number of GT spikes.
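The Victor-Purpura distance with these costs can be computed by the standard dynamic program over the two sorted spike trains. The sketch below is one straightforward way to do so and is not taken from the authors' code; spike times are assumed to be in seconds, so q is in units of cost per second, and the function names are illustrative.

```python
def victor_purpura_distance(gt_times, pd_times, q=1.0):
    """Minimal cost of turning the PD spike train into the GT spike train.

    Inserting or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    """
    gt, pd = sorted(gt_times), sorted(pd_times)
    n, m = len(pd), len(gt)
    # cost[i][j]: minimal cost of matching the first i PD spikes to the first j GT spikes
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)            # delete i PD spikes
    for j in range(1, m + 1):
        cost[0][j] = float(j)            # insert j GT spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j] + 1.0,
                             cost[i][j - 1] + 1.0,
                             cost[i - 1][j - 1] + q * abs(pd[i - 1] - gt[j - 1]))
    return cost[n][m]


def normalized_vpd(gt_times, pd_times, q=1.0):
    """VPD divided by the number of GT spikes, as used for cross-dataset comparison."""
    if not gt_times:
        raise ValueError("normalization requires at least one GT spike")
    return victor_purpura_distance(gt_times, pd_times, q) / len(gt_times)


print(victor_purpura_distance([1.0], [1.5]))   # 0.5: shifting is cheaper
print(victor_purpura_distance([1.0], [4.0]))   # 2.0: delete and insert instead
```

The two printed cases show the role of q: a shift is used only while q|Δt| stays below 2, the cost of deleting one spike and inserting another.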
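Lastly, we compute the error rate (ER) as below 29,37,48, which is based on the F1 score of the predicted spike-events:
$$\mathrm{ER} = 1 - F_1 = 1 - \frac{2 \cdot \mathrm{sensitivity} \cdot \mathrm{precision}}{\mathrm{sensitivity} + \mathrm{precision}}$$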
The GT spike-events and PD spike-events are matched based on their VPD. Here, a spike is said to be correctly predicted if it co-exists with its real counterpart within a time window of 50ms (defined as the ER window size). This time window is one order of magnitude smaller than that used in a previous study 29, making the assessment of model performance in this study much more stringent. We also examined the effect of ER window sizes in this work.
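A simplified error-rate computation is sketched below under three assumptions that should be checked against the original implementation: ER is taken as 1 minus the F1 score, a pair of spikes counts as matched when the two times differ by at most half the ER window, and spikes are paired by a simple two-pointer sweep over the sorted trains instead of the VPD-based matching described in the text. The function names are illustrative.

```python
def count_matches(gt_times, pd_times, window_s=0.05):
    """Number of one-to-one GT/PD pairs lying within half a window of each other."""
    gt, pd = sorted(gt_times), sorted(pd_times)
    tolerance = window_s / 2.0           # assumption: window is centred on the GT spike
    i = j = hits = 0
    while i < len(gt) and j < len(pd):
        if abs(gt[i] - pd[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif gt[i] < pd[j]:
            i += 1                       # this GT spike is missed
        else:
            j += 1                       # this PD spike is a false detection
    return hits


def error_rate(gt_times, pd_times, window_s=0.05):
    """ER = 1 - F1, with F1 built from sensitivity and precision (assumed form)."""
    if not gt_times and not pd_times:
        return 0.0
    hits = count_matches(gt_times, pd_times, window_s)
    if hits == 0:
        return 1.0
    sensitivity = hits / len(gt_times)
    precision = hits / len(pd_times)
    return 1.0 - 2.0 * sensitivity * precision / (sensitivity + precision)


er = error_rate([1.0, 2.0, 3.0], [1.01, 2.5, 3.0])
print(round(er, 3))   # 0.333: two of three spikes matched on each side
```

Shrinking `window_s` can only remove matches, so ER computed this way never decreases when the ER window gets smaller.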
Implementation of algorithms
As described above, this study involves configurations from three types of calcium inputs, three neural network models, and three loss functions, resulting in a total of 27 configurations of models. The best performing system as evaluated by the six metrics described in the Method section was then benchmarked against the state-of-the-art algorithms, including the data-driven method, CASCADE 38 and the model-based method, MLspike 29.
Our simulations followed the leave-one-dataset-out protocol. In brief, the model is first trained on 19 datasets and tested on the remaining one. This is repeated 20 times such that all the 20 datasets are tested respectively. We then recorded the dataset-wise performance and the neuron-wise performance in each testing dataset.
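The leave-one-dataset-out protocol amounts to the following loop over dataset identifiers. The generator name is illustrative, and the training and evaluation calls are omitted.

```python
def leave_one_dataset_out(dataset_ids):
    """Yield (training_ids, test_id) pairs, holding out one dataset at a time."""
    ids = list(dataset_ids)
    for test_id in ids:
        yield [d for d in ids if d != test_id], test_id


folds = list(leave_one_dataset_out(range(1, 21)))
print(len(folds), len(folds[0][0]))   # 20 19
```

Each of the 20 folds trains on 19 datasets, so no neuron from the test dataset is ever seen during training.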
For the CASCADE algorithm, we follow the protocol described in 38. For each dataset, noise matching inputs are constructed (artificial noise is added to the 19 training datasets to match the noise-level of the remaining test dataset), and five identical models are trained separately for 10 epochs. The averaged outputs of these five models are regarded as the final spike-rate predictions. Spike-event predictions are estimated using the Monte-Carlo importance sampling based algorithm in CASCADE 38.
For the MLspike algorithm 29, raw calcium inputs are used. The drift (τ = 0.01) and non-linearity parameters (saturation γ = 0.1 for synthetic dyes, polynomial coefficients [p2, p3] = [1.0,0.0] for the remaining indicators) are set manually with prior knowledge. The values of the model parameters A (action potential (AP) amplitude), τ (calcium decay time constant), and σ (noise amplitude) are obtained using its built-in auto-calibration algorithm, since manual calibration is impossible in actual usage without ground truth paired data. For a fair comparison, in addition to the direct spike-event outputs from MLspike, we also convolve them into spike-rate outputs in the same way as described in the Method section.
In vivo experiments
1) Animal preparation and surgery
Ai148 mice (Jackson Lab strain #030328) were crossed with CamKII-cre mice to express GCaMP6s in excitatory neurons. Animal surgery was described previously 12. In brief, postnatal day (P) 25 mice were anaesthetized with 3% isoflurane and fixed in a stereotaxic frame. The scalp was sterilized and removed for the cranial window surgery. The skull above the left binocular visual cortex (Fig. 6A3) was replaced by a 3mm/5mm stacked circular glass coverslip to ensure transparency for imaging. A tailor-made head-plate was fixed on the skull with Metabond adhesive cement.
2) Two-photon calcium imaging
Imaging was performed with a two-photon system in awake, head-fixed mice. The mice were allowed to recover for at least 3 days after the craniotomy, followed by habituation to head-fixation. A Prairie Ultima two-photon system (Prairie Technologies) with a Spectra Physics Mai-Tai Deep See laser was used for imaging. A 20x Olympus objective lens was used for functional imaging. The calcium signals of neurons from layer 2/3 were visualized at 920nm with an acquisition frame rate of around 7.6Hz (averaging 4 consecutive frames acquired at around 30Hz).
3) Visual stimulation protocols
The visual stimuli were delivered using Psychtoolbox-3. A computer was connected to a 10-inch 1080p LCD monitor for display. Drifting gratings were used to stimulate the visual cortex (Fig. 6A2-3). Each trial lasted 10 seconds and was repeated 20 times per stimulus, resulting in an imaging session of 200 seconds. At the beginning of each trial, a grey screen was presented for 6 seconds as a blank, followed by 4 seconds of grating stimuli (ON period). Eight directions were presented from 0 to 315 degrees relative to the horizontal; each direction was displayed for 0.5 second and was followed by the next direction at a 45-degree increment (Fig. 6A3).
4) Data processing and analysis
The recorded image stacks were processed in Fiji (ImageJ version 1.53c) before data extraction. The slices from contralateral and ipsilateral recordings were combined and stacked by maximum intensity Z-projection to concatenate into a single movie. Motion artifacts were minimized with the plugin ‘Template Matching’ by recognizing and aligning blood vessels over a large region within slices. The Z-projected slice was kept for alignment only. The aligned movies were imported to Suite2P 62 (version 0.10.1, https://github.com/MouseLand/suite2p) for neuron segmentation. The following parameters were changed from the default: tau of 0.75, denoise of 1, diameter of 9, anatomical only of 1, maximum iterations of 1, frames per second of 5 and 191 minimum neuropil pixels. The regions of interest (ROIs) and the corresponding cells’ activities were detected and saved in a .mat file for further processing in MATLAB.
The ROIs of cells were identified with the Suite2P-generated file, iscell, and the non-cell components were removed. The z-score and the relative change in fluorescence (ΔF/F0, where F0 is the fluorescence baseline of that ROI) were calculated. A two-step approach was used to further select the visually responsive neurons, which showed significant activity in response to specific direction(s). First, the ΔF/F0 of each direction was averaged within each trial and compared with the corresponding ΔF/F0 of the 4th – 6th second of the blank with a two-sided t-test at the 5% significance level. Afterwards, the neurons with a significant difference in any direction were searched for exactly 3 consecutive frames of the ON period with z-score > 3. A neuron that passed both criteria was regarded as visually responsive.
5) Calculation of tuning curves and selectivity indexes
For each trial in the recording of responsive neurons, mean responses within each 0.5sec stimulus window are taken (e.g. mean ΔF/F0 for calcium traces and mean firing rate for spike-event predictions), producing 8 different mean response values. We also compute the background response using the averaged activity within the 6sec resting window. Then, the background response of each trial is subtracted from the 8 mean response values of that trial. The final tuning curves are obtained by averaging these background-subtracted responses across the 20 trials.
We adopted the OSI (orientation selectivity index) and DSI (direction selectivity index) as defined in a previous study 46 to quantify the tuning curves and neuronal selectivity further,
where R(θk) is the response to stimulus orientation at angle θk, and k = 8 is the total number of stimulus angles. Before computing OSI/DSI, if any value in a tuning curve is below zero, we up-shift the whole tuning curve to keep it non-negative for calculation of OSI and DSI. Extended Data Fig. 7 shows several representative examples of how different patterns in tuning curves could affect OSI/DSI quantitatively.
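For illustration, one widely used vector-sum form of these indexes is sketched below. It is an assumption that this matches the definition adopted from the previous study 46: OSI is taken as the magnitude of Σ R(θk)·exp(2iθk) divided by Σ R(θk), and DSI as the same expression with exp(iθk). The stimulus angles are assumed to be evenly spaced from 0 to 315 degrees, and the function names are hypothetical.

```python
import cmath
import math


def selectivity_index(tuning_curve, harmonic):
    """Normalized length of the response vector sum.

    harmonic=2 gives an orientation index (opposite directions add up),
    harmonic=1 gives a direction index (opposite directions cancel).
    """
    responses = list(tuning_curve)
    lowest = min(responses)
    if lowest < 0:
        responses = [r - lowest for r in responses]   # up-shift to non-negative
    total = sum(responses)
    if total == 0:
        return 0.0
    n = len(responses)
    vector = sum(r * cmath.exp(1j * harmonic * 2 * math.pi * k / n)
                 for k, r in enumerate(responses))
    return abs(vector) / total


def osi(tuning_curve):
    return selectivity_index(tuning_curve, harmonic=2)


def dsi(tuning_curve):
    return selectivity_index(tuning_curve, harmonic=1)


one_direction = [1, 0, 0, 0, 0, 0, 0, 0]   # responds to 0 degrees only
two_directions = [1, 0, 0, 0, 1, 0, 0, 0]  # responds equally to 0 and 180 degrees
print(round(osi(one_direction), 3), round(dsi(one_direction), 3))
print(round(osi(two_directions), 3), round(dsi(two_directions), 3))
```

Under this form, a neuron that responds equally to two opposite directions is fully orientation selective but not direction selective, and a flat tuning curve gives values near zero for both indexes.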
Author Contributions
C.T. conceived the study. Z.Z. designed the algorithms and performed analyses. J.I. and K.T. collected the in vivo calcium imaging data. J.I., K.T. and H. M.Y. pre-processed the data for spike inference and further analysis. M.S. provided expertise and inputs on dissecting V1 physiology, and visual stimulation design. All the authors contributed to interpreting the results and writing the manuscript.
Competing Interests
The authors declare no competing interests.
Acknowledgement
This work was supported by Research Grants Council of Hong Kong SAR Project CityU 11104220 (C.T.), ECS CUHK 24117220 (J.I.), and City University of Hong Kong (Project 7005645), Lo Kwee-Seong Biomedical Research Fund (J.I.) and Faculty Innovation Awards (FIA2020/A/04) from the Faculty of Medicine, CUHK (J.I.). Data and source codes to replicate the primary results will be shared upon reasonable request.