Abstract
Temporally ordered multi-neuron patterns likely encode information in the brain. We introduce an unsupervised method, SPOTDisClust (Spike Pattern Optimal Transport Dissimilarity Clustering), for their detection from high-dimensional neural ensembles. SPOTDisClust measures similarity between two ensemble spike patterns by determining the minimum transport cost of transforming their corresponding normalized cross-correlation matrices into each other (SPOTDis). Then, it performs density-based clustering based on the resulting inter-pattern dissimilarity matrix. SPOTDisClust does not require binning and can detect complex patterns (beyond sequential activation) even when high levels of out-of-pattern “noise” spiking are present. Our method efficiently handles the additional information from increasingly large neuronal ensembles and can detect a number of patterns that far exceeds the number of recorded neurons. In an application to neural ensemble data from macaque monkey V1 cortex, SPOTDisClust can identify different moving stimulus directions on the sole basis of temporal spiking patterns.
1 Introduction
Precisely timed spike patterns spanning multiple neurons are a ubiquitous feature of both spontaneous and stimulus-evoked brain network activity. Remarkably, not all patterns are generated with equal probability. Synaptic connectivity, shaped by development and experience, favors certain spike sequences over others, limiting the portion of the network’s “state space” that is effectively visited (Luczak, McNaughton, and Harris, 2015; Ikegaya et al., 2004). The structure of this permissible state space is of the greatest interest for our understanding of neural network function. Multi-neuron temporal sequences encode information about stimulus variables (Vinck et al., 2010; Siegel, Warden, and Miller, 2009; Havenith et al., 2011; Konig et al., 1995; Lu, Liang, and Wang, 2001; Gerstner et al., 1996), in some cases “unrolling” non-temporally organized stimuli, such as odors, into temporal sequences (Wehr and Laurent, 1996). Recurrent neuronal networks can generate precise temporal sequences (Memmesheimer et al., 2014; Abbott and Blum, 1996; Huh and Sejnowski, 2017; Fiete et al., 2010; Laje and Buonomano, 2013), which are required for example for the generation of complex vocalization patterns like bird songs (Hahnloser, Kozhevnikov, and Fee, 2002). Temporal spiking patterns may also encode sequences of occurrences or actions, as they take place, or are planned, projected, or “replayed” for memory consolidation in the hippocampus and other structures (Carr, Jadhav, and Frank, 2011; Foster and Wilson, 2006; Dragoi and Tonegawa, 2011; Johnson and Redish, 2007; Nadasdy et al., 1999; Pfeiffer and Foster, 2013; Skaggs and McNaughton, 1996; Euston, Tatsuno, and McNaughton, 2007; Peyrache et al., 2009; Pastalkova et al., 2008).
Timing information between spikes of different neurons is critical for memory function, as it regulates spike-timing-dependent plasticity (STDP) of synapses, with firing of a post-synaptic neuron following the firing of a pre-synaptic neuron typically inducing synaptic potentiation, and firing in the reverse order typically inducing depotentiation (Dan and Poo, 2004; Markram et al., 1997; Abbott and Nelson, 2000). Thus, the consolidation of memories may rely on recurring temporal patterns of neural activity, which stabilize and modify the synaptic connections among neurons (Buzsaki, 1989; Carr, Jadhav, and Frank, 2011; Foster and Wilson, 2006; Dragoi and Tonegawa, 2011; Johnson and Redish, 2007; Nadasdy et al., 1999; Pfeiffer and Foster, 2013; Skaggs and McNaughton, 1996; Benchenane et al., 2010; Sejnowski and Paulsen, 2006; Suri and Sejnowski, 2002; Lee and Wilson, 2002; Drew and Abbott, 2006; van Rossum, Bi, and Turrigiano, 2000). Storing memories as sequences has the advantage that a very large number of patterns is possible, because the number of possible spike orderings grows exponentially, and different sequences can efficiently be associated with different memory items, as proposed, for instance, by reservoir computing theory (Maass, Natschlager, and Markram, 2002; Lazar, Pipa, and Triesch, 2009; Singer and Lazar, 2016; Buonomano and Maass, 2009; Lukoševičius and Jaeger, 2009).
Detecting these temporal patterns represents a major methodological challenge. With recent advances in neuro-technology, it is now possible to record from thousands of neurons simultaneously (Jun et al., 2017), and this number is expected to show an exponential growth in the coming years (Stevenson and Kording, 2011). The high dimensionality of population activity, combined with the sparsity and stochasticity of neuronal output, as well as the limited amount of time one can record from a given neuron, makes the detection of recurring temporal sequences an extremely difficult computational problem. Many approaches to this problem are supervised, that is, they take patterns occurring concurrently with a known event, such as the delivery of a stimulus for sensory neurons or the traversal of a running track for hippocampal place fields, as a “template” and then search for repetitions of the same template in spiking activity (Lee and Wilson, 2004; Nadasdy et al., 1999; Davidson, Kloosterman, and Wilson, 2009). Other approaches construct a template by measuring latencies of each neuron’s spiking from a known event, such as the beginning of a cortical UP state (Havenith et al., 2011; Luczak, Bartho, and Harris, 2009). While this enables rigorous, relatively easy statistical treatment, it risks neglecting much of the structure in the spiking data, which may contain representations of other items (e.g. remote memories, presentations of different stimuli, etc.). A more complete picture of network activity may be provided by unsupervised methods, detecting regularities, for example in the form of spiking patterns recurring more often than predicted by chance. Unsupervised methods proposed so far typically use linear approaches, such as Principal Component Analysis (PCA) (Peyrache et al., 2009; Lopes-dos-Santos, Ribeiro, and Tort, 2013; Stopfer, Jayaraman, and Laurent, 2003), and cannot account for different patterns arising from permutations of spike orderings.
While approaches like frequent itemset mining and related methods (Grun, Diesmann, and Aertsen, 2002; Picado-Muino et al., 2013; Pipa et al., 2008; Torre et al., 2013) can find more patterns than the number of neurons and provide a rigorous statistical framework, they require that exact matches of the same pattern occur, which becomes less and less probable as the number of neurons grows or as the time bins become smaller (problem of combinatorial explosion). To address this problem, Effenberger and Hillar (2015) and Hillar and Effenberger (2015) proposed another promising unsupervised method based on spin glass Ising models, which allows for approximate pattern matching and is not linearly limited in the number of patterns. This method, however, requires binning, and it classifies the binary network state vector in a small temporal neighbourhood without dissociating rate patterns from temporal patterns.
In this paper we introduce a novel spike pattern detection method called SPOTDisClust (Figure 1). We start from the idea that the similarities of two neural patterns can be defined by the trace that they may leave on the synaptic matrix, which in turn is determined by the pairwise cross-correlations between neural activities (Dan and Poo, 2004; Markram et al., 1997). The algorithm is based on constructing an epoch-to-epoch dissimilarity matrix, in which dissimilarity is defined as SPOTDis, making use of techniques from the mathematical theory of optimal transport to define, and efficiently compute, a dissimilarity between two spiking patterns (Monge, 1781; Kantorovich, 1942; Hitchcock, 1941; Rubner, Tomasi, and Guibas, 1998). We then perform unsupervised clustering on the pairwise SPOTDis matrix. SPOTDis measures the similarity of two spike patterns (in two different epochs) by determining the minimum transport cost of transforming their corresponding cross-correlation matrices into each other. This amounts to computing the Earth Mover’s Distance (EMD) for all pairs of neurons and all pairs of epochs (see Methods). Through ground-truth simulations, we show that SPOTDisClust has many desirable properties: It can detect many more patterns than the number of neurons (Figure 2); it can detect complex patterns that would be invisible with latency-based methods (Figures 3-4); it is highly robust to noise, i.e. to the ‘insertion’ of noisy spikes, spike timing jitter, or fluctuations in the firing rate, and its performance grows with the inclusion of more neurons given a constant signal-to-noise ratio (Figure 5); it can detect sequences in the presence of sparse firing (Figure 6); and finally it is insensitive to a global or patterned scaling of the firing rates (Figure 7). We apply SPOTDisClust to V1 Utah array data from the awake macaque monkey, and identify different visual stimulus directions using unsupervised clustering with SPOTDisClust (Figure 8).
2 Results
2.1 Outline of the algorithm
Suppose we perform spiking measurements from an ensemble of N neurons, and we observe the spiking output of this ensemble in M separate epochs of length T samples (in units of the time bin length). Suppose that there are P distinct activity patterns that tend to reoccur in some of the M epochs. Each pattern generates a set of normalized (to unit mass) cross-correlation histograms among all neurons. Instantiations of the same pattern are different because of noise, but will have the same expectation for the cross-correlation histogram. The normalized cross-correlation histogram is defined as $s'_{ij}(\tau) \equiv s_{ij}(\tau) / \sum_{\tau'=-T}^{T} s_{ij}(\tau')$ if $\sum_{\tau'} s_{ij}(\tau') > 0$, and $s'_{ij}(\tau) \equiv 0$ otherwise. Here, $s_{ij}(\tau) \equiv \sum_{t} s_i(t)\, s_j(t+\tau)$ is the cross-correlation function (or cross-covariance), and $s_i(t)$ and $s_j(t)$ are the spike trains of neurons i and j. In other words, the normalized cross-correlation histogram is simply the histogram of coincidence counts at different delays τ, normalized to unit mass. We take the N × N × (2T + 1) matrix of $s'_{ij}(\tau)$ values as a full representation of a pattern, that is, we consider two patterns to have the same temporal structure when all neuron pairs have the same expected value of $s'_{ij}(\tau)$ for each τ. For simplicity and clarity of presentation, we have written the cross-correlation function as a discrete (histogram) function of time. However, because the SPOTDis, which is introduced below, is a cross-bins dissimilarity measure and only requires storing the precise delays τ at which $s_{ij}(\tau)$ is non-zero, the sampling rate can be made infinitely large (see Methods). In other words, the SPOTDis computation does not entail any loss of timing precision beyond the sampling rate at which the spikes are recorded.
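As an illustration of the definition above, the following minimal sketch builds the normalized cross-correlation histogram of one neuron pair in one epoch directly from spike times, storing only the delays that actually occur. The function name, the dictionary representation, and the delay convention (spike time of neuron j minus spike time of neuron i) are assumptions made for this example and are not taken from the Methods.

```python
from collections import Counter

def normalized_cross_correlation(spikes_i, spikes_j):
    """Return {delay: mass} for one neuron pair in one epoch.

    spikes_i, spikes_j: lists of integer spike times (sample indices).
    Assumed delay convention: tau = t_j - t_i.
    """
    # Count coincidences at every pairwise delay; only non-zero delays are stored.
    counts = Counter(t_j - t_i for t_i in spikes_i for t_j in spikes_j)
    total = sum(counts.values())
    if total == 0:
        # At least one neuron was silent: the histogram carries no information.
        return {}
    # Normalize to unit mass.
    return {tau: c / total for tau, c in sorted(counts.items())}

hist = normalized_cross_correlation([10, 50], [12, 53, 90])
```

Because the histogram is indexed by the observed delays rather than by fixed bins, its size grows with the number of spike pairs and not with the temporal resolution.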
The SPOTDisClust method contains two steps (Figure 1), which are illustrated for five example patterns (Figure 1A-B). The first step is to construct the SPOTDis dissimilarity measure between all pairs of epochs on the matrix of cross-correlations among all neuron pairs. The second step is to perform clustering on the SPOTDis dissimilarity measure using an unsupervised clustering algorithm that operates on a dissimilarity matrix. Many algorithms are available for unsupervised clustering on pairwise dissimilarity matrices. One family of unsupervised clustering methods comprises so-called density clustering algorithms, including DBSCAN, HDBSCAN, and density peak clustering. Here, we use the HDBSCAN unsupervised clustering method (Ester et al., 1996; Campello, Moulavi, and Sander, 2013; Campello et al., 2015; McInnes, Healy, and Astels, 2017) (see Methods). To examine the separability of the clusters in a low dimensional 2-D embedding, we employ the t-SNE projection method (Maaten and Hinton, 2008; Hinton and Roweis, 2003) (see Methods).
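To make the second step concrete, the sketch below shows how a density clustering algorithm can operate on nothing but a pairwise dissimilarity matrix. It implements plain DBSCAN, the simplest member of the family named above, as a stand-in; it is not the HDBSCAN procedure used in this paper, and the parameters `eps` and `min_pts` as well as the toy data are hypothetical.

```python
def dbscan_precomputed(D, eps, min_pts):
    """Cluster M items from an M x M dissimilarity matrix D; label -1 marks noise."""
    M = len(D)
    # Neighbourhood of each item (includes the item itself, since D[a][a] == 0).
    neighbours = [[b for b in range(M) if D[a][b] <= eps] for a in range(M)]
    labels = [None] * M
    cluster = -1
    for a in range(M):
        if labels[a] is not None:
            continue
        if len(neighbours[a]) < min_pts:
            labels[a] = -1              # provisional noise; may become a border point
            continue
        cluster += 1                    # a is an unvisited core point: start a cluster
        labels[a] = cluster
        frontier = list(neighbours[a])
        while frontier:
            b = frontier.pop()
            if labels[b] == -1:
                labels[b] = cluster     # noise reached from a core point becomes border
            if labels[b] is not None:
                continue
            labels[b] = cluster
            if len(neighbours[b]) >= min_pts:
                frontier.extend(neighbours[b])  # expand only through core points
    return labels

# Toy dissimilarity matrix: two tight groups and one isolated item.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
D = [[abs(a - b) for b in points] for a in points]
labels = dbscan_precomputed(D, eps=0.15, min_pts=2)
```

In the setting of this paper, `D` would be the M × M matrix of SPOTDis values between epochs, and epochs labelled -1 would be those not assigned to any cluster.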
The SPOTDis measure is constructed as follows:
1. We compute, for each of the M epochs separately, the cross-correlation function for all pairs of N neurons (see Methods), which yields M matrices of N(N - 1)/2 cross-correlations (Figure 1C).
2. For each pair of epochs k and m and each pair of neurons i and j, we now want to quantify how similar the temporal correlation of neuron i and j was between epochs k and m, i.e. the similarity of $s'_{ij,k}(\tau)$ and $s'_{ij,m}(\tau)$, the normalized cross-correlation functions of the pair in epochs k and m. To this end, we compute the Earth Mover’s Distance (EMD) (Figure 1C-D) between the normalized cross-correlations of each neuron pair, which yields M(M - 1)/2 × N(N - 1)/2 EMD values $D_{ij,km}$ (see Methods). We use the L1 norm to measure dissimilarity on the time axis, and we define the cost of transporting the mass in the cross-correlation function between time τ1 and τ2 as |τ1 - τ2|/(2T + 1), such that the minimum and maximum EMDs are 0 and 1, respectively. The EMD is a metric distance function on probability distributions that determines the minimum transport cost to transform one unit distribution of mass into another unit distribution of mass (Figure 1D). In this case, the mass is the normalized (to a mass of 1) cross-correlation function for a neuron pair.
The advantage of using the EMD is multi-fold. First, it is a symmetric and metric measure of similarity between two probability distributions (as opposed to e.g. the Kullback-Leibler divergence). Second, as it is a “cross-bins” distance, it can handle jitter in spike timing. In other words, it quantifies not only whether two distributions are overlapping, like the Kullback-Leibler divergence, but also how far they are shifted away, as minimum transport cost, from each other in a metric space (in this case: time). Third, it does not rely on the computation of a measure of central tendency like the center of mass or peak of the probability distribution, but can also compute transport cost between multimodal probability distributions (Figure 1D). It can therefore capture differences between complex patterns (see Figures 3 and 4). Fourth, because our computation of the EMD uses only the exact pairwise delays among pairs of spikes as its input (the computation thus scales with the number of spikes, not bins; see Methods), our implementation does not require any binning or smoothing of the spike trains beyond the sampling rate at which the spikes are recorded, preventing any additional loss in timing precision; this means that the bins can be made infinitely small (see Methods).
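3. After computing the EMDs between each pair of epochs for each neuron pair separately, we compute SPOTDis as the average EMD across neuron pairs (see Methods) (Figure 1C),

$$\mathrm{SPOTDis}_{km} \equiv \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} w_{ij,km}\, D_{ij,km}}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} w_{ij,km}} . \tag{2}$$

Here, the weights are defined as

$$w_{ij,km} \equiv \operatorname{sgn}\Big(\sum_{\tau} s_{ij,k}(\tau)\Big)\, \operatorname{sgn}\Big(\sum_{\tau} s_{ij,m}(\tau)\Big),$$

where $s_{ij,k}(\tau)$ is the cross-correlation function of neurons i and j in epoch k, and sgn(x) is the sign function, with sgn(x) = 0 for x = 0 and sgn(x) = 1 for x > 0. Thus, only for neuron pairs for which both neuron i and neuron j fired in both epochs k and m will the weight $w_{ij,km} = 1$. The rationale behind ignoring the other neuron pairs for computing the SPOTDis is that it avoids assigning an arbitrary value to the EMD in the case where we have no information about the temporal relationship between the neurons (i.e. where we do not have any spikes for one neuron in one epoch). We assume for now that for all (k,m), $\sum_{i<j} w_{ij,km} > 0$, i.e. that for each pair of epochs k and m, there is at least one pair of neurons in which both neurons fired in both epochs k and m. If all the weights equal 1, then we can simplify to

$$\mathrm{SPOTDis}_{km} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} D_{ij,km} .$$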
From Eq. (2) we obtain M(M - 1)/2 SPOTDis values (Figure 1E). These values are then the input to the HDBSCAN clustering algorithm and the t-SNE visualization (Figure 1E) (see Methods).
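The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation described in the Methods: the function names are hypothetical, spike times are integer sample indices, the 1-D EMD is computed from the accumulated difference of the two cumulative distributions (a standard result for the L1 ground distance), and `spotdis` returns `None` when no neuron pair has weight 1, a case the text above excludes by assumption.

```python
from collections import Counter
from itertools import combinations

def norm_xcorr(spikes_a, spikes_b):
    """Normalized cross-correlation as {delay: mass}; empty if either neuron is silent."""
    counts = Counter(t_b - t_a for t_a in spikes_a for t_b in spikes_b)
    total = sum(counts.values())
    return {tau: c / total for tau, c in counts.items()} if total else {}

def emd_1d(p, q, T):
    """EMD between two unit-mass histograms, with cost |tau1 - tau2| / (2T + 1)."""
    support = sorted(set(p) | set(q))
    cost, cdf_gap = 0.0, 0.0
    for left, right in zip(support, support[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)   # CDF difference up to `left`
        cost += abs(cdf_gap) * (right - left)            # mass carried across the gap
    return cost / (2 * T + 1)

def spotdis(epoch_k, epoch_m, T):
    """Weighted mean EMD over neuron pairs; epoch_* are lists of spike-time lists."""
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(epoch_k)), 2):
        p = norm_xcorr(epoch_k[i], epoch_k[j])
        q = norm_xcorr(epoch_m[i], epoch_m[j])
        if p and q:                  # weight 1 only if both neurons fired in both epochs
            total += emd_1d(p, q, T)
            n_pairs += 1
    return total / n_pairs if n_pairs else None

# Three neurons firing in the order 0, 1, 2; the same order shifted in time; reversed order.
epoch_a = [[10], [20], [30]]
epoch_b = [[110], [120], [130]]
epoch_c = [[30], [20], [10]]
same_pattern = spotdis(epoch_a, epoch_b, T=300)
reversed_pattern = spotdis(epoch_a, epoch_c, T=300)
```

In this toy example the time-shifted copy has zero dissimilarity to the original, because only relative delays between neurons enter the computation, whereas the reversed sequence has a strictly positive dissimilarity.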
2.2 Ground truth simulations
To test the SPOTDisClust method for cases in which the ground truth is known, we generated P input patterns in epochs of length $T = T_{\mathrm{epoch}} = 300$ samples, defined by the instantaneous rate of inhomogeneous Poisson processes, and then generated spiking outputs according to these (Figure 1A-B) (see Methods). Because the SPOTDis is a binless measure, in the sense that it does not require any binning beyond the sampling frequency, the epochs could for example represent spike series of 3s with a sampling rate of 100Hz, or spike series of 300ms with a sampling rate of 1000Hz. Each input pattern was constructed such that it had a baseline firing rate and a pulse activation firing rate, defined as the expected number of spikes per sample. The pulse activation period (with duration $T_{\mathrm{pulse}}$ samples) is the period in the epoch in which the neuron is more active than during the baseline, and the positions of the pulses across neurons define the pattern. For each neuron and pattern, the position of the pulse activation period was randomly chosen. We generated M/(2 * P) realizations for each of the P patterns, and a matching number of M/2 noise epochs (i.e. 50 percent of epochs were noise epochs). We performed simulations for two types of noise epochs (Figure S1). First, noise was generated with random firing according to a homogeneous Poisson process with a constant rate (see Figure 1). We refer to this noise, throughout the text, as “homogeneous noise”. For the second type of noise, each noise epoch comprised a single instantiation of a unique pattern, with randomly chosen positions of the pulse activation periods. We refer to this noise as “patterned noise”. For both types of noise patterns, the expected number of spikes in the noise epoch was the same as during an epoch in which one of the P patterns was realized. The second type of noise also had the same inter-spike interval statistics for each neuron as the patterns.
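The simulation design described above can be sketched as follows. This is an illustrative approximation rather than the code used for the figures: spikes are drawn as one Bernoulli trial per sample, which approximates a Poisson process only when the per-sample rates are small, and all parameter values other than the epoch length of 300 samples are hypothetical. Only the homogeneous noise type is shown.

```python
import random

def make_pattern(n_neurons, t_epoch, t_pulse, rng):
    """Draw a random pulse onset for every neuron; the onsets define the pattern."""
    return [rng.randrange(0, t_epoch - t_pulse + 1) for _ in range(n_neurons)]

def realize(pattern, t_epoch, t_pulse, rate_base, rate_pulse, rng):
    """One noisy epoch: a list of spike-time lists, one per neuron."""
    epoch = []
    for onset in pattern:
        spikes = [t for t in range(t_epoch)
                  if rng.random() < (rate_pulse if onset <= t < onset + t_pulse
                                     else rate_base)]
        epoch.append(spikes)
    return epoch

rng = random.Random(0)
N, P, M, T_EPOCH, T_PULSE = 20, 5, 100, 300, 30   # hypothetical sizes
RATE_BASE, RATE_PULSE = 0.01, 0.3                 # expected spikes per sample (assumed)

# Constant noise rate that matches the expected spike count of a pattern epoch.
rate_noise = (T_PULSE * RATE_PULSE + (T_EPOCH - T_PULSE) * RATE_BASE) / T_EPOCH

patterns = [make_pattern(N, T_EPOCH, T_PULSE, rng) for _ in range(P)]
epochs, labels = [], []
for p, pattern in enumerate(patterns):
    for _ in range(M // (2 * P)):                 # M/(2P) realizations per pattern
        epochs.append(realize(pattern, T_EPOCH, T_PULSE, RATE_BASE, RATE_PULSE, rng))
        labels.append(p)
for _ in range(M // 2):                           # M/2 homogeneous noise epochs
    # Equal pulse and baseline rates make the onset irrelevant.
    epochs.append(realize([0] * N, T_EPOCH, T_PULSE, rate_noise, rate_noise, rng))
    labels.append(-1)
```

Patterned noise could be produced with the same two functions by drawing a fresh `make_pattern` for every noise epoch and realizing it once.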
Importantly, because SPOTDisClust uses only the relative timing of spiking among neurons, rather than the timing of spiking relative to the epoch onset, the exact onset of the epoch does not have to be known with SPOTDis; even though the exact onset of the pattern is known in the simulations presented here, this knowledge was not used in any way for the clustering.
Figure 1 illustrates the different steps of the algorithm for an example of P = 5 patterns. For the purpose of illustration, we start with an example comprising five patterns that are relatively easy to spot by eye; later in the manuscript we show examples with a very low signal-to-noise ratio (Figure 5) or sparse firing (Figure 6). We find that in the 2-D t-SNE embedding, the P = 5 different patterns form separate clusters (Figure 1E), and that the HDBSCAN algorithm is able to correctly identify the separate clusters (Figure S2). In Figure S1 we compare clustering with homogeneous and patterned noise. The homogeneous noise patterns have a consistently small SPOTDis dissimilarity to each other and are detected as a separate cluster, while the patterned noise epochs have large SPOTDis dissimilarities to each other and do not form a separate cluster, but spread out rather uniformly through the low-dimensional t-SNE embedding (Figure S1).
2.3 Detectable patterns outnumber recorded neurons
A key challenge for any pattern detection algorithm is to find a larger number of patterns than the number of measurement variables, assuming that each pattern is observed several times. This is impossible to achieve with traditional linear methods like PCA (Principal Component Analysis), which do not yield more components than the number of neurons (or channels). Other approaches like frequent itemset mining and related methods (Grun, Diesmann, and Aertsen, 2002; Picado-Muino et al., 2013; Pipa et al., 2008) require that exact matches of the same pattern occur.
Because SPOTDisClust clusters patterns based on small SPOTDis dissimilarities, it does not require exact matches of the same pattern to occur, but only that the different instantiations of the same pattern are similar enough to one another, i.e. have SPOTDis values that are small enough, and separate them from other clusters and the noise.
Figure 2 shows an example where the number of patterns exceeds the number of neurons by a factor of 10 (500 to 50). In the 2-D t-SNE embedding, the 500 patterns form separate clusters, with the emergence of a noise cluster that has higher variance. Consistent with the low dimensional t-SNE embedding, the HDBSCAN algorithm is able to correctly identify the separate clusters (Figure S2).
When many patterns are detectable, the geometry of the low dimensional t-SNE embedding needs to be interpreted carefully. In this case, all 500 patterns are roughly equidistant to each other. However, there does not exist a 2-D projection in which all 500 clusters are equidistant to each other; this would only occur with a triangle for P = 3 patterns. Thus, although the low dimensional t-SNE embedding demonstrates that the clusters are well separated from each other, in the 2-D embedding nearby clusters do not necessarily have smaller SPOTDis dissimilarities than distant clusters when P is large.
2.4 SPOTDisClust can detect complex patterns
Temporal patterns in neuronal data may consist not only of ordered sequences of activation, but can also have a more complex character. As explained above, a key advantage of the SPOTDis measure is that it computes averages over the EMD, which can distinguish complex patterns beyond patterns that differ only by a measure of central tendency. Indeed, we will demonstrate that SPOTDisClust can detect a wide variety of patterns, for which traditional methods that are based on the relative activation order (sequence) of neurons may not be well equipped.
We first consider a case where the patterns consist of bimodal activations within the epoch (Figure 3A). This type of activation pattern might for example be expected when rodents navigate through a maze, such that entorhinal grid cells or CA1 cells with multiple place fields are activated at multiple locations and time points (O’Keefe and Burgess, 1996; Maurer et al., 2006; Hafting et al., 2005). A special case of a bimodal activation is one where neurons have a high baseline firing rate and are “deactivated” in a certain segment of the epoch (Figure 3B). These kinds of deactivations may be important, because e.g. spatial information about an animal’s position in the medial temporal lobe (Bos et al., 2017) or visual information in retinal ganglion cells is carried not only by neuronal activations, but also by neuronal deactivations. We find that the different patterns form well separated clusters in the low dimensional t-SNE embedding based on SPOTDis (Figure 3A-B), and that HDBSCAN correctly identifies them (Figure S2).
Next, we consider a case where there are two coarse patterns and two fine patterns embedded within each coarse pattern, resulting in a total number of four patterns. This example might be relevant for sequences that result from cross-frequency theta-gamma coupling, or from the sequential activation of place fields that is accompanied by theta phase sequences on a faster time scale (O’Keefe and Recce, 1993; Dragoi and Buzsaki, 2006). These kinds of patterns would be challenging for methods that rely on binning, because distinguishing the coarse and fine patterns requires coarse and fine binning, respectively. We find that the SPOTDis allows for a correct separation of the data in four clusters corresponding to the four patterns and one noise cluster (Figure 4A), and that HDBSCAN identifies them (Figure S2). As expected, we find that the two patterns that share the same coarse structure (but contain a different fine structure) have smaller dissimilarities to each other in the t-SNE embedding as compared to the patterns that share a different coarse structure.
Finally, we consider a set of patterns consisting of a synchronous (i.e. without delays) firing of a subset of cells, with a cross-correlation function that is symmetric around the delay τ = 0 (i.e., correlation without delays). This type of activity may arise for example in a network in which all the coupling coefficients between neurons are symmetric.
Previous methods to identify the co-activation (without consideration of time delays) of different neuronal assemblies relied on PCA (Peyrache et al., 2009), which has the key limitation that it can identify only a small number of patterns (smaller than the number of neurons). Furthermore, while yielding orthonormal, uncorrelated components that explain the most variance in the data, PCA components do not necessarily correspond to neuronal spike patterns that form distinct and separable clusters; e.g. a multivariate Gaussian distribution can yield multiple PCA components that correspond to orthogonal axes explaining most of the data variance.
Figure 4B shows four patterns, in which a subset of cells exhibits a correlated activation without delays. Separate clusters emerge in the t-SNE embedding based on SPOTDis (Figure 4B) and are identified by HDBSCAN (Figure S2). This demonstrates that SPOTDisClust is not only a sequence detection method in the sense that it can detect specific temporal orderings of firing, but can also be used to identify patterns in which specific groups of cells are synchronously co-active without time delays.
2.5 Dependence on the signal to noise ratio
A major challenge for the clustering of temporal spiking patterns is the stochasticity of neuronal firing. That is, in neural data, it is extremely unlikely to encounter, in a high dimensional space, a copy of the same pattern exactly twice, or even two instantiations that differ by only a few insertions or deletions of spikes. Furthermore, patterns might be distinct when they span a high-dimensional neural space, even when bivariate correlations among neurons are weak and when the firing of neurons in the activation period is only slightly higher than the baseline firing around it (see further below). The robustness of a sequence detection algorithm to noise is therefore critical.
We can dissociate different aspects of “noise” in temporal spiking patterns. A first source of noise is the stochastic fluctuation in the number of spikes during the pulse activation period and baseline firing period. In the ground-truth simulations presented here, this fluctuation is driven by the generation of spikes according to inhomogeneous Poisson processes. This type of noise causes differences in SPOTDis values between epochs, because of differences in the amount of mass in the pulse activation and baseline period, in combination with the normalization of the cross-correlation histogram. In the extreme case, some neurons may not fire in a given epoch, such that all information about the temporal structure of the pattern is lost. Such a neural “silence” might be prevalent when we search for spiking patterns on a short time scale. We note that fluctuations in the spike count are primarily detrimental to clustering performance because there is baseline firing around the pulse activation period, in other words because “noisy” spikes are inserted at random points in time around the pulse activations. To see this, suppose that the probability that a neuron fires at least one spike during the pulse activation period is close to one for all M epochs and all N neurons, and that the firing rate during the baseline is zero. In this case, because SPOTDis is based on computing optimal transport between normalized cross-correlation histograms (eq. (5)), the fluctuation in the spike count due to Poisson firing would not drive differences in the SPOTDis.
A second source of noise is the jitter in spike timing. Jitter in spike timing also gives rise to fluctuations in the SPOTDis and in the ground-truth simulations presented here, spike timing jitter is a consequence of the generation of spikes according to Poisson processes. As explained above, because the SPOTDisClust method does not require exact matches of the observed patterns, but is a “cross-bins” dissimilarity measure, it can handle jitter in spike timing well. Again, we can distinguish jitter in spike timing during the baseline firing, and jitter in spike timing during the pulse activation period. The amount of perturbation caused by spike timing jitter during the pulse activation period is a function of the pulse period duration. We will explore the consequences of these different noise sources, namely the amount of baseline firing, the sparsity of firing, and spike timing jitter in Figures 5 and 6.
We define the SNR (Signal-to-Noise-Ratio) as the ratio of the firing rate inside the activation pulse period over the firing rate outside the activation period. This measure of SNR reflects both the amount of firing in the pulse activation period as compared to the baseline period (first source of noise), and the pulse duration as compared to the epoch duration (second source of noise).
We first consider an example of 100 neurons that have a relatively low SNR (Figure 5A). It can be appreciated that different realizations from the same pattern are difficult to identify by eye, and that exact matches for the same pattern, if one would bin the spike trains, would be highly improbable, even for a single pair of two neurons (Figure 5A). Yet, in the 2-D t-SNE embedding based on SPOTDis, the different clusters form well separated “islands” (Figure 5A), and the HDBSCAN clustering algorithm captures them (Figure S2).
To systematically analyze the dependence of clustering performance on the SNR, we varied the SNR by changing the firing rate inside the activation pulse period, while leaving the firing rate outside the activation period as well as the duration of the activation (pulse) period constant. Thus, we varied the first aspect of noise, which is driven by spike count fluctuations. A measure of performance was then constructed by comparing the unsupervised cluster labels rendered by HDBSCAN with the ground-truth cluster labels, using the Adjusted Rand Index (ARI) measure (see Methods). As expected, we find that clustering performance increases with the firing rate SNR (Figure 5B). Importantly, as the number of neurons increases, we find that the same clustering performance can be achieved with a lower SNR (Figure 5B). Thus, SPOTDisClust does not suffer from the problem of combinatorial explosion as the number of neurons that constitute the patterns increases, and, moreover, its performance improves when the number of recorded neurons is higher. The reason underlying this behavior is that each neuron contributes to the separability of the patterns, such that a larger sample of neurons allows each individual neuron to be noisier. This means that, in the brain, very reliable temporal patterns may span high-dimensional neural spaces, even though the bivariate correlations might appear extremely noisy; absence of evidence for temporal coding in low dimensional multi-neuron ensembles should therefore not be taken as evidence for absence of temporal coding in high dimensional multi-neuron ensembles.
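For readers unfamiliar with the Adjusted Rand Index, the following sketch computes it from two label vectors using the standard pair-counting formula. It is an independent illustration and not the code used for Figure 5; the function name is hypothetical, and treating noise epochs as just another label value is an assumption of this example.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI; assumes both label lists have the same length (at least 2)."""
    n = len(labels_true)
    # Pairs of items that share a cluster in both labelings, in truth only, in prediction only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)    # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                      # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, renamed labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # partitions that cut across each other
```

The index is invariant to a renaming of the cluster labels, which matters here because the labels produced by unsupervised clustering carry no intrinsic correspondence to the ground-truth pattern indices.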
We also varied the SNR by changing the pulse duration while leaving the ratio of expected number of spikes in the activation period relative to the baseline constant. The latter was achieved by adjusting the firing rate inside the activation period, such that the product of pulse duration with firing rate in the activation period remained constant, i.e. $T_{\mathrm{pulse}}\, \lambda_{\mathrm{pulse}} = c$. Thus, we varied the second aspect of noise, namely the amount of spike timing jitter in the pulse activation period. We find a similar dependence of clustering performance on the firing rate SNR and the number of neurons (Figure 5B). Hence, patterns that comprise brief activation pulses of very high firing yield, given a constant product $T_{\mathrm{pulse}}\, \lambda_{\mathrm{pulse}}$, clusters that are better separated than patterns comprising longer activation pulses.
We performed further simulations to study in a more simplified, one-dimensional setting how the SPOTDis depends quantitatively on the insertion of noise spikes outside of the activation pulse periods, which further demonstrates the robustness of the SPOTDis measure to noise (Figure S3).
2.6 Temporal pattern recognition in sparsely firing ensembles
As explained above, an extreme case of noise driven by spike count fluctuations is the absence of firing during an epoch. If many neurons remain “silent” in a given epoch, then we can only compute the EMD for a small subset of neuron pairs (eq. (2)). Such a sparse firing scenario might be particularly challenging to latency-based methods, because the latency of cells that do not fire is not defined. We consider a case of sparse firing in Figure 6 where the expected number of spikes per epoch is only 0.48. Despite the firing sparsity, the low-dimensional t-SNE embedding based on SPOTDis shows separable clusters, and HDBSCAN correctly identifies the different clusters (Figure 6). In general, given sparse firing, a sufficient number of neurons is needed to correctly identify the P patterns, but, in addition, the patterns should be distinct on a sufficiently large fraction of neuron pairs.
2.7 Insensitivity to scaling of firing rates
A key aim of the SPOTDisClust methodology is to identify temporal patterns that are based on consistent temporal relationships among neurons. However, in addition to temporal patterns, neuronal populations can also exhibit fluctuations in the firing rate that can be driven by e.g. external input or behavioral state and are superimposed on temporal patterns. A global scaling of the firing rate, or a scaling of the firing rate for a specific assembly, should not constitute a different temporal pattern if the temporal structure of the pattern remains unaltered, i.e. when the normalized cross-correlation function has the same expected value, and should not interfere with the clustering of temporal patterns. This is an important point for practical applications, because it might occur for instance that in specific behavioral states rates are globally scaled (McGinley et al., 2015; Steriade, Timofeev, and Grenier, 2001).
In Figure 7A, we show an example where there are three different global rate scalings, as well as two temporal patterns. The temporal patterns are, for each epoch, randomly accompanied by one of the different global rate scaling factors. The t-SNE embedding shows that the temporal patterns form separate clusters, but that the global rate scalings do not (Figure 7A). Furthermore, HDBSCAN correctly clusters the temporal patterns, but does not find separate clusters for the different rate scalings (Figures 7A and S2). This behavior can be understood from examination of the sorted dissimilarity matrix, in which we can see that epochs with a low rate do not only have a higher SPOTDis to epochs with a high rate, but also to other epochs with a low rate, which prevents them from agglomerating into a separate cluster (Figure 7A); rather the epochs with a low rate tend to cluster at the edges of the cluster, whereas the epochs with a high rate tend to form the core of the cluster (Figure 7A).
Another example of a rate scaling is one that consists of a scaling of the firing rate for one half of the neurons (Figure 7B). Again, the t-SNE embedding and HDBSCAN clustering show that rate scalings do not form separate clusters, and do not interfere with the clustering of the temporal patterns (Figures 7B and S2). We conclude that the unsupervised clustering of different temporal patterns with SPOTDisClust is not compromised by the inclusion of global rate scalings, or the scaling of the rate in a specific subset of neurons.
2.8 Application to visual responses of V1 populations
We apply the SPOTDisClust method to data collected from monkey V1. Simultaneous recordings were performed from 64 V1 channels using a chronically implanted Utah array (Blackrock) (see Methods). We presented moving bar stimuli in four cardinal directions while monkeys performed a passive fixation task. Each stimulus bar was presented 20 times. We then pooled all 80 trials together, and added 80 trials containing spontaneous activity. Our aim was then to recover the separate stimulus conditions using unsupervised clustering of multi-unit data with SPOTDisClust. The low dimensional t-SNE embedding shows four dense regions that are well separated from each other and correspond to the four stimuli, and HDBSCAN identifies these four clusters (Figure 8). Thus, SPOTDisClust can be successfully used on real neuronal data to identify different temporal patterns in high-dimensional multi-neuron ensembles.
3 Discussion
We have presented a novel dissimilarity measure for multi-neuron temporal spike patterns, SPOTDis, with unique properties that make it suitable for the unsupervised exploration of the space of admissible firing patterns. SPOTDis is rooted in optimal transport theory, a burgeoning field in mathematics that offers promising solutions for fields as diverse as economics, engineering, physics and chemistry (Monge, 1781; Kantorovich, 1942; Hitchcock, 1941; Rubner, Tomasi, and Guibas, 1998; Villani, 2008). In machine learning, optimal transportation based distances for image classification have been devised, which accommodate the fact that relevant image features may appear at slightly different positions in similar images. While pixel-wise comparisons of two images may fail to recognize similarity under those conditions, optimal transportation based distances operate in a “cross-bins” fashion, so they can treat those shifts in an appropriate way. In neural data analysis we face a similar problem, as spike patterns may present themselves repeatedly with the same overall structure, but not exactly the same timing. The traditional approach to accommodate such “jitter” is to discretize spike times with a binning procedure, or, in a nearly equivalent way, to use a smoothed version of the spike train time series (van Rossum, 2001). Such approaches require setting an arbitrary scale for the timing precision of neural firing. This is in general difficult, because neural patterns may occur at different temporal scales, and with different jitters. For example, hippocampal place cells fire in sequences at the “behavioral” time scale of hundreds of milliseconds, and because of the phase precession phenomenon, they fire so-called “theta” sequences at a much faster time scale of tens of milliseconds (Dragoi and Buzsaki, 2006; O’Keefe and Recce, 1993; Pastalkova et al., 2008).
Repeated sequences at any time scale will be detected by SPOTDis, in particular in combination with a density-based algorithm such as HDBSCAN, which can detect state space regions of higher density surrounded by regions of lower density, regardless of the absolute density. Using ground-truth simulations, we have shown that SPOTDis can deal with cases in which both coarse and fine patterns co-exist (Figure 4A). Optimal transport theory provides both theoretical grounding and a host of solutions for the efficient calculation of distances. Here, we propose a novel implementation, inspired by work in optimal transport, and tailored to the case of calculating the dissimilarity between point process realizations, in our case spike trains.
Distance measures based on “morphing” one spike train into another by moving spikes have been previously proposed. The Victor-Purpura distance, which is an adaptation of the Levenshtein distance to point processes, is a paradigmatic example (Victor and Purpura, 1996). Our approach differs in two fundamental ways.
First, the Victor-Purpura distance allows for the insertion and deletion of spikes, to enable computation of distances between spike trains with different numbers of spikes, adding in each case a penalty term (the penalty terms are arbitrary parameters to be optimized). While this may be a principled way to deal with this issue, it introduces additional complexity in the computation of the distance, as many different combinations of spike shifting and insertion/deletion must be considered in order to find the optimal solution. This may render optimization difficult and the computation prohibitive as one attempts to compare a large number of multi-neuron patterns. We take the more simple-minded approach of rescaling the time series to be compared, in order to equalize mass. While this may be an oversimplification in some cases, it enables us to implement the computation in a very efficient way. Yet, we preserve many desirable features of spike train metrics such as the Victor-Purpura distance. For example, SPOTDis is not based on measures of central tendency, but can also compute dissimilarities between multimodal probability distributions (Figure 3A-B). Furthermore, SPOTDis is particularly noise robust, because it can handle jitter in spike timing, as it does not require exact overlap in discretized time bins, but is based on distance computations in a metric space.
A second important difference with spike train metric methods such as the Victor-Purpura distance is that we calculate the pairwise epoch-to-epoch dissimilarity not directly on spike trains but on cross-correlograms between pairs of cell spike trains. This has the considerable advantage of enabling detection of similarity between spiking patterns that are misaligned, and eliminates the need for precise time reference points (e.g. the time of stimulus delivery), providing a way to freely search for repeated patterns in spontaneous or evoked activity. Comparing cross-correlation patterns between epochs has been used in seminal work on memory replay, where cross-correlation “bias” was compared across entire sleep or behavioral epochs, to assess the presence of significant replay (Skaggs and McNaughton, 1996; Euston, Tatsuno, and McNaughton, 2007). Here, we provide a method for comparison at a greater granularity, enabling efficient identification of the repeated patterns within time windows of hundreds of milliseconds. A distance based on cross-correlation also has an attractive physiological interpretation: From the perspective of synaptic plasticity, it can be interpreted as the extent to which two patterns have a similar effect on the synaptic plasticity in the network through the STDP rule, which holds that changes in synaptic strength depend on the relative timing of pre- and post-synaptic spikes (Dan and Poo, 2004; Markram et al., 1997). Our dissimilarity measure acts on multi-neuron patterns, and can make use of any additional information available when the monitored neural population increases in size. Because SPOTDis between epoch k and epoch m ignores neuron pairs for which either neuron was silent in one of the two epochs, it also handles cases in which there is sparse firing and many neurons do not fire at all (Figure 6). Moreover, SPOTDis is based on computing a distance function on the normalized cross-correlation functions.
Because of this normalization to unit mass, it copes with global fluctuations in the firing rate, and specific increases in the firing rate for subsets of neurons (Figure 7).
We combined SPOTDis with a density-based clustering algorithm, HDBSCAN, which forms a good match for several reasons: First, it can deal with non-metric dissimilarities. While SPOTDis on a single cell pair cross-correlation is metric (and the sum of metrics is a metric), absence of firing in some neurons and in some cell pairs may cause violation of metricity, which is handled gracefully by HDBSCAN. Second, it can identify clusters at different characteristic densities in different regions of the state space, adapting to patterns that may arise at different time scales and different precision due to disparate underlying mechanisms. Yet, other clustering strategies than HDBSCAN may work successfully as well. We show that in many cases, a non-linear embedding technique such as t-SNE acting on SPOTDis yields a quite intuitive representation of the underlying structure of the data.
We provide an initial application of the SPOTDis measure to real neuronal data, by analyzing multi-electrode recordings in visual cortex. In this analysis, we fed the algorithm the neural data without any knowledge of the task structure, or of the times of stimulus delivery. Strikingly, the identified clusters faithfully reflected the structure of the PSTH calculated with traditional methods, with availability of the stimulus delivery times and labels. Thus, we can recover stimulus information even after normalizing away firing rate information, which is conventionally used to decode different stimuli, demonstrating that the temporal structure of population activity encodes different moving stimulus directions. In conclusion, we have proposed a new tool for the efficient unsupervised analysis of multi-neuron data, which opens up more flexible ways to analyze spontaneous and evoked activity than has been possible so far.
4 Methods
4.1 Construction of SPOTDis
The SPOTDisClust method contains two steps. The first step is to construct the pairwise epoch-to-epoch SPOTDis measure on the matrix of cross-correlations among all neuron pairs.
We compute, for each of the M epochs separately, the normalized cross-correlation function for all pairs of N neurons, which yields M matrices of $N(N-1)/2$ normalized cross-correlations $s'_{ij,k}(\tau)$. This function is defined

$$s'_{ij,k}(\tau) \equiv \frac{s_{ij,k}(\tau)}{\sum_{\tau'=-T}^{T} s_{ij,k}(\tau')}, \qquad (5)$$

where $s_{ij,k}(\tau) \equiv \sum_{t} s_{i,k}(t)\, s_{j,k}(t+\tau)$ is the cross-correlation function (or cross-covariance), and $s_{i,k}(t)$ and $s_{j,k}(t)$ are the spike trains of neurons i and j in the kth trial, with $k = 1, \ldots, M$, and $i, j = 1, \ldots, N$. In other words, the normalized cross-correlation histogram is simply the histogram of expected coincidence counts at different delays τ, normalized to unit mass. We define $0/0 \equiv 0$ in eq. (5).
We then compute the Earth Mover’s Distance (EMD) between $s'_{ij,k}(\tau)$ and $s'_{ij,m}(\tau)$ for each pair of epochs (k,m). This is defined as follows. We first find all the τ for which $s_{ij,k}(\tau) > 0$ and $s_{ij,m}(\tau) > 0$, defining respectively the (in ascending order) sorted vectors $\mathbf{x} = (x_1, \ldots, x_Q)$ and $\mathbf{y} = (y_1, \ldots, y_R)$ (we omit i and j subscripts here), with Q and R elements respectively, and associated mass vectors $\mathbf{u} = (1/Q, \ldots, 1/Q)$ and $\mathbf{v} = (1/R, \ldots, 1/R)$. If there exist τ for which $s_{ij,k}(\tau) = z$ with $z > 1$, then there will be z elements of τ in $\mathbf{x}$, each with associated mass $1/Q$ in $\mathbf{u}$; the same holds for m and $\mathbf{y}$. The vectors $\mathbf{x}$ and $\mathbf{y}$ contain the delays between all the pairs of spikes of the two neurons under consideration, for epochs k and m, respectively. Because the EMD is computed on the precise delay times, we can let $\Delta t \to 0$, i.e. the sampling rate can be made infinitely large. In practice, we therefore directly find the vectors $\mathbf{x}$ and $\mathbf{y}$ and do not compute $s_{ij,k}(\tau)$ for all τ. Let c be the moving cost function, which we define as the L1 norm, $c(\tau_1, \tau_2) \equiv |\tau_1 - \tau_2| / (2T + 1)$. Note that the normalization of c ensures that $0 \le D \le 1$ (the EMD) in eq. (6). Solving the optimal transport problem then amounts to finding a matrix of flows $F \equiv [f_{q,r}]$, with $f_{q,r}$ the flow (i.e. the amount of mass moved) from $x_q$ to $y_r$, such that the overall cost is minimized, i.e.

$$D \equiv \min_{F} \sum_{q=1}^{Q} \sum_{r=1}^{R} f_{q,r}\, c(x_q, y_r), \qquad (6)$$

subject to the constraints that

$$f_{q,r} \ge 0, \qquad \sum_{r=1}^{R} f_{q,r} \le u_q, \qquad \sum_{q=1}^{Q} f_{q,r} \le v_r, \qquad \sum_{q=1}^{Q} \sum_{r=1}^{R} f_{q,r} = 1.$$
Here, D is the EMD. This follows the standard definition of the EMD with normalized mass. Note that the EMD with the L1 norm is equivalent to the 1st order Wasserstein distance (which is a special case of the Kantorovich formulation of the optimal transport). However, for simplicity, we use the notation of the EMD here, which uses discrete variables.
We solve the transport problem algorithmically as follows. Suppose that Q ≤ R (the algorithm has the same structure for Q > R). Redefine the mass vectors such that $\mathbf{u} = (R/Q, \ldots, R/Q)$, having Q elements, and $\mathbf{v} = (1, \ldots, 1)$, having R elements, so that both carry a total mass of R. We then solve the transport problem, given the sorted vectors x and y, with the following algorithm:
SET emd = 0, q = 0, r = 0
WHILE r < R DO
    SET flow = min(u[q], v[r])
    SET cost = flow × c(x[q], y[r])
    SET emd = emd + cost
    SET u[q] = u[q] − flow
    SET v[r] = v[r] − flow
    IF v[r] = 0 THEN
        SET r = r + 1
    ENDIF
    IF u[q] = 0 THEN
        SET q = q + 1
    ENDIF
ENDWHILE
SET emd = emd / R
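The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `emd_1d` is hypothetical, and the sketch assumes integer masses (R units per element of x, Q units per element of y, final division by QR) instead of fractional ones. This is equivalent up to scaling and avoids testing floating-point numbers for exact equality with zero.

```python
from typing import Sequence


def emd_1d(x: Sequence[float], y: Sequence[float], T: float) -> float:
    """EMD between two delay vectors, each normalized to unit mass.

    Cost of moving mass from a to b is |a - b| / (2T + 1).
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both delay vectors must be non-empty")
    x, y = sorted(x), sorted(y)
    if len(x) > len(y):            # ensure Q <= R; the distance is symmetric
        x, y = y, x
    Q, R = len(x), len(y)
    # Integer masses (assumption of this sketch): total mass is Q * R on both sides.
    u = [R] * Q
    v = [Q] * R
    emd, q, r = 0.0, 0, 0
    while r < R:
        flow = min(u[q], v[r])                         # mass moved in this step
        emd += flow * abs(x[q] - y[r]) / (2 * T + 1)   # cost c(x_q, y_r)
        u[q] -= flow
        v[r] -= flow
        if v[r] == 0:              # target element filled: go to the next one
            r += 1
        if u[q] == 0:              # source element emptied: go to the next one
            q += 1
    return emd / (Q * R)           # undo the integer mass scaling
```

Because both vectors are sorted and the cost is the absolute difference, a single left-to-right sweep suffices. Note that `emd_1d([1, 2, 3], [1, 1, 2, 2, 3, 3], T=10)` is zero: after normalization to unit mass, only the shape of the delay distribution matters, not the number of spike pairs.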
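After computing the EMDs between each pair of epochs for each neuron pair separately, with computational complexity of order $O(N^2 M^2 n^2)$, where n is the average number of spikes, we compute SPOTDis as the average EMD, i.e.

$$D_{km} \equiv \frac{\sum_{i<j} w_{ij,km}\, D_{ij,km}}{\sum_{i<j} w_{ij,km}}, \qquad w_{ij,km} \equiv \mathrm{sgn}\Big(\sum_{\tau} s_{ij,k}(\tau)\Big)\, \mathrm{sgn}\Big(\sum_{\tau} s_{ij,m}(\tau)\Big),$$

where $D_{ij,km}$ is the EMD for neuron pair (i,j) between epochs k and m, and sgn is the sign function.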
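The averaging step can be illustrated with a short Python sketch that goes from spike times to the dissimilarity between two epochs. All names (`pair_delays`, `emd_1d`, `spotdis`) are hypothetical. The sketch assumes that an epoch is given as one list of spike times per neuron, that delays are restricted to |τ| ≤ T, and that NaN is returned when no neuron pair is active in both epochs; the source does not specify that last case.

```python
from itertools import combinations
from typing import List

Epoch = List[List[float]]  # one list of spike times per neuron


def pair_delays(a: List[float], b: List[float], T: float) -> List[float]:
    """Sorted delays t_b - t_a over all spike pairs with |delay| <= T."""
    return sorted(tb - ta for ta in a for tb in b if abs(tb - ta) <= T)


def emd_1d(x: List[float], y: List[float], T: float) -> float:
    """Unit-mass EMD between sorted delay vectors (integer-mass sweep)."""
    if len(x) > len(y):
        x, y = y, x
    Q, R = len(x), len(y)
    u, v = [R] * Q, [Q] * R
    emd, q, r = 0.0, 0, 0
    while r < R:
        flow = min(u[q], v[r])
        emd += flow * abs(x[q] - y[r]) / (2 * T + 1)
        u[q] -= flow
        v[r] -= flow
        if v[r] == 0:
            r += 1
        if u[q] == 0:
            q += 1
    return emd / (Q * R)


def spotdis(epoch_k: Epoch, epoch_m: Epoch, T: float) -> float:
    """Average EMD over neuron pairs with spike pairs in both epochs."""
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(epoch_k)), 2):
        x = pair_delays(epoch_k[i], epoch_k[j], T)
        y = pair_delays(epoch_m[i], epoch_m[j], T)
        if x and y:                # weight 1 only if both are non-empty
            total += emd_1d(x, y, T)
            n_pairs += 1
    return total / n_pairs if n_pairs else float("nan")
```

Since only within-epoch delays enter the computation, shifting all spikes of an epoch by a constant leaves the result unchanged, which is the misalignment invariance discussed in the text.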
4.2 HDBSCAN
HDBSCAN is an automated density clustering algorithm that clusters on the basis of pairwise dissimilarity matrices. An extensive overview of HDBSCAN can be found in (Campello et al., 2015; McInnes, Healy, and Astels, 2017) and we provide only a brief overview of HDBSCAN here. HDBSCAN comprises the following steps:
After pairwise distances have been computed between all data points, HDBSCAN defines a “mutual reachability distance” between each pair of data points (in our case epochs). The mutual reachability distance is an adjustment of the distance measure that effectively acts as a smoother. For each epoch k, the core distance $d_{\mathrm{core}}(k)$ is defined as the SPOTDis dissimilarity $D_{km}$ (eq. (2)) to its $n_{\mathrm{pts}}$-th nearest neighbour m. The mutual reachability distance is then defined as $d_{\mathrm{mreach}}(k,m) \equiv \max\{d_{\mathrm{core}}(k),\, d_{\mathrm{core}}(m),\, D_{km}\}$. The mutual reachability distance does not alter the distance between two points that are in dense regions, but it changes the distance for points that are in low-density regions and have a relatively large distance. The purpose of transforming the distance matrix in this way is to make the clustering algorithm more robust to noise.
HDBSCAN then defines a minimum spanning tree, in which there is a path between all points (vertices), without any loops (i.e. an acyclic graph), such that the total weight of the edge connections is minimized. Here the edges are the mutual reachability distances.
HDBSCAN then constructs a hierarchical cluster dendrogram from the minimum spanning tree as follows: Initially all points are assigned to the “root”, a single cluster containing all points. HDBSCAN then sets a threshold ε and cuts edges from the minimum spanning tree whose weight is higher than ε. HDBSCAN keeps decreasing the value of ε such that more connections are removed and new clusters can appear, forming a cluster dendrogram. Sets of points that have fewer than $n_{\mathrm{clSize}}$ members, the minimum cluster size, at a value of ε are deemed noise points at that value of ε. Here we take the simplification $n_{\mathrm{pts}} = n_{\mathrm{clSize}}$ (Campello et al., 2015), such that $n_{\mathrm{pts}}$ is the only hyperparameter, which we set to 10 here, unless specified otherwise. We use the implementation of HDBSCAN developed by McInnes, Healy, and Astels (2017), and all analyses and simulations were performed in Python. HDBSCAN uses either the “leaf selection” method for selecting clusters, or the “excess of mass” method. In the leaf selection method, the end branches of the hierarchical cluster dendrogram are taken as the selected clusters. In the “excess of mass” algorithm, HDBSCAN chooses the set of clusters that is most stable under a change of ε. This is done as follows: at some value of ε a cluster is born, $\epsilon_{\max}$, and at some point, the cluster dies, $\epsilon_{\min}$. For each member of the cluster we can define the value of $\epsilon_k$ where each kth member fell out, and take the stability as the sum over all members. HDBSCAN then selects in the dendrogram the optimal levels at which to cut the tree in order to maximize the stability of the selected clusters, forming a set of clusters. An advantage of this selection procedure is that it allows for clusters of varying density.
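The first steps described above (mutual reachability, minimum spanning tree, and a cut at a single threshold ε) can be sketched in plain Python. This is an illustration under stated assumptions, not the HDBSCAN implementation used in the paper: the function names are hypothetical, a point is not counted as its own neighbour when computing the core distance, and only one fixed ε is applied rather than the full hierarchy with stability-based selection.

```python
def mutual_reachability(D, n_pts):
    """max(core(k), core(m), D[k][m]) for a dense dissimilarity matrix D."""
    M = len(D)
    # Core distance: dissimilarity to the n_pts-th nearest neighbour (self excluded).
    core = [sorted(D[k][m] for m in range(M) if m != k)[n_pts - 1] for k in range(M)]
    return [[0.0 if k == m else max(core[k], core[m], D[k][m]) for m in range(M)]
            for k in range(M)]


def minimum_spanning_tree(W):
    """Prim's algorithm on a dense symmetric matrix; returns (weight, a, b) edges."""
    M = len(W)
    in_tree = [False] * M
    best = [float("inf")] * M      # cheapest known connection to the tree
    parent = [-1] * M
    best[0] = 0.0
    edges = []
    for _ in range(M):
        k = min((i for i in range(M) if not in_tree[i]), key=lambda i: best[i])
        in_tree[k] = True
        if parent[k] >= 0:
            edges.append((W[k][parent[k]], parent[k], k))
        for m in range(M):
            if not in_tree[m] and W[k][m] < best[m]:
                best[m], parent[m] = W[k][m], k
    return edges


def cut_tree(edges, M, eps, n_cl_size):
    """Labels after removing edges heavier than eps; small components are noise (-1)."""
    root = list(range(M))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]    # path compression
            a = root[a]
        return a

    for w, a, b in edges:
        if w <= eps:
            root[find(a)] = find(b)
    members = {}
    for p in range(M):
        members.setdefault(find(p), []).append(p)
    labels = [-1] * M
    next_label = 0
    for comp in members.values():
        if len(comp) >= n_cl_size:
            for p in comp:
                labels[p] = next_label
            next_label += 1
    return labels
```

Sweeping `eps` from large to small values and recording when components split or drop below `n_cl_size` would yield the dendrogram from which clusters are then selected.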
4.3 t-SNE
T-SNE (t-distributed stochastic neighbor embedding) is a dimensionality reduction technique for high-dimensional datasets (Maaten and Hinton, 2008). While it typically is computed starting from a high-dimensional dataset that is then converted into a matrix of pairwise Euclidean distances, here we compute it directly on the pairwise dissimilarity matrix. We first outline the algorithm of SNE (Hinton and Roweis, 2003), and after that the adjustments made in t-SNE.
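For each two datapoints (k and m), which represent epochs in our case, a measure of similarity is computed. This measure of similarity is taken as the conditional probability of observing m given a Gaussian distribution centered on k,

$$p_{m|k} = \frac{\exp\left(-d_{km}^2 / 2\sigma_k^2\right)}{\sum_{l \neq k} \exp\left(-d_{kl}^2 / 2\sigma_k^2\right)}. \qquad (13)$$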
Here, the standard choice for $d_{km}$ is the L2 norm for a high dimensional dataset $x_1, \ldots, x_M$, defined as $d_{km} = \lVert x_k - x_m \rVert$, but we take it here as the SPOTDis, $D_{km}$. The normalization in eq. (13) simply assures that the probabilities $p_{m|k}$ sum to one over m. The variance of the Gaussian, $\sigma_k^2$, is determined for each data point individually, by finding $\sigma_k$ to satisfy the equation

$$2^{H(P_k)} = \psi.$$
Here, H is the entropy function using logarithm base 2, $P_k$ is the probability distribution of all the datapoints given k, $P_k = (p_{1|k}, \ldots, p_{M|k})$, and ψ is the perplexity, which is set by the user and can be interpreted as a smooth measure of the effective number of neighbors each datapoint has. We set the perplexity ψ to 30, which is in the typical range used in the literature. T-SNE is generally quite insensitive to choices of the perplexity, which is usually taken in the range 5-50.
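The per-point calibration of the Gaussian width can be sketched as a bisection search. The sketch assumes the standard t-SNE convention that the perplexity equals 2 raised to the base-2 entropy of the conditional distribution (Maaten and Hinton, 2008). The function names, the search bounds, and the number of iterations are illustrative choices, not taken from the source.

```python
import math


def conditional_probs(d_row, k, sigma):
    """p_{m|k} for row k of a dissimilarity matrix, with p_{k|k} = 0."""
    sq = [d * d for d in d_row]
    # Subtract the smallest off-diagonal squared distance for numerical
    # stability; this cancels in the normalization.
    d0 = min(s for m, s in enumerate(sq) if m != k)
    w = [0.0 if m == k else math.exp(-(s - d0) / (2.0 * sigma * sigma))
         for m, s in enumerate(sq)]
    total = sum(w)
    return [x / total for x in w]


def perplexity(p):
    """2 ** H(p), with H the entropy in bits."""
    h = -sum(x * math.log2(x) for x in p if x > 0.0)
    return 2.0 ** h


def find_sigma(d_row, k, target, lo=1e-3, hi=1e3, n_iter=60):
    """Bisection in log space; perplexity grows with sigma."""
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(d_row, k, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)
```

For a row with M − 1 other points, the attainable perplexity lies between 1 (very small width) and M − 1 (very large width), so the target must fall inside that range.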
SNE then attempts to find a low dimensional set of datapoints, $\{y_1, \ldots, y_M\}$, that have a similar distribution of conditional probabilities (similarities) as the distances derived from the high-dimensional counterparts. In this case the variance of the Gaussian is constant for all datapoints, and

$$q_{m|k} = \frac{\exp\left(-\lVert y_k - y_m \rVert^2\right)}{\sum_{l \neq k} \exp\left(-\lVert y_k - y_l \rVert^2\right)}.$$
SNE then minimizes a cost function, which in this case is the Kullback-Leibler divergence between pm|k and qm|k over all datapoints. To do that, it starts from a random sample of points (Gaussian distributed) and then performs a gradient descent, in which each point yk is moved around depending on the attraction or repulsion from other datapoints (see Maaten and Hinton, 2008).
T-SNE makes two main adjustments relative to SNE (the rationale behind these two adjustments is extensively discussed in (Maaten and Hinton, 2008)). First, it uses a symmetric measure of similarity between two data points, as the joint probability

$$p_{km} = \frac{p_{m|k} + p_{k|m}}{2M}.$$
Second, it uses a Student’s t-distribution with one degree of freedom instead of a Gaussian for the low dimensional counterparts.
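The two adjustments, together with the Kullback-Leibler cost, can be written down compactly. The sketch below follows the standard t-SNE definitions (Maaten and Hinton, 2008); the function names are hypothetical and the gradient descent itself is omitted.

```python
import math


def joint_p(cond):
    """Symmetrized similarities p_km = (p_{m|k} + p_{k|m}) / (2M)."""
    M = len(cond)
    return [[(cond[k][m] + cond[m][k]) / (2.0 * M) for m in range(M)]
            for k in range(M)]


def joint_q(Y):
    """Student-t (one degree of freedom) similarities of embedded points Y."""
    M = len(Y)
    w = [[0.0 if k == m else
          1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[k], Y[m])))
          for m in range(M)] for k in range(M)]
    total = sum(sum(row) for row in w)   # normalize over all pairs
    return [[x / total for x in row] for row in w]


def kl_divergence(P, Q):
    """Cost minimized by t-SNE: sum over pairs of p * log(p / q)."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0.0)
```

The heavy tails of the Student's t-distribution let moderately dissimilar epochs sit far apart in the embedding without a large penalty, which is what keeps distinct clusters visually separated.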
4.4 ARI measure for cluster evaluation
The ARI (Adjusted Rand Index, Hubert and Arabie, 1985) is a measure of similarity between two data clusterings X (here the ground-truth clusters) and Y (here the empirical cluster definitions). We note that the points that are labeled as noise by HDBSCAN are, for the purpose of computing the ARI, assigned to a separate cluster. The computation of the Rand Index considers each pair of observations (in our case epochs) and determines whether the two data clusterings agree on that pair. Agreement is defined as either: a) falling in the same cluster in X and in the same cluster in Y, or b) falling in different clusters in X and in different clusters in Y. The reason why agreement must be defined over pairs of observations is that the cluster labels do not have to be matched between X and Y, and that both can contain a different number of clusters. Disagreement is defined as falling in the same cluster in X but in different clusters in Y, or falling in different clusters in X but in the same cluster in Y. The Rand Index is then defined as the ratio of the number of agreements between the data clusterings (i.e. pairs of epochs that are grouped consistently in both clusterings) over the total number of agreements and disagreements. The Adjusted Rand Index corrects the Rand Index for chance, by subtracting the agreement expected for random clusterings and rescaling, such that it is 0 on average for random clusterings and 1 for identical clusterings.
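For concreteness, the ARI can be computed from pair counts as in the following sketch, which uses the standard contingency-table formula of Hubert and Arabie (1985). The function name is illustrative, and returning 1.0 in the degenerate case where the denominator vanishes is a convention chosen here. Noise points are simply passed in with their own label (e.g. −1), in line with the treatment described above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(x, y):
    """ARI between two label sequences of equal length."""
    n = len(x)
    # Number of pairs placed together in both clusterings, in X, and in Y.
    pairs_xy = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    pairs_x = sum(comb(c, 2) for c in Counter(x).values())
    pairs_y = sum(comb(c, 2) for c in Counter(y).values())
    expected = pairs_x * pairs_y / comb(n, 2)   # chance level
    max_index = 0.5 * (pairs_x + pairs_y)
    if max_index == expected:                   # degenerate: nothing to adjust
        return 1.0
    return (pairs_xy - expected) / (max_index - expected)
```

Because only co-membership of pairs is counted, the value is unchanged when cluster labels are permuted, so ground-truth and HDBSCAN labels need not be matched beforehand.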
4.5 Application to neuronal data
One male macaque monkey performed a passive fixation task while moving bar stimuli (white bars on gray background, 0.25 degrees in visual angle width) were presented. All procedures complied with the German law for the protection of animals and were approved by the regional authority (Regierungspräsidium Darmstadt). Recordings were performed from 64 V1 channels simultaneously, obtained from a chronic Utah array implant (Blackrock). Receptive fields had eccentricities around 3-5 degrees visual angle. We performed band-pass filtering of each channel in the frequency range of action potentials (300-6000 Hz) and then thresholded the band-pass filtered signal x(t) according to (Quiroga, Nadasdy, and Ben-Shaul, 2004), using the threshold $3\,\mathrm{med}\{|x(t)|/0.6745\}$, where med is the median (i.e. effectively three standard deviations). When the signal x(t) crossed this threshold, we denoted a spike. After the detection of a threshold crossing, further threshold crossings were suppressed for 0.75 ms. Moving bar stimuli were presented in four cardinal directions. Each stimulus bar was presented 20 times. We then pooled all 80 trials together, and added 80 trials containing spontaneous activity. Our aim was then to recover the separate stimulus conditions using unsupervised clustering with SPOTDisClust. We use $n_{\mathrm{pts}} = 3$ with leaf selection for the HDBSCAN parameters.
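The thresholding step can be sketched as follows. Several details are assumptions of this illustration rather than statements about the original preprocessing: the noise level is estimated as median(|x|)/0.6745 following Quiroga, Nadasdy, and Ben-Shaul (2004), the multiplier is 3, spikes are taken as upward crossings of a positive threshold (the polarity is not specified in the text), and the function and parameter names are hypothetical. The input is assumed to be already band-pass filtered.

```python
from statistics import median


def detect_spikes(x, fs, k=3.0, dead_time=0.75e-3):
    """Spike times (s) from upward crossings of k * median(|x|) / 0.6745.

    x: band-pass filtered samples; fs: sampling rate in Hz.
    Crossings within `dead_time` of the last detected spike are suppressed.
    """
    sigma = median(abs(v) for v in x) / 0.6745   # robust noise estimate
    thr = k * sigma
    spikes = []
    last = -float("inf")
    for n in range(1, len(x)):
        t = n / fs
        crossed = x[n - 1] < thr <= x[n]         # upward crossing at sample n
        if crossed and t - last >= dead_time:
            spikes.append(t)
            last = t
    return spikes
```

The median-based estimate is used instead of the sample standard deviation because it is much less inflated by the spikes themselves.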
4.6 Acknowledgement
Dr. Michael Schmid, Dr. Katharine Shapcott, Dr. Joscha Schmiedt, Dr. Richard Saunders, Dr. Pascal Fries, Cem Uran and Alina Peters were responsible for the collection and preprocessing of the experimental monkey V1 data, with financial support from DFG Emmy Noether grant 2806 (Dr. Michael Schmid). We thank Dr. Felix Effenberger for inspiring discussions on this topic and helpful comments. LG was financially supported by the Erasmus Plus Traineeship Program. FPB was financially supported by the European Union FP7 Project 600925 “Neuroseeker”. MV was financially supported by the Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany.