Abstract
Objective Extracellular electrical recording of neural activity is an essential tool in neuroscience, and spike sorting is a fundamental step in processing the recordings, so spike sorting algorithms must perform robustly in the face of noise and perturbation. Several algorithms have been proposed to overcome these difficulties; however, when the noise level or waveform similarity becomes relatively high, their robustness remains challenged.
Approach Here, we propose a spike sorting method that uses Linear Discriminant Analysis (LDA) for feature extraction and Density Peaks (DP) for clustering. Because DP adapts well to the diverse distributions of spikes, LDA can concentrate on the features relevant to clustering, guided by the feedback conveyed by DP, and ignore irrelevant ones. Through this combination of LDA and DP, our method maintains highly robust performance in a variety of complex data situations.
Main results In this study, we compared the proposed density-peaks-based framework with several established algorithms. It demonstrates high sorting accuracy and cluster quality, and outperforms previously established methods on both simulated and real extracellular recordings.
Significance With the rapid development of acquisition and recording technology, neuroscience increasingly involves complex and precise signal analysis and decoding, which makes a highly robust spike sorting method imperative for preprocessing. Our evaluations show explicitly that the proposed algorithm is strongly robust to high noise levels and high spike-waveform similarity, and can meet these research requirements.
Introduction
The development of neuroscience places high demands on the analysis of neural activity at both the single-neuron [1-4] and population levels [5-7]. The basis of these analyses is the correct assignment of each detected spike to its originating neuron, a process called spike sorting [8-11].
Spike sorting methods encounter difficulties in the face of noise and perturbation. Since these algorithms commonly comprise two stages [12-13], feature extraction and clustering, an outstanding spike sorting algorithm needs to be highly robust in both.
In feature extraction, the extracted features are descriptions of spikes in a low-dimensional space. An appropriate feature extraction method reduces data dimensionality while preserving the degree of differentiation [1]. The simplest approach is to extract geometric features of the waveforms, such as peak-to-peak value, width and zero-crossing features [14]. Although this sort of method is easy to apply and has extremely low complexity, it differentiates similar spikes poorly and is highly sensitive to noise [12-13]. First and Second Derivative Extrema (FSDE) computes the first- and second-derivative extrema of spikes, which is relatively simple and offers some robustness to noise [15]. More sophisticated methods such as Principal Component Analysis (PCA) [13,16] and the Discrete Wavelet Transform (DWT) [17,18] are more robust. In 2004, Quiroga et al. proposed an improvement based on DWT, using the Lilliefors modification of the Kolmogorov-Smirnov (KS) test and selecting as features the wavelet coefficients that deviate most from normality [19].
Many previous methods, however, are perturbed by noise or by the complexity of the data: they cannot effectively extract discriminative features that are easy to cluster, and the features often overlap in the subspace, degrading the subsequent clustering, especially at high noise levels and high data similarity [20]. How can a feature extraction method avoid interference from noise and extract highly discriminative features for better clustering? Studies have shown that a good way to improve robustness is to iterate supervised feature extraction and clustering so as to obtain an optimal subspace with strong clustering discrimination [20-22]. Ding et al. proposed the LDA-Km algorithm, which uses K-means to obtain clustering results and LDA to find a feature space based on those results, and then iterates the two steps to convergence [22]. Compared with PCA and DWT, the features extracted by this method have better clustering separability and effectively improve sorting accuracy [23].
Clustering algorithms have also developed alongside advances in data analysis. Early on, clusters were commonly segmented manually [24]; because this requires human participation it may introduce errors, and as channel counts grew the workload became prohibitive, so it is now used less, although some commercial software still provides manual border demarcation to aid classification. K-means (Km) [13] is a widely used clustering method. It is simple to compute but requires the user to specify the number of clusters in advance; it is therefore sensitive to initial parameters and lacks robustness [25-26]. There are also distribution-based methods, such as Bayesian clustering [13] and the Gaussian Mixture Model [27-29], which model the data under Gaussian distribution assumptions. Methods based on neighboring relations can avoid such assumptions, for example Superparamagnetic Clustering [19,30]. In addition, neural networks [31], t-distribution models [32], hierarchical clustering [33] and support vector machines [34,35] have been used in spike sorting. In 2017, Keshtkaran et al. introduced the Gaussian Mixture Model (GMM) on top of LDA-Km and put forward the LDA-GMM algorithm [20], which has high accuracy and strong robustness against noise and outliers. However, LDA-GMM is complex: it iterates several times with different initial values of important parameters (such as the initial projection matrix) to obtain the optimal result, and it also calls LDA-Km during operation, which brings additional computational cost. These previous results show that iteratively running LDA and a clustering method to obtain a feature space with strong clustering discrimination is a good way to improve sorting accuracy and robustness.
However, distribution-based algorithms such as GMM attempt to reproduce the observed data points as a mixture of predefined probability distribution functions; for an arbitrary distribution, the sorting performance of such methods therefore depends on how well the trial probability model describes the data. In K-means, data points are assigned to the nearest center, and the method may face difficulties when detecting nonspherical clusters. Methods based on the local density of data points can detect clusters of arbitrary shape. Density Peaks (DP), proposed by Rodriguez et al., defines cluster centers as local maxima in the density of data points [36]. This algorithm makes no assumption about the data distribution and adapts to nonspherical distributions, which suits the complex distribution of spikes recorded in vivo, benefiting both robustness and computational cost. Therefore, this paper combines LDA and DP into a new spike sorting algorithm, LDA-DP.
Methods
1. An overview of the LDA-DP algorithm
In this study, we propose a spike sorting algorithm based on LDA and DP. LDA is a supervised machine learning method and requires prior cluster labels, so the data are first projected into an initial subspace and then clustered by Density Peaks to obtain cluster labels. Thus, we need an initial projection matrix W. As summarized in Algorithm 1, W is initialized by executing PCA on the spike matrix X and keeping the first d components. We chose d=3 as a balance between performance and computational complexity, in line with the other feature extraction methods compared in this study. In each iteration, the algorithm obtains a clustering result L. The iteration ends when the clustering result of the current iteration (L) is sufficiently consistent with that of the previous iteration (L_pre) or when the number of iterations reaches the upper limit MaxIte; the minimum number of iterations MinIte ensures that the algorithm iterates adequately. The suggested values are MinIte = 5 and MaxIte = 50. Finally, in the last step of the algorithm, similar clusters are merged and we obtain the sorting result Lmerge.
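To make the flow of Algorithm 1 concrete, the following is a minimal structural sketch in Python. The helper names pca_init, lda_project, dp_cluster and merge_clusters are illustrative (the latter three correspond to the steps sketched in the following sections), and the consistency check between L and L_pre (parameter agree) is a simplification of the stopping rule described above.

```python
import numpy as np

def pca_init(X, d=3):
    """Initial projection matrix W: first d principal directions of the
    spike matrix X (spikes x samples), used only to bootstrap the loop."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                      # samples x d

def lda_dp(X, lda_project, dp_cluster, merge_clusters,
           d=3, K=4, min_ite=5, max_ite=50, agree=0.99):
    """Outer loop of LDA-DP (structural sketch; the three callables are
    stand-ins for the LDA, DP and merging steps sketched later)."""
    W = pca_init(X, d)
    L_pre = None
    for it in range(max_ite):
        Y = X @ W                        # project spikes into the subspace
        L = dp_cluster(Y, K)             # cluster labels from Density Peaks
        # stop once labels are (nearly) unchanged and enough iterations ran
        if L_pre is not None and it >= min_ite and np.mean(L == L_pre) >= agree:
            break
        W = lda_project(X, L, d)         # refit LDA on the new labels
        L_pre = L
    return merge_clusters(X @ W, L)      # final merging of similar clusters
```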
2. Discriminative feature extraction using Linear Discriminant Analysis
Linear Discriminant Analysis (LDA), also known as “Fisher Discriminant Analysis”, is a linear learning method proposed by Fisher [21]. LDA is a supervised machine learning method, which finds an optimal feature space, where the intra-class scatters are relatively small and inter-class scatters are relatively large.
For a multi-cluster dataset, the quality of clusters can be measured by the intra-class scatter matrix Sw and the inter-class scatter matrix Sb, as shown in Formulas (2.1) and (2.2), where xi denotes the i-th data point in the k-th cluster Ck, μk denotes the mean of the data points in Ck, nk denotes the number of data points in Ck, μ denotes the mean of all data points, and n denotes the total number of data points.
To calculate the projection matrix W, LDA maximizes the objective function J (Formula (2.3)). Data points can then be projected by W into a d-dimensional subspace that captures discriminative features. In this study, d was fixed to 3 by default.
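As a reference, the sketch below computes Sw, Sb and a projection matrix W that maximizes the standard multi-class LDA objective via the eigenvectors of Sw^-1 Sb; it follows the definitions above rather than reproducing Formulas (2.1)-(2.3) verbatim, and the small regularization term is an added assumption for numerical stability.

```python
import numpy as np

def lda_project(X, labels, d=3, reg=1e-6):
    """Project spikes X (n x p) onto the d directions that maximize the
    LDA objective: large between-class and small within-class scatter."""
    mu = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))                # within-class scatter (cf. (2.1))
    Sb = np.zeros((p, p))                # between-class scatter (cf. (2.2))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        diff = (mk - mu)[:, None]
        Sb += len(Xk) * (diff @ diff.T)
    # maximize J(W) via the leading eigenvectors of Sw^-1 Sb (cf. (2.3))
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(p), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:d]]      # p x d projection matrix W
```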
3. Clustering features based on Density Peaks
The principle of the Density Peaks algorithm (DP) [36] is very simple. For each data point, two quantities are calculated: the local density ρ of the point and the minimum distance δ to any data point with a larger local density. The DP algorithm assumes that a cluster center satisfies two conditions: (1) its local density ρ is high; and (2) it is far away from any point with a larger local density. That is, a cluster center has large values of both ρ and δ. After the cluster centers are identified, the remaining data points are allocated according to the following principle: each point falls into the same cluster as its nearest neighbor with higher local density, n_up.
In this study, a Gaussian kernel is adopted to calculate the local density. The local density ρ of the i-th point is given in Formula (2.4), where dij denotes the Euclidean distance between samples yi and yj, as shown in Formula (2.5), and dc denotes the cutoff distance. In this study, we define the cutoff distance by selecting a value from the sample distances sorted in ascending order, d_sort, as shown in Formula (2.6), where t is the cutoff distance index, which as a rule of thumb generally ranges from 0.01 to 0.02, and f(·) denotes the rounding function.
The minimum distance δ and the nearest neighbor point n_up are calculated in Formulas (2.7) and (2.8), where ρmax denotes the maximum local density. To automate the search for cluster centers, for which both ρ and δ are large [36], we define the DP index λ as the product of ρ and δ, as shown in Formula (2.9). The algorithm selects the K data points with the largest DP index as the cluster centers. If the data are randomly distributed, the distribution of λ follows a monotonically decreasing power law, and the DP index of the cluster centers is markedly higher, which makes it feasible to select cluster centers according to λ. As the iterations proceed, the difference in DP index between centers and non-centers increases. Once K is determined, the method can run automatically. Since K denotes the initial number of clusters, its default value in this study is 4.
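For reference, the quantities above can be written as follows. This is a plausible rendering based on the textual definitions and the standard DP formulation [36], not a verbatim reproduction of Formulas (2.4)-(2.9); in particular, M denotes the number of pairwise distances and is our notation.

\[
\rho_i = \sum_{j \ne i} \exp\!\left[-\left(\frac{d_{ij}}{d_c}\right)^{2}\right], \qquad
d_{ij} = \lVert y_i - y_j \rVert_2 ,
\]
\[
d_c = d_{\mathrm{sort}}\big(f(t \cdot M)\big), \qquad
\delta_i =
\begin{cases}
\min_{j:\,\rho_j > \rho_i} d_{ij}, & \rho_i < \rho_{\max} \\
\max_{j} d_{ij}, & \rho_i = \rho_{\max}
\end{cases}
\]
\[
n\_up_i = \operatorname*{arg\,min}_{j:\,\rho_j > \rho_i} d_{ij}, \qquad
\lambda_i = \rho_i \, \delta_i .
\]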
For a dataset X containing n data points, given the cutoff distance index t and the initial number of clusters K, the flow of the DP clustering algorithm is as follows (a minimal code sketch is given after the steps):
step1: Calculate the distance between every two data points dij(i,j = 1,2,… n, i < j), as shown in Formula (2.5).
step2: Calculate the cutoff distance dc as shown in Formula (2.6).
step3: For each data point, calculate the local density ρ, the minimum distance δ, and nearest neighbor point n_up, as shown in Formula (2.4) (2.7) and (2.8).
step4: For each data point, calculate the DP index λ, as shown in Formula (2.9).
step5: Select cluster centers: the point with the largest λ is the center of Cluster 1, the point with the second-largest λ is the center of Cluster 2, and so on to get K centers.
step6: Classify non-center points: rank the non-center points in descending order of local density ρ and traverse them in that order; the label Li of the i-th point is then assigned as in Formula (2.10), where Ln_upi denotes the cluster label of the nearest neighbor point of the i-th point.
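A compact sketch of this flow is given below, assuming feature points Y in the LDA subspace. The parameter names and the Gaussian-kernel density follow the descriptions above, while details such as tie handling are simplified.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dp_cluster(Y, K=4, t=0.015):
    """Density Peaks clustering of feature points Y (n x d): the K points
    with the largest lambda = rho * delta are the centers; every other point
    inherits the label of its nearest higher-density neighbour, visited in
    order of decreasing density."""
    n = Y.shape[0]
    D = squareform(pdist(Y))                        # step 1: pairwise distances
    dists = np.sort(D[np.triu_indices(n, k=1)])     # ascending distances d_sort
    dc = dists[int(round(t * len(dists)))]          # step 2: cutoff distance
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # step 3: Gaussian-kernel density
    order = np.argsort(rho)[::-1]                   # points by decreasing density
    delta = np.full(n, D.max())
    n_up = np.full(n, order[0])
    for rank, i in enumerate(order[1:], start=1):
        higher = order[:rank]                       # points denser than point i
        j = higher[np.argmin(D[i, higher])]
        delta[i], n_up[i] = D[i, j], j              # minimum distance & neighbour
    lam = rho * delta                               # step 4: DP index
    centers = np.argsort(lam)[::-1][:K]             # step 5: K largest lambda
    labels = np.full(n, -1)
    labels[centers] = np.arange(K)
    for i in order:                                 # step 6: descending density
        if labels[i] < 0:
            labels[i] = labels[n_up[i]]
    return labels
```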
Here, we demonstrate the process using the testing set C1_005 (see Evaluation). In the feature extraction step, LDA finds the feature subspace with the best clustering discrimination through continuous iteration (Figure 1a). For each data point, our algorithm calculates its local density (ρ) and the minimum distance (δ) to any data point with a larger local density. As described in the previous section, cluster centers are the points whose ρ and δ are both relatively large. DP screens the cluster centers using the DP index λ defined above, the product of ρ and δ. Figure 1b shows a schematic diagram with ρ and δ as the horizontal and vertical axes for the case of screening three cluster centers. The screened centers are the three points with the largest λ; they are circled in three colors corresponding to the three clusters. The DP index is effective for screening center points, since the screened points all have large ρ and δ values. As a result, Density Peaks clustering obtains the three clusters (Figure 1c), and Figure 1d shows the waveform of each cluster center.
4. Determining the number of clusters
The last step of the algorithm is cluster merging, through which the number of clusters is determined. The purpose of cluster merging is to avoid similar clusters being over-split; after the merging step, the number of clusters is determined automatically. The number of clusters is a critical parameter required by many spike sorting algorithms, but manually setting unknown parameters in advance relies heavily on the experience of operators and may cause problems in practice. The merging step is therefore of great importance: without it, whether the number of clusters is estimated too high or too low, performance declines sharply. Cluster merging finds clusters that are sufficiently similar, combines them, and repeats the process. Since the number of clusters is uncertain, a threshold is used as the stopping condition: once the similarity between the most similar clusters falls below the threshold, merging stops, and the number of clusters is thus determined automatically.
The similarity between clusters can be measured in several ways; common distance metrics include the Minkowski distance, the cosine distance and the inner product distance [37]. Following the Davies-Bouldin Index (DBI) [38-41], we define cluster similarity as the ratio R of the compactness CP to the separation SP.
The intra-class distance evaluates the internal compactness of a cluster. Thus, the compactness CP can be calculated by Formula (2.11), where CPk denotes the within-class distance of cluster Ck, yi denotes the i-th data point in the k-th cluster Ck, and yck denotes the center of Ck.
The inter-class distance evaluates the separation between clusters. Thus, the separation SP can be calculated by Formula (2.12), where SPab denotes the inter-class distance between clusters Ca and Cb, and yca and ycb denote the centers of Ca and Cb, respectively.
The similarity metric R is shown in Formula (2.13). If two clusters have high similarity, they are merged. We set the threshold Rth as a proportional function of the mean value of R, as shown in Formula (2.14), where α denotes the threshold coefficient. The threshold should be significantly higher than the mean value, and as a rule of thumb, α is above 1.4.
The flow of the cluster merging algorithm is as follows (a code sketch is given after the steps):
step1: Calculate the compactness CP for each cluster, as shown in Formula (2.11)
step2: Calculate the separation SPab (a, b = 1,2, …K, a < b) for every two clusters, as shown in Formula (2.12)
step3: Calculate similarity metric Rab for for every two clusters, as shown in Formula (2.13)
step4: Calculate the threshold Rth, as shown in Formula (2.14)
step5: Find the maximum similarity Rab. If Rab > Rth, merge cluster a and cluster b, set the center of cluster a as the new center, set K = K - 1, and return to step 1; otherwise, stop merging.
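The sketch below illustrates the merging loop under the assumption that R takes the usual Davies-Bouldin form R_ab = (CP_a + CP_b) / SP_ab; for simplicity, cluster centers are recomputed from the merged points at each pass rather than keeping cluster a's old center.

```python
import numpy as np

def merge_clusters(Y, labels, alpha=1.6):
    """Merge over-split clusters: repeatedly combine the most similar pair
    while its similarity R exceeds the threshold Rth = alpha * mean(R)."""
    labels = labels.copy()
    while True:
        ids = np.unique(labels)
        if len(ids) < 2:
            break
        centers = {k: Y[labels == k].mean(axis=0) for k in ids}
        cp = {k: np.linalg.norm(Y[labels == k] - centers[k], axis=1).mean()
              for k in ids}                                    # compactness (2.11)
        pairs, R = [], []
        for a_i, a in enumerate(ids):
            for b in ids[a_i + 1:]:
                sp = np.linalg.norm(centers[a] - centers[b])   # separation (2.12)
                pairs.append((a, b))
                R.append((cp[a] + cp[b]) / sp)                 # similarity (2.13), assumed DBI form
        R = np.array(R)
        r_th = alpha * R.mean()                                # threshold (2.14)
        best = int(np.argmax(R))
        if R[best] <= r_th:
            break                                              # no pair similar enough
        a, b = pairs[best]
        labels[labels == b] = a                                # merge b into a, K = K - 1
    return labels
```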
Notably, the threshold coefficient α is a key parameter that largely affects the merging results. Figure 2a shows the influence of α on algorithm performance (accuracy) on Dataset A (see Evaluation). The mean accuracy is highest when α = 1.6; we therefore adopted α = 1.6 as an appropriate value that achieves generally optimal performance across datasets, and fixed α at 1.6 in the subsequent evaluation.
To visualize the effect of the merging step, we selected the testing set C1_020 (see Evaluation) for illustration. We obtained the threshold Rth from Formula (2.14) with α = 1.6. The number of clusters is 4 before merging (Figure 2b) and 3 after merging (Figure 2c). Figures 2d-e show the similarity between every pair of clusters measured by DBI [37] before and after merging, respectively. It is worth noting that the similarity between clusters 3 and 4 is above the threshold (threshold = 0.99) before merging (Figure 2d), while all cluster similarities are below the threshold (threshold = 0.73) when the merging step finishes (Figure 2e). Thus, our proposed algorithm can automatically determine the number of clusters through the cluster merging step.
Evaluation
1. Datasets
Spike waveform data with cluster information are generally obtained in two ways. One is simulated data, which allows algorithm performance to be quantified and different algorithms to be compared. The other is in-vivo extracellular recordings, which capture the variability inherent in spike waveforms that simulated data lack.
Dataset A: simulated dataset wave_clus
In this study, we used the widely used simulated dataset wave_clus provided by Quiroga et al. [19]. In this simulation, spike trains have a Poisson distribution of interspike intervals, and the noise has a power spectrum similar to that of the spikes. In addition, the spike overlap, electrode drift and burst discharges encountered under real conditions are simulated. To date, wave_clus has been used by many spike sorting studies to evaluate sorting performance.
Dataset A contains four sets of data: C1, C2, C3 and C4. Each testing set contains three distinct spike waveform templates; the template similarity differs significantly across sets (C2, C3 and C4 > C1), and the background noise levels are expressed as standard deviations: 0.05, 0.10, 0.15 and 0.20 for C1, C2, C3 and C4, plus 0.25, 0.30, 0.35 and 0.40 for C1. Both similarity and noise level affect classification performance. In this study, the correlation coefficient (CC) was used to evaluate the similarity of spike waveforms: the higher the correlation of two templates, the higher the similarity of the waveforms and the harder it is to distinguish the two clusters.
According to the spike time information, waveforms were extracted from the wave_clus dataset and then aligned. Each waveform lasts about 2.5 ms and is composed of 64 sample points, with the peak aligned to the 20th sample point.
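A sketch of this extraction and alignment step is shown below; the inputs signal and spike_times, the peak-search window, and the handling of edge spikes are illustrative assumptions.

```python
import numpy as np

def extract_aligned_waveforms(signal, spike_times, n_samples=64, peak_idx=19,
                              search=5):
    """Cut a 64-sample window around each spike time and align the peak to
    the 20th sample (index 19). `signal` is the filtered trace and
    `spike_times` are sample indices from detection (both assumed inputs)."""
    waveforms = []
    for t in spike_times:
        # refine the peak within a small search window around the event
        lo, hi = max(t - search, 0), min(t + search, len(signal))
        peak = lo + int(np.argmax(np.abs(signal[lo:hi])))
        start = peak - peak_idx
        if start < 0 or start + n_samples > len(signal):
            continue                               # skip spikes at the edges
        waveforms.append(signal[start:start + n_samples])
    return np.asarray(waveforms)                   # spikes x 64 matrix X
```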
Dataset B: public in-vivo real recordings HC1
HC1 is another publicly available in-vivo dataset, which provides the extracellular and intracellular signals from rat hippocampal neurons [41]. We used the synchronized intracellular recording as the label information of extracellular recording to obtain partial ground truth. The dataset d533101:6 contains the intracellular potential of a single neuron, while the dataset d533101:4 contains simultaneous waveforms of this single neuron as well as some other neurons. After detection and alignment, the spike waveform data can be used to evaluate the algorithm’s performance.
Raw data were filtered by a Butterworth bandpass filter (passband 300-5000 Hz), and spikes were detected by double thresholding (Formula (2.18)). A total of 3000 extracellular spikes were obtained from the extracellular recording (d533101:4) through double-threshold detection; the intracellular recording (d533101:6) had little noise, so single-threshold detection was adopted to obtain 849 intracellular action potentials. If the difference between an extracellular spike time and an intracellular peak time is within 0.3 ms, they are regarded as the same action potential [30]. After this analysis, we obtained 800 spikes in the extracellular recording that correspond to action potentials in the intracellular recording. We refer to these as the marked spikes, and the remaining 2200 spikes as unmarked spikes.
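The following sketch illustrates the detection and matching procedure under simplifying assumptions: the double threshold is taken as a symmetric ±k·σ pair with σ estimated from the median absolute deviation (Formula (2.18) is not reproduced here), and the sampling rate fs and refractory window are example values.

```python
import numpy as np

def detect_spikes(x, k=4.0, refractory=30):
    """Double-threshold detection sketch: flag samples crossing either a
    positive or negative threshold of +/- k*sigma (sigma from the MAD noise
    estimate); k and the refractory window (in samples) are assumptions."""
    sigma = np.median(np.abs(x)) / 0.6745
    crossings = np.flatnonzero((x > k * sigma) | (x < -k * sigma))
    events = []
    for c in crossings:                      # keep one event per refractory window
        if not events or c - events[-1] > refractory:
            events.append(c)
    return np.asarray(events)

def label_marked_spikes(extra_times, intra_times, fs=20000, window_ms=0.3):
    """Mark an extracellular spike if an intracellular peak occurs within
    0.3 ms of it; the sampling rate fs is an assumed example value."""
    win = int(window_ms * 1e-3 * fs)
    marked = np.array([np.any(np.abs(intra_times - t) <= win) for t in extra_times])
    return marked                            # True = marked spike, False = unmarked
```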
Dataset C: in-vivo real recordings from a non-human primate
We also compared the performance of the spike sorting algorithms on in-vivo recordings from the primary motor cortex of a macaque performing a center-out task, collected in our previous studies. The in-vivo data were recorded from a 96-channel Blackrock microelectrode array using a commercial data acquisition system (Blackrock Microsystems, USA). Testing sets were obtained from 30 stable channels, selected by assessing the stationarity of the spike waveforms and the interspike interval (ISI) distribution.
2. Performance measure metrics
One performance metric is the sorting accuracy, the percentage of detected spikes labeled correctly. For sample set D, the accuracy of classification algorithm f is defined as the ratio of the number of spikes correctly classified to the total number of spikes used for classification, as shown in Formula (2.19). Another metric is the Davies-Bouldin index (DBI) [37]. DBI is a common index for evaluating clustering quality and requires no prior information about the clusters. The similarity between clusters, denoted R, is used to quantitatively evaluate cluster quality: DBI takes the worst-case similarity for each cluster and averages it over the clusters, as shown in Formula (2.20), where K denotes the number of clusters. A small DBI indicates high clustering quality.
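The two metrics can be sketched as follows; matching predicted cluster labels to ground-truth labels by the best permutation is our assumption (Formula (2.19) itself does not specify the matching), and the DBI follows the standard Davies-Bouldin form.

```python
import numpy as np
from itertools import permutations

def sorting_accuracy(true_labels, pred_labels):
    """Fraction of spikes labeled correctly (cf. (2.19)), after matching
    predicted cluster ids to ground-truth ids by the best permutation
    (assumes a small number of clusters)."""
    t_ids, p_ids = np.unique(true_labels), np.unique(pred_labels)
    best = 0.0
    for perm in permutations(p_ids, len(t_ids)):
        mapping = dict(zip(perm, t_ids))
        mapped = np.array([mapping.get(p, -1) for p in pred_labels])
        best = max(best, np.mean(mapped == true_labels))
    return best

def davies_bouldin(Y, labels):
    """DBI (cf. (2.20)): for each cluster take the worst-case similarity R
    to any other cluster, then average over the K clusters; smaller is better."""
    ids = np.unique(labels)
    centers = np.array([Y[labels == k].mean(axis=0) for k in ids])
    cp = np.array([np.linalg.norm(Y[labels == k] - c, axis=1).mean()
                   for k, c in zip(ids, centers)])
    worst = []
    for a in range(len(ids)):
        r = [(cp[a] + cp[b]) / np.linalg.norm(centers[a] - centers[b])
             for b in range(len(ids)) if b != a]
        worst.append(max(r))
    return float(np.mean(worst))
```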
To evaluate performance on the real dataset HC1 with partial ground truth, we treated it as a binary classification problem. The classification results fall into four cases: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). We evaluated the algorithms in terms of the True Positive Rate (TPR), False Positive Rate (FPR) and False Negative Rate (FNR).
Results
In this study, our LDA-DP algorithm was compared with five other spike sorting methods on one simulated dataset and two real datasets, using several performance metrics. For comparison, we chose LDA-GMM [20], which uses the same feature extraction method as LDA-DP, and PCA-DP, which uses the same clustering method. We then selected two classic and widely used spike sorting algorithms, PCA-Km [13] and LE-Km [25], along with a recently proposed algorithm, GMMsort [29]. These algorithms are all unsupervised and automated, except that GMMsort requires some manual operation in the last step of clustering. In the comparison, the feature subspace dimension was fixed at 5 for GMMsort [29] (its default) and at 3 for the other algorithms [13,20,25].
1. Performance comparison in the simulated Dataset A
A good feature extraction method finds a low-dimensional feature subspace with a high degree of differentiation, which underpins the performance of the whole algorithm. Thus, we first compared the robustness of the different feature extraction methods.
As the noise level or waveform similarity increases, the feature points of different clusters gradually move closer in the feature subspace, the inter-class distance decreases and the boundaries blur, increasing classification difficulty. We therefore chose testing set C3, which has high waveform similarity, to compare the performance and noise resistance of the five feature extraction methods (PCA-Km and PCA-DP use the same feature extraction method, PCA).
When the noise level increases, the standard deviation of each waveform template increases (Waveforms column in Figure 3), making feature extraction more difficult. In this case, the feature points extracted by the LDA step in LDA-DP and LDA-GMM remain separately clustered, whereas, in contrast, the feature points from the other three methods overlap to some degree. Even under the worst condition, when the noise level rises to 0.20, the proposed LDA-DP algorithm has the least overlap among the five methods. This indicates that the features extracted by the LDA-DP algorithm are highly robust to noise and waveform similarity.
We examined the performance of the 6 algorithms (PCA-Km, LE-Km, PCA-DP, LDA-GMM, GMMsort, and LDA-DP) on Dataset A, excluding the overlapping spikes. To compare the robustness of each algorithm, two performance metrics were employed: sorting accuracy and cluster quality. For each testing set, 5-fold cross-validation was performed. For PCA-Km and LE-Km, the number of clusters was set to 3; for PCA-DP, LDA-GMM, GMMsort, and LDA-DP, the number of clusters was determined automatically.
Table 1 presents the average and the standard deviation (std) of the sorting accuracy. It is worth noting that the average sorting accuracy of LDA-DP on most of the testing sets is higher than that of the other methods. At the same time, LDA-DP also achieves a lower standard deviation of the average accuracy on most of the testing sets.
In testing set C1, where classification difficulty is low, most algorithms achieve high accuracy. As the noise level rises to 0.40, the sorting accuracy of the other five algorithms drops below 90%, but the accuracy of LDA-DP remains as high as 90.7%. Moreover, in testing sets C2, C3 and C4, when both the waveform similarity and the noise level increase, only LDA-DP and LDA-GMM maintain relatively high accuracy. These comparative results indicate that the two algorithms have great power to distinguish waveforms and are highly resistant to noise. LDA-DP is especially outstanding, steadily maintaining a high sorting accuracy (>85%). On average, the mean accuracy of LDA-DP reaches 96.2%, the highest among all 6 algorithms, and LDA-DP also achieves the lowest mean standard deviation (std = 4.5).
To further visualize the robustness of each algorithm to noise, we plotted the performance curves over four or eight noise levels on the 4 simulated datasets (C1, C2, C3 and C4). Figure 4a, c, e and g show the accuracy curves, while Figure 4b, d, f and h show the DBI curves. In Figure 4, as the noise level increases, the performance of all algorithms drops (sorting accuracy decreases and DBI increases). When the noise level is low, all 6 algorithms have high accuracy and low DBI, and the gaps between algorithms are small. However, as the noise level increases, the performance of PCA-Km, LE-Km, PCA-DP and GMMsort deteriorates, and in most cases LDA-DP performs better than LDA-GMM. Across all simulated data, LDA-DP maintains a high level of performance: the sorting accuracy is generally above 85% and the DBI is generally below 1.5, which is generally superior to the other algorithms and shows high robustness to noise.
We also compared robustness with respect to waveform similarity. On the right side of Figure 5a, the shapes of the three waveform templates are plotted for the four testing sets. The correlation coefficients (CC) of the three templates were used to measure the similarity level in each testing set (Figure 5a, left). The results indicate that the waveform similarity in C2, C3 and C4 is significantly higher than in C1 (Student's t test, p<0.01); thus the classification of C2, C3 and C4 is relatively more difficult. To show the performance differences intuitively, we plotted the accuracy and DBI of the 6 algorithms on the four testing sets C1_020, C2_020, C3_020 and C4_020, in which the waveform similarity varies while the noise level remains the same (Figure 5b-c). In Figure 5b, LDA-DP is superior to the other algorithms in sorting accuracy: with high waveform similarity, the accuracy of the other algorithms fluctuates, while that of LDA-DP is maintained. For the DBI (Figure 5c), the cluster quality of LDA-DP is also promising.
2. Performance comparison in the in-vivo Dataset B
To further evaluate the performance of our algorithm on in-vivo data, we compared LDA-DP and the other 5 algorithms on Dataset B. For PCA-Km and LE-Km, the number of clusters was manually set to 3; for PCA-DP, LDA-GMM, GMMsort, and LDA-DP, the number of clusters was determined automatically. Dataset B contains 800 marked spikes and 2200 unmarked spikes from rat hippocampal neurons; the marked spikes correspond to intracellular potentials and belong to the same cluster, while the clusters of the unmarked spikes are unknown. Figure 6a shows the two-dimensional feature subspace extracted by each method; the data points are grouped into three clusters in each subspace, and the Waveforms column shows the average spike waveforms of the three clusters obtained by LDA-DP. The figure suggests that the LDA method successfully extracts an optimal feature subspace thanks to the feedback from the clustering method (GMM or DP): the three clusters are much better differentiated in the LDA(GMM) and LDA(DP) subspaces than in those of the other methods. Comparing the cluster quality of each algorithm (Figure 6b), the DBI of LDA-DP is the smallest, indicating that LDA-DP has the highest cluster quality.
Using the partial ground truth from the intracellular potential, we analyzed the results of each algorithm and evaluated the True Positive Rate (TPR), False Positive Rate (FPR) and False Negative Rate (FNR) on Dataset B. The classification results of each algorithm are shown in Figure 6c, d. Compared with the other methods, LDA-DP has the minimum FNR, together with a relatively high TPR and a relatively low FPR. Although LE-Km has a slightly higher TPR and lower FPR, its FNR is dramatically higher. Thus LDA-DP is generally superior to the other algorithms, and our results indicate that LDA-DP also performs well on Dataset B.
3. Performance comparison in the in-vivo Dataset C
To test the robustness of the algorithms in a real case with more complex distribution characteristics, we also compared the performance of the 6 algorithms on the in-vivo Dataset C. As shown in Figure 7a, spikes from a typical channel (channel 54) in Dataset C present a messier distribution in the two-dimensional feature subspaces extracted by each method. The data points are colored by the sorting results and are grouped into four clusters. The Waveforms column shows the shapes of the four spike classes according to the sorting results of LDA-DP, indicating that these four classes have quite similar shapes. Such complications pose a great challenge to feature extraction. As Figure 7a suggests, the features from different clusters extracted by LDA(DP) are clearly more separable than those of all the other methods, allowing the subsequent clustering to be more accurate. In addition, the data points are nonspherically distributed in the LE subspace; since clustering methods such as K-means perform poorly at identifying nonspherical clusters, LE-Km may encounter difficulties when clustering these features. We then compared the cluster quality of the 6 algorithms using the spikes in this channel. For PCA-Km and LE-Km, the number of clusters was manually set to 4; for PCA-DP, LDA-GMM, GMMsort, and LDA-DP, the number of clusters was determined automatically. In Figure 7b, the DBI of LDA-DP is significantly lower than that of the other algorithms (* p<0.05, ** p<0.01, Kruskal-Wallis test), indicating higher cluster quality and better performance. Moreover, we conducted the comparison on all 30 channels in Dataset C. The results are shown in Figure 7c: the median DBI of LDA-DP is lower and, in general, LDA-DP has significantly higher cluster quality (* p<0.05, *** p<0.001, Kruskal-Wallis test). Thus, LDA-DP also demonstrates an outstanding robustness advantage on Dataset C, consistent with the results from the previous two datasets.
Discussion
In our study, five algorithms competed with our proposed LDA-DP on one simulated dataset and two real datasets. The LDA-DP exhibits high robustness on both simulated and real datasets. For the simulated data, the LDA-DP maintains an outstanding sorting accuracy and cluster quality, indicating high robustness to noise and waveform similarity. For the real data, the performance of LDA-DP also exceeds other algorithms when facing more complex data distributions.
In this study, the performance of LDA-DP and LDA-GMM is significantly better than that of the other 4 algorithms (Figures 4-7). This gap between the LDA-based and non-LDA methods may be due to the characteristics of the feature extraction methods: LDA [21] is supervised while the other methods are unsupervised. Through multiple iterations, LDA finds the optimal feature subspace based on the feedback provided by the clustering method, while unsupervised methods receive no such feedback. Therefore, the advantage of LDA-DP and LDA-GMM lies in the combination of the feature extraction method LDA with the clustering method (DP or GMM).
Moreover, the DP method [36] adopted in this study has further advantages. GMM [42], as its name implies, assumes that the data follow a Gaussian mixture distribution, and this sort of fitting often encounters difficulties in more complex situations where the data are not perfectly Gaussian distributed; several studies in other fields have encountered similar problems [43-47]. Since DP makes no assumption about the data distribution, LDA-DP performs better in both feature extraction and clustering when addressing real data with more complex characteristics. As the comparison on the in-vivo real Dataset C shows (Figure 7), LDA-DP performs better than LDA-GMM.
On the other hand, some classic clustering methods, such as K-means, specify the cluster centers and then assign each point to the nearest center [26], so they perform poorly on nonspherical data. In contrast, the DP algorithm is based on the assumption that a cluster center is surrounded by points of lower density and that cluster centers are relatively far apart. Under this assumption, DP identifies the cluster centers and assigns cluster labels to the remaining points, and can therefore adapt well to nonspherical distributions. This is one possible reason for the outstanding performance of LDA-DP.
In addition, LDA-DP is an automated algorithm. Although the values of some parameters may affect the final results, we can preset optimized values to avoid manual intervention during operation. For example, for the threshold coefficient α, we fixed α at 1.6 in this study and verified its high performance on one simulated dataset and two in-vivo real datasets. Although whether this value is optimal on all datasets needs to be tested with more data, the current evaluation demonstrates its general applicability.
Conclusion
By combining LDA and DP, we designed an automated and highly robust spike sorting algorithm. Based on the iteration of LDA and DP, the algorithm continuously improves the differentiation of the feature subspace and finally achieves high spike sorting performance. After evaluation on one simulated dataset and two in-vivo datasets, we conclude that LDA-DP meets the requirement of high robustness for sorting sparse spikes in cortical recordings.