bioRxiv
New Results

MEArec: a fast and customizable testbench simulator for ground-truth extracellular spiking activity

Alessio P. Buccino, Gaute T. Einevoll
doi: https://doi.org/10.1101/691642
Alessio P. Buccino
Centre for Integrative Neuroplasticity (CINPLA), University of Oslo, Oslo, Norway
For correspondence: alessiob@ifi.uio.no
Gaute T. Einevoll
Centre for Integrative Neuroplasticity (CINPLA), University of Oslo, Oslo, Norway
Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway

Abstract

When recording neural activity from extracellular electrodes, both in vivo and in vitro, spike sorting is a required and very important processing step that allows for the identification of single neurons’ activity. Spike sorting is a complex algorithmic procedure, and in recent years many groups have attempted to tackle this problem, resulting in numerous methods and software packages. However, validation of spike sorting techniques is complicated. It is an inherently unsupervised problem and it is hard to find universal metrics to evaluate performance. Simultaneous recordings that combine extracellular and patch-clamp or juxtacellular techniques can provide ground-truth data to evaluate spike sorting methods. However, their utility is limited by the fact that only a few cells can be measured at the same time. Simulated ground-truth recordings can provide a powerful alternative means to rank the performance of spike sorters. We present here MEArec, a Python-based software package which permits flexible and fast simulation of extracellular recordings.

MEArec allows users to generate extracellular signals on various customizable electrode designs and can replicate various problematic aspects for spike sorting, such as bursting, spatio-temporal overlapping events, and drifting. We expect MEArec will provide a common testbench for spike sorting development and evaluation, in which spike sorting developers can rapidly generate and evaluate the performance of their algorithms.

Introduction

Extracellular neural electrophysiology is one of the most widely used and important techniques to study brain function. It consists of measuring, with electrodes placed in the extracellular space, the electrical activity of the surrounding neurons. To communicate with each other, neurons generate action potentials, which can be identified in the recorded signals as fast potential transients called spikes.

Since electrodes can record the extracellular activity of several surrounding neurons, a processing step called spike sorting is needed. Historically, this has required manual curation of the data, which, in addition to being time consuming, also introduces human bias into data interpretation. In recent years, several automated spike sorters have been developed to alleviate this problem. Spike sorting algorithms [40, 28] attempt to separate the spike trains of different neurons (units) from the extracellular mixture of signals using a variety of approaches. After a pre-processing step that usually involves high-pass filtering and re-referencing of the signals to reduce noise, some algorithms first detect putative spikes above a detection threshold and then cluster the extracted and aligned waveforms in a lower-dimensional space [38, 41, 8, 22, 25]. Another approach consists of finding spike templates using clustering methods, and then recursively matching the templates to the recordings to find when a certain spike has occurred. These approaches are generally referred to as template matching [37, 43, 10]. Other approaches have been explored, including the use of independent component analysis [24, 3] and semi-supervised approaches [27].
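The detect-then-cluster front end described above can be sketched in a few lines. This is a minimal illustration, not any particular sorter's implementation: it assumes a single pre-processed channel, negative-going spikes, and a hypothetical `dead_time` (in samples) that prevents the same spike from being detected twice. The function names are hypothetical, and the clustering step is omitted.

```python
import numpy as np


def detect_spikes(trace, threshold, dead_time=30):
    """Sample indices where the trace first crosses below -threshold.

    Crossings closer than `dead_time` samples to the previous detection are
    ignored, so one spike is not detected several times.
    """
    below = trace < -threshold
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    spikes = []
    for idx in crossings:
        if not spikes or idx - spikes[-1] > dead_time:
            spikes.append(int(idx))
    return np.array(spikes, dtype=int)


def extract_waveforms(trace, spike_idx, n_before=20, n_after=40):
    """Cut fixed-length snippets around each spike; spikes too close to the edges are skipped."""
    keep = [i for i in spike_idx if i >= n_before and i + n_after <= len(trace)]
    if not keep:
        return np.empty((0, n_before + n_after))
    return np.array([trace[i - n_before:i + n_after] for i in keep])
```

The resulting snippets (one row per spike) are what a clustering-based sorter would then reduce in dimensionality, for example with PCA, and cluster.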

The recent development of high-density silicon probes, both for in vitro [2, 13] and in vivo applications [35, 26], poses new challenges for spike sorting [42]. The high electrode count calls for fully automatic spike sorting algorithms, as manually curating hundreds or thousands of channels becomes increasingly time consuming and unmanageable. Therefore, spike sorting algorithms need to be capable of dealing with a large number of units and dense probes. To address these requirements, the latest spike sorting software packages have been designed to be scalable and hardware-accelerated [37, 25, 43].

The evaluation of spike sorting performance is also not trivial. Spike sorting is unsupervised by definition, as the signals are only measured extracellularly, with no knowledge of the underlying spiking activity. A few ground-truth datasets exist, for example obtained by combining extracellular and patch-clamp or juxtacellular recordings [21, 19, 35, 43, 31, 1], but the main limitation of this approach is that only one or a few cells can be patched at the same time. This provides very limited ground-truth information compared with the number of neurons that can be recorded simultaneously with extracellular probes.

Biophysically detailed simulated data provide a powerful alternative and complementary approach to spike sorting validation [11]. In simulations, recordings can be built from known ground-truth data for all neurons, which allows one to precisely evaluate the performance of spike sorters. Simulators of extracellular activity should be able to replicate important aspects of spiking activity that can be challenging for spike sorting algorithms, including bursting modulation, spatio-temporal overlap of spikes, unit drifting over time, as well as realistic noise models. Moreover, they should allow users to have full control over these features and they should be efficient and fast.
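As an illustration of how ground truth enables precise evaluation, the sketch below matches a sorted unit's spike times against the ground-truth spike times of one neuron and reports accuracy, precision, and recall. The 0.4 ms matching tolerance, the greedy two-pointer matching, and the function names are assumptions of this sketch; accuracy is taken as TP / (TP + FN + FP), a definition commonly used in spike sorting comparisons.

```python
import numpy as np


def match_spike_trains(gt_times, sorted_times, tolerance=0.4):
    """Greedily pair ground-truth and sorted spike times (both in ms).

    Two spikes match if they are at most `tolerance` ms apart; each spike is
    used at most once. Returns (true positives, misses, false positives).
    """
    gt = np.sort(np.asarray(gt_times, dtype=float))
    st = np.sort(np.asarray(sorted_times, dtype=float))
    i = j = tp = 0
    while i < len(gt) and j < len(st):
        if abs(gt[i] - st[j]) <= tolerance:
            tp += 1
            i += 1
            j += 1
        elif gt[i] < st[j]:
            i += 1  # ground-truth spike without a partner: a miss
        else:
            j += 1  # sorted spike without a partner: a false positive
    return tp, len(gt) - tp, len(st) - tp


def unit_performance(gt_times, sorted_times, tolerance=0.4):
    tp, fn, fp = match_spike_trains(gt_times, sorted_times, tolerance)
    return {
        "accuracy": tp / (tp + fn + fp) if tp + fn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


print(unit_performance([10, 20, 30, 40], [10.2, 20.1, 35, 40.3]))
```

For these toy inputs, three of the four ground-truth spikes are matched, giving an accuracy of 0.6 and a precision and recall of 0.75. Greedy matching is a simplification; an optimal assignment can differ when many spikes fall within the tolerance window.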

In recent years, there have been a few projects aiming at developing neural simulators for benchmarking spike sorting methods [6, 18, 33]. Camunas et al. developed NeuroCube [6], a MATLAB-based simulator which combines biophysically detailed cell models and synthetic spike trains (a so-called “hybrid approach”) to simulate the activity of neurons close to a recording probe, while noise is simulated by the activity of distant neurons. NeuroCube is very easy to use, with a simple and intuitive graphical user interface (GUI). The user has direct control over parameters such as the rate of active neurons, their firing rate properties, and the duration of the recordings. The cell models are shipped with the software and recordings can be simulated on a single electrode or a tetrode. It is relatively fast, but the cell model simulations (using NEURON [7]) are re-run for every recording.

Hagen et al. developed ViSAPy [18], a Python-based simulator that uses multi-compartment simulations of single neurons to generate spikes, network modeling of point neurons in NEST [9] to generate synaptic inputs onto the spiking neurons, and experimentally fitted noise. ViSAPy does not use a hybrid approach, as it runs a full network simulation in NEURON [7] and computes the extracellular potentials using LFPy [29, 17]. ViSAPy implements a Python application programming interface (API) which allows the user to set multiple parameters for the network simulation providing the synaptic input, the probe design, and the noise model generator. Cell models can be freely chosen and loaded using the LFPy package. Further, 1-dimensional drifting can be incorporated in the simulations by shifting the electrodes over time [12]. Learning to use the software and, in particular, tailoring specific properties of the resulting spike trains, for example burstiness, requires some effort by the user. As running NEURON simulations with biophysically detailed neurons can be computationally expensive, generating long-duration spike sorting benchmarking data with ViSAPy benefits greatly from access to powerful computers.

Mondragon et al. developed a Neural Benchmark Simulator (NBS) [33] extending the NeuroCube software. NBS adds support for user-specified probes to NeuroCube, and it combines the spiking activity signals (from NeuroCube) with low-frequency activity signals and artifact libraries shipped with the code. The user can set weight parameters to assemble the spiking, low-frequency, and artifact signals, but the three signal types themselves cannot be modified.

Despite the existence of such tools for generating benchmarking data, their use in the spike sorting literature has so far been limited, and the benchmarking and validation of spike sorting algorithms has remained non-standardized and unsystematic. A natural question is thus how best to encourage the use of such benchmarking tools in the spike sorting community.

From a spike sorting developer perspective, we argue that an ideal extracellular simulator should be i) fast, ii) controllable, iii) biophysically detailed, and iv) easy to use. A fast simulator would enable spike sorter developers to generate a large and varied set of recordings to test their algorithms against and to improve their spike sorting methods. Controllability refers to the possibility to have direct control of features of the simulated recordings. The ideal extracellular spike simulator should include the possibility to use different cell models and types, to decide the firing properties of the neurons, to control the temporal and spatio-temporal synchrony of extracellular spikes, to generate recordings on different probe models, and to have full reproducibility of the simulated recordings. A biophysically detailed simulator should be capable of reproducing key physiological aspects of the recordings, including, but not limited to, bursting spikes, drifting between the electrodes and the neurons, and realistic noise profiles. Finally, to maximize the ease of use, the ideal extracellular simulator should be designed as an accessible and easy to learn software package. Preferably, the tool should be implemented with a graphical user interface (GUI), a command line interface (CLI), or with a simple application programming interface (API).

With these principles in mind, we present here MEArec, an open-source Python-based simulator. MEArec provides a fast, highly controllable, biophysically detailed, and easy-to-use framework to generate simulated extracellular recordings. In addition to producing benchmark datasets, we developed MEArec as a tool that can serve as a testbench for optimizing existing and novel spike sorting methods. To this end, MEArec allows users to explore how several aspects of recordings affect spike sorting, with full control of challenging features such as bursting activity, drifting, spatio-temporal synchrony, and noise, so that spike sorter developers can use it to guide their algorithm design. Moreover, MEArec has extensive documentation (https://mearec.readthedocs.io/) and the code is tested with a continuous integration platform.

Results

Getting started with MEArec: a simple tetrode dataset generation

One of the key goals of MEArec is to make the simulation of extracellular recordings easy and fully reproducible. To demonstrate this, we first show and break down the commands used to generate a simple tetrode recording, which we will further characterize in the rest of the paper.

At installation, MEArec comes with 13 layer 5 cortical cell models from the Neocortical Microcircuit Portal [39, 4]. This enables the user to start simulating without the need to download and compile cell models. In addition, this initial set of cell models can easily be extended by downloading more cell models and placing them in the cell models folder.

To generate 30 extracellular spikes (also referred to as templates) per cell model recorded on a tetrode probe, the user can simply run this command:

>> mearec gen-templates -prb tetrode-mea-l -n 30 --seed 0
…
Saved templates in path-to-templates-file.h5

The -prb option selects the probe model, -n controls the number of templates to generate per cell model, and --seed ensures reproducibility; if no seed is provided, a random one is chosen. In both cases, the seed is saved in the HDF5 output file, so that the same templates can be exactly replicated.

Once the templates are generated, recordings can be generated as follows:

>> mearec gen-recordings -t path-to-templates-file.h5 -d 30 -ne 4 -ni 2 --st-seed 0 --temp-seed 1 --noise-seed 2
…
Saved recordings in path-to-recordings-file.h5

The gen-recordings command combines randomly generated spike trains with templates selected from 4 excitatory cells (-ne 4) and 2 inhibitory cells (-ni 2); inhibitory cells usually have a narrower spike waveform and a higher firing rate. The duration of the output recordings is 30 seconds (-d 30). In this case, three random seeds control the spike train generation (--st-seed 0), the template selection (--temp-seed 1), and the noise generation (--noise-seed 2). Figure 1 shows one second of the generated recordings (A), the extracted waveforms and the mean waveforms for each unit on the electrode with the largest peak (B), and the principal component analysis (PCA) projections of the waveforms on the tetrode channels (C).
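The spike trains themselves are random. As a rough illustration of what such generation can look like, the sketch below draws Poisson spike trains with an absolute refractory period, using a higher rate for the inhibitory cells. The rates (5 and 15 Hz), the 2 ms refractory period, and the function name are hypothetical; MEArec's own spike train model and defaults are described in its Materials and Methods and documentation, and may differ.

```python
import numpy as np


def poisson_spike_train(rate_hz, duration_s, refractory_s=0.002, seed=None):
    """Spike times (s) of a Poisson process with an absolute refractory period.

    Each inter-spike interval is the refractory period plus an exponential
    interval with mean 1/rate_hz, so the effective rate is slightly lower
    than rate_hz.
    """
    rng = np.random.default_rng(seed)
    times = []
    t = rng.exponential(1.0 / rate_hz)
    while t < duration_s:
        times.append(t)
        t += refractory_s + rng.exponential(1.0 / rate_hz)
    return np.array(times)


# Hypothetical rates: inhibitory units fire faster than excitatory ones.
duration = 30.0
exc_trains = [poisson_spike_train(5.0, duration, seed=s) for s in range(4)]
inh_trains = [poisson_spike_train(15.0, duration, seed=100 + s) for s in range(2)]
```

Passing an explicit seed per train makes every train reproducible, mirroring the role of --st-seed in the command above.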

Figure 1:

Example of simulated tetrode recording. (A) One second of the recording timeseries on the four tetrode channels. The templates of the different units are overlaid on the recording traces in different colors. (B) Extracted waveforms on the channel with the largest amplitude for the six units in the recording. (C) PCA projections on the first two principal components for the four tetrode channels. Each color corresponds to a neuron. The diagonal plots display the histograms of the PC projections on the corresponding channel.

MEArec also implements a convenient Python API, which is called internally by the CLI commands. For example, the following snippet of code implements the same commands shown above for generating templates and recordings:

[Code listing: Python API equivalent of the gen-templates and gen-recordings commands above; shown as an image in the original.]

Moreover, the Python API implements plotting functions to visually inspect the simulated templates and recordings. For example, Figure 1 panels were generated using the plot_recordings() (A), plot_waveforms() (B), and plot_pca_map() (C) functions.

MEArec overview

Having shown how to generate a recording in MEArec, we now give an overview of the software (Figure 2). The simulation is split into two phases: templates generation (Figure 2A) and recordings generation (Figure 2B).

Figure 2:

Overview of the MEArec software. The simulation is divided into two phases: templates generation and recordings generation. (A) The templates generation phase is split into an intracellular and an extracellular simulation. The intracellular simulation computes, for each available cell model, the transmembrane currents generated by several action potentials. In the extracellular simulation, each cell model is randomly moved and rotated several times and the stored currents are loaded into the model to compute the extracellular action potential, building a template library. (B) The recordings generation phase combines templates selected from the template library and randomly generated spike trains. Selected templates are pre-processed before a customized convolution with the spike trains. Noise is added to the output of the convolution, and the recordings can optionally be filtered.

Templates (or extracellular action potentials) are generated using biophysically realistic cell models which are positioned in the surroundings of a probe model. The templates generation output is a library of a large variety of extracellular templates, which can then be used to build the recordings. The templates generation phase is the most time consuming, but the same templates library can be used to generate a virtually infinite number of different recordings.
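The extracellular step of templates generation rests on volume-conductor theory. A minimal sketch, assuming an infinite homogeneous medium and the point-source approximation φ(r) = Σₙ Iₙ / (4πσ|r - rₙ|), shows why the intracellular simulation only needs to run once: moving or rotating the cell changes only the compartment positions, while the stored currents are reused. The toy two-compartment geometry, the conductivity value, and the function name are hypothetical; simulators such as LFPy also provide the more accurate line-source approximation, and this sketch is not MEArec's implementation.

```python
import numpy as np


def point_source_potential(seg_pos, currents, elec_pos, sigma=0.3):
    """Extracellular potential (mV) of compartment currents, point-source approximation.

    seg_pos:  (n_seg, 3) compartment midpoints in um
    currents: (n_seg, n_t) transmembrane currents in nA
    elec_pos: (n_elec, 3) electrode positions in um
    sigma:    extracellular conductivity in S/m
    With nA and um, I / (4 * pi * sigma * r) comes out in mV.
    """
    diff = elec_pos[:, None, :] - seg_pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_elec, n_seg)
    dist = np.maximum(dist, 1.0)  # clip to 1 um to avoid the 1/r singularity
    mapping = 1.0 / (4.0 * np.pi * sigma * dist)
    return mapping @ currents  # (n_elec, n_t)


# Moving the cell only changes seg_pos; the stored currents are reused.
seg_pos = np.array([[0.0, 0.0, 0.0], [0.0, 50.0, 0.0]])  # toy soma + dendrite
currents = np.array([[-1.0, 0.5], [1.0, -0.5]])           # balanced sink/source (nA)
elec_pos = np.array([[10.0, 0.0, 0.0], [10.0, 50.0, 0.0]])
v_here = point_source_potential(seg_pos, currents, elec_pos)
v_shifted = point_source_potential(seg_pos + [40.0, 0.0, 0.0], currents, elec_pos)
```

The currents in each column sum to zero, as required by current conservation; moving the toy cell 40 µm away from the electrodes shrinks the recorded amplitude.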

Recordings are then generated by combining templates selected with user-defined rules (based on minimum distance between neurons, amplitudes, spatial overlaps, and cell-types) and by simulating spike trains. Selected templates and spike trains are assembled using a customized (or modulated) convolution, which can replicate interesting features of spiking activity such as bursting and drifting. After convolution, additive noise is generated and added to the recordings. Finally, the output recordings can be optionally filtered with a band-pass or a high-pass filter. For a full description of the templates and recordings generation, please refer to the Materials and Methods section.
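Before any modulation, the core of recordings generation is a linear superposition: each unit's template is added to the recording at each of its spike times, which is the same as convolving a train of unit impulses with the template. The sketch below illustrates this unmodulated case, assuming integer spike sample indices and aligning the template onset (rather than its peak) to the spike. It is a simplified stand-in for MEArec's customized convolution, which additionally applies per-spike modulation; the function names are hypothetical.

```python
import numpy as np


def convolve_templates(spike_idx, templates, n_samples):
    """Sum shifted copies of each unit's template at its spike samples.

    spike_idx: list of integer arrays of spike sample indices, one per unit
    templates: (n_units, n_elec, n_tpl) array
    Templates running past the end of the recording are truncated.
    """
    _, n_elec, n_tpl = templates.shape
    rec = np.zeros((n_elec, n_samples))
    for unit, spikes in enumerate(spike_idx):
        for s in np.asarray(spikes, dtype=int):
            if s < 0 or s >= n_samples:
                continue  # spike outside the recording
            stop = min(s + n_tpl, n_samples)
            rec[:, s:stop] += templates[unit, :, : stop - s]
    return rec


def simulate_recording(spike_idx, templates, n_samples, noise_std, seed=0):
    """Convolution followed by additive, uncorrelated Gaussian noise."""
    rng = np.random.default_rng(seed)
    clean = convolve_templates(spike_idx, templates, n_samples)
    return clean + rng.normal(0.0, noise_std, size=clean.shape)
```

Setting `noise_std=0` returns the noiseless superposition; the optional band-pass or high-pass filtering step would be applied to the output of `simulate_recording`.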

MEArec is designed to allow for full customization, transparency, and reproducibility of the simulated recordings. Parameters for the templates and recordings generation are accessible to the user and documented, so that different aspects of the simulated signals can be finely tuned (see Materials and Methods for a list of parameters and their explanation). Moreover, the command line interface (CLI) and the simple Python API enable the user to easily modify parameters, customize, and run simulations.

Finally, MEArec allows users to manually set the random seeds used by the simulator, making recordings fully reproducible. This feature also enables one to study how individual characteristics of the recordings affect spike sorting performance. As an example, we will show in the next sections how to simulate recordings that share all parameters, and hence have exactly the same spiking activity, but differ in noise level or drifting velocity.
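One way to obtain this kind of isolation is to give each random aspect of the simulation its own seeded generator, so that changing one parameter never shifts the random stream used by another. The toy sketch below, with hypothetical names and a one-sample "spike" shape, keeps the spike trains identical while the noise level changes; it illustrates the principle and is not MEArec's internal code.

```python
import numpy as np


def make_recording(noise_level, st_seed=0, noise_seed=2, n_samples=1000, n_spikes=20):
    """Toy single-channel recording with one independent generator per aspect.

    Spike times and noise come from separate seeded generators, so changing
    noise_level leaves the spike times untouched.
    """
    st_rng = np.random.default_rng(st_seed)        # spike trains only
    noise_rng = np.random.default_rng(noise_seed)  # noise only
    spikes = np.sort(st_rng.choice(n_samples, size=n_spikes, replace=False))
    signal = np.zeros(n_samples)
    signal[spikes] = -50.0                         # toy one-sample "spikes"
    noise = noise_rng.standard_normal(n_samples)
    return spikes, signal + noise_level * noise


spikes_low, rec_low = make_recording(noise_level=5.0)
spikes_high, rec_high = make_recording(noise_level=20.0)
```

Both recordings contain exactly the same spikes and the same noise realization, scaled differently, so any change in sorting performance between them can be attributed to the noise level alone.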

Generation of realistic Multi-Electrode Array recordings

The recent development of Multi-Electrode Arrays (MEAs) enables researchers to record extracellular activity at very high spatio-temporal density, both for in vitro [2, 13] and in vivo applications [35, 26]. The large number of electrodes and their high density pose challenges for spike sorting algorithms. It is therefore important to be able to simulate recordings from these kinds of neural probes.

To deal with different probe designs, MEArec uses another Python package, MEAutility (https://meautility.readthedocs.io/), which allows users to easily import several available probe models and to define custom probe designs. Among others, MEAutility includes Neuropixels probes [26], commercial Neuronexus probes (http://neuronexus.com/products/neural-probes/), and a wide variety of square MEA designs with different contact densities (the list of available probes can be found in Appendix A).
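At its core, a probe definition is mostly a set of electrode coordinates, plus contact shape and size. As an illustration, the hypothetical helper below builds the contact positions of a square MEA, such as a 10×10 grid with 15 µm pitch, centred on the origin. It does not use MEAutility, whose probe-definition format is not reproduced here.

```python
import numpy as np


def square_mea_positions(n_side, pitch_um):
    """(x, y) coordinates in um of an n_side x n_side electrode grid centred on the origin."""
    coords = (np.arange(n_side) - (n_side - 1) / 2.0) * pitch_um
    xx, yy = np.meshgrid(coords, coords, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel()])


positions = square_mea_positions(10, 15.0)  # a 10 x 10 grid with 15 um pitch
```

Distances between these positions are what a distance-dependent quantity, such as a spatially correlated noise model, would be computed from.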

As in the tetrode example, we first have to generate templates for the probe. The following commands generate templates and recordings for a Neuropixels design with 128 electrodes (Neuropixels-128). The recordings contain 60 neurons, 48 excitatory and 12 inhibitory. With similar commands, we generated templates and recordings for a Neuronexus probe with 32 channels (A1×32-Poly3-5mm-25s-177-CM32, Neuronexus-32) with 20 cells, and for a square 10×10 MEA with 15 µm inter-electrode distance (SqMEA-10-15) with 50 cells.

>> mearec gen-templates -prb Neuropixels-128 -n 100 --seed 0
…
Saved templates in path-to-Neuropixels-templates-file.h5
>> mearec gen-recordings -t path-to-Neuropixels-templates-file.h5 -d 30 -ne 48 -ni 12 --st-seed 0 --temp-seed 1 --noise-seed 2
…
Saved recordings in path-to-Neuropixels-recordings-file.h5

Figure 3 shows the three above-mentioned probes (A), a sample template for each probe design (B), and one-second snippets of the three recordings (C, D, E), with zoomed-in windows highlighting the spiking activity.

Figure 3:

Generation of high-density multi-electrode array recordings. (A) Three of the available probes: a commercial Neuronexus probe (left), the Neuropixels probe (middle), and a high-density square MEA (right). (B) Sample templates for each probe design. (C-D-E) One-second snippets of recordings from the Neuronexus probe (C), the Neuropixels probe (D), and the square MEA probe (E). The highlighted windows display the activity over three adjacent channels and show how the same spikes are seen on multiple sites.

While all the recordings shown so far have been simulated with default parameters, several aspects of the spiking activity are critical for spike sorting. In the next sections, we will show how these features, including bursting, spatio-temporal overlapping spikes, drifting, and noise assumptions can be explored with MEArec simulations.

Bursting modulation of spike amplitude and shape

Bursting activity is one of the most complicated features of spiking activity and can compromise the performance of spike sorting algorithms. When a neuron bursts, i.e., fires repeated action potentials in rapid succession, the dynamics underlying spike generation change over the bursting period [20]. While the bursting mechanism has been extensively studied with patch-clamp experiments, combined extracellular-juxtacellular recordings [1] and computational studies [18] suggest that during bursting, extracellular spikes become lower in amplitude and wider in shape.

In order to simulate this property of the extracellular waveforms in a fast and efficient manner, templates are modulated both in amplitude and shape during the convolution operation, depending on the spiking history.

To demonstrate how bursting is replicated, we built a spike train with a constant 10 ms inter-spike interval (Figure 4A). A modulation value is computed for each spike and used to modulate the convolution operation for that event. The blue dots show the default modulation, in which the modulation values are drawn from a Gaussian distribution with unit mean. When bursting is enabled, the modulation value is computed as a sublinear power depending on the number of consecutive spikes in a burst and on the inter-spike interval (see Materials and Methods for details). The bursting events can be limited either by the maximum number of spikes making up a burst (orange: 5 spikes; green: 10 spikes) or by a maximum bursting duration (red: 75 ms).
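The exact bursting formula is given in the Materials and Methods and is not reproduced here. The sketch below uses a hypothetical stand-in with the same qualitative behavior: the k-th spike of a burst, fired after an inter-spike interval ISI, gets m = (1 + (k - 1) · isi_ref / ISI)^(-exponent), a sublinear power that reduces to k^(-1/2) when ISI = isi_ref and exponent = 0.5. A burst restarts once it has `max_burst_spikes` spikes or has lasted `max_burst_duration` seconds, and without bursting the values are drawn from 𝒩(1, 0.05²), as in Figure 4A. All names and defaults are hypothetical.

```python
import numpy as np


def modulation_values(spike_times, max_burst_spikes=None, max_burst_duration=None,
                      exponent=0.5, isi_ref=0.01, sigma=0.05, seed=0):
    """Per-spike modulation values (hypothetical stand-in for MEArec's formula).

    spike_times: sorted spike times in seconds.
    Without bursting limits, values are drawn from N(1, sigma**2).
    With bursting, the k-th spike of a burst after an inter-spike interval isi gets
    (1 + (k - 1) * isi_ref / isi) ** (-exponent); a new burst starts when the
    current one already has max_burst_spikes spikes or has lasted max_burst_duration s.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    if max_burst_spikes is None and max_burst_duration is None:
        rng = np.random.default_rng(seed)
        return rng.normal(1.0, sigma, size=len(spike_times))
    mod = np.ones(len(spike_times))
    n_in_burst, burst_start = 0, None
    for i, t in enumerate(spike_times):
        too_many = max_burst_spikes is not None and n_in_burst >= max_burst_spikes
        too_long = (max_burst_duration is not None and burst_start is not None
                    and t - burst_start >= max_burst_duration)
        if burst_start is None or too_many or too_long:
            n_in_burst, burst_start = 1, t  # this spike opens a new burst: m = 1
        else:
            n_in_burst += 1
            isi = t - spike_times[i - 1]
            mod[i] = (1.0 + (n_in_burst - 1) * isi_ref / isi) ** (-exponent)
    return mod


times = np.arange(30) * 0.01  # 300 ms of spikes every 10 ms, as in Figure 4A
mod_5_spikes = modulation_values(times, max_burst_spikes=5)
mod_75_ms = modulation_values(times, max_burst_duration=0.075)
```

With a constant 10 ms ISI and `max_burst_spikes=5`, the first five values are 1, 1/√2, 1/√3, 1/2, and 1/√5, and the sixth spike starts a new burst at 1. This stand-in never ends a burst because of a long pause between spikes; a practical version would also need such a rule.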

Figure 4:

Bursting behavior. (A) Modulation values for a sample spike train of 300 ms with constant inter-spike intervals of 10 ms. The blue dots show the modulation values for each spike when bursting is not activated: each value is drawn from a 𝒩(1, 0.05²) distribution. When bursting is activated, a bursting event can be limited by the maximum number of spikes (orange: 5 spikes; green: 10 spikes), or by the maximum bursting event duration (red: 75 ms). (B) Modulated templates. The blue lines show templates modulated in amplitude only. The orange and green lines display the same templates with added shape modulation. (C) Modulation in tetrode recordings. The top panel shows spikes in a one-second period. The middle panel displays the modulation values for those spikes. The bottom panel shows the modulated template on the electrode with the largest peak after convolution. (D-E) PCA projections on the first principal component for the tetrode recordings without bursting (D) and with bursting (E) enabled. Note that in both cases the PCA projections were computed from the waveforms without bursting. With bursting, the clusters become more spread out and harder to separate.

The modulation value controls the level of amplitude and shape modulation of the spike event. In Figure 4B, examples of bursting templates are shown. The blue traces display templates only modulated in amplitude, i.e. the amplitude is scaled by the modulation value. The orange and green traces, instead, also present shape modulation, which is achieved by stretching the time axis using a sigmoid transform. The sigmoid transform can be adjusted to have more (green) or less (orange) shape modulation.
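A minimal sketch of amplitude-plus-shape modulation of a single-channel template: the template is scaled by the modulation value m, and its time axis is stretched around the trough, more strongly for smaller m. For brevity, this sketch swaps MEArec's sigmoid transform for a plain linear stretch; `stretch_gain`, the toy Gaussian spike, and the function name are hypothetical.

```python
import numpy as np


def modulate_template(template, m, stretch_gain=0.5):
    """Scale a 1-D template by m and widen it in time (hypothetical linear stretch).

    The time axis is stretched around the trough by 1 + stretch_gain * (1 - m),
    so m = 1 returns the template unchanged and smaller m gives a lower,
    wider spike. MEArec uses a sigmoid transform instead of this linear stretch.
    """
    t = np.arange(len(template), dtype=float)
    trough = int(np.argmin(template))
    factor = 1.0 + stretch_gain * max(0.0, 1.0 - m)
    src = trough + (t - trough) / factor  # sample positions read from the original
    return m * np.interp(src, t, template)


x = np.arange(60)
template = -np.exp(-((x - 20) / 3.0) ** 2)  # toy negative spike
burst_spike = modulate_template(template, 0.5, stretch_gain=1.0)
```

Setting `stretch_gain=0` reduces this to the amplitude-only modulation shown by the blue traces in Figure 4B.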

Figure 4C shows a one-second snippet of the tetrode recording shown previously after bursting modulation is activated. The top panel shows the spike events, the middle one displays the modulation values, and the bottom panel shows the output of the modulated convolution between one of the templates (on the electrode with the largest amplitude) and the spike train.

Figure 4D and Figure 4E show the waveform projections on the first principal component for the tetrode recording shown previously without and with bursting, respectively. In this case, all neurons are bursting units, which stretches the clusters in PCA space and clearly complicates the task of spike sorting algorithms.
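Projections like those in Figure 4D and 4E can be computed with an SVD-based PCA. Following the note in the Figure 4 caption, the sketch fits the principal components on one set of waveforms (for example, the non-bursting ones) and then projects any other set onto those same components, so that both recordings are shown in the same space. The function names are hypothetical.

```python
import numpy as np


def fit_pca(waveforms, n_components=2):
    """Mean and top principal axes of waveforms with shape (n_spikes, n_features)."""
    mean = waveforms.mean(axis=0)
    # Rows of vt are orthonormal principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(waveforms - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(waveforms, mean, components):
    """Coordinates of waveforms in a previously fitted PCA space."""
    return (waveforms - mean) @ components.T
```

For example, fitting with `mean, comps = fit_pca(wf_no_burst)` and then calling `project(wf_burst, mean, comps)` (both arrays hypothetical) places the bursting waveforms in the non-bursting PCA space, where their spread along the first component becomes visible.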

Controlling spatio-temporal overlaps

Another complicated aspect of extracellular spiking activity that can influence spike sorting performance is the occurrence of overlapping spikes. While temporal overlapping of events on spatially separated locations can be solved with feature masking [41], spatio-temporal overlapping can cause a distortion of the detected waveform, due to the superposition of separate spikes. Some spike sorting approaches, based on template-matching, are designed to tackle this problem [37, 43, 10].

To evaluate to what extent spatio-temporal overlaps affect spike sorting, MEArec allows the user to set the number of spatially overlapping templates and to modify the synchrony rate of their spike trains. In Figure 5 we show an example of this on a Neuronexus-32 probe (see Figure 3A). The recording was constructed with two excitatory, spatially overlapping neurons, whose templates are shown in Figure 5A (see Materials and Methods for details on the definition of spatial overlap). The spike synchrony rate can be controlled with the sync_rate parameter. If this parameter is not set (Figure 5B, left), some spatio-temporal overlapping spikes are present (red events). If the synchrony rate is set to 0, those spikes are removed from the spike trains (Figure 5B, middle). If it is set to 0.05, i.e., 5% of the spikes are spatio-temporal collisions, events are added to the spike trains until the specified synchrony rate is reached (Figure 5B, right). As shown in Figure 5C, spatio-temporal overlapping events affect the recorded extracellular waveform: when spikes overlap, the waveforms of the neurons are summed, and the resulting waveform might be mistaken for a separate unit by spike sorting algorithms.
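The sketch below illustrates one possible way to measure, remove, and add spatio-temporal synchrony between the spike trains of two spatially overlapping units. It assumes that two spikes collide when they are within a 1 ms window, and it defines the synchrony rate as the fraction of all spikes (of both units) that take part in a collision. Both choices, and all function names, are assumptions for illustration; MEArec's own definition behind `sync_rate` is given in its Materials and Methods.

```python
import numpy as np


def collision_mask(a, b, window):
    """True for each spike time in `a` that has a spike of `b` within +/- window (s)."""
    a, b = np.asarray(a, float), np.sort(np.asarray(b, float))
    if len(b) == 0:
        return np.zeros(len(a), dtype=bool)
    idx = np.searchsorted(b, a)
    prev_gap = np.abs(a - b[np.clip(idx - 1, 0, len(b) - 1)])
    next_gap = np.abs(b[np.clip(idx, 0, len(b) - 1)] - a)
    return np.minimum(prev_gap, next_gap) <= window


def sync_rate(a, b, window=0.001):
    """Fraction of all spikes (of both units) that take part in a collision."""
    n_coll = collision_mask(a, b, window).sum() + collision_mask(b, a, window).sum()
    return n_coll / (len(a) + len(b))


def remove_synchrony(a, b, window=0.001):
    """Drop the spikes of `b` that collide with `a`, so the synchrony rate becomes 0."""
    b = np.asarray(b, float)
    return b[~collision_mask(b, a, window)]


def add_synchrony(a, b, target, window=0.001, seed=0):
    """Add jittered copies of non-colliding `a` spikes to `b` until `target` is reached."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.sort(np.asarray(b, float))
    while sync_rate(a, b, window) < target:
        free = a[~collision_mask(a, b, window)]
        if len(free) == 0:
            break
        new = rng.choice(free) + rng.uniform(-window / 2, window / 2)
        b = np.sort(np.append(b, new))
    return b
```

Each added spike turns one previously free spike of `a` and the new spike of `b` into collisions, so the synchrony rate increases monotonically until the target is met.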

Figure 5:

Controlling spatio-temporal overlapping spikes. (A) Example of two spatially overlapping templates. The two templates are spatially overlapping because, on the electrode with the largest signal for template A (blue; electrode marked with a black asterisk), template B has an amplitude greater than 90% of its own largest amplitude. (B) Without setting the synchrony rate, the random spike trains (left) present a few spatio-temporal collisions (red events). When the synchrony rate is set to 0 (middle), the spatio-temporal overlaps are removed. When the synchrony rate is set to 0.05 (right), spatio-temporal overlapping spikes are added to the spike trains. (C) One-second snippet of the recording with 0.05 synchrony. In the magnified window, a spatio-temporal overlapping event is shown: the collision results in a distortion of the waveform.
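The spatial-overlap criterion in the Figure 5 caption translates directly into code. In this minimal sketch, a template's amplitude on each electrode is taken as its maximum absolute value (an assumption; peak-to-peak amplitude would be another reasonable choice), and the 90% threshold is exposed as a parameter. The function name is hypothetical.

```python
import numpy as np


def spatially_overlapping(tpl_a, tpl_b, threshold=0.9):
    """True if, on template A's largest-amplitude electrode, template B exceeds
    `threshold` times its own largest amplitude.

    tpl_a, tpl_b: (n_elec, n_samples) templates on the same probe.
    """
    amp_a = np.abs(tpl_a).max(axis=1)  # per-electrode amplitude of A
    amp_b = np.abs(tpl_b).max(axis=1)  # per-electrode amplitude of B
    peak_chan_a = int(np.argmax(amp_a))
    return bool(amp_b[peak_chan_a] > threshold * amp_b.max())
```

Note that the test is not symmetric: swapping A and B checks B's peak electrode instead, which can give a different answer.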

The possibility of reproducing and controlling this feature of extracellular recordings within MEArec could aid in the development of spike sorters which are robust to spatio-temporal collisions.

Generating drifting recordings

When extracellular probes are inserted in the brain, especially for acute experiments, the neural tissue might slowly move with respect to the electrodes. This phenomenon is known as drift.

Drifting is particularly critical for spike sorting, as the waveform shapes change over time due to the relative movement between the neurons and the probe. New spike sorting algorithms have been developed specifically to tackle the drifting problem (Kilosort2, IronClust [25]).

In order to simulate drift in the recordings, we first need to generate drifting templates:

>> mearec gen-templates -prb Neuronexus-32 -n 30 --drifting --seed 0
…
Saved templates in path-to-Neuronexus-drift-templates-file.h5

Drifting templates are generated by choosing an initial and a final soma position with user-defined rules (see Materials and Methods for details) and by moving the cell along the line connecting the two positions in a defined number of drifting steps (50 by default). An example of a drifting template is depicted in Figure 6A, alongside the drifting neuron’s soma locations.

Figure 6:

Drifting. (A) Example of a drifting template. The colored asterisks on the left show the trajectory from the initial (blue large asterisk) to the final (red large asterisk) neuron positions. The positions are in the x-y coordinates of the probe plane, and the electrode locations are depicted as black dots. The corresponding templates are displayed at the electrode locations with the same colormap, showing that the template peak is shifted upwards following the soma position. (B) 60-second drifting recording with four neurons moving at a velocity of 20 µm/s. The colored arrows show the initial and final soma positions for each neuron. (C) Waveforms and average waveforms on the electrode with the largest peak for each of the four neurons in the recording. (D) Amplitude of the waveforms over time recorded on the electrode with the largest initial peak. Drifting results in a slow change of amplitude over the course of the recording.

Once a library of drifting templates is generated, drifting recordings can be simulated. Depending on the drifting velocity, the drifting template is replayed so that, for each spike, the correct drifting template is selected for convolution. In Figure 6B, we show an example with four drifting cells, a drifting velocity of 20 µm/s, and a duration of 60 seconds. The colored arrows show the initial and final positions of the four neurons making up the recording. Note that a drifting velocity of 20 µm/s is much larger than typical experimental drifts; it was chosen to illustrate the drifting phenomenon. Figure 6C shows the waveforms and the average waveforms for the four neurons on the electrode with the largest peak. In this case, the variability is mainly due to the relative movement between the cell and the probe. This can be observed by visualizing the waveform amplitude for each spike over time (Figure 6D; each color is a different neuron).
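A minimal sketch of the replay logic, assuming a straight drift path sampled at `n_steps` equally spaced positions and a constant velocity: the distance travelled at each spike time is converted into the index of the closest drift step, which selects the drifting template to convolve. Behavior at the end of the path is an assumption here (the neuron simply stays at the final position), and the function names are hypothetical.

```python
import numpy as np


def drift_positions(start, end, n_steps=50):
    """Soma positions (um) at each of n_steps equally spaced drift steps."""
    return np.linspace(np.asarray(start, float), np.asarray(end, float), n_steps)


def drift_step_index(spike_times, velocity_um_s, start, end, n_steps=50):
    """Index of the drift step (and hence drifting template) to use for each spike.

    Assumes constant velocity along the straight path; once the end is reached,
    the neuron stays there (an assumption of this sketch).
    """
    path_len = np.linalg.norm(np.asarray(end, float) - np.asarray(start, float))
    step_len = path_len / (n_steps - 1)
    travelled = np.minimum(velocity_um_s * np.asarray(spike_times, float), path_len)
    return np.rint(travelled / step_len).astype(int)
```

For a 98 µm path with 50 steps (2 µm apart) and a velocity of 20 µm/s, spikes at 0, 0.1, 1, and 10 s map to steps 0, 1, 10, and 49.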

Modeling experimental noise

Spike sorting performance can be greatly affected by noise in the recordings. Many algorithms first use a spike detection step to identify putative spikes. The threshold for spike detection is usually set as a multiple of the noise standard deviation or of a noise estimate based on the median absolute deviation [38]. Clearly, recordings with higher noise levels result in higher spike detection thresholds, making it harder to robustly detect lower-amplitude spiking activity. In addition to the noise amplitude, other noise features can affect spike sorting performance: some clustering algorithms, for example, assume that clusters have a Gaussian shape, due to the assumption of additive normally distributed noise. Moreover, noise generated by biological sources can produce spatial correlations among channels and can have a non-flat frequency spectrum [6, 40].
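A widely used robust choice estimates the noise standard deviation as the median absolute deviation divided by 0.6745 and sets the detection threshold to a few times this value. The sketch below, with a hypothetical function name and k = 5 as an illustrative default, shows that the threshold scales linearly with the noise level and, unlike the sample standard deviation, is barely inflated by large spikes.

```python
import numpy as np


def detection_threshold(trace, k=5.0):
    """k times a robust noise estimate, MAD / 0.6745.

    For Gaussian noise, the median absolute deviation divided by 0.6745
    estimates the standard deviation, and it is much less inflated by
    large spikes than the sample standard deviation.
    """
    mad = np.median(np.abs(trace - np.median(trace)))
    return k * mad / 0.6745
```

Doubling the noise amplitude doubles the threshold, which is why noisier recordings lose their lowest-amplitude units first.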

To investigate how the above-mentioned assumptions about noise affect spike sorting performance, MEArec can generate recordings with several noise models. Figure 7 shows 5-second spiking-free tetrode recordings for the five noise profiles that can be generated (A: recordings; B: spectrum; C: channel covariance; D: amplitude distribution).

Figure 7:

Noise models. The 5 columns refer to different noise models: 1) Uncorrelated Gaussian noise, 2) Distance-correlated Gaussian noise, 3) Colored uncorrelated Gaussian noise, 4) Colored distance-correlated Gaussian noise, and 5) Noise generated by distant neurons. (A) One-second spiking-free recording. (B) Spectrum of the first recording channel between 10 and 5000 Hz. (C) Covariance matrix of the recordings. (D) Distribution of noise amplitudes for the first recording channel. The different noise models vary in the spectrum, channel correlations, and amplitude distributions.

The first column shows uncorrelated Gaussian noise, which presents a flat spectrum, a diagonal covariance matrix, and a symmetrical noise amplitude distribution. In the recording in the second column, spatially correlated noise was generated as multivariate Gaussian noise with a covariance matrix that depends on the channel distance. In this case too, the spectrum (B) is flat and the amplitude distribution is symmetrical (D), but the covariance matrix shows a correlation that depends on the inter-electrode distance. As previous studies showed [6, 40], the frequency content of extracellular noise is not flat: its spectrum is shaped by the spiking activity of distant neurons, which appears in the recordings as below-threshold biological noise. To reproduce the spectral profile observed in experimental data, MEArec allows coloring the spectrum of Gaussian noise with a second-order infinite impulse response (IIR) filter (see Materials and Methods for details). Colored noise is an efficient way of obtaining the desired spectrum, as shown in the third and fourth columns of Figure 7, panel B. The distance correlation is maintained (panel C, fourth column), and the distribution of the noise amplitudes remains symmetrical. Finally, the last noise model generates the activity of distant neurons. In this case, the noise is built by convolving the spike trains of many neurons (300 by default) with templates whose amplitudes are below an amplitude threshold (10 µV by default). A Gaussian noise floor is then added, and the resulting noise is scaled to match the user-defined noise level. The far-neurons noise profile is shown in the last column of Figure 7. While the spectrum and spatial correlation of this noise profile are similar to those of colored, distance-correlated noise (4th column), the noise distribution is skewed towards negative values (panel D), mainly due to the negative deflections of the action potentials.
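The far-neurons construction can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not MEArec's implementation: all distant neurons share one hypothetical biphasic waveform, their per-channel amplitudes are drawn uniformly below the threshold, they fire as Poisson processes at a hypothetical 5 Hz, and the excitatory/inhibitory split is ignored. The sketch reproduces the qualitative point made above: the amplitude distribution is skewed towards negative values.

```python
import numpy as np


def far_neurons_noise(n_channels=4, duration=1.0, fs=32000, n_neurons=300,
                      rate=5.0, max_amp=10.0, noise_floor=0.5,
                      noise_level=10.0, seed=0):
    """Toy far-neurons noise (µV, s, Hz); returns (n_channels, n_samples).

    Assumptions (not MEArec's code): one shared waveform shape, amplitudes
    uniform in (0, max_amp), homogeneous Poisson firing at `rate`.
    """
    rng = np.random.default_rng(seed)
    n_samples = int(duration * fs)

    # Shared unit-amplitude waveform: sharp negative trough, small positive lobe.
    t = np.arange(int(0.003 * fs)) / fs - 0.001
    shape = (-np.exp(-t**2 / (2 * 0.0002**2))
             + 0.25 * np.exp(-(t - 0.0006)**2 / (2 * 0.0004**2)))

    # Per-neuron, per-channel amplitudes below the threshold.
    amps = rng.uniform(0.0, max_amp, size=(n_neurons, n_channels))

    # Poisson spike times for all neurons, flattened with their neuron labels.
    counts = rng.poisson(rate * duration, size=n_neurons)
    neuron_of_spike = np.repeat(np.arange(n_neurons), counts)
    spike_samples = rng.integers(0, n_samples, size=counts.sum())

    noise = np.empty((n_channels, n_samples))
    for ch in range(n_channels):
        # Weighted impulse train: sum over neurons of amplitude * spike indicator.
        impulses = np.bincount(spike_samples, weights=amps[neuron_of_spike, ch],
                               minlength=n_samples)
        # Because the shape is shared, one convolution per channel suffices.
        noise[ch] = np.convolve(impulses, shape, mode="same")

    # Gaussian floor proportional to the std of the neural activity.
    floor_std = noise_floor * noise.std(axis=1, keepdims=True)
    noise += floor_std * rng.standard_normal(noise.shape)

    # Rescale each channel to the requested noise level.
    noise -= noise.mean(axis=1, keepdims=True)
    noise *= noise_level / noise.std(axis=1, keepdims=True)
    return noise
```

Sharing one waveform shape is what allows a single convolution per channel; with per-neuron shapes the same idea would require one convolution per neuron.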

The capability of MEArec to simulate several noise models enables spike sorter developers to assess how different noise profiles affect their algorithms and to modify their methods to be insensitive to specific noise assumptions.

Testbench for spike sorting development and assessment

In the previous sections, we have shown several examples of how MEArec can replicate, in a fully reproducible way, several aspects of extracellular recordings that are critical for spike sorting performance. The proposed design and its integration with a spike sorting evaluation framework called SpikeInterface³ enable developers to actively include customized simulations in the spike sorting development phase.

Due to its speed and controllability, we see MEArec as a testbench rather than a benchmark tool. We provide here a couple of examples. In Figure 8A, we show one-second sections of recordings simulated on a Neuronexus-32 probe with fixed parameters and random seeds for template selection and spike train generation, but with four different levels of additive Gaussian noise, with standard deviations of 5, 10, 20, and 30 µV (Appendix B contains the Python code used to generate and plot these recordings). The traces show the same underlying spiking activity, so any variability in spike sorting performance can only be due to the different noise levels. Similarly, in Figure 8B, 1-minute drifting recordings were simulated with three different drifting velocities. For low drifting speeds the waveform changes are barely visible (green traces), while for faster drifts (orange and blue traces) the waveform changes over time become more pronounced.
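The testbench idea of fixing the seeds and varying a single parameter can be illustrated with a toy, single-channel stand-in for the MEArec pipeline. This is not the Appendix B code, and `toy_recording` is a hypothetical helper: the spike times depend only on one seed, so the noise-free signal is identical across runs and only the noise amplitude changes.

```python
import numpy as np


def toy_recording(noise_level, duration=1.0, fs=32000, rate=10.0,
                  spike_seed=0, noise_seed=1):
    """Single-channel toy recording; returns (noise-free, noisy) traces in µV."""
    n = int(duration * fs)
    t = np.arange(64) / fs
    # Toy negative spike waveform (2 ms long, -50 µV trough at 0.5 ms).
    template = -50.0 * np.exp(-(t - 0.5e-3) ** 2 / (2 * (0.2e-3) ** 2))

    # Spike times depend only on spike_seed, so they are identical across runs.
    rng_spikes = np.random.default_rng(spike_seed)
    spikes = np.zeros(n)
    n_spikes = rng_spikes.poisson(rate * duration)
    spikes[rng_spikes.integers(0, n - len(template), size=n_spikes)] = 1.0
    clean = np.convolve(spikes, template)[:n]

    # Only the noise amplitude changes between runs.
    rng_noise = np.random.default_rng(noise_seed)
    return clean, clean + noise_level * rng_noise.standard_normal(n)


levels = [5, 10, 20, 30]  # µV, as in Figure 8A
runs = [toy_recording(level) for level in levels]
```

Reusing the same noise seed means the four noise realizations are scaled copies of each other, which isolates the effect of the noise level even further.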

Figure 8:

MEArec as testbench platform for spike sorting. (A) Four one-second snippets of recordings generated with different noise level parameters (5 -red, 10 -green, 20 -blue, and 30 µV -blue). The underlying spiking activity is exactly the same for all recordings, and the only difference lies in the standard deviation of the additive uncorrelated Gaussian noise. (B) Three drifting recordings generated with different drifting velocity parameters (10 -green, 30 -blue, and 60 µm/s -blue). Also in this case, the underlying spiking activity is the same, but the different speeds result in different modifications of the waveforms over time.

The capability of MEArec to reproduce such behaviors in a highly controlled manner could aid in the design of specific tests that measure and quantify the ability of spike sorting software to deal with specific complexities of extracellular recordings. Other examples include simulating recordings with increasing levels of bursting to measure to what extent bursting units are correctly clustered, or changing the synchrony rate of spatially overlapping units to assess how much spatio-temporal collisions affect performance.

Integration with SpikeInterface

We have recently developed SpikeInterface, a Python-based framework for running several spike sorting algorithms and for comparing and validating their results. MEArec can be easily interfaced with SpikeInterface, so that simulated recordings can be loaded, spike sorted, and benchmarked with a few lines of code. In the following example, a MEArec recording is loaded, spike sorted with Mountainsort4 [8] and Kilosort2⁴ [37], and benchmarked against the ground-truth spike times available from the MEArec simulation:

Figure 10

The combination of MEArec and SpikeInterface represents a powerful tool for systematically testing and comparing spike sorter performance with respect to several complications of extracellular recordings. MEArec simulations, in combination with SpikeInterface, are already being used by other groups to benchmark and compare spike sorting algorithms⁵.

Performance considerations

As MEArec is meant as a testbench tool, speed has been one of its main design principles. To achieve high speed, most parts of the simulation process are fully parallelized. As shown in Figure 2, the simulation is split into templates generation and recordings generation. The templates generation phase is the most time-consuming, but the same template library can be used to generate several recordings. This phase is further split into two sub-phases: the intracellular and the extracellular simulations. The former only needs to be run once, as it generates a set of cell-model-specific spikes that are stored and then used for the extracellular simulations, which are instead probe-specific.

We present here run times for the different phases of the templates generation and for the recordings generation. All simulations were run on an Ubuntu 18.04 machine with an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz and 16 GB of RAM.

The intracellular simulation run time for the 13 cell models shipped with the software was ~130 seconds (~10 seconds per cell model).

Run times for extracellular simulations for several probe types, numbers of templates in the library, and drifting templates are shown in the Templates generation section of Table 1. The run times for this phase mainly depend on the number of templates to be generated (N templates column), on the minimum amplitude of accepted templates (Min. amplitude column), and especially on drifting (Drifting column). When simulating drifting templates, in fact, the number of extracellular spikes actually simulated for each cell model is N templates times N drift steps. Note that in order to generate the far-neurons noise model, the minimum amplitude should be set to 0, so that low-amplitude templates are not discarded. The number of templates available in the template library is the specified number of templates (N templates) times the number of cell models (13 by default).
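The bookkeeping described above can be written as a small helper. The defaults (50 templates per cell model from the n parameter, 30 drift steps, 13 cell models) are taken from the Materials and Methods, while the function itself is only an illustration.

```python
def template_library_counts(n_templates, n_cell_models=13, drifting=False,
                            drift_steps=30):
    """Return (templates in the library, extracellular spikes to simulate)."""
    # Each cell model contributes n_templates templates to the library.
    n_library = n_templates * n_cell_models
    # With drifting, every template is simulated at drift_steps positions.
    per_model = n_templates * (drift_steps if drifting else 1)
    return n_library, per_model * n_cell_models


print(template_library_counts(50))                 # static library
print(template_library_counts(50, drifting=True))  # drifting library
```

The drifting case multiplies the simulation work by drift_steps without changing the number of templates in the library, which is why drifting dominates the templates generation run times.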

Table 1:

Templates and recordings generation run times depending on several simulation parameters.

Recordings are then generated using the simulated template libraries. In Table 1, the Recordings generation section shows run times for several recordings with different probes, durations, numbers of cells, bursting, and drifting options. The main parameter that affects simulation times is the number of cells, as it increases the number of modulated convolutions. Bursting and drifting also increase the run time of the simulations, because of the extra processing required in the convolution step. Overall, the simulation run times range from a few seconds to a few minutes. Therefore, the speed of MEArec enables users to generate numerous recordings with different parameters for testing spike sorter performance.

Discussion

In this paper we have presented MEArec, a Python package for simulating extracellular recordings for spike sorting development and validation. We first showed the ease of use of the software, whose command line interface and simple Python API enable users to simulate extracellular recordings with a couple of commands or a few lines of code. We then gave an overview of the software design, which separates the templates and the recordings generation to improve efficiency and simulation speed. We explored the capability of reproducing and controlling several aspects of extracellular recordings that can be critical for spike sorting algorithms, including bursting spikes with varying spike shapes, spatio-temporal overlaps, drifting units, and noise assumptions. We illustrated two examples of using MEArec, in combination with SpikeInterface⁶, as a testbench platform for developing spike sorting algorithms. Finally, we benchmarked the speed performance of MEArec (Table 1).

Investigating the validation sections of several recently developed spike sorting algorithms [41, 37, 26, 22, 25, 27, 43], it is clear that the neuroscientific community needs a standardized validation framework for spike sorting performance. Some spike sorters are validated using a so-called hybrid approach, in which well-identified units from previous experimental recordings are artificially injected into the recordings and used to compute performance metrics [41, 37]. The use of templates extracted from previously sorted datasets raises questions about the accuracy of the initial sorting, as well as about the complexity of the well-identified units. Alternatively, other spike sorters are validated on experimental paired ground-truth recordings [8, 43]. While these valuable datasets [19, 21, 35, 31] can certainly provide useful information, the low count of ground-truth units makes the validation incomplete and could result in biases (for example, algorithm-specific parameters could be tuned to reach a higher performance on the recorded ground-truth units). A third validation method consists of using simulated ground-truth recordings [11]. While this approach is promising, in combination with experimental paired recordings, the currently available simulators [6, 18, 33] present some limitations in terms of biological realism, controllability, speed, and/or ease of use (see Introduction). We therefore introduced MEArec, a software package which is computationally efficient, easy to use, highly controllable, and capable of reproducing critical characteristics of extracellular recordings relevant to spike sorting, including bursting modulation, spatio-temporal overlaps, drifting of units over time, and various noise profiles.

The capability of MEArec to replicate complexities of extracellular recordings that are usually either ignored or not controlled in other simulators permits the user to include tailored simulations in the spike sorting implementation process, using the simulator as a testbench platform for algorithm development. MEArec simulations could be used not only to test the final product: specific simulations could also help in implementing algorithms that are able to cope with drifting, bursting, and spatio-temporal overlaps, which are regarded as the most complex aspects for spike sorting performance [40, 43].

In MEArec, to generate extracellular templates, we used a well-established modeling framework for solving the single-neuron dynamics [7] and for calculating the extracellular fields generated by transmembrane currents [29, 18]. These models rely on some assumptions that, if warranted, could be addressed with more sophisticated methods, such as finite element methods (FEM). In a recent work [5], we used FEM simulations and showed that extracellular probes, especially MEAs, affect the amplitude of the recorded signals. While this finding is definitely interesting for accurately modeling and understanding how the extracellular potential is generated and recorded, it is unclear how it would affect spike sorting performance. Moreover, when modeling signals on MEAs, we used the method of images [34, 5], which models the probe as an infinite insulating plane and better describes the recorded potentials for large MEA probes [5].

Secondly, during templates generation, the neuron models were randomly moved around and rotated with physiologically acceptable values [4]. In this phase, some dendritic trees might unnaturally cross the probes. We decided not to modify the cell models and to allow for this behavior for the sake of the simulator's efficiency: modifying the dendritic trees for each extracellular spike generation would be too computationally intensive. However, since the templates generation phase is only run once for each probe, in the future we plan both to include the probe effect in the simulations and to carefully modify the dendritic positions so that they do not cross the probe's plane.

Another limitation of the proposed modelling approach is in the replication of bursting behavior. We implemented a simplified bursting modulation that attempts to capture the features recorded from extracellular electrodes by modifying the template amplitude and shape depending on the spiking history. However, more advanced aspects of waveform modulation caused by bursting, including morphology-dependent variation of spike shapes, cannot be modelled with the proposed approach, and their replication requires a full multi-compartment simulation [18]. Nevertheless, the suggested simplified model of bursting could be a valuable tool for testing the capability of spike sorters to deal with this phenomenon.

Finally, the current version of MEArec only supports cell models from the Neocortical Microcircuit Portal [30, 39], which includes models from juvenile rat somatosensory cortex. The same cell model format is also being used to build a full hippocampus model [32] and models of other brain regions, and therefore the integration of new models should be straightforward. Moreover, we are in the process of extending support to cell models from the Allen Brain Institute database [16]⁷, which contains models of mouse and human cells. Further, by design, the templates and recordings generation phases are split. Therefore, the recordings generation mechanism could also be used, in principle, with user-defined template libraries, either from other unsupported cell models or from units extracted from experimental recordings.

In conclusion, we introduced MEArec, a Python-based simulation framework for extracellular recordings. Thanks to its speed and controllability, we see MEArec aiding both the development and the validation of spike sorting algorithms, helping to understand the limitations of current methods, to improve their performance, and to generate new software tools for the hard and still partially unsolved spike sorting problem.

Materials and Methods

Templates generation

This section explains the templates generation phase of the simulator (Figure 2A). Table 2 shows the list of parameters involved in this phase, their default values, types, and an explanation of their function.

Table 2:

Templates generation parameter list, values, types, and explanations.

MEArec is compatible with realistic multi-compartment neuronal models from the Neocortex Microcircuit Portal (NMC -[39, 30]). Upon installation, 13 cell models from layer 5 are copied into the package folder. Moreover, the user can manually download other cell models from the portal and use them for the simulation.

Intracellular simulation

The neuronal model dynamics are solved using the NEURON simulator [7]. The neuron's soma is stimulated with a constant current for a user-defined simulation time (1 second by default -sim_time parameter), and the stimulation weight is adjusted (using the weights parameter) so that the number of spikes in the simulation period is within a target interval (between 3 and 50 by default -target_spikes parameter). The stimulation starts after a delay (delay parameter, in ms) from the start of the simulation to avoid initialization artifacts. The simulation time step is defined by the dt parameter (default 0.03125 ms, corresponding to 32 kHz). Single spikes are then detected by threshold crossing, aligned, and cropped (using the cut_out parameter). The transmembrane currents of all segments are saved to disk, so that the intracellular simulation only needs to be run once for each cell model.
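The calibration of the stimulation weight can be illustrated with a hypothetical stand-in for the NEURON simulation: a leaky integrate-and-fire neuron whose spike count under constant input has a closed form. Both the toy neuron (including its 10 ms delay and 20 ms time constant) and the bisection search are assumptions for illustration; MEArec's actual adjustment of the weights parameter may proceed differently.

```python
import math


def lif_spike_count(weight, sim_time=1000.0, delay=10.0, tau=20.0, v_th=1.0):
    """Spike count of a toy leaky integrate-and-fire neuron (times in ms).

    Stand-in for a NEURON simulation: with a constant input `weight` applied
    after `delay`, the membrane reaches v_th every tau * ln(w / (w - v_th)) ms.
    """
    if weight <= v_th:
        return 0  # input too weak to ever reach threshold
    period = tau * math.log(weight / (weight - v_th))
    return int((sim_time - delay) // period)


def calibrate_weight(target_spikes=(3, 50), w_lo=0.0, w_hi=100.0, max_iter=60):
    """Bisect the stimulus weight until the spike count is in target_spikes."""
    lo_n, hi_n = target_spikes
    for _ in range(max_iter):
        w = 0.5 * (w_lo + w_hi)
        n = lif_spike_count(w)
        if n < lo_n:
            w_lo = w  # too few spikes: stimulate harder
        elif n > hi_n:
            w_hi = w  # too many spikes: stimulate less
        else:
            return w, n
    raise RuntimeError("no weight found in the search interval")


dt = 0.03125  # ms; 1 / dt gives the sampling rate in kHz (32 kHz)
w, n = calibrate_weight()
print(1.0 / dt, w, n)
```

Bisection works here because the spike count grows monotonically with the input weight; a real multi-compartment model may need a more careful search.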

Extracellular simulation

Transmembrane currents generated by the intracellular simulation are used to compute extracellular potentials at the electrode locations using LFPy [17]. Each transmembrane current is distributed over a line source with the length of its corresponding neural segment. Using the quasi-static approximation [36] and assuming a homogeneous, isotropic, and infinite neural tissue with conductivity σ = 0.3 S/m [15], the contribution of each compartment i at position r_i with transmembrane current I_i(t) to the electric potential on an electrode at position r_j reads [23, 17, 4]:

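$$\phi(\mathbf{r}_j, t) = \frac{1}{4\pi\sigma}\sum_{i}\frac{I_i(t)}{\lVert\mathbf{r}_i-\mathbf{r}_j\rVert} \qquad (1)$$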

While the assumption of an infinite medium holds for small probes, such as microwires and tetrodes, for larger silicon probes the method of images (MoI) [34] can yield a better estimate of the extracellular potential [5]. Using MoI, the contribution of a transmembrane current to an electrode at position r_j reads:

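$$\phi(\mathbf{r}_j, t) = \frac{2}{4\pi\sigma}\sum_{i}\frac{I_i(t)}{\lVert\mathbf{r}_i-\mathbf{r}_j\rVert} \qquad (2)$$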
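A minimal NumPy sketch of the point-source sums is shown below (it does not use LFPy). It assumes currents in nA and positions in µm, so that the result is in mV before the conversion to µV, and, for the method-of-images case, electrodes lying on the insulating plane, where the mirror source simply doubles each contribution. The `min_dist` clamp is a hypothetical safeguard, not a documented MEArec parameter.

```python
import numpy as np


def extracellular_potential(currents, seg_pos, elec_pos, sigma=0.3,
                            mode="infinite", min_dist=1.0):
    """Point-source extracellular potential (infinite medium or MoI).

    currents: (n_segments, n_times) transmembrane currents in nA
    seg_pos:  (n_segments, 3) segment positions in µm
    elec_pos: (n_electrodes, 3) electrode positions in µm
    Returns (n_electrodes, n_times) potentials in µV.
    """
    # Pairwise electrode-segment distances, shape (n_electrodes, n_segments).
    dist = np.linalg.norm(elec_pos[:, None, :] - seg_pos[None, :, :], axis=-1)
    # Hypothetical safeguard against the 1/r singularity when a segment
    # sits (almost) on an electrode.
    dist = np.maximum(dist, min_dist)
    if mode == "infinite":
        factor = 1.0 / (4 * np.pi * sigma)
    elif mode == "moi":
        # Insulating plane with the electrode on it: the image source doubles it.
        factor = 2.0 / (4 * np.pi * sigma)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # nA / (S/m * µm) = mV, so multiply by 1e3 to get µV.
    return 1e3 * factor * (1.0 / dist) @ currents
```

Writing the sum as a matrix product keeps the per-time-step cost to a single (n_electrodes × n_segments) multiplication, which is why the distance matrix only needs to be computed once per cell position.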

The simulated extracellular spike is obtained by summing up the contributions of all compartments. For each recording site, the electric potential can be computed at several points within the electrode area (ncontacts parameter -10 points by default), which are then averaged to model the spatial filtering properties of the electrodes (disk-electrode approximation [29]).
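The disk-electrode averaging can be sketched by sampling points uniformly on the electrode disk and averaging the point-source potential over them. The 7.5 µm radius and the random sampling scheme are assumptions for illustration; in MEArec the electrode shape and size come from the probe definition.

```python
import numpy as np


def disk_points(center, normal, radius, n_points, rng):
    """Sample n_points uniformly on a disk electrode (positions in µm)."""
    normal = np.asarray(normal, float) / np.linalg.norm(normal)
    # Build two orthonormal in-plane vectors u, v.
    helper = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    # Taking the sqrt of a uniform variable gives a uniform density over the area.
    r = radius * np.sqrt(rng.random(n_points))
    theta = 2 * np.pi * rng.random(n_points)
    return center + r[:, None] * (np.cos(theta)[:, None] * u + np.sin(theta)[:, None] * v)


def disk_averaged_potential(current, source_pos, center, normal, radius=7.5,
                            n_points=10, sigma=0.3, seed=0):
    """Average the point-source potential (µV, current in nA) over the disk."""
    rng = np.random.default_rng(seed)
    pts = disk_points(np.asarray(center, float), normal, radius, n_points, rng)
    dist = np.linalg.norm(pts - np.asarray(source_pos, float), axis=1)
    return np.mean(1e3 * current / (4 * np.pi * sigma * dist))
```

For distant sources the average matches the potential at the electrode center, while for nearby sources it is smaller than the center value, which is the spatial low-pass effect the disk approximation is meant to capture.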

Each cell model, during the templates generation phase, is used to generate several spikes (n parameter -50 by default). For each extracellular action potential, the neuron is randomly moved to a position within user-defined boundaries (xlim, ylim, zlim parameters). If the boundary for a specific axis is set to null, the limits are computed as the boundary of the probe in that axis plus the overhang value (default 30 µm). Moreover, a random rotation of the model can be optionally added (rot parameter). The models can be only shifted (norot), rotated along a single axis (xrot, yrot, zrot), rotated with a physiological rotation (physrot), or rotated randomly along all axes (3drot). For further details we refer to [4]. Extracellular spikes are included in the dataset only if their maximum amplitude is greater than a user-defined minimum amplitude (min_amp parameter -30 µV by default). In order to use the far-neurons noise model (Figure 7), the minimum amplitude parameter should be set to 0, so that low amplitude templates are not discarded.
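The placement step amounts to rejection sampling inside the boundaries. In the sketch below, `amplitude_fn` is a hypothetical callable standing in for the full extracellular simulation, and the toy amplitude (a 2 nA point source seen by the nearest electrode of a 25 µm tetrode) is invented for the example; only the overhang (30 µm) and min_amp (30 µV) defaults come from the text. Rotations are omitted.

```python
import numpy as np


def placement_bounds(probe_pos, xlim=None, ylim=None, zlim=None, overhang=30.0):
    """Per-axis (min, max) limits in µm; None axes follow the probe plus overhang."""
    bounds = []
    for axis, lim in enumerate((xlim, ylim, zlim)):
        if lim is None:
            lo, hi = probe_pos[:, axis].min(), probe_pos[:, axis].max()
            lim = (lo - overhang, hi + overhang)
        bounds.append(lim)
    return np.array(bounds, dtype=float)


def sample_positions(n, bounds, amplitude_fn, min_amp=30.0, seed=0,
                     max_tries=100_000):
    """Keep random positions whose template amplitude is >= min_amp (µV)."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(max_tries):
        pos = rng.uniform(bounds[:, 0], bounds[:, 1])
        if amplitude_fn(pos) >= min_amp:
            kept.append(pos)
            if len(kept) == n:
                return np.array(kept)
    raise RuntimeError("too few positions passed the amplitude criterion")


# Toy tetrode in the y-z plane and a toy amplitude model (hypothetical).
probe = np.array([[0.0, y, z] for y in (-12.5, 12.5) for z in (-12.5, 12.5)])


def toy_amp(p):
    d = np.linalg.norm(probe - p, axis=1).min()
    return 1e3 * 2.0 / (4 * np.pi * 0.3 * d)


bounds = placement_bounds(probe)
positions = sample_positions(20, bounds, toy_amp)
```

Rejection sampling keeps the spatial distribution uniform within the boundaries, conditional on visibility; its cost grows as min_amp increases, which is consistent with the dependence of run times on the minimum amplitude reported in Table 1.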

Probe models

Probe models are handled using the MEAutility Python package (https://meautility.readthedocs.io/), which is automatically installed with MEArec. The probe type can be chosen using the probe parameter (if not set, a random probe is selected). MEAutility contains a large variety of probe designs, e.g. commercial Neuronexus probes, Neuropixels [26], and high-density square MEAs (Figure 3), and it also allows users to define new probes using a yaml file or a Python dictionary. The probe definition contains information about the number and arrangement of the electrodes, the electrode shape and size (used for spatial filtering), the plane in which the electrodes are located, and the probe type (wire or mea), which tells the simulator whether to use the infinite assumption (Equation 1) or MoI (Equation 2) for the extracellular potential calculation. To list the available probes and their information, one can use the mearec available-probes --info command.

Drifting templates

When recording probes are inserted in the brain, there might be relative movement between the probe and the tissue over time, which causes a so-called drift in the recorded action potentials. To incorporate this phenomenon in the simulated recordings, drifting templates have to be generated (when the drifting parameter is set to true). Starting from an initial random position of the cell model that satisfies the requirements in terms of location (within boundaries) and amplitude (above the detection threshold), a final drifting position is found such that the same conditions are satisfied. Moreover, the user can choose a preferred drifting direction by setting the drift_xlim, drift_ylim, and drift_zlim parameters. When the final position is selected, the cell model is moved along a straight line connecting the initial and final positions, and the extracellular spike is simulated at drift_steps equidistant points (30 points by default) along this line (Figure 6A).
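The drifting positions are equidistant points on the line between the initial and final positions. The helper that maps a spike time to a drifting template assumes the cell moves at a constant velocity and stops at the final position; this replay rule is a hypothetical simplification, not necessarily the one MEArec uses.

```python
import numpy as np


def drift_positions(initial, final, drift_steps=30):
    """Equidistant positions (µm) on the straight line from initial to final."""
    return np.linspace(np.asarray(initial, float), np.asarray(final, float),
                       drift_steps)


def drift_index(spike_time, velocity, positions):
    """Index of the drifting template for a spike at spike_time (s).

    Hypothetical replay rule: the cell moves at `velocity` (µm/s) from the
    first position and stays at the last one once it is reached.
    """
    step_len = np.linalg.norm(positions[1] - positions[0])
    travelled = velocity * spike_time
    return min(int(travelled // step_len), len(positions) - 1)


pos = drift_positions([0, 0, 0], [0, 0, 58.0])  # 2 µm between steps
print(drift_index(1.0, 20.0, pos))  # template used after 1 s at 20 µm/s
```

Because only drift_steps templates are simulated, the replay quantizes the cell position; finer drift_steps give smoother waveform changes at the cost of longer templates generation.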

The templates generation phase can be reproduced by setting the seed parameter, which is randomly selected if it is set to null.

Recordings generation

When a template library is generated, it can be used to generate many recordings, as shown in Figure 2B. Tables 3 and 4 show the list of parameters involved in the recordings generation phase, their default values, types, and an explanation of their function.

Table 3:

Recordings generation parameter list, values, types, and explanations.

Table 4:

(Continued) Recordings generation parameter list, values, types, and explanations.

Spike trains generation

In order to obtain the spiking activity, spike trains have to be generated. All the spike train generation parameters can be found in the spiketrains section of the recordings parameters.

Spike trains can be generated either as Poisson or Gamma processes (process parameter). If the Gamma process is selected, its shape is controlled by the gamma_shape parameter (default is 2). The user can decide the number of excitatory (n_exc) and inhibitory neurons (n_inh) in the recordings. The average and standard deviation of the firing rates of excitatory and inhibitory neurons can be chosen (with the f_exc, f_inh, st_exc, and st_inh parameters), as well as the minimum accepted firing rate (min_rate -default 0.5 Hz). Alternatively, the user can define the type (E-I) and mean firing rate of all neurons in the recordings. As Poisson and Gamma processes do not have a minimum inter-spike-interval, spikes violating a refractory period (ref_per -2 ms by default) are removed from the spike trains. Finally, the duration of the spike trains sets the duration of the recordings (duration parameter), and the user can set the random seed for spike train generation (seed parameter in the spiketrains section). Spike trains are represented as neo.SpikeTrain objects [14].
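A minimal NumPy sketch of the two processes and of the refractory clean-up is given below. The Gamma inter-spike intervals use shape gamma_shape and scale 1/(gamma_shape · rate), so the mean rate is preserved. The helper names and the example rates are hypothetical, while min_rate (0.5 Hz), gamma_shape (2), and ref_per (2 ms) follow the defaults quoted above.

```python
import numpy as np


def draw_rate(mean, std, min_rate=0.5, rng=None):
    """Firing rate (Hz) drawn from a normal distribution, floored at min_rate."""
    if rng is None:
        rng = np.random.default_rng()
    return max(rng.normal(mean, std), min_rate)


def spike_train(rate, duration, process="poisson", gamma_shape=2.0,
                ref_per=0.002, rng=None):
    """Spike times (s) of a Poisson or Gamma renewal process, refractory-cleaned."""
    if rng is None:
        rng = np.random.default_rng()
    times, t = [], 0.0
    while True:
        if process == "poisson":
            isi = rng.exponential(1.0 / rate)
        elif process == "gamma":
            # Gamma ISIs with mean 1/rate: shape k, scale 1/(k * rate).
            isi = rng.gamma(gamma_shape, 1.0 / (gamma_shape * rate))
        else:
            raise ValueError(f"unknown process: {process}")
        t += isi
        if t >= duration:
            break
        times.append(t)
    # Remove spikes that follow the last kept spike by less than ref_per.
    kept = []
    for s in times:
        if not kept or s - kept[-1] >= ref_per:
            kept.append(s)
    return np.array(kept)


rng = np.random.default_rng(0)
rate = draw_rate(5.0, 2.0, rng=rng)  # hypothetical mean and std
st = spike_train(rate, 10.0, process="gamma", rng=rng)
```

Removing violating spikes slightly lowers the effective rate for high firing rates; for typical cortical rates the effect is small.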

Excitatory and inhibitory cell types

The cell_types section of the recordings parameters tells the simulator which cell types are excitatory and which are inhibitory. For the cell models in the Neocortical Microcircuit Portal [39], excitatory cells can be pyramidal cells (PC), star pyramidal cells (SP), and stellate cells (SS). The population of inhibitory cells is more diverse and includes: axon cells (AC), bipolar cells (BP), bitufted cells (BTC), basket cells (BC), Chandelier cells (ChC), double bouquet cells (DBC), Martinotti cells (MC), and neurogliaform cells (NGC) [30]. These substrings are used to identify the cell models belonging to the excitatory and inhibitory groups for the template selection process.
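The substring-based grouping can be sketched as follows. The example model names follow the NMC naming style (layer, morphological type, electrical type, index), and matching only the morphological-type token is our assumption: it keeps electrical-type tags such as bAC from interfering with the match. MEArec's exact matching rule may differ.

```python
EXCITATORY = ("PC", "SP", "SS")
INHIBITORY = ("AC", "BP", "BTC", "BC", "ChC", "DBC", "MC", "NGC")


def cell_type(model_name):
    """Classify an NMC-style model name such as 'L5_TTPC1_cADpyr232_1'.

    Assumption: only the morphological-type field (second '_' token) is
    matched against the substrings; excitatory tags are checked first.
    """
    m_type = model_name.split("_")[1]
    if any(tag in m_type for tag in EXCITATORY):
        return "excitatory"
    if any(tag in m_type for tag in INHIBITORY):
        return "inhibitory"
    return "unknown"


for name in ("L5_TTPC1_cADpyr232_1", "L5_MC_bAC217_1"):
    print(name, cell_type(name))
```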

Template selection and pre-processing

After spike trains are generated, templates are selected from the template library and associated with each spike train. The parameters involved in the template selection and pre-processing are in the templates section of the recordings parameters.

Templates are chosen based on amplitude, distance, spatial overlap, and cell type. The selection algorithm discards templates with a peak amplitude below or above user-defined thresholds (min_amp and max_amp parameters) and templates whose distance from already selected neurons is below a minimum distance (min_dist parameter). Moreover, the user can select specific boundaries in the x-, y-, and z-direction (xlim, ylim, and zlim parameters). If the boundaries are set to null (default), there is no restriction on the neurons' location. Templates are chosen so that the numbers of excitatory and inhibitory types match those of the spike trains. Finally, the user can select the number of spatially overlapping template pairs in the recordings (n_overlap_pairs parameter). Two templates A and B are identified as spatially overlapping if the amplitude of template B on the electrode with the largest amplitude for template A is above 90% (overlap_threshold parameter) of its maximum amplitude, and vice versa. The template selection seed can be set with the seed parameter in the templates section of the recordings parameters.
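The spatial-overlap criterion can be written directly from its definition. Measuring the amplitude on each electrode as the maximum absolute value is an assumption of this sketch; MEArec may use a different amplitude measure.

```python
import numpy as np


def amplitudes(template):
    """Per-electrode amplitude of a (n_elec, n_samples) template (max |value|)."""
    return np.abs(template).max(axis=1)


def spatially_overlapping(temp_a, temp_b, overlap_threshold=0.9):
    """True if each template reaches the threshold fraction of its own maximum
    on the other template's peak electrode."""
    amp_a, amp_b = amplitudes(temp_a), amplitudes(temp_b)
    peak_a, peak_b = amp_a.argmax(), amp_b.argmax()
    return bool(amp_b[peak_a] >= overlap_threshold * amp_b.max()
                and amp_a[peak_b] >= overlap_threshold * amp_a.max())


shape = -np.hanning(32)
a = np.outer([100, 95, 20, 10], shape)  # peak on electrode 0
b = np.outer([92, 100, 15, 5], shape)   # peak on electrode 1, large on 0
print(spatially_overlapping(a, b))
```

Requiring the condition in both directions makes the relation symmetric, so the order in which candidate pairs are examined does not matter.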

When templates are selected, they are pre-processed before the convolution operation. First, the templates are padded on both sides (by default by 3 ms on each side -pad_len parameter) to ensure a smooth convolution operation: the template baseline is removed, then the templates are extended in both directions by linearly interpolating their initial and final values to 0. Finally, this linearly extended template is re-interpolated with a cubic spline.

Next, to model the time variation occurring during sampling, for each template n_jitter versions are created (10 by default). Jittering is performed by upsampling the templates (8x by default -upsample parameter) and shifting them randomly in time within a sampling period, before downsampling them back to the original sampling frequency.
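The jittering step can be sketched as upsampling, a random offset within one sampling period, and decimation. This version uses linear interpolation for the upsampling, an assumption made to keep the sketch dependency-free; the n_jitter (10) and upsample (8) defaults come from the text.

```python
import numpy as np


def subsample_shift(template, k, upsample=8):
    """Shift a (n_elec, n_samples) template by k/upsample of a sample.

    The output at sample j is the template evaluated at time j + k/upsample
    (linear interpolation; np.interp holds the last value past the end).
    """
    n = template.shape[1]
    x = np.arange(n)
    x_fine = np.arange(n * upsample) / upsample
    fine = np.array([np.interp(x_fine, x, ch) for ch in template])
    # Keep every `upsample`-th sample starting at offset k: back to fs.
    return fine[:, k::upsample]


def jittered_templates(template, n_jitter=10, upsample=8, seed=0):
    """Return n_jitter randomly sub-sample-shifted copies of template."""
    rng = np.random.default_rng(seed)
    shifts = rng.integers(0, upsample, size=n_jitter)
    return np.stack([subsample_shift(template, k, upsample) for k in shifts])


t = np.linspace(-1, 1, 64)
tmpl = np.vstack([-np.exp(-t**2 / 0.02), -0.5 * np.exp(-t**2 / 0.02)])
jit = jittered_templates(tmpl)  # shape (n_jitter, n_elec, n_samples)
```

Precomputing a small set of jittered versions means that, during convolution, each spike only needs to pick one of them instead of re-interpolating the template.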

Recordings construction

In the recordings section of the recordings parameters, the user can set several parameters for the recordings generation. If not specified, the sampling frequency of the recordings (fs parameter) is the same as that of the generated templates (32 kHz by default), but the user can choose a different sampling rate. In this case the templates are resampled using a polyphase filter. If the overlap parameter is set to true, each spike is annotated as NO (no overlap), TO (temporal overlap), or STO (spatio-temporal overlap). If the extract_waveforms parameter is set to true, after the recordings generation the waveforms are extracted from the recordings and loaded into the spike train objects.

Overlapping spikes and spatio-temporal synchrony

Spatio-temporal overlapping of spikes can make spike sorting very challenging [37, 43]. In order to control how spike sorting is affected by the rate of overlapping spikes, MEArec enables users to modify the spike trains in order to introduce a controlled amount of spatio-temporal overlapping synchrony (Figure 5).

If the synchrony rate is set (sync_rate parameter), the spike trains of spatially overlapping templates are modified to reach the desired synchrony rate. If the chosen synchrony rate is lower than the initial rate, spatio-temporal overlapping spikes are randomly removed from the spike trains. Conversely, when the chosen synchrony rate is greater than the initial rate, additional spikes that do not violate the refractory period are randomly added to the corresponding spike trains until the desired rate is reached. The additive spikes are jittered randomly within a user-defined interval (sync_jitt -default ±1 ms).
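The increase branch of the synchrony adjustment can be sketched as below. The definition of the synchrony rate used here (the fraction of spikes in one train that have a partner in the other train within ±sync_jitt) is an assumption for illustration, as is the random order in which spikes are added; the removal of synchronous spikes, for lower targets, is not shown.

```python
import numpy as np


def sync_fraction(st_a, st_b, window):
    """Fraction of spikes in st_a with a spike of st_b within +/- window (s)."""
    if len(st_a) == 0 or len(st_b) == 0:
        return 0.0
    gaps = np.abs(st_a[:, None] - st_b[None, :]).min(axis=1)
    return float(np.mean(gaps <= window))


def add_synchrony(st_a, st_b, target, sync_jitt=0.001, ref_per=0.002, seed=0):
    """Add jittered copies of st_a spikes to st_b until the target is reached.

    A new spike is accepted only if it keeps st_b free of refractory-period
    violations, so the target may not be reachable for dense trains.
    """
    rng = np.random.default_rng(seed)
    st_b = np.sort(np.asarray(st_b, float))
    for t in rng.permutation(st_a):
        if sync_fraction(st_a, st_b, sync_jitt) >= target:
            break
        if np.any(np.abs(st_b - t) <= sync_jitt):
            continue  # this spike of st_a is already synchronous
        new_t = t + rng.uniform(-sync_jitt, sync_jitt)
        if np.all(np.abs(st_b - new_t) >= ref_per):
            st_b = np.sort(np.append(st_b, new_t))
    return st_b


rng = np.random.default_rng(3)
st_a = np.sort(rng.uniform(0, 10, 100))  # ~10 Hz over 10 s
st_b = np.arange(0.05, 10, 0.1)          # regular 10 Hz train
st_b_sync = add_synchrony(st_a, st_b, target=0.3)
```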

Modulated convolution

Pre-processed templates and spike trains are combined with a customized (modulated) convolution. The convolution step can be performed in parallel on chunks (20 seconds by default -chunk_conv_duration parameter). In order to replicate the variability of spikes in experimental data and computational models [1, 18], the convolution between spike trains and templates is modulated, i.e. the template corresponding to each spike can be modified both in amplitude and in shape (Figure 4).
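Because every spike may carry its own modulation, the modulated convolution is naturally written as a loop that adds one scaled template copy per spike, rather than a single call to np.convolve. The sketch below handles amplitude modulation only, and its argument names are hypothetical. With all modulation values equal to 1 it reduces to an ordinary convolution of the spike train with the template.

```python
import numpy as np


def modulated_convolution(spike_idx, template, mod, n_samples, peak_idx):
    """Sum one scaled copy of `template` per spike.

    spike_idx: sample index of each spike (aligned to the template peak)
    template:  (n_elec, n_t) array
    mod:       (n_spikes, n_elec) amplitude modulation values
    """
    n_elec, n_t = template.shape
    rec = np.zeros((n_elec, n_samples))
    for s, m in zip(spike_idx, mod):
        start = s - peak_idx
        lo, hi = max(start, 0), min(start + n_t, n_samples)  # clip at the edges
        if lo < hi:
            rec[:, lo:hi] += m[:, None] * template[:, lo - start:hi - start]
    return rec


rng = np.random.default_rng(0)
tmpl = rng.standard_normal((3, 40))
idx = np.array([5, 100, 130, 990])
mods = rng.normal(1.0, 0.05, size=(len(idx), 3))
rec = modulated_convolution(idx, tmpl, mods, n_samples=1000, peak_idx=10)
```

The per-spike loop is also the natural place to swap in a shape-modulated or drifting template for each spike, which a plain convolution cannot express.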

There are three types of amplitude modulation available: 1) none (no modulation), 2) template, 3) electrode modulation (default). On top of amplitude modulation, when modulation is not none, shape modulation can be used by setting the shape_mod parameter to true.

Amplitude modulation

The amplitude modulation consists of scaling the amplitude of each spike event by a modulation value. When template modulation is selected, the modulation value is the same for all electrodes. When electrode modulation is used, each electrode has a slightly different modulation value. For the template and electrode modulation types, if the bursting parameter is set to false, the modulation value is a random value drawn from a normal distribution 𝒩(1, sdrand²) (where sdrand is 0.05 by default). As the distribution has a mean equal to 1, the average amplitude of the resulting modulated spikes is the same as that of the original template. When the bursting parameter is set to true, the modulation values are computed to reproduce the amplitude scaling due to bursting behavior (see Figure 4A). The user can choose how many units will be affected by bursting (n_bursting parameter). Consecutive spikes occurring within a user-defined bursting period (max_burst_duration parameter -default 100 ms) are scaled with a sub-linear function (up to a maximum number of consecutive spikes n_burst_spikes -10 by default). The amplitude scaling for the i-th consecutive spike within a bursting event is computed as: Embedded Image where avg_isi0-i is the average inter-spike-interval (ISI) from the first bursting spike to the current spike in the bursting event, c is the number of consecutive spikes encountered up to spike i, max_burst_duration is the maximum bursting period (default 100 ms), and exp_decay is the exponent (0.1 by default). Additionally, the ISI-dependent modulation value is scaled by a random value drawn from a normal distribution at the template level (template modulation) or at the electrode level (electrode modulation).
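The non-bursting modulation values for the three modes can be sketched as below (sdrand = 0.05 as in the text). The bursting branch is not included, since its exact scaling formula is not reproduced here, and the helper name is hypothetical.

```python
import numpy as np


def modulation_values(n_spikes, n_elec, mode="electrode", sdrand=0.05, seed=0):
    """Non-bursting amplitude modulation values, shape (n_spikes, n_elec).

    'none'      -> all ones
    'template'  -> one N(1, sdrand^2) draw per spike, shared by all electrodes
    'electrode' -> an independent N(1, sdrand^2) draw per spike and electrode
    """
    rng = np.random.default_rng(seed)
    if mode == "none":
        return np.ones((n_spikes, n_elec))
    if mode == "template":
        per_spike = rng.normal(1.0, sdrand, size=(n_spikes, 1))
        return np.repeat(per_spike, n_elec, axis=1)
    if mode == "electrode":
        return rng.normal(1.0, sdrand, size=(n_spikes, n_elec))
    raise ValueError(f"unknown modulation: {mode}")
```

The returned array has one row per spike and one column per electrode, so it can be used directly as the per-spike scaling factors of a modulated convolution.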

Shape modulation

When shape_mod is set to true, spikes are also modulated in shape. Shape modulation consists of stretching the template depending on its modulation value. The stretch is achieved as follows: first, the template time axis is centered on the template peak and scaled so that its length is equal to 1 (we will refer to this centered and normalized time axis as xc); second, xc is multiplied by the bursting_sigmoid parameter, which controls the amount of stretch (we will refer to this transformed time axis as xt); then, a stretch factor s is computed for the entire template (the same factor is used for all electrodes) as the average modulation value over all electrodes (if electrode modulation is used); if the stretch factor is less than 1, xt is projected on a sigmoid function:

Embedded Image

xs is now a non-linear stretched time axis. The template is interpolated on xs with a cubic spline and transformed back to the original time axis xc. The shape modulation with two different bursting_sigmoid values is shown in Figure 4B. The amplitude of the shape-modulated template is finally scaled with the modulation value to include the amplitude modulation.
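The stretching procedure can be sketched with NumPy and SciPy's `CubicSpline`. Since the sigmoid equation is not reproduced here, the warp below is a hypothetical stand-in: a logistic function shifted to be zero at the peak and rescaled, separately on each side, so that the warped axis $x_s$ has the same end points as $x_c$. Taking the peak as the largest absolute deflection over all electrodes is also an assumption, as is the function name. Following the description, the samples are placed on $x_s$ and the spline is evaluated on $x_c$.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def shape_modulate(template, modulation, bursting_sigmoid=10.0):
    """Stretch a template (n_electrodes, n_samples) by its modulation (n_electrodes,).

    Hypothetical sketch: the logistic warp stands in for MEArec's sigmoid equation.
    """
    n_samples = template.shape[1]
    # Peak sample: largest absolute deflection over all electrodes (assumed).
    peak = np.unravel_index(np.argmax(np.abs(template)), template.shape)[1]
    # x_c: time axis centered on the peak, normalized to a length of 1.
    x_c = (np.arange(n_samples) - peak) / (n_samples - 1)
    # x_t: scaled by bursting_sigmoid, which controls the amount of stretch.
    x_t = x_c * bursting_sigmoid
    # Stretch factor s: one value for the whole template.
    s = float(np.mean(modulation))
    if s < 1:
        # Logistic warp shifted so that w = 0 at the peak, rescaled per side so
        # that x_s has the same end points as x_c and stays strictly increasing.
        w = 1.0 / (1.0 + np.exp(-x_t)) - 0.5
        pos = x_c.max() / w.max() if w.max() > 0 else 1.0
        neg = x_c.min() / w.min() if w.min() < 0 else 1.0
        x_s = np.where(w >= 0, w * pos, w * neg)
        # Place the samples on x_s and resample them on x_c with a cubic spline.
        stretched = CubicSpline(x_s, template, axis=1)(x_c)
    else:
        stretched = template.copy()
    # Finally apply the amplitude modulation, electrode by electrode.
    return stretched * np.asarray(modulation)[:, None]
```

Because the warp is steeper than the identity around the peak, resampling on $x_c$ broadens the spike there, while the peak sample and the end samples keep their original values (scaled by the modulation).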

Noise models and post-processing

Additive noise is superimposed onto the signals after the modulated convolution is finished. Three noise models can be selected with the noise_mode parameter: uncorrelated, distance-correlated, and far-neurons. The uncorrelated noise model is additive Gaussian noise with a user-defined standard deviation (noise_level parameter, 10 µV by default). The distance-correlated mode generates multivariate normal noise with a covariance matrix that depends on the distance between electrodes. The covariance between electrodes i and j (i ≠ j) is defined as $c_{ij} = \frac{d_h}{2 d_{ij}}$, where $d_{ij}$ is the distance between the electrodes and $d_h$ is the distance at which the covariance is 0.5 (noise_half_distance parameter, 30 µm by default). Finally, the far-neurons model generates noise as the activity of many neurons (far_neurons_n parameter, 300 by default) with small amplitudes (below far_neurons_max_amp, 10 µV by default). The population of distant neurons has an excitatory/inhibitory ratio of far_neurons_exc_inh_ratio (0.8 by default). A random noise floor with a standard deviation of far_neurons_noise_floor (0.5 by default) times the standard deviation of the distant neurons' spiking activity is added, in agreement with experimental data [6].
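The uncorrelated and distance-correlated models can be sketched with NumPy's random `Generator`. The covariance follows $c_{ij} = d_h / (2 d_{ij})$; the unit diagonal, the scaling of unit-variance samples by noise_level, and the function names are assumptions of this sketch. This covariance is not guaranteed to be positive semidefinite for closely spaced electrodes (pairs closer than $d_h/2$ get covariances above 1), in which case NumPy emits a warning. The far-neurons model is not covered.

```python
import numpy as np


def distance_correlated_cov(positions, noise_half_distance=30.0):
    """Covariance c_ij = d_h / (2 d_ij) for i != j, with ones on the diagonal (assumed)."""
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    cov = np.ones_like(dist)
    off_diag = ~np.eye(len(positions), dtype=bool)
    cov[off_diag] = noise_half_distance / (2.0 * dist[off_diag])
    return cov


def make_noise(positions, n_samples, noise_mode="uncorrelated", noise_level=10.0,
               noise_half_distance=30.0, seed=None):
    """Return noise with shape (n_electrodes, n_samples), in the units of noise_level."""
    rng = np.random.default_rng(seed)
    n_el = len(positions)
    if noise_mode == "uncorrelated":
        return noise_level * rng.standard_normal((n_el, n_samples))
    if noise_mode == "distance-correlated":
        cov = distance_correlated_cov(positions, noise_half_distance)
        # multivariate_normal returns (n_samples, n_el); transpose to (n_el, n_samples).
        return noise_level * rng.multivariate_normal(np.zeros(n_el), cov, size=n_samples).T
    raise ValueError(f"unsupported noise_mode: {noise_mode!r}")
```

With electrodes 30 µm apart and the default noise_half_distance, neighboring channels come out with a correlation close to 0.5, as the definition of $d_h$ requires.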

Uncorrelated and distance-correlated noise can also be modulated in frequency to match the spectrum observed in experimental data [6, 17]. Extracellular spiking activity exhibits a spectral peak at around 300 Hz, a 1/f spectrum, and a random noise floor. When the noise_color parameter is true, the noise is colored with a second-order infinite impulse response (IIR) peak filter and an additional Gaussian noise floor. The peak frequency, the quality factor, and the weight of the random noise floor can be set with the color_peak, color_q, and color_noise_floor parameters, respectively. Note that with distance-correlated noise the correlation is slightly reduced by the coloring, as a random noise floor is added.
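One possible implementation of the coloring step uses `scipy.signal.iirpeak`, which designs a second-order IIR peak filter from a center frequency and a quality factor. How MEArec weights the floor and whether it renormalizes the result are not specified here, so in this sketch the floor's standard deviation is color_noise_floor times that of the filtered noise, and each channel is rescaled to its input standard deviation. The color_q and color_noise_floor defaults and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.signal import iirpeak, lfilter


def color_noise(noise, fs, color_peak=300.0, color_q=2.0, color_noise_floor=1.0, seed=None):
    """Color noise with shape (n_electrodes, n_samples) using an IIR peak filter.

    Assumptions: the Gaussian floor has color_noise_floor times the standard
    deviation of the filtered noise, and each channel is rescaled to the
    standard deviation it had before coloring.
    """
    rng = np.random.default_rng(seed)
    # Second-order IIR peak filter centered at color_peak with quality factor color_q.
    b, a = iirpeak(color_peak, color_q, fs=fs)
    colored = lfilter(b, a, noise, axis=-1)
    # Add a white Gaussian noise floor on top of the band-limited noise.
    floor_sd = color_noise_floor * colored.std(axis=-1, keepdims=True)
    colored = colored + floor_sd * rng.standard_normal(noise.shape)
    # Restore the per-channel standard deviation of the input noise.
    scale = noise.std(axis=-1, keepdims=True) / colored.std(axis=-1, keepdims=True)
    return colored * scale
```

Because the floor here is drawn independently for each channel, applying this to distance-correlated noise lowers the inter-channel correlation, in line with the note above.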

Optionally, the signals can be filtered (by setting the filter parameter to true) with a high-pass or bandpass Butterworth filter of order filter_order (3 by default) and cutoff frequencies filter_cutoff ([300, 6000] Hz by default).
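A sketch of this filtering step with SciPy's Butterworth design is shown below. Whether MEArec filters causally or forward-backward is not stated here; this sketch assumes zero-phase filtering with `sosfiltfilt` (which doubles the effective order) and treats a single cutoff value as a high-pass filter. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def filter_recording(recording, fs, filter_order=3, filter_cutoff=(300.0, 6000.0)):
    """Butterworth-filter a recording with shape (n_electrodes, n_samples).

    Two cutoffs give a bandpass filter, a single cutoff a high-pass filter.
    Zero-phase (forward-backward) filtering is an assumption of this sketch.
    """
    cutoff = np.atleast_1d(np.asarray(filter_cutoff, dtype=float))
    if cutoff.size == 2:
        sos = butter(filter_order, cutoff, btype="bandpass", fs=fs, output="sos")
    elif cutoff.size == 1:
        sos = butter(filter_order, float(cutoff[0]), btype="highpass", fs=fs, output="sos")
    else:
        raise ValueError("filter_cutoff must contain one or two frequencies")
    return sosfiltfilt(sos, recording, axis=-1)
```

Second-order sections (`output="sos"`) are used instead of transfer-function coefficients because they are numerically more robust for bandpass designs.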

Drifting recordings

When the drifting parameter is set to true, drifting recordings are generated. In this case, the template library must also have been generated in drifting mode. The user can set the number of drifting units (n_drifting parameter); if n_drifting is null, all units drift.

The generation of drifting recordings differs only in the template selection and modulated convolution steps. In the template selection, in addition to the selection rules based on template amplitude, inter-neuron distance, and spatial overlap, a template is selected only if the angle between its drifting direction (the vector from the initial to the final position) and a user-defined preferred direction (preferred_dir parameter, [0, 0, 1] by default) is within an angle tolerance (angle_tol parameter, 15° by default).
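The angle criterion amounts to comparing the angle between two vectors with a tolerance. The sketch below, whose function name is hypothetical, computes that angle from the normalized dot product.

```python
import numpy as np


def drift_direction_ok(initial_pos, final_pos, preferred_dir=(0.0, 0.0, 1.0), angle_tol=15.0):
    """True if the drift direction (final - initial) is within angle_tol degrees of preferred_dir."""
    drift = np.asarray(final_pos, dtype=float) - np.asarray(initial_pos, dtype=float)
    pref = np.asarray(preferred_dir, dtype=float)
    cos_angle = np.dot(drift, pref) / (np.linalg.norm(drift) * np.linalg.norm(pref))
    # Clip to guard against rounding slightly outside [-1, 1] before arccos.
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle <= angle_tol)
```

For instance, a unit drifting from (0, 0, 0) to (0, 10, 50) µm deviates by about 11° from the default preferred direction and is accepted, whereas a drift to (0, 20, 50) µm (about 22°) is rejected.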

In the modulated convolution, the appropriate template among the drifting templates is selected for each spike occurrence based on the current drifting position, computed as the initial position plus the drifting velocity times the simulation time. The drifting velocity can be modified by the user (drift_velocity parameter, 5 µm/min by default), and the drift can be set to start after t_start_drift seconds.
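The position-based template choice can be sketched as follows. The sketch assumes that each drifting unit comes with an ordered array of template positions from the initial to the final one, that the drift proceeds along the straight line between them, and that the nearest drifting template is used (which also keeps the last template once the final position is reached). Times are in seconds and drift_velocity is in µm/min, as in the text; the function name is hypothetical.

```python
import numpy as np


def drift_template_index(t, drift_positions, drift_velocity=5.0, t_start_drift=0.0):
    """Index of the drifting template to use for a spike at time t (seconds).

    drift_positions: (n_steps, 3) template positions in µm, ordered from the
    initial to the final position. drift_velocity is in µm/min.
    """
    drift_positions = np.asarray(drift_positions, dtype=float)
    initial, final = drift_positions[0], drift_positions[-1]
    direction = (final - initial) / np.linalg.norm(final - initial)
    # No drift before t_start_drift; afterwards the unit moves at constant speed.
    elapsed_min = max(t - t_start_drift, 0.0) / 60.0
    current = initial + direction * drift_velocity * elapsed_min
    # Nearest drifting template; beyond the final position this stays at the last one.
    distances = np.linalg.norm(drift_positions - current, axis=1)
    return int(np.argmin(distances))
```

With templates every 10 µm along z and the default velocity, a spike at t = 120 s has drifted 10 µm and uses the second template.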

Statistical analysis

No statistical analysis is used in this contribution.

Code availability

The presented software package is available at https://github.com/alejoe91/MEArec, together with https://github.com/alejoe91/MEAutility (used for probe handling). Both packages are also available on PyPI: https://pypi.org/project/MEArec/ and https://pypi.org/project/MEAutility/.

Data availability

All the datasets generated for the paper and used to make the figures are available on Zenodo at https://doi.org/10.5281/zenodo.3247736.

Acknowledgments

A.P.B. and G.T.E. are part of the Simula-UCSD-University of Oslo Research and PhD training (SU-URPh) program, an international collaboration in computational biology and medicine funded by the Norwegian Ministry of Education and Research. Moreover, we would like to thank Kristian Lensjø, Jennifer Hazen, and Mikkel Lepperød for their valuable feedback on the article.

Appendix A command line interface (CLI)

MEArec implements a command line interface (CLI) that makes template and recording generation easy and allows for scripting. To discover the available commands, the user can use the --help option:

>> mearec --help
Usage: mearec [OPTIONS] COMMAND [ARGS]...

  MEArec: Fast and customizable simulation of extracellular recordings on
  Multi-Electrode-Arrays

Options:
  --help  Show this message and exit.

Commands:
  available-probes        Print available probes.
  default-config          Print default configurations.
  gen-recordings          Generates RECORDINGS from TEMPLATES.
  gen-templates           Generates TEMPLATES with biophysical simulation.
  set-cell-models-folder  Set default cell_models folder.
  set-recordings-folder   Set default recordings output folder.
  set-recordings-params   Set default recordings parameter file.
  set-templates-folder    Set default templates output folder.
  set-templates-params    Set default templates parameter file.

Each available command can be inspected using the --help option:

>> mearec command --help

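At installation, MEArec creates a configuration folder (.config/mearec) in which global settings are stored. The default paths to the cell models folder, the templates and recordings output folders, and the parameter files can be set using the set-cell-models-folder, set-templates-folder, set-recordings-folder, set-templates-params, and set-recordings-params commands. By default, these files and folders are located in the configuration folder. The current settings can be printed with the default-config command: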

>> mearec default-config
{'cell_models_folder': path-to-cell_models,
 'recordings_folder': path-to-recordings-folder,
 'recordings_params': path-to-recordings-params.yaml,
 'templates_folder': path-to-templates-folder,
 'templates_params': path-to-templates-params.yaml}

A list of available probes can be found by running the available-probes command:

Figure 11

Finally, examples of template and recording generation commands can be found in the Results section.

Appendix B Python API example

MEArec implements a Python API for simulating both templates and recordings. The Python API is recommended for generating recordings for testbench purposes. For example, the following script was used to generate the four recordings with varying noise levels shown in Figure 8A:

Figure 12

For additional examples, please refer to the GitHub page https://github.com/alejoe91/MEArec.

Footnotes

  • https://doi.org/10.5281/zenodo.3247736

  • 1 https://travis-ci.org/

  • 2 e.g. https://github.com/MouseLand/Kilosort2

  • 3 https://github.com/SpikeInterface

  • 4 https://github.com/MouseLand/Kilosort2

  • 5 https://spikeforest.flatironinstitute.org

  • 6 https://github.com/SpikeInterface

  • 7 https://celltypes.brain-map.org/

References

  1. B. D. Allen, C. Moore-Kochlacs, J. G. Bernstein, J. Kinney, J. Scholvin, L. Seoane, C. Chronopoulos, C. Lamantia, S. B. Kodandaramaiah, M. Tegmark, et al. Automated in vivo patch clamp evaluation of extracellular multielectrode array spike recording capability. Journal of Neurophysiology, 2018.
  2. L. Berdondini, K. Imfeld, A. Maccione, M. Tedesco, S. Neukom, M. Koudelka-Hep, and S. Martinoia. Active pixel sensor array for high spatio-temporal resolution electrophysiological recordings from single cell to large scale neuronal networks. Lab on a Chip, 9(18):2644–2651, 2009.
  3. A. P. Buccino, E. Hagen, G. T. Einevoll, P. D. Häfliger, and G. Cauwenberghs. Independent component analysis for fully automated multi-electrode array spike sorting. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 2627–2630. IEEE, 2018.
  4. A. P. Buccino, M. Kordovan, T. V. Ness, B. Merkt, P. D. Häfliger, M. Fyhn, G. Cauwenberghs, S. Rotter, and G. T. Einevoll. Combining biophysical modeling and deep learning for multielectrode array neuron localization and classification. Journal of Neurophysiology, 2018.
  5. A. P. Buccino, M. Kuchta, K. H. Jæger, T. V. Ness, P. Berthet, K. A. Mardal, G. Cauwenberghs, and A. Tveito. How does the presence of neural probes affect extracellular potentials? Journal of Neural Engineering, 2019.
  6. L. A. Camuñas-Mesa and R. Q. Quiroga. A detailed and fast model of extracellular recordings. Neural Computation, 25(5):1191–1212, 2013.
  7. N. T. Carnevale and M. L. Hines. The NEURON Book. Cambridge University Press, 2006.
  8. J. E. Chung, J. F. Magland, A. H. Barnett, et al. A fully automated approach to spike sorting. Neuron, 95(6):1381–1394, 2017.
  9. M. Diesmann and M.-O. Gewaltig. NEST: An environment for neural systems simulations. Forschung und wissenschaftliches Rechnen, Beiträge zum Heinz-Billing-Preis, 58:43–70, 2001.
  10. R. Diggelmann, M. Fiscella, A. Hierlemann, and F. Franke. Automatic spike sorting for high-density microelectrode arrays. Journal of Neurophysiology, 120(6):3155–3171, 2018.
  11. G. T. Einevoll, F. Franke, E. Hagen, et al. Towards reliable spike-train recordings from thousands of neurons with multielectrodes. Current Opinion in Neurobiology, 22(1):11–17, 2012.
  12. F. Franke, M. Natora, P. Meier, E. Hagen, K. H. Pettersen, H. Linden, G. T. Einevoll, and K. Obermayer. An automated online positioning system and simulation environment for multielectrodes in extracellular recordings. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pages 593–597. IEEE, 2010.
  13. U. Frey, U. Egert, F. Heer, S. Hafizovic, and A. Hierlemann. Microelectronic system for high-resolution mapping of extracellular electric fields applied to brain slices. Biosensors and Bioelectronics, 24(7):2191–2198, 2009.
  14. S. Garcia, D. Guarino, F. Jaillet, T. R. Jennings, R. Pröpper, P. L. Rautenberg, C. Rodgers, A. Sobolev, T. Wachtler, P. Yger, et al. Neo: an object model for handling electrophysiology data in multiple formats. Frontiers in Neuroinformatics, 8:10, 2014.
  15. T. Goto, R. Hatanaka, T. Ogawa, A. Sumiyoshi, J. Riera, and R. Kawashima. An evaluation of the conductivity profile in the somatosensory barrel cortex of Wistar rats. Journal of Neurophysiology, 104(6):3388–3412, 2010.
  16. N. W. Gouwens et al. Systematic generation of biophysically detailed models for diverse cortical neuron types. Nature Communications, 9(1):710, 2018.
  17. E. Hagen, S. Næss, T. V. Ness, and G. T. Einevoll. Multimodal modeling of neural network activity: Computing LFP, ECoG, EEG, and MEG signals with LFPy 2.0. Frontiers in Neuroinformatics, 12, 2018.
  18. E. Hagen, T. V. Ness, A. Khosrowshahi, C. Sørensen, M. Fyhn, T. Hafting, F. Franke, and G. T. Einevoll. ViSAPy: a Python tool for biophysics-based generation of virtual spiking activity for evaluation of spike-sorting algorithms. Journal of Neuroscience Methods, 245:182–204, 2015.
  19. K. D. Harris, D. A. Henze, J. Csicsvari, H. Hirase, and G. Buzsaki. Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. Journal of Neurophysiology, 84(1):401–414, 2000.
  20. E. Hay, S. Hill, F. Schürmann, H. Markram, and I. Segev. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Computational Biology, 7(7):e1002107, 2011.
  21. D. A. Henze, Z. Borhegyi, J. Csicsvari, A. Mamiya, K. D. Harris, and G. Buzsaki. Intracellular features predicted by extracellular recordings in the hippocampus in vivo. Journal of Neurophysiology, 84(1):390–400, 2000.
  22. G. Hilgen, M. Sorbaro, S. Pirmoradian, J.-O. Muthmann, I. E. Kepiro, S. Ullo, C. J. Ramirez, A. P. Encinas, A. Maccione, L. Berdondini, et al. Unsupervised spike sorting for large-scale, high-density multielectrode arrays. Cell Reports, 18(10):2521–2532, 2017.
  23. G. R. Holt and C. Koch. Electrical interactions via the extracellular potential near cell bodies. Journal of Computational Neuroscience, 6(2):169–184, 1999.
  24. D. Jäckel, U. Frey, M. Fiscella, et al. Applicability of independent component analysis on high-density microelectrode array recordings. Journal of Neurophysiology, 108(1):334–348, 2012.
  25. J. J. Jun, C. Mitelut, C. Lai, S. Gratiy, C. Anastassiou, and T. D. Harris. Real-time spike sorting platform for high-density extracellular probes with ground-truth validation and drift correction. bioRxiv, page 101030, 2017.
  26. J. J. Jun, N. A. Steinmetz, J. H. Siegle, D. J. Denman, M. Bauza, B. Barbarits, A. K. Lee, C. A. Anastassiou, A. Andrei, Ç. Aydin, et al. Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232, 2017.
  27. J. H. Lee, D. E. Carlson, H. S. Razaghi, W. Yao, G. A. Goetz, E. Hagen, E. Batty, E. Chichilnisky, G. T. Einevoll, and L. Paninski. YASS: yet another spike sorter. In Advances in Neural Information Processing Systems, pages 4002–4012, 2017.
  28. B. Lefebvre, P. Yger, and O. Marre. Recent progress in multi-electrode spike sorting methods. Journal of Physiology-Paris, 110(4):327–335, 2016.
  29. H. Lindén, E. Hagen, S. Leski, et al. LFPy: a tool for biophysical simulation of extracellular potentials generated by detailed model neurons. Frontiers in Neuroinformatics, 7:41, 2014.
  30. H. Markram, E. Muller, S. Ramaswamy, et al. Reconstruction and simulation of neocortical microcircuitry. Cell, 163(2):456–492, 2015.
  31. A. Marques-Smith, J. P. Neto, G. Lopes, J. Nogueira, L. Calcaterra, J. Frazão, D. Kim, M. G. Phillips, G. Dimitriadis, and A. Kampff. Recording from the same neuron with high-density CMOS probes and patch-clamp: a ground-truth dataset and an experiment in collaboration. bioRxiv, page 370080, 2018.
  32. R. Migliore, C. A. Lupascu, L. L. Bologna, A. Romani, J.-D. Courcol, S. Antonel, W. A. Van Geit, A. M. Thomson, A. Mercer, S. Lange, et al. The physiological variability of channel density in hippocampal CA1 pyramidal cells and interneurons explored using a unified data-driven modeling workflow. PLoS Computational Biology, 14(9):e1006423, 2018.
  33. S. L. Mondragón-González and E. Burguière. Bio-inspired benchmark generator for extracellular multi-unit recordings. Scientific Reports, 7:43253, 2017.
  34. T. V. Ness, C. Chintaluri, J. Potworowski, S. Leski, H. Glabska, D. K. Wójcik, and G. T. Einevoll. Modelling and analysis of electrical potentials recorded in microelectrode arrays (MEAs). Neuroinformatics, 13(4):403–426, 2015.
  35. J. P. Neto, G. Lopes, J. Frazão, et al. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset. Journal of Neurophysiology, 116(2):892–903, 2016.
  36. P. L. Nunez and R. Srinivasan. Electric fields of the brain: the neurophysics of EEG. Oxford University Press, USA, 2006.
  37. M. Pachitariu, N. A. Steinmetz, S. N. Kadir, et al. Fast and accurate spike sorting of high-channel count probes with KiloSort. In Advances in Neural Information Processing Systems, pages 4448–4456, 2016.
  38. R. Q. Quiroga, Z. Nadasdy, and Y. Ben-Shaul. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Computation, 16(8):1661–1687, 2004.
  39. S. Ramaswamy, J. Courcol, M. Abdellah, et al. The neocortical microcircuit collaboration portal: a resource for rat somatosensory cortex. Frontiers in Neural Circuits, 9, 2015.
  40. H. G. Rey, C. Pedreira, and R. Q. Quiroga. Past, present and future of spike sorting techniques. Brain Research Bulletin, 119:106–117, 2015.
  41. C. Rossant, S. N. Kadir, D. F. Goodman, J. Schulman, M. L. Hunter, A. B. Saleem, A. Grosmark, M. Belluscio, G. H. Denfield, A. S. Ecker, et al. Spike sorting for large, dense electrode arrays. Nature Neuroscience, 19(4):634, 2016.
  42. N. A. Steinmetz, C. Koch, K. D. Harris, and M. Carandini. Challenges and opportunities for large-scale electrophysiology with Neuropixels probes. Current Opinion in Neurobiology, 50:92–100, 2018.
  43. P. Yger, G. L. Spampinato, E. Esposito, B. Lefebvre, S. Deny, C. Gardella, M. Stimberg, F. Jetter, G. Zeck, S. Picaud, et al. A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. eLife, 7:e34518, 2018.
Posted July 03, 2019.
Subject Area

  • Neuroscience