“Supersessioning”: A hardware/software system for electrophysiology spanning multiple sessions in marmosets

We introduce a straightforward, robust method for recording and analyzing spiking activity over timeframes longer than a single session, with primary application to the marmoset (Callithrix jacchus). Although in theory the marmoset’s smooth brain allows for broad deployment of powerful tools in primate cortex, in practice marmosets do not typically engage in long experimental sessions akin to rhesus monkeys. This potentially limits their value for detailed, quantitative neurophysiological study. Here we describe chronically-implanted arrays with a 3D arrangement of electrodes yielding stable single and multi-unit responses, and an analytic method for creating “supersessions” combining that array data across multiple experiments. We could match units across different recording sessions over several weeks, demonstrating the feasibility of pooling data over sessions. This could be a key tool for extending the viability of marmosets for dissecting neural computations in primate cortex.


20
The marmoset has drawn attention as a complementary nonhuman primate model system for vi-21 sual neuroscience. While the dominant primate model system in neuroscience, the rhesus monkey 22 (Macaca mulatta), has the advantage of (relatively) rich cognitive abilities, a large body and robust 23 physiology, and an aggressive work ethic, their large and convoluted (gyrified) brains currently limit 24 the number of techniques that can be applied for measurements of neural activity. Thus, despite 25 their excellent trainability for complex tasks and willingness to engage in lengthy experimental 26 sessions, the scale and variety of neurophysiological questions that can be addressed have been 27 somewhat limited by practical constraints. Recently, the common marmoset (Callithrix jacchus) has 28 emerged as a complementary primate model system because of their smooth (lissencephalic) cor-29 tex, opening up a much larger number cortical areas to the use of large-scale chronically implanted 30 electrode arrays (in addition to other techniques). However, a major current concern for adopt-31 ing the awake behaving marmoset for detailed quantitative studies is their tendency to perform 32 far fewer trials per session compared to macaques. Such a behavioral limitation would result in 33 correspondingly smaller amounts of neural data (and hence, statistical power) per experiment, un-34 dercutting the other advantages of the species, and likely limiting their applicability as a powerful 35 neurophysiological complement to the sorts of quantitative neuroscience work done in macaques. 36 To redress this fundamental potential limitation, we have developed a straightforward, user-37 friendly tool for recording from large-scale arrays in marmosets while surmounting the relatively 38 short behavioral sessions performed by this smaller (and gentler) species. First, we report success-39 Manuscript submitted to eLife allow for their relative strengths and weaknesses to be considered.

92
Neural activity apparent for more than 9 months on chronically-implanted 3D ar-93 rays 94 We recorded single and multi-unit (hereafter, "unit") activity in the brains of 2 marmosets, one with 95 a 3D N-form array in and around the middle temporal area (MT), the other with an identical array 96 placed in posterior parietal cortex (PPC). For both arrays (Figure 1 A, B, respectively), we were able 97 to record spiking activity starting a week after insertion. Activity lasted for a duration of at least 98 9 months, as depicted in Figure 1 (top rows). Figure 1 (second rows) show, in comparison, the 99 relatively short durations of individual recording sessions (approximately a half hour to an hour). 100 These durations likely reflect a lower bound on how long marmosets will work, as they were largely   113 Our goal was to identify spikes from the same units across recording sessions. This required mea-114 sures that would be robust to noise, in the sense that spikes from other neurons would not perturb 115 or distort characterization and identification of a given unit. To that aim, we focused our analysis 116 on a very short temporal window, including only the depolarization phase of a spike, represented 117 by a local minimum in the raw voltage traces. 118 For each local minimum (i.e., putative spike) in the raw voltage trace, we determined: (a) ampli-119 tude, measured as the dot product with a template (of unit power), expressed in standard devia-120 tions ( ), as calculated on the high-pass filtered voltage traces; (b) width, measured as the full width 121 at half minimum; and (c) symmetry, measured as the ratio of its falling and rising phase durations 122 (i.e., a 1 : 2 ratio means that recovering back to baseline took twice as long as reaching the voltage 123 minimum). 124 These parameterized shape characterizations of the units were put into 3D-histograms (marginals 125 shown in Figure 2 A) for each recording session, and clustered using a watershed algorithm (see 126 Methods for details). This procedure yielded shape clusters (cyan markers in Figure 2 A) for every 127 session in a common coordinate system to allow for cross-session comparisons of spike shapes. 128 Shape clusters between consecutive sessions often looked very similar, and so we further tested 129 whether they likely reflected spikes from the same or from different units. 130 Specifically, if the brain tissue was held in place by the 16 electrode shanks of the array such 131 that relative movements between the electrodes and the sampled neurons rarely happened, we 132 would always record from the same neurons and see identical spike shapes. Otherwise, if there 133 were substantial shifts in relative position between brain and electrodes, both amplitude and spike 134 shape would shift with movement, and we would be unable to track units across a large number 135 of sessions. 136 We were indeed able to systematically match units across recordings. This was done quantita-137 tively, using the Jensen-Shannon divergence as a distance measure in the histogram shape space  In conclusion, our main result is that matching simple shape statistics of spike waveforms across 155 several recording sessions using N-form arrays in marmosets is feasible, and for some units this 156 consecutive recording is possible over notably long periods of time (> 1 month). This grants us the 157 capacity to combine data from multiple experimental days, which we deem "supersessions".

158
Tuning properties on individual electrodes are stable across sessions 159 We further confirmed the stability of the measured "supersession" neuronal activity by evaluating 160 the cross-session consistency of physiological tuning properties. This evaluation was done for the 161 MT array implanted in marmoset J, where we were able to confirm that several sites on the array 162 showed directionally-tuned activity in response to moving dots in the left visual field (as expected 163 when recording from area MT in the right hemisphere).

164
The MT electrodes recorded strongly tuned multi-unit activity, so we focused on MUA super-

172
We therefore created supersessions across these sessions that exhibited stable tuning and 173 spike shapes, which allowed us to combine larger amounts of data for a single analysis. As an ex-  In this example, tuning was stable for considerably longer than one week. This demonstrates 181 that not only were shape clusters with high amplitudes were stable across sessions, but also that  Most units in a given recording were observed for several sessions 188 Having established stability of both spike waveforms and physiological tuning, we now turn to 189 report a more comprehensive statistical description of recording stability and our ability to distin-190 guish spike shape clusters (i.e., to isolate one unit from another). A summary of all tracked units 191 across recording sessions is shown in Figure 4. Spike clusters were regions in 3D-shape-histograms, 192 consisting of a set of voxels, which could be divided into boundary voxels (adjacent to a voxel out-193 side the cluster) and center voxels. If the average spike count in boundary voxels was less than 3/4 194 of the average density in center voxels, clusters were considered as "better-isolated" and shown 195 in darker colors in Figure 4. 196 We further distinguished clusters that lasted for shorter numbers of sessions (<5, orange) and waveforms from the array in MT tended to be narrower than those from the PPC array, perhaps 226 revealing a biophysical difference that our approach is capable of picking up.

227
Viewing these basic descriptive plots, we also wondered whether long term matches of spike 228 clusters might be a result of detecting different units that just happen to produce similar shapes.

229
To test this, we estimated how likely a given cluster might be mistaken for a different cluster by shapes survived less than 5 sessions ( Figure 5, yellow circles). However, we also noticed that many 238 of these clusters had relatively low amplitudes and therefore might have been lost, not due to 239 actual changes in the presence of the unit over a particular (brief) time frame, but due to insuffi-240 cient signal-to-noise ratio relative to our spike-identification standards. We therefore re-focused 241 our analysis of the relation between spike waveform uniqueness and lifetime using only clusters 242 surviving for at least 5 sessions.

243
In order to assess whether clusters with more or less common waveform shapes might show the expected lifetime at a given age, which is assuming a constant probability to lose a cluster in 247 each session. Figure 6 shows that this assumption is reasonable, as the fraction of clusters lost per 248 session does not change dramatically after 5 sessions. Importantly, except for clusters with the 249 10% most uncommon shapes, the rate at which spike clusters were lost over time did not depend 250 on how common the spike shapes of that cluster were. This is good news, as it does not appear that that were more detailed than those from single sessions.

293
Recording performance 294 With the goal of making the marmoset more strongly viable for detailed quantitative studies, we 295 aimed to develop an analysis pipeline that would be robust to different levels of recording quality, 296 measuring single-unit activity where possible, but at the same time considering multi-unit activity.

297
When applying this analysis to data recorded from implanted electrode arrays over the course of 298 more than 9 months and averaging across all recording sessions, we obtained 28 better-isolated

Manuscript submitted to eLife
Although a complete comparison between these types of array is beyond the scope of this 312 proof-of-concept tool introduction, we believe it is likely that the variations in performance ob-313 served with 'Utah' arrays in macaques were larger than for the 3D arrays we used. In fact, in mar-314 mosets, arrays with similar sizes as the ones used in this study (but with fewer electrode contacts) 315 have been reliably implanted and often measured spiking activity for months (Debnath et al., 2018). 316 We conclude this comparison by noting that we recorded from a similar number of units as Schwartz, 2011), but from a smaller region of the brain, largely thanks to the denser 3D geometry 319 of the arrays. This is another advantage on the hardware side of this tool, as it allows for larger 320 scale recordings within small brain areas in the marmoset-arrays built for larger primate brains 321 will often sparsely sample within a single area, spanning their footprint over many adjacent areas. The N-form arrays we used had the same spacing between shanks as the 'Utah' type of array 342 -albeit with a higher density of recording sites along a shank, and far fewer total shanks. Even 343 though the N-form arrays comprised only 16 shanks, we found a similar long-term stability for 344 well-isolated single units, suggesting that this number of shanks is sufficient to mitigate substantial 345 array drift. The smaller "bed of nails" also permits a slow insertion method, which we hypothesize 346 is important for avoiding damage associated with ballistic insertion methods, especially important 347 in the smaller and more delicate marmoset brain.

348
In assessing the usefulness of supersession unit data, we used relatively relaxed criteria for unit ing techniques using ultraflexible mesh electronics (Fu et al., 2016, 2017) and silicon high-density 370 arrays (Jun et al., 2017b) have not yet been systematically studied for unit longevity. In primates,

395
In reality, we found ourselves in a fruitful middle regime: Units were recorded for variable du-396 rations, in which a small fraction of units both appeared and was lost between recording sessions.

397
This process was not entirely random, as we saw that most units disappeared during the initial ses-398 sions after their appearance. This means that the chance for a unit to survive for another session 399 increased with the number of sessions that this neuron had already been observed. Hence, if we 400 were to ask which of the units we would most likely observe in a future session, the best bet would 401 be those units that were already observed for the most sessions in the past.

402
The variable lifetimes of units also provide an additional tool for raising the standard for isola- tively fast mechanism for that, with a timescale of hours to days (as opposed to weeks and months). 412 We believe that these findings can impact the planning of experiments using chronic arrays.

413
In the classical single session approach, experimenters devote part of the experimental time for  The primary reason for eschewing existing spike sorting methods was a general concern about 454 robustness when stationarity assumptions were not met across recording sessions. This is a known 455 challenge to even cutting-edge algorithms (Jun et al., 2017a). We instead chose a simple paramet-456 ric representation that was designed to be robust to noise and artifacts, which can differ from 457 session to session. Our focus was on characterizing the peak of the depolarization phase using ters of the spike shapes was essentially an optimization. We would shift a template temporally at 466 sub-sampling resolution and change its width and symmetry to best match a local minimum in the 467 raw voltage traces. In practice, this step was implemented by running the raw data through a large 468 filter bank on a GPU.

469
Our spike sorting approach did not solve the problem of overlapping spikes. However, it greatly 470 reduced the problem as the time interval needed for detection was reduced to the width of the 471 spike and thus, due to zero padding, much smaller than the the width of the templates in the fil-472 ter bank. In addition, for cases where overlapping spikes exist, we should see them in the shape 473 histograms as somewhat isolated shapes that are a bit wider and of higher amplitude than an ad-474 jacent cluster. In our data, we did not find evidence for significant numbers of overlapping spikes 475 near isolated clusters. Shape clusters were either nicely separated in the sense that overlapping 476 spikes had at least half an order of magnitude lower amplitudes, or we would be unable to separate 477 clusters in the first place, due to low amplitudes and a large number of sources, with a combined 478 firing rate beyond 100 Hz (as in Figure 3). In the latter case, peak firing rates in single trials in re-479 sponse to a stimulus can be an order of magnitude higher and we necessarily detect overlapping In this work, we used the parametric representation of local mimina as a spike sorting method.

484
But we could certainly perform spike sorting with an existing method and obtain these parametric In many cases, we observed that shape clusters appeared and disappeared gradually over time, 492 such that the observed spike amplitudes were highest around the middle of their lifetime. We 493 could thus have a situation where some shape clusters of a given unit were clearly isolated single 494 unit activity, and others were contaminated (e.g. Figure 7 I). Although this effect means that some 495 of the unit data from 'supersessions' is less well-isolated than conventional singe-session data, the 496 framework can also be used to estimate the impact of contamination for a given analysis, and 497 hence to determine in a principled manner how high an isolation standard is required.

498
To give an example how such analysis could look like, assume that we have a number of ses-499 sions (W) where a unit was well-isolated, and some sessions (C), where the same unit was contam-500 inated with low amplitude spikes from other neurons and some of its spikes were lost due to low 501 amplitudes. We would then pool data from each group (W and C) of sessions to obtain a larger 502 sample size and estimate firing rates and interspike interval histograms. Manuscript submitted to eLife number of spikes missed in group C due to low spike amplitudes, one can multiply the difference in 511 firing rates between group W and C with the total recording duration of group C and add the spike 512 count for the low amplitude spikes determined above. After doing a given analysis separately for 513 groups W and C, one could then compare the results and see how they are affected for a known 514 contamination and signal loss.

515
Furthermore, if one looked into the datasets of group W, one would likely find spikes that are 516 statistically similar to the contaminating spikes in group C, simply by identifying identically shaped 517 spikes at much lower amplitudes. Therefore, it is possible to create surrogate datasets with known 518 contamination (and, by removing spikes, signal loss) and treat them as a model to predict effects 519 on a given analysis. The above analysis would then provide independent data to test this model.

520
Apart from spike clusters, our sorting approach also gives access to low amplitude spikes that 521 do show tuned responses to visual stimulation, but likely arise from a multitude of units with a 522 continuum of corresponding spike shapes (e.g. Figure 3). For the purpose of decoding neural ac- period. We need to stress here that MUH is distinct from the 'unsorted spikes' often left behind by 531 most sorting algorithms.

532
In summary, we were able to create 'supersessions' for individual units on a timescale of sev-533 eral days to a few weeks. This allows for more statistical power than a single session's worth of 534 data can provide, and hence could put the awake marmoset preparation more on par with that of 535 macaques. This is important because the marmoset is also a "pivot species" to richer and more  The MT array was placed based on high response to direction of motion, while the LIP array was 549 placed based on high eye-movement related activity. A small craniotomy and duratomy were made 550 surrounding the desired area for array placement.

551
The N-form array was mounted on a stereotax arm and manually lowered till tips of the shanks 552 had entered the brain. The brain dimpled slightly, then the tissue relaxed around the implant.

553
The array was then slowly lowered until the baseplate was just above the brain's surface. The shanks, spaced by 400 µm. Each shank was 1.5 mm long and had 4 electrode contacts, one at its 562 tip, and three more at 250 µm, 375 µm and 500 µm distance from the tip. Extracellular signals were 563 recorded at all 64 electrode contacts with sampling rate of 30 kHz, using the OpenEphys recording 564 system (Siegle et al., 2017). For marmoset J, seven of the electrode contacts were found damaged 565 after the surgery and ignored for further analyses.  Pre-processing 590 We filtered a 60 Hz component out of the raw data for each electrode using a custom made al-591 gorithm. We also performed common average referencing by subtracting (projections onto) the 592 median of high-pass filtered signals over all electrodes from each channel. We further up-sampled 593 data to 60 kHz before feeding into Kilosort (Pachitariu et al., 2016) shape. This can be resolved by upsampling the data, but results in longer templates.

616
3. Temporally overlapping spikes: Need to be detected and fitted.

617
To address these three issues, we generated a bank of unimodal templates (essentially triangles (when compared to using templates generated from the data, about 10% of the signal power).

624
We determined local maxima (in time and width, but global in symmetry to avoid double de-625 tections) for the match (dot product) between our templates and the preprocessed voltage traces.

626
In this setting, we were fitting the peak of the depolarization phase of a spike. While an error in Spike clusters appear as local density maxima in these histograms. To show that this is indeed 638 the case, we sorted spikes with a widely used spike sorting algorithm (Kilosort, Pachitariu et al. 639 (2016)). For that, we used a low threshold for splitting clusters in the Kilosort algorithm and ex-640 tracted the shapes of the corresponding spikes from our template matching strategy. This allowed 641 us to perform the manual step of merging clusters in an automated procedure, using the Jensen-

642
Shannon divergence between shape histograms as a distance metric. 643 We obtained three dimensional histograms of shape parameters for spikes from each Kilosort 644 cluster (Figure 8 D). We compared Kilosort clusters to clusters obtained by running the watershed 645 algorithm on shape histograms and found a good match for high amplitude clusters (Figure 8 E).

646
The latter clusters were (by construction) better localized in our histograms and we decided to use 647 them instead of Kilosort clusters in the following analyses.

648
Possible extensions 649 We implemented the spike sorting for the case of single, isolated electrodes. An extension to dense 650 arrays is beyond the scope of this article, but we will briefly discuss potential implementation issues 651 here. 2. Spatial grids: memory constraints on the GPU will currently require chunking the array into 657 rows of electrodes.  Cross-session merges 669 We computed pairwise Jensen-Shannon divergences between existing clusters from the previous 670 2 sessions and clusters from the current session allowing for small shifts in amplitude, width and 671 symmetry for a penalty. Specifically, we did multiply the Jensen-Shannon divergence with the in-672 verse of Hanning kernels with a half-width of 7 (for amplitude) and 3 (width and symmetry) bins.

673
Each cluster from the current session was then merged with the existing cluster with the smallest 674 Jensen-Shannon divergence if this was below a threshold of 0.3 ln(2), otherwise it was labeled as a shape clusters, and the average of these quantities for each better-isolated unit across sessions. 697 We then ranked units according to the local density of shape clusters. A lot of short-lived units had (1) It assumes that after the N-th session, unit losses are described by a Poisson process with a fixed 705 rate.

706
Receptive fields 707 Firing rate responses were averaged across sessions and smoothed using a 41 ms Hanning kernel.