SUMMARY
State-of-the-art Ca2+ imaging studies that monitor large-scale neural dynamics can produce video datasets ~10 terabytes or more in total size, roughly comparable to ~10,000 Hollywood films. Processing such data volumes requires automated, general-purpose and fast computational methods for cell identification that are robust to a wide variety of noise sources. We introduce EXTRACT, an algorithm that is based on robust estimation theory and uses graphical processing units (GPUs) to extract neural dynamics in computing times up to 10-times faster than imaging durations. We validated EXTRACT on simulated and experimental data and processed 94 public datasets from the Allen Institute Brain Observatory in one day. Showcasing its superiority over past cell-sorting methods at removing noise contaminants, neural activity traces from EXTRACT allow more accurate decoding of animal behavior. Overall, EXTRACT provides neuroscientists with a powerful computational tool matched to the present challenges of neural Ca2+ imaging studies in behaving animals.
INTRODUCTION
State-of-the-art neural Ca2+ imaging experiments, such as those using fluorescence macroscopes1,2, can generate up to ~300 MB of imaging data per second, or >1 TB per hour of recording. Faced with such data volumes, neuroscientists need computational tools that can quickly process extremely large datasets without resorting to analytic shortcuts that sacrifice the quality of results. A pivotal step in the analysis of many large-scale Ca2+ imaging studies is the extraction of individual cells and their activity traces from the raw video data. The quality of cell extraction is critical for subsequent analyses of neural activity patterns, and, as shown below, superior analytics for cell extraction lead to superior biological results and conclusions.
Early methods for cell extraction identified neurons as regions-of-interest (ROIs) through manual3–7, semi-automated8 or automated image segmentation9,10, which in turn allowed Ca2+ activity in each ROI to be determined using either the identified spatial masks or multivariate regression. Other cell extraction methods, including independent components analysis (ICA), non-negative matrix factorization (NMF), and constrained non-negative matrix factorization (CNMF), simultaneously infer cells’ shapes and dynamics using a matrix factorization11–13. In these now widely used methods, the Ca2+ movie is treated as a three-dimensional matrix that can be approximated as the product of a two-dimensional (spatial) matrix and a one-dimensional (temporal) matrix, although the detailed assumptions about this factorization differ between the three approaches and influence their relative strengths and limitations. Together, extant cell extraction methods have enabled Ca2+ imaging studies with a wide variety of microscopy modalities and model species.
Notwithstanding the many past successes of Ca2+ imaging, neuroscientists face important computational challenges as Ca2+ imaging technology continues to progress rapidly. Many datasets contain noise that is not Gaussian-distributed, including background Ca2+ signal contaminants from neuropil or neural processes, weakly labeled or out-of-focus cell bodies, and neurons that occupy overlapping sets of pixels. For simplicity, prior algorithms have typically used signal estimators to infer cellular Ca2+ traces by assuming Gaussian-distributed contamination9,11–15. Thus, these prior methods poorly handle the non-Gaussian contaminants found in real experimental situations, impeding detection of cells and inference of their Ca2+ activity patterns. Further, due to the alternating estimation technique used in matrix factorization-based approaches13–15, errors due to mismatches between the data’s assumed and actual statistical properties can rise quickly with the number of alternating iterations. To mitigate these estimation errors, past research has applied image processing methods to process either the Ca2+ videos15 or the inferred cellular components13. However, a strict reliance on specific image processing routines can restrict a cell extraction algorithm’s utility to the specific imaging conditions or modalities for which these routines were designed. To date, no cell sorting algorithm has addressed the challenges of Ca2+ imaging within a single, generally applicable conceptual framework.
Here we present a broadly applicable cell extraction method that addresses the experimental limitations of real Ca2+ imaging datasets while also avoiding assumptions that are specific to particular imaging modalities or fluorescence labeling patterns. Using the theoretical framework of robust estimation16,17, we introduce a minimally restrictive model of data generation and derive a statistically robust method to identify neurons and their fluorescence activity traces. Robust estimation is widely used in statistics, as it provides a potent means of analyzing data that suffers from contamination, such as outlier data points, whose statistical properties differ from those of an assumed noise model (typically Gaussian)18. Instead of modeling the contamination statistics, robust estimation provides statistical estimates that have quality guarantees even in the case of the worst possible contamination.
One obtains these quality guarantees by constructing a statistical estimator that selectively downgrades the importance of contaminated, outlier observations. In the presence of Gaussian-distributed noise plus non-Gaussian outliers, non-robust estimators can suffer enormous errors, whereas a suitable robust estimator can have negligible error17. In cell extraction, robust estimation allows us to incorporate non-Gaussian contaminants into the formulation and to infer neural activity with high fidelity without having to explicitly model the contaminants in Ca2+ imaging experiments. The result is a modality-agnostic approach that makes minimal assumptions about the data. We term the algorithm EXTRACT (for EXTRACT is a tractable and robust automated cell extraction technique), and the software is openly available (https://github.com/schnitzer-lab/EXTRACT-public).
EXTRACT performs quickly and accurately with Ca2+ movies up to hundreds of gigabytes in size, due in part to its native support for graphical processing units (GPUs). For a typical imaging study, processing times with EXTRACT are an order-of-magnitude briefer than the imaging session. Even with Ca2+ videos from recent fluorescence macroscopes, EXTRACT runtimes on a standard 8-core microprocessor and one GPU are shorter than imaging durations.
We first validated EXTRACT on simulated data incorporating challenging conditions. We then analyzed experimental data from conventional, multi-plane, and mesoscopic two-photon imaging studies in head-fixed behaving mice, one-photon miniaturized microscopy studies in freely behaving mice, and the Allen Brain Observatory two-photon Ca2+ imaging dataset19. When studying data from behaving animals, we focused on how EXTRACT led to superior biological results, due to the improved quality of the Ca2+ activity traces as compared to those from prior algorithms. Specifically, we show improved identification of anatomically clustered neural activity in the striatum, enhanced identification of place- and anxiety-encoding cell populations in the ventral hippocampus, and more accurate predictions of mouse location via decoding of hippocampal neural ensemble activity, all using Ca2+ activity traces from EXTRACT.
RESULTS
A defect of conventional cell sorting: L2 loss functions are optimal only for Gaussian noise
We first illustrate the substantial shortcomings of conventional cell sorting algorithms by using a toy model in which the Ca2+ movie, M, contains a single neuron, has a field-of-view h × w pixels in size, and is n frames in duration (Fig. 1A; Fig. S1). Without loss of spatial information, we refer to the two spatial dimensions using a single scalar variable whose values have a 1:1 correspondence to points in the x-y plane. With this notation we can describe M as an m × n array, where m equals the total number of pixels, hw. Within this description the column vector s (of size m) denotes the cell’s spatial profile, and the row vector t* (of size n) denotes its Ca2+ activity trace. Initially, t* is unknown. We seek an estimate, t̂, such that the outer product, st̂, well approximates M.
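The flattening of the two spatial dimensions into a single pixel index can be illustrated with a short NumPy sketch. The dimensions, the Gaussian cell shape, the event times and the noise level below are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, n = 8, 10, 200          # hypothetical field-of-view and movie length
m = h * w                     # total number of pixels

# Spatial profile s: a 2-D Gaussian bump, flattened to a length-m vector.
yy, xx = np.mgrid[0:h, 0:w]
s = np.exp(-((yy - 4) ** 2 + (xx - 5) ** 2) / (2 * 1.5 ** 2)).reshape(m)

# Activity trace t*: a few non-negative events.
t_star = np.zeros(n)
t_star[[20, 90, 150]] = 1.0

# Build the movie as (height, width, frames), then flatten the two spatial
# axes into one pixel index p = y * w + x (row-major order).
movie_3d = s.reshape(h, w, 1) * t_star.reshape(1, 1, n)
movie_3d = movie_3d + 0.01 * rng.standard_normal((h, w, n))
M = movie_3d.reshape(m, n)

# Up to the added noise, the flattened movie equals the outer product of s and t*.
max_dev = np.abs(M - np.outer(s, t_star)).max()
print(M.shape, f"largest deviation from outer(s, t*): {max_dev:.3f}")
```

Because the same row-major ordering is used for both the spatial profile and the movie, no spatial information is lost; the original frames can be recovered with `M.reshape(h, w, n)`.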
Conventionally, one finds t̂ by considering the residual, R = M − st, and choosing t to minimize the sum of the squared elements of R (Refs. 11,13). In other words, one places an L2 (i.e., quadratic) loss function on the residual and then minimizes this function with respect to t. This widespread method of estimating Ca2+ activity rests on an implicit assumption that R is Gaussian-distributed. Specifically, if M contains the cell’s activity plus additive Gaussian noise that is independent for each pixel, this method is optimal in that it minimizes the mean-squared-error (MSE) between the actual, t*, and estimated, t̂, activity traces18. In reality, however, Ca2+ imaging data are corrupted not just by Gaussian noise but also other contaminants, such as from neuropil Ca2+ activity, out-of-focus neurons, or cells with overlapping pixels. For instance, if one adds to our toy model with one cell a partially overlapping ‘distractor’ cell, this simple addition greatly impedes the estimation of Ca2+ signals from the first cell. Specifically, using an L2 loss function can lead to crosstalk from the distractor cell in the estimated trace, t̂, for the first cell, even when regularization enforcing sparsity is used (Fig. 1B–D; Fig. S1A–D).
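The crosstalk produced by an L2 loss can be reproduced in a few lines. The sketch below assumes a one-dimensional field of view, Gaussian cell shapes and single-frame events, all of which are hypothetical simplifications. For a known spatial profile s, minimizing the sum of squared residuals has the closed-form solution sᵀM / (sᵀs).

```python
import numpy as np

def l2_trace(M, s):
    """Least-squares trace: argmin_t ||M - outer(s, t)||_F^2 = s^T M / (s^T s)."""
    return (s @ M) / (s @ s)

# 1-D "field of view" with two partially overlapping cells (hypothetical shapes).
pixels = np.arange(30)
s = np.exp(-(pixels - 12) ** 2 / (2 * 3.0 ** 2))   # cell of interest
d = np.exp(-(pixels - 18) ** 2 / (2 * 3.0 ** 2))   # distractor cell

n = 100
t_cell = np.zeros(n)
t_cell[10] = 1.0                                   # the cell fires at frame 10
t_dist = np.zeros(n)
t_dist[60] = 1.0                                   # the distractor fires at frame 60

M = np.outer(s, t_cell) + np.outer(d, t_dist)      # noise-free for clarity
t_hat = l2_trace(M, s)

crosstalk = t_hat[60]   # the true value is 0: the cell is silent at frame 60
print(f"estimate at the cell's own event: {t_hat[10]:.2f}")
print(f"crosstalk at the distractor's event: {crosstalk:.2f}")
```

The estimate at frame 60 equals the normalized overlap sᵀd / (sᵀs), so under an L2 loss any spatial overlap between the two cells translates directly into spurious activity in the first cell's trace.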
Robust statistical estimation of neural Ca2+ dynamics
We start our presentation of robust estimation by first relaxing the common assumption that noise is Gaussian-distributed. Signal contaminants may exist with spatially irregular and temporally non-stationary properties, as can occur when neighboring cells occupy overlapping sets of pixels or when there are Ca2+ signals from neuropil or out-of-focus neurons. Especially when the cells of interest are quiet, such signal contaminants can greatly exceed the Ca2+ signals we aim to extract. Second, we note that since nearly all fluorescent Ca2+ reporters have a rectified dynamic range, positive-going [Ca2+] fluctuations are reported far more strongly than negative-going fluctuations of [Ca2+] or fluorescence levels below baseline values. Based on these points, we model the noise distribution as having two components (Fig. 1E,F). There is a Gaussian-distributed component that affects a fraction, 1 – ϵ, of the pixel intensity measurements. The other component has an unknown distribution, H, and affects the remaining fraction, ϵ, of the measurements. We assume nothing about H, except that it yields non-negative measurement values, due to the rectification of the Ca2+ indicator. (More precisely, H has support on [κ, ∞), where κ is a positive number, typically on the order of 1 s.d. or less of the baseline noise fluctuations that persist after pre-processing; see Methods for details.)
With this noise model, what is a suitable loss function for estimating cells’ Ca2+ signals? The lack of a prescribed noise distribution for H prevents identification of an optimal loss function that minimizes the MSE of the estimated Ca2+ activity trace, t̂. However, by using the theory of robust statistics16,17,20, we can find a loss function that is optimal in a different sense, namely that it achieves the best MSE under the worst possible probability distribution that the unknown noise could ever assume (see Methods for proof). This loss function smoothly transitions between a quadratic function and a linear function, with the transition occurring at the value, κ, that should depend on the prevalence, ϵ, of the unknown noise component (Fig. 1D–F; Fig. S1E). The simplest approach to robust estimation uses fixed values of ϵ and κ, but one can also adaptively estimate values of ϵ and κ for each time frame from the data itself (Fig. 1D; Fig. S1B,D,E); to do this one iteratively seeks better estimates of ϵ and κ in a closed loop, while simultaneously performing robust estimation with these parameters (Methods). In this way one can let the data dictate, frame-by-frame, the degree to which the loss function should differ from its conventional L2 form.
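A loss of this kind can be written down explicitly. The sketch below assumes the one-sided Huber form, which is quadratic for residuals below κ and linear above it; this form is our reading of the description above, and the exact expression used by EXTRACT is given in the paper's Methods. The function names are hypothetical.

```python
import numpy as np

def one_sided_huber(r, kappa):
    """Quadratic for r < kappa, linear for r >= kappa (assumed form).

    Negative residuals are always penalized quadratically, because the
    unknown contamination is assumed to be non-negative.
    """
    r = np.asarray(r, dtype=float)
    return np.where(r < kappa, 0.5 * r ** 2, kappa * r - 0.5 * kappa ** 2)

def one_sided_huber_grad(r, kappa):
    """Derivative of the loss: the residual itself, capped at kappa."""
    return np.minimum(np.asarray(r, dtype=float), kappa)

residuals = np.array([-3.0, 0.5, 1.0, 3.0])
print(one_sided_huber(residuals, kappa=1.0))
print(one_sided_huber_grad(residuals, kappa=1.0))
```

Both branches give κ²/2 at r = κ and the derivative is continuous there, which is what makes the transition smooth. As κ grows, the loss approaches the conventional L2 form; a small κ limits the influence that large positive residuals can have on the estimate.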
Returning to our toy model with one cell of interest and one distractor cell, with our robust loss function we can estimate the first cell’s Ca2+ activity trace accurately, while ignoring signals from the distractor (Fig. 1C,D; Fig. S1B,D). By assuring that the MSE of the estimated Ca2+ activity trace, t̂, is optimal in worst-case scenarios, one also obtains mathematical bounds on the magnitude of the MSE in all possible cases. Although treating worst-case scenarios might seem unduly pessimistic, real Ca2+ imaging datasets do actually contain non-Gaussian noise. This is why the use of robust estimation to account for such noise can lead to more accurate biological findings.
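The effect of the robust loss on the toy model can be checked numerically. The sketch below is an illustration only: it assumes a one-sided Huber loss with a fixed, hypothetical κ, uses plain projected gradient descent rather than EXTRACT's fast solver, and works on a noise-free one-dimensional field of view.

```python
import numpy as np

def robust_trace(M, s, kappa, n_iter=200):
    """Minimize sum_p rho(M[p, j] - s[p] * t[j]) over t[j] >= 0 for every frame j,
    with rho the one-sided Huber loss, by projected gradient descent."""
    step = 1.0 / (s @ s)                        # inverse Lipschitz constant of the gradient
    t = np.maximum((s @ M) * step, 0.0)         # start from the L2 solution
    for _ in range(n_iter):
        resid = M - np.outer(s, t)
        capped = np.minimum(resid, kappa)       # derivative of the one-sided Huber loss
        t = np.maximum(t + step * (s @ capped), 0.0)
    return t

# Toy model: cell of interest s and a partially overlapping distractor d.
pixels = np.arange(30)
s = np.exp(-(pixels - 12) ** 2 / (2 * 3.0 ** 2))
d = np.exp(-(pixels - 18) ** 2 / (2 * 3.0 ** 2))
t_cell = np.zeros(100)
t_cell[10] = 1.0
t_dist = np.zeros(100)
t_dist[60] = 1.0
M = np.outer(s, t_cell) + np.outer(d, t_dist)

t_l2 = (s @ M) / (s @ s)
t_rob = robust_trace(M, s, kappa=0.02)          # hypothetical fixed kappa

print(f"own event      L2: {t_l2[10]:.3f}   robust: {t_rob[10]:.3f}")
print(f"distractor     L2: {t_l2[60]:.3f}   robust: {t_rob[60]:.3f}")
```

At the distractor's event, the large positive residuals on the distractor's pixels are capped at κ, so they pull the estimate upward far less than under the L2 loss. The cell's own event is fit exactly by both estimators, because there the residual is zero.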
Cell extraction using robust estimation
Using our loss function and robust estimation (Fig. 1E,F), we now treat real data by going beyond our toy model with one cell. We consider a Ca2+ movie, M, that is a linear combination of both background signal contaminants and Ca2+ signals from an unknown number of cells, each of which contributes a signal given by the product of its spatial and temporal weights, sₖtₖ, where the index k denotes the cell’s identity (Fig. 1E). As in prior work12,13, we accomplish cell extraction by first performing a simple (and optional) pre-processing of the movie frames, followed by two main computational stages (Fig. 2A). The pre-processing step applies a high-pass spatial filter to M to reduce background fluorescence (which is common in one-photon Ca2+ movies) and then subtracts from each pixel value its baseline fluorescence level (Methods). The first main stage of computation, ‘Robust cell finding’, identifies cells in the movie. The second main stage, ‘Cell refinement’, hones the estimates of cells’ spatial profiles and activity traces. As with the toy model above, for which an L2 loss function led to crosstalk from a distractor cell, robust estimation allows the proper isolation of individual neurons from real data, even when there is substantial spatial overlap in cells’ profiles and temporal overlap in their activity patterns.
The cell-finding stage uses a simple, iterative procedure to find cells and applies robust estimation to determine each cell’s spatial profile and activity trace (Fig. 2A,C,E). At each iteration, the algorithm finds a seed pixel that attains the movie’s maximum fluorescence intensity, and it initializes a candidate cell image at the seed pixel (Methods). The algorithm then alternately improves its determinations of the cell’s spatial profile and activity trace via robust estimation (Fig. 2E). After the estimates of the spatial profile and activity trace stabilize, the cell’s inferred contribution (the product of its spatial profile and activity trace) is subtracted from the movie, and in the next iteration the steps above repeat for another cell. The cell-finding procedure ends when the peak value for the activity trace of the seed pixel fails to reach a threshold value, which is set as a fixed multiple of the standard deviation of the background noise.
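The seed-and-subtract loop described above can be sketched as follows. This is an illustrative reconstruction, not EXTRACT's implementation: the one-sided Huber updates, the L2 warm starts, the threshold multiple `z_thresh`, the iteration counts and the simulated two-cell movie are all assumptions.

```python
import numpy as np

def cap(resid, kappa):
    """Derivative of the one-sided Huber loss: positive residuals are capped at kappa."""
    return np.minimum(resid, kappa)

def fit_one_cell(R, seed_pixel, kappa, n_alt=5, n_inner=20):
    """Alternately update one cell's trace t and profile s on the residual movie R."""
    s = np.zeros(R.shape[0])
    s[seed_pixel] = 1.0                                # candidate image at the seed pixel
    t = np.zeros(R.shape[1])
    for _ in range(n_alt):
        t = np.maximum((s @ R) / (s @ s), 0.0)         # L2 warm start for the trace
        for _ in range(n_inner):                       # robust update of the trace
            t = np.maximum(t + (s @ cap(R - np.outer(s, t), kappa)) / (s @ s), 0.0)
        if not t.any():
            break
        s = np.maximum((R @ t) / (t @ t), 0.0)         # L2 warm start for the profile
        for _ in range(n_inner):                       # robust update of the profile
            s = np.maximum(s + (cap(R - np.outer(s, t), kappa) @ t) / (t @ t), 0.0)
        peak = s.max()
        if peak == 0.0:
            break
        s, t = s / peak, t * peak                      # resolve the scale ambiguity
    return s, t

def find_cells(M, noise_sd, kappa, z_thresh=6.0, max_cells=20):
    R = M.copy()
    cells = []
    while len(cells) < max_cells:
        if R.max() < z_thresh * noise_sd:              # stopping rule on the seed's peak
            break
        seed_pixel, _ = np.unravel_index(np.argmax(R), R.shape)
        s, t = fit_one_cell(R, seed_pixel, kappa)
        if not (s.any() and t.any()):
            break
        R = R - np.outer(s, t)                         # subtract the cell's contribution
        cells.append((s, t))
    return cells

# Hypothetical test movie: two non-overlapping cells on a 1-D field of view.
rng = np.random.default_rng(0)
pixels = np.arange(40)
s_a = np.exp(-(pixels - 10) ** 2 / (2 * 2.0 ** 2))
s_b = np.exp(-(pixels - 28) ** 2 / (2 * 2.0 ** 2))
decay = np.exp(-np.arange(30) / 5.0)
t_a = np.zeros(300)
t_b = np.zeros(300)
for f in (30, 120, 210):
    t_a[f:f + 30] += decay
for f in (70, 160, 250):
    t_b[f:f + 30] += 0.8 * decay
noise_sd = 0.05
M = np.outer(s_a, t_a) + np.outer(s_b, t_b) + noise_sd * rng.standard_normal((40, 300))

cells = find_cells(M, noise_sd, kappa=noise_sd)
print(f"found {len(cells)} candidate cells")
```

Each pass removes one cell's rank-one contribution from the residual movie, so the next seed pixel is drawn from whatever activity remains unexplained.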
After cell finding, the ‘Cell-refinement’ stage improves the estimates of cells’ spatial and temporal contributions to the movie data, by accounting concurrently for all the identified cells using multivariate robust estimation (Fig. 2F; Methods). This stage is also an iterative procedure, and each iteration has three steps. First, all fluorescence traces are simultaneously updated using robust estimation, while holding fixed the cells’ spatial profiles. Second, all spatial profiles are simultaneously updated using robust estimation, while holding fixed the activity traces. Third, a validation procedure checks a set of predetermined metrics for every putative cell and removes any cell with metrics below user-set thresholds. This 3-step procedure repeats for a fixed number of iterations, and the algorithm outputs the final estimates of cells’ spatial profiles and activity traces.
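The three steps of a refinement iteration can be sketched as below. This is a simplified illustration under stated assumptions: a one-sided Huber loss with fixed κ, projected gradient steps in place of EXTRACT's fast solver, and a single hypothetical validation metric (peak trace amplitude). The function names and the simulated data are not from the paper.

```python
import numpy as np

def huber_loss(R, kappa):
    """Total one-sided Huber loss of a residual matrix (assumed form)."""
    return np.where(R < kappa, 0.5 * R ** 2, kappa * R - 0.5 * kappa ** 2).sum()

def refine_once(M, S, T, kappa, n_inner=10):
    """Steps 1 and 2: update all traces jointly, then all spatial profiles jointly.

    S is (pixels x cells) and T is (cells x frames); both stay non-negative.
    Step sizes are the inverse squared spectral norms, which guarantees that
    each projected gradient step does not increase the loss.
    """
    step_t = 1.0 / np.linalg.norm(S, 2) ** 2
    for _ in range(n_inner):
        T = np.maximum(T + step_t * (S.T @ np.minimum(M - S @ T, kappa)), 0.0)
    step_s = 1.0 / np.linalg.norm(T, 2) ** 2
    for _ in range(n_inner):
        S = np.maximum(S + step_s * (np.minimum(M - S @ T, kappa) @ T.T), 0.0)
    return S, T

def validate(S, T, min_peak):
    """Step 3 (hypothetical metric): drop cells whose trace never reaches min_peak."""
    keep = T.max(axis=1) >= min_peak
    return S[:, keep], T[keep]

# Two overlapping cells with noisy initial estimates (hypothetical data).
rng = np.random.default_rng(1)
pixels = np.arange(30)
S_true = np.stack([np.exp(-(pixels - c) ** 2 / 18.0) for c in (12, 18)], axis=1)
T_true = rng.poisson(0.05, size=(2, 200)).astype(float)
M = S_true @ T_true + 0.05 * rng.standard_normal((30, 200))
S0 = np.maximum(S_true + 0.1 * rng.standard_normal(S_true.shape), 0.0)
T0 = np.maximum(T_true + 0.3 * rng.standard_normal(T_true.shape), 0.0)

kappa = 0.05
loss_before = huber_loss(M - S0 @ T0, kappa)
S1, T1 = refine_once(M, S0, T0, kappa)
loss_after = huber_loss(M - S1 @ T1, kappa)
print(f"robust loss before: {loss_before:.2f}, after one refinement pass: {loss_after:.2f}")
```

Updating all cells at once, rather than one at a time as in cell finding, lets overlapping cells compete for the same pixels within a single estimation problem.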
Crucially, to perform these computations efficiently, we created a fast solver for robust estimation problems that combines the low computational cost of a first-order optimization algorithm with a convergence behavior approaching that of a second-order optimization algorithm, such as Newton’s method (Methods). Our solver is expressly adapted for and benefits greatly from the computational acceleration provided by graphical processing units (GPUs) and parallel computation.
EXTRACT allows high-fidelity cell extraction even with substantial signal contaminants
To validate a use of robust estimation for cell extraction, we first created simulated datasets on which to evaluate different cell extraction methods. We generated artificial Ca2+ imaging data with varying numbers of spatially overlapping cells with two-dimensional Gaussian shapes and activity traces comprising a set of spikes that were Poisson-distributed in time and had exponentially decaying waveforms (Fig. 3A; Methods). The artificial movies also contained additive Gaussian-distributed noise, uncorrelated between pixels. Although we did not explicitly add non-Gaussian noise, as in real datasets the spatial overlap between cells induced non-Gaussian signal contaminants. We varied the level of this contamination by adjusting the number of overlapping cells within a fixed field-of-view and by introducing temporal correlations in cells’ activity patterns (Fig. 3A; Methods).
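A movie of this kind can be generated with a few lines of NumPy. The sketch below is a minimal version under assumed parameters (field of view, cell width, spike rate, decay constant and noise level are all hypothetical), and it does not include the temporal correlations between cells mentioned above. Spike times are drawn as independent per-frame Poisson counts, a discrete-time approximation of a Poisson process.

```python
import numpy as np

def simulate_movie(h=32, w=32, n_frames=500, n_cells=10, spike_rate=0.02,
                   tau=10.0, cell_sd=3.0, noise_sd=0.1, seed=0):
    """Return (M, S, T, spikes) for a synthetic movie M = S @ T + Gaussian noise."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:h, 0:w]

    # Spatial profiles: 2-D Gaussians at random centers, one column per cell.
    cy = rng.uniform(0, h, size=n_cells)
    cx = rng.uniform(0, w, size=n_cells)
    S = np.stack([np.exp(-((yy - y0) ** 2 + (xx - x0) ** 2) / (2 * cell_sd ** 2)).ravel()
                  for y0, x0 in zip(cy, cx)], axis=1)

    # Activity traces: Poisson spike counts convolved with an exponential decay.
    spikes = rng.poisson(spike_rate, size=(n_cells, n_frames))
    kernel = np.exp(-np.arange(int(8 * tau)) / tau)
    T = np.stack([np.convolve(row, kernel)[:n_frames] for row in spikes])

    # Additive Gaussian noise, independent across pixels and frames.
    M = S @ T + noise_sd * rng.standard_normal((h * w, n_frames))
    return M, S, T, spikes

M, S, T, spikes = simulate_movie()
print(M.shape, S.shape, T.shape, int(spikes.sum()), "spikes in total")
```

Raising `n_cells` while holding the field of view fixed increases the spatial overlap between cells, which is the source of the non-Gaussian contamination in these simulations.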
First, we qualitatively evaluated the benefits of using robust estimation within the cell-finding stage of EXTRACT, as compared to using a conventional L2 estimator (i.e., a quadratic loss function) within this stage. We studied an artificial dataset that had 3 overlapping neurons with statistically independent spiking patterns (Fig. 3B) and compared the results from robust estimation to those from L2 estimation. For the robust estimator, we allowed κ to vary frame-by-frame so as to minimize the difference between the reconstructed and actual movie data (Methods). After running the cell-finding routine for 3 iterations in each case, robust estimation accurately identified all 3 cells, whereas with L2 estimation the activity traces had substantial crosstalk between cells, which progressively accumulated across the 3 iterations.
Next, we used multiple artificial datasets with varying densities of neurons to compare robust and non-robust estimation approaches using a variety of performance metrics (Fig. 3C–L). For each simulated Ca2+ video, we compared the results from EXTRACT using the robust loss function to those using the non-robust, L2 loss function. In both cases, we identified individual spikes in cells’ activity traces by applying a simple, threshold-based detection method to the Ca2+ traces (Methods). We computed precision-recall curves for spike detection by comparing the sets of detected and actual spikes over a range of spike detection thresholds, and we computed the mean area under the precision-recall curves (AUC) by averaging over all cells in each simulation. Notably, using robust estimation the cell-finding stage yielded substantially higher precision and recall values for spike detection than L2 estimation (Fig. 3C,D). In principle, the cell-refinement stage can correct errors incurred during cell-finding, since cell-refinement updates all the estimated cells concurrently, but in practice we found that robust estimation maintained its superiority after cell-refinement (Fig. 3C,D).
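The spike-detection metrics can be sketched as follows. The details here are assumptions made for illustration: events are detected as upward threshold crossings, detected and actual spikes are matched one-to-one within a tolerance of a few frames, and the area under the precision-recall curve is computed with the trapezoidal rule after anchoring the curve at zero recall. The paper's Methods may differ on each of these points.

```python
import numpy as np

def detect_events(trace, threshold):
    """Frames at which the trace crosses the threshold from below."""
    above = np.asarray(trace) >= threshold
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return np.concatenate(([0], onsets)) if above[0] else onsets

def precision_recall(detected, actual, tol=2):
    """Greedy one-to-one matching of detected to actual spike frames within +/- tol."""
    unmatched = list(actual)
    true_pos = 0
    for d in detected:
        hits = [a for a in unmatched if abs(a - d) <= tol]
        if hits:
            unmatched.remove(hits[0])
            true_pos += 1
    precision = true_pos / len(detected) if len(detected) else 1.0
    recall = true_pos / len(actual) if len(actual) else 1.0
    return precision, recall

def pr_auc(trace, actual, thresholds):
    """Area under the precision-recall curve traced out by sweeping the threshold."""
    points = sorted(precision_recall(detect_events(trace, th), actual)[::-1]
                    for th in thresholds)              # (recall, precision) pairs
    recalls = [0.0] + [r for r, _ in points]
    precisions = [points[0][1]] + [p for _, p in points]
    return sum(0.5 * (precisions[i] + precisions[i + 1]) * (recalls[i + 1] - recalls[i])
               for i in range(len(points)))

# A clean trace with two decaying events (hypothetical example).
trace = np.zeros(100)
for f in (10, 50):
    trace[f:f + 20] += 0.8 ** np.arange(20)
actual = [10, 50]

events = detect_events(trace, 0.5)
auc = pr_auc(trace, actual, thresholds=[0.2, 0.5, 0.9])
print("detected events:", events, " AUC:", auc)
```

Averaging `pr_auc` over all cells in a simulation gives a single summary number per method, in the spirit of the mean AUC described above.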
Next, we compared EXTRACT to two widely used cell extraction methods, constrained non-negative matrix factorization (CNMF)13 and the successive application of principal and independent components analyses (PCA/ICA)12 (Fig. 3E–L; Fig. S2). Like EXTRACT, CNMF is a two-stage method, but it uses regularized L2-estimation and tries to infer discrete Ca2+ events within the Ca2+ activity traces while simultaneously estimating cells’ spatial profiles and time-varying fluorescence intensities. The ICA-based approach first uses PCA to perform a dimensional reduction by identifying and then discarding principal components of the raw data whose time variations are consistent with Gaussian noise; by applying ICA to the reduced dataset, the method then un-mixes individual cells’ contributions to the fluorescence movie. Within EXTRACT, we allowed κ to vary adaptively during the cell finding stage, but for cell refinement we fixed κ = 1 s.d. of the estimated baseline noise. To evaluate the 3 methods, we tested their performances on simulations of cells that fired spikes either independently of each other, or in a temporally correlated manner, across a range of cell densities and conditions of either high (Fig. 3E–L) or low (Fig. S2) values of the optical signal-to-noise ratio (SNR).
Qualitative inspection of the estimated spatial profiles and activity traces revealed that, for cells spiking independently, both EXTRACT and ICA performed well, whereas activity traces from CNMF often suffered from crosstalk between neighboring cells (Fig. 3E). For cells with correlated spike trains, again EXTRACT performed well and CNMF produced traces with crosstalk but few instances of false negative spike detection; by comparison, ICA had reduced spike detection fidelity but almost no crosstalk, both due to the assumption in ICA of uncorrelated dynamics (Fig. 3F).
Quantitative assessments of the 3 methods corroborated these observations (Fig. 3G–L; Fig. S2). We first used the area under the precision-recall curve (AUC) to assess spike detection. For independently spiking cells, EXTRACT and ICA attained high AUC values, whereas CNMF performed more poorly (Fig. 3G; Fig. S2A). With correlated spike trains, ICA suffered a substantial decline in the AUC metric, with values comparable to those from CNMF at high SNR (Fig. 3J) and below those from CNMF at low SNR (Fig. S2D). Especially at high densities of cells, EXTRACT notably outperformed the other 2 methods and had the highest AUC values (Fig. 3G,J; Fig. S2A,D). We also determined the Pearson correlation coefficients between cells’ inferred activity traces and spatial profiles and their ground truth values (Fig. 3H,K; Fig. S2B,E); in this assessment EXTRACT surpassed or matched the other methods across all conditions.
To examine how well the different algorithms identified cells in the simulated movies, we computed precision and recall metrics for cell detection by comparing the cells found by the 3 algorithms with the actual set of cells in the simulated datasets. EXTRACT had the highest precision for cell detection, with values close to unity (Fig. 3I,L; Fig. S2C,F), showing that nearly all cells found by EXTRACT were true positives. At some cell densities and high optical SNR, ICA had slightly higher recall values, but at a cost of much lower precision values (Fig. 3I,L). At low SNR, EXTRACT had the best recall values across all cell densities (Fig. S2C,F). Overall, EXTRACT and CNMF outperformed ICA at low SNR values, and EXTRACT outperformed CNMF in nearly all conditions.
A native implementation on GPUs enables fast runtimes
EXTRACT’s main components are estimation algorithms that rely heavily on elementary matrix algebra. Thanks to several widely used software packages, such as the Intel Math Kernel Library, modern computers can perform matrix algebra operations in a highly optimized manner, which allows EXTRACT to achieve fast, computationally efficient cell extraction. Our software implementation of EXTRACT also has native support for computation on graphical processing units (GPUs), enabling even greater efficiency for matrix operations. To benchmark performance speed, we evaluated runtimes on simulated and real datasets of varying sizes.
First, we extensively tested EXTRACT on simulated movies of neural activity across a wide range of movie durations, fields-of-view and cell densities (Fig. 4A–C). We used a MATLAB implementation of EXTRACT and compared runtimes with and without GPU acceleration (Methods). For simplicity, we fixed κ = 1 s.d. for these tests. Runtimes increased close to linearly as a function of cell density and movie duration (Fig. 4A,C). When we varied the field-of-view (FOV) area while keeping the cell density constant, runtimes also rose linearly with the area (Fig. 4B). We note that, with the number of cells held constant, merely increasing the FOV does not necessarily increase the runtime, because EXTRACT only applies its estimation routines to image regions with identified cells; this minimizes computational overhead from empty regions of the FOV.
With GPU acceleration, runtimes were faster than those of the strict CPU implementation by a factor of 3 or more. On larger movies with wider fields-of-view or more image frames, the speedup from GPUs was more pronounced, as the built-in parallelization from GPUs generally allows greater performance gains with larger data structures (Fig. 4C). Both the CPU and GPU versions of EXTRACT yielded processing times comparable to or shorter than the movie durations, and the GPU version often had runtimes an order of magnitude faster than the movie durations (Fig. 4C).
To assess runtimes on real Ca2+ imaging data, we applied EXTRACT to large-scale Ca2+ movies acquired on a two-photon mesoscope1 with a 4-mm2 field-of-view (Fig. 4D; ~10 min movie durations; 17.5 Hz frame rate). We tested CNMF on the same data, which allowed us to compare the CPU and GPU versions of EXTRACT to this widely used, state-of-the-art cell extraction algorithm. We chose the parameters of EXTRACT and CNMF so as to obtain comparable output from both methods (Fig. 4E; Methods). Both versions of EXTRACT performed cell extraction more quickly than CNMF (Fig. 4E). With GPU acceleration, EXTRACT had a mean runtime of ~1.5 times the movie duration, ~7 times faster than CNMF (Fig. 4E).
Fast, comprehensive cell extraction from the Allen Brain Observatory data repository
After validating EXTRACT on both artificial data and real data taken by two-photon imaging, we tested how well EXTRACT could process a substantial repository of Ca2+ imaging data. To perform this test at a large scale, we applied EXTRACT to the publicly available Ca2+ imaging data repository from the Allen Institute Brain Observatory19,21 (Fig. 5A–K). This data library comprises 628 sessions of in vivo two-photon Ca2+ imaging data acquired in GCaMP6-expressing cells across different visual cortical areas of behaving mice. The repository’s software development kit (SDK) has estimated spatial profiles and Ca2+ activity traces for cells from each of the movies. The spatial profiles are regions-of-interest (ROI) estimates for each cell based on its morphology. Each cell’s Ca2+ trace comes from a linear regression of the Ca2+ movie onto the cell’s ROI, after subtracting an estimate of background Ca2+ activity within the neuropil. We used these results from the SDK as a comparator for our assessments of EXTRACT.
We ran EXTRACT on 94 movies from the repository, using identical input parameters in all cases, and κ = 1 s.d. Visual inspections of the estimated Ca2+ traces revealed that those from EXTRACT had higher SNR values than those from the Allen Institute SDK, for the very same neurons (Fig. 5A–C). To confirm these observations quantitatively, we computed the SNR of the estimated Ca2+ traces from EXTRACT and the Allen Institute SDK, using sets of cells identified by both algorithms (Fig. 5H,I).
We then compared the statistics of cell detection with the 2 algorithms, by identifying the neurons found by both approaches as well as those found by only one of the two methods. Across the 94 sessions, EXTRACT identified all but a small fraction (~1%) of the cells in the Allen SDK and found many more cells not present in the Allen SDK (Fig. 5D–G). On average, EXTRACT detected over twice the number of cells (Fig. 5F). Notably, the cells identified by EXTRACT but missing from the Allen Institute SDK generally had Ca2+ traces with lower SNR values, suggesting that EXTRACT had greater sensitivity to cells with weaker optical Ca2+ signals (Fig. 5J).
We also tabulated runtime statistics for EXTRACT across all 94 Ca2+ movies, each of which was a 30-Hz video, about 1 h in duration, with 256 × 256 pixels. On average, EXTRACT took 12.4 ± 3.2 min per movie for cell extraction, or ~20% of each movie’s duration (Fig. 5K; Methods). These runtime determinations are conservative, in that some of the runtime was devoted to image preprocessing, not cell extraction, and this could in principle be done beforehand.
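The runtime figures quoted above can be cross-checked with simple arithmetic. The only assumptions in the sketch below are that each movie lasts exactly 60 min and that the movies are processed one after another.

```python
n_movies = 94
runtime_min = 12.4            # mean EXTRACT runtime per movie, from the text
movie_min = 60.0              # "about 1 h" per movie; treated as exactly 60 min
frame_rate_hz = 30
height = width = 256

runtime_fraction = runtime_min / movie_min
total_hours = n_movies * runtime_min / 60
frames_per_movie = int(frame_rate_hz * movie_min * 60)
pixel_samples = frames_per_movie * height * width

print(f"runtime / movie duration: {runtime_fraction:.1%}")
print(f"total for {n_movies} movies: {total_hours:.1f} h")
print(f"frames per movie: {frames_per_movie:,}; pixel samples per movie: {pixel_samples:.2e}")
```

The ratio comes out at about 21%, in line with the quoted ~20%, and the sequential total of roughly 19.4 h agrees with the Summary's statement that the 94 datasets were processed in one day.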
Spatiotemporally clustered Ca2+ activity in striatal spiny projection neurons of active mice
As a first test of whether EXTRACT can yield superior biological results, we studied Ca2+ imaging data that we previously acquired in the dorsomedial striatum of freely behaving mice using a head-mounted, epi-fluorescence miniature microscope22. Each dataset comprises a recording of neural Ca2+ activity, as reported using the fluorescent Ca2+ indicator GCaMP6m, in spiny projection neurons of either the direct or indirect pathway of the basal ganglia (dSPNs and iSPNs, respectively). We compared results from EXTRACT to those from PCA/ICA and from a variant of CNMF called CNMF-e that is tailored for one-photon fluorescence Ca2+ imaging15 (Fig. 6A,B).
When we inspected the neural Ca2+ activity traces from the 3 methods, our observations fit well with those from simulated datasets (Fig. 3E,F). Notably, activity traces from PCA/ICA sometimes omitted Ca2+ transients that were plainly visible by simple inspection of the raw movie data (Fig. 6C, blue dots). Further, Ca2+ activity traces from both PCA/ICA and CNMF-e exhibited crosstalk between neighboring cells (Fig. 6C, red dots). We next investigated whether these types of errors during cell extraction could impact biological results and conclusions.
Our prior study of striatal SPNs found that mouse locomotion led to activation of SPNs in a spatiotemporally clustered manner22. However, assessments of clustered activity are likely to be influenced by missing Ca2+ transients or crosstalk between spatially adjacent neurons. For instance, crosstalk could elevate estimates of cells’ co-activation. Omitted Ca2+ transients might lead to underestimates of spatiotemporal clustering. To investigate, we used a spatial coordination metric (SCM), defined similarly to that in Ref. 22, to quantify the extent of spatially clustered activity in the striatum at each time frame (Methods). We compared the results obtained by analyzing the activity traces from EXTRACT, PCA/ICA and CNMF-e for a common set of cells.
During periods of mouse inactivity, Ca2+ activity traces from CNMF-e and PCA/ICA exhibited greater levels of correlated activity and higher SCM values as compared to the traces from EXTRACT (Fig. 6D). During locomotor activity, the traces from PCA/ICA had lower SCM values than those from CNMF-e (Fig. 6D). Notably, the ratio of the mean SCM value during locomotion to that during rest was significantly higher for the traces from EXTRACT as compared to those from CNMF-e or PCA/ICA (Fig. 6E). Perhaps most importantly, SCM values for the outputs of EXTRACT had significantly higher correlation coefficients with the mouse’s locomotor speed than did SCM values for the traces from either of the two other methods (Fig. 6F). We also confirmed that EXTRACT works well with two-photon Ca2+ imaging studies of dSPNs and iSPNs (Fig. S3). Overall, our results show that superior cell extraction can lead to neurophysiological signatures that relate more precisely to animal behavior.
EXTRACT detects dendrites and their Ca2+ activity
Some past cell extraction algorithms do not provide sensible results when applied to Ca2+ videos of dendritic activity. Thus, we tested EXTRACT on videos of dendritic Ca2+ activity in cerebellar Purkinje cells and neocortical pyramidal neurons in live mice (Fig. S4). Although the default mode of EXTRACT discards candidate cells whose spatial areas or eccentricities are uncharacteristic of cell bodies, the user can opt to retain candidate sources of Ca2+ activity without regard to their morphologies, thereby allowing EXTRACT to identify active dendrites. For example, in large-scale movies of Purkinje neuron dendritic Ca2+ spiking activity acquired with a two-photon mesoscope1, EXTRACT identified the dendritic trees of >500 cells per mouse, and the extracted spatial forms had the anisotropic shapes that are characteristic of these cells’ dendritic trees, which are highly elongated in the rostral-caudal dimension12 (Fig. S4A,B). We also used EXTRACT to analyze videos of Ca2+ activity acquired by conventional two-photon microscopy in apical dendrites of layer 2/3 or layer 5 neocortical pyramidal cells in live mice (Fig. S4C,D). EXTRACT identified ~850–900 dendritic segments per mouse, and, as expected, they had a wide variety of shapes and temporally sparse Ca2+ transients. For both cerebellar and neocortical neurons, we found no limitations to the dendrite shapes that EXTRACT could identify, and it readily detected large numbers of dendritic segments.
EXTRACT improves identification of place- and anxiety-encoding cells in the ventral CA1 area
As another test of whether EXTRACT can improve biological findings, we examined the Ca2+ activity of pyramidal neurons in the CA1 area of the ventral hippocampus (Fig. 7A). We tracked the dynamics of these cells in freely behaving mice that navigated a 4-arm elevated plus maze (EPM, Fig. 7B). The EPM had 2 enclosed and 2 open arms, arranged conventionally on the perpendicular linear paths of the plus maze. The EPM assay is based on rodents’ innate aversion to open, brightly lit spaces and has been used extensively to investigate anxiety-related behavior23. A subset of ventral CA1 neurons, termed ‘anxiety cells’, show enhanced activity when the mouse is within anxiogenic regions of the EPM, namely the open arms24–26. Here, we used EXTRACT to obtain Ca2+ activity traces of ventral CA1 neurons, and we compared their encoding of the open and closed arms to that in Ca2+ activity traces obtained by applying CNMF-e to the same datasets.
In the activity traces from both EXTRACT and CNMF-e, a subset of ventral CA1 cells responded differentially when the mouse was in the open versus the closed arms (Fig. 7D). Namely, distinct subsets of cells were active when the mouse occupied the two different arm-types, in accord with past reports of anxiety-related coding by ventral CA1 cells24,25. However, Ca2+ activity traces from EXTRACT generally exhibited a purer form of coding, in that the traces were typically silent when the mouse was in one arm-type but had high activity levels in the other arm-type. By comparison, the traces from CNMF-e tended not to distinguish the two arm-types as clearly (Fig. 7E). The traces from EXTRACT also corresponded more precisely to neural Ca2+ activation events that were plainly apparent in the raw movie data (Fig. 7E, lower panels).
To quantify these observations, we compared the arm-coding cells identified using the traces from the two different cell extraction algorithms. Notably, EXTRACT yielded significantly more arm-coding cells than CNMF-e (Fig. 7F; Wilcoxon signed-rank test, p < 0.05). To assess how well the Ca2+ activity traces from the two algorithms reflected events in the raw Ca2+ video data, for each cell we computed the Pearson correlation coefficient between the image of the cell, as determined by each algorithm, and the frame of the Ca2+ video at the time of each detected Ca2+ transient event (Methods). Ca2+ events identified in the activity traces from EXTRACT had significantly greater correlation coefficients than those from CNMF-e (Fig. 7G,H; Wilcoxon rank-sum test, p < 6 × 10⁻⁴), showing that EXTRACT more accurately captured the Ca2+ dynamics in the raw movie data.
Finally, we evaluated how well the sets of activity traces from the two algorithms allowed one to estimate the mouse’s behavior using decoders of neural ensemble activity. We divided the EPM into 5 spatial bins (Fig. 7I) and trained support vector machine (SVM) classifiers to predict the spatial bin occupied by the mouse based on the neural ensemble activity pattern at each time step (Methods). We compared the accuracies of the decoders using a subset of the data separate from that used to train the decoders. Irrespective of the threshold used to detect Ca2+ events in the activity traces, activity traces from EXTRACT led to better decoding than those from CNMF-e (Fig. 7J). Strikingly, for every mouse the best performing decoder based on traces from EXTRACT outperformed the best decoder based on traces from CNMF-e (Fig. 7K).
DISCUSSION
EXTRACT is a versatile method suited for analyzing a broad range of Ca2+ imaging datasets
Here we have introduced the use of robust statistical analyses to systems neuroscience. As shown above, EXTRACT provides a superior means of analyzing somatic or dendritic Ca2+ data acquired with conventional, multi-plane or large-scale two-photon microscopes, or with head-mounted epifluorescence microscopes (Figs. 4–7; Fig. S3). This broad applicability stems from one major factor, namely that the theoretical framework on which EXTRACT is based makes minimal assumptions about the nature of the data.
The robust estimation framework does not model noise sources; instead it aims to isolate cellular Ca2+ signals from contamination sources while staying agnostic to the latter’s exact form. This approach leads to great flexibility. For example, when the contamination approximates statistically independent, Gaussian-distributed noise at each image pixel, the loss function used in EXTRACT adapts itself to behave like a linear regression loss, and it thereby achieves the optimal statistical efficiency of a standard maximum likelihood estimator18. In an opposite extreme case, when the data suffer from large contaminants due to Ca2+ activity in overlapping cells or neuropil, the EXTRACT loss function modifies its robustness parameter so as to reject these contaminants. Further, EXTRACT makes no assumptions about cell morphology, and, unlike CNMF, makes no assumptions about the temporal waveforms of Ca2+ activity. Thus, EXTRACT can detect activity in either cell bodies or dendrites, whereas with CNMF detecting dendrites can be challenging.
Several prior methods for cell sorting have sought to separate cellular Ca2+ activity from strong background contaminants. For instance, CNMF-e seeks to infer neural Ca2+ activity while modeling background activity as a linear combination of the residual activity within nearby pixels15. MIN1PIPE is also based on the CNMF method and, like CNMF-e, is mainly intended for analyses of one-photon Ca2+ imaging datasets14. It applies several image processing steps to the movie data, carefully initializes the cell locations, and then applies the CNMF method. Other authors have applied post hoc de-noising of Ca2+ activity traces, by taking a set of previously identified neurons and reestimating the Ca2+ activity traces in a way that seeks to minimize crosstalk and contamination27. Common to all these prior approaches are efforts to either model the noise sources or to remove them, based on certain assumptions about the data. This general approach can lead to accurate results when the assumptions hold. However, due to the biases the assumptions introduce into the estimation process, this approach can also lead to unexpected, poor performance when the data diverge from the assumptions.
Based on this logic, EXTRACT makes few assumptions about the data and little use of image processing. Thus, while our robust estimation framework has not been fine-tuned to work optimally under specific statistical conditions, it is designed to yield high-fidelity results across a wide spectrum of data statistics. This allows EXTRACT to achieve excellent analytic performance on datasets from a variety of brain areas and Ca2+ imaging modalities.
Nevertheless, our framework does have certain limitations. Under conditions with very low optical SNR, the estimator trades off robustness for fidelity, causing it to behave more like an L2 estimator (Methods). Although EXTRACT applies spatial filtering during the pre-processing and cell finding steps to enhance the input SNR, movies with extremely low SNR lead to sub-optimal results. Nonetheless, the outputs from EXTRACT should still be sensible due to its model-agnostic nature.
An efficient implementation for fast cell extraction that scales well to large datasets
Owing to recent advances in optical technologies, such as fluorescence mesoscopes and multi-arm microscopes that can monitor multiple brain areas concurrently, Ca2+ imaging data is now routinely collected at a scale of several terabytes per publication1,2,19,22,28. Notably, time-lapse studies with multiple imaging sessions for each animal can readily produce datasets of this magnitude22,28,29. Such datasets are so large that the raw data from a single original research study typically cannot even be shared on the most commonly used public data repositories. Aside from issues of data sharing, the sheer volume of leading-edge datasets necessitates faster processing algorithms to avoid a major bottleneck in the pace of systems neuroscience research.
To handle the most massive datasets, we developed EXTRACT and showed that it can process Ca2+ movies in times that are up to ~10-fold briefer than the movie durations. EXTRACT’s built-in GPU support substantially accelerates processing, allowing cell extraction from several gigabytes of data in a few minutes. On simulated datasets, EXTRACT performed quickly in all regimes, and the runtimes scaled gracefully as dataset sizes grew (Fig. 4B,C). On the Allen Institute Brain Observatory data, EXTRACT ran in only 20% of the time of a typical recording session; this enabled batch processing of ~9 terabytes of data (~100 h of recordings) in 18 h of processing (Fig. 5G). On two-photon mesoscope recordings with a 4 mm2 FOV, EXTRACT ran much faster than the popular CNMF method while providing similar output (Fig. 4E). With these recordings, EXTRACT runtimes were comparable to the movie durations, showing that EXTRACT can readily handle neuroscientists’ most ambitious ongoing experiments.
The accelerated computation from EXTRACT’s use of GPUs does not require any special handling, such as explicit parallelization or algorithmic variations. EXTRACT runs the same code on CPUs and GPUs, if the latter are available to the user. With any suitable NVIDIA GPU installed on the analysis computer, one can readily use EXTRACT with GPU processing to achieve major speed-ups over the CPU runtime. GPUs typically cost a fraction of the analysis computer, and nowadays most pre-configured computers include GPUs that have computing capability. In addition to faster runtimes, EXTRACT’s built-in GPU support implies that, since its computationally intensive tasks are run on the GPU, the user can run other CPU-demanding software at the same time.
EXTRACT enables improved scientific results
The identification of neurons from movie data is a crucial step in neuroscience experiments that rely on Ca2+ imaging techniques for large-scale recording of neural dynamics. The extraction of individual cells and their activity traces reduces the raw data to a set of time series, the accuracy of which is crucial for the success of all subsequent analyses. Thus, EXTRACT aims to achieve high-fidelity results by avoiding extraneous image processing as much as possible while also de-noising the inference of cellular activity through robust statistical estimation. Unlike some past approaches to cell detection, we found that EXTRACT works well with Ca2+ videos of dendritic activity, which often do not provide as many fluorescence photons as videos of somatic activity. Further, our results from two separate biological experiments, in striatum and hippocampus, demonstrate that the use of EXTRACT can lead to improved scientific results.
First, we evaluated EXTRACT, CNMF-e, and PCA/ICA using Ca2+ imaging data taken from striatal spiny projection neurons (SPNs) (Fig. 6), which exhibit spatially clustered activity patterns during animal locomotion22. When we analyzed these activity patterns, the Ca2+ traces from EXTRACT revealed a greater contrast in the spatial clustering metric (SCM) between periods of locomotion and those of rest, as well as higher correlation coefficients between SCM values and locomotor speeds, as compared to the results obtained using traces from PCA/ICA or CNMF-e (Fig. 6D–F). This fits with our observations that EXTRACT made the fewest mistakes during the cell extraction process, as seen by comparing the traces from all 3 algorithms to the raw data (Fig. 6C).
Second, we characterized place- and anxiety-related representations in the ventral hippocampus of mice behaving within an elevated plus-maze (Fig. 7). Using the neuronal Ca2+ traces from EXTRACT, we identified significantly more cells with anxiety-related coding than when we used the outputs of CNMF-e (Fig. 7E,F). Moreover, the use of EXTRACT also led to superior decoding analyses (Fig. 7I–K), in that the traces from EXTRACT enabled better estimates than CNMF-e of the animals’ locomotor trajectories (Fig. 7K). These results confirm that accurate biological findings require accurate reconstructions of neuronal activity and show that EXTRACT improves the results from downstream computational analyses, especially when the raw data may have substantial noise or fluorescence contaminants.
Outlook
Ca2+ imaging technology continues to progress rapidly, with new tools arising for multi-color Ca2+ imaging of multiple cell types and three-dimensional Ca2+ imaging. Techniques for high-speed optical voltage imaging are also making rapid strides and provide direct access to neural membrane voltage dynamics. Because EXTRACT makes so few assumptions about the data statistics, future versions of the algorithm should be applicable to the data from these emerging imaging modalities with only straightforward modifications.
Moreover, to increase the numbers of neurons that can be tracked simultaneously, new imaging approaches are arising in which cells from multiple planes in tissue are deliberately superposed in the raw video data30–33; the cells and their activity traces must then be disentangled through offline data analysis. EXTRACT’s capability for high-fidelity isolation of individual cells, even when cells substantially overlap one another in the raw data, should facilitate multi-plane imaging by enabling a greater number of planes to be sampled concurrently while still being able to computationally extract the individual neurons from dense sets of overlapping cells. More broadly, we expect that the general framework of robust statistics will have broad applications throughout systems neuroscience for analyses of many types of recording data, both optical and electrophysiological.
METHODS
Mice
All procedures were approved by the Stanford University Administrative Panel on Laboratory Animal Care (APLAC) in accordance with American Veterinary Medical Association guidelines. Ca2+ imaging studies in the ventral hippocampus used male double-transgenic CaMKII-GCaMP6s mice (tetO-GCaMP6s-2Niell/J: Camk2a-tTA-1Mmay/DboJ, Jackson Laboratory, stock #007004 and #024742 respectively) aged 12-16 weeks at the start of experimentation34.
For Ca2+ imaging studies of cerebellar Purkinje neuron dendritic trees, we used mice that were a cross of PCP2-Cre driver mice with a Bl6-129 genetic background and Ai148 transgenic mice35; the resulting double transgenic mice (PCP2-cre/TIGRE-loxP-stop-loxP-CAG-tTA2-TRE-GCaMP6f [Ai148]) expressed the GCaMP6f Ca2+ indicator selectively in Purkinje cells.
EXTRACT ALGORITHM
Mathematical variables
We denote the size of the imaging field-of-view as h × w, in units of pixels. We refer to the scalar product hw as m. We use boldface characters for arrays and non-boldface characters for scalars. As in the main text, we denote the movie matrix as M (flattened in space, so that M is a two-dimensional matrix), the matrix of spatial weights (cell images) as S, and the matrix of temporal weights (Ca2+ traces) as T.
Definition of Signal-to-noise Ratio (SNR)
We define the signal-to-noise ratio (SNR) for a given signal as the ratio of the maximum value of the signal divided by the s.d. of the noise. We computed the noise s.d. by obtaining the power spectral density (PSD) of the signal using a Fourier transform, then taking the spectral power across the upper half of the frequency range, where most of the fluorescence dynamics comprise noise fluctuations, not high-frequency Ca2+ excitation, and extrapolating the power found there to the rest of the spectrum. We computed the SNR at an individual image pixel by considering the time-varying fluorescence from that pixel as the signal.
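The SNR definition above can be illustrated with a short Python sketch. This is not the EXTRACT implementation: the function names, the use of NumPy's real FFT, and the assumptions that the trace is already baseline-subtracted and that the noise is spectrally flat (so that the upper-band power level can be extrapolated to the whole spectrum) are ours.

```python
import numpy as np

def estimate_noise_std(trace):
    """Estimate the noise s.d. from the upper half of the power spectrum."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    # Unnormalized periodogram of the mean-subtracted trace.
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(n)          # cycles per sample; Nyquist is 0.5
    upper = freqs >= 0.25               # upper half of the frequency range
    # For white noise of variance v, every bin has expected power n * v, so
    # extrapolating the upper-band level to the full spectrum gives v.
    noise_var = power[upper].mean() / n
    return float(np.sqrt(noise_var))

def trace_snr(trace):
    """SNR = maximum of the signal divided by the estimated noise s.d."""
    return float(np.max(trace)) / estimate_noise_std(trace)

# Synthetic example: five decaying transients plus white noise of s.d. 2.
rng = np.random.default_rng(0)
n_frames = 4096
signal = np.zeros(n_frames)
for onset in (500, 1200, 2000, 2800, 3500):
    signal[onset:] += 20.0 * np.exp(-np.arange(n_frames - onset) / 20.0)
trace = signal + rng.normal(0.0, 2.0, n_frames)

noise_sd = estimate_noise_std(trace)
snr = trace_snr(trace)
print(f"estimated noise s.d. = {noise_sd:.2f}, SNR = {snr:.1f}")
```

Because slow Ca2+ transients put little power into the upper half of the spectrum, the estimate is only weakly affected by the signal itself.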
Theory of Robust Estimation in the Presence of Large Non-negative Contaminants
Here, we introduce our signal estimation approach, based on the theory of robust M-estimation. This theory is well-developed for symmetric and certain asymmetric contamination regimes16,36–38. However, prior theoretical work does not readily suggest an optimal estimator that is suitable for use with the types of signals that arise in neural Ca2+ imaging studies. Thus, we first motivate and introduce a simple mathematical abstraction for treating such studies. We then derive a minimax optimal M-estimator. For simplicity, we present our treatment in the setting of univariate estimation, which generalizes in a straightforward way to multivariate regression.
Given the nature of signal contaminants in Ca2+ imaging datasets, we create a noise model based on the observation that most fluctuations in the fluorescence background are well modeled as being Gaussian-distributed. This type of noise stems from the stochastic emission, propagation and detection of photons, which are all Poisson processes, implying that the numbers of detected photons are Gaussian-distributed when there are large numbers of photons. However, the fluorescence background also contains other sources of noise or contamination, such as from neuropil Ca2+ activity, out-of-focus cells, and residual activity of overlapping cells that are not detected and well accounted for by the cell extraction method. This latter category of contamination is very distinct from normally distributed noise; namely, it is non-negative (or above the signal baseline), its characteristics can be highly irregular, and it may take on large values. Therefore, we model the data generation process as having an additive noise source that is normally distributed a fraction 1 – ϵ of the time, but which is free to be any positive value greater than a threshold otherwise:
$$y_i = \beta^* + \sigma_i, \qquad \sigma_i \sim (1-\epsilon)\,\mathcal{N}(0,1) + \epsilon\,H_\alpha.$$
Here, yi denotes an experimental observation, which deviates from β*, the true value of the measured quantity, due to corruption with an additive noise term, σi. This noise term, σi, is normally distributed with 1 – ϵ probability and distributed according to an unknown distribution, Hα, with probability ϵ. For the sake of generality, we allow Hα to be any probability distribution with support over the range [α, ∞), for a fixed value α ≥ 0. In particular, Hα can be nonzero over an arbitrarily large range of possible noise values. Therefore, ϵ can be interpreted as setting the extent or severity of ‘gross contamination’. If ϵ is small, the noise will be close to Gaussian-distributed. On the other hand, as ϵ nears one, the noise distribution deviates from a normal distribution to an arbitrary extent. The parameter, α, can be interpreted as the minimum observed value of the positive contamination; its exact value is insignificant outside the realm of our theoretical treatment. We denote the full distribution of the noise as FHα, subscripted by Hα.
Given a set of experimental observations $\{y_i\}_{i=1}^{n}$, we form an estimate, $\hat{\beta}$, of the true parameter, β*, by considering an equivariant M-estimator:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \rho(y_i - \beta).$$
Typically, M-estimators are characterized by estimator functions, ψ, that are defined as the derivative of ρ, $\psi = \rho'$. Here we consider ψ’s with specific properties that enable efficient optimization and allow general theoretical guarantees.
We define a set, Ψ = {ψ | ψ is a monotonically non-decreasing function}. If we choose an estimator function, ψ ∈ Ψ, finding a point estimate, $\hat{\beta}$, is equivalent to solving the following first-order condition for $\hat{\beta}$:
$$\sum_{i=1}^{n} \psi(y_i - \hat{\beta}) = 0.$$
This is simply because the members of Ψ correspond to convex loss functions. Our focus is on such functions, because they are typically easier to optimize and offer global optimality guarantees. We seek an M-estimator for our noise model that is robust to variations in the noise distribution (Hα in particular), in the sense of minimizing the worst-case deviation from the true parameter, as measured by the mean squared error. We first introduce our proposed estimator and then show that it is exactly optimal in the aforementioned minimax sense.
We define an estimator function, ψ0, as follows:
$$\psi_0(y, \kappa) = \begin{cases} y, & y < \kappa, \\ \kappa, & y \ge \kappa, \end{cases}$$
where κ is defined in terms of the contamination level, ϵ, according to
$$\Phi(\kappa) + \frac{\phi(\kappa)}{\kappa} = \frac{1}{1-\epsilon},$$
in which Φ(·) and ϕ(·) denote the distribution and the density functions for a standard normal variable. We refer to ψ0 as the one-sided Huber function and denote its corresponding loss function as ρ0(·, κ). Clearly, ψ0 ∈ Ψ, and therefore the loss function, ρ0, is convex. Under our proposed data generation model, we can now state an asymptotic minimax result for ψ0:
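Proposition 1. The one-sided Huber function, ψ0, yields an asymptotically unbiased M-estimator for the class of distributions $\mathcal{F}_\kappa = \{(1-\epsilon)\Phi + \epsilon H_\kappa\}$. Further, ψ0 minimizes the worst-case asymptotic variance in $\mathcal{F}_\kappa$, i.e.
$$\psi_0 = \arg\inf_{\psi \in \Psi}\ \sup_{F \in \mathcal{F}_\kappa} V(\psi, F). \qquad (1)$$
Proof:
First, note that F = (1 – ϵ)Φ + ϵH yields an unbiased M-estimator for ψ0 if and only if
$$\mathbb{E}_F[\psi_0] = (1-\epsilon)\int \psi_0\,d\Phi + \epsilon\int \psi_0\,dH = 0.$$
Using Φ(κ) + ϕ(κ)/κ = 1/(1 – ϵ) for the first term on the right-hand side, which then equals –ϵκ, we obtain
$$\int \psi_0\,dH = \kappa,$$
which is satisfied if and only if the support of H is contained in [κ, ∞).
For the variance calculations, we use the fact that the one-sided Huber estimator of ψ0 is unbiased for the class of distributions $\mathcal{F}_\kappa$. We calculate the variance for ψ0 for some $F \in \mathcal{F}_\kappa$ using $V(\psi, F) = \int \psi^2\,dF \,/\, \left(\int \psi'\,dF\right)^2$. The numerator can be written as
$$\int \psi_0^2\,dF = (1-\epsilon)\left[\Phi(\kappa) - \kappa\phi(\kappa) + \kappa^2\big(1-\Phi(\kappa)\big)\right] + \epsilon\kappa^2 = (1-\epsilon)\,\Phi(\kappa).$$
Similarly, for the denominator, we write
$$\left(\int \psi_0'\,dF\right)^2 = \left[(1-\epsilon)\,\Phi(\kappa)\right]^2.$$
Therefore, the asymptotic variance is given as $V(\psi_0, F) = [(1-\epsilon)\Phi(\kappa)]^{-1}$, which is constant over the contamination class $\mathcal{F}_\kappa$.
Now, define a distribution F0 by its density f0 satisfying the condition –dlog(f0)/dt = ψ0:
$$f_0(t) = \begin{cases} (1-\epsilon)\,\phi(t), & t < \kappa, \\ (1-\epsilon)\,\phi(\kappa)\,e^{-\kappa(t-\kappa)}, & t \ge \kappa. \end{cases}$$
First, we need to check whether $F_0 \in \mathcal{F}_\kappa$. It is easy to check that f0 (and the corresponding contamination) is a distribution, i.e. it integrates to 1 by the condition Φ(κ) + ϕ(κ)/κ = 1/(1 – ϵ). Then, for all ψ ∈ Ψ, we have
$$\sup_{F \in \mathcal{F}_\kappa} V(\psi, F) \ge V(\psi, F_0).$$
Moreover, a straightforward application of the Cauchy-Schwarz inequality yields
$$V(\psi, F_0) = \frac{\int \psi^2 f_0\,dt}{\left(\int \psi' f_0\,dt\right)^2} \ge \frac{1}{I(F_0)},$$
with equality only if ψ ∝ f′0/f0, where I(F0) = (1 – ϵ)Φ(κ) is the Fisher information governing the minimum possible asymptotic variance. Combining this with the previous result, we obtain
$$\inf_{\psi \in \Psi}\ \sup_{F \in \mathcal{F}_\kappa} V(\psi, F) = \sup_{F \in \mathcal{F}_\kappa} V(\psi_0, F) = \frac{1}{(1-\epsilon)\,\Phi(\kappa)}.$$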
Finally, note that the left equality is weaker than the statement in (1). This proof of Proposition 1 establishes that the one-sided Huber estimator has zero bias as long as the non-zero contamination is sufficiently larger than zero, and it also achieves the best worst-case asymptotic variance.
We now compare the one-sided Huber estimator with some other popular M-estimators, such as the sample mean (ℓ2 loss), the sample median (ℓ1 loss), the Huber estimator, and the sample quantile. First of all, given our model of the noise, the sample mean, the sample median, and Huber estimators all have symmetric loss functions and therefore suffer from bias. This effect is particularly severe for the sample mean estimator and leads to an unbounded MSE when gross contamination takes on very large values. The bias problem may be eliminated using a quantile estimator whose quantile level is set according to ϵ. However, this estimator has a higher asymptotic variance than the one-sided Huber estimator. Although we have not encountered a prior study of a one-sided Huber estimator, it is related to the technique in Ref. 39, in which samples are assumed to be non-negative, and in which the sample mean estimator summands are shrunk when they are above a certain threshold (this technique is called winsorizing). However, the model and application in Ref. 39 are both quite different from those we consider.
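The comparison above can be checked numerically with a small simulation. The following Python sketch is illustrative only: the helper names, the bisection solvers, and the choice of an exponential contamination distribution supported on [κ, ∞) are our assumptions. It solves Φ(κ) + ϕ(κ)/κ = 1/(1 – ϵ) for κ and then compares the sample mean, the sample median and the one-sided Huber location estimate, where ψ0(r) = min(r, κ).

```python
import math
import numpy as np

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def kappa_from_epsilon(eps, lo=1e-6, hi=20.0, n_iter=200):
    """Solve Phi(k) + phi(k)/k = 1/(1 - eps) for k by bisection."""
    target = 1.0 / (1.0 - eps)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        # The left-hand side decreases monotonically in k.
        if normal_cdf(mid) + normal_pdf(mid) / mid > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def one_sided_huber_location(y, kappa, n_iter=200):
    """Solve sum_i min(y_i - beta, kappa) = 0 for beta by bisection."""
    lo, hi = float(np.min(y)) - kappa, float(np.max(y))
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        # The sum is non-increasing in beta: positive means beta is too small.
        if np.minimum(y - mid, kappa).sum() > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated data from the contaminated-noise model (hypothetical parameters).
rng = np.random.default_rng(1)
n, beta_true, eps = 5000, 3.0, 0.2
kappa = kappa_from_epsilon(eps)
noise = rng.normal(0.0, 1.0, n)
contaminated = rng.random(n) < eps
noise[contaminated] = kappa + rng.exponential(5.0, int(contaminated.sum()))
y = beta_true + noise

est_mean = float(y.mean())
est_median = float(np.median(y))
est_huber = one_sided_huber_location(y, kappa)
print(f"kappa = {kappa:.3f}")
print(f"mean = {est_mean:.3f}, median = {est_median:.3f}, "
      f"one-sided Huber = {est_huber:.3f} (true value {beta_true})")
```

In this setting the mean and the median are both pulled upward by the non-negative contamination, whereas the one-sided Huber estimate stays close to the true value.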
We can now introduce the regression setting that we use for solving for the temporal and spatial weight matrices. We illustrate this for the simple case of solving one row in the spatial weights matrix, or one column in the temporal weights matrix. We observe $\{(x_i, y_i)\}_{i=1}^{n}$, where the $x_i$ could be either fixed or random, and yi’s are generated according to $y_i = \langle x_i, \beta^* \rangle + \sigma_i$, where $\beta^*$ is the true value of the parameter to be estimated, and σi is as previously defined. We estimate β* with
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_0\big(y_i - \langle x_i, \beta \rangle, \kappa\big). \qquad (2)$$
Classical M-estimation theory establishes, under certain regularity conditions, that the minimax optimality in the univariate case carries over to multivariate regression; we refer the reader to Ref. 20 for details.
Solving the Robust Regression Problem with a Fast, Custom Method
We seek to solve the robust regression problem of equation (2) in a large-scale setting, given the large field-of-view and duration of most neural Ca2+ videos. Hence, the solver for our problem should, ideally, be tractable for large n and provide as accurate an output as possible. To this end, we propose a fast optimization method that has a step cost equal to that of gradient descent while making use of second-order information and exhibiting similar behavior to Newton’s method:
Below we present the convergence result for our solver described in Algorithm 1.
Let β* be the fixed point of Algorithm 1 for the problem in equation (2), let λmax and λmin > 0 denote the extreme eigenvalues of $X^\top X$, and let maxi ║xi║ ≤ k. Assume that for a subset of indices s ⊂ {1, 2,…, n}, ∃Δs > 0 such that yi – 〈xi, β*〉 ≤ κ – Δs, and denote the extreme eigenvalues of $\sum_{i \in s} x_i x_i^\top$ by γmax and γmin > 0 satisfying . If the initial point β0 is close to the true minimizer, i.e., ║β0 – β*║2 ≤ Δs/k, then Algorithm 1 converges linearly,
Proof:
We consider the following objective function
Setting:
Assume that for some S ⊂ [n] and Δs > 0 such that yi – 〈xi, β*〉 ≤ κ – Δs for i ∈ S ⊂ [n]. (Including more indices in S results in smaller Δs).
Let maxi ║xi║ ≤ k.
. This assumption is reasonable when n is large and consequently there are many samples in the quadratic regime.
λmax and λmin are the largest and smallest eigenvalues of XTX, respectively.
For β in the ball centered around β* with radius Δs/k, we have for ∀i ∈ S,
Therefore, when the iterates β get close to the true minimizer, ∀i ∈ S, the residual corresponding to sample i falls into the quadratic region. This implies that the Hessian satisfies which says that in the ball B = {β: ║β – β*║2 ≤ Δs/k}, the objective function f is λs-strongly convex. Strong convexity implies smoothness, i.e., for ∀β ∈ B. In this regime, the following calculation is standard.
Assuming that the current iterate is β, our approach takes a step of the following form:
By γs-smoothness, we can write:
By λs-strong convexity:
The second inequality follows from setting β′ = β – (1/λmin)∇f(β), which is the minimizer of the right-hand side of the first line. Choosing β′ = β* above yields
Using this and the smoothness inequality, we write
This is linear convergence with coefficient and the following condition must hold:
Relation between our fast solver and Newton’s method for the robust estimation problem
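For a convex function f, the unconstrained Newton update on the parameter reads
$$\beta^{t+1} = \beta^{t} - \left[\nabla^2 f(\beta^{t})\right]^{-1} \nabla f(\beta^{t}).$$
In our algorithm, $\nabla f(\beta^{t}) = -\sum_{i=1}^{n} \psi_0\big(y_i - \langle x_i, \beta^{t} \rangle, \kappa\big)\, x_i$ and $\nabla^2 f(\beta^{t}) = \sum_{i \in S_t} x_i x_i^\top$, where $S_t = \{i \in [n] : y_i - \langle x_i, \beta^{t} \rangle \le \kappa\}$.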
Replacing the Hessian with , we can write the update as which reduces to the update step of our solver.
As shown above, our solver is second-order in nature; hence, its convergence behavior should be close to that of Newton’s method. However, there is one caveat: the second derivative of the one-sided Huber loss is not continuous. Therefore, one cannot expect to achieve a quadratic rate of convergence; this issue is commonly encountered in M-estimation. Nevertheless, Algorithm 1 converges very quickly in practice.
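To make the idea of a Newton-like step at gradient-descent cost concrete, the following Python sketch solves the one-sided Huber regression with a fixed curvature matrix. It is not Algorithm 1: as an assumption on our part, the Hessian is replaced by XᵀX, which is inverted once up front, so that each iteration costs only matrix-vector products. Because XᵀX upper-bounds the Hessian of the one-sided Huber objective, each step cannot increase the loss. All function and variable names are hypothetical.

```python
import numpy as np

def psi0(r, kappa):
    """One-sided Huber estimator function: identity below kappa, constant above."""
    return np.minimum(r, kappa)

def fast_robust_regression(X, y, kappa=1.0, n_iter=500, tol=1e-10):
    """Minimize sum_i rho0(y_i - <x_i, beta>, kappa) with fixed-curvature steps."""
    gram_inv = np.linalg.inv(X.T @ X)       # computed once, reused every step
    beta = gram_inv @ (X.T @ y)             # least-squares starting point
    for _ in range(n_iter):
        grad = -X.T @ psi0(y - X @ beta, kappa)
        step = gram_inv @ grad              # Newton-like step, O(n p) per iteration
        beta = beta - step
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta

# Synthetic regression with 20% large, non-negative contamination.
rng = np.random.default_rng(2)
n, kappa = 2000, 1.0
beta_true = np.array([1.0, 2.0, 0.5])
X = rng.uniform(0.0, 1.0, (n, 3))           # non-negative regressors, like cell images
y = X @ beta_true + rng.normal(0.0, 1.0, n)
outliers = rng.random(n) < 0.2
y[outliers] += rng.uniform(5.0, 20.0, int(outliers.sum()))

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
beta_robust = fast_robust_regression(X, y, kappa)
err_ls = float(np.linalg.norm(beta_ls - beta_true))
err_robust = float(np.linalg.norm(beta_robust - beta_true))
grad_norm = float(np.linalg.norm(X.T @ psi0(y - X @ beta_robust, kappa)))
print(f"least-squares error = {err_ls:.2f}, robust error = {err_robust:.2f}")
```

The value of κ is fixed at 1 here for simplicity rather than matched to the contamination level, so a small residual bias is expected in the robust estimate.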
Setting κ Adaptively in Robust Estimation
We write ϵ and κ for the true values of these parameters. Recall that the two are related as
$$\Phi(\kappa) + \frac{\phi(\kappa)}{\kappa} = \frac{1}{1-\epsilon}.$$
We introduce a shorthand function f:
We have the following routine for estimating κ: We assume that we start with a fixed κ′ (usually set to 1), for which we find a β estimate, and then compute the residual to estimate the true κ. We denote any estimate of κ with . We use an iterative scheme in which we set and do a few iterations to get increasingly finer estimates, . When estimating κ, for simplicity we only deal with univariate regression (scalar κ). We use to denote the residual for any given β.
Ideally, we would use an estimator with the lowest possible variance. On the other hand, it is important in practice to restrict ourselves to estimators that are computationally efficient to use. Therefore, we use an estimator for which we assume σi has a density fHκ (distributed according to our noise model with true parameter κ) and denote with hκ the density of Hκ. Let . We can obtain a straightforward relationship between κ′ and pi (only in the asymptotic regime) as follows:
Once we establish a mapping between the asymptotic bias, b, and κ, we can estimate κ from above. However, in practice we will not be able to get a good estimate of pi, and we need to estimate an aggregate quantity by averaging over multiple measurements. Therefore, we need to deal with the following quantity:
We can find a relationship between xib and κ′, κ using the asymptotic optimality condition, which we will simply refer to as the bias condition:
Note that if b = 0, we have κ′ = κ. In general, we can use this to eliminate ϵ and get
In order to isolate xib, we can approximate f using its first order Taylor expansion around κ′:
We wish to plug (5) into (4) to eliminate xib and attain a relationship directly between κ and the data related quantities. (From there, we can estimate the real κ with a ). For this, we need to isolate ∑ixib from (4). We simply expand the normal CDF Φ around xib = 0 using its 1st-order approximation and get
We summarize the procedure to estimate κ in Algorithm 2 below.
In practice, we use adaptive κ only for cell finding, via the univariate estimation scheme above.
EXTRACT Preprocessing Module
The spatial high-pass filter is a second-order high-pass Butterworth filter designed in the frequency domain with a cutoff determined by the user-provided average cell radius. First, a corner frequency is computed as 1/(π × radius), and then the cutoff for the Butterworth filter is determined (separately in the x and y directions) by dividing the corner frequency by a dimensionless factor set by the user (the default value is 5). The resulting high-pass filter is multiplied with each frame of the Ca2+ movie in the spatial frequency domain and then transformed back to real-space. EXTRACT preprocessing also supports spatial low-pass filtering of the movie for smoothing. The spatial low-pass filter is also a second-order Butterworth filter, and the cutoff frequency is obtained by multiplying the corner frequency by another user-set, dimensionless constant (the default value is 2).
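A frequency-domain Butterworth high-pass filter of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the EXTRACT code: we assume frequencies are measured in cycles per pixel, use the common image-processing form of the Butterworth response, 1 − 1/(1 + (f/f_c)^(2n)), and apply the same cutoff along both axes. The function and parameter names are hypothetical.

```python
import numpy as np

def butterworth_highpass(frame, cell_radius, cutoff_divisor=5.0, order=2):
    """Apply a Butterworth high-pass filter to one frame in the frequency domain."""
    h, w = frame.shape
    corner = 1.0 / (np.pi * cell_radius)    # corner frequency (assumed cycles/pixel)
    cutoff = corner / cutoff_divisor
    fy = np.fft.fftfreq(h)[:, None]         # vertical spatial frequencies
    fx = np.fft.fftfreq(w)[None, :]         # horizontal spatial frequencies
    # Frequency normalized by the cutoff along each axis.
    f_norm = np.sqrt((fy / cutoff) ** 2 + (fx / cutoff) ** 2)
    lowpass = 1.0 / (1.0 + f_norm ** (2 * order))
    highpass = 1.0 - lowpass                # zero gain at DC, unit gain far above cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * highpass))

# A flat background is removed, whereas fine spatial structure is preserved.
flat = np.full((64, 64), 7.0)
rows, cols = np.indices((64, 64))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
out_flat = butterworth_highpass(flat, cell_radius=5.0)
out_checker = butterworth_highpass(checker, cell_radius=5.0)
print(np.abs(out_flat).max(), np.abs(out_checker - checker).max())
```

Writing the response as one minus the low-pass response avoids a division by zero at the DC component.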
The baseline removal method is applied separately to the time trace of each movie pixel. The method samples the baseline at regularly spaced time points by taking the mode of the Ca2+ values within a temporal interval (a constant multiple of the GCaMP time constant) surrounding each chosen time point. It then smooths this coarsely sampled baseline with a moving average filter that computes a mean intensity value across 5 time bins. Finally, it uses linear interpolation to generate baseline values for all time points. The baseline trace is subtracted from the Ca2+ trace of the input pixel to yield the output.
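The baseline removal steps above (sample the mode in windows, smooth, interpolate, subtract) can be sketched in Python as follows. The details are our assumptions rather than the EXTRACT implementation: the mode is approximated by the midpoint of the most populated histogram bin, the window length and sampling step are free parameters (in EXTRACT the window is a multiple of the GCaMP time constant), and the ends of the sampled baseline are edge-padded before the 5-bin moving average.

```python
import numpy as np

def remove_baseline(trace, window, step, n_bins=50, smooth_bins=5):
    """Subtract a slowly varying baseline estimated from windowed modes."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    centers = np.arange(0, n, step)         # regularly spaced sampling points
    half = window // 2
    modes = np.empty(centers.size)
    for k, c in enumerate(centers):
        segment = x[max(0, c - half): min(n, c + half + 1)]
        counts, edges = np.histogram(segment, bins=n_bins)
        j = int(np.argmax(counts))
        modes[k] = 0.5 * (edges[j] + edges[j + 1])   # midpoint of the modal bin
    # Moving average over `smooth_bins` samples, with edge padding at the ends.
    pad = smooth_bins // 2
    padded = np.pad(modes, pad, mode="edge")
    smoothed = np.convolve(padded, np.ones(smooth_bins) / smooth_bins, mode="valid")
    # Linear interpolation of the coarse baseline to every time point.
    baseline = np.interp(np.arange(n), centers, smoothed)
    return x - baseline, baseline

# Synthetic pixel trace: slow drift + noise + sparse positive transients.
rng = np.random.default_rng(3)
n_frames = 2000
drift = np.linspace(0.0, 2.0, n_frames)
trace = drift + rng.normal(0.0, 0.1, n_frames)
for onset in range(100, n_frames, 250):
    trace[onset:] += 5.0 * np.exp(-np.arange(n_frames - onset) / 15.0)

corrected, baseline = remove_baseline(trace, window=200, step=50)
print(f"median before = {np.median(trace):.2f}, after = {np.median(corrected):.2f}")
```

Using the mode rather than the mean keeps the baseline estimate from being pulled upward by Ca2+ transients within a window.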
EXTRACT Cell Finding Module
In the cell finding module, we first compute a ‘smoothed’ maximum projection image of the whole movie, which we obtain as follows. For each movie pixel, we first identify the time point at which the Ca2+ activity of the pixel reaches its maximum. We record this information in an array, $t^{\max}$. We then compute the ‘smoothed’ maximum projection image, p, as
$$p_i = \frac{1}{|\mathrm{neighbors}(i)|} \sum_{j \in \mathrm{neighbors}(i)} M_{i,\,t^{\max}_j}.$$
In other words, for each pixel i, we average the values of the Ca2+ activity of the pixel over the time points at which neighboring pixels had their activity maximums. The function, neighbors(·), selects the neighboring pixels of a given pixel; this is done in practice by creating a binary circular mask around the query pixel with a radius of 2 pixels and returning the indices that are nonzero. This procedure has the advantage that it reports values close to the maximum values of pixels within cells, due to the co-activation of a neighborhood of pixels within, whereas the activity of the noise pixels is substantially mitigated due to averaging over uncorrelated activity.
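The smoothed maximum projection can be sketched in Python as below. This is an illustrative implementation under our own assumptions: the movie is stored as an array of shape (height, width, frames), the circular neighborhood includes the query pixel itself, and neighbors that fall outside the field of view are simply ignored. The function name is hypothetical.

```python
import numpy as np

def smoothed_max_projection(movie, radius=2):
    """Average each pixel's activity over the peak times of its neighbors."""
    h, w, _ = movie.shape
    t_max = movie.argmax(axis=2)            # frame of peak activity for each pixel
    rows, cols = np.indices((h, w))
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx > radius * radius:
                continue                    # outside the circular mask
            ny, nx = rows + dy, cols + dx
            valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
            t_neighbor = t_max[ny[valid], nx[valid]]
            # Value of the query pixel at the neighbor's peak time.
            total[valid] += movie[rows[valid], cols[valid], t_neighbor]
            count[valid] += 1
    return total / count

# A 7 x 7 co-active "cell" versus an isolated single-pixel noise spike.
rng = np.random.default_rng(4)
movie = rng.normal(0.0, 0.1, (20, 20, 200))
movie[5:12, 5:12, 50] += 10.0               # cell: all pixels peak at frame 50
movie[15, 15, 120] += 10.0                  # isolated spike at one pixel
p = smoothed_max_projection(movie)
plain_max = movie.max(axis=2)
print(f"cell: {p[8, 8]:.2f}, spike: {p[15, 15]:.2f}, "
      f"plain max at spike: {plain_max[15, 15]:.2f}")
```

The isolated spike is as bright as the cell in an ordinary maximum projection, but it is strongly attenuated in the smoothed version because its neighbors peak at unrelated times.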
At every iteration of the cell finding module, a seed pixel is chosen as the brightest pixel in the smoothed maximum projection array, p, and then a cell image centered at the seed pixel is initialized. This initialization is done either by generating a Gaussian shape with a radius equal to a user-given radius estimate, or by using the temporal Pearson correlation of the Ca2+ activity of the seed pixel with the movie, and truncating the correlation image at 0.5 of its maximum. With the resulting estimate of the cell image, the temporal Ca2+ trace of the cell is obtained using a one-component robust regression. The cell image is then re-estimated using the same regression routine, this time with the trace estimate as the input. This alternating estimation scheme is repeated either 10 times, or until the relative change in the cell image and the trace estimates between iterations are <1% as measured by the L2 norm.
For the one-component robust regression, we optimize the one-sided Huber loss with a non-negativity constraint on the cell image and the trace, using Newton’s method. The non-negativity constraint follows from our fundamental assumption that neural activity always rises above the baseline noise, and this constraint leads to sparser solutions. We enforce the non-negativity constraint by first solving the unconstrained problem with Newton’s method and then truncating the result at zero. Because this is a scalar estimation problem, the result is the same as if non-negativity were enforced during optimization. After obtaining the cell image, s, and the trace, t, for the identified cell, we subtract the contribution of this cell by setting M to M − st. We then re-compute the smoothed maximum projection, p, for only the movie pixels that were affected by the activity subtraction.
At the end of each iteration, we apply a quality check to the cell image and the trace of the identified cell to decide whether to include it in the set of identified cells. We discard cells that occupy an abnormal number of pixels given the expected area of a typical cell (as computed from the user-provided estimate of a cell’s radius). We also compute the trace SNR for each cell, and discard it if the trace SNR is lower than the user-provided threshold.
We terminate cell finding if any of the following conditions is met: 1) the maximum allowed number of iterations set by the user has been exceeded; 2) the pixel-wise SNR at the current seed pixel is lower than the user-provided SNR threshold; or 3) the running yield, defined as the fraction of good cells over the last 10 iterations, is lower than 1 in 10. The cell finding module outputs the spatial and temporal weights of the identified components in two matrices: the spatial weights matrix, S, whose columns contain the (flattened) cell images, and the temporal weights matrix, T, whose rows contain the corresponding Ca2+ traces.
EXTRACT Refinement Module
In the refinement module, we update the entire spatial weights matrix or the entire temporal weights matrix at once by multivariate regression using the above-introduced fast solver. For estimating both S and T, we impose the constraint that they are non-negative, as in the cell finding step. When solving for S only, we compute a binary mask obtained by convolving each cell image with a disk filter of a radius equal to the average cell radius, followed by binary thresholding. We then add the following constraint:
This constraint ensures that estimation of each component is restricted to a local neighborhood, preventing artifacts due to strong spatiotemporal co-activity between spatially distinct regions of the movie. This local restriction constraint defines a convex set, hence it can be added to the estimation problem without violating convexity.
Overall, given M and T, the S-estimation step solves the following problem:
Given M and S, the T-estimation step solves the following problem:
We solve both of these problems with a consensus optimization method that is based on dual ascent, termed ‘alternating direction method of multipliers’ (ADMM40). Adding constraints to our original problem through ADMM is straightforward, and it allows us to use our fast solver, robust_solve(·) as a subroutine.
After each alternating estimation step, which involves first solving for T given S, and then for S given T, we compute several quality metrics and discard the subset of cells for which any of the computed metrics are worse than certain user-set thresholds. In particular, we compute the following quality metrics:
Trace SNR
We compute the trace SNR for each component given its Ca2+ trace. We eliminate cells whose trace SNR is below the trace SNR threshold.
Area of the cell image
We compute the area of each cell image by counting the pixels whose spatial weight is >0.1 times the maximum weight of that image. If the calculated area is smaller than a lower threshold or larger than an upper threshold, then the cell is discarded.
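The area check amounts to a thresholded pixel count. A minimal sketch follows; the function names and the way the area bounds are passed in are hypothetical, chosen only for this example.

```python
def cell_area(image, rel_threshold=0.1):
    """Count the pixels whose weight exceeds rel_threshold times the peak weight."""
    peak = max(max(row) for row in image)
    return sum(1 for row in image for w in row if w > rel_threshold * peak)


def passes_area_check(image, min_area, max_area):
    """Keep a cell only if its area lies within the user-set bounds (in pixels)."""
    return min_area <= cell_area(image) <= max_area
```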
Duplicate cells
We check whether cells are duplicates by separately examining (a) the similarities of cell images, and (b) the overall similarities of cells’ spatiotemporal profiles. For the former check, we first smooth the cell images by convolving them with a two-dimensional Gaussian kernel with σ equal to half the average cell radius. After this, we compute Pearson correlation coefficients between pairs of smoothed cell images and then apply a binary threshold at 0.95. We then treat this thresholded correlation matrix as a graph adjacency matrix, and we find the connected components using MATLAB’s graphconncomp() function. For each set of connected components, we identify the component with the most edges in the set, and we mark it as a duplicated cell. Although this procedure identifies only one cell per iteration within a highly similar set of cells, we have empirically found it to be effective in eliminating duplicates across iterations of cell refinement. For identification of duplicates based on spatiotemporal similarity, we follow the same procedure, but we fuse the spatial and temporal similarity through the following two steps: 1) We obtain a temporal correlation matrix by first pre-conditioning the temporal matrix, T, with the matrix of correlations between smoothed cell images and then computing the Pearson correlation coefficients between pairs of components in the pre-conditioned T. This allows us to enforce spatial proximity within the computations of trace similarity. 2) We then obtain a spatiotemporal similarity matrix via an elementwise multiplication of the temporal correlation matrix with the spatial correlation matrix computed above. A binary thresholding is applied to the resulting correlation matrix at 0.95 to obtain the graph adjacency matrix, and the above steps are repeated for this procedure to identify duplicates.
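The image-based duplicate check can be sketched as below. This is a simplified illustration under stated assumptions: cell images are passed in already flattened (and, if desired, already smoothed), a plain depth-first search stands in for MATLAB's graphconncomp(), and the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def find_duplicates(images, threshold=0.95):
    """Return one index per group of highly similar cell images."""
    n = len(images)
    # Thresholded correlation matrix, used as a graph adjacency matrix.
    adjacent = [[i != j and pearson(images[i], images[j]) > threshold
                 for j in range(n)] for i in range(n)]
    visited, duplicates = set(), []
    for start in range(n):
        if start in visited:
            continue
        # Depth-first search collects one connected component.
        component, stack = [], [start]
        visited.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in range(n):
                if adjacent[node][other] and other not in visited:
                    visited.add(other)
                    stack.append(other)
        # Within each multi-cell component, mark the cell with the most edges.
        if len(component) > 1:
            duplicates.append(max(component, key=lambda i: sum(adjacent[i])))
    return duplicates
```

Because only one cell per connected component is marked on each call, larger groups of near-identical cells are pruned over successive refinement iterations, as described above.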
Spatial corruption metric
We compute a spatial corruption metric that measures the lack of local smoothness in a cell’s spatial weight values. We do this based on a heuristic that compares the variance of spatial weights for each cell to a ‘local variance’ for the same cell. We first compute the empirical variance of the spatial weights that are larger than 10^-3. We then compute the local variance as the sum of squared distances between the spatial weight for a pixel and that after applying 2D low-pass filtering based on a square kernel with uniform weights over a 4 × 4 pixel neighborhood. The spatial corruption metric is the ratio of the local variance to the spatial weight variance. Intuitively, better-looking cells have negligible local variance when compared to the spatial weight variance, so the spatial corruption metric will be small for these cells. In the algorithm, the threshold for spatial corruption is set at 0.7, based on our experience of spatial corruption metric values across datasets.
Spatiotemporal match metrics
We use two quality metrics that are intended to assess the relative spatiotemporal contribution of the cell with respect to the power of the cell signal. The first metric looks at the mean gap (averaged over all movie frames) between the cellular activity within the ROI encapsulated by a cell’s spatial weights (weighted by the spatial weights), and the same cell’s fluorescence trace. This metric accounts for the activity within the ROI that is not explained by a cell’s fluorescence trace. The second metric looks at the mean gap (averaged over movie frames) between a cell’s fluorescence trace and nearby fluorescence activity in its vicinity. This metric accounts for the spurious activity estimated to belong to a cell that is attributable to its surroundings. Our implementation of these metrics can be found in our codebase inside the function find_spurious_cells(), which can be referred to for full details on how the various fluorescence activity traces are computed. Both metrics must be <10^-2 for EXTRACT to accept the identified cell in the output.
Set of output activity traces
EXTRACT provides two options regarding the final set of estimated Ca2+ activity traces, termed ‘non-negative’ or ‘raw’ in the software Github. With both options, the robust solver operates under the constraint that the Ca2+ signals must be non-negative until the end of the cell refinement process. The motivation for this constraint is that EXTRACT considers activity below each cell’s baseline value to be noise, where the baseline is determined by the cell’s time-averaged mean fluorescence. The algorithm thresholds all activity that is below this baseline, which leads to non-negative activity traces. When the ‘non-negative’ option is selected, EXTRACT provides these non-negative traces to the user, and throughout the paper we used this option. However, if the user selects the ‘raw’ option, EXTRACT performs an additional final round of robust estimation to solve for the activity traces using the final set of cells’ spatial profiles, but with the non-negativity constraint removed from the robust solver.
Computer Hardware
For all studies involving a CPU implementation of EXTRACT, we used a computer with an Intel Xeon E5-2637 v4 CPU (3.50 GHz × 16). For all studies involving a GPU implementation, we used a single NVIDIA GTX 1080 GPU.
Simulated Ca2+ Imaging Datasets
We created synthetic Ca2+ imaging data that is designed to be representative of the Ca2+ activity of cortical pyramidal neurons. The generation of synthetic data comprised three independent steps.
In the first step, we simulated the Ca2+ traces of neurons assuming a 10 Hz imaging frame rate. For this, we first simulated spike trains for each cell by assuming that spike occurrences in each time bin were governed by a Bernoulli random variable with a probability of 0.01, corresponding to a spike rate of 0.1 Hz. We then convolved the resulting spike trains with an exponentially decaying temporal kernel of the form $e^{-t/\tau}$, and we chose τ = 10 time bins. This corresponds to a decay time constant of 1 s, roughly comparable to that of GCaMP6m (Ref. 3). To simulate data with correlated spiking, instead of independently generating the spike trains of each cell, we synchronized the instantaneous firing probabilities of groups of cells. Specifically, we clustered cells into groups of 5, and then at each time point, with a synchronization probability of 0.2, we assigned new spiking probabilities to each neuron such that all cells of the same group shared a common spiking probability. After simulating the synchronized spikes in this way, we adjusted the baseline spiking rate of each cell to keep its overall mean firing rate constant at 0.1 Hz.
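The independent-spiking case can be sketched as follows. This is a minimal illustration only: the function names and the fixed random seed are assumptions, the synchronization of cell groups is omitted, and the convolution with a causal exponential kernel is written in its equivalent recursive form.

```python
import math
import random


def simulate_spikes(n_frames, p_spike=0.01, seed=0):
    """Bernoulli spike train: 0.01 per frame at 10 Hz gives a mean rate of 0.1 Hz."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_spike else 0 for _ in range(n_frames)]


def convolve_exponential(spikes, tau=10):
    """Convolve spikes with exp(-t / tau), t >= 0, using the recursion
    y[t] = exp(-1 / tau) * y[t - 1] + spikes[t]."""
    decay = math.exp(-1.0 / tau)
    level, trace = 0.0, []
    for s in spikes:
        level = level * decay + s
        trace.append(level)
    return trace


spikes = simulate_spikes(6000)          # 10 min of simulated activity at 10 Hz
trace = convolve_exponential(spikes)    # noiseless Ca2+ trace of one cell
```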
In the second step, we simulated spatial profiles of individual neurons. For this, we used a fixed-sized square field-of-view, with each square pixel corresponding to a 1 μm^2 image region. We created the fluorescence image of each cell independently from the others by randomly sampling a two-dimensional Gaussian distribution oriented in a random direction relative to the x-y coordinate axes of the movie. For each cell, we independently and randomly chose s.d. values for this Gaussian distribution between 2.5–5 pixels, in order to have an effective cell radius ranging between 5–10 μm, approximating the radius as twice the s.d. of the Gaussian. We truncated the weights of each cell at 0.01 of its maximum weight, setting weight values beneath this threshold to zero. The cell centroids were randomly distributed within the field of view, and we enforced a minimum distance between the cells’ centroids. This minimum distance was 4 μm for the quantitative comparisons between the different cell detection algorithms and 7 μm for the studies of algorithmic runtimes.
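One way to build such a cell image is sketched below. It evaluates a rotated two-dimensional Gaussian profile on the pixel grid and truncates small weights; evaluating the profile directly (rather than drawing samples from the distribution), the function name, and the argument names are assumptions of this example.

```python
import math


def gaussian_cell_image(size, cx, cy, sd_major, sd_minor, angle, floor=0.01):
    """Rotated 2D Gaussian profile on a size x size grid, truncated at
    `floor` times its maximum weight."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - cx, y - cy
            # Coordinates along the rotated principal axes of the Gaussian.
            u = cos_a * dx + sin_a * dy
            v = -sin_a * dx + cos_a * dy
            row.append(math.exp(-0.5 * ((u / sd_major) ** 2 + (v / sd_minor) ** 2)))
        image.append(row)
    peak = max(max(row) for row in image)
    # Set weights below the truncation threshold to zero.
    return [[w if w >= floor * peak else 0.0 for w in row] for row in image]


cell = gaussian_cell_image(size=31, cx=15, cy=15, sd_major=5.0, sd_minor=2.5, angle=0.3)
```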
In the third and the final step, we generated the noise components of the synthetic Ca2+ movie by sampling random values from a normal distribution for each pixel and each time point, with a s.d. set according to the desired mean pixel-wise SNR for the movie. We generated the final synthetic movie as the product of the matrix of the cells’ spatial weights and that of their Ca2+ traces, with the noise matrix added to this matrix product.
For the runtime experiments, the number of cells generated was controlled by the cell density, which we defined in units of the number of cells per mm^2. We set the cell density between 1000–6000 cells per mm^2, guided by the upper limits of the local neuronal densities encountered in two-photon imaging studies of the neocortex (~1500 cells per mm^2 for datasets from the Allen Brain Observatory) and in one-photon imaging studies of the CA1 area of hippocampus (~6000 cells per mm^2 for CA1 pyramidal neurons41).
Published Ca2+ Imaging Data
The published Ca2+ imaging datasets of Fig. 4D,E were taken with a custom-built two-photon mesoscope based on 16 spatiotemporally multiplexed illumination beams that collectively sweep across a 2 mm × 2 mm area of brain tissue at an image frame-acquisition rate of 17.5 Hz, as previously described1. In brief, these movies of Ca2+ activity were acquired in cortical area V1 (plus some surrounding regions) of triple transgenic, GCaMP6f-tTA-dCre mice that express the Ca2+ indicator GCaMP6f in layer 2/3 neocortical pyramidal neurons.
The Ca2+ imaging data used for Fig. S4C,D were from studies of dendritic excitation in neocortical pyramidal neurons42, for which processed data are publicly available (https://gui.dandiarchive.org/#/dandiset/000037/draft).
Ca2+ videos from the Allen Brain Observatory were originally 512 × 512 pixels and ~1 h in duration (http://alleninstitute.github.io/AllenSDK/brain_observatory.html), but before running EXTRACT we downsampled them to 256 × 256 pixels.
Surgical Procedures
For imaging studies of the ventral hippocampus, all surgeries were conducted under aseptic conditions using a digital small animal stereotaxic instrument (David Kopf Instruments). Double-transgenic (tetO-GCaMP6s-2Niell/J: Camk2a-tTA-1Mmay/DboJ) mice expressing GCaMP6s were anesthetized with isoflurane (5% induction, 1–2% maintenance, both in oxygen) in the stereotaxic frame for the entire surgery. Body temperature was maintained using a heating pad. A craniotomy centered on the injection coordinates was performed using a trephine drill (1.0 mm in diameter). To prevent increased intracranial pressure due to the insertion of the implant, we aspirated brain tissue until the white fibers of the corpus callosum became visible. Next, we slowly lowered a custom-designed 0.6-mm-diameter microendoscope probe (GRINTECH GmbH) to the coordinates −3.40 mm AP, −3.75 mm ML, −3.75 mm DV. We fixed the implanted microendoscope to the skull using ultraviolet-light-curable glue (Loctite 4305). To ensure stable attachment of the implant, we inserted two small screws into the skull above the contralateral cerebellum and contralateral sensory cortex (18-8 S/S, Component Supply). We then applied Metabond (Parkell) around both screws, the implant and the surrounding cranium. Lastly, we applied dental acrylic cement (Coltene, Whaledent) on top of the Metabond, both to attach a metal head bar to the cranium and to further stabilize the implant. After surgery, we maintained the animal’s body temperature using a heating pad until it fully recovered from anesthesia.
Mice recovered for 3–6 weeks, at which point we checked the brightness of GCaMP6s expression using a miniature microscope (nVista HD, Inscopix, Inc.). If expression was sufficiently bright, a baseplate for repetitive mounting of the miniature microscope was fixed onto the skull using blue-light-curable composite (Pentron, Flow-It N11VI).
For imaging studies of cerebellar Purkinje neurons, we followed our published procedures43 and performed surgeries on isoflurane-anesthetized PCP2-Cre/Ai148 mice (1.25–2.5% in 0.5–1.5 L/min of O2). We first cleaned and removed skin to reveal part of the skull. We then opened a 4-mm-diameter craniotomy centered mediolaterally on the midline, and rostrocaudally at the boundary between cerebellar lobules V and VI. We attached a 3-mm-diameter cover slip beneath a 3-mm-diameter and 1-mm-high stainless steel ring using ultraviolet-light activated epoxy (Norland NOA81). We then implanted the cover slip / steel ring combination into the craniotomy and fixed it in place with Metabond (Parkell). Finally, we centered an aluminum headplate with a 5-mm-diameter opening over the cranial window and fixed it to the skull with Metabond. The custom-made plate was shaped to allow the additional attachment of two stainless steel bars to the cranium, which we used during Ca2+ imaging sessions to hold the mouse’s head secure.
Ca2+ Imaging Sessions
For imaging studies of ventral CA1 pyramidal neurons, we allowed the mice to explore an elevated platform (72 cm above the floor) consisting of two opposing open arms (35 cm × 8 cm) and two opposing closed arms (35 cm × 8 cm; wall height of 23 cm) for a total of 10 min. To start the assay in a uniform manner, we placed each mouse in the center of the platform (8 cm × 8 cm) facing a closed arm. Ambient illumination in the open arms was 350–400 lux.
For imaging studies of cerebellar Purkinje cells (Fig. S4A,B), we used a custom-built two-photon mesoscope, the design of which we have previously described in detail1,44. We acquired images over a 2 mm × 2 mm field-of-view at a 17.5 Hz frame rate (842 × 842 pixels).
Cell Extraction with CNMF, CNMF-e, and ICA
For studies with CNMF (Ref. 13), we used the open-source CaImAn-MATLAB Github repository. We based our implementation on the provided demo script and used the suggested settings in it. We set tau (half-size of a neuron) = 3, and set K (number of expected neurons) to 1.5 times the number of ground truth cells for the simulated data experiments. We used CPU parallelization by default; CNMF ran with 8 CPU workers in all experiments on our analysis computer.
To run CNMF-e, we used the original authors’ own implementation15, taken from a Github repository called CNMF_E. We based our implementation of CNMF-e on the provided demo script for running it on large data, inheriting most settings from the script. For both the striatum and the ventral CA1 data, we used gSig = 3, gSiz = 2*gSig, min_pnr=2.5, and min_corr = 0.7.
To run PCA/ICA, we used the authors’ published version12, which is available on the MATLAB File Exchange. The ICA method first performs a principal component analysis (PCA) to reduce the dimensions of the data and then runs independent components analysis (ICA) to unmix the components spatiotemporally12. In all our studies, we ran ICA with μ = 0.1 (which sets the contribution of temporal information in the ICA step), its recommended value in the original paper12. We also used a maximum of 750 fixed-point iterations for the ICA step. In our studies with simulated data, we set both the number of principal components and the number of independent components to 1.5 times the number of ground truth cells.
Manual Sorting of the Cell Extraction Outputs
After running EXTRACT and CNMF-e for the striatal and the ventral CA1 datasets, we manually examined the outputs to eliminate possible false positives. For this, we wrote custom software that allows a user to view the movie with the outline of each cell of interest overlaid and to judge the quality of the cell by comparing its activity trace to the Ca2+ activity in the original movie. Using this approach, we eliminated output components that were deemed to be of low quality, based on a poor spatiotemporal match between the candidate cell’s activity trace and the activity in the movie, or on a low signal-to-noise ratio of its activity trace. After this step, for detailed comparisons between different cell extraction algorithms, we used cells that were retained after their identification by more than one algorithm (see below).
Matching Cells between Cell Extraction Outputs
We matched cells between the outputs of the different cell extraction algorithms using custom-written MATLAB code, provided in our GitHub repository, that used a greedy matching scheme based on the cell images. For this, we first computed the matrix of Pearson correlations between a set of reference cells and a set of detected cells, which served as an affinity matrix. We then traversed this matrix in order of decreasing correlation values, recording a match between the ith reference cell and the jth detected cell after visiting the (i, j)th index of the matrix. After visiting the (i, j)th index, we set the ith row and the jth column to negative infinity to prevent further visits. Matching stopped when the currently visited index of the matrix held a value lower than a threshold, which we set to 0.5. For matching across several sets of outputs, we performed matching between all output pairs, and then reported the intersection of all pairwise-matched cells.
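The greedy scheme can be sketched as follows, treating the matrix of Pearson correlations as an affinity matrix in which larger values indicate better matches. This is an illustrative re-implementation in Python rather than the MATLAB code from the repository; the function name is hypothetical, and sets of used rows and columns replace the in-place masking of the matrix.

```python
def greedy_match(affinity, threshold=0.5):
    """affinity[i][j]: similarity of reference cell i and detected cell j.
    Returns a list of (i, j) matches, best matches first."""
    entries = sorted(((value, i, j)
                      for i, row in enumerate(affinity)
                      for j, value in enumerate(row)), reverse=True)
    used_rows, used_cols, matches = set(), set(), []
    for value, i, j in entries:
        if value < threshold:
            break  # all remaining entries are even less similar
        if i in used_rows or j in used_cols:
            continue  # each cell can be matched at most once
        matches.append((i, j))
        used_rows.add(i)
        used_cols.add(j)
    return matches
```

The scheme is greedy in the sense that the strongest remaining pair is always accepted first, so it does not guarantee a globally optimal assignment.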
Detection of Ca2+ Transients
Prior to all quantitative analyses that involved Ca2+ traces, we detected the Ca2+ event peaks from the activity traces. For this, we used simple peak detection (peakseek function, available from the MATLAB File Exchange) on smoothed Ca2+ traces. We smoothed the Ca2+ traces using a 1-dimensional median filter with a window size of 3, followed by convolution with a Gaussian window function (gausswin in MATLAB with length 6). For peak detection, we did not consider time points in the Ca2+ trace with activity levels below an event detection threshold, which was between 0–1, as measured relative to the maximum of the Ca2+ trace. When reporting event peaks, we used the analog value of each Ca2+ trace at its event peak, instead of binary information marking the presence of a Ca2+ event.
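A rough Python sketch of this smoothing and peak detection pipeline is given below. Several details are assumptions of the example rather than properties of the original MATLAB code: the edge handling of the median filter, the normalization and alignment of the Gaussian window (built with the standard gausswin formula and its default width factor of 2.5), the use of plain local maxima in place of peakseek, and the choice to apply the threshold relative to the maximum of the smoothed trace.

```python
import math


def median3(x):
    """Median filter with a window of 3 samples (edges padded by repetition)."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [sorted(padded[t:t + 3])[1] for t in range(len(x))]


def gaussian_window(length=6, alpha=2.5):
    """Gaussian window w[k] = exp(-0.5 * (alpha * n / half)^2), normalized to sum to 1."""
    half = (length - 1) / 2
    w = [math.exp(-0.5 * (alpha * (k - half) / half) ** 2) for k in range(length)]
    total = sum(w)
    return [v / total for v in w]


def smooth(trace):
    """Median filtering followed by convolution with the Gaussian window."""
    x, w = median3(trace), gaussian_window()
    center = len(w) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t + center - k
            if 0 <= idx < len(x):
                acc += wk * x[idx]
        out.append(acc)
    return out


def detect_events(trace, rel_threshold):
    """Return (time, analog value) pairs at local maxima of the smoothed trace
    that reach rel_threshold times its maximum."""
    s = smooth(trace)
    floor = rel_threshold * max(s)
    events = []
    for t in range(1, len(s) - 1):
        if s[t] >= floor and s[t] > s[t - 1] and s[t] >= s[t + 1]:
            events.append((t, trace[t]))
    return events
```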
Detection of Dendritic Ca2+ Activity
For the analyses of Fig. S4, involving dendritic activity in cerebellar Purkinje neurons and neocortical pyramidal neurons42, we used two different approaches to optimize cell extraction results and runtimes. In both cases, we omitted the use of high-pass spatial filtering during the pre-processing stage, set the ‘dendritic awareness’ parameter in EXTRACT to 1 and visually inspected the outputs from EXTRACT. The default setting for the ‘dendritic awareness’ parameter is 0. However, when its value is set to 1, EXTRACT no longer discards candidate sources of Ca2+ activity whose spatial areas or eccentricity values are uncharacteristic of cell bodies. This alteration allows EXTRACT to detect Ca2+ activity sources, such as dendritic segments, with a wide range of shapes.
For studies of cortical pyramidal cell dendrites, we first temporally downsampled the Ca2+ videos from 31 fps to 7.75 fps and then ran EXTRACT on the downsampled movies. For studies of Purkinje neuron dendrites, we first sought to initialize EXTRACT with a reasonable set of candidate dendrites. To determine this set, we denoised the movie by performing a factor analysis, through a singular value decomposition of the movie. We discarded the noise components of the movie, as determined through the factor analysis, and spatiotemporally smoothed the resultant by convolving the movie with a filter that was of 3 time bins duration and 3 pixels wide in both spatial dimensions. We ran EXTRACT on the denoised, low-pass filtered movie version and used the resulting set of dendritic spatial profiles as the starting point for another iteration of EXTRACT, as performed on a denoised version of the movie that was spatially filtered as before but not temporally smoothed. Within both iterations of EXTRACT, we used the algorithm’s internal low-pass Butterworth spatial filtering in the pre-processing module, but with greater filtering along the rostral-caudal dimension than the medial-lateral dimension, to account for the rostral-caudal elongation of the Purkinje cell dendritic trees. After the second iteration of EXTRACT, we visually inspected the results and retained the larger dendritic segments with substantial Ca2+ activity.
Analyzing the Cell Extraction Outputs from the Simulated Datasets
After performing cell extraction on the simulated datasets, we first matched the found cells to the ground truth cells by using our aforementioned cell matching routine. To compute the areas under the spike precision-recall curves, we detected Ca2+ events within the traces provided by the cell detection algorithm, across a range of event-detection thresholds between 0–1 (in units of each cell’s peak Ca2+ signal), and we matched the ground truth spikes to the detected spikes to compute the spike recall and spike precision metrics. To perform this matching, we used the same greedy matching scheme described above but adapted to spike matching; instead of using a spatial distance matrix as we had for cell matching, for spike matching we computed a temporal distance matrix between the ground truth and detected spikes, and then negated its values to be consistent with the logic of the cell matching routine (greedy cell matching requires an affinity matrix). We also set the matching threshold to correspond to a maximum temporal separation of 3 image frames between the ground truth and detected events. After matching detected events to ground truth spikes, we computed the spike recall as the ratio of the number of matching detected spikes to the total number of ground truth spikes. We computed the spike precision as the ratio of the number of matching detected spikes to the total number of detected spikes. We averaged the spike recall and precision values for each detection threshold across all cells of a given movie, which resulted in the mean spike precision-recall curve for that movie. To compute AUC values, we performed numerical integration of the curves with MATLAB’s trapz function, which uses the trapezoidal approximation. After matching the detected cells with the ground truth cells, we computed the cell finding recall and precision metrics in an analogous manner to that used to compute the spike recall and precision.
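The recall, precision and area-under-curve computations reduce to a few lines, sketched here in Python. The function names are hypothetical; returning 0 when a denominator is empty and sorting the curve by recall before integrating are conventions assumed for this example (MATLAB's trapz integrates the points in the order given).

```python
def precision_recall(n_matched, n_detected, n_truth):
    """Spike precision and recall from counts of matched, detected and true spikes."""
    precision = n_matched / n_detected if n_detected else 0.0  # assumed convention
    recall = n_matched / n_truth if n_truth else 0.0           # assumed convention
    return precision, recall


def trapezoid_auc(recalls, precisions):
    """Area under a precision-recall curve by the trapezoidal rule."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For example, a curve through the (recall, precision) points (0, 1), (0.5, 1) and (1, 0.5) has an area of 0.5 + 0.375 = 0.875.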
Selection of Algorithm Parameters for Runtime Comparisons
We adjusted the parameters of EXTRACT and CNMF to compare their runtimes under conditions when the two methods returned comparable outputs. For EXTRACT, we set cellfind_min_snr (minimum acceptable pixelwise SNR for cell finding) to 2.5. For CNMF, we adjusted both patch_size (size of the independently processed movie tile), and K (number of cells to initialize), to tune the method for the fastest speed while outputting a comparable number of cell candidates as EXTRACT. Consequently, we set patch_size = 52 and K = 6.
Analyses of Striatal Spiny Projection Neural Activity
For analyses of striatal neural activity, we used published datasets of Ca2+ activity in spiny projection neurons, and to compute the spatial coordination index we followed closely the approach published in the original paper22. We first computed a matrix of centroid distances between each pair of cells in a movie. We then detected Ca2+ events from the output traces and obtained a binarized event trace by marking as ‘active’ the one-second period following each Ca2+ event. The motivation for this temporal expansion is that it better highlights clustered activity, which may not be perfectly synchronous, as described previously22. For each time point, using the centroid distance matrix, we obtained a histogram of pairwise centroid distances for all pairs of active cells. We also performed the same computations using shuffled versions of the same data in which the identification numbers of the cells were randomly permuted. From these shuffled datasets, we obtained a null distribution by aggregating the histograms of pairwise distances over 100 different permutations. For each time point, we then compared the histogram of pairwise distances for the real data to the null distribution using a one-sample Kolmogorov-Smirnov test with one tail, performed using MATLAB’s kstest function. This allowed us to test statistically whether the pairwise centroid distances in the real data were less than expected by chance. We then took the negative base-10 logarithm of the resulting p-value as the spatial coordination metric (SCM). We compared the resulting SCM values obtained using traces from PCA/ICA to those from CNMF-e and EXTRACT. For these comparisons, we used the same traces from PCA/ICA as in Ref. 22, which were already sorted, whereas for EXTRACT and CNMF-e we performed sorting (see above) ourselves after cell detection.
Classification of Arm-coding Cells in the Ventral Hippocampus
We wrote custom MATLAB software to determine mouse trajectories on the elevated plus maze, and we manually verified the accuracy of the estimated locations. We computed the average Ca2+ event rate on each arm-type of the maze by computing the mean of the event trace across the time bins in which a mouse was on a given arm. For each cell, we obtained the difference between the Ca2+ event rates on closed and open arms, d = event_rate_closed_arm − event_rate_open_arm. We repeated the same procedure after circularly shifting the event trace of each cell by a random number of time bins, to break the dependence between Ca2+ events and mouse locations. We computed the event rate difference, d, on 1000 different instantiations of such randomized traces, providing a null distribution of d values for each cell. We classified a cell as closed-arm coding if d was at or above the 95th percentile of the null distribution. We classified a cell as open-arm coding if d was at or below the 5th percentile of the null distribution.
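The shuffle test can be sketched in Python as follows. The function names, the fixed random seed, and the nearest-rank definition of the percentiles are assumptions of this illustration; the original analysis was written in MATLAB.

```python
import math
import random


def rate_difference(events, on_closed, on_open):
    """Mean event rate on closed arms minus mean event rate on open arms."""
    closed = [e for e, flag in zip(events, on_closed) if flag]
    opened = [e for e, flag in zip(events, on_open) if flag]
    return sum(closed) / len(closed) - sum(opened) / len(opened)


def classify_cell(events, on_closed, on_open, n_shuffles=1000, seed=0):
    """Label a cell 'closed', 'open' or 'neither' using circularly shifted traces."""
    rng = random.Random(seed)
    d = rate_difference(events, on_closed, on_open)
    null = []
    for _ in range(n_shuffles):
        k = rng.randrange(1, len(events))   # random circular shift in time bins
        shifted = events[k:] + events[:k]
        null.append(rate_difference(shifted, on_closed, on_open))
    null.sort()
    p95 = null[math.ceil(0.95 * n_shuffles) - 1]  # nearest-rank percentiles
    p05 = null[math.ceil(0.05 * n_shuffles) - 1]
    if d >= p95:
        return "closed"
    if d <= p05:
        return "open"
    return "neither"
```

Circular shifting preserves the temporal structure of each event trace while breaking its alignment with the animal's position, which is why it is preferred here over shuffling individual time bins.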
Correlation Analysis of Cell Images and Movie Frames for Ventral CA1 Pyramidal Neurons
To compute the Pearson correlation coefficient between the image of a given output cell and the movie frames at the time points with detected Ca2+ events, we first limited analysis to a small spatial neighborhood centered around the cell image. We binarized the cell image and then applied a morphological opening operation45 with a 3 pixels × 3 pixels structuring element. We treated the resulting two-dimensional binary array as a truncation mask to retain only the region-of-interest around the cell. We then determined the Pearson correlation coefficient between the truncated cell image array and the truncated movie frame array at the time of each detected Ca2+ event.
To compute a scalar, weighted correlation value for each cell, we took a weighted sum of the Pearson correlation coefficients using the event magnitudes as the weighting factors. Specifically, we first removed the zero entries of the event trace and normalized the trace so that its entries summed to 1. This yielded an array with the same size as the array of Pearson correlation coefficients for the same cell. We then took the inner product of the two arrays and reported it as the weighted correlation metric.
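The weighting step is a normalized inner product, sketched below with a hypothetical function name. The example assumes that the correlation coefficients are listed in the same order as the nonzero entries of the event trace.

```python
def weighted_correlation(event_trace, correlations):
    """Average the per-event correlations, weighted by event magnitude."""
    magnitudes = [e for e in event_trace if e != 0]   # drop frames without events
    total = sum(magnitudes)
    weights = [m / total for m in magnitudes]         # weights sum to 1
    return sum(w * c for w, c in zip(weights, correlations))
```

For an event trace with magnitudes 2 and 6 and per-event correlations of 0.4 and 0.8, the weights are 0.25 and 0.75, giving a weighted correlation of 0.7.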
Decoding of Mouse Locations from Ca2+ Traces of Ventral CA1 Pyramidal Neurons
We divided the plus maze into 5 spatial bins: left arm, right arm, upper arm, lower arm, and the stem. We first obtained the analog-valued Ca2+ event traces from the output traces using our event detection routine. We then smoothed the event traces with a moving average filter of length 20 time bins, corresponding to smoothing over two seconds of activity. We trained support vector machines to predict the spatial bins from the smoothed event traces for a given session. We used the templateLinear function in MATLAB with SVM learners, selecting ridge regularization with regularization penalty selected automatically. We obtained decoding test errors by first circularly shifting the event traces by a random amount, then selecting the leading 70% of the circularly shifted event traces as the training set, and the latter 30% as the test set. We repeated this procedure 20 times, and we averaged the decoding test errors over 20 repetitions.
Acknowledgments
We gratefully acknowledge research support to M.J.S. from HHMI, the Stanford CNC Program, DARPA I2O, the NIH BRAIN Initiative, an NINDS R24 grant and the NSF NeuroNex program, an HHMI Gilliam Fellowship (B.A.), and a Burroughs Wellcome Fund CASI Fellowship (MJW). We thank R. Chrapkiewicz, A. Christensen, M.S. Ebrahimi, S. Haziza, H. Kim, A. Shai, M. White, and Y. Zhang for helpful conversations. C. Gillon and J. Zylberberg provided videos of Ca2+ activity in dendrites of cortical pyramidal neurons. D. Feng, J. Galbraith, L. Kuan, and F. Long provided videos of neocortical Ca2+ activity and helpful conversations about the Allen Institute Brain Observatory datasets and SDK.
Footnotes
EXTRACT software is available at https://github.com/schnitzer-lab/EXTRACT-public. Correspondence about software code and the GitHub repository: extractneurons{at}gmail.com
Figure 1 made compatible with Apple PDF viewers, Safari and Preview.