Fast and statistically robust cell extraction from large-scale neural calcium imaging datasets

Hakan Inan1,2*, Claudia Schmuckermair1,2, Tugce Tasci1,2, Biafra O. Ahanonu1,2, Oscar Hernandez1,2, Jérôme Lecoq1–3, Fatih Dinç1,2, Mark J. Wagner1,2, Murat A. Erdogdu4,5 & Mark J. Schnitzer1,2,6*

1James Clark Center for Biomedical Engineering & Sciences, Stanford University, Stanford, CA 94305, USA
2CNC Program, Stanford University, Stanford, CA 94305, USA
3Allen Institute for Brain Science, Seattle, WA 98109, USA
4University of Toronto, Toronto, ON M5S, Canada
5Vector Institute, Toronto, ON M5G 1M1, Canada
6Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA

*Correspondence: inanh@stanford.edu or mschnitz@stanford.edu

The EXTRACT software is available at https://github.com/schnitzer-lab/EXTRACT-public. Correspondence about the software code and GitHub repository: extractneurons@gmail.com

To handle such data volumes, neuroscientists need computational tools that can quickly process extremely large datasets without resorting to analytic shortcuts that sacrifice the quality of results. A pivotal step in the analysis of many large-scale Ca 2+ imaging studies is the extraction of individual cells and their activity traces from the raw video data. The quality of cell extraction is critical for subsequent analyses of neural activity patterns, and, as shown below, superior analytics for cell extraction lead to superior biological results and conclusions.
Early methods for cell extraction identified neurons as regions-of-interest (ROIs) through manual [3][4][5][6][7] , semi-automated 8 or automated image segmentation 9,10 , which in turn allowed Ca 2+ activity in each ROI to be determined using either the identified spatial masks or multivariate regression.
Other cell extraction methods, including independent components analysis (ICA), non-negative matrix factorization (NMF), and constrained non-negative matrix factorization (CNMF), simultaneously infer cells' shapes and dynamics using a matrix factorization [11][12][13] . In these now widely used methods, the Ca 2+ movie is treated as a three-dimensional matrix that can be approximated as the product of a two-dimensional (spatial) matrix and a one-dimensional (temporal) matrix, although the detailed assumptions about this factorization differ between the three approaches and influence their relative strengths and limitations. Together, extant cell extraction methods have enabled Ca 2+ imaging studies with a wide variety of microscopy modalities and model species.
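As a concrete illustration of this shared model, the following toy sketch in Python (with illustrative dimensions, not any specific method's implementation) shows how a movie flattened to a pixels-by-time matrix is well approximated by the product of a spatial matrix and a temporal matrix:

```python
import numpy as np

# Toy sketch of the factorization model shared by ICA-, NMF- and CNMF-style
# methods: a movie with P pixels and T frames, flattened to a P x T matrix,
# is approximated as the product of a spatial matrix S (P x k) and a temporal
# matrix F (k x T) for k cells. All parameter values here are illustrative.
rng = np.random.default_rng(0)
P, T, k = 100, 500, 3

S = np.abs(rng.normal(size=(P, k)))        # spatial profiles (non-negative)
F = np.abs(rng.normal(size=(k, T)))        # activity traces (non-negative)
noise = 0.01 * rng.normal(size=(P, T))

M = S @ F + noise                          # observed movie = low-rank signal + noise

# A factorization method seeks estimates of S and F minimizing a loss on
# M - S F, under method-specific constraints (e.g. statistical independence
# for ICA, non-negativity for NMF and CNMF).
relative_residual = np.linalg.norm(M - S @ F) / np.linalg.norm(M)
```

The differing constraints on S and F are precisely where the three approaches diverge, and they determine each method's strengths and limitations.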
Notwithstanding the many past successes of Ca 2+ imaging, neuroscientists face important computational challenges as Ca 2+ imaging technology continues to progress rapidly. Many datasets contain noise that is not Gaussian-distributed, including background Ca 2+ signal contaminants from neuropil or neural processes, weakly labeled or out-of-focus cell bodies, and neurons that occupy overlapping sets of pixels. For simplicity, prior algorithms have typically used signal estimators to infer cellular Ca 2+ traces by assuming Gaussian-distributed contamination 9,[11][12][13][14][15] . Thus, these prior methods poorly handle the non-Gaussian contaminants found in real experimental situations, impeding detection of cells and inference of their Ca 2+ activity patterns. Further, due to the alternating estimation technique used in matrix factorization-based approaches [13][14][15] , errors due to mismatches between the data's assumed and actual statistical properties can rise quickly with the number of alternating iterations. To mitigate these estimation errors, past research has applied image processing methods to process either the Ca 2+ videos 15 or the inferred cellular components 13 .
However, a strict reliance on specific image processing routines can restrict a cell extraction algorithm's utility to the specific imaging conditions or modalities for which these routines were designed. To date, no cell sorting algorithm has addressed the challenges of Ca 2+ imaging within a single, generally applicable conceptual framework.
Here we present a broadly applicable cell extraction method that addresses the experimental limitations of real Ca 2+ imaging datasets while also avoiding assumptions that are specific to particular imaging modalities or fluorescence labeling patterns. Using the theoretical framework of robust estimation 16,17 , we introduce a minimally restrictive model of data generation and derive a statistically robust method to identify neurons and their fluorescence activity traces. Robust estimation is widely used in statistics, as it provides a potent means of analyzing data that suffers from contamination, such as outlier data points, whose statistical properties differ from those of an assumed noise model (typically Gaussian) 18 . Instead of modeling the contamination statistics, robust estimation provides statistical estimates that have quality guarantees even in the case of the worst possible contamination.
One obtains these quality guarantees by constructing a statistical estimator that selectively downgrades the importance of contaminated, outlier observations. In the presence of Gaussian-distributed noise plus non-Gaussian outliers, non-robust estimators can suffer enormous errors, whereas a suitable robust estimator can have negligible error 17. In cell extraction, robust estimation allows us to incorporate non-Gaussian contaminants into the formulation and to infer neural activity with high fidelity without having to explicitly model the contaminants in Ca 2+ imaging experiments.
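The contrast between the two estimator families can be seen in a minimal location-estimation example (a generic Huber M-estimator in Python; the parameter values and the iteratively reweighted solver are illustrative, not the EXTRACT estimator):

```python
import numpy as np

# Minimal illustration of robust versus L2 estimation: estimate a signal's
# baseline from samples contaminated by large positive outliers.
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=1.0, size=1000)   # Gaussian samples around 5
x[:50] += 100.0                                  # 5% gross outliers

def huber_location(x, delta=1.0, iters=50):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = np.median(x)                            # robust starting point
    for _ in range(iters):
        r = x - mu
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))  # down-weight outliers
        mu = np.sum(w * x) / np.sum(w)
    return mu

l2_estimate = x.mean()                # non-robust: dragged upward by outliers
robust_estimate = huber_location(x)   # nearly unaffected by the outliers
```

The non-robust mean is dragged far from the true baseline by the 5% of gross outliers, while the robust estimate stays within a few hundredths of it.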
The result is a modality-agnostic approach that makes minimal assumptions about the data. We term the algorithm EXTRACT (for EXTRACT is a tractable and robust automated cell extraction technique), and the software is openly available (https://github.com/schnitzer-lab/EXTRACT-public).
Each cell contributes an activity signal given by the product of its spatial and temporal weights, indexed by the cell's identity (Fig. 1E). As in prior work 12,13, we accomplish cell extraction by first performing a simple (and optional) pre-processing of the movie frames, followed by two main computational stages (Fig. 2A). The pre-processing step applies a high-pass spatial filter to the movie to reduce background fluorescence (which is common in one-photon Ca 2+ movies) and then subtracts from each pixel value its baseline fluorescence level (Methods). The first main stage of computation, 'Robust cell finding', identifies cells in the movie. The second main stage, 'Cell refinement', hones the estimates of cells' spatial profiles and activity traces. As with the toy model above, for which an L2 loss function led to crosstalk from a distractor cell, robust estimation allows the proper isolation of individual neurons from real data, even when there is substantial spatial overlap in cells' profiles and temporal overlap in their activity patterns.
The cell-finding stage uses a simple, iterative procedure to find cells and applies robust estimation to determine each cell's spatial profile and activity trace (Fig. 2A,C,E). At each iteration, the algorithm finds a seed pixel that attains the movie's maximum fluorescence intensity, and it initializes a candidate cell image at the seed pixel (Methods). The algorithm then alternately improves its determinations of the cell's spatial profile and activity trace via robust estimation (Fig. 2E). After the estimates of the spatial profile and activity trace stabilize, the cell's inferred activity trace is subtracted from the movie, and in the next iteration the steps above repeat for another cell.
The cell-finding procedure ends when the peak value for the activity trace of the seed pixel fails to reach a threshold value, which is set as a fixed multiple of the standard deviation of the background noise.
After cell finding, the 'Cell-refinement' stage improves the estimates of cells' spatial and temporal contributions to the movie data by accounting concurrently for all the identified cells using multivariate robust estimation (Fig. 2F; Methods). This stage is also an iterative procedure, and each iteration has three steps. First, all fluorescence traces are simultaneously updated using robust estimation, while holding fixed the cells' spatial profiles. Second, all spatial profiles are simultaneously updated using robust estimation, while holding fixed the activity traces. Third, a validation procedure checks a set of predetermined metrics for every putative cell and removes any cell with metrics below user-set thresholds. This 3-step procedure repeats for a fixed number of iterations, and the algorithm outputs the final estimates of cells' spatial profiles and activity traces.

[bioRxiv preprint, this version posted March 25, 2021; doi: https://doi.org/10.1101/2021.03.24.436279; made available under a CC-BY 4.0 International license.]
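A single refinement iteration can be sketched in the same simplified style, with least-squares updates standing in for the multivariate robust estimation used in the paper and a peak-amplitude check standing in for the full set of validation metrics:

```python
import numpy as np

# Sketch of one cell-refinement iteration (illustrative stand-ins, not the
# EXTRACT implementation). M: pixels x time; S: pixels x cells; the returned
# F is cells x time.
def refine_once(M, S, trace_thresh):
    # Step 1: update all traces simultaneously, holding spatial profiles fixed.
    F = np.linalg.lstsq(S, M, rcond=None)[0].clip(min=0)
    # Step 2: update all spatial profiles simultaneously, holding traces fixed.
    S = np.linalg.lstsq(F.T, M.T, rcond=None)[0].T.clip(min=0)
    # Step 3: validation -- remove putative cells that fail the quality check.
    keep = F.max(axis=1) >= trace_thresh
    return S[:, keep], F[keep]

# Usage: two real cells plus one spurious candidate on otherwise empty pixels.
rng = np.random.default_rng(5)
P, T = 30, 300
S_true = np.zeros((P, 2)); S_true[0:4, 0] = 1.0; S_true[10:14, 1] = 1.0
F_true = np.zeros((2, T))
F_true[0, rng.choice(T, 12, replace=False)] = 5.0
F_true[1, rng.choice(T, 12, replace=False)] = 4.0
M = S_true @ F_true + 0.05 * rng.normal(size=(P, T))
S0 = np.column_stack([S_true, np.zeros(P)]); S0[20:24, 2] = 1.0
S1, F1 = refine_once(M, S0, trace_thresh=1.0)   # the spurious cell is removed
```

Because all cells are updated jointly, a spurious candidate ends up with a near-zero trace and is pruned by the validation step.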
Crucially, to perform these computations efficiently, we created a fast solver for robust estimation problems that combines the computational cost of a first-order optimization algorithm with a convergence behavior approaching that of second-order optimization algorithms, such as Newton's method (Methods). Our solver is expressly adapted for and benefits greatly from the computational acceleration provided by graphical processing units (GPUs) and parallel computation.

EXTRACT allows high-fidelity cell extraction even with substantial signal contaminants
To validate the use of robust estimation for cell extraction, we first created simulated datasets on which to evaluate different cell extraction methods. We generated artificial Ca 2+ imaging data with varying numbers of spatially overlapping cells with two-dimensional Gaussian shapes and activity traces comprising a set of spikes that were Poisson-distributed in time and had exponentially decaying waveforms (Fig. 3A; Methods). The artificial movies also contained additive Gaussian-distributed noise, uncorrelated between pixels. Although we did not explicitly add non-Gaussian noise, as in real datasets the spatial overlap between cells induced non-Gaussian signal contaminants. We varied the level of this contamination by adjusting the number of overlapping cells within a fixed field-of-view and by introducing temporal correlations in cells' activity patterns (Fig. 3A; Methods).
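The simulation recipe can be sketched as follows (illustrative parameter values, not the exact generator used for the figures):

```python
import numpy as np

# Sketch of the simulation recipe: 2-D Gaussian cell shapes, Poisson-timed
# spikes convolved with an exponentially decaying calcium waveform, and
# pixelwise additive Gaussian noise.
rng = np.random.default_rng(3)
H = W = 50; T = 1000; n_cells = 5
yy, xx = np.mgrid[0:H, 0:W]

profiles, traces = [], []
decay = np.exp(-np.arange(100) / 10.0)        # exponential Ca transient kernel
for _ in range(n_cells):
    cy, cx = rng.uniform(5, H - 5, size=2)    # random cell center
    sigma = 2.0                               # Gaussian cell radius (pixels)
    profiles.append(np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2)))
    spikes = (rng.poisson(0.01, size=T) > 0).astype(float)  # Poisson spike times
    traces.append(np.convolve(spikes, decay)[:T])           # decaying waveforms

S = np.stack([p.ravel() for p in profiles])   # n_cells x pixels
F = np.stack(traces)                          # n_cells x time
movie = (S.T @ F + 0.1 * rng.normal(size=(H * W, T))).reshape(H, W, T)
```

Raising `n_cells` within the fixed field-of-view, or correlating the spike times across cells, increases the non-Gaussian contamination exactly as described above.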
First, we qualitatively evaluated the benefits of using robust estimation within the cell-finding stage of EXTRACT, as compared to using a conventional L2 estimator (i.e., a quadratic loss function) within this stage. We studied an artificial dataset that had 3 overlapping neurons with statistically independent spiking patterns (Fig. 3B) and compared the results from robust estimation to those from L2 estimation. For the robust estimator, we allowed the robustness parameter to vary frame-by-frame so as to minimize the difference between the reconstructed and actual movie data (Methods). After running the cell-finding routine for 3 iterations in each case, robust estimation accurately identified all 3 cells, whereas with L2 estimation the activity traces had substantial crosstalk between cells, which progressively accumulated across the 3 iterations.
Next, we used multiple artificial datasets with varying densities of neurons to compare robust and non-robust estimation approaches using a variety of performance metrics (Fig. 3C-L). For each simulated Ca 2+ video, we compared the results from EXTRACT using the robust loss function to those using the non-robust, L2 loss function. In both cases, we identified individual spikes in cells' activity traces by applying a simple, threshold-based detection method to the Ca 2+ traces (Methods).
We computed precision-recall curves for spike detection by comparing the sets of detected and actual spikes over a range of spike detection thresholds, and we computed the mean area under the precision-recall curves (AUC) by averaging over all cells in each simulation. Notably, using robust estimation the cell-finding stage yielded substantially higher precision and recall values for spike detection than L2 estimation (Fig. 3C,D). In principle, the cell-refinement stage can correct errors incurred during cell-finding, since cell-refinement updates all the estimated cells concurrently, but in practice we found that robust estimation maintained its superiority after cell-refinement (Fig. 3C,D).
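The benchmark procedure (threshold-based spike detection, a threshold sweep to build a precision-recall curve, and a trapezoidal AUC summary) can be sketched as follows; the simulated trace and all parameter values are illustrative:

```python
import numpy as np

# Sketch of the spike-detection benchmark: detect spikes by thresholding a
# trace, sweep the threshold to trace out a precision-recall curve, and
# summarize the curve by its area (AUC).
def precision_recall(trace, true_frames, threshold):
    detected = set(np.flatnonzero(trace > threshold))
    truth = set(true_frames)
    tp = len(detected & truth)                       # true positives
    precision = tp / len(detected) if detected else 1.0
    recall = tp / len(truth)
    return precision, recall

rng = np.random.default_rng(4)
T = 1000
true_frames = rng.choice(T, 20, replace=False)
trace = 0.2 * rng.normal(size=T)
trace[true_frames] += 5.0                            # clean spikes riding on noise

points = [precision_recall(trace, true_frames, th)
          for th in np.linspace(0.0, 6.0, 50)]       # thresholds from lax to strict
precisions, recalls = (np.array(v) for v in zip(*points))
# Trapezoidal area under the precision-recall curve (recall falls as the
# threshold rises, so successive recall differences are non-negative).
auc = float(np.sum(0.5 * (precisions[:-1] + precisions[1:])
                   * (recalls[:-1] - recalls[1:])))
```

For this noiseless-spike toy trace the AUC is essentially 1; crosstalk and missed transients in real extracted traces push it downward.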
Next, we compared EXTRACT to two widely used cell extraction methods, constrained non-negative matrix factorization (CNMF) 13 and the successive application of principal and independent components analyses (PCA/ICA) 12 (Fig. 3E-L; Fig. S2). Like EXTRACT, CNMF is a two-stage method, but it uses regularized L2-estimation and tries to infer discrete Ca 2+ events within the Ca 2+ activity traces while simultaneously estimating cells' spatial profiles and time-varying fluorescence intensities. The ICA-based approach first uses PCA to perform a dimensional reduction by identifying and then discarding principal components of the raw data whose time variations are consistent with Gaussian noise; by applying ICA to the reduced dataset, the method then un-mixes individual cells' contributions to the fluorescence movie. Within EXTRACT, we allowed the robustness parameter to vary adaptively during the cell-finding stage, but for cell refinement we fixed it at 1 s.d. of the estimated baseline noise. To evaluate the 3 methods, we tested their performances on simulations of cells that fired spikes either independently of each other, or in a temporally correlated manner, across a range of cell densities and conditions of either high (Fig. 3E-L) or low (Fig. S2) values of the optical signal-to-noise ratio (SNR).
Qualitative inspection of the estimated spatial profiles and activity traces revealed that, for cells spiking independently, both EXTRACT and ICA performed well, whereas activity traces from CNMF often suffered from crosstalk between neighboring cells (Fig. 3E). For cells with correlated spike trains, again EXTRACT performed well and CNMF produced traces with crosstalk but few instances of false negative spike detection; by comparison, ICA had reduced spike detection fidelity but almost no crosstalk, both effects stemming from ICA's assumption of uncorrelated dynamics (Fig. 3F).
Quantitative assessments of the 3 methods corroborated these observations (Fig. 3G-L; Fig. S2). We first used the area under the precision-recall curve (AUC) to assess spike detection.
For independently spiking cells, EXTRACT and ICA attained high AUC values, whereas CNMF performed more poorly (Fig. 3G; Fig. S2A). With correlated spike trains, ICA suffered a substantial decline in the AUC metric, with values comparable to those from CNMF at high SNR (Fig. 3J) and below those from CNMF at low SNR (Fig. S2D). Especially at high densities of cells, EXTRACT notably outperformed the other 2 methods and had the highest AUC values (Fig. 3G,J; Fig. S2A,D).
We also determined the Pearson correlation coefficients between cells' inferred activity traces and spatial profiles and their ground truth values (Fig. 3H,K; Fig. S2B,E); in this assessment EXTRACT surpassed or matched the other methods across all conditions. To examine how well the different algorithms identified cells in the simulated movies, we computed precision and recall metrics for cell detection by comparing the cells found by the 3 algorithms with the actual set of cells in the simulated datasets. EXTRACT had the highest precision for cell detection, with values close to unity (Fig. 3I,L; Fig. S2C,F), showing that nearly all cells found by EXTRACT were true positives. At some cell densities and high optical SNR, ICA had slightly higher recall values, but at a cost of much lower precision values (Fig. 3I,L). At low SNR, EXTRACT had the best recall values across all cell densities (Fig. S2C,F). Overall, EXTRACT and CNMF outperformed ICA at low SNR values, and EXTRACT outperformed CNMF in nearly all conditions.

A native implementation on GPUs enables fast runtimes
EXTRACT's main components are estimation algorithms that rely heavily on elementary matrix algebra. Thanks to several widely used software packages, such as the Intel Math Kernel Library, modern computers can perform matrix algebra operations in a highly optimized manner, which allows EXTRACT to achieve fast, computationally efficient cell extraction. Our software implementation of EXTRACT also has native support for computation on graphical processing units (GPUs), enabling even greater efficiency for matrix operations. To benchmark performance speed, we evaluated runtimes on simulated and real datasets of varying sizes.
First, we extensively tested EXTRACT on simulated movies of neural activity across a wide range of movie durations, fields-of-view and cell densities (Fig. 4A-C). We used a MATLAB implementation of EXTRACT and compared runtimes with and without GPU acceleration (Methods).
For simplicity, we fixed the robustness parameter at 1 s.d. for these tests. Runtimes increased close to linearly as a function of cell density and movie duration (Fig. 4A,C). When we varied the field-of-view (FOV) area while keeping the cell density constant, runtimes also rose linearly with the area (Fig. 4B). We note that, with the number of cells held constant, merely increasing the FOV does not necessarily increase the runtime, because EXTRACT only applies its estimation routines to image regions with identified cells; this minimizes computational overhead from empty regions of the FOV.
With GPU acceleration, runtimes were faster than those of the strict CPU implementation by a factor of 3 or more. On larger movies with wider fields-of-view or more image frames, the speedup from GPUs was more pronounced, as the built-in parallelization from GPUs generally allows greater performance gains with larger data structures (Fig. 4C). Both the CPU and GPU versions of EXTRACT yielded processing times comparable to or shorter than the movie durations, and the GPU version often had runtimes an order of magnitude faster than the movie durations (Fig. 4C).
To assess runtimes on real Ca 2+ imaging data, we applied EXTRACT to large-scale Ca 2+ movies acquired on a two-photon mesoscope 1 with a 4-mm² field-of-view (Fig. 4D; ~10 min movie durations; 17.5 Hz frame rate). We tested CNMF on the same data, which allowed us to compare the CPU and GPU versions of EXTRACT to this widely used, state-of-the-art cell extraction algorithm. We chose the parameters of EXTRACT and CNMF so as to obtain comparable output from both methods (Fig. 4E; Methods). Both versions of EXTRACT performed cell extraction more quickly than CNMF (Fig. 4E). With GPU acceleration, EXTRACT had a mean runtime of ~1.5 times the movie duration, roughly 7 times faster than CNMF (Fig. 4E).

Fast, comprehensive cell extraction from the Allen Brain Observatory data repository
After validating EXTRACT on both artificial data and real data taken by two-photon imaging, we tested how well EXTRACT could process a substantial repository of Ca 2+ imaging data. To perform this test at a large scale, we applied EXTRACT to the publicly available Ca 2+ imaging data repository from the Allen Institute Brain Observatory 19,21 (Fig. 5A-K). This data library comprises 628 sessions of in vivo two-photon Ca 2+ imaging data acquired in GCaMP6-expressing cells across different visual cortical areas of behaving mice. The repository's software development kit (SDK) provides estimated spatial profiles and Ca 2+ activity traces for cells from each of the movies. The spatial profiles are regions-of-interest (ROI) estimates for each cell based on its morphology. Each cell's Ca 2+ trace comes from a linear regression of the Ca 2+ movie onto the cell's ROI, after subtracting an estimate of background Ca 2+ activity within the neuropil. We used these results from the SDK as a comparator for our assessments of EXTRACT.
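The ROI-regression scheme described above can be sketched as follows (a simplified illustration, not the Allen SDK implementation; the masks, weights, and neuropil estimate are all idealized):

```python
import numpy as np

# Sketch of ROI-based trace estimation: estimate neuropil background from
# surrounding pixels, subtract it, then project the movie onto the cell's
# binary ROI mask via least squares.
def roi_trace(movie, roi_mask, neuropil_mask):
    # movie: pixels x time; masks: boolean arrays over pixels
    neuropil = movie[neuropil_mask].mean(axis=0)     # background per frame
    corrected = movie[roi_mask] - neuropil           # neuropil-subtracted ROI pixels
    w = np.ones(int(roi_mask.sum()))                 # binary-mask regression weights
    return (w @ corrected) / (w @ w)                 # least-squares projection

# Usage on a toy movie: one cell plus a global background signal.
rng = np.random.default_rng(6)
P, T = 50, 200
background = 1.0 + 0.5 * rng.random(T)               # shared neuropil signal
t_true = np.zeros(T); t_true[rng.choice(T, 15, replace=False)] = 3.0
movie = np.tile(background, (P, 1)) + 0.1 * rng.normal(size=(P, T))
movie[0:5] += t_true                                 # cell occupies pixels 0-4
roi = np.zeros(P, dtype=bool); roi[0:5] = True
surround = np.zeros(P, dtype=bool); surround[20:50] = True
trace = roi_trace(movie, roi, surround)
```

Unlike EXTRACT, this scheme depends entirely on the quality of the predefined ROI and background masks, which is one reason its traces can have lower SNR.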
We ran EXTRACT on 94 movies from the repository, using identical input parameters in all cases and the robustness parameter fixed at 1 s.d. Visual inspections of the estimated Ca 2+ traces revealed that those from EXTRACT had higher SNR values than those from the Allen Institute SDK, for the very same neurons (Fig. 5A-C). To confirm these observations quantitatively, we computed the SNR of the estimated Ca 2+ traces from EXTRACT and the Allen Institute SDK, using sets of cells identified by both algorithms (Fig. 5H,I).
We then compared the statistics of cell detection with the 2 algorithms, by identifying the neurons found by both approaches as well as those found by only one of the two methods. Across the 94 sessions, EXTRACT identified all but a small fraction (~1%) of the cells in the Allen SDK and found many more cells not present in the Allen SDK (Fig. 5D-G). On average, EXTRACT detected over twice the number of cells (Fig. 5F). Notably, the cells identified by EXTRACT but missing from the Allen Institute SDK generally had Ca 2+ traces with lower SNR values, suggesting that EXTRACT had greater sensitivity to cells with weaker optical Ca 2+ signals (Fig. 5J).
We also tabulated runtime statistics for EXTRACT across all 94 Ca 2+ movies, each of which was a 30-Hz video, about 1 h in duration, with 256 × 256 pixels. EXTRACT took 12.4 ± 3.2 min per movie on average for cell extraction, or ~20% of each movie's duration (Fig. 5K; Methods). These runtime determinations are conservative, in that some of the runtime was devoted to image preprocessing, not cell extraction, and this could in principle be done beforehand.

Spatiotemporally clustered Ca 2+ activity in striatal spiny projection neurons of active mice
As a first test of whether EXTRACT can yield superior biological results, we studied Ca 2+ imaging data that we previously acquired in the dorsomedial striatum of freely behaving mice using a head-mounted, epi-fluorescence miniature microscope 22. Each dataset comprises a recording of neural Ca 2+ activity, as reported using the fluorescent Ca 2+ indicator GCaMP6m, in spiny projection neurons of either the direct or indirect pathway of the basal ganglia (dSPNs and iSPNs, respectively). We compared results from EXTRACT to those from PCA/ICA and from a variant of CNMF called CNMF-e that is tailored for one-photon fluorescence Ca 2+ imaging 15 (Fig. 6A,B).
When we inspected the neural Ca 2+ activity traces from the 3 methods, our observations fit well with those from simulated datasets (Fig. 3E, F). Notably, activity traces from PCA/ICA sometimes omitted Ca 2+ transients that were plainly visible by simple inspection of the raw movie data (Fig. 6C, blue dots). Further, Ca 2+ activity traces from both PCA/ICA and CNMF-e exhibited crosstalk between the neighboring cells (Fig. 6C, red dots). We next investigated whether these types of errors during cell extraction could impact biological results and conclusions.
Our prior study of striatal SPNs found that mouse locomotion led to activation of SPNs in a spatiotemporally clustered manner 22. However, assessments of clustered activity are likely to be influenced by missing Ca 2+ transients or crosstalk between spatially adjacent neurons. For instance, crosstalk can artificially inflate the apparent co-activation of neighboring cells. We therefore used the spatial coordination metric (SCM) of our prior study 22 to quantify the extent of spatially clustered activity in the striatum at each time frame (Methods). We compared the results obtained by analyzing the activity traces from EXTRACT, PCA/ICA and CNMF-e for a common set of cells.
During periods of mouse inactivity, Ca 2+ activity traces from CNMF-e and PCA/ICA exhibited greater levels of correlated activity and higher SCM values as compared to the traces from EXTRACT (Fig. 6D). During locomotor activity, the traces from PCA/ICA had lower SCM values than those from CNMF-e (Fig. 6D). Notably, the ratio of the mean SCM value during locomotion to that during rest was significantly higher for the traces from EXTRACT as compared to those from CNMF-e or PCA/ICA (Fig. 6E). Perhaps most importantly, SCM values for the outputs of EXTRACT had significantly higher correlation coefficients with the mouse's locomotor speed than the traces from either of the two other methods (Fig. 6F). We also confirmed that EXTRACT works well with two-photon Ca 2+ imaging studies of dSPNs and iSPNs (Fig. S3). Overall, our results show that superior cell extraction can lead to neurophysiological signatures that relate more precisely to animal behavior.

EXTRACT detects dendrites and their Ca 2+ activity
Some past cell extraction algorithms do not provide sensible results when applied to Ca 2+ videos of dendritic activity. Thus, we tested EXTRACT on videos of dendritic Ca 2+ activity in cerebellar Purkinje cells and neocortical pyramidal neurons in live mice (Fig. S4). Although the default mode of EXTRACT discards candidate cells whose spatial areas or eccentricities are uncharacteristic of cell bodies, the user can opt to retain candidate sources of Ca 2+ activity without regard to their morphologies, thereby allowing EXTRACT to identify active dendrites. For example, in large-scale movies of Purkinje neuron dendritic Ca 2+ spiking activity acquired with a two-photon mesoscope 1, EXTRACT identified the dendritic trees of >500 cells per mouse, and the extracted spatial forms had the anisotropic shapes that are characteristic of these cells' dendritic trees, which are highly elongated in the rostral-caudal dimension 12 (Fig. S4A,B). We also used EXTRACT to analyze videos of Ca 2+ activity acquired by conventional two-photon microscopy in apical dendrites of layer 2/3 or layer 5 neocortical pyramidal cells in live mice (Fig. S4C,D). EXTRACT identified ~850-900 dendritic segments per mouse, and, as expected, they had a wide variety of shapes and temporally sparse Ca 2+ transients. For both cerebellar and neocortical neurons, we found no limitations to the dendrite shapes that EXTRACT could identify, and it readily detected large numbers of dendritic segments.

EXTRACT improves identification of place- and anxiety-encoding cells in the ventral CA1 area
As another test of whether EXTRACT can improve biological findings, we examined the Ca 2+ activity of pyramidal neurons in the CA1 area of the ventral hippocampus (Fig. 7A). We tracked the dynamics of these cells in freely behaving mice that navigated a 4-arm elevated plus maze (EPM, Fig. 7B).
The EPM had 2 enclosed and 2 open arms, arranged conventionally on the perpendicular linear paths of the plus maze. The EPM assay is based on rodents' innate aversion to open, brightly lit spaces and has been used extensively to investigate anxiety-related behavior 23 . A subset of ventral CA1 neurons, termed 'anxiety cells', show enhanced activity when the mouse is within anxiogenic regions of the EPM, namely the open arms [24][25][26] . Here, we used EXTRACT to obtain Ca 2+ activity traces of ventral CA1 neurons, and we compared their encoding of the open and closed arms to that in Ca 2+ activity traces obtained by applying CNMF-e to the same datasets.
In the activity traces from both EXTRACT and CNMF-e, a subset of ventral CA1 cells responded differentially when the mouse was in the open versus the closed arms (Fig. 7D). Namely, distinct subsets of cells were active when the mouse occupied the two different arm-types, in accord with past reports of anxiety-related coding by ventral CA1 cells 24,25. However, Ca 2+ activity traces from EXTRACT generally exhibited a purer form of coding, in that the traces were typically silent when the mouse was in one arm-type but had high activity levels in the other arm-type. By comparison, the traces from CNMF-e tended not to distinguish the two arm-types as clearly (Fig. 7E). The traces from EXTRACT also corresponded more precisely to neural Ca 2+ activation events that were plainly apparent in the raw movie data (Fig. 7E, lower panels).
To quantify these observations, we compared the arm-coding cells identified using the traces from the two different cell extraction algorithms. Notably, EXTRACT yielded significantly more arm-coding cells than CNMF-e (Fig. 7F; Wilcoxon signed-rank test, p < 0.05). To assess how well the Ca 2+ activity traces from the two algorithms reflected events in the raw Ca 2+ video data, for each cell we computed the Pearson correlation coefficient between the image of the cell, as determined by each algorithm, and the frame of the Ca 2+ video at the time of each detected Ca 2+ transient event (Methods). Ca 2+ events identified in the activity traces from EXTRACT had significantly greater correlation coefficients than those from CNMF-e (Fig. 7G,H; Wilcoxon rank-sum test, p < 6 × 10⁻⁴), showing that EXTRACT more accurately captured the Ca 2+ dynamics in the raw movie data.
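The event-validation metric can be sketched as follows (an illustrative reimplementation, not the exact analysis code):

```python
import numpy as np

# Sketch of the event-validation metric: at each detected Ca2+ event,
# compute the Pearson correlation between the cell's spatial image and the
# movie frame at the event time. High correlation indicates the trace
# reflects real activity in the movie.
def event_image_correlation(movie, cell_image, event_frames):
    # movie: pixels x time; cell_image: the cell's spatial profile (flattened)
    img = np.ravel(cell_image)
    corrs = [np.corrcoef(img, movie[:, t])[0, 1] for t in event_frames]
    return float(np.mean(corrs))

# Usage: a cell that is genuinely active at its detected event times scores
# high, while frames without the cell's activity score near zero.
rng = np.random.default_rng(7)
P, T = 100, 300
profile = np.zeros(P); profile[40:50] = 1.0
events = [10, 50, 90, 130, 170]
movie = 0.1 * rng.normal(size=(P, T))
for t in events:
    movie[:, t] += 5.0 * profile
true_score = event_image_correlation(movie, profile, events)
chance_score = event_image_correlation(movie, profile, [20, 60, 100, 140, 180])
```

Spurious events caused by crosstalk would score like the second case, because the movie frame at those times does not resemble the cell's image.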
Finally, we evaluated how well the sets of activity traces from the two algorithms allowed one to estimate the mouse's behavior using decoders of neural ensemble activity. We divided the EPM into 5 spatial bins (Fig. 7I) and trained support vector machine (SVM) classifiers to predict the spatial bin occupied by the mouse based on the neural ensemble activity pattern at each time step (Methods). We assessed decoding accuracy on a held-out subset of the data that was not used to train the decoders. Irrespective of the threshold used to detect Ca 2+ events in the activity traces, activity traces from EXTRACT led to more accurate decoding than those from CNMF-e (Fig. 7J).
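The decoding analysis can be sketched as follows; to stay self-contained, this toy version hand-rolls a linear SVM trained by hinge-loss stochastic subgradient descent on synthetic "ensemble activity", rather than calling a library, and all data and parameters are illustrative:

```python
import numpy as np

# One-vs-rest linear SVM decoder sketch: map a neural activity vector at
# each time step to one of 5 maze bins.
def train_binary_svm(X, y, lam=0.01, lr=0.01, epochs=50, seed=0):
    # X: samples x features; y: labels in {-1, +1}
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:        # hinge-loss subgradient step
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:
                w -= lr * lam * w                # regularization only
    return w, b

def train_one_vs_rest(X, y, n_classes):
    return [train_binary_svm(X, np.where(y == c, 1.0, -1.0))
            for c in range(n_classes)]

def predict(models, X):
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return scores.argmax(axis=1)                 # most confident classifier wins

# Usage on synthetic ensemble activity: each bin activates a distinct cell pair.
rng = np.random.default_rng(8)
n_bins, n_cells = 5, 10
means = np.zeros((n_bins, n_cells))
for c in range(n_bins):
    means[c, 2 * c:2 * c + 2] = 3.0              # bin-specific active cells
y_train = rng.integers(0, n_bins, 300)
X_train = means[y_train] + 0.3 * rng.normal(size=(300, n_cells))
y_test = rng.integers(0, n_bins, 100)
X_test = means[y_test] + 0.3 * rng.normal(size=(100, n_cells))
models = train_one_vs_rest(X_train, y_train, n_bins)
accuracy = float(np.mean(predict(models, X_test) == y_test))
```

On real traces, crosstalk and missed transients blur the bin-specific activity patterns, which is how inferior cell extraction degrades decoding accuracy.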
Strikingly, for every mouse the best performing decoder based on traces from EXTRACT outperformed the best decoder based on traces from CNMF-e (Fig. 7K).

EXTRACT is a versatile method suited for analyzing a broad range of Ca 2+ imaging datasets
Here we have introduced the use of robust statistical analyses to systems neuroscience. As shown above, EXTRACT provides a superior means of analyzing somatic or dendritic Ca 2+ data acquired with conventional, multi-plane or large-scale two-photon microscopes, or with head-mounted epi-fluorescence microscopes (Figs. 4-7; Fig. S3). This broad applicability stems from one major factor, namely that the theoretical framework on which EXTRACT is based makes minimal assumptions about the nature of the data.
The robust estimation framework does not model noise sources; instead it aims to isolate cellular Ca 2+ signals from contamination sources while staying agnostic to the latter's exact form.
This approach leads to great flexibility. For example, when the contamination approximates statistically independent, Gaussian-distributed noise at each image pixel, the loss function used in EXTRACT adapts itself to behave like a linear regression loss, and it thereby achieves the optimal statistical efficiency of a standard maximum likelihood estimator 18 . In an opposite extreme case, when the data suffer from large contaminants due to Ca 2+ activity in overlapping cells or neuropil, the EXTRACT loss function modifies its robustness parameter so as to reject these contaminants.
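The loss function's behavior described here can be sketched directly: it is quadratic (linear-regression-like) for residuals below a threshold κ, including all negative residuals, and grows only linearly above κ, which caps the influence of large positive contaminants (κ is the robustness parameter referred to below; the function below is a sketch based on the description in the text):

```python
import numpy as np

def one_sided_huber_loss(r, kappa):
    """One-sided Huber loss: quadratic for residuals r <= kappa (including
    all negative r), linear for r > kappa; continuous at the junction."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= kappa, 0.5 * r ** 2, kappa * r - 0.5 * kappa ** 2)
```

For small κ the loss strongly discounts positive outliers; for large κ it approaches the ordinary quadratic (L2) loss, matching the adaptive behavior described above.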
Further, EXTRACT makes no assumptions about cell morphology, and, unlike CNMF, makes no assumptions about the temporal waveforms of Ca 2+ activity. Thus, EXTRACT can detect activity in either cell bodies or dendrites, whereas with CNMF detecting dendrites can be challenging.
Several prior methods for cell sorting have sought to separate cellular Ca 2+ activity from strong background contaminants. For instance, CNMF-e seeks to infer neural Ca 2+ activity while modeling background activity as a linear combination of the residual activity within nearby pixels 15 .
MIN1PIPE is also based on the CNMF method and, like CNMF-e, is mainly intended for analyses of one-photon Ca 2+ imaging datasets 14 . It applies several image processing steps to the movie data, carefully initializes the cell locations, and then applies the CNMF method. Other authors have applied post hoc de-noising of Ca 2+ activity traces, by taking a set of previously identified neurons and re-estimating the Ca 2+ activity traces in a way that seeks to minimize crosstalk and contamination 27 .
Common to all these prior approaches are efforts to either model the noise sources or to remove them, based on certain assumptions about the data. This general approach can lead to accurate results when the assumptions hold. However, due to the biases the assumptions introduce into the estimation process, this approach can also lead to unexpected, poor performance when the data diverges from the assumptions.
Based on this logic, EXTRACT makes few assumptions about the data and little use of image processing. Thus, while our robust estimation framework has not been fine-tuned to work optimally under specific statistical conditions, it is designed to yield high-fidelity results across a wide spectrum of data statistics. This allows EXTRACT to achieve excellent analytic performance on datasets from a variety of brain areas and Ca 2+ imaging modalities.
Nevertheless, our framework does have certain limitations. Under conditions with very low optical SNR, the estimator trades off robustness for fidelity, causing it to behave more like an L2 estimator (Methods). Although EXTRACT applies spatial filtering during the pre-processing and cell finding steps to enhance the input SNR, movies with extremely low SNR lead to sub-optimal results. Nonetheless, the outputs from EXTRACT should still be sensible due to its model-agnostic nature.

An efficient implementation for fast cell extraction that scales well to large datasets
Owing to recent advances in optical technologies, such as fluorescence mesoscopes and multi-arm microscopes that can monitor multiple brain areas concurrently, Ca 2+ imaging data is now routinely collected at a scale of several terabytes per publication 1,2,19,22,28 . Notably, time-lapse studies with multiple imaging sessions for each animal can readily produce datasets of this magnitude 22,28,29 .
Such datasets are so large that the raw data from a single original research study typically cannot even be shared on the most commonly used public data repositories. Aside from issues of data sharing, the sheer volume of leading-edge datasets necessitates faster processing algorithms to avoid a major bottleneck in the pace of systems neuroscience research.
To handle the most massive datasets, we developed EXTRACT and showed that it can process Ca 2+ movies in times that are up to ~10-fold briefer than the movie durations. EXTRACT's built-in GPU support substantially accelerates processing, allowing cell extraction from several gigabytes of data in a few minutes. On simulated datasets, EXTRACT performed quickly in all regimes, and the runtimes scaled gracefully as dataset sizes grew (Fig. 4B,C). On the Allen Institute Brain Observatory data, EXTRACT ran in only 20% of the time of a typical recording session; this enabled batch processing of ~9 terabytes of data (~100 h of recordings) in 18 h of processing (Fig. 5G). On two-photon mesoscope recordings with a 4 mm 2 FOV, EXTRACT ran much faster than the popular CNMF method while providing similar output (Fig. 4E). With these recordings, EXTRACT runtimes were comparable to the movie durations, showing that EXTRACT can readily handle neuroscientists' most ambitious ongoing experiments.
The accelerated computation from EXTRACT's use of GPUs does not require any special handling, such as explicit parallelization or algorithmic variations. EXTRACT runs the same code on CPUs and GPUs, if the latter are available to the user. With any suitable NVIDIA GPU installed on the analysis computer, one can readily use EXTRACT with GPU processing to achieve major speedups over the CPU runtime. GPUs typically cost a fraction of the price of the analysis computer, and nowadays most pre-configured computers include GPUs with suitable computing capability. In addition to faster runtimes, EXTRACT's built-in GPU support means that, because its computationally intensive tasks run on the GPU, the user can run other CPU-demanding software at the same time.

EXTRACT enables improved scientific results
The identification of neurons from movie data is a crucial step in neuroscience experiments that rely on Ca 2+ imaging techniques for large-scale recording of neural dynamics. The extraction of individual cells and their activity traces reduces the raw data to a set of time series, the accuracy of which is crucial for the success of all subsequent analyses. Thus, EXTRACT aims to achieve high-fidelity results by avoiding extraneous image processing as much as possible while also de-noising the inference of cellular activity through robust statistical estimation. Unlike some past approaches to cell detection, we found that EXTRACT works well with Ca 2+ videos of dendritic activity, which often do not provide as many fluorescence photons as videos of somatic activity. Further, our results from two separate biological experiments, in striatum and hippocampus, demonstrate that the use of EXTRACT can lead to improved scientific results.
First, we evaluated EXTRACT, CNMF-e, and PCA/ICA using Ca 2+ imaging data taken from striatal spiny projection neurons (SPNs) (Fig. 6), which exhibit spatially clustered activity patterns during animal locomotion 22 . When we analyzed these activity patterns, the Ca 2+ traces from EXTRACT revealed a greater contrast in the spatial clustering metric (SCM) between periods of locomotion and those of rest, as well as higher correlation coefficients between SCM values and locomotor speeds, as compared to the results obtained using traces from PCA/ICA or CNMF-e (Fig. 6D-F). This fits with our observations that EXTRACT made the fewest mistakes during the cell extraction process, as seen by comparing the traces from all 3 algorithms to the raw data (Fig. 6C).
Second, we characterized place- and anxiety-related representations in the ventral hippocampus of mice behaving within an elevated plus-maze (Fig. 7). Using the neuronal Ca 2+ traces from EXTRACT, we identified significantly more cells with anxiety-related coding than when we used the outputs of CNMF-e (Fig. 7E,F). Moreover, the use of EXTRACT also led to superior decoding analyses (Fig. 7I-K), in that the traces from EXTRACT enabled better estimates than CNMF-e of the animals' locomotor trajectories (Fig. 7K). These results confirm that accurate biological findings require accurate reconstructions of neuronal activity and show that EXTRACT improves the results from downstream computational analyses, especially when the raw data may have substantial noise or fluorescence contaminants.

Outlook
Ca 2+ imaging technology continues to progress rapidly, with new tools arising for multi-color Ca 2+ imaging of multiple cell types and three-dimensional Ca 2+ imaging. Techniques for high-speed optical voltage imaging are also making rapid strides and provide direct access to neural membrane voltage dynamics. Because EXTRACT makes so few assumptions about the data statistics, future versions of the algorithm should be applicable to the data from these emerging imaging modalities with only straightforward modifications.
Moreover, to increase the numbers of neurons that can be tracked simultaneously, new imaging approaches are arising in which cells from multiple planes in tissue are deliberately superposed in the raw video data [30][31][32][33] ; the cells and their activity traces must then be disentangled through offline data analysis. EXTRACT's capability for high-fidelity isolation of individual cells, even when cells substantially overlap one another in the raw data, should facilitate multi-plane imaging by enabling a greater number of planes to be sampled concurrently while still being able to computationally extract the individual neurons from dense sets of overlapping cells. More broadly, we expect that the general framework of robust statistics will have broad applications throughout systems neuroscience for analyses of many types of recording data, both optical and electrophysiological.

Acknowledgments
We gratefully acknowledge research support to M.J.S. from HHMI, the Stanford CNC Program, DARPA I2O, the NIH BRAIN Initiative, an NINDS R24 grant and the NSF NeuroNex program, an HHMI Gilliam Fellowship […]

[…] image frame for that time bin is the sum of the contributions from all the cells, plus noise. Bottom, A simple application of the above model to a movie frame that contains one cell. Unlike conventional statistical approaches in which the noise is assumed to be normally distributed, we allow the noise to have an unknown, non-negative contamination component that is subject to no other assumptions.
(F) Using a framework for robust statistical estimation, we identify a loss function, ρ, that achieves optimal estimates with the least possible mean squared error (MSE) given the worst possible form for the unknown noise distribution with support on [Δ, ∞). ρ has a quadratic dependence for negative arguments and for positive arguments below a threshold value, κ. For arguments greater than κ, ρ rises linearly; this is what renders robust estimation relatively impervious to occasional but large non-negative noise contaminants, which typically skew conventional estimation procedures. Given the experimental data, M, and an estimate of either the spatial or temporal components, S and T, estimating the other component simply involves minimizing ρ as a function of the residuals between the estimated and actual movie data (see Methods for derivations).
The EXTRACT algorithm comprises an optional stage for preprocessing of the raw Ca 2+ videos, followed by two primary stages of data analysis. The preprocessing stage filters fluorescence fluctuations at coarse spatial scales that arise from neuropil Ca 2+ activity. The robust cell finding stage identifies and extracts individual neurons in a successive manner, using robust estimation to infer each cell's spatial and temporal weights. In the cell refinement stage, these weights are updated through iterative, alternating refinement of first the spatial and then the temporal weights, again using robust estimation. (E) Robust cell finding is an iterative procedure that identifies individual cells in a successive manner.
At each iteration, a seed pixel is chosen that attains the maximum fluorescence values among all the pixels, and a Gaussian-shaped cell image is initialized around it. Next, this image and the cell's activity trace are iteratively updated via alternating applications of robust estimation. When this process converges, the cell's estimated fluorescence contributions to the Ca 2+ video are subtracted from the movie, and then the entire process repeats for another cell. This procedure continues until the identified seed pixel has a maximum instantaneous signal-to-noise ratio that falls below a minimum threshold, as determined by examining the s.d. of the pixel's intensity fluctuations across the entire movie.
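The seed-and-subtract loop described here can be sketched in simplified form. The sketch below is a 1-D toy version: plain non-negative least squares stands in for robust estimation, and the SNR threshold, Gaussian width, and iteration counts are illustrative choices, not EXTRACT's actual parameters:

```python
import numpy as np

def find_cells_1d(movie, snr_threshold=5.0, max_cells=20, n_alt=8):
    """Simplified sketch of successive cell finding. movie: (T, n_pixels).
    Pick the highest-SNR seed pixel, initialize a Gaussian image around it,
    alternate regression updates for the image s and trace t, subtract the
    cell, and repeat until no seed exceeds the SNR threshold."""
    M = movie.astype(float).copy()
    n_pix = M.shape[1]
    # Per-pixel noise s.d. from the median absolute deviation over time
    noise_sd = np.median(np.abs(M - np.median(M, axis=0)), axis=0) / 0.6745 + 1e-12
    images, traces = [], []
    for _ in range(max_cells):
        snr = M.max(axis=0) / noise_sd          # peak instantaneous SNR per pixel
        seed = int(np.argmax(snr))
        if snr[seed] < snr_threshold:
            break                               # stopping criterion
        s = np.exp(-0.5 * ((np.arange(n_pix) - seed) / 2.0) ** 2)  # Gaussian init
        t = np.zeros(M.shape[0])
        for _ in range(n_alt):                  # alternating estimation updates
            t = np.clip(M @ s / (s @ s), 0.0, None)
            s = np.clip(M.T @ t / (t @ t + 1e-12), 0.0, None)
        images.append(s)
        traces.append(t)
        M -= np.outer(t, s)                     # subtract this cell's contribution
    return images, traces
```

In EXTRACT itself each alternating update solves the robust (one-sided Huber) regression rather than the clipped least squares used here.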
(F) Cell refinement is also an iterative procedure. At each iteration, the set of estimated Ca 2+ traces is updated using robust estimation while holding the cell images fixed; the cell images are then updated using robust estimation while holding the activity traces fixed; a set of quality metrics is computed for each cell, and cells for which one or more of the metric values falls below a minimum threshold are eliminated from EXTRACT's final output.
Figure 3

[Figure panels: Steps 1-3 of cell extraction compared for robust versus non-robust (L2) estimation, on movies with correlated and with independent spikes; scale bar, 50% ΔF/F; histogram axes, 'Number of cells'.] With L2-estimation, the cross-talk between cells in the first step led to subsequent inaccuracies, including activity estimates that did not match well to the ground truth dynamics.
(C, D) Using the algorithm outputs on data with correlated spikes, we used a simple thresholding method to detect discrete Ca 2+ events within the inferred Ca 2+ traces. We then compared these Ca 2+ events to those in the ground truth data and thereby computed the precision-recall curves for spike detection (Methods). The area under the precision-recall curve (AUC) served as a metric of the fidelity of spike detection. We followed this procedure for the output from EXTRACT using both the robust and the L2 estimators, and computed the curves both after cell finding and after cell refinement.
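The thresholding and precision-recall computation can be sketched as follows (a simplified illustration: events are matched within a fixed frame tolerance, which may differ from the exact matching rule in Methods):

```python
import numpy as np

def detect_events(trace, threshold):
    """Detect discrete Ca2+ events as upward crossings of a fixed threshold."""
    above = np.asarray(trace) > threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

def precision_recall(pred, true, tol=2):
    """Precision/recall with predicted and ground-truth events matched
    whenever they fall within +/- tol frames of each other."""
    pred, true = np.asarray(pred), np.asarray(true)
    if pred.size == 0:
        return 1.0, (0.0 if true.size else 1.0)
    precision = float(np.mean([np.any(np.abs(true - p) <= tol) for p in pred])) if true.size else 0.0
    recall = float(np.mean([np.any(np.abs(pred - t) <= tol) for t in true])) if true.size else 1.0
    return precision, recall
```

Sweeping the detection threshold traces out the precision-recall curve, and the AUC can then be computed by numerical integration (e.g., trapezoidal rule over recall).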
Panel C shows precision-recall curves for a representative Ca 2+ movie with 30 neurons, after the cell […]. [Figure axis: 'Thousands of frames'.]

[…] to normalized runtimes (ratios of runtime to movie duration) (right) that were largely independent of movie duration. Across most parameter regimes (A-C), EXTRACT runtimes on both CPUs and GPUs were comparable to or smaller than the durations of the simulated movies. The GPU implementation was generally faster than the CPU implementation by a factor of three or more for all experiments and was up to an order of magnitude faster than the duration of the simulated Ca 2+ movie.

(D, E)
To benchmark runtimes on state-of-the-art experimentally acquired Ca 2+ imaging datasets, we applied EXTRACT to Ca 2+ imaging datasets taken at a 17.5 Hz imaging frame rate in GCaMP6f-tTA-dCre mice that express the Ca 2+ indicator GCaMP6f in layer 2/3 cortical pyramidal neurons, using a recently published two-photon mesoscope 1 with a 4 mm 2 FOV. We also evaluated runtimes for CNMF, which is often applied to two-photon Ca 2+ imaging data. We tuned the parameters of each algorithm so that they returned comparable numbers of cells when applied to the same Ca 2+ movie (Methods). Panel D shows an example cell map, displaying 1371 cells, obtained by applying EXTRACT to Ca 2+ imaging data from neocortical layer 2/3 pyramidal neurons, as taken with the 16-[…].

(G) Box-and-whisker plots providing a statistical characterization of cell extraction across the 94 individual imaging sessions. In general, EXTRACT detected nearly all the cells found by the Allen SDK. Typically, EXTRACT identified more than twice as many cells as the Allen SDK, and sometimes even 4-5 times as many. We counted only those cells output by EXTRACT that surpassed threshold values for the cell size and trace quality metrics (Methods).

(J) Statistical distributions of the Ca 2+ activity trace SNR values for the cells found by EXTRACT that either were or were not also found by the Allen SDK.
The two distributions substantially overlap, indicating that SNR alone cannot explain why the Allen SDK missed many of the cells identified by EXTRACT.
(K) Box-and-whisker plots characterizing the runtime statistics for the GPU version of EXTRACT across the 94 imaging sessions. Typically, runtimes were 5 times shorter than the duration of each imaging session.
In the box-and-whisker plots of G and K, horizontal lines within the box plots indicate median values across the 94 sessions, boxes enclose the second and third quartiles, whiskers indicate 1.5 times the interquartile distance, and individual data points mark outliers.

(I-K)
We divided the EPM into five bins (I) and used support vector machine classifiers to predict the mouse's location using the detected Ca 2+ events in the traces from either EXTRACT or CNMF-e.
Panel I shows the locomotor trajectory of an example mouse, color-coded such that each of the 5 spatial bins is shown in a different color. Panel J shows the accuracy of classifying the mouse's location on a test dataset, as determined using each of the two cell extraction methods and across a range of Ca 2+ event detection thresholds (normalized relative to each cell's peak Ca 2+ signal).
Panel K shows the mean prediction accuracies, averaged over 6 sessions involving 3 different mice.
EXTRACT led to superior classification of the mouse's location (Wilcoxon signed-rank test, *P < 0.05; N = 6 imaging sessions). Individual data points and lines denote results from individual imaging sessions.
bioRxiv preprint doi: https://doi.org/10.1101/2021.03.24.436279; this version posted March 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure S1 | Comparison of robust estimation versus L2-estimation with L1-regularization.
(A) We simulated the same scenario as in Figure 1A, with one cell of interest and another overlapping 'distractor' cell. We simulated ground truth Ca 2+ activity traces for both cells under an assumption that their dynamics were independent and using an SNR value of 5 (Methods).

(B)
We inferred the Ca 2+ activity trace for the cell of interest using robust estimation (left) and L2-estimation with L1-regularization (right). For robust estimation, we varied κ, the parameter that relates to the level of non-Gaussian contamination, between 0.2-100 (top 5 traces); we also performed robust estimation with adaptive variation of κ (bottom trace). Lower values of κ led to reduced amplitudes in the inferred activity trace; this effect was substantially more pronounced at times when the distractor cell was active. Consequently, reduced values of κ suppressed contamination from the distractor more than they suppressed the activity of the cell of interest. Varying κ in an adaptive manner […]

Mathematical variables
We denote the size of the imaging field-of-view as h × w, in units of pixels. We refer to the scalar product hw as n. We use boldface characters for arrays and non-boldface characters for scalars.
As in the main text, we denote the movie matrix as M (flattened in space, so that M is a two-dimensional matrix), the matrix of spatial weights (cell images) as S, and the matrix of temporal weights (Ca 2+ traces) as T.

Definition of Signal-to-noise Ratio (SNR)
We define the signal-to-noise ratio (SNR) for a given signal as […]

Theory of Robust Estimation in the Presence of Large Non-negative Contaminants
Here, we introduce our signal estimation approach, based on the theory of robust M-estimation. This theory is well-developed for symmetric and certain asymmetric contamination regimes 16,36-38 . However, prior theoretical work does not readily suggest an optimal estimator that is suitable for use with the types of signals that arise in neural Ca 2+ imaging studies. Thus, we first motivate and introduce a simple mathematical abstraction for treating such studies. We then derive a minimax optimal M-estimator. For simplicity, we present our treatment in the setting of univariate estimation, which generalizes in a straightforward way to multivariate regression. Contamination in Ca 2+ movies includes neuropil Ca 2+ activity, out-of-focus cells, and residual activity of overlapping cells that are not detected and well accounted for by the cell extraction method. This latter category of contamination is very distinct from normally distributed noise; namely, it is non-negative (or above the signal baseline), its characteristics can be highly irregular, and it may take on large values. Therefore, we model the data generation process as having an additive noise source that is normally distributed a fraction 1 − ε of the time, but which is free to be any positive value greater than a threshold otherwise:

y_i = μ* + n_i,   n_i ~ (1 − ε)·N(0, σ²) + ε·Q.

Here, y_i denotes an experimental observation, which deviates from μ*, the true value of the measured quantity, due to corruption with an additive noise term, n_i. This noise term, n_i, is normally distributed with probability 1 − ε and distributed according to an unknown distribution, Q, with probability ε. For the sake of generality, we allow Q to be any probability distribution with support over the range [Δ, ∞), for a fixed value Δ ≥ 0. In particular, Q can be nonzero over an arbitrarily large range of possible noise values.
Therefore, ε can be interpreted as setting the extent or severity of 'gross contamination'. If ε is small, the noise will be close to Gaussian-distributed. On the other hand, as ε nears one, the noise distribution deviates from a normal distribution to an arbitrary extent. […] This is simply because the members of 𝒞 correspond to convex loss functions. Our focus is on such functions, because they are typically easier to optimize and offer global optimality guarantees. We seek an M-estimator for our noise model that is robust to variations in the noise distribution (Q in particular), in the sense of minimizing the worst-case deviation from the true parameter, as measured by the mean squared error. We first introduce our proposed estimator and then show that it is exactly optimal in the aforementioned minimax sense.
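The data-generation model above can be simulated to make the effect of gross contamination concrete. In the sketch below, the exponential distribution is an arbitrary illustrative choice for the unknown non-negative distribution Q, and all parameter values are assumptions:

```python
import numpy as np

def simulate_observations(mu_star, n, eps, sigma=1.0, contam_scale=20.0, seed=0):
    """Draw y_i = mu* + n_i, where n_i is N(0, sigma^2) with probability
    1 - eps and drawn from a non-negative contamination distribution
    (here Exponential(contam_scale), standing in for Q) with probability eps."""
    rng = np.random.default_rng(seed)
    gauss = rng.normal(0.0, sigma, n)
    contam = rng.exponential(contam_scale, n)
    pick = rng.random(n) < eps
    return mu_star + np.where(pick, contam, gauss)

y = simulate_observations(mu_star=2.0, n=100_000, eps=0.1, seed=1)
# The sample mean is pulled upward by roughly eps * E[Q] (about 2 here),
# whereas the sample median stays close to mu*.
```

This is precisely the failure mode of symmetric (e.g., quadratic-loss) estimators that the one-sided Huber estimator is designed to avoid.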
We define an estimator function, ψ, as follows: ψ(r) = r for r ≤ κ, and ψ(r) = κ for r > κ, where κ is defined in terms of the contamination level, ε, according to […], in which Φ(⋅) and φ(⋅) denote the distribution and the density functions for a standard normal variable. We refer to ψ as the one-sided Huber function and denote its corresponding loss function as ρ(⋅, κ). Clearly, ψ ∈ 𝒞, and therefore the loss function, ρ, is convex. Under our proposed data generation model, we can now state an asymptotic minimax result for ψ: […] Similarly, for the denominator, we write […] Therefore, the asymptotic variance is given as V(ψ, Q) = [(1 − ε)·E[ψ′]]^{-1}, which is constant over the contamination class ℋ. Now, define a distribution by its density p_ψ satisfying the condition − d log(p_ψ)/dr = ψ: […] Then, ∀ F ∈ ℱ, we have […] Moreover, a straightforward application of the Cauchy-Schwartz inequality yields […] Finally, note that the left equality is weaker than the statement in (1). This proof of Proposition 1 establishes that the one-sided Huber estimator has zero bias as long as the non-zero contamination is sufficiently larger than zero, and it also achieves the best worst-case asymptotic variance.
We now compare the one-sided Huber and some other popular M-estimators, such as the sample mean (ℓ2 loss), the sample median (ℓ1 loss), the Huber estimator, and the sample quantile.
First of all, given our model of the noise, the sample mean, the sample median, and Huber estimators all have symmetric loss functions and therefore suffer from bias. This effect is particularly severe for the sample mean estimator and leads to an unbounded MSE when gross contamination takes on very large values. The bias problem may be eliminated using a quantile estimator whose quantile level is set according to ε. However, this estimator has a higher asymptotic variance than the one-sided Huber estimator. Although we have not encountered a prior study of a one-sided Huber […]

We can now introduce the regression setting that we use for solving for the temporal and spatial weight matrices. We illustrate this for the simple case of solving one row in the spatial weights matrix, or one column in the temporal weights matrix. We observe {x_i, y_i}, i = 1, …, n, where x_i ∈ ℝ^d could be either fixed or random, and the y_i's are generated according to y_i = ⟨x_i, β*⟩ + n_i, where β* ∈ ℝ^d is the true value of the parameter to be estimated, and n_i is as previously defined. We estimate β* with […] (2). Classical M-estimation theory establishes, under certain regularity conditions, that the minimax optimality in the univariate case carries over to multivariate regression; we refer the reader to Ref. 20 for details.

Solving the Robust Regression Problem with a Fast, Custom Method
We seek to solve the robust regression problem of equation (2) in a large-scale setting, given the large field-of-view and duration of most neural Ca 2+ videos. Hence, the solver for our problem should, ideally, be tractable for large n and provide as accurate an output as possible. To this end, we propose a fast optimization method that has a step cost equal to that of gradient descent while making use of second-order information and exhibiting similar behavior to Newton's method, iterating While ∥β_k − β_{k−1}∥₂ ≥ δ: […]. Below we present the convergence result for our solver described in Algorithm 1.
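As a simplified stand-in for Algorithm 1 (whose exact update step is partly lost in this text), the sketch below solves the robust regression with plain gradient descent on the one-sided Huber loss, using the fixed step 1/λ_max suggested by the curvature of XᵀX; step size and iteration count are illustrative choices:

```python
import numpy as np

def psi(r, kappa):
    """Score of the one-sided Huber loss: identity below kappa,
    clipped at kappa above it."""
    return np.where(r <= kappa, r, kappa)

def robust_regression(X, y, kappa=1.0, n_steps=300):
    """Minimize sum_i rho(y_i - <x_i, beta>, kappa) by gradient descent
    with step 1/lambda_max, the largest eigenvalue of X^T X."""
    H = X.T @ X
    step = 1.0 / np.linalg.eigvalsh(H).max()
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = -X.T @ psi(y - X @ beta, kappa)   # gradient of the robust loss
        beta -= step * grad
    return beta
```

Because the loss is convex and its curvature is bounded by XᵀX, this descent converges; large positive residuals contribute only the clipped score κ, which is what limits the influence of gross contaminants on the fit.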
(Including more indices in I results in a smaller […].)

Assumptions:
• ∑_{i∈I} x_i x_iᵀ ≽ λ̄·I. This assumption is reasonable when n is large and consequently there are many samples in the quadratic regime.
• λ_max and λ_min are the largest and smallest eigenvalues of ∑_i x_i x_iᵀ, respectively.

For β in the ball centered around β* with radius κ/max_i ∥x_i∥, we have, ∀ i ∈ I, […]. Therefore, when the iterates get close to the true minimizer, the residual corresponding to each sample i ∈ I falls into the quadratic region. This implies that the Hessian satisfies ∇²f(β) ≽ ∑_{i∈I} x_i x_iᵀ ≽ λ̄·I, which says that in the ball B = {β : ∥β − β*∥₂ ≤ κ/max_i ∥x_i∥}, the objective function is λ̄-strongly convex.

The objective is also smooth, i.e., ∇²f(β) ≼ λ_max·I for ∀ β ∈ B. In this regime, the following calculation is standard.
Assuming that the current iterate is β, our approach takes a step of the following form: […]. By λ_max-smoothness, we can write: […]. By λ̄-strong convexity: […]. The second inequality follows from setting β′ = β − (1/λ̄)∇f(β), which is the minimizer of the right-hand side of the first line. Choosing β′ = β* above yields (1/(2λ̄))∥∇f(β)∥₂² ≥ f(β) − f(β*).
Using this and the smoothness inequality, we write […]. This is linear convergence with coefficient 1 − λ̄/λ_max.
In our algorithm, […], and replacing the Hessian with H̄ = ∑_{i∈I} x_i x_iᵀ yields […], which reduces to the update step of our solver.
As shown above, our solver is second-order in nature; hence its convergence behavior should be close to that of Newton's method. However, there is one caveat: the second derivative of the one-sided Huber loss is not continuous. Therefore, one cannot expect to achieve a quadratic rate of convergence; this issue is commonly encountered in M-estimation. Nevertheless, Algorithm 1 converges very quickly in practice.

Setting κ Adaptively in Robust Estimation
We write κ and σ for the true values of these parameters; recall that the two are related through the noise scale as κ = κ′σ. Our routine for estimating σ is as follows. We assume that we start with a fixed κ′ (usually set to 1), for which we find an estimate of the regression coefficient, and then compute the residuals to estimate the true σ. We denote any estimate of σ by σ̂. We use an iterative scheme in which we set κ′ at each iteration using the previous estimate σ̂, and perform a few iterations to obtain increasingly finer estimates. When estimating σ, for simplicity we only deal with univariate regression (scalar β). We use r_i ≜ y_i − x_i β to denote the residual for any given β.
Ideally, we would use an estimator with the lowest possible variance. On the other hand, it is important in practice to restrict ourselves to estimators that are computationally efficient to use.
Therefore, we use an estimator for which we assume that the residual r_i has a density (distributed according to our noise model with true parameter σ), and we denote the density of the residual by h. Let Δ ≜ β̂ − β*. We can obtain a straightforward relationship between κ′ and the residuals, valid only in the asymptotic regime. Once we establish a mapping between the asymptotic bias, Δ, and κ′, we can estimate σ from it.
However, in practice we will not be able to get a good estimate of an individual residual, and we need to estimate an aggregate quantity by averaging over multiple measurements. We can find a relationship between the residuals and κ′ using the asymptotic optimality condition, which we will simply refer to as the bias condition. Note that if Δ = 0, the bias condition relates κ′ and κ directly. In general, we can use the bias condition to eliminate Δ. In order to isolate the residual term, we can approximate it using its first-order Taylor expansion around κ′, given in (5). We then plug (5) into (4) to eliminate the residual term and attain a relationship directly between σ and data-related quantities; from there, we can estimate the true σ with an estimate σ̂. For this, we need to isolate the summation term in (4); we simply expand the normal CDF around r_i = 0 using its first-order approximation. We summarize the procedure to estimate σ in Algorithm 2 below.
In practice, we use the adaptive κ only for cell finding, via the univariate estimation scheme above.

EXTRACT Preprocessing Module
The spatial high-pass filter is a second-order high-pass Butterworth filter designed in the frequency domain with a cutoff determined by the user-provided average cell radius. First, a corner frequency is computed as 1/(π · radius), and then the cutoff for the Butterworth filter is determined (separately in the x and y directions) by dividing the corner frequency by a dimensionless factor set by the user (the default value is 5). The resulting high-pass filter is multiplied with each frame of the Ca 2+ movie in the spatial frequency domain, and the result is then transformed back to real space. EXTRACT preprocessing also supports spatial low-pass filtering of the movie for smoothing. The spatial low-pass filter is also a second-order Butterworth filter, and its cutoff frequency is obtained by multiplying the corner frequency by another user-set, dimensionless constant (the default value is 2).
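As an illustration, this frequency-domain filtering can be sketched in Python as follows (a simplification using a single radial cutoff rather than separate x and y cutoffs; not the EXTRACT code):

```python
import numpy as np

def butterworth_highpass(frame, radius, ratio=5.0, order=2):
    """Second-order Butterworth high-pass applied in the spatial frequency
    domain. `radius` is the average cell radius in pixels; the corner
    frequency 1/(pi*radius) is divided by `ratio` (default 5, as in the text)
    to give the cutoff. A single radial cutoff is used here for simplicity."""
    h, w = frame.shape
    f_cut = (1.0 / (np.pi * radius)) / ratio     # cutoff frequency (cycles/pixel)
    fy = np.fft.fftfreq(h)[:, None]              # spatial frequencies along y
    fx = np.fft.fftfreq(w)[None, :]              # spatial frequencies along x
    f = np.sqrt(fx**2 + fy**2)
    hp = 1.0 / (1.0 + (f_cut / np.maximum(f, 1e-12)) ** (2 * order))
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * hp))

# A constant (pure low-frequency) frame is suppressed; a fine checkerboard
# (pure high-frequency) frame passes through almost unchanged.
flat = butterworth_highpass(np.ones((64, 64)), radius=5.0)
cb = (np.indices((64, 64)).sum(axis=0) % 2) * 2.0 - 1.0
cb_filtered = butterworth_highpass(cb, radius=5.0)
```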
The baseline removal method is applied separately to the time trace of each movie pixel.

EXTRACT Cell Finding Module
In the cell finding module, we first compute a 'smoothed' maximum projection image of the whole movie, which we obtain as follows. For each movie pixel, we first identify the time point at which the Ca 2+ activity of the pixel reaches its maximum. We record this information in an array ∈ ℝ ³ . We then compute the "smoothed" maximum projection image, ∈ ℝ ³ as In other words, for each pixel i, we average the values of the Ca 2+ activity of the pixel over the time points at which neighboring pixels had their activity maximums. The function, neighbors(⋅), selects the neighboring pixels of a given pixel; this is done in practice by creating a binary circular mask around the query pixel with a radius of 2 pixels and returning the indices that are nonzero. This procedure has the advantage that it reports values close to the maximum values of pixels within cells, due to the co-activation of a neighborhood of pixels within, whereas the activity of the noise pixels is substantially mitigated due to averaging over uncorrelated activity.
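The smoothed maximum projection can be sketched in Python as follows (a direct, unoptimized translation of the description above; illustrative data):

```python
import numpy as np

def smoothed_max_projection(movie, radius=2):
    """movie: (T, H, W) array. For each pixel, average the pixel's activity
    over the time points at which its neighbors (circular mask, radius 2
    pixels by default) reached their own maxima."""
    T, H, W = movie.shape
    tmax = movie.argmax(axis=0)     # per-pixel time of maximum activity
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if dy * dy + dx * dx <= radius * radius]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            vals = [movie[tmax[i + dy, j + dx], i, j]
                    for dy, dx in offsets
                    if 0 <= i + dy < H and 0 <= j + dx < W]
            out[i, j] = np.mean(vals)
    return out

# A 3x3 'cell' that co-activates at frame 5 stands out in the projection;
# background pixels, whose neighbors peak at uninformative times, stay near zero.
movie = np.zeros((10, 8, 8))
movie[5, 2:5, 2:5] = 1.0
proj = smoothed_max_projection(movie)
```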
At every iteration of the cell finding module, a seed pixel is chosen as the brightest pixel in the smoothed maximum projection array, and then a cell image centered at the seed pixel is initialized. This initialization is done either by generating a Gaussian shape with a radius equal to a user-given radius estimate, or by computing the temporal Pearson correlation of the Ca 2+ activity of the seed pixel with the movie and truncating the correlation image at 0.5 of its maximum. With the resulting estimate of the cell image, the temporal Ca 2+ trace of the cell is obtained using a one-component robust regression. The cell image is then re-estimated using the same regression routine, this time with the trace estimate as the input. This alternating estimation scheme is repeated either 10 times, or until the relative changes in the cell image and the trace estimates between iterations are <1%, as measured by the L2 norm.
For the one-component robust regression, we optimize the one-sided Huber loss with a non-negativity constraint on the cell image and the trace using Newton's method. The non-negativity constraint follows from our fundamental assumption that the neural activity always rises above the baseline noise, and this constraint leads to sparser solutions. The non-negativity constraint is enforced by first solving the problem using Newton's method, and then truncating the result at zero. This returns the same result as if non-negativity were enforced during optimization, because it is a scalar estimation problem. After obtaining the cell image and the trace for the identified cell, we subtract the contribution of this cell (the outer product of its image and trace) from the movie. We then re-compute the smoothed maximum projection for only the movie pixels that were affected by the activity subtraction.
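A minimal Python sketch of this alternating scheme, using a scalar one-sided Huber solver truncated at zero (illustrative constants and data; not the EXTRACT code):

```python
import numpy as np

def robust_scalar_fit(y, x, kappa=0.5, iters=50):
    """Non-negative scalar fit y ~ a*x under the one-sided Huber loss, via a
    fixed-preconditioner Newton-style iteration, truncated at zero afterwards."""
    xx = float(x @ x)
    if xx == 0.0:
        return 0.0
    a = float(x @ y) / xx                       # least-squares initialization
    for _ in range(iters):
        a += float(x @ np.minimum(y - a * x, kappa)) / xx
    return max(a, 0.0)                          # enforce non-negativity at the end

def alternate_estimate(M, s_init, n_alt=10):
    """Alternating one-component robust regressions for the trace and the image
    of a single candidate cell. M is (pixels, frames)."""
    s = s_init.astype(float)
    for _ in range(n_alt):
        t = np.array([robust_scalar_fit(M[:, k], s) for k in range(M.shape[1])])
        s_new = np.array([robust_scalar_fit(M[i, :], t) for i in range(M.shape[0])])
        done = np.linalg.norm(s_new - s) < 1e-2 * np.linalg.norm(s)  # <1% change
        s = s_new
        if done:
            break
    return s, t

# Noise-free rank-1 demo: a blob-like image and a sparse, non-negative trace.
s_true = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
t_true = np.array([0.0, 0.0, 3.0, 0.0, 1.0, 0.0])
M = np.outer(s_true, t_true)
s_est, t_est = alternate_estimate(M, np.ones(5))
```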
At the end of each iteration, we apply a quality check to the cell image and the trace of the identified cell to decide whether to include it in the set of identified cells. We discard cells that occupy an abnormal number of pixels given the expected area of a typical cell (as computed from the userprovided estimate of a cell's radius). We also compute the trace SNR for each cell, and discard it if the trace SNR is lower than the user-provided threshold.
We terminate cell finding if any of the following conditions are met: 1) The maximum allowed number of iterations set by the user has been exceeded 2) The pixel-wise SNR in the current seed pixel is lower than the user-provided SNR threshold 3) The running yield, defined as the fraction of good cells over the last 10 iterations, is lower than 1 in 10. The cell finding module outputs the spatial and temporal weights of the identified components in two matrices: the spatial weights matrix , whose columns contain the (flattened) cell images, and , the temporal weights matrix, whose rows contain the corresponding Ca 2+ traces.

EXTRACT Refinement Module
In the refinement module, we update the entire spatial weights matrix or the entire temporal weights matrix at once by multivariate regression using the above-introduced fast solver. For estimating both the spatial and the temporal weights, we impose the constraint that they are non-negative, as in the cell finding step. When solving for the spatial weights only, we compute a binary mask ℳ obtained by convolving each cell image with a disk filter of a radius equal to the average cell radius, followed by binary thresholding. We then add the constraint that each cell's spatial weights vanish outside its mask. This constraint ensures that estimation of each component is restricted to a local neighborhood, preventing artifacts due to strong spatiotemporal co-activity between spatially distinct regions of the movie. This local restriction constraint defines a convex set; hence, it can be added to the estimation problem without violating convexity.
Overall, given the movie and the temporal weights, the S-estimation step solves a non-negative robust regression for the spatial weights under the mask constraint; given the movie and the spatial weights, the T-estimation step solves the corresponding non-negative robust regression for the temporal weights. We solve both of these problems with a consensus optimization method that is based on dual ascent, termed the 'alternating direction method of multipliers' (ADMM 40 ). Adding constraints to our original problem through ADMM is straightforward, and it allows us to use our fast solver as a subroutine.
After each alternating estimation step, which involves first solving for one weights matrix given the other and then vice versa, we compute several quality metrics and discard the subset of cells for which any of the computed metrics is worse than certain user-set thresholds. In particular, we compute the following quality metrics:

Trace SNR:
We compute the trace SNR for each component given its Ca 2+ trace. We eliminate cells whose trace SNR is below the trace SNR threshold.

Area of the cell image:
We compute the area of each cell image by counting the pixels with spatial weight >0.1 times the maximum weight. If the calculated area is smaller than a lower threshold or higher than an upper threshold, then the cell is discarded. … the graph adjacency matrix, and the above steps are repeated to identify duplicates.

Spatial corruption metric:
We compute a spatial corruption metric that measures the lack of local smoothness in a cell's spatial weight values. We do this based on a heuristic that compares the variance of spatial weights for each cell to a 'local variance' for the same cell. We first compute the empirical variance of the spatial weights that are larger than 10^-3. We then compute the local variance as the sum of squared distances between the spatial weight for a pixel and that after applying 2D low-pass filtering based on a square kernel with uniform weights over a 4 × 4 pixel neighborhood.
The spatial corruption metric is the ratio of the local variance to the spatial weight variance. Intuitively, better-looking cells have negligible local variance when compared to the spatial weight variance, so the spatial corruption metric will be small for these cells. In the algorithm, the threshold for spatial corruption is set at 0.7, based on our experience of spatial corruption metric values across datasets.
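The metric can be sketched in Python as follows (zero-padded uniform smoothing for simplicity; the 4 × 4 kernel and 10^-3 floor are as in the text, the test images are illustrative):

```python
import numpy as np

def spatial_corruption(image, k=4, floor=1e-3):
    """Ratio of the 'local variance' (sum of squared differences between the
    image and a k-by-k uniform-kernel smoothed copy) to the variance of the
    spatial weights above `floor`. Smooth, cell-like images score low."""
    mask = image > floor
    weight_var = image[mask].var()
    padded = np.pad(image, k // 2)              # zero padding (a simplification)
    H, W = image.shape
    smooth = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            smooth[i, j] = padded[i:i + k, j:j + k].mean()
    local_var = ((image - smooth) ** 2)[mask].sum()
    return local_var / weight_var

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:16, 0:16]
blob = np.exp(-((yy - 8.0) ** 2 + (xx - 8.0) ** 2) / (2 * 3.0 ** 2))  # smooth 'cell'
noisy = rng.random((16, 16))                                          # corrupted image
```

A smooth blob changes little under local averaging, so its metric is far smaller than that of a spatially incoherent image.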

Spatiotemporal match metrics:
We use two quality metrics that are intended to assess the relative spatiotemporal contribution of the cell with respect to the power of the cell signal. The first metric looks at the mean gap (averaged over all movie frames) between the cellular activity within the ROI encapsulated by a cell's spatial weights (weighted by the spatial weights), and the same cell's fluorescence trace. This metric accounts for the activity within the ROI that is not explained by a cell's fluorescence trace. The second metric looks at the mean gap (averaged over movie frames) between a cell's fluorescence trace and nearby fluorescence activity in its vicinity. This metric accounts for the spurious activity estimated to belong to a cell that is attributable to its surroundings.
Our implementation of these metrics can be found in our codebase inside the function find_spurious_cells(), which can be referred to for full details on how the various fluorescence activity traces are computed. Both metrics must be <10^-2 for EXTRACT to accept the identified cell in the output.
Set of output activity traces: EXTRACT provides two options regarding the final set of estimated Ca 2+ activity traces, termed 'non-negative' and 'raw' in the software's GitHub repository. With both options, the robust solver operates under the constraint that the Ca 2+ signals must be non-negative until the end of the cell refinement process. The motivation for this constraint is that EXTRACT considers activity below each cell's baseline value to be noise, where the baseline is determined by the cell's time-averaged mean fluorescence. The algorithm thresholds all activity that is below this baseline, which leads to non-negative activity traces. When the 'non-negative' option is selected, EXTRACT provides these non-negative traces to the user; throughout the paper we used this option. However, if the user selects the 'raw' option, EXTRACT performs an additional final round of robust estimation to solve for the activity traces using the final set of cells' spatial profiles, but with the non-negativity constraint removed from the robust solver.

Computer Hardware
For all studies involving the CPU implementation of EXTRACT, we used a computer with an Intel Xeon E5-2637 v4 CPU (3.50 GHz × 16). For all studies involving a GPU implementation, we used a single NVIDIA GTX 1080 GPU.

Simulated Ca 2+ Imaging Datasets
We created synthetic Ca 2+ imaging data that is designed to be representative of the Ca 2+ activity of cortical pyramidal neurons. The generation of synthetic data comprised three independent steps.
In the first step, we simulated the Ca 2+ traces of neurons assuming a 10 Hz imaging frame rate. For this, we first simulated spike trains for each cell by assuming that spike occurrences were governed by a Bernoulli random variable with a probability of 0.01, corresponding to a spike rate of 0.1 Hz. We then convolved the resulting spike trains with an exponentially decaying temporal kernel. At each time point, with synchronization probability (chosen as 0.2), we assigned new spiking probabilities to each neuron such that all cells of the same group shared a common spiking probability. After simulating the synchronized spikes in this way, we adjusted the baseline spiking rate of each cell to keep its overall mean firing rate constant at 0.1 Hz.
In the second step, we simulated spatial profiles of individual neurons. For this, we used a fixed-sized square field-of-view, with each square pixel corresponding to a 1 μm 2 image region. We created the fluorescence image of each cell independently from the others by randomly sampling a two-dimensional Gaussian distribution oriented in a random direction relative to the x-y coordinate axes of the movie. For each cell, we independently and randomly chose s.d. values for this Gaussian distribution between 2.5-5 pixels, in order to have an effective cell radius ranging between 5-10 μm, approximating the radius as twice the s.d. of the Gaussian. We truncated the weights of each cell to 0.01 of its maximum weight, setting weight values beneath this threshold to zero. The cell centroids were randomly distributed within the field of view, and we enforced a minimum distance between the cells' centroids. This minimum distance was 4 μm for the quantitative comparisons between the different cell detection algorithms and 7 μm for the studies of algorithmic runtimes.
In the third and final step, we generated the noise component of the synthetic Ca 2+ movie by sampling random values from a normal distribution for each pixel and each time point, with a s.d. set according to the desired mean pixel-wise SNR for the movie. We generated the final synthetic movie as the product of the matrix of the cells' spatial weights and that of their Ca 2+ traces, with the noise matrix added to this matrix product.
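The three generation steps can be sketched in Python as follows (a small example; the decay constant, movie dimensions, and noise level are illustrative, and the group-synchronization step is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

T, H, W, n_cells = 200, 32, 32, 3
fps, rate = 10.0, 0.1              # frame rate (Hz) and mean spike rate (Hz)
p_spike = rate / fps               # per-frame Bernoulli spike probability (0.01)

# Step 1: spike trains convolved with an exponentially decaying kernel.
tau = 1.0                          # decay time constant in seconds (illustrative)
spikes = (rng.random((n_cells, T)) < p_spike).astype(float)
kernel = np.exp(-np.arange(0.0, 5 * tau, 1 / fps) / tau)
traces = np.array([np.convolve(s, kernel)[:T] for s in spikes])

# Step 2: anisotropic 2D Gaussian images, truncated at 1% of their maxima.
yy, xx = np.mgrid[0:H, 0:W]
images = []
for _ in range(n_cells):
    cy, cx = rng.uniform(8, H - 8, size=2)   # centroid inside the field of view
    sy, sx = rng.uniform(2.5, 5.0, size=2)   # s.d. in pixels (5-10 um radius)
    theta = rng.uniform(0, np.pi)            # random orientation
    u = (yy - cy) * np.cos(theta) + (xx - cx) * np.sin(theta)
    v = -(yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    img = np.exp(-0.5 * ((u / sy) ** 2 + (v / sx) ** 2))
    img[img < 0.01 * img.max()] = 0.0        # truncate weights below 1% of max
    images.append(img.ravel())
S = np.array(images)               # (n_cells, H*W) spatial weights

# Step 3: movie = traces^T . S + pixel-wise Gaussian noise.
noise_sd = 0.1                     # sets the mean pixel-wise SNR (illustrative)
movie = (traces.T @ S + noise_sd * rng.normal(size=(T, H * W))).reshape(T, H, W)
```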
For the runtime experiments, the number of cells generated was controlled by the cell density, which we defined in units of the number of cells per mm^2. We set the cell density between 1000-6000 cells per mm^2, guided by the upper limits of the local neuronal densities encountered in two-photon imaging studies of the neocortex (~1500 cells per mm^2 for datasets from the Allen Brain Observatory) and in one-photon imaging studies of the CA1 area of the hippocampus (~6000 cells per mm^2 for CA1 pyramidal neurons 41 ).

Published Ca 2+ Imaging Data
The published Ca 2+ imaging datasets of Fig. 4D,E were taken with a custom-built two-photon mesoscope based on 16 spatiotemporally multiplexed illumination beams that collectively sweep across a 2 mm × 2 mm area of brain tissue at an image frame-acquisition rate of 17.5 Hz, as previously described 1 . In brief, these movies of Ca 2+ activity were acquired in cortical area V1 (plus some surrounding regions) of triple-transgenic GCaMP6f-tTA-dCre mice that express the Ca 2+ indicator GCaMP6f in layer 2/3 neocortical pyramidal neurons.

Surgical Procedures
For imaging studies of the ventral hippocampus, all surgeries were conducted under aseptic conditions using a digital small-animal stereotaxic instrument (David Kopf Instruments). Double-transgenic (tetO-GCaMP6s-2Niell/J: Camk2a-tTA-1Mmay/DboJ) mice expressing GCaMP6s were anesthetized with isoflurane (5% induction, 1-2% maintenance, both in oxygen) in the stereotaxic frame for the entire surgery. Body temperature was maintained using a heating pad. A craniotomy centered on the injection coordinates was performed using a trephine drill (1.0 mm in diameter). To prevent increased intracranial pressure due to the insertion of the implant, we aspirated brain tissue until the white fibers of the corpus callosum became visible. Next, we slowly lowered a custom-designed 0.6-mm-diameter microendoscope probe (Grintech GmbH) to the coordinates -3.40 mm AP, -3.75 mm ML, -3.75 mm DV. We fixed the implanted microendoscope to the skull using ultraviolet-light-curable glue (Loctite 4305). To ensure stable attachment of the implant, we inserted two small screws into the skull above the contralateral cerebellum and contralateral sensory cortex (18-8 S/S, Component Supply). We then applied Metabond (Parkell) around both screws, the implant, and the surrounding cranium. Lastly, we applied dental acrylic cement (Coltene/Whaledent) on top of the Metabond, for the joint purposes of attaching a metal head bar to the cranium and further stabilizing the implant. After surgery, we maintained the animal's body temperature using a heating pad until it fully recovered from anesthesia.
Mice recovered for 3-6 weeks, at which point we checked the brightness of GCaMP6s expression using a miniature microscope (nVista HD, Inscopix, Inc.). If expression was sufficiently bright, a baseplate for repetitive mounting of the miniature microscope was fixed onto the skull using blue-light-curable composite (Pentron, Flow-It N11VI).
For imaging studies of cerebellar Purkinje neurons, we followed our published procedures 43 and performed surgeries on isoflurane-anesthetized PCP2-Cre/Ai148 mice (1.25-2.5% isoflurane in 0.5-1.5 L/min of O2). We first cleaned and removed skin to reveal part of the skull. We then opened a 4-mm-diameter craniotomy centered mediolaterally on the midline, and rostrocaudally at the boundary between cerebellar lobules V and VI. We attached a 3-mm-diameter cover slip beneath a 3-mm-diameter, 1-mm-high stainless steel ring using ultraviolet-light-activated epoxy (Norland NOA81).
We then implanted the cover slip / steel ring combination into the craniotomy and fixed it in place with Metabond (Parkell). Finally, we centered an aluminum headplate with a 5-mm-diameter opening over the cranial window and fixed it to the skull with Metabond. The custom-made plate was shaped to allow the additional attachment of two stainless steel bars to the cranium, which we used during Ca 2+ imaging sessions to hold the mouse's head secure.

Ca 2+ Imaging Sessions
For imaging studies of ventral CA1 pyramidal neurons, we allowed the mice to explore an elevated plus maze. For imaging studies of cerebellar Purkinje cells (Fig. S4A,B), we used a custom-built two-photon mesoscope, the design of which we have previously described in detail 1,44 . We acquired images over a 2 × 2 mm^2 field-of-view at a 17.5 Hz frame rate (842 × 842 pixels).

Cell Extraction with CNMF, CNMF-e, and ICA
To run PCA/ICA, we used the authors' published version 12 , which is available on MATLAB's FileExchange forums. The ICA method first performs a principal component analysis (PCA) to reduce the dimensions of the data and then runs independent components analysis (ICA) to unmix the components spatiotemporally 12 . In all our studies, we ran ICA with µ = 0.1 (which sets the contribution of temporal information in the ICA step), its recommended value in the original paper 12 . We also used a maximum of 750 fixed-point iterations for the ICA step. In our studies with simulated data, we set both the number of principal components and the number of independent components to 1.5 times the number of ground truth cells.

To run CNMF-e, we used the original authors' own implementation 15 , taken from the GitHub repository CNMF_E. We based our use of CNMF-e on the provided demo script for running it on large data, inheriting most settings from the script. For both the striatum and the ventral CA1 data, we used gSig = 3, gSiz = 2*gSig, min_pnr = 2.5, and min_corr = 0.7.

Manual Sorting of the Cell Extraction Outputs
After running EXTRACT and CNMF-e on the striatal and the ventral CA1 datasets, we manually examined the outputs to eliminate possible false positives. For this, we wrote custom software that allows a user to view the movie with the cellular outline of each cell of interest and to judge the quality of the cell by comparing its trace to the Ca 2+ activity in the original movie. Using this approach, we eliminated output components that were deemed of low quality, i.e., those with a poor spatiotemporal match between the candidate cell's activity trace and the activity in the movie, or a low signal-to-noise ratio in the activity trace. After this step, for detailed comparisons between different cell extraction algorithms, we used cells that were retained after their identification by more than one algorithm (see below).

Matching Cells between Cell Extraction Outputs
We matched cells between the outputs of the different cell extraction algorithms using custom-written MATLAB code, provided in our GitHub repository, that used a greedy matching scheme based on the cell images. For this, we first computed the matrix of Pearson correlations between the images of a set of reference cells and a set of detected cells. We then traversed this matrix in the order of decreasing correlation values, recording a match between the i th reference cell and the j th detected cell upon visiting the (i, j) th index of the matrix. The i th row and the j th column were then invalidated to prevent further visits. Matching stopped when the currently visited index of the matrix held a value lower than a threshold, which we set to 0.5. For matching across several sets of outputs, we performed matching between all output pairs, and then reported the intersection of all pairwise-matched cells.
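The greedy matching scheme can be sketched in Python as follows (a minimal re-implementation for illustration; not our MATLAB code):

```python
import numpy as np

def greedy_match(corr, threshold=0.5):
    """Greedy matching on a matrix of Pearson correlations between reference
    and detected cell images. Repeatedly take the largest remaining entry,
    record the (reference, detected) pair, and invalidate its row and column;
    stop when the best remaining correlation drops below `threshold`."""
    C = np.array(corr, dtype=float)
    matches = []
    while C.size:
        i, j = np.unravel_index(np.argmax(C), C.shape)
        if C[i, j] < threshold:
            break
        matches.append((i, j))
        C[i, :] = -np.inf     # prevent further visits to this row/column
        C[:, j] = -np.inf
    return matches
```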

Detection of Ca 2+ Transients
Prior to all quantitative analyses that involved Ca 2+ traces, we detected the Ca 2+ event peaks from the activity traces. For this, we used simple peak detection (the peakseek function, available from MATLAB's FileExchange forums) on smoothed Ca 2+ traces. We smoothed the Ca 2+ traces using a 1-dimensional median filter with a window size of 3, followed by convolution with a Gaussian window function (gausswin in MATLAB, with length 6). For peak detection, we did not consider time points in the Ca 2+ trace with activity levels below an event detection threshold, which was between 0-1, as measured relative to the maximum of the Ca 2+ trace. When reporting event peaks, we used the analog value of each Ca 2+ trace at its event peak, instead of binary information marking the presence of a Ca 2+ event.
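A Python sketch of this event-detection pipeline follows (the Gaussian window mimics MATLAB's gausswin with its default shape parameter of 2.5; the threshold value and the example trace are illustrative):

```python
import numpy as np

def detect_events(trace, thresh_frac=0.3, med_win=3, gwin=6):
    """Median-filter (window 3), smooth with a length-6 Gaussian window, then
    return indices of local maxima whose smoothed amplitude exceeds
    thresh_frac of the smoothed trace maximum."""
    T = len(trace)
    padded = np.pad(trace, med_win // 2, mode='edge')
    med = np.array([np.median(padded[i:i + med_win]) for i in range(T)])
    n = np.arange(gwin) - (gwin - 1) / 2.0
    g = np.exp(-0.5 * (2.5 * n / ((gwin - 1) / 2.0)) ** 2)  # gausswin-like, alpha=2.5
    g /= g.sum()
    sm = np.convolve(med, g, mode='same')
    thr = thresh_frac * sm.max()
    return [i for i in range(1, T - 1)
            if sm[i] > sm[i - 1] and sm[i] >= sm[i + 1] and sm[i] >= thr]

trace = np.zeros(60)
trace[15:20] = [1.0, 0.7, 0.5, 0.35, 0.25]   # two transient-like events
trace[40:45] = [0.8, 0.56, 0.4, 0.28, 0.2]
events = detect_events(trace)
```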

Detection of Dendritic Ca 2+ Activity
For the analyses of Fig. S4, involving dendritic activity in cerebellar Purkinje neurons and neocortical pyramidal neurons 42 , we used two different approaches to optimize cell extraction results and runtimes. In both cases, we omitted the use of high-pass spatial filtering during the pre-processing stage, set the 'dendritic awareness' parameter in EXTRACT to 1 and visually inspected the outputs from EXTRACT. The default setting for the 'dendritic awareness' parameter is 0. However, when its value is set to 1, EXTRACT no longer discards candidate sources of Ca 2+ activity whose spatial areas or eccentricity values are uncharacteristic of cell bodies. This alteration allows EXTRACT to detect Ca 2+ activity sources, such as dendritic segments, with a wide range of shapes.
For studies of cortical pyramidal cell dendrites, we first temporally downsampled the Ca 2+ videos from 31 fps to 7.75 fps and then ran EXTRACT on the downsampled movies. For studies of Purkinje neuron dendrites, we first sought to initialize EXTRACT with a reasonable set of candidate dendrites. To determine this set, we denoised the movie by performing a factor analysis, through a singular value decomposition of the movie. We discarded the noise components of the movie, as determined through the factor analysis, and spatiotemporally smoothed the result by convolving the movie with a filter that was 3 time bins in duration and 3 pixels wide in both spatial dimensions.
We ran EXTRACT on the denoised, low-pass filtered movie version and used the resulting set of dendritic spatial profiles as the starting point for another iteration of EXTRACT, as performed on a denoised version of the movie that was spatially filtered as before but not temporally smoothed.
Within both iterations of EXTRACT, we used the algorithm's internal low-pass Butterworth spatial filtering in the pre-processing module, but with greater filtering along the rostral-caudal dimension than the medial-lateral dimension, to account for the rostral-caudal elongation of the Purkinje cell dendritic trees. After the second iteration of EXTRACT, we visually inspected the results and retained the larger dendritic segments with substantial Ca 2+ activity.

Analyzing the Cell Extraction Outputs from the Simulated Datasets
After performing cell extraction on the simulated datasets, we first matched the found cells to the ground truth cells by using our aforementioned cell matching routine. To compute the areas under the spike precision-recall curves, we detected Ca 2+ events within the traces provided by each cell detection algorithm, across a range of event-detection thresholds between 0-1 (in units of each cell's peak Ca 2+ signal), and we matched the ground truth spikes to the detected spikes to compute the spike recall and spike precision metrics. To perform this matching, we used the same greedy matching scheme described above, but adapted to spike matching; instead of using a spatial distance matrix as we had for cell matching, for spike matching we computed a temporal distance matrix between the ground truth and detected spikes, and then negated its values to be consistent with the logic of the cell matching routine (greedy cell matching requires an affinity matrix). We also set the matching threshold to correspond to a maximum temporal separation of 3 image frames between the ground truth and detected events. After matching detected events to ground truth spikes, we computed the spike recall as the ratio of the number of matched detected spikes to the total number of ground truth spikes. We computed the spike precision as the ratio of the number of matched detected spikes to the total number of detected spikes. We averaged the spike recall and precision values for each detection threshold across all cells of a given movie, which resulted in the mean spike precision-recall curve for that movie. To compute AUC values, we performed numerical integration of the curves with MATLAB's trapz function, which uses the trapezoidal approximation.
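The spike matching and precision/recall computation can be sketched in Python as follows (illustrative event times; not our MATLAB code):

```python
import numpy as np

def match_events(truth, detected, max_sep=3):
    """Greedily pair detected event times with ground-truth times, closest
    pairs first; a pair counts if within max_sep frames. Returns the number
    of matched pairs."""
    if len(truth) == 0 or len(detected) == 0:
        return 0
    # Negated temporal distances serve as the affinity matrix.
    aff = -np.abs(np.subtract.outer(np.asarray(truth, float),
                                    np.asarray(detected, float)))
    n_matched = 0
    while True:
        i, j = np.unravel_index(np.argmax(aff), aff.shape)
        if aff[i, j] < -max_sep:
            break
        n_matched += 1
        aff[i, :] = -np.inf
        aff[:, j] = -np.inf
    return n_matched

def precision_recall(truth, detected, max_sep=3):
    m = match_events(truth, detected, max_sep)
    precision = m / len(detected) if len(detected) else 1.0
    recall = m / len(truth) if len(truth) else 1.0
    return precision, recall

# Example: 3 true events, 3 detections, one detection far off in time.
prec, rec = precision_recall([10, 50, 90], [11, 52, 120])
# A full precision-recall curve follows from sweeping the event-detection
# threshold and integrating over the resulting points (e.g., with np.trapz,
# the analogue of MATLAB's trapz).
```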
After matching the detected cells with the ground truth cells, we computed the cell finding recall and precision metrics in an analogous manner to that used to compute the spike recall and precision.

Selection of Algorithm Parameters for Runtime Comparisons
We adjusted the parameters of EXTRACT and CNMF to compare their runtimes under conditions in which the two methods returned comparable outputs. For EXTRACT, we set cellfind_min_snr (minimum acceptable pixel-wise SNR for cell finding) to 2.5. For CNMF, we adjusted both patch_size (size of the independently processed movie tile) and K (number of cells to initialize), to tune the method for the fastest speed while outputting a comparable number of cell candidates as EXTRACT.

Analyses of Striatal Spiny Projection Neural Activity
For analyses of striatal neural activity, we used published datasets of Ca 2+ activity in spiny projection neurons, and to compute the spatial coordination index we closely followed the approach of the original paper 22 . We first computed a matrix of centroid distances between each pair of cells in a movie. We then detected Ca 2+ events from the output traces and obtained a binarized event trace by marking as 'active' the one-second period following each Ca 2+ event. The motivation for this temporal expansion is that it better highlights clustered activity, which may not be perfectly synchronous, as described previously 22 . For each time point, we used the centroid distance matrix to obtain a histogram of pairwise centroid distances for all pairs of active cells. We also performed the same computations using shuffled versions of the same data, in which the identification numbers of the cells were randomly permuted. From these shuffled datasets, we obtained a null distribution by aggregating the histograms of pairwise distances over 100 different permutations. For each time point, we then compared the histogram of pairwise distances for the real data to the null distribution using a one-tailed, one-sample Kolmogorov-Smirnov test, performed using MATLAB's kstest function. This allowed us to test statistically whether the pairwise centroid distances in the real data were smaller than expected by chance. We then took the negative base-10 logarithm of the resulting p-value as the spatial coordination metric (SCM). We compared the resulting SCM values obtained using traces from PCA/ICA to those from CNMF-e and EXTRACT.
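A minimal sketch of the SCM computation for a single time point is given below. This is our Python approximation of the MATLAB procedure: we assume the null distribution is represented by the pooled pairwise distances from the 100 permutations, build its empirical CDF, and use a one-sided one-sample KS test (SciPy's kstest with a callable CDF standing in for MATLAB's kstest); the tail direction is chosen so that small p-values indicate real distances smaller than chance.

```python
import numpy as np
from scipy import stats

def spatial_coordination_metric(real_dists, null_dists):
    """SCM for one time point: -log10 of the one-sided, one-sample KS p-value
    comparing the real pairwise centroid distances of active cells against
    the empirical null CDF built from shuffled (permuted-ID) data.
    """
    null_sorted = np.sort(np.asarray(null_dists, dtype=float))

    def null_cdf(x):
        # Empirical CDF of the aggregated shuffled distances.
        return np.searchsorted(null_sorted, x, side='right') / len(null_sorted)

    # alternative='greater' tests whether the empirical CDF of the real data
    # lies above the null CDF, i.e. whether real distances tend to be smaller
    # than expected by chance (spatially clustered activity).
    result = stats.kstest(np.asarray(real_dists, dtype=float), null_cdf,
                          alternative='greater')
    return -np.log10(result.pvalue)
```

Large SCM values correspond to time points at which co-active cells are significantly closer together than the shuffled null predicts.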
For these comparisons, we used the same traces from PCA/ICA as in Ref. 22, which were already sorted, whereas for EXTRACT and CNMF-e we performed sorting (see above) ourselves after cell detection.

Classification of Arm-coding Cells in the Ventral Hippocampus
We wrote custom MATLAB software to determine mouse trajectories on the elevated plus maze, and we manually verified the accuracy of the estimated locations. We computed the average Ca 2+ event rate on each arm-type of the maze by computing the mean of the event trace across the time bins in which a mouse was on a given arm. For each cell, we obtained the difference between the Ca 2+ event rates on closed and open arms, d = event_rate_closed_arm - event_rate_open_arm. We repeated the same procedure after circularly shifting the event trace of each cell by a random number of time bins, to break the dependence between Ca 2+ events and mouse locations. We computed the event rate difference, d, on 1000 different instantiations of such randomized traces, providing a null distribution of d values for each cell. We classified a cell as closed-arm coding if d was at or above the 95th percentile of the null distribution, and as open-arm coding if d was at or below the 5th percentile of the null distribution.
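The circular-shift significance test above can be sketched as follows. This is an illustrative Python version of the MATLAB analysis, with function and variable names of our own; the masks flagging closed- and open-arm time bins are assumed inputs.

```python
import numpy as np

def classify_arm_coding(event_trace, closed_mask, open_mask,
                        n_shuffles=1000, seed=0):
    """Classify a cell as closed-arm coding, open-arm coding, or neither.

    d = mean event rate on closed arms minus mean event rate on open arms.
    The null distribution of d is built by circularly shifting the event
    trace by a random number of time bins, which breaks the dependence
    between Ca2+ events and mouse locations, as in the paper.
    """
    rng = np.random.default_rng(seed)

    def rate_diff(trace):
        return trace[closed_mask].mean() - trace[open_mask].mean()

    d = rate_diff(event_trace)
    null = np.array([
        rate_diff(np.roll(event_trace, rng.integers(len(event_trace))))
        for _ in range(n_shuffles)
    ])
    if d >= np.percentile(null, 95):
        return 'closed'
    if d <= np.percentile(null, 5):
        return 'open'
    return 'none'
```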

Correlation Analysis of Cell Images and Movie Frames for Ventral CA1 Pyramidal Neurons
To compute the Pearson correlation coefficient between the image of a given output cell and the movie frames at the time points with detected Ca 2+ events, we first limited analysis to a small spatial neighborhood centered around the cell image. We binarized the cell image and then applied a morphological opening operation 45 with a 3 pixels × 3 pixels structuring element. We treated the resulting two-dimensional binary array as a truncation mask to retain only the region-of-interest around the cell. We then determined the Pearson correlation coefficient between the truncated cell image array and the truncated movie frame array at the time of each detected Ca 2+ event.
To compute a scalar, weighted correlation value for each cell, we took a weighted sum of the Pearson correlation coefficients using the event magnitudes as the weighting factors. Specifically, we first removed the zero entries of the event trace and normalized the trace so that its entries summed to 1. This yielded an array with the same size as the array of Pearson correlation coefficients for the same cell. We then took the inner product of the two arrays and reported it as the weighted correlation metric.
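The masking and weighted-correlation steps can be sketched as below. This is a Python approximation of the MATLAB analysis (which used a 3 × 3 morphological opening); binarizing the cell image by thresholding at zero is our assumption, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def weighted_correlation(cell_image, frames, event_trace):
    """Event-magnitude-weighted Pearson correlation for one cell.

    The binarized cell image is opened with a 3x3 structuring element and
    used as a truncation mask; the Pearson correlation between the masked
    cell image and each masked movie frame at an event time is then averaged
    with weights given by the (normalized) event magnitudes.
    """
    # Binarize (threshold at zero, an assumption) and apply morphological
    # opening to obtain the region-of-interest mask.
    mask = ndimage.binary_opening(cell_image > 0, structure=np.ones((3, 3)))
    t_idx = np.flatnonzero(event_trace)       # times of detected events
    weights = event_trace[t_idx].astype(float)
    weights /= weights.sum()                  # nonzero entries sum to 1
    ref = cell_image[mask]
    corrs = [np.corrcoef(ref, frames[t][mask])[0, 1] for t in t_idx]
    # Inner product of weights and per-event correlations = weighted metric.
    return float(np.dot(weights, corrs))
```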

Decoding of Mouse Locations from Ca 2+ Traces of Ventral CA1 Pyramidal Neurons
We divided the plus maze into 5 spatial bins: left arm, right arm, upper arm, lower arm, and the stem.
We first obtained the analog-valued Ca 2+ event traces from the output traces using our event detection routine. We then smoothed the event traces with a moving average filter of length 20 time bins, corresponding to smoothing over two seconds of activity. We trained support vector machines to predict the spatial bins from the smoothed event traces for a given session, using the templateLinear function in MATLAB with SVM learners and ridge regularization, with the regularization penalty selected automatically. We obtained decoding test errors by first circularly shifting the event traces by a random amount, then selecting the leading 70% of the circularly shifted event traces as the training set and the remaining 30% as the test set. We repeated this procedure 20 times, and we averaged the decoding test errors over the 20 repetitions.
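The smoothing and circular-shift cross-validation scheme can be sketched as follows. This Python sketch is ours, not the authors' MATLAB code, and a simple nearest-centroid decoder stands in for the paper's linear SVM (MATLAB templateLinear with ridge regularization); only the data-splitting logic follows the text exactly.

```python
import numpy as np

def smooth_traces(traces, k=20):
    """Moving-average smoothing over k time bins (2 s in the paper), per cell."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode='same'), 0, traces)

def decode_error(traces, labels, n_reps=20, train_frac=0.7, seed=0):
    """Circular-shift cross-validation: shift traces and labels together by a
    random amount, train on the leading 70% and test on the remaining 30%,
    then average the test error over n_reps repetitions, as in the paper.
    A nearest-centroid classifier is used here as a stand-in decoder."""
    rng = np.random.default_rng(seed)
    T = len(labels)
    errors = []
    for _ in range(n_reps):
        s = rng.integers(T)
        X, y = np.roll(traces, s, axis=0), np.roll(labels, s)
        n_tr = int(train_frac * T)
        Xtr, ytr, Xte, yte = X[:n_tr], y[:n_tr], X[n_tr:], y[n_tr:]
        # Fit: one centroid per spatial bin; predict: nearest centroid.
        classes = sorted(np.unique(ytr))
        centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
        dists = np.stack([np.linalg.norm(Xte - c, axis=1) for c in centroids])
        pred = np.array(classes)[np.argmin(dists, axis=0)]
        errors.append(np.mean(pred != yte))
    return float(np.mean(errors))
```

Shifting the traces and labels by the same amount preserves their alignment, so the split only randomizes which contiguous portion of the session serves as the test set.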
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 25, 2021. ; https://doi.org/10.1101/2021.03.24.436279 doi: bioRxiv preprint