DeCalciOn: A hardware system for real-time decoding of in-vivo calcium imaging data

Epifluorescence miniature microscopes (“miniscopes”) are widely used for in vivo calcium imaging of neural population activity. Imaging data is usually collected while subjects are engaged in a task and stored for later offline analysis, but emerging techniques for online imaging offer potential for novel real-time experiments in which closed-loop interventions (such as neurostimulation or sensory feedback) are triggered at short latencies in response to neural population activity. Here we introduce DeCalciOn, a plug-and-play hardware device for online population decoding of in vivo calcium signals that can trigger closed-loop feedback at millisecond latencies, and is compatible with miniscopes that use the UCLA Data Acquisition (DAQ) interface. In performance tests, the position of rats (n=13) on a linear track was decoded in real time from hippocampal CA1 population activity by 24 linear classifiers. DeCalciOn required <2.5 ms after each end-of-frame to decode up to 1,024 calcium traces and trigger TTL control outputs. Decoding was most efficient using a ‘contour-free’ method of extracting traces from ROIs that were unaligned with neurons in the image, but ‘contour-based’ extraction from neuronal ROIs is also supported. DeCalciOn is an easy-to-use system for real-time decoding of calcium fluorescence that enables closed-loop feedback experiments in behaving animals.


Introduction
Miniature epifluorescence microscopes ("miniscopes") can be worn on the head of an unrestrained animal to perform in vivo calcium imaging of neural population activity during free behavior 1,2,3,4,5 . Imaging data is usually collected while subjects are engaged in a task and stored for later offline analysis. Popular offline analysis packages such as CaImAn 20 and MIN1PIPE 21 employ algorithms 29 for demixing crossover fluorescence between multiple sources to extract calcium traces from single neurons, but these algorithms cannot be implemented in real time because they rely on acausal computations. Emerging techniques for online trace extraction 6,7,8,9,10,11,26 offer potential for carrying out real-time imaging experiments in which closed-loop neurostimulation or sensory feedback are triggered at short latencies in response to neural activity decoded from calcium fluorescence 12,14,25 . Such experiments could open new avenues for investigating the neural basis of behavior, developing brain-machine interface devices, and preclinical testing of neurofeedback-based therapies for neurological disorders 13 .
To advance these novel lines of research, it is necessary to develop and disseminate accessible tools for online calcium imaging and neural population decoding.
Here we introduce DeCalciOn, a plug-and-play hardware device for Decoding Calcium Images Online that is compatible with existing miniscope devices which utilize the UCLA Miniscope DAQ interface board 3,15,16,17 . Device performance was evaluated by implementing 24 linear classifiers to decode hippocampal population activity in unrestrained rats (n=13) running on a linear track. We show that the system requires <2.5 ms after each end-of-frame to decode up to 1,024 calcium traces and trigger TTL outputs that can control external devices. DeCalciOn performs real-time trace extraction by summing fluorescence in regions of interest (ROIs) assigned to each individual trace, without any demixing of crossover fluorescence. In our performance tests, decoding accuracy achieved with this simple trace extraction method matched or exceeded that obtained with offline trace extraction using constrained non-negative matrix factorization 20,29 (CNMF). Online decoding was most accurate and efficient when traces were extracted using "contour-free" ROIs that tiled the entire image frame, and did not overlap with individual neurons. Hence, real-time decoding of neural population activity from calcium fluorescence was most efficient when minimal bandwidth was devoted to computations for source extraction that failed to improve decoding accuracy.
In summary, DeCalciOn provides the research community with a low-cost, open-source, easy-to-use hardware platform for real-time decoding of neural population activity that will permit researchers to perform novel closed-loop experiments in behaving animals. All hardware, software, and firmware are openly available through miniscope.org.

Image processing pipeline
Incoming frames from the MiniLFOV were cropped to a 512x512 subregion containing the richest area of fluorescing neurons in each rat, and stored to BRAM for real-time processing on the Ultra96 (Fig. 2, bottom left). Online processing of each image frame was performed in four sequential steps: 1) motion stabilization, 2) background removal, 3) calcium trace extraction, and 4) decoding. Steps 1-3 were performed by our custom ACTEV (Accelerator for Calcium Trace Extraction from Video) firmware running in the programmable logic fabric of the MPSoC's field-programmable gate array (FPGA). Step 4 was performed by a C program running under the FreeRTOS operating system on the MPSoC's embedded ARM core.
To correct for translational movement of brain tissue (Step 1), a 128x128 pixel area with distinct anatomical features was selected within the 512x512 cropped subregion to serve as a motion stabilization window. ACTEV's image stabilization algorithm 8 rigidly corrects for translational movement of brain tissue by convolving the 128x128 stabilization window in each frame with a 17x17 contrast filter kernel, and then applying a fast 2D FFT/IFFT based algorithm to correlate the window contents with a stored reference template (derived at the beginning of each experimental session) to obtain a 2D motion displacement vector for the current frame.
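The FFT-based correlation step can be sketched in NumPy as follows. This is an illustration of the approach rather than the ACTEV firmware, and it assumes the 17x17 contrast filtering has already been applied to both the window and the template:

```python
import numpy as np

def estimate_shift(window, template):
    """Illustrative sketch of FFT-based rigid motion estimation (Step 1):
    circularly cross-correlate the stabilization window with the stored
    reference template and take the correlation peak as the 2D
    displacement vector for the current frame."""
    corr = np.fft.ifft2(np.fft.fft2(window) * np.conj(np.fft.fft2(template)))
    peak = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    # Indices past the midpoint wrap around to negative displacements
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))
```

Once the displacement vector is known, the full 512x512 frame can be shifted by its negation to cancel the motion.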
Supplementary Video 1 demonstrates online performance of ACTEV's real-time motion stabilization algorithm.
After motion stabilization, ACTEV removes background fluorescence (Step 2) from the 512x512 image by performing a sequence of three enhancement operations 9 : smoothing via convolution with a 3x3 mean filtering kernel, estimating the frame background via erosion and dilation with a 19x19 structuring element 22 , and subtracting the estimated background from the smoothed image. These operations produce an enhanced image in which fluorescing neurons stand out in contrast against the background (see Supplementary Video 1; "Enhanced Image" in Fig. 2). The enhanced image is then filtered through a library of up to 1,024 binary pixel masks (each up to 25x25 pixels in size) that define ROIs within which fluorescence is summed to extract calcium traces (Step 3); each mask can be centered anywhere in the 512x512 window. Pixel masks can be created using either a contour-based or a contour-free approach (see below).

Real-time position decoding from CA1 place cells
Device performance was evaluated using image data collected from the hippocampal CA1 region while Long-Evans rats (n=13) ran back and forth on a 250 cm linear track (Fig. 3A). CA1 pyramidal neurons behave as "place cells" that fire selectively when an animal traverses specific locations in space 18 , so a rodent's position can be reliably decoded from CA1 population activity 2,3,19,28 . For performance testing, we used a virtual sensor (see Methods) capable of feeding stored image data to the ACTEV firmware at 22.8 fps, exactly as if raw video data were arriving from the MiniLFOV in real time. This allowed different firmware algorithms (for example, contour-based versus contour-free decoding; see below) to be compared and benchmarked on the same stored datasets. Results obtained with the virtual sensor were verified to be identical to those obtained in real time.
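ACTEV's enhancement and mask-based trace extraction (Steps 2 and 3) can be sketched in Python as follows. Kernel and structuring-element sizes follow the text, but the code illustrates the algorithm rather than the FPGA implementation:

```python
import numpy as np
from scipy import ndimage

def enhance(frame):
    """Sketch of ACTEV's enhancement sequence (Step 2): 3x3 mean
    smoothing, background estimation by grayscale erosion then dilation
    (morphological opening) with a 19x19 structuring element, and
    subtraction of the estimated background."""
    smoothed = ndimage.uniform_filter(frame.astype(np.float32), size=3)
    background = ndimage.grey_dilation(
        ndimage.grey_erosion(smoothed, size=(19, 19)), size=(19, 19))
    return np.clip(smoothed - background, 0.0, None)

def extract_traces(enhanced, masks):
    """Step 3: sum enhanced fluorescence under each binary ROI pixel mask."""
    return np.array([float(enhanced[m].sum()) for m in masks])
```

Because the morphological opening tracks broad background structure but cannot follow features smaller than the structuring element, compact fluorescing neurons survive the subtraction while diffuse background is removed.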
During a typical real-time session, the experimenter carries out four steps (Fig. 3B): 1) collect an initial imaging dataset and store it to the host PC, 2) pause for an intermission to identify cell contours if necessary (this is only required for contour-based decoding; see below) and train a linear classifier to decode behavior from the initial dataset, 3) upload classifier weights from the host PC to the Ultra96, and 4) perform real-time decoding with the trained classifier. To mimic these steps using the virtual sensor in our performance tests, one session of image data was collected and stored from each of the 13 rats, yielding ~7 min (8K-9K frames) of sensor and position tracking data per rat. The linear classifier was then trained on data from the first half of each session, and tested on data from the second half.
When place cell activity is analyzed or decoded offline (rather than online) from previously stored neural and position data, it is common practice to perform speed filtering that omits time periods when the rat is sitting still. This is done because during stillness, the hippocampus enters a characteristic "sharp wave" EEG state during which place cell activity is less reliably tuned for the animal's current location 24 . Here, speed filtering was not implemented during online decoding of CA1 fluorescence because the linear classifier's input layer received only real-time calcium trace data, and not position tracking data (which would be needed for speed filtering). It is possible that some speed filtering may have been implicitly learned by the decoder if information about the animal's speed was encoded in any of the calcium traces, but explicit speed filtering of calcium data was not performed prior to training or testing the online classifier. Despite this, DeCalciOn was able to achieve highly accurate decoding of the rat's position from calcium traces (see below).
After the decoder had been trained, real-time classifier output from each frame was used to trigger TTL feedback outputs from the Ultra96. The latency to generate feedback from decoded calcium traces was measured as the time elapsed between acquiring the last pixel of each cropped frame and the rising edge of the triggered TTL pulse. Under a worst-case scenario of decoding the maximum number of calcium traces supported by the hardware (N=1,024), mean decoding latency was 2,206±4 µs and never exceeded 2,220 µs (Fig. 3C). This highlights one of the main advantages inherent in our FPGA-based hardware design; it would be difficult to trigger feedback with such short latencies and low variability if online image data were processed on a CPU or GPU running programs with multiple threads, or if real-time video were relayed to the image processor through USB or ethernet instead of by a direct hardware connection with the DAQ.

After calcium traces had been extracted from the training dataset by either method (contour-based or contour-free; see below), they were aligned with position tracking data and used to train the linear classifier during the intermission period. ACTEV can utilize a real-time spike inference engine to convert raw calcium trace values into inferred spike counts prior to decoding. However, for both contour-based and contour-free trace extraction methods, the linear classifier was more accurate at learning to decode the rat's position from raw calcium traces than from inferred spikes (Supplementary Fig. 1). Similar results have been reported in prior studies of position decoding from place cell calcium fluorescence 27 . When information is decoded from raw calcium traces, the GCaMP molecule effectively performs temporal integration of neural activity at a time constant similar to its decay rate.
Hence, the finding that decoding is more accurate from raw traces than from inferred spike counts can be interpreted to mean that when decoding position from place cell calcium signals, the ideal time constant over which to integrate neural activity is closer to the decay constant of the calcium indicator (several hundred ms for the GCaMP7s indicator used here 23 ) than to the video frame acquisition interval (~44 ms for the MiniLFOV scope used here).

Contour-based versus contour-free decoding
Training the classifier on 4K-5K frames of calcium trace data required <60 s of computing time on the host PC during the intermission period. However, contour-based trace extraction required an additional 30-60 min to identify cell contours and simulate online traces before training the classifier. Faster methods for contour identification may be possible, but contour-free trace extraction has the advantage of eliminating this delay altogether. As shown below, this advantage came at no cost to decoding accuracy in our performance tests.
After the linear classifier had been trained, learned weights were uploaded from the host PC to the Ultra96; image data from the second half of the experimental session was then fed through the virtual sensor for real-time decoding. The classifier's output vector consisted of 23 units that used an ordinal scheme (see Methods) to represent 24 binned locations (12 per running direction) along the track (Fig. 3A, bottom). Accurate decoding requires that calcium traces retain stable spatial tuning across the training and testing epochs of the session; to verify that this was the case, spatial tuning curves were derived for calcium traces during the first (training) versus second (testing) half of each session (Fig. 4C). A similarity score (S) was computed for each trace (Fig. 4A,B) using the formula S = -log10(P) × sign(R), where R is the correlation between the trace's tuning curves from training versus testing, and P is the significance level for R. Contour-based traces had higher mean S scores than contour-free traces in every rat (paired t12 = 7.02, p = 1.4e-5); hence, at the level of individual traces, spatial tuning was better for contour-based than contour-free traces (Supplementary Fig. 2). However, the number of contour-free traces (900 per rat) was always greater than the number of contour-based traces (which varied by rat); hence, at the population level, contour-free traces could often convey more information about position than contour-based traces. To compare decoding accuracy between contour-based and contour-free traces, we measured the percentage of frames from the testing epoch during which the trained decoder's position prediction was within ±D bins of the true position. Averaged over rats, mean decoding accuracy was significantly greater for contour-free than contour-based decoding at most values of D (Fig. 4E,H).
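The similarity score defined above can be computed, for example, with SciPy supplying R and its significance level P:

```python
import numpy as np
from scipy.stats import pearsonr

def similarity_score(train_curve, test_curve):
    """Tuning-curve stability score from the text: S = -log10(P) x sign(R),
    where R is the Pearson correlation between a trace's spatial tuning
    curves in the training versus testing epochs and P is R's
    significance level. Large positive S indicates stable tuning."""
    r, p = pearsonr(train_curve, test_curve)
    return float(-np.log10(p) * np.sign(r))
```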
Analogous results have been reported in electrophysiology, where decoding is sometimes more accurate from "cluster-free" multiunit activity than from single-unit spikes 21 .
Decoding position from CaImAn's offline (demixed) traces did not improve accuracy over decoding from online traces derived from the same cell contours (Fig. 4F,G). This demonstrates that the inferior accuracy of contour-based decoding was not rooted in any loss of fidelity incurred by the transition from offline (demixed) to online (non-demixed) trace extraction.
Although mean decoding accuracy was higher for contour-free than contour-based traces in almost every rat, the contour-free advantage was greatest in rats with <400 contour-based traces (Fig. 4D), indicating that the primary reason contour-free traces provided better decoding accuracy was that they were greater in number. CaImAn's sensitivity parameters can be adjusted to detect more contour-based traces in each rat, and when this was done, the accuracy of contour-free and contour-based decoding became much more similar (Supplementary Fig. 3). Hence, contour-free traces provided better decoding accuracy than contour-based traces only when they were greater in number. Based on these results, we recommend the contour-free approach over the contour-based approach as a faster and more efficient method for online trace extraction, since it is just as accurate and obviates the need to identify cell contours before training the decoder.

Discussion
Liu et al. 25 introduced a system for real-time decoding with the UCLA miniscope that, unlike DeCalciOn, requires no additional hardware components; the system is implemented entirely by software running on the miniscope's host PC. Their system can implement a single binary SVM classifier to decode calcium traces from 10 regions of interest (ROIs) in mouse cortex at latencies <3 ms. However, the system does not perform online motion stabilization, which could degrade performance when large numbers of ROIs are spaced closely together. Any system that lacks motion stabilization would also be vulnerable to artifactually decoding behavior from brain motion (which can be correlated with behavior) rather than neural activity. Liu et al. 25 did not report how their system's decoding latency scaled with the number of classifiers or the size/number of ROIs, but if serial processing on the host PC scales linearly with these variables, then decoding latencies of several seconds or more would be incurred when implementing 24 classifiers to decode 1,024 calcium traces.
Zhang et al. 14 performed 2-photon calcium imaging experiments that incorporated real-time image processing (running on a GPU) to detect neural activity and trigger closed-loop optical feedback stimulation in mouse cortex. With online motion correction enabled, the system achieved mean feedback latencies of 8.5+n/30 ms after the end of each frame, where n is the number of ROIs in the image. This system would thus be expected to incur a mean decoding latency of ~43 ms to extract traces from 1,024 ROIs; the system's variability appears to scale with latency, so delays might be >60 ms for some frames and <20 ms for others. Zhang et al.'s 14 system is well suited for experiments in which fast (<10 ms) feedback is triggered from tens of ROIs or slower feedback (40±20 ms) is triggered from hundreds of ROIs. By contrast, DeCalciOn can achieve feedback latencies of <2.5 ms with submillisecond variability even for large ROI counts (Fig. 3C), and would thus be preferable for experiments requiring fast closed-loop feedback triggered from large numbers of ROIs.
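The latency scaling quoted above can be expressed as a simple model (the function name is ours; the constants come from Zhang et al.'s reported figures):

```python
def gpu_latency_ms(n_rois):
    """Mean feedback latency reported by Zhang et al. for their GPU-based
    2-photon system: 8.5 ms fixed cost plus n/30 ms scaling with the
    number of ROIs."""
    return 8.5 + n_rois / 30.0

# At 1,024 ROIs this model predicts ~42.6 ms, versus the <2.5 ms
# measured for DeCalciOn at the same trace count.
```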

Contour-free source extraction
In our experiments, contour-free outperformed contour-based decoding when the number of contour-based traces was less than half the number of contour-free traces (Fig. 4D).
When the number of contour-based and contour-free traces was similar, both methods yielded similar decoding accuracies (Supplementary Fig. 3). Decoding from offline (demixed) contour-based traces was not more accurate than decoding from online (non-demixed) contour-based traces (Supplementary Fig. 2), so there was no evidence that decoding accuracy suffered at all from greater crosstalk between sources during online trace extraction. In the process of 'demixing' fluorescence signals from one another, CaImAn's CNMF algorithm does not in any way consider whether pixels contain information about behavior, so it may be prone to discard pixels that contain decodable position information, especially when using conservative parameters for contour detection. Supporting this, contour-free pixel masks with high S scores (bright purple squares in Fig. 4B) were sometimes located in regions of the image where CaImAn detected no cell contours at all under conservative sensitivity parameters (green pixel masks in Fig. 4A). Based on these results, the contour-free approach is recommendable as the more efficient online trace extraction method for real-time population decoding with DeCalciOn.

Summary and Conclusions
DeCalciOn's low cost, ease of use, and latency performance compare favorably against other real-time imaging systems proposed in the literature. By making DeCalciOn widely available to the research community, we provide a platform for real-time decoding of neural population activity that we hope will facilitate novel closed-loop experiments and accelerate discovery in neuroscience and neuroengineering. All of DeCalciOn's hardware, software, and firmware are openly available through miniscope.org.

Hardware
The DeCalciOn system is designed for use with UCLA Miniscope devices 3 . The Miniscope DAQ software's default operation mode is to receive and display raw Miniscope video data from the DAQ via a USB 3.0 port (and store this data if the storage option has been selected), and also to receive and display raw behavior tracking video from a webcam through a separate USB port.
At the start of a real experimental session, data acquisition by both programs (DAQ and RTI software) is initiated simultaneously with a single button click in the RTI user interface, so that Miniscope video storage by the RTI software and behavior video storage by the Miniscope DAQ software are synchronized to begin at exactly the same time. This allows behavioral data stored by the DAQ software to be aligned with Miniscope video and calcium trace data stored by the RTI software. During the intermission period between initial data acquisition and real-time inference, data stored by DAQ and RTI software is used to train the linear classifier on the host PC. Trained classifier weights are then uploaded to the Ultra96 for real-time decoding.

Contour-based pixel masks
Pixel masks for contour-based trace extraction were derived by using the CaImAn 20 pipeline (implemented in Python) to analyze motion-corrected sensor images from the training dataset (as noted in the main text, this took 30-60 min of computing time on the host PC). CaImAn is an offline algorithm that uses constrained non-negative matrix factorization (CNMF) to isolate single neurons by demixing their calcium traces. The CNMF method is acausal, so it cannot be used to extract traces in real time. During offline trace extraction, however, CaImAn generates a set of spatial contours identifying the pixel regions from which each demixed trace was derived, which ACTEV then uses as pixel masks for online extraction of contour-based traces. Once pixel masks were identified from the training data, motion-corrected miniscope video from the training period was passed through an offline simulator that used ACTEV's causal algorithm for extracting calcium traces from contour pixel masks. This yielded a set of calcium traces identical to those that would have been extracted from the training data in real time by ACTEV. These simulated traces were then used as input vectors to train the linear classifier. Since the online traces are not generated by CNMF (and are thus not demixed from one another), they are susceptible to contamination from fluorescence originating outside of contour boundaries. However, this crosstalk fluorescence did not impair decoding, since similar accuracy was obtained by training the linear classifier on online or offline contour-based traces (Fig. 4F,G).
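The conversion from CNMF spatial contours to binary ACTEV pixel masks might be sketched as follows. The 20% peak-threshold rule and the footprint format (a flattened nonnegative weight image per component) are our illustrative assumptions, not the published procedure; ACTEV itself only requires binary masks of at most 25x25 pixels:

```python
import numpy as np

def footprint_to_mask(footprint, frame_shape, thresh_frac=0.2, max_size=25):
    """Convert one spatial footprint (a flattened nonnegative weight
    image over the frame) into a binary pixel mask: threshold at a
    fraction of the footprint's peak weight, then restrict the result
    to a max_size x max_size window around the footprint's centroid."""
    img = np.asarray(footprint, dtype=float).reshape(frame_shape)
    mask = img >= thresh_frac * img.max()
    ys, xs = np.nonzero(mask)
    cy, cx = int(round(ys.mean())), int(round(xs.mean()))
    half = max_size // 2
    keep = np.zeros(frame_shape, dtype=bool)
    keep[max(cy - half, 0):cy + half + 1,
         max(cx - half, 0):cx + half + 1] = True
    return mask & keep
```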

Contour-free pixel masks
To implement contour-free trace extraction, we simply partitioned the 512x512 image frame into a 32x32 grid of square tiles, each measuring 16x16 pixels (Fig. 4B). No traces were extracted from the 124 tiles bordering the edge of the frame, to avoid noise artifacts that might arise from edge effects in the motion stabilization algorithm. Hence, a total of 1,024-124=900 pixel mask tiles were used for contour-free calcium trace extraction. These traces were derived in real time and stored to the host PC throughout the initial data acquisition period, so they were immediately available for training the linear classifier at the start of the intermission.
Consequently, an advantage of contour-free trace extraction is that the intermission period between training and testing is shortened to just a few minutes, because the lengthy process of contour identification is no longer required. A disadvantage of contour-free trace extraction is that contour-free tiles do not align with individual neurons in the sensor image. As reported in the main text, this lack of alignment between neurons and pixel masks did not impair (and often enhanced) position decoding; however, contour-free decoding does place limits upon what can be inferred about how single neurons represent information in imaged brain regions.
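The tiling scheme and per-tile trace summation can be sketched as follows (slice pairs stand in for ACTEV's binary masks; this is an illustration, not the firmware):

```python
import numpy as np

def contour_free_rois(frame=512, tile=16):
    """Build the contour-free ROI grid: a 32x32 tiling of the 512x512
    frame into 16x16-pixel squares, with the 124 edge tiles dropped,
    leaving 900 ROIs represented as (row-slice, col-slice) pairs."""
    n = frame // tile                                  # 32 tiles per side
    return [(slice(r * tile, (r + 1) * tile),
             slice(c * tile, (c + 1) * tile))
            for r in range(1, n - 1) for c in range(1, n - 1)]

def tile_traces(enhanced, rois):
    """Sum fluorescence within each tile to extract one trace per ROI."""
    return np.array([float(enhanced[ys, xs].sum()) for ys, xs in rois])
```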

Fluorescence summation
After pixel masks were created using one of the two methods (contour-based or contour-free) and uploaded to the FPGA, the calculations for extracting calcium traces from the masks were the same. Each mask specified a set of pixels over which grayscale intensities were summed to obtain the fluorescence value of a single calcium trace: T(f) = Σ_{i=1..P} p_i(f), where T(f) is the summed trace intensity for frame f, p_i(f) is the intensity of the i-th pixel in the mask for frame f, and P is the number of pixels in the mask. Each contour-free mask was a square tile containing 16x16=256 pixels (Fig. 4B), whereas the size of each contour-based mask depended upon where CaImAn identified neurons in the image (Fig. 4A).

Drop filtering
The mean size of contour-based masks was 20-50 pixels (depending on the rat), an order of magnitude smaller than contour-free tiles. In a few rats, a small amount of jitter sometimes penetrated the online motion correction filter, causing the stabilized sensor image to slip by 1-2 pixels against stationary contour masks during 1-2 frames (see Supplementary Video 1, line graph at lower right). This slippage misaligned pixels at the edges of each contour mask, producing intermittent noise in the calcium trace that was proportional to the fraction of misaligned pixels in the contour, which in turn was inversely proportional to the size of the contour; motion jitter therefore caused more noise in traces derived from small contour masks than from large contour masks. To filter out this occasional motion jitter noise from calcium traces, a drop filter was applied to traces derived from contour masks that contained fewer than 50 pixels. The drop filter exploited the fact that genetic calcium indicators have slow decay times, and therefore, sudden drops in trace fluorescence can be reliably attributed to jitter noise. For example, the GCaMP7s indicator used here has a half decay time 18 of about 0.7 s, so at the MiniLFOV's 22.8 Hz frame rate, a fluorescence reduction of more than 5% between frames can only arise from jitter artifact. The drop filter defines a maximum permissible reduction in fluorescence between successive frames, Δmax = (1 - q) × G × C, where G is the maximum possible fluorescence intensity for any single pixel (255 for 8-bit grayscale depth), C is the number of pixels in the contour mask for the trace, and q is a user-specified sensitivity threshold, which was set to 0.9 for all results presented here.
The drop-filtered calcium trace value for frame f was given by T'(f) = max(T(f), T'(f-1) - Δmax), where Δmax is the maximum permissible reduction defined above. It should be noted that while drop filtering protects against artifactual decreases in trace values, it offers no protection against artifactual increases that might masquerade as neural activity events. However, motion jitter almost never produced artifactual increases in fluorescence for the small contours to which drop filtering was applied, because small contours were "difficult targets" for patches of stray fluorescence to wander into during jitter events that rarely exceeded 1-2 pixels of slippage. Larger contours were slightly more likely to experience artifactual fluorescence increases during jitter events, but in such cases, the artifact was diluted to an inconsequential size because it only affected a tiny fraction of the large contour's pixels. In summary, jitter artifact was highly asymmetric, producing artifactual decreases but not increases for small contours, and producing negligible artifact of any kind for large contours.

The system can trigger TTL feedback outputs from the decoder's real-time predictions (Fig. 1H in the main text), but we did not deliver closed-loop feedback to our pilot rats. As explained in the Results, decoding from contour-free traces was more accurate than from contour-based traces because there were more contour-free traces in every rat (Supplementary Fig. 2).

Spatial tuning curves
To generate spatial tuning curves for fluorescence traces, each trace was first referenced to the rat's position on the track.

Surgery
The GRIN lens was cemented in place with methacrylate bone cement (Simplex-P, Stryker Orthopaedics). The dorsal surface of the skull and bone screws were cemented with the GRIN lens to ensure stability of the implant, while the dorsal surface of the implanted lens was left exposed.
Two to three weeks later, rats were again placed under anesthesia in order to cement a 3D printed baseplate above the lens.

Linear alternation behavior
After rats had been baseplated, they were placed on food restriction until reaching a goal weight of 85% of ad libitum weight, and then began behavioral training. The time between the start of the surgical procedures and the start of behavior training was typically 6-8 weeks. Beginning 5 days after baseplating, rats earned 20 mg chocolate pellets by alternating between the two rewarded ends of a linear track (250 cm) during 15 min recording sessions. After receiving a reward at one end, the next reward had to be earned at the other end by crossing the center of the track.

Behavior tracking
A webcam mounted in the behavior room tracked a red LED located on top of the Miniscope, and this video was saved alongside the calcium imaging via the Miniscope DAQ software with synchronized frame timestamps. These behavior video files were initially processed with custom Python code: all session videos were concatenated into a single TIFF stack and downsampled to 15 frames per second, the per-pixel median of the stack was subtracted from each image, and the frames were then rescaled to the original 8-bit range so that the maximum and minimum values matched those before subtraction. Background-subtracted behavior videos were then processed in MATLAB. The rat's position in each frame was determined from the location of the red LED in the camera image. Extracted positions were then rescaled to remove camera distortion and to convert pixel positions to centimeters according to the maze dimensions.
Positional information was then interpolated to the timestamps of the calcium imaging video using a custom MATLAB script.
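The median-subtraction and alignment steps described above can be sketched in Python (the published pipeline split this work between Python and MATLAB; the function names here are illustrative):

```python
import numpy as np

def preprocess_behavior_stack(stack):
    """Subtract the per-pixel median of the stack from every frame,
    then rescale the result to the full 8-bit range. (The published
    pipeline also concatenates sessions and downsamples to 15 fps
    before this step.)"""
    sub = stack.astype(np.float32) - np.median(stack, axis=0)
    lo, hi = float(sub.min()), float(sub.max())
    return ((sub - lo) / (hi - lo) * 255).astype(np.uint8)

def align_position_to_frames(calcium_ts, behavior_ts, position_cm):
    """Linearly interpolate tracked positions onto the calcium frame
    timestamps, mirroring the custom MATLAB alignment step."""
    return np.interp(calcium_ts, behavior_ts, position_cm)
```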

Histology
At the end of the experiment, rats were anesthetized with isoflurane, intraperitoneally injected with 1 mL of pentobarbital, then transcardially perfused with 100 mL of 0.01M PBS followed by 200 mL of 4% paraformaldehyde in 0.01M PBS to fix the brain tissue. Brains were sectioned at 40 µm thickness on a cryostat (Leica), mounted on gelatin prepared slides, then imaged on a confocal microscope (Zeiss) to confirm GFP expression and GRIN lens placement.