## Abstract

The inference of neuronal connectome from large-scale neuronal activity recordings, such as two-photon Calcium imaging, represents an active area of research in computational neuroscience. In this work, we developed FARCI (Fast and Robust Connectome Inference), a MATLAB package for neuronal connectome inference from high-dimensional two-photon Calcium fluorescence data. We employed partial correlations as a measure of the functional association strength between pairs of neurons to reconstruct a neuronal connectome. We demonstrated using gold standard datasets from the Neural Connectomics Challenge (NCC) that FARCI provides an accurate connectome and its performance is robust to network sizes, missing neurons, and noise levels. Moreover, FARCI is computationally efficient and highly scalable to large networks. In comparison to the best performing algorithm in the NCC, FARCI produces more accurate networks over different network sizes and subsampling, while providing over two orders of magnitude faster computational speed.

## Introduction

The human brain comprises about 100 billion neurons, each making thousands of synaptic connections with others [1]. Neuronal connectome, the wiring of neurons, is highly plastic and dynamic. The plasticity of the connectome is a feature that imparts the brain with an ability to learn new behavior and to store and process new information [2]. The inference of brain’s neuronal connectivity has received much attention and is one of the main scientific challenges in the 21^{st} century. For example, the Human Connectome Project (HCP) [3,4] was established in 2009 with the goal of mapping the human’s brain connectome. Knowledge of neuronal connectome and its rewiring may shed light on the operating principles of the brain and its myriad functions.

Direct measurements of the connectome based on anatomical techniques is time-consuming, non-scalable and challenging due to limitations of macroscale imaging modalities [5]. Therefore, an alternative approach is to indirectly infer connectome based on neuronal activity recording data. Technological advances in neuroscience have enabled the recording of neuronal activity signals in awake animals at high resolution [6], accelerating efforts toward neuronal connectome inference [1,7]. For instance, two-photon Calcium imaging and neuronal electrophysiology are two current technologies for measuring the activity of single neurons. While electrophysiology offers higher temporal resolution which enables recording the activity of fast spiking neurons, Calcium imaging provides a high spatial resolution as it can record a large population of neurons that are adjacent to one another [8]. Although these techniques employ different principles to detect neuronal activity signals, both technologies are capable of generating large scale time series data for 100s-1000s neurons simultaneously.

Neuronal activities are the outputs of the neuronal networks, and do not directly indicate the connections among the neurons. For this reason, state-of-the-art computational algorithms have been developed to reconstruct the neuronal connectome from high throughput time-series neuronal firing data. In general, there exist two classes of methods for inferring neuronal connectivity from neuronal activity measurements: model-free methods and model-based methods (see [7] for a more comprehensive review). Each class of methods captures the neuronal connectome with different details such as strength, type (i.e. excitatory or inhibitory), and directionality of the connections. Model-free methods make no assumption on the biological mechanism that generates the observed data, using statistical inference, information theory, and supervised learning approaches, to reconstruct the connectome [7]. Model-based methods are typically more complex by employing a generative model of the neuronal network activity. Here, the connectivity is inferred by explicitly modeling the activity of pre-and post-synaptic neurons and the connectome inference is formulated as an estimation of some model parameters so that the model is capable of explaining the observed neuronal activity patterns. Models generated from such model-based methods can further be used to generate neuronal activity data for validation and prediction. But, in practice, it is not feasible to account for all aspects governing the observed neuronal activity data (e.g. response to external stimuli, synaptic strength and connectome rewiring) in a single model. Consequently, model-based approaches use simplified models with the aim of only simulating empirical data without guaranteeing an explicit correspondence between model parameters and physiological factors [9]. Results of the Neural Connectomics Challenge (NCC) [10,11] revealed that simpler model-free methods, especially those relying on partial correlation coefficients, outperform more complex methods [12]. Note that the partial correlations are symmetric and thus do not give an indication for the directionality of the neuronal connections. Nevertheless, an undirected neuronal connectome inferred from neuronal activity recordings can facilitate the comprehension of functional neuronal circuits and potentially of their rewiring principles during learning and other brain functions.

In this work, we developed a model-free method FARCI (Fast and Robust Connectome Inference), a MATLAB toolbox for inferring neuronal connectome from time-series Calcium fluorescence data of neuronal activity. FARCI combines non-negative deconvolution, thresholding, and smoothing in data pre-processing and produces a partial correlation network for neuronal connectome reconstruction. We assessed the performance of FARCI using gold standard Calcium fluorescence datasets from the NCC [12]. Our results demonstrate that FARCI is highly efficient and scalable to large dataset and connectome, and its performance is superior to the winner of the NCC in terms of accuracy and robustness to noise levels, sampling rates, network densities, and hidden (missing) neurons.

## Methods

### Spike deconvolution

Precise temporal information of individual neurons’ spiking activity is crucial for connectome inference. Two-photon Calcium (Ca^{2+}) imaging indirectly captures neuronal action potentials via fluorescence signals of a genetically encoded Calcium indicator (GECI). Extracting neuronal spiking activity from Ca fluorescence data is a non-trivial task because raw Ca imaging recordings are known to have significant intrinsic noise, baseline fluorescence drift, and other technical constraints, such as low sampling rate and slow decay of fluorescence sensor [13]. Among various algorithms for the inference of neuronal spiking activity from two-photon imaging data, Non-Negative Deconvolution (NND) method has been shown to outperform other algorithms while having simple parameter settings [13,14].

FARCI uses a sparse NND, using the Online Active Set method to Infer Spikes (OASIS) algorithm [15] from the Suite2P MATLAB package [16]. OASIS is an adaptation of the pool adjacent violators algorithm [17]. Here, we write the neuronal spike deconvolution, as follows:
where *C* is the raw Ca fluorescence data of neuron *i, f* represents the deconvolution function, and *x* is the deconvolved neuronal spiking activity.

### Spike denoising

The deconvolved spiking activity are often contaminated by noise and such noise can degrade the accuracy of the inferred connectome. For this reason, we remove the deconvolved neuronal spike *x* that is below a certain neuron-specific threshold. The neuron specific threshold Θ_{i} is set to the average *x* of neuron *i* plus a user-specified constant multiple of the standard deviation. The denoised spiking activity *y*_{i} is computed as follows:
where Θ_{i} is the denoising threshold and *μ*_{i} and *σ*_{i} are the mean and standard deviation of the spiking activity for neuron *i* – denoted by *x*_{i}, respectively. While denoising may remove actual spiking activity with a low amplitude, our tests below indicate that the accuracy gain by spike denoising outweighs the loss of information due to removal of low-amplitude spiking activity.

### Spike smoothing

Binning spikes over multiple time points (image frames) has been shown to enhance the correlation between deconvolved and ground-truth spiking activity [18]. We tested different weighted binning strategies for smoothing the spike data (see Supplementary Table S1). We identified the following smoothing function *h*(*y* _{i}, *t*) as the best among the strategies:
where *h* is the smoothing function, *z*_{i} denotes the smoothed spikes, and *t* denotes the time index of spiking activity. The data pre-processing pipeline from Ca fluorescence data to the final neuronal spikes is illustrated in Figure 1.

### Partial correlation statistics

In FARCI, the functional connectivity between each pair of neurons are established based on their co-firing behavior. However, due to highly interconnected nature of neurons, functional correlations between any two neurons may arise indirectly from their connections to other neurons (e.g., sharing the same pre-synaptic neurons). To reduce false positives, FARCI uses the partial correlation coefficient as a measure of functional connectivity between a pair of neurons – that is, the correlation between the spike activity of two neurons while controlling for the activity of other neurons. In order to infer an edge (connection) between two neurons, for a set *V* of *N* neurons, we treat the spike data of the *N* neurons at each time point as *N* independent and identically distributed (*i. i. d*.) observations, i.e. *Z* = {*z*;_{1}, *z*_{2}, …, *z*_{n} }. The partial correlation *p*_{ij} between neuron *i* and *j*, is computed via the precision matrix ϕ as follows:
where **Σ** is the *N* × *N* covariance matrix of the neuronal spikes *z* and **Φ** is the precision matrix – the inverse of the variance-covariance matrix. The partial correlation coefficients have values between –1 and +1, where a value of +1 (–1) indicates a perfect positive (negative) correlation of spike activity between two neurons when all other neurons having their activity fixed.

### Performance evaluation

We evaluated the performance of FARCI using the gold standard *in silico* Ca fluorescence data and connectome in the NCC. The provided datasets in NCC were simulated using a realistic and state-of-the-art simulator [19], generating data that resemble actual recording of Calcium signals from neurons. The data simulator employed a mathematical model that takes into account limitations of Calcium imaging technology such as temporal resolution and light scattering artifacts. We followed the assessment of method performance in the NCC and evaluated the area under Receiver Operating Characteristic (AUROC) curve by comparing the inferred partial correlation network with the ground truth connectome. Besides, we use the area under Precision-Recall (AUPR) curve as an additional performance metric. AUROC and AUPR values range between 0 and 1, where a value of 1 indicates a perfect prediction. Note also that an AUROC of 0.5 is the expected performance for a random prediction. Although AUROC and AUPR are related to each other, unlike AUROC, AUPR takes into account the ratio between positives and negatives (i.e. class imbalance). For a sparse connectome where the number of true connections is low in comparison to the number of all possible connections, AUPR is a more sensitive measure for the performance of network inference methods than AUROC [20]. On the other hand, AUROC tends to be very high (near 1) for a sparse network.

We compared FARCI with the best performer in the NCC [12], an algorithm developed by Sutera et al., 2015. The Sutera et al. algorithm comprises a four-step signal processing pipeline (low-pass filter, high-pass filter, hard thresholding, and weighting), and like FARCI, produces partial correlation networks. We noted that the Sutera et al. algorithm gave an excellent performance for 1000 neuron connectomes in the NCC – the connectomes used for scoring in the challenge. However, its performance was relatively poorer for the smaller connectomes in the NCC with 100 neurons. In other words, the Sutera et al. algorithm appeared to be overly optimized, and thus is not robust to network size. In this work, we also evaluated the robustness of FARCI performance with respect to not just network sizes, but also noise level, sampling rate, and hidden (missing) neurons.

## Results and Discussion

FARCI is an efficacious and robust method for inferring functional neuronal connectome from Calcium fluorescence data. Figure 2 illustrates the workflow of the connectome inference in FARCI, which comprises the following key steps: (1) deconvolution of spiking activity from Ca fluorescence data, (2) spike denoising, (3) spike smoothing, and (4) evaluation of partial correlations. The functional neuronal connectome is represented by the partial correlation network among the neurons in the dataset. We benchmarked FARCI using gold standard Ca fluorescence datasets from the Neural Connectomics Challenge (NCC) [10,11]. The NCC provides a total of 16 gold standard datasets generated by *in silico* simulations [19] of the activity of 100 (*n =* 6) and 1000 neurons (*n* = 4) that are referred to as the *small* and *normal* connectomes, respectively. Additional datasets of 1000 neurons (*n* = 6) with different characteristics of noise, neuronal firing rate, network clustering coefficient, and network density are also available. More details of the datasets in the NCC are provided in Table 1. The connectivity of the small and normal connectomes has a relatively low density that ranges within 16.3 ± 1.7% for the small networks and 2.1 ± 0.5% for the normal networks, suggesting that these connectomes are sparse.

In the NCC, participants were asked to reconstruct neuronal connectome from time-series measurements of Ca fluorescence data. Submissions from the participants were ranked based on how accurately their algorithms are able to infer the neural connectomes with 1000 neurons – the so-called normal connectome. The scoring in the NCC was done by evaluating the area under the receiver operating characteristic (AUROC). But, as we noted in Methods, AUROC tends to be high for a sparse connectome, and hence, is not a sensitive measure of performance. For this reason, in our benchmarking, we evaluated the AUROC as well as the area under the precision-recall curve (AUPR) (see Methods) to assess the performance of FARCI. The AUPR is a more sensitive indicator for network inference accuracy than the AUROC for a sparse connectome [20]. Below, we compared FARCI with the best performing method in the NCC by Sutera *et al*. [12] in terms of AUROC and AUPR.

### Neuronal spike deconvolution

Ca fluorescence imaging data give only indirect measurements of neuronal activity, and thus, require data pre-processing to extract the underlying neuronal action potential spikes. We employed OASIS deconvolution algorithm [15] from the MATLAB package Suite2P [16] that uses a non-negative deconvolution strategy to provide estimates for timing and amplitude of spiking activity. Table 2 gives the AUROC and AUPR values for using the partial correlations of the deconvolved spikes to infer neuronal connectomes (see Table S2-S4 for more detailed results). While the AUROCs were generally good (>0.78), the AUPRs were as low as 0.23.

### Binarization of neuronal spikes

We also tested whether binarizing the deconvolved spiking activity might help in improving connectome inference using partial correlations. As reported in Table 2, converting spikes to binary data – by setting all non-zero spikes to 1 – led to a significant deterioration in the accuracy of the inferred connectomes for both small and normal sized networks. The result above suggests that the amplitude of spiking activity contain significant information for inferring neuronal connectivity. Thus, in FARCI, we used the deconvolved spiking activity without any binarization.

### Neuronal spike denoising

Low amplitude spiking activity may arise from random noise, and should ideally be removed to improve accuracy. In FARCI, we implemented a denoising step by enforcing a user defined threshold that is equal to a multiple *α* of the standard deviation of Ca spike heights. More specifically, we define the threshold Θ for each neuron *i* as follows:
where *μ*_{i} and *σ*_{i} denote the mean and standard deviation of the deconvolved spiking activity *x*_{i}. During denoising, any spiking activity below the threshold is set to 0.

We investigated the influence of *α* on AUROC and AUPR. To this end, we ran the connectome inference using denoised spikes for different *α* values in the range of 0 ≤ *α* ≤ 5. As shown in Figure 3, the AUROC generally drops with increasing denoising strength (i.e. increasing *α*), especially for larger connectomes, but stays reasonably high at above 0.7. Indeed, for large and sparse networks where the number of negative cases (i.e. the absence of neuronal connections) significantly outweighs the number of positive cases, AUROC often becomes too optimistic. Here, the AUPR serves as a more sensitive metric for method performance. The AUPRs for all of the connectomes show a peak for *α* between 2 and 3, with *α* = 2 often giving the highest value. For this reason, we use *α* = 2 as the default value for the spike denoising step in FARCI. As reported in Table 2, the denoising step using *α* = 2 achieves an average improvement of 18.6% in the AUPR.

### Neuronal spike smoothing

Smoothing deconvolved spiking activity has been demonstrated previously to improve the connectome inference [12]. Similarly, binning spikes from OASIS increases the correlation between the predicted and the ground truth spikes [18]. We explored a set of heuristic binning and smoothing functions for improving accuracy (see Supplementary Table S1). Our explorations using various smoothing functions (data not shown) led to Eq. 6 as a simple-yet-efficacious weighted binning function with a superior connectome inference performance for the small and normal connectomes in the NCC. The smoothing function uses a time window of 5 frames where higher weights are given to the time points closer to the center frame of the window. The performance of the connectome inference using binned spiking activity is reported in Table 2, which shows moderate improvements in the AUROC over using the spiking activity directly without binning.

### FARCI

FARCI combines denoising and smoothing of the deconvolved spiking activity to produce a synergistic improvement in the connectome inference, as shown in Table 2. Importantly, the performance comparison in Figure 4 (also see Supplementary Table S5) shows FARCI generally outperforming the best performing algorithm in the NCC by Sutera et al. [12]. The complete results of the benchmarking and comparison are provided in Table S5. The algorithm by Sutera et al. appeared to be optimized for the normal connectomes with 1000 neurons, which is the size of connectomes used for scoring in the NCC. Sutera et al. algorithm performed poorer on the small connectomes with 100 neurons than on the normal connectomes. FARCI is able to outperform Sutera et al. algorithm, providing similarly high AUROC and generally much higher AUPR, regardless of the size of the networks, the level of noise, the density of the networks, and the sampling rates (see Supplementary Table S5).

### Missing neurons

Finally, we investigated the robustness of FARCI with respect to missing neurons. The missing neurons can be considered as hidden variables in the connectome inference. Hidden variables are a common problem in functional connectome inference as only a subset of neurons are measured in a typical experimental setup. We emulated missing neurons by randomly sampling a subset of neurons from the dataset. We then applied FARCI to obtain partial correlation networks for the subsampled neurons, and compared the inferred connectome with the appropriate subnetwork of the ground truth connectome associated with the subsampled neurons. We generated five random samples of neurons and their Ca fluorescence data from the connectomes with 1000 neurons, with the following sizes: 50, 200, 400, 600, and 800 neurons. For each random sample, we performed FARCI and evaluated the average of AUROC and AUPR and the computational runtime. We again compared FARCI with the algorithm by Sutera et al. (2015). Figure 5A-B depicts the AUROC and AUPR from missing neurons simulations using the Normal-1 connectome in the NCC. The results indicate that FARCI is able to maintain high AUROC and AUPR, even up to 60% missing neurons in the dataset. While the AUROC often stays high (>0.9), the AUPR drops quickly at >80% missing neurons. Both FARCI and Sutera et al. (2015) provide comparable AUROCs, but FARCI consistently gives higher AUPR across different fractions of subsampling than Sutera et al. (2015) method. Figure 5C-D summarizes the performance of FARCI for all 1000-neuron networks in the NCC (*n* = 10), confirming the robustness of FARCI to missing neurons up to 60%-80% of the connectome.

### Computational speed

Besides accuracy, computational efficiency is a desirable feature of a connectome inference algorithm. As shown in Figure 6A, FARCI offers 2-3 orders of magnitude of computational speed-up over Sutera et al. algorithm over various network sizes. In addition, the computational runtimes of FARCI has a better scaling with network size than Sutera et al. – that is, a lower fold-increase in computational times with increasing connectome size. The fast performance of FARCI is consistently observed across the datasets in the NCC, as shown in Figure 6B. Furthermore, Figure 6B indicates that the runtime of FARCI scales linearly with the size of the connectome, at least up to 1000 neurons.

## Summary

In this work, we developed FARCI, a fast and robust procedure for inferring functional neuronal connectome from two photon Calcium imaging data. FARCI combines a fast non-negative deconvolution algorithm OASIS [15], spike denoising, and spike smoothing, to extract information for neuronal spike events from Ca fluorescence signals. FARCI produces a partial correlation network of the neurons for connectome inference. Benchmarking using gold standard *in silico* datasets from the Neural Connectomics Challenge [10,11], showed that FARCI outperforms the best algorithm in the NCC by Sutera et al. (2015) in terms of the accuracy of the connectome inference (i.e., AUROC and AUPR) and the computational runtimes and scaling. Also, the high performance of FARCI is robust with respect to the connectome size, data noise and sampling rate, and network densities. Finally, we demonstrated that FARCI performs well in the realistic scenario where there are missing neurons in the connectome inference. In this scenario, partial correlations between any two neurons may appear because they shared a pre-synaptic neuron that is not part of the measurement. In our tests, FARCI is still able to maintain high AUROC and AUPR up to 60% missing neurons in a connectome of 1000 neurons, while keeping moderately high AUPR until 90% neurons are missing.

FARCI produces a partial correlation network for the functional connectome, and such networks do not provide indications on the directionality of the neuronal connectivity. The directionality of the neuronal connectivity gives information regarding the identity of the pre- and post-synaptic neurons. However, the challenges for determining directionality in neuronal connectome from two-photon Ca fluorescence data are many. First, the typical rate of data sampling for two-photon Ca imaging ranges between 30 - 100 ms (i.e. ∼10-30 Hz) [21], which is much longer than the time scale of neuronal action potentials and the following refractory period between 1 - 5 ms [22]. Given the sampling rate of Ca imaging, neurons may have fired several times in between any two image frames, and thus the sequential timing of pre- and post-synaptic neuron firing has a low chance to be captured accurately. Other fundamental challenges in determining causal (directional) connectivity from time series data have been discussed elsewhere [7]. Despite the lack of directionality in the connections, the generated functional connectome is still useful for studying connectome dynamics and rewiring in various brain functions, such as in learning and memory formation.

## Availability

The codes, user manual, and tutorial of FARCI are available online (https://github.com/CABSEL/FARCI and http://www.projectmemonet.org/farci).

## Supplementary Materials

## Acknowledgement

The authors would like to acknowledge funding support from NSF-HDR IDEAS Lab (funding # 1939987, 1940202, 1940162, 1939999, and 1939992).