## Abstract

A cortical neuron typically makes multiple synaptic contacts on the dendrites of a post-synaptic target neuron. The functional implications of this apparent redundancy are unclear. The dendritic location of a synaptic contact affects the time-course of the somatic post-synaptic potential (PSP) due to dendritic cable filtering. Consequently, a single pre-synaptic axonal spike results in a PSP composed of multiple temporal profiles. Here, we developed a “filter-and-fire” (F&F) neuron model that captures these features and show that the memory capacity of this neuron is threefold larger than that of a leaky integrate-and-fire (I&F) neuron when trained to emit precisely timed output spikes for specific input patterns. Furthermore, the F&F neuron can learn to recognize spatio-temporal input patterns, e.g., MNIST digits, where the I&F model completely fails. Multiple synaptic contacts between pairs of cortical neurons are therefore an important feature rather than a bug and can serve to reduce axonal wiring requirements.

## Introduction

Neurons in the central nervous system (CNS) connect to each other via chemical synapses. In recent decades it has been found that two synaptically connected neurons typically connect via multiple synaptic contacts rather than a single contact (Holler et al. 2021; Silver et al. 2003; Feldmeyer, Lübke, and Sakmann 2006; Shepherd et al. 2005; Markram et al. 1997). Multiple synaptic contacts that originate from a single pre-synaptic axon often impinge on different parts of the dendritic tree of the post-synaptic neuron (Feldmeyer, Lübke, and Sakmann 2006; Silver et al. 2003; Holler et al. 2021). Furthermore, if contacts were formed purely according to “Peters’ rule” (Peters and Feldman 1976), namely by axon-dendrite proximity, then one would expect the distribution of the number of contacts per connection to be exponential, with one contact per axon being the most frequent case (Fares and Stepanyants 2009; Markram et al. 2015; Rees, Moradi, and Ascoli 2017), which is far from what is empirically observed. The deviation from the distribution predicted by Peters’ rule suggests that the number of synaptic contacts between two connected neurons is tightly controlled and is thus likely to serve a functional purpose.

Several phenomenological models have attempted to explain how multiple synaptic contacts between the pre-synaptic and post-synaptic neurons are formed (Fares and Stepanyants 2009), but very few studies have tackled the question of how they might be beneficial from a computational perspective (but see Sezener et al. 2021; Camp, Mandivarapu, and Estrada 2020; Jones and Kording 2021; Zhang, Hu, and Liu 2020; Hiratani and Fukai 2018; Acharya et al. 2021). It is typically thought that this redundancy overcomes the problem of unreliable synaptic vesicle release, which results in unreliable signal transmission between the pre-synaptic and the post-synaptic neurons (Rudolph et al. 2015). Several statistically independent unreliable contacts that sum together can reduce the variance of the post-synaptic potentials (PSPs). However, the same effect could be achieved by a simpler mechanism, multiple vesicle release (MVR) per synaptic activation (Rudolph et al. 2015; Holler et al. 2021), which does not require multiple synaptic contacts. Other studies addressed additional possible advantages of having multiple synaptic contacts between two neurons. Hiratani and Fukai (Hiratani and Fukai 2018) demonstrated that multiple synaptic contacts might allow synapses to learn more quickly. Note that faster learning, although beneficial, does not fundamentally endow the neuron with the ability to perform new kinds of tasks. Zhang et al. (Zhang, Hu, and Liu 2020) model multiple contacts in the context of deep artificial neural networks but demonstrate no tangible computational benefit. Several other studies use multiple synaptic contacts in the context of artificial neural networks and demonstrate some computational benefits, sometimes without explicitly addressing the use of multiple synaptic contacts (Camp, Mandivarapu, and Estrada 2020; Jones and Kording 2021; Sezener et al. 2021).

To address the question of the functional consequence of multiple contacts, we developed a simplified neuron model: the Filter and Fire (F&F) neuron. This model is based on the Integrate and Fire (I&F) neuron model but, in our model, each pre-synaptic axon makes multiple synaptic contacts. We further add features to the model to account for the effect of dendritic cable filtering on the time course of the somatic potential resulting from the different synaptic locations on the dendritic tree (Rall 1964; Rall 1967). To analyze the memory capacity of this model, we use the formulation that Memmesheimer et al. (Memmesheimer et al. 2014) developed for the I&F model. We further show how to teach the F&F neuron a real-world classification task and explore other aspects, such as the effect of unreliable synapses on the memory capacity of the F&F neuron and the implication of multiple synaptic contacts for optimizing axonal wiring in the brain.

## Results

### Mathematical description of the filter and fire (F&F) neuron model with multiple contacts

We propose a Filter and Fire (F&F) neuron model, which is similar to the standard current-based Leaky Integrate and Fire (I&F) neuron model but with two added features. The first feature approximates the temporal characteristics of a dendritic cable as initially demonstrated by Rall (Rall 1964; Rall 1967), in which inputs that connect at distal locations on the dendrite exhibit prolonged post-synaptic potentials (PSPs) at the soma (Fig 1**B** top traces), whereas proximal inputs generate brief PSP profiles (Fig 1**B** bottom traces). The second feature is that each input axon connects to multiple locations on the dendritic tree, sometimes proximal and sometimes distal (Fig. 1**A,B**). Formally, let *N*_{axons} be the number of input axons (Fig 1**A**), denoted by index i, with spike trains represented by *X*_{i}(*t*). Each axon connects to the dendrite via *M* contacts (*M* = 3 is illustrated in Fig. 1). Each contact connects to the dendrite at a location denoted by index j and filters the incoming axon spike train with a specific synaptic kernel *K*_{j}(*t*). This forms the contact’s voltage contribution trace *V*_{c,j}(*t*) = (*X*_{i} ∗ *K*_{j})(*t*), where ∗ denotes temporal convolution and i is the axon that makes contact j. There is a total of *M* · *N*_{axons} such contact voltage contribution traces (Fig 1**C**). In vector notation we denote *V*_{c}(*t*) = [*V*_{c,1}(*t*), *V*_{c,2}(*t*), …, *V*_{c,*M*·*N*_{axons}}(*t*)]^{T}. Each synaptic contact has a weight, *w*_{j}. In vector notation we write *w* = [*w*_{1}, *w*_{2}, …, *w*_{*M*·*N*_{axons}}]^{T}. Each contact contribution trace is multiplied by its corresponding weight to form the somatic voltage trace *V*_{s}(*t*) = *w*^{T} · *V*_{c}(*t*) = ∑_{j} *w*_{j} · *V*_{c,j} (*t*) (Fig 1**D**). When the spike threshold is reached, a standard reset mechanism is applied. Note that the “dendrites” in this model are linear, so the model retains the analytic tractability of the I&F neuron models.

Here we model only the temporal consequences of adding a passive dendritic cable; we do not consider the effect of nonlinear dendrites. The kernels we use are typical double exponential PSP shapes of the form *K*_{j}(*t*) = *A* · (e^{−*t*/*τ*_{decay,j}} − e^{−*t*/*τ*_{rise,j}}), where *A* is a normalization constant such that each filter has a maximum value of 1, and *τ*_{decay,j}, *τ*_{rise,j} are randomly sampled for each synaptic contact, representing randomly connected axon-dendrite locations. Note that, for mathematical simplicity, we do not impose any restrictions on synaptic contact weights: each weight can be either positive or negative regardless of which axon it comes from. Indeed, the goal of the study is not to replicate all possible biological details but specifically to explore the computational benefit that arises from two specific details: temporal filtering of synaptic potentials due to dendritic cable properties, and multiple synaptic connections between pairs of neurons.
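The model description above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes 1 ms time bins, a double exponential kernel of the form exp(−t/τ_decay) − exp(−t/τ_rise) normalized to a peak of 1, the convention that contact j belongs to axon j // M, and a subtractive, exponentially decaying reset. The function names, the threshold value, and the reset time constant are hypothetical choices made for the example.

```python
import numpy as np

def double_exp_kernel(tau_rise, tau_decay, duration_ms=150):
    """Double exponential PSP kernel on a 1 ms grid, normalized to a peak of 1."""
    t = np.arange(duration_ms, dtype=float)
    k = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    return k / k.max()

def simulate_ff_neuron(spikes, weights, tau_rise, tau_decay,
                       threshold=1.0, tau_reset=30.0):
    """Forward pass of a linear filter-and-fire neuron (illustrative sketch).

    spikes:  (n_axons, T) binary array, one row per axon, 1 ms bins.
    weights, tau_rise, tau_decay: (n_axons * M,) arrays, one entry per contact.
    Assumption: contact j belongs to axon j // M.
    """
    n_axons, T = spikes.shape
    n_contacts = weights.shape[0]
    M = n_contacts // n_axons

    # Contact voltage contribution traces: each axon spike train filtered by K_j.
    v_c = np.zeros((n_contacts, T))
    for j in range(n_contacts):
        k = double_exp_kernel(tau_rise[j], tau_decay[j])
        v_c[j] = np.convolve(spikes[j // M], k)[:T]

    # Somatic voltage before reset: weighted sum over all contacts.
    v = weights @ v_c

    # Assumed reset: subtract a decaying exponential after every output spike.
    reset_shape = threshold * np.exp(-np.arange(T) / tau_reset)
    out = np.zeros(T, dtype=bool)
    for t in range(T):
        if v[t] >= threshold:
            out[t] = True
            v[t:] -= reset_shape[:T - t]
    return v_c, v, out
```

Because the dendritic filtering is linear, all contact traces can be computed once and reused while only the weights are learned.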

### Increased memory capacity of the F&F neuron with multiple synaptic contacts

We first test the memorization capacity of the F&F neuron model as a function of the number of multiple connections. We utilize the framework proposed by Memmesheimer et al. (Memmesheimer et al. 2014), measure memory capacity in an identical way, and use their proposed local perceptron learning rule for the task. In short, this capacity measure indicates the maximal number of precisely timed output spikes that the neuron can learn to emit in response to random input stimulation during some time period. Fig. 2**A** shows random spiking activity of 100 axons for a period of 60 seconds (top). Below it, the output of the post-synaptic cell is shown before learning (black) and after learning (blue), together with the desired target output spikes (red). In this example, we used *M* = 5 multiple contacts per axon. For the given set of spike trains, it is possible to find a weight vector that places all output spikes at precisely their desired timing. In Fig. 2**B** we repeat the simulation of Fig. 2**A** for various numbers of multiple contacts (*M*) while re-randomizing all input spike trains, desired output spike trains, and the synaptic filter parameters of each contact. We repeat this both for the I&F neuron model (i.e., a single synaptic PSP kernel for all synapses) and for the F&F neuron model with a randomly selected synaptic kernel for each synapse (see **Methods** for full details of the kernel shapes we used). The y axis represents our success in placing all of the output spikes accurately, as measured by the area under the receiver operating characteristic (ROC) curve (AUC) for the binary classification task of placing each spike in 1 ms time bins. Error bars represent the variance of the AUC over multiple repeats (18), each with re-randomized inputs, synaptic kernels, and desired output spike trains. The figure shows that all the curves obtained from the I&F neuron models cluster together, with no change across different values of *M*.
This is to be expected, as it is the classic case of synaptic redundancy when a single temporal kernel is used for all pre-synaptic axons. For example, for a single axon with two contacts and a single synaptic kernel (as in the I&F model), the somatic voltage can be written as *V*_{s}(*t*) = *w*_{1} ·∑*K*(*t* − *t*_{i}) *+ w*_{2} · ∑ *K* (*t* − *t*_{i}) = (*w*_{1}*+ w*_{2}) · ∑ *K* (*t* − *t*_{i}) = *w*_{eff} · ∑*K* (*t* − *t*_{i}). That is, for the I&F neuron, the additional weights associated with the same input axon are equivalent to a single effective weight and are therefore not utilized.
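The redundancy argument can be checked numerically by comparing the rank of the contact-trace matrix when all contacts share one kernel with its rank when each contact has its own kernel. The sketch below is illustrative only; the helper names, the spike probability, and the specific kernel parameters are assumptions made for the example.

```python
import numpy as np

def kernel(tau_rise, tau_decay, duration_ms=150):
    """Double exponential PSP kernel on a 1 ms grid, normalized to a peak of 1."""
    t = np.arange(duration_ms, dtype=float)
    k = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    return k / k.max()

def contact_traces(spikes, kernels, M):
    """Filter each axon's spike train through its M contact kernels (contact j -> axon j // M)."""
    n_axons, T = spikes.shape
    return np.stack([np.convolve(spikes[j // M], kernels[j])[:T]
                     for j in range(n_axons * M)])

rng = np.random.default_rng(0)
n_axons, M, T = 10, 3, 2000
n_contacts = n_axons * M
spikes = (rng.random((n_axons, T)) < 0.01).astype(float)  # random input spike trains

# I&F-like case: every contact uses the same kernel.
same_kernels = [kernel(1.0, 30.0)] * n_contacts
# F&F-like case: every contact draws its own time constants.
varied_kernels = [kernel(r, d) for r, d in zip(rng.uniform(1, 12, n_contacts),
                                               rng.uniform(12, 30, n_contacts))]

rank_if = np.linalg.matrix_rank(contact_traces(spikes, same_kernels, M))
rank_ff = np.linalg.matrix_rank(contact_traces(spikes, varied_kernels, M))
print(rank_if, rank_ff)
```

With a shared kernel the M traces of each axon are identical, so the rank equals the number of axons; with distinct kernels the extra contacts add linearly independent directions that a learning rule can exploit.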

For the F&F in the case of *M* = 1 we have an I&F model just with different kernels for each synapse. This change on its own does not make any difference in the capacity of the model as the number of learnable and utilizable parameters is identical to the I&F case with *M* = 1, and thus this curve lies with the other curves of the I&F models. However, for the F&F models with multiple contacts (*M* = 2,3,5,10,15) the graph shows an increased accuracy, demonstrating that some of the additional weights are utilized. In Fig. 2**C** we display the maximal number of output spikes that can be precisely timed as a function of M (number of multiple contacts), for both I&F and F&F models. We measure the number of precisely timed spikes as the maximal number of spikes that is above a high accuracy threshold (AUC > 0.99) of the plot in Fig. 2**B**, and we normalize by the number of axons to display the number of precisely timed output spikes per input axon on the y axis.

As expected from Fig. 2**B**, the capacity of the I&F model does not depend on the number of multiple connections. The F&F model, however, displays an increase in capacity that saturates at approximately 3-fold the capacity of the I&F neuron model. Note that the number of degrees of freedom (tunable parameters) scales linearly with the number of multiple contacts, so there is no obvious explanation for the observed saturation. We return to this point in a later section and explain the precise origin of the 3-fold increase relative to the I&F model. In Fig. 2**D** we vary the number of input axons and observe that the number of precisely timed spikes scales linearly, with different slopes for different numbers of multiple connections. This indicates that increasing the number of multiple connections increases the effective number of parameters utilized per axon. To better illustrate these results, we show in Fig. S1**A** a simple case of how the I&F neuron can emit temporally precise output spikes by employing a spatial strategy. In Fig. S1**B** we show how an F&F neuron can employ a temporal strategy.

### The F&F neuron can learn spatio-temporal tasks that an I&F neuron cannot

Next, we wish to demonstrate new capabilities of the F&F neuron model with multiple synaptic connections that are beyond the capabilities of the I&F neuron model. For this purpose, we construct a new spatiotemporal task derived from the MNIST task. To this end, we convert the horizontal spatial image dimension (width) into a temporal dimension (Fig. 3**A** top) with a uniform time warping such that 20 horizontal pixels are mapped onto T milliseconds, where T is the pattern presentation duration. The vertical spatial image dimension (height) is simply replicated 5 times, so that 20 vertical pixels are mapped onto 100 axons. We then sample spikes for each axon according to the resulting time-varying Poisson instantaneous firing rate, with additional background noise. An example of the resulting input spike trains is shown by the raster plot in the middle frame of Fig. 3**A**. Finally, we train our neuron to produce a spike at the end of the presentation of a specific digit. We train on the full MNIST training set and present results on the test set. The output before learning (black), the output after learning (blue), and the desired output (red) are presented at the bottom of Fig. 3**A** for the case where the selected digit was 3. We then repeat this process for all digits for three models: an I&F neuron, an F&F neuron, and a spatio-temporal, temporally sliding logistic regression (LR) model. We use the sliding LR model as a reference for comparison with the F&F neuron, but note that this model is not biologically plausible and cannot be considered a model of a neuron in our case. Importantly, the LR model has *N*_{axons} · *T* learnable and fully utilizable parameters (*N*_{axons} parameters for each 1 ms time bin), which is much greater than the *N*_{axons} · *M* parameters used in the F&F and I&F models (which are also not fully utilizable, as we have seen in Fig. 2).
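The image-to-spike-train conversion described above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes a 20×20 image crop scaled to [0, 1], nearest-neighbour time warping, and a Bernoulli approximation of Poisson spiking in 1 ms bins. The function name and the rate parameters `max_rate_hz` and `background_rate_hz` are hypothetical; the text does not state the rates used.

```python
import numpy as np

def image_to_spikes(image, T_ms=40, replicas=5, max_rate_hz=100.0,
                    background_rate_hz=5.0, rng=None):
    """Convert an (H, W) image in [0, 1] to an (H * replicas, T_ms) binary spike raster.

    The width axis becomes time (uniform warping onto T_ms bins of 1 ms),
    and each image row drives `replicas` axons.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = image.shape

    # Uniform time warping: pick the source column for every 1 ms bin.
    cols = np.minimum((np.arange(T_ms) * W) // T_ms, W - 1)
    intensity = image[:, cols]                           # (H, T_ms)

    # Replicate each row so that H pixels map onto H * replicas axons.
    intensity = np.repeat(intensity, replicas, axis=0)   # (H * replicas, T_ms)

    # Time-varying rate plus background noise, converted to a per-bin spike probability.
    rate_hz = background_rate_hz + max_rate_hz * intensity
    p_spike = np.clip(rate_hz / 1000.0, 0.0, 1.0)
    return rng.random(p_spike.shape) < p_spike
```

For a 20×20 input with `replicas=5` and `T_ms=40`, the result is a 100-axon raster lasting 40 ms, matching the dimensions given in the text.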

The test accuracies following training for all models are depicted in Fig 3**B**. For this plot, a true positive (hit) is counted if at least 1 spike occurs within a 10 ms time window around the ground truth desired spike. The temporal duration of each pattern T was 40 ms and the number of multiple contacts M was 5. Fig. 3**B** clearly shows that the I&F neuron model is at chance level for almost all digits and is essentially incapable of learning the task. In contrast, the F&F model with 5 multiple connections is consistently better than chance and sometimes approaches the “aspirational” spatio-temporal logistic regression model. In Fig. 3**C** we visually display the learned weights of all models when attempting to learn the digit 3. The weight matrix of the logistic regression model clearly depicts what appears to be an average-looking digit 3, which was the digit the neuron was trained to recognize. The F&F neuron model depicts a temporally smoothed version of the logistic regression weights, whereas the I&F model cannot learn temporal patterns and therefore cannot recognize this digit above chance level. For precise details of how the weights for the F&F and I&F models were visualized, please consult the **Methods**. For a simpler pattern classification case, please see Fig. S1**C**. Fig. S1**D** shows how an F&F neuron can solve the task shown in Fig. S1**C**. Fig. S1**E** explains why this task cannot be solved by an I&F neuron.
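The hit criterion can be written as a one-line check. The sketch below assumes the 10 ms window is centered on the desired spike time (that is, ±5 ms); the text does not specify the window placement, and the function name is hypothetical.

```python
import numpy as np

def is_hit(output_spike_times_ms, target_time_ms, window_ms=10.0):
    """Return True if at least one output spike falls inside the window around the target.

    Assumption: the window is centered on the target, i.e. target +/- window_ms / 2.
    """
    out = np.asarray(output_spike_times_ms, dtype=float)
    return bool(np.any(np.abs(out - target_time_ms) <= window_ms / 2))
```

For example, `is_hit([12, 80], 15)` counts as a hit because the spike at 12 ms lies within 5 ms of the target, while `is_hit([30], 15)` does not.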

To test the effect of the presentation duration of each digit T, Fig. 3**D** displays summary statistics of test accuracy, averaged across all digits, for the three models as a function of T. The interval between patterns was 70 ms and the decay time constant for I&F model was 30 ms, to match the maximal decay time constant for all synaptic kernels in the F&F model. The F&F neuron had 5 multiple contacts. Fig. 3**D** shows that there is an optimal pattern presentation duration for the F&F model that occurs at the 40-50 ms range, which is ∼1.5 times the maximal decay time constant in the model.

Next, we seek to determine the effect of unreliable synaptic transmission and its interaction with multiple synaptic contacts. As explained in the Introduction, multiple synaptic contacts are often considered a mechanism for overcoming the unreliability of synaptic transmission. In Fig. 3**E** we display the test accuracy as a function of the number of multiple contacts for all 3 models. Solid lines show fully reliable synapses, as used thus far, and dashed lines show the accuracy in the unreliable-synapse regime, with a release probability of p = 0.5 for each contact. Note the consistent drop in test accuracy, which does not go away even with a large number of multiple contacts. In this graph the positive digit used was “7” and the pattern temporal duration was T = 30 ms. The number of positive training samples used in this case was 2048 patterns. Unreliable synaptic transmission can be considered a mechanism for implementing “drop-connect”, a method used in training artificial neural networks that has a known regularization effect (Wan et al. 2013; Srivastava et al. 2014). In Fig. 3**F** we test whether this also applies to our case. We show the test accuracy as a function of the number of training samples and demonstrate a regularization effect for the F&F neuron that increases test accuracy when the number of training input patterns is low. Note that each training sample was shown to the model multiple times (15 times in this case), and for each spike and each contact an independent release probability was applied, effectively resulting in 15 noisy patterns presented to the neuron during training for each original training pattern. This suggests that, in the regime of a small number of training data points, unreliable synapses can also be viewed as a “feature” rather than a “bug” and can help avoid overfitting.
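The unreliable-release procedure described above amounts to applying an independent Bernoulli mask to every spike at every contact. A minimal sketch follows, assuming binary spike rasters in 1 ms bins and the convention that contact j belongs to axon j // M; the function name is hypothetical.

```python
import numpy as np

def apply_release_failures(spikes, M, p_release=0.5, rng=None):
    """Expand (n_axons, T) axon spikes to (n_axons * M, T) per-contact spikes,
    keeping each spike at each contact independently with probability p_release."""
    rng = np.random.default_rng() if rng is None else rng
    per_contact = np.repeat(spikes.astype(bool), M, axis=0)  # rows ordered axon-major
    released = rng.random(per_contact.shape) < p_release
    return per_contact & released

# Drop-connect-like augmentation: several noisy copies of one training pattern.
rng = np.random.default_rng(0)
pattern = rng.random((100, 30)) < 0.05                       # illustrative input raster
noisy_copies = [apply_release_failures(pattern, M=5, p_release=0.5, rng=rng)
                for _ in range(15)]
```

Each of the 15 copies is a different random thinning of the same underlying pattern, which is what gives the regularization effect its drop-connect flavor.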

### Maximal capacity is explained by the effective 3-dimensional subspace spanning all synaptic kernels

Next, we wish to pinpoint the mathematical origin of the properties depicted in Figures 2 and 3 of the proposed F&F neuron model with multiple synaptic connections. We observed that all the PSP kernels we used have similar shapes, and therefore the contacts of a single axon generate highly correlated entries in the local synaptic response vector *V*_{c}(*t*). Any input correlation limits the number of degrees of freedom available for learning by modification of the synaptic weights *w*. In Fig. 4**A** we show all the PSPs as heatmaps, organized according to increasing values of *τ*_{rise} within each block and increasing *τ*_{decay} between the blocks. In Fig. 4**C** we show all the PSPs as temporal traces overlaid on each other. Both Fig. 4**A** and 4**C** clearly show that the shapes of the PSP kernels, although chosen randomly and exhibiting some variance, are overall very similar to each other. We therefore apply singular value decomposition (SVD) to all PSP shapes (Fig. 4**B**) and discover that 99.93% of the variance in all PSP shapes is explained by the first 3 singular vectors (Fig. 4**E**). This means that all synaptic kernels depicted in Fig. 4**A** and 4**C** are effectively spanned by a basis set of 3 orthogonal PSP-like shapes. For the sake of presentation, and to avoid negative values in the trace shapes, we display in Fig. 4**D** the 3 independent kernels that result from non-negative matrix factorization (NMF). These PSP shapes essentially filter the input signal with various time constants and various delays, and are therefore easy to interpret. They help explain the temporal smoothing of the weights learned by the F&F neuron model in Fig. 3**C**, and the number of independent PSP shapes is a good candidate explanation for the very specific 3-fold increase in capacity in Fig. 2**C**. To verify that these basis sets in fact explain the phenomenon in Fig. 2, we repeat the same experiment as in Fig. 2, but now each axon is connected to the post-synaptic neuron via only 3 connections, whose kernels are the optimal PSP shapes depicted in Fig. 4**D** rather than randomly selected ones. These results are depicted in Fig. 4**F**. The model with three orthogonal kernels produces effectively identical results to those of the F&F neuron and thus helps explain its behavior. First, regarding capacity: as we randomly sample more and more connections from the set of all possible kernels (i.e., as M increases), the contacts of each axon progressively approach spanning the entire 3-dimensional subspace, and capacity therefore saturates at three times the I&F baseline level. At that point we have effectively reached three independent kernels per axon, each corresponding to an independent learnable parameter. Second, the specific shapes of these kernels explain the temporal smoothing effect we observe in the learned weights of the F&F neuron model in Fig. 3**C**. Note that although we used somewhat artificial double exponential synaptic kernels in our study, we verified that our results also hold for PSPs in a simulation of a highly realistic detailed cortical neuron model. This verification can be found in Fig. S2, which shows that, despite some quantitative differences, the results obtained with the reduced F&F neuron model remain qualitatively valid for a neuron with a complex dendritic tree. Namely, a basis set of three temporally distinct kernels can effectively span all synaptic kernels in realistic models of cortical pyramidal neurons.
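The SVD analysis of the kernel shapes can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: it assumes a 150 ms kernel window on a 1 ms grid, 1000 uniformly sampled kernels with time constants in the ranges given in the Methods, and variance measured as squared singular values without mean-centering. The exact fraction obtained depends on these choices.

```python
import numpy as np

def double_exp_kernel(tau_rise, tau_decay, duration_ms=150):
    """Double exponential PSP kernel on a 1 ms grid, normalized to a peak of 1."""
    t = np.arange(duration_ms, dtype=float)
    k = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    return k / k.max()

rng = np.random.default_rng(0)
n_kernels = 1000
tau_rise = rng.uniform(1, 12, n_kernels)     # ms, range from the Methods
tau_decay = rng.uniform(12, 30, n_kernels)   # ms, range from the Methods

# Rows are kernels, columns are time bins.
K = np.stack([double_exp_kernel(r, d) for r, d in zip(tau_rise, tau_decay)])

# Singular values give the energy captured by each orthogonal temporal component.
s = np.linalg.svd(K, compute_uv=False)
explained = s**2 / np.sum(s**2)
top3 = explained[:3].sum()
print(f"fraction of variance in first 3 components: {top3:.4f}")
```

A value of `top3` close to 1 indicates that the family of kernels lies almost entirely in a 3-dimensional subspace, which is the property the capacity argument relies on.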

### Multiple synaptic contacts along with dendrites allow the CNS to save either in number of transmitting axons or decoding neurons while achieving identical computational goals

Finally, we wish to point out that multiple synaptic contacts and the proposed F&F neuron model might allow the central nervous system to transmit information via axons in a more hardware-efficient way by relying on dendritic decoding downstream. Three computationally equivalent alternatives are depicted in Figure 5. Fig. 5**C** illustrates the case in which N axons transmit information to be read out by a downstream F&F neuron model with multiple synaptic contacts. For a downstream I&F point neuron to be able to produce the same output as the F&F neuron in Fig. 5**C**, each of the N axons in Fig. 5**C** would need to be replicated 3 times, each copy with some additional delay to account for the temporal integration properties of the F&F neuron model. This scenario is illustrated in Fig. 5**D** and verified in simulations in Figures 5**A** and 5**B**. Alternatively, it is also possible to use the same N axons as in Fig. 5**C** but with a downstream decoding network of 3N point neurons. This scenario is depicted in Fig. 5**E**. These three alternatives highlight the more general scenario in which the evolutionary pressure to reduce axonal wiring in an ever-increasing brain volume (Chklovskii, Schikorski, and Stevens 2002; Chen, Hall, and Chklovskii 2006) is addressed by compressing information onto a limited number of axons and relying on sophisticated dendritic integration to decode these signals. In this context we have shown that employing multiple synaptic contacts between the axon and its post-synaptic neuron enables dendrites to better decode spatiotemporal patterns and save “hardware” (reduced total axon length, reduced number of neurons).
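The axon-replication alternative of Fig. 5**D** can be illustrated by building a 3N-axon input from N axons using delayed copies. The sketch below only constructs the expanded input; the delay values are hypothetical placeholders, since the text does not specify them, and the function name is an assumption.

```python
import numpy as np

def replicate_with_delays(spikes, delays_ms=(0, 10, 20)):
    """Stack delayed copies of an (n_axons, T) raster into (n_axons * len(delays_ms), T).

    Copy k occupies rows k * n_axons ... (k + 1) * n_axons - 1 and is shifted
    later in time by delays_ms[k] bins; spikes shifted past T are dropped.
    """
    n_axons, T = spikes.shape
    out = np.zeros((n_axons * len(delays_ms), T), dtype=spikes.dtype)
    for k, d in enumerate(delays_ms):
        out[k * n_axons:(k + 1) * n_axons, d:] = spikes[:, :T - d]
    return out
```

A point neuron with a single PSP kernel reading this expanded input sees each original spike at three different latencies, which is one way to give it access to several temporal profiles per original axon at the cost of three times as many axons.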

## Discussion

Neurons in the brain typically connect to each other via multiple synaptic contacts. The computational role of this redundancy is unclear. In this work we augmented the commonly used Leaky Integrate and Fire (I&F) neuron model with simplified dendrites that filter the incoming spike train via a set of multiple post-synaptic potential (PSP) filters. Each filter corresponds to a particular synaptic contact, with varied time constants that directly relate to signal filtering by the cable properties of the dendritic tree. This is in contrast to the single PSP filter used for all contacts in the I&F point neuron model. We term this new neuron model Filter and Fire (F&F). Specifically, we have demonstrated that the capacity of the F&F neuron to memorize precise input-output relationships is increased by a factor of ∼3 compared to that of the regular I&F neuron. The capacity is measured as the ratio between the number of precisely timed output spikes and the number of input axons. This ratio is ∼0.15 for the I&F case, as shown by Memmesheimer et al. (Memmesheimer et al. 2014), and grows to ∼0.45 spikes per axon for the F&F model.

Next, we constructed a new spatiotemporal pattern discrimination task using the MNIST dataset and demonstrated that our neuron model can learn to detect single digits at well above chance level on an unseen test set, whereas an I&F neuron model cannot learn the task at all. This is because, in the specific way we chose to represent each digit, the task does not contain enough spatial-only information for I&F neuron discrimination. Our task design was deliberately chosen to highlight the temporal aspect of pattern discrimination that becomes possible when taking into account the temporal filtering due to the cable properties of dendrites. We show that multiple synaptic connections with different PSP profiles allow the neuron to effectively parametrize the temporal profile of the PSP influence of each pre-synaptic axon on the somatic membrane potential. This is enabled by modifying the weights of the various (multiple) contacts made between the axon and the post-synaptic cell. We show that all PSPs can be spanned by 3 basis PSP filters, each with a different temporal profile. Taken together, this suggests that the F&F neuron model provides a low temporal frequency approximation to a spatio-temporal perceptron that assigns independent weights to each point in time. An alternative, mathematically equivalent description is that the F&F model effectively bins the membrane integration time into 3 non-uniform time bins and can learn to assign an independent weight to each bin in the past.

Our study demonstrates that even a highly simplified neuron model, such as the one used here, which implements only the passive temporal filtering aspect of dendrites, reveals a computational role for the seemingly redundant multiple synaptic contacts between pairs of neurons in the brain. Dendrites therefore allow us to salvage some of the redundant connectivity and put those synaptic weight parameters to good use. This allows for increased memorization capacity and, perhaps more importantly, for the detection of specific spatiotemporal patterns in the input axons. The spatiotemporal filtering properties of dendritic processing are prominently featured in a recently published study describing how single neurons produce output spikes in response to highly complex spatiotemporal patterns (Beniaguev, Segev, and London 2021). It is important to note that there is an additional nonlinear amplification aspect of dendrites (Poirazi and Mel 2001; Poirazi, Brannon, and Mel 2003; Polsky, Mel, and Schiller 2004; Bicknell and Häusser 2021) that we did not consider in this study and that is likely to provide additional computational benefits in the context of multiple synaptic contacts, as the previous work by Beniaguev et al. strongly suggests (Beniaguev, Segev, and London 2021). The present study thus adds a new perspective to the growing literature regarding the computational function of individual neurons (McCulloch and Pitts 1943; Rosenblatt and F. 1958; Oja 1982; Hyvarinen and Oja 1998; Moldwin and Segev 2018; Golkar et al. 2020; Pehlevan et al. 2020; Gütig and Sompolinsky 2006a; Poirazi, Brannon, and Mel 2003; Häusser and Mel 2003; Moldwin, Kalmenson, and Segev 2021; Zador, Claiborne, and Brown 1991; Mel 1992; Poirazi and Mel 2001).

It is notable that it is typically believed that the phenomenon of multiple synaptic contacts is primarily attributed to noise reduction related to unreliable synaptic transmission, but our work strongly suggests that this is only a small part of the story from a computational standpoint. In fact, we would like to propose that each synapse can reduce its own noise by employing a multiple vesicle release (MVR) tactic if needed, as was also recently suggested by (Rudolph et al. 2015). If this is in fact the case, it is possible that unreliable synaptic transmission might play a computational role of its own and rather than being an unwanted “bug”, it might turn out to be a useful “feature”. More specifically, it might play a role that is similar to dropout (Srivastava et al. 2014) or more precisely drop-connect (Wan et al. 2013) which are commonly used in present day artificial neural networks paradigm. There, drop-connect is typically employed as a regularization technique that reduces overfitting and usually improves generalization. Although this was not the main focus of our work, we briefly demonstrate this effect in Figure 3**F**.

How trains of spikes represent information in the nervous system has been a long-standing question in neuroscience since its inception. A major debate revolves around whether information is largely carried by firing rates averaged over relatively long time periods, or whether precisely timed spikes carry crucial bits of information. Evidence for both alternatives has been found in both sensory and motor systems, and much theoretical work on this key topic has been conducted (Meister, Lagnado, and Baylor 1995; Christopher Decharms and Merzenich 1996; Wehr and Laurent 1996; Neuenschwander and Singer 1996; Johansson and Birznieks 2004; Hopfield 1995; Castelo-Branco et al. 2000; Thorpe, Delorme, and Van Rullen 2001; DeWeese, Wehr, and Zador 2003; Kara, Reinagel, and Reid 2000; London et al. 2010; Abeles et al. 1993; Schneidman, Freedman, and Segev 1998; Memmesheimer et al. 2014; Florian 2012; London et al. 2002; Abeles 1982; Gütig and Sompolinsky 2006b; Maass and Schmitt 1999). Since dendrites and multiple synaptic connections play such a crucial role in decoding incoming spike trains and increase the neuronal repertoire for emitting precisely timed output spikes in response to spatiotemporal input patterns (and they might do this using a simple, biologically plausible learning rule), we suggest that dendritic hardware should be considered when discussing the question of the neural code and what information is transmitted via axons. In (Perez-Nieves et al. 2021), the authors show that a diversity of time constants helps increase the computational repertoire of spiking networks.

Here we suggest that something similar might happen at the neuronal level and not only at the network level. The fact that a single neuron can decode complex spatiotemporal patterns on its own, without requiring a highly coordinated decoding network of neurons to extract temporal information from incoming spike trains, not only allows for potential “hardware” savings, as we illustrated in Fig. 5, but also suggests that information might be ubiquitously encoded by precise spike times throughout the central nervous system. A single neuron can emit precisely timed output spikes in response to spatiotemporal inputs, as we showed here and as was previously shown by (Memmesheimer et al. 2014). A large and highly coordinated network of neurons is therefore not required to encode temporally precise patterns that can be sent via axons. The fact that a single neuron can do this on its own, without the use of network mechanisms, somewhat increases the likelihood that information is encoded by precise spike timing, as opposed to average firing rates (over relatively long periods of time), throughout the CNS.

Lastly, in recent years, multiple groups around the world have started to create detailed reconstructions of rodent and human brains, and to report neuronal connectivity maps with the help of electron microscopy (EM) (Kasthuri et al. 2015; Motta et al. 2019; Shapson-Coe et al. 2021). We suggest that analyzing these EM datasets, focusing on the number of multiple contacts and their locations on the dendritic tree, might shed additional light on the extent and role of multiple synaptic contacts between different cell types and brain regions, and hint at the possible “style” of information processing in the network based solely on EM data. As illustrated by our work, a connection formed on distal dendrites and a connection formed on proximal dendrites will have completely different influences on the time course of the somatic voltage and, therefore, on the temporal coding capabilities of the neuron. Indeed, simply reporting connectivity maps, and even the size of post-synaptic density (PSD) areas, is not enough to determine the temporal influence of the connection on the post-synaptic cell, as the dendritic location of the synapse is key, as was also suggested in (Liu et al. 2021). Also, due to the nonlinear amplification of dendrites, it will be crucial to know whether two pre-synaptic neurons connect to similar locations on the dendritic tree, since inputs at nearby locations are much more likely to undergo nonlinear amplification (and might produce additional broad/slow NMDA or calcium-dependent temporal filters) if activated at similar times.

## Methods

### F&F neuron simulation details

An F&F neuron receives *N*_{axons} input axons, indexed by *i*, whose spike trains are denoted *X*_{i}(*t*). Each axon connects to the dendrite via *M* contacts. Each contact connects to the dendrite at a location denoted by index *j*, and filters the incoming axonal spike train with a specific synaptic kernel *K*_{j}(*t*). The kernels are typical double-exponential PSP shapes of the form *K*_{j}(*t*) = *A* · (*e*^{−*t*/*τ*_{decay,j}} − *e*^{−*t*/*τ*_{rise,j}}), where *A* is a normalization constant such that each filter has a maximum value of 1, and *τ*_{decay,j}, *τ*_{rise,j} are different for each contact, sampled randomly and independently for each contact from the ranges *τ*_{decay,j} ∈ [12*ms*, 30*ms*], *τ*_{rise,j} ∈ [1*ms*, 12*ms*]. Different kernel parameters represent a randomly connected axon-dendrite location.
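The double-exponential kernels described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code: the function name `sample_contact_kernels`, the 150 ms kernel length and the seeding are assumptions made here, while the time-constant ranges and the peak normalization follow the text.

```python
import numpy as np

def sample_contact_kernels(n_contacts, kernel_len_ms=150, seed=0):
    """Sample double-exponential PSP kernels on a 1 ms grid.

    The name, kernel length and seed are illustrative assumptions;
    the tau ranges (in ms) are those stated in the text.
    """
    rng = np.random.default_rng(seed)
    tau_decay = rng.uniform(12.0, 30.0, size=n_contacts)
    tau_rise = rng.uniform(1.0, 12.0, size=n_contacts)
    t = np.arange(kernel_len_ms, dtype=float)  # time axis in ms
    kernels = (np.exp(-t[None, :] / tau_decay[:, None])
               - np.exp(-t[None, :] / tau_rise[:, None]))
    # Dividing by the peak plays the role of the normalization constant A.
    kernels /= kernels.max(axis=1, keepdims=True)
    return kernels, tau_rise, tau_decay
```

Each row of `kernels` is one contact kernel. Because the rise constant is always smaller than the decay constant, every kernel starts at zero, rises to a peak of 1 and then decays.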

Filtering the corresponding input axon with the contact kernel forms the contact voltage contribution trace *V*_{c,j}(*t*) = (*X*_{i} ∗ *K*_{j})(*t*), where *X*_{i}(*t*) is the spike train of the axon *i* from which contact *j* originates and ∗ denotes convolution in time. There are a total of *M* · *N*_{axons} such contact voltage contribution traces. In vector notation we denote *V*_{c}(*t*) = [*V*_{c,1}(*t*), …, *V*_{c,*M*·*N*_{axons}}(*t*)]^{T}. Each synaptic contact has a weight, *w*_{j}. In vector notation we write *w* = [*w*_{1}, …, *w*_{*M*·*N*_{axons}}]^{T}. Each local synaptic response is multiplied by its corresponding weight to form the somatic voltage *V*_{s}(*t*) = *w*^{T} · *V*_{c}(*t*) = ∑_{j} *w*_{j} · *V*_{c,j} (*t*). When threshold is reached, the voltage is reset, and a negative rectifying current is injected that decays to zero with a time constant of 15ms. Note that, for mathematical simplicity, we do not impose any restrictions on synaptic contact weights: each weight can be either positive or negative, regardless of which axon it comes from.
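The subthreshold part of this computation can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, contact *j* is assumed to belong to axon `j // contacts_per_axon` (an ordering chosen here for convenience), and the threshold, reset and decaying negative current are deliberately left out.

```python
import numpy as np

def contact_voltages(spikes, kernels, contacts_per_axon):
    """Causally filter each axon's spike train with each of its contact kernels.

    spikes:  (n_axons, T) binary array in 1 ms bins
    kernels: (n_axons * contacts_per_axon, L) array of contact kernels
    Assumed ordering: contact j belongs to axon j // contacts_per_axon.
    Returns V_c with shape (n_axons * contacts_per_axon, T).
    """
    n_axons, T = spikes.shape
    v_c = np.zeros((n_axons * contacts_per_axon, T))
    for j in range(v_c.shape[0]):
        axon = j // contacts_per_axon
        # Truncating the full convolution to T samples keeps the filter causal.
        v_c[j] = np.convolve(spikes[axon], kernels[j])[:T]
    return v_c

def somatic_voltage(weights, v_c):
    """V_s(t) = w^T V_c(t): weighted sum over all contact traces."""
    return weights @ v_c
```

Since the somatic voltage is linear in the weights, fitting the weights reduces to a linear classification problem on the columns of `v_c`.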

### I&F neuron simulation details

The I&F simulation details are identical in all ways to those of the F&F neuron, except that all of its contact kernels are identical, *K*_{I&F}(*t*) = *A* · (*e*^{−*t*/*τ*_{decay,I&F}} − *e*^{−*t*/*τ*_{rise,I&F}}), where *τ*_{rise,I&F} = 1*ms* and *τ*_{decay,I&F} = 30*ms*.

### Capacity calculation experimental details

To measure capacity, we sample *N*_{axons} random spike trains to serve as axons, each from a Poisson process with an instantaneous firing rate of 4Hz, for a period of 120 seconds. We randomly distribute *N*_{spikes} output spikes throughout the 120 second time period to generate *y*_{GT}(*t*). For the sake of mathematical simplicity, and to avoid dealing with reset issues, we make sure that the minimal distance between two consecutive spikes is at least 120ms (= *4* · *τ*_{decay,I&F}). We bin time into 1ms time bins. We calculate *V*_{c}(*t*) for the entire trace. Our task is to find a set of weights such that *y*_{GT}(*t*) = *φ*(*w*^{T} · *V*_{c}(*t*)), where *φ* (·) is a simple thresholding function. This is in essence a binary classification dataset with 120,000 timepoints (milliseconds); for each of those timepoints the required output of a binary classifier is either 1 (for timepoints that should emit an output spike) or 0 (for all other timepoints). We have a total of *M* · *N*_{axons} weights that need to fit the entire 120K-sample dataset. We calculate the AUC on the entire 120K-datapoint dataset, and declare the fit successful when AUC > 0.99. We repeat the procedure for various values of *N*_{spikes}. The maximal value of *N*_{spikes} that we still manage to fit with AUC > 0.99 is termed *N*_{spikes,max}, and the capacity measure is this number normalized by the number of axons used, *Capacity*(*M*) = *N*_{spikes,max}*/N*_{axons}. Note that since *N*_{spikes} ≪ 120,000, the capacity effectively does not depend on the length of the time period we use, but rather on the number of spikes we wish to precisely time.
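The data-generation and scoring steps of this procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, Poisson spiking is approximated by an independent Bernoulli draw per 1 ms bin, and the AUC is computed with a plain rank statistic that ignores ties. The weight-fitting step itself is not shown, since the text above does not specify the optimizer.

```python
import numpy as np

def poisson_spike_trains(n_axons, duration_ms, rate_hz=4.0, seed=0):
    """Binary raster (n_axons, duration_ms), one Bernoulli draw per 1 ms bin."""
    rng = np.random.default_rng(seed)
    p_spike = rate_hz / 1000.0  # spike probability per 1 ms bin
    return (rng.random((n_axons, duration_ms)) < p_spike).astype(np.uint8)

def sample_target_spikes(n_spikes, duration_ms, min_gap_ms=120, seed=0):
    """Ground-truth output y_GT(t) with at least min_gap_ms between spikes."""
    rng = np.random.default_rng(seed)
    free_bins = duration_ms - (n_spikes - 1) * min_gap_ms
    if free_bins < n_spikes:
        raise ValueError("too many spikes for this duration and minimal gap")
    # Draw sorted distinct times on a shortened axis, then push the k-th spike
    # forward by k * min_gap_ms so consecutive spikes are far enough apart.
    base = np.sort(rng.choice(free_bins, size=n_spikes, replace=False))
    times = base + np.arange(n_spikes) * min_gap_ms
    y_gt = np.zeros(duration_ms, dtype=np.uint8)
    y_gt[times] = 1
    return y_gt

def roc_auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC; assumes there are no tied scores."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

With the somatic voltage of a fitted model as `scores`, a fit would count as successful when `roc_auc(y_gt, scores) > 0.99`. Repeating this for increasing numbers of target spikes gives *N*_{spikes,max}.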

### Spatio-temporal MNIST task details

The MNIST dataset contains 60,000 training images and 10,000 test images. The images are of size 28×28 pixels. We crop the images at the center to 20×20 pixels, and binarize the values. We then convert the horizontal spatial image dimension (width) into a temporal dimension by uniformly warping time such that 20 horizontal pixels are mapped into T milliseconds, where T is the pattern presentation duration. The vertical spatial image dimension (height) is simply replicated 5 times, so that 20 vertical pixels are mapped into 100 axons. The entire training set is then concatenated sequentially (in random order), with 70ms of zeros between every two patterns. We then sample spikes for every 1ms time bin according to each axon’s instantaneous firing rate to generate the raster for the entire training set. On top of that, we add a background noise rate. The output ground truth is a single output spike 1ms after the pattern is presented for the positive class digit, and no spikes for negative patterns. The fitting of the model is identical to that described in the capacity section, i.e., we wish to find a set of weights such that *y*_{GT}(*t*) = *φ*(*w*^{T} · *V*_{c}(*t*)), where *φ* (·) is a simple thresholding function. In this case, we allow for some wiggle room when evaluating performance on the test set. A true positive (hit) is counted if at least one spike occurs within the 10ms time window around the desired ground-truth spike. A false positive (false alarm) is counted if a spike occurs within the 10ms time window around the end of the presentation of a negative pattern. We measure the classification accuracy under these criteria. We sometimes train on only a part of the training dataset and use only *N*_{positive samples} as the positive class.
In the regime of unreliable synaptic transmission with some synaptic release probability *p*, we sample the spikes of the input patterns once, and then present them *N*_{epochs} = 15 times, each time with an independent random release sample for each pre-synaptic spike at each contact. In those cases, we perform the same procedure for the test set as well (i.e., we display the same pattern *N*_{epochs} = 15 times, each time with a different synaptic release sample for each pre-synaptic spike at each contact).
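The image-to-raster conversion can be sketched as follows. This is a hedged illustration: the function names, the binarization threshold, the per-bin spike probabilities `on_prob` and `noise_prob`, and the choice to place replicated rows next to each other are all assumptions introduced here, since the text does not give these values. Pixel intensities are assumed to be scaled to [0, 1], and the target spike is placed in the first bin after the last pattern bin, which is one reading of "1ms after the pattern".

```python
import numpy as np

def image_to_rate_map(image, T_ms=40, n_replicas=5, crop=20, bin_threshold=0.5):
    """Map a 2D image to an (n_replicas * crop, T_ms) binary drive pattern."""
    h, w = image.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = image[top:top + crop, left:left + crop]       # center crop
    binary = (patch > bin_threshold).astype(float)        # binarize
    cols = (np.arange(T_ms) * crop) // T_ms               # uniform time warp
    warped = binary[:, cols]                              # width -> time
    return np.repeat(warped, n_replicas, axis=0)          # height -> axons

def sample_spikes(rate_map, on_prob=0.2, noise_prob=0.001, seed=0):
    """Sample one spike/no-spike per 1 ms bin; probabilities are assumptions."""
    rng = np.random.default_rng(seed)
    p = np.clip(rate_map * on_prob + noise_prob, 0.0, 1.0)
    return (rng.random(p.shape) < p).astype(np.uint8)

def build_sequence(rate_maps, is_positive, gap_ms=70):
    """Concatenate patterns with silent gaps and build the target y_GT(t)."""
    segments, targets = [], []
    for rate_map, positive in zip(rate_maps, is_positive):
        n_axons, T_ms = rate_map.shape
        segments += [rate_map, np.zeros((n_axons, gap_ms))]
        y = np.zeros(T_ms + gap_ms)
        if positive:
            y[T_ms] = 1  # first bin after the pattern ends
        targets.append(y)
    return np.concatenate(segments, axis=1), np.concatenate(targets)
```

A typical use would build the drive pattern for every image, concatenate them with `build_sequence`, and then call `sample_spikes` once on the full sequence to obtain the input raster.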

### Spatio-temporal Logistic Regression (LR) details

The spatio-temporal, temporally sliding logistic regression (LR) model is a non-biologically-plausible model that serves as an aspirational baseline for comparison with our proposed F&F neuron. It has a 2-dimensional weight matrix *W*_{LR}(*s, t*), where *s* spans the spatial dimension and *t* spans the temporal dimension. Mathematically, the relationship between the input and the output of the model is given by *y*(*t*) = *σ*(∑_{s} ∑_{τ} *W*_{LR}(*s*, *τ*) · *X*_{s}(*t* − *τ*)), where *X*_{s}(*t*) is the spike train of axon *s*, *σ*(·) is the logistic sigmoid function, *s* runs over the *N*_{axons} axons and *τ* runs over the *T*_{LR} time bins of the temporal window. Note that this model has in total *N*_{axons} · *T*_{LR} weights (time is discretized into 1ms time bins here as well, as throughout this study).
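A forward pass of such a temporally sliding LR model can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the function name and the optional `bias` term are hypothetical, and the window is taken to cover the current bin and the *T*_{LR} − 1 bins before it.

```python
import numpy as np

def sliding_lr_output(spikes, W, bias=0.0):
    """Sigmoid of a causal spatio-temporal filter applied to a spike raster.

    spikes: (n_axons, T) binary array in 1 ms bins
    W:      (n_axons, T_LR) weights; W[s, tau] multiplies spikes[s, t - tau]
    bias:   scalar offset (an assumption, not mentioned in the text)
    """
    n_axons, T = spikes.shape
    z = np.full(T, float(bias))
    for s in range(n_axons):
        # Truncated full convolution = causal sliding window over the past.
        z += np.convolve(spikes[s], W[s])[:T]
    return 1.0 / (1.0 + np.exp(-z))
```

Unlike the F&F neuron, where each axon is constrained to a weighted sum of *M* fixed kernels, here every time lag of every axon has its own free weight, which is why this model serves as an upper reference.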

### F&F learned weights visualization

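In the F&F model, a single input axon is filtered by multiple contact kernels, so the somatic voltage can be written as *V*_{s}(*t*) = ∑_{i} ∑_{j∈C_{i}} *w*_{j} · (*X*_{i} ∗ *K*_{j})(*t*), where *X*_{i}(*t*) is the spike train of axon *i*, C_{i} is the set of contacts made by axon *i*, and ∗ denotes convolution in time. In order to display the effective linear model weights, we can group together all kernels that relate to the same axon: *V*_{s}(*t*) = ∑_{i} (*X*_{i} ∗ ∑_{j∈C_{i}} *w*_{j} · *K*_{j})(*t*). Therefore, the term ∑_{j∈C_{i}} *w*_{j} · *K*_{j}(*t*) is a composite kernel that filters each axon. This function can be visualized for each input axon, and can therefore be related directly to the input space.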
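The grouping of contact kernels into one composite kernel per axon can be sketched as follows. The function name is hypothetical, and contact *j* is assumed to belong to axon `j // contacts_per_axon`, an ordering chosen here for illustration.

```python
import numpy as np

def composite_kernels(weights, kernels, contacts_per_axon):
    """Sum the weighted contact kernels of each axon into one kernel per axon.

    weights: (n_axons * contacts_per_axon,) learned contact weights
    kernels: (n_axons * contacts_per_axon, L) contact kernels
    Assumed ordering: contact j belongs to axon j // contacts_per_axon.
    Returns an (n_axons, L) array of composite kernels.
    """
    n_axons = kernels.shape[0] // contacts_per_axon
    weighted = weights[:, None] * kernels
    return weighted.reshape(n_axons, contacts_per_axon, -1).sum(axis=1)
```

Stacking the rows of the result gives an axons-by-time image that can be compared directly with the spatio-temporal input patterns.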

## Code and data availability

All code necessary to reproduce the results of this paper is available on GitHub at: https://github.com/SelfishGene/filter_and_fire_neuron

All data and live scripts to reproduce all figures are available on Kaggle at the following link: https://www.kaggle.com/selfishgene/fiter-and-fire-paper

## Acknowledgements

We thank all lab members of the Segev and London Labs for many fruitful discussions and valuable feedback. This work was supported by the Drahi family foundation [https://plfa.info/], a grant from the ETH domain for the Blue Brain Project [https://www.epfl.ch/research/domains/bluebrain/], the Gatsby Charitable Foundation [gatsby.org.uk] and the NIH Grant Agreement U01MH114812 (I.S.).

## Footnotes

Updated Author List on bioRxiv (in the manuscript itself everything was OK in the first place)