## Abstract

We introduce a novel approach to study neurons as sophisticated I/O information processing units by utilizing recent advances in the field of machine learning. We trained deep neural networks (DNNs) to mimic the I/O behavior of a detailed nonlinear model of a layer 5 cortical pyramidal cell, receiving rich spatio-temporal patterns of input synapse activations. A Temporally Convolutional DNN (TCN) with seven layers was required to accurately, and very efficiently, capture the I/O of this neuron at the millisecond resolution. This complexity primarily arises from local NMDA-based nonlinear dendritic conductances. The weight matrices of the DNN provide new insights into the I/O function of cortical pyramidal neurons, and the approach presented can provide a systematic characterization of the functional complexity of different neuron types. Our results demonstrate that cortical neurons can be conceptualized as multi-layered “deep” processing units, implying that the cortical networks they form have a non-classical architecture and are potentially more computationally powerful than previously assumed.

## Introduction

Neurons are the computational building blocks of the brain. Understanding their input-output (I/O) transformation has therefore been a major quest in neuroscience since Ramon y Cajal’s “neuron doctrine”. With the recent development of sophisticated genetical, optical and electrical techniques it has become clear that many key neuron types (e.g., cortical and hippocampal pyramidal neurons, cerebellar Purkinje cells) are highly complicated I/O information processing devices. They receive a barrage of thousands of synaptic inputs via their elaborated dendritic branches; these inputs interact with a plethora of local nonlinear regenerative processes, including the back-propagating (*Na*^{+}-dependent) action potential (G. J. Stuart and Sakmann 1994), the multiple local dendritic NMDA-dependent spikes (Polsky, Mel, and Schiller 2004; Branco, Clark, and Häusser 2010; Kastellakis et al. 2015), and the large and prolonged *Ca*^{2+} spike at the apical dendrite of L5 cortical pyramidal neurons (M E Larkum, Zhu, and Sakmann 1999). As a result of local nonlinear dendritic processing, a train of output spikes are generated in the neuron axon, carrying information that is communicated, via synapses, to thousands of other (postsynaptic) neurons. Indeed, as a consequence of their inherent nonlinear mechanisms, neurons can implement highly complicated I/O functions (Poirazi, Brannon, and Mel 2003b; Shepherd et al. 1985; C Koch, Poggio, and Torres 1982; Bar-Ilan, Gidon, and Segev 2012; Christof Koch and Segev 2014; London and Häusser 2005; Behabadi and Mel 2013; Häusser and Mel 2003; Mel 1992; Poirazi, Brannon, and Mel 2003a; Moldwin and Segev 2018; Hawkins and Ahmad 2016; Zador, Claiborne, and Brown, n.d.; Bai, Zico Kolter, and Koltun 2019; Cazé, Humphries, and Gutkin 2013; Doron et al. 2017; Greg Stuart, Spruston, and Häusser 2007), and see recent work on dendritic computations in human cortical neurons in (Gidon et al. 2020).

A classical approach to study the I/O relationship of neurons is to ignore their biological details and obtain a highly reduced phenomenological abstraction of the neuron’s I/O characteristics (McCulloch and Pitts 1943; Lapicque 1907). One such abstraction was provided by the “perceptron” (Rosenblatt and F. 1958), which lies at the heart of some of the most advanced pattern recognition techniques to date (LeCun, Bengio, and Hinton 2015) . However, the basic function of the perceptron, a linear summation of its inputs and thresholding for output generation, highly oversimplifies the synaptic integration processes taking place in real neurons. Some studies have addressed this gap, but not at the scale of a realistic number of synaptic inputs, full dendritic biophysical mechanisms, and high temporal precision of spike output (Poirazi, Brannon, and Mel 2003b; Polsky, Mel, and Schiller 2004; Ujfalussy et al. 2018).

Another common approach to study the I/O of neurons is to simulate, via a set of partial differential questions, the fine electrical and anatomical details of the neurons using the cable and compartmental modeling methods introduced by Rall (Rall 1959; Rall 1964; Segev and Rall 1988). Using these models, it is possible to account for nearly all of the above experimental phenomena and to explore conditions that are not accessible with current experimental techniques. However, this success comes at a price because compartmental and cable models are composed of a high dimensional system of coupled nonlinear differential equations, which is notoriously challenging to understand (Strogatz 2001). Specifically, it is a daunting task to extract general principles that govern the transformation of thousands of synaptic inputs to a train of spike output at the millisecond precision from such detailed simulations, but see (Rapp, Yarom, and Segev 1992; G. J. Stuart and Sakmann 1994; G Stuart et al. 1997; Matthew E Larkum et al. 2009; Schiller et al. 2000; Magee and Johnston 1995; Spruston et al. 1995).

Here we propose a novel approach to study the neuron as a sophisticated I/O information processing unit by utilizing recent advances in the field of machine learning. Specifically, we utilized the capability of deep neural networks (DNNs) to learn very complex I/O mappings, in particular that of neurons. Towards this end, we trained DNNs with rich spatial and temporal patterns to mimic the I/O behavior of a layer 5 cortical pyramidal neuron with its full complexity, including its elaborated dendritic morphology, the highly nonlinear local dendritic membrane properties, and a large number of excitatory and inhibitory inputs that bombard the neuron. Consequently, we obtained a computationally efficient DDN model that faithfully predicted the output of this neuron at a millisecond temporal resolution. We then analyzed the weight matrices of the DNN to gain new insights into the I/O function of cortical neurons. By systematically varying the DNN size, this approach allowed us to characterize the functional complexity of a single biological neuron and to pin down its origin. We demonstrated that cortical pyramidal neurons, and the networks that they form, are potentially computationally much more powerful, and “deeper”, than previously assumed.

## Results

Our goal is to fit the I/O relationship of a detailed biophysical neuron model to an analogous DNN. This DNN receives, as a training set, the identical synaptic input and the respective axonal output of the biophysical model. By changing the connection strengths of the DNN using a back-propagation learning algorithm, the DNN should replicate the I/O transformation of the detailed model. To accommodate for the temporal aspect of the input, we employed temporal convolutional networks (TCN) throughout the study.

Figure 1 demonstrates the feasibility and usefulness of this paradigm starting with the I/O transformation of a well understood neuron model: the integrate and fire (I&F) neuron (Lapicque 1907; Burkitt and N. 2006). This neuron receives a train of random synaptic inputs and produces a subthreshold voltage response as well as a spiking output (see **Methods**). But what is the simplest DNN that faithfully captures the I/O properties of this most basic single neuron model? To answer this question, we constructed DNN consisting of one hidden layer with a single hidden unit (Fig. 1**A**). The time axis was divided into 1ms time bins in which only a single spike can occur in the I&F neuron model. The objective of the DNN network is to predict the binary spike output of the I&F model at time t_{0}, based on the preceding input spike trains up to t_{0}. This input is represented using a binary matrix of size *N*_{syn} × *T*, where N_{syn} is the number of input synapses, and *T* is the number of preceding time bins considered (Fig. 1**B**). We used *N*_{syn} = 100, and trained a DNN with a single hidden unit on 5000 seconds of simulated data. When using T = 80ms we achieved a very good fit, namely: a simple DNN with a single hidden unit that accurately predicted both the subthreshold voltage dynamics as well as the spike output of the respective I&F neuron model at a millisecond precision (Fig. 1**C**).

Figure 1**D** depicts the weights (“filters”) of the single hidden unit of the respective DNN as a heatmap. It shows that the learning process automatically produced two classes of weights (“filters”), one positive and one negative, corresponding to the excitatory and inhibitory inputs impinging on the I&F model. In agreement with our understanding of the I&F model, the excitatory inputs contribute positively to output spike prediction (red color), whereas the inhibitory inputs contribute negatively it (blue color). Earlier inputs, either inhibitory or excitatory, contribute less to this prediction (teal color). Figure 1**E** depicts the temporal cross-section of those filters and reveals an exponential profile that reflects the temporal decay of post synaptic potentials in the I&F model (in the reverse time direction). From these filters one can recover the precise membrane time constant of the I&F model. These two temporal filters (excitatory and inhibitory) are easily interpretable as they agree with our previous understanding of the temporal behavior of synaptic inputs that give rise to an output spike in the I&F model.

Figure 1**F** quantifies the model performance in terms of spike prediction using the Receiver Operating Characteristic (ROC) curve (Fig. 1**F** left and see **Methods**) and the area under it (AUC). AUC for the I&F case is 0.9971, indicating a very good fit. Figure 1**F** right shows an additional quantification of spike temporal precision using the DNN prediction, by plotting the cross correlogram between the predicted spike train and the target I&F simulated spike train (the “ground truth”). The cross correlogram shows a sharp peak at 0 millisecond and has a short, ~10 millisecond, half-width, suggesting high temporal accuracy of the DNN. We also quantified the DNN performance to predict the subthreshold membrane potential by using standard regression metrics, and, in Fig 1**G**, depict the scatter plot of the predicted voltage versus the ground truth simulated output voltage. The Root Mean Square Error (RMSE) is 1.23mV, indicating a good fit between the I&F and the respective DNN.

In conclusion, as a proof of concept, we have demonstrated that a simple DNN can learn the I/O transformation of a simple I&F model with a high degree of temporal accuracy. The weight matrix (the “filter”) obtained by the learning process represents known features of the I&F model, including the existence of two classes of inputs (excitatory and inhibitory), the convolution of the synaptic inputs with the exponential decay representing the passive membrane RC properties, and the transformation from subthreshold membrane potential to spike output.

We next applied our paradigm to a morphologically and electrically complex detailed biophysical compartmental model of a 3D reconstructed layer 5 cortical pyramidal cell (L5PC) from rat somatosensory cortex (Fig. 2**A**). The model is equipped with complex nonlinear membrane properties, a somatic spike generation mechanism and an excitable apical nexus capable of generating calcium spikes (Hay et al. 2011; M E Larkum, Zhu, and Sakmann 1999). The excitatory synaptic inputs are mediated through both voltage-independent AMPA-based conductance as well as by voltage-dependent NMDA-type conductance; the inhibitory inputs are mediated through conductance-based GABA_{A}-type synapses. Both excitatory and inhibitory synapses are uniformly distributed across the dendritic tree of the model neuron (see **Methods**).

A thorough search of configurations of deep and wide fully-connected neural network architectures (FCNs) have failed to provide a good fit to the I/O characteristics of the L5PC model. These failures suggest a substantial increase in the complexity of I/O transformation compared to that of I&F. Indeed, only temporally convolutional network architecture (TCN) with 7 layers and 128 channels per layer, provided a good fit (Fig. 2**B, C** Fig. S5**)**. The example in Fig. 2C shows that this TCN, when provided with a previously unseen input pattern from the test set, can predict the somatic voltage and spikes of a highly complex neuron with high precision.

It is important to note that the accuracy of the model was insensitive to the temporal kernel sizes of the different DNN layers when keeping the total temporal extent of the entire network fixed, so the temporal extent of the first layer was selected to be larger than subsequent layers mainly for visualization purposes (see Fig. 2**G**-**I**). Figure 2**H** shows a filter from a unit at the first layer of the DNN. This filter is somewhat similar to the filter in Fig. 1**D** but integrates only basal and oblique subtrees and ignores the inputs from the apical tree. Figure 2**I**, however, shows a filter of another unit that, in contrast to Fig. 2**H**, has negligible weights assigned for basal and oblique dendrites but a very strong apical tuft dependency. By examining additional first layer filters (not shown) we found a wide variety of different activation patterns that the TCN utilized as an intermediate representation, including many temporally directionally selective filters (similar to those of Fig. 4**D**). Figure 2**D**,2**E**,2**F** shows the quantitative performance evaluation of this DNN model. For binary spike prediction (Fig. 2**D**), the AUC is 0.9911. For somatic voltage prediction (Fig. 2**E**), the RMSE is 0.71mV and 94.6% of the variance is explained by this model. Note that, despite its seemingly large size, the resulting TCN represents a substantial decrease in computational resources relative to a full simulation of a detailed biophysical model (involving numerical integration of thousands of non-liner differential equations), as indicated by a speedup of simulation time by several orders of magnitude.

Now that we have obtain an alternative DNN model that can replicate very accurately the I/O relationship of a detailed biophysical/compartmental model of a real neuron, can we learn from it what are the essential features that contribute to neuron complexity? Detailed studies of synaptic integration in dendrites of cortical pyramidal neurons suggested the primary role of the voltage-dependent current through synaptic NMDA receptors, including at the subthreshold and suprathreshold (the NMDA-spike) regimes (Polsky, Mel, and Schiller 2004; Branco, Clark, and Häusser 2010). As NMDA receptors depend nonlinearly on voltage it is highly sensitive not only to the activity of the synapse in which the receptors are located but also to the activity of (and the voltage generated by) neighboring synapses and to their dendritic location. Moreover, the NMDA-current has slow dynamics, promoting integration over a time window of tens of milliseconds (Major, Larkum, and Schiller 2013; Doron et al. 2017). Consequently, we hypothesized that removing NMDA dependent synaptic currents from our L5PC model will significantly decrease the size of the respective DNN.

Fig. 3 shows that, after removing the NMDA voltage dependent conductance, such that the excitatory input relies only on AMPA mediated conductances, we have managed to achieve a similar quality fit as in Fig. 2 when using a much smaller network - a fully connected DNN (FCN) with 128 hidden units and only a single hidden layer (Fig. 3**B**). This significant reduction in complexity is due to the ablation of NMDA channels. Also, in our DNN training attempts, we have failed to achieve a good fit when using the architecture that was successful for the I&F model neuron shown in Fig. 1. This indicates that, whereas the DNN model for L5PC is greatly simplified in the absence of NMDA conductance, additional neuronal mechanisms still contribute to the richness of its I/O transformation as compared to that of the I&F neuron model.

Fig. 3**C** shows an exemplar test trace for the DNN illustrated in Fig. 3**B**, whereas **Fig.** 3**H** depicts a representative exemplar of the weight matrix for one of the hidden units of the DNN. By examining the filters of the hidden layer of the DNN we observed that the weights representing inputs to the oblique and basal dendrites had profiles that resembles PSPs (but mirrored in time). Interestingly, the weights to the apical tuft are essentially zero. This pattern remains consistent for all first layer filters of the network, implying that, for this model, the apical dendritic synapses had negligible information regarding predictions of the output spikes of the neuron, even in the presence of calcium spikes occasionally occurring in the nexus. Contrasting this filter with the one presented in Fig. 2**I**, suggests that the NMDA non-linearity greatly assists in the activation of apical tuft dendrites.

Fig. 3**D**, 3**E** and 3**F** show the quantitative performance evaluation. For binary spike prediction (Fig. 3**E**), the AUC is 0.9913. For somatic voltage prediction (Fig. 3**F**), the RMSE is 0.58mV and 95.0% of the variance is explained by this model.

To further investigate the NMDA contribution to computational complexity, we studied it in isolation on a single dendritic branch and attempted to fit the somatic voltage of a Layer 2/3 Pyramidal Cell (L23PC), taken from the visual cortex of the mouse (Branco, Clark, and Häusser 2010), in response to random activation of only 9 excitatory synapses uniformly distributed across a single dendritic branch (Fig. 4**A**). We found that a single dendritic branch with NMDA synapses is faithfully captured by a single layer of a fully connected DNN with 4 hidden units (Fig. 4**A**&**B**). Examining the 4 filters of the first layer reveals interesting shapes that make intuitive sense (as first explored by the pioneering theoretical studies of (Goldstein and Rall 1974; Rall 1969; Rall 1964)). The topmost filter in Fig. 4**D** appears to be summing only very recent and proximal dendritic activation. The second-from-top hidden unit sums up recent distal dendritic synaptic inputs. The third filter clearly shows a direction selective hidden unit, preferring patterns in which synaptic activation are temporally activated from distal to proximal series, and the last hidden unit responds to a prolonged distal dendrite summation of activity combined with precisely timed proximal input activation.

Fig. 4**C** specifically examines the special cases studied in (Branco, Clark, and Häusser 2010), of sequential activation of the 9 synapses in the distal to proximal direction and, conversely, in the proximal to distal direction. Our DNN network, trained on random synaptic activation patterns, successfully replicated this behavior. Fig. 4**E** shows our reconstruction of the results of Branco et al. (Branco, Clark, and Häusser 2010), whereby a directionality index was suggested as a possible predictor for the peak somatic voltage for random activation sequence of the 9 input synapses. Fig. 4**F** shows the prediction of the respective DNN for the same sequences as in Fig. 4**E**. Interestingly, our DNN acts as a much better predictor for the peak somatic voltage. It is important to note that the special case of 9 synaptic activations equally spaced in time is highly unlikely to occur during the random input stimulation regime which was used to train the DNN. Nevertheless, the network can generalize even to this new input regime with high precision.

These results demonstrate the power of our approach in interpretability and ability of our models to generalize to previously unseen input patterns (out of distribution generalization). By examining the four kernels we provide an intuitive (Fig. 4D), yet powerful (Fig. 4F), interpretation for the complex process of nonlinear synaptic integration in a single dendrite with NMDA synapses. In addition, the network generalizes and predicts the outputs for temporal patterns very different from those on which it was trained.

## Discussion

Recent advances in the field of deep neural networks (DNNs), provide, for the first time, a powerful general-purpose tool that can learn a mapping of I/O relationships from examples, including that of single complex nonlinear neurons, as demonstrated by the present work. We constructed a large dataset of pairs of input-output examples by simulating a neuron model receiving a rich repertoire of synaptic inputs and recorded its output in terms of spikes at millisecond precision as well as its subthreshold membrane potential. We then trained networks of various configurations on these input-output pairs until we obtained an analogous “deep” network with close performance to that of the neuron’s detailed simulation. We applied this framework to a series of neuron models with various levels of morpho-electrical complexity.

For simple I&F neuron models, our framework provides simple DNNs that capture the full I/O relationship of the model while providing key insights that are consistent with our understanding of the I&F models (Fig. 1). Surprisingly, even a neuron model of a layer 5 cortical pyramidal neuron with complex dendritic trees and with a host of dendritic voltage dependent currents and AMPA-based synapses is well-captured by a relatively simple network with a single hidden layer (Fig. 3). However, in a full model of a L5 pyramidal neuron consisting of NMDA-based synapses, the complexity of the network is significantly increased, and we found a good fit with a temporal-convolutional network that has 7 hidden layers (Fig. 2, see also Fig. S4 & S5). However, in the case where there wasonly a single dendritic branch consisting of NMDA synapses, a shallow (one hidden layer) DNN with only a few units was required to capture different aspect of spatio-temporal integration of synaptic inputs (Fig. 4).

These results suggest that the single cortical neuron with its nonlinear synaptic inputs is already, on its own, a sophisticated computational unit. Consequently, cortical networks are “deeper” and computationally more powerful than they seem to be just based on their anatomical (pre-to-post synaptic) connections. Importantly, the implementation of the I/O function using a DNN also provides practical advantages as it is much more efficient than the traditional compartmental model. In our tests we obtained speed up of several orders of magnitude when using the DNN instead of its compartmental-model counterpart. This could make an important contribution to simulating large scale realistic neuronal networks (Markram et al. 2015; Egger et al. 2014). Furthermore, the size of the respective DNN for a given neuron could be used (under certain assumptions, see below) as an index for its computational power; the larger it is the more sophisticated computations this neuron could perform. Such an index will enable a systematic comparison between different neuron types (e.g., CA1 pyramidal cell, cortical pyramidal cell, and Purkinje cell, or for the same type of cell in different species, e.g., mouse vs. human cortical pyramidal cells).

It is important to emphasize that, due to optimization, the complexity measure described above is an upper bound of the true computational complexity of the I/O of a single neuron, i.e., it is possible that there exists a much smaller neural network that could mimic the biophysical neuron with a similar degree of accuracy but the training process we used could not find it. Additionally, we note that we have limited our architecture search space only to fully connected (FCN) and temporally convolutional (TCN) neural network architectures. It is likely that additional architectural search could yield even simpler and more compact models for any desired degree of prediction accuracy. In order to facilitate this search inthe scientific community, we hereby release our large readymade dataset of simulated inputs and outputs of a fully complex single layer 5 cortical neuron in an *invivo* like regime so that the community can focus on modelling various aspects of this endeavour and avoid running the simulations themselves.

The analysis of deep neural networks is a challenging task and a rapid growing field (Olah, Mordvintsev, and Schubert 2017; Mordvintsev, Olah, and Tyka 2015; Mahendran and Vedaldi 2015). Nevertheless, observing the weight matrix of units (“filters”) in first layer of the respective DNN is straightforward and can provide ample insights regarding the I/O transformation of the neuron. The full network can be interpreted as consisting of a basic set of filters that span the space of possible spatio-temporal patterns of synaptic inputs that will drive the original neuron to spike. The first layer defines this space, and the rest of the network mixes and matches within that space. For example, as shown in Fig. 3, in the case of a pyramidal neuron without NMDA synapses, most filters have significant weights only for basal and oblique inputs, and the weight given for apical tuft synapses is negligible (despite the existence of voltage dependent Ca^{2+} and other nonlinear currents in this model including occasional Ca^{2+} spikes). The picture is fundamentally different when NMDA synapses were included in the model. In this case the weights assigned to apical dendrite synapses is significant. Moreover, the filters devoted to these apical inputs tend to have a temporal structure that is significantly wider (in time) than for the proximal synapses, suggesting that the temporal precision of input to the apical synapses is less important. These are basic insights that could be drawn by just observing first layer filters of the resulting DNN.

This work opens multiple additional avenues for future research. One important direction is the isolation of the contribution of specific mechanisms to the computational power of the neuron in a similar way to that performed here for NMDA (See also Fig. S5 & S7). By fitting a DNN while manipulating specific dendritic currents (e.g., VGCa, Kv, HCN channels), we will better understand their contribution to the overall synaptic integration process and to the complexity of the respective DNN. An additional interesting direction is to utilize this work to improve our understanding of dendritic integration. Rather than modelling a neuron bombarded with random meaningless input, one could utilize the neuron’s equivalent-DNN to make the neuron compute some interesting meaningful functions. For example, now that we estimate that a cortical L5 pyramidal neuron is equivalent to a deep network with 7 hidden layers, this DNN could be used to teach the respective neuron to implement a function which is in the scope of the capabilities of such a network, such as classifying hand written digits or a sequence of auditory sounds. One can then both validate the hypothesis that single neurons could perform complex computational tasks and investigate how these neurons can implement such complex tasks.

If one cortical neuron is equivalent to a multi-layered DNN, what are the implications for the cortical microcircuit? Is that circuit merely a “deeper” DNN composed of simple “point neurons”? In fact, a key difference is that, under our model, synaptic plasticity can take place only in the synaptic layer (input) of the respective DNN for a single cortical neuron, whereas the weights of the hidden layers are fixed (and dedicated to represent the I/O function of that single cortical neuron). Taken together with the myriad of recurrent connections and network motifs between cortical neurons of different types (Markram et al. 2015), we hereby propose a very specific network architecture for cortical networks. Indeed, the architecture of artificial neural networks is one of the most rewarding avenues of machine learning today (He et al. 2015; Vaswani et al. 2017; Lin, Chen, and Yan 2014), and studying the specific architecture suggested by our work may unravel some of the inductive bias hidden within the cortical microcircuit and harness it for future AI applications.

## Methods

### I&F simulation

For Fig.1 simulations, membrane voltage was modeled using a leaky I&F simulation , where *w*_{i} denotes synaptic efficacy for each synapse, *t*_{i} denotes presynaptic spike times, and *K*(*t* − *t*_{i}) denotes the temporal kernel of each postsynaptic potential (PSP). We used a temporal kernel with exponential decay where *u*(*t*) is the Heaviside function and *τ* = 20*ms* is the membrane time constant. When threshold was reached, an output spike was recorded, and the voltage was reset to *V*_{rest} = −95*mV*. As input to the simulated I&F neuron, *N*_{exc} = 80 excitatory synapses and *N*_{inh} = 20 inhibitory synapses were used. Synaptic efficacies of *w*_{exc} = 5*mV* were used for excitatory synapses and *w*_{inh} = −5*mV* for inhibitory synapses. Each presynaptic spike train was taken from a Poisson process with a constant instantaneous firing rate. Values used *f*_{exc} = 3.3*Hz* for excitatory synapses and *f*_{inh} = 3.2*Hz* for inhibitory synapses. Resulting output average firing rate for these simulation values was 2.1Hz.

### L5PC simulation

For Fig. 2 and Fig. 3 simulations, we used a detailed compartmental biophysical model of cortical L5PC **as is**, modeled by Hay et. al, 2011. For full description of the model please see **Methods** in the original paper. Briefly, this model contains in total 12 ion channels for each dendritic compartment. Some of the channels are unevenly distributed over the dendritic arbor. In Fig. 3 double exponential conductances based AMPA synapses were used in simulations with *τ*_{rise} = 0.3*ms*, *τ*_{depress} = 3*ms* and *g*_{max} = 0.4*ns*. For Fig. 2 and Fig. 4, in related simulations we used the standard NMDA model by Jahr and Stevens, 1993, with *τ*_{rise} = 2*ms*, *τ*_{depress} = 70*ms*, *γ* = 0.08 *mV*^{−1} and *g*_{max} = 0.4*ns*. For both Fig. 2 and Fig. 3, we also used double exponential GABA A synapses with *τ*_{rise} = 2*ms*, *τ*_{depress} = 8*ms* and *g*_{max} = 1*nS* on each independent dendritic segment, we placed a single AMPA (for Fig. 3) or AMPA+NMDA (for Fig. 2) synapse as well as a single GABA A synapse. In order to mimic uniform coverage of excitatory and inhibitory synapses on the entire dendritic tree, we stimulated each compartment with a firing rate proportional to the segment’s length. Each presynaptic spike train was taken from a Poisson process with a smoothed piecewise constant instantaneous firing rate. The Gaussian smoothing sigma, as well as the time window of constant rate before smoothing were independently resampled for each 6-second simulation from the range 10ms to 1000ms (Fig. S1**D**). This was chosen, as opposed to a constant firing rate, to create additional temporal variety in the data in order to increase the applicability of the results to all possible situations. For Fig. 2 simulations with NMDA synapses, the total amount of excitatory and inhibitory presynaptic spikes per 100ms range between 0 and 800 spikes (Fig. S1). This is equivalent to 8000 excitatory synapses with an average rate of 1Hz and 2000 inhibitory synapses with an average rate of 4Hz. The average output firing rate of the simulated cell across all simulations was 1.0Hz. For Fig. 3 simulations, with AMPA only synapses, the total amount of excitatory and inhibitory presynaptic spikes per 100ms range were increased in order to account for the smaller amount of total current injected due to lack of NMDA current, with the purpose to achieve similar output firing rates of 1.0Hz. See Fig. S8 for detailed comparison.

### L23PC simulation

For Fig. 4 simulations, we used a detailed compartmental biophysical model of cortical L23PC **as is**, modeled by Branco et. al, 2010. In these experiments we stimulated a single branch with 9 dendritic segments with an NMDA synapse on each compartment, with parameters as in the simulation for Fig. 4. The branch was selected as in Branco et al, 2010 in order to perform comparison with original paper. Similarly, to Fig. 2. and Fig. 3 simulations, each presynaptic spike train was taken from a Poisson process with a smoothed piecewise constant instantaneous firing rate. The number of presynaptic input spikes to the branch per 100ms ranged between 0 and 15 in simulations used for training. In Fig. 4**C**,4**E**,4**F**, we repeated input stimulation protocol suggested by Branco et. al, 2010, consisting of single presynaptic spike per synapse with constant time intervals of 5ms between subsequent synaptic activations, only randomly permuting the order of activation between trials.

### DNN fitting

In order to represent the input in a suitable manner for fitting with a DNN, we discretize time using 1ms time bin Δ*t*. Using this discretization, we can represent a spike train as a sequence of binary values *S*[*t*], such that *S*[*t*] ∈ {0,1}, since the length of a spike is approximately 1ms there cannot be more than a single spike in such a time interval. We denote the spike trains the neuron receives as input as *X*[*s*, *t*], *s* ∈ {1,2, …, *N*_{syn}}, *t* ∈ {1,2, …, *T*}, where *s* denotes the synapse index, and *t* denotes time. The spike trains a neuron emits as output we denote as *y*_{spike}[*t*], The somatic voltage trance we denote as *y*_{voltage}[*t*]. For every point in time, we attempt to predict both somatic spiking *y*_{spike}[*t*] and somatic voltage *y*_{voltage}[*t*] based only a *T*_{input} sized window of presynaptic input spikes. i.e. define the vector and a neural network that maps to and . i.e. . We treat spike prediction as a binary classification task and use standard log loss and treat voltage prediction as regression task and use standard MSE loss. We wish to find a model’s parameters *θ* such that we minimize a combined loss , where *w*_{voltage} is the relative importance of the spike prediction loss with respect to the somatic voltage prediction loss. For most of our experiments we set *w*_{voltage} to be about half the size of the spike loss. The DNN architecture we used was a temporally convolutional network (TCN) (Bai, Zico Kolter, and Koltun 2019) and we applied it in a fully convolutional manner on all possible timepoints. Note that when the temporal filter size after the first layer is 1 in a TCN applied as described, this is effectively a fully connected neural network. In most of our experiments we used fully connected neural networks, except for Fig. 2 in which we used a proper TCN with hierarchical convolutional structure. After every convolutional layer, a batch normalization layer immediately follows. We employed a learning schedule regime in which we lowered the learning rate and increased batch size as we progressed through training. Full details of the learning schedule in each case are in the attached code repository. For generation of Fig. 5S. we trained many networks with different hyperparameters and trained each network for 2-7 days on a GPU cluster consisting of several V100, K80 and 2080Ti Nvidia GPUs. All results of the different hyperparameters and results can be found in the data link on the Kaggle platform.

### Model evaluation

We divided our simulations to train, validation and test datasets. We fitted all DNN models on train dataset, and all reported results are on an unseen test dataset. A validation dataset was used for modeling decisions, hyperparameter tuning and snapshot selection during the training process (early stopping). We evaluated binary spike prediction results using receiver operator characteristic (ROC) curve and calculated the area under curve (AUC). We note that due to the relatively low firing rate of the neuron, the binary classification problem of instantaneous spike prediction problem is highly unbalanced. For every second of simulation there was on average 1 positive sample (spike) for every 999 negative samples (non-spikes). Therefore, we used a very conservative threshold over the binary spike probability prediction output of the DNN in order to create the final spike train prediction and examine the cross-correlation plot. Note also that a prediction without a single True Positive on the 1ms time binning binary spike prediction problem can still be in fact a very good solution, e.g., if our model outputs, as its prediction, the exact same spike train as the original but offset by 1ms in time. In this case, there will be no True positives, and many False positives, but the predicted spike train is quite good nonetheless, namely: the temporal cross correlation between the original and predicted spike trains is not directly related to binary prediction metrics used and therefore we display it. For creating a binary prediction, we chose a threshold that corresponds to 0.2% false positive rate. In order to evaluate the temporal precision of the binary spike prediction we plotted the cross-correlation between the predicted output spike train and the ground truth simulated spike train. In order to evaluate the voltage prediction, we calculated the RMSE and plot the scatter plot between predicted voltage and the ground truth simulated voltage.

## Code and data availability

Data and pretrained networks that were used in this work are available on Kaggle at the following link: https://www.kaggle.com/selfishgene/single-neurons-as-deep-nets-nmda-test-data.

A short python script that loads a pretrained artificial network and makes a prediction on the entire NMDA test set that replicates the main result of the paper (Fig. 2) can be found in the following link: https://www.kaggle.com/selfishgene/single-neuron-as-deep-net-replicating-key-result.

A short python script that loads the data and explores the dataset (Fig. S1 & Fig. S2) can be found in the following link: https://www.kaggle.com/selfishgene/exploring-a-single-cortical-neuron.

## Supplementary Figures

## Acknowledgements

We thank Oren Amsalem, Guy Eyal, Michael Doron, Toviah Moldwin, Yair Deitcher, Eyal Gal and all lab members of the Segev and London Labs for many fruitful discussions and valuable feedback regarding this work. This work was supported by a grant from the EU Horizon 2020 program (720270, Human Brain Project), a grant from the Gatsby Charitable Foundation and a grant by Huawei Technologies Co. Ltd.

## Footnotes

Communication: David Beniaguev - david.beniaguev{at}gmail.com

The updated text contains a small narrative change, a more precise discussion section, several additional relevant citations were added, and 9 supplementary figures were also added. More importantly, links to code data and pretrained models were added. In the updated work we used larger and more diverse training and testing datasets, we trained the models longer that resulted in better performing models, and performed more rigorous hyper parameter sweep.