Target Learning rather than Backpropagation Explains Learning in the Mammalian Neocortex

Introduction
The mammalian neocortex maps complex sensory inputs into abstract hierarchical representations through the coordinated activity of large neuronal networks. However, the algorithmic principles behind learning such hierarchical representations remain largely unknown. Early studies combining neuroscience and machine learning revealed that neuronal response profiles in hierarchical visual processing are similar to those of deep artificial neural networks (ANNs) trained using the backpropagation (BP) algorithm (Zipser and Andersen, 1988; Khaligh-Razavi and Kriegeskorte, 2014a; Yamins and DiCarlo, 2016; Pospisil et al., 2018; Kell et al., 2018). This resemblance has sparked a debate about whether learning in the cortex follows principles similar to or different from BP (Grossberg, 1987; Crick, 1989; Richards et al., 2019; Saxe et al., 2021).
On one hand, proponents of BP argue that the cortex has the necessary circuits and architectures to perform the gradient computations required for BP (Guerguiev et al., 2017; Sacramento et al., 2018; Scellier and Bengio, 2017; Whittington and Bogacz, 2017; Lillicrap et al., 2016, 2020). On the other hand, various alternative explanations of hierarchical learning have been proposed that rely on predictive principles (Rao and Ballard, 1999; Friston and Kiebel, 2009; Song et al., 2024), self-organized contrastive representations (Zhuang et al., 2019; Illing et al., 2021), energy models (Rao and Ballard, 1999; Scellier and Bengio, 2017), or control theory (Meulemans et al., 2021, 2022b). In this work, we consolidate many of these explanations into a single alternative to BP, which we refer to as 'target learning' (TL): a family of algorithms in which learning is driven by enforcing a target network activity, with synapses adapting to minimize the signal strength required to reach this target (Rao and Ballard, 1999; Meulemans et al., 2021; Millidge et al., 2022; Song et al., 2024). Although various computational studies have related cortical learning to either BP (Körding and König, 2001; Richards et al., 2019) or TL (Friston and Kiebel, 2009; Lässig et al., 2023; Song et al., 2024), they rely on theoretical arguments and do not offer conclusive biological evidence, leaving open the question of which family of algorithms best explains cortical learning.
To bridge the gap between theory and biology, researchers have proposed bio-plausible cortical circuits that implement BP and TL. However, these models differ in their assumptions and in the specific experimental observations they explain (Bastos et al., 2012; Lee et al., 2014; Lillicrap et al., 2016; Sacramento et al., 2018; Song et al., 2020; Payeur et al., 2021; Aceituno et al., 2023), thereby obscuring which family of algorithms best accounts for cortical learning. Experimentally validating these circuits is further complicated by the immense complexity of cortical networks and by the fact that similar circuits can implement different algorithms with only slight modifications (Whittington and Bogacz, 2017; Scellier and Bengio, 2017; Meulemans et al., 2021, 2022a; Aceituno et al., 2023; Song et al., 2024).
Regardless of the specific circuits or algorithms that have been proposed, all models of cortical learning must make specific assumptions about how pyramidal neurons (PNs), the primary building blocks of the cortex, tune their basal dendritic synapses to relay feedforward information (Larkum, 2013). These assumptions, however, differ across models and result in different predictions (Whittington and Bogacz, 2017; Sacramento et al., 2018; Payeur et al., 2021; Meulemans et al., 2021, 2022a; Aceituno et al., 2023; Song et al., 2024). Therefore, understanding how learning occurs in PNs is essential for resolving the debate on cortical learning.
We thus focus our investigation on the dynamics and plasticity of PNs and their basal dendritic synapses. We start by constructing a PN model from the ground up, based on well-established experimental observations of cellular processes and synaptic plasticity. We then use this model to make predictions about how PNs adjust their basal synapses in response to feedforward sensory inputs and feedback learning signals, which we test using in vitro electrophysiology. Finally, we leverage our single-neuron model to formulate tests at the population level, which we apply to in vivo data, evaluating BP and TL as competing hypotheses for cortical learning.

Modeling basal synaptic plasticity and pyramidal neuron dynamics
We start by modeling plasticity in basal synapses based on the current understanding of molecular-level observations showing that synaptic calcium is one key driver of plasticity in cortical neurons (Lisman, 1989; Yang et al., 1999; Evans and Blackwell, 2015). Depending on the synaptic calcium concentration, calcium-dependent plasticity pathways based on protein kinases or phosphatases are activated, leading to long-term potentiation (LTP) or long-term depression (LTD), respectively. The concentration and temporal dynamics of synaptic calcium thereby dictate the direction and magnitude of synaptic plasticity. Individual synapses maintain their strength when calcium concentrations are below an LTD threshold; exceeding this threshold induces LTD until a second threshold is reached, beyond which LTP is triggered. This leads to a plasticity function P, outlined in Graupner and Brunel (2012), Honnuraiah and Narayanan (2013), and Chindemi et al. (2022) (see Fig. 1B), which we adopt in our model as,

∆w = ∫ P(c(t)) dt, (1)

where ∆w is the change in synaptic strength and c(t) is the calcium concentration.
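The two-threshold structure of the plasticity function P and its integration in Eq. 1 can be sketched in a few lines of code. The threshold and rate values below are purely illustrative placeholders, not the calibrated parameters from Graupner and Brunel (2012) that the model actually uses:

```python
import numpy as np

# Hypothetical thresholds and rates for the piecewise plasticity function P.
THETA_D, THETA_P = 1.0, 1.8   # LTD and LTP calcium thresholds (a.u.), illustrative
GAMMA_D, GAMMA_P = 0.1, 0.3   # depression / potentiation rates, illustrative

def P(c):
    """Plasticity rate: 0 below the LTD threshold, LTD between thresholds, LTP above."""
    if c < THETA_D:
        return 0.0
    elif c < THETA_P:
        return -GAMMA_D
    return GAMMA_P

def delta_w(c_trace, dt=1e-3):
    """Integrate Eq. 1: the weight change is the time integral of P over the calcium trace."""
    return sum(P(c) for c in c_trace) * dt

# A calcium transient that stays in the LTD zone depresses the synapse...
dw_ltd = delta_w(np.full(1000, 1.4))
# ...while one that crosses the LTP threshold potentiates it.
dw_ltp = delta_w(np.full(1000, 2.0))
```

The sign of the weight change thus depends only on which calcium regime the transient occupies and for how long, mirroring the qualitative picture in Fig. 1B.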
In pyramidal neurons, synaptic calcium influx is regulated via the opening of NMDA receptors (NMDARs), which are essential for plasticity (Artola and Singer, 1987; Bear et al., 1992; Markram et al., 1997; Sjöström et al., 2003). NMDARs require glutamate release from presynaptic activity as well as postsynaptic membrane depolarization (Emptage et al., 1999) to open. We thus model the dynamics of synaptic calcium through NMDARs as,

τ_c dc/dt = −c(t) + g_pre I(v_synapse) (c_max − c(t)), (2)

where c_max is the saturated calcium concentration dictated by the electrochemical gradient, τ_c is the calcium time constant, I corresponds to the influx of calcium through NMDARs as characterized by McRory et al. (2001) and Slutsky et al. (2004), g_pre indicates the presence or absence of glutamate, and v_synapse represents the postsynaptic membrane depolarization.
The postsynaptic membrane is depolarized by multiple sources within the PN. Besides EPSPs, backpropagating action potentials (bAPs) originating from the axon hillock depolarize the postsynaptic membrane shortly after somatic APs (Stuart and Sakmann, 1994). bAPs have been put forth as driving synaptic changes in spike timing-dependent plasticity (STDP) (Markram, 1997; Sjöström et al., 2010); however, more recent experimental data suggest that bAPs by themselves are insufficient to explain plasticity at physiologically relevant extracellular calcium concentrations (Inglebert et al., 2020; Chindemi et al., 2022). We therefore also consider plateau potentials, neuron-scale depolarization events originating from calcium influx through voltage-gated calcium channels (VGCCs) at the main bifurcation of the apical dendrite, which can depolarize the postsynaptic membrane for sustained periods (Larkum et al., 1999). Accordingly, we define our synaptic membrane model as,

τ_v dv_synapse/dt = −v_synapse(t) + Σ_e κ_e^synapse * δ(t_e), with e ∈ {EPSP, bAP, plateau}, (3)

where τ_v corresponds to the membrane voltage time constant, δ(t) stands for the Dirac delta function at time t, * denotes the convolution operator, and κ indicates the response induced by the subscripted event in the superscripted compartment (for all model hyperparameters see Methods 7.1).
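The coupled voltage and calcium dynamics described above can be illustrated with a minimal Euler-integration sketch. All constants, event amplitudes, and the toy NMDAR current below are illustrative assumptions, not the model's fitted hyperparameters; the point is only the qualitative behavior, i.e. that calcium influx requires glutamate to coincide with depolarization, and that a sustained plateau boosts it:

```python
import numpy as np

dt, T = 1e-4, 0.3                      # time step and duration (s)
tau_v, tau_c, c_max = 0.02, 0.05, 2.0  # hypothetical time constants and saturation
k_epsp, k_bap, k_plateau = 5.0, 20.0, 15.0  # event amplitudes (mV), hypothetical

def simulate(epsp_t, bap_t, plateau=False):
    """Euler-integrate the synaptic voltage and calcium for one event sequence."""
    n = int(T / dt)
    i_epsp, i_bap = int(epsp_t / dt), int(bap_t / dt)
    v = np.zeros(n)
    c = np.zeros(n)
    for i in range(1, n):
        # Instantaneous voltage kicks from EPSP and bAP events.
        kick = (k_epsp if i == i_epsp else 0.0) + (k_bap if i == i_bap else 0.0)
        # Sustained depolarizing drive during a plateau potential (100-150 ms).
        drive = k_plateau / tau_v if (plateau and 0.1 < i * dt < 0.15) else 0.0
        v[i] = v[i - 1] + dt * (-v[i - 1] / tau_v + drive) + kick
        # Glutamate is present for ~50 ms after the presynaptic EPSP.
        g_pre = 1.0 if i_epsp <= i < i_epsp + int(0.05 / dt) else 0.0
        # Toy NMDAR current: needs glutamate AND depolarization to pass calcium.
        i_nmda = g_pre * max(v[i], 0.0) / 50.0
        c[i] = c[i - 1] + dt * (-c[i - 1] + i_nmda * (c_max - c[i - 1])) / tau_c
    return v, c

# Coincident EPSP + bAP alone, versus the same pairing with a plateau potential.
_, c_pair = simulate(epsp_t=0.1, bap_t=0.105, plateau=False)
_, c_bac = simulate(epsp_t=0.1, bap_t=0.105, plateau=True)
```

Under these assumptions the plateau condition reaches a higher peak calcium than the paired EPSP+bAP alone, consistent with the model's claim that apical input modulates calcium influx.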
We next expand our synaptic model by adding somatic and apical-dendritic compartments. In L5 apical dendrites, the tuft region forms a distinct integrative compartment (Coogan and Burkhalter, 2004; Larkum et al., 1999, 2001; D'Souza and Burkhalter, 2017), integrating top-down inputs from higher cortical and subcortical areas (Wimmer et al., 2010; Williams and Holtmaat, 2019). Typically, top-down inputs to the apical tuft either excite or suppress local dendritic activity but do not induce a plateau potential or trigger somatic APs on their own (Fişek et al., 2023). Likewise, single bAPs are usually insufficient to initiate a plateau potential (Grewe et al., 2010). However, when inputs to the apical dendrite coincide with a bAP induced by basal inputs, the resulting depolarization at the apical dendrite can cross a threshold to open VGCCs, after which plateau potentials initiate (Larkum et al., 1999). Once generated, these plateau potentials propagate towards the somato-axonic region, where the membrane depolarization facilitates short, high-frequency bursts of APs, also known as bAP-activated Ca2+ spike firing (BAC firing).
Accordingly, we treat the somato-axonic and apical-dendritic regions as distinct integrative compartments that exchange information through bAPs and plateau potentials. This allows bAPs to spread from the somatic compartment to the apical and synaptic compartments and, likewise, plateau potentials to spread from the apical compartment to the somatic and synaptic compartments (see Fig. 1A). In our model, this bidirectional link has two functions. First, the sustained depolarization from plateau potentials facilitates BAC firing. Second, bAPs and plateau potentials depolarize the basal synapses, which in turn modulates plasticity through NMDAR-regulated synaptic calcium influx. In summary, our neuron model couples apical input with both the firing rate of the neuron and synaptic plasticity through somatic and synaptic membrane depolarization (see Fig. 1C&D, respectively). While basal EPSPs coupled with single bAPs cause brief synaptic depolarization, the quick succession of bAPs together with the plateau potential during BAC firing results in prolonged synaptic depolarization. Consequently, the basal synaptic membrane potential increases proportionally to the apical input, which in turn influences synaptic calcium influx through NMDARs when glutamate from a preceding EPSP is present (see Fig. 1B). As a result, our model displays single APs and moderate synaptic calcium influx in the absence of apical input, and larger synaptic calcium concentrations when basal and apical inputs coincide (see Fig. 1A and Supp. Fig. 6 for an extended visual summary). This increased firing rate is consistent with experimental findings that apical input significantly modulates the firing rate of a neuron in a multiplicative manner (Larkum et al., 2004).

Apical input directs synaptic plasticity in basal synapses and affects PN activity
To test the predictions of our model that apical input modulates the firing rate of the neuron and directs basal synaptic plasticity, we next performed in vitro whole-cell patch-clamp experiments on L5 PNs from the mouse prefrontal cortex with extracellular electrical stimulation of basal and apical afferents (see Fig. 2A).
We recorded basally-induced baseline EPSPs for 5 minutes to establish initial synaptic responses. Next, we increased the basal stimulation intensity to induce a single action potential and further adjusted the apical stimulation intensity to provoke supra-threshold events consisting of one, two, or three action potentials and prolonged somatic depolarization indicative of a plateau potential. After repeating these paired stimulation-induced supra-threshold events (PSSTs) 8 times, we again recorded basally-induced EPSPs for at least 30 minutes to measure long-term changes in EPSP amplitudes (see Fig. 2B and Methods 7.2). We conducted all recordings in physiological extracellular calcium concentrations (Inglebert et al., 2020).
Comparing basally-induced EPSPs before and after PSSTs, we found that some stimulation variants led to significantly increased EPSP amplitudes indicative of LTP (see Fig. 2C-E top, p < 0.0001), while others resulted in no significant difference in EPSP amplitudes (see Fig. 2C-E bottom, p = 0.229).
To explain the observed EPSP amplitude changes, we evaluated potential predictors of basal synaptic plasticity. We correlated the number of APs and the area under the curve of the somatic membrane depolarization during PSSTs (PSST-AUC) with the change in EPSP amplitudes (∆EPSP). We found a significant positive correlation between the PSST-AUC and ∆EPSP (R_Spearman = 0.8671, p = 0.0003, see Fig. 2F). In contrast, the correlation between the number of APs and ∆EPSP was R_Spearman = 0.4286 with p = 0.1645 (see Fig. 2G). This indicates that the total somatic depolarization following synaptic input is a better predictor of changes in synaptic strength than the number of APs following synaptic input. We note that the prolonged somatic depolarization resulting from dendritic plateau potentials can lead to burst firing (Larkum et al., 1999), indicating a dependency between the number of APs and the PSST-AUC. In this regard, we indeed find that the number of APs increases with the measured PSST-AUC (see Fig. 2H), and that even on a single-neuron basis, the change in the number of APs significantly depends on the PSST-AUC (see Fig. 2I, p = 0.031). Our recordings also contain data for apical synapses, but in contrast to basal synaptic plasticity, we find that apical synaptic strength changes do not correlate with the changes found in basal synapses (see Supp. 8.5 and Supp. Fig. 10, r_s = 0.2308, p = 0.4705), indicating that apical synapses follow different plasticity rules than basal ones.
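The correlation analysis used here can be sketched as follows. The arrays are synthetic stand-ins with a monotonic trend plus noise, not the recorded data; only the `scipy.stats.spearmanr` call mirrors the actual analysis:

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic PSST-AUC values (mV*ms) and EPSP changes with a monotonic relation.
rng = np.random.default_rng(0)
psst_auc = rng.uniform(50, 400, size=13)
depsp = 0.004 * psst_auc + rng.normal(0, 0.15, size=13)

# Rank correlation between depolarization area and plasticity outcome.
rho, p = spearmanr(psst_auc, depsp)
```

Because Spearman's test is rank-based, it captures the monotonic AUC-to-plasticity relationship without assuming linearity, which suits the saturating dose-response behavior one might expect from calcium-gated plasticity.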
In summary, our in vitro experiments support the predictions of our PN model by showing that basal plasticity is directed by prolonged somatic depolarization modulated by apical input. In addition, the same apical inputs induce long-lasting depolarization and trigger bursting behavior at the soma. These results agree with previous literature indicating that plasticity in PNs is reflected in their dynamics during learning (Larkum et al., 1999, 2004; Godenzini et al., 2022), and suggest that an implementation of BP or TL in cortical circuit models should be expressed in the PN population activity during learning.

Linking single cell synaptic plasticity to in vivo population dynamics during learning
To connect our PN model and in vitro results to in vivo population dynamics and plasticity during learning, we first recast the synaptic weight updates into a simple form of a rate-based three-factor learning rule (Frémaux and Gerstner, 2016; Gerstner et al., 2018). We then relate these factors to specific neural activity patterns that have been recorded in vivo.
In our PN model, synaptic learning is driven by the apical input as follows (see Methods 7.3),

∆w = η r_pre r_post^baseline (a − a_baseline),

where r_pre is the rate of presynaptic EPSPs, r_post^baseline is the postsynaptic activity at baseline apical input a_baseline, and a is the apical input used during learning. Crucially, we model the apical input to have a multiplicative effect on the PN activity (see Fig. 2H&I and Larkum et al. (2004)). Thus we can rewrite the previous learning rule as,

∆w = η r_pre (r_post^apical − r_post^baseline), (7)

where r_post^apical is the postsynaptic firing rate during apically-induced plasticity (see Methods 7.3). The term r_post^apical − r_post^baseline can be interpreted as the subsequent change in the postsynaptic activity (see Methods 7.4). Previous work showed that the learning rule from Eq. 7 is consistent with classical learning rules such as STDP (Xie and Seung, 1999; Aceituno et al., 2023) and with existing hierarchical cortical learning models implementing either BP or TL (Rao and Ballard, 1999; Whittington and Bogacz, 2017; Scellier and Bengio, 2017; Sacramento et al., 2018; Meulemans et al., 2021).
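The three-factor form of Eq. 7 is simple enough to state directly in code. The rates and the learning rate below are arbitrary illustrative numbers:

```python
# Rate-based three-factor rule (Eq. 7): the weight change is the product of the
# presynaptic rate and the apically-induced change in postsynaptic rate.
def delta_w(r_pre, r_post_apical, r_post_baseline, eta=0.01):
    return eta * r_pre * (r_post_apical - r_post_baseline)

# Apical input above baseline raises the postsynaptic rate -> potentiation;
# below baseline -> depression; at baseline -> no change (all rates in Hz, toy values).
dw_up = delta_w(r_pre=5.0, r_post_apical=12.0, r_post_baseline=8.0)
dw_down = delta_w(r_pre=5.0, r_post_apical=5.0, r_post_baseline=8.0)
dw_zero = delta_w(r_pre=5.0, r_post_apical=8.0, r_post_baseline=8.0)
```

Note that the sign of the update is fully determined by whether the apically-modulated rate exceeds the baseline rate, which is exactly the quantity the in vivo tests below probe.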
However, the learning rule from Eq. 7 is similar but not identical for BP and TL. Our theoretical analysis of the BP and TL algorithms reveals that the population activity during learning must differ between the two (see Supp. 8.2). For BP, the neuronal activity remains similar between inference and learning, while TL requires a target neuronal activity to be enforced during learning (see Methods 7.5). This differentiation delineates our hypotheses, which can be tested with specific in vivo data. In particular, Eq. 7 shows that the critical metric determining plasticity is the separation between sensory-driven PN activity at baseline apical input and PN activity when the apical input diverges from its baseline. Such separation can be observed in the context of neural reactivations, where neural activity seen during stimulus presentation reoccurs with slight variations when the animal is at rest or sleeping (Ego-Stengel and Wilson, 2010; Girardeau et al., 2009; Rothschild et al., 2017; Jadhav et al., 2012). Consistent with the feedback-driven activity of our model, cortical reactivations are required for learning (Squire, 1992, 2004; Jadhav et al., 2012; Chen and Wilson, 2023), are known to be driven by inputs to the apical dendrite (Khodagholy et al., 2017), and correlate with bursts (Sirota et al., 2003).
To test our hypotheses, we adopt data from a recent experimental study which recorded PN stimulus-evoked responses and reactivations from the mouse lateral visual cortex over learning using calcium imaging (Nguyen et al., 2024). We denote stimulus-evoked neural responses during stimulus presentations as r_post^baseline and reactivations occurring during rest as r_post^apical (see Fig. 3A and Nguyen et al. (2024) for complete details). We start by testing whether this in vivo dataset is consistent with our PN model. First, we verify that neuronal activity and plasticity-inducing events are visible in calcium imaging of in vivo data by reproducing induction events while recording voltage and calcium signals. We found a significant correlation between somatic calcium recordings and the PSST-AUC (Supp. Fig. 9B, R_Spearman = 0.8303, p = 0.0029). Then, we track the individual PN activities over time as the stimulus is presented, and we correlate the difference in activity between a reactivation and its associated stimulus-evoked response with the difference in activity of consecutive stimulus-evoked responses for the same stimulus. As expected (see Supp. 8.1 and Supp. Fig. 5 for all details), we find a positive correlation between these activity changes (see Fig. 3C, R² = 0.80), validating our PN model and Eq. 7 for this dataset.
Thus, by leveraging Eq. 7 and the dataset from Nguyen et al. (2024), which contains both stimulus-evoked responses and reactivations, we can make distinct predictions for BP and TL and derive a precise test. If the cortex implements BP, the PN activity during reactivations for a given stimulus should resemble the stimulus-evoked activity from the previous stimulus presentation. If the cortex uses TL, the PN activity during reactivations should resemble the stimulus-evoked activity at the end of learning, when the neural representations have stabilized.

Target learning explains plasticity in cortical networks
To test the BP and TL hypotheses on the in vivo dataset, we measure the alignment between the reactivations and the appropriate stimulus-evoked responses after learning using the cosine similarity (see Methods 7.6.1&7.6.2). We find that reactivations align better with the stimulus-evoked responses after learning than with the previous stimulus-evoked response (see Fig. 3C), as would be expected from TL (see Supp. 8.2). To evaluate which hypothesis better explains the cortical reactivations, we calculate the fraction of variance in the neuronal activity during reactivations that is explained by the BP or TL hypotheses (see Methods 7.6.3). Our results thus indicate that TL explains the activity during neural reactivations better than BP (see Fig. 3C, Welch T-test, p < 0.0001).
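The core of this alignment test can be sketched with synthetic population vectors. Here the "reactivation" is constructed to be TL-like (close to the post-learning response) purely to illustrate the comparison; none of the vectors are real data:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two population activity vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
r_early = rng.random(50)          # stimulus-evoked response early in learning
r_late = rng.random(50)           # stimulus-evoked response after learning
# Toy TL-like reactivation: mostly the late response plus a small perturbation.
reactivation = 0.9 * r_late + 0.1 * rng.random(50)

sim_bp = cosine(reactivation, r_early)  # BP prediction: match previous response
sim_tl = cosine(reactivation, r_late)   # TL prediction: match final response
```

The hypothesis test then amounts to asking which of the two similarities is systematically larger across stimuli and sessions, which is what the Welch T-test in the text quantifies.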
To ensure that our results are not influenced by the choice of baseline activity, we visualize the activity in relation to the early-to-late stimulus-evoked responses using a principal component analysis (see Methods 7.6.4). As before, we find that the reactivations are tightly concentrated around the stimulus-evoked responses after learning, as in TL. We complement this visualization by measuring the distance between the observed reactivations and the reactivations that would be expected under each hypothesis in this principal component space (see Methods 7.6.5). This analysis confirms that the reactivations are better explained by the TL hypothesis (see Fig. 3D, Welch T-test, p < 0.0001, and Supp. Fig. 7 for the same test with more principal components).
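A minimal version of this principal-component comparison can be written with plain numpy SVD. The trial matrix and the reactivation vector are synthetic (the reactivation is placed near the last evoked response by construction), so the example only illustrates the projection-and-distance logic:

```python
import numpy as np

rng = np.random.default_rng(2)
responses = rng.random((20, 50))  # trials x neurons: evoked responses over learning
# Synthetic reactivation constructed to lie near the post-learning response.
reactivation = responses[-1] + 0.05 * rng.standard_normal(50)

# PCA via SVD of the mean-centered evoked responses.
mean = responses.mean(axis=0)
_, _, vt = np.linalg.svd(responses - mean, full_matrices=False)
pcs = vt[:2]                      # top-2 principal axes

def project(x):
    """Coordinates of an activity vector in the top-2 PC space."""
    return pcs @ (x - mean)

# Distance of the reactivation to the late (TL) vs early (BP) evoked responses.
d_late = float(np.linalg.norm(project(reactivation) - project(responses[-1])))
d_early = float(np.linalg.norm(project(reactivation) - project(responses[0])))
```

Working in the low-dimensional PC space of the evoked responses makes the early-versus-late comparison independent of any particular baseline normalization, which is the robustness check the paragraph describes.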
Finally, we visualize the resemblance between cortical and artificial learning dynamics by integrating our PN model into ANNs implementing either BP or TL (see Methods 7.7). Using the same visualization as in Fig. 3C, we observe that the cortical learning dynamics better resemble the TL simulations than the BP ones (see Supp. Fig. 8).

Discussion
Our single-cell modeling and in vitro results, combined with the in vivo characterization of the PN population learning dynamics, show that cortical dynamics and plasticity align more closely with target learning (TL) than with the classical backpropagation (BP) method used in deep learning. This finding addresses a longstanding debate in neuroscience and machine learning about the type of learning algorithm the cortex employs (Rumelhart et al., 1986; Grossberg, 1987; Crick, 1989; Körding and König, 2001; Lee et al., 2014; Liao et al., 2016; Nøkland, 2016; Guerguiev et al., 2017; Scellier and Bengio, 2017; Whittington and Bogacz, 2017; Sacramento et al., 2018; Richards et al., 2019; Meulemans et al., 2021; Song et al., 2020; Richards and Kording, 2023; Aceituno et al., 2023; Song et al., 2024). Our results align with other recent studies indicating that the same principles might apply to other brain areas such as the hippocampus (Grienberger and Magee, 2022), suggesting that TL might be a general principle of learning in PN networks.
In contrast to previous theoretical studies, our work focuses on formulating hypotheses about experimental data, which we verify in vitro and in vivo. Our work differs from these studies (Rao and Ballard, 1999; Bastos et al., 2012; Scellier and Bengio, 2017; Sacramento et al., 2018) in that we do not propose a concrete cortical circuit. Instead, we focus on comparing the dynamics of PNs and cortical networks during learning. This paradigm shift is crucial because BP and TL are mathematical frameworks and are thus not tied to any specific formalism, network architecture, or circuit. Yet they impose different requirements on the neuronal activity. If network activity during learning is similar to the activity during stimulus presentation, then we obtain BP, whereas if activity during learning is similar to the activity after learning converges, then we define this as TL. Accordingly, many different network architectures, such as control-based or predictive-coding models as well as energy models, can implement approximations of BP or TL (Scellier and Bengio, 2017; Whittington and Bogacz, 2017; Ernoult et al., 2022; Song et al., 2024).
Importantly, our work does not characterize or suggest any concrete circuit implementation of this family of algorithms. To address this gap, future work needs to narrow down the precise circuitry through a rigorous investigation of the feedback pathways, which currently differ across TL architectures (Rao and Ballard, 1999; Scellier and Bengio, 2017; Meulemans et al., 2022b). Although we measured apical plasticity in L5 PNs directly (see Supp. Fig. 10), we did not find any consistent relation between PN activity and apical plasticity, most likely due to the unspecific nature of our stimulation. As we stimulate inputs to apical dendrites using extracellular electrodes, we probably jointly activate axons from the contralateral prefrontal cortex (cPFC) as well as thalamic nuclei, specifically the mediodorsal (MD) and ventromedial (VM) thalamus (Anastasiades and Carter, 2021). These axonal feedback projections are likely mediated by different neuron types, which could explain the heterogeneous plasticity we observed. Future experimental research needs to disentangle the contributions of different afferent feedback projections to eliciting apical input and driving plasticity, for example, using projection-specific optogenetic methods (Keller et al., 2020).
Cortical TL may also provide new inspiration for machine learning researchers, especially in the realm of continual learning, where target-based methods have shown great promise (Fairbank et al., 2022; Lässig et al., 2023; Song et al., 2024). However, we note that training ANNs with TL requires simulating continuous-time network dynamics by iterating feedforward and feedback passes multiple times for each data sample and is therefore computationally intensive (Rao and Ballard, 1999; Scellier and Bengio, 2017; Meulemans et al., 2021). On digital processing hardware like GPUs, this renders the training process significantly slower and less scalable compared with conventional deep learning frameworks based on BP. Nonetheless, TL research is still in its infancy, and future studies might develop more efficient algorithmic implementations of TL or advance neuromorphic computing hardware to better support neuroscience-inspired algorithms (Laydevant et al., 2024).
In summary, our interdisciplinary study combines mathematical models and deep learning theory with in vitro neuroscience experiments and in vivo cortical data. Our investigation bridges multiple scales of neuronal learning, suggesting that cortical hierarchical networks learn by enforcing targets rather than correcting errors, thus differing from their canonical artificial counterparts. Understanding that brains learn differently from existing machine learning models will likely inspire the development of novel and improved deep learning algorithms and hardware.

Table 1: Hyperparameters of the neuron model with their associated values and references.
Neurons were visualized with differential interference contrast (DIC) using an Olympus BX61 WI microscope with an Olympus 60x water immersion objective and a CellCam Kikker MT100 (Cairn) DCIM camera. L5 pyramidal neurons (PNs) in the PFC were identified based on morphology and distance to the pia and patched with glass pipettes with resistances of 4.8-6.4 MΩ filled with intracellular solution containing (in mM): 10 HEPES, 20 KCl, 117 K-gluconate, 4 Mg-ATP, 0.3 GTP, 10 Na-P-creatine. For morphology reconstruction, the intracellular solution also contained 0.2% Biocytin. For Ca2+ imaging experiments, the intracellular solution also contained 200 µM Oregon Green™ 488 BAPTA-5N. Whole-cell, somatic patch-clamp recordings were acquired in current-clamp mode using an Axon MultiClamp 700A amplifier and digitized using an Axon DigiData 1550B digital converter. Data were sampled at 50 kHz and filtered at 3 kHz. Typical resting membrane potentials were between -64 mV and -72 mV, and access resistance ranged from 15 MΩ to 25 MΩ. Recordings in which neurons depolarized > 10% from resting potential or in which access resistance changed > 15% from its initial value were discarded. Bridge potentials were compensated, and liquid-junction potentials were not corrected for. Depolarizing and hyperpolarizing currents were injected to characterize neurons further. Only neurons showing L5 PN-typical I_h currents were considered. Theta-barrel glass pipettes with openings of 30 µm were placed 50-80 µm from the patched neuron to stimulate afferents targeting basal dendrites, as well as in cortical layer 1 to stimulate apical afferents with synapses onto apical tuft dendrites.
After stabilization of resting membrane potential, apical and basal stimulation strength was adjusted to elicit EPSPs with amplitudes in the range of 2-3 mV.This usually required 100 µs voltage pulses in the range of -1 to -3 V followed by immediate 100 µs voltage pulses of +1 to +3 V for basal stimulations.Apical stimulation had the same shape but ranged from -6 to -8 V and +6 to +8 V, respectively.Next, the required strength of basal stimulations to induce a single action potential was determined.For this, tripling basal stimulation amplitudes and stimulating 3 times at 100 Hz was usually sufficient.If extracellular electrical stimulation was not sufficient to trigger action potentials, additional current was injected into the soma via the patch pipette until an AP was induced.Finally, apical stimulation strength was increased until multiple action potentials or plateau potentials were induced.
After all required stimulation amplitudes were set, baseline EPSP amplitudes were recorded for 5 minutes with alternating basal and apical stimulations every 5 seconds followed by 8 combined, strong basal and apical stimulations with a 10 second interval, each eliciting a plasticity induction event.Next, EPSPs were recorded for at least 30 minutes (same as baseline recordings).
Analysis of electrophysiology traces was performed using individual Python scripts and the pyabf library.The correlation analyses were performed with the seaborn library (regplot) and the statistical analysis with the scipy library (spearmanr).
Analysis of the PSST-AUC with different numbers of action potentials (Fig. 2K) was performed on a subset of neurons in which PSSTs occurred with different numbers of APs (1 vs 2, 1 vs 3, or 2 vs 3). PSST-AUCs were averaged for events with the minimum and maximum number of APs, respectively, and plotted as 2 data points per cell.
For cell morphology reconstruction (performed in 8 of 12 neurons), slices were kept in 4% PFA for 24-32 hours, stained overnight with Streptavidin, Alexa Fluor 350 conjugate (Invitrogen), and mounted on commercially available microscopy slides.
Calcium imaging was performed in a subset of neurons (n = 7 neurons, 4 mice) after post-induction EPSP recordings were completed, by imaging pipette-filled OGB5-N using a Lumencor SPECTRA Light Engine (GFP/FITC 475/28 nm) for excitation and an ORCA-Lightning Digital CMOS camera (C14120-20P, Hamamatsu) for image acquisition at 50 Hz. Imaging data were preprocessed (background subtraction, bleaching correction) using Fiji/ImageJ2. Extracted fluorescence traces were further processed and analysed using individual Python scripts. A correlation analysis between somatic calcium recordings (derived as ∆F/F AUC) and somatic membrane potential AUC showed a significant correlation (R_Spearman = 0.8303, p = 0.0029) and was used to infer calcium levels during the actual plasticity induction events using linear regression.
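The ∆F/F AUC computation and the linear-regression inference step can be sketched as follows. The fluorescence trace, baseline, and AUC pairs are synthetic placeholders, not recorded data:

```python
import numpy as np

def dff_auc(f, f0, dt):
    """Area under the dF/F curve of a fluorescence trace sampled at interval dt."""
    dff = (f - f0) / f0
    return float(np.sum(dff) * dt)

dt = 0.02  # 50 Hz imaging
f0 = 100.0  # baseline fluorescence, arbitrary units
# Synthetic calcium transient: exponential decay back to baseline.
trace = f0 + 30.0 * np.exp(-np.arange(0, 2, dt) / 0.5)
auc = dff_auc(trace, f0, dt)

# Synthetic (membrane-potential AUC, calcium AUC) pairs and a linear fit,
# mirroring the regression used to infer calcium during induction events.
vm_auc = np.array([50.0, 120.0, 200.0, 310.0])   # mV*ms, illustrative
ca_auc = np.array([0.1, 0.28, 0.44, 0.71])       # dF/F*s, illustrative
slope, intercept = np.polyfit(vm_auc, ca_auc, 1)

# Inferred calcium AUC for an unseen induction event with Vm AUC = 260 mV*ms.
predicted = slope * 260.0 + intercept
```

This two-step structure (summary statistic per trace, then a fitted linear map) is what allows calcium levels to be estimated for induction events where only the membrane potential was recorded.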
Group data are expressed as mean ± s.e.m. unless otherwise stated. Statistical significance was calculated using paired or unpaired t-tests as well as Spearman correlation.

Derivation of the rate-based neuron model
To render our neuron model comparable to the average activities observed in neuronal recordings in vivo and to implement the model in artificial neural networks, we reformulate our PN model as rate-based with a three-factor learning rule.
The inputs that arrive at the soma and the inputs that arrive at the apical dendrites are assumed to follow a Poisson distribution with a relatively low firing rate, in order to ensure that consecutive presynaptic spikes or feedback spikes are temporally separated with high probability, allowing us to consider spikes and bursts as independent events.
We divide the somatic output into bursts and single spikes. The number of events where somatic presynaptic input drove the soma over the threshold θ_soma is represented by N. These events will sometimes generate a burst (N_burst) and sometimes a single spike (N_spike), thus N = N_burst + N_spike. A burst is generated only if feedback input arrives at the apical dendrites within a short time interval ∆T after a somatic postsynaptic spike. Using the aforementioned assumption of low firing rates for apical and somatic inputs, which implies that somatic spikes cannot arrive very close in time, coincidences of basal and apical inputs are independent events. We can thus calculate the expected number of bursts and single APs as,

N_burst = N a ∆T,  N_spike = N (1 − a ∆T), (8)

where a is the rate of feedback input. Intuitively, the number of bursts is simply the fraction of time at which the apical input can generate a burst times the probability of receiving suprathreshold somatic activity. Lastly, the firing rate of the postsynaptic neuron increases when burst firing occurs. Therefore, counting the firing rate as the number of spikes per unit time, it can be expressed as,

r_post = (N_spike + m N_burst)/T = (N/T)(1 + (m − 1) a ∆T), (9)

where T represents the total time of the simulation and m > 1 is the expected number of spikes per burst. As denoted here, the expected firing rate of the neuron grows monotonically with the amount of apical feedback.
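This burst/spike bookkeeping translates directly into code. The event counts, window, burst size, and duration below are illustrative numbers chosen only to show the monotonic dependence on the apical rate a:

```python
def expected_counts(n_events, a, dT):
    """Expected burst and single-spike counts: a suprathreshold event becomes a
    burst with probability a*dT (apical input within the coincidence window)."""
    n_burst = n_events * a * dT
    n_spike = n_events * (1.0 - a * dT)
    return n_burst, n_spike

def firing_rate(n_events, a, dT, m, total_time):
    """Spikes per unit time: singles contribute 1 spike, bursts contribute m."""
    n_burst, n_spike = expected_counts(n_events, a, dT)
    return (n_spike + m * n_burst) / total_time

# Firing rate grows monotonically with the apical feedback rate a (toy values).
r_low = firing_rate(n_events=100, a=0.5, dT=0.02, m=3, total_time=10.0)
r_high = firing_rate(n_events=100, a=2.0, dT=0.02, m=3, total_time=10.0)
```

For the low-apical case, 100 events with a coincidence probability of 0.01 yield one expected burst (3 spikes) and 99 singles over 10 s, i.e. a rate of 10.2 Hz; raising a raises the rate.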
We can now derive plasticity from the calcium model, noting that a plateau potential generates LTP, while a burst or a single spike without the corresponding apical input generates LTD in the presence of a presynaptic spike. Here, ∆_burst w > 0 and ∆_spike w > 0 denote the synaptic change per burst or spike, respectively, and N_pre corresponds to the number of presynaptic spikes. Substituting Eq. 8 into Eq. 10, there now exists a baseline apical input a_baseline such that ∆w = 0. Using a_baseline, we can write the synaptic weight change with η = ∆T (∆_burst w − ∆_spike w). To simplify the notation further, we denote the presynaptic firing rate as r_pre = N_pre/T and the postsynaptic firing rate at baseline apical input as r^baseline_post, which leaves us with the synaptic update rule
∆w = η r_pre r^baseline_post (a − a_baseline). (14)
If we set a_baseline = 1 for simplicity, the postsynaptic firing rate for a given apical input is then r^a_post = r^b_post a. To visualize the connection between firing rates, synaptic plasticity, and apical input, we simulated our spike-based pyramidal neuron model with basal and apical inputs in Supp. Fig. 6.
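The resulting rate-based rule can be sketched in a few lines of Python (an illustrative sketch of Eq. 14, not the code used in our simulations; the function name and parameter values are ours):

```python
import numpy as np

def delta_w(r_pre, r_post_baseline, a, a_baseline=1.0, eta=0.01):
    """Rate-based update of Eq. 14: the sign of plasticity is set by the
    apical input a relative to its baseline, and the magnitude is gated
    by the pre- and postsynaptic firing rates."""
    return eta * r_pre * r_post_baseline * (a - a_baseline)

ltp = delta_w(10.0, 5.0, a=1.5)    # apical input above baseline -> LTP
ltd = delta_w(10.0, 5.0, a=0.5)    # apical input below baseline -> LTD
silent = delta_w(0.0, 5.0, a=1.5)  # no presynaptic spikes -> no plasticity
```

The rule is multiplicatively gated: without presynaptic activity there is no weight change regardless of the apical input.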

From synaptic plasticity rules to learning rules
Our neuron model can be used in a deep learning setting, as long as the appropriate learning signals are provided. Note that we are not concerned with how the signals are computed; we simply assume that a neuronal circuit exists that delivers the signal to the apical dendrite.
The learning algorithms that we consider use synaptic update rules with a generalized Hebbian-like structure of the form ∆_alg w = −η r_pre δ_post, where δ_post = r^a_post − r^b_post is a value computed at the postsynaptic neuron. In the BP algorithm, this would be the derivative of the loss function with respect to the postsynaptic activity, and in target-based methods the difference between the target activity and the current activity (Meulemans et al., 2020; Rao and Ballard, 1999; Song et al., 2024), or the temporal change of the activity when driven towards the target (Aceituno et al., 2023).
We can now consider the plasticity rate of our model, taken from Eq. 14, and compute the required algorithmic apical input as a = a_baseline − δ_post / (η_Ca++ r^b_post), where a_baseline is the apical input baseline, r^b_post is the firing rate at baseline apical input, and η_Ca++ is the learning rate, which would either be a hyperparameter in the training process or could be extracted from our data.
Note that our equation has a singularity at r_post = 0, where the fraction would go to infinity. However, this is resolved by noting that when r_post = 0 there should be no plasticity, neither from the biological point of view nor from the algorithmic one. Biologically, there is no weight update in the absence of postsynaptic firing. Algorithmically, the use of rectified linear units (or similar semi-positive functions) implies that in those cases δ_post would be zero, because a small change in the inputs to the neuron would lead to no change at all in the output. Therefore, calculus-based methods would give this neuron a derivative of zero (and hence δ_post = 0).
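The inversion above, including the handling of silent neurons, can be sketched as follows (a minimal sketch; the helper name and the vectorized treatment of the r_post = 0 case are our own choices):

```python
import numpy as np

def required_apical_input(delta_post, r_post_b, a_baseline=1.0, eta_ca=1.0):
    """Invert Eq. 14 to find the apical input a that realizes the algorithmic
    update -eta * r_pre * delta_post. Silent neurons (r_post_b == 0) receive
    the baseline input: they undergo no plasticity and, with ReLU units,
    delta_post is zero for them anyway."""
    delta_post = np.asarray(delta_post, dtype=float)
    r_post_b = np.asarray(r_post_b, dtype=float)
    a = np.full_like(r_post_b, a_baseline)      # default: baseline (no signal)
    active = r_post_b > 0
    a[active] = a_baseline - delta_post[active] / (eta_ca * r_post_b[active])
    return a
```

The guard on active neurons avoids the singularity while matching the argument in the text that both biology and the algorithm assign zero plasticity to silent units.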

Learning signals and reactivations for BP and TL
In this section we show how BP and TL change the dynamics during learning, and derive hypotheses for how the activity in both algorithms changes due to apical feedback.
When we inject feedback into our PNs, we automatically change their activity and enforce a change in the output of the network. With the baseline a_baseline = 1, we can calculate this change easily as r^a = r^b ⊙ a with a = 1 + δ ⊘ r^b, where r^a and r^b stand for the firing rates of the neural population during learning and at inference, a for the apical input at each neuron during learning, ⊙ and ⊘ are the elementwise multiplication and division respectively, and δ is a term that represents how each neuron should change its activity and that appears in both the BP and TL formalisms (Rao and Ballard, 1999; Meulemans et al., 2021; Richards et al., 2019; Bengio et al., 2016).

BP with our PN model
The modified activity at a given layer alters the output of the network, reducing the error at the output layer r_L. It does so by considering the stimulus-evoked activity and evaluating how it should change. In BP, δ_BP = −∂l/∂r^b_post, where the partial derivative is evaluated at the neuronal activity at inference. If we now feed this into our PN model as a learning signal, the postsynaptic activity becomes r^a_post = r^b_post + η_Ca++ δ_BP, where η_Ca++ is a scaling parameter. Activity during learning: The synaptic update from BP is the derivative of the loss with respect to the synaptic weights, assuming that the neuronal activity is the one given at inference. Hence, presynaptic activity during learning is the same as in the feedforward pass, and thus the presynaptic activity during learning must be similar to the presynaptic activity during inference. As almost all neurons in the network are presynaptic to some synapse onto some other neuron, the scaling of the changes quantified by η_Ca++ must be small.

TL with our neuron model
In TL, to enforce a target we obtain the target value first and then recover δ; formally this involves an inverse mapping, where the superscript −1 indicates an inverse. Note that the error here is a mathematical construct which does not need to have any explicit representation. The network only has a response to a stimulus and a target; the difference between them need not be computed by any neuron. Activity during learning: In TL, the term δ is not computed directly, but emerges from enforcing the target activity. This naturally gives a strict hypothesis for TL, namely that the activity during learning must be precisely the target activity. Since the target activity is observed after learning is complete, our hypothesis for TL is that the activity during learning resembles the activity during inference after learning.

Data source and preprocessing
We measure the alignment between the activity during learning and during inference or stimulus presentation in the cortical in vivo data available from Nguyen et al. (2024). We modified the existing code to fit the analysis outlined in the main text and kept the preprocessing as in the original publication (Nguyen et al., 2024). For all cortical data included in our analysis, we used only the top ten percent of neurons responding to the given stimulus, as in Nguyen et al. (2024). We denote early responses as the average neural activity of the first three stimulus-evoked responses and late responses as the average neural activity of the last three stimulus-evoked responses. We note that the neural activity has a significant variance: the three points that define the centroid have an angle of 24 degrees with the centroid itself (see the three rightmost responses in Fig. 3B). We will use r to denote stimulus-evoked responses and ρ to denote reactivations.

Measuring alignment
To compare the reactivations with the stimulus-evoked responses, we consider them as vectors in a space where every dimension corresponds to the activity of one neuron. In this vector space we can define both the reactivations and the activities, but the baseline activity for single cells during rest periods (which is when the reactivations take place) is not the same as during active periods (Nguyen et al., 2024).
To avoid the effects of the different baselines, we measure the alignment between a given reactivation ρ_i and the reactivation that would be expected under a hypothesis, ρ^hyp_i, for each sample i, using the cosine similarity
alignment(ρ_i) = ⟨ρ_i, ρ^hyp_i⟩ / (∥ρ_i∥ ∥ρ^hyp_i∥),
where ⟨•, •⟩ stands for the dot product and ∥•∥ for the Euclidean norm. Now we need to formulate our hypotheses in terms of ρ^hyp_i. TL predicts that the reactivations are aligned with the stimulus-evoked responses after learning, when the representations have converged, thus ρ^TL_i ∥ r_late, where r_late refers to the stimulus-evoked response after learning is finished, which we estimate by taking the average of the last three stimulus-evoked responses, and ∥ stands for co-linearity (because the baseline activities are different). Note that for TL, ρ^TL_i is the same for every sample. In contrast, BP predicts that the reactivations are aligned with the stimulus-evoked response of the last stimulus presentation, ρ^BP_i ∥ r_i, where r_i refers to the stimulus-evoked response directly preceding ρ_i. The measure of alignment with TL is shown in Fig. 3C using data from sample mouse 'NN8' for stimulus 2 from the recording date 12-03-2021. As reported in the original publication (Nguyen et al., 2024), the remaining data looks similar.
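The alignment measure and the TL hypothesis can be sketched as follows (illustrative only; the variable names are ours and the example vectors are synthetic, not the recorded data):

```python
import numpy as np

def alignment(rho, rho_hyp):
    """Cosine similarity between a reactivation and its hypothesized direction;
    insensitive to overall scaling, so the different baselines do not matter."""
    return float(rho @ rho_hyp / (np.linalg.norm(rho) * np.linalg.norm(rho_hyp)))

rng = np.random.default_rng(0)
r_late = rng.uniform(0.0, 1.0, size=50)    # stand-in for the mean late response
rho_tl = 1.3 * r_late                      # TL hypothesis: co-linear with r_late
a_tl = alignment(rho_tl, r_late)           # perfect alignment despite rescaling
a_rand = alignment(rng.uniform(0.0, 1.0, size=50), r_late)  # generic vector
```

Because only the direction enters the measure, a reactivation that is a scaled copy of r_late scores the maximal alignment of 1, while an unrelated activity vector scores strictly less.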

Hypothesis comparison through alignment
To compare the TL and BP hypotheses we focus on the variance explained by each hypothesis. We start by formulating the reactivations as neural activity under a given hypothesis, ρ_i = ρ^hyp_i + ε_i, where ε_i stands for the noise. Notice that since we do not have a consistent baseline for the reactivations, and must use the alignment or fit a scaling term, we implicitly assume that ε_i ⊥ ρ^hyp_i. In other words, we are masking any noise that is co-linear with our hypothesis. Conveniently, the activity of visual cortex is very high dimensional (Stringer et al., 2019), and thus the fraction of noise aligned with our hypothesis should be very small, because in high-dimensional spaces a random vector (ε_i) is almost orthogonal to any specific direction (ρ^hyp_i). The fraction of variance unexplained in each reactivation can thus be obtained as FVU_i = ∥ε_i∥² / ∥ρ_i∥². We thus obtain a value for each reactivation, and perform a Welch t-test to assess the significance of the difference.
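The fraction of variance unexplained can be computed per reactivation as sketched below (our own minimal implementation; the residual is the component of the reactivation orthogonal to the hypothesis direction, matching the ε_i ⊥ ρ^hyp_i assumption above):

```python
import numpy as np

def fvu(rho, rho_hyp):
    """Fraction of variance unexplained: squared norm of the component of
    rho orthogonal to the hypothesis direction, relative to ||rho||^2."""
    u = rho_hyp / np.linalg.norm(rho_hyp)   # unit vector along the hypothesis
    residual = rho - (rho @ u) * u          # part of rho the hypothesis misses
    return float(residual @ residual) / float(rho @ rho)

hyp = np.array([1.0, 0.0, 0.0])
perfect = fvu(2.5 * hyp, hyp)               # co-linear reactivation -> 0
orthogonal = fvu(np.array([0.0, 1.0, 1.0]), hyp)  # nothing explained -> 1
mixed = fvu(np.array([1.0, 1.0, 0.0]), hyp)       # half the variance explained
```

A co-linear reactivation yields FVU = 0 regardless of scale, which is exactly the baseline-invariance the analysis requires.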

Origin-independent analysis
A potential concern with using the cosine similarity is that the results might change depending on the choice of origin of the neuronal activity. To address this concern, we also evaluate the variance across the stimulus-evoked responses and reactivations by projecting out the early-to-late change and evaluating the variance it does not explain. Since the relative position of the early and late activities is independent of the choice of basis or origin, this ensures that our results are consistent even if the origin were chosen differently.
We start by computing the difference vector between early and late responses, d = r_late − r_early, and then evaluate the variance of the activity that is not explained by this difference. We do this by projecting the activity onto the subspace orthogonal to the early-to-late vector, computing the first principal component (PC) there, and projecting the neural activity of both reactivations and stimulus-evoked activity onto this component. Note that since the reactivations have a different baseline, we added a constant scaling term s = 1.3 that ensures the reactivations have the same magnitude as the stimulus-evoked activities, which we take from the original dataset analysis (Nguyen et al., 2024).
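The two-step projection can be sketched as follows (a simplified sketch of the procedure, using SVD for the principal component; function and variable names are ours):

```python
import numpy as np

def early_late_coordinates(X, r_early, r_late):
    """Project activity (rows of X) onto the early-to-late direction d and onto
    the first principal component of the variance orthogonal to d. Because only
    relative positions enter, the decomposition does not depend on the origin
    (up to a constant shift of the coordinates)."""
    d = r_late - r_early
    d = d / np.linalg.norm(d)
    coord_d = X @ d                          # coordinate along early-to-late
    X_perp = X - np.outer(coord_d, d)        # remove the early-to-late component
    Xc = X_perp - X_perp.mean(axis=0)        # center before extracting the PC
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return coord_d, X_perp @ vt[0]           # (early-to-late, orthogonal PC)
```

As a sanity check, activity that varies purely along the early-to-late axis has all its variance in the first coordinate and none in the orthogonal PC.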

Hypothesis comparison through distances
We can use the projection onto the early-to-late vector and the PC orthogonal to it as a dimensionality reduction procedure that maintains the space where our hypotheses live (the early-to-late vector) as well as other neural variability. In this space, we can measure the distance between the activity that would be expected under each of our hypotheses and what we observe in the data.
Our approach is thus to project the neural activity of the reactivations onto this basis, where PC is the principal component operator applied to the data projected on the component orthogonal to d. In this new basis, we can measure the noise-adjusted Euclidean distances between the observed points and the points corresponding to the BP or TL hypotheses. We thus obtain a distance value for each reactivation under each hypothesis, and compare the distributions generated by both through a Welch t-test.

Network simulations
We constructed ANNs with three hidden layers containing 100 hidden neurons each with ReLU activation functions. The hidden neurons were modified to be our rate-based multiplicative single-neuron model (see Methods 7.3). The teaching signals for BP and TL arrive at the apical dendrite, and for BP are modified according to Methods 7.4. We selected a supervised classification task for training for two main reasons. First, previous studies have noted that supervised learning of categories leads to feature representations in neural activity that are similar to those found in cortex, whereas unsupervised methods do not (Khaligh-Razavi and Kriegeskorte, 2014b). Second, the cortical recordings that we use suggest that cortex is learning a separation of representations of two distinct stimuli (Nguyen et al., 2024), which is directly transferable to a supervised classification task, whereas for unsupervised training we would have to use a loss function that we do not know for cortex.
The TL experiments shown in the main text were conducted through the Deep Feedback Control (DFC) framework with strong feedback (Meulemans et al., 2021, 2022a). This architecture was chosen because DFC does not require a symmetric or anti-symmetric weight structure as other methods do (Whittington and Bogacz, 2017; Scellier and Bengio, 2017), which is hard to implement when the feedforward inputs are additive but the feedback is multiplicative. For the BP experiments we used the standard PyTorch implementation of BP, extracted the learning signal for every neuron, and sent the signal to each neuron.
For training, we aimed to follow Nguyen et al. (2024) as closely as possible. We consider a binary stimulus task in which the network is presented with two MNIST digits in an alternating fashion for 80 stimuli, with a learning rate of η = 0.001. A learning step with a reactivation was performed directly after each stimulus. The activity of all three hidden layers was measured for both stimuli, giving qualitatively similar results.
8 Supplementary Material

8.1 Analysis of learning in a single PN

Our neuron model describes the interplay between apical and basal inputs and the resulting effect on synaptic calcium. Still, we have not yet explored how the resulting synaptic plasticity alters sensory processing. To investigate this, we consider two distinct information pathways: sensory (bottom-up) inputs arriving at basal dendrites and (top-down) feedback from higher cortical areas arriving at the apical dendrite (Bannister, 2005; Petreanu et al., 2009; Godenzini et al., 2022). To test if apical inputs could instruct plasticity at basal synapses, we adopt a simple apical feedback learning circuit (Aceituno et al., 2023). This circuit compares the firing rate of the neuron with its target firing rate for a specific input, and then sends the difference as a feedback input to the apical dendrite until the neuron has reached its desired, stimulus-specific target activity. Our proof-of-concept toy experiment involves two presynaptic neurons A and B, with synaptic weights w_A and w_B respectively, both connected to a single postsynaptic neuron. Before training, neuron A has a weak connection to the postsynaptic neuron, while neuron B is strongly connected. During training, stimuli S_A and S_B activate neurons A and B, respectively. The feedback circuit shapes the apical input (i.e., the supervised learning signal) to invert this behavior, effectively swapping the synaptic strengths w_A and w_B. Supp. Fig. 5A shows the evolution of the two synaptic weights over learning. As the neuron increasingly aligns with its target activity, two key metrics indicate the learning progress. First, the amount of top-down apical input required to align the stimulus response of the neuron with the target decreases over time towards a baseline. Second, the mean squared error (MSE), which corresponds to the squared difference between the output rate of the network and the target value, also decreases (Supp. Fig. 5B). Finally, we compare for every sample how the gradients (changes of activity in the neurons due to the plasticity) compare between the two models (Supp. Fig. 5C).
Our simple neuron model can thus learn simple tasks in both its rate-based and spike-based versions. For the rate model, we can also provide a mathematical argument showing that the change in activity as a result of learning is positively correlated with the change induced by the apical input during learning.
Consider a single postsynaptic neuron with presynaptic input firing rates denoted by the vector r_pre that go through synapses with weights w. At a given stimulus presentation, the response of the neuron is given by r^b_post = ϕ(⟨r_pre, w⟩), where ϕ(•) is the nonlinear response function, which here we take as a rectified linear unit (note that any other monotonically increasing semi-positive response function would work).
During learning, the apical input modifies the firing rate, and the difference induced during learning by the apical input is ∆r_learn = r^b_post (a − a*). (35) Through the plasticity captured by Eq. 14, after learning we get a new weight vector w' = w + η r_pre r^b_post (a − a*), (36) which results in a new response to the stimulus r'_post = ϕ(⟨r_pre, w'⟩). By a Taylor expansion we can compute the difference in the responses to stimuli, r'_post − r^b_post = ϕ'(z) ⟨r_pre, w' − w⟩, where ϕ'(z) is the derivative of the nonlinear response function evaluated at some presynaptic input between the inputs before and after learning. Plugging in Eq. 36, we obtain r'_post − r^b_post = η ϕ'(z) ∥r_pre∥² r^b_post (a − a*), and we note that the term r^b_post (a − a*) already appears in Eq. 35, hence r'_post − r^b_post = η ϕ'(z) ∥r_pre∥² ∆r_learn. Since η, ϕ'(z), ∥r_pre∥² ≥ 0, the two changes in postsynaptic firing rate (or neuronal activity) are positively correlated. The only exceptions are ϕ'(z) = 0 or r_pre = 0, which correspond to cases where the neuron is not firing anyway and thus is not involved in representing the stimulus.
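The positive correlation derived above can be checked numerically (a toy check with random rates and weights; the parameter ranges are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.05
relu = lambda x: np.maximum(x, 0.0)

signs_agree = []
for _ in range(200):
    r_pre = rng.uniform(0.5, 1.5, size=10)      # presynaptic rates
    w = rng.uniform(0.1, 1.0, size=10)          # initial weights
    r_b = relu(r_pre @ w)                       # baseline response
    a = rng.uniform(0.5, 1.5)                   # apical input, baseline a* = 1
    d_learn = r_b * (a - 1.0)                   # change induced during learning (Eq. 35)
    w_new = w + eta * r_pre * r_b * (a - 1.0)   # plasticity of Eq. 14
    d_after = relu(r_pre @ w_new) - r_b         # change in response after learning
    signs_agree.append(d_learn * d_after >= 0.0)

all_agree = all(signs_agree)
```

In every trial, the change induced by the apical input during learning and the change in the stimulus response after learning have the same sign, as the Taylor-expansion argument predicts.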

Formalism for TL and BP
In this section we describe the underlying mathematics of both TL and BP in general terms. Note that practical implementations of either make simplifications or approximations that are useful in practice, which we will not develop in detail. We will use the notation from the main text, r^a and r^b, to refer to the activities during learning and inference, as those correspond to the activities driven by apical input and at baseline. A learning algorithm for a neuronal network is an optimization procedure that decreases a loss function L(w) = Σ_s l(y_s, f(x_s, w)), where w are the weights of the connections between neurons (the computational equivalent of synaptic strengths), y_s is a desired output or activity of the network, f(•, •) is the function implemented by the network parameterized by the weights and the sensory input, and l(•, •) is an error function.
Note that both the error function l and the loss L are mathematical constructs, which can always be defined for any learning process, but do not need to be implemented by any observable biological quantity, and cannot always be cast as errors in the classical sense of what the activity should be (Kogo and Trengove, 2015). The formalism is still useful to make our analysis understandable and to connect our results to the machine learning literature as well as computational neuroscience frameworks such as predictive coding or energy models, so we will keep it.

BP
In classical BP, the change of weights is computed by gradient descent, where the derivatives are computed through the neuron activity. This last technicality is crucial, because it makes BP computationally efficient and casts it in terms of neuronal activity. Explicitly, this is framed as ∆w = −η ∂L(w)/∂w, where η is the learning rate. The change for a given weight for one given sample is then expressed as ∆w = −η r^b_pre ∂l(y_s, f(x_s, w))/∂r^b_post, where it is common (Bengio et al., 2016) to simplify the notation as δ_BP = −∂l(y_s, f(x_s, w))/∂r^b_post, giving us ∆_BP w ∝ r^b_pre δ_BP. (44)
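The relation between δ_BP and the gradient can be verified numerically for a single ReLU neuron with an MSE loss (an illustrative check with arbitrary numbers of our choosing):

```python
import numpy as np

r_pre = np.array([0.5, 1.2, 0.3])   # presynaptic rates
w = np.array([0.4, -0.2, 0.7])      # synaptic weights
y = 1.0                             # desired output

def loss(w):
    r_post = max(r_pre @ w, 0.0)    # ReLU response
    return 0.5 * (y - r_post) ** 2  # l(y, f(x, w))

r_post = max(r_pre @ w, 0.0)
delta_bp = (y - r_post) if r_pre @ w > 0 else 0.0  # delta_BP = -dl/dr_post
update = r_pre * delta_bp                          # Eq. 44: proportional to r_pre * delta_BP

eps = 1e-6
numeric = np.array([-(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                    for e in np.eye(3)])           # -dl/dw by central differences
```

The analytic update r_pre δ_BP matches the negative numerical gradient of the loss, confirming that the postsynaptic factor carries all the loss-dependent information.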

TL
In TL, the approach is different in that the loss is minimized by imposing a different activity through an input to each neuron, so that l(y_s, f(x_s, w, c*)) = 0, (45) where c* is a vector encoding the signal that imposes the activity onto each neuron. Depending on the architecture, this signal is called a control signal, a top-down prediction, or a feedback input, and is computed in various ways (Rao and Ballard, 1999; Meulemans et al., 2022a,b; Song et al., 2024; Scellier and Bengio, 2017). In general, the control is chosen so as to provide the smallest input that will fulfill the aforementioned condition. In energy models, for example, it follows the direction of steepest descent of the loss (Rao and Ballard, 1999; Scellier and Bengio, 2017), although other solutions such as control-theoretic approaches are also used (Meulemans et al., 2021). In the case of unsupervised learning, y_s is simply the activity of the network that is closest to the one imposed by x_s but fulfills a constraint on the activity such as sparsity.
The main point of TL is to provide a plasticity such that the control input will no longer be required, and thus the network will have learned the right computation without needing this control. A subtle step is to connect the synaptic weight update to the reduction of the control signal c. Since Eq. 45 holds as an invariant, the change of loss is zero, hence ⟨∂l/∂w, ∆w⟩ + ⟨∂l/∂c, ∆c⟩ = 0, (46) where ⟨•, •⟩ is a dot product, and the derivatives are taken at a control signal c ∈ [c*, c* + ∆c] that comes from the mean value theorem. Assuming that the change in control from one iteration to the next is small (akin to the assumption in BP that the weight updates impose a small change), c ≈ c*. From this, we obtain the constraint ⟨∂l/∂w, ∆w⟩ = −⟨∂l/∂c, ∆c⟩, (47) where we can already see how the weight update relates to the control signal, which we will now make more precise and in terms that are similar to the standard machine learning formalism. The first step is to recognize that both terms in Eq. 47 are single scalars, but are computed as dot products of high-dimensional quantities. Thus, the constraint is on one degree of freedom, but we want to turn it into an optimization process that applies to many neurons and weights. For this we select the weight update that reduces the control signal in the most efficient way with the minimum weight change.
We start with the fastest reduction of the control signal. If we want to pick ∆c that reduces c the fastest, we simply set ∆c = −ηc, (48) where the proportionality constant corresponds to the learning rate. Thus, we can rephrase Eq. 47 for single neurons and apply the chain rule to the partial derivatives, passing them through the postsynaptic activity of the neuron (Eq. 49). Since a derivative relating the postsynaptic activity to the loss (a scalar to a scalar) appears on both sides of Eq. 49, we can simplify it. Most TL implementations use a control signal that is additive (or computed as additive) to the hidden state of the neuron, the feedforward input in standard ANNs. Technically, other choices would work, such as an additive control on the output or a multiplicative effect, but to keep our derivations simple we will assume additive control of the internal state, noting that the changes would be minimal.
In this line of thought, we introduce an internal neuron variable that conveys the weighted sum of the neuron inputs, v_post = ⟨r_pre, w_pre⟩, (52) which gives the firing rate of the neuron through the relationship r_post = ϕ(v_post). We can then use this to simplify Eq. 51: since ∂v_post/∂w_pre = r^a_pre, we obtain ηc_post = ⟨r^a_pre, ∆w_pre⟩.
The use of r^a_pre as opposed to r^b_pre comes from the fact that the full network is controlled. Just as with the control signal, there are many possible changes of the presynaptic weights that would reduce the amount of control signal, for the simple reason that there are usually many presynaptic neurons. Thus, we assume that the weight update should be efficient, in the sense that it should provide the maximum decrease of c_post for the minimum amount of weight change ∆w_pre. Geometrically, the natural way to maximize the effect of a presynaptic weight update is to align the update with the presynaptic inputs, thus setting the angle between the vectors in the dot product to zero.
Taken together, the update rule for a single synapse ends up being ∆w = η r^a_pre c_post, (55) which we can easily recast in the language of machine learning as ∆w ∝ r^a_pre δ_TL. (56)
Note that a crucial component of TL is that r^a_pre is taken while the network is controlled; thus it is the presynaptic activity as it will be at target (when the network has already learned).
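The key property of Eq. 55, that learning makes the control signal itself unnecessary, can be seen in a minimal simulation of a single controlled linear neuron (our own toy example; the strong additive control enforces the target exactly at every step, and the presynaptic rates are clamped so r^a_pre = r_pre):

```python
import numpy as np

r_pre = np.array([1.0, 0.5])   # fixed (clamped) presynaptic rates
w = np.array([0.1, 0.1])       # initial weights
r_target = 1.2                 # target activity enforced by the control
eta = 0.3

controls = []
for _ in range(50):
    v = r_pre @ w              # feedforward drive
    c = r_target - v           # additive control that enforces the target
    w = w + eta * r_pre * c    # Eq. 55: per-synapse update r_pre_i * c_post
    controls.append(abs(c))
```

The magnitude of the control shrinks geometrically, so after learning the network reaches the target with essentially no control input, which is exactly the TL objective.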

Difference in synaptic updates
Contrasting the learning rules from Eq. 44 and Eq. 56, we find that both contain the presynaptic activity and a δ term. The latter can be used interchangeably between both approaches (see Supplement 8.3), but the presynaptic term differs: r^a_pre in TL is the same as the activity after learning, while in BP, r^b_pre is the network activity during inference.

Links between BP and TL
The two families of algorithms that we study are not separate classes, but rather two extremes of a spectrum. While both algorithms adapt the synaptic weights to modify the activity so that a desired output is produced, BP uses the neuronal responses of the network to a stimulus, while TL disregards the current responses and uses instead the desired neuronal responses. In this section we express this statement in mathematical terms.
The intuitive way to understand where the two algorithms intersect is to consider a small control input c. In this case, the activity of the network with or without the control is very similar, since the modifications due to the control are, by definition, small. In fact, there are works which explicitly use architectures that were originally designed as TL implementations (Rao and Ballard, 1999; Hopfield, 1982), but were then turned into BP implementations by using a weak control that approximates BP (Scellier and Bengio, 2017; Whittington and Bogacz, 2017).
This naturally implies that the difference between both algorithms fades away in the late stages of learning, where desired and current neuronal responses are very close and thus both algorithms behave very similarly. Since our main goal is to differentiate them, our analysis concerns the beginning of learning, where the network has just been initialized and the control signal in TL would have to be strong.
In the next sections we make this precise by showing analytically that BP with strong teaching signals that alter the neuron activities results in a learning procedure that is better described by TL than by BP, while TL with weak control is better described by BP than by TL.

BP with strong apical inputs is a close approximation of TL
At the single neuron level, the learning rule from BP is ∆_BP w ∝ r^b_pre δ_BP, where r^b_pre is the presynaptic activity during the feedforward pass, implying that the apical input would have to be very close to baseline to avoid altering the firing rate of the presynaptic neurons. However, if we allow the apical input to have a strong driving effect, as our model and previous literature suggest (Larkum et al., 1999), we obtain a different learning procedure. Such a learning procedure is actually closer to TL than to BP in terms of synaptic updates and neuronal activity during learning (as measured by cortical reactivations).
The approach that we follow is to show that, at the early learning stages, the matrix transpose is very similar to the matrix inverse which emerges from δ_TL, and thus using it to compute a strong feedback input is very similar to using feedback to control the network and using the control signal for learning.
First we outline some general properties that we would expect from the matrix ∂r_L/∂r.
At the beginning of learning, which is where the alignment between reactivations and stimulus differs for BP and TL, the network is not trained on that specific stimulus-output pair, and thus every entry of the matrix can be considered random. Intuitively, if we sampled various networks, we would expect different entries but with similar statistics; thus it is natural to treat any of those matrix entries as drawn independently from a given probability distribution. The second property is that we expect the effects of all neurons on the output to be balanced, meaning that the synaptic weights (and the derivatives that emerge from them) have an average of zero, an assumption that is commonly used in computational neuroscience and which explains features of neuronal activity in cortex (Brunel, 2000; Ledoux and Brunel, 2011). We will also assume that the network is large, implying that the matrix has many elements, and that the variance of the distribution is finite.
The inverse of a matrix M is defined simply by the property M M^{-1} = I, where I is the identity matrix. Computing an exact inverse is often a complex procedure with a complexity of O(n³), where n is the number of rows. However, for large matrices whose entries are independently drawn from a probability distribution with zero expectation µ_M = 0 (in the language of computational neuroscience, this corresponds to a balanced distribution) and finite variance σ_M < ∞, we can obtain a good approximation by using the transpose M^⊤. To illustrate this, consider the entries on the diagonal and off-diagonal of the product P = M M^⊤. Using the standard rules for the variance and expectation of products of independent variables, P is a matrix with diagonal entries nσ²_M ± √n σ²_M and off-diagonal entries with mean zero and standard deviation √n σ²_M. Importantly, the transpose only approximates the inverse if M M^⊤ ≈ I, implying that we need to scale M such that σ_M = n^{-1/2}; when n is large we then get P ≈ I, hence M^{-1} ≈ M^⊤. In this calculation we have to pick σ_M, but in the network case we might not have this freedom, as the variance at initialization is crucial for learning. Instead, we can adapt the scaling of the control input δ such that it cancels out σ_M.
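This approximation is easy to verify numerically (a quick check of ours with Gaussian entries scaled so that σ_M = n^{−1/2}):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
M = rng.standard_normal((n, n)) / np.sqrt(n)   # zero mean, sigma_M = n**-0.5
P = M @ M.T

diag_err = float(np.abs(np.diag(P) - 1.0).mean())  # diagonal entries near 1
off_diag = P - np.diag(np.diag(P))
off_std = float(off_diag.std())                    # off-diagonal entries near 0
```

With this scaling, the diagonal of M Mᵀ concentrates around 1 and the off-diagonal entries have standard deviation of order n^{−1/2}, so the transpose indeed behaves like an approximate inverse for large balanced matrices.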

Computationally efficient TL with weak feedback is an approximation of BP
Having shown that BP with strong feedback is close to TL, we now prove the complementary direction and argue that TL with weak feedback is actually closer to BP than to TL. For simplicity and to align with our implementation of TL, we will use the nomenclature of control theory, although a similar argument has been built for other specific implementations such as predictive coding (Whittington and Bogacz, 2017) or Equilibrium Propagation (Scellier and Bengio, 2017).
As per our previous argument, we note that BP and TL are fundamentally different in that the presynaptic activity is altered during TL synaptic updates, ∆_TL w = −r^a_pre δ_TL, (65) where r^a_pre is the presynaptic activity when the control signal is provided to the network. An important point here is that the control signal is used as δ_TL.
Naturally, if we have a very weak control with respect to the neuronal activity, then r^a_pre ≈ r^b_pre, so the main question is what the control signal is. Usually, controlling a system requires pushing it until it reaches a certain goal. To have a weak control signal would then mean either not reaching the target (which would no longer be TL as per our definition) or setting a target that is not the correct output, but rather a nudge in the direction of the target. The latter choice leads to a nudged target, where η ≪ 1 is the strength of the nudge, and the use of a derivative on the last term is valid because the changes in activity are very small. Given that the number of neurons is usually much higher than the dimensionality of the output, there are many possible configurations of δ_TL. The one that minimizes the norm of δ_TL on average across all errors would be the Moore-Penrose pseudoinverse, which is rather hard to compute (although valid) and is used implicitly in target propagation, a machine learning version of TL (Meulemans et al., 2020; Ernoult et al., 2022). A simpler approach is to evaluate the effect that each neuron has on the output and increase the control to each neuron until the target is achieved. This implies that for neuron n we would provide an input proportional to the derivative of the output with respect to that neuron, which aligns with BP.

Somatic calcium imaging
While we could not image intracellular calcium dynamics during plasticity induction events, we reproduced typical induction events in a subset of neurons (n = 7 neurons, 4 mice) after the post-induction EPSP recordings were complete. During these stimulations, we recorded voltage and calcium signals for activity patterns similar to those used for plasticity induction. We found a significant correlation between somatic calcium recordings (quantified as ∆F/F AUC) and PSST-AUC (Fig. 9B, R Spearman = 0.8303, p = 0.0029).
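The rank correlation used here can be computed directly from the two per-neuron AUC series. The sketch below uses hypothetical values in place of the recorded ∆F/F AUC and PSST-AUC, and a minimal Spearman implementation that assumes no tied values:

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman's rho: Pearson correlation of the ranks.
    # argsort-of-argsort yields 0-based ranks (assumes no ties).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical per-neuron values standing in for PSST-AUC and dF/F AUC.
psst_auc = np.array([0.4, 1.1, 0.7, 2.3, 1.8, 0.9, 1.5])
dff_auc = np.array([0.2, 0.9, 0.5, 2.0, 1.9, 1.3, 1.2])

print(round(spearman_rho(psst_auc, dff_auc), 3))  # 0.893
```

For data with ties, an average-rank scheme (as in `scipy.stats.spearmanr`) would be needed instead of the simple ranking above.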

Apical plasticity
In contrast to our results on basal synapses, we found that for apical synapses neither the plasticity induction AUC (R Spearman = 0.1049, p = 0.7456, Fig. 10D) nor the number of APs (R Spearman = 0.3996, p = 0.1982, Fig. 10E) is a good predictor of changes in synaptic strength. This difference in the predictability of basal versus apical plasticity is also reflected in the finding that overall changes in basal EPSP amplitudes did not correlate with changes in apical EPSPs (R Spearman = 0.2308, p = 0.4705, Fig. 10F).

Figure 1 :
Figure 1: Modeling the integration of basal and apical inputs in PNs. (A) Neuron schematic (left) detailing all events and interactions in the model. Responses of the apical, somatic, and synaptic compartments are shown for cases without (middle) and with (right) top-down input to the apical dendrite. (B) Synaptic plasticity as a function of calcium concentration. (C) Expected synaptic membrane potential as a function of basal and apical input. (D) Expected firing rate as a function of basal and apical input. For (B-D), apical input relative to baseline (black) has a modulating effect, with inputs above baseline leading to LTP and increased synaptic membrane potential and firing rate (green), and inputs below baseline leading to LTD and decreased synaptic membrane potential and firing rate.

Figure 2 :
Figure 2: Apical inputs direct changes in synaptic strength in basal synapses and affect PN activity. (A) Wide-field image of mouse neocortex with an L5 pyramidal neuron sketch superimposed and the placement of extracellular basal and apical stimulation electrodes. (B) Stimulation and recording protocol. (C-E) Example recordings of basal EPSP amplitudes before and after a plasticity induction event with a significant increase (top) and a non-significant difference in synaptic strength (bottom). (C) Evolution of the EPSP amplitude over time. (D) Mean EPSP pre and post plasticity-induction event. (E) Quantification of ∆EPSP (Student's t-test, top: p < 0.0001, bottom: p = 0.229). (F-G) Linear correlation analysis between changes in basal EPSP amplitude and PSST-AUC and number of APs, respectively. (H) Histogram of the PSST-AUC for one, two, and three APs. (I) Pairwise comparison of PSST-AUC for the minimum and maximum number of somatic APs (Wilcoxon signed-rank test, p = 0.031).

Figure 3 :
Figure 3: Target learning better aligns with in vivo cortical learning dynamics. (A) Experimental setup of the in vivo dataset (adapted from Nguyen et al. (2024)). Two stimuli are presented in alternation with a resting period in between. Neuronal activity is recorded through two-photon population calcium imaging, and stimulus-evoked responses are used to identify possible reactivations. (B) Activity changes given the apical feedback are proportional to the difference between reactivations and stimulus-evoked responses (top). Activity changes given the weight update are proportional to the difference between consecutive stimulus-evoked responses (bottom). (C) Correlation between the changes in activity driven by apical inputs during learning and the changes in activity due to synaptic updates before and after learning (R² = 0.80). (D) The stimulus-evoked responses and their reactivations define trajectories in activity space during learning, which we visualize through their alignment with late stimulus responses. (E) Cosine distance between the late stimulus-evoked responses and both the reactivations and the stimulus-evoked responses. The reactivations hypothesized by BP and TL are shown for the first 30 minutes of recording. (F) Fraction of unexplained variance of the reactivations for both the BP and TL hypotheses; the fraction is significantly lower for TL (Welch's t-test, p < 0.0001). (G) Neuronal activity over time projected onto a low-dimensional space defined by the early-to-late training vector and the principal components orthogonal to this vector. (H) Visualization of the projected stimulus-evoked responses and PN reactivations with the hypothesized reactivations of BP and TL overlaid. (I) Distance between the reactivations and the TL and BP hypotheses, showing that TL predicts reactivations significantly better than BP (right, Welch's t-test, p < 0.0001).

Author contributions P.V.A. conceptualized the project and the modeling experiments. P.V.A. and S.d.H. developed the single neuron model and the in vivo data analysis. P.V.A. performed the mathematical analyses of the models. S.d.H. implemented and performed the simulations for the single neuron model and the deep learning models, and performed the data analysis. R.L. conceptualized and performed the electrophysiological experiments. R.L. provided biological insights on both the theoretical and experimental neuroscience aspects of the project. B.F.G. supervised the project. P.V.A., S.d.H., R.L., and B.F.G. wrote the paper.

Figure 4 :
Figure 4: Biocytin-Streptavidin stained neurons (left) and typical hyperpolarizations and action potential trains (right top) induced by negative and positive current injection steps (right bottom) from recorded neurons.For cells 3, 4, 5, and 11, no Biocytin-Streptavidin staining/morphology reconstruction was performed.

Figure 5 :
Figure 5: Comparison of spike-based versus rate-based neuron model learning dynamics. (A) Training of the synaptic weights. (B) Evolution of the magnitude of the feedback signal and the loss function. (C) Comparison of the change in gradients across the simulation.

Figure 6 :
Figure 6: Effects of apical input on somatic firing rate, synaptic membrane potential, calcium influx, and plasticity. (A) Neuron firing rate modulated by apical input. (B) Synaptic membrane potential integrating EPSPs, bAPs, and plateau potentials. (C) Synaptic calcium concentrations with LTD and LTP thresholds and their integration.

Figure 7 :
Figure 7: Significance of the BP and TL hypotheses across principal components. BP and TL hypotheses from the first principal component (the early-to-late vector) to the first four principal components, with the explained variance for every additional component (left). The cumulative explained variance ratio for all principal components (right).

Figure 8 :
Figure 8: Angles and orthogonal PCs of BP and TL training within ANNs. Cosine similarity of the stimulus-evoked responses and reactivations to the late-training centroid for the BP and TL network simulations (left). Movement of the trajectories in the subspace comprising the early-to-late training vector and the first principal component orthogonal to this vector (right).