Abstract
Understanding the computational principles of the brain and replicating them on neuromorphic hardware and modern deep learning architectures is crucial for advancing neuro-inspired AI (NeuroAI). Here, we develop an experimentally-constrained biophysical network model of neocortical circuit motifs, focusing on layers 2-3 of the primary visual cortex (V1). We investigate the role of four major cortical interneuron classes in a competitive-cooperative computational primitive and validate that these circuit motifs implement soft winner-take-all (sWTA) computation for gain modulation, signal restoration, and context-dependent multistability. Using a novel parameter mapping technique, we configured IBM’s TrueNorth (TN) chip to implement sWTA computations, mirroring biological neural dynamics. Retrospectively, we observed a strong correspondence between the biophysical model and the TN hardware parameters, particularly in the roles of four key inhibitory neuron classes: Parvalbumin (feedforward inhibition), Somatostatin (feedback inhibition), VIP (disinhibition), and LAMP5 (gain normalization). Moreover, the sparse coupling of this sWTA motif was also able to simulate a two-state neural state machine on the TN chip, replicating working memory dynamics essential for cognitive tasks. Additionally, integrating the sWTA computation as a preprocessing layer in the Vision Transformer (ViT) enhanced its performance on the MNIST digit classification task, demonstrating improved generalization to previously unseen data and suggesting a mechanism akin to zero-shot learning. Our approach provides a framework for translating brain-inspired computations to neuromorphic hardware, with potential applications on platforms like Intel’s Loihi2 and IBM’s Northpole. By integrating biophysically accurate models with neuromorphic hardware and advanced machine learning techniques, we offer a comprehensive roadmap for embedding neural computation into NeuroAI systems.
Introduction
Recent advances in machine learning and computational neuroscience have significantly accelerated the progress toward the development of synthetic cognitive agents with artificial general intelligence (AGI). Vision transformers (Dosovitskiy et al. [2020]) and natural language models (Shanahan et al. [2023]) have achieved notable success in image recognition and natural language processing. However, despite surpassing human performance in specific tasks like chess (Campbell et al. [2002]) and Go (Silver et al. [2017]), AI systems still encounter significant challenges when learning in novel environments. These systems require substantially more computational resources and annotated data than biological brains. This disparity may also arise from fundamental differences in how artificial and biological neural networks process information. In this work, we investigate the potential of reverse-engineering the brain’s computational principles and integrating them into AI systems. This exploration aligns with the core tenet of the NeuroAI approach (Zador et al. [2023]), aiming to bridge the existing gap between artificial and biological intelligence.
The execution of cognitive behavior in the brain relies on the ability to select actions based on external stimuli and context (Dayan [2008]). In animals, the learning of state-dependent sensorimotor mappings (Asaad et al. [2000], Banerjee et al. [2020], Xu et al. [2022], Condylis et al. [2020]) is primarily mediated by the neocortex, which facilitates cognition through computations enabled through its modular, laminar microcircuits. These microcircuits consist of excitatory and inhibitory neurons, including four major inhibitory classes — parvalbumin (PV), somatostatin (SST), vasoactive intestinal peptide (VIP), and Lamp5 (Rudy et al. [2011], Tremblay et al. [2016]). These interneurons play a crucial role in regulating state-dependent computations, performing tasks such as arithmetic, logical operations, timing, and gain modulation (Fishell and Kepecs [2020], Kepecs and Fishell [2014], Ferguson and Cardin [2020], Niell and Scanziani [2021]). Importantly, these four inhibitory neuron classes are conserved across cortical regions and species (Pfeffer et al. [2013], Campagnola et al. [2022]), indicating that their computational logic is generalizable for diverse high-order tasks, including motor execution and working memory.
Several key candidate computational principles have been proposed to elucidate neocortical function, including normalization (Carandini and Heeger [2012]), dynamic field theory (Schöner and Spencer [2016]), attractor networks (Vyas et al. [2020]), predictive coding (Keller and Mrsic-Flogel [2018]), Bayesian inference (Bastos et al. [2012]) and winner-take-all (WTA) computations (Douglas and Martin [2007]). However, direct evidence at the level of microcircuit and biological hardware implementation remains limited. Among these, the WTA mechanism is amenable to neocortical architecture and combines key elements of these various computational approaches. By employing competitive-cooperative dynamics, the WTA mechanism facilitates selective amplification and noise minimization, thus enhancing signal restoration (Douglas and Martin [2007]). These characteristics resemble signal processing in both primary sensory and motor cortices, where superficial pyramidal neurons receive sparse and weak thalamic inputs that require amplification to extract relevant information (Balcioglu et al. [2023], Lien and Scanziani [2018], Bopp et al. [2017], Binzegger et al. [2004]). Consequently, the WTA mechanism may represent a fundamental and ubiquitous computational strategy implemented by cortical circuits throughout the neocortex.
In addition, WTA models show considerable promise for neuromorphic hardware (Mead [1990, 2023]), especially in energy-efficient, real-time processing (Chicca et al. [2014], Qiao et al. [2015], Indiveri and Sandamirskaya [2019]). To execute such computations in silico, IBM’s TrueNorth (TN) chip offers a tractable platform for integrating brain-inspired principles. For example, it features a reconfigurable, asynchronous, multi-core digital architecture optimized for real-time, ultra-low-power, event-driven processing with physical neurons (Modha et al. [2023], Merolla et al. [2014], Neckar et al. [2018]). As a result, TN is especially well-suited for implementing brain-like computations (Indiveri and Sandamirskaya [2019]). While prior research has focused on leveraging statistical relationships among neuronal populations to emulate biological circuits on TrueNorth hardware (Imam [2021]), developing generalizable techniques for integrating diverse biophysical and theoretical models remains an open challenge. In this work, we demonstrate that by employing biophysically realistic computational principles, parameters of IBM TrueNorth, such as thresholds, leak rates, and crossbar weights, can be modeled to reflect those found in cortical microcircuits. By utilizing this approach, TrueNorth (TN) hardware can be programmed to perform computations analogous to those observed in simplified, biologically realistic V1 cortical circuit motifs that may potentially underlie key V1 functions such as orientation and direction tuning (Rossi et al. [2020], Niell and Stryker [2008], Douglas and Martin [2007], Hubel and Wiesel [1962]).
Our goal here was not to build an exhaustive model of V1, such as that outlined in Billeh et al. [2020], but rather to design a simplified, generalizable circuit motif that validates the core computational principles utilized in cortical processing. The retrospective analysis confirmed that the optimal parameters for configuring TN hardware to display sWTA dynamics closely aligned with the primary functions of different interneuron classes. Furthermore, our findings demonstrated that hardware-optimized abstractions could effectively replicate biological circuits. Finally, to test the functionality of this approach, we investigated whether integrating this hardware-constrained sWTA computation could be utilized to implement a neural state machine for working memory or to enhance the performance of state-of-the-art deep learning models such as Vision Transformers (ViT). For the former, we successfully achieved persistent activity in the TN hardware by leveraging sparsely coupled sWTA motifs, a critical requirement for instituting working memory. Additionally, when this approach was applied as a pre-processing layer into the ViT architecture, we observed a substantial increase in classification accuracy for previously unseen test data. Together these results suggest that adapting biophysical principles to neuromorphic chips may offer a promising pathway for NeuroAI performance.
Results
Our objective is to extract general principles of neocortical function, such as soft Winner-Take-All (sWTA), and implement them efficiently on neuromorphic hardware. By leveraging the hardware’s parametric constraints, we aim to apply these simplified neocortical computations to enhance working memory capabilities. This approach not only mimics brain-like processing but also has the potential to improve AI models’ performance across various machine learning tasks, bridging the gap between neuroscience and artificial intelligence.
Biophysical model implementation of neocortical circuit motifs
Sensory information processing in primary sensory cortices, such as V1, relies on pyramidal neurons integrating bottom-up signals with top-down feedback from higher-order visual areas. Key components in this process include recurrent excitation, feedforward and feedback inhibition, disinhibition, and divisive normalization. These functions are primarily mediated by parvalbumin (PV), somatostatin (SST), vasoactive intestinal peptide (VIP), and lysosomal-associated membrane protein 5 (LAMP5) interneurons, respectively, working together to selectively amplify thalamic inputs (Reinhold et al. [2015], Pfeffer et al. [2013], Lien and Scanziani [2013]). Mouse V1 exhibits strong recurrent connections among layer 2/3 (L2/3) pyramidal neurons (Ko et al. [2011], Harris and Mrsic-Flogel [2013], Rossi et al. [2020]), with PV interneurons providing local feedforward inhibition and SST interneurons delivering global feedback inhibition targeting axon initial segments and L2/3 dendrites (Schneider-Mizell et al. [2021], Atallah et al. [2012], Naka et al. [2019], Adesnik and Scanziani [2010]). VIP interneurons modulate feedback inhibition (Karnani et al. [2016], Pfeffer et al. [2013]), while LAMP5 cells regulate top-down and bottom-up signals through normalization (Ibrahim et al. [2021], Huang et al. [2023], Malina et al. [2021], Hartung et al. [2024]).
Consistent with earlier studies, our experimental data confirmed that L2/3 pyramidal neurons receive strong inhibition during bottom-up sensory input stimulation in V1 (Supplementary Fig. 1A-C). To dissect the specific contributions of PV and SST interneurons, we employed optogenetics in PV-Cre and SST-Cre mice and analyzed how their activation modulated the current-frequency (f/I) response of L2/3 pyramidal neurons (Supplementary Fig. 1D-F). Additionally, top-down auditory cortex inputs, which primarily convey contextual information, were found to target Lamp5 expressing neurogliaform interneurons (Supplementary Fig. 1G-J). Though not explicitly tested, we modeled Lamp5-mediated global inhibition via volumetric transmission (Ibrahim et al. [2021], Huang et al. [2023]) affecting L2/3 pyramidal neurons. Finally, VIP interneurons, although not directly included, likely modulated SST-mediated inhibition through disinhibition (Karnani et al. [2016], Pfeffer et al. [2013]).
Next, we built a biophysically detailed network model in the NEURON simulation environment to validate whether the simplified cortical circuit motifs obtained from V1 indeed implemented sWTA computations. To do this, we incorporated excitatory and inhibitory cell types with diverse spiking patterns in a conductance-based Hodgkin-Huxley network model (Supplementary Fig. 2A-D). Synaptic parameters were constrained by our experimental data (Supplementary Fig. 1F). Poisson-modulated excitatory synaptic inputs were used to assess the input-output (IO) function of L2/3 pyramidal neurons under PV and SST inhibition (Supplementary Fig. 2A). Synaptic weights were tuned to match experimentally observed inhibitory postsynaptic potentials, adjusting the pyramidal neuron f/I curve (Supplementary Fig. 2D-F). SST inhibition primarily influenced the slope of the IO function, while PV inhibition altered the offset (Supplementary Fig. 2E-F). Lamp5-mediated volumetric inhibition was represented as non-specific inhibition across pyramidal neuron dendrites, achieved by reducing both SST and recurrent excitatory weights. Notably, although VIP inhibition was not explicitly modeled, its effects were captured by reducing SST weights.
Based on experimental data, we developed a generalized cortical microcircuit model comprising 10 pyramidal neurons, 10 PV interneurons, 1 SST interneuron, and 1 Lamp5 interneuron, with their biophysical properties constrained by our mouse V1 physiology data (Figure 1A). We examined the computational behavior of this model under Poisson-modulated excitatory synaptic inputs mimicking thalamic activity, where stronger inputs were selectively amplified, and weaker inputs suppressed (Figure 1B). Stronger thalamic inputs represent tuned orientation or direction information carried to L2/3 pyramidal neurons in V1. The soft winner-take-all (sWTA) mechanism enables neural circuits to prioritize the strongest input by balancing competitive and cooperative interactions. Non-linear amplification of stronger inputs, coupled with suppression of weaker ones, forms the core of sWTA computations. Consistent with previous findings (Douglas et al. [1995], Somers et al. [1995], Reinhold et al. [2015]), our model reproduces these dynamics through recurrent excitation and lateral feedback inhibition (Figure 1B-C). Using this model, we assessed how various interneuron populations contribute to sWTA computations, focusing on their role in enhancing responses to strongly stimulated neurons (Pyr 4) while suppressing weaker responses (Pyr 1-3, 5-7; Figure 1C). By modulating the conductances of PV, SST, VIP, and Lamp5 interneurons within physiological ranges, we quantified their influence on pyramidal neuron dynamics (Figure 1D-E). PV and SST inhibition independently shaped the width and gain of the sWTA function, while Lamp5-mediated inhibition primarily adjusted gain (Figure 1F). VIP-mediated disinhibition was examined by reducing SST synaptic weights.
We computed a selectivity index to evaluate the network’s ability to suppress weaker inputs while amplifying stronger ones. This index, calculated as the difference in pyramidal neuron responses to closely tuned inputs, revealed that recurrent excitation, along with PV and SST inhibition, is crucial for maintaining high selectivity (Supplementary Fig. 2G-I). Lamp5 inhibition modulates gain, preserving tuning specificity while enabling flexible responses to factors like attention and locomotion (Ferguson and Cardin [2020], Bugeon et al. [2022]). Our model replicates key features of cortical circuits, showing both linear gain scaling and non-linear selectivity. These dynamics are governed by the balance between excitatory and inhibitory synaptic weights and the feedback inhibition threshold, modulated by network activity. Deviations from optimal weights diminished sparsity by amplifying secondary inputs (Figure 1G-I). Linear gain modulation enhanced weak thalamic inputs (Oldenburg et al. [2024], Sievers et al. [2024], Lien and Scanziani [2018], Lien and Scanziani [2013]), while non-linear computations, such as the restoration of sharply tuned, noise-embedded inputs (Figure 1G-H), aligned with in vivo observations. Furthermore, the model captures hysteresis and multistability, enabling the circuit to amplify relevant inputs based on initial conditions or contextual cues (Figure 1I).
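As a concrete illustration, the sketch below computes one plausible version of such a selectivity index from steady-state pyramidal firing rates; the exact normalization used in Supplementary Fig. 2G-I may differ, and the example rates are illustrative, not data from the model.

```python
import numpy as np

def selectivity_index(rates):
    """Contrast between the most active pyramidal neuron and its closest
    competitor: values near 1 approximate a hard winner-take-all, values
    near 0 indicate no selectivity. `rates` holds steady-state rates (Hz)."""
    r = np.sort(np.asarray(rates, dtype=float))[::-1]
    winner, runner_up = r[0], r[1]
    return (winner - runner_up) / (winner + runner_up + 1e-12)

# illustrative rates: Pyr 4 amplified, its neighbours suppressed (cf. Figure 1C)
rates = [2.1, 3.0, 4.2, 18.5, 4.0, 2.8, 1.9]
print(round(selectivity_index(rates), 2))
```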
In summary, our model provides a biophysical foundation for a simplified and generalized computational mechanism, soft Winner-Take-All (sWTA), which may represent a universal computation in sensory cortices (Douglas and Martin [2007], Niell and Stryker [2008]) and might contribute to orientation and direction tuning in the visual cortex (Hubel and Wiesel [1962]), angular whisker tuning in the barrel cortex (Lavzin et al. [2012]), and frequency tuning in the auditory cortex (Kato et al. [2017]).
Mapping neocortical algorithms onto IBM TrueNorth neuromorphic hardware
A few years ago, IBM released their neuromorphic TrueNorth (TN) chip, offering a reconfigurable, asynchronous, multi-core digital architecture ideal for implementing brain-inspired computations. We aimed to program the TN chip to implement a simplified sWTA computational primitive, inspired by the neocortex. A key challenge was that TN’s neural dynamics were governed by parameters such as thresholds, leaks, and crossbar weights that did not directly align with biophysical or artificial neural network models. Notably, the effects of these parameters strongly resemble the gain modulation regulated by the four interneuron types considered in the biophysical modeling described previously.
To implement an sWTA computation, we developed an automated gain-matching technique to match TN network dynamics to the biophysical model, enabling accurate parameter mapping (Appendix A1). Initially, we created an abstract rate-based model that mimicked the input-output (IO) function of the biophysical neurons. We then derived constraints to map these dynamics onto the TN hardware, producing a linear threshold response that closely approximated the physiological behavior of cortical neurons (Figure 2A-C). Inputs to the TN network were generated by configuring the on-chip neurons within neurosynaptic cores to produce a range of frequencies (Figure 2D-E). This allowed us to match the TN network’s IO gain to the abstract model (Figure 2F). Using contraction theory (Rutishauser and Douglas [2009]; Appendix A2), we derived optimal TN parameters, enabling us to implement all sWTA operations, as we performed in our biophysical analysis of V1 processing, including signal restoration, hysteresis, and multi-stability (Figure 2G-I).
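To illustrate the gain-matching idea (not the exact procedure of Appendix A1), the sketch below fits a linear-threshold input-output curve to samples from the abstract model and from a TN frequency sweep and reports the gain ratio between the two; all numerical values are hypothetical, and how this ratio is distributed across specific TN parameters is the subject of Appendix A1.

```python
import numpy as np

def fit_linear_threshold(f_in, f_out):
    """Least-squares fit of f_out ~ max(0, gain * f_in - offset), using only
    the supra-threshold (non-zero output) points of a measured IO curve."""
    f_in, f_out = np.asarray(f_in, float), np.asarray(f_out, float)
    active = f_out > 0
    A = np.column_stack([f_in[active], -np.ones(active.sum())])
    gain, offset = np.linalg.lstsq(A, f_out[active], rcond=None)[0]
    return gain, offset

# hypothetical IO samples from the abstract model and from a TN frequency sweep
f_in = np.linspace(0, 100, 11)
abstract_out = np.maximum(0, 0.80 * f_in - 12)
tn_out = np.maximum(0, 0.65 * f_in - 9)          # uncalibrated TN response

g_ref, _ = fit_linear_threshold(f_in, abstract_out)
g_tn, _ = fit_linear_threshold(f_in, tn_out)
print("gain ratio (abstract / TN):", round(g_ref / g_tn, 2))
```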
After programming the TN chip to perform sWTA computations, we retrospectively compared TN parameters such as thresholds and leaks with those in our biophysical model. The optimized TN parameters closely aligned with the functions of the excitatory-inhibitory balance observed in the biophysical model (Supplementary Fig. 2E-F). Specifically, TN parameters such as threshold and leak mirrored the roles of PV and Lamp5-mediated inhibition in the biophysical model (Supplementary Fig. 3A-C). Moreover, the TN neurons replicated the recurrent excitation and global feedback inhibition motifs found in cortical circuits, mirroring the effects of SST interneurons on pyramidal neuron IO functions, as well as the role of VIP interneurons in disinhibiting this population (Figure 2E). Notably, the parameters derived using our gain-matching technique closely resembled those observed in experimentally constrained models of neocortical circuits across all conditions, highlighting the fidelity of TN hardware in simulating cortical computations (Supplementary Fig. 3D-F).
While our initial efforts efficiently mapped the sWTA computations in rate mode, we extended the method to implement population-level sWTA networks in spiking mode configuration (Figure 2J). This configuration, tested on a population of 10 excitatory neurons gated by shared inhibition, allowed us to translate rate-based computations into spiking dynamics, better reflecting neocortical organization. Under optimal conditions, the TN network’s dynamic range during sWTA closely matched the firing rates of cortical neurons in V1 (Figure 2K-L). We evaluated whether the TN network in spiking mode could implement sWTA computations under noisy conditions, simulating biological variability (Figure 2K-L). The noise was controlled using a parameter called threshold mask noise (TMN), which emulated spontaneous cortical activity. At TMN values up to 10, synaptic and spiking variability resulted in stable network dynamics that supported sWTA operations (Supplementary Fig. 3G-H). This demonstrated that TN spiking networks remain stable in noisy environments and avoid synchrony driven by inhibition. Without inhibition, excitatory activity would increase exponentially; however, when inhibitory neurons are activated, their gain is tuned to stabilize excitatory activity. This balance between positive and negative feedback forms an attractor state.
We next aimed to demonstrate a practical use case for sWTA circuit motifs in hardware applications, specifically by implementing a neural state machine (NSM). We hypothesized that NSMs could serve as foundational elements for complex cognitive tasks in robotics, and would benefit from the energy-efficient framework of sWTA networks. A key aspect of cognitive function is working memory, which allows for the retention of cue information even in the absence of stimuli, enabling appropriate action selection based on environmental cues. NSMs with working memory encode stimuli as distinct states, transitioning between them to support context-dependent tasks (e.g., Fuster and Alexander [1971], Wang [2001], Harvey et al. [2012]). Previous studies have shown that sparsely coupled sWTA motifs can sustain persistent activity (Neftci et al. [2013], Rutishauser and Douglas [2009]). Given the conserved use of circuit motifs across cortical areas, we hypothesized that this sWTA architecture would efficiently support working memory. To implement stable attractor states, TN hardware parameters were tuned to balance positive and negative feedback as outlined in Neftci et al. [2013] to avoid inhibition-mediated synchrony, which is critical for maintaining persistent activity in an NSM using spiking dynamics. We tested whether these motifs could facilitate action selection in response to environmental cues while retaining information in their absence. Our results showed that TN neurosynaptic cores, initially designed for sensory sWTA, could be effectively repurposed for NSM implementation. Using sparsely coupled sWTA motifs, we achieved persistent activity in TN hardware, with time constants that aligned closely with experimental data.
We then evaluated the ability of this architecture to implement a two-state NSM. Transitions between states S1 and S2 were driven by input signals (X, Y) and regulated by pointer neurons (P12, P21) within the sparsely coupled sWTA motif (Supplementary Figure 4A-C). This configuration generated stable sWTA dynamics and persistent attractor states (Neftci et al. [2013]; Appendix A2). Consistent with previous findings, gamma coupling through bidirectional excitatory weights in TN hardware sustained persistent activity even without external input. When noise was introduced, it disrupted synchronous firing but optimizing gamma coupling maintained stable persistence (Supplementary Fig. 4D). By fine-tuning the coupling strength, we identified the minimal gamma required for maintaining persistent activity across varying noise levels. Striking this balance was critical for stability, especially when environmental cues were unreliable. We further confirmed the stability of attractor states by removing one transition input, demonstrating that the circuit continued to sustain activity (Supplementary Fig. 4E).
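The transition logic of this two-state NSM can be summarized at the finite-state-automaton level as in the sketch below. This abstraction deliberately omits the spiking dynamics, gamma coupling, and noise handling used on TN hardware; the pointer weight is illustrative.

```python
import numpy as np

def nsm_step(state, x, y, w_ptr=2.0):
    """One tick of a two-state NSM abstracted at the FSA level: pointer
    units P12/P21 are conjunctions of the currently active state and the
    matching cue (X or Y), and a winner-take-all over the candidate drives
    selects the next state; persistence holds the state when no cue arrives."""
    s = np.array([state == 0, state == 1], dtype=float)   # one-hot [S1, S2]
    p12, p21 = s[0] * x, s[1] * y                          # pointer neurons
    drive = np.array([s[0] + w_ptr * p21,                  # support for S1
                      s[1] + w_ptr * p12])                 # support for S2
    return int(np.argmax(drive))

# a single X cue at step 4 switches S1 -> S2, and the state is held afterwards
state, X, Y = 0, [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0]
for x, y in zip(X, Y):
    state = nsm_step(state, x, y)
print("final state:", "S2" if state == 1 else "S1")
```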
In summary, we developed a two-state NSM where transitions were governed by input signals and the current state (Supplementary Fig. 4F). The sWTA dynamics facilitated smooth state transitions within a finite state automaton (FSA) framework. Efficiency analysis revealed that both time and energy in TN hardware scaled linearly with the number of states and computational load, contrasting with the quadratic scaling observed in Compass simulations. Notably, runtime on TN hardware was independent of firing rates, synapse activity, and neuron counts, with asynchronous state updates (Supplementary Fig. 4G-H; Appendix A3). This underscores the efficiency of neuromorphic hardware in implementing NSMs, supporting higher cognitive functions.
Neocortex-inspired WTA implementation for Artificial Intelligence applications
Performance boost in Image Classification
Finally, we tested whether incorporating a pre-processing WTA layer into deep learning models, such as Vision Transformers (ViTs), could enhance performance on real-world vision tasks. Specifically, we explored the role of sWTA computations in spatial feature extraction for object classification tasks. We developed a novel neural layer inspired by the hardware-constrained sWTA motif and integrated it into the conventional ViT architecture to assess its impact on classifying unseen digit datasets. This approach (Appendix A4) leveraged sWTA as a pre-processing layer to reduce redundancies and enhance contrast in visual inputs. A sliding window-based computation (Figure 3A) was employed for feature amplification, minimizing domain shifts (Figure 3B). This allowed parameters extracted by the TN hardware constraints to execute sWTA computations using recurrent excitation and lateral inhibition across pixels. In this setup, the patch with the highest variance, or “winner patch,” received the highest normalized value, while other patches were scaled accordingly. The selection of the “salient” patch size was optimized to maintain stable circuit dynamics in line with TN hardware parameters.
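A minimal sketch of one plausible reading of this sliding-window computation is given below: each patch is contrast-normalised by its own statistics and rescaled relative to the highest-variance ("winner") patch. The exact weighting used with the TN-derived parameters (Appendix A4) may differ, and the patch size and epsilon are illustrative.

```python
import numpy as np

def wta_preprocess(img, s=4, eps=1e-6):
    """Patch-wise sWTA-style pre-processing of a 2-D grayscale image:
    every s x s patch is z-scored by its own mean/std and then rescaled by
    patch_variance / winner_variance, so the winner patch keeps full
    amplitude while flatter patches are suppressed."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    variances = {}
    for i in range(0, H - s + 1, s):
        for j in range(0, W - s + 1, s):
            variances[(i, j)] = img[i:i+s, j:j+s].var()
    v_max = max(variances.values()) + eps
    for (i, j), v in variances.items():
        patch = img[i:i+s, j:j+s].astype(float)
        normed = (patch - patch.mean()) / (patch.std() + eps)
        out[i:i+s, j:j+s] = normed * (v / v_max)      # winner keeps gain 1
    return out

# usage: x = wta_preprocess(mnist_digit)   # mnist_digit is a 28x28 array (hypothetical)
```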
We next evaluated domain generalization to assess the ability of the sWTA model to adapt to unseen data distributions — an ideal test for mimicking the brain’s ability to generalize across diverse sensory inputs and maintain robust performance in novel environments. We trained the ViT model, with and without the WTA layer (Figure 3C), on a single source domain, and then tested its performance on unseen target domains. Significant improvements were observed across all source/target combinations (Figure 3D). Beyond ViT, similar performance gains were observed in EfficientNet (Tan and Le [2019]), CapsuleNet (Sabour et al. [2018]), MobileNet-V2 (Sandler et al. [2018]), and ResNet-50 (He et al. [2016]), where the WTA layer enhanced the models’ ability to learn generalizable features, improving domain shift robustness in object recognition tasks for MNIST and other digit datasets (Table 1). Figure 3B illustrates how the WTA layer reduces domain shift, showing high similarity across sample images post-processing. These results were achieved without intensity-based augmentation, using only geometric augmentations. EfficientNet, MobileNet, and ResNet-50 were initialized with pre-trained ImageNet weights (Russakovsky et al. [2015]). Supplementary Figures 5A-B present train/loss curve examples for ViT and CapsuleNet, while Supplementary Table 2 details model architectures and training settings. Table 1 summarizes the results, with Supplementary Table 1 showing performance improvements compared to state-of-the-art models. We also compared the WTA layer’s ability to minimize domain shift against traditional pre-processing techniques such as Local Response Normalization (LRN) Krizhevsky et al. [2017], Local Contrast Normalization (LCN) Placidi and Polsinelli [2021] and Z-score normalization. Supplementary Figure 6A provides qualitative comparisons, while Supplementary Figure 6B visualizes UMAP embeddings of MNIST and MNIST-M datasets post-normalization. Our WTA implementation significantly minimized domain shift (Supp Fig. 6B), leading to a marked improvement in classification accuracy (0.7) on unseen test data compared to baseline techniques (Supp Fig. 6C).
Performance boost in Image Segmentation
Lastly, we evaluated our approach in a deep learning model for the challenging task of natural image segmentation. Similar to the results seen in image classification, incorporating the sWTA layer into the RefineNet architecture (Lin et al. [2017]) with a ResNet-101 backbone significantly improved performance in semantic segmentation. This underscores the broad applicability of our approach across various vision tasks. The model was trained on the Cityscapes dataset (Cordts et al. [2016]), consisting of 2,975 daytime driving images, and tested on 50 coarsely annotated images from the Nighttime Driving dataset (Dai and Van Gool [2018]) (Figure 3E). We employed the same training setup, using a dynamic learning rate of 0.1 with stochastic gradient descent (SGD) on an RTX A6000 GPU and a batch size of 6. Performance was measured using mean Intersection over Union (mIoU).
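For reference, the evaluation metric can be computed as in the sketch below; the ignore label and per-class averaging follow standard Cityscapes conventions and are assumptions rather than details taken from our pipeline.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean Intersection-over-Union between two integer label maps,
    averaged over the classes present in either map."""
    ious = []
    valid = target != ignore_index
    for c in range(num_classes):
        p, t = (pred == c) & valid, (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                      # class absent from both maps
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```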
Notably, adding the WTA layer to RefineNet substantially improved performance on the nighttime driving data using only source-trained models (Figure 3F).
Table 2 highlights the performance improvements in RefineNet for natural image segmentation tasks, with and without the sWTA layer, demonstrating its robustness in handling domain shifts. Supplementary Fig. 7 shows qualitative results for the nighttime dataset, comparing performance with and without the WTA layer.
In conclusion, implementing sWTA motifs as a layer in various deep learning architectures substantially improves their performance. Using the competitive-cooperative dynamics of biologically inspired WTA mechanisms as a preprocessing layer creates a synergy between biological principles and artificial neural networks. Our WTA-inspired layer enhances performance in real-world tasks like image classification and segmentation by acting as an adaptive filter that sharpens focus on the most relevant features while reducing noise and irrelevant information.
Discussion
The overall goal of this study was threefold: First, to validate that computational principles, such as winner-take-all (WTA), are implemented by neocortical circuit motifs by delineating the contribution of four major cardinal interneuron classes in biophysical models. Second, to emulate WTA computational primitives and extend them to construct neural state machines on IBM’s TrueNorth (TN) neuromorphic hardware. Third, to integrate hardware-derived parametric constraints into deep learning models, such as Vision Transformers, demonstrating that biological principles can enhance performance and reduce training sessions through zero-shot learning. Figure 4 shows the block diagram architecture of our proposed framework.
Our gain-matching technique allows for the emulation of any neural network architecture and computational principle on neuromorphic hardware, similar to previous studies on analog systems (Neftci et al. [2011]). This work builds on prior neural computation research that has applied sWTA networks to object recognition (Yuille and Grzywacz [1988]; Riesenhuber and Poggio [1999]; Erlhagen and Schöner [2002]), attention (Itti et al. [1998]; Deco and Rolls [2005]), orientation selectivity (Ben-Yishai et al. [1995]; Somers et al. [1995]), decision making (Amari and Arbib [1977]) and sparse coding (Rozell et al. [2008]). Notably, our proposed framework (Figure 4) can be easily adapted for next-generation neuromorphic platforms like Braindrop (Neckar et al. [2018]), IBM’s Northpole (Modha et al. [2023]), and Intel’s Loihi2 (Davies et al. [2021]), facilitating a direct translation of computational insights from theoretical neurobiology to hardware and AI systems.
Building on prior studies, we demonstrate that generic circuit motifs used for sensory computations can be adapted to implement working memory for cognitive decision-making tasks on TN neuromorphic hardware (Rutishauser and Douglas [2009], Neftci et al. [2013]). This multiplexing of sensory and decision-making computations is both hardware-friendly and efficient, with computation time and energy scaling linearly with the number of states or computational load. Our approach provides a scalable solution for mapping large state machines, outperforming current Long Short-Term Memory (LSTM) models. While the components in LSTM architectures scale quadratically with the number of states, our method scales linearly, offering a more efficient alternative. Future research could investigate the role of specific interneurons and circuit motifs beyond sensory regions, potentially revealing principles for context-dependent decision-making.
Our framework can also be extended to mimic dendritic computation in cortical neurons and implement dendrocentric learning rules (Boahen [2022]), such as BCM plasticity (Bienenstock et al. [1982]), spike-timing dependent plasticity (STDP; Bi and Poo [1998]), and non-Hebbian behavioral time-scale plasticity (BTSP; Bittner et al. [2017]). These principles could inspire better hardware design for constructing self-learning AI cognitive agents with behavioral flexibility akin to that of animals. In this study, we focused on implementing aspects of the primary visual cortex within Vision Transformers and deep learning models. Future work could extend this framework to other cortical regions involved in sensory processing and decision-making. Together, by mapping realistic biophysical models onto neuromorphic systems, we are able to improve AI networks, providing a key step towards fostering NeuroAI development.
Author Contributions
A.I., G.F., and S.H. designed research and wrote the manuscript; S.H. performed in-vitro experiments, biophysical modeling, and neuromorphic hardware implementation; A.I. and S.H. conceived mapping of neocortical computation in deep learning architectures. A.I. and H.M. performed mapping of WTA in deep learning architectures and ran experiments for image classification and segmentation tasks; G.J.S. supervised in-vitro experiments; G.F. supervised biophysical modeling and contributed to the organization of the manuscript.
Competing Interest Statement
The authors have declared no competing interests.
Methods
Animals
All experimental procedures were approved by and conducted in accordance with Harvard Medical School and The Australian National University Institutional Animal Care and Ethics Committee.
Viral injections and Whole-cell patch-clamp recordings
For labeling bottom-up sensory inputs and top-down contextual inputs, AAV1-hSyn-hChR2(H134R)-EYFP-WPRE-hGH (ChR2) was injected into the contralateral visual cortex (or visual thalamus, dLGN) and the primary auditory cortex, respectively (Honnuraiah et al. [2024]; Godenzini et al. [2021]). Ipsilateral eye input is stimulated by contralateral V1, and contralateral eye input is stimulated by dLGN stimulation (Honnuraiah et al. [2024]). For PV and SST inhibition experiments, Cre-dependent ChR2 (AAV1-EF1a-DIO-hChR2(H134R)-EYFP-WPRE-hGH) was injected into the binocular visual cortex of transgenic mice expressing Cre in either PV or SST interneurons. Three to four weeks after viral injection, mice were deeply anesthetized with isoflurane (3% in oxygen) and immediately decapitated. Slice preparation protocols and the experimental recordings are explained in detail in Honnuraiah et al. [2024]. All recordings were made in current-clamp mode using a BVC-700A amplifier (Dagan Instruments, USA). Data was filtered at 10 kHz and acquired at 50 kHz by a Macintosh computer running Axograph X acquisition software (Axograph Scientific, Sydney, Australia) using an ITC-18 interface (Instrutech/HEKA, Germany). Hyperpolarizing and depolarizing current steps (−200 pA to +600 pA; intervals of 50 pA) were applied via the somatic recording pipette to characterize the passive and active properties of neurons. Brain slices were bathed in gabazine (10 µM) to block inhibition mediated by GABA-A receptors. Other pharmacological agents used in these experiments included tetrodotoxin (TTX; 1 µM) and 4-aminopyridine (4-AP; 100 µM), as noted in the Results. For photo-stimulation of ChR2-expressing neurons and axon terminals, a 470 nm LED (Thorlabs) was mounted on the epi-fluorescent port of the microscope (Olympus BX50), allowing wide-field illumination through the microscope objective. The timing, duration, and strength of LED illumination were controlled by the data acquisition software (Axograph).
Computational Modeling
We conducted biophysical modeling using the NEURON 8.2 simulation environment (Carnevale and Hines [2006], Hines and Carnevale [1997]), with an integration time step of 25 µs. The active and passive properties of the model were optimized to match the experimental recordings (Supp Fig 1). We set the passive parameters as follows: internal/axial resistance (Ri/Ra) to 150 Ω·cm, membrane resistance (Rm) to 30 kΩ·cm², capacitance (Cm) to 1 µF/cm², and resting membrane potential (Vm) to −75 mV. All neurons were simplified and implemented as a “ball and stick” model consisting of a somatic compartment (dimensions: Length=50 µm; diameter=50 µm) and a single dendritic compartment (dimensions: Length=100 µm; diameter=1 µm). Dendritic compartments were passive and were not adjusted for spines in the interneuron population, but were adjusted for spines in pyramidal neurons by scaling Cm by 2 and Rm by 0.5. Active conductances were included in the somatic compartment to mimic the regular firing pattern of pyramidal neurons, the fast-spiking pattern of PV, burst spiking for SST, and delayed spiking for Lamp5. Active ion-channel distributions and conductance values were obtained from our previous study (Soldado-Magraner et al. [2020]). A synapse was modeled as a co-localized combination of NMDA and AMPA receptor currents. The default NMDAR:AMPAR ratio was set at 1.5. All values of the synaptic parameters were obtained from our previous studies (Honnuraiah and Narayanan [2013], Testa-Silva et al. [2022]).
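A minimal NEURON (Python) sketch of the ball-and-stick cell with these passive values is shown below; the built-in hh mechanism is only a stand-in for the fitted conductances of Soldado-Magraner et al. [2020], and the current-clamp amplitudes are illustrative.

```python
from neuron import h
h.load_file("stdrun.hoc")

def ball_and_stick(pyramidal=True):
    """Ball-and-stick cell with the passive values from the Methods."""
    soma = h.Section(name="soma")
    soma.L = soma.diam = 50                    # µm
    dend = h.Section(name="dend")
    dend.L, dend.diam = 100, 1                 # µm
    dend.connect(soma(1))
    for sec in (soma, dend):
        sec.Ra = 150                           # Ω·cm
        sec.cm = 1.0                           # µF/cm²
        sec.insert("pas")
        for seg in sec:
            seg.pas.g = 1.0 / 30000.0          # Rm = 30 kΩ·cm²
            seg.pas.e = -75                    # mV
    if pyramidal:                              # spine correction: Cm x2, Rm x0.5
        for seg in dend:
            seg.cm *= 2
            seg.pas.g *= 2
    soma.insert("hh")                          # placeholder spiking mechanism
    return soma, dend

soma, dend = ball_and_stick()
stim = h.IClamp(soma(0.5))
stim.delay, stim.dur, stim.amp = 100, 500, 0.3  # ms, ms, nA (illustrative)
h.dt = 0.025                                    # ms, i.e. the 25 µs time step
h.v_init, h.tstop = -75, 700
h.run()
```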
Rate-based abstract neural network model
We developed a simplified abstract model to reduce computational demands and extract the principles from the detailed biophysical network models, using a rate-based approach to model neuronal activity. We approximate the neuronal activation by a linear-threshold function that describes the output action potential discharge rate of the neuron as a function of its input (Bauer et al. [2014]). This type of activation function is a good approximation to experimental and biophysical observations of the frequency of action potential discharge in response to synaptic or current inputs. The change in activity of a neuron is modeled as the summation of its synaptic input combined with a decay of its current activity, where [exct]+ is the rectified excitatory neuron activity, [inht]+ is the rectified inhibitory neuron activity, τ is the neuronal time constant, and α, β1, and β2 are synaptic weights (Figure 2B).
The implementation of the computational primitives obtained from the biophysical models in the rate-based abstract models was crucial. This is because it provided analytically tractable solutions for the dynamics of neural activity. The analytical solution was later used to derive constraints for the TN hardware parameters as explained in the Appendix A2.
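A minimal sketch of the assumed rate dynamics is given below, using simple Euler integration. The equation form follows the linear-threshold sWTA of Rutishauser and Douglas [2009]; the parameter values are illustrative ones chosen to satisfy the stability condition 1 − α + β1·β2 > 0 and are not the values used on TN hardware.

```python
import numpy as np

def simulate_swta(I, T=500, dt=0.1, tau=10.0, alpha=1.1, beta1=2.0, beta2=0.25):
    """Euler integration of a rate-based sWTA motif with linear-threshold
    units and one shared inhibitory neuron: each excitatory unit decays
    toward the rectified sum of its external input, self-excitation (alpha)
    and global inhibition (beta1), while the inhibitory unit pools total
    excitatory activity with weight beta2."""
    relu = lambda v: np.maximum(v, 0.0)
    exc = np.zeros_like(I, dtype=float)
    inh = 0.0
    for _ in range(int(T / dt)):
        d_exc = -exc + relu(I + alpha * exc - beta1 * inh)
        d_inh = -inh + relu(beta2 * exc.sum())
        exc += dt / tau * d_exc
        inh += dt / tau * d_inh
    return exc

# two closely tuned inputs: the stronger one is amplified, the weaker suppressed
print(np.round(simulate_swta(np.array([0.0, 1.0, 1.2, 0.0])), 2))
```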
IBM TrueNorth hardware and Compass software emulator
The IBM TN neuromorphic chip is composed of 64×64 (4096) digital neurosynaptic cores tiled in a 2-D array, containing an aggregate of 1 million neurons and 256 million synapses. Each core implements 256 single-compartment leaky-integrate-and-fire neurons, which can be operated in either rate or spike mode configuration. Each core is supported by a 256×256 crossbar synapse array and communication circuits that transfer spike trains. The crossbar array is flexible and can be configured freely. Each row of the crossbar corresponds to an axon, represented by a horizontal line, which can be driven by any on-chip neuron. Inputs to the cores are generated by configuring the on-chip neurons to generate various frequencies in either rate or spike mode. Each column corresponds to the dendrite of a particular neuron, represented by a vertical line. A connection between an axon and a dendrite is a synapse, and these connections are organized into a synaptic crossbar (Supplementary Figure 4A). A memory cell is located at the intersection of each row and column, and the binary value stored in the cell represents whether or not a connection exists between that particular axon-dendrite pair. Therefore, each neuron can be configured to receive up to 256 synaptic inputs (through its dendrite), depending on the crossbar values and the activity of the axons. TN operates in a mixed asynchronous–synchronous fashion: all communication and control circuits use an asynchronous design, while computations are performed synchronously. Since TN cores operate in parallel and are governed by spike events, it is natural to implement all routing mechanisms asynchronously. All core computations must finish within the current tick, which spans 1 ms. Compass is the software emulator used to program and simulate the full 4096 neurosynaptic cores, and the digital asynchronous–synchronous design ensures one-to-one Compass-to-TN correspondence (Akopyan et al. [2015], Merolla et al. [2014]).
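The per-tick operation of a single core can be caricatured as in the sketch below; this sketch ignores TN's per-axon-type signed weights, stochastic leak and threshold modes, and asynchronous routing, and the default parameter values are arbitrary.

```python
import numpy as np

class CoreSketch:
    """Caricature of one neurosynaptic core: a binary axon-to-neuron
    crossbar drives integrate-and-fire neurons updated once per 1 ms tick."""
    def __init__(self, n_axons=256, n_neurons=256, weight=1, leak=0, threshold=10):
        self.crossbar = np.zeros((n_axons, n_neurons), dtype=bool)  # synapses
        self.w = weight          # weight applied to every connected synapse
        self.leak = leak         # added to each membrane potential per tick
        self.theta = threshold   # firing threshold
        self.v = np.zeros(n_neurons)

    def tick(self, axon_spikes):
        """axon_spikes: length-n_axons binary vector of input events this tick."""
        self.v += axon_spikes.astype(float) @ (self.crossbar * self.w) + self.leak
        fired = self.v >= self.theta
        self.v[fired] = 0.0                     # reset after spiking
        self.v = np.maximum(self.v, 0.0)        # clip; TN offers other modes
        return fired
```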
Vision Transformer (ViT) architecture
Inspired by the original transformer (Vaswani [2017]) architecture for Natural Language Processing, ViT (Dosovitskiy et al. [2020]) is a self-attention-based architecture. It works as follows: the input image is divided into N flattened 2D patches (where we keep N=6 for the MNIST experiments), and linear embeddings of these patches are fed as input to the encoder of the ViT. The image patches of digits are thus embedded as tokens. The encoder block has several multi-headed self-attention layers, each preceded by a normalization layer. Furthermore, a Multi-Layer Perceptron (MLP) with a single hidden layer is used as a classification head that predicts the object categories present in the input image.
WTA implementation in ViT
We present a Winner-Take-All (WTA) approximation as a neuro-inspired layer in the Vision Transformer (ViT) architecture. Inspired by the distinctive properties of cortical circuits in the mammalian visual cortex, this layer captures their signal-regulation characteristics. A defining feature of these circuits is their ability to capture and encode the contrast and structure of visual stimuli; the WTA layer emulates this by scaling its output according to the local contrast present in the stimulus. This results in a more refined and contextually aware representation, which is particularly beneficial for object classification, where adaptability and a nuanced representation of visual stimuli are crucial. The mathematical implementation is described in Appendix A4.
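A differentiable stand-in for this layer, under the assumption that the per-patch statistics of Appendix A4 are used for contrast normalization and that the winner patch is defined by variance, could look like the following; the module and parameter names are ours and not part of the released ViT code.

```python
import torch.nn as nn
import torch.nn.functional as F

class WTALayer(nn.Module):
    """Sketch of an sWTA pre-processing layer: every non-overlapping patch
    is contrast-normalised by its own mean/std and rescaled by its variance
    relative to the highest-variance ("winner") patch."""
    def __init__(self, patch=4, eps=1e-6):
        super().__init__()
        self.patch, self.eps = patch, eps
        self.unfold = nn.Unfold(kernel_size=patch, stride=patch)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        cols = self.unfold(x)                              # (B, C*p*p, L)
        mu = cols.mean(dim=1, keepdim=True)
        sd = cols.std(dim=1, keepdim=True) + self.eps
        var = cols.var(dim=1, keepdim=True)
        gain = var / (var.max(dim=2, keepdim=True).values + self.eps)
        cols = (cols - mu) / sd * gain                     # winner keeps gain 1
        return F.fold(cols, (H, W), kernel_size=self.patch, stride=self.patch)

# usage sketch (vit_backbone is a placeholder for any classifier):
# model = nn.Sequential(WTALayer(patch=4), vit_backbone)
```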
MobileNet with WTA
Developed for embedded devices such as mobile phones, MobileNet-V2 (Sandler et al. [2018]) reduces the number of parameters using depth-wise separable convolutions while keeping accuracy comparable to the state of the art. We initialized the weights of MobileNet-V2 with those trained on ImageNet and added the WTA layer to the model.
EfficientNet with WTA
EfficientNet-B0 (Tan and Le [2019]) is used with weights pre-trained on ImageNet, where it was trained on more than a million images to classify 1,000 image classes. We initialized the convolution layers with the pre-trained model weights and added the WTA layer after the input layer of the model.
ResNet with WTA
Among the different variants of ResNet, we select ResNet-50 (He et al. [2016]), which contains 50 neural network layers. The introduction of skip connections reduces the vanishing-gradient problem and ensures that the higher layers do not perform any worse than the layers before them, by learning the identity function. Similar to the other models above, we initialize the weights with the model pre-trained on ImageNet and include the WTA layer after the input layer in the model architecture.
CapsuleNet with WTA
Unlike other Convolutional Neural Network (CNN) architectures, CapsuleNet (Sabour et al. [2018]) applies pattern matching by decomposing the hierarchical representations of the input features. The eventual representation of this network is supposed to be invariant to the view-angle of the input samples. One of the major differences between a typical CNN and CapsuleNet is the output of the individual units in their architecture. While the output of a single neuron in a CNN is mostly a scalar value, it is a vector in the case of CapsuleNet. Similar to the ViT, we include the WTA layer as an initial layer in the CapsuleNet architecture to make it robust for domain adaptation tasks for digit datasets.
RefineNet with WTA
RefineNet (Lin et al. [2017]) is a versatile multi-path refinement network that leverages all the information gathered during the down-sampling process to facilitate high-resolution prediction through long-range residual connections. This approach enables the deeper layers, which capture high-level semantic features, to be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping principle, enabling efficient end-to-end training. In our experiments, we used pre-trained RGB weights of ResNet on ImageNet for training and testing RefineNet with and without adding a WTA layer.
MNIST and digit datasets
For the domain generalization task, we utilized a suite of digit datasets including MNIST, SVHN, USPS, and MNIST-M (LeCun et al. [2010], Netzer et al. [2011], Hull [1994], Ganin et al. [2016]). Each dataset was split into 70/20/10 train, validation, and test sets.
MNIST dataset, introduced by LeCun et al. [2010], is one of the most widely used datasets for handwritten digit classification. It contains a total of 70,000 grayscale images of handwritten digits. Each image is of size 28×28 pixels, and the dataset has been instrumental in benchmarking various machine learning algorithms.
SVHN (Street View House Numbers) dataset, presented by Netzer et al. [2011] is a real-world image dataset obtained from house numbers in Google Street View images. It comprises over 600,000 digit images. Specifically, it contains 73,257 digits for training, 26,032 digits for testing, and an additional 531,131 somewhat less difficult samples that can be used as extra training data. This dataset challenges models with recognizing digits in more complex and varied scenarios compared to the controlled environment of MNIST.
USPS (United States Postal Service) dataset, introduced by Hull [1994] is another handwritten digit dataset used for text recognition research. It contains 9,298 16×16 grayscale images of handwritten digits. The dataset was derived from scanned mail and has been a staple in the handwritten digit recognition field.
MNIST-M dataset, presented in the work by Ganin et al. [2016], is a modified version of the original MNIST dataset. It was created by overlaying MNIST digits onto patches randomly extracted from color photos of the BSDS500 dataset (Arbelaez et al. [2010]), resulting in a blend of digits and colored backgrounds. The MNIST-M dataset contains 149,002 images. This combination introduces additional challenges due to the color and texture variations in the background, making it a valuable dataset for studying domain adaptation.
For domain generalization tasks, these datasets are particularly valuable because they offer variations in terms of image quality, resolution, and real-world applicability. The diversity in these datasets, ranging from clean handwritten digits to digits in natural scenes, challenges models to generalize well across different domains. This makes them ideal benchmarks for evaluating the robustness and adaptability of machine learning algorithms, especially in scenarios where the training and test data distributions differ significantly.
Natural image dataset
To explore the effect of WTA on segmentation of natural images, we selected the Cityscapes dataset (Cordts et al. [2016]) for training and the Nighttime Driving dataset (Dai and Van Gool [2018]) for testing. This evaluation aims to test the robustness of the model against day-to-night domain shifts.
Acknowledgments
We thank Ehsan Arabzadeh, William Connelly, and Giacomo Indiveri for the fruitful discussion. We thank Jan Drugowitsch, Christopher Harvey, Debanjan Dasgupta, Alessandro Galloni, and Samuel Gershman for their helpful comments and constructive criticism of the manuscript. We thank Dharmendra Modha, Hayley Hu, and Ben Shaw for assistance and guidance on the TrueNorth hardware. The neuromorphic implementation was performed at the Institute of Neuroinformatics (INI), UZH/ETH Zurich as a part of the IBM-INI collaboration. We thank the organizers of the Telluride workshop and IBM Bootcamp. We acknowledge the generous support of high-computing GPUs from Tibbling Technologies for running experiments to train and test the WTA implementation in deep learning architectures.
Appendix
A1 Emulating Cortical Neuron Physiology in TN Hardware
We have derived a relationship between the parameters of the COMPASS neurons in order to obtain the desired dynamic range. The quantities entering this relation are:
Δr = dynamic range of the neuron,
Nsyn = total number of synapses,
λl = leak parameter of the TN neuron,
together with the sensitivity of a single weight, the crossbar synaptic weight, and the range of the crossbar weight.
The threshold of the COMPASS neuron is decided based on the desired dynamic range and the total number of synapses (Nsyn), such that the sensitivity of a single synapse is preserved while setting up the actual crossbar synaptic weights.
For example, let us assume that we want to set the parameters of a COMPASS neuron that receives 10 synaptic inputs and has a dynamic range (Δr) of 1. If all of the synaptic weights are equal, then each synapse will have an impact factor of 0.1, and we want a range of 50 for the crossbar synaptic weight. According to the above relation, the threshold should then be set to 2.
Thus, by using this relation, we can set the parameters of the TN neuron to obtain any behavior within the physiological range that closely matches the cortical neurons. TN simulations verified the above relation and clarified the role of the synaptic weight in shaping the transfer function.
We tuned the feedback and feedforward inhibition of the model to match the impact of PV and SST activation on the pyramidal neuron. Subtractive inhibition is implemented by tuning the threshold (θ) value of the TN neuron, and divisive inhibition by tuning the crossbar synaptic weight to negative values together with the leak parameter (λl). The TN simulation results were verified against a conductance-based model implemented in NEURON.
We have derived a relationship between the parameters of the TN neurons to incorporate biophysically plausible excitatory and inhibitory synaptic interactions. The quantities entering this relation are:
Δr = dynamic range of the neuron,
Nsyn = total number of synapses,
Next = number of excitatory synapses,
Ninh = number of inhibitory synapses,
λl = leak of the TN neuron,
together with the crossbar excitatory weight and its range, and the crossbar inhibitory weight and its range.
Thus, by appropriately tuning the parameters of the TN neuron {α, λl}, we can achieve the desired, biophysically realistic synaptic integration that closely matches the cortical neurons.
A2 Automated Parameter Mapping to TN parameters
TN neurons are configured to operate as linear threshold units (in rate-based mode) as described in the previous section. Based on this linear operation, we can estimate the effect of the self-excitatory feedback connection on the transfer function, and likewise the effect of the recurrent excitatory and inhibitory feedback connections. The effect of the leak and threshold on the transfer function was also quantified, where fout is the output firing rate, fin the input rate, θ the threshold, λl the leak of the TN neuron, and σ the sign of the leak.
To achieve winner-take-all behavior, the parameters have to satisfy certain constraints imposed by contraction analysis (Rutishauser and Douglas [2009]). Here αm, β1m, β2m denote the parameters in the programming platform and αc, β1c, β2c the corresponding parameters in TN.
An optimal solution satisfying these conditions is first obtained in the programming environment's parametric space. Using the mapping we derived from the programming platform to TrueNorth/Compass space, we then obtain the corresponding TrueNorth/Compass parameter values. Substituting these values gives the optimal TrueNorth/Compass parameters that satisfy the contraction analysis criteria.
We plug in the above values in the TrueNorth/Compass circuit shown in Figure 2 and verify if the following WTA functional characteristics are satisfied:
Non-linear signal amplification (winner selection). (Validated in Figure 2G)
Robustness and signal restoration (broadly tuned inputs). (Figure 2H)
Dynamic switching and multi-stability. (Validated in Figure 2I)
Thus, by appropriately tuning the parameters of the TN neuron {α, λl} we can achieve the desired, biophysically realistic synaptic integration that closely matches the cortical neurons.
A3 Hardware load analysis
Computation load = C(N) × numTicks
C(N) = I(N) + O(N)
C(N) = Number of Connector pins
I(N) = Number of Input pins
O(N) = Number of Output pins
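A worked example of this load estimate, with illustrative pin counts, is shown below.

```python
def computation_load(input_pins, output_pins, num_ticks):
    """Hardware load from the relations above: connector pins
    C(N) = I(N) + O(N), total load = C(N) x number of ticks."""
    return (input_pins + output_pins) * num_ticks

# e.g. a small state machine run for 1000 ticks (illustrative numbers)
print(computation_load(input_pins=4, output_pins=2, num_ticks=1000))
```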
A4 Mathematical formulation of WTA in ViT
To mathematically implement the WTA layer, we process an input image I ∈ ℝ^(W×H×C), where W, H, and C represent its width, height, and channel count, respectively. Our goal is to form a domain-independent representation, denoted as IG. The image I is segmented into patches of size s, where each patch p of size s encircles a pixel k at coordinates (i, j). For each patch, its mean and standard deviation are calculated and used to construct IG.
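The per-patch statistics referred to here presumably take the standard forms below; the exact combination used to form IG is not reproduced in this text, so the normalized image in the last expression is an assumption rather than the published formula.

```latex
\mu_{i,j} = \frac{1}{s^{2}} \sum_{(u,v)\in p_{i,j}} I(u,v), \qquad
\sigma_{i,j} = \sqrt{\frac{1}{s^{2}} \sum_{(u,v)\in p_{i,j}} \bigl(I(u,v)-\mu_{i,j}\bigr)^{2}}, \qquad
I_{G}(u,v) \approx \frac{I(u,v)-\mu_{i,j}}{\sigma_{i,j}+\epsilon}.
```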
Footnotes
† Senior author and Lead contact