Abstract
Can the human brain, a complex interconnected structure of over 80 billion neurons, learn to control itself at the most elemental scale: a single neuron? We directly linked the firing rate of a single (direct) neuron to the position of a box on a screen, which participants tried to control. Remarkably, all participants upregulated the firing rate of a direct neuron located in memory structures of their brain. Learning was accompanied by improved performance over trials, decorrelation of the direct neuron from local neurons, and phase-locking of the direct neuron to beta-frequency oscillations. Such previously unexplored neuroprosthetic skill learning within memory-related brain structures, together with the associated beta-frequency phase-locking, implicates the ventral striatum. Our demonstration that humans can volitionally control neuronal activity in mnemonic structures may provide new ways of probing the function and plasticity of human memory without exogenous stimulation.
Advances in physical and computational tools continue to inspire the development of devices to interrogate brain circuits and restore lost neural function. The motor system has long been a target for such devices, with an emerging interest in neuromodulatory as well as neuroprosthetic technologies for the interrogation and augmentation of cognition, in particular memory (1–5). The seminal work of Eberhard Fetz in the late 1960s demonstrated that, with appropriate feedback and reward, monkeys can learn to control the activity of individual neurons in the primary motor cortex (6, 7). More recent work using advanced imaging and stimulation technologies in transgenic mice has demonstrated intentional neuroprosthetic learning of individual neurons within primary motor and visual cortices (8–12). Whether such high-fidelity neuroprosthetic skill learning is possible in the much larger and more architecturally complex human brain remains unknown. More specifically, it is unknown whether such neuroprosthetic skills can be acquired in mnemonic structures that are not directly connected to the dorsal striatum (13, 14), a structure that appears essential for neuroprosthetic skill learning in the neocortex (10–12).
At large spatial scales, scalp electroencephalography (EEG) has provided a varied, albeit supportive, literature regarding the efficacy of biofeedback for controlling oscillatory power in non-motor regions of the human brain (15–19). On a mesoscopic scale, intracranial EEG (iEEG) recordings have shown that humans can control oscillations in the local field potential (LFP) within medial temporal lobe structures (20, 21). A few studies have even reported control of neuronal activity in the medial temporal lobe (22) and other non-motor structures (23); however, such control relied on invoking previously identified concepts or motor imagery. Thus, it remains unknown whether operant conditioning of individual neurons within memory structures of the human brain is possible.
To explore this question, we exploited the unique opportunity to obtain human single-neuron recordings from epilepsy patients undergoing diagnostic iEEG to assess their surgical candidacy. We developed a closed-loop, real-time instrumental learning task in which participants received visual feedback as they tried to learn to increase the firing rate of an arbitrarily chosen neuron. We show that: 1) humans can volitionally increase the firing rate of arbitrary individual neurons; 2) as with all forms of instrumental learning, only a subset of participants improve at the task (learners); and 3) only learners demonstrated an increase in local spike-field coherence (SFC), with the strongest SFC in the beta band, an oscillation rarely observed in the human hippocampus. Our findings show that: 1) instrumental learning to control arbitrary individual neurons is possible in the human brain; 2) such learning is possible outside primary motor and sensory areas, notably in mnemonic structures; and 3) intriguingly, the unique beta-band SFC signature of learners implicates the striatum, likely the ventral striatum, in instrumental learning within mnemonic structures (14, 25).
Results
Performing the Neurofeedback Task
We developed a neurofeedback task that required upregulating the firing rate of an arbitrarily chosen neuron (henceforth called the direct neuron; Figure 1) in a mnemonic and/or non-motor brain region (Table S1; see Methods in Supplementary Materials for details on the choice of the direct neuron). Spiking activity of the direct neuron was sorted in real time and convolved with a 200 ms Gaussian kernel (21) to obtain its smoothed instantaneous firing rate. The smoothed firing rate was linearly mapped onto the vertical position of a square on a screen placed in front of the participant (see Methods in Supplementary Materials). Participants were instructed to try to move the square above a white horizontal line (the threshold). Maintaining the square above threshold for more than half a second indicated success. Success was displayed on the screen, followed shortly by a distractor question, after which the next trial was triggered. This design ensured that every trial ended in a success. Testing was divided into blocks of 10 trials, and participants were asked to finish each block in 10 minutes or less. To keep participants motivated, we increased the difficulty of the next block (by raising the target line) if the previous 10 trials were completed in less than 5 minutes. Eleven participants completed a total of 17 sessions, controlling a different direct neuron in each session.
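The feedback loop described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: the "200 ms Gaussian" is treated as a kernel with a standard deviation of 200 ms, the kernel is causal (one-sided) because future spikes are unavailable in a real-time loop, and `rate_min` and `rate_max` are hypothetical calibration bounds for the linear mapping, with screen position normalized to [0, 1].

```python
import math

def smoothed_rate(spike_times, t, sigma=0.2):
    """Causal Gaussian-smoothed firing rate (spikes/s) at time t (s).

    Assumptions: sigma = 0.2 s stands in for the "200 ms Gaussian", and only
    spikes at or before t contribute (a one-sided kernel), because future
    spikes are not available in a real-time loop.
    """
    norm = 2.0 / (sigma * math.sqrt(2.0 * math.pi))  # one-sided kernel integrates to 1
    rate = 0.0
    for s in spike_times:
        dt = t - s
        if 0.0 <= dt <= 5.0 * sigma:  # tail beyond 5 sigma is negligible
            rate += norm * math.exp(-0.5 * (dt / sigma) ** 2)
    return rate

def square_position(rate, rate_min, rate_max):
    """Linearly map a firing rate onto a vertical position in [0, 1] (0 = bottom).

    rate_min and rate_max are hypothetical calibration values."""
    pos = (rate - rate_min) / (rate_max - rate_min)
    return min(max(pos, 0.0), 1.0)  # keep the square on screen

def first_success(times, positions, threshold, hold=0.5):
    """Time at which the square has stayed above threshold for more than
    `hold` seconds, or None if that never happens."""
    above_since = None
    for t, p in zip(times, positions):
        if p > threshold:
            if above_since is None:
                above_since = t
            if t - above_since > hold:
                return t
        else:
            above_since = None  # dropping below threshold resets the hold timer
    return None
```

For example, a 100 Hz spike train starting at t = 1 s, sampled every 10 ms and mapped with `rate_min = 0` and `rate_max = 100`, crosses a threshold of 0.5 shortly after the train begins and triggers success roughly half a second later. A causal kernel lags behind a centred one, which is one reason the exact kernel choice matters in a closed-loop task.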
All participants completed at least 30 trials (57 ± 22 trials), indicating that all participants were able to successfully upregulate the activity of their direct neuron (Figure 2A). Conversely, the firing rates of all other neurons recorded from the same bundle of microwires as the direct neuron (henceforth called indirect neurons) did not change prior to successful completion of the trial. Similarly, we did not observe a significant change in the firing rates of the 743 indirect neurons recorded from other microwire bundles prior to successful trial completion (Figure S1-A). To quantify this further, we calculated the modulation depth of each direct neuron, defined as its average firing rate in the one-second window before success minus its average firing rate in the one-second window after success. If success were triggered by random bursts of activity, the modulation depth would be close to zero, since the bursts would likely continue into the post-success period. On the contrary, we saw a sharp decline in the activity of the direct neurons immediately following success (Figure 2B), resulting in a modulation depth significantly greater than zero (p < 0.001, one-sample t-test). To determine whether this upregulation was specific to the direct neuron, we also calculated the modulation depth of the indirect neurons. The indirect neurons' firing rates were neither task-contingent nor co-modulated with the direct neuron, as evidenced by modulation depths close to zero (Figure 2B and Figure S1B).
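The modulation depth defined above, and the one-sample t-test across neurons, might be computed as follows. This sketch assumes spike and success times in seconds and averages the pre- and post-success rates across trials before subtracting; how the original analysis aggregated trials is an assumption here. Only the t statistic and degrees of freedom are computed; a p-value would follow from the t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_1samp`).

```python
import math
import statistics

def window_rate(spike_times, start, stop):
    """Mean firing rate (spikes/s) in the half-open window [start, stop)."""
    count = sum(1 for s in spike_times if start <= s < stop)
    return count / (stop - start)

def modulation_depth(spike_times, success_times, win=1.0):
    """Pre-success minus post-success firing rate (spikes/s), averaged over trials."""
    pre = [window_rate(spike_times, t - win, t) for t in success_times]
    post = [window_rate(spike_times, t, t + win) for t in success_times]
    return statistics.mean(pre) - statistics.mean(post)

def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for a one-sample t-test against popmean."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - popmean) / se, n - 1
```

A neuron whose firing was driven by random bursts would give depths scattered around zero, so the t statistic over the per-neuron depths is the quantity compared against zero.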
Learning to Improve Performance
Neuroprosthetic skill learning is not acquired uniformly; up to 30% of subjects are “non-learners” (26). We classified a session as a “learner” session when the participant was able to increase the average and/or peak firing rate of the direct neuron over the course of the session (20). To do this, we performed linear regressions of the average and the peak firing rate of the direct neuron against trial number. Sessions with a significantly positive trend in either the peak or the average firing rate of the direct neuron were defined as learner sessions (Figure 2C; see Methods in Supplementary Materials for more details). By this definition, 10 sessions (across 7 participants) were classified as learner sessions and the remaining 7 as non-learner sessions (see Supplementary Table 1 for participant demographics). Thus, while all participants were able to upregulate the activity of the direct neurons, only in some sessions were they able to improve their performance in the task.
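The learner criterion can be sketched as a regression of per-trial average and peak firing rates on trial number. The text above uses the significance of a linear regression; to stay within the Python standard library, this sketch substitutes a one-sided permutation test on the ordinary least-squares slope, which is a different (non-parametric) test. The significance level `alpha`, the number of permutations, and the random seed are illustrative assumptions.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_p_value(y, n_perm=2000, seed=0):
    """Observed slope over trial number and a one-sided permutation p-value
    for a positive trend (trial order shuffled under the null)."""
    x = list(range(1, len(y) + 1))
    observed = ols_slope(x, y)
    rng = random.Random(seed)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ols_slope(x, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

def is_learner(mean_rates, peak_rates, alpha=0.05):
    """Learner session if either the per-trial mean or peak firing rate shows
    a significantly positive trend across trials."""
    for rates in (mean_rates, peak_rates):
        slope, p = slope_p_value(rates)
        if slope > 0 and p < alpha:
            return True
    return False
```

Because a session counts as a learner if either measure passes, two tests are effectively run per session; whether a multiple-comparison correction was applied is not stated above, so that choice is left open in this sketch.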
As expected, in learner sessions the average firing rate of the direct neuron was significantly higher in the late trials (i.e., the last 15 trials) than in the early trials (i.e., the first 15 trials) (Figure 2E, top panel). Similarly, the burst frequency of the direct neurons (calculated using a modified Poisson-surprise method) increased significantly in the learner sessions but not in the non-learner sessions (Figure 2E, bottom panel). Indirect neurons showed the opposite pattern: their average firing rate and burst frequency increased (by a small, albeit significant, magnitude) in the non-learner sessions but not in the learner sessions (Figure 2F). Interestingly, the firing rate and burst frequency of indirect neurons recorded from other brain regions did not change from early to late trials in either learners or non-learners (Figure S1-C and D). There is thus a stark dissociation between the activity of the direct and indirect neuronal populations: during learner sessions, participants selectively modulated the activity of the direct neurons, whereas in non-learner sessions they unknowingly modulated the activity of neighbouring neurons while failing to modulate the direct neuron. In the human brain, then, learning is accompanied by selective, volitional control over the direct neuron, whereas unsuccessful learning is characterized by non-specific modulation of the entire local neural subpopulation, comprising both direct and indirect neurons. This dissociation is further exemplified by a decorrelation of the direct neuron's activity from that of the neighbouring indirect neurons in the learner sessions, and an increase in this correlation in the non-learner sessions (Figure 2G). This finding mirrors similar results from calcium imaging studies in rodents (9).
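Burst frequency was computed with a modified Poisson-surprise method whose modifications are described in the Supplementary Methods and are not reproduced here. As a point of reference, the sketch below implements a simplified version of the classic Legéndy-Salcman Poisson-surprise detector: seed runs of interspike intervals shorter than half the mean interval, extend each run while its surprise increases, trim its start while surprise increases, and keep runs whose surprise exceeds a threshold. The seed criterion and the `min_spikes` and `s_min` values are illustrative assumptions, not the authors' parameters, and spike times are assumed sorted and distinct.

```python
import math

def poisson_surprise(n_spikes, interval, rate):
    """Natural-log Poisson surprise S = -ln P(N >= n_spikes) for a Poisson
    process with the given mean rate (spikes/s) over `interval` seconds."""
    lam = rate * interval
    # log of the first tail term, P(N = n) = exp(-lam) * lam**n / n!
    log_first = -lam + n_spikes * math.log(lam) - math.lgamma(n_spikes + 1)
    total, term, i = 1.0, 1.0, n_spikes
    while True:  # add P(N = i) / P(N = n) for i > n until terms are negligible
        i += 1
        term *= lam / i
        total += term
        if i > lam and term < 1e-12 * total:
            break
    return -(log_first + math.log(total))

def detect_bursts(spikes, min_spikes=3, s_min=5.0):
    """Simplified Legendy-Salcman burst detection on sorted, distinct spike times.

    Returns a list of (start_time, end_time, surprise) tuples."""
    n = len(spikes)
    duration = spikes[-1] - spikes[0]
    rate = n / duration
    mean_isi = duration / (n - 1)

    def surprise(a, b):  # surprise of the run spikes[a..b], inclusive
        return poisson_surprise(b - a + 1, spikes[b] - spikes[a], rate)

    bursts, i = [], 0
    while i < n - 1:
        # seed: a run of interspike intervals shorter than half the mean ISI
        j = i
        while j + 1 < n and spikes[j + 1] - spikes[j] < 0.5 * mean_isi:
            j += 1
        if j - i + 1 >= min_spikes:
            # extend the end while surprise increases
            while j + 1 < n and surprise(i, j + 1) > surprise(i, j):
                j += 1
            # trim the start while surprise increases
            while j - i + 1 > min_spikes and surprise(i + 1, j) > surprise(i, j):
                i += 1
            s = surprise(i, j)
            if s >= s_min:
                bursts.append((spikes[i], spikes[j], s))
        i = j + 1
    return bursts
```

Burst frequency would then be the number of detected bursts divided by the recording duration, computed separately for the first and last 15 trials to give the early versus late comparison described above.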
Spike-Field Coherence Develops During Learning
During neuroprosthetic skill acquisition in rodents, learning is accompanied by increased cortico-striatal communication, evidenced by cortico-striatal coherence in the LFP (11), as well as by spike-field coherence (SFC) between cortical neurons and striatal oscillations and vice versa (10–12). The striatum is not a clinical target of iEEG recordings in epilepsy patients, and thus we were unable to directly test the hypothesis that striatal communication supports the volitional control of individual neurons in humans. However, the rodent SFC findings motivated a similar analysis, from which ‘communication through coherence’ (27) could be inferred if such SFC were observed. To this end, we computed the SFC between the direct neurons and the LFP recorded by the closest macro contact of the Behnke-Fried electrode (local LFP; Figure 3A). For learners, we found a striking increase in SFC in the 10-20 Hz range immediately before success (Figure 3A and B), whereas the non-learner population displayed no such increase. The ability to learn this skill is thus associated with a unique electrophysiological state of the brain (28), evidenced by increased SFC in the beta frequency range, and likely different from other “learning” states of the human brain (29).
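One common way to compute SFC is the spike-triggered-average (STA) formulation popularised by Fries and colleagues: the power of the STA of the LFP divided by the mean power of the individual spike-triggered LFP segments, evaluated at each frequency. The sketch below computes it at a single frequency with a direct DFT sum. It is an illustrative substitute only: the analysis here used the Chronux toolbox, whose multitaper spike-field coherency estimator differs in tapering, normalization, and bias. Spike times are assumed to be given as LFP sample indices, and `half_win` is a hypothetical window parameter.

```python
import cmath
import math

def dft_power(x, freq, fs):
    """Power of x at a single frequency (Hz) from a direct DFT sum."""
    z = sum(v * cmath.exp(-2j * math.pi * freq * k / fs) for k, v in enumerate(x))
    return abs(z) ** 2

def spike_field_coherence(lfp, spike_idx, half_win, freq, fs):
    """STA-based spike-field coherence at one frequency.

    lfp: LFP samples; spike_idx: spike times as sample indices;
    half_win: half-width of each spike-triggered LFP segment, in samples.
    Returns |DFT(STA)|^2 / mean(|DFT(segment)|^2), which lies in [0, 1]."""
    segs = [lfp[i - half_win:i + half_win] for i in spike_idx
            if i - half_win >= 0 and i + half_win <= len(lfp)]
    if not segs:
        raise ValueError("no spike has a complete LFP window")
    sta = [sum(col) / len(segs) for col in zip(*segs)]  # spike-triggered average
    mean_seg_power = sum(dft_power(s, freq, fs) for s in segs) / len(segs)
    return dft_power(sta, freq, fs) / mean_seg_power
```

Spikes locked to one phase of an oscillation give identical segments and an SFC of 1, while spikes spread evenly across phases give an SFC near 0. Each segment must span several cycles of the frequency of interest; a 200-sample window at 1.5 kHz, for instance, spans exactly two cycles of a 15 Hz oscillation.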
Since indirect neurons are not task relevant (their firing rates did not contribute to success), we anticipated that they would not develop the learning-related SFC observed for direct neurons. To test this hypothesis, we calculated the SFC between indirect neurons and the local LFP, and found no learning-related changes in either the learner or the non-learner population (Figure 3C). Prior to calculating the SFC, the firing rates in early and late trials were matched using a spike-thinning procedure to prevent biases resulting from unequal firing rates between the conditions (see Methods for details). Phase-related measures can also be affected by changes in oscillatory power (30). To determine whether the observed learning-related SFC change was affected by spectral power changes, we computed the power spectra over the same time period in early and late trials. We found no differences in power in the 10-20 Hz band (Figure 3D and E). Thus, in the absence of firing-rate and power changes, the observed learning-related SFC changes must be driven by changes in spike timing immediately before success.
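Spike-count matching matters because SFC estimates depend on the number of spikes: for spikes with random phases, an STA-based measure is biased upward by roughly 1/N, so a condition with fewer spikes looks more coherent. A minimal thinning step, assuming that thinning means randomly deleting spikes from the condition with more spikes until both conditions have the same count (the exact procedure is in the Methods), could look like this:

```python
import random

def thin_to_match(spikes_a, spikes_b, seed=0):
    """Randomly delete spikes so both conditions keep the same number of spikes.

    The condition with fewer spikes is returned unchanged."""
    rng = random.Random(seed)
    n = min(len(spikes_a), len(spikes_b))

    def thin(spikes):
        keep = sorted(rng.sample(range(len(spikes)), n))  # indices to retain
        return [spikes[k] for k in keep]

    return thin(spikes_a), thin(spikes_b)
```

Because the deletion is random, the SFC would typically be averaged over several thinning seeds so that the result does not depend on one particular subset of spikes.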
We also observed instances in which the same participant learned successfully in one session but not in another (Supplementary Table 1). Even so, the learning-related SFC changes were confined to the learner sessions, suggesting that these changes are specific to the act of learning itself rather than to demographic factors.
The observed SFC in the beta band might instead reflect volume-conducted low-frequency oscillations. To address this possibility, we calculated the SFC between the direct neurons and the LFP at non-local macro electrodes throughout the brain (Figure S2). In addition to an increase in 10-20 Hz SFC between the direct neurons and the non-local LFP (Figure S2-D), we observed an even more pronounced increase in theta-frequency SFC to the non-local LFP. Because the strongest non-local effect was in theta, an oscillation that is ubiquitous in the human brain (31), including the human hippocampus (29, 32), and more likely to be volume conducted (33), whereas the local effect was concentrated in the 10-20 Hz band, the learning-related beta-frequency SFC appears to be both frequency-specific and local to the direct neuron during neuroprosthetic skill learning.
Learning-Related Spike-Field Coherence Is Distinct from Anticipatory Reward
Since participants were asked to hold the square above threshold for more than 500 ms, we wondered whether the SFC observed immediately before success was driven by a reward anticipation mechanism (34). To test this possibility, we extracted epochs around unsuccessful threshold crossings, i.e., times at which the firing rate of the direct neuron crossed the threshold but did not stay above it long enough to trigger a successful trial. We reasoned that if the pre-success 10-20 Hz SFC were the result of an anticipatory reward mechanism, a similar increase in 10-20 Hz SFC should follow unsuccessful threshold crossings. Arguing against such a mechanism, the spike-field coherogram of the threshold-crossing-aligned epochs (Figure 4A) showed no increase in 10-20 Hz SFC immediately after threshold crossings (Figure 4B, top panel). Furthermore, SFC in this frequency band in the post-threshold-crossing window did not change between early and late trials (Figure 4C) for either learners or non-learners, indicating that this type of reward anticipation does not drive the learning-related SFC changes observed in the success-aligned epochs.
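Unsuccessful threshold crossings can be extracted from the sampled square position as upward crossings whose run above threshold ended before the 500 ms hold time elapsed. This sketch assumes positions sampled at known times (in seconds) and reuses the success criterion's hold time, so that failed and successful crossings are mutually exclusive; a run still above threshold at the end of the data is left unclassified.

```python
def failed_crossings(times, positions, threshold, hold=0.5):
    """Onset times of upward threshold crossings whose run above threshold
    lasted no longer than `hold` seconds, i.e. crossings that could not have
    triggered success under a "more than `hold` seconds" criterion."""
    events = []
    start = None  # time of the current upward crossing; None while below
    last = None   # most recent sample time above threshold
    for t, p in zip(times, positions):
        if p > threshold:
            if start is None:
                start = t
            last = t
        elif start is not None:
            if last - start <= hold:
                events.append(start)  # run ended before the hold time
            start = None
    # a run still above threshold at the end of the data is left unclassified
    return events
```

Epochs for the threshold-crossing-aligned coherogram would then be cut around each returned onset time, in the same way as the success-aligned epochs.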
Interestingly, we did observe a significant increase in delta-band (1-3 Hz) SFC immediately following threshold crossings (Figure 4B). This finding was concordant with the increased delta SFC observed in the window immediately surrounding success (Figure 3A). To determine whether this delta SFC was learning related, we compared the SFC in the post-threshold-crossing window between early and late trials (Figure 4C). We observed no significant difference in delta SFC in this window between early and late trials (for learners or non-learners), suggesting that the delta SFC was not learning related and was likely a consequence of the design of the neurofeedback task. Consistent with this hypothesis, the delta-SFC increase was accompanied by a delta power increase in a similar time window (Figure S3). The delta-SFC increase is therefore likely driven by the evoked response to the colour change of the square from red to purple when it crosses the threshold (32), rather than by anticipatory reward or a learning-related mechanism.
Discussion
Here we demonstrate, using a visual neurofeedback task, that humans can learn to upregulate the activity of arbitrarily chosen neurons in their brain in a highly specific and volitional manner. Our results complement non-human primate and rodent research on single-neuron neuroprosthetic skill learning and extend such learning beyond the sensorimotor cortices studied previously to associational structures of the brain, in particular the limbic system.
A large body of existing literature provides evidence for this type of neuroprosthetic skill learning in the motor cortex of rodents (9–12, 35) and primates (6, 7, 36–38). Control at the single-neuron level in the motor cortex has been shown to require the dorsal striatum (11, 12), which serves as an input tier for the basal ganglia. Hence, neuroprosthetic skill learning in the motor cortex is largely analogous to motor learning, in which cortico-basal ganglia loops facilitate an action selection process where competing motor programs are either inhibited or released from inhibition. This selection is facilitated by parallel direct and indirect pathways, which allow disinhibition and inhibition, respectively, of neuronal ensembles in the sensorimotor cortices, so that a contextually relevant motor program can be selected (39). Similar cortico-basal ganglia loops are implicated in the selection and generation of a variety of cognitive patterns, which may facilitate more abstract skill learning (40). In fact, recent rodent work provides strong evidence that animals can modulate highly specific neuronal activity in primary sensory cortex, and that this learning again depends on the dorsal striatum, as it does in the motor cortex (10). Since the majority of the neocortex projects to the dorsal striatum (13, 41), we anticipate that this type of neuroprosthetic skill learning may be possible in most of the neocortex.
In this study, however, we demonstrate that this type of learning is also possible in the paleocortex of the human brain, as well as in other non-motor, non-sensory regions. These structures are largely dissociated from the dorsal striatal system (14). Despite this dissociation, participants learned to modulate activity in a specific and volitional manner, much like in the neocortical regions explored in non-human primates and rodents. Motor skill learning, like neuroprosthetic skill learning, proceeds in a prototypical manner: an early phase characterized by rapid acquisition of task parameters is followed by a slower refinement process (10–12, 28). The experimental sessions in this study were not long enough to investigate the later stages of learning, but we robustly demonstrate the early stage, characterized by rapid changes in the firing characteristics of the direct neurons. While limbic structures do not directly project to the dorsal striatum, they do project heavily to the ventral striatum, which is thought to serve as the interface between the limbic and motor systems (25, 42). Is it possible, then, that the type of neuroprosthetic skill learning demonstrated here is instead facilitated by the ventral striatum?
Although we could not address this question by directly recording from the ventral striatum in humans, we sought signatures of this interaction and observed an increase in SFC in the high alpha/low beta bands as learning progressed. This increase in SFC was independent of power or firing-rate changes, was specific to the direct neurons, and occurred only in learner sessions. Oscillations in this frequency band are rarely observed in the medial temporal lobe (MTL); in particular, they have not been reported in human MTL regions, where delta, theta, and gamma activity have been associated with a myriad of behaviors (32, 43–45), including neuroprosthetic skill learning (20, 21). In rodents, Lansink and colleagues demonstrated hippocampal beta oscillations that were driven by reward-predictive cues and enhanced by learning (34). They also demonstrated hippocampal spiking phase-locked to the underlying beta oscillations and driven by reward-predictive cues. Interestingly, they also observed increased SFC between neurons in the ventral striatum and beta oscillations in the hippocampus in response to reward-predictive cues. Thus, beta oscillations in the hippocampus and related structures may be driven by a reward prediction mechanism, potentially via ventral tegmental afferents to CA1 (46), or indirectly from the striatum via the ventral pallidal-mediodorsal thalamic route (47). When we aligned our data to unsuccessful threshold crossings, however, we did not observe any reward-predictive increases in beta power or synchrony (Figure 4). Furthermore, Lansink and colleagues reported concomitant reward-predictive increases in theta power and theta SFC, which were also absent in our findings.
Taken together with these rodent studies, our findings suggest that the beta-frequency SFC we observe: 1) may indeed reflect MTL-ventral striatal communication through coherence; 2) is unlikely to be a reward prediction signal; and 3) is akin to the cortico-striatal coherence seen during instrumental learning in rodents and non-human primates (NHPs).
Unlike in limbic structures, beta oscillations are commonplace in the striatal system in rodents (25, 48) and NHPs (49, 50), including the ventral striatum of rodents (Berke, 2009; Berke et al., 2004) and the nucleus accumbens in humans (51). In rodents, Neely and colleagues investigated neuroprosthetic skill learning in the primary visual cortex (V1) while simultaneously recording from the dorsomedial striatum, which receives direct projections from V1 (41, 52). They demonstrated that as animals learned to control specific neuronal activity in V1, they increasingly recruited the striatum, as shown by increased spiking of striatal neurons along with concomitant increases in beta and gamma power (10). Furthermore, and perhaps most intriguingly, they demonstrated that as learning progressed, the spiking of the direct units in V1 became more coherent with the local field potentials in the 10-25 Hz band. This finding mirrors our results in human limbic structures, although we did not observe the changes in oscillatory power in this frequency band that they reported. Since attention has been shown to decrease alpha/beta-band SFC in the visual cortex (53), the increased alpha/beta SFC we observe is unlikely to be due to attentional modulation. Similarly, Koralek and colleagues demonstrated that neuroprosthetic skill learning in the primary motor cortex (M1) is accompanied by increased success-aligned spike-field coherence between M1 spikes and striatal LFP in the 6-14 Hz band (11). In light of these results, our SFC findings likely reflect increased communication between MTL and ventral striatal ensembles underlying the learning of this visual neuroprosthetic skill.
One of the canonical characteristics of cortico-basal ganglia loops is the presence of parallel inhibitory and disinhibitory pathways (39, 54), which allow the basal ganglia to play a role in selecting context-relevant motor plans or even cognitive strategies (40). The medium spiny neurons (MSNs), which are ubiquitous within the striatum, are furnished with dopamine receptors in close proximity to corticostriatal terminals (55). Dopaminergic innervation of these MSNs by the midbrain dopaminergic system facilitates plastic synaptic changes that shape striatal output, and hence basal ganglia output, supporting reward-based learning. While the dorsal striatum is unlikely to be involved in the limbic neuroprosthetic skill learning demonstrated here, the ventral striatum also forms cortico-basal ganglia loops (54), with a variety of limbic structures and the anterior cingulate cortex (ACC) as its primary inputs and outputs (14, 56–58). Since the MSNs that make up much of the striatum are difficult to excite (39), convergent input from limbic structures and the ACC could drive ventral striatal MSNs, activating a series of parallel inhibitory and disinhibitory circuits that can be tuned by the midbrain dopaminergic system to facilitate reward-based learning of precise limbic activity patterns. Future work in animal models should interrogate this limbic-basal ganglia circuitry to establish the role of the ventral striatum in facilitating this type of limbic neuroprosthetic skill learning.
The data presented here suggest that single-neuron activity in limbic structures can be precisely regulated in a rapid, highly specific and volitional manner in humans. Furthermore, this type of neuroprosthetic skill learning in limbic structures is likely facilitated by limbic-basal ganglia circuitry involving the ventral striatum. Such high-fidelity self-regulation of neural activity may provide an avenue for developing novel neuroprosthetics to treat neurological conditions that commonly present with pathological activity in limbic structures, such as medically refractory epilepsy. Furthermore, since limbic structures, particularly those of the medial temporal lobe, are critical to mnemonic processes, volitional control over highly specific activity in these structures may provide a means of probing their function and plasticity without exogenous stimulation.
Funding
This work was funded by the Natural Sciences and Engineering Research Council (RGPIN-2015-05936 to T.V. and RGPIN-2016-06358 to M.R.P.), National Institutes of Health and Brain Canada (grant number 1U01NS103792-01 to T.V.), Canada Foundation for Innovation and Ontario Research Fund (Grant #35923), Vanier Canada Graduate Scholarships (to K.P. and C.N.K.), Toronto General and Western Foundation, Dean Connor and Maris Uffelmann Donation, and Walter & Maria Schroeder Institute.
Author Contributions
Conceptualization – K.P., T.V.; Data curation – K.P.; Formal Analysis – K.P.; Funding Acquisition – T.V.; Investigation – K.P., C.K.; Methodology – K.P.; Resources – K.P., T.V., S.K.; Software – K.P.; Supervision – M.P., T.V.; Visualization – K.P.; Writing (original draft) – K.P.; Writing (review and editing) – K.P., C.K., M.P., S.K., T.V.
Competing Interests
Dr. Milos R. Popovic is a Director and Co-Founder of the company MyndTec Inc., which has no involvement in this study. Dr. Taufik Valiante is an investor in the company Neurescence, which has no involvement in this study. Dr. Valiante provides consulting to the company Panaxium, which has no involvement in this study.
Data and Materials Availability
The spike detection and sorting toolbox used for offline sorting (OSort) and the Chronux toolbox (used for spectral analysis) are both available as open-source toolboxes. Data and custom MATLAB scripts used for the analyses here are available upon request from Taufik A. Valiante (taufik.valiante{at}uhn.ca).
Acknowledgements
We would like to acknowledge Victoria Barkley for her help with organizing and assisting with patient data collection, Andrea Gómez Palacio for her help with patient data collection, Ryan Ramos for his contributions towards developing the neurofeedback task, and David Groppe for his advice on choosing appropriate statistical analyses.