Interneuron Specific Gamma Synchronization Indexes Cue Uncertainty and Prediction Errors in Lateral Prefrontal and Anterior Cingulate Cortex

Inhibitory interneurons are believed to realize critical gating functions in cortical circuits, but it has been difficult to ascertain the content of gated information for well characterized interneurons in primate cortex. Here, we address this question by characterizing putative interneurons in primate prefrontal and anterior cingulate cortex while monkeys engaged in attention demanding reversal learning. We find a subclass of narrow spiking neurons with relative suppressive effects on the local circuit indicating they are inhibitory interneurons. The activity of one subclass of these interneurons prominently indexed area-specific information in their firing rates and in event-triggered (35-45 Hz) gamma band synchronization. Firing rates and gamma synchrony of this interneuron subclass indexed in prefrontal cortex the uncertainty of attention cues, and in anterior cingulate cortex the unexpectedness of outcomes during learning. Computational analysis suggest that these interneuron-specific activity dynamics reflect in prefrontal cortex the gating of expected stimulus values into choice probabilities, and in anterior cingulate cortex the gating of chosen stimulus values and the received rewards into reward prediction errors. These findings elucidate an electrophysiologically characterized interneuron subclass in the primate, that forms gamma synchronous networks in two different areas while realizing an area-specific computation during adaptive goal-directed behavior.


Introduction
Inhibitory interneurons in prefrontal cortex are frequently reported to be altered in neuropsychiatric diseases with debilitating consequences for cognitive functioning. Groups of fast spiking interneurons with basket cell or chandelier morphologies have consistently been found to be abnormal in individuals with schizophrenia and linked to dysfunctional working memory and reduced control of attention (Dienel and Lewis, 2019). Altered functioning of a non-fast spiking interneuron class is linked to reduced GABAergic tone in individuals with severe major depression (Levinson et al., 2010;Fee et al., 2017). These findings suggest that the circuit functions of different subtypes of interneurons in prefrontal cortices are critically important to regulate specific aspects of cognitive and affective functioning.
But it has remained a challenge to identify how individual interneuron subtypes support specific cognitive or affective functions in the nonhuman primate. For rodent prefrontal and anterior cingulate cortices, cells with distinguishable functions express differentially cholecystokinin (CCK), parvalbumin (PV) or somatostatin (SOM), amongst others (Roux and Buzsaki, 2015;Cardin, 2018). Prefrontal CCK expressing basket cells have been shown to impose inhibition that is required during the choice epoch, but not during the delay epoch of a working memory task (Nguyen et al., 2020). In contrast, retention of visual information during working memory delays has been shown to require activation specifically of PV+ expressing fast spiking interneurons (Lagler et al., 2016;Kamigaki and Dan, 2017;Nguyen et al., 2020). In the same prefrontal circuits, the PV+ neurons have also been associated with attentional orienting (Kim et al., 2016), shifting of attentional sets and response strategies during reward learning (Cho et al., 2015;Canetta et al., 2016;Cho et al., 2020), and with spatial reward choices (Lagler et al., 2016), among other functions (Pinto and Dan, 2015). Distinct from PV+, the group of somatostatin expressing neurons (SOM+) have been shown to be necessary during the initial encoding phase of a working memory task but not during the delay (Abbas et al., 2018), and in anterior cingulate cortex they activate specifically during the approach of reward sites (Kvitsiani et al., 2013;Urban-Ciecko and Barth, 2016). Taken together, these findings illustrate that rodent prefrontal cortex interneurons expressing PV, SOM or CCK fulfill separable, unique roles at different processing stages during goal-directed task performance (Pinto and Dan, 2015;Lagler et al., 2016).
The rich insights into cell-specific circuit functions in rodent prefrontal cortices stand in stark contrast to the limited empirical data from primate prefrontal cortex. While there are recent advances of optogenetic tools for use in primates (Acker et al., 2016;Dimidschstein et al., 2016;Gong et al., 2020), most existing knowledge about cell specific circuit functions are indirectly inferred from studies that distinguish only one group of putative interneuron that show narrow action potential spike width. Compared to broad spiking neurons the group of narrow spiking, putative interneurons in lateral prefrontal cortex have been found to more likely encode categorical information during working memory delays (Diester and Nieder, 2008), show stronger stimulus onset responses during cognitive control tasks (Johnston et al., 2009), stronger attentional modulation (Thiele et al., 2016), more location-specific encoding of task rules (Johnston et al., 2009), stronger reduction of firing selectivity for task irrelevant stimulus features (Hussar and Pasternak, 2009), stronger encoding of errors and loss (Shen et al., 2015;Sajad et al., 2019), more likely encoding of outcome history (Kawai et al., 2019), and stronger encoding of feature-specific reward prediction errors (Oemisch et al., 2019a), amongst other unique firing characteristics Ardid et al., 2015;Rich and Wallis, 2017;Voloh and Womelsdorf, 2018;Torres-Gomez et al., 2020).
These summarized findings suggest that there are subtypes of narrow spiking neurons that are particularly important to regulate prefrontal circuit functions. But it is unclear whether these narrow spiking neurons are inhibitory interneurons and to which interneuron subclass they belong.
Comparisons of protein expression with action potential spike width have shown for prefrontal cortex that >95% of all PV+ and ~87% of all SOM+ interneurons show narrow spike width (Ghaderi et al., 2018;Torres-Gomez et al., 2020), while narrow spikes are also known to occur in ~20% of VIP interneurons (Torres-Gomez et al., 2020) among other GABAergic neurons (Krimer et al., 2005;Zaitsev et al., 2009), and (at least in primate motor cortex) in a subgroup of pyramidal cells (Soares et al., 2017). In addition, electrophysiological characterization have shown at least three different types of firing patterns in narrow spiking neurons of monkeys during attention demanding task Dasilva et al., 2019;Trainito et al., 2019). Taken together, these insights raise the possibility that spike width and electrophysiology will allow identifying the interneuron subtypes that are particularly important for prefrontal cortex functions.
Here, we approached this question by recording narrow spiking cells in nonhuman primate prefrontal and cingulate cortex during an attention demanding reversal learning task. We found that in both areas three narrow spiking neuron classes are well distinguished and show a suppressive influence on the local circuit activity compared to broad spiking neurons, supporting labeling them as inhibitory interneurons. Among these interneurons the same sub-type showed significant functional correlations in both ACC and LPFC, firing stronger to reward predictive cues when their predictability is still learned during the reversal (in LPFC), and firing stronger to outcomes when they are most unexpected during reversal. Notably, in both, ACC and in LPFC, these functions were evident in 35-45 Hz gamma rhythmic synchronization to the local field potential. These findings describe the specific electrophysiological subtype of narrow spiking interneurons that carries out area-specific functions during adaptive behavioral regulation.

Results
We used a color-based reversal paradigm that required subjects to learn which of two colors were rewarded with color-reward associations reversing every ~30-40 trials. Two different colors were assigned to stimuli appearing randomly left and right to a central fixation point (Fig. 1A). During the task the color information was presented independently from the up-/downward-direction of motion of the stimuli. The up-/downward-direction instructed the saccade direction that animals had to show to a Go event in order to receive reward. Motion was thus the cue for an overt choice (with saccadic eye movements), while color was the cue for covert selective attention. Color was shown either before (as Feature-1) or after the motion onset (as Feature-2) (Fig. 1B). Both animals took on average 7/7 (monkey H/K) trials to reach criterion performance, i.e. they learned which color was associated with reward and which one was unrewarded within 7 trials (Fig. 1C). The asymptotic performance accuracy was 87/89% for monkey's H/K (excluding fixation break errors).

Characterizing narrow spiking neurons as inhibitory interneurons
During reversal performance we recorded the activity of 329 single neurons in LPFC areas 46/9 and anterior area 8 (monkey H/K: 172/157) and 397 single neurons in dorsal ACC area 24 (monkey H/K: 213/184) (Fig. 1D, Suppl. Fig. 1). The average action potential waveform shape of recorded neurons distinguished neurons with broad and narrow spikes similar to previous studies in LPFC and ACC (Gregoriou et al., 2012;Ardid et al., 2015;Westendorff et al., 2016;Dasilva et al., 2019;Oemisch et al., 2019a) (Fig. 1E). Prior biophysical modeling has shown that the extracellular action potential waveform shape, including its duration, is directly related to transmembrane currents and the intracellularly measurable action potential shape and duration (Gold et al., 2006;Bean, 2007;Gold et al., 2007;Buzsaki et al., 2012). Based on this knowledge we quantified the extracellularly recorded spike duration of the inferred hyperpolarization rates and their inferred time-of-repolarizations (see Methods). These measures split narrow and broad spiking neurons into a bimodal distribution, which was better fit with two than one gaussian (Fig. 1E, calibrated Hartigan's dip test for bimodality, p<0.001; Bayesian information criterion for two and one gaussian fit: 4.0450, 4.8784). We found in LPFC 21% neurons had narrow spikes (n=259 broad, n=70 narrow cells) and in ACC 17% of neurons had narrow action potentials (n=331 broad, n=66 narrow cells).
To assess the excitatory or inhibitory identity of the broad and narrow spiking neurons (B-and N-type neurons), we estimated the power of multi-unit activity (MUA) in the vicinity (at different electrodes than the spiking neuron) around the time of spike for each cell and tested how this spiketriggered MUA-power changed before versus after the cell fired a spike (see Methods). This approach expects for an excitatory neuron to spike concomitant with neurons in the local population reflected in a symmetric rise and fall of MUA before and after its spike. In contrast, inhibitory neurons are expected to spike when MUA rises, but when the spike occurs it should contribute to suppress the local MUA activity, which should be reflected in a faster drop in MUA activity after the spike occurred (Oemisch et al., 2015). We found that B-type cells showed on average a symmetric pre-to post-spike triggered MUA activity modulation indicative of excitatory participation with local activity (Fig. 1F). In contrast, spikes of N-type cells were followed by faster drop of MUA activity indicating an inhibitory influence on MUA (Fig. 1F). The excitatory and inhibitory effects on local MUA activity were consistent across the populations and significantly distinguished B-and N-type neurons ( Fig. 1G; MUA modulation index: [(post MUAspike -pre MUAspike) / pre MUAspike] for B-vs N-type cells, Wilcoxon test, P=0.001). This distinction was evident within both, ACC and in LPFC ( Fig. 1H; for N-type MUA modulation index was different from zero, Wilcoxon test, in ACC (P<0.001) and in LPFC (P=0.03); for Btype cells the difference was not sign.). These findings provide additional evidence for considering narrow spiking cells as putative inhibitory interneurons.

Putative interneurons in prefrontal cortex encode the choice probability of targets
To discern how B-and N-type neurons encoded the learning of the rewarded color during reversal we analyzed neuronal response modulation around color onset, which instructed animals to covertly shift attention to the stimulus with the reward predicting color. In addition to this color (attention) cue we also analyzed activity around the motion onset that served as action cue. Its direction of motion indicated the saccade direction the animal had to elicit for receiving reward.
This action cue could happen either 0.5-0.9 sec. before or 0.5-0.9 sec. after the color cue. Many single neurons in LPFC selectively increased their firing to the color attention cue with no apparent modulation to the motion action cue (n=71 cells with firing increases to the color but not motion cue) (for examples: Fig. 2A, B). These neurons increased firing to the color onset when it was the first, or the second feature that was presented, but did not respond to the motion onset when it was shown as first or second feature (for more examples, see Suppl. Fig. S2).
We found that N-type neurons in LPFC change transiently their firing to the attention cue when it occurred either early or late relative to action cue (significant increase within 25-275ms post-cue for Feature 1 and within 50-250ms post-cue for Feature 2, p<0.05 randomization statistics, n=21 N-type cells with increases and 7 with decreases to the color cue, Fig. 2C). This attention cuespecific increase was absent for B-type neurons in LPFC (n.s., randomization statistics, n=44 Btype cells with increases and n=35 with decreases to the color cue, Fig. 2C). ACC N-and B-type neurons did not show on-response to the color cue (n=36 / 6 B-and N-type cells with increases, respectively, and n=31 / 12 B-and N-type cells with decreased firing, respectively, to the color cue, total cell number included in this analysis for B-and N-type n= 216 / 50 respectively) (Fig.   2D).
The N-type specific attention cue response might carry information about the rewarded stimulus color or the rewarded stimulus location. We found that the proportion of neurons whose firing rate significantly distinguished rewarded and nonrewarded colors sharply increased for N-type cells after the onset of the color cue in LPFC (proportion of color selective responses within 0-0.5 sec. after cue, 18%; n=10 of 54 N-type cells, randomization test p<0.05 within [175 575] ms after cue onset, but not in ACC (cells with significant information: 6%; n=3 of 50 N-type cells, ns., randomization test within [300 700] ms after cue onset) (Suppl. Fig. S3A,B). Similar to the selectivity for rewarded stimulus color N-type cells in LPFC (but not in ACC) showed significant encoding of the right versus left location of the rewarded stimulus (in LPFC: 22% with reward location information; n=12 of 54 N-type cells, randomization test p<0.05 within [200 500] ms after cue onset; in ACC: 10% with reward location information ;n=5 of 50 N-type cells, n.s. randomization test) (Suppl. Fig. S3C,D).
The color-specific firing increase and the encoding of the rewarded color by N-type neurons in LPFC suggest they support reversal learning. We tested this by correlating their firing rates around the color cue onset with the trial-by-trial variation on choice probability for choosing the stimulus with the rewarded color. Choice probability (p(choice)) was calculated with a reinforcement learning model that learned to optimize choices based on reward prediction errors (see Methods and (Oemisch et al., 2019a)). Choice probability rose after each reversal and asymptoted after around ~10 trials (Fig. 1C). We found that during the post-color onset time period 17% (n=20 of 120) of B-type cells and 27% (n=11 of 41) of N-type cells in LPFC significantly correlated their firing with p(choice), which was larger than expected by chance (binomial test B-type cells: p<0.001; NS cells: p<0.001). On average, N-type cells in LPFC showed positive correlations (Pearson r=0.068, Wilcoxon rank test, p=0.011), while B-type neurons showed on average no correlation (Wilcoxon rank test, p=0.20) (Fig. 2E). The positive p(choice) correlations of N-type neurons in LPFC grew following color onset and remained significant for 0.7s following color onset (N=41 N-type cells, randomization test, p<0.05 from 0-0.7 s post-cue, Fig. 2E). Compared to LPFC, significantly less N-type cells in ACC correlated with choice probability (6% (n=2 of 33) in ACC, versus 27% in LPFC, X 2 -test for prop. Difference, X 2 -stat= 5.45, p=0.019) and showed no p(choice) correlations over time (Wilcoxon rank test, p=0.49, n.s., Fig. 2F).

Putative interneurons in anterior cingulate cortex encode reward prediction errors.
Choice probabilities (p(choice)) increase during reversal learning when reward prediction errors (RPEs) of outcomes decrease, evident in anticorrelation of (p(choice)) and RPE of r=-0.928 in our task (Suppl. Fig. 4A). Prior studies have shown that RPEs are prevalently encoded in the ACC (Kennerley et al., 2011;Oemisch et al., 2019a). We therefore reasoned that RPEs might be preferentially encoded by the same narrow spiking putative interneurons. We estimated trial-bytrial RPEs with the same reinforcement learning model that also provided p(choice) for the previous analysis. We found that on average 23% of LPFC and 35% of ACC neurons showed

Classification of neural subtypes of putative interneurons.
We next asked whether the narrow spiking, putative interneurons that encode p(choice) in LPFC and RPE in ACC are from the same electrophysiological cell type, or e-type (Markram et al., 2015).
Prior studies have distinguished different narrow spiking e-types using the cells' spike train pattern and spike waveform duration Dasilva et al., 2019;Trainito et al., 2019;Banaie Boroujeni et al., 2020b). We followed this approach using a cluster analysis to distinguish e-types based on spike waveform duration parameters (inferred hyperpolarization rate and time to 25% repolarization), on whether their spike trains showed regular or variable interspike intervals (local variability 'LV'), or more or less variable firing relative to their mean interspike interval (coefficient of variation 'CV'). Clustering resulted in eight e-types ( Fig. 4A-C) with reliable cluster boundaries and consistent assignment of cells to individual classes (Suppl. Fig. 5) similar to previous clustering results Dasilva et al., 2019). Narrow spiking neurons fell into three e-types. The first narrow spiking 'N1' e-type (n=18, 13% of narrow spiking neurons) showed high firing rates and highly regular spike trains (low LV's, mean LV 0.47, SE 0.05), an   (Fig. 4B,C). LV values >1 indicate bursty firing patterns which is supported by a positive correlation of the LV of neurons with their probability to fire bursts defined as spikes occurring ≤ 5 ms apart (r = 0.44, p < 0.001). We next calculated the postto pre-spike-triggered MUA modulation ratio for each of the e-types. Across all e-types only the spike triggered MUA modulation ratio for the N3 e-type was different from zero (P<0.05, FDRcorrected) (Fig. 4D). Comparison between cell classes showed that the spike triggered MUA modulation ratio for the N3 e-type differed significantly from the B4 (p=0.02) and B5 (p=0.03) etypes.

The same putative interneuron subclass encodes p(choice) in LPFC and RPE in ACC.
The distinct e-types allowed testing how they correlated their firing with choice probability and with RPE. We found that the only e-type with a significant correlation of firing and choice difference to zero, multiple comparison corrected p<0.05). There was no subtype specific RPE correlation in LPFC (Fig. 5C,D).

Narrow spiking neurons synchronize to theta, beta and gamma band network rhythms.
Prior studies have suggested that interneurons have unique relationships to oscillatory activity (Cardin et al., 2009;Vinck et al., 2013;Womelsdorf et al., 2014a;Voloh and Womelsdorf, 2018), raising the possibility that N3 e-type neurons realize their functional role also through neuronal synchronization. To discern this, we first inspected the spike triggered LFP averages (STA's) of neurons and found that STA's of many N3 e-type neurons showed oscillatory sidelobes in the 10-30 Hz range (Fig. 6A). We quantified this phase synchrony by calculating the spike-LPF pairwise phase consistency (PPC) and extracting statistically significant peaks in the PPC spectrum (Vinck et al., 2012;Banaie Boroujeni et al., 2020a), which confirmed the presence of significant synchrony peaks across theta/alpha, beta and low gamma frequency ranges (Fig. 6B). The density of spike-LFP synchrony peaks showed a high prevalence of 15-30 Hz beta synchrony for broad spiking neurons in both, ACC and LPFC, a peak of ~5-12 Hz synchrony that was unique to ACC, and a high prevalence of 35-45 Hz gamma synchronization in narrow spiking cells (but not in broad spiking cells) in both areas (Fig. 6C). The synchrony peak densities of the N3 e-type neurons mimicked this overall pattern by showing beta to gamma synchrony peak density in LPFC and a 5-12 Hz theta/alpha and a gamma synchrony in ACC ( Fig. 6C) (for other e-types, see Suppl. Fig.   S7).
Interneuron-specific gamma synchrony to cues in LPFC and outcomes in ACC.
The overall synchrony patterns leave open whether the synchrony is task modulated or conveys information about choices and prediction errors. We addressed these questions by calculating spike-LFP phase synchronization time-resolved around the color cue onset (for LPFC) and around reward onset (for ACC) separately for trials with high and low choice probabilities (for LPFC) and high and low reward prediction errors (for ACC). We found in LPFC that the N3 e-type neurons For ACC, the N3 e-type neurons as well as the broad spiking neurons that significantly synchronized in a 35-42 Hz gamma band following the reward onset when RPE's were high (i.e. when the outcome was unexpected) (Fig. 8A,B). During these high RPE trials, the gamma synchrony emerged earlier (Fig. 8A,B) and significantly stronger in N3 e-type neurons than broad spiking neurons (Fig. 8C). In contrast, when RPE's were low the reward onset triggered increased 6-16 Hz theta/alpha frequency spike-LFP synchronization in N3 e-type and broad spiking neurons ( Fig. 8D,E). The increase of 8-14 Hz synchrony was significantly stronger in the N3 e-type than in broad spiking neurons (Fig. 8F).
A circuit model of interneuron-specific gating of values and outcome predictions. This model assumes that the two visual objects we present in our task drive two separate pyramidal cell ensembles in LPFC representing their expected value. These ensembles project to N3 e-type interneurons, presumed to be PV+, fast-spiking basket cells (see discussion), which are connected amongst themselves. When such an interneuron network is activated by excitatory inputs it can synchronize by way of mutual inhibition at beta or gamma frequencies depending on the total amount of drive the network receives (Wang and Buzsaki, 1996;White et al., 1998;Tiesinga and Jose, 2000). When both objects have similar values (see Fig. 9A), the drive to the network is high and it synchronizes in the gamma band. When one of the objects has a value that is much larger than the other (see Fig. 9B), it results in a medium level of drive that makes the network synchronize in the beta band. We observed such a switch from gamma to beta frequencies in N3 e-type interneurons in LPFC when the choice probabilities changed from low to high (Fig. 7). We therefore suggest that the N3 type inhibition accomplishes two things, it leads to a normalization that transforms the object value into a choice probability (a softmax gating of values) and its gamma synchrony selects a winner (Fig. 9A). A similar synchrony-based gating of diverse inputs has been described as a powerful circuit mechanism for selecting a target stimulus among distractors during visual attention and working memory tasks (Buia and Tiesinga, 2008;Sherfey et al., 2018;Sherfey et al., 2020). Modeling also suggests that the interneuron network is most likely heterogeneous, which means there will be N3 e-type neurons that increase their rate and others that decrease their rate upon synchronization . Importantly, the output of this gamma mediated gating in LPFC is a neural ensemble representing the values of the chosen object ('Vt-chosen' in Fig. 9). We hypothesize that this chosen value Vt is projected to ACC where N3 e-type neurons combine it with input about the reward Rt to calculate the reward prediction error RPE = Rt -Vt (Watabe-Uchida et al., 2017) (see eq. 2 in Methods). Since N3 etype neurons fired stronger when RPE was large (i.e. when the chosen value was low (≤0.5)) we assume that N3 e-type neurons do not receive Vt inputs directly, but indirectly from an intermediate interneuron that disinhibits them when the chosen values are high (Fig. 9A). Such disinhibition could be realized by N1 e-type neurons which showed relatively a positive spike triggered MUA modulation ratio and tended to correlate negatively with RPE in ACC (Fig.5D). The activation of this neuron suppresses N3 e-type neurons when the chosen value is high (Fig. 9B). But when the chosen value is low the N3 e-type neurons are not suppressed by Vt. They rather receive excitatory drive about the reward input Rt and show gamma synchronization, suggesting that the gamma synchronous RPE correlated N3 e-type neurons in ACC form an interneuron network that synchronizes at gamma when driven strongly (Fig. 9A), but does not show gamma when chosen values match the obtained reward (Fig. 9B). The described circuits entail that N3 e-type interneurons in ACC and LPFC form similar interneuron networks. In LPFC the N3 e-type interneuron network gates values of conflicting object values, and in ACC it gates reward outcomes according to the chosen values. In both networks, the N3 e-type interneurons synchronize at gamma when there is more synaptic excitatory drive during the reversal learning period of the task (Fig. 7, 8).

Discussion
We found that narrow spiking neurons in the medial and lateral prefrontal cortex of macaques cause a fast drop in local multiunit activity indicative of inhibitory interneurons. These putative interneurons increased their firing rates in LPFC to the color-cue onset, encoded the rewarded color and correlated their rates with the choice probabilities, while in ACC their firing correlated with reward prediction errors during the processing of the reward outcome. These functional signatures were specifically linked to a putative interneuron subtype that showed intermediate narrow action potential waveforms and more regular firing patterns than expected from a Poisson process (LV's of N3 e-type neurons: 0.84). Moreover, this putative interneuron subtype (N3 etype) engaged in prominent event-triggered 35-45 Hz gamma band synchronization in each of the recorded brain areas. In LPFC the N3 e-type gamma synchronized to the cue when choice probabilities were low and uncertain, and in ACC the N3 e-type gamma synchronized to the reward onset when the RPE was high and the reward outcome was unexpected. Thus, the same e-type showed functional firing correlations and gamma synchrony in LPFC and in ACC indicating an area specific functional contribution with the same activation signature.
Taken together, these findings point to a special role of the same type of interneuron in LPFC and in ACC to realize their area specific functional contribution to the color-based reversal learning task. This interpretation highlights several aspects of interneuron specific circuit functions.

Characterizing narrow spiking interneurons in vivo
The first implication of our findings is that narrow spiking neurons can be reliably subdivided in three subtypes based on their electrophysiological firing profiles. Distinguishing three narrow spiking neurons in vivo is a significant step forward to complement tripartite electrophysiological distinctions of interneurons in vitro (Zaitsev et al., 2009;Torres-Gomez et al., 2020) or using molecular and genetic classification schemas that show a multitude of distinguishable interneurons with ion channel make up that lead to narrow action potential shapes Monyer and Markram, 2004;Medalla et al., 2017). The genetic and molecular tools have the limitation that they are not easily applied to the whole population of activated neurons during task performance in primates. In this situation our study could prove particularly important in pinpointing the electrophysiological characteristics of those putative interneurons that are functionally activated during cognitive performance and associated with latent processes reflected in the choice probability of the rewarded target feature and the strength of the reward prediction error.
We believe that the N3 e-type that showed functional correlations in two areas encompasses mostly parvalbumin PV+ expressing neurons, because of their narrow spikes, regular interspike intervals and their propensity to synchronize at gamma, which resemble the regular firing and gamma synchrony described for PV+ cells in the rodent (Cardin et al., 2009;Tiesinga, 2012;Stark et al., 2013;Amilhon et al., 2015). Moreover, similar to the N3 e-type responses to the attention cue, rodent dorsomedial frontal PV+ neurons systematically activate to preparatory cues while somatostatin neurons respond significantly less (Pinto and Dan, 2015). However, PV+ neurons are heterogeneous and entail Chandelier cells and variably sized basket cells Markram et al., 2015). It might therefore be an important observation that the N3 e-type was distinguished from another narrow spiking neuron by having a lower baseline firing rate and an intermediate-narrow action potential shape as opposed to the narrowest waveform and highest firing rates that N1 e-types showed. The tentative suggestion that N3 e-type neurons will be mostly PV+ cells also entails for the primate brain that they would not be part of calretinin (CR+) or calbindin (CB+) expressing cells as their expression profiles do not apparently overlap (Dombrowski et al., 2001;Medalla and Barbas, 2009;Raghanti et al., 2010;Torres-Gomez et al., 2020).

What is the circuit role of the N3 interneuron e-type?
Assuming that N3 e-type neurons are partly PV+ neurons we speculate that this translates into gamma rhythmic inhibition of local circuit pyramidal cells close to their soma where they impose output gain control Bartos et al., 2007;Womelsdorf et al., 2014b;Tremblay et al., 2016). In our task, such local inhibition was linked to how uncertain the expected values of stimuli were (reflected in low choice probabilities) or how unexpected rewarded outcomes were  Fig. 9), or through connections to reward representing pyramidal cells. Disinhibitory connections are typical for calretinin expressing interneurons that receive ~35% of excitatory afferent connections (while PV+ cells receive ~18% afferent connections and pyramidal cells receiving ~65%) neurons in primate prefrontal cortex Barbas, 2009, 2010). In our study such a disinhibition could include the N1 e-type population that tended to have opposite RPE correlation than N3 e-type neurons.
In summary, the proposed circuit model of behavioral regulation asserts that N3 e-type neurons will have a similar gating function for the output of the circuit in LPFC and in ACC, but they operate on different inputs in each area. The model hypothesizes that these interneurons in LPFC gate object values into choices, while in ACC gate chosen value and rewards into a scalar updating signal (the reward prediction error). It will require a combined electrophysiological and optogenetic approach to tag these cells in future studies for causally testing their importance for the proposed circuit functions. The results presented here provide a powerful starting point for these causal experiments in primates to clarify cell-type specific circuit functions during higher cognitive operations.

Interneuron-specific gamma synchronization realizes area specific functions.
Two key findings of our study pertain to spike-LFP gamma band synchronization. First, we found that N3 e-type neurons showed an event-triggered synchrony increase in the same 35-45 Hz gamma frequency band in both LPFC and ACC (see Fig. 7C and 8F). The N3 e-type gamma synchrony was not paralleled by synchrony of broad spiking neurons, which either did not synchronize or synchronized to beta and theta/alpha rhythms. These observations suggest that gamma synchrony was intrinsically generated by the putative N3 e-type interneurons themselves as opposed to being inherited from pre-existing rhythmic activity of the larger network, e.g. from afferent connections to interneurons (Medalla and Barbas, 2009;Schmitt et al., 2017;Ahrlund-Richter et al., 2019).
Such an intrinsic propensity for generating gamma rhythmic activity through, e.g. GABAaergic time constant, is well described for PV+ interneurons (Wang and Buzsaki, 1996;Bartos et al., 2007;Womelsdorf et al., 2014b;Chen et al., 2017) even at relatively low excitatory feedforward drive that might be more typical for prefrontal cortices than earlier visual cortices (Cardin et al., 2009;Vinck et al., 2013). Consistent with a cell-intrinsic gamma specific activity we found that gamma synchronization was selectively associated with narrow spiking neurons in LPFC and ACC while there was not a single broad spiking neural e-type that showed a ~40 Hz gamma peak in their overall spike-LFP spectrum (Suppl. Fig. S7). Taken together, these findings provide strong empirical evidence that narrow spiking interneurons are not only the main carriers of gamma rhythmic activity in nonhuman primate prefrontal cortex, but are also primary generators of gamma activity in-vivo (Whittington et al., 2000;Hasenstaub et al., 2005;Bartos et al., 2007;Hasenstaub et al., 2016;Chen et al., 2017). This conclusion resonates well with rodent studies that document how interneurons in infra-/peri-limbic and cingulate cortex engage in gamma synchrony (Fujisawa and Buzsaki, 2011;Cho et al., 2015).
The second major implication of the gamma synchronous N3 e-type neurons is that gamma band synchrony was associated with task epochs in which neural circuits realize an area-specific circuit function. In LPFC, the gamma increase was triggered by the color-cue onset of two peripherally presented stimuli that instructed covertly shifting attention. Our circuit model (Fig. 9A) illustrates that cue related gamma was restricted to periods when object values were similar, and the animal still learned which object is most reward predictive. The control of learning what is relevant during cognitively demanding tasks is a key function of the lateral prefrontal cortex, suggesting that gamma activity emerges when this key function is called upon (Miller and Cohen, 2001;Szczepanski and Knight, 2014;Cho et al., 2020). A similar scenario holds for the ACC whose central function is often considered to monitor and evaluate task performance and detect when outcomes should trigger change in behavioral strategies (Shenhav et al., 2013;Heilbronner and Hayden, 2016;Alexander and Brown, 2019;Fouragnan et al., 2019). In ACC, the gamma increase was triggered by an unexpected rewarded outcome (high RPE). Thus, the N3 e-type specific gamma band signature occurred specifically in those trials with conflicting stimulus values requiring behavioral control to reduce the prediction errors through future performance (Fig. 9A).
Considering this ACC finding together with the PFC finding suggest that gamma activity of N3 etype neurons is particularly important when these areas exert their area specific key function, supporting recent causal evidence from rodent optogenetics (Cho et al., 2020).
Consistent with the proposed importance of interneurons for area-specific key functions prior studies have documented the functional importance of inhibition in these circuits. Blocking inhibition with GABA antagonists like bicuculline not only renders fast spiking interneurons nonselective during working memory tasks but abolishes the spatial tuning of regular spiking (excitatory) cells during working memory tasks in monkeys (Sawaguchi et al., 1989;Rao et al., 2000), disturbs accuracy in attention tasks (Paine et al., 2011) and reduces set shifting flexibility by enhancing perseveration (Enomoto et al., 2011). Similarly, abnormally enhancing GABAa levels via muscimol impairs working memory and set shifting behavior (Rich and Shapiro, 2007;Urban et al., 2014) and can result in either maladaptive impulsive behaviors (Paine et al., 2015), and when applied in anterior cingulate cortex to perseveration (Amiez et al., 2006). Thus, altered medial and lateral prefrontal cortex inhibition is closely linked to an inability to adjust attentional strategies given unexpected outcomes. This evidence provides additional evidence complementing our findings of the importance of inhibitory neuron involvement in RPE and choice probability coding.
Taken together, our interneuron specific findings in primate LPFC and ACC stress the importance of interneurons to influence circuit activity beyond a mere balancing of excitation. Multiple theoretical accounts have stressed that some types of interneurons 'control information flow' (Fishell and Kepecs, 2019), by imposing important filters for synaptic inputs to an area and gaincontrol the output from that area (Akam and Kullmann, 2010;Kepecs and Fishell, 2014;Womelsdorf et al., 2014b;Roux and Buzsaki, 2015;Cardin, 2018). Testing these important circuit functions of interneurons has so far been largely limited to studies using molecular tools. Our study addresses this limitation by characterizing putative interneurons, delineating their suppressive effects on the circuit and highlighting their functional activation during reversal learning. The observed interneuron specific, gamma synchronous coding of choice probabilities and prediction errors lends strong support for studying the cell-type specific circuit function of neurons to support higher cognitive functions.

Materials and Methods
All animal care and experimental protocols were approved by the York University Council on Animal Care and were in accordance with the Canadian Council on Animal Care guidelines.
Electrophysiological Recording Data was collected from two male rhesus macaques (Macaca mulatta) from the anterior cingulate cortex and lateral prefrontal cortex as described in full in (Oemisch et al., 2019b). Extra-cellular recordings were made with tungsten electrodes (impedance 1.2 -2.2 MOhm, FHC, Bowdoinham, ME) through rectangular recording chambers implanted over the right hemisphere. Electrodes were lowered daily through guide tubes using software-controlled precision micro-drives (NAN Instruments Ltd., Israel). Wideband local field potential (LFP) data was recorded with a multichannel acquisition system (Digital Lynx SX, Neuralynx) with a 32kHz sampling rate. Spiking activity was obtained following a 300 -8000 Hz passband filter and further amplification and digitization at a 32 kHz sampling rate. Sorting and isolation of single unit activity was performed offline with Plexon Offline Sorter, based on the first two principal components of the spike waveforms and the temporal stability of isolated neurons. Only well isolated neurons were considered for analysis . Experiments were performed in a custom-made sound attenuating isolation chamber. Monkeys sat in a custom-made primate chair viewing visual stimuli on a computer monitor (60Hz refresh rate, distance of 57cm) and performing a feature-based attention task for liquid reward delivered by a custom-made valve system (Oemisch et al., 2019b).

Anatomical reconstruction of recording locations
Recording locations were identified using MRI images obtained following initial chamber placement. During MR scanning, we placed a grid marking the chamber center and peripheral positions as well as a diluted iodine solution inside the chamber for visualization. This allowed the referencing of target regions to the chamber center in the resulting MRI images. The positioning of electrodes was estimated daily using the MRI images and audible profiles of spiking activity. The relative coarseness of the MRI images did not allow us to differentiate the specific layer of recording locations in lateral prefrontal and anterior cingulate cortices.

Task Paradigm
The task (Fig. 1) required centrally fixating a dot and covertly attending one of two peripherally presented stimuli (5° eccentricity) dependent on color-reward associations. Stimuli were 2.0° radius wide block sine gratings with rounded-off edges, moving within a circular aperture at 0.8 °/s and a spatial frequency of 1.2 (cycles/°). Color-reward associations were reversed without cue after 30 trials or until a learning criterion was reached, which makes this task a color-based reversal learning task.
Each trial began with the appearance of a grey central fixation point, which the monkey had to fixate. After 0.5 -0.9s, two black/white gratings appeared to the left and right of the central fixation point. Following another 0.4s the two stimulus gratings either changed color to green and red (monkey K: cyan and yellow), or they started moving in opposite directions up and down, followed after 0.5 -0.9s by the onset of the second stimulus feature that had not been presented so far, e.g. if after 0.4s the grating stimuli changed color then after another 0.5 -0.9s they started moving in opposite directions. After 0.4 -1s either the red and green stimulus dimmed simultaneously for 0.3s or they dimmed separated by 0.55s, whereby either the red or green stimulus could dim first. The dimming of the rewarded stimulus represented the GO cue to make a saccade to one of two response targets displayed above and below the central fixation point. The dimming of the norewarded stimulus thus represented a NO-GO cue triggering the withholding of a response and waiting until the rewarded stimulus dimmed. The monkeys had to keep central fixation until this dimming event occurred.
A saccadic response following the dimming was rewarded if it was made to the response target that corresponded to the (up-or down-ward) movement direction of the stimulus with the color that was associated with reward in the current block of trials, e.g. if the red stimulus was the currently rewarded target and was moving upward, a saccade had to be made to the upper response target at the time the red stimulus dimmed. A saccadic response was not rewarded if it was made to the response target that corresponded to the movement direction of the stimulus with the nonreward associated color. Hence, a correct response to a given stimulus must match the motion direction of that stimulus as well as the timing of the dimming of that stimulus. This design ensures the animal could not anticipate the time of dimming of the current target stimulus (which could occur before, after, or at the same time as the second stimulus), and thus needed to attend continuously until the 'Go-signal' (dimming) of that stimulus occurred. If dimming of the target stimulus occurred after dimming of the second/distractor stimulus, the animal had to ignore dimming of the second stimulus and wait for dimming of the target stimulus. A correct response was followed by 0.33ml of water reward.
The color-reward association remained constant for 30 to a maximum of 100 trials. Performance of 90% rewarded trials (calculated as running average over the last 12 trials) automatically induced a block change. The block change was un-cued, requiring monkeys to use the trial's reward outcome to learn when the color-reward association was reversed. Reward was delivered deterministically.
In contrast to color, other stimulus features (motion direction and stimulus location) were only randomly related to reward outcome -they were pseudo-randomly assigned on every trial. This task ensured that behavior was guided by attention to one of two colors, which was evident in monkeys choosing the stimulus with the same color following correct trials with 89.5% probability (88.7%/ 90.3% for monkey H/K), which was significantly different from chance (t-test, both p < .0001).

Behavioral analysis of the animal's learning status
To characterize the reversal learning status of the animals we determined the trial during a block when the monkey showed consistent above chance choices of the rewarded stimulus using the expectation maximization algorithm and state-space framework introduced by (Smith et al., 2004), and successfully applied to reversal learning in our previous work (Balcarras et al., 2016;Hassani et al., 2017). This framework entails a state equation that describes the internal learning process as a hidden Markov or latent process and is updated with each trial. The learning state process estimates the probability of a correct (rewarded) choice in each trial and thus provides the learning curve of subjects. The algorithm estimates learning from the perspective of an ideal observer that takes into account all trial outcomes of subjects' choices in a block of trials to estimate the probability that the single trial outcome is reward or no reward. This probability is then used to calculate the confidence range of observing a rewarded response. We identified a "Learning Trial" as the earliest trial in a block at which the lower confidence bound of the probability for a correct response exceeded the p = 0.5 chance level.
Reinforcement learning modeling to estimate choice probability and expected value of color The color reversal task required monkeys to learn from trial outcomes when the color reward association reversed to the alternate color. This color-based reversal learning is well accounted for by an attention augmented Rescorla Wagner reinforcement learning model ('attention-augmented RL') that we previously tested against multiple competing models (Balcarras et al., 2016;Hassani et al., 2017;Oemisch et al., 2019b). Here, we use this model to estimate the trial-by-trial fluctuations of the expected value (EV) for the rewarded color and the choice probability (CP) of the animal's stimulus selection. EV and CP increase with learning similar to the increase in the probability of the animal to make rewarded choices, causing all three variables to correlate (Fig  4E,F). The attention augmented RL is a standard Q Learning model with an added decay constant that reduces the value of those features that are part of the non-chosen (i.e. non-attended) stimulus on a given trial. On each trial t this model updates the value V for features i of the chosen stimulus according to !,#$% = !,# + ( # − !,# ), (eq. 2). where R denotes the trial outcome (0=non-rewarded, 1=rewarded) and is the learning rate bound to [0 1]. For the same trial the feature values i of the non-chosen stimulus decay according to !,#$% = (1 − ) !,# , (eq. 3) With ω denoting the decay parameter. Following these value updates, the next choice Ct+1 is made by a softmax rule according to the sum of values that belongs to each stimulus. We indicate the stimulus by the index j and the set of feature values that belong to it by set sj, (for instance, color x, location y, direction z): We optimized the model by minimizing the negative log likelihood over all trials using up to 20 iterations of the simplex optimization method to initialize the subsequent call to fmincon matlab function, which constructs derivative information. We used an 80/20% (training/test dataset) crossvalidation procedure repeated for n=50 times to quantify how well the model predicted the data. Each of the cross-validations optimized the model parameters on the training dataset. We then quantified the log-likelihood of the independent test dataset given the training datasets optimal parameter values.

Waveform analysis
We initially analyzed 750 single units and excluded 24 units that showed double troughs or those that we had overall less than 50 spike number. We then analyzed 726 highly isolated cells in ACC (397 cells), and PFC (149 cells area 8, and 180 cells dLPFC). We trough aligned all action potentials (AP) and normalized them to the range of -1(trough) to 1 (peak). APs were then interpolated from their original time-step of 1/32000 s to a new time step of 1/320000 s. To characterize AP waveforms, we initially computed three different measures of Trough to Peaks (T2P) and Time for Repolarization (T4R) and Hyperpolarization Rate (HR) according to eq. 1-3: 2 = ( #/0123 − 4567 ) (eq.1), 4 = 3 8.:; < 4567 − 4567 4 (eq.2), where tpeak is time of the most positive value (peak) of the spike waveform, ttrough is time of the most negative value of the spike waveform, 8.:; < 4567 is the time of spike waveform after the peak with a voltage equal to 75% of the peak and -8.>? < 4567 is the time of the spike waveform before the peak with a voltage value equal to 63% of the peak. We performed Hartigan's dip test was to test the unimodality hypothesis of distributions (P<0.05). HR and T2P were highly correlated (r=-.76). We chose HR as it was able to reject the Hartigan's dip test null hypothesis of distribution unimodality (P=0.01). We then used HR and T4R to characterize waveform dynamics. T4R interval likely describes dynamics of the waveform in a period that calcium activated potassium channels are activated and most voltage-gated potassium channels are closed. While, HR reflects a time interval that most of sodium channels are closed and potassium channels have greater contribution to the dynamics of the waveform (Bean, 2007). Both T4R and HR and their first component of the PCA were fitted with a bi-modal Gaussian distribution. We applied Akaike's and Bayesian information criteria for the two vs one Gaussian fits to select the best fit to the waveform measures. For all statistical tests that were performed on time-series, we used permutation randomization test and multiple comparisons with both primary and secondary alpha level of 0.05, unless the type of multiple comparison correction is explicitly mentioned.

Spike-triggered multiunit modulation
We used spike-triggered multiunit analysis to estimate whether its spiking increased or decreased concomitantly with the surrounding neural activity -measured on a different electrode located ~200-450 µm from the electrode measuring the spiking activity. To compute the relative multiunit activity (MUA) of the signal before and after spike occurrences, we used the Wide-Band signal and bandpass filtered the signal to a frequency range of [800 3000] Hz. The signal was then rectified to positive values. For each single unit, we extracted a period of [-50 50] ms around each spike aligned to the spike trough and estimated the power time-course of the signal using a sliding median filter window (window length=5 ms) over the extracted signal every 0.5 ms. For a given single unit, we computed the Z-transformation of each spike-aligned median filtered peakamplitude by subtracting its mean and dividing by its standard deviation. This step normalized the MUA around the spike times. We then computed the average Z-transformed MUA across all spikes for each single unit. To compare the post spike MUA to pre-spike MUA, we computed the spike triggered MUA modulation ratio (SMUM) according to equation = . Prespike MUA was the mean in a period of 10ms before the spike and the Post-spike MUA was the mean in a period of 10ms after the spike.
For comparison of spike triggered MUA modulation of broad vs narrow spiking neurons we used the Wilcoxon test on the computed ratio, under the null hypothesis that there is no difference of MUA strength before and after the spike occurrence for narrow vs broad spiking neurons. We also performed the test on each individual group compare with population.
We also tested whether spike triggered MUA modulation differed varies with the distance of the electrode tip that measured the spike providing neuron and the electrode that measured the MUA, but found no distance dependency (Wilcoxon test, n.s.).

Analysis of Firing Statistics
To analyze firing statistics of cells, we followed procedures described in by , and for each neuron we computed the mean firing rate (FR), Fano factor (FF, mean of variance over mean of the spike count in consecutive time windows of 100 ms), the coefficient of variation (CV, standard deviation over mean of the inter-spike intervals), and a measure of local variability of spike trains called the local variation (LV). LV is a measure of regularity/burstiness of spike train and is proportional to the square of the difference divided by sum of two consecutive interspike intervals (Shinomoto et al., 2009).
Cell clustering technique We followed procedures described in , with minor adjustments to test whether neurons fall into different clusters according to the dynamics of their waveform dynamic measures and their firing statistics. For main clustering analysis, we used the K-Means clustering algorithm MATLAB/GNU Octave open source code, freely available in public Git repository https://bitbucket.org/sardid/clusteringanalysis. We used the K-Means clustering algorithm to characterize subclasses of cells within the dataset upon the Euclidian distances of neuronal measures. We initially used three measures of the waveform: Hyperpolarization Rate, Time for repolarization, and their first component of PCA. For the firing statistic measures we used local variation, coefficient of variance, Fano factor, and firing rate. The k-Means clustering algorithm is sensitive to duplicated and uninformative measures. We set a criterion of .9 of Spearmans' correlation coefficient to exclude measures that were highly correlated (1 st PCA was excluded).
To reduce the biases upon on variable magnitudes, we z-score transformed each measure and normalized it to a range of [0 1]. We then computed the percent of variance explained by each measure from overall variance in our data. The measures were sorted based on their explaining variance of the overall variance within data. To disregard uninformative measures, a cut off criterion of 90% were set to the cumulated sorted variance explained across measures. The Fano Factor was excluded based on this criterion from the k-Means clustering (Suppl. Fig.5A).

Determining cluster numbers
We used a set of statistical indices to determine a range of number of clusters that best explains our data. These indices evaluate the quality of the k-means clustering (for details see ): Rand, Mirkin, Hubert, Silhouette, Davies-Bouldin, Calinski-Harabasz, Hartigan, Homogeneity and Separation indexes. We then run 50 replicates of k-means clustering for k=1-40 number of clusters. For each k, we chose the best replicate based on the minimum squared Euclidian distances of all cluster elements from their respective centroids. While validity measures were improved by increasing number of clusters, the benefit was slowed down for number of clusters more than 5, suggesting a range of at least five to 15 clusters that could be accountable for our dataset. We then used a meta-clustering algorithm to determine the most appropriate number of clusters: n=500 realizations of the k-means (from k=5 to k=15) were run. For each k and n 50 replicates of the clustering were run and the best replicate were selected. For each k and across n, we computed the probability that different pairs of elements belonged to the same cluster. To identify reliable from spurious clusters, we used a probability threshold (P >= 0.9) and considered only reliable clusters with at least 5 neurons to remove those composed of outliers. The final clustering was then visualized with a dendrogram based on squared Euclidean distances between the cluster centroids. We validated finally determined number of clusters using Akaike's and Bayesian criteria which showed the smallest value for k=8 (AIC: [-17712, -17735, -18476, -11114] and BIC: [-1.7437, -1.7368, -1.8109, -1.0747], for k= [6,7,8,9]).

Validation of the identified cell classes
We used dataset randomization (n = 200 realizations) as in , to validate our meta-clustering analysis by computing two validity measures. First, In each realization, each of eight clusters were associated to the closest cell class in Fig. 4A,B. From all realizations and for each cell class, the difference between the mean of all clusters that were associated to the same cell class with respect to the mean of all clusters that were not associated to that cell class is computed versus when the clusters were randomly assigned to the cell classes (Suppl. Fig. 5C). Second, we validated the reliability of cell class assignment using n = 200 realizations of a randomization procedure that calculated the proportion of consistently assigned cells to a class compared to other cells assigned to that class. The proportion of class-matching cells with respect to control was systematically higher than class-matching when using a bootstrap procedure with random assignment of class labels (Suppl. Fig. 5D).

Correlation of local variation with burst index
The Local Variation (LV) measured how regular neighboring spike trains are, leading to higher values when neurons fire short interspike interval (ISIs) spikes (bursts) intermittent with pauses. We quantified how the LV correlated with the likelihood of neurons to show burst spikes. We calculated the burst proportion as: number of ISIs<5 ms divided by number of ISIs<100 ms similar to . To control for effect of firing rate on the measure, we normalized it by the firing rate that would have been expected for a Poisson distribution of ISIs.
We used burst-index computed for neurons and grouped neurons in PFC and ACC into two subgroups, high burst proportion and low burst proportion (Log(BI)>0 and Log(BI)<0 respectively).
We computed the proportion of neurons in each group that showed significant correlation with RPE (in ACC) and Choice Probability (in PFC). In PFC, 25% of high BI neurons and 27.5% of low BI neurons were significantly correlated with Choice probability. In ACC, however, 47% of high BI neurons and 35.2% of low BI neurons were significantly correlated with RPE. Chi-square test failed to show significant differences between two groups (low vs high BI) for proportion of significantly correlated cells with RPE (in ACC, P=0.15), and with Choice Probability (in PFC, P=0.75).
Spike-LFP synchronization analysis Adaptive Spike Removal method was used on wide-band signal to remove artifactual spike current leakage to LFP (details in (Banaie Boroujeni et al., 2020a)). We then used the fieldtrip toolbox on the spike removed data to compute the Fourier analysis of the local field potential (LFP). Spike removed signals were resampled with 1000 Hz sampling rate. For each frequency number, Fourier transform was performed on 5 complete frequency cycles using an adaptive window around each spike (two and a half cycles before and after the spike). We then computed the pairwise phase consistency (PPC) to measure spike-LFP synchronization.
To determine at which frequency-band single neurons showed reliable spike-LFP PPC, a permutation test was adapted and used to construct a permutation distribution of spike-LFP PPC under the null hypothesis of no significant statistical dependencies of spike-LFP phase locking were preserved between spike phases and across frequencies. Then, each bands of significant frequencies were identified and for each band the sum of PPC value (which is unbiased by number of spikes) was computed. We then determined the significance based on PPC band-mass.
To determine whether the spectrum of spike-LFP synchronization measure (PPC) contains peaks that are statistically significant we used four criteria similar to . These criteria ensure to indicate reliable frequencies that show phase-consistent spiking. First, detected peaks had to be Rayleigh test significant (P<0.05), to reject the homogeneity hypothesis of the phase distribution. Second, each peak had to have PPC value greater than 0.005. Third, each peak had to have peak prominence of at least 0.0025 from its neighboring minima to disregard locally noisy and possibly spurious PPC peaks. Fourth, detected peaks had to have PPC value greater than 25% of PPC range.
Statistical analysis on the class-specific PPC peak distribution To determine whether clusters show significant proportion of PPC peaks in a specific frequency band, 1000 samples with the same size to each class was selected from the population of neurons. For each sample we computed the mean to construct a distribution of sample means under the null hypothesis that no class show proportion of PPC peak in frequency bands different than the population of samples. The distribution of peak proportion for each class was then compared with identified 95% confidence interval of the population of samples. This procedure was done separately for classes of neurons in PFC and ACC.
Analysis of the firing onset-responses to the Color onsets and Error/Reward Outcome onsets For each neuron, the spike density was computed using a gaussian window of 600ms (std 50ms) around the Cue onsets, Error outcome onsets and Reward onsets across trials. We then performed the z-score transformation of event onset aligned mean response of each cell over trials, by subtracting the pre-onset mean of spike density divided by its standard deviation (a time window of [-500ms 0ms] prior to the event onsets). To investigate class specific event response, we used a permutation approach and randomly selected 1000 samples with a class size same as each class. We then constructed a distribution of mean samples under the null hypothesis that no class show event response different than sample population. Cell classes that showed significantly different response than the population were then identified in a duration that they show response more extreme than 2 standard deviation from the population of samples. We performed these tests separately for classes in area PFC and ACC and event onsets: Color-Cue, Motion-Cue, Error outcome, and reward outcome.
For Broad vs Narrow spiking cell comparison of event onset response, we randomly shuffled the label of neurons and constructed a distribution of 1000 times randomly sampled difference of mean of Narrow and Broad spiking cells. We then computed 95% CI of the population samples and computed the most extreme 5% of time courses from the 95% CI under the null hypothesis that Broad and Narrow population of neurons do not show significant mean difference responses to the event onsets.

Analysis of time-resolved spike-LFP coherence under different behavioral conditions
To analyze the spike-LFP phase synchronization of neurons for the trials with 50% lowest and the 50% highest reward prediction error (RPE) for ACC neurons, and for the trials with 50% lowest and 50% highest choice probability (CHP) for LPFC neurons we computed time-resolved spike-LFP pairwise phase consistency. First, we divided trials into two groups of high and low RPE and CHP values (trials were assigned based on their median value for each neuron). Then, for each neuron, RPE, and CHP condition we extracted spikes and their phase synchronization to the LFP in different frequencies (4-80 Hz, 1 Hz resolution) by applying Fourier transform on a hanningtapered LFP signal (+/-2.5 frequency cycles around each spike). Then we computed the PPC for moving windows of +/-350ms every 50 ms around the outcome onset (for RPE) and around color onset (for CHP). We included only neurons with at least 50 spikes per time window. To control for spike number, we repeated the procedure 500 times with a random subsample of 50 spikes of a neuron for each window before computing the PPC. For each neuron, behavioral condition, and window we calculated the average PPC over the random subsamples.
Statistical Analysis of time resolved spike-LFP coherence for putative interneurons and broad spiking neurons Statistics on the time-resolved coherence was computed in two steps. In the first step, we tested for each post-event time window the null hypothesis that N3-type neurons and broad spiking neurons showed similar spike-LFP synchronization strength after the event onset compared to the time windows prior to the event. To test this, we first normalized the time resolved coherence for each neuron to the baseline coherence (-850ms to 0ms) before reward or attention-cue onset (in ACC and PFC respectively). We then randomly selected 1000 sample of neurons from the population with the same size as neurons in class N3 and broad cells under the null hypothesis that N3 class and broad spiking neurons do not show different synchronization pattern triggered by event onset compared with population. For each sample we extracted the 95% CIs, and over the population of samples we extracted the most extreme 5% of the previously extracted CIs and set the final 95% multiple comparison corrected confidence intervals. We then found the average of normalized PPC values for N3 class and broad spiking neurons in a time period and frequency domain that were more extreme then the defined confidence intervals. The area of significance then was shown by black contours. In the second step we asked whether N3 class neurons show different average synchrony strength over a time window of [0ms 500ms] aligned windows to the attention-cue onset (in PFC and for high and low Choice Probability conditions) and to the reward onset (in ACC and for high and low Reward Prediction Error conditions). We randomly selected 1000 samples, with the same size as N3 class, from broad spiking neurons and computed their average pre-onset normalized synchrony in the defined post-onset period. We then constructed the most 5% extreme values of 95% confidence intervals defined over 1000 samples and across frequencies under the null hypothesis that N3 class cells do not show different synchrony strength from broad spiking cells in the post-onset time period and across different frequencies. We set the confidence interval levels and selected frequency bands more extreme than the CIs as significantly different (multiple comparison adjusted alpha level=0.05).
Analysis of effect size of the firing onset-responses to the Cue onsets and Error/Reward outcome event onsets For effect size analysis of cell class specific response to each of the onsets, we computed the mean difference of each cell class from each of 1000 randomly labeled samples divided by their pooled standard deviation to compute Cohen's d for each randomly selected sample. At the end we averaged over the 1000 unsigned Cohen's d computed for each cell class. The procedure was done separately for ACC and LPFC classes and for Cue onsets and Error/Reward outcome event onsets.
Analysis of narrow vs. broad and cell class specific firing correlations with reversal learning To investigate whether firing rate of cells correlate with the learning state, we performed correlation analysis between firing rate of single neurons and model parameters: probability of chosen stimulus (choice probability, p(choice)), and positive Reward Prediction Error (RPEpos). For the correlation analysis, we excluded neurons that had less than 30 trials of neural activity. For each neuron, the event onset response was normalized to the mean of all trials' pre-onset firing (in a period of -0.5s to the event onset) and was divided by the standard deviation of all in that period. We then computed for each neuron the Spearman correlation coefficient between p(choice) values and then normalized firing rate in a moving window ±200ms with sliding increments of 25ms relative to the Color-Cue onset. We used the same procedure for the reward-onset mean of normalized firing rate and RPEpos values. To test whether narrow and broad spiking neurons correlate their firing rate differently to model values, we randomly shuffled cell labels and constructed a distribution of 1000 differences of the mean correlations of randomly assigned neurons to the broad and narrow groups under the null hypothesis that there is no difference in correlations depending on the spike waveform group. We then computed the most extreme 5% of the sample difference of means through their time course and identified the 95% confidence interval to test our null hypothesis. We also tested whether cells of different cell classes showed different correlations of firing rate and p(choice) or RPEpos.using the Kruskal-Wallis test considering cell class as the grouping variable. To test which class shows correlations different than the population mean, we randomly shuffled cell class labels 1000 times and computed the mean difference between each randomly labeled cell class and the population. We then constructed a distribution of mean difference samples under the null hypothesis that no class shows a mean correlation different from the population mean. We then computed the top 5% of samples and identified 95% confidence interval. Classes that showed a mean difference of correlation to the population more extreme than the identified CI were marked as significant. All mentioned procedures were performed separately for neurons in area ACC and PFC and for both, p(choice) or RPEpos values.

Analysis of the information coding cells for the rule identity and target location
To determine what proportion of neurons relative to the Color-Cue onsets as well as Reward/Error outcome onsets systematically carry information about the rule identity (Red vs. Green), or target location (Left vs. Right), we considered neurons we had at least 20 trials for each condition. We used a moving window of ±200ms with sliding increments of 25ms relative to the Cue-onset or Error/Reward outcome onsets. For each window with performed the nonparametric rank sum test between the two of conditions under the null hypothesis that neurons do not fire preferentially different to a specific color or location. For Narrow and Broad spiking neurons we computed the proportion of neurons that showed statistically significant firing rate (P<0.05) to each condition.
We then randomly shuffled the proportion amounts of significantly different firing neurons over the time course and computed 95% CI under the null hypothesis that each group of neurons do not show proportionally different number of neurons compared to the pre-onset population of proportion values.
Analysis of cell class firing statistics measures For each of firing statistic measures (firing rate, local variation, and coefficient of variance) we performed nonparametric Kruskal-Wallis test with cell class as grouping variable to test for a main effect of cell class on each firing statistics. We then performed rank sum multiple comparison for pairwise comparison of cell class differences (P<0.05).

Analysis of PPC strength for learning correlated cells vs non-correlated cells
We grouped our neurons based on their waveform (Narrow vs Broad) and then further grouped them into subgroups of those that their firing after the onset were significantly correlated with learning values and those that were not (PCS x Firing Rate after Cue-onset in PFC, and RPEpos x Firing rate after Reward-onset in ACC). For each waveform-grouped neuron, we randomly shuffled their labels and computed the difference of PPC peak proportions between neurons that their firing rate were significantly correlated with learning state and those that were not significantly correlated. We constructed a distribution of 1000 randomly selected samples of difference of proportions of PPC peaks under the null hypothesis that for each waveform grouped neurons there is no significant difference in the proportion PPC peaks for neurons that their firing rate were significantly correlated with learning values and those that were not significantly correlated. We then identified the most extreme 5% of the peak proportion difference and computed 95% CI over the population of samples.    Values below 0 reflect reduced multiunit firing after the neuron fires a spike compared to before the spike, indicating a relative suppressive relationship. Only the N3 etype showed a systematically reduced post-spike MUA modulation. MUA were always recorded from other electrodes nearby the spiking neuron.     Value and reward outcome encoding in LPFC and ACC was shown before in (Oemisch et al., 2019a). (A) When object values are similar (e.g. ~0.5 each) the interneuron network receives a strong drive, causing gamma synchrony and a gating of one of the values. The chosen value is then projected to an interneuron network in ACC that calculates the reward prediction error (RPE) and synchronizes at gamma when the chosen value is relatively low relative to the obtained reward, corresponding to a high RPE. This N3 e-type gamma synchronous response indicates that values need to be updated (thick grey line from ACC to LPFC).
(B) Same network as (A) when one object has a high and another object a low value. In this case, the overall excitatory drive to the interneuron network is lower resulting in beta synchronization in LPFC, and in ACC a switch from N3 e-type gamma synchrony to theta synchrony that follows the rhythmic activity of reward activated pyramidal cells. In this case, N3 e-type cells show low firing rates indicating a low RPE, hence that no value update is needed (thin grey line from ACC to LPFC). See text for details.