Abstract
The nucleus accumbens (NAc) is a key node within corticolimbic circuitry for guiding action selection and cost/benefit decision making in situations involving reward uncertainty. Preclinical studies have typically assessed risk/reward decision making using assays where decisions are guided by internally generated representations of choice-outcome contingencies. Yet, real-life decisions are often influenced by external stimuli that inform about likelihoods of obtaining rewards. How different subregions of the NAc mediate decision making in such situations is unclear. Here, we used a novel assay colloquially termed the “Blackjack” task that models these types of situations. Male Long–Evans rats were trained to choose between one lever that always delivered a one-pellet reward and another that delivered four pellets with different probabilities [either 50% (good-odds) or 12.5% (poor-odds)], which were signaled by one of two auditory cues. Under control conditions, rats selected the large/risky option more often on good-odds versus poor-odds trials. Inactivation of the NAc core caused indiscriminate choice patterns. In contrast, NAc shell inactivation increased risky choice, more prominently on poor-odds trials. Additional experiments revealed that both subregions contribute to auditory conditional discrimination. NAc core or shell inactivation reduced Pavlovian approach elicited by an auditory CS+, yet shell inactivation also increased responding during presentation of a CS−. These data highlight distinct contributions for NAc subregions in decision making and reward seeking guided by discriminative stimuli. The core is crucial for implementation of conditional rules, whereas the shell refines reward seeking by mitigating the allure of larger, unlikely rewards and reducing expression of inappropriate or non-rewarded actions.
SIGNIFICANCE STATEMENT Using external cues to guide decision making is crucial for adaptive behavior. Deficits in cue-guided behavior have been associated with neuropsychiatric disorders, such as attention deficit hyperactivity disorder and schizophrenia, which in turn have been linked to aberrant processing in the nucleus accumbens. However, most preclinical studies have assessed risk/reward decision making in the absence of explicit cues. The current study fills that gap by using a novel task that allows for the assessment of cue-guided risk/reward decision making in rodents. Our findings identified distinct yet complementary roles for the medial versus lateral portions of this nucleus that provide a broader understanding of the differential contributions it makes to decision making and reward seeking guided by discriminative stimuli.
Introduction
Over the past 20 years, converging evidence from humans and animals has revealed that different aspects of decision making involving uncertainty are mediated by distributed neural circuits linking dissociable regions of the frontal lobes, the amygdala, and the dopamine system (Bechara et al., 1999; Kuhnen and Knutson, 2005; Clark et al., 2008; Ghods-Sharifi et al., 2009; Zeeb and Winstanley, 2011; St. Onge et al., 2012). In turn, information processed by these circuits may influence the direction of action selection via converging projections to the nucleus accumbens (NAc; Mogenson et al., 1980; Mannella et al., 2013; Floresco, 2015).
Several tasks have been developed to study the neural basis of risk/reward decision making in animals. Some are patterned after the Iowa gambling task used with humans (Bechara et al., 1994, 1999), wherein rats choose between multiple options associated with different probabilities of rewards and punishments (e.g., time-outs, bitter tastes, etc.; van den Bos et al., 2006; Pais-Vieira et al., 2007; Zeeb et al., 2009). Another approach has used probabilistic discounting tasks, involving choice between smaller, certain rewards and larger rewards delivered in a probabilistic manner, with the odds of obtaining a larger reward changing systematically over a session. The NAc plays a key role in mediating risk/reward decision making under these conditions. Disruption of NAc neural activity reduces preference for larger, uncertain rewards (Cardinal and Howes, 2005; Stopper and Floresco, 2011; Stopper et al., 2013; Mai et al., 2015) that may reflect impaired integration of information processed by amygdala-prefrontal circuits used to guide action selection (St. Onge et al., 2012; Jenni et al., 2017).
In many decision-making tasks, efficient choice is guided by internally generated value representations, requiring subjects to learn about reward probabilities associated with different actions and/or keep track of choice-outcome contingencies to ascertain which options may be more profitable. However, “real-life” risk/reward decisions are often guided by external cues that inform about the likelihood of obtaining certain rewards. Laboratory procedures such as the Cambridge Gamble Task or Dynamic Investment Task provide human subjects with explicit information about the likelihoods of obtaining rewards associated with different choices (Rogers et al., 1999; Kuhnen and Knutson, 2005). In real-life situations, a decision maker may likewise infer the probability of obtaining a reward based on the presence of external stimuli. For example, an experienced player of the casino game “Blackjack” can estimate that the odds of winning a hand are relatively good when the dealer shows a “6” card compared with when showing an “ace”. This can influence how the player manages the hand (potentially doubling down or surrendering to maximize profits/minimize losses). Yet, there has been a relative paucity of preclinical research investigating the neural basis of decision making informed by discriminative external cues. The possibility remains that the manner in which various nodes within prefrontal-amygdalar-striatal circuitry contribute to guiding choice under these conditions may differ from those where decisions are guided by internally generated information.
To explore this, we developed a procedure, colloquially termed the “Blackjack task”, that entails choice between small/certain and large/risky rewards. Critically, each choice was preceded by one of two auditory cues (presented pseudorandomly), which provided information that the odds of being rewarded after a risky choice were either good (50%) or poor (12.5%). Our initial focus was on clarifying the respective contribution of the core and shell subregions of the NAc to decision making under these conditions. An emerging literature suggests that these subregions play distinct yet complementary roles in refining reward seeking, with the core mediating approach to reward-related stimuli, and the shell suppressing irrelevant or non-rewarded behaviors (Di Ciano et al., 2008; Blaiss and Janak, 2009; Ambroggi et al., 2011; Floresco, 2015). Our initial findings prompted additional studies examining how these subregions contribute to performance of simpler conditional auditory discriminations and discriminative Pavlovian approach. The collection of findings provides a broader understanding of the differential contributions of NAc subregions to risk/reward decision making and more generally, to refining reward seeking guided by discriminative stimuli.
Materials and Methods
Animals
Male Long–Evans rats, weighing 225–275 g at the time of arrival, were used for these experiments. All experiments were conducted in accordance with the Canadian Council for Animal Care and approved by the Animal Care Committee of the University of British Columbia. Upon arrival, animals were housed in groups of four and given at least 7 d to acclimatize to the colony before being singly housed for the duration of the experiment. Colony temperature was maintained at 21°C with a 12 h light/dark cycle. Experiments were performed during the light phase of the cycle. Five days before the start of behavioral training, animals were food restricted to 85–90% of their free-feeding body weight at the start of restriction. Thereafter, rats were maintained on 20–25 g of chow/d, provided at the end of a training day. Body weights were monitored daily, and weight gain was permitted for normal growth over the course of the experiments.
Apparatus
Behavioral testing was conducted in operant chambers (30.5 × 24 × 21 cm; Med Associates) enclosed in sound-attenuated boxes. The boxes were ventilated with a fan that also served to mask outside noise. Each chamber was fitted with two retractable levers, one located on each side of the food receptacle where sucrose pellet reinforcement (45 mg; Bio-Serv) was delivered via a dispenser. On the wall opposite the food receptacle, a single 100 mA house light was situated next to a speaker connected to a programmable sound generator (ANL-926; Med Associates), through which auditory stimuli were delivered. Four infrared photobeams were mounted on the side of each chamber, and photobeam breaks were used as an index of locomotor activity. Another photobeam was situated in the food receptacle, which was used to monitor approaches to the food cup. All experimental data were recorded by a computer connected to the chambers via an interface.
Initial lever pressing training procedures
Before training on the main task, rats underwent a sequence of pretraining procedures, consisting of basic lever pressing, retractable lever training, side-bias testing, and reward magnitude discrimination training. On the day before their first exposure to the operant chambers, rats were given ∼25 sucrose reward pellets in their home cage to reduce food neophobia. Operant training started with lever press training under a fixed-ratio-1 (FR1) schedule. Before the start of the first session, two sucrose pellets were placed into the food receptacle and crushed pellets were placed on the lever before the animal was placed in the chamber. During these sessions, the house light was illuminated and one of the levers remained inserted into the chamber for 30 min or until 60 lever presses were made, whichever came first. Rats were trained to a criterion of 60 presses in a session. On the following day(s), they were required to press the opposite lever (counterbalanced left/right) until achieving criterion.
Retractable lever training commenced the day after achieving criterion on the FR1 schedule. These 90-trial sessions began with the levers retracted and the operant chamber in darkness. Every 40 s, a trial started with the illumination of the house light and the insertion of one of the two levers into the chamber (randomized in pairs). Failure to respond on the lever within 10 s caused its retraction, the chamber to darken, and the trial was scored as an omission. A response within 10 s caused the lever to retract and the delivery of a single pellet with 50% probability. The house light extinguished 3.5 s after pellet delivery. Rats were trained for ∼3–6 d, at which point they were making 10 or fewer omissions.
Immediately after the final session of retractable lever training, rats were tested for their bias toward a particular lever (Jenni et al., 2017). This single session consisted of trials where both levers would be inserted into the chamber. On the first trial, a pellet was delivered following a response made on either lever. Following this choice, food was delivered only after the rat responded on the lever opposite to the one initially chosen. If the rat chose the same lever as the previous response, no food was delivered and the house light was extinguished. This would continue until the rat correctly chose the lever opposite to what it initially selected. After a response was made on each lever, a new trial started, such that each trial in this side bias task consisted of at least one response on each lever. Their side bias was assigned based on the lever (left or right) selected most often during the initial choice of each trial. The only exception was if a rat happened to make a disproportionate number of responses on one lever over the entire session (i.e., a 2:1 ratio for the total number of presses), in which case that particular lever was deemed the one to which the rat displayed an overall bias.
Subsequently, rats were trained to associate one lever with a larger four-pellet reward and the other with a one-pellet reward (reward magnitude discrimination training). The first phase of reward magnitude discrimination training consisted of 2–3 d of training on a 48-trial task, partitioned into four blocks of two forced-choice trials followed by 10 free-choice trials (12 trials per block). Every 40 s, one or both levers were inserted in the chamber. Pressing one lever always delivered four pellets (delivered 0.5 s apart), whereas the other always delivered one pellet. For each rat, the lever associated with the larger reward was the one that was opposite of its side bias, and this remained consistent for the duration of the experiment.
Next, rats were trained for another 2–3 d on a modified version, which introduced a probabilistic component. These sessions consisted of 72 trials, partitioned into four blocks of eight forced-choice and 10 free-choice trials (18 trials per block). Here, selection of the small reward lever always delivered one pellet, whereas choice of the large reward lever delivered four pellets with a 50% probability. All other aspects of this task were identical to those of the retractable lever training sessions. The day after this phase of training was completed, training on the Blackjack task commenced.
Blackjack task
In the main task used in these studies, one lever was designated the large/risky option and the other the small/certain option, which remained consistent for each rat throughout training. The lever designated as the large/risky option was the lever associated with the larger reward during pretraining. Rats were trained 6–7 d/week.
Training on the Blackjack task consisted of two phases. During the initial phase, sessions consisted of 52 trials (36 min). The first 32 trials were forced-choice, where only one lever was inserted into the chamber (randomized in pairs, 16 trials with each lever). This was followed by 20 free-choice trials, where both levers were inserted. Once animals displayed stable choice behavior, they progressed to the final phase of training, which consisted of 40-trial sessions (27 min) identical to the initial phase, except that all trials were free-choice.
The basic structure of a free-choice trial on the Blackjack task is shown in Figure 1A. A session began in darkness with both levers retracted (the intertrial state). Trials began every 40 s with house light illumination and presentation of one of two distinct auditory cues (i.e., either 3 kHz pure tone or white noise, 80 dB). An equal number of both cues were presented pseudorandomly over the session (randomized in pairs). Three seconds after the house light and cue were presented, one or both levers were inserted into the chamber. A response on the small/certain lever immediately turned off the auditory cue and delivered one pellet with 100% probability, regardless of which cue was presented. However, a response on the large/risky lever could yield four pellets, delivered in a probabilistic manner. Importantly, the probability of obtaining the larger reward on a particular trial was indicated by which auditory cue was presented. Thus, one cue was associated with “good-odds” trials, where a risky choice delivered reward with 50% probability. The other cue signaled “poor-odds” trials, where a risky choice was rewarded with 12.5% probability. As such, the large/risky option had a greater utility compared with the small/certain option on good-odds trials, whereas on poor-odds trials, the small/certain option had greater utility (Fig. 1A, right).
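The relative utilities of the two options follow from simple expected-value arithmetic, sketched below (an illustrative check in Python; the function name is ours and is not part of the behavioral software):

```python
# Expected value (in pellets) of each option on the Blackjack task.
# Payoffs and probabilities are taken from the task description.

def expected_value(payoff, probability):
    """Expected number of pellets for a single choice."""
    return payoff * probability

small_certain = expected_value(1, 1.0)    # 1.0 pellet on either trial type
risky_good = expected_value(4, 0.5)       # good-odds trials: 2.0 pellets
risky_poor = expected_value(4, 0.125)     # poor-odds trials: 0.5 pellets

# The risky option pays better than the certain one on good-odds trials,
# but worse on poor-odds trials.
assert risky_good > small_certain > risky_poor
```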
The specific auditory cues associated with good- versus poor-odds trials were counterbalanced across rats and remained consistent over the duration of the experiment. A response on either lever caused both to retract. If the rat chose the large/risky option and received a reward, the cue and house light remained on during the delivery of the four pellets and turned off 3 s after the choice. Large/risky choices that did not yield reward extinguished the house light immediately and the auditory cue was silenced 2 s after the choice. The presentation of the cues for a period after risky choices was intended to facilitate learning of the associations between each auditory cue and the likelihood of the different outcomes with these choices. Choice of the small/certain option delivered one pellet and turned off the cue. Following an omission (a lack of response within 10 s of lever insertion), both levers were retracted and the house light and auditory cue were turned off.
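The reward contingencies of a single free-choice trial can be summarized in a short sketch (Python; a simplified rendering of the logic described above, omitting all timing, cue, and house-light control):

```python
import random

def blackjack_trial(cue, choice):
    """Reward outcome for one free-choice Blackjack trial.

    cue: 'good' (risky choice pays with p = 0.5) or 'poor' (p = 0.125).
    choice: 'risky', 'certain', or None for an omission.
    Returns the number of pellets delivered.
    """
    if choice is None:
        return 0                       # omission: levers retract, no reward
    if choice == 'certain':
        return 1                       # small/certain lever always pays one pellet
    p_large = {'good': 0.5, 'poor': 0.125}[cue]
    return 4 if random.random() < p_large else 0   # large/risky lever
```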
Rats were trained initially on the 52-trial (32 forced, 20 free-choice) version of the task. Although only one of the levers was inserted during the forced-choice trials, the auditory cues were still pseudorandomly presented, regardless of whether the large/risky or small/certain lever was inserted. On forced-choice trials when the large/risky lever was inserted, the auditory cue presented indicated the respective probability of obtaining the four-pellet reward on that trial (50% or 12.5%). Conversely, for forced-choice trials where the small/certain lever was inserted, an equal number of each auditory cue was presented, so that rats could learn that a response on this lever always delivered one pellet, regardless of the specific cue that was presented. Rats were trained for 15–17 d on this version of the task, after which they displayed stable levels of choice, determined by analyzing data from three consecutive sessions with a two-way repeated-measures ANOVA, with day and odds as factors. Choice behavior of a group was deemed stable if there was no main effect of day and no day × odds interaction (at p > 0.10).
Rats then progressed to the final phase of training (40 free-choice trials) for another 5–6 d, after which they again displayed stable patterns of choice for 3 d. They were then subjected to surgery, retrained on the task (the first day with the forced/free-choice version, thereafter the free-choice version) for at least 5 d until stable choice behavior was reattained, after which they received their first microinfusion test day (described in subsequent sections).
Auditory conditional discrimination
Separate groups of rats were trained on auditory conditional discrimination (Auger et al., 2017), which resembled the structure of the Blackjack task in many respects. Initial lever press training (basic lever pressing and retractable lever training) was similar to the procedures used for the Blackjack task, except that during retractable lever training, a response always delivered reward. Rats were then trained on the main task, in which they learned to press a lever associated with a particular auditory cue (e.g., 3 kHz = right lever, white noise = left lever) to receive a two-pellet reward. Thus, presentation of one cue signaled that a “correct” response on one lever always delivered reward, whereas an “incorrect” response on the opposite lever delivered no reward. The stimulus associated with the correct lever was counterbalanced across animals and remained consistent throughout training. After a correct, rewarded choice, the auditory cue and house light were turned off 1 and 3 s after the response, respectively. Incorrect responses immediately turned off the house light and auditory cue. All other task procedures were identical to the Blackjack task.
Training on the auditory conditional discrimination consisted of two phases. Like Blackjack training, the initial phase consisted of 52-trial sessions (36 min), starting with 32 forced-choice trials followed by 20 free-choice trials. Each auditory cue was presented an equal number of times throughout the session in a pseudorandom order. On forced-choice trials, only the correct lever was inserted into the chamber. Rats were trained for ∼10 d on this version of the task, until an individual rat displayed criterion performance of >70% correct responses for 2 consecutive days. Subsequently, they received an additional 4 d of training on the final version of the task that consisted of 40 free-choice trials. They were then subjected to surgery and retrained to criterion performance before receiving their first microinfusion test day.
Discriminative Pavlovian approach
A separate group of rats was trained to discriminate between one auditory conditioned stimulus (CS+) that signaled the impending passive delivery of two food pellets and another, CS− auditory cue that was not associated with reward. In this task, the house light was illuminated during the entire session, no levers were present, and the main behavior of interest was approach responses (nosepokes) directed toward the food receptacle during different phases of a trial. Before initial exposure to the chambers, rats were given ∼25 sucrose reward pellets in their home cage. The next day, they were familiarized to the chambers and the delivery of pellets into the food receptacle over a 30 min session in which no cues were presented and sucrose pellets were delivered on a variable interval (VI) 60 s schedule. Discriminative Pavlovian training started the next day with sessions consisting of 40 trials. Two auditory stimuli were used during these sessions, identical to those used for the other tasks in this study (3 kHz tone or white noise, counterbalanced). Trials were initiated on a VI 40 s schedule. At the start of each trial, one of the two auditory cues was presented for 10 s in a pseudorandom order. Termination of the CS+ coincided with the delivery of two pellets into the food cup, whereas termination of the CS− had no consequences. To minimize the likelihood that animals formed an instrumental association between a random nosepoke coinciding with initiation of a cue, the task was programmed so that a nosepoke near the end of the intertrial interval would delay the presentation of a CS by 2 s. As training on this task was considerably shorter than the others used in this study, animals were implanted with guide cannulae before behavioral training. Rats were trained for 9 d, after which they received their first microinfusion test day. They were then retrained for 1–2 d before receiving their second counterbalanced microinfusion.
Stereotaxic surgery
Rats were provided food ad libitum for at least 1–3 d before surgery. Rats were given a subanesthetic dose of ketamine and xylazine (50 and 4 mg/kg, respectively) and maintained on isoflurane for the duration of the procedure. Rats were implanted stereotaxically with bilateral 23-gauge stainless steel guide cannulae into subregions of the NAc. The coordinates were as follows: NAc core [anteroposterior (AP) = +1.8 mm; medial-lateral (ML) = ±1.8 mm from bregma; dorsoventral (DV) = −6.3 mm from dura] and shell (AP +1.6; ML ±1.0; DV −6.3 mm). For the Pavlovian discriminative approach experiment, rats were implanted at a lower weight; in these experiments the AP coordinate for the shell was set at +1.3 mm from bregma. Cannulae were held in place with stainless steel screws and dental acrylic. Thirty-gauge obturators were inserted into the guide cannulae and remained in place until infusions were performed. The animals were given a minimum of 1 week to recover from surgery before being retrained on their respective task until they displayed stable patterns of choice behavior.
Before receiving their first microinfusion test, animals received a mock infusion before a training session to familiarize them with the procedures. Obturators were removed, and injectors were placed inside the guide cannulae for 2 min, but no infusion was administered. Rats were subsequently placed in their home cage for 10 min before behavioral training started.
One to 3 d following mock infusions, animals received their first microinfusion test day. A within-subjects design was used for all experiments. Drugs or saline were infused at a volume of 0.3 μl. Inactivations were induced using a solution containing the GABAB agonist baclofen (75 ng; Sigma-Aldrich) and the GABAA agonist muscimol (75 ng; Sigma-Aldrich) dissolved in 0.9% saline. Previous studies have shown that infusion of these GABA agonists at this concentration and volume can induce dissociable effects on a wide variety of behavioral measures when administered into different subregions of the NAc and other regions that are separated by ∼1 mm (Floresco et al., 2006; McLaughlin and Floresco, 2007; Stopper and Floresco, 2011; Dalton et al., 2014; Piantadosi et al., 2017). The order of infusions was counterbalanced across animals, such that some rats received saline before inactivation treatments, whereas others received infusions in the opposite order.
Infusions were administered via 30-gauge injection cannulae that protruded 0.8 mm past the end of the guide cannulae and were delivered over 45 s. The cannulae remained in place for an additional 60 s to allow for diffusion. Rats were returned to their home cages and remained there for 10 min, after which they were placed in the operant chambers for behavioral testing. After their first microinfusion test day, rats were retrained on their respective task until they returned to their baseline levels (i.e., pretest day) performance (typically between 1 and 3 d). On the following day they received a second infusion before behavioral testing.
Histology
After completion of behavioral testing, animals were killed with CO2. Brains were removed and fixed in a 4% formalin solution for at least 24 h. Brains were frozen, sectioned at 50 μm, mounted, and Nissl stained using cresyl violet. Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005). Data from rats with asymmetrical placements or those residing outside the border of the NAc core or shell were removed from the analysis. The locations of acceptable infusion placements are presented in Figure 2.
Experimental design and statistical analysis
For the Blackjack task, the primary dependent variable of interest was the proportion of choices of the large/risky option on good- and poor-odds trials, factoring out trial omissions. This was calculated by dividing the number of choices of the large/risky lever by the number of trials in which the rats made a choice, partitioned over good- and poor-odds trials. These data were analyzed using a two-way repeated-measures ANOVA with treatment and odds (good vs poor) as within-subjects factors. Response latencies and the number of trial omissions were analyzed with one-way repeated-measures ANOVAs.
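As a concrete illustration, the primary choice measure might be computed as follows (a sketch with hypothetical counts; not the actual analysis code used in the study):

```python
def risky_choice_proportion(n_risky, n_certain):
    """Proportion of large/risky choices among completed trials of a
    given odds type; omitted trials never enter the denominator."""
    completed = n_risky + n_certain
    return n_risky / completed if completed else None

# Hypothetical counts for one rat on good-odds trials:
# 14 risky choices, 4 small/certain choices, and 2 omissions.
p_risky = risky_choice_proportion(14, 4)   # 14/18, approximately 0.78
```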
Supplementary analyses were conducted on Blackjack data to clarify whether changes in choice behavior were associated with alterations in reward sensitivity (win-stay behavior) and/or negative-feedback sensitivity (lose-shift behavior). The primary analysis of this type focused on how the outcome of the most recent risky choice influenced the next choice, in a manner similar to previous studies (St. Onge et al., 2011, 2012). Choices that followed a risky choice were analyzed according to the outcome of the preceding choice (reward/no reward) and expressed as a ratio. Win–stay ratios were calculated as the proportion of risky choices following receipt of the larger reward (a risky win) divided by the total number of larger rewards obtained. Lose–shift ratios were calculated as the proportion of small/certain choices following a non-rewarded risky choice (risky loss) over the total number of non-rewarded choice trials. We further subdivided win–stay and lose–shift ratios based on the odds the rats faced during a particular choice (good vs poor), regardless of the odds they faced on the preceding choice. These scores were analyzed together using a three-way ANOVA, with treatment, feedback sensitivity (win-stay or lose-shift) and odds (good vs poor) as within-subjects factors. Note that in this task, reliance on a win-stay or lose-shift strategy is not necessarily advantageous, because the probability of receiving the large reward on a particular trial was independent of the outcome on the preceding trial.
We also attempted to compare win–stay/lose–shift ratios based on the most recent risky choice of the same trial type. For example, if a rat chose risky on a good-odds trial and was rewarded, how likely were they to choose risky again on the next good-odds trial (typically occurring 1–3 trials later in the choice sequence). For both win-stay and lose-shift analyses, the ratios were calculated by dividing the number of stays or shifts on good- and poor-odds trials by the total number of wins or losses on each of these types of trials. However, the task structure and choice patterns of animals in these experiments made these analyses problematic. This is because across the different measures and groups, some rats received zero or only one win or loss after either control or inactivation treatments, which precluded calculation of reliable win–stay or lose–shift ratios for each rat across both treatments.
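The win–stay and lose–shift ratios defined above can be sketched as follows (an illustrative Python implementation; the trial representation is our own, and a win or loss on a session's final trial, which has no subsequent choice, is excluded):

```python
def win_stay_lose_shift(trials):
    """Win-stay and lose-shift ratios from a sequence of
    (choice, rewarded) pairs, where choice is 'risky' or 'certain'.

    Win-stay: risky choices following a rewarded risky choice, divided
    by the number of risky wins. Lose-shift: certain choices following
    a non-rewarded risky choice, divided by the number of risky losses.
    """
    wins = losses = stays = shifts = 0
    for (prev_choice, prev_rewarded), (choice, _) in zip(trials, trials[1:]):
        if prev_choice != 'risky':
            continue                        # only outcomes of risky choices count
        if prev_rewarded:
            wins += 1
            stays += (choice == 'risky')    # stayed with the risky lever
        else:
            losses += 1
            shifts += (choice == 'certain') # shifted to the certain lever
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift

seq = [('risky', True), ('risky', True), ('risky', False),
       ('certain', True), ('risky', False), ('risky', False)]
ws, ls = win_stay_lose_shift(seq)   # ws = 1.0, ls = 0.5
```

Returning `None` when a rat has no wins (or no losses) mirrors the problem noted above: the ratio is simply undefined for that animal and treatment.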
For the auditory conditional discrimination, the primary dependent variable of interest was the percentage of correct responses. This was calculated proportional to the number of trials on which a response was made, thereby correcting for trial omissions. For this experiment, we also partitioned correct responses into blocks of 10 trials across the 40-trial session. These data were analyzed with two-way repeated-measures ANOVAs with treatment and block as within-subjects factors.
For the discriminative Pavlovian approach task, our primary dependent variable of interest was the proportion of CS+ and CS− trials that elicited at least one approach response toward the food receptacle (i.e., a photobeam break) during the 10 s auditory cue presentation. This enabled a more direct comparison of approach behavior elicited by the two cues on this task versus the Blackjack task. These data were analyzed with a two-way repeated-measures ANOVA, with treatment and CS type as factors. Complementary analyses were conducted on the total number of entries (nosepokes) into the food receptacle, averaged across 10 s epochs before cue presentation (pre-CS), during the CS+ and CS−, and during 3 s epochs after food delivery. These data were analyzed with two-way repeated-measures ANOVAs with treatment and epoch as within-subjects factors. We also analyzed latencies to make the first approach toward the food receptacle during CS+ and CS− trials with two-way repeated-measures ANOVAs. All statistical tests were performed using Systat 13 (Systat Software).
Results
Blackjack task
Task acquisition
Figure 1B displays the progression of choice behavior during free-choice trials over the course of training on the Blackjack task from 30 rats whose data were included in this part of the study. During the first few days of training on the task that consisted of 32 forced-choice followed by 20 free-choice trials, rats displayed a discernable bias toward the large/risky option on both good- and poor-odds trials (Fig. 1B), likely attributable to a carryover effect from the reward magnitude training. As training progressed, rats continued to display a strong bias for the risky option on good-odds trials while learning to bias choice away from this option on poor-odds trials. This pattern of behavior was confirmed by the analysis of the choice data with a two-way ANOVA. This yielded a significant training day × odds interaction (F(14,406) = 7.90, p < 0.001), with simple main effects comparisons revealing that rats selected the risky option significantly less often on poor- versus good-odds trials from the fifth day of training onward (p < 0.05). In addition, they began to display an apparent bias toward the small/certain option by approximately the 10th day of training. After 15 d of training on the forced- and free-choice programs, animals were trained on the version of the task that consisted of 40 free-choice trials. Over the first 4 d of training on this version, they continued to discount the large/risky option on poor-odds trials. By the end of this training period, rats as a group displayed stable patterns of choice behavior.
In a separate group of rats (n = 24) that were used for another study, we wanted to confirm that choice behavior was guided by the auditory cues presented on good- and poor-odds trials. To this end, after 19 d of training on the Blackjack task, we administered a probe session that was identical to a typical Blackjack session, except that the auditory cues that usually informed animals about the odds on each trial were not presented. As displayed in Figure 1C, under baseline task conditions (averaged across 2 d of training before the probe session), rats selected the large/risky option more often on good- versus poor-odds trials. In contrast, during the probe session, rats chose the risky option with equal frequency on both types of trials (session × odds interaction: F(1,23) = 48.52, p < 0.001), even though the average proportion of risky choices on good- and poor-odds trials did not differ from baseline (baseline = 44 ± 4%, probe = 40 ± 6%; main effect of day: F(1,23) = 0.85, p > 0.36). This confirmed that rats were incorporating the information provided by the different auditory cues presented before each trial to guide their choice behavior on the Blackjack task. Interestingly, under probe conditions when no informative stimuli were presented, exclusive selection of the risky option over the 40 trials would yield ∼50 pellets compared with 40 pellets if only the small/certain option was selected. However, despite the slightly greater utility of the risky option under these conditions, rats as a group tended to be risk-averse in their choice behavior.
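The expected payoffs under these no-cue probe conditions can be verified with a quick calculation, assuming (as an illustrative assumption consistent with the task description) an even split of good- and poor-odds trials across the 40-trial session:

```python
# Quick check of expected payoffs under probe (no-cue) conditions,
# assuming an even split of 20 good-odds and 20 poor-odds trials
# across the 40-trial session (an assumption for illustration).
n_good, n_poor = 20, 20
large_reward = 4              # pellets for a rewarded risky choice
p_good, p_poor = 0.50, 0.125  # reward probabilities on each trial type

# Expected pellets if the large/risky option is chosen exclusively
risky_total = (n_good * p_good + n_poor * p_poor) * large_reward
# Pellets if the small/certain (1-pellet) option is chosen exclusively
certain_total = (n_good + n_poor) * 1

print(risky_total, certain_total)  # → 50.0 40
```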
NAc core inactivation
A total of 16 rats with acceptable placements within the NAc core were included in the analysis (Fig. 2A). Performance of these animals following saline infusions was comparable to baseline conditions, in that rats displayed a discernable bias toward the large/risky option on good-odds trials and selected this option more often than on poor-odds trials, where they showed a strong bias away from this option. Inactivation of the NAc core induced a pronounced disruption in choice behavior (Fig. 3A). The ANOVA revealed a significant treatment × odds interaction (F(1,15) = 18.88, p < 0.001), with no main effect of treatment (F(1,15) = 2.56, p > 0.13). Simple main effects analyses confirmed that, following inactivation of the NAc core, rats chose the risky option on both good- and poor-odds trials with equal frequency (p > 0.40), whereas under control conditions they chose the risky option significantly more often on good- versus poor-odds trials (†p < 0.001). Furthermore, NAc core inactivation reduced the proportion of risky choices on good-odds trials and increased risky choice on poor-odds trials (p < 0.05).
Analysis of win-stay/lose-shift behavior complemented that of choice behavior (Fig. 3B). These ratios were computed as a function of the outcome of the most recent risky choice. Under control conditions, averaged across good- and poor-odds trials, rats followed a risky win with another risky choice on ∼30% of trials. Notably, these values were considerably lower than win–stay ratios displayed by rats performing traditional probabilistic discounting tasks (∼75–85%; Stopper and Floresco, 2011; St. Onge et al., 2012; Montes et al., 2015; Yang et al., 2016; Jenni et al., 2017). This is likely attributable to the fact that in the Blackjack task, the outcome of a risky choice on a particular trial does not provide reliable information about the likelihood of a win or loss on the following trial. In comparison, following a non-rewarded risky choice, rats shifted choice to the small/certain option on ∼60% of these trials, which was considerably higher than lose–shift ratios displayed during probabilistic discounting tasks (∼25–40%).
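As a hedged sketch of how win-stay and lose-shift ratios of this kind could be computed from a trial-by-trial record (the encoding and function name are illustrative assumptions, not the study's analysis code):

```python
# Hypothetical sketch (not the authors' analysis code): win-stay and
# lose-shift ratios conditioned on the outcome of the most recent
# risky choice. Each trial is a (choice, rewarded) tuple.
def ws_ls_ratios(trials):
    wins = losses = stays = shifts = 0
    for (choice, rewarded), (next_choice, _) in zip(trials, trials[1:]):
        if choice != 'risky':
            continue  # ratios are conditioned on a preceding risky choice
        if rewarded:
            wins += 1
            stays += next_choice == 'risky'     # win-stay
        else:
            losses += 1
            shifts += next_choice == 'certain'  # lose-shift
    win_stay = stays / wins if wins else float('nan')
    lose_shift = shifts / losses if losses else float('nan')
    return win_stay, lose_shift

# Risky win -> risky (stay); risky loss -> certain (shift)
seq = [('risky', True), ('risky', False), ('certain', True),
       ('risky', True), ('risky', True)]
print(ws_ls_ratios(seq))  # → (1.0, 1.0)
```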
NAc core inactivation markedly altered these tendencies, in a way that was dependent on the odds an animal was faced with. Analysis of these data revealed a significant two-way, treatment × feedback sensitivity interaction (F(1,15) = 8.94, p < 0.01) and more pertinently a significant three-way, treatment × feedback sensitivity × odds interaction (F(1,15) = 14.51, p < 0.005). Partitioning this interaction revealed that NAc core inactivation increased win-stay and decreased lose-shift behavior predominantly on poor-odds trials (treatment × feedback sensitivity interaction: F(1,15) = 18.29, p < 0.01 and simple main effects, p < 0.05, for each), which reflected the increase in risky choice observed on these trials. In comparison, the reduction in risky choice on good-odds trials did not appear to be related to changes in how preceding rewarded or non-rewarded outcomes influenced subsequent action selection (all F values <1.0, all p values >0.40; Fig. 3B).
NAc core inactivation also caused a general slowing of behavior, increasing choice latencies and trial omissions, and reducing overall locomotion (all F(1,15) > 25.59, all p values <0.001; Table 1). Together these results indicate that inactivation of the NAc core markedly impaired the use of discriminative stimuli to guide action selection during risk/reward decision making. Animals seemed unable to incorporate information provided by different cues about the likelihood of obtaining larger rewards to guide their decision making, and instead reverted to seemingly random patterns of choice, similar to that displayed by rats performing a probe session where the auditory cues were omitted (Fig. 1C).
NAc shell inactivation
Inactivation of the NAc shell altered decision making in a manner that was conspicuously different from that induced by inactivation of the NAc core. Analysis of the data from 14 rats with acceptable placements produced a significant main effect of treatment (F(1,13) = 4.74, p < 0.05), which reflected an overall increase in risky choice after inactivation treatment (Fig. 3C). Although the treatment × odds interaction did not achieve statistical significance (F(1,13) = 2.72, p > 0.12), inspection of Figure 3C shows that NAc shell inactivation increased risky choice primarily on poor-odds trials. This was confirmed with exploratory comparisons across treatment conditions (poor-odds: t(13) = 4.25, p < 0.01; good-odds: t(13) = 0.13, p > 0.80). The analysis also revealed a significant main effect of odds (F(1,13) = 16.67, p < 0.01), indicating that rats chose the risky option more often on good- versus poor-odds trials following both inactivation and control treatments.
The increase in risky choice induced by NAc shell inactivation was accompanied by a reduction in lose-shift behavior. Analysis of these data produced a significant three-way interaction (treatment × feedback sensitivity × odds: F(1,13) = 5.42, p < 0.05; Fig. 3D). In keeping with the impression from the choice data, partitioning this interaction revealed that NAc shell inactivations reduced lose-shift behavior on poor-odds trials (treatment × feedback sensitivity interaction: F(1,13) = 6.45, p < 0.05 and simple main effect for lose-shift: p < 0.05), but not on good-odds trials (all F < 1.2, all p > 0.30). Win-stay behavior did not differ across treatments on these trials (p > 0.20). NAc shell inactivation did not significantly affect choice latencies, trial omissions or locomotor activity (all F < 1.6, all p > 0.20; Table 1).
Last, the dissociation between the effects of NAc shell versus core inactivation on choice during the Blackjack task was confirmed statistically with a three-way between/within-subjects ANOVA, which yielded a significant three-way, region × treatment × odds interaction (F(1,28) = 5.00, p < 0.05). Collectively, these findings show that, as opposed to the core, suppression of neural activity within the NAc shell increased the tendency to pursue larger, risky rewards, particularly when the probability of obtaining these rewards was low. This increase in risky choice was associated with diminished sensitivity to negative feedback, in that previous non-rewarded choices had less influence on subsequent action selection.
Auditory conditional discrimination
The Blackjack task requires integration of information about the reward magnitudes associated with different options with information provided by discriminative stimuli about reward probabilities to estimate which option may have greater utility on a particular trial. Thus, this task may be viewed as a complex form of conditional discrimination, wherein one arbitrary cue indicates that a particular action may be more profitable, whereas another cue indicates that a different action may yield more reward in the long term. In light of this consideration, it was imperative to clarify whether the NAc core and shell may also be involved in the more basic process of implementing conditional rules, in the absence of the added complexity of integrating information about variations in reward magnitude and probability. To this end, separate groups of rats were trained on an auditory conditional discrimination task that shared a similar structure with the Blackjack task in many respects. As exemplified in Figure 4A, animals were required to use information provided by the two auditory cues to ascertain which lever was correct on a particular trial (e.g., 3 kHz = right lever, white noise = left lever). Selection of the correct lever always delivered two pellets, and choosing the incorrect lever was never reinforced. Given that NAc core inactivation induced indiscriminate choice on the Blackjack task, whereas shell inactivation increased risky choice without abolishing discriminative responding, a parsimonious expectation would be that inactivation of the core, but not the shell, would impair performance on this simpler task.
Task acquisition
Figure 4B displays the progression of learning on the auditory conditional discrimination for rats whose data were included in this part of the study, partitioned by those that received infusions in the NAc core or shell. After 10 d of training on the task that consisted of 32 forced/20 free-choice trials, rats in both groups displayed high levels of accuracy (>80%). Subsequently, they were trained on a task consisting of 40 free-choice trials. Comparison of the data obtained during the last 4 d of training confirmed there were no differences in performance between rats allocated to the NAc core or shell group (F(1,13) = 0.59, p > 0.80). Following surgery, all rats were retrained for 6 d before receiving their first counterbalanced microinfusions. On the day before the first test day, there were no differences in performance for rats in the core (mean ± SEM = 85 ± 4%) versus shell group (91 ± 3%; F(1,13) = 1.05, p > 0.30).
NAc core inactivation
Data from eight rats with acceptable placements within the NAc core were included in the analysis. Following saline infusions, rats displayed high levels of accuracy, selecting the correct lever on >85% of trials. Inactivation of the NAc core impaired performance, indexed by a sharp reduction in the percentage of correct responses (F(1,7) = 24.09, p < 0.01; Fig. 4C). This impairment was apparent during the first 10 trials and continued for the remainder of the session, as indicated by a lack of a treatment × block interaction (F(3,21) = 0.75, p > 0.50; Fig. 4D). Furthermore, performance after core inactivation over the entire session was not significantly different from chance levels (one-sample t test vs 50%; t(7) = 2.22, p > 0.06). As was observed in the Blackjack experiment, NAc core inactivation increased choice latencies, trial omissions, and reduced locomotor activity (all F(1,7) >10.66, all p values <0.05; Table 1).
NAc shell inactivation
Data from seven rats with acceptable placements within the NAc shell were included in the analysis. Contrary to our expectations, shell inactivation also impaired performance on the conditional discrimination (F(1,6) = 16.52, p < 0.01; Fig. 4E). However, we also observed a significant treatment × block interaction (F(3,18) = 3.21, p < 0.05). Simple main-effects analyses confirmed that, unlike the effects of core inactivation, inactivation of the shell did not significantly impair accuracy during the first 10-trial block of the session (p > 0.20; Fig. 4F). Instead, differences between treatment conditions emerged during the second block (p < 0.05), were not significant during the third block (p > 0.06), and were most prominent during the last block (p < 0.01).
The impairment in performance induced by shell inactivation was not significantly different in magnitude from the effects of core inactivation, as revealed by an ANOVA that directly compared the percentage of correct responses after inactivation and control treatments between the groups (treatment × group interaction: F(1,27) = 0.74, p > 0.35). However, even though shell inactivation degraded performance on this task, accuracy following these treatments remained significantly above chance levels (one-sample t test vs 50%; t(6) = 4.24, p < 0.01; Fig. 4E, #). In this experiment, shell inactivation increased choice latencies (Table 1), yet this effect did not achieve statistical significance (F(1,6) = 4.70, p > 0.07). Trial omissions and locomotor activity were not altered by inactivation treatments (both F < 2.7, both p > 0.15). Viewed collectively, these results suggest that the use of conditional discriminative stimuli to guide action selection is dependent on intact neural activity in both the NAc core and shell. However, inactivation of the core induced a seemingly more marked impairment in performance that was apparent from the start of the session, whereas impairments induced by NAc shell inactivation only emerged later in the session.
Discriminative Pavlovian approach
Inactivation of the NAc core or shell induced dissociable effects on choice on the Blackjack task, with core inactivation causing random patterns of responding and shell inactivation increasing choice of the larger/risky option. Yet, we observed somewhat similar impairments in conditional discrimination performance using auditory cues following inactivation of either subregion. This combination of findings alludes to the possibility that impairments in conditional discrimination after inactivation of the NAc core or shell may reflect perturbations in distinct processes involved in efficient performance of this task. Specifically, inactivation of the NAc core may have caused a more fundamental impairment in implementing conditional rules to guide approach toward the appropriate lever (Nicola, 2010; Saunders and Robinson, 2012). Conversely, the effects of NAc shell inactivation may be related to a failure to inhibit inappropriate, non-rewarded actions, as has been observed under other conditions (Blaiss and Janak, 2009; Ambroggi et al., 2011). To explore this issue, we investigated how inactivation of these NAc subregions affected discriminative Pavlovian approach behavior in a separate group of well trained rats. As opposed to the Blackjack and conditional discrimination tasks, where arbitrary discriminative cues guided animals to select different actions, in this task, a 10 s auditory CS+ or CS− signaled that reward would either be delivered (regardless of the animal's response) or not. Our primary measure was the proportion of trials where the CS+ or CS− elicited at least one anticipatory approach response toward the food receptacle during the CSs.
NAc core inactivation
Data from eight rats with acceptable placements were included in the analysis. Under control conditions, presentation of the CS+ always elicited at least one approach toward the food receptacle during the 10 s periods that the auditory cue was on (before reward delivery; Fig. 5A). Conversely, the CS− only elicited an approach on ∼35% of trials. Inactivation of the NAc core selectively reduced approaches during CS+ trials. The ANOVA of these data produced a significant main effect of treatment (F(1,7) = 6.45, p < 0.05) and more pertinently, a significant treatment × CS interaction (F(1,7) = 35.99, p < 0.001). Simple main effects analyses confirmed that inactivation of the NAc core caused a significant reduction in the proportion of CS+ trials that elicited an approach (p < 0.001), but no change in the proportion of approaches during CS− trials (p > 0.2; Fig. 5A). Moreover, partitioning of this interaction also revealed that after control treatments, rats were more likely to make an approach to the food receptacle on CS+ versus CS− trials (p < 0.001), whereas after NAc core inactivation, rats were as likely to make an approach on CS+ trials as they were on CS− trials (p = 1.0).
This analysis was complemented by a comparison of the total number of entries into the food receptacle made during different task epochs (Fig. 5B). The overall ANOVA yielded a significant main effect of treatment (F(1,7) = 10.67, p < 0.05) and treatment × epoch interaction (F(3,21) = 15.73, p < 0.01). This interaction was driven by a significant reduction in the total number of entries made during CS+ trials after core inactivation (p < 0.01), but no difference between treatments on CS− trials (p > 0.90; Fig. 5B). In addition, core inactivation actually caused a slight, non-significant increase in the number of entries made during the pre-CS and food delivery periods (p > 0.06). This latter finding confirmed that the effects of NAc core inactivation were limited to those anticipatory approaches elicited by the CS+ while not affecting approaches toward the food receptacle when reward was actually delivered. Analysis of the latency data revealed that NAc core inactivation increased approach latencies selectively during CS+, but not CS−, trials (treatment × CS interaction: F(1,7) = 8.03, p < 0.05; Fig. 5C). Last, locomotor activity was reduced after core inactivation (mean ± SEM = 765 ± 104) compared with control treatment (1172 ± 92; F(1,7) = 29.40, p < 0.001). Collectively, the results of this experiment complement numerous findings demonstrating that the NAc core plays a key role in promoting timely approach behavior instigated by various types of stimuli that predict reward availability (Di Ciano et al., 2008; Blaiss and Janak, 2009; Nicola, 2010; Saunders and Robinson, 2012).
NAc shell inactivation
Inactivation of the NAc shell (n = 11) altered discriminative Pavlovian approach in a manner that differed from core inactivation. Analysis of the proportion of CS+ and CS− trials that elicited an approach revealed a significant treatment × CS interaction (F(1,10) = 65.68, p < 0.001) in the absence of a main effect of treatment (F(1,10) = 0.15, p > 0.70; Fig. 5D). Following shell inactivation, rats were less likely to initiate an approach on CS+ trials (p < 0.05), but at the same time, they were more likely to approach the receptacle during the CS− (p < 0.05). Despite these effects, rats still displayed some discriminative control over approach behavior, as they were more likely to approach on CS+ compared with CS− trials after both treatments (both p < 0.05).
Analysis of the total number of entries into the food receptacle made during different task epochs revealed a significant main effect of treatment (F(1,10) = 10.61, p < 0.05) and a treatment × epoch interaction (F(3,30) = 16.49, p < 0.001; Fig. 5E). This effect was driven exclusively by a reduction in nosepokes made during CS+ trials after shell inactivation (p < 0.05). Interestingly, even though inactivation of the shell increased the proportion of CS− trials on which rats made an approach (Fig. 5D), there was no difference in the total number of nosepokes made during these trials across treatments (p > 0.80). This likely reflects that, even though animals were nearly twice as likely to initiate an approach in response to the CS− after shell inactivation, they were quicker to disengage from the food receptacle after these approaches. Additionally, there were no significant differences between treatments in the number of entries during the pre-CS or food delivery epochs (both p > 0.35). Moreover, there were no differences in the latencies to make the first approach during CS+/CS− trials (main effect of treatment: F(1,10) = 2.34, p > 0.15; treatment × CS interaction: F(1,10) = 0.03, p > 0.80; Figure 5F). However, in contrast to what was observed in the Blackjack and conditional discrimination experiments, during this task inactivation of the NAc shell increased locomotor activity (mean ± SEM = 1736 ± 121) compared with control conditions (1362 ± 143; F(1,10) = 5.55, p < 0.05). These discrepant effects on locomotion across experiments may be attributable to differences in response requirements across the tasks. The Blackjack and conditional discrimination tasks required rats to make an instrumental response within 10 s of lever insertion. This would be expected to increase attentional focus on the section of the chambers that housed the levers, which may have curtailed excessive ambulatory activity.
In contrast, the Pavlovian task had no such demands on reaction times, because rewards were delivered regardless of how rapidly rats approached the food receptacle. In this regard, these differential effects are in keeping with other findings by our group demonstrating that shell inactivation increases locomotion during tasks that do not place limits on reaction times (e.g., free-operant responding; Floresco et al., 2008), but not during those that require a response within a limited amount of time (Stopper and Floresco, 2011).
Last, the dissociation between the effects of NAc shell versus core inactivation on Pavlovian approach was confirmed statistically, using a three-way between/within-subjects ANOVA, which yielded a significant three-way, region × treatment × CS interaction (F(1,17) = 5.27, p < 0.05). Viewed collectively, these findings suggest that, like the NAc core, activity within the NAc shell also facilitates timely Pavlovian approach behavior elicited by a CS+. However, the shell also appears to suppress approach behavior when other stimuli signal explicitly that rewards will not be delivered.
Discussion
A primary objective of this study was to explore how the NAc core and shell contribute to risk/reward decision making under conditions where discriminative stimuli inform about the likelihood of obtaining larger, riskier rewards. The Blackjack experiment revealed dissociable roles for these two regions: core inactivation yielded indiscriminate patterns of choice on good- versus poor-odds trials, whereas shell inactivation increased risky choice. Complementary experiments revealed that both subregions contributed to performance of a conditional discrimination that did not require integration of information about reward probabilities or magnitudes. NAc core or shell inactivation both reduced Pavlovian approach elicited by an auditory CS+, whereas shell inactivation also increased responding during presentation of a non-rewarded CS−. Collectively, these findings provide novel insight into the complementary yet dissociable manners in which NAc subregions refine risk/reward decision making and reward-related action selection guided by discriminative stimuli.
Fundamental role for the NAc core in implementing conditional rules
NAc core inactivation induced indiscriminate choice patterns on the Blackjack task that did not appear to be guided by external information, and also increased win-stay and decreased lose-shift tendencies on poor-odds trials. More importantly, these treatments also induced chance levels of performance on an auditory conditional discrimination. Thus, the effects on the Blackjack task do not appear to be attributable to specific alterations in judgments about the relative value or probabilities of obtaining rewards, or in the incorporation of information about recent outcomes during cue-guided risk/reward decision making. Rather, our interpretation is that this nucleus plays a more fundamental role in implementing higher-order conditional rules (e.g., if stimulus A then select left, if stimulus B then select right). This complements other studies demonstrating that the NAc core also facilitates switching between different rules or strategies (Floresco et al., 2006; Block et al., 2007; Haluk and Floresco, 2009).
The disruption of choice on the Blackjack task contrasts with previous findings that NAc core inactivation did not affect decision making on a probabilistic discounting task (Stopper and Floresco, 2011). That study also required animals to choose between small/certain and large/risky rewards, with probabilities changing systematically over a session, so that adjustments in biases were guided primarily by internally generated information. Similarly, inactivation of the NAc core does not disrupt preference for larger versus smaller rewards, nor does it impair acquisition of simple spatial or visually cued discriminations (Floresco et al., 2006; Ghods-Sharifi and Floresco, 2010). Thus, even though core inactivation increased response latencies and trial omissions, we find it unlikely that the effects on choice reported here are attributable solely to nonspecific motivational or discrimination impairments. Instead, the present data suggest the NAc core plays an integral role in using discriminative cues to implement conditional rules and guide actions toward stimuli more likely to yield rewards, regardless of whether their delivery is probabilistic or deterministic. An important question pertains to afferent regions that process representations of conditional rules and instigate appropriate approach via the NAc core. Lesions of key inputs to the NAc core, such as the basolateral amygdala, orbitofrontal, anterior cingulate, or prelimbic regions of prefrontal cortex do not impair acquisition of conditional visual or olfactory discriminations when reward contingencies are deterministic (Bussey et al., 1996; Burns et al., 1999; Blundell et al., 2001; Chudasama et al., 2001; Chudasama and Robbins, 2003; Schoenbaum et al., 2003). The possibility remains that these regions may play a role in guiding this form of action selection when reward contingencies are probabilistic, as in the Blackjack task. 
On the other hand, learning tone/light conditional discriminations is impaired by mediodorsal thalamic lesions (Wolff et al., 2015), and thalamo-accumbens circuitry has been implicated in facilitating acquisition of novel rules (Block et al., 2007). Thalamic inputs to the NAc core may converge with those from auditory cortex (McGeorge and Faull, 1989), which have been proposed to facilitate conditional go/no-go avoidance discriminations (Schulz et al., 2016). Furthermore, NAc dopamine release in response to instructive reward-predicting cues is shaped by correct movement initiation, suggesting that dopaminergic transmission in this nucleus may act to promote correct selection and “execution of actions to enable reward to be efficiently realized” (Syed et al., 2016). Additional studies aimed at clarifying the (sub)cortical regions that interface with the core to implement conditional rules guided by auditory cues are warranted.
The NAc shell and refinement of reward-related action selection
In contrast to the indiscriminate choice patterns on the Blackjack task induced by core inactivation, shell inactivation increased risky choice. These dissociable effects parallel findings of different patterns of activity within these subregions when humans accept or reject risky gambles (Baliki et al., 2013). Activation within the core was independent of perceived gains or losses, whereas shell activation reflected whether the gamble was accepted or rejected. Similarly, in the present study, core inactivation led to indiscriminate choice behavior, yet following shell inactivation, choice biases leaned more toward larger rewards, suggesting this region may play a more prominent role in using value-related information to guide choice in situations involving reward uncertainty.
An examination of how choice patterns evolved over training on the Blackjack task provides additional insight into the processes affected by shell inactivation. Initially, animals preferred the larger reward on both good- and poor-odds trials. As training progressed and rats learned that the poor-odds cue indicated risky choices were unlikely to be rewarded, bias shifted away from the large/risky option on these trials. Thus, the poor-odds stimulus redirected action selection, suppressing tendencies to pursue larger rewards in favor of smaller, reliable ones, an effect that was amplified after non-rewarded choices. The observation that shell inactivation increased risky choice and reduced sensitivity to losses predominantly on poor-odds trials suggests that activity within this nucleus curbs pursuit of potentially larger rewards, particularly when external stimuli inform a decision maker that these actions are unlikely to pay off.
The effects of shell inactivation on the Blackjack task seemingly contrast with findings that these treatments reduced risky choice and win-stay behavior during probabilistic discounting and also caused a slight reduction in preference for larger versus smaller rewards (Stopper and Floresco, 2011). These contrasting findings highlight that the manner in which certain nodes of corticolimbic-striatal circuitry refine action selection when rewards are uncertain may differ, depending on whether decision making is guided by internally generated information or external stimuli. Notably, under control conditions of our previous study (Stopper and Floresco, 2011), animals displayed a prominent bias toward the large/risky option. As such, these previously reported effects may be viewed as an increased tendency to direct actions toward less-preferred rewards. Taking this into account, the present data, along with previous findings, suggest that in choice situations, the shell does not uniformly promote risky or risk-averse patterns of choice, but instead appears to refine choice by suppressing actions that may lead to subjectively inferior rewards.
Conditional discrimination performance was also disrupted by shell inactivation. Interestingly, these impairments only emerged later in the session, by which point animals had selected both levers multiple times. This profile is similar to the effects of shell inactivation on performance of a within-session serial reversal task (Dalton et al., 2014). In that study, shell inactivation did not affect accuracy during the initial discrimination or first reversal phases of the session, but did increase errors during later reversal phases. This suggests that activity within the NAc shell is not essential for appropriate rule implementation per se, but may instead aid in maintaining accuracy under conditions entailing multiple back-and-forth shifts to maximize procurement of rewards.
The finding that shell inactivation impaired auditory conditional discrimination performance is in keeping with the results of Reading et al. (1991), who showed that NAc lesions impaired visual conditional discrimination performance. Although these lesions were not specific to the core or shell subregions, they were “most marked in the medial parts” of the nucleus (their p. 152). Interestingly, these impairments were partially ameliorated by reducing the attentional demands of the task, leading to the conclusion that these effects of NAc lesions reflected “difficulties in ignoring stimuli irrelevant to task performance” (their p. 156). We propose a similar explanation for the impaired conditional discrimination performance induced by shell inactivation, in that it reflects a failure to suppress inappropriate, non-rewarded actions. This supposition is supported by the discriminative Pavlovian approach experiment. Here, shell inactivation increased the tendency to approach the food receptacle during presentation of the CS−, which was never associated with reward. In contrast, these treatments reduced approaches during CS+ trials, consistent with previous findings (Blaiss and Janak, 2009). This latter effect, combined with the increase in locomotion observed in this experiment, suggests that shell inactivation may have reduced Pavlovian approach by displacing appropriate anticipatory behaviors directed toward the food receptacle with other, non-directed behaviors. Thus, even under these relatively rudimentary conditions, shell inactivation rendered animals less capable of behaving appropriately in response to the CS+ (approach) and CS− (ignore).
This notion is in keeping with a burgeoning literature showing that lesions or inactivation of this nucleus disinhibit inappropriate, non-rewarded, or punished behaviors under a wide variety of experimental conditions (Gal et al., 2005; Di Ciano et al., 2008; Floresco et al., 2008; Blaiss and Janak, 2009; Ambroggi et al., 2011; Feja et al., 2014; Piantadosi et al., 2017). Viewed collectively, these findings suggest that the NAc shell refines reward seeking by suppressing actions irrelevant to the task at hand and/or unlikely to achieve desired goals.
Conclusions
The collection of findings reported here provides novel insight into the complementary yet distinct roles that the NAc core and shell play in facilitating reward seeking guided by discriminative stimuli. The core plays a fundamental role in implementing conditional rules, in which arbitrary discriminative stimuli are associated with different actions, regardless of the relative reward probability or value linked to those actions. In comparison, the shell refines ongoing behavior to increase the likelihood that actions yield more profitable outcomes. In the context of risk/reward judgments, the shell mitigates the allure of larger rewards when external stimuli inform the decision maker that they are unlikely to be received. More broadly, the shell may focus behavior toward procurement of rewards by reducing the tendency to emit inappropriate, irrelevant, or non-rewarded actions. Additional exploration of how the NAc shell subserves these functions may provide insight into the pathophysiological mechanisms underlying obsessive-compulsive disorder, attention deficit/hyperactivity disorder, and pathological gambling, which are associated with both difficulties in restraining maladaptive behavior and abnormal activity within the NAc (Admon et al., 2012; van Holst et al., 2012; Hoogman et al., 2013; Rausch et al., 2014; Ma et al., 2016).
Footnotes
This work was supported by a Grant from the Canadian Institutes of Health Research (MOP 133579) to S.B.F. We thank Lawrence Bau for assistance with pilot experiments related to this study.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Stan B. Floresco, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada. Floresco{at}psych.ubc.ca