Abstract
Decision-making circuits are modulated across life stages (e.g., juvenile, adolescent, or adult), as well as on the shorter timescale of reproductive cycles in females, to meet changing environmental and physiological demands. Ovarian hormonal modulation of relevant neural circuits is a potential mechanism by which behavioral flexibility is regulated in females. Here we examined the influence of prepubertal ovariectomy (pOVX) versus sham surgery on performance in an odor-based multiple choice reversal task. We observed that pOVX females made different types of errors during reversal learning compared to sham surgery controls. Using reinforcement learning models fit to trial-by-trial behavior, we found that pOVX females exhibited a lower inverse temperature parameter (β) than sham females. These findings suggest that pOVX females solve the reversal task using a more exploratory choice policy, whereas sham females use a more exploitative policy that prioritizes options with high estimated value. To seek a neural correlate of this behavioral difference, we performed whole-cell patch clamp recordings within the dorsomedial striatum (DMS), a region implicated in regulating action selection and explore/exploit choice policy. We found that the intrinsic excitability of dopamine receptor type 2 (D2R)-expressing indirect pathway spiny projection neurons (iSPNs) was significantly higher in pOVX females compared to both unmanipulated and sham surgery females. Finally, to test whether mimicking this increase in iSPN excitability could recapitulate the pattern of reversal task behavior observed in pOVX females, we chemogenetically activated DMS D2R(+) neurons in intact female mice. Chemogenetic activation increased exploratory choice during reversal, similar to the pattern observed in pOVX females. Together, these data suggest that pubertal status may influence explore/exploit balance in females via modulation of iSPN intrinsic excitability within the DMS.
Introduction
As animals interact with their environment in pursuit of rewards such as food, water, and mates, they learn through trial and error to guide their future choices. This process involves learning from positive and negative feedback and, importantly, deciding how learned information should influence choice, a process referred to as choice policy. Reinforcement learning (RL) models (Sutton and Barto, 1998) provide a useful framework for understanding and quantifying aspects of trial-and-error learning, including choice policy. A classic RL problem that hinges on choice policy is the explore/exploit tradeoff. If an animal (or any agent, for that matter) adopts an exploit policy, it will consistently select the option with the highest estimated value but may miss out on better alternatives. On the other hand, if an animal favors a more exploratory choice policy, characterized by less value-dependent, more stochastic choice behavior, it may discover new and better options more readily (Sutton and Barto, 1998; Daw et al., 2006). Importantly, the optimal balance of exploration and exploitation may depend on the statistics of the environment and/or the needs of the animal as defined by its particular physiological or developmental state (Cohen et al., 2007; Frank et al., 2009; Humphreys et al., 2015; Addicott et al., 2017; Lenow et al., 2017; Gopnik, 2020). In humans, choice behavior generally becomes less exploratory and more exploitative during the transition from childhood to adulthood (Nussenbaum and Hartley, 2019; Gopnik, 2020; Xia et al., 2020; Eckstein et al., 2021). Natural fluctuations in ovarian hormones across the estrous cycle, as well as exogenous estradiol administration, have been shown to regulate aspects of value-based decision making in female rats (Uban et al., 2012; Orsini et al., 2021), including explore/exploit balance (Verharen et al., 2019b).
These data suggest that the rise in ovarian hormones at puberty could contribute to the developmental shift in choice policy during adolescence in females. In previous work, we observed that pOVX altered performance in a multiple choice reversal task in adult C57BL/6 mice (Delevich et al., 2020a). Compared to intact females, pOVX females showed lower ratios of perseverative to regressive errors during reversal learning, but the biological processes underlying this behavioral effect remained unclear.
The DMS is implicated in the regulation of goal-directed action selection (Tai et al., 2012; Nonomura et al., 2018; Matamales et al., 2020; Peak et al., 2020) and choice policy (Collins and Frank, 2014), and recent work suggests that enhancing the activity of D2R(+) SPNs in the dorsal striatum biases choice behavior to be more exploratory (Lee et al., 2015; Delevich et al., 2020b; but see Verharen et al., 2019a). While nuclear estrogen receptors are notably absent from the dorsal striatum in adulthood (Krentzel et al., 2021), extranuclear estrogen receptors (ERα, ERβ, and GPER1) localize to SPNs, glia, and the presynaptic terminals of striatal GABAergic and cholinergic interneurons in adult female rats (Almey et al., 2012). At the neuronal level, the estrous cycle has been shown to regulate the intrinsic excitability of SPNs within the rodent striatum (Proano et al., 2018; Alonso-Caraballo and Ferrario, 2019). Studies examining the influence of the estrous cycle on SPN physiology have been performed primarily in rats and have not distinguished between SPN cell types (but see Tansey et al., 1983). Taken together, these findings raise the question of whether pubertal status influences the choice strategies employed by females by modulating striatal SPN physiology. Here we focused on D2R(+) SPNs of the indirect pathway (iSPNs) within the DMS, whose activity we hypothesized regulates explore/exploit balance in decision making based on theoretical predictions (Collins and Frank, 2014; Dunovan and Verstynen, 2016) as well as genetic (Beeler et al., 2010; Kwak et al., 2014) and pharmacological (Lee et al., 2015; McCoy et al., 2019) evidence.
In the current study, after analyzing raw behavioral data, we applied reinforcement learning modeling to examine how pOVX influenced the learning and choice policy processes underlying performance in the odor-based reversal learning task. We next examined the influence of pOVX on the intrinsic excitability of genetically identified D2R(+) SPNs within the DMS of adult female mice. Finally, we chemogenetically activated D2R(+) neurons within the DMS of female mice during reversal learning and applied our RL model to determine whether this manipulation recapitulated the reversal learning strategy employed by pOVX females. We found that, compared to intact adult females, pOVX females exhibited a more exploratory choice strategy during reversal learning, as evidenced by a lower explore/exploit inverse temperature parameter, β. In addition, D2R(+) SPN intrinsic excitability was increased in pOVX females compared to sham and unmanipulated females. Finally, chemogenetic activation of D2R(+) SPNs within the DMS promoted a more exploratory choice strategy during reversal learning in intact female mice, resembling pOVX female behavior. Together, these data suggest that pubertal status influences the choice strategy female mice employ via the modulation of D2R(+) SPN activity.
Materials & Methods
Animals
Female C57BL/6NCR (Charles River), Drd2-eGFP BAC (GENSAT), and D2-Cre ER43 (MMRRC) mice were bred in-house. Drd2-eGFP BAC and D2-Cre ER43 mice were backcrossed onto the C57BL/6NCR background for at least 5 generations. All mice were weaned on postnatal day (P) 21 and housed in groups of 2–3 same-sex siblings on a 12:12 h reversed light:dark cycle (lights on at 2200 h). All behavioral tests were conducted during the dark phase. For all experiments, mice were randomly assigned to experimental groups, and sample sizes were based on previously conducted experiments (e.g., Delevich et al., 2020a,b). Each behavioral experiment was conducted once, and no animal was tested on multiple occasions. All procedures were approved by the Animal Care and Use Committee of the University of California, Berkeley and conformed to the principles outlined in the NIH Guide for the Care and Use of Laboratory Animals.
Prepubertal Ovariectomy
Prepubertal ovariectomy was performed as previously described (Delevich et al., 2020a). To eliminate ovarian hormone exposure during and after puberty, ovariectomies were performed at P25, before puberty onset. Before ovariectomy (OVX) surgery, all female mice were visually inspected to confirm that vaginal opening had not occurred. Mice were injected subcutaneously with 0.05 mg/kg buprenorphine and 10 mg/kg meloxicam before surgery and were anesthetized with 1–2% isoflurane during surgery. The incision area was shaved and scrubbed with ethanol and betadine, and ophthalmic ointment was placed over the eyes to prevent drying. A 1 cm incision was made with a scalpel across the midline of the lower abdomen to access the abdominal cavity. The ovaries were clamped off from the uterine horns with locking forceps, ligated with sterile sutures, and then excised with a scalpel. The muscle and skin layers were sutured, and wound clips were placed over the incision for 7–10 days to allow it to heal. Additional injections of 10 mg/kg meloxicam were given 24 and 48 h after surgery. In sham control surgeries, the fat pads were visualized but the ovaries were not clamped, ligated, or excised. Female littermates were randomly assigned to sham or pOVX groups. Mice were allowed to recover on a heating pad until ambulatory and were monitored for 7–10 days after surgery for normal weight gain and signs of discomfort or distress. Mice were co-housed with 1–2 siblings that received the same surgical treatment. To verify the success of prepubertal ovariectomies, necropsy was performed on a subset of adult sham and ovariectomized mice, which confirmed that the uteri of pOVX mice were underdeveloped compared to those of age-matched sham females (data not shown).
4-choice odor-based reversal task
As young adults (P60–P70), sham and pOVX mice were tested in an odor-based reversal task that has been described in detail previously (Johnson and Wilbrecht, 2011; Johnson et al., 2016). The task is designed such that only the odor cue predicts reward, while spatial and egocentric information is irrelevant. Briefly, mice were food restricted to ~85% of their body weight by the Discrimination phase. On day 1, mice were habituated to the testing arena; on day 2, they were taught to dig for a Honey Nut Cheerio reward in a pot filled with unscented wood shavings; on day 3, they underwent a 4-choice odor Discrimination; and on day 4, they were tested on Recall of the previously learned odor-reward association, which was immediately followed by a Reversal phase. During the Discrimination phase, mice learned to discriminate among four pots of differently scented wood shavings (anise, clove, litsea, and thyme). All four pots were sham-baited with a Cheerio (under wire mesh at the bottom), but only one pot was rewarded (anise). The pots of scented shavings were placed in the four corners of an acrylic arena (12 × 12 × 9 inches) divided into four quadrants. Mice were placed in a cylinder in the center of the arena, and a trial started when the cylinder was lifted. Mice were then free to explore the arena and indicated their choice by making a bimanual dig in one of the four pots of wood shavings. The cylinder was lowered as soon as a choice was made. If the choice was incorrect, the trial was terminated and the mouse was gently encouraged back into the start cylinder. Trials in which no choice was made within 3 minutes were considered omissions. If mice omitted on two consecutive trials, they received a reminder: a baited pot of unscented wood shavings was placed in the center cylinder and mice dug for the "free" reward. Mice were disqualified if they committed four pairs of omissions.
The locations of the four odor-scented pots were shuffled on each trial, and criterion was met when the mouse made correct choices on 8 of 10 consecutive trials. Twenty-four hours after completing Discrimination, mice were tested for Recall of the initial odor Discrimination to criterion, after which they immediately proceeded to the Reversal phase, in which the previously rewarded odor (anise) was no longer rewarded and a previously unrewarded odor (clove) was now rewarded. During the Reversal phase, Odor 4 (thyme) was replaced by a novel, unrewarded odor (eucalyptus). Again, mice were run until they reached a criterion of 8 correct choices in 10 consecutive trials.
4-choice odor-based reversal task analysis
To compare reversal task performance across groups, trials to criterion and errors (incorrect choices) were compared for each phase of the task (Discrimination, Recall, and Reversal). Omission trials did not count toward trials to criterion. In addition, for the Reversal phase we separated errors in which mice chose the odor that was rewarded during Discrimination (Odor 1) into two types: 1) perseverative errors, in which Odor 1 was chosen before the first correct trial, and 2) regressive errors, in which Odor 1 was chosen after the first correct trial. To compare the relative proportion of these error types within mice, we calculated Reversal error bias as (perseverative − regressive errors)/(perseverative + regressive errors). This index ranges from −1 to +1: a value > 0 indicates a bias toward perseverative errors, whereas a value < 0 indicates a bias toward regressive errors. Finally, we examined how quickly mice accumulated rewards after the first correct trial of the Reversal phase by aligning trial histories to the first correct trial and summing rewarded trials across the subsequent 8 trials. Data were fit by linear regression for each group, and the slopes of the fitted lines were compared to determine whether groups differed significantly in their rate of reward accumulation. Behavioral data from 14 of the 16 pOVX females and 10 of the 15 sham females presented here were included in a previously published study examining sex differences in the effects of prepubertal gonadectomy on approach-avoidance behaviors, but latent decision variables were not examined in that study (Delevich et al., 2020a).
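The error classification, Reversal error bias, and reward-accumulation measures described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code (which is available in the repository listed under Data availability). The function names are hypothetical, odors are coded 1–4 with Odor 1 and Odor 2 as the Discrimination and Reversal targets, omissions are assumed to have been removed beforehand, and the 8-trial window is assumed to begin on the trial after the first correct choice.

```python
from typing import List, Tuple


def classify_reversal_errors(choices: List[int], correct_odor: int = 2,
                             old_odor: int = 1) -> Tuple[int, int]:
    """Return (perseverative, regressive) counts of Odor 1 errors in a Reversal history.

    choices: odor chosen on each non-omission trial, in order (odors coded 1-4).
    """
    perseverative = regressive = 0
    seen_correct = False
    for odor in choices:
        if odor == correct_odor:
            seen_correct = True  # every later Odor 1 choice counts as regressive
        elif odor == old_odor:
            if seen_correct:
                regressive += 1
            else:
                perseverative += 1
    return perseverative, regressive


def reversal_error_bias(perseverative: int, regressive: int) -> float:
    """(P - R) / (P + R): +1 = only perseverative errors, -1 = only regressive errors."""
    total = perseverative + regressive
    if total == 0:
        raise ValueError("error bias is undefined when no Odor 1 errors were made")
    return (perseverative - regressive) / total


def cumulative_rewards(choices: List[int], correct_odor: int = 2,
                       window: int = 8) -> List[int]:
    """Running reward count over the `window` trials after the first correct trial."""
    first_correct = choices.index(correct_odor)  # raises ValueError if never correct
    counts, total = [], 0
    for odor in choices[first_correct + 1:first_correct + 1 + window]:
        total += int(odor == correct_odor)
        counts.append(total)
    return counts


def ols_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y on x (rate of reward accumulation)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sum((a - mean_x) ** 2 for a in x)
    return num / den


if __name__ == "__main__":
    # Hypothetical Reversal history: two Odor 1 choices before the first correct
    # trial and one Odor 1 choice after it.
    history = [1, 1, 3, 2, 1, 2, 2, 4, 2, 2, 2, 2, 2, 2]
    p, r = classify_reversal_errors(history)
    print(p, r, round(reversal_error_bias(p, r), 3))  # 2 1 0.333
    acc = cumulative_rewards(history)
    print(acc)  # [0, 1, 2, 2, 3, 4, 5, 6]
    print(round(ols_slope(list(range(1, 9)), acc), 3))  # 0.821
```

For a per-group fit, the (trial, cumulative reward) points from all animals in a group would be pooled before computing the slope; formally testing whether two slopes differ requires an additional comparison that is not shown here.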
Reinforcement learning modeling of the 4-choice odor-based reversal task
We modeled Discrimination and Reversal phase behavior using a reinforcement learning model driven by an iterative error-based rule (Rescorla and Wagner, 1972; Sutton and Barto, 1998). The model uses a prediction error (δ) to update the value (V) of each odor stimulus, where δ is the difference between the experienced feedback (λ = 100 for rewarded trials, λ = 0 for unrewarded trials) and the current expected value of the chosen odor i, and the update is scaled by a learning rate parameter (α), with 0 < α < 1:

$$\delta(t) = \lambda(t) - V_i(t), \qquad V_i(t+1) = V_i(t) + \alpha\,\delta(t)$$
Because mice exhibit innate preferences for odors, we set initial odor values to fixed parameters [v1, v2, v3, v4] for all mice tested, calculated as the probability of choosing each odor during the first 4 trials of Discrimination × 100 (see Johnson et al., 2016). These initial odor values were calculated separately for mice included in Figure 1 and Figure 4 (see source data files or analysis code for details). To model trial-by-trial choice probabilities, the estimated odor values V_i(t) were transformed into choice probabilities using a softmax function. The inverse temperature parameter (β), which we refer to in the text as the explore/exploit parameter, determined the stochasticity of the choices:

$$P_i(t) = \frac{\exp\big(\beta\,V_i(t)\big)}{\sum_{j=1}^{4} \exp\big(\beta\,V_j(t)\big)}$$

Higher β values produce more deterministic (exploitative) selection of the highest-valued odor, whereas lower β values produce more stochastic (exploratory) choice.
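The value update and softmax choice rule described above combine into a per-animal negative log-likelihood, which is the quantity minimized when fitting α and β by maximum likelihood. The following pure-Python sketch is an illustration only, not the published analysis code. It assumes that odors are indexed 0–3, that feedback is coded as λ = 100 or 0, that only the chosen odor's value is updated, and that initial values are computed from pooled first-four-trial choices. All function names are hypothetical, and the grid search in `fit_grid` is a stand-in for whatever optimizer the original analysis used.

```python
import math
from typing import List, Sequence, Tuple


def initial_values(early_choices: Sequence[int], n_odors: int = 4) -> List[float]:
    """100 x the proportion of early (first-four-trial) choices made to each odor."""
    n = len(early_choices)
    return [100.0 * sum(1 for c in early_choices if c == i) / n for i in range(n_odors)]


def softmax(values: Sequence[float], beta: float) -> List[float]:
    """P_i = exp(beta * V_i) / sum_j exp(beta * V_j)."""
    top = max(values)  # subtracting the max avoids overflow and leaves P unchanged
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def update_value(value: float, rewarded: bool, alpha: float) -> float:
    """Rescorla-Wagner update of the chosen odor: V <- V + alpha * (lambda - V)."""
    feedback = 100.0 if rewarded else 0.0  # lambda
    delta = feedback - value  # prediction error
    return value + alpha * delta


def negative_log_likelihood(alpha: float, beta: float, choices: Sequence[int],
                            rewards: Sequence[bool], init: Sequence[float]) -> float:
    """-sum over trials of log P(observed choice) for one animal and one phase."""
    values = list(init)
    nll = 0.0
    for choice, rewarded in zip(choices, rewards):
        nll -= math.log(softmax(values, beta)[choice])
        values[choice] = update_value(values[choice], rewarded, alpha)  # chosen odor only
    return nll


def fit_grid(choices, rewards, init, alphas, betas) -> Tuple[float, float, float]:
    """Crude maximum-likelihood fit by grid search; returns (nll, alpha, beta)."""
    return min((negative_log_likelihood(a, b, choices, rewards, init), a, b)
               for a in alphas for b in betas)


if __name__ == "__main__":
    # First four Discrimination choices of two hypothetical mice (odors indexed 0-3).
    init = initial_values([0, 1, 2, 3, 0, 0, 3, 2])
    print(init)  # [37.5, 12.5, 25.0, 25.0]
    choices = [2, 0, 1, 0, 0, 3, 0, 0, 0, 0]  # hypothetical trial history, odor 0 rewarded
    rewards = [c == 0 for c in choices]
    print(fit_grid(choices, rewards, init, [0.1, 0.3, 0.5, 0.7, 0.9], [0.01, 0.05, 0.1]))
```

Because values live on a 0–100 scale, useful β values are small (on the order of 0.01–0.1). A continuous bounded optimizer would normally replace the grid search, which is used here only to keep the sketch dependency-free.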
For RL modeling, trial histories from the Discrimination and Recall phases were concatenated to create a single Discrimination phase trial history. We compared alternative models using AIC (Watanabe, 2010) and found that the best-fit model included phase-specific (non-zero) α and β parameters; all RL model comparisons for pOVX and sham females are presented in Supplementary Table 1, as well as in the source data files and analysis code. To assess model performance, trial-by-trial behavior was simulated using the best-fit parameters for each animal, and the average simulated trials to criterion for the Discrimination and Reversal phases (100 simulations/animal) were plotted against each animal's actual trials to criterion.
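The model comparison and simulation steps described above can be sketched as follows. This is a stdlib-only illustration, not the published code. It assumes that AIC is computed with the standard Akaike formula (2k + 2·NLL) from each model's summed negative log-likelihood, and that criterion is implemented as at least 8 correct choices among the most recent 10 trials. The NLL values in the usage lines are invented for illustration and are not results from this study, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def aic(neg_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, 2k + 2*NLL; lower values indicate a better model."""
    return 2 * n_params + 2 * neg_log_likelihood


def softmax(values: Sequence[float], beta: float) -> List[float]:
    top = max(values)  # shift by the max for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_choice(probs: Sequence[float], rng: random.Random) -> int:
    """Draw an odor index with the given probabilities."""
    r, cumulative = rng.random(), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def simulate_trials_to_criterion(alpha: float, beta: float, init: Sequence[float],
                                 correct: int, rng: random.Random,
                                 max_trials: int = 200) -> int:
    """Simulate one phase until 8 of the last 10 choices are correct (or max_trials)."""
    values = list(init)
    outcomes: List[bool] = []
    while len(outcomes) < max_trials:
        choice = sample_choice(softmax(values, beta), rng)
        rewarded = choice == correct
        target = 100.0 if rewarded else 0.0
        values[choice] += alpha * (target - values[choice])  # Rescorla-Wagner update
        outcomes.append(rewarded)
        if sum(outcomes[-10:]) >= 8:
            return len(outcomes)
    return max_trials


def mean_trials_to_criterion(alpha: float, beta: float, init: Sequence[float],
                             correct: int, n_sims: int = 100, seed: int = 0) -> float:
    """Average simulated trials to criterion (100 simulations/animal by default)."""
    rng = random.Random(seed)
    runs = [simulate_trials_to_criterion(alpha, beta, init, correct, rng)
            for _ in range(n_sims)]
    return sum(runs) / n_sims


if __name__ == "__main__":
    # Hypothetical NLLs: phase-specific alpha and beta (4 params) vs. shared (2 params).
    print(aic(50.0, 4), aic(55.0, 2))  # 108.0 114.0
    print(mean_trials_to_criterion(0.3, 0.05, [37.5, 12.5, 25.0, 25.0], correct=0))
```

Plotting each animal's mean simulated trials to criterion against its observed value gives the model check described above. A model that captures behavior well should place points near the identity line. For the Reversal phase, the caller would supply starting values appropriate to that phase; how those values were set in the original analysis is not specified here.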
(A) Female C57BL/6 mice underwent pOVX or sham surgery at P25 and were trained in the multiple choice reversal task in adulthood (P60-70). (B) Mice were trained to a criterion of 8/10 consecutive correct choices to Odor 1 during Discrimination. Twenty-four hours later, they were tested for Recall of the previous day's rule before immediately advancing to a Reversal phase during which Odor 2, rather than Odor 1, was rewarded. Reversal criterion was reached when mice made 8/10 consecutive correct choices to Odor 2. (C) There was a main effect of task phase on trials to criterion but no effect of treatment (Two-way RM ANOVA main effect of task phase: F(1, 29) = 6.30, p<0.05). (D) There was a significant interaction between treatment and error type on the number of Reversal errors (Two-way RM ANOVA treatment × error type interaction: F(5, 145) = 2.79, p<0.05). pOVX females made significantly more regressive errors than sham females (11.25 ± 1.8 vs. 6.13 ± 1.2, p<0.05, uncorrected Fisher's LSD). (E) pOVX females had a significantly lower Reversal error bias [(perseverative − regressive errors)/(perseverative + regressive errors)] than sham females (0.11 ± 0.13 vs. 0.49 ± 0.12, p<0.05, unpaired t-test). (F) Sham females accumulated rewards after the first correct Reversal trial faster than pOVX females (best-fit line with 95% C.I. plotted). (G) RL model applied to the odor-based multiple choice reversal task. Schematic based on Verharen et al. (2019b). (H) Best-fit α learning rate estimates did not differ significantly by task phase or treatment. (I) There was a significant interaction between task phase and treatment group on the best-fit explore/exploit β parameter (Two-way ANOVA task phase × treatment interaction: F(1,29) = 7.101, p<0.05). Post-hoc comparisons revealed that β estimates were significantly higher during Reversal than during Discrimination for both sham (p<0.0001, uncorrected Fisher's LSD) and pOVX females (p<0.05, uncorrected Fisher's LSD). In addition, Reversal phase β estimates were significantly lower in pOVX females than in sham females (p<0.05, uncorrected Fisher's LSD). Data in (H) plotted as median ± IQR.
Stereotaxic Virus Injection
Female D2-Cre mice (6–8 weeks old) were deeply anesthetized with 5% isoflurane (vol/vol) in oxygen and placed into a stereotaxic frame (Kopf Instruments; Tujunga, CA) on a heating pad. Anesthesia was maintained at 1–2% isoflurane during surgery. An incision was made along the midline of the scalp, and small burr holes were drilled over each injection site. Virus was delivered via microinjection using a Nanoject II injector (Drummond Scientific Company; Broomall, PA). Injection coordinates for the DMS were (in mm from bregma): 0.90 anterior, ±1.4 lateral, and −3.0 ventral from the surface of the brain. Adeno-associated viruses (AAVs) were produced by the Addgene viral service and had titers of >10¹² genome copies per mL. For chemogenetic manipulations, mice were bilaterally injected with 0.5 μL of rAAV8-hSyn-DIO-mCherry (N=9), rAAV8-hSyn-DIO-hM3Dq-mCherry (N=6), or rAAV8-hSyn-DIO-hM4Di-mCherry (N=5). Mice were given subcutaneous injections of meloxicam (10 mg/kg) during surgery and 24 and 48 hours after surgery. Mice were group-housed before and after surgery, and 4–6 weeks were allowed for viral expression before behavioral training or electrophysiology experiments.
Drugs
Clozapine-N-oxide (CNO) was generously provided by the NIMH Chemical Synthesis and Drug Supply Program (NIMH C-929). CNO was prepared fresh each day by dissolving it in DMSO (0.5% final concentration) and diluting it to 0.1 mg/mL in 0.9% saline USP.
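As a worked example of the dosing arithmetic: at a stock concentration of 0.1 mg/mL, the 1.0 mg/kg i.p. CNO dose used for behavioral testing corresponds to an injection volume of 10 mL/kg. The helper below is a hypothetical illustration, and the 25 g body weight in the usage line is an assumed value, not one reported in this study.

```python
def injection_volume_ml(dose_mg_per_kg: float, body_weight_g: float,
                        stock_mg_per_ml: float = 0.1) -> float:
    """Volume of CNO solution (mL) needed for a given dose and body weight."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # total drug needed (mg)
    return dose_mg / stock_mg_per_ml  # volume of stock solution (mL)


if __name__ == "__main__":
    # Hypothetical 25 g mouse at the 1.0 mg/kg behavioral dose.
    print(round(injection_volume_ml(1.0, 25.0), 3))  # 0.25 (i.e., 10 mL/kg)
```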
Electrophysiology
Mice were deeply anesthetized with an overdose of ketamine/xylazine solution and perfused transcardially with ice-cold cutting solution containing (in mM): 110 choline-Cl, 2.5 KCl, 7 MgCl2, 0.5 CaCl2, 25 NaHCO3, 11.6 Na-ascorbate, 3 Na-pyruvate, 1.25 NaH2PO4, and 25 D-glucose, bubbled with 95% O2/5% CO2. Coronal sections (300 μm thick) were cut in ice-cold cutting solution before being transferred to ACSF containing (in mM): 120 NaCl, 2.5 KCl, 1.3 MgCl2, 2.5 CaCl2, 26.2 NaHCO3, 1 NaH2PO4, and 11 glucose. Slices were bubbled with 95% O2/5% CO2 in a 37°C bath for 30 min and allowed to recover for 30 min at room temperature before recording. All recordings were made using a Multiclamp 700B amplifier and were not corrected for liquid junction potential. The bath was heated to 32°C for all recordings. Data were digitized at 20 kHz and filtered at 1 or 3 kHz using a Digidata 1440A system with pClamp 10.2 software (Molecular Devices, Sunnyvale, CA, USA). Only cells with an access resistance of <25 MΩ were retained for analysis, and cells were discarded if recording parameters changed by more than 20%. Data were analyzed using pClamp or R (RStudio 0.99.879; R Foundation for Statistical Computing, Vienna, Austria).
Whole-cell current clamp recordings were performed using a potassium gluconate-based intracellular solution containing (in mM): 140 K-gluconate, 5 KCl, 10 HEPES, 0.2 EGTA, 2 MgCl2, 4 MgATP, 0.3 Na2GTP, and 10 Na2-phosphocreatine. Alexa Fluor 594 (40 μM) was added to the internal solution to enable morphological confirmation of SPN identity after recording. For the intrinsic excitability data in Figure 2, 5 μM AP5 and 25 μM NBQX were added to the ACSF to block NMDA and AMPA receptor-mediated currents, respectively. For all recordings, cells were allowed to stabilize for 2 min after break-in and before any current injection. For current clamp recordings testing the effect of CNO in Gq-DREADD-expressing vs. mCherry-expressing D2R(+) neurons, baseline input-output curves were collected before a 5 minute wash-on of 10 μM CNO.
(A) At P25, D2-eGFP(+) female mice underwent sham or pOVX surgery, while a third group of D2-eGFP(+) female mice received no surgery. (B) Whole-cell current clamp recordings were made from visually identified D2-eGFP(+) SPNs within the DMS of all groups in adulthood (P65-90). (C) Representative responses to negative current steps (−150, −100, −50, 0 pA) in D2R(+) SPNs from sham, pOVX, and unmanipulated females. Scale bar: 100 ms, 5 mV. (D) D2-eGFP(+) SPNs in pOVX female mice had higher input resistance than those in sham and unmanipulated females. (E) Representative responses to positive current steps (120, 180 pA) in D2R(+) SPNs from sham, pOVX, and unmanipulated females. Scale bar: 100 ms, 50 mV. (F) Rheobase of D2-eGFP(+) SPNs was decreased in pOVX compared to sham and unmanipulated females. (G) Spike number across sequential depolarizing current steps (10-500 pA) for D2-eGFP(+) SPNs. Increased spiking was observed in pOVX compared to sham and unmanipulated females (Two-way RM ANOVA current × treatment interaction: F(98, 1715) = 2.52, p<0.0001). (H) No difference in maximum firing rate was observed across treatment groups. (I) No difference in resting membrane potential (RMP) was observed across treatment groups. *p<0.05, **p<0.01; n/N (cells/mice) = 15/5, 13/5, and 10/3 for sham, pOVX, and unmanipulated mice, respectively.
Histology
Mice were transcardially perfused with PBS followed by 4% PFA in PBS. After 24 h of postfixation, coronal brain sections (75 μm) were cut using a vibratome (VT1000S, Leica Biosystems; Buffalo Grove, IL). To confirm viral targeting, we performed a standard immunohistochemical procedure using a primary antibody against red fluorescent protein (RFP) (rabbit, Rockland 600-401-379; 1:1000) to enhance the mCherry signal in mice transduced with rAAV8-hSyn-DIO-DREADD-mCherry or rAAV8-hSyn-DIO-mCherry. Sections were counterstained with DAPI (Life Technologies; Carlsbad, CA). Images were acquired with a Zeiss Axio Scan.Z1 epifluorescence microscope (Molecular Imaging Center, UC Berkeley) at 10x magnification and viewed using FIJI (ImageJ). Anatomical regions were identified according to The Mouse Brain in Stereotaxic Coordinates (Franklin and Paxinos) and the Allen Institute Mouse Brain Atlas.
Statistics and Data Analysis
For comparisons between 2 groups, a t-test was used when data were normally distributed, with Welch's correction applied when variances were unequal. The D'Agostino & Pearson test was used to test for normality. For experiments in which 3 groups were compared, a one-way ANOVA (or a Kruskal-Wallis test when data were not normally distributed) was performed, followed by two-tailed uncorrected Fisher's LSD (or Dunn's test, respectively) for pairwise comparisons. Two-way ANOVA was performed when two independent variables were examined (e.g., treatment and error type), followed by two-tailed uncorrected Fisher's LSD for pairwise comparisons. Post-hoc comparisons were not corrected for multiple comparisons because of the limited number of planned comparisons. Throughout the paper, p < 0.05 was used as the criterion for statistical significance unless noted otherwise. Data are expressed as mean ± SEM unless noted otherwise.
Data availability
All data generated or analyzed during this study are included in the manuscript and supporting files. Source data files have been provided for all experiments reported in this manuscript in an online repository at https://doi.org/10.6084/m9.figshare.14783628.v1. Analysis code is available at https://github.com/kdelevich/4choiceRLmodeling.
Results
Prepubertal ovariectomy affects reversal learning by promoting exploratory choice policy
We performed sham surgery or pOVX on female C57BL/6 mice at postnatal day 25 (P25), prior to puberty onset, and trained them in an odor-based reversal task between P60 and P70 (Fig. 1A). The task consisted of two main phases: 1) a Discrimination phase, during which mice learned through trial and error that one of four scented pots of wood shavings contained a buried food reward, and 2) a Reversal phase, in which the odor-reward contingency was reversed (Fig. 1B). Sham females were not staged for estrous cycle. Both groups performed similarly in the Recall phase (Supplementary Fig. 1). When comparing Discrimination and Reversal, there was a significant effect of task phase but not treatment on trials to criterion [task phase: F(1,29) = 6.31, p = 0.018; treatment: F(1,29) = 0.11, p = 0.74; task phase × treatment: F(1,29) = 0.05, p = 0.83] (Fig. 1C).
Next, we examined more closely the types of errors that mice made during Reversal. Error types included errors made to the previously rewarded odor, which we divided into 2 subtypes: perseverative (errors made before the first correct trial) and regressive (errors made after the first correct trial). Perseverative errors reflect a tendency to persist with a previously learned rule, whereas regressive errors reflect a failure to acquire or maintain the new rule. There was a significant interaction between error type and treatment group [F(5,145) = 2.79, p = 0.02] (Fig. 1D). Post hoc analyses revealed that pOVX females made significantly more regressive errors than sham females (p = 0.03, uncorrected Fisher's LSD). We next examined the pattern of perseverative and regressive errors made by individual mice. Sham females exhibited a significantly higher Reversal error bias (a greater proportion of perseverative relative to regressive errors) than pOVX females (sham vs. pOVX females: t(29) = 2.12, p = 0.04) (Fig. 1E). Finally, sham females accumulated rewards at a significantly higher rate than pOVX females after the first rewarded trial during Reversal, but not during Discrimination (Fig. 1F). These data suggest that sham and pOVX females reach criterion in the reversal task using different trial-by-trial strategies.
We next turned to computational modeling to assess whether the differences between sham and pOVX females in the Reversal phase arise from a difference in odor value updating, a difference in choice policy, or a combination of both. To do so, we fit trial-by-trial behavioral data with RL models and used maximum likelihood estimation to determine the parameters that best fit each animal's behavior. The best-fit model included phase-specific parameters for the learning rate α and the explore/exploit inverse temperature parameter β (Fig. 1G; see Supplementary Table 1 for alternative model comparisons). Best-fit α estimates did not differ significantly by task phase or treatment (Fig. 1H). In contrast, there was a significant interaction between task phase and treatment for the explore/exploit parameter β [task phase × treatment: F(1,29) = 7.101, p = 0.013]. In both sham and pOVX females, β was significantly higher during the Reversal phase than during the Discrimination phase (sham Reversal vs. Discrimination: p < 0.0001; pOVX Reversal vs. Discrimination: p = 0.011, uncorrected Fisher's LSD), and Reversal phase β was significantly lower in pOVX than in sham females (pOVX vs. sham: p = 0.014, uncorrected Fisher's LSD), consistent with pOVX females employing a more exploratory choice policy than sham females (Fig. 1I).
Prepubertal OVX is associated with increased intrinsic excitability of D2R(+) SPNs
The DMS has been implicated in action selection and in determining choice policy, and previous studies have found evidence that the estrous cycle modulates the intrinsic excitability of striatal SPNs. Furthermore, several lines of evidence suggest that D2R(+) iSPNs 1) are modulated by ovarian hormones (Le Saux and Di Paolo, 2005; Le Saux et al., 2006; Krentzel et al., 2019) and 2) can influence explore/exploit balance during decision making (Kwak et al., 2014; Lee et al., 2015; Delevich et al., 2020b). We therefore investigated whether changes in the intrinsic excitability of D2R(+) SPNs within the DMS may contribute to sham vs. pOVX differences in choice policy during reversal learning. We performed whole-cell current clamp recordings from visually identified eGFP+ neurons within the DMS of adult D2-eGFP transgenic female mice that had undergone pOVX or sham surgery, as well as unmanipulated females, in the presence of the excitatory synaptic blockers NBQX and AP5 (Fig. 2A-C). Alexa Fluor 594 was included in the internal solution, and all cells included in the analysis were confirmed to have spiny morphology. We found a main effect of treatment on D2-eGFP(+) SPN input resistance [Kruskal-Wallis test, main effect of treatment: H = 8.76, p = 0.0125], with pOVX females exhibiting higher input resistance than sham and unmanipulated females (pOVX vs. sham: p = 0.013; pOVX vs. unmanipulated: p = 0.009, uncorrected Dunn's test) (Fig. 2D). When we injected a series of positive current steps (Fig. 2E), the minimum current necessary to trigger an action potential (rheobase) was significantly lower in pOVX females than in sham and unmanipulated females (pOVX vs. sham: p = 0.001; pOVX vs. unmanipulated: p = 0.003, uncorrected Fisher's LSD) (Fig. 2F). In addition, there was a significant interaction between treatment and current on spike output [F(98, 1715) = 2.517, p < 0.0001] (Fig. 2G).
While input-output curves were shifted leftward in pOVX compared to sham and unmanipulated females, there was no significant effect of treatment on maximum firing rate [F(2, 18.68)= 0.10, p= 0.90] (Fig. 2H). Finally, resting membrane potential (RMP) did not differ across treatments [F(2, 35)= 2.172, p= 0.129] (Fig. 2I). These data indicate that D2R(+) SPNs within the DMS are more intrinsically excitable in pOVX females compared to unstaged sham and unmanipulated female mice.
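The three excitability measures reported above (input resistance, rheobase, and spike output) reduce to simple computations on current clamp data. The sketch below is illustrative only and rests on assumptions that are not stated in the Methods: input resistance is taken as the least-squares slope of steady-state voltage versus injected current across the hyperpolarizing steps, spikes are counted as upward crossings of 0 mV, and rheobase is the smallest tested step that evokes at least one spike. All function names and example values are hypothetical.

```python
from typing import Optional, Sequence


def input_resistance_mohm(currents_pa: Sequence[float],
                          steady_state_mv: Sequence[float]) -> float:
    """Least-squares slope of V vs. I; mV/pA equals GOhm, so scale by 1000 for MOhm."""
    n = len(currents_pa)
    mean_i = sum(currents_pa) / n
    mean_v = sum(steady_state_mv) / n
    num = sum((i - mean_i) * (v - mean_v) for i, v in zip(currents_pa, steady_state_mv))
    den = sum((i - mean_i) ** 2 for i in currents_pa)
    return 1000.0 * num / den


def count_spikes(trace_mv: Sequence[float], threshold_mv: float = 0.0) -> int:
    """Count upward crossings of threshold_mv in a sampled voltage trace."""
    return sum(1 for prev, cur in zip(trace_mv, trace_mv[1:])
               if prev < threshold_mv <= cur)


def rheobase_pa(steps_pa: Sequence[float],
                spike_counts: Sequence[int]) -> Optional[float]:
    """Smallest current step evoking at least one spike, or None if no step did."""
    for step, spikes in sorted(zip(steps_pa, spike_counts)):
        if spikes > 0:
            return step
    return None


if __name__ == "__main__":
    # Hypothetical steady-state responses to the -150 to 0 pA steps.
    print(input_resistance_mohm([-150, -100, -50, 0], [-110, -100, -90, -80]))  # 200.0
    print(count_spikes([-80, -40, 20, -60, -70, 10, 30, -50]))  # 2
    print(rheobase_pa([0, 20, 40, 60], [0, 0, 1, 3]))  # 40
```

In these terms, the leftward shift of the input-output curve in pOVX females corresponds to a smaller rheobase and larger spike counts at a given step. Because SPNs show inward rectification at hyperpolarized potentials, an input resistance estimate of this kind depends on which current steps are included in the fit.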
Chemogenetic activation of D2R(+) SPNs in DMS reduces perseverative bias and promotes a more exploratory reversal strategy in female mice
Given that pOVX females exhibit a more exploratory reversal strategy and greater intrinsic excitability of D2R(+) SPNs in the DMS, we next asked whether experimentally increasing D2R(+) SPN intrinsic excitability would similarly bias intact female mice toward increased exploration during the Reversal phase. Female D2-Cre mice were bilaterally infused with 0.5 μL of Cre-dependent DREADD virus (hM4Di-mCherry or hM3Dq-mCherry) and trained 4-6 weeks later in the 4-choice odor-based reversal learning task (Fig. 3A). Female mice expressing Cre-dependent mCherry were used to control for any effects of surgery, AAV infection, and clozapine-N-oxide (CNO) administration on behavior. To examine how CNO activation of hM3Dq expressed by D2R(+) SPNs in the DMS alters their activity, we performed whole-cell current clamp recordings from identified mCherry+ neurons in mice that expressed the excitatory DREADD hM3Dq or mCherry alone (Fig. 3B). Briefly, spike output in response to depolarizing steps (0–360 pA, 20 pA steps) was recorded from visually identified mCherry+ SPNs in the DMS of D2-mCherry or D2-hM3Dq-mCherry mice (Fig. 3C, D). Next, 10 μM CNO was bath-applied for 5 minutes, and spike output in response to the same series of depolarizing current steps was recorded again (Fig. 3C, D). There was no significant interaction between current step and drug on spike output in D2-mCherry-expressing SPNs [Two-way RM ANOVA, current × drug: F(18,36) = 0.89, p = 0.59] (Fig. 3C), but there was a significant interaction between current and drug on spike output in D2-hM3Dq-mCherry SPNs [Two-way RM ANOVA, current × drug: F(18,36) = 3.93, p = 0.0002] (Fig. 3D). Finally, there was a significant interaction between virus and drug on rheobase [Two-way RM ANOVA, virus × drug: F(1,4) = 16.0, p = 0.016] (Fig. 3E).
(A) Schematic of injection and representative brain section showing hM3Dq-mCherry expression in the DMS of a D2-Cre mouse. (B) Schematic of indirect pathway expression (sagittal view) and whole-cell patch-clamp configuration of mCherry+ or hM3Dq-mCherry+ iSPNs in female D2-Cre mice. (C) Top panel: representative responses to positive current steps (100, 120, 140, 160 pA) in mCherry+ iSPNs before and after CNO wash-on. Scale bar: 100 ms, 50 mV. Bottom panel: no significant interaction between current step and CNO treatment on spike output in D2R(+) mCherry-expressing iSPNs. (D) Top panel: representative responses to positive current steps (100, 120, 140, 160 pA) in hM3Dq-mCherry+ iSPNs before and after CNO wash-on. Scale bar: 100 ms, 50 mV. Bottom panel: significant interaction between current step and CNO treatment on spike output in D2R(+) hM3Dq-mCherry-expressing iSPNs (Two-way ANOVA current step × drug, p<0.0001). (E) Summary of the effect of CNO wash-on on rheobase (Two-way ANOVA virus × drug, p<0.05).
Prior to Discrimination training, all mice received i.p. injections of saline (Fig. 4A) and learned through trial and error that one of four presented odors indicated the location of a buried food reward. Mice completed the Discrimination task phase when they selected the rewarded odor (Odor 1) on 8 of 10 consecutive trials. Twenty-four hours later, all groups were administered CNO (1.0 mg/kg, i.p.) and tested for their recall of discrimination learning, followed immediately by a Reversal phase in which Odor 1 was no longer rewarded and Odor 2 became rewarded. There was a significant effect of task phase on trials to criterion [Reversal vs. Discrimination; F(1,17)= 16.58, p= 0.0008] but no significant effect of virus [F(2,17) = 0.37, p= 0.69] or interaction between virus and task phase [F(2,17)= 0.09, p=0.92] (Fig. 4B). While there was no significant effect of chemogenetic manipulation on Reversal phase trials to criterion, we found a significant interaction between virus and error type during Reversal [F(10,85)= 2.721, p= 0.006] (Fig. 4C) that was absent during Discrimination when mice were on saline (Supplementary Figure 2). D2-hM3Dq mice made significantly fewer perseverative errors than D2-mCherry (p= 0.015, uncorrected Fisher's LSD) and D2-hM4Di groups (p= 0.028, uncorrected Fisher's LSD) and made significantly more regressive errors than D2-hM4Di mice (p= 0.03, uncorrected Fisher's LSD) (Fig. 4C). We next examined whether chemogenetic manipulation of D2R(+) neurons in the DMS altered Reversal error bias within mice. There was a significant effect of virus on Reversal error bias (H= 9.06, p= 0.005, Kruskal-Wallis test), with D2-hM3Dq mice having a significantly lower Reversal error bias than D2-mCherry (p=0.019, uncorrected Dunn's test) and D2-hM4Di groups (p= 0.005, uncorrected Dunn's test) (Fig. 4D), consistent with a greater tendency to make regressive errors relative to perseverative errors.
These data suggest that chemogenetic activation of D2R(+) neurons in the DMS produced a pattern of reversal phase choice behavior similar to the effect seen in pOVX mice.
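The behavioral measures above depend on how criterion and error types are scored. The sketch below assumes that criterion is reached on the first trial at which 8 of the preceding 10 trials were correct, that a choice of the previously rewarded odor before the first choice of the newly rewarded odor is perseverative and one after it is regressive, and it uses the share of perseverative errors among perseverative plus regressive errors as a stand-in for Reversal error bias. These definitions, the odor labels, and all function names are assumptions for illustration; the study's own scoring rules may differ.

```python
# Hypothetical sketch of scoring the odor-based reversal task; all definitions are assumed.


def trials_to_criterion(correct, window=10, needed=8):
    """First trial (1-based) at which >= `needed` of the last `window` trials were correct."""
    for t in range(window, len(correct) + 1):
        if sum(correct[t - window:t]) >= needed:
            return t
    return None


def classify_reversal_errors(choices, old_odor, new_odor):
    """Return (perseverative, regressive) counts of choices of the previously rewarded odor.

    Assumed rule: old-odor choices before the first new-odor choice are perseverative;
    old-odor choices after it are regressive. Other error types are ignored here.
    """
    perseverative = regressive = 0
    seen_new = False
    for choice in choices:
        if choice == new_odor:
            seen_new = True
        elif choice == old_odor:
            if seen_new:
                regressive += 1
            else:
                perseverative += 1
    return perseverative, regressive


def error_bias(perseverative, regressive):
    """Hypothetical stand-in for Reversal error bias: share of perseverative errors."""
    total = perseverative + regressive
    return perseverative / total if total else float("nan")


reversal_choices = ["odor1", "odor1", "odor3", "odor2", "odor1", "odor2", "odor2"]
p, r = classify_reversal_errors(reversal_choices, old_odor="odor1", new_odor="odor2")
print(p, r, error_bias(p, r))
```

Under this stand-in, a lower value means a relatively greater share of regressive errors, which is the direction reported for D2-hM3Dq mice.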
(A) Top panel: schematic illustrating injection site and viral spread in female D2-Cre DIO-mCherry (N=9), DIO-hM3Dq (N=6), and DIO-hM4Di (N=5) mice. Bottom panel: summary of behavioral training. (B) There was a main effect of task phase but no effect of virus on trials to criterion (Two-way RM ANOVA main effect of task phase: F(1,17)= 16.58, p<0.001). (C) There was a significant interaction between error type and virus on reversal errors (Two-way RM ANOVA error type × manipulation interaction: F(10,85)= 2.72, p<0.01), with D2-hM3Dq mice making fewer perseverative errors than D2-mCherry and D2-hM4Di mice (p<0.05, uncorrected Fisher's LSD) and more regressive errors than D2-hM4Di mice (p<0.05, uncorrected Fisher's LSD). (D) There was a main effect of virus on Reversal error bias (H=9.06, p<0.01, Kruskal-Wallis test), with D2-hM3Dq mice showing reduced bias for perseverative errors compared to D2-mCherry (p<0.05, uncorrected Dunn's test) and D2-hM4Di mice (p<0.01, uncorrected Dunn's test). (E) Best-fit α learning rate did not significantly differ by task phase or virus. (F) There was a significant interaction between task phase and treatment group on the best-fit explore/exploit parameter β (Two-way ANOVA task phase × treatment interaction: F(2,17)= 5.18, p<0.05). Post-hoc comparisons revealed that β parameter estimates were significantly higher during Reversal than during Discrimination for D2-mCherry mice (p<0.01, uncorrected Fisher's LSD) and D2-hM4Di mice (p<0.05, uncorrected Fisher's LSD) but not D2-hM3Dq mice (p=0.41, uncorrected Fisher's LSD). In addition, Reversal phase β parameter estimates were significantly lower in D2-hM3Dq mice than in D2-mCherry (p<0.001, uncorrected Fisher's LSD) and D2-hM4Di mice (p<0.01, uncorrected Fisher's LSD). *p<0.05, **p<0.01, ***p<0.001. Data in (E) plotted as median ± IQR.
Finally, we applied RL modeling to determine whether similar changes in decision-making parameters might explain the pattern of reversal behavior we observed in pOVX female mice and D2-hM3Dq female mice. Fitting the same RL model (task phase-specific α and β parameters; see Methods), we found no significant interaction between virus and task phase on learning rate α [F(2,17)= 0.45, p=0.64] (Fig. 4E), but a significant interaction between virus and task phase on the explore/exploit parameter β [F(2,17)= 5.18, p=0.018] (Fig. 4F). The Reversal phase β was significantly lower in D2-hM3Dq mice than in D2-mCherry (p=0.0002, uncorrected Fisher's LSD) and D2-hM4Di mice (p= 0.002, uncorrected Fisher's LSD) (Fig. 4F). These data suggest that chemogenetic activation of D2R(+) neurons within DMS biases the choice strategy of female mice towards exploration during reversal learning. Moreover, chemogenetic activation of D2R(+) neurons within the DMS produced behavior that mimicked the pattern seen in pOVX females, including a reduced Reversal error bias and a reduced explore/exploit β parameter, consistent with a less exploitative, more exploratory choice policy. Taken together with evidence that D2R(+) iSPNs within the DMS are more intrinsically excitable in pOVX compared to sham females, these data support a model whereby pOVX biases reversal learning strategy towards exploration by modulating iSPN intrinsic excitability within DMS (Fig. 5).
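For readers unfamiliar with this class of model, the following is a minimal sketch of a delta-rule (Rescorla-Wagner style) learner with a softmax choice rule and separate α and β for the Discrimination and Reversal phases, fit by brute-force grid search over the negative log-likelihood of observed choices. It assumes values start at zero and carry over from Discrimination into Reversal; the exact model and fitting procedure used in this study are described in the Methods and may differ. The session data, grid values, and function names are hypothetical.

```python
import itertools
import math


def softmax(values, beta):
    """p_i = exp(beta*Q_i) / sum_j exp(beta*Q_j), shifted by max(Q) for numerical stability."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(choices, rewards, phases, params, n_options=4):
    """NLL of a choice sequence; params maps phase name -> (alpha, beta)."""
    q = [0.0] * n_options  # assumed initial values; carried across phases
    nll = 0.0
    for choice, reward, phase in zip(choices, rewards, phases):
        alpha, beta = params[phase]
        nll -= math.log(softmax(q, beta)[choice])
        q[choice] += alpha * (reward - q[choice])  # update only the chosen option
    return nll


def fit_by_grid(choices, rewards, phases, alphas, betas):
    """Jointly search (alpha, beta) for both phases; return (best NLL, best params)."""
    best = None
    for a_d, b_d, a_r, b_r in itertools.product(alphas, betas, alphas, betas):
        params = {"disc": (a_d, b_d), "rev": (a_r, b_r)}
        nll = negative_log_likelihood(choices, rewards, phases, params)
        if best is None or nll < best[0]:
            best = (nll, params)
    return best


# Hypothetical session: option 0 rewarded in Discrimination, option 1 in Reversal.
choices = [2, 0, 0, 0, 0, 0, 0, 3, 1, 1, 1]
rewards = [0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
phases = ["disc"] * 5 + ["rev"] * 6
ALPHAS = [0.1, 0.3, 0.5, 0.7, 0.9]
BETAS = [0.5, 1.0, 2.0, 4.0, 8.0]

best_nll, best_params = fit_by_grid(choices, rewards, phases, ALPHAS, BETAS)
print(f"NLL={best_nll:.3f}", best_params)
```

A coarse grid keeps the sketch dependency-free and makes the joint fit explicit; because Reversal-phase likelihoods depend on values learned under the Discrimination α, all four parameters are searched together rather than phase by phase.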
Both pOVX and chemogenetic activation of D2R(+) neurons within the DMS are associated with increased intrinsic excitability of iSPNs. In turn, both manipulations are associated with a less perseverative, more exploratory choice strategy during reversal learning. While indirect, the convergent behavioral effects of pOVX and chemogenetic activation of DMS D2R(+) neurons suggest that the increased intrinsic excitability of iSPNs in the DMS of pOVX mice could contribute to the altered reversal strategy observed. Future experiments should perform in vivo recordings of D2R(+) SPNs and/or chemogenetic manipulations in pOVX females to probe the relationship between altered D2R(+) SPN intrinsic properties and reversal learning strategy on a trial-by-trial basis.
Discussion
We found that pOVX altered how female mice solved a reversal learning task. Using RL models fit to trial-by-trial behavioral data, we found that pOVX mice exhibited a more exploratory choice policy during reversal learning than sham controls, captured by a lower inverse temperature β parameter. This difference in exploratory choice behavior was accompanied by increased intrinsic excitability of D2R(+) iSPNs in the DMS, a region implicated in regulating action selection and choice policy. We then sought to mimic this effect using chemogenetics. We demonstrated that chemogenetic activation of D2R(+) neurons in vitro similarly enhanced iSPN intrinsic excitability in slices from female brains. In addition, activation of DMS D2R(+) neurons in vivo decreased the ratio of perseverative to regressive errors and promoted exploratory choice, again captured by a lower inverse temperature β parameter. Together, these data suggest that two distinct manipulations, pOVX and hM3Dq activation, converge on similar behavioral effects, possibly through a shared mechanism of enhanced DMS iSPN intrinsic excitability.
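The link between a lower inverse temperature and more exploratory choice follows directly from the softmax choice rule. The sketch below uses hypothetical learned values for four odors and computes, at several values of β, the probability of choosing the highest-valued option and the entropy of the choice distribution; as β decreases, choice probability spreads more evenly across options. The values and function names are illustrative assumptions.

```python
import math


def softmax(values, beta):
    """p_i = exp(beta * Q_i) / sum_j exp(beta * Q_j), shifted by max(Q) for stability."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def choice_entropy(probs):
    """Shannon entropy in nats; equals log(4) for uniform choice among four odors."""
    return -sum(p * math.log(p) for p in probs if p > 0)


Q = [0.8, 0.2, 0.1, 0.0]  # hypothetical learned values; option 0 is the best estimate
BETAS = [0.0, 1.0, 3.0, 10.0]

summary = []
for beta in BETAS:
    p = softmax(Q, beta)
    summary.append((beta, p[0], choice_entropy(p)))

for beta, p_best, h in summary:
    print(f"beta={beta:>4}: P(best)={p_best:.3f}, entropy={h:.3f} nats")
```

At β = 0 choice is uniform regardless of learned values, while large β concentrates choice on the option with the highest estimated value, the exploitative extreme.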
Our data are consistent with studies that manipulate D2Rs and model choice policy. Germline D2R knockout (Kwak et al., 2014), systemic D2R antagonist administration (Eisenegger et al., 2014), and intrastriatal D2R antagonist infusion (Lee et al., 2015) are each associated with a more exploratory choice policy. However, none of these studies could rule out the contribution of presynaptic D2 autoreceptors, which is important given the apparent role of tonic dopamine in modulating explore/exploit balance (Beeler et al., 2010; Humphries et al., 2012; Cinotti et al., 2019; but see Costa et al., 2014). Our chemogenetic manipulation experiments (which do not target D2R(+) dopamine axon terminals) demonstrate that activation of D2R(+) neurons within DMS is sufficient to bias performance towards exploration.
We speculate that there are two likely circuit mechanisms downstream of D2R(+) iSPNs that may be responsible for promoting exploratory choice policy. The first involves local lateral connections from iSPNs to direct pathway SPNs (dSPNs), and the second involves the interface of the direct and indirect pathways in basal ganglia output centers such as the substantia nigra pars reticulata (SNr). One recent study showed that systemic injection of the D2R antagonist raclopride induced dopamine-dependent transcriptional activation in iSPNs that opposed the activation of dSPNs, suggesting that iSPN to dSPN transmodulation is an important mechanism for behavioral flexibility (Matamales et al., 2020). Therefore, it is possible that elevated iSPN activity, either through pOVX or chemogenetic activation, promotes exploratory choice by dampening the activity of task-relevant ensembles of dSPNs that would normally promote the selection of the highest estimated-value option. Opponent mechanisms between the direct and indirect pathways at convergent downstream targets are also predicted to regulate choice policy (Collins and Frank, 2014).
Studies have shown that the intrinsic properties of striatal SPNs differ between females and males before puberty (Dorris et al., 2015) and in an estrous cycle-dependent manner after puberty (Proano et al., 2018). Interestingly, Proano et al. found that the intrinsic excitability of accumbal SPNs was significantly higher during diestrus/metestrus, when estradiol and progesterone levels are low, compared to proestrus/estrus (Proano et al., 2018). This may be in keeping with our observation that the intrinsic excitability of D2R(+) SPNs was higher in pOVX females compared to sham females, since pOVX females lack gonadally-produced estradiol and progesterone. Estradiol has also been shown to influence dopamine release and reuptake in the striatum (Calipari et al., 2017), including the dorsal striatum (Becker and Beer, 1986; Becker, 1990). Recent data also suggest that dopamine influences the postnatal maturation of intrinsic excitability of SPN populations within the striatum (Lieberman et al., 2018). Therefore, it is possible that the changes we observed in D2R(+) SPN excitability in pOVX female mice may occur by direct action on SPNs or downstream of hormonal effects on presynaptic dopamine release (Lin et al., 2020).
There are several lines of evidence suggesting that ovarian hormones preferentially modulate the activity of D2R(+) iSPNs versus D1R(+) dSPNs. OVX decreases D2 receptor binding in the striatum, and treatment with estradiol or an ERβ agonist counteracts this effect of OVX (Le Saux et al., 2006). These treatments did not alter D2R mRNA expression, suggesting that estradiol modulates D2R binding through a mechanism other than transcriptional regulation of D2R expression. Furthermore, OVX reduces the expression of preproenkephalin, the precursor of the endogenous opioid peptide enkephalin, which is expressed in iSPNs (Le Saux and Di Paolo, 2005). Again, the effect of OVX on preproenkephalin expression can be counteracted by estradiol administration. Finally, in the nucleus accumbens core, rapid enhancement of mEPSC amplitude by estradiol is inversely correlated with rheobase (Krentzel et al., 2019). Given that D2R(+) SPNs typically display lower rheobase than D1R(+) SPNs, this suggests that estradiol may exert a greater effect on D2R(+) iSPNs than on D1R(+) dSPNs. However, it should be noted that the authors did not observe rapid effects of estradiol on mEPSC amplitude within dorsal striatum (Krentzel et al., 2019).
There are several limitations to our current study that should be noted. First, we do not know whether the changes we observed in SPN intrinsic excitability are specific to the dorsomedial region of the striatum or to the D2R(+) SPN cell type. Second, our evidence linking the increased excitability of D2R(+) SPNs in DMS to the more exploratory choice strategy used by pOVX females is correlational. In independent experiments we observed that 1) pOVX promoted an exploratory choice strategy during Reversal, 2) pOVX is associated with elevated intrinsic excitability of D2R(+) SPNs in DMS, and 3) chemogenetic activation of D2R(+) SPNs in DMS promoted an exploratory choice strategy during Reversal. In the future, more direct evidence could be gained by performing manipulation experiments to reduce the activity of DMS D2R(+) SPNs in pOVX females, or by recording the activity of these same neurons in pOVX and sham females during behavior. Finally, while we performed OVX prior to puberty onset, we do not know whether the timing of OVX plays an important role in the observed effects on behavior and physiology. We also do not know whether, and when, hormone replacement may rescue the effects of pOVX. Future studies should examine the effects of OVX timing and hormone replacement on these outcome measures. Despite these limitations, our data suggest future lines of inquiry into the relationship between puberty, ovarian hormones, SPN physiology, and choice policy in value-based decision making.
As yet, we have not identified the mechanism by which pOVX leads to enhanced excitability of D2R(+) SPNs. Ovarian hormones have been shown to regulate dendritic complexity and spine density of neurons in other brain regions (Gould et al., 1990; Woolley et al., 1990; Wallace et al., 2006; Chen et al., 2009; Ye et al., 2019). The dendrites of D2R(+) SPNs are enriched in Kir2 family inward rectifying K+ channels (Uchimura et al., 1989; Nisenbaum and Wilson, 1995; Mermelstein et al., 1998; Shen et al., 2007), and a reduction in dendritic length/complexity has been associated with reduced Kir2 expression and increased intrinsic excitability (Cazorla et al., 2012; Sebel et al., 2017). Therefore, it would be informative to compare iSPN Kir2 channel currents and dendritic morphology in sham vs. pOVX females. Finally, the increase in intrinsic excitability of D2R(+) SPNs in pOVX females could represent a homeostatic plasticity mechanism that accompanies a reduction in excitatory synaptic input to these neurons, but we did not measure synaptic inputs onto D2R(+) SPNs in this study.
There is growing interest in understanding the mechanisms that underlie sex differences in value-based decision making. A recent study showed that, compared to males, female mice employed a more consistent strategy while learning a two-dimensional decision-making task (Chen et al., 2021). This tendency for female mice to constrain their decision space aligns with the exploitative reversal learning strategy we observed in sham females. Conversely, we observed that pOVX females exhibited a markedly more exploratory reversal learning strategy, sticking less to the previously rewarded odor choice and committing more regressive errors than sham females. These findings suggest that ovarian hormones contribute to the female-biased choice strategies used during value-based decision making. While previous studies have separately provided evidence that ovarian hormones regulate the intrinsic excitability of SPNs and aspects of value-based decision making, we show for the first time that pOVX alters the explore/exploit balance of choice strategy while also increasing the intrinsic excitability of D2R(+) SPNs in the DMS. These data suggest that pubertal status may influence explore/exploit balance via the modulation of SPN intrinsic excitability within the DMS and highlight a role for ovarian hormones in establishing sex-specific decision-making strategies in adulthood. These data can inform the basic science of decision making and the study of the many psychiatric disorders that emerge after puberty and show sex differences in their prevalence or manifestation.
Declaration of interests
The authors declare that there are no conflicts of interest.
Acknowledgments
We thank Yuting Zhang and Kenechukwu Okwuosa for technical assistance with mouse behavior testing. We thank Benjamin Hoshal for contributing to analysis code. We thank Dr. Helen Bateup for feedback on the manuscript and Dr. Anne Collins and Wilbrecht lab members for helpful discussion.