Chronic Exposure to Glucocorticoids Induces Suboptimal Decision-Making in Mice

Anxio-depressive symptoms as well as severe cognitive dysfunction including aberrant decision-making (DM) are documented in neuropsychiatric patients with hypercortisolaemia. Yet, the influence of the hypothalamo-pituitary-adrenal (HPA) axis on DM processes remains poorly understood. As a tractable means to approach this human condition, adult male C57BL/6JRj mice were chronically treated with corticosterone (CORT) prior to behavioural, physiological and neurobiological evaluation. The behavioural data indicate that chronic CORT delays the acquisition of contingencies required to orient responding towards optimal DM performance in a mouse Gambling Task (mGT). Specifically, CORT-treated animals show a longer exploration and a delayed onset of the optimal DM performance. Remarkably, the proportion of individuals performing suboptimally in the mGT is increased in the CORT condition. This variability seems to be better accounted for by variations in sensitivity to negative rather than to positive outcome. In addition, CORT-treated animals perform worse than control animals in a spatial working memory (WM) paradigm and in a motor learning task. Finally, Western blot analyses show that chronic CORT downregulates glucocorticoid receptor expression in the medial Prefrontal Cortex (mPFC). Moreover, corticotropin-releasing factor signalling in the mPFC of CORT individuals negatively correlates with their DM performance. Collectively, this study describes how chronic exposure to glucocorticoids induces suboptimal DM under uncertainty in an mGT and hampers WM and motor learning processes, thus affecting specific emotional, motor, cognitive and neurobiological endophenotypic dimensions relevant for precision medicine in biological psychiatry.


INTRODUCTION
Chronically elevated circulating glucocorticoids (GC) have been extensively shown to have detrimental physiological and cognitive effects (see for instance [1]). In particular, persistent hypothalamo-pituitary-adrenal (HPA) axis dysfunction has been reported in humans upon repeated stress, with elevated levels of the endogenous GC cortisol [2][3][4], but also in patients with chronic inflammatory diseases treated with exogenous GC [5][6][7]. In fact, hypercortisolaemia is part of the symptomatology reported in patients with neuropsychiatric disorders afflicted with severe cognitive dysfunction [8,9]. Specifically, aberrant decision-making (DM) has been described in patients suffering from depression using the Iowa Gambling Task (IGT) [10]. This paradigm involves probabilistic learning via monetary rewards and penalties; optimal task performance, which maximizes gains, requires subjects to develop a preference for smaller immediate rewards in order to avoid larger losses in the long term. Interestingly, maladaptive DM strategies have also been reported in healthy subjects [11,12]. Of particular interest, depressed patients show a reduced ability to detect and incorporate experience from reward-learning associations [13]; anhedonia is therefore thought to act by modifying goal-directed behaviours when positive reinforcements are involved [14]. Moreover, hyposensitivity to positive outcome (reward) and maladaptive responses to negative outcome have been linked to depression [15,16], suggesting a dysfunctional interaction between limbic and motor-executive regions as a putative underlying mechanism. Yet, the influence of the HPA axis on DM alterations remains poorly understood. The regulatory role of GC on HPA axis activity [17] has pointed to imbalances in the expression of their main receptors (glucocorticoid receptors, GR, and mineralocorticoid receptors, MR) as biomarkers of depressive states [18,19].
Simultaneously, the corticotropin-releasing factor (CRF), the major activator of the HPA axis, is thought to be a key player in stress-induced executive dysfunction [20] and in mood and anxiety disorders [21][22][23][24][25]. Chronic corticosterone (CORT) administration in rodents represents a tractable means to address these human pathological conditions [26]. In fact, chronic CORT-treated animals exhibit a behavioural spectrum reminiscent of emotional anxio-depressive symptoms, as evidenced in several non-conditioned tasks [27][28][29]. Moreover, in gambling tasks, healthy rodents efficiently explore and sample from different options prior to establishing their choice strategy upon associative and reinforcement learning, showing a high inter-individual variability, probably shared with humans [30][31][32][33][34][35][36].
Here, we hypothesized that chronic CORT exposure leads to suboptimal DM processing under uncertainty in a mouse Gambling Task. In line with the dimensional framework of the Research Domain Criteria Initiative (RDoC) [37], we addressed feedback sensitivity since optimal performance in gambling tasks requires effective exploration of options in their early stages [32,33]. Aiming to elucidate their implication in suboptimal DM, spatial working memory (WM) and psychomotricity, as cognitive and arousal-sensorimotor constructs, were also explored. Three relevant brain regions were targeted in this study given their contribution to instrumental behaviour and to mood and stress-related symptomatology: the medial prefrontal cortex (mPFC) and the dorsolateral striatum (DLS), modulators of goal-directed and habit-based learning processes respectively [38,39], and the ventral hippocampus (VH), which is involved in stress and emotional processing and exerts strong regulatory control on the HPA axis [40]. The protein levels of GR, MR and CRF were quantified in these brain areas. As depression is associated with a high rate of pharmacological resistance [41,42] and with a high risk of suicide [43,44], understanding how neuronal mechanisms underlying DM processes are altered may offer insights towards the detection of predictive biomarkers for treatment selection.
[…] week-old male C57BL/6JRj mice (Ets Janvier Labs, Saint-Berthevin, France) were group-housed and maintained under a normal 12-hour light/dark cycle with constant temperature (22±2°C). They had access to standard chow (Kliba Nafag 3430PMS10, Serlab, CH-4303 Kaiserau, Germany) ad libitum for three weeks and, from the fourth week onwards, were kept under food restriction to 80-90% of their free-feeding weight (mean ± SEM (g) = 26.20±0.26). Bottles containing water and/or treatment were available at all times. Experiments were all conducted following the standards of the Ethical Committee in Animal Experimentation from Besançon (CEBEA-58; A-25-056-2).
All efforts were made to minimize animal suffering during the experiments, in accordance with the Directive of the European Council of 22nd September 2010 (2010/63/EU).

Pharmacological treatment
Treatment started four weeks before the beginning of the behavioural assessment. Half the individuals received corticosterone (CORT, 4-pregnene-11β,21-diol-3,20-dione, Sigma-Aldrich, France) in the drinking water (35 μg/ml, equivalent to 5 mg/kg/day; CORT group). CORT was freshly dissolved twice a week in vehicle (0.45% hydroxypropyl-β-cyclodextrin, βCD, Roquette GmbH, France), which control animals (SHAM group) received in the drinking water throughout the entire experiment.
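As a rough plausibility check on the stated equivalence between the drinking-water concentration and the daily dose, the sketch below back-calculates the daily fluid intake that the two figures imply. The body weight used (26.2 g, taken from the mean weight given in the housing description) and the function name are assumptions made for illustration only; actual fluid intake is not reported in this section.

```python
def implied_daily_intake_ml(dose_mg_per_kg_day, body_weight_g, conc_ug_per_ml):
    """Daily drinking volume (ml) needed to reach a target dose.

    Illustrative helper, not part of the original protocol.
    """
    # Convert the dose to micrograms per animal per day.
    dose_ug_per_day = dose_mg_per_kg_day * 1000.0 * (body_weight_g / 1000.0)
    # Volume of solution that contains this amount of compound.
    return dose_ug_per_day / conc_ug_per_ml


# Assumed body weight: 26.2 g (mean weight reported for these animals).
intake = implied_daily_intake_ml(5.0, 26.2, 35.0)
print(f"Implied intake: {intake:.2f} ml/day")
```

Under these assumptions the implied intake is roughly 3.7 ml per mouse per day; a lighter or heavier animal would shift the effective dose proportionally.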

Multi-domain behavioural characterization
Mice were tested behaviourally during the light phase of the cycle (from 8:00 a.m.) after 4 weeks of differential treatment. A timeline of the experiment is presented in Figure 1, established within the framework of the RDoC to assess the functioning of several complementary systems, including Negative and Positive Valence Systems, Cognitive, Sensorimotor and Arousal and Regulation Systems.

Delayed spatial Win-Shift Task (dWST)
After 5 consecutive training days, spatial WM was tested in a subset of animals (SHAM, n=22; CORT, n=22) as previously described (adapted from [45], for details see SOM).

Mouse Gambling Task (mGT)
Decision-making was measured using the mGT, the protocol of which we have previously described [30] and which is extensively detailed in the SOM. Decision-making performance in the mGT was measured as the percentage of advantageous choices over five 20-trial sessions. Choice strategy, based on 4 different behavioural dimensions (stickiness, flexibility, lose-shift and win-stay scores), was assessed in 40-trial blocks as previously described [30,32,33,46]. Performance during the last session was considered for the overall measure of DM performance, as previously described [34,35]. Six mice displaying an immediate spatial preference among options (choice proportions different from those expected in the absence of spatial preference, i.e. 25% of choices for each of the 4 available options; χ², p<0.05) were discarded from the subsequent mGT analyses.
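The exclusion criterion for spatial preference can be illustrated with a chi-squared goodness-of-fit test against a uniform 25% split over the four options. The sketch below is only an illustration under stated assumptions: the function names and the example counts are hypothetical, the number of trials entering the test is not specified in this section, and the critical value 7.815 is the standard tabulated χ² threshold for 3 degrees of freedom at α=0.05.

```python
# Tabulated chi-squared critical value for df = 3 at alpha = 0.05.
CHI2_CRIT_DF3_ALPHA05 = 7.815


def chi2_uniform(counts):
    """Chi-squared statistic of observed counts against a uniform split."""
    total = sum(counts)
    expected = total / len(counts)  # 25% of choices per option when len == 4
    return sum((c - expected) ** 2 / expected for c in counts)


def has_spatial_preference(counts):
    """True if the choices over the 4 options deviate from 25% each (p < 0.05)."""
    if len(counts) != 4:
        raise ValueError("expected choice counts for exactly 4 options")
    return chi2_uniform(counts) > CHI2_CRIT_DF3_ALPHA05


# Hypothetical choice counts over 20 trials (not data from the study).
balanced = [6, 4, 5, 5]
biased = [14, 2, 2, 2]
print(has_spatial_preference(balanced), has_spatial_preference(biased))
```

With so few trials the expected count per option is small, so an exact test may be preferable in practice; the hard-coded threshold is used here only to keep the example free of external dependencies.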

Sucrose Preference Test (SPT)
The individual sensitivity to reward [47] was measured using the preference for a sucrose solution over water, as previously described [30].

Forced Swim Test (FST)
Coping strategies in the face of distressing, uncertain conditions were measured in an FST [19,48]. Mice were individually placed for 6 minutes in an inescapable glass cylinder filled with 20 cm of warm water (31.5±0.5°C) and the overall time during which they were immobile was recorded [49]. Two animals were discarded from the analysis due to technical reasons.

Motor Learning Task (MLT)
Psychomotricity was measured using a rotarod task (adapted from [50]), as detailed in the SOM. All procedures including food reward (20 mg Dustless Precision Pellets® Grain-Based Diet, PHYMEP s.a.r.L., Paris, France) were preceded by a habituation period inside the home cages (see Figure 1 for the experimental design).

Fur Coat State (FCS)
The effects of chronic CORT exposure on self-oriented behaviour were assessed using the weekly measurement of the state of the fur coat of each animal, as previously described [51].

CORT plasma assays
Final trunk blood samples were collected from all animals (n=80) 5-7 days after the last behavioural test and directly centrifuged at 2100 g for 15 minutes at 20°C. Serum was collected and stored at -80°C until assayed. Plasma CORT concentration was measured using an immunoassay kit (DetectX Corticosterone Immunoassay kit, Arbor Assays, Ann Arbor, Michigan, USA). In order to measure the homeostatic stress reactivity of the HPA axis, blood samples were also collected from a subset of mice (SHAM, n=10; CORT, n=9) following a gentle restraint stress [52] directly before sacrifice.

Data and statistical analyses
Data are presented as means ± SEM. Statistical analyses were conducted using STATISTICA 10 (Statsoft, Palo Alto, USA) and figures were designed using GraphPad Prism 7 software (GraphPad Inc., San Diego, USA). The sample sizes were determined a priori by statistical power analysis (G*Power software, Heinrich Heine Universität, Düsseldorf, Germany) with a repeated measures ANOVA (RM-ANOVA) design including 3 groups (between-subject factor), 5 measurements (within-subject factor), a predicted effect size of 0.14, 1-β=0.8 and α=0.05. Our animal sample is predicted to yield highly reproducible outcomes with 1-β>0.8 and α<0.05. Individuals across pharmacological conditions were clustered in three different groups, namely (1) good, (2) intermediate and (3) poor decision-makers (DMs), with distinct preference for the advantageous options: ≥70% preference, between 70% and 50% preference, and ≤50% preference, respectively. Assumptions for parametric analysis were verified prior to each analysis: normality of distribution with Shapiro-Wilk, homogeneity of variance with Levene's and sphericity with Mauchly's tests. Behavioural time-dependent measures assessed during the mGT, the dWST and the MLT were analysed by RM-ANOVA with session (1 to 5) or 40-trial block (beginning or end) as within-subject factors, and treatment (SHAM vs CORT) or clusters (good, intermediate or poor DMs) as between-subject factors. Group performance in the mGT was compared to chance level (50% of advantageous choices) using Student t-tests. The degradation of the coat state due to the treatment was analysed by ANOVA with weeks (1 to 13) and treatment (SHAM vs CORT) as factors. When datasets did not meet assumptions for parametric analyses, non-parametric analyses, i.e. Kruskal-Wallis, Wilcoxon or Mann-Whitney U tests, were used. Upon significant main effects, further comparisons were performed with Duncan or Bonferroni corrections.
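The clustering rule described above amounts to two thresholds on the percentage of advantageous choices. A minimal sketch, assuming the score is the final-session percentage on a 0-100 scale; the function name and the example values are hypothetical and not taken from the study's data:

```python
def classify_decision_maker(pct_advantageous):
    """Assign a cluster label from the % of advantageous choices (0-100)."""
    if not 0.0 <= pct_advantageous <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    if pct_advantageous >= 70.0:
        return "good"          # >= 70% preference
    if pct_advantageous > 50.0:
        return "intermediate"  # strictly between 50% and 70%
    return "poor"              # <= 50% preference


# Hypothetical final-session scores, for illustration only.
for score in (85.0, 60.0, 45.0):
    print(score, classify_decision_maker(score))
```

The boundary cases follow the stated inequalities: a score of exactly 70% is labelled good and a score of exactly 50% is labelled poor.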
The assumption that treatment populations were independently and normally distributed within each cluster was tested with chi-squared tests (χ²). Dimensional relationships between behavioural markers of DM, measured as final performance in the mGT (% of advantageous choices in the last session), and protein levels in the various brain structures under investigation were analysed using Pearson correlations.
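For readers less familiar with the correlation analysis, the Pearson coefficient between a protein level and final mGT performance can be computed as sketched below. This is a generic textbook implementation, not the STATISTICA procedure used by the authors; the function name and the paired values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values: protein level (arbitrary units) and
# % of advantageous choices in the last session.
protein_level = [0.8, 1.1, 1.4, 1.9, 2.3]
final_performance = [85.0, 80.0, 70.0, 65.0, 55.0]
print(round(pearson_r(protein_level, final_performance), 3))
```

A negative coefficient, as in this made-up example, indicates that higher protein levels go with lower final performance; significance testing would additionally require the sample size.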
For all analyses, alpha was set at 0.05 and effect sizes are reported as partial η² (pη²).

RESULTS
At the population level, all mice showed a progressive increase in their performance in the mGT over 5 sessions [main effect of session: F4,288=31.8, p<0.0000, pη²=0.31] but no general difference was found between groups [treatment: F1,72=2.4, p>0.05, pη²=0.03] (Figure 2A). However, further analyses revealed that whereas SHAM mice allocated their responses preferentially towards advantageous options from session 2 onwards [% advantageous choices vs chance, session 1: t38=1.2, p>0.05, sessions 2-5: t38>2.5, all ps<0.0000], CORT mice required 20 more trials to improve DM [sessions 1 and 2: t34<2.7, ps>0.05, sessions 3-5: t34>4.8, all ps<0.0000]. This indicates that chronic CORT lengthens exploration and delays the onset of optimal DM performance. We further explored whether chronic CORT could be considered a vulnerability factor for suboptimal DM (Figure 2B). The majority of individuals displayed the optimal strategy (good DMs). They represent 82.05% of SHAM and only 62.86% of CORT animals (mean % advantageous choices ± SEM: 82.78±1.25). Individuals from the intermediate DM subpopulation developed a delayed preference towards the advantageous options, without reaching the optimal strategy. They constitute only 5.13% of SHAM but 28.57% of CORT mice (60.00±1.38). Poor DMs failed to develop a preference for any option and correspond to 12.82% of SHAM and 8.57% of CORT animals (39.38±2.58). The distribution of CORT and SHAM mice in the three subpopulations was compared, highlighting a significant difference (χ², p<0.0001) which is mainly accounted for by the intermediate DMs.
In-depth analysis of interindividual variability was further performed. Decision-making subpopulations learnt at different rates [main effect of group: F2,71=21.0, p<0.0000; pη²=0.37; session x cluster interaction: F8,284=6.1, p<0.0000, pη²=0.15] (Figure 2C) (Figure 3A, 3B). […] (Figure 3F) and final mGT performance did not correlate with FST scores [r=-0.12, p>0.05], these results suggest that the coping style does not primarily influence DM performance. Both SHAM [consumption of sucrose solution vs 50%: t39=79.3, p<0.00] and CORT [t39=61.6, p<0.00] mice expressed a strong preference for the sucrose solution compared to water (percentage of total sucrose consumption ± SEM, SHAM: 96.50 ± 0.59; CORT: 94.84 ± 0.73) (Figure 3G), and no difference between them was evidenced for the sucrose solution consumption [t78=1.8, p>0.05]. Moreover, DM categories did not differ either [F2,71=0.7, p>0.05, pη²=0.02] (Figure 3H).
As CORT slows down the onset of the optimal DM strategy, we tested whether motor performance subserving execution and exploitation of the mGT was also impacted. Mice improved their performance along the task [main effect of session: F3,636=38.5, p<0.0000, pη²=0.15], with chronic CORT impairing their general MLT performance [main effect of treatment: F1,212=12.7, p<0.001; pη²=0.06]. Nevertheless, no significant interaction was found between factors [treatment x session interaction: F3,636=0.9, p>0.05, pη²=0.00]. When comparing individual sessions, CORT animals stayed on the rod for a shorter time than SHAM mice in the first three sessions, a difference that disappeared at the end of the task, thus suggesting a delayed learning process in the pathological condition (Figure 4C). Decision-making clusters did not behave differently during the task [main effect of cluster: F2,211=0.1, p>0.05; pη²=0.00] and MLT scores did not correlate with final mGT performance [r=0.0497, p>0.05].
These results show that CORT treatment interferes with motor learning processes, which could impact exploration in the mGT. We next investigated whether biochemical adaptations to chronic CORT could account for differential DM performance. The FCS appeared significantly degraded in CORT animals from the third week of treatment [main effect of treatment: F1,890=448.1, p<0.0000, pη²=0.33; week: F13,890=20.1, p<0.0000, pη²=0.23; treatment x week interaction: F13,890=11.5, p<0.0000, pη²=0.14; from the third week of treatment: all ps<0. […] Finally, we focused on the key players, GR/MR ratio and CRF, in the regions of interest. The mPFC GR/MR ratio of CORT animals (mean ± SEM: 1.55±0.37) was significantly decreased compared to control animals (2.30±0.55) [U=451.0, p<0.05] (Figure 5B, 5C). To disentangle the origin of this difference, GR and MR levels were compared separately. While […] (Figure 5D). Nonetheless, a significant correlation between the mPFC CRF levels and the final mGT performance of CORT animals was observed [r=-0.5166, p<0.05] (Figure 5E), suggesting that vulnerability to suboptimal DM induced by chronic CORT relates more to CRF signalling deregulation than to GR per se.

DISCUSSION
The results of this study reveal that several weeks of CORT exposure delay the encoding of the contingencies required to orient responding towards optimal DM in a probabilistic gambling task. Inter-individual differences in the capability to develop an optimal DM strategy were evidenced in the global mouse population and, remarkably, the proportion of individuals displaying suboptimal DM performance is increased upon chronic CORT. The identified chronic CORT-induced suboptimal spatial WM, which seems to be more detrimental to the learning rate than to memory load, could hamper early exploration in the mGT, impeding integration of task contingencies, in line with preclinical [53][54][55][56] and clinical reports [57]. Moreover, chronic CORT, instead of hindering final performance, slows down the learning rate in the MLT, a DLS-dependent motor task that does not rely on positive valence systems [58], thus evoking clinical psychomotor retardation [59]. Additionally, though reward sensitivity does not directly account for differential DM performance, learning to cope with negative outcomes is impaired upon chronic CORT. Taken together, these results suggest a CORT-induced hedonic imbalance, with differential DM performance better accounted for by a dysregulated negative (lose-shift dimension) rather than positive (win-stay dimension and sucrose preference) valence system. This study disclosed a GR downregulation in the mPFC of treated animals, suggesting that chronic CORT exposure may disturb reflective behaviour for optimal planning and favour suboptimal habit formation, in line with previous studies addressing the role of the mPFC in instrumental behaviour [39,60,61]. Of particular interest, CRF signalling in the mPFC of CORT individuals was found to negatively correlate with their final DM performance, underlining synergistic effects of CRF and CORT in stress-induced cognitive alterations (in line with [62]).
Experimental studies in humans have proposed that IGT performance relies on reward sensitivity [16], though anhedonia measures do not always correlate with task performance [63]. This is the case for our data on reward sensitivity upon chronic CORT. An efficient exploration phase in the IGT, and therefore in its rodent adaptations, would guide behaviour towards a faster adherence to the optimal choice strategy, which would become more independent of the outcome, i.e. more habitual [39,64]. The spatial WM and psychomotor deficits observed in our CORT-treated mice could primarily affect the action-outcome learning crucial for optimal DM strategies, rather than solely cue processing, and even if they do not directly predict differential DM, they may compromise the transition from goal-directed to habit-based behaviour. Taking into consideration the complementary roles of the studied brain structures in instrumental learning [38,39,54], we suggest that the GR downregulation observed in the mPFC of treated animals would primarily entail suboptimal action-outcome encoding over cue processing, rendering action-outcome consolidation especially vulnerable to chronic CORT. This interpretation is in agreement with previous studies reporting negative consequences on cognition upon chronic stress and anxiety [4,65,66]. Stress has been shown to affect CRF signalling, compromising the positive valence system [67] and disrupting fronto-striatal cognition [20,68]. Together with the present study, these results suggest that high mPFC CRF levels can be considered a neurobiological endophenotype of vulnerability to suboptimal DM under chronic stress. Corticotropin-releasing factor signalling disruption may hinder mPFC computations supporting optimal fronto-striatal cognitive functioning, and specifically goal-directed behaviours, and contribute to overreliance on the negative valence system, leading to suboptimal action-outcome learning.
This interpretation further supports the somatic marker hypothesis [69,70] and points towards an integral role of CRF. However, previous studies have warned about the dissociable roles of the ventral and dorsal mPFC in DM, especially when behaviour is reward-guided and sensitive to negative feedback [71]. Since we jointly processed ventral and dorsal mPFC structures, further investigations will be necessary to establish the exact contribution of the prelimbic and infralimbic areas of the mPFC to CRF signalling upon chronic CORT. The results presented here demonstrate that chronic CORT exposure impedes optimal DM under uncertainty in mice, impacting mPFC GR and CRF signalling. Manipulating the latter to counterbalance overreliance on the negative valence system with suboptimal habit formation could prove useful to improve coping with risk aversion towards rigidification of optimal choices when DM involves overcoming a conflict. In sum, this study provides novel insight into the mechanisms of maladaptive value-based DM caused by chronic exposure to GC, and has important implications for understanding pathophysiological mechanisms in a transdiagnostic perspective and for identifying alternative pharmacological targets towards precision medicine in biological psychiatry.

Funding and Disclosure
This study was supported by grants from the Communauté d'Agglomération du Grand Besançon (LC), which had no role in the study design, collection, analysis or interpretation of the data, writing of the report and in the decision to submit the paper for publication. DB is funded by a grant from the Medical Research Council (MR/N02530X/1) and a research grant from the Leverhulme Trust (RPG-2016-117). All authors declare no conflicts of interest.