Sex-specific interactions between procedural and deliberative decision-making systems in a mouse model of Alzheimer’s disease

A central question in aging and Alzheimer’s disease (AD) is when and how neural substrates underlying decision-making are altered. Here we show that while APP mice, a commonly used mouse model of AD, were able to learn Restaurant Row, a complex neuroeconomic decision-making task, they were significantly impaired in procedural, habit-forming, aspects of cognition and relied heavily on deliberation when making decisions. Surprisingly, these behavioral changes are associated with amyloid-beta (Aβ) pathology and network remodeling in the striatum, a key brain region involved in procedural cognition. Furthermore, APP mice and control mice relied on distinct sex-specific strategies in this neuroeconomic task. These findings provide foundational pillars to examine how aging and age-related neurodegenerative diseases impact decision-making across sexes. They also highlight the need for complex behavioral tasks that allow for the dissociation of competing neurally-distinct decision-making circuits to get an accurate picture of changes in neurodegenerative models of human disease.


INTRODUCTION
Alzheimer's disease (AD) is a neurodegenerative disease clinically characterized by a progressive decline in cognitive skills, such as visual-spatial skills and memory, leading to dementia 1 . Although, historically, assessment of cognitive deficits in AD has focused on memory, language, and visuospatial skills 2 , growing attention is now being paid to how cognitive decision-making changes over the course of AD [3][4][5] . Elderly individuals have more difficulties than young individuals choosing between uncertain alternatives 6 and more difficulties learning advantageous decision strategies 7 . These difficulties may be more pronounced when cognitive function is compromised by mild cognitive impairment (MCI) or by dementia 5,8,9 . While decision-making has been assessed in patients affected by neurodegenerative disorders such as Parkinson's disease 9,10 or frontotemporal dementia 9,11 , little is known about AD and its prodromal stage MCI beyond the fact that both are associated with gross impairments in decision-making 5,9,12 . In these human studies, assessment of decision-making has relied on paradigms such as the Iowa Gambling Task (IGT), where participants are tasked with maximizing bets under changing circumstances, or the Cambridge Gambling Task (CGT), which assesses decision-making under risk. While informative, most of these studies have relied on simple assessments of risk or ambiguity, which do not capture other important aspects of decision-making such as value-based learning, foraging abilities, or deliberation. Importantly, these tasks capture fundamentally human measures of risky decision-making, which makes them difficult to assess in non-human animal models. In addition, these paradigms do not take into account modern theories that behavior arises from multiple decision-making systems [13][14][15][16] .
Extensive work has shown that what appear to be similar behaviors on simple tasks could be produced by neurally-distinct computations 15,16 . However, it is possible to identify behavioral markers within more complex tasks that can identify these neurally-distinct computations [16][17][18] . These different decision-making systems are instantiated in different neural circuits, and thus may become dysfunctional at different times in the development of disease. It is thus essential to identify the behavioral and economic phenotypes that account for individual variation in decision-making and to characterize the cognitive and motivational variables of intact and impaired decision-making. Further, advancing our basic knowledge of the cognitive and neuronal circuits underlying decision-making will also help us identify processes impacted in aging and in AD and other dementias.
Neuropathologically, AD is characterized by amyloid plaques composed of the amyloid-beta peptide (Aβ), neurofibrillary tangles (NFT) comprised of hyperphosphorylated tau proteins, neuronal death and neuroinflammation 19 . Brain Aβ deposition follows a pathological progression established by the Thal and Braak postmortem staging for regional extent of Aβ pathology 20 . Of note, these neuropathological studies as well as positron emission tomography (PET) amyloid imaging suggest that the presence of amyloid burden in the cerebral cortex alone is not an accurate predictor of clinicopathological AD 21 . Several groups have instead reported that striatal Aβ plaque density predicts the presence of a higher Braak NFT stage and clinicopathological AD in living subjects 22,23 . Specifically, Hanseeuw and colleagues (2018) demonstrated that a novel three-category PET Aβ staging system that includes striatum better predicted hippocampal volumes and subsequent cognitive decline than a similar staging system including only cortical amyloid. Despite these exciting findings and the existence of previous work in individuals carrying familial AD mutations suggesting a strong relation between striatal Aβ and executive functions 24 , none of these studies determined whether Aβ pathology was linked to striatal dysfunction or whether some specific cognitive domains would be impaired. Additional work is thus needed to distinguish the respective contributions of striatal Aβ pathology to the onset of neuropsychiatric deficits or decision-making.
A number of mouse models have been developed that recapitulate some of these features of AD 25,26 . Amongst those, the transgenic J20 mouse model 27 overexpressing mutant human amyloid precursor protein (APP) has proven particularly useful when examining the role of Aβ on synaptic and memory deficits [28][29][30][31][32] due to early plaque development, predominantly in the hippocampus at around 5-6 months of age 27,33 . Recent work further expanded the characterization of amyloid pathology development in APP mice using whole brain imaging 33 , making this APP transgenic line an ideal candidate to study the impact of Aβ pathology on neural domains. Despite this seemingly extensive characterization of its phenotype, it is important to point out that only amygdala- and hippocampus-dependent memory modalities have been tested in this animal model, leaving unanswered questions of whether these mice are impaired in other decision-making modalities that either involve other brain structures or these same brain structures but engaged in multiple complex, dynamical ways.
Current theories of decision-making suggest that decisions arise from computationally separable systems implemented neurally through different neural circuits [14][15][16]34 . Deliberative strategies depend on the ability to predict the consequences of one's actions 17 . Spatially, these strategies depend on the presence of a cognitive map in the hippocampus containing information about the shape of the environment and the locations therein 15,35 . However, evidence indicates that the cognitive "map" in the hippocampus is more general, containing general information about the structure of the world with which to plan [36][37][38] . In contrast, procedural strategies depend on well-practiced action-chains and fast pattern recognition of situations with stored cache values involving the dorsolateral striatum and the basal ganglia 16,39 . Instinctual (Pavlovian) systems learn when to release actions from a limited action repertoire 16,40 , involving the amygdala, periaqueductal gray, and nucleus accumbens shell 40,41 . It is likely that the brain has evolved to have these different decision-making systems, some of which are more advantageous in specific situations than others.
To better understand complex decision-making, we tested APP and control nontransgenic (nTG) littermates on a neuroeconomic spatial foraging task called Restaurant Row (RRow) that accesses multiple decision-making systems in controlled ways both within trial and across days. This neuroeconomic decision-making task was initially developed for rats 42 and was recently adapted to mice 43 and humans 44 , giving it immense translational value. Importantly, the RRow task 43 allows for the dissection of different aspects of decision-making, including instinctual approach (Pavlovian), procedural habit (cached-action sequences), and deliberative (planning). While deliberative decision-making is thought to be hippocampal dependent, procedural decision-making relies on behavioral repetition and proper dorsolateral striatal functioning, and Pavlovian action-selection depends on amygdala function 15,16,40,45 . The discrimination of these different submodalities of decision-making is particularly informative because when one of these neural circuits is impaired, another may compensate for it. For instance, as a consequence of age-dependent changes in hippocampal function, aged rats and humans shift from using a hippocampal-dependent "place" strategy to instead using a striatal-dependent "response" strategy when navigating [46][47][48][49] . Without knowing the strategy an animal is using, the gross behavioral output may look unimpaired while a disruption at the circuit-level may be overlooked 50,51 . On the other hand, two similarly appearing behavioral impairments could arise from distinct circuit disruptions. Thus, pitting multiple decision systems against one another on a neuroeconomic task can read out competing processes that ultimately produce behavioral outputs and in turn aid in revealing the source of underlying computational dysfunction.
Considering the well-documented impairment of hippocampal function in APP mice [29][30][31][32][52][53][54] , we hypothesized that the multifaceted components of decision-making captured by RRow would provide a more fine-grained approach to characterizing putative compensatory shifts in behavior.
In the present study, we examined young adult nTG and APP male and female mice on RRow at 6 months of age, when APP mice display intact spatial memory learning with impaired memory retention in hippocampal-dependent tasks such as the Barnes maze 53,54 . All mice were able to learn this complex neuroeconomic decision-making task in which costs (delay to wait for food) start out low and then transition over weeks to become much higher (subsequently the reward environment becomes scarcer). Surprisingly, APP mice adapted their behavior more quickly upon the transition to scarcity and were able to renormalize their earnings to pre-transition numbers faster than nTG littermates. nTG mice typically accepted offers that, from prior experience with the different flavors, they most preferred and that were high in value (cheaper than what they were willing to spend in time waiting) quickly. This decision-making behavior resulted in less time deliberating, as has been observed previously 43 . By contrast, APP mice took significantly more time to decide before making a decision for all offers, whether advantageously cheap or disadvantageously expensive, and in general only took offers that they were willing to wait and earn based on their individual thresholds. This increased deliberative behavior was surprising considering that this APP transgenic mouse line is widely known for its hippocampal dysfunction. However, upon examination of Aβ plaque distribution throughout the brain, APP animals displayed a historically undocumented presence of amyloid deposits in the striatum. This new finding suggests that APP mice might have disrupted dorsolateral striatal procedural circuits. 
Overall, our results suggest that though APP mice were able to learn the RRow task, and perform it successfully, they required extensive deliberation to make any decision, even under conditions where their nTG counterparts used habitual, procedural decisions, quickly taking offers and then re-evaluating if necessary. Our results are the first to examine decision-making deficits across multiple decision-systems in a mouse model of AD and strongly emphasize the importance of examining multiple decision-making systems using tasks that access these multiple decision systems for behavior.

Hyperactivity and early discrimination of offers by APP mice
To better understand the cognitive and motivational behaviors of normative and impaired decision-making in AD, APP transgenic J20 mice and nTG littermates were subjected to RRow, a neuroeconomic decision-making task, to work for their sole source of food for the day (Fig. 1A).
Mice were given one hour to traverse a square maze with four different feeding sites (i.e., restaurants), each with unique spatial cues and flavors. Upon entry into the "offer zone" (OZ, Fig. 1B) a tone indicated the delay animals would have to wait before getting a pellet. Higher pitch indicated longer delays while lower pitch indicated shorter delays. Delays were random on each entry, selected from a range depending on task stage (see Methods). At this point, mice could choose to enter the "wait zone" (WZ) or skip the offer, thereby leaving the restaurant and continuing foraging by moving on to the next restaurant in the correct order (Fig. 1B). If the mouse decided to enter the WZ, the tone would step down in pitch until completion of the time delay, after which food would be delivered (earned). However, mice could re-evaluate their initial decision upon entering and quit the WZ at any time, forfeiting the pellet. Thus, the task was self-paced, and mice needed to alter their behavior to gain the most food in the one-hour time limit.
Mice progressed from a reward-rich environment to a reward-scarce environment in stages across days (Fig. 1C). Each stage was defined by the range of possible delays that could be encountered upon entry into the OZ. The first stage spanned 7 days, during which all offers were only 1 s. Following the first stage, the range of offers encountered increased to 1-5 s. This second stage lasted 5 days (days 8-12), after which the offers increased to 1-15 s for the subsequent 5 days (days 13-17). The last stage (stage 4) spanned days 18-70 (i.e., rest of the experiment) and consisted of offers being randomly chosen between 1-30 s. Mice only had 1 hour to get all of their food for the day, so these changes in offer distributions produced increasingly reward-scarce environments.
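The stage schedule described above can be summarized in a short sketch. This is an illustration only, not the task-control code used in the study: the names `STAGES`, `offer_range`, and `sample_offer` are hypothetical, and drawing whole-second delays uniformly from each range is an assumption, since the text only states that delays were random within a stage-dependent range.

```python
import random

# Offer ranges by task day, taken from the stage description in the text.
# Each entry is (first_day, last_day, min_delay_s, max_delay_s).
STAGES = [
    (1, 7, 1, 1),     # stage 1: all offers are 1 s
    (8, 12, 1, 5),    # stage 2: offers of 1-5 s
    (13, 17, 1, 15),  # stage 3: offers of 1-15 s
    (18, 70, 1, 30),  # stage 4: offers of 1-30 s
]


def offer_range(day):
    """Return the (min, max) delay in seconds that can be offered on a given day."""
    for first_day, last_day, min_delay, max_delay in STAGES:
        if first_day <= day <= last_day:
            return min_delay, max_delay
    raise ValueError(f"day {day} is outside the 70-day experiment")


def sample_offer(day, rng=random):
    """Draw one offer delay for an offer-zone entry.

    Assumption: whole-second delays drawn uniformly from the stage range.
    """
    min_delay, max_delay = offer_range(day)
    return rng.randint(min_delay, max_delay)
```

Because the session length is fixed at one hour, widening the range in this way raises the average cost per pellet, which is what makes the later stages reward-scarce.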
This task was used previously with young 3-month-old C57BL/6J male mice to examine how non-transgenic mice behaved in a complex, neuroeconomic, decision-making task 43,44,55 . Here, we examined both male and female adult 6-month-old APP and C57BL/6J nTG littermates to determine the effect of early amyloid deposition in the absence of overt neuronal or synaptic loss 27,33,53,54,56 on decision-making neural substrates. Importantly, at this age, APP mice display normal spatial reference learning but slightly impaired spatial memory retention in the Barnes circular maze compared to control littermates 54,57 .
All mice, including the seven APP mice and the eight nTG littermates, learned to run laps in the correct counterclockwise direction quickly during the first stage of the task (days 1-7, Fig. 2A). On day 1 and throughout the first week, APP mice ran significantly more laps than control mice (RM-ANOVA, F(1,13) = 11.105, p = 0.0054; Fig. 2A and Suppl. Figure 1). As APP mice are well-known to display hyperactivity phenotypes 54,56 , we measured running speed and distance travelled during the task. APP mice did travel more (Suppl. Figure 2) and ran faster during the first week of the experiment (Suppl. Figure 3), suggesting hyperactivity. This difference in average travel speed and distance disappeared by the end of the experiment as nTG mice increased their running speed to match that of APP mice (Suppl. Figures 2A-3A), suggesting that by the later stages of the experiment, differences in reward-rate were not due to running speeds. During the first week (1 s offers), APP mice also earned significantly more pellets than controls (RM-ANOVA, F(1,13) = 9.777, p = 0.008; Fig. 2B and Suppl. Figure 4). However, the two groups showed similar earning rates in the subsequent 2nd and 3rd stages of the task (see below), even though APP mice were still covering more ground and running faster than nTG mice (Suppl. Figures 2A-3A). All mice developed flavor preferences in the first stage of the experiment, whose rank order remained stable throughout the experiment (Suppl. Figure 5).
To determine how efficiently mice were using their time, we analyzed reinforcement rate as the amount of time in seconds between earning a pellet (inter-earn-interval, IEI). During the first week of the task, APP mice had a shorter IEI compared to nTG animals (peaking at ~140s vs. ~250s respectively), indicating higher reinforcement rates in APP mice than in control littermate mice (RM-ANOVA, F(1,13) = 10.35, p = 0.0018; Fig. 2C), though this equalized between groups in the next 10 days of the task.
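As an illustration of the inter-earn-interval measure, the sketch below computes IEIs from a list of pellet-delivery timestamps. The function names and the input format (earn times in seconds from session start) are assumptions made for the example, and summarizing a session by the mean IEI is only one possible choice; the text above reports where the IEI peaked for each group.

```python
def inter_earn_intervals(earn_times_s):
    """Return the time in seconds between consecutive earned pellets.

    earn_times_s: pellet-delivery times in seconds from session start
    (hypothetical input format).
    """
    times = sorted(earn_times_s)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def mean_iei(earn_times_s):
    """Mean inter-earn-interval, or None if fewer than two pellets were earned."""
    intervals = inter_earn_intervals(earn_times_s)
    if not intervals:
        return None
    return sum(intervals) / len(intervals)
```

A shorter IEI corresponds to a higher reinforcement rate, so a mouse earning a pellet every 140 s is being reinforced more often than one earning a pellet every 250 s.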
The second stage of the experiment (beginning day 8) consisted of offers varying between 1-5 s and lasted for 5 days (days 8-12). The number of laps run and pellets earned increased from the previous stage (RM-ANOVA between stages indicated by the open star on dotted lines; laps: F(1,13) = 42.93, p < 0.0001; earns: F(1,13) = 12.692, p = 0.0035; Fig. 2A, B) but equalized between the groups, and subsequently the reinforcement rates stabilized and equalized between APP and nTG mice (Fig. 2C). Following 1-5 s offers, mice transitioned to the third stage consisting of 1-15 s offers between days 13-17. All mice continued increasing the number of laps they ran, and both groups did this equally (RM-ANOVA between stages indicated by the open star on dotted lines; F(1,13) = 25.22, p = 0.0002; Fig. 2A). Though mice were running more laps, the number of pellets earned remained stable relative to the previous stage (Fig. 2B).
A key element of RRow is that it entails multiple junctures of decision-making, allowing us to examine different behavioral components involved in these decisions (Fig. 1B). Upon entering the OZ, mice can choose to accept the offer and enter the WZ or decide to skip and continue on to the next restaurant. Mice that enter the WZ can then decide to wait out the delay, thus earning food, or can re-evaluate and quit, forfeiting the pellet and continuing on to the next restaurant (Fig. 1B). Prior work 43 has suggested that the decision to skip or enter the WZ develops differently than the decision of whether to wait and earn or quit out of the WZ. Non-transgenic mice entered WZs for most offers that were presented to them, regardless of flavor, during the first week of the experiment when offer durations were very short (Fig. 2D, grey line). This pattern of accepting most offers was starkly different from that of APP mice, which began discriminating among the offers they accepted and instead started skipping offers (RM-ANOVA, F(1,13) = 7.801, p = 0.016; Fig. 2D). As costs began to increase with the progression of stages, nTG mice began entering WZs at the same rate as APP mice (Fig. 2D).
With the increase in offer length, all mice also began quitting accepted offers (RM-ANOVA between stages 2 and 3 indicated by the open star, F(1,13) = 6.062, p = 0.029; Fig. 2E). Lastly, we analyzed the thresholds of the willingness to enter an offer (OZ threshold), as well as the willingness to wait for the reward (WZ threshold) to assess how mice handled offers below or above their individual threshold. During the 1-5 s (2nd) and 1-15 s (3rd) stages while the reward environments were still relatively rich, OZ and WZ thresholds were equivalent in both groups, indicating that all mice, for the most part, decided to enter wait zones where the delay matched how long they were willing to wait and earn (Fig. 2F).
Together, these data from the initial stages of the experiment are consistent with prior work relying on this task, despite our use of older adult animals 43 . These results suggest that both APP and nTG littermates were able to learn this complex neuroeconomic decision-making task and to adapt to evolving scarcity.

APP Mice Adapt to Scarce Foraging Environment Faster than Control Mice
Upon transitioning to the final stage in which offers ranged from 1-30 s, all mice experienced a drop in the number of pellets earned as they learned to navigate the increased offer lengths (RM-ANOVA between stages 3 and 4, F(1,13) = 26.49, p = 0.0002, indicated by the open star, Fig. 2B). Almost immediately (on the second day of this stage, day 19 of the task), APP mice responded to this new environment, ran more laps, and were able to earn significantly more pellets than controls (RM-ANOVA, laps: F(1,13) = 4.022, p = 0.06; earns: F(1,13) = 8.018, p = 0.014, Fig. 2A, B). In fact, APP mice were able to renormalize their earnings in the reward-scarce environment to the earnings they had achieved in the reward-rich environments within a couple of days, i.e., by day 23 of the task (Fig. 2B, green bar). In contrast, nTG littermates did not reach pre-transition earnings until day 53 of the task (Fig. 2B, grey bar). Using this information, we analyzed this fourth and final stage of the experiment in three separate epochs. Prior to this final stage (before day 18) where the reward-distribution became scarce, OZ and WZ thresholds for both groups were equivalent, as mice generally took offers for which they were willing to wait and earn. However, immediately following the transition to 1-30 s offers, all mice initially took most of the offers in the OZ as indicated by the very high 20-25 s OZ thresholds for both groups that increased on day 18 (RM-ANOVA for transition between stages 3 and 4, F(1,13) = 67.56, p < 0.0001, indicated by the open star; Fig. 2F, black and pink lines) but quit the majority of trials that were accepted with a starting delay above 10 s as indicated by the ~10 s WZ thresholds, suggesting offer zone decisions were not in register with willingness to wait in the wait zone (Fig. 2F, grey and blue lines).
Over time and despite the scarcity of the environment, all mice developed lower OZ thresholds, indicating that the delay for which they were willing to enter the WZ approached the delay for which they were willing to wait in the WZ (Fig. 2F). Here again, APP mice adapted their behavior to this change in reward scarce environment quickly by skipping more offers than nTG mice by epoch B (days 23-52, RM-ANOVA, F(1,13) = 288.42, p < 0.0001; Fig. 2D). APP mice continued skipping more offers than nTG for the rest of the experiment (Epoch C days 53-70, RM-ANOVA, F(1,13) = 81.944, p < 0.0001; Fig. 2D).
By not entering WZs for high offers, APP mice showed a quick drop in OZ thresholds that became significantly lower than the OZ thresholds of nTG mice during epoch B of the last stage of training (days 23-52, RM-ANOVA, F(1,13) = 223.16, p < 0.0001; Fig 2F) and that eventually became in register with WZ thresholds (Fig. 2F). APP mice had significantly lower OZ thresholds than nTG mice during the rest of the experiment, even after nTG mice renormalized their food intake (epoch C, days 53-70, RM-ANOVA, F(1,13) = 39.31, p < 0.0001; Fig. 2F).
The fact that by the end of the experiment, OZ and WZ thresholds were not significantly different in APP mice suggests that APP mice generally entered the WZ when the offer was one that they would wait to earn. This hypothesis was further supported by examining how often APP mice quit upon entering the WZ. APP mice quit significantly less than control mice starting during epoch B (days 23-52, RM-ANOVA, F(1,13) = 97.94, p < 0.0001). Together, these data suggest that APP mice were more selective in the offers they chose to enter and were more likely to wait and earn pellets upon entering, whereas control mice were more likely to enter but also more likely to quit.

Mice Show Vicarious-Trial and Error in the Offer Zone
To better understand how mice approached the decision to enter or skip offers in the task, we measured deliberative behavior. To assess ongoing deliberation and planning at the decision point (OZ), we examined vicarious-trial and error (VTE), a behavioral phenomenon in which an animal pauses at a choice point and orients sequentially towards its options. VTE was estimated by calculating the absolute integrated angular velocity, IdPhi, whereby larger IdPhi corresponds to increased VTE 58 . Prior studies using RRow found that mice showed increased VTE (more biphasic trajectories, which include a sharp turn) when making the decision to skip, with less VTE (smooth entrances) when eventually deciding to enter 43 as can be seen in illustrations of examples from a nTG mouse from this study (Fig. 3A, B). When the environment was rich in rewards during stages 1-3, and as animals increased their knowledge of the task, VTE enter and VTE skip scores equally decreased from ~80 to ~20-40 for both nTG and APP groups (Fig. 3C). This is consistent with previous observations 43 .
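To make the IdPhi measure concrete, the following minimal sketch sums the absolute change in heading along a tracked path through the offer zone. It is a simplification under stated assumptions rather than the analysis pipeline used here: heading is estimated from simple finite differences of position samples, with no velocity smoothing, and the function name `idphi` and its inputs are hypothetical.

```python
import math


def idphi(xs, ys):
    """Integrated absolute change in heading (radians) along a path.

    xs, ys: equal-length sequences of tracked positions within one
    offer-zone pass (hypothetical input format).
    Assumption: heading comes from finite differences of raw positions.
    """
    headings = []
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        dy = ys[i] - ys[i - 1]
        if dx == 0 and dy == 0:
            continue  # no movement, so heading is undefined for this step
        headings.append(math.atan2(dy, dx))

    total = 0.0
    for previous, current in zip(headings, headings[1:]):
        change = current - previous
        # Wrap the change into [-pi, pi) so that crossing the +/-pi
        # boundary is not counted as a near-full rotation.
        change = (change + math.pi) % (2 * math.pi) - math.pi
        total += abs(change)
    return total
```

A smooth, direct entrance gives a value near zero, whereas a pause-and-reorient trajectory with sharp turns accumulates a large value, which is why larger IdPhi is read as more VTE.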
Though previous observations suggest the decision to skip required more VTE than the decision to enter 43 , we did not see this in APP mice. APP mice presented with an offer they would go on to accept showed a biphasic trajectory similar to what we would expect of an eventual skip. By the last 10 days of the experiment, when all animals had renormalized food and were making consistent behavior choices, nTG mice showed significantly lower VTE amounts when entering than when deciding to skip (one-way ANOVA within-group F(1,7) = 24.08, p = 0.0002; Fig. 3D). In contrast to the discrepancy nTG mice showed when skipping or entering, APP mice showed equal levels of VTE whether eventually entering or skipping (one-way ANOVA within-group F(1,6) = 3.222, p = 0.098). Student t-tests within- and between-groups (nTG enter, nTG skip, APP enter, APP skip) revealed significant differences between nTG enter and nTG skip (F(1,7) = 24.08, p = 0.0002; Fig. 3D) but no other significant group differences. Together, the VTE results suggested that, for nTG mice, the decision to enter the WZ or the decision to skip the offer in the OZ was dichotomous but APP mice appeared to treat the two decisions equally, requiring high deliberation upon making any decision.
Because previous data have demonstrated that low-value offers (where the delay is expensive relative to the individual's threshold) require more deliberation and more time to decide than high-value offers (where the delay is cheap relative to the individual's threshold) across species 44 , we next evaluated the amount of VTE based on the value of the offer. We defined the value of the offer by subtracting the given offer from the WZ threshold for that animal, i.e., Value = WZ threshold - Offer (concrete examples are provided in Fig. 3A, B). Across the spectrum of possible values, VTE distributions for both groups followed a quadratic function (last 10 days of the experiment; Fig. 3E), consistent with previous studies 43 . However, APP mice showed larger IdPhi values at peak than nTG mice (Fig. 3E) indicating that they showed more VTE. Importantly, the value at which the peak of the IdPhi distribution appeared was shifted to the right (higher value, better offer) for APP mice relative to nTG mice (-3 value compared to nTG value of -10). Segregating value offers between economically good deals (Value > 0) and bad deals (Value < 0) for the last 10 days when animals were well trained, control animals displayed markedly higher VTEs for bad offers and lower VTE for good offers (One-way ANOVA within group F(1,7) = 48.615, p < 0.001; Fig. 3F). This makes intuitive sense as it is a sound economic decision to take good, cheap offers. By contrast, turning down an offer, even if expensive, may take more processing when considering the other option is exploring other unknown opportunities. It may take more mental effort to leave a known, though expensive, reward in a scarce environment.
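The value definition and the good/bad split used above can be written out as a small sketch. The function names are hypothetical, and the label returned for an offer exactly at threshold (value of 0) is an assumption, since the text only defines good deals (Value > 0) and bad deals (Value < 0).

```python
def offer_value(wz_threshold_s, offer_s):
    """Value of an offer: the animal's WZ threshold minus the offered delay (seconds)."""
    return wz_threshold_s - offer_s


def classify_offer(wz_threshold_s, offer_s):
    """Label an offer as a good deal (value > 0) or a bad deal (value < 0).

    Assumption: offers exactly at threshold get their own label, because
    the text does not assign them to either category.
    """
    value = offer_value(wz_threshold_s, offer_s)
    if value > 0:
        return "good"
    if value < 0:
        return "bad"
    return "at_threshold"
```

For a mouse with a 10 s WZ threshold, a 4 s offer has a value of +6 and counts as a good deal, while a 25 s offer has a value of -15 and counts as a bad deal.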
Importantly, APP mice did not show this distinction and instead showed similar amounts of VTE (One-way ANOVA, F(1,6) = 0.601, p = 0.453; Fig. 3F) prior to deciding, regardless of whether the offer was good or bad. Analyses between groups for both good and bad values showed that while nTG and APP mice deliberated equally for bad offers (Fig. 3F), APP mice showed higher VTE (F(1,14) = 4.338, p = 0.05; Fig. 3F) for good offers than nTG mice. These data further support the conclusion that nTG mice deliberated more when they had to make skip decisions in conflict with their approach desires, but that APP mice deliberated more regardless of whether the offer presented was economically cheap or expensive, or whether they skipped or entered.
This leads to the question as to why APP mice deliberated more for good offers? In a scarce reward environment, good offers (value > 0) should be taken. nTG mice reliably demonstrate this, with low VTE (smooth entrances) for values greater than 0. By the end of the experiment, this should be procedural and habitual. The fact that APP mice did not show smooth entrances for good offers, and instead showed high amounts of deliberation (equal to amounts of bad offers) led us to question if APP mice had impaired procedural decision-making. To investigate this, we examined the amount of VTE animals demonstrated for distinct flavor preferences. All animals showed reliable flavor preferences throughout the experiment, earning more of their preferred flavor than their least preferred flavor (Suppl. When we evaluated VTE for flavor preferences by stages throughout the experiment, nTG mice showed differences in VTE by flavor for every stage, except the first stage where costs were low (Suppl. Figure 6A; first stage, F (3,196

Vicarious-Trial and Error Used Differently throughout the Experiment
Time spent in the OZ is wasted time from the time budget. It would be economically beneficial for mice to enter the WZ directly, make the decision while in the WZ, and then quit as soon as they realized they had entered a bad deal. However, previous work has found that neither mice, rats, nor humans behave this way 42,43,58 , and that instead all are sensitive to aspects of choice history beyond the value strictly tied to the reward itself. Additional information, which may be associated with fluctuations in affective state, is also taken into consideration in the OZ and WZ. Previous work has found that OZ time does reduce the efficiency of getting food on the RRow task 43 . However, we observed that APP mice displayed enhanced deliberative processes (i.e., higher VTE) and earned more food than nTG mice, suggesting that the APP behaviors produced an increased efficiency. To determine the role VTE was playing in both the nTG and APP mice, we simulated the number of earns one would expect if the mice had hypothetically displayed different VTE behaviors in relation to OZ choice outcome. First, we identified high-VTE vs. low-VTE trials for each mouse defined by a median split among all VTE values for each day, as previously reported 43 . We then took two approaches to simulate hypothetical alternatives to the total number of pellets earned: replace the trial outcome of high-VTE trials with that of low-VTE trials or simply force all high-VTE trials to end with no food earned before calculating expected earns for that day. Finally, we looked at all of the stages of the experiment, including the three epochs in the last stage (epoch A, no food renormalization: days 18-22; epoch B, APP renormalized: days 23-52; epoch C, all animals renormalized: days 53-70) to see how patterns in VTE are related to the number of earns throughout the process of mastering the task (Fig. 4A).
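The two simulations described above can be sketched for a single mouse on a single day as follows. This is a hedged illustration, not the authors' code: the function name, the `(vte, earned)` trial format, and the handling of ties at the median (assigned to the low-VTE group) are assumptions, and "replacing" a high-VTE outcome is implemented here as assigning each high-VTE trial the earn rate of that day's low-VTE trials, which is one plausible reading of the description. The sketch also ignores any time that would be saved or lost under each scenario.

```python
from statistics import median


def simulate_earns(trials):
    """Observed and simulated pellet counts for one mouse on one day.

    trials: list of (vte, earned) pairs, where vte is the IdPhi score of
    the offer-zone pass and earned is True if a pellet was obtained
    (hypothetical input format).
    """
    if not trials:
        return {"observed": 0, "replaced": 0.0, "removed": 0}

    # Median split of the day's VTE values; ties go to the low-VTE group
    # (an assumption), so the low group is never empty.
    cutoff = median(vte for vte, _ in trials)
    low = [earned for vte, earned in trials if vte <= cutoff]
    high = [earned for vte, earned in trials if vte > cutoff]

    observed = sum(low) + sum(high)

    # "Replaced": each high-VTE trial takes the expected outcome of a
    # low-VTE trial from the same day.
    low_earn_rate = sum(low) / len(low)
    replaced = sum(low) + low_earn_rate * len(high)

    # "Removed": every high-VTE trial is forced to end with no food.
    removed = sum(low)

    return {"observed": observed, "replaced": replaced, "removed": removed}
```

Comparing the `replaced` and `removed` counts against `observed` across days gives simulated earning curves of the kind contrasted with the observed data in the following paragraphs.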
Interestingly, in simulated nTG mice, removing high VTE trials renormalized their food intake faster than the observed mice (day 48 compared to day 53) though the number of earns they needed to get back to was lower (70 daily earns compared to 80 daily earns).
For APP mice, the same trend replicated. Replacing high VTE trials with low VTE trials significantly increased their earnings during the last stage of training (RM-ANOVA, epoch A; Fig. 4B). Unlike nTG mice, whose food intake still renormalized faster when high VTE trials were removed altogether, simulated APP mice took longer to renormalize food intake than the observed APP mice (renormalizing by day 30 instead of day 23), indicating that removing these trials penalized them heavily.
ANOVA analyses comparing the means of earns during the last stages revealed that, for nTG mice, replacing high VTE had the most impact on earning potential (Fig. 4C), particularly during the last epoch (epoch C; F(2,21) = 20.72, p < 0.0001) with simulated nTG animals who had high VTE trials replaced with low VTE trials earning more than observed mice (post-hoc analysis: p = 0.0003). APP mice did not show significant differences when the earnings are averaged and compared between simulations (Fig. 4C).
Lastly, when simulations were compared by condition (Fig. 4D-F), it became clear that nTG mice benefit from having high VTE trials removed or replaced with low VTE trials. In the observed data, APP mice earn more than their nTG littermates throughout the last stage of the experiment (Fig. 4D). However, in both removed (Fig. 4E) and replaced simulations (Fig. 4F), nTG mice are able to earn as much as their APP counterparts by the last epoch (days 53-70). These data imply that some deliberation is important for maximum earnings but that too much deliberation is costly, for both nTG and APP mice. However, it is important to note that the impact of deliberation was stronger for nTG mice, who had much more to gain by deliberating less than their median amount. This also suggests that VTE may be serving other purposes beyond just reinforcement maximization in nTG mice.
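The two simulations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the `Trial` record and the function name are hypothetical, trials at or below the daily median are treated as low-VTE, and the "replaced" simulation is assumed to assign each high-VTE trial the mean earn rate of that day's low-VTE trials. Time costs of deliberation are not modeled.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Trial:
    """One offer encounter (hypothetical record layout)."""
    vte: float     # IdPhi-style VTE score measured in the offer zone
    earned: bool   # True if the trial ended with a pellet


def simulate_day(trials):
    """Return (observed, removed, replaced) earns for one mouse on one day."""
    daily_median = median(t.vte for t in trials)
    low = [t for t in trials if t.vte <= daily_median]
    high = [t for t in trials if t.vte > daily_median]

    observed = sum(t.earned for t in trials)

    # "Removed": every high-VTE trial is forced to end with no food.
    removed = sum(t.earned for t in low)

    # "Replaced" (assumption): each high-VTE trial takes the mean outcome
    # of the low-VTE trials from the same day.
    low_rate = removed / len(low) if low else 0.0
    replaced = removed + low_rate * len(high)

    return observed, removed, replaced


example_day = [Trial(1.0, True), Trial(2.0, True), Trial(3.0, False), Trial(4.0, True)]
observed, removed, replaced = simulate_day(example_day)
```

In this toy day the two low-VTE trials both earn and one of the two high-VTE trials earns, so the removed simulation yields fewer earns than observed and the replaced simulation yields more.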

Newly identified striatal Aβ deposition in APP mice
Because amyloid plaques are already present in the parenchyma of APP mice at 9 months of age 27,31,33,56 (age at which mice ended the RRow task), we wondered whether development of amyloid-beta (Aβ) pathology could explain the altered behavior of APP animals compared to nTG littermates. Despite the extensive use of J20 APP transgenic mice as a model of AD, the spatiotemporal distribution of amyloid pathology in this model has only been sporadically characterized (https://www.alzforum.org/research-models/j20-pdgf-appswind). While recent work provided a whole-brain assessment of methoxy-X04-positive plaques in this line 33 , the approach used inherently missed early amyloid pathology present at the onset of cognitive deficits in hippocampus-dependent tasks 54,57 . To fill this knowledge gap and to establish the association of possible pathologically impacted neural networks with underlying behavioral changes in RRow, we sought to gain a better understanding of the overall localization of amyloid deposits in APP mice using an unbiased approach. Immediately following the RRow task, brains from 9-month-old APP and nTG mice were mass-processed using MultiBrain ® technology and analyzed by immunolabeling using the antibody 6E10 to spatially characterize amyloid plaque localization (Fig. 5A). No staining was observed when the antibody was used to stain tissue from nTG mice. Moreover, each Aβ plaque was confirmed morphologically by examination at high-power magnification (Fig. 5B) and spatial registration was performed using the sagittal Allen Brain Reference Atlas (Allen Institute for Brain Science ® ). Confocal image analysis revealed substantial amyloid plaque deposition in the cortex and hippocampus (Fig. 5A) as previously described 27,31 . Unexpectedly, striatal Aβ plaque density (Fig. 5C) and amyloid plaque number (Fig. 5D, E) were just as high as those seen in the isocortex and hippocampus, respectively.
Further segmentation of the major brain divisions revealed that amyloid plaque frequency was by far the highest in two subregions, the corpus callosum (11.72% [15/128]) and the caudoputamen (10.93% [14/128]; Fig. 5E). Amyloid burden was present in all three functional divisions of the striatum, including dorsolateral striatum, dorsomedial striatum and ventral striatum, often defined as sensorimotor, associative and limbic striatum respectively 59 . To our knowledge, these results are the first to document the deposition of Aβ in the striatum of middle-aged APP animals, a structure well-established for its role in value-based decision-making and in the learning of reward associations 59,60 .

Behavioral Differences in APP Mice were Sex Differentiated
The effect of sex is especially important for AD (recently reviewed here 61 ) for which phenotype heterogeneity is an intrinsic characteristic. Specifically, women typically present with faster rates of cognitive decline during mild cognitive impairment (MCI) or prodromal AD, and brain atrophy rates are also 1-1.5% faster in women with MCI and AD compared to men 62,63 . These sex differences are noteworthy considering that putative sex effects on Aβ burden remain unclear in AD. In mice, despite well-established sex differences whereby earlier-onset amyloid pathology has been consistently noted in females across multiple APP transgenic models 26 , sex effects on synaptic or cognitive deficits have not been studied systematically. For these reasons and because previous work from our group only used male mice, we were intrigued to determine whether decision-making processes altered in APP mice were driven by sex.
Following the recommendations made by Shansky 64 , analyses originally compared putative effects of transgene expression between nTG and APP groups but post-hoc data examination also included potential effects of sex differences in both groups of animals. We found significant behavioral differences between males and females in both nTG and APP mice. Male and female nTG mice looked similar in many behaviors, with the exceptions of laps run and WZs quit (Suppl. Figure 7, including panel E).
We also found significant sex differences amongst APP mice (Suppl. Figure 8, including panel E). Together, these data suggest that male APP mice were more selective about the offers they entered (by entering significantly fewer WZ than females) and less likely to quit upon entering. By contrast, the OZ and WZ threshold patterns displayed by female APP mice were reminiscent of those generally observed in nTG controls.
Given that female APP mice looked more similar to nTG mice than male APP mice, we examined whether sex affected choice deliberation differentially in APP animals. When examining VTE behavior in the OZ, male APP mice had higher VTE values than female APP mice (RM-ANOVA; Figure 6D).

Spatial differences in Aβ pathology are sex specific in APP mice
Upon identifying sex-dependent alterations in decision-making from APP mice performing the RRow task, we wondered whether amyloid burden was also influenced by sex differences. Unlike previous studies which reported enhanced amyloid deposition in female animals from several other APP transgenic mouse models 26 , averaged plaque numbers did not differ between APP male and female mice (t test, p = 0.424, Suppl. Figure 9A). However, factoring in the localization of Aβ deposits revealed that the proportions of amyloid plaques per brain differed between the sexes (Suppl. Figure 9B-D). Notably, inverse proportions of plaque deposition were observed in male APP mice compared to female APP littermates in isocortical and hippocampal divisions: Aβ deposits were proportionally more abundant in the isocortex of female animals (33 in female vs. 27 in male mice respectively) but half as abundant in their hippocampi (9 in female vs. 18 in male mice respectively; Suppl. Figure 9C). This observation was further supported upon increasing segmentation of the brain subdivisions by areas and layers, when applicable (Suppl. Figure 9D). Out of 27 cortical areas/layers, amyloid plaques were detected in 20 specific locales from female APP brains (i.e., 74.1%) whereas only 11 locales displayed Aβ deposits in male APP mice (i.e., 40.7%). In sharp contrast, amyloid plaques were found in all hippocampal fields of male APP brains, unlike in female APP mice, and their relative proportions within these locales were also much higher, as exemplified by the 80/20 and 66/33 ratios observed for CA1 and subiculum respectively (Suppl. Figure 9D). This novel sex-specific finding is particularly interesting considering that both of these hippocampal domains critically determine diverse behavioral and cognitive functions.

Striatal inhibitory network alterations in male APP mice
Because network activities supporting cognition are altered prior to disease onset in AD and because interneuron dysfunction has emerged as a potential mechanism underlying these network abnormalities (see for review 65 ), we measured the ectopic expression of neuropeptide Y (NPY), a well-established marker of molecular alterations linked to the network remodeling in APP mice 27,32,66-68 , in all mice subjected to RRow. Adapting the approach developed by the Mucke group 66 for confocal imaging analysis, we observed a ~2-fold increase in NPY immunoreactivity in mossy fiber axons in the stratum lucidum (SL) of APP mice (t test, F(1,15) = 3.5321, p = 0.0414; Suppl. Fig. 10A,B). A more modest elevation in NPY immunoreactivity was also found in the stratum lacunosum moleculare (SLM) of APP mice (t test, F(1,15) = 4.4108, p = 0.0279; Suppl. Fig. 10A,C). In both subfields, putative sex effects were not found (two-way ANOVA; Figure 10E). These hippocampal findings are consistent with prior observations from 7- to 10-month-old APP animals 32 .
Since Aβ pathology was newly identified in the striatum of APP mice, we also evaluated the ectopic expression of NPY in the caudoputamen and nucleus accumbens (Fig. 7A). When comparing nTG and APP mice, NPY immunoreactivity was similar across genotype groups (t test; Fig. 7B,C). However, considering sex as a variable revealed sex-specific alterations in NPY expression whereby male nTG mice displayed high levels of ectopic NPY expression in the caudoputamen which were not present in male APP animals (two-way ANOVA followed by Tukey HSD, F(3,15) = 9.3868, P = 0.0023 with an effect of transgene F(1,15) = 5.0201, P = 0.0406, sex F(1,15) = 13.5644, P = 0.0036, and transgene*sex interaction F(1,15) = 8.0451, P = 0.0162). In addition, these high basal amounts of caudoputamen NPY expression in male nTG mice were not observed in female nTG mice (Fig. 7D), consistent with a previous report detailing a sex-specific difference of striatal NPY expression in rats (70). Despite falling just short of statistical significance, similar trends were observed in the nucleus accumbens (two-way ANOVA followed by Tukey HSD, F(3,14) = 3.6874, p = 0.0507). Altogether, these results suggest that Aβ pathology is associated with alterations in network activity in the striatum of APP mice.

DISCUSSION
To better understand how complex decision-making may be altered with early stages of Alzheimer's disease, we subjected a widely used mouse model of AD, the J20 line, and young adult nTG littermates to a neuroeconomic spatial foraging task called RRow that accesses multiple decision-making systems. We found that the strategies used by APP mice on this task differed drastically from those of the non-transgenic controls 43 , with APP animals relying heavily on VTE events and showing impaired procedural decision-making. These behavioral differences were largely accounted for by the male APP mice, with female APP mice behaving similarly to nTG mice. Neuropathological analyses of Aβ deposits throughout the whole brain unexpectedly revealed that 9-month-old APP mice displayed much more widespread plaque aggregation than has been reported previously. This was particularly notable in the striatum, where plaque deposition has not previously been reported in this line. Along with this plaque deposition, ectopic NPY expression was significantly decreased in the striatum of male APP mice, suggesting network remodeling of this region.
Theories of decision-making suggest that there are at least three dissociable systems: a Pavlovian system that chooses unconditioned responses based on associations between stimuli and outcomes 16,40 , a procedural system that chooses actions based on learned associations between actions and stimuli 16,39 , and a deliberative system that considers how an action influences future possibilities 15,17,35 . Vicarious-trial-and-error behavior has been shown to be indicative of deliberation 58 . During VTE, in rats, hippocampal representations sweep forward, alternating potential goals that are synchronized with reward value representations suggesting that outcome predictions are being evaluated 58,69,70 . Control mice in this task showed higher VTE when eventually deciding to skip an offer versus deciding to accept an offer, suggesting that the decision to accept an offer, particularly a good offer, likely arises from a non-deliberative decision system (either Pavlovian or procedural). This discrimination replicates previous observations of mice on this task 43 . Additionally, nTG mice showed higher VTE when presented with bad offers (value below their threshold to wait; see methods) and with their least preferred flavor of food. In those cases, deliberative decision-making is used to counter instinctual approach behaviors. In stark contrast to the distinction nTG mice demonstrated, APP mice showed equal amounts of VTE whether deciding to skip or enter, suggesting deliberative mechanisms of decision-making were paramount in both taking and rejecting offers. Additionally, APP mice deliberated equally for both good and bad offers (high or low values), as well as for their most and least preferred flavors, which was fundamentally different than the nTG control mice.
Our observations largely suggest that APP mice rely excessively on deliberation even in situations where nTG mice use procedural systems. One hypothesis is that the excessive deliberation observed in the APP mice reflects an inability to process the planning signals normally generated by the hippocampus, potentially due to the amyloid pathology in the hippocampus. Another hypothesis is that the amyloid pathology in the striatum disrupted procedural decision-making, forcing reliance on the deliberative system. Evidence largely supports the latter hypothesis, as APP mice actually normalize their behavior faster than nTG controls. This suggests that the excessive deliberation observed was functional and helpful to their processing. This disrupted procedural decision-making hypothesis is further supported by the sex-linked differences and changes in NPY expression observed in the striatum.
As RRow gives us the ability to examine multiple forms of decision making that encompass multiple brain regions, it was of critical importance to us to possess a broad characterization of amyloid pathology throughout the brain. Previous work has extensively examined amyloid plaques in the hippocampus 27 throughout the lifespan of APP mice, but few studies have determined amyloid burden across the brain 33 . The mice used in our study were 9 months of age at the time of tissue processing and showed significant pathology in the hippocampus, as was expected. What was not expected was how dense plaque aggregation was in the striatum. In these animals, the striatum showed the second highest plaque density. The fact that both the hippocampus and the striatum had high plaque deposition suggests that neuronal functioning may have been impaired in the striatum as much as, or even more than, in the hippocampus. It is well established that the hippocampus is centrally implicated in spatial navigation and memory 15,35 . Although proper lateral striatal functioning is known to be important for procedural decision making 71 , the hippocampus and the medial striatum likely work in concert to switch between rigid and flexible behaviors through connections with the prefrontal cortex 13,34,72 . Thus, the fact that both the hippocampus and the striatum showed plaque deposits suggests that the impairment in procedural decision-making displayed by APP mice could result from effects of pathology, perhaps on either procedural decision systems or on decision-system-conflict mediation systems.
It has recently emerged that including imaging data from subcortical areas might be critical for the clinical presentation of AD. Supporting this view, several studies reported that high striatal Aβ burden predicts faster cognitive decline than high cortical Aβ loads in humans with mild cognitive impairment (MCI) or AD 22,23 . It is important to note that despite this advancement, the impact of these striatal amyloid plaques on human cognition is currently unknown and requires further study. The neuropathological findings of our study document large depositions of Aβ plaques in the striatum of APP mice, reminiscent of the striatal Aβ plaques recently reported in subjects with AD 22,23 . This parallel is even more striking considering that familial AD is distinguished from late-onset AD by early striatal Aβ deposition and considering that APP mice like J20 animals harbor FAD mutations 23,27 . The novel description that Aβ deposition in the striatum of APP mice is associated with abnormal alterations of procedural decision-making provides support for an impact of striatal Aβ pathology on behavior for the first time in this model. Because RRow has been successfully translated to a similar neuroeconomic task called WebSurf for use in human subjects 44 , our studies using preclinical models of AD open up the possibility of testing patients with prodromal AD or MCI using this approach.
Because we included both males and females in our experimental design, we conducted post-hoc analyses to investigate any potential sex differences in the behavior we observed. Epidemiologic studies of AD have reported that women typically present with a more rapid cognitive decline 62 , while imaging studies indicated that women also present with exacerbated brain atrophy when compared to men 63 . These observations and others indicate that sex is an important contributor to disease heterogeneity 61 , although the causes underlying these potential differences are unclear. In most mouse models of AD, female mice have been shown to develop Aβ pathology earlier than male mice 26 . In our studies, the most striking behavioral differences between nTG and APP mice, namely high reliance on deliberative behavior and impaired procedural decision-making in APP mice, were largely carried by male APP animals. In fact, in most cases, female APP mice looked very similar to control nTG littermate animals. Although the average plaque number was similar between males and females, segregating by subregions revealed that males carried a higher plaque burden in the hippocampus, whereas females carried the largest Aβ burden in the isocortical regions. It is thus possible that the sex differences in observed behaviors are due to this differential localization of Aβ deposits. Further studies will be needed to rigorously establish causal links.
These data also highlight the need for sensitive behavioral paradigms that allow for the dissociation of multiple valuation processes which is difficult using standard tasks that have been previously used to characterize APP mice. For instance, APP mice of the same age as used in this study are characterized as having impaired spatial reference memory. This has been assessed using standard spatial memory tasks, such as the Barnes maze and the Morris water mazes 54,57 . Duration to find the platform or hole and the distance used to find the platform/hole are the primary quantitative measures for cognitive memory performance in these tasks. A more detailed analysis of behaviors used when searching for the goal allows for a finer dissection of neural circuits involved and elucidates different strategies that may be used that are obscured by measurements of only distance and speed. Recent examples of groups using the Morris water maze have discovered different search strategies between transgenic and non-transgenic mice 73 . These fine analyses allow for group differences to be revealed even if both groups look equivalent in other more commonly reported measures such as distance traveled or latency to find the platform. Without careful examination and dissection of the finer details of behavior, important findings are obscured. When only considering earns in our task, it would have been easy to just say that APP mice did better at this task than nTG mice or that they learned the task faster. What is lost in limited analyses would be the very disparate strategies used by both groups in response to a changing economic environment.
In conclusion, APP mice displayed impaired procedural decision-making and, to compensate for this impairment, relied on deliberation to succeed in a neuroeconomic decision-making task. To our knowledge, this is the first study to report deposition of Aβ in the striatum of the J20 mouse line, which is associated with aberrant ectopic expression of NPY and sex-specific alterations in decision-making while performing a neuroeconomic task. These studies are directly relevant to the recently described striatal deposition of Aβ in FAD carriers and may extend to subjects with late-onset AD.

Mice
Eight transgene-positive (4 male and 4 female) and eight transgene-negative J20-C57BL/6J APP transgenic mice (4 male and 4 female; The Jackson Laboratory, #006293), all 6 months of age, were used for this study. During the experimental procedure, one female J20 mouse became sick and was eliminated from the study, leaving seven J20 mice (NJ20 = 7). J20 mice express a mutant form of human APP by way of the Swedish (KM670/671NL) and Indiana (V717F) mutations, resulting in higher levels of total human amyloid-beta and an increase in the Aβ42/Aβ40 ratio, respectively. This transgene is driven by the PDGF-β promoter. Mice were single-housed (beginning at 6 months) in a temperature- and humidity-controlled environment with a 12 h light/12 h dark cycle with water ad libitum. Mice were food restricted to 90% free-feeding body weight and trained to earn their entire day's food ration during their 1-hour Restaurant Row session. All experiments were approved by the University of Minnesota Institutional Animal Care and Use Committee (IACUC) and adhered to NIH guidelines. All mice were tested at the same time (beginning at 8:30 am) every day during their light phase in a dimly lit room. Mice were weighed before and after each session to make sure they were above 90% of their free-feeding body weight and were fed a small post-session ration of food (1.5-2 g) whenever their body weight fell below the 90% guideline.

Pellet training
One week prior to the start of training on Restaurant Row, mice were trained to eat the pellets that were used in the task. During this time, mice were taken off their regular food and introduced to a single daily serving of BioServ full-nutrition 20 mg dustless precision pellets (5 g). This serving consisted of an equal mixture of all four flavors found in the maze: chocolate, banana, grape, and plain. One day prior to the start of training, mice deprived of their previous day's ration were introduced to the maze. Each mouse was given 20 minutes to explore the maze and familiarize itself with the feeding sites, which were filled with excess food of the particular flavor that would be found during the experiment. Each restaurant location was marked with unique spatial cues and remained constant throughout the entire experiment.

Restaurant Row
In this task, food-deprived mice are allowed one hour to traverse a square maze with four feeding sites that offer different flavors of food at varying delays (i.e., costs). Mice need to learn to balance their food preferences against the potential cost at each reward-site encounter in order to obtain their only source of food for the day. Because mice have limited time to forage on the task, the delays that they must wait for food are analogous to costs spent from a limited (time) budget. As mice progress through the stages of learning, the range of delays increases and thus the reward environment grows increasingly scarce. As rewards become scarce, conflicts in decision-making arise, forcing the animals to adopt new foraging strategies because those that sufficed in previously rich environments may no longer do so. Each daily session lasted one hour. At the beginning of the test, one restaurant was randomly selected to be the starting restaurant. An offer was made if mice entered the restaurant's offer zone (OZ) from the appropriate direction in a counterclockwise manner. An offer began when the mouse entered the OZ and consisted of a delay that the mouse would need to wait before earning a pellet upon entering the wait zone (WZ). Brief tones (4,000-15,223 Hz for 500 ms followed by 500 ms of silence) sounded upon entry into the OZ, with pitch indicating the delay of the offer.
Tones repeated every second until mice either left the OZ for the next restaurant (skip) or entered the WZ (enter). Upon entering the WZ, the tones counted down (in 387 Hz steps) each second until the mouse either left (and quit) the WZ, or the countdown reached 0 (following the final 1 s tone = 4,000 Hz), at which point a pellet was dispensed (earn). If the mouse left (quit) the WZ during the countdown, the tone stopped, and the offer was rescinded. To discourage mice from hoarding earned pellets, motorized feeding bowls cleared uneaten pellets after the mouse exited the WZ. Mice quickly learned not to leave the WZ without consuming the earned pellet. The next restaurant in the counterclockwise sequence was always and only the next available restaurant where an offer could be made. This ensured mice encountered offers across all restaurants in a fixed consecutive order. Training was broken into four stages. During the first stage (days 1-7), mice were given only 1 s offers for all restaurants. During the second stage (days 8-12), mice were given offers that ranged from 1 to 5 s (4,000 Hz to 5,548 Hz in 387 Hz steps). Offers were pseudo-randomly selected, such that all 5 offer lengths were encountered in 5 serial trials before being reshuffled, ensuring a uniform distribution of offer lengths. Stage 3 (days 13-17) consisted of offers from 1-15 s (4,000-9,418 Hz). Stage 4, the final stage (days 18-70), consisted of offers ranging from 1-30 s (4,000-15,223 Hz). Again, all offers in all stages were pseudo-randomly selected in each restaurant independently and all offers were encountered before being reshuffled. To assess flavor preferences, the total earnings of each flavor at the end of the session were examined. Flavors were ranked from most earned to least earned for each individual mouse. Flavor preferences were established by the beginning of the second stage and remained consistent throughout the rest of the experiment.
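The tone and offer schedule described above can be illustrated with a short Python sketch. It is not the task software (which ran in AnyMaze); the function names are hypothetical. It assumes that the pitch for an offer of d seconds is 4,000 Hz plus 387 Hz for every second beyond the first, which reproduces the stage limits given in the text, and that the wait-zone countdown steps through the pitches for the remaining delay, starting from the offered delay.

```python
import random

BASE_HZ = 4000   # pitch of a 1 s offer
STEP_HZ = 387    # pitch increment per additional second of delay


def tone_hz(delay_s):
    """Pitch (Hz) signalling an offer of `delay_s` seconds."""
    return BASE_HZ + STEP_HZ * (delay_s - 1)


def countdown_tones(offer_s):
    """Pitches played once per second in the wait zone (assumed to start at the offer)."""
    return [tone_hz(s) for s in range(offer_s, 0, -1)]


def offer_stream(max_delay_s, rng):
    """Yield offers for one restaurant: every delay from 1..max once, then reshuffle."""
    while True:
        bag = list(range(1, max_delay_s + 1))
        rng.shuffle(bag)
        yield from bag


# One independent stream per restaurant, here for the final 1-30 s stage.
streams = [offer_stream(30, random.Random(seed)) for seed in range(4)]
first_offers = [next(stream) for stream in streams]
```

Drawing from a shuffled bag rather than sampling with replacement is what guarantees that every offer length is encountered once before any repeats within a restaurant.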
Four Audiotek tweeters positioned next to each restaurant were powered by Lepy amplifiers to play tones at 70 dB in each restaurant. Med Associates 20 mg feeder pellet dispensers and 3D-printed feeder bowls fashioned with mini-servos to control automated clearance of uneaten pellets were used for pellet delivery. Animal tracking, task programming, and maze operation were powered by AnyMaze (Stoelting).

Quantification of Inhibitory Network Activity
To investigate inhibitory network activity, we followed previously described protocols for quantification of NPY immunofluorescence 66 . In brief, sagittal sections embedded and sectioned by NeuroScience Associates were stained with anti-neuropeptide Y (Cell Signaling 11976, 1:400), goat anti-rabbit Alexa Fluor 555 Plus (ThermoFisher A32732, 1:400), MAP2, goat anti-chicken Alexa Fluor 647 Plus, and imaged using an Olympus FV1000 microscope. Hippocampal sections were sequentially acquired using a 10x objective with NA of 0.4 and striatal sections were sequentially acquired using a 4x objective with NA of 0.16. For imaging both structures, the Alexa Fluor 555 Plus was excited at 559 nm and 10.0% transmissivity and emissions were collected from 575-620 nm with a Kalman integration of 10. Maximum grey values were lowered in FIJI to match the upper end of the distribution peak. The minimum grey value was increased until the background level measured in the granule layer of the dentate gyrus for the hippocampus or the corpus callosum for the striatum was consistent across all images of the same structure. For quantification of NPY expression in the hippocampus, regions of interest were selected over the stratum lacunosum moleculare (SLM) and the stratum lucidum (SL). In the striatum, the selected regions of interest were the caudoputamen (CP) and the nucleus accumbens (NAc). All regions of interest were drawn over the MAP2 channel with the help of the Allen Brain Atlas. The calculated mean grey values of the regions of interest were normalized to their structure's respective background control area that was used to determine the minimum grey value. In addition, two sections for each animal were stained and used to calculate average values for the regions of interest.
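As a minimal sketch of the last two steps, the per-animal value can be computed as follows. The text does not state whether normalization was a ratio or a subtraction; the ratio used here is an assumption, and the function name and example grey values are hypothetical.

```python
def normalized_npy(sections):
    """Average background-normalized NPY signal for one animal and one region.

    `sections` is a list of (roi_mean_grey, background_mean_grey) pairs, one
    pair per stained section (two sections per animal in the protocol above).
    Normalization is assumed to be ROI / background.
    """
    ratios = [roi / background for roi, background in sections]
    return sum(ratios) / len(ratios)


# Two sections of one animal: ROI over the caudoputamen, background over the
# corpus callosum (made-up grey values).
cp_signal = normalized_npy([(120.0, 60.0), (90.0, 60.0)])
```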

Statistical analysis
All data were processed in Matlab and statistical analyses were carried out using JMP Pro 13 Statistical Discovery software package from SAS. All data are expressed as mean ± SEM. Offer zone thresholds were calculated by fitting a sigmoid function to offer zone choice outcome (skip versus enter) as a function of offer length for all trials in a single restaurant for a single session and measuring the inflection point. Wait zone thresholds were calculated by fitting a sigmoid function to wait zone outcomes (quit versus earn) as a function of offer length for all entered trials in a single restaurant for a single session. For analyses that depend on thresholds, analyses at each timepoint used that specific timepoint's threshold information. Statistical significance was assessed using Student t tests, one-way, two-way, and repeated measures (RM) ANOVAs, using mouse as a random effect in a mixed model, with post-hoc Tukey tests correcting for multiple comparisons. Immediate changes at block transitions within group were tested using repeated measures ANOVA between 1 d pre- and 1 d post-transition. These are indicated by significance annotations on the dotted lines denoting transitions on relevant figures. Behavioral differences between groups were tested using a repeated measures ANOVA across all days within a given block. These are indicated by significance annotations within the plot. The period of renormalization was estimated based on animal-driven performance improvements in the 1-30 s stage and was not imposed on the animals by the experimenters or the protocol design. Renormalization was characterized by identifying the number of days in the 1-30 s block after which total pellet earnings and reinforcement rate reliably stabilized and were no different from performance in relatively reward-rich environments in earlier stages of the experiment.
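The threshold calculation can be illustrated with a small, dependency-free Python sketch. The original analysis was run in Matlab and the exact sigmoid and fitting routine are not specified, so this is an assumption: a two-parameter logistic curve fitted by Newton's method, with a small ridge term added only for numerical stability. The function names are hypothetical. The offer value helper follows the definition used in the figure legends (WZ threshold minus offer).

```python
from math import exp


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def fit_threshold(offers, outcomes, iters=50, ridge=1e-3):
    """Inflection point of a logistic fit of outcome (1 = enter/earn) on offer length."""
    mean_x = sum(offers) / len(offers)   # center offers for stability
    a = b = 0.0                          # intercept and slope on centered offers
    for _ in range(iters):
        g_a, g_b = -ridge * a, -ridge * b   # gradient of penalized log-likelihood
        h_aa, h_ab, h_bb = ridge, 0.0, ridge  # negative Hessian entries
        for x, y in zip(offers, outcomes):
            z = x - mean_x
            p = _sigmoid(a + b * z)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * z
            h_aa += w
            h_ab += w * z
            h_bb += w * z * z
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det   # Newton update
        b += (h_aa * g_b - h_ab * g_a) / det
    return mean_x - a / b   # offer length at which p = 0.5


def offer_value(wz_threshold_s, offer_s):
    """Value of an offer: willingness to wait minus the delay offered."""
    return wz_threshold_s - offer_s


# Toy session for one restaurant: the mouse enters short offers, skips long
# ones, and is inconsistent around 11-14 s.
offers = list(range(1, 31))
entered = [1 if (x <= 10 or x in (12, 14)) else 0 for x in offers]
oz_threshold = fit_threshold(offers, entered)
```

The same function applies to wait zone thresholds by passing only the entered trials with outcomes coded as earn (1) versus quit (0).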

Acknowledgements & Funding Sources
This work was supported by grants from the National Institutes of Health (NIH) to SEL (RF1-AG044342, R21-AG065693, R01-NS092918, R01-AG062135 and R56-NS113549), to ADR (R01-MH080318, R01-MH112688) and to MJT (R01-DA041808). Additional support included start-up funds from the University of Minnesota Foundation and bridge funds from the Institute of Translational Neuroscience to SEL.

Data availability statement
The datasets generated during and/or analyzed during the current study will be available on Dryad. Food-restricted mice (90% free food body weight) were trained to run counter-clockwise for flavored rewards in 4 "restaurants". Restaurant flavor and location were identifiable with specific contextual cues and did not move location throughout the experiment. Each restaurant contained a separate offer zone (OZ) and wait zone (WZ). Tones sounded once the animal entered the offer zone. Fixed tone pitch indicated delay that mice would have to wait in the wait zone before earning a pellet. Upon entering the wait zone, tone pitch descended during delay "countdown". Mice could quit the wait zone for the next restaurant during the countdown, terminating the trial. Mice were tested daily for 60 min in which they received their full food ration for the day.  decision thresholds as a function of cost (offer delay). Early in training, OZ and WZ thresholds are equivalent, indicating that mice entered offers that they were in turn willing to wait for. Following the transition to 1-30 s offers, OZ thresholds were much higher than WZ thresholds, indicating that mice entered offers that they were not willing to wait and earn. Data are presented as the daily means (± SEM) across the entire experiment. The x-axis reflects the days of training with vertical lines indicating changes in training stages (corresponding with Fig. 1B). White start on a vertical transition line indicates immediate significant behavioral change at the block transition within groups; black star indicates significant differences between groups within the given training stage. . Mice initially orient toward entering (to the right) but then ultimately reorient and skip. This orientation is the basis for the IdPhi measurement used to classify vicarious-trial-and-error (VTE). 
Here we show how the value of the offer is calculated based on the offer (in this case 27 s) and the individual's willingness to wait (WZ threshold, in this case 12 s). Value is calculated by subtracting the offer from the WZ threshold (12 - 27 = -15). (B) Example of an nTG mouse's path trajectory in the OZ (decision point 1) during a single enter trial (from day 70). nTG mice make relatively smooth entrances into the WZ, while APP mice show sharper trajectories, leading to higher VTE values when entering. Here again we illustrate how the value is calculated (WZ threshold - offer; in this case 12 - 3 = value of 9). (C) Average VTE (IdPhi) values split by skip versus enter decisions across days of training by groups (dark grey, grey = nTG, pink and blue = APP). Pink star denotes p < 0.05 for between-group differences in skip VTE. Blue star denotes p < 0.05 for between-group differences in enter VTE. Data are presented as the daily means (± SEM) across the entire experiment. Vertical dashed lines represent stage transitions. (D) Bar graph depicting mean ± SEM for well-trained animals (last 10 days) to illustrate group differences. *, p < 0.05, one-way ANOVA within group (nTG). (E) Quadratic functions of VTE by value (WZ threshold minus offer) for the last 10 days. A negative value denotes an economically unfavorable offer. Peaks are right-shifted for APP mice (value of -3 compared to -10 for nTG mice), indicating higher VTE for values closer to neutral. (F) Mean ± SEM of values from E for bad (value < 0) and good (value > 0) deals. *, p < 0.05, one-way ANOVA within group (nTG).

High VTE trials were defined as those greater than the median VTE for an individual mouse on that day. High VTE trials were first removed and earns were analyzed (removed; pink). Then high-VTE trials were replaced with low-VTE trials (VTE trials < median; replaced, light blue).
nTG (A) and APP mice (B) showed an effect on earning potential based on simulation during the last stage of training, earning significantly more pellets if high VTE trials were replaced with low VTE trials (denoted by light blue star), and earning significantly fewer when high VTE trials were removed altogether (denoted by pink star). Data are presented as the daily means (± SEM) across the entire experiment. Vertical dashed lines represent stage transitions. (C) To examine how earnings changed in these simulations over training, we compared average earnings by simulation during three separate epochs of the last stage of training (when no animals had their food intake renormalized: days 18-22; when APP mice had their food intake renormalized: days 23-52; when all animals had their food intake renormalized: days 53-70). nTG mice had significantly higher earning potential when high VTE events were replaced with low VTE events in the last epoch (days 53-73; Fig. 4C left). In contrast, APP mice do not show any significant differences in earning simulations when the earnings are averaged across days (right). Data are presented as mean + SEM. (D) Actual earns that were observed in this experiment. APP mice earned significantly more than nTG mice during the last stage of training (1-30 s offers; RM-ANOVA between groups). (E) High VTE removed simulations compared between groups. Simulated APP mice earn significantly more early on in the last stage of training, but nTG mice catch up to APP mice during the last epoch (days 53-70). (F) High VTE replaced with low VTE simulations compared between groups. Simulated APP mice earn significantly more early in the last stage of training (1-30 s

Amyloid-β (Aβ) deposits were labeled with 6E10 antibody and confirmed at 40x magnification (blue hues correspond to different animals, pink hues depict relative structure density of plaque load).
(B) Representative micrograph of an amyloid-β plaque stained for human APP/Aβ (6E10, pink), microtubule-associated protein 2 (MAP2, green), and synaptophysin (SYP, blue) (scale bar = 10 μm). (C-E) Quantification of (C) amyloid plaque count density (per square millimeter) in isocortex (dark green), hippocampus (light green), striatum (light blue), pallidum (blue), thalamus (red-orange), hypothalamus (red), midbrain (light pink), cerebellum (yellow), and fiber tracts (grey), (D) total plaque counts per brain region, and (E) a breakdown of plaque number by individual structures, referenced to the Allen Brain Atlas.
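
As a worked illustration of two behavioral measures defined in the legends above (offer value as WZ threshold minus offer, and the per-mouse, per-day median split used to label high-VTE trials), a minimal Python sketch might look like the following. It is not the analysis code used in this study: the function names, the tuple layout for trials, and the example IdPhi numbers are hypothetical and chosen only for illustration. Only the "high VTE removed" simulation is sketched, since the legend does not specify how replacement trials were drawn for the "replaced" simulation.

```python
from statistics import median


def offer_value(wz_threshold_s, offer_s):
    """Value of an offer in seconds: WZ threshold minus offered delay.

    Negative values correspond to economically unfavorable offers.
    """
    return wz_threshold_s - offer_s


def is_high_vte(idphi_values):
    """Flag trials whose IdPhi is strictly greater than the session median.

    idphi_values: IdPhi per trial for ONE mouse on ONE day (hypothetical layout).
    """
    session_median = median(idphi_values)
    return [value > session_median for value in idphi_values]


def earns_with_high_vte_removed(trials):
    """Count earned pellets after dropping high-VTE trials.

    trials: list of (idphi, earned) tuples for one mouse on one day,
    where `earned` is True if the mouse waited out the delay.
    """
    flags = is_high_vte([idphi for idphi, _ in trials])
    return sum(1 for (_, earned), high in zip(trials, flags) if earned and not high)


# Worked examples taken from the legend text (12 s threshold; 27 s and 3 s offers).
skip_example = offer_value(12, 27)
enter_example = offer_value(12, 3)
print(skip_example, enter_example)
```

With the legend's numbers, the two calls reproduce the stated values of -15 and 9. Trials exactly at the median are treated here as not high VTE, which matches the strict "greater than the median" wording of the legend.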