Externally provided rewards increase internal preference, but not as much as preferred ones without extrinsic rewards

It is well known that preferences are formed through choices, a phenomenon known as choice-induced preference change (CIPC). However, whether value learned through externally provided rewards influences the preferences formed through CIPC remains unclear. To address this issue, we used tasks for decision-making guided by reward provided by the external environment (externally guided decision-making; EDM) and for decision-making guided by one’s internal preference (internally guided decision-making; IDM). In the IDM task, we presented stimuli with learned value in the EDM and novel stimuli to examine whether the value in the EDM affects preferences. Stimuli reinforced by rewards given in the EDM were reflected in the IDM’s initial preference and further increased through CIPC in the IDM. However, such stimuli were not as strongly preferred as the most preferred novel stimulus in the IDM, indicating the superiority of intrinsically learned values (SIV). The underlying process of this phenomenon is discussed in terms of the fundamental self-hypothesis.

Author Summary

We make decisions based on internal value criteria, which are individual preferences, or based on external value criteria, which are the values learned from the external environment. Although it is known that values are learned in both types of decisions, is there a difference in the nature of these values? Our study uses simulation and fits human behavioral data to address this question. The results showed that stimuli that were learned to be highly valued because of external feedback became preferred in subsequent preference judgments. However, such stimuli were not chosen as much as stimuli that were preferred without influence from the external environment. This finding suggests that values formed through one’s own criteria have characteristics distinct from those formed through external environmental influence. Our findings promote an integrated understanding of the decision-making process.


Existentialism, according to Jean-Paul Sartre [1], supposed that the reason for one's existence is not predetermined and that a sense of self is established through one's own choices under the freedom given to one, as expressed in the phrase "Existence precedes essence." In contrast, structuralism, which was developed by Claude Lévi-Strauss [2,3], emphasizes the influences on human behavior from the [...] decision-making (EDM), or on one's own internal criteria (such as a sense of value, beliefs, and preferences), referred to as internally guided decision-making (IDM). Although EDM and IDM are distinct in their conceptual definition, experimental operation, and neural bases [4,5,6,7,8], they are similar in that the option's values are learned through decision making [6,7,9,10,11].

In EDM, the value of an option is considered to be updated based on the predicted value and actual feedback given after the decision making [12,13,14,15,16,17,18]. For example, in a reward learning task where one item is chosen from two items and each is rewarded with a certain probability, the item's value increases when rewarded feedback is received [12,18].

In IDM tasks, such as a preference judgment task, no externally delivered feedback indicates a correct answer. Even in such a case, an item's value (preference) changes with the choice itself, not based on the feedback [19,20,21]. More specifically, in the preference judgment task of choosing a preferred item from two presented items, the value of the chosen item increases while the value of the rejected item decreases (choice-induced preference change, CIPC [9,19]).

EDM studies have progressed in mathematical understanding by analyzing behavioral data using computational models that represent the information process [...] with an 80 percent probability) associated with the option. The expected value is updated according to the prediction error (i.e., the difference between the provided reward and the expected value) [22,26,27,28]. The suitability of the computational model [...] [19,29,30].

For IDM, CIPC has been examined for many years using changes in subjective preference ratings as an index [21,31,32,33]. In recent years, choice-based learning (CBL) models have been proposed [6,7,10,11], and computational modeling has progressed. CBL models are based on the RL model, but unlike RL, they use choice behavior itself instead of feedback from the external environment. In addition, differing from the typical RL, a CBL model was verified that updates the value of both chosen and rejected items [9]. Thus, in the CBL model, chosen items are treated [...]

Since EDM and IDM have been studied as independent research areas, there is much ambiguity about their relationship. Although comparative studies of EDM and IDM have been reported in recent years, these studies have focused on their differences [4,5,6,7,8,34]. Therefore, whether the value learned through the EDM affects the value of the IDM has not been explored. It has been reported that reward-related neural responses are involved in the value-learning process in IDM [6,21,31,35,36,37], as with EDM [38,39,40,41,42]. From these findings, it can be inferred that the values learned [...]

This study investigated whether, how, and to what extent the value learned in EDM affects the value in IDM. We used simple EDM and IDM tasks with novel contour shapes; the IDM task followed the EDM task. In the IDM task, the same stimuli used with the EDM task were used for the four (two high and two low reward probability) stimuli in addition to eleven novel stimuli.

We first tested whether the values learned in EDM affect IDM from classical model-free behavioral data analysis using the chosen frequency of items in IDM [5,7].
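The two learning rules contrasted in the introduction can be illustrated with a minimal sketch. The prediction-error update follows the description given for EDM. The choice-based update is an assumed form in which the chosen item is pulled toward 1 and the rejected item toward 0; the surviving text states only that both values are updated by the choice itself. The function names and the learning rate `alpha = 0.2` are hypothetical.

```python
def rl_update(value, reward, alpha):
    """Prediction-error update: move the expected value toward the received reward."""
    prediction_error = reward - value
    return value + alpha * prediction_error


def cbl_update(v_chosen, v_rejected, alpha):
    """Choice-based update (assumed form): no external feedback is used.

    The chosen item is pulled toward 1 and the rejected item toward 0,
    with the same learning rate for both.
    """
    new_chosen = v_chosen + alpha * (1.0 - v_chosen)
    new_rejected = v_rejected + alpha * (0.0 - v_rejected)
    return new_chosen, new_rejected


# EDM-style trial: an item with expected value 0.5 is rewarded (reward = 1).
v_after_reward = rl_update(0.5, reward=1.0, alpha=0.2)

# IDM-style trial: two items start at 0.5 and the first one is chosen.
v_chosen, v_rejected = cbl_update(0.5, 0.5, alpha=0.2)
```

Under these assumptions, a reward in EDM and a choice in IDM raise the value of the selected item in the same way; the difference lies in what drives the update and in the accompanying decrease for the rejected item.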

If the value learned in the EDM affects the IDM, it is predicted that stimuli with higher values learned in the EDM will be chosen more frequently than novel stimuli in the IDM. It is also expected that stimuli with lower values learned in the EDM will be chosen less frequently than novel stimuli in the IDM.
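As an illustration of the model-free measure behind these predictions, the following sketch computes a chosen frequency per stimulus. It assumes that chosen frequency is the proportion of presentations on which a stimulus was selected; the trial format, the stimulus labels, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def chosen_frequency(trials):
    """Return, for each stimulus, the proportion of its presentations on which it was chosen.

    trials: list of (left, right, chosen) tuples, one per preference-judgment trial.
    """
    shown = Counter()
    chosen = Counter()
    for left, right, pick in trials:
        shown[left] += 1
        shown[right] += 1
        chosen[pick] += 1
    return {stimulus: chosen[stimulus] / shown[stimulus] for stimulus in shown}


# Toy data (hypothetical): one high-probability, one low-probability, one novel stimulus.
toy_trials = [
    ("HP1", "N1", "HP1"),
    ("HP1", "LP1", "HP1"),
    ("N1", "LP1", "N1"),
    ("HP1", "N1", "N1"),
]
freq = chosen_frequency(toy_trials)
```

The predictions above then amount to comparing `freq` for HP and LP stimuli against the values for novel stimuli.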

To examine how the values in EDM were reflected in IDM, we applied four computational models (see Table 1 in Methods) to the IDM data, compared the models, and investigated the estimated initial values of each stimulus in the IDM. The initial values of different stimulus types in the IDM in these models differed (Table 1) [...] respectively, thereby further changing their values. As all four models described above with different initial value settings assumed value changes in the IDM based on a previous study [9], an additional computational model analysis was performed to rule out the possibility that the value did not change in the IDM. [...] chosen frequency of each stimulus in the IDM or the subjective ratings of each stimulus after the IDM task.

We confirmed whether the participants successfully learned the value of each stimulus in the EDM from the correct response rate. Therefore, in this study, a [...]

The mean initial value of the HP stimuli in the IDM estimated using the CBL model (Model 2) was 0.575 (Fig 3a). To confirm that the values learned in EDM were reflected in the initial values in IDM, we compared the estimated initial value for HP with the fixed initial value (0.5) for LP and novel stimuli. We found that HP stimuli had a higher initial value than LP stimuli and novel stimuli (t(37) = 7.057, p < .001, 95% CI [...] the four models with different initial value settings (Table 1). Therefore, an additional [...] Model recovery was confirmed for all models except Model D (see Table 5).

When the same true model (used to generate artificial data) was used for the analysis, [...] to N11) and then compared them with HP or LP stimuli (Fig 5a) [...] The results demonstrated that the chosen frequency of the HP stimuli was lower than [...] reflected in the initial IDM values (Fig 3a). This result was consistent with the results of chosen frequency (Fig 1b) [...] had changed. They were also placed in a position in which they expected that choosing the reinforced option would eventually reward them [43,44]. In contrast, participants in the present study were clearly instructed on the difference between the EDM and IDM, and they were aware that they would not be rewarded in the IDM.

That is, the present study operationally eliminated the participants' choice of [...] shown that the value of HP was further increased by selection in IDM (Fig 3b).

Although we also examined the possibility that the value of HP stimuli was not updated in the IDM (Table 4), such a model did not fit well with the behavioral data [...] highly valued in the EDM and subsequently chosen (Fig 1b) and valued in the IDM (Fig 3b). However, the SIV of novel stimuli is observed in the IDM (Fig 5a), indicating that our preferences are strongly influenced not only by externally given rewards but also by preferences that increase through our own choices. [...]

This postulates that the self is a fundamental brain function that precedes and controls cognitive functions, such as perception, emotion, and reward, which has been proposed in studies of spontaneous brain activity [49,50,51] and the self-prioritization effect (SPE) [48,52,53,54]. In this hypothesis, the self is embedded in spontaneous brain activity, and when a stimulus appears, the default mode network (DMN), which is responsible for processing self-associated stimuli, interacts with a task-related network to influence cognitive processing. A meta-analysis of the neural basis of IDM and EDM also confirmed that IDM differs from EDM in that the DMN is its primary neural substrate [4]. In addition to the conceptual and operational differences between the EDM and IDM, there is a difference in task demands (i.e., whether decisions are made based on value criteria given by the environment or based on one's own value criteria), that is, an essential difference in self-involvement. The continuous choice of stimuli as one's own favorite shape, rather than because it has previously been rewarded, is likely to increase the self-relatedness of the item. As self-related stimuli are known to induce reward-related brain activity [55,56], an increase in self-relevance may trigger an internal reward response [6,21,31,35,36,37], leading to an increase in value.

As a result, the most preferred novel stimulus learned in the IDM might have a higher value than HP stimuli in the IDM.

The model-free measure of chosen frequency also confirmed this SIV (Fig 5b). However, the SIV was not reflected in the subjective preference ratings after IDM: subjective preference showed no significant difference between HP and N1 (Fig 5c).

The low values learned in EDM did not affect IDM. In the EDM task, when an incorrect answer was chosen, feedback was displayed as 0 (simply not presented with a reward) rather than presented with a punishment. There was a possibility that [...]

However, we should note that the results did not lead to any general conclusions about the relationship between EDM and IDM values but were a conclusion that depended on the task settings of this study. In this study, EDM used relatively easy reward probability settings such as 90% vs. 10% and 80% vs. 20%, where learning of value was easily established, to examine whether the value of EDM was reflected in IDM. [...] participants likely learned that LP stimuli were of relatively low value but did not come to the realization that they had to actively avoid LP stimuli. Therefore, it is assumed that LP stimuli in IDM were treated as having the same initial value as novel stimuli. When using an EDM task where the participant actively decides not to choose an item to avoid losses, the low value in EDM may affect IDM. Therefore, there is room for further study on this point.

Second, the possibility of explanations using other types of models has not yet been explored. For example, the cognitive dissonance theory has been used to explain the phenomenon of CIPC in IDM [57]. In this theory, CIPC is explained by which choosing one item from two items with the same subjective preference rating, [...]

Finally, there was a possibility that preferences were formed to some extent by the first impression in IDM. Although we used novel contour shapes by following the previous study [9] to minimize the impact of initial preferential differences, we cannot rule out the possibility that value can be formed by first impression. For individuals whose preferences were formed by first impressions, it is possible that the learning rate was estimated to be larger than the true value. [...]

Fifteen novel contour shapes were selected from a previous study [59]. We [...]

All participants conducted EDM tasks followed by IDM (Fig 6a). Subjective preference ratings of each stimulus were conducted after the IDM. [...] do not support this possibility (see the supplementary materials for more details).

IDM task (Preference judgment task) This task was the same as that in the previous study [9]. Including the four shapes in the EDM task, all 15 shapes randomly created [...] carried out five blocks of 21 preference decision trials and were asked to choose the preferred shape among two shape stimuli presented according to their own preferential criteria in each trial. We also informed participants that there was no objectively correct answer in this task. Stimuli were presented in the same manner as the EDM task, except there was no feedback after the choice.
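The trial count of the IDM task (five blocks of 21 trials, 105 trials in total) equals the number of distinct pairs that can be formed from 15 shapes. The sketch below builds such a trial list under the assumption that every pair is presented exactly once; this is not stated in the surviving text. The stimulus labels, the shuffling, and the fixed random seed are hypothetical.

```python
from itertools import combinations
import random

# Hypothetical labels: two high-probability, two low-probability, eleven novel shapes.
stimuli = ["HP1", "HP2", "LP1", "LP2"] + [f"N{i}" for i in range(1, 12)]

# Assumption: every unordered pair of shapes is presented exactly once.
pairs = list(combinations(stimuli, 2))

# Shuffle the pairs and split them into blocks of 21 trials.
rng = random.Random(0)  # fixed seed only to make the sketch reproducible
rng.shuffle(pairs)
blocks = [pairs[i:i + 21] for i in range(0, len(pairs), 21)]
```

With this design each shape appears in 14 trials, so chosen frequencies are directly comparable across stimuli.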

Rating task We conducted a subjective rating task for each shape stimulus (Fig 6b) [...] better fit to the behavior. We used the same IDM task as Zhu et al. [9]. Although not directly related to the aim of this study, we confirmed that the behavioral data from [...] was a free parameter, whereas the initial value of the LP stimuli was fixed at 0.5, the same as with novel stimuli. In contrast to Model 2, in Model 3, only LP was a free parameter, and the others were fixed at 0.5. In Model 4, both HP and LP were free parameters, and the novel stimuli were fixed at 0.5. Since the previous study [9] reported that the model that used different learning rates for chosen and rejected items was unsuitable for model comparison, the above four models were created based on the model that used the same learning rates (i.e., $\alpha_{\text{chosen}} = \alpha_{\text{rejected}}$) as the main model-based analysis. Model 1 is a null model that assumes no effect of the EDM on the IDM. If [...]

To calculate the probability of choice in the CBL models, the softmax function was applied to the value difference between the two options.
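A minimal sketch of how the four initial-value settings and the softmax choice rule could be combined into a likelihood for IDM choices is given below. It is not the authors' implementation. The inverse temperature `beta`, the form of the value update (chosen item toward 1, rejected item toward 0, with a shared learning rate `alpha`), the stimulus labels, and all function names are assumptions made for illustration.

```python
import math


def choice_probability(v_a, v_b, beta):
    """Softmax on the value difference: probability of choosing option A over option B."""
    return 1.0 / (1.0 + math.exp(-beta * (v_a - v_b)))


def initial_values(model, stimuli, hp_init=0.5, lp_init=0.5):
    """Initial values under Models 1-4; novel stimuli are always fixed at 0.5.

    Model 1: all fixed at 0.5. Model 2: HP free. Model 3: LP free. Model 4: HP and LP free.
    """
    values = {}
    for s in stimuli:
        if s.startswith("HP") and model in (2, 4):
            values[s] = hp_init
        elif s.startswith("LP") and model in (3, 4):
            values[s] = lp_init
        else:
            values[s] = 0.5
    return values


def negative_log_likelihood(trials, values, alpha, beta):
    """Accumulate -log P(observed choice) while updating values after each choice."""
    values = dict(values)  # work on a copy so the caller's initial values are untouched
    nll = 0.0
    for a, b, chosen in trials:
        p_a = choice_probability(values[a], values[b], beta)
        nll -= math.log(p_a if chosen == a else 1.0 - p_a)
        rejected = b if chosen == a else a
        # Assumed update: same learning rate for chosen and rejected items.
        values[chosen] += alpha * (1.0 - values[chosen])
        values[rejected] += alpha * (0.0 - values[rejected])
    return nll


# Toy example (hypothetical data); 0.575 is the mean HP initial value reported for Model 2.
stimuli = ["HP1", "LP1", "N1"]
toy_trials = [("HP1", "N1", "HP1"), ("N1", "LP1", "N1")]
v0 = initial_values(2, stimuli, hp_init=0.575)
nll = negative_log_likelihood(toy_trials, v0, alpha=0.2, beta=3.0)
```

In a fitting procedure, `alpha`, `beta`, and the free initial values would be chosen to minimize this negative log-likelihood for each participant, and the resulting fits compared across Models 1 to 4.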