Response time modelling reveals evidence for multiple, distinct sources of moral decision caution

People are often cautious in delivering moral judgements of others' behaviours, as falsely accusing others of wrongdoing can be costly for social relationships. Caution might further be present when making judgements in information-dynamic environments, as contextual updates can change our minds. This study investigated the processes by which moral valence and context expectancy drive caution in moral judgements. Across two experiments, participants (N = 122) made moral judgements of others' sharing actions. Prior to judging, participants were informed whether contextual information regarding the deservingness of the recipient would follow. We found that participants slowed their moral judgements when judging negatively valenced actions and when expecting contextual updates. Using a diffusion decision model framework, these changes were explained by shifts in drift rate and decision bias (valence) and boundary setting (context), respectively. These findings demonstrate how moral decision caution can be decomposed into distinct aspects of the unfolding decision process.

However, there is also another form of caution that is highly relevant for moral judgements: people may particularly slow their RTs when judging someone's action as morally bad, to increase the likelihood of being correct (according to their personal moral norms) when selecting this option. This tendency can be conceptualised as a decision bias, which has been shown to occur in other contexts against choice options associated with smaller rewards 21 or larger punishments 26. Morally blaming others is socially risky, as it may lead to reprisals if that blame is improperly placed. Indeed, people are more motivated to stay as accurate as possible, by ensuring their judgements are up to date with all the available information, when making negative judgements 27,28. However, there is an alternative explanation for why people may take longer when judging someone as bad that is unrelated to caution. Namely, people tend to take longer to evaluate negative information, even when they are not required to make any decisions and there are no response options to be cautious about. For instance, people report thinking more thoroughly about negative events 29, they look longer at negative content when scrolling through images 30, and they remain distracted longer by morally negative words 31. Such effects suggest that people take longer to process negatively valenced information 32. There are therefore two distinct explanations for slower RTs when making negative moral judgements: a decision bias (a tendency to be more cautious when judging someone as bad), and a slower rate of evaluation of negative information (i.e. of the evidence for the negative judgement). For this reason, previous research relying on simple comparisons of mean RTs has been unable to disentangle the cognitive processes underlying the slowing 32.
This is again due to the fact that there is no process model specifically developed for moral decision-making that would allow us to investigate this question. In other fields of decision science, evidence accumulation models have been widely applied to disentangle parts of the decision process. These process models include mathematically formalised parameters that correspond to evidence accumulation (i.e. the rate at which evidence is evaluated) and the two forms of caution described above, and might therefore be useful for partitioning distinct sources of moral decision caution. One prominent model of this class is the Diffusion Decision Model (DDM) 33-35, which has been used to study decision-making across a broad range of discrete choice tasks 33,36-39. The DDM describes the decision process as a continuous accumulation of noisy evidence for different choice outcomes. Once evidence in favour of a particular choice reaches a boundary, a decision is made. These models find substantial support from animal studies, in which neural firing rates in middle temporal and ventral intraparietal areas were found to closely track the trajectory of evidence accumulation 40,41. Although predominantly used to model perceptual decision-making processes, in which sensory evidence is accumulated by the sensory systems, the DDM can be regarded as a universal decision process model, and it has been used to model value-based decisions 42, sharing and cooperation choices 43-45, as well as moral decisions 46. For such higher-level decisions, the accumulation process represents the integration of signals from brain areas that calculate subjective value 47, integrate representations of potential gains and losses 48, and perform diverse social and moral computations that have not yet been well specified by previous research 44,46. The rate of evidence processing (i.e.
the evidence strength), and the two forms of caution, correspond to specific parameters of the DDM. In the DDM, caution against making an error across response options is formalised as the amount of evidence needed to make a choice and is estimated by the boundary separation (a) parameter. Given the role of this parameter in adapting decision processes to environmental demands 21,24,25 (as described […]

[…] these two conditions were presented interleaved. Experiment 2 was used to replicate the results using a near-identical paradigm with an independent sample of participants. In the second experiment the two conditions were presented in separate blocks, which further controlled for the possibility that the interleaved presentation of conditions might have had an impact on participants' decision strategies. Naturally, there are individual differences in the norms people rely on to make such judgements. A majority of people, however, condemn low and endorse high offers 9. To avoid possible confounding of response times due to potential differences in the reliance on different sets of norms across individuals, and to ensure that the perception of our stimuli as "evidence" for judgement options was roughly consistent across the sample (a necessary assumption of the DDM when fit for a group of individuals in a hierarchical model, see Methods), we limited our investigation in both experiments to this largest subset of participants, who endorsed generosity and condemned selfishness 9 (implications for limitations will be addressed below).

[…] story participants read prior to the experiment about a recently conducted study. The cover-story study was fictitious, but our participants were not informed of this. It involved persons interacting across two rounds: in Round 1, Person A played the role of the Decision-maker and had to decide how to share $10 with their partner, Person B.
In Round 2, a new person (Person C) became the Decision-maker and was paired with either Person A or a new person (Person D), and had to decide how to share $10 with their partner. Importantly, Person C knew whether their partner had taken part in Round 1 and, if they had (e.g., Person A), how much they gave when they were the Decision-maker ($x). Person C decided to give a certain amount ($y) to their partner (either Person A or Person D, depending on the trial). (b) Trial sequence. Participants were presented with information regarding the context-expectancy condition of the current trial. "OLD Receiver" indicated that they would judge the Round 2 Decision-maker who was paired with a Receiver (i.e. Person A) who gave an amount $x to another person in the previous round. Our participants made this judgement without yet knowing this $x amount, but knowing that they would soon learn this information (the context-expectancy condition). "NEW Receiver" indicated that they would judge the Round 2 Decision-maker paired with a new person (i.e. Person D), and that there was no additional contextual information to expect (the no-context condition). Next, participants were presented with the amount that the Round 2 Decision-maker gave to their partner and selected their judgement (one of four options) on a keyboard. After this, in the context-expectant condition, the amount that the Receiver had given in the previous round ($x) was revealed. In the no-context condition, no additional information was presented. Participants again indicated their judgement of the Decision-maker's action on their keyboard.

Results

[…] no systematic differences in the proportions of moral choices for each choice option across expectancy conditions (depicted in Figure 2).

We took two approaches to test for effects of context-expectancy and moral valence on response speed in moral judgements.
The first approach was to test for these effects by comparing RTs without formally specifying the decision process. Our predictions for these RT comparisons, together with the analysis approach, were preregistered (https://aspredicted.org/blind.php?x=dy3qk9). The second approach was to use the DDM to better characterise these effects by comparing model parameter estimates across expectancy and valence conditions.

With regard to our first approach, we tested three hypotheses. First, we investigated whether expectancy of contextual information increases caution, by testing whether the RTs of initial judgements were higher in the context-expectant than in the no-context condition. Second, we investigated whether morally negative evidence is evaluated more cautiously and is processed at a slower rate. This hypothesis was operationalised as the assumption that the effect of morally negative valence linearly decreases with the size of the Decision-maker's offer. We therefore expected a negative relationship between the Decision-maker's offer and RT. Third, to investigate whether caution when expecting a contextual update is particularly pronounced for negative judgements, we tested whether the slope of the negative relationship between RT and Decision-maker's offer was steeper in the context-expectant condition. To test these hypotheses, we formulated several Generalised Linear Mixed-effects Models, which included the Decision-maker's offer, the expectancy condition, and their interaction as predictors of RT (Supplement 1 Table S2).

The best-fitting model included main effects of Decision-maker's offer and expectancy condition but did not include an interaction, and the intercepts and the slopes of these two main effects were allowed to vary across individuals (Supplement 1 […] is evaluated more cautiously or takes longer to process.
The two main effects were consistent across quantiles (RT quantiles by condition are displayed in Figure 3). They were also robust across different models (Supplement 1 Table S2) and across alternative approaches to modelling RT distributions (Supplement 1 Tables S4 and S5). As for the interaction effect, there was no evidence across these two studies supporting the hypothesis that effects of negative valence are more pronounced when people are expecting a contextual update.

[…] conditions. The graph shows group mean quantile values across participants. The general pattern of results was consistent across quantiles and across the two studies: there was a slight increase in speed for higher Decision-maker offers, and there was a slight slowing in context-expectancy trials in Experiment 1 (dashed lines higher than solid lines), which was more pronounced in Experiment 2.

Next, to better characterise these patterns of RT effects, and to test our predictions regarding the relationships between context-expectancy, moral valence, and components of the decision process, we fitted a Diffusion Decision Model. To test whether context-expectancy increased the general amount of caution across judgement options (i.e. boundary separation), we computed two a parameters, one for each expectancy condition, and compared them. We expected: a(context-expectant) > a(no-context). To test whether the moral prototypicality of Decision-maker's offers reflected stronger evidence for judgement options (with lower offer magnitudes reflecting evidence for the "bad" option, and higher offers reflecting stronger […] fitted a v parameter separately for each Decision-maker's offer. The v parameter was signed: negative values indicated evidence for the "bad" judgement and positive values indicated evidence for the "good" judgement. We tested for a monotonic positive relationship between the offer magnitude and the v parameter.
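The monotonicity test just described is computationally simple: check that the estimated drift rates never decrease as the offer grows. A minimal sketch in Python (the drift-rate values below are invented for illustration, not the study's estimates):

```python
def is_monotonic_increasing(values):
    """True if each successive value is at least as large as the previous one."""
    return all(b >= a for a, b in zip(values, values[1:]))

# Hypothetical posterior-mean drift rates for offers $0-$10 (signed:
# negative = evidence for "bad", positive = evidence for "good"):
v_by_offer = [-2.1, -1.8, -1.4, -0.9, -0.3, 0.2, 0.9, 1.6, 2.2, 2.7, 3.1]
monotonic = is_monotonic_increasing(v_by_offer)
```

With a hierarchical Bayesian fit, the same check would be applied to the posterior means (or to each posterior sample) of the eleven v parameters.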
Moreover, to test whether negatively valenced evidence is accumulated more slowly than positively valenced evidence, we tested whether the estimates of the v parameter were, in absolute terms (drift towards either "good" or "bad"), larger for high as opposed to low Decision-maker's offers. We expected: |v(0-4)| < |v(6-10)|. Finally, to test whether participants were more cautious against making "bad" judgements, independent of the tendency to accumulate negatively valenced information more slowly, we tested whether the z parameter differed from .5 (which would indicate no starting-point bias), and whether the z parameter was biased in the direction of the 'morally good' judgement. The positions of the decision bounds with respect to the starting point were standardised as 1 for 'morally good' and 0 for 'morally bad' judgements; hence we expected z > .5.

First, we formulated a hypothesised model (m1), which included separate a parameters for each expectancy condition, separate v parameters for every offer value, and a z parameter. We then tested whether the use of this model, which allowed us to test our specific hypotheses, was justifiable and appropriately explained our data, by comparing it to a null model (m0), which did not include differences between conditions for any parameter. We used the Deviance Information Criterion (DIC) to compare the model fits (a lower value indicates better fit) 50. We found that this model provided a substantially better fit to the data (Experiment 1 DIC = 13396.303; Experiment 2 DIC = 15021.171) than the null model (m0) (Experiment 1 DIC = 29569.801; Experiment 2 DIC = 30071.387). Additionally, we ran […] Figure S10).
The m1 model simulation also reproduced the observed rates of judgements across Decision-maker offers (Supplement 2 Figure S8) and the patterns of changes in RT distributions across different Decision-maker's offers and expectancy conditions (Supplement 2 Figure S9), and overall provided an excellent fit to the data.

Next, we tested for the hypothesised differences in the m1 model parameters across conditions. Statistical significance was defined as the posterior probability for the hypothesised difference exceeding .95. Consistent with our hypothesis that context-expectancy increases caution against making errors, the a parameter estimate was nominally larger in the context-expectant condition compared to the no-context condition; however, this difference was not statistically significant in Experiment 1 (posterior P(a(context-expectant) > a(no-context)) = 0.913) (Figure 4a). In Experiment 2 this difference was statistically significant (posterior P(a(context-expectant) > a(no-context)) > 0.999, Figure 4b). As for the drift rate (v), we expected this parameter to monotonically increase with the value of the Decision-maker's offer. We observed a perfect monotonic relationship across both experiments (see Figure 4c and d). To test our hypothesis regarding the reduction in absolute drift rate when processing negative moral valence as compared to positive valence, we compared the v parameter for negative stimuli (Decision-maker gave $0-4) with that for positive stimuli (Decision-maker gave $6-10). Consistent with our hypothesis, we found a large and statistically significant decrease in absolute drift rate for negative stimuli (Experiment 1 posterior P(|v(0-4)| < |v(6-10)|) > 0.999; Experiment 2 posterior P(|v(0-4)| < |v(6-10)|) > 0.999; see Figure 4c and d).
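The posterior-probability tests reported above have a simple computational form: given paired posterior (MCMC) samples of two parameters, the posterior probability of a directional difference is the fraction of samples in which the difference holds. A minimal sketch, using toy Gaussian "posterior" samples rather than the study's actual estimates:

```python
import random

def posterior_prob_greater(samples_a, samples_b):
    """Fraction of paired posterior samples in which a > b."""
    assert len(samples_a) == len(samples_b)
    return sum(x > y for x, y in zip(samples_a, samples_b)) / len(samples_a)

# Toy posterior samples (illustrative values only, not the fitted model):
random.seed(1)
a_context = [random.gauss(1.7, 0.05) for _ in range(10_000)]
a_no_context = [random.gauss(1.5, 0.05) for _ in range(10_000)]

p = posterior_prob_greater(a_context, a_no_context)
# Under the study's criterion, the difference is "significant" if p > .95.
```

The same function applies unchanged to the drift-rate contrasts (comparing |v| averaged over low vs. high offers sample by sample) and to the z > .5 test (comparing z samples against the constant .5).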
To ensure that this effect was not due to a perception of the Decision-maker's offer of $4 as neutral rather than negative, we repeated these analyses on a more constrained set of stimuli by excluding offers of $4 and $6, and the effect survived in both studies (Experiment 1 posterior P(|v(0-3)| < |v(7-10)|) = 0.999; Experiment 2 posterior P(|v(0-3)| < |v(7-10)|) = 0.999). To test our hypothesis regarding the shift of the bias parameter (z) away from the 'bad' and toward the 'good' judgement option, we tested whether the z parameter was larger than .5. Consistent with our hypothesis, we found estimates of the z parameter to be larger than .5 in both studies (Experiment 1 posterior P(z > 0.5) > 0.999; Experiment 2 posterior P(z > 0.5) > 0.999).

[…] In Experiment 2, this difference was replicated with a larger effect and there was minimal overlap between the two posterior distributions. (c) In Experiment 1, the drift rate parameter (v) monotonically increased with higher Decision-maker's offers, suggesting that higher offer numbers provide more evidence for the judgement option 'good' and less for 'bad'. Positively valenced actions (Decision-maker gave more than $6) had higher absolute drift rates towards the option 'good' than negatively valenced actions (Decision-maker gave less than $4) did towards the option 'bad', which suggests that participants processed negatively valenced actions more slowly than […] there was no interaction between the two factors. Moreover, these effects were well accounted for by differences in multiple DDM parameters. The boundary separation parameter was larger in the context-expectancy condition compared to the no-context condition, pointing to more caution to avoid erroneous responses (across judgement options) in the former condition.
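The qualitative roles of the three DDM parameters discussed here (boundary separation a, drift rate v, starting point z) can be reproduced with a toy simulation of the diffusion process (Euler discretisation). All parameter values below are hypothetical, chosen only for illustration, and are not the fitted values from the study:

```python
import random

def ddm_trial(v, a, z=0.5, ndt=0.3, dt=0.001, rng=random):
    """One simulated diffusion decision. v: drift rate, a: boundary
    separation, z: relative starting point (0 = 'bad' bound, 1 = 'good'
    bound), ndt: non-decision time. Returns (choice, rt in seconds)."""
    x = z * a                          # absolute starting evidence level
    t = 0.0
    while 0.0 < x < a:                 # accumulate noisy evidence until a bound is hit
        x += v * dt + rng.gauss(0.0, 1.0) * dt ** 0.5
        t += dt
    return ("good" if x >= a else "bad", t + ndt)

rng = random.Random(0)
# Same positive drift, narrower vs. wider boundaries (hypothetical values):
narrow = [ddm_trial(v=1.5, a=1.5, rng=rng) for _ in range(300)]
wide = [ddm_trial(v=1.5, a=2.5, rng=rng) for _ in range(300)]

mean_rt_narrow = sum(rt for _, rt in narrow) / len(narrow)
mean_rt_wide = sum(rt for _, rt in wide) / len(wide)
p_good = sum(c == "good" for c, _ in narrow) / len(narrow)
# Wider boundaries (larger a) slow responses; positive drift favours "good";
# shifting z above .5 would additionally speed "good" and slow "bad" choices.
```

This is why the model can separate the two sources of slowing that mean RTs confound: a smaller |v| slows both choices, whereas z < .5 (a bias away from "bad") selectively slows "bad" choices.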
In addition, signed drift rates increased with the Decision-maker's offer, suggesting that lower offers corresponded to stronger evidence for negative judgements and higher offers corresponded to stronger evidence for positive judgements. Absolute drift rates were smaller for negatively valenced offers, supporting the notion that negative evidence is accumulated at a slower rate than positive evidence, for reasons most likely not related to moral decision caution per se. Additionally, the starting-point parameter showed a bias against "bad" judgements, suggesting that people also slowed their negative judgements because they were particularly cautious about them.

Our finding that participants slowed their judgements when expecting contextual information is consistent with previous research showing that people are more cautious when aware that they are more prone to making mistakes 24,25. Notably, previous research has demonstrated this effect for decision mistakes in tasks in which people are not given additional information or a chance to change their minds 24,25. The current findings show that this effect also extends to dynamic decision-making contexts, in which learning additional information can lead to changes of mind. Crucially, here we show that this type of caution can be explained by the widening of the decision boundary separation in a process model of decision-making.

Finding that the expectancy of contextual information increases the boundary separation also highlights the importance of contextual information for moral judgements. This finding is consistent with previous research showing that contextual information influences the judgements that we make 2-8, and that some people make less extreme good/bad judgements when expecting contextual information 9.
To note, we did not find an adjustment of the judgement itself (see Figure 2), but the relatively coarse four-point scale might not have been ideal for capturing potential subtle effects that might have occurred but could not be expressed without a finer scale. The difference in response times, however, was observed even though the expected contextual information could never directly impact the initial judgement. This is important because it shows that context-dependent norms affect our judgements even when contextual information is not yet known, a point which has been overlooked in the moral judgement literature.

We further found that participants were slower when evaluating lower offers, which is in line both with the idea that people take longer to process negative evidence 29-32 and with the idea that people are more cautious against judging people as bad, as negative judgements have higher social repercussions for individuals 27,28. Our DDM results further support each of these accounts separately. Firstly, our finding that the drift rate was slower for lower offers as compared to higher offers is in line with the idea that people accumulate negative evidence at a slower rate 29-32. Secondly, we found that participants showed biases, or caution, against judging moral actions as bad, independent of taking longer to process negative evidence. Previous research on financial decision-making showed similar bias-parameter shifts away from options associated with less favourable monetary outcomes 21-23,26. Our results extend these findings to moral judgement valence, suggesting that people are inclined to default to positive judgements. This may be because of the sensitivity of the bias parameter to social outcomes, such as the repercussions that come with placing moral blame improperly 27,28.
Overall, our findings suggest that people take longer to make judgements about negative actions both because it takes them longer to process negative information and because they favour positive judgements.

Our finding that people have a biased caution against making negative judgements complements recent findings showing that people are more prone to adjust and change negative rather than positive beliefs about others 28. Although negative beliefs are more susceptible to change, our results suggest that people are more cautious about forming these beliefs in the first place. Together, these findings suggest that people are more careful about being accurate when evaluating morally negative evidence, both in terms of changing their minds when receiving information updates 9,28 and by allowing themselves time to consider all the information that is available when prompted to make a judgement.

[…] magnitude of Decision-makers' offers is in line with the idea that the moral prototypicality of the action determines the quality of evidence for moral badness and goodness. Previous research showed that drift rate scales with the perceptual discriminability of the stimuli in classical perceptual decision tasks 21. Our findings suggest that this effect generalises to moral decisions, which is in line with the idea that moral prototypicality (i.e. how well a moral action represents adherence to, or deviation from, a moral norm) equates to moral discriminability and determines the rate of moral decision evidence accumulation.

We did not find support for our hypothesis that context-expectancy would interact with the moral valence effect. Our RT results instead suggest that these two effects were additive.
These results are somewhat at odds with a previous finding that some participants reduced the intensity of their negative moral judgements (but not positive moral judgements) when expecting a contextual update 9. There are several explanations for this discrepancy. […]

[…] incorrect answers in more than 40% of catch trials of either category; see below), and two participants had missing responses for over 5% of trials, again suggesting a lack of attention. We preselected the final sample such that all included participants would rely on the same moral norms to make their judgements. This was done to avoid possible confounding of response times due to potential differences in norm-related information processing across norms, and to ensure that all participants were assigning moral meaning to the presented stimuli in a similar manner (a necessary assumption of the DDM when fit for a group of participant datasets). Based on previous research using a similar task, we expected the largest group to be participants who endorsed high and condemned low offers 9. A strong positive correlation between moral judgements and the Decision-maker's offer was typical for this largest group. We excluded eleven participants who did not show this strong positive correlation (Spearman correlation below r = .5). All of these criteria were predefined and preregistered (http://aspredicted.org/blind.php?x=n2fi7g). The final sample consisted of 55 participants (37 female, 18 male, M age = 24.84, SD = 5.86, range: 18-43 years).
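The preregistered inclusion criterion amounts to computing a Spearman rank correlation between each participant's judgements and the corresponding offers, and excluding participants below the threshold. A self-contained sketch (the judgement vector below is invented for illustration; judgements are coded 1-4 from "very bad" to "very good"):

```python
def _ranks(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend the tie group
        avg = (i + j) / 2 + 1           # mean rank of the tie group
        for k in order[i:j + 1]:
            ranks[k] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

offers = list(range(11))                         # Decision-maker gave $0-$10
judgements = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4]   # hypothetical participant
include = spearman(offers, judgements) >= 0.5    # Experiment 1 criterion
```

Experiment 2 applied the same logic with the stricter threshold r ≥ .6.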

[…] attention-check criterion (see above), and three had missing responses for over 5% of trials.
Another ten participants were excluded because their moral judgements did not correlate strongly with the Decision-maker's offer (Spearman correlation r < .6). All of these criteria were predefined and preregistered (https://aspredicted.org/blind.php?x=dy3qk9). […]

Participants made responses on a black Hewlett-Packard KU1469 QWERTY keyboard. The "z", "x", "." and "/" keys were covered with white stickers to indicate to participants that these were the primary buttons to be used in the experiment. They were instructed to place their fingers on these keys in preparation for every trial in the following manner: the middle finger and the index finger of their left hand were to be placed on the "z" and "x" keys, respectively, and the index finger and the middle finger of their right hand were to be placed on the "." and "/" keys, respectively.

Experimental Paradigm
Cover Story. Participants first read a cover story about a recently conducted experiment investigating people's economic decisions. This experiment was fictional, but participants were not informed of this. In the fictional experiment a group of people, assigned to pairs, completed a two-round variant of the dictator game (for the original dictator game, see ref 58). In the first round, one person (the "Decision-maker") in each pair was given $10 […] could give any whole-dollar portion (i.e. any amount $0-$10). In the second round, the same task was repeated except with people taking new roles (first-round Decision-makers became Receivers in the second round) and being assigned different partners. Some of these new partners had been Decision-makers in the first round of the experiment ("Old Receivers") and some of them had not ("New Receivers"). Importantly, second-round Decision-makers were aware whether their partner was an Old Receiver or a New Receiver. If their partner was an Old Receiver, they were also aware how much money their partner had shared with another person in the first round of the experiment. A visualisation of this cover story is shown in Figure 1a.

Instructions. This cover story, along with the description of the experimental task, was presented to participants via text interleaved with animated depictions. Participants read the instructions and attended to the animations at their own pace. Participants were then required to pass, with 100% accuracy, a test comprised of 32 true-false questions which assessed their understanding of both the cover story and the experiment instructions. Participants could attempt this instruction-check test three times. If they experienced trouble completing the quiz, participants could return to the cover story or instruction presentations to clarify their understanding, or ask questions of the experimenters.
Participants were required to pass this test before continuing to the experiment.

Experimental task. Participants were asked to observe a series of independent transactions that various Decision-makers made towards various Receivers, as described in the cover story. Each trial started with the participant being shown, for 3 s, whether the Receiver for that trial was an "OLD Receiver" (for context-expectant trials) or a "NEW Receiver" (for no-context trials), which corresponded to whether the Receiver had participated in the first round of the fictitious experiment. Then, a fixation cross was presented in the middle of the screen for 2 s. Participants were then presented with the phrase "Decision-maker gave: $y", where y was an integer from the set Y = {0, 1, 2, … 10} (the "Decision-maker offer"). Simultaneously, the response options "very bad", "bad", "good" and "very good" were presented below the Decision-maker offer. Participants selected their response, within a maximum response window of 3 s, to indicate how morally good or bad they believed this Decision-maker's action was, by pressing the button on the keyboard corresponding to the position of the presented option. To control for possible RT differences that could arise due to differences in motor execution across different fingers, participants were randomly assigned one of four possible mappings of responses to buttons, and this mapping remained the same throughout the experiment. The four mappings were selected to ensure that, across participants, each of the four fingers was mapped onto each response option. For consistency, none of the mappings had a monotonically increasing or decreasing order in space.

Once participants made their response, the corresponding response option immediately changed colour to yellow until the 3 s time limit had elapsed (or for 0.3 s if the judgement was made between 2.7 s and 3 s) before reverting to white.
This was done to assure participants that their response had been recorded.

Participants were then shown another fixation cross above this information for 0.5 s. The stimuli presented next differed depending on the experimental condition of the trial. In context-expectant trials, participants were presented with the phrase "Receiver gave: $x", where x was an integer from the set X = {0, 1, 2, … 10}, providing the contextual information of how much the Receiver had given when they were a Decision-maker in the first round. In the no-context trials, participants were presented with the phrase "NEW Receiver", reminding them that the Receiver had not participated in the first round, and thus there was no contextual information about them available. In both conditions, participants made a second moral judgement, within 3 s, about the Decision-maker's action (not the Receiver's). As before, the selected response option changed colour to yellow until the 3 s time limit had elapsed (or for 0.3 s if the judgement was made between 2.7 s and 3 s), after which a new trial began.

There were 121 trials in each condition, totalling 242 trials per participant. This number was chosen such that, in the context-expectant condition, participants made moral judgements about all possible combinations of the Decision-maker's offer (Decision-maker gave $0-$10) and the Receiver's prior offer (Receiver gave $0-$10; 11 × 11 = 121). To ensure symmetry between the experimental conditions, we also included 121 trials in the no-context condition. In Experiment 1 the order of these 242 trials was randomised for each participant and the two trial types alternated randomly (i.e. the two conditions were interleaved). In Experiment 2, we used a version of the experiment with the two expectancy conditions presented in separate blocks. There were 40-41 trials of the same kind in each block and 6 alternating blocks in total.
The order of trials was randomised for each participant, and participants were randomly assigned one of the two alternating block sequences.
Questionnaires. Following the experiment, participants completed various personality measures. We administered the agreeableness section of the HEXACO Personality Inventory-Revised (HEXACO) 59, a brief set of self-report measures for political orientation 60, the Social Dominance Orientation scale (SDO) 61, the Consequentialist Thinking Scale (CTS) 62, and basic demographic measures. We will analyse and report the questionnaire results in a separate publication.

Experiment Feedback and Instruction Checks. Participants were instructed to
respond as quickly and accurately as possible and to always give a response. If they failed to do so within the 3 s time limit, they were presented with feedback at the end of that trial advising which response was missing (or both) and to "please make sure you always respond". Two types of attention-check trials were also included. In the first, participants had to report how much the Decision-maker and/or the Receiver had given. Participants responded by entering this value using the number keys on the keyboard. For the second attention check, participants had to report, via button press, whether the Receiver in the current trial was an Old Receiver or a New Receiver. Participants were instructed that both of these attention-check trials would occur at random times during the experiment.

Statistical Analyses
Regression Analysis. RTs for the first moral judgement were modelled with the Generalised Linear Mixed Models (GLMMs) approach, a form of regression suitable for hierarchical data (e.g. data from multiple individuals in several conditions) that is not normally distributed. Invalid trials (i.e. trials without any response) were excluded from all analyses (0.72% of all trials in Experiment 1, and 1.17% in Experiment 2). GLMMs are superior to the common practice of transforming data before applying an ordinary-least-squares linear mixed model 63. GLMMs were specified as follows: An identity link was used because it assumes that RTs are direct measures of the duration of the decision process, rather than functional transformations of this duration 63. A gamma distribution was used as the conditional distribution as it provided a good empirical fit to the data. Moreover, gamma-distributed GLMMs have been used in numerous RT studies with similar tasks 64-67. Lastly, random effects were included in the model to account for individual differences.
We compared a list of theoretically plausible candidate models, derived with an increasingly complex random effects structure, as shown in Supplement 1 Table S1. For each random effects structure, a model was fit both with and without a fixed interaction parameter. For all models, the random effects were allowed to correlate; that is, the model had an unstructured variance-covariance matrix. Model parameters were estimated using maximum likelihood estimation via the Laplace approximation. Candidate models were compared using the Akaike Information Criterion (AIC) because, unlike the likelihood ratio test, the AIC method helps prevent overfitting 69. AIC was also preferred over the Bayesian Information Criterion 70 because it was unlikely that any of our candidate models are the true model, which better agreed with the assumptions of AIC 71.
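The AIC comparison described above reduces to simple arithmetic once each candidate model has been fit. A minimal sketch — the model names, log-likelihoods, and parameter counts below are invented for illustration, not the study's fitted values:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: (name, maximised log-likelihood, #parameters).
# Richer random-effects structures add parameters and raise the likelihood.
candidates = [
    ("random intercepts only", -5230.0, 6),
    ("+ random slopes", -5220.0, 9),
    ("+ fixed interaction", -5219.5, 10),
]

scores = {name: aic(ll, k) for name, ll, k in candidates}
best = min(scores, key=scores.get)
# AIC penalises the interaction model's extra parameter because it
# improves the log-likelihood by less than 1 unit over random slopes.
```

In this invented example the random-slopes model wins: the interaction term costs 2 AIC points but buys back only 1 point of fit.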
Akaike weights 72 were calculated for all candidate models as a means to quantify the relative merits of the competing models, and the degree to which one model should be preferred over the others. Confidence intervals (and, where necessary, p values) for fixed effects were calculated for most models using Wald's z method 73. The fixed parameter effects from the best-fitting model, and their 95% confidence intervals, were then used to test our hypotheses.
Diffusion Decision Model Analysis. We additionally modelled judgements and RTs within the diffusion decision model (DDM) framework, which decomposes the decision process into four free parameters (a, v, z, and t). HDDM estimates these parameters for each individual, as well as at the group level (the group-level estimates are those we report in this publication). This analysis was not preregistered, but was run separately for the Experiment 1 and Experiment 2 samples, allowing us to assess whether the findings replicated across samples. The estimation procedure implemented in the HDDM package was chosen as it outperforms other estimation techniques and can accurately recover model parameters based on a small number of observations per participant, especially for participant sample sizes larger than 20 75. Since the DDM is sensitive to outliers, it is recommended to devise exclusion criteria that exclude some of the contaminant RTs whilst not excluding larger portions of the data (e.g., more than 1%) 76. We conservatively excluded trials in which the reaction time was faster than 0.2 s (0.05% of valid trials in Experiment 1 and 0.27% of valid trials in Experiment 2) or slower than 2.8 s (0.37% of valid trials in both Experiment 1 and Experiment 2). The DDM was designed for binary decisions (e.g., "good" versus "bad"), which means that in order to model our data using the DDM, we simplified the data by collapsing across "very good" and "good" responses (good judgement) and across "very bad" and "bad" responses (bad judgement).
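The outlier exclusion and response collapsing described above can be sketched as follows; the function and data layout are ours, not the study's actual pipeline.

```python
def preprocess_for_ddm(trials, fast=0.2, slow=2.8):
    """Exclude contaminant RTs and collapse the 4-point scale to binary.

    `trials` is a list of (rt_seconds, response) pairs, where response is
    one of "very bad", "bad", "good", "very good". Returns (rt, response)
    pairs with response coded 1 = good judgement, 0 = bad judgement.
    """
    collapse = {"very bad": 0, "bad": 0, "good": 1, "very good": 1}
    return [
        (rt, collapse[resp])
        for rt, resp in trials
        if fast <= rt <= slow  # drop RTs faster than 0.2 s or slower than 2.8 s
    ]

cleaned = preprocess_for_ddm([
    (0.15, "bad"),        # excluded: faster than 0.2 s
    (0.90, "very good"),  # kept, coded 1
    (1.40, "bad"),        # kept, coded 0
    (2.95, "good"),       # excluded: slower than 2.8 s
])
# cleaned == [(0.9, 1), (1.4, 0)]
```

The DDM's two response boundaries then correspond to the collapsed "good" and "bad" judgements.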
We formulated two models to address our hypotheses: m0, the null model, which assumes no difference between conditions when estimating DDM parameters; and m1, the hypothesised model, in which parameter a was allowed to vary across the two expectancy conditions (a context-expectant and a no-context condition).
… (Table S4). Similarly, even using a Gaussian conditional distribution (i.e. an ordinary linear mixed model) yields similar fixed-effects results (Table S5). These analyses showed that the results remain similar regardless of the exact methodology utilised.
Note. DM = Decision-maker.
Figure S1. Diagnostic plot of model raw conditional residuals (y axis) by model-predicted value (x axis). Note that increasing raw residual variance (heteroscedasticity) is expected for gamma models, as the variance increases with the mean of the distribution.
Figure S2. Diagnostic plot of model raw marginal residuals (y axis) by model-predicted value (x axis). Note that increasing raw residual variance (heteroscedasticity) is expected for gamma models, as the variance increases with the mean of the distribution.
Figure S10. Comparison of the observed RT quantiles and RT quantiles from a dataset …