Perseverative behavior under uncertainty is enhanced by tryptophan depletion but diminished with subclinical obsessive–compulsive symptoms

The authors have withdrawn their manuscript as the results and interpretation have changed substantially following further analyses. Please do not cite this manuscript in its current form. If you have any questions, please contact the corresponding author.


Introduction
Understanding the neurochemical mechanisms that support flexible decision-making would have profound implications for improving the lives of individuals with a wide range of neuropsychiatric disorders. Serotonin has a well-established role in adapting behaviour flexibly to new circumstances; however, the preponderance of evidence comes from nonhuman animals (Bari et al. 2010; Lapiz-Bluhm et al. 2009; Matias et al. 2017; Rygula et al. 2015). Flexible decision-making is often modelled using probabilistic reversal learning (PRL) paradigms, in which subjects first learn, through trial and error, which of two stimuli more commonly leads to positive reinforcement; the contingencies then swap in a reversal phase. A core appeal of PRL is its translational value: it is amenable to study in rodents, monkeys, and humans, in both health and disease (Robbins and Cardinal 2019).
Acute tryptophan depletion (ATD) is the technique most commonly used for studying serotonin in humans (Faulkner and Deakin 2014) and has been in use for over 30 years (Bel and Artigas 1996; Biggio et al. 1974; Young 2013). ATD capitalises on the principles of serotonin synthesis and transport across the blood-brain barrier (BBB), and involves dietary depletion of serotonin's biosynthetic precursor, tryptophan, in the presence of other amino acids (Hood et al. 2005). This decreases central serotonin synthesis (Crockett et al. 2012; Nishizawa et al. 1997). While many insights about the function of serotonin have been gleaned via ATD (Faulkner and Deakin 2014; Mendelsohn et al. 2009), evidence that ATD modulates choice behaviour during PRL remains notably absent.
Existing studies of ATD and PRL have focused on observable behavioural measures analysed using classical statistics. These have included "win-stay" behaviour (staying with the same choice after a win) and "lose-shift" behaviour (changing choices after negative feedback), which assay sensitivity to immediate feedback. Perseveration, inappropriately staying with the previously optimal choice following contingency reversal, has also been assessed. However, there has been a proliferation of ways in which perseveration has been measured, which hampers comparison across species, studies, and paradigms (den Ouden et al. 2013). No effects of ATD on conventional measures of choice during PRL have been reported to date (Evers et al. 2005; Finger et al. 2007; Murphy et al. 2002; Kanen et al. 2020). We recently attempted to overcome some of the shortcomings of prior studies (Evers et al. 2005; Finger et al. 2007; Murphy et al. 2002) by nearly tripling the sample size, testing both sexes, using a between-subjects design to avoid practice effects, and calculating additional conventional measures, to no avail (Kanen et al. 2020). That we were unable to identify an effect of ATD on choice during PRL is puzzling, especially given that acute selective serotonin reuptake inhibitors (SSRIs) do have an effect on PRL in humans (Chamberlain et al. 2006; Skandali et al. 2018).
Here, we make another attempt to uncover the effects of ATD during PRL, by employing an entirely different methodology. We applied computational modelling of reinforcement learning processes, using Bayesian statistics. Computational modelling allows the examination of how learning evolves dynamically as experience accrues, using trial-by-trial raw data rather than relying on averaged data such as the probability of shifting (Daw 2011). This computational approach has shed light on the role of serotonin during PRL in marmosets (Rygula et al. 2015) and also revealed dissociable deficits during PRL in obsessive-compulsive disorder (OCD) and stimulant use disorder (SUD) (Kanen et al. 2019): a rigorous cross-species and clinical comparison is therefore possible and might represent a major advantage.
Guided by previous studies (Kanen et al. 2019;Rygula et al. 2015), the concepts we were most interested in examining were the reward learning rate, punishment learning rate, overall sensitivity of behaviour to reinforcement (via a softmax inverse temperature parameter), and stimulus stickiness. These model parameters enable the assessment of whether ATD affects the rate at which individuals update their valuation of choices following reinforcement; whether this differs for positive (reward learning rate) or negative feedback (punishment learning rate); whether ATD influences the extent to which actions are guided by reinforcement-driven valuation (e.g. versus random responding) (reinforcement sensitivity); and if ATD modulates the extent to which behaviour is driven by a basic tendency to choose recently chosen options, regardless of reinforcement (stimulus stickiness). Unlike conventional measures of perseveration, which could reflect a variety of underlying mechanisms (e.g. failure to update the internal "value" of stimuli), stimulus stickiness is conceptualized as a specific inclination towards stimulus-bound behaviour (e.g. Miller et al. 2019). Because focal serotonin depletion from the marmoset amygdala or orbitofrontal cortex (OFC), via the neurotoxin 5,7-dihydroxytryptamine (5,7-DHT), led to alterations in stimulus stickiness as well as learning rates (Rygula et al. 2015), we predicted that ATD may affect the stimulus stickiness or learning rate parameters in humans.
The recent modelling effort in OCD and SUD (Kanen et al. 2019), which revealed novel PRL deficits beyond those previously detected via conventional means (Ersche et al. 2011), motivated another set of empirical questions tested here. Perhaps counterintuitively, individuals with OCD demonstrated diminished perseverative tendencies during PRL, reflected by lower stimulus stickiness (Kanen et al. 2019). That is, people with OCD showed increased shifting of their choice away from their previous choice, regardless of the outcome of their actions. In the present study, we tested whether lower stimulus stickiness in healthy volunteers was related to higher levels of obsessive-compulsive symptoms (Foa et al. 2002).
We also explored the specificity of the hypothesized inverse relationship between subclinical OCD symptoms and stimulus stickiness, by examining whether low stimulus stickiness might alternatively be related to self-reported intolerance of uncertainty (Carleton et al. 2007), symptoms of anxiety (Spielberger et al. 1983), or depression (Beck et al. 1996). We hypothesized that low reinforcement sensitivity (inverse temperature) might instead be related to elevated symptoms of depression in healthy volunteers, which would be reminiscent of the overall blunting of affect seen in depression (Huys et al. 2012;Mukherjee et al. 2020).
Moreover, individuals with SUD, a characteristically impulsive population (Ersche et al. 2010), have shown blunted reward learning rates yet elevated learning rates for negative feedback during PRL (Kanen et al. 2019). We therefore tested whether self-reported impulsivity (Patton et al. 1995) in healthy individuals was likewise correlated with a lower reward learning rate and an elevated punishment learning rate.
This study addresses the following questions. Does ATD modulate latent reinforcement learning mechanisms, in particular stimulus stickiness or reinforcement learning rates? Is stimulus stickiness related to obsessive-compulsive symptoms or instead intolerance of uncertainty, anxiety, or depressive symptoms in a non-clinical sample? How does reinforcement sensitivity relate to these self-report measures, in particular depressive symptoms? Is trait impulsivity correlated with learning rate parameters? Are any such relationships influenced by ATD? We investigated these questions in a large sample of healthy volunteers (n = 62), in a randomised double-blind placebo-controlled between-subjects design: the primary objective was to better determine whether ATD modulates choice behaviour in PRL.

Participants
Participants (33 males, 17 placebo, 16 ATD; 29 females, 15 placebo, 14 ATD) were screened to be psychiatrically healthy using the Mini International Neuropsychiatric Interview (MINI; Sheehan et al. 1998). They were medically healthy, free of regular medication besides contraception, and had not taken psychiatric or neurological medication in the past. Participants reported having no first-degree relatives with a diagnosed mental illness at the time of screening. Further participant characteristics and exclusion criteria can be found in Kanen et al. (2020, 2021). Participants gave informed consent and were paid.

General procedure
The study was approved by the Cambridge Central Research Ethics Committee (16/EE/0101) and was carried out at the National Institute for Health Research/Wellcome Trust Clinical Research Facility at Addenbrooke's Hospital in Cambridge, England.
Volunteers arrived in the morning having fasted for at least 9 hours beforehand, completed a 16-item visual analogue scale (VAS) to rate baseline mood and other feelings including alertness, gave a baseline blood sample, and ingested either a tryptophan-containing placebo mixture or a tryptophan depletion drink, administered in a randomised and double-blinded fashion. Approximately 4.5 hours after ingesting the drink (Carpenter et al. 1998), participants completed the VAS again, gave another blood sample, and completed the PRL task. Specific quantities of amino acids used can be found in Kanen et al. (2020).

Probabilistic reversal learning task
The task is shown in Figure 1 and contained 40 trials during acquisition and 40 following reversal, for a total of 80 trials (Chamberlain et al. 2006; Murphy et al. 2002; Skandali et al. 2018). During acquisition, one option yielded positive feedback on 80% of trials, the other option on 20% of trials. These contingencies reversed for the latter 40 trials.
Probabilistic reinforcement provides an element of uncertainty not present in tasks that employ deterministic reinforcement.
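The reinforcement schedule described above can be sketched in a few lines. This is an illustrative simulation under the stated contingencies, not the task code; the function name and stimulus coding are assumptions for the example:

```python
import random

def prl_feedback(trial, choice, p_good=0.8, reversal_at=40, rng=random):
    """Simulate feedback for one PRL trial.

    choice: 0 or 1. Stimulus 0 is taken to be the 'good' (80%) option
    during acquisition, stimulus 1 after the reversal. Returns 1
    (positive feedback) or 0 (negative feedback).
    """
    good = 0 if trial < reversal_at else 1
    p = p_good if choice == good else 1.0 - p_good
    return 1 if rng.random() < p else 0
```

Choosing the good stimulus yields positive feedback on roughly 80% of trials, and the identity of the good stimulus switches at the reversal point, so a learner must track outcomes rather than a fixed rule.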

Classical statistics
Homogeneity of variance for t tests was verified with Levene's test, and degrees of freedom were adjusted when this assumption was violated. Multiple comparisons correction was conducted using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).
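The step-up logic of the Benjamini-Hochberg procedure can be sketched as follows; this is a minimal illustration of the published procedure, not the analysis code used in the study:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR procedure: return a list of booleans, one
    per p-value, marking which null hypotheses are rejected."""
    m = len(pvals)
    # Sort p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose p-value meets its threshold alpha*k/m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis at or below that rank.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

Note the step-up character: a p-value can be rejected even if it exceeds its own threshold, provided a larger p-value further down the ranking meets its threshold.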

Overview
These methods are based on Kanen et al. (2019). Four reinforcement learning (RL) models, incorporating parameters that have been studied previously, were fitted to the behavioural data using a hierarchical Bayesian method (Kanen et al. 2019). The priors used for each parameter are shown in Table 1. The models were fitted to the full sequence of 80 trials of the PRL task; on each trial the computational model was supplied with the participant's identification number and drug condition, whether the trial resulted in positive or negative feedback, and which stimuli were presented.

Models
Model 1 incorporated three parameters and was used to test the hypothesis that ATD would affect the speed at which the action values guiding behaviour are updated following positive versus negative feedback. Separate learning rates for reward (positive feedback), α_rew, and punishment (negative feedback, nonreward), α_pun, were implemented. Positive reinforcement led to an increase in the value V_i of the stimulus i that was chosen, at a speed governed by the reward rate α_rew, via V_{i,t+1} ← V_{i,t} + α_rew (R_t − V_{i,t}). R_t represents the outcome on trial t (defined as 1 on trials where positive feedback occurred), and (R_t − V_{i,t}) the prediction error. On trials where negative feedback occurred, R_t = 0, which led to a decrease in the value V_i at a speed governed by the punishment rate α_pun, according to V_{i,t+1} ← V_{i,t} + α_pun (R_t − V_{i,t}). Stimulus value was incorporated into the final quantity controlling choice according to Q^reinf_t = τ_reinf V_t. The additional parameter τ_reinf, termed reinforcement sensitivity, governs the degree to which behaviour is driven by reinforcement history. The quantities Q associated with the two available choices on a given trial were then input to a standard softmax choice function to compute the probability of each choice, P(choice = i) = exp(β Q_i) / Σ_j exp(β Q_j), for n = 2 choice options. The probability values for each trial emerging from the softmax function (the probability of choosing stimulus 1) were fitted to the subject's actual choices (did the subject choose stimulus 1?). The softmax inverse temperature was set to β = 1; as a result, the reinforcement sensitivity parameter (τ_reinf) directly represented the weight given to the exponents in the softmax function.
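As a concrete sketch of Model 1 (illustrative, not the authors' fitting code), the dual-learning-rate value update and the two-option softmax with β = 1 can be written as:

```python
import math

def update_value(v, reward, alpha_rew, alpha_pun):
    """Rescorla-Wagner update with separate learning rates for positive
    (reward = 1) and negative (reward = 0) feedback, as in Model 1."""
    alpha = alpha_rew if reward == 1 else alpha_pun
    return v + alpha * (reward - v)  # move v toward the outcome by alpha

def softmax_p(q1, q2):
    """Probability of choosing option 1 given quantities Q (beta = 1)."""
    return math.exp(q1) / (math.exp(q1) + math.exp(q2))
```

With equal quantities the choice probability is 0.5; reinforcement sensitivity scales the Q values and thus how deterministically the higher-valued option is chosen.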
Model 2 was as Model 1 but incorporated a "stimulus stickiness" parameter τ_stim, which measures the tendency to repeat a response to a specific perceptual stimulus, irrespective of the action's outcome. This four-parameter model served to test whether accounting for stimulus-response learning, in addition to learning about action-outcome associations, would best characterise behaviour under ATD. The stimulus stickiness effect was modelled as Q^stim_t = τ_stim s_{t−1}, where s_{t−1} was 1 for a stimulus that was chosen on the previous trial and 0 otherwise. The final quantity controlling choice incorporated this additional parameter as Q_t = Q^reinf_t + Q^stim_t. The quantities Q corresponding to the two choice options on a given trial were then fed into the softmax function as above.
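A minimal sketch of how Model 2 combines reinforcement value and stickiness into the quantities entering the softmax (the function name and argument layout are illustrative):

```python
def choice_quantities(v1, v2, prev_choice, tau_reinf, tau_stim):
    """Combine reinforcement value and stimulus stickiness into the
    quantities Q entering the softmax (Model 2).

    prev_choice: 0 or 1 for the stimulus chosen on the previous trial,
    or None on the first trial (no stickiness bonus)."""
    q1 = tau_reinf * v1 + (tau_stim if prev_choice == 0 else 0.0)
    q2 = tau_reinf * v2 + (tau_stim if prev_choice == 1 else 0.0)
    return q1, q2
```

When the two stimulus values are equal, a positive τ_stim tips choice toward the previously chosen stimulus regardless of outcome, which is exactly the perseverative tendency the parameter indexes.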
Model 3 incorporated three parameters and served to test whether a single learning rate α_reinf, rather than separate learning rates for positive and negative feedback, optimally characterised behaviour under ATD. Reinforcement led to a change in the value V_i of the stimulus i that was chosen, at a speed controlled by the reinforcement rate α_reinf, via V_{i,t+1} ← V_{i,t} + α_reinf (R_t − V_{i,t}). R_t represents the outcome on trial t (1 for positive feedback, 0 for negative feedback). Model 3 also included the stimulus stickiness parameter. The final quantity controlling choice was determined by Q_t = Q^reinf_t + Q^stim_t.
Model 4 was derived from the experience-weighted attraction (EWA) model of Camerer and Ho (1999) and was implemented here as in den Ouden et al. (2013), where the EWA model best described behaviour on a nearly identical task. A key difference from the other reinforcement learning models tested in this study is that here the learning rate can decline over time, governed by a decay factor ρ (rho). The EWA model weighs the value of new information against current expectations or beliefs accumulated from previous experience.
Learning from reinforcement is modulated by an "experience weight", n_{c,t}, which is a measure of how often the subject has chosen a stimulus (i.e. experienced the action) and is updated every time the stimulus is chosen (where c is choice and t is trial), according to the experience decay factor ρ (range 0 < ρ < 1); it can increase without bound (den Ouden et al. 2013):

n_{c,t} = ρ n_{c,t−1} + 1

The value of a choice is updated according to the outcome, λ, and the decay factor for previous payoffs, φ (range 0 < φ < 1) (den Ouden et al. 2013):

V_{c,t} = (φ n_{c,t−1} V_{c,t−1} + λ_t) / n_{c,t}

The payoff decay factor φ (phi) is related to a Rescorla-Wagner-style (Rescorla and Wagner 1972) learning rate α (as in Models 1–3) by α = 1 − φ. A high value of φ means that stimuli keep a high fraction of their previous value and thus learning from reinforcement is slow. When ρ is high, "well-known" actions (with high n) are updated relatively little by reinforcement, by virtue of the terms involving n, whilst reinforcement has a proportionately larger effect on novel actions (with low n). For comparison to Models 1–3, when ρ = 0 the experience weight n is 1, which reduces to a learning rate α controlling the influence of learning from prediction error. Choice in the EWA model is also governed by a softmax process, except that here the softmax inverse temperature β was also a free parameter, in contrast to Models 1–3.
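A minimal sketch of a single EWA update for the chosen stimulus, assuming the den Ouden et al. (2013) formulation given above (the function name and return convention are illustrative):

```python
def ewa_update(v, n, outcome, rho, phi):
    """One experience-weighted attraction update for the chosen stimulus.

    v: current value V; n: current experience weight; outcome: payoff
    lambda; rho: experience decay factor; phi: payoff decay factor.
    Returns (new_v, new_n)."""
    n_new = rho * n + 1.0                    # experience weight update
    v_new = (phi * n * v + outcome) / n_new  # decayed value plus new payoff
    return v_new, n_new
```

With ρ = 0 the experience weight stays at 1 and the update collapses to a fixed-rate rule; with ρ near 1 the experience weight grows across trials and each new outcome moves the value progressively less, which is the declining learning rate the text describes.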

Model fitting and comparison
Models were fitted using Hamiltonian Markov chain Monte Carlo sampling via Stan 2.17.2 (Carpenter et al. 2017). Convergence was checked with the potential scale reduction factor measure R̂ (Brooks and Gelman 1998; Gelman et al. 2012), which approaches 1 for perfect convergence; values below 1.2 are typically used as a guideline for convergence, and a cutoff of <1.1 is a stringent criterion for convergence (Brooks and Gelman 1998). The use of multiple simulation runs with measurement of convergence is an important check for simulation reliability (Wilson and Collins 2019) and is an intrinsic part of Stan. Parameter recovery from simulated data for this modelling approach has been confirmed by Kanen et al. (2019).
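As an illustration (not the Stan internals), R̂ for a single scalar parameter can be computed from a set of equal-length chains roughly as follows, assuming the basic Gelman-Rubin formulation without the later split-chain refinement:

```python
def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter,
    given a list of equal-length MCMC chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # Pooled variance estimate, then R-hat.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Chains sampling the same distribution give R̂ near 1; chains stuck in different regions inflate the between-chain variance and push R̂ well above the 1.1 cutoff.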
Models were compared using a bridge sampling estimate of the marginal likelihood (Gronau et al. 2017a) via the "bridgesampling" R package (Gronau et al. 2017b). This procedure directly estimates the marginal likelihood, and thus the posterior probability of each model, given the data, prior model probabilities, and the assumption that the models represent the entire family of those to be considered. All models were assumed to have equal prior probability.
In addition to the estimated parameters, group comparisons (e.g. between ATD and placebo for a given parameter) were also sampled directly to give a posterior probability distribution for each quantity of interest. Posterior distributions were interpreted using the 95% highest posterior density interval (HDI), the Bayesian "credible interval". Identical priors were used for all groups and conditions.
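The 95% HDI is the narrowest interval containing 95% of the posterior mass. A minimal sketch of computing it from posterior samples (illustrative, not the package used here):

```python
def hdi(samples, mass=0.95):
    """Highest posterior density interval from posterior samples: the
    narrowest interval containing the requested probability mass."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples; keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]
```

For a group difference, one would apply this to the posterior samples of the difference and check whether the resulting interval excludes zero, as in the analyses reported below.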

Choice of reinforcement learning model
The core modelling results are summarised in Figure 2. Computational modelling accounts for how behaviour is influenced by an integration of previous choices and feedback history from multiple experiences. Behaviour in the present PRL task was best characterised by a simple reinforcement learning model, as determined by a bridge sampling estimate of the marginal likelihood (Table 2). Four reinforcement learning models were fitted and compared. Convergence was nearly perfect, with all four models having R̂ < 1.001. The winning model (Model 2) included four parameters: 1) reward learning rate, the speed at which an action's value is updated following positive feedback; 2) punishment learning rate, the speed at which an action's value decreases following negative feedback; 3) reinforcement sensitivity, the degree to which the overall outcome of behaviour contributes to choice (how heavily stimulus value learned through reinforcement is weighted); and 4) "stimulus stickiness", which indexes the tendency to get "stuck" to a cue: was the chosen stimulus selected on the previous trial, irrespective of outcome? It should be noted that Model 2 (which had separate learning rates for reward and punishment) and Model 3 (which had a single learning rate) were nearly equiprobable. Comparison via parameter estimation is in general better than via model comparison (Kruschke 2011): Model 2 both wins via model comparison and allows for a direct comparison of reward and punishment learning rates. In other words, a very close model comparison has gone the way that does not lead to potential loss of information.

Effect of ATD on model parameters
Modelling results are summarised in Figure 2. The key result was that stimulus stickiness was elevated under ATD compared to placebo (with the posterior 95% highest posterior density interval [HDI] of the difference between these means excluding zero; 0 ∉ 95% HDI). There were no differences between placebo and ATD for the reward learning rate, punishment learning rate, or reinforcement sensitivity parameters, which all had a 95% HDI containing zero (0 ∈ 95% HDI).

Relationship between obsessive-compulsive symptoms and stimulus stickiness
Results are summarised in Figure 3a. The relationship between OC symptoms and stickiness was not modulated by serotonergic status. Analysis of covariance (ANCOVA) was performed to test for an interaction effect of serotonin (placebo versus depletion) and OC symptoms (OCI-R scores) (between-subjects factors) on stimulus stickiness, controlling for main effects. A serotonin × OC interaction was not present, indicating that the relationship between total OCI-R score and stickiness was not modulated by serotonergic status (F(1,61) = 2.239, p = .14, ηp² = .037). To characterise the relationship between OCI-R score and stickiness at baseline, we also conducted an exploratory analysis on the placebo group alone: lower stimulus stickiness remained correlated with elevated OCI-R scores (r(32) = −.529, p = .002).

Specificity of low stimulus stickiness to obsessive-compulsive symptoms
Results are summarised in Figure 3b-3d. To examine the specificity of the hypothesized relationship between low stimulus stickiness and greater OC symptoms, we conducted exploratory correlations to determine whether reduced stickiness in healthy volunteers is also related to elevations in trait anxiety, depressive symptoms, or intolerance of uncertainty. Stimulus stickiness was not correlated with intolerance of uncertainty (r(62) = -.032, p = .804), trait anxiety (r(62) = -.142, p = .270), or depressive symptoms (r(62) = -.054, p = .674), collapsing across serotonergic status. Repeating the above ANCOVA approach separately for intolerance of uncertainty, trait anxiety, and depressive symptoms confirmed there were no trait × serotonin interaction effects on stimulus stickiness (F < 1.4, p > .05, ηp 2 < .03 for all interaction terms).

Relationship between obsessive-compulsive symptom categories and stimulus stickiness
Because OCD and OC symptoms have numerous manifestations, which can be assessed using subscales of tools like the OCI-R, we followed up the significant correlation between stickiness and total OCI-R score by exploring categories of symptoms, collapsed across serotonergic status. Lower stimulus stickiness was significantly correlated with higher scores on each OCI-R subscale (checking, neutralising, obsessing, ordering, and washing) except hoarding. These relationships were impervious to experimental changes in serotonin, as determined via ANCOVA: interaction effects, between serotonin status and each subscale, on stimulus stickiness were absent for all subscales (F < 3, p > .05, ηp² < .05 for all interaction terms).
This converges with evidence from individuals with OCD, who show aberrantly low stickiness despite serotonergic medication (Kanen et al. 2019).

Relationship between trait measures and reinforcement sensitivity
Results are shown in Figure 4. Across the entire sample, diminished reinforcement sensitivity was correlated with elevated depressive symptoms (BDI scores). We then explored the relationship between individual-subject reinforcement sensitivity parameter estimates and self-report measures in the placebo group only, to characterise these relationships at baseline, when unaffected by ATD; such relationships may have been obscured by including all participants in the same analysis. As was the case for the entire sample, diminished reinforcement sensitivity remained correlated with elevated BDI scores when isolating the placebo group (r(32) = −.421, p = .016). On placebo, lower reinforcement sensitivity was additionally correlated with greater intolerance of uncertainty (r(32) = −.433, p = .013) and higher trait anxiety (r(32) = −.366, p = .04), but not with OCI-R scores (r(32) = −.246, p = .174). These three significant correlations involving reinforcement sensitivity in the placebo group survived the Benjamini-Hochberg procedure for four comparisons.

Relationship between learning rates and trait impulsivity
Results are displayed in Figure 5. Next, we tested the relationship between learning rates and trait impulsivity, assessed using the Barratt Impulsiveness Scale (BIS; Patton et al. 1995). This analysis was prompted by Kanen et al. (2019): individuals with stimulant use disorder (SUD), who are characteristically impulsive (Ersche et al. 2010), demonstrated diminished reward and elevated punishment learning rates during PRL. Indeed, analysis of healthy volunteers under placebo revealed that individuals higher in trait impulsivity showed a lower reward learning rate (r(32) = −.363, p = .041) and a higher punishment learning rate (r(32) = .375, p = .034). Within the depletion group, correlations were not observed between trait impulsivity and the reward learning rate (r(30) = −.086, p = .652) or the punishment learning rate (r(30) = .267, p = .153). Critically, the two significant correlations in the placebo group survived the Benjamini-Hochberg procedure for four comparisons.
There was no serotonin × impulsivity interaction effect on either the reward or punishment learning rate, as assessed by two ANCOVAs (F < .7, p > .05, ηp 2 < .015, for both interaction terms).

Discussion
In light of the previous difficulty in detecting effects of ATD on choice behaviour during PRL (Evers et al. 2005; Finger et al. 2007; Murphy et al. 2002; Kanen et al. 2020), we reasoned that the microstructure of behaviour, as revealed through computational modelling, might be more sensitive to ATD. Indeed, the present study uncovered effects of ATD on choice tendencies during PRL for the first time, enabled by computational methods. Beyond the effect of ATD, the modelling approach also enabled new relationships to be uncovered between the parameters governing behaviour and both self-reported traits and subclinical symptoms. Specifically, this study has shown that 1) ATD increased stimulus stickiness; 2) diminished stimulus stickiness was related to elevated obsessive-compulsive symptoms and not to depressive symptoms, trait anxiety, or intolerance of uncertainty; 3) diminished stimulus stickiness was correlated with all OC subscales (checking, washing, etc.) except for hoarding; 4) diminished reinforcement sensitivity was related to elevated depressive symptoms, trait anxiety, and intolerance of uncertainty, but not OC symptoms; and 5) higher trait impulsivity was related to diminished reward and enhanced punishment learning rates.
That stimulus stickiness was elevated under ATD may be particularly relevant for understanding stimulant use disorder, where enhanced stimulus stickiness was recently found during PRL (Kanen et al. 2019). There is robust evidence that serotonin in the OFC is critical for reversal learning (Barlow et al. 2015;Clarke et al. 2004;Lapiz-Bluhm et al. 2009). Indeed, it has been shown that individuals with SUD have decreased serotonin concentration in the OFC assessed post mortem (Wilson et al., 1996), as well as diminished levels of the OFC serotonin transporter (Kish et al., 2009). Furthermore, rats prone to compulsive drug-taking have been shown to have diminished forebrain serotonin (assessed via the 5-HT:5-HIAA ratio) in ventral and dorsal PFC, ventral and dorsal striatum, and amygdala (Pelloux et al. 2012). Cox et al. (2011), moreover, studied non-dependent cocaine users and found that during cocaine administration, ATD enhanced drug craving, which was accompanied by elevated striatal dopamine responses.
Our result, that ATD enhanced stimulus stickiness, is strengthened when aligned with a series of studies using other behavioural paradigms showing that ATD increases compulsive tendencies in healthy volunteers (Seymour et al. 2012; Worbe et al. 2015). One study found that ATD elevated a comparable measure of stickiness on a four-choice probabilistic task (Seymour et al. 2012). Other studies have shown a shift away from goal-directed behaviour towards habitual responding for rewards under ATD (Worbe et al. 2015), a pattern of responding seen in SUD (Ersche et al. 2016; Voon et al. 2015a) but also relevant for understanding alcohol use disorder (Sjoerds et al. 2013; Chen et al. 2021), nicotine dependence (Luijten et al. 2020), binge eating disorder (Voon et al. 2015a), Tourette syndrome (Delorme et al. 2016), and OCD (Gillan et al. 2011; Voon et al. 2015a; but see Voon et al. 2015b and Worbe et al. 2016 for how effects in OCD and ATD differ when responding to avoid punishment).
Enhanced stickiness under ATD is also consistent with results in marmosets after serotonin depletion using 5,7-DHT (Rygula et al. 2015). Following administration of 5,7-DHT to either the OFC or amygdala, marmosets showed heightened location stickiness (a tendency to repeat choices in the same location as before, regardless of outcome; Rygula et al. 2015).
Additionally, marmosets that received 5,7-DHT in the OFC tended to repeat choices to recently chosen stimuli across a longer timescale (were marginally slower than controls to "update" the effects of stimulus stickiness following their choices), whereas 5,7-DHT in the amygdala produced a more ephemeral tendency to repeat choices (Rygula et al. 2015).
The modelling approach allowed for further comparisons to clinical conditions, beyond the effects of ATD. Healthy volunteers with higher OC symptom scores showed lower stimulus stickiness, which replicates previous results in clinically diagnosed OCD (Hauser et al. 2017;Kanen et al. 2019) and highlights a possible marker of vulnerability. Having established the relationship between diminished stimulus stickiness and OC symptoms in the present healthy sample, we were able to address a series of follow-up questions.
Previous studies had conjectured that the tendency of patients with OCD not to repeat previous choices during PRL could be a manifestation of checking behaviour-checking the previously unchosen option (Hauser et al. 2017; Kanen et al. 2019). Here we showed that it was not only the "checking" subscale of the OCI-R that was correlated with stimulus stickiness, but that all other subscales (neutralising, obsessing, ordering, washing) save for "hoarding" were correlated. The suggestion here is that the phenomenon of low stimulus stickiness is correlated with OCD symptoms globally, except perhaps hoarding. This could be relevant for the recent reclassification of hoarding disorder as a distinct condition from OCD, albeit in the same DSM-5 section: obsessive-compulsive and related disorders (APA 2013). That hoarding was not related to diminished stimulus stickiness in this study raises the possibility that this parameter may be an important novel objective measure for distinguishing other related conditions such as body dysmorphic disorder, excoriation (skin-picking) disorder, and trichotillomania (hair-pulling disorder; APA 2013).
The significant correlations involving OC symptoms were impervious to changes in serotonin, which is consistent with Kanen et al. (2019), in which individuals with OCD showed diminished stimulus stickiness despite the fact that most were taking SSRIs. Whilst we also clarified that the stickiness parameter was not related to intolerance of uncertainty, anxiety, or depressive symptoms (regardless of serotonergic status), which we had charted as candidate phenomena underlying this behavioural pattern, the question remains: What does reduced stimulus stickiness mean for understanding OCD?
One intriguing account posits that the brain in OCD has difficulty estimating the probability of transitioning between states of the world (Fradkin et al. 2020a, 2020b). In the context of PRL, one state would be the acquisition phase, another state would be the reversal phase. In OCD, this would map onto statistical uncertainty (variance) about the likelihood of transitioning from dirty to clean hands after washing, for example. The optimal response to uncertainty about the current state would be to continue gathering information (Fradkin et al. 2020a). In the context of PRL, this would manifest as overly exploratory responses, consistent with low stimulus stickiness as seen here in individuals with high subclinical OCD symptoms, and in clinical OCD (Kanen et al. 2019). Importantly, the OFC, a key area of dysfunction in OCD (Chamberlain et al. 2008; Remijnse et al. 2006), is implicated in representing states (Schuck et al. 2016; Wilson et al. 2014). In extension of this framework, it should be noted that SUD, where high stimulus stickiness has been observed (Kanen et al. 2019), may be characterised by over-encoding of state-specific rules and information (Mueller et al. 2020).
Subsequent analyses primarily focused on the placebo group in order to assess baseline relationships between model parameters and self-report measures. There was a striking dissociation in how reinforcement sensitivity, as opposed to stimulus stickiness, related to self-report measures: blunted reinforcement sensitivity was related to depressive symptoms, as well as trait anxiety and intolerance of uncertainty, but not to OC symptoms.
The learning rates of impulsive healthy volunteers, furthermore, aligned with what has been reported in SUD (lower reward and higher punishment learning rates; Kanen et al. 2019), providing additional parallels between the non-clinical sample studied here and clinical populations. This also raises the possibility that alterations in learning rates may precede, and potentially predispose to, problematic use of stimulants like cocaine and methamphetamine, rather than being a consequence of drug toxicity. That impulsivity is a behavioural endophenotype, or vulnerability trait, for stimulant abuse, found also in unaffected first-degree relatives (Ersche et al. 2010), suggests that learning rates assessed using computational modelling may represent an additional endophenotype warranting further study. It is advantageous that learning rates are objective measures that can be assessed across paradigms, populations, and species.
The present study has, for the first time, identified a latent mechanism supporting behaviour that was modulated by ATD during PRL. That computational methods unearthed effects in the face of null results on conventional measures (Evers et al. 2005; Finger et al. 2007; Kanen et al. 2020; Murphy et al. 2002) highlights the sensitivity and promise of applying reinforcement learning models. ATD enhanced a basic perseverative tendency, which, remarkably, aligned with behaviour in SUD (Kanen et al. 2019). Considering a series of self-report measures in a hypothesis-driven approach yielded several important clinically relevant insights. Behavioural patterns in clinical OCD and SUD were replicated using analogous methods in healthy volunteers high in OC symptoms and trait impulsivity, respectively. Applying reinforcement learning models to PRL may hold promise for refining psychiatric classification (Cuthbert and Insel 2013) and identifying vulnerability factors to an array of psychopathologies (Marx et al. 2020), including disorders of compulsivity.

Figure 1. Task schematic. Two visual stimuli were presented simultaneously in two of four randomised locations on each trial. Participants needed to choose one stimulus by pressing on the touchscreen. Feedback not shown. The task was self-paced.

Figure 2. Effects of ATD relative to placebo. Stimulus stickiness was enhanced following ATD. Red signifies a difference between the per-condition parameter means according to the Bayesian "credible interval", 0 ∉ 95% HDI.

Immediate perseveration, by contrast, refers to perseverative errors contiguous with the reversal (Murphy et al. 2002); however, the first trial of the reversal phase (trial 41 of 80) was excluded from the perseveration analysis because at that point behaviour cannot yet have been shaped by the new feedback structure (e.g. incorrect responses on trials 42, 43, and 44 would count as three immediate perseverative errors). These tests were performed in the placebo and depletion groups separately, and the results were then subjected to the Benjamini-Hochberg procedure. Stimulus stickiness was the only parameter correlated with several conventional measures of behaviour after accounting for 48 comparisons.
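The Benjamini-Hochberg step-up procedure used to account for the 48 comparisons can be sketched as follows. This is a generic implementation of the standard procedure, not code from the study; the p-value list is a placeholder for the 48 correlation tests described above.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected nulls under the
    Benjamini-Hochberg false discovery rate procedure: reject all
    hypotheses up to the largest rank i whose sorted p-value
    satisfies p_(i) <= (i / m) * alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                      # ranks, smallest p first
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))  # largest qualifying rank
        reject[order[:k + 1]] = True
    return reject
```

Unlike a Bonferroni correction, the threshold grows with the rank of the p-value, which preserves power when many of the 48 tests are expected to be null.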
Significant correlations were followed up by testing whether the relationships were significantly different between the placebo and depletion groups.

Relationship between model parameters and extent of depletion
Individual subject values of each of the four parameters from the winning model were tested for correlation with the extent of depletion achieved. Using data from all subjects, none of the four parameters were correlated with the change in the tryptophan to large neutral amino acid (TRP:LNAA) ratio: reward rate (r(59) = -.241, p = .065), punishment rate (r(59) = .151, p = .253), reinforcement sensitivity (r(59) = -.023, p = .861), and stimulus stickiness (r(59) = -.057, p = .666). When examining the ATD group alone, there was likewise no correlation between the extent of depletion and stimulus stickiness (r(28) = -.040, p = .839).
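For reference, a correlation reported as r(59) has 59 degrees of freedom, i.e. n = 61 participants (df = n - 2). The coefficient itself is a standard Pearson correlation, sketched below with the usual formula; the inputs here would be the parameter estimates and the change in the TRP:LNAA ratio (illustrated with synthetic data, not the study's).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance of the mean-centred
    series divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return (xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym))
```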