Abstract
Background Schizophrenia is a complex disorder in which the causal relations between risk genes and observed clinical symptoms are not well understood and the explanatory gap is too wide to be clarified without considering an intermediary level. Thus, we aimed to test the hypothesis of a pathway from molecular polygenic influence to clinical presentation occurring via deficits in reinforcement learning.
Methods We administered a Go/NoGo task that measures reinforcement learning and the effect of Pavlovian bias on decision making. We modelled the behavioural data with a hierarchical Bayesian approach (hBayesDM) to decompose task performance into its underlying learning mechanisms. Study 1 included controls (n= 29, F|M=0.81), individuals with an At Risk Mental State for psychosis (ARMS, n= 23, F|M=0.35) and patients with first-episode psychosis (FEP, n= 26, F|M=0.18). Study 2 included healthy adolescents (n= 735, F|M= 1.06), 390 of whom had their polygenic risk scores for schizophrenia (PRSs) calculated.
Results Patients with FEP (but not ARMS) showed significant impairments in overriding Pavlovian conflict, a lower learning rate and a lower sensitivity to punishment. PRSs did not significantly predict performance on the task in the general population, which did not strongly correlate with measures of psychopathology.
Conclusions Reinforcement learning deficits are observed in first episode psychosis but not clinical risk for psychosis, and were not predicted by molecular genetic risk for schizophrenia in healthy individuals. The study does not support the role of reinforcement learning as an intermediate phenotype in psychosis.
1. Introduction
Cognitive deficits are commonly observed in schizophrenia, including prominent deficits in decision making and in reinforcement learning (trial and error based learning from feedback). Reinforcement learning (RL) is a cognitive domain of interest, not only because impairments in this domain may have a direct impact on educational and occupational outcomes, but also because reinforcement learning deficits may mechanistically contribute to the pathogenesis of positive and/or negative symptoms of schizophrenia and other psychoses (Frank, 2008; Deserno et al., 2013; Murray et al., 2016). Reinforcement learning has been suggested as a candidate process for an intermediate phenotype in schizophrenia, lying on the causal path between identified risk factors and the full clinical expression of the phenotype of illness (Kasanova et al., 2018).
Despite the strong role for genetics in the aetiology of schizophrenia (Tsuang, 2000), there is only indirect evidence that reinforcement learning deficits in schizophrenia are at least partly genetic in origin. Recent evidence indicates shared genetic overlap between the genes underpinning general intellectual function and schizophrenia liability (Toulopoulou et al., 2018), and reinforcement learning correlates significantly with IQ (Chen, 2015). However, much less is known concerning the genetic basis of specific cognitive deficits in schizophrenia. There is evidence that some aspects of reward processing, which is abnormal at different stages of psychosis (Murray et al., 2008; Ermakova et al., 2018), may be an intermediate phenotype in schizophrenia. For example, relatives of people with schizophrenia show altered brain activation during reward anticipation during fMRI scans (Grimm et al., 2014). Furthermore, molecular genetic risk for schizophrenia is associated with reward related brain activation: in the IMAGEN study of about 2000 14-year-olds, Lancaster et al. (2016) found that schizophrenia polygenic risk scores were associated with striatal activation during reward anticipation. If this altered brain activation is manifest in the altered ability to learn about rewards and reward-related decision making, then we expect that reward-based reinforcement learning behaviour should also be related to polygenic risk for schizophrenia.
If reinforcement learning is an intermediate phenotype for schizophrenia, it is also to be expected that individuals who partially express the schizophrenia phenotype, such as people at increased clinical risk of developing psychosis (At Risk Mental States, ARMS), should show a degree of deficit in reinforcement learning, but of lesser severity than individuals with the full illness phenotype. Of relevance here is the study of schizotypal traits in the general population. It is not yet established whether schizotypal traits or clinical risk for psychosis are associated with altered reinforcement learning. Recent evidence has suggested that patients at clinical risk for psychosis show subtle subcortical prediction error abnormalities during reinforcement learning (Ermakova et al., 2018), but whether these neural deficits are associated with behavioural deficits is not clear, as they may be insufficient to result in altered behaviour or may be compensated for by adaptations in other brain regions (Murray et al., 2010).
There is some suggestion that reinforcement learning abnormalities may not be uniform in schizophrenia but may be particularly prominent in certain patient groups. For example, reward-related reinforcement learning deficits are particularly prominent in patients with negative symptoms, consistent with the possibility that such deficits may causally contribute to the pathogenesis of such symptoms (Gold et al., 2012). Further support for such a link between reinforcement learning deficits and negative symptoms comes from computational modelling studies that tried to tease apart the different learning mechanisms involved. For example, Albrecht et al. (2016) administered a Go/NoGo reinforcement learning task (Guitart-Masip et al., 2012) to a group of chronic schizophrenia patients. Patients showed impaired Pavlovian biases (the tendency to seek a reward with action invigoration and to avoid a punishment with action suppression), possibly suggesting a reduction of those mechanisms in the striatal regions that give rise to the Pavlovian biases in the first place, coupled with a disruption in communication between these striatal areas and the prefrontal cortex. The influence of Pavlovian biases on reinforcement learning has not previously been studied in first episode psychosis (FEP) or clinical risk for psychosis, and it is not known whether profiles of reinforcement learning differ across different stages of psychotic illness or between patients who are and who are not taking antipsychotic medication. The effects of Pavlovian biases on learning and decision making are of interest both for theoretical reasons, as in relation to the pathogenesis of psychiatric symptoms (Moutoussis et al., 2018), and for decision-making in everyday life (Hunt et al., 2016).
If reinforcement learning is an intermediate phenotype in schizophrenia, we would expect to find reinforcement learning deficits in FEP patients, in ARMS individuals, and in members of the general population with a raised molecular genetic risk for the disorder. Further, we would expect that reinforcement learning performance would relate to trait schizophrenia measures in the population. We thus studied reinforcement learning in a group of FEP patients, ARMS individuals, and healthy individuals. In several hundred healthy individuals we examined whether their performance on a reinforcement learning task related to their molecular genetic risk for schizophrenia and to their psychopathology. We hypothesised that impairments in reinforcement learning would relate to trait level manifestations of subclinical positive and negative symptoms. We combined standard measures of learning with a computational psychiatry analysis approach (Teufel & Fletcher, 2016; Redish & Gordon, 2016), as it offers the possibility of developing rigorous and testable models of behaviour that can contribute to our understanding of how abnormal neurobiological substrates become expressed in clinical phenotypes.
2. Methods and Materials
Participants
Clinical study
We recruited three groups of participants aged 17 to 35 (mean age 22.8 years): n= 23 participants for the ARMS group, n= 26 FEP patients and n= 29 Controls. FEP participants were recruited from the Cambridge First Episode Psychosis service, CAMEO. ARMS participants were recruited through CAMEO, through advertisements at University Counselling Services, and from existing local research databases; ARMS status was confirmed using the Comprehensive Assessment of At Risk Mental States (CAARMS) interview, as used in the EDIE-II trial (Morrison et al., 2012). Medication details can be found in Table 6 in the Supplementary Material. Controls were recruited through advertisement in Cambridgeshire and through existing University of Cambridge research databases. Exclusion criteria were: current or past history of neurological disorder or trauma, currently or recently participating in a clinical trial of an investigational medical product, learning disability, or not satisfying standard MRI safety exclusion criteria, including pregnancy. The latter requirement was due to the fact that a subset of volunteers had MRI scans, reported elsewhere (Whittaker et al., 2016). Past or current treatment for a mental health problem was an exclusion criterion for controls. The project received ethical approval from the National Research Ethics Service. Written informed consent was given by all participants; if they were below 16 years of age, then written parental consent was also required. Further demographic information can be found in Table 2 in the Supplementary Material.
Healthy adolescent volunteer study
N= 735 participants took part (mean age 18.6 years, SD= 2.96; F|M=1.06) and underwent cognitive reinforcement learning testing. Participants were recruited using General Medical Practice lists as a sampling frame, as well as by direct advertisement, so as to represent the UK population in this age range (Kiddle et al., 2017). Inclusion criteria were age 14 to 24 years old, able to understand written and spoken English, living in Greater London or Cambridgeshire & Peterborough, being willing and able to give informed consent for recruitment into the study cohort and consent to be re-contacted directly. Exclusion criteria were as described above for controls in the clinical study. A detailed analysis of reinforcement learning performance in these participants is available in Moutoussis et al. (2018), which does not address molecular genetics or schizotypal traits. Further demographic information can be found in Table 3 in the Supplementary Material.
Psychopathology measures
The participants in the Clinical study were administered: the Comprehensive Assessment of At Risk Mental States (CAARMS) (Yung et al., 2005), providing operational criteria for identification of clinical risk for psychosis; the Mood and Feelings Questionnaire (MFQ) subset of the Young People Questionnaire (YPQ) (Costello & Angold, 1988) to measure depressive symptoms; the Positive and Negative Syndrome Scale (PANSS) (Kay et al., 1987); and, to measure schizotypy, the 21-item Peters Delusions Inventory (PDI-21) (Peters et al., 2004) and the Schizotypal Personality Questionnaire (SPQ). IQ was measured by combining the scores of two subscales of the Wechsler Abbreviated Scale of Intelligence (WASI), namely the Vocabulary and Matrix subtests. The healthy adolescent participants were administered the following: the MFQ; the PLIKS (Psychosis-Like Symptoms) to measure unusual experiences, hallucinations and delusions (Zammit et al., 2008); the Schizotypal Personality Questionnaire (SPQ) (Raine, 1991) to measure schizotypy, later scored according to the novel subscales provided by Davies (2017); and the Snaith Hamilton Pleasure Scale (SHAPS) (Snaith et al., 1995) to measure some aspects of anhedonia (higher scores reflect higher levels of anhedonia). IQ was measured from the WASI, in the same way as in the Clinical study.
Reinforcement learning task
All participants were assessed on a modified version of a traditional Go/NoGo reinforcement learning task, developed by Guitart-Masip et al. (2012), that provides several measures of reinforcement learning (Figure 1). The task involved the presentation of four fractal images 36 times each, for a total of 144 trials across the 4 conditions. The order of the stimuli was random and each cue was presented for 800ms, followed by a cross-hair in the middle of the screen for 250-3500ms. Then there was a target detection task showing a circle on either side of the screen for a maximum time of 800ms, during which time the participant had to make a button press response (Go) or not (NoGo). The Go response was given by pressing a keyboard button on the side on which the target circle was presented (right or left), then the probabilistic outcome was shown. Possible outcomes were: a green arrow upward for wins (£0.5), a red one downwards for losses (-£0.5) and a yellow horizontal bar for neutral outcomes (£0). For the reward conditions, only positive or neutral outcomes were possible, while for the loss conditions participants could experience either a loss or a neutral outcome. Importantly, these outcomes were probabilistic on an 80:20 schedule. Overall, there were four conditions depending on the cue presented at the start of the trial: two Pavlovian congruent conditions requiring participants to press the button to get a reward (Go-to-win) or to not press the button to avoid losing (NoGo-to-avoid-losing); and two Pavlovian incongruent conditions requiring participants either to not press the button to get a reward (NoGo-to-win) or to press the button to avoid losing (Go-to-avoid-losing). Further details on the specifics of the task can be found in the Supplementary Material.
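The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the task code used in the study: the names `CONDITIONS`, `make_schedule` and `outcome` are hypothetical, and we assume that a correct response yields the better of the two possible outcomes with probability 0.8 and an incorrect response with probability 0.2.

```python
import random

# Four conditions from the task description: (required action, outcome valence).
CONDITIONS = {
    "go_to_win": ("go", "reward"),
    "nogo_to_avoid_losing": ("nogo", "punishment"),
    "nogo_to_win": ("nogo", "reward"),
    "go_to_avoid_losing": ("go", "punishment"),
}
TRIALS_PER_CUE = 36
P_EXPECTED = 0.8  # the 80:20 schedule


def make_schedule(rng):
    """Return a randomly ordered list of 144 condition labels (36 per cue)."""
    trials = [name for name in CONDITIONS for _ in range(TRIALS_PER_CUE)]
    rng.shuffle(trials)
    return trials


def outcome(condition, action, rng):
    """Sample the monetary outcome (in pounds) of one trial.

    Assumption of this sketch: the correct action gives the better outcome
    with probability 0.8, the incorrect action with probability 0.2.
    """
    required, valence = CONDITIONS[condition]
    p_good = P_EXPECTED if action == required else 1.0 - P_EXPECTED
    good = rng.random() < p_good
    if valence == "reward":
        return 0.5 if good else 0.0   # win or neutral
    return 0.0 if good else -0.5      # neutral or loss


schedule = make_schedule(random.Random(0))
```

Keeping the schedule and the outcome rule separate makes it easy to simulate agents with different response policies on the same sequence of cues.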
Computational modelling: hBayesDM
Behavioural performance on the Go/NoGo task was calculated by summing scores for the task conditions, and by modelling latent task variables using the hBayesDM package (hierarchical Bayesian modeling of Decision Making tasks) for R (version 0.5.0 on MacOS High Sierra version 10.13.1) developed by Ahn et al. (2017). We used this approach to generate posterior distributions of the parameters characterising task performance to improve the balance of within-subject and between-subject random effects, whilst also taking into account within-subject variability and group-level similarities (O’Callaghan et al., 2017). Full information on the details of the modelling parameters and model fitting and comparison can be found in the Supplementary Material. “Model 4” was the best model (lowest LOOIC) for both cohorts of participants and included the following parameters: lapse rate (random errors), learning rate, Go bias (tendency to make a response), Pavlovian bias (tendency to make a response to stimuli associated with reward and withhold a response to stimuli associated with punishment), sensitivity to reward, and sensitivity to punishment.
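To make the role of the six parameters concrete, the following Python sketch shows one common way a model of this family combines them, following the general form of the Guitart-Masip et al. (2012) models. It is an assumption-laden illustration, not the fitted Stan model: the function names are hypothetical, outcomes are assumed to be coded as 1, 0 or -1, and the exact parameterisation used in the study is the one given in the Supplementary Material.

```python
import math


def p_go(q_go, q_nogo, v, go_bias, pav_bias, lapse):
    """Probability of a Go response for one cue.

    q_go, q_nogo: learned action values; v: learned cue (state) value.
    """
    # The Pavlovian term pushes towards Go for positively valued cues
    # and away from Go for negatively valued cues.
    w_go = q_go + go_bias + pav_bias * v
    w_nogo = q_nogo
    p = 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))
    # The lapse rate mixes in random (50:50) responding.
    return p * (1.0 - lapse) + lapse / 2.0


def update(q_go, q_nogo, v, action, reward, lr, rho_rew, rho_pun):
    """Rescorla-Wagner update of the chosen action value and the cue value.

    reward is assumed to be coded 1 (win), 0 (neutral) or -1 (loss);
    rho_rew and rho_pun scale outcomes (sensitivity to reward/punishment).
    """
    rho = rho_rew if reward >= 0 else rho_pun
    target = rho * reward
    v = v + lr * (target - v)
    if action == "go":
        q_go = q_go + lr * (target - q_go)
    else:
        q_nogo = q_nogo + lr * (target - q_nogo)
    return q_go, q_nogo, v
```

In this form, a higher Pavlovian bias makes the incongruent conditions (NoGo-to-win, Go-to-avoid-losing) harder to learn, because the cue value works against the required action.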
Polygenic risk score calculation
Participants in the healthy adolescent study were drawn from a larger sample of over 2000 adolescents from whom genetic data were acquired by saliva sample (Kiddle et al., 2017). Genotyping was carried out by the Cambridge Bioresource on an Affymetrix chip array, yielding genotypes at 507,968 SNPs for each subject. Quality control and imputation were performed. The parameters for retaining SNPs were: SNP missingness < 0.01 (before sample removal); SNP Hardy-Weinberg equilibrium (P > 10⁻⁶); and minor allele frequency (MAF) > 0.01. Final statistical analyses were carried out on n = 390 participants of European ancestry for whom both adequate genotype and reinforcement learning data were available. See Figure 5 in Supplementary Material for a detailed flowchart of excluded participants. The generation of the PRS was based on the methods described by the International Schizophrenia Consortium (2009). Polygenic scores were calculated for each individual using the PLINK (version 1.9) score command. Scores were created by adding up, across SNPs (single nucleotide polymorphisms), the number of risk alleles at each SNP (0, 1 or 2), weighted by the logarithm of that SNP's odds ratio for schizophrenia from the results reported in Pardiñas et al. (2018): the meta-analysis of the CLOZ-UK sample and the Psychiatric Genomics Consortium PGC2 schizophrenia dataset (Jones et al., 2016). The scores used were generated from a list of SNPs with a GWAS training-set P<.05 threshold, as this is the threshold that has been suggested to capture maximal schizophrenia liability (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014; Pardiñas et al. 2018).
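The weighted sum described above can be illustrated with a short Python sketch. The function name, SNP identifiers and odds ratios below are hypothetical and chosen only for illustration; the study itself used the PLINK score command, which, depending on its options, may report an average rather than a sum across alleles.

```python
import math


def polygenic_score(allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2) weighted by log odds ratios.

    Both arguments are dicts keyed by SNP identifier. SNPs without a
    weight are skipped, which is an assumption of this sketch.
    """
    score = 0.0
    for snp, count in allele_counts.items():
        if snp not in odds_ratios:
            continue
        if count not in (0, 1, 2):
            raise ValueError(f"unexpected allele count for {snp}: {count}")
        score += count * math.log(odds_ratios[snp])
    return score


# Hypothetical example: three SNPs with made-up odds ratios.
example_counts = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
example_ors = {"snp_a": 1.10, "snp_b": 0.90, "snp_c": 1.00}
example_score = polygenic_score(example_counts, example_ors)
```

Because the weights are log odds ratios, a protective allele (odds ratio below 1) lowers the score and an allele with an odds ratio of exactly 1 contributes nothing.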
Statistical analyses
In the clinical study, group differences on task performance (behavioural and modelled) were examined by one-way analyses of variance (ANOVAs), and Spearman Rank Order correlation coefficients were used to investigate the relationships between task performance and clinical measures at each group level. Despite the group differences in IQ in the clinical study, since matching for education and IQ could yield a non-representative sample of patients, and given that both the participants’ own level of education and their maternal levels of education were not significantly different from controls, we did not match ARMS and FEP for IQ and, like Albrecht et al. (2016), we did not use IQ as a covariate for the statistical analyses carried out.
In the healthy adolescent study, the relationships between task performance (behavioural and modelled) and clinical measures were examined by Spearman Rank Order correlation coefficients (n= 735). Standard multiple regression analysis was first used to test whether PRS at P-threshold 0.05 predicted learning rate as measured by the computational model (chosen as the main outcome variable given the robust evidence in the literature showing learning deficits in patients with schizophrenia). Covariates included age, sex and the first five principal component analysis factors for ancestry. N= 5 participants were excluded as outliers, with a final sample of n= 390. To test whether the PRS predicted the other aspects of task performance, standard multiple regression analyses were then run for each of the other cognitive variables of interest. False Discovery Rate (Benjamini-Hochberg) correction was applied to control for the expected proportion of falsely rejected hypotheses and to gain power (Benjamini & Hochberg, 1995). Further, Bayesian linear regressions were also performed in JASP to compare the likelihood of the task performance data under models with, versus without, schizophrenia polygenic risk score.
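For readers unfamiliar with the Benjamini-Hochberg correction, the adjustment can be sketched as follows. This is a generic Python illustration of the standard procedure, not the software used for the analyses reported here, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values are
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four regressions.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A test is then declared significant at false discovery rate q if its adjusted p-value is at most q.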
3. Results
Clinical study
All groups showed the classic pattern of better performance in the Pavlovian congruent conditions. There were significant differences between groups in performance of the task, with FEP overall performing worse than the other groups in several measures of performance.
In terms of overall performance (percent for best outcome) on the four GNG conditions, all groups showed better performance in the Pavlovian congruent conditions compared to the Pavlovian incongruent ones. Specifically, when looking at group differences in performance of each condition, FEP performed significantly worse than controls and ARMS in the Punishment conditions (Go-to-avoid-losing and NoGo-to-avoid-losing) and also significantly worse than controls only on the easier Go-to-win condition. See Figure 2 and, for the descriptive statistics, Table 4 in the Supplementary Material.
When looking at the latent variables of performance, we found group differences across the six modelled parameters, largely driven by FEP versus control results. See Figure 3 below and Table 5 in the Supplementary Material for the results of group comparisons and the statistics of each parameter per group.
To further explore the possible effect of antipsychotic medication on our results, we subdivided the FEP group into two sub-groups: one of FEP individuals who did not take antipsychotics (FEP-, n= 11) and one of those taking antipsychotics (FEP+, n= 15). The results remained largely the same. See Supplementary Material for more details. FEP+ had a higher sensitivity to punishment compared to the FEP- (1.800, 95% CI [-3.341, .258], p= .019), and this latter group also showed a significantly higher Pavlovian bias compared to Controls (0.363, 95% CI [.0001, .727], p= .05).
Results from the Spearman correlational analyses investigating possible relationships between task performance and clinical measures for each group can be found in Figure 6 in the Supplementary Material.
Healthy adolescent study
The pattern of performance in the healthy adolescent study is reported in detail by Moutoussis et al. (2018). In brief, there were, as expected, significant differences in performance across conditions, with better performance on the Pavlovian congruent conditions compared to the Pavlovian incongruent ones, and similar patterns for the learning curves. The Spearman correlational analyses on the Healthy Adolescent group showed a moderate negative correlation between the modelled parameters of Pavlovian bias and learning rate. Moutoussis et al. (2018) reported that there were no significant associations between task indices and mood. Our behavioural results (Figure 7 in Supplementary Material) indicate weak positive correlations between the Go bias parameter and the SPQ total score (r = .13, p= .01), as well as with two SPQ subscales tapping social anxiety and eccentricity (r = .13, p= .01 and r = .10, p= .04). The SPQ subscale reflecting anomalous experiences and beliefs was weakly negatively correlated with the sensitivity to reward in the task (r = −.11, p= .03). The sensitivity to punishment was weakly negatively associated with the SPQ subscale of paranoid ideation (r = −.15, p< .001) and with the PLIKS (r = −.11, p= .03).
The standard multiple regression analysis between PRS at P-threshold 0.05 and the modelled parameter of learning rate (with age, sex and the first five principal component analysis factors for ancestry as covariates) was not statistically significant: R2 = .004, F(8, 381) = .177, p = .994, adjusted R2 = −.017, Unstandardized B Coefficient = −.001 (Standard error = .008, t-value = −.109, p= .913), Standardized Beta coefficient (β) = −.006. See Figure 8 in the Supplementary Material. Results for the other main cognitive variables of interest are summarised in Table 1 below in ascending order of adjusted significance p-value. Overall, after corrections, no significant results were found.
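The relationship between the reported R2, its degrees of freedom and the adjusted R2 can be checked with the standard formulas. The Python sketch below is a reader's check, not part of the original analysis; it assumes n = 390 observations and k = 8 predictors, as implied by the reported F(8, 381), and the function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values taken from the text; R2 is only available rounded to three decimals.
adj = adjusted_r2(0.004, 390, 8)
f_val = f_statistic(0.004, 390, 8)
```

With these inputs the adjusted R2 comes out at about −.017, matching the reported value. The F statistic computed from the rounded R2 is about .19, slightly above the reported .177, which is what one would expect if the unrounded R2 lies a little below .004.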
We also ran Bayesian linear regression analyses, comparing a model with PRS to a null model including age, gender and the first five PCA components of ancestry as covariates. Results can be found in Table 8 in the Supplementary Material. The null model with the covariates out-predicted the model that contained the main predictor of interest for all task-related variables. The only exceptions were the two modelled parameters tapping sensitivity to rewards (BF10 = 1.331) and to punishment (BF10 = 1.011). Nevertheless, a Bayes Factor between 1 and 3 is considered as providing only weak and inconclusive evidence for H1 over the null model (Lee & Wagenmakers, 2013; Wagenmakers et al., 2017), and the results converge with what was found in the standard multiple regression analyses.
4. Discussion and Conclusions
In the Clinical study, overall, all groups showed better performance in the Pavlovian congruent conditions compared to the Pavlovian incongruent ones. We found group differences in behavioural and modelled performance on the task, with FEP performing worse than the other two groups. Both ARMS and FEP showed a decrease in behavioural performance across all conditions of the task compared to controls, but the differences were only significant for the FEP group. Further to this, and contrary to what was expected, FEP performed relatively better on the Pavlovian congruent conditions compared to the Pavlovian incongruent ones, and even more so when having to make an action to get a reward than to avoid a punishment, thus suggesting preserved action (Go) reward-related learning.
There were also significant group differences in the modelled parameters. Learning rate was lower in FEP compared to controls and to ARMS (who showed a similar learning rate to controls) and FEP had a significantly higher Pavlovian bias than controls. Finally, although the sensitivity to punishment was intact for ARMS, it was significantly reduced in FEP compared to the other groups, which is seemingly at odds with what was hypothesised and expected from previous literature in chronic schizophrenia patients (Gold et al., 2008). There were no significant differences in the sensitivity to reward nor in the other modelled parameters.
The finding of a higher Pavlovian bias in first episode psychosis patients compared to controls is in contrast with the findings from Albrecht et al., (2016) in chronic illness. This might be attributable to the progression of the disease which, alongside an extensive use of antipsychotics (Scherer et al., 2004), is linked to the worsening of deficits in gradual reinforcement learning, the neural substrate of which is thought to involve the basal ganglia, and specifically the striatum, the same areas thought to give rise to the Pavlovian biases. In turn, this might have the effect of weakening the Pavlovian biases and result in the pattern observed in Albrecht’s study. To investigate further whether disease stage and antipsychotics can have such effects on Pavlovian biases, longitudinal follow up of FEP patients is necessary.
Our results show clear group effects and deficits in reinforcement learning in patients, although these are different from the deficits found in Albrecht et al. (2016). Such discrepancies could be due to the different stages of schizophrenia among the patients. In fact, in Albrecht et al. (2016), patients were at a chronic stage of the disorder and much older (mean age= 37.7 years) than those in the current study. In the current study, we looked at individuals suffering from early psychotic episodes (FEP) and at those who were at risk (ARMS) of developing schizophrenia, with mean ages of 24.6 and 21.2 years, respectively. Furthermore, the patients in Albrecht’s study presented more severe negative symptoms; thus they were potentially a different subgroup of schizophrenia patients compared to ours.
In the Healthy Adolescent study, the pattern of overall performance on the reinforcement learning task is the same as that of controls from the Clinical study, thus showing that, in the general population, individuals learn the Pavlovian congruent conditions more easily and have more difficulties with the incongruent ones. Although similar results had been shown in previous experiments, the main contribution of the current findings is that they confirm the influences of Pavlovian biases on reinforcement learning in a bigger and younger sample compared to those in which this task had been used so far. In doing so, the current study strengthens the confidence that the observed pattern of results is representative of how an average healthy individual performs the task.
When correlating task performance and clinical measures of psychopathology, we found some evidence of weak associations between task performance and schizotypy. The Go bias parameter of performance was positively associated with the total score on the SPQ scale measuring schizotypy, as well as with two of the subscales tapping social anxiety and eccentricity; this is in contrast to the clinical results, where Go bias was reduced in patients. The SPQ subscale reflecting anomalous experiences and beliefs was negatively associated with the sensitivity to reward; this domain did not differentiate patients and controls in the clinical study. Sensitivity to punishment was negatively correlated with the SPQ subscale of paranoid ideation and with the PLIKS, and was reduced in first episode psychosis patients; taken together, these findings might suggest a link between impaired punishment-related learning and delusional thinking in clinical psychosis and the healthy population.
Our results further show that PRSs for schizophrenia in the general population do not predict performance on this specific reinforcement learning task. There are multiple possible explanations of this lack of significance, which cannot be disentangled in the current study. The first possible explanation is that the PRS for schizophrenia does not specifically bear on the cognitive domain of reinforcement learning, which could be more associated with illness itself rather than illness risk; this explanation would align with the clinical study findings where we did not find significantly impaired performance in the clinical risk (ARMS) group. The second explanation is that the regression analyses were underpowered to detect any small polygenic risk effect sizes present in this sample and/or the GNG task might not have captured sufficient individual variability in performance (we note performance did not strongly relate to any measured psychological traits in the healthy adolescent study). We did not record fMRI responses during reinforcement learning, which were shown to associate with schizophrenia PRS in a recent study (Lancaster et al., 2019). We also conducted post-hoc power calculations in order to inform future studies and examine the power of the current study to detect genetic effects on reinforcement learning. Post-hoc power calculations (Soper, 2018) suggested that a sample of 390 individuals in the PRS analysis, with learning rate as the main outcome, had 0.43 power to detect an association, and therefore the analysis might have been underpowered to find any significant results. For a 0.80 power with the same observed effect size, a minimum sample of 959 individuals would have been needed to demonstrate a significant effect. Nevertheless, for the majority of cognitive outcome measures, Bayesian analysis indicated the data were slightly more likely under a model without schizophrenia polygenic risk score than one including it.
Finally, the sample in the Healthy Adolescent study consisted of individuals who were partly recruited on the basis of their good health; it is possible that this lack of variance in mental health might have reduced our ability to detect relationships between task performance and other traits.
The study has several limitations. Firstly, in the Clinical study the groups differed significantly on age, and this could possibly be problematic when looking at group differences, as some studies point at age-related effects on reinforcement learning performance (Samanez-Larkin & Knutson, 2015; Radulescu, Daniel and Niv, 2016); however, the group differences in age were only slight, and the behavioural performance differences remained intact when controlling for age. Secondly, we did not demonstrate significant differences between ARMS and controls. This could be partly linked to the conservative approach we used in the modelling, namely fitting the models to all members of the clinical study as if they were drawn from a single group (with a single mean and variance), rather than assuming they were drawn from separate populations. Although this approach has the benefit of minimising potential false positives, it might have reduced the between-group differences and therefore impacted the overall sensitivity of the study. However, the results in the analysis of the modelled latent parameters were largely consistent with those in the observed performance measures. Finally, we acknowledge the possible influence of severe traumatic stress experiences, which were linked to increased Pavlovian biases in a previous study (Ousdal et al., 2018). In fact, one cannot exclude the possibility that one of the driving mechanisms leading to the observed increase in Pavlovian biases in FEP might be linked to the stress of having experienced a psychotic episode, thus opening interesting avenues for further research.
Overall, the current work makes some important contributions to the field of reinforcement learning in schizophrenia. Firstly, we show that there are specific reinforcement learning deficits in psychotic illness and that such deficits are sensitive to illness stage, being present in frank psychosis but not in At Risk Mental States. Secondly, we show that there is no clear association between the reinforcement learning domains identified as deficient in psychosis and psychopathology in the general population. Lastly, we found no large effects of either clinical risk for psychosis or molecular polygenic risk for schizophrenia on reinforcement learning, with the power calculations indicating that a bigger sample would be required for definitive results; the results do not support reinforcement learning as an intermediate phenotype for schizophrenia.
Conflicts of interest
ETB is employed 50% by GSK. PBJ was a member of scientific advisory boards for Janssen, Recordati and Lundbeck. All other authors declare no conflicts of interest.
Author roles
MMontagnese: conceptualization, methodology, formal analysis, writing (original draft preparation, review and editing); FK: conceptualization, methodology, formal analysis, supervision, writing (original draft preparation, review and editing); JH, JG: conceptualization, investigation, methodology, writing (review and editing); AR, PV, BK: methodology, resources, writing; PCF: conceptualization, methodology, writing (review and editing); PBJ, PF, ETB, RD, IG: conceptualization, project administration, funding acquisition, supervision, writing (review and editing); MO: resources, writing; MMoutoussis: conceptualization, methodology, supervision, writing (review and editing); NSPN Consortium: conceptualization, methodology, project administration; GKM: conceptualization, project administration, methodology, supervision, writing (original draft preparation, review and editing).
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
Acknowledgements & Funding
The authors are grateful to the volunteers who took part in these studies, as well as all the members of staff involved in the recruitment process. This work was supported by the Neuroscience in Psychiatry Network (NSPN) Consortium, a strategic award from the Wellcome Trust to the University of Cambridge and University College London (095844/Z/11/Z), and by the Cambridge NIHR Biomedical Research Centre.
Footnotes
https://github.com/marcellamontagnese/reinforcementlearningNSPN