Tutorial
Parameter recovery for the Leaky Competing Accumulator model
Introduction
The ability to quickly and accurately make decisions about the environment is of crucial importance to survival. Whether an animal near you is predator or prey means the difference between being lunch and having lunch.
To understand how perceptual decision-making works, formal sequential sampling models (SSMs) are often used. The fundamental idea of SSMs is that, for each choice option (e.g., predator or prey), there is a latent accumulator that gathers noisy evidence from sensory input. Evidence accumulation continues until an accumulator reaches a level of activation that exceeds a certain threshold, and a decision is made.
The most influential and benchmark SSM is the diffusion decision model (DDM; Ratcliff & McKoon, 2008). The DDM proposes that evidence accumulation in a two-choice decision is characterized by Eq. (1): dx = v·dt + ξ·√dt, where dx is the change in activation of an accumulator, v is the average speed of evidence accumulation (drift rate), and ξ is Gaussian noise with standard deviation s. At the start of each trial, the accumulator starts at point z (i.e., no evidence) between the boundaries 0 and a. During a trial, the accumulator collects evidence, and the gathered amount of evidence moves towards 0 or a. The stochastic nature of the accumulation process makes it possible to make an incorrect decision (e.g., hit the lower threshold with a positive drift rate), and causes variation in the speed of the decision process, resulting in distributions of reaction times. Once one of the boundaries is reached, a response is initiated corresponding to the boundary that is reached. The actual reaction time is the decision time, plus a non-decision time to account for early perceptual processes and motor processes. Extensions of the standard DDM include between-trial variability in drift rates and starting point (Ratcliff, 1978), and sometimes variability in the non-decision time (Ratcliff & Tuerlinckx, 2002).
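The accumulation process described above can be simulated directly with an Euler scheme. The sketch below is only an illustration: the parameter values, function name, and default settings are ours, not taken from this article.

```python
import numpy as np

def simulate_ddm_trial(v, a, z, s=0.1, t0=0.3, dt=0.001, max_t=10.0, rng=None):
    """Simulate one DDM trial via Euler updates.

    v: drift rate; a: upper boundary (lower boundary at 0);
    z: starting point (0 < z < a); s: noise SD; t0: non-decision time.
    Returns (choice, reaction_time); choice is 1 for the upper boundary,
    0 for the lower boundary.
    """
    rng = rng or np.random.default_rng()
    x, t = z, 0.0
    while 0.0 < x < a and t < max_t:
        # dx = v*dt + Gaussian noise scaled by s and the step size
        x += v * dt + s * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x >= a else 0), t + t0

rng = np.random.default_rng(0)
trials = [simulate_ddm_trial(v=1.0, a=1.0, z=0.5, rng=rng) for _ in range(500)]
accuracy = np.mean([c for c, _ in trials])  # mostly upper-boundary hits for v > 0
```

Because the drift is positive, most trials terminate at the upper boundary; the lower-boundary hits illustrate the noise-driven errors mentioned above.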
The use of SSMs is appealing for both practical and theoretical reasons. Practically, SSMs explain both reaction time distributions and accuracy data at the same time very well, unlike, for example, signal detection theory (Ratcliff & McKoon, 2008). In addition, SSMs can account for a wide range of findings in reaction time data, including the speed–accuracy trade-off (e.g., Boehm et al., 2014, Forstmann et al., 2008, Mulder et al., 2013), and Hick’s law (Usher et al., 2002, Van Maanen et al., 2012). Theoretically, SSMs are appealing because they provide information about latent processes instead of raw behavioral measures, and as such, they are informative for the mechanistic processes underlying behavior (Anders et al., 2016, Forstmann et al., 2015, Mulder et al., 2014, Van Maanen et al., 2012).
Although the DDM and related SSMs were primarily developed to model cognitive processes, several lines of research have shown that evidence accumulation happens at the neuronal level as well. For example, animal studies have reported single-cell recordings showing evidence accumulation (e.g., in LIP and FEF; Churchland and Ditterich, 2012, Gold and Shadlen, 2007, Purcell et al., 2010, Shadlen and Newsome, 2001; but see Latimer et al., 2015, Shadlen et al., 2016). In addition, EEG studies have shown ERP components that correlate remarkably well with drift rate (O’Connell, Dockree, & Kelly, 2012) or boundary (Boehm et al., 2014). This work is accompanied by more detailed biophysical models of neuronal networks that, at a relatively gross level, are consistent with the sequential sampling framework as well (Lo and Wang, 2006, Wang, 2008).
It has been argued (Usher & McClelland, 2001) that SSMs make assumptions about the evidence accumulation process that are implausible when considering the behavior of pools of neurons. In particular, the assumption of full evidence integration is argued to be problematic: neural excitatory input decays within a very short time period (5–10 ms) (Abbott, 1991). A mechanism of recurrent self-excitation delays the decay of information to some extent, but not perfectly (Amit, Brunel, & Tsodyks, 1994). Hence, information decays over time, so accumulators in an SSM should be ‘leaky’ integrators. To reconcile SSMs with findings in neurophysiology, neurally plausible models have been proposed.
One particular such model is the leaky, competing accumulator model (LCA, Usher & McClelland, 2001). The LCA (Fig. 1) is an n-choice version of the DDM, including two neuroscientifically motivated parameters: leakage (sometimes called decay) and lateral inhibition. Evidence accumulation in the LCA is described by: dx_i = (ρ_i − κ·x_i − β·Σ_{j≠i} x_j)·(dt/τ) + ξ_i·√(dt/τ), with x_i = max(x_i, 0), where dx_i is the change in activation of accumulator i, ρ_i is input (cf. drift rate in the DDM), κ is leakage, β is inhibition, dt/τ is the time step size, and ξ_i is noise which is drawn from a Gaussian distribution when the model is simulated (Fig. 1). The second equation is a non-linear component preventing the activation of an accumulator from dropping below 0. The idea behind this is that because neurons cannot have ‘negative’ activation, neither should accumulators (but see below). Response times are obtained through the first-passage time distribution with a common threshold level of activation, and a non-decision time.
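To make these dynamics concrete, an LCA trial can be simulated step by step. The following is a minimal sketch; all parameter values (inputs, leakage, inhibition, threshold, noise SD, non-decision time) are chosen purely for illustration and are not values from this article.

```python
import numpy as np

def simulate_lca_trial(rho, kappa, beta, threshold, sigma=1.0, t0=0.3,
                       dt=0.01, tau=0.1, max_steps=10000, rng=None):
    """Simulate one n-choice LCA trial.

    rho: inputs (one per accumulator); kappa: leakage; beta: lateral
    inhibition; threshold: common decision bound; sigma: noise SD;
    t0: non-decision time; dt/tau is the effective step size.
    Returns (choice index, reaction time).
    """
    rng = rng or np.random.default_rng()
    rho = np.asarray(rho, dtype=float)
    n = rho.size
    x = np.zeros(n)
    step = dt / tau
    for i in range(max_steps):
        inhibition = beta * (x.sum() - x)  # each unit is inhibited by the others
        dx = (rho - kappa * x - inhibition) * step \
            + sigma * np.sqrt(step) * rng.standard_normal(n)
        x = np.maximum(x + dx, 0.0)        # non-linearity: no negative activation
        if x.max() >= threshold:
            return int(x.argmax()), (i + 1) * dt + t0
    return int(x.argmax()), max_steps * dt + t0

rng = np.random.default_rng(1)
trials = [simulate_lca_trial(rho=[3.0, 1.0, 1.0], kappa=3.0, beta=3.0,
                             threshold=0.8, sigma=0.3, rng=rng)
          for _ in range(200)]
p_correct = np.mean([c == 0 for c, _ in trials])  # accumulator 0 has the largest input
```

Note how, with nonzero β, the leading accumulator actively suppresses its competitors, which is the competition mechanism contrasted with the DDM below.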
A few mechanistic differences between the LCA and the canonical DDM should be mentioned. The DDM is only applicable in two-choice situations (but see Leite and Ratcliff, 2010, McMillen and Holmes, 2006, Ratcliff and Starns, 2013, Smith, 2016), as it assumes one accumulator that gathers evidence for both choice options. The sign of the accumulator (positive or negative) determines whether there is more evidence for choice option one or two. The DDM includes competition: every piece of evidence for choice option 1 always counts as evidence against choice option 2. The LCA assumes one accumulator for each choice option, and these accumulators race towards the threshold. The LCA also includes competition, but proposes a different competition mechanism: the inhibition parameter determines the proportion of one accumulator’s activation that inhibits the other accumulators (see bottom right panel in Fig. 1).
The inclusion of the leakage and inhibition parameters is theoretically appealing, but needs to be validated by reaction time data that can only be explained by these parameters (Pitt & Myung, 2002). There have been some studies into the effect of inhibition (Churchland and Ditterich, 2012, Teodorescu and Usher, 2013, Tsetsos et al., 2011), but the importance of including a leakage parameter remains debated (Ossmy et al., 2013, Purcell et al., 2010, Ratcliff and Starns, 2013, Turner et al., 2016 but see Ratcliff, 2006, Winkel et al., 2014).
An important reason for the sparsity of research into the LCA is that the model does not have a (known) likelihood function (Turner & Sederberg, 2014). The only methods available to fit the model to data are simulation-based, meaning that, in order to calculate any measure of fit, the LCA has to generate simulated data for each proposed set of parameters. As a consequence, model-fitting procedures are slow; so slow, in fact, that a thorough investigation of model-fitting procedures, recoverability, and identifiability of the LCA has, as far as the authors are aware, not been performed.
To circumvent fitting a model without a known likelihood function, studies in which leakage and inhibition may be of relevance have often used the Ornstein–Uhlenbeck process (OU, Busemeyer & Townsend, 1993). The OU can be considered a linear simplification of the LCA with a known likelihood function. The use of the OU instead of the full LCA is, in case of a two-choice paradigm, validated if we drop the assumption of non-linearity. Without the non-linearity, it can be shown that the evidence accumulation process as described by the LCA Eqs. (2) is equivalent to an OU process (ignoring differences in the threshold parameter, see also van Ravenzwaaij, van der Maas, & Wagenmakers, 2012). If we define x = x_1 − x_2 (a single accumulator representing the difference in activation between the two accumulators in the LCA, cf. the DDM), the two LCA equations of the two accumulators: dx_1 = (ρ_1 − κ·x_1 − β·x_2)·(dt/τ) + ξ_1·√(dt/τ) and dx_2 = (ρ_2 − κ·x_2 − β·x_1)·(dt/τ) + ξ_2·√(dt/τ), can be rewritten as one simple equation: dx = (ρ_1 − ρ_2 − (κ − β)·x)·(dt/τ) + ξ·√(dt/τ). The major disadvantage of the use of the OU is that the leakage and inhibition parameters reduce to one ‘difference’ parameter, often named λ. Fitting an OU process can thus only tell us whether the difference between leakage and inhibition is positive or negative; that is, whether there was more leakage than inhibition or vice versa.
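This equivalence can be checked numerically: when both processes are driven by the same noise realizations and the non-linearity is dropped, the difference x_1 − x_2 of the two LCA accumulators coincides with an OU process whose decay parameter is κ − β. The parameter values below are illustrative, and the discrete Euler update is assumed.

```python
import numpy as np

# Shared noise streams for the two accumulators (illustrative parameters)
rng = np.random.default_rng(42)
rho1, rho2, kappa, beta = 2.0, 1.0, 0.5, 0.3
step, n_steps = 0.01, 1000
noise = rng.standard_normal((2, n_steps)) * np.sqrt(step)

x1 = x2 = 0.0      # linear (non-truncated) two-accumulator LCA
x_ou = 0.0         # OU process on the difference
lam = kappa - beta
gaps = []
for t in range(n_steps):
    x1_new = x1 + (rho1 - kappa * x1 - beta * x2) * step + noise[0, t]
    x2_new = x2 + (rho2 - kappa * x2 - beta * x1) * step + noise[1, t]
    x1, x2 = x1_new, x2_new
    # OU update driven by the *difference* of the same two noise streams
    x_ou += (rho1 - rho2 - lam * x_ou) * step + (noise[0, t] - noise[1, t])
    gaps.append(abs((x1 - x2) - x_ou))

max_gap = max(gaps)  # numerically zero: the two trajectories coincide
```

Because the update for x_1 − x_2 is algebraically identical to the OU update, only the combination κ − β (not κ and β separately) is constrained by two-choice data, which is exactly the identifiability problem described above.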
This problem only occurs when the assumption of non-linearity is dropped. Excluding the non-linearity has been defended on both theoretical and practical grounds. Theoretically, the motivation behind the assumption has been questioned: although the activation of neurons indeed cannot decrease below 0, this argument may rest on the false assumption that ‘inactive’ neurons are not firing. In reality, inactive neurons fire at a baseline rate, and it is known that activation can drop below baseline through inhibitory input (see, for this line of reasoning, Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). The practical reason for dropping the assumption of non-linearity is that it has been shown that, under reasonable parameter ranges (i.e., when all accumulators receive a substantial amount of input), the non-linear process can be mimicked very well by a linear process (Brown & Holmes, 2001; McMillen & Holmes, 2006; Usher & McClelland, 2001, 2004; van Ravenzwaaij et al., 2012).
Thus, in part of the parameter space, it will be impossible to disentangle the leakage and inhibition parameters in a two-choice LCA model. To ensure that these crucial parameters can be disentangled, we focus on three-choice settings, where the leakage and inhibition parameters can be disentangled in the whole parameter space. A thorough analysis of fitting the LCA on three-accumulator data has, as far as the authors are aware, not been performed (although fits to multi-alternative data have been reported, e.g., Teodorescu and Usher, 2013, Tsetsos et al., 2011). Further studies can use our findings to study the two-choice LCA.
The goal of the present article is twofold. Firstly, we want to share best practices in fitting likelihood-free cognitive models such as the LCA. In our attempts to perform parameter recovery for this model, we found that not all methods are equally suitable to handle the increased uncertainty that is a direct consequence of the lack of a likelihood function. In particular, we focus on the Probability Density Approximation method (PDA, Turner & Sederberg, 2014). This method is the most suitable for obtaining a reliable estimate of the likelihood function, and we will demonstrate its use in the context of maximum likelihood estimation as well as a Bayesian parameter estimation procedure (the supplemental materials describe two additional methods).
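The core of the PDA method can be sketched as follows: simulate a large synthetic data set under the proposed parameters, build a kernel density estimate of the defective RT distribution for each response option, and evaluate the observed data under that estimate. The sketch below is a simplified illustration (plain Gaussian kernels with Silverman's rule of thumb), not the exact implementation of Turner and Sederberg; the toy Gaussian simulators in the demo are ours.

```python
import numpy as np

def pda_log_likelihood(data_rt, data_choice, simulate, n_sim=2000, rng=None):
    """Approximate the log-likelihood of (choice, RT) data via PDA.

    simulate: any function rng -> (choice, rt) generating one synthetic trial
    under the proposed parameters.
    """
    rng = rng or np.random.default_rng()
    sims = [simulate(rng) for _ in range(n_sim)]
    sim_choice = np.array([c for c, _ in sims])
    sim_rt = np.array([t for _, t in sims])
    logl = 0.0
    for c in np.unique(data_choice):
        obs = np.asarray(data_rt)[np.asarray(data_choice) == c]
        syn = sim_rt[sim_choice == c]
        if syn.size < 2:
            return -np.inf                       # response never simulated
        p_c = syn.size / n_sim                   # response probability (defectiveness)
        h = 1.06 * syn.std() * syn.size ** (-0.2)  # Silverman's rule of thumb
        # Gaussian KDE evaluated at each observed RT, weighted by p_c
        kern = np.exp(-0.5 * ((obs[:, None] - syn[None, :]) / h) ** 2)
        dens = p_c * kern.sum(axis=1) / (syn.size * h * np.sqrt(2 * np.pi))
        logl += np.log(np.maximum(dens, 1e-300)).sum()
    return logl

# Demo: data from a toy one-choice model; matching parameters score higher
rng_data = np.random.default_rng(0)
data_rt = rng_data.normal(1.0, 0.2, size=200)
data_choice = np.zeros(200, dtype=int)
ll_match = pda_log_likelihood(data_rt, data_choice,
                              lambda r: (0, r.normal(1.0, 0.2)),
                              rng=np.random.default_rng(1))
ll_mismatch = pda_log_likelihood(data_rt, data_choice,
                                 lambda r: (0, r.normal(3.0, 0.2)),
                                 rng=np.random.default_rng(1))
```

Because the approximated likelihood itself is noisy (it depends on the particular synthetic data set), every evaluation at the same parameters returns a slightly different value, which is the added uncertainty referred to above.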
Secondly, our aim is to study the ability to recover the data-generating parameters of the LCA model. The ability to recover parameters is of crucial importance when model fitting is used as a method to gain insight into latent cognitive processes.
By studying the recoverability, we indirectly also investigate the identifiability of the LCA. That is, if the LCA is identified, there is only one unique set of parameters that predict a certain outcome (e.g., a probability density; see also Moran, 2016). The ability to recover data-generating parameters implies that the model is identified (although the reverse inference does not necessarily hold). In the context of the current research, this exercise will also allow us to learn which model fitting routine should be preferred for fitting the LCA model.
Section snippets
Maximum likelihood estimation
In general, maximum likelihood estimation (MLE) is the standard method of estimating the parameters of a model (e.g., Myung, 2003). MLE uses the likelihood function to describe the likelihood of a set of parameters θ given the data set X: L(θ | X) = ∏_{i=1}^{n} f(x_i | θ), where f is the probability density function of the model, describing the relative probability of observation x_i under parameters θ. The likelihood is the product of the probabilities of the individual observations under θ.
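As a toy illustration of the principle, with a simple Gaussian density rather than the LCA (which has no known likelihood), the log-likelihood can be maximized over a grid of candidate parameter values; data set, grid, and true mean are all illustrative.

```python
import numpy as np

# Synthetic data from a Gaussian with unknown mean (true value 2.0) and SD 1
rng = np.random.default_rng(7)
data = rng.normal(loc=2.0, scale=1.0, size=1000)

candidate_mus = np.linspace(0.0, 4.0, 4001)
# log L(mu | X) = sum_i log f(x_i | mu); terms constant in mu are dropped
log_lik = -0.5 * ((data[None, :] - candidate_mus[:, None]) ** 2).sum(axis=1)
mu_hat = candidate_mus[log_lik.argmax()]  # ML estimate; equals the sample mean here
```

In practice one works with the log-likelihood (the sum of log densities) because the product of many small probabilities underflows; for the LCA, the density f itself must additionally be approximated by simulation, as in the PDA method.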
Differential evolution Markov chain Monte Carlo
Another popular method for estimating cognitive model parameters is Markov-chain Monte Carlo, typically in a Bayesian context (MCMC, e.g., Turner et al., 2013, Turner et al., 2015, Wiecki et al., 2013). In order to perform Bayesian MCMC estimation, a prior and a likelihood function are needed. Prior distributions reflect our knowledge of the parameters before estimation. In the current simulation study, uniform priors were used to reflect the fact that, before estimation, we know nothing of
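The essential proposal mechanism of DE-MCMC (ter Braak, 2006) can be sketched as follows: each chain proposes a jump along the scaled difference of two other randomly chosen chains, accepted with the usual Metropolis rule. The target below is a toy standard-normal posterior, and all tuning settings are illustrative, not those used in this article.

```python
import numpy as np

def de_mcmc(log_post, n_chains=8, n_iter=2000, d=1, gamma=None, eps=1e-4, rng=None):
    """Differential evolution MCMC sketch.

    log_post: function theta (shape (d,)) -> log posterior density.
    Returns post-burn-in samples stacked across chains, shape (n, d).
    """
    rng = rng or np.random.default_rng()
    gamma = gamma if gamma is not None else 2.38 / np.sqrt(2 * d)
    chains = rng.normal(size=(n_chains, d))
    logp = np.array([log_post(c) for c in chains])
    samples = []
    for _ in range(n_iter):
        for i in range(n_chains):
            # Pick two other chains; their difference sets the jump direction
            m, n = rng.choice([j for j in range(n_chains) if j != i], 2,
                              replace=False)
            proposal = chains[i] + gamma * (chains[m] - chains[n]) \
                + rng.uniform(-eps, eps, d)
            lp = log_post(proposal)
            if np.log(rng.uniform()) < lp - logp[i]:  # Metropolis acceptance
                chains[i], logp[i] = proposal, lp
        samples.append(chains.copy())
    return np.concatenate(samples[n_iter // 2:])  # discard first half as burn-in

# Toy target: standard-normal posterior on one parameter
rng = np.random.default_rng(3)
draws = de_mcmc(lambda th: -0.5 * float(th @ th), rng=rng)
```

The difference-based proposal automatically adapts its scale and orientation to the posterior, which is useful for the correlated LCA parameters discussed later; with a simulation-based likelihood such as the PDA estimate, log_post would be the approximated log-likelihood plus the log prior.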
Contrasts between conditions
Researchers in the cognitive neurosciences are often not primarily interested in absolute parameter values. Instead, most experiments aim to find the effect of a certain manipulation between conditions. For example, in speed–accuracy trade-off experiments, researchers aim to discover whether emphasizing the importance of being fast versus being accurate influences response caution (i.e., the threshold) or the efficiency of perceptual processing (i.e., the drift rate) (Mulder et al., 2013, Rae
Correlational structure of the parameters of the LCA
The results so far have indicated that many fitting routines produce inaccurate estimates of the input, leakage, and inhibition parameters (see also the methods discussed in the supplemental materials). The best estimates were obtained using the Bayesian fitting procedure with a data set consisting of 10,000 trials, but even there, there is some uncertainty. In this section we discuss possible reasons for the difficulty of reliably recovering parameters for the LCA. For this, the posterior
Discussion
The present study investigated whether the parameters of the leaky, competing accumulator model could be reliably recovered. Using maximum likelihood estimation as well as Bayesian fitting procedures with various data set sizes, it was shown that only a DE-MCMC sampler with a very large data set resulted in reasonably accurate parameter estimates.
Three major causes contributed to the general fitting difficulties. Firstly, there is noise in goodness-of-fit measures, caused by the
References (77)
- Trial-by-trial fluctuations in CNV amplitude reflect anticipatory adjustment of response caution. Neuroimage (2014).
- Observing evidence accumulation during multi-alternative decisions. Journal of Mathematical Psychology (2009).
- New advances in understanding decisions among multiple alternatives. Current Opinion in Neurobiology (2012).
- Thou shalt identify! The identifiability of two high-threshold models in confidence-rating recognition (and super-recognition) paradigms. Journal of Mathematical Psychology (2016).
- Perceptual decision neurosciences: A model-based review. Neuroscience (2014).
- Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology (2003).
- The timescale of perceptual evidence integration can be adapted to the environment. Current Biology (2013).
- When a good fit can be bad. Trends in Cognitive Science (2002).
- Modeling response signal and response time data. Cognitive Psychology (2006).
- How to use the diffusion model: Parameter recovery of three methods: EZ, fast-dm, and DMAT. Journal of Mathematical Psychology (2009).
- A Bayesian framework for simultaneously modeling neural and behavioral data. Neuroimage.
- Approximate Bayesian computation with differential evolution. Journal of Mathematical Psychology.
- Bayesian analysis of simulation-based models. Journal of Mathematical Psychology.
- A tutorial on approximate Bayesian computation. Journal of Mathematical Psychology.
- Hick’s law in a stochastic race model with speed-accuracy tradeoff. Journal of Mathematical Psychology.
- Decision making in recurrent neural circuits. Neuron.
- Firing-rate models for neural populations.
- A model-based fMRI analysis with hierarchical Bayesian parameter estimation. Journal of Neuroscience, Psychology, and Economics.
- Correlations of cortical Hebbian reverberations: Theory versus experiment. Journal of Neuroscience.
- The shifted Wald distribution for response time data analysis. Psychological Methods.
- Jump-diffusion calibration using differential evolution. Wilmott Magazine.
- Differential evolution with DEoptim: An application to non-convex portfolio optimization. The R Journal.
- The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review.
- General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics.
- Modeling a simple choice task: Stochastic dynamics of mutually inhibitory neural groups. Stochastics and Dynamics.
- Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics.
- Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review.
- A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis.
- A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds. Mathematics of Computation.
- Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences of the United States of America.
- Sequential sampling models in cognitive neuroscience: Advantages, applications, and extensions. Annual Review of Psychology.
- Inference from iterative simulation using multiple sequences. Statistical Science.
- The neural basis of decision making. Annual Review of Neuroscience.
- Genetic algorithms in search, optimization and machine learning.
- The subthalamic nucleus during decision-making with multiple alternatives. Human Brain Mapping.
- Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science.
- Modeling reaction time and accuracy of multiple-alternative decisions. Attention, Perception & Psychophysics.