Tutorial
Parameter recovery for the Leaky Competing Accumulator model

https://doi.org/10.1016/j.jmp.2016.12.001

Highlights

  • There is no closed-form solution to the likelihood function of the Leaky Competing Accumulator (LCA) model.

  • There are strong trade-offs between accumulation rate, leakage, and inhibition in the LCA model.

  • Therefore, it is extremely difficult to faithfully recover the parameters of the LCA model.

Abstract

The Leaky Competing Accumulator (LCA) model for perceptual discrimination is rapidly growing in popularity due to its neural plausibility. The model assumes that perceptual choices and associated response times are the consequence of the accrual of evidence for the various response alternatives up to a certain predetermined threshold. In addition, the accrual of evidence is influenced by temporal leakage of information and mutual inhibition between the accumulators. In this paper we provide an overview of fitting routines that may be used to identify the parameter values used for generating data under the LCA assumptions. We find that because (a) there is no closed-form solution to the likelihood function of the LCA model, and (b) there are strong trade-offs between accumulation rate, leakage, and inhibition, it is extremely difficult to faithfully recover the parameters of the LCA model. To minimize these problems, we recommend using DE-MCMC sampling, collecting very large amounts of data, and constraining the parameter space where possible.

Introduction

The ability to quickly and accurately make decisions about the environment is of crucial importance to survival. Whether an animal near you is predator or prey means the difference between being lunch and having lunch.

To understand how perceptual decision-making works, formal sequential sampling models (SSMs) are often used. The fundamental idea of SSMs is that, for each choice option (e.g., predator or prey), there is a latent accumulator that gathers noisy evidence from sensory input. Evidence accumulation continues until an accumulator reaches a level of activation that exceeds a certain threshold, and a decision is made.

The most influential and benchmark SSM is the diffusion decision model (DDM, Ratcliff & McKoon, 2008). The DDM proposes that evidence accumulation in a two-choice decision is characterized by Eq. (1): $dx = A\,dt + c\,dW$, where $dx$ is the change in activation of an accumulator, $A$ is the average speed of evidence accumulation (drift rate), and $dW$ is Gaussian noise with standard deviation $c$. At the start of each trial, the accumulator starts at point 0 (i.e., no evidence) between the boundaries $[-a, a]$. During a trial, the accumulator collects evidence, and the gathered amount of evidence moves towards $-a$ or $a$. The stochastic nature of the accumulation process makes it possible to make an incorrect decision (e.g., hit the lower threshold with a positive drift rate), and causes variation in the speed of the decision process, resulting in distributions of reaction times. Once one of the boundaries is reached, a response is initiated corresponding to the boundary that is reached. The actual reaction time is the decision time plus a non-decision time that accounts for early perceptual processes and motor processes. Extensions of the standard DDM include between-trial variability in drift rates and starting point (Ratcliff, 1978), and sometimes variability in the non-decision time (Ratcliff & Tuerlinckx, 2002).
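
To make the accumulation dynamics of Eq. (1) concrete, the sketch below simulates one DDM trial with a simple Euler scheme. The article contains no code; the function name, step size, and return convention are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def simulate_ddm_trial(A, c, a, ndt, dt=0.001, rng=None):
    """Euler simulation of one DDM trial under Eq. (1): dx = A*dt + c*dW.

    A   : drift rate (average speed of evidence accumulation)
    c   : standard deviation of the Gaussian noise increment
    a   : boundary; accumulation starts at 0 between -a and a
    ndt : non-decision time added to the first-passage time
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = 0.0, 0.0
    while abs(x) < a:
        # Discretized Eq. (1): drift plus scaled Gaussian noise per time step
        x += A * dt + c * np.sqrt(dt) * rng.standard_normal()
        t += dt
    choice = 1 if x >= a else 0   # 1: upper boundary reached, 0: lower boundary
    return choice, t + ndt
```

Repeated calls with fixed parameters yield the joint distribution of choices and reaction times that the DDM predicts.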

The use of SSMs is appealing for both practical and theoretical reasons. Practically, SSMs account very well for reaction time distributions and accuracy data simultaneously, unlike, for example, signal detection theory (Ratcliff & McKoon, 2008). In addition, SSMs can account for a wide range of findings in reaction time data, including the speed–accuracy trade-off (e.g., Boehm et al., 2014, Forstmann et al., 2008, Mulder et al., 2013), and Hick’s law (Usher et al., 2002, Van Maanen et al., 2012). Theoretically, SSMs are appealing because they provide information about latent processes rather than raw behavioral measures, and as such, they are informative about the mechanistic processes underlying behavior (Anders et al., 2016, Forstmann et al., 2015, Mulder et al., 2014, Van Maanen et al., 2012).

Although the DDM and related SSMs were primarily developed to model cognitive processes, several lines of research have shown that evidence accumulation occurs at the neuronal level as well. For example, animal studies have reported single-cell recordings showing evidence accumulation (e.g., in LIP and FEF; Churchland and Ditterich, 2012, Gold and Shadlen, 2007, Purcell et al., 2010, Shadlen and Newsome, 2001, but see Latimer et al., 2015, Shadlen et al., 2016). In addition, EEG studies have shown ERP components that correlate remarkably well with drift rate (O’Connell, Dockree, & Kelly, 2012) or the boundary (Boehm et al., 2014). This work is accompanied by more detailed biophysical models of neuronal networks that, at a relatively coarse level, are also consistent with the sequential sampling framework (Lo and Wang, 2006, Wang, 2008).

It has been argued (Usher & McClelland, 2001) that SSMs make assumptions about the evidence accumulation process that are implausible when considering the behavior of pools of neurons. In particular, the assumption of full evidence integration is argued to be problematic: neural excitatory input decays within a very short time period (5–10 ms; Abbott, 1991). A mechanism of recurrent self-excitation delays the decay of information to some extent, but not perfectly (Amit, Brunel, & Tsodyks, 1994). Hence, information decays over time, so accumulators in an SSM should be ‘leaky’ integrators. To reconcile SSMs with findings in neurophysiology, neurally plausible models have been proposed.

One particular such model is the leaky, competing accumulator model (LCA, Usher & McClelland, 2001). The LCA (Fig. 1) is an n-choice version of the DDM, including two neuroscientifically motivated parameters: leakage (sometimes called decay) and lateral inhibition. Evidence accumulation in the LCA is described by Eq. (2): $dx_i = \left[I_i - \kappa x_i - \beta \sum_{j \neq i} x_j\right]\frac{dt}{\tau} + \xi_i\sqrt{\frac{dt}{\tau}}$, with $x_i = \max(x_i, 0)$, where $dx_i$ is the change in activation of accumulator $i$, $I$ is input (c.f. drift rate in the DDM), $\kappa$ is leakage, $\beta$ is inhibition, $dt/\tau$ is the time step size, and $\xi$ is noise which is drawn from a Gaussian distribution when the model is simulated (Fig. 1). The second equation is a non-linear component preventing the activation of an accumulator from dropping below 0. The idea behind this is that because neurons cannot have ‘negative’ activation, neither should accumulators (but see below). Response times are obtained through the first-passage time distribution with a common threshold level of activation Z, and a non-decision time NDT.
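
As a minimal illustration of Eq. (2), the Python sketch below simulates a single n-choice LCA trial by Euler discretization. The noise standard deviation is fixed at 1 for scaling, and the function name, step sizes, and the handling of trials that never reach threshold are assumptions of our own rather than details from the article.

```python
import numpy as np

def simulate_lca_trial(I, kappa, beta, Z, ndt, tau=0.1, dt=0.001,
                       max_time=5.0, rng=None):
    """Euler simulation of one n-choice LCA trial under Eq. (2).

    I     : inputs, one per accumulator (c.f. drift rates)
    kappa : leakage, beta : lateral inhibition
    Z     : common threshold, ndt : non-decision time
    """
    rng = np.random.default_rng() if rng is None else rng
    I = np.asarray(I, dtype=float)
    x = np.zeros_like(I)
    step = dt / tau
    t = 0.0
    while t < max_time:
        inhibition = beta * (x.sum() - x)              # sum over the *other* accumulators
        noise = rng.standard_normal(x.shape) * np.sqrt(step)
        x += (I - kappa * x - inhibition) * step + noise
        x = np.maximum(x, 0.0)                         # non-linearity: no negative activation
        t += dt
        if (x >= Z).any():
            return int(np.argmax(x)), t + ndt          # winning accumulator and RT
    return -1, max_time + ndt                          # no threshold crossing: code as non-response
```

Repeated calls with a fixed (arbitrary) parameter set produce the joint choice and response time distributions that the fitting routines discussed below must reproduce.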

A few mechanistic differences between the LCA and the canonical DDM should be mentioned. The DDM is only applicable in two-choice situations (but see  Leite and Ratcliff, 2010, McMillen and Holmes, 2006, Ratcliff and Starns, 2013, Smith, 2016), as it assumes one accumulator that gathers evidence for both choice options. The sign of the accumulator (positive or negative) determines whether there is more evidence for choice option one or two. The DDM includes competition: every piece of evidence for choice option 1 always counts as evidence against choice option 2. The LCA assumes one accumulator for each choice option, and these accumulators race towards the threshold. The LCA also includes competition, but proposes a different competition mechanism: the inhibition parameter determines the proportion of one accumulator’s activation that inhibits the other accumulators (see bottom right panel in Fig. 1).

The inclusion of the leakage and inhibition parameters is theoretically appealing, but needs to be validated by reaction time data that can only be explained by these parameters (Pitt & Myung, 2002). There have been some studies into the effect of inhibition (Churchland and Ditterich, 2012, Teodorescu and Usher, 2013, Tsetsos et al., 2011), but the importance of including a leakage parameter remains debated (Ossmy et al., 2013, Purcell et al., 2010, Ratcliff and Starns, 2013, Turner et al., 2016 but see  Ratcliff, 2006, Winkel et al., 2014).

An important reason for the scarcity of research into the LCA is that the model does not have a (known) likelihood function (Turner & Sederberg, 2014). The only methods available to fit the model to data are simulation-based, meaning that, in order to calculate any measure of fit, the LCA has to generate simulated data for each proposed set of parameters. As a consequence, model-fitting procedures are slow; so slow, in fact, that a thorough investigation of model fitting procedures, recoverability, and identifiability of the LCA has, as far as the authors are aware, not been performed.

To circumvent fitting a model without a known likelihood function, studies in which leakage and inhibition may be of relevance have often used the Ornstein–Uhlenbeck process (OU, Busemeyer & Townsend, 1993). The OU can be considered a linear simplification of the LCA with a known likelihood function. The use of the OU instead of the full LCA is, in the case of a two-choice paradigm, justified if we drop the assumption of non-linearity. Without the non-linearity, it can be shown that the evidence accumulation process described by the LCA Eqs. (2) is equivalent to an OU-process (ignoring differences in the threshold parameter, see also van Ravenzwaaij, van der Maas, & Wagenmakers, 2012). If we define $x = x_2 - x_1$ (a single accumulator representing the difference in activation between the two accumulators in the LCA, c.f. the DDM), the two LCA equations for the two accumulators, $dx_1 = [I_1 - \kappa x_1 - \beta x_2]\frac{dt}{\tau} + \xi_1\sqrt{\frac{dt}{\tau}}$ and $dx_2 = [I_2 - \kappa x_2 - \beta x_1]\frac{dt}{\tau} + \xi_2\sqrt{\frac{dt}{\tau}}$, can be rewritten as one simple equation: $dx = [(I_2 - I_1) + (\beta - \kappa)x]\frac{dt}{\tau} + \xi\sqrt{\frac{2\,dt}{\tau}}$. The major disadvantage of the use of the OU is that the leakage and inhibition parameters reduce to one ‘difference’ parameter, often named $\lambda$. Fitting an OU-process can thus only tell us whether the difference between leakage and inhibition is positive or negative; that is, we can only know whether there was more leakage than inhibition or vice versa.
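
For readers who want the intermediate step, the equivalence follows from subtracting the two linear accumulator equations; the sketch below spells out the algebra, using the fact that the difference of two independent standard Gaussian noise terms is Gaussian with twice the variance.

```latex
\begin{align*}
dx &= dx_2 - dx_1\\
   &= \bigl[(I_2 - I_1) - \kappa(x_2 - x_1) - \beta(x_1 - x_2)\bigr]\tfrac{dt}{\tau}
      + (\xi_2 - \xi_1)\sqrt{\tfrac{dt}{\tau}}\\
   &= \bigl[(I_2 - I_1) + (\beta - \kappa)\,x\bigr]\tfrac{dt}{\tau}
      + \xi\sqrt{\tfrac{2\,dt}{\tau}}.
\end{align*}
```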

This problem only occurs when the assumption of non-linearity is dropped. Excluding the non-linearity has been defended on both theoretical and practical grounds. Theoretically, the motivation behind the assumption has been questioned: although the activation of neurons can indeed not decrease below 0, this argument might be based on the false assumption that ‘inactive’ neurons are not firing. In contrast, inactive neurons fire at a baseline rate, and it is known that activation can drop below baseline through inhibitory input (see, for this line of reasoning, Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). The practical reason for dropping the assumption of non-linearity is that it has been shown that, under reasonable parameter ranges (i.e., when all accumulators receive a substantial amount of input), the non-linear process can be mimicked very well by a linear process (Brown & Holmes, 2001; McMillen & Holmes, 2006; Usher & McClelland, 2001, 2004; van Ravenzwaaij et al., 2012).

Thus, in part of the parameter space, it will be impossible to disentangle the leakage and inhibition parameters in a two-choice LCA model. To ensure that these crucial parameters can be disentangled, we focus on three-choice settings, where the leakage and inhibition parameters can be disentangled in the whole parameter space. A thorough analysis of fitting the LCA on three-accumulator data has, as far as the authors are aware, not been performed (although fits to multi-alternative data have been reported, e.g.,  Teodorescu and Usher, 2013, Tsetsos et al., 2011). Further studies can use our findings to study the two-choice LCA.

The goal of the present article is twofold. Firstly, we want to share best practices in fitting likelihood-free cognitive models such as the LCA. In our attempts to perform parameter recovery for this model, we found that not all methods are equally suited to handle the increased uncertainty that is a direct consequence of the lack of a likelihood function. In particular, we focus on the Probability Density Approximation method (PDA, Turner & Sederberg, 2014). This method is the most suitable for obtaining a reliable estimate of the likelihood function, and we will demonstrate its use in the context of maximum likelihood estimation as well as a Bayesian parameter estimation procedure (the supplemental materials describe two additional methods).
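
The following Python sketch conveys the core idea of PDA as described by Turner and Sederberg (2014): simulate a large synthetic data set under a candidate parameter vector, build a kernel density estimate of the RT distribution for each response (weighted by the simulated response probability), and evaluate the observed trials under those defective densities. The Gaussian KDE, the density floor, and the function names are simplifications of our own; the original method uses more refined kernel choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

def pda_log_likelihood(obs_choices, obs_rts, theta, simulate, n_sim=10_000):
    """Approximate log-likelihood via the PDA idea (Turner & Sederberg, 2014).

    simulate(theta, n) is assumed to return two arrays: simulated choices and RTs.
    """
    sim_choices, sim_rts = simulate(theta, n_sim)
    log_like = 0.0
    for resp in np.unique(obs_choices):
        obs_mask = obs_choices == resp
        sim_mask = sim_choices == resp
        if sim_mask.sum() < 2:                     # response (almost) never simulated
            return -np.inf
        p_resp = sim_mask.mean()                   # simulated response probability
        kde = gaussian_kde(sim_rts[sim_mask])      # RT density for this response
        dens = np.maximum(kde(obs_rts[obs_mask]) * p_resp, 1e-10)
        log_like += np.log(dens).sum()
    return log_like
```

Because every evaluation relies on a fresh set of simulations, the resulting likelihood estimate is itself noisy, which is one of the fitting difficulties discussed later in the paper.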

Secondly, our aim is to study the ability to recover the data-generating parameters of the LCA model. The ability to recover parameters is of crucial importance when model fitting is used as a method to gain insight into latent cognitive processes.

By studying the recoverability, we indirectly also investigate the identifiability of the LCA. That is, if the LCA is identified, there is only one unique set of parameters that predicts a certain outcome (e.g., a probability density; see also Moran, 2016). The ability to recover data-generating parameters implies that the model is identified (although the reverse inference does not necessarily hold). In the context of the current research, this exercise will also allow us to learn which model fitting routine should be preferred for fitting the LCA model.

Section snippets

Maximum likelihood estimation

In general, maximum likelihood estimation (MLE) is the standard method of estimating the parameters of a model (e.g., Myung, 2003). MLE uses the likelihood function to describe the likelihood of a set of parameters θ given the data set D: $L(\theta|D) = \prod_{i=1}^{N} \mathrm{Model}(D_i|\theta)$, where $\mathrm{Model}(D_i|\theta)$ is the probability density function of the model, describing the relative probability of observation $D_i$ under parameters θ. The likelihood is thus the product of the probabilities of the individual observations $D_i$ under θ.
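
To connect MLE to the simulation-based setting of the LCA, the sketch below maximizes the PDA-approximated log-likelihood with SciPy's differential evolution optimizer (the reference list includes the DEoptim R package, suggesting differential evolution was among the optimizers considered; SciPy is used here as a stand-in). It builds on the simulate_lca_trial and pda_log_likelihood functions sketched above; the parameter ordering and bounds are illustrative assumptions, not values from the article.

```python
import numpy as np
from scipy.optimize import differential_evolution

def simulate_lca(theta, n):
    """Simulate n LCA trials; theta = (I1, I2, I3, kappa, beta, Z, ndt)."""
    I, (kappa, beta, Z, ndt) = theta[:3], theta[3:]
    trials = [simulate_lca_trial(I, kappa, beta, Z, ndt) for _ in range(int(n))]
    choices, rts = map(np.asarray, zip(*trials))
    return choices, rts

def neg_log_likelihood(theta, obs_choices, obs_rts):
    # Negative PDA log-likelihood, so that minimizing it maximizes the likelihood
    return -pda_log_likelihood(obs_choices, obs_rts, theta, simulate_lca)

# Illustrative bounds for (I1, I2, I3, kappa, beta, Z, ndt)
bounds = [(0.5, 4.0)] * 3 + [(0.0, 3.0), (0.0, 3.0), (0.5, 3.0), (0.1, 0.5)]

# With an observed (or recovery-study) data set obs_choices, obs_rts at hand:
# result = differential_evolution(neg_log_likelihood, bounds,
#                                 args=(obs_choices, obs_rts),
#                                 maxiter=50, polish=False, seed=1)
```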

Differential evolution Markov chain Monte Carlo

Another popular method for estimating cognitive model parameters is Markov chain Monte Carlo, typically in a Bayesian context (MCMC, e.g., Turner et al., 2013, Turner et al., 2015, Wiecki et al., 2013). In order to perform Bayesian MCMC estimation, a prior and a likelihood function are needed. Prior distributions reflect our knowledge of the parameters θ before estimation. In the current simulation study, uniform priors were used to reflect the fact that, before estimation, we know nothing of the parameter values.
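
The crossover proposal at the heart of DE-MCMC can be sketched as follows (following the standard differential-evolution proposal used in, e.g., Turner et al., 2012). For the LCA, log_post would wrap a prior plus the PDA-approximated log-likelihood, which makes the acceptance step noisy. The function name, tuning constants, and update scheme are generic choices of our own, not the authors' implementation.

```python
import numpy as np

def demcmc_step(chains, log_post, gamma=None, eps=1e-4, rng=None):
    """One crossover sweep of DE-MCMC.

    chains   : (n_chains, n_params) array of current chain states
    log_post : function mapping a parameter vector to an (approximate) log posterior
    """
    rng = np.random.default_rng() if rng is None else rng
    n_chains, n_params = chains.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * n_params)       # a common default step size
    new = chains.copy()
    for k in range(n_chains):
        others = [i for i in range(n_chains) if i != k]
        a, b = rng.choice(others, size=2, replace=False)
        # Proposal: scaled difference of two other chains plus small uniform jitter
        proposal = new[k] + gamma * (new[a] - new[b]) + rng.uniform(-eps, eps, n_params)
        # Metropolis acceptance on the log scale; with PDA the log posterior is noisy
        if np.log(rng.uniform()) < log_post(proposal) - log_post(new[k]):
            new[k] = proposal
    return new
```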

Contrasts between conditions

Researchers in the cognitive neurosciences are often not primarily interested in the absolute parameter values. Instead, most experiments try to identify the effect of a certain manipulation between conditions. For example, in speed–accuracy trade-off experiments, researchers aim to discover whether emphasizing the importance of being fast versus being accurate influences response caution (i.e., the threshold) or the efficiency of perceptual processing (i.e., the drift rate) (Mulder et al., 2013, Rae …).
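
In a Bayesian setting, such a contrast can be read directly from the posterior samples. A minimal sketch follows, with placeholder arrays standing in for posterior samples of the threshold in two conditions (in practice these would come from the DE-MCMC fits of each condition; all names and values here are hypothetical).

```python
import numpy as np

# Placeholder posterior samples of the threshold under speed vs. accuracy emphasis
rng = np.random.default_rng(0)
thresh_speed = rng.normal(0.9, 0.05, size=5000)      # hypothetical samples
thresh_accuracy = rng.normal(1.1, 0.05, size=5000)   # hypothetical samples

contrast = thresh_accuracy - thresh_speed            # posterior of the difference
ci = np.percentile(contrast, [2.5, 97.5])            # 95% credible interval
p_positive = (contrast > 0).mean()                   # posterior probability of an increase
print(f"95% CI of the contrast: {ci}, P(accuracy > speed) = {p_positive:.3f}")
```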

Correlational structure of the parameters of the LCA

The results so far have indicated that many fitting routines produce inaccurate estimates of the input, leakage, and inhibition parameters (see also the methods discussed in the supplemental materials). The best estimates were obtained using the Bayesian fitting procedure with a data set consisting of 10,000 trials, but even there, some uncertainty remains. In this section we discuss possible reasons for the difficulty of reliably recovering the parameters of the LCA. To this end, the posterior distributions of the parameters are examined in more detail.
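
One straightforward way to inspect such trade-offs is to compute the correlation matrix of the posterior samples: strong off-diagonal entries between input, leakage, and inhibition indicate the trade-offs at issue. A minimal sketch, assuming samples is an (n_samples, n_params) array of posterior draws and using a hypothetical parameter ordering:

```python
import numpy as np

param_names = ["I1", "I2", "I3", "kappa", "beta", "Z", "ndt"]  # assumed ordering

def posterior_correlations(samples, names=param_names):
    """Correlation matrix of posterior samples; samples has shape (n_samples, n_params)."""
    corr = np.corrcoef(samples, rowvar=False)
    # Rank parameter pairs by the strength of their posterior correlation
    pairs = [(names[i], names[j], corr[i, j])
             for i in range(len(names)) for j in range(i + 1, len(names))]
    return corr, sorted(pairs, key=lambda p: -abs(p[2]))
```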

Discussion

The present study investigated whether the parameters of the leaky, competing accumulator model could be reliably recovered. Using maximum likelihood estimation as well as Bayesian fitting procedures with various data set sizes, it was shown that only a DE-MCMC sampler with a very large data set resulted in reasonably accurate parameter estimates.

Three major causes contributed to the general fitting difficulties. Firstly, there is noise in the goodness-of-fit measures, caused by the simulation-based approximation of the likelihood.

References (77)

  • B.M. Turner et al. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage.
  • B.M. Turner et al. (2012). Approximate Bayesian computation with differential evolution. Journal of Mathematical Psychology.
  • B.M. Turner et al. (2016). Bayesian analysis of simulation-based models. Journal of Mathematical Psychology.
  • B.M. Turner et al. (2012). A tutorial on approximate Bayesian computation. Journal of Mathematical Psychology.
  • M. Usher et al. (2002). Hick’s law in a stochastic race model with speed-accuracy tradeoff. Journal of Mathematical Psychology.
  • X.J. Wang (2008). Decision making in recurrent neural circuits. Neuron.
  • L. Abbott (1991). Firing-rate models for neural populations.
  • W.-Y. Ahn et al. (2011). A model-based fMRI analysis with hierarchical Bayesian parameter estimation. Journal of Neuroscience, Psychology, and Economics.
  • D.J. Amit et al. (1994). Correlations of cortical Hebbian reverberations: theory versus experiment. Journal of Neuroscience.
  • R. Anders et al. (2016). The shifted Wald distribution for response time data analysis. Psychological Methods.
  • D. Ardia et al. (2011). Jump-diffusion calibration using differential evolution. Wilmott Magazine.
  • D. Ardia et al. (2011). Differential evolution with DEoptim: An application to non-convex portfolio optimization. The R Journal.
  • D. Ardia, K.M. Mullen, B.G. Peterson, & J. Ulrich (2015). DEoptim: Differential Evolution in R. Version 2.2-3. URL...
  • R. Bogacz et al. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review.
  • S.P. Brooks et al. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics.
  • E. Brown et al. (2001). Modeling a simple choice task: Stochastic dynamics of mutually inhibitory neural groups. Stochastics and Dynamics.
  • T. Buch-Larsen et al. (2005). Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics.
  • J.R. Busemeyer et al. (1993). Decision field theory: a dynamic cognitive approach to decision making in an uncertain environment. Psychological Review.
  • A.R. Conn et al. (1991). A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis.
  • A.R. Conn et al. (1997). A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds. Mathematics of Computation.
  • B.U. Forstmann et al. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences of the United States of America.
  • B.U. Forstmann et al. (2015). Sequential sampling models in cognitive neuroscience: Advantages, applications, and extensions. Annual Review of Psychology.
  • A. Gelman et al. (1992). Inference from iterative simulation using multiple sequences. Statistical Science.
  • J.I. Gold et al. (2007). The neural basis of decision making. Annual Review of Neuroscience.
  • D.E. Goldberg (1989). Genetic algorithms in search, optimization and machine learning.
  • M.C. Keuken et al. (2015). The subthalamic nucleus during decision-making with multiple alternatives. Human Brain Mapping.
  • K. Latimer et al. (2015). Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science.
  • F.P. Leite et al. (2010). Modeling reaction time and accuracy of multiple-alternative decisions. Attention, Perception & Psychophysics.