## Abstract

Recent empirical findings have indicated that gaze allocation plays a crucial role in simple decision behaviour. Many of these findings point towards an influence of gaze allocation onto the speed of evidence accumulation in an accumulation-to-bound decision process (resulting in generally higher choice probabilities for items that have been looked at longer). Further, researchers have shown that the strength of the association between gaze and choice behaviour is highly variable between individuals, encouraging future work to study this association on the individual level. However, only few decision models exist that easily allow studying the gaze-choice association on the individual level, due to the high cost of developing and implementing them. The model space is particularly scarce for choice sets with more than two choice alternatives. Here, we present GLAMbox, a Python-based toolbox that is built upon PyMC3 and allows the easy application of the gaze-weighted linear accumulator model (GLAM) to experimental choice data. The GLAM assumes gaze-dependent evidence accumulation in a linear stochastic race that extends to decision scenarios with many choice alternatives. GLAMbox enables Bayesian parameter estimation of the GLAM for individual, pooled or hierarchical models, provides an easy-to-use interface to predict choice behaviour and visualize choice data, and benefits from all of PyMC3’s Bayesian statistical modeling functionality. Further documentation, resources and the toolbox itself are available at https://github.com/glamlab/glambox.

A plethora of empirical findings has established an association between gaze allocation and decision behaviour on the group-level. For example, in value-based decision making, it was repeatedly shown that longer gaze towards one option is associated with a higher choice probability for that option (Armel, Beaumel, & Rangel, 2008; Cavanagh, Wiecki, Kochar, & Frank, 2014; Fiedler & Glöckner, 2012; Folke, Jacobsen, Fleming, & De Martino, 2017; Glöckner & Herbold, 2011; Konovalov & Krajbich, 2016; Krajbich, Armel, & Rangel, 2010; Krajbich & Rangel, 2011; Pärnamets et al., 2015; Shimojo, Simion, Shimojo, & Scheier, 2003; Stewart, Gächter, Noguchi, & Mullett, 2016; Stewart, Hermens, & Matthews, 2016; Vaidya & Fellows, 2015) and that external manipulation of gaze allocation changes choice probabilities accordingly (Armel et al., 2008; Pärnamets et al., 2015; Shimojo et al., 2003; Tavares, Perona, & Rangel, 2017). Such gaze bias effects are not limited to value-based decisions, but have recently also been observed in perceptual choices, where participants judge the perceptual attributes of stimuli based on available sensory information (Tavares et al., 2017).

These findings have led to the development of a set of computational models, aimed at capturing the empirically observed association between gaze allocation and choice behaviour by utilizing gaze data to inform the momentary accumulation rates of diffusion decision processes (Ashby, Jekel, Dickert, & Glöckner, 2016; Cavanagh et al., 2014; Fisher, 2017; Krajbich et al., 2010; Krajbich, Lu, Camerer, & Rangel, 2012; Krajbich & Rangel, 2011; Tavares et al., 2017). Specifically, these models assume that evidence accumulation in favour of an item continues while it is not looked at, but at a discounted rate. The application of these models has so far been limited, as fitting them to empirical data depends on computationally expensive simulations, involving the simulation of fixation trajectories. These simulations, as well as the creation of models of the underlying fixation process, become increasingly difficult with increasing complexity of the decision setting (e.g., growing choice set sizes or number of option attributes). Existing approaches that circumvent the need for simulations model the evidence accumulation process as a single diffusion process between two decision bounds and are therefore limited to binary decisions (Cavanagh et al., 2014; Lopez-Persem, Domenech, & Pessiglione, 2016).

However, researchers are increasingly interested in choice settings involving more than two alternatives. Choices outside the laboratory usually involve larger choice sets or describe items on multiple attributes. Besides, many established behavioural effects only occur in multi-alternative and multi-attribute choice situations (e.g., context effects; Trueblood, Brown, Heathcote, & Busemeyer, 2013).

Furthermore, recent findings indicate strong individual differences in the association between gaze allocation and choice behaviour (Smith & Krajbich, 2018; Thomas, Molter, Krajbich, Heekeren, & Mohr, 2019) as well as individual differences in the decision mechanisms used (Ashby et al., 2016). Yet, the majority of model-based investigations of the relationship between gaze allocation and choice behaviour were focused on the group level, disregarding differences between individuals.

With the Gaze-weighted linear accumulator model (GLAM; Thomas et al., 2019), we have proposed an analytical tool that allows the model-based investigation of the relationship between gaze allocation and choice behaviour at the level of the individual, in choice situations involving more than two alternatives, solely requiring participants’ choice, response time (RT) and gaze data, in addition to estimates of the items’ values.

The GLAM assumes that the decision process is biased by momentary gaze behaviour. While an item is not fixated, its value representation is discounted. The resulting decision signals are averaged using relative gaze data, compared, transformed and fed into a stochastic race framework. Thereby, GLAM naturally generalizes to choice scenarios involving more than two alternatives while remaining analytically tractable. By providing an analytical solution for the first-passage-density (FPD) of the stochastic race process, these models’ parameters can be efficiently estimated in a hierarchical Bayesian framework.

To make GLAM more accessible, we now introduce GLAMbox, a Python-based toolbox for the application of the GLAM to empirical choice, RT and gaze data. GLAMbox allows for individual and hierarchical estimation of the GLAM parameters, simulation of response data and model-based comparisons between experimental conditions and groups. It further contains a set of visualization functions to inspect choice and gaze data and evaluate model fit. We illustrate three application examples of the toolbox: In Example 1, we illustrate how GLAMbox can be used to analyze individual participant data with the GLAM. In particular, we perform an exemplary model comparison between multiple model variants on the individual level, as well as an out-of-sample prediction of participants’ choice and RT data. In Example 2, we demonstrate the application of the GLAM to perform a comparison of group-level parameters in a setting with limited amounts of data, using hierarchical parameter estimation. Lastly, in Example 3, we walk the reader through a step-by-step parameter recovery study with the GLAM, which is encouraged to increase confidence in the estimated parameter values.

## Methods

### Gaze-weighted linear accumulator model details

The GLAM assumes that preference formation, during a simple choice process, is guided by the allocation of visual gaze (for an overview, see Fig. 1). In particular, the decision process is guided by a set of decision signals: an absolute and a relative decision signal. Throughout the trial, the absolute signal of an item can be in one of two states: an unbiased state, equal to the item’s value *r*_{i} while the item is looked at, and a biased state while any other item is looked at, where the item value *r*_{i} is discounted by a parameter *γ*.

The average absolute decision signal is given by

$$A_i = g_i r_i + (1 - g_i)\,\gamma\, r_i,$$

where *g*_{i} is defined as the fraction of total trial time that item *i* was looked at. If *γ* = 1, there is no difference between the biased and unbiased state, resulting in no influence of gaze allocation on choice behaviour. For *γ* values less than 1, the absolute decision signal *A*_{i} is discounted, resulting in generally higher choice probabilities for items that have been looked at longer.

To determine the relative decision signals, the average absolute decision signals are transformed in two steps. First, the difference between the average absolute decision signal of each item and the maximum of all other decision signals is computed:

$$R_i^{*} = A_i - \max_{j \neq i} A_j$$

Second, the resulting difference signals are scaled through a logistic transform *s*(*x*). The GLAM assumes an adaptive representation of the relative decision signals, which is maximally sensitive to small differences in the absolute decision signals close to 0 (where the difference between the absolute decision signal of an item and the maximum of all others is small):

$$R_i = s(R_i^{*}), \qquad s(x) = \frac{1}{1 + \exp(-\tau x)}$$

The sensitivity of this transform is determined by the temperature parameter *τ* of the logistic function. Larger values of *τ* indicate stronger sensitivity to small differences in the average absolute decision signals.
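
The signal construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than GLAMbox's internal code: the function name `relative_signals` is hypothetical, the absolute signal is assumed to be the gaze-weighted mix of the unbiased value *r*_{i} and the discounted value *γr*_{i}, and the logistic transform is assumed to have the standard form 1 / (1 + exp(−*τx*)).

```python
import math

def relative_signals(values, gaze, gamma, tau):
    """Hypothetical helper: map item values and gaze fractions to relative decision signals."""
    # Average absolute signal: full value while fixated, discounted by gamma otherwise.
    absolute = [g * r + (1.0 - g) * gamma * r for r, g in zip(values, gaze)]
    relative = []
    for i, a_i in enumerate(absolute):
        # Difference to the strongest competing absolute signal.
        best_other = max(a for j, a in enumerate(absolute) if j != i)
        # Logistic scaling with temperature tau (assumed standard form).
        relative.append(1.0 / (1.0 + math.exp(-tau * (a_i - best_other))))
    return relative

values = [5.0, 5.0, 3.0]   # item values r_i
gaze = [0.6, 0.2, 0.2]     # fraction of trial time each item was looked at
no_bias = relative_signals(values, gaze, gamma=1.0, tau=1.0)
with_bias = relative_signals(values, gaze, gamma=0.3, tau=1.0)
```

With *γ* = 1, the two equally valued items receive the same relative signal regardless of gaze; with *γ* = 0.3, the item that was looked at longer receives the larger signal.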

Lastly, to capture response behaviour as well as RTs, the relative signals *R*_{i} are fed into a linear stochastic race. Here, one item accumulator *E*_{i} is defined for each item in the choice set:

$$E_i(t) = E_i(t-1) + vR_i + \mathcal{N}(0, \sigma^{2}), \qquad E_i(0) = 0$$

At each time step *t*, the amount of accumulated evidence is determined by the accumulation rate *vR*_{i} (where *v* indicates a general speed parameter that is independent of gaze and item value) and zero-centered normally distributed noise with standard deviation *σ*. A choice for an item is made as soon as one accumulator reaches the decision boundary *b*. The first passage time density *f*_{i}(*t*) of a single linear stochastic accumulator *E*_{i}, with decision boundary *b*, is given by the inverse Gaussian distribution:

$$f_i(t) = \left[\frac{\lambda}{2\pi t^{3}}\right]^{1/2} \exp\left(-\frac{\lambda (t - \mu)^{2}}{2\mu^{2} t}\right), \qquad \mu = \frac{b}{vR_i}, \quad \lambda = \frac{b^{2}}{\sigma^{2}}$$

However, this density does not take into account that there are multiple accumulators in each trial racing towards the same boundary. For this reason, *f*_{i}(*t*) must be corrected for the probability that any other accumulator crosses the boundary first. The probability that an accumulator crosses the boundary prior to *t* is given by its cumulative distribution function *F*_{i}(*t*):

$$F_i(t) = \Phi\left(\sqrt{\frac{\lambda}{t}}\left(\frac{t}{\mu} - 1\right)\right) + \exp\left(\frac{2\lambda}{\mu}\right) \Phi\left(-\sqrt{\frac{\lambda}{t}}\left(\frac{t}{\mu} + 1\right)\right)$$

Here, Φ(*x*) defines the standard normal cumulative distribution function. Hence, the joint probability *p*_{i}(*t*) that accumulator *E*_{i} crosses *b* at time *t*, and that no other accumulator *E*_{j} has reached *b* first, is given by:

$$p_i(t) = f_i(t) \prod_{j \neq i} \left(1 - F_j(t)\right)$$
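
To make the race likelihood concrete, the following sketch evaluates it numerically using only the Python standard library. It assumes the standard inverse Gaussian first-passage result for a drifting accumulator (mean *b* / drift, shape *b*² / *σ*²); the function names and the illustrative drift values are hypothetical and are not part of GLAMbox.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fpt_density(t, drift, sigma, b=1.0):
    """Inverse Gaussian first-passage density of a single accumulator."""
    mu = b / drift            # mean first-passage time
    lam = (b / sigma) ** 2    # shape parameter
    return math.sqrt(lam / (2.0 * math.pi * t ** 3)) * math.exp(
        -lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t))

def fpt_cdf(t, drift, sigma, b=1.0):
    """Probability that a single accumulator has crossed b before time t."""
    mu = b / drift
    lam = (b / sigma) ** 2
    root = math.sqrt(lam / t)
    return (std_normal_cdf(root * (t / mu - 1.0))
            + math.exp(2.0 * lam / mu) * std_normal_cdf(-root * (t / mu + 1.0)))

def race_density(i, t, drifts, sigma, b=1.0):
    """Density that accumulator i crosses b at time t before all other accumulators."""
    p = fpt_density(t, drifts[i], sigma, b)
    for j, drift in enumerate(drifts):
        if j != i:
            p *= 1.0 - fpt_cdf(t, drift, sigma, b)
    return p

drifts = [0.8, 0.5]   # accumulation rates v * R_i of two items (illustrative values)
dt = 0.01
# Summed over items and integrated over time (midpoint rule), the race
# densities should account for (almost) all of the probability mass.
total_mass = sum(race_density(i, (k + 0.5) * dt, drifts, sigma=1.0) * dt
                 for k in range(6000) for i in range(len(drifts)))
```

Checking that `total_mass` is close to 1 is a simple sanity test that the corrected densities form a proper joint distribution over choices and RTs.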

#### Contaminant response model

To reduce the influence of erroneous responses (e.g., when the participant presses a button by accident or has a lapse of attention during the task) on parameter estimation, we include a model of contaminant response processes in all estimation procedures: In line with existing drift diffusion modelling toolboxes (e.g., Wiecki, Sofer, & Frank, 2013), we assume a fixed 5% rate of erroneous responses ∊ that is modeled as a participant-specific uniform likelihood distribution *u*_{s}(*t*). This likelihood describes the probability of a random choice for any of the *N* available choice items at a random time point in the interval of empirically observed RTs *rt*_{s} of participant *s* (cf., Ratcliff & Tuerlinckx, 2002; Wiecki et al., 2013):

$$u_s(t) = \frac{1}{N\left(\max(rt_s) - \min(rt_s)\right)}$$

The resulting likelihood for participant *s* choosing item *i*, accounting for erroneous responses, is then given by:

$$l_{s,i}(t) = (1 - \epsilon)\, p_i(t) + \epsilon\, u_s(t)$$

The rate of error responses ∊ can be specified by the user to a different value than the default of 5% using the `error_weight` keyword in the `make_model` method (see below).
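
The contaminant mixture can be illustrated with a small function. This is a sketch under the assumptions stated in the text (a uniform density over the *N* items and the participant's observed RT range, mixed in with weight ∊); the function name is hypothetical, and only the keyword name `error_weight` is taken from the toolbox.

```python
def contaminated_likelihood(p_race, rt_min, rt_max, n_items, error_weight=0.05):
    """Mix the race likelihood with a uniform contaminant likelihood."""
    # Uniform density over all items and the participant's observed RT range.
    uniform = 1.0 / (n_items * (rt_max - rt_min))
    return (1.0 - error_weight) * p_race + error_weight * uniform

# A response that the race model considers impossible (p_race = 0)
# still receives a small, non-zero likelihood.
likelihood = contaminated_likelihood(p_race=0.0, rt_min=0.5, rt_max=4.5, n_items=3)
```

Because the likelihood never drops to zero, a single accidental response cannot dominate the parameter estimates.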

#### Individual parameter estimation details

The GLAM is implemented in a Bayesian framework using the Python library PyMC3 (Salvatier, Wiecki, & Fonnesbeck, 2016, version 3.7). The model has four parameters (*v*, *γ*, *σ*, *τ*). By default, uninformative, uniform priors between sensible limits (derived from earlier applications to four different datasets; Thomas et al., 2019) are placed on all parameters:

The *γ* parameter has a natural upper bound at 1 (indicating no gaze bias). The *τ* parameter has a natural lower bound at 0 (resulting in no sensitivity to differences in average absolute decision signals). The velocity parameter *v* and the noise parameter *σ* must be strictly positive.

#### Hierarchical parameter estimation details

For hierarchical models, individual parameters are assumed to be drawn from Truncated Normal distributions, parameterized by mean and standard deviation, over which weakly informative, Truncated Normal priors are assumed (based on the distribution of group level parameter estimates obtained from four different datasets in Thomas et al. 2019; see Figs. 2, A2 and A3 and Table A1):

### Basic usage

#### Data format, the `GLAM` class

The core functionality of GLAMbox is implemented in the `GLAM` model class. To apply the GLAM to data, an instance of the model class first needs to be created and supplied with the experimental data:

```python
import glambox as gb

glam = gb.GLAM(data=data)
```

The data must be a pandas (McKinney, 2010) DataFrame with one row per trial, containing the following variable entries:

- `subject`: Subject index (integer, first subject should be 0)
- `trial`: Trial index (integer, first trial should be 0)
- `choice`: Chosen item in this trial (integer, items should be 0, 1, …, *N* − 1)
- `rt`: Response time (float, in seconds)
- for each item *i* in the choice set:
  - `item_value_i`: The item value (float)
  - `gaze_i`: The fraction of total time in this trial that the participant spent looking at this item (float, between 0 and 1)
- additional variables coding groups or conditions (string or integer)

For reference, the first two rows of a pandas DataFrame ready to be used with GLAMbox are shown in Table 1.
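
As an illustration of this format, the sketch below builds a two-trial DataFrame for a choice set of two items. All values and the `dwell_*` columns are made up for the example, and normalizing by the summed dwell time per trial is an assumption about how the gaze fractions are obtained from raw eye tracking data.

```python
import pandas as pd

# Hypothetical raw data: one row per trial, dwell times (in seconds) per item.
raw = pd.DataFrame({
    'subject': [0, 0],
    'trial': [0, 1],
    'choice': [1, 0],
    'rt': [2.1, 1.6],
    'item_value_0': [4.0, 7.0],
    'item_value_1': [6.0, 5.0],
    'dwell_0': [0.5, 0.9],
    'dwell_1': [1.5, 0.3],
})

dwell_cols = ['dwell_0', 'dwell_1']
total_dwell = raw[dwell_cols].sum(axis=1)
data = raw.drop(columns=dwell_cols)
for i, col in enumerate(dwell_cols):
    # Fraction of the trial's total dwell time spent on item i.
    data[f'gaze_{i}'] = raw[col] / total_dwell
```

The resulting `data` frame has one row per trial and carries the `subject`, `trial`, `choice`, `rt`, `item_value_i` and `gaze_i` columns listed above.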

Next, the respective PyMC3 model, which will later be used to estimate the model’s parameters, can be built using the `make_model` method. Here, the researcher specifies the kind of the model: `'individual'` if the parameters should be estimated for each subject individually, `'hierarchical'` for hierarchical parameter estimation, or `'pooled'` to estimate a single parameter set for all subjects. At this stage, the researcher can also specify experimental parameter dependencies: For example, a parameter could be expected to vary between groups or conditions. In line with existing modeling toolboxes (e.g., Voss & Voss, 2007; Wiecki et al., 2013), dependencies are defined using the `depends_on` argument. `depends_on` expects a dictionary with parameters as keys and experimental factors as values (e.g., `depends_on=dict(v='speed')` for factor `'speed'` with conditions `'fast'` and `'slow'` in the data). The toolbox internally handles within- and between-subject designs and assigns parameters accordingly. If multiple conditions are given for a factor, one parameter will be designated for each condition. Finally, the `make_model` method allows parameters to be fixed to a specific value using the `*_val` arguments (e.g., `gamma_val=1` for a model without gaze bias). If parameters should be fixed for individual subjects, a list of individual values needs to be passed.

#### Inference

Once the PyMC3 model is built, parameters can be estimated using the `fit` method:

```python
model.fit(method='MCMC')
```

The fit method defaults to Markov-Chain-Monte-Carlo (MCMC; Gamerman & Lopes, 2006) sampling, but also allows for Variational Inference (see below).

#### Markov-Chain-Monte-Carlo

MCMC methods approximate the Bayesian posterior parameter distributions, describing the probability of a parameter taking certain values given the data and prior probabilities, through repeated sampling. GLAMbox can utilize all available MCMC step methods provided by PyMC3. The resulting MCMC traces can be accessed using the `trace` attribute of the model instance (note that a list of traces is stored for models of kind `'individual'`). They should always be checked for convergence, to ascertain that the posterior distribution is approximated well. Both qualitative visual and more quantitative numerical checks of convergence, such as the Gelman-Rubin statistic and the number of effective samples, are recommended (see Gelman & Shirley, 2011; Kruschke, 2014, for detailed recommendations). PyMC3 contains a range of diagnostic tools to perform such checks (such as the `summary` function).
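
To illustrate what the Gelman-Rubin statistic measures, the sketch below computes the classic (non-split) potential scale reduction factor with NumPy. It is a didactic stand-in with a hypothetical function name and simulated chains, not the diagnostic implemented in PyMC3, whose exact computation may differ.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()            # mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)   # variance between chain means
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(0)
# Four chains exploring the same distribution.
mixed = rng.normal(0.0, 1.0, size=(4, 2000))
# The same chains, but two of them are stuck in a different region.
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 (as for `rhat_mixed`) indicate that the chains agree with one another, whereas clearly larger values (as for `rhat_stuck`) signal a lack of convergence.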

#### Variational inference

Estimation can also be done using all other estimation procedures provided in the PyMC3 library. This includes variational methods like Automatic Differentiation Variational Inference (ADVI; Kucukelbir, Tran, Ranganath, Gelman, & Blei, 2017). To use variational inference, the `method` argument can be set to `'VI'`, defaulting to the default variational method in PyMC3. We found variational methods to quickly yield usable, but sometimes inaccurate parameter estimates, and therefore recommend using MCMC for final analyses.

#### Accessing parameter estimates

After parameter estimation is completed, the resulting estimates can be accessed with the `estimates` attribute of the GLAM model instance. This returns a table with one row for each set of parameter estimates for each individual and condition in the data. For each parameter, a maximum a posteriori (MAP) estimate is given, in addition to the 95% Highest-Posterior Density Interval (HPD). If the parameters were estimated hierarchically, the table also contains estimates of the group-level parameters.
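
For readers unfamiliar with the HPD, the following sketch shows one common way of computing it from posterior samples: as the narrowest interval that contains the requested share of the samples. The function name and the simulated, skewed "posterior" are hypothetical; GLAMbox and PyMC3 provide their own implementations, which may differ in detail.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    n = sorted_samples.size
    n_inside = int(np.ceil(mass * n))
    # Width of every candidate interval spanning n_inside consecutive samples.
    widths = sorted_samples[n_inside - 1:] - sorted_samples[:n - n_inside + 1]
    start = int(np.argmin(widths))
    return sorted_samples[start], sorted_samples[start + n_inside - 1]

rng = np.random.default_rng(1)
posterior = rng.gamma(shape=2.0, scale=0.5, size=5000)  # skewed stand-in for a posterior
lower, upper = hpd_interval(posterior)
```

For skewed posteriors, this interval is shifted towards the mode and is never wider than the equal-tailed interval with the same coverage.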

#### Predicting choice and response times

Choices and RTs can be predicted with the GLAM by the use of the `predict` method:

```python
model.predict(n_repeats=50)
```

For each trial of the dataset that is attached to the model instance, this method predicts a choice and RT according to Eq. 10, using the previously determined *maximum a posteriori* (MAP) parameter estimates. To obtain a stable estimate of the GLAM’s predictions, as well as the noise contained within them, it is recommended to repeat every trial multiple times during the prediction. The number of trial repeats can be specified with the `n_repeats` argument. After the prediction is completed, the predicted data can be accessed with the `prediction` attribute of the model.
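
The effect of repeating trials can be illustrated by simulating the linear stochastic race directly. The sketch below is not the `predict` method itself: the function name, the parameter values and the tie-breaking rule are assumptions made for this example, and RTs are expressed in simulation time steps.

```python
import numpy as np

def simulate_race(relative_signals, v, sigma, b=1.0, max_steps=20000, rng=None):
    """Simulate one trial; returns (chosen item, RT in time steps) or (None, nan)."""
    rng = np.random.default_rng() if rng is None else rng
    drifts = v * np.asarray(relative_signals, dtype=float)
    # Each step adds the drift plus zero-centred Gaussian noise to every accumulator.
    steps = rng.normal(loc=drifts, scale=sigma, size=(max_steps, drifts.size))
    evidence = np.cumsum(steps, axis=0)
    crossed = evidence >= b
    if not crossed.any():
        return None, float('nan')
    first_step = int(np.argmax(crossed.any(axis=1)))   # earliest step with any crossing
    # If several accumulators cross in the same step, take the one with most evidence.
    choice = int(np.argmax(evidence[first_step]))
    return choice, first_step + 1

rng = np.random.default_rng(2)
# Repeat the same trial 50 times to obtain a distribution of predictions.
predictions = [simulate_race([0.9, 0.1, 0.1], v=0.002, sigma=0.02, rng=rng)
               for _ in range(50)]
choices = [choice for choice, _ in predictions]
rts = [rt for _, rt in predictions]
```

Across repeats, the item with the largest relative signal is chosen most often, but not always, which is why a single simulated repeat per trial gives a noisy picture of the model's predictions.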

## Results

### Example 1: Individual Level Data & Model Comparison

Our first example is based on the study by Thomas et al. (2019). Here, the authors study the association between gaze allocation and choice behaviour on the level of the individual. In particular, they explore whether (1) gaze biases are present on the individual level and (2) the strength of this association varies between individuals. In this example, we replicate this type of individual model-based analysis, including parameter estimation, comparison between multiple model variants, and out-of-sample prediction of choice and RT data.

#### Simulating data

First, we simulate a dataset containing 30 subjects, each performing 300 simple value-based choice trials. We assume that in each trial participants are asked to choose the item that they like most out of a set of three presented alternatives (e.g., snack food items; similar to the task described in Krajbich and Rangel (2011)). While participants perform the task, their eye movements, choices and RTs are measured. After completing the choice trials, participants are further asked to indicate their liking rating for each of the items used in the choice task on a rating scale between 1 and 10 (with 10 indicating strong liking and 1 indicating little liking). The resulting dataset contains a liking value for each item in a trial, the participant’s choice and RT, as well as the participant’s gaze towards each item in a trial (describing the fraction of trial time that the participant spent looking at each item in the choice set).

To simulate individuals’ response behaviour, we utilize the parameter estimates that were obtained by Thomas et al. (2019) for the individuals in the three-item choice dataset by Krajbich and Rangel (2011) (see Fig. A2). Importantly, we assume that ten individuals do not exhibit a gaze bias, meaning that their choices are independent of the time that they spend looking at each item. To this end, we set the *γ* value of ten randomly selected individuals to 1. We further assume that individuals’ gaze is distributed randomly with respect to the values of the items in a choice set. An overview of the generating parameter estimates is given in Fig. A1.

We first instantiate a GLAM model instance using `gb.GLAM()` and then use its `simulate_group` method. This method requires us to specify whether the individuals of the group are either simulated individually (and thereby independent of one another) or as part of a group with hierarchical parameter structure (where the individual model parameters are drawn from a group distribution, see below). For the former, the generating model parameters (indicated in the following as `gen_parameters`) are provided as a dictionary, containing a list of the individual participant values for each model parameter:
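
```python
import glambox as gb
import numpy as np

glam = gb.GLAM()

# gen_parameters: dictionary of generating parameters, holding one array of
# individual values per model parameter (assumed to be defined beforehand)
no_bias_subjects = np.random.choice(a=gen_parameters['gamma'].size,
                                    size=10,
                                    replace=False)
gen_parameters['gamma'][no_bias_subjects] = 1  # no gaze bias for these subjects

glam.simulate_group(kind='individual',
                    n_individuals=30,
                    n_trials=300,
                    n_items=3,
                    parameters=gen_parameters)
```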

As this example is focused on the individual level, we can further create a summary table, describing individuals’ response behaviour on three behavioural metrics, using the `aggregate_subject_level_data` function from the `analysis` module. The resulting table contains individuals’ mean RT, their probability of choosing the item with the highest item value from a choice set and a behavioural measure of the strength of the association between individuals’ gaze allocation and choice behaviour (indicating the mean increase in choice probability for an item that was fixated on longer than the others, after correcting for the influence of the item value on choice behaviour; for further details, see Thomas et al. 2019).

#### Exploring the behavioural data

In a first step of our analysis, we explore differences in individuals’ response behaviour. To this end, we plot the distributions of individuals’ scores on the three behavioural metrics, and their associations, using the `plot_behaviour_associations` function implemented in the plots module:

The resulting plot is displayed in Fig. 3 and shows that individuals’ probability of choosing the best item, as well as the strength of their behavioural association of gaze and choice, are not associated with their mean RT (Fig. 3D-E). However, individuals’ probability of choosing the best item increases with decreasing strength of the behavioural association of gaze and choice (Fig. 3F).

#### Likelihood-based model comparison

In a second step of our analysis, we want to test whether the response behaviour of each individual is better described by a decision model with or without gaze bias. To this end, we set up the two GLAM variants:

```python
glam_bias = gb.GLAM(data=data)
glam_bias.make_model(kind='individual')

glam_nobias = gb.GLAM(data=data)
glam_nobias.make_model(kind='individual', gamma_val=1)
```

For the GLAM variant without gaze bias mechanism, we use the `gamma_val` argument and set it to a value of 1 (fixing *γ* to 1 for all subjects).

Subsequently, we fit both models to the data of each individual and compare their fit by means of the Widely Applicable Information Criterion (WAIC; Vehtari, Gelman, & Gabry, 2017):

```python
glam_bias.fit(method='MCMC', tune=5000, draws=5000)
glam_nobias.fit(method='MCMC', tune=5000, draws=5000)
```

The `fit` method defaults to Metropolis-Hastings MCMC sampling (for methodological details, see Methods Section). The `draws` argument sets the number of samples to be drawn. This excludes the tuning (or burn-in) samples, which can be set with the `tune` argument. In addition, the `fit` method accepts the same keyword arguments as the PyMC3 `sample` function, which it wraps (see the PyMC3 documentation for additional details). The `chains` argument sets the number of MCMC traces (it defaults to four and should be set to at least two, in order to allow convergence diagnostics).

After convergence has been established for all parameter traces (for details on the suggested convergence criteria, see Methods), we perform a model comparison on the individual level, using the `compare` function of the PyMC3 library. For each individual, this function requires as input a dictionary, containing the individual’s model and trace. The individually fitted models, as well as their traces, can be accessed through the `model` and `trace` attributes of our GLAM instances. Both are lists, with one entry per subject in the dataset. The `ic` argument specifies the information criterion to be used for the model comparison (`'WAIC'` or Leave-One-Out Cross Validation `'LOO'`). The `compare` function returns a table containing an estimate of the specified information criterion for each inputted model variant.
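
To clarify what the WAIC quantifies, the sketch below computes it from a matrix of pointwise log-likelihoods, following the definition in Vehtari et al. (2017). It is a didactic illustration with a hypothetical function name and simulated log-likelihood values, reported on the deviance scale (lower is better, an assumption of this sketch); for actual analyses, the `compare` function of PyMC3 should be used.

```python
import numpy as np

def waic(log_likelihood):
    """WAIC (deviance scale) from pointwise log-likelihoods, shape (n_samples, n_observations)."""
    ll = np.asarray(log_likelihood, dtype=float)
    # Log pointwise predictive density: log of the posterior-mean likelihood
    # per observation, computed in a numerically stable way.
    ll_max = ll.max(axis=0)
    lppd = ll_max + np.log(np.exp(ll - ll_max).mean(axis=0))
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = ll.var(axis=0, ddof=1)
    return float(-2.0 * (lppd.sum() - p_waic.sum()))

rng = np.random.default_rng(4)
# Simulated log-likelihoods of 40 trials under 500 posterior samples for two
# hypothetical model variants; the first describes the data better.
ll_bias = rng.normal(-1.0, 0.1, size=(500, 40))
ll_nobias = rng.normal(-1.5, 0.1, size=(500, 40))

waic_bias = waic(ll_bias)
waic_nobias = waic(ll_nobias)
```

In this toy comparison, the variant with the higher pointwise likelihoods obtains the lower WAIC and would be preferred.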

With this comparison, we are able to identify those participants whose response behaviour matches the assumption of gaze-biased evidence accumulation. In particular, we find that we accurately recover whether an individual has a gaze bias or not for 29 out of 30 individuals.

Looking at the individual parameter estimates (defined as *maximum a posteriori* (MAP) of the posterior distributions), we find that the individually fitted *γ* values (Fig. 4A) cover a wide range between −0.8 and 1, indicating strong variability in the strength of individuals’ gaze bias. We also find that *γ* estimates have a strong negative correlation with individuals’ scores on the behavioural gaze bias measure (Fig. 4B).

#### Out-of-sample prediction

We have identified those participants whose response behaviour is better described by a GLAM variant with gaze-bias than one without. Yet, this analysis does not indicate whether the GLAM is a good model of individuals’ response behaviour on an absolute level. To test this, we perform an out-of-sample prediction exercise.

We divide the data of each subject into even- and odd-numbered experiment trials and use the data of the even-numbered trials to fit both GLAM variants:

```python
glam_bias.exchange_data(data_even)
glam_bias.fit(method='MCMC', tune=5000, draws=5000)

glam_nobias.exchange_data(data_even)
glam_nobias.fit(method='MCMC', tune=5000, draws=5000)
```

Subsequently, we evaluate the performance of both models in predicting individuals’ response behaviour using the MAP estimates and item value and gaze data from the odd-numbered trials. To predict response behaviour for the odd-numbered trials, we use the `predict` method. We repeat every trial 50 times in the prediction (as specified through the `n_repeats` argument) to obtain a stable pattern of predictions:

Lastly, to determine the absolute fit of both model variants to the data, we plot the individually predicted against the individually observed data on all three behavioural metrics. To do this, we use the `plot_individual_fit` function of the plots module. This function takes as input the observed data, as well as a list of the predictions of all model variants that ought to be compared. The argument `prediction_labels` specifies the naming used for each model in the resulting figure. For each model variant, the function creates a row of panels, plotting the observed against the predicted data:

The resulting plot is displayed in Fig. 5. We find that both model variants perform well in capturing individuals’ RTs and probability of choosing the best item (Fig. 5A, D, B, E). Importantly, only the GLAM variant with gaze bias is able to also recover the strength of the association between individuals’ choice behaviour and gaze allocation (Fig. 5C).
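
For reference, the even/odd split used in this out-of-sample exercise can be sketched with pandas as follows. The toy data frame is made up for illustration, and the name `data_odd` is an assumption of this sketch (only `data_even` appears in the calls above).

```python
import pandas as pd

# Toy dataset: two subjects with four trials each.
data = pd.DataFrame({
    'subject': [0, 0, 0, 0, 1, 1, 1, 1],
    'trial':   [0, 1, 2, 3, 0, 1, 2, 3],
    'rt':      [1.2, 2.3, 1.8, 2.0, 1.1, 1.9, 2.4, 1.5],
})

# Even-numbered trials are used for fitting, odd-numbered trials for prediction.
data_even = data[data['trial'] % 2 == 0].copy()
data_odd = data[data['trial'] % 2 == 1].copy()
```

Splitting on the trial index keeps both halves balanced within every subject, so that each individual contributes data to fitting and to prediction.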

#### Conclusion

GLAMbox provides an easy-to-use tool to test the presence (and variability) of gaze biases on the individual level. With GLAMbox, we can easily fit the GLAM to individual participant data, compare different model variants and predict individuals’ response behaviour. It also provides a set of analysis functions to explore behavioural differences between individuals and to compare the fit of different model variants to observed response behaviour.

### Example 2: Hierarchical Parameter Estimation in Cases with Limited Data

In some research settings, the total amount of data one can collect per individual is limited, conflicting with the large amounts of data required to obtain reliable and precise individual parameter estimates from diffusion models (Lerche, Voss, & Nagler, 2017; Voss, Nagler, & Lerche, 2013). Hierarchical modeling can offer a solution to this problem. Here, each individual’s parameter estimates are assumed to be drawn from a group level distribution. Thereby, during parameter estimation, individual parameter estimates are informed by the data of the entire group. This can greatly improve parameter estimation, especially in the face of limited amounts of data (Ratcliff & Childers, 2015; Wiecki et al., 2013). In this example, we will simulate a clinical application setting, in which different patient groups are to be compared on the strengths of their gaze biases, during a simple value-based choice task that includes eye tracking. It is reasonable to assume that the amount of data that can be collected in such a setting is limited on at least two accounts:

1. The number of patients available for the experiment might be low.
2. The number of trials that can be performed by each participant might be low, for clinical reasons (e.g., patients feel exhausted more quickly, time to perform tests is limited, etc.).

Therefore, we simulate a dataset with a low number of individuals within each group (between 5 and 15 per group), and a low number of trials per participant (50 trials). We then estimate model parameters in a hierarchical fashion, and compare the group level gaze bias parameter estimates between groups.

#### Simulating data

We simulate data of three patient groups (*N*_{1} = 5, *N*_{2} = 10, *N*_{3} = 15), with 50 trials per individual, in a simple three-item value-based choice task, where participants are instructed to simply choose the item they like the best. These numbers are roughly based on a recent clinical study on the role of the prefrontal cortex in fixation-dependent value representations (Vaidya & Fellows, 2015). As before, we sample parameter sets for each individual from the estimates obtained from fitting the model to the data from Krajbich and Rangel (2011). This ensures that the simulated data used here closely resemble data that could be obtained experimentally. Importantly, we let the gaze bias parameter *γ* differ systematically between the groups, with means of *γ*_{1} = 0.7 (weak gaze bias), *γ*_{2} = 0.1 (moderate gaze bias) and *γ*_{3} = −0.5 (strong gaze bias), respectively. All other parameters are sampled from the same distribution across groups (for an overview of the generating parameters, see Fig. A4). The groups thus primarily differ in the gaze bias parameter *γ*, whereas other parameters largely overlap (even though there is some non-systematic variance between individuals).

Behavioural differences between the three groups are plotted in Fig. 6, using the `plot_behaviour_aggregate` function from the `plots` module. Group-level summary tables can be created using the `aggregate_group_level_data` function from the `analysis` module. Even though the groups only differ in the gaze bias parameter, they also exhibit differences in RT (Group 1 mean ± s.d. = 1.96 ± 0.33 s, Group 2 mean ± s.d. = 2.38 ± 1.4 s, Group 3 mean ± s.d. = 2.59 ± 1.26 s; Fig. 6A) and choice accuracy (Group 1 mean ± s.d. = 0.88 ± 0.06, Group 2 mean ± s.d. = 0.71 ± 0.07, Group 3 mean ± s.d. = 0.50 ± 0.16; Fig. 6B). As is to be expected, we can also observe behavioural differences in the gaze influence measure (Group 1 mean ± s.d. = 0.08 ± 0.07, Group 2 mean ± s.d. = 0.26 ± 0.11, Group 3 mean ± s.d. = 0.38 ± 0.11; Fig. 6C-D), where the choices of Group 3 are driven by gaze more than those of the other groups.

#### Building the hierarchical model

When specifying the hierarchical model, we allow all model parameters to differ between the three groups. This way, we will subsequently be able to address the question of whether individuals from different groups differ on one or more model parameters (including the gaze bias parameter *γ*, which we are mainly interested in here). As for the individual models, we first initialize the model object using the `GLAM` class and supply it with the behavioural data using the `data` argument. Here, we set the model kind to `'hierarchical'` (in contrast to `'individual'`). Further, we specify that each model parameter can vary between groups (referring to a `'group'` variable in the data):
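A minimal sketch of this step is given below. It assumes that `make_model` accepts a `depends_on` dictionary that maps parameter names to a column of the data; this argument name, the parameter labels `v`, `gamma`, `s` and `tau` (taken from the plotting code used later in this example) and the helper `build_hierarchical_glam` are assumptions for illustration and should be checked against the toolbox documentation. The model class is passed in as an argument so that the sketch itself does not rely on a particular import.

```python
def build_hierarchical_glam(data, glam_class,
                            parameters=('v', 'gamma', 's', 'tau'),
                            group_column='group'):
    """Build a hierarchical model in which every parameter varies by group.

    `glam_class` is expected to be the toolbox's GLAM class.
    The `depends_on` keyword is an assumption about its interface.
    """
    # Let each listed parameter depend on the grouping variable in the data.
    depends_on = {parameter: group_column for parameter in parameters}

    hglam = glam_class(data=data)
    hglam.make_model(kind='hierarchical', depends_on=depends_on)
    return hglam
```

With the toolbox imported as `gb`, the model object used in the following steps could then be created with `hglam = build_hierarchical_glam(data, gb.GLAM)`.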

#### Parameter estimation with MCMC

After the model is built, the next step is to perform statistical inference over its parameters. As we have done with the individual models, we can use MCMC to approximate the parameters’ posterior distributions (see Methods for details):

```python
hglam.fit(method='MCMC', draws=10000, tune=10000)
```

#### Evaluating parameter estimates, interpreting results

After sampling has finished and the chains have been checked for convergence, we can return to the research question: Do the groups differ with respect to their gaze biases? Questions about differences between group-level parameters can be addressed by inspecting their posterior distributions. For example, the probability that the mean *γ*_{1,µ} of Group 1 is larger than the mean *γ*_{2,µ} of Group 2 is given by the proportion of posterior samples in which this is the case.
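The following sketch shows how such a comparison can be computed from posterior samples. It uses only the Python standard library and randomly generated stand-ins for the MCMC samples of the two group means, so the numbers are hypothetical. It also computes a 95% highest posterior density (HPD) interval of the difference, here defined as the narrowest interval spanning 95% of the sorted samples; the function names are illustrative and not part of the toolbox.

```python
import math
import random


def prob_greater(samples_a, samples_b):
    """Proportion of paired posterior samples in which a > b."""
    if len(samples_a) != len(samples_b):
        raise ValueError('sample lists must have equal length')
    return sum(a > b for a, b in zip(samples_a, samples_b)) / len(samples_a)


def hpd_interval(samples, mass=0.95):
    """Narrowest interval spanning `mass` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    span = int(math.floor(mass * n))  # index distance covered by the interval
    if span < 1 or span >= n:
        raise ValueError('too few samples for the requested mass')
    # Slide a window of fixed index width over the sorted samples
    # and keep the one with the smallest value range.
    best = min(range(n - span), key=lambda i: ordered[i + span] - ordered[i])
    return ordered[best], ordered[best + span]


# Hypothetical stand-ins for posterior samples of two group-level means.
rng = random.Random(1)
gamma_mu_group1 = [rng.gauss(0.7, 0.1) for _ in range(4000)]
gamma_mu_group2 = [rng.gauss(0.1, 0.1) for _ in range(4000)]

# Posterior probability that the Group 1 mean exceeds the Group 2 mean.
p_group1_larger = prob_greater(gamma_mu_group1, gamma_mu_group2)

# Posterior of the difference and its 95% HPD interval.
difference = [a - b for a, b in zip(gamma_mu_group1, gamma_mu_group2)]
lower, upper = hpd_interval(difference)
hpd_excludes_zero = not (lower <= 0.0 <= upper)
```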

GLAMbox includes a `plot_node_hierarchical` function that plots posterior distributions of group-level parameters. Additionally, the user can specify a list of comparisons between groups or conditions. If comparisons are specified, the posterior distributions of their difference and the corresponding relevant statistics are added to the figure:

```python
from glambox.plots import plot_node_hierarchical

parameters = ['v', 'gamma', 's', 'tau']
comparisons = [('group1', 'group2'),
               ('group1', 'group3'),
               ('group2', 'group3')]
plot_node_hierarchical(model=hglam,
                       parameters=parameters,
                       comparisons=comparisons)
```

With the resulting plot (Fig. 7), the researcher can infer that the groups did not differ with respect to their mean velocity parameters *v*_{i,µ} (top row, pairwise comparisons), mean accumulation noise *σ*_{i,µ} (third row), or scaling parameters *τ*_{i,µ}. The groups differ, however, in the strength of their mean gaze bias *γ*_{i,µ} (second row): All differences between the groups were statistically meaningful (as inferred by the fact that the corresponding 95% HPD did not contain zero; second row, columns 2-4).

#### Conclusion

When faced with limited data, GLAMbox allows users to easily build and estimate hierarchical GLAM variants, including conditional dependencies of model parameters. The Bayesian inference framework allows the researcher to answer relevant questions in a straightforward fashion. To this end, GLAMbox provides basic functions for computation and visualization.

### Example 3: Parameter Recovery

When performing model-based analyses of behaviour that include the interpretation of parameter estimates, or comparisons of parameter estimates between groups or conditions, the researcher should be confident that the model's parameters are actually identifiable. In particular, the researcher needs to be confident that the set of estimated parameters unambiguously describes the observed data better than any other set of parameters. A straightforward way of testing this is to perform a parameter recovery. The general idea of a parameter recovery analysis is to first generate a synthetic dataset from a model using a set of known parameters, and then to fit the model to the synthetic data. Finally, the estimated parameters can be compared to the known generating parameters. If they match to a satisfying degree, the parameters were recovered successfully. Previous analyses have already indicated that the GLAM's parameters can be recovered to a satisfying degree (Thomas et al., 2019). Yet, the ability to identify a given set of parameters always depends on the specific features of a given dataset. The most obvious feature of a dataset that influences the recoverability of model parameters is the number of data points included. Usually, this quantity refers to the number of trials that participants performed. For hierarchical models, the precision of group-level estimates also depends on the number of individuals per group. Additional features that vary between datasets and that could influence parameter estimation are the observed distribution of gaze, the distribution of item values and the number of items in each trial. For this reason, it is recommended to test whether the estimated parameters of a model can be recovered in the context of a specific dataset.

To demonstrate the procedure of a basic parameter recovery analysis using GLAMbox, suppose we have collected and loaded a data set called `data`. In the first step, we perform parameter estimation as in the previous examples:
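For reference, this first step could look as follows. The sketch mirrors the calls used for the individual models (`make_model` with `kind='individual'`, followed by `fit` with MCMC sampling), and the default numbers of draws and tuning steps are those of the re-estimation step later in this example. The wrapper function `fit_individual_glam` and its `glam_class` argument are hypothetical conveniences, not part of the toolbox.

```python
def fit_individual_glam(data, glam_class, draws=5000, tune=5000):
    """Build and fit individual-level models to `data`.

    `glam_class` is expected to be the toolbox's GLAM class.
    """
    glam = glam_class(data=data)
    glam.make_model(kind='individual')
    glam.fit(method='MCMC', draws=draws, tune=tune)
    return glam
```

With the toolbox imported as `gb`, the fitted model would be obtained with `glam = fit_individual_glam(data, gb.GLAM)`.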

The next step is to create a synthetic, model-generated dataset with the `predict` method, using the model parameters estimated from the empirical data together with the empirically observed stimulus and gaze data. Setting `n_repeats` to 1 results in a dataset of the same size as the observed one:
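A hedged sketch of this step is shown below. The `predict` method and its `n_repeats` argument are named in the text; that the simulated data are afterwards available as a `prediction` attribute of the model object is an assumption that should be verified against the toolbox documentation. The helper name `simulate_synthetic_data` is hypothetical.

```python
def simulate_synthetic_data(glam, n_repeats=1):
    """Simulate choices and RTs from a fitted model and return them.

    `glam` is a fitted model object. With n_repeats=1, every observed
    trial is simulated once, so the output has the size of the input.
    """
    glam.predict(n_repeats=n_repeats)
    # Assumption: the toolbox stores the simulated trials here.
    return glam.prediction
```

The returned table can then be used as the `synthetic` dataset in the re-estimation step, for example `synthetic = simulate_synthetic_data(glam)`.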

The synthetic dataset should resemble the empirically observed data closely. If there are major discrepancies between the synthetic and observed data, this indicates that GLAM might not be a good candidate model for the data at hand. Next, we create a new model instance, attach the synthetic data, build a model and re-estimate its parameters:

```python
import glambox as gb

# Fit a fresh set of individual models to the synthetic data.
glam_rec = gb.GLAM(data=synthetic)
glam_rec.make_model(kind='individual')
glam_rec.fit(method='MCMC', draws=5000, tune=5000)
```

Finally, the recovered and generating parameters can be compared. If the recovered parameters do not match the generating parameters, the parameters cannot be identified given this specific dataset. In this case, parameter estimates should not be interpreted.

If, on the other hand, generating and recovered parameters do align, the parameters have been recovered successfully. This indicates that the model's parameters can be identified unambiguously given the general characteristics of the dataset and thereby increases confidence that the parameters obtained from the empirical data are valid and can be interpreted.
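One simple way to quantify this comparison for a single parameter is sketched below. For each individual, it checks whether the generating value falls inside the HPD interval of the recovered estimate, and it reports the mean signed error of the point estimates as an indicator of systematic bias. The function name `recovery_report`, the input format and the example values are hypothetical and do not come from the toolbox or from the analysis reported here.

```python
def recovery_report(generating, recovered, hpd_bounds):
    """Summarize how well one parameter was recovered across individuals.

    generating : generating parameter value per individual
    recovered  : recovered point estimate (e.g., MAP) per individual
    hpd_bounds : (lower, upper) HPD bounds of the recovered estimate
    """
    if not (len(generating) == len(recovered) == len(hpd_bounds)):
        raise ValueError('all inputs must have one entry per individual')

    # Signed errors reveal systematic over- or underestimation.
    errors = [rec - gen for gen, rec in zip(generating, recovered)]
    # Coverage: share of individuals whose generating value lies in the HPD.
    covered = [low <= gen <= high
               for gen, (low, high) in zip(generating, hpd_bounds)]

    return {'mean_error': sum(errors) / len(errors),
            'coverage': sum(covered) / len(covered)}


# Hypothetical values for three individuals.
report = recovery_report(generating=[0.2, 0.5, -0.1],
                         recovered=[0.25, 0.45, -0.1],
                         hpd_bounds=[(0.1, 0.4), (0.3, 0.6), (0.0, 0.2)])
```

A mean error close to zero together with a high coverage suggests that the parameter can be recovered without systematic bias for a dataset with these characteristics.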

Here, all parameters could be recovered as illustrated in Fig. 8. For most individuals, the MAP estimates and their 95% HPDs are close to the known generating parameters. Across individuals, no systematic biases in the estimation can be identified.

#### Conclusion

In this example, we demonstrated how to perform a basic parameter recovery for a given dataset. When successful, this increases confidence that the parameters can be identified with the given dataset.

## Discussion

Researchers have recently started to systematically investigate the role of visual gaze in the decision making process. By now, it is established that eye movements do not merely serve to sample information that is then processed independently to produce a choice, but that they are actively involved in the construction of preferences (Ashby et al., 2016; Cavanagh et al., 2014; Folke et al., 2017; Konovalov & Krajbich, 2016; Krajbich et al., 2010; Krajbich & Rangel, 2011; Orquin & Mueller Loose, 2013; Shimojo et al., 2003; Tavares et al., 2017; Thomas et al., 2019). The dominant theoretical perspective is that evidence accumulation in favor of each option is modulated by gaze allocation, so that accumulation for non-fixated options is attenuated. This mechanism is formally specified in various models of gaze-dependent decision making, such as the attentional Drift Diffusion Model (aDDM; Krajbich et al., 2010; Krajbich & Rangel, 2011) and the conceptually related Gaze-weighted Linear Accumulator Model (GLAM; Thomas et al., 2019). In contrast to analyses based on behavioural and eye tracking data alone, these models can act as analytical tools that enable researchers to address questions regarding specific mechanisms in the decision process, like the gaze bias. They further formally establish mechanistic associations between choice, RT and eye tracking data and enable prediction of these data. Even though the advantages of applying these models are apparent, their use is limited by their complexity and the high cost of implementing, validating and optimizing them. Further, there are only a few off-the-shelf solutions that researchers can turn to if they want to perform model-based analyses of gaze-dependent choice data, particularly for choice settings involving more than two alternatives. With GLAMbox, we present a Python-based toolbox, built on top of PyMC3, that allows researchers to easily perform model-based analyses of gaze-bias effects in decision making.
We have provided step-by-step instructions and code to perform essential modeling analyses using the GLAM. These entail application of the GLAM to individual and group-level data, specification of parameter dependencies for both within- and between-subject designs, (hierarchical) Bayesian parameter estimation, comparisons between multiple model variants, out-of-sample prediction of choice and RT data, data visualization, Bayesian comparison of posterior parameter estimates between conditions, and parameter recovery. We hope that GLAMbox will make studying the association between gaze allocation and choice behaviour more accessible. We also hope that the resulting findings will ultimately help us better understand this association, its inter-individual variability and its link to brain activity.

## Author contributions

F.M. and A.W.T. contributed equally to the manuscript and share first authorship. F.M. and A.W.T. conceived of the GLAM and implemented all analyses and visualizations. F.M. and A.W.T. also co-wrote all software. F.M. and A.W.T. wrote the original draft of the manuscript, and H.R.H. and P.N.C.M. reviewed and edited the manuscript. Funding for this work was acquired by P.N.C.M. The work was supervised by H.R.H. and P.N.C.M.

## Acknowledgments

The Junior Professorship of P.N.C.M. as well as the associated Dahlem International Network Junior Research Group Neuroeconomics is supported by Freie Universität Berlin within the Excellence Initiative of the German Research Foundation (DFG). Further support for P.N.C.M. is provided by the WZB Berlin Social Science Center. F.M. is supported by the International Max Planck Research School on the Life Course (LIFE). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

## Appendix

## Footnotes

Felix Molter and Peter N. C. Mohr, Freie Universität Berlin, School of Business & Economics, Garystr. 21, 14195 Berlin, Germany; WZB Berlin Social Science Center, Reichpietschufer 50, 10785 Berlin, Germany. Armin W. Thomas, Technische Universität Berlin, Marchstr. 23, 10587 Berlin, Germany; Max Planck School of Cognition, Stephanstr. 1A, 04103 Leipzig, Germany. Hauke R. Heekeren, Freie Universität Berlin, Department of Education and Psychology, Habelschwerdter Allee 45, 14195 Berlin, Germany.