Temporal integration is a robust feature of perceptual decisions

Making informed decisions in noisy environments requires integrating sensory information over time. However, recent work has suggested that it may be difficult to determine whether an animal’s decision-making strategy relies on evidence integration or not. In particular, strategies based on extrema-detection or random snapshots of the evidence stream may be difficult or even impossible to distinguish from classic evidence integration. Moreover, such non-integration strategies might be surprisingly common in experiments that aimed to study decisions based on integration. To determine whether temporal integration is central to perceptual decision making, we developed a new model-based approach for comparing temporal integration against alternative “non-integration” strategies for tasks in which the sensory signal is composed of discrete stimulus samples. We applied these methods to behavioral data from monkeys, rats, and humans performing a variety of sensory decision-making tasks. In all species and tasks, we found converging evidence in favor of temporal integration. First, in all observers across studies, the integration model better accounted for standard behavioral statistics such as psychometric curves and psychophysical kernels. Second, we found that sensory samples with large evidence do not contribute disproportionately to subjects' choices, contrary to the prediction of an extrema-detection strategy. Finally, we provide a direct confirmation of temporal integration by showing that the sum of both early and late evidence contributed to observer decisions. Overall, our results provide experimental evidence suggesting that temporal integration is a ubiquitous feature of mammalian perceptual decision-making. Our study also highlights the benefits of using experimental paradigms where the temporal stream of sensory evidence is controlled explicitly by the experimenter, and known precisely by the analyst, to characterize the temporal properties of the decision process.


INTRODUCTION
Perceptual decision-making is thought to rely on the temporal integration of noisy sensory information on a timescale of hundreds of milliseconds to seconds. Temporal integration corresponds to summing over time the evidence provided by each new sensory stimulus, and it optimizes perceptual judgments in the face of noise (Bogacz et al. 2006; Gold and Shadlen 2007). A perceptual decision can then be made on the basis of this accumulated evidence, either when the accumulated evidence reaches some threshold, or when an internal or external cue signals the need to initiate a response.

Although many behavioral and neural results are consistent with this integration framework, temporal integration is a feature that has often been taken for granted rather than explicitly tested. Recently, the claim that standard perceptual decision-making tasks rely on (or even frequently elicit) temporal integration has been challenged by theoretical results showing that non-integration strategies can produce behavior that carries superficial signatures of temporal integration (Stine et al. 2020). These signatures include the relationship between stimulus difficulty, stimulus duration and behavioral accuracy, the precise temporal weighting of sensory information on the decisions, and the patterns of reaction times.

Here, we propose new analytical tools for directly assessing integration and non-integration strategies from fixed-duration or variable-duration paradigms where, critically, the experimenter controls the fluctuations in perceptual evidence over time within each trial (discrete-sample stimulus, or DSS). By leveraging these controlled fluctuations, our methods allow us to make direct comparisons between integration and non-integration strategies. We apply these tools to assess temporal integration in data from monkeys, humans and rats that performed a variety of perceptual decision-making tasks with DSS.
Applying these analyses to these behavioral datasets yields strong evidence that perceptual decision-making tasks in all three species rely on temporal integration. Temporal integration, a critical element of many major theories of perception at both the neural and behavioral levels, is indeed a robust and pervasive aspect of mammalian behavior. Our results also illuminate the power of targeted stimulus design and statistical analysis to test specific features of behavior.

primates and rodents. Here we focus on experiments in which observers report their choice at the end of a period whose duration is controlled by the experimenter.

Tasks using the DSS paradigm are classically thought to rely on sequential accumulation of the stimulus evidence (Bogacz et al. 2006), which we refer to here as temporal integration. Figure 1A shows an example stimulus sequence composed of n samples that provide differing amounts of evidence in favor of one alternative vs. another ("A" vs. "B"). The accumulated evidence fluctuates as new samples are integrated and finishes at a positive value indicating overall evidence for stimulus category A (Figure 1B). This integration process can be formalized by defining the decision variable, or accumulated evidence, and its updating dynamics across stimulus samples:

evidence. More specifically, the observer commits to a decision based on the first sample i in the stimulus sequence that exceeds one of the two symmetrical thresholds, i.e. such that $|x_i| \geq \theta$, where $x_i$ is the evidence provided by sample i and $\theta$ is the threshold. In our example stimulus, the first sample that reaches this threshold in evidence space is the fifth sample, which points towards stimulus category B, so response B is selected (Figure 1C). This policy can be viewed as a memory-less decision process with sticky bounds. If the stimulus sequence contains no extreme samples, so that neither threshold is reached, the observer selects a response at random.
(Following Stine et al. (2020), we also explored an alternative mechanism in which, in such cases, the response is based on the last sample in the sequence.)

The second non-integration model corresponds to the snapshot model (Stine et al. 2020; Pinto et al. 2018). In this model, the observer attends to only one sample i within the stimulus sequence and makes a decision based solely on the evidence $x_i$ from the attended sample: $r = A$ if $x_i > 0$, and $r = B$ if $x_i < 0$. The position in the sequence of the attended sample is randomly selected on each trial. In our example, the fourth sample is randomly selected, and since it contains evidence towards stimulus category A, response A is selected (Figure 1D). We considered variants of this model that gave it additional flexibility, including: allowing the prior probability over the attended sample to depend on its position in the sequence, using a non-parametric probability mass function estimated from the data; allowing for a deterministic vs. probabilistic decision rule based on the attended evidence; and including attentional lapses that were either fixed to 0.02 (split equally between leftward and rightward responses) or estimated from behavioral data. Finally, we considered a variant of the snapshot model where the decision was based on a sub-sequence of K consecutive samples within the main stimulus sequence ($1 \leq K < n$), rather than on a single sample.
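To make the three decision rules concrete, here is a minimal, noise-free Python sketch. It assumes that positive evidence favours A, as in the text, and uses a hypothetical seven-sample sequence chosen to mirror the Figure 1 example (the fifth sample is the first extreme one and points to B; the fourth points to A). The fitted models also include sensory noise, lapses and probabilistic variants, which are omitted here; the function names and the tie-breaking rule (guess on zero evidence) are illustrative assumptions.

```python
import numpy as np

def _guess(rng):
    """Random response, used for ties or when no sample crosses a threshold."""
    return "A" if rng.random() < 0.5 else "B"

def _sign_choice(value, rng):
    """Map signed evidence to a response: positive -> A, negative -> B, zero -> guess."""
    if value > 0:
        return "A"
    if value < 0:
        return "B"
    return _guess(rng)

def integration_choice(x, rng):
    """Temporal integration: respond according to the sign of the summed evidence."""
    return _sign_choice(np.sum(x), rng)

def extrema_detection_choice(x, theta, rng, fallback="random"):
    """Extrema detection: the first sample with |x_i| >= theta sets the response.
    If no sample crosses, guess ('random') or use the last sample ('last')."""
    for xi in x:
        if abs(xi) >= theta:
            return _sign_choice(xi, rng)
    return _sign_choice(x[-1], rng) if fallback == "last" else _guess(rng)

def snapshot_choice(x, rng, prior=None, K=1):
    """Snapshot: attend to K consecutive samples (K = 1: a single sample) whose start
    position is drawn from `prior`; respond according to the sign of their summed evidence."""
    n_windows = len(x) - K + 1
    if prior is None:
        prior = np.full(n_windows, 1.0 / n_windows)  # uniform over start positions
    start = int(rng.choice(n_windows, p=prior))
    return _sign_choice(np.sum(x[start:start + K]), rng), start

rng = np.random.default_rng(0)
x = np.array([0.5, -0.8, 1.2, 0.9, -2.5, 1.4, 0.7])    # hypothetical sequence
print(np.cumsum(x))                                     # accumulated evidence, ends positive
print(integration_choice(x, rng))                       # A
print(extrema_detection_choice(x, theta=2.0, rng=rng))  # B: fifth sample crosses first
attend_fourth = np.eye(len(x))[3]                       # prior forcing the fourth sample
print(snapshot_choice(x, rng, prior=attend_fourth))     # ('A', 3)
```

In this sketch, extrema detection ignores every sample after the first crossing, which is what makes it a memory-less policy with sticky bounds.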

Standard behavioral statistics favor integration accounts of pulse-based motion perception in primates
To compare the three decision-making models defined above (i.e., temporal integration, extrema-detection, snapshots), we first examined behavioral data from two monkeys performing a fixed-duration motion integration task (Yates et al. 2017). In this experiment, each stimulus was composed of a sequence of 7 motion samples of 150 ms each, where the motion strength towards left or right was manipulated independently for each sample. At the end of the stimulus sequence, monkeys reported with a saccade whether the overall sequence contained more motion towards the left or the right direction. The animals performed 72137 trials (monkey N) and 33416 trials (monkey P), allowing for in-depth dissection of their response patterns.

We fit the three models (and their variants) to the responses for each animal individually (see Supplementary

the accuracy and psychometric curves were accurately captured by the integration model. In line with Stine and colleagues, we also found that both non-integration models could reproduce the shape of the psychometric curve in monkey N, although the quantitative fit was always better for the integration model than for the non-integration models. By contrast, both non-integration models failed to capture the psychometric curve for monkey P (Figure 2B, bottom row). More systematically, the overall accuracy, which is an aggregate measure of the psychometric curve, clearly differed between models, as the accuracy of the non-integration models systematically deviated from animal data for both animals (Figure 2C). In other words, all models produce the same type of psychometric curves up to a scaling factor, and this scaling factor (directly linked to the model accuracy) is key to differentiating model fits.
better-than-observed accuracy for certain parameter ranges, but these are not the parameters found by the maximum likelihood procedure, probably because they produce a pattern of errors that is inconsistent with the observed one. This indicates that the models are unable to match the animals' pattern of errors (see Discussion).

Finally, we assessed quantitatively which model provided the best fit, while correcting for model complexity, using the Akaike Information Criterion (AIC, Figure 2A). In both monkeys, AIC favored the integration model over the two non-integration models by a very large margin. We also explored whether variants of the extrema-detection and snapshot models could provide a better match to the behavioral metrics considered above (Supp. Figures 2 and 3). Using AIC, we found that the integration model was preferred over all variants of both non-integration models, for both monkeys. Moreover, these model variants could not replicate the psychophysical kernels as well as the integration model did (Supp. Figures 2 and 3). In conclusion, while psychometric curves may not always discriminate between integration and non-integration strategies, other metrics, including psychophysical kernels, predicted accuracy and quality of fit (AIC), support temporal integration in monkey perceptual decisions. For one model in one monkey (the snapshot model in monkey P), even the simple metric of overall accuracy compellingly supported temporal integration (Fig. 2C). For the other monkey and/or model, where the distinction was less clear, our model-based approach allowed us to leverage these other metrics to reveal strong support for the temporal integration model (Fig. 2A-C). Although these data rely on only two experimental subjects, we show below further evidence supporting the integration model in humans and rats.
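For reference, AIC penalizes the maximized log-likelihood by the number of free parameters, $\mathrm{AIC} = 2k - 2\log\hat{L}$, with lower values preferred. The short Python sketch below ranks models by AIC; the log-likelihoods and parameter counts are made-up placeholders, not values from the fits reported here.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def rank_by_aic(fits):
    """fits: dict mapping model name -> (maximized log-likelihood, number of free parameters).
    Returns the best model and each model's AIC difference from it."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, {name: score - scores[best] for name, score in scores.items()}

# Placeholder numbers, for illustration only
fits = {
    "integration": (-100.0, 9),
    "extrema-detection": (-115.0, 4),
    "snapshot": (-120.0, 10),
}
best, delta_aic = rank_by_aic(fits)
print(best, delta_aic)
# integration {'integration': 0.0, 'extrema-detection': 20.0, 'snapshot': 42.0}
```

Because each extra parameter costs 2 AIC units, a more flexible model variant must raise the log-likelihood by more than one unit per added parameter to be preferred.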
and identify how non-integration models failed to capture them. We started by designing two analyses aimed at testing whether choices were consistent with the extrema-detection model, namely by testing whether choices were strongly correlated with the largest-evidence samples. In the first analysis, we looked at the subset of trials where the evidence provided by the largest-evidence sample in the sequence was at odds with the total evidence in the sequence: we show one example in Figure 3B, where the largest-evidence sample points towards response B, while the overall evidence points towards response A. These 'disagree trials' represent a substantial minority of the whole dataset: 1865 trials (2.6%) in monkey N and 1831 trials (5.5%) in monkey P. If integration is present, the response of the animal should in general be aligned with the total evidence from the sequence (Figure 3A, red bars). By contrast, if the animal followed the extrema-detection model (Figure 1C), its response should in general follow the largest-evidence sample (Figure 3A, green bars). In both monkeys, animal choices were more often than not aligned with the integrated evidence (Figure

Cumming 2007). The extrema-detection model predicts that, in principle, samples whose evidence is below the threshold have little impact on the decision, while samples whose evidence is above the threshold have full impact on the decision. By contrast, the integration model predicts that the subjective weight should grow linearly with sample evidence. We estimated subjective weights from monkey choices using a regression method similar in spirit to previous methods (Yang and Shadlen 2007; Waskom and Kiani 2018), taking the form

Here, f is a function that captures the subjective weight of the sample as a function of its associated evidence.
Whereas previous methods estimated subjective weights assuming a uniform psychophysical kernel, our method simultaneously estimated the subjective weights f and the psychophysical kernel, thus removing potential estimation biases due to unequal weighting of sample evidence (see Methods). In both monkeys, we indeed found that the subjective weight depends linearly on sample evidence for low to median values of sample evidence (motion pulse strength lower than 6), in agreement with the integration model (Supp. Figure 4). Surprisingly, however, data simulated from the extrema-detection model displayed the same linear pattern for low to median values of sample evidence. We realized this was due to the very high estimated sensory noise (Supp. Figure 1): according to the model, even samples with minimal sample evidence were likely to reach the extrema-detection threshold. In other words, unlike the previous analyses, inferring the subjective weights used by animals was inconclusive as to whether animals deployed the extrema-detection strategy. This somewhat surprising dependency reinforces the importance of validating intuitions by fitting and simulating models.
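The uniform-kernel approach that the text contrasts with can be sketched as a logistic regression. This is not the joint estimation of f and the psychophysical kernel described above: the sketch assumes integer-valued sample evidence (as for the rounded motion pulses), a logistic link, equal weights for all sample positions, and an antisymmetric subjective weight ($f(-k) = -f(k)$, $f(0) = 0$). Under these assumptions, $\sum_i f(x_{t,i})$ is linear in the per-level sample counts, so each $f(k)$ is a regression coefficient. The simulated observer with a linear $f(k) = 0.5k$ is hypothetical.

```python
import numpy as np

def fit_logistic(Z, y, n_iter=25):
    """Maximum-likelihood logistic regression with an intercept (Newton-Raphson)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(Z1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z1 @ beta))
        grad = Z1.T @ (y - p)                          # gradient of the log-likelihood
        hess = (Z1 * (p * (1.0 - p))[:, None]).T @ Z1  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def subjective_weights(X, y, max_level):
    """Uniform-kernel estimate of f(k) for k = 1..max_level, assuming integer evidence and
    an antisymmetric f, so that sum_i f(x_ti) = sum_k f(k) * d_tk, where
    d_tk = #(samples equal to +k) - #(samples equal to -k) on trial t."""
    levels = np.arange(1, max_level + 1)
    D = np.stack([(X == k).sum(axis=1) - (X == -k).sum(axis=1) for k in levels], axis=1)
    beta = fit_logistic(D.astype(float), y)
    return levels, beta[1:]                            # drop the intercept (bias)

# Hypothetical integrating observer with a linear subjective weight f(k) = 0.5 k
rng = np.random.default_rng(0)
X = rng.integers(-4, 5, size=(20000, 7))               # integer evidence in [-4, 4]
p_right = 1.0 / (1.0 + np.exp(-0.5 * X.sum(axis=1)))
y = (rng.random(20000) < p_right).astype(float)        # 1 = rightward choice
levels, f_hat = subjective_weights(X, y, max_level=4)
print(levels, np.round(f_hat, 2))                      # close to 0.5 * levels
```

For a simulated integrating observer the recovered weights grow linearly with evidence level; as the paragraph above notes, a noisy extrema-detection observer can produce a similar pattern, so this curve alone need not discriminate between the two strategies.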

Example of an 'agree trial', where the total stimulus evidence (accumulated over samples) and the evidence from the largest-evidence sample point towards the same response (here, response A). In this case, we expect that temporal integration (TI) and extrema-detection (ED) will produce similar responses (here, A). B. Example of a 'disagree trial', where the total stimulus evidence and the evidence from the largest-evidence sample point towards opposite responses (here A for the former, B for the latter). In this case, we expect that the integration and extrema-detection models will produce opposite responses. C. Proportion of choices out of all disagree trials aligned with total evidence, for animals (black bars), the integration model (red) and the extrema-detection model (green).
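The agree/disagree split only requires each trial's sample evidence and the observed choices. Below is a minimal NumPy sketch, assuming signed evidence with positive values favouring A and choices coded +1 (A) / -1 (B); ties in the largest absolute evidence are resolved by taking the first such sample, an arbitrary choice made here. The three example trials and the two idealized observers are hypothetical.

```python
import numpy as np

def disagree_analysis(X, choices):
    """X: (n_trials, n_samples) signed sample evidence; choices: +1 (A) or -1 (B).
    Returns the indices of disagree trials and the fraction of choices on those
    trials that align with the total evidence (NaN if there are none)."""
    total = X.sum(axis=1)
    largest = X[np.arange(len(X)), np.argmax(np.abs(X), axis=1)]  # largest-evidence sample
    disagree = np.sign(largest) * np.sign(total) < 0
    if not disagree.any():
        return np.flatnonzero(disagree), float("nan")
    aligned = float(np.mean(choices[disagree] == np.sign(total[disagree])))
    return np.flatnonzero(disagree), aligned

X = np.array([[1.0, 1.0, 1.0, -2.0],     # total +1, largest -2: disagree
              [1.0, 2.0, -0.5, 0.5],     # total +3, largest +2: agree
              [-1.0, -1.0, 3.0, -1.5]])  # total -0.5, largest +3: disagree
integrator = np.sign(X.sum(axis=1))                                 # follows total evidence
extremist = np.sign(X[np.arange(3), np.argmax(np.abs(X), axis=1)])  # follows largest sample
print(disagree_analysis(X, integrator))  # disagree trials 0 and 2, fraction aligned 1.0
print(disagree_analysis(X, extremist))   # disagree trials 0 and 2, fraction aligned 0.0
```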

Choice dependence on early and late stimulus evidence shows direct evidence for temporal integration

Following model comparisons favoring integration over both snapshot and extrema-detection models, the immediately previous analysis relied on a special subset of trials to provide an additional, and perhaps more intuitive, signature of integration, which ruled out extrema-detection as a possible strategy for either monkey. We next employed another novel analysis specifically designed to tease apart unique signatures of the integration and snapshot models. More specifically, we tested whether decisions were based on the information from only one part of the sequence, as predicted by the snapshot model, or from the full sequence, as predicted by the integration model. To facilitate the analysis, we defined early evidence $E_t$ by grouping evidence from the first three samples in the sequence, and late evidence $L_t$ as the grouped evidence from the last four samples. We then displayed the proportion of rightward responses as a function of both early and late evidence in a graphical representation that we call the integration map (Figure 4A). A pure integration strategy corresponds to summing early and late evidence equally, which can be formalized as $p(\text{rightward}) = \sigma(E_t + L_t)$, where $\sigma$ is a sigmoidal function. Because this only depends on the sum $E_t + L_t$, the probability of response is invariant to changes in the $(E_t, L_t)$ space along the diagonal direction that leaves the sum unchanged. These diagonals correspond to isolines of the integration map (Figure 4A, left; Supp. Figure 5A). In other words, straight diagonal isolines in the integration map reflect the fact that the decision only depends on the sum of evidence $E_t + L_t$. Straight isolines thus constitute a specific signature of evidence integration.
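An integration map can be computed by binning trials jointly on early and late evidence and averaging choices within each bin. The sketch below is one possible implementation, not necessarily the binning used for Figure 4: it assumes seven samples split 3/4 as in the monkey task, shared equal-width bins for $E_t$ and $L_t$ (so that diagonal isolines stay diagonal in bin space), and choices coded 1 for rightward and 0 for leftward. The bin range `lim`, the number of bins and the simulated noise-free integrating observer are illustrative choices.

```python
import numpy as np

def integration_map(X, choices, n_early=3, lim=8.0, n_bins=4):
    """Proportion of rightward choices in joint bins of early (E_t) and late (L_t) evidence.
    Rows index E_t bins, columns index L_t bins; out-of-range values go to the edge bins."""
    E = X[:, :n_early].sum(axis=1)
    L = X[:, n_early:].sum(axis=1)
    edges = np.linspace(-lim, lim, n_bins + 1)            # shared equal-width bins
    ei = np.clip(np.digitize(E, edges) - 1, 0, n_bins - 1)
    li = np.clip(np.digitize(L, edges) - 1, 0, n_bins - 1)
    pmap = np.full((n_bins, n_bins), np.nan)              # NaN marks empty bins
    for a in range(n_bins):
        for b in range(n_bins):
            in_bin = (ei == a) & (li == b)
            if in_bin.any():
                pmap[a, b] = choices[in_bin].mean()
    return pmap

# Hypothetical noise-free integrating observer on 7-sample sequences
rng = np.random.default_rng(2)
X = rng.integers(-4, 5, size=(5000, 7)).astype(float)
choices = (X.sum(axis=1) > 0).astype(float)               # 1 = rightward
print(np.round(integration_map(X, choices), 2))
```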
We contrasted this integration map with the one obtained from a non-integration strategy (Figure 4A, middle panel; Supp. Figure 5A). There, we assumed that the decision depends either on the early evidence or on the late evidence, as in the snapshot model, with equal probability. This can be formalized as $p(\text{rightward}) = 0.5\,\sigma(E_t) + 0.5\,\sigma(L_t)$. In this case, if late evidence is null ($\sigma(L_t) = 0.5$) and early evidence is very strong toward the right ($\sigma(E_t) \simeq 1$), the overall probability for a rightward response is $0.5 \times 1 + 0.5 \times 0.5 = 0.75$.

By contrast, the lapse parameters showed no consistent relationship with late evidence (Figure 4E, right panel). Finally, we directly assessed the similarities between the integration maps from monkey responses and from simulated responses for the three models (integration, snapshot, extrema-detection). The model-data correlation was larger for the integration model than for the non-integration strategies in both monkeys (Figure 4E; unpaired t-test on bootstrapped r values: p<0.001 for each animal and each comparison, against the extrema-detection and against the snapshot model). Overall, integration maps allow us to dissect how early and late parts of the stimulus sequence are combined to produce a behavioral response. In both monkeys, these maps carried signatures of temporal integration. For monkey P, the integration model and the data look very similar. For monkey N, there is still a qualitative dependency that deviates from non-integration, but which is not as uniquely matched to the integration strategy (although the imperfect coverage of the two-dimensional space impedes further investigation). Thus, complementing the statistical model tests favoring integration, this richer visualization shows that some degree of integration is occurring, albeit not perfect.
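The two formalizations of this section can be compared directly on a grid of early and late evidence. In this sketch (the grid range and resolution are arbitrary choices), the integration map is constant along each diagonal $E_t + L_t = c$, whereas the 50/50 mixture of early-only and late-only decisions is not (except on the $c = 0$ line), and it gives 0.75 when early evidence is overwhelming and late evidence is null. The `map_correlation` helper is a hypothetical name for a plain Pearson correlation over bins defined in both maps, of the kind one could use to compare data and model maps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grid = np.linspace(-3.0, 3.0, 7)                          # evidence values -3, -2, ..., 3
E, L = np.meshgrid(grid, grid, indexing="ij")             # E[i, j] = grid[i], L[i, j] = grid[j]
p_integration = sigmoid(E + L)                            # depends only on E + L
p_non_integration = 0.5 * sigmoid(E) + 0.5 * sigmoid(L)   # early-only or late-only, 50/50

def map_correlation(map_a, map_b):
    """Pearson correlation between two maps, ignoring bins that are NaN in either."""
    ok = ~np.isnan(map_a) & ~np.isnan(map_b)
    return np.corrcoef(map_a[ok], map_b[ok])[0, 1]

# Two points on the diagonal E + L = 2: (1, 1) and (3, -1)
print(p_integration[4, 4], p_integration[6, 2])           # equal
print(p_non_integration[4, 4], p_non_integration[6, 2])   # differ: curved isolines
print(0.5 * sigmoid(20.0) + 0.5 * sigmoid(0.0))           # approximately 0.75
print(map_correlation(p_integration, p_non_integration))
```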

Temporal integration in human visual orientation judgments
Overall, all our analyses converged to support the idea that monkey decisions in a fixed-duration motion discrimination task relied on temporal integration. We explored whether the same results would hold for two other species and perceptual paradigms. We first analyzed the behavioral responses of 9 human subjects performing a variable-duration orientation discrimination task (Cheadle et al. 2014). In each trial, a sequence of 5 to 10 oriented gratings was shown to the subject, who had to report whether the gratings were overall mostly aligned with the left or the right diagonal. In this task, the experimenter can control the evidence provided by each sample by adjusting the orientation of the grating. We performed the same analyses on participant responses as on monkey data. As for monkeys, we found that the integration model captured psychometric curves, participant accuracy and psychophysical kernels well (Figure 5A-C, red curves and symbols). By contrast, both non-integration models failed to capture these patterns (Figure 5A-C, blue and green curves and symbols). The accuracy of both models consistently underestimated participant performance: 8 and 6 out of 9 subjects outperformed the maximum performance of the snapshot and extrema-detection models, respectively (Supp. Figure 7). This suggests that human participants achieved such accuracy by integrating sensory evidence over successive samples. Moreover, subjects overall gave more weight to later samples (Figure 5C), which is inconsistent with the extrema-detection mechanism. A formal model comparison confirmed that in each participant, the integration model provided a far better account of subject responses than either of the non-integration models did (Figure 5D).
We then assessed how subjects combined information from weak and strong evidence samples into their decisions, using the same analyses as for monkeys. As predicted by the integration model, but not by the extrema-detection model, human choices consistently aligned with the total stimulus evidence and not simply with the strongest-evidence sample (Figure 5E). Finally, the average integration map for early and late evidence within the stimulus sequence displayed nearly linear diagonal isolines, showing that both were integrated into the response (Figure 5F). Integration maps from participants correlated better with maps predicted by the integration model than with maps predicted by either of the alternative non-integration strategies (Figure 5G; two-tailed t-test on bootstrapped r values: p<0.001 for 7 out of 9 participants in the integration vs. snapshot comparison, and for all 9 participants in the integration vs. extrema-detection comparison). Overall, these analyses show converging evidence that human decisions in an orientation discrimination task rely on temporal integration.

Integration map for early and late stimulus evidence, computed as in Figure 4A

Temporal integration in rat acoustic intensity judgments
Finally, we analyzed data from 5 rats performing a fixed-duration auditory task in which the animals had to discriminate the side with larger acoustic intensity (Pardo-Vazquez et al. 2019). The relative intensity of the left and right acoustic signals was modulated in sensory samples of 50 ms, so that the stimulus sequence provided time-varying evidence for the rewarded port. The stimulus sequence was composed of either 10 or 20 acoustic samples of 50 ms each, for a total duration of 500 or 1000 ms. We applied the same analysis pipeline as for monkey and human data. The integration model provided a much better account of rat choices than non-integration strategies, based on psychometric curves (Fig. 6A), predicted accuracy (Fig. 6B), psychophysical kernels (Fig. 6C) and model comparison using AIC (Fig. 6D). Similar to humans and monkeys, rats tended to select the side corresponding to the total stimulus evidence and not to the largest sample evidence in "disagree" trials, as predicted by the integration model (Fig. 6E). Finally, the integration map was largely consistent with an integration strategy (Fig. 6F), and correlated more strongly with simulated maps from the integration model.

In all analyses we contrasted predictions from one integration and two non-integration computational models of behavioral responses (Figure 1). For each non-integration model, we considered multiple variants to explore the maximal flexibility offered by each framework to capture animal behavior.
For our datasets, evidence in favor of integration was easy to obtain using standard model comparison techniques, as well as by comparing simulated psychometric curves and psychophysical kernels to their experimental counterparts (Figure 2

This overall increased noise level leads to a looser relationship between the stimulus condition and the behavioral responses, which can thus be accounted for by a larger spectrum of computational mechanisms. These issues have been addressed by forcing "pulses" of a certain stimulus strength and/or by performing post hoc analyses to estimate signal and noise (Kiani, Hanks, and Shadlen 2008), but these are partial solutions, whereas DSS paradigms solve these issues by design. This illustrates the benefits of using experimental designs where variability in stimulus information can be fully controlled and parametrized by the experimenter, as these paradigms discriminate more precisely between different models of perceptual decisions.

In at least one monkey, although quantitative metrics such as penalized log-likelihood and fits to psychometric curves clearly pointed to the integration model as the best account of behavior, the qualitative failure modes of the non-integration strategies (especially the snapshot model) were not immediately clear. Although we tried variants of each non-integration model, there remained a possibility that our precise implementation failed to account for monkey behavior but that other possible implementations would succeed. Note that extrema-detection and snapshot are only two of many possible non-integration strategies. A generic form for non-integration strategies corresponds to a policy that implements position-dependent thresholds on the instantaneous sensory evidence.
In this framework, the extrema-detection model corresponds to the case with a position-independent threshold, while the snapshot model corresponds to a null bound for one sample and infinite bounds for all other samples. To rule out these more complex strategies, we conducted additional analyses that specifically targeted core assumptions of the integration and non-integration strategies. First, the extrema-detection model fails to account for the data because it predicts that largest-evidence samples should have a disproportionate impact on choices. However, this does not occur, as monkeys and humans tend to respond according to the total evidence and not the single large-evidence sample (Figures 3C and 5E; see Levi et al. 2018 for a similar analysis). All non-integration strategies share the property that on each trial the decision should rely only on either the early or the late part of the trial. We thus directly examined the assumptions of the integration and non-integration models by assessing how the evidence from the early and late parts of each stimulus sequence is combined to produce a decision. We introduced integration maps (Figure 4)

We present here the most relevant features of the behavioral protocol; see Yates et al. (2017) for further experimental details. Two adult rhesus macaques (subject N, a 10-year-old female, and subject P, a 14-year-old male) performed a motion discrimination task. On each trial, a stimulus consisting of a hexagonal grid (5-7 degrees, scaled by eccentricity) of Gabor patches (0.9 cycles per degree; temporal frequency 5 Hz for monkey P, 7 Hz for monkey N) was presented. Monkeys were trained to report the net direction of motion in a field of drifting and flickering Gabor elements with an eye movement to one of two targets.
Each trial's motion stimulus consisted of seven consecutive motion pulses, each lasting 9 or 10 video frames (150 ms or 166 ms; pulse duration did not vary within a session), with no interruptions or gaps between the pulses. The strength and direction of each pulse for trial t and sample i were set by a draw from a Gaussian distribution, rounded to the nearest integer value. The difficulty of each trial was modulated by manipulating the mean and variance of the Gaussian distribution. Monkeys were rewarded based on the empirical stimulus and not on the stimulus distribution. We analyzed a total of 112 sessions for monkey N and 60 sessions for monkey P, with a total of 72137 and 33416 valid trials, respectively. These sessions comprise sessions with electrophysiological recordings reported in Yates et al. (2017) as well as purely behavioral sessions. All experimental protocols were approved by The University of Texas Institutional Animal Care and Use Committee (AUP-2012-00085, AUP-2015-00068) and were in accordance with National Institutes of Health standards for the care and use of laboratory animals.

Human experiment.

Nine adult subjects (5 males, 4 females; aged 19-30) performed an orientation discrimination task in which, on each trial, they reported whether a series of gratings was perceived to be mostly tilted clockwise or counterclockwise (Drugowitsch et al. 2016). Each discrete-sample stimulus consisted of five to ten gratings. Each grating was a high-contrast Gabor patch (colour: blue or purple; spatial frequency = 2 cycles per degree; SD of Gaussian envelope = 1 degree) presented within a circular aperture (4 degrees) against a uniform gray background. Each grating was presented for 100 ms, and the interval between gratings was fixed to 300 ms.
The angles of the gratings were sampled from a von Mises distribution centered on the reference angle ($\theta_0 = 45$ degrees for clockwise sequences, 135 degrees for anticlockwise sequences) with a concentration coefficient $\kappa = 0.3$. The normative evidence provided by sample i in trial t in favor of the clockwise category corresponds to how well the grating orientation $\theta_{t,i}$ aligns with the reference orientation, i.e. $x_{t,i} = 2\kappa\cos\left(2(\theta_{t,i} - \theta_0)\right)$, where $\theta_0 = 45$ degrees is the clockwise reference angle.

Each sequence was preceded by a rectangle flashed twice for 100 ms (the interval between the flashes, and between the second flash and the first grating, varied between 300 and 400 ms). Participants indicated their choice with a right-hand button press after the onset of a central dot that followed the rectangle mask. Failure to provide a response within 1000 ms after central dot onset was classified as an invalid trial. Auditory feedback was provided 250 ms after the participant's response (at the latest 1100 ms after the end of the stimulus sequence). It consisted of an ascending tone (400 Hz/800 Hz; 83 ms/167 ms) for correct responses, a descending tone (400 Hz/400 Hz; 83 ms/167 ms) for incorrect responses, and a low tone (400 Hz; 250 ms) for invalid trials.

there was only one grating in the sequence, and it was perfectly aligned with one of the reference angles. In the second initiation block, sequences of gratings were introduced, and the difficulty was gradually increased (the distribution concentration linearly decreased from $\kappa = 1.2$ to $\kappa = 0.3$). Invalid trials (mean 6.9 per participant, std 9.4) were excluded from all regression analyses.

Long-Evans rats (no genetic modifications; 350-650 g; 8-10 weeks old at the beginning of the experiment), pair-housed and kept under stable conditions of temperature (23 °C) and humidity (60%) with a constant light-dark cycle (12h:12h; experiments were conducted during the light phase).
Rats had free access to food, but water was restricted to behavioral sessions. On days without experimental sessions, free water was provided for a limited period.
Rats performed a fixed-duration auditory discrimination task in which they had to classify noisy stimuli based on the intensity difference between the two lateral speakers (Pardo-Vazquez et al. 2019; Hermoso-Mendizabal et al. 2020). An LED on the center port indicated that the rat could start the trial by poking into the center port. After this poke, rats had to keep their snout in the center port for 300 ms (fixation). Following this period, an acoustic DSS was played. Rats had to remain in the center port during the entire presentation of the stimulus. At stimulus offset, the center LED went off, and rats could then leave the center port and head towards one of the two lateral ports. Entering the lateral port associated with the speaker that generated the larger sound intensity led to a reward of 24 µl of water (correct responses), while entering the opposite port led to a 5 s timeout accompanied by a bright light during the entire period (incorrect responses). If rats broke fixation during the pre-stimulus fixation period or during the stimulus presentation, the sound was interrupted, the center LED remained on, and the rat had to initiate a new trial, starting with center fixation followed by […]

[…] describes a sensitivity parameter. The deterministic case can be viewed as the limit of the non-deterministic case when all sensitivity parameters diverge to $+\infty$, i.e. when sensory and decision noise are negligible.
The overall probability of a rightward choice (marginalizing over the attended sample, which is a hidden variable) can be captured by a mixture model: […]
The mixture coefficients (one for each sample i, plus i = L, R for left and right lapses) are constrained to be non-negative and to sum to 1.
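To make the mixture formulation concrete, here is a minimal Python sketch of the deterministic snapshot model with left- and right-lapse components, fitted by Expectation-Maximization with a stopping rule on the log-likelihood increment ($10^{-9}$). Assumptions: in the deterministic case an attended sample produces a rightward choice if and only if its signed evidence is positive (a zero sample is treated as a coin flip); mixture coefficients start uniform; the non-deterministic variant, whose M step requires updating the sensitivity parameters, is not covered. The names `snapshot_likelihoods`, `fit_snapshot_em` and `pi` are hypothetical.

```python
import numpy as np

def snapshot_likelihoods(S, choices):
    """p(observed choice | component) for the deterministic snapshot model.

    S: (n_trials, n_samples) signed sample evidence (positive favours right).
    choices: (n_trials,) array, 1 = right, 0 = left.
    Columns 0..n_samples-1: the corresponding sample is attended.
    Last two columns: left lapse (always left) and right lapse (always right).
    """
    n_trials = S.shape[0]
    # The attended sample dictates the choice by its sign; a zero sample is
    # treated as a coin flip (assumption).
    p_right = np.where(S > 0, 1.0, np.where(S < 0, 0.0, 0.5))
    p_right = np.column_stack([p_right, np.zeros(n_trials), np.ones(n_trials)])
    return np.where(choices[:, None] == 1, p_right, 1.0 - p_right)


def fit_snapshot_em(S, choices, tol=1e-9, max_iter=10_000):
    """Fit mixture coefficients by EM; returns (coefficients, log-likelihood)."""
    lik = snapshot_likelihoods(np.asarray(S, float), np.asarray(choices))
    pi = np.full(lik.shape[1], 1.0 / lik.shape[1])   # uniform start (assumption)
    ll_prev = -np.inf
    for _ in range(max_iter):
        joint = lik * pi                              # pi_k * p(choice_t | k)
        marginal = joint.sum(axis=1)                  # p(choice_t) under the mixture
        ll = np.log(marginal).sum()
        if ll - ll_prev < tol:                        # stop when LL gain < tol
            break
        ll_prev = ll
        resp = joint / marginal[:, None]              # E step: responsibilities
        pi = resp.mean(axis=0)                        # M step: closed-form update
    ll = np.log((lik * pi).sum(axis=1)).sum()         # LL of the returned pi
    return pi, ll
```

In this deterministic case the M step for the mixture coefficients is simply the average responsibility of each component, so no numerical optimization is needed; the two lapse columns also keep every trial's marginal likelihood strictly positive.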
In the non-deterministic model, the parameters also include the sensitivity parameters. The model is fitted using Expectation-Maximization (Bishop 2006). In the Expectation step, we compute the responsibility, i.e. the posterior probability that sample i was attended on trial t (for i = L, R, the probability that the trial was a lapse trial) […]

To speed up the computations, in each M step we performed only one Newton-Raphson update for each sensitivity parameter, rather than iterating the updates until convergence. The EM procedure was run until convergence, defined as an increase in the log-likelihood of less than $10^{-9}$ after one EM iteration. The log-likelihood for a given set of parameters is given by […] $\mathcal{N}(0, \sigma^2)$. $H$ is the Heaviside step function. If the stimulus sequence ends and no sample has reached the threshold, the decision is taken at chance. As described in (Waskom and Kiani 2018), the probability of a rightward choice on trial t can be expressed as: […]

[…] obtained from the Laplace approximation. For psychometric curves, we first defined the weighted stimulus evidence $T_t$ on trial t as the session-modulated weighted sum of signed sample evidence, with the weights obtained from the logistic regression model above. We then divided the total stimulus evidence into 50 quantiles (10 for human subjects) and computed the psychometric curve as the proportion of rightward choices in each quantile.
The boundary performance for the snapshot and extrema-detection models corresponds to the best choice accuracy across all parameterizations of each model. In the snapshot model, the boundary performance corresponds to the deterministic, lapse-free version in which the attended sample is always the sample $i^*$ whose sign best predicts the stimulus category across all of the animal's trials, i.e. the mixture coefficient of sample $i^*$ is 1 and all other mixture coefficients are 0.
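The extrema-detection choice probability can be computed in closed form by summing over the sample at which a threshold is first crossed. The Python sketch below assumes that each sample's evidence is corrupted by independent Gaussian noise with standard deviation `sigma`, that the first sample whose noisy value exceeds `+threshold` or falls below `-threshold` determines the choice, and that the choice is random if no sample crosses; lapses are omitted. It is our reading of that formulation rather than a transcription of the equation in (Waskom and Kiani 2018), and a Monte Carlo simulator with the same assumptions is included as a check. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def extrema_p_right(S, threshold, sigma):
    """Closed-form P(right) for an extrema-detection observer (sketch).

    S: (n_trials, n_samples) or (n_samples,) signed sample evidence.
    Each sample is corrupted by N(0, sigma^2) noise; the first noisy sample
    beyond +threshold (right) or -threshold (left) sets the choice; if none
    crosses, the choice is a coin flip. Lapses are not modelled.
    """
    S = np.atleast_2d(np.asarray(S, float))
    p_r = norm.cdf((S - threshold) / sigma)   # P(noisy sample > +threshold)
    p_l = norm.cdf((-threshold - S) / sigma)  # P(noisy sample < -threshold)
    p_none = 1.0 - p_r - p_l                  # sample crosses neither bound
    # survive[:, i] = probability that samples 0..i-1 all failed to cross
    ones = np.ones((S.shape[0], 1))
    survive = np.cumprod(np.hstack([ones, p_none]), axis=1)
    return (survive[:, :-1] * p_r).sum(axis=1) + 0.5 * survive[:, -1]


def simulate_extrema_choices(S, threshold, sigma, rng=None):
    """Monte Carlo version of the same observer; returns True for right."""
    rng = np.random.default_rng(rng)
    S = np.atleast_2d(np.asarray(S, float))
    noisy = S + sigma * rng.standard_normal(S.shape)
    crossed = np.abs(noisy) > threshold
    first = np.argmax(crossed, axis=1)        # first crossing (0 if none)
    first_val = noisy[np.arange(S.shape[0]), first]
    coin = rng.random(S.shape[0]) < 0.5       # chance choice if nothing crosses
    return np.where(crossed.any(axis=1), first_val > 0, coin)
```

As `sigma` shrinks, each crossing probability becomes a step function of the sample evidence, which matches the role of the step function $H$ in the deterministic limit.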
For the extrema-detection model, the boundary performance corresponds to the lapse-free model with no sensory noise and a threshold value identified for each subject by a simple parameter search.
Finally, model selection was performed using the Akaike Information Criterion, $\mathrm{AIC} = 2p - 2\ln\hat{L}$, where $p$ is the number of model parameters and $\hat{L}$ is the likelihood evaluated at the maximum-likelihood parameters.

Analysis of majority-driven choices
For each animal, we selected the subset of trials in which the sample with the largest evidence was at odds with the total stimulus evidence, i.e. trials where $\operatorname{sign}(s_{ti})$, for the sample $i$ such that $|s_{ti}| \geq |s_{tj}|\ \forall j$, differs from $\operatorname{sign}\big(\sum_j s_{tj}\big)$. For this subset of trials, we computed the proportion of the animal's choices that were aligned with the overall stimulus evidence. We repeated the analysis for data simulated from the integration and extrema-detection models.

Subjective weighting analysis
To estimate the impact of each sample on the animal's choice as a function of sample evidence, we built and estimated the following statistical model: […]
This model is equivalent to the temporal integration model under the assumption that f is a linear function. Here, instead, we estimated the function f (as well as the session gain, the lateral bias and the sensory weights). Including the session gain was necessary to estimate f accurately from the monkey and rat behavioral data, since the distribution of pulse strengths varied across sessions and could otherwise induce a confound. We assumed that f is an odd function, i.e. $f(-s) = -f(s)$. This equation takes the form of a Generalized Unrestricted Model and was fitted using the Laplace approximation method as described in (Adam and Hyafil 2020) […], where the weights and session gains correspond to parameters estimated from the temporal integration model (session gains were omitted for human participants).
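The trial selection for the majority-driven analysis can be sketched in a few lines of Python. Assumptions: `S` holds the signed sample evidence (positive favours a rightward choice), choices are coded 1 = right and 0 = left, ties in absolute evidence are broken by taking the first sample, and trials with exactly zero total evidence are excluded. The function name `majority_driven_fraction` is hypothetical.

```python
import numpy as np

def majority_driven_fraction(S, choices):
    """Fraction of choices aligned with total evidence on 'conflict' trials.

    S: (n_trials, n_samples) signed sample evidence; choices: 1 = right, 0 = left.
    A conflict trial is one where the sample with the largest |evidence|
    (first one on ties) has the opposite sign to the summed evidence.
    Trials with zero total evidence are excluded (assumption).
    Returns (fraction, number of conflict trials).
    """
    S = np.asarray(S, float)
    choices = np.asarray(choices)
    largest = S[np.arange(S.shape[0]), np.argmax(np.abs(S), axis=1)]
    total = S.sum(axis=1)
    conflict = (np.sign(largest) != np.sign(total)) & (total != 0)
    n_conflict = int(conflict.sum())
    if n_conflict == 0:
        return np.nan, 0
    choice_sign = np.where(choices[conflict] == 1, 1.0, -1.0)
    aligned = choice_sign == np.sign(total[conflict])
    return float(aligned.mean()), n_conflict
```

On these trials an integrating observer should mostly follow the total evidence, whereas an observer driven by the single largest sample should mostly go against it, so the returned fraction separates the two accounts.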
Next, we plotted the integration map, which represents the probability of a rightward choice as a function of early and late evidence $(E, L)$. The map was obtained by smoothing data points with a two-dimensional Gaussian kernel. More specifically, for each pair of values $(E, L)$, we selected the trials whose early and late evidence values $E_t$ and $L_t$ fell within a fixed distance of $(E, L)$, i.e. $d_t = \lVert (E_t, L_t) - (E, L) \rVert < 2$. We then computed the proportion of rightward choices over the selected trials, weighting each trial according to its distance to $(E, L)$: $w_t = \mathcal{N}\big((E_t, L_t); (E, L), 0.1^2\big)$. Because the $(E, L)$ space was not sampled uniformly during the experiment, we represent the density of trials by brightness. For each subject, we obtained integration maps both from subject data and from model simulations. For each model, we computed the Pearson correlation between the map obtained from the corresponding simulation and the map obtained from the subject data. We tested the significance of differences in correlation between models using a bootstrapping procedure: we computed the correlation r for 100 bootstrap samples for each model and participant, and then performed an unpaired t-test between the bootstrapped r values.

Next, we analyzed the conditional psychometric curves, i.e. the psychometric curves for early evidence conditioned on the value of late evidence, which correspond to vertical cuts in the integration map. To do so, we first binned late evidence into bins of width 0.5. Each conditional psychometric curve represents the probability of a rightward choice as a function of early evidence $E_t$, separately for each late-evidence bin. For each late-evidence bin, we also estimated the corresponding bias, left lapse and right lapse by fitting the following function to the corresponding subset of trials: […]
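The kernel smoothing behind the integration map can be sketched as follows in Python. Assumptions: an isotropic Gaussian weight with standard deviation 0.1 and a selection radius of 2, both in the units of the evidence axes; a single grid shared by the early and late axes; choices coded 1 = right, 0 = left. The function name `integration_map` is hypothetical, and the returned summed weight stands in for the trial density used to set brightness.

```python
import numpy as np

def integration_map(early, late, choices, grid, radius=2.0, sd=0.1):
    """Kernel-smoothed P(right) over a grid of (early, late) evidence values.

    Assumptions: isotropic Gaussian weights with standard deviation `sd`;
    trials farther than `radius` from a grid point are ignored.
    Returns (p_right, density), each indexed [late_index, early_index];
    p_right is NaN where no trial falls within `radius`.
    """
    early = np.asarray(early, float)
    late = np.asarray(late, float)
    choices = np.asarray(choices, float)            # 1 = right, 0 = left
    grid = np.asarray(grid, float)
    p_right = np.full((grid.size, grid.size), np.nan)
    density = np.zeros((grid.size, grid.size))
    for j, E in enumerate(grid):
        for k, L in enumerate(grid):
            d = np.hypot(early - E, late - L)       # distance to the grid point
            near = d < radius
            w = np.exp(-0.5 * (d[near] / sd) ** 2)  # unnormalized Gaussian weight
            density[k, j] = w.sum()                 # summed weight, for brightness
            if w.sum() > 0:
                p_right[k, j] = np.sum(w * choices[near]) / w.sum()
    return p_right, density
```

The Gaussian normalizing constant cancels in the weighted proportion, so only the density map depends on it; a conditional psychometric curve then amounts to reading this map along one late-evidence value.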
Note that this weighting projects the evidence onto the space defined by the preferred direction of the neuron, such that positive evidence signals evidence towards the preferred direction and negative evidence signals evidence towards the anti-preferred direction. We then merged the vectors of normalized spike counts $\tilde{y}^{(n)} = y^{(n)} / \exp\big(w_0^{(n)}\big)$, early evidence $E^{(n)}$ and late evidence $L^{(n)}$ across all neurons. The normalized spike counts were binned by values of early and late evidence (bin width: 0.02), and the average over each bin was computed after convolving with a two-dimensional Gaussian kernel of width 0.1. The neural integration map represents the average normalized activity per bin.
Simulations of spiking data for the integration and non-integration models were performed as follows. First, the neural integration model corresponds to a linear sum of sample evidence with neuron-specific weights, passed through an exponential nonlinearity; the spike count on each trial is generated from a Poisson distribution whose rate equals the nonlinear output (Supp. Figure 9a, top). This corresponds exactly to the generative process of the Poisson GLM described above. For the extrema-detection model (Supp. Figure 9a, middle), we hypothesized that LIP activity would be driven only by the sample that reaches the threshold (and dictates the animal's response). To this end, we first simulated the behavioral extrema-detection model for all trials, using the parameters fitted to the corresponding animal, to identify which sample i reaches the subject-specific threshold. We then assumed that the spiking activity of the neuron would follow the stimulus value at sample i (signed by the preferred direction of the neuron) through:
$\lambda\big(s_{ti}^{(n)}\big) = \exp\big(w_0^{(n)} + w^{(n)} s_{ti}^{(n)}/2\big)$
Again, spike counts were generated from a Poisson distribution with rate $\lambda\big(s_{ti}^{(n)}\big)$.
Finally, for the snapshot model (Supp.
Figure 9a, bottom), we assumed that the neuron's activity would merely reflect the sensory value of the single sample it attends. We assumed that the probability of attending each of the 7 samples was neuron-specific, so we used the normalized weights of the Poisson GLM for that specific neuron to define this probability mass function (weights were signed by the neuron's preferred direction so that the vast majority of weights were positive; negative weights were ignored). For each trial, we thus randomly drew the attended sample i from this probability mass function and then simulated the spike count $y_t^{(n)}$ from a Poisson distribution with rate $\lambda\big(s_{ti}^{(n)}\big) = \exp\big(w_0^{(n)} + w^{(n)} s_{ti}^{(n)}\big)$.
We simulated spiking activity for each neuron under each integration and non-integration model, and then used the simulated data to compute neural integration maps exactly as described above for the actual LIP neuron activity.

Data and code availability
All experimental data (behavioral and neural data in monkeys, behavioral data in rats and humans) and code to run the analysis will be made publicly available at https://github.com/ahyafil prior to final publication.