Abstract
The ability to rapidly estimate non-symbolic numerical quantities is a well-conserved sense across species with clear evolutionary advantages. Despite its importance, the rapid representation and estimation of numerosity is surprisingly imprecise and biased. However, a formal explanation for this seemingly irrational behavior is still lacking. We develop a unified normative theory of numerosity estimation that parsimoniously incorporates, within a single framework, information-processing constraints, Brownian diffusion noise that captures the effect of exposure time on sensory estimates, logarithmic encoding of numerosity representations, and optimal inference via Bayesian decoding. We show that, for a given allowable biological capacity constraint, our model naturally endogenizes time perception during noisy efficient encoding to predict the complete posterior distribution of numerosity estimates. This model accurately predicts many features of human numerosity estimation as a function of temporal exposure, indicating that humans can rapidly and efficiently sample numerosity information over time. Additionally, we demonstrate how our model fundamentally differs from a thermodynamically inspired formalization of bounded rationality, in which information processing is modeled as a shift away from default states. The mechanism we propose is the likely origin of a variety of numerical cognition patterns observed in humans and other animals.
Author summary
Humans have the ability to estimate the number of elements in a set without counting. We share this ability with other species, suggesting that it is evolutionarily relevant. Yet despite its relevance, this sense is variable and biased. What is the origin of these imprecisions? We take the view that they result from an optimal use of limited neural resources. Because of these limitations, stimuli are encoded with noise. The observer then optimally decodes these noisy representations, taking into account its knowledge of the distribution of stimuli. We build on this view and incorporate stimulus presentation time (or contrast) directly into the encoding process using Brownian motion. This model parsimoniously predicts key characteristics of our perception and quantitatively and qualitatively outperforms a popular modeling approach that places resource limitations at the response stage rather than at encoding.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* rafael.polania{at}hest.ethz.ch
Revised manuscript following submission to a journal. It includes updates in all sections of the manuscript.
1 The rationale behind the choice of the squared m is the following: the signals can take any value over the real line. If we assume that the status quo of no message (i.e., no energy being spent) corresponds to 0, then any deviation from 0 that leads to decodable information should be counted, including negative values, which also incur energy expenditure.
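As an illustration of this rationale (a sketch only; the exact constraint used in the main text may differ), the squared-m choice amounts to a symmetric, power-style capacity constraint on the encoded signal, e.g. \( \mathbb{E}[m^{2}] = \int p(m)\, m^{2}\, \mathrm{d}m \le C \), where \(p(m)\) is the distribution of encoded signals over the real line and \(C > 0\) is the allowed capacity (energy budget). Squaring penalizes positive and negative deviations from the zero no-message baseline equally, which is precisely the point made above.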