ABSTRACT
Attention is a key component of the visual system, essential for perception, learning, and memory. Attention can also be seen as a solution to the binding problem: concurrent attention to all parts of an entity allows it to be separated from the rest. However, the rich models of attention in computational neuroscience generally do not scale to real-world problems, so there are many behavioral and neural phenomena that current models cannot explain. Here, we propose a bidirectional recurrent model of attention inspired by modern neural networks for image segmentation. It conceptualizes recurrent connections as a multi-stage internal gating process in which bottom-up connections transmit features, while top-down and lateral connections transmit attentional gating signals. Our model can recognize and segment simple stimuli such as digits, as well as objects in natural images, and can be prompted with object labels, attributes, or locations. It reproduces a range of behavioral findings, including object binding, selective attention, inhibition of return, and visual search. It also replicates a variety of neural findings, including increased activity for attended objects, features, and locations, attention-invariant tuning, and a relatively late onset of attentional effects. Most importantly, our proposed model unifies decades of cognitive and neurophysiological findings on visual attention into a single principled architecture. Our results highlight that the ability to selectively and dynamically focus on specific parts of stimulus streams can help artificial neural networks generalize better and align more closely with human brains.
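The core gating idea sketched above (bottom-up connections carry features, while top-down and lateral connections carry multiplicative attentional gates) can be illustrated with a minimal toy step. This is a hedged sketch only: the dimensions, weight matrices, and function names below are hypothetical and are not taken from the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes and weights, for illustration only.
n_feat = 8
W_td = rng.standard_normal((n_feat, n_feat)) * 0.1   # top-down gating weights
W_lat = rng.standard_normal((n_feat, n_feat)) * 0.1  # lateral gating weights

def gated_step(bottom_up, top_down, lateral):
    """One recurrent step: bottom-up features are passed through a
    multiplicative gate computed from top-down and lateral inputs."""
    gate = sigmoid(W_td @ top_down + W_lat @ lateral)  # attentional gate in (0, 1)
    return gate * bottom_up  # features pass in proportion to the gate

features = rng.standard_normal(n_feat)   # bottom-up feature vector
prompt = rng.standard_normal(n_feat)     # e.g. an object-label embedding (top-down)
context = rng.standard_normal(n_feat)    # lateral context signal

out = gated_step(features, prompt, context)
```

Because the gate lies in (0, 1), attended features are transmitted nearly unchanged while unattended ones are suppressed, which is one simple way to realize the "attention as gating" view described in the abstract.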
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
↵‡ work done while at University of Pennsylvania