RT Journal Article
SR Electronic
T1 A dual foveal-peripheral visual processing model implements efficient saccade selection
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 725879
DO 10.1101/725879
A1 Emmanuel Daucé
A1 Pierre Albiges
A1 Laurent Perrinet
YR 2020
UL http://biorxiv.org/content/early/2020/01/27/725879.abstract
AB Visual search involves the dual task of localizing and categorizing an object in the visual field of view. We develop a visuo-motor model that implements visual search as a focal accuracy-seeking policy, and we assume that the target position and category are random variables drawn independently from a common generative process. This independence allows visual processing to be divided into two pathways that respectively infer what to see and where to look, consistent with the anatomical What versus Where separation. We use this dual principle to train a deep neural network architecture in which foveal accuracy serves as a monitoring signal for action selection. In particular, this allows the Where network to be interpreted as a retinotopic action selection pathway that drives the fovea toward the target position in order to increase the recognition accuracy of the What network. After training, comparing the two networks' accuracies amounts to either selecting a saccade or keeping the eye focused at the center so as to identify the target. We test this on a simple task of finding digits in a large, cluttered image. A biomimetic log-polar treatment of the visual information implements the strong compression rate performed at the sensor level by retinotopic encoding, and this compression is preserved up to the action selection level. Simulation results demonstrate that it is possible to learn this dual network.
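The accuracy comparison described above can be sketched as a simple decision rule. This is a minimal illustration, not the paper's implementation: the function and parameter names are hypothetical, and the two accuracy estimates stand in for the outputs of trained What and Where networks.

```python
import numpy as np

def select_action(central_accuracy, peripheral_accuracy_map):
    """Accuracy-driven action selection (illustrative sketch).

    central_accuracy: scalar -- predicted recognition accuracy if the
        eye keeps fixating the center (What pathway's estimate).
    peripheral_accuracy_map: 2-D array -- predicted post-saccade accuracy
        for each retinotopic position (Where pathway's estimate).
    """
    # Find the peripheral position with the highest predicted accuracy.
    best = np.unravel_index(np.argmax(peripheral_accuracy_map),
                            peripheral_accuracy_map.shape)
    if peripheral_accuracy_map[best] > central_accuracy:
        # A saccade toward that position should improve recognition.
        return ("saccade", best)
    # Staying put is at least as good: keep fixating and identify the target.
    return ("fixate", None)

# Toy example: a 5x5 accuracy map with one promising peripheral location.
acc_map = np.full((5, 5), 0.1)
acc_map[1, 3] = 0.9
action, target = select_action(0.3, acc_map)  # -> ("saccade", (1, 3))
```

After a saccade, the same rule can be applied again at the new fixation, so the loop terminates when no peripheral position promises a higher accuracy than the fovea already achieves.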
After training, this dual approach provides a way to implement visual search in a sub-linear fashion, in contrast with mainstream computer vision.

Author summary: The visual search task consists of extracting scarce, specific visual information (the “target”) from a large, cluttered visual display. In computer vision, this task is usually implemented by scanning all possible target identities in parallel at all possible spatial positions, incurring a heavy computational load. The human visual system employs a different strategy, combining a foveated sensor with the capacity to rapidly move the center of fixation using saccades. Visual processing is then separated into two specialized pathways: the “where” pathway, which mainly conveys information about target position in peripheral space (independently of its category), and the “what” pathway, which mainly conveys information about the category of the target (independently of its position). This object recognition pathway is shown here to play an essential role, providing an “accuracy drive” that forces the eye to foveate peripheral objects in order to increase the peripheral accuracy, much as in the “actor/critic” framework. Taken together, these principles provide a path toward visual processing systems that are both adaptive and resource-efficient.
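The log-polar retinotopic compression mentioned in the abstract can be illustrated with a minimal sampling sketch. This is an assumption-laden toy, not the paper's pipeline: the grid sizes are arbitrary, and nearest-neighbour sampling stands in for whatever filtering the actual model uses.

```python
import numpy as np

def log_polar_grid(image, n_ecc=10, n_theta=16, r_min=2.0):
    """Sample an image on a log-polar grid (illustrative sketch).

    Ring radii grow geometrically from r_min to the image half-width,
    so sampling density is high near the center (fovea) and sparse in
    the periphery. Parameter names and values are hypothetical.
    """
    h, w = image.shape
    cy, cx = h / 2.0, w / 2.0
    r_max = min(cy, cx) - 1
    # Geometric progression of eccentricities: log-polar spacing.
    radii = r_min * (r_max / r_min) ** (np.arange(n_ecc) / (n_ecc - 1))
    thetas = 2 * np.pi * np.arange(n_theta) / n_theta
    out = np.zeros((n_ecc, n_theta))
    for i, r in enumerate(radii):
        for j, t in enumerate(thetas):
            y = int(round(cy + r * np.sin(t)))
            x = int(round(cx + r * np.cos(t)))
            out[i, j] = image[y, x]  # nearest-neighbour sample
    return out

# A 128x128 image (16384 pixels) is compressed to 10x16 = 160 samples.
img = np.random.rand(128, 128)
compressed = log_polar_grid(img)
```

Because the number of samples is fixed regardless of image size, this kind of encoding is one concrete way the sensor-level compression rate described in the abstract can stay constant while the search area grows, supporting the sub-linear scaling claim.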