Visuomotor mismatch responses as a hallmark of explaining away in causal inference

How are visuomotor mismatch responses in visual cortex embedded into cortical processing? We here argue that mismatch responses are best understood as the result of cooperation between motor and visual areas to jointly explain optic flow. This cooperation ensures that optic flow is not explained redundantly by both areas, which in the language of causal inference is termed explaining away. This improves the efficiency of the resulting neural code and allows the animal to easily detect movements that are independent of its own locomotion. We demonstrate the emergence of mismatch responses from explaining away in simulations, where spiking neurons learn to encode optic flow stimuli and locomotion. We furthermore lay out arguments against the prevailing idea that mismatch signals are the result of a dedicated error computation in a hierarchical model. These results provide a new perspective on several recent experiments on cross-modal neural interactions in cortex.


Introduction
In recent years, several experiments have confirmed the surprising result that locomotion has a considerable impact on neural activity in visual cortex [1,2]. Some of these experiments showed that pyramidal cells in layer 2/3 of primary visual cortex (V1) compute a visuomotor mismatch, i.e. the difference between presented optic flow and optic flow predicted from the locomotion of the animal [3,4].
An important question to ask is what purpose these computations fulfill in cortex, and how they should be interpreted. A widespread idea is that these mismatch responses are indicative of the canonical computation of 'error neurons' [4], which occurs in a hierarchical predictive coding model of cortex [5]. However, no clear picture has yet been presented of how exactly these error neurons could be embedded into a cortical hierarchical model.
Here we argue for a different interpretation of mismatch responses in visual cortex: as a result of explaining away. First, we explain the idea behind this effect and demonstrate it in simulations where spiking neurons learn to encode simple optic flow stimuli and locomotion. We then lay out arguments against the prevailing interpretation in terms of dedicated error neurons.

Theory
The core idea of our model is that visual and motor neurons jointly explain the optic flow the animal perceives. This does not necessarily require that motor neurons are actively driven by optic flow stimuli, but only means that the explanation of optic flow (in the internal model of the animal) is distributed over different populations. Formally, we can state this with the following model (Fig 1A), where it is assumed that optic flow $\mathbf{r}_{\mathrm{flow}}$ can be reconstructed as a linear sum of the activity of a visual population $\mathbf{r}_{\mathrm{V1}}$ and a motor population $\mathbf{r}_{\mathrm{M2}}$, plus some Gaussian noise $\mathbf{n}_{\mathrm{flow}}$ with variance $\sigma^2_{\mathrm{flow}}$:

$$\mathbf{r}_{\mathrm{flow}} = D_{\mathrm{flow} \leftarrow \mathrm{V1}}\, \mathbf{r}_{\mathrm{V1}} + D_{\mathrm{flow} \leftarrow \mathrm{M2}}\, \mathbf{r}_{\mathrm{M2}} + \mathbf{n}_{\mathrm{flow}}. \tag{1}$$

Here, $D_{b \leftarrow a}$ are decoding matrices that decode activity from $a$ to $b$. To constrain the activity of the motor population $\mathbf{r}_{\mathrm{M2}}$, we furthermore require that it encodes the locomotion of the animal $\mathbf{r}_{\mathrm{move}}$ according to a linear model

$$\mathbf{r}_{\mathrm{move}} = D_{\mathrm{move} \leftarrow \mathrm{M2}}\, \mathbf{r}_{\mathrm{M2}} + \mathbf{n}_{\mathrm{move}}. \tag{2}$$

From the model of optic flow (Eq 1) it is directly visible that (on average) activity in visual neurons should be proportional to the difference between optic flow and the prediction from motor neurons,

$$D_{\mathrm{flow} \leftarrow \mathrm{V1}}\, \mathbf{r}_{\mathrm{V1}} \approx \mathbf{r}_{\mathrm{flow}} - D_{\mathrm{flow} \leftarrow \mathrm{M2}}\, \mathbf{r}_{\mathrm{M2}}. \tag{3}$$

Hence, as a result of the cooperation of motor and visual neurons to explain optic flow, motor neurons should cancel predictable (i.e. self-generated) optic flow in visual neurons via efference copies. In terms of inference, this is called explaining away, since the activity of one area need not explain aspects of the input that are already explained by the other area [6].
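As a concrete illustration of Eqs (1)–(3), consider the following minimal numpy sketch of the linear model; all names, dimensions, and the random decoding matrices are illustrative assumptions, not the parameters of our simulations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative dimensions (assumptions, not the simulation's values)
n_flow, n_v1, n_m2 = 3, 10, 5

# Decoding matrices D_{b <- a}, here random placeholders
D_flow_V1 = rng.normal(size=(n_flow, n_v1))
D_flow_M2 = rng.normal(size=(n_flow, n_m2))

r_M2 = rng.random(n_m2)      # motor population activity
r_flow = rng.random(n_flow)  # presented optic flow

# Explaining away (Eq 3): V1 only needs to encode the residual optic
# flow that is not already predicted from locomotion via M2 ...
residual = r_flow - D_flow_M2 @ r_M2

# ... so that D_flow_V1 @ r_V1 approximately reconstructs the residual,
# e.g. via a least-squares solution:
r_V1, *_ = np.linalg.lstsq(D_flow_V1, residual, rcond=None)
assert np.allclose(D_flow_V1 @ r_V1, residual)  # exact here, since n_v1 > n_flow
```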

Results
To illustrate the emergence of mismatch responses via explaining away, we simulated inference and learning in this model (Eqs 1 & 2) in a previously proposed framework of population coding with spiking neurons [7] (however, any neural implementation of the model should yield similar results). In this framework, explaining away is implemented via connections between and within neural populations (Fig 1B), which cancel (i.e. balance) inputs on a neuron's dendrites that can be predicted from the activity of other neurons. For the encoding of optic flow as in Eq (3), this framework therefore requires that motor neurons learn to cancel optic flow inputs on the dendrites of visual neurons. Visual neurons can then learn to efficiently encode the residual optic flow with a voltage-based plasticity rule [7].
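The central mechanism, cancellation of predictable inputs on the dendrites, can be sketched as follows. This is a deliberately simplified leaky integrate-and-fire population without the learning rules and within-population recurrence of [7]; all weight initializations and parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_flow, n_v1, n_m2 = 3, 10, 5
dt, tau, threshold = 2e-4, 2e-2, 1.0  # 0.2 ms steps, 20 ms membrane time constant

# Decoders and feedforward weights with F ≈ D^T (random placeholders;
# in [7] these are learned with local, voltage-based plasticity)
D_flow_V1 = rng.normal(size=(n_flow, n_v1)) / np.sqrt(n_v1)
D_flow_M2 = rng.normal(size=(n_flow, n_m2)) / np.sqrt(n_m2)
F_V1 = D_flow_V1.T

# Cancellation ("balancing") weights from M2 to V1: they subtract the
# component of the optic flow input that M2 activity already explains
W_V1_M2 = -F_V1 @ D_flow_M2

def step(v, r_flow, r_M2):
    """One Euler step of the V1 membrane potentials with dendritic balance."""
    dendritic_input = F_V1 @ r_flow + W_V1_M2 @ r_M2  # residual drive only
    v = v + dt / tau * (-v + dendritic_input)
    spikes = v > threshold
    v[spikes] = 0.0  # reset after spiking
    return v, spikes

# If M2 perfectly predicts the optic flow, the dendritic drive vanishes:
r_M2 = rng.random(n_m2)
v, spikes = step(np.zeros(n_v1), D_flow_M2 @ r_M2, r_M2)
assert np.allclose(v, 0.0)  # predictable flow is fully cancelled
```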
Using this model, we recreated the visuomotor mismatch experiment of [4] in a simplified manner (Fig 1C). The task of the network was to encode locomotion and optic flow, which were presented simultaneously (for details about the data creation, see Methods). We used simple locomotion signals that indicated a turn to the left or right. Optic flow consisted of the activity of three receptors, which indicated the speed and direction of optic flow and were correlated with the locomotion signal (Fig 1D). The idea of this setup is that locomotion can partly predict the optic flow; hence, motor neurons should cancel this predictable component in the dendrites of visual neurons in V1.
In the simulations, the network first learned to represent locomotion and optic flow by adapting feed-forward weights from sensory inputs, as well as weights within and between populations, using local plasticity rules [7]. After learning, we tested the responses of V1 neurons with conflicting stimuli (where optic flow in the center receptor did not match the prediction from locomotion) or non-conflicting stimuli. As expected, we indeed found neurons that specifically reacted to a mismatch between optic flow and prediction, and encoded a positive or negative deviation (Fig 1E).

Fig 1. B Explaining away is implemented via connections between and within neural populations (blue arrows), which learn to cancel (i.e. balance) sensory inputs that can already be explained from the activity of other neurons. Because motor neurons also explain optic flow, connections from M2 to V1 learn to cancel optic flow inputs in V1. To find an efficient encoding, connections from sensory inputs (red arrows) are learned via voltage-dependent plasticity [7]. C Experimental setup to induce mismatch responses [4]. A mouse is placed in a virtual environment while its head is fixed. Egomotion of the mouse (red arrow) results in visual flow (green arrows) that is displayed on a screen. D Sample optic flow stimuli that are presented in our model. A rotation to the left would predict uniform visual flow to the right. Two mismatch conditions are also presented, where center optic flow is slower (#1) or faster (#2) than expected. E Simulation of optic flow mismatch responses with spiking neurons. The mouse turns to the left, which is encoded by motor neurons (M2). Visual neurons (V1) with mismatch responses are indicated by arrows. After learning, neurons emerge that are active for faster or slower optic flow than expected, similar to what is found in experiment [4].
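The classification of model neurons as mismatch selective, as in Fig 1E, can be done, for instance, with a simple rate contrast between matched and conflicting trials; the following helper is a hypothetical sketch of such an analysis, not the exact procedure used here.

```python
import numpy as np

def mismatch_selectivity(rates_match, rates_mismatch):
    """Contrast V1 firing rates between matched and conflicting trials.

    rates_match, rates_mismatch: arrays of shape (n_trials, n_neurons).
    Returns one index in [-1, 1] per neuron; values near +1 mark neurons
    that respond specifically to the visuomotor mismatch.
    """
    m = rates_match.mean(axis=0)
    mm = rates_mismatch.mean(axis=0)
    return (mm - m) / (mm + m + 1e-12)  # small constant avoids division by zero
```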

Discussion
In any causal inference problem where multiple competing explanations exist for the same observation, having knowledge of one of the explanations reduces the likelihood of the others: the likely causes explain away the data. Here, we showed that explaining away manifests as visuomotor mismatch responses in a model where motor and visual areas jointly infer the underlying causes of optic flow. In particular, if an animal is moving, this can partially explain the optic flow, and thus locomotion competes with other (external) explanations that are represented in visual areas. In this case, explaining away means that these external causes become less likely, and hence activity in visual areas should be suppressed if it is predictable from locomotion. Disentangling the potential explanations of the perceived optic flow in this way provides two major benefits: First, the resulting neural code is efficient, since information is not redundantly represented in both visual and motor areas. Second, and more importantly, it allows the animal to rapidly identify objects that move independently of its own locomotion.
A common assumption is that mismatch responses of dedicated error neurons in a hierarchical predictive coding model [8] could also serve these two purposes. We argue that this is not the case. First, their computation is not efficient, since in these models every error neuron is associated with a prediction neuron. Thus, while activity in error neurons is cancelled, it is always accompanied by prediction neuron activity, which is not cancelled and which is redundant with activity in motor areas (as also explained in [8]). Second, if error neurons showed mismatch responses as observed experimentally, this would indicate that mismatch information is only processed slowly. The reasoning is as follows: in this framework, error neurons arise always and only to mediate between populations of prediction neurons by computing their mismatch. If another, higher-level population of prediction neurons were involved that further processed the locomotion-independent optic flow (e.g. in V2), one would expect this population to also rapidly cancel activity in the error neurons with its own predictions, leading to only a short, transient error neuron response. However, mismatch responses in experiments are not rapidly cancelled [4], meaning that in this theory their activity would not be integrated by any higher-level population of prediction neurons on short timescales. Therefore, when mismatch responses are explained with dedicated error neurons, the open question remains why cortex would employ such redundant and slow representations.
Another observation that seems difficult to reconcile with the concept of error neurons is that responses in V1 cannot be clearly classified as either mismatch selective or visual. Rather, there exists a continuum of responses between neurons with high selectivity for visuomotor mismatch and neurons with high selectivity for visual stimulation [3]. This, however, is compatible with the model presented here, where mismatch-selective neurons are not a priori distinguished from other neurons, but become mismatch selective (or not) by adjusting their stimulus tuning for an efficient neural code. Thus, the observed diversity of response characteristics in V1 seems less compatible with a dedicated encoding of prediction errors, and more compatible with the idea of explaining away in causal inference as presented here.

Similar to our model, previous models showed that mismatch responses emerge when connections between neural populations learn to establish a balance on neural dendrites [9]. Yet, whereas previously the emerging mismatch responses were modelled without functional context, we here embedded these computations into causal inference in a particular graphical model. Based on our results, we argue that experimentally observed mismatch responses are unlikely to be a signature of dedicated error neurons, but instead emerge as a hallmark of explaining away when multiple areas explain the same sensory inputs. This interpretation of mismatch responses can also be applied to other such observations, for example audio-visual suppression in V1 [10], or mismatch responses in the tactile [11] and auditory [12,13] modalities.

Methods
Spiking neurons were modeled as presented in [7]. Neurons were updated in discrete time steps of δ = 0.2 ms. Feedforward weights $F_{b \leftarrow a} \approx D_{a \leftarrow b}^T$ from signals to populations were learned online with voltage-based learning rules [7]. For weights $W_{b \leftarrow a}$ within and between populations, we here used an analytical solution for simplicity, where connections cancel the inputs that are predictable from the other population, $W_{b \leftarrow a} = -F_{b \leftarrow c}\, D_{c \leftarrow a}$ (with $c$ the signal that both populations explain). Previously, we showed that this analytical solution can be well approximated by learning a tight balance on neural dendrites [7]. Spiking rates of V1 and M2 neurons were homeostatically regulated to 8 Hz and 30 Hz, respectively. After learning, we probed neural activity with plasticity turned off to produce the results in Fig 1.

To simulate the experiment in [4], we created pairs of locomotion and optic flow signals. Each pair of signals was presented for 100 ms before switching to the next pair. Locomotion was represented by a one-dimensional signal, where -1 indicated a turn to the left, 0 no movement, and 1 a turn to the right, each occurring with probability p = 1/3. Optic flow was represented by a three-dimensional signal, where all values were initially set to 1, 0 or -1 for the respective movement condition. We then added optic flow that could not be predicted from locomotion: each dimension could independently be increased by 1 or decreased by 1 (with probability p = 0.1 each). After the creation of these two vectors, we ensured that only positive values were presented to the network by doubling the number of dimensions, copying the negated signal to the new dimensions, and rectifying all signals.
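A minimal sketch of the stimulus generation described above could look as follows; the function name and exact data layout are illustrative assumptions (see the linked repository for the actual code).

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def make_signal_pair(p_extra=0.1):
    """Generate one locomotion / optic flow pair as described in Methods."""
    # Locomotion: -1 (left turn), 0 (no movement), 1 (right turn), p = 1/3 each
    locomotion = float(rng.choice([-1, 0, 1]))

    # Optic flow: three receptors, initially fully predicted by locomotion
    flow = np.full(3, locomotion)

    # Unpredictable component: each dimension may be increased by 1 or
    # decreased by 1, with probability p = 0.1 each
    flow += (rng.random(3) < p_extra).astype(float)
    flow -= (rng.random(3) < p_extra).astype(float)

    # Present only positive values: double the dimensions with a negated
    # copy and rectify
    loco_vec = np.maximum(np.array([locomotion, -locomotion]), 0.0)
    flow_vec = np.maximum(np.concatenate([flow, -flow]), 0.0)
    return loco_vec, flow_vec
```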
Code for reproducing the simulations can be found online at https://github.com/Priesemann-Group/mismatch_responses.