Estimating voluntary and involuntary attention in bistable visual perception: A MEG study

We introduce a method for measuring human attention during a visual task that consists of different interpretations of a bistable image. The Necker cube with flickering faces was presented to nine conditionally healthy volunteers. The pixel intensity of the front and rear cube faces was modulated by sinusoidal signals with frequencies of 6.67 Hz (60/9) and 8.57 Hz (60/7), respectively. The tags of these frequencies and their second harmonics were clearly identified in the average Fourier spectra of the magnetoencephalographic (MEG) data recorded from the occipital cortex. In the first part of the experiment, the subjects were asked to voluntarily control their attention by interpreting the cube as either left- or right-oriented. Accordingly, we observed the dominance of the corresponding spectral component and measured voluntary attention performance. In the second part of the experiment, the subjects were asked simply to observe the cube image without making any effort to interpret it. The alternation of the dominant spectral energies at the second harmonic tag frequencies was treated as changes in the perceived cube orientation. Based on the results of the first experimental stage and using wavelet analysis, we developed a novel method that allowed us to identify the currently perceived cube orientation. Finally, we characterized involuntary attention using the dominance time distribution and related it to voluntary attention performance and brain noise. In particular, we have shown that higher attention performance is associated with stronger brain noise.

February 14, 2020

Wilhelm Wundt was the first to suggest, as early as 1897, that there exist two forms of attention: voluntary and involuntary [1]. The community already uses a surplus of terms that overlap with these two forms, such as endogenous versus exogenous attention, automatic versus controlled attention, and pull versus push attention [2]. According to Prinzmetal and his colleagues, voluntary and involuntary attention have different functions and are controlled by distinct mechanisms. They supposed that voluntary attention affects perceptual representation and would therefore affect both accuracy and reaction time (RT) experiments, whereas involuntary attention deals with the response selection decision and is manifested only in RT experiments.

To investigate such a distinction, a spatial cuing task developed by Posner and his colleagues [3-6] is useful. In this paradigm, subjects perform a detection or identification task with a peripheral stimulus. The participants are precued to a possible location of the stimulus beforehand; in valid trials, the cue indicates the target location, whereas in invalid trials, the cue indicates a nontarget location. Since the participants are not allowed to move their eyes to the cued location, the observed differences in performance between valid and invalid trials reflect differences in attention that are independent of fixation. Jonides [7,8] used this paradigm to study the difference between voluntary and involuntary attention by altering the "validity" of the cuing information. If the proportion of valid trials is as low as that expected by chance, so that the cue provides no useful bias toward the target location, only involuntary attention would be involved in seeing the peripheral stimulus. On the other hand, when the proportion of valid trials is high, so that the cue carries correct information about the target location, both voluntary and involuntary attention would be engaged.

Moreover, the effect of involuntary attention on the initial response selection decreases as stimulus onset asynchrony (SOA) increases. The SOA is defined as the time between the onset of the cue and the onset of the target [9,10]. At the same time, voluntary attention to the biasing cue improves the perceived contrast between attended and unattended stimuli [11-13]. Whether the enhancement in contrast is due to increasing dominance of the attended stimulus [14] or decreasing dominance of the unattended stimulus [9] is still unclear.

In 2005, Prinzmetal et al. [2] introduced the ideas of channel enhancement and channel selection in order to show how the two kinds of attention manifest. Channel enhancement is a process driven by voluntary attention that causes the visual system to gather more information from the attended stimulus, specified by the informative cues, than from the unattended stimulus. It changes the perceptual representation so that the observers have a clearer view of the stimulus they are attending to [15-17]. Channel enhancement should affect experiments on detection accuracy at the attended target location. Furthermore, it may also improve RT, as information is presumably gathered faster in the cued than in the uncued location. Channel selection, in contrast, deals with the decision making involved in determining the correct target location or response selection, and would only affect RT experiments.

There is a general consensus that the Stroop effect alters response selection only, but not perceptual representation [18,19]. For example, when a subject is shown the word BLUE written in red ink and asked to name the ink color, a competition in the response selection delays the response, but no alteration in the perceived color is observed. Similarly, involuntary attention will affect RT, but not detection accuracy. Consistent with this, several researchers found that involuntary attention to a stimulus only affects the response selection [9,10,20].

It should be noted that there is precedent for accuracy and RT studies producing different effects [21-23]. In particular, Santee & Egeth [23] worked on the redundant target paradigm, in which a target letter can be repeated in the display. They reported that the repeated target speeds up the reaction [24-26], a phenomenon known as a flanker effect, but in turn reduces accuracy [27,28]. There is evidence that voluntary and involuntary attention depend on SOA differently. Specifically, the SOA at which involuntary attention is engaged is smaller than that for voluntary attention [29,30].

In this paper, we employ multistable perception as a tool for studying voluntary and involuntary attention. Multistable perception is the phenomenon in which the same stimulus can be perceived in two or more different ways. In terms of degrees of freedom, the simplest form of multistable perception is bistable perception, in which two different interpretations of the same stimulus are possible. There has been extensive research on this topic over the last two decades, and many descriptive models have been proposed [31-36]. The switches between alternative percepts have been proposed to be driven by stochastic processes in the brain [31,37] due to random neurophysiological noise, by adaptation of the dominant perception after being active for a prolonged time, or by both noise and adaptation [32,35,36]. Each percept competes for dominance over the rival state, and the active state tends to suppress the alternate perception. Whether the interstate suppression is realized before binocular confluence, such as in the primary visual cortex or the lateral geniculate nucleus [38-40], or after it [41,42] has been a matter of much debate. The latter mechanism suggests that competition exists between high-level stimulus representations in visual neurons. Behavioral studies [9] support the latter mechanism.

Similarly, the phenomenon of visual attention is based on the competition of one object among a variety of other competing alternatives for enhanced perceptual representation (voluntary attention). This led to the suggestion that bistable perception and attention may be related processes [43,44]. The previous studies on this topic were performed using an evoked response that consisted of numerous relatively short trials as opposed to a single long trial. In contrast, in the present work we design [...]

The visual stimulus was a grey Necker cube image on a grey background, generated by a personal computer on a computer monitor with a 60-Hz frame rate and projected by a digital light processing projector onto a translucent screen located 150 cm away from the subject. The pixels' brightness on the left and right cube faces was modulated by sinusoidal signals with frequencies of 6.67 Hz (60/9) and 8.57 Hz (60/7), respectively, [...] frequencies that are integer fractions of the 60-Hz frame rate (i.e., 60/2, 60/3, 60/4, 60/5, 60/6, 60/7, 60/8, 60/9, 60/10, and 60/12), as frequencies which produce the best tagging brain response [46].

The subjects were seated in a comfortable reclining chair with their legs straight and their arms resting on an armrest in front of them or on their laps. Before the experiment, the participants were asked to remove their shoes and any metallic items above the waist, such as jewelry, belts, and brassieres. The experiment began with the recording of two minutes of background activity while the subject focused on a red dot in the middle of a stationary (non-flickering) cube image. This MEG trial acted as a background reference for further measurements.
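The frame-locked flicker described for the stimulus can be illustrated with a short sketch. It assumes a brightness range normalised to [0, 1] and a full modulation depth; the function name `face_brightness` and these parameter values are hypothetical and are not taken from the experimental software. The point it shows is why tagging frequencies of the form 60/n are convenient: each sinusoid repeats after exactly n frames.

```python
import numpy as np

FRAME_RATE = 60.0  # Hz, monitor frame rate stated in the text


def face_brightness(n_frames, frames_per_cycle, mean=0.5, depth=0.5):
    """Per-frame brightness of a face flickering sinusoidally.

    The period is a whole number of frames, so the tagging frequency is
    FRAME_RATE / frames_per_cycle. `mean` and `depth` are assumed values.
    """
    frame_idx = np.arange(n_frames)
    phase = 2.0 * np.pi * frame_idx / frames_per_cycle
    return mean + depth * np.sin(phase)


# One 5-s trial at 60 Hz corresponds to 300 frames.
n_frames = int(5 * FRAME_RATE)
left = face_brightness(n_frames, frames_per_cycle=9)   # 60/9 Hz tag
right = face_brightness(n_frames, frames_per_cycle=7)  # 60/7 Hz tag

f_left = FRAME_RATE / 9
f_right = FRAME_RATE / 7
print(f"left face: {f_left:.2f} Hz, right face: {f_right:.2f} Hz")
```

Because both periods are whole numbers of frames, no frame-to-frame phase drift accumulates over a trial, which keeps the spectral tags narrow.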

The experiment included two stages: voluntary control of the perceived cube orientation and involuntary spontaneous switching between the two cube orientations. During the first stage, after a 30-s rest and an instructional visual message, the flickering Necker cube with two frequencies was presented 24 times on the screen (5 s each, with a 5-s interval in between). For the first 12 trials, 9 out of 12 participants were asked to interpret the cube as left-oriented. After a 30-s rest and an instructional visual message, the participants were requested to interpret the next 12 cubes as right-oriented. For 3 subjects, we reversed the order of voluntary perception by asking them to interpret the first 12 cubes as right-oriented and the next 12 cubes as left-oriented. This concluded the first experimental stage.

When the subject was ready, the second experimental stage started. The same Necker cube stimulus was presented for 120 s. At this stage the subjects were instructed not to fix their attention on a particular cube orientation, but only on the red dot at the centre of the image.

The brain is modeled using a mesh of 15004 points representing cortical sources. Many different combinations of these sources can produce the magnetic activity observed in the 306 MEG channels. This so-called inverse problem is ill-posed and can only be solved by using additional assumptions about the neuronal system, such as minimization of the total energy of the system. The effect of depth-dependent sensitivity and spatial resolution was normalized using the standardized Low-Resolution Electromagnetic Tomography (sLORETA) method.

We used the Brodmann atlas in Brainstorm [47] to find cortical sources associated with visual areas V1 and V2 on the modelled cortical mesh (1227 points). We then averaged the response of these visual sources to obtain the visual induced field (VIF) for each trial. [...]

The time-frequency analysis is based on the continuous wavelet transform [48,49] [...] where "*" signifies the complex conjugate and X(t) is the analyzed MEG signal. The wavelet powers W(f_1, t) and W(f_2, t) given by Eq (1) [...] and the difference between the spectral energies at f_1 and f_2 was then calculated as [...] and normalized to its maximum absolute value as [...]

We averaged E_1 and E_2 over time and over all trials separately for the left-oriented (P_1^L and P_2^L) and for the right-oriented cube interpretations (P_1^R and P_2^R). The average spectra are shown in [...]

The performance µ characterizes the ability of the subject to concentrate voluntary attention. Similar to the voluntary case, wavelet power time series for both frequencies were evaluated from the VIF for involuntary perception. However, the time duration of the trials was increased to 120 s.
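As an illustration of the wavelet step, the following is a minimal sketch of how a normalized difference between the wavelet energies at two tag frequencies could be computed. It is not the authors' implementation: the complex Morlet wavelet, the `n_cycles=7` width, the unit-gain normalisation, and the names `morlet_power` and `normalized_energy_difference` are all assumptions introduced here, and the synthetic test signal merely stands in for a VIF time series.

```python
import numpy as np


def morlet_power(x, fs, freq, n_cycles=7.0):
    """Wavelet power |W(freq, t)|^2 using a complex Morlet wavelet.

    Assumed form: a complex exponential under a Gaussian envelope.
    For this wavelet psi(-t) equals conj(psi(t)), so convolving with psi
    is the same as correlating the signal with the complex conjugate.
    """
    sigma_t = n_cycles / (2.0 * np.pi * freq)            # envelope width (s)
    t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sum(np.abs(wavelet))                   # unit-gain normalisation
    coeffs = np.convolve(x, wavelet, mode="same")
    return np.abs(coeffs) ** 2


def normalized_energy_difference(x, fs, f1, f2):
    """Difference of wavelet energies at f1 and f2, scaled to max |value| = 1."""
    diff = morlet_power(x, fs, f1) - morlet_power(x, fs, f2)
    return diff / np.max(np.abs(diff))


# Synthetic stand-in for a VIF trace: f1 dominates first, then f2.
fs = 250.0                       # assumed sampling rate (Hz)
f1, f2 = 60.0 / 9.0, 60.0 / 7.0  # tag frequencies from the text
t = np.arange(0.0, 10.0, 1.0 / fs)
x = np.where(t < 5.0, np.sin(2 * np.pi * f1 * t), np.sin(2 * np.pi * f2 * t))

delta_e = normalized_energy_difference(x, fs, f1, f2)
```

In this toy signal the normalized difference is positive while the f_1 component dominates and negative afterwards, which is the behaviour the thresholding step relies on. The same function could be evaluated at the second harmonics instead, as the abstract indicates for the involuntary stage.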

Marking perception states
To determine the moments of switching between the two cube orientations, we propose a method based on wavelet power time series. In our approach, ∆E calculated by Eq (5) was screened for significant changes above a threshold equal to its standard deviation δ: [...] The active state was determined as left-oriented (Switch = 1) if ∆E > δ and as [...]

Frequency tagging has been successfully used in MEG research for studying the perception of ambiguous images [50]. Similarly, in our experiments on voluntary attention, we observed that when the subject endeavored to interpret the cube as left-oriented, the power spectrum at f_1 exhibited higher energy than at f_2, whereas for the right-oriented cube the contribution of f_2 was stronger. This can be seen in Fig 2, where we plot the power spectra averaged over all subjects during the left-oriented cube, the right-oriented cube, and the stationary cube (or background) without flickering.
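The thresholding rule described under "Marking perception states" can be sketched as follows. Only the left-oriented branch (Switch = 1 if ∆E > δ) is stated in the surviving text; the symmetric right-oriented branch (Switch = -1 if ∆E < -δ), the undecided value 0, and the helper names `mark_states` and `dominance_times` are assumptions made for illustration.

```python
import numpy as np


def mark_states(delta_e):
    """Label each sample as left (+1), right (-1) or undecided (0).

    The threshold is the standard deviation of delta_e, as in the text.
    The -1 branch is an assumed mirror image of the stated +1 rule.
    """
    delta_e = np.asarray(delta_e, dtype=float)
    delta = np.std(delta_e)
    states = np.zeros(delta_e.shape, dtype=int)
    states[delta_e > delta] = 1
    states[delta_e < -delta] = -1
    return states


def dominance_times(states, fs):
    """Durations (in seconds) of uninterrupted runs of each non-zero state."""
    durations = {1: [], -1: []}
    run_state, run_len = 0, 0
    for s in np.asarray(states).tolist():
        if s == run_state:
            run_len += 1
        else:
            if run_state != 0:
                durations[run_state].append(run_len / fs)
            run_state, run_len = s, 1
    if run_state != 0:  # close the final run
        durations[run_state].append(run_len / fs)
    return durations


# Toy example sampled at 1 Hz.
toy = np.array([1.0, 1.0, 1.0, 0.0, -1.0, -1.0, 0.0, 1.0])
toy_states = mark_states(toy)
toy_durations = dominance_times(toy_states, fs=1.0)
```

The run lengths returned by `dominance_times` are the kind of quantity from which a dominance time distribution, and hence its mean and mode, could be estimated.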

Hence, we expect the difference between the spectral powers corresponding to the left and the right cube orientations at f_1 (D_1) to be positive, or at least higher than the spectral power difference at f_2 (D_2), which should be either negative or at least lower than D_1. Furthermore, the difference between D_1 and D_2 would signify the subject's performance in voluntary attention (µ), that is, the ability to attend to both cube orientations, since voluntary attention is the cause of the perceived contrast between the attended and the unattended stimuli. Figure 3 shows typical spectral power difference time series for the left and right face frequencies during voluntary attention on the left and right cube faces.
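Written out with the averaged powers introduced earlier, the quantities in this paragraph take the following form. This is a reading of the prose, not a recovered equation: the proportionality sign reflects the assumption that the original definition of µ may include a normalization that is not available here.

```latex
D_1 = P_1^{L} - P_1^{R}, \qquad
D_2 = P_2^{L} - P_2^{R}, \qquad
\mu \propto D_1 - D_2 .
```

Under the expectation D_1 > 0 and D_2 < 0, the difference D_1 - D_2 is positive and grows as the spectral contrast between the two interpretations becomes stronger.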

The spectral difference was largely positive for the left face and negative for the right face, as predicted.

For all subjects, we found that D_2 was negative, whereas D_1 was positive for only 9 out of 11 subjects. In Fig 5 we [...]

The average values of dominance times for both orientations are similar (T_a1 = 4.097 s, T_a2 = 5.124 s), but curiously, the most probable or modal dominance time for the left orientation (T_m1 = 2.275 s) is much higher than for the right orientation (T_m2 = 0.424 s). This seems to suggest a bias in the perception of the two cube orientations. The same stimulation excites the left orientation more easily and frequently than the right orientation. One possible reason for the preference for the left cube orientation may be that European languages are read and written from left to right. This can explain why in our everyday practice we observe the left-oriented cube more often than the right-oriented cube, and hence why the perceptual stability of the left-oriented cube is higher than that of the right-oriented one. At the same time, it is also fair to acknowledge that in all the voluntary attention experiments that preceded the [...]

Interestingly, higher attention performance is associated with shorter dominance time. This can be explained by the hypothesis that higher attention requires a larger neuronal network to process information and make a decision, which in turn increases neural noise since a larger number of synapses and neurons are involved [51]. Finally, stronger brain noise causes more frequent switching between perceptual states or more frequent response selection (involuntary attention) and hence shorter dominance times. To check this hypothesis, we estimated brain noise using the methodology based on phase synchronization [46]. Namely, we measured the kurtosis of the probability distribution of the phase difference between the second harmonic of the flickering signal (f_1) and the visual induced field in the occipital cortex. In the right panel of Fig 7 we plot the average modal dominance time versus brain noise (in units of inverse kurtosis). As expected, these values anticorrelate; this supports our hypothesis that higher attention performance is associated with stronger brain noise because a larger neural network is involved in information processing.
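The brain-noise measure mentioned above (inverse kurtosis of the phase-difference distribution) can be sketched in a few lines. The sketch assumes that instantaneous phases of the flicker harmonic and of the VIF have already been extracted by some other means; the function names, the phase wrapping to [-π, π), and the use of non-excess (Pearson) kurtosis are assumptions, and the random samples only stand in for real phase data.

```python
import numpy as np


def wrap_phase(phi):
    """Wrap phase values to the interval [-pi, pi)."""
    return (np.asarray(phi, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi


def kurtosis(samples):
    """Pearson (non-excess) kurtosis: fourth central moment over variance squared."""
    samples = np.asarray(samples, dtype=float)
    centered = samples - np.mean(samples)
    m2 = np.mean(centered**2)
    m4 = np.mean(centered**4)
    return m4 / m2**2


def brain_noise(phase_stimulus, phase_response):
    """Inverse kurtosis of the wrapped phase difference (assumed noise measure)."""
    dphi = wrap_phase(np.asarray(phase_response) - np.asarray(phase_stimulus))
    return 1.0 / kurtosis(dphi)


# Stand-in data: a narrowly distributed (well-locked) phase difference
# versus a uniformly distributed (unlocked) one.
rng = np.random.default_rng(0)
n = 100_000
stimulus_phase = np.zeros(n)
locked = rng.normal(0.0, 0.3, n)
unlocked = rng.uniform(-np.pi, np.pi, n)

noise_locked = brain_noise(stimulus_phase, locked)
noise_unlocked = brain_noise(stimulus_phase, unlocked)
```

A flatter phase-difference distribution has lower kurtosis and therefore a larger inverse-kurtosis value, so in this sketch the unlocked case yields the higher noise estimate.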

In this work, we proposed novel approaches for estimating attention performance and classifying bistable perceived states, based on the wavelet transform of neurophysiological brain activity. The developed algorithm for bistable state classification can be useful for designing new noninvasive real-time brain-computer interfaces, owing to its fast computation and relative simplicity.