Working memory signals in early visual cortex do not depend on visual imagery


stimuli were followed by a numeric retro-cue ("1" or "2"), indicating which one of them was to be used for the subsequent delayed-estimation task ("target"), and which could be dropped from memory ("distractor"). The orientation of the cued target grating had to be maintained for a 10-second delay. After the delay a probe grating appeared, which had to be adjusted using two buttons and then confirmed via an additional button press. Subsequently, visual feedback was given indicating whether a response was given in time (by turning the fixation

(C) Detailed correlation between delay-period accuracy (BFCA) and visual imagery score. There was no significant correlation between the strength of delay-period representations and imagery vividness even when using the full graded imagery scores (shaded area: 95 % confidence interval). Neural information during the delay period was significantly above chance level even for aphantasic individuals with a visual imagery score below 32 (grey bar at x-axis; t(4) = 8.758, p < 0.001, one-tailed, E.A.). The arrow on the x-axis points to the aphantasia cutoff.

We also conducted several checks to test for other predictions of our analysis. First, we reconstructed the orientation of the distractor, i.e., the task-irrelevant orientation stimulus that [...] over our original prediction that the early visual cortex signal of strong imagers should contain more information about the stimulus (BF01 = 5.275).

To further corroborate the effect, we assessed the possibility that the effect of imagery vividness is more gradual in nature and thus might not be captured by the categorical group difference. To address this, we calculated the correlation between delay-period accuracies and graded imagery vividness scores. Again, the result was not significant (Figure 3C; r = -0.256, p = 0.11), with strong evidence for the absence of a positive correlation (BF01 = 12.442). There was also no relationship between working memory signals and any of the post-scan imagery assessments (see Table S1). Note that delay-period accuracy was significantly greater than chance level even for the five participants with a visual imagery score below 32 (marked with a grey bar on the x-axis of Figure 3C; one-sample t-test: t(4) = 8.758, p < 0.001, one-tailed; E.A.), which is generally considered the threshold for aphantasia (Zeman et al., 2015). Taken

Finally, we tested a further prediction that would be expected if strong imagers relied more on sensory information encoded in early visual cortex than weak imagers. In that case, there should be a tighter predictive link between behavioral performance and the encoding of information in early visual areas, especially for strong imagers. For this, we assessed whether there was more performance-predictive information in early visual areas of strong imagers. In

In this study, we investigated to what extent an individual's visual imagery vividness affects the strength of working memory representations in their visual cortex. Two experimental groups, strong and weak imagers, performed a visual working memory task, which involved memorizing images of oriented lines over a delay. In both groups we found that early visual cortex contained robust information about the remembered orientations across the entire delay period. Importantly, the level of this information did not differ between strong and weak imagery groups. There was also no apparent dependency of visual cortex representations on any other [...] imagery vividness. Crucially, even the five participants with a VVIQ score below 32, which is generally considered the threshold for complete absence of phenomenal imagery ("aphantasia"; Zeman et al., 2015), showed comparable visual neural information to the strong imagers (see Figure 2C). Our results therefore show that working memory signals can be present in early visual cortex even in the (near) absence of phenomenal imagery.

While working memory signals in early visual cortex were not modulated by imagery vividness, we did observe a strong correlation between encoded information and individual behavioral precision. Moreover, the overall strength of this effect was also indistinguishable between imagery groups. This suggests that the sensory information represented in early visual cortex was equally important for strong and weak imagers to successfully guide behavior. We thus find no evidence for differences between strong and weak imagers, neither in the encoding of sensory information nor in the degree to which this information is predictive of behavior. These [...] addressing the effects of individual differences. One study found that the overlap between [...] working memory signals do not seem to depend on imagery vividness is not in direct contradiction to these previous decoding studies.

Importantly, our study was specifically designed to assess the neural encoding of working memory contents, not the neural representations of imagery. If working memory signals in early visual areas were to exclusively reflect imagery, one would predict these working memory signals both to be modulated by imagery ability and to be completely absent for individuals without phenomenal imagery (aphantasics). Our results show that neither is the case.

Please note that we are not claiming that visual imagery and visual working memory are never [...] visual cortex (Bartolomeo, 2008). As a consequence, orientation-specific signals could be maintained in early visual cortex, but weak imagers might not be able to access them to produce phenomenal imagery. On this basis, one could speculate that the weak imagers in our case might have had a deficit in a (potentially temporal) imagery network, whereas working memory performance is based on sensory information that is largely intact. Early visual information would thus be available to solve the working memory task but would not necessarily lead to the experience of imagery. Importantly, however, this is at odds with a large body of behavioral, neuroimaging and brain-stimulation work which suggests a close link

After detrending, we applied temporal smoothing to the data by running a moving average with a width of 3 TRs across the data of each run.
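The temporal smoothing step can be illustrated with a minimal sketch. It assumes each run is stored as a NumPy array of shape (time points, voxels); the function names and the handling of run edges (a window truncated at the first and last time point) are assumptions made for illustration and are not specified in the text.

```python
import numpy as np

def moving_average_run(run_data, width=3):
    """Centered moving average over time for a single run.

    run_data: array of shape (n_timepoints, n_voxels).
    Assumption: at the run boundaries the window is truncated to the
    samples that exist, so no data from other runs is mixed in.
    """
    run_data = np.asarray(run_data, dtype=float)
    n_t = run_data.shape[0]
    half = width // 2
    smoothed = np.empty_like(run_data)
    for t in range(n_t):
        lo, hi = max(0, t - half), min(n_t, t + half + 1)
        smoothed[t] = run_data[lo:hi].mean(axis=0)
    return smoothed

def smooth_all_runs(runs, width=3):
    """Smooth each run separately so that no window spans two runs."""
    return [moving_average_run(run, width) for run in runs]

# Toy run with 5 time points and 2 voxels
example_run = np.arange(10.0).reshape(5, 2)
example_smoothed = moving_average_run(example_run)
```

Smoothing run by run keeps the averaging window from bridging the discontinuity between the end of one run and the start of the next.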

To increase the signal-to-noise ratio for samples from trials with neighboring stimulus orientations, we developed a method that we refer to as "feature-space smoothing". Feature-

After preprocessing, we entered the data into the same pSVR reconstruction analysis as the fMRI data, using the x and y coordinates of the gaze position as input instead of voxel signal, and evaluated the reconstruction by calculating the BFCA. As with the fMRI data, we tested for clusters of above-chance time points using the cluster-based t-mass permutation approach described above.

Feature-space smoothing simulation

To demonstrate how feature-space smoothing can increase signal-to-noise ratio (SNR) and reconstruction accuracy in a continuous reconstruction setting, we simulated fMRI data with varying levels of SNR and applied different levels of feature-space smoothing before reconstruction.
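As a rough illustration of the idea, the sketch below applies a Gaussian-weighted average across samples that lie close together in orientation space and checks the effect on simulated noisy data. It is not the authors' implementation: the function name, the use of circular distance on the 180° orientation space, the sinusoidal voxel tuning, and the sample and voxel counts are all assumptions chosen for the example.

```python
import numpy as np

def feature_space_smooth(X, theta_deg, fwhm_deg, period=180.0):
    """Gaussian-weighted average over samples that are close in feature space.

    X: array (n_samples, n_voxels); theta_deg: orientation label per sample.
    Assumption: distances are circular with a period of 180 degrees.
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta_deg, dtype=float)
    if fwhm_deg <= 0:
        return X.copy()  # FWHM of 0 degrees means no smoothing
    sigma = fwhm_deg / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    diff = np.abs(theta[:, None] - theta[None, :]) % period
    dist = np.minimum(diff, period - diff)      # circular distance in degrees
    weights = np.exp(-0.5 * (dist / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ X

# Hypothetical simulation: sinusoidally tuned voxels plus Gaussian noise
rng = np.random.default_rng(0)
n_samples, n_voxels = 320, 20
theta = rng.uniform(0.0, 180.0, n_samples)
phase = rng.uniform(0.0, 2.0 * np.pi, n_voxels)
signal = np.cos(np.deg2rad(2.0 * theta)[:, None] + phase[None, :])
noisy = signal + rng.normal(0.0, 1.0, signal.shape)

smoothed = feature_space_smooth(noisy, theta, fwhm_deg=30.0)
mse_raw = np.mean((noisy - signal) ** 2)
mse_smooth = np.mean((smoothed - signal) ** 2)
```

In a cross-validated analysis the function would be called on training and test samples separately, so that no information passes between the two sets.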

Following the specifics of our experiment, we simulated data comprising 8 runs with 40 trials [...]

The distribution of behavioral responses was modeled as a combination of three model components: detections (responses to target orientations, assumed to follow a von Mises distribution with mean 0° plus bias μ and precision κ; green), swap errors (responses to distractor orientations, following the same assumptions as detections; purple) and guesses (assumed to follow a continuous uniform distribution between -90° and +90°; red). These components were weighted by individual event probabilities (mixture coefficients). Participants correctly responded to the target direction in 94.7 % of trials (detection probability = 0.947 ± 0.063), and only infrequently made swap errors (swap probability = 0.026 ± 0.034) or guesses (guess probability = 0.027 ± 0.041). Responses to targets were precise (precision = 5.673 ± 2.377), while responses to the distractor, where present, were imprecise (precision = 1.735 ± 2.41). There was a small but significant bias to respond anti-clockwise of the target (μ = -0.889 ± 1.635°; t(39) = -3.437, p = 0.0014, two-tailed; see also [...])

There were no temporal clusters with significantly above-chance BFCA, suggesting that participants did not systematically use gaze position to maintain target orientation across the delay period. Shaded areas indicate 95 % confidence intervals.
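The three-component model of behavioral responses described above (detections, swap errors and guesses) can be written as a mixture density over the response error. The sketch below is one possible formulation under stated assumptions: errors in degrees are doubled to map the 180° orientation space onto the full circle, swap errors are centered on the distractor's offset from the target plus the same bias, and all function and parameter names are hypothetical. The group-mean values reported above are plugged in only as an example.

```python
import numpy as np

def von_mises_deg(err_deg, mean_deg, kappa):
    """Von Mises density over orientation error in degrees (period 180)."""
    phi = np.deg2rad(2.0 * (np.asarray(err_deg, dtype=float) - mean_deg))
    # Density per degree: exp(kappa * cos(phi)) / (180 * I0(kappa))
    return np.exp(kappa * np.cos(phi)) / (180.0 * np.i0(kappa))

def mixture_density(err_deg, distractor_offset_deg, p_target, p_swap,
                    p_guess, mu_deg, kappa_target, kappa_swap):
    """Weighted sum of detection, swap-error and guess components."""
    detect = von_mises_deg(err_deg, mu_deg, kappa_target)
    # Assumption: swap errors are centered on the distractor plus the bias
    swap = von_mises_deg(err_deg, distractor_offset_deg + mu_deg, kappa_swap)
    guess = np.full_like(np.asarray(err_deg, dtype=float), 1.0 / 180.0)
    return p_target * detect + p_swap * swap + p_guess * guess

# Example with the reported group means and a hypothetical distractor at +45 deg
step = 0.01
grid = np.arange(-90.0, 90.0, step)
density = mixture_density(grid, 45.0, 0.947, 0.026, 0.027,
                          -0.889, 5.673, 1.735)
total_probability = density.sum() * step
peak_location = grid[np.argmax(density)]
```

Because the three weights sum to one and each component integrates to one over the 180° range, the mixture is itself a proper density, which is what makes the weights interpretable as event probabilities.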

Figure S3. Schematic representation of feature-space smoothing and simulation results. (A) We used a Gaussian smoothing kernel to compute a weighted average from the voxel signal of samples lying closely together in feature space. Samples close to a given orientation in feature space therefore contribute more to the resulting average than those further away. The full width at half maximum (FWHM) of the smoothing kernel controls the smoothing range, i.e., the number (or distance) of samples that are included in the weighted average. We used FWHM values between 0° (no smoothing) and 90° in steps of 10° and determined the optimal kernel width for each participant via nested cross-validation across subjects. Note that this was done (a) at the level of the input data to the analysis, not the results, (b) for training and test data separately, and (c) was confirmed not to produce artifacts or spurious results by extensive simulations (see (C) and Extended Methods). (B) We simulated data with varying levels of SNR and used feature-space smoothing with different kernel widths (measured as FWHM in degrees) before reconstruction of the underlying signal. The plot shows BFCA for all parameter combinations, averaged across 1000 repetitions. (C) BFCA across smoothing levels, for the pure noise condition. BFCA remained at chance level across all levels of smoothing (all p > 0.25) and BFCA for any smoothing condition did not differ from the no-smoothing condition (all p > 0.15). (D) BFCA gain compared to no smoothing, averaged across all 1000 repetitions. The first column corresponds to baseline, i.e., zero smoothing. In the signal conditions (SNR > 0), feature-space smoothing was able to reliably increase BFCA compared to no smoothing. The effect was strongest for smoothing kernel widths between 30° and 170°, where we observed increases in accuracy of up to 20 %. Generally, the effect of feature-space smoothing was stronger for data with low SNR (orange-yellow area). In cases of extremely high kernel width and comparatively high SNR (i.e., SNR > 0.6 and FWHM > 220°), feature-space smoothing had a detrimental effect, meaning that BFCA was decreased compared to no smoothing (dark blue area). Please note, however, that kernel widths this high are not meaningful for real-world applications and were only included for the purpose of demonstration. We conclude that feature-space smoothing is a powerful preprocessing technique to increase SNR in a feature-continuous reconstruction setting. As the optimal kernel width for smoothing depends on the specific data and SNR, we recommend using nested cross-validation to determine the optimal FWHM value, similar to the approach described in the main text.

Figure S4. Schematic representation of periodic support vector regression (pSVR). The aim of our reconstruction analysis was to predict an angular label between 0° and 180° from the multivariate voxel signal in response to a stimulus grating with the respective orientation. However, the linear scale of orientation labels (from 0° to 180°) does not reflect the periodic nature of the stimulus (i.e., 0° and 180° are identical). To account for this, we projected the angular labels into a periodic space by fitting two sinusoids into the range [0, 180). Both functions had an amplitude of 1 and a period of 180°, so that one period spanned the entire label space. One function was shifted by 45°, so that the combination of both periodic functions coded for the linear label scale. This is equivalent to the way sine and cosine functions between 0° and 360° code for the angles on a unit circle. We trained and tested a multivariate SVR model for both periodic label sets (x, y) separately. From the combination of the predicted periodic labels, we then reconstructed a predicted angular label using the four-quadrant inverse tangent. The predicted orientation was then compared to the true orientation to derive BFCA, our measure of reconstruction accuracy.
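The label projection and reconstruction steps of pSVR can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the voxel data are simulated, and ordinary least squares stands in for the support vector regression so that the example depends only on NumPy.

```python
import numpy as np

def encode_orientation(theta_deg):
    """Project labels in [0, 180) onto two sinusoids with a period of 180 deg.

    cos(2*theta) equals sin(2*(theta + 45 deg)), i.e. the second function
    is the first one shifted by 45 degrees.
    """
    ang = np.deg2rad(2.0 * np.asarray(theta_deg, dtype=float))
    return np.sin(ang), np.cos(ang)

def decode_orientation(x_pred, y_pred):
    """Recover an orientation in [0, 180) via the four-quadrant arctangent."""
    ang = np.arctan2(x_pred, y_pred)
    return (np.rad2deg(ang) / 2.0) % 180.0

def fit_predict_linear(X_train, t_train, X_test):
    """Least-squares regression with intercept (stand-in for SVR)."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    w, *_ = np.linalg.lstsq(A, t_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ w

def circular_error(a_deg, b_deg):
    """Absolute orientation difference, at most 90 degrees."""
    d = np.abs(np.asarray(a_deg) - np.asarray(b_deg)) % 180.0
    return np.minimum(d, 180.0 - d)

# Hypothetical simulated voxels whose signal is linear in the two sinusoids
rng = np.random.default_rng(1)
theta_train = rng.uniform(0.0, 180.0, 200)
theta_test = rng.uniform(0.0, 180.0, 100)
mixing = rng.normal(size=(2, 50))

def simulate(theta):
    s, c = encode_orientation(theta)
    clean = np.column_stack([s, c]) @ mixing
    return clean + 0.1 * rng.normal(size=clean.shape)

X_train, X_test = simulate(theta_train), simulate(theta_test)
x_lab, y_lab = encode_orientation(theta_train)
# One regression model per periodic label set, trained separately
x_hat = fit_predict_linear(X_train, x_lab, X_test)
y_hat = fit_predict_linear(X_train, y_lab, X_test)
theta_hat = decode_orientation(x_hat, y_hat)
mean_error = circular_error(theta_hat, theta_test).mean()
```

Predicting the two sinusoidal labels separately and combining them only at the decoding stage means that orientations just below 180° and just above 0° are treated as neighbors, which a regression on the raw linear label would not achieve.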
Table of correlation coefficients between all variables of interest, including the items from the heuristic strategy questionnaire. There are two notable sets of relationships: the strong correlation between target reconstruction accuracy ("BFCA target") and behavioral precision ("Behav. precision"), and the close relationship between pre- and post-scan VVIQ (i.e., test-retest reliability) and the visual OSIQ scores. There are some significant effects between several variables and items from the strategy ("Strat.") questionnaire. Please note, however, that these questions were purely heuristic in nature. We only asked for each strategy in rather general terms and did not ask for the vividness of each strategy. The questions were not based on any previously validated procedure, in contrast to the established VVIQ and OSIQ scales. Also, the ratings on these items have high variance, rendering any interpretation difficult. We are currently not aware of any established and standardized sets of questions regarding the use of cognitive strategies.