No Vergence Size Constancy

Abstract
Vergence (the angular rotation of the eyes) is thought to provide essential distance information for size constancy (perceiving an object as having a constant physical size). Evidence for this comes from the fact that a target with a constant retinal size appears to shrink as the rotation of the eyes increases (indicating that the target has reduced in distance). This reduction in perceived size is supposed to maintain a constant perception of physical size in natural viewing conditions by cancelling out the increasing size of the retinal image as an object moves closer. Whilst this hypothesis has been extensively tested over the last 200 years, it has always been tested in the presence of confounding cues such as a changing retinal image or cognitive cues to distance. Testing members of the public with normal vision, we control for these confounding cues and find no evidence of vergence size constancy.

Statement of Relevance
This work has important implications for the neural basis of size constancy and for multisensory integration. First, leading work on the neural basis of size constancy cannot differentiate between recurrent processes in V1 (based on triangulation cues, such as vergence) and top-down processing (based on pictorial cues). Since our work challenges the existence of vergence size constancy, and therefore much of the basis of the recurrent processing account, our work indicates that top-down processing is likely to have a much more important role in size constancy than previously thought. Second, vergence size constancy is thought to be largely responsible for the apparent integration of the retinal image with proprioceptive distance information from the hand in the Taylor illusion (an afterimage of the hand viewed in darkness appears to shrink as the observer’s hand is moved closer). This explanation for the Taylor illusion is challenged, and a cognitive account proposed instead.


Introduction
The retinal image is often ambiguous as to scale. It doesn't tell us whether what we are looking at is a small object up close or a large object far away. We therefore need an extra-retinal source of distance information to specify scale. In the early 17th century, Kepler (1604) and Descartes (1637) argued that 'vergence' (the angular rotation of the eyes) was used to specify scale, since the closer an object is, the more the eyes have to rotate to fixate upon it. This hypothesis has been consistently confirmed over the last two centuries (Wheatstone, 1852; Wallach & Zuckerman, 1963; Mon-Williams et al., 1997; Sperandio et al., 2013). However, previous experimental studies fail to control for a number of confounding cues. We found no evidence of vergence size constancy once these confounding cues were controlled for.
Sperandio et al. (2013) found subjects could achieve close to perfect size constancy from vergence alone. A doubling of distance (25cm to 50cm) led to roughly a doubling of the perceived size of an after-image for their representative participant (their Fig.5B), when vergence was changed gradually. However, like Mon-Williams et al. (1997) before them, they used an LED moving in depth to drive vergence. This is liable to introduce a number of additional depth cues, most notably retinal slip, as well as relative disparity between the LED and the after-image whose size participants were evaluating. We therefore sought to test the effect of vergence on perceived size whilst keeping the retinal image constant.

Methods
We achieved a constant retinal image using four innovations:
1. First, we had subjects fixate on a 3° target on a display for 5 seconds whilst their vergence changed from 50cm to 25cm. Preliminary experiments showed that, unlike an LED moving in depth, a 3° target did not produce subjectively noticeable retinal slip.
2. Second, we corrected for viewing distortions by rendering each eye's stimulus separately in OpenGL (Fig.S1).
3. Third, we eradicated all residual luminance by using an OLED display, filters, and a narrow viewing window. OLED displays give off no residual luminance, unlike CRTs, liquid crystal, or LED displays.
4. Fourth, we used a forced choice paradigm ("did the target get bigger or smaller?"), rather than ask subjects to compare their visual experience with a chart (Sperandio et al., 2013).
During each 5 second trial subjects viewed a 3° target and answered the question "did the target get bigger or smaller?". On each trial we introduced a variable amount of physical size change (between -20% and +20%) into the target. At the same time, the observer's vergence was changed from 50cm to 25cm, and the question was whether this change in vergence biased the point of subjective equality (the point at which subjects couldn't tell if the target got bigger or smaller).
Two targets were presented on a display at eye height. Parallax barriers were used to ensure that the left eye could only see the right target, and the right eye could only see the left target, ensuring that subjects saw a single fused target at 50cm. Vergence was changed from 50cm to 25cm over 5 seconds by increasing the separation between the targets on the display. Between trials vergence was reduced back to 50cm over 10 seconds whilst subjects viewed a fixation cross.
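The geometry of this manipulation can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the experiment: the interpupillary distance (6.3cm) and the display distance (100cm) are hypothetical values chosen for the example, and the function names are our own. It assumes symmetric fixation straight ahead and crossed viewing (each eye sees the target on the opposite side of the display).

```python
import math


def vergence_angle_deg(fixation_cm: float, ipd_cm: float = 6.3) -> float:
    """Vergence angle (degrees) for symmetric fixation at fixation_cm.

    ipd_cm is an assumed interpupillary distance, not a value from the study.
    """
    return math.degrees(2.0 * math.atan(ipd_cm / (2.0 * fixation_cm)))


def crossed_target_separation_cm(fixation_cm: float,
                                 display_cm: float,
                                 ipd_cm: float = 6.3) -> float:
    """Separation of the two on-screen targets for crossed fusion.

    Each eye's line of sight passes through the fixation point and then
    continues to the display, so the display must lie beyond fixation.
    display_cm is a hypothetical display distance.
    """
    if display_cm <= fixation_cm:
        raise ValueError("display must be further away than the fixation point")
    return ipd_cm * (display_cm - fixation_cm) / fixation_cm


if __name__ == "__main__":
    for d in (50.0, 25.0):
        print(d,
              round(vergence_angle_deg(d), 2),
              round(crossed_target_separation_cm(d, display_cm=100.0), 2))
```

Under these assumptions, halving the fixation distance roughly doubles the vergence angle and increases the on-screen separation of the targets, which is the direction of change described above.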
Vergence / accommodation conflict was kept within reasonable bounds (±1 dioptre, well within the 'zone of clear single binocular vision') by fitting observers with contact lenses. The participants' heads were fixed using a bite bar. The apparatus was covered in black-out fabric, both to exclude external light and to absorb any residual luminance from the display, and participants viewed the stimuli through a narrow viewing window. They wore a mask that blocked out any remaining peripheral light, with red filters as lenses for viewing the stimulus. External lighting was reduced to dim, and a hood of additional black-out fabric was pulled over their head and body to block out any residual external light. These extensive precautions ensured that the target was viewed in isolation and in complete darkness with no residual depth cues. A four-parameter psychometric function was estimated over 200 trials (10 sets of 20 trials).
Quest+ (Watson, 2017; Brainard, 2017) was used to specify the angular size change for each trial.
Pilot data from the author using a method of constant stimuli found that size changes below 1.5% could not be detected. We simulated the experiment in Quest+ and, given plausible assumptions about detection thresholds (5% size change) and lapse rate (2%), estimated that we could exclude any effects greater than 1.5% with 5 or more observers. 11 observers (8 female, 3 male; age range 20-34, average age 24.5) participated in the study: the author and 10 participants recruited online (13 were recruited; 2 were excluded because they couldn't get clear vision with the contact lenses, 1 because she saw the target as double). The author's participation was required to (a) confirm Quest+ mirrored the method of adjustment data, and (b) provide a criterion for the minimum effect size.
All other subjects were naïve as to the purpose of the experiment. The study was approved by the School of Health Sciences Research Ethics Committee at City, University of London in accordance with the Declaration of Helsinki.

Results
The results are illustrated in Fig.2. But before we discuss the results, it is worth considering what we might predict. Fig.2A illustrates the predictions for various degrees of vergence size constancy.
Because we are reducing the distance by half (50cm to 25cm), and because the angular size of the retinal image is inversely proportional to distance in natural viewing conditions (i.e. the angular size of the retinal image will double as the object moves from 50cm to 25cm), full vergence size constancy in natural viewing conditions (i.e. constant perceived size with distance) would require the vergence signal to halve the perceived size of the retinal image. Because the retinal image remains fixed in our experiment, this predicts that if we experience full vergence size constancy in natural viewing conditions, we would have to increase the physical size of the target in our experiment by 100% in order to maintain a constant perceived size (subjects at chance at determining whether the target got bigger or smaller). Full vergence size constancy is indicated in dark grey in Fig.2A. Degrees of partial vergence size constancy are also illustrated in lighter shades of grey. By contrast, if there were no vergence size constancy, we would expect subjects to be at chance at determining whether the target got bigger or smaller just when the physical size of the target didn't actually change. And this is exactly what we find, as illustrated in Fig.2A.
The individual results were each fitted with a Bayesian psychometric function using the Palamedes toolbox with the toolbox's standard priors (bias and slope: normal (0,100), upper and lower lapse rates: beta (1,10)). They show individual biases ranging from -2.2% to +1.2%, but clustered around 0. We then fitted a four-parameter logistic hierarchical Bayesian psychometric function to the data, which enables us to estimate the population level psychometric function with a multilevel model that takes into account the variability of each subject. This population level psychometric function is plotted in Fig.2A, with an average bias of -0.219% (95% CI: -1.82% to 1.39%) and average slope of -0.732 (95% CI: -1.07 to 0.378).
The estimate that particularly interests us is the bias, and so a probability density function of 15,000 posterior estimates is plotted in Fig.2B. To summarise, we found no statistically significant effect of vergence on perceived size (and the negligible bias we did find was in the wrong direction for size constancy).
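The predicted shifts in the point of subjective equality can be made concrete with a short calculation. The sketch below is our own illustration rather than the code behind Fig.2A: it assumes that perceived size is rescaled by the distance ratio raised to a "constancy" exponent (1 for full constancy, 0 for none), which is only one possible way of parameterising partial constancy, and the function name is hypothetical.

```python
def required_size_change_pct(d_start_cm: float,
                             d_end_cm: float,
                             constancy: float) -> float:
    """Physical size change (%) needed to cancel a vergence-driven rescaling.

    constancy = 1.0: perceived size scales with d_end / d_start (full constancy).
    constancy = 0.0: vergence has no effect on perceived size.
    Intermediate values use an assumed power-law interpolation.
    """
    perceived_scale = (d_end_cm / d_start_cm) ** constancy
    return (1.0 / perceived_scale - 1.0) * 100.0


if __name__ == "__main__":
    for k in (1.0, 0.5, 0.0):
        print(k, round(required_size_change_pct(50.0, 25.0, k), 1))
```

With full constancy the required change is +100%, matching the prediction described above, and with no constancy it is 0%, which is where the observed biases cluster.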

Figure 2A. Hierarchical Bayesian Model of Size Judgements.
To go beyond the negative claim that we found no effect of vergence on perceived size (null hypothesis not rejected) to the positive claim that there is no effect of vergence on perceived size (null hypothesis accepted), we can make two arguments. First, from a Bayesian perspective, we can compute a JZS Bayes factor (Rouder et al., 2009), and the estimated Bayes factor (3.99, ±0.03%) suggests that the data are four times more likely under the null hypothesis (bias = 0) than under the alternative (bias ≠ 0). Second, from a frequentist perspective, we can perform an inferiority test that tests whether any true vergence size constancy effect is at least as great as the smallest effect size of interest (Lakens et al., 2018). We define our smallest effect size of interest as the detection threshold for our most sensitive observer (1.43%). Put simply, any vergence size constancy effect that's smaller than a 1.43% size change won't be detected by any of our observers.
Since we have a directional hypothesis (vergence size constancy is positive, rather than negative), we perform an inferiority test by taking the 90% confidence interval of the population bias in the predicted direction (0.96%), and since it is smaller than 1.43%, from a frequentist perspective we can conclude that any vergence size constancy effect is effectively equivalent to zero.
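The decision rule of the inferiority test can be written out as a small sketch. It is not the analysis code used for the paper: the function names are our own, and the normal approximation in upper_bound_90 is an assumption included only to show where a one-sided bound of this kind could come from. The example call uses the two values reported above (0.96% and 1.43%) directly rather than re-deriving them.

```python
from statistics import NormalDist


def upper_bound_90(mean_pct: float, standard_error_pct: float) -> float:
    """Upper limit of a two-sided 90% interval under a normal approximation.

    This equals a one-sided 95% bound in the predicted (positive) direction.
    """
    z = NormalDist().inv_cdf(0.95)
    return mean_pct + z * standard_error_pct


def is_inferior(upper_bound_pct: float, sesoi_pct: float) -> bool:
    """True if the bound lies below the smallest effect size of interest."""
    return upper_bound_pct < sesoi_pct


if __name__ == "__main__":
    # Reported bound (0.96%) against the most sensitive observer's threshold (1.43%).
    print(is_inferior(0.96, 1.43))
```

The test only concludes in favour of "no meaningful effect" when the whole interval in the predicted direction lies below the smallest effect size of interest, so a wide interval from a small sample would leave the question open.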

Discussion
To our knowledge, this is the first study to report a failure of vergence size constancy at near distances. But ours is also the first study to control for (a) confounding perceptual cues (changes in the retinal image), whilst also controlling for (b) confounding cognitive cues (keeping subjects naïve about changes in absolute distance, unlike the Taylor illusion discussed below). These results have important implications for the neural basis of size constancy and for multisensory integration.
1. Neural Basis of Size Constancy: Ever since vergence responsive neurons were recorded in the monkey primary visual cortex (V1) (Trotter et al., 1992), it has been suggested that vergence processing in the visual cortex plays an important role in size constancy. Chen et al. (2019) recently demonstrated that size constancy takes ~150ms to evolve, suggesting that the vergence signal and the retinal image are not integrated during initial visual processing (~50ms). However, this finding is still consistent with vergence size constancy, which they previously reported in Sperandio et al. (2013). There is no question size constancy occurs in V1 (Murray et al., 2006; Sperandio et al., 2012), but Chen et al. (2019) are unable to differentiate between (a) recurrent processing within V1, and (b) top-down input from other visual areas to V1. The recurrent processing account is almost entirely dependent on vergence (their 8° stimulus is too small to bring vertical disparities significantly into play; Rogers & Bradshaw, 1995). So our results, which find a complete absence of vergence size constancy, significantly challenge the recurrent processing account, in favour of a top-down explanation (especially when combined with previous results from our lab that question vergence as an absolute distance cue: Linton, in press).

2. Multisensory Integration: When observers view an after-image of their hand in complete darkness, the after-image appears to shrink if they move their hand closer. This phenomenon (the Taylor illusion) has been wholly (Taylor, 1941; Mon-Williams et al., 1997) or largely (Sperandio et al., 2013) attributed to vergence size constancy. Our results contradict this account, so it is worth considering how else the Taylor illusion might be explained. It doesn't appear to be primarily an effect of proprioception on vision since the effect is dominated by the gaze direction, even when the gaze and hand are moved in opposite directions (Sperandio et al., 2013). Instead, what seems to be required is subjective knowledge about gaze direction, either from retinal or from proprioceptive cues (both of which were absent in our experiment). This suggests a changing cognitive interpretation of an unchanging visual percept, rather than the integration of vision and proprioception reported by Sperandio et al. (2013).

Experimental Apparatus
Fig.S1A provides a schematic of the apparatus, and Fig.S1C illustrates the viewing mask that subjects wore when inside the apparatus. The emphasis was on eliminating any retinal disparities from residual luminance in the apparatus, since we argue that this was the key confounding cue in many experimental tests of vergence size constancy. We achieved this in five ways:
1. We used an OLED display to present the stimuli. Unlike CRTs, LED displays, and liquid crystal displays, OLED displays do not produce residual luminance from black pixels. Piloting with CRTs and neutral density filters convinced us that this could not be achieved simply using a combination of CRTs and filters.
2. Subjects wore a mask to block out any residual light, and viewed the red stimuli through red filters that blocked out residual green and blue light (100% green, ~90% blue). In a preliminary experiment these were found to have no effect on accommodation (see Linton, in press).
3. Subjects viewed the stimuli through a narrow (17°) viewing window of 48cm x 18cm at a distance of 60cm.
4. The whole apparatus was covered by blackout fabric, subjects pulled a hood of blackout fabric over their heads, and the external lights were turned off.
5. We did not include eye-tracking for this specific reason. We share Quinlan & Culham (2007)'s concern that light from eye-trackers strays into the visible spectrum, introducing disparity cues. Other concerns included the fact that research eye-trackers "are not accurate enough to be used to determine vergence" (Hooge et al., 2019), that eye-tracking would be impractical given our use of parallax barriers, that calibration targets would be on the wrong focal plane, and that calibration would compromise the naivety of the subjects.
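The angular size of the viewing window in item 3 can be checked from its stated dimensions. The sketch below is only a sanity check with a hypothetical helper name, and it assumes that the quoted 17° refers to the 18cm dimension of the window viewed from 60cm.

```python
import math


def visual_angle_deg(extent_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(extent_cm / (2.0 * distance_cm)))


if __name__ == "__main__":
    print(round(visual_angle_deg(18.0, 60.0), 2))  # the 18cm dimension
    print(round(visual_angle_deg(48.0, 60.0), 2))  # the 48cm dimension
```

On this assumption the 18cm dimension subtends roughly 17°, consistent with the figure given in the list, whilst the 48cm dimension subtends a considerably larger angle.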

Distortion Correction
Most studies translate the stimulus horizontally on the screen with increasing vergence. But as Fig.S1A illustrates, this produces an incorrect retinal image (compare the horizontally translated stimulus with one that maintains a constant radius and orientation to the eye). To achieve a stimulus that maintains a constant radius and orientation to the eye, whilst displaying the stimulus on a flat fronto-parallel display, we rendered the stimulus for each eye separately in OpenGL. An incorrect, but helpful, way of conceptualising this is that a virtual eye, placed where the physical eye will be, projects the correct stimulus onto the display behind it, so that when the real eye views the projected image from the same location, the retinal image of the stimulus is correct.
Figure S2. OpenGL 'projection' to ensure a constant retinal image with eye rotation.
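The difference between translating the stimulus and projecting it can be illustrated with a simplified top-down (2D) sketch. This is not the OpenGL rendering code used in the experiment: it treats a single eye as a point, considers only the horizontal extent of the target, and uses hypothetical values for the eye position, display distance and gaze angle. All function names are our own.

```python
import math


def projected_edges_cm(eye_x_cm, display_cm, gaze_deg, target_deg=3.0):
    """Screen positions of the target edges when rays at gaze +/- half the
    target angle are projected from the eye onto a flat display."""
    half = math.radians(target_deg) / 2.0
    gaze = math.radians(gaze_deg)
    left = eye_x_cm + display_cm * math.tan(gaze - half)
    right = eye_x_cm + display_cm * math.tan(gaze + half)
    return left, right


def translated_edges_cm(eye_x_cm, display_cm, gaze_deg, target_deg=3.0):
    """Screen positions when a fixed-width stimulus is simply shifted so that
    its centre lies on the line of sight."""
    half_width = display_cm * math.tan(math.radians(target_deg) / 2.0)
    centre = eye_x_cm + display_cm * math.tan(math.radians(gaze_deg))
    return centre - half_width, centre + half_width


def subtended_angle_deg(eye_x_cm, display_cm, edges):
    """Angle (degrees) that the on-screen extent subtends at the eye."""
    left, right = edges
    return math.degrees(math.atan((right - eye_x_cm) / display_cm)
                        - math.atan((left - eye_x_cm) / display_cm))


if __name__ == "__main__":
    eye_x, display, gaze = -3.15, 100.0, 7.2  # hypothetical values
    for edges in (projected_edges_cm(eye_x, display, gaze),
                  translated_edges_cm(eye_x, display, gaze)):
        print(round(subtended_angle_deg(eye_x, display, edges), 3))
```

In this simplified geometry the projected stimulus subtends the intended 3° at any gaze angle, whereas the translated stimulus subtends slightly less as the eye rotates away from straight ahead, which is the kind of distortion the per-eye rendering is meant to remove.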

Accommodation
Contact lenses set accommodation at 3 dioptres to keep vergence / accommodation conflict within reasonable bounds (+/-1 dioptre). This is well within the zone of 'clear single binocular vision', and since some of the most dramatic reports of vergence size changes ("size changes … about threefold when convergence changed from about 0 deg to 25 deg"; Regan et al., 1986) have been in the presence of large vergence / accommodation conflicts (6.5 dioptres in Regan et al., 1986), permitting +/-1 dioptre seemed reasonable. We used contact lenses rather than trial lenses to avoid off-axis distortions / magnifications affecting our results.
Contact lenses were fitted by a registered UK optometrist, and each observer was given an anterior eye health check with fluorescein sodium both before and after the experiment. Contact lens strength was based on the subject's valid UK prescription.
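The +/-1 dioptre bound follows from simple arithmetic, sketched below with hypothetical helper names. The vergence demand in dioptres is taken as the reciprocal of the vergence distance in metres, and the conflict is its difference from the 3 dioptre accommodation set by the contact lenses.

```python
def dioptres(distance_cm: float) -> float:
    """Convert a distance in centimetres to dioptres (1 / metres)."""
    return 100.0 / distance_cm


def conflict_dioptres(vergence_distance_cm: float,
                      accommodation_d: float = 3.0) -> float:
    """Vergence demand minus the accommodation set by the contact lenses."""
    return dioptres(vergence_distance_cm) - accommodation_d


if __name__ == "__main__":
    for d in (50.0, 25.0):
        print(d, conflict_dioptres(d))
```

At the two ends of the vergence range (50cm and 25cm) the vergence demand is 2 and 4 dioptres respectively, so the conflict with a 3 dioptre accommodative state is -1 and +1 dioptre.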

Recruitment
The experiment was simulated 10,000 times in Quest+, assuming a normal distribution, a bias of 0, and reasonable assumptions for the standard deviation (7.5) and upper and lower asymptote (0.02). For an inferiority test we want to exclude any effect greater than +1.5% (detection thresholds) from the 90% confidence intervals. The simulation illustrated that we could achieve this level of precision with 5 or more observers. We recruited 15 participants (the maximum number our budget permitted given optometrists and subjects were compensated for their time).
Subjects were paid £15/hr for 3 hours (£45 total). Subjects were recruited using an advertisement posted on callforparticipants.com and inclusion criteria were (a) age between 18 and 35, (b) valid UK prescription with no more than +/-5D (spherical) and 0.5D of astigmatism, (c) accommodation within normal bounds for age (tested with a RAF near-point rule), (d) vergence within normal bounds (18D or above on a Clement Clarke prism bar), (e) stereoacuity within normal bounds (60 arc secs or less on a TNO stereo test), and (f) could achieve clear single binocular vision in the apparatus. One participant was excluded for failing the stereoacuity test, one participant was excluded for an inability to achieve single vision within the apparatus, and two participants were excluded for an inability to achieve clear vision within the apparatus.
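The observer model assumed in the simulation at the start of this section can be sketched as follows. This is a minimal illustration, not the Quest+ simulation itself: it implements only a cumulative-normal psychometric function with the stated bias (0), standard deviation (7.5) and asymptotes (0.02), and draws simulated responses at fixed size changes rather than using adaptive stimulus selection. The function names and the chosen stimulus levels are our own.

```python
import random
from statistics import NormalDist


def p_bigger(size_change_pct, bias=0.0, sd=7.5, lower=0.02, upper=0.02):
    """Probability of a 'bigger' response for a given physical size change (%).

    Cumulative normal squeezed between a lower asymptote and (1 - upper).
    """
    core = NormalDist(mu=bias, sigma=sd).cdf(size_change_pct)
    return lower + (1.0 - lower - upper) * core


def simulate_trials(levels, n_per_level, seed=0, **params):
    """Count simulated 'bigger' responses at each size change.

    Uses fixed levels for illustration; Quest+ would choose each level adaptively.
    """
    rng = random.Random(seed)
    return {x: sum(rng.random() < p_bigger(x, **params)
                   for _ in range(n_per_level))
            for x in levels}


if __name__ == "__main__":
    counts = simulate_trials([-20, -10, 0, 10, 20], n_per_level=40)
    for level, n_bigger in counts.items():
        print(level, n_bigger)
```

With a bias of 0 and equal asymptotes the function passes through 0.5 at zero size change, so any vergence size constancy effect would appear as a horizontal shift of this curve.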

Experimental Protocol
Subjects' head position was fixed with a bite-bar. They closed their left eye and rotated the right metal plate until they could see one clear target in their right eye. They then closed their right eye and rotated the left metal plate until they could see one clear target in their left eye. They then opened both eyes and confirmed they could see one clear single target. Once this was confirmed, they completed a set of 20 trials, before they took a break and confirmed the targets had been clear and single during the set.