Idiosyncratic Perception: A Link Between Acuity, Perceived Position, and Apparent Size

Zixuan Wang; Yuki Murai; David Whitney

doi:10.1101/2020.06.09.143081

Abstract

Perceiving the positions of objects is a prerequisite for most other visual and visuomotor functions, but human perception of object position varies from one individual to the next. The source of these individual differences in perceived position and their perceptual consequences are unknown. Here, we tested whether idiosyncratic biases in the underlying representation of visual space propagate across different levels of visual processing. In Experiment 1, using a position matching task, we found stable, observer-specific compressions and expansions within local regions throughout the visual field. We then measured Vernier acuity (Experiment 2) and perceived size of objects (Experiment 3) across the visual field and found that individualized spatial distortions were closely associated with variations in both visual acuity and apparent object size. Our results reveal idiosyncratic biases in perceived position and size, originating from a heterogeneous spatial resolution that carries across the visual hierarchy.

Accurately registering the locations of objects is a critical visual function. Most other perceptual functions, including pattern and object recognition, as well as visually guided behavior, hinge on first localizing object positions. Position perception is generally assumed to be dictated by retinotopic location, and that may explain much of the variance in perceived position. However, perceived position can be biased by various external factors, such as overt attention [1], motion [2] and saccadic eye movements [3]. The impact of these factors can be significant, especially considering the spatial scale at which object recognition and visually guided action happen. A 0.5-degree shift in the perceived location of a pedestrian or car crossing a freeway could result in a catastrophic collision. The scale at which perception and action need to operate is often very fine, and many factors bias perceived position at a scale that is behaviorally relevant.

In the absence of these external factors, perceived position is often assumed to be uniformly dictated by retinotopic position. However, a recent study challenges this belief and demonstrates that people mislocalize objects idiosyncratically and consistently even without apparent change in the environment [4]. The unique biases in object locations were shown to be stable across time when tested after weeks or months, indicating a stable perceptual fingerprint of object location.

Why do people perceive idiosyncratically biased object locations in different parts of the visual field and what are the perceptual consequences of it? Here, we test the possibility that variations in spatial resolution across the visual field might cause the spatial distortions in perceived position. Many researchers have shown that visual acuity varies across the visual field [5-7]. Because many models of localization depend implicitly or explicitly on the underlying resolution and homogeneity of spatial coding [1, 8-10], it is conceivable that the inhomogeneity in visual acuity could result in an inhomogeneous visual space representation, consisting of areas of contraction (sinks) and expansion (sources). Mislocalization would be one of the natural perceptual consequences of these inhomogeneities.

A further prediction is that if individual observers have inhomogeneous visual acuity and consequently distorted representations of visual space, the biases might be carried through the visual system so that object representations and appearance also vary in a predictable and related way. To test this, we also measured whether the perceived size of objects varies at individualized perceptually contracted or expanded regions of visual space.

Experiment 1: Idiosyncratic Visual Space Distortion

Kosovicheva and Whitney [4] demonstrated that observers have stable and idiosyncratic patterns of mislocalization at different polar angles in the visual field. We hypothesized that this mislocalization pattern reflects distinct distortions of visual space and that it should be observed from the fovea to the periphery (not just at one eccentricity). The purpose of Experiment 1 was to identify whether there are idiosyncratic spatial distortions across the visual field.

Method

Participants

Nine observers (3 females, 2 authors, age range: 19–33) participated in this experiment. All subjects were experienced psychophysical participants, and all but the two authors were naïve to the purpose of the study. All subjects reported having normal or corrected-to-normal vision. Procedures were approved by the Institutional Review Board at the University of California, Berkeley.

Stimuli

Stimuli were presented on a 19-inch gamma-corrected Dell P991 CRT monitor (Dell, Round Rock, TX; 1024 × 768 pixels resolution, 100 Hz refresh rate). To minimize any off-screen reference (i.e., any visible references outside of the computer monitor including the difference between the monitor frame and the experiment room), the monitor frame was covered by black tape. Visual stimuli were generated using MATLAB (The MathWorks, Natick, MA) and Psychophysics Toolbox (Version 3) [11] and the experiment program was run on an Apple Macintosh computer (Apple Inc., Cupertino, CA). Observers viewed the stimuli binocularly at a distance of 40 cm using a chin rest.

We used noise patches as targets for the localization task (see Figure 1a). Each noise patch contained random black (< .001 cd/m², measured by Minolta LS110 Luminance Meter) and white (92.6 cd/m²) squares (each square was 0.1 × 0.1 degrees of visual angle [d.v.a.]). Noise patches were enveloped with a two-dimensional Gaussian contrast aperture (standard deviation: 0.75) and were only visible within a circular aperture with a radius of 1.22 d.v.a. Noise patches were shown on one of 5 invisible isoeccentric rings (eccentricity: 2, 4, 6, 8, 10 d.v.a.) and at one of 48 angular positions equally distributed with a separation of 7.5 degrees (°) on each ring, which resulted in a total of 240 possible locations. Angular locations from 0° to 360° correspond to positions starting from the right of fixation and moving clockwise. The four exactly vertical and horizontal positions at each eccentricity were included.
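The layout of target locations described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' MATLAB code: the function name `target_locations` and the screen convention (x to the right, y upward, so a clockwise angle gets a negative sine) are assumptions made here.

```python
import numpy as np

ECCENTRICITIES = np.array([2, 4, 6, 8, 10])  # d.v.a., as stated in the text
N_ANGLES = 48                                # angular positions per ring
ANGLE_STEP = 360.0 / N_ANGLES                # 7.5 degrees of polar angle


def target_locations():
    """Return an array of shape (240, 4): eccentricity, polar angle, x, y.

    Angles start at the right of fixation and increase clockwise.
    Assumption: y points upward, so clockwise angles use a negative sine.
    """
    angles = np.arange(N_ANGLES) * ANGLE_STEP
    ecc, ang = np.meshgrid(ECCENTRICITIES, angles, indexing="ij")
    rad = np.deg2rad(ang)
    x = ecc * np.cos(rad)
    y = -ecc * np.sin(rad)
    return np.column_stack([ecc.ravel(), ang.ravel(), x.ravel(), y.ravel()])


locs = target_locations()
```

Because 360 is divisible by the 7.5° step and the first position sits at 0°, the grid automatically includes the four cardinal positions (0°, 90°, 180°, 270°) on every ring.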

Figure 1.

Experimental paradigm and results for Experiment 1. (a) Left: Observers fixated at the center and a target was displayed briefly at one of five possible eccentricities (depicted by dashed lines, which were not visible in the experiment). Right: After the target disappeared, observers moved the cursor to match the target's location. (b) All nine individual observers' spatial distortion indices plotted as distortion maps. The color gradient represents the degree of distortion, with blue indicating contracted visual space and red indicating expanded visual space. See Supplemental Figure S1 for gray-scale luminance-defined distortion maps. (c) Averaged within- vs. between-subject correlation calculated by bootstrap procedures (see Experiment 1, Method). There was significantly higher within-observer agreement than between-observer agreement, indicating that each observer had a unique pattern of spatial distortions. The error bars represent the bootstrapped 95% CI.

Procedure

Observers were instructed to maintain fixation throughout the experiment. On each trial, a black (< .001 cd/m²) 0.3-d.v.a. diameter fixation dot was presented at the center of a gray (48.3 cd/m²) background on the screen. After 1000 milliseconds (ms), a noise patch appeared at a pseudo-randomly chosen location among all 240 locations for 50 ms. Upon the offset of the noise patch, the fixation dot changed to dark gray (30.4 cd/m²) and 500 ms later, a white (92.6 cd/m²) 0.45-d.v.a. diameter response dot representing the location of the cursor was superimposed on the fixation and participants freely moved the dot using the mouse to match the location of the noise patch center. The position of the cursor when participants clicked the mouse was recorded as the reported location.

In the experiment, each target location was tested 12 times, so there were 2880 trials in total (12 repetitions × 48 angular locations × 5 eccentricities). The whole experiment was separated into 6 sessions (2 repetitions for every location per session, in random order within each session). The time interval between consecutive sessions was on average 1.3 days (standard deviation: 1.6 days).

Data Analysis

For each observer and each session, we first calculated the polar angles of the cursor locations reported by the observer. Then for each target location, the two reported locations from the two trials were averaged within each session. Thus, there were 6 sessions of averaged reported locations. Within each session, the average locations were grouped by 5 eccentricities with each consisting of 48 isoeccentric reported angular locations.

For each session and at each eccentricity, the 48 reported locations were transformed into 48 visual space distortion indices. Since any two physically adjacent target locations at the same eccentricity were separated by 7.5° polar angle, if the angular distance between the two adjacent reported locations in the same session was larger than 7.5°, then the area between them was effectively a region of expanded visual space. On the other hand, if the distance was smaller than 7.5°, the visual space between them was effectively compressed. Thus, subtracting the physical target distance (i.e., 7.5° polar angle) from the reported distance (i.e., the difference in perceived locations) yielded a visual space distortion index in degrees of polar angle for each location. A more positive index indicates greater visual space expansion and a more negative index indicates greater visual space compression. Zero means no distortion. The resulting 48 distortion indices at each eccentricity within each session were smoothed using a simple moving average with a window of 45° polar angle. The smoothing served to better characterize a continuous change of distortion across space and to compensate for the discrete spatial sampling, given that only 48 locations were tested at each eccentricity. The smoothed distortion indices were averaged across the six sessions at each location (shown as individual distortion maps in Fig. 1b).
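The index computation described above can be illustrated with a minimal Python sketch for a single ring of 48 reported angles. Several details are assumptions made here rather than taken from the text: each index is assigned to the first location of the adjacent pair, the 45° window is treated as 6 samples (45 / 7.5), and the moving average wraps around the ring. The function names are hypothetical.

```python
import numpy as np

STEP = 7.5  # physical separation of adjacent targets, degrees of polar angle


def circular_diff(a, b):
    """Signed angular difference a - b, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def distortion_indices(reported_deg, window_deg=45.0):
    """Raw and smoothed distortion indices for one ring.

    reported_deg: 48 reported polar angles, ordered by target angle.
    """
    reported = np.asarray(reported_deg, dtype=float)
    # Reported distance between each location and its next neighbour,
    # wrapping from the last location back to the first.
    gaps = circular_diff(np.roll(reported, -1), reported)
    raw = gaps - STEP  # > 0: expansion, < 0: compression
    # Circular moving average (assumption: window of 45 / 7.5 = 6 samples).
    k = int(round(window_deg / STEP))
    kernel = np.ones(k) / k
    padded = np.concatenate([raw[-k:], raw, raw[:k]])
    smoothed = np.convolve(padded, kernel, mode="same")[k:-k]
    return raw, smoothed
```

Veridical reports give an index of zero everywhere. Because the gaps around a closed ring always sum to 360°, the raw indices sum to zero, so expansion in one region must be balanced by compression elsewhere on the same ring.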

To quantify the idiosyncrasies of the distortion indices, we compared the within-observer to the between-observer consistency using a nonparametric bootstrap method [12]. The within-observer consistency is defined as the similarity of the distortion indices across sessions. On each iteration, for every observer, 3 random sessions were sampled without replacement from the 6 sessions and the spatial distortion indices for each location were averaged across the 3 sessions as one bootstrapped half. The 3 remaining unsampled sessions were averaged to form the second half. We z-scored the distortion indices at each eccentricity within each half in order to remove the effect of eccentricity and then correlated the two halves. The Pearson’s r value for each observer was transformed into a Fisher z value before averaging across observers, and the averaged Fisher z was transformed back into a Pearson’s r value (Fisher transformation, used for all procedures that required averaging correlation values). This procedure was repeated 1,000 times to estimate a 95% bootstrapped confidence interval (CI) for within-observer consistency. Between-observer consistency was estimated similarly. On each iteration, one of the two halves from one observer was correlated with one half from another observer. All possible pairwise between-observer correlations were averaged together. This procedure was also repeated 1,000 times to estimate a 95% bootstrapped CI for between-observer consistency.
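A rough Python sketch of the split-half procedure described above follows. It is not the authors' code. The array layout (observers × sessions × eccentricities × angles), the function names, and the choice of which half of one observer is paired with which half of another are assumptions made for illustration.

```python
import numpy as np


def zscore_by_ecc(di):
    """z-score a (n_ecc, n_angles) map within each eccentricity ring."""
    mu = di.mean(axis=1, keepdims=True)
    sd = di.std(axis=1, keepdims=True)
    return (di - mu) / sd


def split_halves(sessions, rng):
    """sessions: (n_sessions, n_ecc, n_angles) -> two half-averaged maps."""
    order = rng.permutation(sessions.shape[0])
    half = sessions.shape[0] // 2
    return sessions[order[:half]].mean(axis=0), sessions[order[half:]].mean(axis=0)


def corr(a, b):
    """Pearson r between two maps after per-eccentricity z-scoring."""
    return np.corrcoef(zscore_by_ecc(a).ravel(), zscore_by_ecc(b).ravel())[0, 1]


def mean_r(rs):
    """Average correlations through the Fisher z transform."""
    return np.tanh(np.mean(np.arctanh(rs)))


def bootstrap_consistency(data, n_iter=1000, seed=0):
    """data: (n_obs, n_sessions, n_ecc, n_angles).

    Returns per-iteration mean within- and between-observer correlations.
    """
    rng = np.random.default_rng(seed)
    n_obs = data.shape[0]
    within = np.empty(n_iter)
    between = np.empty(n_iter)
    for it in range(n_iter):
        halves = [split_halves(data[o], rng) for o in range(n_obs)]
        within[it] = mean_r([corr(h1, h2) for h1, h2 in halves])
        # Assumption: first half of observer i vs. second half of observer j.
        between[it] = mean_r([corr(halves[i][0], halves[j][1])
                              for i in range(n_obs)
                              for j in range(n_obs) if i != j])
    return within, between
```

Under these assumptions, a 95% interval for either measure could be read off with `np.percentile(within, [2.5, 97.5])`.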

To evaluate the significance of the within-observer and between-observer correlations, we also generated permuted null distributions that showed the expected chance correlations from two uncorrelated, permuted halves of the distortion indices. On each iteration, for every observer, all distortion indices were randomly split into two halves (as described previously). One of the two halves was rotated by a random number of positions at each eccentricity to shift the distortion indices away from their original physical positions while preserving the spatial relationships between adjacent isoeccentric target locations. The rotated half was then correlated with the other, unchanged half from the same observer to estimate a within-observer null correlation. Then, the null correlations were averaged across observers. This procedure was repeated 10,000 times to estimate a within-observer permuted null distribution. For the between-observer null distribution, on each iteration, the rotated half was correlated with an unchanged half from another observer and all pairwise null correlations between observers were averaged together. This procedure was also repeated 10,000 times. The mean within-observer and between-observer empirical correlations obtained in the bootstrap analyses described above were compared to these null distributions.
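The rotation-based null can be sketched as follows for a single pair of half-maps. This is a simplified illustration under stated assumptions: each ring receives its own independent non-zero shift, the per-eccentricity z-scoring and the averaging across observers are omitted for brevity, and the p-value is computed as the proportion of null correlations at or above the observed one (the text does not specify this formula). All function names are hypothetical.

```python
import numpy as np


def rotate_rings(di, rng):
    """Circularly shift each eccentricity ring of a (n_ecc, n_angles) map.

    Assumption: every ring gets its own random, non-zero shift.
    """
    n_ecc, n_ang = di.shape
    out = np.empty_like(di)
    for e in range(n_ecc):
        shift = rng.integers(1, n_ang)  # 1 .. n_ang - 1
        out[e] = np.roll(di[e], shift)
    return out


def pearson(a, b):
    """Pearson r between two maps, flattened."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]


def null_distribution(half_a, half_b, n_iter=10000, seed=0):
    """Correlations between a rotated half and an unchanged half."""
    rng = np.random.default_rng(seed)
    return np.array([pearson(rotate_rings(half_a, rng), half_b)
                     for _ in range(n_iter)])


def permutation_p(observed, null):
    """One-sided p-value (assumed formula, with the usual +1 correction)."""
    return (np.sum(null >= observed) + 1) / (null.size + 1)
```

Rotating a ring rather than shuffling it keeps the local smoothness of the distortion pattern intact, so the null distribution reflects chance alignment of two spatially structured maps rather than of unstructured noise.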

To quantify the unique contributions of the distortion indices within each individual rather than a common distortion pattern between observers, we fitted linear regression models and compared the variance explained by different models using a bootstrap test in R [13]. On each iteration, we first fitted a full model, which was formalized as:

DI = β0 + β1 × self + β2 × others + ε

The dependent variable DI (distortion indices) for every target location was calculated by randomly sampling half of the 6 sessions without replacement and averaging the distortion indices across the 3 sampled sessions. Then the 3 remaining unsampled sessions were also averaged to represent the observer-specific spatial distortion pattern (self). The other predictor (others) was the distortion indices averaged across the remaining observers, using only half of the sessions from every observer to minimize the signal-to-noise differences between the two predictors. To estimate how much variance of each observer’s distortion indices can be explained by their own distortion pattern (self) versus the other observers’ averaged distortion pattern (others), we also fitted two other models, the within-observer and the other-observer models. The within-observer model was formalized as:

DI = β0 + β1 × self + ε

and the other-observer model was expressed as:

DI = β0 + β2 × others + ε

Unique variance explained by self (i.e., the distortion indices within each observer) was estimated by subtracting the variance explained by the other-observer model from that explained by the full model. Unique variance explained by others (i.e., the averaged distortion indices across other observers) was estimated by subtracting the variance explained by the within-observer model from that explained by the full model. We also estimated the shared variance between self and others by subtracting the unique variance of each from the explained variance of the full model. Note that since the shared variance is the common contribution of observers themselves and others, it essentially represents a between-observer similarity in the spatial distortion indices. We repeated this procedure 1,000 times to compare the unique variance explained by within-observer or other-observer distortion indices, or the shared variance between observers.
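The variance partitioning described above can be sketched with ordinary least squares in Python (the original analysis was run in R). The function names and the use of R² as the measure of explained variance are assumptions made here; inputs are flat arrays with one value per target location.

```python
import numpy as np


def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()


def partition_variance(di, self_pred, others_pred):
    """Split the full model's R^2 into unique and shared parts."""
    r2_full = r_squared(di, self_pred, others_pred)
    r2_self = r_squared(di, self_pred)      # within-observer model
    r2_others = r_squared(di, others_pred)  # other-observer model
    unique_self = r2_full - r2_others
    unique_others = r2_full - r2_self
    shared = r2_full - unique_self - unique_others
    return {"full": r2_full, "unique_self": unique_self,
            "unique_others": unique_others, "shared": shared}
```

By construction the three components sum to the full model's R², so dividing each by `full` expresses it as a share of the variance the full model explains.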

Results

The bootstrapped within- and between-observer similarity is shown in Figure 1c. The average bootstrapped within-observer correlation (r = .71) was significantly higher than the null correlation expected by chance (p < .001, permutation test). This reveals consistent spatial distortion patterns within individual observers. The mean bootstrapped between-observer correlation (r = .22) was also significantly higher than chance (p < .001, permutation test). We further found that within-observer similarity was significantly higher than between-observer similarity (p < .001, bootstrap test), suggesting that each individual observer has their own unique spatial distortions that are consistent within themselves and distinct from those of other observers (Fig. 1c), consistent with a previous study [4]. To quantify the unique contributions of within-observer versus between-observer effects, we also fitted linear regression models and compared the unique variance explained by distortion indices within each observer versus averaged distortion indices across other observers, as well as the shared variance, using a bootstrap procedure (see Data Analysis). Results showed that distortion indices within observers (“self”) on average uniquely explained 76.76% of the variance in the full model. Put simply, this means that a particular observer predicts their own pattern of distortions very well. The unique variance that cannot be explained by the observer themselves, and can only be explained by the distortion indices from other observers (“others”), was less than 0.1%. This is not surprising: it simply means that other observers’ judgments do not have any explanatory power beyond what is already explained by one’s own pattern of distortion (i.e., it should be zero). The shared variance between self and others on average explained 23.16% of the variance in the full model. This shared variance is akin to the between-subject similarity in Fig. 1c. The unique variance explained by distortion indices within each observer was significantly larger than both the unique variance explained by the averaged distortion indices across other observers (p < .001, bootstrap test, Bonferroni-corrected α_B = .025) and the shared variance between these two predictors (p < .001, bootstrap test, α_B = .025). Since the shared variance between self and others captures the shared contributions from every observer versus all other observers, it is essentially a between-observer effect. The regression models show that idiosyncratic observer-specific biases are the major contributor to the spatial distortions, rather than a common spatial bias among observers.

Experiment 2: Associate Individual Distortion Fingerprints with Visual Acuity

In Experiment 1, we demonstrated that individual observers have unique visual space distortions. Where do these idiosyncratic spatial biases come from? Given that previous studies have revealed substantial heterogeneity in visual acuity across the visual field [5-7], it is plausible that the spatial distortions emerge as a consequence of this heterogeneity. Therefore, in Experiment 2, we measured Vernier acuity [14] at different spatial locations to assess the potential association between spatial distortions and variations in visual resolution. We used a Vernier acuity task because, unlike other acuity measurements such as grating acuity, Vernier acuity, also called hyperacuity [15], exceeds the limits imposed by the maximal cone density on the retina and has been shown to measure acuity at a cortical level [16].