Abstract
Perceptual learning is typically highly specific to the stimuli and task used during training. However, recently it has been shown that training on global motion can transfer to untrained tasks, reflecting the generalising properties of mechanisms at this level of processing. We investigated a) whether feedback was required for learning when using an equivalent noise global motion coherence task, and b) the transfer across spatial frequency of training on a global motion coherence task, and the transfer of this training to a measure of contrast sensitivity. For our first experiment two groups, with and without feedback, trained for ten days on a broadband global motion coherence task. Results indicated that feedback was a requirement for learning. For the second experiment training consisted of five days of direction discrimination on one of three global motion tasks (broadband, low or high frequency random-dot Gabors), with trial-by-trial auditory feedback. A pre- and post-training assessment was also conducted, consisting of all three types of global motion stimuli (without feedback) and high and low spatial frequency contrast sensitivity. We predicted that if learning involves low level processing, then based on the frequency specificity of the lower level areas, transfer would be contingent on the frequency of training. However, if learning involves a higher, global level of processing, more selective transfer would occur that matches the broadband tuning of the higher processing levels. Our training paradigm was successful at eliciting improvement in the trained tasks over the five days. However, post-training assessments found learning and transfer exclusively for the group trained on low spatial frequency global motion. This group exhibited increased sensitivity to low spatial frequency contrast, and an improvement for the broadband global motion condition. Our findings are consistent with perceptual learning which depends on the global stage of motion processing in V5.
Introduction
Perceptual learning has attracted much attention as a potential tool to aid recovery of lost visual function for clinical populations [1]. The success of perceptual training in amblyopia [2–4], presbyopia [4] and cortical damage [5–7] has demonstrated sensory plasticity in adulthood. This evidence contradicts the position that sensory development is restricted to a critical period early in life [8, 9] and that the visual system is hard-wired in mature systems [10]. While it has repeatedly been established that training can improve perceptual abilities [11], these benefits tend to be highly specific for both the perceptual features of the stimuli [12–14] and the behavioural task used in training [15]. This specificity severely limits the effectiveness of perceptual learning as a general therapeutic tool. Resolving the conditions under which learning is tied to the features and tasks used in training, and how much it can generalise to new tasks and stimuli, is imperative for understanding the mechanisms of perceptual learning [11, 16].
Models of Perceptual Learning
An important theoretical question in perceptual learning concerns the location in the processing stream of the learning mechanism(s), for which there are, broadly speaking, two main positions. The first explains perceptual learning in terms of a change in the neurons that code for that feature [17, 18]. Perceptual learning is typically highly specific for the features of the stimuli used for training, including their orientation [13, 19, 20], spatial frequency [11, 13, 21], direction of motion [15], retinal location [14, 20, 22–25] and the eye to which they are presented [14, 23, 24]. On the basis of this feature-specificity it has been argued that the underlying brain area responsible for the learning process must be the primary visual cortex [14] where the receptive fields of cells display this high degree of specificity. Gilbert et al., (2001) [18] suggest that this indicates that there is an element of plasticity early in cortical visual processing and consider that individual neurons at this level change over time [26]. One way perceptual learning could occur is by altering the receptive fields of sensory neurons to exploit correlations between their responses, via anti-Hebbian or Hebbian learning [18]. Anti-Hebbian learning has the aim of reducing the correlations between the responses of neurons, so as to produce an efficient encoding of sensory information [27–29]. Conversely, Hebbian learning, by accentuating the correlations between responses, might be beneficial in tasks such as global motion and form, in which it is important to segregate signal from background noise [18].
However, Mollon and Danilova (1996) [30] argue that the failure of perceptual learning to transfer to test stimuli that differ from those used in training cannot be used to infer unambiguously that learning occurs through a change in the representation of information in early visual areas. The specificity of learning, and lack of transfer, merely indicate that these early visual areas are involved in the learning process. Herzog and Fahle (1998) [31] suggest that learning is more complex than proposed by the visual representation model. They argue that learning is not driven purely by a set of stimuli in a bottom up manner, and that a model of learning also needs to include top down factors such as an internal estimation of one’s performance and knowledge and understanding of the task. While later ‘low-level’ models [26] do consider top-down influences on learning, such as attention and feedback, they still argue that the change occurs at the level of the receptive field.
The second prominent position hypothesises that learning takes place at higher levels within the processing hierarchy [24, 32–35]. Consistent with the lower-level models, Dosher and Lu (2010) [33] agree that Hebbian learning is one of the most important forms of perceptual learning. However, unlike models where the change occurs at the level of the receptive fields in V1, higher-level models propose that the repeated experience at V1 alters the connections between lower visual areas and the high level areas where the perceptual decision is then made [36]. Therefore, rather than a change in the sensory neuron, learning represents a change in the weights of connections between the input, sensory encoding layer and the classification layer at which the decision is made.
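The re-weighting idea can be illustrated with a toy simulation. The following Python sketch is not the model of any of the cited papers: the channel gains, the noise level, the learning rate and the use of a simple delta rule are all assumptions made for illustration. The sensory channels keep fixed response properties throughout, and only the weights feeding a single decision unit change with trial-by-trial feedback.

```python
import random

def simulate_trial(rng, gains, noise_sd=1.0):
    """Return (responses, label) for one trial; label is -1 (left) or +1 (right)."""
    label = rng.choice((-1, 1))
    responses = [g * label + rng.gauss(0.0, noise_sd) for g in gains]
    return responses, label

def decide(weights, responses):
    """Decision unit: sign of the weighted sum of the sensory responses."""
    total = sum(w * r for w, r in zip(weights, responses))
    return 1 if total >= 0 else -1

def accuracy(rng, weights, gains, n_trials=4000):
    """Proportion of correct decisions over freshly simulated trials."""
    correct = 0
    for _ in range(n_trials):
        responses, label = simulate_trial(rng, gains)
        correct += decide(weights, responses) == label
    return correct / n_trials

def train(rng, weights, gains, n_trials=3000, rate=0.01):
    """Delta-rule re-weighting; the sensory gains are never modified."""
    weights = list(weights)
    for _ in range(n_trials):
        responses, label = simulate_trial(rng, gains)
        output = sum(w * r for w, r in zip(weights, responses))
        error = label - output  # feedback signal
        weights = [w + rate * error * r for w, r in zip(weights, responses)]
    return weights

rng = random.Random(1)
# Hypothetical encoding layer: two informative channels, six uninformative ones.
gains = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
initial_weights = [1.0 / len(gains)] * len(gains)

before = accuracy(rng, initial_weights, gains)
trained_weights = train(rng, initial_weights, gains)
after = accuracy(rng, trained_weights, gains)
print(f"accuracy before: {before:.2f}, after: {after:.2f}")
```

In this sketch performance improves because the decision unit learns to ignore the uninformative channels, even though no channel's tuning has changed.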
Many behavioural studies of perceptual learning are unable to distinguish between improvements that result from a change in sensory encoding, and those that result from a change in the connections between sensory and decision-making stages of processing. While the failure of perceptual learning to transfer across retinal location, orientation and other stimulus dimensions implicates V1 in the learning process, this in itself cannot dissociate a change in the receptive fields of neurons from a change in their connections to subsequent processing stages [37]. Petrov et al., [36, 37] suggest using perceptual learning experiments in which the sensory representations required for two tasks are the same, but the decision stages differ, or where both sensory representations and the decision stage are common to both. Bejjanki et al., (2011) [38] propose a model that is computationally similar to the late theory models. However, it illustrates how changing the population codes in the early sensory areas can create similar changes in response to those made by high-level re-weighting models. Bejjanki et al., (2011) [38] argue that the model captures the characteristics of perceptual learning in both behavioural and physiological changes, through sensory inputs improving the decision weights in the feed-forward connections, and improved probabilistic inference in early visual areas resulting from increased neural activity in the feedback network.
The effects of learning on the receptive field structure of visual neurons have been measured directly using single-cell recordings. Changes in orientation and spatial frequency tuning in V1 [39, 40] and V4 [41], and increased sensitivity to motion coherence in V5 [42], have all been found as a result of perceptual learning. Neuroimaging in human observers has indicated a reorganisation of the visual cortex after training [43–45]. However, it has been argued that the changes to the tuning of individual sensory neurons are not sufficient to account for the behavioural improvements achieved through training, and that the effects on populations of neurons, including the reductions in the correlation in noise between neural responses, need to be taken into account [46]. Dosher and Lu (1998) [32] suggested that learning may be explained as an improved ability to filter out external noise and reduce internal noise. Likewise, Gu et al., (2011) [46] predicted that by pooling sensory information across cells, noise would be reduced, efficiency improved and learning would occur. Noise levels were examined in macaques, by recording from individual pairs of neurons. Gu et al., (2011) [46] found that noise correlations were weaker for those trained on a perceptual discrimination task than those who were untrained. However, their results also showed that optimally decoding all neurons did not predict a meaningful change in learning performance. Gu et al., (2011) [46] concluded that the reduction in noise is also an unlikely explanation for the effects of training, and supported the idea that perceptual learning may change the weighting of sensory inputs to a decision unit. It is generally judged that the changes to responses in sensory neurons are not sufficient to explain the improvements in task performance that are observed [33, 46].
Transfer
A priority for perceptual learning researchers is identifying learning that transfers. A number of recent studies have demonstrated transfer after training [47–49]. Xiao et al., (2008) [48] demonstrated a transfer of contrast discrimination to different retinal locations using a double-training protocol. Double-training involves overt training on a task-relevant feature in addition to exposure to a task-irrelevant feature [50]. Xiao et al., (2008) [48] trained observers using two simultaneous tasks; one task was to detect a relevant feature (contrast) at one retinal location, followed by detection of an irrelevant feature (orientation) at another location. They reported full transfer of contrast sensitivity to the untrained location. Similarly, Zhang et al. (2010) [49] showed that observers who were trained to discriminate orientation and simultaneously exposed to a second orientation experienced a full transfer of improvement in orientation sensitivity as long as training preceded exposure, rather than being simultaneously presented. Exposure to the untrained orientation was not sufficient to lead to transfer. Zhang et al., (2010) [49] proposed a rule-based learning model to account for the transfer, where higher level decision units learn and re-weight the V1 inputs. They proposed that the absence of functional connections to the untrained orientation or location prevents any potential reweighting. Double training, through exposure, activates the functional connections at the new locations or orientations, enabling transfer. This suggests that there may not be a straightforward correspondence between stimulus feature and the neural loci involved in performing the task. Double-training has been a successful paradigm in obtaining evidence of transfer [48, 51, 52]; however, task difficulty during training and confidence in decisions still play a vital role in the learning and transfer process [53].
The Reverse Hierarchy Theory is an early theoretical model proposed to account for the transfer of learning [54]. Hochstein and Ahissar (2002) [54] proposed that learning is not a simple bottom up, stimulus driven process. Rather, learning is a top down process, beginning at higher processing areas and filtering backwards to lower-level areas, with the degree of transfer dependent on the characteristics of the receptive fields involved in performing the training task. Hochstein and Ahissar (2002) [54] proposed that transfer occurs as a result of modification of neurons found in the higher cortical levels, where receptive fields generalise. Alternatively, specificity could arise from changes in neurons where the receptive fields are localised, at the lower cortical levels. Global processing may therefore play an important role in creating learning that generalises to other tasks and stimuli [55]. If, as predicted by feature specificity, learning is stimulus driven and reflects the tuning properties of the relevant receptive fields, what specificity is expected for tasks that are not processed in early, highly specific lower visual areas? For example, perceptual learning has been shown to improve detection and discrimination in tasks using global form [56] and global motion [57–59]. These kinds of tasks are known to be processed at higher levels of the visual cortex, in areas V4 [56] and V5 [60, 61], respectively.
The visual system is typically represented as a hierarchy of cortical areas [62, 63]. At the lowest level, V1 and V2 represent simple visual dimensions, such as the position, orientation and scale of local image features [12]. The receptive fields of cells in V1 are small, only responding to a very restricted area of the visual field [64–66]. Furthermore, physiological evidence from non-human animals has shown that V1 neurons are sharply tuned to orientation [67, 68], spatial frequency [67, 69] and direction [63, 70]. In contrast, the receptive fields in higher cortical areas, such as those found in V4 and V5, are larger than those of V1 and are less dependent on location, retinal size, viewpoint, and lighting [64, 69, 71–75].
The Reverse Hierarchy Theory predicts that the feedback connections from higher levels back to early visual processing areas may be involved in facilitating learning that transfers, and that the key to understanding specificity and transfer lies in the hierarchy of processing and the feedforward and feedback connections between them [54, 55].
Global versus Local Processing
The receptive fields of neurons in higher cortical areas integrate information to represent global stimulus properties [18, 76–79]. Global processes are investigated using stimuli or tasks that can only be resolved through integration and segregation of coherent or conflicting information [78–80]. For example, the perception of motion is hypothesised to occur as a two stage process [77, 81]. At the first stage, spatial frequency- and orientation-tuned mechanisms in V1 encode the motion signals that occur locally within the receptive fields of individual neurons [70, 78]. However, to process more complex motion, the ambiguous or conflicting first-stage signals need to be integrated over a wider spatial area to provide a global representation of motion, and to unambiguously estimate local velocity [82, 83]. This process is hypothesised to occur higher in the visual hierarchy in area V5 [77, 81]. Single cell recordings from the visual cortex of non-human primates have identified the neural populations responsible for the two processing stages to be V1 and MT/V5, respectively [70]. The role that V5 plays in spatially integrating motion signals is well supported from non-human primate data, as well as neuroimaging studies in humans [60, 64, 84, 85]. Interestingly, the re-entrant connections from V5 to V1 are particularly extensive, suggesting an important role for top-down information in the encoding and interpretation of motion [86].
Global motion coherence is typically studied using random dot kinematograms, requiring observers to make a direction judgement from a stimulus comprised of a pattern of moving dots. A typical stimulus will often contain a proportion of signal dots moving in one direction and noise dots moving in random directions [15, 59, 87] (fig 1a). Difficulty is increased by reducing the ratio of signal-to-noise dots; the more noise dots the lower the coherence, and the more difficult the task. In order to perceive a coherent global motion, observers need to integrate and segregate the motion signals over space and time [78]. Another method of investigating motion coherence is to use an equivalent noise paradigm [58, 88–90]. Rather than having distinct populations of signal and noise dots, all dots contribute both to the signal and the noise by drawing the direction of motion for each dot from a random distribution. Dots move the same distance between each frame and the direction travelled is independent of the directions of the other dots [90]. Difficulty is increased by manipulating the standard deviation of the distribution of directions presented, thus each dot contributes to the signal [89] (fig 1b).
Schematic of global motion stimuli where a) is a typical stimulus comprised of signal dots and noise dots (approximately 30% coherence). b) An equivalent noise stimulus where all dots contribute equally to the signal and the noise (drawn from a distribution of approximately 90° out of a possible 360°). In each case the correct answer would be that the dots are moving towards the right.
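The difference between the two stimulus types can be made concrete with a short Python sketch. It is an illustration only: the function names are hypothetical, 0° is taken to mean rightward motion, and the equivalent noise directions are drawn from a uniform distribution over the stated range, which is an assumption of this sketch rather than a detail given above.

```python
import math
import random

def signal_noise_directions(rng, n_dots, coherence, signal_direction_deg):
    """Signal dots share one direction; each noise dot gets a random direction."""
    n_signal = round(n_dots * coherence)
    directions = [signal_direction_deg] * n_signal
    directions += [rng.uniform(0.0, 360.0) for _ in range(n_dots - n_signal)]
    return directions

def equivalent_noise_directions(rng, n_dots, centre_deg, range_deg):
    """Every dot draws its direction from the same distribution around the centre.
    The uniform distribution is an assumption of this sketch."""
    half = range_deg / 2.0
    return [centre_deg + rng.uniform(-half, half) for _ in range(n_dots)]

def vector_mean_deg(directions_deg):
    """Circular (vector-average) mean direction, in degrees."""
    x = sum(math.cos(math.radians(d)) for d in directions_deg)
    y = sum(math.sin(math.radians(d)) for d in directions_deg)
    return math.degrees(math.atan2(y, x))

rng = random.Random(0)
# Values matching the schematic: about 30% coherence, and a 90 degree range.
dots_a = signal_noise_directions(rng, 100, 0.30, 0.0)
dots_b = equivalent_noise_directions(rng, 100, 0.0, 90.0)
print(vector_mean_deg(dots_a), vector_mean_deg(dots_b))
```

In the first stimulus only a subset of dots carries the signal, whereas in the second the global direction can only be recovered by averaging over all dots.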
The local and global stages of motion processing differ in their spatial frequency tuning. Local motion detectors have receptive fields that are narrowly tuned for spatial frequency [67]. In contrast, receptive fields that process global motion have a much broader tuning. Cells higher in the processing cortical hierarchy, involved in the perception of global aspects of the image, thus generalise over stimulus parameters such as spatial frequency and location. Modification of these generalising receptive fields could therefore produce perceptual learning that also generalises over these stimulus parameters. On the other hand, specificity in perceptual learning arises from changes in lower-level cortical areas, where neurons show much more specific tuning for stimulus features. Since perceptual learning occurs for both global motion [58, 59] and form tasks [80, 91], this suggests that learning is not restricted to the initial encoding of information in V1, and can occur at higher levels of cortical processing. Learning that involves higher level global aspects of perception is of particular interest since it has the potential to produce more generalisable improvements in perception [55]. Establishing the conditions for and the extent to which global motion training generalises its learning in this way is an important step to understanding the global level of processing in perceptual learning.
Perception of global motion
The role that V5 plays in the perception of global processes has been highlighted by extra-cellular recordings [60, 92] and lesion studies [93, 94] in non-human primates. Receptive fields in V5 can be up to tenfold larger than those in V1. V5 neurons have large, broadly tuned receptive fields that sum the responses of a set of V1 neurons, across space, orientation and spatial frequency [66]. More recently, Lui et al., (2007) [84] showed that the majority of V5 neurons in marmoset monkeys have band-pass spatial and temporal frequency tuning, with a preference for low spatial frequencies. Incoming sensory information causing V5 to respond drives feedback projection from V5 to V1 that corresponds to the retinotopic locations around the stimulus location of the V1 cells [74]. Using a noise masking paradigm Amano et al., (2009) [76] investigated if the specificity of local motion frequency tuning was preserved once pooling for the perception of global motion has occurred. Stimuli were composed of either drifting Gabors or plaid elements that were either signal or noise. Signal elements were defined by a consistent drift speed and direction, and noise elements were defined by random speeds and directions. Frequency tuning was determined by examining how noise elements at one spatial frequency interfered with the extraction of the signal at another frequency. Amano et al., (2009) [76] found that thresholds increased as the spatial frequency of the noise elements was reduced, and concluded that this is indicative of a broadband, low-pass tuning of motion pooling. The results obtained by Amano et al., (2009) [76] are consistent with those of Bex and Dakin (2002) [77], and suggest a global level pooling for low spatial frequencies. 
This is further supported by an fMRI study [95] which found that while areas V1, V2, V3 and V4 showed bandpass spatial frequency tuning, area V5+ exhibited an attenuated response for high spatial frequencies, but no reduction in responses for low spatial frequencies. Amano et al., (2009) [76] propose that, within V5, there is a “motion-pooling mechanism” that demonstrates a broadband, low-pass tuning. Pooling of visual sensory information is an important step for integration and segregation and has also been found in studies investigating transfer in binocular stereopsis [96]. The neural mechanisms of stereoscopic vision are also known to operate at multiple levels of the visual hierarchy [97], and the pooling of information across spatial frequency mechanisms has been proposed as an important step in estimating depth [98–101]. In the same way, pooling in V5 for low frequencies is potentially an important step for integration and segregation of coherent motion.
Learning mechanisms for global tasks
Training using global motion has been found to transfer between eyes [15, 59]; increase sensitivity for detection and discrimination [5]; help recover some of the blind field for cortically blind subjects [5]; and reduce contrast thresholds for drifting stimuli [58]. Huxlin et al., (2009) [5] suggested that extensive training using global motion may result in general improvements in visual sensitivity, and proposed that the improvements may be a result of the feedback connections from V5 to V1. Huxlin et al., (2009) [5] explain their findings using the Reverse Hierarchy Model of perceptual learning [54] and propose that training may stimulate and reactivate “intact islands” of activity.
Levi et al., (2014) [58] also found transfer of learning in a control population. Transfer occurred from global motion to contrast sensitivity, a task known to be processed in an early visual area where cells are narrowly tuned [67]. Levi et al., (2014) [58] suggested that improvements in their study may be due to the combination of learning a global motion task, at a higher cortical level, while simultaneously being exposed to high contrast, broad-frequency random dot stimuli. Levi et al., (2014) [58] remarked that a striking result from their findings was the particular improvement in direction discrimination for low spatial frequency stimuli, despite training with broadband random dot stimuli. They consider these findings evidence for a broad transfer of learning, across spatial frequency, in the primary visual areas, and that the learning achieved through training on global tasks differs fundamentally from that from local tasks.
Greater transfer of learning is proposed to occur through the re-weighting of high-level perceptual units, which tend to generalise across stimulus properties to a greater degree than low-level perceptual neurons [34, 58]. However, physiological evidence of training on global motion tasks suggests that, as in training for the discrimination of local image properties, learning changes the weighting between the sensory neurons (in this case direction selective neurons in V5) and neurons involved in the decision making process, rather than altering the sensory neuron’s properties [102]. Given the differences in spatial and temporal frequency tuning of processing between the local and global levels [60, 67, 84], it is important to determine the degree of specificity for low-level stimulus properties when training with global stimuli. This allows us to determine whether the degree of transfer is limited by the tuning properties of global detectors, in the same way as for information encoded purely locally, and thus whether the involvement of sensory mechanisms is the same for the two levels of processing.
Current study: Learning transfer from global motion
Levi et al., (2014) [58] found that after training on global motion tasks, with a broadband frequency range, observers showed an increase in contrast sensitivity. However, broad frequency stimuli contain both high and low frequencies, and contrast detection is known to be associated with processing in early visual areas [21, 103, 104]. Therefore, if part of the learning mechanism for global motion operates at the early, local processing stage, this may explain the transfer of learning they found, since the broadband stimuli used would activate neurons with a range of spatial frequency tuning. This would support the model of learning which asserts that learning changes involve early, local receptive cells [17, 18, 26].
However, if the learning mechanism for global motion is located at higher levels, where motion detectors have a preference for low spatial frequencies [76, 84], it might be predicted that perceptual learning for global motion would show a broad transfer to stimuli that contain low spatial frequency components. This would also be consistent with the transfer of learning from global motion to contrast sensitivity for drifting gratings found by Levi et al., (2014) [58]. The spatial frequency tuning of learning for global motion may thus be used to determine whether this learning involves early, local motion detectors, or is restricted to the global processing stage. In order to test this, we focused on the spatial frequency tuning of learning, using a global motion coherence task. Since the stimuli used by Levi et al., (2014) [58] were broadband, it was not possible to determine the degree to which learning depends on the spatial frequency content of their training stimuli. To address this, we used a similar learning paradigm, but used test and training stimuli that contained (1) a broad range of spatial frequencies; (2) only low spatial frequencies or (3) only high spatial frequencies. Additionally, we included measures for contrast sensitivity for static low and high spatial frequency targets. Furthermore, based on the predictions that learning and transfer are most successful with a combination of easy and difficult trials [47, 53, 105, 106] the stimulus levels chosen ensured a range from extremely easy (high coherence) to very difficult (low coherence), with a high proportion of easy trials.
If training at the global level impacts learning at the early, local motion processing stage, we may expect an improvement in the trained frequency, and some transfer between low and broad frequencies, and high and broad frequencies, but no transfer between high and low frequencies (fig 2a). We might also expect frequency-tuned improvements in contrast sensitivity, mediated by the training on global motion stimuli. A second possibility is that learning depends on re-entrant connections from V5 to V1 [107], consistent with the reverse hierarchy model [55], but that this feedback is limited by the spatial frequency tuning of V5 mechanisms. Under this model, we would expect improvement in performance for low- and broad-frequency test stimuli, regardless of the frequency tuning of the stimuli used in training. Additionally, if the learning mechanism for global motion operates at the higher levels, between the directionally-tuned sensory neurons in V5 and the decision stage [36, 102] then a different pattern of transfer might be expected, with a pronounced effect for low frequencies, for which these mechanisms are tuned, more modest effects for broadband stimuli, and minimal transfer from high frequencies (fig 2b). Finally, if global motion training improves contrast sensitivity, we would expect to find some specificity to the trained frequency of global motion stimuli (fig 2c).
a) If transfer occurs at the local level within V1, improvement across frequencies for global motion would be predicted to occur for stimuli with shared properties. Training on Broad Frequency stimuli should improve performance on both High and Low Frequency stimuli. Training on Low Frequency stimuli should result in some improvement for Broad Frequency stimuli, as would training on High Frequency stimuli. We would not expect any transfer of learning between Low and High Frequency stimuli. b) If training for global motion transfers at the global level, we predict that, as a result of the preference for low frequencies, a more selective transfer would occur. c) Finally, if training on global motion improves contrast sensitivity, we would expect that this would occur based on shared spatial frequency properties.
The purpose of this study was to investigate the specificity of spatial frequency tuning in global motion after training and to determine the extent to which training with global motion stimuli is specific for the spatial frequency content of the training stimuli. In order to assess this, observers completed pre- and post-assessments in all frequencies of global motion, both frequencies of contrast sensitivity and a control task for orientation discrimination. The control task was included to rule out the possibility that any improvement simply reflects increased experience of participating in psychophysical experiments. However, prior to collecting the data for this study we questioned whether trial-by-trial feedback was a requirement for learning.
Experiment 1: Is feedback necessary for learning?
The objective of this experiment was methodological in nature, in that we investigated the necessity of feedback for perceptual learning to occur for our specific stimuli. Performance feedback notifies a learner how accurate their performance is, and can be generated internally (unsupervised learning) or provided by an external source (supervised learning) [109]. External feedback often takes the form of an auditory or visual cue that is presented after a trial to indicate a correct or incorrect response. Behavioural perceptual learning studies have shown that using external feedback can improve learning and increase efficiency [110, 111]. On the other hand, internal feedback is an internally generated benchmark to one’s own performance on a task [109]. It has been hypothesised that learning should occur without external feedback, if the observer’s internal confidence and accuracy are sufficiently high [22].
In a study conducted by Seitz et al., (2006) [112], learning did not occur without feedback even when easy trials were presented. However, some studies have found that learning occurs without external feedback [11, 15, 25, 36, 61, 106]. This results in a complicated pattern of empirical findings, and the specific nature of feedback and its role in perceptual learning is unclear. Recently, Lu and Dosher (2012) [106] found that interleaving high accuracy (easy) trials and low accuracy (difficult) trials resulted in perceptual learning without the need for feedback even on difficult trials. Based on the findings of Lu and Dosher (2012) [106] we predicted that we would find learning in both conditions, as long as there were easy and difficult trials presented in an interleaved method. As detailed in the following sections, our study found that learning did not occur in the condition where no feedback was provided, even when easy trials were presented. Learning was only found when feedback was presented. With this in mind our design for experiment 2 included feedback during training, but no feedback when testing.
Methods and Materials
Observers
Twenty-four observers were randomly and evenly assigned to a feedback or a no-feedback group. All observers were employees or students from the University of Essex and self-declared as having normal or corrected-to-normal vision. All work was carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). The study procedures were approved by the University of Essex University Ethics Committee (JA1601). All observers gave informed written consent and were either paid or received course credit for their participation.
Apparatus
Stimuli were generated and presented with Matlab 2015a using the Psychophysics Toolbox extensions [113–115]. The broadband global motion stimuli were presented on a 27” 2.7 GHz iMac running OSX 1.9.5, with a display resolution of 2560 x 1440 pixels and a 60 Hz refresh rate. The viewing distance was 450 mm, the stimuli subtended a visual angle of 66.8°, and one pixel subtended 1.77 arc minutes.
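As a rough check on the pixel size quoted above, the visual angle of one pixel can be estimated from the display geometry. The sketch below assumes that 27” is the screen diagonal and that pixels are square; neither is stated in the text, and the function name is introduced here for illustration only. Under these assumptions the result is close to 1.8 arc minutes, in line with the reported value.

```python
import math

def arcmin_per_pixel(diagonal_inches, res_x, res_y, viewing_distance_mm):
    """Visual angle of a single pixel in arc minutes (assumes square pixels)."""
    diagonal_mm = diagonal_inches * 25.4
    # Pixel pitch: physical diagonal divided by the diagonal measured in pixels
    pixel_pitch_mm = diagonal_mm / math.hypot(res_x, res_y)
    angle_rad = 2 * math.atan(pixel_pitch_mm / (2 * viewing_distance_mm))
    return math.degrees(angle_rad) * 60

# Display parameters taken from the Apparatus description
pixel_arcmin = arcmin_per_pixel(27, 2560, 1440, 450)
print(round(pixel_arcmin, 2))
```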
Stimuli and Procedure
Global motion stimuli contained 100 Gaussian elements, each with a standard deviation of 6.8 arc minutes. Elements were presented within a mid-grey rectangle measuring 17.6° x 17.6° on a mid-grey background. Elements moved 5 pixels per frame, so that each element moved a fixed distance of 8.8 arc minutes on each step. Dots wrapped around the rectangle when approaching the edges (fig 5a). Global motion stimuli were based on the task designed by Williams and Sekuler (1984) [90]: dots moved around the screen at one of 7 levels of coherence (180°; 300°; 330°; 340°; 345°; 350°; 355°). Lower numbers depict higher coherence, and higher numbers depict lower coherence and thus more randomness. At each level of coherence each dot moved a fixed distance in a direction within the defined arc, while moving overall in a leftwards or rightwards direction (fig 3).
The random walk that creates the motion of dots with a 90° coherence. The arc defines the potential area of movement each dot could take at that coherence. The arrow indicates the actual trajectory and motion of the dot on each step. On the right, possible trajectories for the random walk are shown for stimuli with 180°, 270° and 360° coherence.
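The random walk described above can be sketched in a few lines. This is a minimal illustration, not the stimulus code used in the study: it assumes that step directions are drawn uniformly within the arc and resampled on every step, it uses an arbitrary step size, and it omits the wrap-around at the edges of the display. All function names are hypothetical.

```python
import math
import random

def step_direction(arc_deg, mean_direction_deg, rng):
    """Draw one step direction uniformly from an arc centred on the mean direction."""
    return mean_direction_deg + rng.uniform(-arc_deg / 2, arc_deg / 2)

def random_walk(n_steps, arc_deg, rightwards, step_size, rng):
    """Return the (x, y) positions of one dot, starting at the origin."""
    mean_direction = 0.0 if rightwards else 180.0
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        theta = math.radians(step_direction(arc_deg, mean_direction, rng))
        # Every step has the same length; only its direction varies
        x += step_size * math.cos(theta)
        y += step_size * math.sin(theta)
        path.append((x, y))
    return path

rng = random.Random(1)
path_coherent = random_walk(60, 180, True, 1.0, rng)  # narrowest arc used in the study
path_noisy = random_walk(60, 355, True, 1.0, rng)     # widest arc used in the study
```

With a 180° arc every step has a non-negative rightward component, whereas with a 355° arc individual steps can point in almost any direction and the net drift is small.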
All tasks were presented as two-alternative-forced-choice (2AFC) decisions (left or right response), using the method of constant stimuli (MOCS). Observers responded by pressing the left or right arrow key associated with perceived left and right motion. When feedback was present, this was provided as an immediate auditory beep after each trial, a high pitched tone for a correct response (2000Hz for 10ms), and a low pitched tone for an incorrect response (200Hz for 40ms).
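The method of constant stimuli and the feedback rule can be illustrated as follows. This is a sketch under stated assumptions: the number of repetitions per level (20) and the balancing of left and right directions are hypothetical choices made for the example, and the function names are not taken from the study's code. The coherence levels and tone parameters are those given in the text.

```python
import random

# Coherence levels (arc width in degrees) listed in the Stimuli and Procedure section
COHERENCE_ARCS_DEG = [180, 300, 330, 340, 345, 350, 355]

def mocs_trials(levels, reps_per_level, rng):
    """Build a shuffled trial list with equal repetitions of every level."""
    trials = []
    for level in levels:
        for rep in range(reps_per_level):
            # Assumption: directions are balanced within each level
            direction = "left" if rep % 2 == 0 else "right"
            trials.append((level, direction))
    rng.shuffle(trials)
    return trials

def feedback_tone(correct):
    """Return (frequency in Hz, duration in ms) of the auditory feedback."""
    return (2000, 10) if correct else (200, 40)

trials = mocs_trials(COHERENCE_ARCS_DEG, 20, random.Random(0))
```

Because every level appears equally often in a random order, easy and difficult trials are interleaved unpredictably, in contrast to an adaptive staircase where the next level depends on preceding responses.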
General Statistical Methods
Moscatelli et al., (2012) [116] proposed using the Generalised Linear Mixed Effects Model (GLMM) for psychophysical data. The GLMM is an extension of the Generalised Linear Model (GLM), and one that provides a more robust statistical analysis where the data contain irregular response distributions. The GLMM contains both fixed and random effects. The fixed component estimates are the effects of interest; in our study they are a) the day (or session) of testing and b) each level of the stimulus (coherence, contrast or orientation). The modelling of random effects assesses the differences between related groups (such as those from different observers) and allows inference to a larger population [117].
The linear component of the model is given by:
where α and β are the intercept and the slope of the fixed effects parameters respectively. The link function converts the expected outcome variable (the proportion of correct responses, p) to the linear predictor [118]. Here, a logit function was used, adapted to take account of chance performance, such that:
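A form consistent with these definitions, assuming a guessing rate of 0.5 for the 2AFC task and writing x for the stimulus level and η for the linear predictor (both symbols are introduced here for illustration), is:

```latex
\eta = \alpha + \beta x,
\qquad
\eta = \log\!\left(\frac{p - 0.5}{1 - p}\right)
\;\Longleftrightarrow\;
p = 0.5 + \frac{0.5}{1 + e^{-\eta}}
```

Under this form the predicted proportion correct rises from 0.5 to 1 as η increases, and equals 0.75 when η = 0.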
Selecting the model
All response data were analysed with a GLMM in Matlab (using the fitglme function), with the logit link function and REMPL (restricted maximum pseudo likelihood) to estimate the model parameters. Each model included observers as a random effect, and Time, Level and their interaction as fixed effects. Each model was assessed for goodness of fit, using random slopes only, random intercepts only, or their combination. Models (3a-3d) were compared using the likelihood ratio test to identify which model produced the lowest model fit statistics [116]. The Akaike information criterion (AIC) is one of the best known criteria for model selection; it favours the model whose fitted values are expected to lie closest to the true values [117]. The best model is the one with the lowest AIC.
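The AIC comparison can be illustrated with a short sketch. The log-likelihoods, parameter counts and model labels below are hypothetical values chosen for the example; they are not results from this study, and the labels do not correspond to models 3a-3d.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) for three candidate models
candidates = {
    "random intercept": (-1250.0, 5),
    "random intercept and day slope": (-1240.0, 7),
    "random intercept, day and level slopes": (-1238.5, 10),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)
print(best_model, scores[best_model])
```

In this example the most complex model has the highest likelihood, but its extra parameters are penalised, so the intermediate model has the lowest AIC.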




where R is the observer response (proportion correct), T is time (the day or session of testing), L is the stimulus level (degree of coherence, level of contrast, or degree of orientation) and O is the observer.
Interpreting the model
The intercept is the value of the linear predictor when the stimulus level is zero, and thus determines the proportion of correct responses at that level. An increase in intercept over time would reflect an overall increase in the number of correct responses. The slope relates to how the proportion of correct responses increases with the strength of the stimulus (degree of coherence, amount of contrast, or degree of orientation). Overall performance can also be summarised in terms of the 75% threshold, the point at which observers responded correctly on 75% of trials.
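The relationship between intercept, slope and the 75% threshold can be made concrete with a small worked example. The sketch assumes a logistic psychometric function with a lower bound at chance (0.5); the parameter values and function names are hypothetical and chosen only for illustration.

```python
import math

def proportion_correct(x, alpha, beta, chance=0.5):
    """Predicted proportion correct at stimulus level x (chance-adjusted logistic)."""
    eta = alpha + beta * x
    return chance + (1 - chance) / (1 + math.exp(-eta))

def threshold(alpha, beta, target=0.75, chance=0.5):
    """Stimulus level at which the model predicts the target proportion correct."""
    scaled = (target - chance) / (1 - chance)
    eta = math.log(scaled / (1 - scaled))
    return (eta - alpha) / beta

# Hypothetical intercept and slope
alpha, beta = -3.0, 0.5
x75 = threshold(alpha, beta)
print(x75, proportion_correct(x75, alpha, beta))
```

With chance at 0.5, the 75% point is where the linear predictor is zero, so the threshold reduces to -α/β. A higher intercept or a steeper slope therefore both move the threshold towards lower stimulus levels.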
Results
To verify the spread of easy and difficult trials across the stimulus levels suggested by Liu et al. [106], accuracy for the first day was evaluated across all observers for each level of the stimulus and is reported in Table 1. The accuracy across levels confirms that there is an even range of easy trials (85% and above) and difficult trials (65% and below), with the balance around 75%.
Accuracy at each level of the stimulus.
To identify if easy and difficult trials were sufficient for learning in the absence of trial-by-trial feedback, we analysed individuals’ daily performance across the two training groups (Feedback or No Feedback). Individual responses from the 10 days’ training were aggregated for each variable: observer, day and level of coherence. The response data were analysed independently for each group (Feedback or No Feedback) using the method outlined above. Response was modelled as a fixed effect of day and coherence and an interaction between these two predictors. Results are presented in table 2. For the Feedback group, there was a significant negative effect of day; however, there was a significant positive Day × Coherence interaction, indicating a change in slope across the 10 days. The same analysis for the No Feedback group showed a main effect of day and a significant negative interaction: the No Feedback group performed worse after 10 days of training. Performance over the 10 days is illustrated in fig 4, where the slope for the Feedback group gets steeper over time, day 10 (blue) versus day 1 (red), and the 75% discrimination threshold shifts leftwards, indicating a reduction in threshold between the start and end of training. Conversely, the group trained without feedback performed progressively worse over time and the 75% discrimination threshold shifted rightwards, indicating an increase in threshold.
Fixed effects parameters for the Pilot Data
Mean thresholds (in degrees) over 10 days on global motion training for feedback (red) and no feedback (blue). Error bars represent ± SE.
Global motion stimuli, at broad (a), low (b) and high spatial frequencies (c).
Discussion
The purpose of the first experiment was to establish the necessity of performance feedback when undertaking a period of training on a task to discriminate the direction of global motion. Fahle and Edelman (1993) [22] predicted that internal reinforcement could act as the teaching signal when performance feedback was absent and when confidence is high [53, 119]. Therefore, perceptual learning should occur when training procedures include a mixture of easy and difficult trials. This was the case for the study conducted by Lu and Dosher (2012) [106] where learning occurred for easy and difficult trials without feedback. For this specific task, consistent with Seitz et al., (2006) [112], our results found that feedback was a requirement for learning. After ten days of training the group who received feedback improved significantly, while the group receiving no feedback did not improve and in fact performed worse.
Feedback and Perceptual Learning
Our predictions were based on the findings by Lu and Dosher (2012) [106] who demonstrated that interleaving high accuracy (easy) trials and low accuracy (difficult) trials resulted in perceptual learning without the need for feedback. Lu and Dosher (2012) [106] explain their findings using the Augmented Hebbian Re-weighting Model (AHRM) [33]. When external feedback is provided, the post-synaptic activation is shifted further in the correct direction, enforcing appropriate weight changes in the decision unit. However, when external feedback is absent the model acts in a similar manner to an unsupervised Hebbian model, using the observer’s internal response. In this situation learning is dependent on the level of difficulty of the task, and uses the observer’s internal confidence to update the weights [106]. Where a task is easy, the weights still move, on average, in the correct direction [120]. For a difficult task, the correlation is weak, the process of updating the decision weights is slow and random, and learning does not occur.
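The role of feedback in this kind of model can be illustrated with a deliberately simplified sketch. It is not the published AHRM: weight bounds, the running average of the output and the representation stage are omitted, and the tanh nonlinearity, learning rate, feedback weight and function names are assumptions made for the example.

```python
import math

def decision_output(weights, activations, feedback=None, feedback_weight=1.0):
    """Output of the decision unit; feedback (+1 or -1) shifts it towards the correct response."""
    drive = sum(w * a for w, a in zip(weights, activations))
    if feedback is not None:
        drive += feedback_weight * feedback
    return math.tanh(drive)

def hebbian_update(weights, activations, output, lr=0.1):
    """Hebbian rule: each weight changes in proportion to input activation times output."""
    return [w + lr * a * output for w, a in zip(weights, activations)]

# A stimulus whose correct response is +1, driving the first input most strongly
activations = [1.0, 0.2]
# Initial weights that produce the wrong (negative) internal response
weights = [-0.1, 0.1]

out_no_feedback = decision_output(weights, activations)
out_feedback = decision_output(weights, activations, feedback=+1)

w_no_feedback = hebbian_update(weights, activations, out_no_feedback)
w_feedback = hebbian_update(weights, activations, out_feedback)
```

In this example the internal response is wrong, so the unsupervised update pushes the first weight further in the wrong direction, while the feedback-shifted output pushes it in the correct direction. When the internal response is usually right, as on easy trials, the two updates agree on average.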
The results of our study showed that learning did not occur without feedback, even when easy trials were presented. While our task did span the 65-85% accuracy range used by Lu and Dosher (2012) [106], there were some differences between the studies. Firstly, Lu and Dosher (2012) [106] used an adaptive staircase to track performance, and stimuli were presented at either 85% or 65% accuracy throughout the experiment. Ours, on the other hand, did not include an adaptive staircase. Stimuli were randomly presented using MOCS. This method involves random stimulus selection from a predefined range of stimulus magnitudes [121]. In our experiment observers were randomly presented with one of seven stimulus levels on each trial. Interleaving these levels may influence how decision weights are updated and reduce the observer’s confidence in their judgements, for example by disrupting observers’ meta-cognitive judgements of perceptual confidence [122] and thus their ability to selectively weight high-confidence trials in perceptual learning [53].
The findings bear some similarity to those found by Seitz et al., (2006) [112], who also used MOCS. Seitz et al., (2006) [112] found that learning did not occur without feedback, even when easy trials were presented. Observers trained on one of two tasks, either to discriminate the direction of low luminance motion stimuli, or to discriminate the orientation of a bar that had been masked in spatial noise. After training, both groups who had received feedback showed an improvement, while the groups without feedback did not improve. Seitz et al., (2006) [112] note that many of the experiments investigating perceptual learning use adaptive staircase procedures, and even though easy trials were present, this difference may contribute to the difference in findings. Seitz et al., (2006) [112] propose that interleaving easy and difficult trials within the staircase may allow for “better bootstrapping” from easy to hard, compared to trials that are randomly presented.
A second difference between the study conducted by Lu and Dosher (2012) [106] and ours was that while they used an orientation discrimination task, ours used a global motion coherence task. These tasks are known to be processed at different levels of the visual hierarchy. Learning without feedback has been obtained for local motion tasks. Ball and Sekuler (1982) [15] found that observers did not need feedback to improve on their direction discrimination task. In their study, observers were required to make a same/different judgement for two rapidly presented trials. In the “same” trials, motion took the same direction, and in the “different” trials the direction of motion varied by 3°. However, while the no feedback group did not receive trial-by-trial response feedback, they were rewarded with two cents for a correct response and had one cent deducted for each incorrect response, which may be construed as end of block feedback [112], which has been found to be as effective as trial-by-trial feedback [111].
Learning without feedback has also been found for global motion coherence tasks [61, 123]. However, none of these examples used the equivalent noise coherence task, but rather used the signal/noise coherence task. Huxlin et al., and Levi et al., [5, 58] using an equivalent noise method of global motion, also found perceptual learning, however they used an adaptive staircase for training. When investigating task-irrelevant learning, Watanabe et al., (2002) [124] found that task-irrelevant local motion improved passively, but did not find the same for task-irrelevant global motion. They suggest that this is indicative of the lower levels of the visual hierarchy being more receptive to modification, when attention is limited.
The results of this study ultimately provide us with evidence that, to obtain learning during training using the equivalent noise motion coherence task used by Levi et al., (2014) [58], with stimuli presented randomly via MOCS, our procedure should include trial-by-trial feedback during the training process. Furthermore, since learning did not occur without feedback, no feedback would be provided during the pre- and post-assessment phases.
Experiment 2: Does learning transfer from global motion?
Methods and Materials
Observers
30 new observers, 12 male and 18 female, took part, randomly and evenly distributed across the three conditions, resulting in a total of 10 individuals in each condition.
Design and Apparatus
A series of baseline measures were completed for global motion direction discrimination, contrast sensitivity and orientation discrimination (pre- and post-training assessments). Assessment stimuli were presented on a VIEWPixx/3D 23.6 inch monitor with a display resolution of 1920 x 1080 pixels and a 120 Hz refresh rate, using a Dell Precision T3610 PC running Windows 7. One pixel subtended 1.6 arc minutes and stimuli were viewed from a distance of 570 mm. Head position for testing was stabilised using a chin rest. Following this, all observers undertook five consecutive days of global motion training in one of three spatial frequency groups (broad, high or low). Training stimuli were presented on a 19” monitor with a display resolution of 1980 x 1080 pixels and a 60 Hz refresh rate, using a PC running Windows 7. One pixel subtended 1.7 arc minutes. Stimuli were viewed from a distance of 500 mm. All stimuli were presented for 1 second.
Stimuli
Global Motion
Broadband stimuli were the same as previously described. For low-frequency stimuli, the elements were circularly symmetric Gabor patches. The standard deviation of the Gaussian window, σ, was 30.1 arc minutes, and the spatial frequency of luminance modulation, f, was 1 cycle/degree. For each element, the luminance profile was defined as a function of horizontal and vertical position (x, y) as:
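One form consistent with these definitions is given here as a sketch. It assumes a radial cosine carrier (so that the element is circularly symmetric) modulating a mean luminance L_0; the symbols L_0 and r, and the choice of carrier phase, are assumptions introduced for illustration:

```latex
L(x, y) = L_0 \left[ 1 + A \cos\!\left(2\pi f r\right)
          \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right) \right],
\qquad
r = \sqrt{(x - x_0)^{2} + (y - y_0)^{2}}
```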
where (x0, y0) is the central position of the element, and A determines the contrast. Elements for the high frequency stimuli were defined in the same way, but had a standard deviation of 7.48 arc minutes and a spatial frequency, f, of 4 cycles/degree. For all stimuli, elements were initially uniformly randomly distributed within a region of 16.6° x 16.6° at the centre of the screen. A central black fixation dot was presented at all times when stimuli were not being displayed. Examples of the stimuli are shown in fig 5 a-c. Motion was created using the method detailed in fig 3. On each frame each dot moved a fixed distance of 8.5 arc minutes.
Pre- and Post-training Assessment Global motion
Stimuli were identical to those described in the training session, with the following exceptions. The standard deviations of the elements were 6.4 arc minutes (broadband), 28.4 arc minutes (low frequency) and 7.0 arc minutes (high frequency). Stimuli were presented within a mid-grey rectangle measuring 15.9° x 15.9°, and each element moved a fixed distance of 8 arc minutes.
Contrast Sensitivity
Stimuli were Gabor patches, with a spatial frequency of 1 cycle per degree (/°) or 4 cycles/°, presented in the centre of the screen on a mid-grey background, tilted either 20° (fig 6a&b). The Gaussian envelope of the Gabor stimulus had a standard deviation of 1.1°. 7 levels of contrast (0.05, 0.1, 0.15, 0.175, 0.2, 0.3, 0.4 % Michelson Contrast) were presented.
Gabor stimuli for a) contrast sensitivity at low spatial frequency (1 cycle per degree), b) contrast sensitivity at high spatial frequency (4 cycles per degree), and c) orientation discrimination, tilted leftwards by 2°, at mid spatial frequency (2 cycles per degree).
Training results over the 5 days for broad, low and high spatial frequencies. The red and blue dotted lines show 75% threshold on days 1 and 5 respectively.
Psychometric functions illustrating the three measures by which the non-linear regression provides evidence of a change. A: A leftward shift of the function indicates a general increase across stimulus levels, that did not vary as stimulus intensity increased. B: A steeper slope indicates an increase in the number of correct responses as stimulus intensity increases. C: An upward shift of the asymptote indicates an increase in performance where stimulus intensity is at its highest.
Orientation Discrimination
Gabor patches were presented for 100 ms on a mid-grey background, measuring 7.9° x 7.9°, and presented 6.6° either above or below the central fixation point. Orientation was centred at zero, at which point lines were vertical. There was a total range of 1.8° difference in tilt, between +/- 0.9°, with 7 linearly spaced points within the range (0, 0.67°, 1.3° or 2°). The spatial frequency of the Gabor patch was 1.85 cycles/° and the standard deviation of the Gaussian envelope was 0.2° (fig 6c). Contrast was fixed at 50% Michelson Contrast.
Procedure
Training on Global Motion
For each observer, training was undertaken at one spatial frequency only, totalling 420 trials daily for 5 continuous days. Feedback was provided after each trial.
Pre- and Post-Assessments
Measures were taken for global motion (high, broad and low spatial frequency), contrast sensitivity (high and low spatial frequency) and orientation discrimination (angle of tilt from the vertical) of an oriented Gabor patch. Responses were captured on the DataPixx response box for contrast sensitivity, and left and right arrows on the keyboard for global motion and orientation discrimination. The presentation order of trials was randomised by contrast and orientation or angle. There were 20 repetitions for each of the seven levels, for each condition. Testing was performed in a darkened room, before and after training.
Results: Experiment 2
Training Data
The number of correct responses from the daily training was calculated for each observer, for each day and level of coherence. Training data were analysed independently for the three groups trained on different frequencies (Broad, Low, High) and were modelled as a fixed effect of day and coherence and an interaction between these two predictors.
Results were analysed using the method previously described and are reported in tab 3. Performance improved for the broad and low trained groups, and a significant positive coherence by day interaction was found for both conditions. For the group trained with high spatial frequency stimuli there was a significant main effect of day. This suggests that while there was a significant increase in the total number of correct responses over the five days, this did not vary as a function of coherence (fig 3).
Fixed effects parameters for the Training Data
Contrast Sensitivity Table
Pre- and Post-Assessment Analysis: Global Motion
Learning is often measured through monitoring performance at a particular threshold, which is expected to shift the psychometric function leftwards if performance is improved (fig 8A). This threshold describes performance in terms of accuracy as a function of the strength of the stimulus [125], which are usually positively correlated, and accuracy is expected to reach asymptotic performance at the highest stimulus intensity [126]. Inspection of the pre- and post-assessment data revealed that in some conditions performance did not reach perfect accuracy, asymptoting at a proportion of correct responses that was less than 1; this resulted in a poor psychometric fit of the observer response data using the GLMM. To accommodate this, a nonlinear generalised mixed effects model (NLME) was used to include an additional parameter in order to model variability in the asymptotic performance at high signal levels, as has been applied in other perceptual learning studies [127–129].
Nonlinear regression analysis
The nonlinear regression provides three measures to assess a change in performance over time. Firstly, as with the GLMM, a leftward shift in the curve indicates an improvement in threshold (fig 8A). An increase in slope indicates an increase in the rate at which performance increases with increasing signal level (fig 8B). Finally, a change in the asymptote indicates a significant change to the performance at the highest level of stimulus intensity (fig 8C). These changes are independent aspects of the psychometric function fit, and may not necessarily always be congruent. For example, it is possible to obtain an increase in one measure and a decrease (or no change) in another.
Thus analysis of the pre- and post-assessment data was undertaken using a nonlinear mixed effects regression with the nlmefit function in Matlab with the following model:
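A logistic form consistent with the parameter definitions that follow is sketched here. The lower bound at chance (0.5) and the sign convention for the coherence term are assumptions made for illustration, not details confirmed by the text:

```latex
p = 0.5 + \frac{(A + A_d S) - 0.5}
               {1 + \exp\!\left[-(K + K_d S)\,\bigl(C - (C_0 + C_{0d} S)\bigr)\right]}
```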
where p is the proportion of correct responses, A determines the asymptotic level of performance, K defines the slope and C0 defines the threshold. Ad, Kd and C0d determine the change in these parameters after training, C is the coherence level, and S is a dummy variable, taking on values of 0 or 1 for the pre- and post-training sessions, respectively.
95% confidence limits were calculated using parametric bootstrap for 1000 simulated experiments, with the same number of simulated observers and repetitions as the actual experiment. Results for each condition are plotted in figs 9 to 11.
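The logic of a parametric bootstrap can be illustrated with a simplified sketch. Instead of refitting the nonlinear mixed effects model to each simulated experiment, as the analysis above requires, the example bootstraps a simpler statistic: the change in proportion correct at a single stimulus level. The proportions, trial count and function names are hypothetical.

```python
import random

def simulate_proportion(p_true, n_trials, rng):
    """Simulate n_trials binary responses and return the proportion correct."""
    correct = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return correct / n_trials

def bootstrap_ci(p_pre, p_post, n_trials, n_sims=1000, seed=0):
    """95% percentile interval for the post-minus-pre change in proportion correct."""
    rng = random.Random(seed)
    changes = sorted(
        simulate_proportion(p_post, n_trials, rng)
        - simulate_proportion(p_pre, n_trials, rng)
        for _ in range(n_sims)
    )
    lower = changes[int(0.025 * n_sims)]
    upper = changes[int(0.975 * n_sims) - 1]
    return lower, upper

# Hypothetical fitted proportions before and after training, 200 trials per session
ci_lower, ci_upper = bootstrap_ci(0.70, 0.85, 200)
print(ci_lower, ci_upper)
```

An interval that excludes zero would be read as a significant change, which corresponds to the confidence intervals plotted against the zero reference line in the change statistics.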
Top: Assessment results for the Broad Frequency trained condition on Broad, Low and High spatial frequency tests. Performance is presented as the proportion of correct responses, where blue circles show the results for the pre-assessment, and pink squares for the post assessment. 75% thresholds for pre and post are indicated by the green and orange vertical dotted lines respectively.
Bottom: Change statistics for the asymptote, threshold and slope, respectively. Plots show the median performance and the 95% confidence intervals for the change in performance between pre- and post-assessments. The red horizontal line at zero represents no change, confidence intervals crossing the zero line reflect no significant improvement. Points above the reference line show an improvement in performance and those below reflect a decrease in performance.
We proposed that should transfer occur at the local level within V1, improvement across frequencies for global motion would be predicted to occur for stimuli with shared frequency properties. Training on Broad Frequency stimuli should improve performance on both High and Low Frequency stimuli. Training on Low frequency should result in some improvement for Broad Frequency stimuli, as would training on High Frequency stimuli. We would not expect any transfer of learning between Low and High frequencies (see fig 13a). However, if transfer occurred at a global level, a more selective transfer would occur, mirroring the frequency tuning of V5 (see fig 13(b)).
Broad-frequency trained group
We predicted that should we obtain frequency specific transfer, then the Broad Frequency trained group should show some improvement in performance in the broad, low and high spatial frequency tests. The Broad trained group (fig 9) showed no consistent improvement in performance at post-testing. Most strikingly, when training and testing with broad frequency stimuli, there was no significant change in slope or threshold, and a significantly lower asymptote. For the low frequency test, no significant change was found for any of the three measures. Finally for the high frequency test, there was no change in slope, a significant reduction in asymptote, and a significant reduction in threshold. The increase in sensitivity at low signal levels in this condition was the only significant improvement found for the broad trained condition.
Low-frequency trained group
The low-frequency trained group showed the greatest improvement and transfer (fig 10). For the broad frequency test stimuli, there was a general upward shift in the asymptote but an increase in threshold and no change to the slope. This shows reduced sensitivity at low signal levels, but an increase in performance at higher levels. For the low frequency test stimuli, a reduction in threshold and increase in asymptote show clear overall improvement in performance. For the high frequency test stimuli, a reduction in threshold also indicates a significant improvement in performance at lower stimulus coherence. The significantly shallower slope suggests that at post-assessment the proportion of correct responses decreased for higher stimulus intensities in comparison with pre-test performance. Finally, there was no change to asymptotic performance, which was close to one at the pre-assessment phase, suggesting little room for improvement at the highest levels of coherence.
Top: Assessment results for the Low Frequency trained condition on Broad, Low and High spatial frequency tests. Performance is presented as proportion of correct responses, where blue (circles) show the results for the pre-assessment, and pink (squares) the post assessment. 75% thresholds for pre and post are indicated by the green and orange vertical dotted lines respectively.
Bottom: Change statistics for the asymptote, threshold and slope respectively. Plots show the median performance and the 95% confidence intervals for the change in performance between pre- and post-assessments. The red horizontal line at zero represents no change, confidence intervals crossing the zero line reflect no significant improvement. Points above the reference line show an improvement in performance and those below reflect a decrease in performance
High-frequency trained group
The group trained with high spatial frequency stimuli showed no improvement in any of the three spatial frequency conditions, for any of the three measures of threshold, slope or asymptote (fig 11).
Top: Assessment results for the High Frequency trained condition on Broad, Low and High spatial frequency tests. Performance is presented as proportion of correct responses, where blue (circles) show the results for the pre-assessment, and pink (squares) the post assessment. 75% thresholds for pre and post are indicated by the green and orange vertical dotted lines respectively.
Bottom: Change statistics for the asymptote, threshold and slope respectively. Plots show the median performance and the 95% confidence intervals for the change in performance between pre- and post-assessments. The red horizontal line at zero represents no change, confidence intervals crossing the zero line reflect no significant improvement. Points above the reference line show an improvement in performance and those below reflect a decrease in performance
Transfer : Global Motion
In summary, only the low frequency trained group provided reliable evidence for transfer to untrained motion conditions, with some improvement in performance for broadband test stimuli.
Pre and Post Analysis: Contrast Sensitivity
We predicted that, should contrast sensitivity improve as a result of training on global motion, then it would depend on the frequency of training (fig 13, top). Data were analysed for all but the same 2 observers who did not complete the final session, using the generalised linear mixed effects model previously outlined. The measures of analysis are the intercept and slope, and an interaction between the two predictors.
Broad-frequency trained group
The group trained with broad frequency global motion showed a significant decrease in performance from pre- to post-training, and a significant increase in slope between pre- and post-training. This suggests that their performance decreased at lower levels of the stimuli, while increasing at higher signal levels. Because of this ambiguity we plotted 75% thresholds for pre and post performance (fig 12 top left).
Results for Broad (top row), Low (middle) and High (bottom) trained groups in the contrast sensitivity pre- and post-assessments. Low spatial frequencies are shown in the left column and high spatial frequencies on the right. Performance is presented as proportion of correct responses as a function of Michelson Contrast. Blue (triangles) show the results for the pre-assessment, and pink (crosses) the post assessment. 75% thresholds for pre and post are indicated by the green and orange vertical lines respectively.
Figures a & b at the top show the predicted transfer. The bottom figure illustrates the actual transfer found
Post-assessment thresholds shifted rightwards suggesting worse performance overall.
Low-frequency trained group
The group trained with low frequency global motion showed a significant decrease in threshold, and a significant increase in slope. Again, we plotted 75% thresholds for pre and post performance (fig 12 middle left). In this instance the steeper slope, and leftward shift of the threshold, suggest improvement overall.
High-frequency trained group
The high frequency trained group showed no significant change for any condition (fig 12 bottom left).
Pre and Post Analysis: Orientation Discrimination
The orientation discrimination task was added as an untrained control, for which we predicted no improvement. Data were analysed for all but 3 observers who did not complete the final session, using the generalised linear mixed effects model previously outlined. The measures of analysis are the intercept and slope, and an interaction between the two variables. For analysis, the degrees of orientation (± 0, 0.67°, 1.3° or 2°) were converted to absolute values. Results are plotted in fig 14, and the fixed effects are listed in tab 5.
Assessment results for all trained groups (broad, low and high) for the orientation discrimination control task. Performance is presented as proportion of correct responses, where blue (triangles) show the results for the pre-assessment, and pink (crosses) the post assessment. 75% thresholds for pre and post are indicated by the green and orange vertical lines respectively.
Fixed Effects table for Orientation Discrimination
Neither the broad- nor the low-frequency trained group showed any significant change on this task. However, the high spatial frequency trained group showed a significant increase in threshold over session, and a significant increase in slope following training. The data show that the 75% threshold at post-assessment shifted rightwards, suggesting worse performance overall.
Discussion
Having established the necessity of feedback for learning, the objective of the main study was to investigate the specificity of spatial frequency tuning in perceptual learning for global motion. We trained three groups of observers on a global motion task with stimuli tuned to three different spatial frequency ranges (Broad, Low, High). Our training results demonstrated that perceptual learning took place over the five days for all groups. In order to establish the degree to which learning generalised between the spatial frequency content of the stimuli, we compared the results of the pre- and post-training assessments for each condition of global motion. Analysis from the post-training assessment showed that unambiguous improvement was restricted to the low frequency trained group. Notably the broad trained group performed worse than they did at pre-assessment stage, and there was no change for the high trained group.
We predicted that if training on global motion improved contrast sensitivity at the local level it would be specific to the trained frequency, and restricted to low and broad frequency stimuli if transfer occurred at the global level. Pre- and post-training assessment for static contrast sensitivity found that there was a significant improvement for the low trained group on the low spatial frequency contrast condition. While the broad trained group showed some improvement at higher signal levels, thresholds increased overall, indicating worse performance. No improvement was found for the high spatial frequency trained group, even on the high spatial frequency contrast sensitivity. Furthermore, no group showed improvement in sensitivity for the high spatial frequency contrast condition. The orientation discrimination task was included as a control condition and no improvement for any trained frequency was predicted or found. Our study supports the view that the perception of global motion shows a bias for low spatial frequencies, and suggests that the re-entrant connections from V5 to V1 play a crucial role in transfer.
Why is the study of global motion important?
Damage to the visual cortex, as a result of stroke or other brain injury, can result in dramatic changes to connectivity between areas. It has been proposed that perceptual training may play a role in strengthening existing neural pathways and may even create novel connections [1–7]. While specificity limits the effectiveness of perceptual learning as a tool for therapy, there is evidence that this specificity is reduced in visually impaired populations [130–135]. Huxlin et al., (2009) [5] proposed that using global motion may tap into “islands of activity” within V1 through the feedback connections from V5 to V1. However, it is still unclear how much the brain is able to compensate for damage, and whether recovery involves building new connections or changes in the functional connectivity within the existing pathways [136]. Understanding how the cortical hierarchy is organised, in terms of the nature of the feedforward and re-entrant pathways, is central to developing theories of perception [137].
This study sought to explore the effects of training on global motion and its transfer to other spatial frequencies and tasks. The back projections from V5 to V1 are known to be crucial for the perception of motion [138], strongly represented [74, 86] and rapidly updated [138, 139], making this a theoretically plausible route for perceptual learning. Recently, Romei et al., (2016) [107] published evidence suggesting that the re-entrant connections from V5 to V1 are malleable. A novel dual-coil Transcranial Magnetic Stimulation (TMS) protocol aimed at inducing Hebbian plasticity [140] was employed to enhance the perception of coherent motion by targeting areas V5 and V1 [107].
Thresholds were lower after TMS application, but this effect was critically dependent on the inter-pulse timing and direction of stimulation between V5 and V1. Motion perception was found to be strengthened for the re-entrant connection from V5 to V1, but not for the feedforward connection from V1 to V5. Recently, using a similar paired cortico-cortical TMS protocol (ccPAS), Chiappini et al., [108] induced a direction-selective improvement in performance by combining subthreshold stimulation with the simultaneous presentation of direction-specific moving stimuli. This adds to the accumulating evidence that the re-entrant connections from direction-tuned neurons play a role in perceptual learning in global motion coherence tasks. Furthermore, recordings obtained from macaque monkeys have previously suggested that there is no delay for information processed in V5 to be fed back to lower areas [139]. Hupe et al., (2016) [139] recorded extracellular responses from neurons in V1, V2 and V3 after inactivating V5 and compared the time course of responses for moving and flashed stimuli. Their results indicated there was no delay in either case, and Hupe et al., (2016) [139] proposed that the feedback from V5 is present prior to the bottom-up information from the feedforward connections; however, they were not able to test this hypothesis. More recently, Bridge et al., (2016) [136] found evidence of an independent motion pathway in human observers, using diffusion-weighted MRI (see [141] for more information), that suggests a direct ipsilateral link between the lateral geniculate nucleus (LGN) and V5. These findings question the straightforward hierarchical structure of cortical connectivity and the generally assumed flow of information within the visual system.
Perception and learning of contrast stimuli
Our starting point for the investigation was the results from Levi et al., (2014) [58], who found that learning generalised from high-level global motion training to low-level contrast detection tasks. The pathway for contrast detection and discrimination is believed to develop during childhood and adolescence, reaching its peak in adulthood, followed by a decline in late adulthood [142]. Early psychophysical studies supported this view, finding little perceptual learning for contrast-dependent stimuli [17]. However, later findings [143] suggested that improvement in contrast sensitivity was dependent on having enough training. Improvement in contrast sensitivity is now a well-documented result [48, 49, 143–146], but has been found to be highly tuned to spatial frequency and specific to the trained eye and the retinal location of training, but not selective for orientation [21]. Sowden et al., (2002) [21] trained observers to detect the orientations of sine-wave gratings at varying contrasts. Results indicated that thresholds were reduced, and that this learning was tuned for spatial frequency and specific to the trained location, but not specific to the orientation of the grating. As a result, they suggested that the mechanism for contrast detection is located in a sub-population of cells in V1, such as those in 4Cα, where cells are tuned for spatial frequency but not orientation [21].
Since Levi et al., (2014) [58] reported the biggest improvement for low spatial frequency stimuli in drifting targets, this may suggest that improvement was specific to the temporal and spatial features of the training stimuli. We asked whether the direction selective cells in the area might account for the improvement in contrast sensitivity. Thus, our paradigm tested this for static, frequency specific stimuli. Our results were consistent with those found by Levi et al., (2014) [58] in that improvement occurred for low frequency stimuli, and only for the groups who trained on broad and low frequency global motion. There was no improvement on the high spatial frequency (4 cycles per degree) contrast stimuli for any trained condition.
Learning and transfer for global motion training
Based on the evidence for the frequency tuning of V5 [76, 77, 84, 95], we predicted greater improvements (and transfer) between broad and low spatial frequencies, but limited or no transfer for the high spatial frequency trained group. Furthermore, we hypothesised that if learning occurred in the lower, local levels of processing, we might find transfer from the spatial frequency specific training of global motion stimuli to contrast sensitivity. Given the higher level pooling of low spatial frequencies, we expected that if transfer took place as a result of global motion processing, it would occur between those conditions containing low spatial frequency information, specifically between the low and the broad conditions. After the five day training period, there was significant improvement for the groups trained at all spatial frequencies.
When assessing the post-training transfer to untrained global motion frequencies, the low spatial frequency trained group was the only group to show clear improvement in performance, and evidence of transfer to another condition. Although threshold was worse for the broad test after training with low frequency stimuli, the asymptotic performance was better. Interestingly, for the high spatial frequency test, there was a significant improvement in the threshold, although the slope was significantly shallower. This suggests that the sensitivity for lower coherence levels increased. Performance at the highest levels reached an asymptote close to 1 at pre-assessment and remained unchanged at the post-assessment stage. Finally, the shallower slope suggests a reduction in the rate at which correct responses increased as a function of stimulus intensity.
The high spatial frequency trained group showed no improvement and no transfer to any other spatial frequency. The most surprising results were those obtained from the broad frequency trained group. Asymptotic performance was significantly worse at the post-test stage for their own trained frequency and for the high frequency test, with no significant change in the low frequency test. There was no improvement in the slopes for any condition, and a small but significant threshold improvement was found for the high frequency test.
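The roles of threshold, slope and asymptote in the results above can be illustrated with a minimal sketch. It assumes a logistic psychometric function with a guess rate of 0.5; the functional form, the parameter names and all numerical values are illustrative assumptions, not the fitted model or the data from this study.

```python
import math


def proportion_correct(coherence, threshold, slope, asymptote, guess=0.5):
    """Hypothetical logistic psychometric function (an illustrative assumption).

    threshold: coherence at which performance is halfway between guess and asymptote
    slope:     steepness of the rise around the threshold
    asymptote: upper limit on performance at high coherence
    guess:     chance level, assumed here to be 0.5
    """
    rise = 1.0 / (1.0 + math.exp(-slope * (coherence - threshold)))
    return guess + (asymptote - guess) * rise


# Made-up parameter values, chosen only to show the pattern described in the text:
# a lower threshold combined with a shallower slope and an unchanged asymptote.
pre = dict(threshold=0.4, slope=15.0, asymptote=0.99)
post = dict(threshold=0.3, slope=8.0, asymptote=0.99)

for coherence in (0.1, 0.2, 0.4, 0.8, 1.0):
    before = proportion_correct(coherence, **pre)
    after = proportion_correct(coherence, **post)
    print(f"coherence={coherence:.1f}  pre={before:.3f}  post={after:.3f}")
```

With these assumed values, the post-training curve lies above the pre-training curve at low coherence, while both curves approach the same asymptote at high coherence. This is the sense in which a lower threshold with a shallower slope can reflect increased sensitivity at low coherence levels without any change in peak performance.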
What our findings suggest for the models of perceptual learning
The two main theories of visual perceptual learning suggest that learning occurs either at the lower levels, within the receptive fields of sensory neurons [17, 18, 26], or at a higher, more cognitive level [34], as a reweighting of the decision unit. We predicted that if global motion training improved contrast detection, then, based on the frequency tuning of the early visual areas, the improvement would be specific to the spatial frequency of training. In this case we would expect improvement on the low frequency contrast stimuli for the low and broad trained groups, and improvement on the high frequency contrast stimuli for the high and broad trained groups. Improvement was exclusive to the low frequency contrast condition for the low trained group. The specificity of transfer to low frequencies is consistent with V5 influencing processing at V1 through feedback loops [107]. This remains compatible with the view that global motion detectors pool low frequency information, and suggests that the re-entrant connections to V1 after global motion training only update low frequency channels.
We predicted that if transfer occurred between motion conditions, it would be most likely to occur either between groups that shared frequency content, or based on the frequency tuning of V5. The group trained on high spatial frequencies showed no improvement and no transfer. Since high spatial frequency content has been shown to be attenuated in V5 [95], this again suggests that improvements for the untrained tasks are absent because only low frequency channels are updated. The groups trained on low and broad frequency motion conditions showed transfer to low spatial frequency contrast detection, also supporting a role for the back projections to low frequency channels.
The low frequency training condition also showed some improvement for the broad motion test stimuli, with a significant increase in asymptotic performance. The low frequency (and to a lesser extent the broad) conditions will have experienced a high level of correlation in the activity in frequency channels between the feedforward and feedback connections. We speculate that this joint activity may be important for perceptual learning. The improvement in the low frequency contrast detection condition for the broad and low trained groups is consistent with this hypothesis.
Dosher and Lu (2010) [33] suggest that the reweighting of decision units is likely to be the dominant mechanism of learning. If, as they propose, learning occurred primarily through an increase in the weight at a higher level decision making unit, we might have expected the learning from training to be, at the very least, maintained at the assessment stage. However, performance at the post-assessment stage was not unambiguously better for all groups. The only difference between the exercises was that trial-by-trial feedback was present for the training phase and absent for the testing phase. Our pilot study found that, for our methods, learning occurs with feedback, but not without. Additionally, Herzog and Fahle (1997) [111] suggested that once feedback is removed, performance plateaus around the level last obtained with feedback, so our expectation was that observers would maintain the improvement achieved for their trained frequency. This was the motivation for removing feedback during testing.
We offer some speculative ideas that may help account for the general absence of improvement at the post-assessment stage. There is emerging evidence that interleaving random stimuli may disrupt perceptual learning [147]. Yu et al., (2004) [143] found that learning did not occur when Gabor stimuli of different contrasts were randomly interleaved (as when using MOCS), but did occur when only one contrast was presented at a time. Yu et al., (2004) [143] termed the interleaving of contrast stimuli “contrast roving”, and suggested that the absence of learning under contrast roving implies that a temporally organised pattern of stimulus presentation may be required for perceptual learning to occur. Additionally, Kuai et al., (2005) [145] found that the effect of temporal patterning was evident for motion direction discrimination. Observers were trained either using randomly varied trials or with a fixed temporal pattern of exposure (changing in a sequential clockwise direction). The first group showed no improvement in discrimination, whereas the second group, with the fixed temporal exposure, did. Cong and Zhang (2014) [148] showed that a semantic tag associated with each stimulus presentation enabled significant learning. Sequential tags such as A,B,C,D and 1,2,3,4, which contain semantic and sequential identity information, both resulted in significant learning. Cong and Zhang (2014) [148] suggest that tagging and temporal sequencing may help direct the system to switch attention to the neuronal set responsible for the specific stimulus. These findings are consistent with the experiments conducted by Seitz et al., (2006) [112], which, as previously discussed, found no learning without feedback using MOCS.
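The difference between randomly interleaved and temporally patterned presentation described above can be sketched in a few lines. The function names and the contrast levels below are hypothetical, chosen for illustration only, and are not taken from any of the cited studies.

```python
import random


def interleaved_sequence(levels, repeats, seed=0):
    """Randomly interleave all stimulus levels, as in a method of constant stimuli block."""
    trials = [level for level in levels for _ in range(repeats)]
    random.Random(seed).shuffle(trials)  # seeded only to make the example reproducible
    return trials


def patterned_sequence(levels, repeats):
    """Present the levels in the same fixed order on every cycle."""
    return [level for _ in range(repeats) for level in levels]


contrasts = [0.2, 0.3, 0.47, 0.63]  # hypothetical contrast levels
roving = interleaved_sequence(contrasts, repeats=3)
patterned = patterned_sequence(contrasts, repeats=3)
print("interleaved:", roving)
print("patterned:  ", patterned)
```

Both sequences contain exactly the same trials. Only the order differs, which is the factor the roving and temporal patterning studies manipulate.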
There is some evidence that interleaving easy and difficult trials for perceptual training may lead only to a temporary improvement that (a) does not persist and (b) disappears when easy trials are removed [149, 150]. In a difficult task, learning does not occur without feedback but does occur with feedback [106], since feedback improves detection for difficult tasks. Removing the feedback may be equivalent to removing the easy trials. Within a training session comprised of difficult trials, Lin and Dosher (2017) [105] presented observers with a single block of mixed easy and difficult trials, which led to rapid sustained improvement in accuracy within the block. However, after the easy trials were removed and only the difficult trials were presented, performance slowly returned to prior levels. In our study, improvement was sustained only while feedback was present, with the exception of the low frequency trained group. Since learning and transfer were evident for the low spatial frequency trained group, this may suggest an interaction between the amount of learning and the benefit of feedback. In the broad and high spatial frequency trained groups, the amount of learning that took place was not sufficient to overcome the reduction in performance when feedback was removed. In contrast, learning for the low spatial frequency trained group was robust and persistent, and thus incurred less disruption from the absence of feedback.
Ultimately, neither model alone accounts for the findings in this study and, while the two positions are pitched as competing theories, they may not be mutually exclusive. In an attempt to combine the two models, Solgi and Weng (2013) [151] model learning as a two-way process (descending and ascending) that attributes training effects to the connections between the early and higher sensory areas, and also to an increased representation in the lower level. Solgi and Weng (2013) [151] argue that the re-entrant connections from higher levels have an important role in enabling transfer. Our results are consistent with this account and suggest that the re-entrant connections play a vital role in transfer.
Conclusions
Our results support the Reverse Hierarchy Theory view that the feedback connections from higher levels back to early visual processing areas may be involved in facilitating learning that transfers [54, 55], and are consistent with models that suggest that the re-entrant connections play a vital role in learning and transfer [38, 151].
Our first experiment suggested that for perceptual learning of an equivalent noise global motion task, presented using MOCS, feedback was a necessity. However, after training with feedback, there was little evidence of a robust improvement when feedback was removed at the post-assessment stage. The only group that showed unambiguous learning and transfer was the low spatial frequency trained group. Our findings are in line with current research which finds that global motion detectors pool low spatial frequencies.
Our main experiment found robust learning and transfer for the group trained with low spatial frequency specific global motion Gabors. This provides evidence of frequency specific transfer within global motion perception, and to a static untrained task. The frequency tuning of these results is consistent with perceptual learning which depends on the global stage of motion processing in V5.
Acknowledgments
This research was funded by a University of Essex Doctoral studentship, and grants from ESSEXLab and PsyPAG to J.A.