Variability in error-based and reward-based human motor learning is associated with entorhinal volume

Error-based and reward-based processes are critical for motor learning, and are thought to be mediated via distinct neural pathways. However, recent behavioral work in humans suggests that both learning processes are supported by cognitive strategies and that these contribute to individual differences in motor learning ability. While it has been speculated that medial temporal lobe regions may support this strategic component to learning, direct evidence is lacking. Here we first show that faster and more complete learning during error-based visuomotor adaptation is associated with better learning during reward-based shaping of reaching movements. This result suggests that strategic processes, linked to faster and better learning, drive individual differences in both error-based and reward-based motor learning. We then show that right entorhinal cortex volume was larger in good learning individuals (classified across both motor learning tasks) compared to their poorer learning counterparts. This suggests that strategic processes underlying both error- and reward-based learning are linked to neuroanatomical differences in entorhinal cortex.

Significance Statement
While it is widely appreciated that humans vary greatly in their motor learning abilities, little is known about the processes and neuroanatomical bases that underlie these differences. Here, using a data-driven approach, we show that individual variability in error-based and reward-based motor learning is tightly linked, and related to the use of cognitive strategies. We further show that structural differences in entorhinal cortex predict this intersubject variability in motor learning, with larger entorhinal volumes being associated with better overall error-based and reward-based learning.
Together, these findings provide support for the notion that the ability to recruit strategic processes underlies intersubject variability in both error-based and reward-based learning, which itself may be linked to structural differences in medial temporal regions.


The human brain's capacity to learn new motor commands is fundamental to almost all activities we engage in. Traditionally, such learning has been viewed as an implicit, procedural process of the motor system, with neural studies focusing on brain areas in the frontoparietal cortex, striatum or cerebellum (Doya, 2000

In reward-based learning, the form of learning in which motor commands are updated by signals related to success or failure (Sutton and Barto, 2018), the use of cognitive strategies has also been shown to play a pivotal role in performance (Holland et al., 2018). Conventionally, reward-based learning has been shown to involve neural circuits in the basal ganglia and striatum (Doya, 2000), but there is also some emerging evidence to suggest contributions from MTL regions (Gershman and Daw, 2017; Duncan et al., 2018). A key feature of reward-based learning is that it is achieved through exploration (i.e., the brain figuring out motor commands that increase success). Insofar as such exploration is facilitated by strategies, MTL structures may also contribute to performance during reward-based motor learning.

context (Tolman, 1948; O'Keefe and Nadel, 1978). Such maps are likely to be critical when forming new action-outcome associations, as is the case when searching for and implementing strategies during motor learning.

Here we asked whether individual differences in motor learning performance are linked to hippocampal and entorhinal volume in humans. To examine this, we had human participants undergo a structural neuroimaging session in addition to performing separate error-based and reward-based learning tasks, both known to elicit the use of strategies. We show that learning performance in both motor tasks is directly related and that better overall learning across tasks is associated with larger entorhinal cortex volume.

Participants
The current study used a subset of participants (N=34; 18 men and 16 women, aged 20-35 years) from a larger cohort study (registered at https://osf.io/y8649) in which 66 right-handed paid volunteers underwent structural and resting state MRI scans. Our thirty-four participants took part in an error-based and reward-based motor learning testing session in addition to participation in the main study. One of these participants was excluded from further analysis because of a high number of invalid trials in the error-based learning task (>25%), thus leaving 33 participants for analysis.

The main experiment and motor learning follow-up tasks were approved by the Queen's

anterior-to-posterior encoding; 2× acceleration factor) and an ultra-high resolution T2-weighted volume centred on the medial temporal lobes (resolution 0.5 × 0.5 mm²; 384 × 384 matrix; slice thickness 0.5 mm; 104 transverse slices acquired parallel to the hippocampus long axis; anterior-to-posterior encoding; 2× acceleration factor; TR 3200 ms; TE 351 ms; variable flip angle; echo spacing 5.12 ms). The whole brain protocols were selected on the basis of protocol optimizations designed by Sotiropoulos and colleagues (2013). The hippocampal protocols were modeled after Chadwick and colleagues (2014). In addition, we acquired two sets (right-left direction and left-right direction) of whole-brain diffusion-weighted volumes (64 directions, b = 1200 s/mm², 93 slices, voxel size = 1.5 × 1.5 × 1.5 mm³, TR 5.18 s, TE 103.4 ms; 3 times multiband acceleration), plus two extra B0 scans gathered separately for each orientation.

Data analysis
Automated cortical and subcortical segmentation of the T1-weighted and T2-weighted brain data was performed in Freesurfer (v6.0) (Fischl et al., 2002, 2004

anterior and posterior hippocampus in each hemisphere. The ultra-high-resolution T2-weighted 0.5 mm isotropic medial temporal lobe scans were submitted to automated segmentation using HIPS, an algorithm previously validated against human raters specialized in segmenting detailed neuroanatomical scans of the hippocampus (Romero et al., 2017). Three independent raters were trained on segmenting the hippocampus at the uncal apex into anterior (aHC) and posterior (pHC) segments, and achieved a Dice coefficient of absolute agreement of 80%. Two of these raters independently segmented all participants using the 0.5 mm T1-weighted scans. The T2-weighted medial temporal lobe scans were registered to the T2-weighted whole-brain scans, which were in turn registered to the T1-weighted whole-brain scans, and the combined transform was used to place the rater landmarks on the detailed medial temporal lobe scans. Finally, the total number of voxels in each subregion was multiplied by the volume of each voxel to obtain a total aHC and pHC volume.
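The two quantities mentioned above, the Dice coefficient of rater agreement and the voxel-count-to-volume conversion, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the representation of a segmentation as a set of voxel indices, and the example voxels are assumptions made for illustration. Only the 0.5 mm isotropic voxel size is taken from the text.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two segmentations given as collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially (a convention chosen here)
    return 2 * len(a & b) / (len(a) + len(b))


def region_volume_mm3(n_voxels, voxel_size_mm=(0.5, 0.5, 0.5)):
    """Region volume = number of voxels x volume of a single voxel."""
    dx, dy, dz = voxel_size_mm
    return n_voxels * dx * dy * dz


# Hypothetical voxel labels from two raters (illustrative values only).
rater_1 = {(10, 12, 7), (10, 13, 7), (11, 12, 7), (11, 13, 7), (12, 12, 7)}
rater_2 = {(10, 12, 7), (10, 13, 7), (11, 12, 7), (11, 13, 7), (12, 13, 7)}

agreement = dice_coefficient(rater_1, rater_2)  # 4 shared voxels out of 5 + 5
volume = region_volume_mm3(1000)                # 1000 voxels of 0.125 mm^3 each
```

A Dice value of 1 means the two masks are identical and 0 means they share no voxels.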

To account for differences in head size, all regional volumes were corrected for total intracranial (IC) volume obtained from Freesurfer. This was done by first estimating the slope b of the regression line of each regional volume on the IC volume across the 33 participants included in the analysis. Next, each regional volume was adjusted for the IC volume as: adjusted volume = raw volume − b × (IC volume − mean IC volume).

General procedure
Thirty-four participants performed an error-based and a reward-based motor learning task. We attempted to fully counterbalance the tasks across participants: the first 19 participants performed the error-based motor learning task before performing the reward-based motor learning task, with the next 15 participants performing the reward-based motor learning task before the error-based motor learning task.

The eye movement data were not analyzed in this study. The stimuli and motor learning tasks are described in detail below.
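The intracranial volume correction described under Data analysis can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function name and the example values are invented, and the slope b is computed with the ordinary least-squares formula for a single predictor.

```python
from statistics import mean


def adjust_for_icv(raw_volumes, icv):
    """Adjust regional volumes for intracranial (IC) volume.

    Implements: adjusted = raw - b * (IC - mean IC), where b is the slope of
    the least-squares regression of regional volume on IC volume.
    """
    mean_icv = mean(icv)
    mean_raw = mean(raw_volumes)
    # Slope of the regression line: covariance(IC, raw) / variance(IC).
    sxy = sum((x - mean_icv) * (y - mean_raw) for x, y in zip(icv, raw_volumes))
    sxx = sum((x - mean_icv) ** 2 for x in icv)
    b = sxy / sxx
    return [y - b * (x - mean_icv) for x, y in zip(icv, raw_volumes)]


# Made-up values for four hypothetical participants (arbitrary units).
icv = [1400.0, 1500.0, 1600.0, 1700.0]
raw = [1.9, 2.1, 2.2, 2.4]
adjusted = adjust_for_icv(raw, icv)
```

By construction, the adjusted volumes keep the group mean of the raw volumes but are no longer linearly related to IC volume.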

Motor learning tasks
Reward-based motor learning
Task
Our task was inspired by the reward-based learning task designed by Dam and colleagues (Dam et al., 2013). Participants performed reaching movements from a start position to a target line by sliding the stylus across the tablet. They were instructed to "find an invisible curved path by drawing paths on the tablet and evaluating your score for each attempt". Participants started with a practice block of 10 trials, in which they traced a visible, straight line between the start position and the target, to become familiar with the task and the timing requirement of performing the movement within 2 s. Next, participants performed 12 blocks, each containing 20 attempts to copy an invisible path, which differed in each block.

The median score in trials 11 to 20 of each block of 20 attempts was used as a measure of learning performance. We did not use trials 1-10 in our analysis based on our frequent observation that participants who learned fairly quickly often used exploratory strategies when encountering a new path, which often resulted in scores of, or around, zero on several trials ( Fig

screen. After performing a baseline block, participants performed a visuomotor rotation task, a task that has been used extensively to assess error-based learning (e.g., Cunningham, 1989; Krakauer et al., 2005). In this task, the movement of the cursor representing the hand position is rotated about the hand start location, in this experiment by 45° clockwise, requiring that a counterclockwise adjustment of movement direction be learned.

Each trial started with the participant moving the stylus to a central start position (5 mm radius circle; Fig. 1D). When the (unseen) cursor was within 5 cm of the start position, a ring was presented around the start position to indicate the distance of the cursor, so that the participant had to reduce the size of the ring to move to the start position. The cursor (4 mm radius circle) appeared when the cursor 'touched' the start position (9 mm distance). After the cursor was held within the start position for 500 ms, the target (6 mm radius open circle) was presented on an (imaginary) 10 cm radius ring around the start position at one of eight locations, separated by 45° (i.e., 0, 45, 90, 135, 180, 225, 270 and 315°). In addition, 64 non-target 'landmarks' (3 mm radius outlined circles, spaced 5.625° apart) were presented, forming a 10 cm radius ring around the start position. After a 2 s delay, the target would 'fill in' (i.e., turn red), providing the cue for the participant to perform a fast movement to the target. If the participant started the movement before the cue, or more than 1 s after the cue, the trial was aborted and a feedback message indicating "Too early" or "Too late" appeared on the screen, respectively. In correctly timed trials, the cursor was visible during the movement to the ring and then became stationary for 1 s when it reached the ring, providing the participant with visual feedback of their endpoint error. When any part of the cursor overlapped with any part of the target, the target would turn green to indicate a hit. If the duration of the movement was longer than 300 ms, a feedback message "Too slow" would appear on the screen.
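The geometry of this task (a 45° clockwise cursor rotation about the start position, targets and landmarks on a 10 cm ring, and a hit whenever the cursor and target circles overlap) can be sketched as follows. This is a hedged illustration, not the experiment code: the function and constant names are invented, and it assumes a coordinate frame centred on the start position with angles measured counterclockwise and the y axis pointing up, so that a clockwise rotation is a negative angle.

```python
import math

TARGET_DISTANCE_MM = 100.0  # targets lie on a 10 cm radius ring
TARGET_RADIUS_MM = 6.0
CURSOR_RADIUS_MM = 4.0
ROTATION_DEG = -45.0        # 45 deg clockwise under the assumed convention


def ring_position(angle_deg, radius_mm=TARGET_DISTANCE_MM):
    """Position on the ring around the start position (taken as the origin)."""
    a = math.radians(angle_deg)
    return (radius_mm * math.cos(a), radius_mm * math.sin(a))


def rotate_cursor(hand_xy, rotation_deg=ROTATION_DEG):
    """Rotate the hand position about the start position to get the cursor position."""
    x, y = hand_xy
    r = math.radians(rotation_deg)
    return (x * math.cos(r) - y * math.sin(r), x * math.sin(r) + y * math.cos(r))


def is_hit(cursor_xy, target_xy):
    """A hit occurs when any part of the cursor overlaps any part of the target."""
    return math.dist(cursor_xy, target_xy) < CURSOR_RADIUS_MM + TARGET_RADIUS_MM


target_angles = [i * 45 for i in range(8)]          # 0, 45, ..., 315 deg
landmark_angles = [i * 360 / 64 for i in range(64)]  # spaced 5.625 deg apart

target = ring_position(0)
# Reaching straight at the target sends the rotated cursor 45 deg clockwise of it.
naive_cursor = rotate_cursor(ring_position(0))
# Aiming 45 deg counterclockwise of the target fully compensates for the rotation.
compensated_cursor = rotate_cursor(ring_position(45))
```

Under these assumptions, `is_hit(naive_cursor, target)` is false and `is_hit(compensated_cursor, target)` is true, which is the counterclockwise adjustment participants have to learn.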

In trials in the rotation block, the movement of the cursor was rotated by 45° clockwise around the start position. To assess the contribution of the explicit process of learning, participants performed several 'reporting' trials. These trials were performed at the end of the first rotation block to ensure that participants' learning behavior would not be influenced, as the reporting procedure itself can increase the proportion of participants that implement a cognitive strategy (3). In reporting trials, participants were instructed to, before each reach movement, report the aiming direction of their hand for the cursor to hit the target. They did this by turning a knob with their left hand, to rotate

To do the PCA, we first transformed the variables from the error-based learning task, whereby all angles were converted to errors with respect to the target, such that zero corresponds to a target hit, negative errors (i.e., between −45° and 0°) correspond to no or partial compensation of the rotation, and positive errors correspond to overcompensation of the rotation. This transformation ensured that higher values on both the error-based and reward-based motor learning tasks were associated with better learning performance. We then standardized all scores before submitting them to the PCA. The principal components (PCs) were obtained using the pca function in Matlab, which uses a singular value decomposition algorithm to find PCs that capture the maximal variance in the data.
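The steps described above (converting angles to errors, standardizing, and extracting PCs by singular value decomposition) can be sketched in Python with NumPy. This is an illustrative stand-in for the Matlab pca call, not the authors' code. The function names and the simulated score matrix are assumptions, and the error transform assumes that a compensation angle of 45° corresponds to a target hit.

```python
import numpy as np


def hand_angle_to_error(angle_deg, rotation_deg=45.0):
    """Error relative to the target: 0 = hit, negative = under-compensation."""
    return angle_deg - rotation_deg


def pca_svd(scores):
    """PCA of a participants x measures matrix via SVD of the standardized scores."""
    X = np.asarray(scores, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each measure
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # proportion of variance per component
    pc_scores = U * S                # participant scores on each component
    return Vt.T, pc_scores, explained


# Simulated data: 33 participants, 5 learning scores sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(33, 1))
scores = latent @ np.ones((1, 5)) + 0.5 * rng.normal(size=(33, 5))

loadings, pc_scores, explained = pca_svd(scores)
```

The first column of `pc_scores` plays the role of a single per-participant summary score such as PC1, and `explained` gives the share of variance captured by each component.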

To test the hypothesis that better performance in the motor learning tasks is related to greater volumes of brain areas in the medial temporal lobe, we performed multiple linear regression analyses. All models were estimated using the fitlm function in Matlab, which returns a least-squares fit of the scores to the data. Our primary analysis included the left and right HC and EC volumes. To control for a potential effect of overall head size on learning performance, we also included each participant's total intracranial volume, making a total of five neuroanatomical measures. For the first and second PC, we fitted a multiple linear regression model with the PC as the dependent variable, and the set of four regional volumes plus the IC volume as predictors. Previous studies have reported differential relationships between the anterior and posterior parts of the hippocampus and memory (e.g., Maguire et al., 2000). Therefore, we performed a secondary analysis, including the left and right anterior and posterior HC volume as predictors, and the IC volume as a confounder.
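A multiple linear regression of this form (one PC as the outcome, four regional volumes plus IC volume as predictors) can be sketched with an ordinary least-squares fit in Python. This is a minimal illustration that uses NumPy in place of Matlab's fitlm and does not reproduce its full output (for example, standard errors and p-values). The function name and the simulated data are assumptions.

```python
import numpy as np


def fit_linear_model(predictors, outcome):
    """Least-squares fit of outcome ~ intercept + predictors.

    Returns the coefficients (intercept first) and the R-squared of the fit.
    """
    X = np.asarray(predictors, dtype=float)
    y = np.asarray(outcome, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Simulated predictors: left HC, right HC, left EC, right EC, IC volume
# (standardized, made-up values for 33 hypothetical participants).
rng = np.random.default_rng(1)
volumes = rng.normal(size=(33, 5))
# Simulated outcome that depends on the fourth predictor plus noise.
pc1 = 0.8 * volumes[:, 3] + 0.5 * rng.normal(size=33)

coef, r_squared = fit_linear_model(volumes, pc1)
```

Here `coef[0]` is the intercept and `coef[1:]` are the slopes for the five predictors, in column order.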

In order to determine the relationship between motor performance in reward-based and error-based learning tasks, and the extent to which the size of hippocampal and entorhinal cortex may be associated with such learning, we collected high-resolution structural MRI scans from participants (N=34) prior to performing two separate motor learning tasks outside the scanner. In the reward-based learning task, participants learned to copy an invisible, curved path through trial and error, using only a score (between 0 and 100 points) to improve their performance. This score, presented at the end of each trial, indicated how closely the participants' drawn path corresponded to the invisible path (Fig. 1B). Participants drew these paths on a digital drawing tablet from a start to a target position displayed on a vertical monitor (Fig. 1A), and were instructed to maximize their score. To obtain a representative measure of each participant's reward-based learning rate and ability, we had participants perform this task for 12 different invisible paths, with 20 attempts for each. Participants were naive to the possible shapes of the paths, which were shaped as single curves (i.e., half sine waves) and double curves (i.e., full sine waves) between the start and target position, with different amplitudes (see Fig. 1C). Because participants received only visual feedback about their path trajectory (and never the rewarded path), they did not receive error-based information that could be used to guide learning. By design, this reward-based task requires implementing a search strategy to first find the invisible path and then refine the drawn path, and we thus predicted that participants who perform well in this task are better at implementing such strategies.
For the error-based learning task, we used the classic visuomotor rotation learning paradigm (Cunningham, 1989), wherein participants had to adjust their movements to a 45° rotation of the cursor movement, which represented participants' hand movements, in order to hit visual targets (Fig. 1D). Participants performed center-out reaching movements on the drawing tablet to one of eight targets displayed on a monitor. After a baseline phase with veridical cursor feedback, participants were exposed to the 45° visuomotor rotation of the movement of the cursor, requiring an adjustment of the reaching movement in the opposite direction. Learning in this task consists of two components: automatic, implicit adjustments of the reach direction, resulting in gradual changes in performance, and the implementation of an aiming strategy to counteract the rotation,

The black traces in Figure 1C and 1E show the learning curves, averaged across all participants, for the reward-based and error-based learning tasks, respectively. These figures demonstrate that participants learned to increase their scores in the reward-based task and change their hand angle in the error-based task across trials. However, these group-averaged results may be somewhat misleading, as they obscure significant intersubject variability in both the rates and levels of learning obtained (see gray traces in Fig. 1C,E, which depict single participants). For example, Figure 2 shows the behavior of two participants, one 'good' overall learner and one 'poor' overall learner, in both the reward-based learning task and the error-based learning task. Figure 2A and 2B depict the paths that the participants drew (left panel) and the corresponding scores (right panel), in two blocks of the reward-based learning task for a single (top) and double curve (bottom) with the largest amplitude (blocks 4 and 11 for the participant in Fig. 2A; blocks 11 and 10 for the participant in Fig. 2B). While both participants quickly converged on a good solution for the single curve, resulting in scores close to 100, the movements of the participant in Figure 2A resemble the invisible curve more closely. In addition, while the participant in Figure 2A quickly converges upon a solution that has a similar shape to the invisible double curve, the participant in Figure 2B never learns to draw that same double curve, and their score remains low.

Figure 2C and 2D show, for the same two participants, the median hand angle (in blue) for each bin of eight trials across the error-based learning task, as well as the reported aiming angle (in purple) assessed near the end of the first rotation block. Appropriate corrections for the visuomotor rotation are plotted as positive values; that is, a hand angle of 45° corresponds to full compensation for the rotation. The participant in Figure 2C shows quick adjustment of the hand angle towards 45° in the first and second rotation block, and a quick return towards 0° in the washout block. Such fast learning is associated with a large contribution of an aiming strategy, consistent with their reported aiming angles around 39°. The participant in Figure 2D, by contrast, shows only gradual adjustments of the hand angle in the rotation and washout blocks, and correspondingly reports aiming values around 0°, suggesting that learning in this participant is mainly driven by the implicit process. Overall, the participant in Figure 2A,C showed better learning performance in both tasks than the participant in Figure 2B

Performance in reward-based and error-based motor learning is related

For each participant, we obtained two learning scores for the reward-based learning task (single and double curves) and three learning scores for the error-based learning task (early and late learning in rotation block 1 and 2, and the reported aiming angle; Fig. 3A). Across the entire group of participants, we observed several significant correlations in the learning scores both within and between the two tasks (Fig. 3B). Notably, the latter demonstrates clear patterns of covariation in subject-level performance across both the error-based and reward-based motor learning tasks. To derive single participant measures of learning that capture these patterns of covariation, and that can be used to relate overall learning performance to the neuroanatomical data collected in these same participants, we performed a principal component analysis (PCA) on the learning scores (see Methods for details). We found that the first (PC1) and second principal components (PC2) explained 53.8% and 22.3% of the variance in the data (76.1% overall), respectively. The projection plots in Figure 3C

Larger entorhinal volume is associated with better error- and reward-based motor learning
Having clearly established that subject-level performance in reward-based and error-based learning is related and that this pattern of covariation can be captured by a single measure (i.e., PC1), our next aim was to determine whether this variation in performance is associated with the neuroanatomy of the MTL. To this end, we performed multiple linear regression analyses using right and left hippocampus (HC) and entorhinal cortex (EC) volumes as predictors and PC1 as the outcome variable (Fig. 4AB), corrected for total intracranial volume (see Methods). We also included total intracranial (IC) volume in our model to account for a potential effect of overall head size. Figure 4C

hippocampus as separate predictors, as previous studies have reported differential relationships between these individual parts of the hippocampus and memory (e.g., Maguire et al., 2000). However, here we again did not find significant relationships between left and right aHC and pHC

While previous work in motor learning has often studied error-based and reward-based learning processes in isolation from one another, recently there has been increased interest in understanding how these separate learning processes interact at the behavioral and neural levels. Here we find a strong relationship in intersubject variability between error-based and reward-based motor learning, showing that learning performance across tasks is correlated and can be explained by a single, latent variable. Our measures of learning and the nature of the tasks used suggest that this latent variable captures participants' use of cognitive strategies during learning, with higher scores on this variable being associated with faster and better overall learning in both tasks. We further show, using structural neuroimaging and regression analyses with participants' hippocampus and entorhinal cortical volumes as predictors, that higher scores on this latent variable, and thus faster and better overall learning, are associated with larger right entorhinal cortex volumes. Together, these findings suggest that a shared strategic process underlies individual differences in error-based and reward-based motor learning, and that this process is associated with structural differences in entorhinal cortex.

Considerable computational and neural work has argued for a division of labor between the neural circuits that support error-based and reward-based learning (Doya, 1999, 2000

Recent evidence from our group further indicates that faster learning across participants is linked to individual differences in the magnitude of the cognitive strategy (de Brouwer et al., 2018), which drives rapid changes early in the learning process. In reward-based learning, by contrast, the contribution of cognitive strategies to performance has received comparatively little attention, and is only beginning to be established. As one example, recent work, wherein participants were only provided with reward-based feedback (binary success/failure) to perform a visuomotor rotation task, has shown that good versus poor learning is related to the implementation of a cognitive component. This was evidenced by the observed reduction in reach angle when participants were required to remove their aiming strategy (see also Codol et al., 2018). It was also evidenced by the observation that the reward-based learning was impaired when (1) participants had to perform a dual task (a separate mental rotation task) that divided their cognitive load, or when (2) participants' reaction times were constrained (Codol et al., 2018), such that they could not implement the strategy (Haith et al., 2015). To date, work examining the link between error- and reward-based learning has focused on how reinforcement signals (e.g., binary success/failure) shape learning in traditionally error-based tasks (

An influential hypothesis is that the hippocampal-entorhinal system supports a cognitive map, an idea that was originally proposed to explain findings in rodents (Tolman, 1948; O'Keefe and Nadel, 1978) and later extended to humans (for review see Epstein et al., 2017). This hypothesis proposes that the brain creates flexible representations of the environment to not only support memory but also guide future decisions and effective (motor) behavior (Schiller et

based learning tasks. Given that motor learning has a strong visual-spatial component (particularly so in our tasks), we find it noteworthy that it is the right, and not left, entorhinal cortex that is associated with the processing and integration of visual-spatial information (Dalton et al., 2016).