Absolute and relative pitch processing in the human brain: neural and behavioral evidence

Pitch is a primary perceptual dimension of sounds and is crucial in music and speech perception. When listening to melodies, most humans encode the relations between pitches into memory using an ability called relative pitch (RP). A small subpopulation, almost exclusively musicians, preferentially encodes pitches using absolute pitch (AP): the ability to identify the pitch of a sound without an external reference. In this study, we recruited a large sample of musicians with AP (AP musicians) and without AP (RP musicians). The participants performed a pitch-processing task with a Listening and a Labeling condition during functional magnetic resonance imaging. General linear model analysis revealed that while labeling tones, AP musicians showed lower blood oxygenation level-dependent (BOLD) signal in the inferior frontal gyrus and the presupplementary motor area, brain regions associated with working memory, language functions, and auditory imagery. At the same time, AP musicians labeled tones more accurately, suggesting that AP might be an example of neural efficiency. In addition, using multivariate pattern analysis, we found that BOLD signal patterns in the inferior frontal gyrus and the presupplementary motor area differentiated between the groups. These clusters were similar, but not identical, to the clusters identified by the general linear model analysis; information about AP and RP might therefore be present on different spatial scales. While listening to tones, AP musicians showed increased BOLD signal in the right planum temporale, which may reflect the matching of pitch information with internal templates and corroborates the importance of the planum temporale in AP processing. Taken together, AP and RP musicians show diverging frontal activations during Labeling and, more subtly, differences in right auditory activation during Listening. The results of this study do not support the previously reported importance of the dorsolateral prefrontal cortex in associating a pitch with its label.


Introduction
In the general population, the prevalence of AP is roughly estimated to be less than one in 10,000 (Bachem 1955). Therefore, it is unsurprising that previous neuroscientific studies examining AP used small sample sizes. However, small samples result in low statistical power, which increases the occurrence of both false-negative and false-positive results (Button et al. 2013). As a consequence, previous neuroscientific AP studies reported inconsistent or even conflicting results. In this study, we aimed to counteract the statistical problems associated with small sample sizes by collecting and analyzing data from a large sample of musicians (n = 101). Using fMRI, we revisited the topic of pitch processing in AP and RP musicians. Similar to a previous PET study on pitch processing in AP (Zatorre et al. 1998), we employed a pitch-processing task comprising two experimental conditions (Listening vs. Labeling). Because of its low difficulty, the task could be adequately solved using either AP or RP processing (Itoh et al. 2005). Because individuals possessing AP preferentially encode pitches absolutely and non-possessors preferentially encode pitches relatively (Miyazaki and Rakowski 2002), the task allowed us to contrast AP and RP processing by comparing AP musicians with RP musicians.

According to the two-component model, AP musicians differ from RP musicians in having an association between the long-term representation of a pitch and its label (Levitin 1994). The retrieval of this pitch-label association might already occur during Listening and, to successfully perform the task, it must occur during Labeling (Zatorre et al. 1998). At the same time, AP musicians need not rely on working memory processes during Labeling (Itoh et al. 2005). For these reasons, we predicted smaller differences between Listening and Labeling in AP musicians, both in BOLD signal responses and in behavior. Because of their suggested role in AP processing, we expected an involvement of the posterior DLPFC and/or the planum temporale in AP musicians during Listening. Furthermore, we expected an involvement of the IFG in RP musicians during Labeling because of its association with working memory. Apart from conventional general linear model (GLM) analysis, we applied multivariate pattern analysis (MVPA) to the unsmoothed fMRI data to localize brain regions differentiating between AP and RP musicians.

Materials and Methods
Participants. Fifty-two AP musicians and 50 RP musicians completed the pitch-processing task. Due to a technical error during the fMRI data export, one participant of the AP group was excluded, leaving the data of 101 participants for data analysis. Group assignment of the participants was based on self-report and confirmed by a tone-naming test (see below). Using both the information from self-report and a tone-naming test is advantageous because the assignment does not rely on an arbitrary cut-off concerning the tone-naming scores. Some RP musicians demonstrated tone-naming proficiency above chance level (8.3%). It is plausible that these participants used an internal reference (e.g. the tuning standard A4 = 440 Hz) in combination with RP processing (or another, yet unknown strategy) to solve the tone-naming test. The two groups were matched for sex, handedness, age, musical experience, and intelligence (see Table 1). Additional participant information (e.g. demographics, musical aptitude, and musical experience) was collected with an online survey tool (www.limesurvey.org). Self-reported handedness was confirmed using a German translation of the Annett questionnaire (Annett 1970). Musical aptitude was measured using the Advanced Measures of Music Audiation (AMMA) (Gordon 1989). Crystallized intelligence was estimated in the laboratory using the Mehrfachwahl-Wortschatz-Intelligenztest (MWT-B) (Lehrl 2005) and fluid intelligence was estimated using the Kurztest für allgemeine Basisgrößen der Informationsverarbeitung (KAI) (Lehrl et al. 1991). All participants provided written informed consent and were paid for their participation. The study was approved by the local ethics committee (www.kek.zh.ch) and conducted according to the principles defined in the Declaration of Helsinki.

Tone-Naming Test. Participants completed a tone-naming test to assess their tone-naming proficiency (Oechslin et al. 2010; Elmer et al. 2015). During the test, 108 pure tones were presented in a pseudorandomized order. Each tone from C3 to B5 (twelve-tone equal temperament tuning, A4 = 440 Hz) was presented three times. The tones had a duration of 500 ms and were masked with Brownian noise (duration = 2000 ms).

Pitch-Processing Task. During the task, a black fixation cross on a grey background was presented on a screen. Stimulus presentation was controlled by Presentation software (version 17.1, www.neurobs.com). The task consisted of two experimental conditions: a Listening condition and a Labeling condition. These conditions differed only in the instructions given to the participants. In the Listening condition, participants had to press one response pad button (right middle finger) when they heard a pure tone and another button (right index finger) when they heard a noise segment. In the Labeling condition, participants had to label the pure tones by pressing one of three corresponding buttons on the response pad (right middle, ring, and little finger in response to C4, D4, and E4, respectively) and another button (right index finger) when they heard a noise segment. The participants were instructed not to respond verbally and to respond as quickly and as accurately as possible. The accuracy of the responses and the response time were recorded via the response pad (4 button curved right, Current Designs Inc., Philadelphia, PA, USA). Both conditions lasted for two runs each. The Listening condition always preceded the Labeling condition to avoid spillover effects from the Labeling onto the Listening condition; had the order been reversed, AP musicians might have been tempted to still covertly label the tones in the Listening condition.
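The 8.3% chance level corresponds to guessing one of the twelve chromatic pitch-class labels (1/12 ≈ 8.3%). A minimal simulation illustrates this baseline; the scoring rule assumed here (only the pitch-class label counts, octave errors ignored) is our assumption for illustration, not a detail taken from the original test description.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TRIALS = 108   # 36 tones (C3-B5) x 3 presentations
N_LABELS = 12    # chromatic pitch-class labels (C, C#, ..., B)

def simulated_guessing_score(n_trials=N_TRIALS, n_labels=N_LABELS):
    """Tone-naming score of a participant who guesses a random label on each trial."""
    targets = rng.integers(0, n_labels, size=n_trials)
    guesses = rng.integers(0, n_labels, size=n_trials)
    return np.mean(targets == guesses)

scores = [simulated_guessing_score() for _ in range(10_000)]
print(f"mean simulated score: {np.mean(scores):.3f}")  # ~0.083, i.e. 8.3%
```

Scores well above this baseline, as observed in some RP musicians, therefore indicate a systematic strategy rather than guessing.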
Statistical Analysis. In-scanner behavioral measures (response accuracy and response time) were analyzed in R (version 3.3.2, www.r-project.org). Separately for each measure, we performed a mixed-design ANOVA with a within-subject factor Condition (Listening vs. Labeling) and a between-subject factor Group (AP vs. RP). Subsequently, the two measures were separately compared within each condition using Welch's t-tests. Next, we calculated differences in both measures by subtracting the Listening from the Labeling condition for each subject. These differences were then compared between the groups, again using Welch's t-tests. Finally, the differences were correlated with the tone-naming scores using the Pearson correlation coefficient. The significance level was set to P < 0.05. Generalized eta-squared (η²G) was used as an effect size for effects within an ANOVA and Cohen's d for t-tests.

Imaging Data Acquisition and Preprocessing. Imaging data was acquired on a Philips Ingenia 3.0 T MRI system (Philips Medical Systems, Best, The Netherlands) equipped with a commercial 15-channel head coil. Whole-brain functional images were acquired in four runs using a T2*-weighted gradient echo (GRE) echo planar imaging (EPI) sequence (scan duration of one run = 380 s). The parameters of the T2*-weighted sequence included: reconstructed voxel size = 2.75 x 2.75 x 3.6 mm³, reconstruction matrix = 80 x 80, number of dummy scans = 3, total number of scans per run = 122. In addition, a whole-brain structural image was acquired using a T1-weighted GRE turbo field echo sequence (scan duration = 350 s, TR = 8100 ms).

GLM Analysis. For each subject, the preprocessed functional time series was modeled using a GLM. The first-level design matrix contained, for each run separately, two regressors of interest (onsets of pure tones, onsets of noise segments) and one regressor of no interest (onsets of button presses). These regressors were modeled by convolving delta functions with the canonical double-gamma hemodynamic response function (HRF). Furthermore, we included the six motion parameters estimated during preprocessing as nuisance regressors and applied a high-pass filter (cutoff = 128 s) to remove low-frequency drifts. The following first-level contrasts of interest were calculated: Tones_Listening > Noise_Listening and Tones_Labeling > Noise_Labeling. Following the logic of cognitive subtraction, these contrasts reflect BOLD signal increases associated with pitch processing.
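To make the regressor construction concrete, the sketch below convolves delta functions at hypothetical tone onsets with a canonical double-gamma HRF (SPM-style parameters) and downsamples the result to the scan grid. The onset times, TR, and kernel length are illustrative assumptions rather than the study's acquisition values; only the number of scans per run (122) is taken from the text.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0       # assumed repetition time in seconds (not reported above)
N_SCANS = 122  # scans per run, as reported
DT = 0.1       # microtime resolution for the convolution

def double_gamma_hrf(t):
    """Canonical double-gamma HRF: response peak ~5 s, undershoot ~15 s."""
    peak = gamma.pdf(t, 6)         # positive response lobe
    undershoot = gamma.pdf(t, 16)  # delayed undershoot
    return peak - undershoot / 6.0

# Hypothetical tone onsets within one run (seconds)
onsets = np.array([10.0, 25.0, 40.0, 55.0, 70.0])

t_hires = np.arange(0, N_SCANS * TR, DT)
deltas = np.zeros_like(t_hires)
deltas[np.searchsorted(t_hires, onsets)] = 1.0  # delta function at each onset

hrf = double_gamma_hrf(np.arange(0, 32, DT))    # 32 s kernel
regressor = np.convolve(deltas, hrf)[: t_hires.size]

# Downsample to one value per scan -> one column of the first-level design matrix
design_column = regressor[:: round(TR / DT)]
print(design_column.shape)  # (122,)
```

In the full design matrix, this column would sit alongside the noise and button-press regressors and the six motion parameters, with the high-pass filter applied on top.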
Second-level random-effects analysis was performed using non-parametric permutation tests as implemented in SnPM13. The second-level design tested the interaction between Group (AP vs. RP) and Condition (Listening vs. Labeling). To facilitate the interpretation of the interaction, difference images were created for each subject by subtracting the contrast image of the Listening condition (Tones_Listening > Noise_Listening) from the contrast image of the Labeling condition (Tones_Labeling > Noise_Labeling). These difference images were entered into SnPM13 as inputs for a two-sample t-test comparing AP and RP musicians (cluster-wise inference, 10,000 permutations, cluster-defining threshold (CDT) P < 0.001). An anatomically defined mask was used to restrict the search space of the analysis to a priori defined brain regions. To create this mask, we used probability maps of bilateral brain regions included in the Harvard-Oxford cortical atlas (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases).

Two follow-up analyses with the same mask were performed. To determine the effects of condition within each group, we entered the difference images as inputs for a one-sample t-test for each group separately (cluster-wise inference, 10,000 permutations, CDT P < 0.001). To determine the effects of group within each condition, we entered the first-level contrast images (Tones_Listening > Noise_Listening, Tones_Labeling > Noise_Labeling) as inputs for a two-sample t-test for each condition separately (cluster-wise inference, 10,000 permutations, CDT P < 0.001). The significance level for all analyses was set to P < 0.05, FWE-corrected for multiple comparisons.

MVPA. We carried out a specific type of MVPA, namely searchlight analysis as implemented in PyMVPA (version 2.6.1, www.pymvpa.org), to detect brain regions containing fine-grained BOLD signal patterns that differentiate between the groups. In total, we performed three searchlight analyses using the different images (difference images, Listening contrast images, Labeling contrast images) as inputs. In all analyses, a sphere was moved across all voxels of the anatomically defined mask that was also used in the GLM analysis. Each sphere had a radius of three voxels (9 mm) and consisted of one center voxel and (at most) 122 surrounding voxels. In every sphere, a linear support vector machine (C = 1) was trained and tested using 5-fold cross-validation. For the cross-validation, the input images were pseudorandomly partitioned into five chunks under the restriction that each chunk contained the same number of images of AP musicians and RP musicians. One chunk contained 11 images of AP musicians (instead of 10) because our analyzed sample included 51 AP and 50 RP musicians. The average classification accuracy of the five folds was written to the location of the center voxel to create a map of classification accuracies (i.e. an information map).

To assess the statistical significance of informative clusters, we used non-parametric permutation testing (Nichols and Holmes 2002). For this purpose, each of the three searchlight analyses was repeated with permuted group labels (10,000 permutations). For every iteration, the group labels were randomly permuted within each chunk; this restriction balances the number of images per group in each chunk. The resulting permutation set was fixed for the whole searchlight analysis (i.e. across all center voxels of the mask) to preserve the spatial dependency structure of the information maps. Cluster-level P values were computed as the proportion of cluster sizes under the null distribution that were larger than the empirical cluster size. The significance level was set to P < 0.05, FWE-corrected.
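The per-sphere decoding scheme can be sketched with scikit-learn standing in for PyMVPA: a linear SVM (C = 1) is evaluated with leave-one-chunk-out cross-validation over five chunks that each contain a balanced number of AP and RP images. The voxel patterns below are random placeholders for the (at most) 123 voxels of one searchlight sphere.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

N_AP, N_RP = 51, 50
N_VOXELS = 123  # one center voxel + up to 122 neighbors

# Stand-in sphere patterns: one feature vector per subject
X = rng.standard_normal((N_AP + N_RP, N_VOXELS))
y = np.array([1] * N_AP + [0] * N_RP)  # 1 = AP, 0 = RP

# Pseudorandom chunk assignment, balanced per group:
# AP subjects -> chunk sizes 11, 10, 10, 10, 10; RP subjects -> 10 each
ap_chunks = rng.permutation(np.repeat(np.arange(5), [11, 10, 10, 10, 10]))
rp_chunks = rng.permutation(np.repeat(np.arange(5), 10))
chunks = np.concatenate([ap_chunks, rp_chunks])

# Leave-one-chunk-out = the 5-fold cross-validation used per searchlight sphere
clf = SVC(kernel="linear", C=1)
accs = cross_val_score(clf, X, y, groups=chunks, cv=LeaveOneGroupOut())
print(f"mean accuracy: {accs.mean():.3f}")  # ~0.5 for random data
```

In the actual analysis, this mean accuracy would be written to the sphere's center voxel, and the sphere would then be moved to the next voxel of the mask.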
ROI Analysis. In addition to the voxel-wise GLM and searchlight analyses, the mean BOLD signal changes in a priori defined ROIs were compared between groups using MarsBaR (version 0.44, www.marsbar.sourceforge.net). We defined four ROIs that have previously been associated with AP processing: the left and right planum temporale and the left and right posterior DLPFC. The DLPFC coordinates were taken from a seminal study investigating pitch processing in AP, which was the first to associate this brain region with the retrieval of the pitch-label association while AP musicians were listening to tones (Zatorre et al. 1998). The original study reported the coordinates in Talairach space, so we transformed them into MNI space (Lacadie et al. 2008), yielding x = -40, y = 9, z = 42 for the left DLPFC. These coordinates were flipped at the midsagittal plane to derive the coordinates of the right DLPFC (x = 40, y = 9, z = 42). For each subject and ROI, we extracted first-level contrast values from the Listening condition (Tones_Listening > Noise_Listening). For each ROI, these contrast values were compared between AP and RP musicians using Welch's t-tests in R. The significance level was set to P < 0.05.
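As an illustration of the ROI comparison, the sketch below extracts mean Listening contrast values from spherical ROIs centered on the DLPFC coordinates and compares the groups with Welch's t-test. The sphere radius, the file names, and the use of nilearn are placeholder assumptions; the study defined its ROIs in MarsBaR and ran the t-tests in R.

```python
import numpy as np
from scipy.stats import ttest_ind
from nilearn.maskers import NiftiSpheresMasker

# MNI coordinates: left DLPFC and its midsagittal flip (right DLPFC)
seeds = [(-40, 9, 42), (40, 9, 42)]

# Placeholder paths to per-subject Listening contrast images (Tones > Noise)
ap_imgs = [f"sub-AP{i:02d}_listening_con.nii.gz" for i in range(1, 52)]
rp_imgs = [f"sub-RP{i:02d}_listening_con.nii.gz" for i in range(1, 51)]

# The 8 mm radius is an assumption; the MarsBaR ROI definitions may differ
masker = NiftiSpheresMasker(seeds=seeds, radius=8)

ap_vals = masker.fit_transform(ap_imgs)  # shape (51, 2): one column per ROI
rp_vals = masker.fit_transform(rp_imgs)  # shape (50, 2)

for roi, name in enumerate(["left DLPFC", "right DLPFC"]):
    t, p = ttest_ind(ap_vals[:, roi], rp_vals[:, roi], equal_var=False)  # Welch
    print(f"{name}: t = {t:.2f}, P = {p:.3f}")
```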

Results
Behavior. Demographical and behavioral characteristics of the AP musicians (n = 51) and the RP musicians (n = 50) were compared using Welch's t-tests. The two groups did not differ in age (t(98.3) = 1.07, P = 0.29), age of onset of musical training (t(98.9) = -0.84, P = 0.40), cumulative musical training (t(95.2) = 0.97, P = 0.33), crystallized intelligence (t(96.4) = -1.48, P = 0.14), and fluid intelligence (t(96.7) = -1.78, P = 0.08). As predicted, AP musicians had a substantially higher tone-naming score than RP musicians (t(99) = 13.53, P < 10⁻¹⁵). There was a trend towards a higher musical aptitude in AP musicians as quantified by the AMMA total score (t(97.2) = 1.99, P = 0.05). Follow-up analyses of the AMMA subscores showed that this difference was driven by a slightly higher tonal score in AP musicians (t(96.5) = 2.27, P = 0.03), but there was no difference regarding the rhythm score (t(98.0) = 1.42, P = 0.16). Descriptive statistics of participant characteristics are given in Table 1.

The in-scanner behavioral measures were analyzed using a mixed-design ANOVA with a within-subject factor Condition (Listening vs. Labeling) and a between-subject factor Group (AP vs. RP). As shown in Figure 1A, the mixed-design ANOVA of the response accuracy revealed an interaction between the factors Group and Condition (F(1,99) = 8.37, P = 0.005, η²G = 0.02). The difference in response accuracy between the two conditions (Labeling minus Listening) was smaller in AP than in RP musicians (Welch's t-test, t(79.1) = 2.88, P = 0.005, d = 0.57). Furthermore, this difference correlated with the tone-naming score (r = 0.41, P < 0.001). On average, the response accuracy was higher in the Listening condition than in the Labeling condition, so this correlation indicates a smaller difference for participants with a higher tone-naming score (see Figure 1C). Additional follow-up analyses showed a higher response accuracy for AP musicians in the Labeling condition (Welch's t-test, t(73.4) = 2.88, P = 0.005, d = 0.57), but not in the Listening condition (Welch's t-test, t(87.7) = 1.10, P = 0.28, d = 0.22). As shown in Figure 1B, the mixed-design ANOVA of the response time revealed a Group x Condition interaction (F(1,99) = 8.85, P = 0.004, η²G = 0.01). The condition difference in response time was smaller in AP musicians (Welch's t-test, t(95.6) = -2.97, P = 0.004, d = 0.59). Again, this difference correlated with the tone-naming score (r = -0.31, P = 0.002) (see Figure 1D). Descriptive statistics of the in-scanner behavioral measures are given in Table 2.
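For illustration, the core of this behavioral analysis (per-subject difference scores, a Welch's t-test between groups, and a Pearson correlation with tone-naming scores) can be reproduced in a few lines; the data below are random placeholders, and scipy stands in for the R routines used in the study.

```python
import numpy as np
from scipy.stats import ttest_ind, pearsonr

rng = np.random.default_rng(0)

# Placeholder data: per-subject response accuracy (proportion correct) per condition
acc_listening = rng.uniform(0.9, 1.0, size=101)
acc_labeling = rng.uniform(0.7, 1.0, size=101)
tone_naming = rng.uniform(0.0, 1.0, size=101)  # tone-naming test scores
is_ap = np.array([True] * 51 + [False] * 50)

# Per-subject condition difference (Labeling minus Listening)
diff = acc_labeling - acc_listening

# Welch's t-test comparing the difference scores between groups
t, p = ttest_ind(diff[is_ap], diff[~is_ap], equal_var=False)
print(f"group comparison: t = {t:.2f}, P = {p:.3f}")

# Correlation of the difference scores with tone-naming proficiency
r, p = pearsonr(diff, tone_naming)
print(f"correlation with tone naming: r = {r:.2f}, P = {p:.3f}")
```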
BOLD Signal Changes. The BOLD signal changes were analyzed using a voxel-wise GLM in combination with a second-level mixed factorial design. Parallel to the in-scanner behavioral measures, we found a Group x Condition interaction which was characterized by smaller BOLD signal condition differences in AP musicians. As shown in Figure 2A, this interaction was detected in three frontal clusters (see Table 3 for details). As shown in Figure 2B and 2C, follow-up analyses within each group separately revealed similar BOLD signal differences between the two conditions, with the exception of the three clusters described above (bilateral IFG, preSMA). In the bilateral IFG and the preSMA, only RP musicians showed increased BOLD signal in the Labeling condition. In addition, both groups showed increases in the bilateral intraparietal sulcus (IPS) and the bilateral DLPFC (see Table 4). These increases were stronger and more distributed in RP musicians, again indicating larger condition differences. Further follow-up analyses within each condition revealed that there were no group differences in the Listening condition (see Figure 2D). In contrast, AP musicians showed lower BOLD signal in the Labeling condition in the right IFG (PFWE < 0.001, k = 312), the left IFG (PFWE = 0.003, k = 195), and the preSMA (PFWE = 0.005, k = 134). These clusters were equivalent to the clusters of the Group x Condition interaction (see Figure 2E and Table 5).

Group Decoding by Searchlight Analysis. In addition to the voxel-wise GLM, we used searchlight analysis to localize BOLD signal patterns which differentiate between the two groups (Kriegeskorte et al. 2006). For the main analysis, we used the difference in BOLD signal patterns between the two conditions as the input. As shown in Figure 3A, group status could be decoded in the left IFG, pars triangularis (PFWE = 0.01, k = 29). The mean classification accuracy within the cluster was 72.5%. In comparison to the left IFG cluster from the GLM Group x Condition interaction, this cluster was located more anteriorly on the IFG. Follow-up analyses were performed with the patterns of each condition separately. Analogous to the GLM analysis, group status could not be decoded based on patterns in the Listening condition. In contrast, group status could be decoded based on Labeling patterns in the preSMA (PFWE < 0.001, k = 81, mean classification accuracy = 70.6%). This cluster substantially overlapped with the preSMA cluster from the GLM (see Figure 3A). However, a complete overlap should not be expected, because searchlight analysis is known to cause slight distortions in the localization (Etzel et al. 2013).
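The PFWE values reported here derive from the permutation scheme described in the Methods. The sketch below illustrates the core idea on a single sphere: labels are shuffled within chunks (preserving the per-chunk group balance) and the decoding is repeated to build a null distribution. In the actual analysis, inference was performed on cluster sizes across the whole information map rather than on single-sphere accuracies, and 10,000 permutations were used instead of the 100 here; all data below are random stand-ins.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)

# Stand-in sphere patterns and balanced chunks (cf. the earlier decoding sketch)
X = rng.standard_normal((101, 123))
y = np.array([1] * 51 + [0] * 50)  # 1 = AP, 0 = RP
chunks = np.concatenate([
    rng.permutation(np.repeat(np.arange(5), [11, 10, 10, 10, 10])),  # AP
    rng.permutation(np.repeat(np.arange(5), 10)),                    # RP
])

def decode(labels):
    """Mean leave-one-chunk-out accuracy of a linear SVM (C = 1)."""
    clf = SVC(kernel="linear", C=1)
    return cross_val_score(clf, X, labels, groups=chunks, cv=LeaveOneGroupOut()).mean()

empirical = decode(y)

# Null distribution: permute group labels within each chunk
null_accs = []
for _ in range(100):
    y_perm = y.copy()
    for c in np.unique(chunks):
        idx = np.where(chunks == c)[0]
        y_perm[idx] = rng.permutation(y_perm[idx])
    null_accs.append(decode(y_perm))

p_value = np.mean(np.array(null_accs) >= empirical)
print(f"empirical accuracy = {empirical:.3f}, permutation P = {p_value:.3f}")
```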

Discussion
In this study, we investigated AP and RP processing in the human brain using task-based fMRI in a large sample of musicians. The GLM analysis revealed smaller BOLD signal differences between Listening and Labeling in AP musicians than in RP musicians. The smaller differences between the conditions were driven by lower BOLD signal in AP musicians during Labeling in the left- and right-sided pars opercularis of the IFG and in the preSMA. The in-scanner behavioral measures (response accuracy and response time) mirrored the fMRI data by showing smaller differences between Listening and Labeling in AP musicians. Using MVPA, we found that group status could be decoded in the left-sided pars triangularis of the IFG based on the difference in BOLD signal patterns between Listening and Labeling. Furthermore, group decoding was also possible in the preSMA based on BOLD signal patterns obtained in the Labeling condition. Lastly, the ROI analysis revealed a higher mean BOLD signal in AP musicians during Listening in the right planum temporale, which was not detected by the GLM analysis or the MVPA.

The IFG is an important target region for auditory information, which is propagated from the auditory cortex to the IFG along the ventral stream (the "what" pathway) of auditory processing (Rauschecker and Scott 2009). In this context, the IFG has been repeatedly linked with auditory working memory functions (Schulze et al. 2018). More specifically, the IFG has been associated with working memory for pitch, as shown by both PET and fMRI studies (Zatorre et al. 1994; Gaab et al. 2003). In this study, we observed BOLD signal increases in RP musicians bilaterally in the IFG during Labeling. This increase was not observable in AP musicians. As RP musicians need to use their RP ability to successfully complete the task, it is plausible that the signal increase in the IFG reflects pitch working memory processes as an important aspect of RP processing (McDermott and Oxenham 2008). This interpretation is fully in line with the results of the PET study described in the introduction (Zatorre et al. 1998). In the left hemisphere, the IFG is crucially involved in speech perception (Friederici 2011). In the right hemisphere, the IFG is linked to the perception of prosody (pitch changes in speech) (Buchanan et al. 2000). Therefore, the BOLD signal increases in RP musicians in the bilateral IFG might reflect language-related processes. More concretely, the RP musicians might have engaged in covert articulation of the tone labels as a part of their strategy to label the tones. In contrast, it seems that the AP musicians did not rely on a verbal code to successfully complete the task. This is in accordance with behavioral evidence demonstrating non-verbal coding strategies in AP musicians (Zatorre and Beckett 1989).

Mirroring the bilateral IFG BOLD signal increases, the preSMA showed signal increases in RP musicians during Labeling. In addition, the BOLD signal patterns during Labeling in the preSMA contained information about group status. Thus, AP and RP processing were accompanied by differential BOLD signal patterns. The preSMA is anatomically connected to the IFG via the frontal aslant tract and has been implicated in speech production and processing (Catani et al. 2013). More importantly, the preSMA plays a key role in the auditory imagery of pitch (Lima et al. 2016).
Auditory imagery generally refers to the generation of auditory information in the absence of sound perception. However, auditory imagery can also involve auditory information that is generated in addition to the currently perceived information. Consequently, RP musicians might have imagined the pitches of previously heard tones to determine the pitch of the current tone. This interpretation is in line with the anecdotal observation that RP musicians often covertly sing pitches in order to identify musical intervals. It is important to note that the working memory and the language explanations of the IFG and preSMA involvement during Labeling are not mutually exclusive. There is evidence that largely overlapping brain regions are involved in auditory working memory for verbal material and non-verbal material, for example, pitches (Koelsch et al. 2009).

The results from the GLM analysis and the MVPA did not fully converge with regard to the localization of the group differences. Most notably, using MVPA, we found that group status could be decoded from BOLD signal patterns in the left-sided pars triangularis of the IFG, whereas the GLM revealed BOLD signal differences in the pars opercularis. As mentioned above, these two regions constitute Broca's area. In a previous study using MVPA, it was shown that BOLD signal patterns in Broca's area contain speech-related information which was not detectable with GLM analysis (Lee et al. 2012). MVPA is more sensitive to information in fine-grained patterns, which are preserved in unsmoothed fMRI data (Kriegeskorte and Bandettini 2007). At the same time, there has been a debate about whether or not Broca's area should be divided into subareas executing different functions. Speculatively, the group differences in the pars opercularis might be spatially more homogeneous and therefore detectable by the GLM analysis. Further studies should elucidate the potentially differential roles of these two brain regions in pitch processing.

Although showing lower BOLD signal in the IFG and preSMA during Labeling, the AP musicians identified the tones more accurately than the RP musicians. Therefore, AP processing seems to be more efficient than RP processing with regard to the use of neural resources. Neural efficiency has been discussed in relation to intelligence, where it has been proposed that more intelligent individuals show lower BOLD signal while performing cognitive tasks (Neubauer and Fink 2009). In this study, there were no group differences in psychometrically evaluated intelligence. Neural efficiency is often observed in tasks of low or moderate difficulty and predominantly in brain regions of the frontal cortex (Neubauer and Fink 2009). Both of these conditions are met in this study. The efficiency of AP processing might be related to the automatic retrieval of the pitch-label association, which presumably occurs immediately after the pitch is encoded (Itoh et al. 2005). This process is often described as effortless (Deutsch 2013). RP requires more processing steps: after encoding, the pitch needs to be compared to a previous pitch held in working memory and, subsequently, the exact interval between those two pitches needs to be determined. One might speculate that the neural efficiency of AP processing could be a reason for its continued existence throughout human evolution despite its negligible role in music and speech perception (McDermott and Oxenham 2008).
During Listening, the AP musicians showed a larger BOLD signal than RP musicians in the right planum temporale. We observed this increase exclusively with the ROI analysis, so the effect seems to be spatially restricted and too subtle to be detected by analyses employing a conservative correction for multiple comparisons. As described in the introduction, the planum temporale has been associated with AP processing from the very beginning of neuroscientific AP research (Schlaug et al. 1995). It is part of the non-primary auditory cortex and has an important role in the processing of a diverse range of sounds (Griffiths and Warren 2002). In this study, the increase in signal was restricted to the right hemisphere. This finding is consistent with previous studies reporting anatomical differences in AP musicians in the right planum temporale (Keenan et al. 2001; Wilson et al. 2009; Wengenroth et al. 2014) and with an influential theory on the importance of the right hemispheric auditory cortex in music processing (Zatorre et al. 2002). However, its exact role in AP processing is still unclear. With regard to auditory processing in general, it has been proposed that the planum temporale matches incoming auditory information with information that is stored in templates which are not located in the planum temporale itself (Griffiths and Warren 2002). In AP musicians, the long-term pitch representations postulated by the two-component model might constitute the templates with which the incoming information is matched (Levitin 1994; Levitin and Rogers 2005). Therefore, we propose that in AP musicians, incoming auditory information, more precisely the extracted pitch information, is matched with these internal pitch templates by computations performed in the right planum temporale. The templates themselves could be represented in more anterior regions of the right temporal lobe, which are implicated in semantic memory (Binder and Desai 2011).

In contrast to the previously described PET study, we did not find group differences in the posterior DLPFC during Listening. In the PET study, the involvement of the DLPFC was attributed to the automatic retrieval of the pitch-label association in AP musicians (Zatorre et al. 1998). The current results do not support this interpretation. In both groups, we observed bilateral DLPFC BOLD signal increases during Labeling. These increases were accompanied by higher BOLD signal in the bilateral IPS, again in both groups. Both the DLPFC and the IPS are part of a frontoparietal network associated with working memory processes.