Pitch perception is adapted to species-specific cochlear filtering

Pitch perception is critical for recognizing speech, music and animal vocalizations, but its neurobiological basis remains unsettled, in part because of divergent results from different species. We used a combination of behavioural measurements and cochlear modelling to investigate whether species-specific differences exist in the cues used to perceive pitch and whether these can be accounted for by differences in the auditory periphery. Ferrets performed a pitch discrimination task well whenever temporal envelope cues were robust, but not when only resolved harmonics were available. By contrast, human listeners exhibited the opposite pattern of results on an analogous task, consistent with previous studies. Simulated cochlear responses in the two species suggest that the relative salience of the two types of pitch cues can be attributed to differences in cochlear filter bandwidths. Cross-species variation in pitch perception may therefore reflect the constraints of estimating a sound’s fundamental frequency given species-specific cochlear tuning.


Introduction
Many of the sounds in our environment are periodic, and the rate at which such sounds repeat is known as their fundamental frequency, or F0. We perceive the F0 of a sound as its pitch, and this tonal quality is one of the most important features of our listening experience. The way that F0 changes encodes meaning in speech [1] and musical melody [2-4]. The F0 of a person's voice provides a cue to their identity [5-7] and helps us attend to them in a noisy environment [8-10].

The vocal calls of non-human animals are also often periodic, and pitch is believed to help them to identify individuals and interpret communication calls [11,12]. Many mammalian species have been shown to discriminate the F0 of periodic sounds in experimental settings [13-17], and these animal models hold promise for understanding the neural mechanisms that underlie pitch perception. However, pitch acuity can differ markedly across species [16,18], raising the possibility that humans and other mammals may use different neural mechanisms to extract pitch.

The auditory cortex plays a key role in pitch processing, but it remains unclear how cortical neurons carry out the necessary computations to extract the F0 of a sound [19]. Neural correlates of F0 cues [20-22] and pitch judgments [23] have been observed across auditory cortical fields in some species, while a specialized pitch centre has been described in marmoset auditory cortex [24]. There is a similar lack of consensus regarding the neural code for pitch in the human brain [25]. A better understanding of the similarities and differences in pitch processing across species is essential for interpreting neurophysiological results in animals and relating them to human pitch perception.

[...] and parameters in the model were derived from either human psychophysics [41] or ferret auditory nerve recordings [48].
As shown in Figure 1B, the cochlear filters are wider for the ferret auditory nerve than the human. In Figure 1C-E, we compare the human and ferret simulated responses to a 500-Hz missing F0 tone complex that we used as a training sound in our ferret behavioural experiment (described below).

When the instantaneous power of the cochlear filters is summed across the duration of the sound and plotted as a function of centre frequency, the individual harmonics of the tone are more clearly resolved in the human cochlea than in the ferret (Fig. 1C). This takes the form of deeper troughs in the activation of nerve fibres whose centre frequencies lie between the harmonic components of the sound. To visualize the temporal representation of the same stimulus, we plotted the output of a single nerve fibre (here, a fibre with a centre frequency of 5 kHz) over time (Fig. 1D). In this case, the representation of the 500 Hz F0 is clearer in the ferret: the human cochlea produces weaker temporal modulation because fewer harmonics fall within the fibre's bandwidth.

We also examined whether the temporal representation of F0 was enhanced in the ferret cochlea across the full range of frequency filters. A Fourier transform was performed on the output of each fibre throughout a 200 ms steady-state portion of the sound. The power of the response at F0 was then expressed as a proportion of the overall power for that fibre. The results of this metric averaged across all fibres in the model are shown in Fig. 1E. The average temporal representation of F0 was enhanced in the ferret compared to the human (Wilcoxon rank sum test; z = 8.286, p = $1.175 \times 10^{-16}$). In fact, this F0 representation metric was higher in the ferret than the human cochlear model across every pair of individually simulated auditory nerve fibres.
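The F0 representation metric described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `f0_power_fraction`, the single-bin DFT, the inclusion of the DC component in the overall power, the sampling rate, and the synthetic "fibre outputs" are all hypothetical choices made for the example.

```python
import cmath
import math


def f0_power_fraction(response, fs, f0):
    """Proportion of a fibre's total power that lies at F0.

    Assumes the analysis window holds a whole number of F0 periods,
    so that F0 falls exactly on a DFT bin (a simplification).
    """
    n = len(response)
    k0 = round(f0 * n / fs)  # index of the DFT bin at F0
    # Single-bin DFT evaluated at F0.
    x_f0 = sum(x * cmath.exp(-2j * math.pi * k0 * t / n)
               for t, x in enumerate(response))
    # Parseval: sum(x^2) equals sum(|X|^2) / n, DC included (assumption).
    total_power = sum(x * x for x in response)
    if total_power == 0.0:
        return 0.0
    # Factor 2 counts the positive- and negative-frequency bins at F0.
    return 2.0 * abs(x_f0) ** 2 / (n * total_power)


fs = 16000             # sampling rate in Hz (assumed)
n = fs * 200 // 1000   # 200 ms analysis window
f0 = 500.0

# Synthetic stand-ins for fibre outputs with deep and shallow F0 modulation.
deep = [1.0 + 0.9 * math.cos(2 * math.pi * f0 * t / fs) for t in range(n)]
shallow = [1.0 + 0.2 * math.cos(2 * math.pi * f0 * t / fs) for t in range(n)]

print(f0_power_fraction(deep, fs, f0) > f0_power_fraction(shallow, fs, f0))
```

A more strongly modulated output yields a larger fraction, which is the sense in which a wider filter (passing more harmonics) would be expected to score higher on this metric.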
These simulations suggest that the ferret cochlea provides an enhanced representation of the envelope periodicity of a complex tone, as conveyed by spikes that are phase-locked to the F0 in the auditory nerve. On the other hand, the human auditory nerve provides a better resolved representation of individual harmonics across the tonotopic array. It might thus be expected that these two types of cues would be utilized to different extents by the two species.

Behavioural measures of pitch cue use in ferrets
To test the role of different pitch cues in ferret pitch perception, we trained five animals on a two-alternative forced choice (2AFC) task that requires "low" and "high" pitch judgements analogous to those used in human psychophysical tasks (Fig. 2A,B). On each trial, a harmonic complex tone was presented at one of two possible fundamental frequencies. Ferrets were given water rewards for responding at the right nose-poke port for a high F0, and at the left port for a low F0. Incorrect responses resulted in a time-out. We began by training four ferrets to classify harmonic complex tones with F0s of 500 and 1000 Hz, with a repeating pure tone presented at 707 Hz (the midpoint on a logarithmic scale) for reference before each trial. Two of these animals, along with one naïve ferret, were then trained on the same task using target F0 values of 150 and 450 Hz and a 260 Hz pure tone reference. In both cases, the harmonics of the low and high stimuli to be discriminated were matched in spectral bandwidth, so that ferrets could not solve the task based on the frequency range of the sound (Fig. 3; left column). Rather, the animals had to discriminate sounds based on some cue to the F0. After completing several pre-training stages to habituate the animals to the apparatus and sound presentation (see Methods), the ferrets learned to perform the pitch classification task within 22 ± 3 (mean ± standard deviation) days of twice daily training.

Once the ferrets learned to perform this simple 2AFC task, we incorporated "probe trials" into the task in order to determine which acoustical cues they were using to categorize the trained target sounds. Probe trials made up 20% of trials in a given session, and were randomly interleaved with the "standard" trials described above.
On probe trials, an untrained stimulus was presented, and the ferret received a water reward regardless of its behavioural choice. This task design discouraged ferrets from learning to use a different strategy to classify the probe sounds.

The inner ear is known to produce distortion in response to harmonic tones that can introduce energy at the fundamental frequency to the basilar membrane response, even for missing-fundamental sounds [53]. These distortion products could in principle counter our attempts to match the spectral bandwidths of the sounds, since they could cause the lowest frequency present in the ear to differ as a function of F0. To determine if the ferrets relied on such cochlear distortion products to classify tones in our task, we added pink noise to the stimulus on 20% of randomly interleaved probe trials at an intensity that is known to be more than sufficient to mask cochlear distortion products in humans [54,55]. Ferrets performed more poorly on probe trials than on standard trials (paired t-test; t = 4.346, p = 0.005), as expected for an auditory discrimination task performed in noise. However, they continued to perform the pitch classification at 71.85% ± 9.60% correct (mean ± standard deviation) with the noise masker, which is well above chance (1-sample t-test; t = 6.025, p = 0.001). This suggests that ferrets did not rely on cochlear distortion products to solve our task.

We next moved to the main testing stage of our behavioural experiment, which aimed to determine if ferrets use resolved harmonics, temporal envelope periodicity, or both of these cues to identify the F0 of tones. All tone complexes, both the standard and probe stimuli, were superimposed on a pink noise masker. Our auditory nerve model (above) allowed us to estimate which harmonics in the tone complexes would be resolved in the ferret auditory nerve (Fig. 4A) [56].
This analysis suggests that our standard tones contained both resolved and unresolved harmonics for ferret listeners, as intended. We constructed four types of probe stimuli based on our resolvability estimates: (1) "Low Harmonics" tones containing only harmonics that we expected to be resolved; (2) "High Harmonics" tones containing harmonics presumed to be less well resolved; (3) "All Harmonics Random Phase" probes containing the full set of harmonics present in the standard tone, but whose phases were independently randomized in order to flatten the temporal envelope; and (4) "High Harmonics Random Phase" stimuli with the same randomization of harmonic phases, but containing only presumptively unresolved harmonics. The spectral ranges of these stimuli are given in Figure 4B, and the spectra and audio waveforms (showing the temporal envelope periodicity) of the 500 and 1000 Hz stimuli are illustrated in Figure [...]

[...] Hz condition (2-way ANOVA; F = 2.063, p = 0.158), so data collected from the same animals in these two conditions were treated as independent measurements.

To assess the acoustical cues used by animals to solve the pitch classification task, we compared ferrets' performance on the standard trials with that on each of the four probe trial types (repeated measures 2-way ANOVA, Tukey's HSD test). Ferrets showed impaired performance on probes that contained only low harmonics (p = 0.001), but performed as well as on standard trials when only high harmonics were presented (p = 1.000). Their performance was also impaired when we randomized the phases of the high harmonics (p = 0.002). Phase randomization also impaired performance when the full set of harmonics (both resolved and unresolved) was present (p = $2.173 \times 10^{-5}$). This pattern of results suggests that ferrets rely more strongly on the temporal envelope periodicity (produced by unresolved harmonics) than on resolved harmonics to classify the pitch of tones, unlike what would be expected for human listeners.
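The logic of the random-phase probes rests on the fact that randomizing harmonic phases flattens the temporal envelope while leaving the power spectrum unchanged. The sketch below illustrates this with a peak-to-RMS (crest factor) measure. It is an illustration only: the harmonic numbers, the sampling rate, the cosine starting phase of the "aligned" tone, and the helper names are assumptions, not the stimulus parameters or code used in the study.

```python
import math
import random


def harmonic_complex(f0, harmonics, fs, dur, phases):
    """Sum of equal-amplitude harmonics of f0 with the given starting phases."""
    n = round(fs * dur)
    return [
        sum(math.cos(2 * math.pi * h * f0 * t / fs + p)
            for h, p in zip(harmonics, phases))
        for t in range(n)
    ]


def crest_factor(x):
    """Peak-to-RMS ratio: higher values indicate a more peaked waveform."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


fs, dur, f0 = 48000, 0.05, 500.0
harmonics = list(range(8, 21))  # hypothetical band of harmonics (4-10 kHz)

# Phase-aligned tone: all components peak together once per F0 period.
aligned = harmonic_complex(f0, harmonics, fs, dur, [0.0] * len(harmonics))

# Random-phase tone: same components, independently randomized phases.
rng = random.Random(0)
random_phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in harmonics]
randomized = harmonic_complex(f0, harmonics, fs, dur, random_phases)

print(crest_factor(aligned), crest_factor(randomized))
```

Both tones have the same RMS level, so the lower crest factor of the randomized tone reflects a flatter envelope rather than a change in overall power.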

Comparison of human and ferret pitch classification performance
Humans were trained on a pitch classification task similar to the one described for ferrets in order to best compare the use of pitch cues between the two species. Participants were presented with harmonic complex tones and classified them as high or low. A training phase was used to teach participants the high and low F0s.

We tested human listeners using the same types of standard and probe stimuli as in the final stage of ferret testing described above. As the pitch discrimination thresholds of human listeners are known to be superior to those of ferrets [16], we adapted the target F0s (180 and 220 Hz) and harmonic cut-offs for human hearing (Fig. 4). The between-species comparison of interest here is therefore not the difference in absolute scores on the task, but the pattern of performance across probe conditions.
Human listeners also showed varied pitch classification performance across the standard and probe stimuli (repeated-measures 2-way ANOVA; F = 36.999, p = $1.443 \times 10^{-15}$). However, a different pattern of performance across stimuli was observed for human subjects (Fig. 5B). Tukey's HSD tests indicated that human listeners were significantly impaired when resolved harmonics were removed from the sounds, as demonstrated by impairments in the "High Harmonics" probes with (p = $9.922 \times 10^{-9}$) and without (p = $1.029 \times 10^{-8}$) randomized phases. Conversely, no impairment was observed when resolved harmonics were available, regardless of whether the phases of stimuli were randomized ("All Harmonics Random Phase" condition; p = 0.959) or not ("Low Harmonics" condition; p = 0.101). These results are all consistent with the wealth of prior work on human pitch perception, and replicate previously reported effects in a task analogous to that used in ferrets.

The performance for each probe type relative to performance on the standard stimuli is directly compared between the two species in Figure 5C. Here, a score of 1 indicates that the subject performed equally well for the standard tone and the probe condition, while a score of 0 indicates that the probe condition fully impaired their performance (reducing it to chance levels). This comparison illustrates the differences in acoustical cues underlying ferret and human pitch classifications. As our model simulations predicted, we found that while ferrets were impaired only when temporal envelope cues from unresolved harmonics were disrupted, humans continued to classify the target pitch well in the absence of temporal envelope cues, so long as resolved harmonics were present.
This was confirmed statistically as a significant interaction between species and probe type on performance (repeated measures 3-way ANOVA; F = 14.802, p = $3.412 \times 10^{-9}$). The two species thus appear to predominantly rely on distinct cues to pitch.

The use of probe trials without feedback in the present experiment allowed us to determine which acoustical cues most strongly influenced listeners' pitch judgements. The ferrets relied predominantly on temporal cues under these conditions, but our results do not preclude the possibility that they could also make pitch judgments based on resolved harmonics if trained to do so. Indeed, although human listeners rely on resolved harmonics under normal listening conditions, they can also extract pitch from unresolved harmonics when they are isolated [34,36,57]. Our simulations show that up to 8 harmonics are resolved in the ferret cochlea, depending on the F0 (Fig. 4A). Consequently, if specifically trained to do so, one might expect ferrets to be able to derive F0 from these harmonics using the same template matching mechanism proposed for human listeners [27,29]. It is also important to note that the relationship between harmonic resolvability and auditory nerve tuning is not fully understood, and nonlinearities in response to multiple frequency components could cause resolvability to be worse than that inferred from isolated auditory nerve fibre measurements.

Overall, the available evidence fits with the idea that pitch judgments are adapted to the acoustical cues that are available and robust in a particular species, with differences in cochlear tuning thus producing cross-species diversity in pitch perception.
A similar principle may be at work in human hearing, since listeners rely on harmonicity for some pitch tasks and spectral changes in others, potentially because of task-dependent differences in the utility of particular cues [7]. The application of normative models of pitch perception will likely provide further insight into the relative importance of these cues.

Ferrets (Mustela putorius furo)
Five adult female pigmented ferrets (aged 6-24 months) were trained in this study. Power calculations estimated that 5 animals was the minimum appropriate sample size for 1-tailed paired comparisons with alpha = 5%, a medium (0.5) effect size, and beta = 20%. Ferrets were housed in groups of 2-3, with free access to food pellets. Training typically occurred in runs of 5 consecutive days, followed by two days of rest. Ferrets could drink water freely from bottles in their home boxes on rest days. On training days, drinking water was received as positive reinforcement on the task, and was supplemented as wet food in the evening to ensure that each ferret received at least 60 ml/kg of water daily. Regular otoscopic and tympanometry examinations were carried out to ensure that the animals' ears were clean and healthy, and veterinary checks upon arrival and yearly thereafter confirmed that animals were healthy. The animal procedures were approved by the University of Oxford Committee on Animal Care and Ethical Review and were carried out under license from the UK Home Office, in accordance with the Animals (Scientific Procedures) Act 1986.

Humans
The pitch classification performance of 16 adult humans (9 male, ages 18-53 years; mean age = 25.3 years) was also examined, which provided a 60% beta in the power calculations described for ferrets. All subjects reported having normal hearing. All [...]

[...] where $f_i$ is the centre frequency of the filter in Hz.

For the ferret cochlea, the equivalent rectangular bandwidth of each filter was estimated using the following linear fit to the data in Sumner and Palmer [48]: [...]

The output of each channel in the above Gammatone filter bank was half-wave rectified and then compressed (to the power of 0.7) to simulate transduction of sound by inner hair cells. Finally, the output was low-pass filtered at 3 kHz to reflect the spike rate limit of auditory nerve fibres. This model architecture is similar to that used in previous studies (e.g. [51,52]).
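The post-filterbank stages described above (half-wave rectification, compression to the power of 0.7, and low-pass filtering at 3 kHz) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `hair_cell_stage`, the sampling rate, and in particular the choice of a first-order recursive low-pass filter are assumptions, since the text does not specify the filter type or order. The Gammatone stage itself is omitted because the bandwidth expressions are not reproduced here.

```python
import math


def hair_cell_stage(channel, fs, cutoff_hz=3000.0, exponent=0.7):
    """Rectify, compress, and low-pass filter one filterbank channel."""
    # Half-wave rectification: keep only positive deflections.
    rectified = [max(v, 0.0) for v in channel]
    # Power-law compression (exponent 0.7, as stated in the text).
    compressed = [v ** exponent for v in rectified]
    # First-order low-pass filter (assumed form; order not given in the text).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for v in compressed:
        y += alpha * (v - y)
        out.append(y)
    return out


fs = 48000  # sampling rate in Hz (assumed)
# Stand-in for one Gammatone channel output: a 500 Hz sinusoid.
channel = [math.sin(2 * math.pi * 500.0 * t / fs) for t in range(fs // 100)]
response = hair_cell_stage(channel, fs)
print(min(response) >= 0.0)
```

A higher-order filter could be substituted for the smoothing step without changing the structure of the sketch.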

Training apparatus
Ferrets were trained to discriminate sounds in custom-built testing chambers, constructed from a wire mesh cage (44 x 56 x 49 cm) with a solid plastic floor, placed inside a sound-insulated box lined with acoustic foam to attenuate echoes. Three plastic nose poke tubes containing an inner water spout were mounted along one wall of the cage: a central "start spout" and two "response spouts" to the left and right (Fig. 2A). Ferrets' nose pokes were detected by breaking an infrared LED beam across the opening of the tube, and water was delivered from the spouts using solenoids. Sound stimuli, including acoustic feedback signals, were presented via a loudspeaker (FRS 8; Visaton, Crewe, UK) mounted above the central spout, which had a flat response (±2 dB) from 0.2-20 kHz. The behavioural task, data acquisition, and stimulus generation were all automated using a laptop computer running custom Matlab (The Mathworks, Natick, MA, USA) code, and a real-time processor (RP2; Tucker-Davis Technologies, Alachua, FL, USA).

Pre-training
Ferrets ran two training sessions daily, and typically completed 94 ± 24 trials per session (mean ± standard deviation). Several pre-training stages were carried out to shape animals' behaviour for our classification task. In the first session, animals received a water reward whenever they nose poked at any of the spouts. Next, they received water rewards only when they alternated between the central and peripheral spouts. The water reward presented from the peripheral response spouts (0.3-0.5 ml per trial) was larger than that presented at the central start spout (0.1-0.2 ml per trial). The animal was required to remain in the central nose poke for 300 ms to receive a water reward from that spout.

[...] corresponded to rewards at one of the two peripheral spouts (right rewards for high F0 targets, and left for low F0s). For all training and testing stages, the target tones contained harmonics within the same frequency range, so that animals could not use spectral cut-offs to classify the sounds. The target tone continued to play until the animal responded at the correct peripheral spout, resulting in a water reward. Once the animals could perform this final pretraining task with >70% accuracy across trials, they advanced to pitch classification testing.

Testing stages and stimuli
The complex tone target was presented only once per trial, and incorrect peripheral spout choices resulted in an error noise and a 10 s timeout (Fig. 2B). After such an error, the following trial was an error correction trial, in which the F0 presented was the same as that of the previous trial. These trials were included to discourage ferrets from always responding at the same peripheral spout. If the ferret failed to respond at either peripheral spout for 14 s after target presentation, the trial was restarted.
The reference pure tone's frequency was set to halfway between the low and high target F0s on a log scale. We examined ferrets' pitch classification performance using two pairs of complex tone targets in separate experimental blocks: the first with [...]

In each case, testing took place over 3 stages, in which the ferret's task remained the same but a unique set of stimulus parameters was changed (Figs. 3 and 4), as outlined below. Ferrets were allocated to the 260 and 707 Hz reference conditions based on their availability at the time of testing.

Stage 1: Target sounds were tone complexes, containing all harmonics within a broad frequency range (specified in Fig. 4B). When an animal performed this task >75% correct on 3 consecutive sessions (32.8 ± 7.1 sessions from the beginning of training; mean ± standard deviation; n = 4 ferrets), they moved to Stage 2.

Stage 2: On 80% of trials, the same standard target tones from Stage 1 were presented. The other 20% of trials were "probe trials", in which the ferret was rewarded irrespective of the peripheral spout it chose, without a timeout or error correction trial. Probe trials were randomly interleaved with standard trials. The probe stimuli differed only by the addition of pink noise (0.1-10 kHz) to the target sounds, in order to mask possible cochlear distortion products at F0. The level of the noise masker was set so that the power at the output of a Gammatone filter centred at the F0 (with bandwidth matched to ferret auditory nerve measurements in that range [48]) was 5 dB below the level of the pure tone components of the target. This is conservative because distortion products are expected to be at least 15 dB below the level of the stimulus components [54,55]. When an animal performed this task >75% correct on 3 consecutive sessions, they moved to Stage 3.
Stage 3: The probe stimulus from Stage 2 served as the "Standard" sound on 80% of trials, and all stimuli (both the standard and probes) included the pink noise masker described above. Twenty percent of trials were probe trials, as in Stage 2, but this stage contained tones manipulated to vary the available pitch cues. We estimated the resolvability of individual harmonics using ERB measurements available in previously published auditory nerve recordings [48]. For a given F0, the number of resolved harmonics was approximated as the ratio of F0 to the bandwidth of auditory nerve fibres with a characteristic frequency at that F0, as described by Moore and Ohgushi [56], and applied by Osmanski et al. [17]. This measure yielded between 1 and 8 resolved harmonics for ferrets, depending on the F0 (Fig. 4A). Four types of probe stimuli were presented: (1) "Low Harmonics", which contained only harmonics presumed to be resolved; (2) "High Harmonics", composed of harmonics presumed to be unresolved; (3) "All Harmonics Random Phase", which contained the same set of harmonics as the standard, but whose phases were independently randomized in order to reduce temporal envelope cues for pitch; and (4) "High Harmonics Random Phase", which contained the harmonics present in "High Harmonics" stimuli, but with randomized phases. The bandpass cutoffs for the probe stimuli were chosen so that the "Low Harmonics", but not "High Harmonics", probes contained resolved harmonics for ferret listeners. Each probe stimulus was presented on at least 40 trials for each ferret, while the standard was tested on over 1000 trials per ferret.
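Two of the quantities used in the testing stages above are simple enough to work through directly: the reference tone frequency (the midpoint of the two target F0s on a log scale) and the estimated number of resolved harmonics (F0 divided by the bandwidth of fibres tuned to F0). The sketch below is illustrative only. The function names are hypothetical, and the bandwidth passed to `estimated_resolved_harmonics` is a made-up value, not one taken from the ferret auditory nerve data.

```python
import math


def log_midpoint(f_low, f_high):
    """Frequency halfway between two frequencies on a logarithmic scale."""
    return math.sqrt(f_low * f_high)


def estimated_resolved_harmonics(f0, erb_at_f0):
    """Approximate number of resolved harmonics: F0 divided by the
    bandwidth (ERB, in Hz) of fibres with a characteristic frequency at F0."""
    return f0 / erb_at_f0


print(round(log_midpoint(500, 1000)))  # → 707
print(round(log_midpoint(150, 450)))   # → 260

# Hypothetical bandwidth for illustration, NOT a measured ferret value.
print(estimated_resolved_harmonics(500.0, 125.0))  # → 4.0
```

The two midpoints reproduce the 707 Hz and 260 Hz reference tones paired with the 500/1000 Hz and 150/450 Hz targets.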

Human psychophysical task
Human subjects were tested on a pitch classification task that was designed to be as similar as possible to Stage 3 of ferrets' task (see above). Target F0s of 180 and 220 Hz were tested on 16 subjects.

In the psychophysical task, human listeners were presented with the same [...]

[...] computer monitor then asked the subject whether the sound heard was the low or high pitch, which the subjects answered via another keypress (1 = low, 0 = high). Feedback was given on the monitor after each trial to indicate whether or not the subject had responded correctly. Incorrect responses to the standard stimuli resulted in presentation of a broadband noise burst (200 ms duration, and 60 dB SPL) and a 3 s timeout before the start of the next trial. Error correction trials were not used for human subjects, as they did not have strong response biases. Standard harmonic complex tones were presented on 80% of trials, and the 4 probes ("Low Harmonics", "High Harmonics", "All Harmonics Random Phase", and "High Harmonics Random Phase") were presented on 20% of randomly interleaved trials. Feedback for probe trials was always "correct", irrespective of listeners' responses. Humans were given 10 practice trials with the standard stimuli before testing, so that they could learn which stimuli were low and high, and how to respond with the keyboard. Each probe stimulus was tested on 40 trials for each subject, while the standard was tested on 680 trials per subject.
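The trial structure described above (standard trials with randomly interleaved probes, and probe feedback that is always "correct") can be sketched as a simple schedule builder. This is an assumption-laden illustration: the function names, the single shuffled list, and the seed are hypothetical, and the text does not say how trials were divided across sessions or how randomization was implemented.

```python
import random

PROBES = [
    "Low Harmonics",
    "High Harmonics",
    "All Harmonics Random Phase",
    "High Harmonics Random Phase",
]


def build_schedule(n_standard=680, n_per_probe=40, seed=None):
    """Return a shuffled list of trial types for one subject."""
    trials = ["Standard"] * n_standard
    for probe in PROBES:
        trials += [probe] * n_per_probe
    random.Random(seed).shuffle(trials)  # random interleaving
    return trials


def feedback_is_correct(trial_type, response_correct):
    """Probe trials are always scored as 'correct' for feedback purposes."""
    return True if trial_type != "Standard" else response_correct


schedule = build_schedule(seed=1)
probe_fraction = sum(t != "Standard" for t in schedule) / len(schedule)
print(len(schedule), round(probe_fraction, 2))  # → 840 0.19
```

With the per-subject trial counts given in the text, probes make up 160 of 840 trials, which is close to the stated 20%.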

QUANTIFICATION AND STATISTICAL ANALYSIS
Psychophysical data analysis
Error correction trials were excluded from all data analysis, as were data from any testing session in which the subject scored less than 60% correct on standard trials. T-tests and ANOVAs with an alpha of 5% were used throughout to assess statistical significance, where the n indicates the number of subjects per group. Error bars in [...]

Because humans produced higher percent correct scores overall than ferrets on the behavioural task, we normalized probe scores against the standard scores when directly comparing performance between species. The score of each species in each probe condition was represented as:

$$P^{\mathrm{norm}}_{ai} = \frac{P_{ai} - 50}{S_{a} - 50},$$

where $P^{\mathrm{norm}}_{ai}$ is the normalized probe score for species $a$ on probe $i$, $P_{ai}$ is the percent correct score for species $a$ on probe $i$, and $S_{a}$ is the percent correct score of species $a$ on the standard trials. If the performance of species $a$ is unimpaired for a given probe stimulus $i$ relative to the standard stimulus, then $P^{\mathrm{norm}}_{ai}$ will equal 1. If the listeners are completely unable to discriminate the F0 of the probe, then $P^{\mathrm{norm}}_{ai} = 0$.

The data and custom software developed in this manuscript are available on the Dryad archive.
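The normalization above maps chance performance (50% correct) to 0 and standard-trial performance to 1. A minimal sketch follows. The function name and the example percentages are hypothetical and are not data from the study.

```python
def normalized_probe_score(probe_pct, standard_pct, chance_pct=50.0):
    """Normalize a probe score so that chance maps to 0 and the
    standard-trial score maps to 1."""
    if standard_pct == chance_pct:
        raise ValueError("standard score equals chance; ratio is undefined")
    return (probe_pct - chance_pct) / (standard_pct - chance_pct)


# Illustrative values only (not results from the paper).
print(normalized_probe_score(90.0, 90.0))  # → 1.0 (unimpaired)
print(normalized_probe_score(70.0, 90.0))  # → 0.5 (partially impaired)
print(normalized_probe_score(50.0, 90.0))  # → 0.0 (at chance)
```

Because the score is a ratio relative to each species' own standard performance, it allows the pattern across probes to be compared even when absolute percent correct differs between species.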

Declaration of Interests
The authors and funding bodies have no competing financial interests in the outcomes of this research.