Abstract
Humans identify speech sounds, the fundamental building blocks of spoken language, using the same cues, or acoustic dimensions, that differentiate the voices of different speakers. The correct interpretation of speech cues is hence uncertain and requires normalizing to the specific speaker. Here we assess how the human brain uses speaker-related contextual information to constrain the processing of speech cues. Using high-density electrocorticography, we recorded local neural activity from the cortical surface of participants engaged in a speech sound identification task. The target speech sounds were preceded by speech from different speakers whose voices differed along the same acoustic dimension that differentiated the targets (the first formant, the lowest resonance frequency of the vocal tract). We found that the same acoustic speech sound tokens were perceived differently, and evoked different neural responses in auditory cortex, when heard in the context of different speakers. This normalization involved the rescaling of acoustic-phonetic representations of speech, demonstrating a form of recoding before the signal is mapped onto phonemes or higher-level linguistic units. The process results from the sensitivity of auditory cortex to the contrast between the dominant frequencies in speech sounds and those in the immediately preceding context. These findings provide important insights into the mechanistic implementation of normalization in human listeners. Moreover, they provide the first direct evidence of speaker-normalized speech sound representations in human parabelt auditory cortex, highlighting its critical role in resolving variability in sensory signals.