Prosodic categories in speech are acoustically multidimensional: evidence from dimension-based statistical learning

Kyle Jasmin; Adam Tierney; Lori Holt

doi:10.1101/2021.01.18.427088

Abstract

Segmental speech units (e.g. phonemes) are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, with the relative perceptual weights of these dimensions responding dynamically to context. Can prosodic aspects of speech spanning multiple phonemes, syllables or words be characterized similarly? Here we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced either with the typical covariation of fundamental frequency (F0) and duration, or in the context of an artificial ‘accent’ in which F0 and duration covaried atypically. When categorizing ‘accented’ speech, listeners rapidly down-weighted the secondary dimension (duration) while continuing to rely on the primary dimension (F0). These results address two core theoretical questions, indicating that 1) prosodic categories are signalled by multiple acoustic input dimensions and 2) perceptual cue weights for prosodic categories dynamically adapt to local regularities of speech input.

Highlights

Prosodic categories are signalled by multiple acoustic dimensions.
The influence of these dimensions flexibly adapts to changes in local speech input.
This adaptive plasticity may help tune perception to atypical accented speech.
Similar learning models may account for segmental and suprasegmental flexibility.