Cognition

Volume 112, Issue 1, July 2009, Pages 1-20

Processing interactions between phonology and melody: Vowels sing but consonants speak

https://doi.org/10.1016/j.cognition.2009.02.014

Abstract

The aim of this study was to determine if two dimensions of song, the phonological part of lyrics and the melodic part of tunes, are processed in an independent or integrated way. In a series of five experiments, musically untrained participants classified bisyllabic nonwords sung on two-tone melodic intervals. Their response had to be based on pitch contour, on nonword identity, or on the combination of pitch and nonword. When participants had to ignore irrelevant variations of the non-attended dimension, patterns of interference and facilitation allowed us to specify the processing interactions between dimensions. Results showed that consonants are processed more independently from melodic information than vowels are (Experiments 1–4). This difference between consonants and vowels was related neither to the sonority of the phonemes (Experiment 3) nor to the acoustical correlates between vowel quality and pitch height (Experiment 5). The implications of these results for our understanding of the functional relationships between musical and linguistic systems are discussed in light of the different evolutionary origins and linguistic functions of consonants and vowels.

Introduction

A fundamental issue in human cognition is to determine how the different dimensions of a stimulus combine and interact in processing complex materials. Speech and music are typical examples of such materials, and have been studied not only for their own sake, but also for comparing the cognitive processes involved in each of them. While some authors view music processing as a by-product of language processing (e.g., Pinker, 1997), others argue that music involves specific computational processes (e.g., Peretz, 2006). Songs provide an ideal material to study the relations between language and music, since they naturally combine a musical dimension, the tune, and a linguistic dimension, the lyrics (e.g., Patel & Peretz, 1997). The aim of the present work was to examine whether lyrics and tunes in sung materials are processed independently or in an integrated way. More specifically, we examined the on-line processing independence or integration of the phonological and melodic dimensions of sung materials.

Up to now, studies on songs have mainly investigated the relations between semantics and melody. Depending on the experimental approach and on the materials used, results show either independence (Besson et al., 1998, Bonnel et al., 2001) or interactions (Poulin-Charronnat et al., 2005, Schön et al., 2005). The effect of harmonic congruity on phoneme monitoring seems to suggest that phonology and music are processed interactively. In studies of harmonic priming, Bigand, Tillmann, Poulin, D’Adamo, and Madurell (2001) manipulated the structural relationship between the last sung chord and the preceding musical context, an eight-chord sung sequence. Results showed faster phoneme monitoring of the last sung vowel when it was sung on the tonic than on the subdominant chord. However, if linguistic and musical domains share attentional capacities, music may modulate linguistic processing by modifying the allocation of the attentional resources necessary for linguistic computation. Under this view, the effect of harmonic context on phoneme processing arises from general attentional processes rather than from specific music–language dependencies (Bigand, Tillmann, Poulin, D’Adamo, & Madurell, 2001; see also Poulin-Charronnat et al., 2005). This possibility is supported by recent evidence of similar facilitation from harmonic relatedness with nonlinguistic stimuli, such as geometric shapes (Escoffier & Tillmann, 2008). In addition, Bigand, Tillmann, Poulin, D’Adamo, and Madurell (2001) used only one phoneme category for discrimination, namely vowels (i.e., the /di/–/du/ distinction). Their finding may not generalize to harmonic and phonemic processing as a rule, since vowels and consonants differ in both their acoustical properties and their linguistic function.

At the acoustical level, most consonants are characterized by transient acoustic cues typical of formant transitions, whereas vowels are characterized by relationships among more steady-state frequency information (Delattre et al., 1955, Fry et al., 1962, Liberman et al., 1967). These acoustical differences have been associated with different patterns of hemispheric lateralization: the processing of rapidly changing acoustic information (e.g., consonants) is more left-lateralized than the processing of stable spectral information (e.g., vowels or music; for reviews, see Poeppel, 2003, Zatorre et al., 2002, Zatorre and Gandour, 2008). Therefore, vowels might be more suitable than consonants to carry melodic and prosodic information. This idea is in line with studies on opera singing, which suggest that vowels are more intimately linked to the melodic variations of tunes than consonants, as the latter are located at the transitions between notes and are sometimes reported as breaking the melodic line (e.g., Scotto di Carlo, 1993). Thus, trained singers tend to shorten consonants and to reduce their articulation (McCrean & Morris, 2005), while vowels are typically lengthened in singing compared to speech (Scotto di Carlo, 2007a, Scotto di Carlo, 2007b, Sundberg, 1982).

At the functional level, vowels and consonants may also serve distinct roles in speech (Bonatti, Peña, Nespor, & Mehler, 2007). Statistical learning studies have shown that humans are better at capturing non-adjacent regularities based on consonants than on vowels (Bonatti et al., 2005, Mehler et al., 2006), which suggests that consonants carry lexical information. This specific lexical function of consonants seems to emerge relatively early in life, given that 20-month-old infants can learn two words that differ by only one consonant, but fail when the distinctive phoneme is a vowel (Nazzi, 2005, Nazzi and New, 2007). In contrast with the lexical function of consonants, vowels are used to extract structural generalizations in artificial languages (Toro, Nespor, Mehler, & Bonatti, 2008) and are thus involved in syntactic computations. They also contribute to grammar and to prosody (Nespor et al., 2003, Toro et al., 2008), including the indexical prosody that allows for speaker identification (Owren & Cardillo, 2006).

In addition, neuropsychological dissociations have been reported between the ability to produce vowels and consonants in aphasic patients (Caramazza, Chialant, Capasso, & Miceli, 2000), with consonants being more vulnerable than vowels to such impairments (Béland et al., 1990, Canter et al., 1985; for a review, see Monaghan & Shillcock, 2003; but see Semenza et al., 2007). These two classes of speech segments may thus belong to distinct processing systems.

Comparative human–animal studies further suggest that consonants are more specific to human speech than vowels are. Contrary to humans, New World monkeys (cotton-top tamarins) are only able to extract statistical regularities based on vowels (Newport, Hauser, Spaepen, & Aslin, 2004). Moreover, while monkeys show steady-state formant perception comparable to that of humans (Sommers, Moody, Prosen, & Stebbins, 1992) and can learn to discriminate the manner of articulation of consonants (Sinnott & Williamson, 1999), they have trouble learning place-of-articulation contrasts, which indicates that they process formant transitions differently from humans (Sinnott & Gilmore, 2004). On the production side, nonhuman primates can produce harmonic sounds very similar to vowels in order to provide indexical information about sex, age, identity, emotion, etc. (Owren et al., 1997, Rendall et al., 1996). However, only humans have elaborated the supralaryngeal articulations that, by inserting consonants into the vocalic carrier (MacNeilage & Davis, 2000), allow the emergence of a rich set of meaningful contrasts.

In summary, learning and developmental research support the notion that vowels and consonants serve different linguistic functions, with consonants being more tied to word identification, while vowels essentially contribute to grammar and to prosody. In addition, neuropsychological dissociations show that the processing of these two classes of speech segments is dissociable by brain damage. Furthermore, comparative human–animal studies suggest that vowels may be less specific to speech than consonants. As a consequence, vowels may be more intricately linked than consonants to other, non-linguistic auditory dimensions, like melody.

To our knowledge, this hypothesis has only been tested in speech, using auditory adaptations of the speeded classification tasks designed by Garner (e.g., Garner, 1974, Garner, 1978a, Garner, 1978b; see also Lidji, 2007, for a recent review in the song domain). In these tasks, participants are asked to classify spoken syllables according to their values on a previously specified target dimension. This dimension can be, for example, the pitch level (manipulated through the vowel fundamental frequency, F0) or the identity of the initial consonant (e.g., /bæ/ or /gæ/; Wood, 1974, Wood, 1975). Three conditions constitute the filtering and redundancy tests (Ashby & Maddox, 1994), which aim to check whether irrelevant orthogonal variations on one dimension (e.g., the identity of the consonant) influence the processing of the other, target dimension (e.g., pitch). Variations on the two dimensions can be either redundant (i.e., correlated, e.g., when all /bæ/ syllables have a low pitch and all /gæ/ syllables have a high pitch) or orthogonal (when both /bæ/ and /gæ/ syllables can be either low or high). Comparing sorting times and performance with a baseline control test (also called a standard or discrimination task, e.g., Garner, 1981), in which only one dimension varies (e.g., just the consonant, with only high /bæ/ and /gæ/ syllables, or just pitch, with only high and low /bæ/), allows one to evaluate the participants’ attentional filtering capacities. Indeed, if processing of the target dimension entailed processing of the non-target dimension, participants would be unable to filter out irrelevant variations. Hence, their performance would be poorer (e.g., slower reaction times, RTs) in the filtering test than in the baseline tests, an effect referred to as Garner interference (e.g., Pomerantz, 1983).
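
To make the design concrete, here is a minimal sketch (not the authors' code; stimulus labels and the helper function are hypothetical) of how the three Garner conditions can be built from two binary dimensions, and how the two diagnostic effects are typically quantified from mean RTs:

```python
# A minimal sketch, assuming two binary dimensions with placeholder labels.
from itertools import product

nonwords = ["NW1", "NW2"]               # phonological dimension (hypothetical labels)
contours = ["ascending", "descending"]  # melodic dimension

# Baseline (discrimination): only the target dimension varies,
# the other is held constant.
baseline = [(nw, contours[0]) for nw in nonwords]

# Redundant (correlated): the two dimensions covary perfectly.
correlated = list(zip(nonwords, contours))

# Orthogonal (filtering): all combinations occur, so the irrelevant
# dimension varies independently of the target one.
orthogonal = list(product(nonwords, contours))

def garner_effects(rt_baseline, rt_orthogonal, rt_correlated):
    """Garner interference = RT cost of irrelevant orthogonal variation;
    redundancy gain = RT benefit of correlated variation."""
    interference = rt_orthogonal - rt_baseline   # > 0: interference cost
    gain = rt_baseline - rt_correlated           # > 0: redundancy gain
    return interference, gain

# Example with made-up mean RTs (ms):
print(garner_effects(520.0, 560.0, 505.0))      # -> (40.0, 15.0)
```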

With this speeded classification paradigm, the interactions between segmental (phoneme) and suprasegmental (pitch or pitch contour) dimensions of speech have been shown to be modulated by the nature of the phonemes. While pitch classification was not affected by the consonantal variations described above (Wood, 1974, Wood, 1975), consonant classification was slowed down by variations in pitch. In contrast, when the segmental task concerned vowel quality (e.g., /bɑ/ vs. /bæ/) rather than consonants, mutual and symmetric interference was reported between the segmental dimension and either the pitch or the loudness dimension (Miller, 1978). These data seem to support the idea that vowels and consonants have different relationships with pitch. According to Melara and Marks (1990), vowels and pitch are processed by the same general auditory mechanisms, while consonants are processed at a later, phonetic level.

However, these results cannot be generalized to lyrics and tunes in songs. Whereas these studies used static pitch levels (i.e., synthetic syllables recorded at a constant F0 of 104 or 140 Hz; Miller, 1978, Wood, 1974, Wood, 1975), music (including songs), speech intonation, and lexical tones are all characterized by pitch changes. Using dynamic tonal contours in speeded classification, Repp and Lin (1990) observed mutual interference between segmental (consonant or vowel) and tonal information in Mandarin Chinese. More crucially, in English listeners, Lee and Nusbaum (1993) observed mutual interference between consonantal and pitch information for dynamic tonal contours, but asymmetrical interference for static pitch levels. Thus, contrary to what was observed with static pitch levels, both vowels and consonants interact with speech tonal contours.

Yet, there are several shortcomings in these studies. In most of them, processing interactions between dimensions were assessed only by examining the interference pattern, which may merely reflect the listeners’ inability to pay attention selectively to the target dimension (e.g., Thibaut & Gelaes, 2002). According to Garner (1974), a demonstration of integrality of multidimensional stimuli, namely of integrated, holistic processing, requires not only the occurrence of interference, but also that correlated variations on the non-target dimension lead to a benefit, or redundancy gain. Indeed, when the dimensions are processed in a unitary fashion (Grau & Kemler-Nelson, 1988), the perceptual distance between the stimuli as wholes is enhanced in the redundant condition, according to a Euclidean metric of (dis)similarity. By contrast, for separable dimensions, (dis)similarity is based on a city-block metric in which the (dis)similarity between multidimensional stimuli is additive (Torgerson, 1958), and hence no gain is expected in the redundant condition. To our knowledge, the only study that also used correlated stimuli (i.e., the redundancy test) to examine the interactions between linguistic tones and segmental information was that of Repp and Lin (1990). Unfortunately, in this study there was a difference in the relative discriminability of the two dimensions (all participants discriminated tones more poorly than segments), which is known to modulate the patterns of dimensional interaction (Garner, 1974, Garner and Felfoldy, 1970). In addition, even results obtained with speech tonal contours are unlikely to generalize to interactions between other auditory dimensions such as the lyrics and tunes of songs. For instance, it has been shown that “consonants” and “vowels” of synthesized consonant–vowel (CV) stimuli are processed as integral dimensions when listeners consider them as linguistic, but separately when considered as a mix of noise and tone (Tomiak, Mullennix, & Sawusch, 1987).
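
To make the metric distinction concrete, let Δp and Δm denote the perceptual differences on the phonological and melodic dimensions (our notation, not the authors’). In the redundant condition the stimuli differ on both dimensions; under unitary (Euclidean) processing their overall distance then exceeds the single-dimension baseline separation Δp, predicting faster classification, whereas under separable (city-block) processing the classification rests on the target dimension alone, whose separation is unchanged, so no gain is expected:

$$
d_{\text{Euclidean}} = \sqrt{\Delta p^{2} + \Delta m^{2}} \; > \; \Delta p,
\qquad
d_{\text{city-block}} = |\Delta p| + |\Delta m|.
$$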

In the present work, we adapted the speeded classification paradigm to study the processing of the phonological and melodic dimensions of consonant–vowel bisyllabic (CVCV) nonwords sung on two-tone melodic intervals. Musically untrained participants were presented with speeded classification tasks, using either natural (Experiments 1, 2, 3 and 4) or synthesized (Experiment 5) sung syllables. The speeded classification tasks of Experiments 1, 2, 3 and 5 included the filtering, redundant and baseline conditions that constitute the filtering and redundancy tests. In all these conditions, participants were required to respond according to the identity of either the “lyrics” (the nonword) or the “tune” (the melodic interval). We contrasted materials in which the nonwords differed by their middle consonant, either a stop (Experiment 2) or a nasal (Experiment 3), with materials in which they differed by their final vowel (Experiments 1, 3 and 5). The rationale for contrasting these materials was to test the hypothesis that vowels and consonants involve different processes that may have different relationships with melodic processing.

In Experiment 1, we examined the processing of vowels and intervals. If vowels and intervals constituted interactive dimensions, an integrality pattern – interference cost plus redundancy gain – was expected. In Experiment 2, the nonwords differed by their middle, voiceless stop consonant. If consonants were more speech-specific and provided poor melodic support, a separability pattern was predicted.
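
These predictions can be summarized as a small decision rule (a hypothetical helper, not from the article; the “intermediate” case anticipates the interference-without-gain result reported for Experiment 3 below):

```python
# Hypothetical helper: interpret a Garner pattern from the presence of
# interference (filtering cost) and redundancy gain (correlated benefit).
def interpret_pattern(interference: bool, gain: bool) -> str:
    if interference and gain:
        return "integrality"    # integrated, holistic processing
    if not interference and not gain:
        return "separability"   # independent processing of the dimensions
    return "intermediate"       # e.g., interference without redundancy gain
```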

The aim of Experiment 3 was to generalize the results beyond the case of stop consonants, as well as to a new vowel contrast. By definition, consonants and vowels differ in their acoustics. They also differ in the physics of their production, because only consonants are produced by a partial or total constriction of the upper vocal tract. However, some consonants are more sonorous – and hence more vowel-like – than others. Indeed, in all languages, speech sounds can be ranked on a sonority hierarchy, ranging from the least sonorous stop consonants to the most sonorous glides and vowels, with fricatives, nasals, and liquids having an intermediate, progressively more sonorous status. Sonority is related to the degree of openness of the vocal apparatus during speech (e.g., Goldsmith, 1990, MacNeilage, 1998, Selkirk, 1982) and hence to relative loudness, perceptibility and acoustic intensity (but see Harris, 2006). From this perspective, vowels would be processed differently from consonants because the former are more sonorous than the latter. Such a view also implies that the sonority of consonants affects how they are processed in relation to pitch in songs. In particular, the more sonorous nasals may be more apt than the less sonorous stops to support pitch variations, and hence their processing may be more integrated with melody. By contrast, if vowels and consonants are processed differently because they carry different functions in speech, no difference between nasals and stops should be found.

The aim of Experiment 4 was to examine the interactions between consonants and melody in sung nonwords using a condensation test (cf. Posner, 1964), in which no single dimension can serve as the relevant basis for classification. Finally, the filtering and redundancy tests of Experiment 5 were aimed at checking that the integrality between vowels and intervals did not result from acoustical interactions between the spectral characteristics of vowels and their pitch. To this end, we used synthesized materials in which these parameters were carefully controlled.
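
A minimal sketch of a condensation mapping, with hypothetical stimulus labels (the actual nonwords and intervals differ): the response criterion is a conjunction of the two dimensions, so neither dimension alone predicts the correct key.

```python
# Sketch of a condensation mapping (cf. Posner, 1964): the correct response
# depends on the conjunction of nonword identity and melodic contour.

def condensation_response(nonword: str, contour: str) -> str:
    """XOR-like rule: respond 'A' for (NW1, ascending) or (NW2, descending),
    'B' for the two remaining combinations."""
    if (nonword, contour) in {("NW1", "ascending"), ("NW2", "descending")}:
        return "A"
    return "B"

# Knowing only the nonword, or only the contour, leaves the response at
# chance; both dimensions must be combined on every trial.
for nw in ("NW1", "NW2"):
    for ct in ("ascending", "descending"):
        print(nw, ct, "->", condensation_response(nw, ct))
```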

Section snippets

Experiment 1 – interactions between vowels and intervals in sung nonwords: filtering and redundancy tests

Participants had to classify bisyllabic nonwords sung on two-note intervals on the basis of the identity of either the nonword (phonological task) or the interval (melodic task), in the three conditions defined by Garner (1974). The nonwords differed from each other by the identity of their final vowel, and the intervals varied in their melodic contour, either ascending or descending (see Table 1).

Within each condition, the task remained formally the same: to associate each presented …

Experiment 2 – interactions between stop consonants and intervals in sung nonwords: filtering and redundancy tests

Our hypothesis that vowels and melodic intervals are integral dimensions was supported by the results of Experiment 1. In Experiment 2, we tested the additional hypothesis that consonants and intervals are less integrated. This could be due either to the acoustic properties of consonants, which prevent them from carrying melodic information, or to the different linguistic function and greater linguistic specificity of consonants compared to vowels.

Experiment 3 – generalization to other vowels and to nasal consonants

Experiment 3 had three objectives. First, in Experiments 1 and 2, there was a systematic association between nonwords varying on vowels and minor intervals, and between nonwords varying on consonants and major intervals. The sizes of these intervals also differed, as the baseline discriminability had to be as similar as possible in each material for the phonological and the melodic tasks. This had the detrimental consequence of pairing the vocalic and consonantal nonwords with largely different interval …

Experiment 4 – condensation test

The occurrence of an interference effect without a redundancy gain for the nasals used in Experiment 3 corresponds neither to an integrality nor to a separability pattern. According to some authors, interference without facilitation merely arises from difficulties in attending selectively to separate dimensions, because of task difficulty and/or a lack of discriminability between the values of the dimensions. This has been shown, for example, in developmental studies on attentional filtering …

Experiment 5 – filtering and redundancy tests on synthesized vocalic material

In Experiments 1 and 3, we observed an integrality pattern for the vocalic materials, which was not the case for the consonantal materials of Experiments 2 and 3. Such a difference suggests that vowels and intervals are at least partly processed by common auditory mechanisms, in contrast to consonants and intervals.

Alternatively, this response pattern may reflect physical interactions between the linguistic and musical dimensions. In singing, different pitches can alter the intelligibility of …

General discussion

In the present study, we examined whether the phonological and melodic dimensions of sung material are processed independently or in an integrated way. For this, we used the filtering (Experiments 1, 2, 3 and 5) and condensation (Experiment 4) tests designed by Garner (1974) with auditory CVCV nonsense sung syllables. Moreover, we compared materials with varying vowels (V-materials: Experiments 1, 3, and 5) and varying consonants, consisting of either stops (stop-C-material: Experiment 2) or …

Acknowledgements

This research was supported by a grant from the Human Frontier Science Program (RGP 53/2002, “An interdisciplinary approach to the problem of language and music specificity”), by a MINIARC grant (AT.20051024.CG.1, “Représentations mentales de la musique et du langage dans le chant et nature de leurs interactions”), as well as by a FRFC grant (2.4633.66, “Mental representations of music and language in singing and nature of their interactions”). The first two authors are Senior Research Associates …

References (103)

  • J. Mehler et al. (2006). The “soul” of language does not use statistics: Reflections on vowels and consonants. Cortex.
  • P. Monaghan et al. (2003). Connectionist modelling of the separable processing of consonants and vowels. Brain and Language.
  • T. Nazzi (2005). Use of phonetic specificity during the acquisition of new words: Differences between consonants and vowels. Cognition.
  • T. Nazzi et al. (2007). Beyond stop consonants: Consonantal specificity in early lexical decision. Cognitive Development.
  • I. Peretz (2006). The nature of music from a biological perspective. Cognition.
  • D. Poeppel (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as “asymmetric sampling in time”. Speech Communication.
  • B. Poulin-Charronnat et al. (2005). Musical structure modulates semantic priming in vocal music. Cognition.
  • B.H. Repp et al. (1990). Integration of segmental and tonal information in speech perception – A cross-linguistic study. Journal of Phonetics.
  • J.R. Saffran et al. (1999). Statistical learning of tone sequences by human infants and adults. Cognition.
  • J.R. Saffran et al. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language.
  • D. Schön et al. (2008). Songs as an aid for language acquisition. Cognition.
  • C. Semenza et al. (2007). A dedicated neural mechanism for vowel selection: A case of relative vowel deficit sparing the number lexicon. Neuropsychologia.
  • M.L. Serafine et al. (1984). Integration of melody and text in memory for songs. Cognition.
  • B.E. Shepp et al. (1987). The development of selective attention: Holistic perception versus resource allocation. Journal of Experimental Child Psychology.
  • J. Sundberg. Perception of singing.
  • C. Astesano et al. (2004). Le langage et la musique dans le chant. Revue de Neuropsy News.
  • E. Ben Artzi et al. (1995). Visual–auditory interaction in speeded classification – Role of stimulus difference. Perception and Psychophysics.
  • M. Besson et al. (1998). Singing in the brain: Independence of lyrics and tunes. Psychological Science.
  • J.J. Bharucha et al. (1986). Reaction time and musical expectancy – Priming of chords. Journal of Experimental Psychology: Human Perception and Performance.
  • E. Bigand et al. (1997). Global context effects on musical expectancy. Perception and Psychophysics.
  • P. Boersma & D. Weenink (2007). Praat: Doing phonetics by computer (version 5.0). Retrieved December 10, 2007, from...
  • L.L. Bonatti et al. (2005). Linguistic constraints on statistical computations. Psychological Science.
  • L.L. Bonatti et al. (2007). On consonants, vowels, chickens and eggs. Psychological Science.
  • A.M. Bonnel et al. (2001). Divided attention between lyrics and tunes of operatic songs: Evidence for independent processing. Perception and Psychophysics.
  • A. Caramazza et al. (2000). Separable processing of consonants and vowels. Nature.
  • J. Cohen et al. (1993). PsyScope – An interactive graphic system for designing and controlling experiments in the psychology laboratory using Macintosh computers. Behavior Research Methods, Instruments, and Computers.
  • P. Delattre et al. (1955). Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America.
  • P.M. Fitts et al. (1965). S-R compatibility and information reduction. Journal of Experimental Psychology.
  • D. Fry et al. (1962). The identification and discrimination of synthetic vowels. Language and Speech.
  • J. Gandour et al. (2000). A crosslinguistic PET study of tone perception. Journal of Cognitive Neuroscience.
  • J. Gandour et al. (1998). Pitch processing in the human brain is influenced by language experience. Neuroreport.
  • W.R. Garner (1974). The processing of information and structure.
  • W.R. Garner (1978). Interaction of stimulus dimensions in concept and choice processes. Cognitive Psychology.
  • W.R. Garner (1978). Selective attention to attributes and to stimuli. Journal of Experimental Psychology: General.
  • W.R. Garner. The analysis of unanalyzed perceptions.
  • W.R. Garner. Asymmetric interactions of stimulus dimensions in perceptual information processing.
  • J.A. Goldsmith (1990). Autosegmental and metrical phonology.
  • R.L. Gottwald et al. (1972). Effects of focusing strategy on speeded classification with grouping, filtering, and condensation tasks. Perception and Psychophysics.
  • R. Gottwald et al. (1975). Filtering and condensation task with integral and separable dimensions. Perception and Psychophysics.
  • J.W. Grau et al. (1988). The distinction between integral and separable dimensions – Evidence for the integrality of pitch and loudness. Journal of Experimental Psychology: General.