Abstract
Across taxa, the forms of vocal signals are shaped by their functions1–15. In humans, a salient context of vocal signaling is infant care, as human infants are altricial16,17. Humans often produce “parent-ese”, speech and song for infants that differ acoustically from ordinary speech and song18–35, in ways that are thought to support parent-infant communication and infant language learning36–39; modulate infant affect33,40–45; or credibly signal information to infants46. These theories predict a universal form-function link in infant-directed vocalizations, with consistent differentiation between infant-directed and adult-directed vocalizations across cultures. Some evidence supports this prediction23,27,28,32,47–50, but the limited generalizability of individual ethnographic reports and laboratory experiments51 and small stimulus sets52, along with intriguing reports of counterexamples53–60, leave the question open. Here, we show that infant-directed speech and song are robustly differentiable from their adult-directed counterparts, within voices and across cultures. We built a corpus of 1,615 recordings of infant- and adult-directed singing and speech produced by 410 people living in 21 urban, rural, and small-scale societies and played the recordings to 45,745 people recruited online from many countries. We asked them to guess whether each vocalization was, in fact, infant-directed. The patterns of inferences of these naïve listeners, supported by acoustic analyses and predictive modelling, demonstrate acoustic cues to infant-directedness that are cross-culturally robust. The cues to infant-directedness differ across language and music, however, informing hypotheses of the psychological functions and evolution of both.
Main
The forms of many animal signals are shaped by their functions, a link arising from production- and reception-related rules that help to maintain reliable signal detection within and across species1–6. Form-function links are widespread in vocal signals across taxa, from meerkats to fish3,7–10, producing acoustic regularities that allow cross-species intelligibility11–13,15. These regularities enable some species to eavesdrop on the vocalizations of others: superb fairywrens (Malurus cyaneus), for example, learn to flee predatory birds in response to alarm calls that they themselves do not produce14.
In humans, an important context for the effective transmission of vocal signals is between parents and infants, as human infants are particularly helpless16. To elicit care, infants use a distinctive alarm signal: they cry17. In response, adults produce infant-directed speech and song (sometimes referred to as “parent-ese”) with putatively stereotyped acoustics18–35.
These stereotyped acoustics are thought to be functional: supporting language acquisition36–39, modulating infant affect and temperament33,40,41, and signalling information to infants46. These theories all share a key prediction: like the vocal signals of other species, the forms of infant-directed vocalizations should be universally shaped by their functions, instantiated with clear regularities across cultures. Evidence for a universal form-function link is mixed, however, given the limited generalizability of individual ethnographic reports and laboratory studies51; small stimulus sets52; and a variety of counterexamples53,54,56–60.
In language, infant-directed speech is primarily characterized by higher and more variable pitch61 and more exaggerated and variable vowels23,62,63, in modern industrialized societies23,28,47,48,50,64,65 and a few small-scale societies49,66. Infants are themselves sensitive to these features, preferring them, even if spoken in unfamiliar languages67–69. But these acoustic features are less exaggerated in some cultures58,64,70 and apparently vary in relation to the age and sex of the infant64,71,72.
In music, infant-directed songs also have stereotyped acoustic features. Lullabies, for example, tend toward slower tempos, reduced accentuation, and simple repetitive melodic patterns31,32,35,73, supporting functional roles associated with infant care33,41,46 in industrialized34,74–76 and small-scale societies77,78. Infants are soothed by these acoustic features, whether produced in familiar44,45 or unfamiliar songs79, and both adults and children reliably associate the same features with a soothing function31,32,73. But cross-cultural studies of infant-directed song have primarily relied upon archival recordings from disparate sources29,31,32, an approach that poorly controls for differences in voices, behavioral contexts, recording equipment, and historical conventions.
The degree to which infant-directed vocalizations are acoustically stereotyped across cultures is therefore unclear. To address this, we created a corpus of infant-directed song, infant-directed speech, adult-directed song, and adult-directed speech from a diverse set of 21 human societies, totaling 1,615 field recordings of 410 individual voices (Fig. 1a, Table 1, and Methods; the corpus is open-access at https://doi.org/10.5281/zenodo.5525161). Participants were asked to provide all four vocalization types, enabling within-voice analyses.
Here, we report analyses of the corpus, using computational methods and a citizen-science experiment, to study three questions: (i) Is infant-directedness mutually intelligible across cultures? (ii) Are the acoustic cues to infant-directedness cross-culturally robust? (iii) Are human inferences about infant-directedness aligned to such acoustic cues?
Naïve listeners distinguish infant-directed from adult-directed vocalizations
We played excerpts from the vocalization corpus to 45,745 people in the “Who’s Listening?” game on https://themusiclab.org (after exclusions; see Methods). The participants resided in 184 countries and reported speaking 164 native languages. We asked them to judge, quickly, whether each vocalization was directed to a baby or to an adult (see Methods and Extended Data Fig. 1). We only included recordings that lacked confounding contextual/background cues (e.g., an audible infant; see Methods). Unless noted otherwise, all estimates reported here are generated by mixed-effects linear regression, adjusting for fieldsite (as a random effect), and with p-values generated via linear combination tests.
Corpus-wide, infant-directed speech was far more likely to be rated as infant-directed than was adult-directed speech from the same voice (Fig. 2a; ID speech = 51%, AD speech = 22%; χ2(1) = 25.3, p < .0001); and infant-directed song was far more likely to be rated as infant-directed than was adult-directed song from the same voice (Fig. 2a; ID song = 72%, AD song = 57%; χ2(1) = 13.58, p < .001). These results were robust to learning effects: they repeated when analyzing only each participant’s first exposure to a vocalization in the experiment, and listener accuracy increased by only 0.06% after each trial (Extended Data Fig. 2). They were also robust to post-hoc data trimming decisions, such as excluding recordings with confounding background noise and/or trials where the listener could likely understand the words in the vocalization (Extended Data Fig. 3).
There was, however, an overall bias toward “baby” responses for songs (67% of all responses were “baby”, but only 51% of songs were infant-directed) and toward “adult” responses for speech (64% “adult” responses vs. 56% actually adult-directed), which led adult-directed songs to be reliably misidentified as infant-directed. To quantify sensitivity to infant-directedness independently of this bias, we ran a d′ analysis at the level of each vocalist, i.e., analyzing participants’ ability to identify infant-directedness within each voice after correcting for response bias. Sensitivity was significantly higher than the chance level of 0 (speech: d′ = 1.05, 95% CI [0.64, 1.46]; song: d′ = 0.42, 95% CI [0.22, 0.62]; ps < .0001), implying that the naïve listeners reliably differentiated between infant- and adult-directed vocalizations in both speech and song, with ∼2.5 times higher sensitivity in speech.
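In signal-detection terms, each vocalist’s d′ is the z-transformed hit rate (responding “baby” to an infant-directed vocalization) minus the z-transformed false-alarm rate (responding “baby” to an adult-directed one). A minimal Python sketch, using a log-linear correction of our own choosing for extreme rates (the paper’s exact correction is not specified here):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each cell (a log-linear correction; an illustrative
    choice) so rates of exactly 0 or 1 do not yield infinite z-scores.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return z(hit_rate) - z(fa_rate)

# One hypothetical vocalist: of 40 infant-directed clips, 30 drew "baby"
# responses; of 40 adult-directed clips, 12 drew "baby" responses.
sensitivity = d_prime(hits=30, misses=10, false_alarms=12, correct_rejections=28)
```

Because hit and false-alarm rates are computed within a single voice, a positive d′ reflects discrimination of infant-directedness per se rather than a global tendency to answer “baby” or “adult”.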
We also analyzed performance in the task within the subset of recordings drawn from each fieldsite. Cross-site variability was evident, especially in the size of effects (but less so in their direction); we caution that some fieldsites had small samples, making it impossible to know whether such effects represent true cross-cultural variability, sampling variability, or both. In 20 of 21 fieldsites, mean “baby” ratings were higher for infant-directed speech than adult-directed speech (Fig. 2b), and in 17 of 21 fieldsites, mean “baby” ratings were higher for infant-directed song than adult-directed song (Fig. 2b). Even in the fieldsites that failed to replicate the overall pattern in song, however, the mean “baby” rating for infant-directed song was above the chance level of 50%. Fieldsite-wise d′ scores are reported in Extended Data Table 1.
Listener sensitivity within each fieldsite was also correlated with a number of society-level characteristics: rank-order population size (speech: τ = 0.53; song: τ = 0.6), distance from fieldsite to nearest urban center (speech: r = -0.75; song: r = -0.49), and number of children per family (speech: r = -0.57; song: r = -0.8; all ps < .001). These predictors were highly correlated with one another (all r > 0.6), however, suggesting that they did not each contribute unique variance. There was no correlation with ratings of how frequently infant-directed vocalizations were used within each society (ps > .4).
Tests of cross-cultural variability among listeners also revealed strong similarity in the perception of infant-directedness. On trials where the vocalization being judged was in a language closely related to the listener’s native language (e.g., when the vocalization was in Spanish and the listener’s native language was English, which are both Indo-European languages), performance increased only modestly relative to trials where the language family did not match (e.g., when the vocalization was in Mentawai, an Austronesian language, and the listener’s native language was Mandarin, a Sino-Tibetan language); the effect was statistically significant but small (difference in d′ = 0.18, p = 0.01; Extended Data Fig. 4). Linguistic relatedness therefore accounted for only a small amount of variability in naïve listeners’ intuitions of infant-directedness. More generally, random effects of listener country, gender, and age on sensitivity were all small (each varying by < 1%), implying cross-demographic consistency in listener intuitions.
Acoustic correlates of infant-directedness across cultures
What enables such a diverse group of people to arrive at such similar conclusions about unfamiliar, foreign vocalizations, in languages that they do not understand? One possibility is that there exists a universal set of acoustic features driving listeners’ inferences concerning the intended targets of speech and song, which are reliably instantiated within and across societies, as suggested by functional accounts of infant-directed vocalization33,36–43,46.
To test this possibility, we studied 15 types of acoustic features in each recording (e.g., pitch, rhythm, timbre) via multiple variables (e.g., median, interquartile range); we processed these variables to reduce the influence of atypical observations (e.g., extreme values caused by loud wind, rain, and other background noises) and standardized them within voices to eliminate between-voice variability. This yielded a total of 99 variables (see Methods; a codebook is in Extended Data Table 2).
Following a preregistered exploratory-confirmatory design, we fitted a multi-level mixed-effects regression predicting each acoustic variable from the vocalization types, adjusting for voice and fieldsite as random effects, and using linear combinations to test for infant-directedness differences in song and speech separately. To reduce the risk of Type I error, we performed this analysis on a randomly selected half of the corpus (exploratory; weighted by fieldsite) and only report results that successfully replicated in the other half (confirmatory). We did not correct for multiple tests because the exploratory-confirmatory design restricts the tests to those with a directional prediction.
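The logic of the split can be sketched as follows. This is a simplified stand-in for the procedure: mean differences and a sign check replace the mixed-effects regressions and significance tests, and the record fields `fieldsite` and `infant_directed` are assumed for illustration:

```python
import random
from collections import defaultdict

def split_by_fieldsite(recordings, seed=1):
    """Split recordings into exploratory and confirmatory halves, stratified
    so that each fieldsite contributes equally to both halves. `recordings`
    is a list of dicts with a 'fieldsite' key (an assumed data layout)."""
    by_site = defaultdict(list)
    for rec in recordings:
        by_site[rec["fieldsite"]].append(rec)
    rng = random.Random(seed)
    exploratory, confirmatory = [], []
    for site_recs in by_site.values():
        rng.shuffle(site_recs)
        half = len(site_recs) // 2
        exploratory.extend(site_recs[:half])
        confirmatory.extend(site_recs[half:])
    return exploratory, confirmatory

def effect_direction(recs, feature):
    """Sign of the mean infant-directed minus adult-directed difference."""
    idv = [r[feature] for r in recs if r["infant_directed"]]
    adv = [r[feature] for r in recs if not r["infant_directed"]]
    diff = sum(idv) / len(idv) - sum(adv) / len(adv)
    return (diff > 0) - (diff < 0)

def replicated(recordings, feature):
    """A feature counts only if the exploratory-half effect recurs, in the
    same direction, in the confirmatory half (significance tests omitted)."""
    expl, conf = split_by_fieldsite(recordings)
    d1, d2 = effect_direction(expl, feature), effect_direction(conf, feature)
    return d1 != 0 and d1 == d2
```

Only features passing `replicated` would be reported, which is why no multiple-comparisons correction is applied: the confirmatory half tests a single directional prediction per feature.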
This procedure identified 16 acoustic features that distinguished infant-directed from adult-directed vocalizations, in song, speech, or both, in the context of vocalizing to a fussy infant (Fig. 3; statistics are in Extended Data Table 3). For example, across cultures and within voices, infant-directed speech had considerably higher pitch, greater pitch range, and more contrasting vowels than adult-directed speech. These results repeated consistently in each fieldsite: pitch, energy roll-off, and inharmonicity showed the same direction of difference in all 21 fieldsites; and other features, such as vowel contrasts and attack curve slopes, were consistent in the majority of them (see the doughnut plots in Fig. 3a). These patterns align with prior claims that pitch and vowel contrasts are robust features of infant-directed speech23,65, and substantiate them across many cultures.
The distinguishing features of infant-directed song were subtler, but nevertheless corroborate its purported soothing functions33,41,46: reduced loudness, intensity, and acoustic attack; reduced pitch range; and purer-sounding vocal qualities (reduced roughness and inharmonicity), which were mostly consistent across sites. The smaller effects in song, relative to speech, may result from the fact that while solo-voice speaking is fairly natural and representative of most adult-directed speech (i.e., people rarely speak at the same time), much of the world’s song occurs in social groups with multiple singers and accompanying instruments32,46,80. Asking participants to produce solo adult-directed song may have biased them toward choosing more soothing and intimate songs (e.g., ballads, love songs; see Extended Data Table 4), or toward less naturalistic renditions of songs that would normally be sung in less constrained social contexts. Further, the adult-directed songs were produced in the presence of an infant, which can in principle alter participants’ singing style35 (although this may comparably alter the adult-directed speech examples; see Methods for one test of this question). Thus the distinctiveness of infant-directed song (relative to adult-directed song) may be underestimated in these data.
Some acoustic correlates of infant-directedness showed very different trends across language and music. For example, whereas median pitch strongly differentiated infant-directed speech from adult-directed speech, it had no such effect in music; pitch variability had the opposite effect across language and music; and similar patterns were evident in first and second formants. Loudness-related features showed a similar pattern: intensity and attack slope were increased in infant-directed speech but decreased in infant-directed song, on average, relative to their adult-directed counterparts. That some basic acoustic features operate differently across infant-directed speech and song supports the possibility of differentiated functional roles18,33,34,45,46,79,81.
But some acoustic features were nevertheless common to both language and music; in particular, overall, infant-directedness was characterized by reduced roughness and inharmonicity, which may facilitate parent-infant signalling5,41 through better contrast with the sounds of screaming and crying17,82; and increased vowel contrasts, potentially to aid language acquisition36,37,39 or as a byproduct of socio-emotional signalling1,63.
Finally, we conducted an exploratory principal components analysis of the full 99 features (Fig. 3b; the analysis accounts for ∼40% of total variability in acoustic features). The results provide convergent evidence that the main forms of acoustic variation partition into orthogonal clusters distinguishing (PC1) speech from song overall; (PC2) infant-directedness in song; and (PC1 and PC3) infant-directedness in speech. Factor loadings are in Extended Data Table 5; they largely replicate the findings of the exploratory-confirmatory analyses. One further pattern highlighted by the principal components analysis is that infant-directedness makes speech more “songlike”, in terms of higher pitch and reduced roughness (PC3); but speech strongly differed from song overall in terms of the variability and rate of variability of pitch, intensity, and vowels, and infant-directedness further exaggerated these differences for speech (PC1).
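An exploratory PCA of this kind reduces to a singular value decomposition of the column-centred matrix of standardized acoustic variables. The sketch below is a generic implementation of that step, not the paper’s exact pipeline:

```python
import numpy as np

def pca(features, n_components=3):
    """Principal components via SVD of the column-centred feature matrix.

    `features` is an (n_recordings x n_features) array, e.g. the 99
    standardized acoustic variables per recording. Returns component
    scores, the proportion of variance explained per component, and the
    component loadings (one row per component)."""
    X = features - features.mean(axis=0)             # centre each variable
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # recordings in PC space
    explained = (s ** 2) / np.sum(s ** 2)            # variance ratio per PC
    return scores, explained[:n_components], Vt[:n_components]
```

Plotting the first component scores against the second and third, with points coloured by vocalization type, would reproduce the kind of clustering shown in Fig. 3b.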
Human intuitions of infant-directedness are modulated by vocalization acoustics
Last, we assessed whether these acoustic features alone are sufficient to replicate human performance in classifying infant-directedness. To do this, we trained two least absolute shrinkage and selection operator (LASSO) classifiers83 with fieldsite-wise leave-one-out cross-validation, separately for speech and song recordings. This approach32 gives a strong test of the cross-cultural consistency of acoustic correlates of infant-directedness, as the model’s classification accuracy is evaluated on held-out data from a fieldsite that it has not been trained on.
Both models performed significantly above the 50% chance level (Fig. 4a; speech: 77% correct, 95% CI [71%, 83%]; song: 65% correct, 95% CI [59%, 71%]). When accounting for response bias, model performance was highly similar to the aggregate guessing patterns of human listeners, as evaluated via a receiver operating characteristic analysis (Extended Data Fig. 6), for both speech (human AUC: 90.77, 95% CI [88.41, 93.14]; model AUC: 92.13, 95% CI [90.33, 93.93]) and song (human AUC: 75.52, 95% CI [71.7, 79.33]; model AUC: 77.37, 95% CI [74.14, 80.6]). Using this same bias-free metric, both models also performed similarly to humans at the level of each individual fieldsite (speech: r = 0.38, p = 0.04; song: r = 0.56, p = 0.004; see Fig. 4a and Extended Data Fig. 7). These results demonstrate that the measured acoustic correlates of infant-directedness operate reliably across the 21 societies studied, at least with sufficient consistency to replicate the overall level of human classification performance.
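The cross-validation scheme can be sketched as follows, with a trivial single-feature threshold classifier standing in for the LASSO and the `fieldsite`/`infant_directed`/`pitch` fields assumed for illustration; the essential property is that accuracy is always scored on a society that was absent from training:

```python
def leave_one_site_out(recordings, fit, predict):
    """Leave-one-fieldsite-out cross-validation: for each fieldsite, fit on
    all other sites and score accuracy on the held-out site, so the model
    is always evaluated on a society it has never seen. `fit`/`predict`
    stand in for the LASSO training and prediction steps."""
    sites = sorted({r["fieldsite"] for r in recordings})
    accuracy = {}
    for held_out in sites:
        train = [r for r in recordings if r["fieldsite"] != held_out]
        test = [r for r in recordings if r["fieldsite"] == held_out]
        model = fit(train)
        correct = [predict(model, r) == r["infant_directed"] for r in test]
        accuracy[held_out] = sum(correct) / len(correct)
    return accuracy

# Toy stand-in for the LASSO: call a recording infant-directed when its
# pitch exceeds the training set's grand mean (illustrative only).
def fit_threshold(train):
    return sum(r["pitch"] for r in train) / len(train)

def predict_threshold(threshold, rec):
    return rec["pitch"] > threshold
```

Averaging the per-site accuracies gives the overall cross-validated score; per-site values support comparisons like those in Fig. 4a.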
We then examined the precise relations between acoustic features and the experiment-wide proportions of infant-directedness ratings for each vocalization, in an approach similar to prior research73. The proportions are a more stringent target to predict than a binary classification (as in the first two LASSO models), in that they form a continuous measure of infant-directedness as heard by the naïve listeners. We trained two further LASSO models to predict the proportions, using the same cross-validation procedure. Both models explained considerable variation in human listeners’ intuitions (Fig. 4b; speech R2 = 0.56; song R2 = 0.21, ps < .0001), albeit more so in speech than in song.
We also measured the relations between the influence of each acoustic cue on human intuitions and the effect sizes of each variable in the corpus-wide acoustical analyses. If human inferences are attuned to some universal profile of acoustic correlates of infant-directedness, one might expect a close relationship between the strength of actual acoustic differences between vocalizations on a given feature and the relative influence of that feature on human intuitions. We compared the variable importance scores from the LASSO model predicting human inferences (visualized in the bar plots in Fig. 4a) to a measure of how acoustically salient each feature was (estimated as mean differences in the corpus; Fig. 3). We found a significant positive relationship for speech (r = 0.82, p < .001) but not for song (r = 0.32, p = 0.14), implying that human intuitions concerning infant-directed song were likely driven by more subjective features of the recordings, higher-level acoustic features that we did not measure, or both; this contrasts with intuitions concerning infant-directed speech, which were largely explicable from simple, objective acoustic features.
Discussion
We provide convergent evidence for cross-cultural regularities in the acoustic design of infant-directed speech and song. Naïve listeners reliably identified infant-directed vocalizations as infant-directed, despite the fact that the vocalizations were of largely unfamiliar cultural, geographic, and linguistic origin; acoustic analyses showed cross-culturally reliable acoustic differentiation of infant-directed and adult-directed vocalizations, in both speech and song; and these acoustic distinctions explained substantial variability in human intuitions concerning infant-directedness.
Thus, despite evident variability in language, music, and infant care practices worldwide, when people speak or sing to fussy infants, they modify the acoustic features of their vocalizations in similar and mutually intelligible ways across cultures. This implies that the forms of infant-directed vocalizations are shaped by their functions, in a fashion similar to the vocal signals of many non-human species.
By analyzing both speech and song recorded from the same voices, we discerned precise differences in the ways infant-directedness is instantiated in language and music. In response to the same prompt of addressing a “fussy infant”, infant-directedness in speech and song was instantiated with opposite trends in acoustic modification (relative to adult-directed speech and song): infant-directed speech was more intense and contrasting (e.g., more pitch variability, higher intensity) while infant-directed song was more subdued and soothing (e.g., less pitch variability, lower intensity). These acoustic dissociations suggest functional dissociations, with speech being more attention-grabbing, the better to distract from baby’s fussiness37,38; and song being more soothing, the better to lower baby’s arousal32,33,41–43,45,79. Speech and song are both capable of playful or soothing roles60, but each here tended toward one acoustic profile over the other, despite both types of vocalization being elicited in the same context: vocalizations used “when the baby is fussy”.
Many of the reported acoustic differences are consistent with the bioacoustics of vocal signalling in non-human animals1–15. For example, in both speech and song, infant-directedness was robustly associated with purer and less harsh vocal timbres, and greater formant-frequency dispersion. In non-human animals, these features have convergently evolved across taxa in the functional context of signalling friendliness or approachability in close contact calls1,3,63,84, in contrast to alarm calls or signals of aggression, which are associated with rough sounds that have less formant dispersal4,85–87. The use of these features in infant care may originate from signalling approachability to baby, but may have later acquired further functions more specific to the human context. For example, greater formant-frequency dispersion accentuates vowel contrasts, which could facilitate language acquisition36,63,88–90; and purer vocal timbre may facilitate communication by contrasting conspicuously with the acoustic context of infant cries5 (for readers unfamiliar with infants, this context is acoustically harsh17,82).
Higher pitch is also routinely a cue for animal vocal signalling of approachability and friendliness; accordingly, one of the largest and most robust results in our study was that infant-directedness raised the vocal pitch (f0) of speech to a songlike level. But infant-directedness had no effect on pitch within song. This curious asymmetry is consistent with the idea that pitched aspects of music may originate from elaborations to generic infant-directed vocalizations, where both use less harsh but more variable pitch patterns and more temporally variable and expansive vowel spaces to provide infants with ostensible “flashy” signals of attention and pro-social friendliness41,46,61,91,92. This does not mean that pitch alterations are absent from infant-directed song (indeed, in one study, mothers sang a song at higher pitch when producing a more playful rendition, and a lower pitch when producing a more soothing rendition44), but on average, both infant- and adult-directed song, along with infant-directed speech, tend to be higher in pitch than adult-directed speech.
We leave open at least two further questions. First, the results are suggestive of universality, because the corpus covers a swath of geographic locations (21 societies on 6 continents), languages (12 language families), and subsistence regimes (8 types) (see Table 1). But these do not constitute a representative sample of humans, so strong claims of universality are not justified; indeed, we found both cross-cultural consistency and variability (e.g., with the fieldsite in Wellington, New Zealand demonstrating main effects an order of magnitude larger than those at some other fieldsites). In addition to studying more representative samples of infant-directed vocalizations, future approaches may (i) use phylogenetic methods to examine whether people in societies that are distantly related nonetheless produce similar infant-directed vocalizations; (ii) test perceived infant-directedness in more diverse samples of listeners, to more accurately characterize cross-cultural variability in the perception of infant-directedness; and (iii) test listener intuitions among groups with reduced exposure to a given set of infant-directed vocalizations, such as very young infants or people from isolated, distantly related societies, as in related efforts27,67,93. Such research would benefit in particular from a focus on societies previously reported to have unusual vocalization practices, infant care practices, or both53,56–58; and would also clarify the extent to which convergent practices across cultures are due to cultural borrowing (in the many cases where societies are not fully isolated from the influence of global media).
Second, speech and song are used in multiple contexts with infants, of which “addressing a fussy infant” (the type of vocalization we elicited from participants) is just one18,34. One curious finding may bear on this question: naïve listeners displayed a bias toward “adult” guesses for speech and “baby” guesses for song, regardless of their actual targets. This suggests that listeners treated “adult” and “baby” as the default reference levels for speech and song, respectively, against which acoustic evidence was compared, a pattern consistent with theories that posit song as having a special connection to infant care in human psychology33,46.
Methods
Vocalization corpus
We built a corpus of 1,615 recordings of infant-directed song, infant-directed speech, adult-directed song, and adult-directed speech (all audio is available at https://doi.org/10.5281/zenodo.5525161). Participants (N = 411) living in 21 societies (Fig. 1a and Table 1) produced each of these vocalization types, with a median of 15 participants per society (range: 6-57). Of those participants for whom information was available, most were female (86%) and nearly all were parents or grandparents of the focal infant (95%).
Recordings were collected by principal investigators and/or staff at their field sites, all using the same data collection protocol. They translated instructions to the native language of the participants, following the standard research practices at each site. There was no procedure for screening out participants, but we encouraged our collaborators to collect data from parents rather than non-parents. Fieldsites were selected partly by convenience (i.e., via recruiting principal investigators at fieldsites with access to infants and caregivers) and partly to maximize cross-fieldsite diversity (see Table 1).
For infant-directed song and infant-directed speech, participants were asked to sing and speak to their infant as if they were fussy, where “fussy” could refer to anything from frowning or mild whimpering to a full tantrum. No fieldsite reported difficulty translating the English word “fussy”, suggesting that participants understood it. For adult-directed speech, participants spoke to the researcher about a topic of their choice (e.g., they described their daily routine). For adult-directed song, participants sang a song that was not intended for infants; they also stated what that song was intended for (e.g., “a celebration song”). The recording collection protocol is posted at https://github.com/themusiclab/infant-speech-song.
For most participants (90%) an infant was physically present during the recording (the infants were 48% female; age in months: M = 11.4; SD = 7.61; range 0.5-48). When an infant was not present, participants were asked to imagine that they were vocalizing to their own infant or grandchild, and simulated their infant-directed vocalizations. Prior research has shown that simulated infant-directed vocalizations are qualitatively similar to authentic ones, albeit less exaggerated, for both speech94 and song35. Consistent with this, a model of the naïve listener results adjusting for fieldsite showed a small decrease in “baby” guesses when an infant was not present (ID song: 7.1%, ID speech: 8.4%, AD song: 6.5%, AD speech: 4.3%, ps < .0001), and this effect was stronger for vocalizations that were infant-directed than adult-directed (χ2(1) = 5.67, p = 0.02). Both the naïve listener results and acoustic analyses were robust to whether these simulated infant-directed vocalizations were included or excluded, however.
In all cases, participants were free to determine the content of their vocalizations. This was intentional: imposing a specific content category on their vocalizations (e.g., “sing a lullaby”) would likely alter the acoustic features of their vocalizations, which are known to be influenced by experimental contexts95. Some participants produced adult-directed songs that shared features with the intended soothing nature of the infant-directed songs; data on the intended behavioral context of each adult-directed song are in Extended Data Table 4.
All recordings were made with Zoom H2n digital field recorders, using foam windscreens (where available). To ensure that participants were audible along with researchers (who stated information about the participant and environment before and after the vocalizations), recordings were made with a 360° dual x-y microphone pattern. This produced two uncompressed stereo audio files (WAV) per participant at 44.1 kHz; we only analyzed audio from the two-channel file on which the participant was loudest.
The principal investigator at each fieldsite also provided standardized background data on the behavior and cultural practices of the society (e.g., whether there was access to mobile-phones/TV/radio, and how commonly people used ID speech or song in their daily lives). Most items were based on variables included in the D-PLACE cross-cultural corpus96. Complete data are posted on the project GitHub repository.
The 21 societies varied widely in their characteristics, from cities with millions of residents (Beijing) to small-scale hunter-gatherer groups of as few as 35 people (Hadza). All of the small-scale societies studied had limited access to TV, radio, and the internet, mitigating the influence of exposure to the music and/or infant care practices of other societies. Four of the small-scale societies (Nyangatom, Toposa, Sápara/Achuar, and Mbendjele) were completely without access to these communication technologies.
The societies also varied in the prevalence of infant-directed speech and song in day-to-day life. The only site reported to lack infant-directed song in contemporary practice was the Quechuan/Aymaran site, although it was also noted that people from this site know infant-directed songs in Spanish and use other vocalizations to calm infants. Conversely, the Mbendjele BaYaka were noted to use infant-directed song, but rarely used infant-directed speech. In most sites, the frequency of infant-directed song and speech varied. For example, among the Tsimane, song is reportedly infrequent in the context of infant care; when it appears, however, it is specifically used to soothe and encourage infants to sleep.
Naïve listener experiment
We analyzed all data available at the time of writing this manuscript from the “Who’s Listening?” game at https://themusiclab.org/quizzes/ids, a continuously running jsPsych97 experiment distributed via Pushkin98. A total of 63,481 participants began the experiment, the first in January 2019 and the last in October 2021.
We played participants vocalizations from a subset of the corpus, excluding those that were less than 10 seconds in duration (n = 113) and those with confounding sounds that were not produced by the target voice in the first 5 seconds of the recording (e.g., a crying baby or laughing adult in the background; n = 364), as determined by two independent annotators who remained unaware of vocalization type and fieldsite (with disagreements resolved by discussion). We also excluded trials where the native language of the listener matched the language of the vocalization (N = 85,968 of 709,628 trials, or 12.1%), as this could enable listeners to infer whether a vocalization was infant-directed independently of the vocalization’s acoustic characteristics. Robustness checks confirmed that the data trimming decisions did not substantially alter the results (Extended Data Fig. 3). Irrespective of the recordings each participant was assigned, we also excluded participants who reported having previously participated in the same experiment (n = 3,514); participants who reported being younger than 12 years old (n = 1,340); and those who reported having a hearing impairment (n = 1,201).
This yielded a sample of 45,745 participants (gender: 20,664 female, 24,126 male, 922 other, 33 did not disclose; age: median 22 years, interquartile range 18-29). Participants self-reported living in 184 different countries (Fig. 1b) and speaking 164 different native languages; roughly half the participants were native English speakers from the United States.
Participants listened to at least 1 and at most 16 vocalizations drawn from the subset of the corpus (as they were free to leave the experiment before completing it) for a total of 388,985 ratings (Fig. 1b; infant-directed song: n = 109,994; infant-directed speech: n = 77,317; adult-directed song: n = 104,023; adult-directed speech: n = 97,651). The vocalizations were selected with weighted randomization, such that a set of 16 trials included 4 vocalizations in English and 12 in other languages; roughly half the corpus was English-language vocalizations, so this method ensured that participants heard a substantial number of vocalizations in other languages. This yielded over 46 ratings per vocalization (median = 447; interquartile range 151-496.75) and thousands of ratings for each society (median = 18,631; interquartile range: 12,100-21,393).
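The weighted randomization can be sketched as follows. This is a minimal illustration with hypothetical recording IDs and a made-up `draw_trial_set` helper, not the experiment's actual code (which is available in the project repository):

```python
import random

def draw_trial_set(english_ids, other_ids, n_english=4, n_other=12, seed=None):
    """Draw one 16-trial set: a fixed quota of English-language recordings
    plus recordings in other languages, shuffled into a random order."""
    rng = random.Random(seed)
    trials = rng.sample(english_ids, n_english) + rng.sample(other_ids, n_other)
    rng.shuffle(trials)  # randomize presentation order across languages
    return trials
```

Fixing per-language quotas, rather than sampling the corpus uniformly, is what guarantees that a corpus that is roughly half English still yields mostly non-English trials for each participant.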
We asked participants to classify each vocalization as either directed toward a baby or an adult (Extended Data Fig. 1), as quickly as possible, either by pressing a key corresponding to a drawing of an infant or adult face (when the participant used a desktop computer) or by tapping one of the faces (when the participant used a tablet or smartphone). The locations of the faces (left vs. right on a desktop; top vs. bottom on a tablet or smartphone) were randomized participant-wise. As soon as they made a choice, playback stopped. After each trial, we told participants whether or not they had answered correctly and how long, in seconds, they took to respond; at the end of the experiment, we gave participants a total score and percentile rank (relative to other participants).
In revising this manuscript, we discovered that a small subset of the corpus had been erroneously excluded from the main experiment. In most cases, these were recordings that had been edited too conservatively, leaving them too short to include in the experiment (but that could reasonably be edited to include longer sections of audio); in other cases, the original excerpting included confounding background noises that, with additional editing, were avoidable. To ensure maximal coverage of the fieldsites studied here, we re-excerpted the audio of 103 examples and collected supplemental naïve-listener data on these recordings via a Prolific experiment (N = 97; 54 male, 42 female, 1 other; mean age = 29.7 years). The Prolific experiment was identical to the citizen-science experiment, except that each participant was paid (at US$15/hr) rather than volunteering, and each participant rated 188 recordings instead of up to 16. In addition to the erroneously excluded recordings, we included in the Prolific experiment 85 additional recordings randomly selected from those in the citizen-science experiment, ensuring that each Prolific participant heard an exactly balanced set of vocalization types. The two cohorts' ratings of the recordings common to both experiments were highly correlated (r = 0.95, p < .0001), demonstrating that the cohorts had similar intuitions concerning infant-directedness in speech and song. As such, in the main text we report all ratings together, without distinguishing between the cohorts.
Acoustic feature extraction
We manually extracted the longest continuous, uninterrupted section of audio from each of the four recordings per participant (i.e., isolating the participant's vocalization from interruptions by other speakers, the infant, and so on), using Adobe Audition. We then used the silence-detection tool in Praat99, with minimum sounding intervals of 0.1 seconds and minimum silent intervals of 0.3 seconds, to remove all portions of the audio in which the participant was not vocalizing (i.e., the silences between phrases). The remaining sounding portions were concatenated in Python, producing denoised recordings, which we then checked manually to ensure minimal loss of content.
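As an illustration, the silence-removal step can be approximated with a simple frame-energy threshold. This is a rough sketch of the logic, not Praat's actual algorithm, and the threshold value is an assumption:

```python
import numpy as np

def _fill_short_runs(mask, target, min_frames):
    """Flip runs of `target` shorter than min_frames to the opposite value."""
    out, i, n = mask.copy(), 0, len(mask)
    while i < n:
        if mask[i] == target:
            j = i
            while j < n and mask[j] == target:
                j += 1
            if j - i < min_frames:
                out[i:j] = not target
            i = j
        else:
            i += 1
    return out

def remove_silences(x, sr, frame=0.025, min_sound=0.1, min_silent=0.3,
                    thresh_db=-25.0):
    """Rough analogue of the silence-detection step: frames whose RMS falls
    more than thresh_db below the loudest frame count as silent; silent
    stretches of at least min_silent seconds are removed, and sounding blips
    shorter than min_sound seconds are treated as silence."""
    hop = int(frame * sr)
    n = len(x) // hop
    rms = np.array([np.sqrt(np.mean(x[k*hop:(k+1)*hop] ** 2)) for k in range(n)])
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    sounding = db > thresh_db
    sounding = _fill_short_runs(sounding, True, int(min_sound / frame))
    sounding = _fill_short_runs(sounding, False, int(min_silent / frame))
    kept = [x[k*hop:(k+1)*hop] for k in range(n) if sounding[k]]
    return np.concatenate(kept) if kept else x[:0]
```

The two run-length passes mirror the role of Praat's minimum sounding and minimum silent interval parameters: brief pauses between words survive, while longer between-phrase silences are cut.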
We extracted and subsequently analyzed acoustic features using Praat99, MIRtoolbox100, temporal modulation spectra computed via discrete Fourier transforms (a measure of rhythmic variability)101, and normalized pairwise variability indices102. These features comprised measurements of pitch (e.g., F0, the fundamental frequency), timbre (e.g., roughness), and rhythm (e.g., tempo), each summarized over time, producing 99 variables in total. We standardized feature values within voices, eliminating between-voice variability. In the main acoustic analyses (Fig. 3a), we restricted the variable set to 26 summary statistics of medians and interquartile ranges, as these correlated highly with other summary statistics (e.g., maximum, range) but were less sensitive to extreme observations. The principal components analysis (Fig. 3b) used the full set of 99 variables.
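The within-voice standardization amounts to z-scoring each feature separately within each speaker, so that analyses reflect within-speaker contrasts (e.g., infant- vs. adult-directed) rather than between-speaker differences such as overall pitch. A minimal sketch (`standardize_within_voice` is a hypothetical helper, not from the project code):

```python
import numpy as np

def standardize_within_voice(values, voices):
    """Z-score one feature within each voice, removing between-voice
    variability (sample standard deviation, ddof=1)."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for v in set(voices):
        idx = [i for i, w in enumerate(voices) if w == v]
        x = values[idx]
        out[idx] = (x - x.mean()) / x.std(ddof=1)
    return out
```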
Praat
We extracted intensity, pitch, and first- and second-formant values from the denoised recordings every 0.03125 seconds. For male participants, the pitch floor was set at 75 Hz, the pitch ceiling at 300 Hz, and the maximum formant at 5000 Hz; for female participants, these values were 100 Hz, 600 Hz, and 5500 Hz, respectively. From these data, several summary values were calculated per recording: mean and maximum first and second formants, mean pitch, and minimum intensity. In addition to these summary statistics, we measured rates of change in intensity and pitch over time. For vowel measures, the first and second formants were used to calculate both the average vowel space used and the vowel change rate (measured as change in Euclidean formant space over time).
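The two vowel measures can be illustrated with simple proxies: treating the region spanned in F1-F2 space as a bounding box (a cruder proxy than, say, a convex hull) and the change rate as frame-to-frame Euclidean movement per second. Both the function name and the bounding-box simplification are our assumptions, not the paper's exact computation:

```python
import numpy as np

def vowel_measures(f1, f2, dt=0.03125):
    """Toy proxies for vowel-space measures from F1/F2 tracks sampled every
    dt seconds: the area spanned in formant space, and the average rate of
    movement through that space (Hz of Euclidean distance per second)."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    # Bounding-box area in F1-F2 space as a simple vowel-space proxy.
    area = (f1.max() - f1.min()) * (f2.max() - f2.min())
    # Frame-to-frame Euclidean steps, converted to a per-second rate.
    steps = np.hypot(np.diff(f1), np.diff(f2))
    change_rate = steps.mean() / dt
    return area, change_rate
```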
MIRtoolbox
All MIRtoolbox (v. 1.7.2) features were extracted with default parameters100. Because mirattackslope returns a list of all attack slopes detected, final analyses were conducted on summary features (e.g., mean, median). The same applies to mirroughness, which returns time-series data of roughness measures in 50-ms windows; we RMS-normalized the mean of mirroughness following ref. 103. MIRtoolbox features were computed on the denoised recordings, with the exception of mirtempo and mirpulseclarity, for which removing the silences between vocalizations would have altered the tempo.
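The roughness summary can be sketched as follows. The exact normalization in ref. 103 may differ; dividing the mean roughness by the recording's RMS amplitude is an assumption of this sketch, intended only to show why a level-dependent feature needs normalizing:

```python
import numpy as np

def summarize_roughness(roughness_series, audio):
    """Summarize a windowed roughness time series; normalize its mean by the
    recording's RMS amplitude so louder recordings are not scored as rougher
    merely because of level (assumed normalization)."""
    r = np.asarray(roughness_series, float)
    rms = np.sqrt(np.mean(np.asarray(audio, float) ** 2))
    return {"mean_rms_norm": r.mean() / rms, "median": float(np.median(r))}
```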
Rhythmic variability
For temporal modulation spectra we followed Ding's method104, which combines discrete Fourier transforms applied to contiguous six-second excerpts. To analyze the entirety of each recording, we padded each recording with silence so that its duration was an exact multiple of six seconds. The location of the peak (in Hz) and the variance of each temporal modulation spectrum were extracted from its RMS values.
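The analysis can be sketched as follows, operating on an amplitude envelope rather than raw audio; details of the published method104, such as how the envelope is extracted and smoothed, are omitted here:

```python
import numpy as np

def modulation_spectrum(envelope, sr, win_s=6.0):
    """Sketch of the temporal modulation analysis: pad the amplitude envelope
    with silence to a multiple of win_s seconds, take the DFT magnitude of
    each contiguous window, and combine windows by root-mean-square."""
    win = int(win_s * sr)
    pad = (-len(envelope)) % win                  # zero-pad to a multiple of win
    env = np.concatenate([np.asarray(envelope, float), np.zeros(pad)])
    chunks = env.reshape(-1, win)                 # contiguous six-second excerpts
    mags = np.abs(np.fft.rfft(chunks, axis=1))
    spec = np.sqrt(np.mean(mags ** 2, axis=0))    # RMS across excerpts
    freqs = np.fft.rfftfreq(win, d=1.0 / sr)
    peak_hz = freqs[1:][np.argmax(spec[1:])]      # ignore the DC bin
    return freqs, spec, peak_hz
```

A useful sanity check: an envelope modulated at a single rate should yield a spectral peak at exactly that rate.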
Normalized pairwise variability index
The nPVI captures the temporal variability of sequences of discrete events, which makes it especially useful for comparing speech and music101. We used an automated syllable- and phrase-detection algorithm to extract events102. We computed the nPVI in two ways: by averaging the nPVI across the phrases within a recording, and by treating the entire recording as a single phrase. Because intervening silences would influence both the temporal modulation and nPVI measures, these analyses used the recordings from before denoising.
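The nPVI itself is a short formula: the mean absolute difference between successive event durations, normalized by their local mean and scaled by 100. A direct implementation:

```python
def npvi(durations):
    """Normalized pairwise variability index over event durations (e.g.,
    syllable or note lengths). Perfectly isochronous sequences score 0."""
    pairs = zip(durations[:-1], durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)
```

Averaging this value over the phrases of a recording, versus calling it once on all events in the recording, corresponds to the two ways the measure was computed here.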
Outlier preprocessing
Because automated acoustic analyses are highly sensitive to extreme values (e.g., impossible values caused by non-vocal sounds, such as loud wind), we Winsorized all variables: values below the 5th percentile or above the 95th percentile were recoded as the values at those percentile boundaries. These data were used for all acoustic analyses. This decision had no impact on the interpretation of the results and is preferable to trimming extreme values105; pilot analyses using an alternative method, imputing extreme values with the mean observation for each feature within each fieldsite, yielded comparable results.
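The Winsorizing step is straightforward to express; a minimal sketch:

```python
import numpy as np

def winsorize(x, lower_pct=5, upper_pct=95):
    """Clamp values below the lower or above the upper percentile to those
    percentile boundaries, limiting the influence of extreme observations
    without discarding any data points."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)
```

Unlike trimming, every recording retains a value for every feature; only the magnitudes of the most extreme observations are reined in.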
End notes
Data, code, and materials availability
A fully reproducible manuscript; data, analysis code, and visualizations; other materials; and code for the naïve listener experiment are available at https://github.com/themusiclab/infant-speech-song. The audio corpus is available at https://doi.org/10.5281/zenodo.5525161. The preregistration for the auditory analyses is at https://osf.io/5r72u. Readers may participate in the naïve listener experiment by visiting https://themusiclab.org/quizzes/ids.
Author contributions
S.A.M. and M.M.K. conceived of the research, provided funding, and coordinated the recruitment of collaborators and creation of the corpus.
S.A.M. and M.M.K. designed the protocol for collecting vocalization recordings with input from D.A., who piloted it in the field.
L.G., A.G., G.J., C.T.R., M.B.N., A.M., L.K.C., S.E.T., J. Song, M.K., A.S., T.A.V., Q.D.A., J.A., P.M., A.S., C.D.P., G.D.S., S.K., M.S., S.A.C., J.Q.P., C.S., J. Stieglitz, C.M., R.R.S., and B.M.W collected the field recordings.
S.A.M., C.M.B., and J. Simson designed and implemented the online experiment.
C.J.M. and H.L-R. processed all recordings and designed the acoustic feature extraction with S.A.M. and M.M.K.; C.M.B. provided associated research assistance.
C.M. designed the fieldsite questionnaire with assistance from M.B. and C.J.M., who collected the data from the principal investigators.
C.B.H. and S.A.M. led analyses, with additional contributions from C.J.M., M.B., D.K., and M.M.K.
C.B.H. and S.A.M. designed the figures.
C.B.H. wrote computer code, with contributions from S.A.M., C.J.M., and M.B.
C.J.M., H.L-R., M.M.K., and S.A.M. wrote the initial manuscript.
C.B.H. and S.A.M. wrote the revision, with contributions from C.J.M. and M.B., and all authors approved it.
Ethics
Informed consent was obtained from all participants. Ethics approval for the naïve listener experiment was provided by the Committee on the Use of Human Subjects, Harvard University's Institutional Review Board (protocol #IRB17-1206). Ethics approval for the collection of recordings and their use in research was decentralized; each collaborating researcher arranged ethics approval with their local institution.
Additional information
The authors declare no competing interests.
Supplementary information is available for this paper.
Correspondence and requests for materials should be addressed to S.A.M.
Supplementary Information
Acknowledgments
This research was supported by the Harvard University Department of Psychology (M.M.K. and S.A.M.); the Harvard College Research Program (H.L-R.); the Harvard Data Science Initiative (S.A.M.); the National Institutes of Health Director’s Early Independence Award DP5OD024566 (S.A.M. and C.B.H.); the Academy of Finland Grant 298513 (J.A.); the Royal Society of New Zealand Te Apārangi Rutherford Discovery Fellowship RDF-UOA1101 (Q.D.A., T.A.V.); the Social Sciences and Humanities Research Council of Canada (L.K.C.); the Polish Ministry of Science and Higher Education grant N43/DBS/000068 (G.J.); the Fogarty International Center (P.M., A.S., C.D.P.); the National Heart, Lung, and Blood Institute, and the National Institute of Neurological Disorders and Stroke Award D43 TW010540 (P.M., A.S.); the National Institute of Allergy and Infectious Diseases Award R15-AI128714-01 (P.M.); the Max Planck Institute for Evolutionary Anthropology (C.T.R., C.M.); a British Academy Research Fellowship and Grant SRG-171409 (G.D.S.); the Institute for Advanced Study in Toulouse, under an Agence nationale de la recherche grant, Investissements d’Avenir ANR-17-EURE-0010 (L.G., J. Stieglitz); the Fondation Pierre Mercier pour la Science (C.S.); and the Natural Sciences and Engineering Research Council of Canada (S.E.T.). We thank the participants and their families for providing recordings; L. Sugiyama, for supporting pilot data collection; J. Du, E. Pillsworth, P. Wiessner, and J. Ziker, who collected or attempted to collect additional recordings; S. Atwood, A. Bergson, Z. Jurewicz, D. Li, L. Lopez, E. Radytė, and S. Ccari Cutipa for research assistance; and J. Kominsky, L. Powell, and L. Yurdum for feedback on the manuscript.
Footnotes
* Corrections were made to author spellings and affiliations.
↵* We note one important deviation from the preregistration: we originally planned post-hoc linear combinations to test hypothesized differences between (1) infant-directed and adult-directed vocalizations overall; (2) infant-directed song and adult-directed song; and (3) infant-directed song and infant-directed speech. We retain the second comparison in the main text, but no longer focus on (1) or (3) as the analysis approach is confounded by the fact that acoustic differences between speech and song overall far outstrip the acoustic correlates of infant-directedness. Instead, we adopted the simpler and more informative approach of post-hoc comparisons that are only within speech and within song. We also retained the exploratory-confirmatory design, as it mitigates the potential for inflated Type I errors. For transparency, we still report the preregistered post-hoc tests in Extended Data Fig. 5, but suggest that these comparisons be interpreted with caution.
References