Abstract
Tonal languages differ from other languages in their use of pitch (tones) to distinguish words. Some research suggests that the linguistic pitch expertise of tonal language speakers may generalize to improved discrimination of some aspects of musical pitch: tonal language speakers may therefore have music perception advantages over speakers of other languages. The evidence is mixed, however, because prior studies have tested small numbers of participants drawn from only a few tonal languages and countries, making it challenging to disentangle the effects of linguistic experience from variability in music training, cultural differences, and so on. Here, using a preregistered exploratory-confirmatory design, we report an assessment of music perception skill in native speakers of 40 languages, including tonal (e.g., Mandarin, Vietnamese), pitch-accented (e.g., Japanese, Croatian), and non-tonal (e.g., Spanish, Hungarian) languages. Whether or not participants had taken music lessons, native speakers of tonal languages (confirmatory n = 20,102) had an improved ability to discriminate musical melodies. But this improvement came with a trade-off: relative to speakers of pitch-accented (confirmatory n = 9,694) or non-tonal languages (confirmatory n = 242,096), tonal speakers were worse at discriminating fine-scale pitch tuning and worse at processing the musical beat. These results, which held across 5 tonal languages and were robust to geographic and demographic variation, demonstrate that linguistic experience shapes music perception ability, with implications for relations between music, language, and culture in the human mind.
1 Introduction
From infancy and early childhood, we are surrounded by people speaking (Bergelson et al., 2019; Eibl-Eibesfeldt, 1979; Konner, 2010) and singing (Bonneville-Roussy et al., 2013; Mehr et al., 2020, 2019; Mehr, 2014; Mendoza & Fausey, 2021; Yan et al., 2021). This immersion continues throughout the lifespan and is reinforced through our own language and music production.
Human perception readily adapts to these soundscapes: early speech experiences tune our hearing to the speech contrasts of our native language(s) (Kuhl, 2004; Polka & Werker, 1994; Werker & Tees, 1984), and musical experiences during the same time period are thought to have similar “perceptual narrowing” effects, biasing listeners’ interpretations of musical rhythm and pitch based on their own musical cultures (Hannon & Trehub, 2005; Lynch et al., 1990). These effects may cross domains. While music training has minimal causal effects on high-level cognitive skills (Mehr, 2013; Sala & Gobet, 2020), it may sharpen lower-level aspects of speech processing (Patel, 2011; Wong et al., 2007) and auditory perception (Kraus & Chandrasekaran, 2010). In the opposite direction, enhanced experience with the kind of linguistic pitch used in tonal languages has been argued to shape pitch processing in music (Bidelman et al., 2013; Bradley, 2016; Pfordresher & Brown, 2009).
Here, we study the latter possibility, examining the effects of language experience on music processing, with a focus on pitch. Languages can be classified into three distinct categories based on their use of pitch: tonal, non-tonal, and pitch-accented. While all spoken languages convey information via pitch, tonal languages, which represent over half the world’s languages [including many Southeast Asian and African languages; Yip (2002)], use pitch in a special fashion. In tonal languages, pitch is often used lexically: speaking the same syllable with a different pitch level or shape alters meaning at the word level (Pike, 1948; van der Hulst, 2011). A canonical example is the Mandarin syllable ma, which has different meanings depending on its tonal contour (i.e., level, rising, falling-rising, or falling). This property requires pitch sensitivity in both speakers and listeners, lest one scold (mà) one’s mother (mā) instead of one’s horse (mǎ).
The lexical use of pitch in tonal languages is distinct from how pitch is otherwise used in speech. For example, many languages use pitch to convey affect (Cowen et al., 2019); to cue non-lexical meaning [e.g., helping to differentiate between questions and statements; Patel (2008); Tong et al. (2005)]; to emphasize information (Breen et al., 2010); and to cue sentence structure with metrical stress patterns (Wagner & McAuliffe, 2019), supporting comprehension (Hilton & Goldwater, 2021); pitch is also a prominent feature of infant-directed speech (Hilton et al., 2021). These many uses of pitch are typical of speech in non-tonal languages [e.g., many Indo-European, South Asian, or Australian languages; Maddieson (2013)], but in these languages, pitch is never used lexically to denote word meanings. A third group, pitch-accented languages, is an intermediate category with limited or mixed use of lexical pitch [such as Croatian; van der Hulst (2011)].1
The special role of pitch in tonal languages has motivated the hypothesis that speaking a tonal language sharpens pitch perception in a domain-general way. Indeed, compared to speakers of non-tonal languages, native speakers of tonal languages not only better discriminate the tones of their native language and those of other tonal languages they do not speak (Li & Gao, 2018; Peng et al., 2010), but also show stronger categorical perception for non-speech pitch patterns generally (Bent et al., 2006; Bidelman & Lee, 2015). Tonal language speakers also have distinct neural responses to pitch in brain areas associated with early auditory processing (Bidelman et al., 2011a; Bidelman et al., 2011b).
Might domain-general auditory processing advantages transfer to enhanced pitch processing in music? Many studies have addressed this question by comparing native speakers of tonal and non-tonal languages on a variety of musical pitch perception tasks. The results have been mixed (Table 1). Some studies report that tonal language speakers excel at discriminating pitch patterns in the form of melodies, intervals, and contours (Alexander et al., 2008; Bidelman et al., 2013; Bradley, 2016; Chen et al., 2016; Pfordresher & Brown, 2009; Wong et al., 2012); or at discerning fine-grained pitch differences either in isolation or in the context of detuned melodies (Bidelman et al., 2013; Chen et al., 2016; Giuliano et al., 2011; Hutka et al., 2015). But other studies fail to replicate these patterns, both for melodic discrimination (Giuliano et al., 2011; Stevens et al., 2013; Tong et al., 2018) and fine-scale pitch discrimination (Bent et al., 2006; Bidelman et al., 2011a; Pfordresher & Brown, 2009; Stevens et al., 2013; Tong et al., 2018). Some studies even find that tonal language speakers have more trouble distinguishing musical pitch contours, suggesting that lexical tone experience could interfere with pitch perception in some contexts (Bent et al., 2006; Chang et al., 2016; Peretz et al., 2011; Zheng & Samuel, 2018).
Prior studies of the effects of language experience on music processing report mixed results. Key: tonal-language advantages indicated by +; disadvantages by -; null results by 0.
What explains these conflicting results? First, and most importantly, prior studies sample languages narrowly (i.e., typically comparing Mandarin or Cantonese speakers from mainland China to English speakers from the United States), with only a few exceptions (i.e., Yoruba speakers in Bradley, 2016; Thai speakers in Stevens et al., 2013; Dutch speakers in Chen et al., 2016). This, combined with generally small samples (median prior sample size per language group: n = 25; interquartile range: 16-42; minimum: n = 11; see Table 1), makes it difficult to rule out confounding effects from factors such as culture and genetics (Deutsch et al., 2006; Hove et al., 2010) or linguistic variability (Evans & Levinson, 2009). Generalizing from particular speakers of specific languages to entire groups of languages (i.e., from studies of Mandarin speakers to claims concerning general effects of tonal language experience) requires larger samples of both languages and participants (Hilton & Mehr, In Press; Yarkoni, 2020).
Second, participants’ musical training experience has rarely been accounted for in prior work. At best, this contributes additional unsystematic variation within a sample, reducing statistical power; but access to musical training may also vary systematically between countries (e.g., Campbell & Wiggins, 2012), potentially leading to biased estimates of music perception abilities. Such biases could erroneously be attributed to differences in tonal language experience, leading to contradictory findings across studies.
Third, the operationalization of “musical pitch perception” and its measurement in prior studies have varied widely, ranging from holistic aspects of melody apprehension to the discrimination of fine-scale pitch differences. This has made it difficult to reach consensus on a general claim concerning the relation between music perception ability and language experience.
These issues can be addressed by studying many native speakers of many languages, with and without music training experience, all of whom complete the same assessments of music processing ability. Here, we report a study of the ability to discriminate melodies, detect mistuned singing, and detect misaligned beats, in 468,581 people. Participants self-reported their native language, location, demographics, and degree of musical training, enabling language-wise and language-type-wise analyses of each of these musical abilities and their relations to linguistic and musical experience.
2 Methods
2.1 Participants
Participants were visitors to the citizen-science website https://themusiclab.org who completed a set of three music perception tests presented as an online game (Test Your Musical IQ). We did not recruit participants directly; rather, they visited the site after hearing about it organically (e.g., via Reddit posts, YouTube clips, Twitch streams). All participants gave informed consent under an ethics protocol approved by the Committee on the Use of Human Subjects, Harvard University’s Institutional Review Board.
Of the 1,993,012 participants who started the experiment between Nov 8th, 2019 and Aug 21st, 2021, 716,855 completed the experiment. Those who participated in the first 5 months of data collection (n = 297,600, post-exclusion n = 196,689) were used in an exploratory dataset, analyses of which are reported in Text S1.1; these analyses formed the basis for a preregistration of confirmatory analyses, which is available at https://osf.io/xurdb.2
The remaining 404,306 participants formed the confirmatory dataset. Based on preregistered exclusion criteria, we removed 42,571 participants with missing and/or internally conflicting demographics data, resulting in 361,735 participants. Of these remaining participants, we then removed those who (a) had participated in the experiment on another occasion, to avoid any effects of learning (n = 22,297); (b) reported a hearing impairment (n = 47,640); (c) reported their age as below 8 years or above 90 years (n = 491); (d) reported an age of music lesson onset that was either below 2 years or above 90 years (n = 496); (e) reported a music lesson onset age that was greater than their self-reported age (n = 106); (f) reported that they were participating in a noisy environment and were not wearing headphones (n = 824; see Text S1.2 for analysis of a manipulation check to test whether participants were actually wearing headphones); or (g) met more than one of the above criteria (n = 5,808). Finally, given our planned analyses at the language level, we also excluded participants whose native language was not among the 40 most-represented languages in the exploratory dataset, to ensure that statistically meaningful comparisons were possible (n = 12,181 participants, spanning 206 languages with a median of 9 participants per excluded language).
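For concreteness, the sketch below illustrates this exclusion pipeline in R. It is not our actual analysis code (which is available at the repository listed under Data, code, and materials availability); all object and column names (raw_data, repeat_participant, lesson_onset, top_40_languages, and so on) are assumptions made for illustration.

```r
# Illustrative sketch of the preregistered exclusions; every name here is an
# assumption, not the variable naming used in the published analysis code.
library(dplyr)

confirmatory <- raw_data %>%
  filter(
    !repeat_participant,                          # (a) no prior sessions
    !hearing_impairment,                          # (b) no reported hearing impairment
    age >= 8, age <= 90,                          # (c) plausible age range
    is.na(lesson_onset) |
      (lesson_onset >= 2 & lesson_onset <= 90 &
       lesson_onset <= age),                      # (d, e) plausible music-lesson onset
    !(noisy_environment & !wearing_headphones),   # (f) drop noisy, speaker-only sessions
    native_language %in% top_40_languages         # keep the 40 most-represented languages
  )
```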
The native language of each participant was classified as tonal, non-tonal, or pitch-accented based on the Lyon-Albuquerque Phonological Systems Database (Maddieson et al., 2014) and the World Atlas of Language Structures (Maddieson, 2013). Languages not present in either database were classified according to information from the Phonetics Information Base and Lexicon Database (Moran & McCloy, 2019) or other sources from the linguistics literature. No discrepancies were found across these sources; Table S1 contains information about the language classifications.
The post-exclusion confirmatory dataset, which we analyze for the remainder of this paper, thus contains data from 271,892 participants, representing 40 native languages (5 tonal, 6 pitch-accented, and 29 non-tonal; see Figure 1 and Table S1 for specific languages and associated sample sizes). Demographic information is in Table S2.
Sample sizes, grouped by language and language type. The font of each language’s name is scaled proportionally to that language’s sample size in the confirmatory dataset. Horizontal positions are jittered to improve readability.
2.2 Stimuli
Participants completed three music perception tests measuring ability in melodic discrimination (Harrison et al., 2017), mistuning perception (Larrouy-Maestri et al., 2019), and beat alignment (Harrison & Müllensiefen, 2018). The melodic discrimination test assesses the ability to detect differences between melodic patterns: participants listened to three transpositions of the same melodic pattern and were asked to choose the version in which one pitch interval was altered (i.e., an oddball task). The mistuning perception test assesses the ability to identify vocal mistuning: participants listened to two versions of a short musical excerpt, one of which had a vocal track that was detuned from the background music, and were asked to identify the out-of-tune version. The beat alignment test assesses the ability to detect correct synchronization between a click track and music: participants listened to two versions of the same musical excerpt, both accompanied by a click track; one of the click tracks was misaligned by a constant proportion, and participants were asked which example was correctly aligned.
As in the original tests cited above, each of the subtests was presented adaptively using psychTestR (Harrison, 2020). To minimize the duration of the experiment, we fixed the length of each subtest at 15 trials, the minimum number of trials with acceptably low mean standard errors, according to the original test designs.
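Item selection was handled by the psychTestR-based test packages cited above; purely as an illustration of the general logic of a fixed-length adaptive test, the sketch below chooses each successive item to match the current ability estimate. It is a deliberate simplification (the actual tests use item response theory models for selection and scoring), and every name in it is hypothetical.

```r
# Simplified illustration of fixed-length adaptive testing; NOT the psychTestR
# implementation, which uses item response theory for item selection and scoring.
run_adaptive_test <- function(item_bank, present_item, n_trials = 15) {
  ability <- 0                      # start near the item bank's average difficulty
  used <- integer(0)
  for (t in seq_len(n_trials)) {
    remaining <- setdiff(seq_len(nrow(item_bank)), used)
    # choose the unused item whose difficulty is closest to the current estimate
    item <- remaining[which.min(abs(item_bank$difficulty[remaining] - ability))]
    correct <- present_item(item)   # plays the trial; returns TRUE or FALSE
    # nudge the estimate up or down, with a step size that shrinks over trials
    ability <- ability + (if (correct) 0.5 else -0.5) / t
    used <- c(used, item)
  }
  ability                           # final ability estimate after n_trials trials
}
```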
2.3 Analysis strategy
The goal of our analyses was to determine whether musical abilities differ reliably as a function of native language type (i.e., tonal vs. pitch-accented vs. non-tonal). A simple comparison of scores across language types would have been confounded with the degree of musical experience sampled within each language, along with other cultural or personal factors. Instead, following our preregistration, we applied an ordinary-least-squares regression with measured potential confounders as covariates to our full sample, modeled as Performance ∼ Language type + Gender + Age + Music lessons, with non-tonal language, female, and no music lessons as the reference levels.
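As a minimal sketch (with assumed variable names; the actual analysis code is in the repository listed under Data, code, and materials availability), this model can be fit in R as follows, shown here for the melodic discrimination score:

```r
# Full-sample model, sketched with assumed column names; reference levels are
# non-tonal language, female, and no music lessons.
dat$language_type <- relevel(factor(dat$language_type), ref = "Non-tonal")
dat$gender        <- relevel(factor(dat$gender),        ref = "Female")
dat$music_lessons <- relevel(factor(dat$music_lessons), ref = "No lessons")

full_fit <- lm(melodic_score ~ language_type + gender + age + music_lessons, data = dat)
summary(full_fit)
```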
We then supplemented this model-based approach to confounding with a design-based approach, creating three additional versions of the data from the full sample, each controlling for confounding in a different way. The exact match subsample included participants matched 1-to-1 within each language type (tonal, pitch-accented, and non-tonal) on music lessons, gender, and age (coarsened into 10-year bins). The no music lessons exact match subsample further filtered the exact match subsample to only participants who had not received any music lessons. Finally, the inverse probability weighted sample included the total sample population, balanced on music lessons, gender, and age using inverse-probability weighting (Austin & Stuart, 2015; Stuart, 2010). These three subsamples were modeled with the same ordinary-least-squares regression: Performance ∼ Language type (with non-tonal language as the reference level). For brevity, and because these four models provide robustly convergent evidence, we report only the full-sample analysis in the main text; analysis details for each subsample model are reported in Text S1.3 and Tables S3 and S4a-c.
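As one illustration of the design-based approach, the sketch below computes inverse-probability weights from the coarsened covariates and re-fits the simpler model; column names are again assumptions, and the exact matching procedures operate on the same coarsened strata.

```r
# Inverse-probability weighting, sketched with assumed column names: weight each
# participant by 1 / P(their language type | music lessons, gender, age bin).
dat$age_bin <- cut(dat$age, breaks = seq(0, 100, by = 10))
dat$stratum <- interaction(dat$music_lessons, dat$gender, dat$age_bin, drop = TRUE)

# estimate P(language type | stratum) from cell frequencies
p_type_given_stratum <- prop.table(table(dat$stratum, dat$language_type), margin = 1)
dat$ipw <- 1 / p_type_given_stratum[cbind(as.character(dat$stratum),
                                          as.character(dat$language_type))]

ipw_fit <- lm(melodic_score ~ language_type, data = dat, weights = ipw)
summary(ipw_fit)
```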
Based on the exploratory analyses (Text S1.1), we expected that native speakers of tonal languages would outperform native speakers of both pitch-accented and non-tonal languages on the melodic discrimination task; but would underperform both other groups on both the mistuning perception and beat alignment tasks.
As a final analytic note, because group-level comparisons in a very large sample risk yielding statistically significant effects of tiny size (i.e., practically non-significant), we also preregistered an inference criterion: in the main analyses, we compare the effects of language type to the effects of having taken music lessons. This provides a plainly interpretable benchmark: being a native speaker of a particular language type is associated with effects that can be interpreted as larger than, comparable to, or smaller than the effect of having taken music lessons.
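Continuing the full-sample sketch above (full_fit), this benchmark amounts to a simple ratio of coefficients; the dummy-variable names below depend on the factor coding and are placeholders.

```r
# Express a language-type effect as a fraction of the music-lessons effect
# (coefficient names depend on the factor coding and are assumed here).
b <- coef(full_fit)
b["language_typeTonal"] / b["music_lessonsSome lessons"]
# A ratio near 1 reads as "comparable to the effect of music lessons";
# a ratio near 0.25 as "about a quarter of that effect".
```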
3 Results
The main analyses (Table 2 and Figure 2) supported the exploratory predictions. First, and most strikingly, native speakers of tonal languages had reliable advantages in melodic discrimination, of large practical significance: the effect of tonal language experience on melodic discrimination ability is nearly as large as the effect of having taken music lessons. This effect replicated across all four of our methods to control for confounding (Text S1.3, Tables S3 and S4a-c).
Regression results for each musical test in the confirmatory sample. ***p < 0.001. Note that an indicator variable for the gender “Other” was included as a predictor in the model, but is omitted from this table as it was rarely selected by participants.
Estimated effects of language type (tonal language: orange, pitch-accented language: purple, non-tonal language: blue), marginalizing over the average proportions for ages and gender, and shown relative to the marginal effect of additionally having music lessons. After marginalization, for ease of interpretability, a scalar transformation was applied to the coefficients such that the “Non-tonal” and “No music lessons” coefficients were equal to zero. Solid circular points indicate marginal estimates of speaking a given language type; semitransparent triangular points above indicate this effect combined with that of having music lessons. Error bars represent 95% confidence intervals of the mean. The dotted horizontal black line indicates the baseline (y = 0).
The tonal-language advantage in melodic discrimination traded off with other music perception abilities, however. Native speakers of tonal languages had a practically and statistically significant disadvantage in beat alignment scores, relative to the non-tonal-language group (approximately one quarter of the effect of having taken music lessons; Figure 2), and showed small but inconsistent disadvantages in mistuning perception scores (see also Text S1.3).
Native speakers of pitch-accented languages showed a melodic discrimination advantage over non-tonal speakers, but this effect was far smaller than that for tonal-language speakers. They performed better than non-tonal-language speakers on both the mistuning perception and beat alignment tests, however, consistent with a tonal-language disadvantage on these two tasks. We caution, however, that the category of pitch-accented languages is inherently more ambiguous than that of tonal languages, so we do not interpret these differences further (see also Footnote 1).
In follow-up analyses, we explored the consistency of the language-type effects. First, we probed the stability of test performance within each of the 40 languages studied here, by comparing the mean performance of all speakers of each language across the exploratory and confirmatory datasets. Stability was high (Figure 3): mean language-wise test performance in the exploratory dataset correlated highly with that in the confirmatory dataset (melodic discrimination: r = 0.92; beat alignment: r = 0.91; mistuning perception: r = 0.87; ps < 0.001).
Mean exploratory vs confirmatory scores within each individual language are highly similar, clustered around the y = x line (representing a perfect match) across the three musical tests. Each dot represents an individual language (tonal language: orange, pitch-accented language: purple, non-tonal language: blue).
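A minimal sketch of this stability check for one test (melodic discrimination), assuming hypothetical per-participant data frames and column names:

```r
# Language-wise stability: correlate each language's mean score across the
# exploratory and confirmatory datasets (data frames and columns are assumed).
lang_means <- function(d) aggregate(melodic_score ~ native_language, data = d, FUN = mean)

m <- merge(lang_means(exploratory_dat), lang_means(confirmatory_dat),
           by = "native_language", suffixes = c("_expl", "_conf"))
cor.test(m$melodic_score_expl, m$melodic_score_conf)
```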
We then tested the consistency of the main effects across the languages that make up each language group (Figure 4). For melodic discrimination, half of the 10 highest-performing languages were tonal (i.e., all 5 tonal languages studied here), and tonal languages were ranked significantly higher than non-tonal languages (Mann-Whitney U = 126, p = 0.007). The opposite pattern was evident for beat alignment, where the 5 tonal languages were clustered toward the bottom of the distribution (U = 20, p = 0.008); no significant difference was evident for the mistuning perception test, consistent with the somewhat weaker estimated differences on this test (U = 42, p = 0.149). Language-wise performance among the pitch-accented languages was far more variable than among the tonal languages, perhaps reflecting the fuzzier nature of this category. Larger versions of each column in Figure 4 are presented in Figures S1a-c.
Languages ranked (from high to low) by their median scores on each test. Each boxplot represents a language, with the median score of all native speakers of that language indicated by the vertical black line, the interquartile range by the width of the box, and the range by the horizontal black line. Language types are indicated by the shading (see legend). To avoid potential confounds of musical experience, only participants who reported not having had musical training are included in this figure. Larger versions of each column are in Figures S1a-c, including language labels and language-wise sample sizes.
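The language-level rank comparisons reported above can be sketched as follows (again with assumed names, and restricting to participants without music lessons, as in Figure 4): aggregate to one median score per language, then compare the tonal and non-tonal groups with a Wilcoxon rank-sum (Mann-Whitney) test.

```r
# Language-level rank test, sketched with assumed column names: one median score
# per language, then a Mann-Whitney comparison of tonal vs. non-tonal languages.
lang_medians <- aggregate(melodic_score ~ native_language + language_type,
                          data = no_lessons_dat, FUN = median)

two_groups <- droplevels(subset(lang_medians,
                                language_type %in% c("Tonal", "Non-tonal")))
wilcox.test(melodic_score ~ language_type, data = two_groups)
```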
4 Discussion
We found a clear link between linguistic experience and music processing abilities: native speakers of tonal languages performed substantially better than native speakers of non-tonal languages on a test that required discriminating changes in melodic patterns; the effect size of being a tonal language speaker was nearly as large as the effect of receiving music lessons. Notably, our melodic discrimination results did not reflect a general advantage in tonal language speakers’ music processing abilities; in fact, tonal language speakers did not show an advantage over speakers of non-tonal or pitch-accented languages on tests probing sensitivity to finer-grained pitch (mistuning) and rhythm (beat alignment). Indeed, on these aspects of music processing, tonal speakers showed a modest disadvantage.
Our results are likely to generalize across tonal languages, given that they held in our dataset across multiple tonal languages, each represented by hundreds or thousands of native speakers. They also help clarify the previously mixed pattern of results concerning the effects of linguistic experience on music processing across different tasks and samples. For example, an advantage for tonal language speakers in melodic pattern processing is consistent with the majority of previous studies (Alexander et al., 2008; Bidelman et al., 2013; Bradley, 2016; Creel et al., 2018; Pfordresher & Brown, 2009; Swaminathan et al., 2021; Wong et al., 2012), though not all (Bidelman et al., 2011a; Giuliano et al., 2011; Stevens et al., 2013; Tong et al., 2018; Zheng & Samuel, 2018). The modest disadvantage we observed on fine-grained pitch tasks is likewise supported by some prior studies (Bent et al., 2006; Bidelman et al., 2011a; Chang et al., 2016; Peretz et al., 2011; Pfordresher & Brown, 2009; Stevens et al., 2013; Tong et al., 2018; Wong et al., 2012; Zheng & Samuel, 2018) but not others (Bidelman et al., 2013; Giuliano et al., 2011; Hutka et al., 2015). And while rhythmic abilities in tonal language speakers have rarely been studied (see Wong et al., 2012; Zhang et al., 2020), a disadvantage in beat discrimination is consistent with recent work showing that tonal speakers give more weight to pitch cues than to duration cues, a weighting that cuts across auditory domains (Jasmin et al., 2021). By leveraging a consistent set of tests and a large sample size, our results make clear that speaking a tonal language has a measurable but delimited connection to music skills.
Why might tonal language experience have these specific effects on music perception? Like others (e.g. Bidelman et al., 2013; Patel & Iversen, 2014), we suspect that the answer lies in the shared mechanisms and neural processing resources associated with auditory perception — whether they are applied to language or music. Both tonal languages and music rely on specialized sound categories (tone contours in speech; pitch motifs in music). If these categories are learned through shared, domain-general learning mechanisms, then improving the efficiency of these mechanisms through practice in either domain should result in mutual improvements (Asaridou & McQueen, 2013; Chang et al., 2016; Delogu et al., 2010, 2006; McMullen & Saffran, 2004; Patel, 2008). This idea highlights differences between putatively domain-general auditory processing and putatively domain-specific mechanisms that underlie music processing [e.g., the processing of pitch in the context of a tonal hierarchy; Peretz & Coltheart (2003); Zatorre et al. (2002); Krumhansl (2004)].
This account does not explain, however, how these learning mechanisms might be improved by experience. One possibility is that language experience shapes domain-general perceptual strategies for inferring high-level perceptual categories from low-level cues: acquired perceptual biases (e.g., from tonal language experience) may aid the processing of some stimuli while worsening the processing of others. In speech, listeners give more perceptual weight to cues that are more informative in discriminating contrasts that are salient in their native language (Schertz & Clare, 2020), and tonal language speakers rely more heavily on pitch to categorize and produce speech stress when acquiring a non-tonal second language, compared to native speakers of that language (Wang, 2008; Yu & Andruski, 2010). Similarly, people with pitch perception deficits learn to compensate for their deficits by giving more weight to durational cues when decoding speech prosody (Jasmin, Sun, et al., 2020; Jasmin, Dick, et al., 2020). Recent evidence suggests that similar effects emerge in music perception: Mandarin speakers have difficulty ignoring pitch cues relative to English and Spanish speakers, who have been found to more frequently make decisions based on duration cues (Jasmin et al., 2021). In turn, this is consistent with theories of the overlapping mechanisms of basic auditory perception (Patel, 2011; Patel & Iversen, 2014; Tierney et al., 2013; Wong et al., 2007). Our findings unite these results and show their generality.
While the scope of our data collection allowed for analysis of music processing abilities in thousands of native speakers of six pitch-accented languages, we hesitate to make any strong claims concerning the link between this type of linguistic experience and music skills. From a statistical standpoint, the evidence for differences in music perceptual abilities in this group was far weaker than for tonal languages (with much smaller effect sizes; see Figure 2 and Table 2) and far more variable (with large between-language variability within the pitch-accented language group; see Figure 4 and Figures S1a-c). The results do converge with one prior study (Burnham et al., 2015), which showed that on a variety of pitch tasks, Swedish (pitch-accented) speakers were marginally better than English (non-tonal) speakers but worse than Thai or Mandarin (tonal) speakers; nonetheless, clearer evidence is necessary for a general claim concerning the effects of pitch-accented language experience on music processing. This, of course, is complicated by the inherently fuzzier nature of the classification of pitch-accented languages, relative to tonal languages [see Footnote 1; Gussenhoven & Gussenhoven (2004); Hyman (2006); Hyman (2009)]. We welcome further work that codifies the nature of both linguistic and musical pitch use across this set of generally understudied languages. To this end, we make all our data available through OSF for further analysis by interested parties.
We note several other limitations. First, while we accounted for whether or not participants had musical training, we did not measure the duration or intensity of that training. As a result, our estimates of the effect of musical training have greater uncertainty (although the analyses of participants with no musical training, which largely replicate the main effects, help to mitigate this concern). Second, participants only reported their first language, so we were unable to examine the effects of bilingualism or multilingualism (Krizman et al., 2012; Liu et al., 2020; Liu & Kager, 2017), or to assess whether speaking languages that use tone differently (e.g., Mandarin and English) might have contributed additional variability to our results. Third, there are a host of other unmeasured cultural, environmental, and genetic factors that surely affect musical abilities. Moreover, these likely interact with each other, complicating causal inferences from the observational data we collected [e.g., recent findings that genetics and musical experience both influence linguistic tone perception in Cantonese; Wong et al. (2020)]. These limitations may be addressed in future experiments conducted at smaller scales, but with more precision than a citizen-science approach allows. For example, targeted analyses of musical abilities in tonal-language speakers who are typically not of East Asian descent, such as Yoruba or Zulu speakers, could help to solidify the generality of the results (thus far, we have only collected data from 40 speakers of these languages via the citizen-science methods reported here).
In sum, our results show that across a range of geographic and demographic contexts, linguistic experience alters music perception ability in reliable (but not unitary) fashions. This implies that substantively different domains of auditory perception recruit shared processing resources, which themselves are shaped by auditory experience.
End notes
Supplemental information
The supplemental information includes 3 text sections, 8 tables, and 4 figures.
Data, code, and materials availability
A reproducible version of this manuscript, including all data and code, is available at https://github.com/themusiclab/language-experience-music. The preregistration is at https://osf.io/xurdb. Readers can try out the experiment at https://themusiclab.org/quizzes/miq; code for each of the three tests is available at https://github.com/pmcharrison/mpt, https://github.com/pmcharrison/mdt, and https://github.com/pmcharrison/cabat.
Author contributions
Conception: J.L.; S.A.M. Experimental design and implementation: S.A.M. Pre-registration and planned analyses: J.L.; C.B.H.; E.B.; S.A.M. Participant recruitment: S.A.M. Analysis and visualization: J.L.; C.B.H.; E.B.; S.A.M. Writing: J.L.; C.B.H.; E.B.; S.A.M.
Supplementary Information
S1.1 Details of the exploratory analysis
The exploratory sample (n = 196,689; demographics are in Table S5) consisted of participants who completed the three music perception tests between Nov 8th, 2019 and Apr 27th, 2020 and passed our exclusion criteria. We ran the same analyses as in the main text (including the main regression analysis and the three alternate approaches); the results are summarized in Table S6.
S1.2 Validation of self-reported headphone use
Participants who self-reported that they were wearing headphones completed a 6-trial headphone detection task (Woods et al., 2017) designed to be easy for participants wearing headphones and difficult for those listening on free-field speakers. Of the 194,206 participants who indicated wearing headphones, 165,828 had clean and usable headphone detection data. The distribution of scores for these participants (Figure S2) was strongly left-skewed (mean score: 5.09 of 6), with the median participant scoring 6 of 6 (100%) correct. This implies that the bulk of participants who self-reported wearing headphones were, in fact, wearing headphones.
S1.3 Replications of main analyses at alternate sample levels
We replicated the main results (Table 2 and Figure 2) using three alternate techniques for controlling for confounding (see Methods): exact matching; exact matching restricted to participants who had not taken music lessons; and inverse probability weighting. The results held robustly across these approaches, as summarized in Table S3 and reported fully in Tables S4a-c.
The languages studied here, with sample sizes, the country contributing the largest sample, and the source of each language classification. Abbreviations: WALS (The World Atlas of Language Structures), LAPSyD (Lyon-Albuquerque Phonological Systems Database).
Demographics of participants in the confirmatory dataset, by language type.
Summary of main results, repeated across the four analysis approaches (*p < 0.05, ***p < 0.001).
Least squares regression outputs for each musical test from the exact match sample (***p < 0.001).
Least squares regression outputs for each musical test from the exact match sample without music lessons (***p < 0.001).
Least squares regression outputs for each musical test from the inverse probability weighted sample (***p < 0.001).
Demographics of the exploratory dataset, by language type.
Beta coefficients for different analysis approaches in the exploratory sample (*p < 0.05, ***p < 0.001).
Melodic discrimination scores ranked by language (ranked by median; detailed version of Figure 4 in the main text). Only participants who had not taken music lessons are included.
Mistuning perception scores ranked by language (ranked by median; detailed version of Figure 4 in the main text). Only participants who had not taken music lessons are included.
Beat alignment scores ranked by language (ranked by median; detailed version of Figure 4 in the main text). Only participants who had not taken music lessons are included.
Scores on the headphone detection task, from participants who self-reported that they were wearing headphones. The maximum score was 6; the dashed red line indicates the mean score.
Acknowledgments
This research was supported by the Duke University Internship Funding Program (J.L.); the Harvard Data Science Initiative (S.A.M.); and the National Institutes of Health Director’s Early Independence Award DP5OD024566 (S.A.M. and C.B.H.). We thank the participants; P. Harrison and D. Müllensiefen for sharing code and assisting with the implementation of their music perception tasks; J. Simson for technical and research assistance; and the members of The Music Lab for discussion and feedback on the citizen-science platform, the experiment, and the manuscript.
Footnotes
1 Whether pitch-accented languages form a coherent standalone category, or whether they are better considered on a spectrum between tonal and non-tonal languages, with some mixed cases, is a matter of debate (e.g., Gussenhoven & Gussenhoven, 2004; Hyman, 2009, 2006). Indeed, a category that groups evidently disparate languages together, such as Japanese and Swedish, may not be defensible. As pitch-accented languages are not our primary focus, here we treat them as a separate group from tonal and non-tonal languages, but also conduct some analyses at the language level rather than the language-group level. We encourage anyone interested in alternative groupings of languages to use our public data and code to re-analyze accordingly.
2 Our preregistration noted that we would study participants whose data were collected between 28 April 2020 and 20 February 2021; we opted to include additional data collected up until the date of submission of this paper, to maximize the confirmatory sample size.