ABSTRACT
Interest in statistical learning in developmental studies stems from the observation that 8-month-olds could extract words from a monotone speech stream using only the transition probabilities (TPs) between syllables (Saffran et al., 1996). A simple mechanism was thus part of the human infant’s toolbox for discovering regularities in language. Since this seminal study, observations of statistical learning capabilities have multiplied across domains and species, challenging the hypothesis of a dedicated mechanism for language acquisition. Here, we leverage the two dimensions conveyed by speech, speaker identity and phonemes, to examine (1) whether neonates can compute TPs on one dimension despite irrelevant variation on the other and (2) whether the linguistic dimension enjoys an advantage over the voice dimension. In two experiments, we exposed neonates to artificial speech streams constructed by concatenating syllables while recording EEG. The sequence had a statistical structure based either on the phonetic content while the voices varied randomly (Experiment 1), or on the voices while the phonetic content varied randomly (Experiment 2). After familiarisation, neonates heard isolated duplets that either adhered or did not adhere to the structure they had been familiarised with. In both experiments, we observed neural entrainment at the frequency of the regularity and distinct Event-Related Potentials (ERPs) to correct and incorrect duplets, highlighting the universality of statistical learning mechanisms and suggesting that they operate on virtually any dimension along which the input is factorised. However, only linguistic duplets elicited a specific ERP component, potentially an N400 precursor, suggesting a lexical stage triggered by phonetic regularities already at birth. These results show that, from birth, multiple input regularities can be processed in parallel and feed different higher-order networks.
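To make the notion of a transition probability concrete, here is a minimal sketch. It assumes the forward TP from element A to element B is estimated as the number of times A is immediately followed by B, divided by the number of times A is followed by anything. The function name and the toy syllables are hypothetical illustrations, not the stimuli or analysis code of the study.

```python
from collections import Counter


def transition_probabilities(stream):
    """Estimate forward TPs: TP(a -> b) = count(a then b) / count(a followed by anything)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Only elements that have a successor can start a transition.
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical toy stream: two duplets ("pa-ti", "ku-do") concatenated in varying order.
stream = ["pa", "ti", "ku", "do", "pa", "ti", "pa", "ti", "ku", "do", "ku", "do"]
tps = transition_probabilities(stream)

for (a, b), p in sorted(tps.items()):
    print(f"TP({a} -> {b}) = {p:.2f}")
```

In this toy stream the within-duplet transitions have a TP of 1, whereas transitions across duplet boundaries have lower TPs. A drop of this kind is the cue that can mark a boundary. The same computation applies unchanged if the sequence elements are voice labels rather than syllables.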
HIGHLIGHTS
Human neonates are sensitive to regularities in speech on both the phonetic and the voice dimensions.
There is no observed advantage for statistical computations of linguistic content over voice content.
Both speech dimensions are processed in parallel by distinct networks.
Phonetic regularities evoked a specific ERP component in the post-learning phase, suggesting activations along the pathway to the lexicon.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The main modifications are:
1. A revision of the introduction to better explain what transitional probabilities are and to clarify the rationale of the experimental design.
2. A revision of the discussion: (a) to better explain the interpretation of the ERP difference between duplets after a stream with phonetic or voice regularities (possibly an N400); (b) to clarify the framing of statistical learning as a universal learning mechanism that might share computational principles across features (or domains).
3. The legend of Figure 1 was modified to clarify the experimental design.
4. A mistake in the legend of Figure 2 was corrected (it said "thick orange line" when the line is black).
5. In the Supplementary Information, we added an analysis to rule out the possibility that the result could be driven by the perception of only one female and one male voice.