Infant low-frequency EEG cortical power, cortical tracking and phase-amplitude coupling predicts language a year later

Cortical signals have been shown to track acoustic and linguistic properties of continuous speech. This phenomenon has been measured in both children and adults, reflecting speech understanding by adults as well as cognitive functions such as attention and prediction. Furthermore, atypical low-frequency cortical tracking of speech is found in children with phonological difficulties (developmental dyslexia). Accordingly, low-frequency cortical signals may play a critical role in language acquisition. A recent investigation with infants Attaheri et al., 2022 (1) probed cortical tracking mechanisms at the ages of 4, 7 and 11 months as participants listened to sung speech. Results from temporal response function (TRF), phase-amplitude coupling (PAC) and dynamic theta-delta power (PSD) analyses indicated speech envelope tracking and stimulus-related power (PSD) for delta and theta neural signals. Furthermore, delta- and theta-driven PAC was found at all ages, with theta phases displaying stronger PAC with high-frequency amplitudes than delta. The present study tests whether these previous findings replicate in the second half of the full cohort of infants (N = 122) who were participating in this longitudinal study (first half: N=61, (1); second half: N=61). In addition to demonstrating good replication, we investigate whether cortical tracking in the first year of life predicts later language acquisition for the full cohort (122 infants recruited, 113 retained) using both infant-led and parent-estimated measures and multivariate and univariate analyses. Increased delta cortical tracking in the univariate analyses, increased ∼2Hz PSD power and stronger theta-gamma PAC in both multivariate and univariate analyses were related to better language outcomes using both infant-led and parent-estimated measures. By contrast, increased ∼4Hz PSD power in the multi-variate analyses, increased delta-beta PAC and a higher theta/delta power ratio in the multi-variate analyses were related to worse language outcomes. The data are interpreted within a “Temporal Sampling” framework for developmental language trajectories.

envelope tracking and stimulus related power (PSD) via the delta & theta neural signals.
Furthermore, delta and theta driven PAC was found at all ages with gamma amplitudes displaying a stronger PAC to low frequency phases than beta. The present study tests whether those previous findings replicate in the second half of the same cohort (first half: N=61, (1); second half: N=61). In addition to demonstrating good replication, we investigate whether cortical tracking in the first year of life predicts later language acquisition for the full infant cohort (122 infants recruited, 113, retained). Increased delta cortical tracking and theta-gamma PAC were related to better language outcomes using both infant-led and parent-estimated measures. By contrast, increased ~4Hz PSD power and a greater theta/delta power ratio related to decreased parent-estimated language outcomes. The data are interpreted within a "Temporal Sampling" framework for developmental language trajectories.

Introduction
Infant speech perception develops rapidly during the first year of life, yet our understanding of the neural factors underpinning language acquisition is still incomplete. Whilst infant behavioral experiments have highlighted perceptual factors contributing to early phonological and morphological development, for example perceptual 'magnet' effects (2), the neural mechanisms supporting infant language acquisition are poorly understood. Here we contribute novel longitudinal neural data from a relatively large sample of 113 infants, utilising EEG recordings taken during the first year of life while infants listened to sung or chanted nursery rhymes. The neural measures are used to explore relations with subsequent language acquisition using a battery of standardised and experimental phonological, gestural and lexical measures taken from 12 to 24 months of age (3).
The auditory neuroscience of adult speech processing has revealed that accurate encoding of amplitude modulations (AMs) in the speech signal via neuronal oscillations is fundamental to many different aspects of language comprehension. Regarding development, and building on these neural insights, Temporal Sampling (TS) theory describes how low-frequency neural oscillations may be key to learning and extracting phonological information from the speech signal (4,5). According to TS theory, auditory cortical networks operating at delta, theta, beta and gamma frequencies 'sample' the envelope of the speech signal at matched frequencies, thereby underpinning the phonological (linguistic) encoding of speech by infants and children.
Cortical signals have been shown to track various key properties of speech, using methodological advances such as Temporal Response Functions (TRFs), an analysis technique that explores how neural signals encode continuous sensory stimuli. Adult TRF studies have revealed how cortical tracking of the speech envelope reflects both bottom-up and top-down cortical processes in adult listeners, linking to selective attention, prediction, signal parsing and speech intelligibility (6)(7)(8)(9)(10)(11)(12). Recent infant EEG data has revealed robust cortical tracking of the speech envelope (13)(14)(15), particularly in the delta and theta frequency bands (1). Indeed, patterns of cortical tracking in delta and theta frequency bands make distinct contributions to speech encoding by adults (16), and there is evidence for preferential tracking in the delta band during the first year of life (∼0.5Hz-4 Hz, see 1,17). Consequently, language acquisition by infants may depend in part on successful low-frequency cortical tracking of the speech signal.
Further, low-frequency cortical oscillations do not work in isolation from each other but show dynamic interactions during speech processing. There is a nested hierarchy of cortical oscillations that match the frequency of multiple components of the speech signal (18). Lowfrequency (delta, theta) neural phase dynamics temporally organize the amplitude of high-frequency signals in a process known as phase-amplitude coupling (PAC; (19)(20)(21)). Adult electrophysiological and MEG recordings have shown that PAC between delta-gamma, deltabeta and theta-gamma is induced in response to continuous speech (18,22,23), as well as to rhythmic speech stimuli (sung and chanted nursery rhymes, (17)). PAC is thought to group speech information into linguistic units (24,25). Like natural speech, rhythmic speech stimuli induce PAC in infants and adults (1). Interestingly, PAC emerges as early as 4 months of age and remains stable, while both delta-and theta-band cortical tracking show developmental changes over the first year of life (1). Accordingly, PAC may also relate to later language outcomes.
Studies relating infant cortical tracking to later language outcomes are rare (26).
However, research on the role of cortical dynamics in language development has progressed by studying older children with language difficulties. For example, recent EEG modelling studies (27) have shown that delta-theta PAC is atypical in children with developmental language disorder (DLD) while the theta/delta power ratio during passive story listening is atypical in children with developmental dyslexia. Children with dyslexia have poor phonological processing skills while children with DLD have poor syntactic skills. Accordingly, both PAC and the theta/delta power ratio in infants may predict subsequent language development, in addition to low-frequency cortical tracking. The developmental theoretical framework encapsulated by TS theory informed Araujo et al.'s (27) modelling (Pre-print), and predicts that cortical tracking within specific low-frequency bands (i.e. delta, 1-4 Hz and theta, 4-8 Hz) and their interactive dynamics is central to language acquisition (4,28,29). In brief, if low-frequency cortical tracking and/or dynamic relations between frequency bands are atypical, then developmental trajectories for language may also be atypical. Previous electrophysiological studies testing TS theory with children have revealed a significant role for delta band encoding in both phonological and vocabulary development (30)(31)(32)(33). However, PAC (delta-beta PAC) was not significantly different in children with dyslexia in a language processing (sentence repetition) task (32).
Infants and toddlers at risk for dyslexia differ from not-at-risk infants in measures of acoustic processing that relate to cortical tracking, with implications for later language development (14,38). Understanding how dynamic cortical processes in the infant brain affect language development has yet to be studied in detail.
In the current study, we explore the potential role of individual differences in low-frequency cortical tracking, PAC and theta-delta power relations in predicting individual differences in later language acquisition. Participating infants were drawn from the Cambridge UK BabyRhythm project, a longitudinal study of 122 infants (122 recruited, 113 retained) from two-to 42-monthsof-age, investigating early neural and motor tracking of auditory and audiovisual rhythms in relation to language outcomes. Here we examine whether increases in low-frequency EEG power spectral density (PSD) in the delta and theta range, low-frequency cortical tracking (delta, theta and alpha) and PAC in response to rhythmic speech predict performance on multiple language and communication measures administered in later infancy and toddlerhood. EEG data was collected longitudinally at 4-, 7-and 11-months and used to predict subsequent performance in phonological, gestural and lexical measures taken from 12 to 24 months of age. It was expected that stronger cortical tracking would be predictive of better language outcomes (26). As PAC did not show strong development with age, no strong predictions were made. However, given the DLD modelling data noted earlier (27), poorer PAC may be linked to worse language outcomes.
Further, a higher theta/delta power ratio may be linked to worse language outcomes (27). In Araújo et al., preprint, higher ratios in the dyslexic group were not driven by group differences in either delta or theta power per se, but by their joint dynamics. Accordingly, we also examine the theta/delta ratio as a potential predictor of language outcomes.

Infant participants and ethics
All the data reported here is from the longitudinal Cambridge UK BabyRhythm project (122 infants recruited, 113 retained). The cohort was split into two halves based on recruitment date) to allow for confirmatory analyses of neural factors by comparing EEG data from the first and second halves of the cohort. The current report is from the second half of the sample. Results from the first half of the sample are reported in (1). Due to participant withdrawals, missed appointments and data exclusion the final number of data points for the second half of the sample was, ~4-months (N=56), ~7-months (N=51), ~11-months (N=52). Missing data was due to missed appointments or technical issues with the data files, ~4-months (N=4), ~7-months (N=10) and ~11-months (N=7). Other infants were excluded due to having fewer than 42 data epochs available after pre-processing, ~4-months (N=1), ~7-months (N=0) and ~11-months (N=2).
Infants were recruited from a medium sized city in the United Kingdom and surrounding areas via multiple means including flyers in hospitals, schools and antenatal classes, research presentations at maternity classes and online advertising. All infants were born full term (37)(38)(39)(40)(41)(42) gestational weeks) and had no diagnosed developmental disorder. The study was reviewed by the Psychology Research Ethics Committee of the University of Cambridge. Parents gave written informed consent after a detailed explanation of the study and families were reminded that they could withdraw from the study at any point during the repeated appointments (8 EEG recordings at 2-, 4-, 5-, 6-, 7-, 8-, 9-and 11-months; 6 language follow-ups at 12-, 15-, 18-, 24-, 30-and 42months).

Stimuli
A selection of 18 common English language nursery rhymes were chosen as the stimuli.
Audio-visual stimuli of a singing female (head only) were recorded using a Canon XA20 video camera at 1080p, 50fps and with audio at 4800 Hz. A native female speaker of British English used infant directed speech to melodically sing (for example "Mary Mary Quite Contrary") or rhythmically chant (for nursery rhymes like "There was an old woman who lived in a shoe") the nursery rhymes whilst listening to a 120 bpm metronome. Although the nursery rhymes had a range of beat rates and indeed some utilized a 1 Hz rate, the metronome was used to keep the singer on time. The beat was not present on the stimulus videos, but it ensured that a consistent quasi-rhythmic production was maintained throughout the 18 nursery rhymes. To ensure natural vocalisations the nursery rhyme videos were recorded while being sung or rhythmically chanted to an alert infant.

EEG data collection
Infants were seated in a highchair, approximately one meter in front of their primary care giver, within a sound-proof acoustic chamber. EEG data was recorded at a sampling rate of 1000 Hz using a GES 300 amplifier connected to an appropriately sized 64 channel infant net (Geodesic Sensor Net, Electrical Geodesics Inc., Eugene, OR, USA). The infant was seated ~650mm away from the presentation screen and sounds were presented at 60dB (checked by sound level meter) from speakers (Q acoustics 2020i driven by a Cambridge Audio Topaz AM5 stereo amplifier) placed either side of the screen. Whilst the infant attended to the screen, the 18 nursery rhyme videos played sequentially, each repeated 3 times (54 videos, with a presentation time of 20' 33'' in total). If the infant lost attention to the screen an attention grabber video was played at the end of that nursery rhyme. If the infant became too fussy a short break was allowed before continuing, otherwise the session was ended. All infants included for analysis listened to at least 2 repetitions of each nursery rhyme (minimum of 36 nursery rhymes, lasting 13' 42'').
This stimulus period was followed by 5 minutes of silent recording (hereafter silent state). To ensure compliance from the infant, and therefore a good EEG signal during the silent state, it was necessary for a researcher to sit alongside the infant. To ensure consistency across participants, the researcher performed the same action of silently blowing bubbles and showing the same picture book to the infant during this period.

EEG preprocessing
All analyses were conducted with custom-made scripts in Matlab 2017a (The MathWorks, Inc., Natick, MA) incorporating the EEGLab toolbox (36).
The four (channels 61, 62, 63 and 64) facial electrodes (present on the larger nets) were excluded from all analyses because they do not exist on the infant-sized EGI Geodesic sensor nets. The EEG data, from the remaining 60 channels, was filtered (pop_eegfiltnew function of EEGLab toolbox) into a broadband signal (0.5-45 Hz for PSD and PAC, methods for acquiring PAC bands of interest further detailed in (1)) or the frequency band of interest (delta, 0.5-4 Hz, theta, 4-8 Hz or alpha [control band] 8-12 Hz for the mTRF analyses) using zero-phase bandpass Hamming windowed FIR filters (transition band widths of 2 Hz with cutoff frequencies at -6 dB, 0-5 Hz, 3-9 Hz and 7-13 Hz respectively). The EEG data was down sampled to 100 Hz to reduce the computational load. Next, the clean_asr EEGLab function (36,37) was used to clean noise artifacts from the data by identifying and removing bad principal components via a modified PCA procedure (see (1)). Further bad channels were identified via probability and kurtosis and were interpolated (via spherical interpolation), if they were 3SD away from the mean (average number of interpolated channels; 4mo = 6.9, 7mo = 6.5, 11mo = 6.6) before all channels were rereferenced to a 60-channel average reference.
EEG responses were epoched into trials aligned to the start and ending at the completion of a nursery rhyme phrase (e.g. "Mary had a little lamb"), producing EEG responses to 83 phrases (M length ± SD: 4.23sec ±0.88) which were repeated a maximum of 3 times in the experiment (249 epochs in total). This epoching procedure was decided as infant EEG typically contains short, irregular, movement artifacts and using shorter epochs increases the likelihood of generating a whole artifact-free epoch whilst maintaining long enough epochs for optimising the model fit with the mTRF toolbox (38). The infants occasionally rested their head or touched the EEG cap in such a way that specific channels showed short bursts of noise whilst the rest of the channels remained clean. To account for this, and retain a larger volume of data, epoch by epoch channel rejection was conducted. Per epoch, probability and kurtosis were used to identify bad channels and were interpolated (via spherical interpolation) if they were 3SD away from the mean. Finally, bad epochs were rejected with the pop_autorej function (EEGLab), removing epochs with fluctuations above 1000uV and values outside a 3SD of the probability threshold.

Multivariate Temporal Response Function (mTRF)
TRFs describe the linear relationship between an input and an output signal, taking into account that such a relationship may not be instantaneous and could extend over a certain window of time (38). TRFs can be estimated with system identification methodologies, such as the multivariate temporal response function (mTRF), which is a de-convolution method based on multiple lagged linear regression. Here, we applied the mTRF in the backward direction to reconstruct the stimulus envelope from the EEG signals. In turn, this informs us on the strength of the stimulus-EEG relationship, which we take as an index of cortical tracking of the stimulus envelope. The backwards 'stimulus reconstruction' mTRF model has important advantages compared to the forward linear encoding models which was previously used in infants study (13,14). Specifically, backward TRFs combines data from all the EEG channels simultaneously in a multivariate manner to reconstruct the univariate stimulus envelope, while forward TRF models only predict one EEG channel at a time. A Pearson's correlation value was calculated for each participant, representing how well the model reconstructed the stimulus envelope (a metric for cortical tracking), when trained with either the delta (0.5-4Hz), theta (4-8Hz) or alpha (8-12Hz) EEG band data. See (1) for full a detailed description of the mTRF method and analysis.

mTRF auditory stimuli preprocessing
The envelope of the auditory signal was extracted by taking the absolute value of the analytic signal generated by the Hilbert transform (Matlab). As the envelope of the lower frequencies is linearly relatable to the EEG signal (39-41) the envelope of the stimuli was filtered between 0.5 Hz and 15 Hz (lowpass; 6 th order Butterworth filter. Highpass; 9 th order Butterworth filter). The resultant envelopes were normalised using nt_normcol (Noisetools).
Finally, the stimulus envelopes were down sampled to 100 Hz to match the EEG signal.

Phase amplitude coupling (PAC)
All remaining epochs after preprocessing were concatenated back into one continual EEG signal. Due to the increased susceptibility of PAC analysis to noise, manual rejection was conducted to further remove noisy periods of data from the signal.

Spectral analysis (periodogram PSD estimate)
The same concatenated data sets created for the PAC analysis were also used for the periodogram Power Spectral Density (PSD) estimates, in which all epochs after preprocessing were concatenated back into one continual signal with manual rejection used to remove noisy periods of data from the signal. A one-sided PSD estimate was conducted separately for each electrode channel using the periodogram function (Matlab). The average PSD across channels was next calculated with the maximum value, within a predefined 0.25Hz window centered on the peak of interest, taken per participant for further analysis. See (1) for a detailed description of the PSD method and analysis.

Theta / delta (PSD) ratio
The PSD peak per participant at 4.35 Hz and 1.95 Hz, taken from the spectral decomposition of the EEG signal described above, was used to create a theta / delta ratio calculation. Per participant, the 4.35 Hz PSD value was divided by the 1.95 Hz PSD value (4.35/1.95) to give a theta / delta ratio per participant.

Language measures
A full description of the language measures used here can be found in (3). The parentestimated measures were taken from a carer-completed questionnaire, the UK-CDI. The UK-CDI estimates both receptive and productive vocabulary (43), which we refer to here as language comprehension or language production. In the current report, the 24-month CDI measures are used (see Rocha et al.,preprint (3), for rationale). Only measures up to the age of 24 months were included, as the 30-and 42-month measures are still being processed. Five measures, taken from 3 of our experimental tasks, were included as the infant-led language measures. These were 1, gesture (adapted from (44)), in which pointing towards a present and absent target was used a pre-verbal measure of communicative intent at 12 months; 2, the computerized comprehension task (CCT), a touch screen-based measure of infant vocabulary in which the infant is asked to select a picture of a named item (45), and 3, a custom designed non-word repetition (NWR) task, which measured the child's productive phonology. Three measures were taken from the NWR task administered at 24 months, (i) accurate reproduction of consonants, (ii) number of syllables reproduced, and (iii) stress patterns (canonical or non-canonical, see Rocha et al.,preprint (3)).
This NWR task measured phonological production of both nonsense words (e.g. "punky") and real words (e.g. "puppy") to name a series of toys.

Language outcomes: statistics
Multivariate linear models (using Rstudio lm function) were run separately for the PSD, mTRF, theta-delta ratio and PAC measures. Due to the inherent differences between the infantled and parent-estimated measures, separate multivariate linear models were conducted for each, providing a general measure of language abilities taken from either infant-led or parent-estimated language measures. This approach was considered most appropriate as additional variance can be extracted from an infant's overall language performance which may not be visible by looking at individual measures separately. However, we also include Bonferroni-corrected individual univariate linear models to highlight the individual contribution of each language measure to the overall infant-led or parent-estimated multivariate linear model results.

PSD, mTRF, PAC and theta/delta ratio
First, we present the power spectral density (using a periodogram; PSD), cortical tracking (using multivariate temporal response functions; mTRFs), phase-amplitude coupling (normalised modulation index, nMI; PAC) and theta/delta PSD ratio analyses using EEG recorded from the second half of our cohort of infants. We compare these data to the analyses from the first half of the cohort (1), and also merge the neural data to explore these neural factors in the combined full sample of 113 infants. Due to missed recording sessions, technical issues with data files and not enough trials after preprocessing, less than 113 data points are included in each analysis.
Furthermore, outlier analysis was also conducted to remove extreme data points that would compromise each of the LMEMs. The number of data points included are given separately for each analysis. Figure 1, Spectral decomposition of the EEG signal in the a) first half, b) second half and c) full infant cohort (1-12 Hz plotted from 0.5 to 45 Hz calculated), in response to nursery rhyme stimulation. A periodogram was used to obtain a power spectral density (PSD) estimate separately for 4-(red), 7-(green) and 11-(blue) months data. Bold lines indicate the mean values and pale shading plots the standard deviation of the data. Outlier analysis was also conducted to remove extreme data points that would compromise the LMEM. Panel A is reproduced from  (1), with permission.

Power spectral density (PSD) analyses
The distribution of low-frequency oscillations in the EEG recorded from the second half of the sample and the full cohort were obtained using the power spectral density (PSD) estimate ( Fig. 1). The results from the first half of the cohort (1) are reproduced for comparison. Given the differences in the location of the PSD peaks for the first and second halves of the cohort, the distribution of PSD peaks in low-frequency power for the full sample were first established by creating a grand average PSD estimate across channels, recording type and ages. This revealed three frequency peaks centred around 1.92 Hz, 4.05 Hz and 4.35 Hz (Fig. S1). The distribution of the PSD peaks were similar to those observed in the first half of the sample (see (1)) and in the second half of the sample (Fig. S2). It is important to note that when the full cohort was analysed, the ~4Hz peak split into two locations, peaking at 4.05 Hz and 4.35 Hz. To further investigate whether the PSD was significantly higher in response to the nursery rhymes compared to the silent state, and whether this relationship was affected by the age of the participants, a linear mixed effects model (LMEM) was conducted separately for the 1.92 Hz, 4.05 Hz and 4.35 Hz peaks. A random intercept (AR1) per participant was included to account for individual differences across the 3 recording sessions (4-, 7-and 11-months). The full sample submitted to the LMEM was as follows, 4-months (N=96), 7-months (N=88) and 11-months (N=92). This process was conducted separately for the first half of the sample, the second half of the sample and the full cohort (   Table 1). A significant effect of age was observed, with ~4.35 Hz power increasing parametrically with age from 4-months to 7-months to 11-months (estimates of fixed effects provided in Table S1). The observed recording type by age interaction was due to a developmental increase in NR induced power above the SS. Bonferroni-corrected pairwise comparisons showed that PSD increased significantly between 4-and 11-months (M increase = 18.41, SE ± 7.43, p = 0.043) as well as between 4-and 7-months (M increase = 30.56, SE ± 5.77, p = 6.7x10 -7 ). Hence there was a developmental increase in 4.35 Hz PSD for both the nursery rhymes and the silent state. This may indicate age-related maturation of theta activity in general. Importantly, we also observed a stimulus-driven interaction with age for the 4.35 Hz PSD peaks, with the largest increase observed at 11-months (4-months, NR, M = 38.9 SE ± 4.9 ; SS, M = 37.9 SE ± 5.5. 7-months, NR, M = 63.0 SE ± 6.9; SS, M = 50.6 SE ± 7.2. 11-months, NR, M =84.8 SE ± 5.8; SS, M = 53.1 SE ± 6.0). This showed that the interaction was driven by a greater PSD in response to the nursery rhymes at 11-months. This pattern replicated the theta PSD findings observed in the first half of the sample (see (1)). In summary, the peak at 4.35 Hz showed robust replication across the first and second half of the sample, with consistent increases in NR related power displaying a developmental increase in PSD magnitudes, above SS.  Table 2. Tests of fixed effects for LMEM on PSD power peak at 4.05Hz. Results reported for separate LMEMs run on the first half, second half and full sample of data. Significant results (p<0.05) denoted by bold italic text. The Satterthwaite approximation was applied to approximate the degrees of freedom, due to the missing cases.

~4.05 Hz
For the 4.05 HZ peak, the LMEM results replicated less well across the first half of the sample, the second half of the sample and the full cohort. Significant effects of both recording type and age were only observed when analysing the full cohort. The significant effect of age was driven by the 7-month (M = 63.1 SE ± 6.0) PSD power being larger than the 11-month (M = 52.5 SE ± 3.2), however this result is in part due to the 0.25 Hz window not fully encapsulating the 11-month peak for all participants (Fig. 1). A significant effect of recording type was observed in the second half of the sample, with a trend observed in the first half of the sample (Table 2). A significant effect of age was observed in the first half of the sample with a trend towards significance observed in the second half of the sample. As the data were more robust for the 4.35 HZ peak, and as the 0.25 Hz window did not fully capture the 4.05 Hz peak for all participants, the 4.35 Hz PSD peak was taken forward as the 'theta' peak for analyses with the language data.  Table 3. Tests of fixed effects for LMEM on PSD power peak at 1.92Hz. Results reported for separate LMEMs run on the first half, second half and full sample of data. Significant results (p<0.05) denoted by bold italic text. The Satterthwaite approximation was applied to approximate the degrees of freedom, due to the missing cases.

~1.92 Hz
For the 1.92 Hz peak, the LMEM conducted on the full cohort revealed a significant effect of recording type (Table 3) -27 ). A significant recording type by age interaction was observed in both the second half of the sample and the full cohort. In conclusion, the 1.92Hz stimulus-related increase in PSD power is broadly replicated across the first and second halves of the sample once the anomalous SS power, present in the 4-month data in the second half of the sample, was taken into account. Figure 2, mTRF replication. Grand averaged stimulus reconstruction values (Pearson's r) per frequency band in the a) first half, b) second half and c) full infant cohort. Each bar represents the grand average r value across the participants at the 3 age points: 4-months (red), 7-months (green) and 11-months (blue). Age responses are grouped along the X axis by EEG frequency band, delta (0.5-4 Hz) theta (4-8 Hz) and alpha (8)(9)(10)(11)(12). Light grey boxes signify the average random permutation value across subjects, calculated separately for each band. Outlier analysis was conducted to remove extreme values which would compromise the LMEMs. Panel a is reproduced from , with permission.

Cortical tracking (mTRF) analyses
A linear mixed effects model (LMEM) was constructed using the same parameters as described in Attaheri et al., 2022 (1) to examine potential developmental changes in cortical tracking in the delta, theta and alpha (control) bands. The full sample submitted to the mTRF LMEM was as follows, 4-months (N=108), 7-months (N=104) and 11-months (N=106). Main effects of data type (real r values, random permutation r values), frequency band (delta, theta or alpha) and age (4-, 7-or 11-months) were investigated along with interactions between data type by frequency band, data type by age and age by frequency band. A random intercept (AR1) per participant was included to account for individual differences across the 3 recording sessions (4-, 7-and 11-months).
The main pattern of results replicated between the first and second halves of the sample, with significant fixed effects of data type and frequency band, as well as significant interactions between data type by frequency band and age by frequency band (Table 4). There were no significant effects of age nor data type by age interactions when analysing the second half of the sample in isolation, although these effects were present in the first half of the sample (Table 4).
When using the full cohort, tests of fixed effects showed significant effects of data type, frequency band and age. These results indicate that the real r values (cortical tracking) in response to nursery rhymes were significantly above randomly permuted r values and that the cortical tracking values differed by frequency band and age. A significant interaction between data type and age indicated that the difference between random and real r values varied by age.
Examination of the estimates of fixed effects (Table S2) (Table S2) when using the full sample, mimicking the results from the first half of the sample (1). The significant interaction between age and frequency band showed that the relative difference in r values between the three age groups were different in the three frequency bands. In summary, all the main effects reported by Attaheri et al. 2022 (1) were replicated in the second half of the sample, with only age effects varying. This replication strengthens the conclusion that delta and theta EEG responses track the acoustic envelope of sung and chanted speech from the age of 4-months and that this infant neural response is robust and reliable.  Table 4. Tests of fixed effects for LMEM of cortical tracking across data type (real r values, random permutation r values), frequency band (delta, theta or alpha) and age (4-, 7-or 11months). Results reported for separate LMEMs run on the first half, second half and full sample of data. Significant results (p<0.05) denoted by bold italic text. The Satterthwaite approximation was applied to approximate the degrees of freedom, due to the missing cases. Figure 3, PAC replication. Violin plots of phase amplitude coupling (normalised modulation index, nMI) at 4-months (red), 7-months (green) and 11-months (blue) for the a) first half, b) second half and c) full infant cohort. The nMIs, from all significant analysis windows, were averaged for each individual infant. The PAC pairing with the maximum nMI per infant, from within the pre-defined frequency bands of interest (delta/beta, delta/gamma, theta/beta and theta/gamma; beta 2-4 Hz, theta 4-8 Hz, beta 15-30 Hz and gamma 30-45 Hz), were included in the grand average violin plot. Horizontal black and red lines represent the group mean and median respectively. Outlier analysis was conducted to remove extreme data points that would compromise the LMEM. Panel A is reproduced from Attaheri et al. 2022 (1), with permission.

Phase amplitude coupling (PAC) analysis
The PAC analyses (Fig. 3) were conducted for the second half of the sample and the full cohort, replicating the LMEM conducted with the first half of the sample (1). The full sample submitted to the PAC LMEM was as follows, 4-months (N=108), 7-months (N=103) and 11months (N=107). This LMEM included fixed effect factors of low-frequency phase (delta or theta), high-frequency amplitude (beta or gamma) and age (4-, 7-or 11-months). Interactions of low-frequency phase by age, high-frequency amplitude by age and low-frequency phase by highfrequency amplitude were also included in the model, with a random intercept (AR1) per participant (see methods section and Attaheri et al., 2022 (1) for detailed methods). Tests of fixed effects are reported in Table 5 and estimates of fixed effects are provided in Table S3. It is important to note that all the normalised modulation index (nMI) values included in the LMEM displayed significant coupling above a randomly permuted surrogate distribution, thus the LMEM intended to explore whether the nMI values were significantly different when either delta or theta was the low-frequency carrier phase or when either beta or gamma were the highfrequency amplitudes, and if this relationship was affected by age.
Analysis of the full cohort revealed a significant effect of high frequency amplitude (HFA), demonstrating that the nMI values where larger when gamma (M = 2.929, SE ± 0.31) was the HFA, rather than beta (M = 2.820, SE ± 0.31), when coupling with low-frequency phases ( Table 5). The non-significant effect of low-frequency phase (LFP) demonstrated that the gamma amplitudes coupled equally as strongly with delta and theta phases ( Table 5). The main effect of high frequency amplitude (HFA) observed in the first half of the sample and the full cohort did not replicate between the first and second half of the sample. However, there was a significant LFP by HFA interaction in the second half of the sample. This arose because the relationship between the LFPs and the HFAs varied by frequency band. When delta was the LFP, delta-beta (M = 2.884, SE ± 0.063) PAC was stronger than delta-gamma PAC (M = 2.827, SE ± 0.063). This pattern reversers when theta was the LFP with theta-gamma (M = 2.961, SE ± 0.063) demonstrating stronger coupling than theta-beta (M = 2.737, SE ± 0.063). Posthoc analysis revealed that the LFP by HFA interaction was driven most strongly by the strength of the theta-gamma nMI values. A posthoc paired sample t-test revealed theta-gamma was significantly greater than theta-beta (p = 0.025), whilst there was no significant difference between delta-beta and delta-gamma (p = 0.324). Although the main effect of HFA did not replicate between the first and second halves of the sample, theta-gamma was consistently greater than theta-beta across the first and second halves of the sample. All other fixed effects were not significant in either half of the sample or together in the full sample (Table 5). In summary, significant PAC was present at all ages. When delta was the carrier phase, equal PAC was found for both beta and gamma bands. However, when theta was the carrier phase, there was stronger PAC with the gamma band amplitudes.  Table 5, Tests of fixed effects, from LMEM, assess how PAC varied by low frequency phase (LFP), high frequency amplitude (HFA) and age (4-, 7-or 11-months). Results reported for separate LMEMs run on the first half, second half and full sample of data. Significant results (p<0.05) denoted by bold italic text. The Satterthwaite approximation was applied to approximate the degrees of freedom, due to the missing cases.

Theta/delta PSD power ratio
A significant difference between the theta/delta PSD power ratio across the three ages was also found (Fig. 4), indicated by a one-way ANOVA (F(2,273) = 9.54, p = 9.389x10 -5 ). The full sample submitted to the ANOVA was as follows, 4-months (N=96), 7-months (N=88) and 11-months (N=92). The amount of theta PSD power increased relative to the amount of delta PSD power as our sample aged from 4-to 11-months. For comparison, the child modelling using EEG recorded from children with and without dyslexia aged around 9 years showed that a higher theta/delta ratio was associated with poorer language outcomes (27). Figure 4, Theta / delta PSD ratio. The PSD peak per individual at 4.35 Hz and 1.95 Hz, taken from the spectral decomposition of the EEG signal, was used to create a theta / delta ratio calculation (4.35/1.95). Data provided at 4-months (red), 7-months (green) and 11-months (blue) with standard error bars in black.

Neural measures as predictors of language acquisition
Multivariate linear models were used to test whether the neural findings reported in 3.1 (for the full sample) were related to the different language measures outlined in the methods above. These language measures were selected as most reliable in previous analyses of the Cambridge UK BabyRhythm cohort, as they had the largest number of infants contributing data and scores were neither at floor nor ceiling (3). As noted, the parent-estimated measures were receptive and productive vocabulary scores from the UK-CDI (43)

PSD and language outcomes
The 11-months EEG data were selected for comparison to both the parent-estimated and infant-led language measures, due to the observed developmental maturation of the ~4 Hz peak.
The analyses for the 4-month and 7-month EEG are provided as Tables S4 to S10. As described above, the 4.35 Hz peak was the more robust and consistent of the two theta peaks (see Tables 1 & 2) and therefore was used for predicting the language outcomes (4.05 Hz peak results are also reported in Tables S6-S8). The analyses for the ~1.92 Hz peak are also reported here. 77 infants provided complete data for all the parent-estimated measures and PSD power results, whilst 70 infants provided data for all the infant-led language measures and PSD power.
Regarding global language outcomes, the multivariate models indicated that increased ~1.92 Hz PSD power in response to nursery rhymes did not lead to significant prediction of either infant-led (F(5,65) = 1.257, p= 0.285) nor parent-estimated (F(2,75) = 0.560, p= 0.574) outcomes (Table 6). However, the posthoc univariate linear models revealed a trend for pointing at 12-months to be correlated with increases in ~1.92 Hz power (Table 1a; (Table 7). It is important to note that whilst the posthoc univariate linear model analyses indicated that comprehension and production did not individually make significant contributions to this prediction (Table 7b), the direction of the beta estimates suggest that higher power was associated with fewer known words later in development. Taken together, these data suggest that there is a global effect of theta power on parent-estimated (UK-CDI) language outcomes, but that this is not driven by either single parent-     by bold italic text). Modified alpha level for trends are p = <0.02 for infant-led and p = <0.05 for parent-estimated (denoted by bold text).

mTRF and language outcomes
The 11-months EEG data was selected for analyses with both the parent-estimated and infant-led language measures, due to it being the closest in time to the language measures (please see supplementary information for the earlier age groupings; these did not reach significance). 89 infants provided complete data for all the parent-estimated measures and a cortical tracking value, whilst 82 infants provided data for all the infant-led language measures and a cortical tracking value.
The multivariate linear models (

PAC and language outcomes
Phase amplitude coupling (PAC) at 4-months was selected for the language outcome analyses as PAC did not show any developmental improvement, indicating that the early presence of stronger PAC may be informative. 87 infants provided complete data for all the parent-estimated measures and a PAC value, whilst 81 infants provided data for all the infant-led language measures and a PAC value. Multivariate linear models were run separately to investigate whether delta-beta, delta-gamma, theta-beta or theta-gamma PAC respectively predicted the language outcome measures.  (Table 11b).
In summary, theta-based PAC was associated with an overall increase in both the infantled and parent-estimated global language outcomes. Theta-gamma PAC robustly affected both groups of language measures, while theta-beta PAC affected parent-estimated measures alone.
Delta-based PAC did not make significant contributions to later language outcomes. When gamma was the high frequency amplitude, both infant-led and parent-estimated language outcomes showed significant predictive relations.

Discussion
According to recent extensions of Temporal Sampling theory (4,28) cortical tracking of infant directed speech (IDS) by delta-and theta-band neural signals and delta-theta neural dynamics may be key factors in explaining differences in the developmental trajectories for language acquisition. In the Cambridge UK BabyRhythm infant sample assessed here (N=113), individual differences in delta-band cortical tracking, theta power, theta/delta power ratio, and theta-led PAC were all significant predictors of language outcomes in infants during the first year of life. These infant data are consistent with neural oscillatory factors that have been found to explain individual differences in language development in children, for example the fidelity and synchronicity of delta-band encoding of speech envelope information for phonological development (32,46). They are also consistent with a recent EEG modelling preprint based on child recordings during passive speech listening, which showed that the theta/delta power ratio was related to phonological difficulties in dyslexia and that atypical theta-gamma PAC was related to impaired language outcomes in DLD (27). Further, these infant neural oscillatory factors were broadly consistent in terms of replicating our previous findings with the first half of the infant cohort (1). The presence and maturation of low-frequency (<12 Hz) cortical oscillations (PSD), cortical speech tracking (mTRF) and phase amplitude coupling (PAC) in response to sung IDS was largely similar across both halves of the longitudinal Cambridge UK BabyRhythm cohort.
The replication of stronger delta band tracking than theta band tracking in the youngest (4-month-old) infants suggests that delta band cortical tracking may be a key early building block for language acquisition. Efficient delta-band cortical tracking would enable infants to lock on to the information-rich strong (or stressed) syllables in the speech signal, acoustic landmarks that occur on average every 500ms across languages (47) . This mechanism would enable the infant brain to begin to represent the acoustic structure of human speech (48,49). Dynamic interactions with theta-band cortical tracking may then support further linguistic learning (see below). The importance of delta-band cortical tracking for language outcomes is consistent with child MEG studies, which show that typically-developing children exhibit efficient delta-band tracking of speech-in-noise, but not efficient theta-band tracking (50). The finding that delta band cortical tracking predicted early language comprehension and production is also congruent with research in the adult brain, where delta-band tracking is associated with key languagerelated skills such as perceptual grouping (51,52), discourse-level parsing related to phrasing (53,54) and comprehending speech-in-noise (55).
In many adult studies, the theta band is considered the key linguistic oscillator (24). In our infant cohort, better theta-gamma PAC at 4-months was associated with better outcomes in both the infant-led and parent-estimated global language outcome measures, and theta-beta PAC at 4-months predicted the parent-estimated global measure. Posthoc univariate tests revealed that nonword repetition, a phonological output measure (specifically syllable production at 24months) and CDI production (parental estimation of vocabulary output) were strong drivers of these effects. By contrast, delta-based PAC (with beta or gamma) did not predict any of our language outcome measures. However, not all the effects found for theta-band cortical tracking were positive in nature. Theta-band PSD increased significantly during the first year of life, and greater theta-band PSD was associated with poorer global language outcomes as measured by parental estimates of comprehension and production at 24-months. A larger theta/delta PSD power ratio was also associated with poorer language outcomes, as measured by both parentestimated and infant-led measures. This latter finding underscores the importance of considering neural dynamics when assessing developmental trajectories. In the previous EEG modelling work with 9-year-old children with and without dyslexia that investigated theta/delta ratios, a dyslexia-specific pattern of delta and theta power dynamics during natural speech processing was seen to become progressively more atypical the worse the child's phonological awareness, and this was not related to age (27).
The infant literature is sparse regarding PAC dynamics. For adults, theta-gamma PAC has been suggested as a neural mechanism for processing phonetic information in the speech signal (24,56,57). When the same rhythmic speech stimuli used in the current study were presented to adult participants, theta/gamma coupling was stronger than delta/gamma coupling (17). For our infant participants, presented with the same nursery rhyme stimuli, theta and delta were equally strong carriers of higher-frequency amplitudes. Nevertheless, only theta-based PAC predicted language outcomes. Indeed, it is the phase of theta band rather than delta band oscillations that adjust to speech rate variations for adults, coupling with gamma amplitudes (22).
Whilst both delta and theta phases couple with higher-frequency amplitudes in the infant brain, it seems that theta-based PAC is more strongly predictive of early language outcomes. The posthoc univariate analyses suggested that measures involving speech production, namely nonword repetition and productive vocabulary, were important regarding these relations. Whilst the adult PAC literature shows theta-gamma coupling is directly linked to speech processing (18), deltabeta PAC is particularly associated with measures of speech production via audio-motor dynamics (23,58). As the infants in our cohort continue to develop and produce more speech, it may be that delta-beta PAC will also begin to play a role in predicting language outcomes.
The current study has several limitations. Recording EEG with infants is an inherently noisy process, and this affected our data, as different infants had to be excluded from different analyses, reducing sample size for the longitudinal language outcome assessments. A key source of noisy data is infant fussing and movement, and noise was particularly apparent in the 7month-old and silent state recordings for our sample. A second limitation is that even though our study followed a relatively large cohort of infants (122 recruited and 113 retained), a study with a larger participant pool may have had increased chances of uncovering developmentally important time-lagged associations. Notwithstanding the limitations of the study, here we present a large longitudinal sample demonstrating that neural dynamics to speech stimuli predict early infant language acquisition.

Conclusion
Estimates of the efficiency of low-frequency cortical tracking, phase amplitude coupling and theta-delta power dynamics in an infant's first year of life predict their early language outcomes. Importantly, the neural mechanisms measured here are automatic processes that are inherent to the human brain, and as such represent physiological priors for language learning in a Bayesian sense (59). Accordingly, individual differences in the efficiency of these mechanisms in the pre-verbal stage of language learning assessed here are unlikely to reflect top-down linguistic processes such as language comprehension (12). Delta-band cortical tracking at 11 months and theta−gamma PAC at 4 months of age significantly predicted language comprehension and production at 24 months. Conversely, increased theta PSD power and a larger theta/delta ratio were associated with decreased performance on the language outcome measures. While other studies have also indicated a predictive relationship between cortical tracking efficiency in infancy and language development (26), the current study is the first to show that individual differences in PAC and theta/delta power dynamics also predict infant language outcomes. This data is currently being analysed with other techniques as part of the ongoing longitudinal Cambridge UK BabyRhythm project. Once this research has been completed, the raw EEG files will be uploaded to the same OSF server as above (early 2023). In the meantime, the raw data can be made available upon request.

Funding sources
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 694786).