Introduction

The auditory system must faithfully encode and process rapid variations in acoustic signals and precisely extract important features, such as frequency, amplitude modulation, and sound onsets and offsets. This task is accomplished by a complex, interconnected, and parallel system. Auditory information enters the brainstem from the cochlea via the auditory nerve and ascends via both lemniscal and nonlemniscal auditory pathways1. Neurons in the lemniscal (or “primary/classical”) pathway are thought to be the main bearers of temporally varying information, with synapses in the brainstem (cochlear nucleus and superior olivary complex), midbrain (central nucleus of the inferior colliculus), thalamus (ventral division of the medial geniculate body), and the primary auditory cortex. The fidelity of sound encoding in these ascending pathways affects all cognitive processes that use the information—and in turn, these ascending pathways are affected by cognitive processes via the vast efferent system. Consequently, sound encoding is relevant to the study of many higher-level functions central to human communication, including speech and music.

Frequency-following responses (FFRs) are recordings of phase-locked neural activity that is synchronized to periodic and transient aspects of sound. Traditionally, FFRs have been measured in humans as electrophysiological potentials to sound, recorded from the scalp. For guidance on collecting FFRs, see Skoe and Kraus for a tutorial on EEG-FFR collection2, Krizman and Kraus for a tutorial on EEG-FFR analysis3, and Coffey et al. for technical details on the MEG-FFR4 (see Box 1 for key points).

Human FFRs were first measured in the 1970s5. Identified as subcortical in origin, they were viewed as a potential supplement to behavioral audiometry. Over the years, the field has moved away from treating the subcortical auditory system as a bottom-up, hardwired conduit for sound, and is increasingly recognizing the contribution of top-down influences within the context of distributed neural networks. Studies using the FFR have played an instrumental role in this evolution of thinking.

The FFR is a noninvasive means of reliably measuring the fidelity and precision with which the brain encodes sound. Measures derived from the FFR (e.g. timing, amplitude, consistency, and pitch tracking; see Fig. 1) reveal an individual’s mapping between a stimulus and the brain’s activity, which may be impaired in disease or enhanced through expertise. The FFR has proven essential to answering basic questions about how our auditory system manages complex acoustic information, how it integrates with other senses, and how both tasks are shaped by experience6,7,8. FFR measures are related to the ability to differentiate sounds and to hear targets in noise, as well as to experience with music, tonal languages, or multilingualism8,9,10,11,12,13. The FFR can reveal the plastic nature of the human auditory system, including its potential to change over short timescales and its sensitivity to enriched and impoverished experiences with sound13,14,15,16,17,18,19,20,21,22.
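To make two of these measures concrete, the sketch below computes the spectral amplitude at the fundamental frequency (F0) of an averaged response and a split-half consistency score (the correlation between the averages of even and odd trials). It is a minimal illustration on simulated data, assuming a 100 Hz phase-locked component buried in Gaussian noise; the sampling rate, trial count, noise level, and function names are hypothetical and do not reproduce any published analysis pipeline (see Krizman and Kraus3 for established methods).

```python
import math
import random


def spectral_amplitude(signal, fs, freq):
    """Amplitude of the sinusoidal component of `signal` at `freq` Hz (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def pearson_r(a, b):
    """Pearson correlation between two equal-length waveforms."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate_trials(n_trials, fs, dur, f0, amp, noise_sd, seed=0):
    """Hypothetical single trials: an F0-locked sinusoid plus independent Gaussian noise."""
    rng = random.Random(seed)
    n = int(fs * dur)
    clean = [amp * math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
    return [[c + rng.gauss(0, noise_sd) for c in clean] for _ in range(n_trials)]


def average(trials):
    """Sample-by-sample mean across trials."""
    return [sum(col) / len(trials) for col in zip(*trials)]


fs, dur, f0 = 10_000, 0.2, 100.0  # assumed sampling rate (Hz), duration (s), F0 (Hz)
trials = simulate_trials(n_trials=200, fs=fs, dur=dur, f0=f0, amp=0.5, noise_sd=2.0)
f0_amp = spectral_amplitude(average(trials), fs, f0)
# Split-half consistency: correlate the average of even trials with that of odd trials.
consistency = pearson_r(average(trials[0::2]), average(trials[1::2]))
print(f"F0 amplitude: {f0_amp:.2f}, split-half r: {consistency:.2f}")
```

Because noise averages out across trials while phase-locked activity does not, both measures improve with trial count; with fewer trials or more noise, the split-half correlation drifts toward zero.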

Fig. 1

The FFR is a means of non-invasively measuring the brain’s ability to encode sound, as well as the general integrity of the auditory system. a The FFR is measured using EEG or MEG while periodic or quasi-periodic sounds such as vowels, consonant-vowel syllables, or tones are presented (see also Box 1). The morphology of the averaged evoked response differs between individuals as a function of pathology and expertise. FFRs can be visualized in b the time domain, c the frequency domain, and d as the accuracy with which the response tracks changes in frequency content over time in spectrally dynamic stimuli. e Classification accuracy derived from machine learning techniques provides an additional metric.

The FFR is useful to address questions concerning impaired auditory processing in populations with impaired cochlear function23,24,25,26, and in neurodevelopmental speech and language disorders27,28,29,30,31,32 or autism33,34. It can also be used to study maturational35,36 and aging-related changes37,38, sex differences in auditory functions39, and improvement caused by interventions15,40,41,42. More broadly, the FFR can provide an index of neurological health, for instance, in populations with acquired neurological disorders (e.g. concussion)43. For a comprehensive review of FFR and its role in indexing the effects of experience on the auditory brain, see refs. 44,45.

A fundamental question is what source(s) underlie the FFR in humans. This is important for basic scientific knowledge for its own sake and also because a greater understanding of the FFR’s sources can inform its translation and deployment in medicine. Methods have emerged that allow for some spatial separation of FFR sources in humans (i.e., brainstem, thalamus, cortex4,46,47). These studies have reopened questions about the degree to which activity in different subcortical and cortical centres contributes to the well-studied scalp-recorded FFR and whether sources identified using other methods generalize to the traditional, scalp-recorded response. To be clear: while many questions remain to be answered, we do not think the FFR is solely generated in the auditory cortex, nor do we exclude the possibility of cortical contributions under certain circumstances.

Here we aim to update our evolving understanding of the FFR in a way that is accessible to an interdisciplinary audience, and to outline a roadmap that promotes a more integrative understanding of the FFR and its potential for studying human auditory function.

Historical roots and changing views

To our knowledge, the term “frequency-following response” was coined in the late 1960s by Worden and Marsh48, who described it in an animal model. Initially investigated with low-frequency pure tones (<500 Hz), FFRs were an appealing alternative or adjunct to other objective measures of auditory function available at the time (e.g., auditory brainstem responses, electrocochleograms), because those measures have poor frequency specificity and are less effective at evoking responses to stimulus frequencies below 500 Hz.

By the 1990s, however, evidence began to emerge that the FFR reflected more than mere stimulus audibility. Gary Galbraith, a pioneer in the use of richer FFR stimuli such as two-tone complexes, missing fundamental stimuli, and speech, reported that the FFR was affected by attention49 and by how a particular speech stimulus was perceived by the listener50. Galbraith’s insight that “the FFR is a unique tool for understanding the most important of all auditory capacities: the coding and processing of human language” has proven prescient as the 21st century has seen a dramatic increase in investigations into speech-evoked FFR and how response properties relate to human communication. With these discoveries has come a renewed interest in the investigation of the FFR above and beyond its ability to signal sound detection. Instead, as we detail below, the FFR is now seen as a powerful tool to understand the neurophysiological bases of complex auditory behaviors in humans, including speech and music.

Other evoked responses, often referred to as “cortical auditory evoked potentials” or “late-latency responses” (and their variants, such as the mismatch negativity or P300), are also derived from EEG recordings but are typically low-pass filtered (<40 Hz); they generally reflect the response to stimulus onset and later processing stages. What distinguishes the FFR is the precision with which it retains the morphological features of the stimulus waveform, thereby revealing how the auditory system responds to the stimulus’s acoustic elements. An uncommon wealth of analysis strategies accompanies interpretation of this multifaceted response (see Fig. 1 and Box 2). The past 10 years have seen refinements of FFR analyses that capitalize on the richness of the response3.

Evidence for multiple sources in human scalp-recorded FFR

The biological sources of the FFR have been a topic of debate since the early days of the technique51,52,53. Efforts to clarify the sources of far-field responses have yielded greater understanding of how and where auditory information is integrated across auditory and non-auditory regions and timescales, and the degree to which auditory centres are subject to neuroplasticity54,55.

Our view of the FFR’s origins relies on three axioms about the auditory system.

  1.

    The central auditory system is a network of intertwined structures that extend across medulla, pons, midbrain, thalamus, and temporal lobes of cortex. This network is intrinsically connected to other sensory systems and motor, cognitive, and reward systems. To be sure, cells and circuits within each of the nuclei have specialized functions and properties; but none of these cells or circuits operates in a vacuum. The interactivity of the system means that even something as simple as a primary auditory cortex neuron’s tuning curve has to be considered within the broader context of an integrative and plastic system (reviewed in Kraus and White-Schwoch44). Thus, any consideration of one or more sources of the FFR also has to consider how those sources interact with each other and with non-auditory brain circuits. It is also important to bear in mind that the same auditory structure can yield different neural activity depending on the sound’s context29,56,57,58.

  2.

    Phase-locking, the phenomenon by which neurons discharge at a particular phase within the stimulus cycle, is a common feature throughout the auditory system. In this way, the recurring, periodic elements of the stimulus (e.g., the period of the fundamental frequency, the period of the amplitude modulation frequency) are encoded in the synchronous activity of a neuronal population. Ascending the lemniscal pathway, the upper frequency limit of phase-locking decreases. (For more on auditory system phase-locking see Box 3 and Fig. 2.)

    Fig. 2

    Schematic of frequency ranges of speech and music and the relative contributions of subcortical and cortical phase-locking to the frequency-following response. Phase-locking limits of neurons and neuronal assemblies in the human auditory system are not yet known, but can be partly inferred from animal models. Despite these limits, the frequency-following response is predictive of the functionality of the entire auditory system.

  3.

    The auditory system is plastic. Neurons throughout the auditory axis exhibit rapid plasticity based on stimulus context (e.g., Carbajal and Malmierca59) and the interactive nature of the auditory system makes each centre subject to non-auditory input, whether by changes in overall brain physiology or metabolism, changes in environmental input, and/or changes in top-down cognitive input to refine sensory representation. Thus, while an FFR might measure the current functional state of stimulus representation in the auditory brain, that functional state reflects the legacy of this plasticity.
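Phase-locking (axiom 2) is commonly quantified at the single-neuron level with vector strength: each spike is mapped to a unit vector at its phase within the stimulus cycle, and the length of the mean vector ranges from 0 (no phase preference) to 1 (every spike at the same phase). The sketch below, assuming simulated spike trains with a fixed Gaussian timing jitter of 0.2 ms (a hypothetical value), shows how the same temporal imprecision that barely affects locking at 100 Hz largely abolishes it at 2 kHz. It illustrates the metric only and is not a model of any particular auditory nucleus; the function names are illustrative.

```python
import math
import random


def vector_strength(spike_times, freq):
    """Length of the mean phase vector: 0 = no phase preference, 1 = perfect locking."""
    if not spike_times:
        return 0.0
    phases = [2 * math.pi * freq * t for t in spike_times]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(spike_times)


def simulate_spikes(freq, n_spikes, jitter_s, seed=0):
    """Hypothetical spike train: one spike per cycle at a preferred phase, plus Gaussian jitter."""
    rng = random.Random(seed)
    period = 1.0 / freq
    preferred_phase = 0.25 * period  # assumed preferred firing phase
    return [k * period + preferred_phase + rng.gauss(0, jitter_s) for k in range(n_spikes)]


# Same 0.2 ms timing jitter, applied at a low and a high stimulus frequency.
vs_low = vector_strength(simulate_spikes(100.0, 500, 0.2e-3), 100.0)
vs_high = vector_strength(simulate_spikes(2000.0, 500, 0.2e-3, seed=1), 2000.0)
print(f"VS at 100 Hz: {vs_low:.2f}, VS at 2000 Hz: {vs_high:.2f}")
```

For Gaussian jitter σ, the expected vector strength is roughly exp(−(2πfσ)²/2), which makes explicit why a fixed timing precision imposes an upper frequency limit on phase-locking.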

What supports the conventional wisdom that the FFR has a subcortical origin?

Our current understanding of sources of human scalp-recorded FFR is the culmination of non-invasive studies in humans and invasive studies in animal models, each of which has advantages and limitations. The inferior colliculus (IC) has often been considered the dominant source of the FFR derived from EEG scalp recordings (EEG-FFR) (reviewed in Chandrasekaran and Kraus60), based on the auditory system’s reduced capacity for high-frequency phase-locking at higher centres. Additional evidence comes from direct recordings in animal models, in which the neural sources of the FFR have been studied by selectively taking different auditory structures offline by cooling, lesioning, or pharmacological manipulation. For example, the scalp-recorded FFR was abolished or strongly reduced by cryogenic blockade of the IC in cats51 and in human patients with focal IC lesions52, confirming that the IC is an important FFR signal generator. While these experiments ruled out more peripheral sources, they cannot rule out thalamic or cortical sources: since the IC is an obligatory station of the afferent pathway, blocking IC activity fails to disambiguate IC vs. thalamocortical contributions. Approaching this question from the other direction, studies in cats and rabbits showed that FFRs close to 100 Hz remained largely unaffected by decreased auditory cortex function, but were influenced by lesions to the inferior colliculus61. Also noteworthy is that speech-evoked FFRs and evoked responses to amplitude-modulated tones recorded directly from subcortical structures in animals strongly resemble those recorded from the brain’s surface and those recorded to the same stimuli in humans62,63.

The FFR’s short stimulus-to-response latency of ~5–9 ms is often quoted as evidence of a subcortical origin (e.g. ref. 64), as the IC has a latency of 5–7 ms. However, latency-based arguments are difficult to defend as FFR latencies vary considerably according to stimulus characteristics such as sound pressure level, frequency, and amplitude envelope, and stimulus-to-response latencies much longer than 7 ms have been reported between the stimulus and EEG-FFR in some studies (e.g. 14.6 ms65). Furthermore, intracranial recordings from Heschl’s gyrus show that the first responses to sound in the cortex can occur as early as ~9 ms post stimulus onset66.
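Stimulus-to-response latency is often estimated by cross-correlating the stimulus with the response and taking the lag of maximal correlation within a plausible search window. The sketch below does this on simulated data, assuming a linear frequency glide from 100 to 200 Hz as the stimulus and a response that is a delayed (8 ms), attenuated, noisy copy of it; the search window (3–15 ms), sampling rate, and helper names are hypothetical. The choice of window matters because, for a strictly periodic stimulus, the cross-correlation repeats every stimulus period.

```python
import math
import random

FS = 10_000  # assumed sampling rate in Hz


def glide(f_start, f_end, dur, fs=FS):
    """Linear frequency glide (chirp), a stand-in for a spectrally dynamic stimulus."""
    n = int(dur * fs)
    k = (f_end - f_start) / dur
    return [math.sin(2 * math.pi * (f_start * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def simulate_response(stimulus, delay_ms, gain, noise_sd, seed=0, fs=FS):
    """Hypothetical response: a delayed, scaled copy of the stimulus plus Gaussian noise."""
    rng = random.Random(seed)
    d = round(delay_ms * fs / 1000)
    delayed = [0.0] * d + [gain * x for x in stimulus[: len(stimulus) - d]]
    return [y + rng.gauss(0, noise_sd) for y in delayed]


def estimate_lag_ms(stimulus, response, min_ms, max_ms, fs=FS):
    """Lag (ms) in [min_ms, max_ms] that maximizes normalized stimulus-response correlation."""
    best_lag, best_r = 0, -math.inf
    for lag in range(round(min_ms * fs / 1000), round(max_ms * fs / 1000) + 1):
        x, y = stimulus[: len(stimulus) - lag], response[lag:]
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        r = num / den if den else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * 1000 / fs, best_r


stim = glide(100.0, 200.0, 0.2)
resp = simulate_response(stim, delay_ms=8.0, gain=0.3, noise_sd=0.3)
lag_ms, r = estimate_lag_ms(stim, resp, min_ms=3.0, max_ms=15.0)
print(f"estimated lag: {lag_ms:.1f} ms (r = {r:.2f})")
```

With a steady 100 Hz tone instead of the glide, correlation peaks would recur every 10 ms, so the estimate would become ambiguous whenever the search window spans more than one period.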

Rethinking FFR sources: The multiple generator hypothesis

There have long been hints that the FFR comprises multiple generators. We advance the hypothesis that the EEG-FFR is an aggregate response reflecting multiple auditory stations, including the auditory nerve, cochlear nucleus, inferior colliculus, thalamus, and cortex, and that the specific mixture of sources may vary depending on the recording techniques, stimulus, and participant demographics. This hypothesis motivates several predictions.

  • Prediction 1: Decomposition of a multichannel EEG signal should indicate multiple, independent components. In 1978, Stillman et al. recorded FFRs to tones with various fundamental frequencies using only two EEG channels, and concluded that the human FFR is a composite of several waveforms whose relative influence differs as a function of frequency53. Kuwada et al. recorded human EEG and electrophysiology in rabbits and concluded that surface recordings are composite responses from multiple brain generators62. Two-channel recordings and principal component analysis on multichannel EEG data have demonstrated separable FFR components that relate to stimulus properties, such as the presence or absence of energy at the fundamental frequency64,67,68.

  • Prediction 2: Multimodal source modeling should indicate multiple generators of the scalp-recorded signal. Coffey et al. reported that FFRs to speech (with f0 ~100 Hz) could be non-invasively recorded using MEG, which allows spatial source localization. MEG-FFR contributions included not only subcortical sources (the cochlear nucleus, inferior colliculus, and medial geniculate body of the thalamus) but also the auditory cortices, with a right-hemisphere predominance4. Using a combination of EEG and functional magnetic resonance imaging (fMRI), a subsequent study confirmed that hemodynamic activity in the right auditory cortex was related to individual differences in EEG-FFR f0 strength, consistent with the hypothesis that phase-locked activity in auditory cortex has a hemodynamic signature69. Bidelman found corroborating evidence of multiple sources of the FFR, including a cortical one, using distributed source modeling of multichannel EEG recordings to a speech stimulus (with f0 in the same range as in Coffey et al.). This EEG approach revealed subcortical sources contributing more than the auditory cortex46 (note that thalamic sources did not appear to be included in the analysis).

  • Prediction 3: Individual differences in FFR components should correlate with behavior if they are functionally relevant. Zhang and Gong used principal component analysis on multichannel EEG data, and found multiple, separable components with different scalp topographies, only one of which correlated with pitch perception; they concluded that phase-locked activity at different sources differentially relates to behavior68. Coffey et al. observed significant correlations between the magnitude of the right auditory cortical MEG-FFR response and pitch perception thresholds, as well as with musical training, suggesting that phase-locked activity in this region provides behaviorally relevant information4. Separately, while the MEG-FFR strength at subcortical and cortical sources was predictive of speech-in-noise (SIN) perception, the strongest correlations were observed with the right auditory cortex70. In a cross-modal attention task, Hartmann and Weisz confirmed the strong contribution of cortical regions to the MEG-FFR and found that only the right auditory cortex was significantly affected by attention71.

  • Prediction 4: Different stimulus frequencies will bias certain generators. Tichko and Skoe conducted an extensive investigation that measured EEG-FFR amplitude to complex tones as a function of fundamental frequency72. EEG-FFRs to stimuli with frequencies between 16.35 and 880 Hz showed generally decreasing amplitude with increasing frequency, but with local maxima at ~44, 87, 208, and 415 Hz. The local maxima suggest an EEG-FFR with multiple underlying generators whose activity interacts constructively or destructively at the scalp depending on the stimulus frequency (Fig. 3a). The EEG-FFR interference pattern that produced these local maxima was modeled by the authors as the summation of multiple phase-locked signals, all phase-locked to the stimulus frequencies but with different latencies (i.e., neural conduction times). The authors suggested that recording protocol, electrode montage, recording quality (i.e. signal-to-noise ratio), and subject demographics influence the EEG-FFR interference patterns because each one of these manipulations alters the strength of phase-locking or the degree to which this phase-locking can be detected at the scalp.

    Fig. 3

    a Scalp-recorded frequency-following responses (FFRs) may reflect, in part, the summation of phase-locked activity from different sources, each with a characteristic lag relative to the onset of the stimulus. The putative sources of the FFRs include the cochlea, auditory nerve (AN), cochlear nucleus (CN), superior olivary complex (SOC), inferior colliculus (IC), medial geniculate body (MGB), and auditory cortex (AC). b Electrode montage influences the relative contribution of sources in the scalp-recorded signal: for example, the montages shown in the left and central panels, which include an electrode at the mastoid, likely include a greater contribution from peripheral sources than does the montage illustrated on the right, which references a single vertex channel to the average of other scalp electrodes.

  • Prediction 5: Different recording techniques will differ in their sensitivity to different sources. Source-localized EEG-FFR and MEG-FFR do not show identical patterns of source strength4,46. Results from MEG should not be directly applied to EEG because of their differing sensitivities to radial vs. tangential currents and to superficial vs. deep sources (discussed with reference to FFR in ref. 4); although both are sensitive to the electrochemical current flows within and between brain cells, they provide partly overlapping and partly complementary information73,74,75. Still, even when using only EEG-FFR, electrode placement and referencing appear to affect signal content. Coffey et al. compared two common electrode montages and found only a moderate correlation in their sensitivity to behavioral measures76; these montages, often used interchangeably, may thus differ in the combination of sources to which they are sensitive (Fig. 3b). Likewise, reaction times on an auditory task were noted to track the amplitude of the EEG-FFR recorded with an electrode montage that favors more central subcortical sources, but not with a simultaneously recorded montage that was more peripherally biased77.
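Because re-referencing is a linear operation, the effect of montage on the source mixture (Prediction 5, Fig. 3b) can be sketched by treating each electrode as a weighted sum of source activity and subtracting the reference. The lead-field weights below are invented for illustration and are not derived from any head model; the three source labels and the helper names are likewise hypothetical. The point is only that the same sources yield different relative contributions under a vertex-to-mastoid montage and a vertex-to-average montage.

```python
# Hypothetical lead-field weights: how strongly each (invented) source projects to each electrode.
# Values are illustrative only, not derived from a head model.
LEAD_FIELD = {
    "Cz": {"peripheral": 0.2, "subcortical": 1.0, "cortical": 0.6},
    "mastoid": {"peripheral": -0.8, "subcortical": 0.3, "cortical": 0.1},
    "Fz": {"peripheral": 0.1, "subcortical": 0.9, "cortical": 0.5},
    "Pz": {"peripheral": 0.1, "subcortical": 0.8, "cortical": 0.3},
    "C3": {"peripheral": 0.0, "subcortical": 0.6, "cortical": 0.7},
    "C4": {"peripheral": 0.0, "subcortical": 0.6, "cortical": 0.7},
}


def referenced_weights(active, references):
    """Effective source weights of `active` minus the mean of the `references` electrodes."""
    return {
        src: LEAD_FIELD[active][src]
        - sum(LEAD_FIELD[ref][src] for ref in references) / len(references)
        for src in LEAD_FIELD[active]
    }


def relative_share(weights):
    """Fraction of the total absolute weight contributed by each source."""
    total = sum(abs(w) for w in weights.values())
    return {src: abs(w) / total for src, w in weights.items()}


vertex_mastoid = relative_share(referenced_weights("Cz", ["mastoid"]))
vertex_average = relative_share(referenced_weights("Cz", ["Fz", "Pz", "C3", "C4"]))
for src in vertex_mastoid:
    print(f"{src:12s} Cz-mastoid {vertex_mastoid[src]:.2f}   Cz-average {vertex_average[src]:.2f}")
```

Treating weights as scalars is a simplification: sources that are phase-locked with different latencies combine as phasors, so in practice the montage effect also depends on stimulus frequency.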

A thread through this work is that recording modalities, stimuli, and stimulus presentation paradigms all may influence the mix of sources underlying the recorded signal. One must therefore exercise caution in extrapolating conclusions from one modality or paradigm to the results of another.
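The interference account described under Prediction 4 can be sketched as a phasor sum: each generator contributes a component phase-locked to the stimulus frequency f and delayed by its own latency τₖ, so the scalp amplitude is |Σₖ gₖ(f)·exp(−i2πfτₖ)|. The latencies, gains, and first-order roll-offs below are hypothetical choices made for illustration, not the parameters of Tichko and Skoe’s model, and the resulting peak frequencies should not be compared with the published local maxima. The sketch nonetheless reproduces the qualitative pattern: amplitude falls with frequency overall, with local maxima where the delayed components add constructively.

```python
import cmath
import math

# Hypothetical sources: (label, latency in ms, gain, phase-locking cutoff in Hz).
SOURCES = [
    ("peripheral", 1.5, 1.0, 1000.0),
    ("midbrain", 6.0, 1.0, 400.0),
    ("cortex", 12.0, 0.8, 100.0),
]


def source_gain(gain, cutoff_hz, freq_hz):
    """First-order low-pass roll-off, standing in for each generator's phase-locking limit."""
    return gain / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)


def scalp_amplitude(freq_hz, sources=SOURCES):
    """Magnitude of the phasor sum of all phase-locked sources at one stimulus frequency."""
    total = sum(
        source_gain(g, fc, freq_hz) * cmath.exp(-2j * math.pi * freq_hz * lat_ms / 1000.0)
        for _, lat_ms, g, fc in sources
    )
    return abs(total)


def local_maxima(freqs, amps):
    """Frequencies whose amplitude exceeds both neighbours on the grid."""
    return [f for f, prev, cur, nxt in zip(freqs[1:-1], amps, amps[1:], amps[2:])
            if cur > prev and cur > nxt]


freqs = list(range(20, 901))  # 1 Hz grid from 20 to 900 Hz
amps = [scalp_amplitude(f) for f in freqs]
peaks = local_maxima(freqs, amps)
print("local maxima (Hz):", peaks)
```

Changing any latency shifts the peaks, which is one way to see why montage, recording quality, and participant characteristics could alter the interference pattern.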

In summary, the extent of contributions of sources to the scalp-recorded EEG-FFR under different experimental conditions and in different populations is an unsettled topic. Yet the discovery that different recording techniques implicate different underlying generators increases the richness of what FFR can tell us. We find ourselves sympathetic to the view that the EEG-FFR signal can represent a mixture of sources including the auditory nerve, CN, IC, MGB, and cortex, and that the contribution of each source could differ depending on where and how the response is recorded. Regardless of the “real-time” sources of an FFR, and the possibility that one source may dominate the response, we want to reemphasize that each of those potential sources operates in concert with the others (and with non-auditory systems) to shape auditory function.
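The decomposition logic of Prediction 1 can be illustrated with principal component analysis (PCA) of a simulated multichannel recording in which two sources with distinct, hypothetical scalp topographies are mixed into four channels. The sketch below finds the leading components by power iteration on the channel covariance matrix, with deflation between components, written in plain Python for self-containment. It assumes, unrealistically, that the two topographies are orthogonal: PCA can only return orthogonal components, so with real, overlapping topographies it separates sources only approximately, and other decompositions (e.g., independent component analysis) may be preferred. All names and values are illustrative.

```python
import math
import random


def covariance(channels):
    """Channel-by-channel sample covariance (channels: list of equal-length sample lists)."""
    n = len(channels[0])
    centered = []
    for ch in channels:
        mean = sum(ch) / n
        centered.append([x - mean for x in ch])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered] for ci in centered]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_components(cov, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = normalize([1.0 / 2 ** i for i in range(len(cov))])  # arbitrary non-degenerate start
        for _ in range(iters):
            v = normalize(mat_vec(cov, v))
        lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient
        comps.append((lam, v))
        # Deflate: remove the component just found before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return comps


fs, n = 10_000, 2_000  # assumed sampling rate (Hz) and number of samples
rng = random.Random(0)
src_a = [2.0 * math.sin(2 * math.pi * 100 * i / fs) for i in range(n)]  # hypothetical source A
src_b = [1.0 * math.sin(2 * math.pi * 230 * i / fs) for i in range(n)]  # hypothetical source B
topo_a = [0.5, 0.5, 0.5, 0.5]    # hypothetical scalp topography of A (unit length)
topo_b = [0.5, -0.5, 0.5, -0.5]  # hypothetical topography of B, orthogonal to A
channels = [[wa * a + wb * b + rng.gauss(0, 0.2) for a, b in zip(src_a, src_b)]
            for wa, wb in zip(topo_a, topo_b)]
(lam1, pc1), (lam2, pc2) = top_components(covariance(channels), k=2)
print("PC1:", [round(x, 2) for x in pc1], "PC2:", [round(x, 2) for x in pc2])
```

The recovered components match the assumed topographies up to sign, and the eigenvalues rank them by the variance each source contributes.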

Approaches to test hypotheses about FFR origins

To make further progress on these concepts, it will be useful to employ methods whereby FFR data are collected simultaneously with other data that unambiguously reflect cortical and network activity70,78,79,80. Functional connectivity measures that quantify the strength and direction of information transfer may also prove useful when applied to spatially resolved signals such as EEG/MEG in source space81. Combinations of different methods could be especially valuable, such as EEG-based FFR together with fMRI or functional near-infrared spectroscopy (fNIRS)69,82: fMRI or fNIRS can quantify functional networks throughout the brain, whose properties could then be related to FFR variables.

Recent animal neurophysiology studies have demonstrated that an FFR similar to that of humans can be recorded in awake monkeys83, confirming previously demonstrated analogs between humans and anesthetized non-human animals37,38,84,85. Awake animal preparations could be particularly enlightening because they allow simultaneous recording from multiple sites in behaving animals. Neurophysiological studies in animals and humans86 could provide a ground truth comparison for FFR strength estimates, establishing cellular-level correlates of observable EEG signals and their changes with plasticity. Another approach would be to combine FFR measurements with brain stimulation of the auditory cortex.

There is also still more to learn about the “old-fashioned” scalp-recorded FFR. Much work to date has focused on the lower-frequency components of the response relating to the fundamental frequency of the stimulus, even though there are approaches that bias responses toward high-frequency cues such as speech formants. A wealth of analysis techniques accompanies the interpretation of the FFR; see Krizman and Kraus3. A deeper understanding of these FFR components can enrich our understanding of complex auditory behaviors. And, when applied in tandem with animal research and other techniques, these analyses can further our understanding of the generators underlying responses in these relatively simple paradigms.
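One widely used way to bias the analysis toward either the envelope or higher-frequency fine structure, which we assume is among the approaches alluded to here, is to present the stimulus in two opposite polarities and either add or subtract the two averaged responses. Components that follow the stimulus envelope are largely invariant to polarity and survive addition, whereas components that follow the fine structure (such as energy near formants) invert with polarity and survive subtraction. The sketch below shows this on idealized, noise-free signals; the 100 Hz and 700 Hz components, their amplitudes, and the helper names are hypothetical.

```python
import math

FS, N = 10_000, 2_000  # assumed sampling rate (Hz) and number of samples (0.2 s)


def tone(freq_hz, amp):
    """Sinusoid sampled at FS for N samples."""
    return [amp * math.sin(2 * math.pi * freq_hz * i / FS) for i in range(N)]


def amplitude_at(signal, freq_hz):
    """Amplitude at freq_hz from a single-bin DFT."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / FS) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / FS) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Idealized responses: the envelope-locked part is identical for both polarities,
# while the fine-structure-locked part flips sign with stimulus polarity.
envelope = tone(100.0, 0.4)  # hypothetical component at F0
fine = tone(700.0, 0.2)      # hypothetical component near a formant
resp_pos = [e + f for e, f in zip(envelope, fine)]
resp_neg = [e - f for e, f in zip(envelope, fine)]

added = [(p + q) / 2 for p, q in zip(resp_pos, resp_neg)]       # retains the envelope part
subtracted = [(p - q) / 2 for p, q in zip(resp_pos, resp_neg)]  # retains the fine-structure part
for name, sig in (("added", added), ("subtracted", subtracted)):
    print(f"{name:10s} 100 Hz: {amplitude_at(sig, 100.0):.2f}  700 Hz: {amplitude_at(sig, 700.0):.2f}")
```

Real responses depart from this ideal (for example, through polarity-sensitive distortion components), so added and subtracted responses are best read as emphasizing, rather than isolating, envelope and fine-structure encoding.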

Finally, new methods to collect FFR offer many interesting possibilities for future research. For example, an exciting future direction is to record FFR to continuous, natural speech or other signals, instead of using the traditional paradigm in which a single stimulus is repeated87,88,89,90. Combined with free-field recordings91, portable FFR systems92, and/or wearable technologies93, these methods open opportunities to examine FFR in real-world settings. On the analytical front, machine learning algorithms have recently been developed that allow single-trial FFR classification94,95, which could have many applications, for instance as neurofeedback in training paradigms.
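As a toy illustration of single-trial classification, the sketch below builds an average template for each of two stimuli from training trials and assigns each held-out trial to the template with which it correlates best (a nearest-centroid rule). This is deliberately simpler than the machine-learning methods in the cited studies, which it does not reproduce; the two tones (100 and 130 Hz), the noise level, the trial counts, and all function names are hypothetical.

```python
import math
import random

FS, N = 10_000, 2_000  # assumed sampling rate (Hz) and samples per trial


def simulate(freq_hz, n_trials, rng, amp=0.5, noise_sd=2.0):
    """Hypothetical noisy single-trial FFRs to a tone at freq_hz."""
    clean = [amp * math.sin(2 * math.pi * freq_hz * i / FS) for i in range(N)]
    return [[c + rng.gauss(0, noise_sd) for c in clean] for _ in range(n_trials)]


def mean_waveform(trials):
    """Sample-by-sample mean across trials."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def corr(a, b):
    """Pearson correlation between two equal-length waveforms."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def classify(trial, templates):
    """Nearest-centroid rule: pick the label whose template correlates best with the trial."""
    return max(templates, key=lambda label: corr(trial, templates[label]))


rng = random.Random(42)
data = {"stim_A": simulate(100.0, 60, rng), "stim_B": simulate(130.0, 60, rng)}
# Build templates from the first half of each stimulus's trials; test on the held-out half.
templates = {label: mean_waveform(trials[:30]) for label, trials in data.items()}
tests = [(label, t) for label, trials in data.items() for t in trials[30:]]
accuracy = sum(classify(t, templates) == label for label, t in tests) / len(tests)
print(f"single-trial accuracy: {accuracy:.2f} (chance = 0.50)")
```

Holding out test trials from template construction matters: classifying the same trials used to build the templates would inflate accuracy.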

Network dynamics and the “functional view” of the FFR

Contemporary approaches in systems and cognitive neuroscience emphasize the concept that the nervous system functions as an integrated set of complex networks, comprising various interconnected nodes and hubs at which distinct operations take place96, and from whose interactions complex cognition emerges. This perspective strongly informs our view that the auditory nervous system exhibits extensive bidirectional cortical–subcortical and ipsilateral–contralateral connectivity (in addition to bidirectional connectivity with other sensory and cognitive systems). In turn, auditory cortex may itself be considered a hub97 for the ventral and dorsal corticocortical loops that are known to underwrite auditory cognition including auditory object recognition, localization, speech, and music98,99,100,101. Thus, we may consider the entire auditory system as consisting of a number of conjoined complex networks, each of which is of course far from fully characterized at this point.

Taking this idea of a highly interconnected nervous system as a framework, we suggest that the FFR serves as an index of the functional properties of the subcortical and early cortical parts of the auditory system. By virtue of the interconnectedness of networks, the FFR is a snapshot of auditory processing. It also seems that the FFR would be influenced by, and hence be relevant to, the corticocortical loops as well. Although direct evidence for such network-level influences remains sparse, the modulation of FFR parameters associated with training-induced plasticity or with cortical dysfunction, as mentioned above, may be one instantiation of this phenomenon15,102. Similarly, the proposals that the FFR may be influenced by attention71,103,104,105 (but see Varghese et al.106), arousal state107, or task demands76,86,108, may constitute another example. Conceptually similar is the idea that stimulus-specific adaptation (and mismatch negativity) were originally considered cortical109, but we now know that they reflect an integrated auditory change detection response56,57,110,111.

It is our view that the FFR should be thought of as an aggregate measure of the response of the auditory system, reflecting its cumulative prior history. Specific auditory brain centres may contribute differently to a measured response, but those centres function jointly, and in the context of broader neural networks. This gives us the “functional view” of the FFR—we see it as a measure of how well the entire brain is coding sound features much more than as a reflection of activity within any single nucleus, because the nuclei are embedded in complex functional networks. Distinct computations may happen at local nodes, but the functional metrics can be considered as an emergent property of the interactions between nodes. Considering the FFR in this way leads to the development of systems-level hypotheses that should encourage understanding of the relationships between the FFR and other neural features. For example, combining FFR measures with functional MRI may prove useful in delineating the interactions between auditory representations and higher-order cognitive functions (e.g., attention, memory, and even visual and motor operations) and how these interactions change with experience. Similarly, functional and structural connectivity metrics offer opportunities to explore individual differences in network properties and how they affect auditory encoding. All of these approaches can also inform questions relating to development and maturation, as well as to aging and disorders.

Conclusions

Auditory neuroscience is now more attuned to the significance of top-down influences and the role of neuroplasticity in auditory processing; the auditory system is correctly viewed as part of interconnected circuitry that involves cognitive, sensorimotor, and limbic systems. In many ways, the FFR is an ideal way to access this complex circuit precisely because it is not a monolithic response reflecting only a single stimulus component or single source. Rather, the FFR reveals how the auditory system responds to multiple acoustic elements throughout an entire sound, enabling a wealth of analysis strategies. Germane to this perspective article, the FFR can be measured with a number of different techniques, each of which provides a distinct window into auditory processing. Because the FFR is so rich and complex, much more is to be learned from it (Box 4). There needs to be agreement on terminology, a concerted effort against over-generalization vis-à-vis its generation, and careful harmonization between techniques and research questions to fully understand and successfully harness its potential. We hope this perspective piece serves both to inform readers and to inspire them to embrace the complexity of the FFR while remaining grounded in best practices and interpretation as research into the brain mechanisms underlying this response proceeds.