An automatic pre-processing pipeline for EEG analysis (APP) based on robust statistics

doi:10.1016/j.clinph.2018.04.600

Clinical Neurophysiology

Volume 129, Issue 7, July 2018, Pages 1427-1437

Highlights

• A novel automatic pre-processing pipeline for both resting state and evoked EEG data is proposed.
• The proposed automatic pipeline is tested in both clinical and healthy populations.
• The proposed automatic pipeline is as reliable as pre-processing by EEG experts.

Abstract

Objective

With the advent of high-density EEG and studies of large numbers of participants, yielding increasingly greater amounts of data, supervised methods for artifact rejection have become excessively time consuming. Here, we propose a novel automatic pipeline (APP) for pre-processing and artifact rejection of EEG data, which innovates relative to existing methods by not only following state-of-the-art guidelines but also further employing robust statistics.

Methods

APP was tested on event-related potential (ERP) data from healthy participants and schizophrenia patients, and resting-state (RS) data from healthy participants. Its performance was compared with that of existing automatic methods (FASTER for ERP data, TAPEEG and Prep pipeline for RS data) and supervised pre-processing by experts.

Results

APP rejected fewer bad channels and bad epochs than the other methods. In the ERP study, it produced significantly higher amplitudes than FASTER, which were consistent with the supervised scheme. In the RS study, it produced spectral measures that correlated well with the automatic alternatives and the supervised scheme.

Conclusion

APP effectively removed EEG artifacts, performing similarly to the supervised scheme and outperforming existing automatic alternatives.

Significance

The proposed automatic pipeline provides a reliable and efficient tool for pre-processing large datasets of both evoked and resting-state EEG.

Introduction

The electroencephalogram (EEG) is a non-invasive tool for the investigation of human brain function that has been in continuous use for almost a century (Niedermeyer and Lopes da Silva, 2005). However, EEG data are typically contaminated with a number of artifacts, that is, undesired signals that may affect the measurement and alter the EEG signal of interest. These artifacts may arise from non-physiological noise sources that originate outside the participant, such as the grounding of the electrodes causing power line noise at 50/60 Hz and its harmonics, interference from other electrical devices, or imperfections in electrode settling. Artifacts may also arise from physiological noise sources originating within the participant, such as those produced by head, eye, or muscle movements (Urigüen and Garcia-Zapirain, 2015). Head movements may result in spikes and discontinuities due to a rapid change of impedance at one or several electrodes. Reflective eye movements occur frequently and are normally picked up by the frontal electrodes in the frequency range of 1–3 Hz (within the delta range). Blinking also contaminates the EEG signal, usually causing a more abrupt change in amplitude than eye movements. Finally, every movement of the participant generates muscular artifacts that can be found everywhere on the scalp at frequencies higher than 20 Hz (within the beta and gamma ranges).

One simple way to deal with these artifacts is to remove segments of the data that exceed a certain level of artifact contamination, for example, signal amplitudes beyond ±100 µV. However, this coarse approach may discard a great amount of data that still contains artifact-free information, potentially compromising the subsequent analysis and interpretation of the data. This is true for both event-related potentials (ERP) and resting-state (RS) signal fluctuations. Moreover, since participant-generated artifacts may overlap with the signal of interest in the spectral domain and on many EEG channels, simple spatial and frequency-band filtering approaches may be ineffective at removing this kind of artifact (Tatum et al., 2011). Another method commonly used to clean up EEG data is independent component analysis (ICA; Makeig et al., 1996). Assuming that the neuronal signals and noise recorded on the scalp are independent of each other, the EEG signal can be described by their linear summation. ICA is used to decompose the EEG data into statistically independent components (ICs), so as to separate the neuronal and noise contributions to the signal. The artifactual ICs can then be identified and subtracted from the EEG data, yielding an artifact-free signal.
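The linear model behind ICA-based cleaning, as described in the preceding paragraph, can be written compactly. The notation is introduced here for illustration only and is not taken from the original paper: X is the C × T matrix of recorded EEG (C channels, T samples), A is the C × K mixing matrix whose k-th column a_k is the scalp topography of component k, S is the K × T matrix whose k-th row s_k^T is that component's time course, and B is the set of components judged artifactual.

```latex
\begin{aligned}
X &= A\,S, \qquad X \in \mathbb{R}^{C \times T},\quad A \in \mathbb{R}^{C \times K},\quad S \in \mathbb{R}^{K \times T} \\
X_{\text{clean}} &= X - \sum_{k \in \mathcal{B}} a_k\, s_k^{\top} \;=\; \sum_{k \notin \mathcal{B}} a_k\, s_k^{\top}
\end{aligned}
```

Under this model, subtracting the artifactual ICs is the same as rebuilding the data from the remaining components only. The cleaned signal is therefore only as good as the independence assumption and the choice of B.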

Usually, pre-processing of EEG data, including the classification of artifactual ICs, is performed under expert supervision. However, with the advent of both high-density EEG arrays (64–256 channels) and studies of large populations, yielding increasingly greater amounts of data, supervised methods have become excessively time-consuming. To cope with this, and to minimize subjectivity, automatic methods have recently been presented (Abreu et al., 2016a, Abreu et al., 2016b, Bigdely-Shamlo et al., 2015, Hatz et al., 2015, Nolan et al., 2010). Fully automated statistical thresholding for EEG artifact rejection (FASTER; Nolan et al., 2010), for instance, enables fully automated pre-processing of ERP data, based on computing z-scores of different signal metrics and thresholding them in order to detect bad channels, bad epochs and artifactual ICs. Tool for automated processing of EEG data (TAPEEG; Hatz et al., 2015) uses a similar approach for the automatic pre-processing of RS EEG data. However, because they are based on z-scores, these approaches are not robust to outliers and, as a consequence, they tend to have high rejection rates of artifact-free signal. A more promising approach is to use robust statistics instead. For example, the Prep pipeline (Bigdely-Shamlo et al., 2015) provides an automatic pre-processing pipeline including filtering and bad channel identification using the RANSAC (random sample consensus) algorithm. However, in this case the identified bad channels are assumed to be globally bad. Thus, if an otherwise good channel contains artifactual periods, these are neglected and left in the pre-processed EEG data. Moreover, supervised inspection of the pre-processed data for bad epochs is still necessary, since the Prep pipeline does not provide this feature.
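A small sketch shows why z-scores are sensitive to outliers: the mean and standard deviation used to compute them are distorted by the very values they are meant to detect. The robust alternative below uses the median and the median absolute deviation (MAD). This is an assumed, generic choice for illustration and not necessarily the estimator used by APP or any of the cited pipelines. The function names, the threshold of 3, and the per-channel variance values are all hypothetical. The sketch illustrates only the distortion of the mean and standard deviation, not the rejection rates of any particular pipeline.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose classical z-score magnitude exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [abs(v - mean) / sd > threshold for v in values]


def robust_outliers(values, threshold=3.0):
    """Flag values using median and MAD in place of mean and SD.

    Assumes the MAD is non-zero (i.e. fewer than half the values are identical).
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # scales MAD to match the SD for Gaussian data
    return [abs(v - med) / scale > threshold for v in values]


# Hypothetical per-channel signal variances; the last two channels are noisy.
channel_variance = [10, 11, 9, 10, 12, 10, 11, 9, 80, 95]

classical_flags = zscore_outliers(channel_variance)
robust_flags = robust_outliers(channel_variance)
```

With these illustrative values, the two noisy channels inflate the mean and standard deviation so much that no channel reaches a z-score of 3. The median and MAD are barely affected, so the robust score flags exactly the last two channels.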

Here, we present APP, a novel, fully automatic, Matlab®-based pipeline for pre-processing and artifact rejection of EEG data (including both ERP and RS data), which is based on state-of-the-art guidelines for EEG pre-processing, ICA decomposition, and robust statistics. APP consists of: (1) high-pass filtering; (2) power line noise removal; (3) re-referencing to a robust estimate of the mean of all channels; (4) removal and interpolation of bad channels; (5) removal of bad epochs; (6) ICA to remove eye-movement, muscular and bad-channel related artifacts; and (7) removal of epoch artifacts. At each step of the pipeline, a number of relevant parameters are estimated from the data and outliers are detected based on a robust data-driven outlier detection scheme.
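Step (3) can be illustrated with a minimal sketch. This excerpt does not say which robust estimator APP uses for the reference, so the code below assumes the cross-channel median at each time point. The function name and the toy data are hypothetical. The code is written in Python for illustration, whereas APP itself is Matlab-based.

```python
import statistics


def rereference_robust(data):
    """Subtract a robust cross-channel reference from every channel.

    data: list of channels, each a list of samples of equal length.
    The reference at each time point is the median across channels
    (an assumed estimator, chosen here only for illustration).
    """
    n_samples = len(data[0])
    reference = [
        statistics.median(channel[t] for channel in data)
        for t in range(n_samples)
    ]
    return [
        [channel[t] - reference[t] for t in range(n_samples)]
        for channel in data
    ]


# Three well-behaved channels and one hypothetical bad channel (last row).
eeg = [
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 4.0],
    [500.0, -500.0],
]

rereferenced = rereference_robust(eeg)
```

In this toy example an ordinary average reference would be dominated by the bad channel and would inject its artifact into every other channel. The median reference stays at 2.5 for both samples, so the good channels are shifted only by that amount.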

APP was tested on ERP data from 61 healthy participants and 44 schizophrenia patients performing a visual discrimination task, and on RS data from 68 healthy participants. The inclusion of patient data in the validation of APP is of particular interest since one of the primary applications of EEG is the study of clinical populations. Furthermore, many of these populations, schizophrenia patients in particular, are known to produce more artifacts than healthy volunteers, which poses a challenge for automatic pre-processing. We compared APP with three state-of-the-art automatic artifact removal methods, FASTER, TAPEEG, and Prep pipeline, which have been shown to be effective at removing a wide range of EEG artifacts. We also compared APP with supervised artifact removal by experts using the CARTOOL software (Brunet et al., 2011).

Section snippets

Methods

The proposed pre-processing and artifact removal method APP is first described, including a detailed description of each step. Then, the artifact removal methods FASTER, TAPEEG, and Prep pipeline, as well as the supervised artifact removal by experts, against which APP is compared, are described. Finally, the data acquisition and analysis methods used to validate the proposed method are presented.

Results

The results obtained by applying the proposed data pre-processing and artifact removal pipeline APP, as well as its alternative pipelines, are presented here, first for the ERP data and then for the RS data.

Discussion and Conclusion

EEG data are usually contaminated by numerous artifacts and require expert supervision for artifact identification and removal. However, with the increasing size of available datasets due to increasing numbers of EEG channels and study participants, supervised data pre-processing becomes impractical, paving the way for automatic pre-processing methods.

In this study, we propose a novel automatic pipeline (APP) for EEG pre-processing and artifact detection and removal, which makes use of

Conflict of interest statement

None of the authors have declared any conflict of interest.

Acknowledgments

This work was partially funded by the Fundação para a Ciência e a Tecnologia under grants FCT UID/EEA/50009/2013 and FCT PD/BD/105785/2014, and the National Centre of Competence in Research (NCCR) Synapsy (The Synaptic Basis of Mental Diseases) under grant 51NF40-158776.

References (37)

R. Abreu et al. Ballistocardiogram artifact correction taking into account physiological signal preservation in simultaneous EEG-fMRI. NeuroImage (2016)

R. Abreu et al. Objective selection of epilepsy-related independent components from EEG data. J Neurosci Methods (2016)

A. Delorme et al. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods (2004)

E.M. Fletcher et al. Estimation of interpolation errors in scalp topographic mapping. Electroencephalogr Clin Neurophysiol (1996)

F. Hatz et al. Reliability of fully automated versus visually controlled pre- and post-processing of resting-state EEG. Clin Neurophysiol (2015)

M. Hubert et al. An adjusted boxplot for skewed distributions. Comput Stat Data Anal (2008)

A. Hyvärinen et al. Independent component analysis: algorithms and applications. Neural Netw (2000)

D. Lehmann et al. Reference-free identification of components of checkerboard-evoked multichannel potential fields. Electroencephalogr Clin Neurophysiol (1980)

H. Nolan et al. FASTER: Fully Automated Statistical Thresholding for EEG artifact Rejection. J Neurosci Methods (2010)

P.L. Nunez et al. EEG coherency: I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales. Electroencephalogr Clin Neurophysiol (1997)

J. Onton et al. Imaging human EEG dynamics using independent component analysis. Neurosci Biobehav Rev (2006)

F. Perrin et al. Spherical splines for scalp potential and current density mapping. Electroencephalogr Clin Neurophysiol (1989)

S. Romero et al. A comparative study of automatic techniques for ocular artifact reduction in spontaneous EEG signals based on clinical target variables: A simulation case. Comput Biol Med (2008)

A.C. Tang et al. Validation of SOBI components from high-density EEG. NeuroImage (2005)

M. Bach. The Freiburg Visual Acuity test–automatic measurement of visual acuity. Optom Vis Sci (1996)

A. Belouchrani et al. A blind source separation technique using second-order statistics. IEEE Trans Signal Process (1997)

N. Bigdely-Shamlo et al. The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Front Neuroinform (2015)

D. Brunet et al. Spatiotemporal Analysis of Multichannel EEG: CARTOOL. Comput Intell Neurosci (2011)

