Abstract
Recent advancements in deep learning have significantly enhanced EEG-based drowsiness detection. However, most existing methods overlook the importance of relative changes in EEG signals compared to a baseline, a fundamental aspect of conventional EEG analysis including event-related potentials and time-frequency spectrograms. We herein introduce SiamEEGNet, a Siamese neural network architecture designed to capture relative changes between EEG data from the baseline and a time window of interest. Our results demonstrate that SiamEEGNet robustly learns from high-variability data across multiple sessions/subjects and outperforms existing model architectures in cross-subject scenarios. Furthermore, the model’s interpretation aligns with previous findings of drowsiness-related EEG correlates. The promising performance of SiamEEGNet highlights its potential for practical applications in EEG-based drowsiness detection. We have made the source code available at http://github.com/CECNL/SiamEEGNet.
I. Introduction
Drowsy driving is a major contributor to traffic accidents and results in significant human and financial costs. In the United States, approximately 15-33 percent of fatal crashes are associated with drowsy driving [1]. Drowsiness can be caused by various factors, such as sleepiness, changes in circadian rhythm due to shift work, sleep deprivation, and fatigue, all of which greatly affect alertness, concentration, and reaction time [2]. To minimize the negative effects of drowsy driving and detect drowsiness in its early stages, it is essential to monitor drowsiness. Generally, research on drowsiness detection can be categorized into three groups based on the modalities used to monitor drowsiness: vehicle-based, behavioral-based, and physiological-based [3]. Physiological-based methods, such as electroencephalography (EEG), have the advantage of directly measuring and reflecting brain activity associated with drowsiness in the driver [4]. Thus, they are considered the most promising approach for detecting drowsiness. EEG is a widely used tool to monitor and analyze brain activity, and many studies have shown that it is feasible to use EEG to detect drowsiness [5], [6]. However, the EEG signal is characterized by being non-stationary and can vary greatly between subjects, as well as within the same subject [7], [8]. This variability requires careful analysis and interpretation to accurately understand and explore the human brain. Due to the time-resolved nature of the EEG signal, it is susceptible to temporal drift, which is not directly related to the experiment. Both physical factors, such as changes in sensor impedance, and mental factors, such as changes in mental condition, can contribute to this variability [9]. This phenomenon can have a significant impact on the analysis of EEG, as the baseline level of EEG activity changes over time.
With recent advances in deep learning (DL), an increasing number of researchers have focused on DL-based EEG decoding. For example, convolutional neural networks (CNNs) have been widely adopted for EEG classification tasks, due to their ability to extract high-level temporal and spatial information from raw EEG signals. CNN-based EEG decoding models transform raw EEG signals into a latent space and use learnable convolution kernels to extract information [10]–[12]. The convolutional kernels can achieve the effects of temporal and spatial filtering, akin to conventional approaches but without the constraints of predefined or hand-crafted features. The DL approach offers new opportunities to explore and exploit unconventional or high-level information in EEG dynamics and thus facilitates the understanding of brain dynamics.
To enhance the performance of decoding drowsiness-related brain activities, many researchers have incorporated DL models into their studies. Four main DL models are used in most studies: CNN [13], [14], RNN [15], [16], Transformer [17], [18], and GCN [19], [20]. Although DL-based models have demonstrated significant improvements in decoding drowsy brain activities, most of them do not consider the relative change of brain activities, which is a crucial aspect when dealing with EEG signals. As mentioned earlier, the inter- and intra-subject variability in EEG poses a significant obstacle to performing cross-subject tasks. Relative change can be a potential solution to alleviate this problem, and several conventional EEG analysis approaches have already adopted this concept. One commonly used method is baseline correction in the analysis of event-related potentials (ERPs) [21]. This approach involves subtracting the average voltage of the EEG activity during the baseline period, i.e., the time prior to an external stimulus, from the EEG activity during the post-stimulus interval, i.e., the time following the stimulus. This helps correct for temporal drift in EEG signals. In the field of alertness/drowsiness monitoring, it has been observed that the brain dynamics associated with drowsiness can vary greatly between individuals, and even for the same individual, the level of alertness or drowsiness can vary at different times [22]. To address this issue and improve generalizability, many studies in this field take the relative power or the power ratio into consideration. This approach helps reduce individual differences and enhance the performance of alertness/drowsiness estimation [23], [24]. Moreover, numerous applications in EEG analysis and brain-computer interfaces (BCIs) make use of the relative change in EEG signals or their power [25]–[27]. These examples suggest that the exact voltage value or power of EEG signals is often not a sufficient criterion for EEG analysis. Therefore, it is common to employ a baseline or reference, typically the mean voltage or power of the signal, to represent the initial level for each subject or trial. As a result, the focus of many EEG analyses is on the relative change in brain activity, as this is believed to improve the robustness and generalizability of EEG analysis.
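For illustration, the snippet below sketches the ERP-style baseline correction described above. It is a minimal NumPy example; the function name and window layout are ours rather than taken from the cited works.

```python
import numpy as np

def baseline_correct(epoch, n_baseline_samples):
    """Subtract the mean of the pre-stimulus baseline from the whole epoch.

    epoch: (channels, time) array covering the baseline and post-stimulus interval.
    n_baseline_samples: number of samples in the pre-stimulus baseline period.
    """
    # Mean voltage of the pre-stimulus baseline, computed per channel
    baseline_mean = epoch[:, :n_baseline_samples].mean(axis=1, keepdims=True)
    # Subtracting it leaves only the change relative to the baseline level
    return epoch - baseline_mean
```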
However, there are still some obstacles to overcome in order to exploit relative changes in EEG patterns with DL-based methods. Conventional single-branch CNN-based EEG decoding models are constrained by structures that extract information from a single input. This makes it difficult to capture patterns of relative relationships or changes between trials. RNN-based models can capture short-term relative changes, but struggle with long-term considerations. As for transformer-based models, they have tremendous potential to capture various EEG features due to their attention mechanism. However, the biggest barrier to applying transformers to EEG decoding is the small scale of EEG datasets. Neglecting relative change in EEG can result in suboptimal performance in cross-subject training or require additional cross-subject training techniques [28], [29]. Therefore, to exploit relative changes in EEG patterns with DL-based methods, it is necessary to develop an alternative architecture, as shown in Fig. 1, that excels at learning changes between trials and extracting EEG patterns of relative change.
In this study, our primary objective is to enhance the performance of drowsiness estimation in a DL-based EEG decoding model by introducing the concept of relative change. Therefore, we propose a novel Siamese neural network for EEG decoding, SiamEEGNet, to estimate drowsiness levels. The model is inspired by the Siamese network architecture [30] and incorporates modules adapted from recent DL-based EEG decoders. The Siamese architecture enables the extraction of consistent latent representations from two parallel inputs, and the shared feature extractor facilitates the learning of relative relationships between the two inputs in the latent domain. This property enables learning and inference regarding the relative change in both the input data and the output labels. Thus, we can estimate the subject’s drowsiness level by comparing the current trial with the baseline trial, which represents the alert baseline level for each subject. Our key contributions are threefold. First, we present a novel Siamese EEG decoding architecture designed to capture the EEG pattern of relative change. This architecture is applied to the problem of drowsy EEG decoding and demonstrates outstanding performance in both within-subject and cross-subject scenarios compared to existing methods. Second, we develop various manipulation techniques to improve stability and robustness in drowsy EEG decoding. Lastly, we extract visual evidence from SiamEEGNet to interpret its significant characteristics with neuroscientific insights.
II. Related Work
A. EEG-based drowsiness estimation
In recent years, significant advancements have been made in the field of EEG-based drowsiness estimation through the utilization of deep learning. Conventional machine learning approaches typically involve a two-step process: extracting hand-crafted features from EEG data and applying a prediction algorithm. Compared to machine learning approaches, DL-based methods enable end-to-end training, allowing data-driven feature extraction directly from EEG. Recent studies on DL-based drowsiness estimation can be broadly categorized into four types: CNN-based, RNN-based, Transformer-based, and GCN-based methods. CNN-based models excel at extracting both temporal and spatial features through various convolutional kernel designs [13], [14]. Given the basic characteristic of EEG as a time sequence, [15], [16] integrate RNN-based methods into their frameworks to take temporal dependencies into consideration. Transformer-based methods, known for their efficacy in handling sequential information, have shown promising results in various research fields [31]. By leveraging attention mechanisms, transformer-based methods can achieve performance comparable or even superior to RNN-based approaches [32]. Consequently, researchers have started exploring the application of Transformer-based methods in drowsiness detection, aiming to overcome the limitations associated with long-term sequences often encountered by RNN-based methods [17], [18]. Furthermore, the graph convolutional network (GCN) has become a popular choice for EEG-based drowsiness detection [33]. GCNs focus on learning spatial dependencies among EEG channels by treating multichannel EEG signals as graph data. Each EEG channel is represented as a node, and the relationships between two channels are captured by edges in the graph structure [19], [20].
B. Siamese Network
A Siamese network is a neural network architecture comprising two identical sub-networks joined at their outputs. It was initially proposed for signature verification [30]. Later, the Siamese network architecture became widely adopted for similarity learning with deep neural networks. In most applications, the Siamese network applies the same transformation to both inputs and computes the similarity (or distance) between the latent representations of these inputs. For instance, it has been effectively used in tasks such as face verification [34] and object tracking, where the objective is to compare an exemplar image with candidate images and return a score representing the similarity of the two inputs [35].
Recently, the Siamese architecture has found applications in the field of EEG-based brain-computer interfaces (BCIs). Many studies have utilized this architecture for similarity learning, minimizing the distance between samples of the same class while maximizing the distance between samples from different classes, thereby ensuring greater separation between classes. In this context, S. Zhang et al. [36] used a Siamese network to learn distance-based representations from pairwise EEG data. S. Shahtalebi et al. [37] developed a Siamese network and adopted a binary classification strategy for motor imagery classification tasks. Additionally, Y. Li et al. [38] and X. Zhang et al. [39] introduced convolutional correlation analysis (Conv-CA) and bidirectional Siamese correlation analysis, respectively, to enhance the performance of frequency classification of steady-state visual evoked potentials (SSVEPs). These methods compare EEG signals and reference signals in their latent representations and calculate the correlation between the two inputs.
III. Architecture
In this section, we present the architecture of SiamEEGNet, as shown in Fig. 2. The proposed architecture consists of two parallel feature extraction (FE) modules with identical model parameters. The objective is to map two inputs to a common latent space, which enables the prediction of the relative change in the drowsiness index (DI) using a regression layer. In the subsequent subsections, we provide a detailed explanation of each component of SiamEEGNet, including the feature extraction module, multi-window processing, smoothing layer, and regression layer.
A. Feature extraction module
The primary objective of the feature extraction module is to derive features conducive to decoding brain activities. SiamEEGNet therefore requires a feature extraction module capable of capturing the characteristics of EEG data. As the default configuration, we adopt the feature extraction module of the established EEG decoding model EEGNet [11], which consists of three main blocks: two convolutional blocks (temporal and depthwise separable convolution) and a classification block. Since the aim of this module is to extract features from EEG signals and map them to a common latent space, we retain only the first two blocks of EEGNet for our feature extraction module. Moreover, the feature extraction module can be replaced with different EEG decoding models.
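For concreteness, the following is a minimal PyTorch sketch of an EEGNet-style feature extractor reduced to its first two blocks. The hyperparameters (F1, D, F2, kernel lengths, dropout rate) are illustrative defaults, not necessarily the settings used in SiamEEGNet.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of an EEGNet-style feature extraction module (first two blocks only)."""

    def __init__(self, n_channels=30, F1=8, D=2, F2=16):
        super().__init__()
        self.block1 = nn.Sequential(
            # Temporal convolution along the time axis
            nn.Conv2d(1, F1, (1, 64), padding=(0, 32), bias=False),
            nn.BatchNorm2d(F1),
            # Depthwise spatial convolution across EEG channels
            nn.Conv2d(F1, F1 * D, (n_channels, 1), groups=F1, bias=False),
            nn.BatchNorm2d(F1 * D),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
            nn.Dropout(0.5),
        )
        self.block2 = nn.Sequential(
            # Depthwise separable temporal convolution
            nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8), groups=F1 * D, bias=False),
            nn.Conv2d(F1 * D, F2, (1, 1), bias=False),
            nn.BatchNorm2d(F2),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
            nn.Dropout(0.5),
        )

    def forward(self, x):                     # x: (batch, 1, channels, time)
        h = self.block2(self.block1(x))
        return h.flatten(start_dim=1)         # latent feature of dimension D_L per trial
```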
B. Multi-window processing
To mitigate short-term effects in the EEG recording, we developed a multi-window processing approach. The fundamental concept is to analyze a relatively extended EEG segment to identify a consistent brain state. Multi-window processing stacks the Nt trials leading up to the current trial together, where Nt is set to 10, as this approximately covers the trials within a 90-second causal window. The Nt trials are then fed into the feature extractor individually, yielding the corresponding latent features. The computation of the feature extraction module with multi-window processing can be written as

$$h_i = \theta(x_i), \quad i = 1, \ldots, N_t,$$

where θ refers to the feature extraction module and x_i represents the i-th input trial in the multi-window processing. h_i is the corresponding latent feature of size (D_L, 1), where D_L denotes the dimension of the latent features. In this study, we set N_t = 10 when using EEGNet as the feature extraction module.
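A sketch of multi-window processing is given below, assuming the windows are stacked along a dedicated tensor dimension; the averaging step stands in for the smoothing layer mentioned in the architecture overview, and the helper name is ours.

```python
import torch

def multi_window_features(feature_extractor, windows):
    """Multi-window processing sketch: pass each of the N_t windows through the
    shared feature extractor and average the latent features (smoothing step).

    windows: tensor of shape (batch, N_t, 1, channels, time); N_t = 10 in the paper
    when EEGNet is used as the feature extraction module.
    """
    n_t = windows.shape[1]
    latents = [feature_extractor(windows[:, i]) for i in range(n_t)]  # h_i = theta(x_i)
    latents = torch.stack(latents, dim=1)                             # (batch, N_t, D_L)
    return latents.mean(dim=1)                                        # smoothed latent feature
```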
C. Regression layer
To predict the difference between the two average latent features, we concatenate the two outputs generated by the parallel networks and apply a regression layer to predict the difference in DI between the two input trials (ΔDI). The hyperbolic tangent is used as the output activation function to constrain ΔDI within -1 to 1 (the range of ΔDI). The operation of the regression layer is

$$\Delta \mathrm{DI} = \tanh(W^{\top} h_c + b),$$

where h_c is the feature vector concatenating the two average latent features from the two sub-networks, W are the regression coefficients of size 2D_L × 1, and b is the bias. Finally, we employ the mean squared error (MSE) as the loss function between the true ΔDIs and the predicted ΔDIs for our regression task, allowing us to perform end-to-end training.
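The regression step can be sketched as follows, under the assumption that the two smoothed latent features are concatenated before a single linear layer with a tanh output, as described above; the class and variable names are ours.

```python
import torch
import torch.nn as nn

class SiamRegressionHead(nn.Module):
    """Sketch of the regression layer: concatenate the two smoothed latent
    features and map them to a drowsiness-index change (ΔDI)."""

    def __init__(self, latent_dim):
        super().__init__()
        self.linear = nn.Linear(2 * latent_dim, 1)       # W of size 2*D_L x 1, bias b

    def forward(self, h_baseline, h_current):
        h_c = torch.cat([h_baseline, h_current], dim=1)  # concatenated feature h_c
        return torch.tanh(self.linear(h_c)).squeeze(1)   # ΔDI constrained to (-1, 1)

mse_loss = nn.MSELoss()  # loss between true and predicted ΔDIs for end-to-end training
```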
IV. Experiment
A. Dataset
A lane-keeping driving dataset was employed in our study to explore the EEG activities associated with a sustained-attention driving task [40]. During the experiment, lane-departure events were activated randomly while participants were cruising in a car. Participants were asked to steer the car back to the original lane as soon as possible when a lane-departure event occurred. The time between the deviation onset and the response onset is the reaction time (RT), which can be used to assess drowsiness levels. Twenty-seven subjects (ages 22-28) participated in the study, contributing a total of 62 sessions (19 subjects contributed multiple sessions).
To better develop the EEG decoding model for drowsiness level, we first need to ensure that each session accumulates sufficient data from both drowsy and alert states. We followed the session selection criterion in [22] to perform session selection. Additionally, subjects with only one session were excluded from the within-subject test, resulting in 12 subjects being included in our experiments. To establish a baseline for the alert brain condition in each session, we designated the initial 10 trials as the baseline [41].
Then, we quantified drowsiness by converting RT into a drowsiness index (DI) using a previously proposed method [22]:

$$\mathrm{DI} = H * \max\!\left(0,\ \frac{1 - e^{-a(T - T_0)}}{1 + e^{-a(T - T_0)}}\right),$$

where T represents the RT for each trial, T_0 is the alert baseline RT, a is a constant set to 1 s−1, and H is a causal moving average filter in the temporal domain.
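The conversion can be sketched in NumPy as below, following the equation above (itself a reconstruction of the cited method); the ~90-second causal window length and the function interface are assumptions.

```python
import numpy as np

def drowsiness_index(rt, rt0, onset_times, a=1.0, win_sec=90.0):
    """Convert reaction times to a smoothed drowsiness index (DI).

    rt:          reaction times of the trials (s)
    rt0:         alert baseline reaction time (s)
    onset_times: deviation-onset times of the trials (s), used by the causal filter
    a:           constant (1 1/s in the paper)
    win_sec:     assumed length of the causal moving-average window
    """
    rt, onset_times = np.asarray(rt, float), np.asarray(onset_times, float)
    # Sigmoidal mapping of the RT increase relative to the alert baseline
    raw_di = np.maximum(0.0, (1 - np.exp(-a * (rt - rt0))) /
                             (1 + np.exp(-a * (rt - rt0))))
    # Causal moving average H over trials within the past win_sec seconds
    di = np.empty_like(raw_di)
    for i, t in enumerate(onset_times):
        mask = (onset_times <= t) & (onset_times > t - win_sec)
        di[i] = raw_di[mask].mean()
    return di
```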
B. EEG data processing
The EEG signal was recorded from 30 channels with a sampling frequency of 500 Hz. First, the raw EEG data went through a bandpass filter (2-30 Hz) to reduce high-frequency and power-line noise and were downsampled to 250 Hz. We further applied artifact subspace reconstruction (ASR) [42] to remove artifacts from the EEG signal, setting the ASR threshold to 10 times the standard deviation [22]. Finally, we extracted epochs consisting of the 3-second EEG signal prior to each deviation onset. EEGLAB [43] was used to implement all data pre-processing steps.
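A rough Python equivalent of this pipeline (excluding ASR, which was performed in EEGLAB) might look as follows; the filter order and the epoching interface are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def preprocess(raw, onsets, fs=500):
    """Rough equivalent of the preprocessing steps (the paper used EEGLAB).

    raw:    (channels, samples) EEG recorded at fs Hz
    onsets: deviation-onset sample indices at the original sampling rate
    """
    # 2-30 Hz bandpass to suppress high-frequency and power-line noise
    b, a = butter(4, [2, 30], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, raw, axis=1)
    # Downsample 500 Hz -> 250 Hz
    eeg = resample_poly(filtered, up=1, down=2, axis=1)
    fs_new = fs // 2
    # ASR-based artifact removal (threshold: 10 SD) would be applied here;
    # it is omitted in this sketch.
    # Extract 3-second epochs ending at each deviation onset
    epochs = [eeg[:, o - 3 * fs_new : o] for o in (np.asarray(onsets) // 2)]
    return np.stack(epochs)           # (trials, channels, 750)
```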
C. Dynamic baseline training
Based on the pairwise input, we developed a specialized training strategy called dynamic baseline training to enhance the adaptability and robustness of the model to variations in baseline levels. This approach involves creating pairs by randomly selecting a trial from the same session to serve as a dynamic baseline. The ground truth for training is the difference in DI between these two trials (ΔDI). By introducing varying baseline conditions during training, dynamic baseline training forces the model to adapt to different baseline levels. Additionally, dynamic baseline training allows us to expand the dataset by controlling the number of trials used to form pairs for each trial. During the inference phase, static baseline inference is utilized: we form pairs exclusively with a fixed baseline trial. In other words, we predict the difference in DI between the current trial and a predetermined baseline trial. The resulting ΔDIs are comparable to the original DI, as they reflect the current DI after removing the baseline DI. Static baseline inference thereby enables the evaluation of actual fluctuations in drowsiness levels during the current trial.
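The two pairing strategies can be sketched as follows; the helper names and the exact sampling scheme (e.g., whether a trial may be paired with itself) are assumptions.

```python
import numpy as np

def make_dynamic_pairs(trials, di, n_baselines=1, rng=None):
    """Dynamic baseline training pairs within one session.

    For each trial, n_baselines trials from the same session are drawn at random
    to serve as baselines; the target is the DI difference (ΔDI)."""
    rng = np.random.default_rng() if rng is None else rng
    di = np.asarray(di)
    pairs, targets = [], []
    for i in range(len(trials)):
        for j in rng.choice(len(trials), size=n_baselines, replace=False):
            pairs.append((trials[j], trials[i]))    # (baseline trial, current trial)
            targets.append(di[i] - di[j])           # ΔDI used as ground truth
    return pairs, np.asarray(targets)

def make_static_pairs(trials, di, baseline_idx=0):
    """Static baseline inference: pair every trial with one fixed baseline trial."""
    di = np.asarray(di)
    pairs = [(trials[baseline_idx], t) for t in trials]
    return pairs, di - di[baseline_idx]
```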
To assess the performance of the regression tasks, we employ two metrics: root mean square error (RMSE) and Pearson’s correlation coefficient (CC). RMSE is a commonly used metric for evaluating the disparity between the predicted ΔDIs and the actual ΔDIs, while CC measures the degree of correlation between the predicted and actual ΔDIs.
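For reference, the two metrics can be computed as:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ΔDIs."""
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def pearson_cc(y_true, y_pred):
    """Pearson's correlation coefficient between true and predicted ΔDIs."""
    return np.corrcoef(np.asarray(y_true), np.asarray(y_pred))[0, 1]
```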
We evaluated SiamEEGNet using both within-subject and cross-subject validation. In the within-subject scheme, individual models were trained for each session, with a random 20% split for validation to assess model performance. During the testing phase, we averaged the predictions of the models trained on the same subject’s sessions. For the cross-subject scheme, we implemented leave-one-subject-out cross-validation. In the training phase, one subject from the training dataset was selected as validation data to assess model performance. The model was trained for 50 epochs using the Adam optimizer with a learning rate set to 0.001. Additionally, we employed a weight decay technique with a ratio of 0.0001 to mitigate overfitting.
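A self-contained sketch of the training configuration is given below; the stand-in model and synthetic data are placeholders, while the Adam settings, weight decay, and epoch count follow the values reported above.

```python
import torch
import torch.nn as nn

# Stand-in model and synthetic data: only the optimizer settings (Adam,
# lr=0.001, weight decay=0.0001) and the 50 training epochs follow the paper.
model = nn.Linear(20, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

x = torch.randn(64, 20)        # placeholder concatenated pair features
y = torch.rand(64) * 2 - 1     # placeholder ΔDI targets in (-1, 1)

for epoch in range(50):
    pred = torch.tanh(model(x)).squeeze(1)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```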
D. Selection of baseline methods
We conducted a performance comparison between SiamEEGNet and six other approaches. For the conventional drowsiness estimation method, we utilized hand-crafted band-power features and selected support vector regression (SVR) as the regressor, following previous studies [44]–[46]. For deep learning-based methods, we chose EEGNet [11], ShallowConvNet [10], and SCCNet [12] as baseline methods to represent general single-input CNN-based approaches. Additionally, we included ESTCNN [13] and InterpretableCNN [14] as leading DL-based EEG decoders. To ensure consistent and reliable evaluation, the experiments were repeated five times and the results subsequently averaged.
V. Experiment Results
A. Performance comparison
First, we compared the regression performance of the proposed SiamEEGNet with the six baseline methods. Among them, SVR used smoothed log band-power features (alpha, beta, gamma, and delta band log power of each channel) as input [22]. For the DL-based methods, we directly used the EEG signal as input and incorporated multi-window processing (Nt=10) to obtain a fair comparison. Both training schemes were performed on all subjects.
Table I presents the average performance of the within-subject and cross-subject tests for the seven methods across 12 subjects in terms of RMSE and CC. In the within-subject test, most DL-based methods exhibit performance on par with or superior to the conventional SVR approach (RMSE = 0.205, CC = 0.530), with the exception of ESTCNN (RMSE = 0.208, CC = 0.474). The proposed SiamEEGNet (RMSE = 0.118, CC = 0.574) performs exceptionally well but falls slightly short of EEGNet in terms of CC. However, in the cross-subject test, the proposed model (RMSE = 0.179, CC = 0.687) outperforms all other baseline methods in terms of CC and achieves the second-lowest RMSE. To assess the statistical significance of the differences among the methods, we employed the Kruskal-Wallis test with Tukey’s HSD (honestly significant difference) to perform nonparametric multiple comparisons between the methods. The results of these comparisons are presented in Table II. In the within-subject test, SiamEEGNet demonstrated significantly superior performance compared to SVR, SCCNet, and ESTCNN in CC. In the cross-subject test, SiamEEGNet significantly outperformed EEGNet, SCCNet, and ShallowConvNet in RMSE, and all baseline methods in CC. Fig. 4 presents the DI decoding results obtained by the proposed method from a sample session (S44-2). This sample DI decoding result shows that our model is capable of decoding drowsy brain dynamics and produces accurate predictions.
B. Impact of feature extractors
To assess the adaptability and effectiveness of the concept of relative change in enhancing drowsiness estimation, we incorporated various EEG decoding models as the feature extraction module, not limited to EEGNet. These models include the five baseline DL-based models and two additional models, EEGTCNet [47] and MBEEGSE [48], which have shown promising performance in motor imagery classification tasks. We compared the performance with and without the proposed SiamEEGNet in the cross-subject scenario. As shown in Table III, most EEG decoding models exhibit a significant improvement when SiamEEGNet is applied, particularly in terms of CC. These results show that SiamEEGNet can be effectively integrated with various EEG decoding models serving as the feature extraction module and significantly elevates decoding performance. They further emphasize that the concept of capturing relative changes in drowsiness levels holds universal applicability and is not confined to specific EEG decoding models.
C. Impact of parameters
A series of experiments was conducted to evaluate performance under different adjustable parameter configurations. Specifically, two parameters were considered: the number of windows used in the multi-window processing and the number of dynamic baselines employed. To determine the optimal setting, various numbers of windows were tested for the multi-window processing. The changes in RMSE and CC were analyzed as a function of the number of windows, as shown in Fig. 6(a) and Fig. 6(b). The results indicate that performance initially improves with an increasing number of windows, but beyond 15 windows, performance reaches a plateau and even exhibits a slight decline. Based on these findings, 10 windows were determined to be the optimal setting for this work. In terms of time, the span of 10 windows corresponds to approximately 90 seconds, which aligns with the filter length applied to the DIs.
Another adjustable parameter is the number of dynamic baselines used in forming input pairs. As stated previously, dynamic baseline training allows for dataset expansion by controlling the number of dynamic baselines paired with each trial. The experiments were conducted using different numbers of dynamic baselines paired with each individual trial. The results, as shown in Fig. 6(b), demonstrate consistent performance, even when the size of the dataset increased. This suggests that the size of the original training dataset is sufficient to effectively train the model with EEGNet as the feature extraction module. This can be attributed to the inherent capabilities of the model itself. EEG decoding models are typically designed to adapt to EEG datasets, which are often small-scale. Consequently, these models may lack the capacity to effectively learn from larger datasets.
D. Ablation study
We conducted ablation studies to scrutinize the contribution of three critical components of our method, namely the Siamese architecture, dynamic baseline training, and multi-window processing. Specifically, we removed each of these three components separately from SiamEEGNet and evaluated the regression performance of the resulting three models in the cross-subject test. As shown in Table IV, removing the Siamese architecture resulted in a significant performance drop compared to the original SiamEEGNet, with degradations of 0.013 in RMSE and 0.032 in CC. The use of dynamic baseline training also has a significant impact on performance: replacing dynamic baseline training with static baseline training degraded performance by 0.08 in RMSE and 0.031 in CC. Static baseline training uses only a predefined baseline to pair with each trial during model training. Lastly, our results reveal that multi-window processing has a significant impact on performance. When we removed multi-window processing (i.e., using only one EEG trial as input for each branch), we observed a considerable decline in decoding performance, with degradations of 0.075 in RMSE and 0.282 in CC. To compare the performance of the three variant approaches with the original SiamEEGNet in decoding the DI level, Fig. 5 displays a scatter plot of a sample session (S44-2). Although all approaches exhibit a similar trend in predicting the drowsiness index, the methods that incorporate multi-window processing display a relatively concentrated DI decoding result with less fluctuation. Additionally, dynamic baseline training demonstrates a noticeable improvement in fitting the baseline, which refers to the initial phase of a session. In contrast, static baseline training exhibits a bias at the baseline level between the actual DI and the predicted DI.
VI. Discussion
In this study, we introduce the concept of relative change, a common concept in traditional EEG analysis, into DL-based methods to improve the performance of drowsy brain activity decoding. We then developed SiamEEGNet, which enables DL-based EEG decoding models to learn the relative change between EEG trials. Our experimental results demonstrate that the proposed method outperforms all baseline methods in decoding drowsy brain activities in the cross-subject test. Compared to the result of the within-subject test, the averaged RMSE decreases by approximately 0.034 (0.206 to 0.184) while the averaged CC increases by approximately 0.113 (0.574 to 0.687). Previous studies have demonstrated that the performance of self-decoding approaches (within-subject test) is significantly affected by inter-session variability, due to the limited availability of data from the same subjects [22]. In the cross-subject scenario, the models can access diverse data from different subjects, which can potentially boost model performance if the inter-subject variability is handled appropriately. Another factor is that the utilization of relative changes serves to enhance generalizability and robustness when analyzing drowsy brain dynamics [23], [24], [49]. These characteristics contribute to improving the performance of drowsiness detection in the cross-subject scenario.
We further conducted the ablation study to investigate the importance of learning relative changes in decoding drowsiness-related brain dynamics. The experimental results show that using the proposed SiamEEGNet to learn the relative change in drowsiness level significantly improves the decoding performance compared to general single-branch EEG decoding with multi-window processing (p-value < 0.05, Mann-Whitney U test). These results highlight the importance of the Siamese architecture in capturing relative change. Furthermore, the way we construct input pairs for model training has a significant influence on the decoding results. We tested two different ways to form input pairs: static baseline training and dynamic baseline training. Dynamic baseline training resulted in better performance and baseline alignment across different subjects. This result shows that randomly selecting a trial as a dynamic baseline helps the model adjust to different baseline levels, which can reduce variability in the alert baseline condition between different subjects [41]. In addition, incorporating multi-window processing is another crucial factor that affects the decoding results and shows a significant improvement over not using multi-window processing. This improvement is due to the use of average latent features, which helps mitigate short-term variations and instability in EEG recordings [50] and leads to relatively robust performance.
Through interpretation analysis of the proposed model, SiamEEGNet, we are able to uncover the underlying characteristics learned from the data. First, to explore how considering relative change can mitigate inter-subject variability and perform well in the cross-subject scenario, we performed layer-wise t-SNE [51] to visualize intermediate latent features in a 2D space. This visualization helps us understand how latent features from different sessions evolve during the prediction process. Fig. 7 shows the layer-wise t-SNE visualization of four example sessions (S40-1, S40-2, S44-2, and S44-3) from different layers. Initially, the latent features of the different sessions are intertwined and do not distinctly form clusters associated with the sessions. However, from the perspective of the drowsiness index, features with higher DI tend to aggregate, while features with lower DI are widely dispersed. Upon applying average pooling to the latent features, sessions with lower variability (S40-1, S44-2, S44-3) exhibit a higher level of convergence. Notably, when the smoothed features are concatenated with the baseline feature of their respective sessions, features originating from the same session are anchored by their baseline and coalesce into distinct clusters. Furthermore, an increase in DI correlates with the extension of features along particular orientations relative to the baseline. By predicting the relative change for each trial in conjunction with the baseline feature, as depicted in Fig. 7(d), it is possible to align EEG trials from different subjects based on their respective baseline levels. This alignment strategy mitigates the impact of inter-subject variability in baseline levels. These visualization results offer valuable insights into how the consideration of relative changes in EEG-based drowsiness detection improves the model’s resilience to individual differences in cross-subject scenarios.
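The t-SNE projection used for this layer-wise inspection can be reproduced with scikit-learn along the lines of the sketch below; the feature dimensions, random seed, and placeholder data are ours.

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_embed(latent_features, seed=0):
    """Project latent features (n_trials, D_L) into 2-D for visualization."""
    return TSNE(n_components=2, random_state=seed).fit_transform(latent_features)

# Example with placeholder features; in practice these come from a chosen layer
# of the trained model and are colored by session and by drowsiness index.
features = np.random.randn(200, 128).astype(np.float32)
embedding = tsne_embed(features)      # shape: (200, 2)
```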
Then, we used saliency maps [52], [53] to investigate what our models learned from the EEG data for drowsiness detection. Saliency maps highlight important input segments and are the same size as the input (channels × time points). With this technique, we explored the important segments from two perspectives: spatial distribution and band power distribution. We analyzed the spatial distribution by performing a channel-wise summation to identify brain regions with high activity.
For the band power distribution, we treated each channel in the saliency map as a time sequence and calculated the power spectral density (PSD) to examine sub-band powers of the gradient response (Beta: 12-16 Hz, Alpha: 8-12 Hz, Theta: 4-8 Hz, Delta: 1-4 Hz) in each channel. We performed these analyses on models trained with single-session data or cross-subject data. We also present the correlation between DI and sub-band powers of the input EEG signal on the scalp to compare with the segments active during model training. Fig. 8 provides evidence that the theta and delta bands exhibit a stronger correlation with DI [22], as they are also the dominant frequency bands highlighted in the saliency map. Furthermore, a higher proportion of salient components in the theta band in cross-subject models indicates that theta band power is relatively universal across subjects for drowsiness estimation. This finding is in line with previous studies showing that the theta band power spectrum has good discriminating power [41], [54]. Overall, these visualization results demonstrate that the model is capable of capturing meaningful drowsiness-related information to detect drowsiness. Additionally, the results suggest that theta band power plays a pivotal role in the model’s ability to learn how to estimate the difference in drowsiness level. This is also consistent with previous research on EEG-based drowsiness monitoring, which has revealed that theta band power levels in the EEG signal are highly correlated with drowsiness and alertness [22], [41], [54].
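A generic sketch of the saliency and band-power analysis is shown below; the model call signature, the use of Welch's method for the PSD, and the absolute-value aggregation are assumptions rather than the paper's exact procedure.

```python
import numpy as np
import torch
from scipy.signal import welch

def saliency_map(model, x_baseline, x_current):
    """Gradient-based saliency with respect to the current-trial input.
    Returns an array the same size as one input trial (channels, time)."""
    x_current = x_current.clone().requires_grad_(True)
    model(x_baseline, x_current).sum().backward()
    return x_current.grad.abs().squeeze().cpu().numpy()

def band_power(saliency, fs=250, band=(4, 8)):
    """Per-channel power of the saliency signal within a band (e.g., theta 4-8 Hz),
    estimated with Welch's PSD."""
    freqs, psd = welch(saliency, fs=fs, nperseg=min(256, saliency.shape[-1]), axis=-1)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[:, mask].mean(axis=1)

# Spatial distribution: channel-wise summation of saliency magnitude
# spatial_map = saliency.sum(axis=1)
```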
VII. Conclusion
In this study, we introduced a Siamese neural network architecture for EEG decoding, termed SiamEEGNet, that enables feature extraction of relative EEG change for drowsiness detection. By exploiting the characteristics of the Siamese architecture, we employed techniques of pairing EEG inputs to map the association between relative EEG change and the relative difference in drowsiness level. Furthermore, we leveraged multi-window processing and a smoothing layer to address instability and fluctuation during EEG recording, thereby enhancing overall performance and yielding increased resilience against data variability. Moreover, the interpretation of the model reveals drowsiness-related EEG patterns and model behaviors that explain the enhanced decoding performance, providing insights for both neuroscientific research and future EEG-based drowsiness detection. Overall, this study demonstrates the usefulness of SiamEEGNet toward high-performance, practical EEG-based drowsiness detection in real-world applications.
Footnotes
This work was supported in part by the Ministry of Science and Technology under Contracts 109-2222-E-009-006-MY3, 110-2221-E-A49-130-MY2, 110-2314-B-037-061, and 112-2222-E-A49-008-MY2; and in part by the Higher Education Sprout Project of the National Chiao Tung University and Ministry of Education of Taiwan.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].