Introduction

Brain–Computer Interface (BCI) systems aim to give users control over a computer application through their brain activity (see Dornhege et al. 2007; Kübler et al. 2001; Millán et al. 2004; Pfurtscheller et al. 2005; Wolpaw et al. 2002). In EEG-based BCIs, one of the biggest research challenges is to understand and solve the problem of “BCI illiteracy”: BCI control does not work for a non-negligible portion of users, estimated at 15 to 30% (cf. Dickhaus et al. 2009). In a screening study, N = 80 participants performed motor imagery first in a calibration measurement (i.e., without feedback) and then in a feedback measurement in which they could control a 1D cursor application. Roughly, we observed three categories of users: (I) participants for whom a classifier could be successfully trained and who performed feedback with good accuracy; (II) participants for whom a classifier could be successfully trained, but for whom feedback did not work well; and (III) participants for whom no classifier with acceptable accuracy could be trained. Regarding Cat. II, it is known that changes between the calibration and the feedback step can affect the EEG signals and make the feedback fail. In the study with 80 users, the bias of the classifier was updated in a supervised manner using the first 20 feedback trials (as in Shenoy et al. 2006), but this strategy proved insufficient for some of the participants. Whereas participants of Cat. II evidently had difficulties with the transition from offline to online operation, users of Cat. III did not show the expected modulation of sensorimotor rhythms (SMRs): either no SMR idle rhythm was observed over motor areas, or this idle rhythm was not attenuated during motor imagery. Here we present preliminary results of a one-session pilot study that investigated whether co-adaptive learning using machine-learning techniques could help users of Cat. II and III to achieve successful feedback.
Our results show that adaptive machine learning methods helped participants who had previously suffered from the BCI illiteracy problem to gain control of the system.

Materials and Methods

Experimental Setup

The study consisted of a one-day session that started immediately with BCI feedback using a pre-trained subject-independent classifier, as in Vidaurre et al. (2007). Using supervised and unsupervised techniques, the classifier was adapted to the specific brain signals of the user during the session. Adaptation was performed in three levels. While the feedback application itself stayed the same for the whole experiment, the features on which the classifier operated and the adaptation methods changed from level to level as described below.

Methods

Eleven participants took part in the study. Six of them belonged to Cat. I (for one novice user, no prior data was available, but she turned out to be a Cat. I user), two further participants belonged to Cat. II and three to Cat. III. All users performed 8 feedback runs, each consisting of 100 trials (50 trials of each class). The timing of the trials was as follows: at time 0, the cue was provided in the form of a small arrow over a cross placed in the middle of the screen; one second later, the cross started to move to provide feedback. Its speed was determined by the classification output (similar to Blankertz et al. (2007, 2008a)). The task of the participant was to use motor imagery to make the cursor move in the previously indicated target direction. The feedback lasted for 3 s and was followed by a short pause. Two different types of motor imagery, chosen out of three possibilities (motor imagery of left hand, right hand or foot), were selected in advance. For seven users, previous motor imagery data was available that revealed which two motor imagery tasks should be used. For the other four participants (three of Cat. III and one novice) no prior information could be used and they were asked to select two out of the three possible motor imagery tasks. Throughout the whole session, all classifiers were based on Linear Discriminant Analysis (LDA). When advisable due to the high dimensionality of the features, the estimate of the covariance matrix needed for LDA was corrected by shrinkage (Ledoit and Wolf 2004; Vidaurre et al. 2009). In order to define the adaptation schemes for LDA, we use a specific variant that is introduced here. LDA assumes that the covariance matrices of both classes are equal, which yields a linear decision boundary; this common covariance matrix is denoted by \({\varvec{\Sigma}}\) here.
Furthermore, we denote the means of the two classes by \({\varvec{\mu}}_1\) and \({\varvec{\mu}}_2\), an arbitrary feature vector by \(\user2{x}\), and define:

$$ D({\user2{x}}) = \left[b; \user2{w}\right]^{\top} \cdot \left[1; {\user2{x}} \right] $$
(1)
$$ \user2{w} = {\varvec{\Sigma}}^{-1} \cdot ({\varvec{\mu}}_2 - {\varvec{\mu}}_1) $$
(2)
$$ b = - \user2{w} ^{\top} \cdot {\varvec{\mu}}$$
(3)
$$ {\varvec{\mu}} =\frac{{\varvec{\mu}}_1 + {\varvec{\mu}}_2}{2} $$
(4)

where \(D(\user2{x})\) is the signed (unnormalized) distance of the feature vector \(\user2{x}\) to the separating hyperplane, which is described by its normal vector \(\user2{w}\) and bias b. Note that the covariance matrices and mean values used in this paper are sample covariance matrices and sample means, estimated from the data. To simplify the notation and the description of the methods, we will in the following write covariance matrix instead of sample covariance matrix and mean instead of sample mean. Usually, the covariance matrix used in Eq. 2 is the class-average covariance matrix. However, it can be shown that using the pooled covariance matrix (which can be estimated without label information, just by aggregating the features of all classes) yields the same separating hyperplane. In this study we used the pooled covariance matrix in Eq. 2. Similarly, the class-average mean (calculated in Eq. 4) can be replaced by the pooled mean (the average over all feature vectors of all classes). This implies that the bias of the separating hyperplane can be estimated (and adapted) in an unsupervised manner, i.e., without label information. The only requirement is an estimate of the prior probabilities of the two classes. If LDA is used as a classifier, an observation \(\user2{x}\) is classified as class 1 if \(D(\user2{x})\) is less than 0, and as class 2 otherwise. In the cursor control application, however, we use the classifier output \(D(\user2{x})\) as a real number that determines the speed of the cursor. Finally, we introduce the features and classifiers that were used in the three levels of the experiment, including three online adaptation schemes. The first two are supervised, i.e., they require the class label (type of motor imagery task) of the past trial in order to update the classifier. The last method updates the classifier without knowing the task of the past trial (unsupervised adaptation).
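As an illustration of Eqs. 1–4, the following is a minimal NumPy sketch, not the implementation used in the study. The function names `train_lda` and `lda_output`, the synthetic two-dimensional data and the class labels 1 and 2 are assumptions made for this example. The sketch estimates the covariance matrix from all trials without labels (pooled covariance) and, for comparison, computes the normal vector obtained with the class-average covariance matrix; with equal class sizes the two vectors are expected to point in the same direction.

```python
import numpy as np

def train_lda(X, y):
    """Fit the LDA variant of Eqs. 1-4.

    X: array of shape (n_trials, n_features); y: labels in {1, 2}.
    Names and data layout are illustrative assumptions.
    """
    mu1 = X[y == 1].mean(axis=0)
    mu2 = X[y == 2].mean(axis=0)
    # Pooled covariance: estimated over all trials, ignoring the labels.
    sigma = np.cov(X, rowvar=False)
    w = np.linalg.solve(sigma, mu2 - mu1)  # Eq. 2
    mu = 0.5 * (mu1 + mu2)                 # Eq. 4
    b = -w @ mu                            # Eq. 3
    return w, b

def lda_output(w, b, x):
    """Eq. 1: negative output -> class 1, otherwise class 2."""
    return b + w @ x

# Synthetic demo data (assumption): two Gaussian classes of equal size.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal([-1.0, 0.0], 1.0, size=(n, 2)),
               rng.normal([1.0, 0.5], 1.0, size=(n, 2))])
y = np.repeat([1, 2], n)
w, b = train_lda(X, y)

# Comparison: normal vector from the class-average covariance matrix.
mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
sigma_avg = 0.5 * (np.cov(X[y == 1], rowvar=False)
                   + np.cov(X[y == 2], rowvar=False))
w_avg = np.linalg.solve(sigma_avg, mu2 - mu1)
cosine = w @ w_avg / (np.linalg.norm(w) * np.linalg.norm(w_avg))
```

In this sketch the two normal vectors differ only by a positive scale factor, and since b in Eq. 3 scales with w, the separating hyperplane itself is unchanged.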

Methods for Level 1 (runs 1–3)

The first run started with a pre-trained subject-independent classifier on simple features: band-power in the alpha (8–15 Hz) and beta (16–32 Hz) frequency ranges in three Laplacian channels at C3, Cz and C4. During these runs, the LDA classifier was adapted to the user after each trial. The inverse of the pooled covariance matrix (see Eq. 2) was updated for observation \(\user2{x}(t)\) using a recursive-least-squares algorithm (see Vidaurre et al. 2006 for more information):

$$ {\varvec{\Sigma}}(t)^{-1}=\frac{1}{1-UC}\left( {\varvec{\Sigma}}(t-1)^{-1} - \frac{{\user2{v}}(t)\cdot {\user2{v}}^{\top}(t)}{\frac{1-UC} {UC}+{\user2{x}}^{\top}(t)\cdot{\user2{v}}(t)} \right)$$
(5)

where \({\user2{v}}(t) = {\varvec{\Sigma}}^{-1}(t-1)\cdot {\user2{x}}(t)\). Note that the term \({\user2{x}}^{\top}(t)\cdot {\user2{v}}(t)\) is a scalar, so no costly matrix inversion is needed. In Eq. 5, UC stands for update coefficient and is a small number between 0 and 1. For the present study, we chose UC = 0.015 based on a simulation using the data of the screening study. To estimate the class-specific adaptive means \({\varvec{\mu}}_1(t)\) and \({\varvec{\mu}}_2(t)\), one can use an exponential moving average:

$$ {\varvec{\mu}}_i(t) = (1-UC)\cdot {\varvec{\mu}}_i(t-1) + UC\cdot {\user2{x}}(t) $$
(6)

where i is the class of \(\user2{x}(t)\) and UC was chosen to be 0.05. Note that the class-mean estimation is done in a supervised manner.
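The two update rules of Level 1 can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the code used in the experiment: the function names `update_inv_cov` and `update_class_mean` and the toy numbers are hypothetical. By the Sherman–Morrison identity, Eq. 5 is algebraically the inverse of \((1-UC)\cdot{\varvec{\Sigma}}(t-1) + UC\cdot{\user2{x}}(t)\cdot{\user2{x}}^{\top}(t)\) for a symmetric \({\varvec{\Sigma}}(t-1)\), which the sketch checks numerically against an explicit inversion.

```python
import numpy as np

def update_inv_cov(inv_cov, x, uc=0.015):
    """Eq. 5: update the inverse covariance matrix with one observation x.

    inv_cov is assumed symmetric; only matrix-vector products are needed.
    """
    v = inv_cov @ x                      # v(t) = Sigma^{-1}(t-1) x(t)
    denom = (1.0 - uc) / uc + x @ v      # scalar denominator
    return (inv_cov - np.outer(v, v) / denom) / (1.0 - uc)

def update_class_mean(means, x, label, uc=0.05):
    """Eq. 6: exponential moving average of the mean of the labelled class.

    means: dict mapping class label -> mean vector (hypothetical layout).
    Only the mean of the class given by `label` is changed.
    """
    updated = dict(means)
    updated[label] = (1.0 - uc) * means[label] + uc * x
    return updated

# Toy check (assumed numbers): recursive update vs. explicit inversion.
uc = 0.015
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, -1.2])
recursive = update_inv_cov(np.linalg.inv(sigma), x, uc)
direct = np.linalg.inv((1.0 - uc) * sigma + uc * np.outer(x, x))

means = {1: np.zeros(2), 2: np.ones(2)}
new_means = update_class_mean(means, np.array([1.0, 1.0]), label=1)
```

Because the class label of the trial selects which mean is updated, this scheme is supervised, in line with the note above.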

Methods for Level 2 (runs 4–6)

For the subsequent 3 runs, a classifier was trained on a more complex, composed band-power feature. On the data of runs 1–3, a subject-specific narrow band was chosen automatically (Blankertz et al. 2008b). For this frequency band, optimized spatial filters were determined by Common Spatial Pattern (CSP) analysis (Blankertz et al. 2008b). Furthermore, six Laplacian channels were selected according to their discriminability, which was quantified by a robust variant of the Fisher score (mean replaced by median). The selection of the positions was constrained such that two positions were selected from each of the areas over left hand, right hand and foot. While the CSP filters were static, the positions of the Laplacians were reselected based on the Fisher score of the channels. Channel selection and classifier were recalculated after each trial using the last 100 trials. The classifier used here was a regularized version of LDA with automatic shrinkage, to account for the higher dimensionality of the features, as in Vidaurre et al. (2009). The feature vector was the concatenation of the log band-power in the CSP channels and in the selected Laplacian channels. The repeatedly reselected Laplacian channels were included in order to provide flexibility with respect to the spatial location of the modulated brain activity. During these three runs, the adaptation to the user was again done in a supervised way.
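The channel reselection step can be illustrated with a short sketch. The exact form of the robust Fisher score is not given in the text beyond "mean replaced by median", so the formula below (squared difference of class medians divided by the sum of class variances) is an assumption, as are the function names, the grouping of nine hypothetical channels into three areas and the synthetic data.

```python
import numpy as np

def robust_fisher_score(X, y):
    """Per-channel score with class medians in place of class means.

    X: (n_trials, n_channels) log band-power; y: labels in {1, 2}.
    The denominator (sum of class variances) is an assumption.
    """
    X1, X2 = X[y == 1], X[y == 2]
    num = (np.median(X1, axis=0) - np.median(X2, axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0)
    return num / den

def select_channels(scores, areas, per_area=2):
    """Pick the `per_area` best-scoring channels within each area.

    areas: dict mapping area name -> list of channel indices (hypothetical).
    """
    selected = []
    for indices in areas.values():
        indices = np.asarray(indices)
        order = np.argsort(scores[indices])[::-1]   # best score first
        selected.extend(int(i) for i in indices[order[:per_area]])
    return selected

# Synthetic demo (assumption): 100 trials, 9 channels, 3 per area.
rng = np.random.default_rng(1)
y = np.repeat([1, 2], 50)
X = rng.normal(size=(100, 9))
discriminative = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0], dtype=float)
X[y == 2] += 5.0 * discriminative       # shift class 2 in six channels

areas = {"left hand": [0, 1, 2], "right hand": [3, 4, 5], "foot": [6, 7, 8]}
window = slice(-100, None)              # use only the last 100 trials
scores = robust_fisher_score(X[window], y[window])
selected = select_channels(scores, areas)
```

In an online setting this selection would be repeated after each trial on a sliding window of the most recent 100 trials, as described above.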

Methods for Level 3 (runs 7–8)

Finally, for the last 2 runs, CSP filters were calculated on the data of runs 4–6 and a classifier was trained on the resulting log band-power features. The bias of the classifier in Eq. 3 was adapted by updating the pooled mean \({\varvec{\mu}}\) after each trial with UC = 0.05. The update rule for the pooled mean was analogous to Eq. 6, but without distinction by class labels. Note that this adaptation scheme is unsupervised. For more information about unsupervised methods, see Vidaurre et al. (2008).
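The unsupervised bias adaptation can be sketched in a few lines. The function name `adapt_bias`, the fixed normal vector and the noise-free toy trials below are assumptions made for illustration. The pooled mean is updated as in Eq. 6 but without using labels, and the bias is then recomputed from Eq. 3 while \(\user2{w}\) stays fixed.

```python
import numpy as np

def adapt_bias(w, pooled_mean, x, uc=0.05):
    """Update the pooled mean with one unlabelled trial and recompute b.

    Follows Eq. 6 without class distinction, then Eq. 3 with fixed w.
    """
    new_mean = (1.0 - uc) * pooled_mean + uc * x
    new_bias = -w @ new_mean
    return new_mean, new_bias

# Toy scenario (assumption): the features are shifted by a constant offset
# relative to the data on which the classifier was trained.
w = np.array([1.0, 0.5])
shift = np.array([3.0, -2.0])
pooled_mean = np.zeros(2)               # mean before the shift
b = -w @ pooled_mean

for t in range(200):
    # Alternate the two classes; the labels are never given to adapt_bias.
    class_offset = np.array([-1.0, 0.0]) if t % 2 == 0 else np.array([1.0, 0.0])
    x = shift + class_offset
    pooled_mean, b = adapt_bias(w, pooled_mean, x)
```

With balanced classes the pooled mean tracks the midpoint between the two class means, so the bias follows the shift of the feature distribution without any label information. This relies on the assumption stated earlier that the class priors are known (here: equal).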

Results

As a verification of the novel experimental design, we first discuss the results for the six participants of Cat. I. Here, very good feedback performance was obtained within the first run after 20 to 40 trials (i.e., after 3–6 min) of adaptation; hit rates increased further in runs 2 and 3 and stayed at that level in subsequent runs. This can be seen in Fig. 1, where the grand average of feedback performance within each run is displayed for each category of participants. Note that all runs of one volunteer were recorded within one session. The challenge of the experiment was the performance of the two participants of Cat. II and the three users of Cat. III. None of those five participants had control in the first three runs, but they were able to gain it when the machine-learning-based techniques came into play in runs 4–6: the average performance for Cat. II showed a sudden jump from run 3 to run 4, and that for Cat. III a continuous increase in runs 4–6. According to Kübler et al. (2004), an accuracy of 70% is assumed to be the threshold required for BCI applications related to communication, such as cursor control.

Fig. 1

Grand average of feedback performance within each run (horizontal bars and dots for each group of 20 trials) for participants of Cat. I (N = 6), Cat. II (N = 2) and Cat. III (N = 3). An accuracy of 70% is assumed to be a threshold required for BCI applications related to communication such as cursor control. Note that all runs of one volunteer have been recorded within one session.

Conclusion

Machine Learning based BCIs use EEG features of larger complexity that can be fitted better to the individual characteristics of the brain patterns of each user (see Blankertz et al. 2007, 2008b; Dornhege et al. 2004, 2007; Müller et al. 2003, 2008). The downside of this approach is the need for an initial offline calibration. Furthermore, users are in a different mental state during offline calibration than during online feedback (cf. Shenoy et al. 2006), which renders the classifier that is optimized on the calibration data suboptimal and sometimes even non-functional for feedback (see Sugiyama et al. 2007; von Bünau et al. 2009 for a discussion of non-stationarities in BCI). Moreover, some users have difficulty performing motor imagery properly during calibration due to the lack of feedback. Here, we have presented a novel method for Machine Learning based brain–computer interfacing that addresses these problems. It replaces the offline calibration by a ‘coadaptive calibration’, in which the mental strategy of the user and the algorithm of the BCI system are jointly optimized. This approach leads some users very quickly (3–6 min) to accurate BCI control. Other users, who could not gain BCI control in the classic Machine Learning approach (i.e., belonging to Cat. II or III), could gain BCI control within one session, see Fig. 1. In particular, one participant who had no peak of the SMR idle rhythm at the beginning of the measurement developed one with our adaptive feedback training (Vidaurre et al. in prep). This important finding motivates the development of neurofeedback training procedures that might help to cure BCI illiteracy. Further studies with a larger number of participants will be required in order to confirm these initial findings.