A multi-scale fusion CNN model based on adaptive transfer learning for multi-class MI-classification in BCI system

Deep learning-based brain-computer interfaces (BCI) for motor imagery (MI) have emerged as a powerful method for establishing direct communication between the brain and external electronic devices. However, inter-subject variability, inherently complex signal properties, and the low signal-to-noise ratio (SNR) of the electroencephalogram (EEG) are major challenges that significantly hinder the accuracy of the MI classifier. To overcome this, the present work proposes an efficient transfer learning-based multi-scale feature fused CNN (MSFFCNN) which can capture the distinguishable features of various non-overlapping canonical frequency bands of EEG signals from different convolutional scales for multi-class MI classification. In order to account for inter-subject variability, the current work presents 4 different model variants, including subject-independent and subject-adaptive classification models, considering different adaptation configurations to exploit the full learning capacity of the classifier. Each adaptation configuration has been fine-tuned from an extensively trained pre-trained model, and the performance of the classifier has been studied for a vast range of learning rates and degrees of adaptation, which illustrates the advantages of using an adaptive transfer learning-based model. The model achieves an average classification accuracy of 94.06% (±2.29%) and a kappa value of 0.88, outperforming several baseline and current state-of-the-art EEG-based MI classification models with fewer training samples. The present research provides an effective and efficient transfer learning-based end-to-end MI classification framework for designing a high-performance and robust MI-BCI system.

commands to control external electronic devices [1,2,3,4,5,6,7]. BCI enables the rehabilitation of neuromotor disorders [8], robotic control [9,10,11], speech communication [12,13], etc. In BCI paradigms, MI classification is the most critical part, in which brain signals are translated into control signals [14,15]. During MI, desynchronization of neural activity is triggered in the primary motor cortex contralateral to the movement, which results in a decrease of the µ and β rhythms, known as event-related desynchronization (ERD) [16,17]. It is usually followed by an increase in the β rhythm, called event-related synchronization (ERS), when the MI ceases. Thus, the main goal is to classify different MI tasks according to the ERS and ERD characterized by changes in the power spectrum of the µ (8-14 Hz) and β (14-30 Hz) bands.
In this regard, the common spatial pattern (CSP) [18] and filter bank CSP (FBCSP) [19] are efficient methods for extracting discriminative features that represent ERD and ERS for MI classification. FBCSP finds a set of linear projections that maximize the differences in the variance of the MI classes by employing filtered signals from various frequency bands, which has been shown to be effective in improving accuracy.
However, EEG brain signals have various distinct characteristics (i.e., non-linearity, uniqueness, and non-stationary behavior) which vary significantly across human brains and depend on the mental state of the individual subject [20]. Moreover, noise from various muscle artifacts, fatigue, changes of environment, and internal body states may significantly alter the characteristics of EEG signals [14,15]. Hence, the aforementioned factors make it challenging to improve the signal-to-noise ratio (SNR) for better performance in MI classification.
With its recent advancements, deep learning (DL) has demonstrated superior performance in various applications [21,22,23]. In this regard, the convolutional neural network (CNN) has an inherent capability of adapting to non-linear EEG signals and extracting important feature information automatically in MI-BCI classification [24,25]. Thus, several studies have been geared towards EEG signal classification employing CNNs [24,25,26,27,28,29,30]. More recently, various CNN-based methods [31,32,33,34,35] have been developed which demonstrate excellent performance in EEG MI classification.
However, DL-based algorithms contain large numbers of trainable model parameters, which require a significant amount of training data and lead to an increase in the computation time to train the classifier [36,37]. In this regard, the transfer learning (TL) based approach is an effective strategy utilizing pre-trained weights from different subject cases [37,38,39,40,41]. In a recent development, the TL method has been employed in DL-based systems, in which an FBCSP-based EEG representation utilizing knowledge distillation techniques has been used for MI-BCI classification with a fine-tuned CNN model [36,42]. Additionally, a subject-independent deep CNN model has been developed using spectral-spatial input generation for the MI-BCI system [37].
A hybrid deep neural network utilizing transfer learning has been applied for multi-class MI decoding for better performance [41]. Furthermore, adaptive transfer learning-based deep CNN has been employed for EEG MI classification which has demonstrated significant improvement in classification accuracy compared to subject-dependent model [40].
Although CNN-based models have achieved better results for EEG-based MI classification, several issues still hinder the performance of the classifier.
Firstly, current CNN-based models only consider a single convolution scale for extracting features from the MI EEG signal. Such a strategy may not be suitable for efficiently capturing the distinguishable features of the various non-overlapping canonical frequency bands of EEG signals [43].
Secondly, the extraction of important discriminative features accounting for event-related desynchronization and synchronization (ERD/ERS) from the MI EEG signal has often been ignored, which limits the classifier's ability to learn important semantic features from the raw EEG data.
Thirdly, only a handful of studies have been geared towards establishing an efficient transfer learning framework to address the challenge of inter-subject variability in the BCI system, which requires fine-tuning the model during target MI subject classification [44]. Furthermore, feature extraction from a multi-scale CNN considering adaptation-based transfer learning has yet to be designed for full integration into an end-to-end DL workflow, which is the main bottleneck for the deployment of robust BCI applications with good classification accuracy. Therefore, the goal of the current work is to develop a robust deep learning framework accounting for the inter-subject variability of different subjects and to further improve the performance of the model by employing a subject-adaptive transfer learning strategy to achieve better accuracy with fewer training samples.
Motivated by the aforementioned challenges and shortcomings, the current work proposes a transfer learning-based multi-scale feature fused CNN (MSFFCNN) for EEG-based multi-class MI classification. The major contributions and findings of the present research work can be summarized as follows.
• The current work designs an efficient multi-scale CNN (MSCNN) architecture to capture the semantic features of EEG signals from multiple convolutional scales for the four distinguishable frequency bands α, β, δ, and θ of the raw EEG signal to enhance the performance of the MI classifier.
• The present study designs a new FBCSP with a one-vs-rest (OVR) CNN block (OVR-FBCSP CNN) for extracting the discriminative spatiotemporal CSP features of event-related desynchronization and synchronization (ERD/ERS) suitable for multi-class MI classification tasks.
• In the current work, 4 different model variants of the MSCNN, including subject-specific, subject-independent, and subject-adaptive classification models considering two different adaptation configurations, have been proposed to exploit the full learning capacity of the classifier.
• The performance of the two different subject-adaptive models has been extensively studied for a vast range of learning rates and degrees of adaptation to explore the adaptation capability of the proposed model accounting for inter-subject variability.
• The current study illustrates the advantages of using an adaptive transfer learning-based model over the subject-specific and subject-independent models in terms of the overall performance of the classifier, achieving the best average classification accuracy and outperforming several state-of-the-art models with fewer training samples.
The proposed framework requires less training data and computation time, making it suitable for designing efficient and robust real-time human-robot interaction. The current study effectively addresses the shortcomings of existing CNN-based EEG-MI classification models and significantly improves the classification accuracy. The paper is organized as follows: Section 2 describes the dataset and the proposed EEG data representation; the OVR-FBCSP has been described in Section 3; Section 4 introduces the proposed MSFFCNN framework; Section 5 describes the transfer learning methods; Sections 6 and 7 deal with the relevant findings and discussion, respectively.

Dataset and EEG input data representation:
2.1 Dataset: For the current study, the performance of the MSFFCNN model has been evaluated on the BCI competition IV-2a dataset, which contains EEG data from 9 subjects for 4 different MI classes (i.e., left hand, right hand, feet, and tongue) [45]. The recorded EEG data comprise two sessions from each subject, with the first and second sessions consisting of training and test data, respectively. Each session was recorded for 288 trials (72 trials per class) from 22 EEG channels with a sampling frequency of 250 Hz according to the standardized international 10-20 electrode system, as shown in Fig. 1-(a). In a single trial, there was a cue followed by 4 sec of MI activity from each subject for each of the four classes, as illustrated in Fig. 1.

2.2 Proposed EEG data representation: Since DL methods in MI-BCI systems require relatively large numbers of EEG training samples to achieve good classification accuracy, the current work proposes a 1D EEG segment as the input data representation. Generally, in MI-BCI systems, the input data are considered as a 2D array combining the spatial information of multiple channels (electrodes) and the corresponding time-series data for each trial [40]. Such a representation ignores the positional electrode distribution in the actual acquisition device and may lead to complex multi-channel correlations instead of simple adjacent relationships [25]. To circumvent such issues, a 1D EEG segment from each electrode over a fixed time window of 4 sec has been considered for a particular subject to segment the signals related to the MI task, as illustrated in Fig. 2. In the proposed method, the input data from each channel can be represented by a 1D vector x_i ∈ R^(1×P) (i = 1, 2, ..., m), where m is the number of channels and P = 1000 is the number of sampling points for a single MI task. Such a simple and effective data representation results in a substantial increase in the number of training samples, containing only the time-varying information and time dependence of the signals related to the MI activity. The proposed method can effectively eliminate unnecessary features such as the channel-related spatial information and correlations between electrodes [25,40].

Figure 2: Illustration of the proposed EEG input data representation, where the acquired 2D data from the electrode distribution has been segmented into 1D vectors during the MI-activity time window.
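The proposed 1D segmentation can be sketched in NumPy. The 250 Hz sampling rate, 4 sec window, and 22-channel layout follow the text; the trial array layout and the `cue_sample` offset are illustrative assumptions:

```python
import numpy as np

FS = 250          # sampling frequency (Hz), per the dataset description
T_MI = 4          # MI-activity time window (s)
P = FS * T_MI     # number of sampling points per MI task (1000)

def segment_trial(trial_2d, cue_sample=0):
    """Split one 2D EEG trial (m channels x samples) into m 1D vectors.

    trial_2d: array of shape (m, n_samples) for a single trial; the
    MI-activity window is assumed to start at `cue_sample`.
    Returns an array of shape (m, 1, P): one 1xP vector per channel.
    """
    m = trial_2d.shape[0]
    window = trial_2d[:, cue_sample:cue_sample + P]  # fixed 4 s window
    return window.reshape(m, 1, P)

# Example: a synthetic trial with 22 channels and 7 s of recorded data
rng = np.random.default_rng(0)
trial = rng.standard_normal((22, 7 * FS))
segments = segment_trial(trial, cue_sample=2 * FS)
print(segments.shape)  # (22, 1, 1000)
```

Each channel thus contributes an independent 1×1000 training sample, which is how the representation multiplies the effective training-set size.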

One-vs-Rest Filter Bank Common Spatial Pattern:
In the present work, FBCSP with the one-vs-rest (OVR) method has been utilized for extracting the spatiotemporal-frequency features of event-related desynchronization and synchronization (ERD/ERS) [46]. The OVR-FBCSP network consists of several FBCSP blocks, where the segment of the EEG signal is decomposed through a filter bank containing an array of multiple Type-II Chebyshev sub-band-pass filters to extract the discriminative CSP features for the multi-class MI-BCI system [34,41,47].
3.1 Spatial feature extraction: By employing OVR-CSP, spatial filtering has been performed by linearly transforming the EEG signal to obtain the feature information using

Z_{b,t} = W_b^T X_{b,t},  (1)

where t is the number of EEG samples per channel; T denotes the transpose operation; X_{b,t} ∈ R^(1×t) represents the single-trial EEG segment from the b-th band-pass filter of the t-th trial; W_b = [W_{b,1}, W_{b,2}, W_{b,3}, W_{b,4}] is the weight of the OVR-FBCSP filter, in which W_{b,j} (j = 1, 2, 3, 4) represents the CSP projection matrix of the j-th class. Following the eigenvalue decomposition problem [46], the transformation matrix W_{b,j} can be obtained to yield optimal discriminating feature variances for multi-class MI as

C_{b,j} W_{b,j} = (Σ_{j=1}^{N} C_{b,j}) W_{b,j} E_{b,j},  (2)

where N is the number of MI classes; C_{b,j} is the covariance matrix of the b-th band-pass filtered EEG signal of the respective j-th MI class; E_{b,j} is the diagonal matrix containing the eigenvalues of C_{b,j}. Utilizing W_{b,j} from Eq. 2, the spatially filtered signal Z_{b,t} maximizes the differences in the variance of the MI classes. For each class, the m pairs of CSP features of the t-th time window for the b-th band-pass filtered EEG signal can be expressed as

f_{b,t} = log( diag(Ŵ_b^T X_{b,t} X_{b,t}^T Ŵ_b) / tr(Ŵ_b^T X_{b,t} X_{b,t}^T Ŵ_b) ),  (3)

where f_{b,t} ∈ R^(2m) is the OVR-FBCSP output; Ŵ_b denotes the first m and last m columns of W_{b,j} (j = 1, 2, 3, 4); diag(·) gives the diagonal elements of a square matrix; tr(·) is the trace of the matrix. Note that m = 2 has been used for the BCI IV-2a dataset. The OVR-FBCSP feature vector corresponding to the t-th time window can be expressed as

f_t = [f_{1,t}, f_{2,t}, ..., f_{k,t}],  (4)

where f_t ∈ R^(1×2mk) (t = 1, 2, ..., n); n is the total number of time windows (i.e., trials); k is the total number of band-pass filters. The training data comprise the extracted features f̄_t ∈ R^(n_t×2mk) and the corresponding true class labels ȳ_t ∈ R^(n_t×1) (t = 1, 2, ..., n_t), where n_t is the total number of trials in the training data, to make a distinction from the evaluation data. Finally, the output matrix from OVR-FBCSP for each 1 sec time window has been attached to a CNN layer called OVR-FBCSP CNN for spatial feature extraction (see Section 4.3).
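A minimal sketch of the OVR-CSP computation in Eqs. (1)-(3), assuming the standard generalized-eigenvalue formulation of CSP; variable names mirror the text, and the synthetic data are placeholders:

```python
import numpy as np
from scipy.linalg import eigh

def ovr_csp_filters(trials, labels, target, m=2):
    """One-vs-rest CSP projection for `target` class vs the rest.

    trials: (n_trials, n_ch, n_samples) band-pass filtered EEG segments
    labels: (n_trials,) class labels
    Returns W_hat: (n_ch, 2m) -- the first and last m eigenvectors.
    """
    def mean_cov(X):
        # normalized spatial covariance, averaged over trials
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)

    C_target = mean_cov(trials[labels == target])
    C_rest = mean_cov(trials[labels != target])
    # generalized eigenvalue problem: C_target w = lambda (C_target + C_rest) w
    eigvals, eigvecs = eigh(C_target, C_target + C_rest)
    W = eigvecs[:, np.argsort(eigvals)[::-1]]       # sort descending
    return np.hstack([W[:, :m], W[:, -m:]])         # W_hat: 2m spatial filters

def csp_features(W_hat, x):
    """Log-variance CSP features of Eq. (3) for one trial x (n_ch, n_samples)."""
    Z = W_hat.T @ x                                 # spatially filtered Z_{b,t}
    var = np.diag(Z @ Z.T)
    return np.log(var / np.trace(Z @ Z.T))

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 22, 250))              # 40 synthetic 1 s trials
y = np.repeat(np.arange(4), 10)                     # 4 MI classes, 10 trials each
W_hat = ovr_csp_filters(X, y, target=0, m=2)
f = csp_features(W_hat, X[0])
print(W_hat.shape, f.shape)  # (22, 4) (4,)
```

With m = 2 and k = 12 band-pass filters, concatenation as in Eq. (4) yields a 1×48 feature vector per time window.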
3.2 Signal preprocessing: Each 4 sec long EEG signal of an MI trial has been segmented into 4 parts of 1 sec time windows and fed into OVR-FBCSP to obtain the spatial features, as illustrated in Fig. 6. In OVR-FBCSP, the inputted EEG signal has been passed through a filter bank containing an array of a total of 12 band-pass filters, i.e., 2-6, 4-8, ..., 24-28 Hz. Each filter, with a bandwidth of 4 Hz and an overlap of 2 Hz, has been employed to cover the frequency range 2-32 Hz. The EEG signal comprises non-overlapping canonical frequency bands, where each band signifies distinct behavioral states [48,49]. For MI-BCI systems, the α (8-13 Hz) and β (13-30 Hz) bands are the most important [50], since the increase/decrease of the power spectrum of these bands results in ERS/ERD, respectively [51,52]. Meanwhile, the low-frequency δ-band (2-4 Hz) carries important class-related information [14,53,54]. Additionally, the θ-band (4-8 Hz) differs during left/right-hand MI, which is helpful during the MI-BCI classification process [33,55,56]. Hence, these four non-overlapping bands (i.e., α, β, δ, and θ) have been considered for feature extraction by employing the filter bank.

Since the input EEG data representation is a 1D time-series signal, a 1D-CNN has been employed, which is relatively easier to train and offers minimal computational complexity compared to its 2D counterparts whilst achieving state-of-the-art performance [57]. The convolution layer consists of 1D-convolution filters of a specified kernel stride which perform convolution operations sliding along the time axis of the EEG signal to obtain feature maps and time-frequency information of the time-series data [58,59].
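The Chebyshev filter bank described above might be sketched with SciPy as follows; the filter order and stop-band attenuation are assumptions (the text only specifies the Type-II Chebyshev family, the band edges, and the 250 Hz sampling rate):

```python
import numpy as np
from scipy.signal import cheby2, filtfilt

FS = 250  # sampling frequency (Hz)

def filter_bank(x, fs=FS, order=4, rs=30):
    """Decompose a 1D EEG segment with 12 Type-II Chebyshev band-pass
    filters: 2-6, 4-8, ..., 24-28 Hz (4 Hz bandwidth, 2 Hz overlap).

    `order` and `rs` (stop-band attenuation in dB) are illustrative choices.
    Returns an array of shape (12, len(x)).
    """
    bands = [(lo, lo + 4) for lo in range(2, 26, 2)]  # 12 sub-bands
    out = []
    for lo, hi in bands:
        b, a = cheby2(order, rs, [lo, hi], btype='bandpass', fs=fs)
        out.append(filtfilt(b, a, x))  # zero-phase filtering
    return np.stack(out)

rng = np.random.default_rng(2)
sub_bands = filter_bank(rng.standard_normal(1000))  # one 4 s channel segment
print(sub_bands.shape)  # (12, 1000)
```

Zero-phase `filtfilt` avoids introducing phase distortion into the band-limited segments before CSP feature extraction; a causal `lfilter` would be the alternative for online use.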
In a 1D-CNN, forward propagation can be expressed as

x_k^l = b_k^l + Σ_i conv1D(w_{ik}^(l-1), s_i^(l-1)),  (5)

where x_k^l is the input and b_k^l represents the bias of the k-th feature map in the l-th layer; w_{ik}^(l-1) is defined as the connecting weight between the i-th feature of the (l-1)-th layer and the k-th feature of the l-th layer; s_i^(l-1) represents the output of the i-th feature of the (l-1)-th layer; conv1D denotes the convolution operation. The intermediate output y_k^l can be obtained by passing x_k^l through the activation function f(·) as

y_k^l = f(x_k^l),   s_k^l = y_k^l ↓ Ω,  (6)

where s_k^l is the output of the k-th feature map in the l-th layer; Ω denotes the scalar factor of the down-sampling operation ↓; the rectified linear unit (ReLU) f(x) = max(0, x) has been chosen as the activation function. During back-propagation, b_k^l and w_{ik}^(l-1) are updated once the weight and bias sensitivities are determined by minimizing the error value E, using the learning rate η:

w_{ik}^(l-1) ← w_{ik}^(l-1) − η ∂E/∂w_{ik}^(l-1),   b_k^l ← b_k^l − η ∂E/∂b_k^l.  (7)

The convolutional layer is followed by a pooling (down-sampling) layer to reduce the spatial size of the representation and the number of network parameters while preserving important and relevant feature information. It aggregates neighboring values in the feature map by taking the average (average pooling) or the maximum (max pooling) of the feature information. Since max pooling is more efficient in preserving the main features of the previous layer after down-sampling an input representation, it has been adopted in the present study.
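Equations (5)-(6) can be written out directly in NumPy; this toy forward pass uses random placeholder weights purely to illustrate the per-layer computation, not the trained model:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv1d_layer(s_prev, w, b):
    """Forward pass of Eq. (5): x_k = b_k + sum_i conv1D(w_ik, s_i),
    followed by the ReLU activation f(.) of Eq. (6).

    s_prev: (n_in, L) outputs of layer l-1
    w: (n_in, n_out, k) kernels; b: (n_out,) biases
    """
    n_in, n_out, k = w.shape
    L_out = s_prev.shape[1] - k + 1          # 'valid' convolution length
    x = np.zeros((n_out, L_out))
    for kk in range(n_out):
        for i in range(n_in):
            # np.convolve flips the kernel; reverse it for correlation form
            x[kk] += np.convolve(s_prev[i], w[i, kk][::-1], mode='valid')
        x[kk] += b[kk]
    return relu(x)

def max_pool1d(y, omega=2):
    """Down-sampling step s_k = y_k with scalar factor Omega (max pooling)."""
    L = (y.shape[1] // omega) * omega
    return y[:, :L].reshape(y.shape[0], -1, omega).max(axis=2)

rng = np.random.default_rng(3)
s = rng.standard_normal((1, 1000))           # one input channel (1 x P)
w = rng.standard_normal((1, 8, 5)) * 0.1     # 8 kernels of width 5
y = conv1d_layer(s, w, np.zeros(8))
print(max_pool1d(y).shape)  # (8, 498)
```

A framework such as Keras performs the same computation with `Conv1D` and `MaxPooling1D` layers, with the update of Eq. (7) handled by the optimizer.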

Multi-scale convolution block (MCB):
During the convolution operation, different kernel sizes extract different spatial feature maps. For example, a relatively large kernel size captures the overall features; however, it may miss relevant and important fine-grain feature information. In such cases, a relatively small kernel size can effectively retain the fine-grain information [60,61]. Thus, in order to improve the performance of the MI classifier, a multi-scale convolution consisting of various kernel sizes can be an effective strategy to preserve both fine-grain, high-frequency localized information as well as low-frequency overall representations for the various frequency bands of the EEG signal [31,43].
Thus, a multi-scale convolution block (MCB) has been designed consisting of three different kernel sizes in the convolution process. The network architecture of the MCB is shown in Fig. 5, where the smallest kernel retains fine-grain information, the medium kernel of size 1×3 can capture relatively coarse-grain feature information, whereas λ_L represents the large kernel size of 1×5 which can collect the overall feature map efficiently. These three blocks are then followed by a max-pooling layer to further reduce the network parameters.
In addition, a max-pooling and convolution layer of 1×3 has been utilized to preserve important features [31,43], as shown in Fig. 5. In the MSCNN, the EEG signal has been divided into four different frequency-band channels and passed through the corresponding MCB blocks (i.e., MCB_i, i = δ, θ, α, and β), as shown in Fig. 4. Finally, the multi-scale feature information has been obtained by a concatenation operation. The network parameters of the MCB architecture have been detailed in Table 1. The proposed MSCNN network can extract feature information from EEG signals on multiple scales, which significantly improves the classification accuracy of the MI-BCI classifier. See Table 1 for the network parameters of the MCB.
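One branch set of the MCB can be sketched in NumPy as parallel 'same'-padded convolutions at three kernel scales followed by max pooling and concatenation; the kernel values (and the smallest kernel size of 1) are placeholders for the learned weights and the unspecified λ_S size:

```python
import numpy as np

def conv_same(x, kernel):
    """1D 'same' convolution of a single channel with one kernel."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.convolve(xp, kernel[::-1], mode='valid')[:len(x)]

def multi_scale_block(x, k_small, k_med, k_large, omega=2):
    """Sketch of one MCB: parallel convolutions with small, medium (1x3),
    and large (1x5) kernels, ReLU, max pooling, then concatenation.
    """
    def pool(y):
        L = (len(y) // omega) * omega
        return y[:L].reshape(-1, omega).max(axis=1)

    branches = [pool(np.maximum(conv_same(x, k), 0.0))
                for k in (k_small, k_med, k_large)]
    return np.concatenate(branches)  # fused multi-scale feature

rng = np.random.default_rng(4)
x = rng.standard_normal(1000)                 # one band-pass channel segment
feat = multi_scale_block(x,
                         rng.standard_normal(1),   # small kernel (assumed size)
                         rng.standard_normal(3),   # medium kernel, 1x3
                         rng.standard_normal(5))   # large kernel, 1x5
print(feat.shape)  # (1500,)
```

In the full MSCNN, one such block runs per frequency band (δ, θ, α, β) and the four outputs are concatenated into the fused multi-scale representation.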
The output of OVR-FBCSP for each 1 sec time window has been connected to a CNN block, called OVR-FBCSP CNN, to extract the spatial features, as illustrated in Fig. 6. Each OVR-FBCSP CNN consists of two convolution-pooling layers. The feature output of size 12×12 for each time window from OVR-FBCSP has been passed through a 2D-convolution layer. The output feature map x_k^l after the 2D-convolution operation can be expressed as

x_k^l = f( b_k^l + Σ_{i∈M_j} conv2D(w_{ik}^(l-1), s_i^(l-1)) ),  (8)

where M_j represents the input feature collection and conv2D denotes the 2D-convolution operation. The activation function ReLU has been used. The convolution layer is followed by a max-pooling layer in order to reduce the size of the feature map, with the stride of the pooling kernel chosen as 2. Additionally, the zero-padding method has been employed to preserve the edge information and the size of the spatial feature map. Finally, the output of the max-pooling layer has been resized through a flatten layer, which produces a 1×432 array. The main network parameters of the OVR-FBCSP CNN architecture have been listed in Table 2. The local features extracted by the MSCNN and the OVR-FBCSP CNN have been concatenated together to form the global feature, which has been connected to a convolution and then a max-pooling layer, as shown in Fig. 3.
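The 12×12 → 1×432 flattening quoted above can be checked with simple shape arithmetic: with zero ('same') padding only the stride-2 pooling shrinks the spatial size, and a 48-filter final convolution (an assumption chosen to reproduce the stated 432; Table 2 in the original lists the exact parameters) gives the quoted flatten size:

```python
def conv_pool_out(size, pool=2):
    """Spatial size after a 'same'-padded convolution + stride-2 max pooling."""
    return size // pool

h = w = 12                 # OVR-FBCSP feature-map size per 1 s time window
for _ in range(2):         # two convolution-pooling layers
    h, w = conv_pool_out(h), conv_pool_out(w)

n_filters = 48             # assumed filter count of the last conv layer
flat = h * w * n_filters   # flatten-layer output length
print(h, w, flat)  # 3 3 432
```
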
Figure 6: Schematic of the network structure for the proposed OVR-FBCSP CNN block. See Table 2 for the network parameters of the OVR-FBCSP CNN.

Training optimization:
The fully connected layer utilizes ReLU as the activation function in the hidden layers, which helps to accelerate the optimization process of the network and provides better classification accuracy compared to other activation functions for MI-BCI applications [31]. The softmax function has been utilized in the output layer to obtain an exponential probability distribution over the 4 different MI-BCI classification tasks, which can be expressed as

p_m = exp(z_m) / Σ_{j=1}^{T} exp(z_j),  (9)

where T represents the total number of classes; m represents the index of the corresponding class; z_m denotes the m-th input to the output layer.
Additionally, the cross-entropy loss function has been utilized during training to optimize the model. The cross-entropy L_CE can be expressed as

L_CE = − Σ_{i=1}^{n} t_i log(p_i),  (10)

where n is the total number of classes; t_i is the truth label; p_i is the softmax probability for the i-th class. In addition, the Adam [62] optimization scheme has been implemented to minimize the cross-entropy loss. Moreover, the dropout technique [63] has been employed to prevent over-fitting and accelerate the training procedure.
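The softmax and cross-entropy computations of Eqs. (9)-(10) can be sketched as follows; the logit values are illustrative:

```python
import numpy as np

def softmax(z):
    """Exponential probability distribution over the T classes, Eq. (9)."""
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(t, p, eps=1e-12):
    """L_CE = -sum_i t_i log(p_i), Eq. (10), for one-hot truth t."""
    return -np.sum(t * np.log(p + eps))

z = np.array([2.0, 0.5, 0.1, -1.0])  # logits for the 4 MI classes
p = softmax(z)
t = np.array([1.0, 0.0, 0.0, 0.0])   # one-hot truth label (e.g., left hand)
loss = cross_entropy(t, p)
print(p, float(loss))
```

In practice this is exactly what a `softmax` output layer with a categorical cross-entropy loss and the Adam optimizer computes inside a DL framework.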

Transfer learning modeling:
Generally, CNN-based MI-BCI classification algorithms contain large numbers of trainable model parameters, which require a significant amount of training data and lead to an increase in computation time [36,37]. In order to overcome the aforementioned issues, transfer learning can be an efficient strategy utilizing pre-trained weights from different subject cases [36,37,39,40,41]. However, transfer learning can be challenging in the MI-BCI system due to the substantial inter-subject variability between different subjects [44,40]. Therefore, adaptation schemes require special attention in fine-tuning the model parameters prudently to establish an efficient transfer learning-based MI-BCI classifier [39,40,41]. In the present study, three different classification procedures, including the subject-specific, subject-independent, and subject-adaptive classification methods, have been employed, which are detailed in the subsequent sections. The two proposed adaptation configurations have been illustrated in Fig. 7-(a). The degree of adaptation ξ can be defined as the fraction of training data used to fine-tune the model for each subject-adaptive configuration, as shown in Fig. 7-(b). The numbers of required trainable network parameters Ψ_AS for the two adaptation configurations have been outlined in Fig. 7-(c).
Additionally, the learning rate η has been scaled down to avoid clobbering the initialization [64]. Starting from Eq. 7, one can define a scaled learning rate η' = θη, where 0 < θ ≤ 1 is the scaling factor of η; for θ < 1, the model accepts scaled-down weighted adaptation. In the present study, different scaling factors θ have been considered to obtain the optimal choice of the learning rate for efficient adaptation and to enhance the performance of the classifier (see Section 6.3.1).
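The scaled-down adaptation step can be sketched as a plain gradient update; the weight and gradient values are illustrative placeholders:

```python
import numpy as np

def adapt_step(w, grad, eta=1e-3, theta=0.01):
    """One fine-tuning update with scaled learning rate eta' = theta * eta
    (0 < theta <= 1); theta < 1 gives scaled-down weighted adaptation,
    keeping the update close to the pre-trained initialization.
    """
    return w - (theta * eta) * grad

w_pretrained = np.array([0.5, -0.2, 1.0])   # weights from the pre-trained model
grad = np.array([10.0, -5.0, 2.0])          # gradient on the target subject
w_full = adapt_step(w_pretrained, grad, theta=1.0)    # un-scaled update
w_scaled = adapt_step(w_pretrained, grad, theta=0.01) # scaled-down update

# the scaled step moves far less from the pre-trained weights
print(np.abs(w_scaled - w_pretrained).max()
      < np.abs(w_full - w_pretrained).max())  # True
```

In a framework implementation the same effect is obtained by re-compiling the pre-trained model with the optimizer's learning rate multiplied by θ before fine-tuning on the target subject's data.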

Results and discussions:
In this section, the accuracy and performance of the subject-specific, subject-independent, and subject-adaptive classification models have been discussed and compared with several existing state-of-the-art methods. The performance of the proposed model has been evaluated by the classification accuracy A_c = Σ_{i=1}^{Q} n_ii / N, which can be obtained from the confusion matrix as the ratio of the sum of the diagonal elements Σ_{i=1}^{Q} n_ii to the total number of samples N. For a fair comparison, a similar training strategy has been applied for the proposed MSFFCNN-1, MSFFCNN-2, and MSFFCNN-TL models. During training, a batch size of 24 has been considered, which attains the highest classification accuracy and optimizes the convergence speed. The network has been trained for 350 epochs. In the second FC layer, a dropout probability of 0.5 has been prescribed. The models have been implemented in the Keras API with TensorFlow as the backend, and trained and tested using an Intel Core i7 CPU with a single NVIDIA GeForce RTX 2080 8 GB GPU. The average A_c value and the standard error SE = s/√n (s is the sample standard deviation; n is the sample size) are calculated across all subjects.
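The two evaluation quantities defined above can be computed directly; the confusion matrix below is a hypothetical example, not a result from the paper:

```python
import numpy as np

def accuracy_from_confusion(C):
    """A_c = sum_i n_ii / N: diagonal sum over the total number of samples."""
    return np.trace(C) / C.sum()

def standard_error(acc_per_subject):
    """SE = s / sqrt(n) across subjects (s: sample std dev, n: sample size)."""
    a = np.asarray(acc_per_subject, dtype=float)
    return a.std(ddof=1) / np.sqrt(len(a))

# Hypothetical 4-class confusion matrix for one subject (rows: true class)
C = np.array([[70, 1, 1, 0],
              [2, 68, 1, 1],
              [1, 2, 67, 2],
              [0, 1, 2, 69]])
acc = accuracy_from_confusion(C)
print(round(acc, 4))  # 274/288 -> 0.9514
```
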
From the comparison, one can see that max-pooling provides the best classification accuracy, with a 0.55% improvement over average pooling, as depicted in Fig. 8-(b). Moreover, max-pooling provides the smallest SE value of 2.89%, which is 0.30% lower than that of average pooling.
Thus, max-pooling has been adopted in all four models, as it provides a better capability of highlighting features conducive to the MI-BCI classification task. In this section, the classification accuracy for each subject and the average classification accuracy across all 9 subjects have been reported for the subject-independent classification model MSFFCNN-2 in comparison with MSFFCNN-1 and other baseline models.

Comparison with baseline models:
In order to evaluate the performance of the proposed MSFFCNN variants, the classification accuracy is compared with commonly used traditional ML models for MI-BCI classification, including SVM and LDA, as the baseline models.
In addition, a standard CNN has been used as a baseline DL model. The standard CNN network structure consists of five conv-pool layers, which is relatively deeper than the proposed MSFFCNN models. For a fair comparison, the model hyper-parameters were kept as consistent as possible with the proposed models. As outlined in Table 6, both MSFFCNN-1 and MSFFCNN-2 outperform the baseline models.

The results of the subject-adaptive MSFFCNN-TL model have been summarized in Tables 7-8, where bold highlights the best adaptation configuration for a particular θ. For θ = 1, the learning rate of the MSFFCNN-TL model is the same as in the other two models. In such a case, the best average A_c has been obtained as 91.79% for the full degree of adaptation with the AC-2 configuration, as shown in Table 7. However, a partial degree of adaptation ξ = 0.8 in AC-2 provides the best A_c = 92.36% for θ = 0.5. With a further reduction to θ = 0.1, there is a 0.88% improvement in accuracy compared to θ = 0.5 for ξ = 1.0. Similarly, for θ = 0.05 and 0.025, the accuracy has been improved to 93.69% and 93.84%, respectively, for full adaptation ξ = 1.0, as shown in Table 8. It is noteworthy that the best accuracy results for θ = 0.05 and 0.025 have been achieved for AC-2. However, the overall best average A_c = 94.06% has been obtained for the lowest θ = 0.01, with a 2.27% increase compared to θ = 1.0, indicating the importance of tuning down θ to significantly improve the performance of the classifier.
In Fig. 10, the average A_c has been plotted as a function of ξ for different θ. The AC-2 configuration has a higher number of trainable parameters than AC-1, which may lead to over-fitting to some degree for higher ξ. The present study reveals that there exist two different regimes of θ for the two different adaptation configurations for the optimum performance of the MSFFCNN-TL classifier. The comparison with the subject-dependent and subject-independent counterparts has been summarized in Table 9, which demonstrates the superiority of the subject-adaptive model, performing better even with fewer training samples.

Discussion:
The current research demonstrates the efficiency of the proposed MSFFCNN in extracting semantic features of EEG signals from multiple convolutional kernel scales for four different frequency bands to enhance the performance of the MI classifier. Incorporating the OVR-FBCSP CNN further improves the accuracy of the classifier, emphasizing the prospect of the current framework in adopting the distinguishable features of MI-EEG signals. Furthermore, the current study illustrates the advantages of using an adaptive transfer learning-based CNN model over the subject-specific and subject-independent models in order to obtain better classification accuracy and performance. Due to the inter-subject variability in MI-EEG data, the majority of traditional ML methods in MI-BCI systems use subject-specific data for better accuracy. However, training a deep CNN model containing a large number of trainable parameters requires a significant amount of training data, which is not available from the relatively scarce subject-specific data and limits the accuracy of the classifier. In this regard, the current study shows the usefulness of subject-independent models, which can be trained on a large number of inter-subject samples. In order to further increase the performance of the classifier, different adaptation techniques considering different learning-rate scale factors, degrees of adaptation, and adaptation configurations have been explored to fine-tune the subject-independent model, which provides a significant improvement in the accuracy of the classifier. The current study illustrates the importance of lowering the learning rate, which facilitates effective adaptation and improves classification accuracy. Among all adaptive models, it has been found that θ = 0.01 with a full degree of adaptation ξ = 1 and the adaptation configuration AC-2 provides the best average accuracy of 94.06%, which is a 4.86% improvement over the subject-specific model.
Thus, the current work effectively addresses the training and adaptation strategy in adaptive cross-subject transfer learning, considering inter-subject variability, for better performance. Future work will focus on further improving the classification accuracy by incorporating a long short-term memory (LSTM) RNN architecture to extract temporal features and employing the proposed framework for spatio-temporal multi-class MI subject classification in various BCI applications. Additionally, the Maximum Mean Discrepancy (MMD) strategy [76] can be utilized to further regularize the adaptation of individual CNN layers. The proposed CNN framework can also be used in materials informatics [77,78,79,80,81,82]. Nevertheless, the proposed MSFFCNN model can be employed in more reliable and robust MI-based real-time BCI applications such as robotic control [9,10,11], rehabilitation of neuromotor disorders [8], text-entry speech communication [12,13], etc.

Conclusion:
Summarizing, in the current study, a transfer learning-based multi-scale feature fused CNN (MSFFCNN) framework has been presented for multi-class MI classification, where the multi-scale convolution block, comprising various convolutional kernel sizes, can efficiently extract semantic features for the different frequency bands δ, θ, α, and β at multiple scales. Various parametric explorations, including the influence of the learning-rate scale factor and the degree of adaptation on different adaptation configurations, shed light on effective and optimal adaptation strategies to maximize the performance of the proposed transfer learning model. The current study illustrates the importance of lowering the learning rate, which facilitates effective adaptation and improves classification accuracy. Among all adaptive models, it has been found that a relatively low learning rate with a full degree of adaptation, together with the adaptation configuration corresponding to a relatively larger number of adaptable network parameters, provides the best average classification accuracy of 94.06% (±2.29%), which is a 4.86% improvement over the subject-specific model. The proposed framework requires less training data and computation time, making it suitable for designing robust and efficient human-robot interaction. The present study effectively addresses the shortcomings of existing CNN-based EEG-MI classification models and significantly improves the classification accuracy.