8
Abstract—In motor imagery-based Brain Computer Interfaces (BCI), discriminative patterns can be extracted from the electroencephalogram (EEG) using the Common Spatial Pattern (CSP) algorithm. However, the performance of this spatial filter depends on the operational frequency band of the EEG. Thus, setting a broad frequency range, or manually selecting a subject-specific frequency range, are commonly used with the CSP algorithm. To address this problem, this paper proposes a novel Filter Bank Common Spatial Pattern (FBCSP) to perform autonomous selection of key temporal- spatial discriminative EEG characteristics. After the EEG measurements have been bandpass-filtered into multiple frequency bands, CSP features are extracted from each of these bands. A feature selection algorithm is then used to automatically select discriminative pairs of frequency bands and corresponding CSP features. A classification algorithm is subsequently used to classify the CSP features. A study is conducted to assess the performance of a selection of feature selection and classification algorithms for use with the FBCSP. Extensive experimental results are presented on a publicly available dataset as well as data collected from healthy subjects and unilaterally paralyzed stroke patients. The results show that FBCSP, using a particular combination feature selection and classification algorithm, yields relatively higher cross- validation accuracies compared to prevailing approaches. I. INTRODUCTION Brain-Computer Interface (BCI) is a system that empowers a user to command external devices, such as a computer, wheelchair control [1] or prostheses [2], using scalp-recorded electroencephalogram (EEG) measurements. Studies have shown that the EEG measurements recorded during mental imagination of movements can be translated into commands. Thus, motor imagery-based BCI provides a promising control and communication control to people suffering from motor disabilities [3]. The Common Spatial Pattern (CSP) algorithm [4],[5] is effective in constructing optimal spatial filters that discriminates two classes of EEG measurements in motor- imagery-based BCI [5]. However, the performance of this spatial filter is dependent on its operational frequency band. Classification performed on the CSP features generally yields poor accuracies when the EEG measurements are either unfiltered or have been filtered with an This work was supported by the Science and Engineering Research Council of A*STAR (Agency for Science, Technology and Research), and The Enterprise Challenge, Prime Minister’s Office, Singapore. The authors are with Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace, Singapore 119613. (email: {kkang, zychin, hhzhang, ctguan}@i2r.a- star.edu.sg). inappropriately selected frequency range [6]. Hence, setting a broad frequency range or manually selecting a subject- specific frequency range are commonly used with the CSP algorithm [7]. To address the problem of manually selecting the operational subject-specific frequency band for the CSP algorithm, several approaches have been proposed. One approach is the Common Spatio-Spectral Pattern (CSSP), which optimizes a simple filter that employs a one time- delayed sample with the CSP algorithm [8]. Another approach is the Common Sparse Spectral Spatial Pattern (CSSSP). It improves the CSSP algorithm by performing simultaneous optimization of an arbitrary (Finite Impulse Response) FIR filter within the CSP algorithm [7]. However, due to the inherent nature of the optimization problem, the solution of filter coefficients is also strongly dependent on the choice of initial parameters [6]. Recently, an alternative approach called the Sub-band Common Spatial Pattern (SBCSP) was proposed and has been shown to yield superior classification accuracy compared against CSSP and CSSSP on a publicly available dataset [6]. Instead of optimizing a single arbitrary FIR filter within the CSP algorithm, SBCSP uses a Gabor filter bank that decomposes the EEG measurements into multiple sub- bands. Spatial filters that use the CSP algorithm are then employed on each of these sub-bands. After obtaining sub- band scores, recursive band elimination or a classification algorithm is employed to fuse the sub-band score. Another classification algorithm is then employed to classify the fused sub-band score. Although SBCSP can use different sub-band score fusion techniques and classification algorithms, only the results from the use of the Support Vector Machine (SVM) to fuse the sub-band score as well as to perform classification are presented in [6]. Hence, a comparative study of using different sub-band score fusion techniques and classification algorithms are not available. In this paper, a novel machine learning approach called the Filter Bank Common Spatial Pattern (FBCSP) is proposed for processing EEG measurements in motor imagery-based BCI. FBCSP comprises four stages: frequency filtering, spatial filtering, feature selection and classification. In the first stage, the EEG measurements are bandpass-filtered into multiple frequency bands. In the second stage, CSP features are extracted from each of these bands. In the third stage, a feature selection algorithm is used to automatically select discriminative pairs of frequency bands and corresponding CSP features. In the fourth stage, a classification algorithm is used to classify the CSP features. This paper also presents extensive Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface Kai Keng Ang, Zheng Yang Chin, Haihong Zhang, and Cuntai Guan A 2391 978-1-4244-1821-3/08/$25.00 c 2008 IEEE

Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface

  • Upload
    a-star

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Abstract—In motor imagery-based Brain ComputerInterfaces (BCI), discriminative patterns can be extracted fromthe electroencephalogram (EEG) using the Common SpatialPattern (CSP) algorithm. However, the performance of thisspatial filter depends on the operational frequency band of theEEG. Thus, setting a broad frequency range, or manuallyselecting a subject-specific frequency range, are commonlyused with the CSP algorithm. To address this problem, thispaper proposes a novel Filter Bank Common Spatial Pattern(FBCSP) to perform autonomous selection of key temporal-spatial discriminative EEG characteristics. After the EEGmeasurements have been bandpass-filtered into multiplefrequency bands, CSP features are extracted from each of thesebands. A feature selection algorithm is then used toautomatically select discriminative pairs of frequency bandsand corresponding CSP features. A classification algorithm issubsequently used to classify the CSP features. A study isconducted to assess the performance of a selection of featureselection and classification algorithms for use with the FBCSP.Extensive experimental results are presented on a publiclyavailable dataset as well as data collected from healthy subjectsand unilaterally paralyzed stroke patients. The results showthat FBCSP, using a particular combination feature selectionand classification algorithm, yields relatively higher cross-validation accuracies compared to prevailing approaches.

I. INTRODUCTION

Brain-Computer Interface (BCI) is a system thatempowers a user to command external devices, such as

a computer, wheelchair control [1] or prostheses [2], usingscalp-recorded electroencephalogram (EEG) measurements.Studies have shown that the EEG measurements recordedduring mental imagination of movements can be translatedinto commands. Thus, motor imagery-based BCI provides apromising control and communication control to peoplesuffering from motor disabilities [3].

The Common Spatial Pattern (CSP) algorithm [4],[5] iseffective in constructing optimal spatial filters thatdiscriminates two classes of EEG measurements in motor-imagery-based BCI [5]. However, the performance of thisspatial filter is dependent on its operational frequency band.Classification performed on the CSP features generallyyields poor accuracies when the EEG measurements areeither unfiltered or have been filtered with an

This work was supported by the Science and Engineering ResearchCouncil of A*STAR (Agency for Science, Technology and Research), andThe Enterprise Challenge, Prime Minister’s Office, Singapore.

The authors are with Institute for Infocomm Research, Agency forScience, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace,Singapore 119613. (email: {kkang, zychin, hhzhang, ctguan}@i2r.a-star.edu.sg).

inappropriately selected frequency range [6]. Hence, settinga broad frequency range or manually selecting a subject-specific frequency range are commonly used with the CSPalgorithm [7].

To address the problem of manually selecting theoperational subject-specific frequency band for the CSPalgorithm, several approaches have been proposed. Oneapproach is the Common Spatio-Spectral Pattern (CSSP),which optimizes a simple filter that employs a one time-delayed sample with the CSP algorithm [8]. Anotherapproach is the Common Sparse Spectral Spatial Pattern(CSSSP). It improves the CSSP algorithm by performingsimultaneous optimization of an arbitrary (Finite ImpulseResponse) FIR filter within the CSP algorithm [7].However, due to the inherent nature of the optimizationproblem, the solution of filter coefficients is also stronglydependent on the choice of initial parameters [6].

Recently, an alternative approach called the Sub-bandCommon Spatial Pattern (SBCSP) was proposed and hasbeen shown to yield superior classification accuracycompared against CSSP and CSSSP on a publicly availabledataset [6]. Instead of optimizing a single arbitrary FIR filterwithin the CSP algorithm, SBCSP uses a Gabor filter bankthat decomposes the EEG measurements into multiple sub-bands. Spatial filters that use the CSP algorithm are thenemployed on each of these sub-bands. After obtaining sub-band scores, recursive band elimination or a classificationalgorithm is employed to fuse the sub-band score. Anotherclassification algorithm is then employed to classify thefused sub-band score. Although SBCSP can use differentsub-band score fusion techniques and classificationalgorithms, only the results from the use of the SupportVector Machine (SVM) to fuse the sub-band score as well asto perform classification are presented in [6]. Hence, acomparative study of using different sub-band score fusiontechniques and classification algorithms are not available.

In this paper, a novel machine learning approach calledthe Filter Bank Common Spatial Pattern (FBCSP) isproposed for processing EEG measurements in motorimagery-based BCI. FBCSP comprises four stages:frequency filtering, spatial filtering, feature selection andclassification. In the first stage, the EEG measurements arebandpass-filtered into multiple frequency bands. In thesecond stage, CSP features are extracted from each of thesebands. In the third stage, a feature selection algorithm isused to automatically select discriminative pairs offrequency bands and corresponding CSP features. In thefourth stage, a classification algorithm is used to classify theCSP features. This paper also presents extensive

Filter Bank Common Spatial Pattern (FBCSP)in Brain-Computer Interface

Kai Keng Ang, Zheng Yang Chin, Haihong Zhang, and Cuntai Guan

A

2391

978-1-4244-1821-3/08/$25.00 c©2008 IEEE

experimental results using a selection of feature selectionand classification algorithms for use in FBCSP. The purposeof this study is to recommend suitable feature selection andclassification algorithms for motor imagery-based BCI.

Section II describes the FBCSP approach and comparesit with the SBCSP approach. A brief description of the CSPalgorithm is then given in Section III. The feature selectionand classification algorithms studied in this paper are brieflydescribed in Section IV and Section V respectively.Extensive experimental results of using FBCSP on apublicly available dataset and experimental data collectedfrom healthy subjects as well as unilaterally paralyzed strokepatients are given in Section VI. The results are alsocompared against the SBCSP and the CSP with manuallyconfigured bandpass filter parameters. Section VIIconcludes this paper.

II. FILTER BANK COMMON SPATIAL PATTERN

The proposed Filter Bank Common Spatial Pattern(FBCSP) machine learning approach is illustrated in Fig. 1.It comprises four progressive stages of EEG measurementsprocessing: multiple bandpass filters using zero-phaseChebyshev Type II filters, spatial filtering using the CSPalgorithm, feature selection of the CSP features, andclassification of the selected CSP features.

Fig. 1. Architecture of the proposed Filter Bank Common SpatialPattern (FBCSP) machine learning approach

The first stage employs a filter bank that bandpassfilters the EEG measurements into multiple bands. Thesecond stage performs spatial filtering on each of thesebands using the CSP algorithm. Thus, each pair of bandpassand spatial filter yields CSP features that are specific to thefrequency range of the bandpass filter. The third stageemploys a feature selection algorithm to select thediscriminative CSP features from the filter bank. The fourthstage employs a classification algorithm to model andclassify the selected CSP features.

Fig. 2. Architecture of Sub-Band Common Spatial Pattern (SBCSP) [6]

Fig. 2 shows the architecture of SBCSP [6]. ComparingFig. 1 against Fig. 2, the first, second, and fourth stages ofFBCSP are similar to SBCSP [6]. However, SBCSP

employs a Gabor fourier-based filterbank whereas FBCSPemploys a zero-phase Chebyshev Type II Infinite ImpulseResponse (IIR) filterbank. The use of zero-phase filtering inthe FBCSP overcomes the non-linear phase shift caused bythe IIR filter. Furthermore, SBCSP computes a sub-bandscore for each spatial filter, followed by recursive bandelimination or a classification algorithm to fuse the sub-bandscore. SBCSP then employs another classification algorithmto model and classify the fused sub-band score. In contrast,the third stage of FBCSP in this paper employs a featureselection algorithm to select discriminating CSP features;the fourth stage employs a classification algorithm to modeland classify the selected CSP features. Hence, FBCSP ismore generalized because any feature selection andclassification algorithms from the machine learning andcomputational intelligence literature can be employed. Inaddition, SBCSP deploys all the spatial filters whereasFBCSP deploys only those effective spatial filters whosepairs of CSP features are selected. Hence, FBCSP employsonly a small subset of effective spatial filters instead. Thisreduces the computational complexity as compared againstusing the entire set of spatial filters.

The next section provides a brief description of the CSPalgorithm, followed by sections IV and V that provide abrief description of the feature selection and classificationalgorithms.

III. COMMON SPATIAL PATTERN ALGORITHM

The neurophysiological background of motor-imagery basedBCIs is that motor activity, both actual and imagined[9],[10], causes an attenuation or increase of localized neuralrhythmic activity called Event-Related Desynchronization orEvent-Related Synchronization (ERS) respectively [11]. TheCommon Spatial Pattern (CSP) algorithm is highlysuccessful in calculating spatial filters for detecting ERDand ERS [4]. The objective of spatial filtering employing theCSP algorithm [12] in BCI is to compute features whosevariances are optimal for discriminating two classes of EEGmeasurements [12],[13].

The method employed by the CSP algorithm is based onthe simultaneous diagonalization of two covariance matrices[12],[14]. In summary, the spatially filtered signal Z of asingle trial EEG E is given as

�Z WE . (1)where E is an N�T matrix representing the raw EEGmeasurement data of a single trial; N is the number ofchannels; T is the number of measurement samples perchannel. W is the CSP projection matrix. The rows of W arethe stationary spatial filters and the columns of W-1 are thecommon spatial patterns.

The spatial filtered signal Z given in (1) maximizes thedifferences in the variance of the two classes of EEGmeasurements. However, the variances of only a smallnumber m of the spatial filtered signal are generally used asfeatures for classification [12]. The m first and last rows ofZ i.e. Zp, p�{1..2m} form the feature vector Xp given in (2)as inputs to a classifier.

2392 2008 International Joint Conference on Neural Networks (IJCNN 2008)

� � � �2

1

log var varm

p p pi

X�

� � �

� �Z Z (2)

From the concept of cortical homunculus [15], differentareas of the cerebral cortex control movements of differentparts of the body [8]. In addition, the distinct areas of thecerebal cortex control movements on the contralateral sideof the body [16]. For example, the region that controls thefeet is at the center of the vertex, the region that controls theleft hand is on the right hemisphere, and the region thatcontrols the right hand is on the left hemisphere [8]. Hence,the spatial patterns of a motor action are verifiable with thespecific region that controls the motor action [4].

IV. FEATURE SELECTION ALGORITHM

Feature selection in pattern classification is defined as: givena set of d features, select a subset of size k that leads to thesmallest classification errors. There are mainly two featureselection approaches in the literature [17]: the wrapperapproach where features are selected using the classifier;and the filter approach where the features are selectedindependent from the classifier. Although the wrapperapproach may yield better performance, increasedcomputational effort is often required [17].

In Mutual Information (MI)-based feature selection, theproblem is defined as, given an initial set F with d features,find the subset S�F with k features that maximizes MutualInformation I(S;�) [18]. The MI between the two randomvariables is [19]

� � � � � �; |I H H� �X Y Y Y X . (3)

where the entropy is of a d-dimensional random variable X� �1 2, , dX X X�X � is

� � � � � �2logx

H p x p x�

� ��X

X ; (4)

the conditional entropy of random variables X and Y is

� � � � � �2| , log |x y

H p x y p y x� �

� ���X Y

Y X ; (5)

and p(�) are probability functions.In pattern classification problems, the input features are

usually continuous variables and the class has discretevalues. Thus the MI between input features X and the class� is

� � � � � �;� � � |I H H� �X X , (6)

where ���={1,…,N�}; and the conditional entropy is

� � � � � �21

| | log |N

H p x p x dx�

� ��

� � � ��XX , (7)

where N� is the number of classes.In the wrapper approach, the conditional entropy in (7)

is simply

� � � � � �21

| | log |N

H p p�

� ��

� � ��X X X , (8)

where p(�|X) is easy to estimate from the number of datasample classified as class � using the classifier over the totalnumber of data samples. In the filter approach, thisconditional entropy is relatively harder to compute since it is

not easy to estimate p(�|X). The method of using ParzenWindow to estimate p(�|X) is briefly reviewed in section V.

The following feature selection algorithms are used inthis paper:

A. MIBIF algorithm

The Mutual Information based Best Individual Feature(MIBIF) algorithm that is based on the filter approach isdescribed as follows:

� Step 1: InitializationInitialize set of d features � �1 2, , dF f f f� � , set of

selected features S � � .

� Step 2: Compute the MI of features with the output classCompute � �; 1.. ,i iI f i d f F� � �� .

� Step 3: Select the best k featuresRepeatSelect the feature if that maximizes � �;iI f � using

� � � � � � � �1.. ,

\ , | ; max ;j

i i i jj d f F

F F f S f I f I f� �

� � �� � . (9)

Until S k�

The MIBIF feature selection algorithm requires a user-defined parameter k to select the k best features.

B. MINBPW algorithm

The Mutual Information-based Naïve Bayesian ParzenWindow (MINBPW) [20] algorithm that is based on thewrapper approach for the Naïve Baysian Parzen Window(NBPW) classifier [20], is described as follows:� Step 1: Initialization

Initialize set of d features � �1 2, , dF f f f� � , set of

selected features S � � .� Step 2: Compute the MI of features with the output class

Compute � �; 1.. ,i iI f i d f F� � �� .

� Step 3: Select the first featureSelect the feature if that maximizes � �;iI f � using

� � � � � � � �1.. ,

\ , | ; max ;j

i i i jj d f F

F F f S f I f I f� �

� � �� � . (10)

where the MI is computed using (6) and the conditionalentropy is estimated using (8).

� Step 4: Greedy selectionRepeata) Compute � �; 1.. ,i iI f S i d f F� � � �� , the joint MI

between the feature i and selected features with theoutput class where the conditional entropy is estimatedusing (8).

b) Select next feature using� � � �� � � �

1.. ,

\ , |

; max ; .j

i i

i jj d f F

F F f S f

I f S I f S� �

� �

� � �� �(11)

Until � � � � � �� �; ;iS k I f S I S �� � � � �� �

The MINBPW algorithm also requires a user-definedparameter k or a small � to stop the selection of features. The

2008 International Joint Conference on Neural Networks (IJCNN 2008) 2393

parameter � is set to 0.1 in this paper.

C. MIFS algorithm

The Sequential Forward Selection method or greedyselection scheme is adopted by Battiti in the MutualInformation based feature selection (MIFS) algorithm [18].The MIFS algorithm is based on the filter approach [18].The MIFS algorithm requires a user defined parameter k toset the maximum number of features to select, or a small � tostop the selection of features once there are no furtherincrease in MI beyond �. The user-defined value of 0.1 isused for � in this paper

D. FRFS2 algorithm

In addition to the prevailing application of MI in featureselection, Rough set theory (RST) [21] is also potentiallyfeasible in feature selection and significantly reduce thepattern dimensionality [22]. However, RST is only capableof handling discretized attribute values. The Fuzzy-Roughset-based Feature Selection (FRFS) is a new approach thatextends data reduction in RST theory for crisp and real-value attributes [23]. The new Fuzzy-Rough set-basedFeature Selection (FRFS2) algorithm [23],[24] that is basedon the filter approach is used in this paper.

The FRFS2 algorithm does not require any user-definedparameters.

E. MIRSR algorithm

The Mutual Information-based Rough Set Reduction(MIRSR) that is based on the wrapper approach for theRough set-based Neuro-Fuzzy System (RNFS), employs theMI to select attributes with high relevance and the conceptof knowledge reduction in rough set theory to selectattributes with low redundancy [20].

V. CLASSIFICATION ALGORITHMS

A classifier is one that estimates the class label ��� fromthe trained model given a data sample X={X1,X2,…,Xd} withd features where Y is the true class label. The trained modelis constructed from training data that comprises n samples{X1,X2,…Xn} with respective class labels {Y1,Y2,…Yn}.The following classifiers are used in this paper:

A. NBPW algorithm

The Naïve Bayesian Parzen Window (NBPW) classifierestimates p(X|�) and P(�) from training data samples andpredicts the class � with the highest posterior probabilityp(�|X) using Bayes rule

� � � � � �� �

||

p Pp

p

� �� �

XX

X, (12)

where p(�|X) is the conditional probability of class � giventhe data sample X; p(X|�) is the conditional probability of Xgiven class �; P(�) is the prior probability of class �; andp(X) is

� � � � � �| .p p P�

� ���

� �X X (13)

The computation of p(�|X) is rendered feasible by anaïve assumption that all the features X1,X2,…,Xd areconditionally independent given class � in

� � � �1

| |d

ii

p p X� ��

��X . (14)

The NBPW classifier employs Parzen Window toestimate the conditional probability p(Xi|�) in

� � � �,

1ˆ | ,i i i j

j I

p X X X hn

��

� ��

� �� , (15)

where �=1,…,N�; n� is the number of data samplesbelonging to class �; I� is the set of indices of the datasamples belonging to class �; and � is a smoothing kernelfunction with a smoothing parameter h. The NBPWclassifier employs the univariate Gaussian kernel given by

� �2

221,

2

x

hx h e�

� � � �� � , (16)

and normal optimal smoothing strategy [25] given by1

54

3opth

n!� � �

� , (17)

where ! denotes the standard deviation of the distribution.The classification rule of the NBPW classifier is given

by

� �arg max |p�

� ���

� X . (18)

B. FLD algorithm

The Linear Discriminant classification rule is given byk b

k b�

"� #$% "& �'

W X

W X(19)

where class k�� is discriminated against the rest; W is anadjustable weight vector or projection vector for class k; "denotes transpose operator; and b is a bias. Thisclassification rule maps multi-dimensional data to one-dimensional using a linear function.

The Fisher Linear Discriminant (FLD) [26] is a lineardiscriminant that maximizes the ratio of between-classscatter to within-class scatter given by

� �J"

�"

B

W

W S WW

W S W(20)

where SB is the between class scatter matrix; and SW is thewithin class scatter matrix.

C. SVM algorithm

The Support Vector Machine (SVM) [27] is a lineardiscriminant that maximizes the separation between twoclasses based on the assumption that it improves theclassifier’s generalization capability. This is achieved byminimizing the cost function

� � 21

2J �W W , (21)

subject to the constraint� � 1 1..i iY b i n" � � # � �W X , (22)

where X1,X2,…Xn are the training data, and b is a bias.

2394 2008 International Joint Conference on Neural Networks (IJCNN 2008)

The SVM employs the linear discriminant classificationrule given in equation (19). The SVM implementation in theMatlab Bioinformatics toolbox is used in this paper.

D. CART algorithm

Decision tree is a classifier which uses symbolic tree-like representations of finite sets of if-then-else questionsthat are natural, intuitive and interpretable [26]. TheClassification And Regression Tree (CART) [28]implementation in the Matlab Statistics toolbox is used inthis paper.

E. k-NN algorithm

The k-nearest neighbor (k-NN) [29] is a classifier thatassigns the class label of a new data based on the class withthe most occurrences in a set of k nearest training data pointsusually computed using a distance measure such as theEuclidean distance. The k-nearest neighbor implementationin the Matlab Bioinformatics toolbox with k=3 is used in thispaper.

F. RNFS and DENFIS algorithms

Neuro-Fuzzy Systems are hybrid intelligent systemsthat synergize the human-like reasoning style of fuzzysystems with the learning and connectionist structure ofneural networks. Neuro-Fuzzy Systems can be classifiedinto linguistic or precise model [30]. The linguistic Neuro-Fuzzy System used in this paper is the Rough set-basedNeuro-Fuzzy System (RNFS) [20]. RNFS is a novel hybridintelligent system that synergizes the concept of knowledgereduction in rough set theory with neuro-fuzzy systems [20].The precise NFS used in this paper is the Dynamic EvolvingNeural-Fuzzy Inference System (DENFIS) [31].

VI. EXPERIMENTAL RESULTS

The “No Free Lunch” theorem states that there is no generalsuperiority of any approach over the others in patternclassification; and if one approach seems to outperformanother in a particular situation, it is a consequence of its fitto the particular pattern recognition problem [26]. Therefore,this section assesses the performance of combining thevarious feature selection algorithms described in section IVand classification algorithms described in section V for usewith the FBCSP in motor imagery-based BCI. Filter-basedfeature selection algorithms, namely, MIBIF, MIFS andFRFS2, work with any classification algorithms. However,wrapper-based feature selection algorithms, namely,MINBPW and MIRSR work only with the RNFS andNBPW classification algorithms respectively. Hence,MINBPW and MIRSR are not combined with otherclassification algorithms. The proposed FBCSP approach isapplied to the publicly available BCI competition III datasetIVa [32],[33] and data collected from healthy subjects andunilaterally paralyzed stroke patients. The FBCSP employs afilter bank that covers 4-40Hz, which comprises 9 bandpassfilters that covers 4Hz each; and the CSP algorithm withm=2.

A. Publicly available BCI Competition III dataset IVa

The BCI Competition III dataset IVa [33] is collectedfrom 5 subjects (labeled ‘aa’, ‘al’, ‘av’, ‘aw’, ‘ay’) whoperformed right hand and right foot imagination [32]. Thedata for each subject comprises 280 trials of EEGmeasurements from 118 electrodes. The data is extractedfrom selected electrodes, starting from 0.5s to 2.5s after thevisual cue. The time segment and electrode selections areconsistent with the experiment performed in [6].

Fig. 3 presents the experimental results of unbiased10�10-fold cross-validations performed using FBCSP withvarious feature selection and classification algorithmsdescribed in section IV and V respectively. The results inFig. 3 shows that among the classification algorithms,NBPW, FLD and SVM yield superior test accuraciescompared against the CART, k-NN, RNFS and DENFIS.The results also show that among the feature selectionalgorithms used with NBPW, the MIBIF that selects 4 pairsof CSP features yields superior test accuracy. Specifically,the FBCSP with MIBIF4 and NBPW yields a test accuracyof 90.3(0.7%; whereas FLD yields 89.9(0.9%, and SVMyields 90.0(0.8%.

Fig. 4 shows the results of unbiased 10�10-fold cross-validations performed using the FBCSP with the NBPWclassification algorithm and the MINBPW wrapper-basedfeature selection algorithm (labeled as FBCSPw), as well asthe MIBIF filter-based feature selection algorithm thatselects 4 pairs of CSP features (labeled as FBCSPf). TheFBCSPw and FBCSPf have been shown to yield superior testaccuracies for wrapper and filter-based approachesrespectively compared to the rest in Fig. 3. Hence, onlyFBCSPw and FBCSPf are presented in this experiment. TheSBCSP has been shown to yield superior results on thisdataset compared against existing approaches such as CSSPand CSSSP in [6]. Therefore, this experiment compares onlywith CSP and SBCSP. In the experiment, both FBCSP andSBCSP employed 9 bandpass filters and used the sameNBPW classifier. This similar configuration is used to avoidad-hoc tuning of the classifiers in order to make a faircomparison between FBCSP and SBCSP. CSP is manuallyconfigured with a broad bandpass filter of 8-30Hz [12]. Theresults in Fig. 4 show that the FBCSPf yields superioraveraged test accuracy of 90.3(0.7%, whereas FBCSPw

yields 89.2(1.1%, SBCSP yields 86.3(1.1%, and CSPyields 86.6(0.7%. Hence, the results show that both theFBCSPf and FBCSPw yield statistically superior results thanSBCSP and CSP.

Fig. 5 shows the most significant CSP using thefrequency range that is autonomously selected in FBCSPf.

The results show that the centre of the vertex and the lefthemisphere discriminates the right foot action and the righthand action respectively for subjects ‘aa’, ‘al’, ‘aw’, and‘ay’. These results verify the neurophysiological plausibilityof the CSP projection matrix computed for these subjects.However, the result for subject ‘av’ does not show suchpatterns. This is a plausible reason for the relatively inferiortest accuracy obtained for subject ‘av’.

2008 International Joint Conference on Neural Networks (IJCNN 2008) 2395

kkang
Highlight

NBPW FLD CART KNN SVM RNFS DENFIS AVG80

82

84

86

88

90

92

Acc

ura

cy(%

)

Test accuracy

MIBIF1MIBIF2MIBIF3MIBIF4MIFSFRFS2MINBPWMIRSR

Fig. 3. Experimental results on the test accuracies of 10�10-fold cross-validations performed using FBCSP with various feature selection andclassification algorithms on publicly available BCI Competition III dataset IVa

aa al av aw ay AVG60

65

70

75

80

85

90

95

100

Acc

ura

cy(%

)

Test accuracy

CSPSBCSPFBCSP

w

FBCSPf

Fig. 4. Experimental results on the test accuracies of 10�10-fold cross-validations performed using CSP, SBCSP, FBCSPw and FBCSPf on BCICompetition dataset IVa

Foot Action

Right Action

-2 -1.5 -1 -0.5 0

Foot Action

Right Action

0 0.5 1 1.5 2

Foot Action

Right Action

-1 -0.5 0 0.5 1 1.5

Foot Action

Right Action

-3 -2.5 -2 -1.5 -1 -0.5 0

Foot Action

Right Action

-2 -1.5 -1 -0.5 0

(a) 12-28 Hz (b) 8-16 Hz (c) 8-24 Hz (d) 8-24 Hz (e) 8-24 HzFig. 5. Spatial Patterns of using FBCSPf on BCI Competition dataset IVa for subjects (a) ‘aa’, (b) ‘al’, (c) ‘av’, (d) ‘aw’, and (e) ‘ay’ respectively

2396 2008 International Joint Conference on Neural Networks (IJCNN 2008)

NBPW FLD CART KNN SVM RNFS DENFIS AVG60

65

70

75

80

85

Acc

ura

cy(%

)

Test accuracy

MIBIF1MIBIF2MIBIF3MIBIF4MIFSFRFS2MINBPWMIRSR

Fig. 6. Experimental results on the (a) training and (b) test accuracies of 10�10-fold cross-validations performed using FBCSP with various featureselection and classification algorithms on data collected from healthy subjects and unilaterally paralyzed stroke patients

h1t h2t h1m h2m p1tm p2tm p3tm p4tm p5tt p6mt p7mt AVG50

55

60

65

70

75

80

85

90

95

100

Acc

ura

cy(%

)

Test accuracy

CSPSBCSPFBCSP

w

FBCSPf

Fig. 7. Experimental results on the test accuracies of 10�10-fold cross-validations performed using CSP, SBCSP, FBCSPw and FBCSPf on datacollected from healthy subjects and unilaterally paralyzed stroke patients

B. Finger tapping and motor-imagery data collected fromhealthy subjects and unilaterally paralyzed stroke patients

This dataset is collected using Neuroscan NuAmps from2 healthy subjects (labeled ‘h1~’, ‘h2~’) and 7 unilaterallyparalyzed stroke patients (labeled ‘p1~’, ‘p2~’). The data iscollected with approval from the Ethics Approval Board.The healthy subjects performed left and right hand fingertapping (labeled ‘~tt) and motor imagery (labeled ‘~mm’).The unilaterally paralyzed stroke patients performed motorimagination of their disabled arm and finger tapping usingtheir unaffected arm. The data for each subject comprises160 trials of EEG measurements from 21 electrodes (F7, F3,Fz, F4, F8, FT7, FC3, FCz, FC4, FT8, C3, Cz, C4, TP7,CP3, CPz, CP4, TP8, P3, Pz, P4) starting from 0.5s to 2.5safter the visual cue.

Fig. 6 presents the experimental results of unbiased10�10-fold cross-validations performed using FBCSP withvarious feature selection and classification algorithmsdescribed in section IV and V respectively. The results in

Fig. 6 again shows that NBPW, FLD and SVM yieldsuperior test accuracies than CART, k-NN, RNFS andDENFIS. The results also show that the FBCSP withMIBIF4 and NBPW as well as SVM yield a superior testaccuracy of 81.1(2.2%, whereas FLD yields 80.9(2.1%.

Fig. 7 shows the results of unbiased 10�10-fold cross-validations performed using the FBCSP with NBPW and thewrapper-based MINBPW (labeled as FBCSPw), as well asthe filter-based MIBIF4 (labeled as FBCSPf). The results areagain compared against CSP and SBCSP. In the experiment,both FBCSP and SBCSP employed 9 bandpass filter bandsand used the same NBPW classifier. However, CSP ismanually configured with subject-specific operationalfrequency ranges. The results in Fig. 7 shows that theFBCSPf yields superior averaged test accuracy of81.1(2.2%, whereas FBCSPw yields 77.7(3.0%, SBCSPyields 75.3(2.7%, and CSP yields 73.3(2.0%. Hence, theresults show that both the FBCSPf and FBCSPw yieldstatistically superior results than SBCSP and CSP.

2008 International Joint Conference on Neural Networks (IJCNN 2008) 2397

VII. CONCLUSIONS

This paper proposed a novel machine learning approachcalled Filter Bank Common Spatial Pattern (FBCSP) forprocessing EEG measurements in motor imagery-based BCI.FBCSP addresses the problem of selecting an appropriateoperational frequency band for extracting discriminatingCSP features. FBCSP employs a feature selection algorithmto select discriminative CSP features from a bank ofmultiple bandpass filters and spatial filters, and aclassification algorithm to classify the selected features.FBCSP is capable of learning subject-specific patterns fromthe high-dimensional EEG measurements without operatorintervention as well as yielding relatively high classificationaccuracies. Since any feature selection and classificationalgorithms from the machine learning and computationalintelligence literature can be employed, FBCSP is moregeneralized than existing approaches such as the Sub-bandCommon Spatial Pattern (SBCSP).

Experimental results showed that the proposed FBCSPyields superior classification accuracy compared againstSBCSP and CSP with manually selected operationalfrequency bands. Based on the results, the MutualInformation Best Individual Feature (MIBIF) selectionalgorithm that selects 4 pairs of CSP features; and the NaïveBayes Parzen Window (NBPW), Fisher Linear Discriminant(FLD) or Support Vector Machines (SVM) classificationalgorithms are recommended for use with FBCSP in motorimagery-based BCIs..

ACKNOWLEDGMENT

The authors would like to thank Beng Ti Ang fromNational Neuroscience Institute, Karen Chua andChristopher Kuah from Tan Tock Seng Hospital, WangChuanchu and Phua Kok Soon from Institute for InfocommResearch A*STAR for their support in the data collectionfrom the stroke patients.

REFERENCES

[1] B. Rebsamen, E. Burdet, C. Guan, H. Zhang, C. L. Teo, Q. Zeng, C.Laugier, and M. H. Ang Jr., "Controlling a Wheelchair Indoors UsingThought," IEEE Intelligent Systems, vol. 22, no. 2, pp. 18-24, 2007.

[2] N. Birbaumer, "Brain-computer-interface research: Coming of age,"Clin. Neurophysiol., vol. 117, no. 3, pp. 479-483, 2006.

[3] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T.M. Vaughan, "Brain-computer interfaces for communication andcontrol," Clin. Neurophysiol., vol. 113, no. 6, pp. 767-791, 2002.

[4] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Muller, and G. Curio,"The non-invasive Berlin Brain-Computer Interface: Fast acquisition ofeffective performance in untrained subjects," NeuroImage, vol. 37, no.2, pp. 539-550, 2007.

[5] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proceedings of the IEEE, vol. 89, no. 7, pp.1123-1134, 2001.

[6] Q. Novi, C. Guan, T. H. Dat, and P. Xue, "Sub-band Common SpatialPattern (SBCSP) for Brain-Computer Interface," 3rd InternationalIEEE/EMBS Conference on Neural Engineering, 2007. CNE '07, pp.204-207, 2007.

[7] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, and K.-R. Muller, "Combined Optimization of Spatial and Temporal Filters for

Improving Brain-Computer Interfacing," IEEE Trans. Biomed. Eng.,vol. 53, no. 11, pp. 2274-2281, 2006.

[8] S. Lemm, B. Blankertz, G. Curio, and K.-R. Muller, "Spatio-SpectralFilters for Improving the Classification of Single Trial EEG," IEEETrans. Biomed. Eng., vol. 52, no. 9, pp. 1541-1548, 2005.

[9] G. Pfurtscheller and A. Aranibar, "Evaluation of event-relateddesynchronization (ERD) preceding and following voluntary self-pacedmovement," Electroencephalography and Clinical Neurophysiology,vol. 46, no. 2, pp. 138-146, 1979.

[10] A. Schnitzler, S. Salenius, R. Salmelin, V. Jousmaki, and R. Hari,"Involvement of Primary Motor Cortex in Motor Imagery: ANeuromagnetic Study," NeuroImage, vol. 6, no. 3, pp. 201-208, 1997.

[11] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEGsynchronization and desynchronization: basic principles," Clin.Neurophysiol., vol. 110, no. 11, pp. 1842-1857, 1999.

[12] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, "Optimal spatialfiltering of single trial EEG during imagined hand movement," IEEETrans. Rehabil. Eng., vol. 8, no. 4, pp. 441-446, 2000.

[13] J. Muller-Gerking , G. Pfurtscheller, and H. Flyvbjerg, "Designingoptimal spatial filters for single-trial EEG classification in a movementtask," Clin. Neurophysiol., vol. 110, no. 5, pp. 787-798, 1999.

[14] K. Fukunaga. Introduction to Statistical Pattern Recognition, 2nd ed.New York:Academic Press, 1990.

[15] W. Penfield and T. Rasmussen. The cerebral cortex of man: a clinicalstudy of localization of function. New York :Macmillan, 1950.

[16] E. R. Kandel, J. H. Schwartz, and T. M. Jessell. Essentials of neuralscience and behavior. Norwalk, CT:Appleton & Lange, 1995.

[17] R. Kohavi and G. H. John, "Wrappers for feature subset selection,"Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.

[18] R. Battiti, "Using mutual information for selecting features insupervised neural net learning," IEEE Trans. Neural Networks, vol. 5,no. 4, pp. 537-550, 1994.

[19] T. M. Cover and J. A. Thomas. Elements of Information Theory, 2nded. New York:Wiley, 2006.

[20] K. K. Ang and C. Quek, "Rough Set-based Neuro-Fuzzy System,"Proceedings of the IEEE International Joint Conference on NeuralNetworks (IJCNN '06), pp. 742-749, 2006.

[21] Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data.Dordrecht, Boston:Kluwer Academic Publishers, 1991.

[22] R. W. Swiniarski and A. Skowron, "Rough set methods in featureselection and recognition," Patt. Rec. Lett., vol. 24, no. 6, pp. 833-849,2003.

[23] R. Jensen and Q. Shen, "Semantics-preserving dimensionalityreduction: rough and fuzzy-rough-based approaches," IEEE Trans.Know. Data Eng., vol. 16, no. 12, pp. 1457-1471, 2004.

[24] R. Jensen and Q. Shen, "New Approaches to Fuzzy-Rough FeatureSelection," IEEE Trans. Fuzzy Systems, pp. in press 2007.

[25] A. W. Bowman and A. Azzalini. Applied Smoothing Techniques forData Analysis: The Kernel Approach with S-Plus Illustrations. NewYork:Oxford University Press, 1997.

[26] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed.New York:John Wiley, 2001.

[27] V. N. Vapnik. Statistical learning theory. New York:Wiley, 1998.[28] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone.

Classification and regression trees. Belmont, CA:WadsworthInternational Group, 1984.

[29] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEETrans. Information Theory, vol. 13, no. 1, pp. 21-27, 1967.

[30] J. Casillas, O. Cordón, F. Herrera, and L. Magdalena. InterpretabilityIssues in Fuzzy Modeling . Berlin:Springer-Verlag, 2003.

[31] N. K. Kasabov and Q. Song, "DENFIS: dynamic evolving neural-fuzzyinference system and its application for time-series prediction," IEEETrans. Fuzzy Systems, vol. 10, no. 2, pp. 144-154, 2002.

[32] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Muller, "Boosting bitrates in noninvasive EEG single-trial classifications by featurecombination and multiclass paradigms," IEEE Trans. Biomed. Eng.,vol. 51, no. 6, pp. 993-1002, 2004.

[33] B. Blankertz, "BCI Competition III", Fraunhofer FIRST.IDA,http://ida.first.fraunhofer.de/projects/bci/competition_iii, 2005.

2398 2008 International Joint Conference on Neural Networks (IJCNN 2008)