

Real-time Adaptive EEG Source Separation using Online Recursive Independent Component Analysis

Sheng-Hsiou Hsu, Student Member, IEEE, Tim Mullen, Member, IEEE, Tzyy-Ping Jung, Fellow, IEEE, Gert Cauwenberghs, Fellow, IEEE

Abstract—Independent Component Analysis (ICA) has been widely applied to electroencephalographic (EEG) biosignal processing and brain-computer interfaces. The practical use of ICA, however, is limited by its computational complexity, data requirements for convergence, and assumption of data stationarity, especially for high-density data. Here we study and validate an optimized online recursive ICA algorithm (ORICA) with online recursive least squares (RLS) whitening for blind source separation of high-density EEG data, which offers instantaneous incremental convergence upon presentation of new data. Empirical results of this study demonstrate the algorithm's: (a) suitability for accurate and efficient source identification in high-density (64-channel) realistically-simulated EEG data; (b) capability to detect and adapt to non-stationarity in 64-ch simulated EEG data; and (c) utility for rapidly extracting principal brain and artifact sources in real 61-channel EEG data recorded by a dry and wearable EEG system in a cognitive experiment. ORICA was implemented as functions in BCILAB and EEGLAB and was integrated in an open-source Real-time EEG Source-mapping Toolbox (REST), supporting applications in ICA-based online artifact rejection, feature extraction for real-time biosignal monitoring in clinical environments, and adaptable classifications in brain-computer interfaces.

Index Terms—Independent component analysis, blind source separation, electroencephalography, biomedical signal processing, nonstationarity.

I. INTRODUCTION

INDEPENDENT Component Analysis (ICA), as a means for blind source separation (BSS), has enjoyed great success in telecommunications and biomedical signal processing [1]. In biomedical applications, such as scalp electroencephalography (EEG), ICA methods have been widely used to separate artifacts such as eye blinks and muscle activities [2] and to study brain activities [3]. For example, ICA can extract fetal electrocardiography (ECG) from maternal abdomen electrode recordings [4], and it can also isolate pathological activities associated with disease states of epilepsy [5]. In addition, applying ICA to remove task-irrelevant activities and reduce dimensionality of data can improve the performance of Brain-Computer Interfaces (BCI) [6].

This work was supported in part by a gift by the Swartz Foundation (Old Field, NY), by the Army Research Laboratory under Cooperative Agreement Number W911NF-10-2-0022, by NIH grant 1R01MH084819-03 and NSF EFRI-M3C 1137279.

S.-H. Hsu and T.-P. Jung are with Dept. of Bioengineering (BIOE), Swartz Center for Computational Neuroscience (SCCN), and Institute for Neural Computation (INC) of University of California at San Diego (UCSD), La Jolla, CA 92093, USA (e-mail: [email protected]).

T. Mullen was with Dept. of Cognitive Science, SCCN, and INC of UCSD. He is now with Syntrogi Labs, San Diego, CA 92111, USA.

G. Cauwenberghs is with BIOE and INC of UCSD.

Digital Object Identifier 10.1109/TNSRE.2015.2508759

The application of ICA to EEG data is justified by a reasonable assumption that multi-channel scalp EEG signals arise as a mixture of weakly dependent latent non-Gaussian sources [7]. Although several ICA algorithms have been developed [1] to learn these sources from channel mixtures, most of them require access to large amounts of training data and are only suitable for offline applications. Furthermore, offline ICA algorithms commonly assume spatiotemporal stationarity of the data, as in the widely used Infomax ICA [8] and FastICA [9] algorithms. The few ICA methods that allow non-stationarity, such as Adaptive Mixture ICA [10], are computationally expensive. In many real-world applications, including real-time functional neuroimaging [11], artifact rejection and adaptive BCI [6], online (sequential) source separation methods are needed. Desirable properties of an online method include fast convergence, real-time computational performance, and adaptivity to non-stationary data.

Many existing online ICA methods are listed in Table I. Two major learning rules are least-mean-squares (LMS) and recursive-least-squares (RLS) methods. LMS-type algorithms use stochastic gradient descent approaches and are computationally simple, but they require careful selection of an appropriate learning rate for stable convergence. Examples include Equivariant Adaptive Separation via Independence (EASI) [12] and Natural Gradient (NG) [13] methods. RLS-type algorithms accumulate past data in an exponentially decaying fashion and use Sherman-Morrison matrix inversion to achieve higher convergence rate and better tracking capability, yet require complex computation [14], [15]. This category includes the RLS approach of Nonlinear PCA (RLS-NPCA) [16], Iterative Inversion [17], and Natural Gradient-based RLS (NG-RLS) [18]. Alternatively, Online Recursive ICA (ORICA) [19] gives an RLS-type recursive rule by solving a fixed-point approximation and has been shown to exhibit fast convergence and low computational complexity [20]. Readers can refer to [15], [17] and [21] for theoretical relationships and comparisons between the above methods.

The aforementioned papers focused on theoretical derivations and proofs of convergence and only demonstrated applications of the methods to low-density data (fewer than 10 channels) and simulated "toy" examples such as sinusoidal and square waves. When the number of channels and sources increases, many existing algorithms exhibit slow convergence and poor real-time performance [9]. In a recent study, a real-time online ICA method for high-density EEG was proposed [20]. The method was compared with other offline ICA methods, and its stability and steady-state performance were analyzed in [22].

1534-4320 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

TABLE I
COMPARISON OF ONLINE ICA METHODS.

| Name | Author | Year | Learning rule and optimization method | Pre-whitening |
|---|---|---|---|---|
| EASI | Cardoso et al. [12] | 1996 | Relative gradient-based (LMS) that maximizes kurtosis | LMS |
| NG | Amari et al. [13] | 1996 | Natural gradient-based (LMS) that minimizes mutual information | no |
| RLS-NPCA | Karhunen et al. [16] | 1997 | RLS that minimizes LSE of NPCA criterion | PCA |
| Iterative Inversion | Cruces-Alvarez et al. [17] | 2000 | Quasi-Newton method with iterative inversion that decorrelates high order statistics | no |
| NG-RLS | Zhu et al. [18] | 2004 | RLS with natural gradient that minimizes LSE of NPCA criterion | RLS |
| ORICA | Akhtar et al. [19] | 2012 | Recursive rule with iterative inversion from fixed-point solution of Infomax with natural gradient | no |

Additionally, an important advantage of online ICA methods is their ability to adapt to spatiotemporal non-stationary data, a common occurrence in real-world applications. For instance, spatial non-stationarity in the ICA (un)mixing matrix can arise as a consequence of location shifts in either sensors or sources, or changes in electrode impedances. However, few of the online ICA methods have been carefully studied under non-stationary conditions. Further investigation is needed to characterize algorithmic performance and optimal parameter selection using non-stationary simulated and real EEG data.

In this study, we extend ORICA as formulated in [19] and [20], and the contributions are three-fold. Firstly, we demonstrate ORICA's suitability for accurate and efficient source identification in a realistic simulation of stationary 64-channel EEG data. Specifically, we include a serial orthogonalization step of the unmixing matrix in the ORICA algorithm, and we systematically examine the impact of parameters such as the forgetting factor and block sizes for pre-whitening and ORICA on algorithmic performance. Secondly, we examine ORICA's capability to adaptively decompose spatially non-stationary 64-channel EEG data corresponding to abrupt displacements of electrodes, a common source of spatial non-stationarity in real-world mobile applications. We introduce a non-stationarity index and an adaptation approach for non-stationarity detection and online adaptation. Thirdly, we evaluate ORICA's real-world applicability for rapidly extracting principal brain and artifact sources using 61-channel real EEG data recorded from a subject performing an Eriksen flanker task [23]. We demonstrate that ORICA and offline Infomax ICA [24] obtain comparable results in terms of extracting informative independent components (ICs) and their single-trial and averaged event-related potentials (ERPs), yet ORICA can learn the ICs online with less than half of the data. Finally, the proposed ORICA pipeline is made freely available as functions supported in BCILAB [25] and EEGLAB [26], and it is also integrated in an open-source Real-time EEG Source-mapping Toolbox (REST) [27].

II. METHODS

Standard ICA assumes a linear generative model $\mathbf{x} = \mathbf{A}\mathbf{s}$, where $\mathbf{x}$ represents scalp EEG observations, $\mathbf{s}$ contains unknown sources, and $\mathbf{A}$ is an unknown square mixing matrix. The objective is to learn an unmixing matrix $\mathbf{B} = \mathbf{A}^{-1}$ such that the sources are recovered exactly, up to an unknown permutation and scaling matrix, by $\mathbf{y} = \mathbf{B}\mathbf{x}$, where $\mathbf{y}$ represents the recovered source activations. A column of $\mathbf{B}^{-1}$ represents the spatial distribution of a source over all channels, often referred to as a "component map."

It is desirable to optimize the ICA contrast function, a measurement of the degree of independence between sources such as kurtosis or mutual information, under the decorrelation constraint $\mathbf{R}_y = E[\mathbf{y}\mathbf{y}^T] = \mathbf{I}$. Hence the separating process can be factored into two stages as $\mathbf{B} = \mathbf{W}\mathbf{M}$, where $\mathbf{M}$ is the whitening matrix that decorrelates the data and $\mathbf{W}$ is the weight (preferably orthogonal) matrix that optimizes the ICA contrast function [12], [18]. Serial update rules of $\mathbf{M}$ and $\mathbf{W}$ and detailed features are presented in the following subsections.
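The two-stage factorization can be illustrated with a small batch example. The sketch below is only an illustration under stated assumptions: the mixing matrix, the Laplacian sources, and the batch (offline) whitening via an eigendecomposition are all hypothetical stand-ins, not the online rules used in this paper. It shows that once the data are whitened, any orthogonal W leaves the outputs decorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 20000
A = rng.standard_normal((N, N))        # hypothetical mixing matrix
s = rng.laplace(size=(N, T))           # hypothetical non-Gaussian sources
x = A @ s                              # observations, x = A s
x -= x.mean(axis=1, keepdims=True)     # zero-mean data

# Batch whitening matrix M = R_x^{-1/2} from the sample covariance.
Rx = x @ x.T / T
d, V = np.linalg.eigh(Rx)
M = V @ np.diag(d ** -0.5) @ V.T
v = M @ x                              # whitened data

# Any orthogonal W preserves the decorrelation constraint R_y = I.
W, _ = np.linalg.qr(rng.standard_normal((N, N)))
y = W @ v
Ry = y @ y.T / T
```

Because the constraint holds for every orthogonal W, the second stage only has to search over rotations of the whitened data.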

A. Online recursive-least-squares (RLS) pre-whitening

Pre-whitening (decorrelating) the data reduces the number of independent parameters an ICA update must learn, and can improve convergence [1]. Pre-whitening may be efficiently carried out in an online RLS-type learning rule [18]:

$$\mathbf{M}_{n+1} = \mathbf{M}_n + \frac{\lambda_n}{1-\lambda_n}\left[\mathbf{I} - \frac{\mathbf{v}_n \mathbf{v}_n^T}{1 + \lambda_n(\mathbf{v}_n^T \mathbf{v}_n - 1)}\right]\mathbf{M}_n, \tag{1}$$

where $n$ is the number of iterations, $\mathbf{M}_n$ is the whitening matrix, $\mathbf{v}_n = \mathbf{M}_n\mathbf{x}_n$ is the decorrelated data, $\lambda_n$ is a forgetting factor, and $\mathbf{I}$ is the identity matrix. A non-overlapping block of data $\mathbf{x}_n$ with a block size $L_{white}$ is used at each iteration to reduce the computational load and to increase the robustness of the estimated correlation matrix $\mathbf{v}_n\mathbf{v}_n^T$. This RLS-type whitening rule exhibits faster convergence than LMS whitening methods [18].
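A minimal sketch of Eq. 1 for the single-sample case (block size 1) is given below. The function name `rls_whiten_step`, the random mixing matrix, and the driving loop are assumptions for illustration; the block-averaged variant with $L_{white} > 1$ is not reproduced here.

```python
import numpy as np

def rls_whiten_step(M, x, lam):
    """One single-sample update of the whitening matrix M following Eq. 1."""
    v = M @ x                                   # decorrelated sample v_n = M_n x_n
    denom = 1.0 + lam * (v @ v - 1.0)
    I = np.eye(M.shape[0])
    return M + lam / (1.0 - lam) * (I - np.outer(v, v) / denom) @ M

# Illustrative usage on hypothetical correlated data.
rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))                 # hypothetical mixing matrix
M = np.eye(N)
for n in range(1, 2001):
    x = A @ rng.standard_normal(N)
    M = rls_whiten_step(M, x, lam=0.995 / n ** 0.6)
```

Applying the Sherman-Morrison formula backwards shows that this step is equivalent to the exponentially weighted recursion $\mathbf{M}_{n+1}^{-1} = (1-\lambda_n)\mathbf{M}_n^{-1} + \lambda_n \mathbf{x}_n\mathbf{v}_n^T$, which is a convenient identity for checking an implementation.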

B. Online recursive ICA (ORICA)

The ORICA algorithm can be derived from an incremental update form of the natural gradient learning rule of Infomax ICA [28]:

$$\mathbf{W}_{n+1} = \mathbf{W}_n + \eta\left[\mathbf{I} - \mathbf{f}(\mathbf{y}_n)\,\mathbf{y}_n^T\right]\mathbf{W}_n \tag{2}$$

where $\mathbf{y}_n = \mathbf{W}_n\mathbf{v}_n$, $\eta$ is a learning rate, and $\mathbf{f}(\cdot)$ is a nonlinear activation function. In the limit of a small $\eta$ and assuming a fixed $\mathbf{f}(\cdot)$, the convergence criterion $\langle \mathbf{f}(\mathbf{y})\,\mathbf{y}^T \rangle = \mathbf{I}$ leads to a fixed-point solution in an iterative inversion form [19]:

$$\mathbf{W}^{+}_{n+1} = (1 - \lambda_n)\,\mathbf{W}^{+}_n + \lambda_n\,\mathbf{v}_n\,\mathbf{f}^T(\mathbf{y}_n) \tag{3}$$

where $\mathbf{W}^{+}_n$ is the Moore-Penrose pseudoinverse of $\mathbf{W}_n$ and $\lambda_n$ is a forgetting factor for an exponentially weighted series of updates. It should be noted that $\lambda_n$ differs from $\eta$, which is the step size for stochastic gradient optimization.

Following [19], applying the Sherman-Morrison matrix inversion formula to Eq. 3, the final online recursive learning rule becomes:

$$\mathbf{W}_{n+1} = \mathbf{W}_n + \frac{\lambda_n}{1-\lambda_n}\left[\mathbf{I} - \frac{\mathbf{y}_n\,\mathbf{f}^T(\mathbf{y}_n)}{1 + \lambda_n(\mathbf{f}^T(\mathbf{y}_n)\,\mathbf{y}_n - 1)}\right]\mathbf{W}_n \tag{4}$$

The near-identical forms of Eq. 4 and Eq. 1 allow us to understand ORICA as a nonlinear (or kernel) form of the RLS whitening filter: ORICA's use of non-linearity $\mathbf{f}(\cdot)$ allows for independence of sources for moments above second order.
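As a hedged illustration, Eq. 4 can be written as a single-sample update in a few lines. The function name `orica_step` is hypothetical, and the nonlinearity is passed in as a parameter; the example call uses the supergaussian choice $f(y) = -2\tanh(y)$ stated in the nonlinear-function subsection of this paper.

```python
import numpy as np

def orica_step(W, v, lam, f):
    """One single-sample ORICA update of the weight matrix W following Eq. 4."""
    y = W @ v                                   # current source estimate y_n = W_n v_n
    fy = f(y)                                   # component-wise nonlinearity f(y_n)
    denom = 1.0 + lam * (fy @ y - 1.0)
    I = np.eye(W.shape[0])
    return W + lam / (1.0 - lam) * (I - np.outer(y, fy) / denom) @ W

# Illustrative call with hypothetical values.
f_super = lambda y: -2.0 * np.tanh(y)
W0 = np.array([[1.0, 0.2], [-0.1, 0.8]])
v0 = np.array([0.5, -0.3])
W1 = orica_step(W0, v0, lam=0.05, f=f_super)
```

For an invertible W, the result can be checked against the iterative inversion form of Eq. 3: the inverse of the updated matrix should equal $(1-\lambda_n)\mathbf{W}_n^{-1} + \lambda_n\mathbf{v}_n\mathbf{f}^T(\mathbf{y}_n)$.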

Under Eq. 4, the orthogonality of the weight matrix $\mathbf{W}$ is not guaranteed. To preserve the decorrelation property of recovered source activities $\mathbf{y} = \mathbf{W}\mathbf{M}\mathbf{x}$ and maintain learning stability, we apply an orthogonal transformation to $\mathbf{W}$ after each ICA update:

$$\mathbf{W}_{n+1} \leftarrow (\mathbf{V}\mathbf{D}^{-1/2}\mathbf{V}^{-1})\,\mathbf{W}_{n+1}, \tag{5}$$

where $\mathbf{D}$ and $\mathbf{V}$ contain, respectively, the eigenvalues and eigenvectors of $\mathbf{W}_{n+1}\mathbf{W}_{n+1}^T$. Note that this orthogonalization step is costly compared to the ORICA update. A possible alternative would be to reformulate the ORICA update rule under the orthogonal constraint or combine the serial whitening and weight updates into a single update rule [12].
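The orthogonalization of Eq. 5 amounts to multiplying W by $(\mathbf{W}\mathbf{W}^T)^{-1/2}$. A minimal sketch, assuming a full-rank W and using a hypothetical function name, is:

```python
import numpy as np

def orthogonalize(W):
    """Symmetric orthogonalization W <- (W W^T)^{-1/2} W, as in Eq. 5."""
    # W W^T is symmetric positive definite for full-rank W, so V^{-1} = V^T.
    d, V = np.linalg.eigh(W @ W.T)
    return V @ np.diag(d ** -0.5) @ V.T @ W

W = np.array([[2.0, 1.0], [0.0, 1.0]])   # hypothetical non-orthogonal weights
W_orth = orthogonalize(W)
```

The eigendecomposition costs on the order of $N^3$ operations per call, which is consistent with the remark that this step is expensive relative to the rank-one ORICA update.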

1) Block-update rule: The typical single measurement vector approach [29] requires application of the update rule (Eq. 4) for each data sample, which can be computationally expensive, particularly for the commonly-used MATLAB (The Mathworks, Natick, MA) runtime environment. To reduce the computational load and ensure consistent real-time performance, we may adopt a multiple measurement vector approach [29] and perform updates on short blocks of samples. To achieve this without loss of accuracy, we solve Eq. 4 for time index $l = n$ to $l = n+L-1$, assuming $\mathbf{y}_l$ is approximated as $\mathbf{W}_n\mathbf{v}_l$ and $\lambda_l$ is small. This leads to a block-update rule [20]:

$$\mathbf{W}_{n+L} \approx \left(\prod_{l=n}^{n+L-1}\frac{1}{1-\lambda_l}\right)\cdot\left[\mathbf{I} - \sum_{l=n}^{n+L-1}\frac{\mathbf{y}_l\,\mathbf{f}^T(\mathbf{y}_l)}{\frac{1-\lambda_l}{\lambda_l} + \mathbf{f}^T(\mathbf{y}_l)\,\mathbf{y}_l}\right]\mathbf{W}_n \tag{6}$$

In this form, the sequence of updates can be vectorized for fast computation. Note that Eq. 6 appropriately accounts for the decaying forgetting factor at each time point. This keeps the approximation error to a minimum.
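A vectorized sketch of Eq. 6 is shown below. The function name, argument layout (whitened samples as columns of `V`, one forgetting factor per sample in `lams`), and the example nonlinearity are assumptions made for illustration.

```python
import numpy as np

def orica_block_update(W, V, lams, f):
    """Block update of Eq. 6. V is N-by-L, lams holds the L forgetting factors."""
    Y = W @ V                                   # y_l approximated as W_n v_l
    F = f(Y)                                    # f(y_l) for every sample in the block
    # One scalar denominator per sample: (1 - lam_l)/lam_l + f(y_l)^T y_l
    q = (1.0 - lams) / lams + np.sum(F * Y, axis=0)
    scale = np.prod(1.0 / (1.0 - lams))
    I = np.eye(W.shape[0])
    # (Y / q) @ F.T equals the sum over l of y_l f(y_l)^T / q_l
    return scale * (I - (Y / q) @ F.T) @ W

f_super = lambda y: -2.0 * np.tanh(y)           # supergaussian choice from the text
W0 = np.array([[1.0, 0.2], [-0.1, 0.8]])
V_block = np.array([[0.5, -0.2, 0.1], [-0.3, 0.4, 0.9]])
lams = np.array([0.05, 0.04, 0.03])
W_new = orica_block_update(W0, V_block, lams, f_super)
```

For a block of length one the expression reduces exactly to Eq. 4, which gives a simple consistency check for an implementation.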

2) Forgetting factor: The forgetting factor $\lambda$ determines an effective length of a time window wherein data are aggregated. A large value of $\lambda$ corresponds to a short window length. In this case, much heavier weights are applied to new data than past data, yielding fast adaptation and convergence yet large errors and variability. As a general rule, a large $\lambda$ is preferred during initial learning to promote fast convergence; a small $\lambda$ is suggested at convergence to minimize variance. To this end, we adopt the forgetting factor with time-varying annealing defined in [19]:

$$\lambda_n = \frac{\lambda_0}{n^{\gamma}} \tag{7}$$

where $\lambda_0$ is a fixed initial forgetting factor and $\gamma$ determines the rate at which $\lambda$ decays towards zero as a function of time (a power law in $n$). The same forgetting factor is applied to the RLS whitening filter, although theoretically it can be different.
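The annealing schedule of Eq. 7 is a one-line function. In the sketch below the function name is hypothetical, and the default values are the ones listed for the pipeline in Table II ($\lambda_0 = 0.995$, $\gamma = 0.60$); the sample index is assumed to start at 1.

```python
def forgetting_factor(n, lam0=0.995, gamma=0.60):
    """lambda_n = lambda_0 / n**gamma (Eq. 7); n is the 1-based sample index."""
    return lam0 / n ** gamma

# Forgetting factor at a few sample indices.
schedule = [forgetting_factor(n) for n in (1, 10, 100, 1000)]
```

The schedule starts at $\lambda_0$ and decreases monotonically, so early samples produce large updates and later samples refine the estimate with progressively smaller ones.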

3) Nonlinear function: The choice of non-linearity $f(\cdot)$ depends on the probability distribution of the sources. Lee et al. [28] proposed an extended Infomax ICA algorithm that adopts distinct activation functions to separate subgaussian and supergaussian sources based on an estimate of source kurtosis. Here we follow [1], [19] and choose the component-wise non-linearity $f(y) = -2\tanh(y)$ for supergaussian sources and $f(y) = \tanh(y) - y$ for subgaussian sources.

4) Number of sub- and super-gaussian sources: While approaches for adaptively selecting $f(y)$ within ORICA have been proposed [19], these are heuristic and presently lack convergence proofs. In practice, we found that both convergence speed and run-time performance were improved by pre-assuming a fixed number of sub- and super-Gaussian sources. Selection of an appropriate number is discussed in Section V, and the effect of inaccurate assumptions is explored in Section IV.

C. Non-stationarity detection

In this study, non-stationarity refers to any change in the ICA model $\mathbf{x} = \mathbf{A}\mathbf{s}$, including spatial non-stationarity of the mixing process $\mathbf{A}$ between sources and measurements, and temporal non-stationarity of the probability distributions of source activities $\mathbf{s}$. Hence non-stationarity might arise from switching of active brain sources, transient muscle activities, sensor displacement, or impedance changes. Our goal is to propose a generic approach to detect and adapt to non-stationarity in EEG data.

1) Non-stationarity index: As previously described, ORICA is derived from a fixed-point solution to the convergence criterion $\langle \mathbf{y}\,\mathbf{f}(\mathbf{y})^T \rangle = \mathbf{I}$, reflecting independence of sources for moments above second order. Violation of this criterion once an algorithm reaches steady-state can be interpreted as a change in the latent source or mixture distribution, which leads us to define the following heuristic non-stationarity index:

$$\delta_{ns} = \left\|\mathbf{y}_n\,\mathbf{f}^T(\mathbf{y}_n) - \mathbf{I}\right\|_F \tag{8}$$

where $\|\cdot\|_F$ represents the Frobenius norm and $n$ is the current sample point. After ICA decomposition converges, $\delta_{ns}$ would remain small when data are stationary, while $\delta_{ns}$ would increase and fluctuate when data are non-stationary.
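Eq. 8 can be sketched as follows. The function name is hypothetical, and averaging $\mathbf{y}\,\mathbf{f}^T(\mathbf{y})$ over the columns of a block is an assumption added here to approximate the expectation in the convergence criterion; with a single column the function reduces to Eq. 8 as written.

```python
import numpy as np

def nonstationarity_index(Y, f):
    """Frobenius norm of <y f(y)^T> - I, averaged over the L columns of Y (N-by-L)."""
    N, L = Y.shape
    C = Y @ f(Y).T / L                  # block estimate of <y f(y)^T>
    return np.linalg.norm(C - np.eye(N), ord="fro")

# Single-sample use with the supergaussian nonlinearity from the text.
y_n = np.array([[0.4], [-1.1], [0.2]])  # hypothetical source estimate
delta_ns = nonstationarity_index(y_n, lambda y: -2.0 * np.tanh(y))
```

In practice the index would be tracked over time and compared against a threshold, since individual values fluctuate from sample to sample.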

2) Adaptation of the forgetting factor: If the non-stationarity index $\delta_{ns}$ increases above a threshold, we may interpret this as evidence of a change in the latent mixing matrix and increase the RLS forgetting factor, allowing ORICA to more rapidly adapt to the new mixing matrix. In this study, the threshold value was heuristically chosen to be a percentage (e.g. 1-10%) of initial $\delta_{ns}$, which was several standard deviations above the mean of $\delta_{ns}$ at convergence in the simulated stationary data. Once $\delta_{ns}$ reached the threshold, we increased the forgetting factor to its initial value.
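One possible way to organize this adaptation is sketched below. The class name, the default threshold, and in particular the choice to restart the annealing schedule $\lambda_n = \lambda_0 / n^{\gamma}$ by resetting the sample counter are assumptions; the text only states that the forgetting factor is raised back to its initial value.

```python
class AdaptiveForgetting:
    """Annealed forgetting factor that restarts when delta_ns crosses a threshold."""

    def __init__(self, lam0=0.995, gamma=0.60, threshold=1.0):
        self.lam0 = lam0
        self.gamma = gamma
        self.threshold = threshold      # hypothetical value; chosen heuristically in the text
        self.n = 0                      # samples since the last reset

    def step(self, delta_ns):
        """Return the forgetting factor for the next update given the current index."""
        if delta_ns >= self.threshold:
            self.n = 0                  # assumption: restart annealing from lam0
        self.n += 1
        return self.lam0 / self.n ** self.gamma

sched = AdaptiveForgetting(threshold=1.0)
lam_a = sched.step(0.1)                 # first sample: lam0
lam_b = sched.step(0.1)                 # annealing continues
lam_c = sched.step(2.0)                 # threshold crossed: back to lam0
```

A detector of this kind is only meaningful after the decomposition has converged, because $\delta_{ns}$ is also large during initial learning.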


III. MATERIALS

A. Data collection

The performance of ORICA was evaluated under simulated and real-world conditions. Previous works of online ICA mostly used "toy" simulations with artificially constructed sources (sinusoids, i.i.d. random data, etc.), stationary mixing matrices, and relatively small numbers of channels and sources. Here we generated high-density EEG (64 channels and 64 sources) under more realistic conditions, including use of auto-correlated stochastic sources, realistic source locations and mixing matrices derived from Boundary Element Method (BEM) modeling, and spatial non-stationarity. Simulated data were generated using the EEG simulation module in Source Information Flow Toolbox (SIFT) [30], using an approach similar to [31].

1) Simulated spatially stationary EEG data: To test ORICA's performance in separating stationary EEG sources, we generated 64 supergaussian independent source time-series from stationary and random-coefficient order-3 autoregressive (AR-3) models (300 Hz sampling rate, 10 min), assigned each source a random cortical dipole location, and projected these through a zero-noise 3-layer BEM forward model (MNI "Colin27") with standard 10-20 electrode locations matching the 64-channel Cognionics montage used subsequently for real-world ORICA evaluation. This yielded 64-channel EEG data.
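The source-generation step can be approximated by a rough stand-in, shown below. This is not the SIFT simulation module: the pole-placement scheme for obtaining stable random AR-3 coefficients, the Laplacian innovations used to obtain heavy-tailed sources, and the random matrix used in place of the BEM forward model are all assumptions for illustration.

```python
import numpy as np

def random_ar3_source(T, rng):
    """Simulate a stable random-coefficient AR(3) process driven by Laplacian noise."""
    # Stability: one real pole and one complex-conjugate pair inside the unit circle.
    r = rng.uniform(0.5, 0.95)
    theta = rng.uniform(0.1, np.pi - 0.1)
    poles = np.array([rng.uniform(-0.9, 0.9),
                      r * np.exp(1j * theta),
                      r * np.exp(-1j * theta)])
    # Monic polynomial z^3 + c1 z^2 + c2 z + c3, so a_k = -c_k in
    # s[t] = a1 s[t-1] + a2 s[t-2] + a3 s[t-3] + e[t]
    a = -np.real(np.poly(poles))[1:]
    e = rng.laplace(size=T)
    s = np.zeros(T)
    for t in range(3, T):
        s[t] = a[0] * s[t - 1] + a[1] * s[t - 2] + a[2] * s[t - 3] + e[t]
    return s / s.std(), poles

rng = np.random.default_rng(0)
n_src, fs, T = 64, 300, 3000                       # 10 s at 300 Hz for a quick demo
S = np.vstack([random_ar3_source(T, rng)[0] for _ in range(n_src)])
A = rng.standard_normal((n_src, n_src))            # placeholder for the BEM lead field
X = A @ S                                          # simulated 64-channel "EEG"
```

Replacing the random matrix with a lead field from a real head model would be the step that gives the component maps their spatial smoothness.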

2) Simulated spatially non-stationary EEG data: To evaluate ORICA's capability to adapt to spatial non-stationarity, we simulated abrupt shifts of the electrode montage during continuous recording. We first generated 30 minutes of temporally stationary AR-3 source data, as described above. The data were partitioned into thirds. For each 10-minute segment, 64-channel EEG data were generated using a unique BEM forward (mixing) matrix, corresponding respectively to (a) the standard electrode montage, (b) a 5 degree anterior cap rotation, and (c) a subsequent 10 degree posterior cap rotation (5 degree posterior rotation from standard position). The procedure is illustrated in Figure 5a.

3) Real EEG data: One session of high-density EEG data was collected from a 24-year-old right-handed male subject using a 64-channel wearable and wireless EEG headset with dry electrodes (Cognionics, Inc) [32]. In the 20-minute session, the subject performed a modified Eriksen flanker task [23] with a 133 ms delay between flanker and target presentation. The subject was asked to press buttons according to the target stimuli as quickly as possible. Flanker tasks are known to produce robust error-related negativity (ERN, Ne) at frontal-central electrode sites. The goal here is to extract these event-related potential (ERP) components from high-density EEG data in a real-world setting using the proposed ORICA pipeline.

B. The ORICA pipeline

As shown in Figure 1, the ORICA pipeline continuously fetched the streamed data with variable size $L_B$ in the online buffer and processed the data with the three filters in sequence: a Butterworth IIR high-pass filter, an online RLS whitening filter, and an ORICA filter. The high-pass filter removed the trend and low-frequency drift, ensuring the zero-mean criterion for ICA was satisfied. For each update, the pipeline computed and output the whitening matrix $\mathbf{M}$ and the weight matrix $\mathbf{W}$ according to Eq. 1 and Eq. 6. The next non-overlapping data chunk was then used for the subsequent update.

Fig. 1. The ORICA pipeline for online EEG data processing. X(t) is the input data vector at time t and $L_B$ is the size of data in the online buffer.

TABLE II
LIST OF PARAMETERS FOR THE ORICA PIPELINE: IIR HIGH-PASS FILTER (IIR), ONLINE RLS WHITENING FILTER (RLS), AND ONLINE RECURSIVE ICA FILTER (ICA).

| Filters | Parameters | Values | Description |
|---|---|---|---|
| IIR | BW | 0.2−2 Hz | Transition bandwidth |
| RLS | $L_{white}$ | 1∼16 | Block-average size |
| RLS | $\lambda_0$ | 0.995 | Initial forgetting factor |
| ICA | $\gamma$ | 0.60 | Decay rate of forgetting factor |
| ICA | $L_{ICA}$ | 1∼16 | Block-update size |
| ICA | $n_{sub}$ | 0 | Number of subgaussian sources |

The pipeline was implemented and analyzed in a simulated online environment using BCILAB, an open source MATLAB toolbox designed for BCI research [11], [25]. It was initialized with the first one-second segment of data from the dataset. For the simulated 64-ch stationary data, we investigated the effect of the parameters on empirical convergence, and we used block sizes $L_{white} = L_{ICA} = 16$ as an example to show the decomposed components, since these block sizes returned satisfactory results in the shortest computational time. For the simulated 64-ch non-stationary data and real 61-ch data, we used $L_{white} = 8$ and $L_{ICA} = 1$, which were found to be optimal for the simulated stationary data. Table II summarizes the parameters of the three filters.

C. Data processing and analysis

We applied additional procedures to process and analyze the recorded EEG data from the subject. Firstly, an automatic removal of bad (e.g. flatlined or abnormally correlated) channels was applied prior to the ORICA pipeline using BCILAB routines, which removed 3 out of 64 channels. Secondly, following application of the ORICA pipeline, the source activities were epoched in a -400 to 600 msec window time-locked to the subject's responses (button press), yielding 693 epochs (104 error and 589 correct trials). The epochs were averaged to produce ERPs and were analyzed offline in EEGLAB [26].
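The epoching and averaging step can be sketched generically as follows. This is not the EEGLAB routine used in the study: the function name, the event representation (sample indices of button presses), the sampling rate in the example, and the rule of dropping epochs that run past the recording edges are all assumptions.

```python
import numpy as np

def epoch(data, event_samples, fs, tmin=-0.4, tmax=0.6):
    """Cut [tmin, tmax) second windows around events from data (n_sources-by-n_samples)."""
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    # Keep only events whose window lies fully inside the recording.
    keep = [e for e in event_samples if e + start >= 0 and e + stop <= data.shape[1]]
    return np.stack([data[:, e + start:e + stop] for e in keep])

# Hypothetical source activity and response events.
fs = 300
rng = np.random.default_rng(0)
activity = rng.standard_normal((5, 6000))
events = [900, 2100, 3300, 4500]
epochs = epoch(activity, events, fs)     # shape: (n_epochs, n_sources, n_times)
erp = epochs.mean(axis=0)                # averaged ERP per source
```

Averaging separately over error and correct trials would then give the condition-specific ERPs that are compared in the analysis.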

D. Performance evaluation

1) Performance index: If the ground truth ($N$-by-$N$) mixing matrix $\mathbf{A}$ is known, a performance index PI for assessing quality of source separation can be defined as [33]:

$$\mathrm{PI} = \frac{1}{N-1}\left[N - \frac{1}{2}\sum_{i=1}^{N}\left(\frac{\max_{1\le j\le N}|C_{ij}(n)|^2}{\sum_{j=1}^{N}|C_{ij}(n)|^2} + \frac{\max_{1\le j\le N}|C_{ji}(n)|^2}{\sum_{j=1}^{N}|C_{ji}(n)|^2}\right)\right] \tag{9}$$

where $\mathbf{C}(n) = \mathbf{W}_n\mathbf{M}_n\mathbf{A}$. This measures a normalized total cross-talk error of the estimated whitening matrix $\mathbf{M}$ and weight matrix $\mathbf{W}$, accounting for scale and permutation ambiguities. For perfect separation at convergence, PI approaches zero.
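Eq. 9 translates directly into a few array operations. The sketch below uses a hypothetical function name and takes the global matrix $\mathbf{C} = \mathbf{W}\mathbf{M}\mathbf{A}$ as input.

```python
import numpy as np

def performance_index(C):
    """Normalized cross-talk error of Eq. 9 for a square global matrix C = W M A."""
    P = np.abs(C) ** 2
    N = P.shape[0]
    row_term = P.max(axis=1) / P.sum(axis=1)   # dominant entry of each row
    col_term = P.max(axis=0) / P.sum(axis=0)   # dominant entry of each column
    return (N - 0.5 * np.sum(row_term + col_term)) / (N - 1)

# A scaled permutation is a perfect separation; a constant matrix is the worst case.
perfect = np.array([[0.0, 2.0, 0.0], [0.0, 0.0, -1.0], [3.0, 0.0, 0.0]])
worst = np.ones((3, 3))
pi_perfect = performance_index(perfect)
pi_worst = performance_index(worst)
```

The two extreme cases bracket the index between 0 and 1, which makes trajectories from different runs directly comparable.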

2) Best-matched correlation coefficients and Hungarian algorithm: PI reflects ORICA's global performance across all components. However, it is also useful to evaluate convergence of individual independent components (ICs), i.e. rows of $\mathbf{W}$. One metric is the Pearson correlation between an estimated IC and its counterpart in a "ground truth" weight matrix, $\mathbf{W}^*$. Due to permutation ambiguities, a matching algorithm is required for optimal pairing of rows of $\mathbf{W}$ and $\mathbf{W}^*$. This study used the Hungarian method [34] to maximize the sum of absolute pairwise correlations. We used Niclas Borlin's implementation in EEGLAB's matcorr(·) function.
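The matching step can be illustrated with SciPy as a stand-in for EEGLAB's matcorr. `scipy.optimize.linear_sum_assignment` solves the same linear assignment problem that the Hungarian method addresses; the function name `match_components` and the example matrices are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(W_est, W_true):
    """Pair rows of W_est with rows of W_true to maximize summed |Pearson correlation|."""
    N = W_est.shape[0]
    # Cross-correlation block: R[i, j] = corr(row i of W_est, row j of W_true)
    R = np.corrcoef(W_est, W_true)[:N, N:]
    rows, cols = linear_sum_assignment(-np.abs(R))   # negate the cost to maximize
    return cols, np.abs(R[rows, cols])

W_true = np.array([[1.0, 0.0, 0.0, 2.0],
                   [0.0, 1.0, 3.0, 0.0],
                   [1.0, -1.0, 0.0, 0.0]])
W_est = -2.0 * W_true[[2, 0, 1]]                     # permuted, rescaled, sign-flipped
order, corr = match_components(W_est, W_true)
```

Using the absolute correlation makes the pairing insensitive to the sign and scale ambiguities of ICA, so only the permutation has to be resolved.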

IV. RESULTS

A. Simulated 64-ch stationary EEG data

1) Evaluation of the decomposed components: Figure 2aplots the correlation magnitudes between ORICA componentsand their ground-truth counterparts. The components weresorted such that smaller component ID represented fasterconvergence. We observed that all components coverged, i.e.correlation magnitudes approached 1, by the end of the 10-minute session. A common empirical heuristic for the numberof training samples required for separating N stable ICAsources using Infomax ICA was kN 2, where k > 25 [24].For 64 channels, the heuristic time required for convergenceamounted to 642 × 25 = 102400 samples = 5.7 minutes witha 300 Hz sample rate. By 5.7 minutes, 77% (91%) of ICsreached a correlation magnitude of 0.95 (0.8); by 3.5 minutes,more than half of the ICs reached a correlation magnitude of0.95. Figure 2b shows the evolution of the component maps ofa randomly selected IC #29 and its correlation magnitudes withground truth. This IC converged to a steady-state correlationmagnitude of 0.95 under 4 minutes. The superimposed globalperformance index (1 − PI, in green) exhibited a similarconvergence trajectory. Figure 3 shows 300-msec time-seriesof four representative ICs reconstructed by ORICA at 3, 6, and9 minutes. At 9 minutes, ORICA correctly reconstructed all thesource dynamics with the errors approached 0. At the heuristictime 5.7 minutes, only ID #58 had not converged. Interestingly,

Fig. 2. (a) Evolution of component-wise correlation magnitudes between ORICA-decomposed ICs and ground truth on simulated stationary 64-ch EEG data [20]. ICs are sorted with respect to the time required to reach a correlation magnitude of 0.95 (solid curve). The dotted line is the heuristic time for separating 64 stable ICs. (b) Evolution of correlation magnitudes (blue) and component maps of a randomly selected IC #29. One minus the performance index (green) is superimposed.

Fig. 3. Source dynamics and the corresponding component maps of four representative components reconstructed by ORICA at (a) 3 min, (b) 6 min, or (c) 9 min of simulated mixed 64-channel stationary EEG data. Reconstructed source dynamic (blue) is superimposed on ground truth (green) with error, i.e. difference, (red). The oscillatory (autocorrelated) and burst-like source dynamics, as well as homogeneous component maps of the four depicted ICs, are characteristic of real EEG source dynamics.

components such as IC #3 converged within 3 minutes. Both decomposed component maps and recovered source dynamics demonstrated ORICA’s suitability for accurate and efficient decomposition of high-density (64-channel) data, albeit with systematic variation in convergence speed.
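The $kN^2$ heuristic quoted in this subsection reduces to simple arithmetic, reproduced here as a short sketch. The function names are hypothetical, and $k = 25$ is used as the lower bound of the stated range $k > 25$.

```python
def heuristic_samples(n_channels, k=25):
    """Number of training samples suggested by the k * N^2 heuristic."""
    return k * n_channels ** 2


def heuristic_minutes(n_channels, srate_hz, k=25):
    """Recording time (in minutes) needed to collect that many samples."""
    return heuristic_samples(n_channels, k) / srate_hz / 60.0


samples = heuristic_samples(64)        # 25 * 64^2 = 102400 samples
minutes = heuristic_minutes(64, 300)   # 102400 / 300 Hz = 341.3 s, about 5.7 minutes
print(samples, round(minutes, 1))
```

Because the sample count grows quadratically with the number of channels, halving the dimensionality cuts the heuristic time by a factor of four.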

2) Effect of ORICA parameters: As shown in Figure 4, we systematically evaluated the effects of four ORICA parameters on convergence. The decay rate of the forgetting factor, γ, had a significant impact on convergence speed, with the fastest convergence for γ = 0.6 (Figure 4a). A sigmoidal profile for the convergence trajectory was observed for γ ≤ 0.6, while an exponential decay profile was observed for γ > 0.6. The ORICA block size $L_{ICA}$ had negligible effect on the convergence for $L_{ICA} \leq 64$ (Figure 4b). This demonstrated


Fig. 4. Effect of ORICA pipeline parameters on convergence trajectory, i.e. performance index over time, applied to 64-ch simulated stationary EEG data. (a) Decay rate γ of forgetting factor, (b) block size $L_{ICA}$ of ORICA, (c) block size $L_{white}$ of the online whitening, and (d) pre-assumed number $n_{sub}$ of subgaussian sources.

that the approximation error of the block update rule (Eq. 6) was negligible for small to moderate block sizes. The block size of online whitening $L_{white}$ significantly affected the ICA convergence (Figure 4c). $L_{white}$ between 4 and 8 samples achieved the best performance. Interestingly, $L_{white} = 1$ was not the optimal value, mainly due to the effect of the variable time scale of adjustments in the whitening matrix on the convergence of the subsequent ORICA. The pre-assumed number of subgaussian sources $n_{sub}$ had little effect on the convergence within the range $n_{sub}$ = 0 to 3, with the true number $n_{sub} = 0$. The performance of ORICA was rather insensitive to assumptions on the kurtosis of the sources. For the above results, we set γ = 0.6, $L_{ICA} = 1$, $L_{white} = 1$, and $n_{sub} = 0$ unless otherwise noted.
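The role of the decay rate γ can be illustrated with a short sketch. It assumes a power-law schedule $\lambda_n = \lambda_0 / n^{\gamma}$; this form is an assumption of the example rather than a formula quoted from the text above, and the function name and the value $\lambda_0 = 0.995$ are likewise hypothetical.

```python
def forgetting_factor(n, lambda_0=0.995, gamma=0.6):
    """Assumed schedule: lambda_n = lambda_0 / n**gamma for sample index n >= 1."""
    if n < 1:
        raise ValueError("sample index n must be >= 1")
    return lambda_0 / n ** gamma


# Compare how quickly the factor decays for three decay rates
# after 1 sample, 1 second, and 100 seconds of 300 Hz data.
for gamma in (0.5, 0.6, 0.7):
    values = [forgetting_factor(n, gamma=gamma) for n in (1, 300, 30000)]
    print(gamma, ["%.5f" % v for v in values])
```

Under this assumed schedule, a larger γ drives the factor toward zero faster, so later samples change the weights less.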

3) Quantification of computational load: Table III shows the average execution time required to apply the ORICA pipeline to 1 second of data, computed by averaging the processing rates (data size divided by time) of the incoming data chunks for 1 minute. Runtime was uniformly less than one second, indicating that the 64-ch data were processed faster than they accumulated in the input buffer, and thus that the pipeline was capable of real-time operation. The online whitening filter ran 3 to 9 times faster than ORICA did, and the runtime monotonically decreased as the block size increased. The execution time of the ORICA filter was nearly halved each time the block size doubled for $L \leq 8$, with diminishing returns for $L \geq 16$. This allows us to balance the tradeoff between the runtime of the pipeline and the accuracy of the block-update rule.
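The real-time argument can be checked directly from the mean values in Table III. The sketch below is only a worked calculation: it ignores the reported standard deviations, and summing the two stages at the same block size is an assumption made for illustration.

```python
# Mean execution times (ms) per 1 s of 64-channel data, copied from Table III.
block_sizes = [1, 2, 4, 8, 16, 32, 64, 128]
rls_ms = [35.2, 23.8, 13.5, 8.6, 5.3, 4.1, 3.0, 2.5]
orica_ms = [332, 174, 79.5, 36.3, 21.0, 15.1, 10.6, 6.6]

BUDGET_MS = 1000.0  # one second of data must be processed in under one second

# Assumption: both stages run with the same block size, so their times add.
totals = [rls + orica for rls, orica in zip(rls_ms, orica_ms)]
ratios = [orica / rls for rls, orica in zip(rls_ms, orica_ms)]

for size, total, ratio in zip(block_sizes, totals, ratios):
    load = 100.0 * total / BUDGET_MS
    print(f"L={size:3d}  total={total:6.1f} ms  load={load:4.1f}%  ORICA/RLS={ratio:.1f}x")
```

Even the slowest configuration (block size 1) uses well under half of the one-second budget, which is consistent with the real-time claim above.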

B. Simulated 64-ch non-stationary EEG data

Figure 5 plots ORICA’s performance in tracking spatial non-stationarity in simulated 64-channel EEG data, with the simulated abrupt shifts of the EEG cap (Figure 5a). Figure 5b plots the non-stationarity index $\delta_{ns}$ as a function of time. The

Fig. 5. Application of ORICA to 64-ch simulated non-stationary EEG data consisting of three concatenated 10-minute sessions which simulate a 5-deg forward EEG cap rotation followed by a 10-deg backward cap rotation. (a) Electrode locations for each session. (b) Non-stationarity index $\delta_{ns} = \|y f^T - I\|_F$ detects the abrupt change between sessions. (c) The PI convergence curve shows the adaptation behavior. (d) Zoomed-in plots of log-scaled convergence curve with ground truth component maps (top row) and reconstructed IC at different time points.

index robustly identified changes in the mixing matrix due to cap displacements. Figure 5c plots the performance index of ORICA’s decomposition as a function of time. Following the detection of non-stationarity, ORICA’s forgetting factor was reset to its initial value, and ORICA smoothly adapted to the new mixing matrix. Figure 5d plots the ground truth and the estimated component maps for a representative IC at different time-points, superimposed on a plot of log-transformed PI. For this IC, suitable convergence was obtained within 15 minutes, and improved further over time. The effect of cap rotation was captured by the concomitant shift of the component maps, indicating ORICA’s capability to detect and adapt to abrupt spatial non-stationarity.

C. Real 61-ch EEG data from the flanker task

Since the ground truth was unknown for real EEG data, we adopted the offline Extended Infomax ICA algorithm [28], as implemented in the EEGLAB [26] function RUNICA, as a “gold standard”. This algorithm had been shown to outperform most blind source separation algorithms in robustness and stability on high-density EEG data [7].

To confirm whether ORICA yielded a solution at convergence (average over the last minute) comparable to that of RUNICA, we investigated event-related activities of three sets of components


TABLE III
AVERAGED EXECUTION TIME (MS) FOR 1 SEC (300 SAMPLES) OF 64-CHANNEL DATA USING ONLINE RLS WHITENING AND ORICA WITH DIFFERENT BLOCK SIZES $L_{white}$ AND $L_{ICA}$.

| Algorithm \ Block size | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
| RLS, time (ms) | 35.2±5.2 | 23.8±7.9 | 13.5±6.0 | 8.6±4.3 | 5.3±1.5 | 4.1±4.5 | 3.0±4.2 | 2.5±4.9 |
| ORICA, time (ms) | 332±29 | 174±37 | 79.5±14.5 | 36.3±11.8 | 21.0±5.2 | 15.1±7.2 | 10.6±9.4 | 6.6±2.6 |

Run in MATLAB 2012a on a dual-core 2.50 GHz Intel Core i5-3210M CPU with 8 GB RAM.

Fig. 6. Color-coded event-related potential (ERP) images (trials by time) of (a) fronto-central, (b) occipital, and (c) prefrontal components reconstructed by ORICA (top row) and RUNICA (bottom row) on real 61-ch EEG data from the flanker task, time-locked to the response at time 0 (vertical straight line). Averaged ERP traces are shown in the bottom panel. Only error trials are included in (a) so that the error-related negativity (ERN) can be observed, as the red arrows indicate. In (b) and (c), all trials are sorted by reaction time, i.e. onset of flanker stimulus (sawtooth line) to response. A visually evoked potential (VEP) is clearly observed in (b). Green arrows in (c) indicate eye blinks.

Fig. 7. (a) Evolution of component-wise correlation magnitudes between ORICA- and RUNICA-decomposed ICs on real 61-ch EEG data from the flanker task. (b) Evolution of correlation magnitudes and spatial filters of three rapidly converged ICs: prefrontal (eye-blinks), occipital (VEP), and fronto-central (ERN) components.

with stereotypical fronto-central, occipital, and prefrontal spatial topographies. Figure 6a revealed that the ORICA- and RUNICA-decomposed fronto-central ICs and their characteristic ERN were comparable and consistent with the results from previous studies [11], [35]. Figure 6b shows that averaged occipital visual-evoked potentials (VEP) elicited by flanker (0-50 msec window after the onset of stimulus) and target (100-150 msec window after) presentation were clearly observed using both methods. Figure 6c shows eye blinks time-locked to the response (click of button), as previously described in [35]. The ERN and VEP shown in Figure 6 were representative, with comparable results obtained from all subjects, indicating the reproducibility of the ORICA pipeline. In summary, the empirical results demonstrated comparable performance of ORICA to RUNICA in separating informative ICs and resolving single-trial and averaged ERPs. Furthermore, ORICA required significantly less computation time than RUNICA.

Adopting a procedure similar to Figure 2, Figure 7a plots the correlation magnitudes between all ICs learned by ORICA and their best-matched RUNICA counterparts. Firstly, only 10% (26%) of the ORICA components reached a correlation magnitude of 0.9 (0.8) at the end of the session (average over the last minute). Secondly, 8% (21%) of the components reached a correlation magnitude of 0.9 (0.8) within 3-4 minutes, only half of the empirical heuristic time suggested by [24].
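The percentages reported here and for Figure 2a can be derived from the correlation trajectories with a small helper. The following sketch uses hypothetical function names and a three-component toy dataset; it only illustrates the bookkeeping, not the study's analysis code.

```python
import numpy as np


def time_to_threshold(corr, times_min, threshold):
    """First time (minutes) at which each IC's |correlation| reaches the threshold; NaN if never."""
    reached = np.asarray(corr) >= threshold
    first_idx = reached.argmax(axis=1)            # index of the first True in each row
    t_conv = np.asarray(times_min, dtype=float)[first_idx]
    t_conv[~reached.any(axis=1)] = np.nan         # rows that never reach the threshold
    return t_conv


def fraction_converged(t_conv, by_min):
    """Fraction of ICs whose first crossing happened no later than by_min."""
    return float(np.mean(np.nan_to_num(t_conv, nan=np.inf) <= by_min))


# Toy trajectories (hypothetical): rows are ICs, columns are time points.
times = [1, 2, 3, 4]
corr = [[0.50, 0.96, 0.97, 0.99],
        [0.20, 0.50, 0.90, 0.96],
        [0.10, 0.30, 0.50, 0.70]]

t_conv = time_to_threshold(corr, times, threshold=0.95)
order = np.argsort(t_conv)                        # faster ICs first, never-converged ICs last
frac = fraction_converged(t_conv, by_min=3)
print(t_conv, order, frac)
```

Sorting by the first crossing time gives the component ordering used on the vertical axis of plots such as Figure 2a.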

Among those ICs with the highest correlation magnitudes (larger than 0.95), we found a number of ICs with stereotypical and plausible component maps. Figure 7b plots the convergence profile (evolution of correlation magnitudes) for three such informative ICs: prefrontal (IC 1, accounting for eye-blink), occipital (IC 12, accounting for VEP), and fronto-central (IC 7, accounting for ERN) components. These ICs


converged to their RUNICA counterparts within 3-4 minutes. The correlation magnitude curves of both the occipital and fronto-central components fluctuated across time and eventually reached a steady state. In contrast, the correlation magnitude time series of the prefrontal component was relatively stable across the whole session.

Figure 2a and Figure 7a exhibit significant performance differences in decomposing simulated versus real EEG data, which can be attributed to the differences in the quality of the gold standard. Those ICs producing poor correlation corresponded to non-dipolar sources in the gold standard, i.e., sources with high residual variance in dipole fitting, such as mixtures of sources or noise [7]. This phenomenon is commonly observed and reported when applying ICA methods to real EEG data that are inevitably noisy and likely non-stationary [1], [6].

V. DISCUSSION

A. Fast Convergence Speed

The ORICA pipeline was capable of accurately decomposing 64-channel simulated EEG data within the required heuristic convergence time ($kN^2$) [24]. The fast convergence could be attributed to three important factors: (a) combining online RLS pre-whitening and ORICA, (b) choosing an optimal forgetting factor profile and parameters, and (c) fixing the numbers of modeled sub- and super-gaussian sources.

Simulation results showed that faster convergence of the whitening matrix facilitated ICA convergence. This was consistent with the findings in previous studies [1], [18], which suggested pre-whitening could significantly improve the ICA convergence by reducing the dimensionality of the parameter space. For online whitening, a local block-average approach could provide a more robust estimate than a stochastic (single sample) update approach. Besides RLS whitening, an online LMS whitening method [12] can also be considered, which has lower computational complexity but slower convergence [18].

The forgetting factor, especially its decay rate γ, had a significant effect on ORICA’s convergence. For highly non-stationary data, a large γ and thus a shorter effective window size were preferred. Factors including data dimension (number of channels) and underlying stationarity of the data affected the choice of optimal parameters. Alternative approaches for adapting the forgetting factor were suggested in [17] and [36].

Fixing the numbers of sub- and super-gaussian sources increased both the stability and the speed of convergence, especially for high-density data. The experiments with synthetic data showed that the choice was not critical, and a model mismatch in these numbers could be tolerated. This study further examined the kurtosis of the real 61-ch EEG data and found that all sources were supergaussian distributed, which was consistent with previous studies [2], [26] showing that EEG signals from most brain activities and non-brain artifacts were primarily supergaussian. For greater accuracy under general conditions, online kurtosis estimation as described in [19] and [37] can be incorporated into the ORICA pipeline. This may lead to better steady-state performance but potentially decreased stability of convergence.
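Whether a source is sub- or super-gaussian can be judged from the sign of its excess kurtosis. The sketch below is a batch (offline) illustration on synthetic signals; it is not the online estimator of [19] or [37], and the function name and test signals are hypothetical.

```python
import numpy as np


def excess_kurtosis(y):
    """Excess kurtosis of each row of y: E[y^4] / E[y^2]^2 - 3 (zero for a Gaussian)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean(axis=-1, keepdims=True)
    m2 = np.mean(y ** 2, axis=-1)
    m4 = np.mean(y ** 4, axis=-1)
    return m4 / m2 ** 2 - 3.0


rng = np.random.default_rng(0)
sources = np.vstack([
    rng.laplace(size=100_000),          # heavy-tailed, supergaussian
    rng.uniform(-1, 1, size=100_000),   # flat, subgaussian
])

kurt = excess_kurtosis(sources)
is_supergaussian = kurt > 0
n_sub = int(np.sum(~is_supergaussian))  # number of sources classified as subgaussian
print(kurt, is_supergaussian, n_sub)
```

A count such as `n_sub` is the kind of quantity that the pre-assumed number of subgaussian sources stands in for when it is fixed in advance.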

Several approaches not yet implemented in this study can further improve the convergence speed of ORICA. For example, dimensionality reduction methods such as PCA, or selecting a subset of channels prior to ORICA decomposition, can reduce the empirical convergence time. Another promising approach is to pre-process the data using artifact reduction methods such as Artifact Subspace Reconstruction [11] to mitigate sensitivity to transient artifacts in noisy high-dimensional EEG data.

B. Real-time processing

ORICA was implemented as a BCILAB function with block-update vectorization, and could easily perform real-time processing with a user-defined block size for 64-channel EEG on a standard laptop. The block update rule, while approximate, incurred negligible loss in accuracy up to $L_{ICA} = 64$ for 64-channel simulated data. Even without the block update ($L_{white} = L_{ICA} = 1$), the ORICA pipeline still ran in real time. The block update may be most valuable when computational resources are constrained; for instance, when applying multiple data processing operations in series or operating on a low-power mobile device.

C. Application to real EEG data

Empirical results on the 61-channel EEG data collected in the flanker task experiment demonstrated that ORICA could decompose brain sources and artifact ICs that resembled results from standard RUNICA. We observed that the most informative ICs, such as VEP and ERN brain sources and eye-blink artifacts, had the highest correlation magnitudes among all ICs and converged much more quickly than the heuristic convergence time, a fortunate circumstance for real-time applications in mobile EEG BCI. We speculate that those components exhibit robust and frequently occurring statistical patterns which facilitate ICA separation. These observations support applications of ORICA for rapid decomposition of high-density data, as much less time is required to decompose brain and artifact components.

The ORICA pipeline also revealed non-stationarity in the experimental data, captured by the dynamics of component maps and the non-stationarity index. One challenge for ORICA and RUNICA is the order switching of ICs, especially for non-stationary real EEG data. This hinders the identification of informative ICs over time, i.e. tracking the same set of components regardless of the weight matrix permutations. One solution to the problem is to sort the current weight matrix $W(t)$ based on the correlation matching with the previous weight matrix $W(t-1)$, using the Hungarian method described in Section III-D, to keep track of the same components. This is useful for identifying and separating artifact components online from noisy EEG recordings.

D. Non-stationarity detection and adaptation

This study proposed using the Frobenius norm of the ICA error matrix, $\|y f(y)^T - I\|_F$, for the ORICA pipeline as an index to detect non-stationarity events in the data, identified as abrupt


changes in the mixing matrix. This non-stationarity index captured the fitness of the ICA model to the current data, i.e. the degree to which nonlinear decorrelation was achieved. Alternative forms for the non-stationarity index can be used depending on the application. For instance, mutual information reduction (MIR) in windowed data provides a measure of statistical independence between sources [7] and thus can capture changes in data statistics as MIR varies.
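A block-wise version of this index can be sketched as follows. Two choices here are assumptions made for illustration: the outer product $y f(y)^T$ is averaged over the samples of a block, and `np.tanh` stands in as a placeholder for the nonlinearity $f$, whose actual form is not restated in this section. The function name is hypothetical.

```python
import numpy as np


def nonstationarity_index(Y, f=np.tanh):
    """Frobenius norm of the block-averaged error matrix y f(y)^T - I.

    Y has shape (n_sources, n_samples) and holds one block of estimated sources.
    """
    n_sources, n_samples = Y.shape
    F = f(Y)
    error = Y @ F.T / n_samples - np.eye(n_sources)
    return float(np.linalg.norm(error, ord="fro"))


# Example on a block of synthetic source activations (hypothetical data).
rng = np.random.default_rng(0)
Y_block = rng.laplace(size=(3, 300))
print(nonstationarity_index(Y_block))

# Sanity check: an all-zero block gives error = -I, whose norm is sqrt(n_sources).
zero_index = nonstationarity_index(np.zeros((3, 10)))
print(zero_index)
```

The index is small when the off-diagonal nonlinear correlations vanish and the diagonal terms are close to one, and it grows when the current weights no longer fit the incoming data.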

This study also presented a method of non-stationarity adaptation by increasing the forgetting factor when the non-stationarity index exceeds a hard threshold, e.g. 1-10% of the initial value when ICA had not converged. However, this method required prior knowledge of the hard threshold and did not address continuous variation in the degree of non-stationarity. For selection of the threshold, it is possible to design an adaptive threshold that depends on the online estimated mean and standard deviation of $\delta_{ns}$. For adaptation of the forgetting factor, one possible solution is to adopt a strategy similar to the adaptive learning rate proposed by Murata et al. [38] for a gradient-based algorithm in an online environment. An adaptive forgetting factor for ORICA, as an RLS-like recursive online algorithm, is crucial for its ability to track non-stationarity, calling for further investigation.
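One way such an adaptive threshold might look is sketched below, using Welford's running mean and variance. This is an illustration of the idea only, not a method evaluated in this study: the class name, the multiplier `k`, the warm-up length, and the decision to exclude flagged values from the running statistics are all hypothetical choices.

```python
import math


class AdaptiveThreshold:
    """Flags index values above mean + k * std, with both estimated online."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k              # number of standard deviations (hypothetical choice)
        self.warmup = warmup    # samples collected before any value can be flagged
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0           # running sum of squared deviations (Welford)

    def update(self, value):
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = value > self.mean + self.k * std
        if not flagged:
            # Only unflagged values update the baseline statistics.
            self.n += 1
            delta = value - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (value - self.mean)
        return flagged


detector = AdaptiveThreshold(k=3.0, warmup=10)
# Synthetic index values fluctuating around 1.0, followed by an abrupt jump.
baseline = [1.0 + 0.01 * ((i * 7) % 5 - 2) for i in range(50)]
stream = baseline + [2.0]
flags = [detector.update(v) for v in stream]
print(flags[-1], any(flags[:-1]))
```

In a pipeline, a flagged value would be the point at which the forgetting factor is reset, as done with the hard threshold above.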

E. Applications and future directions

The proposed online ICA method for real-time processing of high-density EEG opens up new opportunities for the following potential applications: (1) ICA-based real-time artifact removal (especially for sporadic muscle activities), (2) ICA-based brain activity monitoring (e.g. epilepsy) for clinical practice, and (3) adaptable ICA-based features for brain state (e.g. cognitive functions, fatigue level) classification in real-time brain-computer interfaces.

A significant next step is to leverage ORICA for real-time source localization, for instance using anatomically constrained low resolution electrical tomographic analysis (LORETA) [39]. A recent study [27] attempted to combine online ICA and source localization, yet further validation of the sources’ reliability was needed. Knowledge of the source locations in the brain can be used to assess the reliability of the sources (e.g. validate consistency of the source locations over time and with anatomical expectations), to provide biological interpretation of the decomposed sources, and to integrate with other real-time source-level methods such as connectivity analysis in SIFT [11], [30].

VI. CONCLUSION

This study proposed and demonstrated an efficient computational pipeline for real-time, adaptive blind source separation of EEG data using Online Recursive ICA. The efficacy of the proposed pipeline was demonstrated on three datasets: a simulated 64-channel stationary dataset, a simulated 64-channel non-stationary dataset, and a real 61-channel EEG dataset collected under an Eriksen flanker task. Through application of ORICA to simulated stationary data we (a) systematically evaluated the effects of key parameters on convergence; (b) characterized the convergence speed, steady state performance, and computational load of the algorithm; and (c) quantitatively compared the proposed ORICA method with a standard offline Infomax ICA algorithm. Our analysis of a simulated non-stationary 64-channel EEG dataset demonstrated ORICA’s ability to adaptively track changes in the mixing matrix due to electrode displacements.

Applying ORICA to 61-channel experimental EEG data, we demonstrated its ability to decompose brain and artifact subspaces online, with comparable performance to offline Infomax ICA. Furthermore, we found that subspaces of biologically plausible ICs (e.g. eye, occipital and frontal midline sources) could be reliably learned in much less time than required by the $kN^2$ empirical heuristic for ICA convergence. To serve the EEG and BCI communities, the proposed pipeline has been implemented as BCILAB [25] and EEGLAB compatible functions and has also been integrated into an open-source Real-time EEG Source-mapping Toolbox (REST) [27]. Future work will focus on further validation of this promising method as well as application to artifact rejection, clinical monitoring, and brain-computer interfaces [11].

ACKNOWLEDGMENT

The authors would like to thank Christian Kothe for help with BCILAB integration.

REFERENCES

[1] A. Hyvarinen, J. Karhunen, and E. Oja, Independent component analysis. John Wiley & Sons, 2004, vol. 46.

[2] T.-P. Jung, S. Makeig, C. Humphries, T.-W. Lee, M. J. Mckeown, V. Iragui, and T. J. Sejnowski, “Removing electroencephalographic artifacts by blind source separation,” Psychophysiology, vol. 37, no. 2, pp. 163–178, 2000.

[3] S. Makeig, M. Westerfield, T.-P. Jung, S. Enghoff, J. Townsend, E. Courchesne, and T. Sejnowski, “Dynamic brain sources of visual evoked responses,” Science, vol. 295, no. 5555, pp. 690–694, 2002.

[4] L. De Lathauwer, B. De Moor, and J. Vandewalle, “Fetal electrocardiogram extraction by blind source subspace separation,” IEEE Trans. Biomedical Engineering, vol. 47, no. 5, pp. 567–572, 2000.

[5] T. Mullen, G. Worrell, and S. Makeig, “Multivariate principal oscillation pattern analysis of ICA sources during seizure,” in Proc. IEEE-EMBC, 2012, pp. 2921–2924.

[6] Y. Wang and T.-P. Jung, “Improving brain–computer interfaces using independent component analysis,” in Towards Practical Brain-Computer Interfaces. Springer, 2013, pp. 67–83.

[7] A. Delorme, J. Palmer, J. Onton, R. Oostenveld, and S. Makeig, “Independent EEG sources are dipolar,” PLoS ONE, vol. 7, no. 2, p. e30135, 2012.

[8] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural computation, vol. 7, no. 6, pp. 1129–1159, 1995.

[9] A. Hyvarinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural networks, vol. 13, no. 4, pp. 411–430, 2000.

[10] J. A. Palmer, K. Kreutz-Delgado, B. D. Rao, and S. Makeig, “Modeling and estimation of dependent subspaces with non-radially symmetric and skewed densities,” in Independent Component Analysis and Signal Separation. Springer, 2007, pp. 97–104.

[11] T. Mullen, C. Kothe, Y. M. Chi, A. Ojeda, T. Kerth, S. Makeig, G. Cauwenberghs, and T.-P. Jung, “Real-time modeling and 3D visualization of source dynamics and connectivity using wearable EEG,” in Proc. IEEE-EMBC, 2013, pp. 2184–2187.

[12] J.-F. Cardoso and B. H. Laheld, “Equivariant adaptive source separation,” IEEE Trans. Signal Processing, vol. 44, no. 12, pp. 3017–3030, 1996.

[13] S.-I. Amari, A. Cichocki, H. H. Yang et al., “A new learning algorithm for blind signal separation,” Advances in neural information processing systems, pp. 757–763, 1996.


[14] X. Giannakopoulos, J. Karhunen, and E. Oja, “An experimental comparison of neural algorithms for independent component analysis and blind separation,” International Journal of Neural Systems, vol. 9, no. 02, pp. 99–114, 1999.

[15] X.-L. Zhu and X.-D. Zhang, “Adaptive RLS algorithm for blind source separation using a natural gradient,” IEEE Signal Processing Letters, vol. 9, no. 12, pp. 432–435, 2002.

[16] J. Karhunen and P. Pajunen, “Blind source separation and tracking using nonlinear PCA criterion: A least-squares approach,” in Proc. IEEE International Conf. on Neural Networks, vol. 4, 1997, pp. 2147–2152.

[17] S. Cruces-Alvarez, A. Cichocki, and L. Castedo-Ribas, “An iterative inversion approach to blind source separation,” IEEE Trans. Neural Networks, vol. 11, no. 6, pp. 1423–1437, 2000.

[18] X. Zhu, X. Zhang, and J. Ye, “Natural gradient-based recursive least-squares algorithm for adaptive blind source separation,” Science in China Series F: Information Sciences, vol. 47, no. 1, pp. 55–65, 2004.

[19] M. T. Akhtar, T.-P. Jung, S. Makeig, and G. Cauwenberghs, “Recursive independent component analysis for online blind source separation,” in Proc. IEEE-ISCAS, 2012, pp. 2813–2816.

[20] S.-H. Hsu, T. Mullen, T.-P. Jung, and G. Cauwenberghs, “Online recursive independent component analysis for real-time source separation of high-density EEG,” in Proc. IEEE-EMBC, 2014, pp. 3845–3848.

[21] H. H. Yang and S.-I. Amari, “Adaptive online learning algorithms for blind separation: maximum entropy and minimum mutual information,” Neural computation, vol. 9, no. 7, pp. 1457–1482, 1997.

[22] S.-H. Hsu, T. Mullen, T.-P. Jung, and G. Cauwenberghs, “Validating online recursive independent component analysis on EEG data,” in Proc. IEEE EMBS Conf. Neural Engineering, 2015, pp. 918–921.

[23] G. McLoughlin, B. Albrecht, T. Banaschewski, A. Rothenberger, D. Brandeis, P. Asherson, and J. Kuntsi, “Performance monitoring is altered in adult ADHD: a familial event-related potential investigation,” Neuropsychologia, vol. 47, no. 14, pp. 3134–3142, 2009.

[24] J. Onton and S. Makeig, “Information-based modeling of event-related brain dynamics,” Progress in brain research, vol. 159, pp. 99–120, 2006.

[25] C. A. Kothe and S. Makeig, “BCILAB: a platform for brain–computer interface development,” Journal of neural engineering, vol. 10, no. 5, p. 056014, 2013.

[26] A. Delorme, T. Mullen, C. Kothe, Z. A. Acar, N. Bigdely-Shamlo, A. Vankov, and S. Makeig, “EEGLAB, SIFT, NFT, BCILAB, and ERICA: new tools for advanced EEG processing,” Computational intelligence and neuroscience, vol. 2011, p. 10, 2011.

[27] L. Pion-Tonachini, S.-H. Hsu, S. Makeig, T.-P. Jung, and G. Cauwenberghs, “Real-time EEG source-mapping toolbox (REST): Online ICA and source localization,” in Proc. IEEE-EMBC, 2015, pp. 4114–4117.

[28] T.-W. Lee, M. Girolami, and T. J. Sejnowski, “Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources,” Neural computation, vol. 11, no. 2, pp. 417–441, 1999.

[29] J. Chen and X. Huo, “Theoretical results on sparse representations of multiple-measurement vectors,” IEEE Trans. on Signal Processing, vol. 54, no. 12, pp. 4634–4643, 2006.

[30] T. R. Mullen, The Dynamic Brain: Modeling Neural Dynamics and Interactions From Human Electrophysiological Recordings. University of California, San Diego, 2014.

[31] S. Haufe, R. Tomioka, G. Nolte, K.-R. Muller, and M. Kawanabe, “Modeling sparse connectivity between underlying brain sources for EEG/MEG,” IEEE Trans. Biomedical Engineering, vol. 57, no. 8, pp. 1954–1963, Aug 2010.

[32] S. Ha, C. Kim, Y. Chi, A. Akinin, C. Maier, A. Ueno, and G. Cauwenberghs, “Integrated circuits and electrode interfaces for noninvasive physiological monitoring,” IEEE Trans. Biomedical Engineering, vol. 61, no. 5, pp. 1522–1537, 2014.

[33] S. C. Douglas, “Blind signal separation and blind deconvolution,” Handbook of neural network signal processing, vol. 6, pp. 1–28, 2001.

[34] G. Carpaneto and P. Toth, “Algorithm 548: Solution of the assignment problem [h],” ACM Transactions on Mathematical Software (TOMS), vol. 6, no. 1, pp. 104–111, 1980.

[35] M. Falkenstein, J. Hoormann, S. Christ, and J. Hohnsbein, “ERP components on reaction errors and their functional significance: a tutorial,” Biological psychology, vol. 51, no. 2, pp. 87–107, 2000.

[36] C. Paleologu, J. Benesty, and S. Ciochina, “A robust variable forgetting factor recursive least-squares algorithm for system identification,” IEEE Signal Processing Letters, vol. 15, pp. 597–600, 2008.

[37] A. Cichocki and R. Thawonmas, “On-line algorithm for blind signal extraction of arbitrarily distributed, but temporally correlated sources using second order statistics,” Neural Processing Letters, vol. 12, no. 1, pp. 91–98, 2000.

[38] N. Murata, M. Kawanabe, A. Ziehe, K.-R. Muller, and S.-I. Amari, “On-line learning in changing environments with applications in supervised and unsupervised learning,” Neural Networks, vol. 15, no. 4, pp. 743–760, 2002.

[39] N. J. Trujillo-Barreto, E. Aubert-Vazquez, and P. A. Valdes-Sosa, “Bayesian model averaging in EEG/MEG imaging,” NeuroImage, vol. 21, no. 4, pp. 1300–1319, 2004.

Sheng-Hsiou Hsu (S’14) received the B.S. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, in 2011. He was a sole recipient of the scholarship of government sponsorship for overseas study in Biomedical Engineering in 2012. He is currently a Ph.D. student in bioengineering at University of California at San Diego, La Jolla CA, working with the Swartz Center for Computational Neuroscience and Institute for Neural Computation. His major research interests include brain-computer interfaces, machine learning, and signal processing with respect to functional imaging of brain dynamics.

Tim R. Mullen (S’13-M’14) received dual B.A. degrees in Computer Science (high honors) and Cognitive Neuroscience (highest honors) in 2008 from the University of California, Berkeley. He received M.S. (2011) and Ph.D. (2014) degrees in Cognitive Sciences from University of California, San Diego while at the Swartz Center for Computational Neuroscience. Graduate awards included Glushko, Swartz, and San Diego fellowships; best paper awards; and the 2014-15 UCSD Chancellor’s Dissertation Medal. His research has focused on modeling neural dynamics and interactions from human electrophysiological recordings, with applications to clinical and cognitive neuroscience and neural interfaces. He is co-founder and CEO at San Diego neurotechnology company Syntrogi Inc (aka Qusp) and Research Director of its R&D division Syntrogi Labs.

Tzyy-Ping Jung (F’15) received the B.S. degree in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from The Ohio State University, Columbus, OH, USA, in 1989 and 1993, respectively. He is currently a Research Scientist and the Co-Director of the Center for Advanced Neurological Engineering, Institute of Engineering in Medicine, University of California-San Diego (UCSD), La Jolla, CA, USA. He is also an Associate Director of the Swartz Center for Computational Neuroscience, Institute for Neural Computation, and an Adjunct Professor of Bioengineering at UCSD. In addition, he is an Adjunct Professor of Computer Science, National Chiao Tung University, Hsinchu, Taiwan. His research interests are in the areas of biomedical signal processing, cognitive neuroscience, machine learning, time-frequency analysis of human EEG, functional neuroimaging, and brain-computer interfaces and interactions. He is currently an Associate Editor of IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS.


Gert Cauwenberghs (S’89-M’94-SM’04-F’11) is Professor of Bioengineering and Co-Director of the Institute for Neural Computation at UC San Diego, La Jolla CA. He received the Ph.D. degree in Electrical Engineering from California Institute of Technology, Pasadena in 1994, and was previously Professor of Electrical and Computer Engineering at Johns Hopkins University, Baltimore, MD, and Visiting Professor of Brain and Cognitive Science at Massachusetts Institute of Technology, Cambridge. He co-founded Cognionics Inc. and chairs its Scientific Advisory Board. His research focuses on micropower biomedical instrumentation, neuron-silicon and brain-machine interfaces, neuromorphic engineering, and adaptive intelligent systems. He received the NSF Career Award in 1997, ONR Young Investigator Award in 1999, and Presidential Early Career Award for Scientists and Engineers in 2000. He serves IEEE in a variety of roles including as General Chair of the IEEE Biomedical Circuits and Systems Conference (BioCAS 2011, San Diego), as Program Chair of the IEEE Engineering in Medicine and Biology Conference (EMBC 2012, San Diego), and as Editor-in-Chief of the IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS.