The Empirical Mode Decomposition and the Hilbert-Huang Transform

Guest Editors: Nii Attoh-Okine, Kenneth Barner, Daniel Bentil, and Ray Zhang

EURASIP Journal on Advances in Signal Processing

Copyright © 2008 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2008 of “EURASIP Journal on Advances in Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief

Phillip Regalia, Institut National des Telecommunications, France

Associate Editors

Kenneth Barner, USA; Richard J. Barton, USA; Yasar Becerikli, Turkey; Kostas Berberidis, Greece; J. C. M. Bermudez, Brazil; A. Enis Cetin, Turkey; Jonathon Chambers, UK; Mei-Juan Chen, Taiwan; Liang-Gee Chen, Taiwan; Huaiyu Dai, USA; Satya Dharanipragada, USA; Kutluyil Dogancay, Australia; Florent Dupont, France; Frank Ehlers, Italy; Sharon Gannot, Israel; Fulvio Gini, Italy; M. Greco, Italy; Irene Y. H. Gu, Sweden; Fredrik Gustafsson, Sweden; Ulrich Heute, Germany; Sangjin Hong, USA; Jiri Jan, Czech Republic; Magnus Jansson, Sweden; Sudharman K. Jayaweera, USA

Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Moon Gi Kang, South Korea; Walter Kellermann, Germany; Joerg Kliewer, USA; Lisimachos P. Kondi, Greece; Alex Chichung Kot, Singapore; C.-C. Jay Kuo, USA; Tan Lee, China; Geert Leus, The Netherlands; T.-H. Li, USA; Husheng Li, USA; Mark Liao, Taiwan; Y.-P. Lin, Taiwan; S. Makino, Japan; Stephen Marshall, UK; C. F. Mecklenbrauker, Austria; Gloria Menegaz, Italy; Ricardo Merched, Brazil; Marc Moonen, Belgium; Vitor Heloiz Nascimento, Brazil; Christophoros Nikou, Greece; Sven Erik Nordholm, Australia; Antonio Ortega, USA

D. O’Shaughnessy, Canada; Bjorn Ottersten, Sweden; Jacques Palicot, France; Ana Perez-Neira, Spain; Wilfried Philips, Belgium; Aggelos Pikrakis, Greece; Ioannis Psaromiligkos, Canada; Athanasios Rontogiannis, Greece; Gregor Rozinaj, Slovakia; Markus Rupp, Austria; William Allan Sandham, UK; Bulent Sankur, Turkey; Dirk Slock, France; Y.-P. Tan, Singapore; Joao Manuel R. S. Tavares, Portugal; George S. Tombras, Greece; Dimitrios Tzovaras, Greece; Jacques G. Verly, Belgium; Bernhard Wess, Austria; Jar-Ferr Kevin Yang, Taiwan; Azzedine Zerguine, Saudi Arabia; A. M. Zoubir, Germany

Contents

The Empirical Mode Decomposition and the Hilbert-Huang Transform, Nii Attoh-Okine, Kenneth Barner, Daniel Bentil, and Ray Zhang
Volume 2008, Article ID 251518, 2 pages

Feature Point Detection Utilizing the Empirical Mode Decomposition, Jesmin Farzana Khan, Kenneth Barner, and Reza Adhami
Volume 2008, Article ID 287061, 13 pages

Empirical Mode Decomposition Method Based on Wavelet with Translation Invariance, Qin Pinle, Lin Yan, and Chen Ming
Volume 2008, Article ID 526038, 6 pages

Improved EMD Using Doubly-Iterative Sifting and High Order Spline Interpolation, Yannis Kopsinis and Steve McLaughlin
Volume 2008, Article ID 128293, 8 pages

Optimal Signal Reconstruction Using the Empirical Mode Decomposition, Binwei Weng and Kenneth E. Barner
Volume 2008, Article ID 845294, 12 pages

Fast and Adaptive Bidimensional Empirical Mode Decomposition Using Order-Statistics Filter Based Envelope Estimation, Sharif M. A. Bhuiyan, Reza R. Adhami, and Jesmin F. Khan
Volume 2008, Article ID 728356, 18 pages

Single-Trial Classification of Bistable Perception by Integrating Empirical Mode Decomposition, Clustering, and Support Vector Machine, Zhisong Wang, Alexander Maier, Nikos K. Logothetis, and Hualou Liang
Volume 2008, Article ID 592742, 8 pages

A Fault Diagnosis Approach for Gears Based on IMF AR Model and SVM, Junsheng Cheng, Dejie Yu, and Yu Yang
Volume 2008, Article ID 647135, 7 pages

Univariate and Bivariate Empirical Mode Decomposition for Postural Stability Analysis, Hassan Amoud, Hichem Snoussi, David Hewson, and Jacques Duchene
Volume 2008, Article ID 657391, 11 pages

Multimodal Pressure-Flow Analysis: Application of Hilbert Huang Transform in Cerebral Blood Flow Regulation, Men-Tzung Lo, Kun Hu, Yanhui Liu, C.-K. Peng, and Vera Novak
Volume 2008, Article ID 785243, 14 pages

Speech Enhancement via EMD, Kais Khaldi, Abdel-Ouahab Boudraa, Abdelkhalek Bouchikhi, and Monia Turki-Hadj Alouane
Volume 2008, Article ID 873204, 8 pages

Segmentation of Killer Whale Vocalizations Using the Hilbert-Huang Transform, Olivier Adam
Volume 2008, Article ID 245936, 10 pages

Evaluating Pavement Cracks with Bidimensional Empirical Mode Decomposition, Albert Ayenu-Prah and Nii Attoh-Okine
Volume 2008, Article ID 861701, 7 pages

Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2008, Article ID 251518, 2 pages
doi:10.1155/2008/251518

Editorial
The Empirical Mode Decomposition and the Hilbert-Huang Transform

Nii Attoh-Okine,1 Kenneth Barner,2 Daniel Bentil,3 and Ray Zhang4

1 Department of Civil and Environmental Engineering, University of Delaware, Newark, DE 19716, USA
2 Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
3 Departments of Mathematics and Statistics and Molecular Physiology Biophysics, The University of Vermont, Burlington, VT 05405, USA
4 Civil Engineering Specialty, Division of Engineering, Colorado School of Mines, Golden, CO 80401, USA

Correspondence should be addressed to Nii Attoh-Okine, [email protected]

Received 4 November 2008; Accepted 4 November 2008

Copyright © 2008 Nii Attoh-Okine et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Data from natural phenomena are usually nonstationary due to their transient nature; also, the span of captured data may be shorter than the longest time scale that describes the phenomenon. In fact, since it is impossible or impractical to obtain infinite data points describing a phenomenon, all data are invariably short. To simplify processing and analysis, data stationarity is often assumed even though the condition may not be strictly satisfied. For instance, the stationarity assumption justifies traditional Fourier-based methods, which utilize a priori basis sets to globally decompose a signal. To directly address the processing of nonstationary and nonlinear signals, the Hilbert-Huang transform (HHT) has recently been developed. The HHT comprises two steps: empirical mode decomposition (EMD) and Hilbert spectral analysis (HSA). Unlike Fourier-based methods, the EMD decomposes a signal into its components adaptively, without using an a priori basis. The decomposition is based on the local time scale of the data. The adaptive nature of the process successfully decomposes nonlinear, nonstationary signals in the time domain. Moreover, the decomposition components, referred to as intrinsic mode functions (IMFs), are generally in good agreement with intuitive and physical signal interpretations, and they have well-defined instantaneous frequencies. Accordingly, the HSA applies the Hilbert transform to the IMFs to generate a full energy-frequency-time plot (Hilbert spectrum), which gives the instantaneous energy and frequency content of the signal. The bidimensional empirical mode decomposition (BEMD) has recently been introduced as a 2D extension to the EMD. Thus, the EMD and BEMD are increasingly being employed to successfully address many contemporary signal processing applications.
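The Hilbert spectral analysis step can be illustrated with a minimal sketch. It assumes a single, already extracted IMF (here just a synthetic cosine), builds the analytic signal with a naive DFT, and reads the instantaneous amplitude and frequency from it. The function names and the test signal are illustrative choices, not part of the editorial.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x):
    """Return x + j*H{x} by removing the negative-frequency half of the spectrum."""
    n = len(x)
    spectrum = dft(x)
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue                # keep the Nyquist bin unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2        # positive frequencies are doubled
        else:
            spectrum[k] = 0         # negative frequencies are removed
    return dft(spectrum, inverse=True)


def instantaneous_amplitude_and_frequency(x, fs):
    """Amplitude |z(t)| and frequency in Hz from the phase increment of z(t)."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    frequency = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return amplitude, frequency


# Assumed toy "IMF": a 4 Hz cosine sampled at 64 Hz for one second.
fs = 64
imf = [math.cos(2 * math.pi * 4 * t / fs) for t in range(fs)]
amplitude, frequency = instantaneous_amplitude_and_frequency(imf, fs)
```

Repeating this for every IMF and accumulating amplitude (or energy) over time and frequency gives one way to assemble a Hilbert spectrum.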

Bidimensional empirical mode decomposition (BEMD) is an extension of the one-dimensional EMD applied to two-dimensional signals. Images are usually decomposed with BEMD using different interpolation methods to extract IMFs. An important aspect of the BEMD is the construction of envelopes when sifting for IMFs, which involves interpolation of scattered data formed by the extrema of the data. Three broad methods of scattered data interpolation are radial basis function methods, triangulation-based methods, and inverse distance weighted methods. In using any of these major methods, there are two approaches to data interpolation: global and local approaches. In the global approach, interpolated data are influenced by all data within the given domain, whereas in the local approach, interpolated values are influenced by data within a neighborhood of the interpolated points. Global methods tend to be computationally costlier than local methods due to the generation of larger coefficient matrices that can easily become highly ill-conditioned.
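The difference between the global and local approaches can be sketched with inverse distance weighting, the simplest of the three families named above. The function below is an illustrative assumption (the name `idw`, the power of 2, and the `radius` parameter do not come from the editorial): with `radius=None` every data point contributes, while a finite radius restricts the estimate to a neighborhood of the query point.

```python
import math


def idw(points, values, query, power=2.0, radius=None):
    """Inverse distance weighted estimate at `query`.

    radius=None -> global: all scattered points contribute.
    radius=r    -> local: only points within distance r contribute.
    """
    num = 0.0
    den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(query[0] - px, query[1] - py)
        if d == 0.0:
            return v                      # query coincides with a data point
        if radius is not None and d > radius:
            continue                      # outside the local neighborhood
        w = 1.0 / d ** power
        num += w * v
        den += w
    if den == 0.0:
        raise ValueError("no data points within the given radius")
    return num / den


# Hypothetical extrema of an image: two nearby points and one distant outlier.
extrema = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
heights = [0.0, 2.0, 100.0]
global_estimate = idw(extrema, heights, (1.0, 0.0))
local_estimate = idw(extrema, heights, (1.0, 0.0), radius=1.5)
```

In the global call the distant point still pulls the estimate upward, while the local call ignores it. Inverse distance weighting needs no coefficient matrix, so the conditioning issue mentioned above concerns methods such as radial basis functions rather than this sketch.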

A number of issues have come up concerning empirical mode decomposition, including the following.

(1) Finding mathematical and physical meaning for IMFs, since EMD is essentially algorithmic in nature and lacks mathematical rigor.

(2) Determining the most appropriate interpolation scheme.

(3) Identifying criteria for stopping the sifting process.

(4) Handling of boundary or end effects during data interpolation.

Most success in EMD has been in 1D; however, one issue still persists in all these advancements: the physical significance of IMFs derived from the original data series or signal. A thorough understanding of the physical processes that generate data is required before any form of scientific explanation can be attributed to any particular IMF or group of IMFs. Even with this kind of thorough knowledge, there is still a level of ambiguity when trying to extract information from the IMFs that is directly relevant to the original signal and the physics of the underlying system. Before getting to the point where essential information can be extracted from the IMFs, there is a need to determine which IMFs are really relevant to the decomposition process and which carry the necessary information required to understand the underlying system, as EMD is a numerical procedure with possible numerical errors in the results.

BEMD has potential in image preprocessing in the area of edge detection. The first few IMFs in BEMD contain the highest spatial frequencies contained in the original image, so that separating out these first few IMFs can smooth out the image for further processing.

The purpose of this special issue is to address the following issues in both 1D and 2D empirical mode decompositions:

1. theoretical analysis and understanding;

2. performance enhancements of the EMD;

3. single decomposition, monitoring, and analysis;

4. feature extraction;

5. fast and adaptive methods;

6. decomposition domain processing methods;

7. image analysis and segmentation;

8. texture representation and segmentation;

9. optimization;

10. signal fusion and interpolation;

11. signal processing applications in engineering and biomedicine.

This special issue contains 12 papers, 5 of which are theoretical. The article by J. F. Khan et al. introduced a novel contour-based method for detecting largely affine invariant interest or feature points. The main contribution of the paper is the selection of good discriminative feature points from relatively thinned edges. Repeatability rate, which evaluates the geometric stability under different transformations, was used as the performance criterion. L. Yan et al. developed a filtering approach to address the mode mixing problem caused by signal intermittency in the EMD process. The authors first used wavelet denoising and then applied the EMD procedure. The results show that this filtering approach effectively avoids mode mixing and retains useful information. Y. Kopsinis and S. McLaughlin used doubly-iterative sifting and high-order spline interpolation in the EMD procedure. It appears that this approach is capable of improving the performance. Binwei Weng and Kenneth Barner developed a method for signal reconstruction. The proposed reconstruction algorithm gives the best estimate of a given signal in the minimum mean square error sense. The algorithm involves two steps: (a) formulation of linear weighting for the IMFs and (b) bidirectional weighting. S. M. A. Bhuiyan et al. proposed a multiple hierarchical method for BEMD. In this approach, order statistics are used to obtain the upper and lower envelopes, where the filter size is derived from the data.

Two papers develop hybrid approaches that combine SVM, clustering, and EMD. N. Logothetis et al. first applied the EMD procedure, then clustered the IMFs with unsupervised K-means, and finally applied an SVM to the extracted features. The authors tested their methodology on local field potentials in monkey cortex for decoding bistable structure-from-motion perception. In the paper by Yu Yang et al., EMD is used as a preprocessor for autoregressive (AR) analysis; an SVM is then used to classify the output.

There were a few papers on applications. H. Snoussi et al. performed a comparative analysis of EMD, complex empirical mode decomposition, and bivariate empirical mode decomposition. The two new methods appear to be suitable for complex time series. The authors applied their methodology to posture analysis. Yanhui Liu et al. used the EMD procedure to develop a new technique, the multimodal pressure-flow method (MMPF), for assessment of cerebral autoregulation. The results obtained by the authors for the new methodology are applicable in engineering and biomedical applications. A. Bouchikhi et al. used the EMD in speech enhancement. The authors used two strategies, filtering and thresholding, and demonstrated that their proposed approach performs better than wavelet-based approaches. Olivier Adam used EMD for the segmentation of killer whale vocalizations; the results were very favorable compared to the alternative methods. Finally, N. O. Attoh-Okine and A. Ayenu-Prah [1] used the BEMD to evaluate pavement image crack detection and classification. The work appears to have general application in structural health monitoring of civil infrastructure.

We sincerely hope that the diverse papers in this special issue will introduce various researchers, engineers, and students to this new emerging field. Although the EMD is in its infancy, the number of papers keeps increasing astronomically every year. Finally, we hope that more mathematicians will address some of the “mathematical and theoretical” limitations.

Nii Attoh-Okine
Kenneth Barner
Daniel Bentil
Ray Zhang

REFERENCES

[1] A. Ayenu-Prah, Empirical mode decomposition and civil infrastructure systems, Ph.D. dissertation, University of Delaware, Newark, Del, USA, 2007.


Research Article
Feature Point Detection Utilizing the Empirical Mode Decomposition

Jesmin Farzana Khan,1 Kenneth Barner,2 and Reza Adhami1

1 Department of Electrical and Computer Engineering, University of Alabama in Huntsville, Huntsville, AL 35899, USA
2 Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA

Correspondence should be addressed to Jesmin Farzana Khan, [email protected]

Received 22 June 2007; Revised 18 January 2008; Accepted 3 March 2008

Recommended by Ray Zhang

This paper introduces a novel contour-based method for detecting largely affine invariant interest or feature points. In the first step, image edges are detected by morphological operators, followed by edge thinning. In the second step, corner or feature points are identified based on the local curvature of the edges. The main contribution of this work is the selection of good discriminative feature points from the thinned edges based on the 1D empirical mode decomposition (EMD). Simulation results compare the proposed method with five existing approaches that yield good results. The suggested contour-based technique detects almost all the true feature points of an image. Repeatability rate, which evaluates the geometric stability under different transformations, is employed as the performance evaluation criterion. The results show that the performance of the proposed method compares favorably against the existing well-known methods.

Copyright © 2008 Jesmin Farzana Khan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

There is a wide variety of methods reported in the literature for interest point and corner detection in grey-level images. Current detection methods can be categorized into three types: contour-based, parametric model-based, and intensity-based methods. Contour-based methods first extract contours and then search for maximal curvature or inflexion points along the contour chains, or carry out some polygonal approximation and then search for intersection points. Contour-based methods have existed for some time [1–6]. This work proposes a contour-based technique that is inspired by the fact that there is a correspondence between the wavelet decomposition and the EMD of a given signal. For example, the wavelet decomposition of a signal gives higher energy where the signal contains information, while the intrinsic mode function (IMF) of the EMD shows higher frequency content at the same locations. Corner detection schemes using the wavelet transform (WT) are popular due to the fact that the WT is able to decompose an input signal into smooth and detailed parts by low-pass and high-pass filters at multiresolution levels [7]. In this manner, local deviations are easily captured at various detailed decomposition levels. Several wavelet-based approaches are reported in [8–15].

Parametric model methods fit a parametric intensity model to the signal. They often provide subpixel accuracy, but are limited to specific types of interest points, for example, L-corners. A parametric model is used in [16–19]. Intensity-based methods compute a measure that indicates the presence of an interest point directly from the grey values. This type of detector does not depend on edge detection or mathematical models [20–30].

This paper presents a novel contour-based interest point detector, which is largely affine transformation invariant. The main contribution of this work is the introduction of the 1D EMD [31] for extracting feature points from edges. In addition, a new scheme for edge thinning is proposed. Specifically, edge detection is performed using a morphological gradient operator [32], followed by edge thinning based on edge thickness in the horizontal and vertical directions. To detect true corner points from the circular arcs, the 2D boundaries of an object are represented by the 1D tangent angles of the boundary point coordinates. Then eigenvectors of the covariance matrix of the coordinates are calculated over a small boundary segment [15, 33]. Based on the fact that true corners result in stronger tangent variations, the 1D EMD is utilized to decompose the 1D tangent angles and capture the irregular angle variations. Finally, the locations of the true feature points are identified by comparing the local frequency content of the first intrinsic mode function (IMF) of the 1D decomposed signal with a predefined threshold.

A requirement for good feature point detection is that the detector be invariant to image transformations and yield the same detected points for different viewpoints. The proposed method is largely invariant to significant affine transformations including large rotations and scale changes. Such transformations introduce significant changes in point locations as well as in the scale and the shape of the neighborhoods of interest points. Our approach addresses these problems simultaneously and offers invariance to geometric transformation. Thus, the points detected in the original image and points detected after the transformation of the image commute. Such points have often been called invariant feature points in the literature, though in principle they change covariantly with the transformation. Thus, even though the regions themselves are covariant, the normalized image pattern they cover and the feature descriptors derived from them are typically invariant.

In this paper, we evaluate the proposed method utilizing the “repeatability” [34] criterion, which directly measures the quality of the detected feature points for tasks such as image matching, object recognition, and 3D reconstruction. It is complementary to localization accuracy, which is relevant for tasks such as camera calibration and 3D reconstruction of specific scene points. Repeatability and localization are conflicting criteria; smoothing improves repeatability but degrades localization [35]. Repeatability explicitly compares the geometrical stability of the detected interest points between different images of a given scene taken under varying viewing conditions. An interest point is “repeated” if the 3D scene point detected in the first image is also accurately detected in the transformed image. The proposed detector is compared to five existing methods that have been shown to yield good results. Utilizing repeatability, the proposed method is shown to yield comparable or improved results.

The remainder of the paper is organized as follows. The proposed corner detection algorithm is described in Section 2. Experimental results along with the comparison to five other existing methods are demonstrated in Section 3. Concluding remarks and recommendations for future improvement are given in Section 4.
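A minimal sketch of a repeatability computation is given below. It assumes the geometric transformation between the two images is known, counts a point as repeated when its mapped location lies within `eps` pixels of a point detected in the second image, and normalizes by the smaller of the two point counts. The tolerance, the one-to-one matching, and the normalization are assumptions made for illustration; the precise definition is the one in [34].

```python
import math


def repeatability(points_ref, points_new, transform, eps=1.5):
    """Fraction of points that are re-detected within eps pixels after `transform`.

    points_ref: (x, y) points detected in the reference image
    points_new: (x, y) points detected in the transformed image
    transform:  function mapping a reference (x, y) into the transformed image
    """
    unused = list(points_new)
    repeated = 0
    for p in points_ref:
        qx, qy = transform(p)
        best = None
        for cand in unused:
            d = math.hypot(cand[0] - qx, cand[1] - qy)
            if d <= eps and (best is None or d < best[0]):
                best = (d, cand)
        if best is not None:
            unused.remove(best[1])     # each detection is matched at most once
            repeated += 1
    denom = min(len(points_ref), len(points_new))
    return repeated / denom if denom else 0.0


# Hypothetical detections before and after a translation by (2, 3).
ref = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
new = [(2.0, 3.0), (12.0, 3.5), (40.0, 40.0)]
rate = repeatability(ref, new, lambda p: (p[0] + 2.0, p[1] + 3.0))
```

A larger `eps` raises the rate at the cost of tolerating poorer localization, which mirrors the trade-off between the two criteria noted above.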

2. PROPOSED ALGORITHM

2.1. Motivation

It is reported that the wavelet transform is a robust scheme for feature point detection due to its ability to decompose an input signal into smooth and detailed components. This fact motivates the consideration of the EMD for interest point detection. The EMD technique was developed recently to analyze the time-frequency distribution of nonlinear and nonstationary data. The EMD is an adaptive decomposition through which any signal can be decomposed into its IMFs that provide well-defined instantaneous frequency information of the signal.

Figure 1: Block diagram of the proposed algorithm. The stages are image acquisition; edge detection (nonlinear filtering, morphological gradient, global thresholding); edge thinning; elimination of spurious edges; boundary segment decomposition (empirical mode decomposition for small edge fragments, then investigation of the frequency content of the first IMF); and a final test, frequency > threshold, which labels a point as an interest point (yes) or not an interest point (no).

Unlike Fourier or wavelet techniques, the EMD does not assume the form of the underlying oscillatory modes or basis functions. For a given signal, the wavelet decomposition is less compact and less physically meaningful than the EMD results [36]. The EMD decomposition method is adaptive and highly efficient. Since the decomposition is based on the local characteristics of the data, it is applicable to nonlinear and nonstationary processes. Experimental results presented here show that, for feature point selection, the EMD is also robust and gives better performance than wavelet approaches.

In the proposed approach, we select the feature points from the edges and, to make the selection process robust, we use morphological edge detection along with edge thinning. The methodology of the proposed method is to use an eigenvector of the covariance matrix for a boundary point over a small region of support (ROS) on a small boundary segment as a curvature function for feature point detection. Thus, we perform the EMD on small edge fragments after the edge detection and thinning operations. The derived edge detection and thinning method is used in lieu of a traditional method, such as Canny edge detection [35], because it returns edge segments rather than contiguous edge lines, which is beneficial in feature point detection.

2.2. The EMD-based feature point determination

A block diagram of the proposed algorithm is shown in Figure 1, and the algorithm steps are summarized as follows.

After acquiring the image, edge detection is performed based on mathematical morphology [32] applied to the intensity image. The intensity image is first blurred by open-close and close-open filters [37]. Next, a morphological gradient operator is applied to the blurred image, which gives symmetric edges between foreground and background regions, and the resulting image is converted to a binary edge map by a global nonhistogram-based thresholding technique [38]. A new edge thinning algorithm is applied to this binary image to obtain fine, narrow, and well-defined object boundaries. Next, a novel technique for selecting feature points from object boundaries based on the EMD is employed. In this work, we represent edges as a set of straight or curved line fragments that are used to extract local curvature by analyzing the eigenvectors of covariance matrices using the 1D EMD. Specifically, each small 2D boundary segment is transformed to a 1D θ−P representation (where θ is the tangent angle as a function of the arc length, P, along the object’s boundary) that is decomposed using the 1D EMD. At the true feature points, the first IMF signal of the EMD shows distinctly higher frequency content than at the points which carry less, or no, information. Thus, points where the frequency measure of the first IMF signal is greater than a predefined threshold are set as interest points. The following subsections discuss each step of the algorithm in detail.

2.2.1. Morphological edge detection

Most classical edge detectors such as Laplacian of Gaussian (LoG) [39] and Canny [35] are based on differential operations and hence are primarily effective in detecting step edges. In contrast to classical techniques, morphological operations [37] are highly effective in detecting different types of features. In this paper, a morphological scanning edge detector (MSED) [32] is applied. The operator is insensitive to skew and orientation, free from artifacts introduced by both global and fixed size block-based local thresholding, and robust to noise. It has been reported that edge features [40] can better handle lighting and scale variations in natural scene images than texture features [41, 42]; therefore, we choose to use an edge-based approach in this study.

An efficient morphological edge detection scheme is applied to the intensity image, $I$, as follows [32]. The image $I$ is first blurred (to reduce false edges and oversegmentation) using open-close and close-open filters [37]. The final blurred image, $I_b$, is the average of the outputs of these two filters,

$$I_b = \frac{{}^{B}\!\left({}^{B}I_o\right)_c + {}^{B}\!\left({}^{B}I_c\right)_o}{2}, \tag{1}$$

where $B$ is the $3 \times 3$ eight-connected structuring element, and ${}^{B}I_o$ and ${}^{B}I_c$ denote the opening and closing of $I$ by the structuring element $B$, respectively. Next, the morphological gradient operator [43] is applied to the blurred image $I_b$, resulting in an image,

$$I_{es} = \delta_B(I_b) - \varepsilon_B(I_b), \tag{2}$$

where $\delta_B$ and $\varepsilon_B$ are the dilation and erosion operators, respectively, utilizing the $3 \times 3$ eight-connected structuring element $B$. The morphological gradient is an edge-strength extraction operator that gives symmetric edges between the foreground and background regions. The resulting image, $I_{es}$, is then thresholded to obtain a binary edge mask. A global nonhistogram-based thresholding technique is incorporated rather than local (adaptive) thresholding [38]. The threshold level, $\gamma$, is set as

$$\gamma = \frac{\sum \left( I_{es} \cdot c \right)}{\sum c}, \tag{3}$$

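where $\cdot$ denotes pixel-wise multiplication and $c = \max\left(|g_1 ** I_{es}|, |g_2 ** I_{es}|\right)$; also $g_1 = [-1\ 0\ 1]$, $g_2 = [-1\ 0\ 1]^T$, and $**$ denotes 2D linear convolution. The binary edge image $I_e$ is then given by

$$I_e = \begin{cases} 1 & \text{if } I_{es} > \gamma, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$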
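Equations (1) to (4) can be sketched in a few lines of plain Python. The sketch assumes a grey-level image stored as a list of rows, implements flat $3 \times 3$ dilation and erosion as neighborhood maximum and minimum, and replicates border pixels when evaluating the $[-1\ 0\ 1]$ differences; the border handling and all function names are assumptions, since the text does not specify them.

```python
def _filter3x3(img, reduce_fn):
    """max (dilation) or min (erosion) over the 3x3 neighborhood, clipped at borders."""
    h, w = len(img), len(img[0])
    return [[reduce_fn(img[rr][cc]
                       for rr in range(max(0, r - 1), min(h, r + 2))
                       for cc in range(max(0, c - 1), min(w, c + 2)))
             for c in range(w)] for r in range(h)]


def dilate(img):
    return _filter3x3(img, max)


def erode(img):
    return _filter3x3(img, min)


def opening(img):
    return dilate(erode(img))


def closing(img):
    return erode(dilate(img))


def blur(img):
    """Eq. (1): average of the open-close and close-open filters."""
    oc = closing(opening(img))
    co = opening(closing(img))
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(oc, co)]


def edge_strength(img_b):
    """Eq. (2): morphological gradient, dilation minus erosion."""
    d, e = dilate(img_b), erode(img_b)
    return [[a - b for a, b in zip(rd, re)] for rd, re in zip(d, e)]


def global_threshold(ies):
    """Eq. (3): gradient-weighted mean of the edge strength."""
    h, w = len(ies), len(ies[0])
    num = den = 0.0
    for r in range(h):
        for c in range(w):
            # |g1 ** Ies| and |g2 ** Ies| with replicated borders (an assumption)
            gx = abs(ies[r][min(w - 1, c + 1)] - ies[r][max(0, c - 1)])
            gy = abs(ies[min(h - 1, r + 1)][c] - ies[max(0, r - 1)][c])
            weight = max(gx, gy)
            num += ies[r][c] * weight
            den += weight
    return num / den if den else 0.0


def binary_edges(img):
    """Eq. (4): threshold the edge strength at gamma."""
    ies = edge_strength(blur(img))
    gamma = global_threshold(ies)
    return [[1 if v > gamma else 0 for v in row] for row in ies]


# Hypothetical test image: a vertical step edge between columns 3 and 4.
step = [[0, 0, 0, 0, 100, 100, 100, 100] for _ in range(6)]
edges = binary_edges(step)
```

On this step image the gradient is nonzero on both sides of the transition, so the binary edge is two pixels wide, which is the kind of thickness the thinning stage is meant to reduce.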

To thin the edges, the morphological edge map is scanned along the horizontal and vertical directions to reduce the width of the edges to a single pixel through erosion. During horizontal scanning, all the nonzero neighborhood pixels of a nonzero edge pixel in a horizontal window $1 \times w_h$ are set to 0. The resulting image is $I_{he}$,

$$\text{if } I_e(x_i, y_j) \neq 0, \text{ then } \begin{cases} I_{he}(x_i, y_j) = I_e(x_i, y_j), \\ I_{he}(x_i, y_k) = 0 \quad \text{for } k \in \left\{ j - \dfrac{w_h}{2}, \ldots, j + \dfrac{w_h}{2} \right\},\ k \neq j. \end{cases} \tag{5}$$

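Similar operations in the vertical direction, using a vertical window $w_v \times 1$, yield $I_{ve}$,

$$\text{if } I_e(x_i, y_j) \neq 0, \text{ then } \begin{cases} I_{ve}(x_i, y_j) = I_e(x_i, y_j), \\ I_{ve}(x_k, y_j) = 0 \quad \text{for } k \in \left\{ i - \dfrac{w_v}{2}, \ldots, i + \dfrac{w_v}{2} \right\},\ k \neq i. \end{cases} \tag{6}$$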

The maximum of $I_{he}$ and $I_{ve}$ is set as the thinned binary edge image, $I_{te}$, resulting from the edge thinning operation: $I_{te} = \max(I_{he}, I_{ve})$. The image $I_{te}$ may still contain isolated, noisy, spurious edges. To remove these edges, segments whose length does not exceed $N$ are deleted. Let $n$ sequential points describe an edge segment $P$ in $I_{te}$ such that $P = \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n\}$. Then

$$I_{te}(x_i, y_i) = 0 \quad \text{for } (x_i, y_i) \in P,\ n \le N. \tag{7}$$

The resulting final binary edge image, $I_{fte}$, contains 1-pixel-wide boundaries.

As an example, the intensity image, the morphological edge strength extracted image, the gradient image after global thresholding, and the final edge image after thinning and elimination of spurious edges are shown in Figures 2(a), 2(b), 2(c), and 2(d), respectively.
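One literal reading of the thinning and clean-up steps can be sketched as follows. The sketch assumes the scan proceeds sequentially, so that a pixel zeroed earlier in the scan no longer suppresses its own neighbors, that the same window length is used in both directions, and that an edge segment is an 8-connected component; none of these details is fixed by the text, and the function names are illustrative.

```python
def thin_horizontal(edge, w):
    """Keep a nonzero pixel and zero the other pixels in its 1 x w window."""
    half = w // 2
    out = [row[:] for row in edge]
    for row in out:
        for j in range(len(row)):
            if row[j]:
                for k in range(max(0, j - half), min(len(row), j + half + 1)):
                    if k != j:
                        row[k] = 0
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def thin_vertical(edge, w):
    """Vertical counterpart, obtained by thinning the transposed image."""
    return transpose(thin_horizontal(transpose(edge), w))


def thin(edge, w):
    """Pixel-wise maximum of the horizontally and vertically thinned images."""
    ih, iv = thin_horizontal(edge, w), thin_vertical(edge, w)
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(ih, iv)]


def remove_short_segments(edge, n_max):
    """Delete 8-connected segments that contain at most n_max pixels."""
    h, w = len(edge), len(edge[0])
    out = [row[:] for row in edge]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if out[r][c] and not seen[r][c]:
                stack, segment = [(r, c)], []
                seen[r][c] = True
                while stack:                      # flood fill one segment
                    y, x = stack.pop()
                    segment.append((y, x))
                    for yy in range(max(0, y - 1), min(h, y + 2)):
                        for xx in range(max(0, x - 1), min(w, x + 2)):
                            if out[yy][xx] and not seen[yy][xx]:
                                seen[yy][xx] = True
                                stack.append((yy, xx))
                if len(segment) <= n_max:
                    for y, x in segment:
                        out[y][x] = 0
    return out


# Hypothetical inputs: a 3-pixel-thick run, a thin horizontal line, and a noisy map.
thick_run = [[0, 1, 1, 1, 0, 0, 0]]
thin_line = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
noisy = [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]]
```

Horizontal thinning alone would break an already thin horizontal line into isolated dots; the vertical pass leaves that line intact, which is one reason for taking the maximum of the two results.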


Figure 2: (a) Intensity image, (b) morphological edge strength image, (c) morphological edge after thresholding, (d) final edge image after thinning, (e) Canny edge image for a low threshold, and (f) Canny edge image for a high threshold.

The reason for not using an existing edge detection method, for example, the Canny edge detection method [35], is that Canny’s method yields extraneous boundaries, as shown in Figure 2(e). For a higher threshold, the Canny method gives fewer boundaries, as shown in Figure 2(f); in using this approach, however, the threshold needs to be determined by hand for each image. Moreover, even with a different threshold, it does not return edge segments but rather long continuous edges and connected edges between two objects. The methodology of the proposed algorithm is to use the covariance matrix for a boundary point over a small region of support (ROS), on a small boundary segment, as a curvature function for feature point detection. It has been found that the morphological framework and edge thinning scheme yield edge segments, which facilitates the extraction of feature points.

2.2.2. Empirical mode decomposition

The EMD decomposes a signal into a finite number of (zero mean) frequency and amplitude modulated signals called IMFs. The first IMF contains the highest local frequencies of oscillation while the final IMF, or the residue, contains a single extremum, a monotonic trend, or simply a constant. The basic idea embodied in the EMD analysis, as introduced by Huang et al. [31], is to allow for an adaptive and unsupervised representation of the intrinsic components of linear and nonlinear signals, based purely on the properties observed in the data without appealing to the concept of stationarity. Although the EMD is a relatively new data analysis technique, its power and simplicity have encouraged its application in a myriad of fields, including almost all areas of signal processing, image processing, computer vision, and medical analysis [44–48].
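The decomposition can be illustrated with a deliberately simplified sifting sketch. It assumes piecewise-linear envelopes through the local extrema (EMD implementations normally use cubic splines), a fixed number of sifting iterations in place of a stopping criterion, and no special treatment of end effects; these simplifications and the function names are assumptions, not the procedure of [31].

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at `knots`, held constant outside them."""
    env = [x[knots[0]]] * len(x)
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * x[a] + w * x[b]
    for t in range(knots[-1], len(x)):
        env[t] = x[knots[-1]]
    return env


def sift(x, n_sifts=10):
    """Extract one IMF candidate by repeatedly removing the mean envelope."""
    h = list(x)
    for _ in range(n_sifts):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=8):
    """Return (imfs, residue) such that the IMFs and the residue sum to x."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break                      # too few extrema left to sift
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Assumed test signal: fast oscillation + slow oscillation + linear trend.
signal = [math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 64) + 0.01 * t
          for t in range(256)]
imfs, residue = emd(signal)
```

By construction the IMFs and the residue add back to the input, so the decomposition is complete regardless of how the envelopes are interpolated.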

2.2.3. Feature points extraction

After obtaining the binary edge image, Ifte, the featurepoints along the boundaries of objects must be determined.If the boundary of an object involves both straight linesand circular arcs, spurious corners may be detected atcircular arcs by boundary-based approaches. To overcomethis shortcoming, Tsai et al. [33] introduced the eigenvaluesof a covariance matrix for a boundary point, over a smallregion of support (ROS) on a small boundary segment, as acurvature function for feature point detection. We adopt thisapproach in order to retain the robust merits of covariancematrix in feature point detection. However, instead of usingmultiple eigenvalues, the principal eigenvector is used in thiswork, because the dominant orientation of any pixel in thelocal neighborhood can be either denoted as the argumentof the principal eigenvector or by using the ratio betweentwo eigenvalues. This technique is applied to each segmentof the 2D boundaries of the edge image and feature pointsare extracted from each segment. As this approach considersa small boundary segment as a new curvature function forfeature detection, image noise and quantization effect arereadily eliminated.

The 1D wavelet transform has been utilized as a robust scheme in feature point detection due to its excellent capability for capturing local deviations. In this work, the 2D boundaries of an object are initially transformed to a 1D θ−P representation. Then, the 1D θ−P signal is used as input to the 1D EMD to detect local deviations, as measured by the number of zero-crossing points of the first IMF. In the following, we present the procedure for finding the tangent angle of a boundary point.

2.2.4. 1D θ−P representation of boundary segment

From the binary edge image, $I_{fte}$, the x-y coordinates of each point of a boundary segment of an object are first extracted into an array. Let a boundary $P$ of an object be described by $n$ sequential digital points, $P = \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n\}$, where $p_{i+1}$ is adjacent to $p_i$ on $P$. Let $N_s(p_i)$ denote a small boundary segment of $P$, centered at $p_i$, that spans the ROS between points $p_{i-s}$ and $p_{i+s}$ for some integer $s$. That is, $N_s(p_i) = \{p_j : i - s \le j \le i + s\}$. The covariance matrix $M(p_i)$ for point $p_i$ is then estimated from the coordinates of the boundary points within $N_s(p_i)$ [49]:

$$
M(p_i) = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix}, \tag{8}
$$


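where

$$
\begin{aligned}
m_{11} &= \left[\frac{1}{2s+1}\sum_{j=i-s}^{i+s} x_j^2\right] - \bar{x}_i^2,\\
m_{22} &= \left[\frac{1}{2s+1}\sum_{j=i-s}^{i+s} y_j^2\right] - \bar{y}_i^2,\\
m_{12} = m_{21} &= \left[\frac{1}{2s+1}\sum_{j=i-s}^{i+s} x_j y_j\right] - \bar{x}_i \bar{y}_i,\\
\bar{x}_i &= \frac{1}{2s+1}\sum_{j=i-s}^{i+s} x_j,\\
\bar{y}_i &= \frac{1}{2s+1}\sum_{j=i-s}^{i+s} y_j,
\end{aligned} \tag{9}
$$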

where $\bar{x}_i$ and $\bar{y}_i$ are the coordinates of the geometrical center of $N_s(p_i)$. The covariance matrix $M(p_i)$ is a 2 × 2 symmetric, positive semidefinite matrix. Its eigenvalues $\lambda_1$ and $\lambda_2$ (with $\lambda_1 \ge \lambda_2$) are the solutions of the characteristic equation $\det(M(p_i) - \lambda D) = 0$, where $D$ is the unit matrix. The corresponding eigenvectors $E_1$ and $E_2$ represent the tangent (major axis) and the normal (minor axis) directions for point $p_i$ over the segment $N_s(p_i)$, respectively. Therefore, the tangent angle of point $p_i$, denoted by $\theta(p_i)$, is simply defined as follows:

$$
\tan\theta(p_i) = \frac{\lambda_1 - m_{11}}{m_{12}}, \qquad
\theta(p_i) = \arctan\left(\frac{\lambda_1 - m_{11}}{m_{12}}\right). \tag{10}
$$

In general, the value of $\theta(p_i)$ lies between $-\pi/2$ and $\pi/2$. However, in order to avoid large variations between two adjacent boundary points due to quantization effects [8, 10], $\theta(p_i)$ is restricted to the range between 0 and $\pi/2$. That is, $\theta(p_i) = |\arctan((\lambda_1 - m_{11})/m_{12})|$. If $m_{12}$ equals 0, then $\theta(p_i)$ is set to $\pi/2$ to avoid division by zero. Therefore, the angle of a boundary point $p_i$ can be calculated from the eigenvector $E_1$ of $M(p_i)$ and the above expression for $\theta(p_i)$.
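The computation in (8) to (10) can be illustrated with a short sketch. The Python function below is a minimal illustration under stated assumptions, not the authors' implementation: the name `tangent_angle`, the wrap-around indexing (which assumes a closed boundary), and the closed-form expression used for the larger eigenvalue of a symmetric 2 × 2 matrix are all introduced here for illustration.

```python
import math

def tangent_angle(points, i, s):
    """Tangent angle theta(p_i) in [0, pi/2] for boundary point i.

    points: list of (x, y) boundary coordinates in traversal order.
    s: half-width of the region of support (2s + 1 points are used).
    Assumption: the boundary is closed, so indices wrap around.
    """
    n = len(points)
    segment = [points[(i + k) % n] for k in range(-s, s + 1)]
    m = len(segment)

    # Geometrical centre of the segment (x-bar, y-bar in Eq. (9)).
    x_bar = sum(x for x, _ in segment) / m
    y_bar = sum(y for _, y in segment) / m

    # Entries of the covariance matrix M(p_i), Eq. (9).
    m11 = sum(x * x for x, _ in segment) / m - x_bar * x_bar
    m22 = sum(y * y for _, y in segment) / m - y_bar * y_bar
    m12 = sum(x * y for x, y in segment) / m - x_bar * y_bar

    # The text sets theta to pi/2 when m12 is zero to avoid division by zero.
    if m12 == 0:
        return math.pi / 2

    # Larger eigenvalue of the symmetric 2x2 matrix [[m11, m12], [m12, m22]].
    lam1 = 0.5 * (m11 + m22 + math.sqrt((m11 - m22) ** 2 + 4.0 * m12 * m12))

    # Eq. (10) with the absolute value applied, so the result lies in [0, pi/2].
    return abs(math.atan((lam1 - m11) / m12))
```

For a straight segment of slope 1, such as the points (k, k), the function returns π/4; evaluating it at every boundary index yields the 1D θ−P signal that is passed to the EMD.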

2.2.5. Detection procedure

Consider an $n_1$-point digital boundary, $P = \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n_1\}$, traversing points $(x_1, y_1), (x_2, y_2), \ldots, (x_{n_1}, y_{n_1})$ and following the boundary in the counter-clockwise direction. To each point sequence $p_i$ there correspond a 1D θ−P signal, $\theta(p_i)$, a 1D wavelet signal, $Y(p_i)$, and a 1D first IMF signal of the EMD, $X(p_i)$.

As an example, we have chosen a binary image with one object. Figure 3(a) shows a binary image of an artificial “h”-shape object. Figure 3(b) presents the edge image of that object with one boundary involving $n_1 = 273$ boundary points. The character “+” in Figure 3(b) denotes the starting boundary point $(x_1, y_1)$ and the arrow indicates the direction of boundary following. The corresponding 1D θ−P representation of the object boundary, $\theta(p_i)$, is shown at the top of Figure 3(c), which is used as an input signal to both the

Figure 3: (a) Binary image of letter “h”; (b) starting boundary point and direction of boundary following; (c) the 1D θ−P representation of the “h”-shape object (top), Haar wavelet decomposition at the first decomposition level (middle), and the first IMF of the EMD (bottom); (d) feature points obtained from the location of distinct wavelet coefficients (panel title: “Final points from wavelet decomposition”); and (e) feature points obtained from the frequency content of the first IMF of the EMD (panel title: “Final points from EMD”).

1D wavelet decomposition utilizing the Haar basis function and the 1D empirical mode decomposition. The middle plot of Figure 3(c) shows the 1D wavelet coefficients at the finest (first) detail decomposition level, $Y(p_i)$. The bottom plot of Figure 3(c) is the first IMF, $X(p_i)$, obtained from $\theta(p_i)$. The correspondence between the wavelet decomposition and the first IMF of the EMD for the 1D input signal can clearly be observed in Figure 3(c). Both the wavelet coefficients and the frequency of the first IMF are distinctly higher at the same points of the original 1D signal. The finest-scale wavelet energies are distinctly higher at the true feature points than in the smooth regions. Feature points extracted from the binary image of letter “h” based on the 1D wavelet decomposition method are shown in Figure 3(d).

In the EMD case, the first IMF shows distinctly higher frequency content at true feature points than along straight lines. The algorithm for finding true interest points makes four passes through the IMF signal. First, points are selected if the number of zero crossings around them exceeds a minimum. Second, if two selected points are adjacent, one is deleted based on the concentration of zero crossings. During the third pass, selected points that are not local maxima of the original intensity image within a 3 × 3 neighborhood are deleted. In the final pass, a subset of points is kept such that the minimum distance between any pair of points is larger than a given threshold.

Let $Z(p_i)$ be the set of zero-crossing points of the IMF around $p_i$:

$$
Z(p_i) = \{p_j : X(p_{j-1})\,X(p_{j+1}) < 0\} \quad \text{for } i - \frac{W_z}{2} \le j \le i + \frac{W_z}{2}, \tag{11}
$$

where $W_z$ defines the window centered at $p_i$. If the number of zero crossings around a point is greater than a predefined threshold, $th_z$ (in our work, the threshold is 1/3 of the maximum number of zero crossings in the IMF signal), that point is likely a feature point. This is the first selection of feature points from the object boundary, which forms the set $F_1 \subset P$:

$$
F_1 = \{p_i : Z(p_i) > th_z\} = \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n_2 : n_2 < n_1\}, \tag{12}
$$

where $n_1$ is the number of all the boundary points and $n_2$ is the number of points retained in this first selection. In (12), the comparison with $th_z$ is made on the number of zero-crossing points in $Z(p_i)$.
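The first pass can be sketched in a few lines. This is a minimal illustration of (11) and (12) under assumptions that are not in the text: the function names are hypothetical, the boundary is treated as closed so that indices wrap around, and the window width `wz` is a free parameter whose value the text does not state.

```python
def zero_crossing_counts(imf, wz):
    """Number of zero-crossing points in a window of width wz around each sample.

    A sample j is flagged when X(p_{j-1}) * X(p_{j+1}) < 0, as in Eq. (11).
    Assumption: the boundary is closed, so indices wrap around.
    """
    n = len(imf)
    half = wz // 2
    flagged = [imf[(j - 1) % n] * imf[(j + 1) % n] < 0 for j in range(n)]
    return [
        sum(flagged[(i + k) % n] for k in range(-half, half + 1))
        for i in range(n)
    ]

def first_selection(imf, wz, fraction=1.0 / 3.0):
    """Indices whose zero-crossing count exceeds th_z, as in Eq. (12).

    th_z is taken as `fraction` of the maximum count over the whole IMF.
    """
    counts = zero_crossing_counts(imf, wz)
    thz = fraction * max(counts)
    return [i for i, c in enumerate(counts) if c > thz]
```

Because the threshold is relative to the maximum count in each IMF, flat stretches of the first IMF (straight boundary segments) have a count of zero and are never selected.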

To discard redundant points, we check whether several neighboring points have the same number of zero-crossing points over a window of size $W_s = 1 \times 11$, and among those we keep the points that have the most concentrated zero-crossing points. Hence, for each point over the window $W_z$, we calculate the sum of the distances from all the zero-crossing points to the point under consideration, $p_i$:

$$
S(p_i) = \sum_{j=i-W_z/2}^{i+W_z/2} |p_i - p_j|. \tag{13}
$$

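If $F_2 \subset F_1$ is the set of feature points after discarding redundant points from $F_1$, then $\bar{F}_2 \cap F_1$, where $\bar{F}_2$ denotes the complement of $F_2$, is the set of discarded points:

$$
\begin{aligned}
\bar{F}_2 \cap F_1 &= \{p_j : Z(p_j) = Z(p_i) \text{ and } S(p_j) \ne \min\{S(p_j)\}\} \quad \text{for } i - \frac{W_s}{2} \le j \le i + \frac{W_s}{2},\\
F_2 &= \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n_3 : n_3 < n_2 < n_1\},
\end{aligned} \tag{14}
$$

where $n_3$ is the number of selected points after discarding redundant points from $F_1$.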
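A possible reading of the second pass, (13) and (14), is sketched below. Several details are assumptions made only for illustration: the function names are hypothetical, $|p_i - p_j|$ is interpreted as the Euclidean distance between boundary coordinates, only flagged zero-crossing points contribute to the sum, and `discard_redundant` compares indices directly without wrap-around for brevity.

```python
import math

def spread(points, flagged, i, wz):
    """S(p_i): summed distance from p_i to the zero-crossing points in its window.

    points: list of (x, y) boundary coordinates.
    flagged: list of booleans, True where a point is a zero-crossing point.
    Assumption: closed boundary, Euclidean distance.
    """
    n = len(points)
    half = wz // 2
    total = 0.0
    for k in range(-half, half + 1):
        j = (i + k) % n
        if flagged[j]:
            total += math.dist(points[i], points[j])
    return total

def discard_redundant(selected, counts, spreads, ws=11):
    """Keep, among neighbours with equal zero-crossing counts, those of minimum S.

    selected: candidate indices (the set F1).
    counts, spreads: mappings from index to zero-crossing count and to S.
    Neighbours are candidates within ws // 2 positions of each other.
    Candidates that tie for the minimum are all kept.
    """
    half = ws // 2
    kept = []
    for i in selected:
        rivals = [
            j for j in selected
            if abs(j - i) <= half and counts[j] == counts[i]
        ]
        if spreads[i] == min(spreads[j] for j in rivals):
            kept.append(i)
    return kept
```

A smaller value of S means the zero crossings lie closer to the candidate, which is how the sketch interprets "most concentrated".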

Figure 4: A synthetic image.

Finally, from the points in $F_2$, we retain those points that are local maxima in their $W_m = 3 \times 3$ neighborhood, with the restriction that the distance between any two feature points is larger than a given threshold (set to 5 pixels in our experiment). Thus, $F_3 \subset F_2$ is the set of feature points that are local maxima in the edge image, $I_{fte}$, and $F_f \subset F_3$ is the final set of feature points after discarding closely spaced points:

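$$
\begin{aligned}
F_3 &= \Big\{p_i = (x_i, y_i) : I_{fte}(x_i, y_i) = \max_{i - W_m/2 \,\le\, j \,\le\, i + W_m/2} I_{fte}(x_j, y_j)\Big\},\\
F_3 &= \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n_4 : n_4 < n_3 < n_2 < n_1\},
\end{aligned} \tag{15}
$$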

where $n_4$ is the number of selected points after discarding redundant points from $F_2$:

$$
\begin{aligned}
F_f &= \{p_i : |p_i - p_{i-1}| > 5 \text{ pixels, and } |p_i - p_{i+1}| > 5 \text{ pixels}\},\\
F_f &= \{p_i = (x_i, y_i);\ i = 1, 2, 3, \ldots, n_5 : n_5 < n_4 < n_3 < n_2 < n_1\},
\end{aligned} \tag{16}
$$

where $n_5$ is the number of selected points after discarding redundant points from $F_3$.
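The minimum-distance restriction of the final pass can be enforced in more than one way. The sketch below uses a simple greedy rule, which is a substitution made here for illustration rather than a literal transcription of (16): candidates are visited in boundary order and a point is kept only if it lies farther than the threshold from every point already kept. The function name and the use of Euclidean distance are assumptions.

```python
import math

def enforce_min_distance(candidates, min_dist=5.0):
    """Greedy spacing filter: keep a point only if it is farther than
    min_dist (in pixels) from all points kept so far.

    candidates: list of (x, y) points in boundary order.
    """
    kept = []
    for p in candidates:
        if all(math.dist(p, q) > min_dist for q in kept):
            kept.append(p)
    return kept
```

Unlike a literal reading of (16), which would remove both members of a closely spaced pair, the greedy rule retains the first member of each pair, so that every pair of retained points is separated by more than the threshold.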

Following the above procedure, the extracted final feature points, $F_f$, for the artificial “h”-shape object are shown in Figure 3(e). Comparing the final feature point extraction results in Figures 3(d) and 3(e) suggests that the first IMF retains more useful information about the original signal than the wavelet decomposition: the feature points determined by the EMD are found at all the curvatures of the object, whereas the wavelet decomposition approach misses some curvatures.

For an image with more than one object, or with objects of complicated shape, we perform the EMD on each edge fragment to extract local curvature following the above procedure. Thus, we find feature points for each edge fragment independently, and the final feature points are the accumulation of all the points obtained from all the edge boundary segments.

3. EXPERIMENTAL RESULTS

Results of experiments conducted to test the efficacy of the proposed corner detection algorithm are provided. In order to test the immunity of the proposed algorithm to


Figure 5: Feature points detected in the synthetic image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

transformations, the original images are scaled, rotated, and sheared. Stability to image noise is also tested. Additionally, the repeatability rates of five interest point detectors are compared with that of the presented method under different image rotations and scale changes. Finally, the parameter sensitivity of the algorithm is analyzed.

For comparison, we have chosen five detectors that are reported to offer good performance. Among the five chosen detectors, Harris’s [24], Lowe’s [30], and Tomasi’s [25] methods are intensity-based. These are chosen because Harris’s method has been reported to be better than any other detector, Lowe’s algorithm, also known as SIFT (scale invariant feature transform), is regarded as the best scale invariant detector, and Tomasi’s detector is regarded as the best for tracking applications. The other two detectors [13, 15] are chosen because (1) they are contour-based methods like the proposed method and (2) they use the wavelet decomposition.

The above-mentioned methods and the proposed algorithm are first applied to a synthetic image consisting of horizontal, vertical, and slanted lines and different types of corners. As shown in Figure 4, this synthetic image contains both prominent and faint edges. The feature points detected by Harris’s, Lowe’s, Tomasi’s, Loupias’s, and

Figure 6: Feature points detected by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

Yeh’s methods are shown in Figures 5(a), 5(b), 5(c), 5(d), and 5(e), respectively. The interest points extracted by the presented algorithm are given in Figure 5(f). Even though the proposed method identifies fewer feature points than some of the other approaches, it is interesting to observe that these feature points are distributed along all edges, boundaries, and corners of interest. This is true for prominent boundaries and edges, as well as subtle interior edges. Many object corners and boundaries are missed by the other methods, especially on the faint interior edges. Thus, the proposed method has produced the most judicious feature points, placing them logically along the structures of interest.

Results for the five reference methods and the proposed method on a real image are presented in Figure 6. For the reference image shown in Figure 2(a), feature points extracted by Harris’s approach, Lowe’s procedure, Tomasi’s method, Loupias’s technique, and Yeh’s algorithm are shown in Figures 6(a), 6(b), 6(c), 6(d), and 6(e), respectively. The points detected by the presented method are given in Figure 6(f). From this figure, it can be seen that the points selected by the proposed method cover all the curvatures of object boundaries and yield the most true corner points.

To evaluate detector rotation invariance, Figures 7 and 8 show the detection results for two rotated versions of


Figure 7: Feature points detected in the 40° rotated image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

the reference image. Figures 7(f) and 8(f) give the results of the proposed method, where the rotation angles are 40° and 110° for the images in Figures 7 and 8, respectively. The results of Harris’s, Lowe’s, Tomasi’s, Loupias’s, and Yeh’s methods are given in Figures 7(a), 7(b), 7(c), 7(d), and 7(e), respectively, for the 40° rotation and in Figures 8(a), 8(b), 8(c), 8(d), and 8(e) for the 110° rotation. From the figures, it is observed that Harris’s method, Lowe’s method, and the proposed method give the best results for both rotations. The performance of Tomasi’s technique is better than that of Loupias’s and Yeh’s methods.

The effect of image scale change on the detection result is tested and demonstrated in Figures 9 and 10. The points detected by the proposed technique are shown in Figures 9(f) and 10(f), where the scale changes are 1.5 and 3.4, respectively. Points detected by Harris’s, Lowe’s, Tomasi’s, Loupias’s, and Yeh’s methods are presented in Figures 9(a), 9(b), 9(c), 9(d), and 9(e), respectively, for the scale change of 1.5 and in Figures 10(a), 10(b), 10(c), 10(d), and 10(e), respectively, for the scale change of 3.4. Judged visually from these figures, all the methods appear to be scale invariant.

To evaluate the functioning of all the detectors as affine invariant systems, nonuniform scaling is applied in some directions to produce shearing in the reference image shown in

Figure 8: Feature points detected in the 110° rotated image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

Figure 2(a). Figure 11(f) displays the feature points detected in the sheared image by the proposed method. The results of detection by Harris’s, Lowe’s, Tomasi’s, Loupias’s, and Yeh’s methods are presented in Figures 11(a), 11(b), 11(c), 11(d), and 11(e), respectively. Examining the figure suggests that the proposed method also performs satisfactorily in detecting interest points in the sheared image.

To check the performance with noise, we have added Gaussian noise to the original image. For a noisy image with an SNR of 25 dB, Figure 12(f) presents the points detected by the proposed method. The results of Harris’s, Lowe’s, Tomasi’s, Loupias’s, and Yeh’s methods are given in Figures 12(a), 12(b), 12(c), 12(d), and 12(e), respectively, for the same level of noise. The results demonstrate that, except for Tomasi’s technique, the other five methods work well in the presence of noise.
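One way to generate a noisy test image at a prescribed SNR is sketched below. The text does not state how the SNR is defined, so this sketch assumes the common definition as the ratio of mean signal power to noise variance, expressed in decibels; the function names and the flat list of pixel values are likewise illustrative choices.

```python
import math
import random

def noise_std_for_snr(pixels, snr_db):
    """Standard deviation of zero-mean noise that gives the requested SNR.

    Assumed definition: SNR_dB = 10 * log10(mean(pixel^2) / sigma^2).
    """
    signal_power = sum(v * v for v in pixels) / len(pixels)
    return math.sqrt(signal_power / (10 ** (snr_db / 10)))

def add_gaussian_noise(pixels, snr_db, seed=0):
    """Return a noisy copy of `pixels` (a flat list of intensities)."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    sigma = noise_std_for_snr(pixels, snr_db)
    return [v + rng.gauss(0.0, sigma) for v in pixels]
```

Under this definition, lowering the SNR from 35 dB to 21 dB increases the noise standard deviation by a factor of about five.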

From the figures, it can be concluded that the proposed method can be used as an affine and scale invariant detector. To complement the subjective evaluations, we present a quantitative performance comparison of the proposed affine and scale invariant detector and the other detectors. The stability and accuracy of the detectors are evaluated using the repeatability criterion [34]. The repeatability score for a given pair of images is the percentage of corresponding


Figure 9: Feature points detected in the 1.5 times scaled image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

points detected in those images under different geometric and photometric transformations. We take into account only the points located in the part of the scene present in both images. When the repeatability rate is measured with a localization error of 1.5 pixels or less, the probability that two points are accidentally within the error distance is negligible.
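The repeatability score can be sketched as follows, assuming the geometric transformation between the two images is known. The function name, the one-to-one nearest-neighbour matching, and the normalization by the smaller of the two point counts are assumptions made for this illustration; the text only states that the score is the percentage of corresponding points within the error distance.

```python
import math

def repeatability(points_ref, points_test, transform, eps=1.5):
    """Percentage of reference points re-detected in the transformed image.

    points_ref:  (x, y) points detected in the reference image.
    points_test: (x, y) points detected in the transformed image.
    transform:   function mapping a reference point into the test image.
    eps:         maximum localization error in pixels.
    Both point lists are assumed to be restricted to the common scene part.
    """
    unused = list(points_test)
    matches = 0
    for p in points_ref:
        mapped = transform(p)
        # Nearest test point not yet matched to another reference point.
        best = min(unused, key=lambda q: math.dist(mapped, q), default=None)
        if best is not None and math.dist(mapped, best) <= eps:
            matches += 1
            unused.remove(best)
    denom = min(len(points_ref), len(points_test))
    return 100.0 * matches / denom if denom else 0.0
```

For a 90° rotation, for example, `transform` could be `lambda p: (-p[1], p[0])`; restricting both point lists to the region visible in both images is left to the caller.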

We first compare the detectors for image rotation, followed by scale change and additive noise. The repeatability rate as a function of the angle of image rotation is displayed in Figure 13(a). The rotation angles vary between 0° and 180°. Harris’s and Lowe’s methods give the best repeatability for all rotations, at about 82% and 77%, respectively. The plot shows that the proposed technique does not outperform Harris’s method or the SIFT algorithm, but it yields a repeatability rate of about 70% for all rotations. Notably, it offers better performance than Tomasi’s, Loupias’s, and Yeh’s techniques, which offer repeatability rates of about 60%, 50%, and 40%, respectively.

Figure 13(b) shows the repeatability rate as a function of scale change. The results show that all the detectors are scale sensitive except Lowe’s method. As the name implies, the SIFT algorithm proposed by Lowe offers the best performance with scale change. This method is the least

Figure 10: Feature points detected in the 3.4 times scaled image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

dependent on the change in scale. Harris’s, Tomasi’s, and the proposed detectors give reasonable results, with the repeatability rate decreasing as the scale change increases. Loupias’s and Yeh’s methods are very sensitive to scale change, and the results of these methods are hardly usable.

To study repeatability in the presence of image noise, the repeatability rate is displayed as a function of SNR. For performance evaluation with noise, the SNR is varied from 35 dB to 21 dB and the results are displayed in Figure 13(c). All the detectors give reasonable results in the additive noise cases, with the exception of Tomasi’s method. Harris’s and Loupias’s methods give the best results, followed by the proposed method and then by Lowe’s, Yeh’s, and Tomasi’s techniques. The proposed method obtains a repeatability rate of nearly 70% for all levels of noise considered.

For the evaluation of detection performance, the feature points extracted by the proposed method and the five other algorithms are presented for two more images in Figures 14 and 15, where the detected points are superimposed on the original image to evaluate the locations of the interest points. From those figures, it is observed that the proposed method extracts points where variations occur in the image, that is, where the image information is supposed to be the most important. Additionally, the detected interest points are not clustered in a few regions but rather spread out over different parts


Figure 11: Feature points detected in the sheared image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

of the image. Most importantly, the proposed method covers all the curvatures of object boundaries and yields the true features, that is, the edges, ridges, and corners. Accordingly, the extracted points capture the structure of the scene and lead to a complete image representation.

An analysis is made of the sensitivity to algorithm parameters. Most existing interest point detectors depend on the choice of some parameters. The threshold in Harris’s method is chosen by trial and error depending on the problem at hand. The number of points detected by the SIFT algorithm varies significantly with changes in image intensity. For example, for the image used in the paper, the number of feature points detected by SIFT is 862 if the input image is not normalized and 363 if the same input image is normalized.

In the proposed method there is one threshold, $th_z$, which determines the first selection of the interest points such that a point is selected if the number of zero crossings around it exceeds a minimum. This threshold is not a function of image intensity; rather, it is a function of the number of zero-crossing points of an IMF and thus is different for each edge segment of an image. In the paper, this threshold has been set to 1/3 of the maximum number of zero crossings present in an IMF signal, and this choice

Figure 12: Feature points detected in the noisy image with SNR = 25 dB by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

is independent of the image. As this threshold selects the preliminary set of the feature points, any value equal to or smaller than this value yields the same result. Choosing the value of this threshold is not critical and has only a minor effect on the final set of extracted feature points, because the preliminary feature points selected by this threshold go through three more passes in which redundant points are discarded. The two further thresholds used in those passes define the size of the local neighborhood of a pixel. Thus, those thresholds are not a function of image intensity but rather of image size. The values of those two thresholds chosen in the paper work well for most of the images we come across in practice. We have tested Harris’s method and the proposed algorithm for different values of the threshold. Table 1 gives the number of detected points for Harris’s method and the proposed method as a function of the threshold (for the proposed method, $th_z$ = Threshold × (1/3) of the maximum number of zero crossings).

After examination of the overall detection results, it can be claimed that the proposed method compares favorably against other well-known methods. Based on the plots of the repeatability rate, the performance of the proposed technique is the least dependent on transformations and noise, which is a desirable and attractive characteristic for


Figure 13: Plot for the repeatability rate as a function of (a) rotation angle (horizontal axis: rotation in degrees), (b) change in scale, and (c) noise level (horizontal axis: SNR in dB). Legend in each panel: Harris, SIFT, EMD, Tomasi, Loupias, Yeh.

Figure 14: Feature points detected in the second image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

any feature point detector. Though the proposed technique does not outperform Harris’s method or the SIFT algorithm, it yields better results than Tomasi’s, Loupias’s, and Yeh’s techniques. Notably, it offers better performance than the

Figure 15: Feature points detected in the third image by (a) Harris’s method, (b) Lowe’s method, (c) Tomasi’s method, (d) Loupias’s technique, (e) Yeh’s algorithm, and (f) the proposed technique.

other two contour-based methods: Loupias’s and Yeh’s approaches. Therefore, the proposed algorithm can be expected to perform well for applications where true interest points must correspond to image contours or object


Figure 16: Interest points detected from the binary image of letter “h” by (a) the SIFT algorithm and (b) the proposed algorithm.

Table 1: Effect of threshold on the number of detected points.

Threshold         1     0.9   0.8   0.7   0.6   0.5   0.4   0.3   0.2   0.1
Harris method     303   308   314   322   326   334   346   362   377   391
Proposed method   303   303   303   303   303   303   303   303   303   303

boundaries for further processing. As an example, for a binary image, interest points should not be found in a uniform region of constant intensity, that is, in either the background or the foreground. Rather, interest points must lie only on the edges. As shown in Figure 16, for the binary image of the letter “h”, the SIFT algorithm detects feature points in the uniform region, whereas the proposed method, as a contour-based technique, extracts interest points only from the edges. This illustrates how the performance of different algorithms varies with the type of image and/or application.

4. CONCLUSION

This research presents a robust, rotation invariant, and scale-invariant corner detection scheme for images based on morphological edge detection, the eigenvectors of covariance matrices for boundary segment points, and the 1D EMD. We modify an existing morphological edge detection scheme to yield thin edges and eliminate spurious edges resulting from the background. The main contribution of this work is the utilization of the first IMF of the EMD of the 1D θ−P signal of the edge to localize true corner points on boundary contours. Under appropriate image resolution and region of support, the proposed approach precisely captures the true corner points and is free from false alarms on circular arcs for both simple and complicated objects under varying rotation, scale, and noise conditions.

An interesting attribute of this technique is that it does not detect feature points globally. Rather, it detects feature points locally, based on the neighboring characteristics of a small edge segment. This makes the presented method less dependent on image transformations than the other five methods considered for comparison. Thus, interest points detected by the proposed method are largely independent of the imaging conditions; that is, the detected points are geometrically stable.

Additionally, the proposed technique requires neither the computationally expensive 2D EMD nor the calculation of all the IMFs of the 1D EMD. Only the first IMF of the 1D EMD is required. Experimental results also suggest that the proposed 1D EMD-based corner detection approach is stable and efficient. The proposed method is a generic concept and can find application in many matching and recognition problems.

REFERENCES

[1] A. Bandera, C. Urdiales, F. Arrebola, and F. Sandoval, “Corner detection by means of adaptively estimated curvature function,” Electronics Letters, vol. 36, no. 2, pp. 124–126, 2000.

[2] R. Rattarangsi and R. T. Chin, “Scale-based detection of corners of planar curves,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 4, pp. 430–449, 1992.

[3] R. Horaud, F. Veillon, and T. Skordas, “Finding geometric and relational structures in an image,” in Proceedings of the 1st European Conference on Computer Vision (ECCV ’90), pp. 374–384, Antibes, France, April 1990.

[4] E. Shilat, M. Werman, and Y. Gdalyahu, “Ridge’s corner detection and correspondence,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’97), pp. 976–981, San Juan, Puerto Rico, USA, June 1997.

[5] F. Mokhtarian and R. Suomela, “Robust image corner detection through curvature scale space,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1376–1381, 1998.

[6] A. Pikaz and I. Dinstein, “Using simple decomposition for smoothing and feature point detection of noisy digital curves,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 8, pp. 808–813, 1994.

[7] A. Bruce and H. Y. Gao, Applied Wavelet Analysis with SPLUS, Springer, New York, NY, USA, 1996.

[8] Y.-N. Sun, J.-S. Lee, and C. T. Tsai, “Multiscale corner detection by using wavelet transformation,” IEEE Transactions on Image Processing, vol. 4, no. 4, pp. 100–104, 1995.

[9] C.-H. Chen, J.-S. Lee, and Y.-N. Sun, “Wavelet transformation for gray-level corner detection,” Pattern Recognition, vol. 28, no. 6, pp. 853–861, 1995.

[10] A. Quddus and M. M. Fahmy, “Fast wavelet-based corner detection technique,” Electronics Letters, vol. 35, no. 4, pp. 287–288, 1999.

[11] A. Quddus and M. Gabbouj, “Wavelet-based corner detection technique using optimal scale,” Pattern Recognition Letters, vol. 23, no. 1–3, pp. 215–220, 2002.

[12] J. P. Hua and Q. M. Liao, “Wavelet-based multi scale corner detection,” in Proceedings of the 5th International Conference on Signal Processing (ICSP ’00), pp. 341–344, Beijing, China, August 2000.

[13] E. Loupias, N. Sebe, S. Bres, and J.-M. Jolion, “Wavelet-based salient points for image retrieval,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’00), vol. 2, pp. 518–521, Vancouver, BC, Canada, September 2000.

[14] M. S. Lew, E. Loupias, T. S. Huang, Q. Tian, and N. Sebe, “Image retrieval using wavelet-based salient points,” Journal of Electronic Imaging, vol. 10, no. 4, pp. 835–849, 2001.

[15] C.-H. Yeh, “Wavelet-based corner detection using eigenvectors of covariance matrices,” Pattern Recognition Letters, vol. 24, no. 15, pp. 2797–2806, 2003.


[16] K. Rohr, “Recognizing corners by fitting parametric models,” International Journal of Computer Vision, vol. 9, no. 3, pp. 213–230, 1992.

[17] R. Deriche and T. Blaszka, “Recovering and characterizing image features using an efficient model based approach,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’93), pp. 530–535, New York, NY, USA, June 1993.

[18] S. Baker, S. K. Nayar, and H. Murase, “Parametric feature detection,” International Journal of Computer Vision, vol. 27, no. 1, pp. 27–50, 1998.

[19] L. Parida, D. Geiger, and R. Hummel, “Junctions: detection, classification, and reconstruction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 687–698, 1998.

[20] H. P. Moravec, “Towards automatic visual obstacle avoidance,” in Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI ’77), p. 584, Cambridge, Mass, USA, August 1977.

[21] P. R. Beaudet, “Rotationally invariant image operators,” in Proceedings of the 4th International Joint Conference on Pattern Recognition (ICPR ’78), pp. 579–583, Kyoto, Japan, November 1978.

[22] L. Kitchen and A. Rosenfeld, “Gray-level corner detection,” Pattern Recognition Letters, vol. 1, no. 2, pp. 95–102, 1982.

[23] W. Forstner, “A framework for low level feature extraction,” in Proceedings of the 3rd European Conference on Computer Vision (ECCV ’94), pp. 383–394, Stockholm, Sweden, May 1994.

[24] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proceedings of the 4th Alvey Vision Conference (AVC ’88), pp. 147–151, Manchester, UK, September 1988.

[25] C. Tomasi and T. Kanade, “Detection and tracking of point features,” Tech. Rep. CMU-CS-91-132, Carnegie Mellon University, Pittsburgh, Pa, USA, 1991.

[26] J. Shi and C. Tomasi, “Good features to track,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’94), pp. 593–600, Seattle, Wash, USA, June 1994.

[27] F. Heitger, L. Rosenthaler, R. von der Heydt, E. Peterhans, and O. Kubler, “Simulation of neural contour mechanism: from simple to end-stopped cells,” Vision Research, vol. 32, no. 5, pp. 963–981, 1992.

[28] S. M. Smith and J. M. Brady, “SUSAN—a new approach to low level image processing,” International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.

[29] R. Laganiere, “Morphological corner detection,” in Proceedings of the 6th IEEE International Conference on Computer Vision (ICCV ’98), pp. 280–285, Bombay, India, January 1998.

[30] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[31] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[32] K.-K. Chin and J. Saniie, “Morphological processing for feature extraction,” in Image Algebra and Morphological Image Processing IV, vol. 2030 of Proceedings of SPIE, pp. 288–302, San Diego, Calif, USA, July 1993.

[33] D.-M. Tsai, H.-T. Hou, and H.-J. Su, “Boundary-based corner detection using eigenvalues of covariance matrices,” Pattern Recognition Letters, vol. 20, no. 1, pp. 31–40, 1999.

[34] C. Schmid, R. Mohr, and C. Bauckhage, “Evaluation of interest point detectors,” International Journal of Computer Vision, vol. 37, no. 2, pp. 151–172, 2000.

[35] J. Canny, “A computational approach to edge detection,”IEEE Transactions on Pattern Analysis and Machine Intelligence,vol. 8, no. 6, pp. 679–698, 1986.

[36] S. Sinclair and G. G. S. Pegram, “Empirical mode decompo-sition in 2-D space and time: a tool for space-time rainfallanalysis and nowcasting,” Hydrology and Earth System Sciences,vol. 9, no. 3, pp. 127–137, 2005.

[37] J. Serra, Image Analysis and Mathematical Morphology, Aca-demic Press, New York, NY, USA, 1982.

[38] S. U. Lee, S. Y. Chung, and R. H. Park, “A comparativeperformance study of several global thresholding techniquesfor segmentation,” Computer Vision, Graphics and ImageProcessing, vol. 52, no. 2, pp. 171–190, 1990.

[39] D. Marr and E. Hildreth, “Theory of edge detection,” Proceed-ings of the Royal Society of London B, vol. 207, no. 1167, pp.187–217, 1980.

[40] X. Chen, J. Yang, J. Zhang, and A. Waibel, “Automaticdetection and recognition of signs from natural scenes,” IEEETransactions on Image Processing, vol. 13, no. 1, pp. 87–99,2004.

[41] A. K. Jain and B. Yu, “Automatic text location in images andvideo frames,” Pattern Recognition, vol. 31, no. 12, pp. 2055–2076, 1998.

[42] V. Kastrinaki, M. Zervakis, and K. Kalaitzakis, “A survey ofvideo processing techniques for traffic applications,” Imageand Vision Computing, vol. 21, no. 4, pp. 359–381, 2003.

[43] Y. M. Y. Hasan and L. J. Karam, “Morphological text extractionfrom images,” IEEE Transactions on Image Processing, vol. 9,no. 11, pp. 1978–1983, 2000.

[44] H. Liang, Q.-H. Lin, and J. D. Z. Chen, “Application of theempirical mode decomposition to the analysis of esophagealmanometric data in gastroesophageal reflux disease,” IEEETransactions on Biomedical Engineering, vol. 52, no. 10, pp.1692–1701, 2005.

[45] D. Rouvre, D. Kouame, F. Tranquart, and L. Pourcelot,“Empirical mode decomposition (EMD) for multi-gate,multi-transducer ultrasound Doppler fetal heart monitoring,”in Proceedings of the 5th IEEE International Symposium onSignal Processing and Information Technology (ISSPIT ’05), pp.208–212, Athens, Greece, December 2005.

[46] Z. Liu, H. Wang, and S. Peng, “Texture classification throughdirectional empirical mode decomposition,” in Proceedings ofthe 17th International Conference on Pattern Recognition (ICPR’04), vol. 4, pp. 803–806, Cambridge, UK, August 2004.

[47] Md. K. I. Molla and K. Hirose, “Single-mixture audio sourceseparation by subspace decomposition of Hilbert spectrum,”IEEE Transactions on Audio, Speech and Language Processing,vol. 15, no. 3, pp. 893–900, 2007.

[48] T. Zhu, “Suspicious financial transaction detection based onempirical mode decomposition method,” in Proceedings ofIEEE Asia-Pacific Conference on Services Computing (APSCC’06), pp. 300–304, Guangzhou, China, December 2006.

[49] R. C. Gonzalez and R. E. Woods, Digital Image Processing,Addison Wesley, Reading, Mass, USA, 1993.


Research Article

Empirical Mode Decomposition Method Based on Wavelet with Translation Invariance

Qin Pinle,1, 2 Lin Yan,1, 2 and Chen Ming1

1 School of Electrical and Information Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China

2 Ship CAD Engineering Center, Dalian University of Technology, Dalian 116024, Liaoning, China

Correspondence should be addressed to Qin Pinle, [email protected]

Received 20 August 2007; Revised 11 February 2008; Accepted 10 April 2008

Recommended by Nii Attoh-Okine

To address the mode mixing problem caused by intermittent signals in empirical mode decomposition (EMD), a novel filtering method is proposed in this paper. In this method, the original data are pretreated with wavelet denoising to avoid mode mixing in the subsequent EMD procedure. Because traditional wavelet threshold denoising may exhibit pseudo-Gibbs phenomena in the neighborhood of discontinuities, we use a translation invariance algorithm to suppress these artifacts. The processed signal is then decomposed into intrinsic mode functions (IMFs) by EMD. The numerical results show that the proposed method effectively avoids mode mixing and retains the useful information.

Copyright © 2008 Qin Pinle et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Empirical mode decomposition (EMD), a nonlinear technique, has become increasingly popular as a tool for time-frequency analysis [1]. The essence of EMD is to decompose a time-varying data series into a finite set of functions called intrinsic mode functions (IMFs). The extracted IMFs represent the local character of the original data. Furthermore, by applying the Hilbert transform to the IMFs, the method yields instantaneous frequency and instantaneous amplitude. This procedure is called the Hilbert-Huang transform (HHT). Despite the success of this analysis tool over the past few years [2–6], some aspects still need improvement. Simulations have shown that straightforward application of EMD may run into mode mixing when the data contain intermittency, in which case the IMFs lose their intrinsic physical meaning. A suitable way to eliminate mode mixing is therefore needed. To solve this problem, Huang et al. [7] introduced a criterion based on period length to separate waves of different periods into different modes, but the detailed procedure was not presented. Zhao [8] used three corresponding characteristics of the abnormal signal, shared between the original data and the first IMF, to determine the start and end positions of an abnormal signal, which is then removed directly. However, this approach is suitable only for abnormal signals of short duration. Li et al. [9] used wavelets to avoid mode mixing, but did not take into account the artifacts caused by the wavelet transform.

In this paper, to overcome mode mixing, we first process the original signal by combining the wavelet transform with a translation invariance algorithm, which suppresses the artifacts caused by the wavelet transform. We then apply empirical mode decomposition to the processed signal. In this way, the mode mixing phenomenon is eliminated. Finally, simulations and an analysis of real data illustrate the effectiveness of the proposed method.

2. EMPIRICAL MODE DECOMPOSITION

The empirical mode decomposition (EMD) technique was developed to analyze the time-frequency distribution of nonlinear and nonstationary data. It is an adaptive decomposition with which any complicated signal can be decomposed into its intrinsic mode functions (IMFs). IMFs satisfy the following two constraints.

(i) Over the whole signal segment, the number of extrema (maxima and minima of the dataset) and the number of zero crossings must either be equal or differ by at most one.

(ii) At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

Figure 1: (a) Decomposition of the signal of formula (2); (b) decomposition of the same signal with an intermittent component. Rows in (a): Signal, Imf1, Imf2, Res.; rows in (b): Signal, Imf1, Imf2, Imf3, Res.

In practice, most signals involve more than one oscillatory mode; that is, the signal locally has more than one instantaneous frequency at a time. Assuming that any data consist of different simple IMFs, EMD decomposes a signal into IMF components, each of which has a unique local frequency. A time series $x(t)$ can be decomposed by EMD as follows [1, 10].

(1) Identify all the maxima and minima of $x(t)$.

(2) Generate its upper and lower envelopes, $x_{\mathrm{up}}(t)$ and $x_{\mathrm{low}}(t)$, with cubic spline interpolation.

(3) Compute the local mean $m(t) = (x_{\mathrm{up}}(t) + x_{\mathrm{low}}(t))/2$.

(4) Extract the detail, $g(t) = x(t) - m(t)$.

(5) Check whether $g(t)$ is an IMF:

(5.1) if $g(t)$ is an IMF according to the definition of an IMF, extract it as an IMF and replace $x(t)$ with the residual $r(t) = x(t) - g(t)$;

(5.2) if $g(t)$ is not an IMF, further sifting is needed; replace $x(t)$ with $g(t)$.

(6) Repeat steps (1)–(5) until the residual satisfies some stopping criterion.
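As a rough illustration of steps (1) to (6), the sketch below implements the sifting loop in Python with NumPy and SciPy. It is a minimal sketch under stated assumptions rather than the authors' implementation: the function names (`local_extrema`, `mean_envelope`, `emd`), the pinning of the envelopes to the end samples, the limits `max_imfs` and `max_sift`, and the relative-change stopping rule with tolerance `tol` are all choices made here for illustration, because the text leaves the IMF check and the stopping criterion unspecified.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(x):
    """Step (1): indices of the interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def mean_envelope(x):
    """Steps (2)-(3): mean of the cubic-spline envelopes, or None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Assumption: both envelopes are pinned to the end samples to limit overshoot.
    up_idx = np.concatenate(([0], maxima, [len(x) - 1]))
    low_idx = np.concatenate(([0], minima, [len(x) - 1]))
    upper = CubicSpline(up_idx, x[up_idx])(t)
    lower = CubicSpline(low_idx, x[low_idx])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=8, max_sift=50, tol=0.05):
    """Decompose x into a list of IMFs and a residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and mean_envelope(residue) is not None:
        g = residue.copy()
        for _ in range(max_sift):
            m = mean_envelope(g)
            if m is None:
                break
            # Assumed IMF check: stop sifting once the removed mean is small
            # relative to the current detail.
            done = np.sum(m ** 2) <= tol * np.sum(g ** 2)
            g = g - m                      # step (4); step (5.2) loops again
            if done:
                break
        imfs.append(g)                     # step (5.1): accept g as an IMF
        residue = residue - g              # and continue with the residual
    return imfs, residue
```

By construction, the IMFs and the residue returned by `emd` add back up to the input, which is the representation the decomposition is meant to provide.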

The sifting process continues until no more IMFs can be extracted. At the end of the decomposition, the signal $x(t)$ is represented as

$$x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t), \tag{1}$$

where $N$ is the number of IMFs, $c_j$ denotes the $j$th IMF, and $r_N(t)$ is the residue, which is a constant, a monotonic function, or a function with only one maximum and one minimum, from which no more IMFs can be derived.

Applying the above EMD procedure decomposes the time series into a set of IMFs and a residue. By applying the Hilbert transform to each IMF, we can further analyze the signal and calculate the instantaneous frequency of each transformed IMF. The whole process is called the Hilbert-Huang transform (HHT) [1].
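The Hilbert step can be sketched as follows. This is an illustrative sketch, not the procedure of [1]: it builds the analytic signal with the FFT (a common construction that is assumed here), and the function names and the sampling-rate parameter `fs` are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal z = x + i*H[x], obtained by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        weights[1:n // 2] = 2.0        # positive frequencies are doubled
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_amplitude_frequency(imf, fs):
    """Instantaneous amplitude and frequency (in Hz) of one IMF sampled at fs Hz."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # The frequency is the time derivative of the phase, divided by 2*pi.
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency
```

For a pure tone that completes a whole number of cycles in the window, the amplitude is constant and the frequency equals the tone frequency at every sample, which is a convenient sanity check before applying the functions to real IMFs.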

3. EFFECT OF INTERMITTENCY POINT TO EMD

The EMD method has been applied widely in many areas, which shows its effectiveness. Yet straightforward application of the sifting method may run into difficulties, especially when the original data contain intermittency, which causes mode mixing: the first IMF contains information from the intermittent signal and therefore cannot exhibit the normal frequency content. Once mode mixing occurs in the first IMF, the subsequent IMFs are affected as well.

Consider the data $s(t)$ given in formula (2). Figure 1(a) shows the result of decomposing $s(t)$ by straightforward EMD, and Figure 1(b) shows the result of decomposing $s(t)$ with an added intermittent signal:

$$s(t) = \sin(2\pi \times 5t) + 2\cos(2\pi \times 10t) + 3. \tag{2}$$

In Figure 1(b), the first IMF includes the frequency of the intermittent signal; that is, mode mixing occurs in the part containing the intermittent signal. As a result, the subsequent IMFs also contain seriously mixed modes. To quantify this, the root mean square error (RMSE), given by formula (3), is adopted as an evaluation criterion:

$$\mathrm{RMSE} = \sqrt{\sum_{t} \bigl(\hat{x}(t) - x(t)\bigr)^{2} / T}, \tag{3}$$

where $\hat{x}(t)$ denotes the decomposed IMF data, $x(t)$ denotes the real signal data, and $T$ is the length of the time series.
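Formula (3) translates directly into a few lines of code. The helper below is a small sketch; the name `rmse` and its argument names are chosen here for illustration.

```python
import math


def rmse(estimate, reference):
    """Root mean square error between a decomposed component and the true one."""
    if len(estimate) != len(reference):
        raise ValueError("both series must have the same length T")
    T = len(reference)
    squared_error = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(squared_error / T)
```

An IMF that matches the true component exactly gives an RMSE of zero, and larger values indicate a poorer match.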

The RMSE values of the IMFs are summarized in Table 1. Table 1 shows that the errors between the IMF components and the real values become larger because of the intermittent signal. Since the mode mixing caused by intermittency is inevitable, it is worthwhile to explore a method that solves the problem.

4. THE SOLUTION TO INTERMITTENCY PROBLEM

Table 1: RMSE of EMD of simulation signal.

| | IMF1 | IMF2 | Residue |
|---|---|---|---|
| Normal signal | 0.0064 | 0.0534 | 0.0720 |
| Abnormal signal | 0.3752 | 1.1341 | 1.021 |

Figure 2: EMD result of the signal after traditional wavelet denoising. Rows: Signal, Imf1, Imf2, Imf3, Imf4, Res.

In the last few years, the wavelet transform has become a well-accepted time-frequency analysis tool, and there has been considerable interest in the use of wavelet transforms for removing noise from signals. One method is transform-based thresholding, which works in three steps:

(i) transform the noisy data into the wavelet domain to obtain a set of wavelet coefficients,

(ii) apply a soft or hard threshold to the resulting coefficients, thereby suppressing those coefficients smaller than a certain amplitude, to obtain a set of estimated coefficients, and

(iii) transform back into the original domain.
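The three steps can be sketched with an orthonormal Haar transform. This is an illustrative sketch, not the authors' code: the function names are hypothetical, hard thresholding is used for step (ii), the threshold `lam` is supplied by the caller, and the signal length is assumed to be divisible by `2 ** levels`.

```python
import numpy as np


def haar_forward(x, levels):
    """Step (i): multi-level Haar transform; returns (approximation, details)."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_inverse(approx, details):
    """Step (iii): invert haar_forward, from the coarsest level to the finest."""
    for d in reversed(details):
        even = (approx + d) / np.sqrt(2.0)
        odd = (approx - d) / np.sqrt(2.0)
        approx = np.empty(2 * len(d))
        approx[0::2], approx[1::2] = even, odd
    return approx


def threshold_denoise(x, levels, lam):
    """Steps (i)-(iii) with a hard threshold lam on the detail coefficients."""
    approx, details = haar_forward(x, levels)
    # Step (ii): keep a coefficient only if its magnitude reaches the threshold.
    details = [np.where(np.abs(d) >= lam, d, 0.0) for d in details]
    return haar_inverse(approx, details)
```

A step that is aligned with the Haar blocks (for example at the midpoint of the signal) produces no detail coefficients at all, so it passes through the thresholding unchanged whatever the value of `lam`.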

The wavelet transform can detect and characterize singularities in signals, so it offers a criterion for classifying and identifying signals. Several studies have compared the wavelet and EMD methods [11, 12]. We attempted to use wavelet denoising to eliminate mode mixing, but simulations show that denoising with the traditional wavelet transform can exhibit pseudo-Gibbs phenomena in the neighborhood of discontinuities, which still cause mode mixing. To illustrate, applying straightforward wavelet denoising in the Haar basis to the signal of Figure 1(b) and then applying EMD yields the components shown in Figure 2, in which the first two IMF components contain seriously mixed modes. Evidently, the pseudo-Gibbs oscillations caused by wavelet denoising in the vicinity of discontinuities can lead to mode mixing. One method for suppressing pseudo-Gibbs phenomena is the translation-invariant wavelet transform of Coifman and Donoho [13].

4.1. Wavelet threshold denoising based on translation invariance

In the neighborhood of discontinuities, traditional wavelet denoising can exhibit pseudo-Gibbs phenomena. An important observation is that the size of the pseudo-Gibbs oscillations depends mainly on the location of a discontinuity in the signal. For example, with Haar wavelets as the basis, a discontinuity located at $n/2$ gives no pseudo-Gibbs oscillations, whereas a discontinuity near $n/3$ leads to significant pseudo-Gibbs oscillations. The essential reason is the misalignment between the signal data and the basis [14, 15].

One possible way to correct the misalignment between the data and the basis is to forcibly shift the data so that the discontinuities change position: the shifted signal does not exhibit the pseudo-Gibbs phenomena, and after denoising the data can be shifted back. Unfortunately, we do not know the location of the discontinuity. One way to handle this is optimization: define a measure of the artifacts and minimize it by a proper choice of the shift. However, there is no guarantee that a single shift will always suffice. If the signal has several discontinuities, they may interfere with each other; that is, the best shift for one discontinuity may be the worst for another. Another reasonable approach is the translation invariant algorithm, which applies a range of shifts, denoises each shifted signal by wavelet thresholding, unshifts the results, and averages them to produce the reconstruction. Consequently, the shift dependence of the wavelet basis is eliminated. This method effectively suppresses the artifacts, so that the denoised signal is smoother and better approximates the original signal.

For a signal $x_t$ ($0 \le t < n$), let $S_h$ denote the circulant shift by $h$, which can be written explicitly as

$$(S_h x)_t = x_{(t+h) \bmod n}. \tag{4}$$

The operator is unitary, and hence invertible:

$$(S_h)^{-1} = S_{-h}. \tag{5}$$

Let $T$ denote the process of wavelet transform and threshold denoising. Eliminating the oscillations by translation is then expressed as

$$x' = S_{-h}\bigl(T(S_h(x))\bigr). \tag{6}$$

We then apply a range of shifts and average the results. For time shifts, we consider a range $H$ of shifts and set

$$x' = \operatorname{AVE}_{h \in H}\bigl\{S_{-h}\bigl(T(S_h(x))\bigr)\bigr\}, \tag{7}$$

or, in words: shift, denoise, unshift, and average the results.

For comparison with the method of [8], we also consider computational efficiency. The method can be computed rapidly, in $n\log(n)$ time [13], from which we conclude that it is more efficient than the method of [8].
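The shift, denoise, unshift, and average recipe of formulas (4) to (7) can be sketched as below. This is a direct, unoptimized illustration that calls the denoiser once per shift, not the fast $n\log(n)$ algorithm of [13]. The names `shift`, `ti_denoise`, and `pair_average` are hypothetical, and `pair_average` is only a stand-in for $T$ (a one-level Haar transform with every detail coefficient removed) used to keep the example short.

```python
import numpy as np


def shift(x, h):
    """Circulant shift S_h of formula (4): (S_h x)_t = x_{(t + h) mod n}."""
    return np.roll(x, -h)


def ti_denoise(x, denoise, shifts=None):
    """Formula (7): average of S_{-h}(T(S_h(x))) over the shifts h in H."""
    x = np.asarray(x, dtype=float)
    if shifts is None:
        shifts = range(len(x))   # fully translation-invariant: H = {0, ..., n-1}
    total = np.zeros_like(x)
    for h in shifts:
        total += shift(denoise(shift(x, h)), -h)   # formula (6) for one shift
    return total / len(shifts)


def pair_average(x):
    """Stand-in for T: replace each pair of samples by its mean (even length)."""
    means = 0.5 * (x[0::2] + x[1::2])
    return np.repeat(means, 2)
```

On its own, `pair_average` depends on where the pairs fall relative to a discontinuity. After averaging over all $n$ shifts the result no longer depends on that alignment: shifting the input simply shifts the output by the same amount.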

Choosing a suitable wavelet basis is difficult, and an unsuitable basis function may reduce denoising efficiency. Fortunately, extensive simulations of translation invariance denoising show that, when the signal includes intermittency, the Haar basis effectively eliminates the pseudo-Gibbs oscillations in the neighborhood of discontinuities [16, 17]. For comparison, in this paper both the traditional denoising method and the denoising method based on the translation invariance algorithm use Haar as the basis function.

Figure 3: Comparison of two denoising methods. Panels: (a) original signal; (b) signal with noise; (c) threshold filter; (d) translation-invariant threshold filter. Horizontal axis: sample index, 0 to 1000.

To evaluate the effectiveness of the approach, we use a random square wave as the experimental signal and compare the traditional threshold algorithm with the wavelet transform based on the translation invariance algorithm. Both methods decompose the signal with the Haar wavelet basis to three levels and use soft threshold denoising, which is defined by [17]

$$\hat{w}_{j,k} = \begin{cases} \operatorname{sgn}(w_{j,k})\bigl(|w_{j,k}| - \lambda\bigr), & |w_{j,k}| \ge \lambda, \\ 0, & |w_{j,k}| < \lambda, \end{cases} \tag{8}$$

where $\operatorname{sgn}(\cdot)$ is the sign function. One method of choosing $\lambda$ is formula (9):

$$\lambda = \delta\sqrt{2\log N}, \tag{9}$$

where $\delta \approx M_x/0.6745$ and $M_x$ is the median absolute value estimated on the first scale.
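Formulas (8) and (9) can be written compactly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the logarithm is taken to be the natural logarithm, and $N$ is taken to be the number of samples in the signal, since the text does not define either explicitly.

```python
import numpy as np


def soft_threshold(w, lam):
    """Formula (8): shrink each coefficient towards zero by lam."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def universal_threshold(first_scale_details, n):
    """Formula (9): lam = delta * sqrt(2 log n), delta = median(|d|) / 0.6745."""
    # Assumption: n is the signal length and log is the natural logarithm.
    delta = np.median(np.abs(first_scale_details)) / 0.6745
    return delta * np.sqrt(2.0 * np.log(n))
```

Coefficients whose magnitude is below `lam` are set to zero, and the remaining ones are moved towards zero by `lam`, which is what distinguishes the soft threshold from the hard one.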

Figure 3 compares the traditional soft threshold with the soft threshold based on translation invariance. Here, we average over all $n$ circulant shifts, $H = H_n = \{h : 0 \le h < n\}$, which is called fully translation-invariant denoising [12]. A benefit of the fully translation-invariant approach is that there are no arbitrary parameters to set: one does not have to decide whether to average over 10 or 20 shifts.

Figure 3 shows that denoising with translation invariance performs better than traditional threshold denoising: the pseudo-Gibbs phenomenon is effectively eliminated and the signal curve is smoother.

Table 2: Correlation coefficients of two results with original signal.

| | Traditional threshold | TI-threshold |
|---|---|---|
| R | 0.8271 | 0.9878 |

Table 3: RMSE of proposed method.

| | IMF2 | IMF3 | Residue |
|---|---|---|---|
| RMSE | 0.0203 | 0.0981 | 0.0954 |

Using the correlation coefficient (R) as an evaluation criterion, Table 2 compares the results of the two methods with the original signal. These data also show that the fully translation-invariant threshold is better than the traditional threshold when the data include discontinuity points.

We therefore conclude that the TI-threshold wavelet method better handles the problems that intermittent signals cause for the wavelet transform.

4.2. EMD with translation invariance wavelet transform

An intermittent signal is characterized by sharp variation, so we first pretreat the original signal with wavelet denoising based on translation invariance, which eliminates the mode mixing caused by discontinuities. We then use EMD to extract the IMF components, obtaining a new set of modes. In this way, the validity of the EMD method is preserved.

The flowchart of the proposed method is shown in Figure 4.

Figure 5 shows the result of processing the signal of Figure 1(b) with the proposed method. The mode mixing phenomenon is effectively eliminated.

Table 3 shows the RMSE obtained with the proposed method. The above analysis shows that translation invariance wavelet denoising makes the denoised signal fit the real signal better without blurring important signal features. The decomposition no longer exhibits mode mixing, and the IMFs obtained are closer to the real values.

5. APPLICATION TO REAL TEMPERATURE DATA

To validate the feasibility of the proposed method, we analyze the same data as Zhao [8].

In Figure 6, the top curve is the original signal, the 1000 hPa monthly averaged air temperature from 1958 to 1996 in Barrow, AK, USA. In some months there are several persistent days of high or low temperature that do not occur in other months; these give rise to several local maxima, which produce frequency mode mixing under the EMD method. Figure 6 shows that the abnormal data not only affect the high-frequency results but also influence the multiyear variation signal, which is attributable to the effect of intermittency on all IMFs in empirical mode decomposition. An intermittent signal may spread to every IMF in the course of EMD, distorting the whole result. It is therefore necessary to pretreat original data containing intermittency before processing them.

Figure 4: The flowchart of the proposed method. The flowchart first loops over shift data, wavelet transform, de-noise, wavelet reverse transform, and unshift data until the cycle count N is reached, and then averages the results. The averaged signal is passed to the EMD sifting loop (local maxima and minima extraction, upper and lower envelope fits by spline interpolation, mean envelope m(t), and g(t) = x(t) − m(t)), which runs until the stopping criteria are met.

Figure 5: The EMD result of the proposed method. Rows: Signal, Imf1, Imf2, Imf3, Imf4.

Figure 7 displays the results of applying the proposed method to the original signal data. When the original data are pretreated with the translation invariance wavelet method, the abnormal disturbance is efficiently suppressed. Decomposing the new dataset by EMD then yields a set of IMFs with a more reasonable physical significance. In this way, the validity of the EMD method is preserved.

To study the signal features of exceptional temperature further, we can remove the residue (res. in Figure 7) and the climatic period variation (imf2 in Figure 7), and then reconstruct the signal.

6. CONCLUSION

In this paper, we analyze the effect of intermittency on the EMD method and point out that applying EMD directly to a signal with intermittency produces mode mixing. Wavelet thresholding is an appropriate method for multiscale signal analysis, but it gives rise to pseudo-Gibbs phenomena at intermittent points, which affect the empirical mode decomposition. We therefore adopt the translation invariance algorithm to eliminate the artifacts and then proceed to empirical mode decomposition to obtain IMF components with genuine physical meaning. Theoretical analysis and the given example show the following.

(1) The proposed method, which combines empirical mode decomposition and wavelet denoising based on the translation invariance algorithm, effectively eliminates the mode mixing caused by intermittency.

(2) The method for removing mode mixing in this paper has complexity O(n log n) [13], which makes it more efficient than the method of [8].


Figure 6: Decomposition results by straightforward EMD for averaged air temperature. Rows: Signal, Imf1, Imf2, Imf3, Imf4, Imf5, Imf6, Res. Horizontal axis: year, 1960 to 1990.

Figure 7: EMD results of the method in this paper. Rows: Signal, Imf1, Imf2, Imf3, Imf4, Imf5, Res. Horizontal axis: year, 1960 to 1990.

ACKNOWLEDGMENT

The authors would like to thank Professor Zhao Jinping of Ocean University of China for providing the “Barrow sounding balloon data.”

REFERENCES

[1] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[2] S.-X. Yang, J.-S. Hu, Z.-T. Wu, et al., “The comparison of vibration signal time-frequency analysis between EMD-based HTT and WT method in rotating machinery,” Proceedings of the CESS, vol. 23, no. 6, pp. 102–107, 2003.

[3] Z. K. Peng, P. W. Tse, and F. L. Chu, “An improved Hilbert-Huang transform and its application in vibration signal analysis,” Journal of Sound and Vibration, vol. 286, no. 1-2, pp. 187–205, 2005.

[4] R. Balocchi, D. Menicucci, E. Santarcangelo, et al., “Deriving the respiratory sinus arrhythmia from the heartbeat time series using empirical mode decomposition,” Chaos, Solitons & Fractals, vol. 20, no. 1, pp. 171–177, 2004.

[5] M. C. Ivan and G. B. Richard, “Empirical mode decomposition based time-frequency attributes,” in Proceedings of the 69th Annual Meeting of the Society of Exploration Geophysicists (SEG ’99), pp. 73–91, Houston, Tex, USA, November 1999.

[6] Md. Khademul Islam Molla, M. Sayedur Rahman, A. Sumi, and P. Banik, “Empirical mode decomposition analysis of climate changes with special reference to rainfall data,” Discrete Dynamics in Nature and Society, vol. 2006, Article ID 45348, 17 pages, 2006.

[7] N. E. Huang, Z. Shen, and S. R. Long, “A new view of nonlinear water waves: the Hilbert spectrum,” Annual Review of Fluid Mechanics, vol. 31, pp. 417–457, 1999.

[8] J. Zhao, “Study on the effects of abnormal events to empirical mode decomposition method and the removal method for abnormal signal,” Journal of Ocean University of Qingdao, vol. 31, no. 6, pp. 805–814, 2001.

[9] H. Li, L. Yang, and D. Huang, “The study of the intermittency test filtering character of Hilbert-Huang transform,” Mathematics and Computers in Simulation, vol. 70, no. 1, pp. 22–32, 2005.

[10] G. Rilling, P. Flandrin, and P. Goncalves, “On empirical mode decomposition and its algorithms,” in Proceedings of 6th IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP ’03), Grado, Italy, June 2003.

[11] H. T. Vincent, S.-L. J. Hu, and Z. Hou, “Damage detection using empirical mode decomposition method and a comparison with wavelet analysis,” in Proceedings of the 2nd International Workshop on Structural Health Monitoring, pp. 891–900, Stanford, Calif, USA, September 1999.

[12] Z.-Q. Gong, M.-W. Zou, X.-Q. Gao, and W.-J. Dong, “On the difference between empirical mode decomposition and wavelet decomposition in the nonlinear time series,” Acta Physica Sinica, vol. 54, no. 8, pp. 3947–3957, 2005.

[13] R. R. Coifman and D. L. Donoho, “Translation-invariant de-noising,” in Wavelets and Statistics, vol. 103 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1994.

[14] S. E. Kelly, “Gibbs phenomenon for wavelets,” Applied and Computational Harmonic Analysis, vol. 3, no. 1, pp. 72–81, 1996.

[15] H.-T. Shim and H. Volkmer, “On the Gibbs phenomenon for wavelet expansions,” Journal of Approximation Theory, vol. 84, no. 1, pp. 74–95, 1996.

[16] B. Tang, C. Yang, S. Tan, and S. Qin, “Denoise based on translation invariance wavelet transform and its applications,” Journal of Chongqing University, vol. 25, no. 3, pp. 1–5, 2002.

[17] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613–627, 1995.


Research Article

Improved EMD Using Doubly-Iterative Sifting and High Order Spline Interpolation

Yannis Kopsinis and Steve McLaughlin

Institute of Digital Communications, School of Engineering and Electronics, College of Science and Engineering, The University of Edinburgh, King’s Buildings, Edinburgh EH9 3JL, UK

Correspondence should be addressed to Yannis Kopsinis, [email protected]

Received 3 September 2007; Accepted 2 March 2008

Recommended by Daniel Bentil

Empirical mode decomposition (EMD) is a signal analysis method which has received much attention lately due to its application in a number of fields. The main disadvantage of EMD is that it lacks a theoretical analysis and, therefore, our understanding of EMD comes from an intuitive and experimental validation of the method. Recent research on EMD revealed improved criteria for the interpolation points selection. More specifically, it was shown that the performance of EMD can be significantly enhanced if, as interpolation points, instead of the signal extrema, the extrema of the subsignal having the higher instantaneous frequency are used. Even if the extrema of the subsignal with the higher instantaneous frequency are not known in advance, this new interpolation points criterion can be effectively exploited in doubly-iterative sifting schemes leading to improved decomposition performance. In this paper, the possibilities and limitations of the developments above are explored and the new methods are compared with the conventional EMD.

Copyright © 2008 Y. Kopsinis and S. McLaughlin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The empirical mode decomposition (EMD) method [1] is an algorithm for the analysis of multicomponent signals [2] that works by breaking them down into a number of amplitude and frequency modulated (AM/FM) zero mean signals, termed intrinsic mode functions (IMFs). In contrast to conventional decomposition methods, which perform the analysis by projecting the signal under consideration onto a number of predefined basis vectors, EMD expresses the signal as an expansion of basis functions which are signal-dependent, and are estimated via an iterative procedure called sifting. This attribute of EMD potentially leads to a number of merits. To name a few: it can be applied regardless of the nonstationary and/or nonlinear characteristics of the signal under consideration. The results are not prejudiced by a predetermined basis, a fact often leading to IMFs which preserve the physical meanings of the intrinsic processes underlying the signal. Moreover, the resulting IMFs are zero-mean narrowband functions well suited for meaningful instantaneous frequency (IF) estimates via the Hilbert transform or other alternative techniques [3]. As a result, EMD in conjunction with IF estimation offers an alternative path towards time-frequency signal representation.

The main drawback of EMD is the lack of a strong theoretical analysis capable of evaluating and predicting EMD behaviour in generalized signal conditions. However, recently, Rilling and Flandrin [4] have made some initial steps in this direction by theoretically analyzing the EMD outcomes in a two-tone signal case. Although the signal can be considered simplistic, the analysis resulted in important conclusions. It was shown that there is quite a wide range of two-tone combinations for which EMD results do not, at least directly, agree with intuition and physical interpretation. Revealing such a limitation should definitely be a target for future EMD variants.

The lack of theoretical developments on EMD has restricted the potential for improvements of the method itself. Literally, the current popular variation of EMD is roughly the same as the one proposed in the original EMD paper [1], with the attempts for development of novel variants being limited to less than a handful [5–8]. A recent study on EMD [9], even though it is not analytical, revealed specific aspects that offer insight into its performance. More specifically, information about improved criteria for interpolation points selection was extracted based on a genetic algorithm (GA) optimization approach. A recently presented novel EMD variant [8] called doubly-iterative EMD (DI-EMD) (DI-EMD Matlab functions can be downloaded from http://www.see.ed.ac.uk/~ykopsini/emd/emd.html) succeeds in estimating interpolation points in agreement with the optimized criteria derived in [9], leading to enhanced overall decomposition performance. In this paper, the performance and behaviour of DI-EMD is further investigated, mainly through the Rilling two-tone signal model [4].

2. EMD METHOD

EMD [1] adaptively decomposes a multicomponent signal [2] $x(t)$ into a number $K$ of zero-mean, narrowband IMFs $h^{(i)}(t)$, $1 \le i \le K$:

$$x(t) = \sum_{i=1}^{K} h^{(i)}(t). \tag{1}$$

Each one of the IMFs, say the $k$th one $h^{(k)}(t)$, is estimated with the aid of an iterative process, called sifting, applied to the residual multicomponent signal:

$$x^{(k)}(t) = \begin{cases} x(t), & k = 1, \\ x(t) - \sum_{i=1}^{k-1} h^{(i)}(t), & k \ge 2. \end{cases} \tag{2}$$

During the $n$th iteration of the sifting process, an estimate of the $k$th IMF is computed as follows:

$$h_n^{(k)}(t) = h_{n-1}^{(k)}(t) - m_{n-1}^{(k)}(t), \tag{3}$$

where $h_j^{(k)}(t)$ is the temporal estimate of the $k$th IMF at the $j$th iteration and $m_j^{(k)}(t)$ is an estimate of the local mean of $h_j^{(k)}(t)$.

Usually, the local mean $m_j^{(k)}(t)$ is estimated as the average of two envelopes, an upper envelope and a lower one, which enfold the corresponding IMF estimate $h_j^{(k)}(t)$. In general, the envelopes are constructed in agreement with the following algorithm.

First, some time instances $\boldsymbol{\tau}_u = [\tau_{u,1}, \ldots, \tau_{u,M}]$ and $\boldsymbol{\tau}_l = [\tau_{l,1}, \ldots, \tau_{l,L}]$, called nodes, which correspond to the upper and the lower envelope, respectively, are specified according to some criteria. These time instances indicate the positions $h_j^{(k)}(\boldsymbol{\tau}_u) = [h_j^{(k)}(\tau_{u,1}), \ldots, h_j^{(k)}(\tau_{u,M})]$ and $h_j^{(k)}(\boldsymbol{\tau}_l) = [h_j^{(k)}(\tau_{l,1}), \ldots, h_j^{(k)}(\tau_{l,L})]$ at which the upper and lower envelopes $I_{\tau_u}(t)$ and $I_{\tau_l}(t)$ “touch” the temporal IMF estimate, as can be seen in Figure 1. To achieve this, $h_j^{(k)}(\boldsymbol{\tau}_u)$ and $h_j^{(k)}(\boldsymbol{\tau}_l)$ serve as interpolation points in a piecewise polynomial interpolation scheme, usually cubic spline interpolation.

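Figure 1: Quantities related to the EMD method: the IMF estimate $h_j^{(k)}(t)$, its local mean $m_j^{(k)}(t)$, the upper and lower envelopes, and the interpolation points at the nodes $\tau_{l,i}$, $\tau_{u,i}$, and $\tau_{u,i+2}$.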

Finally, the current estimate of the local mean is given by

$$m_j^{(k)}(t) = \frac{I_{\tau_u}(t) + I_{\tau_l}(t)}{2}. \tag{4}$$

After $N$ sifting iterations, a number chosen either statically or dynamically according to specific criteria [10, 11], the sifting process is concluded and the $k$th IMF is set equal to $h^{(k)}(t) = h_N^{(k)}(t)$. Alternatively, the intermediate estimates of the local mean can be summed to form a total mean envelope:

$$M_N^{(k)}(t) = m_1^{(k)}(t) + \cdots + m_N^{(k)}(t), \tag{5}$$

which is actually an enhanced estimate of the local mean of the residual signal under consideration, $x^{(k)}(t)$. With this reformulation, the $k$th IMF can be obtained directly from the expression

$$h^{(k)}(t) = x^{(k)}(t) - M_N^{(k)}(t). \tag{6}$$

Equation (3) combined with (4), and equation (6) combined with (5) and (4), are equivalent descriptions of the sifting process [9].
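The equivalence of the two formulations can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the envelopes are piecewise linear rather than cubic splines, they are held constant outside the outermost nodes, the nodes are the local extrema, and the function names and the test signal are illustrative only. It sifts a signal with (3) and (4), accumulates the successive local means as in (5), and compares the result with (6).

```python
import math

def local_extrema(h):
    """Indices of strict local maxima and minima of a sampled signal."""
    maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] > h[i + 1]]
    minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] < h[i + 1]]
    return maxima, minima

def envelope(nodes, h):
    """Piecewise-linear curve through the points (tau, h[tau]), tau in nodes.

    Assumption: linear interpolation stands in for the cubic spline, and the
    envelope is held constant before the first and after the last node.
    """
    env, r = [], 1  # r indexes the node to the right of the current sample
    for t in range(len(h)):
        if t <= nodes[0]:
            env.append(h[nodes[0]])
        elif t >= nodes[-1]:
            env.append(h[nodes[-1]])
        else:
            while nodes[r] < t:
                r += 1
            t0, t1 = nodes[r - 1], nodes[r]
            env.append(h[t0] + (h[t1] - h[t0]) * (t - t0) / (t1 - t0))
    return env

def local_mean(h):
    """Equation (4): average of the upper and lower envelopes."""
    maxima, minima = local_extrema(h)
    if not maxima or not minima:
        return [0.0] * len(h)  # too few extrema: nothing to remove
    upper, lower = envelope(maxima, h), envelope(minima, h)
    return [(u + l) / 2 for u, l in zip(upper, lower)]

def sift(x, n_iter):
    """Apply equation (3) n_iter times; also return every local mean used."""
    h, means = list(x), []
    for _ in range(n_iter):
        m = local_mean(h)
        means.append(m)
        h = [hi - mi for hi, mi in zip(h, m)]
    return h, means

# Illustrative two-tone test signal (an assumption, not taken from the paper).
x = [math.cos(2 * math.pi * 0.05 * t) + 0.5 * math.cos(2 * math.pi * 0.004 * t)
     for t in range(400)]
h, means = sift(x, 10)
total_mean = [sum(column) for column in zip(*means)]   # equation (5)
direct = [xi - mi for xi, mi in zip(x, total_mean)]    # equation (6)
print(max(abs(a - b) for a, b in zip(h, direct)))      # tiny rounding error only
```

The identity holds for any choice of envelope, since (6) is just the telescoped form of (3); what the choice of nodes and interpolation changes is the quality of the resulting IMF.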

In other words, from (6) it can be inferred that EMD considers a signal $x^{(k)}(t)$ as fast oscillations $h^{(k)}(t)$ superimposed on slow oscillations $M_N^{(k)}(t)$ [12], and the sifting process aims to iteratively estimate the slow oscillating signal using (5). As a consequence, the $k$th IMF is an estimate of the fast oscillating component of the signal $x^{(k)}(t)$. Suppose, for example, that $x^{(k)}(t)$ consists of $K$ AM/FM signals:

$$x^{(k)}(t) = \sum_{i=1}^{K} \alpha_i(t) \cos\big(\phi_i(t)\big), \tag{7}$$

which have corresponding instantaneous frequencies (IF) $f_i(t)$. It turns out that, at each time instant, the sifting process tries to extract the signal, among those constituting $x^{(k)}(t)$, that has the highest instantaneous frequency. At the same time, although the extracted frequency modulated (FM) signal tends to be narrowband, it is not necessarily monocomponent, which permits the presence of amplitude modulation (AM). As a result, the fast oscillating signal that the sifting process tries to estimate with the IMF $h^{(k)}(t)$ is given by

$$s_f^{(k)}(t) = \alpha_j(t) \cos\big(\phi_j(t)\big), \tag{8}$$


Figure 2: (a) A transient signal having 256 samples, shown in the time domain. (b) The transient signal in the time-frequency plane. The horizontal axis of both panels is the sample index.

where $\phi_j(t)$ corresponds to $f_j(t) = \max\{f_1(t), \ldots, f_K(t)\}$. Note that $s_f^{(k)}(t)$ does not necessarily coincide with one of the $K$ signals comprising $x^{(k)}(t)$; it may consist of parts of these signals, depending on which of them has the highest instantaneous frequency at each time instant. Consequently, the “ideal” local mean of $x^{(k)}(t)$ is given by $s_s^{(k)}(t) = x^{(k)}(t) - s_f^{(k)}(t)$.
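The pointwise selection in (8) can be illustrated with a small sketch. The two components below (a constant tone and a linear chirp whose instantaneous frequencies cross at $t = 50$) and the helper name `fast_component` are hypothetical and chosen only for illustration; they are not the signals used in the paper.

```python
import math

def fast_component(amps, phases, freqs, t):
    """Return (j, value): index and value of the component with the highest IF at time t."""
    j = max(range(len(freqs)), key=lambda i: freqs[i](t))
    return j, amps[j](t) * math.cos(phases[j](t))

# Component 0: constant tone at 0.10 cycles/sample.
# Component 1: linear chirp with instantaneous frequency 0.05 + 0.001 t.
amps = [lambda t: 1.0, lambda t: 0.8]
freqs = [lambda t: 0.10, lambda t: 0.05 + 0.001 * t]
phases = [lambda t: 2 * math.pi * 0.10 * t,
          lambda t: 2 * math.pi * (0.05 * t + 0.0005 * t * t)]

for t in (10.0, 90.0):
    j, value = fast_component(amps, phases, freqs, t)
    print(t, j, value)
```

Before the crossing the fast oscillating signal follows the tone, and after it the chirp, so $s_f^{(k)}(t)$ is stitched together from parts of both components, as noted above.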

In turn, the slow oscillating part $x^{(k+1)}(t) = x^{(k)}(t) - h^{(k)}(t) = M_N^{(k)}(t)$ is further processed through a number of sifting iterations in order to separate it into a fast oscillating part (which is the next IMF $h^{(k+1)}(t)$) and a slow oscillating part, which serves as input to the next sifting process.

Figure 3: Three chirp signals and the corresponding IMF estimates.

2.1. Example of EMD application

One of the potential applications of EMD is the analysis of short-duration transient signals. The reason is that, essentially, the signal analysis performance of EMD does not depend on the length of the signal or the number of available samples. In principle, the only requirement is the availability of a large enough number of maxima and minima, where the required number depends on the order of the spline interpolation. As long as this requirement is fulfilled, EMD can deliver its full performance; otherwise, EMD cannot be applied.

The examined transient signal is the one shown in Figure 2(a), which consists of three nonlinear chirp signal components, depicted in the time-frequency plane in Figure 2(b). The three signal components in the time domain can be seen with solid lines in Figure 3.

When the standard EMD with cubic spline interpolation is used for the analysis of the above signal, the result is a number of IMFs, only three of which have significant energy. It turns out that these higher-energy IMFs, shown in Figure 3 with dashed lines, roughly coincide with the actual signal components.

Based on the instantaneous frequency estimates of the estimated IMFs, a quite accurate spectrogram can be drawn, as seen in Figure 4(a). It is important to note that the conventional short-time Fourier transform (STFT), regardless of the adopted window length, is not capable of producing such a sharp and detailed spectrogram, as can be seen in Figures 4(b) and 4(c). This is due to the transient nature of the specific signal. In other words, if the STFT window is long enough to include an adequate number of samples for reliable frequency resolution, the signal itself changes considerably within the time span of the window, leading to a poor time-frequency representation.
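The text does not spell out how the instantaneous frequencies of the IMFs are estimated. A common route in Hilbert-Huang analysis, assumed here, is to form the analytic signal of each IMF and differentiate its phase. The sketch below uses a direct $O(N^2)$ DFT to stay dependency-free and assumes an even number of samples; the function names and the test tone are illustrative.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal of a real sequence via a direct DFT (len(x) assumed even)."""
    n = len(x)
    spectrum = [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
                for k in range(n)]
    # Keep DC and Nyquist, double the positive frequencies, zero the negative ones.
    gain = [1.0 if k in (0, n // 2) else 2.0 if k < n // 2 else 0.0 for k in range(n)]
    return [sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)) / n
            for m in range(n)]

def instantaneous_frequency(x):
    """Phase increment of the analytic signal, in cycles per sample."""
    z = analytic_signal(x)
    return [cmath.phase(z[m + 1] * z[m].conjugate()) / (2 * math.pi)
            for m in range(len(z) - 1)]

# A pure tone at 0.125 cycles/sample with an integer number of periods.
tone = [math.cos(2 * math.pi * 0.125 * m) for m in range(64)]
inst_freq = instantaneous_frequency(tone)
print(min(inst_freq), max(inst_freq))
```

Plotting such a frequency track for every IMF against time, weighted by the IMF amplitude, gives a picture of the kind shown in Figure 4(a); for a tone with an integer number of periods the estimate is constant, while for real IMFs end effects should be expected.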

3. DOUBLY ITERATIVE EMD

Several variants of EMD can be formed by altering the way that the local mean is obtained. The local mean estimates are determined from the construction of the upper and the lower envelopes, which, as discussed above, basically comes down to the criteria adopted for node selection and the interpolation scheme used. In the standard version of EMD, usually employed in practice, the maxima and minima, referred to as local extrema, of the signal $h_j^{(k)}(t)$ (or $x^{(k)}(t)$ in the first iteration) are used as interpolation points and natural cubic splines are used for interpolation. More specifically, $\boldsymbol{\tau}_u = \{t : D^1 h_j^{(k)} = 0, D^2 h_j^{(k)} < 0\}$ and $\boldsymbol{\tau}_l = \{t : D^1 h_j^{(k)} = 0, D^2 h_j^{(k)} > 0\}$, where the operator $D^m f$ denotes the $m$th derivative of the function $f$. In [7, 9], it was shown that the local extrema of the IMF estimates $h_j^{(k)}(t)$ in each sifting iteration are far from being the optimum choice of interpolation points. It turned out that the decomposition performance was significantly improved if the interpolation points were kept fixed in all the sifting iterations and equal to the extrema of the fast oscillating signal $s_f^{(k)}(t)$. Hereafter, these extrema will be called desired, and the corresponding nodes are given by $\boldsymbol{\tau}_u = \{t : D^1 s_f^{(k)} = 0, D^2 s_f^{(k)} < 0\}$ and $\boldsymbol{\tau}_l = \{t : D^1 s_f^{(k)} = 0, D^2 s_f^{(k)} > 0\}$.

Figure 4: (a) Spectrogram using IF estimates of the IMFs (Hilbert-Huang spectrum). (b) Spectrogram using STFT with a Kaiser window of length 64 samples. (c) Spectrogram using STFT with a Kaiser window of length 128 samples. In all panels the horizontal axis is time and the vertical axis is normalized frequency.

At first glance, the observation that the desired extrema perform much better than the standard local extrema may be considered useless, since $s_f^{(k)}(t)$ is the very signal that the sifting process aims to extract, and its extrema therefore cannot be known in advance. However, in [9] we saw that it is possible to obtain interpolation point estimates that are closer to the desired ones than the local extrema are. This can be realized by adopting, for example, the local extrema of a high-pass filtered version of the signal under consideration, $h_j^{(k)}(t)$. The filtering results in a signal with attenuated slow oscillating components, leading to improved estimates of the extrema of the desired fast oscillating counterpart. In other words, $h_j^{(k)}(t)$ is preprocessed in each iteration so that it resembles $s_f^{(k)}(t)$, and the desired extrema are then estimated from the preprocessed signal. In practice, this technique exhibits certain difficulties, mainly related to the choice of the filter cut-off, which should also be time varying in the case of nonstationary signals. Apart from that, it can be argued that the filtering preprocessing compromises the spontaneous, data-driven nature of the EMD operation.

In DI-EMD, the estimation of the desired extrema is approached from a different viewpoint, based on the fact that estimating the desired extrema does not require knowledge of the actual fast oscillating signal $s_f^{(k)}(t)$, but only of its first derivative. As we will see, the first derivative of $s_f^{(k)}(t)$ can be effectively estimated directly in each iteration. More importantly, this can be naturally realized within the data-driven framework of EMD.

Figure 5(a) shows an example of a residual signal after $k-1$ IMF extractions (it is denoted as $h_j^{(k)}(t)$ for generality, since the procedure described next can be applied not only to the residual signal $x^{(k)}(t)$ but to any tentative IMF estimate $h_j^{(k)}(t)$). The dotted curve depicts the corresponding local mean $s_s^{(k)}(t)$, and the filled circles indicate the local extrema of $h_j^{(k)}(t)$, which lie at the positions where the first derivative of $h_j^{(k)}(t)$, shown with a solid line in Figure 5(b), equals zero. Moreover, the desired extrema, which are the extrema of the fast oscillating signal component $s_f^{(k)}(t)$, are shown with asterisks in Figure 5(b). It can be readily seen that the local extrema not only deviate from the desired positions but in many cases have been smeared out completely. $D^1 h_j^{(k)}(t)$ can be written as

$$D^1 h_j^{(k)}(t) = D^1 s_s^{(k)}(t) + D^1 s_f^{(k)}(t), \tag{9}$$

where $D^1 s_s^{(k)}(t)$ is the first derivative of the local mean, depicted with a dotted line. The desired extrema can be estimated by estimating $D^1 s_f^{(k)}(t)$ and then computing the positions at which it vanishes. Because the differentiation operator does not affect the frequency content of the signal, we still expect $D^1 s_f^{(k)}(t)$ and $D^1 s_s^{(k)}(t)$ to be the fast and the slow oscillating components of $D^1 h_j^{(k)}(t)$, respectively. Following the discussion in Section 2, the fast oscillating part and the local mean of a signal can be efficiently estimated through a sifting process. As a result, applying a predefined number of sifting iterations to $D^1 h_j^{(k)}(t)$ produces an estimate of $D^1 s_s^{(k)}(t)$, which in turn can be subtracted from $D^1 h_j^{(k)}(t)$, leading to an estimate of the first derivative of the fast oscillating part of $h_j^{(k)}(t)$, shown in Figure 5(c). The filled circles indicate the zero-crossing points of $D^1 s_f^{(k)}(t)$, which are adopted as estimates of the desired extrema. From now on, the sifting iterations used for estimating the desired extrema will be referred to as internal, and the normal EMD sifting iterations used for the current IMF estimate will be called external.
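As a worked illustration of why the two terms in (9) keep their fast and slow roles, consider a hypothetical signal made of two constant-amplitude tones with $f_f > f_s$ (the symbols $a_s$, $a_f$, $f_s$, $f_f$ are introduced here only for this example):

```latex
h(t) = \underbrace{a_s \cos(2\pi f_s t)}_{s_s(t)} + \underbrace{a_f \cos(2\pi f_f t)}_{s_f(t)},
\qquad
D^1 h(t) = \underbrace{-2\pi f_s a_s \sin(2\pi f_s t)}_{D^1 s_s(t)}
         + \underbrace{-2\pi f_f a_f \sin(2\pi f_f t)}_{D^1 s_f(t)} .
```

Each term keeps its frequency and only its amplitude is scaled, by $2\pi f_s$ and $2\pi f_f$, respectively. In this simple case the fast-to-slow amplitude ratio therefore grows by the factor $f_f / f_s > 1$ after differentiation, which suggests that sifting the derivative is, if anything, an easier separation problem than sifting $h(t)$ itself.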

The DI-EMD algorithm is summarized as follows:

(1) set $j = 0$ and $h_j^{(k)}(t) = x^{(k)}(t)$;
(2) apply a number of sifting operations to $D^1 h_j^{(k)}(t)$ in order to obtain an estimate of the first derivative of the total local mean, $D^1 s_s^{(k)}(t)$;
(3) find the zero-crossings of the first derivative of the fast oscillating part, $D^1 s_f^{(k)}(t) = D^1 h_j^{(k)}(t) - D^1 s_s^{(k)}(t)$;
(4) in order to estimate $h_{j+1}^{(k)}(t)$, perform a sifting iteration on $h_j^{(k)}(t)$ using as interpolation nodes the positions of the zero-crossing points estimated at Step (3). The characterization of the extrema as maxima or minima simply results from the sign of $D^2 s_f^{(k)}(t)$;
(5) set $j = j + 1$ and return to Step (2) until the stopping criterion is fulfilled.

As we are going to see in the simulations section, the estimation of the desired extrema described in Steps (2) and (3) does not need to be performed for every external sifting iteration.
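Steps (1) to (5) can be sketched in a few lines. This is a rough illustration under several assumptions and not the authors' Matlab code: the derivative is a first difference, the envelopes are piecewise linear instead of splines, the stopping criterion is a fixed number of external iterations, and the names `desired_nodes`, `di_emd_imf`, and `every` are hypothetical.

```python
import math

def extrema(h):
    maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] > h[i + 1]]
    minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] < h[i + 1]]
    return maxima, minima

def envelope(nodes, h):
    """Piecewise-linear stand-in for the spline envelope through (tau, h[tau])."""
    env, r = [], 1
    for t in range(len(h)):
        if t <= nodes[0]:
            env.append(h[nodes[0]])
        elif t >= nodes[-1]:
            env.append(h[nodes[-1]])
        else:
            while nodes[r] < t:
                r += 1
            t0, t1 = nodes[r - 1], nodes[r]
            env.append(h[t0] + (h[t1] - h[t0]) * (t - t0) / (t1 - t0))
    return env

def mean_from_nodes(h, upper_nodes, lower_nodes):
    """Local mean (4) built on the given upper and lower nodes."""
    if not upper_nodes or not lower_nodes:
        return [0.0] * len(h)
    up, lo = envelope(upper_nodes, h), envelope(lower_nodes, h)
    return [(u + l) / 2 for u, l in zip(up, lo)]

def desired_nodes(h, internal_iters):
    """Steps (2)-(3): sift the first difference of h, then locate its sign changes."""
    fast = [h[i + 1] - h[i] for i in range(len(h) - 1)]   # discrete D1 h
    for _ in range(internal_iters):                        # internal sifting
        mean = mean_from_nodes(fast, *extrema(fast))
        fast = [f - m for f, m in zip(fast, mean)]
    # fast now estimates D1 s_f; a + to - sign change marks a maximum (D2 s_f < 0).
    upper = [i + 1 for i in range(len(fast) - 1) if fast[i] > 0 >= fast[i + 1]]
    lower = [i + 1 for i in range(len(fast) - 1) if fast[i] < 0 <= fast[i + 1]]
    return upper, lower

def di_emd_imf(x, external_iters=10, internal_iters=20, every=1):
    """Steps (1)-(5); the nodes are re-estimated only every `every` external iterations."""
    h, upper, lower = list(x), [], []
    for j in range(external_iters):
        if j % every == 0:
            upper, lower = desired_nodes(h, internal_iters)
        mean = mean_from_nodes(h, upper, lower)             # step (4)
        h = [hi - mi for hi, mi in zip(h, mean)]
    return h

# Illustrative input: a fast tone buried under a much stronger slow tone.
x = [math.cos(2 * math.pi * 0.05 * t) + 4 * math.cos(2 * math.pi * 0.02 * t)
     for t in range(400)]
imf = di_emd_imf(x, external_iters=5, internal_iters=10)
```

The `every` argument mirrors the remark above that Steps (2) and (3) need not be repeated at each external iteration; how well this simplified version recovers the fast tone depends on the linear envelopes and is not meant to reproduce the results reported in Section 4.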

4. METHOD EVALUATION

In this section, DI-EMD is tested in two different simulation scenarios. The first is based on a signal model consisting of two pure tones [4], and the second contains a tone of linearly increasing amplitude, leading to a decomposition problem whose difficulty gradually increases with time.

4.1. Two-tone signal example

The signal under consideration is given by

$$x(t) = \cos 2\pi t + \alpha \cos(2\pi f t + \varphi), \quad t \in \mathbb{R}. \tag{10}$$

When $f$ takes values in $(0, 1)$, the term $\cos 2\pi t$ is the higher frequency component and $\alpha \cos(2\pi f t + \varphi)$ is the lower frequency one. The study in [4] tried to answer analytically the following question: for which ranges of the parameters $f$ and $\alpha$ does the conventional EMD

(i) separate the two signal components, that is, the first IMF $h^{(1)}(t)$ equals $\cos 2\pi t$;
(ii) consider the signal as a single component, that is, $h^{(1)}(t) = x(t)$;
(iii) produce an $h^{(1)}(t)$ that is something else?

Figure 5: (a) The signal under consideration $h_j^{(k)}(t)$ (solid line) together with its slow oscillating counterpart $s_s^{(k)}(t)$ (dotted line). The corresponding local and desired extrema are depicted with filled circles and asterisks, respectively. (b) The first derivatives $D^1 h_j^{(k)}(t)$ and $D^1 s_s^{(k)}(t)$ of the signals in (a). (c) The first derivative of the fast oscillating part, that is, $D^1 s_f^{(k)}(t) = D^1 h_j^{(k)}(t) - D^1 s_s^{(k)}(t)$.

The latter behaviour can be considered undesirable, since the produced results are not directly related to the analyzed signal and its intrinsic properties. Interestingly, for a quite wide range of $f$ and $\alpha$ with $\alpha f^2 > 1$, conventional EMD interprets the first extracted component neither as $x(t)$ nor as $\cos 2\pi t$ [4].
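One way to get a feel for why the region $\alpha f^2 > 1$ is difficult is to count the local extrema of the two-tone signal (10), since these are the interpolation points of conventional EMD. The sketch below is only an illustration: the two parameter pairs, the sampling rate, and the duration are arbitrary choices, and $\varphi = 0$ is assumed.

```python
import math

def count_extrema(samples):
    """Number of interior samples at which the first difference changes sign."""
    return sum(1 for i in range(1, len(samples) - 1)
               if (samples[i] - samples[i - 1]) * (samples[i + 1] - samples[i]) < 0)

def two_tone(alpha, f, phi=0.0, duration=10.0, rate=1000):
    """Sampled fast tone, slow tone, and their sum x(t) from equation (10)."""
    n = int(duration * rate) + 1
    fast = [math.cos(2 * math.pi * i / rate) for i in range(n)]
    slow = [alpha * math.cos(2 * math.pi * f * i / rate + phi) for i in range(n)]
    return fast, slow, [a + b for a, b in zip(fast, slow)]

results = {}
for alpha, f in [(1.0, 0.3), (30.0, 0.3)]:   # alpha * f**2 = 0.09 and 2.7
    fast, slow, x = two_tone(alpha, f)
    results[(alpha, f)] = (count_extrema(fast), count_extrema(slow), count_extrema(x))
    print(alpha, f, results[(alpha, f)])
```

For the weak slow tone the extrema of $x(t)$ are as numerous as those of the fast tone, whereas for $\alpha f^2 > 1$ they are as few as those of the slow tone, so the fast tone contributes no interpolation points of its own. This is the situation in which the extrema of $s_f^{(k)}(t)$ have been "smeared out" and in which the derivative-based node estimation of DI-EMD is intended to help.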

In Figure 6, the gray area on the ($\log_{10}\alpha$, $f$) plane corresponds to parameter ranges where the EMD behaviour is undesirable, since it does not produce IMFs directly related to the signal components under consideration. The metrics used for this graph are similar to the ones used in [4]. In this specific case, 49.54% of the area $\alpha f^2 > 1$ corresponds to undesirable decisions.

Figure 6: EMD decision areas on the plane of $\log_{10}(\alpha)$ (horizontal axis) and $f$ (vertical axis). The white areas correspond to $h^{(1)}(t)$ being equal either to $x(t)$ or to $\cos 2\pi t$, and the gray area corresponds to a different decision.

Figure 7(a) compares the ill-behaviour area of the conventional EMD with the ill-behaviour area of the 3rd-order DI-EMD when 30 internal iterations per external iteration are used. In both cases, the number of siftings is set equal to 10. The light gray area corresponds to the ill-behaviour of the conventional EMD, the black area corresponds to the ill-behaviour of the 3rd-order DI-EMD, and the midtone gray area corresponds to the ill-behaviour area common to the two methods. Figure 7(b) compares the conventional EMD with the DI-EMD using 13th-order splines for the external iterations and 3rd-order splines for the internal ones. From Figure 7 it is observed that the ill-behaviour area is reduced to some extent, especially when the high-order DI-EMD method is used. More specifically, using the DI-EMD configuration of Figure 7(a), the ill-behaviour area is reduced to 37.61% of the $\alpha f^2 > 1$ area, and using the higher-order DI-EMD (Figure 7(b)) leads to a 28.84% undesirable area. In the two-tone example, the use of high-order splines in the internal iterations offered no further improvement. However, as will be seen in the next example, the latter observation cannot be taken as a general rule.

4.2. Increased AM signal example

For the second performance evaluation, we adopted the signal shown in Figure 8. It consists of a sinusoid $x_1(t)$ with constant frequency $f_1$ and constant amplitude $a_1$, and a sinusoid $x_2(t)$ with linearly increasing amplitude $a_2(t)$ and constant frequency $f_2$. Two cases are explored: in the first one $f_1 = 2.2 f_2$, and in the second one $f_1 = 1.5 f_2$. This simulation example explores the ability of several variants of EMD to extract the faster oscillating signal, that is, $x_1(t)$, in a progressively “hostile” environment, namely, in the presence of a slow oscillating signal with gradually increasing amplitude. In such an example, the performance can be quantified by the value of the ratio $a_2(t)/a_1$ up to which EMD succeeds in resolving $x_1(t)$.

Figure 7: (a) The light gray area corresponds to ill-behaviour of the conventional EMD, the black area corresponds to ill-behaviour of the 3rd-order DI-EMD, and the midtone gray area corresponds to the ill-behaviour area common to the two methods. (b) The same as (a), but 13th-order splines have been used for the external sifting iterations. In both panels the horizontal axis is $\log_{10}(\alpha)$ and the vertical axis is $f$.

Starting with $f_1 = 2.2 f_2$, Figure 9 shows, on a logarithmic scale, the difference between the fast oscillating signal $x_1(t)$ and the IMF estimated after 2000 external sifting iterations, $h_{2000}(t)$, for several different variants of EMD. In fact, the error curves undergo rapid oscillations, which have been smoothed out here for visualization purposes. We observe that, in general, the EMD methods exhibit a gradually increasing error as a function of $a_2(t)/a_1$, as the problem of extracting $x_1(t)$ becomes more and more difficult. Moreover, for each method there is a point at which the error rises abruptly because the fast signal can no longer be extracted properly.

Figure 8: Multicomponent signal consisting of a constant amplitude signal $x_1(t) = a_1 \cos(2\pi f_1 t)$ and a signal $x_2(t) = a_2(t) \cos(2\pi f_2 t)$ with a linearly increasing amplitude. The horizontal axis is $a_2(t)/a_1$.

Figure 9: Error between the fast oscillating signal $x_1(t)$ and the IMF estimated after 2000 external sifting iterations, plotted as $\log_{10}(x_1(t) - h_{2000}(t))$ versus $a_2(t)/a_1$. Legend: st-EMD (3); DI-EMD (3|3, 20, 50); DI-EMD (3|3, 500, 50); DI-EMD (3|9, 20, 50); DI-EMD (3|9, 50, 50); DI-EMD (3|9, 100, 50); DI-EMD (3|9, 500, 50); DI-EMD (3|13, 200, 20).

We can consider a specific error value, say 0 on the logarithmic scale of Figure 9, that indicates the $a_2(t)/a_1$ value up to which a specific EMD method succeeds in resolving $x_1(t)$. According to this performance measure, we can explore the ability to extract the fast oscillating signal as a function of the number of external sifting iterations, as shown in Figure 10. In the figure legends, st-EMD($q$) corresponds to the standard EMD using local extrema and spline interpolation of $q$th order. Furthermore, the notation DI-EMD($q \mid iq, it, ex$) corresponds to doubly-iterative EMD using spline interpolation of orders $q$ and $iq$ for the external and the internal sifting processes, respectively, with the internal sifting process realized by a preset number of $it$ iterations. Moreover, there is the option for the internal sifting process to be performed not after every external sifting iteration, but only every $ex$ ones. The latter option corresponds to reduced complexity DI-EMD.

Figure 10: Performance results of different EMD variants for the case of $f_1 = 2.2 f_2$: number of sifting iterations ($\times 10^3$) versus $a_2(t)/a_1$. Legend: st-EMD (3); DI-EMD (3|3, 20, 50); DI-EMD (3|3, 500, 50); DI-EMD (3|9, 20, 50); DI-EMD (3|9, 50, 50); DI-EMD (3|9, 100, 50); DI-EMD (3|9, 500, 50); DI-EMD (3|13, 200, 20).

Figure 11: Performance results of different EMD variants for the case of $f_1 = 1.5 f_2$: number of sifting iterations ($\times 10^3$) versus $a_2(t)/a_1$. Legend: st-EMD (3); DI-EMD (3|3, 20, 50); DI-EMD (3|3, 500, 50); DI-EMD (3|9, 100, 50); DI-EMD (3|9, 500, 50); DI-EMD (3|13, 200, 20).

Clearly, in such an example, the proposed doubly-iterative EMD outperforms the standard EMD. More specifically, the standard EMD needs more than 2000 sifting iterations in order to extract $x_1(t)$ up to the point where the slow oscillating signal has an amplitude 5 times larger than that of the fast oscillating signal. On the other hand, all the variants of DI-EMD exceed the value $a_2(t)/a_1 = 10$ within 200 external sifting iterations. At the same time, the complexity remains low. For example, in the case of DI-EMD(3 | 3, 20, 50), this performance is achieved with only a 40% increase in the total number of siftings. Moreover, the performance can be further improved with the use of higher-order splines in the internal iterations. However, it turns out that the higher the order of the splines, the larger the number of external siftings needed to achieve the potential performance. Recall that in the case of the two-tone signal, high-order splines in the internal iterations did not lead to performance improvements. The opposite happens in the increased AM example, where the performance deteriorates when high-order splines are used in the external iterations. When the frequency ratio is reduced to $f_1 = 1.5 f_2$, the advantages of using DI-EMD instead of the conventional EMD are reduced, as can be seen in Figure 11.
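The 40% figure can be reproduced with simple bookkeeping, assuming that one internal sifting process of $it$ iterations is run once every $ex$ external iterations, starting with the first one, and that internal and external siftings are counted alike. The helper name below is hypothetical.

```python
import math

def sifting_overhead(external, it, ex):
    """Extra siftings of DI-EMD(q | iq, it, ex) relative to `external` external siftings."""
    internal_runs = math.ceil(external / ex)   # one internal process every `ex` iterations
    return internal_runs * it / external

# DI-EMD(3 | 3, 20, 50) over 200 external iterations: 4 internal runs of 20 siftings.
print(sifting_overhead(200, 20, 50))
```

Under the same counting, a configuration such as DI-EMD(3 | 9, 500, 50) spends far more siftings internally than externally, which is the complexity trade-off behind the choice of $it$ and $ex$.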

In general, based on the examples examined here, it can be argued that the use of DI-EMD is not “harmful”; however, the corresponding performance improvements depend on the signal to be analyzed. Moreover, although high-order splines can help, especially in the internal iterations, they do not guarantee enhanced performance compared to the cubic spline case and should be used carefully.

5. CONCLUSIONS

In this paper, the doubly-iterative EMD, which incorporates an enhanced technique for estimating the interpolation points of the EMD method, was examined in two different simulation examples. An improvement in the overall decomposition performance was demonstrated, which can in some cases be further enhanced when the doubly-iterative method is combined with envelope estimation using high-order spline interpolation.

ACKNOWLEDGMENTS

This work has been presented in part by Y. Kopsinis and S. McLaughlin, “Enhanced Empirical Mode Decomposition Using a Novel Sifting-Based Interpolation Points Detection” at the IEEE Statistical Signal Processing Workshop, SSP 2007. This work was performed as part of the BIAS consortium under a grant funded by the EPSRC under their Basic Technology Programme.

REFERENCES

[1] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London A, vol. 454, no. 1971, pp. 903–995, 1998.

[2] L. Cohen, Time-Frequency Analysis, Prentice-Hall, Upper Saddle River, NJ, USA, 1995.

[3] N. E. Huang and Z. Wu, “An adaptive data analysis method for nonlinear and nonstationary time series: the empirical mode decomposition and Hilbert spectral analysis,” in Proceedings of the 4th International Conference on Wavelet Analysis and Its Applications (WAA ’05), Macao, China, November-December 2005.

[4] G. Rilling and P. Flandrin, “One or two frequencies? The empirical mode decomposition answers,” IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 85–95, 2008.

[5] R. Deering and J. F. Kaiser, “The use of a masking signal to improve empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’05), vol. 4, pp. 485–488, Philadelphia, Pa, USA, March 2005.

[6] Y. Washizawa, T. Tanaka, D. P. Mandic, and A. Cichocki, “A flexible method for envelope estimation in empirical mode decomposition,” in Proceedings of the 10th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES ’06), vol. 4253, pp. 1248–1255, Bournemouth, UK, October 2006.

[7] Y. Kopsinis and S. McLaughlin, “Investigation of the empirical mode decomposition based on genetic algorithm optimization schemes,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’07), vol. 3, pp. 1397–1400, Honolulu, Hawaii, USA, April 2007.

[8] Y. Kopsinis and S. McLaughlin, “Enhanced empirical mode decomposition using a novel sifting-based interpolation points detection,” in Proceedings of the 14th IEEE/SP Workshop on Statistical Signal Processing (SSP ’07), pp. 725–729, Madison, Wis, USA, August 2007.

[9] Y. Kopsinis and S. McLaughlin, “Investigation and performance enhancement of the empirical mode decomposition method based on a heuristic search optimization approach,” IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 1–13, 2008.

[10] G. Rilling, P. Flandrin, and P. Goncalves, “On empirical mode decomposition and its algorithms,” in Proceedings of the 6th IEEE/EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP ’03), Grado, Italy, June 2003.

[11] N. E. Huang, M.-L. C. Wu, S. R. Long, et al., “A confidence limit for the empirical mode decomposition and Hilbert spectral analysis,” Proceedings of the Royal Society A, vol. 459, no. 2037, pp. 2317–2345, 2003.

[12] G. Rilling and P. Flandrin, “On the influence of sampling on the empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’06), vol. 3, pp. 444–447, Toulouse, France, May 2006.


Research Article
Optimal Signal Reconstruction Using the Empirical Mode Decomposition

Binwei Weng1 and Kenneth E. Barner2

1 Philips Medical Systems, MS 455, Andover, MA 01810, USA
2 Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA

Correspondence should be addressed to Kenneth E. Barner, [email protected]

Received 26 August 2007; Revised 12 February 2008; Accepted 20 July 2008

Recommended by Nii O. Attoh-Okine

The empirical mode decomposition (EMD) was recently proposed as a new time-frequency analysis tool for nonstationary and nonlinear signals. Although the EMD is able to find the intrinsic modes of a signal and is completely self-adaptive, it does not have any implication on reconstruction optimality. In some situations, when a specified optimality is desired for signal reconstruction, a more flexible scheme is required. We propose a modified method for signal reconstruction based on the EMD that enhances the capability of the EMD to meet a specified optimality criterion. The proposed reconstruction algorithm gives the best estimate of a given signal in the minimum mean square error sense. Two different formulations are proposed. The first formulation utilizes a linear weighting for the intrinsic mode functions (IMF). The second algorithm adopts a bidirectional weighting, namely, it not only uses weighting for IMF modes, but also exploits the correlations between samples in a specific window and carries out filtering of these samples. These two new EMD reconstruction methods enhance the capability of the traditional EMD reconstruction and are well suited for optimal signal recovery. Examples are given to show the applications of the proposed optimal EMD algorithms to simulated and real signals.

Copyright © 2008 B. Weng and K. E. Barner. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The empirical mode decomposition (EMD) was proposed by Huang et al. as a new signal decomposition method for nonlinear and nonstationary signals [1]. It provides an alternative to traditional time-frequency or time-scale analysis methods, such as the short-time Fourier transform and wavelet analysis. The EMD decomposes a signal into a collection of oscillatory modes, called intrinsic mode functions (IMF), which represent fast to slow oscillations in the signal. Each IMF can be viewed as a subband of a signal. Therefore, the EMD can be viewed as a subband signal decomposition. Traditional signal analysis tools, such as Fourier or wavelet-based methods, require some predefined basis functions to represent a signal. The EMD relies on a fully data-driven mechanism that does not require any a priori known basis. It has also been shown that the EMD has some relationship with wavelets and filter banks: it is reported that the EMD behaves as a “wavelet-like” dyadic filter bank for fractional Gaussian noise [2, 3]. Due to these special properties, the EMD has been used to address many science and engineering problems [4–13]. Although the EMD is computed iteratively and does not possess an analytical form, some interesting attempts have been made recently to address its analytical behavior [14].

The EMD depends only on the data itself and is completely unsupervised. In addition, it satisfies the perfect reconstruction (PR) property, as the sum of all the IMFs yields the original signal. In some situations, however, not all the IMFs are needed to obtain certain desired properties. For instance, when the EMD is used for denoising a signal, partial reconstruction based on the IMF energy eliminates noise components [15]. Such partial reconstruction utilizes a binary IMF decision, that is, either discarding or keeping IMFs in the partial summation, and is not based on any optimality conditions. In this paper, we give an optimal signal reconstruction method that utilizes differently weighted IMFs and IMF samples. Stated more formally, the problem addressed here is the following: given a signal, how best to reconstruct it from the IMFs obtained from a signal that bears some relationship to the given signal. This can be regarded as a signal approximation or reconstruction problem and is similar to the filtering problem, in which an estimated signal is obtained by filtering a given signal. The problem arises in many applications, such as signal denoising and interference cancellation. The optimality criterion used here is the mean square error. Numerous methodologies can be employed to combine the IMFs to form an estimate. A direct approach is to use a linear weighting of the IMFs. This leads to our first proposed optimal signal reconstruction algorithm based on EMD (OSR-EMD). A second approach is to use weighting coefficients along both the vertical IMF index direction and the horizontal temporal index direction. Because of this, the second approach is named the bidirectional optimal signal reconstruction algorithm (BOSR-EMD). As a supplement to the BOSR-EMD, a regularized version (RBOSR-EMD) is also proposed to overcome the numerical instability of the BOSR-EMD. For notational brevity, the suffix EMD is omitted hereafter, and OSR, BOSR, and RBOSR are used instead of OSR-EMD, BOSR-EMD, and RBOSR-EMD. Simulation examples show that the proposed algorithms are well suited for signal reconstruction and significantly improve on the partial reconstruction EMD.

The structure of the paper is as follows. In Section 2, we give a brief introduction to the EMD. Then the OSR is formulated in Section 3. The BOSR and RBOSR algorithms are proposed in Section 4. Simulation examples are given in Section 5 to demonstrate the efficacy of the algorithms. Finally, conclusions are made in Section 6.

2. EMPIRICAL MODE DECOMPOSITION

The aim of the EMD is to decompose a signal into a sum of intrinsic mode functions (IMF). An IMF is defined as a function whose numbers of extrema and zero crossings are equal (or differ by at most one) and whose envelopes, as defined by all the local maxima and minima, are symmetric with respect to zero [1]. An IMF represents a simple oscillatory mode as a counterpart to the simple harmonic function used in Fourier analysis.

Given a signal $x(n)$, the starting point of the EMD is the identification of all the local maxima and minima. All the local maxima are then connected by a cubic spline curve to form the upper envelope $e_u(n)$. Similarly, all the local minima are connected by a spline curve to form the lower envelope $e_l(n)$. The mean of the two envelopes is denoted as $m_1(n) = [e_u(n) + e_l(n)]/2$ and is subtracted from the signal. Thus the first proto-IMF $h_1(n)$ is obtained as

$$h_1(n) = x(n) - m_1(n). \tag{1}$$

The above procedure to extract the IMF is referred to as the sifting process. Since $h_1(n)$ still contains multiple extrema between zero crossings, the sifting process is performed again on $h_1(n)$. This process is applied repetitively to the proto-IMF $h_k(n)$ until the first IMF $c_1(n)$, which satisfies the IMF condition, is obtained. Some stopping criteria are used to terminate the sifting process. A commonly used criterion is the sum of difference (SD):

$$\mathrm{SD} = \sum_{n=0}^{T} \frac{\left| h_{k-1}(n) - h_k(n) \right|^2}{h_{k-1}^2(n)}. \tag{2}$$

Figure 1: Optimal coefficients $a_i$ for the OSR, plotted against the IMF index $i$.

When the SD is smaller than a threshold, the first IMF c1(n) is obtained, and the first residue is written as

r1(n) = x(n)− c1(n). (3)

Note that the residue r1(n) still contains some useful information. We can therefore treat the residue as a new signal and apply the above procedure to obtain

$$\begin{aligned} r_1(n) - c_2(n) &= r_2(n), \\ &\;\;\vdots \\ r_{N-1}(n) - c_N(n) &= r_N(n). \end{aligned} \qquad (4)$$

The whole procedure terminates when the residue rN(n) is either a constant, a monotonic slope, or a function with only one extremum. Combining the equations in (3) and (4) yields the EMD of the original signal,

$$x(n) = \sum_{i=1}^{N} c_i(n) + r_N(n). \qquad (5)$$

The EMD thus produces N IMFs and a residue signal. For convenience, we refer to ci(n) as the ith-order IMF. By this convention, lower-order IMFs capture fast oscillation modes while higher-order IMFs typically represent slow oscillation modes. If we interpret the EMD as a time-scale analysis method, lower-order IMFs and higher-order IMFs correspond to the fine and coarse scales, respectively. The residue itself can be regarded as the last IMF.
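The sifting procedure in (1)–(5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the SD threshold of 0.3, the iteration caps, and the small constant guarding the division in (2) are all assumptions, and linear interpolation (`np.interp`) stands in for the cubic spline envelopes to keep the sketch dependent on NumPy only.

```python
import numpy as np

def local_extrema(h):
    """Indices of strict interior local maxima and minima of a 1-D array."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def envelope_mean(h):
    """Mean of the upper and lower envelopes, or None if there are too few extrema."""
    maxima, minima = local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = np.arange(len(h))
    # Assumption: linear interpolation replaces the cubic spline of the text.
    upper = np.interp(n, maxima, h[maxima])
    lower = np.interp(n, minima, h[minima])
    return 0.5 * (upper + lower)

def emd(x, sd_threshold=0.3, max_sift=50, max_imfs=10):
    """Return (imfs, residue) such that imfs.sum(axis=0) + residue equals x, as in (5)."""
    x = np.asarray(x, dtype=float)
    imfs, residue = [], x.copy()
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:      # residue has too few extrema: stop
            break
        h = residue.copy()
        for _ in range(max_sift):
            m = envelope_mean(h)
            if m is None:
                break
            h_new = h - m                        # one sifting step, as in (1)
            # SD criterion (2); the 1e-12 guard against division by zero is an assumption.
            sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))
            h = h_new
            if sd < sd_threshold:
                break
        imfs.append(h)
        residue = residue - h                    # residue update, as in (3)-(4)
    return np.array(imfs), residue

# Hypothetical two-tone test signal.
n = np.arange(1000)
x = np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.005 * n)
imfs, residue = emd(x)
```

Because each residue is formed by subtracting the extracted IMF, the perfect reconstruction property in (5) holds by construction regardless of how the envelopes are interpolated.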


Figure 2: Optimal coefficients bij's for the BOSR (axes: IMF index i, sample index j, coefficient bij).

Figure 3: Optimal coefficients bij's for the RBOSR (axes: IMF index i, sample index j, coefficient brij).

3. OPTIMAL SIGNAL RECONSTRUCTION USING EMD

The traditional empirical mode decomposition presented in the previous section is a perfect reconstruction (PR) decomposition, as the sum of all IMFs yields the original signal. Consider the related problem in which the objective is to combine the IMFs in a fashion that approximates a signal d(n) that is related to x(n). This problem is exemplified by the signal denoising application, where x(n) is a noise-corrupted version of d(n) and the aim is to reconstruct d(n) from x(n). The IMFs can be combined utilizing various methodologies and under various objective functions designed to approximate d(n). We consider several such methods, beginning with a simple linear weighting,

$$\hat{d}(n) = \sum_{i=1}^{N} a_i c_i(n), \qquad (6)$$

where the coefficient ai is the weight assigned to the ith IMF. Note that, for convenience, the residue term is absorbed in the summation as the last term cN(n). Also, the IMFs are generated by decomposing x(n), which has some relationship with the desired signal d(n). To optimize the ai coefficients, we employ the mean square error (MSE),

$$J_1 = E\left\{\left[d(n) - \hat{d}(n)\right]^2\right\} = E\left\{\left[d(n) - \sum_{i=1}^{N} a_i c_i(n)\right]^2\right\}. \qquad (7)$$

The optimal coefficients can be determined by taking the derivative of (7) with respect to ai and setting it to zero. Therefore, we obtain

$$\sum_{j=1}^{N} a_j E\{c_i(n) c_j(n)\} = E\{d(n) c_i(n)\}, \qquad (8)$$

or equivalently,

$$\sum_{j=1}^{N} R_{ij} a_j = p_i, \quad i = 1, \ldots, N \qquad (9)$$

by defining

$$p_i = E\{d(n) c_i(n)\}, \quad R_{ij} = E\{c_i(n) c_j(n)\}. \qquad (10)$$

The above N equations can be written in matrix form as follows:

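$$\begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1N} \\ R_{21} & R_{22} & \cdots & R_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ R_{N1} & R_{N2} & \cdots & R_{NN} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix}, \qquad (11)$$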

which can be compactly written as

R1a = p. (12)

The optimal coefficients are thus given by

$$\mathbf{a}^* = \mathbf{R}_1^{-1} \mathbf{p}. \qquad (13)$$
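As a rough illustration of (10)–(13), the sketch below estimates R1 and p by sample averages and solves the normal equations. The function name `osr_weights` and the toy components are hypothetical; the rows of `C` are stand-ins for IMFs rather than the output of an actual EMD.

```python
import numpy as np

def osr_weights(imfs, d):
    """Estimate R1 and p by sample averages and solve R1 a = p.

    imfs: array of shape (N, T), one IMF per row; d: desired signal of length T.
    """
    C = np.asarray(imfs, dtype=float)
    d = np.asarray(d, dtype=float)
    T = C.shape[1]
    R1 = C @ C.T / T              # R_ij ~ sample mean of c_i(n) c_j(n)
    p = C @ d / T                 # p_i  ~ sample mean of d(n) c_i(n)
    a = np.linalg.solve(R1, p)    # solves (12) without forming the inverse explicitly
    return a, R1, p

# Toy data (assumption): one noise-like component and two slow oscillations.
rng = np.random.default_rng(0)
n = np.arange(2000)
C = np.vstack([rng.standard_normal(n.size),
               np.sin(2 * np.pi * 0.01 * n),
               np.cos(2 * np.pi * 0.003 * n)])
d = C[1] + C[2]                   # desired signal excludes the noise-like component
a, R1, p = osr_weights(C, d)
j_min = np.mean(d ** 2) - p @ a   # sample version of sigma_d^2 - p^T R1^{-1} p
```

In this toy case the desired signal lies exactly in the span of the components, so the recovered weights suppress the first component and keep the other two, and the residual error is numerically zero.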

The dimension of the matrix R1 is N × N. Since the number of IMFs N is usually a small integer, the matrix inversion does not incur any numerical difficulties. The minimum MSE can also be found by substituting (13) into (7), which yields

$$J_{1,\min} = E\left\{\left[d(n) - \sum_{i=1}^{N} a_i^* c_i(n)\right]^2\right\} = \sigma_d^2 - \mathbf{p}^T \mathbf{R}_1^{-1} \mathbf{p}, \qquad (14)$$

where σd² = E{d²(n)} is the variance of the desired signal. In practice, pi and Rij are estimated by sample averages.

Many signals to which the EMD is applied are nonstationary. Also, matrix inversion may be too costly in some situations. In such cases, an iterative gradient descent adaptive approach can be utilized:

$$a_i(n+1) = a_i(n) - \mu \left.\frac{\partial J_1}{\partial a_i}\right|_{a_i = a_i(n)}, \qquad (15)$$


Figure 4: Denoising performance. Dashed lines show the original signal and solid lines show the denoised signals, plotted against the sample index n for n = 1000 to 1200. (a) Noisy signal, (b) linear Butterworth filter, (c) PAR-EMD, (d) OSR, (e) BOSR, (f) RBOSR.


Figure 5: Equivalent filter frequency responses for BOSR algorithm coefficients. Frequency responses of B1–B8 are shown in dB values (panels (a)–(h): B1(ω)–B8(ω) versus ω).

where μ is a positive number controlling the convergence speed. By taking the gradient and using an instantaneous estimate for the expectation, we obtain

$$\frac{\partial J_1}{\partial a_i} = -2E\left\{\left[d(n) - \sum_{j=1}^{N} a_j(n) c_j(n)\right] c_i(n)\right\} = -2E\{e(n) c_i(n)\} \approx -2 e(n) c_i(n). \qquad (16)$$

Therefore, the weight update equation (15) can be written as

ai(n + 1) = ai(n) + 2μe(n)ci(n), i = 1, . . . ,N. (17)
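The update in (17) has the same form as a least-mean-squares (LMS) recursion. A minimal sketch follows, assuming the error is e(n) = d(n) − Σ ai(n)ci(n) as implied by (16); the function name `adaptive_osr`, the zero initialization, and the step size are illustrative choices rather than values from the paper.

```python
import numpy as np

def adaptive_osr(imfs, d, mu=0.01):
    """Sample-by-sample weight update a_i(n+1) = a_i(n) + 2*mu*e(n)*c_i(n).

    imfs: array of shape (N, T); d: desired signal of length T.
    Returns the final weights and the running estimate of d.
    """
    C = np.asarray(imfs, dtype=float)
    d = np.asarray(d, dtype=float)
    N, T = C.shape
    a = np.zeros(N)                       # assumption: weights start at zero
    d_hat = np.zeros(T)
    for n in range(T):
        d_hat[n] = a @ C[:, n]            # current estimate, as in (6)
        e = d[n] - d_hat[n]               # instantaneous error e(n)
        a = a + 2.0 * mu * e * C[:, n]    # weight update (17)
    return a, d_hat

# Toy data (assumption): two sinusoidal components with known mixing weights.
n = np.arange(5000)
C = np.vstack([np.sin(2 * np.pi * 0.05 * n), np.sin(2 * np.pi * 0.013 * n)])
d = 0.3 * C[0] + 0.9 * C[1]
a, d_hat = adaptive_osr(C, d, mu=0.01)
```

The step size trades convergence speed against stability; too large a value of mu makes the recursion diverge, so in practice it would need to be tuned to the power of the IMFs.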

From the above formulation, it is clear that the OSR is very similar to Wiener filtering, which aims to estimate a desired signal by passing a signal through a linear filter. The main difference is that the OSR operates on samples in the EMD domain and weights samples according to the IMF order, while the Wiener filter applies filtering to time domain signals directly and weights them temporally. Two special cases of the OSR are worth noting. If all the coefficients ai = 1, then it is equivalent to the original perfect reconstruction EMD (PR-EMD). If some coefficients are set to zero while others are set to one, it reduces to the partial reconstruction EMD (PAR-EMD) used in [8, 15]. Therefore, the OSR extends the capability of the traditional EMD reconstruction and, more importantly, yields the optimal estimate of a given signal in the mean square error sense.

4. BIDIRECTIONAL OPTIMAL SIGNAL RECONSTRUCTION USING EMD

In the EMD, there are two directions in the resulting IMFs. The first direction is the vertical direction denoted by the IMF order i in (5). The vertical direction corresponds to different scales. The other direction is the horizontal direction represented by the time index n in (5). This direction captures the time evolution of the signal. The OSR proposed in the last section only uses weighting along the vertical direction. Therefore, it lacks degrees of freedom in the horizontal, or temporal, direction. In some circumstances, adjacent signal samples are correlated and this factor must be considered when performing reconstruction.

Figure 6: Equivalent filter frequency responses for RBOSR algorithm coefficients. Frequency responses of B1–B8 are shown in dB values (panels (a)–(h): Br1(ω)–Br8(ω) versus ω).

A more flexible EMD reconstruction algorithm that incorporates the signal correlation among samples in a temporal window is described as follows. For a specific time n, a temporal window of size 2M + 1 is chosen with the current sample being the center of the window. Weighting is concurrently employed to account for the relations between IMFs. Consequently, 2D weighting coefficients bij are utilized to yield the estimated signal

$$\hat{d}(n) = \sum_{i=1}^{N} \sum_{j=-M}^{M} b_{ij} c_i(n-j), \qquad (18)$$

where M is the half window length. This formulation takes both vertical and horizontal directions into consideration and is thus referred to as the bidirectional optimal signal reconstruction (BOSR). From (18), the bidirectional weighting can be interpreted as follows. The ith IMF ci(n) is passed through an FIR filter bij of length 2M + 1. Thus we have a filter bank consisting of N FIR filters, each of which is applied to an individual IMF. The final output is the summation of all filter outputs. Compared to the OSR, the BOSR makes use of the correlation between the samples. However, the cost paid for the gained degrees of freedom is increased computational complexity.
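The filter bank reading of (18) can be made concrete with a short sketch: each IMF is convolved with its own row of coefficients and the outputs are summed. The function name `bosr_reconstruct` and the storage convention `b[i, j + M]` for bij are assumptions made for illustration; samples outside the signal are treated as zero at the edges.

```python
import numpy as np

def bosr_reconstruct(imfs, b):
    """Evaluate d_hat(n) = sum_i sum_j b_ij c_i(n - j) as a bank of N FIR filters.

    imfs: array of shape (N, T); b: array of shape (N, 2M+1) with b[i, j + M] = b_ij.
    """
    C = np.asarray(imfs, dtype=float)
    b = np.asarray(b, dtype=float)
    N, T = C.shape
    d_hat = np.zeros(T)
    for i in range(N):
        # mode="same" keeps the output aligned with n for an odd-length, centred filter.
        d_hat += np.convolve(C[i], b[i], mode="same")
    return d_hat
```

With M = 0 each filter has a single tap and the expression reduces to the OSR weighting in (6).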

Similar to the OSR, the optimization criterion chosen here is the mean square error

$$J_2 = E\left\{\left[d(n) - \sum_{i=1}^{N} \sum_{j=-M}^{M} b_{ij} c_i(n-j)\right]^2\right\}. \qquad (19)$$

Differentiating with respect to the coefficient bij and setting the result to zero yields

$$\sum_{k=1}^{N} \sum_{l=-M}^{M} b_{kl} R_2(k, i; l, j) = p_2(i, j), \quad i = 1, \ldots, N, \; j = -M, \ldots, M, \qquad (20)$$

where we define

$$R_2(k, i; l, j) = E\{c_k(n-l) c_i(n-j)\}, \qquad (21)$$

$$p_2(i, j) = E\{d(n) c_i(n-j)\}. \qquad (22)$$

It can be seen that the correlation in (21) is bidirectional, with a quadruple index representing both the IMF order and temporal directions. There are altogether (2M + 1)N equations in (20), and if we rearrange R2(k, i; l, j) and p2(i, j) according to the lexicographic order, (20) can be put into the following matrix equation:

$$\begin{bmatrix} R_2(1,1;-M,-M) & R_2(1,1;-M+1,-M) & \cdots & R_2(N,1;M,-M) \\ R_2(1,1;-M,-M+1) & R_2(1,1;-M+1,-M+1) & \cdots & R_2(N,1;M,-M+1) \\ \vdots & \vdots & \ddots & \vdots \\ R_2(1,N;-M,M) & R_2(1,N;-M+1,M) & \cdots & R_2(N,N;M,M) \end{bmatrix} \begin{bmatrix} b_{1,-M} \\ b_{1,-M+1} \\ \vdots \\ b_{N,M} \end{bmatrix} = \begin{bmatrix} p_2(1,-M) \\ p_2(1,-M+1) \\ \vdots \\ p_2(N,M) \end{bmatrix}. \qquad (23)$$

Equation (23) can be compactly written as

R2b = p2, (24)

from which the optimal solution b∗ is given by

$$\mathbf{b}^* = \mathbf{R}_2^{-1} \mathbf{p}_2. \qquad (25)$$
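One way to assemble and solve (24)–(25) from data is to stack the lagged IMF samples in lexicographic order and form sample averages. The sketch below is illustrative only: the helper names are hypothetical, only the samples for which every lag is available are used, and the white-noise components in the usage lines are stand-ins for IMFs chosen so that R2 is well conditioned.

```python
import numpy as np

def lagged_matrix(imfs, M):
    """Rows are c_i(n - j) for i = 1..N and j = -M..M (lexicographic order),
    columns are the time indices n = M .. T-M-1 where every lag exists."""
    C = np.asarray(imfs, dtype=float)
    N, T = C.shape
    rows = []
    for i in range(N):
        for j in range(-M, M + 1):
            rows.append(C[i, M - j : T - M - j])
    return np.array(rows)

def bosr_weights(imfs, d, M):
    """Sample-average estimates of R2 and p2, and the solution of R2 b = p2."""
    X = lagged_matrix(imfs, M)
    d = np.asarray(d, dtype=float)
    d_valid = d[M : len(d) - M]
    L = X.shape[1]
    R2 = X @ X.T / L
    p2 = X @ d_valid / L
    b = np.linalg.solve(R2, p2)
    return b.reshape(len(imfs), 2 * M + 1), R2, p2

# Toy check (assumption): white-noise "IMFs" and a desired signal built from known taps.
rng = np.random.default_rng(2)
C = rng.standard_normal((2, 500))
b_true = np.array([[0.2, 0.5, 0.1], [-0.3, 0.8, 0.4]])
d = sum(np.convolve(C[i], b_true[i], mode="same") for i in range(2))
b_est, R2, p2 = bosr_weights(C, d, M=1)
```

For real IMFs, `np.linalg.solve` may become numerically unreliable when adjacent lags of a smooth IMF are nearly identical, so checking the condition number of R2 before solving would be a reasonable precaution.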

The dimension of the matrix R2 is (2M + 1)N × (2M + 1)N, so the computational complexity due to matrix inversion is increased from O(N³) for the OSR algorithm to O((2M + 1)³N³). However, since the BOSR performs weighting in both the IMF order and temporal directions, it can better capture signal correlations. The elements of the matrix R2 and the vector p2 can be estimated by sample averages. As in the OSR case, an adaptive approach can be utilized. After some derivation, we obtain the weight update equation for BOSR:

$$b_{ij}(n+1) = b_{ij}(n) + 2\mu e(n) c_i(n-j), \quad i = 1, \ldots, N, \; j = -M, \ldots, M. \qquad (26)$$

In the BOSR, the memory length M needs to be chosen. More samples in the window will improve the performance, as more signal memory is taken into consideration to account for the temporal correlation. However, the performance gain is no longer substantial once M is increased beyond a certain number. As such, we can set up an objective function similar to the Akaike information criterion (AIC) to determine the optimal memory length M [16]. This process is analogous to choosing the model order in statistical modeling.

4.1. Regularized bidirectional optimal signal reconstruction using EMD

Although the BOSR considers the time domain correlations between samples, a problem arises in calculating the optimal coefficients b∗ by (25), as the matrix R2 is sometimes ill conditioned.

To see why R2 is sometimes ill conditioned, let R2 = E{c(n)cT(n)} where

$$\mathbf{c}(n) = [c_1(n+M), \ldots, c_1(n-M), c_2(n+M), \ldots, c_2(n-M), \ldots, c_N(n+M), \ldots, c_N(n-M)]^T. \qquad (27)$$

Also denote R2(:, k) as the kth column of the matrix R2. It can be shown that

$$\mathbf{R}_2(:, k) = E\{\mathbf{c}(n) c_i(n-j)\}, \qquad (28)$$

where k = (i − 1) × (2M + 1) + j + M + 1 for i = 1, . . . , N, j = −M, . . . , M. Note that when the IMF order i is large, ci(n) tends to have fewer oscillations and thus smaller changes between consecutive samples. The extreme case is a nearly constant residue for the last IMF cN(n). Thus, ci(n) becomes smoother as the order i becomes large. Due to this fact, ci(n − j) and ci(n − j + 1) are very similar for large i. Consequently, the two columns R2(:, k) and R2(:, k + 1) are also very similar, which results in R2 being ill conditioned.
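The effect described above can be observed numerically by comparing the condition number of the lag correlation matrix for a rapidly varying component and for a smooth one. The sketch below is a toy experiment, not taken from the paper: white noise stands in for a low-order IMF, a 200-sample moving average of the same noise stands in for a smooth high-order IMF, and the function name is hypothetical.

```python
import numpy as np

def lag_correlation_cond(c, M=1):
    """Condition number of the sample correlation matrix of [c(n+M), ..., c(n-M)]."""
    c = np.asarray(c, dtype=float)
    T = len(c)
    X = np.array([c[M - j : T - M - j] for j in range(-M, M + 1)])
    R = X @ X.T / X.shape[1]
    return np.linalg.cond(R)

rng = np.random.default_rng(0)
w = rng.standard_normal(4000)
fast = w                                                   # stand-in for a low-order IMF
slow = np.convolve(w, np.ones(200) / 200, mode="valid")    # stand-in for a smooth high-order IMF
cond_fast = lag_correlation_cond(fast)
cond_slow = lag_correlation_cond(slow)
```

For the smooth component the lagged rows are almost identical, so the correlation matrix is close to singular and its condition number is much larger than for the noise-like component.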

To alleviate the potential ill-conditioning problem of the BOSR, we propose a regularized version of the BOSR (RBOSR). The original objective function J2 does not place any constraints on the bij coefficients. We add regularizing conditions on bij by restricting their values to the range −U ≤ bij ≤ U. This condition implies that the magnitudes of the coefficients are bounded by a constant U.

The original problem is thus changed into the following constrained optimization problem:

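$$\begin{aligned} \text{minimize} \quad & J_2 = E\left\{\left[d(n) - \sum_{i=1}^{N} \sum_{j=-M}^{M} b_{ij} c_i(n-j)\right]^2\right\} \\ \text{subject to} \quad & -U \le b_{ij} \le U, \quad \forall\, 1 \le i \le N, \; -M \le j \le M. \end{aligned} \qquad (29)$$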

To solve the above constrained optimization problem, we can invoke the Kuhn-Tucker condition [17], which gives a necessary condition for the optimal solution. The Lagrangian of the minimization problem can be written as

$$L(b_{ij}, \mu_{ij}, \lambda_{ij}) = J_2(b_{ij}) + \sum_{i=1}^{N} \sum_{j=-M}^{M} \mu_{ij}(-b_{ij} - U) + \sum_{i=1}^{N} \sum_{j=-M}^{M} \lambda_{ij}(b_{ij} - U). \qquad (30)$$

Applying the Kuhn-Tucker condition yields the following equations:

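$$\begin{aligned} \nabla L(b_{ij}, \mu_{ij}, \lambda_{ij}) = \frac{\partial J_2}{\partial b_{ij}} - \mu_{ij} + \lambda_{ij} &= 0, \\ \mu_{ij}(-b_{ij} - U) &= 0, \\ \lambda_{ij}(b_{ij} - U) &= 0, \\ \mu_{ij} &\ge 0, \\ \lambda_{ij} &\ge 0. \end{aligned} \qquad (31)$$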


Figure 7: MSE versus SNR (dB) for the five denoising algorithms: linear filter, PAR-EMD, OSR, BOSR, and RBOSR.

Iterative algorithms for general nonlinear optimization, such as the interior point method, can be utilized to find the optimal solution to the above problem [17]. A fundamental point of note is that the solution is guaranteed to be globally optimal since both the objective function and the constraints are convex functions.

An alternative approach to solving the constrained minimization problem is to view it as a quadratic programming problem. The objective function can be rewritten as

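$$J_2 = E\left\{\left[d(n) - \sum_{i=1}^{N} \sum_{j=-M}^{M} b_{ij} c_i(n-j)\right]^2\right\} = E\left\{\left[d(n) - \mathbf{b}^T \mathbf{c}(n)\right]^2\right\} = E\{d^2(n)\} - 2\mathbf{b}^T \mathbf{p}_2 + \mathbf{b}^T \mathbf{R}_2 \mathbf{b}, \qquad (32)$$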

where b, p2, R2 are defined as in (24), and c(n) is the vector in (27). The optimization problem can thus be restated as a standard quadratic programming problem:

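$$\begin{aligned} \text{minimize} \quad & J_2' = \mathbf{b}^T \mathbf{R}_2 \mathbf{b} - 2 \mathbf{p}_2^T \mathbf{b} \\ \text{subject to} \quad & -\mathbf{U} \preceq \mathbf{b} \preceq \mathbf{U}, \end{aligned} \qquad (33)$$

where the symbol ⪯ denotes component-wise less than or equal to for vectors. Since the objective function is convex and the inequality constraints are simple bounds, a faster conjugate gradient search for quadratic programming can be performed to find the optimal solution [17].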
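To give a feel for the bound-constrained problem in (33), the sketch below uses projected gradient descent, a simpler method than the conjugate gradient search mentioned above and chosen here only because it fits in a few lines. The function name, the fixed iteration count, and the step size rule are assumptions; R2 is taken to be symmetric positive semidefinite with a positive largest eigenvalue.

```python
import numpy as np

def box_qp_projected_gradient(R2, p2, U, iters=5000):
    """Minimise b^T R2 b - 2 p2^T b subject to -U <= b <= U (component-wise).

    Projected gradient descent: take a gradient step, then clip to the box.
    """
    R2 = np.asarray(R2, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # The gradient 2(R2 b - p2) is Lipschitz with constant 2*lambda_max(R2).
    step = 1.0 / (2.0 * np.linalg.eigvalsh(R2)[-1])
    b = np.zeros_like(p2)
    for _ in range(iters):
        grad = 2.0 * (R2 @ b - p2)
        b = np.clip(b - step * grad, -U, U)   # projection onto the box
    return b

# Toy problem (assumption): the unconstrained optimum [0.5, 5] violates the bound U = 2.
b_box = box_qp_projected_gradient([[2.0, 0.0], [0.0, 1.0]], [1.0, 5.0], U=2.0)
```

When the unconstrained solution of (25) already lies inside the box, the constrained and unconstrained solutions coincide; otherwise the bound limits the magnitude of the offending coefficients.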

5. APPLICATIONS

Having established the OSR and BOSR algorithms, we apply them to various applications. Two examples are given. The first application considered is signal denoising, where simulated random signals are used. In the second example, the proposed algorithms are applied to real biomedical signals to remove ECG interferences from EEG recording. The following example illustrates the denoising using the OSR, BOSR, and RBOSR algorithms and compares them with linear lowpass filtering and the partial reconstruction EMD (PAR-EMD) in [15]. The PAR-EMD method is based on the IMF signal energy, and the reconstructed signal is given by the partial summation of those IMFs whose energy exceeds an established threshold.

Figure 8: Performances for different memory length (MSE versus SNR in dB for M = 1, 2, 3, 4, 5). (a) Large-scale view, (b) zoomed-in view.

Example 1. The original signal in this example is a bilinear signal model:

x(n) = 0.5x(n− 1) + 0.6x(n− 1)v(n− 1) + v(n), (34)


Table 1: Optimal coefficients of the OSR algorithm.

IMF order i    1        2        3        4        5        6        7        8
a∗i            0.2859   0.5150   0.8496   0.8833   0.9710   0.9609   0.9639   0.9653

Table 2: Optimal coefficients b∗ij of the BOSR algorithm (M = 1).

               IMF order i
j       1        2        3        4         5         6         7          8
−1      0.2219   0.3843   0.2322   0.5301    −1.4488   −3.0298   8.7950     −123.9047
0       0.4654   0.3261   0.2439   −0.1592   3.6319    7.1100    −19.3460   246.7700
1       0.1899   0.2149   0.5048   0.5612    −1.2579   −3.1527   11.6191    −121.9207

where v(n) is white noise with variance equal to 0.01. The bilinear signal model is a type of nonlinear signal model. Additive Laplacian noise with variance 0.0092 is added to the signal to attain an SNR of 10 dB, where SNR is defined as the ratio of signal power to noise variance. The total signal length is 2000, and the first 1000 samples are used as the training signal d(n) to estimate the optimal OSR, BOSR, and RBOSR coefficients. Once these coefficients are determined, the remaining samples are used to test the denoising. The denoised signal is obtained by substituting the optimal coefficients into the reconstruction formulae (6) and (18). In the following, the denoising performance is evaluated by the mean square error calculated as

$$\mathrm{MSE} = \frac{1}{L_2 - L_1 + 1} \sum_{n=L_1}^{L_2} \left[x_o(n) - \hat{x}(n)\right]^2, \qquad (35)$$

where L1 and L2 are the starting and ending indices of the testing samples, and xo(n) and x̂(n) are the original noise-free and denoised signals, respectively.
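The experimental setup of Example 1 can be approximated with the following sketch. Several details are assumptions made for illustration and are not stated in the text: v(n) is drawn as Gaussian, the recursion starts from x(0) = v(0), and the noise variance is derived from the realised sample power of the signal, so it need not equal the value quoted above. The function names are hypothetical.

```python
import numpy as np

def bilinear_signal(T, rng, v_var=0.01):
    """x(n) = 0.5 x(n-1) + 0.6 x(n-1) v(n-1) + v(n), as in (34)."""
    v = rng.normal(0.0, np.sqrt(v_var), size=T)   # assumption: Gaussian white noise
    x = np.zeros(T)
    x[0] = v[0]                                   # assumption: initial condition
    for n in range(1, T):
        x[n] = 0.5 * x[n - 1] + 0.6 * x[n - 1] * v[n - 1] + v[n]
    return x

def add_laplacian_noise(x, snr_db, rng):
    """Add zero-mean Laplacian noise at a given SNR (signal power / noise variance)."""
    noise_var = np.mean(x ** 2) / 10 ** (snr_db / 10)
    scale = np.sqrt(noise_var / 2.0)              # Laplace variance is 2 * scale^2
    return x + rng.laplace(0.0, scale, size=x.shape), noise_var

def mse(x_orig, x_hat, L1, L2):
    """Mean square error over the testing indices L1..L2 inclusive, as in (35)."""
    seg = slice(L1, L2 + 1)
    return np.mean((x_orig[seg] - x_hat[seg]) ** 2)

rng = np.random.default_rng(0)
x_o = bilinear_signal(2000, rng)
x_noisy, noise_var = add_laplacian_noise(x_o, snr_db=10.0, rng=rng)
baseline = mse(x_o, x_noisy, 1000, 1999)          # error of the noisy signal before denoising
```

The value `baseline` gives the error of doing nothing, which is a useful reference when judging the MSE figures of any denoising method on the testing half of the signal.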

In the following, the signal memory M in the BOSR is chosen to be 1. Eight IMFs are obtained after the EMD decomposition. Hence, the total number of ai coefficients is 8 and the total number of bij coefficients is 24. In the RBOSR algorithm, the regularizing bound U is chosen to be 10. The optimal coefficients a∗i and b∗ij obtained by the OSR, BOSR, and RBOSR are listed in Tables 1, 2, and 3, respectively. These coefficients are also graphically represented in Figures 1, 2, and 3. It can be observed that the first several weighting coefficients for the OSR are relatively small. As the IMF order increases, the ai coefficients also increase to values close to one. This can be seen as a generalization of the PAR-EMD in which binary selection of the IMFs is replaced by linear weighting of the IMFs. The result is also in agreement with that of the PAR-EMD, where it is found that the lower-order IMFs contain more noise components than the higher-order IMFs. Consequently, lower-order IMFs should be assigned small weights in denoising. When comparing the optimal bij coefficients obtained by the BOSR and RBOSR, we see that the BOSR yields coefficients that differ in magnitude on the order of thousands (see Table 2 and Figure 2), while the optimal coefficients obtained by the RBOSR are closer to each other (see Table 3 and Figure 3). Therefore, the regularization process mitigates the numerical instability of the original BOSR algorithm.

Figure 9: ECG interference removal in EEG (amplitude in μV versus time in seconds, 16–30 s). (a) Original EEG, (b) EEG containing ECG interferences, (c) OSR (MSE = 4.1883), (d) adaptive OSR (MSE = 3.3599), (e) BOSR (MSE = 2.7189), (f) adaptive BOSR (MSE = 2.3354), (g) RBOSR (MSE = 2.0432).


Table 3: Optimal coefficients b∗ij of the regularized BOSR algorithm (M = 1).

               IMF order i
j       1        2        3        4        5        6        7         8
−1      0.2235   0.3774   0.2196   0.3947   0.1248   0.3133   −0.8960   0.0719
0       0.4651   0.3275   0.2643   0.1128   0.5437   0.3387   0.3509    0.3213
1       0.1891   0.2102   0.4897   0.4160   0.3245   0.3193   1.5481    0.5592

The denoising results are shown in Figure 4, where we also show the results of the Butterworth lowpass filtering and the PAR-EMD algorithm. The noisy signal is shown in Figure 4(a), in which testing samples 1000–1200 are shown. Figures 4(b), 4(c), 4(d), 4(e), and 4(f) show the denoised signals reconstructed by the linear filter, PAR-EMD, OSR, BOSR, and RBOSR, respectively, and compare the resulting signals with the original signal. It can be seen that the OSR, BOSR, and RBOSR produce a signal closer to the original signal than the other two methods. Moreover, the BOSR performs slightly better than the OSR since the residual error is smaller. The reason for the improved performance is that the BOSR takes the signal correlation into account. Furthermore, the performances of the BOSR and RBOSR are very close. This shows that even though the coefficients of the BOSR are much more dispersed than those of the RBOSR, the BOSR performance does not suffer from this. Measured quantitatively by the MSE from (35), these algorithms yield an MSE of 0.0193 for the linear filter, 0.01 for the PAR-EMD, 0.0063 for the OSR, 0.0046 for the BOSR, and 0.0046 for the RBOSR.

We remarked in Section 4 that the bidirectional bij coefficients act as an FIR filter in the time domain for the ith IMF. Therefore, it is interesting to investigate the behavior of these filters as the order of the IMF changes. Starting from the first IMF, we plot the frequency responses of the filters used in the BOSR algorithm in Figure 5. It can be seen that the first filter B1(ω), applied to IMF 1, exhibits lowpass characteristics. As the IMF order increases, the filters first become bandpass filters and then more highpass-like filters. In the denoising application, the first IMF contains strong noise components, so the filter tries to filter the noise out and leaves only lowpass signal components. For the mid-order IMFs, noise components are mainly located in certain frequency bands, which tunes the filter to be bandpass. For high-order IMFs, the filter gain is high and the DC frequency range is kept nearly unchanged (0 dB). The BOSR is equivalent to filtering the signal by N different filters in N different IMFs. This would not be possible if we simply used the partial summation of IMFs. The frequency responses of the filters used in the RBOSR are shown in Figure 6, where a different behavior is observed. These filters are of either lowpass or bandpass type, and no highpass characteristics are exhibited. Also, the filter gains for the RBOSR are generally smaller than those of the BOSR, which is a result of coefficient regularization in the optimization process.
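The lowpass and highpass behavior described above can be checked directly from the tabulated coefficients by evaluating Bi(ω) = Σj bij e^{−jωj} at ω = 0 and ω = π. The sketch below uses the Table 2 columns for IMF 1 and IMF 8; the function name is hypothetical, and whether the curves in Figure 5 are normalized in some way is not stated, so the dB values here need not match the plotted scale.

```python
import numpy as np

def fir_gain_db(b_row, omega):
    """Magnitude response in dB of the centred FIR filter with taps b_row[j + M] = b_ij."""
    b_row = np.asarray(b_row, dtype=float)
    M = (len(b_row) - 1) // 2
    j = np.arange(-M, M + 1)
    H = np.exp(-1j * np.outer(omega, j)) @ b_row   # B_i(omega) = sum_j b_ij e^{-i omega j}
    return 20.0 * np.log10(np.abs(H))

omega = np.array([0.0, np.pi])                     # DC and the highest digital frequency
b_imf1 = [0.2219, 0.4654, 0.1899]                  # Table 2, IMF 1, j = -1, 0, 1
b_imf8 = [-123.9047, 246.7700, -121.9207]          # Table 2, IMF 8, j = -1, 0, 1
gain_imf1 = fir_gain_db(b_imf1, omega)
gain_imf8 = fir_gain_db(b_imf8, omega)
```

For IMF 1 the gain at ω = π is far below the gain at DC, while for IMF 8 the gain at DC stays within about 1 dB of 0 dB and the gain at ω = π is very large, which is consistent with the lowpass and highpass-like descriptions in the text.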

A more thorough study using a wide range of different realizations of stochastic signals is carried out by Monte Carlo simulation. Figure 7 shows the MSE versus SNR for the five algorithms: linear filtering, PAR-EMD, OSR, BOSR, and RBOSR. At each SNR, 500 runs are performed to obtain an averaged MSE as shown in the figure. We see that the OSR and BOSR algorithms outperform the linear filtering and PAR-EMD over the entire SNR range. The performances of the BOSR and RBOSR are better than that of the OSR, as expected. The BOSR performs slightly better than the RBOSR even though its coefficients are less regular.

To investigate the effects of the memory length M on the BOSR performance, five different values of M are chosen (M = 1, 2, 3, 4, 5). Monte Carlo simulation is carried out to compare the performances of the BOSR for different values of M. As Figure 8(a) shows, using a larger M does not significantly improve the performance, since the curves get closer to each other as M increases. A zoomed-in view around SNR = 15 dB in Figure 8(b) shows more clearly that a larger M yields a lower MSE, though this difference is not easily distinguishable in the larger scale plot. It is therefore advisable to choose a small M rather than a large M in the BOSR, since a small M performs nearly as well as a large M but with less complexity.

Example 2. The electroencephalogram (EEG) is widely used as an important diagnostic tool for neurological disorders. Cardiac pulse interference is one of the sources that affect the EEG recording [18]. The EMD method is especially useful for nonlinear and nonstationary biomedical signals [19–22]. The optimal reconstruction algorithms based on EMD are therefore used to remove the ECG interferences from the EEG recording.

Real EEG and ECG recordings are obtained from a 37-year-old woman at Alfred I. duPont Hospital for Children in Wilmington, Delaware. The signals are sampled at 128 Hz. The EEG signal with ECG interferences is obtained by adding an attenuated ECG component to the EEG, that is, x(t) = xe(t) + αxc(t), where xe(t) is the EEG, xc(t) is the ECG, and α = 0.6 reflects the attenuation in the pathways. The total duration of the recording is about 29 minutes, and we select the first 2000 samples (0–15.625 seconds) as the training samples and the next 2000 samples (15.625–31.25 seconds) as the testing samples. The original EEG and the EEG containing ECG interferences are shown in Figures 9(a) and 9(b), respectively. It is clear that the spikes due to the QRS complex of the ECG are prominent in the EEG. The spectra of ECG and EEG overlap because the bandwidth for ECG monitoring is 0.5–50 Hz, while the frequency bands of EEG range from 0.5–13 Hz and above [23]. Therefore, simple filtering techniques cannot be used to separate EEG from ECG interferences. The three optimal reconstruction methods, OSR, BOSR, and RBOSR, together with their adaptive versions, are applied to the ECG-contaminated EEG signal. The memory length M is set to 1 for both BOSR and RBOSR, and the bound U for RBOSR is chosen to be 10. The reconstructed samples are shown in Figures 9(c), 9(d), 9(e), 9(f), and 9(g). The resulting signal of the OSR still has some residual spikes. Both BOSR and RBOSR yield signal waveforms that are closer to the original EEG. However, there is a baseline wander in the initial stage of the BOSR result, while this baseline wander does not exist in the RBOSR result. Adaptive modes of the OSR and BOSR are also used, and the results are shown in Figures 9(d) and 9(f), respectively. From these figures, all these optimal reconstruction methods are able to remove the ECG interferences from the EEG to some extent. However, the BOSR and RBOSR are better than the OSR, which agrees with the first example. In terms of MSE, the OSR has MSE = 4.1883, while the BOSR and RBOSR achieve MSEs of 2.7189 and 2.0432, respectively. The adaptive modes of the OSR and BOSR yield MSEs of 3.3599 and 2.3354, thus slightly improving on the original algorithms.

6. CONCLUSION

The empirical mode decomposition is a tool for analyzing nonlinear and nonstationary signals. Conventional EMD, however, does not impose optimality conditions for reconstruction from IMFs. In this paper, several improved versions of EMD signal reconstruction that are optimal in the minimum mean square error sense are proposed. The first algorithm, OSR, estimates a given signal by linear weighting of the IMFs. The coefficients are determined by solving a linear set of equations. To consider the temporal structure of a signal, BOSR is then proposed. The weighting of the BOSR is carried out not only in the IMF order direction, but also in the temporal direction. It is able to compensate for the time correlation between adjacent samples. The proposed algorithms are applied to the signal denoising problem, where both the OSR and BOSR have better performance than the traditional partial reconstruction EMD. These methods are also applied to real biomedical signals, where ECG interferences are removed from EEG recordings. The optimal EMD reconstruction methods proposed in this paper give some new insight into this promising signal analysis tool.

REFERENCES

[1] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[2] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, part 1, pp. 112–114, 2004.

[3] P. Flandrin and P. Goncalves, “Empirical mode decomposition as a data-driven wavelet-like expansions,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 2, no. 4, pp. 477–496, 2004.

[4] K. Coughlin and K. K. Tung, “Eleven-year solar cycle signal throughout the lower atmosphere,” Journal of Geophysical Research, vol. 109, no. D21, p. D21105, 2004.

[5] J. C. Nunes, S. Guyot, and E. Delechelle, “Texture analysis based on local analysis of the bidimensional empirical mode decomposition,” Machine Vision and Applications, vol. 16, no. 3, pp. 177–188, 2005.

[6] D. Pines and L. Salvino, “Structural health monitoring using empirical mode decomposition and the Hilbert phase,” Journal of Sound and Vibration, vol. 294, no. 1-2, pp. 97–124, 2006.

[7] W. Huang, Z. Shen, N. E. Huang, and Y. C. Fung, “Engineering analysis of biological variables: an example of blood pressure over 1 day,” Proceedings of the National Academy of Sciences of the United States of America, vol. 95, no. 9, pp. 4816–4821, 1998.


[9] B. Weng, M. Blanco-Velasco, and K. E. Barner, “ECG denoising based on the empirical mode decomposition,” in Proceedings of the 28th Annual International Conference of IEEE Engineering in Medicine and Biology Society (EMBS ’06), pp. 1–4, New York, NY, USA, August-September 2006.

[10] J. C. Nunes, Y. Bouaoune, E. Delechelle, O. Niang, and Ph. Bunel, “Image analysis by bidimensional empirical mode decomposition,” Image and Vision Computing, vol. 21, no. 12, pp. 1019–1026, 2003.


[12] N. Huang and N. O. Attoh-Okine, The Hilbert-Huang Transform in Engineering, CRC Press, Boca Raton, Fla, USA, 2005.

[13] B. Weng, M. Blanco-Velasco, and K. E. Barner, “Baseline wander correction in ECG by the empirical mode decomposition,” in Proceedings of the 32nd Annual Northeast Bioengineering Conference (NEBC ’06), pp. 135–136, Easton, Pa, USA, April 2006.

[14] E. Delechelle, J. Lemoine, and O. Niang, “Empirical mode decomposition: an analytical approach for sifting process,” IEEE Signal Processing Letters, vol. 12, no. 11, pp. 764–767, 2005.

[15] P. Flandrin, P. Goncalves, and G. Rilling, “Detrending and denoising with empirical mode decomposition,” in Proceedings of the 12th European Signal Processing Conference (EUSIPCO ’04), Vienna, Austria, September 2004.

[16] H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.

[17] D. G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, Reading, Mass, USA, 1973.

[18] J. Sijbers, J. Van Audekerke, M. Verhoye, A. Van der Linden, and D. Van Dyck, “Reduction of ECG and gradient related artifacts in simultaneously recorded human EEG/MRI data,” Magnetic Resonance Imaging, vol. 18, no. 7, pp. 881–886, 2000.

[19] M. Blanco-Velasco, B. Weng, and K. E. Barner, “A new ECG enhancement algorithm for stress ECG tests,” in Proceedings of IEEE International Conference on Computer in Cardiology (CIC ’06), pp. 917–920, Valencia, Spain, September 2006.

[20] B. Weng, G. Xuan, J. Kolodzey, and K. E. Barner, “Empirical mode decomposition as a tool for DNA sequence analysis from terahertz spectroscopy measurements,” in Proceedings of IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS ’06), pp. 63–64, College Station, Tex, USA, May 2006.


[21] J. Dauwels, T. M. Rutkowski, F. Vialatte, and A. Cichocki, “On the synchrony of empirical mode decompositions with application to electroencephalography,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’08), pp. 473–476, Las Vegas, Nev, USA, March 2008.

[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, “ECG signal denoising and baseline wander correction based on the empirical mode decomposition,” Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.

[23] R. M. Rangayyan, Biomedical Signal Analysis: A Case-Study Approach, IEEE Press, New York, NY, USA, 2002.


Research Article
Fast and Adaptive Bidimensional Empirical Mode Decomposition Using Order-Statistics Filter Based Envelope Estimation

Sharif M. A. Bhuiyan, Reza R. Adhami, and Jesmin F. Khan

Department of Electrical and Computer Engineering, University of Alabama in Huntsville, 272 Engineering Building, Huntsville, AL 35899, USA

Correspondence should be addressed to Sharif M. A. Bhuiyan, [email protected]

Received 17 August 2007; Revised 24 January 2008; Accepted 27 February 2008


A novel approach for bidimensional empirical mode decomposition (BEMD) is proposed in this paper. BEMD decomposes an image into multiple hierarchical components known as bidimensional intrinsic mode functions (BIMFs). In each iteration of the process, two-dimensional (2D) interpolation is applied to a set of local maxima (minima) points to form the upper (lower) envelope. However, 2D scattered data interpolation methods require long computation times and introduce other artifacts in the decomposition. This paper suggests a simple but effective method of envelope estimation that replaces the surface interpolation. In this method, order statistics filters are used to get the upper and lower envelopes, where the filter size is derived from the data. Based on the properties of the proposed approach, it is considered a fast and adaptive BEMD (FABEMD). Simulation results demonstrate that FABEMD is not only faster and adaptive, but also outperforms the original BEMD in terms of the quality of the BIMFs.

Copyright © 2008 Sharif M. A. Bhuiyan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Empirical mode decomposition (EMD), originally developed by Huang et al. [1, 2], is a data-driven signal processing algorithm that has been established as an effective means of analyzing nonlinear and nonstationary data by obtaining local features and the time-frequency distribution of the data. The first step of this method decomposes the data/signal into its characteristic intrinsic mode functions (IMFs), while the second step finds the time-frequency distribution of the data from each IMF by utilizing the concepts of the Hilbert transform and instantaneous frequency. The complete process is also known as the Hilbert-Huang transform (HHT) [1]. This decomposition technique has also been extended to analyze two-dimensional (2D) data/images, which is known as bidimensional EMD (BEMD), image EMD (IEMD), 2D EMD, and so on [3–8]. Both EMD and BEMD require finding local maxima and local minima points (jointly known as local extrema points) and subsequent interpolation of those points in each iteration of the process. Local extrema points of a one-dimensional (1D) signal are obtained using either a sliding window or a local derivative, and local extrema points of 2D data/images are extracted using a sliding window or various morphological operations [1–4]. Cubic spline interpolation is preferred for 1D interpolation, while various types of radial basis functions, multilevel B-splines, Delaunay triangulation, finite-element methods, and so on have been used for 2D scattered data interpolation [1–7], where Delaunay triangulation and the finite-element method provide relatively faster decomposition compared to the other methods. Besides the 2D implementation of the BEMD process, 1D EMD has also been applied to images to obtain 2D IMFs or bidimensional IMFs (BIMFs) [8–10]. In this technique, each row and/or each column of the 2D data is processed by 1D EMD, which makes it a faster process. However, it has been found that this 1D implementation results in poorer BIMF components compared to the standard 2D procedure, because the former ignores the correlation among the rows and/or columns of a 2D image [11].

In EMD or BEMD, extraction of each IMF or BIMF requires several iterations. Hence, extrema detection and interpolation at each iteration make the process complicated and time consuming. The situation is more difficult for BEMD, which requires 2D scattered data interpolation at each iteration. For some images, decomposition may take hours or days unless an additional stopping criterion is employed, whereas additional stopping criteria may result in inaccurate and incomplete decomposition [12–15], which may not be desired. Another common and significant problem related to 2D scattered data interpolation in BEMD is that the maxima or minima map often does not contain any data points (interpolation centers) in the boundary region, a problem that may be more severe for the later modes of decomposition. Currently available scattered data interpolation methods are inefficient in handling this kind of situation. Additionally, the effect of incorrect interpolation at the boundary gradually propagates into the middle region from iteration to iteration and from BIMF mode to BIMF mode, causing corrupted BIMFs. Overshooting or undershooting is another problem of interpolation-based envelope estimation, which causes incorrect BIMFs. Although a few modifications have been suggested in the literature to reduce the number of iterations and/or to overcome the boundary effects [6, 11, 12], the technique still suffers from the above-mentioned problems to some extent. In the BEMD process, the number of extrema points decreases from one mode to the next. For the later modes, there may be very few irregularly spaced local maxima or minima points, which can cause highly erroneous and misleading upper or lower envelopes, and thus incorrect BIMFs. In order to improve the algorithm performance, some modifications have been suggested for EMD [16–20], which may not be useful for BEMD in terms of processing speed and algorithm complexity. Moreover, any additional processing steps may make the process more complex and computationally intensive.

BEMD is a promising image processing algorithm that can be applied successfully to various real-world problems, for example, medical image analysis, pattern analysis, texture analysis, and so on. However, the cost of scattered data interpolation in BEMD limits its application to very small images, while real images may be much bigger than is suitable for BEMD processing. It is also not appropriate to reduce the size of the images only for the purpose of BEMD processing and thus lose the fine details and/or relevant information. Hence, improvement of the BEMD algorithm is very important. In this paper, a novel BEMD approach is suggested that replaces the interpolation step by a direct envelope estimation method. In this technique, spatial domain sliding order-statistics filters, namely, MAX and MIN filters, are employed to get the running maxima and running minima of the data, followed by a smoothing operation to get the upper envelope and lower envelope, respectively. The size of the order-statistics filters is derived from the available information in the maxima and minima maps. In addition to eliminating the poor interpolation effects and reducing the computation time for each iteration, this process makes it possible to perform only one iteration for each BIMF. The proposed fast and adaptive BEMD (FABEMD) method can be a good alternative for efficient BEMD processing.

For ease of discussion, some new terms have been introduced in this paper in place of the existing terms associated with EMD or BEMD. Before introducing the novel concepts of FABEMD, the regular BEMD process is briefly reviewed in Section 2 of this paper. The detailed description of the proposed FABEMD algorithm is given in Section 3. Although the extrema detection method suggested in FABEMD is the same as in BEMD, it is explained in the first part of Section 3, because the proposed envelope estimation technique requires the extrema information as its foundation. The second part of Section 3 describes the new method of envelope estimation. Simulation results with various images comparing FABEMD and BEMD are given in Section 4. Finally, concluding remarks are given in Section 5.

2. BEMD OVERVIEW

EMD or BEMD is a sifting process that decomposes a signal into its IMFs or BIMFs and a residue, based on the local frequency or oscillation information. The first IMF/BIMF contains the highest local frequencies of oscillation or the highest local spatial scales, the final IMF/BIMF contains the lowest local frequencies of oscillation, and the residue contains the trend of the signal/data. Like the time-frequency distribution with EMD, acquiring the space-spatial-frequency distribution of 2D data/images is possible with BEMD, which may be named bidimensional HHT (BHHT). Although direct estimation of the horizontal and vertical frequencies of BIMFs has been studied [21], BHHT has not yet been reported in the literature. It is claimed and experimentally shown that the HHT performs better than the other existing techniques of analyzing the time-frequency distribution of nonstationary and nonlinear data [1]. Thus, HHT or BHHT can better represent the local frequency and amplitude scale of the signal if the IMF or BIMF components are well formed. However, decomposition of an image into BIMFs alone can offer a wide variety of image processing applications. Hence, the following discussion will be limited to the first part of BEMD only, that is, decomposition of an image into BIMFs and the residue. It should be noted that once the BIMFs are obtained, the space-spatial-frequency distribution of an image can be acquired with standard techniques of 2D Hilbert spectral analysis (HSA).

2.1. Properties of IMF/BIMF

The IMFs of a signal obtained by EMD are expected to have the following properties [1, 2, 12, 22].

(i) In the whole data set, the number of local extrema (maxima and minima together) and the number of zero crossings must be equal or differ by at most one.

(ii) There should be only one mode of oscillation, that is, only one local maximum or local minimum, between two successive zero crossings.

(iii) At any point, the mean value of the upper and lower envelopes, defined by the local maxima and minima points, is zero or nearly zero.

(iv) The IMFs are locally orthogonal to each other and as a set.


In fact, property (i) ensures property (ii), and vice versa. The definition and properties of the BIMFs are slightly different from those of the IMFs. It is sufficient for BIMFs to follow only the final two properties, (iii) and (iv), given above [3, 4]. In fact, due to the properties of an image and the BEMD process, it is not possible to satisfy the first two properties, (i) and (ii), in the case of BIMFs, since the maxima and minima points are defined in a 2D scenario for an image. For the same reason, it is also difficult or impossible to define and/or to achieve any characteristic relationship between the number of maxima points and the number of minima points for BIMFs.

2.2. Steps of BEMD

The required properties of IMFs are achieved via an “empirical” iterative process [1] in EMD. The same algorithm applies to BEMD as well, where extrema detection and interpolation are carried out using 2D versions of the corresponding 1D methods. Let the original image be denoted as $I$, a BIMF as $F$, and the residue as $R$. In the decomposition process, the $i$th BIMF $F_i$ is obtained from its source image $S_i$, where $S_i$ is a residue image obtained as $S_i = S_{i-1} - F_{i-1}$ and $S_1 = I$. It requires one or more iterations to obtain $F_i$, where the intermediate temporary state of the BIMF (ITS-BIMF) in the $j$th iteration can be denoted as $F_{Tj}$. With the definition of the variables, the steps of the BEMD process can be summarized as follows [1–5].

(i) Set $i = 1$. Take $I$ and set $S_i = I$.

(ii) Set $j = 1$. Set $F_{Tj} = S_i$.

(iii) Obtain the local maxima map (LMMAX) of $F_{Tj}$, denoted as $P_j$.

(iv) Form the upper envelope (UE) of $F_{Tj}$, denoted as $\mathrm{UE}_j$, by interpolating the maxima points in $P_j$.

(v) Obtain the local minima map (LMMIN) of $F_{Tj}$, denoted as $Q_j$.

(vi) Form the lower envelope (LE) of $F_{Tj}$, denoted as $\mathrm{LE}_j$, by interpolating the minima points in $Q_j$.

(vii) Find the mean/average envelope (ME) as $\mathrm{ME}_j = (\mathrm{UE}_j + \mathrm{LE}_j)/2$.

(viii) Calculate $F_{T(j+1)}$ as $F_{T(j+1)} = F_{Tj} - \mathrm{ME}_j$.

(ix) Check whether $F_{T(j+1)}$ follows the BIMF properties. These criteria are verified by finding the standard deviation (SD), denoted as $D$, between $F_{T(j+1)}$ and $F_{Tj}$ as defined below and comparing it to the desired threshold [1, 3]:

\[
D = \sum_{x=1}^{M} \sum_{y=1}^{N} \frac{\left| F_{T(j+1)}(x, y) - F_{Tj}(x, y) \right|^{2}}{\left| F_{Tj}(x, y) \right|^{2}}, \tag{1a}
\]

where $(x, y)$ denotes the coordinate of the 2D data, $M$ is the total number of rows, and $N$ is the total number of columns in the 2D data. The SD can also be defined as

\[
D = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left| F_{T(j+1)}(x, y) - F_{Tj}(x, y) \right|^{2}}{\sum_{x=1}^{M} \sum_{y=1}^{N} \left| F_{Tj}(x, y) \right|^{2}}. \tag{1b}
\]

Although both of the SD measures in (1a) and (1b) provide a global measure of SD, the latter is not dominated by the local fluctuations of the denominator. Normally, a low value of $D$ (e.g., below 0.5 for (1a) and below 0.05 for (1b)) is chosen to ensure a nearly zero envelope mean of the BIMF.

(x) If $F_{T(j+1)}$ meets the criteria given in step (ix), then take $F_i = F_{T(j+1)}$. Set $i = i + 1$ first and then $S_i = S_{i-1} - F_{i-1}$. Go to step (xi). If $F_{T(j+1)}$ does not meet the stopping criteria, then set $j = j + 1$, go to step (iii), and continue up to step (x) as before until the criteria are fulfilled.

(xi) Determine whether $S_i$ has fewer than three extrema points; if so, this is the residue $R$ of the image (i.e., $R = S_i$), and the decomposition is complete. Otherwise, go to step (ii) and continue up to step (xi) to obtain the subsequent BIMFs. In the process of extracting the BIMFs, the number of extrema points in $S_{i+1}$ should be lower than that in $S_i$.
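The difference between the two SD measures in step (ix) can be made concrete with a minimal sketch. It assumes images stored as nested Python lists; the function names `sd_pointwise` and `sd_global` and the small constant `eps` (added only to avoid division by zero in (1a)) are illustrative choices and are not part of the original method.

```python
def sd_pointwise(ft_next, ft_cur, eps=1e-12):
    """SD as in (1a): each squared change is divided by the squared pixel value."""
    total = 0.0
    for row_next, row_cur in zip(ft_next, ft_cur):
        for a, b in zip(row_next, row_cur):
            # eps is an assumption of this sketch; (1a) itself has no such term
            total += (a - b) ** 2 / (b ** 2 + eps)
    return total


def sd_global(ft_next, ft_cur):
    """SD as in (1b): summed squared change divided by summed squared values."""
    num = sum((a - b) ** 2
              for row_next, row_cur in zip(ft_next, ft_cur)
              for a, b in zip(row_next, row_cur))
    den = sum(b ** 2 for row_cur in ft_cur for b in row_cur)
    return num / den


# Hypothetical 2 x 2 states of one ITS-BIMF in two successive iterations.
ft_cur = [[1.0, 2.0], [4.0, 0.1]]
ft_next = [[0.9, 2.1], [4.2, 0.0]]
d_1a = sd_pointwise(ft_next, ft_cur)
d_1b = sd_global(ft_next, ft_cur)
```

In this hypothetical example the single pixel with a small value (0.1) contributes almost all of `d_1a`, whereas `d_1b` stays small, which is the behavior the text attributes to local fluctuations of the denominator.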

The BIMFs and the residue of an image together can be named bidimensional empirical mode components (BEMCs). Except for the truncation error of the digital computer, the summation of all BEMCs returns the original data/image, as given by

\[
\Sigma_C = \sum_{i=1}^{K+1} C_i = I, \tag{1}
\]

where $C_i$ is the $i$th BEMC and $K$ is the total number of BIMFs excluding the residue. An orthogonality index (OI), denoted as $O$, has been proposed for IMFs in [1], which may be extended to the case of BEMCs as follows:

\[
O = \sum_{x=1}^{M} \sum_{y=1}^{N} \left( \sum_{i=1}^{K+1} \sum_{j=1}^{K+1} \frac{C_i(x, y)\, C_j(x, y)}{\Sigma_C^{2}(x, y)} \right). \tag{2}
\]

A low value of OI indicates a good decomposition in terms of local orthogonality among the BEMCs. In general, OI values less than or equal to 0.1 are acceptable.
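The reconstruction property and the OI can be checked numerically with a short sketch. Two assumptions are made here and should be read as such: the double sum in (2) is taken over cross terms only ($i \neq j$), since including the terms with $i = j$ would make every pixel contribute exactly one, and pixels where the component sum is zero are skipped. The function names are hypothetical.

```python
def reconstruct(components):
    """Pixel-wise sum of all BEMCs, which should return the original image."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*components)]


def orthogonality_index(components):
    """OI as in (2), using cross terms only (i != j); an assumption of this sketch."""
    rows, cols = len(components[0]), len(components[0][0])
    oi = 0.0
    for x in range(rows):
        for y in range(cols):
            values = [c[x][y] for c in components]
            total = sum(values)
            # sum over i != j of C_i * C_j equals (sum C)^2 - sum of C^2
            cross = total ** 2 - sum(v * v for v in values)
            if total != 0.0:  # skip zero-valued pixels (assumption)
                oi += cross / total ** 2
    return oi


# Hypothetical components with no pixel-wise overlap, hence zero cross terms.
bimf = [[1.0, 0.0], [0.0, -2.0]]
residue = [[0.0, 3.0], [4.0, 0.0]]
image = reconstruct([bimf, residue])
oi_disjoint = orthogonality_index([bimf, residue])
oi_identical = orthogonality_index([[[1.0]], [[1.0]]])
```

Two identical single-pixel components give an index of 0.5 under these assumptions, while components that never overlap give zero.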

2.3. Issues related to BEMD

The decomposition of an image into BEMCs is not a unique process. The number of BEMCs and their characteristics depend on the extrema detection method, the interpolation technique, and the stopping criteria of the iterations for each BIMF. In that sense, there are an infinite number of BEMC sets for each image [12]. As mentioned in Section 1, local extrema (maxima and minima) points of 2D data/images are obtained using a 2D sliding window or various morphological operations [1–4], and 2D scattered data interpolation methods such as radial basis functions, multilevel B-splines, Delaunay triangulation, the finite-element method, and so forth [3–7] have been used for interpolating the extrema points to form the upper and lower envelopes. To stop the iterations for each BIMF, the SD threshold criterion is mostly used to satisfy the zero envelope mean, although there are several additional stopping criteria that may be employed [12–15]. The performance of scattered data interpolation in the BEMD process is highly dependent on the interpolation centers, their orientation, location, number, and so on. Hence, local maxima and minima maps play a significant role in creating the upper and lower envelopes. The absence or lack of extrema points at the boundaries of the ITS-BIMFs $F_{Tj}$, and the presence of very few extrema points in the source images $S_i$ for higher values of $i$, cause erroneous surface interpolation that results in misleading upper or lower envelopes and hence incorrect BIMFs. Because the surface interpolation method fits a surface through an iterative optimization approach utilizing the scattered data arising from the extrema points, it makes the BEMD process extremely slow.

3. FABEMD ALGORITHM DETAILS

To overcome the difficulty of implementing BEMD via surface interpolation, a novel approach is devised that eliminates the need for surface interpolation. This new BEMD process, named fast and adaptive BEMD (FABEMD), differs from the original BEMD algorithm in the process of estimating the upper and lower envelopes and in limiting the number of iterations per BIMF to one. Hence, the steps of the FABEMD algorithm remain the same as those of BEMD given in Section 2.2, with a maximum value of $j$ (the iteration index for each BIMF) equal to one considered sufficient. The details of extrema detection and envelope formation in the FABEMD process are discussed in this section.

3.1. Detection of local extrema

Detection of local extrema means finding the local maxima and minima points from the given data. The 2D array of local maxima points is called a maxima map (LMMAX) and the 2D array of local minima points is called a minima map (LMMIN). As in BEMD, a neighboring window method is employed to find local maxima and local minima points from the $j$th ITS-BIMF $F_{Tj}$ of any source image $S_i$, where $F_{Tj} = S_i$ for $j = 1$ ($i = 1, 2, \ldots, K$). In this method, a data point/pixel is considered a local maximum (minimum) if its value is strictly higher (lower) than all of its neighbors. Let $A$ be an $M \times N$ 2D matrix represented by

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \cdots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}, \tag{3}
\]

where $a_{mn}$ is the element of $A$ located in the $m$th row and $n$th column. Let the window size for local extrema determination be $w_{ex} \times w_{ex}$. Then,

\[
a_{mn} \triangleq
\begin{cases}
\text{Local Maximum} & \text{if } a_{mn} > a_{kl}; \\
\text{Local Minimum} & \text{if } a_{mn} < a_{kl},
\end{cases} \tag{4}
\]

where

\[
\begin{aligned}
k &= m - \frac{w_{ex} - 1}{2} : m + \frac{w_{ex} - 1}{2}, \quad (k \neq m); \\
l &= n - \frac{w_{ex} - 1}{2} : n + \frac{w_{ex} - 1}{2}, \quad (l \neq n).
\end{aligned} \tag{5}
\]

[Figure 1(a): sample 8 × 8 data matrix, with the 3 × 3 windows centered at $a_{32}$, $a_{75}$, and $a_{26}$ shown on darker grids.]

(a)

0 0 0 0 0 0 0 0
0 0 0 0 7 0 9 0
0 8 0 0 0 0 0 0
0 0 0 4 0 0 0 8
6 0 0 0 0 0 0 0
0 0 0 0 0 0 0 8
9 0 0 0 0 0 0 0
0 0 0 9 0 9 0 0

(b)

0 0 0 1 0 2 0 0
0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0
0 0 0 1 0 0 3 0
1 0 0 0 0 0 0 0
0 0 0 0 6 0 0 0
0 0 1 0 0 0 0 0

(c)

Figure 1: (a) A sample 8 × 8 data matrix; (b) local maxima map obtained from (a); and (c) local minima map obtained from (a).

Generally, a 3 × 3 window (i.e., $w_{ex} = 3$) results in an optimum extrema map for given 2D data. A larger window may be used in some applications, but this will result in a lower, or at most equal, number of local extrema points for a given data matrix. Let us consider the 8 × 8 data matrix given in Figure 1(a) for illustration purposes. The maxima map given in Figure 1(b) and the minima map given in Figure 1(c) are obtained when a 3 × 3 neighboring window is used for every point in the matrix. For finding extrema points at the boundary or corner, the neighboring points within the window that are beyond the image are neglected. As an example, 3 × 3 windows centered at $a_{32}$, $a_{75}$, and $a_{26}$ with darker grids are also shown in Figure 1(a). The center element of the first window is a local maximum, the center element of the second window is a local minimum, while the center element of the third window is neither a local maximum nor a local minimum.
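The neighboring window test can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the image is a nested list, uses 0-based indices, excludes only the center pixel from the comparison, and clips the window at the image border as described above. The function name `local_extrema` and the 3 × 3 test matrix are hypothetical.

```python
def local_extrema(a, w_ex=3):
    """Return (maxima, minima) as lists of 0-based (row, col) positions."""
    rows, cols = len(a), len(a[0])
    half = (w_ex - 1) // 2
    maxima, minima = [], []
    for m in range(rows):
        for n in range(cols):
            # neighbors inside the window; points beyond the image are neglected
            neighbors = [
                a[k][l]
                for k in range(max(0, m - half), min(rows, m + half + 1))
                for l in range(max(0, n - half), min(cols, n + half + 1))
                if (k, l) != (m, n)
            ]
            if all(a[m][n] > v for v in neighbors):
                maxima.append((m, n))      # strictly higher than every neighbor
            elif all(a[m][n] < v for v in neighbors):
                minima.append((m, n))      # strictly lower than every neighbor
    return maxima, minima


# Hypothetical 3 x 3 data: one central peak and four corner minima.
data = [[1, 2, 1],
        [2, 9, 2],
        [1, 2, 0]]
maxima, minima = local_extrema(data)
```

Returning coordinate lists instead of zero-filled maps is a design choice of this sketch; it avoids ambiguity when an extremum itself has the value zero.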

3.2. Generating upper and lower envelopes

After obtaining the maxima and minima maps, $P_j$ and $Q_j$, respectively, from a given ITS-BIMF $F_{Tj}$, the next step is to create the continuous upper and lower envelopes, $\mathrm{UE}_j$ and $\mathrm{LE}_j$. In the usual BEMD, suitable 2D scattered data interpolation is applied to $P_j$ and $Q_j$ to create these envelopes. In this work, a simple but efficient modification has been formulated for the generation of the upper and lower envelopes. This approach applies two order-statistics filters to approximate the envelopes: a MAX filter for the upper envelope and a MIN filter for the lower envelope. Order-statistics filters are spatial filters whose response is based on ordering (ranking) the elements contained within the data area encompassed by the filter [23]. The response of the filter at any point is determined by the ranking result. The crucial part of applying the order-statistics filters for envelope estimation is to determine an appropriate size for the filter. Based on the desired properties of BIMFs along with the characteristics of $P_j$ and $Q_j$ for a given $S_i$, the method described in Section 3.2.1 is developed for window size determination to extract the corresponding BIMF $F_i$.

3.2.1. Determining window size for order-statistics filters

The window size for the order-statistics filters is determined based on the maxima and minima maps obtained from a source image $S_i$, that is, based on $P_j$ and $Q_j$ derived from $F_{Tj}$ when $j = 1$ and $F_{Tj} = S_i$. For each local maximum point in $P_j$, that is, for each nonzero element in $P_j$, the Euclidean distance to the nearest nonzero element is calculated. The array of distances thus obtained is called the adjacent maxima distance array (AMAXDA), denoted as $d_{\text{adj-max}}$, where the number of elements in AMAXDA is equal to the number of local maxima points in the maxima map $P_j$. Figure 2(a) shows the maxima map of Figure 1(b) with the maxima points represented as bright boxes and the other points represented as dark boxes. Figures 2(b) and 2(c) show two points of interest from the set of maxima points, marked with “○”, and their corresponding nearest neighbors, marked with a different symbol.

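Similarly, the array of distances obtained from the local minima map $Q_j$ is called the adjacent minima distance array (AMINDA), denoted as $d_{\text{adj-min}}$, where the number of elements in AMINDA is equal to the number of local minima points in the minima map $Q_j$. Both $d_{\text{adj-max}}$ and $d_{\text{adj-min}}$ are sorted in descending order for convenient selection of distances from these arrays. Considering a square window, the gross window width $w_{en\text{-}g}$ for the order-statistics filters can be selected in many different ways using the distance values in $d_{\text{adj-max}}$ and $d_{\text{adj-min}}$, among which four choices are given below:

\[
\begin{aligned}
w_{en\text{-}g} &= d_1 = \operatorname{minimum}\{\operatorname{minimum}\{d_{\text{adj-max}}\}, \operatorname{minimum}\{d_{\text{adj-min}}\}\}, \\
w_{en\text{-}g} &= d_2 = \operatorname{maximum}\{\operatorname{minimum}\{d_{\text{adj-max}}\}, \operatorname{minimum}\{d_{\text{adj-min}}\}\}, \\
w_{en\text{-}g} &= d_3 = \operatorname{minimum}\{\operatorname{maximum}\{d_{\text{adj-max}}\}, \operatorname{maximum}\{d_{\text{adj-min}}\}\}, \\
w_{en\text{-}g} &= d_4 = \operatorname{maximum}\{\operatorname{maximum}\{d_{\text{adj-max}}\}, \operatorname{maximum}\{d_{\text{adj-min}}\}\},
\end{aligned} \tag{6}
\]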

where maximum{} denotes the maximum value of the elements in the array {} and minimum{} denotes the minimum value of the elements in the array {}. The gross width $w_{en\text{-}g}$ is then rounded to the nearest odd integer to get the final window width $w_{en}$, producing a window of size $w_{en} \times w_{en}$. The distances obtained from (6) satisfy $d_1 \le d_2 \le d_3 \le d_4$. Let the order-statistics filter widths (OSFW) obtained via (6) be defined as Type-1, Type-2, Type-3, and Type-4, respectively, where Type-1 and Type-4 may also be denoted as lowest distance OSFW (LD-OSFW) and highest distance OSFW (HD-OSFW), respectively. The $w_{en}$ required for the $(i + 1)$th BIMF is generally larger than that for the $i$th BIMF when using Type-3 or Type-4 OSFW; however, $w_{en}$ for the $(i + 1)$th BIMF may sometimes not be larger than that for the $i$th BIMF when using Type-1 or Type-2 OSFW. Therefore, if the calculated $w_{en}$ for a BIMF mode is not larger than that of the previous BIMF mode, additional manipulation may be required to make it larger (e.g., the current $w_{en}$ may be taken as approximately 1.5 times the previous $w_{en}$). Though it is not necessary, this ensures the existing property of the BIMF hierarchy that later BIMFs contain coarser local spatial scales [1, 3]. It will be clear from Section 4 that the choice of $w_{en}$ from the above four options depends on the application and/or the desired BIMF characteristics.

Figure 2: (a) Maxima map of Figure 1(b) shown with shades, where the brighter boxes represent the locations of the maxima points; (b) and (c) a sample maxima point and its nearest neighbor, shown, respectively, with “○” and a different symbol.
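The window-size rule can be sketched as follows. The maxima coordinates are those readable from Figure 1(b), given as 1-based (row, column) pairs, and the two distance arrays in the final check are the ones quoted for the 1D example in Section 3.2.3. The function names are hypothetical, and the tie rule in `nearest_odd` (even integers round up) is an assumption chosen because it reproduces the widths reported in Section 3.2.3.

```python
import math


def adjacent_distances(points):
    """Distance from each extremum to its nearest other extremum, sorted descending.

    Needs at least two points; a single extremum has no neighbor.
    """
    dists = []
    for i, (r1, c1) in enumerate(points):
        nearest = min(
            math.hypot(r1 - r2, c1 - c2)
            for j, (r2, c2) in enumerate(points)
            if j != i
        )
        dists.append(nearest)
    return sorted(dists, reverse=True)


def nearest_odd(width):
    """Round to the nearest odd integer; even integers go up (assumption)."""
    return 2 * math.floor(width / 2) + 1


def osfw(d_adj_max, d_adj_min):
    """Window widths w_en for Type-1 to Type-4, following (6)."""
    d1 = min(min(d_adj_max), min(d_adj_min))
    d2 = max(min(d_adj_max), min(d_adj_min))
    d3 = min(max(d_adj_max), max(d_adj_min))
    d4 = max(max(d_adj_max), max(d_adj_min))
    return tuple(nearest_odd(d) for d in (d1, d2, d3, d4))


# Maxima positions read from Figure 1(b), 1-based (row, col).
maxima = [(2, 5), (2, 7), (3, 2), (4, 4), (4, 8),
          (5, 1), (6, 8), (7, 1), (8, 4), (8, 6)]
d_adj_max = adjacent_distances(maxima)

# Distance arrays quoted for the 1D example of Section 3.2.3.
widths = osfw([107, 106, 93, 93, 72], [108, 107, 93, 93, 78])
print(widths)  # (73, 79, 107, 109)
```

For the Figure 1(b) maxima, the largest nearest-neighbor distance is $\sqrt{5}$ and the smallest is 2, so the maxima map alone would suggest a 3 × 3 window.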

It is preferable to apply the same window size for both the MAX and MIN filters as discussed above, though it may be possible to choose different window sizes for them. For example, the window size for the MAX filter can be selected based on the distances in AMAXDA, while the window size for the MIN filter can be selected based on the distances in AMINDA, as follows:

\[
w_{\text{maxen-}g} = \operatorname{minimum}\{d_{\text{adj-max}}\}, \qquad
w_{\text{maxen-}g} = \operatorname{maximum}\{d_{\text{adj-max}}\}, \tag{7}
\]

\[
w_{\text{minen-}g} = \operatorname{minimum}\{d_{\text{adj-min}}\}, \qquad
w_{\text{minen-}g} = \operatorname{maximum}\{d_{\text{adj-min}}\}. \tag{8}
\]

Equation (7) can be used for the MAX filter and (8) can be used for the MIN filter. However, there is a practical limitation to this approach. In some situations, there may be only one local maximum (minimum) in a source image $S_i$, which will result in an empty array for $d_{\text{adj-max}}$ ($d_{\text{adj-min}}$) and thus will prevent upper (lower) envelope formation and halt the algorithm before it satisfies the extrema criterion for stopping. On the other hand, employing the same size for the MAX and MIN filters for the same BIMF induces extraction of similar spatial scales into that BIMF, while different window sizes for the MAX and MIN filters may obstruct this process. It is worthwhile to mention an additional option for the selection of $w_{en}$ before describing the envelope formation in Section 3.2.2. Based on the image or the desired properties of the BIMFs, $w_{en}$ may be chosen arbitrarily as well. In that case, $w_{en}$ for the $(i + 1)$th BIMF should be chosen higher than $w_{en}$ for the $i$th BIMF, but extraction of BIMFs will be less data driven with an arbitrary selection of $w_{en}$. The various possibilities of window sizes for the MAX and MIN filters for envelope formation provide different decompositions of an image. It is this feature that makes the proposed approach an adaptive one.

3.2.2. Applying order statistics and smoothing filters

With the determination of the window size $w_{en}$ for envelope formation, MAX and MIN filters are applied to the corresponding ITS-BIMF $F_{Tj}$ to obtain the upper and lower envelopes, $\mathrm{UE}_j$ and $\mathrm{LE}_j$, as specified below:

\[
\mathrm{UE}_j(x, y) = \operatorname*{MAX}_{(s,t) \in Z_{xy}} \{F_{Tj}(s, t)\}, \tag{9}
\]

\[
\mathrm{LE}_j(x, y) = \operatorname*{MIN}_{(s,t) \in Z_{xy}} \{F_{Tj}(s, t)\}. \tag{10}
\]

In (9), the value of the upper envelope $\mathrm{UE}_j$ at any point $(x, y)$ is simply the maximum value of the elements of $F_{Tj}$ in the region defined by $Z_{xy}$, where $Z_{xy}$ is the square region of size $w_{en} \times w_{en}$ centered at the point $(x, y)$ of $F_{Tj}$. Similarly, in (10), the value of the lower envelope $\mathrm{LE}_j$ at any point $(x, y)$ is simply the minimum value of the elements of $F_{Tj}$ in the region defined by $Z_{xy}$. It should be noted that the MAX and MIN filters produce new 2D matrices for the upper and lower envelope surfaces from the given 2D data matrix; they do not alter the actual 2D data. Since smooth continuous surfaces for the upper and lower envelopes are preferable, an averaging smoothing operation is carried out on both $\mathrm{UE}_j$ and $\mathrm{LE}_j$, employing the same window size used for the corresponding order-statistics filters. This averaging smoothing operation may be expressed as below:

\[
\begin{aligned}
\mathrm{UE}_j(x, y) &= \frac{1}{w_{sm} \times w_{sm}} \sum_{(s,t) \in Z_{xy}} \mathrm{UE}_j(s, t), \\
\mathrm{LE}_j(x, y) &= \frac{1}{w_{sm} \times w_{sm}} \sum_{(s,t) \in Z_{xy}} \mathrm{LE}_j(s, t),
\end{aligned} \tag{11}
\]

where $Z_{xy}$ is the square region of size $w_{sm} \times w_{sm}$ centered at the point $(x, y)$ of $\mathrm{UE}_j$ or $\mathrm{LE}_j$, $w_{sm}$ is the window width of the averaging smoothing filter, and $w_{sm} = w_{en}$. The operations in (11) are arithmetic mean filtering, which smooths local variations in the data. From the smoothed envelopes $\mathrm{UE}_j$ and $\mathrm{LE}_j$, the mean or average envelope $\mathrm{ME}_j$ is calculated as in the original BEMD method given in Section 2.

As previously mentioned, FABEMD differs from BEMD in the way the upper and lower envelopes, $\mathrm{UE}_j$ and $\mathrm{LE}_j$, are formed and in restricting the number of iterations for each BIMF to one. In fact, one iteration per BIMF in FABEMD produces similar or better results than can be achieved by BEMD with more than one iteration. On the other hand, scattered data interpolation itself is an iterative process that fits a surface over the scattered data points in multiple steps. Though upper and lower envelope formation in FABEMD requires three steps (window size determination, computing the MAX (MIN) filter output, and averaging smoothing), all these operations can be done very fast using efficient programming routines, and the time required is much less than that required for interpolation-based envelope estimation.
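One FABEMD iteration, that is, (9), (10), and (11) followed by mean envelope subtraction, can be sketched as below. This is an illustration under stated assumptions and not the authors' code: images are nested lists, windows are clipped at the image border, and the border averages divide by the number of pixels actually inside the window, since the border handling is not specified in the text. All function names are hypothetical.

```python
def window_values(a, x, y, w):
    """Values of a inside the w x w window centered at (x, y), clipped at the borders."""
    half = (w - 1) // 2
    rows, cols = len(a), len(a[0])
    return [
        a[s][t]
        for s in range(max(0, x - half), min(rows, x + half + 1))
        for t in range(max(0, y - half), min(cols, y + half + 1))
    ]


def window_filter(a, w, pick):
    """Apply pick (max, min, or mean) over each clipped window."""
    return [[pick(window_values(a, x, y, w)) for y in range(len(a[0]))]
            for x in range(len(a))]


def mean(values):
    return sum(values) / len(values)


def fabemd_iteration(ft, w_en):
    """One iteration: returns (next ITS-BIMF, mean envelope)."""
    upper = window_filter(window_filter(ft, w_en, max), w_en, mean)  # (9), then (11)
    lower = window_filter(window_filter(ft, w_en, min), w_en, mean)  # (10), then (11)
    me = [[(u + l) / 2 for u, l in zip(row_u, row_l)]
          for row_u, row_l in zip(upper, lower)]
    nxt = [[f - m for f, m in zip(row_f, row_m)]
           for row_f, row_m in zip(ft, me)]
    return nxt, me


# Hypothetical 3 x 3 image with a single central peak.
peak = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
bimf, mean_env = fabemd_iteration(peak, 3)
flat, flat_env = fabemd_iteration([[5, 5], [5, 5]], 3)
```

For larger images, library routines such as `scipy.ndimage.maximum_filter`, `minimum_filter`, and `uniform_filter` provide comparable building blocks, although their default border handling differs from the clipping assumed here.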

Figure 3: For the data matrix of Figure 1(a): (a) upper envelope matrix using FABEMD before smoothing; (b) lower envelope matrix using FABEMD before smoothing.

3.2.3. Illustration of upper and lower envelopes estimation

To illustrate the envelope formation in FABEMD, let us consider the 2D data of Figure 1(a) and the corresponding local maxima and minima maps of Figures 1(b) and 1(c). The window width $w_{en}$ obtained using Type-4 OSFW is 3. So, taking a 3 × 3 window for the MAX and MIN filters and applying them to the data matrix of Figure 1(a) results in the upper and lower envelope matrices given in Figures 3(a) and 3(b), respectively.

The application of averaging smoothing operations to Figures 3(a) and 3(b) results in the smoothed upper and lower envelope matrices shown in Figures 4(a) and 4(b), respectively. The mean envelope matrix produced by averaging the matrices of Figures 4(a) and 4(b) is shown in Figure 4(c). For comparison, the corresponding matrices for UE, LE, and ME derived by applying thin-plate spline surface interpolation to the maxima and minima maps are shown in Figure 5. Comparison of the data in Figures 1(a), 4(c), and 5(c) reveals that the mean envelope derived by the FABEMD method more closely matches the local mean of the given data. Since local mean subtraction is essential for the BEMD or FABEMD process to yield BIMFs with nearly zero local mean, FABEMD achieves this goal in as few as one iteration. It is shown in the literature [1, 3] that IMF or BIMF properties are retained when the local mean is defined as the local mean of the upper and lower envelopes, not just the usual local mean as might be obtained by averaging the data using a spatial averaging filter smaller than the original size of the data matrix. Nevertheless, a zero local envelope mean that also induces a zero local mean yields well-characterized BIMFs.

To visualize the envelope formation for FABEMD more explicitly, let us consider, for simplicity, the 1D signal given in Figure 6, where local maxima points are indicated by “○” and local minima points are indicated by “x”, both obtained using a 1 × 3 neighboring window. The sorted AMAXDA for this signal is $d_{\text{adj-max}}$ = [107 106 93 93 72], while the sorted AMINDA is $d_{\text{adj-min}}$ = [108 107 93 93 78]. Using these distance arrays, the OSFW values for Type-1, Type-2, Type-3, and Type-4 are 73, 79, 107, and 109, respectively. Taking Type-4 OSFW (i.e., $w_{en}$ = 109) as the width of the MAX and MIN filters and applying them to the 1D signal of Figure 6 results in the UE, LE, and ME shown in Figure 7(a). The corresponding envelopes after applying a smoothing averaging filter of the same size are displayed


Figure 4: For the data matrix of Figure 1(a): (a) upper envelope matrix using FABEMD after smoothing; (b) lower envelope matrix using FABEMD after smoothing; and (c) mean envelope matrix obtained by averaging the data in (a) and (b).

10.8 10.3 9.5 8.8 8.7 9.3 10.1 10.9
9.9 9.3 8.2 7.2 7 7.8 9 9.9
8.5 8 6.6 5.4 5.4 6.4 7.7 8.8
6.9 6.2 5 4 4.3 5.5 6.8 8
6 5.4 4.6 4.1 4.4 5.4 6.6 7.7
7.1 6.3 5.7 5.3 5.5 6.2 7.1 8
9 8.2 7.5 7.1 7.2 7.5 8.1 8.7
10.6 9.9 9.4 9 8.9 9 9.3 9.6

(a)

0.9 0.8 0.8 1 1.5 2 2.4 2.8
1 0.8 0.7 0.8 1.2 1.7 2.1 2.6
1.1 0.9 0.6 0.6 1 1.5 1.9 2.5
1.2 1 0.6 0.5 1.1 1.6 2.1 2.7
1.2 1 0.8 1 2 2.7 3 3.6
1 0.9 1.3 2.5 4.1 4.8 4.8 5
0.3 0.5 1.4 3.6 6 6.7 6.6 6.5
-0.4 -0.1 1 3.6 6.1 7.3 7.6 7.6

(b)

5.9 5.6 5.2 4.9 5.1 5.6 6.3 6.8
5.4 5.1 4.4 4 4.1 4.8 5.6 6.2
4.8 4.5 3.6 3 3.2 3.9 4.8 5.7
4.1 3.6 2.8 2.3 2.7 3.5 4.4 5.4
3.6 3.2 2.7 2.6 3.2 4.1 4.8 5.6
4 3.6 3.5 3.9 4.8 5.5 5.9 6.5
4.7 4.3 4.4 5.4 6.6 7.1 7.3 7.6
5.1 4.9 5.2 6.3 7.5 8.1 8.4 8.6

(c)

Figure 5: For the data matrix of Figure 1(a): (a) upper envelope matrix using BEMD with thin-plate spline interpolation; (b) lower envelope matrix using BEMD with thin-plate spline interpolation; and (c) mean envelope matrix obtained by averaging the data in (a) and (b).

Figure 6: A 1D signal and its local maxima and minima points.

in Figure 7(b), and the same envelopes created by applying cubic spline interpolation to the maxima and minima maps are given in Figure 7(c). Figure 7(c) indicates the possibility of incorrect interpolation at the boundary, which leads to an improper ME. The top waveforms in Figures 8(a) and 8(b) are the original 1D signal given in Figure 6, whereas the bottom waveform in Figure 8(a) is the result of ME subtraction in the FABEMD method and the bottom waveform in Figure 8(b) is the result of ME subtraction in the BEMD method. This illustration, along with the previous analyses, demonstrates the effectiveness of the proposed FABEMD method for BIMF or BEMC extraction.
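The 1D illustration can be mimicked with a short, self-contained sketch of the whole decomposition loop. It is a simplified reading of the method under several assumptions: Type-4 OSFW is used, windows are clipped at the signal ends, the width is forced to grow by roughly 1.5 times when it would not otherwise increase, and a cap on the number of modes (`max_modes`) is added as a safeguard. The test signal and all function names are hypothetical and are not the data of Figure 6.

```python
import math


def extrema_1d(s):
    """Indices of strict local maxima and minima (1 x 3 window, clipped at the ends)."""
    maxima, minima = [], []
    for i, v in enumerate(s):
        neighbors = s[max(0, i - 1):i] + s[i + 1:i + 2]
        if all(v > u for u in neighbors):
            maxima.append(i)
        elif all(v < u for u in neighbors):
            minima.append(i)
    return maxima, minima


def nearest_gaps(idx):
    """Distance from each extremum to its nearest other extremum (empty if fewer than two)."""
    if len(idx) < 2:
        return []
    return [min(abs(i - j) for j in idx if j != i) for i in idx]


def sliding(s, w, pick):
    """Apply pick (max, min, or mean) over a width-w window clipped at the ends."""
    half = (w - 1) // 2
    return [pick(s[max(0, i - half):i + half + 1]) for i in range(len(s))]


def mean(values):
    return sum(values) / len(values)


def fabemd_1d(signal, max_modes=20):
    """Return [mode_1, ..., mode_K, residue] using one iteration per mode."""
    components, source, prev_w = [], list(signal), 0
    for _ in range(max_modes):
        maxima, minima = extrema_1d(source)
        gaps = nearest_gaps(maxima) + nearest_gaps(minima)
        if len(maxima) + len(minima) < 3 or not gaps:
            break  # fewer than three extrema: the source is the residue
        w_en = 2 * math.floor(max(gaps) / 2) + 1  # Type-4 width, nearest odd integer
        if w_en <= prev_w:  # keep later modes coarser (about 1.5x the previous width)
            w_en = 2 * math.floor(1.5 * prev_w / 2) + 1
        prev_w = w_en
        upper = sliding(sliding(source, w_en, max), w_en, mean)
        lower = sliding(sliding(source, w_en, min), w_en, mean)
        mean_env = [(u + l) / 2 for u, l in zip(upper, lower)]
        components.append([x - m for x, m in zip(source, mean_env)])
        source = mean_env  # next source: S_{i+1} = S_i - F_i
    components.append(source)  # residue
    return components


# Hypothetical test signal: two oscillations plus a linear trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50) + 0.01 * t
          for t in range(200)]
components = fabemd_1d(signal)
rebuilt = [sum(values) for values in zip(*components)]
```

Because each mode is the difference between two successive sources, the components sum back to the input up to floating-point error, which mirrors the reconstruction property stated for the BEMCs in Section 2.2.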

4. SIMULATION RESULTS

The effectiveness of the FABEMD is investigated by implementing the algorithm for analyzing various images. The decomposed BEMCs resulting from FABEMD are compared with the BEMCs acquired using BEMD. Simulation results are reported for FABEMD with OSFW of Type-1 and Type-4. Although only one iteration for each BIMF is suggested in the FABEMD method, some results are also shown for more than one iteration to justify the adequacy of performing one iteration. Since the window sizes for order-statistics and smoothing filters are determined from the source image information, these sizes remain the same for all the iterations for the corresponding BIMF. FABEMD results are compared with the BEMD results obtained by the thin-plate spline (TPS) interpolator, a radial basis function (RBF) that has been established as a good choice for BEMD [3–5]. For BEMD with RBF-TPS, the SD criterion is employed as the fundamental stopping criterion with a threshold of 0.01, while the maximum number of allowable iterations (MNAI) is applied as an additional stopping criterion to prevent over sifting [6, 15]. Additionally, in some cases BEMD results are also examined and reported for one iteration, to compare with the results of FABEMD with one iteration. To further limit the number of iterations and thus prevent over sifting in BEMD, the SD defined by (1b) is considered for the simulation. It should be noted that the definition of SD affects the number of iterations required to achieve a given threshold and thus the amount of sifting per BIMF; it does not contribute to the calculation of UE, LE, or ME in a particular iteration. Even though the complete space-spatial-frequency analysis using BHHT is investigated, the results are not shown in this paper. However, a good set of BIMFs can be expected to yield a good BHHT-based image representation. In the simulation, the maximum image size is limited to 256 × 256 pixels. Although FABEMD is capable of decomposing images of any size or resolution very fast (e.g., in a few seconds or a few minutes), BEMD is unable to do so. Since FABEMD


Figure 7: (a) Envelopes using the proposed approach before smoothing, (b) envelopes using the proposed approach after smoothing, (c) envelopes using cubic spline interpolation.

Figure 8: (a) Original signal (top) and mean envelope subtracted signal (bottom) using FABEMD algorithm, (b) original signal (top) and mean envelope subtracted signal (bottom) using BEMD algorithm.

results are compared with BEMD results for the same images, 256 × 256-pixel images help perform the task conveniently.
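The interplay between the SD threshold and the MNAI can be sketched as follows. Equation (1b) is not reproduced in this section, so the sketch assumes one common global form, SD = Σ|h_prev − h_cur|² / Σ h_prev²; the function names and the `mean_envelope` callback are hypothetical stand-ins for whichever envelope estimator (FABEMD filtering or BEMD interpolation) is in use.

```python
import numpy as np


def sd_global(h_prev, h_cur, eps=1e-12):
    """Assumed global SD: energy of the change relative to energy of h_prev."""
    return np.sum((h_prev - h_cur) ** 2) / (np.sum(h_prev ** 2) + eps)


def sift(source, mean_envelope, sd_threshold=0.01, mnai=5):
    """Sift one BIMF; stop when SD < threshold or after `mnai` iterations."""
    h = np.asarray(source, dtype=float)
    for iteration in range(1, mnai + 1):
        h_new = h - mean_envelope(h)      # subtract the mean envelope (ME)
        sd = sd_global(h, h_new)
        h = h_new
        if sd < sd_threshold:
            break
    return h, sd, iteration


# Toy stand-in for the ME: a constant surface equal to the global mean.
toy_me = lambda h: np.full_like(h, h.mean())
image = np.arange(16, dtype=float).reshape(4, 4)

_, sd_one, n_one = sift(image, toy_me, mnai=1)    # forced stop after 1 iteration
_, sd_many, n_many = sift(image, toy_me, mnai=5)  # stops once SD < 0.01
```

With `mnai=1` the loop ends with a large SD because the first subtraction changes the source substantially, which is the same qualitative behavior the one-iteration modes show in this section; the SD only measures how much sifting has been done and plays no part in computing the envelopes themselves.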

4.1. Analysis with synthetic texture image

A synthetic texture image (STI) of 256 × 256 pixels is taken, which is composed by adding three different components of the same size. For convenience of synthesizing, each synthetic texture component (STC) is generated from horizontal and vertical sinusoidal waveforms having different but closely spaced frequencies. The first STC consists of higher frequencies, the second STC consists of medium frequencies, and the last STC consists of very low frequencies. The STI and STCs are shown in Figure 9, while the diagonal intensity profiles of the STI and STCs are presented in Figure 10. Even though the addition of the arbitrarily developed STCs in Figures 9(a) to 9(c) yields the original STI of Figure 9(d), application of BEMD or FABEMD to the STI of Figure 9(d) may not necessarily regenerate the STCs of Figures 9(a) to 9(c) (e.g., BEMC-1 may not be the same as STC-1), a consequence that can be attributed to the property of BEMD/FABEMD. Still, analysis of BEMD/FABEMD employing this synthetic texture provides a good performance indication of the algorithm. The OI among the original STCs and the global mean of each component are given in Table 1, which facilitates the comparison of the extracted BEMCs with the actual STCs. Since STC-1 and STC-2 are nearly symmetric with bipolar

Table 1: Global mean of STCs and their OI.

          Global mean
STC-1     0.6589
STC-2     0.0415
STC-3     234.2684

Orthogonality index (OI) among the STCs: 0.0270

gray level values, their global mean should be close to zero, as seen from Table 1. Thus, for STC-1 and STC-2, zero local envelope mean also implies zero global mean, and vice versa.
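A texture of this kind is easy to synthesize. The sketch below is only an illustration of the construction described above: the cycle counts, amplitudes, and the offset of 234 are hypothetical values, not the ones used to generate Figure 9. Each component is the sum of a horizontal and a vertical sinusoid, and the STI is the sum of the three components.

```python
import numpy as np


def texture_component(size, cycles_x, cycles_y, amplitude, offset=0.0):
    """Sum of a horizontal and a vertical sinusoid on a size x size grid."""
    rows, cols = np.mgrid[0:size, 0:size]
    horizontal = np.sin(2 * np.pi * cycles_x * cols / size)
    vertical = np.sin(2 * np.pi * cycles_y * rows / size)
    return offset + amplitude * (horizontal + vertical)


size = 256
# All frequencies, amplitudes, and the offset below are hypothetical values.
stc1 = texture_component(size, 40, 44, 20.0)               # higher frequencies
stc2 = texture_component(size, 10, 12, 20.0)               # medium frequencies
stc3 = texture_component(size, 1, 2, 30.0, offset=234.0)   # very low frequencies
sti = stc1 + stc2 + stc3

global_means = [c.mean() for c in (stc1, stc2, stc3)]
```

With whole numbers of cycles across the image, the two bipolar components have a global mean of zero up to rounding, while the third carries the gray-level offset. Components whose periods do not divide the image size evenly would show small nonzero means instead.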

Before demonstrating the final results of STI decomposition using FABEMD and BEMD, let us investigate the UE, LE, and ME of the STI generated by using different approaches. Figures 11(a) to 11(c) display the combined three-dimensional (3D) mesh plots of 32 × 32-pixel regions taken from the same locations of the original 256 × 256-pixel STI and the envelopes obtained by FABEMD with Type-4 OSFW, FABEMD with Type-1 OSFW, and BEMD with RBF-TPS interpolation, respectively. Figure 11 demonstrates the effectiveness of the proposed scheme of envelope estimation, which can very well replace the interpolation-based envelope estimation. The computation time of mean envelope estimation for the 256 × 256-pixel STI is also given in parentheses in the caption of Figure 11. In this case, envelope estimation takes much less time with FABEMD (about 4 seconds) than with BEMD (about 193 seconds). In general, OSFW increases for


Figure 9: (a) Component 1 (STC-1), (b) component 2 (STC-2), (c) component 3 (STC-3), (d) original synthetic texture image (STI) obtained from addition of (a) to (c).

Figure 10: 1D diagonal intensity profiles of (a) STC-1, (b) STC-2, (c) STC-3, (d) STI.

Figure 11: Mesh plots of 32 × 32-pixel regions taken from the 256 × 256-pixel STI and its UE, LE, and ME employing (a) FABEMD Type-4 OSFW (4.298072 seconds), (b) FABEMD Type-1 OSFW (3.717336 seconds), (c) BEMD RBF-TPS (193.124406 seconds).

the later source images and hence the corresponding computation times of the envelopes also increase for the later BIMFs in the FABEMD process. On the other hand, the number of extrema points decreases for the later source images or ITS-BIMFs and therefore envelope estimation time decreases for later BIMFs in the BEMD process. The overall computation time in the BEMD process still remains much higher due to the iterative surface-fitting problem from the scattered data.
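The dependence of FABEMD envelope time on the window width can be seen from a direct 2D implementation. The following is a minimal sketch under stated assumptions: square windows, edge padding, the same width for the order-statistics and smoothing filters, and hypothetical function names. It is a naive reference version whose cost grows with the window area, not an optimized filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_filter_2d(image, width, reducer):
    """Apply `reducer` over every width x width neighbourhood (edge-padded)."""
    padded = np.pad(image, width // 2, mode="edge")
    windows = sliding_window_view(padded, (width, width))
    return reducer(windows, axis=(-2, -1))


def fabemd_style_envelopes(image, width):
    """MAX/MIN order-statistics filtering followed by averaging smoothing."""
    upper = window_filter_2d(window_filter_2d(image, width, np.max), width, np.mean)
    lower = window_filter_2d(window_filter_2d(image, width, np.min), width, np.mean)
    return upper, lower, 0.5 * (upper + lower)


rng = np.random.default_rng(0)
patch = rng.normal(size=(32, 32))            # hypothetical 32 x 32 test patch
ue, le, me = fabemd_style_envelopes(patch, width=5)
```

Each output pixel touches `width * width` inputs per filter pass, so a later BIMF with a wider window costs more per pass even though the image size is unchanged. An interpolation-based envelope behaves the opposite way, since its cost is tied to the number of scattered extrema being fitted.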

Decomposition of the STI in Figure 9(d) is first conducted by applying FABEMD having Type-4 OSFW with MNAI = 1. The resulting BEMCs and the summation of the BEMCs are displayed in Figure 12(a), and the corresponding diagonal intensity profiles are displayed in Figure 12(b). Figure 12 shows that the BEMCs closely resemble the original STCs.

As mentioned in Section 2.2, in any approach of BEMD or FABEMD, the summation of the BEMCs will always return the original image, except for the truncation/rounding error introduced at various steps of the process. This fact can be verified by comparing the STI and the summation of BEMCs in Figures 9, 10, and 12. Hence, the summation of BEMCs is not shown in the subsequent analyses of this paper.
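The reconstruction property follows from the way components are peeled off the running residue, regardless of how each component is estimated. The sketch below makes this explicit; `decompose` and the toy extractor are hypothetical, and the extractor (subtracting the global mean) merely stands in for a real BIMF sifting step.

```python
import numpy as np


def decompose(image, extract_bimf, n_bimfs):
    """Peel off `n_bimfs` components; the last BEMC is the residue."""
    residue = np.asarray(image, dtype=float)
    bemcs = []
    for _ in range(n_bimfs):
        bimf = extract_bimf(residue)
        bemcs.append(bimf)
        residue = residue - bimf      # next source image
    bemcs.append(residue)
    return bemcs


# Hypothetical extractor: source minus a constant "mean envelope".
toy_extract = lambda source: source - source.mean()

rng = np.random.default_rng(1)
image = rng.uniform(0, 255, size=(8, 8))
bemcs = decompose(image, toy_extract, n_bimfs=2)
reconstruction_error = np.max(np.abs(sum(bemcs) - image))
```

Since every subtraction is undone when the components are added back, `reconstruction_error` is limited to floating-point rounding for any choice of extractor.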

The BEMCs of the STI obtained by applying FABEMD with Type-4 OSFW are displayed in Figure 13(a) for MNAI = 5. The diagonal intensity profiles of the corresponding images in Figure 13(a) are shown in Figure 13(b). Because of the increased iterations, the stopping point SD for each BIMF decreases. This helps in attaining a first BIMF (BIMF-1), which is more similar to the original STC-1. But, due


Figure 12: Decomposition of the STI using FABEMD with Type-4 OSFW (MNAI = 1): (a) BEMC-1 to BEMC-3 and summation of the BEMCs, (b) diagonal intensity profiles of BEMC-1 to BEMC-3 and summation of the BEMCs.

Figure 13: Decomposition of the STI using FABEMD with Type-4 OSFW (MNAI = 5): (a) BEMCs (BEMC-1 to BEMC-4), (b) diagonal intensity profiles of BEMCs.

to the over sifting, an additional component appears that does not have any similarity to any of the original STCs. By looking at BEMC-3 of this decomposition, it may be inferred that this type of component may not have any significance in actual image processing applications. In fact, BEMC-3 and BEMC-4 may be combined to get a component similar to STC-3, although BEMC-3 contains some higher spatial scales compared to STC-3. Since the characteristics of the diagonal intensity profiles for various BEMCs of the STI have now been established, these profiles are omitted from the subsequent analyses.

As a third example, the decomposed BEMCs employing FABEMD with Type-1 OSFW for MNAI = 1 are shown in Figure 14. Because Type-1 OSFW gives the minimum possible width from the distance matrix, it causes an increased level of sifting and thus a greater number of BIMFs/BEMCs (e.g., six BEMCs in this case). This shows that the OSFW type can be selected based on the image properties and the desired application.
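How a window width might be derived from the extrema maps can be sketched as follows. The text above states only that Type-1 gives the minimum possible width from the distance matrix; treating Type-4 as the largest width, using nearest-neighbour Euclidean distances, and rounding up to an odd integer are assumptions of this sketch, as are the function names and the sample coordinates.

```python
import numpy as np


def nearest_neighbour_distances(points):
    """Distance from each extremum to its nearest neighbour in the same map."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # ignore zero self-distances
    return dist.min(axis=1)


def osfw(max_points, min_points, kind):
    """Window width from extrema spacing (Type-1: smallest, Type-4: largest)."""
    d_max = nearest_neighbour_distances(max_points)
    d_min = nearest_neighbour_distances(min_points)
    if kind == 1:
        width = min(d_max.min(), d_min.min())
    elif kind == 4:
        width = max(d_max.max(), d_min.max())   # assumed definition
    else:
        raise ValueError("only Type-1 and Type-4 are sketched here")
    width = int(np.ceil(width))
    return width if width % 2 == 1 else width + 1   # force an odd width


# Hypothetical (row, column) positions of maxima and minima.
maxima_rc = [(0, 0), (0, 4), (10, 10)]
minima_rc = [(2, 2), (2, 8)]
w_type1 = osfw(maxima_rc, minima_rc, kind=1)
w_type4 = osfw(maxima_rc, minima_rc, kind=4)
```

A smaller window removes only the finest spatial scale at each stage, so more stages are needed before the residue is reached, which is consistent with the larger number of BEMCs obtained with Type-1.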

Application of FABEMD with Type-1 OSFW for MNAI = 5 generates seven BEMCs, which are displayed in Figure 15. This decomposition also shows the effect of over sifting and


Figure 14: BEMCs of the STI obtained by FABEMD with Type-1 OSFW (MNAI = 1): (a) BEMC-1 to (f) BEMC-6.

Figure 15: BEMCs of the STI obtained by FABEMD with Type-1 OSFW (MNAI = 5): (a) BEMC-1 to (g) BEMC-7.

thus extraction of improper BEMCs in FABEMD. Due to the property of order statistics filter-based envelope estimation followed by a smoothing operation, over sifting in FABEMD may cause improper BEMCs as well.

The STI is next decomposed using BEMD with RBF-TPS interpolation to compare the results with the new FABEMD method. The BEMCs resulting with MNAI = 1 are given in Figure 16 and the BEMCs resulting with MNAI = 5 are given in Figure 17. In both cases four BEMCs are generated, where BEMC-3 and BEMC-4 together may become similar to the STC-3. Due to the increased number of iterations, BEMC-1 appears better in Figure 17 than in Figure 16. In general, for BEMD, more than one iteration may generate more accurate BEMCs [3–5], although it is also better to limit the number of iterations, even without satisfying the SD threshold criterion, to prevent over sifting [6, 15, 21].

For additional performance assessment of the above six modes of the FABEMD/BEMD algorithm, the decomposition data of the STI are presented in Tables 2 to 5. Table 2 displays the number of obtained BEMCs, time taken for


Figure 16: BEMCs of the STI obtained by BEMD with RBF-TPS (MNAI = 1): (a) BEMC-1 to (d) BEMC-4.

Figure 17: BEMCs of the STI obtained by BEMD with RBF-TPS (MNAI = 5): (a) BEMC-1 to (d) BEMC-4.

each algorithm, and OI for each case. In terms of time taken and orthogonality index, FABEMD employing Type-4 OSFW with MNAI = 1 appears to be the best choice for decomposing the STI considered in this paper. From Table 3 it is observed that one iteration of the process for each BIMF results in a comparatively higher stopping point SD. While higher SD implies nonzero local envelope mean for the corresponding BIMF, in the spatial scale-based decomposition, strict zero local envelope mean of a BIMF is not essential, which has already been established in the literature [6, 15]. This relaxation in turn allows prevention of over sifting by reducing the number of iterations, or by not requiring a very low SD at the termination of the iterations. For FABEMD, increased iterations may cause erroneous envelopes and thus improper BIMFs, as mentioned previously. Table 4 presents the global mean of the BEMCs for all the FABEMD/BEMD modes applied to the STI. Like the first two symmetric and bipolar original STCs, the BIMFs are also expected to be symmetric and bipolar. For this reason, zero global mean of a BIMF should also indicate zero local mean or zero local envelope mean of it. In that sense, except for the residue, BEMCs (i.e., BIMFs) having nearly zero global mean can be considered good BIMFs. Hence, a method that produces a BIMF with higher global mean from the considered STI can be treated as poor. These features again designate FABEMD employing Type-4 OSFW with MNAI = 1 as a good choice for decomposition of the STI of Figure 9(d). Although the number of extrema in the source images for BEMD or FABEMD and the OSFW for each BIMF in FABEMD are not performance measures, these statistics are given in Tables 5 and 6 to provide some additional details of the processes.
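The two summary statistics used in this comparison are simple to compute once the BEMCs are available. The OI definition is not restated in this section, so the sketch below assumes a common form: the sum of all cross products C_i·C_j (i ≠ j) over the pixels, divided by the energy of the summed image. It uses the identity (Σ C_i)² = Σ C_i² + Σ_{i≠j} C_i·C_j to avoid an explicit double loop. The function names and the 2 × 2 example arrays are hypothetical.

```python
import numpy as np


def orthogonality_index(components):
    """Assumed OI: sum of cross products C_i*C_j (i != j) over total energy."""
    comps = np.stack([np.asarray(c, dtype=float) for c in components])
    total = comps.sum(axis=0)
    cross = np.sum(total ** 2) - np.sum(comps ** 2)   # equals sum over i != j
    return cross / np.sum(total ** 2)


def global_means(components):
    """Global mean of each component, as tabulated per BEMC."""
    return [float(np.mean(c)) for c in components]


a = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 1.0], [0.0, 0.0]])
oi_orthogonal = orthogonality_index([a, b])   # no overlap between components
oi_identical = orthogonality_index([a, a])    # fully overlapping components
```

Under this assumed definition a value near zero indicates little leakage between components, while overlapping components push the index up, which is why a lower OI is read as a better decomposition.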

4.2. Analysis with real images

Three real images are analyzed in this section to further investigate and compare the performance of FABEMD and BEMD. The first image is a 256 × 256-pixel region of a real texture image, D18, taken from the Brodatz texture set and shown in Figure 18(a) [24]. The second image is a subsampled 256 × 256-pixel Elaine image shown in Figure 18(b), while the third image is a 200 × 228-pixel noisy aurora image shown in Figure 18(c). The BEMCs generated from the D18 image, Elaine image, and aurora image by applying FABEMD with Type-1 OSFW and MNAI = 1 are shown in Figures 19, 21, and 23, respectively. On the other hand, the BEMCs generated from these same images by applying BEMD with RBF-TPS interpolation and MNAI = 10 are shown in Figures 20, 22, and 24, respectively.

Because there are no ground-truth BEMCs of the considered real images, intuitive analysis using visual assessment is reported as the fundamental performance criterion in this paper. The BEMCs of the real images indicate that FABEMD yields well-defined BEMCs, which represent the image features at various spatial scales similar to, or better than, the BEMCs obtained from the BEMD method. Unwanted distortion and other artifacts may accompany the BEMCs obtained via BEMD, as is apparent in Figures 20, 22, and 24; thus they may not be suitable for further image processing tasks. Although further evaluation of the texture decomposition can be reported by showing it for true texture analysis (e.g., texture classification, texture segmentation), achievement in having less distortion in the BEMCs of the texture image obtained


Figure 18: (a) 256 × 256-pixel region of Brodatz texture D18 [24], (b) 256 × 256-pixel Elaine image, (c) 200 × 228-pixel noisy aurora image.

Figure 19: BEMCs of D18 obtained by FABEMD with Type-1 OSFW (MNAI = 1): (a) BEMC-1 to (k) BEMC-11.

Table 2: Comparison among various FABEMD/BEMD for the STI in terms of total number of BEMCs, total time required, and orthogonality index (OI).

                        FABEMD      FABEMD       FABEMD      FABEMD       BEMD RBF-TPS   BEMD RBF-TPS
                        Type-4      Type-4       Type-1      Type-1       MNAI = 1       MNAI = 5
                        MNAI = 1    MNAI = 5     MNAI = 1    MNAI = 5     SD = 0.01      SD = 0.01
Total no. of BEMCs      3           4            6           7            4              4
Total time (seconds)    14.698635   446.639196   70.289524   166.728798   205.511427     975.323369
OI                      0.0342      0.0581       0.0735      0.0376       0.0988         0.664


Figure 20: BEMCs of D18 image obtained by BEMD with RBF-TPS (MNAI = 10): BEMC-1 to BEMC-7.

Figure 21: BEMCs of Elaine image obtained by FABEMD with Type-1 OSFW (MNAI = 1): BEMC-1 to BEMC-9.


Table 3: Comparison among various FABEMD/BEMD for the STI in terms of achieved stopping point SD for each BIMF.

          FABEMD     FABEMD     FABEMD     FABEMD     BEMD RBF-TPS   BEMD RBF-TPS
          Type-4     Type-4     Type-1     Type-1     MNAI = 1       MNAI = 5
          MNAI = 1   MNAI = 5   MNAI = 1   MNAI = 5
BIMF-1    0.98159    0.02355    0.98351    0.10443    0.98337        0.0067263
BIMF-2    0.99306    0.028749   0.99385    0.041544   0.99189        0.012556
BIMF-3    -          0.036908   0.99702    0.31663    0.90914        0.016638
BIMF-4    -          -          0.99662    0.080695   -              -
BIMF-5    -          -          0.97419    0.050807   -              -
BIMF-6    -          -          -          0.13184    -              -

Table 4: Comparison among various FABEMD/BEMD for the STI in terms of global mean of the BEMCs.

          FABEMD     FABEMD     FABEMD     FABEMD     BEMD RBF-TPS   BEMD RBF-TPS
          Type-4     Type-4     Type-1     Type-1     MNAI = 1       MNAI = 5
          MNAI = 1   MNAI = 5   MNAI = 1   MNAI = 5
BIMF-1    −0.1533    0.1332     0.0841     0.0652     0.4239         0.1035
BIMF-2    −0.9565    −0.4404    −0.0416    −0.0489    −0.6868        −0.1446
BIMF-3    -          8.2379     −0.1059    −0.1826    18.1854        9.9507
BIMF-4    -          -          −0.0171    −0.2247    -              -
BIMF-5    -          -          2.3059     0.1550     -              -
BIMF-6    -          -          -          0.2183     -              -
Residue   236.0785   227.0380   232.7433   234.9865   217.0462       225.0591

Table 5: Comparison among various FABEMD/BEMD for the STI in terms of number of extrema points in the source images for the corresponding BIMF, and the residue.

          FABEMD     FABEMD     FABEMD     FABEMD     BEMD RBF-TPS   BEMD RBF-TPS
          Type-4     Type-4     Type-1     Type-1     MNAI = 1       MNAI = 5
          MNAI = 1   MNAI = 5   MNAI = 1   MNAI = 5
BIMF-1    660        660        660        660        660            660
BIMF-2    98         161        196        420        78             84
BIMF-3    -          19         91         91         6              8
BIMF-4    -          -          66         90         -              -
BIMF-5    -          -          8          80         -              -
BIMF-6    -          -          -          12         -              -
Residue   1          2          1          1          2              2

with FABEMD is clearly visible in Figure 19. On the other hand, improvement of the BEMC quality for the Elaine image and aurora image is evident with FABEMD. For example, BEMCs of the Elaine image have little or no distortion, and more clearly reveal the edges and other characteristic features at different scales compared to the BEMCs obtained by BEMD. Similarly, observation of Figures 23 and 24 reveals that the noise is better separated into the first or first few BIMFs in the FABEMD method than in the BEMD method. This in turn should facilitate more efficient denoising of the aurora image using FABEMD compared to using BEMD. Since the obtained BEMCs are better in FABEMD, the complete BHHT analyses of images employing those BEMCs can be expected to be more effective. Preliminary studies on edge detection and noise removal using FABEMD show promising and significantly better performance compared to the analysis using BEMD, besides showing a dramatic improvement in the computation time. Since the objective of this paper is to provide the details of the FABEMD algorithm and its features, and to propose the algorithm for all types of applications wherever BEMD-type processing may be used, the specific application-wise performance is not reported here.

Envelope estimation in FABEMD, employing order-statistics filters, is nearly independent of the image or texture


Figure 22: BEMCs of Elaine image obtained by BEMD with RBF-TPS (MNAI = 10): BEMC-1 to BEMC-7.

Figure 23: BEMCs of noisy aurora image obtained by FABEMD with Type-1 OSFW (MNAI = 1): (a) BEMC-1 to (g) BEMC-7.

pattern in terms of complexity and processing time; and the envelopes closely follow the image. But envelope estimation in the BEMD method, employing surface interpolation, is highly dependent on the maxima or minima maps, while the envelopes are not guaranteed to follow the image. In some cases when there are very few points in the maxima or minima maps, BEMD is prone to generate an erroneous surface and thus erroneous BEMCs. On the other hand, FABEMD is inherently free of boundary effects and overshoot-undershoot problems, and thus it does not require additional boundary processing. In Section 4.1, the time required for BEMD-based decomposition has been found to be higher than the time required for FABEMD-based decomposition of a simple and uniform STI. For real images, the time taken by BEMD exceeds that required by FABEMD by an even larger margin, as observed in the example simulations presented in this section. While FABEMD takes only a few minutes, BEMD takes many hours, even for very few iterations performed per BIMF. This problem hinders the application of BEMD in many practical cases. Adaptability achievable through the selection of OSFW is another supplementary feature of the FABEMD process.


Figure 24: BEMCs of noisy aurora image obtained by BEMD with RBF-TPS (MNAI = 10): (a) BEMC-1 to (g) BEMC-7.

Table 6: Comparison among various FABEMD/BEMD for the STI in terms of order statistics filter width (OSFW).

          FABEMD     FABEMD     FABEMD     FABEMD
          Type-4     Type-4     Type-1     Type-1
          MNAI = 1   MNAI = 5   MNAI = 1   MNAI = 5
BIMF-1    13         13         9          9
BIMF-2    45         43         13         13
BIMF-3    -          183        15         19
BIMF-4    -          -          33         29
BIMF-5    -          -          121        43
BIMF-6    -          -          255        97
Residue   -          -          -          255

This feature allows various possible sets of BEMCs from the same image, which in turn helps in meeting the image processing need by providing the opportunity of selecting an appropriate set. Even though this type of adaptability is also available from BEMD by means of selecting different types of interpolation, the associated problems of interpolation may prevent this method from being used successfully.

Although the variety of possible OSFW choices provides adaptability, it may sometimes impose some difficulty in applying the FABEMD algorithm, because a trial-and-error selection procedure may have to be performed to find the OSFW value required to achieve the desired decomposition. In addition, the requirement to manipulate w_en for a BIMF, when the calculated value does not appear larger than that of the previous BIMF mode, may impose additional complexity. Reducing these difficulties is left for future work. Also, applying the FABEMD algorithm to various real image processing tasks and reporting the methodologies and the corresponding results individually, with comparison to the results employing other BEMD approaches, will be interesting.

5. CONCLUSION

BEMD is a promising image processing algorithm. To encourage wider application of this algorithm in image processing, a fast, time-efficient, and effective method is essential. This fact motivated the formulation of a fast and adaptive BEMD, abbreviated as FABEMD, described in this paper. In FABEMD, the envelope estimation method of regular BEMD is modified by replacing the 2D surface interpolation by an order-statistics-based filtering followed by a smoothing operation. A number of window sizes can be selected for the order statistics and smoothing filters, all of which are data driven, thus making the process adaptive. The simple change in the envelope estimation procedure provides a substantial reduction in computation time. The proposed FABEMD has been tested for decomposing various images, some of which have been reported in this paper. Simulation results demonstrate the usefulness of this novel FABEMD approach for BEMD-based image decomposition. FABEMD enables the decomposition of images of any dimensions in a very short period of time, while the application of BEMD is still limited to smaller images. Besides reducing the computation time, this novel approach also ensures a more accurate estimation of the BIMFs in some cases. It is believed that FABEMD can be an effective alternative to the regular BEMD and will play a significant role in this area.

REFERENCES

[1] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.



[4] J. C. Nunes, S. Guyot, and E. Delechelle, “Texture analysis based on local analysis of the bidimensional empirical mode decomposition,” Machine Vision and Applications, vol. 16, no. 3, pp. 177–188, 2005.

[5] A. Linderhed, “2-D empirical mode decompositions in the spirit of image compression,” in Wavelet and Independent Component Analysis Applications IX, vol. 4738 of Proceedings of SPIE, pp. 1–8, Orlando, Fla, USA, April 2002.

[6] C. Damerval, S. Meignen, and V. Perrier, “A fast algorithm for bidimensional EMD,” IEEE Signal Processing Letters, vol. 12, no. 10, pp. 701–704, 2005.

[7] Y. Xu, B. Liu, J. Liu, and S. Riemenschneider, “Two-dimensional empirical mode decomposition by finite elements,” Proceedings of the Royal Society A, vol. 462, no. 2074, pp. 3081–3096, 2006.

[8] Z. Liu, H. Wang, and S. Peng, “Texture classification through directional empirical mode decomposition,” in Proceedings of the 17th IEEE International Conference on Pattern Recognition (ICPR ’04), vol. 4, pp. 803–806, Cambridge, UK, August 2004.

[9] Z. Liu, H. Wang, and S. Peng, “Texture segmentation using directional empirical mode decomposition,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’04), vol. 1, pp. 279–282, Singapore, October 2004.

[10] S. R. Long, “Applications of HHT in image analysis,” in Hilbert-Huang Transform and Its Applications, N. E. Huang and S. S. P. Shen, Eds., World Scientific, River Edge, NJ, USA, 2005.

[11] Z. Liu and S. Peng, “Boundary processing of bidimensional EMD using texture synthesis,” IEEE Signal Processing Letters, vol. 12, no. 1, pp. 33–36, 2005.

[12] N. E. Huang, M.-L. C. Wu, S. R. Long, et al., “A confidence limit for the empirical mode decomposition and Hilbert spectral analysis,” Proceedings of the Royal Society A, vol. 459, no. 2037, pp. 2317–2345, 2003.

[13] G. Rilling, P. Flandrin, and P. Goncalves, “On empirical mode decomposition and its algorithms,” in Proceedings of IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP ’03), Grado, Italy, June 2003.

[14] C. Junsheng, Y. Dejie, and Y. Yu, “Research on the intrinsic mode function (IMF) criterion in EMD method,” Mechanical Systems and Signal Processing, vol. 20, no. 4, pp. 817–824, 2006.

[15] M. Shen, H. Tang, and B. Li, “The modified bidimensional empirical mode decomposition for image denoising,” in Proceedings of the 8th IEEE International Conference on Signal Processing (ICSP ’06), vol. 4, Beijing, China, November 2007.

[16] Y. Washizawa, T. Tanaka, D. P. Mandic, and A. Cichocki, “A flexible method for envelope estimation in empirical mode decomposition,” in Proceedings of the 10th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES ’06), B. Gabrys, R. J. Howlett, and L. C. Jain, Eds., pp. 1248–1255, Bournemouth, UK, October 2006.

[17] R. Srinivasan, R. Rengaswamy, and R. Miller, “A modified empirical mode decomposition (EMD) process for oscillation characterization in control loops,” Control Engineering Practice, vol. 15, no. 9, pp. 1135–1148, 2007.

[18] Z. K. Peng, P. W. Tse, and F. L. Chu, “An improved Hilbert-Huang transform and its application in vibration signal analysis,” Journal of Sound and Vibration, vol. 286, no. 1-2, pp. 187–205, 2005.

[19] B. Xuan, Q. Xie, and S. Peng, “EMD sifting based on bandwidth,” IEEE Signal Processing Letters, vol. 14, no. 8, pp. 537–540, 2007.

[20] Z. Liu, “A novel boundary extension approach for empirical mode decomposition,” in Proceedings of the International Conference on Intelligent Computing (ICIC ’06), D. Huang, K. Li, and G. W. Irwin, Eds., pp. 299–304, Kunming, China, August 2006.

[21] B. Shen, “Estimating the instantaneous frequencies of a multicomponent AM-FM image by bidimensional empirical mode decomposition,” in Proceedings of IEEE International Workshop on Intelligent Signal Processing (WISP ’05), pp. 283–287, Faro, Portugal, September 2005.

[22] S. Kizhner, K. Blank, T. Flatley, N. E. Huang, D. Petrick, and P. Hestnes, “On certain theoretical developments underlying the Hilbert-Huang transform,” in Proceedings of IEEE Aerospace Conference, p. 14, Big Sky, Mont, USA, March 2006.

[23] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Education, Upper Saddle River, NJ, USA, 2nd edition, 2002.

[24] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, NY, USA, 1966.


Research Article
Single-Trial Classification of Bistable Perception by Integrating Empirical Mode Decomposition, Clustering, and Support Vector Machine

Zhisong Wang,1 Alexander Maier,2 Nikos K. Logothetis,3 and Hualou Liang1

1 School of Health Information Sciences, University of Texas Health Science Center at Houston, 7000 Fannin, Suite 600, Houston, TX 77030, USA

2 Unit on Cognitive Neurophysiology and Imaging, National Institute of Health, Building 49, Room B2J-45, MSC-4400, 49 Convent Dr., Bethesda, MD 20892, USA

3 Max Planck Institut für biologische Kybernetik, Spemannstraße 38, 72076 Tübingen, Germany

Correspondence should be addressed to Hualou Liang, [email protected]

Received 23 August 2007; Revised 25 January 2008; Accepted 10 March 2008


We propose an empirical mode decomposition (EMD)-based method to extract features from the multichannel recordings of local field potential (LFP), collected from the middle temporal (MT) visual cortex in a macaque monkey, for decoding its bistable structure-from-motion (SFM) perception. The feature extraction approach consists of three stages. First, we employ EMD to decompose nonstationary single-trial time series into narrowband components called intrinsic mode functions (IMFs) with time scales dependent on the data. Second, we adopt unsupervised K-means clustering to group the IMFs and residues into several clusters across all trials and channels. Third, we use the supervised common spatial patterns (CSP) approach to design spatial filters for the clustered spatiotemporal signals. We exploit the support vector machine (SVM) classifier on the extracted features to decode the reported perception on a single-trial basis. We demonstrate that the CSP feature of the cluster in the gamma frequency band outperforms the features in other frequency bands and leads to the best decoding performance. We also show that the EMD-based feature extraction can be useful for evoked potential estimation. Our proposed feature extraction approach may have potential for many applications involving nonstationary multivariable time series such as brain-computer interfaces (BCI).

Copyright © 2008 Zhisong Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Spiking activity has been extensively studied in brain research to determine its relationship with perceptual reports or behavioral choices during ambiguous visual stimulation in the middle temporal (MT) visual area of macaque monkeys [1, 2]. However, spiking data as collected with standard neurophysiological techniques only provide information about the outputs of a small number of neurons within a given brain area. The local field potential (LFP) has recently received increasing attention in the analysis of neuronal population activity [3, 4]. LFP is thought to largely arise from the dendritic activity of local populations of neurons and is dominated by the excitatory synaptic inputs to a cortical area as well as intra-areal local processing. The investigation of correlations between perceptual reports and LFP oscillations during physically identical but perceptually different conditions may provide new insights into the mechanism of neural information processing and the neural basis of visual perception and perceptual decision making.

It is of great interest to study the temporally correlated LFP voltage fluctuations to understand cognitive and perceptual processes. To reveal the temporal structure of LFP, the LFP spectrum at a certain time and frequency is often analyzed. Perhaps the most commonly used method for spectral analysis is the Fourier transform, which provides a general method for estimating the global power-frequency distribution of a given random process, assuming that it is stationary. For neurobiological time series, however, Fourier analysis may be insufficient because the underlying processes are mostly nonstationary. The short-time Fourier transform (STFT) provides a means of joint time-frequency analysis by applying moving windows to a signal and Fourier-transforming it within each window [5]. The shortcoming of the STFT, however, is that a single fixed analysis window is used. Consequently, signals with different spectral components are treated with the same frequency resolution. In comparison with Fourier transform-based techniques, wavelet transform (WT)-based methods [6] offer new capacity and advantages. For example, the basis functions in WT are not limited to sinusoidal waves, but rather are chosen by the user to address various problems of time-frequency resolution. In principle, WT uses short windows at high frequencies and long windows at low frequencies, which renders WT more suitable for dealing with nonstationary time series. Nonetheless, wavelet analysis is also limited by the fundamental uncertainty principle, under which time and frequency cannot both be resolved with arbitrary precision simultaneously. Moreover, the results of WT analysis depend on the choice of the mother wavelet, which is arbitrary and may not be optimal for the time series under scrutiny. In contrast to the STFT and WT approaches, the empirical mode decomposition (EMD) method [7] adaptively decomposes nonstationary time series into narrow-band components, namely, intrinsic mode functions (IMFs), by empirically identifying the physical time scales intrinsic to the data without assuming any basis functions. These IMF components allow the calculation of a meaningful multicomponent instantaneous frequency by virtue of the Hilbert transform. Thus, one can potentially localize events in both time and frequency, even in nonstationary time series. Because of its versatility and generality, the EMD approach has found wide use in a variety of applications, including geophysics, biomedicine, neuroscience, financial engineering, meteorology, and seismology [8–12].

The availability of multichannel recordings offers opportunities to study how populations of neurons interact to produce a certain perceptual outcome. On one hand, different channels of LFP may reflect different aspects of brain activity correlated with the percept, and it is of interest to combine the LFP signals from various channels to exploit the different but complementary information embedded in the data simultaneously recorded from multiple channels. On the other hand, LFP signals from multiple channels may contain irrelevant and redundant information. By extracting the most discriminative features, we not only can reduce the data dimension but also can attain faster computation and more accurate classification. The common spatial patterns (CSP) method has become a popular feature extraction approach in EEG-based brain-computer interface (BCI) applications [13–15]. CSP essentially finds spatial filters that maximize the variance for one class and simultaneously minimize the variance for the other class. A preprocessing step of CSP is the use of bandpass filtering on the EEG signals in a given frequency band. However, the most predictive frequency content for CSP may be subject specific and unknown a priori in some tasks. To deal with this problem, we first employ EMD on nonstationary single trials from each channel to extract the signal energy associated with various intrinsic time scales. Since different trials of LFP may be decomposed by EMD into different numbers of IMFs, we use K-means clustering [16] to group the IMFs and residues into a number of clusters across all trials and channels. We then apply CSP to filter the clustered spatiotemporal signals and extract features from the CSP-filtered signals for each cluster. Based on the extracted features, we decode the monkey's perception for each trial using the support vector machine (SVM) classifier. In doing so, we have discovered that the cluster in the gamma frequency band carries the most discriminative information about the percept and hence it is the best feature for decoding perception under these circumstances. By taking advantage of both the temporal and spatial correlations of single-trial time series, we have proposed a general feature extraction approach that may have potential for a plethora of applications involving nonstationary multivariable time series.

In this paper, we examine the use of EMD to study neuronal activity in visual cortical area MT of a macaque monkey performing a bistable structure-from-motion (SFM) task [17]. The rest of the paper is organized as follows. In Section 2, we first present the materials and the feature extraction approach, which consists of empirical mode decomposition of single-trial LFP time series, K-means clustering, and common spatial patterns-based spatial filtering. We then briefly introduce the SVM classifier. In Section 3, we explore the application of the EMD-based feature extraction approach for decoding the bistable SFM perception and estimating evoked potentials. Finally, Section 4 contains the conclusions.

2. MATERIALS AND METHODS

2.1. Subjects and neurophysiological recordings

Electrophysiological recordings were performed in a healthy adult male rhesus monkey. After behavioral training was complete, an occipital recording chamber was implanted and a craniotomy was made. Intracortical recordings were conducted with a multielectrode array while the monkey was viewing SFM stimuli, which consisted of an orthographic projection of a transparent sphere that was covered with randomly distributed dots on its entire surface. Stimuli rotated for the entire period of presentation, giving the appearance of three-dimensional structure. The monkey was well trained and required to indicate the choice of rotation direction (clockwise or counterclockwise) by pushing one of two levers. Correct responses for disparity-defined stimuli were acknowledged with application of a fluid reward. In the case of fully ambiguous (bistable) stimuli, where the stimuli can be perceived in one of two possible ways and no correct response can be externally defined, the monkey was rewarded by chance. The bistable stimuli effectively dissociate percepts from the visual inputs, and such uncoupling between stimuli and percepts serves as an entry point for research into the neural correlate of perception [18]. Only the trials of data corresponding to bistable stimuli are analyzed in the paper. The recording site was the MT area of the monkey's visual cortex, which is commonly associated with visual motion processing.

2.2. Feature extraction

The feature extraction approach consists of three stages. First, we employ the EMD method to decompose nonstationary single-trial time series into narrow-band components called IMFs with time scales intrinsic to the data. Second, we adopt unsupervised K-means clustering to group the IMFs and residues into a number of clusters across all trials and channels. Third, we use the CSP approach to design spatial filters for the clustered spatiotemporal signals such that the resulting filtered signals carry the most discriminative information about the percept. Using the extracted features of MT LFP responses, the perceptual reports can be determined with high accuracy on a single-trial basis. In what follows, we provide more details for each stage of feature extraction.

2.2.1. Empirical mode decomposition

The central idea of the EMD time series analysis is to decompose a time series into a set of IMFs by a sifting process in order to empirically identify the physical time scales intrinsic to the time series [7]. A time series must satisfy two criteria to be an IMF: (1) the number of extrema and the number of zero crossings are either equal or differ at most by one; and (2) the mean of its upper and lower envelopes equals zero. The first criterion is similar to a narrow-band requirement. The second criterion is necessary to ensure that the instantaneous frequency will not have unwanted fluctuations as induced by asymmetric waveforms.

The sifting process for extracting IMFs from a given time series $x(t)$ is as follows. First, the local maxima and minima of $x(t)$ are identified; a cubic spline through all the maxima gives the upper envelope $x_u(t)$, and a cubic spline through all the minima gives the lower envelope $x_l(t)$. Second, the mean of the two envelopes is subtracted from the data to obtain the difference $d(t) = x(t) - (x_u(t) + x_l(t))/2$. Third, the process is repeated on $d(t)$ until the resulting signal $c_1(t)$, the first IMF, satisfies the two criteria of an intrinsic mode function. The residue $r_1(t) = x(t) - c_1(t)$ is then treated as a new time series and subjected to the same sifting process, yielding the second IMF. The procedure continues until either the recovered IMF or the residual time series becomes too small, or the residual time series has no turning points. Once all of the IMFs have been extracted, the final residual component represents the overall trend of the time series. At the end of this process, the time series $x(t)$ can be expressed as follows:

$$x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t), \tag{1}$$

where $n$ is the number of IMFs, and $r_n(t)$ denotes the final residue. By the nature of the decomposition procedure, the technique decomposes a time series into $n$ fundamental components, each with a distinct time scale. More specifically, the first component has the smallest time scale, which corresponds to the fastest time variation of the data. As the decomposition process proceeds, the time scale increases, and hence, the mean frequency of the mode decreases. Since the decomposition is based on the local characteristic time scale of the time series and yields an adaptive basis, it is applicable to nonlinear and nonstationary data analysis.

In practice, after a certain number of sifting iterations the resulting time series no longer carries significant physical information, and oversifting can yield a pure frequency-modulated signal of constant amplitude. To preserve the physical meaning of the IMF components in terms of amplitude and frequency modulations, it is typical to stop the sifting process once the standard deviation computed from two consecutive sifting results falls below a threshold set between 0.2 and 0.3 [7]. In addition, the cubic spline fitting in the sifting process may suffer from the end effect problem: large swings near the ends of the time series may occur and propagate inward. To overcome this problem, characteristic waves with two consecutive extrema for both the frequency and amplitude can be added at the ends [7].
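The sifting procedure and stopping rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`natural_cubic_spline`, `envelope_mean`, `sift_imf`, `emd`) are hypothetical, the envelopes use a hand-rolled natural cubic spline, the end effect is handled crudely by pinning both envelopes to the first and last samples rather than by adding characteristic waves, and the stopping statistic is a ratio-of-sums variant of the standard deviation criterion with an assumed threshold of 0.25.

```python
import numpy as np

def natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through the knots (xk, yk) at x."""
    n = len(xk)
    h = np.diff(xk)
    # Linear system for the second derivatives m at the knots.
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0            # natural ends: m[0] = m[-1] = 0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)
    i = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)   # interval index per sample
    dl, dr = x - xk[i], xk[i + 1] - x
    return ((m[i] * dr**3 + m[i + 1] * dl**3) / (6.0 * h[i])
            + (yk[i] / h[i] - m[i] * h[i] / 6.0) * dr
            + (yk[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * dl)

def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if x has too few extrema."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x), dtype=float)
    ends = np.array([0, len(x) - 1])     # assumption: pin envelopes at both ends
    up = np.unique(np.concatenate([ends, maxima]))
    lo = np.unique(np.concatenate([ends, minima]))
    upper = natural_cubic_spline(up.astype(float), x[up], t)
    lower = natural_cubic_spline(lo.astype(float), x[lo], t)
    return 0.5 * (upper + lower)

def sift_imf(x, sd_threshold=0.25, max_sift=50):
    """Sift x until two consecutive results differ by less than sd_threshold."""
    h = x.copy()
    for _ in range(max_sift):
        mean = envelope_mean(h)
        if mean is None:
            break
        h_new = h - mean
        sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_new
        if sd < sd_threshold:
            break
    return h

def emd(x, max_imfs=8):
    """Return (list of IMFs, residue) so that sum(IMFs) + residue equals x."""
    imfs, residue = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:   # residue has too few turning points
            break
        imf = sift_imf(residue)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue
```

By construction the IMFs and the residue add back up to the input, which mirrors equation (1); the quality of the individual IMFs depends on the envelope and boundary choices, which are deliberately simple here.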

2.2.2. K-means clustering

The goal of clustering is to partition a dataset into clusters such that the data within each cluster are similar but the data across distinct clusters are different. The K-means clustering algorithm is a popular algorithm for partitioning a dataset into K clusters with each cluster represented by its mean [16]. Initially, the K-means clustering algorithm generates K random points as cluster means. Then it iterates two steps, namely, the assignment step and the update step, until convergence. In the assignment step, each data point is assigned to the cluster whose mean is closest to it. In the update step, the means of all clusters are recomputed and updated based on the data points assigned to them. The convergence criterion can be that the cluster assignment does not change. The K-means clustering algorithm is simple and fast, but the clustering results depend on the initial random assignments. To overcome this problem, we can take the best clustering from multiple random starts.
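The assignment and update steps, together with the multiple-random-start strategy, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the initial means are drawn from the data points themselves, and squared Euclidean distance stands in for whatever distance is appropriate for the data at hand.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    """One K-means run from a random start; returns (labels, means, cost)."""
    means = rng.sample(points, k)        # assumption: start from k random data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, means[c])) for p in points]
        if new_labels == labels:         # converged: assignments did not change
            break
        labels = new_labels
        # Update step: recompute each mean from the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                means[c] = tuple(sum(v) / len(members) for v in zip(*members))
    cost = sum(sq_dist(p, means[lab]) for p, lab in zip(points, labels))
    return labels, means, cost

def kmeans(points, k, n_starts=50, seed=0):
    """Keep the run with the lowest within-cluster cost over several random starts."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])
```

Selecting the run with the smallest within-cluster cost is one common way to define the "best" clustering among the random starts; other selection criteria are possible.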

To determine the number of clusters, we use the silhouette value, which measures how similar a data point is to points in its own cluster compared to points in other clusters [19]. It is defined as follows:

$$s(i) = \frac{\min_l b(i, l) - a(i)}{\max\bigl(a(i), \min_l b(i, l)\bigr)}, \tag{2}$$

where $a(i)$ is the average distance from the $i$th data point to the other points in its cluster, and $b(i, l)$ is the average distance from the $i$th point to the points in another cluster $l$. The silhouette value ranges from −1 to +1, with +1 meaning that the data are separable and correctly clustered, 0 denoting poor clustering, and −1 meaning that the data are wrongly clustered.
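As a small worked illustration of equation (2), the silhouette value of every point can be computed directly from its definition. The sketch below assumes Euclidean distance and at least two clusters; the function name is hypothetical, and singleton clusters are given a value of 0 by convention.

```python
import math

def silhouette_values(points, labels):
    """Silhouette value s(i) of every point, following equation (2)."""
    clusters = set(labels)
    values = []
    for i, p in enumerate(points):
        def mean_distance(cluster):
            # Average distance from point i to the members of `cluster`, excluding itself.
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == cluster and j != i]
            return sum(d) / len(d) if d else None

        a = mean_distance(labels[i])
        if a is None:                    # singleton cluster: s(i) = 0 by convention
            values.append(0.0)
            continue
        b = min(mean_distance(c) for c in clusters if c != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values
```

Averaging these values over all points gives a single score per candidate number of clusters, which can then be compared across candidates.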


2.2.3. Common spatial patterns

The CSP technique has become a popular feature extraction approach in EEG-based BCI applications [13–15]. The CSP algorithm essentially finds spatial filters that maximize the variance for one class and simultaneously minimize the variance for the other class.

Assume that we have $M$ channels of neuronal signals and each trial has $T$ time samples. Let $X_j(i)$ denote the $M \times T$ data matrix for the $i$th trial from class $j$, where $j \in \{-1, +1\}$ denotes a perceptual report for the rotation direction with $j = -1$ for the clockwise direction and $j = +1$ for the counterclockwise direction. Let $N_j$ denote the number of single trials for class $j$. Then the mean over trials for each class is

$$\bar{X}_j = \frac{1}{N_j} \sum_{i=1}^{N_j} X_j(i). \tag{3}$$

Let

$$V_j(i) = X_j(i) - \bar{X}_j, \qquad R_j = \frac{1}{N_j} \sum_{i=1}^{N_j} \frac{V_j(i) V_j(i)'}{\operatorname{tr}\bigl(V_j(i) V_j(i)'\bigr)}, \tag{4}$$

where $R_j$ is the average normalized covariance matrix for class $j$, $\operatorname{tr}(\cdot)$ represents the sum of the diagonal elements, and $(\cdot)'$ means transpose.

Let the eigendecomposition of $R_{-1} + R_{+1}$ be

$$R_{-1} + R_{+1} = U \Sigma U', \tag{5}$$

where $U$ is a unitary matrix and $\Sigma$ is a diagonal matrix. Let

$$S_{-1} = P R_{-1} P', \qquad S_{+1} = P R_{+1} P', \tag{6}$$

where $P = \Sigma^{-1/2} U'$ is the whitening transformation matrix for $R_{-1} + R_{+1}$. Let the eigendecomposition of $S_{-1}$ be

$$S_{-1} = B \Sigma_{-1} B'. \tag{7}$$

Then it can be shown that

$$S_{+1} = B \Sigma_{+1} B', \qquad \Sigma_{-1} + \Sigma_{+1} = I, \tag{8}$$

where $I$ is the identity matrix. Let

$$W = B' P. \tag{9}$$

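Because $W R_{\pm 1} W' = B' P R_{\pm 1} P' B = B' S_{\pm 1} B$, it follows that

$$W R_{-1} W' = \Sigma_{-1}, \qquad W R_{+1} W' = \Sigma_{+1}. \tag{10}$$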

Hence $W$ simultaneously diagonalizes $R_{-1}$ and $R_{+1}$, and each row of $W$ is an eigenvector. In general, only the eigenvectors (rows of $W$) corresponding to the few largest and smallest eigenvalues are used as spatial filters for CSP, since they are the most suitable for discrimination. Note that the spatial filters of CSP can also be determined by solving a generalized eigenvalue problem:

$$R_{-1} v = \lambda R_{+1} v. \tag{11}$$
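The whitening and eigendecomposition steps of CSP can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, trials are assumed to be stacked in arrays of shape (number of trials, channels, samples), and the log-variance feature in `csp_log_variance` is a common choice in the CSP literature that is assumed here because the text does not spell out the exact feature.

```python
import numpy as np

def normalized_covariance(trials):
    """Average normalized covariance of mean-removed trials, shape (N, M, T) in."""
    trials = np.asarray(trials, dtype=float)
    centered = trials - trials.mean(axis=0)      # subtract the mean over trials
    covs = [v @ v.T / np.trace(v @ v.T) for v in centered]
    return np.mean(covs, axis=0)

def csp_filters(trials_neg, trials_pos):
    """Return (W, d) where the rows of W are spatial filters and d holds the
    eigenvalues of the whitened class -1 covariance, in ascending order."""
    r_neg = normalized_covariance(trials_neg)
    r_pos = normalized_covariance(trials_pos)
    evals, u = np.linalg.eigh(r_neg + r_pos)     # composite covariance = U Sigma U'
    p = np.diag(evals ** -0.5) @ u.T             # whitening matrix P
    s_neg = p @ r_neg @ p.T                      # whitened class -1 covariance
    d_neg, b = np.linalg.eigh(s_neg)             # S_-1 = B Sigma_-1 B'
    w = b.T @ p                                  # W = B' P
    return w, d_neg

def csp_log_variance(trial, w, n_pairs=1):
    """Assumed feature: log of normalized variance of the filtered signals,
    using the n_pairs filters from each end of the eigenvalue spectrum."""
    m = w.shape[0]
    rows = np.r_[0:n_pairs, m - n_pairs:m]
    z = w[rows] @ np.asarray(trial, dtype=float)
    var = z.var(axis=1)
    return np.log(var / var.sum())
```

Since `numpy.linalg.eigh` returns eigenvalues in ascending order, the first row of `W` minimizes the variance for class −1 (and maximizes it for class +1), while the last row does the opposite.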

2.3. Support vector machine classifier

SVM is a promising classifier that minimizes the empirical classification error and at the same time maximizes the margin by determining a separating hyperplane to distinguish different classes of data [20, 21]. Robust to outliers and having a good generalization capability, SVM has been successfully used in many applications.

Assume that $x_k$, $k = 1, \ldots, K$, are the $K$ training feature vectors for decoding and the class labels are $y_k \in \{-1, +1\}$. SVM solves the following optimization problem:

$$\min \; \|w\|^2 + C \sum_{k=1}^{K} \xi_k \quad \text{subject to} \quad y_k (w' x_k + b) \ge 1 - \xi_k, \quad \xi_k \ge 0, \tag{12}$$

where $w$ is the weight vector, $C > 0$ is the penalty parameter of the error term chosen by cross-validation, $\xi_k$ is the slack variable, and $b$ is the bias term. The margin between the two classes is inversely proportional to $\|w\|$, so minimizing the first term in the objective function maximizes the margin. The second term penalizes the slack variables and thereby allows for training errors in the inseparable case.

The optimal solution of $w$ and $b$ to the above optimization problem can be found by using the Lagrange multiplier method. Assume that $t$ is the testing feature vector. Then testing is done simply by determining on which side of the separating hyperplane $t$ lies; that is, if $w' t + b \ge 0$, the label of $t$ is classified as +1; otherwise, the label is classified as −1.
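To make the objective in (12) and the decision rule concrete, the sketch below minimizes the equivalent unconstrained form, in which each slack variable is replaced by the hinge loss $\max(0, 1 - y_k(w' x_k + b))$, using plain subgradient descent. This is a deliberately crude stand-in for illustration only: it is not the Lagrange multiplier method mentioned above, and the function names, step size, and iteration count are all assumptions.

```python
def train_linear_svm(xs, ys, c=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on ||w||^2 + C * sum_k max(0, 1 - y_k (w'x_k + b))."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * wi for wi in w]          # gradient of the margin term ||w||^2
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:                     # positive slack: point violates the margin
                grad_w = [g - c * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= c * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, t):
    """Label +1 if w't + b >= 0, otherwise -1."""
    return 1 if sum(wi * ti for wi, ti in zip(w, t)) + b >= 0 else -1
```

With a constant step size the iterates hover around the optimum rather than converging exactly, which is adequate for a toy example; in practice a dedicated solver would be used.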

SVM belongs to a class of methods, known as kernel methods, which use a kernel function to map data into a high-dimensional feature space in order to increase the expressive power [20, 22, 23]. The kernel trick can also be applied to other algorithms such as Fisher's linear discriminant analysis (LDA), principal components analysis (PCA), and so on [24]. Nevertheless, the linear SVM has been widely used due to its simplicity, robustness, and interpretability; see, for example, [25]. In general, linear classifiers are less prone to overfitting than nonlinear ones when training data are limited. In addition, the relative importance of individual features can be assessed by examining the weights in the linear classifier.

3. EXPERIMENTAL RESULTS

In this section, we provide experimental examples to demonstrate the performance of the proposed feature extraction approach for predicting perceptual decisions from the neuronal data. Simultaneously collected nine-channel LFP data were used for demonstration. For K-means clustering, we use 50 random starts to find the best clustering and adopt the correlation between the power spectra of IMFs and residues from all trials and channels as the distance metric. For CSP, we use the pair of eigenvectors corresponding to the largest and smallest eigenvalues as spatial filters. We employ the linear SVM classifier from the LIBSVM package [26]. We use decoding accuracy as a performance measure and calculate it via leave-one-out cross-validation (LOOCV).


Figure 1: (a) A typical trial of LFP time series and (b) its five intrinsic mode functions (IMFs) and residue. Time 0 indicates the stimulus onset. Axes: amplitude versus time (ms), from −500 to 2000 ms; panel (b) has one subplot each for the residue and IMF 1 through IMF 5.

In particular, for a dataset with $N$ trials, we choose $N - 1$ trials for feature extraction and training and use the remaining trial for testing. This is repeated $N$ times, with each trial used for testing once. The decoding accuracy is obtained as the ratio of the number of correctly decoded trials to $N$.
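The leave-one-out procedure can be sketched generically as below. The names are hypothetical, and the nearest-class-mean classifier on scalar features is only a placeholder used to keep the example self-contained; in the setting described here, the `train` callable would cover both feature extraction and classifier training on the $N - 1$ training trials, so that the held-out trial never influences the supervised steps.

```python
def loocv_accuracy(features, labels, train, predict):
    """Leave-one-out accuracy: hold out each trial once, train on the rest."""
    n = len(features)
    correct = 0
    for i in range(n):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)          # fit only on the N - 1 training trials
        correct += predict(model, features[i]) == labels[i]
    return correct / n

def train_nearest_mean(xs, ys):
    """Placeholder classifier: store the mean feature value of each class."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return means

def predict_nearest_mean(means, x):
    """Assign the label whose class mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))
```

Any pair of `train` and `predict` callables with the same signatures can be substituted for the placeholder classifier.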

Figures 1(a) and 1(b) show a typical trial of LFP time series and its five IMFs and residue, respectively. Similar results were observed for other trials, though the number of IMF components produced by EMD varied from trial to trial. Figure 2 shows the mean and standard deviation of the average silhouette values obtained by clustering the IMFs and residues across all trials and channels using the K-means algorithm with different random starts, as a function of the number of clusters. Note that the average silhouette value increases significantly with the number of clusters until the number of clusters reaches five; the difference between the values obtained with five and with six clusters is not significant (t-test, P-value > .05). Hence we choose the number of clusters to be five. Figures 3(a) and 3(b) show the time series and power spectrum, respectively, of clusters 1–5 as the grand average across all trials and channels. It can be seen that the time series of different clusters differ in smoothness and physical time scales. These clusters fall into five spectral bands: the first cluster in the gamma band (30–70 Hz), the second cluster in the alpha and beta bands (9–30 Hz), the third cluster in the theta band (4–8 Hz), the fourth cluster in the delta band (1–3 Hz), and the fifth cluster mainly in the DC component.

Figure 2: The mean and standard deviation of the average silhouette values obtained by clustering the IMFs and residues across all trials and channels using the K-means algorithm with different random starts, as a function of the number of clusters. Note that the average silhouette values significantly increase with the number of clusters until the number of clusters is equal to five (comparing the average silhouette values obtained when the number of clusters is five with those obtained when the number of clusters is six via t-test, P-value > .05). Axes: average silhouette value versus number of clusters (2 to 6).

Figure 3: The grand average across all trials and channels of (a) the time series and (b) the power spectrum for clusters 1–5. The subfigures from top to bottom correspond to clusters 1–5, respectively. Note that the clusters fall into distinct spectral bands: the first cluster in the gamma band (30–70 Hz), the second cluster in the alpha and beta bands (9–30 Hz), the third cluster in the theta band (4–8 Hz), the fourth cluster in the delta band (1–3 Hz), and the fifth cluster mainly in the DC component. Axes: (a) time (ms), from −500 to 2000 ms; (b) frequency (Hz), from 0 to 100 Hz.

Next, we compare the SVM decoding accuracy based on different features and present the results in Tables 1 and 2, using nonoverlapping moving windows of 200 milliseconds in length. Note that for each feature, we report only the decoding accuracy of the moving window that yields the best decoding performance. In Table 1, we compare the decoding accuracy based on the features obtained from the raw signal and clusters 1–5 (denoted as c1–c5, resp.) and their CSP counterparts. It is clear from Table 1 that the feature obtained by using CSP on cluster 1 yields the best decoding accuracy of 0.76. Hence it is the most discriminative feature of the LFP data among all the above features. In Table 2, we compare the decoding accuracy based on the features obtained from the bandpass-filtered signal and cluster 1 (denoted as c1) and their CSP counterparts. Here the bandpass-filtered signal represents the signal obtained by directly bandpass-filtering the raw single-trial signal in the gamma band (30–70 Hz). We can observe that although the first cluster also falls into the gamma band, the decoding accuracy based on it is significantly higher than that based on the bandpass-filtered signal in the same band (0.71 versus 0.65 without CSP, and 0.76 versus 0.71 with CSP). The difference in performance underscores the fundamental difference between bandpass filtering and EMD, in that EMD adaptively identifies the physical time scales intrinsic to the nonstationary LFP time series.

Table 1: Comparison of the decoding accuracy based on the features obtained from the raw signal and clusters 1–5 (denoted as c1–c5, resp.) and their CSP counterparts.

| Approach | Raw signal | c1 | c2 | c3 | c4 | c5 |
|---|---|---|---|---|---|---|
| Decoding accuracy | 0.70 | 0.71 | 0.70 | 0.61 | 0.61 | 0.61 |

| Approach | Raw signal + CSP | c1 + CSP | c2 + CSP | c3 + CSP | c4 + CSP | c5 + CSP |
|---|---|---|---|---|---|---|
| Decoding accuracy | 0.69 | 0.76 | 0.66 | 0.66 | 0.66 | 0.64 |

Table 2: Comparison of the decoding accuracy based on the features obtained from the bandpass-filtered signal and cluster 1 (denoted as c1) and their CSP counterparts.

| Approach | Bandpass-filtered signal | Bandpass-filtered signal + CSP | c1 | c1 + CSP |
|---|---|---|---|---|
| Decoding accuracy | 0.65 | 0.71 | 0.71 | 0.76 |

Figure 4: The average evoked potential from a single channel obtained as (a) the combination of clusters 3–5 and (b) ensemble averaging. Note that there is a close match between the average evoked potential obtained via these two approaches. Axes: amplitude versus time (ms), from −500 to 2000 ms.

Interestingly, the EMD-based feature extraction is also useful for evoked potential estimation. From Figure 3, we see that clusters 3–5 correspond to the low-frequency components and hence contribute to the average evoked potential. Figure 4(a) shows the average evoked potential from a single channel as the combination of clusters 3–5. Note that the stimulus appears at t = 0 s and remains on for two seconds. As can be seen, the average evoked potential captures the event-related neural activity. Furthermore, the resulting average evoked potential is quite smooth even when the number of trials is not very large. Figure 4(b) shows the average evoked potential from a single channel obtained by the commonly used ensemble averaging, which averages across all trials the potentials measured over repeated presentations of a stimulus. It is equivalent to the combination of clusters 1–5. The average evoked potential obtained by ensemble averaging is comparatively poor and probably contains background activity and noise, because ensemble averaging generally requires a large number of trials to achieve satisfactory performance.

4. CONCLUSIONS

In this paper, we have shown how to apply an EMD-based method to extract features from the LFP in monkey visual cortex for decoding its bistable SFM perception. We have employed EMD to decompose nonstationary single-trial time series into narrow-band components called IMFs with time scales dependent on the data. We have adopted unsupervised K-means clustering to group the IMFs and residues into a number of clusters across all trials and channels. We have used the supervised CSP approach to design spatial filters for the clustered spatiotemporal signals such that the resulting filtered signals carry the most discriminative information about the percept. We have applied the SVM classifier to the extracted features of the LFP response to decode the reported perception on a single-trial basis. We have applied the feature extraction approach to the multichannel intracortical LFP data collected from the MT visual area in a macaque monkey performing an SFM task. Using these techniques, we have demonstrated that the CSP feature of the cluster in the gamma frequency band outperforms the features in other bands and leads to the best decoding performance. We have also shown that the EMD-based feature extraction can be useful for evoked potential estimation.

The advantages of our proposed feature extraction approach can be summarized as follows. First of all, the approach is data-driven and can identify the physical time scales intrinsic to the data. Hence it does not demand prior knowledge of the discriminative frequency bands and is robust against subject variability. In addition, it does not require stationarity of the time series or any functional basis. Because of these advantages, the EMD-based feature extraction approach may have potential for a large variety of applications involving nonstationary multivariable time series such as brain-computer interfaces (BCI).

ACKNOWLEDGMENT

This work was supported by the NIH R01 MH072034 and the Max Planck Society.

REFERENCES

[1] K. H. Britten, W. T. Newsome, M. N. Shadlen, S. Celebrini, and J. A. Movshon, “A relationship between behavioral choice and the visual responses of neurons in macaque MT,” Visual Neuroscience, vol. 13, no. 1, pp. 87–100, 1996.

[2] J. V. Dodd, K. Krug, B. G. Cumming, and A. J. Parker, “Perceptually bistable three-dimensional figures evoke high choice probabilities in cortical area MT,” Journal of Neuroscience, vol. 21, no. 13, pp. 4809–4821, 2001.

[3] B. Pesaran, J. S. Pezaris, M. Sahani, P. P. Mitra, and R. A. Andersen, “Temporal structure in neuronal activity during working memory in macaque parietal cortex,” Nature Neuroscience, vol. 5, no. 8, pp. 805–811, 2002.

[4] G. Kreiman, C. P. Hung, A. Kraskov, R. Q. Quiroga, T. Poggio, and J. J. DiCarlo, “Object selectivity of local field potentials and spikes in the macaque inferior temporal cortex,” Neuron, vol. 49, no. 3, pp. 433–445, 2006.

[5] L. Cohen, Time Frequency Analysis, Prentice-Hall, Englewood Cliffs, NJ, USA, 1995.

[6] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, Calif, USA, 1999.

[7] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time-series analysis,” Proceedings of the Royal Society of London A, vol. 454, no. 1971, pp. 903–995, 1998.

[9] H. Liang, S. L. Bressler, E. A. Buffalo, R. Desimone, and P. Fries, “Empirical mode decomposition of field potentials from macaque V4 in visual spatial attention,” Biological Cybernetics, vol. 92, no. 6, pp. 380–392, 2005.

[11] N. E. Huang and N. O. Attoh-Okine, Eds., The Hilbert-Huang Transform in Engineering, CRC, Boca Raton, Fla, USA, 2005.

[12] C. M. Sweeney-Reed and S. J. Nasuto, “A novel approach to the detection of synchronisation in EEG based on empirical mode decomposition,” Journal of Computational Neuroscience, vol. 23, no. 1, pp. 79–111, 2007.

[13] Z. J. Koles, “The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG,” Electroencephalography and Clinical Neurophysiology, vol. 79, no. 6, pp. 440–447, 1991.

[14] J. Müller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, “Designing optimal spatial filters for single-trial EEG classification in a movement task,” Clinical Neurophysiology, vol. 110, no. 5, pp. 787–798, 1999.

[15] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.

[16] J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, Berkeley, Calif, USA, June-July 1967.

[17] Z. Wang, A. Maier, D. A. Leopold, N. K. Logothetis, and H. Liang, “Single-trial evoked potential estimation using wavelets,” Computers in Biology and Medicine, vol. 37, no. 4, pp. 463–473, 2007.

[18] R. Blake and N. K. Logothetis, “Visual competition,” Nature Reviews Neuroscience, vol. 3, no. 1, pp. 13–21, 2002.

[19] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, New York, NY, USA, 1990.

[20] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[21] C. Cortes and V. N. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[22] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual Workshop on Computational Learning Theory, pp. 144–152, Pittsburgh, Pa, USA, July 1992.

[23] B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds., Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Mass, USA, 1999.

[24] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK, 2004.

[25] C. P. Hung, G. Kreiman, T. Poggio, and J. J. DiCarlo, “Fast readout of object identity from macaque inferior temporal cortex,” Science, vol. 310, no. 5749, pp. 863–866, 2005.

[26] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm.


Research Article
A Fault Diagnosis Approach for Gears Based on IMF AR Model and SVM

Junsheng Cheng, Dejie Yu, and Yu Yang

The State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University, Changsha 410082, China

Correspondence should be addressed to Junsheng Cheng, [email protected]

Received 24 July 2007; Revised 28 February 2008; Accepted 15 April 2008


An accurate autoregressive (AR) model reflects the characteristics of a dynamic system, so the fault features of a gear vibration signal can be extracted from it without constructing a mathematical model or studying the fault mechanism of the gear vibration system, both of which are required by time-frequency analysis methods. However, the AR model can only be applied to stationary signals, whereas gear fault vibration signals are usually nonstationary. Therefore, empirical mode decomposition (EMD), which decomposes a vibration signal into a finite number of intrinsic mode functions (IMFs), is introduced as a preprocessor for feature extraction before the AR models are generated. In addition, because sufficient fault samples are difficult to obtain in practice, the support vector machine (SVM) is introduced for gear fault pattern recognition. In the proposed method, vibration signals are first decomposed into a finite number of IMFs; an AR model of each IMF component is then established; finally, the autoregressive parameters and the variance of the remnant are taken as the fault characteristic vectors and used as input to an SVM classifier that classifies the working condition of the gears. The experimental results show that the proposed approach, which combines the IMF AR model and SVM, identifies the working condition of gears with a success rate of 100% even with a small number of samples.

Copyright © 2008 Junsheng Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The process of gear fault diagnosis includes acquiring information, extracting features, and recognizing conditions, of which the last two are the most important.

Signal processing methods have been widely used to extract fault features from gear vibration signals [1, 2]. The Fourier transform (FT), which has been the dominant analysis tool for feature extraction from stationary signals, produces statistical average characteristics over the entire duration of the data. However, it fails to provide both the global and the local features of the signal in the time and frequency domains, and gear fault vibration signals are typically nonstationary. Time-frequency analysis methods, on the other hand, provide time and frequency information of a signal simultaneously, so most recent studies use them for gear fault feature extraction [3–5]. Among the available time-frequency analysis methods, the wavelet transform may be the best one [6, 7]; however, it still has some inevitable deficiencies [8]. First, energy leakage occurs when the wavelet transform is used to process signals, because the wavelet transform is essentially an adjustable windowed Fourier transform. Second, an appropriate basis function needs to be selected in advance. Moreover, once the decomposition scales are determined, the result of the wavelet transform is the signal within a certain frequency band. The wavelet transform is therefore not a self-adaptive signal processing method by nature. In addition, the above-mentioned methods require a mathematical model to be established or the fault mechanism of the gear vibration system to be studied before feature extraction, which is usually quite difficult in practice.

The autoregressive (AR) model, which does not require constructing a mathematical model or studying the fault mechanism of a complex gear vibration system in advance, is a time series analysis method whose parameters carry significant information about the system condition; more importantly, an accurate AR model reflects the characteristics of a dynamic system. It has also been shown that the autoregressive parameters of the AR model are very sensitive to condition variation [9, 10]. Gear fault vibration signals have shock characteristics, and the AR model can model transients; its frequency response function can be calculated from the autoregressive parameters. The autoregressive parameters can therefore be used to analyze the condition variation of dynamic systems. However, when the AR model is applied to nonstationary signals, it is difficult to estimate the autoregressive parameters by the least squares method or the Yule-Walker equation method. The time-dependent autoregressive moving average (ARMA) model, on the other hand, can be applied to nonstationary signals, but it needs more computation time. Furthermore, the time-dependent ARMA model gives satisfactory results only when it is applied to the common linear frequency- and amplitude-modulated signals [11]. It is therefore necessary to preprocess the vibration signals before the AR model is generated.

Empirical mode decomposition (EMD) is a new time-frequency analysis method proposed by Huang et al. [12, 13]. It is based on the local characteristic time scale of the signal and decomposes a complicated signal into a number of intrinsic mode functions (IMFs). By analyzing each IMF component, which captures local characteristics of the signal, the features of the original signal can be extracted more accurately and effectively. In addition, the frequency components contained in each IMF not only relate to the sampling frequency but also change with the signal itself, so EMD is a self-adaptive time-frequency analysis method that is well suited to nonlinear and nonstationary signal processing. The EMD method has been widely applied to mechanical fault diagnosis and condition monitoring. In [14], the EMD method is combined with a smoothed nonlinear energy operator to detect flute breakage; the results demonstrate that this method can efficiently monitor the condition of the end mill under varying cutting conditions. In [15], a fault diagnosis method for the sheet metal stamping process based on EMD and learning vector quantization is proposed; the results show that this method can successfully detect artificially created defects. In this paper, targeting the nonstationary characteristics of gear vibration signals and the disadvantage of the AR model, a fault feature extraction method that combines IMFs and the AR model is proposed.

After feature extraction, pattern recognition is the other key step of gear fault diagnosis [16–18]. Conventional statistical pattern recognition methods and artificial neural network (ANN) classifiers are built on the premise that sufficient samples are available, which is not always true in practice [19]. In recent years, support vector machines (SVMs) have been found to be remarkably effective in many real-world applications [20–23]. They are based on statistical learning theory, which is specialized for small sample sizes; they generalize better than ANNs and guarantee that the local and global optimal solutions are exactly the same. SVMs can therefore solve the learning problem with a small number of samples [24, 25]. Because it is difficult to obtain sufficient fault samples in practice, SVMs are introduced into gear fault diagnosis in this paper for their high accuracy and good generalization with a small number of samples.

2. EMD METHOD

The EMD method is developed from the simple assumption that any signal consists of different simple intrinsic modes of oscillation. Each mode, linear or nonlinear, has the same number of extrema and zero-crossings, with only one extremum between successive zero-crossings, and each mode should be independent of the others. In this way, each signal can be decomposed into a number of intrinsic mode functions (IMFs), each of which must satisfy the following definition [12, 13].

(1) In the whole dataset, the number of extrema and the number of zero-crossings must either be equal or differ by at most one.

(2) At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

An IMF represents a simple oscillatory mode, a counterpart of the simple harmonic function. With this definition, any signal $x(t)$ can be decomposed as follows.

(1) Identify all the local extrema, then connect all the local maxima by a cubic spline line as the upper envelope.

(2) Repeat the procedure for the local minima to produce the lower envelope. The upper and lower envelopes should cover all the data between them.

(3) The mean of the upper and lower envelopes is designated as $m_1$, and the difference between the signal $x(t)$ and $m_1$ is the first component, $h_1$:

$$x(t) - m_1 = h_1. \tag{1}$$

Ideally, if $h_1$ is an IMF, then $h_1$ is the first IMF component of $x(t)$.

(4) If $h_1$ is not an IMF, $h_1$ is treated as the original signal and steps (1), (2), and (3) are repeated, giving

$$h_1 - m_{11} = h_{11}. \tag{2}$$

After repeated sifting, that is, up to $k$ times, $h_{1k}$ becomes an IMF:

$$h_{1(k-1)} - m_{1k} = h_{1k}, \tag{3}$$

then it is designated as

$$c_1 = h_{1k}, \tag{4}$$

the first IMF component from the original data.

(5) Separate $c_1$ from $x(t)$ to obtain

$$r_1 = x(t) - c_1. \tag{5}$$

The residue $r_1$ is then treated as the original data and the above process is repeated to obtain the second IMF component $c_2$ of $x(t)$. Repeating the process $n$ times yields $n$ IMFs of the signal $x(t)$:

$$r_1 - c_2 = r_2, \quad \ldots, \quad r_{n-1} - c_n = r_n. \tag{6}$$

Figure 1: Acceleration vibration signal of a gear with a broken tooth. (Horizontal axis: time t (s), 0 to 1; vertical axis: acceleration a (m s−2), −50 to 50.)

The decomposition process can be stopped when $r_n$ becomes a monotonic function from which no more IMFs can be extracted. Summing (5) and (6) finally gives

$$x(t) = \sum_{j=1}^{n} c_j + r_n. \tag{7}$$

Thus, the signal is decomposed into $n$ empirical modes and a residue $r_n$, which is the mean trend of $x(t)$. The IMFs $c_1, c_2, \ldots, c_n$ cover different frequency bands ranging from high to low, and each is stationary.

Figure 1 shows the acceleration vibration signal of a gear with a broken tooth. Using the EMD method, it is decomposed into five IMFs and a remnant $r_n$, as Figure 2 illustrates. Figure 2 shows that each IMF component has a distinct characteristic time scale.
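The sifting procedure of this section can be sketched in a few lines of Python. This is a simplified illustration and not the authors' implementation: to stay dependency-free it uses piecewise-linear envelopes in place of the cubic splines named in step (1), it stops each sifting after a fixed number of passes (`n_sifts`, an assumed stopping rule that the text does not specify), and the test signal is synthetic. All function names are hypothetical.

```python
import math

def local_extrema(x):
    """Return indices of interior local maxima and local minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(indices, x):
    """Piecewise-linear envelope through the given extrema (assumption:
    the paper uses cubic splines; the ends are simply held constant here)."""
    env, j = [], 0
    for i in range(len(x)):
        if i <= indices[0]:
            env.append(x[indices[0]])
        elif i >= indices[-1]:
            env.append(x[indices[-1]])
        else:
            while indices[j + 1] < i:
                j += 1
            lo, hi = indices[j], indices[j + 1]
            w = (i - lo) / (hi - lo)
            env.append((1 - w) * x[lo] + w * x[hi])
    return env

def sift(x, n_sifts=10):
    """Steps (1)-(4): repeatedly subtract the mean of the two envelopes."""
    h = list(x)
    for _ in range(n_sifts):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(maxima, h), envelope(minima, h)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=8, n_sifts=10):
    """Step (5) and equation (6): peel off IMFs until the residue is nearly monotonic."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 3:
            break
        c = sift(residue, n_sifts)
        imfs.append(c)
        residue = [r - ci for r, ci in zip(residue, c)]
    return imfs, residue

# Synthetic test signal: two tones plus a linear trend, sampled at 1024 Hz for 1 s.
fs = 1024
signal = [math.sin(2 * math.pi * 40 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 5 * i / fs)
          + 2.0 * i / fs for i in range(fs)]
imfs, residue = emd(signal)

# Equation (7): the IMFs and the residue sum back to the original signal.
rebuilt = [sum(c[i] for c in imfs) + residue[i] for i in range(len(signal))]
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(len(imfs), max_error)
```

Whatever envelope interpolator is used, the reconstruction in (7) holds by construction, which makes it a convenient sanity check; a cubic-spline interpolator could be substituted inside `envelope` without changing the rest of the sketch.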

3. SUPPORT VECTOR MACHINES (SVMs)

SVM is developed from the optimal separating plane under the linearly separable condition. Its basic principle can be illustrated in two dimensions, as in Figure 3 [25]. Figure 3 shows the classification of a series of points from two classes of data, class A (circles) and class B (stars). The SVM tries to place a linear boundary H between the two classes and orients it so that the margin is maximized, that is, the distance between the boundary and the nearest data point in each class is maximal. The nearest data points define the margin and are known as support vectors.

Figure 2: The EMD results of a gear vibration signal. (Six panels, top to bottom: c1, c2, c3, c4, c5, and rn; horizontal axis: time t (s), 0 to 1.)

Figure 3: Classification of data by SVM. (Labels in the figure: support vectors, margin, and the hyperplanes H1, H, and H2.)

Suppose there is a given training sample set $G = \{(x_i, y_i),\ i = 1, \ldots, l\}$, where each sample $x_i \in R^d$ belongs to a class labeled $y_i \in \{+1, -1\}$. The boundary can be expressed as

$$\omega \cdot x + b = 0, \tag{8}$$

where $\omega$ is a weight vector and $b$ is a bias. The following decision function can therefore be used to assign any data point to either class A or class B:

$$f(x) = \mathrm{sign}(\omega \cdot x + b). \tag{9}$$

The optimal hyperplane separating the data is obtained as the solution of the following constrained optimization problem:

$$\text{minimize} \quad \frac{1}{2}\|\omega\|^2, \qquad \text{subject to} \quad y_i[(\omega \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \ldots, l. \tag{10}$$


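Introducing Lagrange multipliers $\alpha_i \ge 0$, the optimization problem can be rewritten in its dual form as

$$\text{maximize} \quad L(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j), \qquad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{l} \alpha_i y_i = 0. \tag{11}$$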

The decision function can then be obtained as

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{l} \alpha_i y_i (x_i \cdot x) + b\right). \tag{12}$$

If a linear boundary in the input space is not enough to separate the two classes properly, it is possible to create a hyperplane that allows linear separation in a higher-dimensional space. In SVM, this is achieved by using a transformation $\Phi(x)$ that maps the data from the input space to a feature space. If a kernel function

$$K(x, y) = \Phi(x) \cdot \Phi(y) \tag{13}$$

is introduced to perform the transformation, the basic form of SVM is obtained:

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{l} \alpha_i y_i K(x, x_i) + b\right). \tag{14}$$

Kernel functions in common use include linear functions, polynomial functions, radial basis functions, and sigmoid functions.
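As an illustration of the decision function (14), the following Python sketch evaluates it with each of the four kernel families listed above. The support vectors, multipliers, bias, and kernel parameters (`degree`, `coef0`, `gamma`, `scale`) are hypothetical toy values chosen for the example and are not taken from the paper; the multipliers are the hard-margin solution for the linear kernel only.

```python
import math

def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=3, coef0=1.0):
    return (linear_kernel(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, scale=0.1, coef0=0.0):
    return math.tanh(scale * linear_kernel(u, v) + coef0)

def decide(x, support_vectors, labels, alphas, bias, kernel):
    """Equation (14): sign of the kernel expansion over the support vectors."""
    score = sum(a * y * kernel(x, sv)
                for a, y, sv in zip(alphas, labels, support_vectors)) + bias
    return 1 if score >= 0 else -1

# Hypothetical two-point training set: one support vector per class.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [+1, -1]
alphas = [0.25, 0.25]   # satisfies sum(alpha_i * y_i) = 0
bias = 0.0

for kernel in (linear_kernel, rbf_kernel):
    print(kernel.__name__,
          decide((2.0, 0.0), support_vectors, labels, alphas, bias, kernel),
          decide((-3.0, 1.0), support_vectors, labels, alphas, bias, kernel))
```

With the linear kernel, (14) reduces to (12). For a nonlinear kernel the multipliers would in practice be obtained by solving (11) with $x_i \cdot x_j$ replaced by $K(x_i, x_j)$.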

4. DIAGNOSIS APPROACH FOR GEARS BASED ON IMF AR MODEL AND SVM

The following autoregressive model AR($m$) can be established for each IMF component $c_i(t)$ in (7) [26]:

$$c_i(t) + \sum_{k=1}^{m} \varphi_{ik} c_i(t-k) = e_i(t), \tag{15}$$

where $\varphi_{ik}$ ($k = 1, 2, \ldots, m$) and $m$ are the model parameters and the model order of the autoregressive model AR($m$) of $c_i(t)$, respectively, and $e_i(t)$ is the remnant of the model, a white noise sequence with zero mean and variance $\sigma_i^2$. Since the parameters $\varphi_{ik}$ reflect the inherent characteristics of a gear vibration system and the variance of the remnant $\sigma_i^2$ is closely related to the output characteristics of the system, $\varphi_{ik}$ and $\sigma_i^2$ can be chosen as the feature vector $A_i = [\varphi_{i1}, \varphi_{i2}, \ldots, \varphi_{im}, \sigma_i^2]$ to identify the condition of the gear system.
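A minimal sketch of estimating the parameters of (15) by least squares is given below, in plain Python. It follows the sign convention of (15), in which the lagged terms sit on the left-hand side, so each regressor is the negated lagged sample. The helper names and the noise-free test series are hypothetical and serve only to show that known parameters are recovered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_ar_least_squares(series, order):
    """Return (phi, residual_variance) for c(t) + sum_k phi_k c(t-k) = e(t)."""
    # Regressors are -c(t-k), so that c(t) = sum_k phi_k * (-c(t-k)) + e(t).
    rows = [[-series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = series[order:]
    # Normal equations (X^T X) phi = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
           for i in range(order)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    phi = solve_linear_system(xtx, xty)
    residuals = [y - sum(p * v for p, v in zip(phi, r))
                 for r, y in zip(rows, targets)]
    variance = sum(e * e for e in residuals) / len(residuals)
    return phi, variance

# Hypothetical noise-free AR(2) series: c(t) - 1.5 c(t-1) + 0.7 c(t-2) = 0,
# that is, phi_1 = -1.5 and phi_2 = 0.7 in the convention of (15).
series = [1.0, 0.5]
for t in range(2, 60):
    series.append(1.5 * series[t - 1] - 0.7 * series[t - 2])

phi, variance = fit_ar_least_squares(series, order=2)
feature_vector = phi + [variance]   # A_i = [phi_i1, ..., phi_im, sigma_i^2]
print(feature_vector)
```

For the short model orders used here the normal equations are adequate; for higher orders or ill-conditioned data, a QR-based least squares solver would be a more robust choice.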

The flow chart of the diagnosis method proposed in this paper is illustrated in Figure 4.

Figure 4: The flow chart of the proposed method. (Start → Input original signal x(t) → IMF components c1, c2, ..., cn are obtained after applying EMD to x(t) → AR model is created for each IMF component ci(t) → Extract feature vectors Ai → SVM classifier → Identify the condition of the gears → End.)

The fault diagnosis approach for gears based on the IMF AR model and SVM proceeds as follows.

(1) Sample the signal N times at a certain sampling frequency fs, both when the gear is normal and when the gear has a crack fault. The 2N signals are taken as samples and divided into two subsets, the training samples and the test samples.

(2) Decompose each signal by EMD. Different signals yield different numbers of IMFs, denoted by $n_1, n_2, \ldots, n_{2N}$; let $n = \max(n_1, n_2, \ldots, n_{2N})$. If the number $n_k$ ($k = 1, 2, \ldots, 2N$) of IMF components of a sample is less than $n$, the sample is padded with zeros to $n$ components $c_1(t), c_2(t), \ldots, c_n(t)$, that is, $c_i(t) = \{0\}$ for $i = n_k + 1, n_k + 2, \ldots, n$.

(3) To eliminate the effect of the signal amplitude on the variance of the remnant $\sigma_i^2$, normalize each IMF component to obtain a new component:

$$c_i(t) = \frac{c_i(t)}{\sqrt{\int_{-\infty}^{\infty} c_i^2(t)\,dt}}, \tag{16}$$

where the $c_i(t)$ on the left-hand side denotes the normalized component.

(4) Establish the AR model for each normalized component, determine the order $m$ of the model, and estimate the autoregressive parameters $\varphi_{ik}$ ($k = 1, 2, \ldots, m$) and the remnant variance $\sigma_i^2$, where $\varphi_{ik}$ denotes the $k$th autoregressive parameter of the $i$th IMF component. The feature vector used as the input vector of the SVMs is therefore $A_i = [\varphi_{i1}, \varphi_{i2}, \ldots, \varphi_{im}, \sigma_i^2]$.

(5) Separate the training set into two classes, y = +1 and y = −1, which represent the two working conditions of the gears, namely, the normal gear and the gear with a crack fault. The decision function f(x) is determined only by the support vectors, so once the support vectors are obtained, the feature vectors of the test samples can be input into the trained SVM classifier, and the working condition is classified by the output of the SVM classifier.
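Steps (2) and (3) above can be sketched as two small helpers. The discrete form of (16) used here replaces the integral by a plain sum over samples, which is an assumption (the sampling interval is dropped as a constant factor); the function names are hypothetical.

```python
import math

def pad_imfs(imfs, n_components, length):
    """Step (2): pad a sample's IMF list with all-zero components up to n."""
    padded = [list(c) for c in imfs]
    padded += [[0.0] * length for _ in range(n_components - len(imfs))]
    return padded

def normalize_energy(component):
    """Step (3), discrete analogue of (16): divide by the square root of the energy."""
    energy = math.sqrt(sum(v * v for v in component))
    if energy == 0.0:          # zero-padded components are left unchanged
        return list(component)
    return [v / energy for v in component]

# Toy sample with two IMFs of length 2, padded to n = 3 components.
sample_imfs = [[3.0, 4.0], [1.0, -1.0]]
padded = pad_imfs(sample_imfs, n_components=3, length=2)
normalized = [normalize_energy(c) for c in padded]
print(normalized)
```

After normalization every nonzero component has unit energy, so the remnant variance estimated in step (4) no longer depends on the overall amplitude of the recorded signal.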


Table 1: The identification results based on IMF AR model and SVM.

Condition of the signal | IMF | ϕi1 | ϕi2 | ϕi3 | σi² | Distance (6 training samples) | Distance (3 training samples) | Result
Normal | c1 | 0.4488 | 0.2870 | 0.2498 | 2.1331 | 1.4313 | 0.9421 | +1
 | c2 | −0.7683 | 1.5523 | −1.0823 | 0.9972 | | |
 | c3 | −2.1518 | 2.6944 | −2.0254 | 0.2134 | | |
Normal | c1 | 0.3980 | 0.1908 | 0.2330 | 1.7583 | 1.3609 | 1.0774 | +1
 | c2 | −1.0207 | 1.8408 | −1.6746 | 0.7681 | | |
 | c3 | −2.1360 | 2.7934 | −2.2215 | 0.1856 | | |
Normal | c1 | 0.5110 | 0.2482 | 0.2179 | 2.0377 | 1.7666 | 1.4178 | +1
 | c2 | −0.7941 | 1.5924 | −1.1135 | 0.9576 | | |
 | c3 | −2.0363 | 2.4411 | −1.5479 | 0.2315 | | |
Crack fault | c1 | 0.0545 | 6.7798 | 0.1888 | 1.2081 | −1.7755 | −1.5707 | −1
 | c2 | −1.7086 | 2.0489 | −1.3569 | 0.4271 | | |
 | c3 | −2.8216 | 3.9288 | −3.2710 | 0.0439 | | |
Crack fault | c1 | 0.0072 | 0.7102 | 0.2035 | 1.0662 | −1.2758 | −1.0311 | −1
 | c2 | −1.7070 | 2.0933 | −1.5511 | 0.3248 | | |
 | c3 | −2.8072 | 3.7685 | −2.9271 | 0.0321 | | |
Crack fault | c1 | 0.1515 | 0.5989 | 0.0622 | 1.5854 | −1.5496 | −1.5219 | −1
 | c2 | −1.4817 | 1.8108 | −1.1972 | 0.5092 | | |
 | c3 | −2.8286 | 4.0104 | −3.4727 | 0.0436 | | |

5. APPLICATIONS

An experiment was carried out on a small test rig developed by the Vibration and Test Center of Hunan University. The fault was introduced by laser-cutting a slot in the root of a tooth; the slot is 0.15–0.25 mm wide and 0.1–0.3 mm deep. An acceleration sensor was fixed on the cover of the gearbox, and 30 signals under the two conditions were sampled at a sampling frequency of 1024 Hz. Three randomly chosen samples for each condition were taken as training samples, and the remainder were used as test data.

Each vibration signal under the different conditions is decomposed by the EMD method into a number of IMFs. The analysis results show that the fault information of the gear vibration signals is mainly contained in the first three IMF components, so AR models are established only for the first three IMF components. In this paper, the order $m$ of the model is determined with the FPE (final prediction error) criterion [26], and the autoregressive parameters $\varphi_{ik}$ ($k = 1, 2, \ldots, m$) and the remnant variance $\sigma_i^2$ of the model are computed with the least squares criterion [26]. Because the system condition is mainly determined by the first several autoregressive parameters and the remnant variance, only the first three parameters, that is, $\varphi_{ik}$ ($k = 1, 2, 3$), together with $\sigma_i^2$ are chosen as the feature vector in this paper for convenience.

Define the normal condition as y = +1 and the condition with the crack fault as y = −1, and choose the linear kernel function. Solving (11) gives the parameters of the SVM classifier: α = [0, 0.1699, 0.6091, 0.7790, 0, 0]^T, ‖ω‖ = 1.2482, and b = 2.5942. The identification result of each test sample is then obtained from (12); part of the results are shown in Table 1. The identification results are fully consistent with the actual conditions. To study further the application of SVMs to pattern identification with a smaller number of samples, the number of training samples is decreased to three (one normal and the other two with the crack fault), and the calculation procedure is the same as above. The parameters of the SVM classifier then become α = [0.5014, 0.5014, 0]^T, ‖ω‖ = 1.0014, and b = 2.5485. The identification results for the same test samples are also shown in Table 1.
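The reported classifier parameters can be cross-checked with a short calculation. For the hard-margin problem (10), the optimality conditions give $\|\omega\|^2 = \sum_i \alpha_i$ at the solution, so ‖ω‖ should equal the square root of the sum of the multipliers. The sketch below applies this to the two reported α vectors. The ordering of the class labels within each α vector is not given in the text; the label lists used for the constraint check are therefore an assumption.

```python
import math

# Lagrange multipliers reported in the text for the two training runs.
alpha_six = [0.0, 0.1699, 0.6091, 0.7790, 0.0, 0.0]
alpha_three = [0.5014, 0.5014, 0.0]

# Assumed label ordering (not stated in the text): normal samples first.
labels_six = [+1, +1, +1, -1, -1, -1]
labels_three = [+1, -1, -1]

def weight_norm_from_alpha(alpha):
    """Hard-margin SVM at the optimum: ||w||^2 equals the sum of the multipliers."""
    return math.sqrt(sum(alpha))

def equality_constraint(alpha, labels):
    """Left-hand side of the constraint sum(alpha_i * y_i) = 0 in (11)."""
    return sum(a * y for a, y in zip(alpha, labels))

norm_six = weight_norm_from_alpha(alpha_six)
norm_three = weight_norm_from_alpha(alpha_three)
print(round(norm_six, 4), round(norm_three, 4))
print(equality_constraint(alpha_six, labels_six),
      equality_constraint(alpha_three, labels_three))
```

Both computed norms agree with the reported values of 1.2482 and 1.0014 to four decimal places, and under the assumed label ordering both α vectors satisfy the equality constraint in (11).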

Table 1 shows that the SVM classifier still classifies the two conditions of gears accurately after the number of training samples is decreased, which confirms that the SVM classifier can be applied successfully to pattern recognition even when only limited training samples are available. Comparing the distances from the test samples to the optimal separating hyperplane H for the two training set sizes also shows that the distance decreases when the number of training samples becomes smaller, although the gear working states can still be identified by the SVM. This indicates that the overall performance of the classifier is somewhat reduced.

The discussion above concerns how to classify two conditions of gears (normal and crack fault), that is, a two-class problem. For multiple-class problems, that is, identifying gears with multiple classes of faults (e.g., crack, broken teeth), a generalizing method can be introduced to decompose the multiple-class problem into two-class problems, which can then be trained with SVM. In other words, each time one group of the training samples is taken as one class and the rest, which do not belong to that group, are taken as the other class. Hence, for a k-class (k ≥ 3) problem, the classification of the input space can be achieved by k decision functions based on SVM.


Table 2: The identification results based on IMF AR model and SVMs.

Condition of the signal | SVM1 | SVM2 | SVM3 | Identification result
Normal | +1 | | | Normal
Normal | +1 | | | Normal
Crack fault | −1 | +1 | | Crack fault
Crack fault | −1 | +1 | | Crack fault
Broken teeth | −1 | −1 | +1 | Broken teeth
Broken teeth | −1 | −1 | +1 | Broken teeth

Three SVM classifiers need to be designed if three classes of gear working conditions are to be identified: normal, with a crack fault, and with a broken-teeth fault. First, SVM1 identifies whether the gear has a fault, with y = +1 representing the normal condition and y = −1 representing the fault conditions. Second, SVM2 identifies whether the gear has a crack fault, with y = +1 representing the crack fault and y = −1 representing other faults. Finally, SVM3 identifies whether the gear has a broken-teeth fault, with y = +1 representing the broken-teeth fault and y = −1 representing other faults. The identification approach is the same as above: nine samples are extracted at random as training samples (three with the normal condition, three with the crack fault, and three with the broken-teeth fault), and the parameters of the SVM classifiers are then calculated. Part of the identification results are shown in Table 2, from which it can be seen that the three SVM classifiers identify the working conditions and fault patterns of gears accurately.
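The three-classifier scheme described above, and the pattern of outputs in Table 2, can be sketched as a simple cascade in which each classifier is consulted only if the previous one returned −1. The three decision functions below are hypothetical stand-ins that threshold a single toy feature; in the proposed method each would be a trained SVM decision function of the form (14).

```python
def classify_gear(features, svm1, svm2, svm3):
    """Consult SVM1, SVM2, SVM3 in turn; stop at the first +1 output."""
    if svm1(features) > 0:
        return "normal"
    if svm2(features) > 0:
        return "crack fault"
    if svm3(features) > 0:
        return "broken teeth"
    return "unknown"

# Hypothetical stand-in classifiers acting on a one-element feature vector.
def svm1(features):          # +1: normal, -1: some fault
    return +1 if features[0] < 1.0 else -1

def svm2(features):          # +1: crack fault, -1: other faults
    return +1 if features[0] < 2.0 else -1

def svm3(features):          # +1: broken teeth, -1: other faults
    return +1 if features[0] < 3.0 else -1

for sample in ([0.5], [1.5], [2.5]):
    print(sample, classify_gear(sample, svm1, svm2, svm3))
```

The "unknown" branch is an addition of this sketch for samples rejected by all three classifiers; the text does not discuss that case.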

6. CONCLUSIONS

The AR model is an information container that holds the characteristics of gear vibration systems, from which the fault features of a gear vibration signal can be extracted. Most importantly, once the AR model of the vibration signals is established, the gear working states can be identified from the parameters of the AR model without constructing a mathematical model or studying the fault mechanism. However, the AR model can only be applied to stationary signals, whereas gear fault vibration signals typically display nonstationary behavior. To address this problem, in this paper the gear fault vibration signals are preprocessed with the EMD method before the AR model is established; EMD decomposes a signal, in terms of its intrinsic information, into a number of IMFs. EMD is, in essence, a process that linearizes the original signal and renders it stationary, so an AR model can be established for each of the IMF components.

The limitations of conventional statistical pattern recognition methods and ANN classifiers are also addressed. The support vector machine, which generalizes better than ANNs and handles the learning problem with a small number of samples quite well, has been introduced into the pattern recognition.

The analysis of three kinds of gear vibration signals, one from a normal gear and the other two from gears with crack and broken-tooth faults, respectively, has shown that the gear fault diagnosis approach based on the IMF AR model and SVM can classify gear working conditions and fault patterns effectively and accurately even with a small number of samples, which offers a new approach to the fault diagnosis of gears. However, because determining the parameters of the SVM classifier and the AR model takes considerable time, the proposed method cannot be applied in real time. In addition, SVM theory is still being refined; for example, the problem of selecting kernel functions for different conditions still needs further research.

ACKNOWLEDGMENT

The support for this research under Chinese National Science Foundation Grant no. 50775068 is gratefully acknowledged.

REFERENCES

[1] W. Q. Wang, F. Ismail, and M. F. Golnaraghi, “Assessment of gear damage monitoring techniques using vibration measurements,” Mechanical Systems and Signal Processing, vol. 15, no. 5, pp. 905–922, 2001.

[2] D. Brie, M. Tomczak, H. Oehlmann, and A. Richard, “Gear crack detection by adaptive amplitude and phase demodulation,” Mechanical Systems and Signal Processing, vol. 11, no. 1, pp. 149–167, 1997.

[3] W. J. Staszewski, K. Worden, and G. R. Tomlinson, “Time-frequency analysis in gearbox fault detection using the Wigner-Ville distribution and pattern recognition,” Mechanical Systems and Signal Processing, vol. 11, no. 5, pp. 673–692, 1997.

[4] N. Baydar and A. Ball, “A comparative study of acoustic and vibration signals in detection of gear failures using Wigner-Ville distribution,” Mechanical Systems and Signal Processing, vol. 15, no. 6, pp. 1091–1107, 2001.

[5] H. Oehlmann, D. Brie, M. Tomczak, and A. Richard, “A method for analysing gearbox faults using time-frequency representations,” Mechanical Systems and Signal Processing, vol. 11, no. 4, pp. 529–545, 1997.

[6] J. Lin and M. J. Zuo, “Gearbox fault diagnosis using adaptive wavelet filter,” Mechanical Systems and Signal Processing, vol. 17, no. 6, pp. 1259–1269, 2003.

[7] G. Meltzer and N. P. Dien, “Fault diagnosis in gears operating under non-stationary rotational speed using polar wavelet amplitude maps,” Mechanical Systems and Signal Processing, vol. 18, no. 5, pp. 985–992, 2004.

[8] P. W. Tse, W.-X. Yang, and H. Y. Tam, “Machine fault diagnosis through an effective exact wavelet analysis,” Journal of Sound and Vibration, vol. 277, no. 4-5, pp. 1005–1024, 2004.

[9] D. Hong, W. Ya, and Y. Shuzi, “Fault diagnosis by time series analysis,” in Applied Time Series Analysis, World Scientific, River Edge, NJ, USA, 1989.

[10] W. Ya and Y. Shuzi, “Application of several time series models in prediction,” in Applied Time Series Analysis, World Scientific, River Edge, NJ, USA, 1989.

[11] Y. Grenier, “Time-dependent ARMA modeling of nonstationary signals,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 4, pp. 899–911, 1983.

[12] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[14] A. M. Bassiuny and X. Li, “Flute breakage detection during end milling using Hilbert-Huang transform and smoothed nonlinear energy operator,” International Journal of Machine Tools and Manufacture, vol. 47, no. 6, pp. 1011–1020, 2007.

[15] A. M. Bassiuny, X. Li, and R. Du, “Fault diagnosis of stamping process based on empirical mode decomposition and learning vector quantization,” International Journal of Machine Tools and Manufacture, vol. 47, no. 15, pp. 2298–2306, 2007.

[16] L. Wuxing, P. W. Tse, Z. Guicai, and S. Tielin, “Classification of gear faults using cumulants and the radial basis function network,” Mechanical Systems and Signal Processing, vol. 18, no. 2, pp. 381–389, 2004.

[17] B. Samanta, “Artificial neural networks and genetic algorithms for gear fault detection,” Mechanical Systems and Signal Processing, vol. 18, no. 5, pp. 1273–1282, 2004.

[18] X. Li, S. K. Tso, and J. Wang, “Real-time tool condition monitoring using wavelet transforms and fuzzy techniques,” IEEE Transactions on Systems, Man and Cybernetics C, vol. 30, no. 3, pp. 352–357, 2000.

[19] M. Zacksenhouse, S. Braun, M. Feldman, and M. Sidahmed, “Toward helicopter gearbox diagnostics from a small number of examples,” Mechanical Systems and Signal Processing, vol. 14, no. 4, pp. 523–543, 2000.

[20] H.-C. Kim, S. Pang, H.-M. Je, D. Kim, and S. Y. Bang, “Constructing support vector machine ensemble,” Pattern Recognition, vol. 36, no. 12, pp. 2757–2767, 2003.

[21] G. Guo, S. Z. Li, and K. L. Chan, “Support vector machines for face recognition,” Image and Vision Computing, vol. 19, no. 9-10, pp. 631–638, 2001.

[22] O. Barzilay and V. L. Brailovsky, “On domain knowledge and feature selection using a support vector machine,” Pattern Recognition Letters, vol. 20, no. 5, pp. 475–484, 1999.

[23] U. Thissen, R. van Brakel, A. P. de Weijer, W. J. Melssen, and L. M. C. Buydens, “Using support vector machines for time series prediction,” Chemometrics and Intelligent Laboratory Systems, vol. 69, no. 1-2, pp. 35–49, 2003.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.

[26] G. C. Goodwin and R. L. Payne, Dynamic System Identification, Experiment Design and Data Analysis, Academic Press, New York, NY, USA, 1977.


Research Article
Univariate and Bivariate Empirical Mode Decomposition for Postural Stability Analysis

Hassan Amoud, Hichem Snoussi, David Hewson, and Jacques Duchene

Charles Delaunay Institute, FRE CNRS 2848, University of Technology of Troyes, 10000 Troyes, France

Correspondence should be addressed to Hassan Amoud, [email protected]

Received 22 October 2007; Accepted 7 February 2008

Recommended by Kenneth Barner

The aim of this paper was to compare empirical mode decomposition (EMD) and two new extended methods of EMD named complex empirical mode decomposition (complex-EMD) and bivariate empirical mode decomposition (bivariate-EMD). All methods were used to analyze stabilogram center of pressure (COP) time series. The two new methods are suitable for complex time series, from which they extract complex intrinsic mode functions (IMFs) before the Hilbert transform is applied to the IMFs. The trace of the analytic IMF in the complex plane has a circular form, with each IMF having its own rotation frequency. The area of the circle and the average rotation frequency of the IMFs are efficient indicators of the postural stability status of subjects. Experimental results show the effectiveness of these indicators in identifying differences in standing posture between groups.

Copyright © 2008 Hassan Amoud et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Falls are a major problem for the elderly due to the resulting loss of autonomy and subsequent behavior modification related to fear of falling. Health care services worldwide are actively working on the issue of falls for both medical and social reasons. For instance, in France alone, the number of deaths attributed annually to falls is estimated to be more than 9000, with a resultant cost in excess of two billion euros [1]. The cost of falls is only going to increase, in line with the increase in the elderly population. In Europe, the percentage of adults over 65 years old will almost double from 15% to 30% by the year 2050 [2]. Falling is a consequence of a failure in the postural control system due to aging or a specific pathology. Many risk factors have been identified for falls, the most commonly cited including an underlying muscular weakness, a previous fall, as well as balance and gait problems [3].

Postural equilibrium is maintained by reacting to information from the different sensory systems, including vestibular, visual, and proprioception systems. It is possible to evaluate the postural control system using either clinical or biomechanical tests. Biomechanical tests can be dynamic or static, with dynamic tests used to characterize the performance of the postural control system to maintain the posture after an external perturbation. In contrast, static tests evaluate postural performance in a static position. In static posture, a force plate is used to evaluate postural sway. The force plate measures the displacement of the center of pressure (COP), which represents the location of the resultant force exerted on the surface of a force plate. The COP, which can be used as a measure of postural stability, is measured in the horizontal plane in both anteroposterior (AP) and mediolateral (ML) directions [4]. The representation of the COP time series in AP and ML directions is known as the stabilogram (see Figure 1).

Traditional parameters are extracted from the stabilogram signals under the assumption that the COP is a stationary time series [4, 5]. These classical parameters give few insights into the control of posture [6], providing purely statistical information while ignoring the dynamic characteristics of COP displacements. These parameters include temporal (mean, RMS), spatiotemporal (area of the ellipse), and spectral (median frequency) parameters.

Recently, parameters that describe the fractal and time evolutionary properties of the COP and provide information related to underlying physiological control processes have been extracted using nonlinear and fractal analyses. These new groups of parameters include the Hurst exponent, which provides information about long-term correlation [7], the reconstructed phase space [8], and entropy [9].

Figure 1: Displacement of the center of pressure in the horizontal plane (bottom left), displacement in the anteroposterior direction over time (top right), and displacement in the mediolateral direction over time (bottom right). Data are recorded from a healthy control subject for 10 seconds. [Panel (a): anteroposterior displacement (mm) against mediolateral displacement (mm). Panel (b): ML (mm) and AP (mm) against time (s).]

Stabilogram signals have been shown to be nonstationary signals [10, 11]. In order to extract information from the stabilogram and to characterize the postural stability, it is proposed in this paper to apply the Hilbert transform on the intrinsic mode functions (IMFs) extracted by the empirical mode decomposition (EMD) [12]. The EMD method decomposes the signal into a set of IMFs which represent the oscillatory modes embedded in the signal. For the COP signal, IMFs have well-defined instantaneous frequency and different IMFs do not exhibit the same frequency at the same time. In addition, the COP signal is a bidirectional signal and can be represented by a complex signal (ML + jAP). The two components of the COP signal are nonlinearly correlated due to the complex correlation between all the postural control systems. To analyze the complex COP signal, two new methods derived from the EMD method to decompose a complex signal will be applied. The first method, complex empirical mode decomposition (complex-EMD), was developed in [13], while the second method, bivariate empirical mode decomposition (bivariate-EMD), was developed in [14, 15].

The complex-EMD method is based on the inherent relationship between the positive and negative frequency components of a complex signal. This method treats the positive and negative frequency components as two independent signals. The EMD of these two components gives two sets of IMFs corresponding to the positive and negative frequency components of a complex signal. The bivariate-EMD method is based on the idea of replacing the oscillation notion in EMD by a notion of rotation. This method considers that the bivariate signal can be described as the sum of a fast rotation component superimposed on a slower rotation. The bivariate-EMD algorithm consists in projecting the complex signal on a set of directions and then applying the sifting process of the basic-EMD on the projected components.

The EMD, complex-EMD, and bivariate-EMD methods are powerful signal processing techniques that can be used to analyze the stabilogram and to characterize the quality of equilibrium in standing posture. In fact, the plots of the analytic IMFs in the complex plane have a specific geometry similar to a circular form, and each IMF has its own rotation frequency. In contrast, the analytic signal of the entire stabilogram signal does not have a specific form, and it differs between experiments for the same subject due to its inherent nonstationary nature. Furthermore, the stabilogram is a multicomponent signal, thus its instantaneous frequency cannot be calculated directly. These two features (circular form and rotation frequency) permit the extraction of new nonlinear parameters that characterize the quality of equilibrium and provide more information about the mechanisms underlying postural control. The first parameter is simply the area of the circle formed by the shape of the analytic IMF, while the second parameter is the average rotation frequency.

In order to test the capacity of different EMD methods and the Hilbert transform to describe the quality of equilibrium and to discriminate between elderly and control groups, the three EMD methods were applied on stabilogram signals extracted from elderly and control groups. The extracted nonlinear parameters from the plots of the analytic IMFs were the area of the circle in the complex plane, and the average rotation frequency of each IMF. The paper is organized as follows. The experimental protocol, EMD, complex-EMD, bivariate-EMD, and the parameter calculation are presented in Section 2. The results of the application of the cited methods on the stabilogram are presented in Section 3. Discussions about the results and the EMD methods are presented in Section 4.

Figure 2: EMD decomposition of the 10-second anteroposterior displacement time series in Figure 1. [Panels show the AP signal, IMF1 to IMF5, and the residue against time (s).]

2. METHODS

2.1. Subjects

Ten healthy control subjects (three males and seven females) and ten healthy elderly subjects (four males and six females) participated in the study. Control subjects’ mean age, height, and weight were 33.3 ± 7.4 y, 168.0 ± 6.5 cm, and 65.7 ± 17.6 kg, respectively. Elderly subjects’ mean age, height, and weight were 80.5 ± 4.7 y, 165.6 ± 7.0 cm, and 71.9 ± 9.9 kg, respectively. All subjects who participated gave their written informed consent. No subjects reported any musculoskeletal or neurological conditions that precluded their participation in the study.

2.2. Data acquisition and data processing

Center of pressure data were obtained from a Bertec 4060–08 force plate (Bertec Corporation, Columbus, Ohio, USA). The initial COP signals were calculated with respect to the center of the force plate before normalization by subtraction of the mean value. Data were recorded using ProTags, which was developed in LabVIEW (National Instruments Corporation, Austin, Tex, USA). Data were sampled at 100 Hz and lowpass filtered using an 8th-order Butterworth filter with a cutoff frequency of 10 Hz. All subsequent calculations were performed using MATLAB (Mathworks Inc, Natick, Mass, USA).

2.3. Experimental protocol

Subjects were tested barefoot or wearing socks. Testing began with subjects standing upright with their arms by their sides in front of the force plate while looking at a 10-cm cross fixed on the wall two meters in front of them. Upon verbal instruction, subjects stepped onto the force plate. Subjects were not required to use a preordained foot position. Data recording lasted 15 seconds, during which time subjects maintained an upright posture. A second verbal command was given for subjects to step down from the force plate.

2.4. Empirical mode decomposition

EMD is a signal processing decomposition technique that decomposes the signal into waveforms modulated in both amplitude and frequency by extracting all of the oscillatory modes embedded in the signal [12]. The decomposition is an intuitive and adaptive signal-dependent decomposition and does not require any conditions about the stationarity and linearity of the signal. The waveforms extracted by EMD are named IMFs. Each IMF is symmetric, is assumed to yield a meaningful local frequency, and different IMFs do not exhibit the same frequency at the same time. In other words, each IMF satisfies the two following constraints:

(i) the number of extrema and the number of zero crossings are identical or differ at most by one;

(ii) the mean value between the upper and the lower envelope is equal to zero at any time.
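Constraint (i) can be checked directly on a sampled signal. The following Python sketch (NumPy is used here purely for illustration; the helper names `count_extrema`, `count_zero_crossings`, and `satisfies_condition_i` are hypothetical and not part of the original study) counts interior extrema and sign changes and compares the two counts. Constraint (ii) is not tested, since it additionally requires the envelope construction of the sifting process.

```python
import numpy as np

def count_extrema(x):
    """Number of interior local maxima plus local minima of a sampled signal."""
    i = np.arange(1, len(x) - 1)
    maxima = (x[i] > x[i - 1]) & (x[i] >= x[i + 1])
    minima = (x[i] < x[i - 1]) & (x[i] <= x[i + 1])
    return int(maxima.sum() + minima.sum())

def count_zero_crossings(x):
    """Number of sign changes between consecutive samples."""
    s = np.signbit(x)
    return int(np.sum(s[:-1] != s[1:]))

def satisfies_condition_i(x):
    """Constraint (i): extrema and zero-crossing counts differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

t = np.arange(1000) / 1000.0
tone = np.sin(2 * np.pi * 5 * t + 0.3)     # zero-mean oscillation
print(satisfies_condition_i(tone))          # oscillation around zero
print(satisfies_condition_i(tone + 2.0))    # the offset removes every zero crossing
```

A constant offset is enough to violate the constraint, which is why the local mean is removed during sifting.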

The difference between the original signal and the IMF time series is the residual. The first IMF component is obtained by a sifting process. This procedure is then applied on the residual in order to extract the second IMF, and so forth. Thus all the IMFs are iteratively extracted. The nonstationary signal x(t) is then represented as a linear sum of IMFs and the residual component:

$$x(t) = \sum_{k=1}^{K} d_k(t) + r_K(t), \tag{1}$$

where $d_k(t)$ denotes the kth extracted empirical mode and $r_K(t)$ the residual.

The EMD algorithm can be summarized as follows.

(1) Extract all the extrema of x(t).

(2) Interpolate between minima (resp., maxima) to obtain two envelopes $e_{\min}(t)$ and $e_{\max}(t)$.

(3) Compute the average: $m(t) = (e_{\min}(t) + e_{\max}(t))/2$.

(4) Extract the detail $d(t) = x(t) - m(t)$.

(5) Test if d(t) is an IMF:

(i) if yes, repeat the procedure from step 1 on the residual signal $r(t) = x(t) - d(t)$,

(ii) if not, replace x(t) with d(t) and repeat the procedure from step 1.


−40 −30 −20 −10 0 10 20 30 40

40

30

20

10

0

−10

−20

−30

−40

(a)

−15 −10 −5 0 5 10 15

15

10

5

0

−5

−10

−15

(b)

−15 −10 −5 0 5 10 15

15

10

5

0

−5

−10

−15

(c)

−15 −10 −5 0 5 10 15

15

10

5

0

−5

−10

−15

(d)

−15 −10 −5 0 5 10 15

15

10

5

0

−5

−10

−15

(e)

Figure 3: Representation in the complex plane of the analytic signal of a healthy control subject (black lines) and healthy elderly subject(grey lines) for the AP direction (a). Analytic signals of the first four IMFs (b): IMF1, (c): IMF2, (d): IMF3, and (e): IMF4.

Matlab codes are available at http://perso.ens-lyon.fr/patrick.flandrin/emd.html.
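For readers who prefer a compact illustration of the five steps of the EMD algorithm, a minimal Python sketch is given here. It is not the MATLAB code referenced above and makes several simplifying assumptions that are not in the original text: envelopes are cubic splines (SciPy `CubicSpline`), the two signal endpoints are added as knots of both envelopes as a crude boundary treatment, and the IMF test of step 5 is replaced by a fixed number of sifting iterations (`n_sift`). The function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _extrema(x):
    """Step 1: indices of interior local maxima and minima."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] >= x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] <= x[i + 1])]
    return maxima, minima

def sift_once(t, x):
    """Steps 2-4: return d(t) = x(t) - m(t), or None if x has too few extrema."""
    maxima, minima = _extrema(x)
    if len(maxima) + len(minima) < 3:
        return None
    ends = [0, len(x) - 1]
    # Assumption: endpoints are used as extra knots to avoid spline extrapolation.
    k_max = np.concatenate(([ends[0]], maxima, [ends[1]]))
    k_min = np.concatenate(([ends[0]], minima, [ends[1]]))
    e_max = CubicSpline(t[k_max], x[k_max])(t)
    e_min = CubicSpline(t[k_min], x[k_min])(t)
    return x - 0.5 * (e_min + e_max)

def emd(t, x, max_imfs=8, n_sift=10):
    """Return (imfs, residual) such that x = sum of imfs + residual."""
    imfs = []
    residual = np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        d = sift_once(t, residual)
        if d is None:                 # residual is (nearly) monotonic: stop
            break
        for _ in range(n_sift - 1):   # assumption: fixed count instead of an IMF test
            nxt = sift_once(t, d)
            if nxt is None:
                break
            d = nxt
        imfs.append(d)
        residual = residual - d       # step 5(i): continue on the residual
    return np.array(imfs), residual

t = np.arange(1000) / 100.0           # 10 s sampled at 100 Hz
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.7 * t)
imfs, residual = emd(t, x)
print(len(imfs), np.allclose(imfs.sum(axis=0) + residual, x))
```

By construction the IMFs and the residual sum back to the input, as in (1); the quality of the individual modes depends on the boundary treatment and stopping rule, which production implementations handle more carefully.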

An example of the application of EMD on the 10-second AP time series traced in Figure 1 is shown in Figure 2. The number of IMFs varied between experiments and between subjects. The minimum number of IMFs was 4.

2.5. Hilbert transformation and phase estimation

The Hilbert transform of a real signal x(t) is defined as

$$y(t) = \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{+\infty} \frac{x(\tau)}{t-\tau}\,d\tau, \tag{2}$$


where p.v. indicates the Cauchy principal value. The analytic signal of x(t) is then defined as

$$z(t) = x(t) + j\,y(t). \tag{3}$$

The analytic signal can be further expressed as

$$z(t) = a(t)\exp[j\theta(t)], \tag{4}$$

where $a(t) = \sqrt{x^2(t) + y^2(t)}$ is the amplitude of z(t), and $\theta(t) = \arctan\bigl(y(t)/x(t)\bigr)$ is the instantaneous phase from which the instantaneous signal frequency f(t) is obtained by differentiation:

$$f(t) = \frac{1}{2\pi}\,\frac{\partial\theta(t)}{\partial t}. \tag{5}$$

The original signal x(t) is now expressed in a Fourier-like expansion as

$$x(t) = \operatorname{Re}\left\{\sum_{k=1}^{K+1} a_k(t)\exp\left[j\int 2\pi f_k(t)\,dt\right]\right\} \tag{6}$$

in which the residual $r_K(t)$ is included and where the index k refers to each IMF and Re{·} denotes the real part of a complex quantity. Huang et al. [12] proposed that the Hilbert transform should be applied on all IMFs obtained by EMD. This transform is known as the Hilbert-Huang transform (HHT). Indeed, the IMFs are locally symmetric functions and therefore the instantaneous frequency is well localized in the time-frequency domain.
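Equations (3) to (5) can be illustrated numerically. The sketch below is a minimal Python example under stated assumptions, not the processing chain of the study: the analytic signal is built with the standard FFT construction (negative frequencies set to zero, positive frequencies doubled), the phase is obtained with a four-quadrant arctangent and unwrapped, and the derivative in (5) is approximated by finite differences. The function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """z = x + j*H[x] for a real sampled signal, computed through the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0                        # keep DC once
    if n % 2 == 0:
        h[n // 2] = 1.0               # keep the Nyquist bin once
        h[1:n // 2] = 2.0             # double the positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_frequency(z, fs):
    """f(t) = (1 / 2 pi) d(theta)/dt, with theta the unwrapped phase of z."""
    theta = np.unwrap(np.angle(z))
    return np.gradient(theta) * fs / (2 * np.pi)

fs = 100.0                            # sampling rate used in the study (Hz)
t = np.arange(200) / fs
x = np.cos(2 * np.pi * 5 * t)         # synthetic 5 Hz tone, not COP data
z = analytic_signal(x)
a = np.abs(z)                         # amplitude a(t) of equation (4)
f = instantaneous_frequency(z, fs)
print(a.mean(), f.mean())
```

For a single tone the trace of z(t) in the complex plane is a circle of radius a, which is the geometry exploited for the IMFs in the remainder of the paper.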

An example of the HHT (with scalar EMD) applied on two AP time series is presented in Figure 3. The first AP time series was recorded from a healthy control subject (black lines), while the second time series was recorded from a healthy elderly subject (grey lines). This figure presents the trace of the entire analytic signal in the complex plane (Figure 3(a)), as well as those of each analytic IMF extracted with the basic-EMD (Figures 3(b)–3(e)). It can be seen that the forms of these traces are similar to circles. A discussion about this figure is presented in the discussion section.

In Figure 4, the phase time series of each IMF and the average rotation frequencies of a control subject are presented. The average rotation frequency is equal to the slope estimate of the phase time series divided by 2π. It can be observed that the average rotation frequency decreases from the first IMF to the last IMF simply because the sifting procedure picks the component with the fastest variation embedded in the original signal first and that with the slowest variation last.
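The slope-based definition of the average rotation frequency can be sketched as follows. This Python example assumes that the slope is estimated by an ordinary least-squares line fitted to the unwrapped phase (the text does not state which estimator was used), and the function name `average_rotation_frequency` is hypothetical. The input is any complex analytic signal, here a synthetic rotation rather than a measured IMF.

```python
import numpy as np

def average_rotation_frequency(z, fs):
    """Slope of the unwrapped phase of z (rad/s) divided by 2*pi, in Hz."""
    theta = np.unwrap(np.angle(z))
    t = np.arange(len(z)) / fs
    slope, _intercept = np.polyfit(t, theta, 1)   # assumption: least-squares fit
    return slope / (2 * np.pi)

fs = 100.0
t = np.arange(1000) / fs
z = 2.0 * np.exp(2j * np.pi * 3.0 * t)            # synthetic rotation at 3 Hz
print(average_rotation_frequency(z, fs))
```

Because only the phase enters the estimate, the result does not depend on the radius of the trace.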

2.6. Complex empirical mode decomposition

Figure 4: Traces of the instantaneous phase and the value of the average rotation frequency for the first four IMFs of the AP displacement traced in Figure 2. [Axes: phase (rad) against time (s). Annotated average rotation frequencies: 7.48 (IMF1), 3.41 (IMF2), 1.1 (IMF3), and 0.56 (IMF4).]

The complex-EMD is an extension of the basic-EMD suitable for dealing with complex signals [13]. The motivation to extend EMD is that a large number of signal processing applications have complex signals. In addition, this extension is applied on both the real and imaginary parts simultaneously because complex signals have a mutual dependence between the real and imaginary parts. Thus, if the decomposition is done separately, the mutual dependency will be lost. The algorithm of the complex-EMD is as follows.

(1) Extract the positive and negative frequency components to generate two analytic signals $X_+(e^{j\omega})$ and $X_-(e^{j\omega})$ as follows:

$$\begin{aligned} X_+(e^{j\omega}) &= H(e^{j\omega})\,X(e^{j\omega}),\\ X_-(e^{j\omega}) &= H(e^{j\omega})\,X^{*}(e^{-j\omega}), \end{aligned} \tag{7}$$

where $H(e^{j\omega})$ is an ideal bandpass filter equal to one for $0 \le \omega < \pi$ and zero for $-\pi \le \omega < 0$, and $X^{*}(e^{j\omega})$ is the complex conjugate of $X(e^{j\omega})$.

(2) Extract the real part of the inverse Fourier transform of $X_+(e^{j\omega})$ and $X_-(e^{j\omega})$, denoted by $x_+(t)$ and $x_-(t)$.

(3) Apply the basic-EMD on $x_+(t)$ and $x_-(t)$ separately to extract the IMFs of the positive and negative components, denoted by $\{x_i(t)\}_{i=1}^{N_+}$ and $\{x_i(t)\}_{i=-N_-}^{-1}$.

Finally, $x_+(t)$ and $x_-(t)$ can be expressed as

$$\begin{aligned} x_+(t) &= \sum_{i=1}^{N_+} x_i(t) + r_+(t),\\ x_-(t) &= \sum_{i=-N_-}^{-1} x_i(t) + r_-(t), \end{aligned} \tag{8}$$

where $r_+(t)$ and $r_-(t)$ are the residuals of $x_+(t)$ and $x_-(t)$, respectively.

The complex-EMD can now be expressed as

$$x(t) = \sum_{i=-N_-,\; i\neq 0}^{N_+} y_i(t) + r(t), \tag{9}$$

where $y_i(t) = x_i(t) + jH[x_i(t)]$, and $H[\cdot]$ denotes the Hilbert transform operator.
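Steps (1) and (2) of the complex-EMD amount to a frequency-domain split of the complex signal, which can be sketched in a few lines of Python. This is an illustrative reading of equation (7) for a finite sampled signal using the DFT, with a hypothetical function name; the subsequent EMD of each real component (step 3) is not repeated here.

```python
import numpy as np

def split_frequency_components(x):
    """Return the real signals x_plus and x_minus of a complex signal x."""
    n = len(x)
    omega = np.fft.fftfreq(n)                  # normalized frequency of each DFT bin
    h = (omega >= 0).astype(float)             # ideal filter: 1 for 0 <= w < pi, else 0
    X_plus = h * np.fft.fft(x)
    X_minus = h * np.fft.fft(np.conj(x))       # DFT of conj(x) equals X*(e^{-jw})
    x_plus = np.fft.ifft(X_plus).real          # step (2): real part of the inverse FT
    x_minus = np.fft.ifft(X_minus).real
    return x_plus, x_minus

n = 256
t = np.arange(n) / n
# Synthetic test signal: a counterclockwise rotation (5 cycles) plus a
# clockwise rotation (2 cycles) of half the amplitude.
x = np.exp(2j * np.pi * 5 * t) + 0.5 * np.exp(-2j * np.pi * 2 * t)
x_plus, x_minus = split_frequency_components(x)
print(np.allclose(x_plus, np.cos(2 * np.pi * 5 * t)),
      np.allclose(x_minus, 0.5 * np.cos(2 * np.pi * 2 * t)))
```

In this example the counterclockwise rotation ends up in x_plus and the clockwise rotation in x_minus, so each real component can then be passed to the basic-EMD independently.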

2.7. Bivariate empirical mode decomposition

Bivariate-EMD is another extension of the EMD to complex signals. The main difference between the bivariate-EMD and the complex-EMD is that the latter uses the basic-EMD to decompose complex signals, whereas the bivariate-EMD adapts the rationale underlying the EMD to a bivariate framework [14, 15]. The algorithm of the bivariate-EMD, as proposed in [14], is as follows:

(1) For $1 \le m \le M$,

(a) project x(t) on direction $\varphi_m$: $p_{\varphi_m}(t) = \operatorname{Re}\bigl(e^{-j\varphi_m}x(t)\bigr)$,

(b) extract the maxima of $p_{\varphi_m}(t)$: $(t_i^m, p_i^m)$,

(c) interpolate the set of points $(t_i^m, e^{j\varphi_m}p_i^m)$ to obtain the partial envelope curve in direction $\varphi_m$, named $e_{\varphi_m}(t)$.

(2) Compute the mean of all tangents as follows: $e(t) = (2/M)\sum_m e_{\varphi_m}(t)$.

(3) Subtract the mean to obtain $d(t) = x(t) - e(t)$.

(4) Test if d(t) is an IMF:

(a) if yes, repeat the procedure from step 1 on the residual signal,

(b) if not, replace x(t) with d(t) and repeat the procedure from step 1.

The bivariate-EMD can now be expressed as

$$x(t) = \sum_{k} d_k(t) + r(t), \tag{10}$$

where $d_k(t)$ denotes the kth extracted complex empirical mode and r(t) the residual.
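Steps (1) and (2) of the bivariate-EMD, which compute the mean e(t) that is subtracted during sifting, can be sketched in Python as follows. Two assumptions are made that the text does not specify: the directions are taken as uniformly spaced, $\varphi_m = 2\pi m/M$, and the interpolation is a cubic spline (SciPy `CubicSpline`) without any special boundary treatment. Because $e^{j\varphi_m}$ is constant for a given direction, interpolating the points $(t_i^m, e^{j\varphi_m}p_i^m)$ is equivalent to interpolating the real values $p_i^m$ and multiplying the result by $e^{j\varphi_m}$, which is what the code does. Function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def local_maxima(p):
    """Indices of interior local maxima of a real sampled signal."""
    i = np.arange(1, len(p) - 1)
    return i[(p[i] > p[i - 1]) & (p[i] >= p[i + 1])]

def bivariate_mean(t, x, n_dirs=4):
    """Mean e(t) = (2/M) * sum over directions of the partial envelopes."""
    mean = np.zeros(len(x), dtype=complex)
    for m in range(1, n_dirs + 1):
        phi = 2 * np.pi * m / n_dirs              # assumption: uniform directions
        p = np.real(np.exp(-1j * phi) * x)        # step (a): projection
        idx = local_maxima(p)                     # step (b): maxima of the projection
        env = CubicSpline(t[idx], p[idx])(t)      # step (c): interpolate p_i^m
        mean += np.exp(1j * phi) * env            # rotate back into direction phi
    return 2.0 / n_dirs * mean

t = np.arange(200) / 100.0
x = np.exp(2j * np.pi * 5 * t)                    # pure rotation at 5 Hz
e = bivariate_mean(t, x)
print(np.max(np.abs(e)))                          # close to zero for a pure rotation
```

For a pure rotation the partial envelopes cancel, so the mean vanishes and the signal is left unchanged by the subtraction in step (3); this is the bivariate counterpart of a zero-mean oscillation in the basic-EMD.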

2.8. Parameter calculation

Two types of parameters were calculated from the decomposition of COP signals into IMFs using the three methods of decomposition, EMD, complex-EMD, and bivariate-EMD. The first parameter is related to the form of the analytic IMFs in the complex plane, which can be seen to be circular. The parameter extracted is the area of the circle in which 95% of the data points are located (AreaCIMF). This parameter was calculated for each of the first four IMFs. The second parameter, which is related to the phase of the analytic IMFs, is the average rotation frequency (FIMF) of the analytic IMFs in the complex plane. These parameters were calculated for both AP and ML directions for the basic-EMD method and for ML + jAP for the complex-EMD and bivariate-EMD methods. For each COP signal, a total of 24 areas and 24 frequencies were calculated using the three different methods. For the bivariate-EMD, we have used M = 4 projections.
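One possible way to compute the area parameter is sketched below in Python. The text does not state how the circle is centered or how the 95% criterion is evaluated, so the sketch assumes a circle centered on the mean of the trace whose radius is the 95th percentile of the distances to that center; the function name `area_c_imf` is hypothetical.

```python
import numpy as np

def area_c_imf(z, coverage=95.0):
    """Area of a circle containing `coverage` percent of the points of z."""
    center = z.mean()                              # assumption: centered on the mean
    radius = np.percentile(np.abs(z - center), coverage)
    return np.pi * radius ** 2

theta = 2 * np.pi * np.arange(1000) / 100.0        # ten full rotations
z = np.exp(1j * theta)                             # synthetic unit-radius trace
print(area_c_imf(z), area_c_imf(2 * z))            # approximately pi and 4*pi
```

Since the area grows with the square of the sway amplitude, the parameter is sensitive to the amplitude of each IMF.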

2.8.1. Data analysis

Center of pressure data were calculated from the moment the second foot contacted the force plate (FC2). FC2 was calculated as the time at which the maximum value of the second derivative of the ML signal occurred, which corresponded to the time the second foot touched the force plate. At this point, the largest acceleration of ML would occur when the COP moved rapidly towards the second foot. This instant in time was used for both AP and ML displacements. All analyses were performed for the 10-second period starting 1 second after FC2, in order to give both AP and ML displacement time to return to near central values. The choice of FC2 has been validated in previous work [7]. Statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS Inc., Chicago, Ill, USA). The Kolmogorov-Smirnov test was used to check for normality. Owing to the grossly non-normal distribution of the area parameters (AreaCIMF), it was necessary to apply a log transformation to all AreaCIMF in order that an analysis of variance (ANOVA) could be performed. ANOVA was used to compare results between conditions, with AreaCIMF and FIMF as the dependent variables and the subject group as the independent variable. Alpha level was set at P < 0.05.
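The selection of the analysis window can be sketched as follows in Python. The sketch assumes that the second derivative is approximated by the second finite difference of the sampled ML signal (the differentiation scheme is not specified in the text) and uses the 100 Hz sampling rate, the 1-second delay, and the 10-second duration given above; the function name `analysis_window` is hypothetical.

```python
import numpy as np

def analysis_window(ml, fs=100.0, delay_s=1.0, length_s=10.0):
    """Return (fc2, start, stop) sample indices for the analysis window."""
    accel = np.diff(ml, n=2)               # second difference, proportional to acceleration
    fc2 = int(np.argmax(accel)) + 1        # +1 recenters the difference on its middle sample
    start = fc2 + int(round(delay_s * fs))
    stop = start + int(round(length_s * fs))
    return fc2, start, stop

# Synthetic 15 s ML record with an abrupt 5 mm shift after sample 300 (not real data).
ml = np.zeros(1500)
ml[301:] = 5.0
fc2, start, stop = analysis_window(ml)
segment = ml[start:stop]                   # the same indices would be applied to AP
print(fc2, start, stop, len(segment))
```

The same `start` and `stop` indices are applied to both the AP and ML series so that the two directions remain time-aligned.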

3. RESULTS

There was a significant difference in AreaCIMF values between groups for all IMFs for all three methods (basic-EMD for AP and ML, complex-EMD for negative and positive components, and bivariate-EMD for real and imaginary parts) (Figure 5). The area was greater for elderly subjects than for control subjects. These increases in the values of AreaCIMF are indicative of degradation in the balance due to the effect of age on postural stability.

In respect to the average rotation frequency (FIMF) parameters, significant differences were observed for the basic-EMD for AP displacement for both IMF1 and IMF2, with smaller values observed for elderly subjects (Figure 6(a)). In respect to ML displacement for basic-EMD, significant differences were observed between elderly and control subjects for IMF1, IMF2, and IMF3 (Figure 6(b)). For the bivariate-EMD method, significant differences were observed only for IMF3 for the real part corresponding to ML (Figure 6(d)), and for IMF1 for the imaginary part corresponding to AP (Figure 6(c)). However, for the complex-EMD, the average rotation frequency was significantly different for all IMFs for both positive and negative components (Figures 6(e), 6(f)). The rotation frequency was lower for elderly subjects than for the control subjects.

4. DISCUSSION

The first point to be addressed is whether the use of the EMD methodology followed by the Hilbert transform, as used in the present study, provides greater insight than simply applying the Hilbert transform to the whole signal, which also detected significant differences in surface area between groups. The response is related to the nature of the trace of the analytic signal of the whole signal. It can be observed in Figure 3(a) that the analytic signal in the complex plane does not have a specific geometry, making it impossible to define the circle that encloses 95% of the data points. In addition, the form of the analytic trace differs between experiments for the same subject, as well as between subjects. Such a tracing is indicative of multiple centers of rotation, possibly due to the presence of several different control strategies. In contrast, the traces of the analytic signals of each IMF always have a circular form (Figures 3(b)–3(e)), the area of which can then be estimated in the complex plane.

Figure 5: The surface area of the analytic signal in a logarithmic scale (AreaCIMF) of the first four IMFs for control and elderly subjects. (a) EMD for AP displacement; (b) EMD for ML displacement; (c) bivariate-EMD for the imaginary part of the signal (corresponding to AP displacement); (d) bivariate-EMD for the real part of the signal (ML); (e) complex-EMD for positive components; and (f) complex-EMD for negative components. Data are mean and 95% confidence intervals. All comparisons are significantly different between groups.

In respect to average rotation frequency, the trace of the phase as a function of time indicates a linear relation between phase and time. This relationship confirms the existence of a harmonic oscillation with a constant frequency equal to the average frequency. In addition, all the EMD methods permit the extraction of a proper rotation that could be related to different postural control systems.

In respect to the interpretation of the results for the postural data of the present study, elderly subjects had significantly greater surface areas for both AP and ML displacement than did the control subjects. The increased surface areas observed for the elderly subjects were due to the greater amplitude of postural sway for elderly subjects for all IMFs, thus indicating a less well-controlled posture.


Figure 6: The average rotation frequency (FIMF) of the first four IMFs for control and elderly subjects. (a) EMD for AP displacement; (b) EMD for ML displacement; (c) bivariate-EMD for the imaginary part of the signal (corresponding to AP displacement); (d) bivariate-EMD for the real part of the signal (ML); (e) complex-EMD for positive components; and (f) complex-EMD for negative components. Data are mean and 95% confidence intervals. ∗Significant difference from control subjects.

In respect to the rotation frequency, in contrast to surface area, smaller values were observed for elderly subjects. This contrast is self-explanatory, as surface area and rotation frequency are negatively correlated. Given that the time-series length is the same for both groups, it follows that if the radius of the circle is greater, the rotation frequency will be smaller.

4.1. Comparison between one-dimensional and two-dimensional EMD methods

For the one-dimensional EMD method, the mutual dependence between the real and imaginary parts of the complex signal is lost and is mapped onto two real independent signals that are decomposed separately. In the case of the stabilogram, AP and ML displacements are decomposed separately in the basic-EMD, despite the fact that the same postural control systems produce these two signals. It is almost certain that these two signals are linearly and/or nonlinearly correlated. In addition, physiological interpretation could be difficult as mutual information is lost.

Figure 7: The average rotation frequency (FIMF) of the first four IMFs for control and vibration subjects. (a) EMD for AP displacement; (b) EMD for ML displacement; (c) bivariate-EMD for the imaginary part of the signal (corresponding to AP displacement); (d) bivariate-EMD for the real part of the signal (ML); (e) complex-EMD for positive components; and (f) complex-EMD for negative components. Data are mean and 95% confidence intervals. ∗Significant difference from control subjects.

The complex-EMD method decomposes the positive and negative components separately, as for two independent signals. The IMFs resulting from the positive and negative components form two independent sets of IMFs, and the number of IMFs need not be equal for the positive and negative components. However, interpretation of the positive and negative frequency is difficult as the positive and negative components bear no obvious relationship with physiological control systems. Despite this, complex-EMD was the only method that enabled discrimination between elderly and control groups using the average rotation frequency for all of the first four IMFs for both positive and negative frequency components. Confirmation of the capacity of the complex-EMD method to detect differences between groups based on average rotation frequencies was performed using data from an experimental study in which the proprioceptive and visual systems were degraded by applying tendon vibration, and by closing subjects’ eyes. Details of the experiments performed on 17 healthy young subjects can be found in [16]. The same EMD, complex-EMD, and bivariate-EMD analyses were applied on the data from these experiments. In respect to the area, similar results were obtained between groups, with greater surface areas and thus increased postural sway observed for vibration. In respect to the average rotation frequency, the complex-EMD method again outperformed the other two, with significant differences observed between groups for all four IMFs for both positive and negative rotation (Figure 7). In contrast, for the basic-EMD method, significant differences were observed for all IMFs for AP displacement and for the first and second IMFs for ML displacement. With respect to the bivariate-EMD method, only three of the IMFs for AP and two of the IMFs for ML showed significant differences between groups (Figure 7). Furthermore, the IMFs in which these differences were observed were not the same as those identified in Figure 6 for the control and elderly subjects.

In addition, we have studied the influence of different values of M (from M = 4 to M = 16) on the ability of the bivariate-EMD to discriminate between groups. The statistical significance of results is unaltered for the surface parameter AreaCIMF for the different values of M. However, varying the value of M may change whether the average rotation frequency discriminates significantly between groups. The statistical significance is lost for certain IMFs and may appear for other IMFs while varying the value of M.

The difference between results is not surprising for the bivariate-EMD method. This method is based on the rotation in two-dimensional space, while the extraction of extrema is related to the change of the moving point direction [15]. Due to this fundamental difference in methodology, the results for the average rotation frequency are not the same for the bivariate-EMD.

In respect to the sifting process used by the different methods, complex-EMD uses the basic-EMD method. The only difference is that the complex-EMD decomposes the complex signal into two positive and negative frequency components before applying the sifting process.

Concerning the bivariate-EMD method, despite the less impressive results shown in the present study when compared to the complex-EMD method, bivariate-EMD might offer some advantages under specific conditions. For instance, rather than split the complex signal into two parts, the bivariate-EMD decomposition is done on the complex signal directly, thus the problem of the number of IMFs for the positive and negative frequency components does not arise due to the unified approach adopted to decompose the complex signal. In addition, the mutual dependency between the real and imaginary parts of the complex signal is taken into account. In this way, bivariate-EMD might be more likely to respond to changes in spatial parameters than the other methods. Furthermore, bivariate-EMD enables the extraction of the fast rotation that is superimposed on slower rotation in the COP signals. In this way, it could be possible to identify the characteristics of different posture control systems in future studies.

5. CONCLUSION

The EMD, complex-EMD, and bivariate-EMD methods are useful and powerful methods to analyze nonstationary univariate and bivariate time series. In the case of stabilogram time series, these methods were able to extract the oscillations in different adaptive time scales, and to define proper rotations that could be related to the different postural control systems. In addition, these methods enabled the quality of equilibrium of postural systems to be characterized and the differences in postural control mechanisms between elderly and control subjects to be identified. In fact, the use of the different types of EMD methods enabled the extraction of individual centers of rotation for each IMF. In perspective, additional work is planned in order to find the relation between IMFs and the different postural control systems. A follow-up study will address this issue, initially by simulation, before using different experimental conditions in which the different postural control systems will be impaired. The aim of this study will be to ascertain whether or not each IMF corresponds to a given postural control system.

ACKNOWLEDGMENTS

This study was undertaken as part of the mv-EMD research project (multivariate Empirical Mode Decomposition) supported by the French ANR agency (ANR Blanc Grant BLAN07-1 223026), the PreDICA research project (Prevision, Detection Investigation contre la Chute des Personnes Agees) supported by the French ANR agency (Grant ANR-O5-RNTS-01801), and the PARAChute research project (Personnes Agees et Risque de Chute) which was supported in part by The French Ministry of Research (Grant 03-B-254), The European Social Fund (Grant 3/1/3/4/07/3/3/011), The European Regional Development Fund (Grant 2003-2-50-0014 and Grant 2006-2-20-0011), The Champagne-Ardenne Regional Council (Grant E200308251), and INRIA (Grant 804F04620016000081).

REFERENCES

[1] Comite Francais d’Education pour la Sante, “Les cles du “bien vieillir”: prevention des chutes chez les seniors,” Caisse Nationale de l’Assurance Maladie des Travailleurs Salaries, November 2001.

[2] “The 2005 EPC projections of age-related expenditure (2004-50) for the EU-25 member states,” Tech. Rep. 4, European Commission, Luxembourg, 2005.

[3] L. Z. Rubenstein and K. R. Josephson, “The epidemiology of falls and syncope,” Clinics in Geriatric Medicine, vol. 18, no. 2, pp. 141–158, 2002.

[4] T. E. Prieto, J. B. Myklebust, R. G. Hoffmann, E. G. Lovett, and B. M. Myklebust, “Measures of postural steadiness: differences between healthy young and elderly adults,” IEEE Transactions on Biomedical Engineering, vol. 43, no. 9, pp. 956–966, 1996.

[5] T. E. Prieto, J. B. Myklebust, and B. M. Myklebust, “Characterization and modeling of postural steadiness in the elderly: a review,” IEEE Transactions on Rehabilitation Engineering, vol. 1, no. 1, pp. 26–34, 1993.

[6] J. J. Collins and C. J. De Luca, “Upright, correlated random walks: a statistical-biomechanics approach to the human postural control system,” Chaos, vol. 5, no. 1, pp. 57–63, 1995.

[7] H. Amoud, M. Abadi, D. Hewson, V. Michel-Pellegrino, M. Doussot, and J. Duchene, “Fractal time series analysis of postural stability in elderly and control subjects,” Journal of NeuroEngineering and Rehabilitation, vol. 4, pp. 1–12, 2007.

[8] H. Snoussi, H. Amoud, M. Doussot, D. Hewson, and J. Duchene, “Reconstructed phase spaces of intrinsic mode functions. Application to postural stability analysis,” in Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS ’06), pp. 4584–4589, New York, NY, USA, August-September 2006.

[9] H. Amoud, H. Snoussi, D. Hewson, M. Doussot, and J. Duchene, “Intrinsic mode entropy for nonlinear discriminant analysis,” IEEE Signal Processing Letters, vol. 14, no. 5, pp. 297–300, 2007.

[10] J. P. Carroll and W. Freedman, “Nonstationary properties of postural sway,” Journal of Biomechanics, vol. 26, no. 4-5, pp. 409–416, 1993.

[11] G. F. Harris, S. A. Riedel, D. Matesi, and P. Smith, “Standing postural stability assessment and signal stationarity in children with cerebral palsy,” IEEE Transactions on Rehabilitation Engineering, vol. 1, no. 1, pp. 35–42, 1993.

[12] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London A, vol. 454, no. 1971, pp. 903–995, 1998.

[13] T. Tanaka and D. P. Mandic, “Complex empirical mode decomposition,” IEEE Signal Processing Letters, vol. 14, no. 2, pp. 101–104, 2007.

[14] G. Rilling, P. Flandrin, P. Goncalves, and J. M. Lilly, “Bivariate empirical mode decomposition,” IEEE Signal Processing Letters, vol. 14, no. 12, pp. 936–939, 2007.

[15] M. U. Bin Altaf, T. Gautama, T. Tanaka, and D. P. Mandic, “Rotation invariant complex empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’07), vol. 3, pp. 1009–1012, Honolulu, Hawaii, USA, April 2007.

[16] V. Michel-Pellegrino, H. Amoud, D. Hewson, and J. Duchene, “Identification of a degradation in postural equilibrium invoked by different vibration frequencies on the tibialis anterior tendon,” in Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS ’06), pp. 4047–4050, New York, NY, USA, August-September 2006.


Review Article

Multimodal Pressure-Flow Analysis: Application of Hilbert-Huang Transform in Cerebral Blood Flow Regulation

Men-Tzung Lo,1, 2, 3 Kun Hu,1 Yanhui Liu,4 C.-K. Peng,2 and Vera Novak1

1 Division of Gerontology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA

2 Division of Interdisciplinary Medicine & Biotechnology and Margret & H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA

3 Research Center for Adaptive Data Analysis, National Central University, Chungli 32054, Taiwan

4 DynaDx Corporation, Mountain View, CA 94041, USA

Correspondence should be addressed to Vera Novak, [email protected]

Received 3 September 2007; Revised 15 February 2008; Accepted 14 April 2008


Quantification of nonlinear interactions between two nonstationary signals presents a computational challenge in different research fields, especially for assessments of physiological systems. Traditional approaches that are based on theories of stationary signals cannot resolve nonstationarity-related issues and, thus, cannot reliably assess nonlinear interactions in physiological systems. In this review we discuss a new technique called the multimodal pressure-flow (MMPF) method that utilizes the Hilbert-Huang transform to quantify the interaction between nonstationary cerebral blood flow velocity (BFV) and blood pressure (BP) for the assessment of dynamic cerebral autoregulation (CA). CA is an important mechanism responsible for controlling cerebral blood flow in response to fluctuations in systemic BP within a few heartbeats. The MMPF analysis decomposes BP and BFV signals into multiple empirical modes adaptively so that the fluctuations caused by a specific physiologic process can be represented in a corresponding empirical mode. Using this technique, we showed that dynamic CA can be characterized by specific phase delays between the decomposed BP and BFV oscillations, and that the phase shifts are significantly reduced in hypertensive, diabetic, and stroke subjects with impaired CA. Additionally, the new technique can reliably assess CA using both induced BP/BFV oscillations during clinical tests and spontaneous BP/BFV fluctuations during resting conditions.

Copyright © 2008 Men-Tzung Lo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Previous works have demonstrated that fluctuations in physiological signals carry important information reflecting the mechanisms underlying control processes and interactions among organ systems at multiple time scales. A major problem in the analysis of physiological signals is related to nonstationarities (statistical properties such as mean and standard deviation vary with time), which are an intrinsic feature of physiological data and persist even without external stimulation [1–3]. The presence of nonstationarities makes traditional approaches that assume stationary signals unreliable. To resolve the difficulties related to nonstationary behavior, concepts and methods derived from statistical physics have been applied in studies of different control mechanisms including locomotion control [4–6], cardiac regulation [7, 8], cardio-respiratory coupling [9–11], renal vascular autoregulation [12], cerebral blood flow regulation [13–16], and circadian rhythms [17–19]. One of the innovative approaches applied to physiological studies is the Hilbert-Huang transform (HHT) [20]. The HHT is based on nonlinear chaotic theories and has been designed to extract dynamic information from nonstationary signals at different time scales. The advantages of the HHT over traditional Fourier-based methods have been appreciated in many studies of different physiological systems such as blood pressure hemodynamics [21], cerebral autoregulation [13, 15, 16], cardiac dynamics [22], respiratory dynamics [23], and electroencephalographic activity [24]. In this review, we focus on the computational challenge of quantifying interactions between two nonstationary physiologic signals. To demonstrate progress in resolving the generic problem related to nonstationarities, we review the recent applications of nonlinear dynamic approaches based on HHT to one specific physiological control mechanism: cerebral blood flow regulation.


Cerebral autoregulatory mechanisms are engaged to compensate for metabolic demands and perfusion pressure variations under physiologic and pathologic conditions [25, 26]. Dynamic autoregulation reflects the ability of the cerebral microvasculature to control perfusion by adjusting the small-vessel resistances in response to beat-to-beat blood pressure (BP) fluctuations by involving myogenic and neurogenic regulation. Reliable and noninvasive assessment of cerebral autoregulation (CA) is a major challenge in medical diagnostics. Transcranial Doppler ultrasound (TCD) enables assessment of dynamic CA during interventions with sudden systemic BP changes induced by the Valsalva maneuver (VM), head-up tilt, and sit-to-stand test in various medical conditions [13, 26–34]. Conventional approaches typically model cerebral regulation as a linear and time-invariant system, with BP as the input to the system and cerebral blood flow as the output. A transfer function is typically used to explore the relationship between BP and cerebral blood flow velocity (BFV) by calculating gain and phase shift between the BP and BFV power spectra [26, 35–40]. Many studies have shown that the transfer function can identify alterations in the BP-BFV relationship under pathologic conditions such as stroke, hypertension, and traumatic brain injuries that are associated with impaired autoregulation [26, 35–39, 41–43]. This Fourier transform-based approach, however, assumes that signals are composed of superimposed sinusoidal oscillations of constant amplitude and period at a predetermined frequency range. This assumption puts an unavoidable limitation on the reliability and application of the method, because BP and BFV signals recorded in clinical settings are often nonstationary and are modulated by nonlinearly interacting processes at multiple time scales corresponding to the beat-to-beat systolic pressure, respiration, spontaneous BP fluctuations, and those induced by interventions.

To overcome problems in CA evaluations related to nonstationarity and nonlinearity, several approaches derived from concepts and methods of nonlinear dynamics have been proposed [13–16, 44–47]. A novel computational method called multimodal pressure-flow (MMPF) analysis was recently developed to study the BP-BFV relationship during the Valsalva maneuver (VM) [13]. The MMPF method enables evaluation of autoregulatory dynamics based on instantaneous phase analysis of BP and BFV oscillations induced by the intervention (a sudden reduction of BP and BFV followed by an increase in both signals). The MMPF applies an empirical mode decomposition (EMD) algorithm to decompose complex BP and BFV signals into multiple empirical modes [21]. Each mode represents a frequency-amplitude modulation in a narrow frequency band that can be related to a specific physiologic process. For example, this technique can easily identify BP and BFV oscillations induced by the VM (0.03–0.1 Hz, i.e., periods of ∼10 to 30 seconds). Using this method, a characteristic phase lag between BFV and BP fluctuations corresponding to the VM was found in healthy subjects, and this phase lag was reduced in patients with hypertension and stroke [13]. These findings suggested that the BFV-BP phase lag could serve as an index of CA. However, intervention procedures, such as the VM, introduce large intracranial pressure fluctuations and also require patients’ active participation. As a result, such procedures are not applicable under various clinical conditions, such as in acute care settings.

It has been hypothesized that CA can be evaluated from spontaneous BP-BFV fluctuations during resting conditions [14–16]. This hypothesis has been motivated by the facts that (i) CA is a continuous dynamic process, so it should always be engaged to regulate cerebral blood flow, and (ii) BP and BFV display spontaneous fluctuations at different time scales [38, 39, 48–50] even during resting conditions. Since spontaneous BP and BFV fluctuations can be entrained by respiration or other external perturbation over a wide frequency range (0.05–0.4 Hz) [51, 52] and the dominant frequency of spontaneous BP fluctuations varies among individuals over time and under different test conditions, reliable measures of the nonlinear BFV-BP relationship that do not presume oscillation frequencies and waveform shapes are needed. These requirements are well satisfied by the MMPF algorithm, which extracts intrinsic BP and BFV oscillations embedded in the original signals and quantifies the instantaneous phase relationship between them. If the MMPF is sensitive and can provide reliable estimation of autoregulation using spontaneous BP and BFV fluctuations, it is expected that, similar to BP and BFV oscillations introduced by the VM, spontaneous BFV and BP oscillations during resting conditions should also exhibit specific phase shifts.

In this review, we present an overview of the transfer function analysis (TFA) that was traditionally used to quantify CA (Section 2) and of the MMPF method and its modifications (Section 3). In Section 4, we introduce a newly developed automatic algorithm for the improved MMPF method as well as engineering aspects that will potentially lead to a fully automated analysis without expert input. In Section 5, we review previous applications of MMPF in clinical studies [15, 16], in which the ability and reliability of the method in assessing CA from spontaneous BP-BFV fluctuations during resting conditions were evaluated. Specifically, we discuss the MMPF results in three pathological conditions that are associated with cardiovascular complications affecting cerebrovascular control systems (stroke, hypertension, and diabetes) [53–57]. Our previous studies have shown altered CA in these conditions [13, 15, 16]. Additionally, a comparison of the MMPF and the TFA results in the study of type 2 diabetes is discussed. In Section 6, we discuss why nonlinear dynamic approaches such as the MMPF can more reliably quantify the nonlinear relationship between nonstationary signals.

2. TRANSFER FUNCTION ANALYSIS

Transfer function analysis, which has been widely used in CA assessment [35, 58], is based on the Fourier transform. BP and BFV signals are decomposed into multiple sinusoidal waveforms in order to compare the amplitudes and phases of BP and BFV components at different frequencies. The coherence, representing the degree of similarity in the variation (phase or amplitude) of two signals within specific frequencies, can then be evaluated through the cross-spectrum. In general, a strong coherence indicates dysfunction of CA.

The BP and BFV time series are first linearly detrended and divided into 5000-point (100-second) segments with 50% overlap. The Fourier transforms of BP, denoted as $S_P(f)$, and BFV, denoted as $S_V(f)$, are calculated for each segment with a spectral resolution of 0.01 Hz and are used to calculate the transfer function:

$$H(f) = \frac{S_P(f)\,S_V^{*}(f)}{\left|S_P(f)\right|^{2}} = G(f)\,e^{j\varphi(f)}, \tag{1}$$

where $S_V^{*}(f)$ is the complex conjugate of $S_V(f)$; $|S_P(f)|^2$ is the power spectral density of BP; $G(f) = |H(f)|$ is the transfer function amplitude (gain); and $\varphi(f)$ is the transfer function phase at a specific frequency $f$. The amplitude and the phase of the transfer function reflect the linear amplitude and time relationship between the two signals. The reliability of these linear relationships can be evaluated by the coherence $C(f)$, which ranges from 0 to 1:

$$C(f) = \frac{\left|S_P(f)\,S_V^{*}(f)\right|^{2}}{\left|S_P(f)\right|^{2}\,\left|S_V(f)\right|^{2}}. \tag{2}$$

A coherence value close to 0 indicates the lack of a linear relationship between the BP and BFV signals; in that case, the linear relationship estimated by the transfer function is not reliable. The absence of a linear relationship between BP and BFV is usually assumed to reflect the nonlinear influence of CA.
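As an illustration of (1) and (2), the following is a minimal Python sketch using only the standard library. It makes several assumptions that are not stated in the text: the cross- and auto-spectra are averaged over the overlapping segments before forming the ratios (without some averaging, the coherence in (2) is identically 1 for a single segment), each segment has only its mean removed rather than being linearly detrended, and no window function is applied. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform (non-negative frequencies only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def segments(x, length):
    """Split x into segments of `length` samples with 50% overlap, mean removed."""
    step = length // 2
    out = []
    for start in range(0, len(x) - length + 1, step):
        seg = x[start:start + length]
        mean = sum(seg) / length
        out.append([v - mean for v in seg])
    return out


def transfer_function(bp, bfv, length):
    """Return per-frequency gain, phase (radians), and coherence.

    Follows the convention of Eq. (1): H = <S_P S_V*> / <|S_P|^2>, where <.>
    is an average over segments (the averaging is an assumption of this sketch).
    """
    sp = [dft(s) for s in segments(bp, length)]
    sv = [dft(s) for s in segments(bfv, length)]
    n_seg, n_freq = len(sp), len(sp[0])
    gain, phase, coh = [], [], []
    for k in range(n_freq):
        cross = sum(sp[i][k] * sv[i][k].conjugate() for i in range(n_seg)) / n_seg
        p_pp = sum(abs(sp[i][k]) ** 2 for i in range(n_seg)) / n_seg
        p_vv = sum(abs(sv[i][k]) ** 2 for i in range(n_seg)) / n_seg
        h = cross / p_pp if p_pp > 0 else 0j
        gain.append(abs(h))
        phase.append(cmath.phase(h))
        coh.append(abs(cross) ** 2 / (p_pp * p_vv) if p_pp * p_vv > 0 else 0.0)
    return gain, phase, coh
```

With the sign convention of (1), a BFV oscillation that leads BP by 45° yields a transfer function phase of −45°; other texts define the cross-spectrum with the conjugate on the input, which flips the sign.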

Average coherence, gain, and phase are calculated in the frequency range below 0.07 Hz, in which CA is assumed to be most effective [35, 39]. For comparison with the MMPF results, the same transfer function analysis is also performed in the same frequency range as the observed dominant spontaneous oscillations in BP and BFV.

3. MULTIMODAL PRESSURE-FLOW METHOD

The main concept of the MMPF method is to quantify the nonlinear BP-BFV relationship by concentrating on intrinsic components of BP and BFV signals that have simplified temporal structures but can still reflect nonlinear interactions between two physiologic variables. The MMPF method includes four major steps: (1) decomposition of each signal (BP and BFV) into multiple empirical modes, (2) selection of empirical modes for (dominant) oscillations in BP and corresponding oscillations in BFV, (3) calculation of instantaneous phases of the extracted BP and BFV oscillations, and (4) calculation of biomarker(s) of CA based on the BP-BFV phase relationship.

The improved MMPF method provides a more reliable estimation of the BP-BFV phase relationship by implementing a noise-assisted EMD, called ensemble EMD (EEMD) [59], to extract oscillations embedded in nonstationary BP and BFV signals. The EEMD technique can ensure that each component does not consist of oscillations at dramatically disparate scales, and that different components are locally nonoverlapping in the frequency domain. Thus, each component obtained from the EEMD may better represent fluctuations corresponding to a specific physiologic process. To demonstrate this advantage of the EEMD, we will apply the method to extract dominant spontaneous BP-BFV oscillations during baseline resting conditions and compare the results to those obtained from the traditional EMD method.

3.1. Empirical mode decomposition

To achieve the first major step of MMPF, we originally utilized the empirical mode decomposition (EMD) algorithm, developed by Huang et al. [21], to decompose the nonstationary BP and BFV signals into multiple empirical modes, called intrinsic mode functions (IMFs). Each IMF represents a frequency-amplitude modulation in a narrow band that can be related to a specific physiologic process [21].

For a time series $x(t)$ with at least 2 extrema, the EMD uses a sifting procedure to extract IMFs one by one from the smallest scale to the largest scale:

$$
\begin{aligned}
x(t) &= c_1(t) + r_1(t) \\
     &= c_1(t) + c_2(t) + r_2(t) \\
     &\;\;\vdots \\
     &= c_1(t) + c_2(t) + \cdots + c_n(t),
\end{aligned}
\tag{3}
$$

where $c_k(t)$ is the $k$th IMF component, and $r_k(t)$ is the residual after extracting the first $k$ IMF components, i.e., $r_k(t) = x(t) - \sum_{i=1}^{k} c_i(t)$. Briefly, the extraction of the $k$th IMF includes the following steps.

(i) Initialize $h_0(t) = h_{i-1}(t) = r_{k-1}(t)$ (if $k = 1$, $h_0(t) = x(t)$), where $i = 1$.

(ii) Extract local minima/maxima of $h_{i-1}(t)$ (if the total number of minima and maxima is less than 2, $c_k(t) = h_{i-1}(t)$ and stop the whole EMD process).

(iii) Obtain the upper envelope $p(t)$ and the lower envelope $v(t)$ by interpolating the local maxima and minima of $h_{i-1}(t)$, respectively.

(iv) Calculate $h_i(t) = h_{i-1}(t) - (p(t) + v(t))/2$.

(v) Calculate the standard deviation (SD) of $(p(t) + v(t))/2$.

(vi) If SD is small enough (less than a chosen threshold $\mathrm{SD}_{\max}$, typically between 0.2 and 0.3) [21], the $k$th IMF component is assigned as $c_k(t) = h_i(t)$ and $r_k(t) = r_{k-1}(t) - c_k(t)$; otherwise repeat steps (ii) to (v) for $i + 1$ until $\mathrm{SD} < \mathrm{SD}_{\max}$.

The above procedure is repeated to obtain different IMFs at different scales until there are fewer than 2 minima or maxima in a residual $r_{k-1}(t)$, which is then assigned as the last IMF (see step (ii) above).
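The sifting steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the studies reviewed here: the envelopes are built by linear interpolation held constant beyond the outermost extrema (a cubic spline is the usual choice; linear interpolation keeps the example free of dependencies), the caps `max_sift` and `max_imfs` are safety limits introduced for the example, and the stopping test applies the SD of the mean envelope as written in step (v), which depends on the signal units.

```python
import math


def local_extrema(h):
    """Indices of strict interior local maxima and minima of h."""
    maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] > h[i + 1]]
    minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] < h[i + 1]]
    return maxima, minima


def envelope(h, idx):
    """Piecewise-linear envelope through the points (i, h[i]) for i in idx."""
    env, j = [], 0
    for t in range(len(h)):
        if t <= idx[0]:
            env.append(h[idx[0]])          # hold first extremum value
        elif t >= idx[-1]:
            env.append(h[idx[-1]])         # hold last extremum value
        else:
            while idx[j + 1] < t:
                j += 1
            a, b = idx[j], idx[j + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * h[a] + w * h[b])
    return env


def emd(x, sd_max=0.25, max_sift=50, max_imfs=10):
    """Return a list of components (IMFs plus final residual) that sum to x."""
    imfs = []
    r = list(x)
    while len(imfs) < max_imfs:
        maxima, minima = local_extrema(r)
        if len(maxima) + len(minima) < 2:       # step (ii): stop the whole EMD
            break
        h = r
        for _ in range(max_sift):
            maxima, minima = local_extrema(h)
            if not maxima or not minima:
                break
            p = envelope(h, maxima)              # step (iii): upper envelope
            v = envelope(h, minima)              # step (iii): lower envelope
            m = [(a + b) / 2 for a, b in zip(p, v)]
            h = [a - b for a, b in zip(h, m)]    # step (iv)
            mean_m = sum(m) / len(m)
            sd = math.sqrt(sum((a - mean_m) ** 2 for a in m) / len(m))  # step (v)
            if sd < sd_max:                      # step (vi)
                break
        imfs.append(h)
        r = [a - b for a, b in zip(r, h)]
    imfs.append(r)                               # last residual is the last component
    return imfs
```

By construction, adding all returned components sample by sample reproduces the input, which is the completeness property expressed by (3).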


[Figure 1. Panel title: EMD. In-figure annotation: "The spectrogram of the oscillation entrained by respiration".]

Figure 1: (Left panel) A raw BP signal and its decomposed empirical modes (i.e., c5–c9 components from bottom to top) obtained by the EMD method. (Right panel) The corresponding short-time Fourier transform (STFT) spectrograms of the signals in the left panel. The spectrograms were obtained using a Gaussian sliding window with a time duration of 40 seconds, shifted 2 seconds between successive evaluations, and then plotted using a color map.

3.2. Ensemble empirical mode decomposition (EEMD)

For signals with intermittent oscillations, one essential problem of the EMD algorithm is that an intrinsic mode could comprise oscillations with very different wavelengths at different temporal locations (i.e., mode mixing). This problem can complicate our analysis and make the results less reliable. To overcome the mode mixing problem, a noise-assisted EMD algorithm, namely, the ensemble empirical mode decomposition (EEMD), has been proposed [59]. The EEMD algorithm first generates an ensemble of data sets obtained by adding different realizations of white noise to the original data. Then, the EMD analysis is applied to these new data sets. Finally, the ensemble average of the corresponding intrinsic mode functions from different decompositions is calculated as the final result. Briefly, for a time series $x(t)$, the EEMD includes the following steps.

(i) Generate a new signal $y(t)$ by superposing on $x(t)$ randomly generated white noise with amplitude equal to a certain ratio of the standard deviation of $x(t)$ (applying noise with larger amplitude requires more realizations of decompositions).

(ii) Perform the EMD on $y(t)$ to obtain intrinsic mode functions.

(iii) Iterate steps (i)-(ii) $m$ times with different white noise to obtain an ensemble of intrinsic mode functions (IMFs) $\{c_k^1(t), k = 1, 2, \ldots, n\}$, $\{c_k^2(t), k = 1, 2, \ldots, n\}$, $\ldots$, $\{c_k^m(t), k = 1, 2, \ldots, n\}$.

(iv) Calculate the average of intrinsic mode functions $\{c_k(t), k = 1, 2, \ldots, n\}$, where $c_k(t) = (1/m) \sum_{i=1}^{m} c_k^i(t)$.

The last two steps are applied to reduce the noise level and to ensure that the obtained IMFs reflect the true oscillations in the original time series $x(t)$. In this study, we repeat the decomposition $m$ times ($m \geq 200$) to make sure that the noise is reduced to a negligible level.
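The ensemble-averaging logic of steps (i) to (iv) can be sketched independently of any particular EMD implementation. In the following Python sketch, `decompose` is a hypothetical callable supplied by the user (any routine that maps a list of samples to a list of modes); Gaussian white noise and the handling of realizations that return different numbers of modes are assumptions of the example, and the defaults (noise ratio 0.2, m = 200) simply mirror the values quoted in this section.

```python
import math
import random


def eemd(x, decompose, noise_ratio=0.2, m=200, seed=0):
    """Ensemble-average the modes returned by `decompose` over m noisy copies of x.

    noise_ratio is the noise amplitude as a fraction of the standard
    deviation of x.
    """
    rng = random.Random(seed)
    n = len(x)
    mean_x = sum(x) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sums = None
    for _ in range(m):
        # Step (i): add a fresh white-noise realization (Gaussian assumed).
        y = [v + rng.gauss(0.0, noise_ratio * sd_x) for v in x]
        # Step (ii): decompose the noisy copy.
        modes = decompose(y)
        if sums is None:
            sums = [[0.0] * n for _ in modes]
        # Assumption: if realizations yield different mode counts,
        # keep only the modes common to all of them.
        sums = sums[:len(modes)]
        for k in range(len(sums)):
            for t in range(n):
                sums[k][t] += modes[k][t]
    # Step (iv): ensemble average of each mode.
    return [[s / m for s in mode] for mode in sums]
```

Because the added noise is independent across realizations, its contribution to each averaged mode shrinks roughly as one over the square root of m, which is why larger noise amplitudes call for more realizations.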

To illustrate the mode mixing problem, we applied both EMD and EEMD to the BP signal of a healthy subject. Figure 1 shows the results of the EMD. The left-side panels of Figure 1 show the original BP signal (the top plot) and the decomposed IMFs (modes 9–5 from the second to the bottom plots). For each plotted signal on the left side of Figure 1, the corresponding short-time Fourier transform (STFT) spectrogram was obtained by applying the Fourier transform in overlapped Gaussian sliding windows (the window size is 40 seconds, with a 2-second shift between two successive windows) and was plotted using color mapping on the right side of Figure 1. As shown in the rectangular area of the STFT spectrogram of the raw BP signal (marked using a white line, the top panel of the right side in Figure 1), the instantaneous frequency of the spontaneous oscillation entrained by respiration is time dependent over the range of 0.18–0.3 Hz. Both the mode 5 and mode 6 IMFs from the EMD contain parts of the respiration-induced oscillations in BP at different times, that is, no single IMF can reflect the respiratory influence consistently throughout the entire time series. In contrast, as shown in Figure 2, the mode 7 IMF from the EEMD can fully represent the respiratory oscillations in BP, as indicated by the STFT spectrogram of the IMF matching that of the original BP signal in the frequency range of 0.18–0.3 Hz. Using the EEMD, we also extracted the respiration-induced oscillations in the simultaneously recorded BFV signal of the same subject (mode 7 IMF in Figure 3).

[Figure 2. Panel title: EEMD.]

Figure 2: (Left panel) The same BP signal as shown in Figure 1 and its decomposed empirical modes (i.e., c5–c9 components from bottom to top) obtained by the EEMD method. (Right panel) The corresponding short-time Fourier transform (STFT) spectrograms of the signals in the left panel. The spectrograms were calculated and plotted using the same procedure discussed in Figure 1. The noise ratio for the EEMD method is 0.2.

As shown in our simulation, the EEMD ensures that the decompositions encompass the range of possible solutions in the sifting process and that signals of different scales are collated naturally into the proper IMF. It produces a set of IMFs, each displaying a time-frequency distribution without transitional gaps. With the elimination of the mode mixing problem, the EEMD can better extract intrinsic mode(s) corresponding to specific physiologic mechanisms.

3.3. Mode selection

The second step of the MMPF is to choose an IMF for the BP and the corresponding IMF for the BFV signal. The choice may seem rather subjective, since any mode within the frequency range of interest can be used. The following criteria are proposed for this step in order to improve the reliability and robustness of MMPF results. The most important one is to ensure that the two chosen IMFs are matched, that is, the extracted fluctuations in BP and BFV correspond to the same physiologic process. In addition, it is better to choose a BP component that has reproducible patterns to minimize variability among different trials. For example, the initial MMPF study used the BP and BFV oscillations induced by interventions such as the VM [13], and recent studies used the spontaneous BP and BFV oscillations entrained by respiration [15, 16]. We will discuss these applications of the MMPF and its performance in Section 5.

3.4. Hilbert transform

The third major step of the MMPF analysis is to obtain instantaneous phases of the extracted BP and BFV oscillations (i.e., the IMFs corresponding to a specific physiologic process). Note that the extracted BP and BFV oscillations are not stationary, that is, their amplitude and frequency vary over time. Such nonstationary oscillations can be better characterized by analytical methods that can quantify the amplitude and phase (or frequency) at any given moment. Therefore, the MMPF uses the Hilbert transform to obtain instantaneous phases of BP and BFV oscillations. Unlike the Fourier transform, the Hilbert transform does not assume that signals are composed of superimposed sinusoidal oscillations with constant amplitude and frequency. Thus, the instantaneous phases obtained from the Hilbert transform are more suitable for the assessment of the nonlinear relationship between complex oscillations [60].

[Figure 3. Panel title: EEMD.]

Figure 3: (Left panel) A raw BFV signal and its decomposed empirical modes (i.e., c5–c9 components from bottom to top) obtained by the EEMD method. (Right panel) The corresponding short-time Fourier transform (STFT) spectrograms of the signals in the left panel. The spectrograms were calculated and plotted using the same procedure discussed in Figure 1. The noise ratio for the EEMD method is 0.2.

In order to obtain instantaneous phases with appropriate physical meaning, the Hilbert transform requires that an oscillatory signal be symmetric with respect to the local zero mean and that the numbers of zero crossings and extrema be the same. The intrinsic mode function derived from the EMD method satisfies this requirement (see Section 3.1). For a time series $s(t)$, its Hilbert transform is defined as

$$\tilde{s}(t) = \frac{1}{\pi}\, P \int \frac{s(t')}{t - t'}\, dt', \tag{4}$$

where $P$ denotes the Cauchy principal value. The Hilbert transform has an apparent physical meaning in Fourier space: for any positive (negative) frequency $f$, the Fourier component of the Hilbert transform $\tilde{s}(t)$ at this frequency $f$ can be obtained from the Fourier component of the original signal $s(t)$ at the same frequency $f$ after a 90° clockwise (anticlockwise) rotation in the complex plane. For example, if the original signal is $\cos(\omega t)$, its Hilbert transform will become $\cos(\omega t - 90^\circ) = \sin(\omega t)$. For any signal $s(t)$, the corresponding analytic signal can be constructed using its Hilbert transform and the original signal:

$$S(t) \equiv s(t) + i\,\tilde{s}(t) = A(t)\,e^{i\phi(t)}, \tag{5}$$

where $A(t)$ and $\phi(t)$ are the instantaneous amplitude and instantaneous phase of $s(t)$, respectively.
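The Fourier-space description above suggests a direct numerical construction of the analytic signal in (5): transform, suppress the negative frequencies, double the positive ones, and transform back. The following Python sketch does this with a plain O(N²) discrete Fourier transform so that it needs no external libraries; the function names are hypothetical, and a practical implementation would use an FFT.

```python
import cmath
import math


def analytic_signal(s):
    """Analytic signal s + i*H[s], computed through the discrete Fourier transform."""
    n = len(s)
    spec = [sum(s[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        # Keep the DC term (and the Nyquist term for even n) unchanged.
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue
        # Double positive frequencies, zero negative frequencies.
        spec[k] = 2 * spec[k] if k < (n + 1) // 2 else 0j
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_amplitude_phase(s):
    """Return A(t) and phi(t) (radians, wrapped to (-pi, pi]) as in Eq. (5)."""
    z = analytic_signal(s)
    return [abs(v) for v in z], [cmath.phase(v) for v in z]
```

For a sampled cosine with an integer number of cycles in the record, this returns unit amplitude and a phase that advances linearly with time, in agreement with the cos(ωt) example given after (4).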

In particular, the instantaneous BP and BFV phases are calculated on a sample-by-sample basis. The BP-BFV phase shift for each subject is calculated as the average of the instantaneous differences of BFV and BP phases over the entire baseline; averaging over a prolonged time period provides statistically robust phase estimates.

3.5. MMPF autoregulation indices

The last step of the MMPF is to derive indices of CA from the instantaneous phases of BP and BFV oscillations. It is believed that CA leads to fast recovery of BFV in response to BP fluctuations and, thus, the phases of BFV oscillations are advanced compared to BP phases. For simplicity of statistical analysis, the phase shift at the minimum and maximum of these two signals was originally used as the index of CA [13]. To provide statistically more robust phase estimates, the BP-BFV phase shift for each subject can be calculated as the average of instantaneous differences of BFV and BP phases over the course of the VM or spontaneous oscillations [16].
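A minimal Python sketch of the averaged phase-shift index follows. It assumes that the instantaneous phases are given in radians and that each instantaneous difference is wrapped into the interval [−180°, 180°) before the arithmetic mean is taken; the wrapping step and the function name are assumptions of this example, since the text only specifies an average of instantaneous differences.

```python
import math


def mean_phase_shift(phase_bfv, phase_bp):
    """Average BFV-minus-BP phase difference in degrees.

    A positive value means that the BFV oscillation is advanced relative to BP.
    """
    diffs = []
    for a, b in zip(phase_bfv, phase_bp):
        d = math.degrees(a - b)
        d = (d + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
        diffs.append(d)
    return sum(diffs) / len(diffs)
```

Wrapping matters because instantaneous phases obtained from the analytic signal jump by 360° once per cycle; without it, a constant lead of 40° would be averaged together with spurious values of −320°.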

4. COMPUTER-ASSISTED PROGRAM FOR MMPF ANALYSIS

To implement the steps in Sections 3.3–3.5 of the MMPF analysis, a software package was developed to load the decomposed intrinsic modes of BP and BFV signals, to allow the selection of BP and BFV components, and to calculate the MMPF autoregulation index (see Figure 4). In the previous version of the MMPF software, the selection of BP and BFV components was done manually, that is, a researcher picked an intrinsic mode after visualizing all components decomposed by the EMD or EEMD. The manual selection is useful, but it requires a full understanding of the MMPF algorithm and all technical details of the program execution. Moreover, the manual selection needs human input and is time consuming. Therefore, the best solution would be to enable a program-based automatic selection according to the criteria for mode selection described in Section 3.3. As a first step toward this goal, we have designed a computer-assisted program to select the respiratory-modulated oscillation from the decomposed IMFs. In this program, the STFT spectrogram analysis, a well-known method of time-frequency analysis, is performed for all decomposed modes (right panels of Figures 2 and 3). For each mode, the instantaneous mean frequency for each sliding window is obtained. The IMF with the mean frequency lying mostly in a selected frequency range (e.g., 0.1–0.4 Hz for spontaneous oscillations during baseline conditions) is automatically picked as the default mode to be used for the assessment of autoregulation. With the illustrated spectrograms, the default mode can also be manually verified or modified to ensure that the automated selection is appropriate. The same procedure is used to obtain both spontaneous oscillations in BP and the corresponding oscillations in BFV. Finally, the instantaneous BP and BFV phases are calculated using the Hilbert transform on a sample-by-sample basis. The instantaneous BP-BFV phase shift for each subject is averaged over 5 minutes and is used as an index of the dynamic CA.
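The selection rule described above can be sketched as follows. This Python example is only an illustration under assumptions: the mean frequency of each window is taken to be the power-weighted average of the spectrogram frequencies, and "mostly in a selected frequency range" is interpreted as the largest fraction of windows whose mean frequency falls inside the band. Both function names are hypothetical and do not refer to the software package discussed in this section.

```python
def spectral_mean_frequency(freqs, power):
    """Power-weighted mean frequency of one spectrogram column (one window)."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(f * p for f, p in zip(freqs, power)) / total


def select_mode(mean_freq_tracks, f_lo=0.1, f_hi=0.4):
    """Pick the mode whose windowed mean frequency stays in [f_lo, f_hi] Hz most often.

    mean_freq_tracks[k] is the list of per-window mean frequencies of mode k.
    Returns (mode index, fraction of windows inside the band).
    """
    best_idx, best_frac = None, -1.0
    for idx, track in enumerate(mean_freq_tracks):
        if not track:
            continue
        frac = sum(f_lo <= f <= f_hi for f in track) / len(track)
        if frac > best_frac:
            best_idx, best_frac = idx, frac
    return best_idx, best_frac
```

Returning the in-band fraction alongside the index gives a simple quantity that a user can inspect when verifying the default choice against the spectrograms.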

5. PERFORMANCE OF IMPROVED MMPF

5.1. Assessment of autoregulation in healthy control, hypertensive, and stroke subjects during resting condition

To test whether the MMPF can evaluate the dynamics of CA from spontaneous BP-BFV fluctuations during supine rest, our recent study compared the BP-BFV phase shifts obtained from BP and BFV oscillations introduced by the VM and from spontaneous BP-BFV oscillations during supine baseline [15]. Data of 12 control, 10 hypertensive, and 10 stroke subjects during the VM and the baseline resting condition were analyzed using the improved MMPF method. Spontaneous oscillations (period: mean ± SD, 15.7 ± 9.2 seconds) in the same frequency range as the VM oscillations (17.7 ± 7.9 seconds, paired t-test P = .37) were chosen. BP-BFV phase shifts during spontaneous oscillations (ranging from ∼−60 to 120 degrees) were highly correlated with those obtained from VM oscillations (left side middle cerebral arteries R = 0.92, P < .0001; right side R = 0.80, P < .0001) (see Figure 5). Consistently, the paired t-test showed that the average BP-BFV phase shifts during baseline were not statistically different from the values during the VM (P > .47). These results indicate that the MMPF method can enable reliable assessment of CA dynamics and its impairment under pathologic conditions using spontaneous BP-BFV fluctuations.

5.2. Measurement of cerebral autoregulation dynamics based on spontaneous oscillations entrained by respirations in diabetic subjects

In our recent study [16], the MMPF method was applied to study the relationship between spontaneous BP and BFV oscillations at the respiratory frequency (about 0.1–0.4 Hz) in healthy (control) and diabetic subjects. The results showed that in healthy subjects there were also specific phase shifts between spontaneous BP and BFV oscillations over this frequency range (0.1–0.4 Hz) and that the phase shifts were significantly reduced in patients with type 2 diabetes, indicating altered dynamics of the BP-BFV relationship and thus impairment of vasoregulation in diabetic subjects (see Figure 6). In contrast, transfer function analysis was unable to show any significant group differences in phase shifts between BP and BFV signals, either at frequencies <0.07 Hz, where CA is traditionally studied, or over the frequency range of 0.1–0.4 Hz (see Table 1). The sensitivity and specificity of the MMPF and transfer function measures were compared using receiver operating characteristic (ROC) analysis [61], that is, by comparing the areas under the ROC curves (AUC) for discriminating between the control and diabetes groups. The ROC analysis showed that the AUCs of the MMPF-based phase shifts (left: 0.94 ± 0.04; right: 0.87 ± 0.06) were larger than those obtained by applying transfer function analysis (left: 0.56 ± 0.09, P < .001; right: 0.56 ± 0.09, P = .003) (see Figure 7), indicating that the BP-BFV phase shifts may serve as a more sensitive biomarker for the diabetes mellitus (DM) group than the traditional transfer function phase.
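To make the AUC comparison above concrete, the sketch below computes an ROC area in its rank (Mann-Whitney) form: the probability that a randomly chosen DM subject scores higher than a randomly chosen control, counting ties as one half. The function name and the phase-shift values are hypothetical and are not data from [16]; because smaller phase shifts point to DM, the negated phase shift is used as the score.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve as P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical BP-BFV phase shifts in degrees (illustrative values only).
control_phase = [55.0, 48.0, 62.0, 40.0, 51.0]
dm_phase = [20.0, 35.0, 45.0, 15.0, 28.0]

# A smaller phase shift suggests DM, so score each subject by the negated shift.
auc = roc_auc([-x for x in dm_phase], [-x for x in control_phase])
print(round(auc, 2))  # 24 of the 25 DM/control pairs are ordered correctly: 0.96
```

An AUC of 0.5 corresponds to chance discrimination, which is the sense in which the transfer function values of about 0.56 reported above indicate a weak discriminator.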

6. DISCUSSION & CONCLUSION

6.1. Assessment of nonlinear interactions between nonstationary signals

Quantification of nonlinear interactions between two nonstationary signals presents a computational challenge in different research fields, especially for assessments of physiological systems. Computational approaches based on traditional theories and methods cannot resolve nonstationarity-related issues and cannot be used reliably to study these systems. One possible and promising approach is to adopt concepts and methods derived from nonlinear dynamics that are designed to explore nonlinear interactions in nonstationary systems. In the last two decades, nonlinear dynamic approaches have been applied in many different biological fields, such as the cardiovascular system, respiration, locomotor activity, and neuronal activity in the brain [11, 14, 62, 63]. It has been gradually accepted that nonlinear dynamic methods can provide new information about the control mechanisms of physiological systems that may be difficult to characterize using traditional approaches. In this review, we aim to demonstrate this point by discussing recent advances in the field of cerebral blood flow regulation and the contribution of a nonlinear dynamic approach as


[Figure 4 image: screen copy with left panels for BP, MCAR, and MCAL against Time (s) and a right panel plotting MCA phase against BP phase for MCAL and MCAR.]

Figure 4: Screen copy of the MMPF analysis software (adapted from [15]). The data shown in this plot are from a healthy subject. The top three panels on the left show BFV (left side and right side) and BP signals, respectively. The colored curves in these panels show the results after removing faster fluctuations from the original signals. The bottom left panel shows the corresponding intrinsic modes for these three signals (red: BP; blue: BFV on right side; green: BFV on left side). The vertical red dashed box (around 40–50 seconds) identifies part of the VM period. The spontaneous oscillations in these signals during resting conditions prior to the VM can also be visualized. One of these oscillations (around 14–22 seconds) is identified by two vertical red lines. The result of the BP-BFV phase shift analysis of this period is plotted in the right panel. A reference line (dotted black line), indicating synchronization between BP and BFV, is shown in this panel for easy comparison. The result is representative of normal autoregulation where BFV leads BP (by about 50 degrees in phase).

represented by the multimodal pressure flow method (as discussed in the following sections). Though the MMPF method has mainly been applied to assess cerebral autoregulation, the concept of this approach is generally applicable to other physiological controls that involve interactions between two nonstationary signals. Designing and improving such approaches is crucial for tackling the generic problem of nonstationarity.

6.2. Assessment of autoregulation from spontaneous BP and BFV oscillations

Autoregulatory responses are assessed by challenging the cerebrovascular system using interventions such as the VM, thigh cuff deflation, and head-up tilt [26–31, 64]. However, these intervention procedures may introduce large intracranial pressure fluctuations and require patients’ active cooperation. Therefore, they are not generally applicable in acute care clinical settings. In recent studies, an improved MMPF method was introduced to quantify the BP-BFV relationship in healthy, hypertensive, and stroke subjects during supine resting conditions [15]. The results support the notion that autoregulation is a dynamic process that is always engaged, even during resting conditions. Dynamic autoregulation is needed for continuous adjustment of cerebral perfusion in response to variations in autonomic cardiovascular and respiratory control (e.g., respiration, heart rate, blood pressure, vascular tone). Furthermore, by applying the method to healthy and diabetic subjects, we showed that the cerebral vasoregulatory processes that control the pressure-flow relationship can operate at shorter time scales (<10 seconds) than previously suggested (see Figure 6).

In this review, we also introduced new results that represent a significant improvement of the MMPF method through an automated mode selection algorithm based on time-frequency analysis. This approach allows


[Figure 5 image: (a) left and (b) right MCA scatter plots of VM BP-BFV phase shift (degrees) against baseline BP-BFV phase shift (degrees) for control, HTN, and stroke subjects; (c) and (d) group comparisons of phase shift for control, HTN, and stroke.]

Figure 5: Comparison of the BP-BFV phase shift during two different conditions and between control, hypertensive (HTN), and stroke groups. (a)-(b) (adapted from [15]). For each subject in this study, BP-BFV phase shifts for left (a) and right (b) side middle cerebral arteries (MCAs) were measured during the Valsalva maneuver (VM) and during supine baseline conditions. The straight line is the linear regression fit of the data. The phase shifts during VM and baseline showed a strong correlation (left R = 0.92, P < .0001; right R = 0.8, P < .0001). (c)-(d). BP-BFV phase shifts during VM were smaller in hypertensive and stroke groups than in control group in both left and right MCAs (HTN: left P = .01, right P = .02; Stroke: left P = .003, right P = .003).

objective mode selection based on time-frequency measures. Thus, the MMPF software is now more user-friendly and does not require computational knowledge to implement the MMPF technique for clinical evaluations.

Unlike traditional Fourier transform based approaches, the MMPF method does not assume that the BP and BFV signals are superimposed sinusoidal oscillations of constant amplitude and period in a preset frequency range. Instead, the method adopts a new adaptive signal processing algorithm, EEMD, to extract dominant spontaneous oscillations that are actually embedded in the BP and BFV fluctuations. Since spontaneous oscillations that are related to a specific physiological process are usually nonstationary (i.e., statistical properties such as mean level and oscillation period vary over time and differ between subjects), conventional filters based on Fourier or wavelet theories are not reliable or valid for extracting embedded spontaneous oscillations from the BP and BFV signals. In this paper, we demonstrated that the EEMD can accurately extract oscillations associated with respiration from nonstationary BP and BFV signals. This result indicates that the EEMD can serve as a blind time-variant filter to extract the embedded nonstationary oscillations adaptively. Studying spontaneous BP and BFV oscillations extracted by the EEMD method revealed advanced phases in BFV compared to those in BP; that is, flow oscillations preceded systemic pressure oscillations. These BP-BFV phase shifts were similar to those observed during the VM at the BP minimum and maximum [13]. Such a positive phase shift has also been reported using Fourier transform methods during head-up tilt and is interpreted as the faster recovery of BFV caused by the compensation of cerebral vasoregulation [30]. In our


[Figure 6 image: (a) Control and (b) DM, each with panels for BP (mmHg), BFVL (cm/s), BFVR (cm/s), extracted components, and BFV-BP phase shift (degrees) for left and right sides against Time (seconds); (c) BP-BFV phase shift (degrees) by subject for left and right sides, control versus DM (P < .0001 on both sides).]

Figure 6: Spontaneous oscillations of blood pressure (BP) and cerebral blood flow velocity (BFV) in (a) a 72-year-old healthy control woman and (b) a 52-year-old man with type 2 diabetes during supine baseline. Figure 6(a) was adapted from [16]. BP, left and right BFVs (panels 1 to 3 in (a) and (b)) were decomposed into different modes using the ensemble empirical mode decomposition algorithm, each mode corresponding to fluctuations at a different time scale. The components corresponding to respirations at frequencies ranging from about 0.1 to 0.4 Hz (the fourth panels in (a) and (b)) were extracted and used for the assessment of the BP-BFV relationship. Instantaneous phases of BP and BFV oscillations (solid lines in the bottom panels of (a) and (b)) were obtained using the Hilbert transform. There were large time/phase delays in BP oscillations compared to the BFV oscillations. For each subject, the average BFV-BP phase shift (horizontal dashed lines in bottom panels of (a) and (b)) was obtained as the average of instantaneous BFV-BP phase shifts during the entire 5-min supine baseline. (c) Phase shifts between spontaneous oscillations of BP and BFV were much smaller in the diabetes group than in the healthy control group (P < .0001). The group averages of control and diabetes are shown in blue symbols with error bars as the standard deviations. There was no significant difference in phase shifts between left and right blood flow velocities in both control and diabetes groups.
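The phase computation summarized in the Figure 6 caption can be sketched as follows. This is a minimal illustration under stated assumptions rather than the MMPF implementation: the analytic signal is built with the usual FFT construction of the discrete Hilbert transform, the function names are hypothetical, and the two input modes are synthetic sinusoids with a known 50-degree lead. Differences are wrapped to (−180, 180] degrees before averaging, which is adequate only when the shifts stay away from ±180 degrees.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence via the FFT
    (zero the negative frequencies, double the positive ones)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def mean_phase_shift_deg(bfv_mode, bp_mode):
    """Average of the sample-by-sample (BFV phase - BP phase), in degrees."""
    phase_bfv = np.angle(analytic_signal(bfv_mode))
    phase_bp = np.angle(analytic_signal(bp_mode))
    diff = np.angle(np.exp(1j * (phase_bfv - phase_bp)))  # wrap to (-pi, pi]
    return float(np.degrees(diff.mean()))

# Synthetic 5-minute modes (hypothetical): 0.25 Hz oscillations at 10 Hz sampling,
# with the BFV mode leading the BP mode by 50 degrees.
fs = 10.0
t = np.arange(3000) / fs
bp_mode = np.sin(2 * np.pi * 0.25 * t)
bfv_mode = np.sin(2 * np.pi * 0.25 * t + np.radians(50.0))
shift = mean_phase_shift_deg(bfv_mode, bp_mode)
print(round(shift, 3))  # close to 50 degrees for this synthetic pair
```

With real modes, edge effects of the Hilbert transform near the start and end of the record would also need attention; they are absent here only because the synthetic signals are exactly periodic over the record.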


Table 1: Transfer function results. Adapted from [16]. P values indicate between-group comparisons.

                    0.01–0.07 Hz                              0.1–0.4 Hz
Group               Control       Diabetes      P             Control       Diabetes      P
                    (n = 20)      (n = 20)                    (n = 20)      (n = 20)
Coherence (left)    0.47 ± 0.12   0.54 ± 0.15   .12           0.71 ± 0.13   0.60 ± 0.18   .05
Coherence (right)   0.45 ± 0.11   0.50 ± 0.17   .25           0.70 ± 0.12   0.58 ± 0.17   .02
Gain (left)         0.67 ± 0.42   0.67 ± 0.42   .98           1.07 ± 0.27   0.68 ± 0.34   .0003
Gain (right)        0.65 ± 0.43   0.59 ± 0.36   .64           1.01 ± 0.33   0.63 ± 0.34   .0006
Phase (left)        36.9 ± 32.1   44.3 ± 32.5   .49           20.6 ± 8.8    19.5 ± 10.4   .73
Phase (right)       44.6 ± 29.9   38.5 ± 39.4   .57           21.3 ± 11.8   22.2 ± 9.6    .79

study, we showed that BP-BFV phase shifts of spontaneous oscillations in hypertensive and stroke subjects were significantly reduced compared to those in healthy subjects, as shown by previous studies during the VM [13]. Therefore, the BP-BFV phase shifts derived from spontaneous oscillations can also be used as an indicator of dynamic CA.

6.3. Frequency dependence of cerebral autoregulation

It has been proposed that autoregulatory mechanisms act as a high-pass filter (the cybernetic model) [35, 37], being more active at lower frequencies (<0.1 Hz) and less effective for faster spontaneous fluctuations and at the respiration frequency. Though there is no established physiologic neural pathway that can account for the high-pass filter mechanism, the frequency-dependent influence of CA has been supported by many studies based on transfer function analysis [39, 40, 42, 65]. It is important to note that the coherence, gain, and phase of the transfer function are continuous functions of frequency and do not exhibit an apparent transition point at a specific frequency. Thus, the frequency-dependent influence of CA, as suggested by the model and the transfer function results, does not indicate a cutoff frequency beyond which CA has no influence on blood flow regulation. Nevertheless, many studies used about 0.1 Hz as an upper frequency boundary for the transfer function analysis; such a choice of frequency range for the estimation of CA seems rather arbitrary. Since previous studies showed that the blood flow level after an induced sudden blood reduction can be restored within 3–6 seconds (corresponding to 0.16–0.33 Hz in the frequency domain) [66, 67], there is no reason to rule out that CA can modulate the relationship between BP and BFV at frequencies faster than 0.1 Hz. Indeed, there are already studies indicating that BP and BFV oscillations at frequencies faster than 0.1 Hz may also provide useful information on CA [14, 68].
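The correspondence between the 3–6 second recovery time and the quoted frequency band follows from treating the recovery time T as the period of one oscillation, which is the reading the text appears to use (an interpretive assumption here):

```latex
f = \frac{1}{T}, \qquad
T = 6\ \text{s} \;\Rightarrow\; f = \tfrac{1}{6}\ \text{Hz} \approx 0.167\ \text{Hz}, \qquad
T = 3\ \text{s} \;\Rightarrow\; f = \tfrac{1}{3}\ \text{Hz} \approx 0.333\ \text{Hz}.
```

Both values lie above the 0.1 Hz boundary (a period of 10 seconds) and inside the 0.1–0.4 Hz respiratory band; the quoted 0.16–0.33 Hz is the same range with the values truncated to two decimals.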

Moreover, transfer function analysis is based on the Fourier transform, which implicitly assumes stationary signals composed of sinusoidal oscillations of constant amplitude and period. However, real-world recordings, such as BP and BFV signals, are usually nonstationary and exhibit dynamic changes over time (e.g., shifts of respiratory frequency, occurrence of spontaneous waves). Therefore, a single transfer function may not be sensitive enough to identify the influence of CA on the relationship between the BP and BFV oscillations at all time scales.
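For readers less familiar with the quantities involved, the sketch below shows one common way to estimate the gain, phase, and coherence of a transfer function from BP to BFV: Welch-style averaging of Hann-windowed, half-overlapping segments. It is a generic illustration, not the procedure of the cited studies; the function name, the 100-second segment length, and the synthetic signals (a 0.25 Hz oscillation with a gain of 0.8 and a 40-degree lead, plus noise) are assumptions.

```python
import numpy as np

def transfer_function(bp, bfv, fs, seg_s=100.0):
    """Welch-style estimates of gain, phase (degrees), and coherence from bp to bfv."""
    bp = np.asarray(bp, dtype=float)
    bfv = np.asarray(bfv, dtype=float)
    n_seg = int(round(seg_s * fs))
    step = n_seg // 2                                  # 50% overlap between segments
    window = np.hanning(n_seg)
    freqs = np.fft.rfftfreq(n_seg, d=1.0 / fs)
    sxx = np.zeros(freqs.size)                         # BP auto-spectrum (unnormalized)
    syy = np.zeros(freqs.size)                         # BFV auto-spectrum (unnormalized)
    sxy = np.zeros(freqs.size, dtype=complex)          # BP-to-BFV cross-spectrum
    for start in range(0, bp.size - n_seg + 1, step):
        x = bp[start:start + n_seg]
        y = bfv[start:start + n_seg]
        fx = np.fft.rfft((x - x.mean()) * window)
        fy = np.fft.rfft((y - y.mean()) * window)
        sxx += np.abs(fx) ** 2
        syy += np.abs(fy) ** 2
        sxy += np.conj(fx) * fy
    h = sxy / sxx                                      # common scale factors cancel
    coherence = np.abs(sxy) ** 2 / (sxx * syy)
    return freqs, np.abs(h), np.degrees(np.angle(h)), coherence

# Synthetic 10-minute recordings (hypothetical), sampled at 10 Hz.
fs, f0 = 10.0, 0.25
t = np.arange(6000) / fs
rng = np.random.default_rng(0)
bp = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(t.size)
bfv = 0.8 * np.sin(2 * np.pi * f0 * t + np.radians(40.0)) + 0.1 * rng.standard_normal(t.size)

freqs, gain, phase, coh = transfer_function(bp, bfv, fs)
k = int(np.argmin(np.abs(freqs - f0)))                 # frequency bin nearest 0.25 Hz
print(freqs[k], gain[k], phase[k], coh[k])
```

Each estimate is a single average over the whole record, which is why a transfer function can blur changes that occur within it; band values such as those in Table 1 would be obtained by averaging the estimates over the bins in 0.01–0.07 Hz or 0.1–0.4 Hz.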

It is intriguing that the MMPF analysis revealed a specific phase shift between BP and BFV oscillations in the frequency range of about 0.1–0.4 Hz in control subjects and that this phase shift was significantly reduced in diabetic subjects. These findings strongly support the view that CA is a continuous dynamic process, influencing the BP-BFV relationship over a frequency range (>0.1 Hz) that is beyond the ranges previously recognized. However, transfer function analysis could not identify this alteration in the BP-BFV phase relationship in diabetic subjects in this frequency range, suggesting that the inherent nonlinearities of CA may be better described by nonlinear methods such as the MMPF and multivariate coherence, an approach that takes into account contributions of other inputs, for example, pressure and cerebrovascular resistances [46].

6.4. Comparison of the MMPF method and traditional CA approaches

The observation that the MMPF can, but transfer function analysis (TFA) cannot, show a difference in the phase relation between systemic BP and BFV in type 2 diabetes may lead to the following explanations: (1) TFA quantifies the pressure-flow relationship in a specific frequency range, while MMPF is not frequency dependent. Therefore, the two methods may quantify different aspects of the underlying mechanisms responsible for blood flow regulation. (2) The sensitivities of the two methods differ, so that their performance in a small sample of subjects can differ. As shown by previous studies, both TFA and MMPF can identify alterations in blood flow regulation in pathologic conditions such as stroke, hypertension, and traumatic brain injury that are associated with impaired autoregulation. These findings indicate that both methods can quantify CA using BP and cerebral BFV but do not explain the different results in diabetic patients. The second possibility comes from the fact that TFA usually focuses on frequencies below 0.1 Hz while MMPF does not assume a frequency range; that is, MMPF extracts the dominant oscillations that are truly embedded in the data. Thus, the optimal frequency range for distinguishing controls from diabetics in the blood pressure and blood flow relationship is not known. In this study, we found that there were no group differences


[Figure 7 image: ROC curves, sensitivity against 1-specificity, for (a) left and (b) right sides, comparing MMPF and transfer function.]

Figure 7: Receiver operating characteristic (ROC) curves for the DM prediction using BP-BFV phase shifts obtained from the MMPF method and using transfer function phases (0.1–0.4 Hz) (adapted from [16]). The y-axis is the sensitivity, representing the percentage of DM subjects identified; the x-axis is 1-specificity, that is, the percentage of control subjects that are incorrectly identified as DM subjects. The areas under the ROC curves (AUC) are closer to 1.0 for the BP-BFV phase shifts, indicating that the MMPF measure serves as a better discriminator between the control and DM groups than traditional transfer function analysis.

in TFA results in the frequency range 0.01–0.07 Hz (in which CA was traditionally believed to affect the pressure-flow relationship). The frequency of the dominant oscillations in blood pressure and flow extracted by MMPF was from 0.1 to 0.4 Hz. However, the BP-BFV phase obtained from TFA for the frequency range 0.1–0.4 Hz showed no difference between controls and diabetic subjects either (see Table 1). This finding refutes the notion that the differences in the results detected by TFA and MMPF are merely due to differences in frequency range. Therefore, the difference in sensitivity between the two methods offers an explanation for the discrepancy in the CA estimates in diabetic patients. Consistently, we found that the MMPF-derived BP-BFV phase shift performed better in discriminating between control subjects and subjects with type 2 diabetes (see Figure 7). The different results obtained from the two analyses may not be surprising, because the BP-BFV phase shifts of transfer function analysis are based on the Fourier transform, which is not applicable to nonstationary BP and BFV signals and a nonlinear BP-BFV relationship. Comparisons of MMPF and TFA performance were done only using data obtained from patients with type 2 diabetes. It would be desirable to further establish the reliability and repeatability of these methods in other pathological conditions that are known to impair cerebral autoregulation.

This review focused on the MMPF method. There are other approaches from nonlinear dynamics, such as the phase synchronization technique [14], multiple multivariate coherence [46], and general Volterra-Wiener approaches [44, 45, 47], that have been used to quantify cerebral autoregulation but could not be covered in this short review. More systematic studies are necessary to evaluate the advantages and disadvantages of these innovative methods under different physiological and pathological conditions.

In conclusion, CA dynamics can be reliably estimated from spontaneous BP and BFV fluctuations during baseline resting conditions, and the BFV-BP phase shift obtained by the improved MMPF method is a sensitive and reliable measure of blood flow regulation that can potentially be used to monitor autoregulation in subjects with cerebromicrovascular diseases.

ABBREVIATIONS

MMPF: Multimodal pressure flow method;
EMD: Empirical mode decomposition;
EEMD: Ensemble empirical mode decomposition;
IMF: Intrinsic mode functions;
BP: Blood pressure;
BFV: Blood flow velocity;
VM: Valsalva maneuver;
TCD: Transcranial Doppler;
CA: Cerebral autoregulation.

ACKNOWLEDGMENTS

This study was supported by an American Diabetes Association Grant 1-03-CR-23 to V. Novak, an NIH Older American Independence Center Grant AG08812, NIH Program projects AG004390 and NS045745, NIH-NINDS STTR grant NS053128 in collaboration with DynaDx, Inc., a CIMIT New Concept Grant (W81XWH), a General Clinical Research Center (GCRC) Grant MO1-RR01302, the James S. McDonnell Foundation, the Ellison Medical Foundation Senior Scholar in Aging Award, the G. Harold and Leila Y. Mathers Charitable Foundation, the Defense Advanced Research Projects Agency, and the NIH/National Center for Research Resources (P41RR013622). M.-T. Lo gratefully acknowledges support from the NCU plan to develop a first-class university and top-level research centers (Grant 965941). The authors thank Steven Lin and Ary Goldberger for their helpful comments and Chris Peng for assistance with data processing.

REFERENCES

[1] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, Cambridge, UK, 1997.

[2] G. M. Viswanathan, C.-K. Peng, H. E. Stanley, and A. L. Goldberger, “Deviations from uniform power law scaling in nonstationary time series,” Physical Review E, vol. 55, no. 1, pp. 845–849, 1997.

[3] P. Bernaola-Galvan, P. Ch. Ivanov, L. A. Nunes Amaral, and H. E. Stanley, “Scale invariance in the nonstationarity of human heart rate,” Physical Review Letters, vol. 87, no. 16, Article ID 168105, 4 pages, 2001.

[4] J. J. Collins and I. N. Stewart, “Symmetry-breaking bifurcation: a possible mechanism for 2:1 frequency-locking in animal locomotion,” Journal of Mathematical Biology, vol. 30, no. 8, pp. 827–838, 1992.

[5] J. J. Collins and C. J. De Luca, “Random walking during quiet standing,” Physical Review Letters, vol. 73, no. 5, pp. 764–767, 1994.

[6] K. Hu, P. Ch. Ivanov, Z. Chen, M. F. Hilton, H. E. Stanley, and S. A. Shea, “Non-random fluctuations and multi-scale dynamics regulation of human activity,” Physica A, vol. 337, no. 1-2, pp. 307–318, 2004.

[7] C.-K. Peng, J. Mietus, J. M. Hausdorff, S. Havlin, H. E. Stanley, and A. L. Goldberger, “Long-range anticorrelations and non-Gaussian behavior of the heartbeat,” Physical Review Letters, vol. 70, no. 9, pp. 1343–1346, 1993.

[8] M. Costa, A. L. Goldberger, and C.-K. Peng, “Multiscale entropy to distinguish physiologic and synthetic RR time series,” Computers in Cardiology, vol. 29, pp. 137–140, 2002.

[9] B. Pompe, P. Blidh, D. Hoyer, and M. Eiselt, “Using mutual information to measure coupling in the cardiorespiratory system,” IEEE Engineering in Medicine and Biology Magazine, vol. 17, no. 6, pp. 32–39, 1998.

[10] D. Hoyer, R. Bauer, B. Walter, and U. Zwiener, “Estimation of nonlinear couplings on the basis of complexity and predictability - a new method applied to cardiorespiratory coordination,” IEEE Transactions on Biomedical Engineering, vol. 45, no. 5, pp. 545–552, 1998.

[11] C. Schafer, M. G. Rosenblum, H.-H. Abel, and J. Kurths, “Synchronization in the human cardiorespiratory system,” Physical Review E, vol. 60, no. 1, pp. 857–870, 1999.

[12] K. H. Chon, Y.-M. Chen, N.-H. Holstein-Rathlou, and V. Z. Marmarelis, “Nonlinear system analysis of renal autoregulation in normotensive and hypertensive rats,” IEEE Transactions on Biomedical Engineering, vol. 45, no. 3, pp. 342–353, 1998.

[13] V. Novak, A. C. C. Yang, L. Lepicovsky, A. L. Goldberger, L. A. Lipsitz, and C.-K. Peng, “Multimodal pressure-flow method to assess dynamics of cerebral autoregulation in stroke and hypertension,” BioMedical Engineering Online, vol. 3, article 39, 2004.

[14] Z. Chen, K. Hu, H. E. Stanley, V. Novak, and P. Ch. Ivanov, “Cross-correlation of instantaneous phase increments in pressure-flow fluctuations: applications to cerebral autoregulation,” Physical Review E, vol. 73, no. 3, Article ID 031915, 14 pages, 2006.

[15] K. Hu, C.-K. Peng, M. Czosnyka, P. Zhao, and V. Novak, “Nonlinear assessment of cerebral autoregulation from spontaneous blood pressure and cerebral blood flow fluctuations,” Cardiovascular Engineering, vol. 8, no. 1, pp. 60–71, 2008.

[16] K. Hu, C.-K. Peng, N. E. Huang, et al., “Altered phase interactions between spontaneous blood pressure and flow fluctuations in type 2 diabetes mellitus: nonlinear assessment of cerebral autoregulation,” Physica A, vol. 387, no. 10, pp. 2279–2292, 2008.

[17] K. Hu, P. Ch. Ivanov, M. F. Hilton, et al., “Endogenous circadian rhythm in an index of cardiac vulnerability independent of changes in behavior,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18223–18227, 2004.

[18] K. Hu, F. A. J. L. Scheer, P. Ch. Ivanov, R. M. Buijs, and S. A. Shea, “The suprachiasmatic nucleus functions beyond circadian rhythm generation,” Neuroscience, vol. 149, no. 3, pp. 508–517, 2007.

[19] P. Ch. Ivanov, K. Hu, M. F. Hilton, S. A. Shea, and H. E. Stanley, “Endogenous circadian rhythm in human motor activity uncoupled from circadian influences on cardiac dynamics,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 52, pp. 20702–20707, 2007.

[20] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.


[22] R. Maestri, G. D. Pinna, A. Accardo, et al., “Nonlinear indices of heart rate variability in chronic heart failure patients: redundancy and comparative clinical value,” Journal of Cardiovascular Electrophysiology, vol. 18, no. 4, pp. 425–433, 2007.


[24] C. M. Sweeney-Reed and S. J. Nasuto, “A novel approach to the detection of synchronisation in EEG based on empirical mode decomposition,” Journal of Computational Neuroscience, vol. 23, no. 1, pp. 79–111, 2007.

[25] R. Aaslid, “Cerebral hemodynamics,” in Transcranial Doppler, D. W. Newell and R. Aaslid, Eds., pp. 49–55, Raven Press, New York, NY, USA, 1992.

[26] R. B. Panerai, “Assessment of cerebral pressure autoregulation in humans - a review of measurement methods,” Physiological Measurement, vol. 19, no. 3, pp. 305–338, 1998.

[27] V. Novak, J. M. Spies, P. Novak, B. R. McPhee, T. A. Rummans, and P. A. Low, “Hypocapnia and cerebral hypoperfusion in orthostatic intolerance,” Stroke, vol. 29, no. 9, pp. 1876–1881, 1998.

[28] V. Novak, A. Chowdhary, B. Farrar, et al., “Altered cerebral vasoregulation in hypertension and stroke,” Neurology, vol. 60, no. 10, pp. 1657–1663, 2003.

[29] S. L. Dawson, R. B. Panerai, and J. F. Potter, “Critical closing pressure explains cerebral hemodynamics during the Valsalva maneuver,” Journal of Applied Physiology, vol. 86, no. 2, pp. 675–680, 1999.

[30] R. B. Panerai, S. L. Dawson, P. J. Eames, and J. F. Potter, “Cerebral blood flow velocity response to induced and spontaneous sudden changes in arterial blood pressure,” American Journal of Physiology, vol. 280, no. 5, pp. H2162–H2174, 2001.

[31] B. J. Carey, R. B. Panerai, and J. F. Potter, “Effect of aging on dynamic cerebral autoregulation during head-up tilt,” Stroke, vol. 34, no. 8, pp. 1871–1875, 2003.

[32] R. Cavestri, L. Radice, F. Ferrarini, et al., “CBF side-to-side asymmetries in stenosis-occlusion of internal carotid artery. Relevance of CT findings and collateral supply,” Italian Journal of Neurological Sciences, vol. 12, no. 5, pp. 383–388, 1991.

[33] G. Russo, R. de Falco, E. Scarano, A. Cigliano, and G. Profeta, “Non invasive recording of CO2 cerebrovascular reactivity in normal subjects and patients with unilateral internal carotid artery stenosis,” Journal of Neurosurgical Sciences, vol. 38, no. 3, pp. 147–153, 1994.

[34] M. Silvestrini, F. Vernieri, P. Pasqualetti, et al., “Impaired cerebral vasoreactivity and risk of stroke in patients with asymptomatic carotid artery stenosis,” Journal of the American Medical Association, vol. 283, no. 16, pp. 2122–2127, 2000.

[35] R. R. Diehl, D. Linden, D. Lucke, and P. Berlit, “Phase relationship between cerebral blood flow velocity and blood pressure: a clinical test of autoregulation,” Stroke, vol. 26, no. 10, pp. 1801–1804, 1995.

[36] A. A. Birch, M. J. Dirnhuber, R. Hartley-Davies, F. Iannotti, and G. Neil-Dwyer, “Assessment of autoregulation by means of periodic changes in blood pressure,” Stroke, vol. 26, no. 5, pp. 834–837, 1995.

[37] R. R. Diehl, D. Linden, D. Lucke, and P. Berlit, “Spontaneous blood pressure oscillations and cerebral autoregulation,” Clinical Autonomic Research, vol. 8, no. 1, pp. 7–12, 1998.

[38] A. P. Blaber, R. L. Bondar, F. Stein, et al., “Transfer function analysis of cerebral autoregulation dynamics in autonomic failure patients,” Stroke, vol. 28, no. 9, pp. 1686–1692, 1997.

[39] R. Zhang, J. H. Zuckerman, C. A. Giller, and B. D. Levine, “Transfer function analysis of dynamic cerebral autoregulation in humans,” American Journal of Physiology, vol. 274, no. 1, pp. H233–H241, 1998.

[40] C. Haubrich, A. Wendt, R. R. Diehl, and C. Klotzsch, “Dynamic autoregulation testing in the posterior cerebral artery,” Stroke, vol. 35, no. 4, pp. 848–852, 2004.

[41] C. A. Giller, “The frequency-dependent behavior of cerebral autoregulation,” Neurosurgery, vol. 27, no. 3, pp. 362–368, 1990.

[42] C. A. Giller and D. G. Iacopino, “Use of middle cerebral velocity and blood pressure for the analysis of cerebral autoregulation at various frequencies: the coherence index,” Neurological Research, vol. 19, no. 6, pp. 634–640, 1997.

[43] C. Haubrich, A. Klemm, R. R. Diehl, W. Moller-Hartmann, and C. Klotzsch, “M-wave analysis and passive tilt in patients with different degrees of carotid artery disease,” Acta Neurologica Scandinavica, vol. 109, no. 3, pp. 210–216, 2004.

[44] G. D. Mitsis, R. Zhang, B. D. Levine, and V. Z. Marmarelis, “Modeling of nonlinear physiological systems with fast and slow dynamics. II. Application to cerebral autoregulation,” Annals of Biomedical Engineering, vol. 30, no. 4, pp. 555–565, 2002.

[45] G. D. Mitsis, M. J. Poulin, P. A. Robbins, and V. Z. Marmarelis, “Nonlinear modeling of the dynamic effects of arterial pressure and CO2 variations on cerebral blood flow in healthy humans,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 11, pp. 1932–1943, 2004.

[46] R. B. Panerai, P. J. Eames, and J. F. Potter, “Multiple coherence of cerebral blood flow velocity in humans,” American Journal of Physiology, vol. 291, no. 1, pp. H251–H259, 2006.

[47] G. D. Mitsis, R. Zhang, B. D. Levine, and V. Z. Marmarelis, “Cerebral hemodynamics during orthostatic stress assessed by nonlinear modeling,” Journal of Applied Physiology, vol. 101, no. 1, pp. 354–366, 2006.

[48] R. R. Diehl, B. Diehl, M. Sitzer, and M. Hennerici, “Spontaneous oscillations in cerebral blood flow velocity in normal humans and in patients with carotid artery disease,” Neuroscience Letters, vol. 127, no. 1, pp. 5–8, 1991.

[49] J. M. Karemaker, “Analysis of blood pressure and heart rate variability: theoretical considerations,” in Clinical Autonomic Disorders: Evaluation and Management, P. A. Low, Ed., pp. 309–322, Lippincott-Raven, Philadelphia, Pa, USA, 2nd edition, 1997.

[50] T. B.-J. Kuo, C.-M. Chern, W.-Y. Sheng, W.-J. Wong, and H.-H. Hu, “Frequency domain analysis of cerebral blood flow velocity and its correlation with arterial blood pressure,” Journal of Cerebral Blood Flow & Metabolism, vol. 18, no. 3, pp. 311–318, 1998.

[51] R. I. Kitney, T. Fulton, A. H. McDonald, and D. A. Linkens, “Transient interactions between blood pressure, respiration and heart rate in man,” Journal of Biomedical Engineering, vol. 7, no. 3, pp. 217–224, 1985.

[52] V. Novak, P. Novak, J. de Champlain, A. R. Le Blanc, R. Martin, and R. Nadeau, “Influence of respiration on heart rate and blood pressure fluctuations,” Journal of Applied Physiology, vol. 74, no. 2, pp. 617–626, 1993.

[53] P. J. Eames, M. J. Blake, S. L. Dawson, R. B. Panerai, and J. F. Potter, “Dynamic cerebral autoregulation and beat to beat blood pressure control are impaired in acute ischaemic stroke,” Journal of Neurology Neurosurgery and Psychiatry, vol. 72, no. 4, pp. 467–472, 2002.

[54] S. L. Dawson, R. B. Panerai, and J. F. Potter, “Serial changes in static and dynamic cerebral autoregulation after acute ischaemic stroke,” Cerebrovascular Diseases, vol. 16, no. 1, pp. 69–75, 2003.

[55] J. Kwan, M. Lunt, and D. Jenkinson, “Assessing dynamic cerebral autoregulation after stroke using a novel technique of combining transcranial Doppler ultrasonography and rhythmic handgrip,” Blood Pressure Monitoring, vol. 9, no. 1, pp. 3–8, 2004.

[56] D. N. W. Griffith, S. Saimbi, C. Lewis, S. Tolfree, and D. J. Betteridge, “Abnormal cerebrovascular carbon dioxide reactivity in people with diabetes,” Diabetic Medicine, vol. 4, no. 3, pp. 217–220, 1987.

[57] B. Zvan, M. Zaletel, J. P. Oblak, T. Pogacnik, and T. Kiauta, “The middle cerebral artery flow velocities during head-up tilt testing in diabetic patients with autonomic nervous system dysfunction,” Cerebrovascular Diseases, vol. 15, no. 4, pp. 270–275, 2003.

[58] L. A. Lipsitz, S. Mukai, J. Hamner, M. Gagnon, and V. L. Babikian, “Dynamic regulation of middle cerebral artery blood flow velocity in aging and hypertension,” Stroke, vol. 31, no. 8, pp. 1897–1903, 2000.


[59] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Tech. Rep. 193, Centre for Ocean-Land-Atmosphere Studies, Calverton, Md, USA, 2005.

[60] D. Gabor, “Theory of communication,” Journal of the IEE, vol. 93, part 3, no. 26, pp. 429–457, 1946.

[61] M. H. Zweig and G. Campbell, “Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine,” Clinical Chemistry, vol. 39, no. 4, pp. 561–577, 1993.

[62] D. M. Bramble and D. R. Carrier, “Running and breathing in mammals,” Science, vol. 219, no. 4582, pp. 251–256, 1983.

[63] P. Tass, M. G. Rosenblum, J. Weule, et al., “Detection of n : m phase locking from noisy data: application to magnetoencephalography,” Physical Review Letters, vol. 81, no. 15, pp. 3291–3294, 1998.

[64] F. P. Tiecks, A. M. Lam, R. Aaslid, and D. W. Newell, “Comparison of static and dynamic cerebral autoregulation measurements,” Stroke, vol. 26, no. 6, pp. 1014–1019, 1995.

[65] J. W. Hammer, M. A. Cohen, S. Mukai, L. A. Lipsitz, and J. A. Taylor, “Spectral indices of human cerebral blood flow control: responses to augmented blood pressure oscillations,” Journal of Physiology, vol. 559, no. 3, pp. 965–973, 2004.

[66] L. Symon, K. Held, and N. W. Dorsch, “A study of regional autoregulation in the cerebral circulation to increased perfusion pressure in normocapnia and hypercapnia,” Stroke, vol. 4, no. 2, pp. 139–147, 1973.

[67] R. Aaslid, K.-F. Lindegaard, W. Sorteberg, and H. Nornes, “Cerebral autoregulation dynamics in humans,” Stroke, vol. 20, no. 1, pp. 45–52, 1989.

[68] R. B. Panerai, J. M. Rennie, A. W. R. Kelsall, and D. H. Evans, “Frequency-domain analysis of cerebral autoregulation from spontaneous fluctuations in arterial blood pressure,” Medical & Biological Engineering & Computing, vol. 36, no. 3, pp. 315–322, 1998.


Research Article

Speech Enhancement via EMD

Kais Khaldi,1, 2 Abdel-Ouahab Boudraa,2, 3 Abdelkhalek Bouchikhi,2, 3 and Monia Turki-Hadj Alouane1

1 Unite Signaux et Systemes, ENIT, BP 37, Le Belvedere 1002, Tunis, Tunisia
2 IRENav, Ecole Navale, Lanveoc Poulmic, BP600, 29200 Brest-Armees, France
3 E3I2, EA 3876, ENSIETA, 2 rue Francois Verny, 29806 Brest Cedex 09, France

Correspondence should be addressed to Abdel-Ouahab Boudraa, [email protected]

Received 13 August 2007; Accepted 5 March 2008


In this study, two new approaches for speech signal noise reduction based on the empirical mode decomposition (EMD) recently introduced by Huang et al. (1998) are proposed. Based on the EMD, both reduction schemes are fully data-driven approaches. The noisy signal is decomposed adaptively into oscillatory components called intrinsic mode functions (IMFs), using a temporal decomposition called the sifting process. Two strategies for noise reduction are proposed: filtering and thresholding. The basic principle of these two methods is to reconstruct the signal from IMFs that have previously been either filtered, using the minimum mean-squared error (MMSE) filter introduced by I. Y. Soon et al. (1998), or thresholded, using a shrinkage function. The performance of these methods is analyzed and compared with that of the MMSE filter and wavelet shrinkage. The study is limited to signals corrupted by additive white Gaussian noise. The obtained results show that the proposed denoising schemes perform better than the MMSE filter and the wavelet approach.

Copyright © 2008 Kais Khaldi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Speech enhancement is a classical problem in signal processing, particularly in the case of additive white Gaussian noise, for which different noise reduction methods have been proposed [1–4]. When a noise estimate is available, filtering gives accurate results. However, these methods are not so effective when the noise is difficult to estimate. Linear methods such as Wiener filtering [5] are used because linear filters are easy to design and implement. These linear methods are not so effective for signals presenting sharp edges or impulses of short duration. Furthermore, real signals are often nonstationary. In order to overcome these shortcomings, nonlinear methods have been proposed, especially those based on wavelet thresholding [6, 7]. The idea of wavelet thresholding relies on the assumption that signal magnitudes dominate the magnitudes of noise in a wavelet representation, so that wavelet coefficients can be set to zero if their magnitudes are less than a predetermined threshold [7]. A limitation of the wavelet approach is that the basis functions are fixed and thus do not necessarily match all real signals. To avoid this problem, time-frequency atomic signal decomposition can be used [8, 9]. As for wavelet packets, if the dictionary is very large and rich, with a collection of atomic waveforms located on a much finer grid in time-frequency space than wavelet and cosine packet tables, then it should be possible to represent a large class of real signals; in spite of this, the basis functions must still be specified (Gabor functions, damped sinusoids, ...).

Recently, a new data-driven technique, referred to as empirical mode decomposition (EMD), has been introduced by Huang et al. [10] for analyzing data from nonstationary and nonlinear processes. The EMD has received increasing attention in terms of applications [11–23], interpretation [24, 25], and improvement [26, 27]. The major advantage of the EMD is that the basis functions are derived from the signal itself. Hence, the analysis is adaptive, in contrast to traditional methods where the basis functions are fixed. The EMD is based on the sequential extraction of the energy associated with various intrinsic time scales of the signal, called intrinsic mode functions (IMFs), starting from finer temporal scales (high-frequency IMFs) and proceeding to coarser ones (low-frequency IMFs). The total sum of the IMFs matches the signal very well and therefore ensures completeness [10]. We have shown that the EMD can be used for signal denoising [14, 15] or filtering [17]. The denoising method reconstructs the signal from all the IMFs, previously thresholded as in wavelet analysis or filtered [14, 15]. The filtering scheme relies on the basic idea that most structures of the signal are often concentrated in the lower-frequency components (last IMFs) and decrease toward the high-frequency modes (first IMFs) [17]. Thus, the recovered signal is reconstructed from only the few IMFs that are signal dominated, selected using an energy criterion. Compared to the approach introduced in [14, 15], no thresholding or filtering is required. The proposed filtering method is a fully data-driven approach [17].

In this paper, we show how the idea of thresholding IMFs using hard or soft shrinkage, introduced in [14, 15], can be extended and adapted to speech signals for enhancement purposes. Depending on whether the noise level can be correctly estimated, two noise reduction methods are proposed. The first strategy combines the EMD and the minimum mean-squared error (MMSE) filter [1], and the second associates the EMD with hard shrinkage [14, 15]. The two methods are applied to speech signals corrupted with different noise levels, and the results are compared to those of the MMSE filter and the wavelet approach.

2. EMD ALGORITHM

The EMD decomposes a signal x(t) into a series of IMFs through an iterative process called sifting, each IMF having a distinct time scale [10]. The decomposition is based on the local time scale of x(t) and yields adaptive basis functions. The EMD can be seen as a type of wavelet decomposition whose subbands are built up as needed to separate the different components of x(t). Each IMF replaces the signal’s detail at a certain scale or frequency band [24]. The EMD picks out the highest-frequency oscillation that remains in x(t). By definition, an IMF satisfies two conditions:

(1) the number of extrema and the number of zero crossings may differ by no more than one;

(2) the average value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

Thus, locally, each IMF contains lower-frequency oscillations than the just-extracted one. To be successfully decomposed into IMFs, x(t) must have at least two extrema: one minimum and one maximum. The sifting involves the following steps:

Step 1. fix the threshold ε and set $j \leftarrow 1$ (jth IMF);

Step 2. $r_{j-1}(t) \leftarrow x(t)$ (residual);

Step 3. extract the jth IMF:

(a) $h_{j,i-1}(t) \leftarrow r_{j-1}(t)$, $i \leftarrow 1$ (i is the number of sifts),

(b) extract the local maxima/minima of $h_{j,i-1}(t)$,

(c) compute the upper and lower envelopes $U_{j,i-1}(t)$ and $L_{j,i-1}(t)$ by interpolating, using cubic splines, the local maxima and minima of $h_{j,i-1}(t)$, respectively,

(d) compute the mean of the envelopes: $\mu_{j,i-1}(t) = (U_{j,i-1}(t) + L_{j,i-1}(t))/2$,

(e) update: $h_{j,i}(t) := h_{j,i-1}(t) - \mu_{j,i-1}(t)$, $i := i + 1$,

(f) calculate the following stopping criterion: $\mathrm{SD}(i) = \sum_{t=1}^{T} |h_{j,i-1}(t) - h_{j,i}(t)|^2 / (h_{j,i-1}(t))^2$,

(g) repeat Steps (b)–(f) until $\mathrm{SD}(i) < \varepsilon$ and then put $\mathrm{IMF}_j(t) \leftarrow h_{j,i}(t)$ (jth IMF);

Step 4. update the residual: $r_j(t) := r_{j-1}(t) - \mathrm{IMF}_j(t)$;

Step 5. repeat Step 3 with $j := j + 1$ until the number of extrema in $r_j(t)$ is ≤ 2;

where T is the time duration of x(t). The sifting is repeated several times (i) so that h becomes a true IMF that fulfills conditions (1) and (2). The result of the sifting is that x(t) is decomposed into a sum of C IMFs and a residual $r_C(t)$ such that

$$x(t) = \sum_{j=1}^{C} \mathrm{IMF}_j(t) + r_C(t). \tag{1}$$

The value of C is determined automatically using SD (Step 3(f)). The sifting has two effects: (a) it eliminates riding waves and (b) it smooths uneven amplitudes. To guarantee that the IMF components retain enough physical sense of both amplitude and frequency modulation, a value of SD must be chosen for the sifting. This is accomplished by limiting the size of the standard deviation SD computed from two consecutive sifting results. Usually, SD (or ε) is set between 0.2 and 0.3 [10].
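The sifting steps of this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the natural boundary conditions of the spline, the use of the two end samples as extra envelope knots, the small constant guarding the division in SD, and the caps `max_sifts` and `max_imfs` are all choices introduced here for the example. The function names are hypothetical.

```python
import bisect
import math


def natural_cubic_spline(xs, ys, ts):
    """Evaluate the natural cubic spline through the knots (xs, ys) at points ts."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n          # second derivatives at the knots, zero at both ends
    k_max = n - 2          # number of interior knots
    if k_max > 0:
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(k_max)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
               for k in range(k_max)]
        for k in range(1, k_max):               # forward elimination (Thomas algorithm)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[k_max] = rhs[k_max - 1] / diag[k_max - 1]
        for k in range(k_max - 2, -1, -1):      # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    out = []
    for t in ts:
        i = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 2)
        a, b = xs[i + 1] - t, t - xs[i]
        out.append((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                   + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                   + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return out


def local_extrema(h):
    """Indices of the interior local maxima and minima of the sequence h."""
    maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] >= h[i + 1]]
    minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] <= h[i + 1]]
    return maxima, minima


def envelope_mean(h):
    """Steps (b)-(d): mean of the upper and lower spline envelopes."""
    n = len(h)
    maxima, minima = local_extrema(h)
    t = list(range(n))
    # Assumption: both end samples are added as knots to avoid extrapolation.
    upper = natural_cubic_spline([0] + maxima + [n - 1],
                                 [h[0]] + [h[i] for i in maxima] + [h[-1]], t)
    lower = natural_cubic_spline([0] + minima + [n - 1],
                                 [h[0]] + [h[i] for i in minima] + [h[-1]], t)
    return [(u + l) / 2.0 for u, l in zip(upper, lower)]


def sift_imf(r, eps=0.25, max_sifts=50):
    """Step 3: sift the residual r until SD < eps (or max_sifts is reached)."""
    h_prev = list(r)
    for _ in range(max_sifts):
        mu = envelope_mean(h_prev)
        h = [a - b for a, b in zip(h_prev, mu)]
        # Step (f); the 1e-12 term is an assumed guard against division by zero.
        sd = sum((a - b) ** 2 / (a * a + 1e-12) for a, b in zip(h_prev, h))
        h_prev = h
        if sd < eps:
            break
    return h_prev


def emd(x, eps=0.25, max_imfs=10):
    """Steps 1-5: return the list of IMFs and the final residual."""
    residual, imfs = list(x), []
    while len(imfs) < max_imfs:
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) <= 2:      # Step 5 stopping rule
            break
        imf = sift_imf(residual, eps)
        imfs.append(imf)
        residual = [a - b for a, b in zip(residual, imf)]   # Step 4
    return imfs, residual


# Example on a hypothetical two-tone test signal.
signal = [math.sin(0.1 * math.pi * t) + 0.5 * math.sin(0.4 * math.pi * t)
          for t in range(200)]
imfs, residual = emd(signal)
```

Because each residual is obtained by subtracting the extracted IMF, summing the returned IMFs and the residual recovers the input up to rounding, which is the completeness property expressed by (1).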

3. DENOISING PRINCIPLE

Let a clean speech signal x(t) be corrupted by additive white Gaussian noise b(t) as follows:

$$y(t) = x(t) + b(t). \tag{2}$$

The noisy signal is decomposed into a sum of IMFs by the EMD, such that

$$y(t) = \sum_{j=1}^{C} \mathrm{IMF}_j(t) + r_C(t), \tag{3}$$

where $\mathrm{IMF}_j$ is a noisy version of the data $f_j$:

$$\mathrm{IMF}_j(t) = f_j(t) + b_j(t). \tag{4}$$

An estimate $\hat{f}_j(t)$ of $f_j(t)$ based on the noisy observation $\mathrm{IMF}_j(t)$ is given by

$$\hat{f}_j(t) = \Gamma[\mathrm{IMF}_j(t); \tau_j], \tag{5}$$

where $\Gamma[\mathrm{IMF}_j(t); \tau_j]$ is a preprocessing function, defined by a set of parameters $\tau_j$, applied to the signal $\mathrm{IMF}_j$ [14, 15]. The function Γ is chosen depending on whether the noise level can be estimated. When this estimation is possible, Γ reduces to the MMSE filter [1]. When the estimation is not easy, the preprocessing can be a thresholding [14, 15]: the function Γ is then a shrinkage, and τ is a threshold parameter. Finally, the denoised signal, $\hat{x}(t)$, is given by

$$\hat{x}(t) = \sum_{j=1}^{C} \hat{f}_j(t) + r_C(t). \tag{6}$$


Figure 1: The original signals “a”, “b”, “c”, and “d”. (Panels (a)–(d) plot amplitude against time.)

Figure 2: The noisy versions of signals “a”, “b”, “c”, and “d” (SNR = 5 dB). (Panels (a)–(d) plot amplitude against time.)

3.1. EMD-MMSE

Generally, speech noise estimation is performed using Boll’s method [28]. Accordingly, the silence periods of the signal are detected, and the noise power spectrum is then estimated by averaging the power spectra of the noisy signal over the first M temporal frames, which are considered to be moments of silence, following the relation

$$|B(f_e, m)|^2 = \frac{1}{M} \sum_{i=0}^{M-1} |B(f_e, i)|^2, \tag{7}$$

Figure 3: Denoising results of signals “a”, “b”, “c”, and “d” by (a) the EMD-MMSE and (b) the MMSE filter. (Each panel plots amplitude against time.)

where $|B(f_e, i)|^2$ is the power spectrum value at the discrete frequency $f_e$ of frame i. This method gives a correct estimation of the noise [28].
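The averaging in (7) can be illustrated with a short sketch. It assumes that the signal has already been cut into equal-length frames and that the first `m_silence` frames contain noise only; the framing itself, the naive DFT, and the function names are illustrative choices made here, not details given in the text.

```python
import cmath


def power_spectrum(frame):
    """Squared magnitude of the DFT of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        coeff = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def noise_power_estimate(frames, m_silence):
    """Average the power spectra of the first m_silence frames, as in (7)."""
    spectra = [power_spectrum(f) for f in frames[:m_silence]]
    n_bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]


# Hypothetical frames: the first two are treated as silence, the third is ignored.
frames = [[1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]
noise_power = noise_power_estimate(frames, 2)
```

In practice a fast Fourier transform and a window function would normally replace the naive DFT; it is used here only to keep the example dependency-free.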

Extensive simulations have shown that when a speech signal with a silence sequence is decomposed by the EMD, its first IMF corresponds to that silence sequence. Thus, the first IMF can be used to correctly estimate the noise level. According to [24], the noise level of the modes following the first IMF (k = 1) is estimated via

$$\sigma_k = \frac{\sigma_1}{\sqrt{2}^{\,k-1}} \quad \text{with } k \ge 2, \tag{8}$$

where $\sigma_1$ is the noise level of the first IMF.


Figure 4: Final SNR values obtained for different initial noise levels of signals “a”, “b”, “c”, and “d”. The results are averaged over 100 noise realizations of each signal and are reported for the EMD-MMSE and the MMSE filter. (Panels (a)–(d) plot the gain in SNR (dB) against the initial SNR (dB), from 4 dB to 10 dB, for the noisy versions of “a”, “b”, “c”, and “d”, respectively.)

The combination of the EMD and the MMSE filter [1] is called the EMD-MMSE strategy. Each IMF is filtered by the MMSE filter as follows:

$$\hat{F}_j(f_e, m) = H(f_e, m)\,\mathrm{IMF}_j(f_e, m), \tag{9}$$

where $\mathrm{IMF}_j(f_e, m)$ and $\hat{F}_j(f_e, m)$ are the spectrum of the noisy IMF and the spectrum of the estimated IMF, respectively, observed at the discrete frequency $f_e$ on frame m. $H(f_e, m)$ is described as follows [1]:

$$H(f_e, m) = \frac{\mathrm{SNR}_{\mathrm{prio}}(f_e, m)}{1 + \mathrm{SNR}_{\mathrm{prio}}(f_e, m)}. \tag{10}$$

The signal-to-noise ratio, $\mathrm{SNR}_{\mathrm{prio}}$, is estimated according to the method of Ephraim and Malah [2], which is based on the estimate $\hat{F}(f_e, m-1)$ from the previous frame and on a local estimation $\mathrm{SNR}_{\mathrm{inst}}$:

$$\mathrm{SNR}_{\mathrm{prio}}(f_e, m) = \alpha \frac{\hat{F}^2(f_e, m-1)}{B^2(f_e, m-1)} + (1 - \alpha)\max(\mathrm{SNR}_{\mathrm{inst}}(f_e, m), 0), \tag{11}$$

where α is a weighting factor (equal to 0.98) and $\mathrm{SNR}_{\mathrm{inst}}$ indicates the instantaneous SNR, defined as the local estimation of $\mathrm{SNR}_{\mathrm{prio}}$:

$$\mathrm{SNR}_{\mathrm{inst}} = \frac{\mathrm{IMF}^2(f_e, m)}{B^2(f_e, m)}. \tag{12}$$
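The recursion formed by (9)–(12) can be sketched for a single frequency bin followed across frames. The sketch works on squared magnitudes only. The initialisation of the first frame (where no previous estimate exists) is an assumption made here, since the text does not specify it, and the function names are hypothetical.

```python
def wiener_gain(snr_prio):
    """Gain H of (10) for a given a priori SNR."""
    return snr_prio / (1.0 + snr_prio)


def emd_mmse_gains(imf_power, noise_power, alpha=0.98):
    """Gains H(f_e, m) for one frequency bin over frames m = 0, 1, ...

    imf_power[m] is the squared magnitude of the noisy IMF spectrum and
    noise_power[m] the noise power B^2, both at the same frequency f_e.
    """
    gains = []
    prev_ratio = None   # F_hat^2 / B^2 of the previous frame
    for p, b in zip(imf_power, noise_power):
        snr_inst = p / b                                   # (12)
        if prev_ratio is None:
            # Assumed initialisation: no previous frame is available.
            snr_prio = max(snr_inst, 0.0)
        else:
            snr_prio = (alpha * prev_ratio
                        + (1.0 - alpha) * max(snr_inst, 0.0))   # (11)
        h = wiener_gain(snr_prio)                          # (10)
        gains.append(h)
        prev_ratio = (h * h * p) / b    # squared magnitude of (9) over B^2
    return gains


# Hypothetical bin with constant IMF power 4 and unit noise power.
gains = emd_mmse_gains([4.0, 4.0], [1.0, 1.0])
```

With α close to 1, the a priori SNR follows the previous frame's estimate closely, so the gain changes smoothly from frame to frame.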


Figure 5: The original signals “e”, “f”, “g”, and “h”. (Panels (e)–(h) plot amplitude against time.)

Figure 6: The noisy versions of signals “e”, “f”, “g”, and “h” (SNR = −1 dB). (Panels (e)–(h) plot amplitude against time.)

3.2. EMD-shrinkage

A smooth version of the input signal can be obtained by thresholding the IMFs before signal reconstruction [14, 15]. In this case, the threshold parameter is estimated by the following expression [6, 14, 15, 29, 30]:

$$\tau = \sqrt{2\log(T)}\,\sigma, \tag{13}$$

where T is the signal length and σ is the estimated noise level (scale level). The noise level $\sigma_1$ of the first IMF is given by [14, 15, 31]

$$\sigma_1 = 1.4826 \times \mathrm{Median}\{|\mathrm{IMF}_1(t) - \mathrm{Median}\{\mathrm{IMF}_1(t)\}|\}. \tag{14}$$

Figure 7: Denoising results of signals “e”, “f”, “g”, and “h” by (a) the EMD-shrinkage and (b) the wavelet approach (Daubechies 4). (Each panel plots amplitude against time.)

According to [24, 32], using $\sigma_1$, the noise level $\sigma_k$ of the IMFs can be estimated using (8).

There are different nonlinear shrinkage functions [33]. In the present work, we use the hard shrinkage, which has given interesting denoising results for speech enhancement:

$$\hat{f}_j(t) = \begin{cases} \mathrm{IMF}_j(t) & \text{if } |\mathrm{IMF}_j(t)| > \tau_j, \\ 0 & \text{if } |\mathrm{IMF}_j(t)| \le \tau_j. \end{cases} \tag{15}$$

The association of the EMD and the hard shrinkage is called the EMD-shrinkage method.
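The EMD-shrinkage chain can be sketched as follows: the noise level of the first IMF from (14), its propagation to the other modes via (8), the threshold (13), and the hard shrinkage (15), followed by reconstruction. The IMFs and the residual are taken as given inputs. The use of the natural logarithm in (13) and all function names are assumptions made for this illustration.

```python
import math
import statistics


def mad_noise_level(imf1):
    """Noise level of the first IMF from the median absolute deviation, (14)."""
    med = statistics.median(imf1)
    return 1.4826 * statistics.median([abs(v - med) for v in imf1])


def imf_noise_levels(sigma1, n_imfs):
    """Noise level of each mode k = 1..n_imfs, following (8)."""
    return [sigma1 / math.sqrt(2.0) ** (k - 1) for k in range(1, n_imfs + 1)]


def universal_threshold(sigma, length):
    """Threshold of (13); the natural logarithm is assumed here."""
    return math.sqrt(2.0 * math.log(length)) * sigma


def hard_shrink(imf, tau):
    """Hard shrinkage of (15): keep samples whose magnitude exceeds tau."""
    return [v if abs(v) > tau else 0.0 for v in imf]


def emd_shrinkage(imfs, residual):
    """Threshold every IMF and rebuild the signal from them and the residual."""
    length = len(residual)
    levels = imf_noise_levels(mad_noise_level(imfs[0]), len(imfs))
    cleaned = [hard_shrink(imf, universal_threshold(s, length))
               for imf, s in zip(imfs, levels)]
    return [sum(c[t] for c in cleaned) + residual[t] for t in range(length)]
```

For instance, `hard_shrink([0.5, -2.0, 1.0], 1.0)` keeps only the sample of magnitude 2, because a sample whose magnitude equals the threshold is set to zero under (15).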


Figure 8: Final SNR values obtained for different initial noise levels of signals “e”, “f”, “g”, and “h”. The results are averaged over 100 noise realizations of each signal and are reported for the EMD-shrinkage and for three different wavelets (Haar, Symmlet 4, Daubechies 4). (Panels (a)–(d) plot the gain in SNR (dB) against the initial SNR (dB), from −10 dB to 4 dB, for the noisy versions of “e”, “f”, “g”, and “h”, respectively.)

4. RESULTS

The two proposed noise reduction methods are tested on speech signals corrupted by additive white Gaussian noise with different SNRs. The results are compared to those of the MMSE filter and the wavelet approach (Haar, Symmlet 4, Daubechies 4). As indicated, the choice of EMD denoising scheme depends on the noise estimation: if the prespeech period of the noisy signal is detected, then the EMD-MMSE is used; otherwise, the EMD-shrinkage is used. The SNR is used as an objective measure to evaluate the performance of the denoising methods. More precisely, the SNR is used to compare the EMD-MMSE to the MMSE filter and the EMD-shrinkage to the wavelet approach. The SNR is defined by

$$\mathrm{SNR} = 10\log_{10} \frac{\sum_{i=1}^{T} (x(t_i))^2}{\sum_{i=1}^{T} (x(t_i) - \hat{x}(t_i))^2}, \tag{16}$$

where $x(t_i)$ and $\hat{x}(t_i)$ are the original signal and the reconstructed one, respectively.
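The SNR of (16) and the generation of a noisy test signal at a prescribed input SNR can be sketched as follows. Scaling each noise realization so that the input SNR is met exactly is an assumption of this example (the text does not say how the noise was scaled), and the function names are hypothetical.

```python
import math
import random


def snr_db(clean, estimate):
    """SNR of (16) in dB between a clean signal and its estimate.

    The estimate must differ from the clean signal, otherwise the error
    energy in the denominator is zero.
    """
    signal_energy = sum(v * v for v in clean)
    error_energy = sum((v - e) ** 2 for v, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


def add_white_noise(clean, target_snr_db, rng):
    """Add white Gaussian noise scaled to reach target_snr_db exactly."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    signal_energy = sum(v * v for v in clean)
    noise_energy = sum(n * n for n in noise)
    scale = math.sqrt(signal_energy / (noise_energy * 10.0 ** (target_snr_db / 10.0)))
    return [v + scale * n for v, n in zip(clean, noise)]


# Hypothetical clean signal and one noise realization at 5 dB.
clean = [math.sin(0.1 * t) for t in range(500)]
noisy = add_white_noise(clean, 5.0, random.Random(0))
input_snr = snr_db(clean, noisy)
```

The gain in SNR of a denoising method is then the SNR of its output minus `input_snr`, averaged over independent noise realizations.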

The EMD-MMSE denoising scheme is applied to four clean speech signals “a”, “b”, “c”, and “d” (Figures 1(a)–1(d)) corrupted by additive white Gaussian noise with SNR values ranging from 4 dB to 10 dB. Noisy versions of the original signals corresponding to SNR = 5 dB are shown in Figure 2. We carried out numerical simulations in which, for each SNR value, 100 independent noise realizations are generated and the averaged values calculated. Figure 3 shows the denoising results obtained by the EMD-MMSE and the MMSE filter. From this figure, compared with the respective clean signals of Figure 1, one can conclude that the EMD-MMSE performs better in terms of noise reduction than the MMSE filter. This is confirmed by the results shown in Figure 4, where interesting improvements in SNR are given by the EMD-MMSE compared to the MMSE filter. Indeed, the SNR improvement of the EMD-MMSE is about 1 dB greater than that of the MMSE filter for all four considered signals “a”, “b”, “c”, and “d”.

The EMD-shrinkage is applied to four clean speech signals “e”, “f”, “g”, and “h” (Figure 5), corrupted by additive white Gaussian noise with SNR values ranging from −10 dB to 3 dB. Noisy versions of the original signals corresponding to SNR = −1 dB are shown in Figure 6. Denoising results of the EMD-shrinkage (hard thresholding) and the wavelet method (Daubechies 4) are shown in Figure 7. A careful examination of the signals of Figures 5 and 7 shows that the EMD-shrinkage performs better than the wavelet method in terms of noise reduction. Furthermore, signal structures or features are globally better preserved with the EMD-shrinkage than with the wavelet method. Figure 8 shows the improvement in SNR values obtained for different noise levels of the signals “e”, “f”, “g”, and “h” for the EMD-shrinkage and the wavelet method with three wavelet types (Haar, Symmlet 4, Daubechies 4). This figure demonstrates that for input SNR values from −10 dB to 3 dB, the improvement in SNR provided by the EMD-shrinkage varies from −0.7 dB to 11.5 dB. In addition, the gain in SNR of the EMD-shrinkage is much better than that obtained by the wavelet method for all three wavelets. When listening to the enhanced speech, both the EMD-MMSE and the EMD-shrinkage are found to produce lower residual noise and, noticeably, less speech distortion for all the signals compared to the MMSE filter or the wavelet method.

5. CONCLUSION

This paper presents two new speech denoising methods. Both schemes are based on the EMD and thus are simple and fully data-driven methods. The methods do not use any pre- or postprocessing and do not require any parameter setting (except the threshold ε). The study is limited to signals corrupted by additive white Gaussian noise. Results obtained for clean speech signals corrupted with additive Gaussian noise at SNR values ranging from −10 dB to 10 dB show that the proposed EMD-denoising methods, associated with the MMSE filter or the shrinkage strategy, perform better than the MMSE filter and the wavelet approach, respectively. These results show that the EMD-denoising methods are effective for noise removal and confirm our previous findings [14, 15]. The EMD-shrinkage is very attractive, especially in the case where the noise estimation is not easy. Even when the noise level can be estimated, the EMD improves on the denoising result of the classical MMSE filter. The obtained results also show that it is more efficient to apply thresholding or filtering to the different components (IMFs) of the signal than to the signal itself. To confirm the obtained results and the effectiveness of the EMD-denoising approaches, the schemes must be evaluated with a large class of speech signals and in different experimental conditions, such as sampling rates, sample sizes, multiplicative noise, or the type of noise.

REFERENCES

[1] I. Y. Soon, S. N. Koh, and C. K. Yeo, “Noisy speech enhancement using discrete cosine transform,” Speech Communication, vol. 24, no. 3, pp. 249–257, 1998.

[2] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.

[3] I.-Y. Soon and S. N. Koh, “Low distortion speech enhancement,” IEE Proceedings: Vision, Image and Signal Processing, vol. 147, no. 3, pp. 247–253, 2000.

[4] P. Scalart and J. V. Filho, “Speech enhancement based on a priori signal to noise estimation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’96), vol. 2, pp. 629–632, Atlanta, Ga, USA, May 1996.

[5] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.

[6] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.

[7] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613–627, 1995.

[8] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.

[9] M. M. Goodwin and M. Vetterli, “Matching pursuit and atomic signal models based on recursive filter banks,” IEEE Transactions on Signal Processing, vol. 47, no. 7, pp. 1890–1902, 1999.

[10] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[11] A.-O. Boudraa, J. C. Cexus, F. Salzenstein, and L. Guillon, “IF estimation using empirical mode decomposition and nonlinear Teager energy operator,” in Proceedings of the 1st International Symposium on Control, Communications and Signal Processing (ISCCSP ’04), pp. 45–48, Hammamet, Tunisia, March 2004.

[12] J. C. Cexus and A. O. Boudraa, “Non-stationary signals analysis by Teager-Huang transform (THT),” in Proceedings of the 14th European Signal Processing Conference (EUSIPCO ’06), p. 5, Florence, Italy, September 2006.

[13] J. C. Cexus and A. O. Boudraa, “Teager-Huang analysis applied to sonar target recognition,” International Journal of Signal Processing, vol. 1, no. 1, pp. 23–27, 2004.

[14] A. O. Boudraa, J. C. Cexus, and Z. Saidi, “EMD-based signal noise reduction,” International Journal of Signal Processing, vol. 1, no. 1, pp. 33–37, 2004.

[15] A. O. Boudraa and J. C. Cexus, “Denoising via empirical mode decomposition,” in Proceedings of the IEEE International Symposium on Control, Communications and Signal Processing (ISCCSP ’06), p. 4, Marrakech, Morocco, March 2006.

[16] B. Weng, M. Blanco-Velasco, and K. E. Barner, “ECG denoising based on the empirical mode decomposition,” in Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS ’06), pp. 1–4, New York, NY, USA, August-September 2006.

[17] A. O. Boudraa, J. C. Cexus, S. Benramdane, and A. Beghdadi, “Noise filtering using empirical mode decomposition,” in Proceedings of the IEEE International Symposium on Signal Processing and Its Applications (ISSPA ’07), p. 4, Sharjah, United Arab Emirates, February 2007.

[18] Z. Liu and S. Peng, “Boundary processing of bidimensional EMD using texture synthesis,” IEEE Signal Processing Letters, vol. 12, no. 1, pp. 33–36, 2005.

[19] A. O. Boudraa, J. C. Cexus, F. Salzenstein, and A. Beghdadi, “EMD-based multibeam echosounder images segmentation,” in Proceedings of the 2nd IEEE International Symposium on Communications, Control and Signal Processing (ISCCSP ’06), p. 4, Marrakech, Morocco, March 2006.

[20] K. Zeng and M.-X. He, “A simple boundary process technique for empirical mode decomposition,” in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium Proceedings (IGARSS ’04), vol. 6, pp. 4258–4261, Anchorage, Alaska, USA, September 2004.

[21] P. Flandrin, P. Goncalves, and G. Rilling, “Detrending and denoising with empirical mode decomposition,” in Proceedings of the 12th European Signal Processing Conference (EUSIPCO ’04), pp. 1581–1584, Vienna, Austria, September 2004.

[22] G. Rilling, P. Flandrin, and P. Goncalves, “Empirical mode decomposition, fractional Gaussian noise and hurst exponent estimation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’05), vol. 4, pp. 489–492, 2005.

[23] S. Benramdane, J. C. Cexus, A. O. Boudraa, and J. A. Astolfi, “Transient turbulent pressure signal processing using empirical mode decomposition,” in Proceedings of the 4th International Conference on Physics in Signal and Image Processing, p. 4, Mulhouse, France, January 2007.

[24] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, part 1, pp. 112–114, 2004.

[25] Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proceedings of the Royal Society A, vol. 460, no. 2046, pp. 1597–1611, 2004.

[26] B. Weng and K. E. Barner, “Optimal and bidirectional optimal empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’07), vol. 3, pp. 1501–1504, Honolulu, Hawaii, USA, April 2007.

[27] R. Deering and J. F. Kaiser, “The use of a masking signal to improve empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’05), vol. 4, pp. 485–488, Philadelphia, Pa, USA, March 2005.

[28] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.

[29] D. L. Donoho and I. M. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200–1224, 1995.

[30] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard, “Wavelet shrinkage: asymptopia with discussion,” Proceedings of the Royal Statistical Society B, vol. 57, no. 2, pp. 301–396, 1995.

[31] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, NY, USA, 2nd edition, 1992.

[32] G. Steidl, J. Weickert, T. Brox, P. Mrazek, and M. Welk, “On the equivalence of soft wavelet shrinkage, total variation diffusion, total variation regularization and SIDEs,” Tech. Rep. Series SPP-1114, Department of Mathematics, University of Bremen, Bremen, Germany, 2003.

[33] S. Mallat, Une Exploration des Signaux en Ondelettes, Ecole Polytechnique, Palaiseau, France, 2000.


Research Article

Segmentation of Killer Whale Vocalizations Using the Hilbert-Huang Transform

Olivier Adam

Laboratoire d’Images, Signaux et Systemes Intelligents (LiSSi - iSnS), Universite de Paris 12, 61 avenue de Gaulle, 94010 Creteil Cedex, France

Correspondence should be addressed to Olivier Adam, [email protected]

Received 1 September 2007; Revised 3 March 2008; Accepted 14 April 2008


The study of cetacean vocalizations is usually based on spectrogram analysis. The feature extraction is obtained from 2D methods like the edge detection algorithm. Difficulties appear when signal-to-noise ratios are weak or when more than one vocalization is simultaneously emitted. This is the case for acoustic observations in a natural environment and especially for killer whales, which swim in groups. To resolve this problem, we propose the use of the Hilbert-Huang transform. First, we illustrate how few modes (5) are satisfactory for the analysis of these calls. Then, we detail our approach, which consists of combining the modes for extracting the time-varying frequencies of the vocalizations. This combination takes advantage of one of the empirical mode decomposition properties, which is that the successive IMFs represent the original data broken down into frequency components from highest to lowest frequency. To evaluate the performance, our method is first applied to simulated chirp signals. This approach allows us to link one chirp to one mode. Then we apply it to real signals emitted by killer whales. The results confirm that this method is a favorable alternative for the automatic extraction of killer whale vocalizations.

Copyright © 2008 Olivier Adam. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Marine mammals show a vast diversity of vocalizations from one species to another and from one individual to another within a species. This can be problematic in analyzing vocalizations. The Fourier spectrogram remains today the classical time-frequency tool used by cetologists [1–3], and sometimes the only one proposed, in typical software dedicated to bioacoustic sound analysis, such as MobySoft Ishmael, RainbowClick, Raven, Avisoft, and XBat, developed by [4–8], respectively.

In general, when analyzing bioacoustic sounds, postprocessing consists of binarizing the spectrogram by comparing the frequency energy to a manually fixed threshold [4, 9]. Then, feature extraction of the detected vocalizations is carried out using 2D methods specific to image processing. These algorithms, like the edge detection algorithm, are applied on the time-frequency representations [4, 5, 10].

Though the Fourier transform provides satisfactory results as far as cetologists are concerned, its underlying hypotheses are not always verified. This is particularly true for the analysis of continuous recordings, when signals and noises vary in time and frequency [11]. Moreover, these time-frequency representations have interference structures, especially for the type 1 Cohen’s class (e.g., the Wigner-Ville distribution) [12]. In addition, the uniform time-frequency resolution of the spectrogram has drawbacks for nonstationary signal analysis [13].

To overcome these difficulties, the following approaches have recently been proposed: parametric linear models such as autoregressive filters, the Schur algorithm, and the wavelet transform [14–17]. A comparative study of these approaches can be found in [16]. All of these methods rely on specific functions to decompose the original signals. These functions can bias the results, which is a disadvantage when analyzing a large set of different signals, such as killer whale vocalizations. Also, concerning the wavelet transform, it should be noted that bioacoustic signals are generally not decomposed using the same wavelet family from one study to the next. For example, in analyzing sperm whale regular clicks, authors have used the Mexican hat wavelet, wavelet packets, the Daubechies wavelet, and so forth [15, 16, 18–20]. It seems that the choice of one specific wavelet family is influenced less by the shape of the sperm whale click than by the global performance on the complete dataset used by the authors in their application.

Introduced as a generalization of the wavelet transform [21], the chirplet transform appears to be a possible solution for our application because of the specific shape of certain killer whale vocalizations (e.g., chirps). However, this method has some disadvantages. First, it requires the presegmentation of the signals (unnecessary in our method). Second, the computation time of the chirplet transform is known to be long, and the method proposed to compensate for this drawback limits the analysis to a single chirp per presegment [21, 22]. This is not feasible for our approach because more than one vocalization is likely to be simultaneously present in the recordings.

This paper endeavors to adapt the Hilbert-Huang transform (HHT) to the detection and analysis of killer whale vocalizations. We introduce the HHT because it is well suited to the analysis of nonlinear, nonstationary signals [12]. This transform is used as a reliable alternative to the wavelet transform in many applications [23, 24], including underwater acoustic sounds [25, 26]. Its advantages are promising for detecting underwater biological signals even if they have a wide diversity, as mentioned above. In our previous work, we obtained positive results for the analysis of sperm whale clicks using the HHT [27, 28].

In these articles, we demonstrated how to detect these transient signals emitted by sperm whales. The modes obtained from the HHT were used for extracting and characterizing sperm whale clicks, as detailed in [29]. We compared results from different approaches to obtain the best time resolution. First, this allowed us to characterize the shape of the emitted sounds (precise evaluation of the size of the sperm whale head). Second, we optimized the computation of time delays for arrivals of the same sound on different hydrophones to minimize the error margin on the sperm whale localization. In conclusion, the HHT was presented as an alternative to the spectrogram.

However, in these articles we did not discuss the role of each mode obtained from the HHT, and we did not present the method based on the combined modes as we do here. The current work therefore does more than illustrate a new application of the HHT: through our application to killer whale vocalizations, we introduce an original method based on the combined modes, detailed in the following section.

2. METHOD

Proposed by Huang et al. in 1998 [12], the Hilbert-Huang transform is based on two consecutive steps: (1) the empirical mode decomposition (EMD) extracts modes from the original signal; these modes are also referred to as intrinsic mode functions (IMFs); and (2) the Hilbert transform is applied to each mode, which provides a time-frequency representation of the original signal. It is important to note that (1) the EMD is not defined by a mathematical formalism (the algorithm can be found in [12]), and (2) the second step is optional: some authors limit their application solely to the use of the EMD [30, 31].

The use of these modes can be compared to a filter bank [32]. At time k, the decreasing frequencies are placed in successive modes, from first to last. Our method takes advantage of this characteristic. Our contribution is an original process for the segmentation and combination of these modes. The objective is to link a single killer whale vocalization to a single mode.

2.1. Brief theory of the HHT

The EMD is applied on the original signal. One advantage of this decomposition is that it requires no a priori functions: no function has to be chosen and, consequently, no bias results from such a choice.

The EMD is based on the extraction of the upper and lower envelopes of the original signal (by interpolation of the extrema). A mode is extracted when (1) the number of extrema and the number of zero crossings are equal or differ at most by one, and (2) the mean of these two envelopes is equal to zero.

The original sampled signal $s(t)$ is

$$s(t) = \sum_{i=1}^{M} c_i(t) + R_M(t), \tag{1}$$

with $t, i, M \in \mathbb{N}$ and $t = 1, 2, \ldots, T$, where $T$ is the length of the signal $s$. $M$ is the number of modes extracted from the signal using EMD, $c_i$ is the $i$th IMF, and $R_M$ is the residue. $c_i$ and $R_M$ are one-dimensional signals with $T$ samples.

We note that the EMD can be applied to any nonzero-mean signal. However, each mode is a zero-mean signal. It is important to note that all the modes are monocomponent time-variant signals. The algorithm is shown in Figure 1.
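The sifting loop summarized in Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in this work: the function names (`local_extrema`, `envelope`, `emd`), the default threshold `delta=0.05`, and the iteration caps are assumptions. Two simplifications are made to keep it short and dependency-free: the envelopes are piecewise linear instead of the spline interpolation commonly used for the EMD, and the stop criterion uses a ratio of energies rather than a per-sample ratio, to avoid dividing by samples close to zero.

```python
def local_extrema(x):
    """Indices of the local maxima and minima of the sequence x."""
    maxima, minima = [], []
    for t in range(1, len(x) - 1):
        if x[t - 1] < x[t] >= x[t + 1]:
            maxima.append(t)
        elif x[t - 1] > x[t] <= x[t + 1]:
            minima.append(t)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices (held flat at both ends)."""
    n = len(x)
    pts = [(0, x[knots[0]])] + [(k, x[k]) for k in knots] + [(n - 1, x[knots[-1]])]
    env = [0.0] * n
    for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
        for t in range(t0, t1 + 1):
            env[t] = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return env


def emd(s, delta=0.05, max_sift=50, max_modes=10):
    """Return (imfs, residue) with s[t] == sum of imfs[i][t] over i, plus residue[t]."""
    residue = list(s)
    imfs = []
    while len(imfs) < max_modes:
        maxima, minima = local_extrema(residue)
        if not maxima or not minima:
            break  # no maximum or no minimum left in the residue: stop
        c = list(residue)
        for _ in range(max_sift):
            maxima, minima = local_extrema(c)
            if not maxima or not minima:
                break
            upper = envelope(c, maxima)
            lower = envelope(c, minima)
            # Subtract the mean of the two envelopes (one sifting iteration).
            new_c = [v - 0.5 * (u + l) for v, u, l in zip(c, upper, lower)]
            # Assumed variant of the stop criterion: energy of the change over energy of c.
            sd = sum((a - b) ** 2 for a, b in zip(c, new_c)) / max(sum(a * a for a in c), 1e-12)
            c = new_c
            if sd < delta:
                break
        imfs.append(c)
        residue = [r - v for r, v in zip(residue, c)]
    return imfs, residue
```

By construction, the modes and the residue add back to the input signal, as in equation (1), whatever interpolation is chosen for the envelopes.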

The time-frequency representation is provided after computation of the Hilbert transform on each mode,

$$c_{Hi}(t) = \mathrm{HT}(c_i) = c_i(t) \otimes \frac{1}{\pi t}, \tag{2}$$

where $\otimes$ is the convolution.

From the analytic mode $c_{Ai}(t) = c_i(t) + j c_{Hi}(t)$, also written $c_{Ai}(t) = a_i(t) e^{j\theta_i(t)}$, we define the instantaneous amplitude $a_i(t)$ and the instantaneous phase $\theta_i(t)$. For each mode, the instantaneous frequency is obtained by

$$f_{c_i}(t) = \frac{1}{2\pi} \frac{d\theta_i(t)}{dt}. \tag{3}$$

Lastly, the time variations of the instantaneous frequencies of each mode correspond to the time-frequency representation.
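As a rough illustration of equations (2) and (3), the sketch below builds the analytic signal of one mode and differentiates its phase. It is an assumption-laden example rather than the code used in this work: the names `analytic_signal` and `instantaneous_frequency` are hypothetical, the analytic signal is obtained by zeroing the negative-frequency bins of a naive discrete Fourier transform (practical only for short signals), and the derivative in (3) is replaced by a phase difference between consecutive samples, so the result is in cycles per sample.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal x + j*HT(x), built by zeroing the negative-frequency DFT bins."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    gain = [0.0] * n
    gain[0] = 1.0                      # keep the DC bin once
    for k in range(1, (n + 1) // 2):   # double the positive frequencies
        gain[k] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin once
    return [sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def instantaneous_frequency(x):
    """Normalized instantaneous frequency (cycles/sample) between consecutive samples."""
    z = analytic_signal(x)
    freq = []
    for a, b in zip(z, z[1:]):
        dtheta = cmath.phase(b * a.conjugate())  # wrapped phase increment
        freq.append(dtheta / (2 * math.pi))
    return freq


# Usage: a pure tone at normalized frequency 0.05 gives a flat frequency track.
tone = [math.cos(2 * math.pi * 0.05 * t) for t in range(200)]
track = instantaneous_frequency(tone)
```

The returned track has one value fewer than the input because each value is attached to a pair of consecutive samples.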

2.2. Segmentation and combination of the modes

For cetologists, the acoustic observation of a specific marine zone consists of detecting sounds emitted by marine mammals. Once this is achieved, feature extraction is carried out to identify the species.

It is possible to use the HHT to detect the emitted sounds. We assume that the original zero-mean real signal has not been previously segmented by means of another technique. The EMD provides a limited number of modes (IMFs) resulting from this original signal. Note that each mode is the same length as the original signal (same number of samples). In any application, the challenge in using the HHT is in interpreting the contents of each mode, as all signal components are divided between all the IMFs according to their instantaneous frequency [12]. For this reason, we propose the segmentation of the modes in order to link a part of this information to one single mode. Our method bases the segmentation on the strong variations of the mode frequencies: these variations can be used to distinguish the presence of different chirps (cf. the example detailed in Section 3.1) or different vocalizations (cf. Section 3.2). Our segmentation is based on the three following rules: (1) all the modes are composed of the same number of segments, (2) the jth segments of all the modes have the same length, and (3) different segments of one single mode can have different lengths. To perform this segmentation, we could have used a criterion based on the discontinuities of the instantaneous amplitude. But vocalizations show a continuous fundamental frequency (signal with a constant or time-varying frequency) over their complete duration (the time between two silences, as the human ear would hear it). Also, for our purposes, we have chosen to work with variations of the frequencies because we want to track killer whale vocalizations. Moreover, tracking the frequency variations for extracting the killer whale vocalizations is possible because these frequencies are much higher in pitch than the underwater ambient noise.

Initialization step:
  $\delta$ = value of the stop criterion threshold
  $i = 1$
  residual signal: $r_{i-1} = s$

Sifting process: extraction of $c_i$
  1. $j = 1$
  2. $c^{\mathrm{tmp}}_{i,j-1} = r_{i-1}$
  3. Extraction of the local extrema of $c^{\mathrm{tmp}}_{i,j-1}$
  4. Interpolation of the minima and the maxima to obtain the lower $L_{i,j-1}$ and upper $U_{i,j-1}$ envelopes
  5. Mean of these envelopes: $m_{i,j-1} = 0.5 \times (U_{i,j-1} + L_{i,j-1})$
  6. $c^{\mathrm{tmp}}_{i,j} = c^{\mathrm{tmp}}_{i,j-1} - m_{i,j-1}$
  7. Stop criterion: $SD_j = \mathrm{sum}\bigl((|c^{\mathrm{tmp}}_{i,j-1} - c^{\mathrm{tmp}}_{i,j}|)^2 / (c^{\mathrm{tmp}}_{i,j-1})^2\bigr)$
  Test $SD_j < \delta$: if no, $j = j + 1$ and the sifting is repeated from step 3; if yes, go to the saving step.

Saving step:
  save the $i$th IMF: $c_i = c^{\mathrm{tmp}}_{i,j}$

Update:
  residual signal: $r_i = r_{i-1} - c^{\mathrm{tmp}}_{i,j}$
  $n_r$ = number of the local extrema of $r_i$
  Test $n_r < 2$: if no, $i = i + 1$ and the sifting process is repeated; if yes, end.

Figure 1: Algorithm for the IMF extraction from the original signal s.

The detection of the frequency variations helps us identify the exact beginning and end of each vocalization. For the detection approach, our criterion is based on the derivative of the instantaneous frequency. But it is important to keep in mind that the phase is a local parameter. To avoid fluctuations due mainly to ambient noise, Cexus et al. have recently proposed the use of the Teager-Kaiser operator [33]. But this seemingly promising operator has not been evaluated for our application. For now, we calculate the derivative of the mean instantaneous frequency to establish the limits of all segments for one mode,

$$g_{c_i}(t) = \frac{d\bar{f}_{c_i}(t)}{dt}, \tag{4}$$

where $\bar{f}_{c_i}$ is the mean of the successive instantaneous frequencies. This step is added to attenuate the variations of these instantaneous frequencies. $\bar{f}_{c_i}$ is the moving average of $f_{c_i}$:

$$\bar{f}_{c_i}(t) = \frac{1}{T_w} \sum_{k=-T_w/2}^{T_w/2} f_{c_i}(t-k). \tag{5}$$

The length $T_w$ of the time window used to compute this mean depends on the application. In this paper, the $T_w$ value is empirically established from the study of our dataset.
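Equations (4) and (5) amount to smoothing the instantaneous frequency track and then differentiating it. A minimal sketch follows, under stated assumptions: the names `moving_average` and `derivative` are hypothetical, the window is shortened at the edges of the signal and the sum is divided by the actual number of samples in the window, and the derivative is approximated by a first difference.

```python
def moving_average(f, tw):
    """Centered moving average of f over about tw samples (window shortened at the edges)."""
    half = tw // 2
    out = []
    for t in range(len(f)):
        lo, hi = max(0, t - half), min(len(f), t + half + 1)
        out.append(sum(f[lo:hi]) / (hi - lo))
    return out


def derivative(f):
    """First difference g(t) = f(t) - f(t-1); g(0) is set to 0 by convention."""
    return [0.0] + [b - a for a, b in zip(f, f[1:])]


# Usage: a frequency track that jumps from 0.01 to 0.05 halfway through.
f = [0.01] * 10 + [0.05] * 10
fbar = moving_average(f, 4)
g = derivative(fbar)
```

With this smoothing, the single jump in `f` is spread over a few samples of `g`, which is the behavior the window length is meant to control.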

The idea of our detection approach is to track the signal via analysis of the functions $g_{c_i}$. These functions correspond to the frequency variations of each monocomponent IMF. Strong variations in these IMFs, which indicate the presence of signal information (start or end of one vocalization), provoke notable changes in the functions $g_{c_i}$ (hypothesis $H_0$). Otherwise, these functions are nearly constant (hypothesis $H_1$). The functions $d_{c_i}$ are given by

$$d_{c_i}(t) = \bigl(g_{c_i}(t) - g_{c_i}(t-1)\bigr)^2 \underset{H_0}{\overset{H_1}{\gtrless}} \eta, \tag{6}$$

where $\eta$ denotes the comparison threshold. For our application, this value is constant ($\eta = 10\% \times \max(d_{c_i})$), but it could be made adaptive.
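The detection criterion of equation (6) can be illustrated with a short sketch. The function names `detect_boundaries` and `jump_sign` are hypothetical, and the way the sign of the frequency jump is read (from the larger of the two neighboring values of g) is an assumption made for this example, not a rule stated in the text.

```python
def detect_boundaries(g, ratio=0.10):
    """Return (indices, d, eta): indices t where d(t) = (g(t) - g(t-1))**2 exceeds
    the threshold eta = ratio * max(d)."""
    d = [0.0] + [(b - a) ** 2 for a, b in zip(g, g[1:])]
    eta = ratio * max(d)
    return [t for t, v in enumerate(d) if v > eta], d, eta


def jump_sign(g, t):
    """+1 if the frequency jumps up around sample t, -1 if it jumps down."""
    peak = max((g[t - 1], g[t]), key=abs)
    return 1 if peak > 0 else -1


# Usage: one upward jump near t = 3 and one downward jump near t = 7.
g = [0, 0, 0, 0.5, 0, 0, 0, -0.4, 0, 0]
idx, d, eta = detect_boundaries(g)
signs = [jump_sign(g, t) for t in idx]
```

Each isolated peak of g produces two consecutive detections (its rising and falling edges), so in practice neighboring indices would be grouped into one segment boundary.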

When a new vocalization appears in the recordings, the function $g_{c_i}$ calculated from the first mode varies suddenly, and the value of the detection criterion $d_{c_i}$ exceeds the threshold $\eta$.

Moreover, this function $g_{c_i}$ will have a positive maximum and a negative maximum, respectively, at the start and the end of one single vocalization, as the vocalization frequencies are usually higher than the low ambient noise frequencies.

In addition, because two vocalizations have two different main frequencies, $g_{c_i}$ will present discontinuities, which are used for the vocalization segmentation.

Our criterion is successively applied on the first mode, then the second mode, and so on. At the end of this process, we obtain all the segments and we can determine their length.

The $i$th IMF is

$$c_i = \{c_i^1 \,|\, c_i^2 \cdots |\, c_i^N\}, \tag{7}$$

with $c_i^j$ being the $j$th segment of $c_i$ defined by

$$c_i^j = \{c_i(t_{j-1}+1),\ c_i(t_{j-1}+2),\ \ldots,\ c_i(t_j-1),\ c_i(t_j)\}, \tag{8}$$

where $t_{j-1}$ and $t_j$ are the times of the last samples of segments $c_i^{j-1}$ and $c_i^j$, respectively. Note that $t_0 = 0$ and $t_N = T$.

In our approach, we apply either a downward shift or a permutation of the $j$th segments between two modes $c_{i-1}$ and $c_i$. These combinations allow us to link specific information to one single IMF. Our objective is to track the fundamental frequency and the harmonics of the killer whale vocalizations (see Section 3). Each vocalization will be linked to one mode.

The new mode $m_i$ is the result of combining segments of the previous IMFs,

$$m_i = \{c_k^1 \,|\, c_k^2 \cdots |\, c_k^j \,|\, \cdots c_k^N\}. \tag{9}$$

The combination depends on the positive or negative maximum of $g_{c_i}$, when $d_{c_i}(t) > \eta$.

(i) $\max(g_{c_i}) > 0$. This means that the instantaneous frequency of the end of segment $c_i^j$ is less than the instantaneous frequency of the start of the next segment $c_i^{j+1}$. Concerning segment $c_i^j$, the vocalization could continue on segment $c_{i+1}^{j+1}$. So, our process consists of switching this segment $c_i^j$ to the new $m_{i+1}^j$ and putting zeros $z_i^j$ in the new $m_i^j$,

$$z_i^j = \{\underbrace{0}_{z_i(t_{j-1}+1)},\ \underbrace{0}_{z_i(t_{j-1}+2)},\ \ldots,\ \underbrace{0}_{z_i(t_j-1)},\ \underbrace{0}_{z_i(t_j)}\}. \tag{10}$$

We repeat this process on the segment of each following mode: $m_{k+1}^j = c_k^j$ with $k \ge i$. Segment $c_i^{j+1}$, in contrast, is the start of a new vocalization. Our process does not modify this segment or those that follow.

(ii) $\max(g_{c_i}) < 0$. The instantaneous frequency of the end of segment $c_i^j$ is higher than the instantaneous frequency of the start of the next segment $c_i^{j+1}$. This means that segment $c_i^j$ marks the end of the vocalization. This segment is not modified. All the following segments $c_k^l$ ($l \ge j+1$) of this mode are switched to the next mode ($k+1$): $m_{k+1}^l = c_k^l$, and we replace the current segments with zeros $z_k^l$.

This process is summarized in Table 1.

This process of combining is done from the first to the last IMF. Because the number of modes and the number of segments are finite, the process ends on its own.
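The two elementary moves of cases (i) and (ii) can be sketched on a small data structure. This is a hedged illustration, not the implementation used in this work: `modes[k][j]` is assumed to hold the samples of the jth segment of the kth mode (0-based indices, unlike the 1-based notation of the text), the function names `shift_segment` and `shift_following` are hypothetical, and appending an extra last mode when a nonzero segment would otherwise be pushed out is a choice made for this example. The order in which the moves are chained across modes is left to the caller.

```python
def shift_segment(modes, i, j):
    """Case (i): move segment j of every mode k >= i to mode k + 1, and zero it in mode i.

    Returns a new structure; the input is left untouched.
    """
    new = [[list(seg) for seg in mode] for mode in modes]
    if any(modes[-1][j]):
        # The last mode carries signal in this segment: add an all-zero mode below it.
        new.append([[0.0] * len(seg) for seg in modes[-1]])
    for k in range(len(modes) - 1, i - 1, -1):
        if k + 1 < len(new):
            new[k + 1][j] = list(modes[k][j])
    new[i][j] = [0.0] * len(modes[i][j])
    return new


def shift_following(modes, i, j):
    """Case (ii): segment j of mode i ends a vocalization; shift every segment l >= j + 1."""
    for l in range(j + 1, len(modes[i])):
        modes = shift_segment(modes, i, l)
    return modes


# Usage with toy segments: constant values stand in for pieces of three sources.
S1, S2, S3, Z = [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [0.0, 0.0]
modes = [[S3, S1, S1, S2, S3],
         [Z, S3, S2, S3, Z],
         [Z, Z, S3, Z, Z]]
m = shift_segment(modes, 0, 0)    # mode 0: frequency jumps up after segment 0
m = shift_following(m, 0, 2)      # mode 0: frequency drops after segment 2
m = shift_segment(m, 1, 0)        # mode 1: segments 0 and 1 precede an upward jump
m = shift_segment(m, 1, 1)
m = shift_following(m, 1, 3)      # mode 1: frequency drops after segment 3
```

In this toy example, each new mode ends up holding segments from a single source, which is the outcome the combination step aims for.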

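The newly obtained signal is one-dimensional with $T$ samples and is given by

$$u = \left\{ \sum_{i=1}^{M} m_i^1 \,\middle|\, \sum_{i=1}^{M} m_i^2 \cdots \middle|\, \sum_{i=1}^{M} m_i^N \right\}. \tag{11}$$

The following step is optional. We use a weighting factor ($\lambda_i^j \in \mathbb{R}$) on each segment,

$$u = \left\{ \sum_{i=1}^{M} \lambda_i^1 m_i^1 \,\middle|\, \sum_{i=1}^{M} \lambda_i^2 m_i^2 \cdots \middle|\, \sum_{i=1}^{M} \lambda_i^N m_i^N \right\}. \tag{12}$$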

We diminish the role of a segment by using low values of the weighting factors; we can even delete certain segments by using $\lambda_i^j = 0$. Consequently, this step allows us to amplify or attenuate one or more segments of the combined IMF. The values of these weighting coefficients must be chosen based on the objective of the application. In many cases, it could be appropriate to fix a value dependent on the signal frequencies. In our application, we amplify the highest frequencies and attenuate the lowest frequencies, which correspond to the killer whale vocalizations and the ambient noise, respectively; in this sense, we use our process like a filter. In other applications, the objective could be to use a criterion based on the signal energy, for example, to reduce high-energy segments and amplify low-energy segments.

Equation (12) demonstrates the possibility of using the new IMFs to select certain parts of the original signal.
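The weighted recombination of equation (12) can be written directly. In this sketch the name `recombine` and the layout of the inputs are assumptions: `modes[i][j]` holds the samples of the jth segment of the ith combined mode and `weights[i][j]` is the corresponding weighting factor; omitting the weights gives equation (11).

```python
def recombine(modes, weights=None):
    """Concatenate, segment by segment, the weighted sum of the combined modes."""
    n_modes, n_segments = len(modes), len(modes[0])
    if weights is None:
        weights = [[1.0] * n_segments for _ in range(n_modes)]
    u = []
    for j in range(n_segments):
        for t in range(len(modes[0][j])):
            u.append(sum(weights[i][j] * modes[i][j][t] for i in range(n_modes)))
    return u


# Usage: two modes, two segments of lengths 2 and 1.
modes = [[[1.0, 2.0], [3.0]],
         [[10.0, 20.0], [30.0]]]
full = recombine(modes)                              # unweighted sum
first_only = recombine(modes, [[1, 1], [0, 0]])      # second mode deleted
```

Setting all the weights of one mode to zero removes that mode from the output, which is how a component regarded as noise could be discarded.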

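Table 1: Combination of segments; case 1: $\max(g_{c_i}) > 0$; case 2: $\max(g_{c_i}) < 0$. In the original table, each case is illustrated by a sketch of $f_{c_i}$ and $g_{c_i}$ over two successive segments $c_i^j$ and $c_i^{j+1}$ (the dotted line is the separation of 2 successive segments).

| Cases | Actions ($k \ge i$, $l \ge j+1$) | Remarks |
|---|---|---|
| 1. $\max(g_{c_i}) > 0$ | Segments $m_k^j$: $m_i^j = z_i^j$, $m_{i+1}^j = c_i^j$, $m_{i+2}^j = c_{i+1}^j$, $m_{i+3}^j = c_{i+2}^j$, ... Segments $m_k^l$: no change. | Segment $c_{i+1}^j$ could be the continuation of segment $c_i^{j+1}$ (possible parts of the same vocalization). |
| 2. $\max(g_{c_i}) < 0$ | Segments $m_k^j$: no change. Segments $m_k^l$: $m_i^l = z_i^l$, $m_{i+1}^l = c_i^l$, $m_{i+2}^l = c_{i+1}^l$, $m_{i+3}^l = c_{i+2}^l$, ... | Segment $c_{i+1}^j$ is the last part of the vocalization. All segments $c_k^l$ are switched to the segments $c_{k+1}^l$. |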

3. RESULTS

Our research team is involved in a scientific project based on the detection and localization of marine mammals using passive acoustics. We have already used the HHT for different kinds of bioacoustic transient signals, particularly sperm whale clicks [27]. We now apply the method to harmonic signals. In this section, we show the results obtained on simulated chirps, then we illustrate its performance on killer whale vocalizations.

3.1. Analysis of the simulated three chirps signal

To present our method in detail, we have generated a simulated signal composed of three chirps with varying frequencies (linear, convex, or concave) (Figure 2(A)).

The normalized frequency of the first chirp $s_1$ varies from 0.062 to 0.022. $s_2$ is the second chirp, with a concave variation of the normalized frequency from 0.016 to 0.08. $s_3$ is the third chirp, with a linear variation of the normalized frequency from 0.008 to 0.012.

In this example, we use normalized frequency because what matters is the frequencies of the chirps rather than the value of the sampling frequency.
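A test signal of this kind can be generated by accumulating the phase of a prescribed frequency law. The sketch below is only indicative: the names `chirp`, `linear_law`, and `power_law` are hypothetical, and the signal length, the curvature exponents, and the fact that all three chirps span the whole record are assumptions, since the durations and onsets of the simulated chirps are not specified here. Only the start and end frequencies are taken from the text.

```python
import math


def chirp(freqs):
    """Unit-amplitude chirp whose normalized frequency (cycles/sample) follows freqs."""
    phase, out = 0.0, []
    for f in freqs:
        out.append(math.cos(phase))
        phase += 2 * math.pi * f  # phase is the running sum of the frequency law
    return out


def linear_law(f_start, f_end, n):
    """Frequency varying linearly from f_start to f_end over n samples."""
    return [f_start + (f_end - f_start) * t / (n - 1) for t in range(n)]


def power_law(f_start, f_end, n, exponent=2.0):
    """Curved frequency law from f_start to f_end; the exponent sets the curvature."""
    return [f_start + (f_end - f_start) * (t / (n - 1)) ** exponent for t in range(n)]


# Usage: three chirps with the frequency ranges quoted above (assumed length and curvature).
n = 1000
s1 = chirp(power_law(0.062, 0.022, n, exponent=0.5))
s2 = chirp(power_law(0.016, 0.08, n, exponent=2.0))
s3 = chirp(linear_law(0.008, 0.012, n))
signal = [a + b + c for a, b, c in zip(s1, s2, s3)]
```

Working in cycles per sample keeps the example independent of any sampling frequency, in line with the use of normalized frequency in this section.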

The spectrogram is provided in Figure 2(B).

The first step of our approach involves performing the EMD (Figure 2(C)). We note that the first three modes contain all the frequency variations of the three chirps.

Providing the time-frequency representation of all these modes reveals the frequencies of each chirp. With the EMD, these frequencies are hierarchically allocated to each mode, meaning that at each moment, the first mode has the highest frequency and the last mode has the lowest frequency.

Figure 2(D) shows that the IMFs have frequencies originating from all three chirps. IMF 1 successively contains the frequencies from chirp $s_3$, then from $s_1$, then from $s_2$, and then from $s_3$ again. Similarly, IMF 2 is composed of frequencies from $s_3$, then $s_2$, and $s_3$ again. Finally, IMF 3 contains only a short part of the frequency of $s_3$.

Feature extraction from the time-frequency representation (Figure 2(B)) requires 2D algorithms, such as the edge detection algorithm. Our approach allows us to avoid these algorithms, which are common in image processing.

In our simulated signal analysis, the work results in linking one complete chirp to one single IMF. The point of using the new combined IMFs is that the new IMF 1 receives its frequency solely from chirp $s_1$. New IMF 2 and IMF 3 receive frequencies solely from $s_2$ and $s_3$, respectively (see (9)).

To segment these IMFs, we monitor the variations of the $g_{c_i}$ parameter (Figure 2(E)). In our example, five segments are obtained from this parameter (Figure 2(F)). Note that to avoid the side effects resulting from the segmentation process, we force the segments to start and end at zero by applying a Tukey window [34].

Then, the IMFs are combined (see (9) and Figure 2(G)). The Hilbert transform is applied on these new combined IMFs to provide the time-frequency representation (Figure 2(H)). The resulting figure confirms that the new IMFs have the frequencies of the original chirps.

If one of these chirps is considered a source of noise, we could discard this chirp by using weighting coefficients equal to zero. For example, we can delete $m_3$ by applying $\lambda_3^j = 0$.

The advantage is that we can use a 1D algorithm to extract the frequency from each new IMF (in our case, the interpolation could be done by using a simple first-order or second-order polynomial regression). We do not have to employ 2D algorithms.

Figure 2: (a) Decomposition of the original simulated signal; (A) original signal with the three chirps, (B) spectrogram, (C) EMD decomposition, (D) Hilbert transform of each IMF. (b) Segmentation of the IMFs; (D) Hilbert transform of each IMF, (E) computation of $g_{c_i}$ and $d_{c_i}$, with the threshold $10\% \times \max(d_{c_i})$, (F) segmentation of the IMFs. (c) Combination of the IMFs; (F) segmentation of the IMFs, (G) new combined IMFs, (H) Hilbert transform applied on these new IMFs. Time-domain panels show relative amplitude versus time; time-frequency panels show normalized frequency versus time.

Figure 3: Decomposition of two harmonic killer whale vocalizations; (a) original signal, (b) EMD, (c) Hilbert transform of each new IMF.

In conclusion, we have linked one chirp to one single new IMF. We have also shown that it is possible to filter the signal through this method.

3.2. Analysis of killer whale vocalizations

Killer whales emit vocalizations with various time and frequency characteristics (short, long, with or without harmonics, etc.). Killer whales live and evolve in social groups, so it is very rare to have recordings from only one individual, except for animals in aquariums. Therefore, in these recordings, it is common to find more than one vocalization at the same time. This complicates the detection of these vocalizations. Another challenge is to find one complete vocalization. At times, a single complete vocalization is segmented into many components. This depends on the method used to provide the time-frequency representation. When the signal-to-noise ratio is weak, the binarized spectrogram commonly extracts different parts of one single vocalization separately. To prevent this, other methods have been proposed, like the chirplet transform and the wavelet transform [16, 21, 25].

In our dataset, the vocalizations have been recorded from a group of killer whales in their natural environment. Vocalization segmentation is commonly accomplished by applying the spectrogram. The analysis of this time-frequency representation is executed with the aid of a threshold to binarize the spectrogram, or of an edge detector [4, 5]. The performance depends on (1) the signal-to-noise ratio, which varies throughout the recordings, and (2) the simultaneous presence of more than one vocalization. Our method was introduced as a solution to overcome these two obstacles. First, the ambient noise has lower frequencies than the vocalizations, so it is coded by the last IMFs. Second, each vocalization is linked to a single combined IMF. This facilitates feature extraction (duration of the vocalization, start and end frequencies, and shape).

Table 2: Detection of vocalizations; % of detection of complete vocalizations, % of detection of simultaneous vocalizations.

| Detection of vocalizations | Spectrogram | Chirplet transform | Combined IMFs |
|---|---|---|---|
| Complete (%) | 76.9 | 95 | 95 |
| Simultaneous (%) | 78 | 31.7 | 92.7 |

In our application, we do not take into account the last IMFs. In our previous work [27], we defined a performance/complexity criterion based on the contribution of each mode to the reconstruction of the complete original signal. Applied to this dataset, this criterion shows that the first five IMFs are sufficient for extracting killer whale vocalizations. This low number of IMFs is consistent with the results obtained by Wang et al. [25]. Considering only the first five IMFs helps minimize the execution time of this approach.

In the second step of the process, the modes are combined following our algorithm to link one vocalization to one mode.

We have compared the detection performance of three methods: the spectrogram, the chirplet transform, and our approach based on the combined IMFs. Results appear in Table 2. We consider a detection to be accurate when the vocalization is detected over its full length. A segmented vocalization is considered to be falsely detected.

When using the spectrogram, detection quality depends mainly on the threshold value. In this application, we have used a fixed threshold for the complete dataset in spite of the varying ambient noise. The consequence is that 25% of the vocalizations are segmented. Thus, the spectrogram detector extracts many successive vocalizations that are in fact all components of the same vocalization. These results could be slightly improved by using an adaptive threshold.

With the chirplet transform, the results degrade significantly in the presence of simultaneous vocalizations. In these cases, it seems that the algorithm extracts the vocalization containing the greatest energy. Our method is more robust because these different vocalizations are linked to different combined modes. The detection process is done on each mode.

Another advantage of our approach concerns vocalizations with harmonics. The presence of these harmonics helps biologists characterize and classify sounds emitted by animals. Our method also enables linking one harmonic to a single mode (as seen in Figure 3). Unlike in the previous case, the vocalizations with harmonics are distinguishable from simultaneous vocalizations because all the harmonic components have the same shape.

Figure 4: Extraction of the vocalization features; (a) original signal (relative amplitude versus time in seconds), (b) Hilbert transform (normalized frequency versus time in seconds), (c) characterization of the vocalization (normalized frequency versus time in seconds).

Another advantage of our method is that it allows us to easily characterize each vocalization (duration, start and end frequencies, and shape) by applying the Hilbert transform on each combined mode $m_i$. We employ a simple 1D function to model the vocalizations. This is illustrated on a sample of our dataset (Figure 4): we have extracted the start and the end of the vocalization, and the shape by applying a third-order polynomial regression.
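The 1D characterization step can be illustrated by a least-squares polynomial fit of a frequency track. This is a sketch under assumptions, not the procedure used in this work: the names `fit_polynomial` and `describe_vocalization` are hypothetical, time is rescaled to [0, 1] for numerical conditioning, and the fit is solved through the normal equations with Gaussian elimination to stay within the standard library (a numerical library routine would normally be preferred).

```python
def fit_polynomial(t, y, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] of y ~ sum_k a_k * t**k."""
    m = degree + 1
    # Normal equations A a = b with A[r][c] = sum t^(r+c) and b[r] = sum y * t^r.
    a = [[sum(ti ** (r + c) for ti in t) for c in range(m)] for r in range(m)]
    b = [sum(yi * ti ** r for ti, yi in zip(t, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs


def describe_vocalization(freq, degree=3):
    """Start frequency, end frequency, and polynomial shape of one frequency track."""
    n = len(freq)
    t = [k / (n - 1) for k in range(n)]  # time rescaled to [0, 1]
    coeffs = fit_polynomial(t, freq, degree)

    def fitted(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))

    return {"start": fitted(0.0), "end": fitted(1.0), "coeffs": coeffs}


# Usage: a synthetic cubic frequency track.
track = [0.02 + 0.03 * u - 0.01 * u ** 2 + 0.005 * u ** 3
         for u in (k / 99 for k in range(100))]
features = describe_vocalization(track)
```

The polynomial coefficients give a compact description of the shape of the vocalization, while the fitted values at both ends give its start and end frequencies.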


4. CONCLUSION

After obtaining promising results on sperm whale clicks (transient signals), our objective was to evaluate the Hilbert-Huang transform on harmonic killer whale vocalizations. To this end, we propose a new method based on an original combination of the intrinsic mode functions obtained by the empirical mode decomposition. The advantages of our method are that (1) we filter the signal using the new combined modes; (2) we link one vocalization (or one harmonic) to one single mode; and (3) we use a 1D algorithm to characterize the vocalizations.

ACKNOWLEDGMENT

This work was supported by Association DIRAC (France).

REFERENCES

[1] J. Cirillo, S. Renner, and D. Todt, “Significance of context-related changes in compositions and performances of group-repertoires: evidence from the vocal accomplishments of Orcinus orca,” in Proceedings of the 20th Annual Conference of the European Cetacean Society, pp. 70–71, Gdynia, Poland, April 2006.

[2] A. Kumar, “Animal communication,” Current Science, vol. 85, no. 10, pp. 1398–1400, 2003.

[3] W. A. Kuperman, G. L. D’Spain, and K. D. Heaney, “Long range source localization from single hydrophone spectrograms,” Journal of the Acoustical Society of America, vol. 109, no. 5, pp. 1935–1943, 2001.

[4] D. Mellinger, “Automatic detection of regularly repeating vocalizations,” Journal of the Acoustical Society of America, vol. 118, no. 3, p. 1940, 2005.

[5] D. Gillespie, “Detection and classification of right whale calls using an edge detector operating on smoothed spectrogram,” Journal of the Canadian Acoustical Association, vol. 32, pp. 39–47, 2004.

[6] R. A. Charif, D. W. Ponirakis, and T. P. Krein, “Raven Lite 1.0 User’s Guide,” Cornell Laboratory of Ornithology, Ithaca, NY, USA, 2006.

[7] R. Specht, www.avisoft.de.

[8] H. Figueroa, “Acoustic tool development with XBAT,” in Proceedings of the 2nd International Workshop on Detection and Localization of Marine Mammals Using Passive Acoustics, p. 53, Monaco, France, November 2005.

[9] S. Jarvis, D. Moretti, R. Morrissey, and N. Dimarzio, “Passive monitoring and localization of marine mammals in open ocean environments using widely spaced bottom mounted hydrophones,” Journal of the Acoustical Society of America, vol. 114, no. 4, pp. 2405–2406, 2003.

[10] C. Hory, N. Martin, and A. Chehikian, “Spectrogram segmentation by means of statistical features for non-stationary signal interpretation,” IEEE Transactions on Signal Processing, vol. 50, no. 12, pp. 2915–2925, 2002.

[11] C. Ioana and A. Quinquis, “On the use of time-frequency warping operators for analysis of marine-mammal signals,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’04), vol. 2, pp. 605–608, Montreal, Canada, May 2004.

[12] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert transform spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[13] R. Tolimieri and M. An, Time-Frequency Representations,Applied and Numerical Harmonic Analysis, Birkhauser,Boston, Mass, USA, 1997.

[14] S.-H. Chang and F.-T. Wang, “Application of the robustdiscrete wavelet transform to signal detection in underwatersound,” International Journal of Electronics, vol. 90, no. 6, pp.361–371, 2003.

[15] R. Huele and H. Udo de Haes, “Identification of individualsperm whales by wavelet transform of the trailing edge of theflukes,” Marine Mammal Science, vol. 14, no. 1, pp. 143–145,1998.

[16] M. Lopatka, O. Adam, C. Laplanche, J. Zarzycki, and J.-F. Motsch, “An attractive alternative for sperm whale click detection using the wavelet transform in comparison to the Fourier spectrogram,” Aquatic Mammals, vol. 31, no. 4, pp. 463–467, 2005.

[17] M. Lopatka, O. Adam, C. Laplanche, J. Zarzycki, and J.-F. Motsch, “Effective analysis of non-stationary short-time signals based on the adaptative schur filter,” Transactions on Systems, Signals & Devices, vol. 1, no. 3, pp. 295–319, 2005.

[18] M. P. Fargues and R. Bennett, “Comparing wavelet transforms and AR modelling as feature extraction tools for underwater signal classification,” in Proceedings of the 29th Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 915–919, Pacific Grove, Calif, USA, October-November 1995.

[19] J. Ioup and G. Ioup, “Identifying individual sperm whales acoustically using self-organizing maps,” Journal of the Acoustical Society of America, vol. 118, no. 3, p. 2001, 2005.

[20] M. van der Schaar, E. Delory, A. Catala, and M. Andre, “Neural network-based sperm whale click classification,” Journal of the Marine Biological Association of the UK, vol. 87, no. 1, pp. 35–38, 2007.

[21] S. Mann and S. Haykin, “The chirplet transform: physical considerations,” IEEE Transactions on Signal Processing, vol. 43, no. 11, pp. 2745–2761, 1995.

[22] J. Cui, W. Wong, and S. Mann, “Time-frequency analysis of visual evoked potentials using chirplet transform,” Electronics Letters, vol. 41, no. 4, pp. 217–218, 2005.

[23] N. E. Huang, C. C. Chern, K. Huang, L. W. Salvino, S. R. Long, and K. L. Fan, “A new spectral representation of earthquake data: Hilbert spectral analysis of station TCU129, Chi-Chi, Taiwan, 21 September 1999,” Bulletin of the Seismological Society of America, vol. 91, no. 5, pp. 1310–1338, 2001.

[24] P. Hwang, J. Kaihatu, and D. Wang, “A comparison of the energy flux computation of shoaling waves using Hilbert and wavelet spectral analysis technique,” in Proceedings of the 7th International Workshop on Wave Hindcasting and Forecasting, Banff, Canada, October 2002.

[25] F.-T. Wang, S.-H. Chang, and J. C.-Y. Lee, “Signal detection in underwater sound using the empirical mode decomposition,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E89-A, no. 9, pp. 2415–2421, 2006.

[26] A. D. Veltcheva and C. G. Soares, “Identification of the components of wave spectra by the Hilbert-Huang transform method,” Applied Ocean Research, vol. 26, no. 1-2, pp. 1–12, 2004.

[27] O. Adam, “The use of the Hilbert-Huang transform to analyze transient signals emitted by sperm whales,” Applied Acoustics, vol. 67, no. 11-12, pp. 1134–1143, 2006.

[28] O. Adam, “Advantages of the Hilbert-Huang transform for marine mammals signals analysis,” Journal of the Acoustical Society of America, vol. 120, no. 5, pp. 2965–2973, 2006.


[29] M. A. Chappell and S. J. Payne, “A method for the automated detection of venous gas bubbles in humans using empirical mode decomposition,” Annals of Biomedical Engineering, vol. 33, no. 10, pp. 1411–1421, 2005.

[30] P. J. Oonincx and J.-P. Hermand, “Empirical mode decomposition of ocean acoustic data with constraint on the frequency range,” in Proceedings of the 7th European Conference on Underwater Acoustics, Delft, The Netherlands, July 2004.

[31] I. M. Janosi and R. Muller, “Empirical mode decomposition and correlation properties of long daily ozone records,” Physical Review E, vol. 71, no. 5, Article ID 056126, 5 pages, 2005.

[32] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, pp. 112–114, 2004.

[33] J. C. Cexus, A. O. Boudraa, L. Guillon, and A. Khenchaf, “Sonar targets analysis by Huang Teager Transform (THT),” Colloque Sea Tech Week, CMM 2006.

[34] R. B. Blackman and J. W. Tukey, The Measurement of Power Spectra from the Point of View of Communication Engineering, Dover, Mineola, NY, USA, 1958.


Research Article

Evaluating Pavement Cracks with Bidimensional Empirical Mode Decomposition

Albert Ayenu-Prah and Nii Attoh-Okine

Department of Civil and Environmental Engineering, University of Delaware, Newark, DE 19716-3120, USA

Correspondence should be addressed to Nii Attoh-Okine, [email protected]

Received 5 September 2007; Accepted 2 March 2008


Crack evaluation is essential for effective classification of pavement cracks. Digital images of pavement cracks have been analyzed using techniques such as fuzzy set theory and neural networks. Bidimensional empirical mode decomposition (BEMD), a recently developed image analysis method, can potentially be used for pavement crack evaluation. BEMD is an extension of the empirical mode decomposition (EMD), which can decompose nonlinear and nonstationary signals into basis functions called intrinsic mode functions (IMFs). IMFs are monocomponent functions that have well-defined instantaneous frequencies. EMD is a sifting process that is nonparametric and data driven; it does not depend on an a priori basis set. It is able to remove noise from signals without complicated convolution processes. BEMD decomposes an image into two-dimensional IMFs. The present paper explores pavement crack detection using BEMD together with the Sobel edge detector. A number of images are filtered with BEMD to remove noise, and the residual image is analyzed with the Sobel edge detector for crack detection. The results are compared with results from the Canny edge detector, which uses a Gaussian filter for image smoothing before performing edge detection. The objective is to qualitatively explore how well BEMD is able to smooth an image for more effective edge detection with the Sobel method.

Copyright © 2008 A. Ayenu-Prah and N. Attoh-Okine. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Pavement evaluation is an essential part of a good pavement management system for effective maintenance, rehabilitation, and reconstruction (MR&R) decision making. Pavement evaluation involves condition surveys to monitor the overall health of the pavement network, with recommendations made regarding maintenance actions. Traditionally, pavement condition surveys are visual surveys in which a crew is sent out to inspect sections of pavement for various types of distress. The most popular method is the pavement condition index (PCI) method developed by the United States Army Corps of Engineers. The PCI assessment is a visual procedure by which a selected pavement section is evaluated for distress type, severity, and quantity. Apart from being subjective and dependent on the expertise of the inspector, the method is also quite expensive. A more objective and less expensive approach is automated pavement distress evaluation, in which images of distresses are acquired automatically and analyzed using feature selection methods such as edge detection techniques for distress detection and identification. Various image-processing techniques such as fuzzy set theory [1], neural networks [2], and Markov methods [3] have been used to analyze cracking in road pavements. Furthermore, there has been work in the area of aggregate shape characteristics [4–6] using various imaging techniques.

The present paper explores pavement crack detection using a new method called the bidimensional empirical mode decomposition (BEMD) together with a well-known edge detector, the Sobel edge detector. A number of images are smoothed with BEMD to remove noise, and the residual image is analyzed with the Sobel edge detector for crack detection. The results are compared with results from the Canny edge detector, which first filters out noise from the image with a Gaussian filter before performing edge detection. The objective is to qualitatively determine how well BEMD is able to smooth an image for more effective edge detection using the Sobel method.

2. BIDIMENSIONAL EMPIRICAL MODE DECOMPOSITION

The bidimensional empirical mode decomposition (BEMD) is the 2-D extension of the empirical mode decomposition (EMD), which is part of the Hilbert-Huang transform (HHT) developed by Huang et al. [7]. The EMD is a multiresolution decomposition method that decomposes signals into basis functions that are adapted from the signals themselves. That is, no a priori basis functions are defined for the decomposition, as in Fourier-based methods, in which sines and cosines are used as predefined basis functions and then convolved with the signal. Therefore, Fourier methods are most suitable for linear and stationary signals. The EMD hinges on the idea of instantaneous frequency; instantaneous frequency becomes valid only when the signal is made symmetric with respect to the local zero-mean line. Upper and lower envelopes, which cover all local maxima and local minima, respectively, are constructed, and their mean is then iteratively removed in order to force local symmetry about the zero-mean line; the procedure has been termed “sifting.” The sifting process generates basis functions known as intrinsic mode functions (IMFs), which are adaptively derived from the signal within its local time scale; IMFs have instantaneous frequency defined at every point. Therefore, while the EMD is a local decomposition method, Fourier-based methods are global in nature and require a transformation into the frequency domain in order to determine the energy content of the signal; it is not possible to achieve that in the time domain.

The HHT represents the energy content of a signal in an energy-frequency-time domain called the Hilbert spectrum; energy content is analyzed in the time domain so that the exact instant at which an event occurs is known. It differs from the wavelet transform, however, in that wavelets still need a priori defined basis sets, similar to the Fourier transform. Huang et al. [7] give the full treatment of the HHT method. The process used to generate the Hilbert spectrum is called the Hilbert spectral analysis (HSA). Thus the HHT consists of two parts, EMD and HSA.

IMFs have certain requirements that need to be met in order to be acceptable:

(i) the number of zero crossings and the number of extrema must be equal or differ by at most one over the whole data set (to remove riding waves); and

(ii) the mean value of the envelope defined by the local maxima and the envelope defined by the local minima must be zero at every point.
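Requirement (i) can be checked by direct counting. The following is a minimal Python sketch, not the authors' Matlab code; the function names, the two test signals, and the choice to ignore flat runs and exact zeros are assumptions made for illustration only.

```python
import math


def count_zero_crossings(x):
    """Count sign changes between consecutive nonzero samples."""
    signs = [1 if v > 0 else -1 for v in x if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def count_extrema(x):
    """Count strict interior local maxima and minima (flat runs are ignored)."""
    return sum(
        1
        for prev, cur, nxt in zip(x, x[1:], x[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )


def satisfies_condition_i(x):
    """Requirement (i): zero crossings and extrema differ by at most one."""
    return abs(count_zero_crossings(x) - count_extrema(x)) <= 1


# Illustrative signals sampled at 201 points over one second (assumed values).
t = [i / 200 for i in range(201)]
# A single tone: every extremum is separated by a zero crossing.
tone = [math.sin(2 * math.pi * 5 * ti + 0.3) for ti in t]
# A slow wave carrying a fast riding wave: many extrema without zero crossings.
riding = [
    math.sin(2 * math.pi * ti) + 0.5 * math.sin(2 * math.pi * 15 * ti + 0.3)
    for ti in t
]
```

Here `satisfies_condition_i(tone)` holds, while the riding-wave signal fails the check because most of its extrema occur away from the zero line. Requirement (ii) is not tested by this sketch, since it needs the envelopes themselves.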

An important step in the EMD process is the construction of the maxima and minima envelopes; research has shown that the cubic spline is the best fit for 1-D EMD. There are stopping criteria for the EMD process to prevent the resulting IMFs from being just purely frequency- and amplitude-modulated components. Two stopping criteria have been proposed: a Cauchy-type convergence that depends on limiting the standard deviation computed from two consecutive sifting results [7], and one that depends on the agreement of the numbers of extrema and zero crossings [8]. The whole EMD is stopped when the final residue becomes a monotonic function or a constant. A snapshot of the sifting process to generate IMFs is shown in Figure 1, in which two loops are presented: the inner loop iterates for IMFs, while the outer loop subtracts the most current IMF from the original signal, or from what is left of it after previous IMFs have been removed, and then passes execution to the inner loop for the next IMF.

Figure 1: Pictorial representation of EMD. [Flowchart. Inner loop: original signal; construct upper and lower envelopes, and find mean; subtract mean from original signal; check inner loop residue for IMF qualification; if not an IMF, treat inner loop residue as original signal. Outer loop: if an IMF, store IMF; subtract IMF from original signal, and treat outer loop residue as original signal.]
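The two loops described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: piecewise-linear envelopes stand in for the cubic spline to keep the example free of external libraries, the envelopes are held constant beyond the outermost extrema, and the convergence measure is a normalized variant of the Cauchy-type criterion of [7]. The names `sift_once`, `sd`, `emd` and the parameters `sd_tol`, `max_sift`, and `max_imfs` are hypothetical.

```python
import math


def local_extrema(x):
    """Return indices of strict interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(knots, x):
    """Piecewise-linear envelope through (k, x[k]) for k in knots.

    Assumption: linear interpolation replaces the cubic spline, and the
    envelope is held constant outside the first and last knots.
    """
    env, k = [], 0
    for i in range(len(x)):
        if i <= knots[0]:
            env.append(x[knots[0]])
        elif i >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[k + 1] < i:
                k += 1
            a, b = knots[k], knots[k + 1]
            w = (i - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env


def sift_once(x):
    """Subtract the envelope mean from x; return None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper, lower = envelope(maxima, x), envelope(minima, x)
    return [v - 0.5 * (u + l) for v, u, l in zip(x, upper, lower)]


def sd(prev, cur):
    """Normalized squared difference between two consecutive sifting results."""
    num = sum((p - c) ** 2 for p, c in zip(prev, cur))
    den = sum(p * p for p in prev) + 1e-12
    return num / den


def emd(x, sd_tol=0.2, max_sift=50, max_imfs=8):
    """Return (imfs, residue) such that x equals sum(imfs) + residue up to rounding."""
    residue, imfs = list(x), []
    while len(imfs) < max_imfs:        # outer loop: one IMF per pass
        h = sift_once(residue)
        if h is None:                  # too few extrema left: stop
            break
        for _ in range(max_sift):      # inner loop: sift until SD is small
            nxt = sift_once(h)
            if nxt is None:
                break
            converged = sd(h, nxt) < sd_tol
            h = nxt
            if converged:
                break
        imfs.append(h)
        residue = [r - c for r, c in zip(residue, h)]
    return imfs, residue


# Illustrative two-tone test signal (assumed values).
t = [i / 400 for i in range(401)]
signal = [math.sin(2 * math.pi * 3 * ti) + 0.5 * math.sin(2 * math.pi * 25 * ti) for ti in t]
imfs, residue = emd(signal)
```

By construction the stored IMFs and the final residue add back to the input, which is the completeness property the decomposition relies on. A production implementation would use spline envelopes with proper boundary handling, which this sketch deliberately omits.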

The HHT has a number of advantages that make it desirable for signal analysis. The process is empirical, and the most computationally intensive step is the EMD operation, which does not involve convolution or other time-consuming operations; this makes the HHT well suited to large signals. The Hilbert-Huang spectrum does not involve the concept of frequency resolution but rather instantaneous frequency, which is desirable for local analyses.

The success of the 1-D EMD prompted research into a 2-D version, which may be used for image processing. Linderhed [9] first introduced 2-D EMD, which has subsequently been called bidimensional empirical mode decomposition (BEMD). The basic steps in BEMD are the same as for the EMD, only in two dimensions. Of much importance is the envelope construction for maxima and minima; in this case, scattered data interpolation (SDI) is used to construct 2-D surfaces. Various SDI methods have been used to construct maxima and minima envelopes, but unpublished results of recent comprehensive analyses conducted by the authors of the present research were not conclusive regarding the superiority of one SDI method over another when various methods were used in BEMD analyses of texture and real images. However, Linderhed [10] preferred radial basis functions (RBFs) with thin-plate splines. The appropriate SDI method would depend on the objective of the BEMD analysis. Before SDI can be performed, appropriate extrema detection needs to be carried out. Detection of extrema has been achieved with methods including morphological reconstruction based on geodesic operators [11] and neighboring windows [10]. The stopping criteria for BEMD are similar to those for the 1-D EMD. BEMD has been used for texture analysis [12] and image compression [13]. Recently, Sinclair and Pegram [14] have used it for rainfall analysis and nowcasting.

3. EDGE DETECTION

3.1. Canny method

Edges are areas in an image with sharp intensity gradients. The objective of edge detection algorithms is to seek out these points of rapid intensity change. There are a number of edge detection algorithms, including the Sobel edge detector, the Laplacian of Gaussian method, the Canny edge detector, the fast Fourier transform, the zero-crossing method, the Prewitt method, and the Roberts method. Of all the edge detection algorithms, the Canny edge detector seems to be the most effective in detecting object edges, and the most widely used.

The Canny edge detector detects edges by finding the pixel points where the gradient magnitude is a maximum in the direction of the gradient, that is, in the direction of maximum intensity change. However, the image is first smoothed with a Gaussian filter to remove noise, which is a convolution operation. The detection method is summarized in four steps as follows [15]:

(i) smooth the image by convolving with an appropriate Gaussian filter to reduce image details;

(ii) at each pixel, determine the gradient magnitude and the gradient direction along maximum intensity change;

(iii) mark the pixel as an edge if the gradient magnitude at the pixel is greater than that of the pixels on both sides of it in the gradient direction;

(iv) remove the weak edges by hysteresis thresholding.
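Step (iv) is the least self-explanatory of the four, so a short Python sketch of hysteresis thresholding follows. It assumes a gradient-magnitude array is already available from steps (i) to (iii); the function name `hysteresis`, the use of 8-connectivity, and the threshold values in the example are assumptions for illustration rather than details taken from [15].

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep pixels >= high, plus pixels >= low that connect to them.

    mag is a 2-D list of gradient magnitudes; connectivity is 8-neighbour.
    Returns a 2-D list of booleans marking the retained edge pixels.
    """
    rows, cols = len(mag), len(mag[0])
    edge = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with the strong pixels.
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= high:
                edge[r][c] = True
                queue.append((r, c))

    # Grow outward through weak pixels that touch an accepted pixel.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (
                    0 <= rr < rows
                    and 0 <= cc < cols
                    and not edge[rr][cc]
                    and mag[rr][cc] >= low
                ):
                    edge[rr][cc] = True
                    queue.append((rr, cc))
    return edge


# Toy magnitudes: one strong pixel, two weak pixels attached to it,
# and one isolated weak pixel that should be discarded.
toy = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
]
kept = hysteresis(toy, low=4, high=8)
```

In the toy array, the two weak pixels adjacent to the strong one are retained, while the isolated weak pixel at the end of the row is removed. This illustrates why the choice of the two thresholds controls how many spurious edges survive.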

3.2. Sobel method

Similar to the Canny method, the Sobel edge detector is also a gradient-based method. It detects edges by searching for maxima and minima in the first derivative of the image. However, the Sobel method does not do any presmoothing of the image; therefore, it is more susceptible to noise, but is computationally less expensive and faster. The Sobel edge detector performs a 2-D spatial gradient calculation on a gray-scale image; two 3 × 3 convolution masks are used to calculate gradients, one along the x-direction and the other along the y-direction. The masks are given as follows:

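\[
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \ \text{in the $x$-direction;} \qquad
\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \ \text{in the $y$-direction.}
\tag{1}
\]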
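As a hedged illustration of how the two masks in (1) are applied, the Python sketch below computes the gradient magnitude at each interior pixel of a small gray-scale image. The names `MASK_X`, `MASK_Y`, and `sobel_magnitude` are hypothetical; the direction labels simply follow the text above (other references name the axes the other way round), and border pixels are left at zero as a simplifying assumption.

```python
import math

# Masks as printed in (1); the x/y labels follow the paper's convention.
MASK_X = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
MASK_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def apply_mask(img, mask, r, c):
    """Weighted sum of the 3 x 3 neighbourhood centred on (r, c).

    The mask is applied without flipping (a correlation). Flipping either
    Sobel mask only changes the sign of the response, so the gradient
    magnitude is the same as for a true convolution.
    """
    return sum(
        mask[i][j] * img[r - 1 + i][c - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def sobel_magnitude(img):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_mask(img, MASK_X, r, c)
            gy = apply_mask(img, MASK_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out


# Toy image: a step from intensity 0 to 10 between columns 1 and 2.
step = [[0, 0, 10, 10, 10] for _ in range(5)]
magnitude = sobel_magnitude(step)
# The middle row is [0.0, 40.0, 40.0, 0.0, 0.0]: the response is confined
# to the two columns on either side of the step.
```

Because no smoothing precedes the gradient calculation, any pixel-level noise in the input passes straight into the magnitude image, which is the susceptibility to noise noted above.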

3.3. BEMD in edge detection

The potential application of BEMD is in presmoothing of images before feature detection techniques are applied; this can pave the way for a hybrid method of edge detection that involves the BEMD and an edge detector that does not have a presmoothing step. Images usually tend to be noisy, and so filtering out noise is essential to make the image ready for further analysis.

In BEMD, an image is decomposed into basis functions called IMFs; the set of IMFs is complete, so that summing up the IMFs and any residual left recovers the original image. EMD essentially acts as a dyadic filter [16, 17], and by extension, the BEMD also acts as a dyadic filter. It has been observed that the first IMF constitutes most of the noise in the signal [11]. Hence removal of the first IMF reduces high spatial frequencies. Since BEMD is local in nature, image blurring is reduced. Filtering occurs in time space rather than in frequency space; therefore, any nonlinearity and nonstationarity present in the data are preserved. Thus no spurious harmonics are introduced, as occurs in traditional Fourier analyses because of the a priori definition of sine and cosine basis sets. Although the first IMF has been observed to contain most of the noise, the first few IMFs from BEMD still usually contain a lot of the noise in the original image; therefore, removing them and reconstructing the image with the remaining IMFs tends to denoise the image. The number of IMFs that need to be removed depends on the level of noise in the image; very noisy images require more high-frequency IMFs to be removed than do less noisy images. The Canny edge detector has a prefiltering step in which images are denoised with a Gaussian filter before edge detection is accomplished. This detection method can be computationally more expensive due to the convolution processes required in Gaussian smoothing. The Sobel edge detection method has no prefiltering step; however, it is more susceptible to noise. Therefore, the BEMD is used to first filter the images before the Sobel method is applied. An advantage BEMD has over Gaussian filtering is that it does not involve any convolution process, and it is a local method of denoising.
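The denoising step described here amounts to summing the residue and all IMFs except the first few. The Python sketch below shows only that reconstruction step; it assumes the 2-D IMFs and residue have already been produced by some BEMD routine, and the function name `reconstruct`, the parameter `drop_first`, and the tiny 2 × 2 arrays are hypothetical placeholders rather than real decomposition output.

```python
def reconstruct(imfs, residue, drop_first=1):
    """Sum the residue and every IMF except the first `drop_first` ones.

    imfs is a list of 2-D lists ordered from highest to lowest spatial
    frequency; residue is a 2-D list of the same shape.
    """
    rows, cols = len(residue), len(residue[0])
    out = [row[:] for row in residue]  # copy so the residue is not modified
    for imf in imfs[drop_first:]:
        for r in range(rows):
            for c in range(cols):
                out[r][c] += imf[r][c]
    return out


# Placeholder components (not real BEMD output), chosen for easy arithmetic.
imf1 = [[1, -1], [-1, 1]]      # stands in for the noisiest, finest-scale IMF
imf2 = [[2, 2], [-2, -2]]
residue = [[10, 10], [10, 10]]

original = reconstruct([imf1, imf2], residue, drop_first=0)   # [[13, 11], [7, 9]]
smoothed = reconstruct([imf1, imf2], residue, drop_first=1)   # [[12, 12], [8, 8]]
```

With `drop_first=0` the full sum returns the original image, which is the completeness property; increasing `drop_first` removes progressively coarser detail, so the value is a trade-off between noise removal and loss of fine crack features.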

Traditional filtering (Gaussian, mean, or median filtering) requires an optimal filter size to perform effectively. However, it is not a trivial matter to determine the optimal filter size; a large filter removes much of the noise but leaves more blur, while too small a filter size leaves little blur but may leave a lot of noise. This problem is circumvented by the BEMD because it is a local decomposition technique rather than global. For instance, the Gaussian filter incorporates the Fourier transform, which is global and hence introduces some artifacts due to nonstationarity and possible nonlinearity.

Table 1: Detection results for asphalt images.

| No. of images | Canny | BEMD/Sobel |
|---|---|---|
| 9 | Good detection for 6 of the 9 images, or 67% of the 9 images | Good detection for 3 of the 9 images, or 33% of the 9 images |
| | Of the 6 images detected, 4 images had cracks (representing 67% of the 6 images) | Of the 3 images detected, 2 images had cracks (representing 67% of the 3 images) |
| | The remaining 2 images had no cracks (representing 33% of the 6 images) | The remaining 1 image had no cracks (representing 33% of the 3 images) |

Table 2: Detection results for PCC images.

| No. of images | Canny | BEMD/Sobel |
|---|---|---|
| 6 | Good detection for 2 of the 6 images, or 33% of the 6 images | Good detection for 2 of the 6 images, or 33% of the 6 images |
| | Of the 2 images detected, 1 image had cracks | Of the 2 images detected, none had cracks |
| | The remaining 1 image had no cracks | The remaining 2 images had no cracks |

Canny and BEMD/Sobel tied on the remaining 2 of the 6 images (representing 33% of the 6 images); these images had cracks.

Figure 2: (a) With Canny: asphalt surface. (b) With BEMD/Sobel: asphalt surface.

4. ANALYSES

A total of 15 asphalt concrete and portland cement concrete (PCC) images are analyzed with the Canny edge detector to detect cracks; the same images are again analyzed with the Sobel edge detector, but this time BEMD is first used to smooth the image before detection. The first IMF is removed from the original image and the residue, which is a smoothed image, is analyzed with the Sobel method; the codes used are implemented in Matlab. The objective is to find out if BEMD is able to perform image smoothing for more effective crack detection. There are 9 asphalt concrete images and 6 PCC images. A digital camera was used to take the images in clear weather; each image had a resolution of 256-by-256 pixels. There are images with cracks and images without cracks. For brevity, only 8 images are shown in the present paper: 4 asphalt and 4 PCC images.

Figure 3

Hysteresis thresholding is used to aid in crack detection. The edge detection depends upon selection of appropriate thresholds; improper thresholds may return many unnecessary edges, or too few edges so that important edges are missed. A standard deviation is chosen for the Gaussian filter, and the effect of thresholding depends on the chosen standard deviation.

Figure 4

Figure 5

Matlab codes for BEMD are as developed by Nunes et al. [11]; to generate IMFs, upper and lower envelopes are constructed from strict extrema using interpolation by the minimum curvature method.

Regarding the images used, asphalt concrete images tend to have a lot of irregularities due to the nature of the finished surface, while PCC images tend to be smoother with fewer irregularities. Therefore, detecting cracks on asphalt concrete surfaces can be more challenging than on PCC surfaces.

5. RESULTS AND DISCUSSION

Figures 2 to 9 show the results of the edge detection attempts by the Canny edge detector (all the “a” figures above) and by the combination of BEMD and Sobel edge detector method (all the “b” figures below). A summary of the detection results for all 15 images is given in Tables 1 and 2.

After BEMD was performed on an asphalt image, the first three IMFs were discarded. The image was then reconstructed with the remaining IMFs, and the reconstruction was used as the input image for the Sobel edge detector. This was necessary after observing that removing only the first IMF did not smooth the image enough for edge detection. However, removal of only the first IMF was sufficient smoothing for the PCC images. The Canny edge detector already has a Gaussian filter, so no BEMD was performed for smoothing.

Figure 6: (a) With Canny: PCC surface. (b) With BEMD/Sobel: PCC surface.

Figure 7

Both the Canny edge detector and the BEMD/Sobel method were able to detect cracks more easily on PCC surfaces than on asphalt surfaces. This was expected due to the many irregularities on the asphalt surfaces analyzed. However, the Canny method generally proved better on asphalt surfaces. It is also observed that despite the noisy output of the BEMD/Sobel method, crack edges could be detected on closer examination, as may be seen in Figures 2 and 3. In Figure 2, the edge of the lane marking and part of the horizontal crack can be made out in Figure 2(b) despite the noisy output; however, even with less noise, Figure 2(a) (Canny method) is not able to detect the whole length of the horizontal crack, but is able to easily bring out the diagonal crack connecting it at the junction of the lane marking and the horizontal crack. In Figure 3, the crack is more easily identified in Figure 3(b) (BEMD/Sobel). For images with no cracks, as in Figures 4 and 5 for asphalt and Figures 6 and 8 for PCC, both methods generally give acceptable results; BEMD/Sobel actually gives less noisy outputs, though, which is better.

Figure 8

Figure 9

Results for both methods were significantly more comparable for PCC surfaces. With the exception of Figure 7, the BEMD plus Sobel method matched the Canny method in the quality of detection. The BEMD is a local analysis method, so the expectation is a better performance than the Gaussian filter, which is a global analysis; fewer artifacts are expected with BEMD. However, the Sobel method still suffers from the effects of noise in an image even after smoothing with BEMD when the image has a lot of irregularities, as is the case for asphalt concrete surfaces.

6. CONCLUSION

The present paper is an exploration into the possible application of BEMD to image smoothing before crack detection with the Sobel edge detector; the results are compared with those of the Canny edge detector. Asphalt concrete and PCC images, both with cracks and without cracks, are analyzed and compared qualitatively. It is observed that although BEMD does well smoothing an image before edge detection with the Sobel method, the Sobel method still suffers from the effects of noise when the images have many irregularities, as is the case for asphalt concrete surfaces. For images with fewer irregularities, such as the PCC surfaces, crack detection is more effective and easily comparable to results from the Canny method; for PCC surfaces with no cracks, the BEMD/Sobel method gives outputs with less noise, which is better. Overall, the Canny edge detector performed better than the BEMD/Sobel method for asphalt surfaces, and slightly better for PCC surfaces. More research is needed to further explore the effectiveness of BEMD as a smoothing filter for quality crack detection.

ACKNOWLEDGMENT

Part of this paper has been presented at the SPIE Defense & Security Symposium, Orlando, Florida, USA, 9–13 April 2007.

REFERENCES

[1] H. D. Cheng, J.-R. Chen, C. Glazier, and Y. G. Hu, “Novel approach to pavement cracking detection based on fuzzy set theory,” Journal of Computing in Civil Engineering, vol. 13, no. 4, pp. 270–280, 1999.

[2] B. J. Lee and H. D. Lee, “Position-invariant neural network for digital pavement crack analysis,” Computer-Aided Civil and Infrastructure Engineering, vol. 19, no. 2, pp. 105–118, 2004.

[3] P. Delagnes and D. Barba, “A Markov random field for rectilinear structure extraction in pavement distress image analysis,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’95), vol. 1, pp. 446–449, Washington, DC, USA, October 1995.

[4] C. Chandan, K. Sivakumar, E. Masad, and T. Fletcher, “Application of imaging techniques to geometry analysis of aggregate particles,” Journal of Computing in Civil Engineering, vol. 18, no. 1, pp. 75–82, 2004.

[5] J. M. Brzezicki and J. Kasperkiewicz, “Automatic image analysis in evaluation of aggregate shape,” Journal of Computing in Civil Engineering, vol. 13, no. 2, pp. 123–128, 1999.

[6] L. Wang, X. Wang, L. Mohammad, and C. Abadie, “Unified method to quantify aggregate shape angularity and texture using Fourier analysis,” Journal of Materials in Civil Engineering, vol. 17, no. 5, pp. 498–504, 2005.

[7] N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.

[8] N. E. Huang, M.-L. C. Wu, S. R. Long, et al., “A confidence limit for the empirical mode decomposition and Hilbert spectral analysis,” Proceedings of the Royal Society A, vol. 459, no. 2037, pp. 2317–2345, 2003.

[9] A. Linderhed, “2-D empirical mode decompositions in the spirit of image compression,” in Wavelet and Independent Component Analysis Applications IX, vol. 4738 of Proceedings of SPIE, pp. 1–8, Orlando, Fla, USA, April 2002.

[10] A. Linderhed, “Variable sampling of the empirical mode decomposition of two-dimensional signals,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 3, no. 3, pp. 435–452, 2005.



[12] J. C. Nunes, S. Guyot, and E. Delechelle, “Texture analysis based on local analysis of the bidimensional empirical mode decomposition,” Machine Vision and Applications, vol. 16, no. 3, pp. 177–188, 2005.

[13] A. Linderhed, “Image compression based on empirical mode decomposition,” in Proceedings of the 3rd International Conference on Image and Graphics, pp. 430–443, Hong Kong, December 2004.

[14] S. Sinclair and G. G. S. Pegram, “Empirical mode decomposition in 2-D space and time: a tool for space-time rainfall analysis and nowcasting,” Hydrology and Earth System Sciences Discussions, vol. 2, no. 1, pp. 289–318, 2005.

[15] L. Ding and A. Goshtasby, “On the Canny edge detector,” Pattern Recognition, vol. 34, no. 3, pp. 721–725, 2001.

[16] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, pp. 112–114, 2004.

[17] Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proceedings of the Royal Society A, vol. 460, no. 2046, pp. 1597–1611, 2004.
