Ensemble Based Classification and Forecasting Methods
Dr P. N. Suganthan, School of EEE, NTU, Singapore
Some Software Resources Available from: http://www.ntu.edu.sg/home/epnsugan
TENCON 2016, MBS Singapore, 22nd Nov. 2016


Page 1

Ensemble Based Classification and Forecasting Methods

Dr P. N. Suganthan, School of EEE, NTU, Singapore

Some Software Resources Available from: http://www.ntu.edu.sg/home/epnsugan

TENCON 2016, MBS Singapore, 22nd Nov. 2016

Page 2

General Concept

(Figure: the training data S is resampled into multiple data sets S1, S2, ..., Sn; a predictor P1, P2, ..., Pn is trained on each; their outputs are fused into a combined decision H.)

Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble Classification and Regression – Recent Developments, Applications and Future Directions," IEEE Computational Intelligence Magazine, DOI: 10.1109/MCI.2015.2471235, Feb 2016.

Page 3

Outline

• Classification
  – Random forests
  – Random Vector Functional Link
  – Kernel Ridge Regression
  – Benchmarking

• Forecasting
  – Random vector functional link networks
  – Random forests

Non-iterative methods

Page 4

Bagging (1)

Sample from the original training data with replacement (the original data are not deleted), usually drawing the same number of samples for each bag as in the original dataset. Each bag then contains around 63% distinct samples and 37% duplicates.

(Figure: the original training data are resampled into several bags, each with the same number of samples.)

Page 5

Bagging (2)

1. For i = 1 : No.Trs   // No.Trs is the number of trees (bags)
   a) Draw samples (with replacement) from the training set to generate the training data Si.
   b) Learn a classifier Ci on Si.
2. For each test example:
   a) Apply all classifiers Ci.
   b) Predict the class that receives the highest number of votes.
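The two steps above can be sketched as follows (a minimal illustration, not the authors' code; the one-feature "stump" base learner `learn_stump` and the toy data in the usage example are hypothetical):

```python
import numpy as np

def bagging_fit(X, y, n_bags, learn, rng):
    """Step 1: draw one bootstrap bag per tree and learn a classifier on it."""
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)      # sample n points with replacement
        models.append(learn(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Step 2: apply every classifier and take a majority vote per test point."""
    votes = np.stack([m(X) for m in models])  # shape (n_bags, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

def learn_stump(X, y):
    """Hypothetical base learner: threshold the first feature at its mean."""
    t = X[:, 0].mean()
    mask = X[:, 0] <= t
    left = np.bincount(y[mask], minlength=2).argmax() if mask.any() else 0
    right = np.bincount(y[~mask], minlength=2).argmax() if (~mask).any() else 1
    return lambda Z: np.where(Z[:, 0] <= t, left, right)
```

Usage: `models = bagging_fit(X, y, 11, learn_stump, np.random.default_rng(0))` followed by `bagging_predict(models, X_test)` returns the majority-vote label for each test point.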

Page 6

Random Forests

L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

Patented with restrictions on commercial usage.

(Figure: the Random Forests algorithm – each tree is grown on a bootstrap sample, and at every node the split is chosen using d randomly selected features, recursing until a terminal node is reached.)

Page 7

Decision Tree: An Example

Classifying an email as either spam or a valid email using features such as ch!, ch$, 1999, george, free, our, receive, …

(Figure: each node of the tree is labeled with the majority class of the training samples reaching it; the training samples of each class are indicated as x/y below each node.)

Page 8

A node of a decision tree

Assume that n training samples (each with M features) reach a node.

(Figure: the samples reaching the node form an n × M matrix – each row is a data point, each column is a feature. d of the M features are selected randomly, and for each selected feature a threshold must be chosen. How should the threshold for each feature be optimized?)

Page 9

Optimize threshold for each feature

• Using the Gini impurity, the threshold should be optimized so that the impurity decrease is maximized:

Gini_{beforesplit}(t) = 1 - \sum_{i=1}^{C} \left( \frac{n_{w_i}}{n} \right)^2

Gini_{aftersplit}(t) = \frac{n_l}{n} \left[ 1 - \sum_{i=1}^{C} \left( \frac{n^l_{w_i}}{n_l} \right)^2 \right] + \frac{n_r}{n} \left[ 1 - \sum_{i=1}^{C} \left( \frac{n^r_{w_i}}{n_r} \right)^2 \right]

maximize: Gini_{beforesplit} - Gini_{aftersplit}

There are other impurity criteria, such as information gain, etc.

C – total number of classes at the node
n – total samples (n_l, n_r – samples in the left and right branches)
l – left branch, r – right branch
n_{w_i} – samples of class w_i
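The threshold-selection rule above can be sketched in code (a minimal illustration, not the authors' implementation; `best_threshold` scans midpoints between consecutive sorted feature values as candidate thresholds):

```python
import numpy as np

def gini(labels, n_classes):
    """Gini impurity 1 - sum(p_i^2) of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y, n_classes):
    """Return the threshold on feature x maximising Gini_before - Gini_after."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    before = gini(y, n_classes)
    best_t, best_gain = None, -1.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                      # no threshold between equal values
        t = 0.5 * (x[i] + x[i - 1])
        left, right = y[:i], y[i:]
        after = (len(left) * gini(left, n_classes)
                 + len(right) * gini(right, n_classes)) / len(y)
        gain = before - after
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

For example, for x = [1, 2, 3, 4] with labels [0, 0, 1, 1] the best split is at t = 2.5 with gain 0.5 (both children become pure).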

Page 10

Rotation Forest

The training data form an N × M matrix, where M is the dimensionality of each data point.

Page 11

Rotation Forest

• A distinct rotation matrix for each tree.
• All features and all samples are rotated at the root level.
• In the default version, all rotated features are used in each node.

L. Zhang, P. N. Suganthan, "Random Forests with Ensemble of Feature Spaces," Pattern Recognition, vol. 47, no. 10, pp. 3429–3437, 2014. (Codes available: 2015-TCyb-Oblique-RF)

Page 12

Rotation Forest Illustrated

Page 13

Rotation Forest (2)

• Base classifiers are decision trees ("Forest").
• PCA is a simple rotation of the coordinate axes ("Rotation Forest").

Page 14

Method for Constructing the Rotation Matrix

X: the objects in the training data set.

x = [x_1, x_2, …, x_M]^T – a data point with M features.

X = \begin{bmatrix} x^1_1 & x^1_2 & \cdots & x^1_M \\ \vdots & & & \vdots \\ x^N_1 & x^N_2 & \cdots & x^N_M \end{bmatrix} (an N × M matrix)

Y = [y_1, y_2, …, y_N]^T: class labels with c classes.

Page 15

Method for Constructing the Rotation Matrix

• For i = 1 … L (to construct the training set for classifier Di):

  Split the full feature set F into K subsets F_{i,1}, F_{i,2}, F_{i,3}, …, F_{i,K} (j = 1 … K), each with m = M/K features.

Page 16

Method for Constructing the Rotation Matrix

• For j = 1 … K:

  X_{1,1}: the data set X for the features in F_{1,1}.
  Eliminate a random subset of the data.
  Select a bootstrap sample from X_{1,1} to obtain X'_{1,1}.
  Run PCA on X'_{1,1} using only its m (default m = 3) features.
  This yields the principal components a^{(1)}_{1,1}, …, a^{(m_1)}_{1,1}.

Page 17

Method … (Cont'd)

• Arrange the principal components for all j to obtain the rotation matrix:

R_1 = \begin{bmatrix} a^{(1)}_{1,1}, \dots, a^{(M_1)}_{1,1} & [0] & \cdots & [0] \\ [0] & a^{(1)}_{1,2}, \dots, a^{(M_2)}_{1,2} & \cdots & [0] \\ \vdots & & \ddots & \vdots \\ [0] & [0] & \cdots & a^{(1)}_{1,K}, \dots, a^{(M_K)}_{1,K} \end{bmatrix}

• Rearrange the rows of R_1 so as to match the order of features in F, obtaining R_1^a.

• Build classifier D_1 using X R_1^a as the training set.
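The block-diagonal construction can be sketched as follows (a minimal illustration, not the authors' code; the 75% bootstrap fraction is an assumption taken from the original Rotation Forest description, and PCA is done via SVD of the centred data). Writing each block directly at its subset's feature indices makes the row rearrangement step implicit:

```python
import numpy as np

def rotation_matrix(X, K, rng):
    """Build one M x M Rotation Forest rotation matrix: split the features
    into K random subsets, bootstrap the rows, run PCA per subset, and place
    each subset's principal axes as a block on the (permuted) diagonal."""
    n, M = X.shape
    subsets = np.array_split(rng.permutation(M), K)
    R = np.zeros((M, M))
    for fs in subsets:
        # bootstrap sample of the rows (75% fraction assumed)
        rows = rng.integers(0, n, size=max(len(fs), int(0.75 * n)))
        Xs = X[np.ix_(rows, fs)]
        Xs = Xs - Xs.mean(axis=0)        # centre before PCA
        _, _, Vt = np.linalg.svd(Xs)     # principal axes = rows of Vt
        R[np.ix_(fs, fs)] = Vt.T         # one orthogonal block per subset
    return R                             # rotated training data: X @ R
```

Because every block is orthogonal and the blocks cover disjoint feature indices, R is itself orthogonal, so X @ R is a pure rotation of the data.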

Page 18

Hyperplane in a node of a Decision Tree

• In most cases, an axis-parallel hyperplane is employed in the decision tree.
• An oblique hyperplane can be better.
• Axis-parallel = univariate = orthogonal.
• Oblique = multivariate.

Page 19

Oblique vs axis-parallel hyperplane

(Figure: an example instance space; "+" and "." mark the two classes. A univariate DT needs a staircase of axis-parallel tests such as X > 2, Y > 4, X > 5 and Y > 2, with yes/no branches, to approximate the boundary, while an oblique DT separates the classes with a single line.)

Page 20

A Toy Example of Oblique and Axis-Parallel RF

Page 21

Oblique Decision Tree (DT) Ensemble via Multisurface Proximal SVM (MP-SVM)

• Univariate (axis-parallel) DT: finding a split amounts to finding the feature that is the most 'useful' in discriminating the input data.
• Multivariate (oblique) DT: finding a 'composite' feature, a combination of the existing features, that has good discriminatory power.
• Suppose there are n examples with d features. For a univariate DT there exist only d(n − 1) candidate hyperplanes, whereas the number of oblique hyperplanes is O(n^d).
• Exhaustive search, which works quite well for univariate DTs, is therefore impossible for multivariate DTs.

Le Zhang and P. N. Suganthan, "Oblique Decision Tree Ensemble via Multisurface Proximal Support Vector Machine," IEEE Transactions on Cybernetics, Oct 2015. DOI: 10.1109/TCYB.2014.2366468 (Codes available online)

Page 22

How to find the optimal hyperplane in each internal node

• Heuristic search: hill climbing, simulated annealing, genetic algorithms.
• These are time-consuming and usually lead to sub-optimal solutions.
• For pattern classification, evolutionary methods are not effective.

Page 23

SVM and its Variant: Multisurface Proximal SVM (MP-SVM)

SVM finds two parallel hyperplanes that divide the feature space into three disjoint parts. The data lying between these two hyperplanes are linearly inseparable. SVM classifies data by assigning them to one of the remaining two disjoint half-spaces.

MP-SVM instead aims to find two clustering hyperplanes: the first plane is closest to the class 1 data and farthest from class 2, and the second plane is closest to class 2 and farthest from class 1.

Page 24

Obtain the optimal hyperplane from MP-SVM

• An example of how to get the optimal splitting hyperplane from the two clustering hyperplanes. The two red planes are the clustering planes and the two blue planes are their angle bisectors. One bisector is chosen to act as the node test: the data above the plane go to one child node and the rest go to the other child node. Since there are two bisectors, we choose the one with the better discriminant ability, measured by a criterion such as information gain or Gini impurity.

Page 25

MPSVM

• The first and second clustering hyperplanes (W, b below) are obtained by solving the two following problems, where A is the matrix of the first class (each row a data sample), B is the matrix of the second class, and e is a vector of ones with the same dimension as AW and BW:

\min_{(W,b) \neq 0} \frac{\|AW - eb\|^2}{\|BW - eb\|^2}, \qquad \min_{(W,b) \neq 0} \frac{\|BW - eb\|^2}{\|AW - eb\|^2}

Page 26

MPSVM

• By defining

G = [A  -e]'[A  -e],  H = [B  -e]'[B  -e],  Z = [W  b]'

the two problems become

\min_{Z \neq 0} \frac{Z^T G Z}{Z^T H Z}, \qquad \min_{Z \neq 0} \frac{Z^T H Z}{Z^T G Z}

• Thus, the two clustering hyperplanes are given by the eigenvectors corresponding to the smallest and largest eigenvalues of the generalized eigenvalue problem

G z = \lambda H z, \quad z \neq 0
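A compact sketch of this computation (a minimal illustration, not the authors' code; instead of a dedicated generalized-eigenvalue routine it uses `np.linalg.eig` on H⁻¹G, with a small Tikhonov term added to H for numerical safety):

```python
import numpy as np

def mpsvm_planes(A, B, delta=1e-8):
    """Return the two MPSVM clustering hyperplanes (w, b), plane: w'x = b.
    Plane 1 is closest to class A and farthest from class B; plane 2 the
    reverse. Solved via the generalized eigenvalue problem G z = lambda H z."""
    Ae = np.hstack([A, -np.ones((len(A), 1))])   # [A  -e]
    Be = np.hstack([B, -np.ones((len(B), 1))])   # [B  -e]
    G = Ae.T @ Ae
    H = Be.T @ Be
    M = np.linalg.solve(H + delta * np.eye(len(H)), G)   # H^-1 G
    vals, vecs = np.linalg.eig(M)
    vals, vecs = vals.real, vecs.real            # pencil (G, H) has real spectrum
    z1 = vecs[:, np.argmin(vals)]   # minimises z'Gz / z'Hz -> close to A
    z2 = vecs[:, np.argmax(vals)]   # maximises it          -> close to B
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])
```

With class A scattered around the line y = 0 and class B around y = 1, plane 1 comes out near y = 0 and plane 2 near y = 1, as the formulation intends.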

Page 27

MPSVM based decision tree

• How to handle multiclass problems, since MPSVM can only solve binary classification problems?
• As the tree grows, the number of samples reaching lower nodes becomes smaller, but G and H always have size (M + 1) × (M + 1), so G and H may become singular.
• (One approach is to use a feature subspace to solve the singularity problem and to reduce computation time.)

Page 28

MP-SVM Based Decision Tree

The separability of two classes ω1 and ω2 is measured by the Bhattacharyya distance:

B(\omega_1, \omega_2) = \frac{1}{8} (\mu_2 - \mu_1)^T \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{\left| (\Sigma_1 + \Sigma_2)/2 \right|}{\sqrt{|\Sigma_1| |\Sigma_2|}}

Page 29

Multiclass problem

With C classes, compute a C × C distance matrix [d_{i,j}] of pairwise class distances. The pair of classes with the largest distance seeds the two groups (Group A, with hyperplane Wp, and Group B, with hyperplane Wn). Each remaining class is then assigned to Group A if Distance 1 (its distance to Group A) is smaller than Distance 2 (its distance to Group B), and to Group B if Distance 1 is larger than Distance 2.

Page 30

NULL space method

Solving the small sample size (S3) problem.

Page 31

Null space approach

The original problem is

\min_{Z \neq 0} \frac{Z^T G Z}{Z^T H Z}, \quad \text{equivalently} \quad \max_{Z \neq 0} \frac{Z^T H Z}{Z^T G Z}

If G is singular (suppose G has rank r), let Q = [x_1, \dots, x_{n-r}] be an orthonormal basis of the null space of G and project through QQ':

H' = QQ' H QQ', \qquad G' = QQ' G QQ' = 0

The problem then becomes

\max_{Z} Z^T H' Z \quad \text{subject to} \quad Z^T G' Z = 0

and the solution is the eigenvector corresponding to the largest eigenvalue of H'.

Page 32

Other methods to handle the S3 problem

• The NULL space method [1,2] is very sensitive to data perturbations.
• Here, we apply two regularization approaches: Tikhonov regularization and axis-parallel split regularization.

[1] X. Jiang, "Linear subspace learning-based dimensionality reduction," IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 16–26, 2011.
[2] X. Jiang et al., "Eigenfeature regularization and extraction in face recognition," TPAMI, vol. 30, no. 3, pp. 383–394, 2008.

Page 33

Other methods to handle the S3 problem

• Tikhonov regularization works by adding a constant term to the diagonal entries of the matrix to be regularized. In our case, if G becomes rank deficient, G is regularized by:

G' = G + \delta I
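In code this is a one-liner (a minimal sketch; the regularization constant `delta` is a free parameter):

```python
import numpy as np

def tikhonov(G, delta=1e-6):
    """Add delta to the diagonal so a rank-deficient matrix becomes invertible."""
    return G + delta * np.eye(G.shape[0])
```

Even a tiny `delta` restores invertibility: a singular 2 x 2 matrix of ones gains a strictly positive determinant after regularization.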

Page 34

Other methods to handle the S3 problem

• If G or H becomes singular at a node, we can always use an axis-parallel split to continue the tree induction process. The decision tree then grows using heterogeneous test functions: from the root node down to the current node it uses MPSVM splits, and from the current node down to the leaves it switches to axis-parallel splits.

Page 35

Computational Complexity

• Here we make no assumption about the tree structure.
• Suppose there are m data samples with n features at a node.
• For an axis-parallel split: evaluating the Gini criterion for one feature requires sorting, O(m log m), so all n features cost O(m n log m).
• For the MPSVM-based split, the dominant cost is the generalized eigenvalue problem on the (n + 1) × (n + 1) matrices G and H.
• In the upper levels of the tree (near the root), MPSVM is much faster than the axis-parallel split.
• In the lower levels (near the leaves), the computational complexity is small (less data).
• MPSVM-T has slightly larger computational complexity than a standard DT; MPSVM-P has the same computational complexity as a standard DT.

T – Tikhonov regularization; P – axis-parallel regularization

Page 36

Experimental Results

Page 37

Datasets

Le Zhang and P. N. Suganthan, "Oblique Decision Tree Ensemble via Multisurface Proximal Support Vector Machine," IEEE Transactions on Cybernetics, Oct 2015. DOI: 10.1109/TCYB.2014.2366468 (Codes available online)

Page 38

Experimental Results (1)

• Experimental settings:
  1. Rotation Forest and Random Forest were kept at their default values in WEKA.
  2. For Rotation Forest, M is fixed to 3.
  3. All ensemble methods have the same ensemble size of 50.
  4. Base DT classifier: CART (Breiman), Classification and Regression Trees.
  5. Databases: UCI Machine Learning Repository, a bioinformatics dataset, and a face recognition dataset.
  6. 10 runs of 3-fold cross-validation.

Page 39

Experimental Result: RaF

MPRaF-P: Multisurface Proximal Random Forest with axis-parallel regularization.

Page 40

Experimental Result: RoF

MPRoF-T: Multisurface Proximal Rotation Forest with Tikhonov regularization.

Page 41

Discussion: Why does MPSVM work for RaF?

\min_{Z \neq 0} \frac{Z^T G Z}{Z^T H Z}

For RaF, H is (m + 1) × (m + 1). For the default RoF, H is (M + 1) × (M + 1), with m ≈ √M.

Page 42

Random Rotation Forest

• To solve this problem, we propose to employ a random feature subspace in the base learner of Rotation Forest and name the result Random Rotation Forest (RRoF). In each node, the test function is evaluated on a randomly selected subset of features instead of the whole feature set.
• In this case, Random Forest and Rotation Forest differ only in the way they perturb the data: Random Forest uses bagging to create a data subset, while Rotation Forest employs a different rotation matrix for each tree.

Page 43

Experiment Result: RRoF

MPRRoF-N: Multisurface Proximal Random Rotation Forest with Null space regularization.

Page 44

RaF vs RoF vs RRoF

• To assess the statistical significance of the results, we carry out a Friedman test. It ranks the algorithms for each data set separately, with the best-performing algorithm getting the lowest rank. Let r_i^j be the rank of the j-th of k algorithms on the i-th of N data sets. The Friedman test compares the average ranks of the algorithms, R_j. Under the null hypothesis, which states that all the algorithms are equivalent so their average ranks R_j should be equal, the Friedman statistic

X_F^2 = \frac{12N}{k(k+1)} \left[ \sum_j R_j^2 - \frac{k(k+1)^2}{4} \right]

is distributed according to \chi^2 with k - 1 degrees of freedom when N and k are big enough. Iman and Davenport showed that Friedman's X_F^2 is undesirably conservative and derived a better statistic

F_F = \frac{(N-1) X_F^2}{N(k-1) - X_F^2}

which is distributed according to the F-distribution with k - 1 and (k - 1)(N - 1) degrees of freedom [1].

[1] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," JMLR, vol. 7, pp. 1–30, 2006.

Page 45

RaF vs RoF vs RRoF

• If the null hypothesis is rejected, the Nemenyi test (Nemenyi, 1963) can be used to check whether the performance of two among the k classifiers is significantly different. If the corresponding average ranks differ by at least the critical difference

CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}}

we say there is a significant difference between the two classifiers.

Page 46

RaF vs RoF vs RRoF

Page 47

RaF vs RoF vs RRoF

• The average ranks for RaF, RoF and RRoF are 2.24, 2.07 and 1.69, respectively. Then X_F^2 = 6.98 and F_F = 3.71. The critical value of F(2, 86) for α = 0.05 is ≈ 2.72, so we reject the null hypothesis.
• In this case, CD = 0.5.
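These numbers can be reproduced directly from the two formulas (a minimal sketch; N = 44 follows from the reported F(2, 86) degrees of freedom, since (k - 1)(N - 1) = 86 for k = 3, and q_0.05 = 2.343 for k = 3 is taken from Demšar's table):

```python
import math

def friedman_stats(avg_ranks, N):
    """Friedman chi-square X_F^2 and Iman-Davenport F_F for k algorithms
    whose average ranks over N data sets are given."""
    k = len(avg_ranks)
    chi2 = (12.0 * N / (k * (k + 1))
            * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0))
    ff = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return chi2, ff

def nemenyi_cd(q_alpha, k, N):
    """Critical difference for the Nemenyi post-hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))
```

With average ranks (2.24, 2.07, 1.69) and N = 44 this gives X_F^2 ≈ 6.98, F_F ≈ 3.70 and, with q = 2.343, CD ≈ 0.50, matching the slide.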

Page 48

Which regularization works better?

The numbers in brackets are the average ranks; a listed pair is significantly different under the Nemenyi test, and all other pairwise differences are not significant.

RaF family: RaF (2.95), MPRaF-T (1.97), MPRaF-P (2.11), MPRaF-N (2.97).
• MPRaF-T is significantly better than RaF and MPRaF-N.
• MPRaF-P is significantly better than RaF and MPRaF-N.

RRoF family: RRoF (2.91), MPRRoF-T (2.19), MPRRoF-P (2.15), MPRRoF-N (2.75).
• MPRRoF-T is significantly better than RRoF.
• MPRRoF-P is significantly better than RRoF.

Page 49

Bias-Variance point of view: Bias

(Figures: critical-difference diagrams of bias for the RaF variants and the RRoF variants. The smallest bias/variance value gets rank 1 and the largest gets the highest rank; groups of methods whose values are not significantly different (alpha = 0.05) are connected.)

Page 50

Discussion about the Bias

1. For RaF, MPRaF-T and MPRaF-P generate lower bias than the others, and MPRaF-P is slightly better, which demonstrates that MPSVM can better capture the geometric structure of the data.
2. For RRoF, MPRRoF-N is significantly worse than the others.
3. This further indicates that Rotation Forest tends to generate base classifiers with slightly lower bias than Random Forest.

Page 51

Bias-Variance point of view: Variance

(Figures: critical-difference diagrams of variance for the RaF variants and the RRoF variants.)

For all the ensembles, no significant difference is detected among the variances of the base classifiers.

Page 52

Discussion about the Variance

1. For RF and its MPSVM-based variants, although there is no significant difference among their variances, the MPRaF variants tend to reduce the variance to a larger extent, especially MPRaF-T and MPRaF-P.
2. Exactly the same conclusion can be drawn for RRoF and its MPSVM-based variants.

Page 53

For a given regularization approach, which ensemble method is better?

• We use the Sign Test to compare each pair of algorithms.
• If the two algorithms compared are, as assumed under the null hypothesis, equivalent, each should win on approximately N/2 out of N data sets. If the number of wins is at least N/2 + 1.96√N/2, the algorithm is significantly better with p < 0.05.

Page 54

For a given regularization approach, which ensemble method is better?

• (MPRaF-T, MPRRoF-T): (18, 26)
• (MPRaF-P, MPRRoF-P): (16, 28)
• (MPRaF-N, MPRRoF-N): (12, 32) √

The first number in brackets is the number of times RaF wins and the second is the number of times RRoF wins; √ means there is a significant difference between the pair of algorithms.
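The decision rule from the previous slide can be checked in a few lines (a minimal sketch; ties are assumed to have been discarded so the wins sum to N):

```python
import math

def sign_test_significant(wins_a, wins_b):
    """Two-algorithm sign test at p < 0.05: the winner is significant if its
    win count reaches N/2 + 1.96*sqrt(N)/2, where N is the number of data sets."""
    n = wins_a + wins_b
    threshold = n / 2.0 + 1.96 * math.sqrt(n) / 2.0
    return max(wins_a, wins_b) >= threshold
```

For N = 44 the threshold is about 28.5, so (12, 32) is significant while (18, 26) and (16, 28) are not, reproducing the table above.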

Page 55

On the effect of m

• The parameter m denotes the number of features randomly selected at each node.
• The smaller m is, the stronger the randomization of the trees and the weaker the dependence of their structures on the output.
• However, if m is small, the features randomly selected at a node may fail to capture the geometry of the data samples.

Page 56

The effect of m for the "parkinsons" dataset

(Figures: accuracy vs m for the RaF and RRoF variants.)

Page 57

The effect of m for the "wine quality (Wine)" dataset

(Figures: accuracy vs m for the RaF and RRoF variants.)

Page 58

Discussion about the parameter m

• For very small values of m, the accuracies of all the ensemble methods are very low, especially for the MPSVM-based ensembles.
• However, as m grows, the accuracies of all MPSVM-based ensembles grow significantly and become stable very quickly, except for MPSVM with NULL space regularization.

Page 59

Future Work

• Ensemble methods with other base classifiers
• Ensembles with deep learning, big data, etc.
• Applications of ensemble methods

Page 60

More Comprehensive Evaluation

• We choose MPRaF-P and compare it with 179 other classifiers on 121 UCI datasets:

Fernández-Delgado, Manuel, et al., "Do we need hundreds of classifiers to solve real world classification problems?," The Journal of Machine Learning Research, 15.1 (2014): 3133–3181.

Page 61

The top 2 are our proposed methods with axis-parallel regularization (P).

Page 62

Strengths & Weaknesses of RF

• Easy to parallelize
• Excellent batch-mode performance
• Online learning
• Not friendly for transfer learning or visual feature extraction

Page 63

Neural Networks

• A theoretical proof of the approximation ability of the standard multilayer feedforward network can be found in:

Hornik, Kurt, Maxwell Stinchcombe, and Halbert White, "Multilayer feedforward networks are universal approximators," Neural Networks, 2.5 (1989): 359–366.

• Some conditions are:
  – Arbitrary squashing functions.
  – Sufficiently many hidden units are available.

• A standard multilayer feedforward network can approximate virtually any function to any desired degree of accuracy.

Page 64: Ensemble Based Classification and Forecasting Methodsweb.mysites.ntu.edu.sg/epnsugan/PublicSite/Shared Documents... · Ensemble Based Classification and Forecasting Methods ... TENCON

Weakness and Improvement

• Some weaknesses of back-propagation:

– The error surface usually has multiple local minima.

– Slow convergence.

– Sensitivity to the learning rate setting.

• Improvement:

– The parameters in the hidden layers can be randomly and appropriately generated without learning.

– The parameters in the last layer can then be computed by least squares.

References:

W. F. Schmidt, M. A. Kraaijveld, and R. P. W. Duin, “Feedforward neural networks with random weights,” in Proc. 11th IAPR International Conference on Pattern Recognition, IEEE, 1992, pp. 1–4.

Y.-H. Pao, G.-H. Park, and D. J. Sobajic, “Learning and generalization characteristics of the random vector functional-link net,” Neurocomputing 6.2 (1994), pp. 163–180.


Structure of RVFL

Parameters H (indicated in blue) in the enhancement nodes are randomly generated in a proper range and kept fixed.

The original features (from the input layer) are concatenated with the enhanced features (from the hidden layer) to boost performance.

Learning aims at solving yi = di ∗ W, i = 1, 2, ..., n, where n is the number of training samples, W (indicated in red and gray) contains the output layer weights, and y, d represent the target and the combined features, respectively.


RVFL details

• Notations:

• X = [x1, x2, ..., xn]’: input data (n samples and m features).

• β: the input weights of the enhancement nodes (an m × k matrix, where k is the number of enhancement nodes).

• b = [b1, b2, ..., bk]: the biases of the enhancement nodes.

• H = h(X ∗ β + ones(n, 1) ∗ b): the feature matrix after the enhancement nodes, where h is the activation function.
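The notation above maps directly to code. A minimal numpy sketch (the dimensions and the sigmoid activation are illustrative choices, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 5, 3, 10          # samples, input features, enhancement nodes
X = rng.standard_normal((n, m))

# Random hidden parameters, generated once in a proper range and kept fixed.
beta = rng.uniform(-1.0, 1.0, size=(m, k))   # input-to-enhancement weights
b = rng.uniform(-1.0, 1.0, size=(1, k))      # enhancement-node biases

# H = h(X*beta + ones(n,1)*b), with a sigmoid as the activation h.
H = 1.0 / (1.0 + np.exp(-(X @ beta + b)))

# RVFL concatenates the original features with the enhanced ones.
D = np.concatenate([X, H], axis=1)           # shape (n, m + k)
```

Only the output weights applied to D are learned; beta and b never change after this point.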


tribas – triangular basis function

relu – rectified linear unit
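The activation names used in these experiments follow MATLAB-style conventions. A sketch of four of them (the definitions below are assumed from those conventions, not taken from the slides):

```python
import numpy as np

def tribas(x):
    # Triangular basis: 1 - |x| on [-1, 1], zero elsewhere.
    return np.maximum(0.0, 1.0 - np.abs(x))

def relu(x):
    # Rectified linear unit.
    return np.maximum(0.0, x)

def radbas(x):
    # Radial basis: a Gaussian bump centered at 0.
    return np.exp(-x ** 2)

def hardlim(x):
    # Hard limit: step function at 0.
    return np.where(np.asarray(x) >= 0, 1.0, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
```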


RVFL details

Stacking the combined feature vectors d(x1), d(x2), ..., d(xn) row-wise gives the matrix D, and the targets stack into Y, so the output weights satisfy the linear system D ∗ W = Y.


Learning in RVFL

• Once the random parameters β and b are generated in a proper range, learning in the RVFL reduces to solving yi = di ∗ W, i = 1, 2, ..., n, where n is the number of training samples, W contains the weights in the output layer, and y, d represent the target and the combined features.

• Objective: min_W ||DW − Y||² + λ||W||²

• Solutions:

In primal space: W = (λI + D’D)⁻¹ D’Y

In dual space: W = D’(λI + DD’)⁻¹ Y

where λ is the regularization parameter; as λ → 0, the method converges to the Moore–Penrose pseudoinverse solution. D and Y are the stacked versions of the features and targets.
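The primal and dual closed forms give the same output weights, so one can pick whichever linear system is smaller (p × p or n × n). A numerical sketch with synthetic data (the shapes and λ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 8                          # n samples, p combined features
D = rng.standard_normal((n, p))       # stacked combined features
Y = rng.standard_normal((n, 1))       # stacked targets
lam = 0.1                             # regularization parameter lambda

# Primal solution: W = (lambda*I + D'D)^(-1) D'Y  -- a p x p system.
W_primal = np.linalg.solve(lam * np.eye(p) + D.T @ D, D.T @ Y)

# Dual solution:   W = D'(lambda*I + DD')^(-1) Y  -- an n x n system.
W_dual = D.T @ np.linalg.solve(lam * np.eye(n) + D @ D.T, Y)
```

For RVFL the number of combined features (m + k) is usually far smaller than n, so the primal form is the cheaper one.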


Comprehensive Evaluation of RVFL

L. Zhang and P. N. Suganthan, "A Survey of Randomized Algorithms for Training Neural Networks," Information Sciences, DOI: 10.1016/j.ins.2016.01.039, Volumes 364–365, pp. 146–155, Oct. 2016


Evaluation Protocol

• Follow the exact pipeline of the JMLR paper.

• The Friedman rank of each method is used to evaluate the classifiers.

L. Zhang, P. N. Suganthan, "A Comprehensive Evaluation of Random Vector Functional Link Networks," Information Sciences, DOI:10.1016/j.ins.2015.09.025, Volumes 367–368, pp. 1094–1105, Nov 2016. (Codes Available: 2016-RVFL-Comp-Eval-Classification)


Results: direct link and bias

In all cases, RVFL with direct links outperforms the corresponding variant without them.

The bias term in the output neurons has mixed effects: it may or may not improve performance.


Results: Activation Function.

The radbas function consistently leads to the best performance; the hardlim and sign activation functions lead to the second-worst and worst performances, respectively.


Results: Moore–Penrose Pseudoinverse vs. Ridge Regression

The ridge regression based RVFL shows better performance than the Moore–Penrose pseudoinverse based RVFL.


Results: Scaling the Randomization Range of Input Weights and Biases


Performance of the RVFL for different ranges of randomization. A smaller rank indicates better accuracy and fewer hidden neurons. N stands for the number of hidden neurons corresponding to the testing accuracy used in the ranking; in other words, for each ranking, performance is maximized and the corresponding number of neurons is recorded.


Results: Scaling the randomization range of input weights and biases

• Scaling down the randomization range of the input weights and biases to avoid saturating the neurons risks degrading the discrimination power of the random features. However, this can be compensated by using more hidden neurons or the direct links.

• Scaling the randomization range of the input weights and biases up to enhance the discrimination power of the random features risks saturating the neurons. Again, this can be compensated by using more hidden neurons or combining with the direct links from the input to the output layer. However, for reasons explained in Section 2.5, we prefer lower model complexity, i.e. fewer hidden neurons.
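The saturation effect is easy to demonstrate: with sigmoid enhancement nodes, the fraction of near-saturated activations grows sharply with the randomization range. A sketch (the scales and the 0.05 saturation threshold are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 10))

def saturated_fraction(scale, eps=0.05):
    """Fraction of sigmoid activations within eps of 0 or 1 when the
    random input weights are drawn uniformly from [-scale, scale]."""
    beta = rng.uniform(-scale, scale, size=(10, 100))
    H = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return np.mean((H < eps) | (H > 1 - eps))

small, large = saturated_fraction(0.1), saturated_fraction(10.0)
# A wide randomization range saturates far more neurons than a narrow one.
```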


Ridge Regression

Linear regression objective: min_w Σ_{i=1}^n (y_i − w’x_i)² + λ||w||²

Its solution: w = (λI + X’X)⁻¹ X’y

Consider a feature map φ which maps the original features to a higher dimension to enhance the discriminability, i.e. the kernel trick. The objective becomes:

min_w Σ_{i=1}^n (y_i − w’φ(x_i))² + λ||w||²


Kernel Ridge Regression

According to the representer theorem, one can express the solution as a linear combination of the samples: w = Σ_{i=1}^n α_i φ(x_i).

Objective with the kernel trick, where K_ij = φ(x_i)’φ(x_j): min_α ||Kα − y||² + λ α’Kα

The solution is: α = (K + λI)⁻¹ y

Output: f(x) = Σ_{i=1}^n α_i k(x_i, x)
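The KRR recipe can be sketched end-to-end. An illustrative example with an RBF kernel (the kernel choice, its gamma, λ, and the toy sine data are all assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 1e-2
K = rbf_kernel(X, X)
# Representer theorem: w = sum_i alpha_i * phi(x_i);
# the dual coefficients solve (K + lambda*I) alpha = y.
alpha = np.linalg.solve(K + lam * np.eye(n), y)

# Prediction at new points: f(x) = sum_i alpha_i * k(x_i, x)
X_test = np.linspace(-3, 3, 100)[:, None]
y_hat = rbf_kernel(X_test, X) @ alpha
```

Only the n × n kernel matrix is needed; φ itself is never computed explicitly.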


KRR for Classification

• Through the commonly used 0–1 coding of the class labels, KRR can be applied directly to classification:

C. Saunders, A. Gammerman and V. Vovk, "Ridge Regression Learning Algorithm in Dual Variables," in Proc. ICML, 1998.

S. An, W. Liu, and S. Venkatesh, "Face recognition using kernel ridge regression," in CVPR 2007: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, Piscataway, NJ, pp. 1–7, 2007.


Overall performance compared with the JMLR results

Top 10 classifiers | Top 11–20 classifiers

Our proposed method.

Reference: Fernández-Delgado, Manuel, et al. "Do we need hundreds of classifiers to solve real world classification problems?" The Journal of Machine Learning Research 15.1 (2014): 3133–3181.


References

• Zhang, Le, and Ponnuthurai Nagaratnam Suganthan. "Random forests with ensemble of feature spaces." Pattern Recognition 47.10 (2014): 3429–3437.

• Zhang, Le, and Ponnuthurai N. Suganthan. "Oblique decision tree ensemble via multisurface proximal support vector machine." IEEE Transactions on Cybernetics 45.10 (2015): 2165–2176.

• Zhang, Le, Ren, Ye, and P. N. Suganthan. "Ensemble Classification and Regression – Recent Developments, Applications and Future Directions [Review Article]." IEEE Computational Intelligence Magazine 11.1 (2016): 41–53.

• Zhang, Le, and P. N. Suganthan. "A Comprehensive Evaluation of Random Vector Functional Link Networks." Information Sciences 367–368 (2016): 1094–1105.

• Zhang, Le, and P. N. Suganthan. "A Survey of Randomized Algorithms for Training Neural Networks." Information Sciences 364–365 (2016): 146–155.


Time Series Forecasting in Renewable Energy Systems


Outline

• Introduction
  – Time Series
  – Computational Intelligence
  – Ensemble Methods

• Empirical Mode Decomposition (EMD)
  – EMD-SVR
  – Ensemble EMD
  – Application: Wind speed forecasting

• Random Vector Functional Link (RVFL) Network
  – RVFL structural configurations
  – Application: Load demand forecasting
  – Application: Wind power ramp classification



Introduction – Time Series

A time series is a sequence of data points such that:
1. it consists of successive measurements made over a time interval;
2. the time interval is continuous;
3. the spacing between any two consecutive data points is the same;
4. each time unit in the interval has at most one data point.
A time series may be univariate or multivariate, and the task may be classification or forecasting.


Introduction – Applications

Examples: finance and energy.


Time Series Forecasting

• Historical value => future value

• Current observations => future value

• Hybrid of the above two

• Statistical approach

• Physical approach based on differential equations

• Computational intelligence based approach

• Hybrid/ensemble approach
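The "historical value => future value" approach casts forecasting as supervised learning over lagged windows. A minimal sketch of the windowing step (the window length and horizon are illustrative choices):

```python
import numpy as np

def make_lagged(series, n_lags, horizon=1):
    """Turn a univariate series into (X, y) pairs: each row of X holds
    the last n_lags observations, y the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

series = np.arange(10.0)          # toy series 0, 1, ..., 9
X, y = make_lagged(series, n_lags=3, horizon=1)
# First pattern: inputs [0, 1, 2] -> target 3
```

Any regressor (SVR, ANN, RVFL, random forest) can then be trained on (X, y).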

Y. Ren, L. Zhang, and P. N. Suganthan, “Ensemble Classification and Regression – Recent Developments, Applications and Future Directions,” IEEE Comput. Intell. Mag., 2016, doi: 10.1109/MCI.2015.2471235


Time Series Forecasting (Cont’d)

• Wind speed/power forecasting

• Electricity load demand forecasting

• Electricity price forecasting

• Solar irradiance/power forecasting

• Load/wind/solar power ramp forecasting, etc.



Empirical Mode Decomposition (EMD)

Finally: x(t) = Σ_{i=1}^n IMF_i(t) + r_n(t), where r_n(t) is the final residue.

IMF: Intrinsic Mode Function

N. Huang, Z. Shen, S. Long, M. Wu, H. Shih, Q. Zheng, N. Yen, C. Tung, and H. Liu, “The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis,” Proc. Royal Society London A, vol. 454, pp. 903–995, 1998.



Empirical Mode Decomposition (EMD)

(cont’d)

• Adaptive, local, orthogonal, complete

• Decomposes a complex time series into simpler sub-series (narrow band, symmetric)

• Reveals hidden features/correlations of the time series

• Mode mixing problem:
  – a sub-series contains signal spanning a wide frequency band, or
  – more than one sub-series contains signals in a similar frequency band

• Ensemble of EMD => solves the mode mixing problem

G. Rilling, P. Flandrin, and P. Gonçalvès, “On empirical mode decomposition and its algorithms,” in Proc. IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP’03), Grado, Italy, 2003, pp. 8–11.


Ensemble of EMD

• Ensemble EMD (EEMD)
  – Add uncorrelated Gaussian noise to the original time series
  – Repeat EMD on the noise-added series
  – Combine the results: the noise realizations cancel each other out
  – But completeness is violated

Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.


Ensemble of EMD (cont’d)

• Complementary EEMD (CEEMD)

– Gaussian noise is added in complementary pairs

– Completeness is retained

– Needs more trials (ensembles)

J.-R. Yeh, J.-S. Shieh, and N. E. Huang, “Complimentary ensemble empirical mode decomposition: a novel noise enhanced data analysis method,” Advances in Adaptive Data Analysis, vol. 2, no. 2, pp. 135–156, 2010.


Ensemble of EMD (cont’d)

• Complete EEMD with Adaptive Noise (CEEMDAN)

– Adaptive noise

– Sequential process

– Reduces the number of trials (ensembles)

– But the sequential process cannot be parallelized

M. Torres, M. Colominas, G. Schlotthauer, and P. Flandrin, “A complete ensemble empirical mode decomposition with adaptive noise,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’11), Prague, Czech Republic, 22–27 May 2011.


Study on EMD-SVR/ANN for wind speed forecasting

• Forecasting models:
  – EMD-ANN, EEMD-ANN, CEEMD-ANN, CEEMDAN-ANN
  – EMD-SVR, EEMD-SVR, CEEMD-SVR, and CEEMDAN-SVR

• 12 wind speed TS datasets obtained from the National Data Buoy Center (NDBC).

• 70% for training and 30% for testing.

• Forecasting horizon: 1, 3 and 5 hours ahead.

• Scaled to (0,1] interval.

• Compare on RMSE/MASE

• CEEMDAN: complete ensemble empirical mode decomposition with adaptive noise.

Y. Ren, P. N. Suganthan, and N. Srikanth, “A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods,” IEEE Trans. Sustain. Energy, vol. 6, no. 1, pp. 236–244, Jan. 2015.

(Table: RMSE comparison.)


Study on EMD-SVR/ANN for wind speed forecasting (cont’d)

Friedman Test of EMD-based Hybrid Methods for 1, 3 and 5 Hour Ahead Wind Speed Forecasting

CPU Time (s) of EMD, EEMD, CEEMD and CEEMDAN on Wind Speed TS


Concluding remarks

• EMD based hybrid SVR methods outperformed the persistence method.

• EMD based hybrid ANN methods outperformed the persistence method for 1 and 3 hour ahead forecasting only.

• EMD based hybrid SVR methods outperformed the SVR for 1, 3 and 5 hour ahead forecasting.

• EMD-ANN method had significantly worse performance than the ANN.

• EEMD/CEEMD/CEEMDAN-ANN methods had comparable performance as the ANN.

• CEEMDAN-SVR and EEMD-SVR outperformed the CEEMD-SVR and EMD-SVR for 1, 3 and 5 hour ahead forecasting.

• CEEMDAN-ANN, the EEMD-ANN and the CEEMD-ANN outperformed the EMD-ANN.

• In general, the EMD based hybrid SVR methods had better performance than the EMD based hybrid ANN methods although the SVR and ANN methods had similar performance. => EMD and its improved versions enhanced the performance of SVR on time series forecasting.

• By considering the CPU time and the number of decomposed sub series, CEEMDAN-SVR is the best performed method for wind speed time series forecasting.

Y. Ren, P. N. Suganthan, and N. Srikanth, “A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods,” IEEE Trans. Sustain. Energy, vol. 6, no. 1, pp. 236–244, Jan. 2015.



Random Vector Functional Link (RVFL) Network

Y.-H. Pao, G.-H. Park, and D. J. Sobajic, “Learning and generalization characteristics of the random vector functional-link net,” Neurocomputing 6 (2) (1994) 163–180.


RVFL variations

• Effects of:
  – Input layer bias
  – Hidden layer bias
  – Direct input–output connections

Y. Ren, P. N. Suganthan, N. Srikanth and G. Amaratunga, “Single Hidden Layer Neural Networks with Random Weights for Short-term Electricity Load Demand Forecasting,” Information Sciences, 2016.

W. F. Schmidt, M. A. Kraaijveld, and R. P. W. Duin, “Feedforward neural networks with random weights,” in Proc. 11th IAPR International Conference on Pattern Recognition, Vol. II, IEEE, 1992, pp. 1–4.


RVFL variations (cont’d)

• Input layer bias

• Hidden layer bias

• Direct input–output link


RVFL for load demand forecasting

• The input layer bias and hidden layer bias affected the performance insignificantly,

• whereas the direct input–output connections improved the performance significantly.

• The quantile scaling algorithm improved the performance for the 1–4 hour and 18–24 hour ahead forecasting horizons.

• Feature selection based on the partial autocorrelation function or seasonal auto-regression consistently degraded the performance on the seasonal time series.


Random Forest for TS forecasting

• Same concept as classification, but:
  – classification tree => regression tree
  – majority vote => averaging or selecting the median for the final output

• Important parameters:
  – n_tree: number of bootstrap samples (trees)
  – m_try: number of variables tried at each split

• Criteria at a split:
  – minimize the residual sum of squares,
  – or minimize the within-group variance,
  – or maximize the between-group variance
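The split criteria above (residual sum of squares, equivalently within-group variance) can be sketched for a single candidate variable. A toy illustration, not the actual random forest implementation:

```python
import numpy as np

def best_split(x, y):
    """Return the threshold on x that minimizes the residual sum of
    squares of the two resulting groups (each predicted by its mean)."""
    best_t, best_rss = None, np.inf
    for t in np.unique(x)[:-1]:              # candidate thresholds
        left, right = y[x <= t], y[x > t]
        rss = (((left - left.mean()) ** 2).sum()
               + ((right - right.mean()) ** 2).sum())
        if rss < best_rss:
            best_t, best_rss = t, rss
    return best_t, best_rss

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.5, 5.2, 9.0, 9.3, 9.1])
t, rss = best_split(x, y)   # splits between the two clusters
```

A regression tree applies this greedily, at each node trying m_try randomly chosen variables.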


RVFL for load demand forecasting (cont’d)

(Figures: performance, training time, and testing time comparisons.)


RVFL for load demand forecasting (cont’d)

• There is no clear overall advantage of the input layer and hidden layer biases. However, the input layer biases are necessary for the neural network to function properly as a universal approximator, so we recommend retaining the biases as selection choices, as they may be beneficial for some forecasting problems.

• Compared with reported non-ensemble forecasting methods such as the persistence method, seasonal ARIMA and artificial neural networks, the RVFL network has significantly better performance.

• The RVFL network is outperformed by random forest, which is an ensemble method.

• The computation time (including training and testing) of the RVFL network is the shortest among the compared methods.


Wind Power Ramp Forecasting with RVFL

Two ramp definitions: based on local extrema, and based on end points.

Y. Ren, X. Qiu, P. N. Suganthan, N. Srikanth, and G. Amaratunga, “Detecting Wind Power Ramp with Random Vector Functional Link (RVFL) Network,” IEEE Symposium on Computational Intelligence and Ensemble Learning (IEEE CIEL’15), Dec. 2015.


Wind Power Ramp Forecasting with RVFL (cont’d)

• Noise caused by:

– Wind gust

– Wind turbine maintenance / shutdown

– Sensor noise or fault

• De-noise:

– Smooth outlier: median absolute deviation (MAD) rejection

– Missing data: extrapolation

– White noise: empirical mode decomposition (EMD)
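The MAD rejection step for smoothing outliers can be sketched as follows (the 3-MAD threshold and the Gaussian consistency factor 1.4826 are common choices, not specified on the slides):

```python
import numpy as np

def mad_reject(x, n_mads=3.0):
    """Flag points farther than n_mads scaled MADs from the median."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    scaled = 1.4826 * mad          # consistent with std for Gaussian data
    return np.abs(x - med) <= n_mads * scaled

x = np.array([10.1, 9.9, 10.0, 10.2, 9.8, 50.0])  # 50.0 is a spike
keep = mad_reject(x)
clean = x[keep]                   # spike rejected, rest retained
```

Because both the median and the MAD are robust statistics, a single spike barely shifts the threshold, unlike a mean/standard-deviation rule.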


Wind Power Ramp Forecasting with RVFL (cont’d)

• Imbalanced data:

– Wind ramp forecasting => classification problem

– Ramp: minority class; no-ramp: majority class => imbalanced dataset

– Up-scaling or down-scaling to recover from the imbalance
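Up-scaling the minority (ramp) class can be as simple as resampling it with replacement until the classes balance. A sketch, not necessarily the exact scheme used in the paper:

```python
import numpy as np

def upscale_minority(X, y, minority=1, rng=None):
    """Randomly resample the minority class with replacement so that
    both classes end up with the same number of samples.
    Assumes the minority class is indeed the smaller one."""
    if rng is None:
        rng = np.random.default_rng(0)
    minor = np.where(y == minority)[0]
    major = np.where(y != minority)[0]
    extra = rng.choice(minor, size=len(major) - len(minor), replace=True)
    idx = np.concatenate([major, minor, extra])
    return X[idx], y[idx]

X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([-1] * 8 + [1] * 2)       # 8 no-ramp vs 2 ramp samples
Xb, yb = upscale_minority(X, y)        # now 8 samples of each class
```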

• Error measure:


Experiment setup

1. Wind power time series
2. De-noise
3. Convert to power ramps (+1: significant ramp, −1: no significant ramp)
4. Over-sample to avoid imbalance
5. RVFL classification


Results and Discussion

A fraction of the wind power generated in an ELIA wind farm; red segments denote power ramps

Summary of the Datasets for Wind Ramp Classification


Results and Discussion (cont’d)

Friedman Rank Sum Test on the Performance Measures of the Four Classification Methods

Nemenyi Post-hoc Test on the Performance Measures of the Four Classification Methods with 12 Hour Window Forecasting


Concluding remarks

• Wind power ramp forecasting with the RVFL network

• Two wind power ramp definitions: based on (i) local extrema and (ii) end points

• The RVFL network has performance comparable to ANN, RF and SVR for 6 hour ahead forecasting

• The RVFL network has better performance than the SVM for 12 hour ahead forecasting

• The RVFL network has a significant time advantage over SVM and ANN, and is comparable to RF


Conclusions

• Empirical Mode Decomposition based SVR improves time series forecasting accuracy

• Ensemble of EMD: CEEMDAN-SVR has the best performance and short computation time

• The RVFL network with input bias, without hidden bias, and with direct input–output connections is the best configuration

• The RVFL network has significantly better performance than non-ensemble methods for wind speed TS forecasting

• The RVFL network has better performance and shorter computation time for wind power ramp forecasting

• RF is also highly competitive.


Future work

• Ensemble of EMD with multi-variate time series

• Ensemble of EMD with complex valued time series

• Ensemble of RVFL network

• Investigate in detail the feature space after the enhancement node transformation

• Deep Learning ensembles

• Transfer Learning


Contributed by: Zhang Le (PhD 2016) and Ren Ye (PhD 2016)

Thank You !
