
Page 1

A Simulated-annealing-based Approach for Simultaneous Parameter Optimization and Feature Selection of Back-Propagation Networks (BPN)

Shih-Wei Lin, Tsung-Yuan Tseng, Shuo-Yan Chou, Shih-Chieh Chen

National Taiwan University of Science and Technology

Expert Systems with Applications 2008

Page 2

Introduction

The back-propagation network (BPN) can be used in various fields, e.g.:
- evaluating consumer loans
- diagnosing heart disease

Different problems may require different parameter settings for the network architecture.

Rules of thumb or "trial and error" methods are usually used to determine them.

Page 3

Introduction

Not all features are beneficial for classification with a BPN.

Selecting the beneficial subset of features results in better classification.

This work proposes a simulated-annealing (SA) based approach to obtain the optimal parameter settings for the BPN network architecture.

Page 4

BPN

Before applying a BPN to solve a problem, the parameter settings for the network architecture must be determined:
(1) number of hidden layers
(2) learning rate
(3) momentum term
(4) number of hidden neurons
(5) learning cycle
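For illustration only, these settings map onto any standard MLP trainer. Below is a minimal sketch using scikit-learn's MLPClassifier as a stand-in for a BPN; the parameter values shown are arbitrary placeholders, not the paper's, and this is not the authors' C implementation.

```python
# Sketch: the BPN parameters above, expressed through scikit-learn's
# MLPClassifier as an illustrative stand-in for a back-propagation network.
from sklearn.neural_network import MLPClassifier

bpn = MLPClassifier(
    hidden_layer_sizes=(10,),  # one hidden layer with 10 hidden neurons
    learning_rate_init=0.3,    # learning rate
    momentum=0.8,              # momentum term
    max_iter=500,              # learning cycles (training epochs)
    solver="sgd",              # plain gradient descent, as in classic BPN
)
# bpn.fit(X_train, y_train); accuracy = bpn.score(X_test, y_test)
```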

Page 5

Feature Selection

The main benefits of feature selection are as follows:

(1) Reducing computational cost and storage requirements
(2) Dealing with the degradation of classification efficiency due to the finiteness of training sample sets
(3) Reducing training and prediction time
(4) Facilitating data understanding and visualization

Page 6

Problems

When using a BPN, we confront two problems:

How to set the best parameters for the BPN?

How to choose the input attributes (features) for the BPN?

The proposed SA-based approach not only provides the best parameter settings for the BPN network architecture, but also finds the beneficial subset of features for each problem.

Page 7

BPN

BPN is a common neural network model whose architecture is the multilayer perceptron (MLP).

Page 8

Learning rate of BPN

Learning rate:

1. Too high a learning rate will cause the network to oscillate and make it hard to converge.

2. Too low a learning rate will cause slow convergence and may fall into a local optimum.

Page 9

Momentum term of BPN

Momentum term:

1. Too small a momentum term has no obvious effect and cannot increase the classification accuracy rate.

2. Too large a momentum term can excessively affect learning and cause extreme modifications.

Page 10

Number of hidden neurons of BPN

Number of hidden neurons:

1. Too few hidden neurons tend to cause a larger error.

2. Increasing the number of hidden neurons affects the speed of convergence and computation, with almost no further help in reducing the error.

Page 11

Learning cycle of BPN

Learning cycle:

1. Too many learning cycles will result in over-fitting.

2. Too few learning cycles lead to insufficient training and a worse classification accuracy rate on the testing data.

Page 12

Some Solutions

Search for the optimal weights after training

Search for the optimal parameter settings of BPN

Neural network pruning

Page 13

Simulated Annealing

Proposed by Kirkpatrick et al. (1983).

1. Pick a random assignment
2. Make a small change
3. Accept the change if the cost is decreased, or according to other criteria

First used by Kakuno et al.
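The "other criteria" is typically the Metropolis rule, which accepts a worse solution with probability exp(-Δcost / T). The following is a generic sketch of the three steps above, not the paper's code; the cost and perturb functions are hypothetical placeholders supplied by the caller.

```python
import math
import random

def simulated_annealing(initial, cost, perturb, t0=1.0, cooling=0.95, iterations=300):
    """Generic SA loop: random start, small changes, accept improvements
    outright and worse moves with probability exp(-delta / temperature)."""
    current, current_cost = initial, cost(initial)
    temperature = t0
    for _ in range(iterations):
        candidate = perturb(current)               # make a small change
        delta = cost(candidate) - current_cost
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
        temperature *= cooling                     # gradually drop the temperature
    return current
```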

Page 14

Simulated-annealing

Page 15

Simulated-annealing

[Flowchart of the SA procedure: start from an initial random assignment; make a small change; if the change is accepted, update the current solution; drop the temperature; once the temperature has dropped, check the termination condition; if it is not met, keep searching, otherwise return the optimized solution.]

Page 16

Solution representation

The first variable is the learning rate.
The second is the momentum term.
The third is the number of hidden neurons.
The remaining variables represent the feature selection.
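A rough sketch of one possible encoding and neighbourhood move for such a solution follows. The variable names, step sizes, and the upper bound on hidden neurons are illustrative assumptions; only the learning-rate and momentum ranges echo the next slide.

```python
import random

def random_solution(n_features):
    """Encoding: learning rate, momentum term, number of hidden neurons,
    plus one 0/1 selection bit per feature."""
    return {
        "learning_rate": random.uniform(0.0, 0.45),
        "momentum": random.uniform(0.4, 0.9),
        "hidden_neurons": random.randint(1, 30),   # upper bound assumed
        "features": [random.randint(0, 1) for _ in range(n_features)],
    }

def small_change(solution):
    """Neighbourhood move: nudge one parameter or flip one feature bit."""
    new = dict(solution, features=list(solution["features"]))
    part = random.choice(["learning_rate", "momentum", "hidden_neurons", "features"])
    if part == "learning_rate":
        new["learning_rate"] = min(0.45, max(0.0, new["learning_rate"] + random.uniform(-0.05, 0.05)))
    elif part == "momentum":
        new["momentum"] = min(0.9, max(0.4, new["momentum"] + random.uniform(-0.05, 0.05)))
    elif part == "hidden_neurons":
        new["hidden_neurons"] = max(1, new["hidden_neurons"] + random.choice([-1, 1]))
    else:
        i = random.randrange(len(new["features"]))
        new["features"][i] ^= 1                    # flip the selection of one feature
    return new
```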

Page 17

Parameter ranges

SA was set to 300 to find the optimal BPN parameter settings.

The learning rate ranged from 0 to 0.45.
The momentum term ranged from 0.4 to 0.9.
The learning cycle of BPN was set to 500.

Page 18

Platform

Implemented in the C language on the Windows XP operating system, with a Pentium IV 3.0 GHz CPU and 512 MB of RAM.

Page 19

Cross-Validation

To guarantee that the results are valid and can be generalized for making predictions on new data, k-fold cross-validation is used.

This study used k = 10, meaning that all of the data are divided into ten parts, each of which takes a turn as the testing data set.
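A minimal sketch of this 10-fold procedure, using scikit-learn's KFold purely for illustration; averaging the fold accuracies is an assumption about how the final rate is reported.

```python
import numpy as np
from sklearn.model_selection import KFold

def ten_fold_accuracy(model, X, y):
    """Split the data into ten parts; each part takes a turn as the
    testing set while the remaining parts are used for training."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```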

Page 20

Datasets

Page 21

System architecture

Page 22

10-fold classification result of Breast Cancer dataset

Page 23

The comparison results of approaches without feature selection

Page 24

SA + BPN approach with feature selection and other approaches

Page 25

Summary of experimental results with/without feature selection on the datasets

Page 26

Conclusion

We proposed an SA-based strategy to select the feature subset and to set the parameters for BPN classification.

Compared with previous studies, the classification accuracy rates of the proposed SA + BPN approach are better than those of other approaches.

Page 27

Thank You

Q & A