Neural 2

ISAN-DSP GROUP


Chapter 2

Single Layer Perceptron Networks


How can we use a mathematical function to classify?

Consider a simple problem: how can we classify people as fat or thin?

[Figure: weight (kg, 40 to 80) on the horizontal axis and height (cm, 140 to 180) on the vertical axis; a decision line separates the "Fat" region from the "Thin" region.]

How can we use a mathematical function to classify? (cont.)

We use 2 inputs, weight ($x_1$, kg) and height ($x_2$, cm), to classify "thin" and "fat".

[Figure: weight-height space, with $x_1$ (kg, 40 to 80) on the horizontal axis and $x_2$ (cm, 140 to 180) on the vertical axis. The decision line is $x_2 - x_1 - 100 = 0$. "Thin" lies in the area where $x_2 - x_1 - 100 > 0$ and "Fat" lies in the area where $x_2 - x_1 - 100 < 0$.]

We can use a line to classify the data.

How can we use a mathematical function to classify? (cont.)

We can write the decision function for classifying "thin" and "fat" as follows:

$$y = \begin{cases} 1 \text{ (thin)} & \text{if } x_2 - x_1 - 100 \ge 0 \\ 0 \text{ (fat)} & \text{if } x_2 - x_1 - 100 < 0 \end{cases}$$

or

$$y = g(w_1 x_1 + w_2 x_2 - \theta) = g(-x_1 + x_2 - 100)$$

where

$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

Advantage: universal linear classifier.
Problem: for a particular problem, how can we choose suitable weights $w$ and threshold $\theta$ of the function?
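The thin/fat decision function can be tried directly in code. The following is a minimal sketch, assuming the values read from the decision line ($w_1 = -1$, $w_2 = 1$, $\theta = 100$); the function names `g` and `classify` and the sample people are chosen here for illustration only.

```python
def g(z):
    """Threshold activation: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def classify(weight_kg, height_cm, w1=-1.0, w2=1.0, theta=100.0):
    """Return 1 (thin) or 0 (fat) using y = g(w1*x1 + w2*x2 - theta)."""
    return g(w1 * weight_kg + w2 * height_cm - theta)


# Hypothetical sample points, not taken from the slides.
print(classify(50, 170))  # 170 - 50 - 100 = 20 >= 0, so 1 (thin)
print(classify(80, 160))  # 160 - 80 - 100 = -20 < 0, so 0 (fat)
```

Changing `w1`, `w2`, or `theta` moves the decision line, which is exactly the problem of choosing suitable weights raised above.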

Supervised Learning: Perceptron Networks

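Rosenblatt and his colleagues pioneered the Perceptron concept in 1962, inspired by the McCulloch-Pitts model of how a nerve cell (neuron) works.

[Figure: McCulloch-Pitts model. Inputs $x_1$, $x_2$, $x_3$ enter node $i$ through weights $w_{i1}$, $w_{i2}$, $w_{i3}$; the node has threshold $\theta_i$ and output $y_i$.]

$$y_i = g(w_{i1} x_1 + w_{i2} x_2 + w_{i3} x_3 - \theta_i) = g\Big(\sum_j w_{ij} x_j - \theta_i\Big)$$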

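Supervised Learning: Perceptron Networks (cont.)

The decision function to classify "thin" and "fat" using the McCulloch-Pitts model:

$$y = g(w_1 x_1 + w_2 x_2 - \theta) = g(-x_1 + x_2 - 100)$$

where

$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

[Figure: a single node with inputs $x_1$ and $x_2$, weights $w_1 = -1$ and $w_2 = 1$, threshold $\theta = 100$, and output $y$.]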

Perceptron Networks

[Figure: single layer perceptron network, with input nodes connected directly to output nodes.]

[Figure: multilayer perceptron network (in this case, 2 layers), with input nodes, hidden nodes, and output nodes.]

A Single Layer Perceptron Network

[Figure: an input layer with nodes $x_1$, $x_2$, $x_3$, $x_4$ connected to node $y_i$ in the output layer through weights $w_{i1}$, $w_{i2}$, $w_{i3}$, $w_{i4}$.]

For each node, the output is given by

$$y_i = g\Big(\sum_{j=1}^{N} w_{ij} x_j - \theta_i\Big)$$

$w_{ij}$ = connection weight of branch $(i,j)$
$x_j$ = input data from node $j$ in the input layer
$\theta_i$ = threshold value of node $i$ in the output layer
$g$ = activation function

Single Layer Perceptron Networks (cont.)

1. The number of input nodes depends on the number of components of the input data.

2. The activation function depends on the nature of the output data. For example, if the desired output is "yes" or "no", we must use a threshold function:

$$f(x) = \begin{cases} 1 & \text{if } x \ge T \\ 0 & \text{if } x < T \end{cases}$$

where $T$ = threshold level.

If the output is a continuous numerical value, we must use a continuous function such as the sigmoid function:

$$f(x) = \frac{1}{1 + e^{-\beta x}}$$
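The output formula $y_i = g(\sum_j w_{ij} x_j - \theta_i)$ and the two activation functions can be sketched in a few lines. This is an illustrative sketch only: the weights, thresholds, and input vector are made-up values, and the sigmoid steepness parameter is named `beta` here as an assumption.

```python
import math


def threshold(x, T=0.0):
    """Threshold function: 1 if x >= T, otherwise 0."""
    return 1.0 if x >= T else 0.0


def sigmoid(x, beta=1.0):
    """Sigmoid function 1 / (1 + exp(-beta * x))."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def layer_output(W, theta, x, g):
    """Compute y_i = g(sum_j W[i][j] * x[j] - theta[i]) for every output node i."""
    return [g(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - th)
            for row, th in zip(W, theta)]


# Hypothetical network: 4 input nodes, 2 output nodes.
W = [[0.5, -1.0, 0.25, 0.0],
     [1.0, 1.0, 1.0, 1.0]]
theta = [0.0, 2.0]
x = [1.0, 0.5, 2.0, -1.0]

print(layer_output(W, theta, x, threshold))  # binary ("yes"/"no") outputs
print(layer_output(W, theta, x, sigmoid))    # continuous outputs in (0, 1)
```

The number of columns in `W` equals the number of input components, and the choice between `threshold` and `sigmoid` follows the kind of output required.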

Activation function

[Figure: the threshold function ($T = 0$) and the sigmoid function for $\beta$ = 0.5, 1, 2, and 4, each plotted for $x$ from -10 to 10.]

How a single layer perceptron works

[Figure: a perceptron with inputs $x_1$ and $x_2$, weights $w_1$ and $w_2$, and output $y$.]

Suppose we have a perceptron network with 2 input nodes and a threshold function as its activation function. The output is then binary:

$$y = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 - \theta \ge 0 \\ 0 & \text{if } w_1 x_1 + w_2 x_2 - \theta < 0 \end{cases}$$

[Figure: the $x_1$-$x_2$ plane divided by the straight line L: $w_1 x_1 + w_2 x_2 - \theta = 0$ into a region where $y = 1$ and a region where $y = 0$.]

- If $(x_1, x_2)$ lies above line L, then $y = 1$.
- If $(x_1, x_2)$ lies below line L, then $y = 0$.

We therefore call line L the decision function, or boundary function.

How a single layer perceptron works (cont.)

Example: the AND function

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: the four input points in the $x_1$-$x_2$ plane, with $y = 0$ at (0,0), (0,1), and (1,0), and $y = 1$ at (1,1). Line L must lie within the marked range to work for the AND function; the dashed line shown does not work for the AND function.]

The slope and position of line L: $w_1 x_1 + w_2 x_2 - \theta = 0$ depend on the parameters $w_1$, $w_2$, and $\theta$. We must adjust these parameters to obtain a line L that gives the correct output.

Learning Algorithm: Training Perceptron Networks

Principle of network adaptation: adjust the parameters in the direction that reduces the error $\varepsilon = y - \hat{y}$.

Error: $\varepsilon$; desired output: $y$; network output: $\hat{y}$

Steps for training the network

1. Feed an input into the network: input $(x_1, x_2)$ with desired output $y$.
2. Compute the network output: $\hat{y} = f(w_1 x_1 + w_2 x_2 - \theta)$.
3. Compute the error: $\varepsilon = y - \hat{y}$.
4. Adjust every weight: $w^{new} = w^{old} + \Delta w$, $\theta^{new} = \theta^{old} + \Delta\theta$.
5. Return to step 1 and repeat until the error is acceptably low.

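Learning Algorithm: Training Perceptron Networks (cont.)

[Figure: training loop. The inputs $x_1$, $x_2$, $x_3$ pass through weights $w_{i1}$, $w_{i2}$, $w_{i3}$ to give the network output $y$. The network output is subtracted from the desired output to give the error, which is used to adjust the weights.]

Weight update formulas (specific to this example):

$$\Delta w_i = \eta (y - \hat{y}) x_i$$

$$\Delta\theta = -\eta (y - \hat{y})$$

where $\eta$ = learning rate.

Note: when input-output pairs are fed in to train the network, every pair must be presented, either
1. in the order in which the pairs appear in the table, or
2. drawn at random from the table.

The network learns by adjusting its weights!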

Example: the AND function

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

Initial values: $w_1 = 0.5$, $w_2 = 2.5$, $\theta = 1.0$, $\eta = 0.2$

[Figure: the decision line in the $x_1$-$x_2$ plane (both axes from -0.2 to 1.2) at the start and after updates 4, 8, 12, and 16.]

n    x1  x2  y   ŷ   Err   New w1   Δw1   New w2   Δw2    New θ
0                          0.5            2.5             1
1    0   0   0   0    0    0.5      0     2.5      0      1
2    0   1   0   1   -1    0.5      0     2.3     -0.2    1.2
3    1   0   0   0    0    0.5      0     2.3      0      1.2
4    1   1   1   1    0    0.5      0     2.3      0      1.2
5    0   0   0   0    0    0.5      0     2.3      0      1.2
6    0   1   0   1   -1    0.5      0     2.1     -0.2    1.4
7    1   0   0   0    0    0.5      0     2.1      0      1.4
8    1   1   1   1    0    0.5      0     2.1      0      1.4
9    0   0   0   0    0    0.5      0     2.1      0      1.4
10   0   1   0   1   -1    0.5      0     1.9     -0.2    1.6
…    …   …   …   …    …    …        …     …        …      …

Network output: $\hat{y} = f(w_1 x_1 + w_2 x_2 - \theta)$
Error: $\varepsilon = y - \hat{y}$
Updates: $\Delta w_i = \eta (y - \hat{y}) x_i$, $\Delta\theta = -\eta (y - \hat{y})$
New values: $w^{new} = w^{old} + \Delta w$, $\theta^{new} = \theta^{old} + \Delta\theta$
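The table rows can be reproduced with a short training loop. The sketch below assumes the update rules $\Delta w_i = \eta (y - \hat{y}) x_i$ and $\Delta\theta = -\eta (y - \hat{y})$, the initial values $w_1 = 0.5$, $w_2 = 2.5$, $\theta = 1.0$, $\eta = 0.2$, and presentation of the AND pairs in table order; the function and variable names are illustrative.

```python
def f(z):
    """Threshold activation with T = 0."""
    return 1 if z >= 0 else 0


# AND truth table, presented in table order and repeated cyclically.
AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def train(w1, w2, theta, eta, n_updates):
    """Run n_updates perceptron updates and return one record per update."""
    history = []
    for n in range(1, n_updates + 1):
        (x1, x2), y = AND_TABLE[(n - 1) % 4]
        y_hat = f(w1 * x1 + w2 * x2 - theta)
        err = y - y_hat
        w1 += eta * err * x1      # delta w1 = eta * (y - y_hat) * x1
        w2 += eta * err * x2      # delta w2 = eta * (y - y_hat) * x2
        theta += -eta * err       # delta theta = -eta * (y - y_hat)
        history.append((n, x1, x2, y, y_hat, err,
                        round(w1, 2), round(w2, 2), round(theta, 2)))
    return history


for row in train(0.5, 2.5, 1.0, 0.2, 10):
    print(row)
```

Only the pattern (0, 1) is misclassified in these first passes, so only $w_2$ and $\theta$ change, which matches the table.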

Classification Data

[Figure: the $x_1$-$x_2$ plane with the points (0,0), (0,1), and (1,0) in class $y = 0$ and the point (1,1) in class $y = 1$.]

Using a single layer perceptron network to classify data
- An input with several dimensions is called an input pattern or input vector.
- The output must take the form of a class.

Property of data that a single layer perceptron network can classify
- In a 2-dimensional feature space (input domain), each class must be separable from the other classes by a single straight line.

Classification Data (cont.)

An example that a single layer perceptron cannot handle: the XOR function

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: the $x_1$-$x_2$ plane with (0,1) and (1,0) in class $y = 1$ and (0,0) and (1,1) in class $y = 0$; the attempted dividing line is marked "Not OK!".]

For the XOR function, no single straight line can separate class $y = 0$ from class $y = 1$.
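The contrast between AND and XOR can be checked by brute force. The sketch below tries every $(w_1, w_2, \theta)$ on a coarse grid, chosen here arbitrarily, and keeps the combinations whose threshold output reproduces the truth table. Finding nothing on a grid is not a proof, but it agrees with the geometric argument that no single line separates the XOR classes.

```python
from itertools import product

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def perceptron(w1, w2, theta, x1, x2):
    """Threshold unit: 1 if w1*x1 + w2*x2 - theta >= 0, otherwise 0."""
    return 1 if w1 * x1 + w2 * x2 - theta >= 0 else 0


def separating_lines(table, grid):
    """All (w1, w2, theta) on the grid that reproduce the truth table."""
    return [(w1, w2, th) for w1, w2, th in product(grid, repeat=3)
            if all(perceptron(w1, w2, th, *x) == y for x, y in table.items())]


grid = [k / 2 for k in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

print(len(separating_lines(AND, grid)) > 0)  # True: e.g. w1 = w2 = 1, theta = 1.5
print(len(separating_lines(XOR, grid)))      # 0: no line on the grid works
```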

Higher Dimension Feature Space

When the input pattern has more than one component, the output of the perceptron is

$$y = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 + \cdots + w_N x_N - \theta \ge 0 \\ 0 & \text{if } w_1 x_1 + w_2 x_2 + \cdots + w_N x_N - \theta < 0 \end{cases}$$

The decision function is then

$$w_1 x_1 + w_2 x_2 + \cdots + w_N x_N - \theta = 0$$

This is the equation of a hyperplane. For a 2-dimensional input, the decision function is a straight line.

Higher Dimension Feature Space (cont.)

For a 3-dimensional input, the decision function is

$$w_1 x_1 + w_2 x_2 + w_3 x_3 - \theta = 0$$

This is the equation of a plane in 3-dimensional space.

[Figure: a decision plane in the space with axes $X_1$, $X_2$, $X_3$, separating Class A from Class B.]

Linear Separability

- For a single layer perceptron, the decision function is a hyperplane.
- If the input patterns of each class can be separated from the other classes by a hyperplane, the data set is said to be linearly separable.

[Figure: two scatter plots, one labeled "Linearly separable" and one labeled "Not linearly separable" (a limitation of a single layer perceptron).]

How can we adjust weights?

Suppose we have the function $y = x_1 + 2 x_2$ and want to approximate it with a single layer perceptron whose output has the form

$$\hat{y} = w_1 x_1 + w_2 x_2$$

[Figure: a perceptron with inputs $x_1$ and $x_2$, weights $w_1$ and $w_2$, and output $\hat{y}$.]

- We want to adjust $w_1$ and $w_2$ so that $\hat{y}$ is as close to $y$ as possible. (In this case the activation function is the identity function $f(x) = x$.)

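How can we adjust weights? (cont.)

Consider the mean of the squared error (Mean Square Error, MSE):

$$\langle \varepsilon^2 \rangle = \langle (y - \hat{y})^2 \rangle = \langle (y - w_1 x_1 - w_2 x_2)^2 \rangle$$

where $\langle \cdot \rangle$ denotes the average.

This gives $\langle \varepsilon^2 \rangle$ as a function of $w_1$ and $w_2$.

[Figure: surface plot of the MSE over the $w_1$-$w_2$ plane.]

We call this surface the error surface (in this case it is a bowl-shaped paraboloid).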


[Figure: contour plot of the mean square error $\langle \varepsilon^2 \rangle$ as a function of $w_1$ (0 to 2) and $w_2$ (1 to 3).]

The minimum is at $(w_1, w_2) = (1, 2)$, where MSE = 0.

We must adjust $w_1$ and $w_2$ to reach the minimum of the error surface.

How $w_1$ and $w_2$ are adjusted toward the minimum of the error surface

[Figure: contour plot over $w_1$ (0 to 2) and $w_2$ (1 to 3), showing the path from the starting point of $(w_1, w_2)$ through updates 1, 2, 3, ..., k to the target.]

Gradient Descent Method

We want to adjust $w_1$ and $w_2$ to reach the minimum of the error surface. This situation can be posed as the following problem.

Question
1. The error surface is a valley in which a blind man is standing.
2. There is a pond at the very bottom.
3. The man wants to walk down to the pond.
4. All he knows is which way the ground slopes at the spot where he is standing.
5. The question is: how should he walk in order to reach the pond?

Answer
Walk in the direction in which the ground slopes downward most steeply.

Gradient Descent Method (cont.)

The principle of walking downhill in the direction of steepest descent is called the Gradient Descent Method.

1. Compute the gradient of the surface (the error surface) at the current position (the current $(w_1, w_2)$). The gradient points in the steepest direction (uphill).

2. Take a short step in the direction opposite to the computed gradient (this is the adjustment of $w_1$, $w_2$).

3. Return to step 1 and repeat until the minimum is reached.
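The three steps can be tried on the earlier example $y = x_1 + 2 x_2$ with $\hat{y} = w_1 x_1 + w_2 x_2$. The sketch below is illustrative only: the sample inputs, the starting point, the step size `eta`, and the number of steps are all assumptions made here.

```python
# Samples of the target function y = x1 + 2*x2 (inputs chosen arbitrarily).
SAMPLES = [((x1, x2), x1 + 2 * x2)
           for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0, 2.0)]


def mse(w1, w2):
    """Mean square error <(y - w1*x1 - w2*x2)^2> over the samples."""
    return sum((y - w1 * x1 - w2 * x2) ** 2
               for (x1, x2), y in SAMPLES) / len(SAMPLES)


def gradient(w1, w2):
    """Step 1: gradient of the MSE at the current (w1, w2)."""
    n = len(SAMPLES)
    g1 = sum(-2 * (y - w1 * x1 - w2 * x2) * x1 for (x1, x2), y in SAMPLES) / n
    g2 = sum(-2 * (y - w1 * x1 - w2 * x2) * x2 for (x1, x2), y in SAMPLES) / n
    return g1, g2


def descend(w1, w2, eta=0.05, steps=500):
    """Steps 2 and 3: repeatedly take a short step against the gradient."""
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        w1, w2 = w1 - eta * g1, w2 - eta * g2
    return w1, w2


w1, w2 = descend(0.0, 3.0)
print(w1, w2, mse(w1, w2))  # should end up close to (1, 2) with MSE near 0
```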

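A single layer perceptron case

[Figure: a single unit with inputs $x_1$, $x_2$, $x_3$, $x_4$, weights $w_1$, $w_2$, $w_3$, $w_4$, and output $y$.]

The network output is computed from

$$\hat{y} = f\Big(\sum_{j=1}^{N} w_j x_j - \theta\Big)$$

The square error $\varepsilon^2$ is computed from

$$\varepsilon^2 = (y - \hat{y})^2 = \Big(y - f\Big(\sum_{j=1}^{N} w_j x_j - \theta\Big)\Big)^2$$

The slope of $\varepsilon^2$ with respect to $w_j$ is computed from

$$\frac{\partial \varepsilon^2}{\partial w_j} = \frac{\partial}{\partial w_j}\Big(y - f\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big)^2 = -2\Big(y - f\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big) f'\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big) x_j$$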

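A single layer perceptron case (cont.)

The slope of $\varepsilon^2$ with respect to $\theta$ is computed from

$$\frac{\partial \varepsilon^2}{\partial \theta} = 2\Big(y - f\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big) f'\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)$$

Moving against each slope, and absorbing the constant factor 2 into $\eta$, we therefore get

$$\Delta w_j = \eta \Big(y - f\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big) f'\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big) x_j$$

$$\Delta\theta = -\eta \Big(y - f\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big) f'\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)$$

where $\eta > 0$ is called the learning rate.

The weight update equations are

$$w_j^{new} = w_j^{old} + \Delta w_j$$

$$\theta^{new} = \theta^{old} + \Delta\theta$$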
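One update with these equations can be written as a single function. The following is a sketch under stated assumptions: the activation is taken to be `tanh` purely for illustration, and the weights, input, target, and learning rate are made-up values. It checks that one small step reduces the squared error on the sample used for the update.

```python
import math


def net_input(w, theta, x):
    """Weighted sum minus threshold: sum_k w_k * x_k - theta."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x)) - theta


def delta_rule_step(w, theta, x, y, eta, f, f_prime):
    """Apply dw_j = eta*(y - f(h))*f'(h)*x_j and dtheta = -eta*(y - f(h))*f'(h)."""
    h = net_input(w, theta, x)
    delta = (y - f(h)) * f_prime(h)
    new_w = [w_j + eta * delta * x_j for w_j, x_j in zip(w, x)]
    new_theta = theta - eta * delta
    return new_w, new_theta


def squared_error(w, theta, x, y, f):
    return (y - f(net_input(w, theta, x))) ** 2


def tanh_prime(h):
    """Derivative of tanh: 1 - tanh(h)^2."""
    return 1.0 - math.tanh(h) ** 2


# Hypothetical starting point and training sample.
w, theta = [0.2, -0.4, 0.1], 0.3
x, y = [1.0, 0.5, -1.0], 0.8

before = squared_error(w, theta, x, y, math.tanh)
w, theta = delta_rule_step(w, theta, x, y, 0.1, math.tanh, tanh_prime)
after = squared_error(w, theta, x, y, math.tanh)
print(before, after)  # the error on this sample should shrink
```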

Weak point of Gradient Descent Method

[Figure: an error curve with a local minimum and a global minimum.]

The gradient descent method may leave us stuck at a local minimum, which is not the true lowest point.

Why we need to adjust weights using a small step

[Figure: contour plot over $w_1$ (0 to 2) and $w_2$ (1 to 3), showing the path from update 1 to update k relative to the target when large steps are taken.]

If the weights are adjusted with a very large learning rate, the network reaches the minimum slowly, or may never reach it at all (unstable).

Why we need to adjust weights using a small step (cont.)

[Figure: contour plot over $w_1$ (0 to 2) and $w_2$ (1 to 3), showing the path through updates 1, 2, 3, ..., k to the target when small steps are taken.]

Adjusting the weights a little at a time lets the network settle into the minimum well and helps avoid instability.
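The effect of the step size can be seen numerically on the example $y = x_1 + 2 x_2$. The sketch below runs gradient descent on the MSE with a small and a large learning rate; the sample inputs, the starting point, and the two values of `eta` are assumptions chosen here for illustration.

```python
# Samples of y = x1 + 2*x2 on a small, arbitrarily chosen grid of inputs.
SAMPLES = [((x1, x2), x1 + 2 * x2)
           for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0, 2.0)]


def mse(w1, w2):
    """Mean square error of y_hat = w1*x1 + w2*x2 over the samples."""
    return sum((y - w1 * x1 - w2 * x2) ** 2
               for (x1, x2), y in SAMPLES) / len(SAMPLES)


def run(eta, steps=20, w1=0.0, w2=0.0):
    """Gradient descent on the MSE; returns the MSE after each update."""
    n = len(SAMPLES)
    errors = []
    for _ in range(steps):
        g1 = sum(-2 * (y - w1 * x1 - w2 * x2) * x1 for (x1, x2), y in SAMPLES) / n
        g2 = sum(-2 * (y - w1 * x1 - w2 * x2) * x2 for (x1, x2), y in SAMPLES) / n
        w1, w2 = w1 - eta * g1, w2 - eta * g2
        errors.append(mse(w1, w2))
    return errors


small = run(eta=0.05)  # small step: the error keeps falling
large = run(eta=0.5)   # large step: each update overshoots and the error grows
print("eta=0.05:", small[0], "->", small[-1])
print("eta=0.5 :", large[0], "->", large[-1])
```

With the small step the error falls at every update, while with the large step each update overshoots the minimum by more than the previous one.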

Gradient Descent Method as Optimization method

In the gradient descent method, we want to make the error

$$\langle \varepsilon^2 \rangle = \langle (y - \hat{y})^2 \rangle$$

as small as possible (minimize the error).

We can view the error as a cost function; the weights are adjusted in order to minimize this cost function.

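Adjusting Weights for a Linear Unit

The amount by which each weight is adjusted:

$$\Delta w_j = \eta \Big(y - f\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big) f'\Big(\sum_{k=1}^{N} w_k x_k - \theta\Big) x_j$$

that is, learning rate × output error × $f'$ × input.

For a linear unit, $f(x) = x$ with $f'(x) = 1$, so we get

$$\Delta w_j = \eta \Big(y - \Big(\sum_{k=1}^{N} w_k x_k - \theta\Big)\Big) x_j$$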

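Adjusting Weights for a Nonlinear Unit

Computing $f'$ for a nonlinear unit

1. Sigmoid function

$$f(x) = \frac{1}{1 + e^{-2\beta x}}$$

We get

$$f'(x) = 2\beta f(x)\big(1 - f(x)\big)$$

2. Function tanh(x)

$$f(x) = \tanh(\beta x)$$

We get

$$f'(x) = \beta\big(1 - f(x)^2\big)$$

In both cases $f'$ is easy to compute, because it is expressed in terms of $f(x)$ itself.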
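Both derivative formulas can be checked numerically against a finite-difference estimate. In this sketch the steepness parameter is called `beta` (an assumed name), and the test points and step size `h` are arbitrary choices.

```python
import math


def sigmoid(x, beta=1.0):
    """f(x) = 1 / (1 + exp(-2*beta*x))."""
    return 1.0 / (1.0 + math.exp(-2.0 * beta * x))


def sigmoid_prime(x, beta=1.0):
    """f'(x) = 2*beta*f(x)*(1 - f(x))."""
    fx = sigmoid(x, beta)
    return 2.0 * beta * fx * (1.0 - fx)


def tanh_unit(x, beta=1.0):
    """f(x) = tanh(beta*x)."""
    return math.tanh(beta * x)


def tanh_prime(x, beta=1.0):
    """f'(x) = beta*(1 - f(x)^2)."""
    return beta * (1.0 - tanh_unit(x, beta) ** 2)


def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 0.7):
    print(x,
          sigmoid_prime(x), numeric_derivative(sigmoid, x),
          tanh_prime(x), numeric_derivative(tanh_unit, x))
```

Each closed-form value should agree with its numerical estimate to several decimal places, and neither formula needs more than the already computed $f(x)$.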

Adjusting Weights for a Nonlinear Unit (cont.)

[Figure: $f(x)$ and $f'(x)$ for the sigmoid function ($\beta = 1$), plotted for $x$ from -5 to 5.]

[Figure: $f(x)$ and $f'(x)$ for the tanh function ($\beta = 1$), plotted for $x$ from -5 to 5.]

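How difficult the data is for the network to learn

How hard it is to train the network depends on the distance between the data clusters of the classes: the farther apart they are, the easier the training; the closer together they are, the harder the training.

[Figure: two scatter plots of Class A and Class B. In the "hard" case the two clusters lie close together; in the "easy" case they lie far apart. Candidate decision lines are marked "OK" or "Not OK".]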

Classification Performance: How well the network has learnt

The performance of the network in separating the classes can be judged from the minimum distance $d_{min}$ from the input patterns to the decision plane.

[Figure: two plots of Class A and Class B with a decision plane. In the "good" case $d_{min}$ is large; in the "not good" case the decision plane passes close to the patterns and $d_{min}$ is small.]
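The minimum distance $d_{min}$ can be computed with the standard point-to-hyperplane distance $|\sum_j w_j x_j - \theta| / \lVert w \rVert$. The sketch below compares two decision lines for the AND inputs; both lines classify AND correctly, and their weights and thresholds are illustrative values chosen here.

```python
import math


def distance_to_plane(w, theta, x):
    """Distance from point x to the hyperplane sum_j w_j*x_j - theta = 0."""
    norm = math.sqrt(sum(w_j ** 2 for w_j in w))
    return abs(sum(w_j * x_j for w_j, x_j in zip(w, x)) - theta) / norm


def d_min(w, theta, patterns):
    """Smallest distance from any input pattern to the decision plane."""
    return min(distance_to_plane(w, theta, x) for x in patterns)


AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Line midway between (1,1) and the points (0,1), (1,0).
print(d_min([1.0, 1.0], 1.5, AND_INPUTS))
# Line that still separates the classes but passes very close to (0,1) and (1,0).
print(d_min([1.0, 1.0], 1.05, AND_INPUTS))
```

The first line has the larger $d_{min}$, so by this measure it is the better of the two.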

Conclusions: single layer perceptron networks

1. The function of each unit is a function of the weighted sum of its inputs:

$$y_i = g\Big(\sum_{j=1}^{N} w_{ij} x_j - \theta_i\Big)$$

2. For data classification, the decision function takes the form of a hyperplane.

3. The limitation of a single layer network is that the data it can classify must be linearly separable.

4. Training the network by adjusting the weights relies on the gradient descent method to reduce the error to a minimum.

5. The weak point of the gradient descent method is that it may leave the network stuck at a local minimum of the error surface.