• BP algorithm


• Shape recognition with matlab code

• Face recognition with matlab code

Werbos described the error back-propagation training method (Back Propagation) in his 1974 doctoral dissertation. It was a multilayer perceptron network, but with more powerful training rules. It takes its name from the fact that it propagates the error backward through the network.

The BP algorithm was extended in 1985 by Rumelhart, Hinton, and Williams.

The algorithm uses gradient descent to adjust the network parameters so that they best fit a set of training examples. Neural-network learning is robust to errors in the training data, and it has been applied to face recognition (Cottrell 1990), speech recognition (Lang 1990), and handwritten character recognition (LeCun 1989).

Back Propagation Algorithm

• The BP algorithm finds the weights for a multilayer network with a fixed structure. It uses gradient descent to minimize the network error.

• The BP algorithm is one of the supervised learning methods.

Back Propagation Algorithm…

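• Training a multilayer perceptron is the same as training a perceptron; the only difference is that now the output is a nonlinear function of the input thanks to the nonlinear basis function in the hidden units. Considering the hidden units as inputs, the second layer is a perceptron, and we already know how to update its parameters $v_{ih}$ given the hidden-unit outputs $z_h$. For the first-layer weights $w_{hj}$, we use the chain rule to calculate the gradients:

$$\frac{\partial E}{\partial w_{hj}} = \sum_i \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_h}\,\frac{\partial z_h}{\partial w_{hj}}$$

where $y_i$ are the network outputs.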

Back Propagation Algorithm…

• The learning problem faced by Back propagation is to search a large hypothesis space defined by all possible weight values for all the units in the network. Gradient descent can be used to find a hypothesis to minimize E.

• BACKPROPAGATION(training_examples, η, $n_{in}$, $n_{out}$, $n_{hidden}$)

• Each training example is a pair of the form $\langle \vec{x}, \vec{t} \rangle$, where $\vec{x}$ is the vector of network input values and $\vec{t}$ is the vector of target network output values.

Back Propagation Algorithm…

• η is the learning rate (e.g., 0.05), $n_{in}$ is the number of network inputs, $n_{hidden}$ the number of units in the hidden layer, and $n_{out}$ the number of output units.

• The input from unit i into unit j is denoted $x_{ji}$, and the weight from unit i to unit j is denoted $w_{ji}$.

Error Back Propagation Learning

•The back propagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to minimize the squared error between the network output values and the target values for these outputs.

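• $E(\vec{w}) \equiv \frac{1}{2}\sum_{d \in D}\sum_{k \in outputs}(t_{kd} - o_{kd})^2$, where D is the set of training examples and outputs is the set of output units in the network.

• $t_{kd}$ and $o_{kd}$ are the target and output values associated with the kth output unit and training example d.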

Hypothesis Space

• For a linear unit with two weights, the two axes w0 and w1 represent the possible values of the weight vector. The third axis shows the error E for a fixed set of training examples.

• The error surface shown in the figure therefore represents the desirability of every weight vector in the hypothesis space. Given the definition of E, for a linear unit this error surface is always parabolic and has a single global minimum.
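To illustrate the single global minimum, the following MATLAB sketch runs batch gradient descent on E for a linear unit $o = w_0 + w_1 x$. The four training points, the learning rate 0.05, and the iteration count are assumptions chosen for the example; because E is a convex quadratic here, descent reaches the same point as the least-squares solution.

```matlab
% Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2 for a linear unit o = w0 + w1*x.
% Assumed toy data generated by t = 1 + 2x, so the unique minimum is w = [1; 2].
x = [0 1 2 3];
t = [1 3 5 7];
X = [ones(size(x)); x];        % 2 x 4: a constant input for w0, then x
w = [0; 0];                    % start at the origin of the (w0, w1) plane
eta = 0.05;                    % small enough for this quadratic bowl to converge
for iter = 1:1000
    o = w' * X;                % unit outputs for all examples (1 x 4)
    grad = -X * (t - o)';      % dE/dw (2 x 1)
    w = w - eta * grad;        % step along the negated gradient
end
wLS = X' \ t';                 % least-squares solution for comparison
```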

Back Propagation Rules

Create a feed-forward network with $n_{in}$ inputs, $n_{hidden}$ hidden units, and $n_{out}$ output units.

Initialize all network weights to small random numbers.

Until the termination condition is met, Do

o For each $\langle \vec{x}, \vec{t} \rangle$ in training_examples, Do

• Propagate the input forward through the network:

1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of every unit u in the network.

Update Weights

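• Propagate the error backward through the network (sigmoid units):

2. For each network output unit k, calculate its error term $\delta_k$:
$$\delta_k \leftarrow o_k(1 - o_k)(t_k - o_k)$$

3. For each hidden unit h, calculate its error term $\delta_h$:
$$\delta_h \leftarrow o_h(1 - o_h)\sum_{k \in outputs} w_{kh}\,\delta_k$$

4. Update each network weight $w_{ji}$:
$$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}, \qquad \Delta w_{ji} = \eta\,\delta_j\,x_{ji}$$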
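The four steps above can be sketched in MATLAB for one hidden layer of sigmoid units. This is a minimal illustration, not the slides' own code: the toy data (logical OR), η = 0.5, the 2000-epoch termination condition, and the small deterministic initial weights (used instead of random numbers so the run is reproducible) are all assumptions. Each unit's bias is handled by prepending a constant input $x_0 = 1$.

```matlab
% Stochastic-gradient backpropagation for a network with one hidden layer of sigmoid units.
% Assumptions for illustration: OR training data, eta = 0.5, 2000 epochs, n_hidden = 2.
X = [0 0 1 1; 0 1 0 1];            % n_in x N inputs, one column per training example
Targets = [0 1 1 1];               % n_out x N target outputs (logical OR)
n_in = 2; n_hidden = 2; n_out = 1; eta = 0.5;
sigmoid = @(z) 1 ./ (1 + exp(-z));
% Small, deterministic, non-symmetric initial weights; column 1 multiplies the bias input x0 = 1.
Wh = 0.05 * sin(reshape(1:n_hidden*(n_in+1), n_hidden, n_in+1));     % hidden-layer weights
Wo = 0.05 * cos(reshape(1:n_out*(n_hidden+1), n_out, n_hidden+1));   % output-layer weights
% Network output and the error E(w) = 1/2 * sum_d sum_k (t_kd - o_kd)^2
forward = @(Wh, Wo, x) sigmoid(Wo * [1; sigmoid(Wh * [1; x])]);
errorE = @(Wh, Wo) 0.5 * sum(arrayfun(@(d) sum((Targets(:, d) - forward(Wh, Wo, X(:, d))).^2), 1:size(X, 2)));
E_before = errorE(Wh, Wo);
for epoch = 1:2000                                  % termination condition: fixed epoch count
    for d = 1:size(X, 2)                            % for each training example
        x = [1; X(:, d)];                           % input vector with bias term
        o_h = sigmoid(Wh * x);                      % step 1: hidden-unit outputs
        x_o = [1; o_h];                             % inputs to the output units
        o_k = sigmoid(Wo * x_o);                    % step 1: network outputs
        delta_k = o_k .* (1 - o_k) .* (Targets(:, d) - o_k);        % step 2
        delta_h = o_h .* (1 - o_h) .* (Wo(:, 2:end)' * delta_k);    % step 3 (weights before update)
        Wo = Wo + eta * delta_k * x_o';             % step 4: w_ji <- w_ji + eta*delta_j*x_ji
        Wh = Wh + eta * delta_h * x';
    end
end
E_after = errorE(Wh, Wo);
fprintf('E before training: %.4f, after: %.4f\n', E_before, E_after);
```

Because $-\delta_j x_{ji}$ equals $\partial E_d / \partial w_{ji}$ for the single example d, a finite-difference check on one weight is a quick way to confirm such an implementation.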

BP Neural Network


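[Figure: structure of an M-layer BP network, from Layer 1 through Layer m-1 and Layer m to Layer M. $w^m_{i,j}$ is the weight from unit j of layer m-1 to unit i of layer m, and $a^M_1, \ldots, a^M_{S^M}$ are the network outputs.]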




Structuring the network

• In some applications, we may believe that the input has a local structure. For example, in vision we know that nearby pixels are correlated and there are local features like edges and corners; any object, for example, a handwritten digit, may be defined as a combination of such primitives.

• When designing the MLP, hidden units are not connected to all input units because not all inputs are correlated. Instead we define hidden units that are connected to only a small local subset of the inputs. (Le Cun 1989)

Structuring the network…

• A structured MLP. Each unit is connected to a local group of units below it and checks for a particular feature.

•NETtalk: Neural networks that learn to pronounce English text (Sejnowski and Rosenberg, 1987)

•Speech recognition (Cohen et al., 1993; Renals et al., 1992)

•Optical character recognition (Sackinger et al. 1992; LeCun et al., 1990)

•On-line handwritten character recognition (Guyon, 1990)

•Combining visual and acoustic speech signals for improved intelligibility (Sejnowski et al., 1990)

•System identification (Narendra and Parthasarathy, 1990)

•Steering of an autonomous vehicle (Pomerleau, 1992)

Example- Speech Recognition

• Speech recognition: recognizing which of ten different vowel sounds occurs between the consonants h and d ("h_d").

• The input signals are two numerical parameters obtained from an analysis of the sound.

• The network output is the sound whose output unit has the highest predicted value.


Steering of an autonomous vehicle

• A steering control system for driving at moderate speeds on highways was designed by Pomerleau (1993). The input to this neural network is a 30 × 32 pixel image taken by a forward-facing camera mounted inside the car. The output of the neural network is the direction in which the steering wheel should turn.

The network is trained to imitate the steering commands of a human driver for about 5 minutes, and it has successfully controlled the car at 70 miles per hour over a distance of 90 miles on a highway.

Steering of an autonomous vehicle

Handwritten Digit recognition

• Digit recognition is one of the applications of BP networks.

• Only minimal preprocessing of the data is required.

• The network input is normalized images of isolated (segmented) digits.

• The error rate of the method is 1%, with a rejection rate of 9%.

ZIP code recognition

• The input consists of black-and-white pixels.

• The digits are easily separated from the background.

• There are 10 outputs.

• The digits come in different sizes and styles.

• The neurons of the first layer approximate the pixel intensities, and

• the neurons of the last layer determine the shape of the digits.

ZIP code recognition

• First, 9,298 digits are segmented. These digits were written by many different people in different styles.

• This set is supplemented with 3,349 printed digits in 35 different fonts.

• A characteristic of this dataset is that both the training data and the test data contain ambiguous and unclassifiable examples.

• The size of a digit varies, but it is roughly 40 × 60 pixels. Since a BP network has a fixed input size, the character size must be normalized. This is done with a linear transformation so that the characters fit in a 16 × 16 pixel image.

• Because of the linear transformation, the resulting image is not binary; it has several grey levels. These grey levels are scaled and shifted into the range -1 to +1.

The Network

• Thus the input consists of a normalized 16 × 16 pixel image (256 units), and the output consists of 10 units.

• When a pattern belongs to class i, the desired output is +1 for unit i and -1 for the other units.
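A minimal MATLAB sketch of this input and target encoding follows. It assumes the digit has already been size-normalized to 16 × 16 and is stored as ink intensity in [0, 1] (1 = fully black); the synthetic image and the class label are hypothetical. The mapping sends white to -1 and black to +1, matching the convention used in the feature-map figure.

```matlab
% Input/target encoding for the 10-class digit network (illustrative values).
ink = zeros(16);                    % hypothetical size-normalized digit, ink in [0, 1]
ink(3:14, 7:9) = 1;                 % a black vertical bar
ink(3:14, 6) = 0.5;                 % grey edge of the kind left by the linear size transformation
x = 2 * ink(:) - 1;                 % 256 inputs: white -> -1, black -> +1
digitClass = 1;                     % hypothetical label: the digit "1"
t = -ones(10, 1);                   % desired outputs for units 0..9
t(digitClass + 1) = 1;              % +1 for the unit of the true class, -1 elsewhere
```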

The Network

• Input image (left), weight vector (center), and resulting feature map (right). The feature map is obtained by scanning the input image with a single neuron that has a local receptive field. White represents -1, black represents +1.

• Notice that most of the errors are cases that people find quite easy. The human error rate is probably 20 to 30 errors.
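The scanning operation in the feature-map figure can be sketched with `conv2`: one neuron's weights are applied at every position of the input, which is a 2-D correlation followed by the activation. The 5 × 5 receptive field, the stroke-detecting weights, the zero bias, and the tanh activation are assumptions for the example.

```matlab
% Feature map from a single neuron with a 5x5 local receptive field (shared weights).
img = -ones(16);                        % white background (-1)
img(4:12, 8) = 1;                       % one black (+1) vertical stroke
w = repmat([-1 0 2 0 -1], 5, 1) / 10;   % hypothetical weights tuned to vertical strokes
b = 0;                                  % bias
% conv2 flips its kernel, so rotating w by 180 degrees turns it into a correlation
fmap = tanh(conv2(img, rot90(w, 2), 'valid') + b);   % 12 x 12 feature map
```

Background patches give a net input of 0, while a patch centred on the stroke gives 5 × 0.4 = 2, so the map lights up only along the stroke.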

Learning Weights

The image is presented to the network; the weights of the active pixels are gradually increased, and the weights of the inactive pixels are gradually decreased.

Learning Weights (cont.)


Function Approximation



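Example network (1-2-1): log-sigmoid hidden layer and linear output layer,

$$f^1(n) = \frac{1}{1 + e^{-n}}, \qquad f^2(n) = n$$

Nominal parameter values:

$$w^1_{1,1} = 10,\quad w^1_{2,1} = 10,\quad b^1_1 = -10,\quad b^1_2 = 10,\quad w^2_{1,1} = 1,\quad w^2_{1,2} = 1,\quad b^2 = 0$$

[Figure: nominal network response $a^2$ versus p for -2 ≤ p ≤ 2.]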
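The nominal response can be checked numerically. This MATLAB sketch evaluates the network above on a grid of p values; the grid resolution is an arbitrary choice, and the plot call is left as a comment so the snippet runs without a display.

```matlab
% Response a^2(p) of the 1-2-1 network at the nominal parameter values.
logsig_ = @(n) 1 ./ (1 + exp(-n));   % f^1: log-sigmoid transfer function
W1 = [10; 10];  b1 = [-10; 10];      % first-layer weights and biases
W2 = [1 1];     b2 = 0;              % second layer, linear transfer f^2(n) = n
p = linspace(-2, 2, 401);            % input grid, step 0.01
a1 = logsig_(W1 * p + b1 * ones(size(p)));   % 2 x 401 hidden outputs
a2 = W2 * a1 + b2;                   % 1 x 401 network response
% plot(p, a2), xlabel('p'), ylabel('a^2')
```

Each hidden unit contributes one step: the first switches on near p = 1 (where 10p - 10 = 0) and the second near p = -1, so the response rises from about 0 to about 2.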





Effect of Parameter Changes

[Figure: network response $a^2$ versus p (-2 ≤ p ≤ 2) as individual parameters are varied.]











Effect of Parameter Changes…

[Figure: network response $a^2$ versus p (-2 ≤ p ≤ 2) as a parameter is varied between 0 and 20.]

Effect of Parameter Changes…

[Figure: network response $a^2$ versus p (-2 ≤ p ≤ 2) as individual parameters are varied.]











Function Approximation

• Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.

Ex: Function Approximation…

$$g(p) = 1 + \sin\left(\frac{\pi}{4}\,p\right), \qquad -2 \le p \le 2$$






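Steepest Descent Algorithm

• The steepest descent algorithm for the approximate mean square error $\hat{F}$, with learning rate α, iteration k, and $a^0 = p$:

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i$$

• Matrix form:

$$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m$$

where the sensitivity of $\hat{F}$ to the net inputs of layer m is

$$s^m \equiv \frac{\partial \hat{F}}{\partial n^m} = \begin{bmatrix} \partial \hat{F}/\partial n^m_1 \\ \partial \hat{F}/\partial n^m_2 \\ \vdots \\ \partial \hat{F}/\partial n^m_{S^m} \end{bmatrix}$$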







BP the Sensitivity

• Backpropagation: a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m +1.

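• Jacobian matrix:

$$\frac{\partial n^{m+1}}{\partial n^m} = \begin{bmatrix} \dfrac{\partial n^{m+1}_1}{\partial n^m_1} & \dfrac{\partial n^{m+1}_1}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_1}{\partial n^m_{S^m}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}} \end{bmatrix}$$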
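Matrix Representation

• The i,j element of the Jacobian matrix, using $n^{m+1}_i = \sum_l w^{m+1}_{i,l}\,a^m_l + b^{m+1}_i$ and $a^m_l = f^m(n^m_l)$:

$$\frac{\partial n^{m+1}_i}{\partial n^m_j} = w^{m+1}_{i,j}\,\frac{\partial f^m(n^m_j)}{\partial n^m_j} = w^{m+1}_{i,j}\,\dot{f}^m(n^m_j)$$

• In matrix form:

$$\frac{\partial n^{m+1}}{\partial n^m} = W^{m+1}\,\dot{F}^m(n^m), \qquad \dot{F}^m(n^m) = \begin{bmatrix} \dot{f}^m(n^m_1) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n^m_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n^m_{S^m}) \end{bmatrix}$$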
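Recurrence Relation

• The recurrence relation for the sensitivity (chain rule):

$$s^m = \frac{\partial \hat{F}}{\partial n^m} = \left(\frac{\partial n^{m+1}}{\partial n^m}\right)^T \frac{\partial \hat{F}}{\partial n^{m+1}} = \dot{F}^m(n^m)\,(W^{m+1})^T\,s^{m+1}$$

• The sensitivities are propagated backward through the network from the last layer to the first layer:

$$s^M \rightarrow s^{M-1} \rightarrow \cdots \rightarrow s^2 \rightarrow s^1$$

• The recursion starts at the output layer; for $\hat{F} = (t - a)^T(t - a)$ with $a = a^M$:

$$s^M = -2\,\dot{F}^M(n^M)\,(t - a)$$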
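One iteration of these equations can be sketched in MATLAB for a 1-2-1 network (log-sigmoid hidden layer, linear output) approximating $g(p) = 1 + \sin(\pi p/4)$. The initial weights, the input p = 1, and α = 0.1 are illustrative assumptions, not values from the slides. Because the output layer is linear, $\dot{F}^2 = 1$ and the recursion starts from $s^2 = -2(t - a^2)$.

```matlab
% One iteration of matrix-form backpropagation for a 1-2-1 network.
% Assumed (illustrative) initial parameters, input and learning rate.
logsig_ = @(n) 1 ./ (1 + exp(-n));
g = @(p) 1 + sin(pi/4 * p);          % function to approximate
W1 = [-0.27; -0.41];  b1 = [-0.48; -0.13];
W2 = [0.09 -0.17];    b2 = 0.48;
alpha = 0.1;  p = 1;  t = g(p);
% forward pass
a0 = p;
a1 = logsig_(W1 * a0 + b1);          % hidden layer
a2 = W2 * a1 + b2;                   % linear output layer
e = t - a2;
% backward pass: sensitivities from the last layer to the first
s2 = -2 * 1 * e;                     % s^M = -2 Fdot^M (t - a); derivative of purelin is 1
F1dot = diag((1 - a1) .* a1);        % derivative of logsig is (1 - a) a
s1 = F1dot * W2' * s2;               % s^m = Fdot^m (W^{m+1})' s^{m+1}
% steepest-descent update
W2 = W2 - alpha * s2 * a1';   b2 = b2 - alpha * s2;
W1 = W1 - alpha * s1 * a0';   b1 = b1 - alpha * s1;
```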

Intrusion Detection System (IDS)

• Intrusion detection systems are responsible for identifying and detecting any unauthorized use of a system, misuse of it, or damage to it by both internal and external users. Intrusion detection and prevention is today one of the main mechanisms for securing computer networks and systems, and such systems are generally used alongside firewalls as a complementary security measure.

Intrusion Detection System (IDS)…

• Intrusion detection systems are built as either software or hardware systems, and each has its own advantages and disadvantages. Speed and accuracy are advantages of hardware systems, and another strength of these systems is that intruders cannot easily break their security. On the other hand, software is easy to use, adapts to different software conditions and to differences between operating systems, and is therefore more general, so software systems are usually the more suitable choice.

Intrusion Detection Methods

• Misuse detection

• matches the activities occurring on an information system to the signatures of known intrusions

• Anomaly detection

• compares activities on the information system to normal behaviour

Motivation for using AI for Intrusion Detection

Disadvantages of traditional methods:

• False alarms

• Continuous updating of the signature database with new signatures

Advantages of AI-based techniques:

Flexibility

Pattern recognition and the ability to detect new patterns

Learning ability

AI techniques used for Intrusion Detection

• Support Vector Machines (SVMs)

• Artificial Neural Networks (ANNs)

• Expert Systems

• Multivariate Adaptive Regression Splines (MARS)

Traditional Neural Network Based IDS

Typically consist of a single neural network based on either misuse detection or anomaly detection

Neural networks with good pattern classification abilities are typically used for misuse detection, such as Multilayer Perceptrons, Radial Basis Function networks, etc.

Neural networks with good clustering abilities are typically used for anomaly detection, such as Self-Organizing Maps (SOM), competitive learning neural networks, etc.

Hybrid Neural Network Approach

Combination of misuse detection and anomaly detection based systems:

• Clustering results in dimensionality reduction

• Classification attains attack identification

Advantages:

• Improved accuracy

• Enhanced flexibility

Examples: SOM and MLP using back propagation; SOM and RBF; SOM and CNN, etc.

Hybrid Neural Network Approach(Using SOM and MLP)

SOM employing unsupervised learning used for clustering

MLP employing Back Propagation Algorithm used for classification

Output from SOM is given as input to MLP

Self Organizing Maps

• In a self-organizing network, a competitive learning method is used for training. The processing units are placed at the nodes of a one-dimensional, two-dimensional, or higher-dimensional grid, and through a competitive learning process the units become ordered with respect to the input patterns.

• The competitive learning used in these networks works as follows: at each learning step, the units compete with one another to become active. At the end of each competition step only one unit wins, and its weights are changed differently from the weights of the other units. This type of learning is called unsupervised learning.
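The competitive update can be sketched in MATLAB as a Kohonen-style SOM on a 1-D grid. This illustration rests on assumptions not taken from the slides: 5 units, a Gaussian neighbourhood whose radius shrinks during training, a linearly decaying learning rate, and synthetic 2-D data. As the neighbourhood shrinks toward zero, only the winner is updated, which is plain competitive learning.

```matlab
% Minimal self-organizing map: 1-D grid of units, winner-take-all competition plus a neighbourhood.
data = [linspace(0, 1, 50); linspace(0, 1, 50).^2];   % 2 x 50 synthetic inputs
numUnits = 5;                                         % units on a 1-D grid
W = 0.5 + 0.01 * [1:numUnits; -(1:numUnits)];         % initial weights (2 x 5), near the centre
numEpochs = 20;
for epoch = 1:numEpochs
    eta = 0.3 * (1 - (epoch - 1) / numEpochs);              % learning rate decays linearly
    sigma = max(0.5, 2 * (1 - (epoch - 1) / numEpochs));    % neighbourhood radius shrinks
    for n = 1:size(data, 2)
        x = data(:, n);
        d2 = sum((W - x * ones(1, numUnits)).^2, 1);        % squared distance from x to each unit
        [~, winner] = min(d2);                               % competition: a single winning unit
        h = exp(-((1:numUnits) - winner).^2 / (2 * sigma^2)); % grid neighbourhood, h(winner) = 1
        W = W + eta * (ones(2, 1) * h) .* (x * ones(1, numUnits) - W);  % move units toward x
    end
end
```

Each update is a convex combination of the old weights and the input, so the weight vectors never leave the region spanned by the data and the initial weights.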

Proposed hybrid SOM_BPN Neural Network

Shape recognition by backpropagation in matlab

• In this section we design a neural network capable of identifying a few common shapes; for simplicity we have chosen triangle, rectangle, and circle.

• The network is an MLP (multilayer perceptron).

• First we prepare the samples (images), then train the network, and finally use it for shape recognition.

• The work is done in MATLAB using functions from the Neural Network Toolbox.

Preparation of training data and simulation

• The data consist of 4 images of each shape (12 in total): 3 of each shape (9 in total) are used for training, and the remaining one of each shape (3 in total) is used for testing.

Preparation of training data and simulation…

Preparation of training data and simulation…

• The images are binary bitmaps with a resolution of 192 x 160 (192 columns by 160 rows). One of the key steps in preparing the data is to give all images the same dimensions; for simplicity, all images here already have the same size, and the shapes are roughly the same size relative to the image resolution.

• To fill the input matrix, each image is divided into bands of 16 rows. We scan each band, and for every black pixel we add its column index to the corresponding entry of the matrix P.

• Since each image has 160 rows, this gives 10 (160/16) inputs per image.

Preparation of training data and simulation…

• Thus the matrix P has 10 rows and 9 columns (one column per training sample).

• Similarly, we prepare a target matrix T that tells the neural network, during training, whether each sample is a triangle, a rectangle, or a circle.

• T is a 3 x 9 matrix; each column holds the target vector for that sample's shape: +1 for the correct shape and -1 for the other two.

The associated code

    Num_Inputs = 10;            % 160 rows / 16 rows per band
    P = zeros(Num_Inputs, 9);   % input matrix: one column per training image
    T = zeros(3, 9);            % target matrix: one column per training image

The associated code…

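    % Training images: h = 1..3 triangles, 4..6 rectangles, 7..9 circles.
    % Assumes files t1..t3.bmp, r1..r3.bmp, c1..c3.bmp (the slides show t1, r1 and c1).
    Prefixes = {'t', 'r', 'c'};
    for h = 1:9                                % for each of the 9 training images
        Shape = ceil(h / 3);                   % 1 = triangle, 2 = rectangle, 3 = circle
        Img = imread(sprintf('%s%d.bmp', Prefixes{Shape}, h - 3*(Shape - 1)));
        Target = -ones(3, 1);
        Target(Shape) = 1;                     % e.g. [1; -1; -1] for a triangle
        T(:, h) = Target;                      % store the target column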


The associated code…

• imread returns 0 for black pixels and 1 for white pixels in these binary images.

The associated code…

        [Num_Row, Num_column] = size(Img);     % 160 x 192 for these images

        for i = 1:Num_Inputs                                                  % 10 bands
            for j = ((Num_Row/Num_Inputs)*(i-1)+1) : ((Num_Row/Num_Inputs)*i)   % 16 rows of band i
                for k = 1:Num_column                                          % 192 columns
                    if Img(j, k) == 0                                         % black pixel
                        P(i, h) = P(i, h) + k;                                % add its column index
                    end
                end
            end
        end
    end                                                                       % end of loop over images
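The triple loop above can also be written without explicit loops. This is a hypothetical vectorized equivalent that computes the same feature (the sum of the column indices of black pixels in each 16-row band); it uses a synthetic 160 x 192 white image with three black pixels instead of a real bitmap.

```matlab
% Vectorized band feature: for each band of Num_Row/Num_Inputs rows,
% sum the column indices k of all black (0) pixels.
Num_Inputs = 10;
Img = ones(160, 192);                                    % synthetic white image
Img(1, 5) = 0;  Img(17, 3) = 0;  Img(20, 4) = 0;         % three black pixels
[Num_Row, Num_column] = size(Img);
bandHeight = Num_Row / Num_Inputs;                       % 16 rows per band
colIdx = ones(Num_Row, 1) * (1:Num_column);              % column index of every pixel
rowSums = sum((Img == 0) .* colIdx, 2);                  % per-row sum of black-pixel columns
features = sum(reshape(rowSums, bandHeight, Num_Inputs), 1)';   % Num_Inputs x 1
```

Here band 1 contains the pixel in column 5 and band 2 the pixels in columns 3 and 4, so `features` starts with 5 and 7 and is zero elsewhere.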

Input matrix

Target matrix

Test values and associated code

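• The same code is run for the test images, with the features stored in a matrix S:

    S = zeros(Num_Inputs, 3);
    % Assumes test files t4.bmp, r4.bmp, c4.bmp (the slides show t4.bmp).
    Test_Files = {'t4.bmp', 'r4.bmp', 'c4.bmp'};
    for h = 1:3                                % 3 test images
        Img = imread(Test_Files{h});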

Test values and associated code…

        [Num_Row, Num_column] = size(Img);
        for i = 1:Num_Inputs
            for j = ((Num_Row/Num_Inputs)*(i-1)+1) : ((Num_Row/Num_Inputs)*i)
                for k = 1:Num_column
                    if Img(j, k) == 0
                        S(i, h) = S(i, h) + k;
                    end
                end
            end
        end
    end

Test matrix

Normalizing the values of P and S to the range [-1, 1]

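    A = [P, S];                              % normalize training and test features together
    maxi = max(A(:));
    mini = min(A(:));
    AN = 2*(A - mini)/(maxi - mini) - 1;     % map [mini, maxi] linearly onto [-1, 1]
    P = AN(:, 1:9);                          % normalized training inputs
    S = AN(:, 10:12);                        % normalized test inputs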

Result of normalization P

Result of normalization S

• The network is an MLP whose hidden layer has 20 neurons with the tansig activation function and whose output layer has 3 neurons with the purelin activation function. The code for creating and training it is as follows:

Learning …

• Num_Neuron_Hidden=20

• net = newff(P,T,Num_Neuron_Hidden,{},'traingd');

• newff: creates a feed-forward backpropagation network.

• traingd: gradient-descent backpropagation training function.

• {}: transfer functions of the layers (left empty here, so the defaults are used). The default is 'tansig' for hidden layers and 'purelin' for the output layer.

Learning …

• net = init(net);                  % reinitialize weights and biases

• net.trainParam.epochs = 500;      % maximum number of epochs

• net.trainParam.goal = 0.0001;     % performance (error) goal

• net = train(net, P, T);           % start training

Error plot

• y = sim(net,S);

• and the output:

• For the test shapes:

• First shape: (0.3700 > -0.8391 > -1.4397), so the shape is a triangle.

• Second shape: (1.5965 > 0.4745 > -0.4391), so the shape is a rectangle.

• Third shape: (0.2083 > -1.3437 > -1.4397), so the shape is a circle.
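The comparison above amounts to taking the largest of the three outputs for each test image. A minimal MATLAB sketch, assuming the rows of y = sim(net,S) are ordered (triangle, rectangle, circle) as in the target matrix T; the y values here are hypothetical, not the results reported above.

```matlab
% Decode network outputs (3 x numTest) into shape names by taking the largest output per column.
classNames = {'triangle', 'rectangle', 'circle'};
y = [ 0.9 -0.8 -0.7;               % hypothetical outputs, one column per test image
     -0.6  1.1 -0.9;
     -1.0 -0.2  0.8];
[~, cls] = max(y, [], 1);          % index of the largest output in each column
predicted = classNames(cls);       % e.g. {'triangle', 'rectangle', 'circle'}
```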

Test one



Face recognition

• The learning task here involves classifying camera images of faces of various people in various poses.

• Images of 20 different people were collected, including approximately 32 images per person, varying the person's expression (happy, sad, angry, neutral), the direction in which they were looking (left, right, straight ahead, up), and whether or not they were wearing sunglasses.

Face recognition…

• As can be seen from the example images, there is also variation in the background behind the person, the clothing worn by the person, and the position of the person's face within the image.

• In total, 624 greyscale images were collected, each with a resolution of 120 x 128.

Face recognition…

• A variety of target functions can be learned from this image data.

• For example, given an image as input we could train an ANN to output the

o identity of the person,

o the direction in which the person is facing,

o the gender of the person,

o whether or not they are wearing sunglasses, etc.

Face recognition…

• In the remainder of this section we consider one particular task:

• learning the direction in which the person is facing (to their left, right, straight ahead, or upward).

Typical input images

Face recognition…

• Learning an artificial neural network to recognize face pose. Here a 960 x 3 x 4 network is trained on grey-level images of faces to predict whether a person is looking to their left, right, ahead, or up.

• After training on 260 such images, the network achieves an accuracy of 90% over a separate test set.

• The learned network weights are shown after one weight-tuning iteration through the training examples and after 100 iterations.

Learned weights

• Network weights after 1 iteration through each training example

Learned weights…

• Network weights after 100 iterations through each training example

Face recognition…

• The leftmost block corresponds to the weight w0, which determines the unit threshold, and the three blocks to the right correspond to weights on inputs from the three hidden units.

• After training on a set of 260 images, classification accuracy over a separate test set is 90%. In contrast, the default accuracy achieved by randomly guessing one of the four possible face directions is 25%.

Design Choices

• In applying BACKPROPAGATION to any given task, a number of design choices must be made.

• For example, we could preprocess the image to extract edges, regions of uniform intensity, or other local image features, then input these features to the network.

Difficulty with design choice

• One difficulty with this design option is that it would lead to a variable number of features (e.g., edges) per image, whereas the ANN has a fixed number of input units.

• The design option chosen in this case was instead to encode the image as a fixed set of 30 x 32 pixel intensity values, with one network input per pixel.

• The pixel intensity values ranging from 0 to 255 were linearly scaled to range from 0 to 1.

Solution to the design-choice difficulty

• The 30 x 32 pixel image is, in fact, a coarse resolution summary of the original 120 x 128 captured image, with each coarse pixel intensity calculated as the mean of the corresponding high-resolution pixel intensities.

• Using this coarse-resolution image reduces the number of inputs and network weights to a much more manageable size, thereby reducing computational demands, while maintaining sufficient resolution to correctly classify the images.
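The block-mean reduction and the 0 to 1 scaling can be sketched as follows. A random 120 x 128 array stands in for a captured image (an assumption for the example); since 120/30 = 128/32 = 4, each coarse pixel averages a 4 x 4 block.

```matlab
% Coarse 30x32 summary of a 120x128 image: each coarse pixel is the mean of a 4x4 block,
% then intensities 0..255 are scaled linearly to 0..1.
img = floor(256 * rand(120, 128));                 % hypothetical grey image, values 0..255
f = 4;                                             % block size
[R, C] = size(img);
blocks = reshape(img, f, R/f, f, C/f);             % (row in block, block row, col in block, block col)
coarse = squeeze(mean(mean(blocks, 1), 3));        % 30 x 32 block means
x = coarse(:) / 255;                               % 960 network inputs in [0, 1]
```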

Differences with ALVINN

• One interesting difference is that in ALVINN, each coarse-resolution pixel intensity is obtained by selecting the intensity of a single pixel at random from the appropriate region within the high-resolution image, rather than taking the mean of all pixel intensities within this region.

• The motivation for this in ALVINN is that it significantly reduces the computation required to produce the coarse-resolution image from the available high-resolution image. This efficiency is especially important when the network must be used to process many images per second while autonomously driving the vehicle.
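For comparison, the ALVINN-style subsampling can be sketched as follows: each coarse pixel copies one randomly chosen pixel of its 4 x 4 block, so only one read per coarse pixel is needed and no averaging is done. The synthetic image and block size are the same assumptions as in a face-image setting of 120 x 128 reduced to 30 x 32.

```matlab
% ALVINN-style coarse image: one randomly chosen pixel per 4x4 block (no averaging).
img = floor(256 * rand(120, 128));                 % hypothetical grey image
f = 4;
[R, C] = size(img);
rowOff = randi(f, R/f, C/f) - 1;                   % random row offset 0..f-1 in each block
colOff = randi(f, R/f, C/f) - 1;                   % random column offset 0..f-1 in each block
[bi, bj] = ndgrid(1:R/f, 1:C/f);                   % block indices
rowIdx = (bi - 1) * f + 1 + rowOff;                % chosen pixel row for each block
colIdx = (bj - 1) * f + 1 + colOff;                % chosen pixel column for each block
coarse = img(sub2ind([R, C], rowIdx, colIdx));     % 30 x 32 coarse image
```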

Output encoding

• The ANN must output one of four values indicating the direction in which the person is looking (left, right, up, or straight)

• We could encode this four-way classification using a single output unit, assigning outputs of, say, 0.2, 0.4, 0.6, and 0.8 to encode these four possible values.

• Instead, we use four distinct output units, each representing one of the four possible face directions, with the highest-valued output taken as the network prediction.

Output encoding…

• This is often called a 1-of-n output encoding. There are two motivations for choosing the 1-of-n output encoding over the single unit option.

• First, it provides more degrees of freedom to the network for representing the target function (i.e., there are n times as many weights available in the output layer of units).

• Second, in the 1-of-n encoding the difference between the highest-valued output and the second-highest can be used as a measure of the confidence in the network prediction.
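A minimal MATLAB sketch of 1-of-n decoding for the four face directions: the prediction is the highest output, and the gap between the two highest outputs serves as the confidence measure. The unit order (left, straight, right, up) follows the weight figure; the output values are hypothetical.

```matlab
% 1-of-n decoding with a confidence margin (hypothetical output values).
directions = {'left', 'straight', 'right', 'up'};
o = [0.12 0.81 0.33 0.08];                % outputs of the 4 output units
[vals, idx] = sort(o, 'descend');         % outputs from highest to lowest
prediction = directions{idx(1)};          % highest-valued output wins
confidence = vals(1) - vals(2);           % gap to the second-highest output
```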

Target values

• One obvious choice would be to use the four target values (1,0,0,0) to encode a face looking to the left, (0,1,0,0) to encode a face looking straight, etc.

• Instead of 0 and 1 values, we use values of 0.1 and 0.9, so that (0.9, 0.1, 0.1, 0.1) is the target output vector for a face looking to the left.

• The reason for avoiding target values of 0 and 1 is that sigmoid units cannot produce these output values given finite weights.

• If we attempt to train the network to fit target values of exactly 0 and 1, gradient descent will force the weights to grow without bound.
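The following MATLAB sketch makes the argument concrete: the logit gives the net input a sigmoid unit needs to produce a given output. An output of 0.9 needs a net input of only ln 9 (about 2.2), while outputs closer to 1 need ever larger inputs, since σ(z) < 1 for every finite z.

```matlab
% Net input required for a sigmoid output y: z = log(y / (1 - y)).
sigmoid = @(z) 1 ./ (1 + exp(-z));
logit = @(y) log(y ./ (1 - y));
netFor09 = logit(0.9);                  % about 2.197 (= log 9)
netFor0999 = logit(0.999);              % about 6.907 (= log 999)
target = 0.1 * ones(1, 4);              % 0.1/0.9 targets in the order (left, straight, right, up)
target(1) = 0.9;                        % target vector for a face looking left
```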

Network structure

• Another design choice we face is how many units to include in the network and how to interconnect them.

• The most common network structure is a layered network with feed forward connections from every unit in one layer to every unit in the next.

• In the current design we chose this standard structure, using two layers of sigmoid units (one hidden layer and one output layer).

• It is not common to use more layers than this because training times become long.

Network structure…

• How many hidden units should we include?

• In the results reported before, only three hidden units were used, yielding a test set accuracy of 90%.

• In other experiments 30 hidden units were used, yielding a test set accuracy one to two percent higher.

• Although the generalization accuracy varied only a small amount between these two experiments, the second experiment required significantly more training time.
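One way to see the cost difference is to count the weights. This sketch counts all weights, including each unit's threshold weight w0, of a fully connected network with 960 inputs and 4 outputs; the formula assumes full connectivity between adjacent layers, as in the standard structure described above.

```matlab
% Weights (including thresholds) in a fully connected 960 x n_hidden x 4 network.
n_in = 960;  n_out = 4;
numWeights = @(n_hidden) n_hidden * (n_in + 1) + n_out * (n_hidden + 1);
w3 = numWeights(3);      % 3*961 + 4*4   = 2899
w30 = numWeights(30);    % 30*961 + 4*31 = 28954
```

Going from 3 to 30 hidden units multiplies the number of weights updated per example by roughly ten.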

Other learning algorithm parameters

• In these learning experiments the learning rate was set to 0.3, and the momentum was set to 0.3.

• Lower values for both parameters led to longer training times.

• If these values are set too high, training fails to converge to a network with acceptable error over the training set.
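The slides do not give the update rule; a common form of gradient descent with momentum, assumed here, is $\Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1)$. This MATLAB sketch applies it to a single weight with a constant, hypothetical error term, using η = 0.3 and α = 0.3 from the text.

```matlab
% Weight update with momentum: dw(n) = eta*delta*x + alpha*dw(n-1).
eta = 0.3;  alpha = 0.3;          % learning rate and momentum from the experiments
delta = 0.5;  x = 1;              % hypothetical constant error term and input
w = 0;  dw = 0;
for n = 1:3
    dw = eta * delta * x + alpha * dw;    % part of the previous step is carried over
    w = w + dw;
end
```

With a constant gradient the step grows toward ηδx/(1 - α) = 0.15/0.7 (about 0.214), which is how momentum speeds up progress along a consistent descent direction.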

Learned Hidden Representations

• Consider first the four rectangular blocks just below the face images in the figure. Each of these rectangles depicts the weights for one of the four output units in the network (encoding left, straight, right, and up).

• The four squares within each rectangle indicate the four weights associated with this output unit: the weight w0, which determines the unit threshold (on the left), followed by the three weights connecting the three hidden units to this output.

Applying back propagation to shape recognition. Presented by Amir Mahdi Azadi, Farzane Salami, Osame Ghavidel, Ahmad Lahuti


Learned Hidden Representations…

• The brightness of each square indicates the weight value, with bright white indicating a large positive weight, dark black indicating a large negative weight, and intermediate shades of grey indicating intermediate weight values.

• For example, the output unit labeled "up" has a near zero w0 threshold weight, a large positive weight from the first hidden unit, and a large negative weight from the second hidden unit.

• For example, the multilayer perceptron (MLP) artificial neural network has been regarded as a general and powerful method for approximating any desired linear or nonlinear system without any prior assumptions. However, the MLP model still faces particular obstacles when dealing with systems whose input data space is high-dimensional. Principal component analysis (PCA) has been regarded as the most important feature-extraction method, and it can remove redundant variables from the original input data.

• The PCA method first finds the directions in the n-dimensional input space, represented by the input variables (X_i, i = 1, …, n), that carry the most variability (information).

• Then the eigenvalues ($\lambda_i$) and the corresponding normalized eigenvectors ($u_i$) are computed from the covariance matrix, and the inputs are projected onto a smaller subspace to reduce the dimensionality. According to classical statistical theory, finding the ith principal component of X is equivalent to finding the ith eigenvector of the covariance matrix.
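The PCA steps above can be sketched in MATLAB as follows. The synthetic data, in which the second variable is an exact multiple of the first and is therefore redundant, and the choice of keeping two components are assumptions for the example.

```matlab
% PCA via eigen-decomposition of the covariance matrix, then projection onto leading eigenvectors.
N = 200;
z = linspace(-1, 1, N)';
X = [z, 2*z, sin(3*z)];                    % N x n data; column 2 is redundant with column 1
Xc = X - ones(N, 1) * mean(X, 1);          % centre each variable
C = (Xc' * Xc) / (N - 1);                  % covariance matrix (same as cov(X))
[V, D] = eig(C);                           % columns of V: normalized eigenvectors
[lambda, idx] = sort(diag(D), 'descend');  % eigenvalues from largest to smallest
V = V(:, idx);                             % i-th column = i-th principal direction
k = 2;                                     % keep the two leading components
Y = Xc * V(:, 1:k);                        % reduced data (N x k)
```

Because the redundant variable adds no new direction, the third eigenvalue is numerically zero, and dropping its component loses no information.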
