Introduction Neural Networks Neuron Models
ECE656 - Machine Learning and Adaptive Systems, Lectures 1 & 2
M.R. Azimi, Professor
Department of Electrical and Computer Engineering, Colorado State University
Fall 2015
M.R. Azimi Machine Learning and Adaptive Systems
Why Machine Learning?
The volume and dimension of data that need to be stored, transmitted, processed, analyzed, or screened are increasing exponentially in almost every aspect of our lives. Automated analysis and decision-making from a large volume of data require the development and application of many different types of machine learning systems. In broader terms, machine learning involves a set of tools that can be used to automatically: (a) detect or retrieve patterns in the data (e.g., anomaly detection, text/image retrieval and ranking); (b) make decisions about class membership of the data (e.g., speaker recognition, face recognition); (c) reduce the dimensionality of the data (e.g., principal component analysis (PCA), manifold learning); (d) map features or make inferences (e.g., clustering, parameter estimation) about the data; etc.
The interest in this area has grown tremendously over the past few years owing to the wide range of its applications in medical, environmental, biometrics, surveillance, remote sensing, and military arenas. Among these are:
Applications:
1 Medical: Automatic detection, classification, and diagnosis of tumors from medical images (e.g., X-ray, MRI, CT-scan, and ultrasound), chromosome identification, etc.
2 Computer Vision: Identification of parts in assembly lines, robotic vision, unmanned autonomous systems (e.g., UAVs), etc.
3 Remote Sensing: Cloud classification and height estimation, weather prediction and forecasting, data assimilation, etc.
4 Military: Detection and recognition of various targets in radar, sonar, acoustic, and IR data.
5 Data Reduction and Feature Extraction: Transforming large-dimensional data to lower-dimensional subspaces for better representation and recognition; manifold learning, etc.
6 Information Retrieval: Document (text and image) retrieval and reproduction.
7 Identification & Security Systems: Facial, iris, and fingerprint-based ID systems, airport security systems, etc.
Definition and Terminology
Machine Learning Definition (Tom Mitchell, 1998):
An algorithm learns from experience E with respect to some task T and performance measure ρ if its performance on T, as measured by ρ, improves with experience E.
A well-posed learning problem is the triple ⟨ρ, T, E⟩.
Examples:
1 Task T: Classification of benign and malignant tumors from medical images (a hypothesis test that maps image data to binary decisions, i.e., H0: benign tumor, H1: malignant tumor); Performance Metric ρ: probability of error; Experience E: examples of benign and malignant tumors from medical images. In this case, learning involves designing a classifier that performs the binary hypothesis test with minimum probability of error (i.e., false negatives and false positives) when applied to medical images.
2 Task T: Extract inherent low-dimensional attributes from image data (i.e., map, either linearly or nonlinearly, the image data to a reduced-dimensional subspace); Performance Metric ρ: reconstruction error; Experience E: an ensemble of images with the same (e.g., 1st- and 2nd-order) statistical properties. In this case, learning involves designing a mapping system that converts large-dimensional data into reduced-dimensional attributes such that the reconstruction error is minimized (e.g., in the MSE sense).
Terminology:
Here, we use the above medical example as an illustration.
Examples: Items or instances of data used for learning or evaluation, e.g., tumor images.
Features: A set of attributes associated with an example, often of reduced dimension, e.g., shape, texture, and color attributes of the tumors.
Labels: Values or categories assigned to examples, e.g., benign versus malignant tumors.
Training Samples: Examples used for training, e.g., examples of benign and malignant tumors.
Validation Samples: Examples used to tune the parameters of a learning algorithm for optimal performance.
Test Samples: Novel examples used to test the performance of a learning algorithm, e.g., actual medical images of tumors used for diagnosis.
Performance Metric: A function that measures performance with respect to task T, e.g., the mean squared error (MSE) between the actual label and the predicted label (during the training phase).
Machine Learning and Other Areas
Figure: Machine learning sits at the intersection of several areas: statistics, adaptive systems, artificial neural networks (ANN), computer science, cognitive science and neuroscience, evolution, and economics & organizational behavior.
In this class, we use tools from Artificial Neural Networks (ANN), Statistics, and Adaptive Systems.
Neural Networks
Artificial neural networks, or ANNs, have been studied for many years in the hope of achieving human-like performance in the fields of speech and pattern recognition. Although the original research dates back to the 1950s, the 1990s and 2000s saw an extraordinary growth in neural network models and their computational properties. Some contributing factors are: advances in VLSI and analog devices; neurobiologists' better understanding of the manner in which information is processed in nature; and the development of mathematical models and efficient adaptation or machine learning algorithms.
ANNs are:
1 Inspired by biological neural networks (BNNs).
2 Alternatives to digital computing.
3 Alternatives to AI and expert systems.
ANNs have attracted the attention of scientists from a number of disciplines. Neuroscientists are interested in modeling BNNs; physicists envisage analogies between neural network models and nonlinear dynamical systems; mathematicians use them as tools for solving complex optimization and large-scale problems; electrical engineers use them for signal/image processing and control system applications; and psychologists look at them as prototype structures of human-like information processing systems.
Features & Characteristics of ANNs:
1 Adaptive or Trainable: Adaptable in the face of changing environments. No programming is required, i.e., the system is trained directly from data (model-free estimation).
2 Massively Parallel: High speed performance in decision-making.
3 Fault Tolerant: Damage to few neurons or links does not impairfunction.
4 Generalization: Ability to extend decision-making to novel data (not seen by the ANN during training).
5 Abstraction: Extract features from pieces of a pattern to recognize or reconstruct the complete pattern.
6 Nonlinearity: Offer the nonlinearity important for function approximation used in certain signal processing and control applications.
Biological Neural System (BNN):
The BNN has an elaborate structure with very complex interconnections. Input to the BNN is provided by sensory receptors, which deliver stimuli both from within the body and from sensory organs when the stimuli originate in the external world. The stimuli are in the form of electrical impulses that convey information into the network of neurons. As a result of information processing in the central nervous system, the effectors are controlled and give human responses in the form of diverse actions. Thus, the system has three stages: receptors, neural network, and effectors. The motor organs are monitored by the central nervous system through feedback links that verify their action.
Figure: Information flow in BNN: external stimuli enter through the sensory organs and receptors, the central nervous system processes the information, and the effectors drive the motor organs to produce responses, with internal and external feedback paths closing the loop around the body.
The fundamental building blocks of BNNs are the elementary neurons, or nerve cells. The structure of each neuron consists of:
1 The cell body, or soma, a large round central body anywhere from 5 to 100 microns in diameter.
2 The axon, which is attached to the soma and is electrically active, producing the pulses emitted by the neuron.
3 The dendrites, which are electrically passive and receive inputs from other neurons by means of a specialized contact, the synapse, which occurs where the dendrites of two different neurons meet.
Figure: Biological Neurons.
Signals reaching a synapse and received by the dendrites are electrical impulses. Communication between neurons occurs as a result of the release, by the presynaptic cell, of chemical substances called neurotransmitters. Thus, the terminal boutons generate the chemical that affects the receiving neuron, which in turn either generates an impulse to its axon or produces no response.
A neuron responds to the total of its aggregated inputs within a short time interval called the period of latent summation. The neuron's response is generated if the total potential of its membrane reaches a certain level. In this case, the neuron fires and sends a pulse response down its axon.
Incoming impulses are excitatory if they promote firing, or inhibitory if they hinder the firing of the response. For firing, the excitation should exceed the inhibition by the neuron's threshold (≈ 40 mV). Since synaptic connections cause the excitatory and inhibitory reactions of the receiving neuron, it is practical to assign + or − weight values to such connections, respectively. Usually, a certain number of incoming impulses, generated by neighboring neurons and by the neuron itself, are required to make a neuron fire. Impulses that are closely spaced in time and arrive synchronously are more likely to cause firing.
After carrying a pulse, an axon fiber is in a state of complete non-excitability for a certain time called the refractory period. This can be used to determine the state of the neuron at time t + 1 based on its state at time t.
The human cerebral cortex comprises approximately 100 billion neurons, each having roughly 1000 dendrites that form some 100,000 billion synapses. Given that the system operates at about 100 Hz, it functions at some billion interconnections per second. It weighs approximately 3 lb, covers about 0.15 m², and is about 2 mm thick.
Artificial Neural Networks (ANN):
In an ANN, neurons (also called nodes or cells) are analogous to their biological counterparts: neurons become the cells, axons and dendrites become connections or links, and synapses become variable weights. The weighted inputs are summed and compared with a threshold associated with the cell. The cell fires if the weighted sum of the inputs (excitatory and inhibitory) is higher than the threshold.
The nodes can interact in many ways by virtue of the manner in whichthey are interconnected.
Figure: A binary neuron model (McCulloch-Pitts): inputs 1, 2, ..., n are weighted by w_i1, w_i2, ..., w_in and summed into net_i(t) at neuron i, which is compared with the threshold b_i.
McCulloch-Pitts Neuron Model
The McCulloch-Pitts (1943) model for a binary node is

$$o_i(t+1) = \begin{cases} 1 & \text{if } \sum_{j=1}^{N} w_{ij} x_j(t) \ge b_i \quad \text{(firing)} \\ 0 & \text{if } \sum_{j=1}^{N} w_{ij} x_j(t) < b_i \quad \text{(not firing)} \end{cases}$$

or, absorbing the threshold into the weights,

$$o_i(t+1) = \begin{cases} 1 & \text{if } \sum_{j=1}^{N+1} w_{ij} x_j(t) \ge 0, \quad w_{i,N+1} = b_i, \; x_{N+1} = -1 \\ 0 & \text{otherwise} \end{cases}$$

where
t: discrete time.
o_i(t+1): state or output of node i at time t + 1.
x_j(t): input j to node i.
w_{ij}: weight connecting input j to node i (+ = excitatory, − = inhibitory, 0 = no synapse).
b_i: threshold (or bias) for node i.
This simple model can perform many digital Boolean operations (NAND, NOR, memory cell, etc.).
For example, NOR and NAND gates built from this model are shown in Figure 4.
Figure: Simple gates constructed using binary neurons. Left: a NOR gate built as an OR neuron (excitatory weights +1 from x1, x2, x3, threshold b1 = 1) followed by a NOT neuron (inhibitory weight −1, threshold b2 = 0). Right: a NAND gate built from three NOT neurons (inhibitory weights −1 from x1, x2, x3, thresholds b1 = b2 = b3 = 0) feeding an OR neuron (excitatory weights +1, threshold b4 = 1).
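As a quick illustration (not part of the lecture), the thresholding rule above can be sketched in a few lines of Python. The weight and threshold choices below are one possible single-neuron realization of three-input NOR and NAND under this model, not the multi-stage wiring of the figure.

```python
def mcp_neuron(x, w, b):
    """McCulloch-Pitts binary neuron: fires (1) iff sum_j w_j * x_j >= b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= b else 0

def nor3(x):
    # All-inhibitory weights, threshold 0: fires only when every input is 0.
    return mcp_neuron(x, w=(-1, -1, -1), b=0)

def nand3(x):
    # Fires unless all three inputs are 1 (sum of -x_j >= -2 iff at most two are 1).
    return mcp_neuron(x, w=(-1, -1, -1), b=-2)
```

NOR fires only when every input is 0, while NAND fails to fire only when all three inputs are 1.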
A memory cell is shown in Figure 5.
Figure: Memory cell constructed using a binary neuron: an excitatory input (weight +1) and an inhibitory input (weight −1), each 1 or 0, a threshold b = 1, and a unit-weight feedback from the output, so that o(t+1) = x(t).
For the memory cell, the excitatory and inhibitory inputs initialize the firing and non-firing states. The output in the absence of inputs is then sustained indefinitely, since an output of 0 fed back to the input does not cause firing while an output of 1 does.
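A minimal sketch of one update step of the memory cell, assuming the weights and threshold read off the figure (set input +1, reset input −1, feedback +1, threshold b = 1):

```python
def memory_cell(set_in, reset_in, state):
    """One step of the binary memory cell: the net input sums the excitatory
    set input (weight +1), the inhibitory reset input (weight -1), and the
    fed-back output (weight +1), compared against the threshold b = 1."""
    net = set_in - reset_in + state
    return 1 if net >= 1 else 0
```

With both inputs at 0, the state is held indefinitely; a set pulse drives it to 1 and a reset pulse drives it back to 0.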
Real neurons involve many complications ignored by this simple McCulloch-Pitts model. Its drawbacks are:
It allows binary states only.
It operates under discrete-time assumptions.
It assumes synchrony of operation of all neurons in a network.
Weights and thresholds are fixed and no interaction among neurons takes place.
A simple generalization of the McCulloch-Pitts model that circumvents some of the above is Rosenblatt's perceptron model.
Rosenblatt's Perceptron Model (1958) is a nonlinear neuron model given by

$$o_i = f\left(\sum_{j=1}^{N} w_{ij} x_j - b_i\right) = f(\mathrm{net}_i)$$

where
o_i: continuous-valued output (state or activation),
f(·): activation function (or squashing function), and
$\mathrm{net}_i = \sum_{j=1}^{N} w_{ij} x_j - b_i$: net input (induced local field) to node i.

Alternatively, in vector form,

$$\mathrm{net}_i = \sum_{j=1}^{N} w_{ij} x_j - b_i = \mathbf{w}_i^t \mathbf{x} - b_i = \mathbf{x}^t \mathbf{w}_i - b_i$$

where w_i = [w_{i1}, w_{i2}, ..., w_{iN}]^t is the weight vector and x = [x_1, ..., x_N]^t is the input vector.

Typical activation functions f(·) are:

1 Sigmoidal:
Unipolar: $f(\mathrm{net}) = \frac{1}{1 + e^{-\lambda\,\mathrm{net}}}$
Bipolar: $f(\mathrm{net}) = \frac{2}{1 + e^{-\lambda\,\mathrm{net}}} - 1$
Note that for the unipolar sigmoid, $f'(\mathrm{net}) = \lambda\,(1 - f(\mathrm{net}))\,f(\mathrm{net})$.
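A minimal sketch of the perceptron forward pass with a unipolar sigmoid (the function name is ours):

```python
import numpy as np

def perceptron(x, w, b, lam=1.0):
    """o_i = f(net_i) with net_i = w^t x - b_i, using the unipolar sigmoid
    f(net) = 1 / (1 + exp(-lam * net))."""
    net = np.dot(w, x) - b
    return 1.0 / (1.0 + np.exp(-lam * net))
```

At net = 0 the output is 0.5, and the derivative identity f'(net) = λ(1 − f)f can be confirmed with a finite-difference check.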
Figure: Unipolar and bipolar sigmoidal activation functions (the unipolar sigmoid rises from 0 to 1 with f(0) = 0.5; the bipolar sigmoid rises from −1 to 1 through f(0) = 0).
2 Threshold logic:

$$f(\mathrm{net}) = \begin{cases} 0 & \mathrm{net} \le 0 \\ \mathrm{net}/a & 0 < \mathrm{net} < a \\ 1 & \mathrm{net} \ge a \end{cases}$$

3 Hard-limiter:

$$f(\mathrm{net}) = \begin{cases} 1 & \mathrm{net} > 0 \\ 0 & \mathrm{net} < 0 \end{cases}$$

The bipolar version is called the signum function,

$$f(\mathrm{net}) = \begin{cases} 1 & \mathrm{net} > 0 \\ 0 & \mathrm{net} = 0 \\ -1 & \mathrm{net} < 0 \end{cases}$$

Note: as λ → ∞, the sigmoid approaches the hard-limiter.
Figure: Threshold logic and hard-limiter activations (the threshold logic ramps linearly from 0 to 1 over 0 < net < a; the hard-limiter jumps from 0 to 1 at net = 0).
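The activation functions above can be sketched as follows (an illustration, with function names of our choosing):

```python
import numpy as np

def unipolar_sigmoid(net, lam=1.0):
    # f(net) = 1 / (1 + exp(-lam * net)), range (0, 1)
    return 1.0 / (1.0 + np.exp(-lam * net))

def bipolar_sigmoid(net, lam=1.0):
    # f(net) = 2 / (1 + exp(-lam * net)) - 1, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

def threshold_logic(net, a=1.0):
    # Linear ramp from 0 to 1 over 0 < net < a, clipped outside that interval.
    return np.clip(net / a, 0.0, 1.0)

def hard_limiter(net):
    # 1 for net > 0, else 0 (the value at net = 0 is a convention).
    return np.where(net > 0, 1.0, 0.0)

def signum(net):
    # Bipolar hard-limiter: +1, 0, or -1.
    return np.sign(net)
```

Raising λ makes the sigmoid steeper, illustrating the λ → ∞ limit: with λ = 1000, unipolar_sigmoid(0.1) is already indistinguishable from the hard-limiter output of 1.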
Stochastic Version of McCulloch-Pitts
The decision for the neuron to fire is probabilistic. Let x be the state of the neuron and P(v) the probability of firing, where v is the induced local field of the neuron; then

$$x = \begin{cases} +1 & \text{with prob. } P(v) \\ -1 & \text{with prob. } 1 - P(v) \end{cases}$$

where $P(v) = \frac{1}{1 + e^{-v/T}}$ and T is the pseudo-temperature (it controls the synaptic noise level, i.e., the uncertainty in firing). As T → 0, this reduces to the deterministic McCulloch-Pitts model.
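A sketch of sampling from this stochastic neuron (the sampling helper is ours, not from the lecture):

```python
import math
import random

def stochastic_neuron(v, T, rng=random):
    """Fire (+1) with probability P(v) = 1 / (1 + exp(-v / T)); otherwise
    return -1. T is the pseudo-temperature: small T makes the neuron nearly
    deterministic (fires iff v > 0), large T makes firing nearly random."""
    p = 1.0 / (1.0 + math.exp(-v / T))
    return 1 if rng.random() < p else -1
```

With small T, the sign of v decides the state almost surely; with larger T, the firing frequency over many trials approaches P(v).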
Example: Configure a linear neuron as a nonrecursive (Finite Impulse Response, or FIR) filter for signal filtering.
An Nth-order FIR filter is represented by the following input/output equation (convolution sum),

$$y(n) = \sum_{k=0}^{N} b_k\, x(n-k)$$

where x(n) is the input, y(n) is the output, and the b_k are the filter coefficients (taps, or equivalently the impulse response). Alternatively, we can write

$$y(n) = \mathbf{w}^t \mathbf{x}$$

where x = [x(n), x(n−1), ..., x(n−N)]^t and w = [b_0, b_1, ..., b_N]^t. Thus, the following modifications to a linear neuron model implement an FIR filtering operation.
Figure: A linear neuron as an FIR filter: a chain of unit delays z^{-1} forms x(n), x(n-1), ..., x(n-N), which are weighted by the taps b_0, b_1, ..., b_N and summed by the linear neuron to give y(n).
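A sketch of the FIR-filter-as-linear-neuron (helper names are ours); the delay line with zero initial conditions is emulated by zero-padding the input:

```python
import numpy as np

def fir_neuron(x_window, w):
    """Linear neuron: y = w^t x, with x = [x(n), x(n-1), ..., x(n-N)]^t
    and w = [b_0, b_1, ..., b_N]^t (the filter taps)."""
    return np.dot(w, x_window)

def fir_filter(x, w):
    """Slide the neuron along the signal, assuming zero initial conditions."""
    N = len(w) - 1
    xp = np.concatenate([np.zeros(N), np.asarray(x, dtype=float)])  # delay line
    return np.array([fir_neuron(xp[n:n + N + 1][::-1], w)
                     for n in range(len(x))])
```

For an impulse input, the output reproduces the taps, and the result matches the truncated convolution np.convolve(x, w)[:len(x)].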