Introduction to Learning
ECE457 Applied Artificial IntelligenceFall 2007
Lecture #11
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 2
Outline
Overview of learning
Supervised learning (Russell & Norvig, sections 18.1-18.3, 19.1)
Unsupervised learning (Russell & Norvig, section 20.3)
Reinforcement learning (Russell & Norvig, sections 21.1, 21.3, 21.5)
See also CS 498 & CS 698 (Prof. Ben-David)
Limit of Predefined Knowledge
Many of the algorithms and techniques we saw relied on predefined information: probability distributions, heuristics, utility functions
This only works if the information is easily available
For real-world applications, it is often preferable to have the agent learn the information automatically
Overview of Learning
Learn what? Facts about the world (KB), a decision-making strategy, probabilities, costs, functions, states, …
Learn from what? Training data (often freely available for common problems) or the real world
Learn how? Need some form of feedback to the agent
Learning and Feedback
Supervised learning: training data includes the correct output; learn the relationship between data and output; evaluate with statistics
Unsupervised learning: training data has no correct output; learn patterns in the data; evaluate with the fitness of the pattern
Reinforcement learning: a set of actions with rewards and punishments; learn to maximize reward and minimize punishment; evaluate with the value of the reward
Supervised Learning
Given a training corpus of data-output pairs: x & y values, email & spam/not spam, variable values & decision
Learn the relationship mapping the data to the output: f(x), spam features, decision rules
Supervised Learning Example
2D state space with binary classification
Learn a function to separate both classes
[Figure: scatter plot of + and - examples in the 2D state space]
Decision Tree
[Figure: the scatter plot partitioned by the lines X = 2, X = 5, Y = 4, Y = 8]
X > 5?  Yes: -
        No:  X < 2?  Yes: -
                     No:  Y > 8?  Yes: -
                                  No:  Y < 4?  Yes: -
                                               No:  +
Decision Tree
[Figure: the same scatter plot and partition, with multiway branching]
X?  < 2:   -
    > 5:   -
    Other: Y?  > 8:   -
               < 4:   -
               Other: +
Decision Tree
Multiple variables and values to decide whether an email is spam

Email  UW  Pics  Words  Attach  Multi  Spam?
E1     N   Y     58     Y       Y      Yes
E2     N   N     132    N       Y      No
E3     N   Y     1049   Y       Y      Yes
E4     N   N     18     Y       Y      Yes
E5     Y   N     26     Y       Y      No
E6     N   Y     32     N       Y      Yes
E7     Y   N     44     N       Y      No
E8     N   Y     256    N       Y      Yes
E9     Y   Y     2789   Y       N      No
E10    N   Y     857    Y       N      No
E11    N   N     732    N       N      No
E12    N   Y     541    N       Y      Yes
Decision Tree
Build the decision tree:
UW?  Yes: Not Spam
     No:  Pictures?  Yes: Multi?   Yes: Spam
                                   No:  Not Spam
                     No:  Attach?  Yes: Spam
                                   No:  Not Spam
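As a sketch, the spam tree above can be written as nested tests; the dict keys used to encode an email here are hypothetical, not from the lecture:

```python
def classify_email(email):
    """Follow the decision tree: UW, then Pictures, then Multi or Attach."""
    if email["uw"] == "Y":          # UW sender: never spam in the corpus
        return "Not Spam"
    if email["pics"] == "Y":        # has pictures: spam only if multipart
        return "Spam" if email["multi"] == "Y" else "Not Spam"
    # no pictures: spam only if it carries an attachment
    return "Spam" if email["attach"] == "Y" else "Not Spam"

# E5 from the training table: UW=Y, so the tree answers "Not Spam"
print(classify_email({"uw": "Y", "pics": "N", "attach": "Y", "multi": "Y"}))
```

Checking the tree against each row of the table (e.g. E1 → Spam, E10 → Not Spam) confirms it reproduces the training labels.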
Supervised Learning Algorithm
Many possible learning techniques, depending on the problem and the data
Start with an inaccurate initial hypothesis
Refine it to reduce error or increase accuracy
End with a trade-off between accuracy and simplicity
Trade-Off
Why trade-off between accuracy and simplicity?
Noisy data: special cases, measurement errors
[Figure: the scatter plot with a few noisy + and - points inside the opposite class's region]
Supervised Learning Algorithm
Learning a decision tree follows the same general algorithm:
Start with all emails at the root
Pick the attribute that will teach us the most: highest information gain, i.e. largest difference in the probability of each class
Branch using that attribute
Repeat until the trade-off between leaf accuracy and depth limit / attribute relevance is reached
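A minimal sketch of the information-gain computation used to pick the branching attribute (the row/label encoding is an assumption for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attr):
    """Expected entropy reduction from branching on attribute attr."""
    gain = entropy(labels)
    n = len(rows)
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        gain -= len(sub) / n * entropy(sub)   # weighted child entropy
    return gain
```

A perfectly predictive attribute recovers the full entropy of the labels; an irrelevant one yields a gain near zero.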
Supervised Learning Evaluation
Statistical measures of the agent's performance:
RMS error between f(x) and y
Making the correct decision with as few decision rules as possible / the shallowest tree possible
Accuracy of a classification
Precision and recall of a classification
Precision and Recall
Binary classification: distinguish + (our target) from – (everything else)
The classifier makes mistakes: it classifies some + as – and some – as +
Define four categories:

                 Actual +           Actual –
Classified +     True Positives     False Positives
Classified –     False Negatives    True Negatives
Precision and Recall
Precision: proportion of selected items the classifier got right
Precision = TP / (TP + FP)
Recall: proportion of target items the classifier selected
Recall = TP / (TP + FN)
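The two formulas as a one-liner, with made-up counts for illustration:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN). TN is ignored."""
    return tp / (tp + fp), tp / (tp + fn)

# e.g. 8 true positives, 2 false positives, 4 false negatives
p, r = precision_recall(8, 2, 4)   # p = 0.8, r = 8/12
```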
Precision and Recall
Why ignore True Negatives?
Typically, there are far more negatives than positives
Internet searches: + are the target websites, – are all other websites
Counting TN would skew the statistics and favour a system that classifies everything as negative
Overfitting
A common problem with supervised learning is over-specializing the learned relation to the training data
Learning from irrelevant features of the data: email features such as paragraph indentation, number of typos, the letter "x" in the sender address, …
Works well on training data, because of poor sampling or random chance
Fails in real-world tests
Testing Data
Evaluate the learned relation using unseen test data, i.e. data that was not used in training, so the system is not overfitted to it
Split the training data beforehand and keep part of it aside for testing
Only works once! If you reuse testing data, you are overfitting your system to that test! Never do that!
Cross-Validation
Shortcomings of holding out test data: the test only works once, and training on less data gives a less accurate result
n-fold cross-validation:
Split the training corpus into n parts
Train with n-1 parts, test with the remaining one
Run n tests, each time using a different test part
Do a final training run with all the data and the best features
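The n-fold splitting step can be sketched as follows (the round-robin split is one simple choice, not the only one):

```python
def n_fold_splits(data, n):
    """Yield (train, test) pairs: each fold serves as test exactly once."""
    folds = [data[i::n] for i in range(n)]      # round-robin split into n parts
    for i in range(n):
        train = [x for j in range(n) if j != i for x in folds[j]]
        yield train, folds[i]

for train, test in n_fold_splits(list(range(10)), 5):
    pass  # train a model on `train`, evaluate it on `test`
```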
Naïve Bayes Classifier
P(Cx|F1,…,Fn) ∝ P(Cx) Πi P(Fi|Cx)
Classify an item in the class Cx with maximum probability
Weighted Naïve Bayes Classifier (paper on website):
Give each feature Fi a weight wi and learn the proper weight values
P(Cx|F1,…,Fn) ∝ P(Cx) Πi P(Fi|Cx)^wi
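A sketch of the weighted classifier, scoring in log space to avoid underflow; the toy probability tables below are made-up numbers, not from the paper:

```python
import math

def nb_classify(features, priors, cond, weights):
    """Pick the class Cx maximizing log P(Cx) + sum_i wi * log P(Fi|Cx)."""
    def score(c):
        return math.log(priors[c]) + sum(
            w * math.log(cond[c][i][f])
            for i, (f, w) in enumerate(zip(features, weights)))
    return max(priors, key=score)

# toy model: one feature whose value "Y" strongly indicates spam
priors = {"spam": 0.5, "ham": 0.5}
cond = {"spam": [{"Y": 0.9, "N": 0.1}], "ham": [{"Y": 0.2, "N": 0.8}]}
print(nb_classify(("Y",), priors, cond, weights=[1.0]))
```

Setting every wi = 1 recovers the plain naïve Bayes classifier.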
Learning the Weights
Start with initial weight values
At each iteration, for each feature:
Measure the impact of that feature on the accuracy of the classification
Modify the weight to increase the accuracy of the classification
End when the iteration limit is reached or the accuracy increase is less than a threshold
Learning the Weights
Define the following:
Initial weight values: wi(0) = 1
Learning rate: λ
Measure of the accuracy of the classification using feature Fi at iteration n: Ain
Function to convert Ain into a weight variation: Δ(Ain)
Δ(Ain) = (1 + e^-Ain)^-1 · [1 - (1 + e^-Ain)^-1]²
Threshold improvement in accuracy: є
Iteration limit: nmax
Learning the Weights
Start with wi(0)
At iteration n, for feature Fi:
Measure Ain
Compute Δwi(n) = λ·Δ(Ain)
Update wi(n) = wi(n-1) + Δwi(n)
End when n = nmax (entire algorithm) or Ain < є (feature Fi)
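A sketch of this update loop under stated assumptions: the learning-rate symbol λ and the `accuracy(i, w)` callback (which must return Ain for feature Fi given the current weights) are placeholders, since the slide does not specify how the accuracy is measured:

```python
import math

def sigma(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def weight_delta(a):
    """The slide's variation function: Delta(A) = sigma(A) * (1 - sigma(A))^2."""
    return sigma(a) * (1.0 - sigma(a)) ** 2

def learn_weights(accuracy, n_features, rate=0.1, eps=1e-3, n_max=100):
    """Iterate wi(n) = wi(n-1) + rate * Delta(Ain) until convergence."""
    w = [1.0] * n_features                       # wi(0) = 1
    for n in range(1, n_max + 1):                # stop at n = nmax
        active = False
        for i in range(n_features):
            a = accuracy(i, w)
            if a >= eps:                         # stop feature Fi when Ain < eps
                w[i] += rate * weight_delta(a)
                active = True
        if not active:
            break
    return w
```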
Unsupervised Learning
Given a training corpus of data points:
Observed values of random variables in a Bayesian network
Series of data points
Orbits of planets
Learn the underlying pattern in the data:
Existence and conditional probability of hidden variables
Number of classes and classification rules
Kepler's laws of planetary motion
Unsupervised Learning Example
2D state space with unclassified observations
Learn the number and form of the clusters
This is the problem of unsupervised clustering
Many algorithms have been proposed for it, and more research is still being done for better algorithms, different kinds of data, …
[Figure: scatter plot of unclassified * observations forming several clusters]
Unsupervised Learning Algorithm
Define a similarity measure to compare pairs of elements
Starting with no clusters:
Pick a seed element
Group similar elements together until a threshold is reached
Pick a new seed from the free elements and start again
Unsupervised Learning Algorithm
Starting with one all-encompassing cluster:
Find the cluster with the highest internal dissimilarity
Find the most dissimilar pair of elements inside that cluster
Split it into two clusters
Repeat until all clusters have internal homogeneity
Merge homogeneous clusters
Unsupervised Learning Evaluation
Need to evaluate the fitness of the learned relationship:
Number of clusters vs. their internal properties
Difference between clusters vs. internal homogeneity
Number of parameters vs. number of hidden variables in a Bayesian network
There is no way of knowing the optimal solution
K-Means
Popular unsupervised clustering algorithm
Data represented as a cloud of points in the state space
Target: group the points into k clusters, minimizing intra-cluster variance
K-Means
Start with k random cluster centers
For each iteration:
For each data point, associate the point with the nearest cluster center and add its distance to the variance
Move each cluster center to the center of mass of its associated data-point cloud
End when the variance is less than a threshold or the cluster centers stabilize
K-Means
We have:
Data points: x1, …, xi, …, xn
Clusters: C1, …, Cj, …, Ck
Cluster centers: μ1, …, μj, …, μk
Minimize the intra-cluster variance V = Σj Σ(xi∈Cj) |xi - μj|²
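A minimal k-means sketch over points given as tuples, alternating the two steps from the previous slide (the random seeding and fixed iteration cap are simplifying assumptions):

```python
import random

def kmeans(points, k, n_iter=100):
    """Minimize V = sum_j sum_{xi in Cj} |xi - mu_j|^2 by alternating
    nearest-center assignment and center-of-mass updates."""
    centers = random.sample(points, k)           # k random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assign to nearest center
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
               for j, cl in enumerate(clusters)] # centers of mass
        if new == centers:                       # centers stabilized
            break
        centers = new
    return centers, clusters
```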
K-Means Example
[Figure: four snapshots of the * data points as the o cluster centers move toward the clusters' centers of mass]
Reinforcement Learning
Given a set of possible actions, the resulting state of the environment, and rewards or punishments for each state:
Taxi driver: tips, car repair costs, tickets
Checkers: advantage in number of pieces
Learn to maximize the rewards and/or minimize the punishments:
Maximize tips, minimize damage to the car and police tickets: drive properly
Protect your own pieces, take enemy pieces: good play strategy
Reinforcement Learning
Learning by trial and error: try something, see the result
Speeding results in tickets, running a red light results in car damage, a quick and safe drive results in tips
Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer; sacrifice some pieces to take a greater number of enemy pieces
Sacrifice known rewarding actions to explore new, potentially more rewarding actions
Develop strategies that maximize rewards while minimizing penalties over the long term
Q-Learning
Each state has a reward or punishment and a list of possible actions, which lead to other states
Learn the value of state-action pairs: the Q-value
Q-Learning
Update the value of the previous (t-1) state-action pair based on the current (t) state-action value:
ΔQ(st-1,at-1) = α[Rt-1 + γ·maxa(Q(st,a)) - Q(st-1,at-1)]
Q(s,a): estimated value of state-action pair (s,a)
Rt: reward of state st
α: learning rate
γ: discount factor of future rewards
γ = 0: future rewards are irrelevant; γ = 1: future rewards count the same as current rewards
Update: Q(st-1,at-1) = Q(st-1,at-1) + ΔQ(st-1,at-1)
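One step of this update as code; the Q-table is a dict keyed by (state, action) pairs, defaulting to 0 for unseen pairs (that default is a common convention, not specified on the slide):

```python
def q_update(Q, s_prev, a_prev, reward, s_now, actions, alpha=0.5, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    q_old = Q.get((s_prev, a_prev), 0.0)
    best_next = max(Q.get((s_now, a), 0.0) for a in actions)
    Q[(s_prev, a_prev)] = q_old + alpha * (reward + gamma * best_next - q_old)

# first visit: Q starts at 0, so the new value is alpha * reward = 0.5
Q = {}
q_update(Q, "s0", "go", 1.0, "s1", ["go", "stay"])
```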
Exploration Function
If the agent always takes the action with maximum Q(s,a), it always evaluates the same state-action pairs
Need an exploration function: a trade-off between greed and curiosity
Try rarely-explored low-payoff actions instead of well-known high-payoff actions
Many possible functions
Exploration Function
Define:
Q(s,a): estimated value of (s,a)
N(s,a): number of times (s,a) has been tried
Rmax: maximum possible value of Q(s,a)
Nmin: minimum number of times we want the agent to try (s,a)
f(Q(s,a), N(s,a)) = Rmax if N(s,a) < Nmin, Q(s,a) otherwise
The agent picks the action with the maximum f(.) value
This guarantees each (s,a) pair is explored at least Nmin times
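The exploration function from the slide, plus a small action-selection helper built on it (the dict-based Q and N tables are assumptions for illustration):

```python
def f(q, n, r_max, n_min):
    """Optimistic utility: assume the best value (Rmax) until (s,a) has
    been tried at least Nmin times, then trust the learned Q-value."""
    return r_max if n < n_min else q

def pick_action(s, actions, Q, N, r_max=1.0, n_min=5):
    """Greedy on f instead of Q, so under-explored actions get chosen."""
    return max(actions, key=lambda a: f(Q.get((s, a), 0.0),
                                        N.get((s, a), 0), r_max, n_min))
```

An untried action scores Rmax, so it beats any already-learned action whose Q-value is below Rmax.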
Limits of RL
Search:
The number of state-action pairs can be very large
Intermediate rewards can be noisy
Real-world search:
The initial policy can have very poor reward
The necessary exploration of suboptimal actions can be costly
Some states are hard to reach
Policy
Learn the optimal policy in a decision network: π : S → A
EU(π) = Σ(t=0..∞) γ^t·Rt
Greedy search: modify the policy until EU stops increasing
Helicopter Flight Control Sustained stable inverted flight
Very difficult for humans First AI able to do it
Helicopter Flight Control
Collect flight data with a human pilot
Learn a model of the helicopter dynamics (stochastic and nonlinear): supervised learning
Learn a policy for the helicopter controller: reinforcement learning
Helicopter Dynamics
States: position, orientation, velocity, angular velocity (12 variables)
391 seconds of flight data with a time step of 0.1 s: 3910 triplets (st, at, st+1)
Learn the probability distribution P(st+1|st,at)
Implemented in a simulator and tested by the pilot
Helicopter Controller
Problem definition:
S: set of possible states
s0: initial state (s0 ∈ S)
A: set of possible actions
P(S|S,A): state transition probabilities
γ: discount factor
R: reward function mapping states to values
At state st, the controller picks action at; the system transitions to a random state st+1 with probability P(st+1|st,at)
Helicopter Controller
Reward function: punish deviation from the desired helicopter position and velocity
R ∈ [-∞, 0]
Policy learning: reinforcement learning
EU(π) = Σ(t=0..∞) γ^t·Rt
Problem: the state transitions are stochastic, so it is impossible to compare several policies directly!
PEGASUS Algorithm
Predefined series of random numbers; the length of the series is a function of the complexity of the policy
Use the same series to test all policies: at time t, each policy encounters the same random event
Simulates a stochastic environment:
The environment is stochastic from the point of view of the agent
The environment is deterministic from our point of view
Makes comparison between policies possible
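The fixed-random-series idea can be sketched as a rollout that consumes a predefined list of draws; `step` stands for a hypothetical simulator that is deterministic once the draw r is fixed (both names are assumptions, not PEGASUS's actual interface):

```python
import random

def rollout(policy, step, s0, draws, gamma=0.9):
    """Score a policy on a predefined series of random draws."""
    s, total, disc = s0, 0.0, 1.0
    for r in draws:
        s, reward = step(s, policy(s), r)   # deterministic given r
        total += disc * reward
        disc *= gamma
    return total

# every candidate policy is scored on the SAME draws, so at time t each
# policy encounters the same random event and the scores are comparable
draws = [random.random() for _ in range(100)]
```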
Summary of Learning

                     Supervised                Unsupervised       Reinforcement
Training data        Data and correct output   Data               States, actions, and rewards
Learning target      Data-output relationship  Patterns in data   Policy
Evaluation           Statistics                Fitness            Reward value
Typical application  Classifiers               Clustering         Controllers