Introduction to Learning ECE457 Applied Artificial Intelligence Fall 2007 Lecture #11


Page 1: Introduction to Learning

Introduction to Learning

ECE457 Applied Artificial IntelligenceFall 2007

Lecture #11

Page 2: Introduction to Learning

ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 2

Outline
- Overview of learning
- Supervised learning: Russell & Norvig, sections 18.1-18.3, 19.1
- Unsupervised learning: Russell & Norvig, section 20.3
- Reinforcement learning: Russell & Norvig, sections 21.1, 21.3, 21.5
- CS 498 & CS 698 (Prof. Ben-David)

Page 3: Introduction to Learning

Limit of Predefined Knowledge
- Many of the algorithms and techniques we saw relied on predefined information: probability distributions, heuristics, utility functions
- This only works if that information is easily available
- For real-world applications, it is often preferable to have the agent learn the information automatically

Page 4: Introduction to Learning

Overview of Learning
- Learn what? Facts about the world (KB), a decision-making strategy, probabilities, costs, functions, states, …
- Learn from what? Training data (often freely available for common problems), or the real world
- Learn how? Need some form of feedback to the agent

Page 5: Introduction to Learning

Learning and Feedback
- Supervised learning: training data includes the correct output; learn the relationship between data and output; evaluate with statistics
- Unsupervised learning: training data with no correct output; learn patterns in the data; evaluate with the fitness of the pattern
- Reinforcement learning: a set of actions with rewards and punishments; learn to maximize reward and minimize punishment; evaluate with the value of the reward

Page 6: Introduction to Learning

Supervised Learning
- Given a training corpus of data-output pairs: x & y values, email & spam/not spam, variable values & decision
- Learn the relationship mapping the data to the output: f(x), spam features, decision rules

Page 7: Introduction to Learning

Supervised Learning Example
- 2D state space with binary classification
- Learn a function to separate the two classes

[Figure: scatter plot of + and – points in a 2D plane]

Page 8: Introduction to Learning

Decision Tree

[Figure: the same scatter plot; the + points lie inside the box 2 < X < 5, 4 < Y < 8]

X>5?  Yes → –
      No  → X<2?  Yes → –
                  No  → Y>8?  Yes → –
                              No  → Y<4?  Yes → –
                                          No  → +

Page 9: Introduction to Learning

Decision Tree

[Figure: the same scatter plot and thresholds 2, 5, 8, 4]

The same tree with multiway splits:
X:  >5 → –    <2 → –    Other → test Y
Y:  >8 → –    <4 → –    Other → +

Page 10: Introduction to Learning

Decision Tree
Multiple variables and values to decide whether an email is spam:

Email  UW  Pics  Words  Attach  Multi  Spam?
E1     N   Y       58   Y       Y      Yes
E2     N   N      132   N       Y      No
E3     N   Y     1049   Y       Y      Yes
E4     N   N       18   Y       Y      Yes
E5     Y   N       26   Y       Y      No
E6     N   Y       32   N       Y      Yes
E7     Y   N       44   N       Y      No
E8     N   Y      256   N       Y      Yes
E9     Y   Y     2789   Y       N      No
E10    N   Y      857   Y       N      No
E11    N   N      732   N       N      No
E12    N   Y      541   N       Y      Yes

Page 11: Introduction to Learning

Decision Tree
Build the decision tree:

UW?  Yes → Not Spam
     No  → Pictures?  Yes → Multi?   Yes → Spam
                                     No  → Not Spam
                      No  → Attach?  Yes → Spam
                                     No  → Not Spam

Page 12: Introduction to Learning

Supervised Learning Algorithm
- Many possible learning techniques, depending on the problem and the data
- Start with an inaccurate initial hypothesis
- Refine it to reduce error or increase accuracy
- End with a trade-off between accuracy and simplicity

[Figure: the scatter plot with a candidate separating function]

Page 13: Introduction to Learning

Trade-Off
Why a trade-off between accuracy and simplicity?
- Noisy data
- Special cases
- Measurement errors

[Figure: the scatter plot with a few noisy + and – points on the wrong side of the boundary]

Page 14: Introduction to Learning

Supervised Learning Algorithm
Learning a decision tree follows the same general algorithm:
- Start with all emails at the root
- Pick the attribute that will teach us the most, i.e. the one with the highest information gain (difference in the probability of each class)
- Branch using that attribute
- Repeat until a trade-off is reached between the accuracy of the leaves and the depth limit / relevance of the attributes
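The attribute-picking step above can be sketched in code. This is a minimal illustration (not the course's implementation), assuming training examples are tuples of attribute values paired with class labels; the gain of an attribute is the entropy of the labels minus the entropy remaining after branching on it.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(p * math.log2(p)
                for c in set(labels)
                for p in [labels.count(c) / n])

def information_gain(rows, labels, attr):
    """Entropy reduction from branching on attribute index attr."""
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Pick the attribute with the highest information gain (toy data):
rows = [('Y', 'Y'), ('Y', 'N'), ('N', 'Y'), ('N', 'N')]
labels = ['Spam', 'Spam', 'No', 'No']
best = max(range(2), key=lambda a: information_gain(rows, labels, a))
```

Here attribute 0 separates the toy classes perfectly, so it has gain 1 bit and is chosen as the root test.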

Page 15: Introduction to Learning

Supervised Learning Evaluation
Statistical measures of the agent's performance:
- RMS error between f(x) and y
- Making the correct decision with as few decision rules as possible (shallowest tree possible)
- Accuracy of a classification
- Precision and recall of a classification
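The first measure is straightforward to compute; a small sketch (not tied to any course code):

```python
import math

def rms_error(predictions, targets):
    """Root-mean-square error between f(x) values and the true y values."""
    n = len(targets)
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(predictions, targets)) / n)
```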

Page 16: Introduction to Learning

Precision and Recall
Binary classification: distinguish + (our target) from – (everything else). The classifier makes mistakes: it classifies some + as – and some – as +. Define four categories:

                   Actual +         Actual –
Classified as +    True Positives   False Positives
Classified as –    False Negatives  True Negatives

Page 17: Introduction to Learning

Precision and Recall
- Precision: proportion of selected items the classifier got right: TP / (TP + FP)
- Recall: proportion of target items the classifier selected: TP / (TP + FN)
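The two definitions above translate directly to code; a minimal sketch where both label lists are booleans with True meaning the + class:

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) = (TP/(TP+FP), TP/(TP+FN))."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```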

Page 18: Introduction to Learning

Precision and Recall
Why ignore True Negatives?
- Typically, there are a lot more negatives than positives: in Internet searches, + are the target websites and – are all other websites
- Counting TN would skew the statistics and favour a system that classifies everything as negative

Page 19: Introduction to Learning

Overfitting
- A common problem with supervised learning is over-specializing the learned relation to the training data
- Learning from irrelevant features of the data: email features such as paragraph indentation, number of typos, the letter “x” in the sender address, …
- Works well on training data, because of poor sampling or random chance
- Fails in real-world tests

Page 20: Introduction to Learning

Testing Data
- Evaluate the learned relation using unseen test data, i.e. data that was not used in training, so the system is not overfitted to it
- Split the training data beforehand and keep part of it away for testing
- This only works once! If you reuse testing data, you are overfitting your system for that test. Never do that!

Page 21: Introduction to Learning

Cross-Validation
Shortcomings of holding out test data:
- The test only works once
- Training uses less data, so the result is less accurate
n-fold cross-validation:
- Split the training corpus into n parts
- Train with n-1 parts, test with the remaining 1
- Run n tests, each time using a different test part
- Do a final training run with all the data and the best features
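The splitting scheme can be sketched as follows (a simple round-robin partition; real corpora would usually be shuffled first):

```python
def n_fold_splits(corpus, n):
    """Yield n (train, test) pairs; each item appears in exactly one test set."""
    folds = [corpus[i::n] for i in range(n)]
    for i in range(n):
        train = [x for j in range(n) if j != i for x in folds[j]]
        yield train, folds[i]
```

Usage: train a classifier on each `train` list, evaluate it on the matching `test` list, and average the n scores.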

Page 22: Introduction to Learning

Naïve Bayes Classifier
- P(Cx|F1,…,Fn) = P(Cx) ∏i P(Fi|Cx)
- Classify an item in the class Cx with maximum probability
Weighted Naïve Bayes Classifier
- Paper on website
- Give each feature Fi a weight wi and learn the proper weight values
- P(Cx|F1,…,Fn) = P(Cx) ∏i P(Fi|Cx)^wi
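The weighted classification rule can be sketched as follows, assuming the conditional probabilities P(Fi|Cx) for the observed feature values have already been estimated (the numbers in the usage example are made up for illustration):

```python
def classify(priors, likelihoods, weights=None):
    """Return argmax over classes of P(Cx) * prod_i P(Fi|Cx)**wi.
    priors: {class: P(Cx)}
    likelihoods: {class: [P(F1|Cx), ..., P(Fn|Cx)]} for the observed features
    weights: per-feature wi; all 1.0 gives the plain Naive Bayes classifier."""
    def score(c):
        ws = weights or [1.0] * len(likelihoods[c])
        s = priors[c]
        for lik, w in zip(likelihoods[c], ws):
            s *= lik ** w
        return s
    return max(priors, key=score)

# Hypothetical estimates for one observed email:
priors = {'spam': 0.4, 'ham': 0.6}
likelihoods = {'spam': [0.8, 0.9], 'ham': [0.1, 0.2]}
```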

Page 23: Introduction to Learning

Learning the Weights
- Start with initial weight values
- At each iteration, for each feature:
  - Measure the impact of that feature on the accuracy of the classification
  - Modify the weight to increase the accuracy of the classification
- End if the iteration limit is reached, or the accuracy increase is less than a threshold

Page 24: Introduction to Learning

Learning the Weights
Define the following:
- Initial weight values: wi(0) = 1
- Learning rate: α
- Measure of the accuracy of the classification using feature Fi at iteration n: Ain
- Function converting Ain into a weight variation: Δ(Ain) = (1 + e^(-Ain))^(-1) · [1 - (1 + e^(-Ain))^(-1)]²
- Threshold improvement in accuracy: ε
- Iteration limit: nmax

Page 25: Introduction to Learning

Learning the Weights
- Start with wi(0)
- At iteration n, for feature Fi:
  - Measure Ain
  - Compute Δwi(n) = α·Δ(Ain)
  - wi(n) = wi(n-1) + Δwi(n)
- End if n = nmax (entire algorithm), or Ain < ε (feature Fi)
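The loop above can be sketched in code. This is an illustrative skeleton, not the paper's implementation: `measure_accuracy(i, w)` is a hypothetical callback standing in for however Ain is actually measured on the training corpus.

```python
import math

def delta(a):
    """Weight variation Delta(A) = s(A) * (1 - s(A))**2, s being the sigmoid."""
    s = 1.0 / (1.0 + math.exp(-a))
    return s * (1.0 - s) ** 2

def learn_weights(measure_accuracy, n_features, alpha=1.0, eps=1e-3, n_max=100):
    """Iteratively adjust feature weights until each feature's accuracy
    improvement falls below eps or the iteration limit n_max is reached."""
    w = [1.0] * n_features              # w_i(0) = 1
    active = set(range(n_features))
    for n in range(n_max):              # global stop: iteration limit
        for i in list(active):
            a = measure_accuracy(i, w)
            if a < eps:                 # per-feature stop: A_i^n < eps
                active.discard(i)
            else:
                w[i] += alpha * delta(a)   # w_i(n) = w_i(n-1) + alpha*Delta(A)
        if not active:
            break
    return w
```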

Page 26: Introduction to Learning

Unsupervised Learning
- Given a training corpus of data points: observed values of random variables in a Bayesian network, series of data points, orbits of planets
- Learn the underlying pattern in the data: existence and conditional probabilities of hidden variables, number of classes and classification rules, Kepler's laws of planetary motion

Page 27: Introduction to Learning

Unsupervised Learning Example
- 2D state space with unclassified observations
- Learn the number and form of the clusters
- This is the problem of unsupervised clustering; many algorithms have been proposed for it, and more research is still being done on better algorithms, different kinds of data, …

[Figure: scatter plot of unlabelled * points forming several clusters]

Page 28: Introduction to Learning

Unsupervised Learning Algorithm
- Define a similarity measure to compare pairs of elements
- Starting with no clusters:
  - Pick a seed element
  - Group similar elements until a threshold is reached
  - Pick a new seed from the free elements and start again

[Figure: the scatter plot with a cluster grown around a seed element]
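The seed-and-grow procedure can be sketched as follows (a simplified one-pass version: each free element is compared to the seed only, under a caller-supplied similarity function):

```python
def seed_and_grow(elements, similarity, threshold):
    """Greedy clustering: pick a seed, group every free element at least
    `threshold`-similar to it, then restart with a new seed."""
    free = list(elements)
    clusters = []
    while free:
        seed = free.pop(0)
        cluster = [seed] + [e for e in free if similarity(seed, e) >= threshold]
        free = [e for e in free if e not in cluster]
        clusters.append(cluster)
    return clusters
```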

Page 29: Introduction to Learning

Unsupervised Learning Algorithm
- Starting with one all-encompassing cluster:
  - Find the cluster with the highest internal dissimilarity
  - Find the most dissimilar pair of elements inside that cluster
  - Split it into two clusters
  - Repeat until all clusters have internal homogeneity
- Merge homogeneous clusters

[Figure: the scatter plot split into clusters by successive divisions]

Page 30: Introduction to Learning

Unsupervised Learning Evaluation
- Need to evaluate the fitness of the learned relationship: number of clusters vs. their internal properties; difference between clusters vs. internal homogeneity; number of parameters vs. number of hidden variables in a Bayesian network
- There is no way of knowing what the optimal solution is

Page 31: Introduction to Learning

K-Means
- Popular unsupervised clustering algorithm
- Data represented as a cloud of points in state space
- Target: group the points into k clusters, minimizing intra-cluster variance

Page 32: Introduction to Learning

K-Means
- Start with k random cluster centers
- For each iteration:
  - For each data point: associate the point with the nearest cluster center, and add its distance to the variance
  - Move each cluster center to the center of mass of its associated data-point cloud
- End when the variance is less than a threshold, or the cluster centers stabilize

Page 33: Introduction to Learning

K-Means
We have:
- Data points: x1, …, xi, …, xn
- Clusters: C1, …, Cj, …, Ck
- Cluster centers: μ1, …, μj, …, μk
Minimize the intra-cluster variance: V = Σj Σ(xi ∈ Cj) |xi - μj|²
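The two-step loop from the previous slide can be sketched directly (a minimal version that stops when the centers stabilize; points and centers are coordinate tuples):

```python
def k_means(points, centers, n_max=50):
    """Lloyd-style k-means sketch: returns the final cluster centers."""
    for _ in range(n_max):
        clusters = [[] for _ in centers]
        for x in points:                       # assign to nearest center
            j = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(x, centers[j])))
            clusters[j].append(x)
        new_centers = []
        for j, cl in enumerate(clusters):      # move to center of mass
            if cl:
                new_centers.append(tuple(sum(c) / len(cl) for c in zip(*cl)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:             # centers have stabilized
            break
        centers = new_centers
    return centers
```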

Page 34: Introduction to Learning

K-Means Example

[Figure: four snapshots of the scatter plot, showing the cluster centers (o) moving to the centers of mass of their associated point clouds over successive iterations]

Page 35: Introduction to Learning

Reinforcement Learning
- Given a set of possible actions, the resulting state of the environment, and rewards or punishments for each state: taxi driver (tips, car repair costs, tickets), checkers (advantage in number of pieces)
- Learn to maximize the rewards and/or minimize the punishments: maximize tips while minimizing damage to the car and police tickets (drive properly); protect your own pieces and take enemy pieces (good play strategy)

Page 36: Introduction to Learning

Reinforcement Learning
- Learning by trial and error: try something, see the result
  - Speeding results in tickets, going through a red light results in car damage, a quick and safe drive results in tips
  - Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer, sacrificing some pieces can take a greater number of enemy pieces
- Sacrifice known rewarding actions to explore new, potentially more rewarding actions
- Develop strategies to maximize rewards while minimizing penalties over the long term

Page 37: Introduction to Learning

Q-Learning
- Each state has a reward or punishment, and a list of possible actions leading to other states
- Learn the value of state-action pairs: the Q-value

Page 38: Introduction to Learning

Q-Learning
Update the value of the previous (t-1) state-action pair based on the current (t) state-action value:
- ΔQ(st-1,at-1) = α·[Rt-1 + γ·maxa Q(st,at) – Q(st-1,at-1)]
- Q(s,a): estimated value of state-action pair (s,a)
- Rt: reward of state st
- α: learning rate
- γ: discount factor of future rewards; γ = 0 means future rewards are irrelevant, γ = 1 means future rewards count the same as current rewards
- Update: Q(st-1,at-1) = Q(st-1,at-1) + ΔQ(st-1,at-1)
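One update step translates to a few lines of code; a sketch using a dictionary of Q-values keyed by (state, action), with unseen pairs defaulting to 0:

```python
def q_update(Q, s_prev, a_prev, reward, s_now, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_now, a), 0.0) for a in actions)
    old = Q.get((s_prev, a_prev), 0.0)
    Q[(s_prev, a_prev)] = old + alpha * (reward + gamma * best_next - old)
```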

Page 39: Introduction to Learning

Exploration Function
- If the agent always does the action with maximum Q(s,a), it always evaluates the same state-action pairs
- Need an exploration function: trade off greed vs. curiosity; try rarely-explored low-payoff actions instead of well-known high-payoff actions
- Many possible functions

Page 40: Introduction to Learning

Exploration Function
Define:
- Q(s,a): estimated value of (s,a)
- N(s,a): number of times (s,a) has been tried
- Rmax: maximum possible value of Q(s,a)
- Nmin: minimum number of times we want the agent to try (s,a)

f(Q(s,a), N(s,a)) = Rmax if N(s,a) < Nmin, Q(s,a) otherwise

The agent picks the action with the maximum f(·) value. This guarantees each (s,a) pair is explored at least Nmin times.
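This particular f is a one-liner; a sketch, together with the action-selection rule it drives (the `Q` and `N` dictionaries are assumed bookkeeping, with unseen pairs defaulting to 0):

```python
def exploration_value(q, n, r_max, n_min):
    """f(Q(s,a), N(s,a)): optimistically assume the best possible value
    R_max until (s,a) has been tried at least N_min times."""
    return r_max if n < n_min else q

def pick_action(state, actions, Q, N, r_max, n_min):
    """The agent picks the action with the maximum f value."""
    return max(actions,
               key=lambda a: exploration_value(Q.get((state, a), 0.0),
                                               N.get((state, a), 0),
                                               r_max, n_min))
```

Because an untried action scores Rmax, it beats any learned Q-value until it has been tried Nmin times.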

Page 41: Introduction to Learning

Limits of RL
- Search
  - The number of state-action pairs can be very large
  - Intermediate rewards can be noisy
- Real-world search
  - The initial policy can have a very poor reward
  - The necessary exploration of suboptimal actions can be costly
  - Some states are hard to reach

Page 42: Introduction to Learning

Policy
- Learn the optimal policy π in the decision network: π: S → A
- EU(π) = Σ(t=0 to ∞) γ^t · Rt
- Greedy search: modify the policy until EU(π) stops increasing

Page 43: Introduction to Learning

Helicopter Flight Control
- Sustained stable inverted flight
- Very difficult for humans
- First AI able to do it

Page 44: Introduction to Learning

Helicopter Flight Control
- Collect flight data with a human pilot
- Learn a model of the helicopter dynamics (stochastic and nonlinear): supervised learning
- Learn a policy for the helicopter controller: reinforcement learning

Page 45: Introduction to Learning

Helicopter Dynamics
- States: position, orientation, velocity, angular velocity (12 variables)
- 391 seconds of flight data at a 0.1 s time step: 3910 triplets (st, at, st+1)
- Learn the probability distribution P(st+1|st,at)
- Implemented in a simulator and tested by the pilot

Page 46: Introduction to Learning

Helicopter Controller
Problem definition:
- S: set of possible states
- s0: initial state (s0 ∈ S)
- A: set of possible actions
- P(S|S,A): state transition probabilities
- γ: discount factor
- R: reward function mapping states to values
At state st, the controller picks action at, and the system transitions to a random state st+1 with probability P(st+1|st,at)

Page 47: Introduction to Learning

Helicopter Controller
- Reward function: punish deviation from the desired helicopter position and velocity; R ∈ (-∞, 0]
- Policy learning: reinforcement learning, EU(π) = Σ(t=0 to ∞) γ^t · Rt
- Problem: state transitions are stochastic, so it is impossible to compare several policies!

Page 48: Introduction to Learning

PEGASUS Algorithm
- Use a predefined series of random numbers; the length of the series is a function of the complexity of the policy
- Use the same series to test all policies: at time t, each policy encounters the same random event
- Simulate a stochastic environment: the environment is stochastic from the point of view of the agent, but deterministic from our point of view
- This makes comparisons between policies possible
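The fixed-series idea can be sketched as follows. This is an illustrative skeleton, not the PEGASUS paper's implementation: `step(s, a, u)` is a hypothetical simulator that returns (next_state, reward) and is made deterministic by the supplied random number u.

```python
import random

def evaluate_policy(policy, step, s0, horizon, seed=0):
    """Score a policy against a fixed, predefined random series so that
    every policy sees the same random events at the same time steps."""
    rng = random.Random(seed)
    series = [rng.random() for _ in range(horizon)]   # same for every policy
    state, total = s0, 0.0
    for u in series:
        action = policy(state)
        state, reward = step(state, action, u)        # deterministic given u
        total += reward
    return total
```

Because the series depends only on the seed, two calls with different policies are directly comparable, and repeated evaluations of the same policy give the same score.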

Page 49: Introduction to Learning

Summary of Learning

                      Supervised                Unsupervised      Reinforcement
Training data         Data and correct output   Data              States, actions, and rewards
Learning target       Data-output relationship  Patterns in data  Policy
Evaluation            Statistics                Fitness           Reward value
Typical application   Classifiers               Clustering        Controllers