Introduction to Learning
ECE457 Applied Artificial IntelligenceFall 2007
Lecture #11
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 2
Outline
Overview of learning
Supervised learning (Russell & Norvig, sections 18.1-18.3, 19.1)
Unsupervised learning (Russell & Norvig, section 20.3)
Reinforcement learning (Russell & Norvig, sections 21.1, 21.3, 21.5)
See also CS 498 & CS 698 (Prof. Ben-David)
Limit of Predefined Knowledge
Many of the algorithms and techniques we saw relied on predefined information: probability distributions, heuristics, utility functions
This only works if the information is easily available
For real-world applications, it is often preferable to have the agent learn the information automatically
Overview of Learning
Learn what? Facts about the world (KB), a decision-making strategy, probabilities, costs, functions, states, …
Learn from what? Training data (often freely available for common problems) or the real world
Learn how? Need some form of feedback to the agent
Learning and Feedback
Supervised learning: training data includes the correct output; learn the relationship between data and output; evaluate with statistics
Unsupervised learning: training data has no correct output; learn patterns in the data; evaluate with the fitness of the pattern
Reinforcement learning: a set of actions with rewards and punishments; learn to maximize reward and minimize punishment; evaluate with the value of the reward
Supervised Learning
Given a training corpus of data-output pairs: x & y values, email & spam/not spam, variable values & decision
Learn the relationship mapping the data to the output: f(x), spam features, decision rules
Supervised Learning Example
2D state space with binary classification
Learn a function to separate both classes
[Figure: scatter plot of + and - examples in the 2D state space]
Decision Tree
[Figure: the scatter plot partitioned by the lines X = 2, X = 5, Y = 4, Y = 8]
X > 5?  Yes: -
        No:  X < 2?  Yes: -
                     No:  Y > 8?  Yes: -
                                  No:  Y < 4?  Yes: -
                                               No:  +
Decision Tree
[Figure: the same scatter plot and partition, with multiway branching]
X?  < 2:   -
    > 5:   -
    Other: Y?  > 8:   -
               < 4:   -
               Other: +
Decision Tree
Multiple variables and values to decide whether an email is spam

Email  UW  Pics  Words  Attach  Multi  Spam?
E1     N   Y     58     Y       Y      Yes
E2     N   N     132    N       Y      No
E3     N   Y     1049   Y       Y      Yes
E4     N   N     18     Y       Y      Yes
E5     Y   N     26     Y       Y      No
E6     N   Y     32     N       Y      Yes
E7     Y   N     44     N       Y      No
E8     N   Y     256    N       Y      Yes
E9     Y   Y     2789   Y       N      No
E10    N   Y     857    Y       N      No
E11    N   N     732    N       N      No
E12    N   Y     541    N       Y      Yes
Decision Tree
Build the decision tree:
UW?  Yes: Not Spam
     No:  Pictures?  Yes: Multi?   Yes: Spam
                                   No:  Not Spam
                     No:  Attach?  Yes: Spam
                                   No:  Not Spam
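As a sketch, the spam tree above can be written as nested tests; the dict keys used to encode an email here are hypothetical, not from the lecture:

```python
def classify_email(email):
    """Follow the decision tree: UW, then Pictures, then Multi or Attach."""
    if email["uw"] == "Y":          # UW sender: never spam in the corpus
        return "Not Spam"
    if email["pics"] == "Y":        # has pictures: spam only if multipart
        return "Spam" if email["multi"] == "Y" else "Not Spam"
    # no pictures: spam only if it carries an attachment
    return "Spam" if email["attach"] == "Y" else "Not Spam"

# E5 from the training table: UW=Y, so the tree answers "Not Spam"
print(classify_email({"uw": "Y", "pics": "N", "attach": "Y", "multi": "Y"}))
```

Checking the tree against each row of the table (e.g. E1 → Spam, E10 → Not Spam) confirms it reproduces the training labels.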
Supervised Learning Algorithm
Many possible learning techniques, depending on the problem and the data
Start with an inaccurate initial hypothesis
Refine it to reduce error or increase accuracy
End with a trade-off between accuracy and simplicity
Trade-Off
Why trade-off between accuracy and simplicity?
Noisy data: special cases, measurement errors
[Figure: the scatter plot with a few noisy + and - points inside the opposite class's region]
Supervised Learning Algorithm
Learning a decision tree follows the same general algorithm:
Start with all emails at the root
Pick the attribute that will teach us the most: highest information gain, i.e. largest difference in the probability of each class
Branch using that attribute
Repeat until the trade-off between leaf accuracy and depth limit / attribute relevance is reached
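A minimal sketch of the information-gain computation used to pick the branching attribute (the row/label encoding is an assumption for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attr):
    """Expected entropy reduction from branching on attribute attr."""
    gain = entropy(labels)
    n = len(rows)
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        gain -= len(sub) / n * entropy(sub)   # weighted child entropy
    return gain
```

A perfectly predictive attribute recovers the full entropy of the labels; an irrelevant one yields a gain near zero.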
Supervised Learning Evaluation
Statistical measures of the agent's performance:
RMS error between f(x) and y
Making the correct decision with as few decision rules as possible / the shallowest tree possible
Accuracy of a classification
Precision and recall of a classification
Precision and Recall
Binary classification: distinguish + (our target) from – (everything else)
The classifier makes mistakes: it classifies some + as – and some – as +
Define four categories:

                 Actual +           Actual –
Classified +     True Positives     False Positives
Classified –     False Negatives    True Negatives
Precision and Recall
Precision: proportion of selected items the classifier got right
Precision = TP / (TP + FP)
Recall: proportion of target items the classifier selected
Recall = TP / (TP + FN)
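The two formulas as a one-liner, with made-up counts for illustration:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN). TN is ignored."""
    return tp / (tp + fp), tp / (tp + fn)

# e.g. 8 true positives, 2 false positives, 4 false negatives
p, r = precision_recall(8, 2, 4)   # p = 0.8, r = 8/12
```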
Precision and Recall
Why ignore True Negatives?
Typically, there are far more negatives than positives
Internet searches: + are the target websites, – are all other websites
Counting TN would skew the statistics and favour a system that classifies everything as negative
Overfitting
A common problem with supervised learning is over-specializing the learned relation to the training data
Learning from irrelevant features of the data: email features such as paragraph indentation, number of typos, the letter "x" in the sender address, …
Works well on training data, because of poor sampling or random chance
Fails in real-world tests
Testing Data
Evaluate the learned relation using unseen test data, i.e. data that was not used in training, so the system is not overfitted to it
Split the training data beforehand and keep part of it aside for testing
Only works once! If you reuse testing data, you are overfitting your system to that test! Never do that!
Cross-Validation
Shortcomings of holding out test data: the test only works once, and training on less data gives a less accurate result
n-fold cross-validation:
Split the training corpus into n parts
Train with n-1 parts, test with the remaining one
Run n tests, each time using a different test part
Do a final training run with all the data and the best features
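The n-fold splitting step can be sketched as follows (the round-robin split is one simple choice, not the only one):

```python
def n_fold_splits(data, n):
    """Yield (train, test) pairs: each fold serves as test exactly once."""
    folds = [data[i::n] for i in range(n)]      # round-robin split into n parts
    for i in range(n):
        train = [x for j in range(n) if j != i for x in folds[j]]
        yield train, folds[i]

for train, test in n_fold_splits(list(range(10)), 5):
    pass  # train a model on `train`, evaluate it on `test`
```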
Naïve Bayes Classifier
P(Cx|F1,…,Fn) ∝ P(Cx) Πi P(Fi|Cx)
Classify an item in the class Cx with maximum probability
Weighted Naïve Bayes Classifier (paper on website):
Give each feature Fi a weight wi and learn the proper weight values
P(Cx|F1,…,Fn) ∝ P(Cx) Πi P(Fi|Cx)^wi
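A sketch of the weighted classifier, scoring in log space to avoid underflow; the toy probability tables below are made-up numbers, not from the paper:

```python
import math

def nb_classify(features, priors, cond, weights):
    """Pick the class Cx maximizing log P(Cx) + sum_i wi * log P(Fi|Cx)."""
    def score(c):
        return math.log(priors[c]) + sum(
            w * math.log(cond[c][i][f])
            for i, (f, w) in enumerate(zip(features, weights)))
    return max(priors, key=score)

# toy model: one feature whose value "Y" strongly indicates spam
priors = {"spam": 0.5, "ham": 0.5}
cond = {"spam": [{"Y": 0.9, "N": 0.1}], "ham": [{"Y": 0.2, "N": 0.8}]}
print(nb_classify(("Y",), priors, cond, weights=[1.0]))
```

Setting every wi = 1 recovers the plain naïve Bayes classifier.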
Learning the Weights
Start with initial weight values
At each iteration, for each feature:
Measure the impact of that feature on the accuracy of the classification
Modify the weight to increase the accuracy of the classification
End when the iteration limit is reached or the accuracy increase is less than a threshold
Learning the Weights
Define the following:
Initial weight values: wi(0) = 1
Learning rate: λ
Measure of the accuracy of the classification using feature Fi at iteration n: Ain
Function to convert Ain into a weight variation: Δ(Ain)
Δ(Ain) = (1 + e^-Ain)^-1 · [1 - (1 + e^-Ain)^-1]²
Threshold improvement in accuracy: є
Iteration limit: nmax
Learning the Weights
Start with wi(0)
At iteration n, for feature Fi:
Measure Ain
Compute Δwi(n) = λ·Δ(Ain)
Update wi(n) = wi(n-1) + Δwi(n)
End when n = nmax (entire algorithm) or Ain < є (feature Fi)
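A sketch of this update loop under stated assumptions: the learning-rate symbol λ and the `accuracy(i, w)` callback (which must return Ain for feature Fi given the current weights) are placeholders, since the slide does not specify how the accuracy is measured:

```python
import math

def sigma(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def weight_delta(a):
    """The slide's variation function: Delta(A) = sigma(A) * (1 - sigma(A))^2."""
    return sigma(a) * (1.0 - sigma(a)) ** 2

def learn_weights(accuracy, n_features, rate=0.1, eps=1e-3, n_max=100):
    """Iterate wi(n) = wi(n-1) + rate * Delta(Ain) until convergence."""
    w = [1.0] * n_features                       # wi(0) = 1
    for n in range(1, n_max + 1):                # stop at n = nmax
        active = False
        for i in range(n_features):
            a = accuracy(i, w)
            if a >= eps:                         # stop feature Fi when Ain < eps
                w[i] += rate * weight_delta(a)
                active = True
        if not active:
            break
    return w
```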
Unsupervised Learning
Given a training corpus of data points:
Observed values of random variables in a Bayesian network
Series of data points
Orbits of planets
Learn the underlying pattern in the data:
Existence and conditional probability of hidden variables
Number of classes and classification rules
Kepler's laws of planetary motion
Unsupervised Learning Example
2D state space with unclassified observations
Learn the number and form of the clusters
This is the problem of unsupervised clustering
Many algorithms have been proposed for it, and more research is still being done for better algorithms, different kinds of data, …
[Figure: scatter plot of unclassified * observations forming several clusters]
Unsupervised Learning Algorithm
Define a similarity measure to compare pairs of elements
Starting with no clusters:
Pick a seed element
Group similar elements together until a threshold is reached
Pick a new seed from the free elements and start again
Unsupervised Learning Algorithm
Starting with one all-encompassing cluster:
Find the cluster with the highest internal dissimilarity
Find the most dissimilar pair of elements inside that cluster
Split it into two clusters
Repeat until all clusters have internal homogeneity
Merge homogeneous clusters
Unsupervised Learning Evaluation
Need to evaluate the fitness of the learned relationship:
Number of clusters vs. their internal properties
Difference between clusters vs. internal homogeneity
Number of parameters vs. number of hidden variables in a Bayesian network
There is no way of knowing the optimal solution
K-Means
Popular unsupervised clustering algorithm
Data represented as a cloud of points in the state space
Target: group the points into k clusters, minimizing intra-cluster variance
K-Means
Start with k random cluster centers
For each iteration:
For each data point, associate the point with the nearest cluster center and add its distance to the variance
Move each cluster center to the center of mass of its associated data-point cloud
End when the variance is less than a threshold or the cluster centers stabilize
K-Means
We have:
Data points: x1, …, xi, …, xn
Clusters: C1, …, Cj, …, Ck
Cluster centers: μ1, …, μj, …, μk
Minimize the intra-cluster variance V = Σj Σ(xi∈Cj) |xi - μj|²
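A minimal k-means sketch over points given as tuples, alternating the two steps from the previous slide (the random seeding and fixed iteration cap are simplifying assumptions):

```python
import random

def kmeans(points, k, n_iter=100):
    """Minimize V = sum_j sum_{xi in Cj} |xi - mu_j|^2 by alternating
    nearest-center assignment and center-of-mass updates."""
    centers = random.sample(points, k)           # k random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assign to nearest center
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
               for j, cl in enumerate(clusters)] # centers of mass
        if new == centers:                       # centers stabilized
            break
        centers = new
    return centers, clusters
```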
K-Means Example
[Figure: four snapshots of the * data points as the o cluster centers move toward the clusters' centers of mass]
Reinforcement Learning
Given a set of possible actions, the resulting state of the environment, and rewards or punishments for each state:
Taxi driver: tips, car repair costs, tickets
Checkers: advantage in number of pieces
Learn to maximize the rewards and/or minimize the punishments:
Maximize tips, minimize damage to the car and police tickets: drive properly
Protect your own pieces, take enemy pieces: good play strategy
Reinforcement Learning
Learning by trial and error: try something, see the result
Speeding results in tickets, running a red light results in car damage, a quick and safe drive results in tips
Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer; sacrifice some pieces to take a greater number of enemy pieces
Sacrifice known rewarding actions to explore new, potentially more rewarding actions
Develop strategies that maximize rewards while minimizing penalties over the long term
Q-Learning
Each state has a reward or punishment and a list of possible actions, which lead to other states
Learn the value of state-action pairs: the Q-value
Q-Learning
Update the value of the previous (t-1) state-action pair based on the current (t) state-action value:
ΔQ(st-1,at-1) = α[Rt-1 + γ·maxa(Q(st,a)) - Q(st-1,at-1)]
Q(s,a): estimated value of state-action pair (s,a)
Rt: reward of state st
α: learning rate
γ: discount factor of future rewards
γ = 0: future rewards are irrelevant; γ = 1: future rewards count the same as current rewards
Update: Q(st-1,at-1) = Q(st-1,at-1) + ΔQ(st-1,at-1)
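One step of this update as code; the Q-table is a dict keyed by (state, action) pairs, defaulting to 0 for unseen pairs (that default is a common convention, not specified on the slide):

```python
def q_update(Q, s_prev, a_prev, reward, s_now, actions, alpha=0.5, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    q_old = Q.get((s_prev, a_prev), 0.0)
    best_next = max(Q.get((s_now, a), 0.0) for a in actions)
    Q[(s_prev, a_prev)] = q_old + alpha * (reward + gamma * best_next - q_old)

# first visit: Q starts at 0, so the new value is alpha * reward = 0.5
Q = {}
q_update(Q, "s0", "go", 1.0, "s1", ["go", "stay"])
```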
Exploration Function
If the agent always takes the action with maximum Q(s,a), it always evaluates the same state-action pairs
Need an exploration function: a trade-off between greed and curiosity
Try rarely-explored low-payoff actions instead of well-known high-payoff actions
Many possible functions
Exploration Function
Define:
Q(s,a): estimated value of (s,a)
N(s,a): number of times (s,a) has been tried
Rmax: maximum possible value of Q(s,a)
Nmin: minimum number of times we want the agent to try (s,a)
f(Q(s,a), N(s,a)) = Rmax if N(s,a) < Nmin, Q(s,a) otherwise
The agent picks the action with the maximum f(.) value
This guarantees each (s,a) pair is explored at least Nmin times
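The exploration function from the slide, plus a small action-selection helper built on it (the dict-based Q and N tables are assumptions for illustration):

```python
def f(q, n, r_max, n_min):
    """Optimistic utility: assume the best value (Rmax) until (s,a) has
    been tried at least Nmin times, then trust the learned Q-value."""
    return r_max if n < n_min else q

def pick_action(s, actions, Q, N, r_max=1.0, n_min=5):
    """Greedy on f instead of Q, so under-explored actions get chosen."""
    return max(actions, key=lambda a: f(Q.get((s, a), 0.0),
                                        N.get((s, a), 0), r_max, n_min))
```

An untried action scores Rmax, so it beats any already-learned action whose Q-value is below Rmax.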
Limits of RL
Search:
The number of state-action pairs can be very large
Intermediate rewards can be noisy
Real-world search:
The initial policy can have very poor reward
The necessary exploration of suboptimal actions can be costly
Some states are hard to reach
Policy
Learn the optimal policy in a decision network: π : S → A
EU(π) = Σ(t=0..∞) γ^t·Rt
Greedy search: modify the policy until EU stops increasing
Helicopter Flight Control Sustained stable inverted flight
Very difficult for humans First AI able to do it
Helicopter Flight Control
Collect flight data with a human pilot
Learn a model of the helicopter dynamics (stochastic and nonlinear): supervised learning
Learn a policy for the helicopter controller: reinforcement learning
Helicopter Dynamics
States: position, orientation, velocity, angular velocity (12 variables)
391 seconds of flight data with a time step of 0.1 s: 3910 triplets (st, at, st+1)
Learn the probability distribution P(st+1|st,at)
Implemented in a simulator and tested by the pilot
Helicopter Controller
Problem definition:
S: set of possible states
s0: initial state (s0 ∈ S)
A: set of possible actions
P(S|S,A): state transition probabilities
γ: discount factor
R: reward function mapping states to values
At state st, the controller picks action at; the system transitions to a random state st+1 with probability P(st+1|st,at)
Helicopter Controller
Reward function: punish deviation from the desired helicopter position and velocity
R ∈ [-∞, 0]
Policy learning: reinforcement learning
EU(π) = Σ(t=0..∞) γ^t·Rt
Problem: the state transitions are stochastic, so it is impossible to compare several policies directly!
PEGASUS Algorithm
Predefined series of random numbers; the length of the series is a function of the complexity of the policy
Use the same series to test all policies: at time t, each policy encounters the same random event
Simulates a stochastic environment:
The environment is stochastic from the point of view of the agent
The environment is deterministic from our point of view
Makes comparison between policies possible
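The fixed-random-series idea can be sketched as a rollout that consumes a predefined list of draws; `step` stands for a hypothetical simulator that is deterministic once the draw r is fixed (both names are assumptions, not PEGASUS's actual interface):

```python
import random

def rollout(policy, step, s0, draws, gamma=0.9):
    """Score a policy on a predefined series of random draws."""
    s, total, disc = s0, 0.0, 1.0
    for r in draws:
        s, reward = step(s, policy(s), r)   # deterministic given r
        total += disc * reward
        disc *= gamma
    return total

# every candidate policy is scored on the SAME draws, so at time t each
# policy encounters the same random event and the scores are comparable
draws = [random.random() for _ in range(100)]
```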
Summary of Learning

                     Supervised                Unsupervised       Reinforcement
Training data        Data and correct output   Data               States, actions, and rewards
Learning target      Data-output relationship  Patterns in data   Policy
Evaluation           Statistics                Fitness            Reward value
Typical application  Classifiers               Clustering         Controllers