Upload
lesa
View
56
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Instance Based Learning. Ata Kaban The University of Birmingham. Today we learn: K-Nearest Neighbours Locally weighted regression Case-based reasoning Lazy and eager learning. Instance-based learning. One way of solving tasks of approximating discrete or real valued target functions - PowerPoint PPT Presentation
Citation preview
1
Instance Based Learning
Ata KabanThe University of Birmingham
2
Today we learn: K-Nearest Neighbours Locally weighted regression Case-based reasoning Lazy and eager learning
3
Instance-based learning
One way of solving tasks of approximating discrete or real valued target functions
Have training examples: (xn, f(xn)), n=1..N. Key idea:
– just store the training examples– when a test example is given then find the
closest matches
4
1-Nearest neighbour:Given a query instance xq,
• first locate the nearest training example xn
• then f(xq):= f(xn) K-Nearest neighbour:
Given a query instance xq, • first locate the k nearest training examples • if discrete values target function then take
vote among its k nearest nbrs else if real valued target fct then take the mean of the f values of the k nearest nbrs
kxf
xfk
i iq
1)(
:)(
5
The distance between examples
We need a measure of distance in order to know who are the neighbours
Assume that we have T attributes for the learning problem. Then one example point x has elements xt , t=1,…T.
The distance between two points xi xj is often defined as the Euclidean distance:
T
ttjtiji xxd
1
2][),( xx
6
Voronoi Diagram
7
Characteristics of Inst-b-Learning
An instance-based learner is a lazy-learner and does all the work when the test example is presented. This is opposed to so-called eager-learners, which build a parameterised compact model of the target.
It produces local approximation to the target function (different with each test instance)
8
When to consider Nearest Neighbour algorithms?
Instances map to points in Not more then say 20 attributes per
instance Lots of training data Advantages:
– Training is very fast– Can learn complex target functions– Don’t lose information
Disadvantages:– ? (will see them shortly…)
n
9
twoone
four
three
five six
seven Eight ?
10
Training dataNumber Lines Line types Rectangles Colours Mondrian?
1 6 1 10 4 No
2 4 2 8 5 No
3 5 2 7 4 Yes
4 5 1 8 4 Yes
5 5 1 10 5 No
6 6 1 8 6 Yes
7 7 1 14 5 No
Number Lines Line types Rectangles Colours Mondrian?
8 7 2 9 4
Test instance
11
Keep data in normalised formOne way to normalise the data ar(x) to a´r(x) is
t
ttt
xxx
'
attributestofmeanx thr
attributestofdeviationndardsta tht
12
Normalised training dataNumber Lines Line
types Rectangles Colours Mondrian?
1 0.632 -0.632 0.327 -1.021 No
2 -1.581 1.581 -0.588 0.408 No
3 -0.474 1.581 -1.046 -1.021 Yes
4 -0.474 -0.632 -0.588 -1.021 Yes
5 -0.474 -0.632 0.327 0.408 No
6 0.632 -0.632 -0.588 1.837 Yes
7 1.739 -0.632 2.157 0.408 No
Number Lines Line types
Rectangles Colours Mondrian?
8 1.739 1.581 -0.131 -1.021
Test instance
13
Distances of test instance from training data
Example Distanceof testfromexample
Mondrian?
1 2.517 No
2 3.644 No
3 2.395 Yes
4 3.164 Yes
5 3.472 No
6 3.808 Yes
7 3.490 No
Classification1-NN Yes
3-NN Yes
5-NN No
7-NN No
14
What if the target function is real valued?
The k-nearest neighbour algorithm would just calculate the mean of the k nearest neighbours
15
Variant of kNN: Distance-Weighted kNN
We might want to weight nearer neighbors more heavily
Then it makes sense to use all training examples instead of just k (Stepard’s method)
2
1
1
),(1 where
)(:)(
iqik
i i
k
i iiq d
ww
fwf
xxx
x
16
Difficulties with k-nearest neighbour algorithms
Have to calculate the distance of the test case from all training cases
There may be irrelevant attributes amongst the attributes – curse of dimensionality
17
A generalisation: Locally Weighted Regression
Some useful terminology– Regression = approximating a real valued
function– Residual error = the error in approximating
the target function– Kernel function = the function of distance
that is used to determine the weight of each training example, I.e.
)),((:),(
12 iq
iqi xxdK
xxdw
18
(Locally Weighted Regression)
Note kNN forms local approximations to f for each query point Why not form an explicit local approximation for regions
surrounding the query point, e.g.– Fit a linear function to the k nearest neighbours (or a quadratic, …) e.g.
– …
)(
2
110
))(ˆ)((21)( minimize
)(...)()(ˆlet
qxkNNxq
nn
xfxfxE
xawxawwxf
19
Case-based reasoning (CBR) CBR is an advanced instance based learning
applied to more complex instance objects Objects may include complex structural
descriptions of cases & adaptation rules It doesn’t use Euclidean distance measures but
can do matching between objects using e.g. semantic nets.
It tries to model human problem-solving– uses past experience (cases) to solve new problems– retains solutions to new problems
CBR is an ongoing area of machine learning research with many applications
20
Applications of CBR Design
– landscape, building, mechanical, conceptual design of aircraft sub-systems
Planning– repair schedules
Diagnosis– medical
Adversarial reasoning– legal
21
CBR processNew Case
matching Matched Cases
Retrieve
Adapt?No
Yes
Closest CaseSuggest solution
Retain
Learn
Revise
Reuse
Case Base
Knowledge and Adaptation rules
22
CBR example: Property pricingCase Location
codeBedrooms Recep
roomsType floors Cond-
itionPrice£
1 8 2 1 terraced 1 poor 20,500
2 8 2 2 terraced 1 fair 25,000
3 5 1 2 semi 2 good 48,000
4 5 1 2 terraced 2 good 41,000
Case Locationcode
Bedrooms Receprooms
Type floors Cond-ition
Price£
5 7 2 2 semi 1 poor ???
Test instance
23
How rules are generated
Examine cases and look for ones that are almost identical– case 1 and case 2
• R1: If recep-rooms changes from 2 to 1 then reduce price by £5,000
– case 3 and case 4• R2: If Type changes from semi to terraced then
reduce price by £7,000
24
Matching Comparing test instance
– matches(5,1) = 3– matches(5,2) = 3– matches(5,3) = 2– matches(5,4) = 1
Estimate price of case 5 is £25,000
25
Adapting
Reverse rule 2– if type changes from terraced to semi then
increase price by £7,000 Apply reversed rule 2
– new estimate of price of property 5 is £32,000
26
Learning
So far we have a new case and an estimated price– nothing is added yet to the case base
If later we find house sold for £35,000 then the case would be added– could add a new rule
• if location changes from 8 to 7 increase price by £3,000
27
Problems with CBR
How should cases be represented? How should cases be indexed for fast
retrieval? How can good adaptation heuristics be
developed? When should old cases be removed?
28
Advantages
An local approximation is found for each test case
Knowledge is in a form understandable to human beings
Fast to train
29
Summary
K-Nearest Neighbour (Locally weighted regression) Case-based reasoning Lazy and eager learning
30
Lazy and Eager Learning
Lazy: wait for query before generalizing– k-Nearest Neighbour, Case based reasoning
Eager: generalize before seeing query– Radial Basis Function Networks, ID3, …
Does it matter?– Eager learner must create global approximation– Lazy learner can create many local
approximations