
Learning From Data
Lecture 17

Memory and Efficiency in Nearest Neighbor

Memory
Efficiency

M. Magdon-Ismail
CSCI 4100/6100


recap: Similarity and Nearest Neighbor

Similarity

d(x, x′) = ||x − x′||

[Figures: decision boundaries of the 1-NN rule and the 21-NN rule]

1. Simple.

2. No training.

3. Near optimal E_out:

   k → ∞, k/N → 0  ⟹  E_out → E*_out.

4. Good ways to choose k: k = 3; k = ⌈√N⌉; validation/cross-validation.

5. Easy to justify the classification to a customer.

6. Can easily do multi-class.

7. Can easily adapt to regression or logistic regression (see the sketch after this list):

   g(x) = (1/k) Σ_{i=1}^{k} y_[i](x)    (regression)

   g(x) = (1/k) Σ_{i=1}^{k} ⟦y_[i](x) = +1⟧    (logistic regression)

8. Computationally demanding.
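
Item 7 has a direct implementation; here is a minimal NumPy sketch of both adaptations (the function knn_predict and its arguments are illustrative, not from the lecture):

```python
import numpy as np

def knn_predict(X, y, x, k, mode="regression"):
    """k-NN prediction at a query point x.

    X: (N, d) training inputs; y: (N,) targets/labels.
    mode "regression" averages the k nearest targets;
    mode "logistic" returns the fraction of +1 labels (a probability estimate).
    """
    dists = np.linalg.norm(X - x, axis=1)   # d(x, x_n) for every point, O(Nd)
    nearest = np.argsort(dists)[:k]         # indices of the k nearest neighbors
    if mode == "regression":
        return y[nearest].mean()            # g(x) = (1/k) sum y_[i](x)
    return (y[nearest] == +1).mean()        # g(x) = (1/k) sum [[y_[i](x) = +1]]
```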


Computational Demands of Nearest Neighbor

Memory.

Need to store all the data, O(Nd) memory.

N = 10⁶, d = 100, double precision ≈ 1 GB

Finding the nearest neighbor of a test point.

Need to compute the distance to every data point, O(Nd) time.

N = 10⁶, d = 100, 3 GHz processor ≈ 3 ms (compute g(x))

N = 10⁶, d = 100, 3 GHz processor ≈ 1 hr (compute the CV error)

N = 10⁶, d = 100, 3 GHz processor > 1 month (choose the best k from among 1000 using CV)
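
To make the O(Nd) per-query cost concrete, a brute-force query looks like this (a sketch; timings will vary with hardware, and the numbers above are the lecture's):

```python
import numpy as np
import time

N, d = 10**6, 100
X = np.random.randn(N, d)     # ~800 MB in double precision, close to 1 GB
x = np.random.randn(d)

t0 = time.time()
j = np.argmin(np.linalg.norm(X - x, axis=1))   # brute-force nearest neighbor, O(Nd)
print(f"nearest neighbor {j} found in {time.time() - t0:.3f}s")
```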


Two Basic Approaches

Reduce the amount of data.

The 5-year-old does not remember every horse he has seen, only a few representative horses.

Store the data in a specialized data structure.

Developing geometric data structures that make finding nearest neighbors fast is an ongoing research field.


Throw Away Irrelevant Data

[Figures: a k = 1 data set is condensed by throwing away points that do not affect the classifier]


Decision Boundary Consistent

[Figures: condensing that leaves the decision boundary intact: g(x) unchanged for all x]


Training Set Consistent

[Figures: condensing that preserves the classification of the training points: g(xn) unchanged]


Decision Boundary Vs. Training Set Consistent

[Figures: decision boundary (DB) consistent condensing, g(x) unchanged everywhere, versus training set (TS) consistent condensing, g(xn) unchanged only at the training points]


Consistent Does Not Mean g(xn) = yn

[Figures: k = 3; DB consistent and TS consistent condensed sets for data where g(xn) ≠ yn at some training points]


Training Set Consistent (k = 3)

[Figures: training set consistent condensing with k = 3: g(xn) unchanged]


CNN: Condensed Nearest Neighbor (k = 3)

[Figure: the solid blue point is classified blue by the selected points S but red by the full data set D; the arrow marks the red point to add]

Consider the solid blue point:
i. blue w.r.t. selected points
ii. red w.r.t. D

Add a red point:
i. not already selected
ii. closest to the inconsistent point

1. Randomly select k data points into S.
2. Classify all data according to S.
3. Let x∗ be an inconsistent point and y∗ its class w.r.t. D.
4. Add the closest point to x∗ not in S that has class y∗.
5. Iterate until S classifies all points consistently with D.


Minimum consistent set (MCS)? Finding it is NP-hard, so CNN is a greedy heuristic.
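
A minimal sketch of the CNN loop above, assuming binary labels ±1 and majority vote over the k nearest neighbors; the function names and the tie-free odd k are assumptions, not part of the lecture:

```python
import numpy as np

def knn_class(X_ref, y_ref, X_query, k):
    """Classify each query point by k-NN majority vote w.r.t. a reference set (labels ±1)."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.sign(y_ref[nearest].sum(axis=1))   # odd k avoids ties

def condense(X, y, k=3, rng=np.random.default_rng(0)):
    """CNN: grow S until classification w.r.t. S matches classification w.r.t. D."""
    N = len(X)
    g_D = knn_class(X, y, X, k)                      # each point's class w.r.t. D
    S = list(rng.choice(N, size=k, replace=False))   # 1. random seed points
    while True:
        g_S = knn_class(X[S], y[S], X, k)            # 2. classify all data by S
        bad = np.flatnonzero(g_S != g_D)
        if len(bad) == 0:                            # 5. stop: consistent with D
            return S
        x_star, y_star = bad[0], g_D[bad[0]]         # 3. inconsistent point, its D-class
        # 4. closest point to x* not in S with class y* (assumed to exist here)
        cand = [n for n in range(N) if n not in S and y[n] == y_star]
        S.append(min(cand, key=lambda n: np.linalg.norm(X[n] - X[x_star])))
```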


Nearest Neighbor on Digits Data

[Figures: 1-NN rule and 21-NN rule decision regions on the digits data]


Condensing the Digits Data

[Figures: 1-NN rule and 21-NN rule decision regions after condensing the digits data]


Finding the Nearest Neighbor

1. S1, S2 are 'clusters' with centers µ1, µ2 and radii r1, r2.

2. [Branch] Search S1 first → x̂[1].

3. The distance from x to any point in S2 is at least ||x − µ2|| − r2.

4. [Bound] So we are done if ||x − x̂[1]|| ≤ ||x − µ2|| − r2.

A branch and bound algorithm.
Can be applied recursively.

[Figure: query point x in cluster S1, with cluster S2 farther away]
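
A two-cluster sketch of the branch-and-bound step (the function nn_two_clusters and its calling convention are illustrative; a full implementation applies this recursively down a cluster tree):

```python
import numpy as np

def nn_two_clusters(x, S1, S2, mu2, r2):
    """Nearest neighbor of x over S1 ∪ S2, pruning S2 when the bound allows.

    Assumes x is closer to cluster S1, so S1 is searched first.
    """
    nn1 = S1[np.argmin(np.linalg.norm(S1 - x, axis=1))]   # [Branch] best in S1
    if np.linalg.norm(x - nn1) <= np.linalg.norm(x - mu2) - r2:
        return nn1                                        # [Bound] S2 cannot do better
    nn2 = S2[np.argmin(np.linalg.norm(S2 - x, axis=1))]   # bound failed: search S2 too
    return min((nn1, nn2), key=lambda p: np.linalg.norm(x - p))
```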


When Does the Bound Hold?

Bound condition: ||x − x̂[1]|| ≤ ||x − µ2|| − r2.

Since x̂[1] ∈ S1, the triangle inequality gives ||x − x̂[1]|| ≤ ||x − µ1|| + r1.

So, it suffices that

r1 + r2 ≤ ||x − µ2|| − ||x − µ1||.

||x − µ1|| ≈ 0 means ||x − µ2|| ≈ ||µ2 − µ1||.

It suffices that

r1 + r2 ≤ ||µ2 − µ1||.

[Figure: query point x near the center of S1, with S2 well separated]

Within-cluster spread should be less than between-cluster spread.
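
The chain of inequalities, collected in one place (a restatement of the argument above):

```latex
\begin{align*}
\|x - \hat{x}_{[1]}\| &\le \|x - \mu_1\| + r_1
    && \text{(triangle inequality, since } \hat{x}_{[1]} \in S_1\text{)}\\
\|x - \mu_1\| + r_1 &\le \|x - \mu_2\| - r_2
    && \text{(suffices for the bound condition)}\\
\text{i.e.,}\quad r_1 + r_2 &\le \|x - \mu_2\| - \|x - \mu_1\|
    \;\approx\; \|\mu_2 - \mu_1\|
    && \text{(when } \|x - \mu_1\| \approx 0\text{)}
\end{align*}
```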


Finding Clusters – Lloyd’s Algorithm

1. Pick well separated centers for each cluster.

2. Compute Voronoi regions as the clusters.

3. Update the centers.

4. Update the Voronoi regions.

5. Compute centers and radii:

   µj = (1/|Sj|) Σ_{xn ∈ Sj} xn;   rj = max_{xn ∈ Sj} ||xn − µj||.
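
A compact sketch of the iteration above (standard Lloyd's algorithm; the random seeding here is a simplification, since the lecture starts from well separated centers):

```python
import numpy as np

def lloyd(X, M, iters=100, rng=np.random.default_rng(0)):
    """Cluster X (N, d) into M clusters; return centers mu, radii r, assignments."""
    mu = X[rng.choice(len(X), size=M, replace=False)]   # 1. initial centers
    for _ in range(iters):
        # 2./4. Voronoi regions: each point joins its nearest center's cluster
        assign = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
        # 3. move each center to the mean of its region
        # (assumes no region goes empty; real code would reseed empty clusters)
        new_mu = np.array([X[assign == j].mean(axis=0) for j in range(M)])
        if np.allclose(new_mu, mu):                     # converged
            break
        mu = new_mu
    # 5. radius = distance from each center to its furthest member
    r = np.array([np.linalg.norm(X[assign == j] - mu[j], axis=1).max() for j in range(M)])
    return mu, r, assign
```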


Radial Basis Functions (RBF)

k-Nearest Neighbor: only considers the k nearest neighbors, and each neighbor has equal weight.

What about using all the data to compute g(x)?

RBF: use all the data, with data further away from x getting less weight.
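
As a preview, one way to weight all the data is with a bump that decays with distance; the Gaussian kernel and the scale r below are illustrative assumptions, not from the slide:

```python
import numpy as np

def rbf_predict(X, y, x, r=1.0):
    """Predict at x using every data point, weighted by distance.

    g(x) = sum_n alpha_n(x) y_n, with alpha_n ∝ exp(-||x - x_n||^2 / (2 r^2)):
    all N points contribute, but far-away points contribute very little.
    """
    w = np.exp(-np.linalg.norm(X - x, axis=1) ** 2 / (2 * r ** 2))
    return w @ y / w.sum()
```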
