
Page 1

Introduction to Machine Learning

Active Learning

Barnabás Póczos

Page 2

Credits

Some of the slides are taken from Nina Balcan.

Page 3

Classic Supervised Learning Paradigm is Insufficient Nowadays

Modern applications: massive amounts of raw data (billions of webpages, images, sensor measurements). Only a tiny fraction can be annotated by human experts.

Page 4

Modern ML: New Learning Approaches

Modern applications: massive amounts of raw data, but expert labeling is scarce. The Large Synoptic Survey Telescope alone produces 15 terabytes of data every night. We need techniques that minimize the need for expert/human intervention => Active Learning.

Page 5

Contents

▪ Active Learning Intro
  ▪ Batch Active Learning vs. Selective Sampling Active Learning
  ▪ Exponential improvement on # of labels
  ▪ Sampling bias: Active Learning can hurt performance
▪ Active Learning with SVM
▪ Gaussian Processes
  ▪ Regression
  ▪ Properties of Multivariate Gaussian distributions
  ▪ Ridge regression
  ▪ GP = Bayesian Ridge Regression + Kernel trick
▪ Active Learning with Gaussian Processes

Page 6

Additional resources

• Two Faces of Active Learning. Sanjoy Dasgupta. 2011.
• Active Learning. Burr Settles. 2012.
• Active Learning. Balcan and Urner. Encyclopedia of Algorithms. 2015.

Page 7

Batch Active Learning

Underlying data distribution D. The data source provides a set of unlabeled examples. The learning algorithm repeatedly sends the expert a request for the label of an example, and the expert returns a label for that example. The algorithm outputs a classifier w.r.t. D.

• Learner can choose specific examples to be labeled.
• Goal: use fewer labeled examples [pick informative examples to be labeled].

Page 8

Selective Sampling Active Learning

Underlying data distribution D. Unlabeled examples x1, x2, x3, … arrive one at a time from the data source. For each example, the learning algorithm decides whether to request its label from the expert or let it go (e.g., request a label y1 for x1, let x2 go, request a label y3 for x3). The algorithm outputs a classifier w.r.t. D.

• Selective sampling AL (Online AL): stream of unlabeled examples; when each arrives, make a decision to ask for its label or not.
• Goal: use fewer labeled examples [pick informative examples to be labeled].

Page 9

What Makes a Good Active Learning Algorithm?

• Need to choose the label requests carefully, to get informative labels.
• Guaranteed to output a relatively good classifier for most learning problems.
• Doesn't make too many label requests; hopefully a lot fewer than passive learning.

Page 10

Can adaptive querying really do better than passive/random sampling?

• YES! (sometimes)
• We often need far fewer labels for active learning than for passive learning.
• This is predicted by theory and has been observed in practice.

Page 11

Can adaptive querying help? [CAL92, Dasgupta04]

• Threshold fns on the real line: $h_w(x) = \mathbf{1}(x \ge w)$, $C = \{h_w : w \in \mathbb{R}\}$

Active Algorithm
• Get N unlabeled examples.
• How can we recover the correct labels with ≪ N queries?
• Do binary search (query at the halfway point)! Just need O(log N) labels!
• Output a classifier consistent with the N inferred labels.
• With N = O(1/ϵ), we are guaranteed to get a classifier of error ≤ ϵ.

Passive supervised: Ω(1/ϵ) labels to find an ϵ-accurate threshold.
Active: only O(log 1/ϵ) labels. Exponential improvement!
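To make the binary-search argument concrete, here is a minimal Python sketch (the function names, the oracle, and the hidden threshold are illustrative assumptions, not from the slides): it recovers the labels of N points with O(log N) label queries.

```python
import numpy as np

def active_threshold_labels(xs, query_label):
    """Infer all labels for a 1D threshold classifier via binary search.

    Only O(log N) calls to query_label are made, versus N for passive labeling."""
    xs = np.sort(xs)
    lo, hi = 0, len(xs)  # invariant: labels are - before lo, + from hi on
    while lo < hi:
        mid = (lo + hi) // 2
        if query_label(xs[mid]) == 1:  # mid lies on the + side
            hi = mid
        else:                          # mid lies on the - side
            lo = mid + 1
    labels = np.where(np.arange(len(xs)) >= lo, 1, -1)
    return xs, labels

# Illustrative usage with a hidden threshold w = 0.37
rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, size=1000)
xs_sorted, labels = active_threshold_labels(xs, lambda x: 1 if x >= 0.37 else -1)
```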

Page 12

Active SVM

Uncertainty sampling in SVMs is common and quite useful in practice.

Active SVM Algorithm
• At any time during the algorithm, we have a "current guess" w_t of the separator: the max-margin separator of all labeled points so far.
• Request the label of the example closest to the current separator.

E.g., [Tong & Koller, ICML 2000; Schohn & Cohn, ICML 2000; Jain, Vijayanarasimhan & Grauman, NIPS 2010]

Page 13

Active SVM

Active SVM seems to be quite useful in practice.

Algorithm (batch version)
Input: S_u = {x_1, …, x_{m_u}} drawn i.i.d. from the underlying source D.
Start: query for the labels of a few random x_i's.
For t = 1, …:
• Find w_t, the max-margin separator of all labeled points so far.
• Request the label of the example closest to the current separator, i.e., minimizing |x_i ⋅ w_t| (highest uncertainty).

[Tong & Koller, ICML 2000; Jain, Vijayanarasimhan & Grauman, NIPS 2010]
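A minimal sketch of the batch loop above, using scikit-learn's SVC; the pool (assumed a numpy array), oracle, seed size, and query budget are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC

def active_svm(pool, oracle, n_seed=5, n_queries=50, seed=0):
    """Uncertainty sampling for SVMs: repeatedly request the label of
    the pool point closest to the current max-margin separator."""
    rng = np.random.default_rng(seed)
    # seed set: assumed to contain both classes (required to fit an SVM)
    labeled = list(rng.choice(len(pool), size=n_seed, replace=False))
    labels = {i: oracle(pool[i]) for i in labeled}
    clf = SVC(kernel="linear", C=1.0)
    for _ in range(n_queries):
        clf.fit(pool[labeled], [labels[i] for i in labeled])
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        # |x_i . w_t| up to scale: distance to the current separator
        margins = np.abs(clf.decision_function(pool[unlabeled]))
        pick = unlabeled[int(np.argmin(margins))]  # highest uncertainty
        labels[pick] = oracle(pool[pick])
        labeled.append(pick)
    return clf
```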

Page 14

DANGER!!!

• Uncertainty sampling works sometimes…
• However, we need to be very, very careful!!!
• Myopic, greedy techniques can suffer from sampling bias. (The active learning algorithm samples from a different (x, y) distribution than the true data.)
• A bias created because of the querying strategy: as time goes on, the sample is less and less representative of the true data source. [Dasgupta10]

Page 15

DANGER!!!

• Main tension: we want to choose informative points, but we also want to guarantee that the classifier we output does well on true random examples from the underlying distribution.
• Observed in practice too!!!!

Page 16

Other Interesting Active Learning Techniques Used in Practice

Interesting open question: to analyze under what conditions they are successful.

Page 17

Density-Based Sampling

Query the centroid of the largest unsampled cluster. [Jaime G. Carbonell]

Page 18

Uncertainty Sampling

Query the point closest to the decision boundary (Active SVM). [Jaime G. Carbonell]

Page 19

Maximal Diversity Sampling

Query the point maximally distant from the labeled x's. [Jaime G. Carbonell]

Page 20

Ensemble-Based Possibilities

• Uncertainty + diversity criteria
• Density + uncertainty criteria

[Jaime G. Carbonell]
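A minimal sketch of how such combined criteria can be turned into a single acquisition score; the particular weights, the exponential density estimate, and the score definitions are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def ensemble_pick(clf, pool, labeled_idx, w_unc=1.0, w_div=0.5, w_den=0.5):
    """Pick the next query by a weighted uncertainty + diversity +
    density score over the unlabeled pool."""
    unlabeled = np.setdiff1d(np.arange(len(pool)), labeled_idx)
    X = pool[unlabeled]
    # uncertainty: inverse distance to the decision boundary
    unc = 1.0 / (np.abs(clf.decision_function(X)) + 1e-8)
    # diversity: distance to the nearest labeled point
    div = euclidean_distances(X, pool[labeled_idx]).min(axis=1)
    # density: average similarity to the rest of the pool
    den = np.exp(-euclidean_distances(X, pool)).mean(axis=1)
    score = w_unc * unc + w_div * div + w_den * den
    return int(unlabeled[np.argmax(score)])
```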

Page 21

What You Should Know so far

• Active learning can be really helpful and can provide exponential improvements in label complexity (both theoretically and practically)!
• Need to be very careful due to sampling bias.
• Common heuristics (e.g., those based on uncertainty sampling).

Page 22

Gaussian Processes for Regression

Page 23

Additional resources

http://www.gaussianprocess.org/

Some of these slides are taken from D. Lizotte, R. Parr, C. Guestrin.

Page 24

Additional resources

• Nonmyopic Active Learning of Gaussian Processes: An Exploration–Exploitation Approach. A. Krause and C. Guestrin, ICML 2007.
• Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. A. Krause, A. Singh, and C. Guestrin, Journal of Machine Learning Research 9 (2008).
• Bayesian Active Learning for Posterior Estimation. Kandasamy, K., Schneider, J., and Poczos, B., International Joint Conference on Artificial Intelligence (IJCAI), 2015.

Page 25

Why GPs for Regression?

Regression methods:
Linear regression, multilayer perceptron, ridge regression, support vector regression, kNN regression, etc.

Motivation:
All the above regression methods give point estimates. We would like a method that also provides confidence in its estimates.

Application in Active Learning:
This method can be used for active learning: query the next point and its label where the uncertainty is the highest, as in the sketch below.
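A minimal sketch of that query rule with scikit-learn's GaussianProcessRegressor; the kernel, its length scale, and the noise level are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_active_query(X_lab, y_lab, X_pool):
    """Fit a GP to the labeled data and return the candidate point with
    the highest predictive uncertainty (the next label to request)."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
    gp.fit(X_lab, y_lab)
    _, std = gp.predict(X_pool, return_std=True)
    return X_pool[int(np.argmax(std))]
```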

Page 26

Why GPs for Regression?

GPs can answer the following questions:
• Here's where the function will most likely be (the expected function).
• Here are some examples of what it might look like (sampling from the posterior distribution [blue, red, green functions]).
• Here is a prediction of what you'll see if you evaluate your function at x', with confidence.

Page 27

Properties of Multivariate Gaussian Distributions

Page 28

1D Gaussian Distribution

Parameters:
• Mean, $\mu$
• Variance, $\sigma^2$
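For reference, the standard 1D Gaussian density with these parameters:

\[
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
\]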

Page 29

Multivariate Gaussian

Page 30

Multivariate Gaussian

A 2-dimensional Gaussian is defined by
• a mean vector $\mu = [\mu_1, \mu_2]$
• a covariance matrix:

\[
\Sigma = \begin{pmatrix} \sigma_{1,1}^2 & \sigma_{1,2}^2 \\ \sigma_{2,1}^2 & \sigma_{2,2}^2 \end{pmatrix}
\]

where $\sigma_{i,j}^2 = E[(x_i - \mu_i)(x_j - \mu_j)]$ is the (co)variance.

Note: $\Sigma$ is symmetric and "positive semi-definite": $\forall x: x^\top \Sigma x \ge 0$.

Page 31

Multivariate Gaussian examples

$\mu = (0,0), \quad \Sigma = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}$

Page 32

Multivariate Gaussian examples

$\mu = (0,0), \quad \Sigma = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}$

Page 33

Useful Properties of Gaussians

Marginal distributions of Gaussians are Gaussian.

Given:
\[
(x_a, x_b) \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} \right)
\]

The marginal distribution is: $x_a \sim \mathcal{N}(\mu_a, \Sigma_{aa})$.

Page 34

Marginal distributions of Gaussians are Gaussian

Page 35

Block Matrix Inversion

Definition: Schur complements
Theorem: block matrix inversion
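For reference, the standard definition and theorem (a reconstruction; the block notation is assumed):

\[
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad
M/D := A - B D^{-1} C \quad (\text{the Schur complement of } D)
\]
\[
M^{-1} = \begin{pmatrix}
(M/D)^{-1} & -(M/D)^{-1} B D^{-1} \\
-D^{-1} C (M/D)^{-1} & \; D^{-1} + D^{-1} C (M/D)^{-1} B D^{-1}
\end{pmatrix}
\]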

Page 36

Useful Properties of Gaussians

Conditional distributions of Gaussians are Gaussian.

Notation:
\[
(x_a, x_b) \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} \right)
\]

Conditional distribution:
\[
x_a \mid x_b \sim \mathcal{N}\!\big( \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} \big)
\]
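A small numpy check of the conditioning formula in the bivariate case (the numbers are illustrative):

```python
import numpy as np

# Joint Gaussian over (x_a, x_b) with illustrative parameters
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

x_b = 1.5  # observed value of x_b
# Conditional mean and variance of x_a given x_b
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x_b - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
print(cond_mean, cond_var)  # 1.2, 0.36
```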

Page 37

Higher Dimensions

Visualizing > 3-dimensional Gaussian random variables is… difficult. Means and variances of the marginal variables are practical, but then we don't see correlations between those variables.

Visualizing an 8-dimensional Gaussian variable f: [figure: indices 1, …, 8 on the horizontal axis, with the marginal mean µ(6) and variance σ²(6) marked]. Marginals are Gaussian, e.g., f(6) ~ N(µ(6), σ²(6)).

Page 38

Yet Higher Dimensions

Why stop there?

Page 39

Getting Ridiculous

Why stop there?

Page 40

Gaussian Process

Definition of GP:
• A probability distribution indexed by an arbitrary set (integers, reals, finite-dimensional vectors, etc.).
• Each element (indexed by x) is a Gaussian distribution over the reals with mean µ(x).
• These distributions are dependent/correlated as defined by k(x, z).
• Any finite subset of indices defines a multivariate Gaussian distribution.

Page 41

Gaussian Process

A distribution over functions…

If our regression model is a GP, then it won't give just a point estimate anymore: it can provide regression estimates with confidence.

The domain (index set) of the functions can be pretty much anything:
• Reals
• Real vectors
• Graphs
• Strings
• Sets
• …

Page 42

Bayesian Updates for GPs

• How can we do regression and learn the GP from data?
• We will be Bayesians today:
  • Start with a GP prior.
  • Get some data.
  • Compute a posterior.

Page 43

Samples from the prior distribution

Picture is taken from Rasmussen and Williams
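A minimal sketch of drawing such prior samples with a squared-exponential kernel; the grid, length scale, and jitter are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(xs, zs, length_scale=0.5):
    """Squared-exponential covariance k(x, z)."""
    d2 = (xs[:, None] - zs[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

xs = np.linspace(-5, 5, 200)
K = rbf_kernel(xs, xs) + 1e-8 * np.eye(len(xs))  # jitter for stability
L = np.linalg.cholesky(K)
rng = np.random.default_rng(0)
samples = L @ rng.standard_normal((len(xs), 3))  # three zero-mean prior draws
```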

Page 44

Samples from the posterior distribution

Picture is taken from Rasmussen and Williams

Page 45

Prior: zero-mean Gaussians with covariance k(x, z)

Page 46

Data

Page 47

Posterior

Page 48

Ridge Regression

Linear regression vs. ridge regression (standard forms below).

The Gaussian process is a Bayesian generalization of kernelized ridge regression.
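For reference, the standard objectives (a reconstruction; $X$ is the design matrix, $y$ the targets, $\lambda > 0$ the ridge parameter, following Rasmussen and Williams' conventions):

\[
\text{Linear regression:}\quad \hat{w} = \arg\min_w \|y - X^\top w\|^2
\]
\[
\text{Ridge regression:}\quad \hat{w} = \arg\min_w \|y - X^\top w\|^2 + \lambda \|w\|^2
\]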

Page 49

Weight Space View

GP = Bayesian ridge regression in feature space + the kernel trick to carry out the computations.

The training data: pairs $(x_i, y_i)$, $i = 1, \dots, n$.

Page 50

Bayesian Analysis of Linear Regression with Gaussian Noise

Linear regression: $f(x) = x^\top w$
Linear regression with noise: $y = x^\top w + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$

Page 51

Bayesian Analysis of Linear Regression with Gaussian Noise

The likelihood: $p(y \mid X, w) = \mathcal{N}(X^\top w, \sigma_n^2 I)$

Page 52

Bayesian Analysis of Linear Regression with Gaussian Noise

The prior: $w \sim \mathcal{N}(0, \Sigma_p)$

Now we can calculate the posterior: $p(w \mid X, y) \propto p(y \mid X, w)\, p(w)$

Page 53

Bayesian Analysis of Linear Regression with Gaussian Noise

After "completing the square", the posterior is Gaussian; its mode (the MAP estimate) coincides with the ridge-regression solution.
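For reference, the standard posterior (a reconstruction following Rasmussen and Williams' conventions, with $X$ the $d \times n$ design matrix):

\[
p(w \mid X, y) = \mathcal{N}\!\left( \bar{w},\; A^{-1} \right), \qquad
\bar{w} = \frac{1}{\sigma_n^2} A^{-1} X y, \qquad
A = \frac{1}{\sigma_n^2} X X^\top + \Sigma_p^{-1}
\]

The MAP estimate $\bar{w}$ matches the ridge solution, and the posterior covariance $A^{-1}$ involves only $X$, not $y$, which is the "strange property" noted on the next slide.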

Page 54

Bayesian Analysis of Linear Regression with Gaussian Noise

This posterior covariance matrix doesn't depend on the observations y. A strange property of Gaussian processes!

Page 55

Projections of Inputs into Feature Space

Bayesian linear regression suffers from limited expressiveness.

To overcome the problem, go to a feature space and do linear regression there, with
(a) explicit features, or
(b) implicit features (kernels).

Page 56

Explicit Features

Linear regression in the feature space: $f(x) = \phi(x)^\top w$

Page 57

Explicit Features

The predictive distribution after the feature map:

Reminder: this is what we had without feature maps:

Page 58

Explicit Features

Shorthands:

The predictive distribution after the feature map:

Page 59

Explicit Features

The predictive distribution after the feature map (*):

A problem with (*) is that it needs an N×N matrix inversion…

Theorem: (*) can be rewritten:

Page 60

Proofs

• Mean expression. We need:
• Variance expression. We need:

Matrix inversion lemma:
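For reference, the matrix inversion (Woodbury) lemma in its standard form (a reconstruction; the symbols are the usual conventions):

\[
(Z + UWV^\top)^{-1} = Z^{-1} - Z^{-1} U \left( W^{-1} + V^\top Z^{-1} U \right)^{-1} V^\top Z^{-1}
\]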

Page 61

From Explicit to Implicit Features

Page 62

From Explicit to Implicit Features

Lemma: the feature space always appears in the form of inner products of the features.

No need to know the explicit N-dimensional features; their inner product is enough.

Page 63

GP pseudo code

Inputs:

Page 64

GP pseudo code (continued)

Outputs:
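The pseudo code itself is reconstructed here as a minimal Python sketch of the standard Cholesky-based GP regression algorithm (as in Rasmussen and Williams, Algorithm 2.1); the kernel is assumed to be passed in as a function.

```python
import numpy as np

def gp_regression(X, y, X_star, kernel, noise_var):
    """GP regression: predictive mean, predictive covariance, and log
    marginal likelihood (Rasmussen & Williams, Algorithm 2.1)."""
    n = len(X)
    K = kernel(X, X)                          # train covariance
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = kernel(X, X_star)                   # train-test covariance
    mean = K_s.T @ alpha                      # predictive mean
    v = np.linalg.solve(L, K_s)
    cov = kernel(X_star, X_star) - v.T @ v    # predictive covariance
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
              - 0.5 * n * np.log(2 * np.pi))
    return mean, cov, log_ml
```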

Page 65

Results

Page 66

Results using Netlab, sin function

Page 67

Results using Netlab, Sin function

Increased # of training points

Page 68

Results using Netlab, Sin function

Increased noise

Page 69

Results using Netlab, Sinc function

Page 70

Applications: Sensor Placement

Temperature modeling with a GP.

Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. A. Krause, A. Singh, and C. Guestrin, Journal of Machine Learning Research (2008).

Page 71

Applications: Sensor Placement

Entropy criterion vs. mutual information criterion (formulas reconstructed below).

An example of placements chosen using the entropy and mutual information criteria on temperature data. Diamonds indicate the positions chosen using entropy; squares the positions chosen using MI.
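For reference, the greedy selection rules from the Krause, Singh, and Guestrin paper (a reconstruction; $V$ is the set of candidate locations and $A$ the locations already chosen):

\[
\text{Entropy criterion:}\quad y^* = \arg\max_{y \in V \setminus A} H(X_y \mid X_A)
\]
\[
\text{MI criterion:}\quad y^* = \arg\max_{y \in V \setminus A} \Big( H(X_y \mid X_A) - H\big(X_y \mid X_{V \setminus (A \cup \{y\})}\big) \Big)
\]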

Page 72

What You Should Know

• Properties of multivariate Gaussian distributions
• Gaussian process = Bayesian ridge regression
• GP algorithm
• GP application in active learning

Page 73

Thanks for your attention! ☺