Outline
Gibbs Sampling Advances in Gibbs sampling
Blocking Cutset sampling (Rao-Blackwellisation)
Importance Sampling Advances in Importance Sampling Particle Filtering
Importance Sampling Theory
Let Z = X \ E (to simplify notation). Then

P(E=e) = Σ_{z ∈ Z} P(Z=z, E=e) = Σ_{z ∈ Z} Π_i P(x_i | pa(X_i)), evaluated at Z=z, E=e
Importance Sampling Theory
Given a proposal distribution Q (such that P(Z=z, E=e) > 0 ⇒ Q(Z=z) > 0):

P(E=e) = Σ_{z ∈ Z} P(Z=z, E=e)
       = Σ_{z ∈ Z} [P(Z=z, E=e) / Q(Z=z)] Q(Z=z)
       = E_Q[P(Z, E=e) / Q(Z)]     (by definition of expected value: E_Q[f(Z)] = Σ_z f(z) Q(Z=z))
       = E_Q[w(Z)]

w(Z=z) = P(Z=z, E=e) / Q(Z=z) is called the importance weight.
Importance Sampling Theory
P(E=e) = E_Q[w(Z)], where w(Z=z) = P(Z=z, E=e) / Q(Z=z)

Given a set of N samples (z^1, ..., z^N) drawn from Q:

P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, E=e) / Q(Z=z^i) = (1/N) Σ_{i=1}^{N} w(Z=z^i)

As N → ∞, P̂(E=e) → P(E=e)
Underlying principle: approximate the average over a set of numbers by the average over a set of sampled numbers.
Importance Sampling (Informally)
Express the problem as computing the average over a set of real numbers
Sample a subset of the real numbers
Approximate the true average by the sample average
True Average: average of (0.11, 0.24, 0.55, 0.77, 0.88, 0.99) = 0.59
Sample Average over 2 samples: average of (0.24, 0.77) = 0.505
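The toy calculation above can be checked in a few lines (a minimal sketch; `random.sample` picks the subset for us):

```python
import random

numbers = [0.11, 0.24, 0.55, 0.77, 0.88, 0.99]

# True average over all six numbers.
true_avg = sum(numbers) / len(numbers)      # 0.59

# Sample average over a random subset of 2 numbers.
random.seed(0)
subset = random.sample(numbers, 2)
sample_avg = sum(subset) / len(subset)      # approximates true_avg
```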
How to generate samples from Q
Express Q in product form: Q(Z)=Q(Z1)Q(Z2|Z1)….Q(Zn|Z1,..Zn-1)
Sample along the order Z1,..Zn
Example: Q(Z1)=(0.2,0.8) Q(Z2|Z1)=(0.2,0.8,0.1,0.9) Q(Z3|Z1,Z2)=Q(Z3|Z1)=(0.5,0.5,0.3,0.7)
P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, E=e) / Q(Z=z^i)
How to sample from Q
Generate a random number r uniformly between 0 and 1.
Q(Z1)=(0.2,0.8), Q(Z2|Z1)=(0.2,0.8,0.1,0.9), Q(Z3|Z1,Z2)=Q(Z3|Z1)=(0.5,0.5,0.3,0.7)
The domain of each variable is {0,1}.
Which value to select for Z1? Split the interval [0,1] at 0.2: if r < 0.2, select Z1=0; otherwise select Z1=1.
How to sample from Q?
Each sample Z=z:
    Sample Z1=z1 from Q(Z1)
    Sample Z2=z2 from Q(Z2|Z1=z1)
    Sample Z3=z3 from Q(Z3|Z1=z1)
Generate N such samples.
Given samples (z^1, ..., z^N) drawn from Q:

P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, E=e) / Q(Z=z^i) = (1/N) Σ_{i=1}^{N} w(Z=z^i)
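As a sketch of this procedure, the following draws samples from the example proposal Q(Z1)=(0.2,0.8), Q(Z2|Z1), Q(Z3|Z1)=Q(Z3|Z1,Z2) given earlier, using one uniform random number per variable (the table layout, rows indexed by the parent value, is our reading of the slide):

```python
import random

# Proposal CPTs from the slides; row index = parent value, column = child value.
Q_Z1 = [0.2, 0.8]                          # Q(Z1=0), Q(Z1=1)
Q_Z2_given_Z1 = [[0.2, 0.8], [0.1, 0.9]]   # Q(Z2|Z1=0), Q(Z2|Z1=1)
Q_Z3_given_Z1 = [[0.5, 0.5], [0.3, 0.7]]   # Q(Z3|Z1,Z2) = Q(Z3|Z1)

def draw(dist):
    """Sample an index from a discrete distribution with one uniform number:
    split [0,1] into segments of length dist[0], dist[1], ... and see where
    the random number lands."""
    r, acc = random.random(), 0.0
    for value, p in enumerate(dist):
        acc += p
        if r < acc:
            return value
    return len(dist) - 1   # guard against floating-point round-off

def sample_from_Q():
    """Sample (z1, z2, z3) along the order Z1, Z2, Z3."""
    z1 = draw(Q_Z1)
    z2 = draw(Q_Z2_given_Z1[z1])
    z3 = draw(Q_Z3_given_Z1[z1])
    return (z1, z2, z3)
```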
Likelihood weighting example
[Figure: Bayesian network — Smoking (S) with CPT P(S); lung Cancer (C) with P(C|S); Bronchitis (B) with P(B|S); X-ray (X) with P(X|C,S); Dyspnoea (D) with P(D|C,B)]
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Query: P(X=1, B=0)? (where 1 = true and 0 = false)
P(X=1, B=0) = Σ_{S,C,D} P(S) P(C|S) P(B=0|S) P(X=1|C,S) P(D|C,B=0)
Likelihood weighting example
Q=Prior
Q(S,C,D)=Q(S)*Q(C|S)*Q(D|C,B=0)
=P(S)P(C|S)P(D|C,B=0)
Sample S=s from P(S)
Sample C=c from P(C|S=s)
Sample D=d from P(D|C=c,B=0)
P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, E=e) / Q(Z=z^i)

w(Z=z) = P(Z=z, E=e) / Q(Z=z)
       = P(S=s, C=c, B=0, X=1, D=d) / [P(S=s) P(C=c|S=s) P(D=d|C=c, B=0)]
       = [P(S=s) P(C=c|S=s) P(B=0|S=s) P(X=1|C=c, S=s) P(D=d|C=c, B=0)] / [P(S=s) P(C=c|S=s) P(D=d|C=c, B=0)]
       = P(B=0|S=s) P(X=1|C=c, S=s)
The Algorithm
P̂(e) = 0
For k = 1 to N do
    w_k = 1
    For each X_i in topological order o = (X_1, ..., X_n) do
        If X_i ∈ E then
            w_k = w_k * P(e_i | pa_i)
        Else
            Sample x_i from P(X_i | pa_i)
            Assign X_i = x_i
    P̂(e) = P̂(e) + w_k
Return P̂(e) = P̂(e) / N
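A minimal Python sketch of likelihood weighting for the smoking/cancer network above, estimating P(X=1, B=0); the CPT numbers are illustrative assumptions, not values from the lecture:

```python
import random

# Hypothetical CPTs for the smoking/cancer network (illustrative numbers).
P_S = 0.3                                                  # P(S=1)
P_C_given_S = {1: 0.1, 0: 0.01}                            # P(C=1|S)
P_B_given_S = {1: 0.4, 0: 0.1}                             # P(B=1|S)
P_X_given_CS = {(1, 1): 0.9, (1, 0): 0.8,
                (0, 1): 0.2, (0, 0): 0.1}                  # P(X=1|C,S), key (c, s)
P_D_given_CB = {(1, 1): 0.9, (1, 0): 0.7,
                (0, 1): 0.6, (0, 0): 0.1}                  # P(D=1|C,B), key (c, b)

def bernoulli(p_one):
    """Sample a {0,1} value with P(1) = p_one."""
    return 1 if random.random() < p_one else 0

def likelihood_weighting(N):
    """Estimate P(X=1, B=0): sample non-evidence variables (S, C, D) from
    their CPTs in topological order; for the evidence variables (B=0, X=1)
    multiply the weight by the CPT entry instead of sampling."""
    total = 0.0
    for _ in range(N):
        s = bernoulli(P_S)
        c = bernoulli(P_C_given_S[s])
        w = 1 - P_B_given_S[s]                # evidence B=0: factor P(B=0|s)
        w *= P_X_given_CS[(c, s)]             # evidence X=1: factor P(X=1|c,s)
        d = bernoulli(P_D_given_CB[(c, 0)])   # D completes the sample (B=0);
        total += w                            # it does not affect the weight
    return total / N
```

With these CPTs the exact answer, by summing out S, C, D, is 0.11601, so the estimate should land close to that.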
How to solve belief updating?
P(X_i=x_i | E=e) = P(X_i=x_i, E=e) / P(E=e)

Estimate the numerator and denominator by importance sampling:
Numerator: evidence is X_i=x_i, E=e
Denominator: evidence is E=e

P̂(X_i=x_i | E=e) = [Σ_{j=1}^{N} x_i(z^j) w(z^j)] / [Σ_{j=1}^{N} w(z^j)]

where x_i(z^j) = 1 iff sample z^j contains X_i=x_i, and 0 otherwise.
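The ratio estimator can be sketched directly from a list of weighted samples; the sample values and weights below are made up for illustration:

```python
# Each entry is (value of X_i in the sample z^j, importance weight w(z^j)),
# as produced by some hypothetical importance-sampling run.
weighted_samples = [(1, 0.50), (0, 0.20), (1, 0.10), (1, 0.40), (0, 0.30)]

# Ratio estimator: P̂(Xi=1 | e) = sum of weights of samples with Xi=1,
# divided by the sum of all weights.
num = sum(w for x, w in weighted_samples if x == 1)   # 1.0
den = sum(w for x, w in weighted_samples)             # 1.5
posterior = num / den                                 # 2/3
```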
Difference between estimating P(E=e) and P(Xi=xi|E=e)
P̂(E=e) = (1/N) Σ_{i=1}^{N} w(z^i)

P̂(X_i=x_i | E=e) = [Σ_{j=1}^{N} x_i(z^j) w(z^j)] / [Σ_{j=1}^{N} w(z^j)]

P̂(E=e) is unbiased: E_Q[P̂(E=e)] = P(E=e)
P̂(X_i=x_i | E=e) is only asymptotically unbiased: lim_{N→∞} E_Q[P̂(X_i=x_i | E=e)] = P(X_i=x_i | E=e)
Proposal Distribution: Which is better?
The probability that |P̂(E=e) − P(E=e)| ≥ ε is bounded by Variance / (N ε²) (Chebyshev's inequality), where

Variance = Var_Q[w(Z)] = Σ_{z ∈ Z} w(z)² Q(z) − P(E=e)²

If the variance is 0, then P̂(E=e) = P(E=e), and only one sample is sufficient to compute P(E=e).
So one should prefer a low-variance proposal distribution.
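The variance formula can be checked on a two-value toy problem (the joint values 0.1 and 0.3 are assumed for illustration); the proposal proportional to P(z, e) achieves zero variance:

```python
# Toy example: P(Z=z, E=e) for z in {0, 1}; hence P(e) = 0.4.
P_joint = [0.1, 0.3]
P_e = sum(P_joint)

def weight_variance(Q):
    """Var_Q[w(Z)] = sum_z Q(z) w(z)^2 - P(e)^2, with w(z) = P(z,e)/Q(z)."""
    second_moment = sum(q * (p / q) ** 2 for p, q in zip(P_joint, Q))
    return second_moment - P_e ** 2

var_uniform = weight_variance([0.5, 0.5])    # uniform proposal: 0.04
var_optimal = weight_variance([0.25, 0.75])  # Q(z) ∝ P(z,e): variance 0
```

With zero variance every sample's weight equals P(e) exactly, which is why a single sample would suffice.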
Outline
Gibbs Sampling Advances in Gibbs sampling
Blocking Cutset sampling (Rao-Blackwellisation)
Importance Sampling Advances in Importance Sampling Particle Filtering
Research Issues in Importance Sampling
Better Proposal Distribution
- Likelihood weighting (Fung and Chang, 1990; Shachter and Peot, 1990)
- AIS-BN (Cheng and Druzdzel, 2000)
- Iterative Belief Propagation (Changhe and Druzdzel, 2003)
- Iterative Join Graph Propagation and variable ordering (Gogate and Dechter, 2005)
Research Issues in Importance Sampling (Cheng and Druzdzel 2000)
Adaptive Importance Sampling

Initial Proposal: Q^0(Z) = Q(Z_1) * Q(Z_2|pa(Z_2)) * ... * Q(Z_n|pa(Z_n))
P̂(E=e) = 0
For i = 1 to k do
    Generate samples z^1, ..., z^N from Q^{i-1}
    P̂(E=e) = P̂(E=e) + (1/N) Σ_j w(z^j)
    Update Q^{i-1} to Q^i
End For
Return P̂(E=e) = P̂(E=e) / k
Adaptive Importance Sampling
General case: given k proposal distributions, take N samples from each distribution, and approximate P(e) by

P̂(e) = (1/k) Σ_{j=1}^{k} (average weight of the j-th proposal)
Estimating Q'(z)
Q'(Z) = Q'(Z_1) * Q'(Z_2|pa(Z_2)) * ... * Q'(Z_n|pa(Z_n))

where each Q'(Z_i | Z_1, .., Z_{i-1}) is estimated by importance sampling.
Cutset importance sampling
Divide the set of variables into two parts: the cutset (C) and the remaining variables (R).

P̂(E=e) = (1/N) Σ_{j=1}^{N} [P(C=c^j) P(E=e | C=c^j)] / Q(C=c^j)

where P(E=e | C=c^j), the exact inference over the remaining variables R, is computed using elim-bel, for instance.
(Gogate and Dechter, 2005) and (Bidyuk and Dechter 2006)
Outline
Gibbs Sampling Advances in Gibbs sampling
Blocking Cutset sampling (Rao-Blackwellisation)
Importance Sampling Advances in Importance Sampling Particle Filtering
Dynamic Belief Networks (DBNs)
[Figure: two-slice DBN — a Bayesian network at time t and one at time t+1, connected by transition arcs X_t → X_{t+1}; each state X_t has an observation Y_t]
[Figure: unrolled DBN for t=0 to t=10 — chain X_0 → X_1 → ... → X_10, with each X_t the parent of Y_t]
Query
Compute P(X_{0:t} | Y_{0:t}) or P(X_t | Y_{0:t}); for example, P(X_{0:10} | Y_{0:10}) or P(X_10 | Y_{0:10})
Exact inference is hard over a long time period. Approximate! Sample!
Particle Filtering (PF)
PF = “condensation” = “sequential Monte Carlo” = “survival of the fittest”
PF can handle any type of probability distribution, non-linearity, and non-stationarity.
PFs are powerful sampling-based inference/learning algorithms for DBNs.
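A minimal bootstrap particle filter sketch for a toy binary-state DBN; the transition and observation probabilities are assumed for illustration:

```python
import random

# Toy binary-state DBN (illustrative numbers, not from the lecture).
trans = {0: 0.9, 1: 0.2}   # P(X_{t+1}=0 | X_t = key)
obs   = {0: 0.8, 1: 0.3}   # P(Y_t=0 | X_t = key)

def particle_filter(observations, N=1000):
    """Bootstrap PF: propagate particles through the transition model,
    weight them by the observation likelihood, then resample."""
    particles = [random.randint(0, 1) for _ in range(N)]   # uniform prior over X_0
    for t, y in enumerate(observations):
        if t > 0:
            # Propagate: sample X_t from P(X_t | X_{t-1}) for each particle.
            particles = [0 if random.random() < trans[x] else 1 for x in particles]
        # Weight each particle by the likelihood P(y_t | x_t).
        weights = [obs[x] if y == 0 else 1 - obs[x] for x in particles]
        # Resample in proportion to the weights ("survival of the fittest").
        particles = random.choices(particles, weights=weights, k=N)
    # Fraction of particles in state 1 estimates P(X_t=1 | Y_{0:t}).
    return sum(particles) / N
```

With observations that all favor state 1 (here Y=1, since P(Y=1|X=1)=0.7 vs. P(Y=1|X=0)=0.2), the filtered estimate of P(X_t=1) should rise well above 0.5.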