
Page 1:

Importance sampling for MC simulation(“Importance-weighted random walk”)

$$A = \int_a^b f(x)\,dx = \int_a^b \left[\frac{f(x)}{\rho(x)}\right]\rho(x)\,dx \;\approx\; \frac{1}{M}\sum_{i=1}^{M}\frac{f(x_i)}{\rho(x_i)}, \qquad x_i \in [a,b] \text{ sampled from } \rho(x)$$

Sampling points from a uniform distribution may not be the best way for MC. When most of the weight of the integral comes from a small range of x where f(x) is large, sampling more often in this region would increase the accuracy of the MC.

Page 2:

Example: Importance of small region: Measuring the depth of the Nile
(Figure: systematic quadrature / uniform sampling vs. importance sampling, an importance-weighted random walk)

Frenkel and Smit, Understanding Molecular Simulations

Page 3:

Levinthal's paradox

Cyrus Levinthal formulated the "Levinthal paradox" (late 1960s):
- Consider a protein molecule composed of (only) 100 residues,
- each of which can assume (only) 3 different conformations.
- The number of possible structures of this protein yields 3^100 ≈ 5×10^47.
- Assume that it takes (only) 100 fs to convert from one structure to another.
- It would require 5×10^34 s ≈ 1.6×10^27 years to "systematically" explore all possibilities.
- This long time disagrees with the actual folding time (μs–ms).

Example: Importance of small region: Energy funnel in protein folding

Page 4:

To decrease the error of MC simulation

σ(f) = standard deviation in the observable O, i.e., y = f(x), itself

σ_f measures how much f(x) deviates from its average over the integration region.
~ independent of the number of trials M (or N)
~ estimated from one simulation

$$\langle O \rangle - O_{\mathrm{true}} \sim \frac{\sigma_f}{\sqrt{M}} \qquad (\text{cost} \sim M;\ M = \text{number of trials},\ \sigma = \text{standard deviation})$$

(Figure: O = f(x) plotted on [a, b] with area A, average ⟨f⟩, and sample points x_1, x_2, …, x_i, …, x_M; the distribution p(O) of sampled values in the ideal case (σ_f = 0) vs. the non-ideal, real case (σ_f > 0))

$$A = \int_a^b f(x)\,dx = \int_a^b \left[\frac{f(x)}{\rho(x)}\right]\rho(x)\,dx \;\approx\; \frac{1}{M}\sum_{i=1}^{M}\frac{f(x_i)}{\rho(x_i)}, \qquad x_i \in [a,b] \text{ sampled from } \rho(x)$$

Importance sampling
(Figure: a fluctuating, varying function f gives a broad distribution of sampled values; the flat function f/ρ gives a sharp (probability) distribution)

Page 5:

Importance sampling for MC simulation: Example
(Figure: results from N-step uniform sampling vs. normalized importance sampling weighted by w(x); roughly a factor of 3 gain in accuracy)

Page 6:

Lab 3: Importance sampling for MC simulation

Page 7:

What's new in Lab 3: Importance sampling

* Calculate the normalization constant N for each probability distribution function ρ(x).

* Include "cpu.h", compile "cpu.c", and call "cpu()" to measure the CPU time of the run.

  tstart = cpu();
  ~
  tend = cpu();
  printf("CPU time: %5.5f s, CPU time/measure: %g s", tend - tstart, (tend - tstart) / M);

* Include "ran3.h" & "fran3.h"; compile fran3.c; call "fexp" or "flin" defined in fran3.c.

  /* uniform sampling */
  r = ran3(&seed);
  f_x = exp(-x*x);

  /* importance sampling with fexp */
  r = ran3(&seed);
  rho = fexp(r, &x);
  f_x = exp(-x*x) / rho;

  /* importance sampling with flin */
  r = ran3(&seed);
  rho = flin(r, &x);
  f_x = exp(-x*x) / rho;

* Display histograms for the distribution of x values generated by fran3 (for step 4 only).

  /* uniform sampling */
  r = ran3(&seed);
  f_x = exp(-x*x);
  i_hist = (int) (x * inv_dx);
  if (i_hist < n_hist) hist[i_hist] += 1.0;

  /* importance sampling with fexp */
  r = ran3(&seed);
  rho = fexp(r, &x);
  f_x = exp(-x*x) / rho;
  i_hist = (int) (x * inv_dx);
  if (i_hist < n_hist) hist[i_hist] += 1.0;

  /* importance sampling with flin */
  r = ran3(&seed);
  rho = flin(r, &x);
  f_x = exp(-x*x) / rho;
  i_hist = (int) (x * inv_dx);
  if (i_hist < n_hist) hist[i_hist] += 1.0;

Page 8:

/* Number of bins in histogram */
n_hist = 50;

/* Allocate memory for histogram */
hist = (double *) allocate_1d_array(n_hist, sizeof(double));

/* Initialize histogram */
for (i_hist = 0; i_hist < n_hist; ++i_hist) hist[i_hist] = 0.0;

/* Size of the histogram bins */
dx = 1.0 / n_hist;   /* 1.0 is the size of the interval */
inv_dx = 1.0 / dx;

/* Accumulate histogram */
i_hist = (int) (x * inv_dx);
if (i_hist < n_hist) hist[i_hist] += 1.0;

/* Write histogram. */
fp = fopen("hist_2.dat", "w");
for (i_hist = 0; i_hist < n_hist; ++i_hist) {
    x = (i_hist + 0.5) * dx;
    fprintf(fp, "%g %g\n", x, hist[i_hist] / M);
}
fclose(fp);

* Display histograms for the distribution of x values generated by fran3 (full version).

Plot with gnuplot, Excel, Origin, … Fit to a function. What is the resulting function?

Page 9:

Further reading: Sampling a non-uniform & discrete probability distribution {p_i} (tower sampling)

(Ref) Gould, Tobochnik, Christian, Ch. 11.5

Page 10:

Further reading: Sampling a non-uniform & continuous probability distribution ρ(x)

Page 11:

Example

Page 12:
Page 13:
Page 14:
Page 15:

Lab 3: Importance sampling for MC simulation

(Figure: the integrand f(x) and three candidate sampling functions w(x), ρ(x), p(x), each normalized: a constant distribution function (uniform), a good distribution function, and a bad distribution function)

Page 16:

Results: Importance sampling for MC simulation
One experiment with M measures

n = 100 experiments, each with M measures

Page 17:

Results: Importance sampling for MC simulation
One experiment with M measures

n = 100 experiments, each with M measures

Page 18:

$$A = \int_a^b f(x)\,dx = \int_a^b \left[\frac{f(x)}{\rho(x)}\right]\rho(x)\,dx \;\approx\; \frac{1}{M}\sum_{i=1}^{M}\frac{f(x_i)}{\rho(x_i)}, \qquad x_i \in [a,b] \text{ sampled from } \rho(x)$$

Analogy: Throw a die with the results {1, 2, 2, 2, 2, 4, 5, 6}.

(Discrete) probability distribution {p_i, i = 1–6}
  = {1, 4, 0, 1, 1, 1} / 8, where 8 = 1+4+0+1+1+1
  = {1/8, 1/2, 0, 1/8, 1/8, 1/8}

Mean value ⟨A⟩ = 3 = 24/8
  = (1+2+2+2+2+4+5+6) / 8
  = (1 + 2×4 + 4 + 5 + 6) / 8
  = 1×1/8 + 2×4/8 + 3×0/8 + 4×1/8 + 5×1/8 + 6×1/8

Importance sampling = Importance-weighted average

$$\langle f \rangle = \sum_{i=1}^{M} f(x_i)\,p(x_i) \qquad \text{for normalized } p(x),\ \sum_{i=1}^{M} p(x_i) = 1$$

$$\langle f \rangle = \frac{\sum_{i=1}^{M} f(x_i)\,p(x_i)}{\sum_{i=1}^{M} p(x_i)} \qquad \text{otherwise}$$

$$\langle f \rangle = \frac{1}{M}\sum_{i=1}^{M} f(x_i) \qquad \text{for constant } p(x_i) = 1/M$$

Page 19:

Beyond 1D integrals: A system of N particles in a container of a volume V in contact with a thermostat T (constant NVT): the "canonical ensemble" (an external constraint)

$$\rho(r^N) = \frac{e^{-\beta U(r^N)}}{Z}, \quad \beta = 1/kT \qquad \left(\text{or } p_i = \frac{e^{-\beta E_i}}{Z} \text{ for discrete microstates}\right)$$

• Particles interact with each other through a potential energy U(r^N) (~ pair potential).

• U(r^N) is the potential energy of a microstate {r^N} = {x_1, y_1, z_1, …, x_N, y_N, z_N}.

• ρ(r^N) is the probability to find a microstate {r^N} under the constant-NVT constraint.

• Partition function Z (required for normalization) = the weighted sum of all the microstates compatible with the constant-NVT condition:

$$Z = \int dr^N\, e^{-\beta U(r^N)} \qquad \left(\text{or } Z = \sum_i e^{-\beta E_i} \text{ for discrete microstates}\right)$$

• Average of an observable O, ⟨O⟩, over all the microstates compatible with constant NVT (the ensemble average):

$$\langle O \rangle = \int dr^N\, O(r^N)\,\rho(r^N) \qquad \left(\text{or } \langle O \rangle = \sum_i O_i\, p_i \text{ for discrete microstates}\right)$$