184 J. Opt. Soc. Am. A/Vol. 26, No. 1 /January 2009 Yevick et al.

Comparison of transition-matrix sampling procedures

David Yevick,1,* Michael Reimer,1 and Bjarne Tromborg2

1Department of Physics and Astronomy, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada, N2L 3G1

2Technical University of Denmark, Department of Photonics Engineering, DK-2800 Kgs. Lyngby, Denmark

*Corresponding author: [email protected]

Received July 9, 2008; revised September 18, 2008; accepted October 16, 2008; posted November 13, 2008 (Doc. ID 98507); published December 24, 2008

We compare the accuracy of the multicanonical procedure with that of transition-matrix models of static and dynamic communication system properties incorporating different acceptance rules. We find that for appropriate ranges of the underlying numerical parameters, algorithmically simple yet highly accurate procedures can be employed in place of the standard multicanonical sampling algorithm. © 2008 Optical Society of America

OCIS codes: 060.0060, 060.2330, 060.2400, 260.5430.

1. INTRODUCTION

Multicanonical methods [1] have found numerous applications to optical systems since they were first adapted to communications theory in [2–4]. Although only static quantities, such as the probability distribution function (pdf), $p(\mathbf{E})$, that a system is in a configuration characterized by values $\mathbf{E}$ of its observables, can be calculated with multicanonical techniques, we recently removed this restriction by a suitable implementation of the transition-matrix method [5,6]. Our analysis employed different techniques for generating statistical samples. The first of these is based on a modified multicanonical acceptance rule that retains the accuracy of the standard multicanonical method but converges faster and is simpler to program. A second procedure uniformly samples different regions of the pdf but is generally less accurate. Finally, we have modified this procedure to increase the algorithmic precision without affecting the sampling probabilities. Here we examine in greater detail the relative advantages of these three procedures, with emphasis on discretization errors associated with the histogram bin widths.

2. MULTICANONICAL PROCEDURE

As our work partly incorporates the multicanonical algorithm [7], we first summarize this procedure. Consider a system described in general by an $N_E$-component vector of observables $\mathbf{E}(\mathbf{a}) = (E_1(\mathbf{a}), E_2(\mathbf{a}), \ldots, E_{N_E}(\mathbf{a}))^T$ that depend on the elements of an $N_a$-component vector of stochastically varying parameters $\mathbf{a}$. A stochastic function such as the joint pdf, $p(\mathbf{E})$, can be obtained by partitioning an appropriate region of the observable space into $N_B$ bins labeled by the index $m = 1, 2, \ldots, N_B$. The elements of two histograms, one of which will contain the unnormalized pdf estimate after $i$ iterations, $\mathsf{p}_m^{(i)}$ (initially $i = 0$), and a second $\mathsf{H}_m$ that stores temporary data, are then set to unity (we identify discretized histogram variables with sans serif font).

1084-7529/09/010184-4/$15.00 © 2009 Optical Society of America

In the first iteration, a random set of the system variables $\mathbf{a}_{\mathrm{cur}}$ is selected, and the histogram entry $\mathsf{H}_{m_{\mathrm{cur}}}$ corresponding to $\mathbf{E}(\mathbf{a}_{\mathrm{cur}})$ with bin index $m_{\mathrm{cur}}$ is increased by one. A Markov chain is implemented according to $\mathbf{a}_{\mathrm{new}} = \mathbf{a}_{\mathrm{cur}} + \delta\mathbf{a}$, transforming $\mathbf{E}(\mathbf{a}_{\mathrm{cur}})$ and $m_{\mathrm{cur}}$ into $\mathbf{E}(\mathbf{a}_{\mathrm{new}})$ and $m_{\mathrm{new}}$, where each element of the parameter variation $\delta\mathbf{a}$ can be generated stochastically by an effectively arbitrary distribution function [8]. The resulting Markov transition is accepted with probability $\min(1, \mathsf{p}^{(0)}_{m_{\mathrm{cur}}} / \mathsf{p}^{(0)}_{m_{\mathrm{new}}})$, in which case $\mathbf{a}_{\mathrm{new}}$ is employed as the new $\mathbf{a}_{\mathrm{cur}}$. After a specified number of steps $M$, an improved estimate $\mathsf{p}_m^{(i)}$ of $p(\mathbf{E})$ is generated from the previous estimate according to $\mathsf{p}_m^{(i)} = c\,\mathsf{p}_m^{(i-1)} \mathsf{H}_m$ for each $m = 1, 2, \ldots, N_B$, in which the constant $c$ normalizes this estimate of $\mathsf{p}_m^{(i)}$ and multiplication by $\mathsf{p}_m^{(i-1)}$ eliminates the bias introduced into $\mathsf{H}_m$ through the acceptance rule. All elements of the histogram $\mathsf{H}_m$, $m = 1, 2, \ldots, N_B$, are then reset to unity and the above procedure is iterated. Since the $\mathsf{p}_m^{(i-1)}$ remain unity in unvisited histogram bins, the transitions among such states follow an unbiased Monte Carlo probability distribution, while the remaining transitions preferentially sample $\mathbf{E}$ values for which $p(\mathbf{E})$ is small. Accordingly, the relative frequency of transitions into and among unsampled states is enhanced. As the number of iterations becomes larger, however, the histogram bins $\mathsf{H}_m$ gradually become equally sampled [2].
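The iteration summarized above can be illustrated with a minimal Python sketch. The toy system is an assumption made purely for illustration: the parameters are four numbers in [0, 1), the observable is their mean, and the names `bin_index`, `multicanonical`, and `mean_observable` are hypothetical. The perturbation is drawn from a uniform distribution and wrapped back into [0, 1), which is one of many admissible proposal choices, and the constant c is taken here to normalize the estimate to unit sum.

```python
import random

def bin_index(e, e_min, e_max, n_bins):
    """Map an observable value to a 0-based histogram bin, clamped to the valid range."""
    m = int((e - e_min) / (e_max - e_min) * n_bins)
    return min(max(m, 0), n_bins - 1)

def multicanonical(observable, n_params, n_bins, e_min, e_max,
                   n_iter=5, n_steps=5000, step=0.1, seed=0):
    """Return a normalized pdf estimate over n_bins histogram bins."""
    rng = random.Random(seed)
    p = [1.0] * n_bins                       # pdf estimate, initially unity
    a_cur = [rng.random() for _ in range(n_params)]
    m_cur = bin_index(observable(a_cur), e_min, e_max, n_bins)
    for _ in range(n_iter):
        h = [1.0] * n_bins                   # temporary histogram H, reset to unity
        for _ in range(n_steps):
            # Proposal a_new = a_cur + delta_a (assumed: uniform step, wrapped into [0, 1))
            a_new = [(x + rng.uniform(-step, step)) % 1.0 for x in a_cur]
            m_new = bin_index(observable(a_new), e_min, e_max, n_bins)
            # Multicanonical rule: accept with probability min(1, p[m_cur] / p[m_new])
            if rng.random() < min(1.0, p[m_cur] / p[m_new]):
                a_cur, m_cur = a_new, m_new
            h[m_cur] += 1.0                  # count the bin occupied after this step
        # Update p_m <- c * p_m * H_m, with c chosen so the estimate sums to one
        p = [pm * hm for pm, hm in zip(p, h)]
        total = sum(p)
        p = [pm / total for pm in p]
    return p

def mean_observable(a):
    """Toy observable (assumption): the mean of the parameter vector."""
    return sum(a) / len(a)

pdf = multicanonical(mean_observable, n_params=4, n_bins=10, e_min=0.0, e_max=1.0)
```

Because the normalization rescales every bin by the same factor, the ratios that enter the acceptance rule are unaffected by the choice of c.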

3. TRANSITION-MATRIX METHODS

The above procedure can be applied to dynamic system evolution if the frequency of all accepted and rejected transitions is retained [5,7]. In particular, specializing to a single observable so that $\mathbf{E}$ is replaced by $E$, the elements of an unnormalized transition matrix $t_{lk}$ (and additionally in certain methods a histogram $\mathsf{V}$ of visited states) are initialized to zero while the initial $\mathbf{a}_{\mathrm{cur}}$ are randomly generated. For each accepted or rejected transition from an initial $k$th histogram bin to a final $l$th bin corresponding to the observables $E(\mathbf{a}_{\mathrm{cur}})$ and $E(\mathbf{a}_{\mathrm{new}})$, respectively, $t_{lk}$ and, where relevant, $\mathsf{V}_k$ are incremented by unity. Following an accepted transition, the state $\mathbf{a}_{\mathrm{new}}$ in bin $l$ replaces $\mathbf{a}_{\mathrm{cur}}$ as the initial state for the subsequent time step. The normalized transition matrix $\mathbf{T}$ is generated by scaling each column of $\mathbf{t}$ such that $\sum_l T_{lk} = 1$, since the probabilities of all transitions from a state $k$ must sum to unity.
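The bookkeeping described above can be sketched in a few lines of Python. The function names `record_transition` and `normalize_columns` and the nested-list storage are illustrative assumptions; bins are 0-based here, and a column with no recorded transitions is left as zeros rather than normalized.

```python
def record_transition(t, v, k, l):
    """Count one attempted transition from bin k to bin l, whether accepted or rejected."""
    t[l][k] += 1.0   # unnormalized transition matrix element t_lk
    v[k] += 1        # visit histogram V_k of the initial bin

def normalize_columns(t):
    """Return T with each visited column k scaled so that sum over l of T[l][k] equals 1."""
    n = len(t)
    col_sums = [sum(t[l][k] for l in range(n)) for k in range(n)]
    return [[t[l][k] / col_sums[k] if col_sums[k] > 0 else 0.0
             for k in range(n)] for l in range(n)]

# Usage: five attempted transitions among three bins, given as (k, l) pairs
n = 3
t = [[0.0] * n for _ in range(n)]
v = [0] * n
for k, l in [(0, 1), (0, 0), (0, 1), (1, 2), (1, 0)]:
    record_transition(t, v, k, l)
T = normalize_columns(t)
```

Storing the element as `t[l][k]` keeps the index order of $t_{lk}$, so that column k collects every transition attempted out of bin k.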

From the above discussion, we observe that transition-matrix methods are distinguished by the choice of acceptance rule. Although any transition rule is permissible, since all unbiased transitions out of a state are recorded, we demonstrate below that if the change in $E$ over a Markov step is comparable to the histogram bin widths, the discretization error depends on the rule selected.

In previous work, we considered two procedures. In method 1 we employed the standard multicanonical acceptance rule to populate the transition matrix, while method 2 accepted a transition only if the final state was previously visited fewer times than the initial state, i.e., $\mathsf{V}_l < \mathsf{V}_k$. In this paper, we further refine method 1 by updating our estimate of $p(E)$ after every small number $N_u$ of steps. Here we rewrite the detailed balance condition between each pair of adjacent histogram bins in the form of a recursion relation for the pdf [9–11],

$$\mathsf{p}_{m+1} = \mathsf{p}_m \frac{T_{m+1,m}}{T_{m,m+1}}. \tag{1}$$

Thus, starting from an arbitrary initial value for the probability of the first histogram bin and assuming, for example, that $\mathsf{p}_m$ is independent of $m$ if either $T_{m+1,m} = 0$ or $T_{m,m+1} = 0$, we obtain an estimate of $p(E)$ that can be employed in the multicanonical acceptance rule. Since this calculation requires negligible computation time, $\mathsf{p}_m$ can be regenerated after any desired number of Markov steps (although preferably the procedure should be initialized with a Monte Carlo calculation). Dynamic system evolution can be modeled by repeatedly multiplying an initial state distribution by the transition matrix $\mathbf{T}$ [7]. Each multiplication is analogous to evolving the system through a simulated time interval $\Delta t$, so that the eigenvector of $\mathbf{T}$ with unit eigenvalue corresponds to the probability distribution function $\mathsf{p}_m$ [11].

In general, if the width of the $m$th histogram bin is comparable to or greater than the average change in $E$ over a single Markov step into or within this bin, the accuracy of $\mathsf{p}_m$ is affected by the Markov chain dynamics within the bin; in fact, we find in the calculations below that this is the dominant source of error under the given conditions. That is, ideally the probability of visiting states within a bin should follow the physical distribution that is obtained in a Markov chain calculation in the absence of an acceptance rule. Otherwise states are spuriously depleted toward one side of the histogram bin, which significantly alters the population of the states within the bin for large bin sizes.

However, if the acceptance rule is formulated to preserve detailed balance, as in the case of the multicanonical method, the depletion of states out of one side of a histogram bin is compensated for by an equal number of incoming transitions from the neighboring bin. This prevents states in the Markov chain from, on average, entering bins preferentially from, for example, lower values of $E$ and exiting toward higher values, which would otherwise distort the relationship, Eq. (1), between the pdf and the transition-matrix elements. Therefore, for procedures that preserve detailed balance, the discretization error associated with the finite bin size is considerably reduced, as we verify numerically below. The magnitude of the reduction, however, depends on a potentially large number of computational parameters.
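As a worked illustration of the recursion in Eq. (1) and of the repeated multiplication by the transition matrix, the following Python sketch uses a hand-built three-bin matrix. The function names and the example matrix are assumptions; the matrix was constructed from a Metropolis rule with target probabilities (0.2, 0.5, 0.3) so that it satisfies detailed balance exactly. Where either adjacent element vanishes, the ratio is set to one, following the convention mentioned in the text.

```python
def pdf_from_transition_matrix(T):
    """Apply p[m+1] = p[m] * T[m+1][m] / T[m][m+1] and normalize to unit sum."""
    p = [1.0]                                  # arbitrary value for the first bin
    for m in range(len(T) - 1):
        up, down = T[m + 1][m], T[m][m + 1]
        # Assumed convention: p is taken as constant across a pair with a zero element
        ratio = up / down if up > 0 and down > 0 else 1.0
        p.append(p[-1] * ratio)
    total = sum(p)
    return [x / total for x in p]

def evolve(T, state, n_steps):
    """Multiply a state distribution by T n_steps times (one product per time interval)."""
    n = len(T)
    for _ in range(n_steps):
        state = [sum(T[l][k] * state[k] for k in range(n)) for l in range(n)]
    return state

# Column-stochastic example, T[l][k] = probability of a transition from bin k to bin l
T = [[0.5, 0.2, 0.0],
     [0.5, 0.5, 0.5],
     [0.0, 0.3, 0.5]]

p_recursion = pdf_from_transition_matrix(T)    # approximately [0.2, 0.5, 0.3]
p_long_time = evolve(T, [1.0, 0.0, 0.0], 200)  # approaches the same distribution
```

For this matrix the two routes agree: the recursion reproduces the target probabilities, and the evolved distribution converges to the eigenvector of T with unit eigenvalue.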

Considering next method 2, which accepts transitions only to less visited states, suppose that at a certain stage of the calculation the acceptance rule permits only transitions from the current bin to bins with a smaller pdf. Then within a bin, a larger than average fraction of states with high pdf will result from incoming transitions. (Note that for a large bin width with high probability one or more Markov steps are required for the state to transfer from the larger to the smaller probability region of the bin.) Therefore, the ratio between the frequency of transitions out of the bin to large pdf states and the transition frequency to small pdf states is smaller than in the standard Monte Carlo procedure. However, approximately half of the time, the acceptance rule will instead permit only transitions from the current bin to higher instead of lower pdf states. In this case, however, the above bias is not fully compensated for since most Monte Carlo transitions occur in any case to higher pdf states. Thus, averaging over both possibilities for the acceptance rule, we conclude that the effective transition probability will on average be enhanced in the direction of large pdf values.

We have indeed observed in a number of different computational contexts involving a single observable $E$ that the states in the Markov chain on average evolve preferentially at a constant velocity from one region of low $p(E)$ to the opposing region, after which the simulation sometimes enters into a previously unsampled state with low $p(E)$. [In the uncommon case of several isolated large $p(E)$ regions, this behavior will still occur, modified by infrequent transitions between different high $p(E)$ regions.] The Markov chain then remains in the new bin until the number of samples in the bin equals that of the adjacent histogram bin within the higher $p(E)$ region. The Markov chain then retraverses the problem domain toward the original starting point. This ensures that on average the acceptance rule excludes transitions to higher pdf states as many times as transitions to lower pdf states. Consequently, method 2 yields a spurious bias that augments the computed slope of $p(E)$. To restore the correct transition probabilities, in our method 3, following a transition into a given bin, we discard all Markov steps associated with transitions out of a histogram bin until a certain number of confined steps $N_c$ have been executed. The steady-state statistical distribution of states is then restored within the bin. The value of $N_c$ required to ensure a given level of accuracy, however, can in general be determined only empirically, since the average number of steps that the Markov chain spends in a histogram bin before exiting depends in a complicated fashion on the bin number and the computational and physical details of the problem.
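One possible reading of the method 2 and method 3 acceptance rules is sketched below in Python. Several details are assumptions rather than statements from the text: moves that stay within the current bin are always accepted, the visit histogram is incremented before the comparison of visit counts (otherwise the chain could never leave its first bin), discarded steps are neither recorded nor counted as confined steps, and the toy system is a single parameter on [0, 1) with ten equal bins. The names `run_chain`, `propose`, and `bin_of` are hypothetical.

```python
import random

def run_chain(propose, bin_of, a0, n_bins, n_steps, n_confined=0, seed=0):
    """n_confined = 0 gives the method 2 rule; n_confined > 0 adds the method 3 confinement."""
    rng = random.Random(seed)
    t = [[0.0] * n_bins for _ in range(n_bins)]  # unnormalized transition counts t[l][k]
    v = [0] * n_bins                             # visit histogram V
    a_cur, k = a0, bin_of(a0)
    confined = 0                                 # within-bin steps since entering bin k
    for _ in range(n_steps):
        a_new = propose(a_cur, rng)
        l = bin_of(a_new)
        if l != k and confined < n_confined:
            continue                             # method 3: discard premature exit steps
        t[l][k] += 1.0                           # record the attempt, accepted or not
        v[k] += 1
        if l == k:
            a_cur = a_new                        # assumed: within-bin moves are accepted
            confined += 1
        elif v[l] < v[k]:                        # method 2: move only to less visited bins
            a_cur, k, confined = a_new, l, 0
    return t, v

def propose(a, rng):
    """Toy proposal (assumption): small uniform step, wrapped into [0, 1)."""
    return (a + rng.uniform(-0.02, 0.02)) % 1.0

def bin_of(a):
    """Ten equal-width bins on [0, 1)."""
    return min(int(a * 10), 9)

t2, v2 = run_chain(propose, bin_of, 0.05, 10, 20000, n_confined=0)  # method 2
t3, v3 = run_chain(propose, bin_of, 0.05, 10, 20000, n_confined=5)  # method 3
```

In this sketch the discarded steps still consume part of the step budget, so the method 3 run records at most as many transitions as the method 2 run.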


4. COMPUTATIONAL RESULTS

We now quantify the accuracy of our three procedures by evaluating the pdf of the differential group delay (DGD) $\tau$ in a polarization mode dispersion (PMD) emulator composed of $N_{\mathrm{sec}} = 10$ polarization-maintaining (PM) fiber sections with $\tau_s = 1.0$ ps, separated by randomizing polarization controllers. This yields an average DGD of $\tau_{\mathrm{ave}} = \tau_s \sqrt{8 N_{\mathrm{sec}} / (3\pi)} = 2.91$ ps for the emulator. Our histogram is then formed by dividing the interval [0, 3.5] of normalized DGD values $\tau / \tau_{\mathrm{ave}}$ into 100 equal-width segments. Thus the system variable $E$ is identified with the DGD $\tau$, while the bin designated by the index $m$ corresponds to the range $3.5(m-1)/100 < E < 3.5m/100$.
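The arithmetic behind the average DGD and the bin layout can be checked with a short Python sketch. The function names are illustrative, and assigning a value that falls exactly on a bin edge to the upper bin is an assumption made so that every value in the histogram range receives an index.

```python
import math

def mean_dgd(tau_s, n_sec):
    """Average DGD of the emulator: tau_s * sqrt(8 * n_sec / (3 * pi))."""
    return tau_s * math.sqrt(8.0 * n_sec / (3.0 * math.pi))

def dgd_bin(tau, tau_ave, n_bins=100, upper=3.5):
    """1-based bin index m for the normalized DGD, or None outside [0, upper)."""
    x = tau / tau_ave
    if not 0.0 <= x < upper:
        return None
    return int(x * n_bins / upper) + 1

tau_ave = mean_dgd(1.0, 10)      # about 2.91 ps for ten 1.0 ps sections
m = dgd_bin(tau_ave, tau_ave)    # the average DGD itself falls in bin 29
```

With 100 bins on [0, 3.5] each bin is 0.035 wide in normalized units, so the normalized value 1.0 lies between 0.98 and 1.015, the edges of bin 29.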

We display in the main graph (left axis) of Fig. 1 the ratio $p_m^{\mathrm{numerical}} / p_m^{\mathrm{analytic}}$ for a standard multicanonical calculation after three $1.67 \times 10^6$ step iterations (first marker type), the transition-matrix technique with a multicanonical acceptance rule (method 1, second marker type), a transition-matrix procedure that rejects transitions to more sampled histogram bins (method 2, dashed–dotted curve), and method 3 with $N_c$, the number of steps in which the Markov chain is confined to a bin before transitions out of the bin are permitted, equal to 40 (crosses). By repeating our calculation with different values of $N_c$, our method 3 curves are found to be nearly indistinguishable for $N_c \geq 20$; in general, for a given computation and a desired level of accuracy, the minimum value of $N_c$ can be determined only empirically. The curve marked by the right arrow together with the right axis of the figure displays the analytic pdf of the DGD of the fiber emulator [12]. Here we have employed $5 \times 10^7$ emulator realizations in which the relative angle between each pair of emulator segments is randomly varied by an average of $\pi/80$ deg between successive realizations. Further, in method 1 the pdf is updated after every step by applying Eq. (1) to the current estimate of the transition matrix. As discussed in Section 3, method 2 predicts a reduced slope and therefore a ratio $< 1$ near the pdf maximum and $> 1$ for DGD values occurring with low probability. Further, although method 1 yields an accuracy comparable with the standard multicanonical method, we have found that it exhibits improved convergence with fewer samples and reduced programming complexity.

Fig. 1. Ratio between the numerical and analytic pdfs for the standard multicanonical procedure (first marker type), our modified transition-matrix procedure with a multicanonical acceptance rule (method 1, second marker type), an acceptance rule that rejects transitions to more visited histogram bins (method 2, dashed–dotted curve), and a procedure that restricts transitions out of a recently visited bin (method 3, crosses) as functions of the normalized DGD for an $N_{\mathrm{sec}} = 10$ segment fiber emulator (solid curve).

Considering next in Fig. 2 the total number of times $\mathsf{V}_k$ that a state in bin $k$ is visited for the standard multicanonical procedure (first marker type), method 1 (second marker type), and method 2 (dashed–dotted curve), we observe that while the numerical error of method 2 is large, this procedure samples the pdf most evenly. Method 3 samples the pdf almost identically to method 2 and is therefore omitted to increase the legibility of the graph. Further, the sum over many iterations of the number of samples in each bin for the standard multicanonical procedure is not uniform as a result of variations in the sampling frequency in small pdf regions during each iteration. However, the multicanonical procedure and its variants concentrate samples in regions of very low pdf and therefore estimate the pdf rapidly in these regions. Method 1 further improves on this feature since the intermediate pdf estimates are frequently revised.

Fig. 2. Total number of times each histogram bin is visited for the standard multicanonical procedure (first marker type), method 1 (second marker type), and method 2 (dashed–dotted curve). Axes: histogram of visited states (scale $\times 10^4$) versus normalized DGD $\tau / \tau_{\mathrm{ave}}$.

Fig. 3. Variation of the error, Eq. (2), weighted by the histogram bin probability as a function of the average DGD change over one Markov step for the standard multicanonical method (first marker type), method 1 (second marker type), method 2 (dashed–dotted curve), and method 3 (crosses). Axes: mean error versus step size.


Finally, in Fig. 3, we display the error averaged over all histogram bins according to [8]:

$$\left\langle \sum_{m=1}^{N_B} p_m^{\mathrm{analytic}} \left| \log_{10}\!\left( \frac{p_m^{\mathrm{numerical}}}{p_m^{\mathrm{analytic}}} \right) \right| \right\rangle, \tag{2}$$

in which $\langle \ldots \rangle$ denotes an ensemble average over one hundred $2 \times 10^6$-sample calculations, as a function of the width of the uniform distribution of the change $\delta\mathbf{a}$ in the relative angles of two adjacent emulator segments. The results of the multicanonical procedure and methods 1–3 are denoted by the first marker type, the second marker type, a dashed–dotted curve, and crosses, respectively. The accuracy of all methods greatly increases when the mean variation of the system observable over a single step in the Markov chain is large compared to the size of a histogram bin. Further, the improvement afforded by method 3 over method 2 is clearly visible in the figure, where for the smallest step sizes shown, the precision of method 3 approaches that of the multicanonical procedure.
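A probability-weighted logarithmic error of the kind plotted in Fig. 3 can be computed as in the following Python sketch. It assumes that the magnitude of the base-10 logarithm of the ratio is what is averaged, and it skips bins where either pdf value is zero; both choices, like the function names, are assumptions made for illustration.

```python
import math

def weighted_log_error(p_num, p_an):
    """Sum over bins of p_an * |log10(p_num / p_an)| for a single calculation."""
    return sum(pa * abs(math.log10(pn / pa))
               for pn, pa in zip(p_num, p_an)
               if pn > 0 and pa > 0)          # assumed: empty bins are skipped

def mean_error(runs, p_an):
    """Ensemble average of the single-calculation error over independent runs."""
    return sum(weighted_log_error(r, p_an) for r in runs) / len(runs)

# Usage: a two-bin example in which each bin is off by one decade
p_an = [0.5, 0.5]
single = weighted_log_error([0.05, 5.0], p_an)            # close to 1.0
average = mean_error([[0.5, 0.5], [0.05, 5.0]], p_an)     # close to 0.5
```

Weighting by the analytic pdf means that an error of a fixed number of decades in a low-probability bin contributes far less than the same error near the pdf maximum.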

5. CONCLUSIONS

Transition-matrix procedures, which extend multicanonical techniques to dynamic problems, possess numerous implementations that differ considerably in accuracy and sensitivity to variations in numerical parameters. Here we achieved a high degree of efficiency and accuracy by employing the multicanonical acceptance rule while continually updating the pdf estimates. We then interpreted the apparent violation of detailed balance, and hence the loss of numerical accuracy, at large histogram bin sizes in terms of the change in probabilities of transitions to the two neighboring bins compared with the values expected for an unbiased calculation. Acceptance rules that increase the probability of asymmetric transitions out of a bin generally enhance this source of error. Procedures of this nature can, however, be simple to program and computationally efficient if the numerical parameters ensure a small self-transition probability within each histogram bin. Further pursuing this line of reasoning could yield even more efficient transition-matrix methods with possible experimental relevance [4,13].

REFERENCES

1. B. Berg and T. Neuhaus, “Multicanonical algorithms for first-order phase transitions,” Phys. Lett. B 267, 249–253 (1991).

2. D. Yevick, “Multicanonical communication system modeling-application to PMD statistics,” IEEE Photon. Technol. Lett. 14, 1512–1514 (2002).

3. D. Yevick, “The accuracy of multicanonical system models,” IEEE Photon. Technol. Lett. 15, 224–226 (2003).

4. T. Lu, D. Yevick, L. Yan, B. Zhang, and A. E. Willner, “An experimental approach to multicanonical sampling,” IEEE Photon. Technol. Lett. 16, 1978–1980 (2004).

5. J-S. Wang and R. Swendsen, “Transition matrix Monte Carlo method,” J. Stat. Phys. 106, 245–285 (2002).

6. M. Fitzgerald, R. Picard, and R. Silver, “Monte Carlo transition dynamics and variance reduction,” J. Stat. Phys. 98, 321–345 (2000).

7. D. Yevick and M. Reimer, “Transition matrix analysis of system outages,” IEEE Photon. Technol. Lett. 19, 1529–1531 (2007).

8. D. Yevick and T. Lu, “Improved multicanonical algorithms,” J. Opt. Soc. Am. A 23, 2912–2918 (2006).

9. M. S. Shell, P. Debenedetti, and A. Panagiotopoulos, “An improved Monte Carlo method for direct calculation of the density of states,” J. Chem. Phys. 119, 9406–9411 (2003).

10. R. Ghulghazaryan, S. Hayryan, and C. Hu, “Efficient combination of Wang-Landau and transition matrix Monte Carlo methods for protein simulations,” J. Comput. Chem. 28, 715–726 (2006).

11. D. Yevick and M. Reimer, “Modified transition matrix simulations of communication systems,” IEEE Commun. Lett. 12, 755–757 (2008).

12. M. Karlsson, “Probability density functions of the differential group delay in optical fiber communication systems,” J. Lightwave Technol. 19, 324–331 (2001).

13. T. Lu, D. Yevick, B. Hamilton, D. Dumas, and M. Reimer, “An experimental realization of biased multicanonical sampling,” IEEE Photon. Technol. Lett. 17, 1583–2585 (2005).