  • Measurably evolving populations

    Alexei J Drummond and Remco Bouckaert

    December 13, 2012

    1 Time-stamped data

    This tutorial estimates the rate of evolution from a set of virus sequences which have been isolated at different points in time (heterochronous or time-stamped data). The data are 129 sequences from the G (attachment protein) gene of human respiratory syncytial virus subgroup A (RSVA) from various parts of the world, with isolation dates ranging from 1956 to 2002 [1, 2]. RSVA causes infections of the lower respiratory tract, with symptoms that are often indistinguishable from the common cold. By age 3, nearly all children will have been infected, and a small percentage (< 3%) will develop more serious inflammation of the bronchioles requiring hospitalisation.

    The aim of this tutorial is to obtain estimates for:

    the rate of molecular evolution;

    the date of the most recent common ancestor;

    the phylogenetic relationships, with measures of statistical support.

    The following software will be used in this tutorial:

    BEAST - this package contains the BEAST program, BEAUti, DensiTree, TreeAnnotator and other utility programs. This tutorial is written for BEAST v2.0, which has support for multiple partitions. It is available for download from http://beast2.cs.auckland.ac.nz/.

    Tracer - this program is used to explore the output of BEAST (and other Bayesian MCMC programs). It graphically and quantitatively summarizes the distributions of continuous parameters and provides diagnostic information. At the time of writing, the current version is v1.5. It is available for download from http://beast.bio.ed.ac.uk/.

    FigTree - this is an application for displaying and printing molecular phylogenies, in particular those obtained using BEAST. At the time of writing, the current version is v1.3.1. It is available for download from http://tree.bio.ed.ac.uk/.

  • The NEXUS alignment

    The data are in a file called RSV2.nex, which you can find in the examples/nexus directory inside the directory where BEAST was installed. This file contains an alignment of 129 sequences from the G gene of RSVA virus, 629 nucleotides in length. Import this alignment into BEAUti. Because this is a protein-coding gene, we are going to split the alignment into three partitions representing each of the three codon positions. To do this, click the Split button at the bottom of the Partitions panel and then select 1 + 2 + 3 frame 3 from the drop-down menu. This signifies that the first full codon starts at the third nucleotide in the alignment. This will create three rows in the Partitions panel. You will have to re-link the tree and clock models across the three partitions (and name them tree and clock respectively) before continuing to the next step.
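
    The effect of the 1 + 2 + 3 frame 3 option can be sketched in a few lines of Python. This is only an illustration of the indexing, not BEAUti's own code: the function name codon_position and its arguments are made up for this example. Sites are numbered from 1, and the first full codon is assumed to start at site 3, as described above.

```python
def codon_position(site, first_codon_start=3):
    """Return the codon position (1, 2 or 3) of a 1-based alignment site.

    first_codon_start is the 1-based site at which the first full codon
    begins (3 for the "frame 3" option described in the text).
    """
    return (site - first_codon_start) % 3 + 1


# Split the 629 sites of the alignment into three codon-position partitions.
alignment_length = 629
partitions = {1: [], 2: [], 3: []}
for site in range(1, alignment_length + 1):
    partitions[codon_position(site)].append(site)

# Show the size of each partition and its first few sites.
for position, sites in partitions.items():
    print(position, len(sites), sites[:3])
```

    Under these assumptions, sites 3, 6, 9, ... fall in the first codon position, while sites 1 and 2 belong to the second and third positions of an incomplete leading codon.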

    The partition panel should now look something like this:

    By default all the taxa are assumed to have a date of zero (i.e. the sequences are assumed to be sampled at the same time). In this case, the RSVA sequences have been sampled at various dates going back to the 1950s. The actual year of sampling is given in the name of each taxon and we could simply edit the value in the Date column of the table to reflect these. However, if the taxa names contain the calibration information, then a convenient way to specify the dates of the sequences in BEAUti is to click the checkbox Use tip dates and then use the Guess button at the top of the Tip Dates panel. Clicking this will make a dialog box appear.

  • Select the option to use everything, choose after last from the drop-down box and type s into the corresponding text box. This will extract the trailing numbers from the taxon names after the last lowercase s, which are interpreted as the year (in this case since 1900) in which the sample was isolated.
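
    The rule applied by the Guess dialog can be mimicked as follows. This is a minimal sketch rather than BEAUti's implementation: the function year_from_taxon and its parameters are hypothetical, and the taxon names are taken from this data set.

```python
def year_from_taxon(name, delimiter="s", offset=1900):
    """Extract the sampling year from a taxon name such as 'BE1835s101'.

    Everything after the last occurrence of `delimiter` is read as a
    number of years since `offset`.
    """
    _, found, suffix = name.rpartition(delimiter)
    if not found or not suffix.isdigit():
        raise ValueError(f"no trailing date found in {name!r}")
    return offset + int(suffix)


for taxon in ["USALongs56", "S2s76", "BE8078s92", "BE1835s101", "BE332s102"]:
    print(taxon, year_from_taxon(taxon))
```

    Note that the match is case-sensitive, so the capital S at the start of S2s76 is ignored and only the final lowercase s is used as the delimiter.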

    The dates panel should now look something like this:

  • Setting the substitution model

    We will use the HKY model with empirical base frequencies for all three partitions. To do this, first link the site partitions and then choose HKY and Empirical from the Subst Model and Frequencies drop-down boxes. Also check the estimate box for the Mutation Rate and finally check the Fix mean mutation rate box. Then go back to the Partitions panel and unlink the partitions.

    1.0.1 Priors

    To set up the priors, select the Priors tab. Choose Coalescent Constant Population for the tree prior. Set the prior on the clockRate parameter to a log-normal with M = -5 and S = 1.25.
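
    To get a feel for what such a prior implies, the sketch below computes the median and the central 95% interval of a log-normal distribution. It assumes that M and S are the mean and standard deviation of the logarithm of the rate (a log-space parameterisation), and it uses M = -5, since rates of molecular evolution are far below one substitution per site per year. Both are assumptions of this example, and the function name lognormal_summary is made up.

```python
import math
from statistics import NormalDist


def lognormal_summary(m, s, mass=0.95):
    """Median and central interval of a log-normal with log-space mean m, sd s."""
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)  # about 1.96 for mass = 0.95
    return math.exp(m), math.exp(m - z * s), math.exp(m + z * s)


median, lower, upper = lognormal_summary(m=-5.0, s=1.25)
print(f"median = {median:.2e}, 95% interval = ({lower:.2e}, {upper:.2e})")
```

    With the tip dates given in years, these values are in substitutions per site per year, so the prior is centred near 7e-3 and spans roughly two orders of magnitude.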

  • 1.1 Setting the MCMC options

    For this dataset let's initially set the chain length to 2,000,000, as this will run reasonably quickly on most modern computers. Set the sampling frequency for the screen to 1000, for the trace log file to 400 and for the trees file to 400.

    Running BEAST

    Save the BEAST file (e.g. RSV2.xml) and run it in BEAST.

  • Analysing the BEAST output

    Load the log file produced by BEAST into Tracer. Note that the effective sample sizes (ESSs) for many of the logged quantities are small (ESSs less than 100 will be highlighted in red by Tracer). This is not good. A low ESS means that the trace contains a lot of correlated samples and thus may not represent the posterior distribution well. In the bottom right of the window is a frequency plot of the samples which, as expected given the low ESSs, is extremely rough.

    If we select the tab on the right-hand side labelled Trace, we can view the raw trace, that is, the sampled values against the step in the MCMC chain.

    Here you can see how the samples are correlated. There are 5000 samples in the trace (we ran the MCMC for 2,000,000 steps, sampling every 400), but adjacent samples often tend to have similar values. The ESS for the absolute rate of evolution (clockRate) is about 138, so we are only getting 1 independent sample for every 36 (= 5000/138) actual samples. With a short run such as this one, it may also be the case that the default burn-in of 10% of the chain length is inadequate. Not excluding enough of the start of the chain as burn-in will render estimates of ESS unreliable.
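
    The link between autocorrelation and ESS can be illustrated with a small estimator. The sketch below divides the number of samples by an autocorrelation time summed over positive lags; it is a simplified textbook estimator and not necessarily the one Tracer uses. The function name effective_sample_size is made up, and the trace is simulated, not read from a BEAST log.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of autocorrelations at lags >= 1).

    The sum is truncated at the first non-positive autocorrelation, a
    simple cut-off chosen for this sketch.
    """
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    variance = sum(c * c for c in centred) / n
    if variance == 0.0:
        return float(n)
    act = 1.0  # autocorrelation time, in units of logged samples
    for lag in range(1, n):
        covariance = sum(
            centred[i] * centred[i + lag] for i in range(n - lag)
        ) / n
        rho = covariance / variance
        if rho <= 0.0:
            break
        act += 2.0 * rho
    return n / act


# Simulated, strongly autocorrelated trace (NOT real BEAST output).
random.seed(1)
value, trace = 0.0, []
for _ in range(5000):
    value = 0.9 * value + random.gauss(0.0, 1.0)
    trace.append(value)

ess = effective_sample_size(trace)
print(f"{len(trace)} samples, ESS about {ess:.0f}, "
      f"one independent sample per {len(trace) / ess:.0f} logged samples")
```

    For uncorrelated samples the estimate stays close to the number of samples, while for a sticky chain like the simulated one it drops far below it.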

    The simple response to this situation is that we need to run the chain for longer. Given that the lowest ESS (for the constant coalescent prior) is 93, we would have to run the chain for at least twice as long to get reasonable ESSs that are > 200. However, it would be better to aim higher, so let's go for a chain length of 10,000,000. Go back to the MCMC options section in BEAUti, and create a new BEAST XML file with a longer chain length. Now run BEAST and load the new log file into Tracer (you can leave the old one loaded for comparison).
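
    The arithmetic behind these numbers is easy to check. The sketch below assumes, as a rough rule of thumb only, that ESS grows in proportion to the chain length; the helper names are made up for this example.

```python
import math


def logged_samples(chain_length, log_every):
    """Number of samples written to the log for a given logging interval."""
    return chain_length // log_every


def chain_length_for_ess(current_length, current_ess, target_ess):
    """Chain length needed for target_ess, assuming ESS scales linearly."""
    return math.ceil(current_length * target_ess / current_ess)


print(logged_samples(2_000_000, 400))            # samples in the first run
print(round(5000 / 138))                         # logged samples per independent sample
print(chain_length_for_ess(2_000_000, 93, 200))  # a little over 4.3 million steps
print(10_000_000 // 5000)                        # interval that keeps 5000 samples
```

    The linear scaling is only an approximation, which is one reason to aim well above the minimum chain length it suggests.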

  • Click on the Trace tab and look at the raw trace plot.

    Again we have chosen options that produce 5000 samples, and with an ESS of about 419 there is still auto-correlation between the samples, but > 400 effectively independent samples will now provide a very good estimate of the posterior distribution. There are no obvious trends in the plot that would suggest that the MCMC has not yet converged, and there are no significant long-range fluctuations in the trace that would suggest poor mixing.

    As we are satisfied with the mixing, we can now move on to one of the parameters of interest: substitution rate. Select clockRate in the left-hand table. This is the average substitution rate across all sites in the alignment. Now choose the density plot by selecting the tab labelled Marginal Density. This shows a plot of the marginal posterior probability density of this parameter. You should see a plot similar to this:

  • As you can see, the posterior probability density is roughly bell-shaped. There is some sampling noise, which would be reduced if we ran the chain for longer or sampled more often, but we already have a good estimate of the mean and HPD interval. You can overlay the density plots of multiple traces in order to compare them (it is up to the user to determine whether they are comparable on the same axis or not). Select the relative substitution rates for all three codon positions in the table to the left (labelled mutationRate.1, mutationRate.2 and mutationRate.3). You will now see the posterior probability densities for the relative substitution rate at all three codon positions overlaid:

  • Summarizing the trees

    Use the program TreeAnnotator to summarize the trees and view the result in FigTree (Figure 1).

  • Figure 1: The maximum clade credibility tree for the G gene of 129 RSVA-2 viral samples.

  • DensiTree with clade height bars for clades with over 50% support. The root canal tree represents the maximum clade credibility tree.

    Questions

    In what year did the common ancestor of all RSVA viruses sampled live? What is the 95% HPD?
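
    The 95% HPD asked for here is the shortest interval that contains 95% of the posterior samples. The sketch below shows one common way to compute it from a list of sampled values, and how root heights translate into calendar years. The function name hpd_interval is made up, the youngest tip date of 2002 is taken from the sampling dates of this data set, and the heights are simulated stand-in values, not results of this analysis.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Simulated root heights in years before the youngest tip (NOT real output).
random.seed(0)
heights = [random.gauss(50.0, 3.0) for _ in range(1000)]
youngest_tip = 2002

low, high = hpd_interval(heights)
# A larger height means an older root, so the bounds swap when converted.
print(f"95% HPD of root year: {youngest_tip - high:.1f} to {youngest_tip - low:.1f}")
```

    With real output, the list of heights would come from the tree height column of the trace log after discarding burn-in.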

  • Bonus section: Bayesian Skyline plot

    We can reconstruct the population history using the Bayesian Skyline plot. In order to do so, load the XML file into BEAUti, select the Priors tab and change the tree prior from coalescent with constant population size to coalescent with Bayesian skyline. Note that an extra item is added to the priors called Markov chained population sizes, which is a prior that ensures dependence between population sizes.

    By default the number of groups used in the skyline analysis is set to 5. To change this, select the menu View/Show Initialization panel and a list of parameters is shown. Select bPopSizes.t:tree and change the dimension to 3. Likewise, select bGroupSizes.t:tree and change its dimension to 3. The dimensions of the two parameters should be the same. More groups mean more population changes can be detected, but it also means more parameters need to be estimated and the chain runs longer. The extended Bayesian skyline plot automatically detects the number of changes, so it could be used as an alternative tree prior.
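
    The role of the two parameters can be sketched as a piecewise-constant population function: the group sizes say how many coalescent events each group spans, and each group has its own population size. The code below is an illustrative simplification, not BEAST's implementation. The function name skyline_pop_size and the toy numbers are made up; in the real analysis of 129 sequences the group sizes would add up to 128 coalescent events.

```python
from bisect import bisect_left


def skyline_pop_size(t, coalescent_times, group_sizes, pop_sizes):
    """Population size at time t under a piecewise-constant skyline.

    coalescent_times: sorted times of the coalescent events (time runs
        backwards from the youngest tip).
    group_sizes: number of coalescent events in each group.
    pop_sizes: one population size per group.
    """
    if not coalescent_times:
        raise ValueError("at least one coalescent event is required")
    if len(group_sizes) != len(pop_sizes):
        raise ValueError("the two parameters must have the same dimension")
    if sum(group_sizes) != len(coalescent_times):
        raise ValueError("group sizes must add up to the number of events")
    # Index of the coalescent interval that contains t; times older than
    # the root are assigned to the last interval.
    interval = min(bisect_left(coalescent_times, t), len(coalescent_times) - 1)
    events_so_far = 0
    for size, pop_size in zip(group_sizes, pop_sizes):
        events_so_far += size
        if interval < events_so_far:
            return pop_size
    raise ValueError("group sizes must be non-negative")


# Toy example: 6 coalescent events in 3 groups of 2 (made-up numbers).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = [2, 2, 2]
sizes = [100.0, 50.0, 10.0]
for t in (0.5, 2.5, 5.5):
    print(t, skyline_pop_size(t, times, groups, sizes))
```

    With fewer groups, each population size is shared by more coalescent intervals, which is why fewer changes in population size can be detected.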

  • This analysis requires a bit longer to converge, so change the MCMC chain length to 10 million, and the log intervals for the trace log and tree log to 10 thousand. Then save the file and run BEAST.

    To plot the population history, load the log file in Tracer and select the menu Analysis/Bayesian Skyline Reconstruction.

    A dialog is shown where you can specify the tree file associated with the log file. Also, since the youngest sample is from 2002, change the entry for age of youngest tip to 2002.

  • After some calculation, a graph appears showing population history where the median and 95% HPD intervals are plotted. After selecting the solid interval checkbox, the graph should look something like this.

  • Questions

    By what amount did the effective population size of RSVA grow from 1970 to 2002 according to the BSP?

    What are the underlying assumptions of the BSP? Are they violated by this data set?

    1.2 Exercise

    Change the Bayesian skyline prior to the extended Bayesian skyline plot (EBSP) prior and run till convergence. EBSP produces an extra log file, called EBSP.$(seed).log, where $(seed) is replaced by the seed you used to run BEAST. A plot can be created by running the EBSPAnalyser utility, and loading the output file in a spreadsheet.

    How many groups are indicated by the EBSP analysis? This is much lower than for BSP. How does this affect the population history plots?

    References

    [1] Kalina T Zlateva, Philippe Lemey, Elien Moes, Anne-Mieke Vandamme, and Marc Van Ranst, Genetic variability and molecular evolution of the human respiratory syncytial virus subgroup B attachment G protein, J Virol 79 (2005), no. 14, 9157-67.

    [2] Kalina T Zlateva, Philippe Lemey, Anne-Mieke Vandamme, and Marc Van Ranst, Molecular evolution and circulation patterns of human respiratory syncytial virus subgroup A: positively selected sites in the attachment G glycoprotein, J Virol 78 (2004), no. 9, 4675-83.
