Quantitative Cladistics

  • Published on
    04-Oct-2014

<p>Quantitative Cladistics and Use of TNT. All Rights Reserved. Pablo A. Goloboff, Instituto Superior de Entomología, CONICET; Facultad de Ciencias Naturales e Instituto Miguel Lillo, Miguel Lillo 205, 4000 S.M. de Tucumán. The data sets for these exercises are distributed electronically as part of a 5-day course in cladistics. Do not distribute this handout!</p> <p>Introduction
Throughout, the general format for the assignments is: file or folder names are indicated as filename; TNT commands are indicated as command (for help, type help command;); menu choices (available only for Windows) are indicated as Choice. In every case, unless otherwise specified, start by reading example.tnt (File/OpenInputFile) and calculating most parsimonious trees (most parsimonious trees can be calculated with Analyze/TraditionalSearch, or with the command mult;, using default settings in either case). Save all files to a mon_mor folder, to keep the machine clean.
1. Create an output file called trees.out (File/Output/OpenOutputFile) and write tree diagrams for trees 0, 4, and 6 (Trees/DisplaySave, selecting the trees you want to save). In the same file, include a table (default format) with the lengths of all trees (Optimize/TreeLengths), and a table (optional format, set with Format/OptionalTableFormat) with the scores of characters 10-20 in trees 3-4.
2. Create a tree file in compact notation (File/TreeSaveFile/OpenCompactMode), called example.ctf. Save the trees to that file (File/TreeSaveFile/SaveTreesToFile), and close it (File/TreeSaveFile/CloseTreeFile). Create another tree file, in parenthetical notation (File/TreeSaveFile/OpenParenthetical), called example.tre. Save the trees using taxon numbers (toggled with Format/UseTaxonNames), and close the file. Then create a third file in parenthetical notation, called taxnames.tre, and save the trees, but using taxon names. 
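The two parenthetical files written in exercise 2 differ only in how the terminals are labelled. As a rough illustration (a minimal Python sketch, not TNT's own code), the function below writes the same tree once with taxon numbers and once with taxon names. The nested-tuple tree representation and the five-taxon example are assumptions made for the illustration, and the output is a generic parenthetical string rather than TNT's exact tree-file syntax.

```python
# Minimal sketch (assumption): a tree is a nested tuple whose leaves are
# taxon numbers (ints, starting at 0 as in TNT). The output is a generic
# parenthetical string, not necessarily TNT's exact tree-file syntax.

def to_parenthetical(node, labels=None):
    """Write a tree in parenthetical notation, using taxon numbers by default
    or taxon names when a list of labels is given."""
    if isinstance(node, int):
        return labels[node] if labels is not None else str(node)
    return "(" + " ".join(to_parenthetical(child, labels) for child in node) + ")"

# Hypothetical five-taxon example; the names are borrowed from the exercises.
names = ["Ummidia", "Lycinus", "Diplothelopsis", "Idiops", "Actinopus"]
tree = (0, (1, 2), (3, 4))

with_numbers = to_parenthetical(tree)
with_names = to_parenthetical(tree, names)
print(with_numbers)                        # (0 (1 2) (3 4))
print(with_names)                          # (Ummidia (Lycinus Diplothelopsis) (Idiops Actinopus))
print(len(with_numbers), len(with_names))  # 15 53
```

The numbered version is much shorter, but taxon numbers only mean something relative to the order of taxa in the data set that was read, whereas names do not depend on that order.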
Exit the program, enter again, and re-read the trees from each of the files; confirm that the trees are identical (this can be done with Trees/TreeBuffer/Filter with default settings, which simply discards duplicate trees, or with Trees/TreeBuffer/CompareTrees, which provides a list of non-unique trees). What is the difference in size between the files example.ctf and example.tre? When is it advisable to save trees using taxon names instead of numbers?
3. (Windows only) Create a metafile, example.emf, containing a drawing of tree 3. Create a PowerPoint file, and copy the image from example.emf into one of the slides. There are two ways to do this. The first is manual: make sure "tree-preview" is ON (with Format/PreviewTrees), then display the tree diagram (as you did for exercise 1); when in the previewing screen, press "M" (for "metafile"). The second is automatic: open the metafile first, with File/Output/OpenMetafile (or with log & example.emf;); this automatically turns the preview OFF, so that you do not need to press any keys for execution to continue after the tree diagram is saved to the metafile. Then display the tree diagram (as you did for exercise 1); this automatically writes the diagram to the metafile. Finally, close the metafile (File/Output/CloseMetafile, or log /&;).
4. Read the data set from contin.tnt. Create a random tree. Then edit the tree (manually, in treeview mode, which you get by clicking on the button with the eye and the tree in Windows, or with the edit command in other versions). Make the tree ( B ( C ( D E ) ( F G H ) ) ), and save the tree diagram to an output file, contin.out. 
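The edit requested in the next step of exercise 4 (placing a new taxon as sister group of an existing terminal) can be mimicked outside TNT. This is a hypothetical Python sketch in which trees are nested tuples of taxon names and the helper add_sister is invented for the illustration; it says nothing about how TNT's treeview mode works internally.

```python
# Hypothetical sketch: trees as nested tuples of taxon names (an assumption,
# not TNT's internal representation).

def add_sister(node, target, new_taxon):
    """Return a copy of the tree in which terminal `target` is replaced by
    the group (target new_taxon), making new_taxon its sister group."""
    if isinstance(node, str):
        return (node, new_taxon) if node == target else node
    return tuple(add_sister(child, target, new_taxon) for child in node)

def to_parenthetical(node):
    """Write the tree in parenthetical notation, separating with spaces."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_parenthetical(child) for child in node) + ")"

# The tree from exercise 4: ( B ( C ( D E ) ( F G H ) ) )
tree = ("B", ("C", ("D", "E"), ("F", "G", "H")))
edited = add_sister(add_sister(tree, "G", "J"), "H", "K")
print(to_parenthetical(tree))    # (B (C (D E) (F G H)))
print(to_parenthetical(edited))  # (B (C (D E) (F (G J) (H K))))
```

Because G and H sit in the polytomy (F G H), the edited tree keeps that polytomy, now with (G J) and (H K) as two of its branches.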
Then edit the tree again, now adding J as sister group of G and K as sister group of H, and save the tree diagram to the output file.</p> <p>5. Create a file with instructions, instructs, for TNT to do the following tasks:
a) open a log file, automatic.out;
b) read the data from example.tnt;
c) calculate most parsimonious tree(s);
d) save the consensus of all trees;
e) (Windows only) open a metafile, automatic.emf, and save the consensus to it;
f) calculate the length of all trees found;
g) exit the program.
Create a batch file, automatic.bat (under Windows), or a script, automatic (under Linux/Mac), which calls TNT and makes it read (=execute) the instructions in the file instructs. The commands to use here are: log, procedure, mult, tplot, length, quit. You can get help on the syntax of command xxx by typing help xxx at the command line.
6. (Windows only) Create a file with batch-menu instructions, instructs.bmn, for TNT to do the same tasks as in point 5. Create a batch file, autobatch.bat, which calls TNT and makes it read (and execute) all the instructions in instructs.bmn (procedure).
7. Make characters 22, 26, and 92 non-additive, and characters 42 and 34 additive (Data/CharacterSettings; the same can be done with the ccode command). Calculate most parsimonious tree(s). What is the resulting length? (Note: it should be 383.) Without exiting the program or re-reading the data, create a character-state tree for character 101: 0 / 2---3 ---1 \ 4</p> <p>Re-calculate most parsimonious trees; what is the resulting length? (Note: it should be 389.)
8. Create a file, myhelp.txt, which contains a list of all the TNT commands and a brief description of the options for all commands.
9. With a text editor, fuse the data sets in ..\dsets\part_a.tnt and ..\dsets\part_b.tnt (imaginary molecular and morphological data sets). Save the single data set to a file, mixed.tnt. Make sure the ccodes are properly adapted, using the @ option. 
If so, the minimum length should be 699 (although superficial searches may produce trees of 700 steps, or even 701).
10. On Friday, we will see scripts; scripts can be used to produce special color diagrams. An example is in labeled.tnt, which contains a list of names and a tree (the shortest tree found by Goloboff et al., 2009, for mammals). The taxon names in the data set contain the full hierarchy of mammalian classification, which can be processed with the scripts dohi.run and colorgroups.run (copied to the ../monday folder). After reading that data set, typing dohi _taxon_A; will display the group in the tree closest to the taxon specified (in Windows, tree-previewing must be turned OFF for this). In Windows, colorgroups taxon_A taxon_B; will display a tree diagram in which the branches of taxa A and B are shown in different colors (up to 10 groups can be shown; tree-previewing must be turned ON). This can make it easier to check the results of an analysis.</p> <p>Optimization
1. Read the data set in example.tnt, and calculate most parsimonious tree(s). Map characters onto the trees, using: (a) (Windows only) color codes; (b) numbers to indicate states; (c) state names.
2. (Windows only) Then create a metafile, colormap.emf, which includes a color mapping of the character named male_spur. Use thick branches to make sure the colors are clearly visible.
3. On tree number 0, count the minimum and maximum possible numbers of transformations (change, Optimize/SpecificChanges) for character male_spur. Count the number of losses and the number of gains (these should be: losses, 7-8; gains, 2-5).
4. For the following data set:
A 0
B 1
C 2
D 3
E 4
F 5
G 6
H 7
I 8
J 9</p> <p>(a non-additive character), create a random tree, and count the number of possible reconstructions (recons, Optimize/Characters/Reconstructions). How many are there? (Note: there should be over 4,000!) 
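For exercise 4, one way to count most parsimonious reconstructions of a non-additive character is a Sankoff-style dynamic programming pass that, for every node and every state, keeps both the minimum number of steps and the number of ways of achieving it. The Python sketch below assumes a rooted tree written as nested tuples, with the observed states at the leaves; whether this matches exactly how TNT's recons defines a reconstruction (for example, how the root is treated) is an assumption, not something stated in the handout.

```python
import math

# Minimal sketch (assumption): a rooted tree is a nested tuple whose leaves
# are the observed states (ints) of one non-additive character, so every
# change between two different states costs one step.

def leaf_states(node):
    """Set of states observed in the terminals."""
    if isinstance(node, int):
        return {node}
    return set().union(*(leaf_states(child) for child in node))

def cost_and_count(node, states):
    """Map each state s to (minimum steps in this subtree if the node has
    state s, number of internal-node assignments achieving that minimum)."""
    if isinstance(node, int):
        return {s: (0, 1) if s == node else (math.inf, 0) for s in states}
    children = [cost_and_count(child, states) for child in node]
    table = {}
    for s in states:
        total_cost, total_ways = 0, 1
        for child in children:
            # cheapest child state, paying one step when it differs from s
            best = min(c + (t != s) for t, (c, _) in child.items())
            ways = sum(n for t, (c, n) in child.items() if c + (t != s) == best)
            total_cost += best
            total_ways *= ways
        table[s] = (total_cost, total_ways)
    return table

def count_reconstructions(tree):
    """Return (length, number of most parsimonious reconstructions)."""
    root = cost_and_count(tree, leaf_states(tree))
    length = min(c for c, _ in root.values())
    return length, sum(n for c, n in root.values() if c == length)

print(count_reconstructions(((0, 1), 2)))  # (2, 5)
```

Only observed states are considered, since an unobserved state can never be part of a most parsimonious reconstruction of a non-additive character. With the A-J data set, where every taxon has its own state, any tree has a length of 9 steps; only the number of reconstructions changes with the topology.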
5. For the data set example.tnt, find most parsimonious trees, and then calculate the strict consensus (with Trees/Consensus; whether node numbers are plotted is controlled with naked, or with Format/ShowNodeNumbers). Calculate the synapomorphies common to all the trees. What are the common synapomorphies for the node common to the taxa named L_ (=Lycinus) and D_ (=Diplothelopsis)? (This node is probably numbered 104 in the consensus, depending on how you did your search.) Is any of characters 22, 45, 46, 64, 85, and 102 a synapomorphy of that group in any of the shortest trees?
6. For the same situation as above, produce the common mapping of character 22 (maxillary_cuspules) onto the most parsimonious trees (Optimize/Characters, or the map[ option). If the consensus is optimized as such, then the character changes without ambiguity from 1 to 0 in the node common to Lycinus and Diplothelopsis. Confirm this, and confirm that there is a most parsimonious tree where that change does not occur (or does not occur unambiguously).</p> <p>7. For the data in example.tnt, create 10 random trees (random). Sort them from best to worst (sort). Calculate the lengths. Then retain only the shortest of the random trees; what is its length, compared to the length of the most parsimonious trees?
8. Read the data from example.tnt, and then read the trees from ..\dsets\mixture.ctf (shortread, File/ReadCompactTreeFile). Calculate tree lengths (length, Optimize/TreeLength); some trees have length 382 (the minimum), others 384 and 385, and some trees are very long (they are random trees). Create tree groups (Trees/TreeGroups) for each of the length classes, so that the groups are named:
   1. "shortest": all trees of length 382;
   2. "medium": all trees of length 384;
   3. "longer": all trees of length 385;
   4. "random": all trees of length greater than 385;
   5. "notsobad": the trees from groups 1 and 2.</p> <p>Then use the groups created to output tree lengths (which, by the way, confirms that the groups were properly created). If you have no problems defining the tree groups with the menu interface, repeat the tree-group definition in a file, using commands (tgroup command).
9. Read the data from example.tnt, and then read the trees from ..\dsets\mixture.ctf (shortread, File/ReadCompactTreeFile). Condense the trees (condense, Trees/TreeBuffer/CondenseTrees, with default settings; settings are controlled with collapse, Settings/CollapsingRules). Produce a table of the number of nodes of all the trees (tnode, Trees/Describe/NumberOfNodes).</p> <p>Tree Searches
1. Read the data from example.tnt. Deactivate all taxa except the first 20 (i.e., taxa 0-19). Calculate an exact solution. Compare the results with 100 random addition sequences, saving up to 10 trees per replicate. Is the heuristic search likely to have found the actual minimum length for the first 20 taxa? Why? Add taxa one by one, and compare the times required for exact solutions with 21, 22, 23, etc. taxa, until an exact solution cannot be achieved in about 5-10 minutes. Then run 100 random addition sequences with up to 10 trees per replicate; is it likely that there are shorter trees?
2. Read the data set in tbrdemo.tnt, which automatically calls the script dotbr.run (in the dsets directory). This must be run with the character-mode version, and is a graphical demonstration of how a tree search proceeds in practice.
3. Read the data from fam.tnt. What is the length of the shortest trees? How many distinct trees are there? How many TBR islands are there? 
Find the best trees where Ummidia+Calathotar+Heteromigas+Actinopus+Plesiolena+Idiops+Neocteniza+Misbolas do not form a monophyletic group, the best trees where Stenoteromm+Acanthogona do not form a monophyletic group, and the best trees where MECICOBOT+ATYPIDA form a monophyletic group. What are the lengths in each case? (They should be 228, 229, and 235.) Then find a tree where all those constraints are satisfied at the same time; how do you explain the difference in minimum possible length? (It should be 238.)
4. Read the data from example.tnt. Set the collapsing rule to "max. length = 0" (rule 3). What would be the best strategy for finding all the equally parsimonious trees under that setting?
5. Read the data from tricky_1.tnt. With collapsing as set in the file (to be seen in a future class), there are 864 distinct trees. In general, it might be expected that saving more trees in each of several random addition sequences makes it more likely to find all the equally parsimonious trees than saving only a few trees per replication. Test this idea by running two different analyses: first, run 20 replications of a random addition sequence, saving up to 216 trees per replicate; then, run 20 replications saving up to 430 trees per replicate. To make sure of the differences, run each analysis several times, changing the random seed, or using time as the random seed (rseed 0;). Which of the two alternatives finds all the trees? Is this in agreement with the expected results? Why? Compare this with the number of trees found if, after completing the random addition sequences, global TBR is performed starting from the trees found.
6. Read the data from tricky_2.tnt. How can you explain that the exact solution of this matrix (finding all the trees of minimum length: 10,395 trees of 306 steps) can be done much faster (about 50x) than a heuristic solution (from a single starting point, saving all possible trees)?
7. Read the data in tricky_2.tnt, set the collapsing to "min.
length = 0" (=rule 1), and then compare the running time of a single starting point for TBR saving up to 1000 trees with the running time of TBR saving up to 11,000 trees. The first finds (and swaps) 1000 trees in X secs. The second finds (and swaps) 10,395, about 10 times more. The second, however, does not take 10 times longer, but several hundred times longer. Why is that?
8. Read the data in zilla.tnt. Compare the results of searching with three different strategies:</p> <p>a) multiple random addition sequences, saving up to 10 trees per replication;
b) a single random addition sequence, saving up to 10,000 trees;
c) as in (b), but setting collapsing to "none".
In all cases, set the timeout to 3 minutes (with timeout 3:0 or with Analyze/Timeout), so that all searches use the same amount of time. Change the random seed between searches, and repeat several times. We will try to calculate grand totals for the whole lab.
9. Just for fun: read the data set from hel.tnt. This is a relatively difficult data set, with 854 taxa. The minimum length is 23,005. Using only traditional search strategies, try to find trees as short as you can. Produce a log in which the status is saved every 30 secs. to a file called heltrad.out (controlled with report, or Settings/ReportLevels). This will be compared with the results that can be obtained using new strategies, in future classes.</p> <p>Ambiguity, Consensus, Tree-Collapsing, Comparing Trees
1. Read the data from example.tnt, and find all equally parsimonious trees under the default collapsing rule, "min. length = 0" (=rule 1). How many are there? Calculate the strict consensus tree; how many nodes does it have? (nelsen, tnodes, or Trees/Consensus and Trees/Describe/NumberOfNodes). Now set the collapsing rule to "max. length = 0" (=rule 3). Try to find all equally parsimonious trees: how easy is it? 
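The strict consensus asked for in this exercise keeps only the groups present in every tree. A minimal Python sketch, assuming rooted trees written as nested tuples of taxon names (a hypothetical representation, not TNT's), collects the clusters of each tree and intersects them. Counting the root cluster among the nodes is a convention chosen here and may differ from the numbers reported by TNT.

```python
# Minimal sketch (assumption): rooted trees as nested tuples of taxon names.
# A strict consensus keeps only the groups (clusters) found in every tree.

def clusters(tree):
    """Set of clusters (frozensets of terminals), one per internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found

def strict_consensus(trees):
    """Clusters shared by all trees; their number is the number of nodes
    in the strict consensus, counting the root."""
    common = clusters(trees[0])
    for tree in trees[1:]:
        common &= clusters(tree)
    return common

t1 = ("A", ("B", ("C", ("D", "E"))))
t2 = ("A", ("B", ("D", ("C", "E"))))
consensus = strict_consensus([t1, t2])
print(len(consensus))  # 3: ABCDE, BCDE, CDE
```

The conflicting groups (D E) and (C E) are both dropped, so C, D, and E form an unresolved trichotomy in the consensus.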
With all the trees you could find, calculate the strict consensus; how many nodes does it have?
2. Read the data from example.tnt (make sure settings are the default ones before reading the data set). Do a few random addition seq...</p>
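The exact-solution comparison in the first Tree Searches exercise (adding taxa one by one beyond the first 20) is easier to interpret with the size of tree space in mind. The number of distinct unrooted, fully resolved trees for n taxa is (2n - 5)!!, so adding one taxon to n multiplies it by 2n - 3. The short Python sketch below only computes that count; exact algorithms avoid examining most of these trees, so the figures indicate scale, not the amount of work TNT actually does.

```python
# Number of distinct unrooted, fully resolved trees for n labelled taxa:
# (2n - 5)!! = 3 * 5 * ... * (2n - 5), defined here for n >= 3.

def unrooted_binary_trees(n):
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

# Growth of tree space from 20 to 25 taxa, in scientific notation.
for n in range(20, 26):
    print(n, f"{unrooted_binary_trees(n):.3e}")
```

Going from 20 to 21 taxa multiplies the number of possible trees by 37, and every further taxon multiplies it by a slightly larger factor.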