
AFTER THE MOLECULAR EVOLUTION REVOLUTION


© 2001 The Society for the Study of Evolution. All rights reserved.

BOOK REVIEWS

Evolution, 55(12), 2001, pp. 2620–2622

AFTER THE MOLECULAR EVOLUTION REVOLUTION¹

JUNHYONG KIM

Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect Street, New Haven, Connecticut 06511

Received October 30, 2001.

“Shift happens” (Dilbert).

“Plus ça change, plus c’est la même chose” (Alphonse Karr).

Nearly twenty years ago, with all the deep credentials of having once run a clustering program, I expressed an interest in systematics at the beginning of my graduate studies and was greeted with the question, “Do you know the difference between phenetics and cladistics?” (I didn’t.) It is probably a good thing that my graduating students are also mostly ignorant of this Maginot Line of the 70s and 80s. In my mind, this shift, in which knowing the difference between phenetics and cladistics changed from the shibboleth of systematics to vague history, happened around the Sturm und Drang (Futuyma 1988) of the Asilomar Society for the Study of Evolution meetings, where more than one talk explained the mechanics of polymerase chain reaction (PCR) amplification.

Two major methodological advancements developed around that time. The first, not a minor advance, is the growth and availability of fast personal computers. Although much of the quantitative methodological development happened during the earlier phase of modern systematics, the actual use of numerical methods was greatly impeded by both the speed of available computers and the difficulty of their use. The intrepid systematist who wanted to use these techniques often also had to be a computer programmer familiar with arcane systems-level commands (if the person was fortunate enough to have access to a machine in the first place). Today programs such as PAUP* (Swofford 1997) are revolutionary in their ease of use and are available on almost everyone’s desktop. The second important development is the spectacular advancement of molecular biology. Although there were early attempts to use biochemical and molecular data in systematics (e.g., Wilson et al. 1974; Sibley and Ahlquist 1981), the difficulty of use, inaccessibility, and cost were major obstacles. Again, these problems were largely overcome during the mid-80s with the development of techniques such as PCR. This wide availability of molecular data was (and is) important for several reasons (setting aside the imagined increase in rigor). First, the amount of available phylogenetically relevant data was greatly expanded. Previously, a large dataset of morphological characters might have contained 100 characters, whereas a large molecular dataset today may contain several thousand characters or even several million characters. Second, by an extremely fortunate circumstance of nature, it turned out to be possible to extract comparable molecular characters over a wide range of taxonomic distance, in fact, perhaps across all of Life. This contrasts markedly with morphological data, where we cannot imagine extracting homologous characters between, for example, Escherichia coli and humans. Third and most important, it made systematics accessible to a larger population: a student could be trained in a matter of weeks to produce useful data, compared to the years of apprenticeship required for anatomical studies.

¹ Molecular Evolution and Phylogenetics. Masatoshi Nei and Sudhir Kumar. 2000. Oxford University Press, Oxford, U.K. xi plus 333 pp. HB $90.00, ISBN 0-19-513584-0; PB $50.00, ISBN 0-19-513585-7.

These developments were critical to the expansion of phylogenetic biology and its entry into mainstream evolutionary biology, as discussed in Doug Futuyma’s article and presidential address (Futuyma 1988). Phylogenetic biology refers to the use of explicit history to address biological problems. Evolutionary biology, of course, also concerns itself with the historical process; however, this concern is mostly focused on abstracting the general features of the historical process, for example, the process of natural selection. Phylogenetic biology proposes that the study of explicit histories is an indispensable part of extracting general principles and that key insights can be gained from particular features of the history. Pivotal papers such as Felsenstein (1985) and Donoghue (1989) showed both the importance of accounting for historical constraints and the utility of historical reconstructions in evaluating and constructing evolutionary hypotheses. Most biological organization develops through a bifurcating descent process, and the fact that this is reflected in the patterns of conservation and differentiation of molecular function expanded the domain of phylogenetic biology to molecular biology proper. Lastly, the realization that many other processes (e.g., the evolution of languages and the spread of diseases) follow a similar branching pattern further widened the applicability of phylogenetic estimation techniques, such that today phylogenetic biology and the incorporation of historical reconstruction are as ubiquitous as the idea of natural selection.

It is then not so coincidental that Masatoshi Nei’s major foray into phylogenetics started around this time, with the introduction of the neighbor-joining method with Saitou (Saitou and Nei 1987): the mood of the times seems to have evoked the glue to join molecular evolution and systematics. The neighbor-joining algorithm blends the theoretical spirit of additive distance estimates with the practical algorithm of sequential tree building, and it has been shown to perform well. In Molecular Evolution and Phylogenetics, Nei and Kumar similarly blend theory and practical advice in an effective melange. The book treats molecular phylogenetics theory and molecular evolution theory in an approximately equal mixture, but the slant is definitely towards phylogenetics, with the molecular evolution topics covered mostly to support the theoretical basis for phylogenetic estimation. For example, the first four chapters explain molecular models of evolutionary change mainly to show what they imply for joint distributional patterns over a tree. One of the later chapters discusses population trees from genetic markers, but this is not an in-depth treatment of coalescence theory. The level of presentation is such that I would imagine this book to be an excellent textbook for a one-semester course on molecular systematics for graduate students or fairly advanced undergraduate students. It is particularly appropriate material for empirically oriented students because of its practical approach in discussing theoretical material. Typical of this practical advice is the section on distance selection at the end of chapter 6. Rather than falling into the fallacy of “the more realistic the model the better,” the authors give a common-sense taxonomy of distance computation based on bias and variance trade-off considerations. In fact, one of the major strengths of the book (as is expected of the authors) is the well-rounded discussion, in the first two chapters, of molecular evolution and genetic distances, especially formulas for variance computation.
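To make the bias and variance trade-off concrete, here is a minimal sketch that assumes the Jukes-Cantor one-parameter model purely for illustration; it is not taken from the book, and the function names are hypothetical. The corrected distance removes the downward bias of the raw proportion of differing sites, but its large-sample (delta-method) variance is inflated by the factor 1/(1 - 4p/3)^2, which is one reason a more realistic correction is not automatically the better choice for short sequences.

```python
import math


def p_distance_variance(p: float, n_sites: int) -> float:
    """Binomial sampling variance of the raw proportion of differing sites."""
    return p * (1.0 - p) / n_sites


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jukes_cantor_variance(p: float, n_sites: int) -> float:
    """Delta-method variance: (dd/dp)^2 * Var(p), with dd/dp = 1 / (1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return p_distance_variance(p, n_sites) / (1.0 - 4.0 * p / 3.0) ** 2


if __name__ == "__main__":
    # Illustrative values only: the variance inflation grows as p approaches 0.75.
    for p in (0.05, 0.30, 0.60):
        d = jukes_cantor_distance(p)
        ratio = jukes_cantor_variance(p, 500) / p_distance_variance(p, 500)
        print(f"p={p:.2f}  d={d:.4f}  variance inflation={ratio:.2f}")
```

For closely related sequences the correction costs little, while for divergent ones the corrected estimate becomes much noisier, so the sensible choice depends on both divergence and sequence length.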

One of the important changes that accompanied the mainstreaming of phylogenetics is the use of statistical language. The merits of an explicit statistical framework for systematics had been vigorously argued, to the degree that questions were raised whether one should talk of “phylogenetic estimation” or “phylogenetic reconstruction” (or even whether there was anything to reconstruct). Much of the opposition to statistical language seemed historical; those upstarts of the numerical taxonomy school had started out with the controversial idea that numerical analysis could reproduce, and in fact better, traditional systematic practice (a challenge generated over a bet on a six-pack of beer, according to R. Sokal). I doubt the real opposition was anything more than sociological, but the surface argument seemed to boil down to whether “unrealistic assumptions” (such as those found in simulations and stochastic models) tell us anything about the “real” world. A problem with this view is that “realistic assumptions” do not tell us much either. That is, statements about the natural world are inductive statements, and for such statements the idea that “truth follows from correct assumptions” is rather tenuous. We can make the wrong assumption that the sun moves around the earth and still come up with the correct prediction that the sun will rise tomorrow morning. Having a consistent model-theoretic view (i.e., “correct assumptions”) often leads to a larger domain of inductive inference and therefore statements with more utility (e.g., an earth-centered view may be perfectly fine for predicting the disposition of the sun in the morning, but it is not likely to be useful for launching astronauts into space). But deductive consistency cannot guarantee correct inductive statements. If we rule out ESP, all statements about the world are model-theoretic inductive statements. Statistics is the science of quantitative inference; that is, it is the science of induction. Once phylogenetics escaped the sociological constraints of its genesis, it became natural to wear statistics clothes, although there are still those who wonder if the clothes are those of the Emperor.

Nei and Kumar, of course, use statistical language and framework for their book, starting with the quote from Efron and Tibshirani on the (avoidable) difficulty of statistics. The mathematical and statistical treatment is at an appropriate level for a graduate audience, similar to Nei’s Molecular Evolutionary Genetics (1987). However, certain parts of the book are a bit strained in their statistical discussion. For example, the authors state that the tree topology is not a parameter under standard statistical theory because it is not a numerical quantity (and the corollary argument that maximum-likelihood methods for estimating trees do not have trees as a parameter). The definition of a parameter can be tricky. One definition involves a set of functions from the probability measure over the event space such that the measure is characterized up to a suitable equivalence. A dual construction involves a function from a reasonable index set to a collection of probability measures. However, the case of tree topology involves nothing particularly difficult except for the fact that the index set, that is, the tree topologies, is discrete and without much useful structure such as an algebraic construction or even a partial ordering. The standard maximum-likelihood estimator does estimate tree topologies as a parameter.
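The point about a discrete index set can be sketched in a few lines. The count below uses the standard result that n labeled taxa admit (2n - 5)!! unrooted bifurcating topologies. The function names and the log-likelihood values are hypothetical, invented only to show that maximizing over topologies is an ordinary argmax over a finite, unordered set once the continuous parameters (branch lengths and so on) have been maximized out for each topology.

```python
from typing import Dict


def count_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies on n_taxa labeled tips: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def ml_topology(max_loglik_by_topology: Dict[str, float]) -> str:
    """Return the topology label whose maximized log-likelihood is largest."""
    return max(max_loglik_by_topology, key=max_loglik_by_topology.get)


# Made-up maximized log-likelihoods for the three possible four-taxon topologies.
example = {
    "((A,B),(C,D))": -1234.5,
    "((A,C),(B,D))": -1240.1,
    "((A,D),(B,C))": -1239.8,
}

if __name__ == "__main__":
    print(count_unrooted_topologies(4), ml_topology(example))
```

The index set has no ordering or algebra to exploit, which makes the search hard, but nothing about the estimator itself is nonstandard.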

Similarly, I was uncomfortable with the discussion of expected trees, realized trees, and estimated trees. There is a more natural way to discuss the so-called realized trees whereby we might consider the change events on the tree as hidden random variables; thus one might think of estimating them in a Bayesian sense. But the authors’ discussions on whether different tree estimation methods estimate the expected trees or the realized trees seem confusing. In the particular examples discussed in the book, the estimation methods are trying to estimate the tree from a family of models that include nonclock models; the particular sample happens to fit a nonclock model better even though it was generated under a clock model, just as a sample of normally distributed numbers may fit a nonzero mean better even if their underlying distribution had zero mean. This simply means that our estimator is unbiased with respect to the larger model family. There is also a variation with respect to the use of the word “bias” at the end of chapter 5 that is not standard. This leads me to the one section of the book to which I would partly object. This is the section on the optimization principle and topological errors discussed at the beginning of chapter 9. Here, the authors state that a particular statistic, $R = (L_e - L_t)/L_t$, where $L_e$ is the objective function value of the estimated tree and $L_t$ is the objective function value of the true tree, tends to negative values when sample size is small. Thus, they argue that the optimization principles generally give incorrect topologies when sample size is small. It is indeed the case that small samples tend to produce incorrect estimates. However, this is a property of most well-behaved statistical estimators; it is not a special property of tree estimators, nor does the behavior of $R$ indicate particular problems with the objective functions (which may very well exist, of course). For example, consider a sample from a normal distribution and the least-squares estimate of the mean, that is, $\frac{1}{n}\sum_i x_i$. If we computed $R$ for this estimator, where $L_e$ is the sum of squares around the estimated mean and $L_t$ is the sum of squares around the true mean, $R$ will tend to have negative values with small sample size. But this does not indicate a special problem with the least-squares estimator. Overall, this is a peccadillo in a well-balanced book with strong statistical underpinnings.
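The least-squares analogy is easy to check numerically. The following is a small simulation sketch, with hypothetical function names and arbitrary settings (standard normal data, a fixed seed), that computes $R$ for the sample mean. Because the sample mean minimizes the sum of squares, $L_e \le L_t$ for every sample, so $R$ is never positive, and its average moves toward zero as the sample grows.

```python
import random
from typing import List


def r_statistic(sample: List[float], true_mean: float) -> float:
    """R = (L_e - L_t) / L_t for the least-squares estimate of a mean."""
    estimate = sum(sample) / len(sample)
    l_e = sum((x - estimate) ** 2 for x in sample)   # sum of squares around the estimate
    l_t = sum((x - true_mean) ** 2 for x in sample)  # sum of squares around the true mean
    return (l_e - l_t) / l_t


def mean_r(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average R over repeated standard normal samples of size n (true mean 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += r_statistic(sample, 0.0)
    return total / trials


if __name__ == "__main__":
    # Small samples give clearly negative averages; large samples approach zero.
    for n in (5, 50, 500):
        print(n, round(mean_r(n), 4))
```

A negative $R$ here says only that an estimate fitted to the data fits those data at least as well as the truth does; it says nothing against least squares as a criterion.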

Besides the well-advertised philosophical disagreements, systematics is marred by a dogmatic search for a hammer to pound all the nails, the screws, the bolts, and the dissenters. Thus, instead of each new methodological development becoming another tool in our toolbox (to possibly turn a screw rather than pound it), it becomes a candidate hammer of Thor to solve all the problems. We may have adopted the statistical language and more quantitative approaches to the phylogeny problems, but we do not seem to have abandoned the search for and advocacy of this mighty forger of Rheingold. Old philosophical arguments take on new guises: people talk of “getting the right answer for the wrong reasons” or vice versa. The idea that “real” is different from “models” has turned into the idea that an increasingly more realistic model of evolution incorporated into statistical tools will yield the ultimately correct tree (the truth-following-from-true-assumptions argument). The latest hammer seems to be the Bayesian one, and I predict that decision-theoretic statistical tools will join the pounding line soon. In this respect, Nei and Kumar’s book is refreshing in that it commits itself to neither the biological dogma nor the statistical dogma. As I mentioned before, a balanced practical outlook is the greatest strength of this book.

A cynical analysis would say that phylogenetics became popular because it generated a production paradigm: grind up organisms, sequence, estimate a tree, and discuss character evolution over the tree. There is a certain truth to this view, as many of my nonsystematics colleagues would agree. However, a whole new territory is just beginning to be covered (unfortunately with parallel literature), namely phylogenetic analysis of genomics data, as pointed out in the Perspective section of this book. It is well recognized that the gap between the amount of systematically collected molecular data and biological knowledge will never be closed by direct experimentation. Computational and statistical analysis will play a central role in converting these massive data into biologically relevant information. Phylogenies, even as a rough sketch, provide a road map of the generating process of biodiversity and will play a crucial role in the future transition of molecular biology into a more quantitative field.

But, even as we feel optimistic about the future, a thought remains in my mind that a certain dialogue has been lost in systematics. The idea of a phylogeny emphasizes the generating process, what some might say is the causal history of current diversity and disparity. Twentieth-century science might be characterized as the triumph of causal theories over pattern classification. Classification in the tradition of Linnaeus now seems to be considered an inferior mode of science. Causal explanation of phenomena is seen to give a more fundamental insight into nature and many advocate the primacy of lineage-dependent systematics. However, causal explanations and classificatory explanations are dual descriptions of the same thing. The flight of a stone may be described in terms of Newtonian mechanics or classified as a parabolic trajectory. The utility of each description depends on the need. Differential equations are useful when we wish to know where that stone will land. Knowing all thrown stones follow a parabolic trajectory is useful when we wish to determine if in fact the stone might be a bird. (Most birders will acknowledge that a flight path is a key feature for identifying species.) There was considerable attention paid to theoretical properties of classification such as stability, efficiency, predictability, and so forth, that seems distant today (along with all the other important pre-1988 literature that seems to be lost). These properties are not always naturally carried by a phylogeny (e.g., see Steel et al. 2000). Different constructions have different utilities. Science should be carried out efficiently. In both methodology and philosophy, a dose of the practicality evident in this book and a good look at the diversity of tools and constructs will help us realize that sometimes Velcro is better than a rusty nail and sometimes a hammer is just a hammer.

LITERATURE CITED

Donoghue, M. J. 1989. Phylogenies and the analysis of evolutionary sequences, with examples from seed plants. Evolution 43:1127–1156.

Felsenstein, J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1–15.

Futuyma, D. J. 1988. Sturm und Drang and the evolutionary synthesis. Evolution 42:217–226.

Nei, M. 1987. Molecular Evolutionary Genetics. Cambridge Univ. Press, Cambridge, UK.

Saitou, N., and M. Nei. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. J. Mol. Evol. 4:406–425.

Sibley, C. G., and J. E. Ahlquist. 1981. The phylogeny and relationships of the ratite birds as indicated by DNA-DNA hybridization. Pp. 301–355 in G. G. E. Scudder and J. L. Reveal, eds. Evolution Today. Carnegie Mellon Univ. Press, Pittsburgh, PA.

Steel, M., A. W. M. Dress, and S. Bocker. 2000. Some simple but fundamental limits for supertree and consensus tree methods. Syst. Biol. 42:363–368.

Swofford, D. L. 1997. PAUP: phylogenetic analysis using parsimony. Ver. 4.0. Sinauer, Sunderland, MA.

Wilson, A. C., V. M. Sarich, and L. R. Maxson. 1974. The importance of gene rearrangement in evolution: Evidence from studies of rates of chromosomal, protein and anatomical evolution. Proc. Natl. Acad. Sci. USA 71:3028–3030.

Book Review Editor: D. Futuyma