

ANALYTICAL BIOCHEMISTRY 247, 462–464 (1997)

BOOK REVIEWS

DNA Sequencing Strategies: Automated and Advanced Approaches. Edited by W. ANSORGE, H. VOSS, AND J. ZIMMERMANN. Wiley, New York, and Spektrum Akademischer Verlag, Heidelberg, 1997. 202 pp.

This text arises from the protocols and experience used in teaching a yearly DNA sequencing course in the EMBO Practical Course Series. Thus, the material presented is biased toward the approaches found to work best by the investigators at the EMBL laboratories. Because these investigators represent one of the top sequencing groups in the world, this represents some of the most successful approaches available. In addition, the authors’ experience with presenting this material in a course has certainly helped them to refine their presentation to the most essential elements.

To some extent, I think the title may be misleading. The first chapter is really devoted to what I would refer to as sequencing strategies. These include shotgun, sequential deletion and a combined random/directed approach which the authors refer to as RANDI. However, the next three quarters of the volume then present different technical aspects of carrying through those strategies, including DNA preparations and sequencing reactions, data assembly, and computer data analysis. Thus, the volume spans the entire field of DNA sequencing, but with an emphasis on automation and high throughput. In none of these areas is the discussion of options exhaustive, but each area presents a brief discussion of several approaches to the task, followed by detailed protocols for accomplishing those tasks. Each chapter ends with a number of very valuable practical comments as well as discussions of pitfalls and troubleshooting. The protocols and comments are all derived from extensive experience and are therefore very efficient and to the point.

In the first chapter the authors present protocols for making random libraries from sheared DNA as well as restriction enzyme partial digestions. They also discuss the use of sequential deletions for directed sequencing, as well as direct primer walking. They further discuss combinations of initiating a project in a random manner and closing the gaps in the sequence with directed sequencing strategies. There are numerous variants of all of these procedures that are not covered, but the protocols presented provide an excellent representation of the most commonly used methods. This chapter also provides specific discussion of cloning vectors and primers used in subcloning and sequencing.

The second chapter discusses the preparation of the sequencing templates. Most investigators find that this is the most critical part of a successful DNA sequencing reaction. Protocols for both manual and automated preparation of M13 single-stranded templates and double-stranded plasmid templates are presented. Additionally, there is an extensive discussion of PCR-derived templates. These cover the most commonly used sequencing templates, although I would have also liked to see a protocol for single-stranded DNA preparation from phagemids. The PCR section discusses several sources of template DNA for the PCR amplification and provides two alternative approaches to sequencing the products, one involving cycle sequencing of double-stranded PCR product and the other involving a strand separation based on a biotinylated primer in the PCR. Procedures are presented for scaling up all of these protocols effectively. A great deal of emphasis is appropriately placed on PCR product optimization and PCR primers and protocols are presented for the human mitochondrial D loop and for the human APC gene.

Chapter 3 discusses the sequencing protocols themselves, with specific emphasis on fluorescently labeled primers and on internal labeling. I was disappointed that there were no protocols for the dye terminator primers that are gaining increasing acceptance. Sequencing reactions for the use of sequenase and for cycle sequencing are both presented and represent an excellent series of alternatives. All of the sequencing strategies are aimed at fluorescent methods for automated sequencers, but many of the sections of the text could be easily adapted to more manual sequencing methods. There is a section of troubleshooting the reactions. There is even a short section on automating Maxam–Gilbert sequencing for specialty uses.

Chapter 4 discusses primer design and provides a number of useful tips. There is extensive discussion of primer synthesis and purification that will serve only as background for the bulk of individuals who obtain their primers from commercial or core laboratories. Chapter 5 provides protocols for sequencing gel preparation with little discussion of any useful variation.

Chapter 6 does a good job of presenting the approaches needed to assemble sequencing fragments into contigs and means to carry out standard computer analysis of the sequencing data. Discussion of editing of the data and determination of sequence quality is very important. The discussion revolves around the use of the EMBL Gene Skipper program and is therefore not exhaustive in terms of the different computer packages available or of the total types of analysis that may be available. However, the presentation will give the user an appreciation for the practical aspects of handling a large sequencing project.

Although the volume cannot be anywhere near exhaustive of the useful approaches and procedures available for sequencing, I found that there was a well-balanced presentation of representative protocols that would accomplish almost any sequencing requirement. There is sufficient discussion to introduce the subject of DNA sequencing to the relative novice and yet enough excellent and varied protocols to make the volume of use to an experienced sequencer as well.

PRESCOTT DEININGER
Department of Biochemistry
Louisiana State University Medical College
New Orleans, Louisiana 70112

ARTICLE NO. AB972081

0003-2697/97 $25.00. Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.

Methods in Enzymology, Volume 266, Computer Methods for Macromolecular Sequence Analysis. Edited by RUSSELL F. DOOLITTLE. Academic Press, San Diego, 1996. 711 pp., $110.

It has been 6 years since publication of the first Methods in Enzymology volume focusing on the computational analysis of protein and nucleic acid sequences (Volume 183 of this series; R. Doolittle, Ed.). Given the large-scale genome sequencing and mapping projects ongoing around the world, those 6 years have been particularly fruitful for the fields of computational biology and bioinformatics. Two developments in particular, the creation of the EST databases and the now-commonplace use of the World Wide Web, have had a significant impact in this area. This new volume nicely reflects these changes. Most of the databases and algorithms described are available as robust, user-friendly software available on the Web, and the authors include the pertinent URLs for public access to these resources. The emphasis here is on the analysis of protein sequence, though not exclusively; chapters range from molecular evolution, gene-finding from genomic DNA sequence, to protein structure prediction. The structure of the volume itself is logically divided into five sections: Section I (Chapters 1–8) describes the best-known databases, primarily sequence and structure. Section II (Chapters 9–19) covers the more widely used pattern-recognition algorithms used to search the databases, and Sections III (Chapters 20–28), IV (Chapters 29–33), and V (Chapters 34–40) treat multiple sequence alignments, secondary structure, and three-dimensional considerations, respectively. As with its predecessor, Volume 266 is primarily directed at computationally inclined biologists rather than biologically inclined computer scientists.

The first chapter describes the databases and services offered to the research community by the European Bioinformatics Institute. This chapter sets the tone for the volume with its clarity, copious references to the literature, URL addresses for pertinent Web sites and ftp servers, and even a glossary of terms for those new to the Internet. Chapter 2 describes the Expressed Gene Anatomy (EGAD) and Human cDNA (HCD) databases. These resources have developed a logical infrastructure for organizing large EST data sets based on presumed protein function. Chapters 3 and 4 describe data models and the protein superfamily classification system of the Protein Identification Resource (PIR). The BLOCKS database contains protein domains common to known families of related proteins found by a local multiple alignment (without insertions or deletions) as well as the actual multiple alignments. The construction and searching of BLOCKS occupies Chapter 6. One approach to more efficient searching of the budding sequence databases is the automated classification of new sequences into sequence classes or families. The Gene Classification Artificial Neural System algorithm (GenCANS) using a neural network system is ably described in Chapter 5. Another approach to this search space problem is through the creation of sequence sublibraries specific to the needs of a particular protein family or project. Chapter 7 outlines the means to index such libraries. At the other end of the spectrum, the querying of a constellation of diverse, linked databases of contrasting data types is the function of the widely used Sequence Retrieval System (SRS), detailed in Chapter 8.

The second section concerns searching through databases. Chapter 9 begins with a discussion of a network BLAST service, followed by a description of the Entrez molecular biology database and retrieval system in Chapter 10. A lucid, practical user’s guide to other sequence similarity tools such as FASTA and Smith–Waterman is outlined in Chapter 15; still, one would like to see a more detailed discussion of the sensitive Smith–Waterman algorithm if not in this chapter, somewhere in this volume. For evaluating sequence similarities in the twilight zone (those 10–20% of real similarities too distant to be detected by BLAST or FASTA alone), Chapters 11–13 present sequence-based search strategies such as protein profile analysis and motif searching. Chapter 11 in particular thoughtfully outlines a series of checklists for conducting motif and profile searches and the interpretation of results. Chapter 19 reviews results obtained using a method to generate an information-rich query for searching the sequence databases. Using six examples the authors present the strengths and limitations of using templates of known local structure as seed queries, followed by an iterative refinement of the template through multiple alignments with the best scoring matches from each database search. Also, suggestions are offered as to how to decrease the running time of this computationally intensive method. Few chapters in this volume address this problem to much degree; an exception is Chapter 14, “Effective Large-Scale Sequence Similarity Searches.” In the course of a discussion of sequence masking in searching the EST databases, the author touts parallel processing in general and distributed parallel processing in particular as an affordable solution to the need for increased computing power. This second section also contains two chapters concerned with the analysis of nucleotide sequences: Chapter 16 nicely describes the coding region recognition system GRAIL, while the following chapter treats a linguistic analysis of genomic DNA, including a handy compilation of other word-finding algorithms within the field. Recent sequencing of the entire genome of several organisms has led to a general strategy for the analysis of genome-scale sequence data sets. Chapter 18 addresses some of the novel patterns that emerge from this level of analysis, such as paralogs: homologous genes from the same organism that have related but not identical functions.

The alignment of multiple related sequences has been fundamental to protein sequence analysis for some time. Section III, “Multiple Alignment and Phylogenetic Trees,” is anchored by Chapter 21, contributed by the editor. Chapter 20 gives careful consideration of the critical issue of weighting gaps, and briefly compares several other published approaches to the problem. The many molecular biologists who rely upon the CLUSTAL series of progressive alignment programs will appreciate the straightforward explanations of Chapter 22. On the other hand, Chapter 28 on “Parametric and Inverse-Parametric Sequence Alignment with XPARAL” is targeted toward computational biologists. Chapter 27 shows how local alignment statistics, central to the original BLAST algorithm, have been used by the authors to develop new versions of BLAST programs that tolerate gaps in sequence comparisons. Though less so than in Volume 183, this section of Vol. 266 addresses estimating sequence divergence and the construction of phylogenetic trees from nucleotide sequence (Chapters 25 and 26), protein sequence (Chapter 24), as well as from a combined DNA and protein alignment algorithm (Chapter 23).

Section IV, “Secondary Structure Considerations,” is good but underrepresented. The accuracy of secondary structure predictions has benefited greatly from multiple alignments. The AMAS program, based on the conservation of 10 physicochemical properties at each position in a multiple protein sequence alignment, is very clearly illustrated with examples from 6 blind predictions in Chapter 29. The COILS program described in Chapter 30 is quite useful for the prediction and analysis of solvent-exposed left-handed coils; examples are given for both soluble globular and membrane proteins. Chapter 32 presents an updated presentation of the popular GOR method of predicting protein secondary structure. Chapter 31 documents the PHD family of programs for the prediction of secondary structure, relative solvent accessibility, and transmembrane helices. These tools use evolutionary information from a multiple sequence alignment as input into a two-level neural network, followed by correcting filters to produce quite accurate predictions. As with other software described in this volume, the author provides the WWW URLs and ftp address to servers running this suite of programs; perhaps it also bears mentioning that commercial institutions are charged $1000/year for access. Only a brief discussion of periodicity in the context of helical secondary structures appears to have landed in Chapter 33, the “Analysis of Compositionally-Biased Sequences,” in this section of this volume. To this reader this discussion of the SEG family of programs would be more appropriate in Section II. Moreover, given that this volume is nearly 700 pages, the 74 pages devoted to Section IV seem spare.

The last section, “Three-Dimensional Considerations,” offers a representative sample of the treasure of new structure prediction and comparison methods now available. Section V begins with Chapter 34, a concise examination of amino acid replacements from aligned 3D structures of related proteins. This permits the detection of environment-specific residue substitutions and gap penalties that vary according to their position along a fold. The following chapter outlines several methods for generating a 3D profile from protein sequence of unknown structure to determine if that sequence is compatible with known three-dimensional structures. The 3D profiles are created from a position-by-position score for each amino acid based on 18 composite physicochemical and environmental parameters. Chapter 36 reviews various methods for predicting protein structural families based on internal geometry, including the Sequential Structure Alignment Program (SSAP). SSAP represents three-dimensional structures by two-dimensional matrices consisting of vectors from Cβ to Cβ atoms. Chapter 39 formulates the Dali method, which generates protein structure alignments using a distance plot of Cα distances. A concise description of the anatomy of the SCOP database (Structural Classification of Proteins) in Chapter 37 adds to this strong section of the volume. Finally, Chapters 38 (“Detecting Structural Similarities: A User’s Guide”) and 40 (“From Block Alignment to Structural Insights”) are not methods chapters per se but provide some empirical rules for the interpretation of structure predictions in general, from potential pitfalls to structural insights.

In summary, this volume is a very helpful, nonredundant collection of chapters describing some of the most useful tools and methods for the analysis of macromolecular sequence. Most of the chapters on algorithms present the assumptions implicit in their formulation, giving the biologist a good idea to which questions they are best applied, as well as an informed use of the parameters. One minor complaint would be that some of the chapters make only passing reference to the computer itself. Computational issues such as creating algorithms readily adaptable to parallel processing, making databases more interoperative, or presenting analysis output in new visual ways are critical to making computational tools that scale with the exponential growth of sequence data. Still, no other text gives as coherent and comprehensive a review of the field. This volume should be within reach of every molecular biology lab’s computer workstation.

MITCHELL MARTIN
Hoffmann–LaRoche
Nutley, New Jersey 07110

ARTICLE NO. AB972082

Methods in Molecular Biology, Volume 56, Crystallographic Methods and Protocols. Edited by CHRISTOPHER JONES, BARBARA MULLOY, AND MARK R. SANDERSON. Humana Press, Totowa, New Jersey, 1996. 394 pp., $69.50.

There is growing appreciation among molecular biologists of the potential benefits of obtaining the three-dimensional structure of a protein or protein/ligand complex. This book is intended to give molecular biologists, who are collaborating with a crystallographer or are themselves moving in this direction, an awareness of the challenges involved in obtaining a refined structure.

The opening chapter gives a good basic sketch of the theories behind structure solving, introducing the reader to many of the terms first encountered in this discipline. However, a comprehensive bibliography is provided for those wishing to read more widely. The book then covers most of the important methods required for structure determination, such as preparation of suitable material, crystal growth and characterization, data collection, phase determination, production of an electron density map, model building, and structure refinement. The final chapters cover the application of crystallography to a number of contemporary areas of structural biology: protein/DNA interactions, virus crystallography, and membrane proteins.

Although early chapters contain some repetition, the majority of the book gives the reader a very good introduction and critique of this area. The chapters on multiple-wavelength anomalous diffraction and isomorphous replacement will be particularly helpful to the beginner. This volume is very practical, containing many protocols and lists of required components, with source addresses. In general, the writers give sufficient theory to orient the reader in the right direction before proceeding with practical considerations.

Overall, I believe that this volume succeeds in its proposed purpose and would be a valuable asset to anyone seeking to begin to understand crystallography.

TIMOTHY M. JENKINS
Laboratory of Molecular Biology
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
Bethesda, Maryland 20892

ARTICLE NO. AB969964
