
Bioinformatics

460. Essential Concepts in Toxicogenomics, edited by Donna L. Mendrick and William B. Mattes, 2008

459. Prion Protein Protocols, edited by Andrew F. Hill, 2008

458. Artificial Neural Networks: Methods and Applications, edited by David S. Livingstone, 2008

457. Membrane Trafficking, edited by Ales Vancura, 2008

456. Adipose Tissue Protocols, Second Edition, edited by Kaiping Yang, 2008

455. Osteoporosis, edited by Jennifer J. Westendorf, 2008

454. SARS- and Other Coronaviruses: Laboratory Protocols, edited by Dave Cavanagh, 2008

453. Bioinformatics, Volume II: Structure, Function and Applications, edited by Jonathan M. Keith, 2008

452. Bioinformatics, Volume I: Data, Sequence Analysis and Evolution, edited by Jonathan M. Keith, 2008

451. Plant Virology Protocols: From Viral Sequence to Protein Function, edited by Gary Foster, Elisabeth Johansen, Yiguo Hong, and Peter Nagy, 2008

450. Germline Stem Cells, edited by Steven X. Hou and Shree Ram Singh, 2008

449. Mesenchymal Stem Cells: Methods and Protocols, edited by Darwin J. Prockop, Douglas G. Phinney, and Bruce A. Brunnell, 2008

448. Pharmacogenomics in Drug Discovery and Development, edited by Qing Yan, 2008

447. Alcohol: Methods and Protocols, edited by Laura E. Nagy, 2008

446. Post-translational Modification of Proteins: Tools for Functional Proteomics, Second Edition, edited by Christoph Kannicht, 2008

445. Autophagosome and Phagosome, edited by Vojo Deretic, 2008

444. Prenatal Diagnosis, edited by Sinhue Hahn and Laird G. Jackson, 2008

443. Molecular Modeling of Proteins, edited by Andreas Kukol, 2008

442. RNAi: Design and Application, edited by Sailen Barik, 2008

441. Tissue Proteomics: Pathways, Biomarkers, and Drug Discovery, edited by Brian Liu, 2008

440. Exocytosis and Endocytosis, edited by Andrei I. Ivanov, 2008

439. Genomics Protocols, Second Edition, edited by Mike Starkey and Ramnanth Elaswarapu, 2008

438. Neural Stem Cells: Methods and Protocols, Second Edition, edited by Leslie P. Weiner, 2008

437. Drug Delivery Systems, edited by Kewal K. Jain, 2008

436. Avian Influenza Virus, edited by Erica Spackman, 2008

435. Chromosomal Mutagenesis, edited by Greg Davis and Kevin J. Kayser, 2008

434. Gene Therapy Protocols: Volume II: Design and Characterization of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2008

433. Gene Therapy Protocols: Volume I: Production and In Vivo Applications of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2008

432. Organelle Proteomics, edited by Delphine Pflieger and Jean Rossier, 2008

431. Bacterial Pathogenesis: Methods and Protocols, edited by Frank DeLeo and Michael Otto, 2008

430. Hematopoietic Stem Cell Protocols, edited by Kevin D. Bunting, 2008

429. Molecular Beacons: Signalling Nucleic Acid Probes, Methods and Protocols, edited by Andreas Marx and Oliver Seitz, 2008

428. Clinical Proteomics: Methods and Protocols, edited by Antonia Vlahou, 2008

427. Plant Embryogenesis, edited by Maria Fernanda Suarez and Peter Bozhkov, 2008

426. Structural Proteomics: High-Throughput Methods, edited by Bostjan Kobe, Mitchell Guss, and Huber Thomas, 2008

425. 2D PAGE: Sample Preparation and Fractionation, Volume II, edited by Anton Posch, 2008

424. 2D PAGE: Sample Preparation and Fractionation, Volume I, edited by Anton Posch, 2008

423. Electroporation Protocols: Preclinical and Clinical Gene Medicine, edited by Shulin Li, 2008

422. Phylogenomics, edited by William J. Murphy, 2008

421. Affinity Chromatography: Methods and Protocols, Second Edition, edited by Michael Zachariou, 2008

420. Drosophila: Methods and Protocols, edited by Christian Dahmann, 2008

419. Post-Transcriptional Gene Regulation, edited by Jeffrey Wilusz, 2008

418. Avidin–Biotin Interactions: Methods and Applications, edited by Robert J. McMahon, 2008

417. Tissue Engineering, Second Edition, edited by Hannsjörg Hauser and Martin Fussenegger, 2007

416. Gene Essentiality: Protocols and Bioinformatics, edited by Svetlana Gerdes and Andrei L. Osterman, 2008

415. Innate Immunity, edited by Jonathan Ewbank and Eric Vivier, 2007

414. Apoptosis in Cancer: Methods and Protocols, edited by Gil Mor and Ayesha Alvero, 2008

413. Protein Structure Prediction, Second Edition, edited by Mohammed Zaki and Chris Bystroff, 2008

412. Neutrophil Methods and Protocols, edited by Mark T. Quinn, Frank R. DeLeo, and Gary M. Bokoch, 2007

411. Reporter Genes: A Practical Guide, edited by Don Anson, 2007

410. Environmental Genomics, edited by Cristofre C. Martin, 2007

409. Immunoinformatics: Predicting Immunogenicity In Silico, edited by Darren R. Flower, 2007

408. Gene Function Analysis, edited by Michael Ochs, 2007

407. Stem Cell Assays, edited by Vemuri C. Mohan, 2007

406. Plant Bioinformatics: Methods and Protocols, edited by David Edwards, 2007

405. Telomerase Inhibition: Strategies and Protocols, edited by Lucy Andrews and Trygve O. Tollefsbol, 2007

METHODS IN MOLECULAR BIOLOGY™

John M. Walker, SERIES EDITOR


Bioinformatics

Volume I

Data, Sequence Analysis and Evolution

Edited by

Jonathan M. Keith, PhD

School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia

ISBN: 978-1-58829-707-5 e-ISBN: 978-1-60327-159-2

ISSN 1064-3745 e-ISSN: 1940-6029

DOI: 10.1007/978-1-60327-159-2

Library of Congress Control Number: 2007943036

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, 999 Riverview Drive, Suite 208, Totowa, NJ 07512 USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Cover illustration: Fig. 4, Chapter 19, “Inferring Ancestral Protein Interaction Networks,” by José M. Peregrín-Alvarez

Printed on acid-free paper


springer.com

Editor
Jonathan M. Keith
School of Mathematical Sciences
Queensland University of Technology
Brisbane, Queensland, Australia
[email protected]

Series Editor
John Walker
Hatfield, Hertfordshire AL10 9NP
UK

Preface

Bioinformatics is the management and analysis of data for the life sciences. As such, it is inherently interdisciplinary, drawing on techniques from Computer Science, Statistics, and Mathematics and bringing them to bear on problems in Biology. Moreover, its subject matter is as broad as Biology itself. Users and developers of Bioinformatics methods come from all of these fields. Molecular biologists are some of the major users of Bioinformatics, but its techniques are applicable across a range of life sciences. Other users include geneticists, microbiologists, biochemists, plant and agricultural scientists, medical researchers, and evolution researchers.

The ongoing exponential expansion of data for the life sciences is both the major challenge and the raison d’être for twenty-first century Bioinformatics. To give one example among many, the completion and success of the human genome sequencing project, far from being the end of the sequencing era, motivated a proliferation of new sequencing projects. And it is not only the quantity of data that is expanding; new types of biological data continue to be introduced as a result of technological development and a growing understanding of biological systems.

Bioinformatics describes a selection of methods from across this vast and expanding discipline. The methods are some of the most useful and widely applicable in the field. Most users and developers of Bioinformatics methods will find something of value to their own specialties here, and will benefit from the knowledge and experience of its 86 contributing authors. Developers will find them useful as components of larger methods, and as sources of inspiration for new methods. Volume II, Section IV in particular is aimed at developers; it describes some of the “meta-methods” (widely applicable mathematical and computational methods that inform and lie behind other more specialized methods) that have been successfully used by bioinformaticians. For users of Bioinformatics, this book provides methods that can be applied as is, or with minor variations, to many specific problems. The Notes section in each chapter provides valuable insights into important variations and when to use them. It also discusses problems that can arise and how to fix them. This work is also intended to serve as an entry point for those who are just beginning to discover and use methods in Bioinformatics. As such, this book is also intended for students and early career researchers.

As with other volumes in the Methods in Molecular Biology™ series, the intention of this book is to provide the kind of detailed description and implementation advice that is crucial for getting optimal results out of any given method, yet which often is not incorporated into journal publications. Thus, this series provides a forum for the communication of accumulated practical experience.

The work is divided into two volumes, with data, sequence analysis, and evolution the subjects of the first volume, and structure, function, and application the subjects of the second. The second volume also presents a number of “meta-methods”: techniques that will be of particular interest to developers of bioinformatic methods and tools.

Within Volume I, Section I deals with data and databases. It contains chapters on a selection of methods involving the generation and organization of data, including sequence data, RNA and protein structures, microarray expression data, and functional annotations.

Section II presents a selection of methods in sequence analysis, beginning with multiple sequence alignment. Most of the chapters in this section deal with methods for discovering the functional components of genomes, whether genes, alternative splice sites, non-coding RNAs, or regulatory motifs.

Section III presents several of the most useful and interesting methods in phylogenetics and evolution. The wide variety of topics treated in this section is indicative of the breadth of evolution research. It includes chapters on some of the most basic issues in phylogenetics: modelling of evolution and inferring trees. It also includes chapters on drawing inferences about various kinds of ancestral states, systems, and events, including gene order, recombination events and genome rearrangements, ancestral interaction networks, lateral gene transfers, and patterns of migration. It concludes with a chapter discussing some of the achievements and challenges of algorithm development in phylogenetics.

In Volume II, Section I, some methods pertinent to the prediction of protein and RNA structures are presented. Methods for the analysis and classification of structures are also discussed.

Methods for inferring the function of previously identified genomic elements (chiefly protein-coding genes) are presented in Volume II, Section II. This is another very diverse subject area, and the variety of methods presented reflects this. Some well-known techniques for identifying function, based on homology, “Rosetta stone” genes, gene neighbors, phylogenetic profiling, and phylogenetic shadowing are discussed, alongside methods for identifying regulatory sequences, patterns of expression, and participation in complexes. The section concludes with a discussion of a technique for integrating multiple data types to increase the confidence with which functional predictions can be made. This section, taken as a whole, highlights the opportunities for development in the area of functional inference.

Some medical applications, chiefly diagnostics and drug discovery, are described in Volume II, Section III. The importance of microarray expression data as a diagnostic tool is a theme of this section, as is the danger of over-interpreting such data. The case study presented in the final chapter highlights the need for computational diagnostics to be biologically informed.

The final section presents just a few of the “meta-methods” that developers of Bioinformatics methods have found useful. For the purpose of designing algorithms, it is as important for bioinformaticians to be aware of the concept of fixed parameter tractability as it is for them to understand NP-completeness, since these concepts often determine the types of algorithms appropriate to a particular problem. Clustering is a ubiquitous problem in Bioinformatics, as is the need to visualize data. The need to interact with massive databases and multiple software entities makes the development of computational pipelines an important issue for many bioinformaticians. Finally, the chapter on text mining discusses techniques for addressing the special problems of interacting with and extracting information from the vast biological literature.

Jonathan M. Keith


Contents

Preface

Contributors

Contents of Volume II

SECTION I: DATA AND DATABASES

1. Managing Sequence Data
Ilene Karsch Mizrachi

2. RNA Structure Determination by NMR
Lincoln G. Scott and Mirko Hennig

3. Protein Structure Determination by X-Ray Crystallography
Andrea Ilari and Carmelinda Savino

4. Pre-Processing of Microarray Data and Analysis of Differential Expression
Steffen Durinck

5. Developing an Ontology
Midori A. Harris

6. Genome Annotation
Hideya Kawaji and Yoshihide Hayashizaki

SECTION II: SEQUENCE ANALYSIS

7. Multiple Sequence Alignment
Walter Pirovano and Jaap Heringa

8. Finding Genes in Genome Sequence
Alice Carolyn McHardy

9. Bioinformatics Detection of Alternative Splicing
Namshin Kim and Christopher Lee

10. Reconstruction of Full-Length Isoforms from Splice Graphs
Yi Xing and Christopher Lee

11. Sequence Segmentation
Jonathan M. Keith

12. Discovering Sequence Motifs
Timothy L. Bailey

SECTION III: PHYLOGENETICS AND EVOLUTION

13. Modeling Sequence Evolution
Pietro Liò and Martin Bishop

14. Inferring Trees
Simon Whelan

15. Detecting the Presence and Location of Selection in Proteins
Tim Massingham

16. Phylogenetic Model Evaluation
Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, and John Robinson

17. Inferring Ancestral Gene Order
Julian M. Catchen, John S. Conery, and John H. Postlethwait

18. Genome Rearrangement by the Double Cut and Join Operation
Richard Friedberg, Aaron E. Darling, and Sophia Yancopoulos

19. Inferring Ancestral Protein Interaction Networks
José M. Peregrín-Alvarez

20. Computational Tools for the Analysis of Rearrangements in Mammalian Genomes
Guillaume Bourque and Glenn Tesler

21. Detecting Lateral Genetic Transfer: A Phylogenetic Approach
Robert G. Beiko and Mark A. Ragan

22. Detecting Genetic Recombination
Georg F. Weiller

23. Inferring Patterns of Migration
Paul M.E. Bunje and Thierry Wirth

24. Fixed-Parameter Algorithms in Phylogenetics
Jens Gramm, Arfst Nickelsen, and Till Tantau

Index

Evolution Index


Contributors

FAISAL ABABNEH • Department of Mathematics and Statistics, Al-Hussein Bin Talal University, Ma’an, Jordan

TIMOTHY L. BAILEY • ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia

ROBERT G. BEIKO • Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

MARTIN BISHOP • CNR-ITB Institute of Biomedical Technologies, Segrate, Milano, Italy

GUILLAUME BOURQUE • Genome Institute of Singapore, Singapore, Republic of Singapore

PAUL M.E. BUNJE • Department of Biology, Lehrstuhl für Zoologie und Evolutionsbiologie, University of Konstanz, Konstanz, Germany

JULIAN M. CATCHEN • Department of Computer and Information Science and Institute of Neuroscience, University of Oregon, Eugene, OR

JOHN S. CONERY • Department of Computer and Information Science, University of Oregon, Eugene, OR

AARON E. DARLING • ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia

STEFFEN DURINCK • Katholieke Universiteit Leuven, Leuven, Belgium

RICHARD FRIEDBERG • Department of Physics, Columbia University, New York, NY

JENS GRAMM • Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, Germany

MIDORI A. HARRIS • European Molecular Biology Laboratory – European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom

YOSHIHIDE HAYASHIZAKI • Genome Exploration Research Group, RIKEN Yokohama Institute, Yokohama, Kanagawa, Japan; and Genome Science Laboratory, RIKEN Wako Institute, Wako, Saitama, Japan

MIRKO HENNIG • Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC

JAAP HERINGA • Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands

ANDREA ILARI • CNR Institute of Molecular Biology and Pathology (IBPM), Department of Biochemical Sciences, University of Rome, “Sapienza,” Roma, Italy

VIVEK JAYASWAL • School of Mathematics and Statistics, Sydney Bioinformatics and Centre for Mathematical Biology, University of Sydney, Sydney, New South Wales, Australia

LARS SOMMER JERMIIN • School of Biological Sciences, Sydney Bioinformatics and Centre for Mathematical Biology, University of Sydney, Sydney, New South Wales, Australia

HIDEYA KAWAJI • Functional RNA Research Program, Frontier Research System, RIKEN Wako Institute, Wako, Saitama, Japan


JONATHAN M. KEITH • School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia

NAMSHIN KIM • Molecular Biology Institute, Institute for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA

CHRISTOPHER LEE • Molecular Biology Institute, Institute for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA

PIETRO LIÒ • Computer Laboratory, University of Cambridge, Cambridge, United Kingdom

TIM MASSINGHAM • European Molecular Biology Laboratory – European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom

ALICE CAROLYN MCHARDY • IBM Thomas J. Watson Research Center, Yorktown Heights, NY

ILENE KARSCH MIZRACHI • National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD

ARFST NICKELSEN • Institut für Theoretische Informatik, Universität zu Lübeck, Lübeck, Germany

JOSÉ M. PEREGRÍN-ALVAREZ • SickKids Research Institute, Toronto, Ontario, Canada

WALTER PIROVANO • Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands

JOHN H. POSTLETHWAIT • Institute of Neuroscience, University of Oregon, Eugene, OR

MARK A. RAGAN • ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia

JOHN ROBINSON • School of Mathematics and Statistics and Centre for Mathematical Biology, University of Sydney, Sydney, New South Wales, Australia

CARMELINDA SAVINO • CNR-Institute of Molecular Biology and Pathology (IBPM), Department of Biochemical Sciences, University of Rome, “Sapienza,” Roma, Italy

LINCOLN G. SCOTT • Cassia, LLC, San Diego, CA

TILL TANTAU • Institut für Theoretische Informatik, Universität zu Lübeck, Lübeck, Germany

GLENN TESLER • Department of Mathematics, University of California, San Diego, La Jolla, CA

GEORG F. WEILLER • Research School of Biological Sciences and ARC Centre of Excellence for Integrative Legume Research, The Australian National University, Canberra, Australian Capital Territory, Australia

SIMON WHELAN • Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom

THIERRY WIRTH • Museum National d’Histoire Naturelle, Department of Systematics and Evolution, Herbier, Paris, France

YI XING • Department of Internal Medicine, Carver College of Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA

SOPHIA YANCOPOULOS • The Feinstein Institute for Medical Research, Manhasset, NY


Contents of Volume II

SECTION I: STRUCTURES

1. UNAFold: Software for Nucleic Acid Folding and Hybridization
Nicholas R. Markham and Michael Zuker

2. Protein Structure Prediction
Bissan Al-Lazikani, Emma E. Hill, and Veronica Morea

3. An Introduction to Protein Contact Prediction
Nicholas Hamilton and Thomas Huber

4. Analysis of Mass Spectrometry Data in Proteomics
Rune Matthiesen and Ole N. Jensen

5. The Classification of Protein Domains
Russell L. Marsden and Christine A. Orengo

SECTION II: INFERRING FUNCTION

6. Inferring Function from Homology
Richard D. Emes

7. The Rosetta Stone Method
Shailesh V. Date

8. Inferring Functional Relationships from Conservation of Gene Order
Gabriel Moreno-Hagelsieb

9. Phylogenetic Profiling
Shailesh V. Date and José M. Peregrín-Alvarez

10. Phylogenetic Shadowing: Sequence Comparisons of Multiple Primate Species
Dario Boffelli

11. Prediction of Regulatory Elements
Albin Sandelin

12. Expression and Microarrays
Joaquín Dopazo and Fátima Al-Shahrour

13. Identifying Components of Complexes
Nicolas Goffard and Georg Weiller

14. Integrating Functional Genomics Data
Insuk Lee and Edward M. Marcotte

SECTION III: APPLICATIONS AND DISEASE

15. Computational Diagnostics with Gene Expression Profiles
Claudio Lottaz, Dennis Kostka, Florian Markowetz, and Rainer Spang

16. Analysis of Quantitative Trait Loci
Mario Falchi

17. Molecular Similarity Concepts and Search Calculations
Jens Auer and Jürgen Bajorath

18. Optimization of the MAD Algorithm for Virtual Screening
Hanna Eckert and Jürgen Bajorath

19. Combinatorial Optimization Models for Finding Genetic Signatures from Gene Expression Datasets
Regina Berretta, Wagner Costa, and Pablo Moscato

20. Genetic Signatures for a Rodent Model of Parkinson’s Disease Using Combinatorial Optimization Methods
Mou’ath Hourani, Regina Berretta, Alexandre Mendes, and Pablo Moscato

SECTION IV: ANALYTICAL AND COMPUTATIONAL METHODS

21. Developing Fixed-Parameter Algorithms to Solve Combinatorially Explosive Biological Problems
Falk Hüffner, Rolf Niedermeier, and Sebastian Wernicke

22. Clustering
Geoffrey J. McLachlan, Richard W. Bean, and Shu-Kay Ng

23. Visualization
Falk Schreiber

24. Constructing Computational Pipelines
Mark Halling-Brown and Adrian J. Shepherd

25. Text Mining
Andrew B. Clegg and Adrian J. Shepherd
