BIOMOLECULAR NETWORKS
Methods and Applications in Systems Biology

LUONAN CHEN
Osaka Sangyo University, Japan

RUI-SHENG WANG
Renmin University of China, China

XIANG-SUN ZHANG
Chinese Academy of Science, China



Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Chen, Luonan, 1962-
Biomolecular networks : methods and applications in systems biology / Luonan Chen, Rui-Sheng Wang, Xiang-Sun Zhang.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-24373-2 (cloth)
1. Molecular biology–Data processing. 2. Computational biology. 3. Bioinformatics. 4. Biological systems–Research–Data processing. I. Wang, Rui-Sheng. II. Zhang, Xiang-Sun, 1943- III. Title.
QH506.C48 2009
572.80285--dc22

2009005776

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

We Dedicate This Book to Our Colleagues and Our Families

CONTENTS

PREFACE
ACKNOWLEDGMENTS
LIST OF ILLUSTRATIONS
ACRONYMS

1 Introduction
1.1 Basic Concepts in Molecular Biology
1.1.1 Genomes, Genes, and DNA Replication Process
1.1.2 Transcription Process for RNA Synthesis
1.1.3 Translation Process for Protein Synthesis
1.2 Biomolecular Networks in Cells
1.3 Network Systems Biology
1.4 About This Book

I GENE NETWORKS

2 Transcription Regulation: Networks and Models
2.1 Transcription Regulation and Gene Expression
2.1.1 Transcription and Gene Regulation
2.1.2 Microarray Experiments and Databases
2.1.3 ChIP-Chip Technology and Transcription Factor Databases
2.2 Networks in Transcription Regulation
2.3 Nonlinear Models Based on Biochemical Reactions
2.4 Integrated Models for Regulatory Networks
2.5 Summary

3 Reconstruction of Gene Regulatory Networks
3.1 Mathematical Models of Gene Regulatory Network
3.1.1 Boolean Networks
3.1.2 Bayesian Networks
3.1.3 Markov Networks
3.1.4 Differential Equations
3.2 Reconstructing Gene Regulatory Networks
3.2.1 Singular Value Decomposition
3.2.2 Model-Based Optimization
3.3 Inferring Gene Networks from Multiple Datasets
3.3.1 General Solutions and a Particular Solution of Network Structures for Multiple Datasets
3.3.2 Decomposition Algorithm
3.3.3 Numerical Validation
3.4 Gene Network-Based Drug Target Identification
3.4.1 Network Identification Methods
3.4.2 Linear Programming Framework
3.5 Summary

4 Inference of Transcriptional Regulatory Networks
4.1 Predicting TF Binding Sites and Promoters
4.2 Inference of Transcriptional Interactions
4.2.1 Differential Equation Methods
4.2.2 Bayesian Approaches
4.2.3 Data Mining and Other Methods
4.3 Identifying Combinatorial Regulations of TFs
4.4 Inferring Cooperative Regulatory Networks
4.4.1 Mathematical Models
4.4.2 Estimating TF Activity
4.4.3 Linear Programming Models
4.4.4 Numerical Validation
4.5 Prediction of Transcription Factor Activity
4.5.1 Matrix Factorization
4.5.2 Nonlinear Models
4.6 Summary

II PROTEIN INTERACTION NETWORKS

5 Prediction of Protein–Protein Interactions
5.1 Experimental Protein–Protein Interactions
5.2 Prediction of Protein–Protein Interactions
5.2.1 Association Methods
5.2.2 Maximum-Likelihood Estimation
5.2.3 Deterministic Optimization Approaches
5.3 Protein Interaction Prediction Based on Multidomain Pairs
5.3.1 Cooperative Domains, Strongly Cooperative Domains, Superdomains
5.3.2 Inference of Multidomain Interactions
5.3.3 Numerical Validation
5.3.4 Reconstructing Complexes by Multidomain Interactions
5.4 Domain Interaction Prediction Methods
5.4.1 Statistical Method
5.4.2 Domain Pair Exclusion Analysis
5.4.3 Parsimony Explanation Approaches
5.4.4 Integrative Approaches
5.5 Summary

6 Topological Structure of Biomolecular Networks
6.1 Statistical Properties of Biomolecular Networks
6.2 Evolution of Protein Interaction Networks
6.3 Hubs, Motifs, and Modularity in Biomolecular Networks
6.3.1 Network Centralities and Hubs
6.3.2 Network Modularity and Motifs
6.4 Explorative Roles of Hubs and Network Motifs
6.4.1 Dynamic Modularity Organized by Hubs and Network Motifs
6.4.2 Network Motifs Acting as Connectors between Pathways
6.5 Modularity Evaluation of Biomolecular Networks
6.5.1 Modularity Density D
6.5.2 Improving Module Resolution Limits by D
6.5.3 Equivalence between D and Kernel k Means
6.5.4 Extension of D to General Criteria: Dλ and Dω
6.5.5 Numerical Validation
6.6 Summary

7 Alignment of Biomolecular Networks
7.1 Biomolecular Networks from Multiple Species
7.2 Pairwise Alignment of Biomolecular Networks
7.2.1 Score-Based Algorithms
7.2.2 Evolution-Guided Method
7.2.3 Graph Matching Algorithm
7.3 Network Alignment by Mathematical Programming
7.3.1 Integer Programming Formulation
7.3.2 Components of the Integer Quadratic Programming Approach
7.3.3 Numerical Validation
7.4 Multiple Alignment of Biomolecular Networks
7.5 Subnetwork and Pathway Querying
7.6 Summary

8 Network-Based Prediction of Protein Function
8.1 Protein Function and Annotation
8.2 Protein Functional Module Detection
8.2.1 Distance-Based Clustering Methods
8.2.2 Graph Clustering Methods
8.2.3 Validation of Module Detection
8.3 Functional Linkage for Protein Function Annotation
8.3.1 Bayesian Approach
8.3.2 Hopfield Network Method
8.3.3 p-Value Method
8.3.4 Statistical Framework
8.4 Protein Function Prediction from High-Throughput Data
8.4.1 Neighborhood Approaches
8.4.2 Optimization Methods
8.4.3 Probabilistic Methods
8.4.4 Machine Learning Techniques
8.5 Function Annotation Methods for Domains
8.5.1 Domain Sources
8.5.2 Integration of Heterogeneous Data
8.5.3 Domain Function Prediction
8.5.4 Numerical Validation
8.6 Summary

III METABOLIC NETWORKS AND SIGNALING NETWORKS

9 Metabolic Networks: Analysis, Reconstruction, and Application
9.1 Cellular Metabolism and Metabolic Pathways
9.2 Metabolic Network Analysis and Modeling
9.2.1 Flux Balance Analysis
9.2.2 Elementary Mode and Extreme Pathway Analysis
9.2.3 Modeling Metabolic Networks
9.3 Reconstruction of Metabolic Networks
9.3.1 Pathfinding Based on Reactions and Compounds
9.3.2 Stoichiometric Approaches Based on Flux Profiles
9.3.3 Inferring Biochemical Networks from Timecourse Data
9.4 Drug Target Detection in Metabolic Networks
9.4.1 Drug Target Detection Problem
9.4.2 Integer Linear Programming Model
9.4.3 Numerical Validation
9.5 Summary

10 Signaling Networks: Modeling and Inference
10.1 Signal Transduction in Cellular Systems
10.2 Modeling of Signal Transduction Pathways
10.2.1 Differential Equation Models
10.2.2 Petri Net Models
10.3 Inferring Signaling Networks from High-Throughput Data
10.3.1 NetSearch Method
10.3.2 Ordering Signaling Components
10.3.3 Color-Coding Methods
10.4 Inferring Signaling Networks by Linear Programming
10.4.1 Integer Linear Programming Model
10.4.2 Significance Measures
10.4.3 Numerical Validation
10.4.4 Inferring Signaling Networks by Network Flow Models
10.5 Inferring Signaling Networks from Experimental Evidence
10.6 Summary

11 Other Topics and New Trends
11.1 Network-Based Protein Structural Analysis
11.2 Integration of Biomolecular Networks
11.3 Posttranscriptional Regulation of Noncoding RNAs
11.4 Biomolecular Interactions and Human Diseases
11.5 Summary

REFERENCES
INDEX

PREFACE

Network-based systems biology (or Network Systems Biology), an emerging area focusing on various biomolecular networks, lies at the multidisciplinary intersection of mathematics, computer science, and biology. Burgeoning high-throughput data are driving integrative studies from describing complex phenomena to understanding essential design principles, and from studying individual components to understanding the functional networks of biomolecular systems, cells, organs, and even entire organisms. Because it can elucidate the fundamental mechanisms of cellular systems, the study of biomolecular networks is attracting increasing attention from many academic fields, such as mathematics, information science, and the life sciences. A major challenge in network systems biology is to investigate how cellular systems carry out biological functions through various interactions (pathways and networks) among genes, proteins, and metabolites. Using analytical and computational methodologies, network systems biology studies how an organism, viewed as a dynamical or interacting network of biomolecules (e.g., genes, proteins, and complexes) and biochemical reactions, eventually gives rise to complex life. In contrast to individual molecules, biomolecular networks governed by universal laws offer a new conceptual framework that could potentially revolutionize our view of biology and pathology. It is therefore important for mathematicians and computer scientists to provide theoretical and computational methodologies that reveal the underlying biological mechanisms of living organisms from a system or network perspective.

With this in mind, this book comprehensively covers the modeling, inference, and analysis of biomolecular networks in cellular systems on the basis of available experimental data, with particular emphasis on the aspects of network, system, integration, and engineering. Each topic is treated in depth, with specific biological problems and novel computational methods. From a biological viewpoint, this book, based on the authors’ research work and experience in studying biomolecular networks, describes a variety of research topics related to biomolecular networks, such as gene regulatory networks, transcription regulatory networks, protein interaction networks, metabolic networks, signal transduction networks, and the integration of heterogeneous networks, with deep analysis of many real examples and detailed descriptions of the latest trends. From a computational perspective, this book covers many theoretical and computational methods from several areas, such as optimization, differential equations, probability theory, statistics, graph theory, complex systems, network analysis, statistical thermodynamics, graphical modeling, and machine learning, all of which are applied to the analysis of biomolecular networks.

The goal of this book is to help readers understand state-of-the-art techniques in bioinformatics and systems biology, particularly the theory and application of biomolecular networks.

The potential readers are (1) specialists and advanced students in systems biology and computational biology, as well as practitioners in industry; (2) researchers and graduate students in computer science and mathematics who are interested in systems biology; and (3) molecular biologists who are interested in using computational tools to analyze biological networks. Hence, any university or research institute with a bioinformatics or systems biology program will find this book useful.

The contents of this book are based mainly on collaborative studies and discussions with many researchers, including Drs. Yong Wang (Chapters 3, 8), Dong Xu (Chapter 3), Ling-Yun Wu (Chapter 5), Zhenping Li (Chapters 6, 7, 9), Shihua Zhang (Chapters 6, 7), Guangxu Jin (Chapter 7), Xing-Ming Zhao (Chapters 8, 10), and Zhi-Ping Liu (Chapter 11). Collectively and individually, we express our gratitude to these people for their collaboration.

LUONAN CHEN
RUI-SHENG WANG
XIANG-SUN ZHANG

October 2008

ACKNOWLEDGMENTS

The authors express their sincerest gratitude and appreciation to the colleagues who contributed material to this book and shared their expertise and vision. In particular, the authors thank Prof. Dong Xu of the University of Missouri, Columbia; Drs. Yong Wang, Ling-Yun Wu, Shihua Zhang, and Guangxu Jin of the Academy of Mathematics and Systems Science, CAS; Prof. Zhenping Li of Beijing Wuzi University; Drs. Xing-Ming Zhao, Ruiqi Wang, and Achyut Sapkota and Prof. Zengrong Liu of Shanghai University; and Dr. Zhi-Ping Liu of Osaka Sangyo University for their cooperation in bringing this book to completion.

LUONAN CHEN
RUI-SHENG WANG
XIANG-SUN ZHANG

LIST OF ILLUSTRATIONS

Figures

1.1. The Double-Helix DNA Backbone with Complementary Base Pairs
1.2. The Double-Helix Structure of a DNA
1.3. The Central “Dogma” of Molecular Biology
1.4. The Structure of Eukaryotic Genes and Splicing Process
1.5. The Mapping Rules (Genetic Codes) from Codons to Amino Acids
1.6. The Structure of tRNA
1.7. Ingredients in Cellular Systems in Terms of Network Architecture
1.8. Omic Data and Biomolecular Networks
1.9. Systems Biology Focusing on Integrating Omic Data
1.10. The Research Focus of Network Systems Biology
1.11. Biomolecular Networks with Major Computational Tools Applied in This Book
2.1. Gene Structure and Transcription Process
2.2. The Whole Process of Gene Expression
2.3. Scheme of the cDNA Microarray Technique
2.4. Scheme of the ChIP-Chip Experiment Process
2.5. Genetic Interactions in Gene Regulatory Networks
2.6. Illustrations of a Single Node Input-Output Device and a Gene Regulatory Network
2.7. Structural Organization of Transcription Regulatory Networks
2.8. Illustrations of a TF Binding to DNA and Starting Transcription
2.9. Scheme of a Node in a Nonlinear Gene Regulatory Network
3.1. A Boolean Network for Three Genes
3.2. An Example of a Simple Bayesian Network
3.3. Graphical View of a Dynamic Bayesian Network Model
3.4. A Simple Example of a Markov Network
3.5. Number of Errors E as Function of Number of Measurements for Four Linear Networks
3.6. Critical Number of Measurements Required to Recover the Entire Connectivity Matrix Correctly versus Network Size for Linear Systems
3.7. The Scheme of GRNInfer for Inferring Gene Regulatory Networks
3.8. A Simulated Example with λ = 0 and without Noise
3.9. A Simulated Example with Noise
3.10. Two Connected Subnetworks of the 64-Link Inferred Yeast Cell Cycle Regulatory Network
3.11. The Inferred 35-Link Arabidopsis thaliana Stress Response Regulatory Network
3.12. Schematic Overview of the Mode-of-Action by Network Identification (MNI) Method
3.13. Scheme of the Linear Programming (LP) Framework
3.14. Results of the LP Approach on the Simulated Network
3.15. Results for SOS Network
4.1. Scheme for Inferring TRN from Various Kinds of Transcription Data
4.2. Reconstructing Transcriptional Regulatory Networks by Integrating DNA Sequence and Gene Expression Information
4.3. Combinatorial Control in Gene Regulation
4.4. Expression Profiles of Genes Containing the Motifs Mcm1 and/or SFF
4.5. An Illustrative Example of a Thermodynamic Model for One TF with Two Binding Sites
4.6. Illustration of a Transcription Complex Participating in a Transcription Process
4.7. Yeast Cell Cycle Transcriptional Regulatory Network
4.8. Comparison Results of LP Method Based on TCs, LP Method Based on mRNA Levels of TFs, and SVD Method Based on mRNA Levels of TFs
4.9. Transcription Regulatory Network for Polyphosphate Metabolism
4.10. Workflow for Inferring Regulator Activity Profiles from Gene Expression Data and ChIP-chip Data
4.11. ICA, PCA, and NCA for a Regulatory System in Which the Output Data Are Driven by Regulatory Signals through a Bipartite Network
5.1. Overview of a Yeast Two-Hybrid Assay System
5.2. Mapping Protein–Protein Interactions Using Mass Spectrometry
5.3. Schematic Representation for Inferring Protein–Protein Interactions from Domain Information
5.4. An Illustrative Example of Two Proteins with Four Domains
5.5. Comparison of Various Methods for Specificity and Sensitivity on YIP Training Data and Testing Data
5.6. Comparison of Various Methods for Specificity and Sensitivity on YIP Test Data with Varying Reliability
5.7. ROC Curves of Prediction Results Based on Multiple Organism (Three Organisms) Data and Single Organism Data, Respectively
5.8. Comparison of Distributions of Pearson Correlation Coefficients for Top 1000 Predicted Interacting Protein Pairs Based on Multiple Organism Data and Single Organism Data, Respectively
5.9. Numbers of Matched Protein Pairs to MIPS1 among All Predictions by the Parsimony Model (PM) and MLE
5.10. An Illustrative Example of Multidomain Interactions
5.11. Cooperative Domains in the Complex Crystal Structure Formed by Proteins P02994 and P32471
5.12. Comparison of RMSE on Two-Domain Pairs and Multidomain Pairs for Krogan’s Yeast Extended Datasets
5.13. Comparison of Three Methods for Domain Interaction Prediction
5.14. Reconstruction of DNA-Directed RNA Polymerase Complex
6.1. Illustrations of a Random Network and a Scale-Free Network
6.2. Date Hubs and Party Hubs in Protein Interaction Networks
6.3. Motifs and Modules in Protein Interaction Networks
6.4. Proportions of mPHs and mDHs within the Hubs Common in FYI and HCfyi
6.5. Spatial Distribution of mDHs and mPHs
6.6. Cellular Localizations of mDHs and mPHs
6.7. Effects of Deleting mPHs with Their Motifs and mDHs with Their Motifs
6.8. The Filtered Human Interactome (FHI) Network and Motif Clusters in FHI
6.9. p Values of Motif Clusters Located between Cancers and Other Signal Pathways
6.10. p Values for Motif Clusters Located between Type II Diabetes Mellitus and Other Signal Pathways
6.11. Schematic Examples of (a) a Clique Circle Network and (b) a Network with Two Pairs of Identical Cliques
6.12. Comparison of Several Methods on Computer-Generated Networks with Known Community Structure
6.13. Karate Club Network and Optimal Partition Detected by Modularity Density D
6.14. Journal Index Network and the Value of D versus Different Partitions
7.1. A Scheme of Biomolecular Network Alignment
7.2. Illustration of Pairwise Pathway Alignment and Merged Alignment Graph
7.3. A Tutorial Network Alignment Example from PathBLAST Plugin of Cytoscape Software with λ = 0.5 by MNAligner
7.4. A Simulated Alignment Example of Two Directed Networks with λ = 0.5 by MNAligner
7.5. Illustration of Well-Matched Subnetworks in Yeast and Fly Protein Interaction Networks with λ = 0.9
7.6. Three Matched Interspecies Metabolic Pathway Pairs with λ = 0.9
7.7. Two Matched Intra-species Pairs with λ = 0.9
7.8. Illustration of a Scheme for Multiple Network Alignment
7.9. Biomolecular Network Querying Examples for Multiple Species and Conditions
7.10. Overview of Biomolecular Network Querying from Perspective of Systems Biology
8.1. Distribution of Score Sij for Gavin’s Core Dataset
8.2. Correlation Analysis of Z score with GO Similarity
8.3. A Functional Module in Constructed Functional Linkage Network Revealed by Statistical Framework
8.4. Illustration of Protein Function Prediction Based on Protein Interaction Data and Other Data Sources
8.5. Comparison of Five Methods for Function Prediction
8.6. Sensitivity versus 1 − Specificity for Threshold-Based Classification Method on InterPro Domains
8.7. Comparison of Threshold-Based Classification Method, SVM, and Logistic Regression for InterPro Domains
8.8. Sensitivity versus 1 − Specificity for Threshold-Based Classification Method on Pfam-A Domains
8.9. Comparison of SVM Method, Threshold-Based Classification Method, and Logistic Regression on Pfam-A Domains
8.10. Results Obtained by Logistic Regression Model with Various Combinations of Information Sources for Pfam-A Domains
9.1. Major Components of the Cellular Metabolism
9.2. Overview of the Main Metabolic Pathways
9.3. Illustration of a Reaction Network and Flux Balance Analysis
9.4. Elementary Modes and Extreme Pathways in a Reaction Network
9.5. Petri Net Modeling of Different Basic Reactions: Synthesis, Decomposition, Catalysis, Inhibition, and Reversible Reaction
9.6. Flowchart of Reconstruction of a Genome-Scale Metabolic Network
9.7. An Illustrative Metabolic Network
9.8. Graphical Illustration of the Integer Linear Programming Model
9.9. Comparison of LP and ILP on Various Metabolic Pathways in Terms of Implementing CPU Time and Average Damage
9.10. Comparison of LP and ILP on E. coli Entire Metabolic Network in Terms of Implementing CPU Time and Average Damage
9.11. Fraction of Essential Enzymes Plotted against Enzymes with Certain Damage
10.1. Coarse-Grained View of Signal Transduction
10.2. Example of a Petri Net
10.3. An Enzyme-Catalyzed Reaction Formulated Using Various Models
10.4. The MAPK Signaling Pathways for Yeast
10.5. Pheromone Response Signaling Pathways and Networks
10.6. Signaling Pathways of Filamentous Growth
10.7. Yeast MAPK Signaling Networks Detected by ILP Model from Integrated Data

Tables

2.1. Some Microarray Databases and Their Websites
2.2. Some Experimental and Predicted Transcription Factor Databases
3.1. Accuracies in Terms of Different Error Criteria and Confidence Evaluation
3.2. The SOS Network and Predicted Perturbations for E. coli
4.1. Several Databases of TF Binding Sites
4.2. Some Software for Searching TF Binding Sites
4.3. Databases of Promoters and TSSs
4.4. TFs Related to Yeast Cell Cycle and Their Transcription Complexes
4.5. p Values of Periodicity for Some TFs Related to Cell Cycle
4.6. TFs Related to Polyphosphate Metabolism and Their Transcription Complexes
5.1. Some Databases of Protein–Protein Interactions
5.2. Major Databases of Domain–Domain Interactions
5.3. Comparison of Various Methods for RMSE and Training Time on YIP Data
5.4. Comparison of Various Methods for Average RMSE and Training Time on THY Data
5.5. Performance of Various Methods in Terms of Correlation Coefficient on YIP
5.6. Results of Permutation Tests on Protein Interaction Data from Three Species
5.7. Number of Matched Domain Interactions with iPfam
5.8. Numbers of Matched Protein Pairs to MIPS1 among All Predictions
6.1. Statistical Significance of Differences between SAMCs of mDHs and mPHs
6.2. p Values for Pathogenesis Pathways with Respect to Cancers
6.3. Performance Comparison of Three Community Detection Methods on Symmetric and Asymmetric Networks
6.4. Performance Comparison of Three Community Detection Methods for Model Selection
7.1. Software Tools for Network Alignment or Pathway Querying
8.1. Gene and Protein Function Annotation Databases
8.2. Selected Functional Categories and Numbers of Annotated Genes
8.3. Results of Tenfold Cross-Validation Using Five Methods Averaged over 13 Classes
8.4. Prediction Results Using Five Methods Averaged over 13 Classes
8.5. Selected GO Terms for InterPro Domains
8.6. Selected GO Terms for Pfam-A Domains
8.7. Tenfold Cross-Validation Results Averaged over 20 Classes by Threshold-Based Classification Method on InterPro Domains
8.8. Tenfold Cross-Validation Results Averaged over 20 Classes by SVMs on InterPro Domains
8.9. Tenfold Cross-Validation Results Averaged over 10 Classes by Threshold-Based Classification Method on Pfam-A Domains
8.10. Tenfold Cross-Validation Results Averaged over 10 Classes by SVMs on Pfam-A Domains
8.11. Top Five Domains Assigned to Each GO Function Class with Highest Probabilities
9.1. Six Major Classes of Enzymes According to Enzyme Commission
9.2. Some Databases of Metabolic Pathways
9.3. List of Drug Targets for Some Drugs Detected by ILP Approach with Validation (Vd) Status
10.1. List of Signal Transduction Databases
10.2. Comparison of Different Methods for Detecting Pheromone Pathways on the Basis of Protein Interaction Data
10.3. Comparison of Different Methods for Detecting Filamentation Pathway on the Basis of Protein Interaction Data
10.4. Protein Interaction Data and Gene Expression Data for Detecting Yeast MAPK Pathways
10.5. p Values of Functional Enrichment for Pheromone Response Signaling Network Found by ILP
10.6. Performance of ILP Model in Detecting MAPK Signaling Networks

  • ACRONYMS

    2DE Two-dimensional gel electrophoresisAGPS Annotating genes with positive samplesAMC Average motif correlationANOVA Analysis of varianceAPM Association probabilistic methodAPMM Association probabilistic method with multidomain pairsASNM Association numerical methodBIND Biomolecular Interaction Network DatabaseBOLS Bayesian orthogonal least squaresBTR Binary transitive reductionCAGE Cap analysis of gene expressionCATH Class, Architecture, Topology, and Homologous superfamily databaseCBB Coomassie Brilliant BlueCBM Conserved Binding Mode databaseCD Czekanowski-Dice (distance)CDD Conserved Domain DatabaseChIP Chromatin immunoproteinCOG Cluster(s) of orthologous groupsCPM Clique percolation methodDBN Dynamical Bayesian networkDDI Domain–domain interactionDDIB Database of domain interactions and bindingsDIP Database of Interacting ProteinsDP Dynamic programmingDPEA Domain pair exclusion analysisEBI European Bioinformatics InstituteEC Enzyme CommissionEGF Epidermal growth factorEM Expectation maximizationERF Ethylene response factorERK Extracellular signal-regulated kinaseFA Factor analysisFBA Flux balance analysisGAP GTPase-activating proteinGEF Guanine exchange factor

    xxiii

  • GEO Gene expression omnibusGGM Graphical Gaussian modelGO Gene ontologyGRN Gene regulatory networkHCS Highly connected subgraphICA Independent component analysisILP Integer linear programmingIQP Integer quadratic programmingIST Interaction sequence tagKEGG Kyoto Encyclopedia of Genes and GenomesLP Linear programmingLPBN LP-based method for binary interaction dataLPNM LP-based method for numerical interaction dataM3D Many Microbe Microarray DatabaseMAMC Mean of average motif correlationsMAPK Mitogen-activated protein kinaseMCL Markov clustering algorithmMCMC Markov chain Monte CarloMCSA Monte Carlo simulated annealingmDH Motif date hubMESC Minimum exact set coverMFGO Modified and faster global optimizationMILP Mixed-integer linear programmingMIPS Munich Information Center for Protein SequencesMLE Maximum-likelihood estimationMNI Modification by network identificationMODY Mature-onset diabetes of the youngmPH Motif party hubMRF Markov random fieldMS Mass spectrometryMSC Minimum set coverMSSC Maximum-specificity set coverNCA Network component analysisNIR Network identification by (multiple) regressionNMR Nuclear magnetic resonanceODE Ordinary differential equationORF Open reading framePCA Principal-component analysisPCC Pearson correlation coefficientPCR Polymerase chain reactionPDB Protein Data BankPDE Partial differential equationPDGF Platelet-derived growth factorPE Parsimony explanationPLDE Piecewise-linear differential equation

    xxiv ACRONYMS

  • PLS Partial least squaresPPI Protein–protein interactionPVC Pseudovertex collapsePWM Position weight matrixQP Quadratic programmingRBF Radial basis functionRKIP Raf kinase inhibitor proteinRMSE Root-mean-square errorRNAP RNA polymeraseRNSC Restricted neighborhood search clusteringROC Receiver operating characteristicSAGE Serial analysis of gene expressionSAMC Standard deviation of average motif correlation(s)SCOP Structural classification of proteinsSDE Stochastic differential equationSGD Stanford Gene expression DatabaseSIM Signal-input motifSPA Selective permissibility algorithmSPC Superparametric clusteringSTN Signal transduction networkSVD Singular value decompositionSVM Support vector machineTAIR The Arabidopsis Information ResourcesTAP Tandem affinity purificationTC Transcription complexTF Transcription factorTFA Transcription factor activityTFBS Transcription factor binding siteTRN Transcriptional regulatory networkTSNI Time-series network identificationTSS Transcription start siteY1H Yeast one-hybridY2H Yeast two-hybrid

    CHAPTER 1

    INTRODUCTION

    1.1 BASIC CONCEPTS IN MOLECULAR BIOLOGY

    In this section we introduce some basic and central concepts of modern molecular biology to help readers understand the problems discussed in later chapters. Note that this is a very general and brief introduction, intended mainly for computer scientists and mathematicians who wish to acquire a reading knowledge of molecular biology. Biology-oriented researchers can skip the details in this section. For more detailed and systematic biological knowledge, readers can refer to professional books (e.g., [Sta02], [Kar02], [Bro02], [Sad07]).

    All living things, whether simple or complex organisms, are composed of cells, which are the basic units of structure and function in an organism [Sta02]. Each cell is a complex system consisting of many different building blocks. According to their size and the types of their internal structures, cells are classified as prokaryotic or eukaryotic, and this distinction in turn divides organisms into prokaryotic organisms (prokaryotes) and eukaryotic organisms (eukaryotes). Prokaryotic organisms, represented by bacteria and blue-green algae (cyanobacteria), are made up of prokaryotic cells, which are smaller and have simpler internal structures, whereas eukaryotic organisms such as fungi, plants, and animals are composed of structurally complex eukaryotic cells [Kar02]. The distinction between eukaryotes and prokaryotes is reflected in large differences in many cellular building blocks and life processes between these two types of organism.

    Both eukaryotic and prokaryotic cells contain a nuclear region that holds the genetic material of the organism. However, the genetic material of a prokaryotic cell is contained in a nucleoid without a boundary membrane, whereas a eukaryotic cell has a nucleus that is separated from the rest of the cell by a complex membranous structure, the nuclear envelope. Note that, in addition to the nuclear envelope of eukaryotic cells, both prokaryotic and eukaryotic cells are enclosed by a cell membrane (or plasma membrane), which regulates the flow of nutrients, energy, and information into and out of the cell and plays an important role in signal transduction. Despite their difference in nuclear organization, eukaryotic and prokaryotic cells have a similar molecular composition. For example, both eukaryotic and prokaryotic organisms possess a genome that contains the genetic information needed to maintain life in that organism. Another essential feature of most living cells is their ability to grow and reproduce through cell division in an appropriate environment. New cells arise from the reproduction of existing cells, and this is how life is maintained in living organisms.

    Biomolecular Networks. By Luonan Chen, Rui-Sheng Wang, and Xiang-Sun Zhang. Copyright © 2009 John Wiley & Sons, Inc.

    Cells consist of four basic types of molecules: (1) small molecules, (2) DNA, (3) RNA, and (4) proteins. Small molecules in cells include water, sugars, fatty acids, amino acids, and nucleotides. They serve either as the basic building blocks of the macromolecules (DNA, RNA, and proteins) or as independent units with important roles, for example, in signal transduction and as energy sources. The genomes of eukaryotes and prokaryotes consist of deoxyribonucleic acid (DNA), but some viruses have ribonucleic acid (RNA) genomes [Bro02]. DNA and RNA are large polymeric molecules made up of chains of monomeric subunits.

    DNA is the hereditary material in almost all organisms. In eukaryotic cells, most DNA is located in the nucleus, but a small amount is also found in the mitochondria. DNA is a linear polymer of four chemically distinct nucleotides, each consisting of three components: 2′-deoxyribose (a sugar with five carbon atoms, labeled 1′ to 5′), a phosphate group attached to the 5′-carbon of the sugar, and a nitrogenous base. The four nucleotides differ only in their nitrogenous bases, adenine (A), cytosine (C), guanine (G), and thymine (T); the nucleotides themselves are usually referred to as bases and denoted by these initial letters (Fig. 1.1). Hence, a DNA sequence can always be written as a string over the four letters A, C, G, and T. Individual nucleotides, in any sequence, are linked by phosphodiester bonds between the 3′-carbon of one nucleotide and the 5′-carbon of the next to form a DNA chain called a polynucleotide. A DNA molecule is typically double-stranded, and the bases on the two strands form complementary pairs: A pairs with T, and C pairs with G. The orientation of a DNA strand is defined by the carbons at its ends; by convention, a strand is written from its 5′ end to its 3′ end (Fig. 1.1), and the two strands run in opposite (antiparallel) directions. The two strands are held together in a stable structure known as the DNA double helix, which was identified by Watson and Crick in Cambridge in 1953 (Fig. 1.2).
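    Because a DNA sequence is simply a string over A, C, G, and T, the pairing and orientation rules can be stated directly in code. The following is a minimal Python sketch, assuming sequences are uppercase strings written 5′ to 3′; the names `is_dna` and `complementary_strand` are hypothetical helpers introduced for illustration, not functions defined in this book. Since the two strands of a double helix are antiparallel, the partner strand read 5′ to 3′ is obtained by complementing each base and then reversing the result.

```python
# A minimal sketch: DNA sequences as strings over {A, C, G, T}.
# The helper names below are hypothetical, chosen for illustration only.

DNA_ALPHABET = frozenset("ACGT")

# Complementary base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_dna(seq: str) -> bool:
    """Return True if seq is a non-empty uppercase string over {A, C, G, T}."""
    return len(seq) > 0 and set(seq) <= DNA_ALPHABET


def complementary_strand(seq: str) -> str:
    """Return the partner strand of seq, with both strands written 5'->3'.

    The strands are antiparallel, so the partner read 5'->3' is the
    base-by-base complement of seq taken in reverse order.
    """
    if not is_dna(seq):
        raise ValueError(f"not a DNA sequence: {seq!r}")
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    strand = "ATGCGT"                       # 5'-ATGCGT-3'
    print(complementary_strand(strand))     # prints ACGCAT (5'-ACGCAT-3')
```

    Applying `complementary_strand` twice returns the original sequence, which is a quick consistency check on the pairing table.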

    RNA is also a polynucleotide, and its structure is similar to that of DNA except for two main differences [Bro02]: (1) the sugar in an RNA nucleotide is ribose rather than deoxyribose, and (2) RNA contains uracil (U) instead of thymine (T). In addition, RNA generally does not form a double helix as DNA does. DNA and RNA also play different roles in living cells. Generally, DNA performs one essential function, encoding genetic information, while several types of RNA, such as ribosomal RNAs and transfer RNAs, perform different functions. RNA also contains 3′–5′ phosphodiester bonds, but these
