CSCE555 Bioinformatics CSCE555 Bioinformatics Lecture 18 Network Biology Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555 University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu .

CSCE555 Bioinformatics
Lecture 18: Network Biology

Meeting: MW 4:00PM-5:15PM SWGN2A21

Instructor: Dr. Jianjun Hu

Course page: http://www.scigen.org/csce555

University of South Carolina
Department of Computer Science and Engineering

2008 www.cse.sc.edu

Outline

Biological networks and databases
Background of graphs and networks
Three types of bio-network analysis
◦ Network statistics
◦ Network-based functional annotation
◦ Bio-network reconstruction/inference
Summary

Why network analysis: Building models from parts lists

Systems Biology view

Biological Networks

Networks are found in biological systems of varying scales:

1. Evolutionary tree of life
2. Ecological networks
3. Expression networks
4. Regulatory networks (genetic control networks of organisms)
5. The protein interaction network in cells
6. The metabolic network in cells
… and more biological networks

Examples of Biological Networks

Metabolic networks
Signaling networks
Transcription regulatory networks
Protein-protein interaction networks

Signaling & Metabolic Pathway Network

A pathway can be defined as a modular unit of interacting molecules that fulfills a cellular function.

Signaling Pathway Networks
◦ In biology, a signal or biopotential is an electric quantity (voltage, current, or field strength) caused by chemical reactions of charged ions.
◦ Signaling can also refer to any process by which a cell converts one kind of signal or stimulus into another.
◦ Another use of the term describes the transfer of information between and within cells, as in signal transduction.

Metabolic Pathway Networks
◦ A series of chemical reactions occurring within a cell, catalyzed by enzymes, resulting in either the formation of a metabolic product to be used or stored by the cell, or the initiation of another metabolic pathway.

A Signaling Pathway Example

A Metabolic Pathway Example

Regulatory Network

Expression Network

A network representation of genomic data, inferred from high-throughput measurements such as microarrays.

Gene co-expression network: each node is a gene, and each edge is a co-expression relationship.
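A minimal sketch of how a co-expression network could be built, assuming Pearson correlation with a fixed cutoff as the co-expression criterion (the slide does not name a measure). The gene names, expression values, and the 0.9 threshold are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coexpression_edges(expr, threshold=0.9):
    """Return gene pairs whose absolute correlation meets the threshold."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.append((g, h))
    return edges

# Toy expression profiles (hypothetical values, four conditions per gene).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.1, 8.0],
    "geneC": [4.0, 1.0, 3.5, 1.2],
}
edges = coexpression_edges(expr)
print(edges)
```

Only geneA and geneB rise and fall together here, so they are the only pair joined by an edge.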

Example of a PPI Network

Yeast PPI network
◦ Nodes: proteins
◦ Edges: interactions

The color of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).

How do we know that proteins interact? (PPI identification methods)

Data
◦ Yeast two-hybrid assay
◦ Mass spectrometry
◦ Correlated mRNA expression
◦ Genetic interactions

Analysis
◦ Phylogenetic analysis
◦ Gene neighbors
◦ Co-evolution
◦ Gene clusters

Also see: Comparative assessment of large-scale data sets of protein-protein interactions (von Mering)

Protein Interaction Databases

Species-specific
◦ FlyNets - Gene networks in the fruit fly
◦ MIPS - Yeast Genome Database
◦ RegulonDB - A DataBase On Transcriptional Regulation in E. coli
◦ SoyBase
◦ PIMdb - Drosophila Protein Interaction Map database

Function-specific
◦ Biocatalysis/Biodegradation Database
◦ BRITE - Biomolecular Relations in Information Transmission and Expression
◦ COPE - Cytokines Online Pathfinder Encyclopaedia
◦ Dynamic Signaling Maps
◦ EMP - The Enzymology Database
◦ FIMM - A Database of Functional Molecular Immunology
◦ CSNDB - Cell Signaling Networks Database

Protein Interaction Databases

Interaction type-specific
◦ DIP - Database of Interacting Proteins
◦ DPInteract - DNA-protein interactions
◦ Inter-Chain Beta-Sheets (ICBS) - A database of protein-protein interactions mediated by interchain beta-sheet formation
◦ Interact - A Protein-Protein Interaction database
◦ GeneNet (Gene networks)

General
◦ BIND - Biomolecular Interaction Network Database
◦ BindingDB - The Binding Database
◦ MINT - a database of Molecular INTeractions
◦ PATIKA - Pathway Analysis Tool for Integration and Knowledge Acquisition
◦ PFBP - Protein Function and Biochemical Pathways Project
◦ PIM (Protein Interaction Map)

Pathway Databases

KEGG (Kyoto Encyclopedia of Genes and Genomes), http://www.genome.ad.jp/kegg/
◦ Institute for Chemical Research, Kyoto University
PathDB, http://www.ncgr.org/pathdb/index.html
◦ National Center for Genomic Resources
SPAD: Signaling PAthway Database
◦ Graduate School of Genetic Resources Technology, Kyushu University
Cytokine Signaling Pathway DB
◦ Dept. of Biochemistry, Kumamoto Univ.
EcoCyc and MetaCyc
◦ Stanford Research Institute
BIND (Biomolecular Interaction Network Database)
◦ UBC, Univ. of Toronto

KEGG

Pathway Database: Computerize current knowledge of molecular and cellular biology in terms of the pathway of interacting molecules or genes.

Genes Database: Maintain gene catalogs of all sequenced organisms and link each gene product to a pathway component

Ligand Database: Organize a database of all chemical compounds in living cells and link each compound to a pathway component

Pathway Tools: Develop new bioinformatics technologies for functional genomics, such as pathway comparison, pathway reconstruction, and pathway design

Network Properties

Properties of Networks

Small-world effect
Transitivity / clustering
Scale-free effect
Maximum degree
Network resilience and robustness
Mixing patterns and assortativity
Community structure
Evolutionary origin
Betweenness centrality of vertices

Biological Network Properties

Power-law degree distribution: "rich get richer"
Small world: a small average path length
◦ Mean shortest node-to-node path
Robustness: resilient to random failures, but vulnerable to targeted attacks
Hierarchical modularity: a large clustering coefficient
◦ How many of a node's neighbors are connected to each other

Graph Terminology

Node
Edge
Directed/Undirected
Degree
Shortest Path/Geodesic distance
Neighborhood
Subgraph
Complete Graph
Clique
Degree Distribution
Hubs

Graphs

Graph G = (V, E) is a set of vertices V and edges E

A subgraph G' of G is induced by some V' ⊆ V and E' ⊆ E

Graph properties:
◦ Connectivity (node degree, paths)
◦ Cyclic vs. acyclic
◦ Directed vs. undirected
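As a rough illustration of these definitions, the sketch below stores an undirected graph as a dictionary of neighbor sets and extracts the subgraph induced by a vertex subset. The function names and the toy edge list are hypothetical, and the adjacency-set representation is just one convenient choice.

```python
def make_graph(edges):
    """Undirected graph as a dict mapping each vertex to its neighbor set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_subgraph(adj, keep):
    """Keep only the vertices in `keep` and the edges among them."""
    keep = set(keep)
    return {v: adj[v] & keep for v in adj if v in keep}

def degree(adj, v):
    """Number of edges incident to vertex v."""
    return len(adj[v])

# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
g = make_graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
sub = induced_subgraph(g, {"a", "b", "c"})
print(degree(g, "c"), sub)
```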

Network Measures

Degree k_i
Degree distribution P(k)
Mean path length
Network diameter
Clustering coefficient

Network Analysis

Paths: metabolic, signaling pathways
Cliques: protein complexes
Hubs: regulatory modules
Subgraphs: maximally weighted

Sparse vs Dense Graphs

G(V, E), where |V| = n and |E| = m are the numbers of vertices and edges

Graph is sparse if m ~ n

Graph is dense if m ~ n^2

Complete graph when m = n(n-1)/2 (every pair of vertices is joined by an edge)
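One way to make the sparse/dense distinction concrete is the edge density, the fraction of possible undirected edges that are present. This is a small sketch assuming a simple undirected graph without self-loops; the function name is hypothetical.

```python
def density(n, m):
    """Fraction of possible undirected edges present: m / (n(n-1)/2)."""
    if n < 2:
        return 0.0
    return m / (n * (n - 1) / 2)

sparse_example = density(69, 71)   # m ~ n, so the density is close to 0
complete_example = density(5, 10)  # 5 vertices, all 10 possible edges
print(sparse_example, complete_example)
```

A density near 0 corresponds to the sparse regime, and a density of 1 to a complete graph.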

Connected Components

G(V,E), |V| = 69, |E| = 71

Connected Components

G(V,E), |V| = 69, |E| = 71

6 connected components
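Connected components can be found by repeatedly running a breadth-first search from any vertex not yet visited. The sketch below assumes the graph is stored as a dictionary of neighbor sets; the small example graph is hypothetical and is not the 69-vertex network from the slide.

```python
from collections import deque

def connected_components(adj):
    """Return a list of vertex sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components

# Toy graph: a 3-vertex chain, a single edge, and an isolated vertex.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
comps = connected_components(adj)
print(len(comps), comps)
```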

Paths

A path is a sequence {x_1, x_2, …, x_n} such that (x_1, x_2), (x_2, x_3), …, (x_{n-1}, x_n) are edges of the graph.

A closed path (x_n = x_1) on a graph is called a graph cycle or circuit.

Shortest-Path between nodes

Longest Shortest-Path
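In an unweighted graph, shortest paths from one node can be computed with breadth-first search, and the longest of all shortest paths is the network diameter. The following is a sketch under the assumption of an unweighted, undirected graph stored as neighbor sets; the function names and the 4-node chain are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (geodesic) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_stats(adj):
    """Return (mean shortest path length, diameter) over reachable pairs."""
    total, pairs, longest = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                longest = max(longest, d)
    mean = total / pairs if pairs else 0.0
    return mean, longest

# Toy chain 1 - 2 - 3 - 4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mean_len, diameter = path_stats(chain)
print(mean_len, diameter)
```

Running one search per node is fine for small networks; larger ones usually call for sampling or a graph library.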

Network Measures: Degree

Degree Distribution

P(k) is the probability of each degree k, i.e. the fraction of nodes having that degree.

For random networks, P(k) is bell-shaped and peaked around the mean degree (a Poisson distribution in the Erdős–Rényi model).

For real networks the distribution is often a power law:

P(k) ~ k^(-γ)

Such networks are said to be scale-free.
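The empirical degree distribution can be tabulated directly from the definition, as the fraction of nodes with each degree. This is a minimal sketch on a hypothetical 5-node star graph, assuming the neighbor-set representation.

```python
from collections import Counter

def degree_distribution(adj):
    """Map each degree k to P(k), the fraction of nodes with that degree."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Toy star graph: one hub (node 0) connected to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
pk = degree_distribution(star)
print(pk)
```

For a real network, plotting P(k) against k on log-log axes is the usual first check for power-law behavior.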

Clustering Coefficient

C_I = 2 n_I / (k (k - 1))

k: number of neighbors of node I
n_I: number of edges between node I's neighbors

The density of the network surrounding node I, characterized as the number of triangles through I. Related to network modularity.

The center node has 8 (grey) neighbors

There are 4 edges between the neighbors

C = 2*4 / (8*(8-1)) = 8/56 = 1/7
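The worked example can be reproduced in a few lines. This sketch assumes an undirected graph stored as neighbor sets; which four neighbor pairs are connected is a hypothetical choice, since the formula depends only on the count.

```python
def clustering_coefficient(adj, node):
    """Local clustering coefficient C = 2 n / (k (k - 1)) of one node."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each neighbor-neighbor edge is counted from both endpoints, so halve.
    n_edges = sum(len(adj[u] & nbrs) for u in nbrs) // 2
    return 2 * n_edges / (k * (k - 1))

# Center node 0 with 8 neighbors (1..8) and 4 edges among those neighbors.
adj = {i: {0} for i in range(1, 9)}
adj[0] = set(range(1, 9))
for u, v in [(1, 2), (3, 4), (5, 6), (7, 8)]:
    adj[u].add(v)
    adj[v].add(u)

c = clustering_coefficient(adj, 0)
print(c)
```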

Interesting Properties of Network Types

Small-world Network

Every node can be reached from every other by a small number of hops or steps

High clustering coefficient and low mean shortest path length
◦ Random graphs don't necessarily have high clustering coefficients

Social networks, the Internet, and biological networks all exhibit small-world network characteristics

Small world effect

Most pairs of vertices in the network seem to be connected by a short path

l is the mean geodesic distance; d_ij is the geodesic distance between vertex i and vertex j

l ~ log(N)
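One common way to write the mean geodesic distance in terms of these symbols is shown below. This is an assumed form, averaging over all unordered pairs of distinct vertices in a connected network of N vertices; conventions differ on whether self-pairs are included.

```latex
\ell = \frac{1}{\tfrac{1}{2} N (N - 1)} \sum_{i > j} d_{ij}
```

Under this convention, the small-world effect is the statement that this average grows only logarithmically with N.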

Scale-Free Networks are Robust

Complex systems (cell, Internet, social networks) are resilient to component failure

Network topology plays an important role in this robustness
◦ Even if ~80% of nodes fail at random, the remaining ~20% still maintain network connectivity

Attack vulnerability: connectivity breaks down if hubs are selectively targeted

In yeast, only ~20% of proteins are lethal when deleted, and these are 5 times more likely to have degree k>15 than k<5.
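The contrast between random failure and targeted attack can be illustrated on a toy hub-and-spoke network by measuring the largest connected component after removing nodes. This is only a sketch: the 12-node network, the function name, and the choice of removed nodes are hypothetical, and real robustness studies average over many random removals.

```python
from collections import deque

def largest_component(adj, removed=()):
    """Size of the largest connected component after deleting `removed`."""
    removed = set(removed)
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 1
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    size += 1
                    queue.append(w)
        best = max(best, size)
    return best

# Two connected hubs (0 and 1), each with five low-degree leaves.
adj = {0: {1}, 1: {0}}
for leaf in range(2, 7):
    adj[leaf] = {0}
    adj[0].add(leaf)
for leaf in range(7, 12):
    adj[leaf] = {1}
    adj[1].add(leaf)

after_leaf_failure = largest_component(adj, removed=[2, 7])  # two leaves fail
after_hub_attack = largest_component(adj, removed=[0, 1])    # both hubs removed
print(after_leaf_failure, after_hub_attack)
```

Losing two leaves leaves the other ten nodes connected, while removing the two hubs isolates every remaining node.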

Hierarchical Networks

Detecting Hierarchical Organization

Other Interesting Features

Cellular networks are disassortative: hubs tend not to interact directly with other hubs.

Hubs tend to be “older” proteins (so far claimed for protein-protein interaction networks only)

Hubs also seem to have more evolutionary pressure—their protein sequences are more conserved than average between species (shown in yeast vs. worm)

Experimentally determined protein complexes tend to contain solely essential or non-essential proteins—further evidence for modularity.

Network Models

Random network
Scale-free network
Hierarchical network

Random Network I

The Erdős–Rényi (ER) model of a random network starts with N nodes and connects each pair of nodes with probability p, which creates a graph with approximately pN(N-1)/2 randomly placed links

The node degrees follow a Poisson distribution
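The ER construction translates almost directly into code. The sketch below is a minimal version using only the standard library; the function names, the fixed seed, and the parameters N = 200, p = 0.05 are hypothetical choices for illustration.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """Connect each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def edge_count(adj):
    """Number of undirected edges (each is stored at both endpoints)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

g = erdos_renyi(200, 0.05)
expected = 0.05 * 200 * 199 / 2  # pN(N-1)/2 = 995 links on average
print(edge_count(g), expected)
```

The observed number of links should fall close to the expected value, and a histogram of node degrees should be peaked around p(N-1).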

Random Network II

Mean shortest path l ~ log N, which indicates that it is characterized by the small-world property. Random graphs have served as idealized models of certain gene networks, ecosystems and the spread of infectious diseases and computer viruses.

Scale Free Networks I

Power-law degree distribution: P(k) ~ k^(-γ), where γ is the degree exponent, usually between 2 and 3

The network's properties are determined by hubs

The network is often generated by a growth process called the Barabási–Albert model

Scale Free Networks II

Degree exponents in the range 2<γ<3 are observed in most biological and non-biological scale-free networks, such as the Internet backbone, the World Wide Web, metabolic reaction networks and telephone call graphs.

In the Barabási–Albert model (γ = 3), the mean shortest path length is proportional to log(n)/log(log(n))

How are scale-free networks formed?

Preferential attachment during growth: the probability that a new vertex will be connected to vertex i depends on the connectivity of that vertex:

Π(k_i) = k_i / Σ_j k_j
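Growth with preferential attachment can be sketched as follows: each new node links to m existing nodes, picking node i with probability proportional to its degree k_i. The function name, the seed graph (a small complete graph on m + 1 nodes), and the parameters are assumptions made for this illustration, not details given on the slide.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(m + 1)}
    # Seed graph: a complete graph on m + 1 nodes (an assumption).
    for u in adj:
        adj[u] = set(adj) - {u}
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from it picks node i with probability k_i / sum_j k_j.
    stubs = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend([t, new])
    return adj

g = barabasi_albert(100, 2)
degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print(degrees[:5])  # the highest-degree nodes act as hubs
```

Early nodes tend to accumulate the most links, which is the "rich get richer" effect behind the power-law degree distribution.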

Hierarchical Networks I

To account for the coexistence of modularity, local clustering and scale-free topology in many real systems, it has to be assumed that clusters combine in an iterative manner, generating a hierarchical network

The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent γ = 1 + ln4/ln3 = 2.26

Hierarchical Networks II

It has a large, system-size-independent average clustering coefficient <C> ~ 0.6. The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows C(k) ~ k^(-1), a straight line of slope -1 on a log-log plot

A hierarchical architecture implies that sparsely connected nodes are part of highly clustered areas, with communication between the different highly clustered neighborhoods being maintained by a few hubs

Some examples of hierarchical scale-free networks.

Problems of Network Biology

Network inference
◦ Microarrays, protein chips, other high-throughput assay methods

Function prediction
◦ The function of 40-50% of the new proteins is unknown
◦ Understanding biological function is important for the study of fundamental biological processes, drug design, and genetic engineering

Functional module detection
◦ Cluster analysis

Topological analysis
◦ Descriptive and structural
◦ Locality analysis
◦ Essential component analysis

Dynamics analysis
◦ Signal flow analysis
◦ Metabolic flux analysis
◦ Steady state, response, fluctuation analysis

Evolution analysis

Biological networks are very rich networks with very limited, noisy, and incomplete information. Discovering underlying principles is very challenging.

Summary

The problem: identify differentially expressed genes from microarray data

How to identify: t-test and rank product

How to evaluate the significance of identified genes

References & Acknowledgements

Albert Barabasi et al.
◦ Network Biology: understanding the cell's functional organization

Jing-Dong et al.
◦ Evidence for dynamically organized modularity in the yeast protein–protein interaction network

Woochang Hwang