Introduction to Network Analysis in Systems Biology
Avi Ma’ayan, Ph.D.
Department of Pharmacology and Systems Therapeutics
Systems Biology Center New York (SBCNY)
Mount Sinai School of Medicine New York, NY
Lecture 1 Representation of biological systems as networks
Two Fundamental Ways to Abstract Biochemical Reactions
Eisenberg et al. Nature 405:823 (2000)
[Figure, panels a to e: a small network with nodes labeled A1 to E3; in panels d and e some nodes are split into sub-nodes (A1.1, A1.2, C2.1, C2.2, D3.1, D3.2)]
Ma'ayan et al. Annu Rev Biophys Biomol Struct. 34:319-349 (2005)
Different Levels of System Representation
A- gene ontology
B- protein-protein interactions (undirected graphs)
C- signaling network diagrams (mixed graphs, directed/undirected)
D- ODE modeling of signaling pathways (directed and weighted)
E- PDE modeling of signaling pathways considering space (directed, weighted, and nodes can move or be in different compartments)
Graph Theory - Basic Concepts G = {V, E, A}
G – graph
V – vertices/nodes
E – edges/links
A- arcs/directed edges/arrows
Planar Graphs: graphs that can be drawn with no edge crossings
Bipartite Graphs: two sets of nodes; links connect only members of different sets (never two members of the same set)
http://en.wikipedia.org
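A minimal Python sketch of these definitions, assuming a plain adjacency-set representation (the function names are illustrative, not from any library): undirected edges are stored in both directions and arcs in one direction, and bipartiteness is checked by two-coloring the graph with breadth-first search, failing as soon as a link joins two nodes of the same color.

```python
from collections import deque


def build_graph(vertices, edges, arcs):
    """Adjacency sets for G = {V, E, A}: edges are stored both ways, arcs one way."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in arcs:
        adj[u].add(v)
    return adj


def is_bipartite(vertices, edges):
    """Two-color an undirected graph with BFS; return (True, coloring) or (False, {})."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    color = {}
    for start in vertices:  # restart so that every connected component is colored
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False, {}  # a link inside one set: not bipartite
    return True, color


# Enzyme-substrate links split into two sets; a triangle cannot be two-colored.
ok, sets = is_bipartite(["hexokinase", "glucose", "G6P"],
                        [("hexokinase", "glucose"), ("hexokinase", "G6P")])
print(ok, sets)
print(is_bipartite(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])[0])  # False
```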
Metabolic Networks
• Two types of nodes: enzymes and substrates
• Reactions can be directional or bidirectional
• Bipartite graph: reactions are not connected to each other, and substrates are not connected to each other
Bourqui et al. BMC Systems Biology 1:29 (2007)
Glycolysis
Berg et al. Biochemistry. New York: W. H. Freeman and Co.; c2002
Cell Signaling Pathways
• Nodes are proteins, metabolites, lipids, second messengers, or peptides
• Interactions designate information flow, can be activation or inhibition, and are direct and physical
Gi/o Pathway
Ma'ayan A, et al. Sci Signal. 2:cm1 (2009)
Cell Signaling Networks
Signaling pathways are not isolated and can be merged into large networks
Ma’ayan et al. Science 310, 1078 (2005)
Indirect Signaling Interactions from Literature
Li et al. PLoS Biol. 4:e312 (2006)
Pseudo-nodes are used as placeholders to fill in unknown links and components
Kinase-Substrate Network
Protein kinase-substrate networks are directed bipartite graphs that connect kinases to their substrates through protein phosphorylation
Tan et al. Sci Signal. 2009 Jul 28;2(81):ra39
Example of Gene Regulation Networks
MacArthur et al., PLoS ONE 3: e3086 (2008)
Stem cell differentiation regulation
• Nodes are genes and transcription factors
• Interactions can be directional or bidirectional
• Interactions can be activation or inhibition
Another Example of a Gene Regulation Network
Drosophila Segment Polarity Expression Pattern
• Nodes are genes, transcription factors, or signaling components
• Interactions are directional and can be activation or inhibition
Albert R, Othmer HG. J Theor Biol. 2003 223(1):1-18.
Network Construction from Legacy Literature
• Manual
• Semi-automated (e.g. preBIND)
• Natural Language Processing (NLP) (e.g. PathwayStudio)
Donaldson I, et al. BMC Bioinformatics. 4:11 (2003)
PPI Networks from Y2H Screens
• Yeast
Does the small overlap between the two studies mean that high-throughput Y2H screens are not identifying real interactions?
PPI Networks from Y2H Screens
• Fly: Giot et al. Science 302:1727 (2003)
• Worm: Li et al. Science 303:540 (2004)
PPI Networks from Y2H Screens
• Human
Blue- literature
Red- Y2H screen (~78% verified by Co-IP)
• Defined different levels of confidence
• Identified disease genes
• Assessed overlap with literature-based interactions
• Used GO annotation
Epistasis Networks: Inferring Networks by Double Deletion Mutants
291 genetic interactions among 204 yeast genes
Tong AH, et al. Science 294:2364 (2001)
Epistasis Interactions in Yeast Metabolism
Segre et al., Nature Genetics 37:77 (2004)
Two types of links: buffering and aggravating
Links can be directional or bidirectional
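The slide does not give a formula for the two link types. A common choice, assumed here and not necessarily the exact measure used by Segre et al., is the multiplicative model: with single-mutant fitness W_x and W_y and double-mutant fitness W_xy (all relative to wild type), epsilon = W_xy - W_x * W_y. A double mutant fitter than expected (epsilon > 0) is read as buffering, and one sicker than expected (epsilon < 0) as aggravating. The gene names, fitness values, and the tolerance `tol` are hypothetical.

```python
def classify_epistasis(w_x, w_y, w_xy, tol=0.05):
    """Multiplicative-model epistasis score and its sign-based label."""
    epsilon = w_xy - w_x * w_y
    if epsilon > tol:
        return epsilon, "buffering"
    if epsilon < -tol:
        return epsilon, "aggravating"
    return epsilon, "none"


def epistasis_network(single, double, tol=0.05):
    """Keep a link for every gene pair whose double mutant deviates from expectation."""
    links = {}
    for (x, y), w_xy in double.items():
        epsilon, label = classify_epistasis(single[x], single[y], w_xy, tol)
        if label != "none":
            links[(x, y)] = (label, round(epsilon, 3))
    return links


single = {"geneA": 0.8, "geneB": 0.5, "geneC": 0.9}           # hypothetical
double = {("geneA", "geneB"): 0.1, ("geneA", "geneC"): 0.72,  # hypothetical
          ("geneB", "geneC"): 0.6}
print(epistasis_network(single, double))
```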
Inferring Networks from Time Series Microarrays
Zou M, Conzen SD. Bioinformatics. 2005 21(1):71-9.
Perturbations and Bayesian Networks
Networks can be inferred using targeted perturbations
Sachs et al. Science. 2005 308:523-9
Disease Gene Networks
Each node corresponds to a distinct disorder, colored based on the disorder class. The size of
each node is proportional to the number of genes in the corresponding disorder, and the link
thickness is proportional to the number of genes shared by the disorders connected by the link.
Goh et al. Proc Natl Acad Sci USA. (2007) 104:8685-90
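The construction in the caption is a projection of a bipartite disorder-gene graph onto the disorders. A minimal sketch with made-up disorder and gene names: node size is the number of genes per disorder, and link weight is the number of genes two disorders share.

```python
from itertools import combinations


def disease_network(disorder_genes):
    """Project a disorder-gene bipartite graph onto the disorders.

    Node size = number of genes linked to a disorder; link weight = number of
    genes shared by two disorders (pairs sharing no gene get no link).
    """
    node_size = {d: len(genes) for d, genes in disorder_genes.items()}
    links = {}
    for d1, d2 in combinations(sorted(disorder_genes), 2):
        shared = disorder_genes[d1] & disorder_genes[d2]
        if shared:
            links[(d1, d2)] = len(shared)
    return node_size, links


# Hypothetical disorder-gene associations.
associations = {
    "disorder_1": {"G1", "G2", "G3"},
    "disorder_2": {"G2", "G3"},
    "disorder_3": {"G4"},
}
sizes, weights = disease_network(associations)
print(sizes)    # {'disorder_1': 3, 'disorder_2': 2, 'disorder_3': 1}
print(weights)  # {('disorder_1', 'disorder_2'): 2}
```

The same projection applied to a drug-target table would link drugs that share protein targets.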
Drug-Target Networks
Ma’ayan et al. Mt Sinai J Med (2007) 74:27
Yildirim et al. Nat Biotechnol. (2007) 25:1110
Drugs can be connected to their known protein targets
Bipartite Networks for Data Integration
Tanay et al. PNAS (2004) 101:2981
Gene IDs can be used as anchors for integrating different omics datasets
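A minimal sketch of gene IDs acting as anchors: datasets form one side of a bipartite graph and gene IDs the other, and genes that appear in more than one dataset are the anchors that connect the datasets. The dataset names and their gene memberships are hypothetical.

```python
def integrate(datasets):
    """Map each gene ID to the datasets containing it; keep multi-dataset genes as anchors."""
    gene_to_datasets = {}
    for name, genes in datasets.items():
        for gene in genes:
            gene_to_datasets.setdefault(gene, set()).add(name)
    anchors = {g: ds for g, ds in gene_to_datasets.items() if len(ds) > 1}
    return gene_to_datasets, anchors


# Hypothetical memberships, keyed by gene symbol.
datasets = {
    "expression": {"TP53", "MYC", "CDK2"},
    "ppi": {"MYC", "EGFR"},
    "chip": {"MYC", "TP53"},
}
_, anchors = integrate(datasets)
print(sorted(anchors))  # ['MYC', 'TP53']
```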
Pajek - Free Windows Software to Visualize Networks
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Cytoscape - Leading Academic Network Analysis and Visualization Software
Shannon et al. Genome Res. 2003 13(11):2498-504
Summary
• Different types of biological intracellular molecular networks can be represented by different types of graphs
• Networks can be created from collecting interactions published in many papers, or networks can be reconstructed directly from data
• Protein interaction networks and cell signaling networks can be connected to drugs and diseases
• Network representation can be used to integrate different datasets using genes as anchors
Lecture 2 Milestones and key concepts in network analysis
Königsberg Bridge Problem
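Euler's resolution can be checked directly: a connected multigraph has a walk that crosses every link exactly once if and only if it has zero or two odd-degree nodes (zero for a walk that returns to its start). In the sketch below the four land masses are labeled A to D, an assumed labeling with A the island touched by five of the seven bridges.

```python
from collections import Counter


def degrees(links):
    """Degree of every node in a multigraph given as a list of links."""
    deg = Counter()
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return deg


def euler_walks(links):
    """For a connected multigraph: (open walk exists, closed walk exists)."""
    odd = sum(1 for d in degrees(links).values() if d % 2 == 1)
    return odd in (0, 2), odd == 0


# The seven bridges; parallel bridges appear as repeated links.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
print(dict(degrees(bridges)))  # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(euler_walks(bridges))    # (False, False): all four nodes have odd degree
```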
What are the mathematical consequences of throwing a random number of buttons on the floor and randomly connecting them with a random number of links?
Starting in 1959, Paul Erdős and Alfréd Rényi studied the properties of random graphs.
P. Erdős, A. Rényi. Publ. Math. (Debrecen) 6, 290-297 (1959)
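A sketch of a random graph in the spirit of the buttons picture. The 1959 paper fixes the number of links; the closely related G(n, p) variant used here instead includes each of the n(n-1)/2 possible links independently with probability p, which gives an expected mean degree of p(n-1). The sizes and the seed are arbitrary.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """G(n, p): each possible link is present independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


g = random_graph(1000, 0.01, seed=1)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / len(g)
print(round(mean_degree, 2))  # close to p * (n - 1) = 9.99
```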
“Real” Networks are “Small World”
Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks.
Nature. 1998 Jun 4;393(6684):440-2.
Clustering Coefficient
Characteristic Path Length
Average shortest path length between all possible pairs of nodes
Ravasz et al. Science 297, 1551 (2002)
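The clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves linked, C_i = 2e_i / (k_i(k_i - 1)) for a node with k_i neighbors and e_i links among them; the network value is the average over nodes. The characteristic path length is the average shortest-path length over pairs of nodes. A minimal sketch for unweighted, undirected graphs; setting C_i = 0 for nodes with fewer than two neighbors is one convention among several.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, v):
    """C_v = 2 e_v / (k_v (k_v - 1)); taken as 0 when v has fewer than two neighbors."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    e = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2.0 * e / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def characteristic_path_length(adj):
    """Mean BFS distance over ordered pairs of distinct nodes that can reach each
    other (assumes the network has at least one link)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy network: a triangle a-b-c with an extra partner d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(average_clustering(toy), characteristic_path_length(toy))
```

For the toy network the clustering coefficient is (1/3 + 1 + 1 + 0) / 4 = 7/12, and the path length is 8 / 6 = 4/3 (five pairs at distance 1 plus b-d and c-d at distance 2 would give 9; here a-b, a-c, a-d, b-c are at distance 1 and b-d, c-d at distance 2, totaling 8 over 6 pairs).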
Creating Small-World Networks
Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature. 1998 Jun 4;393(6684):440-2.
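A minimal sketch of the Watts-Strogatz construction: start from a ring lattice in which each node links to its k nearest neighbors (k even), then rewire each lattice link to a random new partner with probability p, avoiding self-loops and duplicate links. Network size, k, p, and the seed are arbitrary choices.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice (k nearest neighbors per node, k even) with each lattice
    link rewired to a random new partner with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                # candidates exclude v itself and its current partners
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


ring = watts_strogatz(20, 4, 0.0)  # p = 0: the regular lattice
small_world = watts_strogatz(20, 4, 0.2, seed=7)
print(sum(len(nbrs) for nbrs in small_world.values()) // 2)  # 40: rewiring keeps the link count
```

Watts and Strogatz reported that even a small p sharply shortens the characteristic path length while clustering stays close to the lattice value.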
“Real” Networks are “Scale Free”
Barabasi, Albert, and colleagues found that many real networks, including the Internet and the WWW, are scale-free: the connectivity (degree) distribution of their nodes follows a power law, P(k) ∝ k^(-γ).
Barabasi and Albert. Science 286, 509 (1999)
Barabasi’s group analyzed databases of metabolic networks in lower organisms and the protein-protein interaction map of the yeast proteome inferred from high-throughput yeast two-hybrid screens. Both were shown to have scale-free connectivity distributions.
Jeong et al. Nature 407, 651 (2000); Jeong et al. Nature 411, 41 (2001)
Erdős-Rényi random networks vs. Barabasi-Albert scale-free networks
Barabasi, Physics World, July 2001
Creating Scale-Free Networks
Barabasi and Albert. Science 286, 509 (1999)
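A minimal sketch of growth with preferential attachment: start from a small complete graph on m + 1 nodes, then add nodes one at a time, each linking to m distinct existing nodes chosen with probability proportional to their current degree. The seed graph and the parameter values are assumptions for illustration.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Growth plus preferential attachment: each new node links to m distinct
    existing nodes, chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # assumed starting point: a complete graph on m + 1 nodes
    adj = {v: {u for u in range(m + 1) if u != v} for v in range(m + 1)}
    # a node appears here once per link it has, so a uniform pick from this
    # list is a degree-proportional pick
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


g = barabasi_albert(2000, 2, seed=3)
degree_list = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print(degree_list[:5], degree_list[-1])  # a few large hubs; the smallest degree is m = 2
```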
The Importance of Hubs
Albert R, Jeong H, Barabasi A-L: Error and attack tolerance of complex
networks. Nature 2000, 406(6794):378-382.
H. Jeong, S. P. Mason, A.-L. Barabási and Z. N. Oltvai. Lethality and centrality
in protein networks. Nature 411, 41-42 (2001)
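Albert, Jeong and Barabasi compared random node failures with targeted removal of the most connected nodes. A minimal sketch of that comparison, using the size of the largest connected component as the measure of network integrity; the star-shaped toy network is hypothetical and chosen only to make the effect obvious.

```python
import random
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting the `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def attack(adj, k):
    """Targeted attack: remove the k highest-degree nodes."""
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return largest_component(adj, set(hubs))


def failure(adj, k, seed=None):
    """Random errors: remove k nodes chosen uniformly at random."""
    rng = random.Random(seed)
    return largest_component(adj, set(rng.sample(sorted(adj), k)))


# Hypothetical hub "H" with ten partners.
star = {"H": {f"P{i}" for i in range(10)}}
for i in range(10):
    star[f"P{i}"] = {"H"}
print(largest_component(star), attack(star, 1), failure(star, 1, seed=0))
```

Losing the hub shatters the star into isolated nodes, while losing a randomly chosen node usually removes a single partner and leaves the rest connected.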
Creating Scale-Free Networks using
Duplication-Divergence Growth
Vázquez et al. Complexus 1:1 (2003)
The network grows by copying a node with its links; each copied link is then deleted with probability p, and a link is formed between the copied node and the new node with probability q.
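A minimal sketch that follows the three steps on the slide literally: copy a randomly chosen node with its links, delete each copied link with probability p, and link the copy to its original with probability q. The two-node seed network and the parameter values are assumptions, and the published model may differ in details.

```python
import random


def duplication_divergence(n, p, q, seed=None):
    """Grow a network to n nodes by duplication and divergence."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed network: a single link
    for new in range(2, n):
        original = rng.randrange(new)  # node to copy
        adj[new] = set()
        for partner in list(adj[original]):
            if rng.random() >= p:  # each copied link survives with probability 1 - p
                adj[new].add(partner)
                adj[partner].add(new)
        if rng.random() < q:  # link the copy to its original
            adj[new].add(original)
            adj[original].add(new)
    return adj


g = duplication_divergence(1000, p=0.6, q=0.1, seed=5)
n_links = sum(len(nbrs) for nbrs in g.values()) // 2
n_isolated = sum(1 for nbrs in g.values() if not nbrs)
print(n_links, n_isolated)
```

Isolated nodes appear whenever all copied links are deleted and no link to the original forms; one option is to analyze only the largest connected component.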
Creating Geometric Random Networks
Throwing a bunch of buttons in N dimensions and connecting buttons if they are close in Euclidean space (geometric distance between nodes)
Przulj et al. Bioinformatics. 2004 20:3508
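A minimal sketch, assuming the buttons are scattered uniformly in the unit hypercube [0, 1]^dim and linked when their Euclidean distance is at most a chosen radius; both the sampling region and the radius are illustrative choices.

```python
import math
import random


def geometric_random_graph(n, dim, radius, seed=None):
    """Scatter n points uniformly in [0, 1]^dim; link pairs within `radius` (Euclidean)."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(points[u], points[v]) <= radius:
                adj[u].add(v)
                adj[v].add(u)
    return points, adj


points, adj = geometric_random_graph(500, dim=3, radius=0.15, seed=2)
print(sum(len(nbrs) for nbrs in adj.values()) / len(adj))  # mean degree
```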
Network Motifs are Recurring Patterns of Connectivity
Motifs are circuits that are statistically more prevalent in real networks than in randomized networks
Milo et al. Science, 298, 824 (2002)
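A minimal sketch of the comparison for one motif, the feed-forward loop (x → y, y → z, x → z): count it in the real network, then in networks randomized by degree-preserving arc swaps, and report a z-score. It counts every occurrence of the three arcs rather than induced subgraphs, and the number of swaps and random networks are arbitrary, so it simplifies the cited method. The toy network is hypothetical.

```python
import random
from statistics import mean, pstdev


def count_ffl(arcs):
    """Count feed-forward loops x->y, y->z, x->z over distinct nodes x, y, z."""
    out = {}
    for u, v in arcs:
        out.setdefault(u, set()).add(v)
    count = 0
    for x, targets in out.items():
        for y in targets:
            for z in out.get(y, ()):
                if z != x and z in targets:
                    count += 1
    return count


def shuffle_arcs(arcs, n_swaps, rng):
    """Degree-preserving randomization: a->b, c->d becomes a->d, c->b, skipping
    swaps that would create self-loops or duplicate arcs."""
    arcs = list(arcs)
    present = set(arcs)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(arcs)), 2)
        (a, b), (c, d) = arcs[i], arcs[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        arcs[i], arcs[j] = (a, d), (c, b)
    return arcs


def ffl_zscore(arcs, n_random=200, seed=0):
    rng = random.Random(seed)
    real = count_ffl(arcs)
    background = [count_ffl(shuffle_arcs(arcs, 10 * len(arcs), rng))
                  for _ in range(n_random)]
    mu, sd = mean(background), pstdev(background)
    z = (real - mu) / sd if sd > 0 else float("nan")
    return real, mu, z


network = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5), (5, 0), (1, 3)]
print(ffl_zscore(network))
```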
Evolutionary conservation of motif constituents in the yeast protein interaction network
S Wuchty et al. Nature Genetics 35, 176-179 (2003)
Graphlets – motifs in undirected networks
Considering Protein Structure of Hubs
Hub proteins are either multi-site or single-site
Kim et al. Science 314, 1938 (2006)
Bow-Tie Structure of Signaling Networks
Oda and Kitano. Molecular Systems Biology 2:2006.0015 (2006)
Hierarchical Organization of Pathways from Ligands to Effectors
A topology common for systems that need to make discrete decisions based on a continuous, complex state of the environment
Ma'ayan et al. Phys Rev E Stat Nonlin Soft Matter Phys. 2006 73:061912
Power-law distribution of branched pathways
General Topological Properties of Biomolecular Networks
Ma'ayan A. J Biol Chem. 2009 284(9):5451
A- power-law connectivity distribution
B- party hubs and date hubs
C- multi-site and single-site hubs
D- power-law distribution of branched pathways
E- bow-tie structure of signaling pathways
F- bifans, the most common motifs
G- negative feedback loops at the membrane
H- monotone system topology
I- nesting of positive feedback loops
Ma’ayan et al. PNAS 105:19235 (2008)
MacArthur, Sanchez-Garcia and Ma’ayan, Phys. Rev. Lett. 104, 168701 (2010)
Summary
• Real networks are “small world” and “scale free”
• Simple algorithms can recreate the structure of real networks
• Shuffled networks are created for statistical control
• Network motifs and graphlets define the topology at the microscopic level
• Real biological regulatory networks have “date” and “party” hubs; hubs are either multi-site or single-site; pathway branching follows a power law; signaling networks display a bow-tie structure; bifans are highly enriched; and feedback loops are depleted and nested to provide dynamical stability.
Lecture 3 Making predictions using network analysis
Making Predictions based on Network Topology
Proteins close to each other in the interactome
network are also likely to share GO terms
Sharan et al. Molecular Systems Biology 3, 88 (2007)
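A minimal sketch of turning that observation into a prediction with simple neighbor voting (one common approach, assumed here rather than taken from the cited review): each annotated interaction partner votes for its GO terms, and the most frequent terms are proposed for the unannotated protein. Protein names and term labels are hypothetical.

```python
from collections import Counter


def predict_terms(adj, annotations, protein, top=3):
    """Rank GO terms for `protein` by how many of its direct partners carry each term."""
    votes = Counter()
    for partner in adj.get(protein, ()):
        votes.update(annotations.get(partner, ()))
    return votes.most_common(top)


# Hypothetical network and annotations; "P" has no annotation of its own.
adj = {"P": {"A", "B", "C"}}
annotations = {"A": {"term_x", "term_y"}, "B": {"term_x"}, "C": {"term_z"}}
print(predict_terms(adj, annotations, "P", top=1))  # [('term_x', 2)]
```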
Making Predictions based on Network Topology
Albert and Albert used SUGGEST, an algorithm used to organize products in a supermarket, to predict protein-protein interactions based on known protein-protein interactions
Making Predictions based on Network Topology
Completing defective cliques can be
used to predict protein interactions
Yu et al. Bioinformatics 22, 7 (2006)
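A minimal sketch for the smallest case, assuming cliques of four proteins: for every unlinked pair (u, v), count the linked pairs (a, b) among their common partners, since each such pair makes {u, v, a, b} a four-clique missing only the u-v link. This raw count stands in for the statistical scoring of the cited paper; the toy network is hypothetical.

```python
from itertools import combinations


def defective_clique_candidates(adj):
    """Score unlinked pairs (u, v) by the number of linked pairs (a, b) among their
    common partners; each such pair makes {u, v, a, b} a 4-clique missing only u-v."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue
        common = adj[u] & adj[v]
        support = sum(1 for a, b in combinations(sorted(common), 2) if b in adj[a])
        if support:
            scores[(u, v)] = support
    return sorted(scores.items(), key=lambda item: -item[1])


# Toy network: four proteins linked in every way except A-D.
ppi = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
print(defective_clique_candidates(ppi))  # [(('A', 'D'), 1)]
```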
How can we use prior knowledge networks for analyzing multivariate experimental results?
[Diagram: Computational Modeling + Experiments (High-content) → Low hanging fruit hypotheses]
The Goal is to Better Understand Initial Cell Signaling Activation of Transcription Factors After HU-210 Stimulation of CB1 Receptors
Study the Process of Cell Differentiation
Induction of Neurite Outgrowth
Govek et al. Genes & Dev. 19:1 (2005)
Protein-DNA Arrays:
Measuring Transcription Factor Activation
[Figure: protein-DNA array images (DMSO vs. 20 min) with spots labeled AP-2, RAR, PAX6, CREB, MYB, and STAT3; schematic of a signal acting on a transcription factor (marked P) that binds a consensus promoter sequence]
23 TFs increase binding to DNA after 20 minutes: TFAP2A, CEBPA, NFYA, MYB, CREB1, NR3C1, STAT3, SMAD3, SMAD4, STAT4, THRA, THRB, VDR, GATA2, STAT1, PAX6, XBP1, NR1I2, HOXD8, HOXD9, HOXD10, RUNX2, HIVEP1
Validated factors with gel-shift assays
Bromberg KD, Ma'ayan A, Neves SR, Iyengar R.
Science. 2008 May 16;320(5878):903-9.
Genes2Networks
[Figure: Genes2Networks workflow. Interaction datasets (including Vidal and Stelzl) are merged by an integrator and passed through a filter (unfiltered vs. filtered dataset); a list of TFs is the input, and the output is a subnetwork with significant intermediates]
Berger SI, Posner JM, Ma'ayan A.
BMC Bioinformatics. 2007 Oct 4;8:372.
The Genes2Networks Algorithm
Inputs
• Large-scale mammalian protein-protein interaction network (the background network)
• Seed list of proteins, which are nodes in the background network
Algorithm
Step 1: Find all shortest paths between all pairs of nodes from the seed list
Step 2: Combine all links and nodes from all found shortest paths to form a subnetwork
Step 3: Add all missing links that directly connect any pair of nodes in the subnetwork, using interactions from the background network
Step 4: Rank intermediate nodes (nodes that are not from the seed list) by the proportion of their links that fall within the created subnetwork vs. their total links in the background network, using a binomial proportion test
Output
• Subnetwork connecting the seed nodes
• Table with ranked intermediate proteins
Berger SI, Posner JM, Ma'ayan A.
BMC Bioinformatics. 2007 Oct 4;8:372.
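A minimal sketch of the four steps on an undirected network stored as adjacency sets. Two points are assumptions, not taken from the paper: Steps 2 and 3 are merged by keeping every node that lies on some shortest seed-to-seed path and then taking all background links among those nodes (a set that includes every shortest-path link), and Step 4 uses a one-sided binomial tail with the node's background degree as the number of trials and the subnetwork's share of all background nodes as the null proportion. The function names, toy network, and protein names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import comb


def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def genes2networks_sketch(graph, seeds):
    seeds = [s for s in dict.fromkeys(seeds) if s in graph]
    dist = {s: bfs_distances(graph, s) for s in seeds}
    # Steps 1-2: v is on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t).
    nodes = set(seeds)
    for s, t in combinations(seeds, 2):
        if t not in dist[s]:
            continue  # the two seeds lie in different components
        nodes.update(v for v in dist[s]
                     if dist[s][v] + dist[t][v] == dist[s][t])
    # Step 3: every background link between two subnetwork nodes.
    links = {frozenset((u, v)) for u in nodes for v in graph[u] if v in nodes}
    # Step 4: rank intermediates by links inside the subnetwork vs. all their links.
    p_null = len(nodes) / len(graph)
    ranking = []
    for v in nodes - set(seeds):
        k = sum(1 for w in graph[v] if w in nodes)
        n = len(graph[v])
        ranking.append((binomial_tail(k, n, p_null), v, k, n))
    ranking.sort()
    return nodes, links, ranking


# Hypothetical background network; A and D are the seeds.
background = {"A": {"X", "Y"}, "X": {"A", "D", "W"}, "D": {"X", "Z"},
              "Y": {"A", "Z"}, "Z": {"Y", "D"}, "W": {"X"}}
nodes, links, ranking = genes2networks_sketch(background, ["A", "D"])
print(sorted(nodes), ranking)  # ['A', 'D', 'X'] [(0.5, 'X', 2, 3)]
```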
Genes2Networks Web Interface
- Hash function for fast loading of the datasets
- Implementation of AJAX allows changing the page without reloading
- GraphViz, Overlib, and PerlMagick library utilization
http://actin.pharm.mssm.edu/genes2networks
Berger SI, Posner JM, Ma'ayan A. BMC Bioinformatics. 2007 Oct 4;8:372.
Network Connecting Activated Factors
Bromberg KD, Ma'ayan A, Neves SR, Iyengar R.
Science. 2008 May 16;320(5878):903-9.
Making Predictions by Network Analysis
Bromberg KD, Ma'ayan A, Neves SR, Iyengar R. Science. 2008 320(5878):903-9.
Experimental Validation
BRCA1 Blocks Neurite Outgrowth
PI3K-AKT Pathway is Important for Neurite Outgrowth and Regulates Many of the Identified Factors
Bromberg KD, Ma'ayan A, Neves SR, Iyengar R. Science. 2008 320(5878):903-9.
Predicting Disease Genes: Noonan Syndrome
- Mild upregulation of the MAPK pathway (gain-of-function mutations)
- Mutations in four disease genes were identified in about 60% of patients
Noonan’s Symptoms
- Heart Defects
- Distinct Facial Features
- Learning Difficulties
- Bruising and Bleeding
Genes2Networks was used to find Additional
Genes that may be Mutated in Noonan Syndrome
Use known disease genes to build a network around these genes to identify new
genes/nodes that could be additional disease genes
Cordeddu V, Di Schiavi E, Pennacchio LA, Ma'ayan A, et al. Nat Genet. 41:1022 (2009)
Steiner Trees used to Connect Seed Genes
White and Ma’ayan, 41st ACSSC 2007. IEEE p. 155-159
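A Steiner tree connects a set of seed (terminal) nodes using as few extra links as possible; finding the optimum is NP-hard, so heuristics are used in practice. A minimal sketch of one classic heuristic, growing the tree by repeatedly attaching the nearest remaining seed along a shortest path (the Takahashi-Matsuyama shortest-path heuristic); it is not necessarily the method of the cited work, and the toy network is hypothetical.

```python
from collections import deque


def steiner_tree_sketch(graph, terminals):
    """Grow a tree from the first terminal, each time attaching the closest
    remaining terminal by a shortest path (unweighted links)."""
    terminals = list(terminals)
    tree_nodes, tree_links = {terminals[0]}, set()
    remaining = set(terminals[1:]) - tree_nodes
    while remaining:
        # Multi-source BFS from the current tree finds the nearest terminal.
        parent = {v: None for v in tree_nodes}
        queue, found = deque(tree_nodes), None
        while queue and found is None:
            u = queue.popleft()
            for v in graph[u]:
                if v not in parent:
                    parent[v] = u
                    if v in remaining:
                        found = v
                        break
                    queue.append(v)
        if found is None:
            raise ValueError("some terminals are not reachable")
        # Walk back from the terminal to the tree, adding the path.
        v = found
        while parent[v] is not None:
            tree_links.add(frozenset((v, parent[v])))
            tree_nodes.add(v)
            v = parent[v]
        remaining -= tree_nodes
    return tree_nodes, tree_links


# Hypothetical network: hub H joins three seeds; a-b is a longer detour.
graph = {"T1": {"H", "a"}, "T2": {"H", "b"}, "T3": {"H"},
         "H": {"T1", "T2", "T3"}, "a": {"T1", "b"}, "b": {"a", "T2"}}
nodes, links = steiner_tree_sketch(graph, ["T1", "T2", "T3"])
print(sorted(nodes), len(links))  # ['H', 'T1', 'T2', 'T3'] 3
```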
Steiner Trees Used to Connect Signaling Pathways to Gene Regulation
Huang SS, Fraenkel E. Sci Signal. 2009 2(81):ra40
PluriNet - Connecting Differentially Expressed Genes in Different Stem Cells Using Protein Interactions from Literature
Müller et al. Nature. (2008) 455:401
KEA- kinase-substrate interaction database and web-based system for kinase enrichment analysis
Lachmann and Ma’ayan. Bioinformatics 11, 87 (2010)
http://amp.pharm.mssm.edu/lib/kea.jsp
ChEA- ChIP-chip and ChIP-seq database of protein-DNA interactions and enrichment analysis tool
• 118 unique transcription factors
• 107 publications
• 35286 genes
• >150 ChIP-X assays (ChIP-chip, ChIP-seq, ChIP-PET)
• Average targets per transcription factor ≈ 1,300
• Total interactions 254,854
ChEA works well for determining TFs regulating gene expression changes: Myc was inferred as an effector of estrogen in MCF7 cells
Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, and Ma’ayan A. ChEA: Transcription Factor
Regulation Inferred from Integrating Genome-Wide ChIP-X Experiments. Bioinformatics, 26, 2438-44 (2010)
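A minimal sketch of the enrichment step behind tools like KEA and ChEA, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), a common choice for this kind of analysis. For each regulator, the sketch compares how many of its known targets (substrates for kinases, ChIP-X targets for TFs) appear in the input gene list against what random draws from a background of N genes would give. The library, gene names, and background size are hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k), X = overlap when n genes are drawn from N, of which K are targets."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enrichment(gene_list, library, background_size):
    """Rank regulators by the overlap between their targets and the input list."""
    genes = set(gene_list)
    results = []
    for regulator, targets in library.items():
        k = len(genes & targets)
        if k:
            p = hypergeom_tail(k, background_size, len(targets), len(genes))
            results.append((p, regulator, k))
    return sorted(results)


# Hypothetical target library and input list over a background of 20 genes.
library = {
    "TF_1": {"g1", "g2", "g3", "g4", "g5"},
    "TF_2": {"g1", "g6", "g7", "g8", "g9", "g10"},
}
hits = ["g1", "g2", "g3", "g4", "g5"]
for p, tf, k in enrichment(hits, library, background_size=20):
    print(tf, k, p)
```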
Summary
• Prior knowledge networks can be used to predict function of proteins, protein interactions and disease genes
• Different algorithms can be used to connect seed lists of proteins with known interactions from prior knowledge networks
• Network analysis can be used to develop hypotheses for functional experiments by combining high-throughput profiling data with prior knowledge networks
Slides from a lecture in the course Systems Biology—Biomedical
Modeling
Citation: A. Ma’ayan, Introduction to network analysis in systems biology. Sci. Signal.
4, tr5 (2011).