23/2/2008
Gene Network Modeling
Prof. Yasser Kadah
Eng. Fadhl Al-Akwaa
OUTLINES
What is the Gene Regulatory Network (GRN)?
Application of GRN
GRN Construction Methodology
GRN modeling steps
GRN Models
GRS Software
Next work
Reference
From The Last Lecture
DNA sequence {A,T,C,G}: ATCGAATCGA
Protein sequence {all letters except B, J, O, U, X, Z}: KMLSLLMARTYW
The Central Dogma: Protein Synthesis
Genome → (Transcription) → Transcriptome → (Translation) → Proteome → Cell Function
Bioinformatics Important Challenges
Gene Prediction
Gene Function
Protein Function
Protein 3D Structure
Public Database
[Figure: DNA sequence {A,T,C,G} → (Transcription) → gene expression level, measured by microarray → (Translation) → protein sequence KMLSLLMARTYW]
Microarray Technology
[Figure, built up over several slides: Transcription Rate, Gene Expression Level, Translation Rate and Protein Level linked by three positive (+) influences and one negative (-) influence, drawn for GENE A and GENE B; the links between the two genes are marked "?".]
What is a Gene Regulatory Network (GRN)?
[Figure: Gene A, Gene B, Gene C and Gene D, with the links between them marked "?"]
GRN An example: Fission yeast
Lackner DH, 2007
http://www.sanger.ac.uk/Info/News-releases/2007/070413.shtml
http://en.wikipedia.org/wiki/Metabolic_network_modelling
http://www.enm.bris.ac.uk/anm/summerschools/complexity/imagery/191.html
Why build a Gene Network?
Functional Genomics: a gene network allows researchers to make predictions about gene function that can then be tested at the bench.
The focus is gradually shifting to functional genomics.
Application of GRN: Translational Genomics
We can study the effects of a compound (such as a drug) on the expression level of many genes.
The mission of translational genomics is to translate genomic discoveries into advances in human health.
Application of GRN: Understanding Experimental Data
Biologists expect powerful computational tools to extract functional information from experimental data.
GRN Model Objective
Construct a gene network model that:
Describes known gene interactions well
Predicts interactions not known so far
Allows for drug effect simulation
Helps to understand the etiology of the disease
GRN Construction Methodology
Forward Engineering
Inverse Engineering
“Traditional methodology”
Forward Engineering
Hard
Reverse Engineering
[Figure: model (gene network) to microarray data is the forward problem, marked "possible"; microarray data to model (gene network) is the inverse problem, marked "very difficult"]
Reverse Engineering
[Figure: the same diagram for Boolean data and Boolean networks, with both directions marked "easy"]
Data Required: DNA Microarray
[Figure: expression of gene 1, gene 2 and gene 3 assayed over time (min), 0 to 60]
Data Required: Gene Expression Matrix
      t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
Data Required: Gene Expression Matrix
Snapshot
      a1  a2  a3  a4
g1    0   3   1   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
Time series
      t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
[Figure: Overview of steps in modeling and control of Probabilistic Boolean networks (Ranadip Pal, 2007). Prior biological knowledge feeds a chain of steps: data extraction (microarray image, grid alignment, segmentation, gene expression extraction), discretization (hypothesis testing at 99% for up-regulated and down-regulated genes), gene selection (seed algorithm), network generation (BN generation, PBN with matched steady state) and control of network (penalty assignment, formulation of the optimal control problem, dynamic programming, optimal control policy, application of the stationary policy, original steady state compared with the steady state using control).]
GRN modeling steps: Discretization
Assume that genes exist in two states: on and off.
If the expression of gene i is above a threshold level for gene i, consider it on; otherwise, consider it off.
[Figure, built up over several slides: expression of gene 1, gene 2 and gene 3 over time (min), 0 to 60, each with its own threshold level (1, 2, 3); every sample is labeled on or off]
We obtain the following discretized gene expression data:
time      0   5   10  15  20  25  30  35  40  45  50  55
gene 1    0   0   0   0   0   0   1   1   1   1   1   1
gene 2    0   0   0   0   0   0   0   1   1   0   0   0
gene 3    1   1   1   1   1   1   1   0   0   0   0   0
The gene expression data is now in the form of bit streams.
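The thresholding rule above can be sketched in a few lines of Python. This is only an illustration: the expression values and the per-gene thresholds below are hypothetical, since the slides give only the discretized result.

```python
def discretize(series, threshold):
    """Return 1 where the expression value is above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in series]

# Hypothetical expression levels sampled at successive time points
# (illustrative values, not taken from the slides).
expression = {
    "gene 1": [0.2, 0.3, 0.1, 0.4, 0.9, 1.4],
    "gene 2": [0.1, 0.8, 1.1, 0.7, 0.2, 0.1],
}
# Hypothetical per-gene threshold levels.
thresholds = {"gene 1": 0.5, "gene 2": 0.6}

bits = {gene: discretize(values, thresholds[gene])
        for gene, values in expression.items()}

for gene, stream in bits.items():
    print(gene, "".join(str(bit) for bit in stream))
```

Each gene gets its own threshold, matching the "level i" wording of the slide; how the thresholds are chosen is left open here.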
GRN modeling steps: Discretization
Assume that genes exist in three states:
Unchanged: 0
Up-regulated: 1
Down-regulated: -1
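A minimal sketch of the three-state coding follows. The slides do not say how a gene is declared up- or down-regulated, so the two-fold cutoff on an expression ratio used here is an assumption, and the ratios are hypothetical.

```python
import math

def ternary_state(ratio, fold=2.0):
    """Map an expression ratio (sample / reference) to 1, -1 or 0.

    The two-fold cutoff is an assumed rule, not one given in the slides.
    """
    log_ratio = math.log2(ratio)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return 1       # up-regulated
    if log_ratio <= -cutoff:
        return -1      # down-regulated
    return 0           # unchanged

# Hypothetical expression ratios for four genes.
ratios = [4.0, 1.1, 0.25, 0.9]
states = [ternary_state(r) for r in ratios]
print(states)
```

A statistical test per gene could replace the fixed cutoff; the coding into -1, 0 and 1 stays the same.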
GRN modeling steps: Gene Selection (Clustering)
      a1  a2  a3  a4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
Clustering Steps: Correlation
Choose a similarity metric to compare the transcriptional responses or the expression profiles:
Pearson Correlation
Spearman Correlation
Euclidean Distance
…
Clustering Steps: Correlation Algorithm
      g1    g2    g3    g4    g5
g1    1     0.23  0.00  0.95  -0.63
g2    -1    1     0.91  0.56  0.56
g3    0     0.23  1     0.32  0.77
g4    1     0.5   0.56  1     -0.36
g5    -1    0.91  0.32  0.4   1
Correlation coefficients are values from -1 to 1, with 1 indicating similar behavior, -1 indicating opposite behavior and 0 indicating no direct relation.
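As an illustration of the Pearson option, the sketch below computes pairwise correlations for the four-gene expression matrix shown on the earlier slides (g1 to g4 over four samples). The helper name `pearson` is chosen here for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Expression profiles copied from the gene expression matrix slide.
profiles = {
    "g1": [0, 1, 2, 1],
    "g2": [1, 2, 1, 0],
    "g3": [0, 1, 1, 1],
    "g4": [1, 2, 1, 0],
}

corr = {(a, b): pearson(profiles[a], profiles[b])
        for a in profiles for b in profiles}

for a in profiles:
    print(a, [round(corr[(a, b)], 2) for b in profiles])
```

Genes g2 and g4 have identical profiles, so their coefficient is 1, while g1 and g2 come out uncorrelated.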
Clustering Steps: Clustering Algorithm
Choose a clustering algorithm:
Hierarchical
K-means
…
Hierarchical Clustering
• Find the largest value in the similarity matrix.
• Join the clusters together.
• Recompute the similarity matrix and iterate.
Step 1: the largest value is 0.95, so g1 and g4 are joined.
      g1    g2    g3    g4    g5
g1          0.23  0.00  0.95  -0.63
g2                0.91  0.56  0.56
g3                      0.32  0.77
g4                            -0.36
g5
Step 2: the largest value is 0.91, so g2 and g3 are joined.
         g1,g4  g2    g3    g5
g1,g4           0.37  0.16  -0.52
g2                    0.91  0.56
g3                          0.77
g5
Step 3: the largest value is 0.68, so g5 joins g2 and g3.
         g1,g4  g2,g3  g5
g1,g4           0.27   -0.52
g2,g3                  0.68
g5
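The loop of finding the largest similarity, joining and recomputing can be sketched as below, starting from the five-gene similarity matrix of the first step. The slides do not state how the matrix is recomputed after a join; this sketch assumes average linkage (the mean of the pairwise similarities), so its intermediate values differ slightly from the ones on the slides, although the order of the joins is the same.

```python
# Upper triangle of the similarity matrix from the first clustering step.
pairs = {
    ("g1", "g2"): 0.23, ("g1", "g3"): 0.00, ("g1", "g4"): 0.95,
    ("g1", "g5"): -0.63, ("g2", "g3"): 0.91, ("g2", "g4"): 0.56,
    ("g2", "g5"): 0.56, ("g3", "g4"): 0.32, ("g3", "g5"): 0.77,
    ("g4", "g5"): -0.36,
}
sim = {frozenset(pair): value for pair, value in pairs.items()}

def average_linkage(sim, labels):
    """Agglomerative clustering; returns the joins in the order they happen."""
    clusters = [(label,) for label in labels]

    def similarity(c1, c2):
        # Assumed recompute rule: mean similarity over all cross pairs.
        values = [sim[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    merges = []
    while len(clusters) > 1:
        score, c1, c2 = max(
            ((similarity(c1, c2), c1, c2)
             for i, c1 in enumerate(clusters)
             for c2 in clusters[i + 1:]),
            key=lambda item: item[0],
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1 + c2, round(score, 3)))
    return merges

merges = average_linkage(sim, ["g1", "g2", "g3", "g4", "g5"])
for members, score in merges:
    print(members, score)
```

The recorded joins form the dendrogram: g1 with g4 first, then g2 with g3, then g5 with the g2, g3 cluster.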
Clustering Example
Eisen et al. (1998), PNAS, 95(25): 14863-14868
GRN Modeling Steps: GRN Generation
[Figure: a statistical signal processing technique turns the gene expression matrix below into a gene network over g1, g2, g3 and g4, with edges marked + or - and one still unknown (?)]
      t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
GRN Models
Directed and undirected graphs
Bayesian networks
Boolean networks
Generalized logical networks
Non-linear ordinary differential equations
Piecewise linear differential equations
Qualitative differential equations
Partial differential equations
Stochastic master equations
Rule based formalisms
GRN Models
Hidde de Jong, Modeling and simulation of genetic regulatory systems: a literature review; J Comput Biol. 2002;9(1):67-103. Review.
[Table: models compared by node states, computation, data, complexity and dynamics]
What class of models should be chosen?
The selection should be made in view of:
data requirements
goals of modeling and analysis
Classical Tradeoff
A “fine” model with many parameters:
may be able to capture detailed “low-level” phenomena (protein concentrations, reaction kinetics);
requires very large amounts of data for inference, lest the model be “overfit”.
A “coarse” model with lower complexity:
may succeed in capturing “high-level” phenomena (which genes are ON/OFF);
requires smaller amounts of data.
Occam’s Razor
Model Reliability and Adequacy
P is the set of all possible observations
S is the set of all observations made on the study system
M is the set of all model outputs
Q = S ∩ M
[Figure: Venn diagrams of S, M and Q inside P for four cases: useless model, dream model, incomplete model, and complete but erring model]
Model reliability: |Q|/|M|
Model adequacy: |Q|/|S|
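The two ratios can be computed directly with Python sets. The observations and model outputs below are hypothetical placeholders, used only to show the arithmetic.

```python
# Hypothetical sets: labels stand for individual observations or outputs.
S = {"a", "b", "c", "d"}   # observations made on the study system
M = {"c", "d", "e"}        # model outputs

Q = S & M                  # outputs that match observations

reliability = len(Q) / len(M)   # share of model outputs that were observed
adequacy = len(Q) / len(S)      # share of observations the model reproduces

print(sorted(Q), round(reliability, 3), adequacy)
```

With these placeholder sets, two of three model outputs are observed (reliability 2/3) and two of four observations are reproduced (adequacy 1/2).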
Directed and undirected Graphs
Probably the most straightforward way to model a GRN
G = <V, E>
V: set of vertices
E: set of edges <i, j>, where i, j ∈ V are the head and tail of the edge
Additional labels denote positive/negative influence
Directed and undirected Graphs
Advantages:
Intuitive way of visualization
Common and well explored graph algorithms can make biologically relevant predictions about GRSes:
  paths between genes may reveal missing regulatory interactions or provide clues about redundancy
  cycles in the network point at feedback relations
  connectivity characteristics give an indication of the complexity
  loosely connected subgraphs point at functional modules
Disadvantages:
Time does not play a role
Too much abstraction: a very simplified model, far from reality
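A small sketch of the graph view: a signed directed graph stored as a dictionary of edges, with a depth-first search used to find a regulatory path and to test whether a gene sits on a feedback cycle. The four genes and their edges are hypothetical.

```python
# Hypothetical signed edges: (regulator, target) -> +1 or -1.
edges = {
    ("g1", "g2"): +1,
    ("g2", "g3"): -1,
    ("g3", "g1"): +1,
    ("g3", "g4"): -1,
}

def successors(node):
    """Genes directly regulated by the given gene."""
    return [target for (source, target) in edges if source == node]

def find_path(start, goal, seen=()):
    """Depth-first search for one path from start to goal, or None."""
    if start == goal:
        return [start]
    for nxt in successors(start):
        if nxt not in seen:
            rest = find_path(nxt, goal, seen + (start,))
            if rest:
                return [start] + rest
    return None

def on_feedback_cycle(node):
    """True if the gene can reach itself through at least one edge."""
    return any(find_path(nxt, node) for nxt in successors(node))

def path_sign(path):
    """Net influence along a path: the product of its edge labels."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= edges[(source, target)]
    return sign

path = find_path("g1", "g4")
print(path, path_sign(path), on_feedback_cycle("g1"), on_feedback_cycle("g4"))
```

Multiplying the edge labels along a path gives the net sign of the indirect influence, which is one simple use of the positive/negative labels.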
GRN Models
[The list of models is shown again, with some entries marked "More popular and efficient"]
Boolean Network Model
• A Boolean network is defined by a set of nodes, V = {x1, x2, . . . , xn}, and a list of Boolean functions, F = {f1, f2, . . . , fn}.
• Each xk represents the state (expression) of a gene gk, where xk = 1 means the gene is expressed and xk = 0 means it is not expressed.
Boolean Network
At any given time, combining the gene states gives a gene activity pattern (GAP).
t     0   1   2   3   4
x1    1   1   0   1   1
x2    1   0   0   0   1
x3    1   0   1   1   1
Boolean Network
• Given a GAP at time t, a deterministic function (a set of logical rules) provides the GAP at time t+1.
t     0   1   2   3   4
x1    1   1   0   1   1
x2    1   0   0   0   0
x3    1   0   1   1   0
Boolean Network Example
[Figure, built up over several slides: the table above next to a wiring diagram in which x1, x2 and x3 at time t feed x1, x2 and x3 at time t+1 through OR, NOR and NAND functions]
For each node with k inputs there are 2^(2^k) possible functions.
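The update from one GAP to the next can be sketched as a synchronous Boolean update. The exact wiring of the slide diagram is not recoverable from the text, so the rules below are an assumption: they use the OR, NOR and NAND functions named on the slide, with inputs chosen so that the trajectory matches the second table (x1: 1 1 0 1 1, x2: 1 0 0 0 0, x3: 1 0 1 1 0).

```python
def step(state):
    """Synchronous update: every gene reads the GAP at time t."""
    x1, x2, x3 = state
    return (
        int(x2 or x3),          # x1(t+1) = x2 OR x3
        int(not (x1 or x3)),    # x2(t+1) = x1 NOR x3   (assumed inputs)
        int(not (x1 and x3)),   # x3(t+1) = x1 NAND x3  (assumed inputs)
    )

def count_boolean_functions(k):
    """Number of distinct Boolean functions of k inputs: 2^(2^k)."""
    return 2 ** (2 ** k)

state = (1, 1, 1)               # GAP at t = 0
trajectory = [state]
for _ in range(4):
    state = step(state)
    trajectory.append(state)

for t, gap in enumerate(trajectory):
    print(t, gap)
print(count_boolean_functions(2))
```

With two inputs per node there are 16 candidate functions per gene, which is why inferring the rules from data means searching over functions as well as over wirings.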
Boolean Network Example
I. Shmulevich et al., Bioinformatics (2002), 18 (2): 261-274
AND
NOT
NAND
Boolean Networks: Summary
Advantages:
Efficient analysis of large regulatory networks (RN)
Positive/negative feedback cycles can be modeled with BNs
Disadvantages:
Strong simplifying assumptions: a gene is either on or off, with no in-between states
The computation time is very high, often to the point that constructing large-scale gene networks is impractical
Very susceptible to noise
There are situations where the Boolean idealisation is not appropriate and more general methods are required
Bayesian Networks
A gene regulatory network is represented by a directed acyclic graph:
Vertices correspond to genes.
Edges correspond to direct influence or interaction.
For each gene xi, a conditional distribution p(xi | parents(xi)) is defined.
The graph and the conditional distributions uniquely specify the joint probability distribution.
Bayesian Network Example
[Figure: directed acyclic graph with edges x2 → x3, x1 → x4, x2 → x4 and x4 → x5]
Conditional distributions: p(x1), p(x2), p(x3 | x2), p(x4 | x1, x2), p(x5 | x4)
p(X) = p(x1) p(x2) p(x3 | x2) p(x4 | x1, x2) p(x5 | x4)
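The factorization of p(X) can be checked numerically for binary genes. All probabilities below are hypothetical; only the structure (x3 depends on x2, x4 on x1 and x2, x5 on x4) comes from the example.

```python
from itertools import product

# Hypothetical probabilities that a gene is "on" (1), given its parents.
p_x1 = 0.6
p_x2 = 0.3
p_x3_given = {0: 0.2, 1: 0.7}                      # keyed by x2
p_x4_given = {(0, 0): 0.1, (0, 1): 0.5,
              (1, 0): 0.6, (1, 1): 0.9}            # keyed by (x1, x2)
p_x5_given = {0: 0.25, 1: 0.8}                     # keyed by x4

def bernoulli(p_on, value):
    """Probability of a binary value, given the probability of 1."""
    return p_on if value == 1 else 1.0 - p_on

def joint(x1, x2, x3, x4, x5):
    """p(X) = p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4)."""
    return (bernoulli(p_x1, x1)
            * bernoulli(p_x2, x2)
            * bernoulli(p_x3_given[x2], x3)
            * bernoulli(p_x4_given[(x1, x2)], x4)
            * bernoulli(p_x5_given[x4], x5))

# Summing over all 2^5 assignments should give 1.
total = sum(joint(*values) for values in product((0, 1), repeat=5))
print(round(total, 12), joint(1, 1, 1, 1, 1))
```

The five small tables hold 1 + 1 + 2 + 4 + 2 = 10 numbers, compared with 31 free parameters for an unstructured joint distribution over five binary genes.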
Learning Bayesian Models
Using gene expression data, the goal is to find the Bayesian network that best matches the data.
Recovering the optimal conditional probability distributions when the graph is known is “easy”.
Recovering the structure of the graph is NP-hard (NP: non-deterministic polynomial time).
But good statistics are available:
What is the likelihood of a specific assignment?
What is the distribution of xi given xj?
…
Issues with Bayesian Models
Computationally intensive.
Requires lots of data.
Does not allow for feedback loops, which are known to play an important role in biological networks.
Does not make use of the temporal aspect of the data.
Dynamic Bayesian networks aim at solving some of these issues, but they require even more data.
Differential Equations
Typically uses linear differential equations to model the gene trajectories:
dxi(t)/dt = ai,0 + ai,1 x1(t) + ai,2 x2(t) + … + ai,n xn(t)
Several reasons for that choice:
a lower number of parameters implies that we are less likely to overfit the data
sufficient to model complex interactions between the genes
Small Network Example
dx1(t)/dt = 0.491 - 0.248 x1(t)
dx2(t)/dt = -0.473 x3(t) + 0.374 x4(t)
dx3(t)/dt = -0.427 + 0.376 x1(t) - 0.241 x3(t)
dx4(t)/dt = 0.435 x1(t) - 0.315 x3(t) - 0.437 x4(t)
[Figure: gene network over x1, x2, x3 and x4 with edges signed + or -. Later slides highlight one interaction coefficient and the constant coefficients in the equations.]
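The four equations of the small network can be integrated numerically to see the gene trajectories. The sketch below uses a plain forward-Euler scheme; the step size, the number of steps and the zero initial state are assumptions made for illustration.

```python
# Constant terms and interaction coefficients copied from the four equations.
a0 = [0.491, 0.0, -0.427, 0.0]
A = [
    [-0.248, 0.0,  0.0,    0.0],
    [ 0.0,   0.0, -0.473,  0.374],
    [ 0.376, 0.0, -0.241,  0.0],
    [ 0.435, 0.0, -0.315, -0.437],
]

def derivative(x):
    """Right-hand side dx/dt = a0 + A x of the linear model."""
    return [a0[i] + sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]

def simulate(x0, dt=0.01, steps=10000):
    """Forward-Euler integration; returns the state after steps * dt time units."""
    x = list(x0)
    for _ in range(steps):
        dx = derivative(x)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x

final = simulate([0.0, 0.0, 0.0, 0.0])   # assumed initial state
print([round(value, 3) for value in final])
```

In this run x1 settles where its derivative vanishes, at 0.491 / 0.248, and x3 settles at the value fixed by x1 through its own equation.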
Problem Revisited
      ai,0    ai,1    ai,2   ai,3    ai,4
x1    0.431   -0.248  0      0       0
x2    0       0       0      -0.473  0.374
x3    -0.427  0.376   0      -0.241  0
x4    0       0.435   0      -0.315  -0.437
Given the time-series data, can we find the interaction coefficients?
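One way to approach this question is ordinary least squares on finite-difference slopes. The sketch below is an illustration under strong assumptions: the time series is synthetic and noise-free, generated from the coefficients of the four small-network equations with several hypothetical random start states, and the slopes use the same forward difference that generated the data, so recovery is nearly exact. Real microarray series are short and noisy, and would need extra constraints.

```python
import random

# Coefficients of the four small-network equations: [constant, x1, x2, x3, x4].
TRUE = [
    [0.491, -0.248, 0.0,  0.0,    0.0],
    [0.0,    0.0,   0.0, -0.473,  0.374],
    [-0.427, 0.376, 0.0, -0.241,  0.0],
    [0.0,    0.435, 0.0, -0.315, -0.437],
]

def rate(x):
    """dx/dt for every gene under the linear model."""
    return [row[0] + sum(c * v for c, v in zip(row[1:], x)) for row in TRUE]

def solve(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

# Synthetic data: six short runs from random (hypothetical) start states.
random.seed(0)
dt, rows, slopes = 0.1, [], []
for _ in range(6):
    x = [random.uniform(0.0, 2.0) for _ in range(4)]
    for _ in range(20):
        nxt = [xi + dt * di for xi, di in zip(x, rate(x))]
        rows.append([1.0] + x)                                  # regressors
        slopes.append([(b - a) / dt for a, b in zip(x, nxt)])   # dx/dt estimate
        x = nxt

# Normal equations (R^T R) w = R^T y, solved once per gene.
gram = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
estimated = []
for gene in range(4):
    rhs = [sum(r[i] * s[gene] for r, s in zip(rows, slopes)) for i in range(5)]
    estimated.append(solve(gram, rhs))

for row in estimated:
    print([round(value, 3) for value in row])
```

Several start states are used so that the regressors span all directions; a single short trajectory usually leaves the system poorly determined.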
Issues with Differential Equations
• Even under the simplest linear model, there are m(m+1) unknown parameters to estimate:
  • m(m-1) directional effects
  • m self effects
  • m constant effects
• The number of data points is mn, and we typically have n << m (few time points).
• To avoid overfitting, extra constraints must be incorporated into the model, such as:
  • Smoothness of the equations
  • Sparseness of the network (few non-null interaction coefficients)
GRN Software
GNA: Genetic Network Analyzer
Helix Bioinformatics
http://www-helix.inrialpes.fr/article122.html
GRN Software
Probabilistic Boolean Networks (PBN) Matlab Toolbox
Ilya Shmulevich
Institute for Systems Biology
Future Work: Literature Review
• Study the noisy nature of microarray data.
• Study in depth the existing modeling methodology.
• Focus on a specialized problem such as cancer.
Future Work: GSP Statistics Books
Genomic Signal Processing and Statistics, Edward, 2006
Introduction to Genomic Signal Processing with Control, Ily, 2006
Computational and Statistical Approaches to Genomics (Springer, 2006), Ily
Future Work: Statistics Books
Handbook of Computational Statistics
An Introduction to Statistical Signal Processing, Robert M. Gray, 2007
Fundamentals of Statistical Signal Processing: Estimation Theory, Steven Kay
Nonlinear Signal Processing: A Statistical Approach, Gonzalo R, 2005
Inference in HMM, Olivier Cappe, 2005
Future Work: Modeling Books
Modeling and Control of Complex Systems (Control Engineering), by Petros A. Ioannou, Andreas Pitsillides, 2008
Modeling Biological Systems: Principles and Applications, 2005
Gene Regulation and Metabolism: Postgenomic Computational Approaches, Julio, 2000
Future Work: Resources
IEEE Transactions on Computational Biology and Bioinformatics
IEEE International Workshop on Genomic Signal Processing and Statistics
IEEE Journal of Selected Topics in Signal Processing: Special Issue on Genomic and Proteomic Signal Processing
Special issue of the EURASIP Journal of Bioinformatics and Systems Biology on Genetic Regulatory Networks
Special issue of the IEEE Signal Processing Magazine on Signal Processing Methods in Genomics and Proteomics
Special Genomic Signal Processing issue of the IEEE Transactions on Signal Processing
Workshop on Discrete Models for Genetic Regulatory Networks
Reference
Hidde de Jong, Modeling and simulation of genetic regulatory systems: a literature review; J Comput Biol. 2002;9(1):67-103. Review.
Ranadip Pal, Aniruddha Datta, Edward R. Dougherty, Bayesian Robustness in the Control of Gene Regulatory Networks.
Anastassiou, D. (2001). Genomic Signal Processing. IEEE Signal Processing
Dougherty, E. R. and A. Datta (2005). "Genomic signal processing: diagnosis and therapy." Signal Processing Magazine, IEEE 22(1): 107-112.
Vaidyanathan, P. P. (2004). Genomics and Proteomics: A Signal Processor's Tour. Circuits and Systems Magazine, IEEE. 4: 1-1.
Vaidyanathan, P. P. and B.-J. Yoon (2004). "The role of signal-processing concepts in genomics and proteomics." Journal of the Franklin Institute (Special Issue on Genomics).