Statistical Bioinformatics
• Genomics
• Transcriptomics
• Proteomics
• Systems Biology
Multiple Sequence Alignment (MSA)
Some past forces shaping MSAs
• Divergence of sequences by speciation and nucleotide substitution (Phylogenetics).
• Horizontal gene transfer (recombination), especially in bacteria and viruses.
TOPALi v.1: Recombination detection
Frank Wright, Iain Milne & Dirk Husmeier
TOPALi applied to Roseburia and Eubacterium sequences
Some past forces shaping MSAs (continued)
• Selective pressure acting on functional domains.
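A toy illustration of how recombination can leave a trace in an MSA: if a query sequence is closest to one reference in one part of the alignment and closest to another reference elsewhere, a breakpoint is a plausible explanation. The sketch below is not TOPALi's method. It simply computes Jukes-Cantor distances in windows along the alignment; the sequences, the window size and all function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gapped sites skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def closest_in_windows(query, references, window, step):
    """For each window start, return the name of the reference closest to the query."""
    result = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        dists = {
            name: jukes_cantor(p_distance(query[start:end], seq[start:end]))
            for name, seq in references.items()
        }
        result.append((start, min(dists, key=dists.get)))
    return result

# Hypothetical toy alignment: the query copies A in the first half and B in the second.
refs = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACCTAGGTTCGTAGGTTCGA",
}
query = refs["A"][:10] + refs["B"][10:]

for start, name in closest_in_windows(query, refs, window=10, step=10):
    print(f"window starting at site {start}: closest reference is {name}")
```

A change in the closest reference along the alignment is only a hint. Rate variation and selection can produce similar patterns, which is one reason dedicated statistical methods are needed.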
TOPALi v2: Future plans
• Detect genomic regions under selective pressure (functional domains in proteins)
• Methodology development: combined prediction of breakpoints due to recombination and evolutionary rate change.
• Improved phylogenetic analysis
• Investigate use of UK GRID computational resources for faster analyses
Genes differentially expressed between two conditions
– Affymetrix microarray: mouse liver experiment
– Low fat diet vs high fat diet (6 per group)
– Plot of log-fold change vs. average log intensity.
– Points far away from the horizontal line seem “differentially expressed”.
– Which are significant?
• Statistical Methods (SAM, Limma,…) help to detect significant genes
• BUT: Many methods assume that the variances in both groups are the same
• If this is not the case:
– Algorithms might give wrong answers
– The definition of “differential expression” becomes more difficult
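The two quantities plotted against each other (log-fold change vs. average log intensity) can be computed per gene as in the sketch below. It assumes raw, positive intensities and base-2 logarithms; the intensity values and the function name `ma_values` are hypothetical and not taken from the experiment.

```python
import math
from statistics import mean

def ma_values(group1, group2):
    """Return (A, M) for one gene.

    M = mean log2 intensity in group2 minus mean log2 intensity in group1
        (the log-fold change).
    A = average of the two group means (the average log intensity).
    """
    logs1 = [math.log2(x) for x in group1]
    logs2 = [math.log2(x) for x in group2]
    m1, m2 = mean(logs1), mean(logs2)
    return (m1 + m2) / 2.0, m2 - m1

# Hypothetical intensities for one gene, 6 arrays per diet group.
low_fat = [128.0, 256.0, 512.0, 128.0, 256.0, 512.0]
high_fat = [512.0, 1024.0, 2048.0, 512.0, 1024.0, 2048.0]

a, m = ma_values(low_fat, high_fat)
print(f"A = {a:.2f}, M = {m:.2f}")  # M = 2 corresponds to a four-fold increase
```

A large |M| alone says nothing about significance: the same M is far more convincing for a gene with small within-group variance than for a noisy one.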
Which group gives 'higher' values?
[Figure: Check for change in variance. x-axis: expected absolute log(F); y-axis: observed absolute log(F)]
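One plausible reading of the axis labels is a quantile-style comparison: for each gene, F is the ratio of the two group variances, and the sorted observed |log(F)| values are compared with what would be expected if both groups had the same variance. The sketch below follows that reading under stated assumptions (normal data, expectation approximated by simulation). The data are simulated and all names are hypothetical.

```python
import math
import random
from statistics import variance

def abs_log_f(group1, group2):
    """|log(s1^2 / s2^2)| for one gene, using sample variances."""
    return abs(math.log(variance(group1) / variance(group2)))

def expected_abs_log_f(n_genes, n1, n2, n_sim=100, seed=0):
    """Approximate the expected sorted |log F| values under equal variances.

    Assumes normally distributed data; averages the order statistics
    over n_sim simulated data sets.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_genes
    for _ in range(n_sim):
        sims = sorted(
            abs_log_f([rng.gauss(0, 1) for _ in range(n1)],
                      [rng.gauss(0, 1) for _ in range(n2)])
            for _ in range(n_genes)
        )
        for i, value in enumerate(sims):
            totals[i] += value
    return [t / n_sim for t in totals]

# Simulated example: 50 genes, 6 arrays per group, group 2 three times as variable (sd).
rng = random.Random(1)
observed = sorted(
    abs_log_f([rng.gauss(0, 1) for _ in range(6)],
              [rng.gauss(0, 3) for _ in range(6)])
    for _ in range(50)
)
expected = expected_abs_log_f(n_genes=50, n1=6, n2=6)

mean_obs = sum(observed) / len(observed)
mean_exp = sum(expected) / len(expected)
print(f"mean observed |log F| = {mean_obs:.2f}, mean expected = {mean_exp:.2f}")
```

Plotting `observed` against `expected` gives points near the diagonal when the variances agree and points well above it when they do not.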
Claus Mayer (BioSS)
• More complex statistical tests for detecting differential gene expression.
• Situations where standard assumptions are violated.
• Allows for different variance-covariance structures in both populations.
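The simplest univariate example of a test that allows different variances in the two groups is Welch's t-test. The sketch below contrasts it with the pooled-variance t-test; it illustrates the general idea and is not the more complex methodology referred to above. The data values are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Classical two-sample t statistic; assumes equal variances. Returns (t, df)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2

def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom. Returns (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical log expression values for one gene: group y is far more variable.
x = [4.9, 5.0, 5.1, 5.0, 4.9, 5.1]
y = [3.0, 7.0, 2.0, 8.0, 4.0, 9.0]

t_pooled, df_pooled = pooled_t(x, y)
t_welch, df_welch = welch_t(x, y)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

With equal group sizes the two statistics coincide, but the Welch degrees of freedom shrink towards those of a single group, so the same t value is judged more cautiously.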
Proteomics: 2-D Gels
[Figure: gel 1 and gel 2]
How to compare gels 1 and 2?
Chris Glasbey: Nonlinear Warping
John Gustafsson, Chalmers University, Sweden
WARP
2-D Gel Comparison
Two gels superimposed (in different colours)
Proteomics: 2-D Gel Interpretation
• Graham Horgan
• Identify spots which differ between treatments (differentially expressed proteins), using variance and covariance information from other spots
• Assessment of associations between spot densities and physiological variables.
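A basic way to assess the association between a spot density and a physiological variable is the Pearson correlation across animals. The sketch below ranks spots by absolute correlation; the spot names, the densities and the physiological variable are hypothetical, and the choice of Pearson correlation is an assumption, not necessarily the measure used in this work.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical physiological variable, one value per animal.
phys = [20.0, 22.0, 25.0, 27.0, 30.0, 33.0]

# Hypothetical spot densities on the same six animals.
spots = {
    "spot_1": [2.0 * v + 1.0 for v in phys],   # increases with the variable
    "spot_2": [100.0 - v for v in phys],       # decreases with the variable
    "spot_3": [5.0, 7.0, 5.5, 6.8, 5.2, 6.9],  # no clear trend
}

ranked = sorted(spots, key=lambda s: abs(pearson(spots[s], phys)), reverse=True)
for name in ranked:
    print(f"{name}: r = {pearson(spots[name], phys):+.3f}")
```

With many spots per gel, the resulting correlations would need a multiple-testing correction before any spot is called associated.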
Detect active pathways in a “known” network
• Network of protein-protein and protein-DNA interactions “known” from the literature
• Gene expression profiling for different conditions
– Bacterial strains: promoting vs. preventing inflammation
– Mice on a low-fat vs. high-fat diet
• Can we identify different pathways associated with these conditions?
• We need a robust method
– Expression data: noisy, missing values
– Post-translational modifications
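One common way to look for active pathways in a known network is to give each gene a score for differential expression and search for a connected subnetwork with a high aggregate score. The sketch below uses a greedy search and the aggregate score sum(z)/sqrt(k); this is a swapped-in illustration, not necessarily the method used here. The network, the z-scores and the gene labels `g1` to `g5` are hypothetical.

```python
import math

def aggregate_score(nodes, z):
    """Aggregate z-score of a node set: sum of z-scores divided by sqrt(size)."""
    return sum(z[n] for n in nodes) / math.sqrt(len(nodes))

def greedy_subnetwork(seed, adjacency, z):
    """Grow a connected subnetwork from `seed`.

    At each step, add the neighbouring node that raises the aggregate
    score the most; stop when no neighbour improves the score.
    """
    current = {seed}
    score = aggregate_score(current, z)
    while True:
        frontier = {nb for n in current for nb in adjacency[n]} - current
        best, best_score = None, score
        for candidate in sorted(frontier):  # sorted for reproducible tie-breaking
            s = aggregate_score(current | {candidate}, z)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            return current, score
        current.add(best)
        score = best_score

# Hypothetical undirected network and per-gene z-scores.
adjacency = {
    "g1": {"g2"},
    "g2": {"g1", "g3", "g5"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
    "g5": {"g2"},
}
z = {"g1": 2.5, "g2": 2.0, "g3": 1.8, "g4": -0.5, "g5": 0.1}

nodes, score = greedy_subnetwork("g1", adjacency, z)
print(sorted(nodes), round(score, 3))
```

A greedy search can get stuck in local optima, and it does nothing about noise or missing values in the z-scores, which is why a more robust method is called for.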
Cytokine Network
• Collaboration with SCGTI
• Interferon Pathway
– Cytokines
– Pivotal role in modulating the innate and adaptive mammalian immune system
• Network of protein-protein and protein-DNA interactions from the literature
• Two gene expression time series from bone marrow-derived macrophages in mice
– Infected with cytomegalovirus
– Infected and treated with IFN-gamma
[Figure: interferon pathway network. Nodes: casp8, bak, cybb, casp9, cdkn1a, ccl5, b2m, bcl-xl, bcl2, c2ta, casp1, casp3, casp7, fcer2a, fkbp4, g1p2, hist4h4, hla-a, hla-b, hla-c, hla-dra, ifna11, ifna1, hla-drb, ifna14, ifna4, ii, il12a, il12b, il1b, irf1, irf5, irf4, irf3, irf7, isgf3g, itgam, lcsbp1, lfnb, oas1, prkr, psmb10, psme1, psmb9, psmb8, psme2, sfpi1, stat1, stat2, stat6, tap1, tap2, tnfrsf6, tnfsf6, ctss, irf2]
[Figure: the same network. Subnetwork 1 = Infected]
[Figure: the same network. Subnetwork 2 = Infected+treated]
Reverse Engineering of Regulatory Networks
• Can we learn the network structure from postgenomic data themselves?
• Statistical methods to distinguish between
– Direct correlations
– Indirect correlations
• Challenge: Distinguish between
– Correlations
– Causal interactions
• Breaking symmetries with active interventions:
– Gene knockouts (VIGS, RNAi)
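The difference between direct and indirect correlations can be illustrated with a partial correlation. In a chain X → Y → Z, X and Z are strongly correlated, but the correlation vanishes once Y is accounted for. The sketch below uses simulated data and the standard first-order partial correlation formula; the variable names and noise levels are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, z, given):
    """Correlation of x and z after removing the linear effect of `given`."""
    r_xz = pearson(x, z)
    r_xg = pearson(x, given)
    r_zg = pearson(z, given)
    return (r_xz - r_xg * r_zg) / math.sqrt((1 - r_xg ** 2) * (1 - r_zg ** 2))

# Simulated chain X -> Y -> Z: Z depends on X only through Y.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
z = [yi + 0.5 * rng.gauss(0, 1) for yi in y]

print(f"corr(X, Z)     = {pearson(x, z):.3f}")                 # large: indirect link
print(f"corr(X, Z | Y) = {partial_correlation(x, z, y):.3f}")  # close to zero
```

Partial correlations remove indirect links but are symmetric, so they cannot tell X → Y from Y → X. That symmetry is what interventions such as knockouts help to break.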
Evaluation: Raf signalling pathway
• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells
• Laboratory data from cytometry experiments
– Down-sampled to 100 measurements
– Sample size indicative of microarray experiments
• Two types of experiments:
– Passive observations
– Active interventions (gene knockouts)
• Literature: “gold-standard” network
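Given a gold-standard network, a learned network can be evaluated by checking how well the scores assigned to candidate edges separate true edges from false ones. The slides do not state which metric was used; the sketch below assumes the area under the ROC curve (AUROC). The edges, the scores and the node labels are hypothetical.

```python
def auroc(edge_scores, gold_edges):
    """Area under the ROC curve for edge scores against a gold-standard edge set.

    Computed as the fraction of (true edge, false edge) pairs in which the
    true edge receives the higher score; ties count as one half.
    """
    pos = [s for e, s in edge_scores.items() if e in gold_edges]
    neg = [s for e, s in edge_scores.items() if e not in gold_edges]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical edge scores from a structure-learning method.
edge_scores = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("a", "c"): 0.6,
    ("c", "d"): 0.4,
    ("a", "d"): 0.1,
}
# Hypothetical gold-standard edges.
gold_edges = {("a", "b"), ("b", "c"), ("c", "d")}

print(f"AUROC = {auroc(edge_scores, gold_edges):.3f}")
```

An AUROC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which makes it possible to compare learning from passive observations with learning from interventions on the same scale.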