• Reconstructing gene networks
• Analysing the properties of gene networks
Gene Networks: Using gene expression data to reconstruct gene networks
1) Parts list – genes, transcription factors, promoters, binding sites, …
2) Architecture – a graph depicting the connections of the parts
3) Logics – how combinations of regulatory signals interact (e.g., promoter logics)
4) Dynamics – how it all works in real time
Gene Networks: four different levels
[Figure: two nodes, Gene A and Gene D, joined by an arc/edge labelled w that represents the relation between them]
Networks and Graphs: describing gene networks using graphs
• G1 → G2: The product of gene G1 is a transcription factor, which binds to the promoter of gene G2 – physical networks
• G1 → G2: Gene G2 is mentioned in a paper about gene G1 – literature networks
• G1 → G2: The disruption of gene G1 changes the expression level of gene G2 – expression networks
Networks and Graphs: Different interpretations of arcs
The dataset comes from Hughes et al., “Functional discovery via a compendium of expression profiles”, Cell 102, 109-126 (2000):
• Yeast data, 6316 gene expression profiles over 300 experiments
• 276 deletion mutants (274 single, 2 double)
• 11 tet-promoter mutants
• 13 compound treatments
A subset of 248 experiments was selected:
• Single deletion mutants
• All chromosomes present
Gene Disruption Networks: the dataset
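The normalized expression log(ratios) X are discretized using the threshold γ:
$$ d(X) = \begin{cases} -1 & \text{if } X < -\gamma \\ 0 & \text{if } -\gamma \le X \le \gamma \\ 1 & \text{if } X > \gamma \end{cases} $$
Discretization of the data (Hughes et al.)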
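The discretization step can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `discretize`, the parameter name `gamma`, the three output levels -1, 0, 1, and the symmetric cutoff are assumptions, and the sample values are made up.

```python
def discretize(log_ratio, gamma):
    """Map a normalized expression log(ratio) to -1, 0 or 1.

    Assumption: values below -gamma count as downregulated (-1),
    values above gamma as upregulated (1), everything else as unchanged (0).
    """
    if log_ratio < -gamma:
        return -1
    if log_ratio > gamma:
        return 1
    return 0


# Hypothetical profile of one experiment (made-up numbers).
profile = [-3.1, 0.4, 2.5, -1.9, 2.0]
print([discretize(x, 2.0) for x in profile])  # [-1, 0, 1, 0, 0]
```

A value exactly at the threshold is treated as unchanged here; whether the original analysis used a strict or non-strict comparison is not stated.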
[Figure: construction of an example disruption network with the nodes gene A, gene B, gene C, and gene D]
Network construction: disruption network
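One way to picture the construction is as follows: every disrupted gene becomes a source node, and every gene whose discretized expression changes in that mutant becomes a target. The sketch below assumes this reading; the function name, the input layout, the exclusion of self-arcs, and the example values are all hypothetical.

```python
def build_disruption_network(discretized):
    """Build signed arcs from discretized mutant profiles.

    discretized: dict mapping a disrupted gene to a dict
                 {gene: -1, 0 or 1} (assumed input layout).
    Returns a dict {(source, target): sign}.
    """
    arcs = {}
    for source, profile in discretized.items():
        for target, change in profile.items():
            # Assumption: the disrupted gene itself is not a target.
            if change != 0 and target != source:
                arcs[(source, target)] = change
    return arcs


# Made-up discretized profiles for two disruption mutants.
example = {
    "A": {"A": -1, "B": 1, "C": 0, "D": -1},
    "B": {"A": 0, "B": -1, "C": 1, "D": 0},
}
print(build_disruption_network(example))
# {('A', 'B'): 1, ('A', 'D'): -1, ('B', 'C'): 1}
```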
[Figure: example node with outdegree = 4 and indegree = 3]
Indegree and Outdegree: degree of a node
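As a small illustration of the two degree notions, the sketch below counts outgoing and incoming arcs per node. The arc list and the function name `degrees` are made up for the example.

```python
from collections import Counter


def degrees(arcs):
    """Return (indegree, outdegree) counters for a list of (source, target) arcs."""
    outdegree = Counter(source for source, _ in arcs)
    indegree = Counter(target for _, target in arcs)
    return indegree, outdegree


# Hypothetical arcs, not taken from the dataset.
arcs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "C")]
indegree, outdegree = degrees(arcs)
print(outdegree["A"], indegree["C"])  # 3 3
```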
[Plots: indegree and outdegree distributions]
Most genes have only a few incoming/outgoing edges, but some have many (>500).
Indegree and Outdegree: degree distributions
Indegree and Outdegree: power-law distribution
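A rough way to check for power-law behaviour is to fit a straight line to the degree histogram on log-log axes; the slope then estimates the exponent. The slides do not say how the power law was assessed, so the least-squares fit below is only an assumed, crude method, and the synthetic degree list is constructed so that the count of nodes with degree k is exactly 1000·k⁻².

```python
import math
from collections import Counter


def loglog_slope(degree_list):
    """Least-squares slope of log(count) against log(degree).

    Nodes with degree 0 are ignored because log(0) is undefined.
    """
    counts = Counter(d for d in degree_list if d > 0)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic degrees: 1000 nodes of degree 1, 250 of degree 2,
# 40 of degree 5, 10 of degree 10 (count = 1000 * k**-2).
synthetic = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
print(round(loglog_slope(synthetic), 6))  # -2.0
```

On real data such a fit is sensitive to binning and to the sparse tail, so it is better read as a quick diagnostic than as an estimate of the exponent.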
γ    Outdegree group               m    n    Indegree group             m    n
2.0  Carbohydrate metabolism       363  4    Amino-acid metabolism      9    194
     RNA turnover                  353  4    Nucleotide metabolism      6    82
     Meiosis                       244  3    Energy generation          5    242
     Cell stress                   207  9    Small molecule transport   5    343
     Protein translocation         197  3    Other metabolism           5    148
2.8  RNA turnover                  110  4    Amino-acid metabolism      4    167
     Cell stress                   62   8    Nucleotide metabolism      3    67
     Meiosis                       54   3    Energy generation          2    184
     Protein synthesis             53   7    Differentiation            2    43
     Cell wall maintenance         47   6    Small molecule transport   2    286
3.6  RNA turnover                  48   4    Small molecule transport   2    230
     RNA processing/modification   41   4    Other metabolism           2    96
     Cell stress                   27   8    Nucleotide metabolism      2    58
     Small molecule transport      19   8    Mating response            2    57
     Cell wall maintenance         19   6    Amino-acid metabolism      2    133
Cellular role table showing the top 5 groups with the highest median degrees for the networks, with a minimum group size of 3 for the outdegree and 40 for the indegree (γ: significance threshold, m: median degree, n: group size)
Median Out-/Indegree
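A table of this kind could be produced along the following lines: compute the median degree of each cellular-role group, drop groups below the minimum size, and keep the top entries. The function name, the group names, and the degree values below are hypothetical, and genes missing from the degree map are assumed to have degree 0.

```python
from statistics import median


def top_groups_by_median(degree, groups, min_size, top=5):
    """Return up to `top` rows (group, median degree m, group size n).

    degree: dict gene -> degree (genes not listed count as 0, an assumption)
    groups: dict group name -> list of genes
    """
    rows = []
    for name, genes in groups.items():
        if len(genes) >= min_size:
            m = median(degree.get(g, 0) for g in genes)
            rows.append((name, m, len(genes)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Made-up degrees and groups.
degree = {"g1": 10, "g2": 4, "g3": 6, "g4": 1, "g5": 2, "g6": 3, "g7": 50}
groups = {
    "role A": ["g1", "g2", "g3"],
    "role B": ["g4", "g5", "g6"],
    "role C": ["g7"],  # too small for min_size=3, so it is skipped
}
print(top_groups_by_median(degree, groups, min_size=3))
# [('role A', 6, 3), ('role B', 2, 3)]
```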
• Is there one “big” dominant connected component and possibly a number of small components, or several components of comparable sizes?
• Can the network be broken down into several components of comparable size by removing nodes of high degree (i.e., nodes with many incoming or outgoing edges)?
Network modularity
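Both questions can be explored with a plain graph traversal. The sketch below is one possible approach under stated assumptions: arc direction is ignored when forming components, nodes left without any arc are not counted, and "high degree" means indegree plus outdegree. The function names and the toy network are hypothetical.

```python
from collections import defaultdict


def component_sizes(arcs, removed=frozenset()):
    """Sizes of connected components (arc direction ignored), largest first."""
    neighbours = defaultdict(set)
    for source, target in arcs:
        if source in removed or target in removed:
            continue
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, sizes = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def highest_degree_nodes(arcs, fraction):
    """The given fraction of nodes with the highest total degree."""
    degree = defaultdict(int)
    for source, target in arcs:
        degree[source] += 1
        degree[target] += 1
    k = int(len(degree) * fraction)
    return set(sorted(degree, key=degree.get, reverse=True)[:k])


# Toy network: a hub H plus one separate pair X-Y.
arcs = [("H", "A"), ("H", "B"), ("H", "C"),
        ("A", "A2"), ("B", "B2"), ("X", "Y")]
print(component_sizes(arcs))            # [6, 2]
hubs = highest_degree_nodes(arcs, 0.125)
print(component_sizes(arcs, hubs))      # [2, 2, 2]
```

In the toy example, removing the single hub splits the large component into pieces of comparable size, which is the behaviour the second question asks about.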
Network modularity: Number of connected components in the networks
γ     component                full network   1% removed   5% removed   10% removed
2.0   largest, second, total   5383 1         4707 1       368222       261452
3.0   largest, second, total   355622         246122       138549       7646 17
4.0   largest, second, total   235434         120537       5426 22      452851
• Wagner, Genome Research 2002 – there exist many independent modules
• Featherstone and Broadie, BioEssays 2002 – there is only one giant module
• It all depends on the definition of a ‘module’
Modularity: other opinions
a closer look
[Figure: mating subnetwork at γ = 2.0; the nodes are labelled with yeast gene names]
This subnetwork is the result of filtering the full network at γ = 2.0 for the core set (marked in red) and their direct neighbours (red arcs: downregulation, green arcs: upregulation).
Mating subnetwork
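The filtering described in the caption can be sketched as follows. This is an assumed reading, not the authors' procedure: the subnetwork keeps the core genes, every gene directly linked to a core gene, and all arcs among those nodes. The function name and the gene labels G1 to G5 are placeholders.

```python
def neighbourhood_subnetwork(arcs, core):
    """Restrict signed arcs to a core set and its direct neighbours.

    arcs: dict {(source, target): sign}, sign -1 (down) or 1 (up)
    core: iterable of core gene names
    """
    core = set(core)
    nodes = set(core)
    for source, target in arcs:
        if source in core or target in core:
            nodes.update((source, target))
    # Assumption: arcs between two neighbours are kept as well.
    return {(s, t): sign for (s, t), sign in arcs.items()
            if s in nodes and t in nodes}


# Placeholder network; the labels are not real gene names.
full = {("G1", "G2"): 1, ("G2", "G3"): -1, ("G3", "G4"): 1,
        ("G5", "G1"): -1, ("G2", "G5"): 1}
print(neighbourhood_subnetwork(full, {"G1"}))
# {('G1', 'G2'): 1, ('G5', 'G1'): -1, ('G2', 'G5'): 1}
```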
[Figure: mating subnetwork at γ = 4.0; the nodes are labelled with yeast gene names]
This subnetwork is the result of filtering the full network at γ = 4.0 for the core set (marked in red) and their direct neighbours (red arcs: downregulation, green arcs: upregulation).
Mating subnetwork
• More information than randomised networks
• No optimal γ
• Power-law distribution of arcs
• No obvious modules
• Local networks make sense
Conclusion
... and now to something completely different
• ChIP
• “Theoretical” ChIP = binding-site network
• Gene disruption
Comparison of gene neighbourhoods in graphs: First, take three types of networks ...
[Figure: a source gene and its target genes; target sets T1, T2, T3]
Target Sets
[Figure: all genes – target sets; all genes – source sets]
[Figure: source s1 (transcription factors) with target set T1 = genes with binding sites for s1; source s2 (disrupted genes) with target set T2 = genes affected by disruption of s2; target sets T1 and T’1; case s1 = s2]
[Figure labels: known relationships; target set overlap small; target set overlap large; predicted relationship]
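To decide whether the overlap of two target sets is "large", one common choice is a hypergeometric tail probability: the chance of seeing at least the observed overlap if one set were drawn at random from all genes. The slides do not name the test that was used, so the sketch below is an assumption, as are the function name and the toy sets.

```python
from math import comb


def overlap_p_value(set1, set2, n_genes):
    """P(overlap >= observed) under a hypergeometric null model.

    set1, set2: target sets drawn from a universe of n_genes genes.
    """
    k = len(set1 & set2)
    a, b = len(set1), len(set2)
    total = comb(n_genes, b)
    tail = sum(comb(a, i) * comb(n_genes - a, b - i)
               for i in range(k, min(a, b) + 1))
    return tail / total


# Toy example: 10 genes, two target sets of size 3 sharing 2 genes.
p = overlap_p_value({1, 2, 3}, {2, 3, 4}, n_genes=10)
print(round(p, 4))  # 0.1833
```

A small p-value marks an overlap that is unlikely by chance; two disjoint sets give a p-value of 1.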
• Protein-protein interaction (Y2H, Cellzome, etc.)
• MIPS (C. von Mering “reference set”)
• Co-citation network (PubMed)
Comparison of gene neighbourhoods in graphs: ... and three more networks ...
Rank by p-value:
1) s1 - s2 - p-value - tp
2) s1 - s3 - p-value - fp
3) s1 - s4 - p-value - tp
4) s2 - s3 - p-value - tn
5) s2 - s4 - p-value - fn
6) s3 - s4 - p-value - tn
Comparison of gene neighbourhoods in graphs: ... and then:
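The ranking above can be reproduced with a short sketch: sort the gene pairs by p-value, call a pair "predicted" when its p-value is at or below a cutoff, and compare with a reference set of known relationships. The p-values, the cutoff of 0.05, the reference pairs, and the function name are invented so that the output matches the six labels listed above.

```python
def classify_pairs(p_values, known, cutoff):
    """Rank pairs by p-value and label each as tp, fp, fn or tn.

    p_values: dict {(gene1, gene2): p-value}
    known:    set of frozensets, the reference relationships
    cutoff:   pairs with p-value <= cutoff count as predicted
    """
    labels = {(True, True): "tp", (True, False): "fp",
              (False, True): "fn", (False, False): "tn"}
    rows = []
    for pair, p in sorted(p_values.items(), key=lambda item: item[1]):
        predicted = p <= cutoff
        is_known = frozenset(pair) in known
        rows.append((pair, p, labels[(predicted, is_known)]))
    return rows


# Invented p-values and reference set.
p_values = {("s1", "s2"): 0.001, ("s1", "s3"): 0.002, ("s1", "s4"): 0.01,
            ("s2", "s3"): 0.2, ("s2", "s4"): 0.3, ("s3", "s4"): 0.5}
known = {frozenset(("s1", "s2")), frozenset(("s1", "s4")),
         frozenset(("s2", "s4"))}
for pair, p, label in classify_pairs(p_values, known, cutoff=0.05):
    print(pair, p, label)
```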