“Ideal Parent” Structure Learning
Gal Elidan, with Iftach Nachman and Nir Friedman
School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel
Problems:
Need to score many candidates
Each one requires costly parameter optimization
Structure learning is often impractical
[Figure: candidate local modifications of a network over S, C, E, D]
Learning Structure
Input: data, a set of instances over the variables
Output: a network structure over the variables
[Figure: data matrix (instances × variables) and a network over S, C, E, D]
Init: start with an initial structure
1. Consider local changes
2. Score each candidate
3. Apply the best modification

The “Ideal Parent” Approach
Approximate the improvement of each change (fast)
Optimize & score only promising candidates (slow)
Linear Gaussian Networks
[Figure: network over A, B, C, D, E with linear Gaussian CPD P(E | C)]
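The linear Gaussian CPD assumed on this slide has the standard textbook form (written out here for reference, not transcribed from the slide):

```latex
% Linear Gaussian CPD: the child X is a noisy linear function of its parents
P(X \mid u_1, \dots, u_k) = \mathcal{N}\Big(\sum_{i=1}^{k} w_i\, u_i \;,\; \sigma^2 \Big)
```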
The “Ideal Parent” Idea
Goal: Score only promising candidates
Step 1: Compute the optimal hypothetical parent: the ideal profile Y that best improves the prediction Pred(X | U, Y) of the child profile
Step 2: Search the potential parents Z1, …, Z4 for one with a profile “similar” to the ideal profile
Step 3: Add the new parent and optimize the parameters
[Figure: parent profile U, child profile X, and ideal profile Y over the instances]
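The three steps above can be sketched for the linear Gaussian case. This is a minimal illustration, assuming the ideal profile is the residual a unit-weight new parent would have to explain; all variable names and the toy data are illustrative, not from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 200  # instances

# Current parents U (columns) with weights w; the child X also depends on
# z_true, which the current structure does not yet model.
U = rng.normal(size=(M, 2))
w = np.array([1.5, -0.7])
z_true = rng.normal(size=M)
x = U @ w + 0.8 * z_true + 0.1 * rng.normal(size=M)

# Step 1: ideal hypothetical parent = what a new parent with unit weight
# would have to be for the prediction to match the child exactly.
y = x - U @ w

# Step 2: rank candidate parents by a C1-style similarity
# <y,z>^2 / <z,z>, proportional to the likelihood gain when only the
# new parent's weight is optimized (variance held fixed).
def c1(y, z):
    return (y @ z) ** 2 / (z @ z)

candidates = {"z_true": z_true, "noise": rng.normal(size=M)}
best = max(candidates, key=lambda name: c1(y, candidates[name]))
print(best)  # the truly relevant parent wins
```

Only the winner would then go on to Step 3, the costly parameter optimization.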
Choosing the best parent Z
Our goal: choose the Z that maximizes the gain from the likelihood of X given U alone to the likelihood of X given U and Z
Theorem: the likelihood improvement, when only the parameter of z is optimized, depends on the profiles y and z
We define similarity measures between y and z accordingly
Similarity vs. Score
[Scatter plots: C1 similarity vs. score and C2 similarity vs. score]
C2 is more accurate: the effect of the fixed variance is large
C1 will be useful later
We now have an efficient approximation for the score
Ideal Parent in Search
Structure search involves:
O(N²) Add parent
O(N·E) Replace parent
O(E) Delete parent
O(E) Reverse edge
[Figure: candidate modifications of a network over S, C, E, D, with scores −17.23, −19.19, −23.13]
The vast majority of evaluations are replaced by the ideal approximation
Only K candidates per family are optimized and scored
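The K-candidate filtering can be sketched as follows. The scoring functions here are stand-ins (the fast proxy doubles as the "slow" score for brevity); in the actual method the slow step runs full parameter optimization:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 100, 30
data = rng.normal(size=(M, N))

child = 0
y = data[:, child]                       # stand-in ideal profile for the child

def fast_similarity(z):                  # cheap C1-style proxy
    return (y @ z) ** 2 / (z @ z)

def slow_score(z):                       # stand-in for full parameter optimization
    return fast_similarity(z)            # in practice: optimize parameters + exact score

K = 3
cands = [j for j in range(N) if j != child]

# Approximate all O(N) add-parent candidates cheaply, optimize only the top K.
top_k = sorted(cands, key=lambda j: fast_similarity(data[:, j]), reverse=True)[:K]
best = max(top_k, key=lambda j: slow_score(data[:, j]))
print(f"optimized {len(top_k)} of {len(cands)} candidates; best parent: X{best}")
```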
Gene Expression Experiment
Four gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (two Conditions datasets) variables
[Plots: test log-likelihood vs. K and speedup vs. K for Amino, Metabolism, Conditions (AA), and Conditions (Met), against the greedy baseline]
Only 0.4%–3.6% of the changes are evaluated
Speedup: 1.8–2.7×
Scope
Conditional probability distribution (CPD) of the form x = g(u1, …, uk : θ) + ε, where g is the link function and ε is white noise
General requirement: g(U) may be any function that is invertible with respect to each ui
Examples: Linear Gaussian, Chemical Reaction, Sigmoid Gaussian
Problem: no simple form for the similarity measures
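A sigmoid Gaussian CPD is one instance of the x = g(u) + noise scope above. A minimal sketch (weights and noise scale are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sample_sigmoid_gaussian(U, w, sigma, rng):
    """X = g(w . u) + eps, with sigmoid link g and eps ~ N(0, sigma^2)."""
    return sigmoid(U @ w) + sigma * rng.normal(size=U.shape[0])

rng = np.random.default_rng(2)
U = rng.normal(size=(5, 3))                       # 5 instances, 3 parents
x = sample_sigmoid_gaussian(U, w=np.array([1.0, -2.0, 0.5]), sigma=0.05, rng=rng)
print(x.shape)  # (5,)
```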
Sigmoid Gaussian CPD
[Plots: likelihood of X = 0.5 and of X = 0.85 as a function of Z, with the sigmoid link g(z); the exact likelihood vs. a linear approximation around Y = 0]
Sensitivity to Z depends on the gradient at each specific instance
Solution: apply a per-instance gradient correction
Sigmoid Gaussian CPD
[Scatter plots: equi-likelihood potentials of Z for X = 0.5 vs. X = 0.85, before and after the gradient correction]
After the gradient correction, we can use the same similarity measure
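The gradient-correction idea can be sketched as per-instance weighting: instances where the link function is flat contribute little likelihood sensitivity to a candidate parent. This is an illustrative reconstruction of the idea, not the paper's exact derivation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_grad(t):
    s = sigmoid(t)
    return s * (1.0 - s)   # maximal (0.25) at t = 0, near zero in the tails

rng = np.random.default_rng(3)
M = 100
U = rng.normal(size=(M, 2))
w = np.array([2.0, -1.0])
pred = U @ w                            # pre-link prediction per instance

z = rng.normal(size=M)                  # candidate parent profile
z_corrected = sigmoid_grad(pred) * z    # per-instance gradient weighting

# Instances near the sigmoid's flat tails are suppressed.
print(np.all(np.abs(z_corrected) <= 0.25 * np.abs(z) + 1e-12))
```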
Sigmoid Gene Expression
Four gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (Conditions) variables
[Plots: test log-likelihood vs. K and speedup vs. K for Amino, Metabolism, Conditions (AA), and Conditions (Met), against the greedy baseline]
Only 2.2%–6.1% of the moves are evaluated
18–30 times faster
Adding New Hidden Variables
Idea: introduce a hidden parent for nodes with similar ideal profiles
[Figure: hidden parent H over children X1, X2, X4; ideal profiles Y1, …, Y5 of X1, …, X5 over the instances]
For the linear Gaussian case, the score of the hidden parent can be bounded; the challenge is to find the profile h that maximizes this bound
The bound is a Rayleigh quotient involving the matrix whose columns are the children's ideal profiles, so the optimal h must lie in the span of those profiles, and h* is the eigenvector with the largest eigenvalue
Setting this up (with A invertible), finding h* amounts to solving an eigenvector problem, where |A| = size of the cluster
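The eigenvector view can be sketched with toy data: if h is chosen to maximize the summed squared similarity over a cluster's ideal profiles (columns of Y), that objective is a Rayleigh quotient of Y Yᵀ, so h* is its leading eigenvector, i.e. the first left singular vector of Y. The data below is synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 50
h_true = rng.normal(size=M)

# Four children whose ideal profiles are noisy scalings of a shared hidden profile.
Y = np.column_stack([c * h_true + 0.1 * rng.normal(size=M)
                     for c in (1.0, -0.8, 1.3, 0.6)])

# h* = leading eigenvector of Y @ Y.T = first left singular vector of Y.
h_star = np.linalg.svd(Y, full_matrices=False)[0][:, 0]

# h* recovers the shared profile up to sign and scale.
cos = abs(h_star @ h_true) / np.linalg.norm(h_true)
print(round(cos, 3))
```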
Finding the best Cluster
[Figure: variables X1, …, X4; the pairwise terms are computed only once]
Candidate cluster scores, computed from the pairwise terms:
X1, X2: 12.35
X1, X3: 14.12
X3, X4: 3.11
Greedily grow the best pair, X1, X3 (14.12): adding X2 raises the score to 18.45, while adding X4 as well drops it to 16.79
Select the cluster with the highest score, add the hidden parent, and continue with the search
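The greedy cluster growth described above can be sketched as follows; `score` is a stand-in for the hidden-parent score, and the toy table echoes the slide's example numbers:

```python
import itertools

def grow_cluster(variables, score):
    """Start from the best-scoring pair, then greedily add variables
    while the (stand-in) cluster score keeps improving."""
    pairs = itertools.combinations(variables, 2)
    cluster = set(max(pairs, key=lambda p: score(frozenset(p))))
    while True:
        rest = [v for v in variables if v not in cluster]
        if not rest:
            break
        best = max(rest, key=lambda v: score(frozenset(cluster | {v})))
        if score(frozenset(cluster | {best})) <= score(frozenset(cluster)):
            break  # no addition helps; stop growing
        cluster.add(best)
    return cluster

# Toy scores from the slide: {X1,X3} = 14.12 is the best pair, adding X2
# improves it (18.45), and adding X4 on top of that hurts (16.79).
table = {
    frozenset({"X1", "X2"}): 12.35,
    frozenset({"X1", "X3"}): 14.12,
    frozenset({"X3", "X4"}): 3.11,
    frozenset({"X1", "X2", "X3"}): 18.45,
    frozenset({"X1", "X2", "X3", "X4"}): 16.79,
}
cluster = grow_cluster(["X1", "X2", "X3", "X4"], lambda c: table.get(c, 0.0))
print(sorted(cluster))  # ['X1', 'X2', 'X3']
```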
Bipartite Network
Instances from a biological expert network with 7 (hidden) parents and 141 (observed) children
[Plots: train and test log-likelihood vs. number of instances (10–100) for Greedy, Ideal K=2, Ideal K=5, and Gold]
Speedup is roughly 10×
Greedy takes over 2.5 days!
Summary
New method for significantly speeding up structure learning in continuous variable networks
Offers a promising time vs. performance tradeoff
Guided insertion of new hidden variables

Future work
Improve cluster identification for the non-linear case
Explore additional distributions and the relation to GLMs
Combine the ideal parent approach as a plug-in with other search approaches