A Conjecture on Strongly Consistent Learning
Joe Suzuki
Osaka University
July 7, 2009
Outline

1. Stochastic Learning with Model Selection
2. Learning CP
3. Learning ARMA
4. Conjecture
5. Summary
Stochastic Learning with Model Selection

Probability Space (Ω, F, µ)

Ω: the entire set
F: the set of events over Ω
µ: F → [0, 1]: a probability measure (µ(Ω) = 1)
Stochastic Learning

X: Ω → R: a random variable
µ_X: the measure induced by X
x1, · · · , xn ∈ X(Ω): n samples
Induction/Inference

µ_X ↦ x1, · · · , xn (Random Number Generation)
x1, · · · , xn ↦ µ_X (Stochastic Learning)
Two Problems with Model Selection

Conditional Probability for Y given X: µ_{Y|X}

Identify the equivalence relation in X from samples:
x ∼ x′ ⇐⇒ µ(Y = y | X = x) = µ(Y = y | X = x′) for y ∈ Y(Ω)
ARMA for {X_n}_{n=−∞}^∞

X_n + Σ_{j=1}^k λ_j X_{n−j} ∼ N(0, σ²) with {λ_j}_{j=1}^k (0 ≤ k < ∞)

Identify the true order k from samples.
This paper

compares CP and ARMA to find how close they are.
Learning CP

CP

A ∈ F, G ⊆ F

CP of A given G (Radon–Nikodym)

∃ G-measurable g: Ω → R s.t. µ(A ∩ G) = ∫_G g(ω) µ(dω) for G ∈ G
CP µ_{Y|X}

A := (Y = y), y ∈ Y(Ω)
G: generated by subsets of X
Assumption

|Y(Ω)| < ∞
Applications for CP

Stochastic Decision Trees, Stochastic Horn Clauses Y|X, etc.

Find G from samples.

G = {G1, G2, G3, G4, G5}
"!#Ã
"!#Ã
"!#Ã
"!#Ã"!
#Ã
"!#Ã "!
#Ã"!#Ã
¢¢¢
AAA
¢¢¢
AAA
©©©©©©
HHHHHHWeather
Temp
G3
Wind
G1 G2 G4 G5
ClearCloud
Rain
≤ 75 > 75 Yes No
Applications for CP (cont'd)

Finite Bayesian Networks X_i | {X_j}, j ∈ π(i)

Find π(i) ⊆ {1, · · · , i − 1} from samples.
[Figure: a Bayesian network over X1, · · · , X5 with directed edges among the variables.]
Learning CP

z_i := (x_i, y_i) ∈ Z(Ω) := X(Ω) × Y(Ω)
From z^n := (z1, · · · , zn) ∈ Z^n(Ω), find a minimal G.
G*: true
G_n: estimated

Assumption

|G*| < ∞

Two Types of Errors

G* ⊂ G_n (Overestimation)
G* ⊈ G_n (Underestimation)
Example: Quinlan's C4.5
[Figure: four decision trees over Weather (Clear / Cloud / Rain), Temp (≤ 75 / > 75), and Wind (Yes / No): (a) the true tree G*; (b) an overestimation that splits a true leaf with an extra Wind test; (c) and (d) underestimations that drop one of the tests in G*.]
Information Criteria for CP

Given z^n ∈ Z^n(Ω), find G minimizing

I(G, z^n) := H(G, z^n) + (k(G)/2) d_n

H(G, z^n): empirical entropy (fitness of z^n to G)
k(G): the number of parameters (simplicity of G)
d_n ≥ 0: d_n/n → 0

d_n = log n: BIC/MDL
d_n = 2: AIC
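
To make the criterion concrete, here is a minimal sketch in Python, under the assumption that a model G is represented by a cell-assignment function on X(Ω), that k(G) = |G| · (|Y(Ω)| − 1), and that H(G, z^n) is the total empirical codelength of y^n given the cells of x^n; the names cell_of, zn, and d_n are illustrative, not from the slides.

```python
import math
from collections import Counter

def empirical_entropy(zn, cell_of):
    # H(G, z^n) = -sum over cells g and labels y of c(g, y) * log(c(g, y) / c(g)):
    # the empirical codelength of y^n given the cell of each x_i (fitness of z^n to G)
    joint = Counter((cell_of(x), y) for x, y in zn)
    marginal = Counter(cell_of(x) for x, _ in zn)
    return -sum(c * math.log(c / marginal[g]) for (g, _), c in joint.items())

def criterion(zn, cell_of, num_cells, num_labels, d_n):
    # I(G, z^n) = H(G, z^n) + (k(G)/2) * d_n, with k(G) = |G| * (|Y(Omega)| - 1)
    k = num_cells * (num_labels - 1)
    return empirical_entropy(zn, cell_of) + 0.5 * k * d_n
```

Among candidate partitions, one would pick the G with the smallest I(G, z^n), e.g. with d_n = math.log(len(zn)) for BIC/MDL.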
Consistency

Consistency (G_n −→ G* (n → ∞))

Weak consistency: convergence in probability (O(1) < d_n < o(n))
Strong consistency: almost sure convergence (MDL/BIC, etc.)

AIC (d_n = 2) is not consistent because d_n is too small!

Problem

What is the minimum d_n for strong consistency?

Answer (Suzuki, 2006)

d_n = 2 log log n (the law of the iterated logarithm)
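
To see how slowly the minimal strongly consistent penalty grows, a quick numerical comparison of the three choices of d_n (a sketch, nothing slide-specific):

```python
import math

# d_n = 2 (AIC) stays constant, 2 log log n (the claimed minimum for strong
# consistency) grows extremely slowly, and log n (BIC/MDL) grows fastest.
for n in (10**2, 10**4, 10**6, 10**9):
    print(f"n = {n:>10}:  AIC 2.00   2loglog(n) {2 * math.log(math.log(n)):.2f}   "
          f"log(n) {math.log(n):.2f}")
```

Even at n = 10⁹, 2 log log n ≈ 6.1 while log n ≈ 20.7, so the minimal penalty sits far below BIC's.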
Error Probability for CP

G* ⊂ G (Overestimation)

µ{ω ∈ Ω | I(G, Z^n(ω)) < I(G*, Z^n(ω))}
= µ{ω ∈ Ω | χ²_{K(G)−K(G*)}(ω) > (K(G) − K(G*)) d_n}

χ²_{K(G)−K(G*)}: a χ² variable with K(G) − K(G*) degrees of freedom

G* ⊈ G (Underestimation)

µ{ω ∈ Ω | I(G, Z^n(ω)) < I(G*, Z^n(ω))} diminishes exponentially with n
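
Taking the identity above at face value, the overestimation probability is just a χ² tail, which can be evaluated directly; a sketch (assuming SciPy is available; D stands for the number of extra parameters K(G) − K(G*)):

```python
import math
from scipy.stats import chi2

# Overestimation probability mu{ chi^2_D > D * d_n } for D extra parameters:
# constant for AIC, tending to 0 for d_n = 2 log log n and d_n = log n.
D = 1
for n in (10**2, 10**4, 10**6):
    hq, bic = 2 * math.log(math.log(n)), math.log(n)
    for name, d_n in (("AIC", 2.0), ("Hannan-Quinn", hq), ("BIC/MDL", bic)):
        print(f"n = {n:>8}  {name:>12}: {chi2.sf(D * d_n, df=D):.5f}")
```

With d_n = 2 the tail probability never shrinks, which is another way of seeing why AIC is not consistent.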
Applications for ARMA

Time Series X_i | X_{i−k}, · · · , X_{i−1}

X_i + Σ_{j=1}^k λ_j X_{i−j} ∼ N(0, σ²)

Find k from samples (see the simulation sketch below).

Gaussian Bayesian Networks X_i | {X_j}, j ∈ π(i)

X_i + Σ_{j∈π(i)} λ_{j,i} X_j ∼ N(0, σ_i²)

Find π(i) ⊆ {1, · · · , i − 1} from samples.
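
As a concrete instance of the time-series model, a minimal simulation sketch; the order, coefficients, and burn-in length are illustrative assumptions of mine, chosen inside the stationarity region.

```python
import numpy as np

# Simulate X_i + sum_{j=1}^k lambda_j X_{i-j} = eps_i with eps_i ~ N(0, sigma^2),
# i.e. X_i = -sum_j lambda_j X_{i-j} + eps_i. Here the true order is k* = 2.
rng = np.random.default_rng(0)
lam = np.array([-0.6, 0.2])   # lambda_1, lambda_2 (illustrative values)
n, sigma = 2000, 1.0
x = np.zeros(n + 100)
for i in range(len(lam), len(x)):
    x[i] = -lam @ x[i - len(lam):i][::-1] + rng.normal(0.0, sigma)
x = x[100:]                   # drop burn-in so the series is close to stationary
```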
Learning ARMA

k ≥ 0
{λ_j}_{j=1}^k: λ_j ∈ R
σ² ∈ R_{>0}

{X_i}_{i=−∞}^∞: X_i + Σ_{j=1}^k λ_j X_{i−j} = ε_i ∼ N(0, σ²)

Given x^n := (x1, · · · , xn) ∈ X1(Ω) × · · · × Xn(Ω), estimate

k known: {λ_j}_{j=1}^k, σ²
k unknown: k as well as {λ_j}_{j=1}^k, σ²
Yule-Walker

If k is known, solve for {λ_{j,k}}_{j=1}^k and σ_k² w.r.t.

x̄ := (1/n) Σ_{i=1}^n x_i
c_j := (1/n) Σ_{i=1}^{n−j} (x_i − x̄)(x_{i+j} − x̄), j = 0, · · · , k

( −1   c_1      c_2      · · ·  c_k     ) ( σ_k²    )   ( −c_0 )
(  0   c_0      c_1      · · ·  c_{k−1} ) ( λ_{1,k} )   ( −c_1 )
(  0   c_1      c_0      · · ·  c_{k−2} ) ( λ_{2,k} ) = ( −c_2 )
(  ⋮    ⋮        ⋮                ⋮     ) (   ⋮     )   (   ⋮  )
(  0   c_{k−1}  c_{k−2}  · · ·  c_0     ) ( λ_{k,k} )   ( −c_k )
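
A direct transcription of this system in Python (a sketch assuming NumPy/SciPy): the last k rows form the Toeplitz system R λ = −(c_1, · · · , c_k), and the first row then gives σ_k² = c_0 + Σ_j c_j λ_{j,k}.

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def yule_walker(x, k):
    # Empirical autocovariances c_0, ..., c_k of the centered series
    n = len(x)
    xc = x - x.mean()
    c = np.array([xc[: n - j] @ xc[j:] / n for j in range(k + 1)])
    # Rows 2..k+1: Toeplitz(c_0, ..., c_{k-1}) lambda = -(c_1, ..., c_k)
    lam = solve(toeplitz(c[:k]), -c[1:])
    # Row 1: -sigma_k^2 + sum_j c_j lambda_{j,k} = -c_0
    sigma2 = c[0] + c[1:] @ lam
    return sigma2, lam
```

On the simulated series from the earlier sketch, yule_walker(x, 2) should roughly recover λ ≈ (−0.6, 0.2) and σ² ≈ 1.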
Information Criteria for ARMA

If k is unknown, given x^n ∈ X^n(Ω), find k minimizing

I(k, x^n) := (1/2) log σ_k² + (k/2) d_n

σ_k²: obtained using Yule-Walker
d_n ≥ 0: d_n/n → 0

d_n = log n: BIC/MDL
d_n = 2: AIC
d_n = 2 log log n: Hannan-Quinn (1979) =⇒ the minimum d_n satisfying strong consistency

Suzuki (2006) was inspired by Hannan-Quinn (1979)!
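
Putting the pieces together, a sketch of order selection, reusing yule_walker and the simulated series x from the sketches above. Note one assumption of mine, not spelled out on the slide: the fit term is scaled by n ((n/2) log σ_k² rather than (1/2) log σ_k²), the normalization under which the fit and the penalty remain comparable as n grows.

```python
import math
import numpy as np

def select_order(x, k_max, d_n):
    # Minimize (n/2) log sigma_k^2 + (k/2) d_n over k = 0, ..., k_max
    n = len(x)
    scores = {}
    for k in range(k_max + 1):
        sigma2 = np.var(x) if k == 0 else yule_walker(x, k)[0]
        scores[k] = 0.5 * n * math.log(sigma2) + 0.5 * k * d_n
    return min(scores, key=scores.get)

n = len(x)  # the simulated series from before; true order k* = 2
for name, d_n in (("AIC", 2.0), ("Hannan-Quinn", 2 * math.log(math.log(n))),
                  ("BIC/MDL", math.log(n))):
    print(name, select_order(x, k_max=10, d_n=d_n))
```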
Error Probability for ARMA

k*: the true order

k* > k (Underestimation)

µ{ω ∈ Ω | I(k, X^n(ω)) < I(k*, X^n(ω))} diminishes exponentially with n
Conjecture

Error Probability for ARMA (cont'd)

Conjecture: k* < k (Overestimation)

µ{ω ∈ Ω | I(k, X^n(ω)) < I(k*, X^n(ω))}
= µ{ω ∈ Ω | χ²_{k−k*}(ω) > (k − k*) d_n}

χ²_{k−k*}: a χ² variable with k − k* degrees of freedom
In fact, ...

For k > k* and large n (using σ_j² = (1 − λ_{j,j}²) σ_{j−1}²),

(1/2) log σ_k² + (k/2) d_n < (1/2) log σ_{k*}² + (k*/2) d_n
⇐⇒ Σ_{j=k*+1}^k log(σ_{j−1}²/σ_j²) > (k − k*) d_n
⇐⇒ −Σ_{j=k*+1}^k log(1 − λ_{j,j}²) > (k − k*) d_n
⇐⇒ Σ_{j=k*+1}^k λ_{j,j}² > (k − k*) d_n (since −log(1 − λ²) ≈ λ² for large n)

It is very likely that Σ_{j=k*+1}^k λ_{j,j}² ∼ χ²_{k−k*}, although λ_{j,j} ∼ N(0, 1) (a numerical sketch follows).
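
One way to probe the conjecture numerically (a sketch of mine, not from the paper): simulate white noise, so that k* = 0, take λ̂_{j,j} as the last Yule-Walker coefficient at each order j (reusing yule_walker from above), and compare n Σ_{j=1}^k λ̂_{j,j}² against χ²_k quantiles. The √n scaling is an assumption on my part, which is how I read the slide's λ_{k,k} ∼ N(0, 1).

```python
import numpy as np
from scipy.stats import chi2

# If sqrt(n) * lambda_hat_{j,j} are asymptotically independent N(0, 1), then
# n * sum_{j=1}^k lambda_hat_{j,j}^2 should be ~ chi^2_k (here k* = 0).
rng = np.random.default_rng(1)
n, k, trials = 2000, 3, 1000
stats = np.empty(trials)
for t in range(trials):
    z = rng.normal(size=n)
    pacf = [yule_walker(z, j)[1][-1] for j in range(1, k + 1)]  # lambda_hat_{j,j}
    stats[t] = n * sum(p * p for p in pacf)
stats.sort()
for q in (0.5, 0.9, 0.99):
    print(f"q = {q}: empirical {stats[int(q * trials)]:.2f}  chi2_{k} {chi2.ppf(q, df=k):.2f}")
```

If the conjecture is right, the empirical quantiles should track the χ²_k quantiles closely.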
Summary

CP and ARMA are similar

1. Results in ARMA can be applied to CP.
2. Results in Gaussian BN can be applied to finite BN.

Is the conjecture plausible?

Answer: Perhaps. There are several pieces of evidence:

{Z_i}_{i=1}^n: independent, Z_i ∼ N(0, 1)
rank A = n − k, A² = A =⇒ ᵗ[Z1, · · · , Zn] A [Z1, · · · , Zn] ∼ χ²_{n−k}
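
This quadratic-form fact is easy to check by simulation; a sketch with A chosen as the residual projection of an arbitrary n × k design (my choice, since any idempotent matrix of rank n − k would do):

```python
import numpy as np

# A = I - H (H^T H)^{-1} H^T is idempotent with rank n - k, so Z^T A Z ~ chi^2_{n-k};
# the sample mean of Z^T A Z should be close to n - k.
rng = np.random.default_rng(2)
n, k = 10, 3
H = rng.normal(size=(n, k))
A = np.eye(n) - H @ np.linalg.solve(H.T @ H, H.T)
samples = np.array([z @ A @ z for z in rng.normal(size=(5000, n))])
print(samples.mean(), "vs expected", n - k)
```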