Conditional Independence,

Computability, and Measurability

Daniel M. Roy

Research Fellow, Emmanuel College

University of Cambridge

MFPS XXX, Cornell University, Ithaca, June 14, 2014


Computer Science · Statistics · Probability

Algorithmic processes that describe and transform uncertainty.


The stochastic inference problem (informal version)

Input: guesser and checker probabilistic programs.

Output: a sample from the same distribution as the program

accept = False

while (not accept):

guess = guesser()

accept = checker(guess)

return guess

This computation captures Bayesian statistical inference.

“prior” distribution ←→ distribution of guesser()

“likelihood(g)” ←→ Pr(checker(g) is True)

“posterior” distribution ←→ distribution of return value
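The loop above can be wrapped as a small Python function. This is only an illustrative sketch; the wrapper name rejection_sample is introduced here and is not part of the talk:

def rejection_sample(guesser, checker):
    # Repeatedly propose from the prior until the checker accepts.
    accept = False
    while not accept:
        guess = guesser()
        accept = checker(guess)
    return guess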


Example: predicting next coin toss in a sequence

accept = False

while (not accept):

guess = guesser()

accept = checker(guess)

return guess

Let n ≥ 0 and x_1, . . . , x_n ∈ {0, 1}. E.g., 0, 0, 1, 0, 0, 0, ?

guesser():
• sample θ and U independently and uniformly in [0, 1], and
• return (θ, X) where X = 1(U ≤ θ).

checker(θ, x):
• sample U_1, . . . , U_n independently and uniformly in [0, 1],
• let X_i = 1(U_i ≤ θ), and
• accept if and only if X_i = x_i for all i.
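An illustrative, runnable version of this example in Python, reusing the rejection_sample sketch above (the helper code and the use of the random module are choices made here, not from the slides):

import random

x = [0, 0, 1, 0, 0, 0]  # observed tosses

def guesser():
    # Draw theta uniformly, then one toss X with bias theta.
    theta = random.random()
    X = 1 if random.random() <= theta else 0
    return (theta, X)

def checker(guess):
    theta, _ = guess
    # Re-toss the coin n times and accept iff the data are reproduced exactly.
    return all((1 if random.random() <= theta else 0) == xi for xi in x)

theta, X = rejection_sample(guesser, checker)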


Let s = x_1 + · · · + x_n and let U be uniformly distributed. For all t ∈ [0, 1], we have Pr(U ≤ t) = t and

Pr(checker(t, x) is True) = Pr(∀i (U_i ≤ t ⇐⇒ x_i = 1)) = t^s (1 − t)^(n−s).

[Plot: Pr(accept) as a function of t ∈ [0, 1], for n = 6 and s ∈ {1, 3, 5}.]

Pr(checker(U, x) is True) = ∫_0^1 t^s (1 − t)^(n−s) dt = s! (n − s)! / (n + 1)! =: Z(s)

Let p(t) dt be the probability that the accepted θ ∈ [t, t + dt). Then

p(t) dt ≈ t^s (1 − t)^(n−s) dt + (1 − Z(s)) p(t) dt, hence p(t) dt ≈ (t^s (1 − t)^(n−s) / Z(s)) dt.

The probability that the accepted X = 1 is then ∫ t p(t) dt = (s + 1)/(n + 2).
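A quick Monte Carlo sanity check of the (s + 1)/(n + 2) formula, reusing the illustrative guesser, checker, and rejection_sample sketches above (the sample size 20000 is arbitrary):

# Estimate Pr(next toss = 1 | data) by rejection sampling and compare to (s+1)/(n+2).
samples = [rejection_sample(guesser, checker) for _ in range(20000)]
n, s = len(x), sum(x)
print(sum(X for _, X in samples) / len(samples))  # empirical estimate
print((s + 1) / (n + 2))                          # exact answer: 2/8 = 0.25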


Example: fitting a line to data (aka linear regression)

accept = False

while (not accept):

guess = guesser()

accept = checker(guess)

return guess

Let (x_i, y_i) ∈ R^2 and σ, ν, ε > 0.

guesser():
• sample coefficients α, β independently from Normal(0, σ^2).

checker(α, β):
• sample independent noise variables ξ_i from Normal(0, ν^2),
• let F(x) = αx + β and Y_i = F(x_i) + ξ_i, and
• accept if and only if |Y_i − y_i| < ε for all i.

Note that ε = 0 doesn’t work, but the limit ε→ 0 makes sense.
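An illustrative Python sketch of this guesser/checker pair, again reusing the rejection_sample wrapper above; the data points and the constants sigma, nu, eps are made up for the example (for small eps the loop may run for a long time):

import random

data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # hypothetical (x_i, y_i) pairs
sigma, nu, eps = 1.0, 0.1, 0.5

def guesser():
    # Prior: alpha, beta ~ Normal(0, sigma^2), independently.
    return (random.gauss(0, sigma), random.gauss(0, sigma))

def checker(guess):
    alpha, beta = guess
    # Accept iff every simulated noisy observation lands within eps of the data.
    return all(abs(alpha * x + beta + random.gauss(0, nu) - y) < eps
               for x, y in data)

alpha, beta = rejection_sample(guesser, checker)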


Fantasy example: extracting 3D structure from images

accept = False

while (not accept):

guess = guesser()

accept = checker(guess)

return guess

[Figures: a guessed 3D scene is mapped by checker to a rendered image; inference runs in the opposite direction, from an observed image back to a guess at the 3D structure.]

Example: not so fantastical [Mansinghka et al.]


The stochastic inference problem

accept = False

while (not accept):

guess = guesser()

accept = checker(guess)

return guess

Let U be a Uniform(0, 1) random variable. Let S and T be computable metric spaces.

Input: X : [0, 1] → S, Y : [0, 1] → T, and x ∈ S.

Output: a sample from Pr(Y(U) | X(U) = x),

i.e., the conditional distribution of Y(U) given X(U) = x.


Bayesian statistics

1. Express statistical assumptions via probability distributions.

Pr(parameters, data) = Pr(parameters) × Pr(data | parameters)
(joint = prior × model/likelihood)

2. Statistical inference from data → parameters via conditioning.

Pr(parameters, data), x ↦ Pr(parameters | data = x) (conditioning; the result is the posterior)

Probabilistic programming

1. Represent probability distributions not by formulas but by probabilistic programs that generate samples.

2. Build generic algorithms for probabilistic conditioningusing probabilistic programs as representations.


Talk Outline

1. The stochastic inference problem

2. Where are we now in probabilistic programming?

3. Approximability and Exchangeability: When can we represent conditional independence?

4. Conclusion


MIT-Church, Venture (MIT), webchurch (Stanford), BUGS, Tabular (MSR), Stan (Columbia), BLOG (Berkeley), Infer.NET (MSR), Figaro (CRA),

FACTORIE (UMass), ProbLog (KU Leuven), HANSEI (Indiana), . . .

Questions raised

• Which operations in probability theory can we perform when distributions are represented by programs?

• When can we perform these computations efficiently?

• How are statistical properties (e.g., symmetries) of a distribution reflected in the structure of the computation representing it?


Q: Can we automate conditioning?

Pr(X, Y), x ↦ Pr(Y | X = x)

A: No, but almost.

[Diagrams: four graphical models relating X and Y, distinguished by how X is observed.]

X discrete: √    X continuous: ×    observe X + ξ with Pr(ξ) smooth: √    p(X | S) given: √

[Freer and R., 2010] [Ackerman, Freer, and R., 2011] . . .


Q: What about EFFICIENT inference?

Pr(X, Y), x ↦ Pr(Y | X = x)

A: It’s complicated.

Even the discrete case can be hard in practice (X discrete: ×). Consider conditioning on the output of

def hash_of_random_string(n):
    str = random_binary_string(n)
    return cryptographic_hash(str)

Conditioning the random string on its hash value is as hard as inverting the hash.

Q: What explains the success of probabilistic methods?
A: Structure like conditional independence.

• Bayes nets are representations of distributions that expose conditional independence structure via a directed graph.
• The complexity of exact inference in Bayes nets is controlled by the treewidth of the graph.


Q: Are probabilistic programs sufficiently general as representations for stochastic processes?

We are missing a notion of approximation!

Theorem (Avigad, Freer, R., and Rute). “Approximate samplers can represent conditional independencies that exact samplers cannot.”


Concrete example of an exchangeable sequence

Sum = 1.0; Total = 2.0

def next_draw():

global Sum, Total

y = bernoulli(Sum/Total)

Sum += y; Total += 1

return y

>>> repeat(next_draw, 10)

[0, 1, 1, 0, 1, 1, 1, 0, 1, 0]


theta = uniform(0,1)

def next_draw():

return bernoulli(theta)

>>> repeat(next_draw, 10)

[0, 1, 1, 0, 1, 1, 1, 0, 1, 0]

[Plots: the running value of Sum plotted against Total for the sampled sequence.]
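The two programs above, the urn-style next_draw with shared Sum/Total state and the version that draws theta once and then makes i.i.d. bernoulli(theta) draws, induce the same distribution over sequences; that is the content of de Finetti's theorem below. A small illustrative comparison in Python (the Counter tally and the prefix length 5 are choices made here):

import random
from collections import Counter

def polya_draws(k):
    # Urn-style sampler with shared state Sum/Total.
    Sum, Total = 1.0, 2.0
    ys = []
    for _ in range(k):
        y = 1 if random.random() < Sum / Total else 0
        Sum += y; Total += 1
        ys.append(y)
    return tuple(ys)

def iid_given_theta_draws(k):
    # de Finetti-style sampler: draw theta once, then i.i.d. Bernoulli(theta).
    theta = random.random()
    return tuple(1 if random.random() < theta else 0 for _ in range(k))

# Empirical distributions over length-5 prefixes should (approximately) agree.
a = Counter(polya_draws(5) for _ in range(100000))
b = Counter(iid_given_theta_draws(5) for _ in range(100000))
print(a.most_common(3), b.most_common(3))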


Exchangeability and Conditional Independence

Definition. A sequence Y = (Y_1, Y_2, . . .) of random variables is exchangeable when

(Y_1, . . . , Y_n) =_d (Y_{π(1)}, . . . , Y_{π(n)})

for all n ∈ N and all permutations π of {1, . . . , n}.

Theorem (de Finetti). The following are equivalent:

1. (Y_1, Y_2, . . .) is exchangeable;

2. (Y_1, Y_2, . . .) is conditionally i.i.d. given some θ;

3. There exists f such that

(Y_1, Y_2, Y_3, . . .) =_d (f(θ, U_1), f(θ, U_2), f(θ, U_3), . . .)

for i.i.d. uniform θ, U_1, U_2, . . . .

[Diagrams: Y_1, . . . , Y_5 as an exchangeable sequence, and the same Y_1, . . . , Y_5 rendered conditionally independent given θ.]

Informally: using f, we can sample the Y_i's in parallel.


We can extract the hidden parallelism. [Freer and R., 2012]

Sum = 1.0; Total = 2.0

def next_draw():

global Sum, Total

y = bernoulli(Sum/Total)

Sum += y; Total += 1

return y

theta = uniform(0,1)

def next_draw():

return bernoulli(theta)

[Diagram: can we transform the sequential sampler for Y_1, . . . , Y_5 into the parallel representation with latent θ?]

Theorem (Freer and R., 2012). The distribution of an exchangeable sequence Y is computable if and only if there is an a.e. computable f such that (Y_1, Y_2, . . .) =_d (f(θ, U_1), f(θ, U_2), . . .).

We can always recover hidden parallel structure, exposing conditional independence to the inference engine.


Where else can we find hidden conditional independence?

Can we extract it for inference?


Exchangeable arrays in models of graph/relational data

Definition.

structure          symmetry                    definition
sequence (Y_n)     exchangeable                (Y_n) =_d (Y_{π(n)})
array (X_{i,j})    separately exchangeable     (X_{i,j}) =_d (X_{π(i),τ(j)})
array (X_{i,j})    jointly exchangeable        (X_{i,j}) =_d (X_{π(i),π(j)})

Example. Adjacency matrix (X_{i,j})_{i,j∈N} of an undirected graph on N.

[Figure: the same random graph on vertices 1–10 shown under two labelings of its vertices (from slides by James Lloyd).]

Orbanz and R. (2014).


Exchangeable arrays in models of graph/relational data

Theorem (Aldous-Hoover). Let θ, U_i, V_j, W_{i,j} all be i.i.d. uniform.

structure          symmetry                          representation
array (X_{i,j})    (X_{i,j}) =_d (X_{π(i),τ(j)})     (X_{i,j}) =_d (f(θ, U_i, V_j, W_{i,j}))
sequence (Y_n)     (Y_n) =_d (Y_{π(n)})              (Y_n) =_d (f(θ, U_n))
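A minimal Python sketch of sampling an array in the Aldous-Hoover form X_{i,j} = f(θ, U_i, V_j, W_{i,j}). The particular directing function f used below (a threshold against U_i · V_j, ignoring θ) is an arbitrary choice for illustration, not one from the talk:

import random

def sample_array(n, f):
    # Draw the latent uniforms once, then fill in the array entry-wise.
    theta = random.random()
    U = [random.random() for _ in range(n)]
    V = [random.random() for _ in range(n)]
    return [[f(theta, U[i], V[j], random.random()) for j in range(n)]
            for i in range(n)]

# An arbitrary directing function: link i and j with probability U_i * V_j.
def f(theta, u, v, w):
    return 1 if w < u * v else 0

X = sample_array(5, f)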


Visualization of Aldous-Hoover theorem for exchangeable arrays

[Figure: array entries X_{1,1}, . . . , X_{4,4} generated from θ, row variables U_1, . . . , U_4, and column variables V_1, . . . , V_4.]


Q: Is the Aldous-Hoover theorem computable?
A: No.

Theorem (Avigad, Freer, R., and Rute). There is an exchangeable array X with a computable distribution but no a.e. computable f satisfying Aldous-Hoover.

Even “computationally universal” probabilistic programming languages cannot represent certain conditional independence structure.


Computably-distributed array X, noncomputable f [AFRR]

The construction (an aliens dating site).

• Let rows/columns represent aliens. X_{i,j} = 1 means aliens i and j are matched.
• Each alien answers an infinitely-long questionnaire.
• Question k ∈ {1, 2, . . .} has 2^k possible answers.
• Aliens hate answering questionnaires, so they answer randomly.
• Two aliens are matched if they agree on ANY question.

Proof sketch.

• Note: f is “return 1 iff two aliens agree somewhere”.
• (f not a.e. computable) [Topological obstruction.] Given two questionnaires, one can't accurately check agreement in finite time.
• (array computably-distributed) The probability of agreeing on any question n, n + 1, . . . decays. Using only the first n questions yields an approximation (sketched below).

Approximating f sufficed to sample. Q: The converse?
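A rough Python simulation of the construction, truncated to the first n questions; checking only finitely many questions is exactly the approximation described in the proof sketch (the array size and truncation level below are arbitrary):

import random

def questionnaire(n):
    # Answers to the first n questions; question k has 2^k possible answers.
    return [random.randrange(2 ** k) for k in range(1, n + 1)]

def matched(a, b):
    # Approximate f: match iff the aliens agree on any of the first n questions.
    return 1 if any(x == y for x, y in zip(a, b)) else 0

def sample_array(num_aliens, n):
    answers = [questionnaire(n) for _ in range(num_aliens)]
    return [[matched(answers[i], answers[j]) for j in range(num_aliens)]
            for i in range(num_aliens)]

X = sample_array(6, n=20)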


Silver lining? f is always “nearly computable”

Let µ be a computable probability measure.

Definition. Say f is a.e. computable when we can compute f on a set of µ-measure one.

Definition (Kreisel-Lacombe (1957), Sanin (1968), Ko (1986)). Say f is computably measurable when, uniformly for any ε > 0, we can compute f on a set of µ-measure at least 1 − ε.

Theorem (Avigad, Freer, R., and Rute). The distribution of an exchangeable array X is computable if and only if there is a computably measurable function f satisfying Aldous-Hoover.


Exchangeability and probabilistic programming

Exchangeable random structures possess a lot of structure.

(Y_i) =_d (f(θ, U_i))

(X_{i,j}) =_d (f(θ, U_i, V_j, W_{i,j}))

Can your favorite PPL represent f?

Theorem (FR12). f a.e. computable for sequences.
Theorem (AFRR). f merely computably measurable for arrays.

Approximation essential for capturing structure.

But do such arrays appear in practice?


Do such arrays X appear in practice?

YES!

[Figure 2: MAP tree recovered from a data set including 60 objects from four domains (Food, Tools, Vehicles, Animals). MAP partitions for several features are shown: the model discovers, for example, that “is juicy” is associated with only one part of the tree. The weight of each edge in the tree is proportional to its vertical extent.]

[Figure 3: MAP tree recovered from 49 relations between entities in a biomedical data set (Diseases, Chemicals, . . .). Four relations are shown (rows and columns permuted to match in-order traversal of the MAP tree). Consider the circled subset of the t-c partition for causes: this block captures the knowledge that “chemicals” cause “diseases.” The Infinite Relational Model (IRM) does not capture the appropriate structure in the relation causes because it does not model the latent hierarchy, instead choosing a single partition to describe the structure across all relations.]


“OR-nonexch” — 2013/6/12 — 14:07 — page 13 — #13

13

Fig. 7: Typical directing random functions underlying, from left to right, 1) an IRM (where partitions correspond with a Chinese restaurant process) withconditionally i.i.d. link probabilities; 2) a more flexible variant of the IRM with merely exchangeable link probabilities as in Example IV.3; 3) a LFRM (wherepartitions correspond with an Indian buffet process) with feature-exchangeable link probabilities as in Example IV.10; 4) a Mondrian-process-based modelwith a single latent dimension; 5) a Gaussian-processed-based model with a single latent dimension. (Note that, in practice, one would use more than onelatent dimension in the last two examples, although this complicates visualization. In the first four figures, we have truncated each of the “stick-breaking”constructions at a finite depth, although, at the resolution of the figures, it is very difficult to notice the effect.)

Then, two objects represented by random variables U and U 0

are equivalent iff U, U 0 2 E(N) for some finite set N ⇢ N. Asbefore, we could consider a simple, cluster-based representingfunction where the block values are given by an (fN,M ),indexed now by finite subsets N, M ✓ N. Then fN,M woulddetermine how two objects relate when they possess featuresN and M , respectively.

However, if we want to capture the idea that the rela-tionships between objects depend on the individual featuresthe objects possess, we would not want to assume that theentries of fN,M formed an exchangeable array, as in the caseof a simple, cluster-based model. E.g., we might chooseto induce more dependence between fN,M and fN 0,M whenN \N 0 6= ; than otherwise. The following definition capturesthe appropriate relaxation of exchangeability:

Definition IV.9 (feature-exchangeable array). Let Y :=(YN,M ) be an array of random variables indexed by pairsN, M ✓ N of finite subsets. For a permutation ⇡ of N andN ✓ N, write ⇡(N) := {⇡(n) : n 2 N} for the image. Then,we say that Y is feature-exchangeable when

(YN,M )d= (Y⇡(N),⇡(M)), (IV.7)

for all permutations ⇡ of N. /

Informally, an array Y indexed by sets of features is feature-exchangeable if its distribution is invariant to permutations ofthe underlying feature labels (i.e., of N). The following is anexample of a feature-exchangeable array, which we will usewhen we re-describe the Latent Feature Relational Model inthe language of feature-based models:

Example IV.10 (feature-exchangeable link probabilities). Letw := (wij) be a conditionally i.i.d. array of random variablesin R, and define ✓ := (✓N,M ) by

✓N,M = sig(P

i2N

Pj2M wij), (IV.8)

where sig : R ! [0, 1] maps real values to probabilities via,e.g., the sigmoid or probit functions. It is straightforward toverify that ✓ is feature-exchangeable. /

We can now define simple feature-based models:

Definition IV.11. We say that a Bayesian model of an exchangeable array X is simple feature-based when, for some random function F representing X, there are random feature allocations B and C of the unit interval [0, 1] such that, for every pair N, M ⊆ ℕ of finite subsets, F takes the constant value f_{N,M} on the block

A_{N,M} := B(N) × C(M) × [0, 1],    (IV.9)

and the values f := (f_{N,M}) themselves form a feature-exchangeable array, independent of B and C. We say an array is simple feature-based if its distribution is. /
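The representing function of Definition IV.11 can also be sketched directly. The construction below is only illustrative and rests on assumptions of this sketch (interval-based feature allocations, a fixed truncation depth, the same allocation reused for rows and columns, and block values built as in Example IV.10): a point u ∈ [0,1] possesses feature n when it falls in that feature's subinterval, and F is then constant on each block B(N) × C(M) × [0,1]; the third coordinate is suppressed since F does not depend on it.

import numpy as np

rng = np.random.default_rng(2)

depth = 6
lengths = rng.beta(2.0, 1.0, size=depth).cumprod()   # decreasing feature "sizes"
starts = rng.random(depth)                            # independent offsets in [0, 1]

def features_of(u):
    # Finite set N of features possessed by the point u in [0, 1].
    return frozenset(n for n in range(depth)
                     if (u - starts[n]) % 1.0 < lengths[n])

# Feature-exchangeable block values f_{N,M}, here taken from Example IV.10.
w = rng.normal(size=(depth, depth))

def f(N, M):
    total = sum(w[i, j] for i in N for j in M)
    return 1.0 / (1.0 + np.exp(-total))

def F(u, v):
    # Representing function: constant on each block A_{N,M} = B(N) x C(M) x [0, 1].
    return f(features_of(u), features_of(v))

print(F(rng.random(), rng.random()))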

We can relate this definition back to cluster-based models by pointing out that simple feature-based arrays are simple cluster-based arrays when either i) the feature allocations are partitions or ii) the array f is exchangeable. The latter case highlights the fact that feature-based arrays relax the exchangeability assumption of the underlying block values.

As in the case of simple cluster-based models, nonparametric simple feature-based models will place positive mass on feature allocations with an arbitrary number of distinct sets. As we did with general cluster-based models, we will define general feature-based models as randomizations of simple models:

Definition IV.12 (feature-based models). We say that a Bayesian model for an exchangeable array X := (X_{ij}) in X is feature-based when X is a P-randomization of a simple, feature-based, exchangeable array θ := (θ_{ij}) taking values in a space T, for some probability kernel P from T to X. We say an array is feature-based when its distribution is. /

Comparing Definitions IV.5 and IV.12, we see that the relationship between the random functions representing θ and X is the same as with cluster-based models. We now return to the LFRM model, and describe it in the language of feature-based models:

Example IV.13 (Latent Feature Relational Model continued). The random feature allocations underlying the LFRM can be described in terms of so-called “stick-breaking” constructions of the Indian buffet process. One of the simplest stick-breaking constructions, and the one we will use here, is due to Teh, Gorur, and Ghahramani [61]. (See also [63], [52] and [53].)

Let W_1, W_2, . . . be an i.i.d. sequence of Beta(α, 1) random variables for some concentration parameter α > 0. For every n, we define P_n := ∏_{j=1}^{n} W_j. (The relationship between
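Although the excerpt is cut off here, the construction described so far can be sketched end to end. The following is a minimal sketch under stated assumptions (a finite truncation depth, Gaussian weights as in Example IV.10, and a Bernoulli kernel for the P-randomization of Definition IV.12); none of these choices are prescribed by the text above. It draws the stick-breaking weights W_j ~ Beta(α, 1), sets P_n = ∏_{j≤n} W_j, lets each object possess feature n independently with probability P_n, and links objects through feature-exchangeable probabilities.

import numpy as np

rng = np.random.default_rng(1)

def lfrm_style_sample(n_objects=8, alpha=2.0, depth=15):
    # Stick-breaking: W_j ~ Beta(alpha, 1), P_n = prod_{j <= n} W_j.
    W = rng.beta(alpha, 1.0, size=depth)
    P = np.cumprod(W)

    # Each object possesses feature n independently with probability P_n.
    Z = (rng.random((n_objects, depth)) < P).astype(float)

    # Feature-exchangeable link probabilities (Example IV.10).
    w = rng.normal(size=(depth, depth))
    theta = 1.0 / (1.0 + np.exp(-(Z @ w @ Z.T)))

    # A Bernoulli P-randomization of theta gives the observed array X.
    X = (rng.random((n_objects, n_objects)) < theta).astype(int)
    return X, theta, Z

X, theta, Z = lfrm_style_sample()
print(X)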

Roy and Teh. The Mondrian process. (2009)

Orbanz and Roy (2014).

Lloyd, Orbanz, Ghahramani, and Roy (2012).

Savova, Roy, Schmidt, and Tenenbaum (2007).

Roy, Kemp, Mansinghka, and Tenenbaum (2007)

√ a.e. computable f     × merely computably measurable f

√ Infinite Relational Model: Dirichlet process (Kemp, Tenenbaum, Griffiths, Yamada, and Ueda 2008)

√ Linear Relational Model: Mondrian process (R. and Teh 2009)

× Infinite Feature Relational Model: Beta process (Miller, Griffiths, and Jordan 2010)

× Random Function Model: Gaussian process (Lloyd, Orbanz, R., and Ghahramani 2012)

Daniel Roy, Cambridge Conditional Independence, Computability, and Measurability 29/31

Do such arrays X appear in practice?

YES!

[Figure: heat map and tree over 60 objects in four domains (Food, Tools, Vehicles, Animals), with feature partitions for “a tool”, “an animal”, “made of metal”, “is edible”, “has a handle”, “is juicy”, “is fast”, “eaten in salads”, “is white”, “has 4 wheels”, and “has 4 legs”.]

Figure 2: MAP tree recovered from a data set including 60 objects from four domains. MAP partitions for several features are shown: the model discovers, for example, that “is juicy” is associated with only one part of the tree. The weight of each edge in the tree is proportional to its vertical extent.

[Figure: relational arrays over 30 UMLS entities grouped into Diseases and Chemicals (e.g., Acquired Abnormality, Organ Function, Disease or Syndrome, Sign or Symptom, Cell, Steroid, Hormone, Enzyme, Antibiotic), for the relations analyzes, affects, process of, causes, and causes (IRM).]

Figure 3: MAP tree recovered from 49 relations between entities in a biomedical data set. Four relations are shown (rows and columns permuted to match in-order traversal of the MAP tree). Consider the circled subset of the t-c partition for causes. This block captures the knowledge that “chemicals” cause “diseases.” The Infinite Relational Model (IRM) does not capture the appropriate structure in the relation cause because it does not model the latent hierarchy, instead choosing a single partition to describe the structure across all relations.

representative features. The model discovers that some features are associated only with certain parts of the tree: “is juicy” is associated with the fruits, and “is metal” is associated with the man-made items. Discovering domains is a fundamental cognitive problem that may be solved early in development [11], but that is ignored by many cognitive models, which consider only carefully chosen data from a single domain (e.g. data including only animals and only biological features). By organizing the 60 objects into domains and identifying a subset of features that are associated with each domain, our model begins to suggest how infants may parse their environment into coherent domains of objects and features.

Talk Outline

1. The stochastic inference problem

2. Where are we now in probabilistic programming?

3. Approximability and Exchangeability: When can we represent conditional independence?

4. Conclusion

Daniel Roy, Cambridge Conditional Independence, Computability, and Measurability 30/31

Conclusion

a.e. computability PPLs → computable measurability PPLs

1. One can see the gap in the literature.
Key stochastic processes are merely computably measurable.

2. How do we use such representations?
Exact-approximate inference and computable measurability?

3. Need new programming language constructs.
Naïvely, we would need to thread ε's everywhere in the program.
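One illustrative reading of point 3, using hypothetical names and a naive accuracy split (an assumption of this sketch, not a construct from the talk): if every sampler takes an accuracy budget ε and must divide it among its sub-computations, the ε-bookkeeping spreads through the whole program; this is the overhead a dedicated language construct would hide.

import math
import random

def approx_uniform(eps):
    # Dyadic approximation of Uniform[0, 1]: use enough bits that the
    # truncation error is below eps.
    n_bits = max(1, math.ceil(math.log2(1.0 / eps)))
    return sum(random.getrandbits(1) * 2.0 ** -(k + 1) for k in range(n_bits))

def approx_pair(eps):
    # A composite sampler naively threads eps: each child gets eps / 2.
    return approx_uniform(eps / 2), approx_uniform(eps / 2)

print(approx_pair(1e-3))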

Daniel Roy, Cambridge Conditional Independence, Computability, and Measurability 31/31
