
Recent developments in imprecise probabilities and probabilistic graphical models

Gert de Cooman

Ghent University, SYSTeMS

[email protected]
users.UGent.be/~gdcooma

gertekoo.wordpress.com

ECAI 2012, 31 August 2012

What would I like to achieve and convey?

IMPRECISE PROBABILITIES

PROBABILISTIC GRAPHICAL MODELS

IMPRECISE PROBABILITY MODELS

Credal sets

Mass functions and expectations

Assume we are uncertain about:
– the value of a variable $X$
– in a finite set of possible values $\mathcal{X}$.

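This is usually modelled by a probability mass function $p$ on $\mathcal{X}$:

$$p(x) \geq 0 \quad\text{and}\quad \sum_{x\in\mathcal{X}} p(x) = 1.$$

With $p$ we can associate a prevision/expectation operator $P_p$:

$$P_p(f) := \sum_{x\in\mathcal{X}} p(x) f(x) \quad\text{where } f\colon \mathcal{X}\to\mathbb{R}.$$

If $A\subseteq\mathcal{X}$ is an event, then its probability is given by

$$P_p(A) = \sum_{x\in A} p(x) = P_p(I_A),$$

where $I_A$ is the indicator of $A$.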
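As a minimal sketch of these definitions, the following Python fragment computes a prevision and an event probability for a three-element space. The helper names `prevision` and `indicator`, and the numbers used, are illustrative assumptions and do not come from the slides.

```python
from fractions import Fraction

def prevision(p, f):
    """Expectation P_p(f) = sum over x of p(x) * f(x).

    p and f are dicts keyed by the elements of the possibility space.
    """
    return sum(p[x] * f[x] for x in p)

def indicator(event, space):
    """Indicator gamble I_A: 1 on the event A, 0 elsewhere."""
    return {x: 1 if x in event else 0 for x in space}

# Hypothetical example on the space {a, b, c}.
space = ["a", "b", "c"]
p = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
f = {"a": 1, "b": -2, "c": 4}

expectation_f = prevision(p, f)                        # P_p(f)
prob_a_or_c = prevision(p, indicator({"a", "c"}, space))  # P_p(A) = P_p(I_A)
```

Exact fractions are used so that the identity $P_p(A) = P_p(I_A)$ can be checked without rounding error.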

The simplex of all probability mass functions

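Consider the simplex $\Sigma_{\mathcal{X}}$ of all mass functions on $\mathcal{X}$:

$$\Sigma_{\mathcal{X}} := \Big\{ p \in \mathbb{R}^{\mathcal{X}}_{+} : \sum_{x\in\mathcal{X}} p(x) = 1 \Big\}.$$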

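[Figure: the simplex $\Sigma_{\mathcal{X}}$ for a three-element set with elements a, b and c, with vertices (1,0,0), (0,1,0) and (0,0,1); a second panel marks the mass function $p_u$ inside the simplex.]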

Credal sets

Definition
A credal set $\mathcal{M}$ is a convex closed subset of $\Sigma_{\mathcal{X}}$.

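[Figure: four examples of credal sets $\mathcal{M}$ drawn inside the simplex with vertices a, b and c.]

A credal set is completely characterised by its set of extreme points $\operatorname{ext}(\mathcal{M})$.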

Conditioning and credal sets

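Suppose we have two variables $X_1$ in $\mathcal{X}_1$ and $X_2$ in $\mathcal{X}_2$.

A credal set for $(X_1,X_2)$ jointly is a convex closed set of joint mass functions $p(x_1,x_2)$:

$$\mathcal{M} \subseteq \Sigma_{\mathcal{X}_1\times\mathcal{X}_2}.$$

This gives rise to a conditional model by applying Bayes’s Rule to each mass function:

$$\mathcal{M}|x_2 := \{ p(\cdot|x_2) : p \in \mathcal{M} \}.$$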

Working with extreme points does the job too.
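The conditioning step can be sketched in Python by applying Bayes's Rule to each extreme point of a joint credal set. This is an illustration under stated assumptions: the function name `condition_on_x2` and the two joint mass functions are hypothetical, and every extreme point is assumed to give the conditioning value positive probability. The conditioned points generate the conditional credal set, although some of them may fail to be extreme.

```python
from fractions import Fraction as F

def condition_on_x2(p, x2):
    """Bayes's Rule: return p(.|x2) for a joint mass function p.

    p is a dict mapping pairs (x1, x2) to probabilities.
    """
    marginal = sum(prob for (_, b), prob in p.items() if b == x2)
    if marginal == 0:
        raise ValueError("conditioning value has probability zero")
    return {a: prob / marginal for (a, b), prob in p.items() if b == x2}

# Hypothetical extreme points of a joint credal set on {0,1} x {0,1}.
ext_joint = [
    {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 4)},
    {(0, 0): F(1, 2), (0, 1): F(1, 8), (1, 0): F(1, 4), (1, 1): F(1, 8)},
]

# Generators of the conditional credal set M|x2 for x2 = 0.
conditioned = [condition_on_x2(p, 0) for p in ext_joint]
```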

Independence and credal sets


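Marginal models are credal sets for $X_1$ and $X_2$ separately:

$$\mathcal{M}_1 \subseteq \Sigma_{\mathcal{X}_1} \quad\text{and}\quad \mathcal{M}_2 \subseteq \Sigma_{\mathcal{X}_2}.$$

Their strong product is the joint credal set:

$$\mathcal{M}_1 \boxtimes \mathcal{M}_2 := \operatorname{CCH}\big(\{p_1\cdot p_2 : p_1\in\mathcal{M}_1 \text{ and } p_2\in\mathcal{M}_2\}\big),$$

where CCH is taken to denote the closed convex hull.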

This leads to a notion of strong independence.
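A small Python sketch of the strong product: when both marginal credal sets are given by finitely many extreme points, the products of those extreme points generate the strong product through their convex hull. The function names and the numerical marginals below are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import product

def product_mass(p1, p2):
    """Product mass function (p1 . p2)(x1, x2) = p1(x1) * p2(x2)."""
    return {(x1, x2): p1[x1] * p2[x2] for x1 in p1 for x2 in p2}

def strong_product_generators(ext1, ext2):
    """All products of an extreme point of M1 with an extreme point of M2."""
    return [product_mass(p1, p2) for p1, p2 in product(ext1, ext2)]

# Hypothetical extreme points of two marginal credal sets on {0, 1}.
ext1 = [{0: F(1, 4), 1: F(3, 4)}, {0: F(1, 2), 1: F(1, 2)}]
ext2 = [{0: F(1, 3), 1: F(2, 3)}, {0: F(2, 3), 1: F(1, 3)}]

generators = strong_product_generators(ext1, ext2)
```

The sketch only lists the generating product mass functions; it does not compute the convex hull itself.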

Lower previsions

Lower and upper previsions

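[Figure: a credal set in the simplex $\Sigma_{\mathcal{X}}$ with vertices a, b and c, illustrating the lower probability $\underline{P}(I_{\{c\}}) = 1/4$ and the upper probability $\overline{P}(I_{\{c\}}) = 4/7$.]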

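Equivalent model
Consider the set $\mathcal{L}(\mathcal{X}) = \mathbb{R}^{\mathcal{X}}$ of all real-valued maps on $\mathcal{X}$. We define two real functionals on $\mathcal{L}(\mathcal{X})$: for all $f\colon\mathcal{X}\to\mathbb{R}$,

$$\underline{P}_{\mathcal{M}}(f) = \min\{P_p(f) : p\in\mathcal{M}\} \quad\text{(lower prevision/expectation)},$$

$$\overline{P}_{\mathcal{M}}(f) = \max\{P_p(f) : p\in\mathcal{M}\} \quad\text{(upper prevision/expectation)}.$$

Observe that $\overline{P}_{\mathcal{M}}(f) = -\underline{P}_{\mathcal{M}}(-f)$.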
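Because an expectation is linear in the mass function, the minimum and maximum over a credal set with finitely many extreme points are attained at those extreme points. The following Python sketch uses this to compute lower and upper previsions and to check the conjugacy relation. The function names and the three extreme points are illustrative assumptions, not the credal set shown on the slide.

```python
from fractions import Fraction as F

def prevision(p, f):
    """Expectation of the gamble f under the mass function p."""
    return sum(p[x] * f[x] for x in p)

def lower_prevision(ext, f):
    """Lower envelope: minimum expectation over the extreme points."""
    return min(prevision(p, f) for p in ext)

def upper_prevision(ext, f):
    """Upper envelope: maximum expectation over the extreme points."""
    return max(prevision(p, f) for p in ext)

# Hypothetical extreme points of a credal set on {a, b, c}.
ext = [
    {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)},
    {"a": F(1, 4), "b": F(1, 4), "c": F(1, 2)},
    {"a": F(1, 4), "b": F(1, 2), "c": F(1, 4)},
]

ind_c = {"a": 0, "b": 0, "c": 1}            # indicator of the event {c}
neg_ind_c = {x: -v for x, v in ind_c.items()}

low = lower_prevision(ext, ind_c)
up = upper_prevision(ext, ind_c)
```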

Basic properties of lower previsions

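Definition
We call a real functional $\underline{P}$ on $\mathcal{L}(\mathcal{X})$ a lower prevision if it satisfies the following properties: for all $f$ and $g$ in $\mathcal{L}(\mathcal{X})$ and all real $\lambda \geq 0$:

1. $\underline{P}(f) \geq \min f$ [boundedness];
2. $\underline{P}(f+g) \geq \underline{P}(f) + \underline{P}(g)$ [super-additivity];
3. $\underline{P}(\lambda f) = \lambda\underline{P}(f)$ [non-negative homogeneity].

Theorem
A real functional $\underline{P}$ is a lower prevision if and only if it is the lower envelope of some credal set $\mathcal{M}$.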
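The easy direction of the theorem can be checked directly. The derivation below is a sketch, not taken from the slides: it verifies that the lower envelope of a credal set $\mathcal{M}$ satisfies the three properties, using only the linearity of each $P_p$ and the fact that every $p$ sums to one.

```latex
\begin{align*}
\underline{P}_{\mathcal{M}}(f)
  &= \min_{p\in\mathcal{M}} \sum_{x\in\mathcal{X}} p(x) f(x)
   \geq \min_{p\in\mathcal{M}} \sum_{x\in\mathcal{X}} p(x) \min f
   = \min f, \\
\underline{P}_{\mathcal{M}}(f+g)
  &= \min_{p\in\mathcal{M}} \bigl( P_p(f) + P_p(g) \bigr)
   \geq \min_{p\in\mathcal{M}} P_p(f) + \min_{p\in\mathcal{M}} P_p(g)
   = \underline{P}_{\mathcal{M}}(f) + \underline{P}_{\mathcal{M}}(g), \\
\underline{P}_{\mathcal{M}}(\lambda f)
  &= \min_{p\in\mathcal{M}} \lambda P_p(f)
   = \lambda \min_{p\in\mathcal{M}} P_p(f)
   = \lambda \underline{P}_{\mathcal{M}}(f)
   \qquad \text{for } \lambda \geq 0.
\end{align*}
```

The inequality in the second line is where imprecision enters: the minimising $p$ for $f$ and for $g$ may differ, so the lower prevision is in general only super-additive.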

Conditioning and lower previsions


Consider for instance:
– a joint lower prevision $\underline{P}_{1,2}$ for $(X_1,X_2)$ defined on $\mathcal{L}(\mathcal{X}_1\times\mathcal{X}_2)$;
– a conditional lower prevision $\underline{P}_2(\cdot|x_1)$ for $X_2$ conditional on $X_1 = x_1$, defined on $\mathcal{L}(\mathcal{X}_2)$, for all values $x_1\in\mathcal{X}_1$.

Coherence
These lower previsions $\underline{P}_{1,2}$ and $\underline{P}_2(\cdot|X_1)$ must satisfy certain (joint) coherence criteria: compare with Bayes’s Rule and de Finetti’s coherence criteria for precise previsions.

See the web site of SIPTA (www.sipta.org) for pointers to more details.

Independence and lower previsions


Definition (Epistemic irrelevance)
$X_1$ is epistemically irrelevant to $X_2$ when learning the value of $X_1$ does not change our beliefs about $X_2$:

$$\underline{P}_{1,2}(f(X_2)) = \underline{P}_2(f(X_2)|x_1) \quad\text{for all } f\in\mathcal{L}(\mathcal{X}_2) \text{ and all } x_1\in\mathcal{X}_1.$$

Important:
Epistemic irrelevance is not a symmetrical notion! It is weaker than strong independence.

Epistemic independence (also weaker) is the symmetrised version.

Sets of desirable gambles

First steps: Peter Walley (2000)

@ARTICLE{walley2000,
  author = {Walley, Peter},
  title = {Towards a unified theory of imprecise probability},
  journal = {International Journal of Approximate Reasoning},
  year = 2000,
  volume = 24,
  pages = {125--148}
}

First steps: Peter Williams (1977)

@ARTICLE{williams2007,
  author = {Williams, Peter M.},
  title = {Notes on conditional previsions},
  journal = {International Journal of Approximate Reasoning},
  year = 2007,
  volume = 44,
  pages = {366--383}
}

Set of desirable gambles as a belief model

Gambles:
A gamble $f\colon\mathcal{X}\to\mathbb{R}$ is an uncertain reward whose value is $f(X)$.

Set of desirable gambles:
$\mathcal{D}\subseteq\mathcal{L}(\mathcal{X})$ is a set of gambles that a subject strictly prefers to zero.

Why work with sets of desirable gambles?

Working with sets of desirable gambles $\mathcal{D}$:
– is simpler, more intuitive and more elegant
– is more general and expressive than (conditional) lower previsions
– gives a geometrical flavour to probabilistic inference
– includes precise probability as one special case
– includes classical propositional logic as another special case
– shows that probabilistic inference and Bayes’ Rule are ‘logical’ inference
– avoids problems with conditioning on sets of probability zero

Most comprehensive approach so far: note on arXiv

Introduction to Imprecise Probabilities

@BOOK{troffaes2012,
  title = {Introduction to Imprecise Probabilities},
  publisher = {Wiley},
  editor = {Augustin, Thomas and Coolen, Frank P. A. and De Cooman, Gert and Troffaes, Matthias C. M.},
  note = {Due end 2012},
}

IMPRECISE-PROBABILISTIC GRAPHICAL MODELS

Credal sets

Credal networks: the special case of a tree

Basic concept
Consider a directed tree $T$, with a variable $X_t$ attached to each node $t\in T$.

[Figure: a directed tree with nodes $X_1, \dots, X_{11}$.]

Each variable $X_t$ assumes values in a set $\mathcal{X}_t$.

Credal trees: local uncertainty models

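Local uncertainty model associated with each node $t$
For each possible value $x_{m(t)}\in\mathcal{X}_{m(t)}$ of the mother variable $X_{m(t)}$, we have a local conditional credal set

$$\mathcal{M}_{t|X_{m(t)}}$$

which is a collection of credal sets

$$\mathcal{M}_{t|x_{m(t)}} \subseteq \Sigma_{\mathcal{X}_t} \quad\text{for each } x_{m(t)}\in\mathcal{X}_{m(t)}.$$

[Figure: the mother node $X_{m(t)}$ with its children $X_s, \dots, X_t, \dots, X_{s'}$.]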

Interpretation of the graphical structure

The graphical structure is interpreted as follows:
Conditional on the mother variable, the non-parent non-descendants of each node variable are strongly independent of it and its descendants.

[Figure: the same directed tree with nodes $X_1, \dots, X_{11}$.]

Lower previsions

Credal trees: local uncertainty models

Local uncertainty model associated with each node $t$
For each possible value $x_{m(t)}\in\mathcal{X}_{m(t)}$ of the mother variable $X_{m(t)}$, we have a conditional lower prevision/expectation

$$Q_t(\cdot|x_{m(t)})\colon \mathcal{L}(\mathcal{X}_t)\to\mathbb{R}$$

where

$$Q_t(f|x_{m(t)}) = \text{lower prevision of } f(X_t), \text{ given that } X_{m(t)} = x_{m(t)}.$$

The local model $Q_t(\cdot|X_{m(t)})$ is a conditional lower prevision operator.

[Figure: the mother node $X_{m(t)}$ with its children $X_s, \dots, X_t, \dots, X_{s'}$.]

Interpretation of the graphical structure

The graphical structure is interpreted as follows:
Conditional on the mother variable, the non-parent non-descendants of each node variable are epistemically irrelevant to it and its descendants.

[Figure: the same directed tree with nodes $X_1, \dots, X_{11}$.]

@ARTICLE{cooman2010,
  author = {{d}e Cooman, Gert and Hermans, Filip and Antonucci, Alessandro and Zaffalon, Marco},
  title = {Epistemic irrelevance in credal nets: the case of imprecise {M}arkov trees},
  journal = {International Journal of Approximate Reasoning},
  year = 2010,
  volume = 51,
  pages = {1029--1052},
  doi = {10.1016/j.ijar.2010.08.011}
}

MePICTIr for updating a credal tree

For a credal tree we can find the joint model from the local models recursively, from leaves to root.
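The leaves-to-root idea can be illustrated on the simplest tree, a chain $X_1 \to X_2 \to \dots \to X_n$. The Python sketch below is not the MePICTIr algorithm: it only computes the lower prevision of a gamble on the last variable by iterating the local lower previsions backwards, which is the familiar law of iterated lower expectations. The data layout (local credal sets given by lists of extreme points), the function names and the numbers are all assumptions made for illustration.

```python
from fractions import Fraction as F

def lower(ext, g):
    """Lower prevision of the gamble g over a list of extreme points."""
    return min(sum(p[x] * g[x] for x in p) for p in ext)

def chain_lower_prevision(root_ext, cond_ext_list, f):
    """Lower prevision of f(X_n) in a chain X_1 -> ... -> X_n.

    root_ext: extreme points of the local model for X_1.
    cond_ext_list[k][x]: extreme points of the local model for X_{k+2},
        given that its mother X_{k+1} takes the value x.
    f: gamble on the last variable X_n.
    """
    g = f
    # Work from the leaf back to the root, replacing the gamble on a
    # child by its conditional lower prevision, a gamble on the mother.
    for cond_ext in reversed(cond_ext_list):
        g = {x: lower(ext, g) for x, ext in cond_ext.items()}
    return lower(root_ext, g)

# Hypothetical two-node chain with binary variables.
root_ext = [{0: F(1, 2), 1: F(1, 2)}, {0: F(1, 4), 1: F(3, 4)}]
cond = {
    0: [{0: F(3, 4), 1: F(1, 4)}, {0: F(1, 2), 1: F(1, 2)}],
    1: [{0: F(1, 4), 1: F(3, 4)}],
}
f = {0: 0, 1: 1}   # indicator of the event X_2 = 1

result = chain_lower_prevision(root_ext, [cond], f)
```

Each backward step costs one local optimisation per mother value, so the work grows linearly with the length of the chain in this simplified setting.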

Exact message passing algorithm

– credal tree treated as an expert system
– linear complexity in the number of nodes

Python code

– written by Filip Hermans
– testing and connection with strong independence in cooperation with Marco Zaffalon and Alessandro Antonucci

Current (toy) applications in HMMs
character recognition, air traffic trajectory tracking and identification, earthquake rate prediction

@INPROCEEDINGS{cooman2011,
  author = {De Bock, Jasper and {d}e Cooman, Gert},
  title = {Imprecise probability trees: Bridging two theories of imprecise probability},
  booktitle = {ISIPTA ’09 -- Proceedings of the 6th International Symposium on Imprecise Probability: Theories and Applications},
  year = 2009,
  editor = {Coolen, Frank P. A. and {d}e Cooman, Gert and Fetz, Thomas and Oberguggenberger, Michael},
  address = {Innsbruck, Austria},
  publisher = {SIPTA}
}

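An HMM is a special credal tree

[Figure: a hidden Markov model. State sequence: $X_1 \to X_2 \to \dots \to X_k \to \dots \to X_n$, with local models $Q_1(\cdot)$, $Q_2(\cdot|X_1)$, ..., $Q_k(\cdot|X_{k-1})$, ..., $Q_n(\cdot|X_{n-1})$. Output sequence: $O_1, O_2, \dots, O_k, \dots, O_n$, where each output $O_k$ is attached to the state $X_k$ through the local model $S_k(\cdot|X_k)$.]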

Maximal state sequences

Classically (Viterbi):
Find the state sequence $\hat{x}_{1:n}$ that maximises the posterior probability $p(x_{1:n}|o_{1:n})$ corresponding to a given observation sequence $o_{1:n}$.
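For reference, the classical precise case can be sketched as a textbook Viterbi recursion in Python. The parameter names `init`, `trans` and `emit` and the example numbers are illustrative assumptions. The recursion maximises the joint probability $p(x_{1:n}, o_{1:n})$, which has the same maximiser as the posterior because $p(o_{1:n})$ does not depend on the state sequence.

```python
from fractions import Fraction as F

def viterbi(init, trans, emit, obs):
    """Most probable state sequence of a precise HMM for the outputs obs.

    init[x]: probability of the initial state x.
    trans[x][y]: probability of moving from state x to state y.
    emit[x][o]: probability of output o in state x.
    Returns (path, joint probability of path and obs).
    """
    states = list(init)
    # best[x] = (probability of the best path ending in x, that path)
    best = {x: (init[x] * emit[x][obs[0]], [x]) for x in states}
    for o in obs[1:]:
        new_best = {}
        for y in states:
            prob, path = max(
                ((best[x][0] * trans[x][y], best[x][1]) for x in states),
                key=lambda t: t[0],
            )
            new_best[y] = (prob * emit[y][o], path + [y])
        best = new_best
    prob, path = max(best.values(), key=lambda t: t[0])
    return path, prob

# Hypothetical two-state HMM with outputs 'a' and 'b'.
init = {0: F(3, 5), 1: F(2, 5)}
trans = {0: {0: F(7, 10), 1: F(3, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}
emit = {0: {"a": F(9, 10), "b": F(1, 10)}, 1: {"a": F(1, 5), "b": F(4, 5)}}

path, prob = viterbi(init, trans, emit, ["a", "b", "b"])
```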

Maximality (under robust ordering):
Define a partial order $>$ on state sequences:

$$\hat{x}_{1:n} > x_{1:n} \iff p(\hat{x}_{1:n}|o_{1:n}) > p(x_{1:n}|o_{1:n}) \text{ for all compatible } p(\cdot|o_{1:n}).$$

Find the state sequences $\hat{x}_{1:n}$ that are maximal: undominated by any other state sequence.
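Maximality can be illustrated by brute force in Python. The sketch below makes a strong simplifying assumption: the set of compatible models is replaced by a finite list of precise HMMs, whereas a genuine imprecise HMM has infinitely many. Comparing joint probabilities $p(x_{1:n}, o_{1:n})$ is equivalent to comparing posteriors whenever $p(o_{1:n}) > 0$. The function names and the two models are hypothetical, and the enumeration is exponential in the sequence length, so this is a definition check and not an efficient algorithm.

```python
from fractions import Fraction as F
from itertools import product

def joint(model, seq, obs):
    """Joint probability of the state sequence seq and the outputs obs."""
    init, trans, emit = model
    prob = init[seq[0]] * emit[seq[0]][obs[0]]
    for prev, cur, o in zip(seq, seq[1:], obs[1:]):
        prob *= trans[prev][cur] * emit[cur][o]
    return prob

def maximal_sequences(models, states, obs):
    """State sequences that no other sequence dominates under all models."""
    seqs = list(product(states, repeat=len(obs)))

    def dominates(a, b):
        # a > b iff a is strictly more probable than b under every model.
        return all(joint(m, a, obs) > joint(m, b, obs) for m in models)

    return [s for s in seqs if not any(dominates(t, s) for t in seqs)]

# Two hypothetical precise HMMs that differ only in one transition row.
init = {0: F(3, 5), 1: F(2, 5)}
emit = {0: {"a": F(9, 10), "b": F(1, 10)}, 1: {"a": F(1, 5), "b": F(4, 5)}}
trans_a = {0: {0: F(7, 10), 1: F(3, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}
trans_b = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}
model_a = (init, trans_a, emit)
model_b = (init, trans_b, emit)

maximal = maximal_sequences([model_a, model_b], [0, 1], ["a", "b"])
```

With a single model the ordering is total (barring ties) and the maximal set reduces to the Viterbi solution; with several models, sequences that are incomparable both survive.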

ESTIHMM for finding all maximal state sequences

Exact backward-forward algorithm

– developed by Jasper De Bock
– finds all maximal state sequences that correspond to a given observation sequence
– quadratic complexity in the number of nodes [linear]
– cubic complexity in the number of states [quadratic]
– linear complexity in the number of maximal sequences [linear]

Python code

– written by Jasper De Bock

Current (toy) applications in HMMs
character recognition, finding gene islands

Sets of desirable gambles

@ARTICLE{moral2005,
  author = {Moral, Serafín},
  title = {Epistemic irrelevance on sets of desirable gambles},
  journal = {Annals of Mathematics and Artificial Intelligence},
  year = 2005,
  volume = 45,
  pages = {197--214},
  doi = {10.1007/s10472-005-9011-0}
}

Most comprehensive approach so far: note on arXiv
