
1

Logical Bayesian Networks
A knowledge representation view on Probabilistic Logical Models

Daan Fierens, Hendrik Blockeel, Jan Ramon, Maurice Bruynooghe

Katholieke Universiteit Leuven, Belgium

2

Probabilistic Logical Models

Variety of PLMs:
• Origin in Bayesian Networks (Knowledge Based Model Construction)
  • Probabilistic Relational Models
  • Bayesian Logic Programs
  • CLP(BN)
  • …
• Origin in Logic Programming
  • PRISM
  • Stochastic Logic Programs
  • …

THIS TALK focuses on PRMs and BLPs: the best known and most developed models, with existing learning work.

3

Combining PRMs and BLPs

PRMs:• + Easy to understand, intuitive• - Somewhat restricted (as compared to BLPs)

BLPs:• + More general, expressive• - Not always intuitive

Combine strengths of both models in one model ?

We propose Logical Bayesian Networks (PRMs+BLPs)

4

Overview of this Talk

• Example
• Probabilistic Relational Models
• Bayesian Logic Programs
• Combining PRMs and BLPs: Why and How?
• Logical Bayesian Networks

5

Example [Koller et al.]

University:
• students (IQ) and courses (rating)
• students take courses (grade)
• grade depends on IQ
• rating depends on the sum of the IQ’s

Specific situation:
• jeff takes ai, pete and rick take lp, no student takes db

6

Bayesian Network-structure

[Figure: the induced Bayesian network. Nodes: iq(jeff), iq(pete), iq(rick); grade(jeff,ai), grade(pete,lp), grade(rick,lp); rating(ai), rating(lp), rating(db). Each grade(S,C) has parent iq(S); each rating(C) has as parents the iq’s of the students taking C.]

7

PRMs [Koller et al.]

PRM: relational schema, dependency structure (+ aggregates + CPDs)

[Figure: relational schema and dependency structure.
Schema: Student(key, iq), Course(key, rating), Takes(key, student, course, grade).
Dependencies: Takes.grade depends on Student.iq (CPT); Course.rating depends on the iq values of the students taking the course (aggr + CPT).]

8

PRMs (2)

• Semantics: PRM induces a Bayesian network on the relational skeleton

Student:
  key   iq
  jeff  ?
  pete  ?
  rick  ?

Course:
  key  rating
  ai   ?
  lp   ?
  db   ?

Takes:
  key  student  course  grade
  f1   jeff     ai      ?
  f2   pete     lp      ?
  f3   rick     lp      ?

9

PRMs - BN-structure (3)

[Figure: the induced Bayesian network, identical to the one shown earlier: iq(S) is a parent of grade(S,C) for each Takes tuple, and the iq’s of a course’s students are parents of rating(C).]

10

PRMs: Pros & Cons (4)

+ Easy to understand and interpret
- Expressiveness, as compared to BLPs, …:
  • Not possible to combine selection and aggregation [Blockeel & Bruynooghe, SRL-workshop ‘03]
    • E.g. with an extra attribute sex for students: rating depends on the sum of the IQ’s of the female students
  • Specification of logical background knowledge?
    • (no functors, constants)

11

BLPs [Kersting, De Raedt]

Definite Logic Programs + Bayesian networks
• Bayesian predicates (with a range)
• Random variable = ground Bayesian atom: iq(jeff)
• BLP = clauses with a CPT

rating(C) | iq(S), takes(S,C).

Range: {low, high}
CPT + combining rule (can be anything)

• Semantics: a Bayesian network
  • random variables = ground atoms in the least Herbrand model
  • dependencies determined by the grounding of the BLP

12

BLPs (2)

student(pete)., …, course(lp)., …, takes(rick,lp).

rating(C) | iq(S), takes(S,C).

rating(C) | course(C).

grade(S,C) | iq(S), takes(S,C).

iq(S) | student(S).

BLPs do not distinguish probabilistic and logical/certain/structural knowledge
• Influence on the readability of the clauses
• What about the resulting Bayesian network?

13

BLPs - BN-structure (3)

• Fragment:

[Figure: network fragment with nodes student(jeff), takes(jeff,ai), iq(jeff), grade(jeff,ai); student(jeff) is a parent of iq(jeff). CPD for iq(jeff): if student(jeff) = true, the distribution for iq/1; if student(jeff) = false, ?]

14

BLPs - BN-structure (3)

• Fragment:

[Figure: the same fragment; takes(jeff,ai) and iq(jeff) are parents of grade(jeff,ai). CPD for grade(jeff,ai): if takes(jeff,ai) = true, a distribution for grade/2 as a function of iq(jeff); if takes(jeff,ai) = false, ?]

15

BLPs: Pros & Cons (4)

+ High expressiveness:
  • Definite Logic Programs (functors, …)
  • Can combine selection and aggregation (combining rules)
- Not always easy to interpret:
  • the clauses
  • the resulting Bayesian network

16

Combining PRMs and BLPs

Why?
• 1 model = intuitive + high expressiveness

How?
• Expressiveness (from BLPs):
  • Logic Programming
• Intuitive (from PRMs):
  • Distinguish probabilistic and logical/certain knowledge
  • Distinct components (PRMs: the schema determines the random variables / the dependency structure)
  • (General vs specific knowledge)

17

Logical Bayesian Networks

Probabilistic predicates (determine the random variables and their range) vs logical predicates

LBN components (PRM counterpart → LBN component):
• Relational schema → V
• Dependency structure → DE
• CPDs + aggregates → DI
• Relational skeleton → Logic Program Pl
  • Description of the domain of discourse / deterministic info

18

Logical Bayesian Networks

Semantics:
• An LBN induces a Bayesian network on the random variables determined by Pl and V

19

Normal Logic Program Pl

student(jeff).

course(ai).

takes(jeff,ai).

student(pete).

course(lp).

takes(pete,lp).

student(rick).

course(db).

takes(rick,lp).

Semantics: well-founded model WFM(Pl) (when no negation: least Herbrand model)
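Since this Pl consists only of ground facts, its well-founded model is simply that set of facts. A minimal Python sketch of this model, using a tuple encoding of ground atoms that is purely illustrative (not part of the LBN formalism) and that the later sketches reuse:

# Ground atoms as tuples: (predicate, arg1, arg2, ...).
# Pl contains only ground facts here, so WFM(Pl) = the set of those facts.
WFM = {
    ("student", "jeff"), ("student", "pete"), ("student", "rick"),
    ("course", "ai"), ("course", "lp"), ("course", "db"),
    ("takes", "jeff", "ai"), ("takes", "pete", "lp"), ("takes", "rick", "lp"),
}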

20

V

iq(S) <= student(S).

rating(C) <= course(C).

grade(S,C) <= takes(S,C).

Semantics: determines the random variables
• each ground probabilistic atom in WFM(Pl ∪ V) is a random variable
  • iq(jeff), …, rating(lp), …, grade(rick,lp)
• non-monotonic negation (not in PRMs, BLPs)
  • grade(S,C) <= takes(S,C), not(absent(S,C)).
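A hedged Python sketch of how these declarations determine the random variables, assuming the tuple-encoded model WFM from the previous slide (the function name is illustrative, not from the paper):

def random_variables(wfm):
    """Ground iq(S) <= student(S), rating(C) <= course(C),
    grade(S,C) <= takes(S,C) against the model."""
    rvs = set()
    for atom in wfm:
        if atom[0] == "student":            # iq(S) <= student(S).
            rvs.add(("iq", atom[1]))
        elif atom[0] == "course":           # rating(C) <= course(C).
            rvs.add(("rating", atom[1]))
        elif atom[0] == "takes":            # grade(S,C) <= takes(S,C).
            rvs.add(("grade", atom[1], atom[2]))
    return rvs

# For the model above, random_variables(WFM) contains ("iq", "jeff"), ...,
# ("rating", "lp"), ..., ("grade", "rick", "lp").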

21

DE

grade(S,C) | iq(S).

rating(C) | iq(S) <- takes(S,C).

Semantics: determines conditional dependencies

• ground instances with the context in WFM(Pl)
  • e.g. rating(lp) | iq(pete) <- takes(pete,lp)
  • e.g. rating(lp) | iq(rick) <- takes(rick,lp)
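A corresponding sketch of grounding DE against the same tuple-encoded model: every takes/2 fact yields the iq parent of one grade variable and one parent of a rating variable (again an illustrative encoding, not the authors' implementation):

def dependencies(wfm):
    """Ground grade(S,C) | iq(S). and rating(C) | iq(S) <- takes(S,C),
    keeping only dependencies between random variables that exist."""
    edges = set()                                    # (parent, child) pairs
    for atom in wfm:
        if atom[0] == "takes":
            s, c = atom[1], atom[2]
            edges.add((("iq", s), ("grade", s, c)))  # grade(S,C) | iq(S).
            edges.add((("iq", s), ("rating", c)))    # rating(C) | iq(S) <- takes(S,C).
    return edges

# e.g. both (("iq", "pete"), ("rating", "lp")) and (("iq", "rick"), ("rating", "lp"))
# are edges, so rating(lp) has parents iq(pete) and iq(rick).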

22

V + DE

iq(S) <= student(S).

rating(C) <= course(C).

grade(S,C) <= takes(S,C).

grade(S,C) | iq(S).

rating(C) | iq(S) <- takes(S,C).

23

LBNs - BN-structure

[Figure: the induced Bayesian network, with the same structure as in the PRM case: iq(S) is a parent of grade(S,C) for each takes(S,C) fact, and the iq’s of a course’s students are parents of rating(C); rating(db) has no parents.]

24

DI: the quantitative component

• ~ in PRMs: aggregates + CPDs
• ~ in BLPs: CPDs + combining rules

For each probabilistic predicate p a logical CPD:
• = a function with
  • input: a set of pairs (ground probabilistic atom, value)
  • output: a probability distribution for p

• Semantics: determines the CPDs for all variables about p

25

DI (2)

• e.g. for rating/1 (inputs are about iq/1)

If the sum of Val over the input pairs (iq(S), Val) > 1000
Then 0.7 high / 0.3 low
Else 0.5 high / 0.5 low

• Can be written as a logical probability tree (TILDE):

sum(Val, iq(S,Val), Sum), Sum > 1000
[tree leaves: 0.7 / 0.3 if the test succeeds, 0.5 / 0.5 otherwise]

• cf [Van Assche et al., SRL-workshop ‘04]

26

DI (3)

DI determines the CPDs
• e.g. the CPD for rating(lp) = a function of iq(pete) and iq(rick)

• Entry in CPD for iq(pete)=100 and iq(rick)=120 ?

• Apply logical CPD for rating/1 to {(iq(pete),100),(iq(rick),120)}

• Result: probability distribution 0.5 high / 0.5 low

If the sum of Val over the input pairs (iq(S), Val) > 1000
Then 0.7 high / 0.3 low
Else 0.5 high / 0.5 low
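A sketch of this logical CPD for rating/1 in Python, using the threshold and probabilities from the slide (the function name and the tuple encoding of atoms are illustrative):

def rating_cpd(parent_values):
    """Logical CPD for rating/1: input is a set of
    (ground probabilistic atom, value) pairs, here the iq/1 atoms of the
    students taking the course; output is a distribution over {high, low}."""
    total_iq = sum(val for atom, val in parent_values if atom[0] == "iq")
    if total_iq > 1000:
        return {"high": 0.7, "low": 0.3}
    return {"high": 0.5, "low": 0.5}

# Entry in the CPD of rating(lp) for iq(pete)=100, iq(rick)=120:
# rating_cpd({(("iq", "pete"), 100), (("iq", "rick"), 120)})
# -> {"high": 0.5, "low": 0.5}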

27

DI (4)

Combine selection and aggregation?
• e.g. rating depends on the sum of the IQ’s of the female students

sum(Val, (iq(S,Val), sex(S,fem)), Sum), Sum > 1000

[tree leaves: 0.7 / 0.3 if the test succeeds, 0.5 / 0.5 otherwise]

• again cf [Van Assche et al., SRL-workshop ‘04]
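A sketch of the selection-plus-aggregation variant of the previous CPD, assuming here that sex/2 is deterministic information available in the model of Pl; if sex were a probabilistic predicate, its atoms would instead appear among the input pairs:

def rating_cpd_female(parent_values, wfm):
    """Variant of the rating/1 CPD: aggregate only over female students."""
    total_iq = sum(val for atom, val in parent_values
                   if atom[0] == "iq" and ("sex", atom[1], "fem") in wfm)
    if total_iq > 1000:
        return {"high": 0.7, "low": 0.3}
    return {"high": 0.5, "low": 0.5}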

28

LBNs: Pros & Cons / Conclusion

+ Qualitative part (V + DE): easy to interpret
+ High expressiveness:
  • Normal Logic Programs (non-monotonic negation, functors, …)
  • Combining selection and aggregation
- Comes at a cost:
  • Quantitative part (DI) is more difficult (than for PRMs)

29

Future Work: Learning LBNs

Learning algorithms exist for PRMs & BLPs
• On a high level: an appropriate mix will probably do for LBNs
• LBNs vs PRMs: learning the quantitative component is more difficult for LBNs
• LBNs vs BLPs:
  • LBNs have the separation of V vs DE
  • LBNs: the distinction probabilistic vs logical predicates = a bias (but also used by BLPs in practice)

30

?