1
Parameter Related Domain Knowledge for
Learning in Bayesian Networks
Stefan Niculescu
PhD Candidate, Carnegie Mellon University
Joint work with Professor Tom Mitchell and Dr. Bharat Rao
April 2005
2
Domain Knowledge
• In the real world, data is often too sparse to build an accurate model
• Domain knowledge can help alleviate this problem
• Several types of domain knowledge:
  – Relevance of variables (feature selection)
  – Conditional independences among variables
  – Parameter Domain Knowledge
3
Parameter Domain Knowledge
• In a Bayes Net for a real-world domain:
  – there can be a huge number of parameters
  – there is not enough data to estimate them accurately
• Parameter Domain Knowledge constraints:
  – reduce the number of parameters to estimate
  – reduce the variance of the parameter estimates
4
Outline
• Motivation
Parameter Related Domain Knowledge
• Experiments
• Related Work
• Summary / Future Work
5
Parameters and Counts
CPT for variable Xi
Theorem. The Maximum Likelihood estimators are the observed frequencies:

θ̂ijk = Nijk / Σk′ Nijk′

where Nijk counts the cases in which Xi takes its k-th value while its parents are in their j-th configuration.
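The theorem above is the standard count-and-normalize result: each CPT entry is estimated by the fraction of matching cases among those with the same parent configuration. A minimal sketch in Python (the helper and variable names are mine, for illustration only):

```python
from collections import defaultdict

def mle_cpt(samples, child, parents):
    """Maximum-likelihood CPT entries: theta(x | pa) = N(x, pa) / N(pa).
    `samples` is a list of dicts mapping variable name -> value."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in samples:
        pa = tuple(s[p] for p in parents)
        counts[pa][s[child]] += 1          # N(x, pa)
    cpt = {}
    for pa, row in counts.items():
        n_pa = sum(row.values())           # N(pa)
        cpt[pa] = {x: n / n_pa for x, n in row.items()}
    return cpt

data = [
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
]
cpt = mle_cpt(data, "WetGrass", ["Rain", "Sprinkler"])
print(cpt[(1, 0)][1])  # 2 of the 3 cases with Rain=1, Sprinkler=0 have WetGrass=1
```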
6
Parameter Sharing
[Figure: two CPTs in which same-colored entries (g1, g2, c1, c2, c3, ...) are constrained to be equal; e.g. the group C1 = {c1, c2, c3} is shared across both tables.]

Theorem. The Maximum Likelihood estimators are obtained by pooling, for each shared parameter, the counts of all the cells constrained to be equal (formula in the original slide).
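For the simplest case of this theorem, where whole multinomial rows of the CPTs are tied, the ML estimate just pools the tied rows' counts before normalizing. A minimal sketch, assuming entire rows are shared (the code is mine, not from the slides):

```python
def mle_tied_rows(count_rows):
    """ML estimate when several multinomial rows are constrained to be
    identical: pool the counts across all tied rows, then normalize once."""
    pooled = {}
    for row in count_rows:
        for value, n in row.items():
            pooled[value] = pooled.get(value, 0) + n
    total = sum(pooled.values())
    return {v: n / total for v, n in pooled.items()}

# Two CPT rows tied by domain knowledge:
shared = mle_tied_rows([{"a": 3, "b": 1}, {"a": 5, "b": 3}])
print(shared)  # a: 8/12, b: 4/12
```

Pooling is exactly why sharing reduces variance: the shared parameter is estimated from the union of the tied cells' data.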
7
Incomplete Data, Frequentist
8
Dependent Dirichlet Priors
9
Bayesian Averaging
10
Hierarchical Parameter Sharing
[Figure: a tree over the parameters θ1, ..., θ5 illustrating hierarchical sharing — parameters are tied within nested groups at each level of the tree.]
11
Probability Mass Sharing
DK: Parameters of a given color have the same sum across all distributions.
[Figure: two rows of parameters θ11 ... θ1k and θ21 ... θ2k; same-colored groups have equal total mass in both rows.]
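Under this constraint the ML solution has a natural closed form: each group's shared mass is estimated from the pooled counts, and each distribution splits that mass among its own group members in proportion to its own counts. A sketch of my reconstruction of that closed form (not verbatim from the talk; the value sets per group may differ across distributions, as with English vs. Spanish nouns in the backup slides):

```python
def mle_mass_sharing(count_rows, groups):
    """count_rows: one dict value -> count per distribution.
    groups: group id -> one value set per distribution (sets may differ).
    Returns the shared group masses and the per-distribution estimates."""
    grand = sum(sum(r.values()) for r in count_rows)
    # Shared mass per group: pooled group counts / pooled total counts.
    mass = {}
    for g, value_sets in groups.items():
        mass[g] = sum(row.get(v, 0)
                      for row, vs in zip(count_rows, value_sets)
                      for v in vs) / grand
    # Each distribution splits a group's mass by its own local counts.
    thetas = []
    for i, row in enumerate(count_rows):
        theta = {}
        for g, value_sets in groups.items():
            vs = value_sets[i]
            n_g = sum(row.get(v, 0) for v in vs)
            for v in vs:
                theta[v] = mass[g] * row.get(v, 0) / n_g
        thetas.append(theta)
    return mass, thetas

english = {"dog": 4, "cat": 2, "run": 2}
spanish = {"perro": 3, "correr": 1, "ir": 2}
groups = {"nouns": [{"dog", "cat"}, {"perro"}],
          "verbs": [{"run"}, {"correr", "ir"}]}
mass, (th_en, th_sp) = mle_mass_sharing([english, spanish], groups)
print(mass["nouns"])  # pooled: (6 + 3) / 14
```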
12
Probability Ratio Sharing
DK: Parameters of a given color preserve their relative ratios across all distributions.

[Figure: two rows of parameters θ11 ... θ1k and θ21 ... θ2k; within each colored group, the ratios between parameters are the same in both rows.]
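Ratio sharing is the mirror image of mass sharing: now the within-group proportions are tied across distributions (so they are estimated from pooled counts), while each distribution keeps its own group mass (estimated from its own counts). A sketch of my reconstruction of the closed form, assuming for simplicity a common vocabulary across distributions:

```python
def mle_ratio_sharing(count_rows, groups):
    """count_rows: one dict value -> count per distribution.
    groups: group id -> list of values (common to all distributions here).
    Returns shared within-group ratios and per-distribution estimates."""
    ratios = {}
    for g, vs in groups.items():
        pooled = {v: sum(r.get(v, 0) for r in count_rows) for v in vs}
        n_g = sum(pooled.values())
        ratios[g] = {v: n / n_g for v, n in pooled.items()}
    thetas = []
    for row in count_rows:
        n_i = sum(row.values())
        theta = {}
        for g, vs in groups.items():
            m_ig = sum(row.get(v, 0) for v in vs) / n_i  # local group mass
            for v in vs:
                theta[v] = m_ig * ratios[g][v]
        thetas.append(theta)
    return ratios, thetas

rows = [{"computer": 6, "mouse": 2, "buy": 2},
        {"computer": 3, "mouse": 1, "buy": 6}]
groups = {"tech": ["computer", "mouse"], "biz": ["buy"]}
ratios, (t1, t2) = mle_ratio_sharing(rows, groups)
print(ratios["tech"]["computer"])  # pooled ratio: 9 / 12
```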
13
Where are we right now?
14
Outline
• Motivation
• Parameter Related Domain Knowledge
Experiments
• Related Work
• Summary / Future Work
15
Datasets
• Project World – CALO
  – 6 persons, ~200 emails
  – Manually labeled as About / Not About Meetings
  – Data: (Person, Email, Topic)
• Artificial datasets
  – Kept most of the characteristics of the data, BUT new emails were generated in which the frequencies of certain words were shared across users
  – Purpose:
    • Domain Knowledge readily available
    • To be able to study the effect of training-set size (up to 5000)
    • To be able to compare our estimated distribution to the true distribution
16
Approach
• Can model Email using a Naive Bayes model:
  – Without Parameter Sharing (PSNB)
  – With Parameter Sharing (SSNB)
• Also compare with a model that assumes the sender is irrelevant (GNB):
  – the frequencies of words within a topic are learnt from all examples
[Figure: Naive Bayes graphs over Sender, Topic, and Word for the models with and without the Sender variable.]
17
Effect of Training Set Size
As expected:
• SSNB performs better than both other models
• SSNB and PSNB perform similarly as the training set grows, but SSNB is much better when data is sparse
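The three models differ only in which examples each word distribution is estimated from. A sketch of the two extremes (PSNB and GNB are the slides' names; SSNB sits in between, pooling only the words known to be shared; the code layout is mine):

```python
from collections import Counter

def fit_psnb(emails):
    """PSNB: one word distribution per (sender, topic), estimated only
    from that sender's emails -> many parameters, high variance."""
    counts = {}
    for sender, topic, words in emails:
        counts.setdefault((sender, topic), Counter()).update(words)
    return {k: {w: n / sum(c.values()) for w, n in c.items()}
            for k, c in counts.items()}

def fit_gnb(emails):
    """GNB: sender ignored, one distribution per topic pooled over all
    senders -> few parameters, but biased if senders really differ."""
    counts = {}
    for _sender, topic, words in emails:
        counts.setdefault(topic, Counter()).update(words)
    return {t: {w: n / sum(c.values()) for w, n in c.items()}
            for t, c in counts.items()}

emails = [("ann", "meeting", ["agenda", "room"]),
          ("bob", "meeting", ["agenda", "agenda"])]
psnb = fit_psnb(emails)
gnb = fit_gnb(emails)
print(psnb[("ann", "meeting")]["agenda"])  # 1/2 from ann's email alone
print(gnb["meeting"]["agenda"])            # 3/4 pooled over both senders
```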
18
Outline
• Motivation
• Parameter Related Domain Knowledge
• Experiments
Related Work
• Summary / Future Work
19
Dirichlet Priors in a Bayes Net
[Figure: a Dirichlet prior peaked at the expert's assignment (Prior Belief), with a width (Spread) around it.]

The domain expert specifies an assignment of the parameters, but leaves room for some error (the Spread).
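A Dirichlet prior with hyperparameters αv = ESS · pv encodes exactly this: the expert's assignment p plus an equivalent sample size controlling the spread. The resulting posterior-mean estimate blends prior and data. This is the standard Dirichlet-multinomial result, not specific to the talk; the helper name is mine:

```python
def dirichlet_posterior_mean(counts, prior_mean, ess):
    """Posterior mean under a Dirichlet prior with alpha_v = ess * prior_mean[v].
    Larger ess = tighter prior (smaller spread); ess -> 0 recovers the MLE."""
    total = sum(counts.values())
    return {v: (counts.get(v, 0) + ess * p) / (total + ess)
            for v, p in prior_mean.items()}

est = dirichlet_posterior_mean({"heads": 7, "tails": 3},
                               {"heads": 0.5, "tails": 0.5}, ess=10)
print(est["heads"])  # (7 + 5) / (10 + 10) = 0.6
```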
20
HMMs and DBNs
[Figure: an HMM/DBN unrolled over time — a hidden chain ... X(t−1) → X(t) → X(t+1) ... with observations Y(t−1), Y(t), Y(t+1); the transition and emission parameters are shared across all time slices.]
21
Module Networks

In a Module:
• Same parents
• Same CPTs
Image from “Learning Module Networks” by Eran Segal and Daphne Koller
22
Context Specific Independence
[Figure: example network with variables Alarm, Set, and Burglary illustrating context-specific independence.]
23
Outline
• Motivation
• Parameter Related Domain Knowledge
• Experiments
• Related Work
Summary / Future Work
24
Summary
• Parameter Related Domain Knowledge is needed when data is scarce
• Developed methods to estimate parameters:
  – For each of the four types of Domain Knowledge presented
  – From both complete and incomplete data
• Markov Models, Module Nets, and Context Specific Independence are particular cases of our parameter sharing domain knowledge
• Models using Parameter Sharing performed better than two classical Bayes Nets on synthetic data
25
Future Work
• Automatically find Shared Parameters
• Study interactions among different types of Domain Knowledge
• Incorporate Domain Knowledge about continuous variables
• Investigate Domain Knowledge in the form of inequality constraints
26
Questions?
27
THE END
28
Backup Slides
29
Hierarchical Parameter Sharing
[Figure: a hierarchy of nested value sets over {c1, ..., c6} — larger groups such as {c1, c2, c3} split into smaller ones down to the individual values — with parameters shared at each level of the tree.]
30
Full Data Observability, Frequentist
31
Probability Mass Sharing
• Want to model P(Word | Language)
• Two languages: English, Spanish
• Different sets of words
• Notation: c1 : P(Word | English), c2 : P(Word | Spanish); groups T1 = Nouns, T2 = Verbs
• Domain Knowledge:
  – the aggregate probability mass of nouns is the same in both languages
  – the same holds for adjectives, verbs, etc.
[Figure: the two distributions side by side, with the noun and verb blocks carrying equal total mass in each.]
32
Probability Mass Sharing
33
Full Data Observability, Frequentist
34
Probability Ratio Sharing
• Want to model P(Word | Language)
• Two languages: English, Spanish
• Different sets of words
• Notation: c1 : P(Word | English), c2 : P(Word | Spanish); groups T1 = Computer Words, T2 = Business Words
• Domain Knowledge:
  – word groups, e.g. about computers: computer, mouse, monitor, etc.
  – the relative frequency of "computer" to "mouse" is the same in both languages
  – the aggregate mass of a group can be different across languages
[Figure: the two distributions side by side, with within-group ratios preserved across languages.]
35
Probability Ratio Sharing
36
Full Data Observability, Frequentist