1
Parameter Related Domain Knowledge for
Learning in Bayesian Networks
Stefan Niculescu
PhD Candidate, Carnegie Mellon University
Joint work with Professor Tom Mitchell and Dr. Bharat Rao
April 2005
2
Domain Knowledge
• In the real world, data is often too sparse to build an accurate model
• Domain knowledge can help alleviate this problem
• Several types of domain knowledge:
  – Relevance of variables (feature selection)
  – Conditional independences among variables
  – Parameter Domain Knowledge
3
Parameter Domain Knowledge
• In a Bayes Net for a real-world domain:
  – there can be a huge number of parameters
  – there is not enough data to estimate them accurately
• Parameter Domain Knowledge constraints:
  – reduce the number of parameters to estimate
  – reduce the variance of the parameter estimates
4
Outline
• Motivation
Parameter Related Domain Knowledge
• Experiments
• Related Work
• Summary / Future Work
5
Parameters and Counts
CPT for variable Xi
Theorem. The Maximum Likelihood estimators are the observed frequencies:

θ̂ijk = Nijk / Σk′ Nijk′

where Nijk counts the cases in which Xi takes its k-th value while its parents are in their j-th configuration.
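The theorem above is the standard count-and-normalize result: each CPT entry is estimated by the fraction of matching cases among those with the same parent configuration. A minimal sketch in Python (the helper and variable names are mine, for illustration only):

```python
from collections import defaultdict

def mle_cpt(samples, child, parents):
    """Maximum-likelihood CPT entries: theta(x | pa) = N(x, pa) / N(pa).
    `samples` is a list of dicts mapping variable name -> value."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in samples:
        pa = tuple(s[p] for p in parents)
        counts[pa][s[child]] += 1          # N(x, pa)
    cpt = {}
    for pa, row in counts.items():
        n_pa = sum(row.values())           # N(pa)
        cpt[pa] = {x: n / n_pa for x, n in row.items()}
    return cpt

data = [
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
]
cpt = mle_cpt(data, "WetGrass", ["Rain", "Sprinkler"])
print(cpt[(1, 0)][1])  # 2 of the 3 cases with Rain=1, Sprinkler=0 have WetGrass=1
```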
6
Parameter Sharing
[Figure: two CPTs in which same-colored entries (g1, g2, c1, c2, c3, ...) are constrained to be equal; e.g. the group C1 = {c1, c2, c3} is shared across both tables.]

Theorem. The Maximum Likelihood estimators are obtained by pooling, for each shared parameter, the counts of all the cells constrained to be equal (formula in the original slide).
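For the simplest case of this theorem, where whole multinomial rows of the CPTs are tied, the ML estimate just pools the tied rows' counts before normalizing. A minimal sketch, assuming entire rows are shared (the code is mine, not from the slides):

```python
def mle_tied_rows(count_rows):
    """ML estimate when several multinomial rows are constrained to be
    identical: pool the counts across all tied rows, then normalize once."""
    pooled = {}
    for row in count_rows:
        for value, n in row.items():
            pooled[value] = pooled.get(value, 0) + n
    total = sum(pooled.values())
    return {v: n / total for v, n in pooled.items()}

# Two CPT rows tied by domain knowledge:
shared = mle_tied_rows([{"a": 3, "b": 1}, {"a": 5, "b": 3}])
print(shared)  # a: 8/12, b: 4/12
```

Pooling is exactly why sharing reduces variance: the shared parameter is estimated from the union of the tied cells' data.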
7
Incomplete Data, Frequentist
8
Dependent Dirichlet Priors
9
Bayesian Averaging
10
Hierarchical Parameter Sharing
[Figure: a tree over the parameters θ1, ..., θ5 illustrating hierarchical sharing — parameters are tied within nested groups at each level of the tree.]
11
Probability Mass Sharing
DK: Parameters of a given color have the same sum across all distributions.
[Figure: two rows of parameters θ11 ... θ1k and θ21 ... θ2k; same-colored groups have equal total mass in both rows.]
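Under this constraint the ML solution has a natural closed form: each group's shared mass is estimated from the pooled counts, and each distribution splits that mass among its own group members in proportion to its own counts. A sketch of my reconstruction of that closed form (not verbatim from the talk; the value sets per group may differ across distributions, as with English vs. Spanish nouns in the backup slides):

```python
def mle_mass_sharing(count_rows, groups):
    """count_rows: one dict value -> count per distribution.
    groups: group id -> one value set per distribution (sets may differ).
    Returns the shared group masses and the per-distribution estimates."""
    grand = sum(sum(r.values()) for r in count_rows)
    # Shared mass per group: pooled group counts / pooled total counts.
    mass = {}
    for g, value_sets in groups.items():
        mass[g] = sum(row.get(v, 0)
                      for row, vs in zip(count_rows, value_sets)
                      for v in vs) / grand
    # Each distribution splits a group's mass by its own local counts.
    thetas = []
    for i, row in enumerate(count_rows):
        theta = {}
        for g, value_sets in groups.items():
            vs = value_sets[i]
            n_g = sum(row.get(v, 0) for v in vs)
            for v in vs:
                theta[v] = mass[g] * row.get(v, 0) / n_g
        thetas.append(theta)
    return mass, thetas

english = {"dog": 4, "cat": 2, "run": 2}
spanish = {"perro": 3, "correr": 1, "ir": 2}
groups = {"nouns": [{"dog", "cat"}, {"perro"}],
          "verbs": [{"run"}, {"correr", "ir"}]}
mass, (th_en, th_sp) = mle_mass_sharing([english, spanish], groups)
print(mass["nouns"])  # pooled: (6 + 3) / 14
```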
12
Probability Ratio Sharing
DK: Parameters of a given color preserve their relative ratios across all distributions.

[Figure: two rows of parameters θ11 ... θ1k and θ21 ... θ2k; within each colored group, the ratios between parameters are the same in both rows.]
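Ratio sharing is the mirror image of mass sharing: now the within-group proportions are tied across distributions (so they are estimated from pooled counts), while each distribution keeps its own group mass (estimated from its own counts). A sketch of my reconstruction of the closed form, assuming for simplicity a common vocabulary across distributions:

```python
def mle_ratio_sharing(count_rows, groups):
    """count_rows: one dict value -> count per distribution.
    groups: group id -> list of values (common to all distributions here).
    Returns shared within-group ratios and per-distribution estimates."""
    ratios = {}
    for g, vs in groups.items():
        pooled = {v: sum(r.get(v, 0) for r in count_rows) for v in vs}
        n_g = sum(pooled.values())
        ratios[g] = {v: n / n_g for v, n in pooled.items()}
    thetas = []
    for row in count_rows:
        n_i = sum(row.values())
        theta = {}
        for g, vs in groups.items():
            m_ig = sum(row.get(v, 0) for v in vs) / n_i  # local group mass
            for v in vs:
                theta[v] = m_ig * ratios[g][v]
        thetas.append(theta)
    return ratios, thetas

rows = [{"computer": 6, "mouse": 2, "buy": 2},
        {"computer": 3, "mouse": 1, "buy": 6}]
groups = {"tech": ["computer", "mouse"], "biz": ["buy"]}
ratios, (t1, t2) = mle_ratio_sharing(rows, groups)
print(ratios["tech"]["computer"])  # pooled ratio: 9 / 12
```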
13
Where are we right now?
14
Outline
• Motivation
• Parameter Related Domain Knowledge
Experiments
• Related Work
• Summary / Future Work
15
Datasets
• Project World – CALO
  – 6 persons, ~200 emails
  – Manually labeled as About / Not About Meetings
  – Data: (Person, Email, Topic)
• Artificial datasets
  – Kept most of the characteristics of the data, BUT new emails were generated in which the frequencies of certain words were shared across users
  – Purpose:
    • Domain Knowledge readily available
    • To be able to study the effect of training-set size (up to 5000)
    • To be able to compare our estimated distribution to the true distribution
16
Approach
• Can model Email using a Naive Bayes model:
  – Without Parameter Sharing (PSNB)
  – With Parameter Sharing (SSNB)
• Also compare with a model that assumes the sender is irrelevant (GNB):
  – the frequencies of words within a topic are learnt from all examples
[Figure: Naive Bayes graphs over Sender, Topic, and Word for the models with and without the Sender variable.]
17
Effect of Training Set Size
As expected:
• SSNB performs better than both other models
• SSNB and PSNB perform similarly as the training set grows, but SSNB is much better when data is sparse
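The three models differ only in which examples each word distribution is estimated from. A sketch of the two extremes (PSNB and GNB are the slides' names; SSNB sits in between, pooling only the words known to be shared; the code layout is mine):

```python
from collections import Counter

def fit_psnb(emails):
    """PSNB: one word distribution per (sender, topic), estimated only
    from that sender's emails -> many parameters, high variance."""
    counts = {}
    for sender, topic, words in emails:
        counts.setdefault((sender, topic), Counter()).update(words)
    return {k: {w: n / sum(c.values()) for w, n in c.items()}
            for k, c in counts.items()}

def fit_gnb(emails):
    """GNB: sender ignored, one distribution per topic pooled over all
    senders -> few parameters, but biased if senders really differ."""
    counts = {}
    for _sender, topic, words in emails:
        counts.setdefault(topic, Counter()).update(words)
    return {t: {w: n / sum(c.values()) for w, n in c.items()}
            for t, c in counts.items()}

emails = [("ann", "meeting", ["agenda", "room"]),
          ("bob", "meeting", ["agenda", "agenda"])]
psnb = fit_psnb(emails)
gnb = fit_gnb(emails)
print(psnb[("ann", "meeting")]["agenda"])  # 1/2 from ann's email alone
print(gnb["meeting"]["agenda"])            # 3/4 pooled over both senders
```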
18
Outline
• Motivation
• Parameter Related Domain Knowledge
• Experiments
Related Work
• Summary / Future Work
19
Dirichlet Priors in a Bayes Net
[Figure: a Dirichlet prior peaked at the expert's assignment (Prior Belief), with a width (Spread) around it.]

The domain expert specifies an assignment of the parameters, but leaves room for some error (the Spread).
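A Dirichlet prior with hyperparameters αv = ESS · pv encodes exactly this: the expert's assignment p plus an equivalent sample size controlling the spread. The resulting posterior-mean estimate blends prior and data. This is the standard Dirichlet-multinomial result, not specific to the talk; the helper name is mine:

```python
def dirichlet_posterior_mean(counts, prior_mean, ess):
    """Posterior mean under a Dirichlet prior with alpha_v = ess * prior_mean[v].
    Larger ess = tighter prior (smaller spread); ess -> 0 recovers the MLE."""
    total = sum(counts.values())
    return {v: (counts.get(v, 0) + ess * p) / (total + ess)
            for v, p in prior_mean.items()}

est = dirichlet_posterior_mean({"heads": 7, "tails": 3},
                               {"heads": 0.5, "tails": 0.5}, ess=10)
print(est["heads"])  # (7 + 5) / (10 + 10) = 0.6
```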
20
HMMs and DBNs
[Figure: an HMM/DBN unrolled over time — a hidden chain ... X(t−1) → X(t) → X(t+1) ... with observations Y(t−1), Y(t), Y(t+1); the transition and emission parameters are shared across all time slices.]
21
Module Networks

In a Module:
• Same parents
• Same CPTs
Image from “Learning Module Networks” by Eran Segal and Daphne Koller
22
Context Specific Independence
[Figure: example network with variables Alarm, Set, and Burglary illustrating context-specific independence.]
23
Outline
• Motivation
• Parameter Related Domain Knowledge
• Experiments
• Related Work
Summary / Future Work
24
Summary
• Parameter Related Domain Knowledge is needed when data is scarce
• Developed methods to estimate parameters:
  – For each of the four types of Domain Knowledge presented
  – From both complete and incomplete data
• Markov Models, Module Nets, and Context Specific Independence are particular cases of our parameter sharing domain knowledge
• Models using Parameter Sharing performed better than two classical Bayes Nets on synthetic data
25
Future Work
• Automatically find Shared Parameters
• Study interactions among different types of Domain Knowledge
• Incorporate Domain Knowledge about continuous variables
• Investigate Domain Knowledge in the form of inequality constraints
26
Questions?
27
THE END
28
Backup Slides
29
Hierarchical Parameter Sharing
[Figure: a hierarchy of nested value sets over {c1, ..., c6} — larger groups such as {c1, c2, c3} split into smaller ones down to the individual values — with parameters shared at each level of the tree.]
30
Full Data Observability, Frequentist
31
Probability Mass Sharing
• Want to model P(Word | Language)
• Two languages: English, Spanish
• Different sets of words
• Notation: c1 : P(Word | English), c2 : P(Word | Spanish); groups T1 = Nouns, T2 = Verbs
• Domain Knowledge:
  – the aggregate probability mass of nouns is the same in both languages
  – the same holds for adjectives, verbs, etc.
[Figure: the two distributions side by side, with the noun and verb blocks carrying equal total mass in each.]
32
Probability Mass Sharing
33
Full Data Observability, Frequentist
34
Probability Ratio Sharing
• Want to model P(Word | Language)
• Two languages: English, Spanish
• Different sets of words
• Notation: c1 : P(Word | English), c2 : P(Word | Spanish); groups T1 = Computer Words, T2 = Business Words
• Domain Knowledge:
  – word groups, e.g. about computers: computer, mouse, monitor, etc.
  – the relative frequency of "computer" to "mouse" is the same in both languages
  – the aggregate mass of a group can be different across languages
[Figure: the two distributions side by side, with within-group ratios preserved across languages.]
35
Probability Ratio Sharing
36
Full Data Observability, Frequentist