Unsupervised Ontology Induction

From Text

Hoifung Poon

Dept. Computer Science & Eng.

University of Washington

(Joint work with Pedro Domingos)

Extracting Knowledge From Text

Wanted: Automatic, end-to-end solution

Manual engineering: Costly and limited

Supervised learning
  Bottleneck: Labeled examples
  Infeasible for large-scale, open-domain knowledge extraction

Unsupervised Learning for Knowledge Extraction

TextRunner [Banko et al. 2007]
  State-of-the-art open information extraction
  Only extracts triples
  Extractions are largely unstructured and noisy

USP [Poon & Domingos 2009]
  Forms complete, detailed meaning representation
  More robust to noise
  Still limited to extractions with substantial evidence
  Lacks ontological structures

Why Ontology?

Compact representation and efficient reasoning [Staab & Studer 2004]

Better generalization

Q: What does IL-2 regulate?

A: The DEX-mediated IkappaBalpha induction

Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

INHIBIT ISA REGULATE

Ontology Learning

Attracts increasing interest [Snow et al. 2006, Cimiano 2006, Suchanek et al. 2008, Wu & Weld 2008]
  Induction: Construct an ontology
  Population: Map textual expressions to concepts and relations in the ontology

Limitations in existing approaches
  Require heuristic patterns or existing KBs
  Pursue each task in isolation

This Talk: OntoUSP

[Figure labels: Knowledge representation, NLP]

Jointly conducts ontology induction, population, and knowledge extraction
  Learns ISA hierarchy over logical expressions
  Populates it by translating sentences into logical forms

Extends USP with hierarchical clustering

Hierarchical smoothing

Encoded in a few high-order formulas in Markov Logic [Richardson & Domingos, 2006]

Sole input is dependency trees

Five times as many correct answers as TextRunner

Improves on the recall of USP by 47%

Outline

Background: USP

Unsupervised ontology induction

Conclusion

Semantic Parsing

Sentence: IL-4 protein induces CD11b

Dependency tree: nsubj(induces, protein), dobj(induces, CD11b), nn(protein, IL-4)

Semantic parse: INDUCE(e1) ^ INDUCER(e1,e2) ^ INDUCED(e1,e3) ^ IL-4(e2) ^ CD11B(e3)

Structured prediction: Partition + Assignment
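The "Partition + Assignment" view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `logical_form` and the hand-written tables `cluster_of` and `argument_of` are hypothetical names introduced here, and in USP the partition and the assignments are learned rather than supplied by hand.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]   # (head token, dependency label, dependent token)
Part = Tuple[str, ...]        # tokens grouped into one meaning unit


def logical_form(
    edges: List[Edge],
    partition: List[Part],
    cluster_of: Dict[Part, str],
    argument_of: Dict[Tuple[str, str], str],
) -> List[str]:
    """Turn a partitioned dependency tree into a list of logical atoms."""
    # Map every token to the index of the part that contains it.
    part_index = {tok: i for i, part in enumerate(partition) for tok in part}
    # Assignment step 1: each part becomes a unary atom over its cluster.
    atoms = [f"{cluster_of[part]}(e{i + 1})" for i, part in enumerate(partition)]
    # Assignment step 2: each edge between two parts becomes a binary atom.
    for head, label, dep in edges:
        h, d = part_index[head], part_index[dep]
        if h == d:
            continue  # edge lies inside one part (e.g. nn inside "IL-4 protein")
        arg = argument_of[(cluster_of[partition[h]], label)]
        atoms.append(f"{arg}(e{h + 1},e{d + 1})")
    return atoms


# Dependency tree of "IL-4 protein induces CD11b" from the slide.
edges = [
    ("induces", "nsubj", "protein"),
    ("induces", "dobj", "CD11b"),
    ("protein", "nn", "IL-4"),
]
# Hypothetical hand-written partition and assignments (learned in USP).
partition = [("induces",), ("protein", "IL-4"), ("CD11b",)]
cluster_of = {("induces",): "INDUCE", ("protein", "IL-4"): "IL-4", ("CD11b",): "CD11B"}
argument_of = {("INDUCE", "nsubj"): "INDUCER", ("INDUCE", "dobj"): "INDUCED"}

atoms = logical_form(edges, partition, cluster_of, argument_of)
print(" ^ ".join(atoms))
```

The atoms produced are the same five that appear in the semantic parse on the slide, although the order in which they are listed may differ.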

Challenge: Same Meaning, Many Variations

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is induced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

……

Unsupervised Semantic Parsing

USP Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Cluster same forms at the atom level

Cluster forms in composition with same forms
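The idea of clustering forms that occur in composition with the same forms can be illustrated with a deliberately crude sketch. The assumptions are loud ones: the input is pre-extracted (subject, relation, object) triples, two relation forms are grouped as soon as they share a single argument pair, and `cluster_relations` is a hypothetical helper. USP itself makes this decision probabilistically and recursively over arbitrary sub-expressions, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject form, relation form, object form)


def cluster_relations(triples: List[Triple]) -> List[Set[str]]:
    """Greedily group relation forms that share at least one argument pair."""
    # Collect the contexts (argument pairs) each relation form composes with.
    contexts: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for subj, rel, obj in triples:
        contexts[rel].add((subj, obj))

    clusters: List[Set[str]] = []
    for rel in sorted(contexts):
        for cluster in clusters:
            # Join the first cluster that has a member with a shared context.
            if any(contexts[rel] & contexts[other] for other in cluster):
                cluster.add(rel)
                break
        else:
            clusters.append({rel})
    return clusters


# Toy triples based on sentences quoted in the slides.
triples = [
    ("IL-4", "induces", "CD11b"),
    ("IL-4", "enhances", "CD11b"),
    ("IL-2", "inhibits", "IkappaBalpha induction"),
]
clusters = cluster_relations(triples)
print(clusters)
```

Here "induces" and "enhances" end up together because both compose with the pair (IL-4, CD11b), while "inhibits" stays alone.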

Probabilistic Model for USP

Joint probability distribution over input dependency trees and their semantic parses

Use Markov logic

A Markov Logic Network (MLN) is a set of pairs (Fi, wi), where
  Fi is a formula in higher-order logic
  wi is a real number

$$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i N_i(x) \right)$$

N_i(x): number of true groundings of Fi in x
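As a small worked illustration of the MLN distribution, the sketch below computes P(x) by brute-force enumeration for a toy network. Everything concrete in it is an assumption made for the example: the two ground atoms `a` and `b`, the two formulas, and their weights are hypothetical, and enumeration is only feasible for such tiny cases.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, List, Tuple

World = Dict[str, bool]

ATOMS = ["a", "b"]  # hypothetical ground atoms


def n_implies(x: World) -> int:
    """N_1(x): true groundings of the formula a => b (0 or 1 here)."""
    return int((not x["a"]) or x["b"])


def n_a(x: World) -> int:
    """N_2(x): true groundings of the formula a (0 or 1 here)."""
    return int(x["a"])


# Pairs (N_i, w_i); the weights are arbitrary illustration values.
FORMULAS: List[Tuple[Callable[[World], int], float]] = [(n_implies, 1.5), (n_a, 0.5)]


def worlds(atoms: List[str]) -> Iterator[World]:
    """Enumerate every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def score(x: World) -> float:
    """Unnormalized weight exp(sum_i w_i * N_i(x))."""
    return math.exp(sum(w * n(x) for n, w in FORMULAS))


def probability(x: World, atoms: List[str]) -> float:
    """P(x) = score(x) / Z, with Z summed over all worlds."""
    z = sum(score(w) for w in worlds(atoms))
    return score(x) / z


for w in worlds(ATOMS):
    print(w, round(probability(w, ATOMS), 3))
```

Worlds that satisfy more weighted groundings receive more probability mass; for instance the world with both atoms true is more probable than the one that violates a => b.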

Unsupervised Semantic Parsing

Exponential prior on number of parameters

Cluster mixtures: InClust(e,+c) ^ HasValue(e,+v)

Object/Event Cluster: INDUCE
  induces 0.1
  enhances 0.4

Property Cluster: INDUCER
  nsubj 0.5, agent 0.4, …
  IL-4 0.2, IL-8 0.1
  None 0.1, One 0.8

Inference: Hill-Climb Probability

Initialize

Search Operator

Lambda reduction

[Figure: dependency tree induces -nsubj-> protein -nn-> IL-4, induces -dobj-> CD11B, with unassigned nodes marked "?"]

Learning: Hill-Climb Likelihood

Initialize: enhances 1, induces 1, protein 1, IL-4 1

Search Operator
  MERGE: induces 1, enhances 1 -> induces 0.2, enhances 0.8
  COMPOSE: IL-4 1, protein 1 -> IL-4 protein 1
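The effect of MERGE on a cluster's form distribution can be sketched with simple counts. This is an assumption-laden illustration: the counts (2 and 8) are hypothetical values chosen to reproduce the 0.2 / 0.8 mixture on the slide, `merge` and `distribution` are names introduced here, and the sketch leaves out the likelihood comparison that decides whether a merge is actually accepted during hill-climbing.

```python
from collections import Counter
from typing import Dict


def merge(a: Counter, b: Counter) -> Counter:
    """MERGE: pool the form counts of two clusters into one cluster."""
    return a + b


def distribution(cluster: Counter) -> Dict[str, float]:
    """Maximum-likelihood multinomial over the forms in a cluster."""
    total = sum(cluster.values())
    return {form: count / total for form, count in cluster.items()}


# Initialization: every form starts in its own cluster with probability 1.
induces = Counter({"induces": 2})    # hypothetical count
enhances = Counter({"enhances": 8})  # hypothetical count

merged = merge(induces, enhances)
merged_dist = distribution(merged)
print(distribution(induces), distribution(enhances))
print(merged_dist)
```

Before the merge each cluster puts all of its mass on a single form; afterwards the single merged cluster spreads its mass across both forms in proportion to their counts.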

Outline

Background: USP

Unsupervised ontology induction

Conclusion

OntoUSP

USP + Hierarchical clustering + Shrinkage

Modify the cluster mixture formula: InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)
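One way to read the modified formula is that a cluster's distribution over values is driven by the summed weights of every cluster d it ISA, so evidence learned at an ancestor is shared with its descendants. The sketch below illustrates that reading only. It assumes ISA is reflexive, and the hierarchy, the weights, and the value "regulates" are hypothetical numbers and names chosen for the example, not taken from the talk.

```python
import math
from typing import Dict, List, Tuple

# Ancestors of each cluster, including the cluster itself (assumed reflexive ISA).
ISA: Dict[str, List[str]] = {
    "INDUCE": ["INDUCE", "REGULATE"],
    "INHIBIT": ["INHIBIT", "REGULATE"],
}

# Hypothetical weights w[(d, v)] for the groundings ISA(c, d) ^ HasValue(e, v).
WEIGHT: Dict[Tuple[str, str], float] = {
    ("INDUCE", "induces"): 2.0,
    ("INHIBIT", "inhibits"): 2.0,
    ("REGULATE", "regulates"): 1.0,
}

VALUES = ["induces", "inhibits", "regulates"]


def value_distribution(cluster: str) -> Dict[str, float]:
    """P(v | c) proportional to exp(sum of weights over all d with ISA(c, d))."""
    scores = {
        v: math.exp(sum(WEIGHT.get((d, v), 0.0) for d in ISA[cluster]))
        for v in VALUES
    }
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}


induce_dist = value_distribution("INDUCE")
print(induce_dist)
```

In this toy setting INDUCE has no weight of its own for "regulates", yet that value is still more probable than "inhibits" because of the weight inherited from REGULATE, which is the smoothing effect shrinkage is meant to provide.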

New Operator: Abstraction

[Figure: clusters INDUCE (induces 0.6, up-regulates 0.2) and INHIBIT (inhibits 0.4, suppresses 0.2) are each linked by ISA to a new abstract cluster (induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1)]

MERGE with REGULATE?

Captures substantial similarities

Experiments

Evaluate on an end task: Question answering

Applied OntoUSP to extract knowledge from text and answer questions

Evaluation: Number of answers and accuracy

GENIA dataset: 1,999 PubMed abstracts

Questions
  Use simple questions in this paper, e.g.:
    What does anti-STAT1 inhibit?
    What regulates MIP-1 alpha?
  Sample 2000 questions according to frequency

[Figure: bar chart "Total vs. Correct Answers" (axis 0 to 500) comparing KW-SYN, TextRunner, RESOLVER, DIRT, USP, and OntoUSP]

Improves recall over USP by 47%

Five times as many correct answers as TextRunner

Highest accuracy of 91%

Induced Ontology (Partial)

Question-Answer: Example

Q: What does IL-2 control?

A: The DEX-mediated IkappaBalpha induction

Sentence:

Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.
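How an ISA hierarchy lets a general question match a more specific extraction can be sketched as follows. The fact table, the `ISA` map, and the helper names are hypothetical, and the sketch assumes the question's verb has already been mapped to the REGULATE cluster; it is not a description of how OntoUSP stores or queries its knowledge base.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical child -> parent links in the induced hierarchy.
ISA: Dict[str, str] = {"INHIBIT": "REGULATE", "INDUCE": "REGULATE"}

# Hypothetical extracted facts: (relation cluster, agent, theme).
FACTS: List[Tuple[str, str, str]] = [
    ("INHIBIT", "IL-2", "The DEX-mediated IkappaBalpha induction"),
]


def is_a(cluster: str, ancestor: str) -> bool:
    """True if cluster equals ancestor or reaches it by following ISA links."""
    current: Optional[str] = cluster
    while current is not None:
        if current == ancestor:
            return True
        current = ISA.get(current)
    return False


def answer(relation: str, agent: str) -> List[str]:
    """Return themes of facts whose relation is the queried one or a descendant."""
    return [
        theme
        for rel, fact_agent, theme in FACTS
        if fact_agent == agent and is_a(rel, relation)
    ]


answers = answer("REGULATE", "IL-2")
print(answers)
```

The question asks about regulation while the sentence only states inhibition; the match succeeds because INHIBIT ISA REGULATE, whereas a query for INDUCE returns nothing.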

Conclusion

OntoUSP: Unsupervised ontology induction
  USP + hierarchical clustering / smoothing
  Jointly conducts ontology induction, population, and knowledge extraction

See you at poster 46