
NICTA Copyright 2014

Text, Knowledge, and Information Extraction

Lizhen Qu


A bit about myself…

•  PhD: Databases and Information Systems Group (MPII)
    –  Advisors: Prof. Gerhard Weikum and Prof. Rainer Gemulla
    –  Thesis: “Sentiment Analysis with Limited Training Data”
•  Now: machine learning group at NICTA, adjunct research fellow at ANU.


Macquarie



News about Macquarie Bank



Negative News about Macquarie Bank



Simple Math Problem

Bob has 15 apples. He gives 9 to Sarah. How many apples does Bob have now?





Information Extraction

•  Named entity recognition
•  Named entity disambiguation
•  Relation extraction


Knowledge Bases (Open Linked Data)


Entity Graph

Economic Graph

OpenIE (Ollie, Reverb)

(Bob_Dylan, compose, Like_a_Rolling_Stone)
(The_Dark_Knight, directedBy, Christopher_Nolan)
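Facts of this (subject, predicate, object) shape can be stored and queried with very little machinery. The following is a minimal sketch, not how any of the knowledge bases named in this talk are implemented; the `TripleStore` class and its method names are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store for (subject, predicate, object) facts."""

    def __init__(self):
        # Two indexes so that lookups from either end avoid a full scan.
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))
        self._by_object[obj].add((subject, predicate))

    def objects(self, subject, predicate):
        """All o such that (subject, predicate, o) is stored."""
        return {o for p, o in self._by_subject.get(subject, ()) if p == predicate}

    def subjects(self, predicate, obj):
        """All s such that (s, predicate, obj) is stored."""
        return {s for s, p in self._by_object.get(obj, ()) if p == predicate}


kb = TripleStore()
kb.add("Bob_Dylan", "compose", "Like_a_Rolling_Stone")
kb.add("The_Dark_Knight", "directedBy", "Christopher_Nolan")

directors = kb.objects("The_Dark_Knight", "directedBy")
composers = kb.subjects("compose", "Like_a_Rolling_Stone")
```

Indexing by both subject and object is a design choice for the sketch; it lets "who directed X?" and "who composed Y?" be answered with one dictionary lookup each.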



YAGO
#classes: 350,000
#entities: 10 million
#facts: 120 million
#languages: 10



DBpedia
#classes: 735
#entities: 38.3 million
#triples: 6.9 billion
#languages: 128



Freebase
#entities: 50 million
#facts: 3 billion
#languages: almost 70


Construct YAGO from (Semi) Structured Data



IE Challenge: ambiguity of Natural Language

I made her duck.

i.  I cooked waterfowl for her.
ii.  I cooked waterfowl belonging to her.
iii.  I created the duck she owns.
iv.  I caused her to quickly lower her head or body.
v.  I waved my magic wand and turned her into a waterfowl.


Named Entity Recognition

Research at Stanford led to a search engine company, founded by Page and Brin.

TASK: Stanford (ORG), Page (PER), Brin (PER)

Machine Learning Problem: one label per token.

Research/O at/O Stanford/ORG led/O to/O a/O search/O engine/O company/O ,/O founded/O by/O Page/PER and/O Brin/PER ./O


Learning and Prediction

[Figure: labeled sentences (has labels) → feature extraction → train models; sentences (no labels) → feature extraction → prediction.]


Feature Extraction

•  Use features to represent each word.
•  Vectorise feature representations.

Features of Stanford: w-2 = Research, w-1 = at, w0 = Stanford, w+1 = led, w+2 = to, POS = noun, capitalized? = true

Features of search: w-2 = to, w-1 = a, w0 = search, w+1 = engine, w+2 = company, POS = noun, capitalized? = false

Vectorised features of Stanford:

w-2 = research    capitalized    w0 = stanford    w0 = search    …
1                 1              1                0              …
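The two steps on this slide (collect window features for a word, then turn them into a 0/1 vector) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the POS feature is left out because it would need a tagger.

```python
def window_features(tokens, i, size=2):
    """Return the names of the binary features that fire for tokens[i]."""
    feats = set()
    for offset in range(-size, size + 1):
        j = i + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        position = "w0" if offset == 0 else f"w{offset:+d}"
        feats.add(f"{position}={word}")
    if tokens[i][0].isupper():
        feats.add("capitalized")
    return feats


def vectorise(feature_sets):
    """Assign one column per feature name and build 0/1 vectors."""
    names = sorted(set().union(*feature_sets))
    index = {name: k for k, name in enumerate(names)}
    vectors = []
    for feats in feature_sets:
        row = [0] * len(index)
        for name in feats:
            row[index[name]] = 1
        vectors.append(row)
    return index, vectors


tokens = "Research at Stanford led to a search engine company".split()
stanford = window_features(tokens, 2)
search = window_features(tokens, 6)
index, (v_stanford, v_search) = vectorise([stanford, search])
```

Looking up `v_stanford[index["w0=search"]]` gives 0 and `v_stanford[index["capitalized"]]` gives 1, which mirrors the 1 1 1 0 pattern shown for Stanford.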


Standard Model: Conditional Random Fields

•  Assigns local score to different (word, label) pairs.
•  Joint inference to find best label sequences.

CRF: $p(y \mid x) = \frac{1}{Z} \exp\left[ \sum_{t=1}^{T} \sum_{i} \lambda_i f_i(y_{t-1}, y_t, x_t) \right]$

Stanford NER [1]: 86%
Best system [8]: 89%
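To make the two bullets concrete, here is a toy linear-chain CRF in which the local score is a weighted sum of indicator features and the best label sequence is found by Viterbi decoding. It is a sketch under stated assumptions: the weights are hand-set and hypothetical (a real model learns them), the start symbol `<s>` is an assumption, and Z is computed by brute-force enumeration, which only works for very short inputs.

```python
import itertools
import math

LABELS = ["O", "ORG", "PER"]

# Hypothetical weights lambda_i for indicator features f_i(y_prev, y, x_t).
EMISSION = {("capitalized", "PER"): 1.5, ("capitalized", "ORG"): 1.2, ("lowercase", "O"): 2.0}
TRANSITION = {("O", "O"): 0.5, ("PER", "PER"): -1.0, ("ORG", "ORG"): 0.3}


def local_score(y_prev, y, word):
    """Sum of lambda_i * f_i(y_prev, y, x_t) for one position."""
    shape = "capitalized" if word[0].isupper() else "lowercase"
    return EMISSION.get((shape, y), 0.0) + TRANSITION.get((y_prev, y), 0.0)


def sequence_score(labels, words):
    score, prev = 0.0, "<s>"
    for y, w in zip(labels, words):
        score += local_score(prev, y, w)
        prev = y
    return score


def probability(labels, words):
    """p(y | x) with Z obtained by enumerating every label sequence."""
    z = sum(math.exp(sequence_score(ys, words))
            for ys in itertools.product(LABELS, repeat=len(words)))
    return math.exp(sequence_score(labels, words)) / z


def viterbi(words):
    """Joint inference: highest-scoring label sequence by dynamic programming."""
    # best[y] = (score of the best path ending in label y, that path)
    best = {y: (local_score("<s>", y, words[0]), [y]) for y in LABELS}
    for w in words[1:]:
        best = {
            y: max(((s + local_score(p, y, w), path + [y])
                    for p, (s, path) in best.items()), key=lambda c: c[0])
            for y in LABELS
        }
    return max(best.values(), key=lambda c: c[0])[1]


words = ["founded", "by", "Page"]
best_labels = viterbi(words)
```

Viterbi avoids the enumeration used in `probability`: it keeps only the best path per label at each position, so its cost grows linearly with sentence length.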


Named Entity Disambiguation


TASK:

Page (PER) → Larry Page
Stanford (ORG) → Stanford University
Brin (PER) → Sergey Brin


AIDA-light [2]



First Stage



Second Stage

AIDA-light [2]: 84.8%
DBpedia Spotlight: 75%


Relation Extraction

•  Relation mention extraction.
•  Expand knowledge bases.

PER: Larry_Page    PER: Sergey_Brin    ORG: Stanford_University    relation: ?

Larry Page, Stanford University: ?
The Dark Knight, Christopher Nolan: ?


Relation Mention Extraction

•  Multi-class classification.
•  Example features of a pair of entity mentions [3].

Relation between (Stanford, Page): ?

Feature                                        Value
words between (Stanford, Page)                 led, to, a, search, engine, company, founded, by
Named entity types                             (ORG, PER)
Number of mentions between (Stanford, Page)    0
…

F-Measure on ACE: 71.2% [3]
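The three example features can be computed from a tokenised sentence and its entity mentions as sketched below. The function name and the `(token_index, type)` mention encoding are assumptions made for this illustration; the feature set in [3] is richer than this.

```python
def pair_features(tokens, entities, first, second):
    """Features for the mention pair entities[first], entities[second].

    entities is a list of (token_index, entity_type) for single-token mentions.
    """
    (i, type_i), (j, type_j) = entities[first], entities[second]
    lo, hi = sorted((i, j))
    return {
        # Alphabetic tokens strictly between the two mentions (punctuation dropped).
        "words_between": [t for t in tokens[lo + 1:hi] if t.isalpha()],
        "entity_types": (type_i, type_j),
        # Other entity mentions located strictly between the pair.
        "mentions_between": sum(1 for k, _ in entities if lo < k < hi),
    }


tokens = ("Research at Stanford led to a search engine company , "
          "founded by Page and Brin .").split()
entities = [(2, "ORG"), (12, "PER"), (14, "PER")]

stanford_page = pair_features(tokens, entities, 0, 1)
stanford_brin = pair_features(tokens, entities, 0, 2)
```

A multi-class classifier would then take such a feature dictionary (after vectorisation) and predict one relation label, or "no relation", for the pair.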


Expand Knowledge Base

•  Multi-instance, multi-label [4,5].
•  Distant supervision.

[Figure: Freebase provides the relation-level label for the pair (Larry Page, Sergey Brin); the mention-level label of each sentence mentioning the pair is unknown (?).]

Larry Page and Sergey Brin explained why they just created Alphabet.

MAP [3]: 56%
MAP [4]: 66%
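Distant supervision can be sketched as follows: every sentence that mentions both entities of a knowledge base fact is collected into a "bag" that inherits the relation-level label, while the label of each individual mention stays unobserved. This is a simplified illustration, not the models of [4,5]; the relation name `co-founders` is the one this deck uses for the pair, the second and third sentences are made up, and plain substring matching stands in for real entity linking.

```python
from collections import defaultdict

# Relation-level facts, as a knowledge base such as Freebase would provide them.
kb = {("Larry Page", "Sergey Brin"): "co-founders"}

sentences = [
    "Larry Page and Sergey Brin explained why they just created Alphabet.",
    "Sergey Brin met Larry Page at Stanford.",  # made-up example sentence
    "Larry Page spoke at a conference.",        # made-up example sentence
]


def build_bags(kb, sentences):
    """Group sentences by the KB fact whose two entities they both mention."""
    bags = defaultdict(list)
    for (e1, e2), relation in kb.items():
        for sentence in sentences:
            if e1 in sentence and e2 in sentence:
                bags[(e1, e2, relation)].append(sentence)
    return dict(bags)


bags = build_bags(kb, sentences)
```

Not every sentence in a bag actually expresses the relation (the first sentence arguably does, the second may not), which is why multi-instance models treat mention-level labels as latent.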


Open Information Extraction

•  Extract triples of any relations from the web [6].
•  Optional: link triples to knowledge bases.

It was exactly 50 years ago today that Bob Dylan walked into Studio A at Columbia Records in New York and recorded “Like a Rolling Stone”.

(“Bob Dylan”, “record”, “Like a Rolling Stone”)

[Figure: linking the triple to the knowledge base; labels shown: The_Dark_Knight, Like_a_Rolling_Stone, record.]

F1 [6]: 19.6%
F1 [9]: 28.3%


Harvest Domain-Specific Knowledge

•  Deep learning.
    –  Learn cross-domain features.
    –  Minimize training data.
•  Transfer learning.

Source domain: newswire. Target domain: nurse handovers.


Word Representation

•  One-hot representation.
•  Distributed representation.

One-hot:

stanford      [ 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
university    [ 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
oxford        [ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
conference    [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ]
talk          [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 ]

Distributed: each word (stanford, university, oxford, conference, talk) is a short dense vector, e.g. [0.01, 0.3, -0.5, 0.6]
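The practical difference between the two representations shows up when comparing words. In the sketch below the one-hot positions follow the slide, while the dense vectors are made up for illustration (only [0.01, 0.3, -0.5, 0.6] appears on the slide, and assigning it to stanford is an assumption).

```python
import math

VOCAB_SIZE = 14
# Position of the single 1 for each word, as on the slide.
ONE_HOT_INDEX = {"stanford": 2, "university": 4, "oxford": 0, "conference": 13, "talk": 12}


def one_hot(word):
    vec = [0] * VOCAB_SIZE
    vec[ONE_HOT_INDEX[word]] = 1
    return vec


# Hypothetical dense vectors; real ones would be learned from unlabeled text.
DENSE = {
    "stanford": [0.01, 0.3, -0.5, 0.6],
    "oxford": [0.05, 0.25, -0.45, 0.55],
    "talk": [-0.4, 0.1, 0.6, -0.2],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


one_hot_similarity = cosine(one_hot("stanford"), one_hot("oxford"))
dense_related = cosine(DENSE["stanford"], DENSE["oxford"])
dense_unrelated = cosine(DENSE["stanford"], DENSE["talk"])
```

Any two distinct one-hot vectors have cosine similarity 0, so they carry no notion of relatedness; dense vectors can place stanford closer to oxford than to talk.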


Distributed Representation


Apply Distributed Representations for NER

Represent words based on positions rather than IDs.

[Figure: the words "compare stanford university and oxford" with labels o, UNI, UNI, o, UNI. For each word, the feature matrix holds the representations of the current word, the first word to the right, the 2nd word to the right, the first word to the left, and the 2nd word to the left.]
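Representing words "based on positions rather than IDs" can be read as concatenating the dense vectors of the words in a window around the current word, in a fixed position order. The sketch below follows the order listed on the slide; the embedding values are made up and the zero padding for positions outside the sentence is an assumption.

```python
DIM = 4
PAD = [0.0] * DIM  # assumed filler for positions outside the sentence

# Hypothetical embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "compare": [0.2, -0.1, 0.0, 0.3],
    "stanford": [0.01, 0.3, -0.5, 0.6],
    "university": [0.1, 0.4, -0.3, 0.5],
    "and": [-0.2, 0.0, 0.1, -0.1],
    "oxford": [0.05, 0.25, -0.45, 0.55],
}

# current word, first/2nd word to the right, first/2nd word to the left
OFFSETS = [0, 1, 2, -1, -2]


def window_row(words, i):
    """Concatenate the embeddings of the words at the window positions."""
    row = []
    for offset in OFFSETS:
        j = i + offset
        row.extend(EMBEDDINGS[words[j]] if 0 <= j < len(words) else PAD)
    return row


words = "compare stanford university and oxford".split()
feature_matrix = [window_row(words, i) for i in range(len(words))]
```

Each row has 5 × 4 = 20 values regardless of vocabulary size, and rare or unseen words with similar embeddings produce similar rows.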


Results of Named Entity Recognition [7]

•  Reduce the amount of training data.
•  Tiny differences between word embeddings.


NER for Novel Named Entity Types

•  Goals:
    –  Minimize labeled training data.
    –  Leverage existing resources:
        •  Labeled corpora.
        •  Unlabeled text.
        •  Existing knowledge bases.

Source domain types: person, organization, location.
Target domain types: doctor, corporation, city, patient, hotel, country.


Experimental Results on I2B2



Learn Text Representations for Relations

•  Unsupervised pre-training.
•  Distant supervision.

[Figure: Freebase labels the pair (Larry Page, Sergey Brin) as co-founders; this becomes the inferred mention-level label of the sentence below.]

Larry Page and Sergey Brin explained why they just created Alphabet.


NICTA Deep Learning for IE Toolkit

•  A fully integrated deep learning toolkit for NLP.
    –  Pipelines include both NLP preprocessing and DL components.
    –  Written in Scala/Java.
    –  Easy to write new ML components.
    –  Reuse UIMA NLP components.
•  Scalable.
    –  Easy switch between GPUs and CPUs.
    –  Learning on GPUs.
    –  Make use of UIMA for prediction.


References

•  [1] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005).

•  [2] Nguyen, Dat Ba, et al. "AIDA-light: High-throughput named-entity disambiguation." Linked Data on the Web at WWW2014 (2014).

•  [3] Chan, Yee Seng, and Dan Roth. "Exploiting background knowledge for relation extraction." Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010.

•  [4] Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, Christopher D. Manning. "Multi-instance Multi-label Learning for Relation Extraction." Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012.

•  [5] Riedel, Sebastian, et al. "Relation extraction with matrix factorization and universal schemas." (2013).

•  [6] Schmitz, Michael, et al. "Open language learning for information extraction." Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012.

•  [7] Qu, Lizhen, et al. "Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representation on Sequence Labelling Tasks." arXiv preprint arXiv:1504.05319 (2015).

•  [8] Rie Kubota Ando and Tong Zhang. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853.

•  [9] Angeli, Gabor, Melvin Johnson Premkumar, and Christopher D. Manning. "Leveraging Linguistic Structure For Open Domain Information Extraction."


Resources

•  YAGO: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
•  DBpedia: http://wiki.dbpedia.org/
•  Alchemy: http://querybuilder.alchemyapi.com/builder
•  Deep learning: http://www.deeplearning.net/
•  Word2vec: https://code.google.com/p/word2vec/
•  Mallet (Java): http://mallet.cs.umass.edu/
•  Factorie (Scala): http://factorie.cs.umass.edu/
•  Stanford CoreNLP: http://nlp.stanford.edu:8080/corenlp/
•  NLP conferences.
    –  ACL, EMNLP, COLING, NAACL, EACL …
•  NLP online courses.
    –  https://www.coursera.org/course/nlangp
    –  https://www.youtube.com/playlist?list=PL6397E4B26D00A269
•  ML online courses.
    –  https://www.coursera.org/course/ml
    –  https://www.coursera.org/course/neuralnets
    –  http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial
