
KNOWLEDGE GRAPH AND CORPUS DRIVEN SEGMENTATION AND ANSWER INFERENCE FOR TELEGRAPHIC ENTITY-SEEKING QUERIES

EMNLP 2014

MANDAR JOSHI, IBM RESEARCH (mandarj90@in.ibm.com)

UMA SAWANT, IIT BOMBAY AND YAHOO LABS (uma@cse.iitb.ac.in)

SOUMEN CHAKRABARTI, IIT BOMBAY (soumen@cse.iitb.ac.in)

ENTITY-SEEKING TELEGRAPHIC QUERIES

• Short
• Unstructured (like natural language questions)
• Expect entities as answers
• No reliable syntax clues
  - Free word order
  - No or rare capitalization, quoted phrases
• Ambiguous
  - Multiple interpretations
  - Example: aamir khan films
  - Aamir Khan: the Indian actor or the British boxer
  - Films: appeared in, directed by, or about
• Previous QA work
  - Convert to structured query
  - Execute on knowledge graph (KG)

CHALLENGES

• KG is high precision but incomplete
  - Work in progress
  - Triples cannot represent all information
  - Structured – unstructured gap
• Corpus provides recall
  - Example: fastest odi century batsman

WHY DO WE NEED THE CORPUS?

… Corey Anderson hits fastest ODI century. This was the first time two batsmen have hit hundreds in under 50 balls in the same ODI.

ANNOTATED WEB WITH KNOWLEDGE GRAPH

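[Figure: an annotated document linked to the knowledge graph. The mention "Corey Anderson" in the snippet below is a mentionOf the entity Corey_Anderson, which is an instanceOf the type /cricket/cricket_player. Corey_Anderson is linked by /people/person/profession to the entity Cricketer, which is an instanceOf the type /people/profession.]

Annotated document: … Corey Anderson hits fastest ODI century in mismatch ... was the first time two batsmen have hit hundreds in under 50 balls in the same ODI.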

INTERPRETATION VIA SEGMENTATION

• Queries seek answer entities (e2)

• Contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

SIGNALS FROM THE QUERY

query                       e1           r         t2                  s
washington first governor   washington   governor  governor            first
                            washington   -         governor            first
spider automobile company   spider       -         automobile company
                            automobile   company   company             spider

Assignment of tokens to columns for illustration only; not necessarily optimal

SEGMENTATION AND INTERPRETATION

• Interpretation = Segmentation + Annotation
• Segmentation of query tokens into 3 partitions
  - Query entity (E1)
  - Relation and Type (T2/R)
  - Selectors (S)
• Multiple ways to annotate each partition

[Figure: the query "washington first governor" split into an E1 partition (washington), an S partition (first), and a T2/R partition (governor). Candidate annotations for the E1 partition: 1. Washington (State), 2. Washington_D.C. (City). Candidate annotations for the T2/R partition: r: governorOf, t2: us_state_governor; or r: null, t2: us_state_governor.]
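The segmentation step can be illustrated with a minimal Python sketch that enumerates every assignment of query tokens to the three partitions. The function name `enumerate_segmentations` and the exhaustive enumeration are assumptions made for illustration; the slides do not say which constraints (for example, contiguity) or pruning the actual system applies.

```python
from itertools import product

# Partition labels taken from the slide: query entity, relation/type, selectors.
PARTITIONS = ("E1", "T2R", "S")

def enumerate_segmentations(query):
    """Yield every assignment of query tokens to the three partitions.

    Hypothetical helper: no contiguity or pruning constraints are applied.
    """
    tokens = query.split()
    for labels in product(PARTITIONS, repeat=len(tokens)):
        segmentation = {p: [] for p in PARTITIONS}
        for token, label in zip(tokens, labels):
            segmentation[label].append(token)
        yield segmentation

segmentations = list(enumerate_segmentations("washington first governor"))
print(len(segmentations))  # 3 tokens with 3 choices each: 27 assignments
```

Each segmentation would then be annotated (entity for E1, relation and type for T2/R) to form a full interpretation; the segmentation from the figure, with washington in E1, first in S and governor in T2/R, is one of the 27.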

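COMBINING KG AND CORPUS EVIDENCE

[Figure: graphical model connecting the segmentation (Z), query entity (E1), relation (R), target type (T2), selectors (S), and candidate entity (E2) through the potentials listed below.]

• ΨE1: entity language model
• ΨR: relation language model
• ΨT2: type language model
• ΨT2,E2: entity-type compatibility
• ΨE1,R,E2: KG-assisted relation evidence potential
• ΨE1,R,E2,S: corpus-assisted entity-relation evidence potential

Example values for the query "washington first governor":

• Z: Washington | first | governor, or washington first | governor
• E1: Washington (State), or null
• R: governorOf, or null
• T2: us_state_governor, or governor_general
• S: first, or washington first
• E2: Elisha Peyre Ferry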
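As a rough illustration of how such potentials could be combined, the sketch below scores an interpretation by the sum of its log-potentials (equivalently, the product of the potentials) and picks the best one. The combination rule, the key names, and every numeric value are assumptions made up for this example; the slides do not give them.

```python
import math

def interpretation_score(potentials):
    """Sum of log-potentials; assumes every potential value is positive."""
    return sum(math.log(value) for value in potentials.values())

# Hypothetical potential values for two interpretations of the same query
# and the same candidate answer entity.
interpretations = {
    "Washington | first | governor": {
        "psi_e1": 0.6, "psi_r": 0.5, "psi_t2": 0.7,
        "psi_t2_e2": 0.9, "psi_kg": 0.8, "psi_corpus": 0.6,
    },
    "washington first | governor": {
        "psi_e1": 0.1, "psi_r": 0.2, "psi_t2": 0.3,
        "psi_t2_e2": 0.9, "psi_kg": 0.1, "psi_corpus": 0.4,
    },
}

best = max(interpretations, key=lambda name: interpretation_score(interpretations[name]))
print(best)
```

Working in log space keeps the product of many small potentials numerically stable and makes the score additive, which is convenient if the potentials are later weighted by learned parameters.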

FROM QUERY TO ANSWER ENTITY

• Generate interpretations
• Retrieve snippets for each interpretation
• Construct candidate answer entities (e2) set
  - Top k from corpus based on snippet frequency
  - By KG links that are in interpretations set
• Inference
  - Query-signals compatibility
  - e2-t2 compatibility
  - Evidence from KG and corpus
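Candidate construction can be sketched as follows: count how many retrieved snippets mention each annotated entity, keep the top k, and add entities reachable through KG links. The function `candidate_entities`, the snippet representation (a list of entity annotations per snippet), and the placeholder entity names other than Elisha_Peyre_Ferry are assumptions for illustration only.

```python
from collections import Counter

def candidate_entities(snippet_entities, kg_linked, k):
    """Union of the top-k entities by snippet frequency and KG-linked entities.

    snippet_entities: one list of annotated entities per retrieved snippet.
    kg_linked: entities connected to the interpretations through KG links.
    """
    # Count each entity at most once per snippet (snippet frequency).
    counts = Counter(e for entities in snippet_entities for e in set(entities))
    top_k = [entity for entity, _ in counts.most_common(k)]
    candidates = list(top_k)
    for entity in sorted(kg_linked):
        if entity not in candidates:
            candidates.append(entity)
    return candidates

# Made-up annotations: three snippets mention Elisha_Peyre_Ferry.
snippets = [
    ["Elisha_Peyre_Ferry", "Entity_X"],
    ["Elisha_Peyre_Ferry", "Entity_Y"],
    ["Elisha_Peyre_Ferry", "Entity_X"],
    ["Entity_Z"],
]
candidates = candidate_entities(snippets, {"Entity_W", "Elisha_Peyre_Ferry"}, k=2)
print(candidates)
```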

RELATION AND TYPE MODELS

• Objective: map relation (or type) mentions in the query to Freebase relations (or types)
• Relation language model (ΨR)
  - Use annotated ClueWeb09 + Freebase triples
  - Locate Freebase relation endpoints in corpus
  - Extract dependency path words between entities
  - Maintain co-occurrence counts of <words, rel>
  - Assumption: co-occurrence implies relation
• Type language model (ΨT2)
  - Smoothed Dirichlet language model using Freebase type names
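The two models can be sketched in a few lines of Python: a counter of (word, relation) co-occurrences built from dependency-path words, and a Dirichlet-smoothed unigram model over the tokens of a type name. The function names, the smoothing parameter `mu=10.0`, the relation `bornIn`, and the toy observations are assumptions; only `governorOf`, `us_state_governor` and `governor_general` appear in the slides.

```python
from collections import Counter

def relation_cooccurrence(observations):
    """Count (word, relation) pairs from (relation, dependency-path words) observations."""
    counts = Counter()
    for relation, path_words in observations:
        for word in path_words:
            counts[(word, relation)] += 1
    return counts

def dirichlet_type_lm(type_tokens, collection_counts, mu=10.0):
    """Return P(word | type) with Dirichlet smoothing against the collection model."""
    tf = Counter(type_tokens)
    length = len(type_tokens)
    total = sum(collection_counts.values())

    def prob(word):
        p_collection = collection_counts.get(word, 0) / total
        return (tf[word] + mu * p_collection) / (length + mu)

    return prob

# Toy observations: path words seen between the endpoints of a KG relation.
observations = [
    ("governorOf", ["governor", "of"]),
    ("governorOf", ["elected", "governor"]),
    ("bornIn", ["born", "in"]),
]
cooc = relation_cooccurrence(observations)

# Toy collection built from the tokens of two type names.
type_names = {
    "us_state_governor": ["us", "state", "governor"],
    "governor_general": ["governor", "general"],
}
collection = Counter(tok for toks in type_names.values() for tok in toks)
p_usg = dirichlet_type_lm(type_names["us_state_governor"], collection)
print(cooc[("governor", "governorOf")], p_usg("governor"), p_usg("general"))
```

Smoothing gives a non-zero probability to a hint word such as "general" even for a type whose name does not contain it, so no type is ruled out by a single unmatched token.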

CORPUS POTENTIAL

• Estimates support for e1-r-e2-s in the corpus
• Snippet retrieval and scoring
• Snippets scored using RankSVM
• Partial list of features
  - #snippets with distance(e2, e1) < k (k = 5, 10)
  - #snippets with distance(e2, r) < k (k = 3, 6)
  - #snippets with relation r = ⊥
  - #snippets with relation phrases as prepositions
  - #snippets covering fraction of query IDF > k (k = 0.2, 0.4, 0.6, 0.8)
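The first feature in the list, the number of snippets in which e2 and e1 occur within k tokens of each other, could be computed roughly as below. The token-level distance definition, the helper names, and the toy snippets are assumptions; the slides do not specify how distance is measured.

```python
def token_distance(snippet, a, b):
    """Smallest token distance between any mention of a and any mention of b.

    Returns None if either entity is not mentioned in the snippet.
    """
    positions_a = [i for i, token in enumerate(snippet) if token == a]
    positions_b = [i for i, token in enumerate(snippet) if token == b]
    if not positions_a or not positions_b:
        return None
    return min(abs(i - j) for i in positions_a for j in positions_b)

def count_close_snippets(snippets, a, b, k):
    """Number of snippets where a and b are mentioned fewer than k tokens apart."""
    count = 0
    for snippet in snippets:
        distance = token_distance(snippet, a, b)
        if distance is not None and distance < k:
            count += 1
    return count

# Made-up, pre-tokenized snippets with entity mentions replaced by entity IDs.
E1, E2 = "Washington_(State)", "Elisha_Peyre_Ferry"
snippets = [
    [E2, "was", "the", "first", "governor", "of", E1],  # distance 6
    [E1, "governor", E2],                               # distance 2
    [E2, "gave", "a", "speech"],                        # e1 missing
]
features = {k: count_close_snippets(snippets, E2, E1, k) for k in (5, 10)}
print(features)
```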

LATENT VARIABLE DISCRIMINATIVE TRAINING (LVDT)

• q, e2 are observed; e1, t2, r and z are latent

• Non-convex formulation

• Constraints are formulated using the best scoring interpretation

• Training

• Inference
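The idea that constraints use the best-scoring interpretation can be illustrated with a generic latent-variable ranking sketch: an entity's score is the maximum over its interpretations of a linear score, and a hinge loss asks a correct entity to beat an incorrect one by a margin. This is a standard max-margin formulation used here as an assumption; the slides do not show the actual objective, features, or optimizer, and all numbers below are made up.

```python
def dot(weights, features):
    """Inner product of two equal-length vectors."""
    return sum(w * x for w, x in zip(weights, features))

def best_interpretation(weights, interpretation_features):
    """Return (score, index) of the highest-scoring latent interpretation."""
    return max((dot(weights, phi), i) for i, phi in enumerate(interpretation_features))

def pairwise_hinge(weights, correct_interps, incorrect_interps, margin=1.0):
    """Hinge loss on the gap between the best interpretations of two entities."""
    correct_score, _ = best_interpretation(weights, correct_interps)
    incorrect_score, _ = best_interpretation(weights, incorrect_interps)
    return max(0.0, margin - (correct_score - incorrect_score))

# Hypothetical weight vector and per-interpretation feature vectors.
w = [1.0, 0.5]
correct = [[1.0, 0.0], [2.0, 2.0]]   # scores 1.0 and 3.0
incorrect_easy = [[0.5, 1.0], [1.0, 1.0]]  # scores 1.0 and 1.5
incorrect_hard = [[2.0, 1.0]]        # score 2.5

print(best_interpretation(w, correct))
print(pairwise_hinge(w, correct, incorrect_easy))
print(pairwise_hinge(w, correct, incorrect_hard))
```

Because the maximum over interpretations appears on both sides of the constraint, the resulting objective is non-convex, which matches the "non-convex formulation" bullet above.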

EXPERIMENTS

TEST BED

• Freebase entity, type and relation knowledge graph
  - ~29 million entities
  - 14000 types
  - 2000 selected relations
• Annotated corpus
  - ClueWeb09B Web corpus having 50 million pages
  - Google (FACC1), ~13 annotations per page
  - Text and entity index

TEST BED

• Query sets
  - TREC-INEX: 700 entity search queries
  - WQT: subset of ~800 queries from the WebQuestions (WQ) natural language query set [1], manually converted to telegraphic form
  - Available at http://bit.ly/Spva49

TREC-INEX                                             WQT
Has type and/or relation hints                        Has mostly relation hints
Answers from KG and corpus collected by volunteers    Answers from KG only collected by turkers
Answer evidence from corpus (+ KG)                    Answer evidence from KG

[1] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Empirical Methods in Natural Language Processing (EMNLP).

SYNERGY BETWEEN KG AND CORPUS

[Figure: bar charts for TREC-INEX and WQT comparing KG only, Corpus only, and Both, for the Unoptimized and LVDT variants.]

Corpus and knowledge graph help each other to deliver better performance

QUERY TEMPLATE COMPARISON

[Figure: bar charts for TREC-INEX and WQT comparing No Interpretation, Type + Selector [2], Unoptimized, and LVDT.]

The entity-relation-type-selector template yields better accuracy than the type-selector template

[2] Uma Sawant and Soumen Chakrabarti. 2013. Learning joint query interpretation and response ranking. In WWW Conference, Brazil.

COMPARISON WITH SEMANTIC PARSERS

[Figure: bar charts for TREC-INEX and WQT comparing Jacana, Sempre, Unoptimized, and LVDT.]

QUALITATIVE COMPARISON

• Benefits of collective inference
  - Query: automobile company makes spider
  - Entity model fails to identify e1 (Alfa Romeo Spider)
  - Recovery: automobile company makes spider, with e1: Automobile, t2: /../organization, r: /business/industry/companies
• Limitations
  - Sparse corpus annotations
  - Query: south africa political system
  - Few corpus annotations for e2: Constitutional Republic
  - Can't find appropriate t2 (/../form_of_government) and r (/location/country/form_of_government)

SUMMARY

• Query interpretation is rewarding, but non-trivial

• Segmentation based models work well for telegraphic queries

• Entity-relation-type-selector template better than type-selector template

• Knowledge graph and corpus provide complementary benefits

REFERENCES

• S&C: Uma Sawant and Soumen Chakrabarti. 2013. Learning joint query interpretation and response ranking. In WWW Conference, Brazil.

• Sempre: Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Empirical Methods in Natural Language Processing (EMNLP).

• Jacana: Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In ACL Conference.

• TREC-INEX and WQT
  - Short URL: http://bit.ly/Spva49
  - Long URL: https://docs.google.com/spreadsheets/d/1AbKBdFOIXum_NwXeWub0SdeG-y8Ub4_ub8qTjAw4Qug/edit#gid=0

• Project page: http://www.cse.iitb.ac.in/~soumen/doc/CSA

THANK YOU!

QUESTIONS?
