Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale

Benjamin Adrian, Sven Schwarz
http://www.dfki.de/~lastname

A huge Web of Data

The Semantic Web offers techniques for ...

● representing,
● formalizing,
● and reasoning about information

… on the WWW in order to make information ...

● transferable,
● portable,
● and interpretable

… for machine consumption.

∑ 9,363,625 distinct literal values

Wouldn't it be great to … ?

… to link entity references in text to referents in RDF graphs.

Goal: Enrich natural language text with formal facts.

Benjamin works at DFKI, Kaiserslautern.

How to recognize entity references?

[Diagram labels: natural language text (“Benjamin works at DFKI, Kaiserslautern.”), efficient representation, RDF source]

→ application of relational databases and suffix arrays

Entity Recognition Process

[Diagram: text → suffix array → database → RDF graph. Step labels: noun-phrase chunking, prefix hashing, query, hashes, candidates with matching prefixes, exact match, exact matches.]

RDF statements

Symbols:

<#19810211> <rdfs:label> “Benjamin Adrian”
<#67478302> <rdfs:label> “DFKI”

Relation:

<#19810211> <#employedAt> <#67478302>

Represent RDF data

Separate storage of symbols and relations:

RELATIONS (SUBJECT, PREDICATE, OBJECT)
SYMBOLS (SUBJECT, PREDICATE, OBJECT)

Dictionaries:

RESOURCE INDEX (URI, INDEX)
LITERAL INDEX (LITERAL, INDEX, HASH)
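
A minimal sketch of such a layout, assuming SQLite as the relational database. The table and column names follow the slide; the helper names, the CRC32 hash, and the choice to hash the first four characters of each literal are illustrative assumptions, not the authors' implementation.

```python
import sqlite3
import zlib

PREFIX_SIZE = 4  # assumption: number of leading characters that get hashed


def prefix_hash(text):
    """Hash the lowercased first PREFIX_SIZE characters (CRC32 is an arbitrary choice)."""
    return zlib.crc32(text[:PREFIX_SIZE].lower().encode("utf-8"))


def build_store(triples):
    """Load (subject, predicate, object, object_is_literal) tuples into SQLite."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE resource_index (idx INTEGER PRIMARY KEY, uri TEXT UNIQUE);
        CREATE TABLE literal_index (idx INTEGER PRIMARY KEY, literal TEXT UNIQUE, hash INTEGER);
        CREATE TABLE relations (subject INTEGER, predicate INTEGER, object INTEGER);
        CREATE TABLE symbols (subject INTEGER, predicate INTEGER, object INTEGER);
        CREATE INDEX literal_hash ON literal_index (hash);
    """)

    def resource_id(uri):
        # Dictionary lookup: every URI is stored once and referenced by its index.
        db.execute("INSERT OR IGNORE INTO resource_index (uri) VALUES (?)", (uri,))
        return db.execute(
            "SELECT idx FROM resource_index WHERE uri = ?", (uri,)).fetchone()[0]

    def literal_id(value):
        # Dictionary lookup for literals, stored together with their prefix hash.
        db.execute(
            "INSERT OR IGNORE INTO literal_index (literal, hash) VALUES (?, ?)",
            (value, prefix_hash(value)))
        return db.execute(
            "SELECT idx FROM literal_index WHERE literal = ?", (value,)).fetchone()[0]

    for s, p, o, is_literal in triples:
        if is_literal:  # symbol: a resource described by a literal value
            db.execute("INSERT INTO symbols VALUES (?, ?, ?)",
                       (resource_id(s), resource_id(p), literal_id(o)))
        else:           # relation: a link between two resources
            db.execute("INSERT INTO relations VALUES (?, ?, ?)",
                       (resource_id(s), resource_id(p), resource_id(o)))
    return db


TRIPLES = [
    ("#19810211", "rdfs:label", "Benjamin Adrian", True),
    ("#67478302", "rdfs:label", "DFKI", True),
    ("#19810211", "#employedAt", "#67478302", False),
]
db = build_store(TRIPLES)
symbol_count = db.execute("SELECT COUNT(*) FROM symbols").fetchone()[0]
relation_count = db.execute("SELECT COUNT(*) FROM relations").fetchone()[0]
print(symbol_count, relation_count)
```

Keeping the literals in their own indexed table means that entity lookup only has to touch LITERAL INDEX; the relations are needed only after a match has been found.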

Suffix Array

Text

“Benjamin Adrian works in DFKI, Kaiserslautern”

Suffix array (sorted list of suffixes)

Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
in DFKI, Kaiserslautern
Kaiserslautern
works in DFKI, Kaiserslautern
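
The word-level suffix array for the example sentence can be sketched as follows. Sorting case-insensitively is an assumption made here because it reproduces the ordering on the slide ("in …" before "Kaiserslautern"); the function name is illustrative.

```python
import re


def word_suffix_array(text):
    """Return the start offsets of all word-level suffixes, sorted by suffix text."""
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    # Case-insensitive comparison (assumption) matches the order shown on the slide.
    return sorted(starts, key=lambda i: text[i:].lower())


TEXT = "Benjamin Adrian works in DFKI, Kaiserslautern"
suffix_array = word_suffix_array(TEXT)
suffixes = [TEXT[i:] for i in suffix_array]
for suffix in suffixes:
    print(suffix)
```

Storing offsets instead of copied strings keeps the array small; each suffix is recovered by slicing the text at its offset.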

Suffix Array

Text

“Benjamin Adrian works in DFKI, Kaiserslautern”

Phrases in text

Benjamin Adrian
DFKI
Kaiserslautern

Reduced suffix array

Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
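
One way to obtain the reduced suffix array, sketched under the assumption that a noun-phrase chunker has already delivered the phrases: only suffixes that start at a word inside a phrase are kept. The phrase list is hard-coded here as a stand-in for chunker output, and the function name is illustrative.

```python
import re


def reduced_suffix_array(text, phrase_spans):
    """Keep only word-level suffixes that start inside one of the phrase spans."""
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    kept = [i for i in starts
            if any(begin <= i < end for begin, end in phrase_spans)]
    # Same case-insensitive ordering (assumption) as for the full suffix array.
    return sorted(kept, key=lambda i: text[i:].lower())


TEXT = "Benjamin Adrian works in DFKI, Kaiserslautern"
PHRASES = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]  # stand-in for chunker output
spans = [(TEXT.index(p), TEXT.index(p) + len(p)) for p in PHRASES]
reduced = [TEXT[i:] for i in reduced_suffix_array(TEXT, spans)]
for suffix in reduced:
    print(suffix)
```

The suffixes starting at "works" and "in" drop out because neither word lies inside a noun phrase.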

Noun phrases in natural language text

Hashing prefixes

LITERAL INDEX (LITERAL, INDEX, HASH)

Suffix array (hashed prefix size = 4)

Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
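
A small sketch of prefix hashing with a prefix size of 4. It assumes that the size counts characters, that prefixes are lowercased before hashing, and that CRC32 serves as the hash function; none of these details are stated on the slide.

```python
import zlib

PREFIX_SIZE = 4  # from the slide; assumed to count characters


def prefix_hash(text, size=PREFIX_SIZE):
    """Hash the lowercased first `size` characters of text (CRC32 is an assumption)."""
    return zlib.crc32(text[:size].lower().encode("utf-8"))


REDUCED_SUFFIXES = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]
LITERALS = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]

# Literal side: group the literals by the hash of their prefix.
literal_hashes = {}
for literal in LITERALS:
    literal_hashes.setdefault(prefix_hash(literal), []).append(literal)

# Text side: a suffix can only match literals that share its prefix hash.
candidates = {s: literal_hashes.get(prefix_hash(s), []) for s in REDUCED_SUFFIXES}
for suffix, literals in candidates.items():
    print(repr(suffix[:PREFIX_SIZE]), "->", literals)
```

Equal hashes only indicate a shared prefix, so every candidate still needs an exact comparison against the text.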

Select candidates from database
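
Candidate selection followed by the exact-match step might look like the sketch below. It assumes SQLite, a simplified LITERAL INDEX table, and CRC32 over the first four lowercased characters as the prefix hash. "Benjamin Franklin" and "Berlin" are hypothetical extra literals added only to show how candidates get filtered.

```python
import sqlite3
import zlib

PREFIX_SIZE = 4  # assumption: hashed prefix length in characters


def prefix_hash(text):
    return zlib.crc32(text[:PREFIX_SIZE].lower().encode("utf-8"))


def select_candidates(db, suffixes):
    """Fetch all literals whose prefix hash equals the prefix hash of some suffix."""
    hashes = sorted({prefix_hash(s) for s in suffixes})
    placeholders = ",".join("?" * len(hashes))
    rows = db.execute(
        f"SELECT literal FROM literal_index WHERE hash IN ({placeholders})",
        hashes).fetchall()
    return sorted(row[0] for row in rows)


def exact_matches(candidates, suffixes):
    """Keep the candidates that actually occur at the start of a suffix."""
    # Simplification: no word-boundary check after the matched literal.
    return sorted({c for c in candidates for s in suffixes if s.startswith(c)})


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE literal_index (literal TEXT, hash INTEGER)")
LITERALS = ["Benjamin Adrian", "Benjamin Franklin", "DFKI", "Kaiserslautern", "Berlin"]
db.executemany("INSERT INTO literal_index VALUES (?, ?)",
               [(lit, prefix_hash(lit)) for lit in LITERALS])

SUFFIXES = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]
candidates = select_candidates(db, SUFFIXES)
matches = exact_matches(candidates, SUFFIXES)
print(candidates)
print(matches)
```

The database query narrows the literals down to those sharing a hashed prefix with the text; the exact comparison then discards false candidates such as "Benjamin Franklin".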

Response time

Summary

[Diagram: text → suffix array → database → RDF graph. Step labels: noun-phrase chunking, prefix hashing, query, hashes, candidates with matching prefixes, exact match, exact matches.]

Thank you

Questions?

Benjamin Adrian

Sven Schwarz