Benjamin Adrian, Sven Schwarz
http://www.dfki.de/~lastname
Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale
Benjamin Adrian, Sven Schwarz
A huge Web of Data
The Semantic Web offers techniques for ...
● representing,
● formalizing,
● and reasoning about information
… on the WWW in order to make information ...
● transferable,
● portable,
● and interpretable
… for machine consumption.
∑ 9,363,625 distinct literal values
Wouldn't it be great to … ?
… to link entity references in text to referents in RDF graphs.
Goal: Enrich natural language text with formal facts.
Benjamin works at DFKI, Kaiserslautern.
How to recognize entity references?
→ application of relational databases and suffix arrays
[Diagram labels: natural language text; efficient representation; RDF source]
Benjamin works at DFKI, Kaiserslautern.
Entity Recognition Process
[Diagram: text → noun-phrase chunking → suffix array → prefix hashing → hashes → query → database (built from the RDF graph) → candidates with matching prefixes → exact match → exact matches]
RDF statements
Symbols:
<#19810211> <rdfs:label> “Benjamin Adrian”
<#67478302> <rdfs:label> “DFKI”
Relation:
<#19810211> <#employedAt> <#67478302>
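The distinction between symbols and relations can be sketched in a few lines. This is only an illustration, assuming that a statement counts as a symbol when its object is a literal and as a relation when its object is another resource. The tuple layout, the quoting convention, and the names `is_literal` and `split_statements` are hypothetical and do not come from the slides.

```python
# Hypothetical in-memory form of the three statements on the slide.
# Assumption: literals are written with surrounding double quotes.
STATEMENTS = [
    ("#19810211", "rdfs:label", '"Benjamin Adrian"'),
    ("#67478302", "rdfs:label", '"DFKI"'),
    ("#19810211", "#employedAt", "#67478302"),
]


def is_literal(term):
    """Treat any term that starts with a double quote as a literal."""
    return term.startswith('"')


def split_statements(statements):
    """Split statements into symbols (literal object) and relations."""
    symbols, relations = [], []
    for subject, predicate, obj in statements:
        target = symbols if is_literal(obj) else relations
        target.append((subject, predicate, obj))
    return symbols, relations


symbols, relations = split_statements(STATEMENTS)
for statement in symbols:
    print("symbol  ", statement)
for statement in relations:
    print("relation", statement)
```

Only the symbols carry text that can occur in a document, so they are the statements that matter for recognizing entity references.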
Represent RDF data
Separate storage of symbols and relations:
RELATIONS (columns: SUBJECT, PREDICATE, OBJECT)
SYMBOLS (columns: SUBJECT, PREDICATE, OBJECT)
Dictionaries:
RESOURCE INDEX (columns: URI, INDEX)
LITERAL INDEX (columns: LITERAL, INDEX, HASH)
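A minimal sketch of such a layout, using an in-memory SQLite database. The slides name the tables and columns only; the SQL column types, the helper names, the use of SQLite, and the choice of CRC32 over the first four lowercased characters for the HASH column are all assumptions made for illustration.

```python
import sqlite3
import zlib

PREFIX_SIZE = 4  # assumption: prefix length counted in characters


def prefix_hash(text, size=PREFIX_SIZE):
    """Hash the lowercased first `size` characters (CRC32 is a stand-in)."""
    return zlib.crc32(text[:size].lower().encode("utf-8"))


def create_schema(conn):
    """Two dictionaries plus separate tables for symbols and relations."""
    conn.executescript("""
        CREATE TABLE resource_index (idx INTEGER PRIMARY KEY, uri TEXT UNIQUE);
        CREATE TABLE literal_index (idx INTEGER PRIMARY KEY,
                                    literal TEXT UNIQUE, hash INTEGER);
        CREATE INDEX literal_hash ON literal_index (hash);
        CREATE TABLE relations (subject INTEGER, predicate INTEGER, object INTEGER);
        CREATE TABLE symbols (subject INTEGER, predicate INTEGER, object INTEGER);
    """)


def resource_id(conn, uri):
    """Return the dictionary index of a URI, inserting it if needed."""
    conn.execute("INSERT OR IGNORE INTO resource_index (uri) VALUES (?)", (uri,))
    return conn.execute(
        "SELECT idx FROM resource_index WHERE uri = ?", (uri,)).fetchone()[0]


def literal_id(conn, literal):
    """Return the dictionary index of a literal, storing its prefix hash."""
    conn.execute(
        "INSERT OR IGNORE INTO literal_index (literal, hash) VALUES (?, ?)",
        (literal, prefix_hash(literal)))
    return conn.execute(
        "SELECT idx FROM literal_index WHERE literal = ?", (literal,)).fetchone()[0]


def add_symbol(conn, subject, predicate, literal):
    conn.execute("INSERT INTO symbols VALUES (?, ?, ?)",
                 (resource_id(conn, subject), resource_id(conn, predicate),
                  literal_id(conn, literal)))


def add_relation(conn, subject, predicate, obj):
    conn.execute("INSERT INTO relations VALUES (?, ?, ?)",
                 (resource_id(conn, subject), resource_id(conn, predicate),
                  resource_id(conn, obj)))


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_symbol(conn, "#19810211", "rdfs:label", "Benjamin Adrian")
add_symbol(conn, "#67478302", "rdfs:label", "DFKI")
add_relation(conn, "#19810211", "#employedAt", "#67478302")
```

Storing integer indexes instead of strings keeps the statement tables compact, and the index on the hash column is what later allows candidates to be fetched by prefix hash.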
Suffix Array
Text:
“Benjamin Adrian works in DFKI, Kaiserslautern”
Suffix array (sorted list of suffixes):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
in DFKI, Kaiserslautern
Kaiserslautern
works in DFKI, Kaiserslautern
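The list above can be reproduced with a short sketch. It assumes a word-level suffix array (one suffix per token start) and a case-insensitive sort, which is what the ordering on the slide suggests; the function names are hypothetical.

```python
import re


def token_starts(text):
    """Character offsets at which a word token begins."""
    return [match.start() for match in re.finditer(r"\w+", text)]


def build_suffix_array(text):
    """Token start offsets, sorted by the suffix that begins there."""
    return sorted(token_starts(text), key=lambda i: text[i:].lower())


TEXT = "Benjamin Adrian works in DFKI, Kaiserslautern"
suffix_array = build_suffix_array(TEXT)
suffixes = [TEXT[i:] for i in suffix_array]
for suffix in suffixes:
    print(suffix)
```

Only the offsets are stored; the suffix strings are materialized here for display.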
Suffix Array
Phrases in text:
Benjamin Adrian
DFKI
Kaiserslautern
Reduced suffix array:
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
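One way to read the reduction: keep only those suffixes that begin at a token inside a noun phrase. The sketch below follows that reading. The noun phrases are passed in as a plain list because the chunker itself is not described here, and the function names are hypothetical.

```python
import re


def token_starts(text):
    """Character offsets at which a word token begins."""
    return [match.start() for match in re.finditer(r"\w+", text)]


def phrase_spans(text, phrases):
    """(start, end) character spans of every occurrence of each phrase."""
    spans = []
    for phrase in phrases:
        for match in re.finditer(re.escape(phrase), text):
            spans.append((match.start(), match.end()))
    return spans


def reduced_suffix_array(text, phrases):
    """Sorted token offsets restricted to tokens inside a noun phrase."""
    spans = phrase_spans(text, phrases)
    starts = [i for i in token_starts(text)
              if any(begin <= i < end for begin, end in spans)]
    return sorted(starts, key=lambda i: text[i:].lower())


TEXT = "Benjamin Adrian works in DFKI, Kaiserslautern"
NOUN_PHRASES = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]  # assumed chunker output
reduced = [TEXT[i:] for i in reduced_suffix_array(TEXT, NOUN_PHRASES)]
for suffix in reduced:
    print(suffix)
```

The suffixes starting at "works" and "in" drop out, so fewer prefixes have to be hashed and looked up.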
Noun phrases in natural language text
Hashing prefixes
LITERAL INDEX (columns: LITERAL, INDEX, HASH)
Suffix array (hashed prefix size = 4):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
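A small sketch of the idea: a literal and a text suffix that start with the same four characters receive the same hash, so the hash can serve as a lookup key. The hash function is not specified here; CRC32 over the lowercased prefix is a stand-in, and counting the prefix size in characters rather than tokens is an assumption.

```python
import zlib

PREFIX_SIZE = 4  # from the slide; assumed to count characters


def prefix_hash(text, size=PREFIX_SIZE):
    """CRC32 of the lowercased first `size` characters (stand-in hash)."""
    return zlib.crc32(text[:size].lower().encode("utf-8"))


SUFFIXES = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]
LITERALS = ["Benjamin Adrian", "DFKI"]

suffix_hashes = {prefix_hash(s): s for s in SUFFIXES}
for literal in LITERALS:
    shared = suffix_hashes.get(prefix_hash(literal))
    print(f"{literal!r} shares its prefix hash with suffix {shared!r}")
```

A shared hash only says that the first four characters agree, so every hit still needs an exact comparison afterwards.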
Select candidates from database
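Candidate selection followed by the exact match can be sketched as below, assuming an SQLite table that mirrors the LITERAL INDEX and a CRC32 prefix hash as a stand-in. The query shape, the function names, and the extra literal "Benjamin Franklin" are hypothetical and serve only to show a candidate that fails the exact match. The exact check here is a linear scan for brevity, not a binary search over the suffix array.

```python
import sqlite3
import zlib

PREFIX_SIZE = 4  # assumed to count characters


def prefix_hash(text, size=PREFIX_SIZE):
    """CRC32 of the lowercased first `size` characters (stand-in hash)."""
    return zlib.crc32(text[:size].lower().encode("utf-8"))


def build_literal_index(literals):
    """In-memory stand-in for the LITERAL INDEX table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE literal_index "
                 "(idx INTEGER PRIMARY KEY, literal TEXT, hash INTEGER)")
    conn.execute("CREATE INDEX literal_hash ON literal_index (hash)")
    conn.executemany(
        "INSERT INTO literal_index (literal, hash) VALUES (?, ?)",
        [(literal, prefix_hash(literal)) for literal in literals])
    return conn


def select_candidates(conn, suffixes):
    """Literals whose prefix hash equals the prefix hash of some suffix."""
    hashes = sorted({prefix_hash(s) for s in suffixes})
    if not hashes:
        return []
    placeholders = ",".join("?" * len(hashes))
    rows = conn.execute(
        f"SELECT literal FROM literal_index WHERE hash IN ({placeholders})",
        hashes)
    return [row[0] for row in rows]


def exact_matches(candidates, suffixes):
    """Candidates that are a full prefix of a suffix, ending at a word boundary."""
    matches = set()
    for literal in candidates:
        for suffix in suffixes:
            rest = suffix[len(literal):]
            if suffix.startswith(literal) and (not rest or not rest[0].isalnum()):
                matches.add(literal)
    return sorted(matches)


SUFFIXES = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]
conn = build_literal_index(
    ["Benjamin Adrian", "Benjamin Franklin", "DFKI", "Kaiserslautern"])
candidates = select_candidates(conn, SUFFIXES)
matches = exact_matches(candidates, SUFFIXES)
print("candidates:", sorted(candidates))
print("exact matches:", matches)
```

"Benjamin Franklin" is returned as a candidate because it shares the prefix "Benj", and it is discarded by the exact match.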
Response time
Summary
[Diagram: text → noun-phrase chunking → suffix array → prefix hashing → hashes → query → database (built from the RDF graph) → candidates with matching prefixes → exact match → exact matches]
Thank you
Questions?
Benjamin Adrian
Sven Schwarz