The aim of the EU FP 7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This is the first of two hands-on sessions that introduce participants to working directly with LarKC code.
Copyright 2007
Cyc-GATE workflow
Blaz Fortuna, Luka Bradesko
Cycorp Europe, Slovenia
Goal
• Demonstrate reasoning over non-structured input data
• Learn how to correctly annotate a new plug-in
• Learn how to add a new plug-in to the platform
External tools used
• GATE
  – Information Extraction framework
  – Used here for extraction of named entities from articles
• ResearchCyc
  – Common-sense knowledge base (~300,000 concepts, 1.3M assertions)
  – Reasoning engine
Pipeline diagram
Query → Identify → Transform → Select → Reason → Result
External resources shown in the diagram: Internet, GATE, ResearchCyc
Example
Query
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:isa cyc:PubliclyHeldCorporation }
Identify
• Find links to HTML documents and retrieve them using the ArticleIdentifier plug-in.
  – Returns a text document: http://shodan.ijs.si:8080/GateServer/news.txt
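The first half of the Identify step, finding the article links in the incoming query, could be sketched as follows. This is only an illustration and not the ArticleIdentifier source: the class name `ArticleUrlExtractor`, the method `extractUrls`, and the regex-based approach are assumptions made for this example, and retrieving the documents is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of LarKC): pulls article URLs out of a SPARQL query string.
class ArticleUrlExtractor {
    // Matches cyc:mentionedInArticle "<url>", tolerating stray whitespace inside the quotes.
    private static final Pattern ARTICLE_LITERAL =
            Pattern.compile("cyc:mentionedInArticle\\s+\"\\s*([^\"\\s]+)\\s*\"");

    /** Returns every article URL named in the query, in order of appearance. */
    static List<String> extractUrls(String sparql) {
        List<String> urls = new ArrayList<>();
        Matcher m = ARTICLE_LITERAL.matcher(sparql);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String query =
                "PREFIX cyc: <http://www.cycfoundation.org/concepts/> "
              + "SELECT ?company WHERE { "
              + "?company cyc:mentionedInArticle \"http://shodan.ijs.si:8080/GateServer/news.txt\" . "
              + "?company cyc:isa cyc:PubliclyHeldCorporation }";
        System.out.println(extractUrls(query));
    }
}
```

A real identifier would more likely parse the query properly than use a regular expression; the regex keeps the sketch short.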
Transform
• Use GATE to extract organizations
  – Returns a SetOfStatements of the form:
article-0 urn:hasUrl "http://shodan.ijs.si:8080/GateServer/news.txt"
company-0 urn:nameString "Microsoft"
company-0 urn:mentionedInArticle article-0
company-1 urn:nameString "Ford"
company-1 urn:mentionedInArticle article-0
…
Query:
…
?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt"
…
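To make the shape of the Transform output concrete, here is a minimal sketch that turns a list of organization names (as an entity extractor might return them) into statements of the style listed above. The `Triple` class and `toStatements` method are hypothetical stand-ins; the platform's own statement types and the GATE calls are not shown in these slides and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds the article/company statements from extracted names.
class StatementBuilder {
    /** Plain subject-predicate-object holder; a stand-in for the platform's statement type. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    private static String quote(String s) {
        return "\"" + s + "\"";
    }

    /** One hasUrl statement for the article, then two statements per organization. */
    static List<Triple> toStatements(String articleUrl, List<String> organizations) {
        List<Triple> out = new ArrayList<>();
        String article = "article-0"; // single-article case only, an assumption of this sketch
        out.add(new Triple(article, "urn:hasUrl", quote(articleUrl)));
        for (int i = 0; i < organizations.size(); i++) {
            String company = "company-" + i;
            out.add(new Triple(company, "urn:nameString", quote(organizations.get(i))));
            out.add(new Triple(company, "urn:mentionedInArticle", article));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> statements = toStatements(
                "http://shodan.ijs.si:8080/GateServer/news.txt",
                Arrays.asList("Microsoft", "Ford"));
        for (Triple t : statements) {
            System.out.println(t);
        }
    }
}
```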
Select
• Select only the companies with a corresponding concept in the ResearchCyc KB
  company-0 → #$MicrosoftInc
  company-1 → #$FordMotors
• Replace URIs with Cyc concepts
  cyc:mentionedInArticle → #$mentionedInArticle
• Output:
#$MicrosoftInc #$mentionedInArticle #$article-0
#$FordMotors #$mentionedInArticle #$article-0
…
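The Select step can be pictured as a lookup plus a filter. In the sketch below an in-memory map stands in for the ResearchCyc concept lookup, which is an assumption made purely for illustration; the real CycSelecter queries the knowledge base, and the names `CycSelectSketch` and `select` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keeps only companies with a known Cyc concept and rewrites the triples.
class CycSelectSketch {
    /**
     * @param companyNames company id -> extracted name string (e.g. company-0 -> Microsoft)
     * @param cycLookup    name string -> Cyc constant name; stand-in for a ResearchCyc lookup
     * @param articleId    id of the article the companies were found in
     */
    static List<String> select(Map<String, String> companyNames,
                               Map<String, String> cycLookup,
                               String articleId) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : companyNames.entrySet()) {
            String concept = cycLookup.get(e.getValue());
            if (concept == null) {
                continue; // no corresponding concept in the KB: drop this company
            }
            out.add("#$" + concept + " #$mentionedInArticle #$" + articleId);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("company-0", "Microsoft");
        names.put("company-1", "Ford");

        Map<String, String> kb = new HashMap<>();
        kb.put("Microsoft", "MicrosoftInc");
        kb.put("Ford", "FordMotors");

        for (String line : select(names, kb, "article-0")) {
            System.out.println(line);
        }
    }
}
```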
Reason
• Reason
  – Load the triples with Cyc concept names into the ResearchCyc KB
  – Transform the SPARQL query into a Cyc query
  – Execute and retrieve results
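The query transformation can be sketched as a rewrite of each SPARQL triple pattern (subject, predicate, object) into a CycL-style sentence with the predicate first, joined by a conjunction. This is a simplified illustration and not the CycReasoner code: the class `CycQuerySketch`, the upper-casing of variable names, and the handling of only `cyc:`-prefixed terms are assumptions of this example.

```java
import java.util.Locale;

// Hypothetical sketch: rewrites SPARQL triple patterns as one CycL-style conjunction.
class CycQuerySketch {
    /** Converts a single SPARQL term into its assumed CycL form. */
    static String term(String t) {
        if (t.startsWith("?")) {
            return t.toUpperCase(Locale.ROOT);   // ?company -> ?COMPANY (assumed convention)
        }
        if (t.startsWith("cyc:")) {
            return "#$" + t.substring(4);        // cyc:isa -> #$isa
        }
        return t;                                // literals and Cyc constants pass through
    }

    /** Each pattern is {subject, predicate, object}; CycL puts the predicate first. */
    static String toCycL(String[][] patterns) {
        StringBuilder sb = new StringBuilder("(#$and");
        for (String[] p : patterns) {
            sb.append(" (")
              .append(term(p[1])).append(' ')
              .append(term(p[0])).append(' ')
              .append(term(p[2])).append(')');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        String[][] patterns = {
            {"?company", "cyc:mentionedInArticle", "#$article-0"},
            {"?company", "cyc:isa", "cyc:PubliclyHeldCorporation"}
        };
        System.out.println(toCycL(patterns));
    }
}
```

The article literal is shown already replaced by `#$article-0`, on the assumption that the query is rewritten consistently with the triples produced by the Select step.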
Run the workflow on your computer!
Main class: eu.larkc.core.Larkc
VM arguments: -Xmx512m
Run SPARQL client
• In Windows: double-click SPARQLClient.jar
• In Linux: java -jar SPARQLClient.jar
Run example query
• Execute query in SPARQL Client
• Walk through the output of the program
• Go through the plug-ins’ .java files
Other interesting queries
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:isa cyc:PubliclyHeldCorporation }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:isa cyc:SoftwareVendor }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:isa cyc:SoftwareVendor }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:isa cyc:Business }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:makesProductType cyc:CellularTelephone }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:makesProductType cyc:CellularTelephone .
?company cyc:stockTickerSymbol ?ticker }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?program cyc:programAuthor ?company }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?competitor cyc:competitors ?company .
?competitor cyc:makesProductType cyc:CellularTelephone }
Plug-in SAWSDL description
<wsdl:description>
<!-- COMMON TO ALL IDENTIFIERS -->
<wsdl:interface name="identifier"
sawsdl:modelReference="http://larkc.eu/plugin#Identifier">
</wsdl:interface>
<wsdl:binding name="larkcbinding" type="http://larkc.eu/wsdl-binding" />
<!-- SPECIFIC TO THIS IDENTIFIER -->
<wsdl:service
name="urn:eu.larkc.plugin.identify.article.ArticleIdentifier"
interface="identifier"
sawsdl:modelReference="http://larkc.eu/plugin#ArticleIdentifier" >
<wsdl:endpoint
location="java:eu.larkc.plugin.identify.article.ArticleIdentifier" />
</wsdl:service>
</wsdl:description>
Plug-in ontology
@prefix larkc: <http://larkc.eu/plugin#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
larkc:ArticleIdentifier
rdf:type rdfs:Class ;
rdfs:subClassOf larkc:Identifier ;
larkc:hasInputType larkc:SPARQLQuery ;
larkc:hasOutputType larkc:NaturalLanguageDocument .
Scripted decider
// Assemble the four plug-ins in pipeline order: identify, transform, select, reason.
Pipeline pipeline = new Pipeline();
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.identify.article.ArticleIdentifier"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.transform.gate.GateTransformer"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.select.cycselecter.CycSelecter"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.reason.cycreasoner.CycReasoner"));
try {
    pipeline.start(theQuery);
} catch (Exception e) {
    // error
}
// Take the result produced by the last plug-in (the reasoner).
return (VariableBinding) pipeline.take();
Write a new plug-in
• Create a new project
  – New Folder
  – Link bin directory
  – Make source directory
  – Add libraries
• Prepare code:
  – Copy-paste GateTransformer.java
  – Rename it to SimpleNamedEntitiyExtractor
  – Insert code available in SimpleNamedEntitiyExtractor.txt
• Prepare/update meta-data files
  – SimpleNamedEntitiyExtractor.wsdl
  – SimpleNamedEntitiyExtractor.rdf
• Update CycGateDecider
• Clean, Build and Run!