Knowledge Graph 101 from the perspective of engineers

Page 1: Knowledge Graph 101 –from the perspective of engineers

Knowledge Graph 101 – from the perspective of engineers

Page 2: Knowledge Graph 101 –from the perspective of engineers

A Brief Introduction to

Knowledge Graph

Page 9: Knowledge Graph 101 –from the perspective of engineers

Google: Network of ‘things’

Improved search and subject indexing

Page 10: Knowledge Graph 101 –from the perspective of engineers

The key for ‘Smart Data’

Things, not strings!

Page 12: Knowledge Graph 101 –from the perspective of engineers

Web of Documents

About:

•United States

•Barack Obama

•Presidential Election (Past)

•Some relevance to currently held

•Democrats & Republicans

•Winner & Loser

•Chicago

•Etc.

About:

•Location, Event, Places, Persons, Groups, Abstract concepts (winning, losing)

Page 13: Knowledge Graph 101 –from the perspective of engineers

Web of Documents

People can parse web of documents and

extract information from them

Page 14: Knowledge Graph 101 –from the perspective of engineers

The web to humans

Page 15: Knowledge Graph 101 –from the perspective of engineers

The web of documents

Analogy: a global file system

Designed for: human consumption

Primary objects: documents

Links between: documents (or sub-parts of documents)

Semantics: implicit

Page 16: Knowledge Graph 101 –from the perspective of engineers

The web of documents: Issues

The Web of Documents is primarily about data, but the connection is implicit

Integration & querying are hard, e.g. "Show me all the news stories by US Presidents coming from Chicago"

Page 17: Knowledge Graph 101 –from the perspective of engineers

Semantic Web

•We need to help machines understand the web, so that machines can help us understand things.

•If machines have access to data about things (i.e. knowledge), then they can do a better job of processing documents.

Page 18: Knowledge Graph 101 –from the perspective of engineers

Web of Data (Linked Data)

[Figure: data sets A, B and C, each containing things; the things are connected within and across data sets by typed links]

Page 19: Knowledge Graph 101 –from the perspective of engineers

Linked Data…

… is about creating a global database of linked things

… refers to a set of best practices for publishing and interlinking data on the Web

… is a method of publishing data [on the Web] so that it can be interlinked and become more useful

Page 20: Knowledge Graph 101 –from the perspective of engineers

The Web of Linked Data

Analogy: a global database

Designed for: machines first, humans later

Primary objects: things (or descriptions of things)

Links between: things

Semantics: explicit

Page 25: Knowledge Graph 101 –from the perspective of engineers

Semantic Web Standard Stack

Page 28: Knowledge Graph 101 –from the perspective of engineers

Semantic Technologies : URIs

Like URLs, but not just for Web pages: for things (cars, people, places, organisations, coursework, etc.)

“A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource.” (RFC 3986)

Many different schemes: http://, ftp://, mailto:

Examples:

http://ecust.edu.cn/ontologies/foaf/whf/me.rdf

http://dbpedia.org/resource/China
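
As a small illustration of the scheme/authority/path structure behind these identifiers, the sketch below splits the two example URIs with Python's standard library. The mailto address is a made-up placeholder, not something from the slides.

```python
from urllib.parse import urlsplit

uris = [
    "http://dbpedia.org/resource/China",
    "http://ecust.edu.cn/ontologies/foaf/whf/me.rdf",
    "mailto:someone@example.org",  # hypothetical address, for illustration only
]

def describe(uri):
    """Return the (scheme, authority, path) components of a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

# The scheme tells a client how (or whether) the identifier can be dereferenced.
schemes = [describe(u)[0] for u in uris]
```

A URI only has to identify a thing; whether fetching it returns anything is a separate question that Linked Data answers with HTTP.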

Page 29: Knowledge Graph 101 –from the perspective of engineers

HTTP

Data access mechanism between web browsers (clients) and servers

HTTP messages consist of requests from clients to servers and responses from servers to clients

HTTP request methods: GET, POST, etc.

Page 30: Knowledge Graph 101 –from the perspective of engineers

Semantic Technologies: RDF

Data model to describe things and their interrelations

It is based on triples: subject, predicate, object

<The sky> <has the colour> <blue>

Page 32: Knowledge Graph 101 –from the perspective of engineers

Web of Data: RDF, Tables, Microdata

[Figure: Linked Open Data cloud, http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png]

Knowledge bases shown: YAGO, Cyc, TextRunner/ReVerb, WikiTaxonomy/WikiNet, SUMO, ConceptNet 5, BabelNet, ReadTheWeb

30 billion SPO triples (RDF) and growing

Page 33: Knowledge Graph 101 –from the perspective of engineers

Web of Data: RDF, Tables, Microdata

[Figure: Linked Open Data cloud, http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png]

30 billion SPO triples (RDF) and growing

YAGO
• 10M entities in 350K classes
• 120M facts for 100 relations
• 100 languages
• 95% accuracy

• 4M entities in 250 classes
• 500M facts for 6000 properties
• live updates

• 25M entities in 2000 topics
• 100M facts for 4000 properties
• powers Google knowledge graph

Ennio_Morricone type composer
Ennio_Morricone type GrammyAwardWinner
composer subclassOf musician
Ennio_Morricone bornIn Rome
Rome locatedIn Italy
Ennio_Morricone created Ecstasy_of_Gold
Ennio_Morricone wroteMusicFor The_Good,_the_Bad_,and_the_Ugly
Sergio_Leone directed The_Good,_the_Bad_,and_the_Ugly

Page 34: Knowledge Graph 101 –from the perspective of engineers

Linked RDF Triples on the Web

[Figure: identifiers for the same entities in different data sets, connected by links]

rdf.freebase.com/ns/en.rome
data.nytimes.com/51688803696189142301
geonames.org/3169070/roma (N 41° 54' 10'' E 12° 29' 2'')
dbpedia.org/resource/Rome
dbpedia.org/resource/Ennio_Morricone
imdb.com/name/nm0910607/
imdb.com/title/tt0361748/
yago/wordnet:Actor109765278
yago/wordnet:Artist109812338
yago/wikicategory:ItalianComposer

500 million links

Page 36: Knowledge Graph 101 –from the perspective of engineers

Linked Open Data cloud stats

[Figure: triples distribution and links distribution, http://lod-cloud.net/state/]

Page 39: Knowledge Graph 101 –from the perspective of engineers

Embedding (RDF) Microdata in HTML Pages

May 2, 2011

Maestro Morricone will perform

on the stage of the Smetana Hall

to conduct the Czech National

Symphony Orchestra and Choir.

The concert will feature both

Classical compositions and

soundtracks such as

the Ecstasy of Gold.

In programme two concerts for

July 14th and 15th.

<html … May 2, 2011
<div typeof="event:music">
  <span id="Maestro_Morricone">
    Maestro Morricone
    <a rel="sameAs" resource="dbpedia/Ennio_Morricone"/>
  </span>…
  <span property="event:location">Smetana Hall</span>
  <span property="rdf:type" resource="yago:performance">The concert</span> will feature
  <span property="event:date" content="14-07-2011"></span>
  July 1
</div>

Supported by RDFa and microformats like schema.org

Page 44: Knowledge Graph 101 –from the perspective of engineers

Web Data Commons

Page 45: Knowledge Graph 101 –from the perspective of engineers

Use Case: Question Answering

This town is known as "Sin City" & its

downtown is "Glitter Gulch"

This American city has two airports

named after a war hero and a WW II battle

[Figure: pipeline from question classification & decomposition to knowledge back-ends]

D. Ferrucci et al.: Building Watson. AI Magazine, Fall 2010.

IBM Journal of R&D 56(3/4), 2012: This is Watson.

Q: Sin City ?

movie, graphic novel, nickname for city, …

A: Vegas ? Strip ?

Vega (star), Suzanne Vega, Vincent Vega, Las Vegas, …

comic strip, striptease, Las Vegas Strip, …

Page 46: Knowledge Graph 101 –from the perspective of engineers

Moon Shots in Anderson Cancer Center

Page 47: Knowledge Graph 101 –from the perspective of engineers

Dynamic Semantic Publishing in BBC

Page 50: Knowledge Graph 101 –from the perspective of engineers

Looking Inside the Data

Model and Query Language

of Knowledge Graph

Page 51: Knowledge Graph 101 –from the perspective of engineers

RDF is the first layer of the

semantic web standards

Introduction to RDF

Page 52: Knowledge Graph 101 –from the perspective of engineers

RDF stands for

Resource Description Framework

Introduction to RDF

Page 53: Knowledge Graph 101 –from the perspective of engineers

RDF stands for

Resource: pages, images, videos, ...

everything that can have a URI

Description: attributes, features, and

relations of the resources

Framework: model, languages and

syntaxes for these descriptions

Introduction to RDF

Page 54: Knowledge Graph 101 –from the perspective of engineers

RDF model

In RDF, knowledge always comes in threes.

RDF is a triple model, i.e. every piece of knowledge is broken down into

( subject , predicate , object )

Page 55: Knowledge Graph 101 –from the perspective of engineers

Example of RDF

doc.html has author Haofen and has theme

Music

doc.html has author Haofen

doc.html has theme Music

Page 56: Knowledge Graph 101 –from the perspective of engineers

Example of RDF

doc.html has author Haofen and has theme

Music

( doc.html, author, Haofen)

( doc.html, theme, Music )
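
One way to picture this decomposition in code is a set of 3-tuples plus a wildcard lookup. This is only a sketch of the idea, not how any particular triple store is implemented; the helper name `match` is made up for this example.

```python
# The two triples from the slide, stored as plain (subject, predicate, object) tuples.
triples = {
    ("doc.html", "author", "Haofen"),
    ("doc.html", "theme", "Music"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# "Who is the author of doc.html?"
authors = [o for _, _, o in match(triples, s="doc.html", p="author")]
```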

Page 57: Knowledge Graph 101 –from the perspective of engineers

[Figure: Subject, linked by Predicate, to Object]

A triple: the RDF atom

Page 58: Knowledge Graph 101 –from the perspective of engineers

RDF is also a graph model

to link the descriptions of resources

Page 59: Knowledge Graph 101 –from the perspective of engineers

RDF triples can be seen as arcs of a graph (vertex, edge, vertex)

Page 60: Knowledge Graph 101 –from the perspective of engineers

(doc.html, author, Haofen)

(doc.html, theme, Music)

Page 61: Knowledge Graph 101 –from the perspective of engineers

[Figure: node doc.html with an edge "author" to node Haofen and an edge "theme" to node Music]
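
The graph reading of triples can be sketched as an adjacency list in which each subject maps to its outgoing (edge label, target) pairs. The function name `to_graph` is an assumption made for this illustration.

```python
from collections import defaultdict

triples = [
    ("doc.html", "author", "Haofen"),
    ("doc.html", "theme", "Music"),
]

def to_graph(triples):
    """Turn (vertex, edge, vertex) triples into an adjacency list keyed by subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = to_graph(triples)
```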

Page 62: Knowledge Graph 101 –from the perspective of engineers

In RDF, resources and properties are identified by URIs

http://mydomain.org/mypath/myresource

Page 63: Knowledge Graph 101 –from the perspective of engineers

(http://ex.org/rr/doc.html, http://ex.org/schema#author, http://ex.org/~haofen#me)

(http://ex.org/rr/doc.html, http://ex.org/schema#theme, Music)

Page 64: Knowledge Graph 101 –from the perspective of engineers

In RDF, values of properties can also be literals, i.e. strings of characters

Page 65: Knowledge Graph 101 –from the perspective of engineers

(doc.html, author, Haofen)

(doc.html, theme, "Music")

Page 66: Knowledge Graph 101 –from the perspective of engineers

(http://ex.org/rr/doc.html, http://ex.org/schema#author, http://ex.org/~haofen#me)

(http://ex.org/rr/doc.html, http://ex.org/schema#theme, "Music")

Page 67: Knowledge Graph 101 –from the perspective of engineers

In RDF, literal values of properties can also be typed with XML Schema datatypes

Page 68: Knowledge Graph 101 –from the perspective of engineers

doc.html has one author Haofen

and has 192 pages
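
A typed literal such as "192"^^xsd:integer pairs a lexical form with a datatype. The sketch below shows how a consumer might turn that into a native value; it covers only two datatypes and is an illustration, not a full parser for any RDF syntax.

```python
import re

# Lexical form in quotes, then ^^ and a datatype name such as xsd:integer.
TYPED = re.compile(r'^"(?P<lex>.*)"\^\^(?P<dt>\S+)$')

# Only two datatypes are handled here; anything else stays a string.
CASTS = {
    "xsd:integer": int,
    "xsd:boolean": lambda value: value == "true",
}

def parse_literal(text):
    """Convert a typed literal to a Python value; plain literals become str."""
    found = TYPED.match(text)
    if found is None:
        return text.strip('"')
    cast = CASTS.get(found.group("dt"), str)
    return cast(found.group("lex"))

pages = parse_literal('"192"^^xsd:integer')
```

With the type attached, a query can compare the page count numerically instead of as a string.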

Page 69: Knowledge Graph 101 –from the perspective of engineers

(http://ex.org/rr/doc.html, http://ex.org/schema#author, http://ex.org/~haofen#me)

(http://ex.org/rr/doc.html, http://ex.org/schema#nbPages, "192"^^xsd:integer)

Page 70: Knowledge Graph 101 –from the perspective of engineers

RDF Blank Nodes

RDF allows blank nodes.

A resource may be anonymous, i.e. not identified by a URI; it is then written _:xyz

E.g. there exists a report about Music

Page 71: Knowledge Graph 101 –from the perspective of engineers

(_:x, rdf:type, http://ex.org/schema#Report)

(_:x, http://ex.org/schema#theme, "Music")

Page 72: Knowledge Graph 101 –from the perspective of engineers

RDF is a Data Model, Not a Serialisation Format

RDF Serialisation Formats: RDF/XML, Turtle, N-Triples

– RDF/XML

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Haofen Wang</foaf:name>
    <foaf:title>Dr</foaf:title>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Leeds"/>
  </foaf:Person>
</rdf:RDF>

Page 73: Knowledge Graph 101 –from the perspective of engineers

RDF is a Data Model, Not a Serialisation Format

RDF Serialisation Formats: RDF/XML, Turtle, N-Triples

– Turtle

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dt: <http://ecust.edu.cn/ontologies/foaf/whf/me.rdf#> .

dt:me
    rdf:type foaf:Person ;
    foaf:name "Haofen Wang" ;
    foaf:title "Dr" .

Page 74: Knowledge Graph 101 –from the perspective of engineers

RDF is a Data Model, Not a Serialisation Format

RDF Serialisation Formats: RDF/XML, Turtle, N-Triples

– N-Triples

<http://ecust.edu.cn/ontologies/foaf/whf/me.rdf#me> <http://xmlns.com/foaf/0.1/name> "Haofen Wang" .

<http://ecust.edu.cn/ontologies/foaf/whf/me.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
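
Because N-Triples writes one complete triple per line with full URIs, it is simple to emit. The following is a minimal sketch that handles only URIs and plain string literals (no language tags, datatypes or blank nodes); the function names are invented for this example.

```python
ME = "http://ecust.edu.cn/ontologies/foaf/whf/me.rdf#me"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def escape(text):
    """Escape backslashes and double quotes inside a literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def to_ntriples(subject, predicate, obj, literal=False):
    """Serialise one triple as a single N-Triples line."""
    term = f'"{escape(obj)}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {term} ."

lines = [
    to_ntriples(ME, FOAF + "name", "Haofen Wang", literal=True),
    to_ntriples(ME, RDF_TYPE, FOAF + "Person"),
]
```

The same two triples could equally be written as Turtle or RDF/XML; only the surface syntax changes, not the data model.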

Page 76: Knowledge Graph 101 –from the perspective of engineers

Open-world assumption

as opposed to the closed-world assumption of classical systems

Page 77: Knowledge Graph 101 –from the perspective of engineers

In short: the absence of a triple is not significant

Page 78: Knowledge Graph 101 –from the perspective of engineers

(doc.html, author, Haofen)

doesn't mean doc.html has exactly one author

Page 79: Knowledge Graph 101 –from the perspective of engineers

(doc.html, author, Haofen)

means doc.html has at least one author
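
The practical consequence of the open-world assumption can be sketched with a lookup that answers "unknown" rather than "false" for a missing triple. The second author name below is hypothetical and the helper `holds` is made up for this illustration.

```python
facts = {("doc.html", "author", "Haofen")}

def holds(store, triple):
    """Open-world lookup: True if asserted, None (unknown) otherwise, never False."""
    return True if triple in store else None

known = holds(facts, ("doc.html", "author", "Haofen"))
# "Alice" is a hypothetical second author: nothing is asserted about her,
# so the answer is "unknown", not "no".
unknown = holds(facts, ("doc.html", "author", "Alice"))
```

A closed-world system, such as a typical relational database query, would report the second case as false.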

Page 80: Knowledge Graph 101 –from the perspective of engineers

RDF – Distribute by Cells!

Needs to reference both schema and entities

Most flexible: can distribute data in any way at all!

Cells, each naming its column (schema) and its row (entity):
Family: SP1 = Orchidaceae
Duration: SP1 = Perennial
Status: SP1 = Endangered

Uniform Resource Identifier (URI) as common reference

<http://www.usda.gov/classification/plants/species.owl#Orchidaceae>

<http://www.usda.gov/classification/plants/taxaonomy.owl#Family>

Page 81: Knowledge Graph 101 –from the perspective of engineers

Distribute by cells!?

Cell: Family, SP1, Orchidaceae

Subject, Predicate and Object are URIs

<SP1> <Family> <Orchidaceae>

Resource Description Framework (RDF)

Page 82: Knowledge Graph 101 –from the perspective of engineers

3 Triples with Same Subject

<SP1> <Family> <Orchidaceae>

<SP1> <Duration> <Perennial>

<SP1> <Status> <Endangered>

Page 83: Knowledge Graph 101 –from the perspective of engineers

Integrate Automatically

[Figure: the three triples merge on their shared subject <SP1>]
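
Why integration is "automatic" can be sketched with set union: triples from separate sources that use the same identifier for SP1 combine without any schema alignment. The split into three sources is an assumption made for this example.

```python
# Three hypothetical sources, each holding some cells about the same species.
source_a = {("SP1", "Family", "Orchidaceae")}
source_b = {("SP1", "Duration", "Perennial")}
source_c = {("SP1", "Status", "Endangered"), ("SP1", "Family", "Orchidaceae")}

# Merging is a set union; the duplicate Family triple collapses by itself.
merged = source_a | source_b | source_c

# Everything known about SP1, gathered through the shared subject.
profile = {p: o for s, p, o in merged if s == "SP1"}
```

This only works because all three sources agree on the identifier, which is the role the shared URI plays.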

Page 84: Knowledge Graph 101 –from the perspective of engineers

SPARQL

SPARQL stands for SPARQL Protocol and RDF Query Language

A query language for RDF, based on the RDF data model

Makes it possible to write complex joins over disparate datasets

Implemented by all major RDF databases

See more: http://www.w3.org/TR/rdf-sparql-query/

Page 85: Knowledge Graph 101 –from the perspective of engineers

Structure of a SPARQL Query

Page 86: Knowledge Graph 101 –from the perspective of engineers

SPARQL query

SELECT ...

FROM ...

WHERE { ... }

Page 87: Knowledge Graph 101 –from the perspective of engineers

SELECT clause

to identify the values to

be returned

Page 88: Knowledge Graph 101 –from the perspective of engineers

FROM clause

to identify the data

sources to query

Page 89: Knowledge Graph 101 –from the perspective of engineers

WHERE clause

the triple/graph pattern to

be matched against the

triples/graphs of RDF

Page 90: Knowledge Graph 101 –from the perspective of engineers

WHERE clause

a conjunction of triple patterns:

{ ?x rdf:type ex:Person .
  ?x ex:name ?name }

Page 91: Knowledge Graph 101 –from the perspective of engineers

PREFIX

to declare the schema

used in the query

Page 92: Knowledge Graph 101 –from the perspective of engineers

Example: persons and their names

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://ex.org/schema#>
SELECT ?person ?name
WHERE {
  ?person rdf:type ex:Person .
  ?person ex:name ?name .
}
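
To make the matching of a WHERE clause concrete, here is a naive sketch of how a conjunction of triple patterns can be evaluated against a list of triples. It is a toy nested-loop join, not how a real SPARQL engine works, and the ex:doc triple is invented test data.

```python
def match_pattern(store, patterns):
    """Evaluate a conjunction of triple patterns; terms starting with '?' are variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in store:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions

store = [
    ("ex:whf", "rdf:type", "ex:Person"),
    ("ex:whf", "ex:name", "haofen"),
    ("ex:doc", "rdf:type", "ex:Report"),  # invented, must not match
]
rows = match_pattern(store, [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:name", "?name"),
])
```

Reusing ?person in both patterns is what performs the join: the second pattern only matches triples whose subject equals the value already bound.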

Page 93: Knowledge Graph 101 –from the perspective of engineers

example of result

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#" >

<head>

<variable name="person"/>

<variable name="name"/>

</head>

<results ordered="false" distinct="false">

<result>

<binding name="person">

<uri>http://ex.org/schema#whf</uri>

</binding>

<binding name="name">

<literal>haofen</literal>

</binding>

</result>

<result> ...

Page 94: Knowledge Graph 101 –from the perspective of engineers

FILTER

to add constraints to the graph pattern (e.g., a numerical comparison such as ?x > 17)

Page 95: Knowledge Graph 101 –from the perspective of engineers

Example: persons at least 18 years old

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://ex.org/schema#>
SELECT ?person ?name
WHERE {
  ?person rdf:type ex:Person .
  ?person ex:name ?name .
  ?person ex:age ?age .
  FILTER (?age > 17)
}
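
Conceptually, FILTER discards candidate solutions after the graph pattern has produced them. A sketch with invented bindings:

```python
# Candidate solutions as the graph pattern might produce them (invented data).
solutions = [
    {"?person": "ex:a", "?name": "Ann", "?age": 17},
    {"?person": "ex:b", "?name": "Bob", "?age": 18},
    {"?person": "ex:c", "?name": "Cy", "?age": 42},
]

# FILTER (?age > 17): keep only the solutions that satisfy the constraint.
adults = [row for row in solutions if row["?age"] > 17]
names = [row["?name"] for row in adults]
```

Note that a person with no ex:age triple never becomes a candidate at all, so the filter is never evaluated for them.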

Page 96: Knowledge Graph 101 –from the perspective of engineers

FILTER can use many

operators, functions (e.g.,

regular expressions), and

even users' extensions

Page 97: Knowledge Graph 101 –from the perspective of engineers

OPTIONAL

to make the matching of

a part of the pattern

optional

Page 98: Knowledge Graph 101 –from the perspective of engineers

Example: retrieve the age if available

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://ex.org/schema#>
SELECT ?person ?name ?age
WHERE {
  ?person rdf:type ex:Person .
  ?person ex:name ?name .
  OPTIONAL { ?person ex:age ?age }
}
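
OPTIONAL behaves like a left outer join: every solution of the mandatory part is kept, and the optional variable is bound only when a match exists. A sketch with invented data:

```python
# Mandatory part: every person has a name (invented data).
names = {"ex:a": "Ann", "ex:b": "Bob"}
# Optional part: only some persons have a known age.
ages = {"ex:b": 18}

rows = []
for person, name in sorted(names.items()):
    row = {"?person": person, "?name": name}
    if person in ages:
        row["?age"] = ages[person]  # bound only when the optional pattern matches
    rows.append(row)
```

Without OPTIONAL, the person lacking an age would drop out of the result entirely.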

Page 99: Knowledge Graph 101 –from the perspective of engineers

UNION

to give alternative

patterns in a query

Page 100: Knowledge Graph 101 –from the perspective of engineers

Example: explicit or implicit adults

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://ex.org/schema#>
SELECT ?name
WHERE {
  ?person ex:name ?name .
  {
    { ?person rdf:type ex:Adult }
    UNION
    { ?person ex:age ?age
      FILTER (?age > 17) }
  }
}

Page 101: Knowledge Graph 101 –from the perspective of engineers

Sequence & modify

ORDER BY: sort the results

LIMIT: maximum number of results

OFFSET: rank of the first result

Page 102: Knowledge Graph 101 –from the perspective of engineers

Example: results 21 to 40, ordered by name

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://ex.org/schema#>
SELECT ?person ?name
WHERE {
  ?person rdf:type ex:Person .
  ?person ex:name ?name .
}
ORDER BY ?name
LIMIT 20
OFFSET 20
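
The arithmetic behind "results 21 to 40" is a sort followed by a slice: OFFSET 20 skips the first 20 rows and LIMIT 20 keeps the next 20. The names below are generated test data and `page` is a made-up helper.

```python
def page(rows, key, limit, offset):
    """ORDER BY key, then skip `offset` rows and keep at most `limit` rows."""
    return sorted(rows, key=key)[offset:offset + limit]

# Sixty generated names: name001 ... name060.
names = [f"name{i:03d}" for i in range(1, 61)]
result = page(names, key=str, limit=20, offset=20)
```

Paging is only stable when an ORDER BY is present; without it the engine may return rows in any order.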

Page 103: Knowledge Graph 101 –from the perspective of engineers

Negation is tricky and errors can easily be made.

Page 104: Knowledge Graph 101 –from the perspective of engineers

Does this find persons who do not know "Java"?

PREFIX ex: <http://ex.org/schema#>
SELECT ?name
WHERE {
  ?person ex:name ?name .
  ?person ex:knows ?x
  FILTER ( ?x != "Java" )
}

Page 105: Knowledge Graph 101 –from the perspective of engineers

NO! It also returns persons who know "Java" and something else!

PREFIX ex: <http://ex.org/schema#>
SELECT ?name
WHERE {
  ?person ex:name ?name .
  ?person ex:knows ?x
  FILTER ( ?x != "Java" )
}

haofen ex:knows "Java"

haofen ex:knows "C++"

haofen is an answer...
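
The pitfall is that the filter is applied per matched triple, not per person. The sketch below contrasts that with a set difference, which is what "does not know Java" actually means; SPARQL 1.1 offers FILTER NOT EXISTS and MINUS for this purpose. The person "alice" is invented for the example.

```python
knows = [
    ("haofen", "Java"),
    ("haofen", "C++"),
    ("alice", "C++"),  # invented person who really does not know Java
]

# Naive reading of FILTER (?x != "Java"): any person with some non-Java skill.
naive = sorted({person for person, skill in knows if skill != "Java"})

# Intended meaning: everyone, minus those with a Java triple.
java_people = {person for person, skill in knows if skill == "Java"}
correct = sorted({person for person, _ in knows} - java_people)
```

Even the corrected version is subject to the open-world caveat: it finds persons not stated to know Java.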

Page 106: Knowledge Graph 101 –from the perspective of engineers

ASK

to check just if there is at

least one answer ; result

is "true" or "false"

Page 107: Knowledge Graph 101 –from the perspective of engineers

Example: is there a person older than 17?

PREFIX ex: <http://ex.org/schema#>
ASK
{
  ?person ex:age ?age
  FILTER (?age > 17)
}

Page 108: Knowledge Graph 101 –from the perspective of engineers

SPARQL protocol

sending queries and their results across the web

Page 109: Knowledge Graph 101 –from the perspective of engineers

Example with HTTP binding

GET /sparql/?query=<encoded query> HTTP/1.1
Host: www.ecust.edu.cn
User-agent: my-sparql-client/0.1
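
The "<encoded query>" placeholder is the query text percent-encoded as a URL parameter. The sketch below builds such a URL without sending anything; the host and path are taken from the slide and may not correspond to a live endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

query = "ASK { ?person <http://ex.org/schema#age> ?age FILTER (?age > 17) }"

# Host and path follow the slide; they are not known to be a working endpoint.
url = "http://www.ecust.edu.cn/sparql/?" + urlencode({"query": query})

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

A client would then issue a GET on `url` and ask for a result format through the Accept header.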

Page 110: Knowledge Graph 101 –from the perspective of engineers

# prefix declarations
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# result clause
SELECT *
# dataset definition
FROM <http://dbpedia.org>
# query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
  ?s dbp-ont:birthDate ?date
}
# solution modifiers: latest birth date first, i.e. youngest first
ORDER BY DESC(?date)
LIMIT 10

SELECT query: find 10 of these and order them by date (ORDER BY)

Someone who is a Person & an Astronaut & Retired, youngest first

Page 111: Knowledge Graph 101 –from the perspective of engineers

Comparison with RDB

Page 112: Knowledge Graph 101 –from the perspective of engineers

One-to-Many Relational Model

Page 114: Knowledge Graph 101 –from the perspective of engineers

Equivalent Semantic Model - Easy

<triple 32: "person2" "type" "person">
<triple 33: "person2" "first-name" "Rose">
<triple 34: "person2" "middle-initial" "Elizabeth">
<triple 35: "person2" "last-name" "Fitzgerald">
<triple 36: "person2" "suffix" "none">
<triple 37: "person2" "alma-mater" "Sacred-Heart-Convent">
<triple 38: "person2" "birth-year" "1890">
<triple 39: "person2" "death-year" "1995">
<triple 40: "person2" "sex" "female">
<triple 41: "person2" "spouse" "person1">
<triple 58: "person2" "has-child" "person17">
<triple 56: "person2" "has-child" "person15">
<triple 54: "person2" "has-child" "person13">
<triple 52: "person2" "has-child" "person11">
<triple 50: "person2" "has-child" "person9">
<triple 48: "person2" "has-child" "person7">
<triple 46: "person2" "has-child" "person6">
<triple 44: "person2" "has-child" "person4">
<triple 42: "person2" "has-child" "person3">
<triple 60: "person2" "profession" "home-maker">

Page 115: Knowledge Graph 101 –from the perspective of engineers

Semantic Model – Explicit Relationships

Relationship Model

Information Given
ZJU located_in Hangzhou
Hangzhou located_in China
located_in type transitiveProperty

Information Inferred
ZJU located_in China

Question: In which country is ZJU located?

Answer: In China

Relationships are explicit in the model and directly available to applications!

Where are the relationships?
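
What an inference engine does with a transitive property can be sketched as a fixed-point computation over the given pairs. This is a naive illustration, not the algorithm of any particular reasoner.

```python
def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# located_in facts as given on the slide.
given = {("ZJU", "Hangzhou"), ("Hangzhou", "China")}
inferred = transitive_closure(given) - given
```

Adding an intermediate fact such as a province needs no change to this code, which is the point the comparison with the relational model goes on to make.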

Page 116: Knowledge Graph 101 –from the perspective of engineers

Relational Model – Implicit Relationships

Company Table
ID   Company Name
IDC  ZJU

City Table
City      Country
Hangzhou  China

Company_City Table
City_ID   CO_ID
Hangzhou  IDC

Question: In which country is ZJU located?

Develop a query:
Select Country
From Company Table, City Table, Company_City Table
Where Company Name = "ZJU" and ID = CO_ID and City = City_ID

Answer: In China

Where are the relationships?

Relationships are in documents, SQL code and collective memories, and are not available to applications!

Data Definition Statements? Applications do not use them, they are not descriptive, and their scope is a single database

Data Dictionary? Data Registry? They are for human, not computer, use

Page 117: Knowledge Graph 101 –from the perspective of engineers

When Changes Needed w/ Semantic Model

Relationship Model (with new data)

Information Given
ZJU located_in Hangzhou
Hangzhou located_in Zhejiang
Zhejiang located_in China
located_in type transitiveProperty

Information Inferred
Hangzhou located_in China
ZJU located_in China

Question: In which country is ZJU located?

Answer: In China

Changes are easy to make

Page 118: Knowledge Graph 101 –from the perspective of engineers

When Changes Needed w/ Relational Model

Before the change:

Company Table
ID   Company Name
IDC  ZJU

City Table
City      Country
Hangzhou  China

Company_City Table
City_ID   CO_ID
Hangzhou  IDC

After the change:

Company Table
ID   Company Name
IDC  ZJU

State Table
State Name  ID  Country
Zhejiang    ZJ  China

Company_City Table
City_ID   CO_ID
Hangzhou  IDC

City_State Table
City_ID   State_ID
Hangzhou  ZJ

Question: In which country is ZJU located?

Using the same query:
Select Country
From Company Table, City Table, Company_City Table
Where Company Name = "ZJU" and ID = CO_ID and City = City_ID

Get no answer!? The query doesn't work any more!

Changes should be avoided at ALL costs
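
The breakage can be reproduced with an in-memory SQLite database. The sketch assumes snake_case table and column names in place of the slide's informal ones, and models the schema change as dropping the City table in favour of State and City_State tables.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE company (id TEXT, name TEXT);
CREATE TABLE city (city TEXT, country TEXT);
CREATE TABLE company_city (city_id TEXT, co_id TEXT);
INSERT INTO company VALUES ('IDC', 'ZJU');
INSERT INTO city VALUES ('Hangzhou', 'China');
INSERT INTO company_city VALUES ('Hangzhou', 'IDC');
""")

QUERY = """
SELECT city.country FROM company, city, company_city
WHERE company.name = 'ZJU' AND company.id = company_city.co_id
  AND city.city = company_city.city_id
"""
before = db.execute(QUERY).fetchall()

# Schema change: the country now hangs off a new state table.
db.executescript("""
DROP TABLE city;
CREATE TABLE state (name TEXT, id TEXT, country TEXT);
CREATE TABLE city_state (city_id TEXT, state_id TEXT);
INSERT INTO state VALUES ('Zhejiang', 'ZJ', 'China');
INSERT INTO city_state VALUES ('Hangzhou', 'ZJ');
""")
try:
    after = db.execute(QUERY).fetchall()
except sqlite3.OperationalError as exc:
    after = None          # the unchanged query can no longer run
    error = str(exc)
```

The data still contains the answer, but the query encodes the old join path and has to be rewritten by hand.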

Page 119: Knowledge Graph 101 –from the perspective of engineers

“Smart” Data vs. Dumb Data

Depends on where the “smart” is

Today: Dumb Data (e.g., RDB) + Smart Application Code (SQL code)

Tomorrow: Smart Data (RDF/OWL ontology) + Uniform Inference Engine

Page 120: Knowledge Graph 101 –from the perspective of engineers

Triple Database as Data Warehouse

one-to-many relations are directly encoded

without the indirection of tables

Add new predicates (attributes) or class

hierarchy without changing any schema

Never think about what to index because all

the predicates are indexed

Ideal as data repository (warehouse) for

heterogeneous data sources

It’s a large-scale graph database

Ad hoc query is easy without schema

Page 121: Knowledge Graph 101 –from the perspective of engineers

Successful Cases of KG in Enterprises

Page 122: Knowledge Graph 101 –from the perspective of engineers

Biodiversity Repository

Page 123: Knowledge Graph 101 –from the perspective of engineers

Challenges for Biodiversity Repository

• Very diverse subjects, even just for flora

Page 124: Knowledge Graph 101 –from the perspective of engineers

Sample Fauna Species Data on RDB Table

Field rule code Record ID Version Version status Record status Name for list view Primary level Secondary level Tertiary level Borneensis no. Suffix No. Web release
attrul id version verstat recstat brief maxcls subcls mincls entryno subno webflg
MA 25 1 1 1 Echinosorex gymnurus Mammals BOR-00000-03065 Yes
MA 27 1 1 1 Hylomys suillus Mammals BOR-00000-03067 Yes
MA 29 1 1 1 Suncus murinus Mammals BOR-00000-03069 No
MA 33 1 1 1 Tupaia glis Mammals BOR-00000-03073 No

Registration no. Old Borneensis no Registration date Collection date Collector's name Country State District Village or nearest village Specific locality
Regno OldRegno Regdate collectiondate Collector country State District Village locality
MA0000005 9/15/2004 Henry Benard Malaysia Sabah Lahad Datu Tabin Forest Reserve, Lahad Datu
MA0000007 9/15/2004 S.Yasuma Malaysia
MA0000009 9/15/2004 S.Yasuma Malaysia
MA0000013 9/15/2004 21/5/1999 Arifin Ag. Ali Malaysia Sabah Tawau Lembangan Maliau Basin

Latitude Longitude Altitude(Sign) Altitude Habitat type Substrate Ecological data Method of capture/collection Specimen preparation Specimen part Sex Total Length
Latitude Longitude Altitude-kbn Altitude Habita Substrate Ecological capture preparation Specimenpart Sex Total Length
Female 625 mm
Male

Tail length Weight Head-body length Hind foot length Forearm length Ear Other measurement Identification date Identifier Identification note Phylum Phylum(ID)
meamethod meavalue HB length hindfoot forearm Ear Othemeasure Identdate Identifier Identnote phylum phylum-id
225 mm 400 mm 65 mm 30 mm
CHORDATA
207.0 mm 180.0 g 208.0 mm 46.0 mm 16.0 mm CHORDATA

Credited to Universiti Malaysia Sabah

Page 125: Knowledge Graph 101 –from the perspective of engineers

A Sample Fauna Species Data (cont’d)

Subphylum Subphylum(ID) Superclass Superclass(ID) Class Class(ID) Subclass Subclass(ID) Superorder Superorder(ID) Order Order(ID) Suborder Suborder(ID)
subphylum subphylum-id superclass superclass-id Class Class-id subclass subclass-id superorder superorder-id order order-id suborder suborder-id
INSECTIVORA
Insectivora
VERTERBRATA MAMMALIA Insectivora
VERTERBRATA MAMMALIA Scandentia

Superfamily Superfamily(ID) Family Family(ID) Subfamily Subfamily(ID) Genus Genus(ID) Species Species(ID) Subspecies Author Common name (English)
superfamily superfamily-id family family-id subfamily subfamily-id genus genus-id species species-id subspecies author English
ERINACEIDAE Hylominae ID:MA00000385 Echinosorex gymnurus
Erinaceidae Hylomys suillus Lesser Gymnure
Soricidae ID:MA00000428 Suncus murinus House Shrew
Tupaiidae Tupaiinea Tupaia glis ID:MA00000483 Common Treeshrew

Common name (local language) Type status Conservation Status Distribution Preservation method Jar no. Room no. Compactor no. Bay no. Shelves no. Container/Box/Jar no.
locallang Typestatus consst distribution Preservation method Jarno roomno compactor bayno shelvesno Containerno
Wet room Wet(Eg-01)
Tikus babi Dry room Dry(Hs-01)
Cencurut Rumah Wet Room Wet(Sm-01)
Tupai Moncong Besar Dry specimen Dry Room Dry(Tg-01)

Loaned ID Loaned to (Name & address) E-mail Telephone Fax Country (Borrower) Date loaned Due date Date returned Remarks Multimedia link Release flag Release level Regn status
loanedID loanedto loanmail phone fax countryloan Loaned duedate Returned remarks medialink opnflg opnlvl matst
Malaysia 10000 20
Malaysia 10000 20
Malaysia 10000 20
Malaysia 10000 20

Horrendous table schema

More than 70% of table cells contain null values

Need to call in experts to update the schema

Page 126: Knowledge Graph 101 –from the perspective of engineers

Challenges for Biodiversity

Repository

Many islands of biodiversity information

Some estimate that only 10% of the information is known (collected)

We don’t even know what else is to come

A mammoth data integration problem, let alone integrated understanding & knowledge discovery

Try to design a schema for collection data tables and data warehouses!!

Page 127: Knowledge Graph 101 –from the perspective of engineers

Perfect Application of

Semantic Database

Page 128: Knowledge Graph 101 –from the perspective of engineers

Life Science Knowledge Base

Page 129: Knowledge Graph 101 –from the perspective of engineers

Challenges for Life Science – Diversity

Very diverse subjects

How to relate all the information cohesively?

Page 131: Knowledge Graph 101 –from the perspective of engineers

Challenges for Life Science – Knowledge Representation

Designed for humans (90%+), not for computers

Page 132: Knowledge Graph 101 –from the perspective of engineers

RDF Class Hierarchy Maps

Taxonomy

NCI ontology: a comprehensive biomedical taxonomy, containing 1,200,000 concepts mapped to 2,900,000 terms with 5,000,000 relationships, e.g.,

Medicine
Medical_Specialties
Radiology
Radiology_Therapeutic
Radiology_Bone
Radiology_Dental
Pediatric_Radiology
Nuclear_Medicine
Medical_Radiation_Physics
Diagnostic_Radiology_Ionizing_and_Nonionizing_
Radiology_Thorax_Chest
Radiology_Soft_Tissue
Radiology_Head_Neck
Interventional_Radiology

Page 133: Knowledge Graph 101 –from the perspective of engineers

Looking for Alzheimer

Disease Targets

Signal transduction pathways

are considered to be rich in

“druggable” targets - proteins

that might respond to chemical

therapy

CA1 Pyramidal Neurons are

known to be particularly

damaged in Alzheimer’s disease.

Can we find candidate genes

known to be involved in signal

transduction and active in

Pyramidal Neurons?

Page 134: Knowledge Graph 101 –from the perspective of engineers

A SPARQL Query Spanning 4 Sources

SPARQL makes ad hoc queries over

multiple data sources (in RDF) easy

Page 135: Knowledge Graph 101 –from the perspective of engineers

Ad hoc Tracking & Capturing of

Component Properties & Processes

Page 136: Knowledge Graph 101 –from the perspective of engineers

NASA Space Shuttle Launch

Maintenance

Encode the complete maintenance rules &

process (millions of them) of all components

(inter-dependent) in a knowledgebase

Provide process guidance, monitoring,

validation, QA and QC for space shuttle

launch maintenance

Page 137: Knowledge Graph 101 –from the perspective of engineers

Statoil Exploration

Page 138: Knowledge Graph 101 –from the perspective of engineers

Siemens Energy Service

Page 139: Knowledge Graph 101 –from the perspective of engineers

A General Pipeline to Publish

and Explore Knowledge Graph

Page 140: Knowledge Graph 101 –from the perspective of engineers

Architecture scenarios


Page 141: Knowledge Graph 101 –from the perspective of engineers

Motivation: Music!

[Figure: architecture of a music Linked Data application. Recoverable labels: Data acquisition (Metadata, Streaming providers, Downloads, Musical Content, Other content; Physical Wrapper, LD Wrapper, D2R Transf., RDF/XML, RDFa); LD Dataset Access (Integrated Dataset, Vocabulary Mapping, Interlinking, Cleansing, SPARQL Endpoint, Publishing); Application (Analysis & Mining Module, Visualization Module).]

Page 142: Knowledge Graph 101 –from the perspective of engineers

Large KBs You Need to Know

Page 143: Knowledge Graph 101 –from the perspective of engineers

DBpedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data.

http://dbpedia.org/
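As a hedged sketch of what such a query can look like, the snippet below builds (but does not send) a GET request URL for a SPARQL query. The endpoint address `http://dbpedia.org/sparql`, the `format` parameter, and the choice of `dbo:birthPlace` and `dbr:Berlin` are assumptions for illustration; they are not taken from the slides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed public endpoint; it is not named in the slides.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: people whose birth place is Berlin.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE { ?person dbo:birthPlace dbr:Berlin } LIMIT 10
"""

def build_request_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL using the protocol's `query` parameter."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",  # assumed result-format parameter
    }
    return f"{ENDPOINT}?{urlencode(params)}"

url = build_request_url(QUERY)
print(url)
```

The URL can then be fetched with any HTTP client; the sketch stops short of the network call so that it runs offline.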

Page 144: Knowledge Graph 101 –from the perspective of engineers

DBpedia

The DBpedia Ontology is a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia. The ontology currently covers 685 classes which form a subsumption hierarchy and are described by 2,795 different properties.

http://dbpedia.org/

Page 145: Knowledge Graph 101 –from the perspective of engineers

DBpedia

The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia. The English version of the DBpedia 2014 data set currently describes 4.58 million “things” with 583 million “facts”.

http://dbpedia.org/

Page 146: Knowledge Graph 101 –from the perspective of engineers

YAGO

YAGO (Yet Another Great Ontology) is a knowledge base developed at the Max Planck Institute for Computer Science in Saarbrücken. It is automatically extracted from Wikipedia and other sources.

Page 147: Knowledge Graph 101 –from the perspective of engineers

YAGO

YAGO2s (stable release) is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.

http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/

Page 148: Knowledge Graph 101 –from the perspective of engineers

YAGO Demo

https://gate.d5.mpi-inf.mpg.de/webyagospotlx/Browser

https://gate.d5.mpi-inf.mpg.de/webyagospotlx/WebInterface

Page 149: Knowledge Graph 101 –from the perspective of engineers

Freebase

A community-curated database of well-known people, places, and things.

It is an online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions.

http://www.freebase.com/

Page 150: Knowledge Graph 101 –from the perspective of engineers

Freebase

Page 151: Knowledge Graph 101 –from the perspective of engineers

NELL

NELL (Never-Ending Language Learner) can extract facts from text found in hundreds of millions of web pages and improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.

http://rtw.ml.cmu.edu/rtw/

Page 152: Knowledge Graph 101 –from the perspective of engineers

NELL

NELL has accumulated over 50 million candidate beliefs by reading the web, and it is considering these at different levels of confidence. NELL has high confidence in 2,180,254 of these beliefs.

Page 153: Knowledge Graph 101 –from the perspective of engineers

NELL

http://rtw.ml.cmu.edu/rtw/kbbrowser/

Page 154: Knowledge Graph 101 –from the perspective of engineers

Entity Linking

Page 155: Knowledge Graph 101 –from the perspective of engineers

Public Toolkits and Web Services for

Entity Linking

Wikipedia Miner

TagMe

DBpedia Spotlight

Illinois Wikifier

AIDA

(OpenCalais)

Page 156: Knowledge Graph 101 –from the perspective of engineers

Wikipedia Miner [Milne & Witten 2008b]

Open source

(Public) web service

– Java

– Hadoop preprocessing pipeline

Lexical matching + machine learning

Target KB: Wikipedia

See http://wikipedia-miner.cms.waikato.ac.nz

Page 157: Knowledge Graph 101 –from the perspective of engineers
Page 158: Knowledge Graph 101 –from the perspective of engineers

TagMe [Ferragina & Scaiella 2010]

Web service only (demo + API)

Approach similar to Wikipedia Miner

– Voting for disambiguation, based on all possible bindings

– Heuristics to select the best target

Designed for short texts

Target KB: Wikipedia

See http://tagme.di.unipi.it/

Page 159: Knowledge Graph 101 –from the perspective of engineers
Page 160: Knowledge Graph 101 –from the perspective of engineers

Illinois Wikifier [Ratinov et al. 2011]

Local install + online demo

– uses Illinois NER system

Disambiguation as weighted sum of features

– Textual similarity

– Global coherence based on link structure

Target KB: Wikipedia

See http://cogcomp.cs.illinois.edu/page/software_view/33

Page 161: Knowledge Graph 101 –from the perspective of engineers

Demo:

http://cogcomp.cs.illinois.edu/demo/wikify/?id=25

Page 162: Knowledge Graph 101 –from the perspective of engineers

DBpedia Spotlight [Mendes et al., 2011]

Open source

Public web service

Disambiguation in local context

– vector-space model using bag-of-words and cosine similarity

– (actually, Lucene)

Target KB: DBpedia

See http://spotlight.dbpedia.org
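The local-context disambiguation described on this slide can be sketched in a few lines of Python. This is a minimal illustration of bag-of-words vectors compared by cosine similarity, not DBpedia Spotlight's implementation (which, per the slide, relies on Lucene); the candidate entities and their context descriptions are made up.

```python
import math
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-case, whitespace-tokenised term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical candidate entities with invented context descriptions.
candidates = {
    "Apple_Inc.": "technology company iphone mac computer",
    "Apple_(fruit)": "fruit tree orchard sweet food",
}

def disambiguate(context: str) -> str:
    """Pick the candidate whose description is most similar to the mention's context."""
    ctx = bag_of_words(context)
    return max(candidates, key=lambda e: cosine(ctx, bag_of_words(candidates[e])))

print(disambiguate("the new iphone from apple is a fast computer"))  # Apple_Inc.
```

A production system would weight terms (for example by inverse frequency) rather than use raw counts; raw counts keep the sketch short.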

Page 163: Knowledge Graph 101 –from the perspective of engineers

Demo: http://dbpedia-spotlight.github.io/demo/

Page 164: Knowledge Graph 101 –from the perspective of engineers

AIDA [Yosef et al. 2011]

Open source

– uses Stanford NER system

(Public) web service, API

Links to YAGO2

Disambiguation in 3 variants

– PriorOnly: link to most common target

– Local: disambiguate individual links with local features

– CocktailParty: collective disambiguation maximizing coherence using an iterative graph-based approach

Target KB: YAGO2

See http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/
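Of the three variants, PriorOnly is the simplest to illustrate. The sketch below is an assumption-laden toy, not AIDA's code: it counts how often each mention string points to each target in a hypothetical list of (mention, target) pairs and links a mention to its most frequent target. The Local and CocktailParty variants are not shown.

```python
from collections import Counter

# Hypothetical (mention, target) pairs, e.g. as harvested from link anchor texts.
anchor_links = [
    ("Paris", "Paris"), ("Paris", "Paris"), ("Paris", "Paris_Hilton"),
    ("Jaguar", "Jaguar_Cars"), ("Jaguar", "Jaguar"), ("Jaguar", "Jaguar_Cars"),
]

def build_priors(links):
    """Count, for every mention string, how often it refers to each target."""
    priors = {}
    for mention, target in links:
        priors.setdefault(mention, Counter())[target] += 1
    return priors

def prior_only(mention, priors):
    """Return the most common target for `mention`, or None if it was never seen."""
    counts = priors.get(mention)
    return counts.most_common(1)[0][0] if counts else None

priors = build_priors(anchor_links)
print(prior_only("Paris", priors))   # Paris
print(prior_only("Jaguar", priors))  # Jaguar_Cars
```

The prior ignores context entirely, which is why the slide lists it as a baseline next to the context-aware variants.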

Page 165: Knowledge Graph 101 –from the perspective of engineers

Demo: https://gate.d5.mpi-inf.mpg.de/webaida/

Page 166: Knowledge Graph 101 –from the perspective of engineers

OpenCalais

Only on public content

– does not keep a copy of content

– keeps a copy of the metadata it extracts

Free for up to 50,000 documents per day

Early adopters:

– CBS Interactive / CNET, Huffington Post, Al Jazeera, The White House

– more than 30,000 developers and 50 publishers

Target KB: Calais

See http://www.opencalais.com/

Page 167: Knowledge Graph 101 –from the perspective of engineers

Demo: http://viewer.opencalais.com/

Page 168: Knowledge Graph 101 –from the perspective of engineers
Page 169: Knowledge Graph 101 –from the perspective of engineers
Page 170: Knowledge Graph 101 –from the perspective of engineers

Knowledge Acquisition

from Unstructured Texts

Page 171: Knowledge Graph 101 –from the perspective of engineers

OpenIE/TextRunner

Learn syntactic patterns to extract relation instances from any domain from text

Completely unsupervised, no need for seeds

• Input: corpus C

• Output: a set of extracted relations, t: <e1, r, e2>

Parser phase on a portion of C, then pattern generation from the parsed documents

Page 172: Knowledge Graph 101 –from the perspective of engineers

ReVerb

Automatically identifies and extracts binary relationships from English sentences.

Designed for Web-scale information extraction

Considers all verbal phrases as potential relations and all noun phrases as arguments

Target relations cannot be specified in advance

Page 173: Knowledge Graph 101 –from the perspective of engineers

ReVerb (cont’d)

Input: raw text

Output: (argument1, relation phrase, argument2) triples

For example:

• Input: Bananas are an excellent source of potassium.

• Output: (bananas, be source of, potassium)

https://github.com/knowitall/reverb

Page 174: Knowledge Graph 101 –from the perspective of engineers

Ollie

Automatically identifies and extracts binary relationships from English sentences. Designed for Web-scale information extraction, where target relations are not specified in advance.

Ollie also captures context that modifies a binary relation. Presently Ollie handles attribution (he said/she believes) and enabling conditions (if X then).

https://github.com/knowitall/ollie

Page 175: Knowledge Graph 101 –from the perspective of engineers

Ollie (cont’d)

Enabling Condition:

Sentence: If I slept past noon, I'd be late for work.

Extraction: (I, 'd be late for, work) [enabler=If I slept past noon]

Attribution:

Sentence: Some people say Barack Obama was not born in the United States.

Extraction: (Barack Obama, was not born in, the United States) [attrib=Some people say]

Page 176: Knowledge Graph 101 –from the perspective of engineers

Ollie (cont’d)

Relational noun:

Some relations are expressed without verbs. Ollie can capture these as well as verb-mediated relations.

Sentence: Microsoft co-founder Bill Gates spoke at a conference on Monday.

Extraction: (Bill Gates, be co-founder of, Microsoft)

N-ary extractions:

Sentence: I learned that the 2012 Sasquatch music festival is scheduled for May 25th until May 28th.

Extraction: (the 2012 Sasquatch music festival, is scheduled for, May 25th)

Extraction: (the 2012 Sasquatch music festival, is scheduled until, May 28th)

N-ary: (the 2012 Sasquatch music festival, is scheduled, [for May 25th, to May 28th])
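The relationship between an n-ary extraction and its binary counterparts can be sketched as a small data transformation. This only illustrates the output shapes for the festival example; it is not how Ollie derives extractions, and the function name and argument layout are hypothetical.

```python
from typing import List, Tuple

def nary_to_binary(arg1: str, relation: str,
                   args: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Split (arg1, relation, [(preposition, argument), ...]) into binary triples."""
    return [(arg1, f"{relation} {prep}", arg) for prep, arg in args]

triples = nary_to_binary(
    "the 2012 Sasquatch music festival",
    "is scheduled",
    [("for", "May 25th"), ("until", "May 28th")],
)
for t in triples:
    print(t)
```

Each preposition is folded into the relation phrase, which reproduces the two binary extractions listed for the festival sentence.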

Page 177: Knowledge Graph 101 –from the perspective of engineers

SRLIE

Automatically identifies n-ary extractions from English sentences.

Designed for Web-scale information extraction, where target relations are not specified in advance.

Builds extractions from Semantic Role Labelling.

https://github.com/knowitall/srlie

Page 178: Knowledge Graph 101 –from the perspective of engineers

Chunked Extractors

https://github.com/knowitall/chunkedextractor

A collection of three extractors:

• ReVerb: an extractor for verb-mediated relations

– Sally sells sea shells

• Relnoun: an extractor for noun-mediated relations

– United States president Barack Obama

• Nesty: an extractor for nested relations

– Some people say that we never landed on the moon

Page 179: Knowledge Graph 101 –from the perspective of engineers

Compare different open IE systems:

TextRunner: learn syntactic patterns

ReVerb: consider verbal phrases as relations and noun phrases as arguments

Ollie: extract relations that are expressed without verbs, handle attribution

SRLIE: extract n-ary extractions

TextRunner, ReVerb and Ollie extract binary relationships; SRLIE extracts n-ary ones.

Page 180: Knowledge Graph 101 –from the perspective of engineers

SOFIE:

Extracts ontological facts from natural language documents and links the facts into an ontology.

Uses logical reasoning on the existing knowledge and on the new knowledge in order to disambiguate words to their most probable meaning.

Unites pattern matching, word sense disambiguation and ontological reasoning in one unified model.

• Input: target relations and type signatures for involved entities

http://www.mpi-inf.mpg.de/yago-naga/sofie/

Page 181: Knowledge Graph 101 –from the perspective of engineers

Extending a KB faces 3+ challenges (F. Suchanek et al.: WWW’09)

Existing KB facts:

type(Reagan, president)

spouse(Reagan, Davis)

spouse(Elvis, Priscilla)

New text: "Hermione is married to Ron"

Problem: If we want to extend a KB, we face (at least) 3 challenges

1. Understand which relations are expressed by patterns

"x is married to y" → spouse(x, y)

2. Disambiguate entities

"Hermione is married to Ron": "Ron" = RonaldReagan?

3. Resolve inconsistencies

spouse(Hermione, Reagan) & spouse(Reagan, Davis) ?
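The third challenge, resolving inconsistencies, can be illustrated with a hard consistency check. The sketch below assumes that `spouse` is functional and symmetric (each entity has at most one spouse) and reports which existing facts a candidate fact would contradict. It is a simplification for illustration, not SOFIE's reasoning procedure, and the constraint set is an assumption.

```python
# Existing KB facts from the slide, stored as (relation, subject, object).
kb = {
    ("type", "Reagan", "president"),
    ("spouse", "Reagan", "Davis"),
    ("spouse", "Elvis", "Priscilla"),
}

# Assumption for this sketch: each entity has at most one spouse.
FUNCTIONAL = {"spouse"}

def conflicts(kb, fact):
    """Return the existing facts that contradict `fact` under the constraint."""
    rel, x, y = fact
    if rel not in FUNCTIONAL:
        return set()
    # A conflict is an existing pair that shares a person with the new pair
    # but is not the same pair.
    return {(r, a, b) for (r, a, b) in kb
            if r == rel and {a, b} & {x, y} and {a, b} != {x, y}}

print(conflicts(kb, ("spouse", "Hermione", "Reagan")))
# {('spouse', 'Reagan', 'Davis')}
print(conflicts(kb, ("spouse", "Hermione", "RonWeasley")))
# set()
```

Linking "Ron" to Reagan clashes with spouse(Reagan, Davis), so the check is one signal that this reading is implausible. "RonWeasley" is a hypothetical alternative entity used only for the example.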

Page 182: Knowledge Graph 101 –from the perspective of engineers

PROSPERA

N-gram item-set patterns generalize narrow syntactic patterns to boost recall (different from SOFIE)

Reasoning with a large KB (YAGO) constrains extractions to boost precision

• Input: target relations and type signatures for involved entities

http://www.mpi-inf.mpg.de/yago-naga/sofie/

Page 183: Knowledge Graph 101 –from the perspective of engineers

Graph Database (with

Reasoning Supports)

Page 184: Knowledge Graph 101 –from the perspective of engineers

Current Graph databases (selected)

Open source

– Bigdata

– Sesame

– Jena

– Neo4j

Commercial Edition

– Virtuoso

– BigOwlim

– AllegroGraph

Page 185: Knowledge Graph 101 –from the perspective of engineers

Bigdata

High-performance

Supporting the RDF data model and RDR.

Embedded database or over a client/server REST API.

High-availability and dynamic sharding.

Blueprints and Sesame APIs.

High-level query with SPARQL

http://www.bigdata.com/

Page 186: Knowledge Graph 101 –from the perspective of engineers

Sesame

A Java framework for processing RDF data.

Easy-to-use API that can be connected to RDF storage solutions.

SPARQL endpoints

Two out-of-the-box RDF databases (the in-memory store and the native store)

Supports all mainstream RDF file formats

http://rdf4j.org/

Page 187: Knowledge Graph 101 –from the perspective of engineers

Sesame

Example code

for Sesame

http://rdf4j.org/

Page 188: Knowledge Graph 101 –from the perspective of engineers

Jena

A free and open source Java framework for building Semantic Web and Linked Data applications

Developed by HP Laboratories

In-memory or persistent storage

http://jena.apache.org/

Page 189: Knowledge Graph 101 –from the perspective of engineers

Jena

Example code to create graph with Jena

http://jena.apache.org/

Page 190: Knowledge Graph 101 –from the perspective of engineers

Neo4j

http://neo4j.com

A Graph database + Lucene index

Property Graph

Full ACID

(atomicity, consistency, isolation, durability)

High Availability (with Enterprise Edition)

32 billion nodes, 32 billion relationships, 64 billion properties

Embedded server

REST API

Page 191: Knowledge Graph 101 –from the perspective of engineers

Neo4j

Cypher

http://neo4j.com

Page 192: Knowledge Graph 101 –from the perspective of engineers

Neo4j

Good for

– Highly connected data

– Recommendations

– Path Finding

– A*

– Data First Schema

http://neo4j.com
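To make "path finding" over a property graph concrete, here is a plain-Python sketch that stores nodes with properties and typed relationships, and finds a shortest path by breadth-first search. It does not use the Neo4j API or Cypher, and it does not implement A*; all node names and properties are invented.

```python
from collections import deque

# A tiny property graph: nodes carry properties, relationships carry a type.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "dave": {"label": "Person"},
}
relationships = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "KNOWS", "dave"),
    ("alice", "KNOWS", "dave"),
]

def shortest_path(start, goal):
    """Breadth-first search along outgoing relationships; returns a node list or None."""
    adjacency = {}
    for src, _rel_type, dst in relationships:
        adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("alice", "carol"))  # ['alice', 'bob', 'carol']
print(shortest_path("alice", "dave"))   # ['alice', 'dave']
```

Breadth-first search returns a path with the fewest hops; a graph database exposes the same idea through its query language and adds indexing and persistence.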

Page 193: Knowledge Graph 101 –from the perspective of engineers

Virtuoso

Smart Data & Virtualization & Integration

Scalable & High-Performance Data Management

Web-scale identity & Security

Standards Compliance

http://virtuoso.openlinksw.com/

Page 194: Knowledge Graph 101 –from the perspective of engineers

Virtuoso

Unique hybrid

server

architecture

http://virtuoso.openlinksw.com/

Page 195: Knowledge Graph 101 –from the perspective of engineers

BigOwlim

The world’s leading RDF triplestore and graph database

The only triplestore that can perform semantic inferencing at scale

Allows users to create new semantic facts from existing facts

Handles massive loads, queries and inferencing in real time

http://www.ontotext.com/owlim
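"Creating new semantic facts from existing facts" is what a forward-chaining reasoner does. The sketch below applies two RDFS-style rules (transitivity of subClassOf, and type inheritance along subClassOf) until no new triples appear. It is a toy illustration in plain Python, not OWLIM's rule engine, and the class and instance names are made up.

```python
def infer(triples):
    """Forward chaining: apply two RDFS-style rules until a fixed point is reached."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":   # rule 1: subClassOf is transitive
                    new.add((s, "subClassOf", o2))
                elif p == "type":       # rule 2: instances inherit superclasses
                    new.add((s, "type", o2))
        if new <= facts:                # nothing new was derived: done
            return facts
        facts |= new

explicit = {
    ("Radiologist", "subClassOf", "Physician"),
    ("Physician", "subClassOf", "Person"),
    ("alice", "type", "Radiologist"),
}
inferred = infer(explicit)
print(sorted(inferred - explicit))
```

From three explicit triples the loop derives three more, including that alice is a Person. The nested loop is quadratic per pass, which is fine for a sketch but is exactly the cost that dedicated reasoners are built to avoid.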

Page 196: Knowledge Graph 101 –from the perspective of engineers

Allegrograph

http://www.franz.com/agraph/allegrograph

A modern, high-performance, persistent graph database

All clients are based on the REST protocol: Java Sesame, Java Jena, Python, etc.

Page 197: Knowledge Graph 101 –from the perspective of engineers

Allegrograph

AllegroGraph is designed for maximum loading speed, query speed and high-performance storage

http://www.franz.com/agraph/allegrograph

Page 198: Knowledge Graph 101 –from the perspective of engineers

Knowledge Integration

Page 199: Knowledge Graph 101 –from the perspective of engineers

Falcon-AO

Ontology Matching (classes, properties and instances)

LMO: Linguistic matching

– Lexical comparison (string similarity, SS): edit distance

– Statistical analysis (document similarity, DS): VSM over a virtual document of each entity, built from its labels, names and comments as well as those of its neighbors

– Linguistic Similarity = 0.8*DS + 0.2*SS

GMO: Graph matching

– The similarity of two entities from two ontologies comes from the accumulated similarities of the statements (triples) in which the two entities play the same role (subject, predicate, object)

– The similarity of two statements comes from the accumulated similarities of the entities that play the same role in the two statements being compared

– Input: a set of matched entities. Output: additional matched entities

http://ws.nju.edu.cn/falcon-ao
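The linguistic similarity formula above can be sketched directly. The 0.8/0.2 weights come from the slide; everything else is an assumption made for illustration: SS is taken as edit distance normalised by the longer string, and DS as cosine similarity over raw term counts of the virtual documents. Falcon-AO's actual normalisation and term weighting may differ.

```python
import math
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic programme (row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def string_similarity(a: str, b: str) -> float:
    """SS: 1 minus edit distance normalised by the longer string (assumed form)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def document_similarity(doc_a: str, doc_b: str) -> float:
    """DS: cosine similarity of raw term-count vectors (assumed weighting)."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def linguistic_similarity(name_a, doc_a, name_b, doc_b) -> float:
    """Weighted combination from the slide: 0.8 * DS + 0.2 * SS."""
    return (0.8 * document_similarity(doc_a, doc_b)
            + 0.2 * string_similarity(name_a, name_b))

score = linguistic_similarity("Author", "person who writes books",
                              "Writer", "person who writes books")
print(round(score, 3))
```

In this example the two names differ while the descriptions agree, so most of the score comes from the document term, which is the behaviour the heavier DS weight is meant to produce.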

Page 200: Knowledge Graph 101 –from the perspective of engineers

Falcon-AO

Page 201: Knowledge Graph 101 –from the perspective of engineers

Falcon-AO

Page 202: Knowledge Graph 101 –from the perspective of engineers

BLOOMS: Ontology Alignment for Linked Open Data

Ontology Alignment (classes)

Construction of BLOOMS forest

Comparison of BLOOMS forests

– Given two forests TC, TD, for any Ts ∈ TC, Tt ∈ TD

– If Ts = Tt, then C owl:equivalentClass D

– If overlap(Ts,Tt) ≤ overlap(Tt,Ts), then C rdfs:subClassOf D, else D rdfs:subClassOf C

http://wiki.knoesis.org/index.php/BLOOMS

Page 203: Knowledge Graph 101 –from the perspective of engineers

PARIS: Probabilistic Alignment of Relations, Instances, and Schema

Ontology Alignment (classes, relations, instances)

Probabilistic Model

http://webdam.inria.fr/paris/

Page 204: Knowledge Graph 101 –from the perspective of engineers

PARIS

Functionality

Page 205: Knowledge Graph 101 –from the perspective of engineers

PARIS

Equality of Instances

Page 206: Knowledge Graph 101 –from the perspective of engineers

PARIS

Equality of Classes

– If all the instances of one class are instances of the other, then the former is subsumed by the latter (it is a subclass of it)

Equality of Relations

– If every pair of one relation is a pair of another relation, then the first is a sub-property of the second
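A crisp, non-probabilistic version of the class rule can be sketched as the fraction of one class's instances that also belong to the other class. This is a simplification under strong assumptions: instances are already aligned across the two ontologies, and membership is certain. PARIS itself works with probabilities, which this sketch does not model; the class contents are invented.

```python
def subclass_score(instances_a, instances_b) -> float:
    """Fraction of A's instances that are also instances of B.

    A score of 1.0 means every instance of A is in B, i.e. A is subsumed by B.
    """
    a, b = set(instances_a), set(instances_b)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical, already-aligned instance sets from two ontologies.
singers = {"Elvis", "Madonna"}
persons = {"Elvis", "Madonna", "Reagan"}

print(subclass_score(singers, persons))  # 1.0
print(subclass_score(persons, singers))  # 0.666...
```

The asymmetry of the two scores is what distinguishes "subclass of" from "equivalent to"; two classes would be candidates for equivalence only if both directions score close to 1.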

Page 207: Knowledge Graph 101 –from the perspective of engineers

Silk: Discovering and Maintaining Links on the Web of Data

Discovering relationships between instances

Components:

– Link Discovery Engine

• Link Specification Language

• Computes links between data sources based on a declarative specification of the conditions

– Generated Links Evaluation

• Fine-tune the linking specification

– A protocol for maintaining data links

• Allows data sources to exchange both linksets and detailed change information, and enables continuous link recomputation.

http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
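The idea of computing links from a declarative condition can be illustrated with a toy rule: "link two resources when their labels are at least 90% similar". The sketch below uses Python's standard `difflib.SequenceMatcher` for the comparison; it does not use the Silk Link Specification Language, and the URIs, labels and threshold are invented for the example.

```python
from difflib import SequenceMatcher

# Two hypothetical data sources: URI -> label.
source_a = {"a:Berlin": "Berlin", "a:Paris": "Paris"}
source_b = {"b:berlin_de": "berlin", "b:London": "London"}

def discover_links(src, tgt, threshold=0.9):
    """Emit owl:sameAs links for label pairs whose similarity reaches the threshold."""
    links = []
    for uri_s, label_s in src.items():
        for uri_t, label_t in tgt.items():
            score = SequenceMatcher(None, label_s.lower(), label_t.lower()).ratio()
            if score >= threshold:
                links.append((uri_s, "owl:sameAs", uri_t))
    return links

print(discover_links(source_a, source_b))
# [('a:Berlin', 'owl:sameAs', 'b:berlin_de')]
```

Comparing every pair is quadratic in the number of resources, so a real link discovery engine would first narrow the candidate pairs; the sketch omits that step.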

Page 208: Knowledge Graph 101 –from the perspective of engineers

Silk

Page 209: Knowledge Graph 101 –from the perspective of engineers

Silk

It enables the user to manage different sets of data sources and

linking tasks.

It offers a graphical editor which enables the user to easily create

and edit link specifications.

It allows the user to quickly evaluate the links.

It allows the user to create and edit a set of reference links used

to evaluate the current link specification.

Page 210: Knowledge Graph 101 –from the perspective of engineers

Comparison

             Class   Property   Instance
Falcon-AO      √        √          √
BLOOMS         √
PARIS          √        √          √
SILK                               √

Page 211: Knowledge Graph 101 –from the perspective of engineers

Knowledge Exploration

Page 212: Knowledge Graph 101 –from the perspective of engineers

Gruff http://franz.com/agraph/gruff/

Page 213: Knowledge Graph 101 –from the perspective of engineers

Interactive Relational Data Navigation

http://www.sindicetech.com/pivotbrowser.html

Page 214: Knowledge Graph 101 –from the perspective of engineers

Exhibit – SIMILE Widgets

http://www.simile-widgets.org/exhibit/

Page 215: Knowledge Graph 101 –from the perspective of engineers

Open Source One-stop

Solution

Page 216: Knowledge Graph 101 –from the perspective of engineers

Linked Media Framework and Marmotta

LMF is built on top of three Apache projects:

Apache Marmotta provides the Linked Data Platform capabilities

Apache Stanbol is the extraction and enhancement framework used

Apache Solr provides indexing capabilities

The glue that LMF implements makes it possible to get the best of these three projects to provide advanced linked media capabilities, such as semantic search or semantic enrichment.

Page 217: Knowledge Graph 101 –from the perspective of engineers

Linked Media Framework (Architecture)

https://code.google.com/p/lmf/

Page 218: Knowledge Graph 101 –from the perspective of engineers

An Enterprise Knowledge Graph

[Figure: an enterprise knowledge graph. Recoverable labels: sources (Relational DB, Tables, Data Graphs, Customer Data, External Domain Data, Unstructured/Semi-structured content); Align; Enrichment and Encoding via Domain Ontology; Knowledge Graph (References, Key Concepts, Relations); on top: Search++, Recommendations, Vertical applications, Explorative interfaces.]

Page 219: Knowledge Graph 101 –from the perspective of engineers

Publishing Legacy Data as Linked Data

Google Refine (RDF Extension)

Apache Stanbol

Page 220: Knowledge Graph 101 –from the perspective of engineers

Publishing Legacy Data as Linked Data

Page 221: Knowledge Graph 101 –from the perspective of engineers

Publishing Legacy Data as Linked Data

Page 222: Knowledge Graph 101 –from the perspective of engineers

Publishing Legacy Data as Linked Data

Page 223: Knowledge Graph 101 –from the perspective of engineers

Publishing Legacy Data as Linked Data

Page 224: Knowledge Graph 101 –from the perspective of engineers

Publishing Legacy Data as Linked Data

Page 225: Knowledge Graph 101 –from the perspective of engineers

References

Fabien Gandon. RDF in a nutshell.

Fabien Gandon. SPARQL in a nutshell.

Fabien Gandon. WWW 2014 tutorial on Semantic Web.

We adapt the above slides to introduce RDF and SPARQL.

Page 226: Knowledge Graph 101 –from the perspective of engineers

Thank you!

Any questions?