
Knowledge graphs in search engines

E. Della Valle – http://emanueledellavalle.org - @manudellavalle

Knowledge Graphs in search engines like Google

Emanuele Della Valle
DEIB - Politecnico di Milano
http://emanueledellavalle.org
@manudellavalle

Share, Remix, Reuse — Legally

This work is licensed under the Creative Commons Attribution 3.0 Unported License. You are free:

• to Share: to copy, distribute and transmit the work

• to Remix: to adapt the work

Under the following condition:

• Attribution: you must attribute the work by inserting “by E. Della Valle – http://emanueledellavalle.org - @manudellavalle” at the end of each reused slide

To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

Me

• Assistant Professor at DEIB, Politecnico di Milano

• Expert in semantic technologies and stream computing

• Brander of stream reasoning: an approach to master the velocity and variety dimensions of Big Data
  • https://scholar.google.com/scholar?hl=en&q="stream+reasoning"

• 17 years of experience in research and innovation projects

• Startupper:
  • http://www.fluxedo.com

[email protected]

@manudellavalle

http://emanueledellavalle.org

http://streamreasoning.org

http://fluxedo.com

Agenda

• The interoperability problem
• The standardization dilemma
• One standard does not fit all
• Embrace change with semantic technologies
• Demo time for Google Knowledge Graph
• How this became possible

Definitions of Interoperability

• Interoperability
  • The ability of information and communication technology (ICT) systems to exchange data and to enable sharing of information and knowledge
• Functional interoperability
  • Information has to be transmitted reliably between heterogeneous applications
• Semantic interoperability
  • Transmission must occur without loss of meaning, and thus without loss of computability
  • E.g., semantic interoperability in healthcare information systems
    • It is the ability to share information without loss of computable meaning, across multiple applications concerned with clinical (primary use) and related administrative, financial, and research domains (secondary uses).

Once upon a time …

… in a happy organization, users were happy with the application the IT department had prepared for them, but …

[Figure: an application inside its organizational boundaries]

… the organization was not alone. Another organization developed a complementary application …

… so, one day, the two organizations decided to integrate the two applications.

[Figure: the application and the complementary application, separated by organizational boundaries, with a "?" between them]

Having much to gain, the happy organization decided to invest in a bilateral solution: an adapter!

[Figure: the two applications connected through an adapter across the organizational boundaries]

… this went on for a while, but …

… the more bilateral integrations, the sadder the organizations became.

[Figure: many applications connected pairwise by adapters, each organization annotated with a mood marker from the legend]

Legend

• ! OK
• !! Good
• !!! Very Good
• !?! Very Good …
• ?!? Have I done the right thing?
• ??? Does it make sense?
• ?#@ Why am I doing it!!!

… So, they standardized and …

[Figure: all the applications connected through a single standard]

… and they lived happily ever after!

Well, not really :-( Actually …

[Figure: the same applications around a "???" where the standard should be, next to a poster reading "KEEP CALM AND WAIT FOR 1 / 10 / 100 YEARS"]

Why? The Standardization dilemma!

• Comprehensive: handles all use cases
• Good: high quality
• Timely: completed quickly

Pick two!

There are a variety of them

Standards are like plumbs

Over 100 in the Healthcare domain!

AIR ALT AOD AOT BI CCC CCPSS CCS CDT CHV COSTAR CPM CPT CPTSP CSP CST DDB DMDICD10 DMDUMD DSM3R DSM4 DXP FMA

HCDT HCPCS HCPT HL7V2.5 HL7V3.0 HLREL ICD10 ICD10AE ICD10AM ICD10AMAE ICD10CM ICD10DUT ICD10PCS ICD9CM ICF ICF-CY ICPC ICPC2EDUT ICPC2EENG ICPC2ICD10DUT ICPC2ICD10ENG ICPC2P ICPCBAQ ICPCDAN ICPCDUT ICPCFIN ICPCFRE ICPCGER ICPCHEB

ICPCHUN ICPCITA ICPCNOR ICPCPOR ICPCSPA ICPCSWE JABL KCD5 LCH LNC_AD8 LNC_MDS30 MCM MEDLINEPLUS MSHCZE MSHDUT

MSHFIN MSHFRE MSHGER MSHITA MSHJPN MSHLAV MSHNOR MSHPOL MSHPOR MSHRUS MSHSCR MSHSPA MSHSWE MTH MTHCH

MTHHH MTHICD9 MTHICPC2EAE MTHICPC2ICD10AE MTHMST MTHMSTFRE MTHMSTITA NAN NCISEER NIC NOC OMS PCDS PDQ PNDS PPAC PSY QMR RAM RCD RCDAE RCDSA RCDSY SNM SNMI

SOP SPN SRC TKMT ULT UMD USPMG UWDA WHO WHOFRE WHOGER WHOPOR WHOSPA

[source: dbooth.org/2014/yosemite/yosemite-project-slides.pdf]

And they keep changing :-(

[Credits: Rafael Richards]

Why?

[source http://xkcd.com/927/ ]

… sometimes the variety is required

standards are like plumbs

One standard does not fit all

Different use cases need different data, granularity and representations

[source: dbooth.org/2014/yosemite/yosemite-project-slides.pdf]

… thus translation is needed

standards are like plumbs

Translation is unavoidable!

Counting on translation between standards is also convenient while the comprehensiveness of a standard is being increased over time.

[Figure: chart of comprehensiveness (0% to 100%) over time, with the labels "Translation" and "Standard"]

But be aware of the cost of ad hoc translation!

standards are like plumbs

The lack of interoperability …

… in healthcare costs $30,000 million per year in the USA

[source: http://www.calgaryscientific.com/blog/bid/284224/Interoperability-Could-Reduce-U-S-Healthcare-Costs-by-Thirty-Billion]

So What?!?

“It is not necessarily the strongest of the species that survives nor the most intelligent, but the one that is most responsive to change.”

--- Charles Darwin, “The Origin of Species”

Embrace change!

Semantic Technologies embrace change

Proposing a simple data model: RDF

subject --property--> object

E.g., Amoxicillin --treats--> bacterial disease

Flexible enough to represent: Tables, Trees, Graphs
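The triple model can be sketched in a few lines of plain Python. This is only an illustration and not a real RDF library: the `Triple` type, the `ex:` prefix, the `row_to_triples` helper and the table columns are assumptions made for the example. It shows the Amoxicillin statement as a triple and how one table row flattens into more triples of the same graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str  # the "property" of the slide
    obj: str   # the "object" of the slide


# The statement used as an example on the slide.
graph = {Triple("ex:Amoxicillin", "ex:treats", "ex:BacterialDisease")}


def row_to_triples(row_id: str, row: dict[str, str]) -> set[Triple]:
    """Flatten one table row into triples: one triple per column."""
    return {Triple(row_id, f"ex:{column}", value) for column, value in row.items()}


# A table row with hypothetical columns becomes two more triples.
graph |= row_to_triples(
    "ex:Amoxicillin",
    {"belongsTo": "ex:Penicillin", "hasSideEffects": "ex:Diarrhoea"},
)

for t in sorted(graph):
    print(t.subject, t.prop, t.obj)
```

Because every statement has the same three-part shape, data coming from tables, trees or graphs can end up in one set without agreeing on a schema first.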

Semantic Technologies embrace change

Providing a powerful query language: SPARQL

• E.g., what does Amoxicillin treat?
  • Pattern: Amoxicillin --treats--> ?x
  • ?x={Bacterial disease, Urinary tract infection, Sinus infection, …}

Flexible enough to query RDF data even without knowing the schema

• E.g., can you describe Amoxicillin?
  • Pattern: Amoxicillin --?p--> ?x
  • ?p={treats} ?x={Bacterial disease, Urinary tract infection, Sinus infection, …}
  • ?p={hasSideEffects} ?x={Diarrhoea}
  • ?p={belongsTo} ?x={β-Lactam antibiotic, Penicillin-class Antibacterial}
  • …
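The two query patterns can be imitated with a tiny pattern matcher over triples. This is a sketch of the idea behind a single SPARQL triple pattern, not a SPARQL engine; the `match` function and the sample data layout are assumptions made for illustration. The comments show roughly what the corresponding SPARQL would look like, assuming a default prefix `:` has been declared.

```python
def match(triples, pattern):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


triples = [
    ("Amoxicillin", "treats", "Bacterial disease"),
    ("Amoxicillin", "treats", "Urinary tract infection"),
    ("Amoxicillin", "treats", "Sinus infection"),
    ("Amoxicillin", "hasSideEffects", "Diarrhoea"),
    ("Amoxicillin", "belongsTo", "β-Lactam antibiotic"),
]

# Roughly: SELECT ?x WHERE { :Amoxicillin :treats ?x }
treated = [b["?x"] for b in match(triples, ("Amoxicillin", "treats", "?x"))]

# Roughly: SELECT ?p ?x WHERE { :Amoxicillin ?p ?x }  (no schema knowledge needed)
described = match(triples, ("Amoxicillin", "?p", "?x"))

print(treated)
print(sorted({b["?p"] for b in described}))
```

The second call is the interesting one: the property itself is a variable, so the query discovers which properties exist instead of requiring them up front.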

Semantic Technologies embrace change

Providing a formal language for conceptual modelling: OWL

• E.g., Heart
  • Heart is a muscular organ that is part of the circulatory system
  • ∀x.[Heart(x) → MuscularOrgan(x) ∧ ∃y.[isPartOf(x,y) ∧ CirculatorySystem(y)]]

OWL is a modular standard that offers different trade-offs: OWL-QL, OWL-RL, OWL-EL

[Figure: three diagrams, one per profile, each labelled "Terms" and "Data"]
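To make the Heart axiom concrete, here is a minimal sketch that applies it once to a small set of facts. It is not an OWL reasoner, and a real one does far more; the function name, the data layout and the `_:systemOf_…` placeholder naming are assumptions made for illustration. The placeholder stands for the y that the axiom says must exist even when the data does not name it.

```python
def apply_heart_axiom(types, part_of):
    """Apply the Heart axiom once; the inputs are left unmodified.

    types:   individual -> set of class names
    part_of: set of (part, whole) pairs
    """
    types = {x: set(classes) for x, classes in types.items()}
    part_of = set(part_of)
    hearts = [x for x, classes in types.items() if "Heart" in classes]
    for x in hearts:
        # Heart(x) -> MuscularOrgan(x)
        types[x].add("MuscularOrgan")
        # Heart(x) -> exists y. isPartOf(x, y) and CirculatorySystem(y)
        known = any(
            part == x and "CirculatorySystem" in types.get(whole, set())
            for part, whole in part_of
        )
        if not known:
            y = f"_:systemOf_{x}"  # hypothetical placeholder for the unnamed y
            types[y] = {"CirculatorySystem"}
            part_of.add((x, y))
    return types, part_of


types, part_of = apply_heart_axiom({"heart1": {"Heart"}}, set())
print(sorted(types["heart1"]))
print(sorted(part_of))
```

Nothing in the input said that `heart1` is a muscular organ or part of anything; both facts follow from the model, which is the point of modelling the terms formally.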

Semantic Technologies embrace change

Ontology Based Data Access as a prototypical solution to interoperability problems

[Figure, step 1: a standard expressed in OWL sits on top of heterogeneous sources (an RDBMS, XML documents, …), each connected through its own Translator]

[Figure, step 2: SPARQL queries are posed against the standard in OWL]

[Figure, step 3: results are returned through the standard in OWL]
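The translator idea can be sketched with two toy sources and one shared vocabulary. This is an illustration under stated assumptions, not how an OBDA system is built: the table, the XML layout, the function names and the sample rows are hypothetical, and the sketch materialises the triples, whereas OBDA systems typically rewrite the query for each source instead.

```python
import sqlite3
import xml.etree.ElementTree as ET


def translate_rdbms(conn):
    """Translator 1: relational rows -> triples in the shared vocabulary."""
    for drug, disease in conn.execute("SELECT drug, disease FROM treatment"):
        yield (drug, "treats", disease)


def translate_xml(document):
    """Translator 2: XML elements -> triples in the shared vocabulary."""
    for node in ET.fromstring(document).iter("therapy"):
        yield (node.get("medicine"), "treats", node.get("condition"))


# Source 1: a relational database (in memory, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treatment (drug TEXT, disease TEXT)")
conn.execute("INSERT INTO treatment VALUES ('Amoxicillin', 'Sinus infection')")

# Source 2: an XML document with different element and attribute names.
xml_doc = (
    "<therapies>"
    '<therapy medicine="Amoxicillin" condition="Urinary tract infection"/>'
    "</therapies>"
)

# One query over the shared vocabulary, regardless of where the data lives.
triples = set(translate_rdbms(conn)) | set(translate_xml(xml_doc))
answers = sorted(o for s, p, o in triples if s == "Amoxicillin" and p == "treats")
print(answers)
```

Each source needs one translator to the shared vocabulary, so adding a source does not require touching the others.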

DEMO TIME

Search for Galileo and look to the right

• Galileo Galilei --type--> Astronomer
• Galileo Galilei --when born--> February 15, 1564
• Galileo Galilei --discovered--> Callisto
• Galileo Galilei --discovered--> Ganymede

Let's try a more complex query

• Galileo Galilei --discovered--> ?x

Try and enjoy!

2001: In the beginning was the Semantic Web

“The Semantic Web is not a separate Web, but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

“The Semantic Web”, Scientific American Magazine, May 2001

Semantic interoperability on the functionally interoperable Web

2008: It gained popularity when Linked Data became a standard

View the full talk at http://www.ted.com/talks/view/id/484 !

2008: It was funded by the USA, the UK and …

2008: … and the EU

2008: Search engines created incentives

[source https://developer.yahoo.com/searchmonkey/siteowner.html ]

2008: Search engines created incentives

[source https://developers.google.com/structured-data/rich-snippets/ ]

2009: Best Buy picked them up

• Since Fall 2009
• 450,000 products
• Using RDFa (= RDF embedded in HTML)
• Pages with RDFa rank higher in Google
• Best Buy claims 30% more traffic!
• Yahoo reports a 15% higher click-through rate

© 2012 Politecnico di Milano, Emanuele Della Valle

2009: Best Buy picked them up

RDFa:

<div rel="v:hasReview">
  <span property="v:rating" datatype="xsd:string">4.8</span> of
  <span property="v:best">5</span>
</div>
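To see what a machine can read out of such markup, the following sketch collects the `property` annotations with Python's standard `html.parser`. It is far from a conforming RDFa processor: it ignores `rel`, prefix resolution and `datatype`, and it assumes every opened element is also closed. The class name is an assumption made for illustration.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect the text content of every element carrying a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._stack = []  # property (or None) of each currently open element

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        if prop and data.strip():
            self.values[prop] = self.values.get(prop, "") + data.strip()


snippet = (
    '<div rel="v:hasReview">'
    '<span property="v:rating" datatype="xsd:string"> 4.8</span> of '
    '<span property="v:best">5</span>'
    "</div>"
)

collector = PropertyCollector()
collector.feed(snippet)
collector.close()
print(collector.values)
```

The human reader sees "4.8 of 5"; the annotations let a crawler know which number is the rating and which is the best possible score.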

2009: Best Buy picked them up

Google for Nikon+12.3-Megapixel+Digital+SLR+Camera
https://www.google.com/search?q=Nikon+12.3-Megapixel+Digital+SLR+Camera

[Figure: Google results page with the annotations "enriched pages" and "Sponsored Links"]

2010: The Modigliani test for Linked Data

• Who: Richard MacManus
• When: April 15th, 2010
• Context: Modigliani’s paintings are scattered all over the world
• The challenge: if all museums published their collections as linked data, would it be possible to know the locations of all the original paintings of Modigliani?
• http://readwrite.com/2010/04/15/the_modigliani_test_semantic_web_tipping_point

2010: The results of the Modigliani test for Linked Data

• Who: Atanas Kiryakov (Ontotext AD)
• When: April 25th, 2010
• How: http://factforge.net/ a “reason-able” view to the web of data
• Results: http://bit.ly/ModiglianiTest
• http://readwrite.com/2010/04/25/the_modigliani_test_for_linked_data

Part of my LarKC project http://www.larkc.org/

2010: It went mainstream with Facebook Open Graph

Use RDFa with some FB-specific vocabulary

• og:title - The title of your object, e.g., "The Rock".
• og:type - The type of your object, e.g., "movie".
• og:image - An image URL
• og:url - The permanent ID of your object
• og:description - A one to two sentence description of your object.
• og:site_name - If your object is part of a larger web site, the name which should be displayed for the overall site, e.g., "IMDb".

http://ogp.me/
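Open Graph properties are published as `<meta property="og:…" content="…">` elements in the page head, so they can be read with a few lines of standard-library Python. The sketch below is a minimal illustration: the `OpenGraphParser` class and the `example.com` URLs are hypothetical, and the sample page reuses the "The Rock" example values from the list above.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:") and "content" in attrs:
            self.og[prop] = attrs["content"]


# Hypothetical page; the URLs are placeholders.
page = """<html><head>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://example.com/the-rock" />
<meta property="og:image" content="http://example.com/rock.jpg" />
<meta name="viewport" content="width=device-width" />
</head><body></body></html>"""

parser = OpenGraphParser()
parser.feed(page)
parser.close()

for name, value in sorted(parser.og.items()):
    print(name, "=", value)
```

The unrelated `viewport` element is skipped because it carries no `og:` property, which is all a consumer needs to tell the vocabulary apart from ordinary page metadata.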

2010: It went mainstream with Facebook Open Graph

Open Graph Usage Statistics

15 million sites are using Open Graph! 39% of the top 10,000 sites

[Figure: Open Graph usage in % (axis ticks from 20 to 40) by year, 2010 to 2015]

[Source: http://trends.builtwith.com/docinfo/Open-Graph-Protocol]

2011: It reached its full potential with schema.org

• The core vocabulary currently consists of
  • 597 Types
  • 867 Properties
  • 114 Enumeration values

[Source: http://blog.schema.org/2015/11/schemaorg-whats-new.html]

Thanks to schema.org, recipes are also in the Knowledge Graph

Google Knowledge Graph (powered by Semantic Technologies) passes the Modigliani Test
