32
Frank van Harmelen Vrije Universiteit Amsterdam the Large Knowledge Collider ative Commons License: owed to share & remix, must attribute & non-commercial

LarKC: the large knowledge collider

Embed Size (px)

DESCRIPTION

Brief presentation of vision and mission of the LarKC project, building the Large Knowledge Collider http://www.larkc.eu

Citation preview

Page 1: LarKC: the large knowledge collider

Frank van HarmelenVrije Universiteit Amsterdam

the Large Knowledge

Collider

Creative Commons License: allowed to share & remix,but must attribute & non-commercial

Page 2: LarKC: the large knowledge collider

• The vision

• The project

• The consortium

• The plan

Yes!Oh Shit…

Page 3: LarKC: the large knowledge collider

The Vision

“a configurable platform for infinitely scalable semantic web reasoning”

Page 4: LarKC: the large knowledge collider

27-June-07

Why we needThe Large Knowledge Collider

Gartner (May 2007):"By 2012,

70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies”

• Semantic Technologies at Web Scale?– 20% of 30 billion pages @ 1000 triples per page =

6 trillion triples– 30 billion and 1000 are underestimates,

imagine in 6 years from now… – data-integration and semantic search at web-scale?

Page 5: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 5 http://www.aifb.uni-karlsruhe.de/WBS

1 triple:

Page 6: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 6 http://www.aifb.uni-karlsruhe.de/WBS

Page 7: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 7 http://www.aifb.uni-karlsruhe.de/WBS

Page 8: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 8 http://www.aifb.uni-karlsruhe.de/WBS

Page 9: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 9 http://www.aifb.uni-karlsruhe.de/WBS

107 Triples[OWLIM]

Suez Canal

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 9 http://www.aifb.uni-karlsruhe.de/WBS

Page 10: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 10 http://www.aifb.uni-karlsruhe.de/WBS

RDF Store subsecond querying108 Triples

[Ingenta]

Moon

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 10 http://www.aifb.uni-karlsruhe.de/WBS

Page 11: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 11 http://www.aifb.uni-karlsruhe.de/WBS

~109 TriplesEarth

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 11 http://www.aifb.uni-karlsruhe.de/WBS

Page 12: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 12 http://www.aifb.uni-karlsruhe.de/WBS

[LarKC proposal] ~1010 Triples ≈ 1 triple per web-page

≈ 1 triple per web-page

Jupiter

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 12 http://www.aifb.uni-karlsruhe.de/WBS

Page 13: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 13 http://www.aifb.uni-karlsruhe.de/WBS

~1011 Triples

Page 14: LarKC: the large knowledge collider

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 14 http://www.aifb.uni-karlsruhe.de/WBS

Distance Sun – Pluto

Fensel / Harmelen estimate1014 Triples

~1014 Triples

Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 14 http://www.aifb.uni-karlsruhe.de/WBS

Page 15: LarKC: the large knowledge collider

Infinitely scalable (1/2)

• by giving up 100% correctness:• trading quality for size• often completeness is not needed• sometimes even correctness is not needed

pre

cisi

on

(sou

ndn

ess

)

recall (completeness)

logic

IR

Semantic Web

A logician’s nightmare

(Dieter Fensel)

Page 16: LarKC: the large knowledge collider

Infinitely scalable (2/2)

• by parallelisation:• cluster computing

• wide area distribution “Thinking@home”, “self-computing semantic Web”

• cloud computing? (Amazon now, Google soon?)

Page 17: LarKC: the large knowledge collider

“Configurable platform”

“a configurable platform for infinitely scalable semantic web reasoning”

Page 18: LarKC: the large knowledge collider

Why “LarKC” ?

• The Large Knowledge Collider

A configurable platform

for experimentation

by others

Page 19: LarKC: the large knowledge collider

Why “LarKC” ?

and also:1. a merry, carefree adventure. 2. innocent or good-natured mischief; a prank. 3. something extremely easy to accomplish

But also:

Page 20: LarKC: the large knowledge collider

• The vision

• The consortium

• The project

• The plan

Page 21: LarKC: the large knowledge collider

The consortium

50 people present

Page 22: LarKC: the large knowledge collider

The Consortium

• Combining consortium competence– IR, Cognition– ML, Ontologies– Statistics, ML,

Cognition,DB– Logic,DB,

Probabilistic Inference– Economics,

Decision Theory

Page 23: LarKC: the large knowledge collider

The Consortium

 

Sem

antic Web

Logic

Distribu

ted

Com

puting

Informa

tion R

etrieval

human

problem

solving

Machine Learn

ing

Prob

abilistic Inference

RD

F techno

logy

Data

base T

echnology

Use

Case 1

Use

Case 2

UIBK                      

VUA                      

CycEur                      

HLRS                      

USFD                      

MPG                      

WICI                      

Siemens                      

Ontotext                      

CEFRIEL                      

Saltlux                      

WHO-IARC                      

Page 24: LarKC: the large knowledge collider

• The vision

• The consortium

• The project

• The plan

Oh Shit…

Page 25: LarKC: the large knowledge collider

The project

• 10M€ budget

• 3.5 years

• 80 person years

• 3 case studies

• 14 partners

• obtained in FP7 Call1: – overall < 10% funding rate– LarKC has highest funding, longest runtime

Page 26: LarKC: the large knowledge collider

Project Workpackages& timeline

WP1 – Conceptual Framework & Evaluation

WP 2: Retrieval and Selection

WP5: Collider Platform

WP

9:

Ex

plo

ita

tio

n a

nd

s

tan

da

rds

WP

10

: P

roje

ct

Ma

na

ge

me

nt

WP

8:

Tra

inin

g,

dis

se

min

ati

on

, c

om

mu

nit

y

bu

ild

ing

WP3: Abstraction and Learning

WP4: Reasoning and Deciding

WP 6: Use case: Real Time City

WP 7a: Use case: Early Clinical Development

WP 7b: Use case: Carcinogenesis

Reference Production

Page 27: LarKC: the large knowledge collider

Use case: Drug Discovery • Problem: pharmaceutical R&D in early clinical

development is stagnating

(Q1Q2Q3)

FDA white paper Innovation or Stagnation (March 2004):

“developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”

“industry scientists often lack cross-cutting information about anentire product area, or information about techniques that may be used in areas other than theirs”

FDA white paper Innovation or Stagnation (March 2004):

“developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”

“industry scientists often lack cross-cutting information about anentire product area, or information about techniques that may be used in areas other than theirs”

“Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.”

Show me all liver toxicity associated with the target or the pathway.

Genetics

1Q“Show me all liver toxicity associated with compounds with similar structure”

Chemistry

2Q

“Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population”LITERATURE

3Q

Current NCBI: linking but no inference

Page 28: LarKC: the large knowledge collider

Use Case: City on-line

• Our cities face many challenges • Urban Computing

is the ICT way to address them • How can we redevelop existing neighborhoods

and business districts to improve the quality of life?

• How can we create more choices in housing, accommodating diverse lifestyles and all income levels?

• How can we reduce traffic congestion yet stay connected?

• How can we include citizens in planning their communities rather than limiting input to only those affected by the next project?

• How can we fund schools, bridges, roads, and clean water while meeting short-term costs of increased security?

• How can we redevelop existing neighborhoods and business districts to improve the quality of life?

• How can we create more choices in housing, accommodating diverse lifestyles and all income levels?

• How can we reduce traffic congestion yet stay connected?

• How can we include citizens in planning their communities rather than limiting input to only those affected by the next project?

• How can we fund schools, bridges, roads, and clean water while meeting short-term costs of increased security?

Is public transportation where the people are?Is public transportation where the people are?

Which landmarks attract more people?Which landmarks attract more people?

Where are people concentrating?Where are people concentrating?

Where is traffic moving?Where is traffic moving?

Page 29: LarKC: the large knowledge collider

• The vision

• The consortium

• The project

• The plan

Oh Shit…

Page 30: LarKC: the large knowledge collider

Project Timeline

• Surveys (plugins, platform)• Requirements (use cases)

Prototype Internal Release Public Release Final Release

Use Cases V1

Use Cases V2

Use Cases V3

420 6 18 3310

Page 31: LarKC: the large knowledge collider

Communication

• Early Access Group

• Usage Competition– “we will win if we start to loose”

• We deliver:– software– publications– not “deliverables”

Page 32: LarKC: the large knowledge collider

And Finally….

• People are already looking at us:– “Damn... the EU is where all the cool semweb work is

happening these days”– “This kind of infrastructure is exactly the kind of rocket fuel

that is needed at this stage of semweb maturity.”– “The LarKC-inspired workshop on new forms of reasoning for

the semantic web was a conference highlight for me”– “With the current growth rates of RDF on the Web, LarKC

which started out as technologically possible will quickly become operationally necessary”

– “this project really has it all (potentially) in terms of both science and impact”

• “projects already seeking collaboration:OKKAM, MUSING

“This project has the potential

to change the way people work in this area”