From Big Data to Valuable Knowledge
Gerard de Melo, Tsinghua University
http://gerard.demelo.org

Page 1: From Big Data to Valuable Knowledge

From Big Data to Valuable Knowledge

Gerard de Melo, Tsinghua University
http://gerard.demelo.org

Page 2: From Big Data to Valuable Knowledge

25 Years of the World Wide Web: 1989−2014


Tim Berners-Lee

Page 3: From Big Data to Valuable Knowledge

Big Data on the Web

Theological Hall, Strahov Monastery Library, Prague

Page 4: From Big Data to Valuable Knowledge

Main Challenge So Far: Scale

Image: Matej Kren: Idiom. Prague Municipal Library
https://www.flickr.com/photos/ill-padrino/6437837857/

Page 5: From Big Data to Valuable Knowledge

Developing for Scalability

Page 6: From Big Data to Valuable Knowledge

Developing for Scalability

Official Hadoop WordCount v1.0, excluding imports and the improvements in WordCount v2.0

Page 7: From Big Data to Valuable Knowledge

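Developing for Scalability

Apache Spark, Twitter's Scalding

import com.twitter.scalding._

// Word count as a Scalding job: read lines, split into words, count per word, write TSV.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    // Emit one 'word field per whitespace-separated token of each 'line.
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    // Group identical words and count the size of each group.
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}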
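
The split-and-count logic of the Scalding job above can also be tried without a cluster. The following is a minimal sketch on plain Scala collections; `LocalWordCount` and `count` are illustrative names chosen here and are not part of Scalding or Spark.

```scala
object LocalWordCount {
  // Split each line on whitespace, drop empty tokens, and count occurrences per word.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+"""))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = count(Seq("big data on the web", "the web of data"))
    // Print one "word<TAB>count" line per word, sorted alphabetically.
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

The in-memory version has the same shape as the distributed one (flatMap, then groupBy, then size), which is the point of the slide: the high-level API hides the cluster mechanics.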

Page 8: From Big Data to Valuable Knowledge

Knowledge Organization

Image: http://commons.wikimedia.org/wiki/File:Mundaneum_Tir%C3%A4ng_Karteikaarten.jpg

Universal Bibliographic Repertory (Repertoire Bibliographique Universel, RBU), by Paul Otlet and Henri La Fontaine in 1895

index cards with answers to queries

Page 9: From Big Data to Valuable Knowledge

Knowledge Organization

Image: Mundaneum

Universal Bibliographic Repertory (Repertoire Bibliographique Universel, RBU), by Paul Otlet and Henri La Fontaine in 1895

index cards with answers to queries

Alex Wright: This was a sort of “analog search engine”

Page 10: From Big Data to Valuable Knowledge

Current Challenge: Knowledge Organization

Alexandre Duret-Lutz https://www.flickr.com/photos/gadl/110845690/

Page 11: From Big Data to Valuable Knowledge

25 Years of the World Wide Web: 1989−2014

HyperText (the “HT” in

Basic Idea: Connecting Data


Tim Berners-Lee

Page 12: From Big Data to Valuable Knowledge

25 Years of the World Wide Web: 1989−2014

Source: Ivan Herman. Introduction to Semantic Web Technologies

Data really needs to be more connected!

Page 13: From Big Data to Valuable Knowledge

The Web of Data: Linked Data

Page 14: From Big Data to Valuable Knowledge

The Web of Data: Lexvo.org

Semantic Web Journal 2014

Interdisciplinary Work, e.g. in Digital Humanities

Page 15: From Big Data to Valuable Knowledge

Entity Integration: Challenges

Source: Peter Mika

Page 16: From Big Data to Valuable Knowledge

Entity Integration: Challenges

Page 17: From Big Data to Valuable Knowledge

Entity Integration: Challenges

ACL 2010, AAAI 2013

Page 18: From Big Data to Valuable Knowledge

Entity Integration: Challenges

One bad link is enough to make a connected component inconsistent

ACL 2010, AAAI 2013
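
The effect of a single bad link can be illustrated with a small sketch. It assumes that identifiers have the form `source:name` and that two different identifiers from the same source denote different entities; these conventions, the example identifiers, and the names `EquivalenceCheck`, `components`, and `inconsistent` are assumptions made for illustration, not the method of the cited papers.

```scala
object EquivalenceCheck {
  type Link = (String, String)

  // Group all linked ids into connected components (simple union-find, no ranks).
  def components(links: Seq[Link]): Set[Set[String]] = {
    val nodes = links.flatMap { case (a, b) => Seq(a, b) }.toSet
    val parent = scala.collection.mutable.Map(nodes.toSeq.map(n => n -> n): _*)
    def find(n: String): String = {
      var root = n
      while (parent(root) != root) root = parent(root)
      root
    }
    // Merge the components of both endpoints of every link.
    links.foreach { case (a, b) => parent(find(a)) = find(b) }
    nodes.groupBy(find).values.toSet
  }

  // The part of the id before the first ':' names the source (assumed convention).
  def source(id: String): String = id.takeWhile(_ != ':')

  // A component is inconsistent if it merges two distinct ids from one source.
  def inconsistent(component: Set[String]): Boolean =
    component.groupBy(source).values.exists(_.size > 1)
}
```

With links `en:Georgia_(country)` to `de:Georgien` and `de:Georgien` to `fr:Georgie`, the single component is consistent. Adding one wrong link from `fr:Georgie` to `en:Georgia_(U.S._state)` pulls a second `en:` identifier into the same component, so the whole component becomes inconsistent.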

Page 19: From Big Data to Valuable Knowledge

Entity Integration

Min. cost solution: NP-hard

Our Solution: Use Linear Program and then apply region growing techniques
→ Logarithmic Approximation

ACL 2010, AAAI 2013


Page 20: From Big Data to Valuable Knowledge

Taxonomic Links

a user wants a list of “Art Schools in Europe”

Page 21: From Big Data to Valuable Knowledge

Taxonomic Integration: MENTA Approach

De Melo & Weikum (2010). CIKM Best Interdisciplinary Paper Award

Page 25: From Big Data to Valuable Knowledge

UWN/MENTA: multilingual extension of WordNet for word senses and taxonomical information over 200 languages

Gerard de Melo


Page 26: From Big Data to Valuable Knowledge

Relation Extraction

Images: Denilson Barbosa, Haixun Wang, Cong Yu. Shallow Information Extraction for the Knowledge Web

Scaling Up: Tandon, de Melo & Weikum. AAAI 2011, COLING 2012

Page 27: From Big Data to Valuable Knowledge


Relation Integration

MetaWeb was acquired by Google.
MetaWeb was just recently acquired by Google.
MetaWeb, surprisingly, was acquired by Google.
MetaWeb was bought out by Google.
Google bought MetaWeb.
Google acquired MetaWeb.
MetaWeb was sold to Google.
Google's acquisition of MetaWeb.
Google's MetaWeb acquisition.
and so on...

Page 28: From Big Data to Valuable Knowledge

Relation Integration

Underlying frame: Commercial transfer

● Capture the “who-did-what-to-whom”
● Microsoft bought the patent from Nokia.
  Nokia sold the patent to Microsoft.
  The patent was acquired by Microsoft [from Nokia].
  The patent was sold [by Nokia] to Microsoft.

Buyer: Microsoft
Seller: Nokia
Product: The patent
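
One way to picture this normalization is a small table that says which syntactic slot fills which frame role for each verb. The sketch below is hypothetical: the `Clause` input (an active-voice clause already parsed into subject, verb, object, and prepositional arguments), the `CommercialTransfer` case class, and the verb list are assumptions made for illustration, not the data model of FrameNet or FrameBase. Passive sentences would need an extra step that is not shown.

```scala
object FrameSketch {
  // A pre-parsed active-voice clause; producing it from raw text is out of scope here.
  final case class Clause(subject: String, verb: String, obj: String, preps: Map[String, String])

  // Frame instance with the three roles named on the slide.
  final case class CommercialTransfer(buyer: Option[String], seller: Option[String], product: String)

  // Map verb-specific argument positions onto frame roles.
  def toFrame(c: Clause): Option[CommercialTransfer] = c.verb match {
    case "bought" | "acquired" =>
      Some(CommercialTransfer(Some(c.subject), c.preps.get("from"), c.obj))
    case "sold" =>
      Some(CommercialTransfer(c.preps.get("to"), Some(c.subject), c.obj))
    case _ => None // verb does not evoke this frame in the sketch
  }
}
```

Under these assumptions, "Microsoft bought the patent from Nokia" and "Nokia sold the patent to Microsoft" produce the same frame instance, which is what allows facts from differently worded sources to be merged.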

Page 29: From Big Data to Valuable Knowledge

Relation Integration: FrameBase.org

Bringing knowledge into a standard form based on natural language (FrameNet)

Page 30: From Big Data to Valuable Knowledge

Relation Integration

X isAuthorOf Y
Y writtenBy X
X wrote Y
Y writtenInYear Z
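
The first three patterns above state the same fact with different predicate names and argument orders. A minimal sketch of how such variants could be mapped to one canonical predicate follows; the `Triple` type, the lookup table, and the choice of `isAuthorOf` as the canonical name are illustrative assumptions, not the schema of any particular knowledge base.

```scala
object PredicateNormalizer {
  final case class Triple(subject: String, predicate: String, obj: String)

  // predicate -> (canonical predicate, whether subject and object must be swapped)
  private val canonical: Map[String, (String, Boolean)] = Map(
    ("isAuthorOf", ("isAuthorOf", false)),
    ("wrote", ("isAuthorOf", false)),
    ("writtenBy", ("isAuthorOf", true))
  )

  // Rewrite known predicates to their canonical form; leave unknown ones unchanged.
  def normalize(t: Triple): Triple = canonical.get(t.predicate) match {
    case Some((name, swap)) =>
      if (swap) Triple(t.obj, name, t.subject) else Triple(t.subject, name, t.obj)
    case None => t
  }
}
```

Predicates without an entry, such as `writtenInYear`, pass through untouched; they express a different relation and need their own treatment.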

Page 31: From Big Data to Valuable Knowledge

Relation Integration

YAGO: isMarriedTo predicate

Freebase: Marriage Entity





Page 32: From Big Data to Valuable Knowledge

Search Interfaces

“Which companies were created during the last century in Silicon Valley ?”

YAGO2: WWW 2011 Best Demo Award

Gerard de Melo

Page 33: From Big Data to Valuable Knowledge

Real Understanding?

Knowledge Bases keep growing, but much of the Web is still not truly understood

Page 34: From Big Data to Valuable Knowledge

Real Understanding?

Source: CMU NELL Browser 2015-03-17

Over 4000 countries with >90% confidence



Page 35: From Big Data to Valuable Knowledge

Future Challenge: Real Understanding

Voynich Manuscript, early 15th century

Page 36: From Big Data to Valuable Knowledge

From Big Data to Knowledge

Image: Brett Ryder

Page 37: From Big Data to Valuable Knowledge

Machine Learning

Diagram labels: Examples, Learning, Prediction, Probably Incorrect!




Page 38: From Big Data to Valuable Knowledge

Better Machine Learning

Diagram labels: Examples, Learning, Prediction, Probably Incorrect!

+ Better Labels for Test


Page 39: From Big Data to Valuable Knowledge


Always there to answer questions

Page 40: From Big Data to Valuable Knowledge

Learning Common-Sense

Gerard de Melo

I'm cold.

Warm coffee and tea are available at Costa Coffee just around the corner. But don't forget your meeting with Linda in half an hour!

Page 41: From Big Data to Valuable Knowledge

Learning Common-Sense: From Big Data?

Page 42: From Big Data to Valuable Knowledge


WebChild: Learning Common-Sense From Big Data

AAAI 2014, WSDM 2014, AAAI 2011

Page 43: From Big Data to Valuable Knowledge

Future: Learning Advanced Common-Sense Knowledge?

Why do you think Mary put on the ring at the end of the movie?

Yes, that was a powerful scene. The fact that she put it on after reading the letter from her mother indicates that she may have changed her mind about the value of ...

Page 44: From Big Data to Valuable Knowledge


Big Data is radically changing the world

Main Challenge in the Past: Scale

Main Current Challenge: Organization
1. Entity Integration
2. Taxonomic Integration
3. Relation Extraction and Integration

Main Future Challenge: Real Understanding by learning from weak signals