Transcript
Page 1: Functional manipulations of large data graphs 20160601

[Figure: Linking Open Data cloud diagram, "Linked Datasets as of August 2014". Legend categories: Linguistics, Social Networking, Life Sciences, Cross-Domain, Government, User-Generated Content, Publications, Geographic, Media. Labelled datasets include DBpedia and its language editions, Uniprot, Bio2RDF, Freebase, YAGO, GeoNames, LinkedGeoData, RKBExplorer and many StatusNet instances.]

Functional Manipulation of Large Data Graphs

David Hyland-Wood

@prototypo 1 June 2016

Page 4: Functional manipulations of large data graphs 20160601

Something --[a relationship]--> Something else

Page 5: Functional manipulations of large data graphs 20160601

UQ --[is a]--> University

Page 6: Functional manipulations of large data graphs 20160601

UQ --[label]--> "The University of Queensland"
UQ --[is a]--> University
UQ --[affiliation]--> Group of 8

Page 7: Functional manipulations of large data graphs 20160601
Page 8: Functional manipulations of large data graphs 20160601
Page 9: Functional manipulations of large data graphs 20160601

We’ve Seen This Before

Page 11: Functional manipulations of large data graphs 20160601

08 Oct 2007

Page 19: Functional manipulations of large data graphs 20160601

The RDF Data Model

Standard serialisation formats:

• Turtle
• TriG
• N-Triples
• N-Quads
  (the four above form the Turtle family of RDF formats)
• JSON-LD
• RDFa
• RDF/XML

Possibly lossy alternatives:

• CSV
• ODATA
• etc
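
As a rough illustration of the data model behind these formats, the sketch below represents a triple as a Scala case class and writes it out as one N-Triples line. The type and object names (Node, Iri, Literal, Triple, RdfSketch) are made up for this example and do not come from the talk's code. It handles only IRIs and plain or language-tagged literals, not blank nodes or datatypes.

```scala
// A minimal sketch of the RDF data model: a graph is a set of triples.
// All names here (Node, Iri, Literal, Triple, RdfSketch) are illustrative.
sealed trait Node
final case class Iri(value: String) extends Node
final case class Literal(value: String, lang: Option[String] = None) extends Node

final case class Triple(subject: Iri, predicate: Iri, obj: Node)

object RdfSketch {
  // Escape backslash, double quote, newline and carriage return in literals.
  // The backslash must be handled first so later escapes are not doubled.
  private def escape(s: String): String =
    s.replace("\\", "\\\\")
      .replace("\"", "\\\"")
      .replace("\n", "\\n")
      .replace("\r", "\\r")

  def render(node: Node): String = node match {
    case Iri(v)                 => s"<$v>"
    case Literal(v, Some(lang)) => "\"" + escape(v) + "\"@" + lang
    case Literal(v, None)       => "\"" + escape(v) + "\""
  }

  // N-Triples: subject, predicate, object, then " ." on a single line.
  def toNTriples(t: Triple): String =
    s"${render(t.subject)} ${render(t.predicate)} ${render(t.obj)} ."

  val uq: Iri = Iri("http://dbpedia.org/resource/University_of_Queensland")
  val label: Iri = Iri("http://www.w3.org/2000/01/rdf-schema#label")

  def main(args: Array[String]): Unit =
    println(toNTriples(Triple(uq, label, Literal("The University of Queensland", Some("en")))))
}
```

Because every triple has the same three-part shape, a whole graph can be written as one such line per triple, which is what makes the Turtle family easy to stream and to split across machines.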

Page 20: Functional manipulations of large data graphs 20160601

https://en.wikipedia.org/wiki/University_of_Queensland

$ curl http://dbpedia.org/page/University_of_Queensland
HTML

$ curl http://dbpedia.org/data/University_of_Queensland
RDF in XML (Yuck!)

$ curl http://dbpedia.org/data/University_of_Queensland.n3 > University_of_Queensland.n3
Many formats, e.g. sane RDF, ODATA, Microdata, JSON…

Page 26: Functional manipulations of large data graphs 20160601

UQ --[label]--> "The University of Queensland"
UQ --[affiliation]--> Group of 8
UQ --[number of undergraduate students]--> 34228
UQ --[number of students]--> 48771

Page 28: Functional manipulations of large data graphs 20160601

# G8 universities ordered by the number of students
# at each university.
PREFIX dbo: <http://dbpedia.org/ontology/>
# The DBpedia endpoint predefines rdfs:, but other endpoints need it declared.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?students ?undergrads
where {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL {?s dbo:numberOfStudents ?students}
  OPTIONAL {?s dbo:numberOfUndergraduateStudents ?undergrads}
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC (?students)
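
For readers more at home with collections than with SPARQL, the query can be read as a filter, a sort and a projection. The sketch below mimics that over an in-memory list. The University case class and G8Query object are invented for this illustration. Only the UQ figures come from the slides; the two placeholder rows are not DBpedia values. The language filter on ?name is not modelled.

```scala
// Plain-Scala sketch of what the query computes. Only the UQ figures come
// from the slides; the placeholder rows are illustrative, not DBpedia data.
final case class University(
    name: String,
    affiliations: Set[String],
    students: Option[Int],   // OPTIONAL in the query: may be unbound
    undergrads: Option[Int]  // OPTIONAL in the query: may be unbound
)

object G8Query {
  val go8: String =
    "http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)"

  val data: List[University] = List(
    University("Placeholder University A", Set(go8), None, None),
    University("The University of Queensland", Set(go8), Some(48771), Some(34228)),
    University("Placeholder University B", Set.empty, Some(1000), None)
  )

  // Keep rows affiliated with the Go8, sort by student count descending
  // (rows with no count sort last), and project the three selected columns.
  def run(rows: List[University]): List[(String, Option[Int], Option[Int])] =
    rows
      .filter(_.affiliations.contains(go8))
      .sortBy(u => u.students)(Ordering[Option[Int]].reverse)
      .map(u => (u.name, u.students, u.undergrads))

  def main(args: Array[String]): Unit =
    run(data).foreach(println)
}
```

Modelling the OPTIONAL values as Option keeps universities with missing counts in the result, which is the reason the query uses OPTIONAL instead of plain triple patterns.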

Page 37: Functional manipulations of large data graphs 20160601

OpenStreetMap

Wikimedia Commons

DBpedia

US EPA RCRA

US EPA FRS

ABT Associates

Page 42: Functional manipulations of large data graphs 20160601

[Graph diagram: the universities, their labels and their links to the Go8]

UQ --[label]--> "The University of Queensland"
ANU --[label]--> "Australian National University"
UMelbourne --[label]--> "University of Melbourne"
Monash --[label]--> "Monash University"
UAdelaide --[label]--> "University of Adelaide"
USydney --[label]--> "University of Sydney"
UNSW --[label]--> "University of NSW"
Go8 --[label]--> "Group of 8"

memberOf: seven edges between the universities and Go8
affiliation: five edges (endpoints not recoverable from this transcript)

Page 43: Functional manipulations of large data graphs 20160601

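[Graph diagram: the university nodes with affiliation edges only]

UQ --[label]--> "The University of Queensland"
ANU --[label]--> "Australian National University"

Other nodes: Monash, UMelbourne, UNSW, USydney, UAdelaide
affiliation: five edges (endpoints not recoverable from this transcript)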

Page 45: Functional manipulations of large data graphs 20160601

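Graphs in Scala

import org.apache.spark.graphx._

// vertexRDD and edgeRDD are assumed to have been built earlier.
val graph: Graph[String, String] = Graph(vertexRDD, edgeRDD)

// Create a subgraph that keeps only the edges carrying the
// "affiliation" property. All vertices are retained, so vertices
// with no such edge end up as single-vertex components.
val affiliationRelatedSubgraph =
  graph.subgraph(t => t.attr == "http://dbpedia.org/ontology/affiliation")

// Find connected components of affiliationRelatedSubgraph.
val ccGraph = affiliationRelatedSubgraph.connectedComponents()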
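
GraphX's connectedComponents() labels every vertex with the lowest vertex id in its component. To make that result concrete without a Spark cluster, here is a small plain-Scala sketch that produces the same labelling by repeatedly propagating the minimum label to neighbours. It is not how GraphX implements the operation, and the object name ComponentsSketch and the vertex ids in the usage note are hypothetical.

```scala
object ComponentsSketch {
  type VertexId = Long

  // Label each vertex with the smallest vertex id reachable from it,
  // treating edges as undirected.
  def connectedComponents(
      vertices: Set[VertexId],
      edges: Set[(VertexId, VertexId)]
  ): Map[VertexId, VertexId] = {
    // Include edge endpoints even if the caller left them out of `vertices`.
    val all: Set[VertexId] = vertices ++ edges.flatMap { case (a, b) => Set(a, b) }

    // Adjacency sets, with every edge added in both directions.
    val neighbours: Map[VertexId, Set[VertexId]] =
      (edges ++ edges.map(_.swap))
        .groupBy(_._1)
        .map { case (v, es) => v -> es.map(_._2) }

    // Each round, a vertex takes the minimum of its own label and its
    // neighbours' labels. Labels only decrease, so the loop terminates.
    @annotation.tailrec
    def loop(labels: Map[VertexId, VertexId]): Map[VertexId, VertexId] = {
      val next = labels.map { case (v, l) =>
        v -> (neighbours.getOrElse(v, Set.empty[VertexId]).map(labels) + l).min
      }
      if (next == labels) labels else loop(next)
    }

    loop(all.map(v => v -> v).toMap)
  }
}
```

For example, ComponentsSketch.connectedComponents(Set(1L, 2L, 3L, 4L, 5L, 6L), Set((2L, 1L), (3L, 2L), (5L, 4L))) maps vertices 1, 2 and 3 to component 1, vertices 4 and 5 to component 4, and the isolated vertex 6 to itself.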

Page 46: Functional manipulations of large data graphs 20160601

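Graphs in Scala

import scala.collection.mutable.{HashMap, ListBuffer}

// Create a hashmap of componentLists, keyed by component id.
// The declaration is assumed; the slide does not show it.
val componentLists = HashMap[VertexId, ListBuffer[VertexId]]()

// collect() brings the pairs to the driver, where the hashmap lives.
affiliationRelatedSubgraph.vertices
  .leftJoin(ccGraph.vertices) { case (id, u, comp) => comp.get }
  .collect()
  .foreach { case (id, startingNode) =>
    if (!componentLists.contains(startingNode)) {
      componentLists(startingNode) = new ListBuffer[VertexId]
    }
    componentLists(startingNode) += id
  }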

Page 47: Functional manipulations of large data graphs 20160601

Graphs in Scala

// Output a report on the connected components.
// labelMap (vertex id to label) is assumed to be defined elsewhere.
println("------ connected components in related triples ------\n")
for ((component, componentList) <- componentLists) {
  if (componentList.size > 1) {
    for (c <- componentList) {
      println(labelMap(c))
    }
    println("--------------------------")
  }
}
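
The grouping and reporting steps can also be written without mutable state, which may sit closer to the functional theme of the talk. The sketch below is one possible version over plain collections, assuming the (vertex id, component id) pairs have already been collected to the driver. The object and method names (GroupingSketch, groupByComponent, report) are hypothetical and not from the talk's code.

```scala
object GroupingSketch {
  type VertexId = Long

  // Turn (vertex id, component id) pairs into a map from component id
  // to the sorted list of its member vertices.
  def groupByComponent(
      pairs: Seq[(VertexId, VertexId)]
  ): Map[VertexId, List[VertexId]] =
    pairs
      .groupBy { case (_, component) => component }
      .map { case (component, members) =>
        component -> members.map(_._1).sorted.toList
      }

  // Build the report lines: for each component with more than one member,
  // the labels of its members followed by a separator line.
  def report(
      components: Map[VertexId, List[VertexId]],
      labelMap: Map[VertexId, String]
  ): List[String] =
    components.toList
      .sortBy(_._1)
      .collect { case (_, members) if members.size > 1 =>
        members.map(labelMap) :+ "--------------------------"
      }
      .flatten
}
```

Returning the report as a list of lines, instead of printing inside the loop, keeps the function pure and makes the output order deterministic, since components are sorted by id before rendering.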

Page 48: Functional manipulations of large data graphs 20160601

------ connected components in related triples ------

Australian National University
University of Sydney
University of Adelaide
University of New South Wales
--------------------------
The University of Queensland
University of Melbourne
Monash University
--------------------------

Page 49: Functional manipulations of large data graphs 20160601

Resources

• Slides: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/slides

• Code: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/code

Page 50: Functional manipulations of large data graphs 20160601

Resources

• Callimachus: http://callimachusproject.org

• Apache Spark: http://spark.apache.org

• GraphX Programming Guide: http://spark.apache.org/docs/latest/graphx-programming-guide.html

Page 51: Functional manipulations of large data graphs 20160601

Attributions

• Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch, used under a CC license: http://lod-cloud.net/

Page 52: Functional manipulations of large data graphs 20160601

This work is Copyright © 2015 David Hyland-Wood. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.