70
A Little Graph Theory for the Busy Developer Dr. Jim Webber Chief Scientist, Neo Technology @jimwebber

A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

A Little Graph Theory for the Busy Developer

Dr. Jim Webber

Chief Scientist, Neo Technology

@jimwebber

Page 2: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Roadmap

• Imprisoned data

• Labeled Property Graph Model – And some cultural similarities

• Graph theory – South East London

– World War I

• Graph matching – Beer, diapers and Xbox

• End

Page 3: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 4: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

http://www.flickr.com/photos/crazyneighborlady/355232758/

Page 5: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

http://gallery.nen.gov.uk/image82582-.html

Page 6: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 7: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Aggregate-Oriented Data http://martinfowler.com/bliki/AggregateOrientedDatabase.html

“There is a significant downside - the whole approach works really well

when data access is aligned with the aggregates, but what if you want to

look at the data in a different way? Order entry naturally stores orders as

aggregates, but analyzing product sales cuts across the aggregate structure.

The advantage of not using an aggregate structure in the database is that it

allows you to slice and dice your data different ways for different

audiences.

This is why aggregate-oriented stores talk so much about map-reduce.”

Page 8: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 9: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

complexity = f(size, connectedness, uniformity)

Page 10: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 11: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

http://www.bbc.co.uk/london/travel/downloads/tube_map.html

Page 12: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Labeled Property graph model

• Nodes with optional properties and optional labels

• Named, directed relationships with optional properties

– Relationships have exactly one start and end node

– Which may be the same node

Page 13: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

stole from

loves loves

enemy

enemy

A Good Man Goes to War

appeared in

appeared in

appeared in

appeared in

Victory of the Daleks

appeared in

appeared in

companion

companion

enemy

Page 14: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

stole from

loves loves

enemy

enemy

A Good Man Goes to War

appeared in

appeared in

appeared in

appeared in

Victory of the Daleks

appeared in

appeared in

companion

companion

enemy

planet

prop

species

species

species

character

character

character

episode

episode

Page 15: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

http://blogs.adobe.com/digitalmarketing/analytics/predictive-analytics/predictive-analytics-and-the-digital-marketer/

Page 16: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

Meet Leonhard Euler

• Swiss mathematician

• Inventor of Graph Theory (1736)

16

Page 17: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Königsberg (Prussia) - 1736

Page 18: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

A

B

D

C

Page 19: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

A

B

D

C

1

2

3

4

7

6

5

Page 20: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg 20

Page 21: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 22: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Triadic Closure

name: Kyle

name: Stan name: Kenny

Page 23: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Triadic Closure

name: Kyle

name: Stan name: Kenny

name: Kyle

name: Stan name: Kenny FRIEND

Page 24: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Structural Balance

name: Cartman

name: Craig name: Tweek

Page 25: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Structural Balance

name: Cartman

name: Craig name: Tweek

name: Cartman

name: Craig name: Tweek FRIEND

Page 26: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Structural Balance

name: Cartman

name: Craig name: Tweek

name: Cartman

name: Craig name: Tweek ENEMY

Page 27: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Structural Balance

name: Kyle

name: Stan name: Kenny

name: Kyle

name: Stan name: Kenny FRIEND

Page 28: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Structural Balance is a key predictive technique

And it’s domain-agnostic

Page 29: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Allies and Enemies

UK

Germany France

Russia Italy

Austria

Page 30: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Allies and Enemies

UK

Germany France

Russia Italy

Austria

Page 31: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Allies and Enemies

UK

Germany France

Russia Italy

Austria

Page 32: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Allies and Enemies

UK

Germany France

Russia Italy

Austria

Page 33: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Allies and Enemies

UK

Germany France

Russia Italy

Austria

Page 34: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Allies and Enemies

UK

Germany France

Russia Italy

Austria

Page 35: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Predicting WWI [Easley and Kleinberg]

Page 36: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Strong Triadic Closure

It if a node has strong relationships to two neighbours, then these neighbours must have at least a weak relationship between them.

[Wikipedia]

Page 37: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Triadic Closure (weak relationship)

name: Kenny

name: Stan name: Cartman

Page 38: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Triadic Closure (weak relationship)

name: Kenny

name: Stan name: Cartman

name: Kenny

name: Stan name: Cartman

FRIEND 50%

Page 39: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Weak relationships

• Relationships can have “strength” as well as intent

– Think: weighting on a relationship in a property graph

• Weak links play another super-important structural role in graph theory

– They bridge neighbourhoods

Page 40: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Local Bridges

FRIEND

name: Kenny

name: Stan name: Kyle FRIEND

FRIEND

name: Sally

name: Bebe name: Wendy FRIEND

FRIEND 50%

name: Cartman FRIEND

ENEMY

Page 41: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Local Bridge Property

“If a node A in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong relationships, then any local bridge it is involved in must be a weak relationship.” [Easley and Kleinberg]

Page 42: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

University Karate Club

Page 43: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Graph Partitioning

• (NP) Hard problem

– Recursively remove the spanning links between dense regions

– Or recursively merge nodes into ever larger “subgraph” nodes

– Choose your algorithm carefully – some are better than others for a given domain

• Can use to (almost exactly) predict the break up of the karate club!

Page 44: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

University Karate Clubs (predicted by Graph Theory)

9

Page 45: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

University Karate Clubs (what actually happened!)

Page 46: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 47: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Cypher

• Declarative graph pattern matching language

– “SQL for graphs”

– A humane tool pioneered by a tamed SQL DBA

• A pattern graph matching language

– Find me stuff like…

Page 48: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 49: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Firstname:

Mickey

Surname: Smith

DoB: 19781006

SKU: 5e175641

Product:

Badgers

Nadgers Ale

SKU: 2555f258

Product:

Peewee Pilsner

Category: beer

SKU: 49d102bc

Product: Baby

Dry Nights

Category:

nappies

Category: baby Category:

alcoholic

drinks

SKU: 49d102bc

Product: XBox

360

Category:

consumer

electronics

Category:

console

BOUGHT BOUGHT

MEMBER_OF

MEMBER_OF MEMBER_OF

MEMBER_OF MEMBER_OF

Page 50: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 51: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Firstname: *

Surname: *

DoB: 1996 > x

> 1972

Category: beer Category:

nappies

BOUGHT Category: game

console

Page 52: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Firstname: *

Surname: *

DoB: 1996 > x

> 1972

Category: beer Category:

nappies

!BOUGHT Category: game

console

Page 53: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

(beer) (nappies)

(console)

(daddy)

() ()

()

Page 54: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Flatten the graph

(d)-[:BOUGHT]->()-[:MEMBER_OF]->(n)

(d)-[:BOUGHT]->()-[:MEMBER_OF]->(b)

(d)-[:BOUGHT]->()-[:MEMBER_OF]->(c)

Page 55: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Include any labels

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category)

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category)

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(c:Category)

Page 56: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Add a MATCH clause

MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category)

Page 57: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Constrain the Pattern

MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),

(c:Category)

WHERE NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))

Page 58: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Add property constraints

MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),

(c:Category)

WHERE n.category = "nappies" AND

b.category = "beer" AND

c.category = "console" AND

NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))

Page 59: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Profit!

MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),

(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),

(c:Category)

WHERE n.category = "nappies" AND

b.category = "beer" AND

c.category = "console" AND

NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))

RETURN DISTINCT d AS daddy

Page 60: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Results

==> +---------------------------------------------+

==> | daddy |

==> +---------------------------------------------+

==> | Node[15]{name:"Rory Williams",dob:19880121} |

==> +---------------------------------------------+

==> 1 row

==> 0 ms

==>

neo4j-sh (0)$

Page 61: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 62: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Facebook Graph Search

Which sushi restaurants in NYC do my friends like?

See http://maxdemarzi.com/

Page 63: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Graph Structure

Page 64: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Cypher (1.x) query

START me=node:person(name = 'Jim'),

location=node:location(location='New York'),

cuisine=node:cuisine(cuisine='Sushi')

MATCH (me)-[:IS_FRIEND_OF]->(friend)-[:LIKES]->(restaurant)

-[:LOCATED_IN]->(location),(restaurant)-[:SERVES]->(cuisine)

RETURN restaurant

Page 65: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Cypher query is trivial!

MATCH (me)-[:IS_FRIEND_OF]->()-[:LIKES]->(restaurant)

-[:LOCATED_IN]->(venue),

(restaurant)-[:SERVES]->(cuisine)

WHERE me.name = 'Jim' AND

venue.location='New York' AND

cuisine.cuisine='Sushi'

RETURN restaurant

Page 66: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Search structure

Page 67: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the
Page 68: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

What’s Neo4j good for?

• Data centre management • Supply chain/provenance • Recommendations • Business intelligence • Social computing • MDM • Web of things • Time series/event data • Product/engineering catalogue • Web analytics, user journeys • Scientific computing • Spatial • Geo/Seismic/Meteorological • Bio/Pharma • And many, many more…

Page 69: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Free O’Reilly book! (if you come talk to me or attend

tomorrow’s Neo4j meetup)

Or visit http://graphdatabases.com

for the eBook version

Page 70: A Little Graph Theory for the Busy Developer...aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the

Thanks for listening @jimwebber