Modelling Data as Graphs (Neo4j)

Modelling data in Neo4j for beginners: common mistakes, frequently asked questions, hardware sizing and a few extra tips.

Modelling Data in Neo4j

a few best practices and lessons learned

by Michal Bachman

GraphAware™

Example Domain

• Ride-sharing website
• History of rides
• Friendships from Facebook
• Aim: build trust between users

Modelling Data as Graphs

There is no single correct way.

Graphs are very whiteboard friendly.

[Diagram sequence: User nodes (Michael, Laura, Peter, Alice, Jenny), some connected by FRIEND_OF relationships. Rides are first drawn as DROVE relationships between users, each with a date (2014-01-29, 2014-01-29, 2014-01-27), and then RODE_TOGETHER relationships are added. Finally, each ride becomes a Ride node with date, from and to properties (2014-01-29, London to Nottingham; 2014-01-27, Brighton to Hastings), connected to the users through DRIVER and PASSENGER relationships.]

Nodes vs. Relationships

Make the important concepts in your domain nodes; you will gain flexibility.
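The advice above can be made concrete with a small sketch. It is plain Java rather than Neo4j's API, and the `Ride` and `RideModelSketch` classes, like the sample data, are hypothetical. With each ride modelled as a node, its date and route live on the ride itself, any number of passengers can attach to it, and "who rode together" can be derived from the rides instead of being stored as extra RODE_TOGETHER relationships.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a Ride node: the ride's own properties,
// plus the users reached through its DRIVER and PASSENGER relationships.
final class Ride {
    final String date, from, to;
    final String driver;                               // user linked by the DRIVER relationship
    final List<String> passengers = new ArrayList<>(); // users linked by PASSENGER relationships

    Ride(String date, String from, String to, String driver) {
        this.date = date;
        this.from = from;
        this.to = to;
        this.driver = driver;
    }
}

class RideModelSketch {
    // Everyone who shared at least one ride with `user`, derived from the Ride nodes
    // instead of being stored as separate RODE_TOGETHER relationships.
    static Set<String> rodeTogetherWith(List<Ride> rides, String user) {
        Set<String> result = new TreeSet<>();
        for (Ride ride : rides) {
            Set<String> onBoard = new TreeSet<>(ride.passengers);
            onBoard.add(ride.driver);
            if (onBoard.contains(user)) {
                result.addAll(onBoard);
            }
        }
        result.remove(user);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data: the slides do not say who drove or joined which ride.
        Ride london = new Ride("2014-01-29", "London", "Nottingham", "Alice");
        london.passengers.add("Peter");
        Ride brighton = new Ride("2014-01-27", "Brighton", "Hastings", "Jenny");
        brighton.passengers.add("Peter");
        brighton.passengers.add("Alice");
        System.out.println(rodeTogetherWith(List.of(london, brighton), "Peter")); // [Alice, Jenny]
    }
}
```

Adding a new fact about a ride later (a price, a rating) only touches the Ride node, which is the flexibility the slide refers to.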

[Diagram: the Ride model extended with RATED relationships from passengers to rides, each with a rating property (rating: 5, rating: 3).]

Bidirectional Relationships: a common mistake

Ice Hockey (Implied Relationship)

[Diagram: Czech Republic and Sweden connected by a DEFEATED relationship. Adding a DEFEATED_BY relationship in the opposite direction is redundant, because it is implied by DEFEATED.]

Company Partnership (Naturally Bidirectional)

[Diagram: Neo Technology and GraphAware connected by two PARTNER relationships, one in each direction, reduced to a single PARTNER relationship.]
Traversal Speed

In Neo4j, the speed of traversal does not depend on the direction of the relationships being traversed.

Why?
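Neo4j Data Layout

Node record in the node store (9 bytes); first bit = inUse flag:
• next relationship (35 bits)
• next property (36 bits)

Relationship record in the relationship store (33 bytes); first bit = inUse flag, second bit unused:
• first node (35 bits)
• second node (35 bits)
• type (16 bits)
• first node's previous relationship (35 bits)
• first node's next relationship (35 bits)
• second node's previous relationship (35 bits)
• second node's next relationship (35 bits)
• next property (36 bits)

The fields fill the records exactly: 1 + 35 + 36 = 72 bits = 9 bytes per node, and 2 + 2 x 35 + 16 + 4 x 35 + 36 = 264 bits = 33 bytes per relationship. Because every relationship record holds previous/next pointers for both of its nodes, it belongs to both nodes' relationship chains and is reached the same way from either end, whatever its direction.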
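A relationship record carries a previous and a next pointer for each of its two nodes, so it belongs to both nodes' relationship chains. The plain-Java simulation below is a simplified sketch of that idea under stated assumptions: list indices stand in for the 35-bit record ids, -1 means "no relationship", and properties, the inUse flag and the on-disk encoding are left out. The class names are hypothetical. Walking a node's chain only asks whether the node is the record's first or second node, so an incoming relationship costs the same to reach as an outgoing one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified relationship record: ids are list indices and -1 means "no relationship".
// The prev pointers mirror the real layout; they would be used to unlink a record on delete.
final class RelRecord {
    final int firstNode, secondNode;
    int firstPrev = -1, firstNext = -1, secondPrev = -1, secondNext = -1;

    RelRecord(int firstNode, int secondNode) {
        this.firstNode = firstNode;
        this.secondNode = secondNode;
    }
}

class RelationshipChainSketch {
    final List<RelRecord> rels = new ArrayList<>();
    // Stands in for each node record's "next relationship" field: the head of its chain.
    final Map<Integer, Integer> nodeFirstRel = new HashMap<>();

    // Creates a first -> second relationship and prepends it to BOTH nodes' chains.
    int createRelationship(int first, int second) {
        if (first == second) throw new IllegalArgumentException("self-relationships not handled in this sketch");
        int id = rels.size();
        RelRecord rec = new RelRecord(first, second);
        rels.add(rec);
        rec.firstNext = prepend(first, id);
        rec.secondNext = prepend(second, id);
        return id;
    }

    // Makes record `id` the head of `node`'s chain and returns the previous head.
    private int prepend(int node, int id) {
        int oldHead = nodeFirstRel.getOrDefault(node, -1);
        if (oldHead != -1) {
            RelRecord head = rels.get(oldHead);
            if (head.firstNode == node) head.firstPrev = id; else head.secondPrev = id;
        }
        nodeFirstRel.put(node, id);
        return oldHead;
    }

    // Walks a node's chain; per record, the only decision is whether the node is its first or second node.
    List<Integer> relationshipsOf(int node) {
        List<Integer> result = new ArrayList<>();
        int current = nodeFirstRel.getOrDefault(node, -1);
        while (current != -1) {
            result.add(current);
            RelRecord rec = rels.get(current);
            current = (rec.firstNode == node) ? rec.firstNext : rec.secondNext;
        }
        return result;
    }

    public static void main(String[] args) {
        RelationshipChainSketch store = new RelationshipChainSketch();
        store.createRelationship(0, 1); // record 0: node 0 -> node 1
        store.createRelationship(2, 0); // record 1: node 2 -> node 0
        // Node 0 reaches both records, outgoing and incoming, by the same pointer walk.
        System.out.println(store.relationshipsOf(0)); // [1, 0]
        System.out.println(store.relationshipsOf(1)); // [0]
    }
}
```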

Traversal APIs

Neo4j APIs allow developers to ignore relationship direction completely when querying the graph.
Cypher

MATCH (neo)-[:PARTNER]->(partner)   // outgoing PARTNER relationships only
MATCH (neo)<-[:PARTNER]-(partner)   // incoming PARTNER relationships only
MATCH (neo)-[:PARTNER]-(partner)    // either direction
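To illustrate what ignoring direction means, here is a minimal plain-Java sketch over a toy in-memory list of relationships. `Rel`, `Direction` and `DirectionSketch` are hypothetical stand-ins, not Neo4j classes. A single PARTNER relationship is stored once, and the same lookup filters it as OUTGOING, INCOMING or BOTH, much as the three MATCH patterns do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy relationship: stored exactly once, with a start and an end node.
final class Rel {
    final String start, type, end;

    Rel(String start, String type, String end) {
        this.start = start;
        this.type = type;
        this.end = end;
    }
}

// Our own enum, modelled on the idea of Neo4j's Direction (not the real class).
enum Direction { OUTGOING, INCOMING, BOTH }

class DirectionSketch {
    // Nodes at the other end of `node`'s relationships of the given type, filtered by direction.
    static List<String> neighbours(List<Rel> rels, String node, String type, Direction dir) {
        List<String> result = new ArrayList<>();
        for (Rel r : rels) {
            if (!r.type.equals(type)) continue;
            if (r.start.equals(node) && dir != Direction.INCOMING) {
                result.add(r.end);   // node is the start: an outgoing relationship
            } else if (r.end.equals(node) && dir != Direction.OUTGOING) {
                result.add(r.start); // node is the end: an incoming relationship
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A partnership is naturally bidirectional, so one PARTNER relationship is stored.
        List<Rel> rels = List.of(new Rel("Neo Technology", "PARTNER", "GraphAware"));
        System.out.println(neighbours(rels, "GraphAware", "PARTNER", Direction.OUTGOING)); // []
        System.out.println(neighbours(rels, "GraphAware", "PARTNER", Direction.BOTH));     // [Neo Technology]
    }
}
```

By the same reasoning, an implied DEFEATED_BY relationship is unnecessary: asking for a team's INCOMING DEFEATED relationships already tells us who defeated it.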

Heads Up!

If a relationship has a different quality in each direction, it should be modelled as two relationships!

[Diagram: Geeky Guy and Girl, connected by LOVES in one direction and DOESN’T CARE ABOUT in the other.]

[Diagram sequence: passengers rate rides with RATED relationships carrying a rating property (rating: 5, rating: 3). The alternative then shown replaces the numeric rating (rating: ?) with the relationship type itself, on a scale of HATED, DISLIKED, NEUTRAL, LIKED and LOVED, so the two RATED relationships become LOVED and NEUTRAL.]

Qualifying Relationships: a performance comparison
Qualifying by Properties

[Diagram: RATED relationships from passengers to rides, carrying rating: 5 and rating: 3.]
Who liked the ride? (Cypher)

START ride=node({id})
MATCH (ride)<-[r:RATED]-(passenger)
WHERE r.rating > 3
RETURN passenger
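Who liked the ride? (Java)

for (Relationship r : ride.getRelationships(INCOMING, RATED)) {
    // the rating property must be read to decide whether the passenger liked the ride
    if ((int) r.getProperty("rating") > 3) {
        Node passenger = r.getStartNode(); // RATED points from the passenger to the ride
        // do something with it
    }
}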

Qualifying by Relationship Type

[Diagram: the same rides, with the ratings expressed as LOVED and NEUTRAL relationships instead of RATED.]
Who liked the ride? (Cypher)

START ride=node({id})
MATCH (ride)<-[r:LIKED|LOVED]-(passenger)
RETURN passenger

Who liked the ride? (Java)

for (Relationship r : ride.getRelationships(INCOMING, LIKED, LOVED)) {
    // the relationship type alone qualifies the rating; no property is read
    Node passenger = r.getStartNode();
    // do something with it
}
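Both styles answer "who liked the ride?" with the same result. The plain-Java sketch below uses toy in-memory relationships rather than Neo4j's API; all class names are hypothetical, and mapping ratings 1 to 5 onto HATED through LOVED is an assumption consistent with the diagrams, where rating 5 becomes LOVED and rating 3 becomes NEUTRAL. The property version has to read a rating on every RATED relationship, while the type version decides from the type alone, which in Neo4j is stored in the relationship record itself; properties sit behind a separate pointer into the property store.

```java
import java.util.ArrayList;
import java.util.List;

// The slides' scale in ascending order; mapping rating 1..5 onto it is our assumption.
enum RatingType {
    HATED, DISLIKED, NEUTRAL, LIKED, LOVED;

    static RatingType fromRating(int rating) { // valid for ratings 1..5 only
        return values()[rating - 1];
    }
}

// Toy incoming relationship on a ride: (passenger)-[:RATED {rating: n}]->(ride).
final class RatedRel {
    final String passenger;
    final int rating;
    RatedRel(String passenger, int rating) { this.passenger = passenger; this.rating = rating; }
}

// Toy incoming relationship whose type carries the rating, e.g. (passenger)-[:LOVED]->(ride).
final class TypedRel {
    final String passenger;
    final RatingType type;
    TypedRel(String passenger, RatingType type) { this.passenger = passenger; this.type = type; }
}

class QualifyingSketch {
    // Qualifying by property: every RATED relationship's rating has to be read.
    static List<String> likedByProperty(List<RatedRel> rels) {
        List<String> result = new ArrayList<>();
        for (RatedRel r : rels) {
            if (r.rating > 3) result.add(r.passenger);
        }
        return result;
    }

    // Qualifying by type: the type alone decides, no property is read.
    static List<String> likedByType(List<TypedRel> rels) {
        List<String> result = new ArrayList<>();
        for (TypedRel r : rels) {
            if (r.type == RatingType.LIKED || r.type == RatingType.LOVED) result.add(r.passenger);
        }
        return result;
    }

    // Converts the property model into the type model, keeping the order.
    static List<TypedRel> toTypes(List<RatedRel> rels) {
        List<TypedRel> result = new ArrayList<>();
        for (RatedRel r : rels) result.add(new TypedRel(r.passenger, RatingType.fromRating(r.rating)));
        return result;
    }

    public static void main(String[] args) {
        // Illustrative ratings, not data from the slides.
        List<RatedRel> rated = List.of(new RatedRel("Peter", 5), new RatedRel("Jenny", 3));
        System.out.println(likedByProperty(rated));      // [Peter]
        System.out.println(likedByType(toTypes(rated))); // [Peter]
    }
}
```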

[Diagram: the relationship-type model, with LOVED and NEUTRAL relationships.]

Winner! Qualifying by relationship type.
Other interesting info?

Hardware Sizing: a frequently asked question

Neo4j Architecture

[Diagram: the Neo4j stack. From the bottom: the HDD with the record files and the transaction log; the operating system and its file system cache, holding nodes, relationships, properties and relationship types; and the JVM running Neo4j, with its object cache, transaction management, Core API and other APIs.]

Disk Space

> cd data
> ls -ah
drwxr-xr-x  5 bachmanm  wheel   170B 19 Oct 12:56 index
-rw-r--r--  1 bachmanm  wheel    31K 19 Oct 12:56 messages.log
-rw-r--r--  1 bachmanm  wheel    69B 19 Oct 12:56 neostore
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.id
-rw-r--r--  1 bachmanm  wheel   8.8K 19 Oct 12:56 neostore.nodestore.db
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.nodestore.db.id
-rw-r--r--  1 bachmanm  wheel    39M 19 Oct 12:56 neostore.propertystore.db
-rw-r--r--  1 bachmanm  wheel   153B 19 Oct 12:56 neostore.propertystore.db.arrays
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.propertystore.db.arrays.id
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.propertystore.db.id
-rw-r--r--  1 bachmanm  wheel    43B 19 Oct 12:56 neostore.propertystore.db.index
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.propertystore.db.index.id
-rw-r--r--  1 bachmanm  wheel   140B 19 Oct 12:56 neostore.propertystore.db.index.keys
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.propertystore.db.index.keys.id
-rw-r--r--  1 bachmanm  wheel   154B 19 Oct 12:56 neostore.propertystore.db.strings
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.propertystore.db.strings.id
-rw-r--r--  1 bachmanm  wheel    31M 19 Oct 12:56 neostore.relationshipstore.db
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.relationshipstore.db.id
-rw-r--r--  1 bachmanm  wheel    38B 19 Oct 12:56 neostore.relationshiptypestore.db
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.relationshiptypestore.db.id
-rw-r--r--  1 bachmanm  wheel   140B 19 Oct 12:56 neostore.relationshiptypestore.db.names
-rw-r--r--  1 bachmanm  wheel     9B 19 Oct 12:56 neostore.relationshiptypestore.db.names.id

Disk Space

node 14B

relationship 33B

property 41B

Disk Space (Example)

1,000 nodes x 14B = 13.7 kB
1,000,000 rels x 33B = 31.5 MB
2,010,000 props x 41B = 78.6 MB
TOTAL: 110.1 MB

(Binary units: 1 kB = 1,024 bytes and 1 MB = 1,024 kB.)
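The arithmetic on this slide is easy to reproduce. The short Java sketch below is a hypothetical helper, not part of Neo4j; it multiplies the per-record sizes from the previous slide by the record counts and, like the slide, assumes one 41-byte record per property and reports binary megabytes.

```java
import java.util.Locale;

class DiskSpaceEstimate {
    // Store record sizes in bytes, taken from the "Disk Space" slide.
    static final long NODE_RECORD = 14, RELATIONSHIP_RECORD = 33, PROPERTY_RECORD = 41;

    // Raw record-store bytes; ignores indexes, string/array stores and transaction logs.
    static long bytes(long nodes, long relationships, long properties) {
        return nodes * NODE_RECORD
                + relationships * RELATIONSHIP_RECORD
                + properties * PROPERTY_RECORD;
    }

    // The slide's figures are binary: 1 MB here means 1024 * 1024 bytes.
    static double megabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long total = bytes(1_000, 1_000_000, 2_010_000);
        // Locale.ROOT keeps the decimal point a '.' regardless of the machine's locale.
        System.out.printf(Locale.ROOT, "%d bytes = %.1f MB%n", total, megabytes(total));
        // prints "115424000 bytes = 110.1 MB"
    }
}
```

The estimate covers only the node, relationship and property stores; indexes, the string and array stores and the transaction logs come on top.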

Low Level Cache

How about low level cache? Any guesses?

Same as disk space.
High Level Cache

node 344B

relationship 208B

property 116B

...

Other interesting info?

Java API vs. Cypher

• Cypher is great!
• Cypher is improving.
• But don’t be afraid of writing some Java.

Conclusion

• Experiment
• Measure
• Analyse
• Ask

Thanks!

www.graphaware.com @graph_aware

Next meetup

• The transport graph – Roads, Nodes and Automobiles (Jacqui Read)
• Transport Network Route Finding Using A Graph (Ian Cartwright & Ben Earlham)
• 26th February 2014
• Here!

Graph Databases

Ian Robinson, Jim Webber & Emil Eifrem

Compliments of Neo Technology

Take me to the pub…
