26
Traversing Graph Databases with Gremlin Marko A. Rodriguez Graph Systems Architect http://markorodriguez.com NoSQL New York City Meetup – May 16, 2011 Gremlin G =(V,E ) May 10, 2011

Traversing Graph Databases with Gremlin

Embed Size (px)

DESCRIPTION

A discussion of Blueprints, Pipes, and Gremlin. The presentation's second half was a live Gremlin tutorial/demo.

Citation preview

Page 1: Traversing Graph Databases with Gremlin

Traversing Graph Databases with Gremlin

Marko A. RodriguezGraph Systems Architecthttp://markorodriguez.com

NoSQL New York City Meetup – May 16, 2011

GremlinG = (V,E)

May 10, 2011

Page 2: Traversing Graph Databases with Gremlin

Thank You Sponsors

Short Slideshow + Live Demo = This Presentation

Page 3: Traversing Graph Databases with Gremlin

TinkerPop Productions

1

1TinkerPop (http://tinkerpop.com), Marko A. Rodriguez (http://markorodriguez.com), PeterNeubauer (http://www.linkedin.com/in/neubauer), Joshua Shinavier (http://fortytwo.net/), PavelYaskevich (https://github.com/xedin), Darrick Wiebe (http://ofallpossibleworlds.wordpress.com/), Stephen Mallette (http://stephen.genoprime.com/), Alex Averbuch (http://se.linkedin.com/in/alexaverbuch)

Page 4: Traversing Graph Databases with Gremlin

TinkerPop Graph Stack

TinkerGraph

http://host:8182/graph/vertices/1

Graph Database

Generic Interface

Traversal Engine

Traversal Language

RESTful Server

Graph-to-Object Mapper

Page 5: Traversing Graph Databases with Gremlin

TinkerPop Graph Stack – Focus

TinkerGraph

http://host:8182/graph/vertices/1

Generic Interface

Traversal Engine

Traversal Language

Page 6: Traversing Graph Databases with Gremlin

Blueprints –> Pipes –> Groovy –> Gremlin

To understand Gremlin, its important to understand Groovy,Blueprints, and Pipes.

Blueprints Pipes

Page 7: Traversing Graph Databases with Gremlin

Gremlin is a Domain Specific Language

• Gremlin 0.7+ uses Groovy as its host language.2

• Gremlin 0.7+ takes advantage of Groovy’s meta-programming, dynamictyping, and closure properties.

• Groovy is a superset of Java and as such, natively exposes the full JDKto Gremlin.

2Groovy available at http://groovy.codehaus.org/.

Page 8: Traversing Graph Databases with Gremlin

Gremlin is for Property Graphs

name = "marko"age = 29

1

4

knows

weight = 1.0

name = "josh"age = 32

name = "vadas"age = 27

2

knows

weight = 0.5

created

weight = 0.4

name = "lop"lang = "java"

3created

weight = 0.4

name = "ripple"lang = "java"

5

created

weight = 1.0

name = "peter"age = 35

6

created

weight = 0.2

78

9

11

10

12

A graph is composed of vertices (nodes), edges (links), and properties (keys/values).

Page 9: Traversing Graph Databases with Gremlin

Gremlin is for Blueprints-enabled Graph Databases

• Blueprints can be seen as the JDBC for property graph databases.3

• Provides a collection of interfaces for graph database providers to implement.

• Provides tests to ensure the operational semantics of any implementation are correct.

• Provides numerous “helper utilities” to make working with graph databases easy.

Blueprints

3Blueprints is available at http://blueprints.tinkerpop.com.

Page 10: Traversing Graph Databases with Gremlin

A Blueprints Detour - Implementations

TinkerGraph

Page 11: Traversing Graph Databases with Gremlin

A Blueprints Detour - Ouplementations

JUNGJava Universal Network/Graph Framework

Page 12: Traversing Graph Databases with Gremlin

Gremlin Compiles Down to Pipes

• Pipes is a data flow framework for evaluating lazy graph traversals.4

• A Pipe extends Iterator, Iterable and can be chained togetherto create processing pipelines.

• A Pipe can be embedded into another Pipe to allow for nestedprocessing.

Pipes4Pipes is available at http://pipes.tinkerpop.com.

Page 13: Traversing Graph Databases with Gremlin

A Pipes Detour - Chained Iterators

• This Pipeline takes objects of type A and turns them into objects oftype D.

Pipe1A B Pipe2 C Pipe3 D

Pipeline

A

AA

A

D

DD

D

Pipe<A,D> pipeline =

new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>)

Page 14: Traversing Graph Databases with Gremlin

A Pipes Detour - Simple Example

“What are the names of the people that marko knows?”

A C

B

D

knows

knows

created

createdname=marko

name=peter

name=pavel

name=gremlin

Page 15: Traversing Graph Databases with Gremlin

A Pipes Detour - Simple ExamplePipe<Vertex,Edge> pipe1 = new OutEdgesPipe("knows");

Pipe<Edge,Vertex> pipe2 = new InVertexPipe();

Pipe<Vertex,String> pipe3 = new PropertyPipe<String>("name");

Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3);

pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A"));

A C

B

D

knows

knows

created

createdname=marko

name=peter

name=pavel

OutEdgesPipe("knows")

InVertexPipe()

PropertyPipe("name")

name=gremlin

HINT: The benefit of Gremlin is that this Java verbosity is reduced to g.v(‘A’).outE(‘knows’).inV.name.

Page 16: Traversing Graph Databases with Gremlin

A Pipes Detour - Pipes Library

[ FILTERS ]

AndFilterPipe

BackFilterPipe

CollectionFilterPipe

ComparisonFilterPipe

DuplicateFilterPipe

FutureFilterPipe

ObjectFilterPipe

OrFilterPipe

RandomFilterPipe

RangeFilterPipe

[ GRAPHS ]

OutEdgesPipe

InEdgesPipe

OutVertexPipe

InVertexPip

IdFilterPipe

IdPipe

LabelFilterPipe

LabelPipe

PropertyFilterPipe

PropertyPipe

[ SIDEEFFECTS ]

AggregatorPipe

GroupCountPipe

CountPipe

SideEffectCapPipe

[ UTILITIES ]

GatherPipe

PathPipe

ScatterPipe

Pipeline

...

Page 17: Traversing Graph Databases with Gremlin

A Pipes Detour - Creating Pipes

public class NumCharsPipe extends AbstractPipe<String,Integer> {

public Integer processNextStart() {

String word = this.starts.next();

return word.length();

}

}

When extending the base class AbstractPipe<S,E> all that is required isan implementation of processNextStart().

Page 18: Traversing Graph Databases with Gremlin

Now onto Gremlin proper...

Page 19: Traversing Graph Databases with Gremlin

The Gremlin Architecture

OrientDB DEXNeo4j

Page 20: Traversing Graph Databases with Gremlin

The Many Ways of Using Gremlin

• Gremlin has a REPL to be run from the shell.

• Gremlin can be natively integrated into any Groovy class.

• Gremlin can be interacted with indirectly through Java, via Groovy.

• Gremlin has a JSR 223 ScriptEngine as well.

Page 21: Traversing Graph Databases with Gremlin

Pipe = Step: 3 Generic Steps

In general a Pipe is a “step” in Gremlin. Gremlin is founded on a collectionof atomic steps. The syntax is generally seen as step.step.step. Thereare 3 categories of steps.

• transform: map the input to some output. S → T

? outE, inV, paths, copySplit, fairMerge, . . .

• filter: either output the input or not. S → (S ∪ ∅)

? back, except, uniqueObject, andFilter, orFilter, . . .

• sideEffect: output the input and yield a side-effect. S → S

? aggregate, groupCount, . . .

Page 22: Traversing Graph Databases with Gremlin

Abstract/Derived/Inferred Adjacency

}

outE

inVoutE

loop(2){..} back(3){it.salary}

friend_of_a_friend_who_earns_less_than_friend_at_work

• outE, inV, etc. is low-level graph speak (the domain is the graph).

• codeveloper is high-level domain speak (the domain is software development).5

5In this way, Gremlin can be seen as a DSL (domain-specific language) for creating DSL’s for yourgraph applications. Gremlin’s domain is “the graph.” Build languages for your domain on top of Gremlin(e.g. “software development domain”).

Page 23: Traversing Graph Databases with Gremlin

Explicit Graph

Developer CreatedGraph

Software ImportsGraph

FriendshipGraph

EmploymentGraph

Developer Imports Graph

Friend-Of-A-FriendGraph

Friends at WorkGraph

Developer's FOAF ImportGraph

You need not make derived graphs explicit. You can, at runtime, compute them. Moreover, generate them locally, not globally (e.g. ``Marko's friends from work relations").

This concept is related to automated reasoning and whether reasoned relations are inserted into the explicit graph or computed at query time.

Page 24: Traversing Graph Databases with Gremlin

Integrating the JDK (Java API)

• Groovy is the host language for Gremlin. Thus, the JDK is nativelyavailable in a path expression.

gremlin> v.out(‘friend’){it.name.matches(‘M.*’)}.name

==>Mattias

==>Marko

==>Matt

...

gremlin> x.out(‘rated’).transform{JSONParser.get(it.uri).stars}.mean()

...

gremlin> y.outE(‘rated’).filter{TextAnalysis.isElated(it.review)}.inV

...

Page 25: Traversing Graph Databases with Gremlin

Time for a demo

marko$ gremlin

\,,,/

(o o)

-----oOOo-(_)-oOOo-----

gremlin>

GremlinG = (V,E)

http://gremlin.tinkerpop.com

http://groups.google.com/group/gremlin-users

http://tinkerpop.com

Page 26: Traversing Graph Databases with Gremlin

Graph Bootcamp

http://markorodriguez.com/services/workshops/graph-bootcamp/

• June 23-24th in Chicago, Illinois.

• August X-Yth in Denver, Colorado.

• Book a private gig for your organization.