Graph Databases and Java

Chuck Olson
Software Engineer
October 2015
colson@anl.gov

Outline

- Assumptions
- What is a graph and what are they good for?
- What is a graph database?
- What is Neo4J and how does one use it?
- Case: Subway Model
- Results Compilation
- Questions


Audience Assumptions

Working knowledge of:

- Java
- Relational databases


What is a graph?

- Collection of nodes and edges
- Edges can be directed (or not)
- Edges can represent many things

[Figure: example graph with the nodes Chuck, Jim, Jay, Gary, and Coco, connected by "Knows" edges and one "Annoys" edge.]

What is a graph?
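The definition can be made concrete with a minimal Java sketch: a graph held as nodes plus labeled, directed edges in an adjacency list. The node names and the edge labels come from the example diagram, but which person knows whom is not recoverable from the slide, so every specific edge below is an assumption for illustration. `SimpleGraph` and its methods are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SimpleGraph {
    // A directed edge carrying a label such as "Knows" or "Annoys".
    static final class Edge {
        final String from;
        final String label;
        final String to;
        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Adjacency list: each node name maps to its outgoing edges.
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    void addNode(String name) {
        outgoing.computeIfAbsent(name, k -> new ArrayList<>());
    }

    void addEdge(String from, String label, String to) {
        addNode(from);
        addNode(to);
        outgoing.get(from).add(new Edge(from, label, to));
    }

    // Nodes reachable from 'from' over exactly one edge with the given label.
    List<String> neighbors(String from, String label) {
        List<String> result = new ArrayList<>();
        for (Edge e : outgoing.getOrDefault(from, Collections.<Edge>emptyList())) {
            if (e.label.equals(label)) {
                result.add(e.to);
            }
        }
        return result;
    }

    // Example data; the specific connections are assumed, not taken from the slide.
    static SimpleGraph example() {
        SimpleGraph g = new SimpleGraph();
        g.addEdge("Chuck", "Knows", "Jim");
        g.addEdge("Chuck", "Knows", "Jay");
        g.addEdge("Jim", "Knows", "Gary");
        g.addEdge("Coco", "Annoys", "Chuck");
        return g;
    }

    public static void main(String[] args) {
        SimpleGraph g = example();
        System.out.println("Chuck knows: " + g.neighbors("Chuck", "Knows"));
        System.out.println("Coco annoys: " + g.neighbors("Coco", "Annoys"));
    }
}
```

An undirected edge can be modeled in this sketch by adding the same edge in both directions.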


Transportation Example

[Figure: route map linking Los Angeles, Denver, Dallas, Chicago, and New York, with edges labeled by times (12:00, 13:00, 15:00, 16:00, 17:00, 18:00, 20:00).]

What are graphs good for?

- Often map more directly to the structure of some object-oriented problems.
- Work best for storing “richly connected” data.
- Many algorithms exist to extract useful information:
  – Dijkstra’s shortest path
  – Minimum spanning tree (Kruskal and others)
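As an illustration of the first algorithm named above, here is a minimal Java sketch of Dijkstra's shortest path over a weighted, directed graph. The city names are borrowed from the transportation example, but the connections and the weights (travel hours) are assumptions made up for this sketch. `ShortestPathSketch` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class ShortestPathSketch {
    // A queue entry: a node and the tentative distance at which it was queued.
    static final class Visit implements Comparable<Visit> {
        final String node;
        final double distance;
        Visit(String node, double distance) {
            this.node = node;
            this.distance = distance;
        }
        @Override
        public int compareTo(Visit other) {
            return Double.compare(distance, other.distance);
        }
    }

    static void addEdge(Map<String, Map<String, Double>> graph, String from, String to, double weight) {
        graph.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
    }

    // Dijkstra's algorithm; all edge weights must be non-negative.
    static Map<String, Double> shortestDistances(Map<String, Map<String, Double>> graph, String source) {
        Map<String, Double> best = new HashMap<>();
        PriorityQueue<Visit> queue = new PriorityQueue<>();
        best.put(source, 0.0);
        queue.add(new Visit(source, 0.0));
        while (!queue.isEmpty()) {
            Visit visit = queue.poll();
            if (visit.distance > best.get(visit.node)) {
                continue; // stale entry: a shorter path was already found
            }
            Map<String, Double> edges =
                    graph.getOrDefault(visit.node, Collections.<String, Double>emptyMap());
            for (Map.Entry<String, Double> edge : edges.entrySet()) {
                double candidate = visit.distance + edge.getValue();
                if (candidate < best.getOrDefault(edge.getKey(), Double.POSITIVE_INFINITY)) {
                    best.put(edge.getKey(), candidate);
                    queue.add(new Visit(edge.getKey(), candidate));
                }
            }
        }
        return best;
    }

    // Assumed connections and travel hours, for illustration only.
    static Map<String, Map<String, Double>> exampleGraph() {
        Map<String, Map<String, Double>> graph = new HashMap<>();
        addEdge(graph, "Los Angeles", "Denver", 2.0);
        addEdge(graph, "Los Angeles", "Dallas", 3.0);
        addEdge(graph, "Denver", "Chicago", 2.0);
        addEdge(graph, "Dallas", "Chicago", 2.0);
        addEdge(graph, "Dallas", "New York", 4.0);
        addEdge(graph, "Chicago", "New York", 2.0);
        return graph;
    }

    public static void main(String[] args) {
        Map<String, Double> hours = shortestDistances(exampleGraph(), "Los Angeles");
        System.out.println("Chicago: " + hours.get("Chicago"));
        System.out.println("New York: " + hours.get("New York"));
    }
}
```

With these assumed weights, the cheapest way from Los Angeles to New York runs through Denver and Chicago (2 + 2 + 2 hours), beating both routes through Dallas.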


What is a graph database?

A NoSQL database that stores nodes and edges, and provides a mechanism to easily query information from it.

- Can contain nodes of different types
- Can have free-form attributes within nodes
- Can have edges (relationships) of different types
- Can have attributes attached to edges (distance, cost, relationship)
- Query mechanism
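The features in this list can be sketched in plain Java as a tiny in-memory model: typed nodes with free-form attributes, typed edges with their own attributes, and two simple lookups standing in for a query mechanism. This is an illustration of the data model only, not how any graph database is implemented. `PropertyGraphSketch`, `GraphNode`, and `GraphEdge` are hypothetical names, and the station values are borrowed from the Neo4J examples later in these slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PropertyGraphSketch {
    // A node has a type (label) and free-form attributes.
    static final class GraphNode {
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        GraphNode(String label) {
            this.label = label;
        }
    }

    // An edge has a type, two endpoints, and its own attributes.
    static final class GraphEdge {
        final GraphNode start;
        final String type;
        final GraphNode end;
        final Map<String, Object> properties = new HashMap<>();
        GraphEdge(GraphNode start, String type, GraphNode end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    private final List<GraphNode> nodes = new ArrayList<>();
    private final List<GraphEdge> edges = new ArrayList<>();

    GraphNode createNode(String label) {
        GraphNode node = new GraphNode(label);
        nodes.add(node);
        return node;
    }

    GraphEdge connect(GraphNode start, String type, GraphNode end) {
        GraphEdge edge = new GraphEdge(start, type, end);
        edges.add(edge);
        return edge;
    }

    // Query: nodes of a given type whose attribute 'key' equals 'value'.
    List<GraphNode> findNodes(String label, String key, Object value) {
        List<GraphNode> matches = new ArrayList<>();
        for (GraphNode node : nodes) {
            if (node.label.equals(label) && value.equals(node.properties.get(key))) {
                matches.add(node);
            }
        }
        return matches;
    }

    // Query: edges of a given type that start at 'node'.
    List<GraphEdge> edgesFrom(GraphNode node, String type) {
        List<GraphEdge> matches = new ArrayList<>();
        for (GraphEdge edge : edges) {
            if (edge.start == node && edge.type.equals(type)) {
                matches.add(edge);
            }
        }
        return matches;
    }

    static PropertyGraphSketch example() {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        GraphNode state = graph.createNode("Station");
        state.properties.put("number", "100");
        state.properties.put("name", "State St");
        GraphNode lake = graph.createNode("Station");
        lake.properties.put("number", "101");
        lake.properties.put("name", "Lake St");
        GraphEdge route = graph.connect(state, "ROUTE_TO", lake);
        route.properties.put("line", "Red");
        graph.connect(state, "TRACKS_TO", lake);
        return graph;
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = example();
        GraphNode state = graph.findNodes("Station", "number", "100").get(0);
        for (GraphEdge edge : graph.edgesFrom(state, "ROUTE_TO")) {
            System.out.println(state.properties.get("name") + " -> "
                    + edge.end.properties.get("name") + " " + edge.properties);
        }
    }
}
```

The linear scans keep the sketch short; a real graph database would use indexes and a query language rather than loops like these.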


Why would I ever use one?

Easier to find solutions to certain problems by framing data graphically.

“The right tool for the job”


Neo4J

- Open source (GPLv3 for Community Edition)
- V1.0 released in 2010
- Written in Java and Scala
- Managed by Neo Technology
- Uses the Property Graph Model
- Embedded or server
- Fully transactional
- Set of jar files, ~30 MB
- Query language: Cypher


How do you use Neo4J?

Creating a database

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseBuilder;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Location of database
String dbPath = "/Users/chuck/myneodb";

GraphDatabaseFactory factory = new GraphDatabaseFactory();
GraphDatabaseBuilder builder = factory.newEmbeddedDatabaseBuilder(dbPath);
GraphDatabaseService dbService = builder.newGraphDatabase();

Creating fixed node and edge types

import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.RelationshipType;

// Node types
public enum NodeLabel implements Label { Station }

// Relationship types
public enum RelType implements RelationshipType { TRACKS_TO, ROUTE_TO, AIRWAY_TO }

Adding nodes to a database

// These calls must run inside a transaction:
// try (Transaction tx = dbService.beginTx()) { ...; tx.success(); }

// Create Station node
Node node1 = dbService.createNode(NodeLabel.Station);

// Set properties on the Station
node1.setProperty("number", "100");
node1.setProperty("name", "State St");

// Add another
Node node2 = dbService.createNode(NodeLabel.Station);
node2.setProperty("number", "101");
node2.setProperty("name", "Lake St");

Adding edges to a database

// Create edge from node1 to node2
Relationship edge = node1.createRelationshipTo(node2, RelType.ROUTE_TO);

// Set props on the edge
edge.setProperty("route", "State St Subway");
edge.setProperty("line", "Red");

// Create another edge of a different type
edge = node1.createRelationshipTo(node2, RelType.TRACKS_TO);

Querying the database

import java.util.Iterator;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;

// Returns station numbers of all stations in graph
String queryText = "MATCH (stn:Station) RETURN stn.number";

ExecutionEngine engine = new ExecutionEngine(dbService);
ExecutionResult result = engine.execute(queryText);
Iterator<String> stnIt = result.columnAs("stn.number");

// Print results
while (stnIt.hasNext()) {
    System.out.println(stnIt.next());
}

Case: Studying Subways


Questions we might want to ask:

“Find all the stations that have air connectivity paths to station X that are less than K km”

“Find all the train routes that go through all stations that are N stops from station X”


Relational attempt…

Stations
- Number (id)
- Name
- Lat
- Lon

Segments
- SegmentId (id)
- StationFromNumber
- StationToNumber
- Length
- SegmentType

SegmentTypes
- SegmentTypeId (id)
- TypeName

LineSegments
- LineId (id)
- SegmentId
- SegmentIndex

Lines
- LineId (id)
- LineName
- LineDirection

Graph attempt…

[Figure: graph model. Each node is a station with a Station Name and Station Number. Edges are of type ROUTE_TO, TRACKS_TO, or AIRWAY_TO and carry the labels Route Name and Distance.]

[Figure: sample network. Stations (number): St. Paul’s (100), Bank (101), Cannon Street (200), Monument (201), Tower Hill (202), Tower Gateway (300). Edge labels: lines Red, Green, Yellow, and White/Blue; distances 1 km, 0.7 km, 0.2 km, 1.8 km, and 0.1 km.]

Answering a question: return all stations exactly two track segments from station 200 (Cannon Street).

MATCH p=(fromStn:Station)-[edge:TRACKS_TO*2..2]-(toStn:Station {number:'200'})
WHERE fromStn.number <> toStn.number
RETURN distinct fromStn,toStn,fromStn.number
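For comparison, the same question can be sketched in plain Java without a graph database. The sketch walks undirected track segments from the target station, never reusing a segment within one path (imitating how Cypher avoids repeating a relationship in a single matched path), and collects the stations reached after exactly the requested number of hops. The connections in `TRACKS` are assumptions: the sample network lists the stations, but its track layout is not recoverable here. `TrackHops` and `stationsAtHops` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class TrackHops {
    // Undirected track segments as pairs of station numbers (assumed layout).
    static final String[][] TRACKS = {
        {"100", "101"}, // St. Paul's - Bank
        {"101", "201"}, // Bank - Monument
        {"200", "201"}, // Cannon Street - Monument
        {"201", "202"}, // Monument - Tower Hill
        {"202", "300"}  // Tower Hill - Tower Gateway
    };

    // Stations reachable from 'target' over exactly 'hops' distinct segments.
    static Set<String> stationsAtHops(String target, int hops) {
        Set<String> found = new TreeSet<String>();
        walk(target, target, hops, new HashSet<Integer>(), found);
        return found;
    }

    private static void walk(String target, String current, int remaining,
                             Set<Integer> used, Set<String> found) {
        if (remaining == 0) {
            // Mirrors the WHERE clause: the far end must differ from the target.
            if (!current.equals(target)) {
                found.add(current);
            }
            return;
        }
        for (int i = 0; i < TRACKS.length; i++) {
            if (used.contains(i)) {
                continue; // each segment may appear only once per path
            }
            String next;
            if (TRACKS[i][0].equals(current)) {
                next = TRACKS[i][1];
            } else if (TRACKS[i][1].equals(current)) {
                next = TRACKS[i][0];
            } else {
                continue; // segment does not touch the current station
            }
            used.add(i);
            walk(target, next, remaining - 1, used, found);
            used.remove(i);
        }
    }

    public static void main(String[] args) {
        System.out.println(stationsAtHops("200", 2));
    }
}
```

With the assumed layout, the stations two segments from 200 are 101 (Bank) and 202 (Tower Hill), both reached through 201 (Monument). The Cypher version expresses the same traversal in one pattern, which is the point of the slide.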


Drawbacks

- No standard query language like SQL; query languages are vendor-specific.
- Query language learning curve.
- Lack of built-in visualization tools.


For Further Reading…

Ian Robinson, Jim Webber, and Emil Eifrem. Graph Databases, 2nd Edition. O’Reilly and Associates.

Rik Van Bruggen. Learning Neo4J. Packt Publishing.

http://www.neo4j.com

http://www.analytics-driven.com


Questions
