Avoiding Deadlocks in Neo4j on Z-Platform - Mahesh Chaudhari, Cesar Arevalo & Brian Roy

Avoiding Deadlocks: Lessons Learned with Zephyr Health Using Neo4j and MongoDB - Mahesh Chaudhari @ GraphConnect SF 2013

DESCRIPTION

Z-Platform is a new, powerful and complex platform that ingests data of any kind, stores it as JSON documents in MongoDB, and keeps a sparse representation of the same data in a Neo4j graph database. Mahesh discusses how he tackled deadlocks and significantly improved the performance of the system. The test environment ranged from small graphs (up to 10,000 relationships) to very large graphs (up to 39 million relationships). The average performance of the system is 3,741 relationships per second.

Avoiding Deadlocks in Neo4j on Z-Platform

- Mahesh Chaudhari, Cesar Arevalo & Brian Roy

Outline

• Introduction to the Z-Platform
• Problems caused by Deadlocks
• Locks and Deadlocks in Neo4j
• Avoidance using Bipartite graphs
• Performance
• Conclusion

Z-Platform

[Diagram: Source Datasets feed the Z-Platform, which writes to two stores. The MongoDB Database holds JSON documents (full entity profiles); the Neo4j Graph Database holds nodes and edges (a sparse representation of the profiles).]

Deadlocks in Z-Platform

• Creating relationships is one of the most time-consuming processes
• Log analysis reveals deadlocks among batch transactions, and the retry mechanism takes time
• Deadlocks depend on how nodes and relationships are grouped together
• Batch size depends on the size of the JSON block sent to the server
• Time required to build relationships and resolve deadlocks is on the order of seconds

Locks in Neo4j

• Create a Node n1 → Write Lock on Node n1
• Update a Node n1 → Write Lock on Node n1, but read available on Node n1
• Create a Relationship r1 between nodes n1 and n2 → Write Locks on relationship r1, n1 and n2
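
The locking rules above can be modeled in a few lines. This is only an illustrative sketch, not Neo4j code: the tuple format for operations and the helper names `write_locks` and `may_contend` are assumptions made for this example. It shows why two relationship creations that share a node can block each other while two that touch disjoint nodes cannot.

```python
def write_locks(op):
    """Return the set of entities an operation write-locks, per the slide's rules.

    `op` is a hypothetical tuple: ("create_node", n), ("update_node", n)
    or ("create_relationship", r, start, end).
    """
    kind = op[0]
    if kind in ("create_node", "update_node"):
        _, node = op
        return {node}
    if kind == "create_relationship":
        _, rel, start, end = op
        # The relationship and both of its end nodes are locked.
        return {rel, start, end}
    raise ValueError(f"unknown operation: {kind}")


def may_contend(op_a, op_b):
    """Two operations can block each other only if their lock sets overlap."""
    return bool(write_locks(op_a) & write_locks(op_b))


r1 = ("create_relationship", "r1", "A", "B")
r2 = ("create_relationship", "r2", "C", "D")
r3 = ("create_relationship", "r3", "A", "D")

print(may_contend(r1, r2))  # r1 and r2 touch disjoint nodes
print(may_contend(r1, r3))  # r1 and r3 both need node A
```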

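Deadlocks across processes

• Processes: P1 and P2
• Nodes: A, B, C, D
• Relationships: R1, R2, R3, R4

[Figure: four scenarios in which P1 and P2 create relationships among nodes A, B, C and D. The panels are labeled "No Deadlocks" (P1 on A-B via R1, P2 on C-D via R2), "Possibility of Deadlock" (P1 on A-B via R1, P2 on A-D via R3), "No Deadlocks", and "Deadlock" (relationships R1 A-B, R3 A-D, R4 A-C and R2 C-D split across P1 and P2).]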

Deadlocks across Transactions

• Transactions are also like separate processes, but run in a single thread or in multiple threads
• Deadlocks occur across transactions
– Two concurrent transactions need write locks on the same node n1
– In two concurrent transactions, T1 has a write lock on node n1 and is waiting for a write lock on node n2, whereas T2 has a write lock on n2 and is waiting for a write lock on node n1
– Transactions of varying sizes
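
The T1/T2 case above is a cycle in a wait-for graph. The sketch below is a generic illustration of that idea, not a description of how Neo4j detects deadlocks internally. It assumes each transaction waits for at most one lock, and the `held` and `waiting` dictionaries and the function name `find_deadlock` are made up for this example.

```python
def find_deadlock(held, waiting):
    """Return the transactions that form a wait cycle, or None.

    held:    lock -> transaction currently holding it
    waiting: transaction -> the single lock it is waiting for (assumption)
    """
    for start in waiting:
        path = [start]
        current = start
        while current in waiting:
            owner = held.get(waiting[current])
            if owner is None:
                break  # the lock is free, so this chain is not blocked
            if owner in path:
                return path[path.index(owner):]  # wait chain closed: deadlock
            path.append(owner)
            current = owner
    return None


# T1 holds n1 and waits for n2; T2 holds n2 and waits for n1.
held = {"n1": "T1", "n2": "T2"}
waiting = {"T1": "n2", "T2": "n1"}
print(find_deadlock(held, waiting))
```

If T2 were not waiting for anything, T1 would simply block until T2 finished and the function would return `None`.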

Concurrent Transactions Deadlocks

[Figure: two panels. T1 on A-B with T2 on C-D: "No Deadlocks". T1 on A-B with T2 on A-D: "Possibility of Deadlock".]

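Sequential Asynchronous Transactions Deadlocks

[Figure: three panels with transactions T1 and T2, where one transaction is a batch of n edges (A-B, E-F, ...) and the other is a single edge. With the single edge C-D: "No Deadlocks". With the single edge A-D in T2 and the batch in T1: "Deadlock". With the single edge A-D in T1 and the batch in T2: "No Deadlock".]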

Deadlocks Detection and Avoidance

• Deadlocks Detection
– Only possible at run-time
– Recovery from deadlock is either to abort or retry
• Deadlocks Avoidance
– Reorder the operations to lower or eliminate the likelihood of deadlocks
• Graph Clustering Algorithms: Most of them require knowledge of entire graph

Clustering Relationships → Bipartite Graphs
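
For the detection side, "abort or retry" usually amounts to a wrapper like the one below. This is a minimal sketch under stated assumptions: `DeadlockError` is a hypothetical stand-in for whatever error the client surfaces when a transaction is chosen as the deadlock victim, and `run_with_retry`, the attempt count and the backoff delays are illustrative values, not the Z-Platform settings.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the error a client raises on a deadlock."""


def run_with_retry(batch_fn, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run batch_fn, retrying with exponential backoff if it deadlocks."""
    for attempt in range(1, max_attempts + 1):
        try:
            return batch_fn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up: abort the batch
            sleep(base_delay * 2 ** (attempt - 1))


# Example: a batch that deadlocks twice before succeeding.
calls = {"count": 0}

def flaky_batch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("batch was picked as the deadlock victim")
    return "committed"

print(run_with_retry(flaky_batch, sleep=lambda seconds: None))
```

Every retry repeats work that was already attempted, which is why avoiding the deadlock by reordering operations is preferable when it is possible.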

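Bipartite Graphs

• Given a graph G with vertices V and edges E, G is a bipartite graph if V can be partitioned into two independent sets V1 and V2, so that every edge in E connects a vertex in V1 to a vertex in V2.

[Figure: an example graph on nodes A, B, C, D and E, redrawn with its nodes split into the two sets V1 and V2.]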

Creating Bipartite Graphs

• Use two colors to color each node such that no two adjacent nodes have the same color.

[Figure: the example graph on nodes A to E colored with colors 1 and 2, which gives the two sets V1 and V2.]
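
Two-coloring is usually done with a breadth-first search. The sketch below is the standard textbook version and assumes the whole graph is known up front; the function name `two_color` and the example edge lists are illustrative and are not taken from the slide's figure.

```python
from collections import deque


def two_color(nodes, edges):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None."""
    adjacency = {n: [] for n in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    color = {}
    for start in nodes:          # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # neighbor gets the other color
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # adjacent nodes share a color
    return color


square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
print(two_color("ABCD", square))
print(two_color("ABC", triangle))
```

A cycle of even length, such as the square, can be two-colored; an odd cycle, such as the triangle, cannot, which is what makes a graph non-bipartite.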

Non-Bipartite Graphs

[Figure: an example graph on nodes A to E that cannot be split into the two sets V1 and V2 using colors 1 and 2.]

Algorithm to generate Graph

• Create all the nodes
• Create batches of relationships among the same colored nodes
• Create batches of relationships across the two colors

[Figure: nodes A to E arranged in the two sets V1 and V2.]

Algorithm in Z-Platform

• Batch of relationships R = {r1, r2, r3, ..., rn}:
– each r is a triplet {src, dest, props}, where src and dest are nodes and props is a set of key-value pairs
• Color the nodes based on each relationship with two colors
• Mark the conflicting edges, where both the src and dest nodes are of the same color
• Batch these relationships together in a single batch
• Start grouping the remaining edges such that no two batches have any node in common
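
One possible reading of these steps is sketched below. It is not the Z-Platform implementation: the slide does not say how nodes are colored or how the remaining edges are grouped, so this example assumes a single greedy coloring pass in arrival order and uses union-find to put edges that share a node into the same group. The name `plan_batches` and the sample relationships are hypothetical.

```python
def plan_batches(relationships):
    """Split (src, dest, props) triplets into one conflicting batch
    plus node-disjoint groups of the remaining edges."""
    # Step 1: color nodes greedily, one relationship at a time (assumption).
    color = {}
    for src, dest, _ in relationships:
        if src not in color and dest not in color:
            color[src], color[dest] = 0, 1
        elif src not in color:
            color[src] = 1 - color[dest]
        elif dest not in color:
            color[dest] = 1 - color[src]

    # Step 2: edges whose end nodes share a color go into a single batch.
    conflicting = [r for r in relationships if color[r[0]] == color[r[1]]]
    remaining = [r for r in relationships if color[r[0]] != color[r[1]]]

    # Step 3: union-find, so edges sharing a node end up in the same group.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dest, _ in remaining:
        parent[find(src)] = find(dest)

    groups = {}
    for rel in remaining:
        groups.setdefault(find(rel[0]), []).append(rel)
    return conflicting, list(groups.values())


rels = [("A", "B", {}), ("B", "C", {}), ("A", "C", {}), ("D", "E", {})]
conflicting, groups = plan_batches(rels)
print(conflicting)
print(groups)
```

In this sketch the node-disjoint groups could be submitted concurrently without competing for the same node locks, while the conflicting batch may share nodes with them and would presumably be run on its own. Packing groups up to the configured batch size is left out.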

Performance – Test Setup

• JDK 1.7
• Neo4j Java Binding REST API
• Neo4j Enterprise Server 1.9
• Batch size (configurable): 2000
• Test program that generates random nodes (max 1000) and relationships (max 10,000)
• Huge file that contains 10,226 nodes and 39,564,960 relationships (5 GB)

Performance – Creating Nodes

• 10,226 Nodes: 5.07 seconds
• Average Time for 2000 Nodes: 0.99 seconds ~ 1 second
• Each Node has 11 properties

[Chart: "Time in seconds for Nodes", time in seconds (0 to 1.8) for each of 6 batches.]

Performance – Creating Relationships

• 174,000 Relationships created in 47.16 seconds
• Average Time for 2000 relationships: 0.54 seconds
• Number of relationships per second: 3,689

[Chart: "Time in Seconds for relationships", time in seconds (0 to 1.4) per batch.]

Performance – Creating 39 Million Relationships

• 39,564,960 Relationships in 10,573.56 seconds (2 hrs 56 mins 13 seconds)
• Average Time for 2000 relationships: 0.53 seconds
• Number of relationships per second: 3,741
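
The throughput figures on the two relationship slides follow directly from the totals. The short check below recomputes them from the reported counts and times; the helper name `throughput` is made up for this example, and the batch size of 2000 comes from the test setup.

```python
def throughput(relationships, seconds, batch_size=2000):
    """Return (relationships per second, seconds per batch)."""
    per_second = relationships / seconds
    return per_second, batch_size / per_second


runs = [
    ("random test graph", 174_000, 47.16),
    ("39 million file", 39_564_960, 10_573.56),
]
for label, count, seconds in runs:
    rate, batch_time = throughput(count, seconds)
    # int() truncates, matching how the slides report the rate.
    print(f"{label}: {int(rate)} rel/s, {batch_time:.2f} s per 2000")
```

Both runs come out at roughly 3,700 relationships per second, so the per-batch time stays near half a second even as the graph grows from 174,000 to 39 million relationships.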

Graph Visualized in Neo4j 2.0

Future Work

• Test performance over the network using Amazon EC2 servers to mimic a real-world setup
• Make the single-threaded application multi-threaded to see if performance improves
– More complex algorithm to batch relationships together
– Analyze whether the complexity is worth the performance improvement
• Vary multiple factors:
– Batch size: 1000 to 4000
– Properties (relationship descriptors): 2 to 20
• Dispatcher pattern to provide a single point of distribution of nodes and relationships to threads/transactions

Conclusion

• Deadlocks in general are time-consuming and difficult to detect and prevent
• Graph coloring is used to partition the graph into conflicting and non-conflicting edges
• Successful prototype tests show significant improvement in building relationships, from small numbers to very large numbers

Dr. Mahesh Chaudhari
Sr. Software Engineer
+1 602 524 [email protected]

[email protected]

Contact Information

Zephyr Health Inc.

589 Howard St. 3rd Flr.
San Francisco, California 94105
+1.415.529.7649

zephyrhealthinc.com

Sven Junkergård
Director of Technology
+1 415 503 [email protected]

Brian Roy
Director of Platform Engineering & Architect
+1 415 663 [email protected]