Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group
Sharded Joins for Scalable Incremental Graph Queries
János Maginecz, Gábor Szárnyas
Agile Model-Driven Development
Modeling
Code generation
Testing
Early validations
Transformations
Scalability challenges
Performance issues
MDD
Scalability
Incrementality
Incremental queries
Incremental transformation
Storing partial results
Tracking changes
Motivating Example
Error pattern for an AUTOSAR validation constraint
Communication channel
Logical signal, Mapping, Physical signal
Validation
Invalid submodel
Valid submodel
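The slide names the element types but not the exact AUTOSAR rule. As a hedged illustration, assume the error pattern matches a logical signal whose Mapping points to a physical signal that no communication channel contains. A batch (non-incremental) evaluation of that assumed pattern needs two joins and one antijoin, the same node types that appear in the Rete net. Edges are modelled as two-element string lists, and `ErrorPattern.violations` is a hypothetical name.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical evaluation of an assumed error pattern: a logical signal mapped
 * to a physical signal that is not contained in any communication channel.
 */
final class ErrorPattern {
    static Set<String> violations(Set<String> logicalSignals,
                                  Set<String> physicalSignals,
                                  List<List<String>> mappings,         // [logicalSignal, physicalSignal]
                                  List<List<String>> channelSignals) { // [channel, physicalSignal]
        // Right input of the antijoin: physical signals that belong to some channel.
        Set<String> inChannel = channelSignals.stream()
                .map(e -> e.get(1))
                .collect(Collectors.toSet());
        return mappings.stream()
                .filter(m -> logicalSignals.contains(m.get(0)))  // join with LogicalSignal
                .filter(m -> physicalSignals.contains(m.get(1))) // join with PhysicalSignal
                .filter(m -> !inChannel.contains(m.get(1)))      // antijoin: no channel contains it
                .map(m -> m.get(0))
                .collect(Collectors.toCollection(TreeSet::new));
    }
}

class ErrorPatternDemo {
    public static void main(String[] args) {
        Set<String> logical = Set.of("ls1", "ls2");
        Set<String> physical = Set.of("ps1", "ps2");
        List<List<String>> mappings = List.of(List.of("ls1", "ps1"), List.of("ls2", "ps2"));
        List<List<String>> channel = List.of(List.of("can0", "ps1"));
        System.out.println(ErrorPattern.violations(logical, physical, mappings, channel)); // prints [ls2]
    }
}
```

Here `ls2` is the invalid submodel: its physical signal `ps2` is in no channel, while `ls1` forms a valid submodel.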
Rete Algorithm
Communication channel
Logical signal, Mapping, Physical signal
Rete nodes: Join, Join, Antijoin
o Fill indexer nodes
o Store interim results
o Read result set
o Edit model
o Propagate changes
o Read result set
Result set
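The steps above (fill indexer nodes, store interim results, read the result set, edit the model, propagate changes) can be sketched for a single join node. This is a minimal Java illustration, not EMF-IncQuery's implementation: `JoinNode`, its per-input memories and the `resultSet` list are hypothetical, and only insertions are handled (a deletion would remove the tuple from its memory and retract the matching joined tuples in the same way).

```java
import java.util.*;

/**
 * Hypothetical sketch of one Rete join node (insertions only). Each input keeps
 * a memory indexed by its join column: these memories are the stored interim results.
 */
final class JoinNode {
    private final int leftKey, rightKey; // join column on the left and right input
    private final Map<String, List<List<String>>> leftMemory = new HashMap<>();
    private final Map<String, List<List<String>>> rightMemory = new HashMap<>();
    final List<List<String>> resultSet = new ArrayList<>(); // stands in for the production node

    JoinNode(int leftKey, int rightKey) {
        this.leftKey = leftKey;
        this.rightKey = rightKey;
    }

    /** A tuple arrives on the left input: store it, then join it only with matching right tuples. */
    void insertLeft(List<String> t) {
        String key = t.get(leftKey);
        leftMemory.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        for (List<String> r : rightMemory.getOrDefault(key, List.of())) emit(t, r);
    }

    /** Symmetric case for the right input. */
    void insertRight(List<String> t) {
        String key = t.get(rightKey);
        rightMemory.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        for (List<String> l : leftMemory.getOrDefault(key, List.of())) emit(l, t);
    }

    /** Concatenates the two tuples and propagates the new match downstream. */
    private void emit(List<String> l, List<String> r) {
        List<String> joined = new ArrayList<>(l);
        joined.addAll(r);
        resultSet.add(joined);
    }
}

class JoinNodeDemo {
    public static void main(String[] args) {
        // Hypothetical inputs: Mapping(logical, physical) joined with
        // ChannelMember(physical, channel) on the physical signal.
        JoinNode join = new JoinNode(1, 0);
        join.insertLeft(List.of("ls1", "ps1"));   // fill indexer nodes
        join.insertRight(List.of("ps1", "can0"));
        System.out.println(join.resultSet);       // [[ls1, ps1, ps1, can0]]
        join.insertLeft(List.of("ls2", "ps1"));   // edit model: one new mapping
        System.out.println(join.resultSet);       // [[ls1, ps1, ps1, can0], [ls2, ps1, ps1, can0]]
    }
}
```

The model edit is joined only against the opposite memory, so reading the result set afterwards costs nothing extra; the price is the memory held by the node, which is what limits a single JVM.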
Single Workstation Rete Implementation
Rete-based incremental graph query engine
Open-source Eclipse project
Java Virtual Machine limitations
o Cannot efficiently handle heaps of 15 GB or more
Proposed solution
o Horizontal scaling: distributed system
IncQuery-D Architecture
EMF-INCQUERY
Transaction
Rete net: production network
• Stores intermediate query results
• Propagates changes
Indexer layer: indexing
In-memory EMF model: in-memory storage
IncQuery-D Architecture
INCQUERY-D
Transaction
Server 0 to Server 3, each hosting one database shard (Database shard 0 to 3)
Rete net: distributed query evaluation network
Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication
Indexer layer: distributed indexer, model access adapter
• Distributed indexing, notification
Storage: distributed persistent storage
Query evaluation network nodes: Indexer ×4, Join ×2, Antijoin
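In the distributed production network, every Rete edge whose two endpoints sit on different hosts becomes remote internode communication. A small hypothetical helper can count those edges for a given allocation. The wiring of the four indexers, two joins and the antijoin in the demo is an assumption, since the slide only lists the node types.

```java
import java.util.*;

/**
 * Hypothetical helper: counts the Rete edges whose source and target nodes are
 * allocated to different hosts. Each such edge carries its messages over the network.
 */
final class RemoteEdges {
    static long count(List<List<String>> edges, Map<String, String> nodeToHost) {
        return edges.stream()
                .filter(e -> !nodeToHost.get(e.get(0)).equals(nodeToHost.get(e.get(1))))
                .count();
    }
}

class RemoteEdgesDemo {
    public static void main(String[] args) {
        // Assumed wiring: I1, I2 -> J1; J1, I3 -> J2; J2, I4 -> AJ (antijoin).
        List<List<String>> edges = List.of(
                List.of("I1", "J1"), List.of("I2", "J1"), List.of("J1", "J2"),
                List.of("I3", "J2"), List.of("J2", "AJ"), List.of("I4", "AJ"));
        Map<String, String> allocation = Map.of(
                "I1", "server1", "I2", "server1", "J1", "server1",
                "I3", "server2", "J2", "server2",
                "I4", "server3", "AJ", "server3");
        System.out.println(RemoteEdges.count(edges, allocation)); // prints 2
    }
}
```

Keeping each join next to its indexers leaves only the J1 to J2 and J2 to AJ edges remote in this assumed allocation.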
Working around Memory Limits
Solution 1
Local (EMF-IncQuery): Input, Node A, Node B and Output all run on Host-1
+ Simple and efficient
− The memory of the machine is an upper bound for the network
Distributed (IncQuery-D): the nodes are spread over Host-1 and Host-2
+ Nodes run on different computers
− The memory of each node is limited to that of its assigned machine
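Solution 1's trade-off amounts to a memory-capacity check: locally, every node must fit into one machine; distributed, each host only needs room for the nodes allocated to it. A minimal sketch, assuming per-node memory estimates and host capacities in GB (the `Allocation.fits` helper and all numbers are hypothetical):

```java
import java.util.*;

final class Allocation {
    /** True if, for every host, the memory of the nodes assigned to it fits its capacity (GB). */
    static boolean fits(Map<String, Long> nodeMemory,
                        Map<String, String> nodeToHost,
                        Map<String, Long> hostCapacity) {
        Map<String, Long> used = new HashMap<>(); // host -> total memory of its nodes
        nodeToHost.forEach((node, host) -> used.merge(host, nodeMemory.get(node), Long::sum));
        return used.entrySet().stream()
                .allMatch(e -> e.getValue() <= hostCapacity.get(e.getKey()));
    }
}

class AllocationDemo {
    public static void main(String[] args) {
        Map<String, Long> nodeMemory = Map.of("NodeA", 10L, "NodeB", 10L);
        Map<String, Long> capacity = Map.of("Host-1", 15L, "Host-2", 15L);
        // Local: both nodes on Host-1 need 20 GB > 15 GB.
        System.out.println(Allocation.fits(nodeMemory, Map.of("NodeA", "Host-1", "NodeB", "Host-1"), capacity)); // false
        // Distributed: each host holds one 10 GB node.
        System.out.println(Allocation.fits(nodeMemory, Map.of("NodeA", "Host-1", "NodeB", "Host-2"), capacity)); // true
    }
}
```

A single node larger than every host still cannot be placed by either allocation, which is the limitation Solution 2 targets.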
Working around Memory Limits
Solution 2
Distributed + Sharded (IncQuery-DS): Input, Node A, Node B and Output are spread over Host-1, Host-2 and Host-3
+ Nodes may be allocated on more than one computer
− Network overhead
Distributed (IncQuery-D): the nodes are spread over Host-1 and Host-2
+ Nodes run on different computers
− The memory of each node is limited to that of its assigned machine
Sharded Rete Algorithm
Communication channel
Logical signal, Mapping, Physical signal
Rete nodes: Join, Antijoin, Join / Shard 1, Join / Shard 2
IncQuery-DS: Distributed and Sharded
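Splitting a join into Join / Shard 1 and Join / Shard 2 is safe when both inputs are partitioned with the same function of the join key: matching tuples then always meet in the same shard, so the union of the per-shard results equals the unsharded join. The sketch below assumes hash partitioning with `Math.floorMod(key.hashCode(), shards)` and [key, value] tuples; IncQuery-DS's actual partitioning scheme is not stated on the slides, and all names are hypothetical.

```java
import java.util.*;

final class ShardedJoin {
    /** Assumed partitioning function: equal keys always map to the same shard. */
    static int shardOf(String key, int shards) {
        return Math.floorMod(key.hashCode(), shards);
    }

    /** Local join of [key, value] tuples on column 0 (nested loops for brevity). */
    static Set<List<String>> localJoin(List<List<String>> left, List<List<String>> right) {
        Set<List<String>> out = new HashSet<>();
        for (List<String> l : left)
            for (List<String> r : right)
                if (l.get(0).equals(r.get(0))) out.add(List.of(l.get(0), l.get(1), r.get(1)));
        return out;
    }

    /** Routes each tuple to its shard by join key. */
    static List<List<List<String>>> partition(List<List<String>> tuples, int shards) {
        List<List<List<String>>> parts = new ArrayList<>();
        for (int s = 0; s < shards; s++) parts.add(new ArrayList<>());
        for (List<String> t : tuples) parts.get(shardOf(t.get(0), shards)).add(t);
        return parts;
    }

    /** Joins each shard independently (on its own host, in a real system) and unions the outputs. */
    static Set<List<String>> shardedJoin(List<List<String>> left, List<List<String>> right, int shards) {
        List<List<List<String>>> ls = partition(left, shards), rs = partition(right, shards);
        Set<List<String>> out = new HashSet<>();
        for (int s = 0; s < shards; s++) out.addAll(localJoin(ls.get(s), rs.get(s)));
        return out;
    }
}

class ShardedJoinDemo {
    public static void main(String[] args) {
        List<List<String>> mappings = List.of(List.of("ps1", "ls1"), List.of("ps2", "ls2"));
        List<List<String>> channels = List.of(List.of("ps1", "can0"), List.of("ps3", "can1"));
        System.out.println(ShardedJoin.shardedJoin(mappings, channels, 2)); // [[ps1, ls1, can0]]
        System.out.println(ShardedJoin.localJoin(mappings, channels));      // same result
    }
}
```

Each shard holds only its share of the node's memory, which is what lets one logical node exceed a single host; the cost is routing every incoming tuple to the right shard over the network.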
Validation of Critical Systems
Model validation for large models
Well-formedness constraints with complex graph patterns
Train Benchmark
o Open-source performance measurement framework
o Presented yesterday
Train Benchmark
Phases
o Initial read and validation
o Small changes and revalidation
• Simulating modifications from a user
Goal: Measure response times
Figure: execution time of the Read, Validation, Transformation and Revalidation phases (annotated × 10 and × 3)
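The measurement procedure can be sketched as a timing harness: read and validation run once, then transformation and revalidation are repeated to simulate user modifications. The phase names follow the slide; the `BenchmarkHarness` class, the use of `Runnable` for each phase and the repetition count are assumptions, since the extracted figure does not explain its × 10 and × 3 annotations.

```java
import java.util.*;

/**
 * Hypothetical timing harness: read and validation run once, then a
 * transformation followed by revalidation is repeated `iterations` times.
 */
final class BenchmarkHarness {
    static Map<String, List<Long>> run(Runnable read, Runnable validate,
                                       Runnable transform, int iterations) {
        Map<String, List<Long>> times = new LinkedHashMap<>(); // phase -> elapsed nanoseconds
        time(times, "read", read);
        time(times, "validation", validate);
        for (int i = 0; i < iterations; i++) {
            time(times, "transformation", transform);
            time(times, "revalidation", validate);
        }
        return times;
    }

    private static void time(Map<String, List<Long>> times, String phase, Runnable step) {
        long start = System.nanoTime();
        step.run();
        times.computeIfAbsent(phase, k -> new ArrayList<>()).add(System.nanoTime() - start);
    }
}

class BenchmarkDemo {
    public static void main(String[] args) {
        Runnable noop = () -> { }; // placeholders for the real model operations
        Map<String, List<Long>> t = BenchmarkHarness.run(noop, noop, noop, 10);
        t.forEach((phase, ns) -> System.out.println(phase + ": " + ns.size() + " measurement(s)"));
    }
}
```

For an incremental engine, the revalidation entries are the response times of interest, because they show how cheaply results are updated after each small change.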
Join Optimization
Hash join
o Using hash maps
Sort merge join
o Using red-black trees
Collection frameworks
o Standard library in Scala
o Goldman Sachs Collections
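The two join strategies on this slide can be contrasted in a short sketch. It uses only `java.util`, swapped in for the Scala standard library and Goldman Sachs Collections that the talk evaluated: `HashMap` for the hash join, and `TreeMap`, which the JDK documents as a red-black tree, for the sort-merge join. Tuples are assumed to be [key, value] pairs, and the class and method names are hypothetical.

```java
import java.util.*;

final class Joins {
    /** Hash join: build a hash index on the left input, probe it with the right input. */
    static List<List<String>> hashJoin(List<List<String>> left, List<List<String>> right) {
        Map<String, List<String>> index = new HashMap<>();
        for (List<String> l : left) index.computeIfAbsent(l.get(0), k -> new ArrayList<>()).add(l.get(1));
        List<List<String>> out = new ArrayList<>();
        for (List<String> r : right)
            for (String lv : index.getOrDefault(r.get(0), List.of()))
                out.add(List.of(r.get(0), lv, r.get(1)));
        return out;
    }

    /** Sort-merge join: keep both inputs sorted in red-black trees, then walk them in key order. */
    static List<List<String>> sortMergeJoin(List<List<String>> left, List<List<String>> right) {
        Iterator<Map.Entry<String, List<String>>> li = group(left).entrySet().iterator();
        Iterator<Map.Entry<String, List<String>>> ri = group(right).entrySet().iterator();
        List<List<String>> out = new ArrayList<>();
        if (!li.hasNext() || !ri.hasNext()) return out;
        Map.Entry<String, List<String>> a = li.next(), b = ri.next();
        while (true) {
            int c = a.getKey().compareTo(b.getKey());
            if (c == 0) { // equal keys: emit the cross product of the two groups
                for (String lv : a.getValue())
                    for (String rv : b.getValue()) out.add(List.of(a.getKey(), lv, rv));
                if (!li.hasNext() || !ri.hasNext()) break;
                a = li.next();
                b = ri.next();
            } else if (c < 0) { // advance the side with the smaller key
                if (!li.hasNext()) break;
                a = li.next();
            } else {
                if (!ri.hasNext()) break;
                b = ri.next();
            }
        }
        return out;
    }

    /** Groups values by key in a TreeMap, so keys are iterated in sorted order. */
    private static TreeMap<String, List<String>> group(List<List<String>> tuples) {
        TreeMap<String, List<String>> m = new TreeMap<>();
        for (List<String> t : tuples) m.computeIfAbsent(t.get(0), k -> new ArrayList<>()).add(t.get(1));
        return m;
    }
}

class JoinsDemo {
    public static void main(String[] args) {
        List<List<String>> left = List.of(List.of("a", "1"), List.of("b", "2"), List.of("b", "3"));
        List<List<String>> right = List.of(List.of("b", "x"), List.of("c", "y"), List.of("a", "z"));
        System.out.println(Joins.hashJoin(left, right));      // [[b, 2, x], [b, 3, x], [a, 1, z]]
        System.out.println(Joins.sortMergeJoin(left, right)); // [[a, 1, z], [b, 2, x], [b, 3, x]]
    }
}
```

Both produce the same tuples; the hash join follows the probe order, while the sort-merge join yields key order and lets an incremental engine keep its memories sorted instead of hashed.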
Summary
Designed a sharded Rete engine
Evaluated its scalability
Analysis of join algorithms and collection frameworks
Future work
o Domains with similar challenges