A Generate-Test-Aggregate Parallel Programming Library
Yu Liu (1), Kento Emoto (2), Zhenjiang Hu (3)
(1) The Graduate University for Advanced Studies, (2) The University of Tokyo, (3) National Institute of Informatics
PPoPP PMAM 2013
Systematic Parallel Programming for MapReduce
Outline
Introduction to GTA
The GTA library
Implementation strategy
Programming interface
Automatic parallelization and optimization
Applications and evaluations
Conclusions
The GTA Programming Methodology
Simple programming pattern
1. Generate all possible solution candidates;
2. Test and filter candidates;
3. Aggregate the valid candidates.
Expressive and code efficient
Covers a large class of problems
Automatic optimization and parallelization
Kento Emoto et al. [ESOP’12]
An Example: The Knapsack Problem
Writing a parallel (MapReduce) program for the knapsack problem is not easy.
input: [ (1 $, 2 Kg), (2 $, 6 Kg), (3 $, 10 Kg) ]
weight limit = 15 Kg
generate:
[ [ ], [ (1 $, 2 Kg) ], [ (2 $, 6 Kg) ], [ (3 $, 10 Kg) ], [ (1 $, 2 Kg), (2 $, 6 Kg) ], [ (1 $, 2 Kg), (3 $, 10 Kg) ], [ (2 $, 6 Kg), (3 $, 10 Kg) ], [ (1 $, 2 Kg), (2 $, 6 Kg), (3 $, 10 Kg) ] ]
test: [true, true, true, true, true, true, false, false]
filter: [ [ ], [ (1 $, 2 Kg) ], [ (2 $, 6 Kg) ], [ (3 $, 10 Kg) ],
[ (1 $, 2 Kg), (2 $, 6 Kg) ], [ (1 $, 2 Kg), (3 $, 10 Kg) ] ]
aggregate (maximum of the total values 0 $, 1 $, 2 $, 3 $, 3 $, 4 $): 4 $
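The three steps of this example can be written down directly. The following is a minimal sketch in plain Scala, not code from the GTA library; the names `NaiveKnapsack`, `sublists`, `withinLimit`, and `maxValue` are made up for illustration. It materializes every candidate, which is exactly what makes the naive version expensive.

```scala
object NaiveKnapsack {
  // An item is a (value in $, weight in kg) pair, as in the example above.
  type Item = (Int, Int)

  // Generate: all 2^n sublists of the input.
  def sublists(items: List[Item]): List[List[Item]] = items match {
    case Nil => List(Nil)
    case x :: xs =>
      val rest = sublists(xs)
      rest ++ rest.map(x :: _)
  }

  // Test: the total weight of a candidate must not exceed the limit.
  def withinLimit(limit: Int, candidate: List[Item]): Boolean =
    candidate.map(_._2).sum <= limit

  // Aggregate: the maximum total value among the remaining candidates.
  def maxValue(candidates: List[List[Item]]): Int =
    candidates.map(c => c.map(_._1).sum).max

  def solve(items: List[Item], limit: Int): Int =
    maxValue(sublists(items).filter(c => withinLimit(limit, c)))

  def main(args: Array[String]): Unit = {
    val items = List((1, 2), (2, 6), (3, 10))
    println(solve(items, 15)) // prints 4
  }
}
```

For the three items above, `sublists` returns 8 candidates, 6 of them pass the test, and the aggregate is 4.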
A naive implementation of Knapsack is inefficient (O(2^n)).
Input (length)    Time (ms)
8                 30
12                86
16                97
20                2829
24                java.lang.OutOfMemoryError: Java heap space
Performance of the naïve Knapsack program
The GTA fusion theorem is introduced to resolve this efficiency problem.
GTA Fusion
[Diagram: the generator, predicates, and aggregator are fused into a mapReduceable, which runs on MapReduce as map ( mapReduceable.f ) . reduce ( mapReduceable.combine )]
Definitions of G, T, A
Class Name    Algebraic Structure
Generator     polymorphic semiring generator
Predicate     almost list homomorphism
Aggregator    semiring homomorphism
Ref: K. Emoto [ESOP’12]
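To make the table more concrete, here is a small sketch of a generator that is polymorphic over a semiring. It is an illustration under assumptions, not the library's definitions: the trait `Semiring`, the function `sublists`, and both instances are hypothetical names. Instantiated with a semiring of candidate lists (read as bags, so the order of candidates is irrelevant), the generator enumerates all sublists. Instantiated with the max-plus semiring, the same generator computes the aggregate directly, without building any candidate.

```scala
object SemiringSketch {
  type Item = (Int, Int) // (value in $, weight in kg)

  // A semiring: plus with identity zero, times with identity one.
  trait Semiring[S] {
    def zero: S
    def one: S
    def plus(a: S, b: S): S
    def times(a: S, b: S): S
  }

  // Hypothetical polymorphic generator of sublists: each element is either
  // kept (embed(x)) or dropped (one), and the choices are multiplied together.
  def sublists[A, S](xs: List[A], s: Semiring[S])(embed: A => S): S =
    xs.foldLeft(s.one)((acc, x) => s.times(acc, s.plus(embed(x), s.one)))

  // Instance 1: collections of candidates (plus = union, times = pairwise concatenation).
  def candidates[A]: Semiring[List[List[A]]] = new Semiring[List[List[A]]] {
    def zero: List[List[A]] = Nil
    def one: List[List[A]] = List(Nil)
    def plus(a: List[List[A]], b: List[List[A]]): List[List[A]] = a ++ b
    def times(a: List[List[A]], b: List[List[A]]): List[List[A]] =
      for (x <- a; y <- b) yield x ++ y
  }

  // Instance 2: max-plus (plus = max, times = +).
  val maxPlus: Semiring[Double] = new Semiring[Double] {
    def zero: Double = Double.NegativeInfinity
    def one: Double = 0.0
    def plus(a: Double, b: Double): Double = math.max(a, b)
    def times(a: Double, b: Double): Double = a + b
  }

  // An aggregator written as a map from candidate collections into max-plus.
  def maxTotalValue(cs: List[List[Item]]): Double =
    cs.foldLeft(maxPlus.zero)((best, c) => maxPlus.plus(best, c.map(_._1.toDouble).sum))

  def main(args: Array[String]): Unit = {
    val items = List((1, 2), (2, 6), (3, 10))
    val all = sublists(items, candidates[Item])(x => List(List(x)))
    val fused = sublists(items, maxPlus)(x => x._1.toDouble)
    println(all.length)                  // prints 8
    println(maxTotalValue(all) == fused) // prints true
  }
}
```

The last line of `main` checks that aggregating the generated candidates gives the same result as running the generator in the aggregator's semiring, which is the property that the fusion relies on. The test step is left out of this sketch.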
Main Contributions
The implementation of a GTA library
A simple and statically typed GTA-DSL is implemented
Algebraic structures and computations/transformations of them are implemented
Evaluation of GTA methodology
Object-oriented Functional Style
We defined the basic algebraic structures.
Relations/transformations of the algebras are well typed
G‧T‧A Programming DSL
Users write GTA expressions of the form: generate(g: GEN) filter(t: Predicate)* aggregate(a: Aggregator)
GEN, Aggregator, and Predicate are Scala traits defined in the GTA library.
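The shape of such an expression can be imitated with a few lines of plain Scala. The sketch below is a toy model only: `Gen`, `Pred`, `Agg`, and `Expr` are hypothetical stand-ins and are not the library's GEN, Predicate, and Aggregator traits, and the expression is evaluated naively rather than fused.

```scala
object ToyGtaDsl {
  type Item = (Int, Int) // (value in $, weight in kg)

  // Hypothetical stand-ins for the three kinds of building blocks.
  trait Gen[A]    { def candidates(input: List[A]): List[List[A]] }
  trait Pred[A]   { def holds(candidate: List[A]): Boolean }
  trait Agg[A, R] { def result(candidates: List[List[A]]): R }

  // A GTA expression under construction: one generator, zero or more predicates.
  final class Expr[A](g: Gen[A], ps: List[Pred[A]]) {
    def filter(p: Pred[A]): Expr[A] = new Expr(g, ps :+ p)
    // Naive evaluation: generate everything, keep what passes all predicates, aggregate.
    def aggregate[R](a: Agg[A, R]): List[A] => R =
      input => a.result(g.candidates(input).filter(c => ps.forall(_.holds(c))))
  }

  def generate[A](g: Gen[A]): Expr[A] = new Expr(g, Nil)

  def allSublists[A]: Gen[A] = new Gen[A] {
    def candidates(input: List[A]): List[List[A]] =
      input.foldRight(List(List.empty[A]))((x, acc) => acc ++ acc.map(x :: _))
  }

  def weightLimit(limit: Int): Pred[Item] = new Pred[Item] {
    def holds(c: List[Item]): Boolean = c.map(_._2).sum <= limit
  }

  val maxTotalValue: Agg[Item, Int] = new Agg[Item, Int] {
    def result(cs: List[List[Item]]): Int = cs.map(c => c.map(_._1).sum).max
  }

  // The knapsack program from the earlier example, written as one expression.
  val knapsack: List[Item] => Int =
    generate(allSublists[Item]).filter(weightLimit(15)).aggregate(maxTotalValue)
}
```

Applying `ToyGtaDsl.knapsack` to `List((1, 2), (2, 6), (3, 10))` gives 4, and further `filter` calls can be chained before `aggregate`.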
GTA-fusion
G + T + A → MapReduceable[f, ⊕]
[Diagram: MAP applies f to each input element x1, x2, x3, …, xn, producing table1, …, tablen; REDUCE combines the tables with ⊕] [EuroPar’11]
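For the knapsack example, the fused form can be sketched as follows. This illustrates the idea under assumptions and is not the code the library derives: the table representation (a map from total weight to best total value) and the names `single` and `combine` are hypothetical, with `single` playing the role of f and `combine` the role of ⊕.

```scala
object FusedKnapsack {
  type Item  = (Int, Int)    // (value in $, weight in kg)
  type Table = Map[Int, Int] // total weight -> best total value for that weight

  // f: the table for a single item (drop it, or keep it if it fits the limit).
  def single(limit: Int, item: Item): Table = {
    val (value, weight) = item
    val dropped: Table = Map(0 -> 0)
    if (weight > limit) dropped
    else dropped.updated(weight, math.max(value, dropped.getOrElse(weight, Int.MinValue)))
  }

  // ⊕: merge two tables, keeping the best value per total weight within the limit.
  def combine(limit: Int, a: Table, b: Table): Table = {
    val sums = for {
      (w1, v1) <- a.toSeq
      (w2, v2) <- b.toSeq
      if w1 + w2 <= limit
    } yield (w1 + w2, v1 + v2)
    sums.groupBy(_._1).map { case (w, vs) => w -> vs.map(_._2).max }
  }

  // map(f) followed by reduce(⊕); the answer is the best value in the final table.
  def solve(items: List[Item], limit: Int): Int =
    items
      .map(item => single(limit, item))
      .reduceOption((a, b) => combine(limit, a, b))
      .getOrElse(Map(0 -> 0))
      .values
      .max

  def main(args: Array[String]): Unit =
    println(solve(List((1, 2), (2, 6), (3, 10)), 15)) // prints 4
}
```

Each table has at most limit + 1 entries, so no list of 2^n candidates is ever built. Because `combine` is associative, the tables can be merged in any grouping, which is what allows the reduce step to run in parallel.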
Implementation of GTA Fusion/Optimization
The main difficulties:
How to define a polymorphic generator
How to define a predicate for test
How to define intermediate data structures and other algebraic structures
More Examples
More examples in the paper and source package:
Extended Knapsack problems
The maximum-segments-sum problem
Finding the most probable sequence (Viterbi algorithm)
More information at: https://bitbucket.org/inii/gtalib
G‧T‧A Building Blocks
Our library provides commonly used G‧T‧A building blocks; users can also implement their own generators, testers, and aggregators.
Performance Evaluations
Evaluations on EdubaseCluster (Cloud)
- Up to 32 VM nodes, each with 3 GB RAM and one single-core CPU
- Executed on Spark, an in-memory MapReduce-style cluster computing framework
Execution Time (Knapsack)
Time (seconds) by number of VM nodes, reconstructed from the chart's data labels:
Number of VM nodes    1.00E+07 items    1.00E+08 items
4                     203.63            1727.973
8                     92.83             679.305
12                    64.64             637.33
16                    47.76             471.2
20                    37.06             362.36
24                    29.78             287.08
28                    25.17             234.25
32                    23.25             223.44
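The scaling behind these execution times can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the slide does not state: the smaller series of times belongs to the 1.00E+07-item input, and speedup is taken relative to the 4-node run. The object and function names are made up for illustration.

```scala
object SpeedupFromTimes {
  val nodes: List[Int] = List(4, 8, 12, 16, 20, 24, 28, 32)

  // Execution times in seconds, copied from the chart's data labels.
  val times1e7: List[Double] = List(203.63, 92.83, 64.64, 47.76, 37.06, 29.78, 25.17, 23.25)
  val times1e8: List[Double] = List(1727.973, 679.305, 637.33, 471.2, 362.36, 287.08, 234.25, 223.44)

  // Speedup of every run relative to the first (4-node) run.
  def relativeSpeedup(times: List[Double]): List[Double] =
    times.map(t => times.head / t)

  def main(args: Array[String]): Unit = {
    val rows = nodes.zip(relativeSpeedup(times1e7)).zip(relativeSpeedup(times1e8))
    for (((n, s7), s8) <- rows)
      println(f"$n%2d nodes: $s7%.2f (1.00E+07 items), $s8%.2f (1.00E+08 items)")
  }
}
```

Under these assumptions, going from 4 to 32 nodes (8 times as many) gives a relative speedup of about 8.76 for the smaller input and about 7.73 for the larger one.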
Linear Speedup
[Figure: speedup versus number of VM nodes (4 to 32) for Knapsack, ViterbiAlg, and MSS]
Conclusions
We show GTA can be efficiently implemented
GTA-DSL can simplify parallel programming
Simple programming model
Good code efficiency
GTA-DSL is architecture independent
Future Work
Enrich the library with more building blocks in terms of G, T, A
GTA-DSL can be extended to process more complex data structures such as trees and graphs