Upload
ashlynn-powers
View
228
Download
2
Embed Size (px)
Citation preview
Scala Parallel CollectionsAleksandar Prokopec, Tiark Rompf
Scala TeamEPFL
Introduction
• multi-core programming – not straightforward
• need better higher order abstractions
• libraries and tools have only begun using these new capabilites
• collections - everywhere
Scala Collection Framework
• most operations implemented in terms of an abstract method
def foreach[U](f: T => U): Unit
• new collections are created using builders
trait Builder[Elem, To]
Example
• the filter method:
def filter(p: A => Boolean): Repr = {
val b = newBuilder
for (x <- this) if (p(x)) b += x
b.result
}
List(1, 2, 3, 4, 5, 6, 7).filter(_ % 2 == 0)
1 2 3 4 5 6 7 Nil2 4 6
Nil
Builder
Parallel operations
• parallel traversal should be easy for some data structures
• could filter be parallelized by having a concurrent builder?
• 3 problems:– order may not be preserved anymore – sequences?– performance concerns– there are more complicated methods such as span
11-1 -599 6311 423 217
Method span
7 3 11 99 99 21 42 33-1 19 22 63 -5 11 -2 -7 1
prefixElems suffixElems
um... not a good idea
• assume an array (keep it simple) array.span(_ >= 0)
Method reduce
• span seems inherently sequential• we’ll get back to this, let’s try
something simpler – reduce
def reduce[U >: T](op: (U, U) => U): U
• takes an associative operator and applies it between all the elements (examples: adding, concatenation)
family to use Scala.Tell your friends and
Scala.and usefriends toyour familyTell
Method reduce
• assume associative operator is concatenation
val s = “Tell your friends and family to use Scala.”
s.split(“ ”).toArray.reduce(_ + _)
Tell your friends and family to use Scala.
+
Method reduce
• we might have more processors
• this is a well known pattern from parallel programming
• but, we need a right abstraction
1 2 3 4 5 6 7 8
3 7 11 15
10 26
36
+ + +
+
+
+
+
Method split
• we can implement methods such as reduce, foreach, count, find and forall assuming we can divide the collection
• new abstract operation
def split: Seq[Repr]
• returns a non-trivial partition of the collection
Method split
def split: Seq[Repr]
• how to implement?
– copy elements– produce a wrapper– use data structure properties (e.g. tree)
Method filter
• this abstract method can be used to implement accessor methods
• for transformer methods such as filter this is not sufficient – collection results should be merged
1, 2, 3, 4 5, 6, 7, 82, 4 6, 8
2, 4, 6, 8
3, 1, 8, 0 2, 2, 1, 98, 0 2, 2
8, 0, 2, 2
2, 4, 6, 8, 8, 0, 2, 2
Method combine
• we need another abstraction
def combine[Other >: Repr]
(that: Other): Other
• creates a collection that contains all the elements of this collection and that collection
Method combine
def combine[Other >: Repr]
(that: Other): Other
• how to implement?– copy elements– use lazy evaluation to copy twice– use specialized data structures
Lazy collection evaluation
• merge occurs more than once• each processor adds results to its
own builder• evaluation occurs in the root
1, 2, 3, 4 5, 6, 7, 82, 4 6, 8 3, 1, 8, 0 2, 2, 1, 98, 0 2, 2
merge merge
mergecopy
allocate
2 4 6 8 8 0 2 2
Lazy collection evaluation
• advantages:– easier to apply to existing collections– for certain data structures copying is
cheap (arrays)– merging is very cheap
• disadvantages:– copying occurs twice – affects cheap
operations– garbage collection occurs more often
Specialized data structures
• some data structures such can be merged efficiently (trees, heaps, skiplists…)
• immutable vectors – immutable sequences with efficient splitting and concatenation
Method span
• each processors keeps 2 builders• merge has 2 cases
– counterexample in the left partition– no counterexample in the left partition
3 9 -1 2 4 -5 7 3 2 4 -7 2
39 -1
24
-5
24
-5-1
73
22
-74
24
73
1 2 3 . . . 750 751 752 753 754 755
Load balancing
• processor availability and data processing cost may not be uniform
• fine grained division – more tasks than processors
Done!
Work-stealing
• need to schedule tasks to processors – work stealing
• each processor has a task queue• when it runs out of tasks – it steals
from other queues
proc 1 proc 2
steal!
Adaptive work-stealing
• still, a large number of tasks can lead to an overhead
adaptive partitioning
Adaptive work-stealing
• ensures better load balancing
proc 1 proc 2
steal!
Package hierarchy
• subpackage of collection package
collection
mutable immutable parallel
mutable immutable
Class hierarchy
• consistent with existing collections• clients can refer to parallel
collections transparently
Iterable
Map Seq Set ParallelIterable
ParallelMap ParallelSeq ParallelSet
How to use
• be aware of side-effects
var k = 0
array.foreach(k += _)
• parallel collections are not concurrent collections
• careful with small collections – cost of setup may be higher
How to use
• parallel ranges – a way to parallelize for-loops
for (i <- (0 until 1000).par) yield {
var num = i
var lst: List[Int] = Nil
while (num > 0) {
lst ::= num % 2
num = num / 2
}
lst
}
Benchmarks
• microbenchmarks with low cost per-element operations
foreach 1 2 4 6 8
Sequential 1227 1227 1227 1227 1227
ParallelArray 1180 797 529 449 421
Extra166 1195 757 544 442 403
reduce 1 2 4 6 8
Sequential 949 949 949 949 949
ParallelArray 832 551 375 328 297
Extra166 890 566 363 300 282
Benchmarks
• microbenchmarks with low cost per-element operations
filter 1 2 4 6 8
Sequential 611 611 611 611 611
ParallelArray 476 333 235 216 208
Extra166 581 372 296 280 264
find 1 2 4 6 8
Sequential 1181 1181 1181 1181 1181
ParallelArray 961 608 410 331 300
Extra166 841 602 393 309 294
Current state
• an array - ParallelArray• ranges - ParallelRange• views - ParallelView• working on – ParallelVector and ParallelHashMap
Conclusion
• good performance results• nice integration with existing collections• more parallel collections worked on• will be integrated into Scala 2.8.1