Scala Parallel Collections Aleksandar Prokopec, Tiark Rompf Scala Team EPFL


Introduction

• multi-core programming – not straightforward

• need better higher order abstractions

• libraries and tools have only begun using these new capabilities

• collections - everywhere

Scala Collection Framework

• most operations implemented in terms of an abstract method

def foreach[U](f: T => U): Unit

• new collections are created using builders

trait Builder[Elem, To]

Example

• the filter method:

def filter(p: A => Boolean): Repr = {
  val b = newBuilder
  for (x <- this) if (p(x)) b += x
  b.result
}

List(1, 2, 3, 4, 5, 6, 7).filter(_ % 2 == 0)

[Figure: the list 1 2 3 4 5 6 7 Nil is traversed and the Builder accumulates the result 2 4 6 Nil]
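The builder-based filter above can be tried outside the collection framework. The following is a minimal sketch, assuming a standalone helper (the name filterWithBuilder is hypothetical) that uses the standard List.newBuilder in place of the framework's newBuilder:

```scala
import scala.collection.mutable

object BuilderFilterSketch {
  // Traverse the input once and append every matching element to a builder,
  // mirroring the structure of the filter method shown on the slide.
  def filterWithBuilder[A](xs: Iterable[A])(p: A => Boolean): List[A] = {
    val b: mutable.Builder[A, List[A]] = List.newBuilder[A]
    xs.foreach(x => if (p(x)) b += x)
    b.result()
  }
}
```

The builder is the only place where the result collection type is decided, which is why the same traversal code can serve many collection classes.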

Parallel operations

• parallel traversal should be easy for some data structures

• could filter be parallelized by having a concurrent builder?

• 3 problems:
  – order may not be preserved anymore – sequences?
  – performance concerns
  – there are more complicated methods such as span


Method span

• assume an array (keep it simple): array.span(_ >= 0)

[Figure: the array 7 3 11 99 99 21 42 33 -1 19 22 63 -5 11 -2 -7 1 divided into prefixElems and suffixElems; annotation: "um... not a good idea"]

Method reduce

• span seems inherently sequential

• we'll get back to this, let's try something simpler – reduce

def reduce[U >: T](op: (U, U) => U): U

• takes an associative operator and applies it between all the elements (examples: adding, concatenation)

[Figure: the words of "Tell your friends and family to use Scala." joined by an associative operator]

Method reduce

• assume associative operator is concatenation

val s = "Tell your friends and family to use Scala."

s.split(" ").toArray.reduce(_ + _)

[Figure: the words Tell, your, friends, and, family, to, use, Scala. combined with +]

Method reduce

• we might have more processors

• this is a well known pattern from parallel programming

• but we need the right abstraction

[Figure: reduction tree using +, level by level: 1 2 3 4 5 6 7 8 → 3 7 11 15 → 10 26 → 36]
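The reduction tree above can be sketched with standard-library futures. This is only an illustration under stated assumptions, not the library's implementation: the names reduceRange and parReduce and the threshold parameter are hypothetical, and the operator is assumed to be associative.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object TreeReduceSketch {
  // Reduce the index range [start, end) by halving it until a part is small
  // enough to reduce sequentially. Correct only if op is associative.
  def reduceRange[T](xs: IndexedSeq[T], start: Int, end: Int, threshold: Int)(op: (T, T) => T): Future[T] =
    if (end - start <= threshold)
      Future(xs.slice(start, end).reduce(op))
    else {
      val mid = start + (end - start) / 2
      // Both halves are started before either result is awaited.
      val left = reduceRange(xs, start, mid, threshold)(op)
      val right = reduceRange(xs, mid, end, threshold)(op)
      for (l <- left; r <- right) yield op(l, r) // left operand stays on the left
    }

  def parReduce[T](xs: IndexedSeq[T], threshold: Int = 2)(op: (T, T) => T): T = {
    require(xs.nonEmpty, "cannot reduce an empty collection")
    require(threshold >= 1, "threshold must be at least 1")
    Await.result(reduceRange(xs, 0, xs.length, threshold)(op), Duration.Inf)
  }
}
```

Because operands are always combined left before right, a non-commutative but associative operator such as string concatenation still gives the sequential result.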

Method split

• we can implement methods such as reduce, foreach, count, find and forall assuming we can divide the collection

• new abstract operation

def split: Seq[Repr]

• returns a non-trivial partition of the collection

Method split

def split: Seq[Repr]

• how to implement?

  – copy elements
  – produce a wrapper
  – use data structure properties (e.g. tree)
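As an illustration of the "produce a wrapper" option, here is a minimal sketch, assuming a hypothetical ArraySlice class that views an index window of a shared array, so that split allocates two small wrappers and copies no elements:

```scala
object SplitSketch {
  // Hypothetical wrapper: a window [start, end) over a shared backing array.
  final case class ArraySlice[T](data: Array[T], start: Int, end: Int) {
    def length: Int = end - start

    // Non-trivial partition: two halves that share the same backing array.
    // A window with fewer than two elements cannot be divided further.
    def split: Seq[ArraySlice[T]] =
      if (length < 2) Seq(this)
      else {
        val mid = start + length / 2
        Seq(ArraySlice(data, start, mid), ArraySlice(data, mid, end))
      }

    def foreach[U](f: T => U): Unit = {
      var i = start
      while (i < end) { f(data(i)); i += 1 }
    }

    def toList: List[T] = (start until end).map(i => data(i)).toList
  }
}
```

Splitting is constant-time here, which is what makes arrays a convenient first target for parallel traversal.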

Method filter

• this abstract method can be used to implement accessor methods

• for transformer methods such as filter this is not sufficient – collection results should be merged

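[Figure: parallel filter of even numbers. The parts 1, 2, 3, 4 and 5, 6, 7, 8 are filtered to 2, 4 and 6, 8 and merged into 2, 4, 6, 8; the parts 3, 1, 8, 0 and 2, 2, 1, 9 are filtered to 8, 0 and 2, 2 and merged into 8, 0, 2, 2; the final merge yields 2, 4, 6, 8, 8, 0, 2, 2]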

Method combine

• we need another abstraction

def combine[Other >: Repr](that: Other): Other

• creates a collection that contains all the elements of this collection and that collection

Method combine

def combine[Other >: Repr](that: Other): Other

• how to implement?
  – copy elements
  – use lazy evaluation, so that elements are copied only twice
  – use specialized data structures
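Putting split and combine together, a parallel filter might look like the sketch below. It assumes the simplest combine strategy (copying, via Vector concatenation) and standard-library futures; parFilter and threshold are hypothetical names, not part of the library.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SplitCombineFilterSketch {
  def parFilter[T](xs: Vector[T], threshold: Int = 4)(p: T => Boolean): Vector[T] = {
    require(threshold >= 1, "threshold must be at least 1")

    def go(part: Vector[T]): Future[Vector[T]] =
      if (part.length <= threshold) Future(part.filter(p)) // sequential leaf
      else {
        val (l, r) = part.splitAt(part.length / 2) // split
        val lf = go(l)
        val rf = go(r)
        for (a <- lf; b <- rf) yield a ++ b // combine, left result first
      }

    Await.result(go(xs), Duration.Inf)
  }
}
```

Combining the left result before the right one at every level is what preserves the order of a sequence, which a shared concurrent builder would not guarantee.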

Lazy collection evaluation

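• merge occurs more than once

• each processor adds results to its own builder

• evaluation occurs in the root

[Figure: the parts 1, 2, 3, 4 / 5, 6, 7, 8 / 3, 1, 8, 0 / 2, 2, 1, 9 are filtered into the builders 2, 4 / 6, 8 / 8, 0 / 2, 2; the builders are merged pairwise, then merged again; at the root the result is allocated and the elements copied, giving 2 4 6 8 8 0 2 2]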

Lazy collection evaluation

• advantages:
  – easier to apply to existing collections
  – for certain data structures copying is cheap (arrays)
  – merging is very cheap

• disadvantages:
  – copying occurs twice – affects cheap operations
  – garbage collection occurs more often
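The lazy scheme can be sketched as follows, assuming a hypothetical LazyChunks class: each task fills its own buffer (first copy), combine only concatenates the list of buffers, and force allocates the result once at the root and copies every buffer into it (second copy).

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

object LazyCombineSketch {
  // Hypothetical lazy result: an ordered list of per-task buffers.
  final class LazyChunks[T](val chunks: Vector[ArrayBuffer[T]]) {
    // Merging is cheap: no elements are touched, only the chunk list grows.
    def combine(that: LazyChunks[T]): LazyChunks[T] =
      new LazyChunks(chunks ++ that.chunks)

    // Evaluation at the root: allocate once, then copy each chunk into place.
    def force(implicit ct: ClassTag[T]): Array[T] = {
      val out = new Array[T](chunks.map(_.length).sum)
      var offset = 0
      for (c <- chunks) {
        c.copyToArray(out, offset)
        offset += c.length
      }
      out
    }
  }

  // Work done by one task: filter its part into a private buffer.
  def leaf[T](part: Seq[T])(p: T => Boolean): LazyChunks[T] = {
    val buf = ArrayBuffer.empty[T]
    part.foreach(x => if (p(x)) buf += x)
    new LazyChunks(Vector(buf))
  }
}
```

The intermediate buffers become garbage after force, which matches the garbage collection cost listed among the disadvantages.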

Specialized data structures

• some data structures can be merged efficiently (trees, heaps, skiplists…)

• immutable vectors – immutable sequences with efficient splitting and concatenation

Method span

• each processor keeps 2 builders

• merge has 2 cases
  – counterexample in the left partition
  – no counterexample in the left partition

[Figure: span over 3 9 -1 2 4 -5 7 3 2 4 -7 2; each partition fills a prefix builder and a suffix builder, and the builder pairs are merged up the tree]
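The two merge cases can be written down directly. This is a minimal sketch under stated assumptions: SpanResult, leaf and merge are hypothetical names, Vectors stand in for the two builders, and the leaves may be computed by different processors.

```scala
object ParSpanSketch {
  // Result for one partition: the elements before its first counterexample,
  // and everything from that counterexample on.
  final case class SpanResult[T](prefix: Vector[T], suffix: Vector[T]) {
    def foundCounterexample: Boolean = suffix.nonEmpty
  }

  // Sequential work on one partition.
  def leaf[T](part: Vector[T])(p: T => Boolean): SpanResult[T] = {
    val (pre, suf) = part.span(p)
    SpanResult(pre, suf)
  }

  def merge[T](left: SpanResult[T], right: SpanResult[T]): SpanResult[T] =
    if (left.foundCounterexample)
      // Case 1: the prefix ended in the left partition, so every element of
      // the right partition belongs to the suffix, in its original order.
      SpanResult(left.prefix, left.suffix ++ right.prefix ++ right.suffix)
    else
      // Case 2: the left partition is all prefix, so the prefix continues
      // into the right partition.
      SpanResult(left.prefix ++ right.prefix, right.suffix)
}
```

With this merge, the work done on a right partition's prefix is wasted whenever a counterexample exists to its left, but the traversal itself still runs in parallel.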

Load balancing

• processor availability and data processing cost may not be uniform

• fine grained division – more tasks than processors

[Figure: a collection of elements 1 2 3 ... 750 751 752 753 754 755 divided into tasks; label: "Done!"]

Work-stealing

• need to schedule tasks to processors – work stealing

• each processor has a task queue

• when it runs out of tasks – it steals from other queues

[Figure: task queues of proc 1 and proc 2; label: "steal!"]
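On the JVM, java.util.concurrent.ForkJoinPool is a work-stealing scheduler of this kind. The sketch below is an illustration rather than the scheduler code of the parallel collections; SumTask, parSum and threshold are hypothetical names.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object WorkStealingSketch {
  // Sums xs over [start, end). A forked subtask waits in the current worker's
  // queue, where an idle worker can steal it.
  final class SumTask(xs: Array[Int], start: Int, end: Int, threshold: Int)
      extends RecursiveTask[Long] {
    override def compute(): Long =
      if (end - start <= threshold) {
        var sum = 0L
        var i = start
        while (i < end) { sum += xs(i); i += 1 }
        sum
      } else {
        val mid = start + (end - start) / 2
        val left = new SumTask(xs, start, mid, threshold)
        val right = new SumTask(xs, mid, end, threshold)
        left.fork()                   // make the left half available for stealing
        right.compute() + left.join() // work on the right half in this worker
      }
  }

  def parSum(xs: Array[Int], threshold: Int = 1000): Long = {
    require(threshold >= 1, "threshold must be at least 1")
    val pool = new ForkJoinPool()
    try pool.invoke(new SumTask(xs, 0, xs.length, threshold))
    finally pool.shutdown()
  }
}
```

A smaller threshold produces more tasks than processors, which helps load balancing at the price of more scheduling overhead.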

Adaptive work-stealing

• still, a large number of tasks can lead to overhead

• adaptive partitioning

Adaptive work-stealing

• ensures better load balancing

[Figure: task queues of proc 1 and proc 2; label: "steal!"]

Package hierarchy

• subpackage of collection package

collection
  mutable
  immutable
  parallel
    mutable
    immutable

Class hierarchy

• consistent with existing collections

• clients can refer to parallel collections transparently

[Figure: class hierarchy. Iterable at the top; Map, Seq, Set and ParallelIterable below it; ParallelMap, ParallelSeq and ParallelSet at the bottom]

How to use

• be aware of side-effects

var k = 0

array.foreach(k += _)

• parallel collections are not concurrent collections

• careful with small collections – cost of setup may be higher
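The side-effect warning can be made concrete without the parallel collections themselves. The sketch below uses standard-library futures to emulate a parallel foreach; racySum, safeSum and chunks are hypothetical names. The first method mirrors the slide's k += _ and may lose updates; the second keeps all state private to each task.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SideEffectSketch {
  private def chunkSize(n: Int, chunks: Int): Int =
    math.max(1, (n + chunks - 1) / chunks)

  // Unsafe: k += x is a non-atomic read-modify-write on shared state, so
  // concurrent tasks may overwrite each other's updates.
  def racySum(xs: Vector[Int], chunks: Int = 4): Long = {
    var k = 0L
    val tasks = xs.grouped(chunkSize(xs.length, chunks))
      .map(chunk => Future(chunk.foreach(x => k += x)))
      .toList
    Await.result(Future.sequence(tasks), Duration.Inf)
    k // may be smaller than the true sum
  }

  // Safer: each task computes a private partial sum; only results are combined.
  def safeSum(xs: Vector[Int], chunks: Int = 4): Long = {
    val partials = xs.grouped(chunkSize(xs.length, chunks))
      .map(chunk => Future(chunk.foldLeft(0L)(_ + _)))
      .toList
    Await.result(Future.sequence(partials), Duration.Inf).sum
  }
}
```

Expressing the computation as a reduction instead of a side-effecting foreach removes the shared variable altogether.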

How to use

• parallel ranges – a way to parallelize for-loops

for (i <- (0 until 1000).par) yield {
  var num = i
  var lst: List[Int] = Nil
  while (num > 0) {
    lst ::= num % 2
    num = num / 2
  }
  lst
}
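To see what the loop body computes, it can be pulled out into a function and run on a sequential range. This is a sketch only: binaryDigits and all are hypothetical names, and the .par call is left out so that the snippet depends on nothing beyond the standard library.

```scala
object BinaryDigitsSketch {
  // Same body as the loop above: the binary digits of i, most significant
  // digit first. Zero yields the empty list.
  def binaryDigits(i: Int): List[Int] = {
    var num = i
    var lst: List[Int] = Nil
    while (num > 0) {
      lst ::= num % 2
      num = num / 2
    }
    lst
  }

  // Sequential counterpart of the for-comprehension over the range.
  def all(n: Int): IndexedSeq[List[Int]] = (0 until n).map(binaryDigits)
}
```

The mutable variables are local to each iteration, so the body has no shared side effects and the iterations are independent of each other.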

Benchmarks

• microbenchmarks with low cost per-element operations

foreach            1     2     4     6     8
Sequential      1227  1227  1227  1227  1227
ParallelArray   1180   797   529   449   421
Extra166        1195   757   544   442   403

reduce             1     2     4     6     8
Sequential       949   949   949   949   949
ParallelArray    832   551   375   328   297
Extra166         890   566   363   300   282

Benchmarks

• microbenchmarks with low cost per-element operations

filter             1     2     4     6     8
Sequential       611   611   611   611   611
ParallelArray    476   333   235   216   208
Extra166         581   372   296   280   264

find               1     2     4     6     8
Sequential      1181  1181  1181  1181  1181
ParallelArray    961   608   410   331   300
Extra166         841   602   393   309   294

Current state

• an array – ParallelArray

• ranges – ParallelRange

• views – ParallelView

• working on – ParallelVector and ParallelHashMap

Conclusion

• good performance results

• nice integration with existing collections

• more parallel collections are being worked on

• will be integrated into Scala 2.8.1
