Heap Shape Scalability
Scalable Garbage Collection on Highly Parallel Platforms

Kathy Barabash, Erez Petrank
Computer Science Department
Technion, Israel

ISMM 2010

Outline

  • Is tracing GC ready for the many-core?
  • How is the heap shape related?
  • Evaluating the heap shape scalability
      • Idealized Trace Utilization
  • Improving the heap shape scalability
      • Solution 1: Reshaping with Shortcut References
      • Solution 2: Tracing with Speculative Roots
  • Related work & conclusion

Is Tracing GC Ready for Many-core?

[Figure: heap diagram showing the roots and a graph of objects a through m]

  • GC tracing
      • Traverse lots of objects
  • Sequential trace
      • Each live object is touched (BFS, DFS)
  • Parallel trace
      • Load balancing
      • 1K cores really soon
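A minimal sketch of the sequential trace described on this slide, assuming the heap is modeled as a Python dict from object name to the list of objects it references. The `trace` function and the tiny example heap are illustrative and not taken from the talk.

```python
from collections import deque

def trace(heap, roots):
    """Return the set of objects reachable from the roots (the live set)."""
    marked = set(roots)
    work = deque(roots)
    while work:
        obj = work.popleft()          # popleft() gives BFS order; pop() would give DFS
        for child in heap.get(obj, ()):
            if child not in marked:   # each live object is touched exactly once
                marked.add(child)
                work.append(child)
    return marked

# Hypothetical heap: "x" references "a" but is itself unreachable, so it is dead.
heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "x": ["a"]}
live = trace(heap, ["a"])
print(sorted(live))
```

A parallel tracer would share the `work` container among threads, which is where the load balancing mentioned on the slide comes in.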

Can Heaps Spoil the Scalability?

[Figure: heap holding a single linked list of objects 1, 2, 3, ... 4K ... 4M, reachable from the roots]

  • 4M live objects
      • Single linked list
  • Sequential trace
      • 4M steps
  • Parallel trace
      • Not any faster

Deep Object Graphs Can be Evil

Definition:

  • Object Depth
      • Length of the minimal path from some root object
  • Object-Graph Depth
      • Maximal live object depth

Example:

[Figure: heap with objects labeled by their object depths 0, 1, 2, 3]

  • How deep are object graphs of Java programs?
      • SpecJVM, Dacapo, SpecJBB
      • Instrumented BFS trace
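The two definitions can be sketched as an instrumented BFS, assuming a dict-based heap model and root objects at depth 0. The function name and the example heap are hypothetical; the measurements in the talk were taken inside a JVM, not with code like this.

```python
from collections import deque

def object_depths(heap, roots):
    """Map each live object to the length of its shortest path from a root."""
    depth = {r: 0 for r in roots}
    work = deque(roots)
    while work:
        obj = work.popleft()
        for child in heap.get(obj, ()):
            if child not in depth:            # BFS reaches an object first via a shortest path
                depth[child] = depth[obj] + 1
                work.append(child)
    return depth

# Hypothetical heap: "c" is reachable through both "a" and "b", at depth 2 either way.
heap = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
depths = object_depths(heap, ["r"])
graph_depth = max(depths.values())            # object-graph depth: maximal live object depth
print(depths, graph_depth)
```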

Object-Graph Depths of Java Benchmarks

Name      Description                  Heap Size (MB)   GC Cycles   Max Depth
SpecJVM
  javac   Java compiler run 3 times    32               15          1,234
  mtrt    3D raytracer                 32               8           1,416
Dacapo
  bloat   Java byte code analyzer      48               344         1,195
  pmd     Java code analyzer           48               59          18,482
  xalan   Transforms XML into HTML     128              129         8,476
Other 15 benchmarks: 128

Not all Deep Object Graphs are Evil

[Figure: heap holding many linked lists, each running from object 1 to object 4K, reachable from the roots]

  • Object-graph
      • 1K same-sized linked lists of 4K objects
  • Sequential trace
      • 4M steps
  • Parallel trace
      • Scales well for up to 1K processors
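The contrast between the single long list and the many parallel lists can be checked with a small idealized model. It assumes one object scan per processor per time tick and that only the next unscanned object of each list is available. The sizes are scaled down (4,096 objects instead of 4M) so the sketch runs quickly, and `ticks_to_trace` is a hypothetical helper.

```python
def ticks_to_trace(list_lengths, processors):
    """Ticks needed to trace disjoint linked lists, one object scan per processor per tick."""
    remaining = sorted(list_lengths, reverse=True)
    ticks = 0
    while remaining:
        # Only the head of each list is available, so a tick advances at most
        # one object per list and at most `processors` lists in total.
        advanced = [n - 1 for n in remaining[:processors]]
        remaining = sorted((n for n in advanced + remaining[processors:] if n > 0),
                           reverse=True)
        ticks += 1
    return ticks

single_list = [4096]          # one deep list
many_lists = [64] * 64        # same number of objects, spread over 64 lists

print(ticks_to_trace(single_list, 1), ticks_to_trace(single_list, 64))
print(ticks_to_trace(many_lists, 1), ticks_to_trace(many_lists, 64))
```

In this model the single list needs the same number of ticks regardless of the processor count, while the 64 lists finish 64 times faster on 64 processors.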

Deep and Narrow Object Graphs are Evil

Definition:

  • Object Depths Distribution
      • Number of objects at different depths

Example:

[Figure: example heap annotated with #objects per depth: 2, 4, 3, 1, 1]

Graphical Representation (Object-graph shape):

[Plot: # objects (0 to 5) against depth (1 to 5)]
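A sketch of how an object depths distribution could be computed, assuming a dict-based heap model with at least one root. The example heap is hypothetical and was chosen so that the per-depth counts come out as 2, 4, 3, 1, 1, the numbers that appear in the slide's figure.

```python
from collections import Counter, deque

def depth_distribution(heap, roots):
    """Return a list whose entry d is the number of live objects at depth d."""
    depth = {r: 0 for r in roots}
    work = deque(roots)
    while work:
        obj = work.popleft()
        for child in heap.get(obj, ()):
            if child not in depth:
                depth[child] = depth[obj] + 1
                work.append(child)
    counts = Counter(depth.values())
    return [counts[d] for d in range(max(counts) + 1)]

# Hypothetical heap: wide near the roots, then a narrow tail.
heap = {
    "r1": ["a", "b"], "r2": ["c", "d"],
    "a": ["e"], "b": ["f"], "c": ["g"],
    "e": ["h"], "h": ["i"],
}
shape = depth_distribution(heap, ["r1", "r2"])
print(shape)
```

Plotting `shape` against its index gives the object-graph shape; a long run of small counts is the deep and narrow pattern the slide warns about.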

Object-Graph Shapes of Java Benchmarks

[Plots: # objects against depth for jython and xalan]

Object-Graph Shapes of Java Benchmarks

[Plots: # objects (log 10) against depth (log 10) for bloat, javac, mtrt, xalan, pmd, db, hsqldb, antlr, jython, jess, jack, lusearch]

The Idealized Trace Utilization

  • Simulate the idealized traversal by N threads
      • Perfect load balancing
      • Perfect cache behavior
      • BFS traversal
      • Single time tick object scan
  • During the traversal, count
      • Objects available to be scanned at every time tick
      • Processor slots: some are busy and some are wasted
  • At the end, report the utilization (ITU)

      ITU = (Total Scanned Objects / Total Processor Slots) * 100%

Idealized Trace Utilization Example

[Figure: heap objects scanned by 4 tracers (Core 1 to Core 4)]

Time ticks        1   2   3   4   5   6   7   8
Scanned objects   2   5   9  11  12  13  14  15

ITU = (Total Scanned Objects / Total Processor Slots) * 100% = 15 / (8 * 4) * 100% = 47%
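The ITU computation can be sketched as a tick-by-tick simulation, assuming each tracer scans one available object per tick and that newly discovered objects become available on the next tick. The heap below is hypothetical: it was built so that 4 tracers scan 15 objects in 8 ticks, matching the totals on the slide, and is not the heap drawn in the original figure.

```python
from collections import deque

def idealized_trace_utilization(heap, roots, processors):
    """Simulate an idealized BFS trace and return the utilization in percent."""
    seen = set(roots)
    available = deque(roots)      # objects discovered but not yet scanned
    ticks = scanned = 0
    while available:
        ticks += 1
        # Up to `processors` objects are scanned in this tick; unused slots are wasted.
        batch = [available.popleft() for _ in range(min(processors, len(available)))]
        scanned += len(batch)
        for obj in batch:
            for child in heap.get(obj, ()):
                if child not in seen:
                    seen.add(child)
                    available.append(child)
    return 100.0 * scanned / (ticks * processors)

# Hypothetical heap: 2 roots, a short wide part, then a narrow 5-object tail.
heap = {
    "r0": ["a", "b"], "r1": ["c"],
    "a": ["d", "e"], "b": ["f"], "c": ["g"],
    "d": ["h"], "e": ["i"],
    "h": ["j"], "j": ["k"], "k": ["l"], "l": ["m"],
}
itu = idealized_trace_utilization(heap, ["r0", "r1"], processors=4)
print(f"ITU = {itu:.0f}%")
```

With a single processor every slot is used, so the same function reports 100% for any non-empty heap.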

Graphical Representation

  1. Simulate and compute
  2. Draw the graph

[Plots: object-graph shape (# objects against depth) and utilization (0 to 100) against processors (1, 2, 4, 8)]

Worst Case ITU for Java Benchmarks

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, xalan]

Average ITU for Java Benchmarks

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, xalan]

What’s Next?

  • Problematic heaps exist
      • javac, mtrt, pmd, bloat, xalan
  • Can we improve the trace scalability without modifying the benchmarks?
      • Reshape with Shortcut References
      • Trace with Speculative Roots

Reshape with Shortcut References

[Figure: heap holding a linked list of objects 1, 2, 3, 4, ... 4K ... 16K, reachable from the roots]

  • Sequential trace
      • 16K steps
  • New references are added
      • Invisible to the program
      • Useful for the tracers
  • Parallel trace
      • Scales for 4 processors
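The effect of shortcut references on a linked list can be illustrated with a scaled-down model (16 objects instead of 16K). The sketch assumes one object scan per processor per tick and, as a simplification that the slide does not confirm, stores all shortcuts on the head of the list. The helpers `trace_ticks`, `linked_list` and `add_shortcuts` are hypothetical.

```python
from collections import deque

def trace_ticks(heap, roots, processors):
    """Idealized number of ticks: each processor scans one available object per tick."""
    seen = set(roots)
    available = deque(roots)
    ticks = 0
    while available:
        ticks += 1
        batch = [available.popleft() for _ in range(min(processors, len(available)))]
        for obj in batch:
            for child in heap[obj]:
                if child not in seen:
                    seen.add(child)
                    available.append(child)
    return ticks

def linked_list(n):
    """Objects 0..n-1, each referencing its successor."""
    return {i: ([i + 1] if i + 1 < n else []) for i in range(n)}

def add_shortcuts(heap, head, stride):
    """Add references from the head to every stride-th object (tracer-only references)."""
    for target in range(stride, len(heap), stride):
        heap[head].append(target)

plain = linked_list(16)
reshaped = linked_list(16)
add_shortcuts(reshaped, head=0, stride=4)

print(trace_ticks(plain, [0], 4), trace_ticks(reshaped, [0], 4))
```

The set of reachable objects is unchanged by the shortcuts; only the number of objects available per tick grows, which is what lets four tracers make progress at once.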

Evaluation Prototype

  • Devise a shortcut strategy
      • Where shortcuts are needed
  • When the program is stopped for GC
      • Compute the Idealized Trace Utilization
      • Run the shortcut-adding algorithm
      • Compute the ITU for the modified heap
  • Report
      • ITU improvement
      • Number of shortcuts added

Shortcut Strategy and Parameters

  • Identify candidate subgraphs
      • With at least size objects
      • With depth-to-size ratio no less than ratio
  • Add shortcut to the root of the subgraph
      • Leading to the objects length pointers away
      • Next shortcut introduced not closer than distance pointers away

[Figure: chain of objects 1 to 9 illustrating Distance (2) and Length (4); example subgraph with Size=5, Depth=4, Ratio=0.8]
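The candidate test can be read as two threshold checks. The sketch below is one interpretation of the slide, assuming the ratio is subgraph depth divided by subgraph size, as the figure's Size=5, Depth=4, Ratio=0.8 suggests. The function name and the example subgraphs are hypothetical, and the placement of shortcuts (length, distance) is not modeled.

```python
def is_candidate(subgraph_size, subgraph_depth, min_size, min_ratio):
    """True if a subgraph is both large enough and deep relative to its size."""
    return (subgraph_size >= min_size
            and subgraph_depth / subgraph_size >= min_ratio)

# The subgraph from the slide's figure: 5 objects, depth 4, so the ratio is 0.8.
figure_example = is_candidate(5, 4, min_size=5, min_ratio=0.8)

# Hypothetical subgraphs checked against thresholds Size=50, Ratio=0.2.
chain_of_100 = is_candidate(100, 99, min_size=50, min_ratio=0.2)     # deep and narrow
balanced_tree = is_candidate(127, 6, min_size=50, min_ratio=0.2)     # wide and shallow
too_small = is_candidate(5, 4, min_size=50, min_ratio=0.2)

print(figure_example, chain_of_100, balanced_tree, too_small)
```

Under this reading a long chain qualifies for shortcuts, while a balanced tree of similar size does not, because its depth is small compared with its size.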

Results for SpecJVM mtrt

  • ~500K live objects
  • Max shortcuts: 110, Avg shortcuts: 94
  • Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Worst before, Worst after, Avg before, Avg after]

Results for DaCapo xalan

  • ~400K live objects
  • Max shortcuts: 888, Avg shortcuts: 536
  • Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Worst before, Worst after, Avg before, Avg after]

Results for DaCapo bloat

  • ~400K live objects
  • Max shortcuts: 940, Avg shortcuts: 378
  • Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Worst before, Worst after, Avg before, Avg after]

Results for DaCapo pmd

  • ~434K live objects
  • Max shortcuts: 5,874, Avg shortcuts: 432
  • Parameters: Size=600, Ratio=0.1, Length=120, Distance=40

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Worst before, Worst after, Avg before, Avg after]

Results for SpecJVM javac

  • ~383K live objects
  • Max shortcuts: 292, Avg shortcuts: 16
  • Parameters: Size=500, Ratio=0.1, Length=100, Distance=50

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Worst before, Worst after, Avg before, Avg after]

Trace with Speculative Roots

[Figure: heap reachable from the roots, annotated with 4K and 4M]

  • Sequential trace
      • 16M steps
  • Helper tracers
      • Pick random roots
      • Trace using custom colors
  • Parallel trace
      • Scales for 4 processors

Speculative Trace

  • Helper tracer
      • Pick a root
      • Pick a color, e.g. red
      • Trace; if a blue object is discovered, mark blue as reachable from red
  • Regular trace
      • Trace from root; if a blue object is discovered, mark blue as live
  • Complete trace
      • All colors reachable from live colors are marked live
      • All objects marked by live colors survive the collection
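The three phases can be sketched sequentially, assuming a dict-based heap model, one color per helper tracer, and helpers that run to completion before the regular trace starts. A real collector would run them concurrently and need synchronization that is not shown. All names and the example heap are hypothetical.

```python
from collections import deque

LIVE = "live"  # color used by the regular tracers

def helper_trace(heap, root, my_color, color, reaches):
    """Color everything reachable from a speculative root; record cross-color references."""
    if root in color:
        return
    color[root] = my_color
    work = deque([root])
    while work:
        for child in heap.get(work.popleft(), ()):
            if child not in color:
                color[child] = my_color
                work.append(child)
            elif color[child] != my_color:
                reaches.setdefault(my_color, set()).add(color[child])

def regular_trace(heap, roots, color, live_colors):
    """Trace from the real roots; a discovered helper color becomes live."""
    work = deque()
    def discover(obj):
        if obj not in color:
            color[obj] = LIVE
            work.append(obj)
        else:
            live_colors.add(color[obj])   # do not re-trace the helper's subgraph
    for root in roots:
        discover(root)
    while work:
        for child in heap.get(work.popleft(), ()):
            discover(child)

def complete_trace(live_colors, reaches):
    """Mark every color reachable from a live color as live."""
    pending = list(live_colors)
    while pending:
        for c in reaches.get(pending.pop(), ()):
            if c not in live_colors:
                live_colors.add(c)
                pending.append(c)

# Hypothetical heap: r -> a -> b -> c -> d is live; x -> y and z are dead, but z references b.
heap = {"r": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "x": ["y"], "z": ["b"]}
color, reaches, live_colors = {}, {}, {LIVE}
helper_trace(heap, "c", "red", color, reaches)     # useful work: c and d are live
helper_trace(heap, "x", "green", color, reaches)   # wasted work: x and y are dead
helper_trace(heap, "z", "blue", color, reaches)    # colors dead z and live b; blue reaches red
regular_trace(heap, ["r"], color, live_colors)
complete_trace(live_colors, reaches)
survivors = {obj for obj, c in color.items() if c in live_colors}
print(sorted(survivors))
```

In this run the dead object `z` survives because it shares the blue color with the live object `b`, which is an instance of floating garbage; `x` and `y` are reclaimed because green never becomes live.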

Evaluation Prototype

  • Useful helpers work
      • Live objects colored by live colors
  • Wasted helpers work
      • Dead objects colored by dead colors
  • Floating garbage
      • Dead objects colored by live colors

[Figure: heap diagram with objects a through m]

  • 4 regular tracers, 4 helper tracers
  • Speculative roots: random unmarked objects
  • ITU before and after the colored trace

  • Limit the floating garbage
      • Maximal number of objects colored by a single color
      • Helpers must save discovered but not traced objects
      • Trace completion phase takes care of the saved fronts
  • Make the random root choices smarter
      • To avoid choosing dead objects
      • To reach deeper parts of the live object graph
  • Filter for the recursive objects
      • Objects with referents of their own type

Results

  • Lots of floating garbage
      • Even with the filter
  • Hard to find good roots
      • Progressively harder as the live objects are getting marked
  • Trace completion phase is complex
      • Can defeat the purpose
  • Modest improvement in the Idealized Trace Utilization scores

Results for DaCapo xalan
Worst case ITU improvement, with the random choices filter

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Before, After]

Results for DaCapo bloat
Worst case ITU improvement, with the random choices filter

[Plot: utilization (0 to 100) against processors (1 to 1024, powers of two); series: Before, After]

Related Work

  • Parallel Garbage Collection Folklore
      • There are heap structures that can foil any clever load balancing scheme
  • Siebert (ISMM’08)
      • Reported object graph depths for SpecJVM benchmarks
      • Proposed upper bound on the worst case scalability as a way to compute RT guarantees for the GC tracing
  • Random tracing originally proposed by Click

Summary

  • Studied the heap shape properties of Java benchmarks
      • Out of twenty considered benchmarks, five had non-scalable heap shapes during the run
  • Devised a measure to quantify the heap shape scalability
      • Idealized Trace Utilization
  • Proposed, prototyped and evaluated two approaches to improve the tracing scalability
      • Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots

Thank You!