Parallel worlds of CRuby's GC

@nari_en
Network Applied Communication Laboratory Ltd.

For RubyConf Argentina 2011

Powered by Rabbit 1.0.4


Argentina is great!!

http://www.flickr.com/photos/khaase/6323999001/in/set-72157628079773218

Soccer

http://www.flickr.com/photos/khaase/6323997953/in/set-72157628079773218

"We are Rubyist"


Where are you from?


Self introduction

Ice-cream factory

I worked on an assembly line.

For example, I made many cardboard boxes.

I was a professional cardboard box maker :)

Ice-cream factory

I made 150 boxes per hour (ZOMG)

http://www.flickr.com/photos/kevincollins123/5887984753/

I was like a machine!!

Working with Java

I worked at a big company.

This work was similar to assembly line work.

I made one part of a product. I didn't understand the whole product.

http://www.flickr.com/photos/kevincollins123/5887984753/

I was still like a machine!!

My current work

Currently, I work at NaCl.

Matz is my co-worker.

Shugo Maeda is my boss.

They are CRuby committers.


When I started Ruby programming,


I felt free.

Working with Ruby

Working with Ruby wasn't like assembly line work.

I could make the whole product.

http://www.flickr.com/photos/danzden/121379782/

I was no longer a machine!!

Garbage Collection for me

GC technology is very interesting to me.

GC is a garbage collecting machine.

I've been creating it since then. It's a lot of fun!!


I'm making a machine!!


My relationship to GC

I'm a CRuby Committer

I work on GC.

And I wrote a book about GC.


But, it's only in Japanese :(


And, I've been creating GC with RDD.


What is RDD?


RDD = RubyKaigi Driven Development

http://www.flickr.com/photos/recompile_net/4935820587/

My RDD history

LazySweepGC - RubyKaigi2008

LonglifeGC - 2009

LazySweepGC - 2010

ParallelMarkingGC - 2011


LazySweepGC

Traditional M&S GC executes mark and sweep atomically.

The Ruby application stops during GC (stop-the-world).

In lazy sweeping, the sweep phase is deferred and done little by little.

LazySweepGC

Each object allocation sweeps Ruby's heap only until it finds an appropriate free object.
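To make the idea concrete, here is a minimal Ruby sketch of lazy sweeping. It is a toy model and not CRuby's actual C implementation: the class name ToyLazyHeap, the slot layout, and the sweep position counter are all assumptions made up for this illustration.

```ruby
# Toy model of lazy sweeping (not CRuby's real implementation).
# Each slot is a Hash with a :marked flag; all names here are hypothetical.
class ToyLazyHeap
  attr_reader :slots

  def initialize(slots)
    @slots = slots
    @sweep_pos = 0 # where the previous allocation stopped sweeping
  end

  # Sweep forward only until one reusable slot is found, then stop.
  # Returns the index of the allocated slot, or nil if the sweep is exhausted.
  def allocate
    while @sweep_pos < @slots.size
      index = @sweep_pos
      slot = @slots[index]
      @sweep_pos += 1
      if slot[:marked]
        slot[:marked] = false # live object: clear the mark for the next GC cycle
      else
        return index          # dead object: reuse this slot right away
      end
    end
    nil # nothing left to sweep; a real collector would start a new mark phase
  end
end

heap = ToyLazyHeap.new([
  { marked: true },  # live
  { marked: false }, # dead
  { marked: true },  # live
  { marked: false }, # dead
])

first  = heap.allocate
second = heap.allocate
third  = heap.allocate
puts [first, second, third].inspect # prints [1, 3, nil]
```

Each call to allocate pays only for the part of the heap it sweeps, so the sweep work is spread over many allocations instead of happening in one long pause.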

Improvements

This improves the response time of GC.

I.e. the worst-case pause time of GC decreases.

LazySweepGC

LazySweepGC has been available since Ruby 1.9.3.


Today's topics

Why do we need Parallel Marking?

What to consider?

How to implement?

How much did performance improve?



Why do we need Parallel Marking?

This is CRuby's current GC.

CRuby's current GC

GC operates on only 1 core.

In a multi-core environment, the other cores don't help GC.

http://www.flickr.com/photos/hortont/2698261070/

GC: "I'm alone, it's so hard."

http://www.flickr.com/photos/knallaerbse/2863161933/

We should run GC in parallel!!

First, let me explain a few GC-related concepts.

What is GC?

GC collects all dead objects.

What is a dead object?

A dead object is an object that will never be referenced by the program again.

In GC terms, we say that a dead object is unreachable from Roots.

What are Roots?

Roots are a set of pointers that directly reference objects in the program.

e.g. Ruby's local variables, etc.


Please remember that

GC collects objects that are unreachable from Roots.

Next, let me explain the current CRuby GC algorithm.

CRuby's GC algorithm summary

CRuby adopts the Mark & Sweep algorithm.

The collector works in separate Mark and Sweep phases.

In the Mark phase

The collector marks live objects, i.e. objects that are reachable from Roots.

[Figures: example object graph; Mark phase with GC.start; Ruby Heap after marking]

In the Sweep phase

The collector sweeps "dead" objects.

"dead" means unmarked.

"dead" means unreachable from Roots.
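The two phases can be sketched in a few lines of Ruby. This is only a toy simulation over a Hash, not the C code inside CRuby; the object names (:a to :f), the HEAP table and the ROOTS array are hypothetical and exist only for this example.

```ruby
require "set"

# Toy object graph: each key is an object, each value lists the objects it references.
# The names (:a, :b, ...) and the ROOTS array are hypothetical.
HEAP = {
  a: [:b, :c],
  b: [],
  c: [:d],
  d: [],
  e: [:f], # e references f, but nothing reaches either of them from Roots
  f: [],
}.freeze
ROOTS = [:a].freeze

# Mark phase: collect every object reachable from Roots.
def mark(heap, roots)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if marked.include?(obj)
    marked << obj
    stack.concat(heap.fetch(obj))
  end
  marked
end

# Sweep phase: every unmarked object is dead and can be reclaimed.
def sweep(heap, marked)
  heap.keys.reject { |obj| marked.include?(obj) }
end

marked = mark(HEAP, ROOTS)
dead = sweep(HEAP, marked)
puts "live: #{marked.to_a.sort.inspect}"
puts "dead: #{dead.inspect}"
```

Here :e and :f are swept even though :e still references :f, because neither of them is reachable from Roots.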

[Figure: Sweep phase]

Characteristics of CRuby's GC

Characteristics

Stop-the-world algorithm

Single-threaded execution

Recent PCs have multi-core processors. But,

GC executes on a single thread.

Other cores don't work during GC.

What a waste!!


How can we fix this?

Use Parallel Marking, Luke

What is Parallel Marking?

The collector runs several marking processes in parallel by using native threads.

We will be happy on multi-core machines.

[Figure: Flow diagram for Parallel Marking]


What to consider when implementing Parallel Marking?

We should consider two problems

Workload balancing

Wait-free algorithm


Workload balancing

How can we divide the marking task into sub-tasks?

I tried to think of a simple approach.

Each branch of Roots is marked by one thread.

This means..

Tasks are distributed to multiple threads.

The task of marking the entire heap is divided into several tasks, each marking a single branch.

There seems to be no problem.

But actually, this solution suffers from a workload balancing problem.


Each thread doesn't know what the other threads are doing.

For instance, if A and B finish their work early,

then they will stop doing anything :(


I think "machines should work forever" :D


So, I think A and B should ...

http://www.flickr.com/photos/ryanr/157458385/


Parallel Marking with Task Stealing.

If A and B finish their work early, they steal tasks from the threads that are still busy.

This is called "Task Stealing"



Wait-free algorithm

What does "wait-free" mean?

A wait-free program executes without blocking.

It guarantees per-thread progress.


Why is wait-free important?

Amdahl's law

Amdahl's law is used to find the maximum expected improvement to an overall system when only part of the system is improved.

[cited from "Amdahl's law - Wikipedia"]

Amdahl's law is used in parallel computing

If the parallel portion of the system is X%

and the number of processors is Y,

how much speedup can we expect?
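The question can be answered with the standard form of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n is the number of processors. The small Ruby sketch below just evaluates that formula; the method name amdahl_speedup and the sample values are made up for this illustration and are not numbers from the talk.

```ruby
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = parallel fraction (0.0..1.0), n = number of processors.
def amdahl_speedup(parallel_fraction, processors)
  serial_fraction = 1.0 - parallel_fraction
  1.0 / (serial_fraction + parallel_fraction / processors)
end

# Sample values chosen only for illustration.
[[0.5, 2], [0.8, 8], [0.95, 8], [1.0, 8]].each do |fraction, cores|
  printf("parallel %3d%%, %d processors -> %.2fx\n",
         (fraction * 100).round, cores, amdahl_speedup(fraction, cores))
end
```

For example, with 80% of the work parallel and 8 processors the speedup is only about 3.3x, not 8x, because the remaining serial 20% dominates.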


It's worse than expected, right?

The conclusion so far

We should consider how we can efficiently balance workloads.

So, we use Task Stealing.

We should eliminate non-parallel parts by using a wait-free algorithm.



How to implement Parallel Marking?

Task Stealing

In Task Stealing, threads steal tasks from each other.

Task Stealing is achieved with Arora's Deque.

Arora's Deque

Deque stands for Double-Ended Queue.

In Arora's Deque, the deque contains tasks as elements.

It's a wait-free data structure.

Arora's Deque has only three operations: push(), pop(), and shift().

Each mark worker has a single deque.

Only the owner can call pop() and push().

A worker can call shift() to steal a task from another worker's deque.
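Here is a small Ruby sketch of that interface. It is single-threaded and backed by a plain Array, so it only shows which end of the deque each operation works on; it is not Arora's actual algorithm, and the class name ToyDeque is an assumption for this example.

```ruby
# Single-threaded sketch of the deque interface used for task stealing.
# It only shows which end each operation works on; the real structure
# needs atomic operations to be safe across threads.
class ToyDeque
  def initialize
    @tasks = []
  end

  # Owner only: add a task at the bottom.
  def push(task)
    @tasks.push(task)
    self
  end

  # Owner only: take the most recently pushed task from the bottom.
  def pop
    @tasks.pop
  end

  # Any other worker: steal the oldest task from the top.
  def shift
    @tasks.shift
  end

  def empty?
    @tasks.empty?
  end
end

deque = ToyDeque.new
deque.push(:b).push(:c)
stolen = deque.shift # a thief takes :b from the top
own    = deque.pop   # the owner takes :c from the bottom
puts "stolen=#{stolen.inspect} own=#{own.inspect} empty=#{deque.empty?}"
```

Because the owner and the thieves work on opposite ends, they only get in each other's way when the deque is almost empty.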

"Hey wait a minute, doesn't shift() have contention problems?"

In what ways could shift() cause contention problems?

e.g. multiple threads (workers) may call shift() on the same deque at the same time.

e.g. shift() and pop() could be called at the same time when the deque has only one element.


But, Arora's Deque avoids these contention problems.

Serialization

shift() is serialized by using CAS.

CAS = Compare And Swap

And this serialization doesn't use a lock.

It's wait-free!!
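The following Ruby sketch only imitates what a CAS does when two thieves race for the same task. Plain Ruby has no built-in CAS, so the hypothetical ToyAtomicInteger class below uses a Mutex purely to simulate the atomicity that a real CAS gets from a single hardware instruction. The real implementation would not take a lock here, and the variable names are assumptions.

```ruby
# Toy stand-in for an atomic integer. Real CAS is a single hardware instruction;
# the Mutex here exists only to imitate that atomicity in plain Ruby.
class ToyAtomicInteger
  def initialize(value)
    @value = value
    @lock = Mutex.new
  end

  def get
    @lock.synchronize { @value }
  end

  # Set to new_value only if the current value still equals expected.
  # Returns true on success, false if someone else changed it first.
  def compare_and_set(expected, new_value)
    @lock.synchronize do
      return false unless @value == expected
      @value = new_value
      true
    end
  end
end

top = ToyAtomicInteger.new(0) # index of the next task a thief may steal

# Two thieves read the same index before either one commits.
seen_by_thief1 = top.get
seen_by_thief2 = top.get

thief1_won = top.compare_and_set(seen_by_thief1, seen_by_thief1 + 1)
thief2_won = top.compare_and_set(seen_by_thief2, seen_by_thief2 + 1)

puts "thief1 won: #{thief1_won}, thief2 won: #{thief2_won}, top: #{top.get}"
```

Only one of the two attempts succeeds, so the same task is never handed to two workers. The loser simply sees a failed CAS and can try again or look at another deque.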

I'll omit the details of how the serialization is implemented.

For the sake of this presentation, let's assume that Arora's Deque avoids contention problems.

Summary for Arora's Deque

A simple data structure for Task Stealing.

Each worker has a single deque.

Stealing (shift operation) is wait-free!


How to use Arora's Deque in Parallel Marking?

First try: A task is an object.

Let's say that worker A has a branch that is composed of 4 objects.

We start by marking A and pushing it to the deque.

Pop A, mark B and C, push B and C.

Pop C, mark D, push D.

Pop D, pop B.

This is branch marking.
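The walkthrough above can be replayed with a short Ruby sketch. A plain Array stands in for the worker's own deque, and the CHILDREN table encodes the assumed shape of the branch (A references B and C, C references D); both are illustrations, not the real marking code.

```ruby
# Object graph from the walkthrough: A references B and C, C references D.
CHILDREN = { a: [:b, :c], b: [], c: [:d], d: [] }.freeze

marked = []
deque = [] # a plain Array used as the worker's own deque (push/pop at the same end)
trace = []

# Mark the root of the branch and push it.
marked << :a
deque.push(:a)

until deque.empty?
  obj = deque.pop
  trace << "pop #{obj}"
  CHILDREN.fetch(obj).each do |child|
    next if marked.include?(child)
    marked << child   # mark the child first...
    deque.push(child) # ...then push it so its own children get visited later
  end
end

puts trace.join(", ")            # prints pop a, pop c, pop d, pop b
puts "marked: #{marked.inspect}" # prints marked: [:a, :b, :c, :d]
```

Every object costs one push and one pop, which is why the number of deque operations grows with the number of objects.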

How do you steal?

Suppose that Worker1 has tasks B and C. Worker2 has no task.

Worker2 steals task B from Worker1 by using shift().

Summary

The marker uses Arora's Deque as a marking stack.

A "task" means an object.

The granularity of the task is very fine.

This is a naive implementation.

I implemented this approach.

But..

It's slower than the original GC.

http://www.flickr.com/photos/emariephotos/4958245676/

OMG...

I fell into the Pitfalls of Parallel Processing (PPP!!!)

Why slow?

pop(), push(), and shift() are called frequently,

because the deque holds fine-grained tasks.

Their overhead is too big.

How to fix this?

We can make the tasks less fine-grained.

A task is a branch

All branches in Roots are divided roughly among the deques.

Each worker marks a branch from its deque.

When the deque is empty, the worker steals a branch from another worker.
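As a rough picture of this scheme, here is a single-threaded, round-based Ruby simulation. Each branch is represented only by its size, each worker handles one branch per round, and the worker names and branch sizes are invented for this example. It shows who ends up marking what, not real timing or real threads.

```ruby
# Each task is a whole branch, represented only by its size (number of objects).
# Worker names and branch sizes are made up for the illustration.
deques = {
  worker1: [5, 4, 3], # worker1 starts with three branches
  worker2: [1],       # worker2 starts with one small branch
}
marked_by = Hash.new(0)
steals = 0

until deques.values.all?(&:empty?)
  deques.each_key do |worker|
    branch = deques[worker].pop # take a branch from the worker's own deque
    if branch.nil?
      # Own deque is empty: steal from the other end of the fullest deque.
      victim = deques.max_by { |_, tasks| tasks.size }.first
      branch = deques[victim].shift
      steals += 1 if branch
    end
    marked_by[worker] += branch if branch
  end
end

puts "marked objects per worker: #{marked_by.inspect}, steals: #{steals}"
```

In this run worker2 finishes its single small branch, steals the branch of size 5 from worker1, and the two workers end up with 7 and 6 marked objects instead of 12 and 1.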


Good points & bad points

The number of calls to the deque's operations is reduced.

The marking speed of each worker is improved.

However, coarse-grained tasks decrease parallelism.

Why do coarse-grained tasks decrease parallelism?

A task may involve a large branch.

If an object in B's branch has many child objects..

.. then A can't steal it while B is marking the large branch.

So, the worker needs to treat large branches as special cases.

Almost all large branches hold large Array objects and/or large Hash objects.

Treatment for large Array objects and Hash objects

Each marker has a special deque to manage them.

A marker divides them into fixed-size tasks.

e.g. elements 0-9 of an Array, elements 10-19 of the Array...

By doing this, other workers can steal the divided tasks.

This improves parallelism.
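Splitting a large Array into fixed-size tasks could look like the Ruby sketch below. The chunk size of 10 follows the "0-9, 10-19" example above, while the method name split_into_tasks and the use of index ranges as tasks are assumptions for this illustration.

```ruby
CHUNK_SIZE = 10 # hypothetical fixed task size, following the "0-9, 10-19" example

# Split a large Array into index ranges that can be marked (and stolen) independently.
def split_into_tasks(array, chunk_size = CHUNK_SIZE)
  (0...array.size).step(chunk_size).map do |start|
    last = [start + chunk_size, array.size].min - 1
    start..last
  end
end

large_array = Array.new(25) { |i| "element #{i}" }
tasks = split_into_tasks(large_array)
puts tasks.inspect # prints [0..9, 10..19, 20..24]
```

Each range is a separate task in the special deque, so an idle worker can steal, say, elements 20-24 while the owner is still marking elements 0-9.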

Summary

The naive implementation was slow.

The grain of the task was too fine.

Now a "task" means a branch in Roots.

The grain of the task is coarse.

It's faster!!



How much did performance improve?

These are my machine specs

My machine has only 2 cores

Memory: 8GB

OS: Linux


Parallel marking uses 4 marking threads.

The first benchmark program is make benchmark

This is the benchmark that is used in CRuby development.

Why does this seem so slow?

I think it's affected by Parallel Marking's preparation,

e.g. creating marking threads and allocating deques.

In most of the benchmarks, there are few objects to mark.

In this case, the overhead of Parallel Marking is relatively large.

The next benchmark program is make rdoc

make rdoc generates the Ruby documentation.

This benchmark measures the total execution time and the GC execution time of make rdoc.
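One way to measure both numbers from inside a Ruby program is the standard GC::Profiler module. The sketch below is not the harness used for the make rdoc numbers in this talk; the workload is a made-up loop that just allocates many short-lived strings.

```ruby
GC::Profiler.enable

# Hypothetical workload: allocate many short-lived strings to trigger GC.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
200_000.times { |i| "object #{i}" * 10 }
GC.start
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

gc_time = GC::Profiler.total_time # seconds spent in GC while the profiler was enabled
GC::Profiler.disable

printf("total: %.3fs, GC: %.3fs (%.1f%%)\n", elapsed, gc_time, 100.0 * gc_time / elapsed)
```

The printed percentage depends on the machine and the Ruby version, so treat it only as a way to see how much of a run is spent in GC.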

make rdoc

It takes about 80 seconds on my machine.

In fact, 30% of that time is spent on GC!!

How much did performance improve?

Total GC time improved by 40%!
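To see what this could mean for the whole run, here is a small worked calculation in Ruby. It assumes that "improved by 40%" means the GC time became 40% shorter and that the non-GC time stayed the same; both are assumptions of this sketch, not statements from the slides.

```ruby
total_seconds = 80.0 # from the slide: make rdoc takes about 80 seconds
gc_share      = 0.30 # from the slide: 30% of that time is GC
gc_reduction  = 0.40 # assumption: "improved by 40%" means 40% less GC time

gc_before    = total_seconds * gc_share
gc_after     = gc_before * (1 - gc_reduction)
total_after  = total_seconds - gc_before + gc_after
overall_gain = 1 - total_after / total_seconds

printf("GC: %.1fs -> %.1fs, total: %.1fs -> %.1fs (%.0f%% less)\n",
       gc_before, gc_after, total_seconds, total_after, overall_gain * 100)
```

Under these assumptions GC goes from about 24 seconds to about 14.4 seconds, and the whole run gets roughly 12% shorter, which is the same effect Amdahl's law describes.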


So fast!!

In a many-core environment

I expect we will get a large improvement,

e.g. with 8 cores, 16 cores...

But my machine has just 2 cores.

I can't see it :(

Best case for Parallel GC

When there are many objects.

In this case, there are also many mark targets.

When the objects are long-lived.

Server-side applications?


Demo

Demonstration

I want to show the performance improvement with Parallel GC.

This demonstration is in the style of a video game.

Let me explain this game.

The character has HP.

When GC runs,

the character loses HP while waiting for the GC to finish.

We must reach the goal before HP runs out.

Other characteristics of SUPER NARIO GC

GC runs at fixed intervals.

A lot of objects are generated to increase the GC's burden.

Burden = Game Level

Let's compare the original GC and Parallel GC

The original GC's pause time is long.

This game will be difficult.

Parallel GC's pause time is short.

This game will be easy.

OK, let's try!

DEMO: Original GC version

Difficult...!

DEMO: Parallel GC version


Easy!!

Let's compare the average GC times

Fast!!


Remaining Problems

Windows is not supported

The mark worker uses pthread for native threads

and some gcc built-in functions.

But I'll support Windows eventually.

Increased memory usage

The size of 1 deque is roughly 32KB.

But generally, multi-core machines have plenty of memory.

So, I think it's OK :P

Conclusion

I implemented Parallel Marking GC.

GC was improved!

I'll report to ruby-core soon.

But Parallel Marking has some problems.

I'll fix these.

Source code

Parallel Marking GC <URL:https://github.com/authorNari/ruby/tree/pmark_div_root2>

SUPER NARIO GC <URL:https://github.com/authorNari/nario/>

Acknowledgments

The following people helped me make this presentation!!

Tor-san!!

matz, shugo, yhara, sada, takaokouji, other co-workers!!

Google Translate

R.I.P.

John McCarthy (Sep 4, 1927 – Oct 23, 2011)


Gracias!!!

Do you have any questions?

Short and easy questions, please :)

Sorry

It's too difficult for me to understand/answer the question.

Could you send the question on Twitter (@nari_en)?