Monday, April 10, 2023
International Symposium on Memory Management
Vancouver BC, October 2004
Barriers: Friend or Foe?
Steve Blackburn, Department of Computer Science
Australian National University
Tony Hosking, Department of Computer Sciences
Purdue University
Read & Write Barrier Costs
Are r/w barrier costs significant?
Read and Write Barriers
• Algorithmically powerful mechanisms
  – Extend semantics of each read/write
• Particularly useful to GC
• Untested assumption: “read/write barriers are expensive”
  – Curtails creativity in GC algorithm development
  – Encourages (unnecessary?) work on avoidance
• Prior work
  – [Zorn 1990] (used simulation & traces)
  – [Blackburn & McKinley 2002] (compilation & inlining)
Our Contributions
• Methodology for measurement
• Evaluate mutator overhead
  – 5 common write barriers, 2 read barriers
  – 9 benchmarks
  – 3 architectures (AMD, P4, PPC)
  – Exclude compiler, GC from measurements
Methodology
• Want to remove barrier
  – Compare with and without barrier
• Add full trace to generational collector
  – Remembered objects irrelevant
  – Can include/exclude barrier
• MMTk, Jikes RVM
  – Hardware performance counters
  – Pseudo-adaptive (realistic, deterministic)
  – Second iteration (avoid compiler overhead)
  – Best of 5 (least disturbed)
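The with/without comparison above comes down to a small calculation: take the best of the timed runs for each build and report the relative difference. The sketch below is only an illustration of that arithmetic. The class name, method names and timing values are hypothetical and are not part of MMTk, Jikes RVM or the authors' harness.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of MMTk or Jikes RVM. */
final class BarrierOverhead {
    /** Best (minimum) of several timed runs, i.e. the least disturbed one. */
    static double best(double[] runsMillis) {
        return Arrays.stream(runsMillis).min().getAsDouble();
    }

    /** Relative overhead of the build with the barrier over the build without it. */
    static double overhead(double[] withBarrier, double[] withoutBarrier) {
        double with = best(withBarrier);
        double without = best(withoutBarrier);
        return (with - without) / without;
    }

    public static void main(String[] args) {
        // Made-up mutator times in milliseconds, five runs per configuration.
        double[] with = {1050.0, 1032.0, 1041.0, 1030.0, 1038.0};
        double[] without = {1012.0, 1000.0, 1007.0, 1003.0, 1010.0};
        System.out.printf("overhead = %.1f%%%n", 100 * overhead(with, without));
    }
}
```

Using the minimum rather than the mean matches the "best of 5" bullet: the fastest run is the one least affected by unrelated system activity.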
Write Barrier Code
public final void writeBarrier(ObjectReference src, Address slot,
                               ObjectReference tgt, int mode)
    throws InlinePragma {
  // insert write barrier code here
  slot.store(tgt);
}
Write Barrier Code cont.

Boundary (Slot)
  Java:
    if (slot.LT(NURSERY_START)
        && tgt.GE(NURSERY_START))
      remSlots.insert(slot);
  PPC asm:
    liu   R3,0x6e10
    cmplW cr1,R30,R3
    bge   1 54
    liu   R3,0x6e10
    cmplW cr1,R31,R3
    bge   1 7c
  x86 asm:
    cmp  edi 0xa0200000
    jlge 0
    cmp  ebx 0xa0200000
    jlge 0

Object
  Java:
    if (getHeader(src)
          .and(LOGGING_MASK)
          .EQ(UNLOGGED))
      rememberObject(src);
  PPC asm:
    lwz   R4,-8(R5)
    rlinm R4,R4,0x0,0x1d,0x1d
    cmpiW cr1,R4,0x4
    beq   1 78
  x86 asm:
    mov ecx -8[edx]
    and ecx 4
    cmp ecx 4
    jeq 0

Card
  Java:
    int card = src.rshl(LOG_CARD_SIZE);
    cardTable.add(card).store((byte) 1);
  PPC asm:
    lwz   R5,0x1664(JT)
    rlinm R6,R3,0x16,0xa,0x1f
    lil   R7,0x1
    stbx  R7,R5,R6
  x86 asm:
    mov ebx [0x290279a]
    shr eax 10
    mov [0+ebx+eax<<0] 1

Zone
  Java:
    if (slot.xor(tgt).GE(ZONE_SIZE))
      remSlots.insert(slot);
  PPC asm:
    xor   R3,R30,R31
    liu   R5,0x40
    cmplW cr1,R3,R5
    bge   1 74
  x86 asm:
    mov  edi eax
    mov  eax edi
    xor  eax ebx
    cmp  eax 0x400000
    jlge 0
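To experiment with the four filter conditions outside a VM, they can be modelled on plain integers. The sketch below is a simplified illustration, not MMTk code: addresses are `long` values rather than `Address` objects, the class and method names are hypothetical, and the constants are assumptions chosen to match the immediates visible in the x86 listings (`0xa0200000`, `0x400000`, shift by 10, mask 4).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only: addresses are plain longs, not MMTk Address values. */
final class WriteBarrierSketch {
    // Assumed constants, chosen to match the immediates in the x86 listings.
    static final long NURSERY_START = 0xa0200000L;
    static final long ZONE_SIZE = 0x400000L;
    static final int LOG_CARD_SIZE = 10;
    static final int LOGGING_MASK = 4;
    static final int UNLOGGED = 4;

    /** Boundary (slot) barrier: slot lies below the nursery, target at or above its start. */
    static boolean boundaryRemembers(long slot, long tgt) {
        return slot < NURSERY_START && tgt >= NURSERY_START;
    }

    /** Object barrier: the source object's header still carries the UNLOGGED bit. */
    static boolean objectNeedsLogging(int header) {
        return (header & LOGGING_MASK) == UNLOGGED;
    }

    /** Card barrier: index of the card-table byte to mark for the source address. */
    static long cardIndex(long src) {
        return src >>> LOG_CARD_SIZE;
    }

    /** Zone barrier: slot and target differ in bits at or above the zone size. */
    static boolean zoneRemembers(long slot, long tgt) {
        return (slot ^ tgt) >= ZONE_SIZE;
    }

    public static void main(String[] args) {
        List<Long> remSlots = new ArrayList<>();
        long slot = 0x90000000L;   // hypothetical slot outside the nursery
        long tgt = 0xa0200010L;    // hypothetical target inside the nursery
        if (boundaryRemembers(slot, tgt)) {
            remSlots.add(slot);
        }
        System.out.println("remembered slots: " + remSlots.size());
        System.out.println("card index: " + cardIndex(slot));
    }
}
```

Note that the card barrier is the only one of the four with no test and no branch: it always performs a shift and a byte store, which is why its listings contain no compare or jump instruction.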
Experiments: Hardware
• 3 platforms:
  – 1.9GHz AMD Athlon XP 2600, 1GB
  – 2.6GHz Pentium 4, 1GB
  – 1.6GHz PowerPC 970, 768MB
• AMD and Intel performance counters
  – cycles
  – instructions retired
  – L1/L2 cache misses
  – TLB misses
  – both mutator and collector, separately
Experiments: Software
• MMTk in Jikes RVM version 2.3.2+CVS
  – ignore remsets GC configuration (now in MMTk)
  – patched to support performance counters
  – pseudo-adaptive compilation
  – read barriers
• Debian Linux 2.6.0 kernel + x86 perfctr
• Standalone mode
Write Barrier Overhead (mean of SPECjvm98 & SPECjbb)
[Bar chart: overhead (axis 0% to 6%) for the Boundary, Object, Hybrid, Zone and Card barriers; series: amd, p4, ppc]
AMD Athlon 2600+ 1.9GHz: Write Barrier
[Bar chart: per-benchmark results (axis -4% to 12%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
Intel P4 2.6GHz: Write Barrier
[Bar chart: per-benchmark results (axis -4% to 10%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
G5 PowerPC 970 1.6GHz: Write Barrier
[Bar chart: per-benchmark results (axis -2% to 14%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
Performance Counters
Intel P4 2.6GHz: Write Barrier Retired Instructions
[Bar chart: per-benchmark results (axis -2% to 12%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
Intel P4 2.6GHz: Write Barrier L1 Misses
[Bar chart: per-benchmark results (axis -50% to 20%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
Intel P4 2.6GHz: Write Barrier L2 Misses
[Bar chart: per-benchmark results (axis -60% to 140%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
Intel P4 2.6GHz: Write Barrier DTLB Misses
[Bar chart: per-benchmark results (axis -15% to 25%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Boundary, Object, Hybrid, Zone, Card]
Read Barrier Code
public final ObjectReference readBarrier(ObjectReference obj,
                                         Address slot, int mode)
    throws InlinePragma {
  ObjectReference value = slot.loadObjectReference();
  return value; // insert read barrier code here
}
Read Barrier Code cont.

Unconditional
  Java:
    return value.and(~3);
  PPC asm:
    rlinm R3,R3,0x0,0x0,0x1d
  x86 asm:
    and eax -4

Conditional
  Java:
    if (value.and(1).NE(1))
      return value;
    else
      return 0;
  PPC asm:
    rlinm R4,R3,0x0,0x1f,0x1f
    cmpiW cr1,R4,0x1
    bne   1 3c
  x86 asm:
    mov    edx eax
    and    edx 1
    cmp    edx 1
    mov    edx 0
    cmovne edx eax
    mov    eax edx
Read Barrier Overhead (mean of SPECjvm98 & SPECjbb)
[Bar chart: overhead (axis 0% to 25%) for the Unconditional and Conditional barriers; series: amd, p4, ppc]
AMD Athlon 2600+ 1.9GHz: Read Barrier
[Bar chart: per-benchmark results (axis 0% to 40%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Unconditional, Conditional]
Intel P4 2.6GHz: Read Barrier
[Bar chart: per-benchmark results (axis 0% to 35%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Unconditional, Conditional]
G5 PowerPC 970 1.6GHz: Read Barrier
[Bar chart: per-benchmark results (axis -4% to 16%) for _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb and the mean; series: Unconditional, Conditional]
Conclusions
• New methodology: available in MMTk
  – Specific barrier patches at: http://cs.anu.edu.au/~Steve.Blackburn/pubs/wb-ismm-2004.tgz
• Barrier costs (often) surprisingly low
• Barrier costs very architecturally sensitive
  – GC developers: think about your target arch.
  – GC papers: what architecture did they use?
  – Architects: choices impact OO languages in surprising ways.