Read-Write Lock Allocation in Software Transactional Memory


Amir Ghanbari Bavarsad and Ehsan Atoofian

Lakehead University

[Figure: processors P1 … Pn, each with a private cache ($), sharing a single Global Clock]

Transactional Memory

Software transactional memory (STM) exploits a global clock to validate transactional data
    Pros: reduces validation overhead
    Cons: contention

Alternative: Read Write Lock Allocation (RWLA)
    Pros: no central clock
    Cons: overhead if a TX aborts

Speculative RWLA: changes validation policy dynamically → Speedup: up to 66%

Outline

Background

RWLA

Speculative RWLA

Conclusion


Counter in STM

T1:
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();

Transactional data are validated using:

Global clock
    Shared variable
    Timestamp for transactions

Lock
    Memory is mapped to Lock Table
    Each entry of the table: Version #

Validation in STM

[Figure: the Global Clock alongside Memory mapped to a Lock Table whose entries each hold a Version #]

Updating Global Clock & Lock

Increment Global Clock
Version # = global_clock

[Figure: Global Clock, Memory, and Lock Table; the entry for counter holds its Version #]
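The two commit-time steps above can be sketched in C as follows. This is a minimal illustration rather than the TL2 or GV4 code: the table size, the address hash, and the names `lock_for` and `commit_write` are assumptions, and the clock is advanced with a plain atomic fetch-and-add.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 1024u  /* hypothetical size, power of two */

static atomic_uint_fast64_t global_clock;                /* shared by all threads */
static atomic_uint_fast64_t lock_table[LOCK_TABLE_SIZE]; /* one Version # per entry */

/* Hypothetical hash: map a memory address to its lock-table entry. */
static atomic_uint_fast64_t *lock_for(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    return &lock_table[(a >> 3) & (LOCK_TABLE_SIZE - 1u)];
}

/* Commit one written address: increment the clock, then stamp the entry. */
static uint64_t commit_write(const void *addr) {
    uint64_t wv = atomic_fetch_add(&global_clock, 1) + 1; /* new clock value */
    atomic_store(lock_for(addr), wv);                     /* Version # = global_clock */
    return wv;
}
```

Every committing writer touches `global_clock`, so the cache line holding it bounces between processors; that shared update is the contention the slides attribute to the global clock.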


Validation in STM

rv (read version) is set to global_clock

[Figure: T1's metadata for TX1 stores rv, copied from the Global Clock]

Successful Read Validation

rv >= version#: the most recent write to counter occurred before TM_BEGIN()

[Figure: T1, metadata for TX1 (rv), and the Global Clock]

Failed Read Validation

rv < version#: the most recent write to counter occurred after TM_BEGIN()

[Figure: T1, metadata for TX1 (rv), and the Global Clock]
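The read-validation rule from the last three slides can be written down as a small C sketch. The names `tx_meta`, `tx_begin`, and `read_is_valid` are hypothetical; only rv, the version number, and the comparison between them come from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-transaction metadata: rv is sampled at TM_BEGIN(). */
typedef struct {
    uint64_t rv;
} tx_meta;

/* rv (read version) is set to the current value of the global clock. */
static void tx_begin(tx_meta *tx, uint64_t global_clock_now) {
    tx->rv = global_clock_now;
}

/* Valid when the entry was last written no later than TM_BEGIN(). */
static bool read_is_valid(const tx_meta *tx, uint64_t version) {
    return tx->rv >= version;
}
```

For example, a transaction that begins when the clock reads 5 accepts an entry with version 3 or 5 and rejects one with version 7, because that write happened after TM_BEGIN().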

Overhead of Validation

This method, called GV4, results in many cache coherence misses if transactions commit frequently

[Figure: processors P1 … Pn, each with a private cache ($), sharing a single Global Clock]


Read Write Lock Allocation (RWLA)

Lock
    Memory is mapped to Lock Table
    Each entry of the table: lock bit, read bits

[Figure: Memory mapped to a Lock Table; each entry holds a lock bit and read bits for P0, P1, …, Pn-1]
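One possible encoding of an RWLA lock-table entry is sketched below. The slides give only the fields (a lock bit and read bits for P0 … Pn-1); packing them into a 64-bit word, placing the lock bit at bit 0, the table size, and the address hash are all assumptions made for illustration.

```c
#include <stdint.h>

#define RWLA_TABLE_SIZE 1024u          /* hypothetical size, power of two */
#define RWLA_LOCK_BIT   ((uint64_t)1)  /* assumed: bit 0 is the lock bit */

static uint64_t rwla_table[RWLA_TABLE_SIZE];

/* Read bit of thread `tid` (P0 ... Pn-1); assumed to occupy bits 1..63,
 * so `tid` must be below 63 in this layout. */
static uint64_t rwla_read_bit(unsigned tid) {
    return (uint64_t)1 << (tid + 1u);
}

/* Mask covering every read bit in an entry. */
static uint64_t rwla_all_read_bits(void) {
    return ~RWLA_LOCK_BIT;
}

/* Hypothetical hash from a memory address to its lock-table index. */
static unsigned rwla_index(const void *addr) {
    return (unsigned)(((uintptr_t)addr >> 3) & (RWLA_TABLE_SIZE - 1u));
}
```

With this layout an entry carries no version number, so validating a read does not need a value sampled from a central clock.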


TM_READ

[Figure: lock entry with all bits clear: 000000 …]

TM_READ

TM_READ() → Lock bit is free?
    Yes → Set read bit in the corresponding lock entry

[Figure: lock entry with the lock bit clear and one read bit set]

TM_READ

TM_READ() → Lock bit is free?
    Yes → Set read bit in the corresponding lock entry
    No → Abort

[Figure: lock entry with the lock bit set: 100000 …]
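The TM_READ flowchart can be sketched in C as below. The bit layout (lock bit at bit 0, read bits above it) and the use of a compare-and-swap loop to set the read bit atomically are assumptions; the slides specify only the decision and its two outcomes. `rwla_read` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT ((uint64_t)1)  /* assumed: bit 0 is the lock bit */

/* Returns true if the read bit was set, false if the transaction must abort. */
static bool rwla_read(_Atomic uint64_t *entry, unsigned tid) {
    uint64_t read_bit = (uint64_t)1 << (tid + 1u);  /* assumed: bits 1..63 */
    uint64_t old = atomic_load(entry);
    for (;;) {
        if (old & LOCK_BIT) {
            return false;  /* lock bit is not free: abort */
        }
        /* Set our read bit only if the entry is unchanged since we looked. */
        if (atomic_compare_exchange_weak(entry, &old, old | read_bit)) {
            return true;
        }
        /* A failed CAS reloads `old`; loop to re-check the lock bit. */
    }
}
```

Several readers can hold read bits on the same entry at once, since each one sets a different bit.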


TM_WRITE

TM_WRITE → All read bits are clear?
    No → Abort

[Figure: lock entry with a read bit set: 000100 …]

TM_WRITE

TM_WRITE → Acquire lock → failed?
    Yes → Abort

[Figure: lock entry with the lock bit already set: 100000 …]

TM_WRITE

TM_WRITE → Acquire lock → failed?
    Yes → Abort
    No → lock bit is set

[Figure: lock entry with the lock bit set and all read bits clear: 1 0 00000 …]
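The TM_WRITE flowchart can be sketched in the same style. As before, the bit layout and the compare-and-swap used to acquire the lock are assumptions, and `rwla_write` is a hypothetical name. The slides do not say how a writer's own read bit is treated, so this sketch follows the flowchart literally and aborts when any read bit is set.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT  ((uint64_t)1)  /* assumed: bit 0 is the lock bit */
#define READ_BITS (~LOCK_BIT)    /* assumed: all remaining bits are read bits */

/* Returns true if the lock was acquired, false if the transaction must abort. */
static bool rwla_write(_Atomic uint64_t *entry) {
    uint64_t seen = atomic_load(entry);
    if (seen & READ_BITS) {
        return false;  /* not all read bits are clear: abort */
    }
    /* Acquire the lock: succeeds only if the entry is still completely free.
     * It fails if the lock bit is already set or a reader arrived meanwhile. */
    uint64_t expected = 0;
    return atomic_compare_exchange_strong(entry, &expected, LOCK_BIT);
}
```

A writer that returns true owns the entry; any TM_READ on the same entry then sees the lock bit set and aborts.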

Experimental Framework

Benchmarks: Stamp v0.9.7
    Run to completion
    Measured statistics over 10 runs

TL2 as an STM framework

Two Intel Xeon E5660, 6-way CMP

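Performance of RWLA

[Figure: RWLA performance for Bayes, Kmeans, Labyrinth, Ssca2, Vacation, and Genome with 2, 4, 8, and 16 threads plus the average (AVG.); the y-axis runs from 0 to 1.6 and an arrow marks the "better" direction]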

Speculative RWLA

Conflict occurs frequently → select GV4
Conflict occurs rarely → select RWLA
How to predict conflict?

Contention Predictor

Prediction:
    y ≥ 0 → predict commit
    y < 0 → predict abort

Update:
    If the outcomes of the current TX and TXi agree/disagree → increment/decrement wi

[Figure: perceptron with inputs 1, x1, …, xn weighted by w0, w1, …, wn and producing output y]

$$y = w_0 + \sum_{i=1}^{n} x_i w_i$$

xi: global transaction history, bipolar value

wi: weight vector
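A minimal C sketch of such a perceptron predictor follows. The history length, the encoding +1 = commit and -1 = abort, the initial state, and all names are assumptions. The sketch updates the weights after every transaction and omits weight saturation, neither of which the slides specify, and it maps a predicted abort to GV4 and a predicted commit to RWLA as one reading of the Speculative RWLA policy.

```c
#include <stdbool.h>

#define HISTORY_LEN 8  /* hypothetical history length n */

typedef enum { POLICY_RWLA, POLICY_GV4 } policy;

typedef struct {
    int w[HISTORY_LEN + 1];  /* w[0] is the bias weight w0 */
    int x[HISTORY_LEN];      /* global history: +1 = commit, -1 = abort (assumed) */
} predictor;

/* Start with zero weights and an all-commit history (an assumption). */
static void predictor_init(predictor *p) {
    for (int i = 0; i <= HISTORY_LEN; i++) p->w[i] = 0;
    for (int i = 0; i < HISTORY_LEN; i++) p->x[i] = 1;
}

/* y = w0 + sum over i = 1..n of x_i * w_i */
static int predictor_output(const predictor *p) {
    int y = p->w[0];
    for (int i = 0; i < HISTORY_LEN; i++) {
        y += p->x[i] * p->w[i + 1];
    }
    return y;
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
static bool predict_commit(const predictor *p) {
    return predictor_output(p) >= 0;
}

/* Assumed mapping: a predicted abort suggests frequent conflicts, so use GV4. */
static policy choose_policy(const predictor *p) {
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}

/* Train on the real outcome, then shift it into the history. */
static void predictor_update(predictor *p, bool committed) {
    int t = committed ? 1 : -1;
    p->w[0] += t;  /* the bias input is the constant 1 */
    for (int i = 0; i < HISTORY_LEN; i++) {
        p->w[i + 1] += (p->x[i] == t) ? 1 : -1;  /* agree: +1, disagree: -1 */
    }
    for (int i = HISTORY_LEN - 1; i > 0; i--) {
        p->x[i] = p->x[i - 1];
    }
    p->x[0] = t;
}
```

A caller would invoke `choose_policy` before starting a transaction and `predictor_update` once the transaction commits or aborts.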

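Performance of Speculative RWLA

# of threads changes between 2 and 16
On average, performance changes from 21% in Bayes to 47% in Labyrinth

[Figure: Speculative RWLA performance for Bayes, Kmeans, Labyrinth, Ssca2, Vacation, and Genome with 2, 4, 8, and 16 threads plus the average (AVG.); the y-axis runs from 0 to 1.2 and an arrow marks the "better" direction]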

Conclusion

RWLA overcomes contention over the global clock

Applications react differently to GV4 and RWLA

Speculative RWLA changes validation policy dynamically

Speculative RWLA improves the performance of STMs by up to 66%

Thank You!

Questions?
