32
© 2006 Mulitfacet Project University of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture (HPCA-12)

© 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

Embed Size (px)

DESCRIPTION

3 2/15/06HPCA-12 Outline Background & Motivation –Why Hardware Transactional Memory (TM)? –How do TM systems differ? LogTM: Log-based Transactional Memory Evaluation Conclusion

Citation preview

Page 1: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

© 2006 Mulitfacet Project University of Wisconsin-Madison

LogTM:Log-Based Transactional Memory

Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood

12th International Symposium on High Performance Computer Architecture (HPCA-12)

Page 2: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

2 2/15/06 HPCA-12

Big Picture

• (Hardware) Transactional Memory promising– Most use lazy version management

• Old values “in place”• New values “elsewhere”

– Commits slower than aborts

• New LogTM: Log-based Transactional Memory– Uses eager version management (like most databases)

• Old values to log in thread-private virtual memory• New values “in place”

– Makes common commits fast!– Allows cache overflow– Aborts handled in software

Page 3: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

3 2/15/06 HPCA-12

Outline

• Background & Motivation– Why Hardware Transactional Memory (TM)?– How do TM systems differ?

• LogTM: Log-based Transactional Memory• Evaluation• Conclusion

Page 4: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

4 2/15/06 HPCA-12

Why (Hardware) Transactional Memory (TM)?

• CMPs make multithreaded programming important

• Locks Challenging• Transactional Memory Promising

– Interface intuitive• begin_transaction { atomic execution } end_transaction

– Implementation manages data versions & conflicts• Speed is important

– HTMs faster than STMs– HTMs faster than some lock regimes– Whole reason for parallelism

Page 5: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

5 2/15/06 HPCA-12

How Do Transactional Memory Systems Differ?

• (Data) Version Management– Keep old values for abort AND new values for commit

– Eager: record old values “elsewhere”; update “in place”

– Lazy: update “elsewhere”; keep old values “in place”

• (Data) Conflict Detection– Find read-write, write-read or write-write conflicts

among concurrent transactions

– Eager: detect conflict on every read/write

– Lazy: detect conflict at end (commit/abort)

Fastcommit

Less wasted work

Page 6: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

6 2/15/06 HPCA-12

Outline

• Background & Motivation• LogTM: Log-based Transactional Memory

– Eager Version Management– Eager Conflict Detection– Conflict Resolution (working solution)

• Evaluation• Conclusion

Page 7: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

7 2/15/06 HPCA-12

LogTM’s Eager Version Management

• Old values stored in the transaction log– A per-thread linear (virtual) address space (like the

stack)– Filled by hardware (during transactions)– Read by software (on abort)

• New values stored “in place”• Current design requires hardware support

Page 8: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

8 2/15/06 HPCA-12

12--------------

--------------23

34--------------

0

Transaction Log Example

00

40

C0

1000

1040

1080

• Initial State• LogBase = LogPointer• TM count > 0

Data BlockVA

Log Base

Log Ptr

TM count

1000

1000

1

0 0

R W

0 0

0 0

Page 9: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

9 2/15/06 HPCA-12

10001048--

34------------

12--------------

--------------23

34-------------- 0

Transaction Log Example

56--------------

00

40

C0

1000

1040

1080

• Store r2, (c0) /* r2 = 56 */– Set W bit for block (c0)– Store address (c0) and old

data on the log– Increment Log Ptr to 1048– Update memory

Data BlockVA

Log Base

Log Ptr

TM count

1000

1

0 0

R W

0 0

0 1

c0

Page 10: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

10 2/15/06 HPCA-12

12--------------

--------------23

56--------------

Transaction Log Example

00

40

C0

1000

1040

1080

• Commit transaction– Clear R & W for all blocks– Reset Log Ptr to Log Base

(1000)– Clear TM count

Data BlockVA

Log Base

Log Ptr

TM count

1000

1000

0

0 0

R W

0 0

0 0

34------------c0

--

0

0 0

1

1

1048

Page 11: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

11 2/15/06 HPCA-12

1

1090

Transaction Log Example

12--------------

--------------23

34--------------

00

40

C0

1000

1040

1080

• Abort transaction– Replay log entries to “undo”

the transaction– Reset Log Ptr to Log Base

(1000)– Clear R & W bits for all

blocks– Clear TM count

Data BlockVA

Log Base

Log Ptr

TM count

1000

1048

0

0 0

R W

0 0

0 0

c0 34------------

--

0

0 0

156--------------

1000

Page 12: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

12 2/15/06 HPCA-12

Eager Version Management Discussion

• Advantages:– Fast Commits

• No copying• Common case

– Unambiguous Data Values• Value of a memory location is the value of the last store (no

table lookup)

• Disadvantages– Slow/Complex Aborts

• Undo aborting transaction– Relies on Eager Conflict Detection/Prevention

Page 13: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

13 2/15/06 HPCA-12

LogTM’s Eager Conflict Detection

• Most Hardware TM Leverage Invalidation Cache Coherence– Add per-processor transactional write (W) & read (R) bits– Coherence protocol detects transactional data conflicts– E.g., Writer seeks M copy, seeks S copies, & finds R bit set

• LogTM detects conflicts this way using directory coherence– Requesting processor issues coherence request to directory– Directory forwards to other processor(s)– Responding processor detects conflict using local R/W bits

& informs requesting processor of conflict

Page 14: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

14 2/15/06 HPCA-12

I [old]M@P0 [old]

I (--) [none]M (--) [old]M (-W) [new]

Conflict Detection (example)

Directory

TM modeOverflow

00

P1

I (--) [none]

TM modeOverflow

00

1

• P0 store– P0 sends get exclusive

(GETX) request– Directory responds with

data (old)– P0 executes store

P0

GETX DATA

Page 15: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

15 2/15/06 HPCA-12

M (-W) [new]M (-W) [new]

Conflict Detection (example)

Directory

TM modeOverflow

00

P1

I (--) [none]

TM modeOverflow

00

M@P0 [old]

1

• In-cache transaction conflict– P1 sends get shared

(GETS) request– Directory forwards to P0– P1 detects conflict and

sends NACK

P0

GETSFwd_GETS

Conflict!NACK

Page 16: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

16 2/15/06 HPCA-12

M (-W) [new]I (--) [none]

M@P0 [old]Msticky@P0 [new]

Conflict Detection (example)

Directory

TM modeOverflow

00

P1

I (--) [none]

TM modeOverflow

00

1

• Cache overflow– P0 sends put exclusive

(PUTX) request– Directory acknowledges– P0 sets overflow bit– P0 writes data back to

memory

P0

PUTX ACK DATA

1

Page 17: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

17 2/15/06 HPCA-12

Conflict Detection (example)

Directory

I (--) [none]

TM modeOverflow

00

P1

I (--) [none]

TM modeOverflow

00

M@P0 [old]

1

• Out-of-cache conflict– P1 sends GETS request– Directory forwards to P0– P0 detects a (possible)

conflict– P0 sends NACK

P0

M (--) [old]M (-W) [new]

Msticky@P0 [new]

I (--) [none]

1

GETSFwd_GETS

Conflict!NACK

1

Page 18: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

18 2/15/06 HPCA-12

Conflict Detection (example)

Directory

I (--) [none]

TM modeOverflow

00

P1

I (--) [none]

TM modeOverflow

00

M@P0 [old]

1

• Commit– P0 clears TM mode and

Overflow bits

P0

M (--) [old]M (-W) [new]

Msticky@P0 [new]

I (--) [none]

100

Page 19: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

19 2/15/06 HPCA-12

Msticky@P0 [new]S(P1) [new]

000

Conflict Detection (example)

Directory

I (--) [none]

TM modeOverflow

0 P1

I (--) [none]

TM modeOverflow

00

• Lazy cleanup– P1 sends GETS request– Directory forwards request

to P0– P0 detects no conflict,

sends CLEAN– Directory sends Data to P1

P0

M (--) [old]M (-W) [new]I (--) [none]

GETSFwd_GETS

CLEAN DATA

S (--) [new]

Page 20: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

20 2/15/06 HPCA-12

LogTM’s Conflict Detection w/ Cache Overflow

• At overflow at processor P– Set P’s overflow bit (1 bit per processor)– Allow writeback, but set directory state to Sticky@P

• At transaction end (commit or abort) at processor P– Reset P’s overflow bit

• At (potential) conflicting request by processor R– Directory forwards R’s request to P.– P tells R “no conflict” if overflow is reset– But asserts conflict if set (w/ small chance of false positive)

Page 21: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

21 2/15/06 HPCA-12

Conflict Resolution

• Conflict Resolution– Can wait risking deadlock– Can abort risking livelock– Wait/abort transaction at requesting or responding proc?

• LogTM resolves conflicts at requesting processor– Requesting processor waits (using coherence nacks/retries)– But aborts if other processor is waiting (deadlock possible)

& it is logically younger (using timestamps)

• Future: Requesting processor traps to software contention manager that decides who waits/aborts

Page 22: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

22 2/15/06 HPCA-12

Outline

• Background & Motivation• LogTM: Log-based Transactional Memory• Evaluation

– Methods– Shared-Counter Microbenchmark – SPLASH2 Benchmarks

• Conclusion

Page 23: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

23 2/15/06 HPCA-12

Methods

• Simulated Machine: 32-way non-CMP– 32 SPARC V9 processors running Solaris 9 OS– 1 GHz in-order processors w/ ideal IPC=1 & private caches– 16 kB 4-way split L1 cache, 1 cycle latency– 4 MB 4-way unified L2 cache, 12 cycle latency– 4 GB main memory, 80-cycle access latency– Full-bit vector directory w/ directory cache

• Simulation Infrastructure– Virtutech Simics for full-system function– Magic no-ops instructions for begin_transaction()etc. – Multifacet GEMS for memory system timing (Ruby only)

GPL Release: http://www.cs.wisc.edu/gems/– LogTM simulator part of GEMS 2.0 (coming soon)

Page 24: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

24 2/15/06 HPCA-12

Microbenchmark Analysis

• Shared Counter– All threads update

the same counter– High contention– Small Transactions

• LogTM v. Locks– EXP - Test-And-Test-And-

Set Locks with Exponential Backoff

– MCS - Software Queue-Based Locks

BEGIN_TRANSACTION();

new_total = total.count + 1; private_data[id].count++; total.count = new_total;

COMMIT_TRANSACTION();

Page 25: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

25 2/15/06 HPCA-12

Shared Counter

• LogTM (like other HTMs) does not read/write lock• LogTM has few aborts despite conflicts

0

10

20

30

40

50

60

70

80

90

0 5 10 15 20 25 30 35Threads (on 32 Processors)

Execution Time (in millions of cycles)

EXPMCSLogTM

Page 26: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

26 2/15/06 HPCA-12

SPLASH-2 Benchmarks

Benchmark Input SynchronizationBarnes 512 Bodies Locks on tree nodes

Cholesky 14 Task queue locks

Ocean Contiguous partitions, 258 Barriers

Radiosity Room Task queue and buffer locks

Raytrace Small image (teapot) Work list and counter locks

Raytrace-Opt Small image (teapot) Work list and counter locks

Water N-Squared 512 Molecules barriers

False sharing optimization

Page 27: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

27 2/15/06 HPCA-12

SPLASH2 Benchmark Results

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

OCEAN RADIOSITY CHOLESKY RT-OPT RT-BASE BARNES WATERBenchmark

Execution Time (in millions of cycles)

4.18

2.68

• LogTM (like other HTMs) does not read/write lock• Allow “critical section parallelism” (e.g., 5.5 for RT-OPT)

LogT

M S

peed

up w

.r.t.

to P

AR

MA

C L

ocks

False sharing

Page 28: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

28 2/15/06 HPCA-12

SPLASH2 Benchmark Results

Benchmark % Stalls % Aborts

Barnes 4.89 15.3

Cholesky 4.54 2.07

Ocean .30 .52

Radiosity 3.96 1.03

Raytrace-Base 24.7 1.24

Raytrace-Opt 2.04 .41

Water 0 .11

Conflicts Less Common

Aborts

Page 29: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

29 2/15/06 HPCA-12

Conclusion

• Commits far more common than aborts– Conflicts are rare– Most conflicts can be resolved w/o aborts– Software aborts do not impact performance

• Overflows are rare (in current benchmarks)• LogTM

– Eager Version Management makes the common case (commit) fast

– Sticky States/Lazy Cleanup detects conflicts outside the cache (if overflows are infrequent)

– More work is needed to support virtualization and OS interaction

• False sharing has greater impact with TM

Page 30: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

© 2006 Mulitfacet Project University of Wisconsin-Madison

QUESTIONS?

Page 31: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

31 2/15/06 HPCA-12

How Do Transactional Memory Systems Differ?

Lazy Version Management

Eager Version Management

Lazy Conflict Detection

Eager Conflict Detection

Wisconsin LogTM

Herlihy/Moss TM

MIT LTM

Intel/Brown VTM

Databases withOptimistic Conc. Ctrl.

Stanford TCC

Databases withConservative C. Ctrl.

MIT UTM

Not done (yet)

Page 32: © 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark

32 2/15/06 HPCA-12

Hardware State

• R and W bits in cache – track read and write sets

• Register checkpoint– Fast save/restore

• Log Base and Log Pointer• TM mode bit

R W Tag Data

Data Caches

Processor

Registers Register Checkpoint

LogBase

LogPtr

TMmode