Extending Open64 with Transactional Memory features
Jiaqi Zhang
Tsinghua University
Contents
• Background
• Design
• Implementation
• Optimization
• Experiment
• Conclusion
Transactional Memory Background
• Trend to concurrent programming
• Current solution:
  – Lock
  – Flaws:
    • Association between locks and data
    • Deadlock
    • Not composable
Transactional Memory Background
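class Account {
    int balance;
    lock mylock;
    bool credit(int amount);
    bool debit(int amount);
};

bool credit(int amount) {
    acquire(mylock);
    balance += amount;
    release(mylock);
    return true;
}

bool debit(int amount) {
    acquire(mylock);
    balance -= amount;
    release(mylock);
    return true;
}

Composing the two operations leaves a window between them:

transfer(Account a, Account b, int amount) {
    a.credit(amount);
    // inconsistent state
    b.debit(amount);
}

Closing the window with locks:

transfer(Account a, Account b, int amount) {
    acquire(a.mylock);
    acquire(b.mylock);
    a.credit(amount);
    b.debit(amount);
    release(a.mylock);
    release(b.mylock);
}

• Poor abstraction of class Account
• Deadlock
• Exposed implementation details

With transactional memory:

transfer(Account a, Account b, int amount) {
    atomic {
        a.credit(amount);
        b.debit(amount);
    }
}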
Transactional Memory Background
• Current Implementations
  – TM libraries
    • DSTM
    • DracoSTM
    • TL2
    • TinySTM
    • ……..
  – Explicit transactions through function calls:
    TM_INIT() / TM_SHUTDOWN()
    TM_ATOMIC_BEGIN() / TM_ATOMIC_END()
    TM_SHARED_READ() / TM_SHARED_WRITE()
Transactional Memory Background
• Current Implementations
  – Compilers
    • Intel C++ STM Compiler
    • Tanger
    • OpenTM
    • GCC
Design
• Programming Interfaces

  #pragma tm atomic [clause]
      structured block

  where [clause] is one of:
      readonly
      private(var list)
      shared(var list)

  #pragma tm abort

  #pragma tm function
      function declaration

  #pragma tm waiver
      function declaration
Design
• TM runtime interfaces (TL2)

  Interface                                               Description
  Thread* TxNewThread()                                   Allocate a new Thread structure to keep logs
  TxStart(Thread* Self, jmp_buf* buf, int flags)          Start a new transaction for the current thread
  TxCommit(Thread* Self)                                  Commit the current transaction
  TxLoad(Thread* Self, void* addr)                        Perform a synchronized load from the given memory address
  TxStore(Thread* Self, void* addr, intptr_t val)         Perform a synchronized store to the given memory address
  TxStoreLocal(Thread* Self, void* addr, intptr_t val)    Perform a locally logged store to the given memory address
  TxAbort(Thread* Self)                                   Abort the current transaction and re-execute it
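The interface takes a jmp_buf in TxStart because an abort has to resume execution at the top of the transaction. The sketch below shows that control flow with a toy, single-threaded stand-in for the runtime. It reuses the names from the table, but the bodies are assumptions for illustration only: an undo log replaces TL2's real conflict detection and versioning, and transfer_demo with its forced abort is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

#define LOG_MAX 64

/* Toy per-thread descriptor (hypothetical layout, not TL2's). */
typedef struct {
    jmp_buf  *restart;         /* where TxAbort jumps back to            */
    intptr_t *addr[LOG_MAX];   /* undo log: addresses that were written  */
    intptr_t  old[LOG_MAX];    /* undo log: values before the write      */
    int       n;               /* number of undo-log entries             */
    int       aborts;          /* how many times the transaction restarted */
} Thread;

void TxStart(Thread *Self, jmp_buf *buf, int flags) {
    (void)flags;
    Self->restart = buf;
    Self->n = 0;
}

intptr_t TxLoad(Thread *Self, void *addr) {
    (void)Self;                       /* no conflict detection in this toy */
    return *(intptr_t *)addr;
}

void TxStore(Thread *Self, void *addr, intptr_t val) {
    assert(Self->n < LOG_MAX);
    Self->addr[Self->n] = (intptr_t *)addr;
    Self->old[Self->n]  = *(intptr_t *)addr;   /* remember the old value */
    Self->n++;
    *(intptr_t *)addr = val;
}

void TxAbort(Thread *Self) {
    while (Self->n > 0) {             /* roll back in reverse order */
        Self->n--;
        *Self->addr[Self->n] = Self->old[Self->n];
    }
    Self->aborts++;
    longjmp(*Self->restart, 1);       /* re-execute from the setjmp point */
}

void TxCommit(Thread *Self) { Self->n = 0; }

/* Moves amount from *b to *a; the first attempt is aborted on purpose. */
int transfer_demo(Thread *Self, intptr_t *a, intptr_t *b, intptr_t amount) {
    jmp_buf buf;
    volatile int attempts = 0;   /* volatile: changed between setjmp and longjmp */

    setjmp(buf);
    attempts++;
    TxStart(Self, &buf, 0);
    TxStore(Self, a, TxLoad(Self, a) + amount);
    if (attempts == 1)
        TxAbort(Self);           /* simulated conflict */
    TxStore(Self, b, TxLoad(Self, b) - amount);
    TxCommit(Self);
    return attempts;
}
```

Calling transfer_demo on balances 100 and 50 with amount 10 runs the body twice: the first attempt is rolled back, so the credit is not applied twice.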
Design
• Wrapper functions
  – To ease the process of integrating new TM libraries
  – Called by programmers:
    tm_init() / tm_finalize()
    tm_thread_start() / tm_thread_end()
  – Inserted by the compiler:
    __tm_atomic_begin() / __tm_atomic_end()
    __tm_shared_read() / __tm_shared_read_float()
    __tm_shared_write() / __tm_shared_write_float()
    __tm_local_write() / __tm_local_write_float()
  – More wrapper functions are needed for other data types and for additional TM semantics
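One way to picture how a wrapper layer keeps the TM library replaceable is a table of function pointers behind the compiler-inserted calls. This is only a sketch under assumptions: the slides do not give the wrappers' parameter lists or say how they bind to the library, so the signatures, the tm_backend table, tm_select_backend and the "direct" backend are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend table: one entry per wrapper used below. */
typedef struct {
    void     (*atomic_begin)(void);
    void     (*atomic_end)(void);
    intptr_t (*shared_read)(void *addr);
    void     (*shared_write)(void *addr, intptr_t val);
} tm_backend;

const tm_backend *tm_active_backend;

/* Hypothetical selection hook; a real build might bind at link time. */
void tm_select_backend(const tm_backend *b) {
    assert(b != NULL);
    tm_active_backend = b;
}

/* Wrappers the compiler would call. Double-underscore names are
 * reserved for the implementation, which suits compiler-inserted calls. */
void     __tm_atomic_begin(void)                    { tm_active_backend->atomic_begin(); }
void     __tm_atomic_end(void)                      { tm_active_backend->atomic_end(); }
intptr_t __tm_shared_read(void *addr)               { return tm_active_backend->shared_read(addr); }
void     __tm_shared_write(void *addr, intptr_t v)  { tm_active_backend->shared_write(addr, v); }

/* A trivial backend that only counts calls (no real synchronization). */
int direct_reads, direct_writes, direct_depth;
void     direct_begin(void)                  { direct_depth++; }
void     direct_end(void)                    { direct_depth--; }
intptr_t direct_read(void *addr)             { direct_reads++; return *(intptr_t *)addr; }
void     direct_write(void *addr, intptr_t v){ direct_writes++; *(intptr_t *)addr = v; }

const tm_backend direct_backend = {
    direct_begin, direct_end, direct_read, direct_write
};

/* What instrumented code for "counter++" inside an atomic block could look like. */
intptr_t run_increment(intptr_t *counter) {
    __tm_atomic_begin();
    __tm_shared_write(counter, __tm_shared_read(counter) + 1);
    __tm_atomic_end();
    return *counter;
}
```

With this shape, switching TM libraries means supplying another tm_backend; the instrumented code stays the same.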
Design
• Optimization– Eliminate redundant calls to runtime libraries
Implementation
• General Transformation
  – #pragma tm atomic

    setjmp();
    __tm_atomic_begin();

  – simple statements

    a = b+c;

      PARM #address of c
    CALL <__tm_shared_read>
      LDID <return_offset>
    STID #tm_preg_num_0
      PARM #address of b
    CALL <__tm_shared_read>
      LDID <return_offset>
    STID #tm_preg_num_1
          LDID #tm_preg_num_0
          LDID #tm_preg_num_1
        ADD
      PARM
      PARM #address of a
    CALL <__tm_shared_write>

  – control flow statements
    • IF
    • WHILE_DO

    for(;i<10;i++){}

      PARM #address of i
    CALL <__tm_shared_read>
      LDID <return_offset>
    STID #tm_preg_num_0
    WHILE_DO
        LDID #tm_preg_num_0
        INTCONST 9
      LE
    BODY
      BLOCK
        …………….
          PARM #address of i
        CALL <__tm_shared_read>
          LDID <return_offset>
        STID #tm_preg_num_0
      END_BLOCK
Implementation
• General Transformation

Source:

    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for (i = 0; i < 20; i++)
        {
            for (j = 0; j < 10; j++)
            {
                result++;
            }
        }
    }

Transformed:

    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    TxStore(Self, &j, 0);
    for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
         TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
        for (TxStore(Self, &j, 0); TxLoad(Self, &j) < 10;
             TxStore(Self, &j, TxLoad(Self, &j) + 1)) {
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
        }
    }
    TxCommit(Self);
Implementation
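• Functions
  – clone and instrument

A function marked as transactional:

    #pragma tm function
    void calculate() {}

is compiled into two versions:

    void calculate()
    __tm_cloned__calculate()    //instrumented

Calls inside a transaction:

    #pragma tm atomic
    {
        calculate();
    }

are redirected to the clone:

    #pragma tm atomic
    {
        __tm_cloned__calculate();
    }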
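A small sketch of what clone-and-instrument could produce, written by hand in plain C. The names calculate and __tm_cloned__calculate come from the slide; the function body, the shared variable total, and the barrier signatures are assumptions, and the barriers here only count calls instead of synchronizing.

```c
#include <assert.h>
#include <stdint.h>

intptr_t total;       /* shared data touched by calculate() (hypothetical) */
int barrier_calls;    /* counts barrier invocations in this toy            */

/* Stand-ins for the read/write wrappers; signatures are assumed. */
intptr_t __tm_shared_read(void *addr) {
    barrier_calls++;
    return *(intptr_t *)addr;
}
void __tm_shared_write(void *addr, intptr_t val) {
    barrier_calls++;
    *(intptr_t *)addr = val;
}

/* Original version: kept for calls made outside any transaction. */
void calculate(void) {
    total = total + 1;
}

/* Clone: same logic, with shared accesses routed through barriers. */
void __tm_cloned__calculate(void) {
    __tm_shared_write(&total, __tm_shared_read(&total) + 1);
}

/* A call site outside a transaction keeps calling the original. */
void caller_outside(void) {
    calculate();
}

/* A call site inside "#pragma tm atomic" is redirected to the clone. */
void caller_inside(void) {
    __tm_cloned__calculate();
}

/* Usage: only the transactional call pays for barriers. */
void demo(void) {
    caller_outside();
    assert(barrier_calls == 0);
    caller_inside();
    assert(barrier_calls == 2);
}
```

Keeping both versions means non-transactional callers pay no barrier cost, at the price of larger code.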
Implementation
• Optimization

Source:

    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for (i = 0; i < 20; i++)
        {
            for (j = 0; j < 10; j++)
            {
                result++;
            }
        }
    }

Transformed (before optimization):

    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    TxStore(Self, &j, 0);
    for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
         TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
        for (TxStore(Self, &j, 0); TxLoad(Self, &j) < 10;
             TxStore(Self, &j, TxLoad(Self, &j) + 1)) {
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
        }
    }
    TxCommit(Self);

Transaction-local variables: detected by the frontend
Implementation
• Optimization

Source:

    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for (i = 0; i < 20; i++)
        {
            for (j = 0; j < 10; j++)
            {
                result++;
            }
        }
    }

Transformed (j treated as transaction-local):

    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    j = 0;
    for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
         TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
        for (j = 0; j < 10; j++) {
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
        }
    }
    TxCommit(Self);

Barrier-free variables: detected according to their storage class
Implementation
• Optimization

Source:

    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for (i = 0; i < 20; i++)
        {
            for (j = 0; j < 10; j++)
            {
                result++;
            }
        }
    }

Transformed (i treated as barrier-free):

    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    j = 0;
    for (; i < 20; TxStoreLocal(Self, &i, i + 1)) {
        for (j = 0; j < 10; j++) {
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
        }
    }
    TxCommit(Self);
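The saving from the two optimizations can be checked by counting runtime calls. The sketch below runs the three transformed loop nests from the slides against counting stubs. The stubs are assumptions that only count and forward the access; setjmp, TxStart and TxCommit are left out because they do not affect the counts, and result is a local here although it stands for a shared variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counting stand-in for the runtime's Thread descriptor. */
typedef struct { long loads, stores, local_stores; } Thread;

intptr_t TxLoad(Thread *s, intptr_t *a)                   { s->loads++; return *a; }
void     TxStore(Thread *s, intptr_t *a, intptr_t v)      { s->stores++; *a = v; }
void     TxStoreLocal(Thread *s, intptr_t *a, intptr_t v) { s->local_stores++; *a = v; }

/* General transformation: every access goes through a barrier. */
intptr_t run_general(Thread *Self) {
    intptr_t i = 0, j, result = 0;
    TxStore(Self, &j, 0);
    for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
         TxStore(Self, &i, TxLoad(Self, &i) + 1))
        for (TxStore(Self, &j, 0); TxLoad(Self, &j) < 10;
             TxStore(Self, &j, TxLoad(Self, &j) + 1))
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
    return result;
}

/* j is transaction-local: plain accesses to j. */
intptr_t run_local_j(Thread *Self) {
    intptr_t i = 0, j, result = 0;
    j = 0;
    for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
         TxStore(Self, &i, TxLoad(Self, &i) + 1))
        for (j = 0; j < 10; j++)
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
    return result;
}

/* i is barrier-free: plain loads, locally logged stores. */
intptr_t run_barrier_free_i(Thread *Self) {
    intptr_t i = 0, j, result = 0;
    j = 0;
    for (; i < 20; TxStoreLocal(Self, &i, i + 1))
        for (j = 0; j < 10; j++)
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
    return result;
}

/* Usage: all three variants compute 200, with fewer barrier calls each time. */
void demo(void) {
    Thread a = {0}, b = {0}, c = {0};
    assert(run_general(&a) == 200);
    assert(a.loads == 661 && a.stores == 442);
    assert(run_local_j(&b) == 200);
    assert(b.loads == 241 && b.stores == 221);
    assert(run_barrier_free_i(&c) == 200);
    assert(c.loads == 200 && c.stores == 200 && c.local_stores == 20);
}
```

For this loop nest the total drops from 1103 runtime calls to 462 once j is transaction-local, and to 420 once i is barrier-free. The remaining 400 calls are the load and store of result in the inner body.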
Implementation
• Optimization
  – Optimization opportunities detection strategy
    • Pthread parallel task
      – transaction local: declared in tm atomic scope
      – barrier free: auto variables
    • Cloned transactional function
      – transaction local: declared in the function
    • OpenMP parallel task
      – transaction local: declared in tm atomic scope
      – barrier free: declared in micro task, marked in OpenMP private clause
    • Checking readonly transactions
  – Limitation
    • Conservative (reserved) design for pointers
    • Needs programmers to participate in optimization
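The detection strategy above reads as a small decision table. The following sketch encodes it as a classification function. All type and field names are hypothetical, the real compiler works on its own symbol table rather than on flags like these, and the two OpenMP barrier-free conditions are treated as alternatives, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the access is being compiled (hypothetical names). */
typedef enum { CTX_PTHREAD, CTX_CLONED_FUNCTION, CTX_OPENMP } context_kind;

/* Outcome of the analysis for one variable. */
typedef enum {
    ACCESS_FULL_BARRIER,   /* shared: full read/write barriers          */
    ACCESS_BARRIER_FREE,   /* plain loads, locally logged stores        */
    ACCESS_TX_LOCAL        /* lives only inside the transaction: plain  */
} access_kind;

/* Facts about a variable that the strategy looks at (hypothetical). */
typedef struct {
    bool declared_in_atomic_scope;
    bool declared_in_function;    /* inside the cloned function body */
    bool is_auto;                 /* automatic storage class         */
    bool declared_in_microtask;   /* OpenMP outlined task            */
    bool in_omp_private;          /* listed in a private clause      */
} var_info;

access_kind classify(context_kind ctx, const var_info *v) {
    assert(v != NULL);
    switch (ctx) {
    case CTX_PTHREAD:
        if (v->declared_in_atomic_scope) return ACCESS_TX_LOCAL;
        if (v->is_auto)                  return ACCESS_BARRIER_FREE;
        break;
    case CTX_CLONED_FUNCTION:
        if (v->declared_in_function)     return ACCESS_TX_LOCAL;
        break;
    case CTX_OPENMP:
        if (v->declared_in_atomic_scope) return ACCESS_TX_LOCAL;
        if (v->declared_in_microtask || v->in_omp_private)
            return ACCESS_BARRIER_FREE;
        break;
    }
    return ACCESS_FULL_BARRIER;   /* conservative default */
}
```

Applied to the running loop example in a Pthread task, j (declared inside the atomic block) comes out transaction-local, i (an auto variable declared outside it) comes out barrier-free, and a global such as result keeps full barriers.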
Preliminary Experiments
• Compare with fine-grained lock based application
Preliminary Experiments
• Compare with manually instrumented application
Preliminary Experiments
#pragma tm atomic
{
    int j;
    (*new_centers_len[index])++;
    for (j = 0; j < nfeatures; j++) {
        new_centers[index][j] += feature[i][j];
    }
}
private(feature)
Conclusion & Future work
• An infrastructure for TM on Open64
  – Replaceable TM implementation
  – Optimization
• More experiments on non-trivial applications are desired
• Nested transaction
• Signal processing
• Event handler
• Indirect calls
• Dealing with legacy code
• …
FastDB: 8 out of 75 critical regions contain nested transactions
FastDB: 28 out of 75 critical regions contain signal processing
PARSEC: 20 out of 55 critical regions contain signal processing
Thanks