Inherent limitations facilitate design & verification of concurrent programs Hagit Attiya Technion


Concurrent Programs

• Core challenge is synchronization

• Correct synchronization is hard to get right

• Efficient synchronization is even harder

Principled, Automatic approach

Work with Ramalingam and Rinetzky (POPL 2010)

EXAMPLE I: VERIFYING LOCKING PROTOCOLS

The Goal: Sequential Reductions

Verify concurrent data structures

• Pre-execution static analysis

E.g., linked list with hand-over-hand locking

• no memory leaks, shape (it’s a list), serializability

Find sequential reductions

• Consider only sequential executions

• But conclude that properties hold in all executions

Back-of-envelope estimate of gain

Static analysis of a linked-list algorithm

[Amit, Rinetzky, Reps, Sagiv, Yahav, CAV 2007]

– Verifies e.g., memory safety, sortedness, pointed-to by a variable, heap sharing

Configuration                  Time    Memory
One thread (sequential)        10s     3.6MB
Two threads (interleaved)      ~4h     886MB
Three threads (interleaved)    > 8h    ----

Serializability

An interleaved execution of operations is serializable if it is equivalent (~), as observed by the threads locally, to a complete non-interleaved execution [Papadimitriou ‘79]

Serializability gives Sequential Reduction

Concurrent code M

A small subset of all executions

If M is serializable, then a local property φ holds in all executions of M iff φ holds in all complete non-interleaved executions

Easily derived from [Papadimitriou ‘79]

How do we know that M is serializable, without considering all executions?

Special (and common) case: Disciplined programming with locks

• Guard access to data with locks (lock & unlock)

• Only one process holds the lock at any time

Follow a locking protocol that guarantees conflict serializability

E.g., two-phase locking (2PL) or tree locking (TL)

Two-phase locking

[Papadimitriou `79]

• Lock-acquire (grow) phase followed by lock-release (shrink) phase

• No lock is acquired after some lock is released

[Figure: list with head H; nodes locked by threads t1 and t2]
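The two-phase rule above can be checked mechanically on a recorded lock trace. The following is a minimal sketch, assuming a hypothetical trace format of (thread, action, lock) tuples that is not part of the talk:

```python
def follows_2pl(trace):
    """Return True if every thread in the trace is two-phase locked.

    trace: list of (thread, action, lock) tuples,
           with action in {"acquire", "release"} (hypothetical format).
    """
    released = set()  # threads that have already entered their shrink phase
    for thread, action, _lock in trace:
        if action == "release":
            released.add(thread)
        elif action == "acquire" and thread in released:
            return False  # a lock is acquired after some lock was released
    return True


# Grow phase (A, B) followed by shrink phase (A, B): allowed.
ok_trace = [("t1", "acquire", "A"), ("t1", "acquire", "B"),
            ("t1", "release", "A"), ("t1", "release", "B")]

# B is acquired after A was released: not two-phase.
bad_trace = [("t1", "acquire", "A"), ("t1", "release", "A"),
             ("t1", "acquire", "B"), ("t1", "release", "B")]

ok_result = follows_2pl(ok_trace)
bad_result = follows_2pl(bad_trace)
```

The check is per thread and looks only at that thread's own lock operations, which is what makes the protocol local.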

Tree (hand-over-hand) locking

[Kedem & Silberschatz ‘76] [Samadi ‘76] [Bayer & Schkolnick ‘77]

• Except for the first lock, acquire a lock only when holding the lock on its parent

• No lock is acquired after being released

[Figure: list with head H; nodes locked by threads t1 and t2]
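The two tree-locking rules can likewise be checked on one thread's lock trace. This is a sketch under assumptions: the trace format and the `parent` map are hypothetical, chosen only for illustration.

```python
def follows_tree_locking(trace, parent):
    """Check one thread's lock trace against the two tree-locking rules.

    trace:  list of (action, node) pairs, action in {"acquire", "release"}.
    parent: dict mapping each node to its parent (the root maps to None).
    Both formats are assumptions made for this sketch.
    """
    held, released = set(), set()
    first = True
    for action, node in trace:
        if action == "acquire":
            if node in released:
                return False  # rule 2: a released lock is never re-acquired
            if not first and parent.get(node) not in held:
                return False  # rule 1: must hold the lock on the parent
            held.add(node)
            first = False
        else:
            held.discard(node)
            released.add(node)
    return True


# A three-node list H -> n1 -> n2, viewed as a degenerate tree.
parent = {"H": None, "n1": "H", "n2": "n1"}

hand_over_hand = [("acquire", "H"), ("acquire", "n1"), ("release", "H"),
                  ("acquire", "n2"), ("release", "n1"), ("release", "n2")]
skips_parent = [("acquire", "H"), ("release", "H"), ("acquire", "n1")]

hoh_ok = follows_tree_locking(hand_over_hand, parent)
skip_ok = follows_tree_locking(skips_parent, parent)
```

Note that the hand-over-hand trace acquires n2 after releasing H, so it follows tree locking without being two-phase.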


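void p() {
    acquire(B);
    B = 0;
    release(B);
    int b = B;      // B is read after its lock has been released
    if (b)
        acquire(A); // reached only if another thread wrote B in between
}

void q() {
    acquire(B);
    B = 1;
    release(B);
}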

Yes!

– for databases

– concurrency control monitor ensures that M follows the locking policy at run-time, so M is serializable

No!

– for code analysis

– no central monitor

p() is not two-phase locked, but only in interleaved executions
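The p() / q() example can be explored exhaustively. The sketch below is an illustration, not the analysis from the talk: it models each method as a Python generator that performs one atomic step per `next()` call, replays every schedule of the two threads, and checks the two-phase rule on the resulting log. The helper names (`run`, `violates_2pl`, `Blocked`) are hypothetical.

```python
from itertools import combinations


class Blocked(Exception):
    """A schedule asked a thread to take a lock that another thread holds."""


def acquire(locks, lock, me):
    if locks.get(lock) is not None:
        raise Blocked
    locks[lock] = me


def release(locks, lock):
    locks[lock] = None


def p(mem, locks):
    acquire(locks, "B", "p")
    yield "acquire"
    mem["B"] = 0
    yield "write"
    release(locks, "B")
    yield "release"
    b = mem["B"]          # unlocked read, as in the slide
    yield "read"
    if b:
        acquire(locks, "A", "p")
        yield "acquire"


def q(mem, locks):
    acquire(locks, "B", "q")
    yield "acquire"
    mem["B"] = 1
    yield "write"
    release(locks, "B")
    yield "release"


def run(schedule):
    """Replay one schedule; return its event log, or None if infeasible."""
    mem, locks, log = {"B": 0}, {}, []
    threads = {"p": p(mem, locks), "q": q(mem, locks)}
    for name in schedule:
        try:
            log.append((name, next(threads[name])))
        except StopIteration:
            pass            # thread already finished; the slot is a no-op
        except Blocked:
            return None     # the lock was not free: discard this schedule
    return log


def violates_2pl(log):
    released = set()
    for thread, action in log:
        if action == "release":
            released.add(thread)
        elif action == "acquire" and thread in released:
            return True
    return False


def any_violation(schedules):
    logs = (run(s) for s in schedules)
    return any(log is not None and violates_2pl(log) for log in logs)


# p takes at most 5 steps and q takes 3, so a schedule has 8 slots.
non_interleaved = [("p",) * 5 + ("q",) * 3, ("q",) * 3 + ("p",) * 5]
every_schedule = [tuple("q" if i in q_slots else "p" for i in range(8))
                  for q_slots in combinations(range(8), 3)]

seq_violation = any_violation(non_interleaved)
interleaved_violation = any_violation(every_schedule)
print(seq_violation, interleaved_violation)
```

In both non-interleaved orders p() reads B == 0 and never reaches acquire(A); the violation needs q() to run between p()'s release(B) and its read of B.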

Our Goal

Statically verify that M follows a locking policy

For local conflict-serializable locking protocols

– Depending only on thread’s local variables & global variables locked by it

E.g., two phase locking, tree locking, (dynamic) DAG locking…

But not protocols that rely on a centralized concurrency control monitor!

Our contribution: Easy step

complete non-interleaved executions of M

A local conflict serializable locking policy is respected in all executions iff it is respected in all non-interleaved executions

A thread-local property holds in all executions iff it holds in all non-interleaved executions

Two-phase locking

Tree locking

Dynamic tree locking

Proof considers a shortest execution violating the protocol + indistinguishability argument

Further reduction

Almost-complete non-interleaved executions of M

A local conflict serializable locking policy is respected in all executions iff it is respected in all almost-complete non-interleaved executions

Further reduction: A complication

Need to argue about termination

int X = 0, Y = 0;

void p() {
    acquire(Y);
    y = Y;
    release(Y);
    if (y ≠ 0) {
        acquire(X);
        X = 3;
        release(X);
    }
}

void q() {
    if (random(5) == 3) {
        acquire(Y);
        Y = 1;
        release(Y);
        while (true) nop;
    }
}

q(): Y is set to 1 & the method enters an infinite loop

p(): Observes Y == 1 & violates 2PL

Cannot happen in complete non-interleaved executions

Further reduction: Termination

Can use sequential reduction to verify termination

A terminating local conflict serializable locking policy is respected in all executions iff it is respected in all almost-complete non-interleaved executions

Initial analysis results

Shape analysis of hand-over-hand linked lists

Method                 Time      Memory
Our method             3.5s      4.0MB
TVLA prior             596.1s    90.3MB
Separation logic*      0.4s      0.2MB

* Does not verify sortedness of list and fails to verify linearizability in some cases

Shape analysis of hand-over-hand trees (for the first time)

Method                 Time      Memory
Our method             124.6s    90.6MB

What’s next?

• Extend to other serializability protocols

– shared (read) locks

– non-locking non-conflict based serializability (e.g., using timestamps)

– optimistic protocols

– Aborted / failed methods

EXAMPLE II: REQUIRED MEMORY ORDERINGS

Work with Guerraoui, Hendler, Kuznetsov, Michael and Vechev (POPL 2011)

Relaxed memory models

Out of order execution of memory accesses, to compensate for slow writes

Optimize by issuing a read before an earlier write of the same process, if they access different locations

Reordering may lead to inconsistency

[Figure: CPU 0 and CPU 1, each with its own cache, connected to memory through an interconnect]

Read-after-write (RAW) Reordering

Process P:

Write(X,1)

Read(Y)

Process Q:

Write(Y,1)

Read(X)

[Figure: timelines of P, with W(X,1) and R(Y), and Q, with W(Y,1) and R(X), showing a reordered execution]
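The effect of read-after-write reordering on P and Q can be seen by enumerating outcomes. This is a minimal sketch, assuming X and Y start at 0 and modeling reordering simply as each process issuing its read before its write; the `outcomes` helper is hypothetical.

```python
from itertools import combinations


def outcomes(p_ops, q_ops):
    """All (value read by P, value read by Q) pairs over every interleaving.

    Each op is ("W", location) for a write of 1, or ("R", location).
    X and Y are assumed to start at 0.
    """
    results = set()
    n = len(p_ops) + len(q_ops)
    for p_slots in combinations(range(n), len(p_ops)):
        mem = {"X": 0, "Y": 0}
        reads = {}
        p_iter, q_iter = iter(p_ops), iter(q_ops)
        for i in range(n):
            if i in p_slots:
                proc, (kind, loc) = "P", next(p_iter)
            else:
                proc, (kind, loc) = "Q", next(q_iter)
            if kind == "W":
                mem[loc] = 1
            else:
                reads[proc] = mem[loc]
        results.add((reads["P"], reads["Q"]))
    return results


P = [("W", "X"), ("R", "Y")]
Q = [("W", "Y"), ("R", "X")]

in_order = outcomes(P, Q)               # program order preserved
reordered = outcomes(P[::-1], Q[::-1])  # each read issued before its write
print((0, 0) in in_order, (0, 0) in reordered)
```

With program order preserved, at least one process sees the other's write; once the reads move ahead of the writes, both can read 0, which is the inconsistency the slide refers to.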

Avoiding out-of-order: Read-after-write (RAW) Fence

Process P:

Write(X,1)

FENCE

Read(Y)

Process Q:

Write(Y,1)

FENCE

Read(X)

[Figure: timelines of P, with W(X,1) and R(Y), and Q, with W(Y,1) and R(X)]

Avoiding out-of-order: Atomic Operations

Atomic operations: atomic-write-after-read (AWAR)

E.g., CAS, TAS, Fetch&Add,…

atomic {
    read(Y)
    …
    write(X,1)
}

RAW fences / AWAR are ~60 slower than (remote) memory accesses

• Concurrent data types:

– queues, counters, hash tables, trees,…

– Non-commutative operations

– Serializable solo-terminating implementations

• Mutual exclusion

Our result

Any concurrent program in a certain class must use RAW / AWARs

Non-commutative operations

Operation A is non-commutative if there is an operation B where:

A influences B and B influences A

Example: Queue

enq(v) adds v to the end of the queue

deq() takes the item from the head of the queue

Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1

deq() operations influence each other

Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok

enq() is not non-commutative

[Figure: queue Q holding 1, 2, and queue Q holding 1, 2, 3]
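The queue example can be replayed on a sequential queue. The sketch below is a simplification: it tests "influences" only from the single starting state shown above (a queue holding 1, 2), whereas the definition asks whether some state exists; `run` and `influences` are hypothetical helpers.

```python
from collections import deque


def run(ops, initial=(1, 2)):
    """Apply operations in order to a fresh queue; return each response."""
    q = deque(initial)
    responses = []
    for name, arg in ops:
        if name == "enq":
            q.append(arg)
            responses.append("ok")
        else:  # "deq"
            responses.append(q.popleft() if q else None)
    return responses


def influences(a, b, initial=(1, 2)):
    """True if running a first changes the response of b."""
    alone = run([b], initial)[0]
    after_a = run([a, b], initial)[1]
    return alone != after_a


deq, enq3 = ("deq", None), ("enq", 3)

deq_on_deq = influences(deq, deq)    # deq returns 2 instead of 1
deq_on_enq = influences(deq, enq3)   # enq returns ok either way
print(deq_on_deq, deq_on_enq)
```

Since enq(3) answers ok no matter what ran before it, nothing influences it, so it cannot satisfy both halves of the definition.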

Proof Intuition: Writing

If an operation does not write, it does not influence anyone

It would be commutative

[Figure: two deq operations, each returning 1, with no shared write; the deq operations do not influence each other]

Proof Intuition: Reading

If an operation does not read, it is not influenced by anyone

It would be commutative

[Figure: two deq operations, each returning 1, with no shared read; the deq operations do not influence each other]

Proof Intuition: RAW

[Figure: two deq operations with no RAW, each returning 1; serialization]

Mutual exclusion

(Mutex) Two processes do not hold the lock at the same time

(Deadlock-freedom) If a process calls Lock() then some process acquires the lock

Lock() operations do not “commute”!

Every successful Lock() incurs a RAW / AWAR
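As one familiar instance of the RAW pattern in a lock, the sketch below uses Peterson's two-process algorithm, which is not taken from the talk, and logs its shared-memory accesses. It runs single-threaded and only records program order; it does not model hardware reordering, and the class and helper names are hypothetical.

```python
class PetersonLock:
    """Two-process Peterson lock that logs its shared-memory accesses."""

    def __init__(self):
        self.flag = [False, False]
        self.turn = 0
        self.log = []  # ("W" or "R", variable name), in program order

    def lock(self, i):
        j = 1 - i
        self.flag[i] = True
        self.log.append(("W", f"flag[{i}]"))
        self.turn = j
        self.log.append(("W", "turn"))
        while True:  # spins only under contention, which this demo avoids
            self.log.append(("R", f"flag[{j}]"))
            if not self.flag[j]:
                return
            self.log.append(("R", "turn"))
            if self.turn != j:
                return

    def unlock(self, i):
        self.flag[i] = False
        self.log.append(("W", f"flag[{i}]"))


def has_raw(log):
    """True if some read follows a write to a different variable."""
    written = set()
    for kind, var in log:
        if kind == "W":
            written.add(var)
        elif any(w != var for w in written):
            return True
    return False


lock = PetersonLock()
lock.lock(0)                 # uncontended acquire by process 0
lock_log = list(lock.log)
lock.unlock(0)
raw_found = has_raw(lock_log)
```

Even the uncontended acquire writes flag[0] and then reads flag[1]; on hardware that reorders reads before earlier writes, a fence between the two would be needed for the lock to stay correct.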

Who should care?

• Concurrent programmers: know when it is futile to try to avoid expensive synchronization

• Hardware designers: motivation to lower cost of specific synchronization constructs

• API designers: choice of API affects synchronization

• Verification engineers: declare incorrect when synchronization is missing

“…although I hope that these shortcomings will be addressed, I hasten to add that they are insignificant compared to the huge step forward that this paper represents….”

-- Paul McKenney, Linux Weekly News, Jan 26, 2011

What else?

• Weaker operations? E.g., idempotent Work Stealing

• Other patterns

– Read-after-read, write-after-write, barriers, across-thread orders

• The cost of verifying adherence to a locking policy

• (Semi-) Automatic insertion of lock acquire / release commands or fences

And beyond…

Other theorems that allow “cutting corners” when designing / verifying concurrent applications

Thank you!
