15-440/15-640 Distributed Systems Concurrency Control

Concurrency Control15-440/15-640 Distributed Systems

Lecture #10, Thursday September 30th

P1 – Checkpoint done, Part A (due 10/8), Part B (10/15)

Updated Piazza policy to allow private posts• General idea still: please use public posts • Only for H/W questions that may reveal answer, we can make public • Not for detailed project debugging questions => please go to OH

HW2 – released Oct 3rd – Due in a week Oct 10th. Get started early! • Short timeline, need to release solutions for Midterm

Midterm -1 – In class, during class time, on Tuesday Oct 12th• Please arrive early (10:00 or 10:05) so that we can start on time!

AnnouncementsO U T L I N E

Consistency for multiple-objects, multiple-servers

Part I: Single Server Case• (not covered well in book)• Two Phase Locking

Part II: Distributed Transactions• Two Phase Commit (Tanenbaum 8.5)

Today's LectureO U T L I N E





a) Ignore failures in first half• Concurrency is our main concern

b) To deal with failures• Assume a form of logging, where every machine writes information

down before operating on it, to recover from simple failures. Recover after failure.

• Next lecture: Logging and Crash Recovery

Assumptions for Today

In the context of concurrency, what have we been focused on avoiding?

S O F A R I N T H I S C O U R S E ,



Generally so far, we have been focused

on various ways to ensure safe

concurrent access to a single resource.Unsafe concurrent access to shared resources.e.g., race conditions.

Unsafe concurrent access to shared resources.e.g., race conditions.



Generally so far, we have been focused

on various ways to ensure safe

concurrent access to a single resource.

Going forward, we want to handle composite operations– we want to build concurrent systems which act on multiple distinct objects!

Imagine a banking application with 3 functions that can access account information:

If each of these 3 functions were implemented in a thread-safe way, we should be OK, right?

Nope.

Composite OperationsC O N T E X T

Let’s start with an example of composite operations.

getAccount(i) deposit(j, v) withdraw(i, v)

Imagine a banking application with 3 functions that can access account information:

If each of these 3 functions were implemented in a thread-safe way, we should be OK, right?

Nope.

Composite OperationsC O N T E X T

Let’s start with an example of composite operations.

getAccount(i) deposit(j, v) withdraw(i, v)

// thread 1: transfer 100// from savings->currentwithdraw(savings, 100)deposit(current, 100)

// thread 2: check balances = getBalance(savings)c = getBalance(current)tot = s + c

Assuming these 2 threads are concurrently executing… What can happen?

Two issues…

Problems with Composite OperationsC O M P O S I T E O P E R AT I O N S

Insufficient Isolation• Individual operations each being atomic is not enough• Instead, we want the deposit & withdraw making up the transfer to

happen as one atomic operation…

1

2 Fault Tolerance• In the real-word, programs (or systems) can fail• Need to make sure we can recover safely if failure happens

(particularly– no corrupted data!)

Transactions!E N T E R :

Yep! The database kind.

Goal:Want programmer to be able to specify that a set of operations should happen atomically.

TransactionsB A C K G R O U N D

For example:

// transfer amt from A -> Btransaction {if (getBalance(A) > amt) {withdraw(A, amt)deposit(B, amt)return true

} else return false}

There are exactly two possibilities for execution:

A transaction either1. executes correctly (it commits),

or2. has no effect at all (it aborts)

Importantly: this always happens regardless of other transactions, or system crashes!

ACID PropertiesT R A N S A C T I O N S

Q: Anybody familiar with the ACID acronym?

Transactions are defined as having the 4 ACID properties.

Atomicity: Each transaction completes in its entirely, or is aborted. If aborted, should not have effect on the shared global state. Example: Update account balance on multiple servers

Consistency: Each transaction preserves a set of invariants about global state. (Nature of invariants is system dependent). Example: in a bank system, law of conservation of $$

Isolation: Also means serializability. Each transaction executes as if it were the only one with the ability to read/write shared global state.

Durability: Once a transaction has been completed, or “committed” there is no going back. In other words there is no “undo”.


Atomicity: Each transaction completes in its entirely, or is aborted. If aborted, should not have effect on the shared global state.Example: Update account balance on multiple servers





















Q: Anybody familiar with the ACID acronym?

Also noteworthy:Transactions can be nested.

And, terminologically:“Atomic Operations” ⇒ Atomicity + Isolation

Transactions are defined as having the 4 ACID properties.





Array Bal[i] stores balance of Account “i”

A Transaction Example: Bank E X A M P L E

xfer(i, j, v): if withdraw(i, v):

deposit(j, v) else

abort

withdraw(i, v): b = Bal[i] // Readif b >= v // Test

Bal[i] = b-v // Write return true

else return false

deposit(j, v): Bal[j] += v

Goal:Implement: xfer, withdraw, deposit



deposit(j, v) else

abort



else return false


Imagine the following scenario:Bal[x] = 100, Bal[y] = Bal[z] = 02 transactions:- T1: xfer(x,y,60)- T2: xfer(x,z,70)

Q: What will happen if these 2 transactions follow ACID properties and happen in some serial order?



deposit(j, v) else

abort



else return false



T1; T2:- T1 Succeeds; T2 Fails. Bal[x]=40, Bal[y]=60

T2; T1:- T2 Succeeds. T1 Fails. Bal[x]=30, Bal[z]=70



deposit(j, v) else

abort



else return false



Q: What if we weren’t careful about

ACID properties? Is there a race

condition?



deposit(j, v) else

abort



else return false



Q: What if we weren’t careful about

ACID properties? Is there a race

condition?Updating Bal[x] with Read/Write

interleaving of T1,T2

ACID VIOLATION: NOT ISOLATED, NOT DURABLE

Bal[x] = 30 or 40; Bal[y] = 60; Bal [z] = 70



deposit(j, v) else

abort



else return false



Updating Bal[x] with Read/Write interleaving of T1,T2

Bal[x] = 30 or 40; Bal[y] = 60; Bal [z] = 70




deposit(j, v) else

abort



else return false



Bal[x] = 30 or 40; Bal[y] = 60; Bal [z] = 70


sumbalance(i, j, k): return Bal[i]+Bal[j]+ Bal[k]

For Consistency, we can implement sumbalance()

Updating Bal[x] with Read/Write interleaving of T1,T2



deposit(j, v) else

abort



else return false





Q: Do you see anything wrong with this?



deposit(j, v) else

abort



else return false





Q: Do you see anything wrong with this?State invariant sumbalance=100 violated!

We created $$

Implementing Transaction with LocksE X A M P L E

xfer(i, j, v): lock()if withdraw(i, v):

deposit(j, v) else

abortunlock()

We can use locks to wrap xfer.

xfer(i, j, v):lock(i)if withdraw(i, v):

unlock(i)lock(j)deposit(j, v)unlock(j)

else unlock(i)abort

Q: However, is this a good approach? Sequential bottleneck due to global lock!

Q: Is there some other solution?

Q: Is this fixed now?

Nope. Consistency violation.Releasing lock i early before another transaction can cause consistency violation.


xfer(i, j, v):lock(i)if withdraw(i, v):

lock(j)deposit(j, v)unlock(i)unlock(j)

else unlock(i)abort

Potential fix:Release locks when updateof all state variables complete.

Q: Does this work?

Nope. Deadlock!Bal[x]=Bal[y]=100 xfer(x,y,40) and xfer(y,x,30)


xfer(i, j, v):lock(min(i,j); lock(max (i,j)) if withdraw(i, v):

deposit(j, v)unlock(i); unlock(j)

else unlock(i); unlock(j)abort

General rule: Always acquire locks according to some consistent global order

This works

This is also the motivation for 2-phase locking!

Why does this version work?We can visualize what’s happening by representing the state of the locks in a directed acyclic graph called a “waits-for” graph.

Acquiring Locks in a Unique OrderW E K N O W : O R D E R O F L O C K A C Q U I S I T I O N I S I M P O R TA N T

Pi Pj

Pk

Consider “wait-for” graph for state of locks.

• Vertices represent transactions

• Edge from vertex Pi to vertex Pjif transaction Pi is waiting for lock held by transaction Pj.

What does a cycle mean?


Pi Pj

Pk




Can a cycle occur if we acquire locks in a unique order?


Pi Pj

Pky

x

z





No. Label edges with a lock ID.

– For any cycle, there must be some pair of edges (i, j), (j, k) labeled with values x & y. As k holds y, but waits for x: y < x

– Transaction j is holding lock x and it wants lock y, so y > x

– Implies that j is not acquiring its lock in proper order.


Pi Pj

Pky

x

z





No. Label edges with a lock ID.

– For any cycle, there must be some pair of edges (i, j), (j, k) labeled with values x & y. As k holds y, but waits for x: y < x

– Transaction j is holding lock x and it wants lock y, so y > x

– Implies that j is not acquiring its lock in proper order.

This general scheme is known as

two-phase locking.(Specifically, strong strict two-phase locking.)

Two-Phase Locking (2PL)E N T E R :

General 2-phase locking • Phase 1: Acquire or Escalate Locks (e.g. read => write)• Phase 2: Release or de-escalate lock

Strict 2-phase locking • Phase 1: (same as before.) Acquire or Escalate Locks (e.g. read => write) • Phase 2: Release WRITE lock at end of transaction only

Strong Strict 2-phase locking • Phase 1: (same as before) Acquire or Escalate Locks (e.g. read => write) • Phase 2: Release ALL locks at end of transaction only. • Most common version, required for ACID properties!

2-Phase Locking Variants TA X O N O M Y

There are several kinds of 2PL.

Why not always use strong-strict 2-phase locking?• A transaction may not know the locks it needs in advance

2-Phase Locking S H O U L D W E A L W AY S U S E I T ?

if Bal(heather) < 100: x = find_richest_prof() transfer_from(x, heather)

•Ways to handle deadlocks• Lock manager builds a “waits-for” graph. On finding a cycle, choose offending transaction and force abort

• Use timeouts: Transactions should be short. If hit time limit, find transaction waiting for a lock and force abort.

For reliability, transactions are split into two phases:

Phase 1: Preparation: • Determine what has to be done, how it will change state, without

actually altering it.• Generate Lock set “L” • Generate List of Updates “U”

Phase 2: Commit or Abort:• Everything OK, then update global state • Transaction cannot be completed, leave global state as is• In either case, RELEASE ALL LOCKS

Transactions: Split Into 2 PhasesR E T U R N I N G T O T R A N S A C T I O N S

Returning our attention to transactions…

2PL Applied to Banking ExampleE X A M P L E

xfer(i, j, v):L={i,j} //Locks U=[] //List of Updatesbegin(L) //Begin transaction, Acquire locksbi = Bal[i]bj = Bal[j]if bi >= v:

Append(U,Bal[i] <- bi – v)Append(U, Bal[j] <- bj + v) commit(U,L)

else abort(L)

commit(U,L):Perform all updates in U Release all locks in L

Q: So, what would “commit” and ”abort” look like?

abort(L): Release all locks in L





Partition databases across multiple machines for scalability(E.g., machine 1 responsible for account m, machine 2 responsible for account n)

Transactions often touch more than one data partition.

How do we guarantee that all of the partitions commit the transactions or none commit the transactions?• Transferring money from m to n.• Requirement: both machines do it, or neither.

Distributed Transactions?B R I N G I N G T R A N S A C T I O N S T O D I S T R I B U T E D S Y S T E M S

21

account m account n

Multiple servers (S1, S2, S3, …), each holding some objects which can be read and written within client transactionsMultiple concurrent clients (C1, C2, …) who perform transactions that interact with one or more servers• E.g. T1 reads x, z from S1, writes a on S2, reads + writes j on S3• E.g. T2 reads i, j from S3, then writes z on S1A successful commit implies agreement at all servers

Modeling Distributed TransactionsE X A M P L E O F W H AT W E ’ R E S T R I V I N G T O W A R D S

i = 2j = 4

a = 7b = 8c = 1

x = 5y = 0z = 3

S1

S2

S3

C1 C2

T1 transaction {if (x<2) aborta:= zj:= j + 1

}

T2 transaction {z:= (i+j)

}

• Similar idea as before, but: • State spread across servers (maybe even WAN) • Single transaction should be able to read/modify global state while

maintaining ACID properties.• Failures!

• Overall Idea:• Client initiates transaction. • Makes use of “coordinator”• All other relevant servers operate as “participants” • Coordinator assigns unique transaction ID (TxID)

• Distributed Commit• 2-phase commit protocol (2PC)

Enabling Distributed Transactions B R I N G I N G T R A N S A C T I O N S T O D I S T R I B U T E D S Y S T E M S

A simple distributed commit protocol:• Coordinator tells other processes that are

involved (called participants) whether or not to locally perform the operation in question.

• This is called the one-phase commit protocol.

Even without failures, a lot can go wrong, for example:• Account m on Server 2 has only $90• Account m doesn’t exist!

Distributed CommitA S I M P L E S C H E M E

Coordinator

Server 1 Server 2

Client0: start

1: m, +$100 1: j, -$100

2: done

Q: What part of ACID does this violate?

Phase 1: Prepare & Vote• Participants figure out all state changes • Each determines if it can complete the transaction• Communicate with coordinator

Phase 2: Commit• Coordinator broadcasts to participants: COMMIT / ABORT• If COMMIT, participants make respective state changes

2-Phase CommitO V E R V I E W

Implementing 2-Phase Commit2 P C

Coordinator

Server 1 Server 2

Client0: start

Implemented as a set of messages.


Coordinator

Server 1 Server 2

Client

1A: CanCommit? 1A: CanCommit?


Messages in first phase • 1A: Coordinator sends CanCommit? to participants


Coordinator

Server 1 Server 2

Client

1B: VoteCommit 1B: VoteCommit


Messages in first phase • 1A: Coordinator sends CanCommit? to participants• 1B Participants respond: VoteCommit or VoteAbort


Coordinator

Server 1 Server 2

Client

2A: DoCommit 2A: DoCommit



Messages in the second phase• 2A: All VotedCommit: , Coord sends DoCommit• If any VoteAbort: abort transaction. Coordinator sends DoAbort to everyone => release locks



Messages in the second phase• 2A: All VotedCommit: , Coord sends DoCommit• If any VoteAbort: abort transaction. Coordinator sends DoAbort to everyone => release locks


Coordinator

Server 1 Server 2

Client3: done

Implementing 2-Phase CommitA N O T H E R V I E W

time

S1

S2

S3

yes

C

yes

yes

ack

ack

ack

doCommit(TxID)canCommit(TxID)?

ack = haveCommitted(TxID,P)yes = voteCommit(TxID,P)

Bank Account m at Server 1, n at Server 2.

BankingE X A M P L E

L={m} Begin(L) // Acq. Locks U=[] //List of Updates b=Bal[m]if b >= v:

Append(U, Bal[m] <- b – v)vote commit

elsevote abort

L={n} Begin(L) // Acq. LocksU=[] //List of Updatesb=Bal[n]Append(U, Bal[n] <- b + v)vote commit

Server 1 implements transactionServer 2 implements transaction

Server 2 can assume that the account of m has enough money, otherwise whole transaction will abort.

Q: What about locking?

21

account m account n

Correctness• Neither can commit unless both agreed to commit

Performance• 3N messages per transaction

How to handle failure?• Timeouts à performance bad in case of failure!

Properties of 2-Phase Commit2 P C

Distributed deadlock • Cyclic dependency of locks by transactions across servers

• In 2PC this can happen if participants unable to respond to voting request e.g. still waiting on a lock on its local resource.

• Handled with a timeout. Participants times out, then votes to abort. Retry transaction again. • Addresses the deadlock concern • However, danger of LIVELOCK – keep trying!

Deadlocks and Livelocks 2 P C

Q: How can this happen?

Q: How can we fix this?

Coordinator times out after CanCommit?• Hasn’t sent any commit messages, safely abort• Conservative. Why?• Preserve correctness, sacrifice performance

Participant times out after VoteAbort• Can safely abort unilaterally.• Why?

Timeout and Failure Cases (1)2 P C

Participant times out after VoteCommit•Are unilateral decisions possible? Commit, Abort?•Participant could wait forever

Solution: ask another participant (gossip protocol)• Learn coordinator’s decision: do the same• Assumption: non-Byzantine failure model

• Other participant hasn’t voted: abort is safe. Why?• Coordinator has not made decision

• No reply or other participant also VoteCommit: wait• 2PC is “blocking protocol” à 3PC in book.

Timeout and Failure Cases (2)2 P C

2PC widely used in practice

In transactional APIs for databases and cloud platforms

Logging and Crash Recovery• Crucial to handle crashes / reboots

(next lecture)• Very powerful and resilient when paired with

RAID (3 lectures from now)

2 Phase Commit in Practice

NDB Cluster

• Distributed consistency management• ACID Properties desirable • Single Server case: use locks + 2-phase locking (strict 2PL, strong strict

2PL), transactional support for locks• Multiple server distributed case: use 2-phase commit for distributed

transactions. Need a coordinator to manage messages from participants• 2PC can become a performance bottleneck

Summary

Overview:•2PC notation from the Book•Terminology used by messages different, but essentially the protocol is the same•Pointers to 3PC (fully described in the book)

Additional Material

Two-Phase Commit (1)C o o r d i n a t o r / P a r t i c i p a n t c a n b e b l o c k e d i n 3 s t a t e s :

Participant: Waiting in INIT state for VOTE_REQUEST

Coordinator: Blocked in WAIT state, listening for votes

Participant: blocked in READY state, waiting for global vote

(a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant.

•What if a “READY” participant does not receive the global commit? Can’t just abort => figure out what message a co-ordinator may have sent.•Approach: ask other partcipants• Take actions on response on any of the participants• E.g. P is in READY state, asks other “Q” participants

Two-Phase Commit (2)

What happens if everyone is in ”READY” state?

• For recovery, must save state to persistent storage (e.g. log), to restart/recover after failure. • Participant (INIT): Safe to local abort, inform Coordinator• Participant (READY): Contact others• Coordinator (WAIT): Retransmit VOTE_REQ• Coordinator (WAIT/Decision): Retransmit VOTE_COMMIT

Two-Phase Commit (3)

2PC: Actions by Coordinator

Why do we have the ”write to LOG” statements?

2PC: Actions by Participant

Wait for REQUESTS

If Decision to COMMITLOG=>Send=> Wait

No response? Ask others

If global decision,COMMIT OR ABORT

Else, Local DecisionLOG => send

2PC: Handling Decision Request

Note, participant can only help others if it has reached a global decision and committed it to its log.

What if everyone has received VOTE_REQ, and Co-ordinator crashes?

Documents

15-440/15-640 Distributed Systems Concurrency Control