Signature Based Concurrency Control Thomas Schwarz, S.J. JoAnne Holliday Santa Clara University Santa Clara, CA 95053 tjschwarz,[email protected]



Overview

Transactional concurrency control in a distributed system:
Signatures are a better version of version numbers.
Signatures are calculated from the records.

Basic Idea

A signature is a short string of f bits calculated from a record.
We assume here an LH* file scenario.
The file is a dictionary data structure associating keys with a non-key field.
A record consists of a key c and a non-key field; the signature is calculated from the record.
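A minimal sketch of such a signature, assuming SHA-1 truncated to f bits as the signature function (the slides do not fix a particular function here; the name `signature` and the sample records are hypothetical):

```python
import hashlib

def signature(record: bytes, f_bits: int = 32) -> int:
    """Return an f-bit signature of a record.

    Assumption: SHA-1 truncated to its f most significant bits.
    """
    if not 1 <= f_bits <= 160:
        raise ValueError("SHA-1 yields at most 160 bits")
    digest = int.from_bytes(hashlib.sha1(record).digest(), "big")
    return digest >> (160 - f_bits)

# Two versions of the same record (hypothetical encoding of key and non-key field).
sig_old = signature(b"key=17|balance=100")
sig_new = signature(b"key=17|balance=101")
```

The signature depends only on the record contents, so no separate version counter has to be stored or kept consistent.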

Basic Idea

When a transaction reads a record, it records the signature of the record.
When the transaction is ready to commit, it checks whether any signatures of records it read have changed.
If this is the case, the transaction restarts.
Otherwise, it commits.
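The read, check, and commit-or-restart cycle can be sketched as follows. This is a single-process illustration under stated assumptions: the class `Transaction`, the in-memory `store` dictionary, and the use of SHA-1 as signature function are all hypothetical, and the check and the writes are assumed to run without interleaving.

```python
import hashlib

def signature(value: bytes) -> bytes:
    # Assumption: SHA-1 digest of the record serves as its signature.
    return hashlib.sha1(value).digest()

class Transaction:
    def __init__(self, store: dict):
        self.store = store      # shared key -> record mapping
        self.read_sigs = {}     # key -> signature seen at read time
        self.writes = {}        # buffered writes, applied at commit

    def read(self, key):
        value = self.store[key]
        self.read_sigs[key] = signature(value)
        return value

    def write(self, key, value: bytes):
        self.writes[key] = value

    def commit(self) -> bool:
        # Check every signature recorded at read time against the current record.
        for key, sig in self.read_sigs.items():
            if signature(self.store[key]) != sig:
                return False    # a record changed: the caller restarts
        self.store.update(self.writes)
        return True
```

A caller would typically loop: build a fresh `Transaction`, redo the reads and the calculation, and retry until `commit()` returns `True`.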

Basic Idea

Danger of false negatives: two different records can have the same signature, so a change can go undetected.
Control the probability of false negatives through the length of the signature.
MD5 (16 B) and SHA-1 (20 B) signatures are accepted in computer forensics.
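As a rough guide to choosing the length, assume (this is an idealization, not a claim from the slides) that the signature of a changed record is uniformly distributed over all f-bit strings. The chance that a given change goes undetected is then 2^-f. The helper name below is hypothetical.

```python
def false_negative_probability(f_bits: int) -> float:
    """Probability that a changed record keeps its f-bit signature,
    assuming signatures behave like uniformly random f-bit strings."""
    return 2.0 ** -f_bits

# Signature lengths named on the slide, converted from bytes to bits.
for name, size_bytes in [("MD5", 16), ("SHA-1", 20)]:
    print(name, false_negative_probability(8 * size_bytes))
```

Under this assumption each additional byte of signature divides the false negative probability by 256.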

Simple Signature Scheme

Each transaction i contains atomic operations:
Ri(x) – Read record x
Wi(x) – Write record x
Vi(x) – Verify the signature of record x
Ai – Abort
Ci – Commit

Simple Signature Scheme

Rules for transaction i:
All reads precede all verifies.
All verifies precede all writes.
If another transaction j writes to x between a read and a verify, then transaction i aborts.
If all verifies are successful, then the transaction does all its writes and commits.

Simple Signature Scheme

Dirty Reads: Wj(x) Ri(x) Aj Ci
or Wj(x) Ri(x) Ci Aj
Impossible, because a transaction that writes also commits.

Simple Signature Scheme

Fuzzy Reads: Ri(x) Wj(x) Cj Ri(x)
Possible only if we were to allow multiple reads of the same item x:
R1(x) W2(x) C2 R1(x) V1(x) C1

Simple Signature Scheme

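If we do all the reads in a single block:
The scheme arguably has the ANSI REPEATABLE READ property.
It even has ANSI ANOMALY SERIALIZABLE.
But it is certainly not serializable:
R1(x) R2(x) R1(y) R2(y) V1(x) V2(x) V1(y) V2(y) W1(x) W2(x) W2(y) W1(y) C1 C2
Here the final value of x comes from transaction 2 and the final value of y from transaction 1, which neither serial order produces.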
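One way to check the schedule above mechanically is the standard precedence-graph test for conflict serializability: a cycle in the graph means the schedule is not conflict serializable. The sketch below is an illustration, not part of the original scheme; the function names are hypothetical, and verify operations are treated like reads when looking for conflicts.

```python
import re
from itertools import combinations

def parse(schedule: str):
    """Turn 'R1(x) W2(x) C1' into [(op, txn, item), ...]; item is None for C and A."""
    return [(op, int(txn), item or None)
            for op, txn, item in re.findall(r"([RWVCA])(\d+)(?:\((\w+)\))?", schedule)]

def precedence_edges(ops):
    """Edge (a, b) if an operation of a conflicts with a later operation of b."""
    edges = set()
    for (op1, t1, x1), (op2, t2, x2) in combinations(ops, 2):
        same_item = x1 is not None and x1 == x2
        if same_item and t1 != t2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges) -> bool:
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True          # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in list(graph))

schedule = ("R1(x) R2(x) R1(y) R2(y) V1(x) V2(x) V1(y) V2(y) "
            "W1(x) W2(x) W2(y) W1(y) C1 C2")
edges = precedence_edges(parse(schedule))
print(edges, has_cycle(edges))
```

For this schedule the graph contains edges in both directions between transactions 1 and 2, so the test reports a cycle.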

Extended Signature Scheme

Add: the Verify-Write phase is atomic.
Then: the scheme is (conflict) serializable.
Proof (idea):
Consider all reads to be “pre-reads”.
Only the verify operations are reads in the sense of concurrency control.
Then the result follows by definition.

Implementation

Lock-based implementation:

Read-Calculate Phase
No locking at all.
However, a transaction that reads an exclusively locked record might want to reread that record, because that record might change.

Verify-Write Phase
Read lock on all the signatures of records read.
Write lock on all the signatures of records to be modified.
Verify signatures and decide on commit / abort.
Release all locks.
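The Verify-Write phase can be sketched on a single server as below. Several simplifications are assumptions of this sketch and not taken from the slides: one exclusive `threading.Lock` per record stands in for both read and write locks, locks are acquired in sorted key order to avoid deadlock, SHA-1 is the signature function, and all keys touched already exist. The class name `LockingServer` is hypothetical.

```python
import hashlib
import threading

class LockingServer:
    def __init__(self, records: dict):
        self.records = dict(records)
        # Simplification: one exclusive lock per record instead of read/write locks.
        self.locks = {key: threading.Lock() for key in self.records}

    def signature(self, key) -> bytes:
        return hashlib.sha1(self.records[key]).digest()

    def verify_and_write(self, read_sigs: dict, writes: dict) -> bool:
        """Lock, verify all signatures, write on success, release all locks."""
        keys = sorted(set(read_sigs) | set(writes))
        for key in keys:                      # fixed order avoids deadlock
            self.locks[key].acquire()
        try:
            if any(self.signature(k) != sig for k, sig in read_sigs.items()):
                return False                  # abort: a record read has changed
            self.records.update(writes)       # commit: apply all writes
            return True
        finally:
            for key in reversed(keys):
                self.locks[key].release()
```

Locks are held only for the duration of `verify_and_write`, never during the Read-Calculate phase.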

Implementation

Lock-based implementation:
Conservative Strict Two-Phase Locking
Locks are short-lived:
One round of messages to acquire locks and signatures.
One round of messages for commit / abort and release messages.

Implementation

No-locking scheme

Transactions appear to the servers to be very short.

The chance of conflict is limited.

Signature Implementation

We do not use the record signature directly, but a region signature.
A region is a contiguous set of keys that all hash to the same bucket.
Typically, a region should have between 0.5 and 5 records on average.

Signature Implementation

Let ci, for i in an index set I, be the keys in a region.
Then set the region signature to be

$$\sum_{i \in I} g(c_i) \cdot \operatorname{sig}(\operatorname{record}(c_i))$$

Arithmetic is done in a Galois field (GF).
g hashes keys into the GF.
The record signature of a non-existing record is zero.
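A toy version of the region signature, to make the arithmetic concrete. Everything specific here is an assumption of the sketch: the field is GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (addition is XOR), the key hash `g` and the one-byte record signature `record_sig` are derived from SHA-1, and a non-existing record is represented by `None`. A real deployment would use a much larger field.

```python
import hashlib

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce back into 8 bits
        b >>= 1
    return result

def g(key: bytes) -> int:
    """Hypothetical key hash into the nonzero elements of GF(2^8)."""
    return hashlib.sha1(b"key:" + key).digest()[0] % 255 + 1

def record_sig(value) -> int:
    """One-byte record signature; a non-existing record (None) has signature zero."""
    if value is None:
        return 0
    return hashlib.sha1(value).digest()[0]

def region_signature(region: dict) -> int:
    """Sum over the keys c in the region of g(c) * sig(record(c))."""
    sig = 0
    for key, value in region.items():
        sig ^= gf_mul(g(key), record_sig(value))
    return sig
```

Because the sum is linear, a server can update a stored region signature after a single record change by adding g(c) times the difference of the old and new record signatures, without rereading the other records in the region.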

Signature Implementation

The verify operations read region signatures.
Region signatures are addressed by the key-space they cover.
Locking is done on regions.
Store region signatures.
Large regions have little storage overhead; small ones have large storage overhead.

Signature Implementation

Region signatures prevent phantom records.

Implementation

No-Locking Scheme
Assumes loosely synchronized clocks: clocks that are accurate to within a small multiple of the average message delay.
A transaction acquires a time-stamp at the lowest-numbered SDDS bucket it visits.
The transaction sends verify / write / vote requests to all servers it visited.
Each server votes on whether the transaction should commit, in the usual way.
If every server returns a yes vote to the transaction manager, then the transaction commits.
The transaction manager sends out the result of the vote.
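The voting round can be sketched as follows. This shows only the unanimous-vote rule; the time-stamp handling is left out because the slides do not spell it out. The names `VotingServer` and `run_commit`, the integer region signatures, and the in-process "messages" are all hypothetical.

```python
class VotingServer:
    def __init__(self, signatures: dict):
        self.signatures = dict(signatures)   # region -> current region signature

    def vote(self, expected: dict) -> bool:
        """Vote yes only if every region signature the transaction saw is unchanged."""
        return all(self.signatures.get(region) == sig
                   for region, sig in expected.items())

    def apply(self, new_signatures: dict) -> None:
        self.signatures.update(new_signatures)

def run_commit(servers: dict, expected: dict, updates: dict) -> bool:
    """Transaction manager: collect votes, commit only on a unanimous yes.

    servers:  name -> VotingServer
    expected: name -> {region: signature seen at read time}
    updates:  name -> {region: signature after the transaction's writes}
    """
    votes = [servers[name].vote(expected.get(name, {})) for name in servers]
    decision = all(votes)
    if decision:
        for name, server in servers.items():
            server.apply(updates.get(name, {}))
    return decision
```

A single no vote, caused by any changed region signature, aborts the transaction on every server it visited.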

Discussion

The signature scheme is interesting if transactions have large calculation times and updates are rare.
The signature scheme should be extendible to replicated databases.
The size of a region can be fit to the scale of the file, so that a region always has about the same number of records.
E.g., whenever the LH* split pointer returns to zero, split regions in half.

Discussion

Future Work: Performance evaluation