CPSC-608 Database Systems
Fall 2008
Instructor: Jianer Chen
Office: HRBB 309B
Phone: 845-4259
Email: [email protected]
Notes #6
Optimizing Disk Access
[Figure: seek time vs. number of cylinders traveled: roughly linear (slope x per cylinder) beyond the first cylinder, with a startup cost of about 3x to 5x]
Average seek time = 20 ms
Shortest seek time = 5 ms
Average rotational delay = 8 ms
Dealing with many random accesses (disk scheduling)
• Suppose that we have a large (dynamic) sequence of disk read/write tasks, on blocks randomly distributed across the disk.
• How do we order the tasks so that the total time is minimized?
• Elevator algorithm
Elevator algorithm
The disk head makes sweeps across the disk, stopping at a cylinder whenever a task reads/writes a block in that cylinder, and reversing direction when no pending read/write tasks (at that moment) lie ahead.
Intuitively good, in particular when a large number of tasks read/write blocks uniformly distributed across the disk
• Works in a real-time (online) manner
• Precise analysis is difficult
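The sweep described above can be sketched in a few lines of Python; the cylinder numbers and starting head position below are made-up examples, not figures from the notes.

```python
# A minimal sketch of the elevator (SCAN) scheduling policy described above.
# Cylinder numbers and the starting head position are hypothetical.

def elevator_schedule(requests, head, direction=1):
    """Serve pending cylinder requests by sweeping the head in one
    direction, reversing only when no requests remain ahead."""
    pending = sorted(requests)
    order = []
    while pending:
        # Requests at or beyond the head in the current direction.
        ahead = [c for c in pending if (c - head) * direction >= 0]
        if not ahead:
            direction = -direction      # nothing ahead: reverse the sweep
            continue
        nxt = min(ahead, key=lambda c: abs(c - head))
        order.append(nxt)
        pending.remove(nxt)
        head = nxt
    return order

print(elevator_schedule([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# [65, 67, 98, 122, 124, 183, 37, 14]
```

Note how the head first finishes the upward sweep (65 through 183) before turning around to serve 37 and 14, rather than chasing the globally closest request.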
Dealing with a long sequence of data on disk
• Data in consecutive cylinders
• Larger buffer
• Pre-fetch/double buffering
• Disk arrays
• Mirrored disks
Example. Sorting on disk (again)
• A relation R of 10M tuples takes 100K blocks
• Main memory can store 6400 blocks
• A disk block read/write takes 40 ms:
  seek time = 31 ms, rotational delay = 8 ms, transfer time = 1 ms
• Two-phase Multiway Sorting on randomly distributed blocks takes about 4.5 hours.
• Also assume that a track holds 500 blocks, and that traversing one cylinder takes 5 ms
Data in consecutive cylinders
In phase 1, suppose that the input relation is stored in consecutive tracks.
Then we can read/write 6400 consecutive blocks at a time between main memory and disk.
Phase 1 read/write: 2 × (100K/6400) × (31 + 8 + 12×5 + 6400×1) ≈ 203,000 ms < 4 minutes (saving about 2 hours)
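The phase-1 estimate can be re-derived in a few lines; all figures come from the example above (the slide's rounded total may differ slightly from the exact product).

```python
# Re-deriving the phase-1 cost from the figures in the example above.
import math

blocks_total = 100_000        # blocks in relation R
run_size     = 6_400          # consecutive blocks per memory-sized sweep
seek, rot, xfer = 31, 8, 1    # ms: seek, rotational delay, per-block transfer
cyl_move = 5                  # ms to move to the next cylinder
moves = math.ceil(run_size / 500) - 1   # 12 cylinder moves per run

per_run   = seek + rot + moves * cyl_move + run_size * xfer   # 6499 ms per sweep
phase1_ms = 2 * (blocks_total / run_size) * per_run           # one read + one write pass
print(round(phase1_ms))       # 203094, i.e. about 3.4 minutes
```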
Larger Buffer
In phase 2, we have 16 sublists, each taking one block in main memory, which leaves 6384 blocks free.
If we use all these 6384 blocks as an output buffer and write it to disk only when it is full:
Phase 2 writing: (100K/6384) × (31 + 8 + 12×5 + 6384×1) ≈ 102,000 ms < 2 minutes (saving about 1 hour)
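The phase-2 write cost follows the same pattern; again, every figure below is taken from the example.

```python
# Re-deriving the phase-2 write cost from the figures in the example above.
import math

buf = 6_384                   # free blocks used as one large output buffer
seek, rot, cyl_move, xfer = 31, 8, 5, 1   # ms, as in the example
moves = math.ceil(buf / 500) - 1          # 12 cylinder moves per flush

per_flush = seek + rot + moves * cyl_move + buf * xfer   # 6483 ms per flush
phase2_write_ms = (100_000 / buf) * per_flush
print(round(phase2_write_ms))  # 101551, i.e. under 2 minutes
```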
However, the reading in phase 2 is harder to improve: the access pattern is essentially random.
Double Buffering
For applications where the read/write pattern is predictable.
Have a program:
» Process B1
» Process B2
» Process B3
...
Single Buffer Solution
1. Read B1 into Buffer
2. Process data in Buffer
3. Read B2 into Buffer
4. Process data in Buffer
...

Let P = time to process one block
    R = time to read in one block
    n = # blocks
Single buffer time = n(P+R)
Double Buffering

[Figure: animation over disk blocks A, B, C, D, E, F, G: while the CPU processes the block just read into one memory buffer, the disk reads the next block into the other buffer]
If P ≥ R:
• Double buffering time = R + nP
• Single buffering time = n(R + P)

P = processing time per block
R = I/O time per block
n = # blocks
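The two formulas can be compared directly; the n, P, and R values below are hypothetical, chosen only to make the overlap visible.

```python
# Single vs. double buffering under the slide's cost model (assumes P >= R).

def single_buffer_time(n, P, R):
    return n * (P + R)        # each block: read, then process, strictly in turn

def double_buffer_time(n, P, R):
    assert P >= R             # next read finishes while the current block is processed
    return R + n * P          # only the very first read is not overlapped

n, P, R = 100_000, 1, 1       # hypothetical: n blocks, 1 ms to process, 1 ms to read
print(single_buffer_time(n, P, R))   # 200000
print(double_buffer_time(n, P, R))   # 100001
```

With P = R the overlap hides almost the entire I/O cost, cutting total time nearly in half.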
Disk Arrays

Taking advantage of the fact that disk reads/writes can be done in parallel between a single CPU and multiple disks.
logically one disk
Would not help if the blocks of interest are all on the same disk
Mirrored Disks
Duplicating disks so that multiple reads of the same data can be done in parallel.
[Figure: four disks holding A, A, B, B: each block mirrored on two disks]
Writing is somewhat (but not much) more expensive.
Disk Failures
• Partial vs. Total
• Intermittent vs. Permanent
Coping with Disk Failures
• Detection– Checksum
• Correction– Redundancy
At what level do we cope?
Operating System Level (Stable Storage)
[Figure: each logical block stored as two physical copies, Copy A and Copy B]
Database System Level (Log File)
[Figure: a log file, the current DB, and yesterday's DB]
Intermittent Failure Detection (Checksums)
• Idea: add n parity bits for every m data bits
  – Ex.: m = 8, n = 1
• Block A: 01101000:1 (data has an odd # of 1's, so parity bit = 1)
• Block B: 11101110:0 (data has an even # of 1's, so parity bit = 0)
• But suppose Block A is corrupted into
• Block A': 01000000:1 (also an odd # of 1's): the error goes undetected, so each parity bit gives only a 50% chance of detection
• More parity bits decrease the probability of an undetected failure to 1/2^n (with n ≤ m independent parity bits)
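The m = 8, n = 1 scheme above amounts to an even-parity bit per byte; a small sketch, with the bit strings copied from the example:

```python
# Even-parity checksum for m = 8 data bits, n = 1 parity bit.

def parity_bit(data_bits: str) -> int:
    """Parity bit chosen so data bits + parity bit contain an even # of 1's."""
    return data_bits.count("1") % 2

print(parity_bit("01101000"))  # 1  (block A)
print(parity_bit("11101110"))  # 0  (block B)
print(parity_bit("01000000"))  # 1  (corrupted A' yields the same bit: undetected)
```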
Disk Crash (Disk Arrays)
• RAIDs (Redundant Arrays of Inexpensive Drives)
logically one disk
Disk Arrays• RAID Level 1 (Mirroring)
– Keep exact copy of data on redundant disks
[Figure: data disks A, B each mirrored on a redundant disk]
Disk Arrays• RAID Level 4
– Keep only one redundant disk
– Entire parity blocks on the redundant disk
[Figure: data disks A, B, C plus one parity disk P]
Parity Blocks & Modulo-2 Sums
• Have an array of 3 data disks
  – Disk 1, block 1: 11110000
  – Disk 2, block 1: 10101010
  – Disk 3, block 1: 00111000
• ... and 1 parity disk
  – Disk 4, block 1: 01100010
Note:
– The sum over each column (data plus parity) is always an even # of 1's
– The mod-2 sum can recover any single missing row (e.g., a lost logical block)
Using Mod-2 Sums for Error Recovery
– Suppose we have:
  – Disk 1, block 1: 11110000
  – Disk 2, block 1: ????????
  – Disk 3, block 1: 00111000
  – Disk 4, block 1: 01100010 (parity)
– The mod-2 sum for block 1 over disks 1, 3, 4 recovers Disk 2, block 1: 10101010
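The recovery step is just a bitwise XOR of the surviving blocks; the bit strings below are the ones from the example above.

```python
# Recover a lost block as the mod-2 (XOR) sum of the surviving blocks.

def xor_blocks(*blocks: str) -> str:
    return "".join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

disk1, disk3, parity = "11110000", "00111000", "01100010"
print(xor_blocks(disk1, disk3, parity))  # 10101010, the missing disk-2 block
```

As a sanity check, XOR-ing all four blocks (the recovered one included) gives all zeros, which is exactly the even-column-sum property stated above.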
Disk Arrays
• RAID Level 5 (Striping)
  – Like level 4, but with balanced read & write load

[Figure: blocks A, B, C, D striped across the disks, with a parity partition on each disk]
Disk Arrays
• RAID Level 6 (error-correcting codes): more powerful, can recover from more than one disk crash.