CS 162 Section Lecture 8


Page 1: CS 162 Section

CS 162 Section

Lecture 8

Page 2: CS 162 Section

What happens when you issue a read() or write() request?

Page 3: CS 162 Section

Life Cycle of An I/O Request

• User Program
• Kernel I/O Subsystem
• Device Driver Top Half
• Device Driver Bottom Half
• Device Hardware

Page 4: CS 162 Section

When should you return from the read()/write() call?

Page 5: CS 162 Section

Interface Timing

• Blocking Interface: “Wait”
  – When requesting data (e.g., read() system call), put process to sleep until data is ready
  – When writing data (e.g., write() system call), put process to sleep until device is ready for data
• Non-blocking Interface: “Don’t Wait”
  – Returns quickly from read or write request with count of bytes successfully transferred to kernel
  – Read may return nothing, write may write nothing
• Asynchronous Interface: “Tell Me Later”
  – When requesting data, take pointer to user’s buffer, return immediately; later kernel fills buffer and notifies user
  – When sending data, take pointer to user’s buffer, return immediately; later kernel takes data and notifies user
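The "Don't Wait" behavior can be seen with a small Python sketch on a POSIX system. It puts the read end of a pipe into non-blocking mode, so a read with no data available returns control immediately instead of putting the process to sleep. The helper name `try_read` is hypothetical, chosen only for this illustration.

```python
import os

def try_read(fd, nbytes=4096):
    """Non-blocking read: return the bytes available, or None if none are ready."""
    try:
        return os.read(fd, nbytes)
    except BlockingIOError:
        # The kernel had nothing to hand over; a blocking fd would have slept here.
        return None

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)   # switch the read end to non-blocking mode

first = try_read(read_fd)         # nothing written yet, so no data
os.write(write_fd, b"hello")
second = try_read(read_fd)        # data is now waiting in the pipe buffer

os.close(read_fd)
os.close(write_fd)
```

With a blocking descriptor, the first call would instead sleep until some writer supplied data.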

Page 6: CS 162 Section

Magnetic Disk Characteristics

• Cylinder: all the tracks under the head at a given point on all surfaces
• Reading/writing data is a three-stage process:
  – Seek time: position the head/arm over the proper track (into proper cylinder)
  – Rotational latency: wait for the desired sector to rotate under the read/write head
  – Transfer time: transfer a block of bits (sector) under the read/write head
• Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Xfer Time
• Highest Bandwidth:
  – Transfer large group of blocks sequentially from one track

[Figure: disk geometry, labeled Sector, Track, Cylinder, Head, Platter]

[Figure: Request → Software Queue (Device Driver) → Hardware Controller → Media Time (Seek + Rot + Xfer) → Result]

Page 7: CS 162 Section

We have a disk with the following parameters:

• 1 TB in size
• 7200 RPM, data transfer rate of 40 Mbytes/s (40 × 10^6 bytes/sec)
• Average seek time of 6 ms
• ATA controller with 2 ms controller initiation time
• A block size of 4 Kbytes (4096 bytes)

What is the average time to read a random block from the disk?
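One way to work the problem, sketched in Python. Two assumptions are not stated on the slide: the queuing time is taken as zero (an idle disk), and the average rotational latency is taken as half of one full rotation.

```python
# Parameters from the slide
rpm = 7200
transfer_rate = 40e6        # bytes per second
seek_ms = 6.0
controller_ms = 2.0
block_bytes = 4096

# Assumption: no queuing delay (the disk is otherwise idle)
queue_ms = 0.0

# Assumption: on average the desired sector is half a rotation away
rotation_ms = 60_000 / rpm                 # one full rotation, about 8.33 ms
avg_rotation_ms = rotation_ms / 2          # about 4.17 ms

transfer_ms = block_bytes / transfer_rate * 1000   # about 0.10 ms

total_ms = queue_ms + controller_ms + seek_ms + avg_rotation_ms + transfer_ms
print(f"average random block read: {total_ms:.2f} ms")
```

Under these assumptions the total comes to roughly 12.27 ms, dominated by seek and rotation; the transfer itself contributes only about 0.1 ms.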

Page 8: CS 162 Section

SSD

– No penalty for random access
– Rule of thumb: writes 10x more expensive than reads, and erases 10x more expensive than writes (read 25 μs)
– Limited drive lifespan
– Controller maintains pool of empty pages by coalescing used sectors (read, erase, write), also reserves some % of capacity
– Controller uses ECC, performs wear leveling
– OS may provide TRIM information about “deleted” sectors (normally only file system knows about unallocated blocks, not the disk drive)

Page 9: CS 162 Section

How will you allocate space on disk?

Page 10: CS 162 Section
Page 11: CS 162 Section

What is the purpose of a File System?

Page 12: CS 162 Section

File System

• Transforms blocks into Files and Directories

• Optimize for access and usage patterns

• Maximize sequential access, allow efficient random access

Page 13: CS 162 Section

Linked Allocation: File-Allocation Table (FAT)

Page 14: CS 162 Section

If entry size is 16 bits, what is the max size of the FAT?

Page 15: CS 162 Section

Given a 512-byte block, what is the max size of the FS?

Page 16: CS 162 Section

What is the space overhead of FAT?
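A possible way to work the three FAT questions, sketched in Python. It assumes one FAT entry per disk block, that every 16-bit value is a usable block number (real FAT16 reserves a few values), and that only a single copy of the table is kept.

```python
ENTRY_BITS = 16     # size of one FAT entry
BLOCK_BYTES = 512   # size of one disk block

# Assumption: every 16-bit value names a block, so the table has 2^16 entries
num_entries = 2 ** ENTRY_BITS

# Max size of the FAT itself: one 2-byte entry per block
fat_bytes = num_entries * ENTRY_BITS // 8

# Max size of the file system: one 512-byte block per entry
fs_bytes = num_entries * BLOCK_BYTES

# Space overhead: table bytes per byte of addressable data
overhead = fat_bytes / fs_bytes

print(f"FAT size:  {fat_bytes // 1024} KiB")
print(f"FS size:   {fs_bytes // 2**20} MiB")
print(f"overhead:  {overhead:.2%}")
```

Under these assumptions the table is 128 KiB, the file system tops out at 32 MiB, and the overhead is 2 bytes per 512-byte block, or about 0.39%.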

Page 17: CS 162 Section

Multilevel Indexed Files (UNIX 4.1)

Page 18: CS 162 Section

Where are the i-nodes stored?

Page 19: CS 162 Section
Page 20: CS 162 Section

What are problems with multi-level indexed files?

Page 21: CS 162 Section

Directory Structure

Page 22: CS 162 Section

What can the FS do to improve performance?

Page 23: CS 162 Section

Bitmap of free blocks
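A minimal sketch of a free-block bitmap in Python, with one bit per disk block. The class name `FreeBlockBitmap` and the `near` hint are hypothetical, introduced only for illustration; the hint shows how a bitmap makes it cheap to look for a free block close to a file's existing blocks.

```python
class FreeBlockBitmap:
    """One bit per disk block: 0 = free, 1 = in use."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_free(self, block):
        return ((self.bits[block // 8] >> (block % 8)) & 1) == 0

    def allocate(self, near=0):
        """Claim the first free block at or after `near`, wrapping around.

        Returns the block number, or None if the disk is full.
        """
        for offset in range(self.num_blocks):
            block = (near + offset) % self.num_blocks
            if self.is_free(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return None

    def free(self, block):
        """Mark a block as free again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


bitmap = FreeBlockBitmap(10)
a = bitmap.allocate()          # first free block
b = bitmap.allocate()          # next free block
bitmap.free(a)
c = bitmap.allocate(near=5)    # search starts at block 5
```

The whole map for a disk costs one bit per block, so it is small enough to keep in memory and scan for runs of adjacent free blocks.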

Page 24: CS 162 Section

Variable sized splits

Page 25: CS 162 Section

Cylinder Groups

Page 26: CS 162 Section

File System Caching

• Optimizations for sequential access:
  – Try to store consecutive blocks of a file near each other
  – Store inode near data blocks
  – Try to locate directory near the inodes it points to
• Buffer cache used to increase file system performance
  – Read Ahead Prefetching and Delayed Writes
• Key Idea: Exploit locality by caching data in memory
  – Name translations: Mapping from paths → inodes
  – Disk blocks: Mapping from block address → disk content
• Buffer Cache: Memory used to cache kernel resources, including disk blocks and name translations
  – Can contain “dirty” blocks (blocks not yet on disk)
  – Size: adjust boundary dynamically so that the disk access rates for paging and file access are balanced

Page 27: CS 162 Section

File System Caching (cont’d)

• Delayed Writes: Writes to files not immediately sent out to disk
  – Instead, write() copies data from user space buffer to kernel buffer (in cache)
    » Enabled by presence of buffer cache: can leave written file blocks in cache for a while
    » If some other application tries to read data before it is written to disk, file system will read from cache
  – Flushed to disk periodically (e.g., in UNIX, every 30 sec)
  – Advantages:
    » Disk scheduler can efficiently order lots of requests
    » Disk allocation algorithm can be run with correct size value for a file
    » Some files need never get written to disk! (e.g., temporary scratch files written to /tmp often don’t exist for 30 sec)
  – Disadvantages:
    » What if system crashes before file has been written out?
    » Worse yet, what if system crashes before a directory file has been written out? (lose pointer to inode!)
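The delayed-write idea can be sketched as a tiny write-back cache in Python. This is an illustration under simplifying assumptions, not how any particular kernel implements it: the "disk" is a plain dict, there is no eviction, and `flush()` stands in for the periodic flush. The names `BufferCache` and `discard` are hypothetical.

```python
class BufferCache:
    """Toy write-back cache: writes stay in memory until flush()."""

    def __init__(self, disk):
        self.disk = disk      # dict of block number -> bytes, stands in for the device
        self.cache = {}       # block number -> bytes held in memory
        self.dirty = set()    # blocks changed in memory but not yet on disk

    def write(self, block, data):
        # Delayed write: copy into the cache only and mark the block dirty.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Readers see cached data even if it has not reached the disk.
        if block not in self.cache:
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def discard(self, block):
        # A short-lived file was deleted: its block never needs to hit the disk.
        self.cache.pop(block, None)
        self.dirty.discard(block)

    def flush(self):
        # Write dirty blocks in block order, a stand-in for scheduler reordering.
        written = 0
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
            written += 1
        self.dirty.clear()
        return written


disk = {}
cache = BufferCache(disk)
cache.write(7, b"report")
cache.write(3, b"scratch")
seen_before_flush = cache.read(7)   # served from the cache
disk_before_flush = dict(disk)      # still empty: nothing written yet
cache.discard(3)                    # temporary data deleted before the flush
flushed = cache.flush()
```

Anything still in `dirty` when the machine loses power is gone, which is exactly the crash risk listed under the disadvantages.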

Page 28: CS 162 Section

Log Structured and Journaled File Systems

• Better reliability through use of log
  – All changes are treated as transactions
  – A transaction is committed once it is written to the log
    » Data forced to disk for reliability
    » Process can be accelerated with NVRAM
  – Although file system may not be updated immediately, data preserved in the log
• Difference between “Log Structured” and “Journaled”
  – In a Log Structured file system, data stays in log form
  – In a Journaled file system, log used for recovery
• For Journaled system:
  – Log used to asynchronously update filesystem
    » Log entries removed after use
  – After crash:
    » Remaining transactions in the log performed (“Redo”)
    » Modifications done in way that can survive crashes
• Examples of Journaled File Systems:
  – Ext3 (Linux), XFS (Unix), HFS+ (Mac), NTFS (Windows), etc.
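A minimal redo-log sketch in Python may make the journaled case concrete. It is a toy model under stated assumptions: the log is treated as if it survives a crash, each transaction is a dict of key → new value (so replaying it twice is harmless), and the class and method names are hypothetical.

```python
class JournaledStore:
    """Toy journal: commit appends to the log, the data is updated later."""

    def __init__(self):
        self.log = []     # committed transactions, assumed to survive a crash
        self.data = {}    # the "file system" proper

    def commit(self, updates):
        # The transaction is committed as soon as it is in the log.
        self.log.append(dict(updates))

    def apply_log(self):
        # Asynchronously bring the data up to date, oldest entry first,
        # removing each log entry only after it has been applied.
        while self.log:
            self.data.update(self.log[0])
            self.log.pop(0)

    def recover(self):
        # After a crash: redo whatever transactions remain in the log.
        self.apply_log()


store = JournaledStore()
store.commit({"a": 1})
store.commit({"a": 2, "b": 3})
data_before_recovery = dict(store.data)   # nothing applied yet
store.recover()                           # e.g. after a simulated crash
```

Because an entry is removed only after it is applied, a crash in the middle of `apply_log` leaves that entry in the log, and redoing it on recovery produces the same result.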

Page 29: CS 162 Section

Key Value Store

• Very large scale storage systems
• Two operations
  – put(key, value)
  – value = get(key)
• Challenges
  – Fault Tolerance → replication
  – Scalability → serve get()’s in parallel; replicate/cache hot tuples
  – Consistency → quorum consensus to improve put() performance

Page 30: CS 162 Section

Key Value Store

• Also called a Distributed Hash Table (DHT)
• Main idea: partition set of key-values across many machines

[Figure: (key, value) pairs spread across machines]
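The partitioning idea can be sketched in Python by hashing each key to pick one of several "machines", modeled here as in-memory dicts. This is a simplification: it uses plain modulo placement with no replication, and the class name `PartitionedKVStore` is hypothetical.

```python
import hashlib

class PartitionedKVStore:
    """Toy key-value store: each key lives on exactly one node."""

    def __init__(self, num_nodes):
        # Each dict stands in for the storage of one machine.
        self.nodes = [{} for _ in range(num_nodes)]

    def node_for(self, key):
        # A stable hash, so every client maps a key to the same node.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)


store = PartitionedKVStore(num_nodes=4)
for i in range(100):
    store.put(f"user{i}", i)

value = store.get("user42")
total_stored = sum(len(node) for node in store.nodes)
```

Modulo placement remaps most keys whenever the number of nodes changes, which is one reason real systems use schemes that move fewer keys when nodes join or leave.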

Page 31: CS 162 Section

Chord Lookup

• Each node maintains pointer to its successor

• Route packet (Key, Value) to the node responsible for ID using successor pointers

• E.g., node=4 looks up the node responsible for Key=37

[Figure: Chord ring with nodes 4, 8, 15, 20, 32, 35, 44, 58; lookup(37) starts at node 4 and follows successor pointers; node=44 is responsible for Key=37]
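The lookup in the figure can be traced with a short Python sketch that uses successor pointers only. It assumes a key is owned by the first node whose ID is at or after the key on the ring, which matches the slide's example; the function names are hypothetical.

```python
def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b], allowing wrap past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def lookup(nodes, start, key):
    """Follow successor pointers from `start` until the owner of `key` is found.

    Returns (owner, path), where path lists the nodes that forwarded the request.
    """
    ring = sorted(nodes)
    successor = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}

    node = start
    path = [node]
    # Forward until the key falls between this node and its successor.
    while not in_interval(key, node, successor[node]):
        node = successor[node]
        path.append(node)
    return successor[node], path


NODES = [4, 8, 15, 20, 32, 35, 44, 58]
owner, path = lookup(NODES, start=4, key=37)
print(owner, path)
```

For Key=37 the request is forwarded through nodes 4, 8, 15, 20, 32 and 35 before node 44 is identified as the owner. With successor pointers alone, the number of hops grows linearly with the number of nodes.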

Page 32: CS 162 Section

Chord

• Highly scalable distributed lookup protocol
• Each node needs to know about O(log(M)) other nodes, where M is the total number of nodes
• Guarantees that a tuple is found in O(log(M)) steps
• Highly resilient: works with high probability even if half of nodes fail