1
Loading File into Memory
DMA buffer replacement LRU
Lecture 07: Loading File into Memory
2
• Buffer
  • each buffer holds one disk block (sector)
  • kernel has N buffers, shared by all
  – OS needs information about each buffer
    • user: Clinton, Bob, ... (who is using this buffer now)
    • hw: device, sector number
    • state: free/used (empty/waiting/reading/full/locked/writing/dirty)
• "buffer header" (struct)
  • stores all information about each buffer
  • points to actual buffer
  • buffer header has link fields (doubly linked)
  – device_list, free_buffer_list, I/O_wait_list
[Figure: disk sector, DMA, buffer (in Memory), cpu]
3
"Buffer Cache"
• Managed like CPU cache
  – read ahead (reada)
  – delayed write (dwrite)
• dwrite
  – just set "dirty* bit" in buffer cache (on update)
  – write to disk later (when it is being replaced)
• reada
  – prefetch if offset moves sequentially
* dirty: data came from disk. Later the memory copy is modified. Now the disk copy and the memory copy are different.
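A minimal sketch of delayed write, assuming a toy in-memory "disk" and one cached buffer. The names `cbuf`, `buf_update`, `buf_flush` and the sizes are hypothetical, not taken from any real kernel. It only illustrates that updates set the dirty bit and the disk write happens once, later.

```c
#include <assert.h>
#include <string.h>

#define BLKSIZE 512              /* assumed block size */
#define NBLK    8                /* assumed number of disk blocks */

static char disk[NBLK][BLKSIZE]; /* hypothetical in-memory "disk" */
static int  disk_writes;         /* counts real disk writes */

struct cbuf {
    int  blkno;                  /* which disk block this buffer holds */
    int  dirty;                  /* 1 if memory copy differs from disk copy */
    char data[BLKSIZE];
};

/* Update the cached copy only; the disk write is delayed. */
void buf_update(struct cbuf *b, const char *src, size_t n)
{
    assert(n <= BLKSIZE);
    memcpy(b->data, src, n);
    b->dirty = 1;
}

/* Called when the buffer is about to be replaced (or on sync). */
void buf_flush(struct cbuf *b)
{
    assert(b->blkno >= 0 && b->blkno < NBLK);
    if (b->dirty) {
        memcpy(disk[b->blkno], b->data, BLKSIZE);
        disk_writes++;
        b->dirty = 0;
    }
}
```

Three updates followed by one flush cost a single disk write, which is the traffic saving the slide refers to.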
4
Delayed Write: Pros & cons
• Good performance
  – much disk traffic can be saved
• Complex reliability
  – logically single information
  – physically many copies (disk, buffer): inconsistency
  – If system crashes ...
5
[Figure: Power vs. time t. (1) problem detected, then (2) computer full stop]
6
[Figure: Power vs. time t, showing the interval between "problem detected & interrupt" and "computer full stop"]
How many disk blocks can you save during this interval?
Emergency action during this period
7
Crash ...
• Only a few blocks can be saved
• What happens if they cannot be saved…?
  if lost, the following goes wrong:
    superblock: which block is free/occupied?
    inode: pointer to file data block
    data block: if directories, subtree structure
                if regular files, just a file content
• metadata are more important
  – superblock, directory, inode
8
Damage: what if this block becomes a bad block?
[Figure: Superblock, root directory, inode, data; blocks marked Holes / Occupied]
9
Crash ...
• In program, sync(2) system call
  – sync(2) flushes (writes to disk) dirty buffers
    • doesn't finish disk I/O on return (just queues the writes)
    • So call sync(2) twice … the 2nd return guarantees the flush
• At keyboard
  – the update daemon (updated) calls sync(2) every 30 seconds (periodic)
  – halt(8), shutdown(8) call sync(2) (by super user); try man 8 intro …. (before logoff)
• Caution
  – Do not power down without sync(2) or halt(8)
  – Otherwise the system crashes. What if it crashes?
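A minimal sketch of the "sync(2) twice" habit from a program, assuming a POSIX system. `save_record` and its arguments are hypothetical names chosen for illustration; only `open`, `write`, `close` and `sync` are real calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write a small record, then ask the kernel to flush dirty buffers.
 * Returns 0 on success, -1 on error. */
int save_record(const char *path, const char *msg, size_t len)
{
    assert(path != NULL && msg != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    close(fd);

    /* At this point the data may exist only in the buffer cache. */
    sync();   /* schedule all dirty buffers for writing */
    sync();   /* second call, as suggested on the slide */
    return 0;
}
```

Between `write` and `sync`, a power failure could still lose the record, which is why the caution above matters.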
10
fsck(8)
• file system check: check & repair file system
  – performed at system bootup time
  – start from root inode: mark all occupied blocks
  – start from superblock: mark all free blocks
  – something is wrong if:
    • some block has no incoming arc (unreachable)
    • some block has many incoming arcs (reached many times)
    • lost+found
  – Very time-consuming
    10 ms × (1 GB / 1 KB) = 10 ms × 10^6 blocks = 10 mega ms = 10,000 sec !!!
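The "incoming arc" check can be sketched as a per-block count, assuming the two marking passes have already produced two arrays. `check_block`, `count_bad` and the array layout are hypothetical; real fsck does much more than this.

```c
#include <assert.h>

#define NBLOCKS 8   /* hypothetical tiny disk */

enum blk_status { BLK_OK, BLK_UNREACHABLE, BLK_DUPLICATE };

/* refs:    how many times the block was reached walking from the root inode.
 * is_free: 1 if the superblock's free list contains the block.
 * A healthy block has exactly one incoming arc in total. */
enum blk_status check_block(int refs, int is_free)
{
    int incoming = refs + (is_free ? 1 : 0);
    if (incoming == 0)
        return BLK_UNREACHABLE;   /* neither used nor free */
    if (incoming > 1)
        return BLK_DUPLICATE;     /* reached many times */
    return BLK_OK;
}

/* Count the blocks that fail the check. */
int count_bad(const int refs[], const int is_free[], int n)
{
    int bad = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (check_block(refs[i], is_free[i]) != BLK_OK)
            bad++;
    return bad;
}
```

Every block has to be visited, so the cost grows with disk size, consistent with the 10,000 sec estimate above.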
11
Design Goal
• Original UNIX file system design was
  – cheap, good performance
  – adequate reliability for school, SW house
• on power fault (power interruption)
  – at most 30 seconds' worth of work is gone
  – most important metadata are saved
  – timesharing market (school, sw house)
• UNIX for bank?
  – Need to solve these problems
[Figure: timeline with a flush every 30 sec. Power down? Some contents since the last flush are lost]
12
Modern systems
• System V
  – To reduce boot time (minimize downtime)
    • On successful return from sync(2), make /fastboot file
    • if /fastboot exists, system was shut down cleanly (don't fsck)
    • After successful boot, remove /fastboot file
    • If /fastboot doesn't exist, do fsck (only for file systems in /etc/fstab)
• Log Structured File System
  – collect dirty nodes in one big segment (~track size)
  – periodically write this log to disk
    • fast: no seek/rotational delay
  – recovery is fast & complete
13
Issues
• Transactional guarantee
  – Write all, or no write at all
  – "Account A → Account B (transfer $100)"
  – Atomic transaction
  – Write both or cancel both
• Ordering guarantee
  – "Delete file A"
    1. Modify parent directory's data block (file name A)
    2. Release file A's inode (address of data block sectors, …)
    3. Release file A's data block
  – Suggested order: (3 2 1)
  – Otherwise, A's inode exists, pointer exists, wrong data …
  – Write the next block to disk only if the previous write is complete: synchronous write
** Reference: Vahalia, 11.7.2
[Figure: "remove b". Directory entries: a 7, b 9, dev 11, bin 45. The inode of b holds pointers[ ] to three data blocks]
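The rule "write the next block only if the previous write is complete" can be sketched as follows. The toy `sync_write` stands in for a synchronous disk write; the log array and the `fail_at` switch are hypothetical and exist only to make the behavior observable. The order itself is supplied by the caller.

```c
#include <assert.h>

#define MAXLOG 8

static int write_log[MAXLOG];  /* which steps reached the "disk", in order */
static int nlogged;
static int fail_at = -1;       /* step whose write should fail; -1 = none */

/* Hypothetical synchronous write: returns only after the write has
 * completed (0) or failed (-1). */
static int sync_write(int step)
{
    if (step == fail_at)
        return -1;
    assert(nlogged < MAXLOG);
    write_log[nlogged++] = step;
    return 0;
}

/* Issue the writes strictly in the given order and stop at the first
 * failure, so a later step never reaches disk before an earlier one.
 * Returns the number of completed writes. */
int ordered_writes(const int order[], int n)
{
    int done = 0;
    for (int i = 0; i < n; i++) {
        if (sync_write(order[i]) != 0)
            break;
        done++;
    }
    return done;
}
```

With the order {3, 2, 1} and a failure injected at step 2, only step 3 is on disk afterwards; step 1 is never attempted.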
14
Back to buffer cache
15
[Figure: free buffers 22, 23, 88, 83, 14, 25, 45, 32, 74, 37, 11, 19 linked in a list]
Some buffers are linked to the free buffer pool
16
[Figure: buffers 11, 43, 23, 33, 15, 44, 54, 64, 97, 10, 99, 18 linked to Disk 3]
Some buffers are allocated to a device
17
Allocate buffers to whom?
[Figure: Process 1 (user, CPU state, inode, offset), dev, Buffer cache; labels: UNIX, Linux]
18
[Figure: buffers 11, 43, 23, 33, 15, 44, 54, 64, 97, 10, 99, 18 allocated to Disk 3]
Among the buffers allocated to a device:
  some are waiting to do DMA
  some are currently doing DMA
  others have finished DMA
Buffer header has a flag
(I/O wait queue) within (dev)
19
[Figure: buffers 11, 43, 23, 33, 15, 44, 54, 18 on Disk 3; some are on the I/O wait queue (waiting to do DMA), others have done DMA]
Some buffers are waiting for disk I/O
20
struct buf
{
    int b_flags;           /* see defines below */
    struct buf *b_forw;    /* headed by devtab of b_dev */
    struct buf *b_back;    /* " */
    struct buf *av_forw;   /* position on free list, */
    struct buf *av_back;   /* if not BUSY */
    int b_dev;             /* major+minor device name */
    char *b_blkno;         /* block # on device */
    int b_wcount;          /* transfer count (usu. words) */
    char b_error;          /* returned after I/O */
    char *b_addr;          /* low order core address */
    char *b_xmem;          /* high order core address */
} buf[NBUF];

struct buf bfreelist;
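A minimal sketch of how the av_forw / av_back links can keep the free list in LRU order. It assumes a reduced `struct buf` with only the free-list fields, and hypothetical helper names (`freelist_init`, `brelse_sketch`, `getfree_sketch`). The `B_BUSY` value is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define B_BUSY 010            /* assumed flag value */

struct buf {                  /* reduced: free-list fields only */
    int b_flags;
    struct buf *av_forw;      /* next on free list */
    struct buf *av_back;      /* previous on free list */
};

/* An empty circular list: the head points to itself. */
void freelist_init(struct buf *head)
{
    head->av_forw = head->av_back = head;
}

/* Release a buffer: append at the tail (most recently used end). */
void brelse_sketch(struct buf *head, struct buf *bp)
{
    assert(bp != head);
    bp->b_flags &= ~B_BUSY;
    bp->av_back = head->av_back;
    bp->av_forw = head;
    head->av_back->av_forw = bp;
    head->av_back = bp;
}

/* Take the buffer at the head (least recently used); NULL if none. */
struct buf *getfree_sketch(struct buf *head)
{
    struct buf *bp = head->av_forw;
    if (bp == head)
        return NULL;
    bp->av_back->av_forw = bp->av_forw;   /* unlink from free list */
    bp->av_forw->av_back = bp->av_back;
    bp->av_forw = bp->av_back = NULL;
    bp->b_flags |= B_BUSY;
    return bp;
}
```

Releasing at the tail and allocating from the head means the buffer that has been free the longest is reused first.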
21
struct devtab
{
    char d_active;         /* busy flag */
    char d_errcnt;         /* error count (for recovery) */
    struct buf *b_forw;    /* first buffer for this dev */
    struct buf *b_back;    /* last buffer for this dev */
    struct buf *d_actf;    /* head of I/O queue */
    struct buf *d_actl;    /* tail of I/O queue */
};
[Figure: struct devtab (d_active, b_forw, b_back, d_actf, d_actl) heading the device list of buffers 11, 43, 23, 33, 15, 44, 54, 64, 97, 10, 99, 18; d_actf and d_actl point to the I/O waiting buffers]
22
Remember: the OS kernel is a plain C program with variables and functions
[Figure: inside the kernel, a table (data structure) for each object (hardware or software): one PCB each for Process 1, Process 2, Process 3, and tables for CPU, mem, disk, tty]
23
Kernel Data Structure
[Figure: Process 1 (user, CPU state, inode, offset) calls disk_read( ), which goes through devswtab and devtab to the buffer cache; on disk, the tree / (bin, etc; cc, date, sh, getty, passwd) is stored as superblock, inode, data]
24
– Each buffer header has 4 link fields
– buf can belong to two doubly linked lists at a time
– read(fd) system call
  • get offset
  • get inode
    – checks access permission (rwx rwx rwx)
    – mapping: offset → sector address
    – get major/minor device number
  • search buffer cache (buffer header has disk & sector #)
    – start from device table, traverse the links
    – compare each buffer with sector address
  • if already in buffer cache, done
  • if miss, then arrange to read from disk
[Figure: user → file (fd, offset) → inode → dev]
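The "offset → sector address" mapping can be sketched as a division by the block size followed by a lookup in the inode. This assumes an inode with direct block pointers only; `inode_sketch`, `bmap_sketch`, `BSIZE` and `NDIRECT` are hypothetical names and values.

```c
#include <assert.h>

#define BSIZE   512   /* assumed block size in bytes */
#define NDIRECT 8     /* assumed number of direct block pointers */

struct inode_sketch {            /* hypothetical, direct pointers only */
    int dev;                     /* major/minor device number */
    int addr[NDIRECT];           /* sector address of each logical block */
};

/* Map a byte offset in the file to (sector, offset within sector).
 * Returns 0 on success, -1 if the offset is beyond the direct blocks. */
int bmap_sketch(const struct inode_sketch *ip, long offset,
                int *sector, int *boff)
{
    assert(offset >= 0);
    long lblk = offset / BSIZE;          /* logical block number in file */
    if (lblk >= NDIRECT)
        return -1;
    *sector = ip->addr[lblk];
    *boff = (int)(offset % BSIZE);
    return 0;
}
```

The pair (ip->dev, sector) is then the key used to search the buffer cache.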
25
– read() system call
  { fd → offset → inode → device → search buffer list }
  if (hit) then
      done                          /* return data from buffer cache */
  else                              /* buffer cache miss: must read disk */
      if (free buf available?) then
          /* using this free buffer, read disk */
          get buf → read disk → fill buf → done
      else                          /* need replacement first */
          get least recently used (LRU) buffer
          if (dirty?) write old content first    /* delayed write */
          read disk → fill buf → done
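The read path above can be sketched in runnable form, assuming a toy two-buffer cache over an in-memory "disk" and a timestamp to approximate LRU. `bread_sketch`, `cbuf` and all sizes are hypothetical; a real kernel uses the linked lists shown earlier rather than a scan with timestamps.

```c
#include <assert.h>
#include <string.h>

#define NBUF  2       /* tiny cache so replacement is easy to see */
#define NBLK  8       /* blocks on the hypothetical disk */
#define BSIZE 16      /* toy block size */

static char disk[NBLK][BSIZE];
static int  disk_reads, disk_writes;

struct cbuf {
    int  valid;                 /* 0 = free (never used) */
    int  blkno;
    int  dirty;
    unsigned long last_used;    /* LRU timestamp */
    char data[BSIZE];
};

static struct cbuf cache[NBUF];
static unsigned long clock_tick;

/* Return a buffer holding block blkno, reading the disk on a miss. */
struct cbuf *bread_sketch(int blkno)
{
    struct cbuf *victim = NULL;
    assert(blkno >= 0 && blkno < NBLK);

    for (int i = 0; i < NBUF; i++) {             /* search buffer list */
        struct cbuf *b = &cache[i];
        if (b->valid && b->blkno == blkno) {     /* hit */
            b->last_used = ++clock_tick;
            return b;
        }
        /* Prefer a free buffer; otherwise the least recently used. */
        if (victim == NULL ||
            (victim->valid &&
             (!b->valid || b->last_used < victim->last_used)))
            victim = b;
    }
    if (victim->valid && victim->dirty) {        /* delayed write, now */
        memcpy(disk[victim->blkno], victim->data, BSIZE);
        disk_writes++;
    }
    memcpy(victim->data, disk[blkno], BSIZE);    /* read disk, fill buf */
    disk_reads++;
    victim->valid = 1;
    victim->blkno = blkno;
    victim->dirty = 0;
    victim->last_used = ++clock_tick;
    return victim;
}
```

A caller that modifies `data` sets `dirty = 1` itself; the old content is written back only when that buffer is chosen for replacement.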
26
mounting
System can have many file systems
Compare with Windows {C: D: E: ...}
27
[Figure: three disks, each laid out as Bootblock, Superblock, Inode list, Data block; logically they are file systems FS 1, FS 2, FS 3]
At bootup time, specify which F.S. to boot as the "root file system"
28
[Figure: FS 1, FS 2, FS 3 on dsk1, dsk2, dsk3 (each: Bootblock, Superblock, Inode list, Data block); the "root file system" has the tree / with bin, etc, usr and, below them, date, sh, getty, passwd]
Now all files under the root file system can be accessed
But how do we access files in other file systems?
Windows: C: D: E:
29
[Figure: FS 1, FS 2, FS 3 on dsk1, dsk2, dsk3 (each: Bootblock, Superblock, Inode list, Data block); the root file system has the tree / with bin, etc, usr (date, sh, getty, passwd); the file system on /dev/dsk3 has its own tree / with bin, include, src (uts, stdio.h, banner, yacc)]
Mount it!
30
[Figure: after mounting, the tree / with bin, etc, usr (date, sh, getty, passwd) continues under usr with bin, include, src (uts, stdio.h, banner, yacc)]
System call: mount(path1, path2, option)
  dev special file: /dev/dsk3 (which)
  mount point: /usr (where)
  option, for example read-only (how)
After mounting, /dev/dsk3 is accessed as /usr
[Figure: each disk keeps its own root, superblock and i-numbers (i-numbers in disk-1, i-numbers in disk-2)]
31
[Figure: the merged tree / with bin, etc, usr (date, sh, getty, passwd; bin, include, src; uts, stdio.h, banner, yacc)]
Mount Table Entry
Purpose:
  - resolve pathname
  - locate superblock
[Figure: mount table entry holding inode (/usr), inode (root), superblock, device number]
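How a mount table entry helps resolve a pathname can be sketched as a lookup keyed by (device, i-number), since i-numbers are only unique within one disk. The struct layout, `mtab`, `cross_mount` and the numbers used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NMOUNT 4

struct mount_sketch {        /* hypothetical mount table entry */
    int in_use;
    int dev;                 /* device number of the mounted file system */
    int mounted_on_ino;      /* i-number of the mount point, e.g. /usr */
    int mounted_on_dev;      /* device that holds the mount point */
    int root_ino;            /* i-number of the mounted FS's root */
};

static struct mount_sketch mtab[NMOUNT];

/* During pathname resolution: if (*dev, *ino) is a mount point,
 * continue from the root inode of the mounted file system.
 * Returns 1 if a mount point was crossed, 0 otherwise. */
int cross_mount(int *dev, int *ino)
{
    assert(dev != NULL && ino != NULL);
    for (int i = 0; i < NMOUNT; i++) {
        const struct mount_sketch *m = &mtab[i];
        if (m->in_use &&
            m->mounted_on_dev == *dev && m->mounted_on_ino == *ino) {
            *dev = m->dev;
            *ino = m->root_ino;
            return 1;
        }
    }
    return 0;
}
```

Looking up /usr/bin would reach the inode of /usr on the root device, cross to the root inode of the mounted device, and search for bin there.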
32
Relationship between Tables
[Figure: Inode table (inode of /usr, inode of dsk 3 root), Mount table (Superblock, Mounted on inode, Root inode), Buffer Cache (buf)]
33
Disk File System
• Boot block
• Superblock: pointers to free space in disk
• inode list: pointers to data block
• data block
• mounting file system