PERSISTENCE: FILE APIpages.cs.wisc.edu/~shivaram/cs537-sp20-notes/file-api/cs537-file-ap… · File...

PERSISTENCE: FILE API

Shivaram VenkataramanCS 537, Spring 2020

ADMINISTRIVIA

Midterm grades are up!

Project 4a: due tomorrow at 10pmDiscussion today:

Some debugging hints (valgrind), P4b preview

AGENDA / LEARNING OUTCOMES

How do we achieve resilience against disk errors?

How to name and organize data on a disk?

What is the API programs use to communicate with OS?

ApplicationBuild logical disk from many physical disks.

Logical Disk

RAID: Redundant Array of Inexpensive Disks

Logical disk gives

capacity,

performance,

reliability

MetricsCapacity: how much space can apps use?

Reliability: how many disks can we safely lose? (assume fail stop)

Performance: how long does each workload take? (latency, throughput)

Normalize each to characteristics of one disk

Different RAID levels make different trade-offs

RAID Level Comparisons

Reliability Capacity Read latencyWrite

Latency Seq Read Seq Write Rand ReadRand Write

RAID-0 0 C*N D D N * S N * S N * R N * R

RAID-1 1 C*N/2 D D N/2 * S N/2 * S N * R N/2 * R

Raid-4 Strategy

Use parity disk

If an equation has N variables, and N-1 are known, you can solve for the unknown.

Treat sectors across disks in a stripe as an equation.

Data on bad disk is like an unknown in the equation.

RAID 4: Example

Stripe:

Disk0 Disk1 Disk2 Disk3 Disk4

What functions can we use to compute parity?

parity

RAID-4: AnalysisWhat is capacity?How many disks can fail?Latency (read, write)?

1 0 1 1 1

N := number of disksC := capacity of 1 diskS := sequential throughput of 1 diskR := random throughput of 1 diskD := latency of one small I/O operation

parity

0 1 1 0 0

1 1 0 1 1

RAID-4: Throughput

What is steady-state throughput for- sequential reads?- sequential writes?- random reads?- random writes? (next page!)

0 0 1 1 0

(parity)

RAID-4: ADDITIVE vs SUBTRACTIVE

0 0 1 1 XOR(0,0,1,1)

Additive Parity Subtractive Parity

C0 C1 C2 C3 P0

RAID-5

- - - - P

- - - P -

- - P - -

Rotate parity across different disks

RAID-5: AnalysisWhat is capacity?

How many disks can fail?

Latency (read, write)?

N := number of disksC := capacity of 1 diskS := sequential throughput of 1 diskR := random throughput of 1 diskD := latency of one small I/O operation

- - - - P

- - - P -

- - P - -

RAID-5: ThroughputWhat is steady-state throughput for RAID-5?- sequential reads?- sequential writes?- random reads?- random writes? (next page!)

- - - - P

- - - P -

- - P - -

RAID-5 Random WRITES

RAID Level Comparisons

Reliability Capacity Read latencyWrite

Latency Seq Read Seq Write Rand ReadRand Write

RAID-0 0 C*N D D N * S N * S N * R N * R

RAID-1 1 C*N/2 D D N/2 * S N/2 * S N * R N/2 * R

RAID-4 1 (N-1) * C D 2D (N-1)*S (N-1)*S (N-1)*R R/2

RAID-5 1 (N-1) * C D 2D (N-1)*S (N-1)*S N * R N/4 * R

Summary

RAID: a faster, larger, more reliable disk system

One logical disk built from many physical disk

Different mapping and redundancy schemes present different trade-offs

DISKS à FILES

What is a File?

Array of persistent bytes that can be read/written

File system consists of many files

Refers to collection of filesAlso refers to part of OS that manages those files

Files need names to access correct one

Three types of names

– Unique id: inode numbers– Path– File descriptor

locationsize=12

inodes

0location

size1location

size2locationsize=63…

Meta-data

File API (attempt 1)

read(int inode, void *buf, size_t nbyte)write(int inode, void *buf, size_t nbyte)seek(int inode, off_t offset)

Disadvantages?

- names hard to remember- no organization or meaning to inode numbers- semantics of offset across multiple processes?

String names are friendlier than number namesFile system still interacts with inode numbers

Store path-to-inode mappings in a special file or rather a Directory!

locationsize=12

inodes

0location

size1location

“readme.txt”: 3, “hello”: 0, …

Directory Tree instead of single root directoryFile name needs to be unique within a directory

/usr/lib/file.so/tmp/file.so

Store file-to-inode mapping in each directory

locationsize=12

inodes

0location

size1location

“readme.txt”: 3, “hello”: 0, …

Example: read /hello

Reads for getting final inode called “traversal”

read(char *path, void *buf, off_t offset, size_t nbyte)write(char *path, void *buf, off_t offset, size_t nbyte)

Disadvantages?

Expensive traversal! Goal: traverse once

File Descriptor (fd)

Idea: Do expensive traversal once (open file)Store inode in descriptor object (kept in memory).Do reads/writes via descriptor, which tracks offset

Each process:File-descriptor table contains pointers to open file descriptors

Integers used for file I/O are indexes into this tablestdin: 0, stdout: 1, stderr: 2

int fd = open(char *path, int flag, mode_t mode)read(int fd, void *buf, size_t nbyte)write(int fd, void *buf, size_t nbyte)close(int fd)

advantages:- string names- hierarchical- traverse once- offsets precisely defined

FD Table (xv6)

struct file {...struct inode *ip;uint off;

// Per-process state struct proc {... struct file *ofile[NOFILE]; // Open files...

struct {struct spinlock lock;struct file file[NFILE];

} ftable;

Code Snippet

012345

offset = inode =

fdsfd table

location = …size = …

“file.txt” points here

int fd1 = open(“file.txt”); // returns 3read(fd1, buf, 12);int fd2 = open(“file.txt”); // returns 4int fd3 = dup(fd2); // returns 5

READ NOT SEQUENTIALLYoff_t lseek(int filedesc, off_t offset, int whence)

If whence is SEEK_SET, the offset is set to offset bytes.If whence is SEEK_CUR, the offset is set to its current

location plus offset bytes.If whence is SEEK_END, the offset is set to the size of

the file plus offset bytes.

struct file {...struct inode *ip;uint off;

QUIZ 24

Offset for fd1

Offset for fd2

Offset for fd3

https://tinyurl.com/cs537-sp20-quiz24

WHAT HAPPENS ON FORK?

Communicating Requirements: fsync

File system keeps newly written data in memory for awhileWrite buffering improves performance (why?)

But what if system crashes before buffers are flushed?

fsync(int fd) forces buffers to flush to disk, tells disk to flush its write cacheMakes data durable

Deleting Files

There is no system call for deleting files!

Inode (and associated file) is garbage collected when there are no references

Paths are deleted when: unlink() is called

FDs are deleted when: close() or process quits

rename

rename(char *old, char *new):- deletes an old link to a file- creates a new link to a file

Just changes name of file, does not move dataEven when renaming to new directory

What can go wrong if system crashes at wrong time?

Atomic File Update

Say application wants to update file.txt atomicallyIf crash, should see only old contents or only new contents

1. write new data to file.txt.tmp file

2. fsync file.txt.tmp3. rename file.txt.tmp over file.txt, replacing it

Summary

Using multiple types of name provides convenience and efficiency

Special calls (fsync, rename) let developers communicate requirements to file system

Next class: Directory features, Filesystem implementation

Discussion: Debugging parallel code, P4b

PERSISTENCE: FILE APIpages.cs.wisc.edu/~shivaram/cs537-sp20-notes/file-api/cs537-file-ap… · File...

Documents

iNode-PSenseintellect-module.ru/downloads/manuals/iNode-Psense/iNode-PSense… · snmp это протокол управления сетями связи на основе архитектуры

Disk Scheduling Algorithms - Cal Poly Pomonakanluezhang/cs537/presentation/CS537-DiskScheduling...CPP CS537 Spring 2015 Disk Scheduling Algorithms (Chapter 41. Scheduling in Secondary

Provenance Tracking System (PTS) - MITweb.mit.edu/6.033/2012/ · extensibility. I assume a versioning file system in which every file has a unique inode number and all subsequent

UNIX File Systems - Oakton Community College · system • INODE structures (UNIX i-node list) are stored on the file system block device (e.g., disk) in a predefined location on

Operating Systems: File Systems Œ p. 1/37mandar/os/filesys.pdf · Operating Systems: File Systems Œ p ... Algorithm: increment file system free inode count; if (SB is locked)

PERSISTENCE: FSCK, JOURNALINGpages.cs.wisc.edu/~shivaram/cs537-sp20-notes/fsck/cs537... · 2020. 4. 14. · FSCK = file system checker Strategy: After crash, scan whole disk for contradictionsand

iNode C - 35D Сетевой WEB/SNMP контроллер

UNIX File Systems - Registration - · PDF fileUNIX File Systems (Chap 4. in the book ... ialloc/ifree : assignment of a disk inode ... Algorithm ialloc /* allocate inode */ Input :

PERSISTENCE: JOURNALING, LFSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/lfs/... · LFS SUMMARY Journaling: Put final location of data wherever file system chooses (usually in a place

CS 537: Introduction to Operating Systems Fall 2015 ...pages.cs.wisc.edu/~dusseau/Classes/CS537/Fall2015/... · Basic File System Operations and Data Structures [3 points each] These

CS537 Numerical Analysis - Computer Science …jzhang/CS537/lecture3.pdf · CS537 Numerical Analysis Lecture 3 Numerical Integration Professor Jun Zhang Department of Computer Science

Squirrel Programming Language By Nandini Bhatta CS537 Summer 2008

CS537 Randomness in Computing - cs-people.bu.edu

An Introduction to NFS - VM · 2002-03-19 · Introduction to File Systems The most widely used on Linux is ext2fs (extended 2 file system) Every file is represented by an “inode”

File System Architecture Pathname File Tree Mounting File Types inode and file Link File Access Mode Changing File Owner FreeBSD bonus

A Fast File System for UNIX - Cornell University...A Fast File System for UNIX • 183 descriptor associated with it called an inode. An inode contains information describing ownership

Introduction to Tizen Platform - SKKUnyx.skku.ac.kr/wp-content/uploads/2016/03/Lecture7-Week... · 2016-05-10 · inode 8 /home/ Directory file a.out 9 inode 9 inode 10 inode 11 inode

PERSISTENCE: FILE APIpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/file-api/cs537-file-ap… · File API (attempt 1) read(int inode, void *buf, size_t nbyte) write(int inode, void

CONCURRENCY: LOCKSpages.cs.wisc.edu/~shivaram/cs537-sp20-notes/locks/cs537... · 2020. 2. 25. · Associate queue of waiters with each lock (as in previous implementation) Multiprocessor

Project 6 - Princeton University Computer ScienceDirectories, part 1 •Like a file: List of files and directories –Name to inode number mapping •Can read it like a file –Use