Chapter 11: File System Implementationtawalbeh/nyit/csci620/slides/ch11.pdf · Chapter 11: File System Implementation ... File Concept The file system ... zFile-organization module

1

Chapter 11: File System Chapter 11: File System ImplementationImplementation

11.2 Silberschatz, Galvin and Gagne ©2005Operating System Concepts – 7th Edition, Jan 1, 2005

ObjectivesObjectives

To describe the details of implementing local file systems and directory structures

To describe the implementation of remote file systems

To discuss block allocation and free-block algorithms and trade-offs

2


File ConceptFile Concept

The file system consists of two distinct parts: collection of files and a directory structure.

The OS provides abstraction of physical properties of disks by defining a logical storage unit, the file.

A file is a collection of related information that is recorded on secondary storage.

Why files, why we don’t save info in the process itself?

Limited space

When the process terminates, the information is lost

Multiple processes must be able to access the information concurrently.


File AttributesFile Attributes

Name – only information kept in human-readable form, For the convenience of human users.

Identifier – unique tag (number) identifies file within file system

Type – needed for systems that support different types

Location – Pointer to a device and the file location on the device.

Size – current file size

Protection – controls who can do reading, writing, executing

Time, date, and user identification – data for protection, security, and usage monitoring

Information about files are kept in the directory structure, which is maintained on the disk

Usually, a directory entry consists of a filename and its ID. The ID locates the other attributes.

3


File OperationsFile Operations

A file is an abstract data type. System calls in the OS implement basic file operations:

Create: The file is created with no data (just to initialize attributes)

Open: Before using a file, a process must open it.

to allow the system to fetch the attributes and list of disk addresses into main memory for rapid access on later calls.

Close: to free up internal table space

Read: read from the current pointer position

Write: Data are written to the file usually at the current position

Reposition within file (file seek)

Delete

Truncate

Other possible operations: append, rename, get and set file attributes

Combining the above operations to perform other file operation (copying)


Directory StructureDirectory StructureA collection of nodes containing information about all files in a file system (name. location, size)

A directory is a system file for maintaining the structure of the file system

F 1 F 2F 3

F 4

F n

Directory

Files

Both the directory structure and the files reside on disk (or a partition)A disk can contain multiple file systemsDirectories can be organized in several different ways.

4


Operations Performed on DirectoryOperations Performed on Directory

Search for a file

Create a file

Delete a file

List a directory

Rename a file


Issues for Directory Organization

Efficiency – locating a file quickly

Naming – convenient to users

two users can have same name for different files

the same file can have several different names

Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …)

5


Directory StructureDirectory StructureSingleSingle--Level DirectoryLevel Directory

A single directory for all users, in which all files are contained.

Was common in early days when there was only one user.

Simple and easy to find files (one place to look in)

Naming problem

Grouping problem


TwoTwo--Level DirectoryLevel Directory

Separate directory for each user

Path name: defined by the username and the file name.Each OS has specific format for the path name.

System files are searched through search path rather than copied to each user directory

Different users can have files with the same name

Efficient searching

No grouping capability

6


TreeTree--Structured DirectoriesStructured Directories


TreeTree--Structured Directories (Cont)Structured Directories (Cont)

Efficient searching

Grouping Capability (subdirectories)

Current directory (working directory): should contain most of the files of interest to the process, else a user specify new directory

Special system calls (cd) are provided to change directories.

Absolute or relative path names.

Absolute – specify the name starting from the top (root) directory.

Example: /users/smith/project1/main.c

Relative – specify the name starting from the current directory.

Example: main.c

Search path for command names.

7


TreeTree--Structured Directories (Cont)Structured Directories (Cont)

Creating a new file is done in current directoryDeleting a directory – two options:

Require the directory to be empty before it can be deleted.Provide an option to delete the directory along with all of its

files and subdirectories.Delete a filerm <file-name>

Creating a new subdirectory is done in current directorymkdir <dir-name>

Example: if in current directory /mailmkdir count

mail

prog copy prt exp count

Deleting “mail” ⇒ deleting the entire subtree rooted by “mail”


FileFile--System StructureSystem Structure

File system resides permanently on secondary storage (disks)

For efficiency, I/O transfer between memory and a disk is performed on the units of blocks

Each block has one or more sector, a sector is usually 512 bytes

OS imposes a file system to store and access data. 2 problems:

Define how the file system should look to the user. It involves defining a file, its attributes, operations on a file, directorystructure. USER VIEW

Creating algorithms and data structure to map the logical file system into the physical secondary-storage device. SYSTEM VIEW

8


FileFile--System Structure (System Structure (contdcontd))

A file system is generally organized into layers:

I/O control – device drivers and interrupt handlers to transfer info between main memory and the disk system

The device driver translates a high level commands ( such as get block 123) into hardware-specific instructions.

Basic file system – implements basic operations (commands).

File-organization module – knows about files and their logical and physical blocks (location).

Tracks also the free space

Logical file system – manages metadata, i.e. information about the files.

Manages the directory structure via a file control block (FCB), which contains info about the file including ownership, permissions, and location.


Layered File SystemLayered File System

9


On-Disk File System Structures

File system structures that reside on disk generally include:

A boot control block (boot block, boot sector)

contains information needed to boot the OS. If no OS, this block is empty

Usually at the beginning of each partition (disk)

A volume control block contains volume (partition) information such as the number of blocks, size of blocks, free-block count and free-block pointers.

(UFS: superblock. NTFS: master file table.)

A directory structure per file system is used to organize files.

A file control block for each file contains details about the file, including permissions, ownership, size, and location of data blocks. (UFS: inode. NTFS: stored within the master file table.)


A Typical File Control BlockA Typical File Control Block

10


InIn--Memory File System StructuresMemory File System Structures

Information about files that is kept in memory by the operating system may include:

A mount table with information about each mounted volume.

A directory-structure cache.

The system-wide open-file table, with a copy of the FCB for each open file, along with other information.

The per-process open-file tables, which contain pointers to appropriate entries in the system-wide open-file table, and other information.

The following slide illustrates some of the in-memory structures kept by the operating system.

Figure (a) refers to opening a file.

Figure (b) refers to reading a file.


InIn--Memory File System StructuresMemory File System Structures

11


Directory ImplementationDirectory Implementation

Linear list of file names with pointer to the data blocks.

simple to program

time-consuming to execute requires linear search

To create a new file, the directory must be searched first to make sure that no existing file has the same name.

Also the same for deleting a file.

Hash Table – linear list stores the directory entries but with hash data structure. (hash function takes filename as input and returns a pointer to the filename in the list)

decreases directory search time

collisions – situations where two file names hash to the same location

Another problem is fixed size of hash tables. If the directory size must increase, a new hash function is needed. (ex: mod 64 to mod 128)


File Implementation File Implementation (Allocation Methods)(Allocation Methods)

Disk blocks must be allocated to files. The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly.

keeping track of which disk blocks go with which file

There are three major methods of allocating disk space to files:

Contiguous allocation

Linked allocation

Indexed allocation

12


Contiguous AllocationContiguous AllocationEach file occupies a set of contiguous blocks on the diskProvides efficient direct access.

Accessing block b+1 after block b requires no head movement, so the number of head seeks is minimal.

Simple addressing scheme: only starting location (block #) and length (number of blocks) are required

The directory entry for each file indicates the address of the starting block and the number of blocks required

Both sequential and direct access can be supportedA dynamic storage-allocation problem.

First fit and best fit are the most common used strategies.External fragmentation can be a problem.

Another problem is knowing how much space is needed for the file (file size) when it is createdFiles cannot grow, so if the allocated space is not enough any more:

Terminate the user program, with an appropriate error messageFind larger hole and copy the file into it.

Pre-allocation of space is generally required if the file size is known in advance, However, the file might reach its final size over long period (may be years) which can cause significant internal fragmentation.


Contiguous Allocation of Disk SpaceContiguous Allocation of Disk Space

13


Linked AllocationLinked Allocation

Each file is a linked list of fixed-size disk blocks.

Blocks may be scattered anywhere on the disk.

Allocate blocks as needed as the file grows, wherever they are available on the disk

The directory contains a pointer to the first and last blocks of the file

Each block contains a pointer to the next block (not accessible by users)


Linked Allocation (Cont.)Linked Allocation (Cont.)

Simple – only need to keep the starting address of each file.

No external fragmentation. Any free block can be used to satisfy a request for more space.

Creating a file is easy, declared of size zero (null pointer to the first block) then grow easily as long as free blocks are available

No efficient direct access.

Effective for sequential access

Pointers saved in the blocks consume some space

Use of clusters (a set of blocks which the system deals with as a unit)

Decreases overhead of pointers (4 blocks = 1 cluster needs 1 ptr)

Increases throughput (fewer head seeks).

Increases internal fragmentation.

Reliability: if a pointer is lost, blocks (clusters) can’t be traversed

14


Linked Allocation with FAT

Variation: File-Allocation Table (FAT)

A section of disk at the beginning is set a side to store the table.

FAT: One entry per block or cluster, and is indexed by block no. Link pointers only are kept in the FAT.

taking the pointer word from each disk block and putting it in the FAT table in memory

The directory entry contains the block no. of the first block

Special end-of-file is saved in the last pointer.

The FAT should be cached in memory to reduce head seeks.

Random access is much easier

Used by MS-DOS


Indexed Allocation (iIndexed Allocation (i--node)node)

Linked allocation solves the external fragmentation and size-declaration problem, but it can’t support efficient direct access with the absence of FAT

Indexed allocation brings all pointers together into the index block.

Each file has its own index block, which is an array of disk-block addresses

The directory contains the address of the index block, from which the ith block can be accessed directly

Logical view.

Block 5571396

index block

Block7

Block 13

Block 9

Block 6

15


Example of Indexed AllocationExample of Indexed Allocation


Indexed Allocation (cont.)

More efficient direct access than linked allocation.

No external fragmentation. Any free block can be used to satisfy a request for more space.

Suffers from a wasted space due to the use of index block. (pointer overhead is larger)

How large the index block should be? What if a file is too largefor one index block?

Linked scheme: use one block. To allow larger files, link several index blocks

Multilevel index: use a first-level index block to point to a second-level index blocks

Combined scheme: used in UNIX.

16


Indexed Allocation (Cont.) Indexed Allocation (Cont.) –– Multilevel IndexMultilevel Index

M

outer-index

index table file


Combined Scheme: UNIX (4K bytes per block)Combined Scheme: UNIX (4K bytes per block)

17


Efficiency and PerformanceEfficiency and Performance

Disks are the main bottleneck in a system performance, so they need to be improved in terms of efficiency and performance

Efficiency depends on:

disk allocation and directory algorithms

Clustering reduces head seeks and improve file transfer

types of data kept in file’s directory entry

Do we need to keep last access, last write dates for example. If yes we need to update these fields whenever a file is read or written

Pointer size.

Performance: it can be improved even after the algorithms are selected

A buffer (disk) cache can be used to keep frequently used file blocks in mem.

Page cache: cache file data as pages rather than as system blocks.

Memory mapped files can improve performance through page caching

Synchronous and asynchronous writes

free-behind and read-ahead – techniques to optimize sequential access


RecoveryRecoveryCare must be taken to ensure that system failure does not result in data loss or data inconsistency.

Consistency checking – compares data in directory structure kept in memory with data blocks on disk, and tries to fix inconsistencies

performing a backup on an active file system might lead to inconsistency. (If files and directories are being added, deleted, and modified during the dumping process, the resulting backup may beinconsistent )

Example: fsck on Unix or chkdsk in DOS

Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical)

Full backup: should the entire file system be backed up or only part of it?

it is usually desirable to back up only specific directories andeverything in them rather than the entire file system.

Incremental backup: back up files that have not changed since the last backup

Recover lost file or disk by restoring data from backup

18


Log Structured File SystemsLog Structured File Systems

We have faster CPUs, faster and larger memories, larger but SLOW diskswrites are done in very small chunks very inefficient due to high seek time

Ex: creating a file requires the i-node for the directory, the directory block, the i-node for the file, and the file itself must all be written

The goal of LFS is to achieve the full bandwidth of the disk, even in the face of a workload consisting in large part of small random writes Log structured (or journaling) file systems record each small update to the file system as a transaction

Used in NTFS, optional in Solaris UNIXAll operations for a transactions are written to a log (the disk is viewed as a log)

A transaction is considered committed once all of its operations have been successfully written to the log. Then the system call can return to the userHowever, the file system may not yet be updated

The transactions in the log are asynchronously written to the file systemWhen the file system has been successfully updated, the transactions are

removed from the log.After a system crash, all remaining committed transactions in the log are performed. Uncommitted transactions are rolled back.

Summary: all writes are initially buffered in memory, and periodically all the buffered writes are written to the disk in a single segment, at the end of the log


RAID StructureRAID Structure

RAID – multiple disk drives provides reliability via redundancy.distributes data across several physical disks which look to the operating system and the user like a single logical disk

Idea: Use many disks in parallel to increase storage bandwidth, improve reliability

Files are striped across disks

Each stripe portion is read/written in parallel

Bandwidth increases with more disks, but more error prone

RAID is arranged into six different levels. RAID 0 distributes data across several disks in a way which gives improved speed and full capacity, but all data on all disks will be lost if any one disk fails.

RAID 1 (mirrored disks) uses two (possibly more) disks which each store the same data, so that data is not lost so long as one disk survives

RAID 5 combines three or more disks in a way that protects data against loss of any one disk

19


RAID (cont)RAID (cont)

Several improvements in disk-use techniques involve the use of multiple disks working cooperatively.

key concepts in RAID:

mirroring, the copying of data to more than one disk;

Used for reliability purposes

striping, the splitting of data across more than one disk;

Used for performance. More disks in the array means higher bandwidth, but greater risk of data loss

error correction, where redundant data is stored to allow problems to be detected and possibly fixed.


RAID LevelsRAID LevelsRAID 0: Striped set without parity:

Provides improved performance and additional storage but no fault tolerance. Any disk failure destroys the array, which becomes more likely with more disks in the array

RAID1: Mirrored set without parity.Provides fault tolerance from disk errors and single disk failure

RAID2: Redundancy through Hamming codeRAID3: Striped with interleaved parity

Dedicated disk for parity writing bottleneck

RAID4: striped with Block level parity.Dedicated disk for parity writing bottleneck

RAID5: Striped with distributed parityEach disk have some blocks that hold the corresponding blocks parityUpon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user

RAID6: Striped set with dual parityeach group of blocks have two distributed parity blocks that are distributed across the disks

Provides fault tolerance from two drive failures

20


RAID (0 + 1) and (1 + 0)RAID (0 + 1) and (1 + 0)


Fast File System

The original Unix file system had a simple, straightforward implementation

Easy to implement and understand

But very poor utilization of disk bandwidth (lots of seeking)

BSD Unix folks did a redesign (mid 80s) that they called the Fast File System (FFS)

Improved disk utilization, decreased response time

Now the FS from which all other Unix FS’s have been compared

Good example of being device-aware for performance

21


Data and Inode Placement

Original Unix FS had two placement problems:

1. Data blocks allocated randomly in aging file systems

Blocks for the same file allocated sequentially when FS is new

As FS “ages” and fills, need to allocate into blocks freed up when other files are deleted

Problem: Deleted files essentially randomly placed

So, blocks for new files become scattered across the disk

2. Inodes allocated far from blocks

All inodes at beginning of disk, far from data

Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks

Both of these problems generate many long seeks


Cylinder Groups

BSD FFS addressed these problems using the notion of a cylinder group

Disk partitioned into groups of cylinders

Data blocks in same file allocated in same cylinder

Files in same directory allocated in same cylinder

Inodes for files allocated in same cylinder as file data blocks

Free space requirement

To be able to allocate according to cylinder groups, the disk must have free space scattered across cylinders

10% of the disk is reserved just for this purpose

22


Other Problems

Small blocks (1K) caused two problems:

Low bandwidth utilization

Small max file size (function of block size)

Fix using a larger block (4K)

Very large files, only need two levels of indirection for 2^32

Problem: internal fragmentation

Fix: Introduce “fragments” (1K pieces of a block)

Problem: Media failures

Replicate master block (superblock)

End of Chapter 11End of Chapter 11

Documents

Chapter 11: File System Implementationtawalbeh/nyit/csci620/slides/ch11.pdf · Chapter 11: File System Implementation ... File Concept The file system ... zFile-organization module