UNIX FILE SYSTEM

NEZER J. ZAIDENBERG

Hashlama (make-up sessions)

Since it’s raining and my GF wants to cuddle and I hate it (and because I was sick)

Sunday 11.1 17-18

Tuesday 13.1 12-13 17-18 18-19

Will be held in room 4

Will focus on debugging skills for UNIX programs and on problems I will note on your ex. 2 in user land (similar to the horrors presentation)

You are welcome to join (but we will study nothing new)

AGENDA

UFS

EXT2

EXT3, EXT4 – NEW FEATURES

/PROC FILE SYSTEM

VIRTUAL FILE SYSTEM

HOW TO CODE FILE SYSTEM DRIVERS (next class)

WHAT HAPPENS IN THE KERNEL WHEN WE OPEN A FILE (next class)

REFERENCES

UNIX FILESYSTEMS – S. PATE

ADVANCED PROGRAMMING IN THE UNIX ENVIRONMENT CHAPTERS

UNDERSTANDING THE LINUX KERNEL

WHAT PURPOSE DOES A FILE SYSTEM SERVE

Manage used and free blocks on the disks

Manage multiple files

Manage multiple devices

User permissions

And more (wear leveling, links, devices)

Something about physical disk drives and logical partitions

Hard drive - where data is kept

JBOD – just a bunch of disks (several hard drives that do nothing really special)

RAID – redundant array of independent disks = several disks that operate in a special way to improve read/write performance or reliability, usually at the cost of disk space or reliability. For example, mirroring = using 2 disks, data is saved to both disks: up to double read performance, much better availability, same write speed, data takes twice the space. Striping = using two volumes, data is split between them: up to double read and write performance, but reliability is damaged (losing either volume loses the data).

A hard drive may have multiple partitions; each is treated as a separate disk for most OS related issues.

Today high end storage products (big iron from vendors such as IBM (Shark/ESS), EMC (Symmetrix, CLARiiON), HDS, SUN (STK) etc.) have many physical drives that are usually not visible to the user. Instead the machine exports several logical partitions, each of which may be mapped to one or several disks.

In this course when I use the term disk I refer to a logical partition

Hard drives

Mechanical – what most of you have in your PC

Spinning metal platters under moving heads

Slow compared to memory

Slow seek time

Relatively fast sequential read

Tend to be unreliable compared to other hardware components

In this course I assume standard (mechanical) hard drives

Solid state – new technology (no moving parts)

Slow compared to memory

No mechanical seek: random access costs about the same as sequential read

Relatively reliable

Require wear leveling (some solid state disks include wear leveling in hardware)

Other types of hardware

CDROM – fast sequential read, very slow seek, read-only

Tape drives – usually don’t have a “file system” – very, very slow seek

Types of file system

Physical file system – we take a disk (or partition!) and we want to arrange files on it.

Logical file system – a file system that presents some logical state of the system, such as /proc, /dev or /sys (those file systems show running processes, devices detected by the system, or system info). Those file systems don’t deal with real files and are beyond scope (but we acknowledge their existence).

Virtual file system – we take several physical and logical file systems and merge them into one file system.

Some definitions

Disk (n), partition (n) – where I put the files. (I don’t care about the type of disk or disk/partition semantics. I also ignore for now network file systems, logical file systems etc.)

Mount (v) – the action by which I make a file system usable by the system (occurs automatically in Windows and some unices)

Unmount (v) – making the file system no longer usable by the system, for example if I want to eject it

File (n) – unless noted otherwise I refer to a real file! (not a socket, pipe etc.; those are not written on disk)

Disk based file system

UFS (early design + not very accurate)

UFS was first available in Version 5 through Version 7 UNIX from AT&T (1970s)

UFS (UNIX file system) is the modern name of the Berkeley fast file system (FFS)

The Berkeley version was described in a 1984 paper titled “A Fast File System for UNIX” (by McKusick et al.)

UFS-derived file systems exist, in improved form, on most modern UNIX boxes (indeed the Linux ext2 file system is an almost direct extension)

Today’s UFS implementations (found in Solaris, for example) have many additional features beyond our scope. Here we describe some of the basics of the 1984 implementation. (It is easier to understand UFS first, then ext2.)

This review, which by no means attempts to be historically accurate or to describe any specific version, is helpful for understanding the ideas that UNIX file systems implement. (Note that not all ideas were introduced in one version, and with new ideas also came new optimization concepts that complicate things and that I left out.)

I ignore (as beyond the scope or no longer relevant) many considerations that were made regarding the physical positioning of the data on the disks.

Basic building blocks of UFS

Blocks and fragments

Inode

Superblock

Block

Place to store data.

512 bytes (version 7) and 4096 bytes and up (BSD 4.x versions) (I ignore fragments intentionally.)

Each block is identified by a unique address and is either used or free

A file occupies a whole number of blocks (a file either uses a block or it doesn’t)

Nothing smart here

Inode

Reference to a file

Points directly and indirectly to blocks

Contains the OS info on a file

Does not contain the file name

Each inode is identified by a unique number

The inode structure

Permissions, user id, group id etc.

Timestamps

Everything we can get from stat/fstat

Direct references to the blocks that the file is made of

Indirect references to blocks (a reference to a block containing references)

Indirect^2 references to blocks (a reference to references to references)

Etc. (ext2, for example, goes up to indirect^3 references)
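The inode fields listed above are what stat(2) hands back to user land. A minimal sketch, assuming a hypothetical helper name `print_inode_info` (the fields themselves are standard members of `struct stat`); note that no file name appears anywhere in the structure:

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>

/* Print a few inode fields for `path`; returns 0 on success, -1 on error.
 * Hypothetical helper, for illustration only. */
int print_inode_info(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    printf("inode number : %llu\n", (unsigned long long)st.st_ino);
    printf("mode (octal) : %o\n", (unsigned)st.st_mode);
    printf("owner uid/gid: %u/%u\n", (unsigned)st.st_uid, (unsigned)st.st_gid);
    printf("hard links   : %lu\n", (unsigned long)st.st_nlink);
    printf("size (bytes) : %lld\n", (long long)st.st_size);
    printf("512B blocks  : %lld\n", (long long)st.st_blocks);
    printf("mtime        : %lld\n", (long long)st.st_mtime);
    return 0;
}
```

Calling `print_inode_info("/etc/passwd")` on a typical system prints the same numbers that `ls -li` and `stat(1)` show.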

The super block

The file system catalog

General information about the file system, such as:

Number of inodes

Number of blocks

Number of used and free inodes and blocks

Maximum file size?

Maximum file size = total number of blocks that we can point to (times the block size)

Derived from the number of indirect levels of pointing to blocks

In most cases it is practically unlimited on modern UNIX boxes (but old versions had limits ranging from 2GB to a couple of terabytes)
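The arithmetic behind this limit can be sketched as follows. The function name `max_file_blocks` and the sample numbers (12 direct pointers, 1 KiB blocks, 4-byte block addresses, 3 levels of indirection) are assumptions chosen for illustration; real file systems also hit other limits, such as the width of the size field.

```c
#include <stdint.h>

/* Blocks addressable by an inode that has `ndirect` direct pointers plus
 * one pointer for each indirection level 1..levels.
 * Hypothetical helper, for illustration only. */
uint64_t max_file_blocks(uint32_t ndirect, uint32_t block_size,
                         uint32_t ptr_size, unsigned levels)
{
    uint64_t per_block = block_size / ptr_size; /* pointers per block */
    uint64_t total = ndirect;
    uint64_t reach = 1;

    for (unsigned i = 0; i < levels; i++) {
        reach *= per_block;   /* blocks reachable through level i+1 */
        total += reach;
    }
    return total;
}
```

With the sample numbers, `max_file_blocks(12, 1024, 4, 3)` gives 12 + 256 + 256^2 + 256^3 = 16843020 blocks, which is about 16 GiB of data.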

Filenames and directories

A directory is a type of file that contains a list of names and, for each name, the i-node number it refers to

(The directory can contain other directories)

The i-nodes are the files that are contained in the directory

For each i-node we have the name that will be used to access it (a file with several hard links can have several names)

Permissions on a directory: read = I can list the directory (ls(1)); write = I can create files in the directory (touch(1)); execute = I can cd into the directory
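The name-to-inode list described above can be read with the standard opendir(3)/readdir(3) calls. A minimal sketch, assuming a hypothetical helper name `list_dir`:

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>

/* Print "inode-number  name" for each entry of the directory `path`.
 * Returns the number of entries, or -1 if the directory cannot be read
 * (for example, when read permission is missing). */
int list_dir(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    int count = 0;

    if (d == NULL) {
        perror("opendir");
        return -1;
    }
    while ((e = readdir(d)) != NULL) {
        printf("%10llu  %s\n", (unsigned long long)e->d_ino, e->d_name);
        count++;
    }
    closedir(d);
    return count;
}
```

The output matches what `ls -ai` prints: each line is one directory entry, and two entries with the same number are hard links to the same file.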

Hard links

Hard links are two directory entries (names) pointing to the same i-node

Usable when two users want to work on the same file (data), each from his own directory

Also when one binary is installed under several names (such as bzip2, gcc) and decides what to do based on how it was called (check argv[0]: is it bunzip2? Is it gcc? g++?)

When a hard link is deleted the file is not deleted (but the link count in the inode is reduced by 1)

When the last remaining link is deleted, the blocks are marked free
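The link count behaviour can be observed from user land with link(2), unlink(2) and stat(2). A minimal sketch, assuming a hypothetical function `hard_link_demo` that works inside a scratch directory supplied by the caller; the file names `a` and `b` are arbitrary:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create dir/a, add a second name dir/b for it, then remove dir/a.
 * Returns 0 if the link count goes 1 -> 2 -> 1 as expected, else -1. */
int hard_link_demo(const char *dir)
{
    char a[4096], b[4096];
    struct stat sa, sb;
    int fd;

    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) return -1;
    close(fd);
    if (stat(a, &sa) != 0 || sa.st_nlink != 1) return -1;

    if (link(a, b) != 0) return -1;               /* second name */
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return -1;
    if (sa.st_ino != sb.st_ino || sa.st_nlink != 2) return -1; /* one inode */

    if (unlink(a) != 0) return -1;                /* remove one name */
    if (stat(b, &sb) != 0 || sb.st_nlink != 1) return -1; /* file survives */

    return unlink(b);   /* last name removed: blocks can be freed */
}
```

The same experiment from the shell is `touch a; ln a b; ls -li a b; rm a; ls -li b`.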

Soft (symbolic) links

Windows: shortcuts

These files contain the path where another file is located

When UNIX accesses such a file it moves to the other file and operates on it (so open(2), unless asked to operate on the symbolic link itself, actually opens the file the link points to)

Broken links

Symbolic links are not counted in the i-node’s link count

That means that if the file the symbolic link points to is deleted, we have a “broken link”

Homework - not for submission – create a broken link
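One possible way to detect a broken link from a program: lstat(2) looks at the link itself, while stat(2) follows it and fails with ENOENT when the target is gone. A minimal sketch, assuming hypothetical helper names `is_broken_symlink` and `broken_link_demo` and a scratch directory supplied by the caller:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` is a symbolic link whose target is missing,
 * 0 if it is anything else, -1 on error. */
int is_broken_symlink(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0) return -1;  /* examines the link itself */
    if (!S_ISLNK(st.st_mode)) return 0;
    if (stat(path, &st) == 0) return 0;    /* follows the link: target exists */
    return errno == ENOENT ? 1 : -1;
}

/* Create dir/target and dir/link -> target, then delete the target.
 * Returns 0 if the link is reported broken only after the deletion. */
int broken_link_demo(const char *dir)
{
    char target[4096], linkp[4096];
    int fd, broken;

    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(linkp, sizeof linkp, "%s/link", dir);

    fd = open(target, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) return -1;
    close(fd);
    if (symlink("target", linkp) != 0) return -1; /* the link stores a path */
    if (is_broken_symlink(linkp) != 0) return -1; /* target still there */

    if (unlink(target) != 0) return -1;  /* the symlink did not hold a count */
    broken = is_broken_symlink(linkp);
    unlink(linkp);
    return broken == 1 ? 0 : -1;
}
```

The relative target `"target"` is resolved relative to the directory holding the link, which is why the link works before the deletion.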

Ideas from Berkeley

Some ideas for improvement were added in UFS

Blocks were too spread out on the disk, which caused many seeks for the next block and low throughput. Therefore, UFS has a larger block size (with contiguous data)

Fragments were added to support partial usage of blocks

The super block is now replicated several times on the disk (stability and reliability as well as performance: faster seek time to the nearest superblock)

Many new features were added (but I didn’t make the distinction)

Fragments

In an effort to reduce waste while maintaining low seek times, UFS allowed blocks to be broken into fragments to store the odd ends of a file

When new data was appended to a file with fragments, the new data either went into the fragment’s block (filling the block) or the fragments were copied to a new block.
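How much space fragments save can be sketched with a little arithmetic. The function name `allocated_bytes` and the sample sizes (4096-byte blocks, 1024-byte fragments) are assumptions for illustration, not taken from a specific UFS version:

```c
#include <stdint.h>

/* Bytes allocated for a file of `size` bytes when the tail of the file may
 * be stored in fragments of `frag_size` bytes instead of a whole block.
 * Pass frag_size == block_size to model a file system without fragments. */
uint64_t allocated_bytes(uint64_t size, uint32_t block_size, uint32_t frag_size)
{
    uint64_t full_blocks = size / block_size;
    uint64_t tail = size % block_size;                        /* odd end */
    uint64_t tail_frags = (tail + frag_size - 1) / frag_size; /* round up */

    return full_blocks * block_size + tail_frags * frag_size;
}
```

For a 5000-byte file, `allocated_bytes(5000, 4096, 4096)` is 8192 (3192 bytes wasted), while `allocated_bytes(5000, 4096, 1024)` is 5120 (120 bytes wasted).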

Catalog based file system

Most file systems that reside on disk are catalog based.

There is a catalog (such as the superblock) with information about the file system

The catalog is in a specific, known place

A catalog based file system can be mounted easily (one only needs to read the file system catalog to know what’s up)

Catalog based file systems are not suitable for devices that require wear leveling (the catalog is written and accessed much more often than other parts of the file system)

Catalog based file systems are suitable for mechanical hard drives and less suitable for solid state devices (some solid state disks implement wear leveling internally, so a catalog based file system can be used)

Catalog based file systems are used nowadays in UNIX, Windows, IBM mainframes and most computer systems. File systems designed for raw flash (without hardware wear leveling) usually avoid a fixed catalog, so the OS may have to scan the entire device when mounting it, which is why such a device can take long to be recognized.

Problems with the UFS model

No log – in case of a crash we don’t know what happened with the last I/O and may have problems recovering

Fragmented files – as we have seen (also from Berkeley), when a file is broken into many blocks spread all over the disk, we need to seek for each block. This greatly reduces throughput. The Berkeley allocation algorithms and larger block sizes improved performance by a factor of 10 (i.e. 1000%!) when first implemented (compared to version 7 UFS, measured as the ability to use disk throughput); however, Berkeley still achieved only 40-50% of disk throughput

Wear leveling – the catalog is written much more than other parts of the file system

EXT2

Ext2 contains some logical performance extensions over Berkeley

Multiple block sizes

The disk is organized as several block groups, each containing a superblock copy, inodes, data blocks, and block and inode bitmaps (to assist in finding a free block/inode). Using block groups helps reduce fragmentation, as files are extended to nearby blocks

8 blocks at a time are allocated on write to further minimize file fragmentation

Ext2 added other enhancements (long file names, 4TB file system, large files (indirect^3), reserved space (for root), periodic file system check etc.) that are beyond scope
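How a bitmap assists in finding a free block or inode can be sketched as a scan for the first zero bit. This is a simplified illustration with hypothetical names (`find_first_free`, `mark_used`) and an assumed layout (bit i lives in byte i/8); it is not the kernel's own routine:

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first zero (free) bit among the first `nbits` bits of
 * `bitmap`, or -1 if every bit is set. */
long find_first_free(const uint8_t *bitmap, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            return (long)i;
    }
    return -1;
}

/* Mark bit `i` as used, as an allocator would after handing it out. */
void mark_used(uint8_t *bitmap, size_t i)
{
    bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

Because each block group has its own bitmaps, a scan like this can start in the group that already holds the file's other blocks, which keeps the file's data close together.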

Ext2 i-node 1/2

struct ext2_inode {
    __le16 i_mode;          /* File mode */
    __le16 i_uid;           /* Low 16 bits of Owner Uid */
    __le32 i_size;          /* Size in bytes */
    __le32 i_atime;         /* Access time */
    __le32 i_ctime;         /* Inode change time (not creation) */
    __le32 i_mtime;         /* Modification time */
    __le32 i_dtime;         /* Deletion Time */
    __le16 i_gid;           /* Low 16 bits of Group Id */
    __le16 i_links_count;   /* Links count */
    __le32 i_blocks;        /* Blocks count */

Ext2 i-node 2/2

    __le32 i_flags;         /* File flags */
    union { /* __le32 fields, elided on the slide */ } osd1; /* OS dependent 1 */
    __le32 i_block[15];     /* Pointers to blocks */
    __le32 i_generation;    /* File version (for NFS) */
    __le32 i_file_acl;      /* File ACL */
    __le32 i_dir_acl;       /* Directory ACL */
    __le32 i_faddr;         /* Fragment address */
    union { /* elided on the slide */ } osd2;                /* OS dependent 2 */
};
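The 15 entries of `i_block` are split into 12 direct pointers followed by one single, one double and one triple indirect pointer. A minimal sketch of how a logical block number maps to an indirection level, assuming a hypothetical function name `block_level` and 32-bit block numbers:

```c
#include <stdint.h>

enum { NDIRECT = 12 };  /* i_block[0..11] direct; [12], [13], [14] indirect */

/* Indirection level needed to reach logical block `n` of a file:
 * 0 = direct, 1..3 = single/double/triple indirect, -1 = out of range. */
int block_level(uint64_t n, uint32_t block_size)
{
    uint64_t per_block = block_size / 4;  /* 32-bit block numbers per block */
    uint64_t reach;

    if (n < NDIRECT) return 0;
    n -= NDIRECT;
    reach = per_block;
    for (int level = 1; level <= 3; level++) {
        if (n < reach) return level;
        n -= reach;           /* skip the blocks covered by this level */
        reach *= per_block;
    }
    return -1;
}
```

With 1024-byte blocks, logical blocks 0 to 11 are direct, 12 to 267 go through `i_block[12]`, 268 to 65803 through `i_block[13]`, and the rest through `i_block[14]`.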

Ext2 super block (important fields)

Inode and block counts, sizes, free counts etc.

Timestamps (mount time, write time)

Block group

User id/group id

How many blocks to pre-allocate on each write

Magic number (to identify an ext2 file system)

Following the ext2 superblock (on separate blocks) we will find the ext2 block bitmap and the ext2 inode bitmap
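Checking the magic number is how a tool can recognize an ext2 volume before trusting anything else in the superblock. A minimal sketch, assuming the usual on-disk layout (superblock 1024 bytes into the volume, 16-bit little-endian magic 0xEF53 at offset 56 inside it); the function name `has_ext2_magic` is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET    1024u   /* superblock starts 1024 bytes into the volume */
#define SB_MAGIC_OFF 56u     /* offset of the magic field in the superblock */
#define EXT2_MAGIC   0xEF53u

/* Returns 1 if the first `len` bytes of a volume image carry the ext2
 * magic number, else 0. */
int has_ext2_magic(const uint8_t *img, size_t len)
{
    size_t pos = SB_OFFSET + SB_MAGIC_OFF;
    uint16_t magic;

    if (len < pos + 2) return 0;          /* image too short */
    magic = (uint16_t)(img[pos] | (img[pos + 1] << 8)); /* little-endian */
    return magic == EXT2_MAGIC;
}
```

Note that ext3 and ext4 keep the same magic number, so this check alone does not tell the three apart.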

Log based file system

In an attempt to improve stability we implement a file system log (similar to a database log)

We record the operation we are about to perform in the log

The log will help recreate a consistent state after a crash
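The idea can be sketched with a toy in-memory model. Everything here is hypothetical (the `toy_fs` structure, one-byte "blocks", and a log assumed to survive a crash); real journals also deal with commit records, ordering and checksums:

```c
#include <stdint.h>

#define NBLOCKS 8
#define LOG_CAP 16

struct log_entry { uint32_t block; uint8_t value; };

struct toy_fs {
    uint8_t disk[NBLOCKS];          /* "real" blocks, one byte each */
    struct log_entry log[LOG_CAP];  /* the log, assumed to survive a crash */
    int log_len;
};

/* Step 1: record the operation we are about to perform. */
int toy_log_write(struct toy_fs *fs, uint32_t block, uint8_t value)
{
    if (block >= NBLOCKS || fs->log_len >= LOG_CAP) return -1;
    fs->log[fs->log_len].block = block;
    fs->log[fs->log_len].value = value;
    fs->log_len++;
    return 0;
}

/* Step 2, also used for recovery after a crash: replay every logged write
 * in order, then empty the log. Replaying the same entries twice is
 * harmless because each entry just stores a value. */
void toy_replay(struct toy_fs *fs)
{
    for (int i = 0; i < fs->log_len; i++)
        fs->disk[fs->log[i].block] = fs->log[i].value;
    fs->log_len = 0;
}
```

If the machine "crashes" between the two steps, the blocks still hold old data but the log holds the intended writes, so running `toy_replay` at the next mount brings the blocks up to date.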

Ext3/4 file system

The ext3 file system is basically a log added to the ext2 file system

Ext3 is currently (at the time of this course) the default file system in Linux

Ext4 is the next (experimental) file system

Both file systems add additional features that are beyond the scope of this course (and the usability requirements of most users)

Other disk file systems

Linux has many file system projects

ReiserFS – very fast and stable log based file system that lost popularity after its author Hans Reiser was arrested for allegedly killing his wife.

Xfs – yet another log based file system, by SGI

Jffs (and variants) – file system for solid state disks

Cdrfs – cdrom file system

Logical file system

/proc

/proc is a special file system that contains information about the system (for example the maximum size of shared memory etc.)

There is also a directory for each running process containing information about the process (CPU accounting information, open file descriptors etc.)

/proc is used by performance monitors and other programs that manipulate or monitor processes

Other logical file systems are implemented as well
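Reading /proc needs nothing beyond ordinary file I/O. A minimal sketch for Linux, assuming a hypothetical helper name `proc_self_pid`; it relies on the first field of `/proc/self/stat` being the process ID:

```c
#include <stdio.h>

/* Read this process's PID from /proc/self/stat (its first field).
 * Returns -1 if /proc is not available (for example on non-Linux systems). */
long proc_self_pid(void)
{
    FILE *f = fopen("/proc/self/stat", "r");
    long pid = -1;

    if (f == NULL) return -1;
    if (fscanf(f, "%ld", &pid) != 1) pid = -1;
    fclose(f);
    return pid;
}
```

Performance monitors work the same way: they open the per-process directories under /proc and parse the text files inside.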

Virtual file system

The virtual file system

Several file systems are accessed by the same host: the hard disk, maybe another hard disk (perhaps with a DOS partition), a DVD, a CD-R, a USB disk-on-key, and a network share or two.

Each file system is MOUNTED at a specific place in the directory tree.

UNIX also puts some “special files” in place: sockets, pipes, etc.

All those files have a name and are accessed through UNIX.

All those files are part of the virtual file system interface.
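To make the idea of mounting concrete, here is a small user-space model that maps a path to the file system holding it. It parses text in the /proc/mounts format and picks the longest matching mount point. The sample table and the function names are hypothetical, and the kernel's real path lookup is more involved than a string prefix match.

```python
import os

# Hypothetical mount table in the /proc/mounts format.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "proc /proc proc rw 0 0\n"
    "/dev/sdb1 /mnt/dos msdos rw 0 0\n"
    "server:/export /mnt/share nfs rw 0 0\n"
)


def parse_mounts(text):
    """Parse mount-table text into a list of dicts, one per mount."""
    mounts = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = parts[:3]
        mounts.append(
            {"device": device, "mount_point": mount_point, "fs_type": fs_type}
        )
    return mounts


def find_mount(path, mounts):
    """Return the mount whose mount point is the longest prefix of path."""
    path = os.path.normpath(path)
    best = None
    for mount in mounts:
        point = mount["mount_point"]
        prefix = point if point.endswith("/") else point + "/"
        if path == point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(point) >= len(best["mount_point"]):
                best = mount
    return best
```

On a Linux host, passing the contents of /proc/mounts to `parse_mounts` instead of the sample gives the live table.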

The need for VFS

We want to use files from many different file systems. Each may have a different superblock (or none at all) and different properties.

Each file system driver has to support several methods that are supposed to be common to all file systems. (When we create a new file system, we also register its methods and its name for mount to use.)

When we call mount(2), umount(2), open(2), read(2), write(2), etc., the kernel calls the VFS interface methods implemented by the file system driver (the piece of kernel code that enables us to read the files on that file system).
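The register-then-dispatch idea can be modelled in user space. The sketch below is only an analogy: the registry, the two toy drivers and every name in it are invented for illustration and do not correspond to any kernel API.

```python
import errno

# Hypothetical registry: file system name -> driver class.
REGISTERED_FILE_SYSTEMS = {}
# Hypothetical mount table: mount point -> driver instance.
MOUNT_TABLE = {}


def register_file_system(name, driver_class):
    """Make a driver available to mount() under the given name."""
    if name in REGISTERED_FILE_SYSTEMS:
        raise OSError(errno.EBUSY, f"file system {name!r} already registered")
    REGISTERED_FILE_SYSTEMS[name] = driver_class


class RamFs:
    """Toy driver that keeps file contents in a dict."""

    def __init__(self, files):
        self.files = dict(files)

    def read(self, name):
        return self.files[name]


class UpperFs(RamFs):
    """Toy driver with different internals behind the same read() method."""

    def read(self, name):
        return self.files[name].upper()


def mount(fs_name, mount_point, files):
    """Look up the driver by name and attach an instance at mount_point."""
    driver_class = REGISTERED_FILE_SYSTEMS.get(fs_name)
    if driver_class is None:
        raise OSError(errno.ENODEV, f"unknown file system type {fs_name!r}")
    MOUNT_TABLE[mount_point] = driver_class(files)


def read_file(mount_point, name):
    """Generic entry point: the caller never names the driver."""
    return MOUNT_TABLE[mount_point].read(name)


register_file_system("ramfs", RamFs)
register_file_system("upperfs", UpperFs)
```

The caller of `read_file` uses one interface for both drivers. That is the property the VFS layer gives to programs that call read(2).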

The VFS interface

vfs_mount – mount a file system

vfs_unmount – unmount a file system

vfs_root – return the root vnode of the file system (what is a vnode? Bear with me)

vfs_statfs – return file-system-specific information (the answer to statfs(2))

vfs_sync – flush data to disk

vfs_fid, vfs_vget – beyond the scope (used by network file systems)
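From user space, the statfs side of this interface can be observed with Python's `os.statvfs`, which wraps statvfs(3); on Linux that call is typically built on top of statfs(2). The function name `file_system_usage` and the choice of fields are assumptions made for this sketch.

```python
import os


def file_system_usage(path="/"):
    """Return size figures for the file system that holds path."""
    info = os.statvfs(path)
    return {
        "block_size": info.f_frsize,
        "total_bytes": info.f_frsize * info.f_blocks,
        "free_bytes": info.f_frsize * info.f_bfree,
        "max_name_length": info.f_namemax,
    }
```

Whatever file system holds the path (ext2, msdos, a network share), the caller gets the same fields back; the driver behind the mount point fills them in. In the same spirit, `os.sync()` asks the kernel to flush buffered data to disk.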

So what is a vnode?

A vnode is a kernel struct that represents a file (if the file lives on UFS or a similar file system, the vnode points to an i-node).

All file operations are done through the vnode operation vector, which contains pointers to functions that can handle the specific vnode, based on the file system the vnode belongs to. Obviously, the vector of a vnode on an ext2 file system differs from that of a vnode on an msdos file system.

Not all vnode functions have to be implemented for every type of file system (for example, one may implement a file system that does not support hard links).
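A toy model of an operation vector with an optional entry, written in user space. The dicts of functions stand in for the kernel's struct of function pointers. The handler names are hypothetical, and returning ENOTSUP for a missing entry is a choice made for this sketch rather than what any particular kernel does.

```python
import errno


def ext2_read(vnode):
    return f"ext2: read {vnode.name}"


def ext2_link(vnode, new_name):
    return f"ext2: linked {vnode.name} as {new_name}"


def msdos_read(vnode):
    return f"msdos: read {vnode.name}"


# One operation vector per file system type.
EXT2_OPS = {"read": ext2_read, "link": ext2_link}
MSDOS_OPS = {"read": msdos_read}  # no "link" entry: FAT has no hard links


class Vnode:
    """Toy vnode: a file name plus the operation vector of its file system."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = ops

    def call(self, operation, *args):
        """Dispatch through the vector; a missing entry fails with ENOTSUP."""
        handler = self.ops.get(operation)
        if handler is None:
            raise OSError(errno.ENOTSUP, f"{operation} not supported")
        return handler(self, *args)
```

Code that holds a `Vnode` calls `vnode.call("read")` without knowing which file system is behind it; only the vector differs.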

Vnode operation vector functions (partial list)

vop_select – implement select(2)

vop_rdwr – read from or write to a file

vop_link – implement link(2)

vop_rename – implement rename(2)

vop_mkdir – make a directory

vop_rmdir – remove a directory

vop_symlink – implement symlink(2)
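The user-space side of several of these operations can be exercised from a scratch directory. The sketch below assumes a UNIX host whose temporary directory supports hard and symbolic links; which kernel function ends up handling each call depends on the kernel and the file system, so the comments name only the system call family involved.

```python
import os
import tempfile


def exercise_namespace_calls():
    """Run mkdir, link, symlink, rename and rmdir inside a scratch directory."""
    results = {}
    with tempfile.TemporaryDirectory() as scratch:
        original = os.path.join(scratch, "original.txt")
        with open(original, "w") as handle:
            handle.write("data")

        subdir = os.path.join(scratch, "subdir")
        os.mkdir(subdir)  # mkdir(2) family
        results["mkdir"] = os.path.isdir(subdir)

        hard = os.path.join(scratch, "hard.txt")
        os.link(original, hard)  # link(2) family: second name, same i-node
        results["link"] = os.path.samefile(original, hard)

        soft = os.path.join(scratch, "soft.txt")
        os.symlink(original, soft)  # symlink(2) family
        results["symlink"] = os.path.islink(soft)

        renamed = os.path.join(scratch, "renamed.txt")
        os.rename(hard, renamed)  # rename(2) family
        results["rename"] = os.path.exists(renamed) and not os.path.exists(hard)

        os.rmdir(subdir)  # rmdir(2) family
        results["rmdir"] = not os.path.exists(subdir)
    return results
```

On a file system without hard links, `os.link` would raise `OSError` instead. That is the user-visible effect of a driver leaving the link operation unimplemented.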

Linux-specific stuff

Linux file system drivers are implemented as kernel modules (remember from class 1?).

A file system driver informs the system that it is a file system driver.

A file system driver supplies the system with a list of functions to call when a file operation is done on said file system (a struct with pointers to functions), and a name to pass to mount. When mounting a file system of that type, the driver's functions will be called.

A file-system-specific API is used with the new driver.
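On Linux, the names that drivers have registered can be seen from user space in /proc/filesystems: one file system type per line, prefixed with `nodev` when the type does not need a block device. The following is a minimal parsing sketch; the function names and the sample text are made up for illustration.

```python
# Hypothetical sample in the /proc/filesystems format.
SAMPLE_FILESYSTEMS = "nodev\tsysfs\nnodev\tproc\n\text3\n\tvfat\n"


def parse_filesystems(text):
    """Parse /proc/filesystems text into (name, needs_block_device) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        flag, _, name = line.rpartition("\t")
        entries.append((name.strip(), flag.strip() != "nodev"))
    return entries


def registered_filesystems(path="/proc/filesystems"):
    """Return the parsed list from the running kernel, or [] if unavailable."""
    try:
        with open(path) as filesystems_file:
            return parse_filesystems(filesystems_file.read())
    except OSError:
        return []
```

A name listed there is one that can be given as the file system type when mounting; loading a file system module typically adds a line, and unloading it removes the line.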

Other file systems

Linux also supports network file systems (file systems that are served over the network by Windows or UNIX hosts) and distributed file systems (files are stored on several computers and accessed by a group of computers).

Modules below the file system layer that implement software RAID.

File system interfaces written by several programs.

All of those are considered beyond the scope.