FILE SYSTEMS 2016 Operating Systems Design Euiseong Seo ([email protected])

FILE SYSTEMS - AndroBenchcsl.skku.edu/uploads/ECE5658S16/week14.pdf · 2016-05-30 · ext2: Disk Data Structure ¨ Block groups ¤ Used for ease of management ¤ Each group contains


File System Variations

• FAT (file allocation table) variants
  - FAT12, FAT16, FAT32
  - VFAT
  - exFAT
• ext variants
  - ext2
  - ext3
  - ext4
• NTFS
• UFS, HFS, and so on


ext2: Disk Data Structure

• Block groups
  - Used, rather than the cylinder groups (CG) of FFS, for ease of management
  - Each group contains several blocks
  - Each block in the file system is either allocated or free
  - The number of blocks is determined by the size of the partition and the block size

[Figure: on-disk layout. The disk starts with the boot block, followed by block group 0 through block group n. Each block group holds, in order: super block (1 block), group descriptors (n blocks), data block bitmap (1 block), inode bitmap (1 block), inode table (n blocks), data blocks (n blocks).]


ext2: Boot Block

• First 1,024 bytes of the disk
• Reserved for partition boot sectors
• Unused by the ext2 file system


ext2: Super Block

• Defined in struct ext2_super_block in "ext2_fs.h"
• Located at a fixed offset of 1,024 bytes from the start of the file system
• Copied at each block group boundary for backup
  - If the copies differ, the file system is corrupted
• Information types
  - Parameters determined when the file system is created; they cannot be changed afterwards
  - Tunable parameters, which can always be changed
  - Current file system state


ext2: Super Block Parameters

Main superblock parameters

Type    Field                 Description
__u32   s_inodes_count        Total number of inodes
__u32   s_blocks_count        File system size in blocks
__u32   s_r_blocks_count      Number of reserved blocks
__u32   s_free_blocks_count   Free blocks counter
__u32   s_free_inodes_count   Free inodes counter
__u32   s_first_data_block    Number of the first useful block (1 for 1 KB blocks, 0 otherwise)
__u32   s_log_block_size      Block size (block size = 1,024 << s_log_block_size)
__s32   s_log_frag_size       Fragment size
__u32   s_blocks_per_group    Number of blocks per group
__u32   s_frags_per_group     Number of fragments per group
__u32   s_inodes_per_group    Number of inodes per group
__u32   s_mtime               Time of last mount operation
__u32   s_wtime               Time of last write operation
__u16   s_mnt_count           Mount operations counter
__s16   s_max_mnt_count       Maximal mount count
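The field list above maps directly onto a fixed binary layout, so it can be decoded in a few lines. The following is a minimal sketch, assuming the fields are stored little-endian and packed in the order of the table, starting 1,024 bytes into the image. The names `parse_superblock` and `demo_image` are hypothetical helpers for illustration, and the demo values are made up.

```python
import struct

# The 15 fields from the table, in table order.
# "<" = little-endian, I = __u32, i = __s32, H = __u16, h = __s16.
SB_FORMAT = "<7Ii5IHh"
SB_FIELDS = (
    "s_inodes_count", "s_blocks_count", "s_r_blocks_count",
    "s_free_blocks_count", "s_free_inodes_count", "s_first_data_block",
    "s_log_block_size", "s_log_frag_size", "s_blocks_per_group",
    "s_frags_per_group", "s_inodes_per_group", "s_mtime", "s_wtime",
    "s_mnt_count", "s_max_mnt_count",
)
SUPERBLOCK_OFFSET = 1024  # the boot block occupies the first 1,024 bytes


def parse_superblock(image: bytes) -> dict:
    """Decode the leading superblock fields from a raw file system image."""
    size = struct.calcsize(SB_FORMAT)
    raw = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + size]
    sb = dict(zip(SB_FIELDS, struct.unpack(SB_FORMAT, raw)))
    # s_log_block_size stores a shift count, not the size itself.
    sb["block_size"] = 1024 << sb["s_log_block_size"]
    return sb


def demo_image() -> bytes:
    """Build a synthetic image (made-up values) so the sketch runs anywhere."""
    values = (1024, 8192, 409, 7000, 1000, 1, 0, 0,
              8192, 8192, 1024, 0, 0, 3, 20)
    return bytes(SUPERBLOCK_OFFSET) + struct.pack(SB_FORMAT, *values)


if __name__ == "__main__":
    for name, value in parse_superblock(demo_image()).items():
        print(f"{name:22} {value}")
```

On a real image, the same function could be fed the bytes read from the partition; only the leading fields shown in the table are decoded here.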


ext2: Super Block Parameters

• Parameters related to error handling (example)
  - Correcting errors is left to an external utility such as e2fsck
  - s_state
    - If bit 0 is zero on an unmounted file system, the file system was not unmounted correctly

Bit     Value   Description
Bit 0   0       The partition is mounted
        1       The partition is unmounted
Bit 1   0       The kernel did not find any error (this does not mean that there is no error)
        1       An error was detected by the kernel
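The two s_state bits can be tested with simple masks. This is a small sketch of the table above; the constant and function names are illustrative, not taken from the slides.

```python
VALID_FS = 0x0001  # bit 0: set when the file system was unmounted
ERROR_FS = 0x0002  # bit 1: set when the kernel detected an error


def describe_state(s_state: int) -> list:
    """Return a human-readable reading of the s_state bits."""
    notes = []
    if s_state & VALID_FS:
        notes.append("unmounted")
    else:
        notes.append("mounted, or not unmounted correctly")
    if s_state & ERROR_FS:
        notes.append("error detected by the kernel")
    else:
        notes.append("no error detected by the kernel")
    return notes


if __name__ == "__main__":
    for state in range(4):
        print(state, describe_state(state))
```

A checker such as e2fsck would typically be run when bit 0 is clear on an unmounted file system or when bit 1 is set.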


ext2: Group Descriptors

• Main features
  - Summarizes the necessary information about a specific block group
• Main fields

Type    Field                  Description
__u32   bg_block_bitmap        Block number of the block bitmap
__u32   bg_inode_bitmap        Block number of the inode bitmap
__u32   bg_inode_table         Block number of the first inode table block
__u16   bg_free_blocks_count   Number of free blocks in the group
__u16   bg_free_inodes_count   Number of free inodes in the group
__u16   bg_used_dirs_count     Number of directories in the group
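A group descriptor table is an array of these records, so descriptor i can be read by indexing. The sketch below assumes little-endian fields and a 32-byte descriptor (the six fields listed plus padding, which is how the classic ext2 layout is defined but is not stated on the slide). `parse_group_desc` is a hypothetical helper name.

```python
import struct

GD_FORMAT = "<3I3H"   # three __u32 block numbers, three __u16 counters
GD_SIZE = 32          # assumption: 18 bytes of fields + 14 bytes of padding
GD_FIELDS = (
    "bg_block_bitmap", "bg_inode_bitmap", "bg_inode_table",
    "bg_free_blocks_count", "bg_free_inodes_count", "bg_used_dirs_count",
)


def parse_group_desc(table: bytes, group: int) -> dict:
    """Decode the descriptor of block group `group` from the descriptor table."""
    values = struct.unpack_from(GD_FORMAT, table, group * GD_SIZE)
    return dict(zip(GD_FIELDS, values))


def demo_table() -> bytes:
    """Two made-up descriptors, padded to GD_SIZE bytes each."""
    pad = bytes(GD_SIZE - struct.calcsize(GD_FORMAT))
    g0 = struct.pack(GD_FORMAT, 3, 4, 5, 100, 50, 2) + pad
    g1 = struct.pack(GD_FORMAT, 8195, 8196, 8197, 7000, 1024, 0) + pad
    return g0 + g1


if __name__ == "__main__":
    print(parse_group_desc(demo_table(), 1))
```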


ext2: Bitmaps

• Each bit in the block (inode) bitmap indicates whether the corresponding block (inode) in the group is used or free


ext2: Bitmaps

• Example format
  - Each block/inode bitmap occupies one block
  - If the block size is 1,024 bytes:
    - The bitmap has room for 1,024 × 8 = 8,192 blocks in a block group
• Values
  - 0: the corresponding data block/inode is free
  - 1: the corresponding data block/inode is used

Block size (bytes)   Blocks covered by one block bitmap
1,024                8,192
2,048                16,384
4,096                32,768
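The bitmap arithmetic above can be checked with a short sketch. It assumes bit i lives in byte i // 8 at bit position i % 8 (least significant bit first), which is the usual ext2 convention but is not spelled out on the slide. The function names are illustrative.

```python
def bits_per_bitmap(block_size: int) -> int:
    """Number of blocks (or inodes) one bitmap block can describe."""
    return block_size * 8


def bit_is_set(bitmap: bytes, index: int) -> bool:
    """True if entry `index` is marked used (1), False if free (0)."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)


def first_free(bitmap: bytes, count: int):
    """Index of the first free entry among the first `count`, or None."""
    for i in range(count):
        if not bit_is_set(bitmap, i):
            return i
    return None


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, bits_per_bitmap(size))
    # Entries 0..2 used, entry 3 free.
    print(first_free(bytes([0b00000111, 0xFF]), 16))
```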


ext2: I-Node Table

• Each file or directory is allocated one inode
• If all inodes are in use, no new files can be created
• Each inode takes up 128 bytes (ext4: 256 bytes)
  - A 1,024-byte block contains 8 inodes
  - A 4,096-byte block contains 32 inodes
  - The number of inodes in a block group is stored in the superblock field s_inodes_per_group
• The goal is to place inodes and their related files in the same block group
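Given s_inodes_per_group and the inode size, the position of any inode follows from integer division. The sketch below assumes that inode numbers start at 1 (the ext2 convention, not stated on the slide); `locate_inode` is a hypothetical helper.

```python
INODE_SIZE = 128  # bytes per ext2 inode, as stated above


def locate_inode(ino: int, inodes_per_group: int, block_size: int,
                 inode_size: int = INODE_SIZE):
    """Return (block group, block within the inode table, byte offset in that block)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    inodes_per_block = block_size // inode_size   # 8 for 1 KB, 32 for 4 KB
    table_block, slot = divmod(index, inodes_per_block)
    return group, table_block, slot * inode_size


if __name__ == "__main__":
    # Inode 10 with 1,024 inodes per group and 1 KB blocks:
    # group 0, second block of the inode table, second slot in that block.
    print(locate_inode(10, 1024, 1024))
```

The absolute block number would then be bg_inode_table of that group plus the returned table block.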


ext2: I-Node Table

Main inode fields

Type                   Field           Description
__u16                  i_mode          File type and access rights
__u16                  i_uid           Owner identifier
__u32                  i_size          File length in bytes
__u32                  i_atime         Time of last file access
__u32                  i_ctime         Time that the inode last changed
__u32                  i_mtime         Time that the file contents last changed
__u32                  i_dtime         Time of file deletion
__u16                  i_gid           Group identifier
__u16                  i_links_count   Hard links counter
__u32                  i_blocks        Number of data blocks of the file
__u32                  i_flags         File flags
union                  osd1            Specific operating system information
__u32[EXT2_N_BLOCKS]   i_block         Pointers to data blocks


ext2: I-Node Table

• Main fields
  - i_mode: determines the inode type and access permissions

Bit layout of i_mode (16 bits):

Bits 15-12   File type
Bits 11-9    Execution override
Bits 8-6     Owner permission
Bits 5-3     Group permission
Bits 2-0     Other permission

File type

Identifier   Value (hex)   Description
S_IFSOCK     C000          Socket
S_IFLNK      A000          Symbolic link
S_IFREG      8000          Regular file
S_IFBLK      6000          Block device
S_IFDIR      4000          Directory
S_IFCHR      2000          Character device
S_IFIFO      1000          FIFO


ext2: I-Node Table

• i_mode: execution override (bits 11-9)

Identifier   Value (hex)   Comment      Description
S_ISUID      0800          Set UID      Run the file with the owner's permissions
S_ISGID      0400          Set GID      Run the file with the group's permissions
S_ISVTX      0200          Sticky bit   Regular file: it should be kept in the swap area. Directory: a user who is not the owner of a file should not be able to delete it from this directory


ext2: I-Node Table

• i_mode: owner/group/other permission (bits 8-6, 5-3, and 2-0)

Each of the three permission fields holds the same 3-bit pattern:

Identifier   Value   Description
Read         0b100   Read permission is set
Write        0b010   Write permission is set
Execute      0b001   Execute permission is set
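The three parts of i_mode can be pulled apart with masks and shifts. The following is a minimal sketch; the type values follow the standard Linux `S_IF*` constants, and `decode_mode` is a hypothetical helper name.

```python
FILE_TYPES = {
    0xC000: "socket",
    0xA000: "symbolic link",
    0x8000: "regular file",
    0x6000: "block device",
    0x4000: "directory",
    0x2000: "character device",
    0x1000: "FIFO",
}
OVERRIDES = ((0x0800, "setuid"), (0x0400, "setgid"), (0x0200, "sticky"))


def decode_mode(i_mode: int):
    """Split a 16-bit i_mode into (file type, override flags, rwx string)."""
    ftype = FILE_TYPES.get(i_mode & 0xF000, "unknown")
    flags = [name for bit, name in OVERRIDES if i_mode & bit]
    perms = ""
    for shift in (6, 3, 0):                 # owner, group, other
        triple = (i_mode >> shift) & 0b111
        perms += "r" if triple & 0b100 else "-"
        perms += "w" if triple & 0b010 else "-"
        perms += "x" if triple & 0b001 else "-"
    return ftype, flags, perms


if __name__ == "__main__":
    print(decode_mode(0x81A4))   # regular file with permissions 644
    print(decode_mode(0x43FF))   # directory with sticky bit and 777
```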


ext2: I-Node Table

• Hard links
  - A single inode may be pointed to from several directories
    - In this case, there exist several hard links to the file
  - i_links_count keeps the number of hard links
    - Incremented with each additional link
    - Decremented when a link to the file is deleted
    - The inode is deallocated only when this number reaches zero
• Time and date

Field     Description
i_ctime   Time at which the inode was last changed
i_mtime   Time at which the file was last modified
i_atime   Time at which the file was last accessed
i_dtime   Time at which the inode was deallocated


ext2: I-Node Table

[Figure: the i_block array. i_block[0] to i_block[11] point directly to data blocks; i_block[12] points to an indirect block, i_block[13] to a double indirect block, and i_block[14] to a triple indirect block.]

With block size b (so each indirect block holds b/4 block pointers), file block numbers are covered as follows:

Level             First file block         Last file block
Direct            0                        11
Indirect          12                       b/4 + 11
Double indirect   b/4 + 12                 (b/4)^2 + b/4 + 11
Triple indirect   (b/4)^2 + b/4 + 12       (b/4)^3 + (b/4)^2 + b/4 + 11
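The mapping from a file block number to an addressing level can be sketched as follows, assuming 12 direct pointers and 4-byte block pointers, so that a block of b bytes holds b/4 pointers. `addressing_level` is a hypothetical helper name.

```python
DIRECT_POINTERS = 12  # i_block[0] .. i_block[11]


def addressing_level(file_block: int, block_size: int) -> str:
    """Name the i_block level that covers logical block `file_block` (0-based)."""
    p = block_size // 4              # pointers per indirect block
    if file_block < 0:
        raise ValueError("file block numbers are non-negative")
    if file_block < DIRECT_POINTERS:
        return "direct"
    file_block -= DIRECT_POINTERS
    if file_block < p:
        return "indirect"
    file_block -= p
    if file_block < p * p:
        return "double indirect"
    file_block -= p * p
    if file_block < p ** 3:
        return "triple indirect"
    raise ValueError("block number beyond the triple indirect range")


if __name__ == "__main__":
    for n in (0, 11, 12, 267, 268, 65803, 65804):
        print(n, addressing_level(n, 1024))
```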


ext2: I-Node Table

• Maximum file size (cumulative limit reached with each addressing level)

Block size (bytes)   Direct   Indirect   Double indirect   Triple indirect
1,024                12 KB    268 KB     64.26 MB          16.062 GB
2,048                24 KB    1.02 MB    513.02 MB         256.5 GB
4,096                48 KB    4.04 MB    4 GB              ~4 TB
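The figures in this table can be reproduced from the block size alone. This sketch assumes 12 direct pointers and 4-byte block pointers, and reports the cumulative file size reachable once each level is in use; other limits (such as the width of i_blocks) are ignored here.

```python
def cumulative_limits(block_size: int) -> list:
    """Bytes addressable with direct, +indirect, +double, +triple indirect blocks."""
    p = block_size // 4                      # pointers per indirect block
    total_blocks = 0
    limits = []
    for blocks in (12, p, p ** 2, p ** 3):   # blocks added by each level
        total_blocks += blocks
        limits.append(total_blocks * block_size)
    return limits


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        kib = [limit / 1024 for limit in cumulative_limits(size)]
        print(size, [f"{k:,.0f} KB" for k in kib])
```

For a 1,024-byte block the first two entries come out as 12 KB and 268 KB, matching the first row.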


ext2: Disk Space Management

• Disk space management handles the allocation and deallocation of data blocks and inodes
• Where possible, it should avoid file fragmentation
  - Fragmentation increases the average time of read operations
  - This is because the position of the disk head must change frequently
  - Space management itself must also be time-efficient


ext2: Creation of I-Node

• ext2_new_inode()
  - Creates a disk inode and returns the address of the corresponding inode object
    - Returns NULL on failure
  - Selects the block group so that a new file is placed in the same group as its parent directory
    - Directories that are not related to each other are distributed across the groups
  - Parameters
    - dir: address of the inode object of the directory into which the new inode is inserted
    - mode: type of inode to create


ext2: Removal of I-Node

• ext2_free_inode()
  - Removes the disk inode that an inode object points to
  - Before the function is called
    - The kernel cleans up the internal data structures and the file data
    - It then removes the inode object from the inode hash table
  - The function must be called only after the file length has been set to zero, so that all data blocks are released


ext2: File Holes

• Used to avoid wasting disk space
  - For example, in database applications
• A hole is a part of a regular file that contains only null characters
• Command to create a file with a hole at the front

$ echo -n "X" | dd of=/tmp/hole bs=1024 seek=6

  - /tmp/hole contains 6,145 characters
    - 6,144 null characters and one X character
    - The file takes up only one data block
• Based on dynamic data block assignment
  - A block is assigned to the file only when a process writes data to it
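The same hole can be produced from a program by seeking past the end of an empty file before writing. The sketch below is an illustration only: whether the hole really stays unallocated depends on the file system hosting the temporary directory, so the allocated-block count is printed rather than assumed. `make_hole_file` and `demo` are hypothetical helper names.

```python
import os
import tempfile


def make_hole_file(path: str, hole_bytes: int = 6 * 1024) -> os.stat_result:
    """Create a file with `hole_bytes` of hole followed by a single 'X'."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)      # skip ahead without writing anything
        f.write(b"X")           # the only data actually written
    return os.stat(path)


def demo():
    """Return (logical size, allocated 512-byte units or None) for a demo file."""
    with tempfile.TemporaryDirectory() as d:
        st = make_hole_file(os.path.join(d, "hole"))
        return st.st_size, getattr(st, "st_blocks", None)


if __name__ == "__main__":
    size, blocks = demo()
    print("logical size:", size)
    print("allocated 512-byte units:", blocks)
```

Reading the hole back returns null bytes, even though no data block needs to hold them on a file system that supports holes.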


ext2: Data Block (De)Assignment

• ext2_get_block()
  - Looks for the block that holds the requested data of a regular file
  - If there is no corresponding block, one is assigned automatically
  - Called each time a read or write is requested on a regular file
• ext2_alloc_block()
  - Looks for a free block in the ext2 partition
  - If necessary, also assigns the blocks used for indirect addressing
• ext2_truncate()
  - When a process deletes a file or sets its length to zero, all data blocks of the file are released
  - Takes the address of the file's inode object as a parameter
• ext2_free_blocks()
  - Frees one or more adjacent data blocks of a block group


ext3: Overview

• ext3, the third extended file system, is a journaling file system commonly used with the Linux kernel
• It was merged into the Linux kernel in 2001 (version 2.4.15)
• Its main advantage over ext2 is journaling
• ext3 adds the following features to ext2
  - A journal
  - Online file system growth
  - Hash-tree indexing for larger directories
• An ext2 file system can be converted to ext3 directly
  - Without backup/restore


ext3: Journaling - Motivation

• A single I/O request may involve many disk writes

[Figure: Application, File System, Page Cache, and Disk (blocks A, B, C). A request for block A touches both metadata (e.g., the inode) and file contents (data blocks).]


ext3: Journaling - Motivation

• Disk writes are cached
• Example: write()
  - Update the metadata of the file (timestamp, length, ...)
  - Write the contents to the data blocks

[Figure: the application issues "Write A"; the new copy of block A is held in the page cache above the disk (blocks A, B, C).]


ext3: Journaling - Motivation

• What if a system failure occurs during this operation?


ext3: Journaling - Motivation

• After a failure in the middle of the operation, data consistency cannot be guaranteed
• How can we successfully recover from such failures?


ext3: Journaling - Concept

• Uses the concept of a transaction
  - A series of operations either all occur, or none occur
• Keeps track of changes not yet committed to the file system
  - All writes are recorded as a journal, which is stored in a journal area

[Figure: the write of block A goes from the page cache into the journal area on disk, next to the original blocks.]

[Figure, two animation steps: block A is committed to the journal area; the journal then holds the entries S, A, E.]

ext3: Journaling - Concept

• The file system periodically checkpoints committed journal entries to the original copies

[Figure: checkpoint step. Block A from the journal area (S, A, E) is written to its original location on disk.]


ext3: Journaling - Concept

• Such a file system is easy to recover after a system failure
  - Case 1: system failure during commit. The transaction in the journal is incomplete, so it is discarded and the original copies remain intact
  - Case 2: system failure during checkpoint. The transaction is already committed in the journal, so it can be replayed onto the original copies

[Figure: the two failure cases, Case 1 (during commit) and Case 2 (during checkpoint).]
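The commit, checkpoint, and recovery steps can be made concrete with a toy in-memory model. This is only an illustrative sketch of the idea, not how ext3 or JBD is implemented; the class `ToyJournalDisk` and its methods are hypothetical.

```python
class ToyJournalDisk:
    """Toy model: `home` holds the original copies, `journal` the logged transactions."""

    def __init__(self, home=None):
        self.home = dict(home or {})   # block name -> contents
        self.journal = []              # transactions in log order

    def log(self, writes: dict) -> dict:
        """Record the writes of one transaction in the journal area (not yet committed)."""
        txn = {"writes": dict(writes), "committed": False}
        self.journal.append(txn)
        return txn

    def commit(self, txn: dict) -> None:
        """Mark the transaction as complete in the journal."""
        txn["committed"] = True

    def checkpoint(self) -> None:
        """Copy committed transactions to their original locations, then free them."""
        for txn in self.journal:
            if txn["committed"]:
                self.home.update(txn["writes"])
        self.journal = [t for t in self.journal if not t["committed"]]

    def recover(self) -> None:
        """After a crash: replay committed transactions, discard incomplete ones."""
        for txn in self.journal:
            if txn["committed"]:
                self.home.update(txn["writes"])
        self.journal = []


if __name__ == "__main__":
    # Failure during commit: the transaction never became complete.
    d1 = ToyJournalDisk({"A": "old"})
    d1.log({"A": "new"})
    d1.recover()
    print("failure during commit     ->", d1.home)

    # Failure during checkpoint: the transaction is committed but not yet copied home.
    d2 = ToyJournalDisk({"A": "old"})
    d2.commit(d2.log({"A": "new"}))
    d2.recover()
    print("failure during checkpoint ->", d2.home)
```

In both cases the original copies end up in a consistent state: either entirely the old contents or entirely the new ones.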


ext3: Journaling Implementation

• Design goal of ext3: backward and forward compatibility with ext2
• Existing ext2 partitions can be mounted as ext3
• JBD (Journaling Block Device)
  - A generic block device journaling layer
  - Journaling in ext3 is done at the block level, not the structure level
  - This keeps the journaling layer independent of the file system

[Figure: the file system issues transaction start/update/commit requests to JBD, which writes to the journal area of the block device.]


ext3: Journal Modes

• ext3 supports three journaling modes:
• Journal
  - Both metadata and file contents are journaled
• Writeback
  - Only metadata is journaled
• Ordered
  - Only metadata is journaled, but file contents are guaranteed to be written to disk before the associated metadata is marked as committed in the journal
  - Default option


ext4: Overview

• "The new ext4 filesystem: current status and future plans"
  - 2007 Linux Symposium, Ottawa, Canada, July 27th - 30th
• Motivation
  - 16 TB file system size limitation
  - 32-bit block numbers: 4 KB × 2^32 = 16 TB
  - 32,768 sub-directory limitation
  - Performance limitations
• New features
  - 48-bit block numbers: 4 KB × 2^48 = 1 EB
  - Replacing indirect blocks with extents
  - Optimized block allocation
  - Performance optimization
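The two size limits follow directly from multiplying the block size by the number of addressable blocks. A short check, using binary units (1 TB taken as 2^40 bytes and 1 EB as 2^60 bytes); `max_fs_bytes` is an illustrative helper name:

```python
TB = 2 ** 40
EB = 2 ** 60


def max_fs_bytes(block_size: int, block_number_bits: int) -> int:
    """Largest file system addressable with the given block number width."""
    return block_size * 2 ** block_number_bits


if __name__ == "__main__":
    print(max_fs_bytes(4096, 32) // TB, "TB with 32-bit block numbers")
    print(max_fs_bytes(4096, 48) // EB, "EB with 48-bit block numbers")
```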


ext4: Changes

Changes across the ext file systems

                       ext2            ext3            ext4
Added to kernel        1993            2001 (2.4.15)   2006 (2.6.19); stable in 2008 (2.6.28)
Max file size          16 GB to 2 TB   16 GB to 2 TB   16 GB to 16 TB
Max file system size   2 TB to 32 TB   2 TB to 32 TB   1 EB (1 exabyte = 1,024 PB)
Features               Block groups    Journaling      Extended mapping, multiblock allocation, delayed allocation

Limits by block size:

Block size   Max file size   Max file system size
1 KB         16 GB           2 TB
2 KB         256 GB          8 TB
4 KB         2 TB            16 TB
8 KB         2 TB            32 TB

With 32-bit block numbers: 4 KB × 2^32 = 16 TB and 2 KB × 2^32 = 8 TB.


ext4: Extended Mapping

• ext2/ext3
  - Indirect block maps are very inefficient for large files
    - Double and triple indirect block mapping
    - One extra block read (and seek) every 1,024 blocks
• ext4
  - An extent is a single descriptor for a range of contiguous blocks
    - Improved scalability: an efficient way to represent large files
  - Better CPU utilization, fewer metadata I/Os
  - Extent-based block mapping


ext4: Extended Mapping

• On-disk extent format
  - Extent: represents a range of contiguous physical blocks
  - 12-byte ext4_extent structure
    - Addresses a 1 EB file system (48-bit physical block number)
    - Maximum extent size of 128 MB (16-bit extent length)
    - Addresses a 16 TB file (32-bit logical block number)
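The 12-byte record can be decoded as one 32-bit logical block number, a 16-bit length, and a 48-bit physical block number split into a 16-bit high part and a 32-bit low part. The sketch below assumes little-endian storage and that field order, which matches the Linux kernel's `struct ext4_extent` (`ee_block`, `ee_len`, `ee_start_hi`, `ee_start_lo`) but is not shown on the slide. `parse_extent` is a hypothetical helper.

```python
import struct

# ee_block (u32), ee_len (u16), ee_start_hi (u16), ee_start_lo (u32)
EXTENT_FORMAT = "<IHHI"


def parse_extent(raw: bytes) -> dict:
    """Decode one 12-byte extent record."""
    ee_block, ee_len, start_hi, start_lo = struct.unpack(EXTENT_FORMAT, raw)
    return {
        "logical_block": ee_block,                     # first file block covered
        "length": ee_len,                              # number of contiguous blocks
        "physical_block": (start_hi << 32) | start_lo  # 48-bit start on disk
    }


if __name__ == "__main__":
    # Made-up record: file blocks 100..107 stored starting at a block above 2^32.
    raw = struct.pack(EXTENT_FORMAT, 100, 8, 1, 5)
    print(len(raw), parse_extent(raw))
```

One such record replaces what would otherwise be a separate pointer per block in an indirect block map.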