www.monash.edu.au
FIT1001 – Computer Systems
Lecture 10: File Systems Features and Formats
SG9: FIT1001 Computer Systems S1 2006 2
Lecture 10: Learning Objectives
• Explain the concepts and purpose of files and directories in a file system;
• Discuss various file naming restrictions imposed by common file systems;
• Understand sequential, direct and indexed file access methods;
• Explain the concept and function of volumes / partitions in a file system;
• Discuss the various attributes stored in reference to files, directories and volumes / partitions;
• Define the terms current / working directory, absolute and relative paths;
• Discuss file access control methods;
• Explain allocation methods – contiguous, linked and indexed;
• Explain how free space is managed in a file system;
• Explain and illustrate practical implementations of modern file systems: FAT16/32, NTFS, UFS/ExtFS and ISO 9660
Introduction
Introduction – File Systems
• Operating system
  – Transfers required programs and data from secondary storage devices to main memory for the processor to execute
  – The output is typically saved back to these devices
  – Program and data files are important resources
  – The task of creating, managing and manipulating files within an operating system is performed by the File System
• File Systems
  – Objectives: Stallings (but not limited to):
    > To meet the data management needs and requirements of the user, which include storage of data and the ability to perform a standard set of operations
    > To guarantee, to the extent possible, that the data in the file are valid
    > To optimize performance
      – From the system point of view in terms of overall throughput
      – From the user’s point of view in terms of response time
    > To provide I/O support for a variety of storage device types
    > To minimize or eliminate the potential for lost or destroyed data
    > To provide a standardized set of I/O interface routines to user processes
    > To provide I/O support for multiple users, in the case of multiple-user systems
  – Objectives: Palmer (but not limited to):
    > Partition and format disks to store and retrieve information
    > Enable files to be organized through directories and folders
    > Establish file naming conventions
    > Provide utilities to maintain and manage the file system and storage media
    > Provide for file and data integrity
    > Enable error recovery or prevention
    > Secure the information in files
File System Basics
File System Basics – File Concept
• File system consists of two distinct parts
  – A collection of files, each storing related data
  – A directory structure, which organises and provides information about all files in the system
• File concept
  – Store information on various storage media
  – Operating system provides a uniform logical view of information storage
  – Operating system abstracts from the physical properties of its storage devices to define a logical storage unit, the FILE
  – User’s perspective
    > File is the smallest allotment of logical secondary storage
    > Data cannot be written to secondary storage unless it is in a file
  – In general a file is a sequence of bits, bytes, lines or records, the meaning of which is defined by the file’s creator and user
  – Many different types of information can be stored in a file:
    > Source programs / executable programs / numeric data / text / images, sound, video etc
  – A file has a certain defined structure
    > Dependent on type
      – Eg., a text file is a sequence of characters
      – Eg., a source file is a sequence of subroutines / functions, each further organized as declarations and executable statements
      – Eg., an executable file is a series of code sections that the loader can bring into memory and execute
• File Attributes
  – A name is usually a string of characters
  – A file is named for the convenience of human users
  – Some systems differentiate between upper and lower case
  – File attributes vary but typically consist of:
    > Name – only information kept in human-readable form
    > Identifier – unique tag (number) that identifies the file within the file system
    > Type – needed for systems that support different types
    > Location – pointer to file location on device
    > Size – current file size
    > Protection – controls who can do reading, writing, executing
    > Time, date, and user identification – data for protection, security, and usage monitoring
  – Information about all files is kept in the directory structure
    > Typically an entry consists of the file’s name and its unique identifier
    > Identifier locates the other file attributes
    > It can take more than 1KB to record this information for each file
    > Directories can be very large in size
• File Operations
  – Six basic file operations (minimal set):
    > Creating a file: space in the file system must be found / an entry for the new file must be made in the directory
    > Writing a file: system call specifying the name of the file and the information to be written / system searches the directory for the file’s location / write pointer indicates the location of the next write within the file
    > Reading a file: system call specifying the name of the file and where in memory the next block of the file should be put / directory searched and read pointer (of the next read location) updated
    > Repositioning within a file: directory searched for appropriate entry / current file position pointer is repositioned to a given value
    > Deleting a file: directory searched for appropriate entry / all related file space is released / directory entry erased
    > Truncating a file: allows all attributes to remain but file is reset to 0 length and file space released
  – Open file table
    > To avoid constant searching the operating system maintains a table containing information about all open files
    > When a file operation is requested the file is specified via an index directly into the table
    > When a file is no longer active it is closed by the process using it and its entry is removed from the table
  – Opening and closing files is more difficult where several processes may open / use the file at the same time
    > Operating system uses two tables:
      – Per process: tracks all files that a process has opened / stores the process’s use of the file, eg., current file pointer / access rights etc / points to the system wide open file table
      – System wide: contains process independent information, eg., location of file on disk, access dates, file size etc
  – In general the following are associated with an open file:
    > File pointer – last read/write location (per process)
    > File open count – when 0 the entry can be removed
    > Disk location of the file
    > Access rights
  – Some operating systems provide for locking an open file
    > Allows one process to lock a file and prevent other processes from gaining access to it
• File Types / Structure
  – Whether the operating system should recognize and support file types is a critical file system design consideration
  – If so then it can operate on the file in reasonable ways
  – Common technique is to include the type as part of the file name
    > Split into two parts – name and extension
      – Eg., resume.doc / server.java
  – Not all systems employ or strictly enforce this method
    > DOS and Windows are a good example of this technique
    > Unix uses a magic number to indicate type of file
      – Allows extensions but they are neither enforced by nor dependent on the operating system / more for identifying content
  – File Structure
    > Can be used to indicate internal structure
    > Files need to conform to structures so related programs / operating system can read / execute them
    > Operating systems can support sets of file structures
      – The more file structures supported the larger the operating system
      – That is, the operating system must contain the code to support these structures
      – An operating system may expect all files to be defined
        » Issues occur if a new application requires a new file structure that is not supported by the operating system
        » Eg., a system supports two file types – text and executable binary / if we now want to encrypt files, the encrypted result is neither type, so we need to either fool the operating system or abandon the encryption scheme
  – Some operating systems support a minimal number of structures
    > Eg., UNIX, MS-DOS, Mac, Windows
    > UNIX considers files to be a sequence of 8-bit bytes
      – No interpretation by operating system – maximum flexibility but no support
      – Each application includes code to interpret its own file structure
  – All operating systems must support at least one structure
    > The executable file
  – Internal File Structure
    > Physical record – one disk block / all blocks are the same size
    > Logical record – part of a file / fixed length
    > Physical record is generally larger than logical record
      – Can pack a number of logical records in a physical record
      – Eg., UNIX – logical record size is 1 byte / block size (physical record) generally is 512 bytes
    > Does lead to internal fragmentation
      – Disk space allocated in blocks
      – Last portion of file may not need all of the assigned block
        » Eg., each block is 512 bytes, a file of 1,949 bytes would be allocated four blocks (2,048 bytes) with 99 bytes being wasted
      – The larger the block size the greater the internal fragmentation
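The fragmentation arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 512-byte default is taken from the slide's example.

```python
def blocks_needed(file_size: int, block_size: int = 512) -> int:
    """Number of whole blocks required to hold file_size bytes."""
    return -(-file_size // block_size)  # ceiling division


def internal_fragmentation(file_size: int, block_size: int = 512) -> int:
    """Bytes left unused in the last allocated block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


print(blocks_needed(1949))                 # 4 blocks (2,048 bytes)
print(internal_fragmentation(1949))        # 99 bytes wasted
print(internal_fragmentation(1949, 4096))  # 2147 bytes wasted with a larger block
```

The last line illustrates the final bullet: the same file wastes far more space when the block size grows.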
Access Methods
• Access Methods
  – Information in a file can be accessed in several ways
  – Some systems provide only one method, others support many methods
  – Common methods are:
    > Sequential
      – Information in the file is processed in order, one record after the other
      – Reads and writes make up the bulk of operations on a file
      – Read Next reads the next portion of the file and advances the file pointer to the next record
      – Write Next appends to the end of the file
      – Usually employed for sequential devices like tape drives
      – Does not work as well on random access devices
    > Direct
      – File is made up of fixed length logical records
      – Records can be read and written in no particular order
      – Based on disk model since it allows random access to any file block
      – Allows immediate access to large amounts of information
      – File operation must include the block number as a parameter
        » Read N / Write N
      – Block number provided is a relative block number
        » An index relative to the beginning of the file, so the first relative block is 0, the next 1 and so on
        » The absolute disk address for these may be 14703 for the first block, 3192 for the next and so on
        » Relative block numbers allow the operating system to decide where the file should be placed / stop the user from accessing portions of the file system that are not part of the file
      – Not all systems support both sequential and direct
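A minimal sketch of the Read N / Write N idea, using Python's ordinary file interface. The helper names and the 512-byte block size are assumptions made for illustration; an in-memory file stands in for a file on disk.

```python
import io

BLOCK_SIZE = 512  # assumed block size, chosen only for this example


def read_block(f, n: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read N: return relative block n of an open binary file."""
    f.seek(n * block_size)  # position is relative to the start of the file
    return f.read(block_size)


def write_block(f, n: int, data: bytes, block_size: int = BLOCK_SIZE) -> None:
    """Write N: overwrite relative block n with exactly one block of data."""
    if len(data) != block_size:
        raise ValueError("data must be exactly one block long")
    f.seek(n * block_size)
    f.write(data)


# Three blocks filled with A, B and C respectively.
f = io.BytesIO(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
write_block(f, 1, b"x" * BLOCK_SIZE)  # blocks can be touched in any order
print(read_block(f, 2)[:4])           # b'CCCC'
print(read_block(f, 1)[:4])           # b'xxxx'
```

The program only ever names relative block numbers (0, 1, 2); where those blocks actually live on the disk is left to the operating system.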
  – Index
    > Other methods can be built upon the direct access method
    > Generally use a form of index – pointers to various blocks
      – To find a record the index is searched and the pointer used to access the file directly and find the desired record
    > Allows us to search a large file doing little I/O
    > With large files the index itself may be too large to be kept in memory
      – One solution is to create another index for the index file
Directory Structure
• So far we have looked at files; we now need to look at how these can be managed
• Storage structure
  – A disk can be used for a single file system or for multiple file systems
    > Partitions / Slices / minidisks
  – Volumes can be created that contain a file system but must also contain information about the files in the system
    > This is known as a device directory / volume table of contents or just directory
• Directory
  > A directory contains information such as name, location, size, type etc for all files on that volume
  > Can be viewed as a symbol table that translates file names into their directory entries
  > Typical operations on a directory:
    – Search for a file / match patterns
    – Create a file
    – Delete a file
    – List a directory
    – Rename a file
    – Traverse a file system
  – Single Level Directory
    > Simplest structure – all files are contained in the same directory
    > Limitations
      – All files must have unique names
      – Problems increase with more files and more users
    > File names are generally limited in length
      – MS-DOS allows only 11 characters / UNIX and Windows 9x, 2000 and XP allow up to 255
  – Two Level Directory
    > Solution to the single level problem is to allow each user to have their own directory
    > System has a master file directory (MFD) / each user has a user file directory (UFD)
      – MFD contains an entry for each UFD
    > Path name – uniquely identifies a file (eg., user1/cat)
    > Can have the same file names for different user files
    > Efficient searching
    > No grouping capability – isolates users / no sharing of files
  – Tree Structured Directories
    > Extend directory structure to a tree of arbitrary height
    > Allows users to create their own subdirectories / organize files
    > Tree has a root and every file has a unique path name
    > A directory (or subdirectory) contains a set of files or subdirectories
      – Directory is simply another file
      – All directories have the same internal format
      – One bit in each entry is used to define the entry as a file or subdirectory
    > Current directory (working directory) – the directory in which file references are looked up by default
      – When reference is made to a file the current directory is searched
      – If the file is not found the user must provide the full path or change the current directory to the directory containing the file
    > Path names can be absolute or relative
      – Absolute begins at the root and follows the path down to the file, eg., c:\temp\fit1001\assignment2.doc
      – Relative defines a path from the current directory, eg., if the current directory is root/spell/mail then the relative path prt/first points to the same file as the absolute path root/spell/mail/prt/first
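The relationship between relative and absolute paths can be sketched with Python's standard `posixpath` module. The `resolve` helper is a made-up name for illustration, and the slide's "root" is written here as the UNIX-style root `/`, which is an assumption about the intended notation.

```python
import posixpath


def resolve(path: str, current_dir: str) -> str:
    """Return the absolute form of path, reading a relative path against current_dir."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)  # already starts at the root
    return posixpath.normpath(posixpath.join(current_dir, path))


print(resolve("prt/first", "/spell/mail"))        # /spell/mail/prt/first
print(resolve("/spell/mail/prt/first", "/tmp"))   # /spell/mail/prt/first
print(resolve("../inbox", "/spell/mail"))         # /spell/inbox
```

The first two calls name the same file: once relative to the current directory, once from the root.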
  – Acyclic-Graph Directories
    > Allows directories to share subdirectories and files
    > The same file or subdirectory may be in two different directories
    > This is not the same as having two copies
      – Only one actual copy exists, so any change is made to that one copy and is visible from every directory that shares it
    > Can be implemented in several ways
      – Common method is to create a directory entry called a LINK
      – A LINK is a pointer to another file or subdirectory
    > More complex than the simple tree structure
      – A file may now have multiple absolute path names, so distinct file names may refer to the same file
      – Deletion of a file can be an issue – if you delete the actual file then the links remain, but with nothing to point to
        » Deletion of a link is not a problem
        » In UNIX symbolic links are left when a file is deleted, it is up to the user to realize the original file is missing
        » Another approach is to preserve the file until all references are deleted – can keep a file reference list or just the number of references (when 0 the file can be deleted)
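The reference-count approach in the last bullet can be sketched as a toy in-memory model. The class and method names are invented for this example and do not correspond to any real file system API.

```python
class SharedFileStore:
    """Toy model: path names are links to file ids; data is freed at zero references."""

    def __init__(self):
        self.links = {}      # path name -> file id
        self.ref_count = {}  # file id -> number of path names pointing at it
        self.data = {}       # file id -> contents
        self._next_id = 0

    def create(self, path, contents):
        fid = self._next_id
        self._next_id += 1
        self.links[path] = fid
        self.ref_count[fid] = 1
        self.data[fid] = contents

    def link(self, existing, new):
        """Add a second path name for the same file (no copy is made)."""
        fid = self.links[existing]
        self.links[new] = fid
        self.ref_count[fid] += 1

    def unlink(self, path):
        """Remove one path name; release the file only when no names remain."""
        fid = self.links.pop(path)
        self.ref_count[fid] -= 1
        if self.ref_count[fid] == 0:
            del self.ref_count[fid]
            del self.data[fid]

    def read(self, path):
        return self.data[self.links[path]]


store = SharedFileStore()
store.create("/alice/report", "draft 1")
store.link("/alice/report", "/bob/report")
store.unlink("/alice/report")        # one reference is still left
print(store.read("/bob/report"))     # draft 1
store.unlink("/bob/report")          # count reaches 0, file space released
print(len(store.data))               # 0
```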
File Access and Control
• Information stored should be safe from physical damage and improper access
• Need, especially in multi-user systems, for controlled access
• File owner/creator should be able to control:
  – What can be done
  – By whom
  – Types of access
    > Read
    > Write
    > Execute
    > Append
    > Delete
    > List
• Common method of access control is to base access on the identity of the user
  – Different users may require different access
  – Most general scheme is based on access control lists (ACL)
    > Specifies user name and type of access allowed
    > Main problem is ACL length, must list all users that have access to a particular file
  – Many systems use a condensed version of access lists and recognize three classifications of users:
    > Owner
    > Group
    > Universe
  – Access modes of read, write and execute are given to each classification
  – In UNIX / Linux this can be seen in the left hand column of a long directory listing (ls -l)
    > From left to right within the column
      – d: indicates a directory (subdirectory)
      – Next three characters indicate the owner’s access rights
      – Next three characters indicate group access rights
      – Next three characters indicate world / universe access rights
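The ten-character permission column can be reproduced with Python's standard `stat` module. The `describe` helper and the example mode values are assumptions chosen for illustration.

```python
import stat


def describe(mode: int) -> dict:
    """Split a UNIX mode into the fields read left to right in a long listing."""
    text = stat.filemode(mode)  # ten characters: type, then three triples
    return {
        "listing": text,
        "is_directory": text[0] == "d",
        "owner": text[1:4],
        "group": text[4:7],
        "universe": text[7:10],
    }


# A directory: owner may read/write/search, group may read/search, others nothing.
print(describe(stat.S_IFDIR | 0o750)["listing"])  # drwxr-x---
# A regular file: owner may read/write, everyone else may only read.
print(describe(stat.S_IFREG | 0o644)["listing"])  # -rw-r--r--
```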
  – In Windows XP something similar exists
Implementing File Systems
• The file system resides on secondary storage
  – Designed to hold a large amount of data permanently
• Today we are going to look at file storage and access on the most common storage medium – the disk
• Disks have two characteristics useful for storing files:
  – A disk can be rewritten in place (read, modify and write back to the same place)
  – A disk can access directly any given block
    > A block has one or more sectors
• A file system is generally composed of many different levels
  – I/O control: device drivers and interrupt handlers to transfer information between main memory and disk
  – Basic file system: needs only to issue generic commands to the appropriate device driver to read and write physical blocks on the disk
  – File organization module: knows about files and their logical and physical blocks / also includes free space manager
  – Logical file system: manages metadata related to file system structure / manages directory structure / protection and security
• Layered approach
  – Minimizes duplicated code: I/O control and basic file system code can be reused by multiple file systems
  – Each file system can have its own logical file system and file organization module
• File System Implementation
  – Several on-disk and in-memory structures are used to implement a file system
    > These vary with the operating system and the file system
  – On-disk
    > File system may contain information about how to boot an operating system / total number of blocks / number and location of free blocks / directory structure / individual files
    > Common structures
      – Boot control block: contains information on how to boot the operating system / if the volume does not contain an operating system then the block is empty / typically the first block of the volume
        » In UNIX file system (UFS) – boot block
        » In Windows NTFS – partition boot sector
      – Volume control block: contains volume details
        » In UNIX file system (UFS) – superblock
        » In Windows NTFS – master file table
      – A directory structure per file system is used to organize files
        » In UNIX file system (UFS) – file names and associated inode (index node) numbers
        » In Windows NTFS – stored in master file table
      – A per file control block containing details about the file
        » In UNIX file system (UFS) – inode
        » In Windows NTFS – stored within master file table
  – In-memory
    > Used for both file system management and performance improvement via caching
    > Common structures
      – An in-memory mount table which contains information about each mounted volume
      – In-memory directory structure cache that holds recently accessed directories
      – System wide open file table: contains a copy of the file control block of each open file / plus additional information
      – Per process open file table: contains pointers to the appropriate entry in the system wide open file table / plus additional information
• Directory Implementation
  – Selection of directory allocation and directory management algorithms significantly affects the file system’s:
    > Efficiency
    > Performance
    > Reliability
  – Linear List
    > Simplest method of implementing a directory
    > List of file names with pointers to the data blocks
    > Simple to program
    > Time-consuming to execute
    > Main disadvantage is that finding a file requires a linear search
  – Hash Table
    > Linear list still used but a hash data structure is also used
    > Hash table takes a value computed from the file name and returns a pointer to the file name in the linear list
    > Decreases directory search time
    > Collisions – situations where two file names hash to the same location – need to be handled
    > Major problem is the hash table is generally a fixed size
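A minimal sketch of a hashed directory, assuming chaining is used to handle collisions (the slides do not say which collision scheme is intended). The class name, the byte-sum hash function and the bucket count are all illustrative choices, not part of any real file system.

```python
class HashedDirectory:
    """Linear list of entries plus a fixed-size hash table of indices into it."""

    def __init__(self, n_buckets: int = 8):
        self.entries = []                              # linear list: (name, first_block)
        self.buckets = [[] for _ in range(n_buckets)]  # fixed size, as on the slide

    def _bucket(self, name: str) -> int:
        # Deliberately simple hash: sum of the name's bytes modulo the table size.
        return sum(name.encode()) % len(self.buckets)

    def lookup(self, name: str):
        """Return the first block of the named file, or None if it is not present."""
        for i in self.buckets[self._bucket(name)]:     # only one chain is scanned
            if self.entries[i][0] == name:
                return self.entries[i][1]
        return None

    def add(self, name: str, first_block: int) -> None:
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, first_block))
        self.buckets[self._bucket(name)].append(len(self.entries) - 1)


d = HashedDirectory()
d.add("ab", 10)
d.add("ba", 20)        # same byte sum as "ab", so the two names collide
print(d.lookup("ab"))  # 10
print(d.lookup("ba"))  # 20
print(d.lookup("zz"))  # None
```

Because the table has a fixed number of buckets, chains grow as the directory fills, which is the fixed-size problem noted in the last bullet.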
  – Allocation methods
    > In allocating disk space the issue is how to allocate space to files so that space is utilized effectively and files can be accessed quickly
    > Three major methods are contiguous, linked and indexed
    > Contiguous
      – Each file occupies a set of contiguous blocks on the disk
      – Simple: only starting location (block #) and length (number of blocks) are required
      – Wasteful of space (dynamic storage-allocation problem)
      – Files cannot easily grow: once space is allocated it cannot be expanded in place
      – External fragmentation occurs
    > Linked
      – Solves problems of contiguous allocation
      – Each file is a linked list of disk blocks (blocks can be scattered anywhere on the disk)
      – Directory contains pointers to the first and last blocks, with each block containing a pointer to the next block
      – No external fragmentation as any free block on the free-space list can be used (a file can grow as long as free blocks are available)
      – Free-space management system – no waste of space
      – Disadvantages
        » Only effective for sequential file access (to find the ith block of a file we must start at the beginning and follow the pointers) / each access requires a disk read and maybe a disk seek
        » Space is required to store the pointer within each block
        » Reliability: if a pointer is damaged / lost, then the link cannot be followed
      – One solution to the pointer space problem is to group blocks into clusters and allocate clusters instead of blocks
        » Reduces overheads / mapping simpler
      – A variation on linked allocation is the FAT (file allocation table) employed by MS-DOS and IBM’s OS/2
    > Indexed
      – Solves the external fragmentation and size declaration issues of contiguous allocation
      – Linked allocation (besides FAT) cannot support efficient direct access (as the pointers are scattered with the blocks all over the disk)
      – Indexed allocation brings all the pointers together in one location – the index block
      – Each file has its own index block
        » An array of disk block addresses
      – Directory contains the address of the index block
      – When file is created all pointers in the index block are set to nil
      – When a block is obtained its address is entered into the index
      – Supports direct access without external fragmentation
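The three allocation methods differ mainly in how the ith block of a file is located. The sketch below compares them; the function names, the dictionary standing in for per-block next pointers (or a FAT), and the block numbers are all made up for illustration.

```python
def contiguous_block(start: int, length: int, i: int) -> int:
    """Contiguous: the directory stores start and length; block i is simple arithmetic."""
    if not 0 <= i < length:
        raise IndexError(i)
    return start + i


def linked_block(first: int, next_pointer: dict, i: int) -> int:
    """Linked: follow i pointers from the first block (each hop would be a disk read)."""
    if i < 0:
        raise IndexError(i)
    block = first
    for _ in range(i):
        block = next_pointer[block]
        if block is None:  # ran off the end of the chain
            raise IndexError(i)
    return block


def indexed_block(index_block: list, i: int) -> int:
    """Indexed: one lookup in the file's index block."""
    return index_block[i]


# The same five-block file, scattered over the disk as 9 -> 16 -> 1 -> 10 -> 25.
chain = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
index = [9, 16, 1, 10, 25]

print(contiguous_block(14, 3, 2))  # 16
print(linked_block(9, chain, 3))   # 10, reached after three pointer hops
print(indexed_block(index, 3))     # 10, reached with a single lookup
```

Keeping all the next pointers in one table, as `chain` does here, is essentially what a FAT provides: the chain can be followed without reading the data blocks themselves.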
  – Free space management
    > Since disk space is limited we need to reuse space from deleted files for new files (if possible)
    > To keep track of free space a free-space list is maintained
    > Common implementations include bit vector, linked list, grouping and counting
    > Bit vector
      – List implemented as a bit map / bit vector
      – Each block represented by 1 bit
        » If the block is free the bit is 1 / if in use the bit is 0
        » Eg., blocks 2, 3 and 4 are free, 0 and 1 are used, so the map would look like 00111
    > Linked list
      – Link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory
      – First block contains a pointer to the next and so on
      – Scheme not efficient, must traverse list (but this is not a frequent action)
      – The first block in the list is generally used each time
    > Grouping
      – Store the addresses of n free blocks in the first free block
      – The first n-1 of these blocks are actually free
      – The last block contains the addresses of another n free blocks and so on
    > Counting
      – Takes advantage of the fact that generally several contiguous blocks may be allocated or freed simultaneously
      – Can keep the address of the first free block and the number of contiguous free blocks that follow the first block
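The bit vector and counting schemes can be sketched as follows, using a string of 0s and 1s in place of real bits. The function names are illustrative, and note one assumption: the count returned here includes the first block of each run, whereas the slide counts only the blocks that follow it.

```python
def bit_vector(free_blocks, total_blocks: int) -> str:
    """Build the free-space map: 1 = free, 0 = in use, block 0 is the leftmost bit."""
    free = set(free_blocks)
    return "".join("1" if b in free else "0" for b in range(total_blocks))


def first_free(bits: str):
    """Return the number of the first free block, or None if the disk is full."""
    i = bits.find("1")
    return None if i == -1 else i


def counting(bits: str) -> list:
    """Compress the map into (first free block, run length) pairs."""
    runs, start = [], None
    for i, bit in enumerate(bits):
        if bit == "1" and start is None:
            start = i                       # a run of free blocks begins
        elif bit == "0" and start is not None:
            runs.append((start, i - start))  # the run ended at block i - 1
            start = None
    if start is not None:                    # the map ended inside a run
        runs.append((start, len(bits) - start))
    return runs


print(bit_vector([2, 3, 4], 5))  # 00111
print(first_free("00111"))       # 2
print(counting("0011100110"))    # [(2, 3), (7, 2)]
```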
File System Examples
• Windows 2000/XP/Server 2003 File Systems
  – Three file systems supported:
    > FAT16
    > FAT32
    > NTFS version 5
  – Extended FAT16
    > Evolved from FAT16 system in earlier versions of Windows
    > Uses long file names (stored in Unicode)
    > ASCII characters generally used / names limited to 255 characters
  – Partitioning / Formatting
    > Each partition is assigned a letter followed by a colon: A:, B:, C:, and so on through Z:
    > Typically, C: is reserved for the first hard disk
    > Format Command
      – Writes the file system structure to the disk
      – Includes several additional switches that modify precise program operation
      – Switches: extra codes to change the way a particular command operates
      – File attributes: file characteristics such as Hidden, Read-only, Archive, etc.
    > File stored to disk
      – Data is written in the clusters on the disk
      – Filename stored in the directory
    > Linked-list method used
      – Bad clusters
        » Areas never used for file storage
    > Formatting a disk
      – Removes / deletes all data that was on the disk
    > The FAT tables and root directory are found at the beginning of the partition
    > Each item in a directory consists of 32 bytes
    > Status bits
      – Identify the type of filename contained in each entry:
        » Volume, Directory, System, Hidden, Read-only, and Archive
  – FAT32
    > Accommodates larger disks than FAT16
    > Allows partitions of up to 2 TB
    > Windows 2000, XP and Server 2003 can convert a volume from FAT16 or FAT32 to NTFS
  – NTFS
    > Advantages of NTFS:
      – Ability to compress file and directory contents on the fly
      – Better recoverability and stability
      – Less disk fragmentation
      – Local file and folder-level security
    > Basic features:
      – Long filenames (LFN)
      – Better file compression than FAT
      – Ability to use larger disks and files
      – File activity tracking for better recovery and stability than FAT
      – POSIX support
      – Volume striping and volume extensions
      – Less disk fragmentation than FAT
      – Equipped with security features that meet the U.S. government’s C2 security specifications
        » High-level, “top secret” standards for data protection, system auditing, and system access
    > NTFS 5 adds several new features:
      – Ability to encrypt files
      – No system reboot required after creating an extended volume
      – Ability to reduce drive designations
      – Indexing for fast access
      – Ability to retain shortcuts and other file information
      – Ability to establish disk quotas
      – Distributed Link Tracking
        » Available in NTFS 5 so that shortcuts are not lost when you move files to another volume
    > Uses a Master File Table (MFT)
      » Located at the beginning of the partition
      » When a file is made in NTFS, a record for that file is added to the MFT
    > Basic disks
      – Use traditional disk management
    > Dynamic disks
      – Set up large volumes on one disk
      – Extend volumes onto additional physical disks
File System Examples – UNIX File System
• Works differently from anything discussed up to this point
• “UNIX file system” is really a misnomer
  – Many different file systems can be used
  – Extended file system (ext or ext fs)
  – Native in Linux and installed by default
  – ufs (UNIX file system), and also ext/ext2/ext3, use the concept of inodes
    > An inode contains
      – General information about the file
      – Information (pointers) used to locate the file’s data
        » Pointer information based on logical blocks
    > The name of the file is stored in the directory entry together with the inode number, not in the inode itself
  – Superblock
    > Information about the layout of blocks, sectors, and cylinder groups
  – Mount command
    > OS told to map the root inode of another file system onto an empty directory
  – Directory is nothing more than a special file
  – Two types of devices
    > Raw devices and block devices
      – Raw device has no logical division into blocks, whereas a block device does
  – Every device must be represented by a device inode
  – Symbolic link
    > Links a directory entry to a shared file
Next Week
• Study Guide 11
  – User Interfaces