
File Management Marc’s first try, Please don’t sue me



Introduction

- Files
  - Long-term existence
  - Can be temporally decoupled from applications
  - Sharable between processes
  - Can be structured to the task
  - Can be viewed in various logical manners
  - Can have permissions for individuals or groups
  - Can be manipulated in a variety of ways


File Manipulation Operations

- Create
- Delete
- Open
- Close
- Read (all or a portion)
- Write (append or update)
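As a rough illustration of the operations listed above, here is a minimal Python sketch. The file name `example.txt` and the helper `demo_file_operations` are made up for this example, and the file lives in a temporary directory so nothing is left behind.

```python
import os
import tempfile

def demo_file_operations(directory):
    """Run create, open, write, read, close, and delete on one scratch file."""
    path = os.path.join(directory, "example.txt")  # hypothetical file name

    # Create (and open for writing); the with-block closes the file on exit.
    with open(path, "w") as f:
        f.write("first line\n")

    # Write in append mode: new data goes after the existing contents.
    with open(path, "a") as f:
        f.write("second line\n")

    # Read a portion (the first 5 characters), then the whole file.
    with open(path, "r") as f:
        portion = f.read(5)
        f.seek(0)
        everything = f.read()

    # Delete: remove the file from its directory.
    os.remove(path)
    return portion, everything, os.path.exists(path)

with tempfile.TemporaryDirectory() as scratch:
    portion, everything, still_exists = demo_file_operations(scratch)
```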


Internal File Structure

- Byte (most UNIX)
- Field
- Record
- File
- Database


Internal File Structure (cont)

- Field:
  - Basic logical element of data
  - Characterized by length and data type
    - ASCII string, decimal, integer, etc.
  - Fixed or variable length
  - With variable-length, may have subfields
  - Length may be indicated by demarcation


Internal File Structure (cont)

- Record:
  - A collection of related fields
  - Can be treated as a unit by app or user
  - Can be fixed or variable length
  - If # of fields is variable, each has a name
  - Entire record usually has a length
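To make fields and records concrete, the sketch below packs one fixed-length record with Python's `struct` module. The layout (an integer id, a 10-byte ASCII name, and a floating-point balance) is an assumption chosen for illustration, not something the slides prescribe.

```python
import struct

# Hypothetical record layout: 4-byte integer id, 10-byte name, 8-byte float balance.
# The "<" prefix selects little-endian with no padding between fields.
RECORD_FORMAT = "<i10sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 + 10 + 8 bytes

def pack_record(record_id, name, balance):
    # struct pads the 10-byte field with NUL bytes when the name is shorter.
    return struct.pack(RECORD_FORMAT, record_id, name.encode("ascii"), balance)

def unpack_record(raw):
    record_id, name, balance = struct.unpack(RECORD_FORMAT, raw)
    # Strip the NUL padding to recover the original field value.
    return record_id, name.rstrip(b"\x00").decode("ascii"), balance

raw = pack_record(7, "Marc", 12.5)
```

Because every record has the same size, record number n starts at byte offset n * RECORD_SIZE, which is what makes direct access to fixed-length records cheap.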


Internal File Structure (cont)

- File:
  - A collection of similar records
  - Treated as a single entity
  - Can be referenced by name
  - Access control restrictions implemented
    - Sometimes enforced at the record or field level


Internal File Structure (cont)

- Database:
  - Collection of related data (many files)
  - Various explicit relationships between data
  - Usually managed by a DBMS
  - Not usually 'built-in' to an OS



File Access Operations

Operating primarily on records, but abstraction can be applied to just bytes:

- Retrieve_All
  - Read all records into memory in sequence
- Retrieve_One
  - Usually associated with interactive, transaction-oriented applications


File Access Operations (cont)

- Retrieve_Next/Previous
  - Retrieve next record in some predefined logical sequence
  - Often associated with search
- Insert_One
  - May involve random access, or appending
- Delete_One
  - Certain linkages or other data structures may require updating to preserve sequencing


File Access Operations (cont)

- Update_One
  - One-two punch:
    - Retrieve a record, update one or more fields, then rewrite the updated record back into the file.
    - With variable-length fields/records, may require much more data structure manipulation.
- Retrieve_Few
  - Get some specified number of records
  - Usually used in databases when selecting on certain criteria
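The Update_One "one-two punch" can be sketched for fixed-length records as a seek, a read, and a rewrite in place. The record layout and the helper names `retrieve_one` and `update_one` are assumptions for this example, and an in-memory `io.BytesIO` stands in for a file on disk.

```python
import io
import struct

RECORD_FORMAT = "<i8s"  # hypothetical layout: integer key, 8-byte ASCII name
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def retrieve_one(f, index):
    # Fixed-length records: record `index` starts at index * RECORD_SIZE.
    f.seek(index * RECORD_SIZE)
    key, name = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
    return key, name.rstrip(b"\x00").decode("ascii")

def update_one(f, index, new_name):
    # Step 1: retrieve the record.
    key, _old_name = retrieve_one(f, index)
    # Step 2: rewrite the updated record over the old one.
    f.seek(index * RECORD_SIZE)
    f.write(struct.pack(RECORD_FORMAT, key, new_name.encode("ascii")))

f = io.BytesIO()
for key, name in [(1, "ann"), (2, "bob"), (3, "cy")]:
    f.write(struct.pack(RECORD_FORMAT, key, name.encode("ascii")))

update_one(f, 1, "robert")
```

The rewrite works in place only because the new record has the same length as the old one. With variable-length records, a longer replacement would not fit in the old slot, which is where the extra data structure manipulation comes from.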


File Management Systems

- Meet data management requirements of user
- Guarantee, whenever possible, that file data are valid
- Optimize performance (both throughput and response time)
- Provide I/O support for various storage devices
- Minimize or eliminate the potential for lost or destroyed data
- Provide a standardized set of I/O interface routines to user processes
- Provide I/O support for multiple users


File System Architecture

- Device drivers
  - Responsible for starting and completing I/O requests to various peripheral devices
- Basic file system (physical I/O level in OS)
  - Deals with interchange of blocks of data
  - Does not understand content
- Basic I/O supervisor (part of OS)
  - Maintains control structures for device I/O, scheduling, and file status
- Logical I/O
  - General-purpose facility for accessing records
  - Maintains basic data about files (indices, etc.)


File Organization and Access

- Several, sometimes conflicting, criteria for organization of files:
  - Short access time
  - Ease of update
  - Economy of storage
  - Simple maintenance
  - Reliability
- Conflict: economy of storage vs. redundancy
  - Redundancy increases access speed and reliability, but also increases storage requirements


Common File Organizations

- Pile
  - Data are collected in the order in which they arrive
  - Each record consists of one burst of data
  - Records may have a wildly varying assortment of fields and field lengths
  - Each field must be self-describing
  - Record access is by exhaustive search
  - When you don't know what you'll get, this uses space well and is easy to update


Common File Organizations (cont)

- Sequential File
  - Fixed format used for records
  - Length and position of each field are known, so only the field values need to be stored
  - First field of every record is the key field; records are then stored in key sequence (can have variations)
  - NOT good for interactive applications with individual record queries or updates
  - Inserting records is also inefficient, requiring periodic "batch merges"
  - Can be implemented by organizing the file physically as a linked list


Common File Organizations (cont)

- Indexed Sequential File
  - Uses an index to support random access
  - Requires an overflow file to handle additions
  - Index uses same key as main file and has a pointer into the file, which greatly improves search time
  - Can have multilevel indices to get blazing fast speed
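A minimal sketch of the indexed sequential idea follows, using in-memory lists in place of disk files. The class name, the `group_size` parameter, and the choice of one index entry per group of records are all assumptions made for illustration.

```python
import bisect

class IndexedSequentialFile:
    """Toy model: sorted main file, one-level index, unsorted overflow area."""

    def __init__(self, records, group_size=3):
        self.group_size = group_size
        # Main file: (key, value) records kept in key order.
        self.main = sorted(records)
        # Index: the key of the first record in each group, plus its position.
        self.index_pos = list(range(0, len(self.main), group_size))
        self.index_keys = [self.main[pos][0] for pos in self.index_pos]
        # Additions land here until a batch merge rebuilds the main file.
        self.overflow = []

    def insert(self, key, value):
        self.overflow.append((key, value))

    def retrieve_one(self, key):
        # Find the last index entry whose key is <= the search key.
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot >= 0:
            start = self.index_pos[slot]
            # Short sequential scan inside the group the index points to.
            for k, v in self.main[start:start + self.group_size]:
                if k == key:
                    return v
        # Not in the main file: fall back to the overflow area.
        for k, v in self.overflow:
            if k == key:
                return v
        return None

isf = IndexedSequentialFile([(k, f"rec{k}") for k in range(10, 100, 10)])
isf.insert(35, "rec35")
```

With nine records and groups of three, a lookup touches three index entries and at most three records instead of scanning all nine. A multilevel index applies the same trick to the index itself.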


Common File Organizations (cont)

- Indexed File
  - Uses an index to support random access
  - Maintains multiple indices, one for each type of field that may be the subject of a search
  - Records are accessed only by their indices, never by traversal
  - Variable-length fields can be used
  - Exhaustive index and partial index may be used


Common File Organizations (cont)

- Hashed File
  - Hashes on the key value to go directly to the record on disk
  - Primarily efficient for fixed-length records and Retrieve_One operations
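The sketch below shows one way a hashed file of fixed-length records might work. Several details are assumptions not found in the slides: the slot layout, the use of `zlib.crc32` as the hash function, the table size of 16 slots, and linear probing to resolve collisions.

```python
import io
import struct
import zlib

SLOT_FORMAT = "<B8s8s"  # hypothetical slot: used flag, 8-byte key, 8-byte value
SLOT_SIZE = struct.calcsize(SLOT_FORMAT)
NUM_SLOTS = 16

def slot_for(key):
    # crc32 is used only because it is stable across runs; any hash would do.
    return zlib.crc32(key.encode("ascii")) % NUM_SLOTS

def new_file():
    # An all-zero file: every slot starts with its used flag cleared.
    return io.BytesIO(bytes(SLOT_SIZE * NUM_SLOTS))

def _read_slot(f, slot):
    f.seek(slot * SLOT_SIZE)
    used, key, value = struct.unpack(SLOT_FORMAT, f.read(SLOT_SIZE))
    return used, key.rstrip(b"\x00").decode("ascii"), value.rstrip(b"\x00").decode("ascii")

def insert_one(f, key, value):
    # Keys and values longer than 8 bytes would be truncated by struct.
    slot = slot_for(key)
    for _ in range(NUM_SLOTS):
        used, existing, _value = _read_slot(f, slot)
        if not used or existing == key:
            f.seek(slot * SLOT_SIZE)
            f.write(struct.pack(SLOT_FORMAT, 1, key.encode("ascii"), value.encode("ascii")))
            return slot
        slot = (slot + 1) % NUM_SLOTS  # linear probing on collision
    raise RuntimeError("hashed file is full")

def retrieve_one(f, key):
    slot = slot_for(key)
    for _ in range(NUM_SLOTS):
        used, existing, value = _read_slot(f, slot)
        if not used:
            return None  # an empty slot ends the probe sequence
        if existing == key:
            return value
        slot = (slot + 1) % NUM_SLOTS
    return None

hashed = new_file()
for name in ["alice", "bob", "carol", "dave"]:
    insert_one(hashed, name, name.upper())
```

When there is no collision, Retrieve_One costs a single seek and read, which is why this organization suits fixed-length records: the slot offset is just the hash value times the slot size.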


File Directories

- Is almost always a file itself
- Contains info for each file like:
  - File name, type, organization
  - Volume, starting address, size used/allocated
  - Owner, access info, permitted actions
  - Creation date, creator, last accessed, last accessor, last modified, last modifier, last backup, current usage


File Directory Operations

- Search
  - Locate directory entry corresponding to specified file
- Create file
  - Add new directory entry
- Delete file
  - Remove directory entry
- List
  - Show directory contents, with possible filters
- Update
  - Change properties of the directory or some file attributes only stored in the directory


Directory Structure

- Could have a simple, single directory
  - Many files make it unwieldy for users
- Hierarchical approach is widely used
  - Master directory with a number of files and other directories contained within
  - Recursive substructure allows virtually unlimited (in modern systems) number of levels
  - Usually uses a hashed structure to store entries


Directory Structure (cont)

- Naming
  - Directory trees prevent the need for unique file or directory names on different levels
  - Pathname (in UNIX) specifies the "level" from the top (root or master directory)
    - /User_B/Draw/ABC
  - Too complicated to specify full path every time, so we have concept of working directory, both for applications and users:
    - If in User_B directory: access ./Draw/ABC
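How a working directory turns a relative pathname into a full one can be sketched with Python's `posixpath` module, which applies UNIX path rules on any platform. The helper name `resolve` is made up for this example.

```python
import posixpath

def resolve(working_directory, pathname):
    """Return the absolute pathname, interpreting relative names
    against the working directory."""
    if posixpath.isabs(pathname):
        # Already a full path from the root; just tidy it.
        return posixpath.normpath(pathname)
    # Relative name: prepend the working directory, then collapse "." and "..".
    return posixpath.normpath(posixpath.join(working_directory, pathname))

full_path = resolve("/User_B", "./Draw/ABC")
```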


Access Rights

Individuals or groups of users are granted certain rights to files or directories, in the following hierarchy:

- None
  - Can't even know about existence of file or directory
- Knowledge
  - User can determine that file exists and its owner
- Execution
  - User can load & execute program but cannot copy
- Read
  - User can read file for any purpose
- Append
  - User can add data to the file but cannot modify or delete
- Update
  - User can modify, delete, and add to the file's data (possibly graded)
- Change protection
  - User can change the access rights granted to other users
- Deletion
  - User can delete the file from the file system and do anything else
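One way to model the hierarchy above is an ordered enumeration. This sketch assumes each right implies every right listed before it; the names `Access` and `permits` are invented for the example. Real systems often use independent permission flags instead (UNIX permission bits, for instance), so treat this purely as a model of the slide.

```python
from enum import IntEnum

class Access(IntEnum):
    # Ordered from least to most privileged, following the list above.
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READ = 3
    APPEND = 4
    UPDATE = 5
    CHANGE_PROTECTION = 6
    DELETION = 7

def permits(granted, requested):
    # Assumption: a granted right implies all rights below it in the hierarchy.
    return granted >= requested

reader_can_execute = permits(Access.READ, Access.EXECUTION)
reader_can_append = permits(Access.READ, Access.APPEND)
```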


Simultaneous Access

- When more than one user is granted access to append to or update a file, the OS or file management system must enforce discipline. A brute-force approach is to allow a user to lock the entire file when it is to be updated. A finer grain of control is to lock individual records during update.
- This is the readers/writers problem, and the classic issues of mutual exclusion and deadlock must be addressed.
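Record-level locking can be sketched with one lock per record. The example uses threads and an in-memory list as stand-ins for users and a file; the class `LockedRecordFile` and the worker function are assumptions for illustration. Since each update takes only one lock at a time, this particular sketch cannot deadlock; an update spanning several records would need a lock-ordering rule.

```python
import threading

class LockedRecordFile:
    def __init__(self, num_records):
        self.records = [0] * num_records
        # One lock per record: the finer-grained alternative to locking the whole file.
        self.record_locks = [threading.Lock() for _ in range(num_records)]

    def update_one(self, index, delta):
        # Only this record is locked; updates to other records proceed in parallel.
        with self.record_locks[index]:
            current = self.records[index]          # retrieve
            self.records[index] = current + delta  # rewrite

def run_updates(record_file, index, times):
    for _ in range(times):
        record_file.update_one(index, 1)

shared = LockedRecordFile(2)
workers = [
    threading.Thread(target=run_updates, args=(shared, i % 2, 1000))
    for i in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Two workers update each record 1000 times, and the locks ensure that no retrieve-then-rewrite pair is interleaved with another on the same record.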


Record Blocking

- Blocks are the unit of I/O for secondary storage
- Records are the logical unit of access, and must be organized in blocks to perform I/O
- Three methods:
  - Fixed blocking
    - Fixed-length records are used, with an integral number of records stored in a block. Unused space at the end of a block is internal fragmentation
  - Variable-length spanned blocking
    - Variable-length records are used, packed into blocks with no unused space. Pointers used to span blocks
  - Variable-length unspanned blocking
    - Same as above without spanning, with wasted space in most blocks, because of inability to use remainders
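The space cost of fixed and unspanned blocking comes down to simple arithmetic, sketched below. The function names and the block and record sizes are illustrative assumptions.

```python
def fixed_blocking(block_size, record_size):
    """Return (records per block, bytes wasted per block) for fixed blocking."""
    records_per_block = block_size // record_size
    wasted = block_size - records_per_block * record_size
    return records_per_block, wasted

def unspanned_waste(block_size, record_sizes):
    """Pack variable-length records in arrival order without spanning.
    Returns the number of unused bytes left in each block."""
    wasted = []
    free = block_size
    for size in record_sizes:
        if size > block_size:
            raise ValueError("a record larger than a block cannot be stored unspanned")
        if size > free:
            # Record does not fit in the remainder: abandon it and start a new block.
            wasted.append(free)
            free = block_size
        free -= size
    wasted.append(free)  # remainder of the last block
    return wasted

per_block, slack = fixed_blocking(512, 100)
leftovers = unspanned_waste(512, [300, 300, 200, 100])
```

With 512-byte blocks and 100-byte records, five records fit per block and 12 bytes are lost to internal fragmentation. In the unspanned case, the second 300-byte record cannot use the 212 bytes left in the first block, so that remainder is wasted.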


Record Blocking (cont)

- Fixed blocking is common for sequential files with fixed-length records
- Variable-length spanned blocking is storage-efficient and does not limit record size, but is more complicated to implement and sometimes inefficient. Files are more difficult to update
- Variable-length unspanned blocking results in wasted space and limits record size to the size of the block
- Record-blocking technique may interact with VM. A page may be implemented as an integral number of blocks, or vice versa


File Allocation

- Preallocation vs Dynamic Allocation
  - Preallocation
    - Max file size is declared at time of creation
    - Almost impossible to estimate reliably for most applications
    - Potentially very wasteful
  - Dynamic:
    - Allocate space to a file in portions as necessary
    - Sound familiar?


File Allocation (cont)

- Portion Size
  - Choosing a size is a tradeoff. Consider:
    - Contiguity of space increases performance, especially for Retrieve_Next
    - Having a large number of small portions increases the size of tables needed to manage the allocation info
    - Having fixed-size portions (blocks) simplifies the reallocation of space
    - Having variable-size or small fixed-size portions minimizes waste of unused storage due to overallocation
  - Leads to 2 alternatives:
    - Variable, large contiguous portions
      - Better performance, but space hard to reuse
    - Blocks
      - Provide greater flexibility, but may require complex file allocation structures


File Allocation (cont)

- Methods
  - Contiguous allocation (preallocation)
    - File Allocation Table (FAT) needs one entry per file, showing start block and length
    - External fragmentation occurs fairly quickly
    - Defragmentation is required to maintain performance
  - Chained allocation
    - On individual block basis
    - Each block contains a pointer to next block
    - Any free block can be added to a chain
    - No external fragmentation
    - Unfortunately, cannot capitalize on principle of locality
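Chained allocation can be sketched with a dictionary standing in for the disk, where each block stores its data and the number of the next block. The block numbers, the `END_OF_CHAIN` marker, and the helper names are assumptions made for this example.

```python
END_OF_CHAIN = -1  # hypothetical marker stored in the last block's pointer

def read_chain(disk, start_block):
    """Follow the pointers from the start block and collect each block's data."""
    data, block = [], start_block
    while block != END_OF_CHAIN:
        payload, next_block = disk[block]
        data.append(payload)
        block = next_block
    return data

def append_block(disk, start_block, free_block, payload):
    """Link any free block onto the end of the chain."""
    block = start_block
    while disk[block][1] != END_OF_CHAIN:
        block = disk[block][1]
    # Point the old tail at the new block, then write the new tail.
    disk[block] = (disk[block][0], free_block)
    disk[free_block] = (payload, END_OF_CHAIN)

# Simulated disk: block number -> (data, pointer to next block).
disk = {9: ("A", 2), 2: ("B", 14), 14: ("C", END_OF_CHAIN)}
append_block(disk, 9, 5, "D")
chain = read_chain(disk, 9)
```

The file's blocks (9, 2, 14, 5) are scattered across the disk, and reaching the last one means following every pointer before it. That is the locality cost the slide points out.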


File Allocation (cont)

- Indexed allocation
  - FAT contains a separate one-level index per file
  - File index kept in its own block
  - Allocation can be in either fixed-size blocks or variable-size portions
  - By blocks eliminates external fragmentation
  - By portions improves locality
  - File consolidation on a regular basis will improve performance
  - Supports both sequential and direct access
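For comparison with chaining, here is a minimal sketch of block-based indexed allocation, again with a dictionary standing in for the disk. The block numbers, the tiny 4-byte block size, and the helper names are illustrative assumptions.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes, kept tiny for readability

# Simulated disk: block number -> contents. Block 24 is the file's index block.
disk = {
    24: [1, 8, 3],  # index block: the file's data blocks, in logical order
    1: b"file",
    8: b" man",
    3: b"age",
}

def read_sequential(disk, index_block):
    """Sequential access: walk the index from first entry to last."""
    return b"".join(disk[block] for block in disk[index_block])

def read_byte(disk, index_block, offset):
    """Direct access: one index lookup finds the block holding the byte."""
    block_number = disk[index_block][offset // BLOCK_SIZE]
    within = offset % BLOCK_SIZE
    return disk[block_number][within:within + 1]

whole_file = read_sequential(disk, 24)
sixth_byte = read_byte(disk, 24, 5)
```

Unlike chaining, reaching a byte in the middle of the file does not require visiting the earlier data blocks; the index gives the block number directly.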


File Allocation (cont)

Free Space Management: in addition to the FAT, we need a disk allocation table (DAT) to manage free space.

- Bit Tables
  - A vector containing one bit for each block on the disk
  - Can be very fast in main memory, tradeoff is space
- Chained Free Portions
  - Free portions are chained together by using a pointer and length value in each free portion
  - Lends itself to high amounts of fragmentation, and even deletion of highly fragmented files becomes a chore
- Indexing (only for variable-size portions)
  - Treats free space like a file and uses an index table
  - One entry for every free portion, quite efficient
- Free Block List
  - Each block is assigned a number sequentially, and a list of the numbers of all free blocks is maintained in a reserved portion of the storage
  - Efficiency can be achieved by maintaining a small portion of the list in memory at any given time
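A bit table is small enough to sketch in full. The class below keeps one bit per block in a `bytearray`; the names `BitTable` and `table_size_bytes`, and the first-fit search in `allocate`, are assumptions for illustration.

```python
class BitTable:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # One bit per block, rounded up to whole bytes: 0 = free, 1 = in use.
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        # First fit: scan for the lowest-numbered free block.
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        # Clear the block's bit, keeping the result within one byte.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

def table_size_bytes(disk_bytes, block_bytes):
    """Memory needed for a bit table: one bit per block."""
    return disk_bytes // block_bytes // 8

table = BitTable(10)
first_three = [table.allocate() for _ in range(3)]
table.free(1)
reused = table.allocate()
```

As a sense of the space tradeoff, `table_size_bytes(16 * 2**30, 512)` gives 4 MiB for a 16 GiB disk with 512-byte blocks.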


Reliability

- Consider this scenario:
  - User A requests a file allocation to add to an existing file
  - The request is granted and the disk and file allocation tables are updated in main memory but not yet on disk
  - The system crashes and subsequently restarts
  - User B requests a file allocation and is allocated space on disk that overlaps the last allocation to user A
  - User A accesses the overlapped portion via a reference that is stored inside A's file


Reliability (cont)

- Solution:
  - Lock the disk allocation table on disk, preventing another user from altering the table until the current allocation is completed
  - Search the DAT (in memory) for available space
  - Allocate space, update DAT, and update disk (write DAT back to disk, and possibly update pointers for chained allocation)
  - Update the FAT on disk
  - Unlock the DAT
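The solution steps above can be sketched as follows. Plain Python attributes stand in for the on-disk and in-memory tables, and the class `Volume` with its methods is an assumption for illustration; a real file system would also have to make each disk write durable.

```python
import threading

class Volume:
    def __init__(self, num_blocks):
        self.dat_lock = threading.Lock()
        self.disk_dat = [False] * num_blocks   # "on-disk" DAT: True = allocated
        self.disk_fat = {}                     # "on-disk" FAT: file name -> block list
        self.memory_dat = list(self.disk_dat)  # working copy in main memory

    def allocate(self, file_name):
        with self.dat_lock:                           # 1. lock the DAT
            # 2. search the in-memory DAT (raises ValueError if the disk is full)
            block = self.memory_dat.index(False)
            self.memory_dat[block] = True             # 3. allocate and update the DAT...
            self.disk_dat = list(self.memory_dat)     #    ...then write it back to disk
            self.disk_fat.setdefault(file_name, []).append(block)  # 4. update the FAT on disk
            return block                              # 5. leaving the with-block unlocks the DAT

    def restart_after_crash(self):
        # After a restart, only the on-disk tables survive.
        self.memory_dat = list(self.disk_dat)

volume = Volume(4)
block_a = volume.allocate("A")
volume.restart_after_crash()
block_b = volume.allocate("B")
```

Because the DAT reaches the disk before `allocate` returns, a crash after user A's request cannot cause user B to be handed the same block.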