File Management
Marc’s first try,
Please don’t sue me.
Introduction
- Files
  - Long-term existence
  - Can be temporally decoupled from applications
  - Sharable between processes
  - Can be structured to the task
  - Can be viewed in various logical manners
  - Can have permissions for individuals or groups
  - Can be manipulated in a variety of ways
File Manipulation Operations
- Create
- Delete
- Open
- Close
- Read (all or a portion)
- Write (append or update)
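The six operations above map onto ordinary file calls in most languages. A minimal Python sketch, assuming a throwaway temporary file; the function name `demo_file_operations` is made up for illustration:

```python
import os
import tempfile

def demo_file_operations():
    """Walk through create, open, write, append, read, close, and delete."""
    # Create: mkstemp makes a new, empty file and returns an OS-level handle.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)

    # Open + write (update): mode "w" truncates and rewrites the contents.
    # Leaving the with-block closes the file.
    with open(path, "w") as f:
        f.write("first line\n")

    # Write (append): mode "a" adds data at the end and keeps what is there.
    with open(path, "a") as f:
        f.write("second line\n")

    # Read all, then read only a portion (the first 5 characters).
    with open(path) as f:
        everything = f.read()
    with open(path) as f:
        portion = f.read(5)

    # Delete: remove the file from the file system.
    os.remove(path)
    return everything, portion, os.path.exists(path)
```

The return value reports what was read back and whether the file still exists after the delete.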
Internal File Structure
- Byte (most UNIX)
- Field
- Record
- File
- Database
Internal File Structure (cont)
- Field:
  - Basic logical element of data
  - Characterized by length and data type
    - ASCII string, decimal, integer, etc.
  - Fixed or variable length
  - With variable length, may have subfields
  - Length may be indicated by demarcation
Internal File Structure (cont)
- Record:
  - A collection of related fields
  - Can be treated as a unit by app or user
  - Can be fixed or variable length
  - If # of fields is variable, each has a name
  - Entire record usually has a length
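A fixed-length record made of typed, fixed-length fields can be sketched with Python's `struct` module. The layout here (an integer id, a 16-byte ASCII name, a floating-point balance) is a hypothetical example, not something the slides specify:

```python
import struct

# Hypothetical layout: 4-byte unsigned id, 16-byte name, 8-byte float balance.
# "<" means little-endian with no padding, so the record is exactly 28 bytes.
RECORD = struct.Struct("<I16sd")

def pack_record(rec_id, name, balance):
    """Turn three field values into one fixed-length record."""
    # The "16s" field is padded with zero bytes when the name is shorter.
    return RECORD.pack(rec_id, name.encode("ascii"), balance)

def unpack_record(raw):
    """Split a raw record back into its fields."""
    rec_id, name, balance = RECORD.unpack(raw)
    return rec_id, name.rstrip(b"\x00").decode("ascii"), balance
```

Because the length and position of every field are known, only the field values need to be stored.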
Internal File Structure (cont)
- File:
  - A collection of similar records
  - Treated as a single entity
  - Can be referenced by name
  - Access control restrictions implemented
  - Sometimes enforced at the record or field level
Internal File Structure (cont)
- Database:
  - Collection of related data (many files)
  - Various explicit relationships between data
  - Usually managed by a DBMS
  - Not usually ‘built-in’ to an OS
File Access Operations
- Operating primarily on records, but abstraction can be applied to just bytes:
  - Retrieve_All
    - Read all records into memory in sequence
  - Retrieve_One
    - Usually associated with interactive, transaction-oriented applications
File Access Operations (cont)
- Retrieve_Next/Previous
  - Retrieve next record in some predefined logical sequence.
  - Often associated with search
- Insert_One
  - May involve random access, or appending
- Delete_One
  - Certain linkages or other data structures may require updating to preserve sequencing
File Access Operations (cont)
- Update_One
  - One-two punch:
    - Retrieve a record, update one or more fields, then rewrite the updated record back into the file.
    - With variable-length fields/records, may require much more data structure manipulation.
- Retrieve_Few
  - Get some specified number of records
  - Usually used in databases when selecting on certain criteria
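For fixed-length records, Retrieve_One and Update_One reduce to a seek plus a read or write. A rough sketch, assuming a hypothetical 16-byte record layout and an in-memory `io.BytesIO` standing in for the file; `build_file` is a helper invented for the demo:

```python
import io
import struct

RECORD = struct.Struct("<I12s")   # hypothetical layout: 4-byte id + 12-byte name

def build_file(names):
    """Create an in-memory 'file' with one record per name."""
    f = io.BytesIO()
    for i, name in enumerate(names):
        f.write(RECORD.pack(i, name.encode("ascii")))
    return f

def retrieve_one(f, n):
    """Read record number n (0-based) by seeking straight to its offset."""
    f.seek(n * RECORD.size)
    rec_id, name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, name.rstrip(b"\x00").decode("ascii")

def update_one(f, n, new_name):
    """Retrieve record n, change one field, rewrite the record in place."""
    rec_id, _ = retrieve_one(f, n)
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(rec_id, new_name.encode("ascii")))

def retrieve_all(f):
    """Read every record in sequence."""
    f.seek(0)
    count = len(f.read()) // RECORD.size
    return [retrieve_one(f, i) for i in range(count)]
```

The in-place rewrite only works because the new record has the same length as the old one; with variable-length records the update could force later records to move.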
File Management Systems
- Meet data management requirements of user
- Guarantee, whenever possible, that file data are valid
- Optimize performance (both throughput and response time)
- Provide I/O support for various storage devices
- Minimize or eliminate the potential for lost or destroyed data
- Provide a standardized set of I/O interface routines to user processes
- Provide I/O support for multiple users
File System Architecture
- Device drivers
  - Responsible for starting and completing I/O requests to various peripheral devices
- Basic file system (physical I/O level in OS)
  - Deals with interchange of blocks of data
  - Does not understand content
- Basic I/O supervisor (part of OS)
  - Maintains control structures for device I/O, scheduling, and file status.
- Logical I/O
  - General-purpose facility for accessing records
  - Maintains basic data about files (indices, etc)
File Organization and Access
- Several, sometimes conflicting criteria for organization of files:
  - Short access time
  - Ease of update
  - Economy of storage
  - Simple maintenance
  - Reliability
- Conflict: economy of storage vs. redundancy
  - Redundancy increases access speed and reliability, but also increases storage requirements
Common File Organizations
- Pile
  - Data are collected in the order in which they arrive
  - Each record consists of one burst of data
  - Records may have a wildly varying assortment of fields and field-lengths
  - Each field must be self-describing
  - Record access is by exhaustive search.
  - When you don’t know what you’ll get, this uses space well and is easy to update
Common File Organizations (cont)
- Sequential File
  - Fixed format used for records
  - Length and position of each field known, so only the values of fields must be stored
  - First field of every record is key field, records then stored in key sequence (can have variations)
  - NOT good for interactive applications with individual record queries or updates
  - Inserting records is also inefficient, requiring periodic “batch merges”
  - Can be implemented by organizing file physically as linked list
Common File Organizations (cont)
- Indexed Sequential File
  - Uses an index to support random access
  - Requires an overflow file to handle additions
  - Index uses same key as main file, and has a pointer into the file, greatly improves search time.
  - Can have multilevel indices to get blazing fast speed
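One way to picture the indexed sequential organization is a sorted main file, a sparse single-level index over it, and an unsorted overflow area for additions. The class below is an in-memory sketch under those assumptions; the class name, the `stride` parameter, and the list-based "files" are illustrative choices, not part of the slides:

```python
import bisect

class IndexedSequentialFile:
    def __init__(self, records, stride=3):
        # Main file: (key, value) pairs kept in key sequence.
        self.main = sorted(records)
        self.stride = stride
        # Sparse index: the key of every stride-th record plus its position.
        self.index_pos = list(range(0, len(self.main), stride))
        self.index_keys = [self.main[p][0] for p in self.index_pos]
        # Overflow file: additions land here until a batch merge.
        self.overflow = []

    def insert(self, key, value):
        self.overflow.append((key, value))

    def lookup(self, key):
        # Find the last index entry whose key is <= the search key.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i >= 0:
            start = self.index_pos[i]
            # Scan only the short run of records that entry points to.
            for k, v in self.main[start:start + self.stride]:
                if k == key:
                    return v
        # Not in the main file: fall back to scanning the overflow file.
        for k, v in self.overflow:
            if k == key:
                return v
        return None
```

The index narrows each search to at most `stride` records of the main file; a multilevel index would apply the same idea to the index itself.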
Common File Organizations (cont)
- Indexed File
  - Uses an index to support random access
  - Maintains multiple indices for each type of field that may be the subject of a search
  - Records are accessed only by their indices, never by traversal
  - Variable-length fields can be used
  - Exhaustive index and partial index may be used
Common File Organizations (cont)
- Hashed File
  - Hashes on the key value to go directly to the record on disk.
  - Primarily efficient for fixed-length records and Retrieve_One operations
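A hashed file can be sketched as a fixed number of fixed-length slots, with the key's hash choosing the slot. Everything specific below is an assumption made for the example: the 8-slot "disk" (a `bytearray`), the 16-byte record layout, `zlib.crc32` as the hash, and linear probing to resolve collisions, which the slides do not discuss. Keys and values must be non-empty byte strings of at most 8 bytes.

```python
import zlib

SLOTS = 8
RECORD_SIZE = 16                      # hypothetical: 8-byte key + 8-byte value
EMPTY = b"\x00" * RECORD_SIZE

def new_disk():
    return bytearray(SLOTS * RECORD_SIZE)

def slot_for(key):
    # crc32 is stable across runs, unlike Python's built-in hash() for bytes.
    return zlib.crc32(key) % SLOTS

def put(disk, key, value):
    record = key.ljust(8, b"\x00") + value.ljust(8, b"\x00")
    start = slot_for(key)
    for probe in range(SLOTS):                    # linear probing on collision
        off = ((start + probe) % SLOTS) * RECORD_SIZE
        current = disk[off:off + RECORD_SIZE]
        if current == EMPTY or current[:8] == record[:8]:
            disk[off:off + RECORD_SIZE] = record
            return
    raise RuntimeError("hashed file is full")

def retrieve_one(disk, key):
    padded = key.ljust(8, b"\x00")
    start = slot_for(key)
    for probe in range(SLOTS):
        off = ((start + probe) % SLOTS) * RECORD_SIZE
        current = disk[off:off + RECORD_SIZE]
        if current == EMPTY:
            return None                           # empty slot ends the search
        if current[:8] == padded:
            return bytes(current[8:]).rstrip(b"\x00")
    return None
```

With fixed-length records the slot offset is a single multiplication, which is why this organization suits Retrieve_One.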
File Directories
- Is almost always a file itself
- Contains info for each file like:
  - File name, type, organization
  - Volume, starting address, size used/allocated
  - Owner, access info, permitted actions
  - Creation date, creator, last accessed, last accessor, last modified, last modifier, last backup, current usage
File Directory Operations
- Search
  - Locate directory entry corresponding to specified file
- Create file
  - Add new directory entry
- Delete file
  - Remove directory entry
- List
  - Show directory contents, with possible filters
- Update
  - Change properties of the directory or some file attributes only stored in the directory
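The five directory operations can be modeled with a dictionary from file name to attributes. This is a toy in-memory model with made-up names (`Directory`, `list_files`, the `owner` attribute), not how any particular file system stores its directories:

```python
class Directory:
    def __init__(self):
        self.entries = {}          # file name -> dict of attributes

    def search(self, name):
        """Locate the directory entry for a file, or None if absent."""
        return self.entries.get(name)

    def create_file(self, name, **attrs):
        """Add a new directory entry."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = dict(attrs)

    def delete_file(self, name):
        """Remove a directory entry."""
        del self.entries[name]

    def list_files(self, keep=lambda name, attrs: True):
        """Show directory contents, optionally filtered."""
        return sorted(n for n, a in self.entries.items() if keep(n, a))

    def update(self, name, **attrs):
        """Change attributes that are stored only in the directory."""
        self.entries[name].update(attrs)
```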
Directory Structure
- Could have a simple, single directory
  - Many files make it unwieldy for users
- Hierarchical approach is widely used
  - Master directory with a number of files and other directories contained within
  - Recursive substructure allows virtually unlimited (in modern systems) number of levels
  - Usually uses a hashed structure to store entries
Directory Structure (cont)
- Naming
  - Directory trees prevent the need for unique file or directory names on different levels
  - Pathname (in UNIX) specifies the “level” from the top (root or master directory)
    - /User_B/Draw/ABC
  - Too complicated to specify full path every time, so we have concept of working directory, both for applications and users:
    - If in User_B directory: access ./Draw/ABC
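Resolving a relative path against the working directory can be shown with Python's `posixpath` module, which applies UNIX path rules on any platform. The helper name `resolve` is made up for this sketch:

```python
import posixpath

def resolve(working_dir, path):
    """Turn a path into an absolute pathname, UNIX style."""
    if posixpath.isabs(path):
        # Already starts at the root: just tidy it up.
        return posixpath.normpath(path)
    # Relative: interpret it starting from the working directory.
    return posixpath.normpath(posixpath.join(working_dir, path))
```

With `/User_B` as the working directory, `resolve("/User_B", "./Draw/ABC")` gives `/User_B/Draw/ABC`, the same file the full pathname names.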
Access Rights
- Individuals or groups of users are granted certain rights to files or directories, in the following hierarchy:
  - None
    - Can’t even know about existence of file or directory
  - Knowledge
    - User can determine that file exists and its owner
  - Execution
    - User can load & execute program but cannot copy
  - Read
    - User can read file for any purpose
  - Append
    - User can add data to the file but cannot modify or delete
  - Update
    - User can modify, delete, and add to the file’s data (possibly graded)
  - Change protection
    - User can change the access rights granted to other users
  - Deletion
    - User can delete the file from the file system and do anything else.
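If the list is read as a strict hierarchy, where each right includes every right above it in the list, a permission check becomes a single comparison. A small sketch under that reading; the names `Access` and `allowed` are hypothetical:

```python
from enum import IntEnum

class Access(IntEnum):
    # Ordered from weakest to strongest, following the list above.
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READ = 3
    APPEND = 4
    UPDATE = 5
    CHANGE_PROTECTION = 6
    DELETION = 7

def allowed(granted, needed):
    """A granted right covers the needed one if it sits at or above it."""
    return granted >= needed
```

So a user holding `Access.UPDATE` may read, while a user holding `Access.READ` may not append.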
Simultaneous Access
- When access is granted to append or update a file to more than one user, the OS or file management system must enforce discipline. A brute-force approach is to allow a user to lock the entire file when it is to be updated. A finer grain of control is to lock individual records during update.
- This is the readers/writers problem, and the classic issues of mutual exclusion and deadlock must be addressed.
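Record-level locking can be sketched with one lock per record, so that updates to different records proceed in parallel while updates to the same record are serialized. This is an in-memory illustration using threads in place of users; `LockedRecordFile` and `run_demo` are invented names:

```python
import threading

class LockedRecordFile:
    def __init__(self, n_records):
        self.records = [0] * n_records
        # Finer grain than one lock for the whole file: one lock per record.
        self.locks = [threading.Lock() for _ in range(n_records)]

    def update(self, n, fn):
        # Only one thread at a time may read-modify-write record n.
        with self.locks[n]:
            self.records[n] = fn(self.records[n])

def run_demo(threads=4, updates=1000):
    f = LockedRecordFile(2)

    def worker(rec):
        for _ in range(updates):
            f.update(rec, lambda v: v + 1)

    # Threads alternate between record 0 and record 1.
    pool = [threading.Thread(target=worker, args=(i % 2,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return f.records
```

Each update holds a single lock, so deadlock cannot arise here; it becomes a concern once one operation needs to hold locks on several records at the same time.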
Record Blocking
- Blocks are the unit of I/O for secondary storage
- Records are logical unit of access, and must be organized in blocks to perform I/O
- Three methods:
  - Fixed blocking
    - Fixed-length records are used, with integral number of records stored in a block. Internal fragmentation
  - Variable-length spanned blocking
    - Variable-length records are used, packed into blocks with no unused space. Pointers used to span blocks
  - Variable-length unspanned blocking
    - Same as above without spanning, with wasted space in most blocks, because of inability to use remainders
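The space cost of the three methods can be compared with a little arithmetic. The functions below are a simplified sketch: the names are made up, and the spanned case ignores the bytes taken by the spanning pointers.

```python
import math

def fixed_blocking(block_size, record_size):
    """Records per block and bytes lost to internal fragmentation."""
    per_block = block_size // record_size
    wasted = block_size - per_block * record_size
    return per_block, wasted

def spanned_blocks(block_size, record_sizes):
    """Blocks needed when records are packed with no unused space."""
    return math.ceil(sum(record_sizes) / block_size)

def unspanned_blocks(block_size, record_sizes):
    """Blocks needed when a record may not cross a block boundary."""
    blocks, free = 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError("record larger than a block")
        if size > free:            # remainder too small: start a new block
            blocks += 1
            free = block_size
        free -= size
    return blocks
```

For example, three 300-byte records fit in two 512-byte blocks when spanned but need three blocks when unspanned, because the 212-byte remainder of each block cannot be used.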
Record Blocking (cont)
- Fixed blocking common for sequential files with fixed-length records
- Variable-length spanned blocking is efficient of storage and does not limit record size, but more complicated to implement and sometimes inefficient. Files are more difficult to update
- Variable-length unspanned blocking results in wasted space and limits record size to the size of the block
- Record-blocking technique may interact with VM. Page may be implemented as integral number of blocks, or vice versa
File Allocation
- Preallocation vs Dynamic Allocation
  - Preallocation
    - Max file size is declared at time of creation
    - Almost impossible to estimate reliably for most applications
    - Potentially very wasteful
  - Dynamic:
    - Allocate space to a file in portions as necessary
- Sound familiar?
File Allocation (cont)
- Portion Size
  - Choosing a size is a tradeoff. Consider:
    - Contiguity of space increases performance, especially for Retrieve_Next
    - Having a large number of small portions increases the size of tables needed to manage the allocation info
    - Having fixed-size portions (blocks) simplifies the reallocation of space
    - Having variable-size or small fixed-size portions minimizes waste of unused storage due to overallocation
  - Leads to 2 alternatives:
    - Variable, large contiguous portions
      - Better performance, but space hard to reuse
    - Blocks
      - Provide greater flexibility, but may require complex FA structures
File Allocation (cont)
- Methods
  - Contiguous allocation – preallocation
    - File Allocation Table (FAT) needs one entry per file, showing start block and length
    - External fragmentation occurs fairly quickly
    - Defragmentation is required to maintain performance
  - Chained allocation
    - On individual block basis
    - Each block contains a pointer to next block
    - Any free block can be added to a chain
    - No external fragmentation
    - Unfortunately, cannot capitalize on principle of locality
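Chained allocation can be sketched with a table of next-block pointers. In this toy model the pointers live in a dictionary rather than inside the blocks themselves, and the names `allocate_chain`, `blocks_of`, and `END` are hypothetical:

```python
END = -1   # marks the last block of a chain

def allocate_chain(free_blocks, next_ptr, n):
    """Take any n free blocks, link them, and return the first block."""
    if n < 1 or n > len(free_blocks):
        raise ValueError("cannot allocate %d blocks" % n)
    # Any free block will do, so there is no external fragmentation.
    chain = [free_blocks.pop() for _ in range(n)]
    for current, following in zip(chain, chain[1:]):
        next_ptr[current] = following
    next_ptr[chain[-1]] = END
    return chain[0]

def blocks_of(start, next_ptr):
    """Follow the pointers from the start block to list a file's blocks."""
    out = []
    block = start
    while block != END:
        out.append(block)
        block = next_ptr[block]
    return out
```

The blocks of one file can end up scattered across the disk, which is why this method gains nothing from locality.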
File Allocation (cont)
- Indexed allocation
  - FAT contains a separate one-level index per file
  - File index kept in its own block
  - Allocation can be in either fixed-size blocks or variable-size portions
    - By blocks eliminates external fragmentation
    - By portions improves locality
  - File consolidation on a regular basis will improve performance
  - Supports both sequential and direct access
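With block-based indexed allocation, direct access is one division and one table lookup. A minimal sketch, assuming 512-byte blocks and an index held as a Python list; both are illustrative choices:

```python
BLOCK_SIZE = 512   # hypothetical block size in bytes

def block_for_offset(index_block, byte_offset):
    """Map a byte offset in the file to (disk block, offset inside it)."""
    logical_block = byte_offset // BLOCK_SIZE
    # The file's index lists its disk blocks in logical order.
    return index_block[logical_block], byte_offset % BLOCK_SIZE
```

Sequential access walks the index in order; direct access jumps to any entry without reading the blocks before it.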
File Allocation (cont)
- Free Space Management – In addition to FAT we need disk allocation table (DAT) to manage free space
  - Bit Tables
    - A vector containing one bit for each block on the disk
    - Can be very fast in main memory, tradeoff is space
  - Chained Free Portions
    - Free portions are chained together by using a pointer and length value in each free portion
    - Lends itself to high amounts of fragmentation, and even deletion of highly fragmented files becomes a chore
  - Indexing (only for variable-size portions)
    - Treats free space like a file and uses an index table.
    - One entry for every free portion, quite efficient
  - Free Block List
    - Each block assigned a number sequentially and list of the numbers of all free blocks is maintained in a reserved portion of the storage.
    - Efficiency can be achieved by maintaining a small portion of the list in memory at any given time
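A bit table is simple to sketch: one bit per block, packed eight to a byte. The class below is an illustration with invented names and a first-free search; real systems typically scan a word at a time.

```python
class BitTable:
    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        # One bit per block, 0 = free and 1 = in use, eight blocks per byte.
        self.bits = bytearray((n_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it."""
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("disk full")

    def free(self, block):
        """Clear the block's bit so it can be allocated again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

The space tradeoff is easy to estimate: the table needs one bit per block, so a disk of 1,000,000 blocks needs 125,000 bytes.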
Reliability
- Consider this scenario:
  - User A requests a file allocation to add to an existing file
  - The request is granted and the disk and file allocation tables are updated in main memory but not yet on disk
  - The system crashes and subsequently restarts
  - User B requests a file allocation and is allocated space on disk that overlaps the last allocation to user A
  - User A accesses the overlapped portion via a reference that is stored inside A’s file
Reliability (cont)
- Solution:
  - Lock the disk allocation table on disk, preventing another user from altering the table until the current allocation is completed
  - Search the DAT (in memory) for available space
  - Allocate space, update DAT, and update disk (write DAT back to disk, and possibly update pointers for chained allocation).
  - Update the FAT on disk
  - Unlock the DAT
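The steps of the solution can be sketched as one locked routine. This is a toy model: the "disk" copies of the DAT and FAT are plain Python objects, a crash is simulated by reloading the in-memory DAT from the disk copy, and the names `Volume`, `allocate`, and `crash_and_restart` are invented.

```python
import threading

class Volume:
    def __init__(self, n_blocks):
        self.dat_lock = threading.Lock()
        self.dat_memory = [False] * n_blocks     # True = block allocated
        self.dat_disk = list(self.dat_memory)    # stand-in for the on-disk DAT
        self.fat_disk = {}                       # file name -> list of blocks

    def allocate(self, name, n):
        with self.dat_lock:                      # lock the DAT
            # Search the DAT (in memory) for available space.
            free = [b for b, used in enumerate(self.dat_memory) if not used]
            if len(free) < n:
                raise RuntimeError("not enough free space")
            chosen = free[:n]
            # Allocate space and update the DAT in memory.
            for b in chosen:
                self.dat_memory[b] = True
            # Write the DAT back to disk before anyone else can allocate.
            self.dat_disk = list(self.dat_memory)
            # Update the FAT on disk.
            self.fat_disk.setdefault(name, []).extend(chosen)
            return chosen                        # leaving the block unlocks

    def crash_and_restart(self):
        # After a restart only what reached the disk survives.
        self.dat_memory = list(self.dat_disk)
```

Because the DAT reaches the disk before the lock is released, a crash after A's allocation no longer lets B receive the same blocks.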