Chapter 2
Simple File Storage and Retrieval

Fundamentals of Database Management Systems
by Mark L. Gillenson, Ph.D.
University of Memphis

Presentation by: Amita Goyal Chin, Ph.D.
Virginia Commonwealth University

John Wiley & Sons, Inc.

Chapter Objectives

Discuss the nature of data.

Define data-related terms such as entity and attribute.

Define storage-related terms such as field, record, and file.

Identify the four basic operations performed on stored data.

Compare sequential access of data with direct access of data.

Describe how a disk device works.

Describe the principles of file organizations and access methods.

Describe how simple linear indexes and B+-tree indexes work.

Describe how hashed files work.

What is Data?

A single piece of data is a single fact about something that interests us.

A fact can be any characteristic of an object.

Salesperson  Salesperson                   Office  Commission  Year of
Number       Name         City      State  Number  Percentage  Hire
137          Baker        Detroit   MI     1284    10          1995

Figure 2.1 Facts about salesperson Baker.

Records and Files

Entity - a “thing” or “object” in our environment that we want to keep track of.

Entity set - a collection of entities of the same type (e.g., all of the company’s employees).

Attribute - a property of, a characteristic of, or a fact that we know about an entity.

Some attributes have unique values within an entity set.

Salesperson  Salesperson                   Office  Commission  Year of
Number       Name         City      State  Number  Percentage  Hire
119          Taylor       New York  NY     1211    15          2003
137          Baker        Detroit   MI     1284    10          1995
186          Adams        Dallas    TX     1253    15          2001
204          Dickens      Dallas    TX     1209    10          1998
255          Lincoln      Atlanta   GA     1268    20          2003
361          Carlyle      Detroit   MI     1227    20          2001
420          Green        Tucson    AZ     1263    10          1993

Figure 2.2 Salesperson file.

Record - each row of a structure like the one above.

Fields - the columns, representing the facts.

File - the entire structure.

Key field - Salesperson Number (labeled in the figure).

Record type - a structural description of each and every record in the file.

Record occurrence / Record instance - a specific record of the salesperson file.
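
The distinction between a record type and a record occurrence can be sketched in code. The following is a minimal illustration, not part of the original slides: the class name `Salesperson` and the Python field names are assumptions chosen to mirror the column headings of Figure 2.2, and only three of the seven rows are loaded.

```python
from dataclasses import dataclass, fields


@dataclass
class Salesperson:
    """Record type: the structural description shared by every record."""
    salesperson_number: int  # key field: unique within the file
    salesperson_name: str
    city: str
    state: str
    office_number: int
    commission_percentage: int
    year_of_hire: int


# The file: a collection of record occurrences (rows of Figure 2.2).
salesperson_file = [
    Salesperson(119, "Taylor", "New York", "NY", 1211, 15, 2003),
    Salesperson(137, "Baker", "Detroit", "MI", 1284, 10, 1995),
    Salesperson(186, "Adams", "Dallas", "TX", 1253, 15, 2001),
]

# The fields are the columns; each one holds one kind of fact.
field_names = [f.name for f in fields(Salesperson)]

# One record occurrence: the row for salesperson Baker.
baker = salesperson_file[1]
```

Here the class plays the role of the record type, each constructed object is a record occurrence, and the list stands in for the file.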

Retrieving and Manipulating Data

Four fundamental operations can be performed on stored data:

  Retrieve or Read - looking at a record’s contents without changing it

  Insert - adding a new record to the file, as when a new salesperson is hired

  Delete - deleting a record from the file, as when a salesperson leaves the company

  Update - changing one or more of a record’s field values
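
The four operations can be sketched against a small in-memory stand-in for a file. This is an illustration under stated assumptions: the function names, the dictionary layout, and the new commission value of 12 are hypothetical and do not come from the slides.

```python
# A tiny in-memory "file": salesperson number -> record (a dict of fields).
salesperson_file = {
    137: {"name": "Baker", "city": "Detroit", "commission": 10},
    186: {"name": "Adams", "city": "Dallas", "commission": 15},
}


def retrieve(file, key):
    """Read a record's contents without changing it (returns a copy)."""
    return dict(file[key])


def insert(file, key, record):
    """Add a new record, as when a new salesperson is hired."""
    if key in file:
        raise KeyError(f"record {key} already exists")
    file[key] = dict(record)


def delete(file, key):
    """Remove a record, as when a salesperson leaves the company."""
    del file[key]


def update(file, key, **changes):
    """Change one or more of a record's field values."""
    file[key].update(changes)


insert(salesperson_file, 204, {"name": "Dickens", "city": "Dallas", "commission": 10})
update(salesperson_file, 137, commission=12)  # hypothetical new value
delete(salesperson_file, 186)
baker = retrieve(salesperson_file, 137)
```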

Data Retrieval Method

Sequential access - the retrieval of all or a portion of the records of a file one after another, in some sequence, starting from the beginning, until all of the required records have been retrieved.

  Physical sequential access - records are retrieved, one after the other, just as they are stored on the disk device.

  Logical sequential access - records are retrieved in an order based on the values of one or a combination of the fields.

Direct access - the retrieval of a single record of a file or a subset of the records of a file based on one or more values of a field or a combination of fields in the file.

  A crucial concept in information systems today.

  Requires a hardware storage device that will accommodate direct access.

  Requires software that will take advantage of the hardware’s capabilities and store and retrieve the data in such a way that it accomplishes direct access.
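
The three retrieval styles can be contrasted with a short sketch. It assumes an in-memory list in place of the disk, and it uses a Python dictionary as a stand-in for whatever index or hashing method provides direct access; neither choice comes from the slides.

```python
# Records in physical (stored) order: (salesperson_number, name, city).
records = [
    (119, "Taylor", "New York"),
    (137, "Baker", "Detroit"),
    (186, "Adams", "Dallas"),
    (204, "Dickens", "Dallas"),
]

# Physical sequential access: read records just as they are stored.
physical_order = [name for _, name, _ in records]

# Logical sequential access: read records in order of a field's values
# (here, salesperson name).
logical_order = [name for _, name, _ in sorted(records, key=lambda r: r[1])]

# Direct access: go straight to the record matching a field value.
# ASSUMPTION: a dict stands in for the index or hashing method.
by_number = {rec[0]: rec for rec in records}
adams = by_number[186]
```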

Disk Storage

Primary (Main) Memory - where computers execute programs and process data

  Very fast

  Permits direct access

  Has several drawbacks:
    relatively expensive
    not transportable
    is volatile

Secondary Memory - stores the vast volume of data and the programs that process them

Data is loaded from secondary memory into primary memory when required for processing.

Primary and Secondary Memory

When a person needs some particular information that’s not in her brain at the moment, she finds a book in the library that has the information and, by reading it, transfers the information from the book into her brain.

How Disk Storage Works

Disks come in a variety of types and capacities:

  3.5” diskettes hold 1.44 MB on a single plastic disk or platter

  Large, multi-platter, aluminum or ceramic disk units

Provide a direct access capability to the data.

PC diskettes are designed to be removable.

Fixed or hard disk drives in PCs are designed to be nonremovable.

Several disk platters are stacked together, and mounted on a central spindle, with some space in between them.

Referred to as “the disk.”

The platters have a metallic coating that can be magnetized, and this is how the data is stored, bit-by-bit.

Access Arm Mechanism

The basic disk drive has one access arm mechanism with arms that can reach in between the disks.

At the end of each arm are two read/write heads.

The platters spin, all together as a single unit, on the central spindle, at a high velocity.

Tracks

Concentric circles on which data is stored, serially by bit.

Numbered track 0, track 1, track 2, and so on.

Cylinders

A collection of tracks, one from each recording surface, one directly above the other.

Number of cylinders in a disk = number of tracks on any one of its recording surfaces.

The collection of each surface’s track 76, one above the other, seems to take the shape of a cylinder.

This collection of tracks is called cylinder 76.

Once we have established a cylinder, it is also necessary to number the tracks within the cylinder.

Cylinder 76’s tracks.

Steps in Finding and Transferring Data

Seek Time - The time it takes to move the access arm mechanism to the correct cylinder from whatever cylinder it’s currently positioned at.

Head Switching - Selecting the read/write head to access the required track of the cylinder.

Rotational Delay - Waiting for the desired data on the track to arrive under the read/write head as the disk is spinning.

Transfer Time - The time to actually move the data from the disk to primary memory once the previous 3 steps have been completed.
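
The four steps add up to the total time for one disk access. The sketch below is only a worked piece of arithmetic: every figure in it is hypothetical, and treating the average rotational delay as half a revolution is a common rule of thumb assumed here, not something stated in the slides.

```python
def access_time_ms(seek_ms, head_switch_ms, rpm, transfer_ms):
    """Sum the four steps; rotational delay is taken as half a revolution."""
    revolution_ms = 60_000 / rpm              # time for one full rotation
    rotational_delay_ms = revolution_ms / 2   # ASSUMPTION: average case
    return seek_ms + head_switch_ms + rotational_delay_ms + transfer_ms


# Hypothetical figures, chosen only to make the arithmetic easy to follow:
# 8 ms seek, negligible head switching, 6000 rpm, 1 ms transfer.
total = access_time_ms(seek_ms=8.0, head_switch_ms=0.0, rpm=6000, transfer_ms=1.0)
```

With these numbers one revolution takes 10 ms, so the assumed rotational delay is 5 ms and the total comes to 8 + 0 + 5 + 1 = 14 ms.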

File Organizations and Access Methods

File Organization - the way that we store the data for subsequent retrieval.

Access Method - The way that we retrieve the data, based on it being stored in a particular file organization.

Achieving Direct Access

An index tool.

Hashing Method - a way of storing and retrieving records.

If we know the value of a field of a record that we want to retrieve, the index or hashing method will pinpoint its location in the file and instruct the hardware mechanisms of the disk device where to find it.

The Index

Principle is the same as that governing the index in the back of a book.

The items of interest are copied over into the index, but the original text is not disturbed in any way.

The items in the index are sorted.

Each item in the index is associated with a “pointer.”

Simple Linear Index

Index is ordered by Salesperson Name field.

The first index record shows Adams 3 because the record of the Salesperson file with salesperson name Adams is at relative record location 3 in the Salesperson file.

Index                    Salesperson File
Salesperson  Record      Record  Salesperson  Salesperson
Name         Address     Number  Number       Name         City      ...
Adams        3           1       119          Taylor       New York  ...
Baker        2           2       137          Baker        Detroit   ...
Carlyle      6           3       186          Adams        Dallas    ...
Dickens      4           4       204          Dickens      Dallas    ...
Green        7           5       255          Lincoln      Atlanta   ...
Lincoln      5           6       361          Carlyle      Detroit   ...
Taylor       1           7       420          Green        Tucson    ...

Figure 2.11 Salesperson file on the right with index built over the Salesperson Name field, on the left.
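
A simple linear index like the one in Figure 2.11 can be sketched as a sorted list of (name, record address) pairs. The data is taken from the figure; the function name `find_by_name` and the use of binary search over the sorted index are assumptions made for illustration, since the slides do not say how the index is searched.

```python
from bisect import bisect_left

# Salesperson file; relative record location = list position + 1.
salesperson_file = [
    (119, "Taylor", "New York"),
    (137, "Baker", "Detroit"),
    (186, "Adams", "Dallas"),
    (204, "Dickens", "Dallas"),
    (255, "Lincoln", "Atlanta"),
    (361, "Carlyle", "Detroit"),
    (420, "Green", "Tucson"),
]

# Build the index: (name, record address) pairs, sorted by name.
# The file itself is not disturbed.
name_index = sorted(
    (name, address)
    for address, (_, name, _) in enumerate(salesperson_file, start=1)
)


def find_by_name(name):
    """Binary-search the index, then follow the pointer into the file."""
    keys = [key for key, _ in name_index]
    pos = bisect_left(keys, name)
    if pos == len(keys) or keys[pos] != name:
        return None  # no index record for this name
    address = name_index[pos][1]
    return salesperson_file[address - 1]
```

Looking up Adams finds the index record (Adams, 3) and then reads relative record location 3 of the file.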

Simple Linear Index

An index built over the City field.

An index can be built over a field with nonunique values.

Index                    Salesperson File
             Record      Record  Salesperson  Salesperson
City         Address     Number  Number       Name         City      ...
Atlanta      5           1       119          Taylor       New York  ...
Dallas       3           2       137          Baker        Detroit   ...
Dallas       4           3       186          Adams        Dallas    ...
Detroit      2           4       204          Dickens      Dallas    ...
Detroit      6           5       255          Lincoln      Atlanta   ...
New York     1           6       361          Carlyle      Detroit   ...
Tucson       7           7       420          Green        Tucson    ...

Figure 2.12 Salesperson file on the right with index built over the City field, on the left.
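
With nonunique values, one lookup can return several record addresses. The sketch below uses the index entries of Figure 2.12; the function name `addresses_for_city` and the range search with `bisect` are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right

# (city, record address) pairs from Figure 2.12, sorted by city.
city_index = [
    ("Atlanta", 5), ("Dallas", 3), ("Dallas", 4), ("Detroit", 2),
    ("Detroit", 6), ("New York", 1), ("Tucson", 7),
]


def addresses_for_city(city):
    """Return every record address whose index entry matches the city."""
    keys = [key for key, _ in city_index]
    lo = bisect_left(keys, city)    # first entry for this city
    hi = bisect_right(keys, city)   # one past the last entry for this city
    return [address for _, address in city_index[lo:hi]]
```

A search for Dallas returns record addresses 3 and 4, a subset of the file rather than a single record.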

Simple Linear Index

An index built over the Salesperson Number field.

Indexed sequential file - the file is stored on the disk in order based on a set of field values (salesperson numbers), and an index is built over that same field.

Index                    Salesperson File
Salesperson  Record      Record  Salesperson  Salesperson
Number       Address     Number  Number       Name         City      ...
119          1           1       119          Taylor       New York  ...
137          2           2       137          Baker        Detroit   ...
186          3           3       186          Adams        Dallas    ...
204          4           4       204          Dickens      Dallas    ...
255          5           5       255          Lincoln      Atlanta   ...
361          6           6       361          Carlyle      Detroit   ...
420          7           7       420          Green        Tucson    ...

Figure 2.13 Salesperson file on the right with index built over the Salesperson Number field, on the left.

Simple Linear Index

Index                    Salesperson File
Salesperson  Record      Record  Salesperson  Salesperson
Name         Address     Number  Number       Name         City      ...
Adams        3           1       119          Taylor       New York  ...
Baker        2           2       137          Baker        Detroit   ...
Carlyle      6           3       186          Adams        Dallas    ...
Dickens      4           4       204          Dickens      Dallas    ...
Green        7           5       255          Lincoln      Atlanta   ...
Lincoln      5           6       361          Carlyle      Detroit   ...
Taylor       1           7       420          Green        Tucson    ...
                         8       452          French       New York  ...

French       8           ?

Figure 2.14 Salesperson file with the insertion of a record for #452 French. But how can you squeeze the index record into the proper sequence?

French 8 would have to be inserted between the index records for Dickens and Green to maintain the crucial alphabetic sequence.

Would have to move all of the index records from Green to Taylor down one record position.

Not a good solution for indexing the records of a file.
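
The cost of keeping a simple linear index in sequence can be made concrete by counting how many index records move. The sketch uses the index entries of Figure 2.14; the helper `insert_entry` is a hypothetical name introduced only for this illustration.

```python
from bisect import bisect_left

# Index entries from Figure 2.14 before French is added.
name_index = [
    ("Adams", 3), ("Baker", 2), ("Carlyle", 6), ("Dickens", 4),
    ("Green", 7), ("Lincoln", 5), ("Taylor", 1),
]


def insert_entry(index, name, address):
    """Insert in sorted position; return how many entries had to move."""
    pos = bisect_left(index, (name, address))
    shifted = len(index) - pos   # every later entry moves down one slot
    index.insert(pos, (name, address))
    return shifted


moved = insert_entry(name_index, "French", 8)
```

Here three index records (Green, Lincoln, Taylor) move to make room for French. In a large file an insertion near the front would move nearly every index record, which is why this approach scales poorly.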

B+-tree Index

The most common data indexing system in use today.

Unlike simple linear indexes, B+-trees are designed to comfortably handle the insertion of new records into the file and to handle record deletion.

An arrangement of special index records in a “tree.”

A single index record, the “root,” at the top, with “branches” leading down from it to other “nodes.”

The lowest level nodes are called “leaves.”

Think of it as a family tree.

Each key value in the tree is associated with a pointer that is the address of either a lower level index record or a cylinder containing the salesperson records.

The index records contain salesperson number key values copied from certain of the salesperson records.

Each index record, at every level of the tree, contains space for the same number of key value/pointer pairs.

Each index record is at least half full.

The tree index is small and can be kept in main memory indefinitely for a frequently accessed file.
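
A search through such a tree can be sketched as a walk from the root down to a cylinder. This is a minimal illustration, not the figure from the text: the key values and cylinder names are hypothetical, and it assumes that each key value is the highest salesperson number reachable through its pointer, which the slides do not state.

```python
# Hypothetical two-level tree: each index record is a list of
# (key value, pointer) pairs. ASSUMPTION: a key value is the highest
# salesperson number reachable through its pointer.
leaf_a = [(140, "cylinder 1"), (200, "cylinder 2")]
leaf_b = [(300, "cylinder 3"), (420, "cylinder 4")]
root = [(200, leaf_a), (420, leaf_b)]


def find_cylinder(index_record, key):
    """Walk from the root to the cylinder that could hold the key."""
    for key_value, pointer in index_record:
        if key <= key_value:
            if isinstance(pointer, list):   # lower-level index record
                return find_cylinder(pointer, key)
            return pointer                  # cylinder holding the records
    return None                             # key is above every key value
```

Under these assumptions, a search for salesperson number 255 passes over key value 200 in the root, follows the pointer at 420 to the second leaf, and stops at key value 300, which points to cylinder 3.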

B+-tree Index

Figure 2.15 is an indexed-sequential file, because the file is stored in sequence by the salesperson numbers and the index is built over the Salesperson Number field.

B+-tree indexes can also be used to index nonkey, nonunique fields.

In general, the storage unit for groups of records can be the cylinder or any other physical device subunit.

Say that a new record with salesperson number 365 must be inserted.

Suppose that cylinder 5 is completely full.

The collection of records on the entire cylinder has to be split between cylinder 5 and an empty reserve cylinder, say cylinder 11.

There is no key value/pointer pair representing cylinder 11 in the tree index.

The index record into which the key for the new cylinder should go happens to be full, so it is split into two index records.

The now five key values and their associated pointers are divided between them.
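
The split of a full index record can be sketched as follows. The capacity of four pairs matches the "now five key values" in the example, but the key values, the cylinder numbers attached to them, and the choice to give the left record the extra pair are all assumptions for illustration.

```python
CAPACITY = 4  # key value/pointer pairs per index record, as in the example


def insert_pair(index_record, pair):
    """Insert a pair in key order; split the record if it overflows.

    Returns a list holding one index record, or two after a split.
    """
    pairs = sorted(index_record + [pair])
    if len(pairs) <= CAPACITY:
        return [pairs]
    middle = (len(pairs) + 1) // 2   # ASSUMPTION: left record takes the extra pair
    return [pairs[:middle], pairs[middle:]]


# Hypothetical key values paired with cylinder numbers.
full_record = [(140, 1), (200, 2), (300, 3), (420, 5)]
result = insert_pair(full_record, (361, 11))
```

After the split, one index record holds three pairs and the other holds two, so both remain at least half full. A complete implementation would also have to add a pair for the new index record to the level above, which this sketch leaves out.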

Indexes

Can be built over any field (unique or nonunique) of a file.

Can also be built on a combination of fields.

In addition to its direct access capability, an index can be used to retrieve the records of a file in logical sequence based on the indexed field.

Many separate indexes into a file can exist simultaneously. The indexes are quite independent of each other.

When a new record is inserted into a file, an existing record is deleted, or an indexed field is updated, all of the affected indexes must be updated.

Page 48: Chapter 2 Simple File Storage and Retrieval Fundamentals of Database Management Systems by Mark L. Gillenson, Ph.D. University of Memphis Presentation

2-2-4848

Hashed FilesHashed Files

The number of records in a file is The number of records in a file is estimated, and enough space is reserved estimated, and enough space is reserved on a disk to hold them.on a disk to hold them.

Additional space is reserved for additional Additional space is reserved for additional overflow records. overflow records.

Hashed Files

To determine where to insert a particular record of the file, the record’s key value is converted by a hashing routine into one of the reserved record locations on the disk.

To find and retrieve the record, the same hashing routine is applied to the key value during the search.

Division-Remainder Method

Divide the key value of the record that we want to insert or retrieve by the number of record locations that we have reserved.

Perform the division, discard the quotient, and use the remainder to tell us where to locate the record.
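
A small sketch of the arithmetic, assuming integer key values and record locations numbered from 0. The function name and the sample key 3291 are hypothetical.

```python
def division_remainder(key, num_locations):
    """Hash an integer key to a location in the range 0 .. num_locations - 1."""
    quotient, remainder = divmod(key, num_locations)
    # The quotient is discarded; only the remainder selects the location.
    return remainder

# With 50 reserved locations: 3291 = 65 * 50 + 41, so the record goes to location 41.
location = division_remainder(3291, 50)
```

The same call is made at retrieval time, so a search for key 3291 goes straight to location 41.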

A Hashed File

Storage area for 50 records plus overflow records.

Collision - more than one key value hashes to the same location. The two key values are called “synonyms.”
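
The sketch below shows one simple way collisions could be handled, under stated assumptions: a main area of 50 single-record locations, the division-remainder method as the hashing routine, unique integer keys, no deletions, and an overflow area that is searched sequentially. The HashedFile class is hypothetical, and real systems often chain synonyms with pointers instead.

```python
class HashedFile:
    """Toy hashed file: a fixed main area plus an overflow area (assumed layout)."""

    def __init__(self, num_locations=50):
        self.num_locations = num_locations
        self.main = [None] * num_locations  # one record slot per hashed location
        self.overflow = []                  # (key, record) pairs that collided

    def _location(self, key):
        return key % self.num_locations     # division-remainder method

    def insert(self, key, record):
        location = self._location(key)
        if self.main[location] is None:
            self.main[location] = (key, record)
        else:
            # Collision: a synonym already occupies this location.
            self.overflow.append((key, record))

    def find(self, key):
        slot = self.main[self._location(key)]
        if slot is None:
            return None                     # nothing ever hashed here
        if slot[0] == key:
            return slot[1]                  # found with a single access
        for other_key, record in self.overflow:  # synonyms cost extra accesses
            if other_key == key:
                return record
        return None
```

Keys 3291 and 1041 both leave remainder 41 when divided by 50, so the second one inserted lands in the overflow area and takes longer to retrieve.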

Hashed Files

Hashing disallows any sequential storage based on a set of field values.

A file can only be hashed once, based on the values of a single field or a single combination of fields.

If a file is hashed on one field, direct access based on another field can be achieved by building an index on the other field.

Hashed Files

Many hashing routines have been developed.

The goal is to minimize the number of collisions, which can slow down retrieval performance.

In practice, several hashing routines are tested on a file to determine the best “fit.”

Even a relatively simple procedure like the division-remainder method can be fine-tuned.
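
One way such a test might look is sketched below: count how many keys hash to an already-used location under each candidate routine. The key set is contrived (every key is a multiple of 10) to make the effect visible, and the choice of a prime divisor is a commonly suggested tuning for the division-remainder method, offered here as an assumption rather than as the book's recommendation.

```python
def count_collisions(keys, hash_routine):
    """Count keys that hash to a location some earlier key already claimed."""
    used = set()
    collisions = 0
    for key in keys:
        location = hash_routine(key)
        if location in used:
            collisions += 1
        else:
            used.add(location)
    return collisions

# Contrived sample: 40 keys, all multiples of 10.
keys = list(range(100, 500, 10))

# Dividing by 50 leaves only the remainders 0, 10, 20, 30, and 40.
collisions_mod_50 = count_collisions(keys, lambda key: key % 50)

# Dividing by the prime 53 spreads these same keys over distinct locations.
collisions_mod_53 = count_collisions(keys, lambda key: key % 53)
```

On this sample the divisor 50 produces 35 collisions and the divisor 53 produces none; a different key distribution could favor a different routine, which is why the candidates are tried against the actual file.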

Hashed Files

A hashed file must occasionally be reorganized after so many collisions have occurred that performance is degraded to an unacceptable level.

A new storage area with a new number of storage locations is chosen, and the process starts all over again.
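
A reorganization could be sketched as follows, assuming the same toy layout as before (a main area of single-record locations plus a sequential overflow area) and the division-remainder method. The function names, the sample keys, and the new size of 23 locations are hypothetical.

```python
def load(pairs, num_locations):
    """Hash (key, record) pairs into a fresh main area; collisions go to overflow."""
    main = [None] * num_locations
    overflow = []
    for key, record in pairs:
        location = key % num_locations
        if main[location] is None:
            main[location] = (key, record)
        else:
            overflow.append((key, record))
    return main, overflow

def reorganize(main, overflow, new_num_locations):
    """Gather every stored record and rehash it into a new storage area."""
    pairs = [slot for slot in main if slot is not None] + overflow
    return load(pairs, new_num_locations)

# Hypothetical file: 12 keys that are multiples of 5, hashed into 10 locations.
pairs = [(key, f"record {key}") for key in range(0, 60, 5)]
main, overflow = load(pairs, 10)                         # most records overflow
new_main, new_overflow = reorganize(main, overflow, 23)  # new area, new size
```

Because every record is read and rewritten, reorganization is expensive, which fits the slide's point that it is done only occasionally.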

“Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.”