YV - Access Methods and Indexes 274: Chapter 7, Physical Design -- Files


Chapter 7

Physical Design -- Files


Physical Storage

The DATA MANAGER is the component of the DBMS responsible for interacting with the physical database. The related concepts are: file system, buffer management, and access methods

Each DBMS has its own data manager, which may employ a computer-system standard file system enhanced with additional mechanisms/facilities

A DBMS involves the following Memory Hierarchy:

Tape (sequential), Disk (direct), Memory, Cache

– Tapes are used for mass storage, disks for persistent database storage, and main memory and cache for processing transactions and DBMS functions


Memory Hierarchy

[Figure: memory hierarchy pyramid. Levels, from top to bottom: Registers, Cache, Main Memory (electronic storage), Online External Storage (nonvolatile electronic or magnetic/optical, block-addressed), Near-line (Archive) Storage (disk jukeboxes or tape robots), Off-line. Dimensions shown: data currency, memory capacity, expense.]


Disks and Files

A DBMS stores information on ("hard") disks. This has major implications for DBMS design!

– READ: transfer data from disk to main memory (RAM).
– WRITE: transfer data from RAM to disk.
– Both are high-cost operations, relative to in-memory operations, so they must be planned carefully!


Why Not Store Everything in Main Memory?

Costs too much. $1000 will buy you either 128MB of RAM or 7.5GB of disk today.

Main memory is volatile. We want data to be saved between runs. (Obviously!)

Typical storage hierarchy:
– Main memory (RAM) for currently used data.
– Disk for the main database (secondary storage).
– Tapes for archiving older versions of the data (tertiary storage).


Physical Storage Media

Disk Storage Devices
– Data is stored as magnetized areas on magnetic disks
– Disk packs have many disks connected to a rotating spindle
– Disks are divided into concentric circular tracks on each surface; track capacities range from 4 to 50 KBytes
– Tracks are divided into blocks (pages), with a fixed size for a specific system. The sizes range from 512 to 4096 bytes
– Whole blocks are transferred between disks and memory
– A physical disk block address consists of: a surface number, a track number (within the surface) and a block number (within the track)
– Reading or writing a disk block is time consuming (because of the seek time and the rotational delay)


Accessing a Disk Page

Time to access (read/write) a disk block:
– seek time (moving arms to position disk head on track)
– rotational delay (waiting for block to rotate under head)
– transfer time (actually moving data to/from disk surface)

Seek time and rotational delay dominate.
– Seek time varies from about 1 to 20 msec
– Rotational delay varies from 0 to 10 msec
– Transfer time is about 1 msec per 4KB page

Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
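To make the cost breakdown concrete, here is a minimal sketch that adds up the three components. The 10 msec seek and 5 msec rotational delay are assumed mid-range values picked from the ranges above, and the function name is hypothetical.

```python
def access_time_ms(seek_ms, rotation_ms, pages, transfer_ms_per_page=1.0):
    """Time to read `pages` contiguous pages after one seek and one rotational delay."""
    return seek_ms + rotation_ms + pages * transfer_ms_per_page

# 100 pages scattered over the disk: every page pays its own seek and rotation.
random_io = 100 * access_time_ms(10.0, 5.0, 1)
# The same 100 pages stored contiguously: one seek and one rotation in total.
sequential_io = access_time_ms(10.0, 5.0, 100)
print(random_io, sequential_io)  # 1600.0 115.0
```

Under these assumptions, placing pages sequentially is one software answer to the question above: it pays the seek and rotational delays once instead of once per page.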


Components of a Disk

[Figure: disk components. Labels: platters, spindle, disk head, arm assembly, arm movement, tracks, sector.]

The platters spin (say, 90 rps).

The arm assembly is moved in or out to position a head on a desired track. The tracks under the heads form a cylinder (imaginary!).

Only one head reads/writes at any one time.

Block size is a multiple of sector size (which is fixed).


Disk Space Management

The lowest layer of DBMS software manages space on disk. Higher levels call upon this layer to:
– allocate/de-allocate a page
– read/write a page

A request for a sequence of pages must be satisfied by allocating the pages sequentially on disk! Higher levels don't need to know how this is done, or how free space is managed.


Basic Definitions

A record is a named collection of data values (items)

A file is a named sequence of fixed sized records stored in a sequence of fixed sized blocks (pages) on the disk

Each file has a file descriptor (file header) with information about the file (item names, data types, etc.)

The blocking factor for a file is the average number of file records stored in a disk block

Each block has a name called its address

File records can be unspanned (no record can span two blocks) or spanned (a record may be stored in more than one block)


File System

FILE SYSTEM: Its primary role is to manage files stored on pages:
– Create a file
– Insert a page
– Modify a page
– Delete a page
– Retrieve a page
– Reorganize a file
– Terminate access to a file, etc.

Seen from the outside, a file system is responsible for the:
– translation from a file name to the absolute address of the file
– translation from a record key to the page address


Memory Management

[Figure: layered DBMS architecture. Labels: DBMS application, transaction programs, database access methods, logging and recovery; set-oriented level (tuple management, associative access); tuple-oriented level (record management); block-oriented level (file manager, buffer manager, archive manager); storage managed: main memory, online external, nearline external; buffer management, file management.]


Buffer Management

A Buffer is a part of the main memory available for the storage of blocks (pages) transferred to/from disks

The BUFFER MANAGER is the subsystem responsible for the allocation of buffer space (transparently to the user)

Typical Operation of the Buffer Manager

Given a user request for a page, the buffer manager:
– checks if the page is in the buffer already,
– if it is, then it passes its address to the user
– if it is not, then it brings it from the disk into the buffer, possibly replacing another page (if no space is available), then passes its address to the user


Buffer Management in a DBMS

Data must be in RAM for DBMS to operate on it! Table of <frame#, pageid> pairs is maintained.

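[Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY, whose frames hold disk pages or are free; pages are read from and written to the DB on DISK. The choice of frame is dictated by the replacement policy.]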


When a Page is Requested ...

If the requested page is not in the pool:
– Choose a frame for replacement
– If the frame is dirty, write it to disk
– Read the requested page into the chosen frame

Pin the page and return its address.

If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several at a time!


More on Buffer Management

The requestor of a page must unpin it, and indicate whether the page has been modified:
– a dirty bit is used for this.

A page in the pool may be requested many times:
– a pin count is used. A page is a candidate for replacement iff pin count = 0.

Concurrency control (CC) & recovery may entail additional I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later.)
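The request, pin, unpin and replace cycle described above can be sketched as follows. This is only an illustration under stated assumptions: the "disk" is a plain dictionary, the replacement policy is LRU, and the class and method names (`BufferPool`, `pin`, `unpin`) are hypothetical rather than taken from any particular DBMS. Logging and concurrency control are ignored.

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk             # stand-in for the disk: page_id -> contents
        self.frames = OrderedDict()  # page_id -> [data, pin_count, dirty], in LRU order
        self.io_count = 0

    def pin(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
            self.io_count += 1                 # one read
        frame = self.frames[page_id]
        frame[1] += 1                          # increment pin count
        return frame

    def unpin(self, page_id, dirty=False):
        frame = self.frames[page_id]
        assert frame[1] > 0, "page is not pinned"
        frame[1] -= 1
        frame[2] = frame[2] or dirty           # remember any modification

    def _evict(self):
        # Least recently used frame with pin count 0 is the victim.
        for victim, (data, pins, dirty) in self.frames.items():
            if pins == 0:
                if dirty:
                    self.disk[victim] = data   # write back before reuse
                    self.io_count += 1
                del self.frames[victim]
                return
        raise RuntimeError("all frames are pinned")

disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(2, disk)
frame = pool.pin(1); frame[0] = "A"; pool.unpin(1, dirty=True)
pool.pin(2); pool.unpin(2)
pool.pin(3); pool.unpin(3)   # pool is full: page 1 is written back and replaced
```

After this run the modified page 1 is back on the simulated disk, and the pool has performed three reads and one write.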


Buffer Replacement Policy

A frame is chosen for replacement by a replacement policy:
– Least-recently-used (LRU), Clock, MRU, etc.

The policy can have a big impact on the number of I/Os; it depends on the access pattern.

Sequential flooding: a nasty situation caused by LRU + repeated sequential scans.
– # buffer frames < # pages in file means each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
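Sequential flooding is easy to reproduce with a small simulation. The sketch below counts page reads for repeated scans of a file under LRU and MRU; the function name and the tiny pool and file sizes are assumptions chosen so the result can be checked by hand.

```python
def count_ios(num_frames, num_pages, scans, policy):
    """Count page reads for `scans` sequential scans under 'LRU' or 'MRU'."""
    pool = []   # page ids, ordered from least to most recently used
    ios = 0
    for _ in range(scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)              # hit: will be re-appended as most recent
            else:
                ios += 1                       # miss: read the page from disk
                if len(pool) == num_frames:
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios

print(count_ios(3, 4, 3, "LRU"))  # 12: every request misses
print(count_ios(3, 4, 3, "MRU"))  # 6
```

With 3 frames and a 4-page file, LRU always evicts exactly the page that the scan will need next, so all 12 requests cause an I/O, while MRU keeps most of the file resident.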


Buffer Management (2)

A buffer manager is very similar to a virtual memory manager (as found in operating systems). However, it is usually much more sophisticated, since it is designed specifically for database systems and can thus predict much better the needs and idiosyncrasies of the database system

NEW ISSUES:
– replacement strategy. The typical strategies in operating systems (e.g., LRU) do not always perform well in databases, where other strategies (e.g., MRU) can be better
– pinned blocks. It is often the case that the DBMS needs to specify that some blocks remain continuously in the buffer (are pinned).
– forced output of blocks. It is usual for the DBMS (e.g., for recovery reasons) to force some blocks to disk prematurely


Buffer Management (3)

A Buffer Manager keeps for each page in the buffer:
– In which disk page it is stored
– Whether it has been modified or not (dirty page)
– Information for the replacement strategy that is used

There are several alternative buffer structure designs:
– The same buffer pool is used for all relations
– A separate buffer pool is used for each relation
– As above, but with relations borrowing buffers from other relations


DBMS vs. OS File System

The OS does disk space & buffer management: why not let the OS manage these tasks?

– Differences in OS support: portability issues
– Some limitations, e.g., files can't span disks.
– Buffer management in a DBMS requires the ability to:
   – pin a page in the buffer pool, force a page to disk (important for implementing concurrency control & recovery),
   – adjust the replacement policy, and pre-fetch pages based on access patterns in typical DB operations.


Access Methods

The access methods are responsible for the following:
– Allocation of file records (tuples) within blocks
– Support of record addressing by address and by value. In essence, converting between references to records and physical blocks on storage devices
– Support of secondary (auxiliary) file structures in order to make record addressing more efficient.

In the sequel, we examine the physical organization of records and blocks and also the basic file organizations


File Management

PHYSICAL ORGANIZATION OF RECORDS / BLOCKS

Key Issues:
– Formatting fields within a record
– Formatting records within a block
– Assigning records to blocks

Formatting fields within a record
– Fixed length fields stored in a specific order

[Figure: record with fields F1 F2 F3 F4 F5 of lengths L1 L2 L3 L4 L5, starting at base address B]

The address of Fi is:

$B + \sum_{k=1}^{i-1} L_k$
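As a quick worked instance of this address calculation, the sketch below computes the address of each field from the base address and the field lengths. The lengths and the base address 1000 are made-up values for illustration.

```python
def field_address(base, lengths, i):
    """Address of field F_i (1-based): base plus the lengths of fields 1..i-1."""
    return base + sum(lengths[: i - 1])

lengths = [4, 20, 8, 2, 10]   # hypothetical L1..L5, in bytes
print([field_address(1000, lengths, i) for i in range(1, 6)])
# [1000, 1004, 1024, 1032, 1034]
```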


Formatting Fields

– Fixed length fields stored as an indexed heap
--- Fields need not be stored in order
--- There is exactly one pointer in the header for each field (whether it is present or not)

[Figure: record header with pointers to the fields, which are stored in the order F5 F3 F1 F4]


Formatting Fields (2)

– Variable length fields delimited by special symbols

[Figure: fields F1 F2 F3 F4 F5 separated by the delimiter $]

– Variable length fields delimited by length

[Figure: fields F1 F2 F3 F4 F5 stored together with their lengths L1 L2 L3 L4 L5]
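The two encodings for variable length fields can be sketched as below. The details are assumptions made for illustration: the delimiter is the byte `$`, and each length is stored as a 2-byte little-endian integer immediately before its field (the slide does not fix where the lengths are kept).

```python
import struct

def encode_delimited(fields, delim=b"$"):
    # Only safe if the delimiter never occurs inside a field value.
    return delim.join(fields)

def decode_delimited(record, delim=b"$"):
    return record.split(delim)

def encode_length_prefixed(fields):
    # Each field is preceded by its length as an unsigned 16-bit integer.
    return b"".join(struct.pack("<H", len(f)) + f for f in fields)

def decode_length_prefixed(record):
    fields, pos = [], 0
    while pos < len(record):
        (n,) = struct.unpack_from("<H", record, pos)
        fields.append(record[pos + 2 : pos + 2 + n])
        pos += 2 + n
    return fields

fields = [b"Smith", b"", b"CS", b"3.7"]
assert decode_delimited(encode_delimited(fields)) == fields
assert decode_length_prefixed(encode_length_prefixed(fields)) == fields
```

The length-based format costs a few bytes per field but places no restriction on the field contents, whereas a value containing `$` would be split incorrectly by the delimiter-based format.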


Formatting Records

Formatting records within a block
– Records stored contiguously within the block (fixed packed)

[Figure: block starting at base address B that holds records 1, 2, ..., N, each of length L]

A record is located by a simple address calculation

Ri = B + (i-1)*L


Formatting Records (2)

– The above structure is highly inflexible, introducing several inefficiencies
* records may span blocks (happens very often and is costly)
* insertion and deletion become complicated

[Figure: records 1 to 5 laid out across a block boundary; a packed block of records 1, 2, ..., N with one record marked "Delete this record"]



– A block header contains an array of pointers pointing to the records within the block (indexed heap)

[Figure: block layout with a descriptor, Next Primary and Next Overflow pointers, and a pointer array that grows as records are added]

A record is located by providing its block number and its index in the pointer array in the block header. This combination (block number, index) is called a TID (tuple identifier)

Insertion and deletion are easy: they are accomplished by manipulating the pointer array

The block may be reorganized without affecting external pointers (pointing to records). That is, records retain their TID even if they are moved around within the block.
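The following sketch illustrates why TIDs survive reorganization. It models a block as a pointer (slot) array plus a record area; the class `Block` and its methods are hypothetical names, and a Python list stands in for the byte area of a real block.

```python
class Block:
    def __init__(self, block_no):
        self.block_no = block_no
        self.slots = []   # slot index -> offset into self.heap, or None if deleted
        self.heap = []    # stand-in for the record area of the block

    def insert(self, record):
        self.heap.append(record)
        offset = len(self.heap) - 1
        for i, slot in enumerate(self.slots):
            if slot is None:                 # reuse a freed slot
                self.slots[i] = offset
                return (self.block_no, i)
        self.slots.append(offset)
        return (self.block_no, len(self.slots) - 1)   # the TID

    def get(self, tid):
        offset = self.slots[tid[1]]
        return None if offset is None else self.heap[offset]

    def delete(self, tid):
        self.heap[self.slots[tid[1]]] = None
        self.slots[tid[1]] = None

    def compact(self):
        """Move live records together; only the slot array changes, not the TIDs."""
        new_heap = []
        for i, offset in enumerate(self.slots):
            if offset is not None:
                new_heap.append(self.heap[offset])
                self.slots[i] = len(new_heap) - 1
        self.heap = new_heap

b = Block(7)
t1, t2, t3 = b.insert("r1"), b.insert("r2"), b.insert("r3")
b.delete(t2)
b.compact()                      # "r3" moves within the block
assert b.get(t3) == "r3"         # but its TID (7, 2) is still valid
```

External structures such as indexes only ever store the TID, so the block is free to move records around internally.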


Assigning Records to Blocks - FILE ORGANIZATION

Assigning records to blocks
-- Arbitrary placement
-- Keyed placement
-- Keyed placement by sorting

Arbitrary placement: Records are assigned to blocks arbitrarily, usually according to the order of insertion (this is called a HEAP or PILE)
– simplest file organization strategy
– uses as many blocks as necessary and links the blocks together
– provides no help for retrieval whatsoever (linear search)


Unordered (Heap) Files

The simplest file structure contains records in no particular order.

As the file grows and shrinks, disk pages are allocated and de-allocated.

To support record level operations, we must:
– keep track of the pages in a file
– keep track of free space on pages
– keep track of the records on a page

There are many alternatives for keeping track of this.


Heap File Implemented as a List

The header page id and Heap file name must be stored someplace. Each page contains 2 "pointers" plus data.

[Figure: a header page linked to two lists of data pages: the full pages and the pages with free space]


Heap File Using a Page Directory

The entry for a page can include the number of free bytes on the page.

The directory is a collection of pages; a linked list implementation is just one alternative.
– Much smaller than a linked list of all HF pages!

[Figure: a DIRECTORY, starting at the header page, whose entries point to Data Page 1, Data Page 2, ..., Data Page N]


Indexes

A Heap file allows us to retrieve records:
– by specifying the rid, or
– by scanning all records sequentially

Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g.,
– Find all students in the "CS" department
– Find all students with a gpa > 3

Indexes are file structures that enable us to answer such value-based queries efficiently.


System Catalogs

For each index:
– structure (e.g., B+ tree) and search key fields

For each relation:
– name, file name, file structure (e.g., Heap file)
– attribute name and type, for each attribute
– index name, for each index
– integrity constraints

For each view:
– view name and definition

Plus statistics, authorization, buffer pool size, etc.

Catalogs are themselves stored as relations!


File Organization - Basics

Keyed placement: Records are assigned to blocks according to the values of some key fields. They can then be retrieved with associative access
– The supporting structure implementing the mapping of records with specific values in the key fields to blocks is called an INDEX
– It facilitates the execution of retrievals since, to a large extent, only relevant records are retrieved.
– Updates (insertions and deletions) become more expensive, because of the requirement to maintain the index
– Three major index structures:
(a) ISAM
(b) HASHING
(c) B-Trees


File Organization - Basics

Keyed placement by sorting: Sort the file on the key field(s) and store it in that order (SEQUENTIAL FILE)
– It is a special case of the general keyed placement, with the distinguishing characteristic that there is no index to be supported
– Retrievals are performed employing binary search
– Advantages:
» faster selection than non-keyed
» good for range queries (e.g., salary between 25 and 35 K)
» efficient joins (applying merge-scan)
– Disadvantages:
» Slower equality selection than other keyed index structures
» Updates are extremely expensive (and complex)


Hashing

The magic of folding and hashing

[Figure: the range of potential key values (shaded areas denote used key values) is mapped by FOLDING to a range of positive integers, which is mapped by HASHING to the record address space]


Hashing Essentials

Key values usually come from very large domains (e.g., character strings of a certain length).

First, they have to be converted into a numerical representation: FOLDING

Then, the numerical value is transformed into a valid address from the address space: HASHING

Factors that are important: domain values must be evenly distributed, utilization in the address space must be high, records must be evenly spread across the available space, ...

Hashing is generally good for exact queries, but very inadequate for range queries.


Hashing Mechanics

The file blocks are divided into a number of equal-sized buckets

Typically, a bucket corresponds to one disk block (or a fixed number of blocks)

One (or more) of the file fields is (are) selected to be the hash key(s)

A hashing function h is constructed as follows:

$h : V \rightarrow \{0, 1, 2, \ldots, B-1\}$

where: V is the domain of field values
B is the number of buckets in the address space

(Note: Folding is required as an intermediate operation)


Hashing Examples

Example
– Assume that V is the domain for EmployeeNumber (a 9-digit number standing for the SSN), and B = 1000.
– We create a hash function $h : V \rightarrow \{0, 1, 2, \ldots, B-1\}$ as: if v belongs to V, h(v) = last 3 digits of v = v MOD 1000

Hashing functions can be constructed easily, with the main criterion being: uniform distribution of records in buckets (otherwise, search gets very expensive)

Typical hash functions: Congruential (division remainder), Nth power, base transformation, polynomial division, encryption, etc.
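The folding and hashing steps can be sketched in a few lines. The folding rule used here (reading the bytes of a string as one big integer) is just one possible choice and is an assumption, as are the function names; the hash itself is the congruential (MOD) function from the example.

```python
def fold(key: str) -> int:
    """Fold a string into a non-negative integer (its bytes read as a base-256 number)."""
    return int.from_bytes(key.encode("utf-8"), "big")

def h(value, num_buckets):
    """Congruential hash: bucket = (folded) value MOD B."""
    n = value if isinstance(value, int) else fold(value)
    return n % num_buckets

print(h(123456789, 1000))  # 789, the last 3 digits of the employee number
print(h("AB", 1000))       # fold("AB") = 16706, so the bucket is 706
```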


Hashing Functions - Overflow

Perhaps the most popular and most heavily used hash function is congruence (MOD).

Basically, we divide the field value (after folding) by B and we interpret the remainder as the bucket value.

Example

Use the function h(v) = v MOD 3 to index Salary

Salaries: Shirley 22, Maria 25, Dan 29, Tom 30, Jill 36, Ron 43, Bart 61

h(Salary)   Records
0           Tom 30, Jill 36
1           Shirley 22, Maria 25
2           Dan 29
OVERFLOW    Ron 43, Bart 61


Hashing --- Overflow

The example above demonstrates the phenomenon of collisions, which occur when a new record hashes to a bucket which is already FULL

An overflow area is kept for storing such records

Overflows can occur because of:
– Heavy loading of the file
– Poor hashing function (does not distribute the field values uniformly)
– Statistical peculiarities (too many values hash to the same bucket)


Hashing --- Overflow (2)

Overflows are usually handled in one of the following ways:

– Chaining: if a bucket h(v) is full, chain an empty block to the bucket to expand it

– Open Addressing: if h(v) is full, store the record in h(v)+1. If it is also full, store it in h(v)+2, etc.

– Double-hashing: Employ 2 hashing functions (h and h'). If h(v) is full, try h'(v). If h'(v) is also full, try any of the above mentioned schemes (including a third hash function)
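Chaining and open addressing can be sketched on the Salary example with h(v) = v MOD 3. Two details are assumptions made for this illustration: each bucket holds 2 records, and open addressing wraps around from the last bucket to bucket 0. The function names are hypothetical.

```python
def insert_chaining(buckets, overflow, key, capacity, h):
    b = h(key)
    if len(buckets[b]) < capacity:
        buckets[b].append(key)
    else:
        overflow[b].append(key)      # goes to the block(s) chained to bucket b

def insert_open_addressing(buckets, key, capacity, h):
    start = h(key)
    for step in range(len(buckets)):             # try h(v), h(v)+1, h(v)+2, ...
        b = (start + step) % len(buckets)        # wrap-around is an assumption
        if len(buckets[b]) < capacity:
            buckets[b].append(key)
            return b
    raise RuntimeError("all buckets are full")

salaries = [22, 25, 29, 30, 36, 43, 61]
h = lambda v: v % 3

buckets, overflow = [[] for _ in range(3)], [[] for _ in range(3)]
for s in salaries:
    insert_chaining(buckets, overflow, s, 2, h)
print(buckets, overflow)   # [[30, 36], [22, 25], [29]] [[], [43, 61], []]

open_buckets = [[] for _ in range(3)]
for s in salaries[:6]:     # 3 buckets x 2 records can hold only six salaries
    insert_open_addressing(open_buckets, s, 2, h)
print(open_buckets)        # [[30, 36], [22, 25], [29, 43]]
```

With chaining, 43 and 61 end up in the overflow chain of bucket 1; with open addressing, 43 spills into bucket 2, which later makes lookups for keys that truly hash to bucket 2 more expensive.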


Performance of Hashing

The performance of a hashing scheme depends on the value of the loading factor L, defined as:

$L = \dfrac{\text{number of records in the file}}{B \times S}$

where: B is the number of buckets
S is the number of records per bucket

Practical Hint: For loading factors of about 0.9 and with a well-chosen hashing function, expect about 1.2 probes on the average to retrieve a record with a given key value.

Rule of thumb: When the loading factor becomes too high, a typical tactic is to double B and rehash
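A small numeric sketch of the loading factor and the doubling rule follows. The threshold of 0.9 for "too high" is an assumption borrowed from the practical hint, and the function names are hypothetical.

```python
def loading_factor(num_records, num_buckets, records_per_bucket):
    return num_records / (num_buckets * records_per_bucket)

def buckets_after_rehash(num_records, num_buckets, records_per_bucket, max_load=0.9):
    """Double B until the loading factor is at most max_load."""
    while loading_factor(num_records, num_buckets, records_per_bucket) > max_load:
        num_buckets *= 2
    return num_buckets

print(loading_factor(9000, 1000, 10))        # 0.9
print(buckets_after_rehash(9500, 1000, 10))  # 2000, giving L = 0.475
```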


Static Hashing Limitations

The main disadvantage of static hashing is the fixed number of buckets (while the number of records dynamically changes) -- this brings in OVERFLOW

Various dynamic extensions of hashing have been devised:
– Extendible Hashing
– Linear Hashing

Dynamic hashing techniques avoid having long overflow chains in each bucket

They achieve this by dynamically changing the number of buckets and / or the hashing function


Extendible Hashing

In extendible hashing, the number of buckets increases and decreases as the file (relation) expands or shrinks

The hashing function h is chosen such that its range is a very large set of integers (typically, B is $2^b$, where $b = 32$)

Not all bits of the hash value are used. In particular:
– At any point, the d most significant bits are used, $0 \le d \le b$
– The d-bit number is used as an index to a directory (array) that contains a pointer to the appropriate bucket. The directory is said to have (global) depth d
– The directory is stored on disk and expands or shrinks dynamically


Extendible Hashing (2)

– Several neighboring entries in the directory (their number is always a power of 2) may point to the same bucket. Consecutive entries, taken in pairs (1-2, 3-4, 5-6, etc.), are called buddies

– If $2^k$ entries point to a bucket, the bucket's local depth d', which is stored in the bucket's header, is equal to

d' = d - k

– The hash values of the keys stored in the same bucket agree on their d' most significant bits.

– Extendible hashing does not require an overflow area



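Example: Index on Salary (extendible hashing)

[Figure: directory with global depth d = 2 and entries 00, 01, 10, 11, indexed by h(Salary), pointing to three buckets with local depths d' of 1, 2 and 2. Bucket contents: {Rudy 38}; {Hart 32, Jill 40, Taft 33}; {Bill 31, Shirl 35, Larry 31}.]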



Overflows are handled with the following algorithm:

Assume that a bucket is about to overflow (due to the insertion of a new tuple):

– If the overflowing bucket has d' < d,
~ Split the bucket into 2
~ Make the pointer of the buddy entry in the directory point to the new bucket
~ Rehash all the keys of the bucket
~ Increase the local depth (d') of the bucket by 1. This will also be the value of the local depth for the new bucket

– If the overflowing bucket has d' = d,
~ DOUBLE the size of the directory
~ Every entry indexed by a (d+1)-bit number points to the bucket where the entry indexed by its first d bits pointed before
~ Increase the depth d by 1
~ Proceed as before by splitting the overflowing bucket

When a lot of keys are deleted, buddies may be merged. This may result in cutting the size of the directory in half.
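The insertion algorithm above can be sketched as follows. This is an illustrative in-memory version under stated assumptions: hash values are `bits` wide (a small value instead of 32, so the directory is easy to inspect), the default hash function is the identity on integer keys, and merging of buddies on deletion is left out. All class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # d'
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity, bits=8, hash_fn=lambda k: k):
        self.capacity = capacity         # records per bucket
        self.bits = bits                 # b: width of a hash value in bits
        self.hash_fn = hash_fn
        self.depth = 0                   # global depth d
        self.directory = [Bucket(0)]     # always 2**d entries

    def _index(self, key):
        # Use the d most significant bits of the b-bit hash value.
        hv = self.hash_fn(key) % (1 << self.bits)
        return hv >> (self.bits - self.depth)

    def lookup(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) >= self.capacity:
            if bucket.local_depth == self.depth:
                if self.depth == self.bits:
                    raise RuntimeError("cannot split further: too many equal hash values")
                # Double the directory: entries 2i and 2i+1 inherit old entry i.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.depth += 1
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        bucket.local_depth += 1
        new = Bucket(bucket.local_depth)
        # Entries pointing to `bucket` are contiguous; the upper half moves to `new`.
        run = [i for i, b in enumerate(self.directory) if b is bucket]
        for i in run[len(run) // 2:]:
            self.directory[i] = new
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:               # rehash the keys of the split bucket
            self.directory[self._index(k)].keys.append(k)

table = ExtendibleHash(capacity=2, bits=4)
for key in [0b0001, 0b1001, 0b1100, 0b0100, 0b0110]:
    table.insert(key)
print(table.depth, [b.keys for b in table.directory])
# 2 [[1], [4, 6], [9, 12], [9, 12]]
```

In the final state, directory entries 10 and 11 share one bucket with local depth 1, while entries 00 and 01 each have their own bucket with local depth 2. The `RuntimeError` guards against the endless splitting that would otherwise occur when more keys than fit in one bucket share the same hash value.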


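Extendible Hashing: Overflow

Example: Insert tuples <Peter, 37> and <Nat, 43>

[Figure: the directory after the insertions, with global depth d = 3 and entries 000 to 111 indexed by h(Salary), pointing to five buckets with local depths d' of 2, 2, 2, 3 and 3. Bucket contents: {Hart 32, Jill 40}; {Taft 33, Peter 37}; {Rudy 38}; {Bill 31, Larry 31}; {Shirl 35, Nat 43}.]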


Extendible Hashing - Summary

ADVANTAGES
– Performance of retrievals is constant as the relation grows

DISADVANTAGES
– Updates are fairly expensive, especially when the directory doubles
– There is much space overhead for the directory
– If the directory grows very big and does not fit in main memory, retrievals need two I/O operations
– If a bucket overflows with tuples having the same key values, extendible hashing will be splitting this bucket forever!!!
