17
Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files • RAID

Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Embed Size (px)

Citation preview

Page 1: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Today

• Review of Directory of Slot Block Organizations• Heap Files• Program 1 Hints• Ordered Files & Hash Files• RAID

Page 2: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Directory of Slots Example

Page 3: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Heap Files

• Heap files are stored as unordered records– the use of “heap” here is unrelated to the “free store”

used for dynamic memory allocation

• The simplest organization

Page 4: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Heap File Example

Paradise, Sal 231

Favor, Sue 123

Mach, Chris 401

Rodgers, Bill 616

Smith, Mary

Yost, Ned 819

Alm, Louis

Link, Steve

Patch, Linda

Jones, Jim 983

281

93

14

183 Ming, Yao

Turing, Alan 933

709

block 1 of file block 2 of file block N of file

Name ID Name ID Name ID

assuming N data blocks, R records per block, I/O cost of D and “record processing time” of C what are the costs of:

2D+C2D+C

N(D+RC)N(D+RC)/2

N(D+RC)

inserting a record, deleting record given RID (ignore reclaiming space)scan, search for “key” w/ equality selection,search w/ range selection?

Page 5: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Prog 1 Hints

• check slot numbers to make sure they are valid• sizeof(), memcpy(), memmove()

– man is your friend

• test patterns – make sure you handle error cases correctly

• error reporting

Page 6: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

class HFPage {

struct slot_t { short offset; short length; }; // equals EMPTY_SLOT if slot is not in use

static const int DPFIXED = sizeof(slot_t) + 4 * sizeof(short)+ 3 * sizeof(PageId);

short slotCnt; // number of slots in use short usedPtr; // offset of first used byte in data[] short freeSpace; // number of bytes free in data[]

short type; // an arbitrary value used by subclasses as needed

PageId prevPage; // backward pointer to data page PageId nextPage; // forward pointer to data page PageId curPage; // page number of this page

slot_t slot[1]; // first element of slot array.

char data[MAX_SPACE - DPFIXED];

// methods ...

Page 7: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

// **********************************************************// page class constructor

void HFPage::init(PageId pageNo){ nextPage = prevPage = INVALID_PAGE;

slotCnt = 0; // no slots in use

curPage = pageNo;

usedPtr = sizeof(data); // offset of used space in data array

freeSpace = sizeof(data) + sizeof(slot_t); // amount of space available // (initially one unused slot)}

Page 8: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

• init()• getNextPage(), setNextPage()• getPrevPage(), setPrevPage()• insertRecord(), deleteRecord()• firstRecord(), nextRecord()• getRecord(), returnRecord()• available_space()• empty()

Page 9: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

int HFPage::available_space(void){

// look for an empty slot. if one exists, then freeSpace // bytes are available to hold a record.

int i; for (i=0; i < slotCnt; i++) { if (slot[i].length == EMPTY_SLOT) return freeSpace; } // no empty slot exists. must reserve sizeof(slot_t) bytes // from freeSpace to hold new slot.

return freeSpace - sizeof(slot_t);}

Page 10: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Ordered Files

• Also called a sequential file.• File records are kept sorted by the values of an ordering field.• Insertion is expensive: records must be inserted in the correct order.

– It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.

• A binary search can be used to search for a record on its ordering field value.– This requires reading and searching log2 of the file blocks on the

average, an improvement over linear search.

• Reading the records in order of the ordering field is quite efficient.

Page 11: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

File of Ordered Records

Page 12: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Hashed Files

• Hashing for disk files is called External Hashing• The file blocks are divided into M equal-sized buckets, numbered

bucket0, bucket1, ..., bucketM-1.– Typically, a bucket corresponds to one (or a fixed number of) disk block.

• One of the file fields is designated to be the hash key of the file.• The record with hash key value K is stored in bucket i, where i=h(K),

and h is the hashing function.• Search is very efficient on the hash key.• Collisions occur when a new record hashes to a bucket that is

already full.– An overflow file is kept for storing such records.– Overflow records that hash to each bucket can be linked together.

Page 13: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Hashed Files (contd.)

Page 14: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Hashed Files (contd.)

• There are numerous methods for collision resolution, including the following:– Open addressing: Proceeding from the occupied position specified by

the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.

– Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.

– Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.

Page 15: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Hashed Files (contd.)

• To reduce overflow records, a hash file is typically kept 70-80% full.

• The hash function h should distribute the records uniformly among the buckets– Otherwise, search time will be increased because

many overflow records will exist.

• Main disadvantages of static external hashing:– Fixed number of buckets M is a problem if the number

of records in the file grows or shrinks.– Ordered access on the hash key is quite inefficient

(requires sorting the records).

Page 16: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Hashed Files - Overflow handling

Page 17: Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID

Fill in This Table

Heap

Sorted

Hashed

scan equality range insert delete search search