CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees


Alon Halevy

Spring Quarter 2001

Extendible Hashing

• Hashing technique for huge data sets
  – optimizes to reduce disk accesses
  – each hash bucket fits on one disk block
  – better than B-Trees if order is not important

• Table contains
  – buckets, each fitting in one disk block, with the data
  – a directory that fits in one disk block used to hash to the correct bucket

Extendible Hash Table

• Directory contains entries labeled by k bits plus a pointer to the bucket with all keys starting with its bits
• Each block contains keys+data matching on the first j ≤ k bits

directory for k = 3: 000 001 010 011 100 101 110 111

buckets (number of matching bits j in parentheses):
(2) 00001 00011 00100 00110
(2) 01001 01011 01100
(3) 10001 10011
(3) 10101 10110 10111
(2) 11001 11011 11100 11110
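A lookup in this structure can be sketched as follows. This is a minimal illustration, assuming keys are 5-bit strings as in the figure; the names `K`, `directory`, and `find` are made up for the example and do not come from the lecture.

```python
# Sketch of an extendible-hash lookup; all names are illustrative assumptions.
K = 3  # the directory is indexed by the first K bits of a key

# Buckets from the figure, stored as (bits matched j, keys).
b00 = (2, ["00001", "00011", "00100", "00110"])
b01 = (2, ["01001", "01011", "01100"])
b100 = (3, ["10001", "10011"])
b101 = (3, ["10101", "10110", "10111"])
b11 = (2, ["11001", "11011", "11100", "11110"])

# A bucket matching j < K bits is shared by 2**(K - j) directory entries.
directory = {
    "000": b00, "001": b00,
    "010": b01, "011": b01,
    "100": b100,
    "101": b101,
    "110": b11, "111": b11,
}

def find(key: str) -> bool:
    """One directory access plus one bucket access."""
    _depth, keys = directory[key[:K]]
    return key in keys

print(find("10110"))
print(find("11000"))
```

Entries 110 and 111 point at the same bucket because that bucket only distinguishes keys on their first two bits.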

Inserting (easy case)

directory: 000 001 010 011 100 101 110 111

(2) 00001 00011 00100 00110
(2) 01001 01011 01100
(3) 10001 10011
(3) 10101 10110 10111
(2) 11001 11100 11110

insert(11011)

directory: 000 001 010 011 100 101 110 111

(2) 00001 00011 00100 00110
(2) 01001 01011 01100
(3) 10001 10011
(3) 10101 10110 10111
(2) 11001 11011 11100 11110

Splitting

directory: 000 001 010 011 100 101 110 111

(2) 00001 00011 00100 00110
(2) 01001 01011 01100
(3) 10001 10011
(3) 10101 10110 10111
(2) 11001 11011 11100 11110

insert(11000)

directory: 000 001 010 011 100 101 110 111

(2) 00001 00011 00100 00110
(2) 01001 01011 01100
(3) 10001 10011
(3) 10101 10110 10111
(3) 11000 11001 11011
(3) 11100 11110

Rehashing

directory: 00 01 10 11

(2) 01101
(2) 10000 10001 10011 10111
(2) 11001 11110

insert(10010)

No room to insert and no adoption!

Expand directory:

000 001 010 011 100 101 110 111

Now, it’s just a normal split.
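The three insertion cases on these slides (room in the bucket, bucket split, directory expansion) can be sketched together. This is an illustrative in-memory sketch, not the lecture's implementation: the class names, the bucket capacity of 4 (taken from the fullest buckets in the figures), and the use of bit strings as keys are all assumptions. It also assumes keys are distinct and long enough that splitting always separates them eventually.

```python
class Bucket:
    """One disk block: keys that agree on their first `depth` bits."""
    def __init__(self, depth):
        self.depth = depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity   # keys per bucket (assumed block size)
        self.k = 0                 # directory is indexed by the first k bits
        self.dir = [Bucket(0)]     # always 2**k entries

    def _index(self, key):
        return int(key[:self.k], 2) if self.k else 0

    def find(self, key):
        return key in self.dir[self._index(key)].keys

    def insert(self, key):
        while True:
            b = self.dir[self._index(key)]
            if key in b.keys:
                return                       # duplicates are ignored
            if len(b.keys) < self.capacity:
                b.keys.append(key)           # easy case: room in the bucket
                return
            if b.depth == self.k:
                # Bucket already uses all k bits: double the directory.
                self.dir = [e for e in self.dir for _ in range(2)]
                self.k += 1
            self._split(b)                   # normal split, then retry

    def _split(self, b):
        d = b.depth + 1                      # children match one more bit
        children = (Bucket(d), Bucket(d))
        for key in b.keys:
            children[int(key[d - 1])].keys.append(key)
        for i, entry in enumerate(self.dir):
            if entry is b:
                bit = (i >> (self.k - d)) & 1   # d-th leading bit of index i
                self.dir[i] = children[bit]

table = ExtendibleHash()
for key in ["01101", "10000", "10001", "10011", "10111", "11001", "11110"]:
    table.insert(key)
table.insert("10010")   # the bucket for prefix 10 is full and uses all k bits
print(table.k, sorted(table.dir[0b100].keys))
```

Doubling the directory only copies pointers; the only bucket that is rewritten is the one that overflowed.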

Rehash of Hashing

• Hashing is a great data structure for storing unordered data that supports insert, delete, and find
• Both separate chaining (open) and open addressing (closed) hashing are useful
  – separate chaining flexible
  – closed hashing uses less storage, but performs badly with load factors near 1
  – extendible hashing for very large disk-based data

• Hashing pros and cons
  + very fast
  + simple to implement, supports insert, delete, find
  - lazy deletion necessary in open addressing, can waste storage
  - does not support operations dependent on order: min, max, range
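To illustrate why open addressing needs lazy deletion, here is a small sketch assuming linear probing and a fixed-size table. The class and sentinel names are hypothetical. Clearing a slot outright would cut the probe sequence for keys stored past it, so a deleted slot is marked with a tombstone instead.

```python
EMPTY, DELETED = object(), object()  # sentinels: never-used slot, tombstone

class LinearProbingSet:
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def find(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False          # a never-used slot ends the search
            if slot is not DELETED and slot == key:
                return True
        return False

    def insert(self, key):
        if self.find(key):
            return
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key   # tombstones can be reused
                return
        raise RuntimeError("table full")

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot == key:
                self.slots[i] = DELETED   # lazy deletion: leave a tombstone
                return

s = LinearProbingSet(8)
for key in (1, 9, 17):      # all three collide at slot 1
    s.insert(key)
s.delete(9)
print(s.find(17))           # still reachable past the tombstone
```

Tombstones keep lookups correct but occupy slots until they are reused, which is the wasted storage the slide mentions.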

Recall: AVL Tree Dictionary Data Structure

[Figure: example AVL tree]

• Binary search tree properties
  – binary tree property
  – search tree property

• Balance property
  – balance of every node is: -1 ≤ b ≤ 1
  – result: depth is Θ(log n)
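The two properties can be checked in one pass over a tree. The sketch below is an illustration, assuming the balance b of a node is the height difference between its subtrees and that an empty tree has height -1; `Node` and `avl_height` are hypothetical names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def avl_height(node, lo=float("-inf"), hi=float("inf")):
    """Return the height of the subtree if it is a valid AVL tree, else None."""
    if node is None:
        return -1
    if not (lo < node.key < hi):
        return None                      # search tree property violated
    hl = avl_height(node.left, lo, node.key)
    hr = avl_height(node.right, node.key, hi)
    if hl is None or hr is None or abs(hl - hr) > 1:
        return None                      # balance property violated
    return 1 + max(hl, hr)

balanced = Node(10, Node(6), Node(12, None, Node(14)))
chain = Node(1, None, Node(2, None, Node(3)))
print(avl_height(balanced), avl_height(chain))
```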

Splay Trees

“blind” rebalancing – no height info kept
• amortized time for all operations is O(log n)
• worst case time is O(n)
• insert/find always rotates node to the root!
  – Good locality – most common keys move high in tree

You’re forced to make a really deep access:

Since you’re down there anyway, fix up a lot of deep nodes!

Splay Operations: Find

• Find(x)
  1. do a normal BST search to find n such that n->key = x
  2. move n to root by series of zig-zag and zig-zig rotations, followed by a final zig if necessary
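The two steps of Find can be sketched bottom-up with parent pointers. This is one possible sketch, not the lecture's code: the names are hypothetical, and splaying the last node visited when x is absent is an assumption the slide does not state. `bst_insert` and `inorder` exist only to build and inspect a test tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(n):
    """Single rotation that lifts n above its parent."""
    p, g = n.parent, n.parent.parent
    if p.left is n:
        p.left, n.right = n.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, n.left = n.left, p
        if p.right:
            p.right.parent = p
    p.parent, n.parent = n, g
    if g is not None:
        if g.left is p:
            g.left = n
        else:
            g.right = n

def splay(n):
    while n.parent is not None:
        p, g = n.parent, n.parent.parent
        if g is None:
            rotate_up(n)                       # zig
        elif (g.left is p) == (p.left is n):
            rotate_up(p); rotate_up(n)         # zig-zig: parent first
        else:
            rotate_up(n); rotate_up(n)         # zig-zag: double rotation
    return n

def find(root, x):
    """Return (new_root, found)."""
    n, last = root, None
    while n is not None and n.key != x:
        last, n = n, (n.left if x < n.key else n.right)
    target = n if n is not None else last      # assumption: splay last node on a miss
    return (splay(target) if target else None), n is not None

def bst_insert(root, key):
    """Plain BST insert without splaying (test helper)."""
    new, n = Node(key), root
    if root is None:
        return new
    while True:
        side = "left" if key < n.key else "right"
        if getattr(n, side) is None:
            setattr(n, side, new)
            new.parent = n
            return root
        n = getattr(n, side)

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

root = None
for key in (1, 2, 3, 4, 5):                    # builds a right-leaning chain
    root = bst_insert(root, key)
root, found = find(root, 5)
print(root.key, found, inorder(root))
```

Note that zig-zig rotates the parent before the node itself; rotating the node twice instead would move it to the root without shortening the rest of the access path.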

Zig-Zag*

[Figure: zig-zag rotation; labels: Helped, Unchanged]

*This is just a double rotation

Zig-Zig

[Figure: zig-zig rotation; labels: root, root]

Why Splaying Helps

• Node n and its children are always helped (raised)
• Except for final zig, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree!

• Result:
  – shallow (zig) nodes may increase depth by one or two
  – helped nodes may decrease depth by a large amount

• If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay
  – Exceptions are the root, the child of the root, and the node splayed

Locality

• Assume m ≥ n accesses in a tree of size n
  – Total amortized time O(m log n)
  – O(log n) per access on average

• Gets better when you only access k distinct items in the m accesses.
  – Exercise.

Splaying Example

Find(6)

zig-zig

Still Splaying 6

zig-zig

Almost There, Stay on Target

Splay Again

Find(4)

zig-zag

Example Splayed Out

zig-zag

Splay Tree Summary

• All operations are in amortized O(log n) time
• Splaying can be done top-down; better because:
  – only one pass
  – no recursion or parent pointers necessary

• Invented by Sleator and Tarjan (1985), now widely used in place of AVL trees

• Splay trees are very effective search trees
  – relatively simple
  – no extra fields required
  – excellent locality properties: frequently accessed keys are cheap to
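The top-down variant mentioned in this summary can be sketched as below. It follows the commonly published top-down formulation, which may differ in detail from what the lecture presented; the names are hypothetical. It makes a single pass from the root and uses neither recursion nor parent pointers.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def splay(t, key):
    """Move key (or the last node on its search path) to the root."""
    if t is None:
        return None
    header = Node(None)          # collects the left and right partial trees
    l = r = header
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:             # zig-zig: rotate right first
                y = t.left
                t.left, y.right = y.right, t
                t = y
                if t.left is None:
                    break
            r.left = t                       # link t into the right tree
            r, t = t, t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:            # zig-zig: rotate left first
                y = t.right
                t.right, y.left = y.left, t
                t = y
                if t.right is None:
                    break
            l.right = t                      # link t into the left tree
            l, t = t, t.right
        else:
            break
    l.right, r.left = t.left, t.right        # reassemble the three pieces
    t.left, t.right = header.right, header.left
    return t

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

chain = Node(1, None, Node(2, None, Node(3, None, Node(4, None, Node(5)))))
root = splay(chain, 5)
print(root.key, inorder(root))
```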

Coming Up

• Project 3: implement a “smart” web server.

• Heaps!

• Disjoint Sets

• Graphs and more.
