Storage&and&Indexing&ir.cis.udel.edu/~carteret/CISC637-S10/slides/11-12indexing.pdf · 4/19/10 14 Heap&File/Unclustered&Hash&Index& • Unsorted&ﬁle&with&hash&funcNon&on&search&key&

4/19/10

1

Storage and Indexing

CISC437/637, Lecture #11 Ben Cartere=e

1 Copyright © Ben Cartere=e

Physical Layer

•  So far we have focused on high-‐level database layers –  Conceptual: E-‐R model, E-‐R diagrams –  Logical: relaNonal model, SQL

•  UlNmately the logical layer needs to be implemented in data structures and stored in files on disk –  The physical layer

•  Understanding how the files are organized is essenNal to using a DBMS effecNvely

Copyright © Ben Cartere=e 3

4/19/10

2

Physical Storage Media

•  The actual physical objects that data is stored in •  It is convenient to classify media according to characterisNcs: –  Speed of access –  Cost per unit of storage –  Reliability

•  Loss of data on power outage or crash •  Failure of device

–  VolaNlity •  VolaNle storage loses contents when shut down •  Non-‐volaNle persists during shut down period



•  Cache – very fast; very expensive; volaNle; managed by hardware/OS

•  Main memory – fast; comparaNvely expensive; volaNle – Usually too expensive to allow storage of enNre DB

•  Flash memory – fast read, slower write; roughly as costly as main memory; non-‐volaNle


4/19/10

3


•  Magne3c disk – slow; cheap; non-‐volaNle –  This is where the full DB would be stored –  DBMS must move data from disk to memory for access, and from memory to disk for storage •  Doing this efficiently is the driving force behind storage decisions

•  Op3cal storage – slow; cheap; non-‐volaNle –  CDs, DVDs

•  Tape – very slow; very cheap; non-‐volaNle –  Only allows sequen3al access: data accessed in storage order from beginning •  Compare to direct access offered by magneNc disks

–  Used for backups and archives


Physical Storage Hierarchy


primary

secondary (on-‐line)

terNary (off-‐line)

4/19/10

4

MagneNc Disk Mechanism


MagneNc Disk Details

•  Read-‐write head reads and writes magneNcally-‐encoded informaNon from conNnuously-‐spinning pla>ers

•  A pla=er is divided into circular tracks –  50,000 – 100,000 tracks per pla=er

•  Tracks are divided into sectors –  The smallest unit of informaNon that can be stored (typically 512 bytes)

–  500 sectors/track on inner tracks; 1000 on outer •  When a read/write operaNon executes, head goes to correct track and reads/writes data from/to sectors as they pass underneath


4/19/10

5

Measuring Disk Performance

•  Access 3me – Nme from issue of read/write request to when data transfer begins –  Seek 3me – Nme to posiNon the read/write head over the correct track •  Worst case = Nme to move from innermost track to outermost •  Average case = ½ worst case •  4 – 10 ms on typical disks

–  Rota3onal latency – Nme for sector being accessed to appear under head •  Worst case = full rotaNon •  Average case = ½ worst case •  4 – 11 ms on typical disks


Measuring Disk Performance

•  Data-‐transfer rate – the rate at which data can be retrieved from or stored to disk – 25 – 100 MB per second max; lower for inner tracks

•  Mean 3me to failure – average Nme the disk is expected to run conNnuously with no failure – 3 – 5 years


4/19/10

6

Disk Blocks and Pages

•  A page or block is a logical unit consisNng of a fixed number of conNguous sectors – Data is transferred from disk to main memory in pages – Usually 4KB to 8KB pages

•  Smaller more transfers from disk •  Larger more wasted space from parNally-‐used blocks

•  Access pa=erns: –  Sequen3al – successive requests for successive pages; requires only one disk seek

–  Random – successive requests for pages at arbitrary posiNons; each request requires a seek


OpNmizing Page Access

•  Keeping disk access fast is imperaNve for large databases

•  At the hardware/OS level: –  Buffering of pages stored temporarily in memory while not used (also done by DBMS)

–  Read-‐ahead of pages in case they might be used –  Scheduling of page access to minimize disk arm movement –  File organiza3on to place pages on disk that corresponds to expected access pa=ern

–  Nonvola3le write buffers for temporary storage –  Log disk devoted to recording page updates


4/19/10

7

File OrganizaNon

•  A database is stored on disk in a collecNon of files that span one or more pages

•  Each file contains a sequence of records •  Pages may contain one or more records •  The DBMS can insert and update records in files, and scan through records in a file one at a Nme – Note that we have not specified that all records are from the same relaNon

•  How these files are structured and stored on disk can have a big impact on data access speed


OrganizaNon of Records in Files

•  Simplest organizaNon scheme is a heap file – Records are stored in random order across pages

•  Indexed organizaNons are the alternaNve – The idea of an index is to efficiently locate all records matching a parNcular search key •  Search keys correspond to [sets of] fields/a=ributes/columns

•  They do not need to be keys in the relaNonal sense


4/19/10

8

Indexes

•  An index file consists of a series of data entries with one of three forms: –  k*: a record or search key value –  Pairs (k, r): a search key value with a pointer to a record –  Pairs (k, {r1, …, rn}): a search key value with a list of pointers

•  Three basic kinds of indexes: –  Ordered indexes in which search keys are stored in sorted order –  Hash indexes in which search keys are distributed uniformly in

“buckets” –  Tree indexes in which search keys are organized in a tree

structure


Ordered Indexes

•  A clustered index is an ordered index to a file of records in the same order

•  These types of indexes do not need to store pointers to records, since the posiNon of a search key in the index file corresponds to the posiNon of the record in the record file

•  Pages for a subset of records can be accessed sequenNally


4/19/10

9

Primary and Secondary Indexes

•  Primary index can mean one of two things: 1.  An index on the primary key

2.  A clustered index •  Secondary index can also mean one of two

things: 1.  An index on any other [set of] keys 2.  A non-‐clustered index


Dense and Sparse Indexes

•  Key values in an ordered index do not have to span all values in the files

•  In a dense index, there is an index record for every search-‐key value in the file

•  A sparse index contains index records for a subset of search-‐key values –  To locate a record with search-‐key value k, find the index record with largest value less than k

– Go to that record in the file and search sequenNally for the matching record


4/19/10

10

Hash-‐Based Indexing

•  Records in a file are grouped into buckets •  Each bucket consists of a primary page and zero or more overflow pages

•  A hash func3on h takes a search-‐key value k and returns the address of a bucket

•  To find records matching a search key value k, calculate h(k), then look through bucketed pages sequenNally to find matching data entries


Tree-‐Based Indexing

•  Search key values are organized in a tree •  The highest level is the root •  The lowest level (the leaf level) contains data entries

•  Each node in the tree is a page on disk –  Retrieving nodes involves disk I/O – And therefore the number of disk reads in a search is equal to the length of the path from root to leaf

•  A B+ tree is an index structure that ensures all paths from root to leaf are the same length


4/19/10

11

Comparing File OrganizaNons

•  We are interested in the total cost of accessing and modifying data with a given file organizaNon scheme

•  Specifically, what is the cost of: –  Scan (fetch all records in a file) –  Search with equality selec3on (fetch records that match an equality condiNon)

–  Search with range selec3on (fetch records that match a range condiNon)

–  Insert (insert a new record into a file) – Delete (delete a record from a file)


Cost Model

•  To esNmate cost, we need a model of total execuNon Nme

•  Our model is a simplified one: –  B is the number of pages (assuming 100% capacity) –  R is the number of records per page (100% capacity) – D is the average Nme to read or write a page from or to disk

–  C is average Nme to process a record

•  Consider the average case •  This is good enough to indicate trends


4/19/10

12

Heap Files

•  Heap file = randomly ordered records •  Costs: – Scan: B(D+RC) – Search with equality selecNon: 0.5B(D+RC) •  (if equality field is a key; same as scan if not)

– Search with range selecNon: B(D+RC) –  Insert: 2D+C – Delete: search cost + C+D


Sorted Files

•  Records stored directly, sorted on one or more fields

•  Costs: – Scan: B(D + RC) – Search with equality selecNon: D log2 B + C log2 R •  (assuming selecNon field is the sort field)

– Search with range selecNon: D log2 B + C log2 R –  Insert: search + B(D + RC) – Delete: search + B(D + RC)


4/19/10

13

Clustered File/Tree Index

•  Sorted file with B+-‐tree index on sort field –  F = fanout, max number of pointers in a node

•  AssumpNon: pages at 67% capacity •  Costs: –  Space overhead: 0.5B –  Scan: 1.5B(D + RC) –  Search with equality selecNon:

•  D logF (1.5B) + C log2 R –  Search with range selecNon: same –  Insert: search + D – Delete: same


Heap File/Unclustered Tree Index

•  Unsorted file with B+-‐tree on search key •  AssumpNon: index data entry = 10% record size •  Costs: –  Space overhead: 0.15B(F – 1) –  Scan: BD(R + 0.15) + 2BRC –  Search with equality selecNon:

•  D logF(0.15B) + C log2(6.7R) + D –  Search with range selecNon: “same” (plus another D for each matching record)

–  Insert: 3D + C + D logF(0.15B) + C log2(6.7R) •  (heap insert + tree search + leaf node page write)

–  Delete: heap delete + tree search + leaf node page write


4/19/10

14

Heap File/Unclustered Hash Index

•  Unsorted file with hash funcNon on search key – H = Nme to compute hash funcNon

•  AssumpNon: index pages are 80% full •  Costs: –  Space overhead: 0.125B –  Scan: BD(R + 0.125) + 2BRC –  Search with equality selecNon: H + 2D + 4RC –  Search with range selecNon: B(D + RC) –  Insert: heap insert + H + 2D + C – Delete: search + 2D


Comparison of I/O Costs

File type Scan Equal search Range search Insert Delete

Heap BD 0.5BD BD 2D Search + D

Sorted BD Dlog2 B Dlog2 B + m Search + BD Search + BD

Clustered 1.5BD DlogF 1.5B DlogF 1.5B + m Search + D Search + D

Tree index BD(R + 0.15) D(1 + logF 0.15B) D(logF 0.15B + m) D(3 + logF 0.15B) Search + 2D

Hash index BD(R + 0.125) 2D BD 4D Search + 2D


•  Best for scanning: sorted •  Best for equality search: hash index •  Best for range search: clustered •  Best for inserNons: heap •  Best for deleNons: heap

•  Worst for scanning: any index •  Worst for equality search: tree index •  Worst for range search: hash index •  Worst for inserNons: sorted •  Worst for deleNons: sorted

4/19/10

15

CreaNng Indexes in MySQL

CREATE INDEX indexName ON RelaNon (A1, …) USING [HASH|BTREE]

•  Creates an index of the specified type on the specified column(s)

•  MySQL automaNcally maintains it with inserts/updates/deletes

•  MySQL creates clustered indexes on primary keys by default –  If no primary key, clustered index on first UNIQUE NOT NULL

field –  If no UNIQUE NOT NULL field, clustered index on internal row ID


Indexes and Performance Tuning

•  A workload is a mix of queries and update operaNons •  We’d like to create indexes that will support the expected workload efficiently

•  For each query in the workload: – What relaNons does it access? – What a=ributes are retrieved? – Which a=ributes are involved in select/join clauses? –  How selecNve are those condiNons?

•  For each update in the workload: – What type of update (INSERT/UPDATE/DELETE)? – What a=ributes are affected?


4/19/10

16

Choosing Indexes

•  What indexes should we create? – Which relaNons need indexes?

– What fields should be used as search keys? – Do we need more than one index for a relaNon?

•  What type of index? – Clustered? Hash? B+-‐tree?


Index SelecNon Guidelines

•  A=ributes in WHERE clauses are good candidates – Many exact matches hash – Many range queries tree

•  If WHERE clauses o|en contain several condiNons, consider mulN-‐a=r index – Order of a=ributes ma=ers!

•  Choose indexes that benefit as many queries as possible


4/19/10

17

Index Design Examples

•  Range selecNon: SELECT E.dnum FROM Employees E WHERE E.age > 40 – B+ tree index supports; hash index does not –  Is B+ tree index worthwhile? •  Consider selecNvity as well as clustering


Index Design Examples

•  Range selecNon with grouping: SELECT E.dnum, COUNT(*) FROM Employees E WHERE E.age > 10 GROUP BY E.dnum – B+ tree index on age again, but this Nme almost certain that age > 10

– What about index on dnum?


4/19/10

18

Composite Search Keys

•  A composite (or concatenated) search key is one that contains several fields – Equality queries specify a constant value for every field in the key

– Range queries have either missing values or range tests

•  Composite keys support more queries – But are larger and require more updates


Composite Key Design

•  SELECT E.eid FROM Employees E WHERE E.age BETWEEN 20 AND 30 AND E.sal BETWEEN 3000 AND 5000 – Hash index does not support, but B+ tree index does

– Compare index on <sal, age> to index on <age, sal> •  Assuming similar selecNvity of clauses


4/19/10

19

Composite Key Design

•  SELECT E.eid FROM Employees E WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000 – Compare index on <age, sal> to index on <sal, age>

•  SELECT AVG(E.sal) FROM Employees E WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000


Documents

Storage&and&Indexing&ir.cis.udel.edu/~carteret/CISC637-S10/slides/11-12indexing.pdf · 4/19/10 14 Heap&File/Unclustered&Hash&Index& • Unsorted&ﬁle&with&hash&funcNon&on&search&key&