Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
4/19/10
1
Storage and Indexing
CISC437/637, Lecture #11 Ben Cartere=e
1 Copyright © Ben Cartere=e
Physical Layer
• So far we have focused on high-‐level database layers – Conceptual: E-‐R model, E-‐R diagrams – Logical: relaNonal model, SQL
• UlNmately the logical layer needs to be implemented in data structures and stored in files on disk – The physical layer
• Understanding how the files are organized is essenNal to using a DBMS effecNvely
Copyright © Ben Cartere=e 3
4/19/10
2
Physical Storage Media
• The actual physical objects that data is stored in • It is convenient to classify media according to characterisNcs: – Speed of access – Cost per unit of storage – Reliability
• Loss of data on power outage or crash • Failure of device
– VolaNlity • VolaNle storage loses contents when shut down • Non-‐volaNle persists during shut down period
Copyright © Ben Cartere=e 4
Physical Storage Media
• Cache – very fast; very expensive; volaNle; managed by hardware/OS
• Main memory – fast; comparaNvely expensive; volaNle – Usually too expensive to allow storage of enNre DB
• Flash memory – fast read, slower write; roughly as costly as main memory; non-‐volaNle
Copyright © Ben Cartere=e 5
4/19/10
3
Physical Storage Media
• Magne3c disk – slow; cheap; non-‐volaNle – This is where the full DB would be stored – DBMS must move data from disk to memory for access, and from memory to disk for storage • Doing this efficiently is the driving force behind storage decisions
• Op3cal storage – slow; cheap; non-‐volaNle – CDs, DVDs
• Tape – very slow; very cheap; non-‐volaNle – Only allows sequen3al access: data accessed in storage order from beginning • Compare to direct access offered by magneNc disks
– Used for backups and archives
Copyright © Ben Cartere=e 6
Physical Storage Hierarchy
Copyright © Ben Cartere=e 7
primary
secondary (on-‐line)
terNary (off-‐line)
4/19/10
4
MagneNc Disk Mechanism
Copyright © Ben Cartere=e 8
MagneNc Disk Details
• Read-‐write head reads and writes magneNcally-‐encoded informaNon from conNnuously-‐spinning pla>ers
• A pla=er is divided into circular tracks – 50,000 – 100,000 tracks per pla=er
• Tracks are divided into sectors – The smallest unit of informaNon that can be stored (typically 512 bytes)
– 500 sectors/track on inner tracks; 1000 on outer • When a read/write operaNon executes, head goes to correct track and reads/writes data from/to sectors as they pass underneath
Copyright © Ben Cartere=e 9
4/19/10
5
Measuring Disk Performance
• Access 3me – Nme from issue of read/write request to when data transfer begins – Seek 3me – Nme to posiNon the read/write head over the correct track • Worst case = Nme to move from innermost track to outermost • Average case = ½ worst case • 4 – 10 ms on typical disks
– Rota3onal latency – Nme for sector being accessed to appear under head • Worst case = full rotaNon • Average case = ½ worst case • 4 – 11 ms on typical disks
Copyright © Ben Cartere=e 10
Measuring Disk Performance
• Data-‐transfer rate – the rate at which data can be retrieved from or stored to disk – 25 – 100 MB per second max; lower for inner tracks
• Mean 3me to failure – average Nme the disk is expected to run conNnuously with no failure – 3 – 5 years
Copyright © Ben Cartere=e 11
4/19/10
6
Disk Blocks and Pages
• A page or block is a logical unit consisNng of a fixed number of conNguous sectors – Data is transferred from disk to main memory in pages – Usually 4KB to 8KB pages
• Smaller more transfers from disk • Larger more wasted space from parNally-‐used blocks
• Access pa=erns: – Sequen3al – successive requests for successive pages; requires only one disk seek
– Random – successive requests for pages at arbitrary posiNons; each request requires a seek
Copyright © Ben Cartere=e 12
OpNmizing Page Access
• Keeping disk access fast is imperaNve for large databases
• At the hardware/OS level: – Buffering of pages stored temporarily in memory while not used (also done by DBMS)
– Read-‐ahead of pages in case they might be used – Scheduling of page access to minimize disk arm movement – File organiza3on to place pages on disk that corresponds to expected access pa=ern
– Nonvola3le write buffers for temporary storage – Log disk devoted to recording page updates
Copyright © Ben Cartere=e 13
4/19/10
7
File OrganizaNon
• A database is stored on disk in a collecNon of files that span one or more pages
• Each file contains a sequence of records • Pages may contain one or more records • The DBMS can insert and update records in files, and scan through records in a file one at a Nme – Note that we have not specified that all records are from the same relaNon
• How these files are structured and stored on disk can have a big impact on data access speed
Copyright © Ben Cartere=e 14
OrganizaNon of Records in Files
• Simplest organizaNon scheme is a heap file – Records are stored in random order across pages
• Indexed organizaNons are the alternaNve – The idea of an index is to efficiently locate all records matching a parNcular search key • Search keys correspond to [sets of] fields/a=ributes/columns
• They do not need to be keys in the relaNonal sense
Copyright © Ben Cartere=e 15
4/19/10
8
Indexes
• An index file consists of a series of data entries with one of three forms: – k*: a record or search key value – Pairs (k, r): a search key value with a pointer to a record – Pairs (k, {r1, …, rn}): a search key value with a list of pointers
• Three basic kinds of indexes: – Ordered indexes in which search keys are stored in sorted order – Hash indexes in which search keys are distributed uniformly in
“buckets” – Tree indexes in which search keys are organized in a tree
structure
Copyright © Ben Cartere=e 16
Ordered Indexes
• A clustered index is an ordered index to a file of records in the same order
• These types of indexes do not need to store pointers to records, since the posiNon of a search key in the index file corresponds to the posiNon of the record in the record file
• Pages for a subset of records can be accessed sequenNally
Copyright © Ben Cartere=e 17
4/19/10
9
Primary and Secondary Indexes
• Primary index can mean one of two things: 1. An index on the primary key
2. A clustered index • Secondary index can also mean one of two
things: 1. An index on any other [set of] keys 2. A non-‐clustered index
Copyright © Ben Cartere=e 18
Dense and Sparse Indexes
• Key values in an ordered index do not have to span all values in the files
• In a dense index, there is an index record for every search-‐key value in the file
• A sparse index contains index records for a subset of search-‐key values – To locate a record with search-‐key value k, find the index record with largest value less than k
– Go to that record in the file and search sequenNally for the matching record
Copyright © Ben Cartere=e 19
4/19/10
10
Hash-‐Based Indexing
• Records in a file are grouped into buckets • Each bucket consists of a primary page and zero or more overflow pages
• A hash func3on h takes a search-‐key value k and returns the address of a bucket
• To find records matching a search key value k, calculate h(k), then look through bucketed pages sequenNally to find matching data entries
Copyright © Ben Cartere=e 20
Tree-‐Based Indexing
• Search key values are organized in a tree • The highest level is the root • The lowest level (the leaf level) contains data entries
• Each node in the tree is a page on disk – Retrieving nodes involves disk I/O – And therefore the number of disk reads in a search is equal to the length of the path from root to leaf
• A B+ tree is an index structure that ensures all paths from root to leaf are the same length
Copyright © Ben Cartere=e 21
4/19/10
11
Comparing File OrganizaNons
• We are interested in the total cost of accessing and modifying data with a given file organizaNon scheme
• Specifically, what is the cost of: – Scan (fetch all records in a file) – Search with equality selec3on (fetch records that match an equality condiNon)
– Search with range selec3on (fetch records that match a range condiNon)
– Insert (insert a new record into a file) – Delete (delete a record from a file)
Copyright © Ben Cartere=e 22
Cost Model
• To esNmate cost, we need a model of total execuNon Nme
• Our model is a simplified one: – B is the number of pages (assuming 100% capacity) – R is the number of records per page (100% capacity) – D is the average Nme to read or write a page from or to disk
– C is average Nme to process a record
• Consider the average case • This is good enough to indicate trends
Copyright © Ben Cartere=e 23
4/19/10
12
Heap Files
• Heap file = randomly ordered records • Costs: – Scan: B(D+RC) – Search with equality selecNon: 0.5B(D+RC) • (if equality field is a key; same as scan if not)
– Search with range selecNon: B(D+RC) – Insert: 2D+C – Delete: search cost + C+D
Copyright © Ben Cartere=e 24
Sorted Files
• Records stored directly, sorted on one or more fields
• Costs: – Scan: B(D + RC) – Search with equality selecNon: D log2 B + C log2 R • (assuming selecNon field is the sort field)
– Search with range selecNon: D log2 B + C log2 R – Insert: search + B(D + RC) – Delete: search + B(D + RC)
Copyright © Ben Cartere=e 25
4/19/10
13
Clustered File/Tree Index
• Sorted file with B+-‐tree index on sort field – F = fanout, max number of pointers in a node
• AssumpNon: pages at 67% capacity • Costs: – Space overhead: 0.5B – Scan: 1.5B(D + RC) – Search with equality selecNon:
• D logF (1.5B) + C log2 R – Search with range selecNon: same – Insert: search + D – Delete: same
Copyright © Ben Cartere=e 26
Heap File/Unclustered Tree Index
• Unsorted file with B+-‐tree on search key • AssumpNon: index data entry = 10% record size • Costs: – Space overhead: 0.15B(F – 1) – Scan: BD(R + 0.15) + 2BRC – Search with equality selecNon:
• D logF(0.15B) + C log2(6.7R) + D – Search with range selecNon: “same” (plus another D for each matching record)
– Insert: 3D + C + D logF(0.15B) + C log2(6.7R) • (heap insert + tree search + leaf node page write)
– Delete: heap delete + tree search + leaf node page write
Copyright © Ben Cartere=e 27
4/19/10
14
Heap File/Unclustered Hash Index
• Unsorted file with hash funcNon on search key – H = Nme to compute hash funcNon
• AssumpNon: index pages are 80% full • Costs: – Space overhead: 0.125B – Scan: BD(R + 0.125) + 2BRC – Search with equality selecNon: H + 2D + 4RC – Search with range selecNon: B(D + RC) – Insert: heap insert + H + 2D + C – Delete: search + 2D
Copyright © Ben Cartere=e 28
Comparison of I/O Costs
File type Scan Equal search Range search Insert Delete
Heap BD 0.5BD BD 2D Search + D
Sorted BD Dlog2 B Dlog2 B + m Search + BD Search + BD
Clustered 1.5BD DlogF 1.5B DlogF 1.5B + m Search + D Search + D
Tree index BD(R + 0.15) D(1 + logF 0.15B) D(logF 0.15B + m) D(3 + logF 0.15B) Search + 2D
Hash index BD(R + 0.125) 2D BD 4D Search + 2D
Copyright © Ben Cartere=e 29
• Best for scanning: sorted • Best for equality search: hash index • Best for range search: clustered • Best for inserNons: heap • Best for deleNons: heap
• Worst for scanning: any index • Worst for equality search: tree index • Worst for range search: hash index • Worst for inserNons: sorted • Worst for deleNons: sorted
4/19/10
15
CreaNng Indexes in MySQL
CREATE INDEX indexName ON RelaNon (A1, …) USING [HASH|BTREE]
• Creates an index of the specified type on the specified column(s)
• MySQL automaNcally maintains it with inserts/updates/deletes
• MySQL creates clustered indexes on primary keys by default – If no primary key, clustered index on first UNIQUE NOT NULL
field – If no UNIQUE NOT NULL field, clustered index on internal row ID
Copyright © Ben Cartere=e 30
Indexes and Performance Tuning
• A workload is a mix of queries and update operaNons • We’d like to create indexes that will support the expected workload efficiently
• For each query in the workload: – What relaNons does it access? – What a=ributes are retrieved? – Which a=ributes are involved in select/join clauses? – How selecNve are those condiNons?
• For each update in the workload: – What type of update (INSERT/UPDATE/DELETE)? – What a=ributes are affected?
Copyright © Ben Cartere=e 31
4/19/10
16
Choosing Indexes
• What indexes should we create? – Which relaNons need indexes?
– What fields should be used as search keys? – Do we need more than one index for a relaNon?
• What type of index? – Clustered? Hash? B+-‐tree?
Copyright © Ben Cartere=e 32
Index SelecNon Guidelines
• A=ributes in WHERE clauses are good candidates – Many exact matches hash – Many range queries tree
• If WHERE clauses o|en contain several condiNons, consider mulN-‐a=r index – Order of a=ributes ma=ers!
• Choose indexes that benefit as many queries as possible
Copyright © Ben Cartere=e 33
4/19/10
17
Index Design Examples
• Range selecNon: SELECT E.dnum FROM Employees E WHERE E.age > 40 – B+ tree index supports; hash index does not – Is B+ tree index worthwhile? • Consider selecNvity as well as clustering
Copyright © Ben Cartere=e 34
Index Design Examples
• Range selecNon with grouping: SELECT E.dnum, COUNT(*) FROM Employees E WHERE E.age > 10 GROUP BY E.dnum – B+ tree index on age again, but this Nme almost certain that age > 10
– What about index on dnum?
Copyright © Ben Cartere=e 35
4/19/10
18
Composite Search Keys
• A composite (or concatenated) search key is one that contains several fields – Equality queries specify a constant value for every field in the key
– Range queries have either missing values or range tests
• Composite keys support more queries – But are larger and require more updates
Copyright © Ben Cartere=e 36
Composite Key Design
• SELECT E.eid FROM Employees E WHERE E.age BETWEEN 20 AND 30 AND E.sal BETWEEN 3000 AND 5000 – Hash index does not support, but B+ tree index does
– Compare index on <sal, age> to index on <age, sal> • Assuming similar selecNvity of clauses
Copyright © Ben Cartere=e 37
4/19/10
19
Composite Key Design
• SELECT E.eid FROM Employees E WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000 – Compare index on <age, sal> to index on <sal, age>
• SELECT AVG(E.sal) FROM Employees E WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000
Copyright © Ben Cartere=e 38