Rasmus Ejlers Møgelberg
Physical data organisation
Introduction to Database Design 2012, Lecture 10
Don’t forget
• Mandatory ex. deadline April 11
• No teaching April 9 and 16
• Course evaluation April 11-17
Course overview
• Communicating with DBMSs
• Designing databases
• Making databases efficient
• Making databases reliable
Making databases efficient
• 2-3 lectures on internals of DBMSs
- Storage structure
- Indices
- Execution plans
• If you really want to become good at using DBs efficiently, you should take a database tuning class
Levels of data abstraction
[Figure: levels of data abstraction (illustration from book). Labels: restricted access view, interface of DBMS, file system]
Today’s lecture
• Physical data organisation
• Storage structures:
- Sorted files
- Multilevel indices
- B+ trees
Physical data organisation
Storage hierarchy
• The higher, the smaller capacity
• The lower, the slower access
• CPU always tries to read from cache
• If that fails (cache miss) will try main memory
• If that fails: transfer block from disk
Characteristics of storage
• Considerations for storage media
- Cost per storage unit
- Access speed
- Reliability
• Types of reliability
- Volatile: lost when power turns off
- Non-volatile: not lost
Cache and main memory
• Cache is fastest and smallest type of storage
- Located on chip
- Volatile
• Main memory (RAM)
- Access time 10s - 100s of nanoseconds
- Volatile
Flash memory
• Non-volatile
• Reads about as fast as main memory
• Writes slower (microseconds; 1 microsecond = 1000 nanoseconds)
• Supports only a limited number of writes/erases
Magnetic disks
Typical numbers
• 60-250 revolutions a second
• 100,000 tracks per platter
• 1-5 platters per disk
• 500-2000 sectors per track
• Sectors typically 512 bytes
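As a rough worked example of what the rotation speeds above imply, the sketch below computes the time for one revolution and the average rotational latency. It assumes (a common convention, not stated on the slide) that on average the disk has to turn half a revolution before the wanted sector passes under the head. The function name is made up for illustration.

```python
def rotational_latency_ms(revolutions_per_second):
    """Return (time per revolution, average rotational latency) in milliseconds."""
    full_ms = 1000.0 / revolutions_per_second
    # Assumption: on average we wait for half a revolution.
    return full_ms, full_ms / 2


for rps in (60, 250):
    full_ms, avg_ms = rotational_latency_ms(rps)
    print(f"{rps} rev/s: {full_ms:.1f} ms per revolution, "
          f"{avg_ms:.1f} ms average rotational latency")
```

Under this assumption the rotational latency alone is already a few milliseconds.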
Performance measures
• Access time (usually milliseconds; 1 millisecond = 1,000,000 nanoseconds)
- Seek time: moving head to right track
- Rotational latency: waiting for disk to rotate
• Relation between disk access and cache access:
- Like difference between borrowing sugar from your neighbour and flying to Australia to get it!
Reading from disk
• Whole blocks transferred at a time
• A block is a fixed number of contiguous sectors
• Blocks are also sometimes called pages
• If you have to go to Australia, you might as well bring more than one bag of sugar
• The same comparison holds between computation time and disk access time: computation is very cheap compared to a disk access
Performance is measured in number of disk accesses!
File organisation
Sorted files
• Records stored sequentially on disk
• Sorted by primary key
Searching sorted files
• Consider queries such as
select * from instructor where id = 32343
select * from instructor where id between 20000 and 30000
• Can be implemented using binary search
- Look in the middle of the file
- If the id found there equals 32343 we are done
- If it is larger than 32343, look in the middle of the first half
- Else look in the middle of the second half
- Repeat until found
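The binary search steps above can be sketched as follows. This is a minimal illustration, not how any particular DBMS does it: the file is modelled as a Python list of non-empty blocks, each block a sorted list of (id, name) records, and reading a block stands in for one disk access. The function name and the generated test data are hypothetical.

```python
def binary_search_blocks(blocks, target_id):
    """Search a sorted file for target_id.

    blocks: list of non-empty blocks in key order; each block is a sorted
    list of (id, name) records.
    Returns (record or None, number of blocks read).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one simulated disk access
        reads += 1
        if target_id < block[0][0]:
            hi = mid - 1             # continue in the first half
        elif target_id > block[-1][0]:
            lo = mid + 1             # continue in the second half
        else:
            # If the id exists at all, it must be in this block.
            for record in block:
                if record[0] == target_id:
                    return record, reads
            return None, reads
    return None, reads


# Hypothetical file: 1024 blocks with 10 records each, ids 0..10239.
blocks = [[(b * 10 + r, f"name{b * 10 + r}") for r in range(10)]
          for b in range(1024)]
record, reads = binary_search_blocks(blocks, 3234)
print(record, "found after", reads, "block reads")
```

For the range query, the same search can locate the block holding the first qualifying id, after which the following blocks are read sequentially.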
Performance of binary search
• For each iteration we cut the search space in half
• If the table occupies N blocks we need approximately log2(N) disk access operations
• log2(N) is approximately 3 times the number of digits in N (more precisely, about 3.3 times log10(N))
log2(1000) ≈ 10
log2(1000000) ≈ 20
log2(1000000000) ≈ 30
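The approximations above are easy to check. The helper below is a hypothetical name; it rounds log2(N) up to a whole number of disk accesses.

```python
import math


def approx_disk_accesses(n_blocks):
    """Approximate number of block reads for binary search over n_blocks blocks."""
    return math.ceil(math.log2(n_blocks))


for n_blocks in (1_000, 1_000_000, 1_000_000_000):
    print(n_blocks, "blocks: about", approx_disk_accesses(n_blocks), "disk accesses")
```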
Searching sorted files
• Consider queries such as
select * from instructor where name = 'Einstein'
• Must use linear search
- Look through file one record at a time
• Number of disk access operations linear in file size
• Very slow if table is large
• Answer: secondary indices (later in this lecture)
Deleting from sorted files
• Simply delete records
• Keep list of available space for future insertions
• Reorganise when file sparsely populated
Inserting Records
• If there is space in the block where the record belongs, insert it there
• Otherwise insert into overflow block
• Leave space for insertions (fill factor < 100%)
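A toy model of these insertion rules is sketched below. It is an assumption-laden illustration: the class name, the block capacity of 4 and the fill factor of 75% are all made up, the file must start non-empty, and a single unsorted list stands in for the overflow blocks.

```python
import bisect


class SortedFile:
    """Toy sorted file: fixed-capacity blocks plus one unsorted overflow area."""

    def __init__(self, keys, block_capacity=4, fill_factor=0.75):
        # Fill each block only partially, leaving room for later insertions.
        per_block = max(1, int(block_capacity * fill_factor))
        keys = sorted(keys)
        self.block_capacity = block_capacity
        self.blocks = [keys[i:i + per_block]
                       for i in range(0, len(keys), per_block)]
        self.overflow = []

    def insert(self, key):
        # Find the last block whose first key is <= key (first block if none).
        firsts = [block[0] for block in self.blocks]
        i = max(0, bisect.bisect_right(firsts, key) - 1)
        block = self.blocks[i]
        if len(block) < self.block_capacity:
            bisect.insort(block, key)    # room: block stays sorted
        else:
            self.overflow.append(key)    # full: goes to overflow, unsorted


f = SortedFile([10, 20, 30, 40, 50, 60])
f.insert(25)    # fits into the first block
f.insert(15)    # first block is now full, so this goes to overflow
print(f.blocks, f.overflow)
```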
The problem of maintaining sorted files
• Overflow block may increase in size over time
• Overflow blocks are not sorted
• After many insertions advantage of sorted file is lost
• May need to reorganise
• Can be done e.g. at night
Sorted files are costly to maintain. Can we do better?
More problems
• Should file be kept on contiguous blocks?
• How else can we find the middle of the file?
• But this makes maintenance harder
• What happens when file cannot be extended further on disk?
Indices
Using a sparse index
• Blocks really need not be contiguous
• For search we can use a sparse index
• Index file much smaller than data file
• Note that overflow blocks no longer needed
[Figure: sparse index file with pointers to data blocks]
Sparse index
• For search purposes index file should be kept sorted
• The number of (key, block address) pairs that fit in one block is called the fan out
• We need (number of data blocks) / (fan out) blocks for the index file
• Index file could potentially be too large to store in single block
• So may need to apply same trick to index file
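A lookup through a single-level sparse index might look like the sketch below. The ids are the ones that appear in the lecture's index figure; the physical order of the blocks, the function names and the in-memory representation are assumptions made for illustration. Note that the data blocks are deliberately not stored in key order, since only the index has to be sorted.

```python
import bisect


def build_sparse_index(blocks):
    """One (first key, block number) entry per data block, sorted by key."""
    return sorted((block[0], block_no) for block_no, block in enumerate(blocks))


def lookup(index, blocks, key):
    """Return True if key is stored; reads at most one data block."""
    first_keys = [k for k, _ in index]
    # Last index entry whose key is <= the search key.
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return False                 # smaller than every key in the file
    _, block_no = index[pos]
    return key in blocks[block_no]   # the single data block read


# Data blocks in arbitrary physical order; each block is sorted internally.
blocks = [[202, 234, 245], [112, 114, 131], [256, 512, 923], [145, 178, 201]]
index = build_sparse_index(blocks)
print(index)
print(lookup(index, blocks, 178), lookup(index, blocks, 150))
```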
Multilevel indexing
[Figure: multilevel index with an outer index, an inner index and data blocks]
An example
• Suppose there are 10,000 blocks in the data file
• Suppose fan out is 100
• We then need 10,000 / 100 = 100 blocks for the inner index
• We need 1 block for the outer index
• A search using the index requires 3 block reads
• A linear search requires 10,000 block reads
• If there are 1,000,000 blocks of data we need just 1 more index level
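The arithmetic in this example generalises: keep dividing by the fan out until a single block remains. The sketch below (hypothetical function name) returns the number of blocks at each index level, innermost first; a search then reads one block per index level plus one data block.

```python
import math


def index_levels(data_blocks, fan_out):
    """Return the number of blocks at each index level, innermost first."""
    levels = []
    blocks = data_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / fan_out)
        levels.append(blocks)
    return levels


for n in (10_000, 1_000_000):
    levels = index_levels(n, 100)
    print(n, "data blocks:", levels, "index blocks per level,",
          len(levels) + 1, "block reads per search")
```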
Secondary indices
Multiple indexes
• Recall the query
select * from instructor where name = 'Einstein'
• Sorted structure on id does not help us here
• If we run this kind of query often, we may need a secondary index
• Compare to phone books sorted by business type with an index sorted by company name
Secondary indices
• Index over attribute(s) for which data file is not sorted
• Attribute(s) need not be candidate key
• Inner index must be dense, i.e. one entry for each tuple
• But is usually still much smaller than data file
• Outer indices need not be dense, because inner index sorted
• Will need 1 more level of indexing for the secondary index
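The inner level of a secondary index can be sketched as a dense, sorted list with one entry per tuple. The data below is hypothetical and the function names are made up; the file is sorted by id, the index by name. Because name need not be a candidate key, a lookup may return several blocks.

```python
import bisect


def build_secondary_index(blocks):
    """Dense index: one (name, block number) entry per record, sorted by name."""
    entries = [(name, block_no)
               for block_no, block in enumerate(blocks)
               for _id, name in block]
    return sorted(entries)


def blocks_for_name(index, name):
    """Return the numbers of the blocks holding records with the given name."""
    # (name, -1) sorts before every real entry for this name.
    start = bisect.bisect_left(index, (name, -1))
    result = []
    for entry_name, block_no in index[start:]:
        if entry_name != name:
            break
        result.append(block_no)
    return result


# Hypothetical data file, sorted by id (not by name).
blocks = [
    [(101, "Wu"), (122, "Katz")],
    [(222, "Einstein"), (323, "Mozart")],
    [(455, "Einstein"), (585, "Abel")],
]
index = build_secondary_index(blocks)
print(blocks_for_name(index, "Einstein"))
```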
Secondary index
[Figure: data blocks with a primary index and a secondary index]
Types of indices
• Clustered vs. non-clustered
- Clustered indices: files sorted by indexing attribute
- Secondary indices are non-clustered
• Sparse vs. dense
- Dense indices have one entry per value
- Sparse indices have one entry per block
B+ trees
• Are a multilevel index structure
• Can be used for storage or indexing
• Key features
- Easily maintainable
- Also number of levels can grow
- Search trees are always balanced
• Balance is the key to efficiency
B+ trees example
Node structure
• Ks are keys, Ps are pointers
• In internal nodes P_{i+1} points to the subtree containing only nodes with search keys x such that K_i ≤ x < K_{i+1}
• Search key values are ordered: K_1 < K_2 < K_3 < ...
• Fan out is n
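The pointer rule above translates directly into a search procedure. The sketch below is a simplification under stated assumptions: nodes live in memory rather than in disk blocks, leaves hold only keys (no record pointers), and the small tree is a made-up example. The class and function names are hypothetical.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # K1 < K2 < ...
        self.children = children    # P1, P2, ... (None for a leaf)


def search(node, x):
    """Return True if search key x is stored in a leaf below node."""
    while node.children is not None:
        # bisect_right counts the keys <= x, so in a 0-based list it selects
        # the pointer P(i+1) for which Ki <= x < K(i+1).
        node = node.children[bisect.bisect_right(node.keys, x)]
    return x in node.keys


# Hypothetical two-level tree.
root = Node(["bob", "jane"], [
    Node(["abe", "al"]),
    Node(["bob", "eddie"]),
    Node(["jane", "joe"]),
])
print(search(root, "eddie"), search(root, "carl"))
```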
B+ trees
• The tree must be balanced:
- All paths from the root to a leaf node have same length
• If fan out is n then
- Leaf nodes must have between (n-1)/2 and n-1 values
- Internal nodes must have between n/2 and n pointers
- Root node must have at least 2 pointers
- (all numbers above should be rounded up)
• Underfull nodes are waste of space
• Space inefficiency may lead to slower queries
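The occupancy rules above, with the rounding up made explicit, can be written as a small helper. The function name and the dictionary keys are made up for illustration.

```python
import math


def occupancy_bounds(n):
    """Occupancy limits for a B+ tree with fan out n (minimums rounded up)."""
    return {
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        "internal_pointers": (math.ceil(n / 2), n),
        "root_min_pointers": 2,
    }


for n in (3, 4, 100):
    print(n, occupancy_bounds(n))
```

For fan out 100, for example, every leaf holds between 50 and 99 values and every internal node between 50 and 100 pointers.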
Insertions example
• Insert vince, vera and rob
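[Figure: example B+ tree before the insertions. Root keys: judy, rick. Internal keys: bob, jane, mike, pete, tom. Leaf keys: abe, al, bob, eddie, jane, joe, judy, karen, mike, nan, pete, phil, rick, sol, tom]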
Insertions rule
• If room, simply insert
• Else split node in two
• This requires insertion on one level up
• When splitting a non-leaf, middle value goes one level up
• To insert into full root, create new level
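The two kinds of split can be sketched on key lists alone. This is only an outline under assumptions: pointers are ignored, the function names are hypothetical, and giving the left half the extra key is one common convention rather than something the slides prescribe. In a leaf split the separator is copied up (it stays in the right leaf); in a non-leaf split the middle key moves up and stays in neither half.

```python
def split_leaf(keys, new_key):
    """Insert new_key into a full leaf and split it in two.

    Returns (left, right, separator). The separator is a copy of the first
    key of the right leaf; it is what gets inserted one level up.
    """
    keys = sorted(keys + [new_key])
    mid = (len(keys) + 1) // 2      # left half gets the extra key, if any
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_internal(keys):
    """Split the keys of an over-full internal node.

    Returns (left, right, middle). The middle key goes one level up and is
    kept in neither half.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]


print(split_leaf(["rick", "sol"], "rob"))
print(split_internal(["bob", "jane", "mike"]))
```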
Deletion example
• Delete jane, bob, joe, eddie, abe and al
• Deletions may cause nodes to be underfull
• This may not be a problem, but in extreme cases it can waste space and time
• When a node is underfull, a pointer should be transferred to it from a neighbouring sibling
• If this is not possible, the nodes should merge
Multi column keys
• B+-trees can also be used for composite keys
• e.g. key (order_id, cd_id)
• Keys sorted lexicographically
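[Figure: example B+ tree with composite keys. Root keys: (3,56), (7,45). Internal keys: (1,34), (2,13), (4,6), (5,13), (8,1). Leaf keys: (1,2), (1,23), (1,34), (2,12), (2,13), (3,25), (3,56), (4,5), (4,6), (4,10), (5,13), (7,34), (7,45), (7,46), (8,1)]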
Searching multicolumn B+trees
• Index useful for
- full key queries
select * from purch_cd where purch_id = 1 and cd_id = 34
- partial prefix key queries
select * from purch_cd where purch_id = 1
• Not useful for partial queries not using prefix
select * from purch_cd where cd_id = 13
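Why the prefix matters can be seen on a plain sorted list of composite keys, which stands in here for the leaf level of the B+ tree. The keys are the leaf keys from the example figure; the function names are hypothetical, and the prefix lookup assumes purch_id is an integer. Python tuples compare lexicographically, and a shorter tuple sorts before any longer tuple that extends it.

```python
import bisect

# Composite keys (purch_id, cd_id) in lexicographic order.
keys = sorted([(1, 2), (1, 23), (1, 34), (2, 12), (2, 13), (3, 25), (3, 56),
               (4, 5), (4, 6), (4, 10), (5, 13), (7, 34), (7, 45), (7, 46),
               (8, 1)])


def full_key_lookup(keys, purch_id, cd_id):
    """Binary search for one exact composite key."""
    i = bisect.bisect_left(keys, (purch_id, cd_id))
    return i < len(keys) and keys[i] == (purch_id, cd_id)


def prefix_lookup(keys, purch_id):
    """All keys with this purch_id are contiguous, so two binary searches suffice."""
    lo = bisect.bisect_left(keys, (purch_id,))
    hi = bisect.bisect_left(keys, (purch_id + 1,))
    return keys[lo:hi]


def non_prefix_lookup(keys, cd_id):
    """Matching keys are scattered, so every key has to be examined."""
    return [k for k in keys if k[1] == cd_id]


print(full_key_lookup(keys, 1, 34))
print(prefix_lookup(keys, 1))
print(non_prefix_lookup(keys, 13))
```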
Summary
• Storage media organised in hierarchy
- Cache
- Main memory (RAM)
- Disk
• Upper layers smaller but faster than lower layers
• Disk access is much more costly than computations and main memory access
• Performance measured in number of disk accesses needed
• Did not cover RAID
• Sorted files are costly to maintain
• Multilevel indexed file structures are much more efficient
• B+ trees are multilevel indices which can be maintained efficiently
• B+ trees can be used as
- a storage structure, or
- a structure for a secondary index
• Next time: hash indices