Physical data organisation - IT U

Rasmus Ejlers Møgelberg

Physical data organisation

Introduction to Database Design 2012, Lecture 10


Don’t forget

• Mandatory ex. deadline April 11

• No teaching April 9 and 16

• Course evaluation April 11-17



Course overview

• Communicating with DBMSs

• Designing databases

• Making databases efficient

• Making databases reliable



Making databases efficient

• 2-3 lectures on internals of DBMSs

- Storage structure

- Indices

- Execution plans

• If you really want to become good at using DBs efficiently, you should take a database tuning class



Levels of data abstraction

(Figure: levels of data abstraction, with labels: filesystem, interface of DBMS, restricted access view. Illustrations from book.)


Today’s lecture

• Physical data organisation

• Storage structures:

- Sorted files

- Multilevel indices

- B+ trees



Physical data organisation


Storage hierarchy



Storage hierarchy

• The higher, the smaller capacity

• The lower, the slower access

• CPU always tries to read from cache

• If that fails (cache miss) will try main memory

• If that fails: transfer block from disk



Characteristics of storage

• Considerations for storage media

- Cost per storage unit

- Access speed

- Reliability

• Types of reliability

- Volatile: lost when power turns off

- Non-volatile: not lost



Cache and main memory

• Cache is fastest and smallest type of storage

- Located on chip

- Volatile

• Main memory (RAM)

- Access time: tens to hundreds of nanoseconds

- Volatile



Flash memory

• Non-volatile

• Reads about as fast as main memory

• Writes slower (microseconds; 1 microsecond = 1000 nanoseconds)

• Supports only a limited number of writes/erases



Magnetic disks



Typical numbers

• 60-250 revolutions a second

• 100,000 tracks per platter

• 1-5 platters per disk

• 500-2000 sectors per track

• Sectors typically 512 bytes



Performance measures

• Access time (usually milliseconds; 1 millisecond = 1 million nanoseconds)

- Seek time: moving head to right track

- Rotational latency: waiting for disk to rotate

• Relation between disk access and cache access:

- Like difference between borrowing sugar from your neighbour and flying to Australia to get it!
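As a rough worked example of rotational latency: on average the disk has to turn half a revolution before the wanted sector passes under the head. The sketch below applies that rule of thumb to the rotation speeds quoted under typical numbers. The function name is made up for illustration, and seek time is not included.

```python
def average_rotational_latency_ms(revolutions_per_second: float) -> float:
    """Average wait for the wanted sector: half of one full revolution."""
    full_revolution_ms = 1000.0 / revolutions_per_second
    return full_revolution_ms / 2.0


for rps in (60, 250):
    latency = average_rotational_latency_ms(rps)
    print(f"{rps} revolutions/s: about {latency:.1f} ms average rotational latency")
```

Even the faster disk waits milliseconds, which is several orders of magnitude more than a main memory access.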



Reading from disk

• Whole blocks transferred at a time

• A block is a fixed number of contiguous sectors

• Blocks are also sometimes called pages

• If you have to go to Australia, you might as well bring more than one bag of sugar

• Computation time compares to disk access time in a similar way

Performance is measured in the number of disk accesses!


File organisation


Sorted files

• Records stored sequentially on disk

• Sorted by primary key



Searching sorted files

• Consider queries such as

select * from instructor where id = 32343

select * from instructor where id between 20000 and 30000

• Can be implemented using binary search

- Look in the middle of the file

- If the id found there equals 32343 we are done

- If it is larger than 32343, look in the middle of the first half

- Else look in the middle of the second half

- Repeat until found, or until nothing is left to search
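The binary search steps can be sketched in Python, counting one disk access per block fetched. This is a minimal sketch under stated assumptions: the file is modelled as an in-memory list of non-empty blocks, each block is a list of (id, name) records sorted by id, and the data and function name are invented for illustration.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]   # (id, name)
Block = List[Record]       # records inside a block are sorted by id


def binary_search_blocks(blocks: List[Block], wanted_id: int) -> Tuple[Optional[Record], int]:
    """Return (matching record or None, number of blocks read)."""
    low, high = 0, len(blocks) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]            # one simulated disk access
        reads += 1
        if wanted_id < block[0][0]:
            high = mid - 1             # continue in the first half
        elif wanted_id > block[-1][0]:
            low = mid + 1              # continue in the second half
        else:
            # The id can only be in this block, so scan it in memory.
            for record in block:
                if record[0] == wanted_id:
                    return record, reads
            return None, reads
    return None, reads


# Hypothetical file: 1024 blocks of 4 records each, with ids 0, 10, 20, ...
blocks = [[(10 * (4 * b + r), f"name{4 * b + r}") for r in range(4)] for b in range(1024)]
record, reads = binary_search_blocks(blocks, 32340)
print(record, "found after", reads, "block reads")
```

A range query such as the `between` example can use the same search to find the first qualifying block and then read the following blocks sequentially.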


Performance of binary search

• For each iteration we cut the search space in half

• If table occupies N blocks we need approximately log2(N) disk access operations

• log2(N) is approximately 3 times the number of digits in N

log2(1000) ≈ 10

log2(1000000) ≈ 20

log2(1000000000) ≈ 30


Searching sorted files

• Consider queries such as

select * from instructor where name = 'Einstein'

• Must use linear search

- Look through file one record at a time

• Number of disk access operations linear in file size

• Very slow if table is large

• Answer: secondary indices (later in this lecture)
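For comparison with binary search, here is a minimal sketch of the linear search that a lookup by name is forced into when the file is sorted by id. The rows and the function name are illustrative assumptions; the point is only that every block is read once.

```python
from typing import List, Tuple

Record = Tuple[int, str]   # (id, name)


def linear_search_by_name(blocks: List[List[Record]], wanted_name: str) -> Tuple[List[Record], int]:
    """Return (all matching records, number of blocks read)."""
    matches: List[Record] = []
    reads = 0
    for block in blocks:               # every block has to be fetched
        reads += 1
        matches.extend(r for r in block if r[1] == wanted_name)
    return matches, reads


# Illustrative file sorted by id, so names appear in no useful order.
blocks = [
    [(10101, "Srinivasan"), (12121, "Wu")],
    [(15151, "Mozart"), (22222, "Einstein")],
    [(32343, "El Said"), (33456, "Gold")],
]
matches, reads = linear_search_by_name(blocks, "Einstein")
print(matches, "found after", reads, "block reads")
```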


Deleting from sorted files

• Simply delete records

• Keep list of available space for future insertions

• Reorganise when file sparsely populated



Inserting Records

• If there is space, insert

• Otherwise insert into overflow block

• Leave space for insertions (fill factor < 100%)
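The insertion rule can be sketched as follows, assuming a toy model in which each block is a sorted Python list with a fixed capacity and a single shared overflow block. The function name, the capacity and the keys are hypothetical.

```python
import bisect
from typing import List


def insert_key(blocks: List[List[int]], overflow: List[int], key: int, capacity: int) -> str:
    """Insert key into the block where it belongs, or into the overflow block."""
    # The first key of each block tells us which block the new key belongs to.
    first_keys = [block[0] for block in blocks]
    target = max(bisect.bisect_right(first_keys, key) - 1, 0)
    if len(blocks[target]) < capacity:
        bisect.insort(blocks[target], key)   # room left: the block stays sorted
        return "block"
    overflow.append(key)                     # block full: overflow is not sorted
    return "overflow"


# Capacity 4; the first block was written with a fill factor of 75%.
blocks = [[10, 20, 30], [40, 50, 60, 70]]
overflow: List[int] = []
print(insert_key(blocks, overflow, 25, 4))   # fits into the first block
print(insert_key(blocks, overflow, 55, 4))   # second block is already full
print(insert_key(blocks, overflow, 27, 4))   # first block is full by now
```

The spare slot left by the fill factor absorbs the first insertion; after that, new keys pile up unsorted in the overflow block.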



The problem of maintaining sorted files

• Overflow block may increase in size over time

• Overflow blocks are not sorted

• After many insertions advantage of sorted file is lost

• May need to reorganise

• Can be done e.g. at night


Sorted files are costly to maintain. Can we do better?


More problems

• Should file be kept on contiguous blocks?

• How else can we find the middle of the file?

• But this makes maintenance harder

• What happens when file cannot be extended further on disk?



Indices


Using a sparse index

• Blocks really need not be contiguous

• For search we can use a sparse index

• Index file much smaller than data file

• Note that overflow blocks no longer needed
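A minimal sketch of a sparse index lookup, assuming one index entry per data block that holds the smallest id in that block. Block addresses, records and names below are invented for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]   # (id, name)

# Hypothetical data blocks, stored at arbitrary (non-contiguous) addresses.
data_blocks: Dict[str, List[Record]] = {
    "blk7": [(112, "Bohr"), (131, "Day")],
    "blk2": [(145, "Abel"), (178, "Korth")],
    "blk9": [(202, "Moore"), (245, "Lewis")],
}

# Sparse index: one (smallest id in block, block address) entry per block, sorted by id.
sparse_index: List[Tuple[int, str]] = [(112, "blk7"), (145, "blk2"), (202, "blk9")]


def lookup(wanted_id: int) -> Optional[Record]:
    keys = [key for key, _ in sparse_index]
    # Last index entry whose key is <= wanted_id.
    position = bisect.bisect_right(keys, wanted_id) - 1
    if position < 0:
        return None                          # smaller than every indexed key
    _, address = sparse_index[position]
    for record in data_blocks[address]:      # one data block read
        if record[0] == wanted_id:
            return record
    return None


print(lookup(178))
print(lookup(150))
```

Because the index stores block addresses, a new data block can be placed anywhere on disk as long as an entry for it is added to the index.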

(Figure: sparse index file)


Sparse index

• For search purposes index file should be kept sorted

• The number of (key, block address) pairs that fit in one block is called the fan out

• We need (number of data blocks) / (fan out) blocks for the index file

• Index file could potentially be too large to store in single block

• So may need to apply same trick to index file



Multilevel indexing

(Figure: multilevel index. An outer index points into an inner index, which points to the data blocks.)


An example

• Suppose there are 10,000 blocks in the data file

• Suppose fan out is 100

• We then need 10,000 / 100 = 100 blocks for the inner index

• We need 1 block for the outer index

• A search using the index requires 3 block reads

• A linear search requires 10,000 block reads

• If there are 1,000,000 blocks of data we need just 1 more index level
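The arithmetic in this example can be checked with a short sketch. It assumes every index block is completely full, so each level is smaller than the one below by a factor equal to the fan out; the function name is made up for illustration.

```python
import math
from typing import List


def index_levels(data_blocks: int, fan_out: int) -> List[int]:
    """Number of index blocks on each level, innermost level first."""
    levels = []
    blocks = data_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / fan_out)   # one entry per block of the level below
        levels.append(blocks)
    return levels


for n in (10_000, 1_000_000):
    levels = index_levels(n, 100)
    # One read per index level plus one read for the data block itself.
    print(f"{n} data blocks: index levels {levels}, {len(levels) + 1} block reads per search")
```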



Secondary indices


Multiple indexes

• Recall the query

select * from instructor where name = 'Einstein'

• Sorted structure on id does not help us here

• If we run this kind of query often, we may need a secondary index

• Compare to phone books sorted by business type with an index sorted by company name


Secondary indices

• Index over attribute(s) for which data file is not sorted

• Attribute(s) need not be candidate key

• Inner index must be dense, i.e. one entry for each tuple

• But is usually still much smaller than data file

• Outer indices need not be dense, because inner index sorted

• Will need 1 more level of indexing for the secondary index
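A minimal sketch of a dense secondary index on name, assuming the data file is sorted by id and the index stores one (name, block number) entry per tuple. All rows and names in the code are hypothetical.

```python
import bisect
from typing import List, Tuple

Record = Tuple[int, str]   # (id, name)

# Hypothetical data file: blocks are sorted by id, not by name.
data_blocks: List[List[Record]] = [
    [(101, "Mozart"), (102, "Einstein")],
    [(103, "Bohr"), (104, "Abel")],
    [(105, "Einstein"), (106, "Korth")],
]

# Dense secondary index: one (name, block number) entry per tuple, sorted by name.
secondary_index: List[Tuple[str, int]] = sorted(
    (name, block_no)
    for block_no, block in enumerate(data_blocks)
    for _, name in block
)


def find_by_name(wanted_name: str) -> List[Record]:
    # Binary search for the first index entry with this name.
    start = bisect.bisect_left(secondary_index, (wanted_name, -1))
    block_numbers: List[int] = []
    for name, block_no in secondary_index[start:]:
        if name != wanted_name:
            break                            # entries are sorted, so we can stop
        if block_no not in block_numbers:
            block_numbers.append(block_no)
    # Read only the data blocks that the index points to.
    return [r for b in block_numbers for r in data_blocks[b] if r[1] == wanted_name]


print(find_by_name("Einstein"))
```

The index has to be dense because matching tuples can sit in any data block; a sparse entry per block would say nothing about the names inside it.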



Secondary index

(Figure: secondary index. Labels: primary index, data blocks, secondary index.)


Types of indices

• Clustered vs. non-clustered

- Clustered indices: files sorted by indexing attribute

- Secondary indices are non-clustered

• Sparse vs. dense

- Dense indices have one entry per value

- Sparse indices have one entry per block



B+ trees


B+ trees

• Are a multilevel index structure

• Can be used for storage or indexing

• Key features

- Easily maintainable

- Also number of levels can grow

- Search trees are always balanced

• Balance is the key to efficiency



B+ trees example



Node structure

• Ks are keys, Ps are pointers

• In internal nodes P_{i+1} points to the subtree containing only search keys x with K_i ≤ x < K_{i+1}

• Search key values are ordered: K_1 < K_2 < K_3 < ...

• Fan out is n
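The pointer rule translates directly into a search routine. The sketch below is an assumption-laden illustration: nodes are plain Python dictionaries, pointers are 0-based list positions (so the slide's P1 is position 0), and the tiny tree is made up.

```python
import bisect
from typing import Any, Dict, List


def child_position(keys: List[int], x: int) -> int:
    """0-based position of the pointer to follow in an internal node.

    bisect_right counts the keys that are <= x. If that count is i, then
    Ki <= x < K(i+1), so the pointer to follow is P(i+1), stored at position i.
    """
    return bisect.bisect_right(keys, x)


def search(node: Dict[str, Any], x: int) -> bool:
    while not node["leaf"]:
        node = node["children"][child_position(node["keys"], x)]
    return x in node["keys"]


# Hypothetical two-level tree with three leaves.
leaves = [
    {"leaf": True, "keys": [5, 10]},
    {"leaf": True, "keys": [20, 30]},
    {"leaf": True, "keys": [40, 50]},
]
root = {"leaf": False, "keys": [20, 40], "children": leaves}

print(child_position(root["keys"], 35))
print(search(root, 30), search(root, 35))
```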


B+ trees

• The tree must be balanced:

- All paths from the root to a leaf node have same length

• If fan out is n then

- Leaf nodes must have between (n-1)/2 and n-1 values

- Internal nodes must have between n/2 and n pointers

- Root node must have at least 2 pointers

- (all numbers above should be rounded up)

• Underfull nodes are a waste of space

• Wasted space may lead to slower queries, since the same data then spreads over more blocks
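The occupancy rules for fan out n can be written out as a small calculation, with the rounding up made explicit. The function and dictionary key names are made up for illustration.

```python
import math
from typing import Dict, Tuple, Union


def occupancy_bounds(n: int) -> Dict[str, Union[Tuple[int, int], int]]:
    """Minimum and maximum occupancy of B+ tree nodes with fan out n."""
    return {
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        "internal_pointers": (math.ceil(n / 2), n),
        "root_min_pointers": 2,
    }


for n in (3, 4, 100):
    print(n, occupancy_bounds(n))
```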



Insertions example

• Insert vince, vera and rob

Example tree:

Root: [judy, rick]

Internal nodes: [bob, jane] [mike, pete] [tom]

Leaf nodes: [abe, al] [bob, eddie] [jane, joe] [judy, karen] [mike, nan] [pete, phil] [rick, sol] [tom]


Insertions rule

• If room, simply insert

• Else split node in two

• This requires insertion on one level up

• When splitting a non-leaf, middle value goes one level up

• To insert into full root, create new level
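The two kinds of split can be sketched as below. This is only a sketch under assumptions: nodes are plain sorted lists, the left half receives the extra key when the count is odd, and a leaf split copies the first key of the new right leaf upwards (a common convention that matches the rule Ki ≤ x < Ki+1, but one the slides do not spell out). The function names and the internal-node example are hypothetical.

```python
from typing import List, Tuple


def split_leaf(keys: List[str]) -> Tuple[List[str], List[str], str]:
    """Split an overfull sorted leaf. The separator is COPIED up to the parent."""
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_internal(keys: List[str], children: List[str]):
    """Split an overfull internal node. The middle key MOVES up to the parent."""
    mid = len(keys) // 2
    up = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, up


# Inserting rob into a full leaf [rick, sol] first makes it overfull.
print(split_leaf(["rick", "rob", "sol"]))

# An internal node with one key too many; c0..c3 stand in for child pointers.
print(split_internal(["bob", "jane", "judy"], ["c0", "c1", "c2", "c3"]))
```

In both cases the returned separator must then be inserted into the parent, which may itself overflow and split; splitting a full root creates a new root with two pointers.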



Deletion example

• Delete jane, bob, joe, eddie, abe and al

• Deletions may cause nodes to be underfull

• This may not be a problem, but in extreme cases it can waste space and time

• When a node is underfull, a pointer should be transferred to it from a neighbouring sibling

• If this is not possible, the nodes should merge
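The choice between transferring and merging can be sketched for leaf nodes as follows. The sketch assumes the underfull leaf sits directly to the left of its sibling and leaves out the matching update of the separator key in the parent; the function name and keys are hypothetical.

```python
from typing import List, Tuple


def rebalance_leaf(underfull: List[str], right_sibling: List[str],
                   min_values: int) -> Tuple[str, List[List[str]]]:
    """Repair an underfull leaf using its right sibling."""
    if len(right_sibling) > min_values:
        # The sibling can spare an entry: move its smallest key over.
        moved = right_sibling[0]
        return "transfer", [underfull + [moved], right_sibling[1:]]
    # The sibling is at its minimum: merge the two leaves into one.
    return "merge", [underfull + right_sibling]


# Fan out 4: leaves hold between 2 and 3 values.
print(rebalance_leaf(["bob"], ["eddie", "jane", "joe"], 2))
print(rebalance_leaf(["bob"], ["jane", "joe"], 2))
```

A merge removes a pointer from the parent, so the parent can become underfull in turn and the same repair is applied one level up.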



Multi column keys

• B+-trees can also be used for composite keys

• e.g. key (order_id, cd_id)

• Keys sorted lexicographically

Example tree:

Root: [(3,56), (7,45)]

Internal nodes: [(1,34), (2,13)] [(4,6), (5,13)] [(8,1)]

Leaf nodes: [(1,2), (1,23)] [(1,34), (2,12)] [(2,13), (3,25)] [(3,56), (4,5)] [(4,6), (4,10)] [(5,13), (7,34)] [(7,45), (7,46)] [(8,1)]


Searching multicolumn B+trees

• Index useful for

- full key queries

select * from purch_cd where purch_id = 1 and cd_id = 34

- partial prefix key queries

select * from purch_cd where purch_id = 1

• Not useful for partial queries not using prefix

select * from purch_cd where cd_id = 13
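Why the prefix matters can be seen from the sort order alone. In the sketch below a sorted Python list of (purch_id, cd_id) tuples stands in for the leaf level of the tree; Python compares tuples lexicographically, which is the ordering assumed here. The keys and function names are illustrative.

```python
import bisect
from typing import List, Tuple

Key = Tuple[int, int]   # (purch_id, cd_id)

# Illustrative composite keys in lexicographic order.
keys: List[Key] = sorted([(1, 2), (1, 23), (1, 34), (2, 12), (2, 13), (3, 25), (4, 5)])


def full_key_query(purch_id: int, cd_id: int) -> bool:
    """Both columns given: a single binary search finds the key."""
    pos = bisect.bisect_left(keys, (purch_id, cd_id))
    return pos < len(keys) and keys[pos] == (purch_id, cd_id)


def prefix_query(purch_id: int) -> List[Key]:
    """Only the first column given: matching keys form one contiguous run."""
    low = bisect.bisect_left(keys, (purch_id,))
    high = bisect.bisect_left(keys, (purch_id + 1,))
    return keys[low:high]


def non_prefix_query(cd_id: int) -> List[Key]:
    """Only the second column given: matches are scattered, so scan everything."""
    return [key for key in keys if key[1] == cd_id]


print(full_key_query(1, 34))
print(prefix_query(1))
print(non_prefix_query(13))
```

The first two functions use binary search, mirroring a descent through the tree; the third has to inspect every key, which is why the index does not help the `cd_id = 13` query.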


Summary

• Storage media organised in hierarchy

- Cache

- Main memory (RAM)

- Disk

• Upper layers smaller but faster than lower layers

• Disk access is much more costly than computations and main memory access

• Performance measured in number of disk accesses needed

• Did not cover RAID



Summary

• Sorted files are costly to maintain

• Multilevel indexed file structures are much more efficient

• B+ trees are multilevel indices which can be maintained efficiently

• B+ trees can be used as

- a storage structure, or

- a structure for a secondary index

• Next time: hash indices
