CSCE 520 Test 2 Info Indexing
Modified from slides of Hector Garcia-Molina and Jeff Ullman


Page 2: CSCE 520  Test 2 Info Indexing

Physical Storage Media

• Speed of data access
• Cost per unit of data
• Reliability
  – Data loss (power failure or system crash)
  – Physical failure (storage device)
• Storage types
  – Volatile storage
  – Non-volatile storage

Page 3: CSCE 520  Test 2 Info Indexing

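Memory Hierarchy

[Figure: memory hierarchy with the levels Cache, Main Memory, Disk (Virtual Memory, File System) and Tertiary Storage; "Programs, Main Memory DBMS" and "DBMS" are shown as the users of these levels]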

Page 4: CSCE 520  Test 2 Info Indexing

Disk Access Characteristics

• Move data to main memory:
  – Position head on cylinder
  – Find and access sector
• Steps of reading a block:
  – Processor and disk controller process the request
  – Seek time: position the head
  – Rotational latency: rotate the sector under the head
  – Transfer time: sector/block read by the head
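The read steps can be added up into a rough per-block cost. The sketch below is only an illustration: the function name, its parameters and the sample figures are assumptions, not values from the slides. It uses the standard average-case estimate that the wanted sector is half a revolution away.

```python
def block_read_time_ms(seek_ms, rpm, transfer_ms, overhead_ms=0.0):
    """Estimate the time to read one block, in milliseconds.

    overhead_ms stands for the processor/controller request handling.
    All inputs are hypothetical illustration values.
    """
    rotation_ms = 60_000.0 / rpm              # one full revolution
    avg_rotational_latency_ms = rotation_ms / 2   # on average, half a turn
    return overhead_ms + seek_ms + avg_rotational_latency_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical disk: 9 ms seek, 7200 rpm, 0.1 ms transfer per block.
    print(round(block_read_time_ms(seek_ms=9.0, rpm=7200, transfer_ms=0.1), 2))
```

With figures like these, seek time and rotational latency dominate, which is why the later slides count block accesses rather than in-memory work.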

Page 5: CSCE 520  Test 2 Info Indexing

Disk Access Characteristics

• Steps of writing a block:
  – Read the block into main memory
  – Change the main-memory copy of the block
  – Write the new content back to disk
  – Verify correctness of the write

Page 6: CSCE 520  Test 2 Info Indexing

How to find records efficiently?

• Primary key – sequential organization
• Search key?
• High I/O cost

INDEXING

Page 7: CSCE 520  Test 2 Info Indexing

Cost of Indexing

• Where is the time spent when answering a query?
  – Fast: processing in memory
  – Slow: fetching from secondary storage
• Cost of indexing:
  – Index on several attributes: fast retrieval but slow writes (the index structure must be maintained)

Page 8: CSCE 520  Test 2 Info Indexing

Topics

• Conventional indexes
• B-trees
• Hashing schemes (read only)

Page 9: CSCE 520  Test 2 Info Indexing

Sequential File

Data blocks (two records each, sorted by key):
  [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]

Page 10: CSCE 520  Test 2 Info Indexing

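Sequential File

Data blocks (two records each, sorted by key):
  [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]

Dense Index

Index blocks (four keys each, one entry per record):
  [10, 20, 30, 40] [50, 60, 70, 80] [90, 100, 110, 120]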
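A dense-index lookup can be sketched in a few lines. This is a minimal in-memory model, assuming the index is a sorted list of (key, block, slot) entries; the names `dense_index` and `dense_lookup` are made up for illustration.

```python
from bisect import bisect_left

# Data file: two records per block, sorted by key (keys from the slide).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number, slot) entry per record, in key order.
dense_index = [(key, b, s)
               for b, block in enumerate(blocks)
               for s, key in enumerate(block)]
index_keys = [entry[0] for entry in dense_index]


def dense_lookup(key):
    """Return (block, slot) for key, or None, consulting only the index."""
    i = bisect_left(index_keys, key)          # binary search over index keys
    if i < len(index_keys) and index_keys[i] == key:
        _, block_no, slot = dense_index[i]
        return block_no, slot
    return None   # absence is decided without reading any data block
```

For example, `dense_lookup(60)` finds the entry in the index alone, and a missing key such as 65 is rejected without touching the data file.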

Page 11: CSCE 520  Test 2 Info Indexing

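Sequential File

Data blocks (two records each, sorted by key):
  [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]

Sparse Index

Index blocks (four keys each, one entry per data block):
  [10, 30, 50, 70] [90, 110, 130, 150] [170, 190, 210, 230]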
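With a sparse index the lookup has two parts: find the last index entry whose key does not exceed the target, then search the data block it points to. The sketch below assumes one index entry per data block holding that block's first key; `sparse_lookup` is a hypothetical name.

```python
from bisect import bisect_right

# Data file: two records per block, sorted by key (keys from the slide).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: the first key of every data block.
sparse_index = [block[0] for block in blocks]     # [10, 30, 50, 70, 90]


def sparse_lookup(key):
    """Return (block, slot) for key, or None if the key is not in the file."""
    i = bisect_right(sparse_index, key) - 1   # last entry with key <= target
    if i < 0:
        return None                           # smaller than every key
    block = blocks[i]                         # this costs one data-block read
    if key in block:
        return i, block.index(key)
    return None
```

Unlike the dense case, deciding that a key such as 45 is absent requires reading the candidate data block.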

Page 12: CSCE 520  Test 2 Info Indexing

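Sequential File

Data blocks (two records each, sorted by key):
  [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]

First-level sparse index blocks:
  [10, 30, 50, 70] [90, 110, 130, 150] [170, 190, 210, 230]

Sparse 2nd level

Second-level index blocks (one entry per first-level index block):
  [10, 90, 170, 250] [330, 410, 490, 570]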
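The two-level search repeats the same step at each level. The following sketch assumes every first-level index block is full and that data blocks are numbered consecutively; `locate_data_block` is a hypothetical helper, and only the first three first-level blocks from the slide are modeled.

```python
from bisect import bisect_right

# First-level sparse index blocks (keys from the slide).
first_level = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

# Second level: first key of each first-level block.
second_level = [blk[0] for blk in first_level]    # [10, 90, 170]


def locate_data_block(key):
    """Return the number of the data block that could hold key, or None."""
    top = bisect_right(second_level, key) - 1     # pick a first-level block
    if top < 0:
        return None
    entries = first_level[top]
    pos = bisect_right(entries, key) - 1          # pick an entry inside it
    # Assumption: all index blocks are full, so block numbers are contiguous.
    return top * len(entries) + pos
```

A search for key 120 reads one second-level block, one first-level block and then one data block, instead of scanning the first-level index.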

Page 13: CSCE 520  Test 2 Info Indexing

Sparse vs. Dense Tradeoff

• Sparse: less index space per record, so more of the index can be kept in memory
• Dense: can tell whether a record exists without accessing the file

Page 14: CSCE 520  Test 2 Info Indexing

Terms

• Index sequential file
• Search key (≠ primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (all search key values in)
• Sparse index
• Multi-level index

Page 15: CSCE 520  Test 2 Info Indexing


Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Page 16: CSCE 520  Test 2 Info Indexing

Duplicate keys

Data blocks (two records each):
  [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]

Page 17: CSCE 520  Test 2 Info Indexing

Duplicate keys

Dense index, one way to implement?

Data blocks: [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Index blocks (one entry per record, duplicates included):
  [10, 10, 10, 20] [20, 30, 30, 30]

Page 18: CSCE 520  Test 2 Info Indexing

Duplicate keys

Dense index, better way?

Data blocks: [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Index block (one entry per distinct key): [10, 20, 30, 40]

Page 19: CSCE 520  Test 2 Info Indexing

Duplicate keys

Sparse index, one way?

Data blocks: [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Index block (first key of each block): [10, 10, 20, 30]

Careful if looking for 20 or 30!
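The warning about 20 and 30 arises because a key can start in the block before the one whose index entry matches. One way to handle it, sketched here under the assumption that the index holds the first key of every block, is to begin one block earlier and scan forward; `find_all` is a hypothetical name.

```python
from bisect import bisect_left

# Data blocks with duplicate keys (from the slide).
blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

# Sparse index: first key of every block.
index = [block[0] for block in blocks]            # [10, 10, 20, 30, 40]


def find_all(key):
    """Return every (block, slot) position holding key."""
    # The block before the first entry >= key may end with key, so start there.
    start = max(bisect_left(index, key) - 1, 0)
    hits = []
    for b in range(start, len(blocks)):
        if index[b] > key:                        # later blocks hold larger keys
            break
        hits.extend((b, s) for s, k in enumerate(blocks[b]) if k == key)
    return hits
```

Looking up 20 this way returns the copy at the end of the second block as well as the one at the start of the third.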

Page 20: CSCE 520  Test 2 Info Indexing

Duplicate keys

Sparse index, another way?
– place first new key from block

Data blocks: [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Index block: [10, 20, 30, 30] (annotation on the last entry: should this be 40?)

Page 21: CSCE 520  Test 2 Info Indexing

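Summary: Duplicate values, primary index

• Index may point to first instance of each value only

[Figure: a File holding the values a, a, a, b and an Index with one entry per distinct value, each pointing to the first instance]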

Page 22: CSCE 520  Test 2 Info Indexing

Deletion from sparse index

Data blocks: [10, 20] [30, 40] [50, 60] [70, 80]
Sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150]

Page 23: CSCE 520  Test 2 Info Indexing

Deletion from sparse index

Data blocks: [10, 20] [30, 40] [50, 60] [70, 80]
Sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150]

– delete record 40

Page 24: CSCE 520  Test 2 Info Indexing

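Deletion from sparse index

Data blocks: [10, 20] [30, 40] [50, 60] [70, 80]
Sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150]

– delete record 30
  Record 40 moves to the front of its block, and the index entry 30 becomes 40.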

Page 25: CSCE 520  Test 2 Info Indexing

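Deletion from sparse index

Data blocks: [10, 20] [30, 40] [50, 60] [70, 80]
Sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150]

– delete records 30 & 40
  The block becomes empty, its index entry is dropped, and the entries 50 and 70 move up.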
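The three deletion cases for a sparse index can be modeled together. This is a simplified in-memory sketch, assuming one index entry per data block; `sparse_delete` is a hypothetical name, and Python lists stand in for disk blocks.

```python
from bisect import bisect_right


def sparse_delete(blocks, index, key):
    """Delete key from the data file and keep the sparse index consistent.

    blocks: list of sorted key lists; index: first key of each block.
    Both are modified in place. Returns True if the key was found.
    """
    i = bisect_right(index, key) - 1          # block that could hold key
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if not blocks[i]:
        # Block is now empty: drop it and its index entry; later entries move up.
        del blocks[i]
        del index[i]
    else:
        # Unchanged unless the deleted record was the first one in the block.
        index[i] = blocks[i][0]
    return True
```

Deleting 40 leaves the index alone, deleting 30 changes the entry to 40, and deleting both removes the entry, matching the three slides.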

Page 26: CSCE 520  Test 2 Info Indexing

Deletion from dense index

Data blocks: [10, 20] [30, 40] [50, 60] [70, 80]
Dense index blocks: [10, 20, 30, 40] [50, 60, 70, 80]

Page 27: CSCE 520  Test 2 Info Indexing

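Deletion from dense index

Data blocks: [10, 20] [30, 40] [50, 60] [70, 80]
Dense index blocks: [10, 20, 30, 40] [50, 60, 70, 80]

– delete record 30
  Record 40 moves to the front of its block; the index entry for 30 is removed and the entry for 40 moves up.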

Page 28: CSCE 520  Test 2 Info Indexing

Insertion, sparse index case

Data blocks (capacity two records): [10, 20] [30, free] [40, 50] [60, free]
Sparse index block: [10, 30, 40, 60]

Page 29: CSCE 520  Test 2 Info Indexing

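Insertion, sparse index case

Data blocks (capacity two records): [10, 20] [30, free] [40, 50] [60, free]
Sparse index block: [10, 30, 40, 60]

– insert record 34
  34 goes into the free slot after 30.

• Our lucky day! We have free space where we need it!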

Page 30: CSCE 520  Test 2 Info Indexing

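Insertion, sparse index case

Data blocks (capacity two records): [10, 20] [30, free] [40, 50] [60, free]
Sparse index block: [10, 30, 40, 60]

– insert record 15
  15 goes into the first block, 20 moves to the second block ([10, 15] [20, 30]), and the index entry 30 becomes 20.

• Illustrated: Immediate reorganization
• Variation:
  – insert new block (chained file)
  – update index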

Page 31: CSCE 520  Test 2 Info Indexing

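Insertion, sparse index case

Data blocks (capacity two records): [10, 20] [30, free] [40, 50] [60, free]
Sparse index block: [10, 30, 40, 60]

– insert record 25
  25 is placed in an overflow block (reorganize later...).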
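The "free space" and "overflow block" cases can be sketched as follows. This is a simplified model that deliberately omits the immediate-reorganization variant; `sparse_insert`, `CAPACITY` and the per-block overflow lists are assumptions made for illustration.

```python
from bisect import bisect_right, insort

CAPACITY = 2   # records per data block, as drawn on the slides


def sparse_insert(blocks, overflow, index, key):
    """Insert key; use the target block if it has room, else its overflow list.

    blocks: sorted key lists; overflow: one list per block;
    index: first key of each block. All are modified in place.
    """
    i = max(bisect_right(index, key) - 1, 0)   # block whose range covers key
    if len(blocks[i]) < CAPACITY:
        insort(blocks[i], key)                 # free slot available
        index[i] = blocks[i][0]                # first key may have changed
        return "in block"
    # No room: chain the record to the block and reorganize later.
    # Caveat: a key below index[0] landing here would need extra handling.
    overflow[i].append(key)
    return "overflow"
```

Inserting 34 fills the free slot next to 30, while inserting 25 finds the first block full and lands in its overflow list; the index is untouched in both cases.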

Page 32: CSCE 520  Test 2 Info Indexing


Insertion, dense index case

• Similar

• Often more expensive . . .

Page 33: CSCE 520  Test 2 Info Indexing

Summary so far

• Conventional index
  – Basic ideas: sparse, dense, multi-level…
  – Duplicate keys
  – Deletion/Insertion
  – Secondary indexes

Page 34: CSCE 520  Test 2 Info Indexing

Conventional indexes

Advantage:
  – Simple
  – Index is sequential file, good for scans

Disadvantage:
  – Inserts expensive, and/or
  – Lose sequentiality & balance

Page 35: CSCE 520  Test 2 Info Indexing

• NEXT: Another type of index
  – Give up on sequentiality of index
  – Try to get “balance”

Page 36: CSCE 520  Test 2 Info Indexing

B+Tree Example (n=3)

Root: [100]
Non-leaf nodes: [30] and [120, 150, 180]
Leaves, in sequence:
  [3, 5, 11] [30, 35] [100, 101, 110] [120, 130] [150, 156, 179] [180, 200]

Page 37: CSCE 520  Test 2 Info Indexing

Sample non-leaf

Keys: 57, 81, 95

Pointers, left to right:
  – to keys k < 57
  – to keys 57 ≤ k < 81
  – to keys 81 ≤ k < 95
  – to keys k ≥ 95
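Choosing which pointer to follow in a non-leaf node is a search for the number of keys that are less than or equal to the target. A minimal sketch, assuming the keys are kept in a sorted Python list (`child_slot` is a hypothetical name):

```python
from bisect import bisect_right


def child_slot(keys, k):
    """Return the position of the pointer to follow for search key k.

    Slot 0 covers k < keys[0]; slot i covers keys[i-1] <= k < keys[i];
    the last slot covers k >= keys[-1].
    """
    return bisect_right(keys, k)


if __name__ == "__main__":
    node = [57, 81, 95]               # the sample non-leaf node
    for k in (10, 57, 80, 81, 95, 200):
        print(k, "->", child_slot(node, k))
```

`bisect_right` is used rather than `bisect_left` so that a key equal to 57 goes to the second pointer, as the ranges on the slide require.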

Page 38: CSCE 520  Test 2 Info Indexing

Sample leaf node:

[Figure: leaf node with keys 57, 81, 95, reached by a pointer from a non-leaf node. Each key has a pointer to the record with that key (57, 81, 95), and the last pointer leads to the next leaf in sequence.]

Page 39: CSCE 520  Test 2 Info Indexing

Size of nodes (fixed):
  – n+1 pointers
  – n keys

Page 40: CSCE 520  Test 2 Info Indexing

Don’t want nodes to be too empty

• Use at least
  – Non-leaf: ⌈(n+1)/2⌉ pointers
  – Leaf: ⌊(n+1)/2⌋ pointers to data

Page 41: CSCE 520  Test 2 Info Indexing

n=3         Full node        Min. node
Non-leaf    120, 150, 180    30
Leaf        3, 5, 11         30, 35

(Figure annotation: a pointer counts even if null.)

Page 42: CSCE 520  Test 2 Info Indexing

B+tree rules (tree of order n)

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records except for “sequence pointer”

Page 43: CSCE 520  Test 2 Info Indexing

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ − 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
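The minimum-occupancy entries can be computed for a concrete n. The sketch assumes the usual textbook reading of the bounds, with a ceiling for non-leaf nodes and a floor for leaves; `min_occupancy` is a hypothetical helper.

```python
def min_occupancy(n):
    """Return {node type: (min pointers, min keys)} for a B+tree of order n."""
    nonleaf_ptrs = (n + 2) // 2     # integer form of ceil((n + 1) / 2)
    leaf_ptrs = (n + 1) // 2        # floor((n + 1) / 2) pointers to data
    return {
        "nonleaf": (nonleaf_ptrs, nonleaf_ptrs - 1),
        "leaf": (leaf_ptrs, leaf_ptrs),
        "root": (1, 1),
    }


if __name__ == "__main__":
    for n in (3, 4):
        print(n, min_occupancy(n))
```

For n=3 this gives a non-leaf minimum of 2 pointers and 1 key and a leaf minimum of 2 keys, which agrees with the "min. node" examples (30 and 30, 35).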

Page 44: CSCE 520  Test 2 Info Indexing

Insert into B+tree (read only)

(a) simple case – space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

Page 45: CSCE 520  Test 2 Info Indexing

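(a) Insert key = 32 (n=3)

Non-leaf keys shown: 30, 100
Leaves: [3, 5, 11] [30, 31]
After the insert: 32 goes into the free slot of the second leaf, giving [30, 31, 32].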

Page 46: CSCE 520  Test 2 Info Indexing

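(b) Insert key = 7 (n=3)

Non-leaf keys shown: 30, 100
Leaves: [3, 5, 11] [30, 31]
After the insert: the leaf [3, 5, 11] is full, so it splits into [3, 5] and [7, 11], and key 7 is added to the parent.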
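The simple case and the leaf-overflow case differ only in whether the leaf exceeds n keys after the insert. The sketch below models a leaf as a sorted list and assumes the common convention that the left half keeps ⌈(n+1)/2⌉ keys; `insert_into_leaf` is a hypothetical name, and updating the parent is left out.

```python
from bisect import insort


def insert_into_leaf(leaf, key, n):
    """Insert key into a leaf holding at most n keys.

    Returns (left, right, separator). right and separator are None when no
    split was needed; otherwise separator is the key to add to the parent.
    """
    keys = list(leaf)
    insort(keys, key)                  # keep the leaf sorted
    if len(keys) <= n:
        return keys, None, None        # (a) simple case: space available
    cut = (len(keys) + 1) // 2         # (b) overflow: left keeps the larger half
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]       # first key of the new leaf goes up
```

Inserting 32 into [30, 31] needs no split, while inserting 7 into the full leaf [3, 5, 11] produces [3, 5] and [7, 11] with 7 as the key for the parent.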

Page 47: CSCE 520  Test 2 Info Indexing

Deletion from B+tree

(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

Page 48: CSCE 520  Test 2 Info Indexing

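(b) Coalesce with sibling (n=4)
– Delete 50

Parent: [10, 40, 100]
Leaves: [10, 20, 30] [40, 50]
After the delete: 40 moves into the left sibling, giving [10, 20, 30, 40], and key 40 is removed from the parent.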

Page 49: CSCE 520  Test 2 Info Indexing

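(c) Redistribute keys (n=4)
– Delete 50

Parent: [10, 40, 100]
Leaves: [10, 20, 30, 35] [40, 50]
After the delete: 35 moves into the right leaf, giving [10, 20, 30] [35, 40], and the parent key 40 becomes 35.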
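The choice between coalescing and redistributing depends on whether the two siblings fit into one node. A minimal sketch for two adjacent leaves, assuming the leaf minimum of ⌊(n+1)/2⌋ keys and deletion from the right-hand leaf; `delete_from_right_leaf` is a hypothetical name, and changes higher in the tree are not modeled.

```python
def delete_from_right_leaf(left, right, separator, key, n):
    """Delete key from the right leaf of a sibling pair.

    Returns (action, left, right, separator); right and separator are None
    after a coalesce, meaning the parent entry must be removed.
    """
    left, right = list(left), list(right)
    right.remove(key)
    min_keys = (n + 1) // 2
    if len(right) >= min_keys:
        return "simple", left, right, separator          # (a) still full enough
    if len(left) + len(right) <= n:
        return "coalesce", left + right, None, None      # (b) merge into one leaf
    # (c) borrow the largest key of the left sibling; it becomes the separator.
    moved = left.pop()
    right.insert(0, moved)
    return "redistribute", left, right, moved
```

With n=4, deleting 50 from [40, 50] next to [10, 20, 30] coalesces the leaves, while next to [10, 20, 30, 35] it redistributes and the separator becomes 35.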

Page 50: CSCE 520  Test 2 Info Indexing

B+tree deletions in practice

– Often, coalescing is not implemented
– Too hard and not worth it!