Page 1:

1

Database Tuning
Principles, Experiments and Troubleshooting Techniques

Dennis Shasha (shasha@cs.nyu.edu)
Philippe Bonnet (bonnet.p@gmail.com)

Page 2:

2

Availability of Materials

1. The PowerPoint presentation is available on my web site (and Philippe Bonnet's). Just type our names into Google.

2. Experiments are available from a site in Denmark maintained by Philippe (URL given a few slides ahead).

3. A book with the same title is available from Morgan Kaufmann.

Page 3:

3

Database Tuning

Database Tuning is the activity of making a database application run more quickly. "More quickly" usually means higher throughput, though it may mean lower response time for time-critical applications.

Page 4:

4

[Diagram: layers of a database system and who tunes them]
Personas: Application Programmer (e.g., business analyst, data architect); Sophisticated Application Programmer (e.g., SAP admin); DBA, Tuner.
Layers: Application; Query Processor; Storage Subsystem, Indexes; Concurrency Control, Recovery; Operating System; Hardware [Processor(s), Disk(s), Memory].

Page 5:

5

Outline of Tutorial

1. Basic Principles
2. Tuning the guts
3. Indexes
4. Relational Systems
5. Application Interface
6. E-commerce Applications
7. Data warehouse Applications
8. Distributed Applications
9. Troubleshooting

Page 6:

6

Goal of the Tutorial

• To show:
– Tuning principles that port from one system to another and to new technologies
– Experimental results that show the effect of these tuning principles
– Troubleshooting techniques for chasing down performance problems

Page 7:

7

Tuning Principles Leitmotifs

• Think globally, fix locally (does it matter?)

• Partitioning breaks bottlenecks (temporal and spatial)

• Start-up costs are high; running costs are low (disk transfer, cursors)

• Be prepared for trade-offs (indexes and inserts)

Page 8:

8

Experiments -- why and where

• Simple experiments to illustrate the performance impact of tuning principles.

• Philippe Bonnet has lots of code on his course’s site: https://learnit.itu.dk/mod/workshop/view.php?id=43059

Page 9:

9

Experimental DBMS and Hardware

• The results presented here were obtained with older systems, but the conclusions mostly hold:
– SQL Server 7, SQL Server 2000, Oracle 8i, Oracle 9i, DB2 UDB 7.1
– Three configurations:
1. Dual Xeon (550 MHz, 512 KB cache), 1 GB RAM, internal Adaptec RAID controller (80 MB cache), 2 Ultra160 channels, 4x18 GB drives (10,000 RPM), Windows 2000.
2. Dual Pentium II (450 MHz, 512 KB cache), 512 MB RAM, 3x18 GB drives (10,000 RPM), Windows 2000.
3. Pentium III (1 GHz, 256 KB cache), 1 GB RAM, Adaptec 39160 controller with 2 channels, 3x18 GB drives (10,000 RPM), Debian Linux (kernel 2.4).

Page 10:

10

Tuning the Guts

• Concurrency Control– How to minimize lock contention?

• Recovery– How to manage the writes to the log (to dumps)?

• OS– How to optimize buffer size, process scheduling, …

• Hardware– How to allocate CPU, RAM and disk subsystem

resources?

Page 11:

11

Isolation

• Correctness vs. Performance
– Number of locks held by each transaction
– Kind of locks
– Length of time a transaction holds locks

Page 12:

12

Isolation Levels
• Read Uncommitted (no lost updates)
– Exclusive locks for write operations are held for the duration of the transaction
– No locks for reads
• Read Committed (no dirty reads)
– Shared locks are released as soon as the read operation terminates
• Repeatable Read (no unrepeatable reads for read/write operations)
– Two-phase locking
• Serializable (read/write/insert/delete model)
– Table locking or index locking to avoid phantoms
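The isolation level is chosen per session or per transaction; a minimal sketch (ANSI SQL syntax as accepted by SQL Server and Oracle; DB2 spells it differently and the exact scope varies by system), using the accounts table from the experiments that follow:

   set transaction isolation level serializable;
   select sum(balance) from accounts;
   commit;

   set transaction isolation level read committed;
   select sum(balance) from accounts;
   commit;

Running the summation at the weaker level trades correctness for concurrency, which is exactly the trade-off measured on the next slides.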

Page 13:

13

Snapshot Isolation

[Timeline diagram: transactions T1, T2, T3 running against initial state X=Y=Z=0; the operations shown include W(Y:=1), W(X:=2, Z:=3), R(X) returns 0, R(Y) returns 1, R(Z) returns 0. Each read returns the value committed when the reading transaction started.]

• Each transaction executes against the version of the data items that was committed when the transaction started:
– No locks for reads
– Costs space (old copies of data must be kept)
• Almost, but not quite, serializable:
– T1: x := y; T2: y := x
– Initially x = 3 and y = 17
– Serial execution: x, y = 17 or x, y = 3
– Snapshot isolation: x = 17, y = 3 if both transactions start at the same time.

Page 14:

14

Value of Serializability -- Data

Settings:
accounts(number, branchnum, balance);

create clustered index c on accounts(number);

– 100000 rows

– Cold buffer; same buffer size on all systems.

– Row level locking

– Isolation level (SERIALIZABLE or READ COMMITTED)

– SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000

– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Page 15:

15

Value of Serializability -- transactions

Concurrent Transactions:
– T1: summation query [1 thread]
select sum(balance) from accounts;
– T2: swap balances between two account numbers (in order of scan to avoid deadlocks) [N threads]
valX := select balance from accounts where number = X;
valY := select balance from accounts where number = Y;
update accounts set balance = valX where number = Y;
update accounts set balance = valY where number = X;

Page 16:

16

Value of Serializability -- results

• With SQL Server and DB2 the scan returns incorrect answers if the read committed isolation level is used (default setting)

• With Oracle correct answers are returned (snapshot isolation).

[Charts: Oracle and SQL Server. Ratio of correct answers (0 to 1) vs. number of concurrent update threads (0 to 10), for read committed and serializable.]

Page 17:

17

Cost of Serializability

Because the update conflicts with the scan, correct answers are obtained at the cost of decreased concurrency and thus decreased throughput.

[Charts: Oracle and SQL Server. Throughput (trans/sec) vs. number of concurrent update threads (0 to 10), for read committed and serializable.]

Page 18:

18

Locking Implementation

@ Dennis Shasha and Philippe Bonnet, 2013

Lock table (one entry per database item, e.g., row or table):
  Item x: lock set = {LO1(T1,S), LO3(T3,S)}, wait set = {LW2(T2,X)}
  Item y: lock set = {LO2(T2,X)}, wait set = {LW4(T4,S), LW5(T5,X)}

Transaction table (locks held or requested per transaction):
  T1: LO1    T2: LO2, LW2    T3: LO3    T4: LW4    T5: LW5

Page 19:

Latches and Locks

• Locks are used for concurrency control
– Requests for locks are queued (priority queue)
– Lock data structure: locking mode (S, X), lock granularity, transaction id
– Lock table
• Latches are used for mutual exclusion
– A request for a latch either succeeds or fails
– Active wait (spinning) on latches on multiprocessors
– Single location in memory, manipulated with test-and-set

@ Dennis Shasha and Philippe Bonnet, 2013

Page 20:

Phantom Problem

Table R:
  E#  Name   age
  01  Smith  35
  02  Jones  28

T1: insert into R values (03, 'Smythe', 42); commit
T2: select max(age) from R where Name like 'Sm%';
    select max(age) from R where Name like 'Sm%';

Any serializable execution returns the same max value twice (35 if T2 executes before T1, or 42 if T1 executes before T2). In the interleaved schedule shown, T2 gets locks on the existing rows, reads all rows and computes max (35); T1 then inserts (03, Smythe, 42) and commits; T2 reads all rows again, computes max (42), releases its locks and commits. T2 thus returns two different max values (35, then 42), even though snapshot isolation is not in effect.

Two-phase locking with row locks does not prevent concurrent insertions, because row locks only protect existing rows.

@ Dennis Shasha and Philippe Bonnet, 2013

Page 21:

Solution to Phantom Problem

• Solution #1: Table locking (mode X)
– No insertion is allowed in the table
– Problem: too coarse if a predicate is used in transactions
• Solution #2: Predicate locking, i.e., prevent inserting tuples that satisfy a given predicate
– E.g., 30 < age < 40
– Problem: very complex to implement
• Solution #3: Next-key locking (NS)
– See index tuning

[Venn diagram: the set of all tuples that can be inserted in R contains the set of tuples in R; the tuples satisfying predicate P overlap both sets.]

@ Dennis Shasha and Philippe Bonnet, 2013

Page 22:

22

Locking Overhead -- data

Settings:
accounts(number, branchnum, balance);
create clustered index c on accounts(number);
– 100000 rows
– Cold buffer
– SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000
– No lock escalation on Oracle; parameter set so that there is no lock escalation on DB2; no control on SQL Server.

– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Page 23:

23

Locking Overhead -- transactions

No Concurrent Transactions:
– Update [10,000 updates]:
update accounts set balance = Val;
– Insert [10,000 transactions], e.g., a typical one:
insert into accounts values (664366, 72255, 2296.12);

Page 24:

24

Locking Overhead

Row locking is barely more expensive than table locking because recovery overhead dominates locking overhead.
– Exception: updates on DB2, where table locking is distinctly less expensive than row locking (see the sketch below for requesting table-level locks).
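If table-level locking is what you want for such a single-threaded batch update, it can be requested explicitly; a minimal sketch (DB2 syntax for both statements, and Oracle also accepts the LOCK TABLE form; the update itself is illustrative):

   alter table accounts locksize table;     -- DB2: take one table lock instead of row locks
   -- or, per transaction:
   lock table accounts in exclusive mode;
   update accounts set balance = 0;
   commit;
   alter table accounts locksize row;       -- restore row-level locking for normal operation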

[Chart: throughput ratio (row locking / table locking), 0 to 1, for update and insert workloads, on DB2, SQL Server and Oracle.]

Page 25:

25

Logical Bottleneck: Sequential Key generation

• Consider an application in which one needs a sequential number to act as a key in a table, e.g. invoice numbers for bills.

• Ad hoc approach: a separate table holding the last invoice number. Fetch and update that number on each insert transaction.

• Counter approach: use facility such as Sequence (Oracle)/Identity(MSSQL).
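A minimal sketch of the counter approach (Oracle sequence and SQL Server identity syntax; the invoices table and its columns are illustrative):

   create sequence invoice_seq start with 1 increment by 1 cache 100;   -- Oracle; CACHE hands out numbers from memory
   create table invoices (invoice_no int, customer int, amount numeric(10,2));
   insert into invoices values (invoice_seq.nextval, 42, 99.95);

   create table invoices (invoice_no int identity(1,1), customer int, amount numeric(10,2));  -- SQL Server
   insert into invoices (customer, amount) values (42, 99.95);          -- invoice_no is generated automatically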

Page 26:

26

Counter Facility -- data

Settings:

– default isolation level: READ COMMITTED; Empty tables

– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

accounts(number, branchnum, balance);
create clustered index c on accounts(number);

counter(nextkey);
insert into counter values (1);

Page 27:

27

Counter Facility -- transactions

No Concurrent Transactions:
– System [100,000 inserts, N threads]
• SQL Server 7 (uses an Identity column):
insert into accounts values (94496, 2789);
• Oracle 8i:
insert into accounts values (seq.nextval, 94496, 2789);
– Ad hoc [100,000 inserts, N threads]:
begin transaction
  NextKey := select nextkey from counter;
  update counter set nextkey = NextKey + 1;
  insert into accounts values (NextKey, ?, ?);
commit transaction

Page 28:

28

Avoid Bottlenecks: Counters

• A system-generated counter (system) is much better than a counter managed as an attribute value within a table (ad hoc). The system counter operates outside the enclosing transaction.
• The Oracle counter can become a bottleneck if every increment is logged to disk, but caching many counter numbers is possible.
• Counters may skip ids (e.g., after rollbacks).

[Charts: SQL Server and Oracle. Throughput (statements/sec) vs. number of concurrent insertion threads (0 to 50), for the system counter vs. the ad-hoc counter.]

Page 29:

31

Semantics-Altering Chopping

You call up an airline and want to reserve a seat. You talk to the agent, find a seat and then reserve it.

Should all of this be a single transaction?

Page 30:

32

Semantics-Altering Chopping II

Probably not. You don’t want to hold locks through human interaction.

Transaction redesign: the read (finding a free seat) is its own transaction; reserving the seat is another.

Consequences?
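A minimal sketch of the chopped design (the seats table and its columns are hypothetical; the second transaction re-checks that the seat is still free, which is the price of not holding locks between the two steps):

   -- Transaction 1: browse free seats while talking to the customer (no locks held afterwards)
   select seatno from seats where flight = 100 and occupant is null;

   -- Transaction 2: reserve the chosen seat only if it is still free
   update seats set occupant = 'Smith'
   where flight = 100 and seatno = '22A' and occupant is null;
   commit;
   -- if zero rows were updated, someone else took the seat in the meantime: pick another one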

Page 31:

33

Atomicity and Durability

• Every transaction either commits or aborts. It cannot change its mind

• Even in the face of failures:
– Effects of committed transactions should be permanent;
– Effects of aborted transactions should leave no trace.

[State diagram: Ø, then (begin trans) ACTIVE (running, waiting), then either (commit) COMMITTED or (rollback) ABORTED.]

Page 32:

34

[Diagram: recovery architecture. Unstable storage holds the log buffer (log records lri, lrj) and the database buffer (pages Pi, Pj); stable storage holds the log and the data. Log records are written before commit; modified pages are written after commit; recovery reads the log.]

Page 33:

Physical Logging (pure)
• Update records contain before and after images
– Roll back: install the before image
– Roll forward: install the after image
• Pros:
– Idempotent. If a crash occurs during recovery, recovery simply restarts, since phase 2 (roll forward) and phase 3 (roll back) rely on operations that are idempotent.
• Cons:
– A single SQL statement might touch many pages and thus generate many update records
– The before and after images are large

@ Dennis Shasha and Philippe Bonnet, 2013

Page 34:

Logical Logging (kdb/other main-memory systems)
• Update records contain a logical operation and its inverse instead of before and after images
– E.g., <op: insert t in T, inv: delete t from T>
• Pro: compact
• Cons:
– Not idempotent.
• Solution: include a LastLSN in each database page. During phase 2 of recovery, an operation is rolled forward iff its LSN is higher than the LastLSN of the page.
– Not atomic.
• What if a logical operation actually involves several pages, e.g., a data page and an index page? And possibly several index pages?
• Solution: physiological logging

@ Dennis Shasha and Philippe Bonnet, 2013

Page 35:

Physiological Logging

• Physiological logging
– Physical across pages
– Logical within a page
• Combines the benefits of logical logging and avoids the atomicity problem, as logical mini-operations are bound to a single page
– Logical operations are split into mini-operations on each page
– A log record is created for each mini-operation
– Mini-operations are not idempotent, so page LSNs have to be used in phase 2 of recovery

@ Dennis Shasha and Philippe Bonnet, 2013

Page 36:

Logging in SQL Server, DB2

@ Dennis Shasha and Philippe Bonnet, 2013

[Diagram: log caches (current, flush, free) are flushed to the log with synchronous I/O, with a flush queue and waiting processes; the lazy writer / dbwriter writes dirty pages from the database buffer (pages Pi, Pj) with asynchronous I/O. Log entries contain an LSN plus before and after images or a logical log. Physiological logging.]

Page 37:

Logging in Oracle after 10g

@ Dennis Shasha and Philippe Bonnet, 2013

[Diagram: each process builds redo log records (redo+undo) in a private redo area and in-memory undo, each protected by its own redo allocation latch and in-memory undo latch; the records are copied into the shared redo log buffer (public redo allocation latch, redo copy latches) and written to the log files (#1, #2) by LGWR (log writer), while DBWR (database writer) writes dirty pages (Pi, Pj) from the database buffer and maintains the free list. Log, data and undo reside on disk. Physiological logging.]

Page 38:

40

Log IO -- data

Settings:
lineitem (L_ORDERKEY, L_PARTKEY, L_SUPPKEY,

L_LINENUMBER , L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

– READ COMMITTED isolation level

– Empty table

– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Page 39:

41

Log IO -- transactions

No Concurrent Transactions:

Insertions [300 000 inserts, 10 threads], e.g., insert into lineitem values (1,7760,401,1,17,28351.92,0.04,0.02,'N','O','1996-03-13','1996-02-12','1996-03-22','DELIVER IN PERSON','TRUCK','blithely regular ideas caj');

Page 40:

Group Commits

• For small transactions, log records might have to be flushed to disk before a log page is filled up

• When many small transactions are committed, many IOs are executed to flush half empty pages

• Group commit allows the DBMS to delay a transaction's commit until a log page fills up
– Many transactions are committed together

• Pros: Avoid too many round trips to disk, i.e., improves throughput

• Cons: Increase mean response time

@ Dennis Shasha and Philippe Bonnet, 2013

Page 41:

43

Group Commits

• DB2 UDB v7.1 on Windows 2000

• Log records of many transactions are written together
– Increases throughput by reducing the number of writes
– At the cost of increased minimum response time.

[Chart: throughput (tuples/sec) vs. size of group commit (1 vs. 25).]

Page 42:

Put Log on a Separate Disk

• Improve log writer performance
– HDD: sequential IOs are not disturbed by random IOs
– SSD: minimal garbage collection/wear leveling
• Isolate data and log failures
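A minimal sketch of declaring the log on its own disk at database-creation time (SQL Server syntax; the database name and paths are illustrative; the point is that the log file sits on a different physical drive than the data file):

   create database bank
   on  ( name = bank_data, filename = 'D:\data\bank.mdf' )
   log on ( name = bank_log,  filename = 'E:\log\bank.ldf' );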

@ Dennis Shasha and Philippe Bonnet, 2013

Page 43:

45

Put the Log on a Separate Disk

• DB2 UDB v7.1 on Windows 2000

• 5 % performance improvement if log is located on a different disk

• The controller cache hides the negative impact
– Mid-range server, with Adaptec RAID controller (80 MB RAM) and 2x18 GB disk drives.

[Chart: throughput (tuples/sec) with the log on the same disk vs. a separate disk, with and without controller cache.]

Page 44:

46

Tuning Database Writes

• Dirty data is written to disk:
– When the number of dirty pages is greater than a given parameter (Oracle 8)
– When the number of dirty pages crosses a given threshold (fewer than 3% free pages in the database buffer for SQL Server 7)
– When the log is full, a checkpoint is forced. This can have a significant impact on performance.

Page 45:

47

Tune Checkpoint Intervals

• Oracle 8i on Windows 2000

• A checkpoint (partial flush of dirty pages to disk) occurs at regular intervals or when the log is full:
– Impacts the performance of on-line processing
+ Reduces the size of the log
+ Reduces time to recover from a crash

[Chart: throughput ratio with 0 checkpoints vs. 4 checkpoints during the run.]

Page 46:

48

Database Buffer Size

• Buffer too small: the hit ratio is too small
  hit ratio = (logical accesses - physical accesses) / (logical accesses)
• Buffer too large: paging
• Recommended strategy: monitor the hit ratio and increase the buffer size until the hit ratio flattens out. If there is still paging, buy memory.

[Diagram: database processes access the database buffer in RAM; misses go to the log and data disks; an oversized buffer causes paging to the paging disk.]
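As a sketch of how the hit ratio can be monitored, the classic Oracle query over v$sysstat (other systems expose equivalent counters, e.g., SQL Server's Buffer Manager performance counters):

   select 1 - (phy.value / (db.value + con.value)) as buffer_hit_ratio
   from v$sysstat phy, v$sysstat db, v$sysstat con
   where phy.name = 'physical reads'
     and db.name  = 'db block gets'
     and con.name = 'consistent gets';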

Page 47:

49

Buffer Size -- data

Settings:
employees(ssnum, name, lat, long, hundreds1, hundreds2);
clustered index c on employees(lat); (unused)
– 10 distinct values of lat and long, 100 distinct values of hundreds1 and hundreds2
– 20,000,000 rows (630 MB)

– Warm Buffer

– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000 RPM), Windows 2000.

Page 48:

50

Buffer Size -- queries

Queries:
– Scan query:
select sum(long) from employees;
– Multipoint query:
select * from employees where lat = ?;

Page 49:

51

Database Buffer Size

• SQL Server 7 on Windows 2000

• Scan query:
– LRU (least recently used) does badly when the table spills to disk, as Stonebraker observed 20 years ago.
• Multipoint query:
– Throughput increases with buffer size until all data is accessed from RAM.

[Charts: multipoint query throughput (queries/sec, 0 to 160) and scan query throughput (queries/sec, 0 to 0.1) vs. buffer size (0 to 1000 MB).]

Page 50:

52

Scan Performance -- data

Settings:
lineitem (L_ORDERKEY, L_PARTKEY, L_SUPPKEY,

L_LINENUMBER , L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

– 600,000 rows
– Lineitem tuples are ~160 bytes long
– Cold buffer
– Dual Xeon (550MHz, 512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Page 51:

53

Scan Performance -- queries

Query:
select avg(l_discount) from lineitem;

Page 52:

54

Prefetching

• DB2 UDB v7.1 on Windows 2000

• Throughput increases up to a certain point when prefetching size increases.
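A minimal sketch of how the prefetch size is controlled in DB2 (a tablespace-level setting expressed in pages; the tablespace name is illustrative):

   alter tablespace userspace1 prefetchsize 64;   -- read 64 pages ahead per prefetch request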

[Chart: scan throughput (trans/sec) vs. prefetching size (32 KB to 256 KB).]

Page 53:

55

Usage Factor

• DB2 UDB v7.1 on Windows 2000

• The usage factor is the percentage of the page used by tuples and auxiliary data structures (the rest is reserved for future insertions)
• Scan throughput increases with the usage factor.

[Chart: scan throughput (trans/sec) vs. usage factor (70% to 100%).]

Page 54:

Large Reads

• External algorithms for sorting/hashing manipulate a working set that is larger than RAM (or larger than the buffer space allocated for sorting/hashing)
– The sort or hash is performed in multiple passes
– In each pass, data is read from disk, hashed/sorted/merged in memory and then written back to secondary storage
– Pass N+1 can only start when pass N is done writing data back to secondary storage

@ Dennis Shasha and Philippe Bonnet, 2013

Page 55:

© Dennis Shasha, Philippe Bonnet 2001

Chop Large Update Transactions

• Consider an update-intensive batch transaction (where concurrent access is not an issue).

It can be broken up into short transactions (mini-batches):
+ Does not overfill the log buffers
+ Does not overfill the log files

Example: a transaction that updates, in sorted order, all accounts that had activity on them in a given day. Break it up into mini-batches, each of which accesses 10,000 accounts and then updates a global counter recording how far the batch has progressed, as sketched below.

Note: DB2 has a parameter limiting the portion of the log used by a single transaction (max_log)
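A minimal sketch of the mini-batch loop (T-SQL style; the table and column names, the 10,000-account step and the update itself are illustrative; batchprogress is the hypothetical global counter that records the last account processed so the batch can resume after a failure):

   declare @lo int, @hi int;
   select @lo = lastaccount from batchprogress;
   while @lo < (select max(number) from accounts)
   begin
     set @hi = @lo + 10000;
     begin transaction;
       update accounts set balance = balance * 1.01
       where number > @lo and number <= @hi;
       update batchprogress set lastaccount = @hi;   -- remember how far we got
     commit transaction;
     set @lo = @hi;
   end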

Page 56:

58

Tuning the Storage Subsystem

Page 57:

59

Outline

• Storage Subsystem Components
– Moore's law and consequences
– Magnetic disk performance
• From SCSI to SAN, NAS and Beyond
– Storage virtualization
• Tuning the Storage Subsystem
– RAID levels
– RAID controller cache

Page 58:

60

Exponential Growth

Moore's law
– Every 18 months:
• New processing power = sum of all existing processing power
• New storage = sum of all existing storage
– 2x / 18 months ~ 100x / 10 years

http://www.intel.com/research/silicon/moorespaper.pdf

Page 59:

61

Consequences of “Moore’s law”

• Over the last decade:
– 10x better access time
– 10x more bandwidth
– 100x more capacity
– 4000x lower media price
– Scanning a full disk takes 10x longer (45 min vs. 3 min)

– Data on disk is accessed 25x less often (on average)

Page 60:

63

Memory Hierarchy

[Table: access time and price per MB across the hierarchy]
  Processor cache:      ~1 ns                ~$100 / MB
  RAM/flash:            ~10x slower          ~$10 / MB
  Disks:                ~10^6x slower        ~$0.2 / MB
  Tapes/optical disks:  ~10^10x slower       ~$0.2 / MB (nearline)

Page 61:

64

Magnetic Disks

1956: IBM RAMAC, the first disk drive: 5 MB, 0.002 Mb/in2, $35,000/year, 9 KB/sec
1980: Seagate, the first 5.25" disk drive: 5 MB, 1.96 Mb/in2, 625 KB/sec
1999: IBM Microdrive, the first 1" disk drive: 340 MB, 6.1 MB/sec

[Diagram: disk anatomy, showing the platters, tracks, spindle, read/write head, disk arm, actuator, controller and disk interface.]

Page 62:

65

Magnetic Disks

• Access Time (2001)
– Controller overhead (0.2 ms)
– Seek time (4 to 9 ms)
– Rotational delay (2 to 6 ms)
– Read/write transfer time (10 to 500 KB/ms)
• Disk Interface
– IDE (16 bits, Ultra DMA - 25 MHz)
– SCSI: width (narrow 8 bits vs. wide 16 bits), frequency (Ultra3 - 80 MHz)

http://www.pcguide.com/ref/hdd/

Page 63:

Solid State Drive (SSD)

@ Dennis Shasha and Philippe Bonnet, 2013

[Diagram: SSD internals. Read and write commands on the logical address space go through scheduling & mapping, wear leveling and garbage collection, which issue read/program/erase operations against a flash memory array of chips spread over channels (physical address space). Example on a disk with 1 channel and 4 chips, contrasting four parallel reads with four parallel writes (channel-bound vs. chip-bound behavior, since page transfers share the channel while page programs proceed on separate chips).]

Page 64:

Performance Contract

• Disk (spinning media)
– The block device abstraction hides a lot of complexity while providing a simple performance contract:
• Sequential IOs are orders of magnitude faster than random IOs
• Contiguity in the logical address space favors sequential IOs
• Flash (solid state drive)
– No intrinsic performance contract
– A few invariants:
• No need to avoid random IOs
• Applications should avoid writes smaller than a flash page
• Applications should fill up the device IO queues (but not overflow them) so that the SSD can leverage its internal parallelism

@ Dennis Shasha and Philippe Bonnet, 2013

Page 65:

68

RAID Levels

• RAID 0: striping (no redundancy)
• RAID 1: mirroring (2 disks)
• RAID 5: parity checking
– Read: stripes read from multiple disks (in parallel)
– Write: 2 reads + 2 writes
• RAID 10: striping and mirroring
• Software vs. hardware RAID:
– Software RAID: runs on the server's CPU
– Hardware RAID: runs on the RAID controller's CPU

Page 66:

69

Why 4 read/writes when updating a single stripe using RAID 5?

• Read old data stripe; read parity stripe (2 reads)

• XOR old data stripe with replacing one.

• Take result of XOR and XOR with parity stripe.

• Write new data stripe and new parity stripe (2 writes).
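Equivalently, as one equation (a restatement of the steps above): $P_{\text{new}} = P_{\text{old}} \oplus D_{\text{old}} \oplus D_{\text{new}}$, which is why a small RAID 5 write costs two reads (old data, old parity) and two writes (new data, new parity).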

Page 67:

70

RAID Levels -- data

Settings:
accounts(number, branchnum, balance);
create clustered index c on accounts(number);

– 100000 rows

– Cold Buffer

– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Page 68:

71

RAID Levels -- transactions

No Concurrent Transactions:
– Read intensive:
select avg(balance) from accounts;
– Write intensive, e.g., a typical insert:
insert into accounts values (690466, 6840, 2272.76);

Writes are uniformly distributed.

Page 69:

72

RAID Levels

• SQL Server 7 on Windows 2000 (SoftRAID means striping/parity done at the host)
• Read-intensive:
– Using multiple disks (RAID 0, RAID 10, RAID 5) increases throughput significantly.
• Write-intensive:
– Without cache, RAID 5 suffers. With cache, it is OK.

[Charts: write-intensive throughput (tuples/sec, 0 to 160) and read-intensive throughput (tuples/sec, 0 to 80,000) for Soft-RAID5, RAID5, RAID0, RAID10, RAID1 and a single disk.]

Page 70:

73

RAID Levels

• Log file
– RAID 1 is appropriate: fault tolerance with high write throughput. Writes are synchronous and sequential; striping brings no benefit.
• Temporary files
– RAID 0 is appropriate: no fault tolerance needed, high throughput.
• Data and index files
– RAID 5 is best suited for read-intensive apps, or if the RAID controller cache is effective enough.
– RAID 10 is best suited for write-intensive apps.

Page 71:

74

Controller Prefetching no, Write-back yes.

• Read-ahead:
– Prefetching at the disk controller level.
– No information on the access pattern.
– Better to let the database management system do it.
• Write-back vs. write-through:
– Write-back: the transfer terminates as soon as data is written to the controller cache.
• Batteries guarantee the write-back in case of power failure.
– Write-through: the transfer terminates as soon as data is written to disk.

Page 72:

75

SCSI Controller Cache -- data

Settings:
employees(ssnum, name, lat, long, hundreds1, hundreds2);
create clustered index c on employees(hundreds2);
– Employees table partitioned over two disks; log on a separate disk; same controller (same channel).
– 200,000 rows per table
– Database buffer size limited to 400 MB.
– Dual Xeon (550MHz, 512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Page 73:

76

SCSI (not disk) Controller Cache -- transactions

No Concurrent Transactions:
update employees set lat = long, long = lat where hundreds2 = ?;

– cache friendly: update of 20,000 rows (~90Mb)

– cache unfriendly: update of 200,000 rows (~900Mb)

Page 74:

77

SCSI Controller Cache

• SQL Server 7 on Windows 2000.

• Adaptec ServerRAID controller:
– 80 MB RAM
– Write-back mode
• Updates: the controller cache increases throughput whether the operation is cache friendly or not.
– Efficient replacement policy!

[Chart: 2 disks, 80 MB cache. Throughput (tuples/sec) for cache-friendly (90 MB) and cache-unfriendly (900 MB) updates, with and without the controller cache.]

Page 75:

Dealing with Multi-Core

@ Dennis Shasha and Philippe Bonnet, 2013

LOOK UP: SW for shared multi-core, Interrupts and IRQ tuning

[Diagram: two sockets, each with two cores (each core: CPU + L1 cache) sharing an L2 cache, connected over the system bus to RAM and to IO/NIC interrupts.]

Cache locality is king:
• Processor affinity
• Interrupt affinity
• Spinning vs. blocking

Page 76:

Row Store Page Layout

structure record_id { integer page_id; integer row_id; }

procedure RECORD_ID_TO_BYTES(record_id rid) returns bytes {
  pid = rid.page_id;
  p = PAGE_ID_TO_PAGE(pid);                            // fetch the page into the buffer
  byte byte_array[PAGE_SIZE_IN_BYTES];
  byte_array = p.contents;
  byte_address = byte_array + PAGE_SIZE_IN_BYTES - 1;  // end of the page, where the slot array starts
  row_address = byte_address - rid.row_id * 2;         // each slot (address) entry is 2 bytes
  return RECORD_ADDRESS_TO_BYTES(row_address);         // follow the slot entry to the record's bytes
}

Page 77:

Record Structure

[Diagram: record layout; a procedure column_id_to_bytes(...) returns the bytes of a given column within a record.]

Page 78:

Storing Large Attributes

Page 79:

Columnstore Ids

• Explicit IDs
– Expand size on disk
– Expand size when transferring data to RAM
• Virtual IDs
– Offset used as the virtual ID
– Trades simple arithmetic for space (i.e., CPU time for IO time)
– Assumes fixed-width attributes
• Challenge when using compression

Page 80:

Page Layout (source: IEEE)

• Row store: N-ary Storage Model (NSM)
• Decomposed Storage Model (DSM)
• PAX Model: Partition Attributes Across

Page 81:

PAX Model

• Invented by A. Ailamaki in the early 2000s
• IO pattern of NSM
• Great for cache utilization
– Columns packed together in cache lines

Page 82:

Partitioning

• There is parallelism (i) across servers, and (ii) within a server, both at the CPU level and throughout the IO stack.
• To leverage this parallelism, rely on multiple instances / multiple partitions per instance:
– A single database is split across several instances. Different partitions can be allocated to different CPUs (partition servers) / disks (partitions).
• Problem #1: How to control overall resource usage across instances/partitions?
– Control the number/priority of threads spawned by a DBMS instance
• Problem #2: How to manage priorities?
• Problem #3: How to map threads onto the available cores?
– Fix: processor/interrupt affinity

@ Dennis Shasha and Philippe Bonnet, 2013

Page 83:

Instance Caging

• Allocating a number of CPU (core) or a percentage of the available IO bandwidth to a given DBMS Instance

• Two policies:
– Partitioning: the total number of CPUs is partitioned across all instances
– Over-provisioning: more than the total number of CPUs is allocated across all instances

@ Dennis Shasha and Philippe Bonnet, 2013

LOOK UP: Instance Caging
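A minimal sketch of instance caging as Oracle exposes it (run in each instance; caging takes effect once cpu_count is set and a resource manager plan is active; DEFAULT_PLAN is Oracle's built-in plan):

   alter system set resource_manager_plan = 'DEFAULT_PLAN' scope = both;
   alter system set cpu_count = 2 scope = both;   -- cage this instance to 2 CPUs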

[Diagram: under partitioning, the per-instance CPU allocations (instances A, B, C, D with 2+2+1+1 CPUs) sum to the number of cores; under over-provisioning, the allocations (2+3+2+1 CPUs) exceed the number of cores.]

Page 84:

Number of DBMS Threads

• Given the DBMS process architecture, how many threads should be defined for:
– Query agents (max per query, max per instance)
• Multiprogramming level (see tuning the writes and index tuning)
– Log flusher (see tuning the writes)
– Page cleaners (see tuning the writes)
– Prefetcher (see index tuning)
– Deadlock detection (see lock tuning)
• Fix the number of DBMS threads based on the number of cores available at the HW/VM level
• Partitioning vs. over-provisioning
• Provision for monitoring, back-up, expensive stored procedures/UDFs

@ Dennis Shasha and Philippe Bonnet, 2013

Page 85:

Priorities

• Mainframe OSs have long allowed thread priority as well as IO priority to be configured. It is now possible to set IO priorities on Linux as well:
– Threads associated with synchronous IOs (writes to the log, page cleaning under memory pressure, query agent reads) should have higher priorities than threads associated with asynchronous IOs (prefetching, page cleaning with no memory pressure); see tuning the writes and index tuning
– Synchronous IOs should have higher priority than asynchronous IOs.

@ Dennis Shasha and Philippe Bonnet, 2013

LOOK UP: Getting Priorities Straight, Linux IO priorities

Page 86:

The Priority Inversion Problem

@ Dennis Shasha and Philippe Bonnet, 2013

[Diagram: T1 (priority #1) requests the lock on X held by T3 (priority #3), while T2 (priority #2) keeps running; transaction states shown as running/waiting.]

Three transactions: T1, T2, T3 in priority order (high to low)
– T3 obtains the lock on X and is preempted
– T1 blocks on the X lock, so is descheduled
– T2 does not access X and runs for a long time

Net effect: T1 waits for T2.
Solution:
– No thread priorities, or
– Priority inheritance

Page 87:

Processor/Interrupt Affinity

• Mapping of a thread context or interrupt to a given core
– Allows cache-line sharing between application threads or between an application thread and an interrupt handler (or even RAM sharing in NUMA)
– Avoid dispatching all IO interrupts to core 0 (which then dispatches software interrupts to the other cores)
– Should be combined with VM options
– Especially important in a NUMA context
– Affinity policy set at the OS level or the DBMS level?

@ Dennis Shasha and Philippe Bonnet, 2013

LOOK UP: Linux CPU tuning

Page 88:

Processor/Interrupt Affinity

• IOs should complete on the core that issued them
– I/O affinity in SQL Server
– Log writers distributed across all NUMA nodes
• Locking of shared data structures across cores, and especially across NUMA nodes
– Avoid multiprogramming for query agents that modify data
– Query agents should be on the same NUMA node
• DBMSs have pre-set NUMA affinity policies

@ Dennis Shasha and Philippe Bonnet, 2013

LOOK UP: Oracle and NUMA, SQL Server and NUMA

Page 89:

Transferring large files (with TCP)

• With the advent of compute clouds, it is often necessary to transfer large files over the network when loading a database.
• To speed up the transfer of large files:
– Increase the size of the TCP buffers
– Increase the socket buffer size (Linux)
– Set up TCP large windows (and timestamps)
– Rely on selective acknowledgements

@ Dennis Shasha and Philippe Bonnet, 2013

LOOK UP: TCP Tuning

Page 90:

Throwing hardware at a problem

• More memory
– Increase buffer size without increasing paging
• More disks
– Log on a separate disk
– Mirror frequently read files
– Partition large files
• More processors
– Off-load non-database applications onto other CPUs
– Off-load data mining applications to an old copy of the database
– Increase throughput to shared data
• Shared-memory or shared-disk architecture

@ Dennis Shasha and Philippe Bonnet, 2013

Page 91:

Virtual Storage Performance

@ Dennis Shasha and Philippe Bonnet, 2013

Setup:
– Drive: Intel 710 (100 GB, 2.5in SATA 3 Gb/s, 25 nm MLC): 170 MB/s sequential write, 85 usec write latency, 38,500 IOPS random reads (full range, iodepth 32), 75 usec read latency
– CPU: Core i5 750 @ 2.67 GHz
– Host: Ubuntu 12.04, noop scheduler
– VM: VirtualBox 4.2 (nice -5), all accelerations enabled, 4 CPUs, 8 GB fixed VDI disk, SATA controller
– Guest: Ubuntu 12.04, noop scheduler, 4 cores
– Experiments with the flexible I/O tester (fio): sequential writes (seqWrites.fio) and random reads (randReads.fio)

[seqWrites]            [randReads]
ioengine=libaio        ioengine=libaio
iodepth=32             iodepth=32
rw=write               rw=read
bs=4k,4k               bs=4k,4k
direct=1               direct=1
numjobs=1              numjobs=1
size=50m               size=50m
directory=/tmp         directory=/tmp

Page 92:

Virtual Storage - seqWrites

@ Dennis Shasha and Philippe Bonnet, 2013

[Charts: seqWrites performance on the host vs. on the guest.]

Default page size is 4k, default iodepth is 32

Page 93:

Virtual Storage - seqWrites

@ Dennis Shasha and Philippe Bonnet, 2013

Page size is 4k, iodepth is 32

Page 94:

Virtual Storage - randReads

@ Dennis Shasha and Philippe Bonnet, 2013

Page size is 4k*

* experiments with 32k show negligible difference

Page 95:

98

Index Tuning

• Index issues
– Indexes may be better or worse than scans
– Multi-table joins that run for hours because the wrong indexes are defined
– Concurrency control bottlenecks
– Indexes that are maintained but never used

Page 96:

99

Index

An index is a data structure that supports efficient access to data.

[Diagram: a condition on an attribute value (the search key) goes through the index to retrieve the matching records from the set of records.]

Page 97:

100

Search Keys

• A (search) key is a sequence of attributes.
create index i1 on accounts(branchnum, balance);
• Types of keys:
– Sequential: the value of the key is monotonic with the insertion order (e.g., counter or timestamp)
– Non-sequential: the value of the key is unrelated to the insertion order (e.g., social security number)

Page 98:

101

Data Structures

• Most index data structures can be viewed as trees.
• In general, the root of this tree will always be in main memory, while the leaves will be located on disk.
– The performance of a data structure depends on the number of nodes in the average path from the root to the leaf.
– Data structures with high fan-out (maximum number of children of an internal node) are thus preferred.

Page 99:

102

B+-Tree Performance

• Tree levels
– Tree fanout
• Size of key
• Page utilization
• Tree maintenance
– Online
• On inserts
• On deletes
– Offline
• Tree locking
• Tree root in main memory

Page 100:

B+-Tree
A B+-Tree is a balanced tree whose nodes contain a sequence of key-pointer pairs.

[Diagram: a B+-Tree, annotating its fan-out (entries per node) and its depth (number of levels).]

Page 101:

B+-Tree
• Nodes contain a bounded number of key-pointer pairs, determined by b (the branching factor):
– Internal nodes: ceiling(b/2) <= size(node) <= b
– Root node:
• If the root is the only node in the tree: 1 <= size(node) <= b
• If internal nodes exist: 2 <= size(node) <= b
– Leaves (no pointers): floor(b/2) <= number(keys) <= b-1
• Insertion and deletion algorithms keep the tree balanced and maintain these constraints on the size of each node
– Nodes might then be split or merged; possibly the depth of the tree is increased.

size(node) = number of key-pointer pairs

Page 102:

B+-Tree Performance #1
• Memory / disk
– The root is always in memory
– What portion of the index is actually in memory?
• Impacts the number of IOs
• Worst case: one I/O per level in the B+-tree!
• Fan-out and tree levels
– Both are interdependent
– They depend on the branching factor
– In practice, index nodes are mapped onto index pages of fixed size
– The branching factor then depends on key size (pointers are of fixed size) and page utilization
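A back-of-the-envelope illustration (the page size, entry size, utilization and row count below are assumptions, not numbers from the slides): with 8 KB index pages, 20-byte key-pointer entries and 70% page utilization, the fan-out is roughly $f \approx 0.7 \times 8192 / 20 \approx 280$, so an index on $N = 10^{8}$ rows has about $\lceil \log_{280} 10^{8} \rceil = 4$ levels, i.e., three or four IOs in the worst case when only the root is cached.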

Page 103:

106

B+-Tree Performance

• Key length influences fanout
– Choose a small key when creating an index
– Key compression:
• Prefix compression (Oracle 8, MySQL): only store the part of the key that is needed to distinguish it from its neighbors: Smi, Smo, Smy for Smith, Smoot, Smythe.
• Front compression (Oracle 5): adjacent keys have their common front portion factored out: Smi, (2)o, (2)y. There are problems with this approach:
– Processor overhead for maintenance
– Locking Smoot requires locking Smith too.
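A minimal sketch of requesting prefix (key) compression in Oracle (the COMPRESS clause gives the number of leading key columns to compress; the index name is illustrative and the table is the accounts table from earlier examples):

   create index acct_branch_bal on accounts(branchnum, balance) compress 1;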

Page 104:

107

Hash Index

• A hash index stores key-value pairs based on a pseudo-randomizing function called a hash function.

[Diagram: a hash function maps a key (e.g., 2341) to one of the hashed key values 0..n; each bucket points to a chain of records (e.g., R1, R5 in one bucket; R3, R6, R9, R14, R17, R21, R25 in another). The length of these chains impacts performance.]

Page 105:

108

Clustered / Non clustered index

• Clustered index (primary index)
– A clustered index on attribute X co-locates records whose X values are near one another.
• Non-clustered index (secondary index)
– A non-clustered index does not constrain the table organization.
– There might be several non-clustered indexes per table.

[Diagram: under a clustered index the records are stored in key order; under a non-clustered index the records are scattered across pages.]

Page 106:

109

Dense / Sparse Index

• Sparse index
– Pointers are associated with pages
• Dense index
– Pointers are associated with records
– Non-clustered indexes are dense

[Diagram: a sparse index holds one pointer per page (P1, P2, ..., Pi); a dense index holds one pointer per record.]

Page 107:

DBMS Implementation

• Oracle 11g
– B-tree vs. hash vs. bitmap indexes
– Index-organized table vs. heap
• A non-clustered index can be defined on both
– Reverse-key indexes
– Key compression
– Invisible indexes (not visible to the optimizer, which allows for experimentation)
– Function-based indexes
• CREATE INDEX idx ON table_1 (a + b * (c - 1), a, b);

See: Oracle 11g indexes

Page 108:

DBMS Implementation

• DB2 10
– B+-tree (hash indexes only for DB2 for z/OS)
– Non-clustered vs. clustered
– Key compression
– Indexes on expressions
• SQL Server 2012
– B+-tree (spatial indexes built on top of B+-trees)
– Columnstore indexes for OLAP queries
– Non-clustered vs. clustered
– Non-key columns included in the index (for coverage)
– Indexes restricted by a simple predicate (filtered indexes)

See: DB2 types of indexes, SQLServer 2012 indexes

Page 109:

113

Types of Queries

• Point query

SELECT balance
FROM accounts
WHERE number = 1023;

• Multipoint query

SELECT balance
FROM accounts
WHERE branchnum = 100;

• Range query

SELECT number
FROM accounts
WHERE balance > 10000 and balance <= 20000;

• Prefix match query

SELECT *
FROM employees
WHERE name LIKE 'J%';

Page 110:

114

More Types of Queries

• Extremal query

SELECT *
FROM accounts
WHERE balance = (select max(balance) from accounts);

• Ordering query

SELECT *
FROM accounts
ORDER BY balance;

• Grouping query

SELECT branchnum, avg(balance)
FROM accounts
GROUP BY branchnum;

• Join query

SELECT distinct branch.adresse
FROM accounts, branch
WHERE accounts.branchnum = branch.number
  and accounts.balance > 10000;


115

Index Tuning -- data

Settings:
employees(ssnum, name, lat, long, hundreds1, hundreds2);

clustered index c on employees(hundreds1) with fillfactor = 100;
nonclustered index nc on employees (hundreds2);
index nc3 on employees (ssnum, name, hundreds2);
index nc4 on employees (lat, ssnum, name);

– 1000000 rows; Cold buffer
– Dual Xeon (550MHz, 512Kb), 1Gb RAM, internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


116

Index Tuning -- operations

Operations:
– Update:
update employees set name = 'XXX' where ssnum = ?;
– Insert:
insert into employees values (1003505,'polo94064',97.48,84.03,4700.55,3987.2);
– Multipoint query:
select * from employees where hundreds1 = ?;
select * from employees where hundreds2 = ?;
– Covered query:
select ssnum, name, lat from employees;
– Range query:
select * from employees where long between ? and ?;
– Point query:
select * from employees where ssnum = ?;


117

Clustered Index

• Multipoint query that returns 100 records out of 1000000.
• Cold buffer
• The clustered index is twice as fast as the non-clustered index and orders of magnitude faster than a scan.

Chart: throughput ratio (clustered, non-clustered, no index) on SQL Server, Oracle, and DB2.

118

Index “Face Lifts”

• Index is created with fillfactor = 100.
• Insertions cause page splits and extra I/O for each query.
• Maintenance consists of dropping and recreating the index.
• With maintenance, performance is constant; if no maintenance is performed, performance degrades significantly.

Chart (SQL Server): throughput (queries/sec) vs. % increase in table size, with and without maintenance.

119

Index Maintenance

• In Oracle, clustered indexes are approximated by an index defined on a clustered table.
• No automatic physical reorganization.
• Index defined with pctfree = 0.
• Overflow pages cause performance degradation.

Chart (Oracle): throughput (queries/sec) vs. % increase in table size, with no maintenance.

120

Covering Index - defined

• Select name from employee where department = “marketing”

• Good covering index would be on (department, name)

• Index on (name, department) less useful.

• Index on department alone moderately useful.

Page 117: 1 Database Tuning Principles, Experiments and Troubleshooting Techniques Dennis Shasha (shasha@cs.nyu.edu) Philippe Bonnet (bonnet.p@gmail.com)

121

Covering Index - impact

• Covering index performs better than clustering index when first attributes of index are in the where clause and last attributes in the select.

• When attributes are not in order then performance is much worse.

Chart (SQL Server): throughput (queries/sec) for covering, covering – not ordered, non-clustering, and clustering indexes.

122

Scan Can Sometimes Win

• IBM DB2 v7.1 on Windows 2000
• Range query
• If a query retrieves 10% of the records or more, scanning is often better than using a non-clustering, non-covering index. The crossover is above 10% when records are large or the table is fragmented on disk, because scan cost increases.

Chart: throughput (queries/sec) vs. % of selected records, scan vs. non-clustering index.

123

Index on Small Tables

• Small table: 100 records, i.e., a few pages.
• Two concurrent processes perform updates (each process works for 10ms before it commits).
• No index: the table is scanned for each update. No concurrent updates are possible.
• A clustered index makes it possible to take advantage of row locking.

Chart: throughput (updates/sec) with no index vs. with an index.

124

Bitmap vs. Hash vs. B+-Tree

Settings:
employees(ssnum, name, lat, long, hundreds1, hundreds2);

create cluster c_hundreds (hundreds2 number(8)) PCTFREE 0;
create cluster c_ssnum (ssnum integer) PCTFREE 0 SIZE 60;

create cluster c_hundreds (hundreds2 number(8)) PCTFREE 0 HASHKEYS 1000 SIZE 600;
create cluster c_ssnum (ssnum integer) PCTFREE 0 HASHKEYS 1000000 SIZE 60;

create bitmap index b on employees (hundreds2);
create bitmap index b2 on employees (ssnum);

– 1000000 rows; Cold buffer
– Dual Xeon (550MHz, 512Kb), 1Gb RAM, internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

125

Multipoint query: B-Tree, Hash Tree, Bitmap

• There is an overflow chain in the hash index because hundreds2 has few distinct values.
• In a clustered B-tree index the qualifying records are on contiguous pages.
• The bitmap index is proportional to the size of the table, and record access through it is non-clustered.

Chart (Multipoint Queries): throughput (queries/sec) for B-tree, hash index, and bitmap index.

126

• Hash indexes don’t help when evaluating range queries

• Hash index outperforms B-tree on point queries

Chart (Range Queries): throughput (queries/sec) for B-tree, hash index, and bitmap index.
Chart (Point Queries): throughput (queries/sec) for B-tree vs. hash index.

127

Tuning Relational Systems

• Schema Tuning
– Denormalization, Vertical Partitioning
• Query Tuning
– Query rewriting
– Materialized views


128

Denormalizing -- data

Settings:
lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

region( R_REGIONKEY, R_NAME, R_COMMENT );

nation( N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT );

supplier( S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT );

– 600000 rows in lineitem, 25 nations, 5 regions, 500 suppliers

129

Denormalizing -- transactions

lineitemdenormalized( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME );

– 600000 rows in lineitemdenormalized
– Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

130

Queries on Normalized vs. Denormalized Schemas

Queries:
select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, R_NAME
from LINEITEM, REGION, SUPPLIER, NATION
where L_SUPPKEY = S_SUPPKEY
and S_NATIONKEY = N_NATIONKEY
and N_REGIONKEY = R_REGIONKEY
and R_NAME = 'EUROPE';

select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME
from LINEITEMDENORMALIZED
where L_REGIONNAME = 'EUROPE';

131

Denormalization

• TPC-H schema
• Query: find all lineitems whose supplier is in Europe.
• With a normalized schema this query is a 4-way join.
• If we denormalize lineitem and add the name of the region to each lineitem (foreign key denormalization), throughput improves by 30%.

Chart: throughput (queries/sec) for the normalized vs. the denormalized schema.

132

Vertical Partitioning

• Consider account(id, balance, homeaddress)

• When might it be a good idea to do a “vertical partitioning” into account1(id,balance) and account2(id,homeaddress)?

• Join vs. size.


133

Vertical Partitioning

• Which design is better depends on the query pattern:
– The application that sends a monthly statement is the principal user of the address of the owner of an account.
– The balance is updated or examined several times a day.
• The second schema might be better because the relation (account_ID, balance) can be made smaller:
– More (account_ID, balance) pairs fit in memory, thus increasing the hit ratio.
– A scan performs better because there are fewer pages.

134

Tuning Normalization

• A single normalized relation XYZ is better than two normalized relations XY and XZ if the single relation design allows queries to access X, Y and Z together without requiring a join.
• The two-relation design is better iff:
– User accesses tend to partition between the two sets Y and Z most of the time
– Attributes Y or Z have large values

135

Vertical Partitioning and Scan

• R(X, Y, Z)
– X is an integer
– Y and Z are large strings
• Scan query
• Vertical partitioning exhibits poor performance when all attributes are accessed.
• Vertical partitioning provides a speedup if only two of the attributes are accessed.

Chart: throughput (queries/sec) with no partitioning vs. vertical partitioning, accessing XYZ vs. XY.

136

Vertical Partitioning and Point Queries

• R(X, Y, Z)
– X is an integer
– Y and Z are large strings
• A mix of point queries access either XYZ or XY.
• Vertical partitioning gives a performance advantage if the proportion of queries accessing only XY is greater than 20%.
• The join is not expensive compared to a simple look-up.

Chart: throughput (queries/sec) vs. % of accesses that only concern XY, with and without vertical partitioning.

137

Queries

Settings:
employee(ssnum, name, dept, salary, numfriends);
student(ssnum, name, course, grade);
techdept(dept, manager, location);

clustered index i1 on employee (ssnum);
nonclustered index i2 on employee (name);
nonclustered index i3 on employee (dept);
clustered index i4 on student (ssnum);
nonclustered index i5 on student (name);
clustered index i6 on techdept (dept);

– 100000 rows in employee, 100000 students, 10 departments; Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

138

Queries – View on Join

View Techlocation:
create view techlocation as
select ssnum, techdept.dept, location
from employee, techdept
where employee.dept = techdept.dept;

Queries:
– Original:
select dept from techlocation where ssnum = ?;
– Rewritten:
select dept from employee where ssnum = ?;

139

Query Rewriting - Views

• All systems expand the selection on a view into a join

• The difference between a plain selection and a join (on a primary key-foreign key) followed by a projection is greater on SQL Server than on Oracle and DB2 v7.1.

Chart: throughput improvement (percent) of the rewritten query over the query on the view, for SQL Server 2000, Oracle 8i, and DB2 v7.1.

140

Queries – Correlated Subqueries

Queries:
– Original:
select ssnum
from employee e1
where salary = (select max(salary) from employee e2 where e2.dept = e1.dept);

– Rewritten:
select max(salary) as bigsalary, dept
into TEMP
from employee
group by dept;

select ssnum
from employee, TEMP
where salary = bigsalary
and employee.dept = TEMP.dept;

141

Query Rewriting – Correlated Subqueries

• SQL Server 2000 does a good job of handling correlated subqueries (a hash join is used as opposed to a nested loop between query blocks).
– The techniques implemented in SQL Server 2000 are described in "Orthogonal Optimization of Subqueries and Aggregates" by C. Galindo-Legaria and M. Joshi, SIGMOD 2001.

Chart: throughput improvement (percent) for the rewritten correlated subquery on SQL Server 2000, Oracle 8i, and DB2 v7.1 (the improvement exceeds 1000% on one system and 10000% on another).

142

Eliminate unneeded DISTINCTs

• Query: Find employees who work in the information systems department. There should be no duplicates.

SELECT distinct ssnumFROM employeeWHERE dept = ‘information systems’

• DISTINCT is unnecessary, since ssnum is a key of employee so certainly is a key of a subset of employee.

Page 139: 1 Database Tuning Principles, Experiments and Troubleshooting Techniques Dennis Shasha (shasha@cs.nyu.edu) Philippe Bonnet (bonnet.p@gmail.com)

143

Eliminate unneeded DISTINCTs

• Query: Find social security numbers of employees in the technical departments. There should be no duplicates.

SELECT DISTINCT ssnum
FROM employee, tech
WHERE employee.dept = tech.dept;

• Is DISTINCT needed?

144

Distinct Unnecessary Here Too

• Since dept is a key of the tech table, each employee record will join with at most one record in tech.

• Because ssnum is a key for employee, distinct is unnecessary.


145

Reaching

• The relationship among DISTINCT, keys and joins can be generalized:
– Call a table T privileged if the fields returned by the select contain a key of T.
– Let R be an unprivileged table. Suppose that R is joined on equality by its key field to some other table S; then we say R reaches S.
– Now, define reaches to be transitive. So, if R1 reaches R2 and R2 reaches R3 then say that R1 reaches R3.

146

Reaches: Main Theorem

• There will be no duplicates among the records returned by a selection, even in the absence of DISTINCT, if one of the two following conditions holds:
– Every table mentioned in the FROM clause is privileged.
– Every unprivileged table reaches at least one privileged table.

147

Reaches: Proof Sketch

• If every relation is privileged then there are no duplicates.
– The keys of those relations are among the fields returned by the select.
• Suppose some relation T is not privileged but reaches at least one privileged one, say R. Then the qualifications linking T with R ensure that each distinct combination of privileged records is joined with at most one record of T.

148

Reaches: Example 1

• The same employee record may match several tech records (because manager is not a key of tech), so the ssnum of that employee record may appear several times.
• Tech does not reach the privileged relation employee.

SELECT ssnum
FROM employee, tech
WHERE employee.manager = tech.manager;

149

Reaches: Example 2

• Each repetition of a given ssnum value would be accompanied by a new tech.dept, since tech.dept is a key of tech.
• Both relations are privileged.

SELECT ssnum, tech.dept
FROM employee, tech
WHERE employee.manager = tech.manager;

150

Reaches: Example 3

• Student is privileged.
• Employee does not reach student (name is not a key of employee).
• DISTINCT is needed to avoid duplicates.

SELECT student.ssnum
FROM student, employee, tech
WHERE student.name = employee.name
AND employee.dept = tech.dept;

151

Aggregate Maintenance -- data

Settings:
orders( ordernum, itemnum, quantity, storeid, vendorid );
create clustered index i_order on orders(itemnum);

store( storeid, name );

item( itemnum, price );
create clustered index i_item on item(itemnum);

vendorOutstanding( vendorid, amount );
storeOutstanding( storeid, amount );

– 1000000 orders, 10000 stores, 400000 items; Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

152

Aggregate Maintenance -- triggers

Triggers for Aggregate Maintenance:
create trigger updateVendorOutstanding on orders for insert as
update vendorOutstanding
set amount =
  (select vendorOutstanding.amount + sum(inserted.quantity*item.price)
   from inserted, item
   where inserted.itemnum = item.itemnum)
where vendorid = (select vendorid from inserted);

create trigger updateStoreOutstanding on orders for insert as
update storeOutstanding
set amount =
  (select storeOutstanding.amount + sum(inserted.quantity*item.price)
   from inserted, item
   where inserted.itemnum = item.itemnum)
where storeid = (select storeid from inserted);

153

Aggregate Maintenance -- transactions

Concurrent Transactions:
– Insertions
insert into orders values (1000350,7825,562,'xxxxxx6944','vendor4');

– Queries (first without, then with redundant tables)
select orders.vendorid, sum(orders.quantity*item.price)
from orders, item
where orders.itemnum = item.itemnum
group by orders.vendorid;
vs.
select * from vendorOutstanding;

select store.storeid, sum(orders.quantity*item.price)
from orders, item, store
where orders.itemnum = item.itemnum
and orders.storeid = store.storeid
group by store.storeid;
vs.
select * from storeOutstanding;

154

Aggregate Maintenance

• SQLServer 2000 on Windows 2000

• Using triggers for view maintenance

• If queries are frequent or important, then aggregate maintenance is worthwhile.

Chart: percentage of gain with aggregate maintenance – inserts lose about 62%, while the vendor-total and store-total queries gain roughly 21900% and 31900%.

155

Superlinearity -- data

Settings:
sales( id, itemid, customerid, storeid, amount, quantity );
item( itemid );
customer( customerid );
store( storeid );

A sale is successful if all foreign keys are present.

successfulsales( id, itemid, customerid, storeid, amount, quantity );
unsuccessfulsales( id, itemid, customerid, storeid, amount, quantity );
tempsales( id, itemid, customerid, storeid, amount, quantity );

156

Superlinearity -- indexes

Settings (non-clustering, dense indexes):
index s1 on item(itemid);
index s2 on customer(customerid);
index s3 on store(storeid);
index succ on successfulsales(id);

– 1000000 sales, 400000 customers, 40000 items, 1000 stores
– Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

157

Superlinearity -- queries

Queries:
– Insert/delete
insert into successfulsales
select sales.id, sales.itemid, sales.customerid, sales.storeid, sales.amount, sales.quantity
from sales, item, customer, store
where sales.itemid = item.itemid
and sales.customerid = customer.customerid
and sales.storeid = store.storeid;

insert into unsuccessfulsales select * from sales;
go
delete from unsuccessfulsales
where id in (select id from successfulsales);

158

Superlinearity -- batch queries

Queries:
– Small batches

DECLARE @Nlow INT;
DECLARE @Nhigh INT;
DECLARE @INCR INT;
set @INCR = 100000
set @Nlow = 0
set @Nhigh = @INCR
WHILE (@Nlow <= 500000)
BEGIN
  insert into tempsales select * from sales where id between @Nlow and @Nhigh
  set @Nlow = @Nlow + @INCR
  set @Nhigh = @Nhigh + @INCR
  delete from tempsales where id in (select id from successfulsales);
  insert into unsuccessfulsales select * from tempsales;
  delete from tempsales;
END

159

Superlinearity -- outer join

Queries:
– Outer join
insert into successfulsales
select sales.id, item.itemid, customer.customerid, store.storeid, sales.amount, sales.quantity
from ((sales left outer join item on sales.itemid = item.itemid)
left outer join customer on sales.customerid = customer.customerid)
left outer join store on sales.storeid = store.storeid;

insert into unsuccessfulsales
select *
from successfulsales
where itemid is null
or customerid is null
or storeid is null;
go
delete from successfulsales
where itemid is null
or customerid is null
or storeid is null;

160

Circumventing Superlinearity

• SQL Server 2000
• Outer join achieves the best response time.
• Small batches do not help because the overhead of crossing the application interface is higher than the benefit of joining with smaller tables.

Chart: response time (sec) for small and large tables, comparing insert/delete with no index, insert/delete indexed, small batches, and outer join.

161

Tuning the Application Interface

• 4GL
– Power++, Visual Basic
• Programming language + call-level interface
– ODBC: Open DataBase Connectivity
– JDBC: Java-based API
– OCI (C++/Oracle), CLI (C++/DB2), Perl/DBI
• In the following experiments, the client program is located on the database server site. Overhead is due to crossing the application interface.

162

Looping can hurt -- data

Settings:
lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

– 600000 rows; warm buffer.
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

163

Looping can hurt -- queries

• Queries:
– No loop:
sqlStmt = "select * from lineitem where l_partkey < 200;"
odbc->prepareStmt(sqlStmt);
odbc->execPrepared(sqlStmt);

– Loop:
sqlStmt = "select * from lineitem where l_partkey = ?;"
odbc->prepareStmt(sqlStmt);
for (int i=1; i<200; i++) {
  odbc->bindParameter(1, SQL_INTEGER, i);
  odbc->execPrepared(sqlStmt);
}
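The same comparison can be written against any call-level interface. A minimal sketch in Python's DB-API follows; the connection object conn, the driver's '?' parameter style, and the lineitem table are assumptions for illustration, not part of the original ODBC harness.

import time

def fetch_with_loop(conn, n=200):
    # one round trip per key value: crosses the application interface n times
    cur = conn.cursor()
    rows = []
    for i in range(1, n):
        cur.execute("select * from lineitem where l_partkey = ?", (i,))
        rows.extend(cur.fetchall())
    return rows

def fetch_set_oriented(conn, n=200):
    # a single set-oriented statement: one round trip, same result set
    cur = conn.cursor()
    cur.execute("select * from lineitem where l_partkey < ?", (n,))
    return cur.fetchall()

def compare(conn):
    for f in (fetch_with_loop, fetch_set_oriented):
        start = time.time()
        f(conn)
        print(f.__name__, round(time.time() - start, 3), "seconds")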


164

Looping can Hurt

• SQL Server 2000 on Windows 2000

• Crossing the application interface has a significant impact on performance.

• Why would a programmer use a loop instead of relying on set-oriented operations: object-orientation?

Chart: throughput (records/sec) for the loop vs. the set-oriented (no loop) version.

165

Cursors are Death -- data

Settings:
employees(ssnum, name, lat, long, hundreds1, hundreds2);

– 100000 rows; Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

166

Cursors are Death -- queries

Queries:
– No cursor
select * from employees;

– Cursor
DECLARE d_cursor CURSOR FOR select * from employees;
OPEN d_cursor
FETCH NEXT FROM d_cursor
WHILE (@@FETCH_STATUS = 0)
BEGIN
  FETCH NEXT FROM d_cursor
END
CLOSE d_cursor
DEALLOCATE d_cursor
go

167

Cursors are Death

• SQL Server 2000 on Windows 2000

• Response time is a few seconds with a SQL query and more than an hour iterating over a cursor.

Chart: throughput (records/sec) for the cursor vs. the plain SQL query.

168

Retrieve Needed Columns Only - data

Settings:
lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

create index i_nc_lineitem on lineitem (l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate);

– 600000 rows; warm buffer.
– Lineitem records are ~10 bytes long.
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

169

Retrieve Needed Columns Only - queries

Queries:
– All
select * from lineitem;

– Covered subset
select l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate from lineitem;

170

Retrieve Needed Columns Only

• Avoid transferring unnecessary data.
• May enable use of a covering index.
• In the experiment the subset contains ¼ of the attributes.
– Reducing the amount of data that crosses the application interface yields significant performance improvement.

Chart: throughput (queries/msec) with no index vs. index, retrieving all columns vs. the covered subset. Experiment performed on Oracle 8i EE on Windows 2000.

171

Bulk Loading Data

Settings:
lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

– Initially the table is empty; 600000 rows are to be inserted (138Mb).
– The table sits on one disk. No constraint or index is defined.
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

172

Bulk Loading Queries

Oracle 8i:
sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

load data
infile "lineitem.tbl"
into table LINEITEM append
fields terminated by '|'
(
  L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS,
  L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD",
  L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
)

173

Direct Path

• Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional bulk load (commit every 100 records) and than inserts (commit for each record).

Chart: throughput (records/sec) for conventional load, direct path load, and inserts. Experiment performed on Oracle 8i EE on Windows 2000.

174

Batch Size

• Throughput increases steadily as the batch size increases to 100000 records. Throughput remains constant afterwards.
• Trade-off between performance and the amount of data that has to be reloaded in case of a problem.
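A hedged sketch of that trade-off using Python's DB-API: the connection conn, the iterable rows of lineitem tuples, and the '?' parameter style are assumptions, not part of the original experiment. Committing once per batch rather than once per record is what the batch size controls.

def load_in_batches(conn, rows, batch_size=100000):
    cur = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            cur.executemany(
                "insert into lineitem values (" + ", ".join(["?"] * len(row)) + ")",
                batch)
            conn.commit()   # one commit per batch: fewer commits, more to redo on failure
            batch = []
    if batch:
        cur.executemany(
            "insert into lineitem values (" + ", ".join(["?"] * len(batch[0])) + ")",
            batch)
        conn.commit()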

Chart: throughput (records/sec) vs. batch size (0 to 600000 records). Experiment performed on SQL Server 2000 on Windows 2000.

175

Tuning E-Commerce Applications

Database-backed web sites:
– Online shops
– Shop comparison portals
– MS TerraServer

176

E-commerce Application Architecture

Diagram: clients → web servers (web caches) → application servers → database server (DB caches).

177

E-commerce Application Workload

• Touristic searching (frequent, cached)
– Access the top few pages. Pages may be personalized. Data may be out of date.
• Category searching (frequent, partly cached, need for timeliness guarantees)
– Down some hierarchy, e.g., men's clothing.
• Keyword searching (frequent, uncached, need for timeliness guarantees)
• Shopping cart interactions (rare, but transactional)
• Electronic purchasing (rare, but transactional)

178

Design Issues

• Need to keep historic information
– Electronic payment acknowledgements get lost.
• Preparation for variable load
– Regular patterns of web site accesses during the day, and within a week.
• Possibility of disconnections
– State information transmitted to the client (cookies)
• Special consideration for low bandwidth
• Schema evolution
– Representing e-commerce data as attribute-value pairs (IBM WebSphere)

179

Caching

• Web cache:
– Static web pages
– Caching fragments of dynamically created web pages
• Database cache (Oracle9iAS, TimesTen's FrontTier)
– Materialized views to represent cached data.
– Queries are executed either using the database cache or the database server. Updates are propagated to keep the cache(s) consistent.
– Note to vendors: it would be good to have queries distributed between cache and server.

180

Ecommerce -- setting

Settings:
shoppingcart( shopperid, itemid, price, qty );

– 500000 rows; warm buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

181

Ecommerce -- transactions

Concurrent Transactions:
– Mix
insert into shoppingcart values (107999, 914, 870, 214);
update shoppingcart set qty = 10 where shopperid = 95047 and itemid = 88636;
delete from shoppingcart where shopperid = 86123 and itemid = 8321;
select shopperid, itemid, qty, price from shoppingcart where shopperid = ?;

– Queries only
select shopperid, itemid, qty, price from shoppingcart where shopperid = ?;

182

Connection Pooling (no refusals)

• Each thread establishes a connection and performs 5 insert statements.
• If a connection cannot be established, the thread waits 15 seconds before trying again.
• The number of connections is limited to 60 on the database server.
• Using connection pooling, the requests are queued and serviced when possible. There are no refused connections.

Chart: response time vs. number of client threads (20–140), connection pooling vs. simple connections. Experiment performed on Oracle 8i on Windows 2000.

183

Indexing

• Using a clustered index on shopperid in the shopping cart provides:
– Query speed-up
– Update/deletion speed-up

Chart: throughput (queries/sec) for the Mix and Queries Only workloads, no index vs. clustered index. Experiment performed on SQL Server 2000 on Windows 2000.

184

Capacity Planning

• Arrival Rate
– A1 is given as an assumption
– A2 = (0.4 × A1) + (0.5 × A2)
– A3 = 0.1 × A2
• Service Time (S)
– S1, S2, S3 are measured
• Utilization
– U = A × S
• Response Time
– R = U/(A(1-U)) = S/(1-U) (assuming Poisson arrivals)

Diagram: Entry (S1) → 0.4 → Search (S2) → 0.1 → Checkout (S3); Search feeds back into itself with weight 0.5. Getting the demand assumptions right is what makes capacity planning hard.
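A small sketch of the arithmetic above; the 0.4/0.5/0.1 demand fractions come from the diagram, while the arrival rate and service times below are made-up inputs for illustration.

def capacity_plan(A1, S1, S2, S3):
    A2 = 0.4 * A1 / (1 - 0.5)        # from A2 = 0.4*A1 + 0.5*A2
    A3 = 0.1 * A2
    result = {}
    for name, A, S in (("entry", A1, S1), ("search", A2, S2), ("checkout", A3, S3)):
        U = A * S                     # utilization, must stay below 1
        R = S / (1 - U)               # response time, assuming Poisson arrivals
        result[name] = (A, U, R)
    return result

print(capacity_plan(A1=10, S1=0.01, S2=0.02, S3=0.05))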


185

How to Handle Multiple Servers

• Suppose one has n servers for some task that requires S time for a single server to perform.

• The perfect parallelism model is that it is as if one has a single server that is n times as fast.

• However, this overstates the advantage of parallelism, because even if there were no waiting, single tasks require S time.


186

Rough Estimate for Multiple Servers

• There are two components to response time: waiting time + service time.

• In the parallel setting, the service time is still S.

• The waiting time however can be well estimated by a server that is n times as fast.


187

Approximating waiting time for n parallel servers.

• Recall: R = U/(A(1-U)) = S/(1-U)
• On an n-times faster server, service time is divided by n, so the single-processor utilization U is also divided by n. So we would get: Rideal = (S/n)/(1 – (U/n)).
• That Rideal = serviceideal + waitideal.
• So waitideal = Rideal – S/n.
• Our assumption: waitideal ~ wait for n processors.

188

Approximating response time for n parallel servers

• Waiting time for n parallel processors ~ (S/n)/(1 – (U/n)) – S/n = (S/n)(1/(1 – (U/n)) – 1) = (S/(n(1 – U/n)))(U/n) = (S/(n – U))(U/n)
• So, response time for n parallel processors is the above waiting time + S.

189

Example

• A = 8 per second.
• S = 0.1 second.
• U = 0.8.
• Single-server response time = S/(1-U) = 0.1/0.2 = 0.5 seconds.
• If we have 2 servers, then we estimate the waiting time to be (0.1/(2-0.8))(0.4) = 0.04/1.2 = 0.033. So the response time is 0.133.
• For a 2-times faster server, S = 0.05, U = 0.4, so the response time is 0.05/0.6 = 0.0833.

190

Example -- continued

• A = 8 per second.
• S = 0.1 second.
• U = 0.8.
• If we have 4 servers, then we estimate the waiting time to be (S/(n – U))(U/n) = (0.1/3.2) × (0.8/4) = 0.02/3.2 = 0.00625. So the response time is 0.10625.
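A sketch that packages the approximation above and reproduces the numbers of the two examples.

def response_time(A, S, n=1):
    U = A * S                               # single-server utilization
    if n == 1:
        return S / (1 - U)
    wait = (S / (n - U)) * (U / n)          # approximate waiting time for n servers
    return S + wait                         # a single task still takes S on one server

print(response_time(8, 0.1, n=1))   # 0.5
print(response_time(8, 0.1, n=2))   # about 0.133
print(response_time(8, 0.1, n=4))   # about 0.10625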


191

Datawarehouse Tuning

• Aggregate (strategic) targeting:
– Aggregates flow up from a wide selection of data, and then
– Targeted decisions flow down
• Examples:
– Riding the wave of clothing fads
– Tracking delays for frequent-flyer customers

192

Data Warehouse Workload

• Broad
– Aggregate queries over ranges of values, e.g., find the total sales by region and quarter.
• Deep
– Queries that require precise individualized information, e.g., which frequent flyers have been delayed several times in the last month?
• Dynamic (vs. Static)
– Queries that require up-to-date information, e.g., which nodes have the highest traffic now?

193

Tuning Knobs

• Indexes

• Materialized views

• Approximation


194

Bitmaps -- data

Settings:
lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

create bitmap index b_lin_2 on lineitem(l_returnflag);
create bitmap index b_lin_3 on lineitem(l_linestatus);
create bitmap index b_lin_4 on lineitem(l_linenumber);

– 100000 rows; cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

195

Bitmaps -- queries

Queries:
– 1 attribute
select count(*) from lineitem where l_returnflag = 'N';

– 2 attributes
select count(*) from lineitem where l_returnflag = 'N' and l_linenumber > 3;

– 3 attributes
select count(*) from lineitem where l_returnflag = 'N' and l_linenumber > 3 and l_linestatus = 'F';

196

Bitmaps

• Order of magnitude improvement compared to scan, because summing up the 1s is very fast.

• Bitmaps are best suited for multiple conditions on several attributes, each having a low selectivity.
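A toy sketch (not the Oracle implementation) of why bitmap evaluation is cheap: one bit per row per value, so a multi-attribute condition reduces to a bitwise AND plus a population count. The sample columns below are invented.

def bitmap(column, predicate):
    bm = 0
    for rowid, value in enumerate(column):
        if predicate(value):
            bm |= 1 << rowid        # set the bit for every qualifying row
    return bm

l_returnflag = ["N", "A", "N", "R", "N", "N"]
l_linestatus = ["F", "F", "O", "F", "F", "O"]

b1 = bitmap(l_returnflag, lambda v: v == "N")
b2 = bitmap(l_linestatus, lambda v: v == "F")
print(bin(b1 & b2).count("1"))      # rows with l_returnflag='N' and l_linestatus='F' -> 2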

Chart: throughput (queries/sec) for queries on 1, 2, and 3 attributes (l_returnflag, l_linestatus), linear scan vs. bitmap.

197

Vertical Partitioning Revisited

In the same settings where bit vectors are useful – lots of attributes, unpredictable combinations queried – vertical partitioning also becomes more attractive.

If a table has 200 attributes, there is no reason to retrieve 200-field rows if only three attributes are needed.

Sybase IQ, KDB, and Vertica all use column-oriented tables.


198

Multidimensional Indexes -- data

Settings:
create table spatial_facts( a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int, a8 int, a9 int, a10 int, geom_a3_a7 mdsys.sdo_geometry );
create index r_spatialfacts on spatial_facts(geom_a3_a7) indextype is mdsys.spatial_index;
create bitmap index b2_spatialfacts on spatial_facts(a3, a7);

– 500000 rows; cold buffer
– Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

199

Multidimensional Indexes -- queries

Queries:
– Point queries
select count(*) from spatial_facts where a3 = 694014 and a7 = 928878;

select count(*) from spatial_facts where SDO_RELATE(geom_a3_a7, MDSYS.SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(694014, 928878, NULL), NULL, NULL), 'mask=equal querytype=WINDOW') = 'TRUE';

– Range queries
select count(*) from spatial_facts where SDO_RELATE(geom_a3_a7, mdsys.sdo_geometry(2003, NULL, NULL, mdsys.sdo_elem_info_array(1,1003,3), mdsys.sdo_ordinate_array(10,800000,1000000,1000000)), 'mask=inside querytype=WINDOW') = 'TRUE';

select count(*) from spatial_facts where a3 > 10 and a3 < 1000000 and a7 > 800000 and a7 < 1000000;

200

Multidimensional Indexes

• Oracle 8i on Windows 2000
• Spatial extension:
– 2-dimensional data
– Spatial functions used in the query
• The R-tree does not perform well because of the overhead of the spatial extension.

Chart: response time (sec) for the point query and the range query, bitmap on two attributes vs. R-tree.

201

Multidimensional Indexes

R-Tree:
SELECT STATEMENT
  SORT AGGREGATE
    TABLE ACCESS BY INDEX ROWID SPATIAL_FACTS
      DOMAIN INDEX R_SPATIALFACTS

Bitmaps:
SELECT STATEMENT
  SORT AGGREGATE
    BITMAP CONVERSION COUNT
      BITMAP AND
        BITMAP INDEX SINGLE VALUE B_FACT7
        BITMAP INDEX SINGLE VALUE B_FACT3

202

Materialized Views -- data

Settings:
orders( ordernum, itemnum, quantity, storeid, vendor );
create clustered index i_order on orders(itemnum);

store( storeid, name );

item( itemnum, price );
create clustered index i_item on item(itemnum);

– 1000000 orders, 10000 stores, 400000 items; Cold buffer
– Oracle 9i
– Pentium III (1 GHz, 256Kb), 1Gb RAM, Adapter 39160 with 2 channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.

203

Materialized Views -- data

Settings:
create materialized view vendorOutstanding
build immediate
refresh complete
enable query rewrite
as
select orders.vendor, sum(orders.quantity*item.price)
from orders, item
where orders.itemnum = item.itemnum
group by orders.vendor;

204

Materialized Views -- transactions

Concurrent Transactions:
– Insertions
insert into orders values (1000350,7825,562,'xxxxxx6944','vendor4');

– Queries
select orders.vendor, sum(orders.quantity*item.price)
from orders, item
where orders.itemnum = item.itemnum
group by orders.vendor;

select * from vendorOutstanding;

205

Materialized Views

• Graph:
– Oracle 9i on Linux
– Total sales by vendor is materialized
• Trade-off between query speed-up and view maintenance:
– The impact of incremental maintenance on performance is significant.
– Rebuild maintenance achieves a good throughput.
– A static data warehouse offers a good trade-off.

Charts: throughput (statements/sec) for materialized view (fast on commit), materialized view (complete on demand), and no materialized view; and response time (sec) for queries with vs. without the materialized view.

206

Materialized View Maintenance

• Problem when there is a large number of views to maintain.
• The order in which views are maintained is important:
– A view can be computed from an existing view instead of being recomputed from the base relations (total per region can be computed from total per nation).
• Let the views and base tables be nodes v_i.
• Let there be an edge from v_1 to v_2 if it is possible to compute the view v_2 from v_1. Associate the cost of computing v_2 from v_1 with this edge.
• Compute all-pairs shortest paths where the start nodes are the set of base tables.
• The result is an acyclic graph A. Take a topological sort of A and let that be the order of view construction (see the sketch after the example below).

207

Materialized View Example

• Detail(storeid, item, qty, price, date)
• Materialized view1(storeid, category, qty, month)
• Materialized view2(city, category, qty, month)
• Materialized view3(storeid, category, qty, year)
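A sketch of the ordering idea using these example views. The edge costs are invented for illustration; the shortest-path tree rooted at the base tables tells each view which ancestor to refresh from, and refreshing in order of increasing cost respects that tree when costs are positive.

import heapq

def maintenance_order(base_tables, edges):
    # edges: {node: [(view computable from node, cost), ...]}; costs assumed positive
    dist = {t: 0 for t in base_tables}
    parent = {}
    heap = [(0, t) for t in base_tables]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for view, cost in edges.get(node, ()):
            if d + cost < dist.get(view, float("inf")):
                dist[view], parent[view] = d + cost, node
                heapq.heappush(heap, (dist[view], view))
    views = sorted((v for v in dist if v not in base_tables), key=dist.get)
    return [(v, parent[v]) for v in views]   # refresh v from parent[v], in this order

edges = {"detail": [("view1", 10)],
         "view1":  [("view2", 2), ("view3", 2)]}
print(maintenance_order({"detail"}, edges))
# [('view1', 'detail'), ('view2', 'view1'), ('view3', 'view1')]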


208

Approximations -- data

Settings:
– TPC-H schema
– Approximations:

insert into approxlineitem
select top 6000 *
from lineitem
where l_linenumber = 4;

insert into approxorders
select O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT
from orders, approxlineitem
where o_orderkey = l_orderkey;

209

Approximations -- queries

insert into approxsupplier
select distinct S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT
from approxlineitem, supplier
where s_suppkey = l_suppkey;

insert into approxpart
select distinct P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER, P_RETAILPRICE, P_COMMENT
from approxlineitem, part
where p_partkey = l_partkey;

insert into approxpartsupp
select distinct PS_PARTKEY, PS_SUPPKEY, PS_AVAILQTY, PS_SUPPLYCOST, PS_COMMENT
from partsupp, approxpart, approxsupplier
where ps_partkey = p_partkey and ps_suppkey = s_suppkey;

insert into approxcustomer
select distinct C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT
from customer, approxorders
where o_custkey = c_custkey;

insert into approxregion select * from region;
insert into approxnation select * from nation;

210

Approximations -- more queries

Queries:
– Single-table query on lineitem
select l_returnflag, l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from lineitem
where datediff(day, l_shipdate, '1998-12-01') <= '120'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

211

Approximations -- still more

Queries:
– 6-way join
select n_name, avg(l_extendedprice * (1 - l_discount)) as revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AFRICA'
and o_orderdate >= '1993-01-01'
and datediff(year, o_orderdate, '1993-01-01') < 1
group by n_name
order by revenue desc;

212

Approximation accuracy

• Good approximation for query Q1 on lineitem.
• The aggregated values obtained on a query with a 6-way join are significantly different from the actual values -- for some applications they may still be good enough.

Charts: approximated aggregate values as a percentage of the base aggregate value, for TPC-H query Q5 (6-way join, per group returned) and TPC-H query Q1 (lineitem, per aggregate value), using 1% and 10% samples.

213

Approximation Speedup

• Aqua approximation on the TPC-H schema
– 1% and 10% lineitem samples propagated.
• The query speed-up obtained with approximated relations is significant.

Chart: response time (sec) for TPC-H queries Q1 and Q5 on the base relations vs. the 1% and 10% samples.

214

Good Way to Find a Random Sample

• If you know how big your data is (i.e., it is not streaming), then the best way is to include each record with some probability p, drawn independently and randomly.
• If you don't know how big your data is, hash each record to a value in some range and choose the k lowest hash values.
• You can use a priority queue holding k elements: as each new element is inserted, perform a deletemax (a sketch follows below).
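A sketch of the second technique. The choice of hash function is ours, not prescribed by the slide; Python's heapq (a min-heap) simulates the max-heap by negating keys.

import hashlib, heapq

def h(record):
    return int(hashlib.sha1(repr(record).encode()).hexdigest(), 16)

def bottom_k_sample(stream, k=1000):
    heap = []                                    # max-heap via negated hash values
    for rec in stream:
        hv = h(rec)
        if len(heap) < k:
            heapq.heappush(heap, (-hv, rec))
        elif hv < -heap[0][0]:                   # smaller than the current maximum
            heapq.heapreplace(heap, (-hv, rec))  # the "deletemax" plus insert
    return [rec for _, rec in heap]

print(len(bottom_k_sample(range(100000), k=100)))   # 100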


215

Approximate Counts of Every Item

• Count-min idea.
• Imagine that you had a perfect hash function H (i.e., no collisions). Then you could simply increment location H(x) every time x appeared.
• You don't, so use several hash functions H1, H2, …, Hj.
• When you see x, increment locations H1(x), H2(x), …
• To estimate the count of x, look at the minimum of those locations (a sketch follows below).
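A minimal sketch of the count-min idea. The salted built-in hash stands in for the Hi; a production implementation would use pairwise-independent hash functions.

class CountMin:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, x):
        return [(i, hash((i, x)) % self.width) for i in range(self.depth)]

    def add(self, x):
        for i, j in self._cells(x):
            self.rows[i][j] += 1             # increment H1(x), H2(x), ..., Hj(x)

    def estimate(self, x):
        # collisions can only inflate a cell, so the minimum is the best estimate
        return min(self.rows[i][j] for i, j in self._cells(x))

cm = CountMin()
for item in ["a", "b", "a", "c", "a"]:
    cm.add(item)
print(cm.estimate("a"))                      # 3, barring collisions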


216

Keep Track of k Most Frequent Items

• Take the first k items.
• Each time you see an item x, if x is among the top k, increment the count of x.
• Otherwise, replace the item y having the lowest count among the top k by x, increment the count (formerly for y), and ascribe that count to x.
• Works (approximately) if items are presented as a random permutation. Remarkably robust. A sketch follows below.
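A sketch of that counter scheme (a space-saving-style heuristic), with a plain dictionary standing in for the top-k structure.

def top_k_counts(stream, k):
    counts = {}
    for x in stream:
        if x in counts:
            counts[x] += 1                       # x already among the top k
        elif len(counts) < k:
            counts[x] = 1
        else:
            y = min(counts, key=counts.get)      # lowest-count item among the top k
            counts[x] = counts.pop(y) + 1        # replace y by x and inherit its count
    return counts

print(top_k_counts(["a", "a", "b", "a", "c", "a", "b", "a"], k=2))   # {'a': 5, 'b': 3}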


217

Maintaining “typical” clusters for streaming data

• Find a sample using the sampling technique from before.
• Form clusters off-line.
• As points come in, associate them with the nearest centroid.
• Let the initial fraction assigned to centroid i be p(i).
• If that fraction changes a lot, then maybe the distribution has changed (a sketch follows below).
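A sketch of the drift check. Euclidean distance, the 10% drift threshold, and the warm-up length are arbitrary choices for illustration.

def nearest(point, centroids):
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

def drift_monitor(stream, centroids, baseline, threshold=0.10, warmup=100):
    counts = [0] * len(centroids)
    for seen, point in enumerate(stream, start=1):
        counts[nearest(point, centroids)] += 1
        fractions = [c / seen for c in counts]
        drift = max(abs(f - p) for f, p in zip(fractions, baseline))
        if seen > warmup and drift > threshold:
            yield seen, fractions               # the distribution may have changed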


218

Approximating Count Distinct

• Suppose you have one or more columns with duplicates. You want to know how many distinct values there are.
• Imagine that you could hash into (0,1). (Simulate this with a very large integer range.)
• Hashing will put duplicates in the same location and, if done right, will distribute values uniformly.
• Picture the hash values as dots between 0 and 1.


219

Approximating Count Distinct

• Suppose you hash every item and take the minimum 1000 hash values.

• Suppose that the highest of those 1000 hash values is f (between 0 and 1).

• Intuitively, if f is small then there are many distinct values.

• What is a good approximation of the number of distinct values?
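One standard answer is the k-minimum-values estimate: if f is the k-th smallest hash value in (0, 1), the number of distinct values is roughly (k - 1) / f. A minimal Python sketch, with SHA-1 standing in for the hash into (0, 1):

import hashlib
import heapq

def estimate_distinct(items, k=1000):
    smallest = []     # max-heap (negated) holding the k smallest distinct hashes
    in_heap = set()
    for item in items:
        h = int.from_bytes(hashlib.sha1(repr(item).encode()).digest()[:8],
                           "big") / 2.0**64          # hash into (0, 1)
        if h in in_heap:
            continue                                  # duplicate of a tracked value
        if len(smallest) < k:
            heapq.heappush(smallest, -h)
            in_heap.add(h)
        elif h < -smallest[0]:
            in_heap.discard(-heapq.heapreplace(smallest, -h))
            in_heap.add(h)
    if len(smallest) < k:
        return len(smallest)        # fewer than k distinct values seen: exact
    f = -smallest[0]                # the k-th smallest hash value
    return int((k - 1) / f)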


Tuning Distributed Applications

• Queries across multiple databases
– Federated data warehouse
– IBM's DataJoiner, now integrated into DB2 v7.2

• The sources should perform as much work as possible and return as little data as possible for processing at the federated server

• Processing data across multiple databases


A puzzle

• Two databases X and Y
– X records inventory data to be used for restocking
– Y contains delivery data about shipments

• Improve the speed of shipping by sharing data
– Certain data of X should be postprocessed on Y shortly after it enters X
– You want to avoid losing data from X and you want to avoid double-processing data on Y, even in the face of failures.


Two-Phase Commit

[Diagram: source X (coordinator and participant) and destination Y (participant) commit together.]

+ Commits are coordinated between the source and the destination
- If one participant fails, blocking can occur
- Not all db systems support a prepare-to-commit interface


Replication Server

+ Destination is within a few seconds of being up-to-date
+ Decision support queries can be asked on the destination db
- An administrator is needed when the network connection breaks!

[Diagram: replication from source X to destination Y.]


Staging Tables

+ No specific mechanism is necessary at the source or the destination
- Requires coordinating transactions on X and Y

[Diagram: data flows from source X to destination Y through a staging table.]


Issues in Solution

• A single thread can invoke a transaction on X, Y or both, but the thread may fail in the middle. A new thread will regain state by looking at the database.

• We want to delete data from the staging tables when we are finished.

• The tuples in the staging tables will represent an operation as well as data.


States

• A tuple (operation plus data) will start in state unprocessed, then state processed, and then deleted.

• The same transaction that processes the tuple on the destination site also changes its state. This ensures that each tuple's operation is done exactly once. (A sketch follows below.)
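A minimal sketch of that destination-side transaction, assuming a Python DB-API driver with %s parameters (e.g., psycopg2) and hypothetical names (staging table stage_i on Y with columns id, op, payload, state); it is not the exact schema of the slides:

def process_staged_rows(conn_y, apply_operation):
    cur = conn_y.cursor()
    cur.execute("SELECT id, op, payload FROM stage_i WHERE state = 'unprocessed'")
    for row_id, op, payload in cur.fetchall():
        apply_operation(cur, op, payload)   # the actual postprocessing work on Y
        cur.execute("UPDATE stage_i SET state = 'processed' WHERE id = %s",
                    (row_id,))
    # Applying each operation and flipping its state commit together, so a
    # replacement thread that rereads stage_i never reapplies a processed row.
    conn_y.commit()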


Staging Tables

[Diagram, steps 1 and 2: M1 reads unprocessed tuples from staging table S on source database X; M2 writes them into table I on destination database Y, where they are also marked unprocessed.]


Staging Tables

[Diagram, steps 3 and 4: M3 updates the destination, marking the tuples in table I as processed in the same transaction that processes them; M4 then deletes the processed tuples, from the source first and then from the destination, each as its own transaction (one transaction updates the source site, one updates the destination site).]


What to Look For

• Transactions are atomic but threads, remember, are not. So a replacement thread will look to the database for information.

• In which order should the final step do its transactions? Should it delete the unprocessed tuple from X as the first transaction or as the second?


Main-memory Databases

Database Tuning – Spring 2015

Philippe Bonnet – [email protected]


Agenda

1. Memory hierarchy

2. Cache-optimized data structures

3. Multi-version Concurrency Control

4. System Examples
a) New Architectures
b) SQL Server 2014 (Hekaton)


Memory Hierarchy


Memory Hierarchy

Source: http://www.3dincites.com/2015/01/iedm-2014-3d-short-course-highlights-3d-memory-cubes-system-design/


Main-memory Databases

• Memory has finite capacity: How to deal with space limitation?

• Memory is volatile: How to deal with persistence?

• Memory is fast: How can a DBMS take advantage?


Dealing with capacity

On Azure, G-series instances go up to 32 cores, 448 GiB of RAM, and 6,596 GB of local SSD storage


Dealing with Capacity

1. Many databases whose size is 10s or 100s of GB fit in RAM on a single host
a) The storage engine of SQL Server 2014 makes it possible to split a database into a collection of tables that are memory-optimized and a collection of tables that are disk-optimized (data resides on disk and is manipulated through the traditional buffer pool).

2. For databases that are larger than RAM, two options:
a) All data in RAM on a cluster of hosts (e.g., VoltDB)
b) Separate hot data in RAM from cold data on SSD on the same host (e.g., SQL Server 2014)


Dealing with Persistence

1. Check-in/Check-out
– Memory-optimized tables are loaded into memory when the database management system starts up
– They are stored on disk when the database management system shuts down

2. Swap in/Swap out
– Tables are swapped in when data gets hot and swapped out when data gets cold

What happens if data is modified? ...


Dealing with Persistence

Defining the database state:

• 2nd storage
– Traditional log-based approach
– Write-ahead logging protocol (remember that?)

• RAM and 2nd storage on a single host (SQL Server)
– Traditional log-based approach
– No need for write-ahead logging (see log tuning)

• RAM on multiple hosts (VoltDB)
– Recovery is needed when a host fails
– Checkpointing to save consistent states
– Command log to replay history and recover a failed host (see log tuning)


Dealing with Persistence

• Restrict (greatly) the modifications that are allowed on memory-optimized tables:
– No insertions/deletions
– No updates
– No new index
– No concurrent modifications, only short transactions


Leveraging RAM

• Using actual pointers (that refer to memory addresses), rather than logical IDs and offsets within buffer pool pages

• Variable-size data structures, rather than fixed-size pages
– Using an index structure to represent a collection of tuples
• SQL Server 2014 relies on hash indexes


Leveraging RAM

• Traditional focus is on RAM/disk transfers

• With main-memory databases, the focus is on cache/RAM transfers:
1. Maximize cache utilization once data is in the cache
   • Column-based representation (see storage tuning)
   • Cache-conscious indexes (see index tuning)
   • Staged query processing
2. Avoid concurrent accesses on cache lines
   • Introduce optimistic concurrency control
3. Trade more instructions for fewer cache transfers
   • Use compression


Avoiding Concurrent Accesses on Cache Lines

Three approaches:

1. Enforce serial transaction execution (VoltDB)
– No need for concurrency control

2. Explicit conflict declaration/discovery when transactions start

3. Conflict discovery when transactions commit
a) Optimistic concurrency control
b) Multi-Version Concurrency Control (SQL Server)


Troubleshooting Techniques (*)

• Extraction and Analysis of Performance Indicators
– Consumer-producer chain framework
– Tools
• Query plan monitors
• Performance monitors
• Event monitors

(*) From Alberto Lerner's chapter


A Consumer-Producer Chain of a DBMS’s Resources

[Diagram: SQL commands flow through the high-level consumers (parser, optimizer, execution subsystem), which draw on intermediate resources/consumers (disk subsystem, cache manager, logging subsystem, locking subsystem), which in turn consume the primary resources (network, disk/controller, CPU, memory). Probing spots for performance indicators sit at each level.]


Recurrent Patterns of Problems

• An overloading high-level consumer

• A poorly parameterized subsystem

• An overloaded primary resource

Effects are not always felt first where the cause is!


A Systematic Approach to Monitoring

Extract indicators to answer the following questions:

• Question 1: Are critical queries being served in the most efficient manner?

• Question 2: Are subsystems making optimal use of resources?

• Question 3: Are there enough primary resources available?


Investigating High Level Consumers

• Answer question 1: “Are critical queries being served in the most efficient manner?”

1. Identify the critical queries

2. Analyze their access plans

3. Profile their execution


Identifying Critical Queries

Critical queries are usually those that:

• Take a long time

• Are frequently executed

Often, a user complaint will tip us off.


Event Monitors to Identify Critical Queries

• If no user complains...

• Capture usage measurements at specific events (e.g., the end of each query) and then sort by usage (see the sketch below)

• Less overhead than other types of tools because the indicators are usually by-products of the events being monitored

• Typical measures include CPU used, IO used, locks obtained, etc.
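A minimal sketch of the sort-by-usage step, assuming the event monitor's per-statement records were exported to a CSV file with hypothetical columns statement, cpu_ms, io_reads, elapsed_ms:

import csv
from collections import defaultdict

def top_consumers(path, key="cpu_ms", n=10):
    totals = defaultdict(lambda: {"cpu_ms": 0.0, "io_reads": 0,
                                  "elapsed_ms": 0.0, "executions": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            agg = totals[row["statement"]]
            agg["cpu_ms"] += float(row["cpu_ms"])
            agg["io_reads"] += int(row["io_reads"])
            agg["elapsed_ms"] += float(row["elapsed_ms"])
            agg["executions"] += 1
    # The critical queries are those with the highest total usage.
    return sorted(totals.items(), key=lambda kv: kv[1][key], reverse=True)[:n]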


DBMS Components

[Diagram: the query processor (parser, compiler, execution engine) sits on top of concurrency control, recovery, and the storage subsystem (indexes, buffer manager).]

@ Dennis Shasha and Philippe Bonnet, 2013


DB2 9.7 Process Architecture

@ Dennis Shasha and Philippe Bonnet, 2013 Source: http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/00003525.gif

Exercise 1: Is intra-query parallelism possible with this process model? In other words, can a query be executed in parallel within the same instance (or partition)?


DB2 10.1 Process Architecture

Source: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.perf.doc%2Fdoc%2Fc0008930.html

@ Dennis Shasha and Philippe Bonnet, 2013

No need to know much more about the process abstractions. We will cover much more on the memory abstractions (log tuning) and on the communication abstractions (tuning the application interface, tuning across instances).


MySQL Architecture

@ Dennis Shasha and Philippe Bonnet, 2013

Source: http://docs.oracle.com/cd/E19957-01/mysql-refman-5.5/storage-engines.html


Experimental Framework

1. System
– Application + DBMS + OS + HW
– Parameters (fixed/factors)

2. Metrics
– Throughput / Response Time
– DBMS performance indicators
– OS performance indicators

3. Workload
– Actual users (production), a replayed trace, or a synthetic workload (e.g., a TPC benchmark)

4. Experiments
– What factor to vary?

@ Dennis Shasha and Philippe Bonnet, 2013

Exercise 2: Is throughput always the inverse of response time?

Exercise 3: Define an experiment to measure the write throughput of the file system on your laptop. (A sketch follows below.)
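For Exercise 3, one possible experiment, sketched in Python: write a file sequentially in fixed-size blocks, force it to stable storage with fsync, and report MB/s. The file name, block size, and total size are the factors to vary; the defaults below are arbitrary:

import os
import time

def write_throughput(path="bench.tmp", total_mb=256, block_kb=64):
    block = os.urandom(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())        # make sure the data reaches the device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return (n_blocks * block_kb / 1024) / elapsed   # MB per second

if __name__ == "__main__":
    print(f"{write_throughput():.1f} MB/s")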


Example

Statement number: 1

select C_NAME, N_NAME

from DBA.CUSTOMER join DBA.NATION on C_NATIONKEY = N_NATIONKEY

where C_ACCTBAL > 0

Number of rows retrieved is: 136308

Number of rows sent to output is: 0

Elapsed Time is: 76.349 seconds

Buffer pool data logical reads = 272618

Buffer pool data physical reads = 131425

Buffer pool data writes = 0

Buffer pool index logical reads = 273173

Buffer pool index physical reads = 552

Buffer pool index writes = 0

Total buffer pool read time (ms) = 71352

Total buffer pool write time (ms) = 0

Summary of Results

==================

Elapsed Agent CPU Rows Rows

Statement # Time (s) Time (s) Fetched Printed

1 76.349 6.670 136308 0

The output above comes from the DBMS; at the OS level, use Windows' Performance Monitor or Linux's iostat and vmstat.

More on this in tuning the guts.


An example Event Monitor

• CPU indicators sorted by Oracle's Trace Data Viewer

• Similar tools: DB2’s Event Monitor and MSSQL’s Server Profiler


An example Plan Explainer

• Access plan according to MSSQL’s Query Analyzer

• Similar tools: DB2’s Visual Explain and Oracle’s SQL Analyze Tool


Finding Strangeness in Access Plans

What to pay attention to in a plan

• Access paths for each table

• Sorts or intermediary results (join, group by, distinct, order by)

• Order of operations

• Algorithms used in the operators


To Index or not to index?

select c_name, n_name from CUSTOMER join NATION on c_nationkey = n_nationkey where c_acctbal > 0

Which plan performs best? (nation_pk is a non-clustered index over n_nationkey, and similarly acctbal_ix over c_acctbal)


Non-clustering indexes can be trouble

For an unselective predicate (one that matches a large fraction of the rows), each access to the index generates a random access to the table – possibly to a page already read! The number of table pages read can end up greater than the table's size, i.e., a table scan is far better.

                       Table Scan       Index Scan
CPU time               5 sec            76 sec
data logical reads     143,075 pages    272,618 pages
data physical reads    6,777 pages      131,425 pages
index logical reads    136,319 pages    273,173 pages
index physical reads   7 pages          552 pages


An example Performance Monitor (query level)

• Buffer and CPU consumption for a query according to DB2’s Benchmark tool

• Similar tools: MSSQL’s SET STATISTICS switch and Oracle’s SQL Analyze Tool

Statement number: 1

select C_NAME, N_NAME

from DBA.CUSTOMER join DBA.NATION on C_NATIONKEY = N_NATIONKEY

where C_ACCTBAL > 0

Number of rows retrieved is: 136308

Number of rows sent to output is: 0

Elapsed Time is: 76.349 seconds

Buffer pool data logical reads = 272618

Buffer pool data physical reads = 131425

Buffer pool data writes = 0

Buffer pool index logical reads = 273173

Buffer pool index physical reads = 552

Buffer pool index writes = 0

Total buffer pool read time (ms) = 71352

Total buffer pool write time (ms) = 0

Summary of Results

==================

Elapsed Agent CPU Rows Rows

Statement # Time (s) Time (s) Fetched Printed

1 76.349 6.670 136308 0


An example Performance Monitor (system level)

• An IO indicator’s consumption evolution (qualitative and quantitative) according to DB2’s System Monitor

• Similar tools: Windows’ Performance Monitor and Oracle’s Performance Manager


Investigating High Level Consumers: Summary

[Flowchart: find critical queries; if none are found, investigate the lower levels. Otherwise answer Q1 over them; if there is over-consumption, tune the problematic queries, else investigate the lower levels.]


Investigating Primary Resources

• Answer question 3: “Are there enough primary resources available for a DBMS to consume?”

• Primary resources are: CPU, disk & controllers, memory, and network

• Analyze specific OS-level indicators to discover bottlenecks.

• A system-level Performance Monitor is the right tool here


CPU Consumption Indicators at the OS Level

[Chart: CPU % of utilization over time, showing total usage and system usage against 70% and 100% reference lines.]

• Sustained utilization over 70% should trigger the alert.
• System utilization shouldn't be more than 40%.
• The DBMS (on a non-dedicated machine) should be getting a decent time share.


Disk Performance Indicators at the OS Level

[Diagram: new requests enter a wait queue before the disk; the indicators are average queue size and disk transfers per second.]

• The average queue size should be close to zero.
• Wait times should also be close to zero.
• Idle disk with pending requests? Check controller contention.
• Also, transfers should be balanced among disks/controllers.


Memory Consumption Indicators at the OS Level

[Diagram: virtual memory is backed by real memory and the pagefile.]

• Page faults per unit of time should be close to zero. If paging happens, at least it should not hit DB cache pages.
• The % of the pagefile in use will tell you how much memory is “needed.”


Investigating Intermediate Resources/Consumers

• Answer question 2: “Are subsystems making optimal use of resources?”

• Main subsystems: Cache Manager, Disk subsystem, Lock subsystem, and Log/Recovery subsystem

• Similarly to Q3, extract and analyze relevant Performance Indicators


Cache Manager Performance Indicators

[Diagram: a table scan calls readpage() against the cache manager, which maintains data pages, free page slots, a pick-victim strategy, and issues page reads/writes.]

• If a page is not in the cache, a readpage (logical) generates an actual IO (physical). The fraction of readpages that did not generate a physical IO should be 90% or more (the hit ratio).
• Pages are regularly saved to disk to make free space. The number of free slots should always be > 0.
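A minimal sketch of the hit-ratio computation, applied to the data-page counters from the DB2 benchmark output shown earlier:

def hit_ratio(logical_reads, physical_reads):
    # Fraction of logical reads served without a physical IO.
    if logical_reads == 0:
        return 1.0
    return (logical_reads - physical_reads) / logical_reads

# Data pages from the earlier example: hit_ratio(272618, 131425) is about 0.52,
# well below the 90% target.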


Disk Manager Performance Indicators

[Diagram: simplified storage hierarchy – rows within pages, pages within extents, extents within files, files on disks.]

• Displaced rows (moved to other pages) should be kept under 5% of rows.
• Free space fragmentation: pages with little space should not be in the free list.
• Data fragmentation: ideally, files that store DB objects (table, index) should be in one or few (<5) contiguous extents.
• File position: should balance the workload evenly among all disks.


Lock Manager Performance Indicators

[Diagram: lock requests (object, lock type, transaction ID) go on the lock list or on the pending-locks list.]

• Deadlocks and timeouts should seldom happen (no more than 1% of the transactions).
• Lock wait time for a transaction should be a small fraction of the whole transaction time.
• The number of lock waits should be a small fraction of the number of locks on the lock list.


Investigating Intermediate and Primary Resources: Summary

[Flowchart: answer Q3; if there are problems at the OS level, tune the low-level resources, else answer Q2; if there are problematic subsystems, tune those subsystems, else investigate the upper level.]


Troubleshooting Techniques

• Monitoring a DBMS's performance should be based on queries and resources.
– The consumption chain helps distinguish problems' causes from their symptoms
– Existing tools help extract relevant performance indicators


Recall Tuning Principles

• Think globally, fix locally (troubleshoot to see what matters)

• Partitioning breaks bottlenecks (find parallelism in processors, controllers, caches, and disks)

• Start-up costs are high; running costs are low (batch size, cursors)

• Be prepared for trade-offs (unless you can rethink the queries)