
DB2 LUW Internals for DBAs : Part I & II


Page 1: DB2 LUW Internals for DBAs : Part I & II

#IDUG

DB2 LUW Internals for DBAs : Part I & II, Updated with Columnar-Organized Table Information

Matt Huras & Jim Seeger, IBM

Wed Sept 10 : 11:00am-12:00pm (part 1) and 1:00pm-2:00pm (part 2)

Sessions : C01 and C02. Platform: Linux, UNIX, Windows

Page 2: DB2 LUW Internals for DBAs : Part I & II

Agenda

Part 1
• Architecture Overview
• Storage Architecture
  • Tablespace Internals
  • Automatic Storage
  • Storage Groups
  • Space Reclamation
• Table Management
  • Tables, Records, Indexes
  • Page Format, Space Management
  • Row Compression (including Adaptive Compression)
  • Currently Committed
  • Insert Time Clustered Tables

Part 2
• Columnar (aka BLU) Tables & Compression
• Logging and I/O (time permitting)
  • Logging and Recovery Mechanisms
  • Log Archival Compression
  • I/O Mechanisms
• Appendix (will not have time to present these; but material is in the IDUG proceedings for you to peruse later)
  • Process/Thread Model Details
  • Index Management and Compression
  • Memory Management
  • Smart Prefetching
  • MDC Tables
  • More Detailed Tablespace Parameter Setting Example

Page 3: DB2 LUW Internals for DBAs : Part I & II

Architecture Overview

SMP Exploitation
• All CPUs exploited through OS threads and processes

Parallelism
• SQL and Utilities
• Inter- & Intra-Partition Parallelism
• Cost-based Optimizer with Query Rewrite

Very Large Memory Exploitation
• 64-bit Support
• I/O Buffering
• Multiple Buffer Pools

I/O Subsystem
• Asynchronous, Parallel I/O
• Automatic, Intelligent Data Striping with Parallel I/O
• Big block I/O
• Scatter/Gather I/O

[Diagram: clients connect to a DB2 server; a coordinator agent and parallel subagents run across the CPUs and work against the buffer pools, log buffer, package cache, and lock list, with the log writer, prefetchers, and page cleaners driving I/O against the tablespace containers and log disks.]

Page 4: DB2 LUW Internals for DBAs : Part I & II

Example: SELECT * FROM T1 ORDER BY …

[Diagram: DB2 server with clients, coordinator agent, parallel subagents, buffer pools, log buffers, logger, prefetchers, page cleaners, package cache, and lock list; extents flow from tablespace containers into the buffer pool.]

1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins; subagents kicked off to perform parallel scan and sort
5. Periodic async prefetch requests sent to prefetchers (aka 'ioservers')
6. Prefetchers asynchronously drive parallel I/O against tablespace containers to bring extents from disk into separate pages in the buffer pool
7. Rows read out of buffer pool and sorted in shared memory by subagents
8. Sorted rows sent back to client by coordinator agent

Page 5: DB2 LUW Internals for DBAs : Part I & II

Example (BLU column-organized table): SELECT A FROM T1 WHERE B='5'

… Initial steps skipped …

[Diagram: same DB2 server components; columnar storage, where each extent contains values for one column (columns A, B, C, D shown).]

1. Access plan execution begins; subagents kicked off to perform parallel scan of column B (*)
2. Periodic prefetch requests sent to prefetchers (aka 'io servers')
3. Prefetchers asynchronously drive parallel I/O against tablespace containers to bring requested pages from disk into separate pages in the buffer pool
4. Batch of B values is read out of the buffer pool and compared to '5' (the data remains compressed - "actionable" compression), forming a batch of qualifying tuple sequence numbers (TSNs)
5. The A values corresponding to the batch of qualifying TSNs are prefetched
6. Qualifying A values added to the result set, …
7. … and sent through the coordinator agent to the client

(*) Synopsis filtering not shown here; more on this later.

Page 6: DB2 LUW Internals for DBAs : Part I & II

Example (row-organized table): INSERT INTO T1 (…)

[Diagram: same DB2 server components; the newly inserted row lands on a data page in the buffer pool, dirtying it.]

1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. Agent searches for a page in the table large enough for the row (more on this later)
6. Page found, and read into buffer pool
7. Agent acquires X lock on row
8. Agent writes log record to log buffer in memory (describes how to redo and undo the upcoming insert)
9. Agent inserts record into page in buffer pool ("dirties" page)
10. Success sent to client

Page 7: DB2 LUW Internals for DBAs : Part I & II

Example (BLU column-organized table): INSERT INTO T1 (…)

[Diagram: same DB2 server components; columnar storage, where each extent contains values for one column (columns A, B, C, D shown).]

1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. For each column, agent finds a page large enough for the column value (BLU uses an append approach - more on this later)
6. Pages found, and read into buffer pool
7. Agent acquires X lock on logical row
8. Agent writes log records to log buffer in memory (describe how to redo and undo per column)
9. Agent inserts column tuples into pages in buffer pool ("dirties" pages)
10. Success sent to client

(*) Synopsis maintenance not shown here; more on this later.

Page 8: DB2 LUW Internals for DBAs : Part I & II

COMMIT Processing

[Diagram: same DB2 server components; the commit log record moves from the log buffer to the log disk.]

1. COMMIT SQL statement sent over network to coordinator agent
2. Agent writes commit log record to log buffer
3. Agent waits for logger to write log buffer (up to and including the commit log record) to disk (if not already done)
4. Logger gets around to writing needed log buffers to log disk (at this point, the transaction is durable)
5. Logger posts all agents that are waiting for 'hardening' of the log records just written to the log disk
6. Lock released
7. Success sent to client

Page 9: DB2 LUW Internals for DBAs : Part I & II

Crash Recovery

[Diagram: DB2 server components; log records are read from the active logs and replayed against pages in the buffer pool.]

1. Client tries to CONNECT, RESTART or ACTIVATE the database
2. Agent realizes the database is in an inconsistent state, so initiates crash recovery
3. Log reader reads "active" log records into the log buffer
4. Subagents "redo" log records in parallel
5. For each log record, read the target page into the buffer pool, and, …
6. … redo the action specified in the log record (if it's not already reflected in the page)
7. After the redo phase, the "undo" phase will undo any actions done for transactions that did not commit before the system crash
8. When redo and undo complete, the database is open for other clients, and success is returned

Page 10: DB2 LUW Internals for DBAs : Part I & II

Imagine Multiple Clients INSERTing (or UPDATEing or DELETEing)

[Diagram: many agents inserting, updating and deleting concurrently; the buffer pool is full of dirty ('D') pages.]

The buffer pool is full of dirty pages. What happens when an agent tries to insert into (yet) another page?

Page 11: DB2 LUW Internals for DBAs : Part I & II

Next INSERT : "WAL" & "Dirty Steals"

[Diagram: buffer pool full of dirty pages; a victim page is written to disk after its log records are hardened.]

1. INSERT SQL statement sent over network to coordinator agent
2. (Skip SQL compilation, optimization, access plan management, etc.)
3. Agent finds a page with enough space, and tries to read it into the buffer pool
4. Buffer pool manager chooses a 'victim' page. It tries to choose a clean LRU page using the 'clock' algorithm. If it can't find a clean page, it will choose a dirty victim.
5. The dirty victim must be written to disk. However, before that can be done, the associated log record must be written to the log disk (Why?). Note, this policy is called "WAL" (or "write ahead logging"). 5a. The logger posts interested agents.
6. Now the agent writes the dirty victim page to disk.
7. Now (finally) the target page can be read into the buffer pool and updated.

Page 12: DB2 LUW Internals for DBAs : Part I & II

Page Cleaners

[Diagram: page cleaners write dirty pages from the buffer pool to disk while agents continue inserting, updating and deleting.]

Page cleaners write dirty pages to disk in the background. Why? … Performance:
• Insert/Update/Deletes don't wait
• More efficient batch I/O
• Avoid Dirty Steals

Page 13: DB2 LUW Internals for DBAs : Part I & II

Database Partitioning Feature

[Diagram: clients see a single database view across Partition 1, Partition 2, … Partition N, each with its own CPUs.]

Partitions are Logical
• Any number of partitions can be created on a single physical machine (works extremely well with NUMA architectures)

Shared-Nothing Architecture Allows Virtually Unlimited Scalability
• Each partition owns its resources (buffer pool, locks, disks, …)
• Avoids common limits on scalability:
  • No need for a distributed lock manager or buffer coherence protocols
  • No need to attach disks to multiple machines
• Partitions communicate only necessary tuples
  • Using shared memory (same machine)
  • Using high speed communication (different machines)

Virtually Everything Runs in Parallel Across Nodes
• SQL: queries, inserts, updates, deletes
• Utilities: Backup, Restore, Load, Index Create, Reorg

Page 14: DB2 LUW Internals for DBAs : Part I & II

Architecture Overview : pureScale

[Diagram: clients see a single database view across multiple members connected by a high-speed interconnect to primary and secondary Cluster Caching Facilities (each holding a GBP and GLM) and to shared data.]

Cluster Caching Facility (CF)
• Provides a global lock manager (GLM)
• Provides another level of buffer pool (GBP) above disk
• Redundant CFs kept in sync with each other through duplexing

Shared Data Architecture
• Members have equal access to database storage
• Clients connect to any member and get completely coherent data access
• Members co-operate with each other and the CF to keep concurrent data access coherent
• Per-member logs

Page 15: DB2 LUW Internals for DBAs : Part I & II

pureScale : INSERT INTO T1 (…)

[Diagram: a member's agent works against its local buffer pools, log buffers and lock list, and against the CF's GBP and GLM. All traffic to/from the CF is done through in-memory communications (RDMA); the GBP provides a second buffering layer and reduces disk I/O.]

1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. Agent searches for a page in the table large enough for the row (more on this later)
6. Large-enough page found, and read into buffer pool
  a) Read from GBP memory
  b) If not in GBP: read from disk
7. Agent acquires global X locks (on page and row) if not already held by this member
8. Agent writes log record to log buffer in memory (describes how to redo and undo the upcoming insert)
9. Agent inserts record into page in buffer pool ("dirties" page)
10. Success sent to client

Page 16: DB2 LUW Internals for DBAs : Part I & II

pureScale : COMMIT Processing "FAC"

[Diagram: the member's agent, log buffers and logger interact with the CF's GBP and GLM. After commit, the GBP is guaranteed to contain the updated pages - affordable due to RDMA (in-memory communication). This FAC (force-at-commit) strategy keeps the recovery window very small.]

1. COMMIT SQL statement sent over network to coordinator agent
2. Agent writes commit log record to log buffer
3. Agent ensures pages dirtied by the transaction are in the GBP
  a) Page lock released to CF asynchronously
4. Agent waits for logger to write log buffer (up to and including the commit log record) to disk (if not already done)
5. Logger gets around to writing needed log buffers to log disk (at this point, the transaction is durable)
6. Logger posts all agents that are waiting for 'hardening' of the log records just written to the log disk
7. Lock released
  a) Row lock released to CF asynchronously
8. Success sent to client

Page 17: DB2 LUW Internals for DBAs : Part I & II

pureScale : Member Crash Recovery (MCR)

[Diagram: pureScale recovery automation processes drive MCR on a member; pages are read from the GBP or disk. MCR is online (the database remains available through other members) and extremely fast, because all committed updates are in the GBP and do not need to be redone (i.e. the number of "active" log records is very small).]

1. pureScale automatically starts MCR when possible (e.g. failed machine available again)
2. Agent initiates MCR
3. Log reader reads "active" log records into the log buffer
4. Subagents "redo" log records in parallel
5. For each log record, read the target page into the buffer pool from GBP or disk (in this case, the GBP has the page), and, …
6. … redo the action specified in the log record (if it's not already reflected in the page - in this case it is already reflected)
7. After the redo phase, the "undo" phase will undo any actions done for transactions that did not commit before the member crash

Page 18: DB2 LUW Internals for DBAs : Part I & II

Agenda

• Architecture Overview
• Storage Architecture
  • Tablespace Internals
  • Automatic Storage
  • Storage Groups
  • Space Reclamation
• Table Management
  • Tables, Records, Indexes
  • Page Format, Space Management
  • Row Compression (including Adaptive Compression)
  • Currently Committed
  • Insert Time Clustered Tables
• Columnar Tables & Compression
• Logging and I/O
  • Logging and Recovery Mechanisms
  • Log Archival Compression
  • I/O Mechanisms
• (Time permitting) Process/Thread Model
  • Base Processing Model
  • Concentrator
  • Intra Parallel Controls
  • WLM Dispatcher
• Appendix (will not have time to present these; but material is in the IDUG proceedings for you to peruse later)
  • Index Management and Compression
  • Memory Management
  • Smart Prefetching
  • MDC Tables
  • More Detailed Tablespace Parameter Setting Example

Page 19: DB2 LUW Internals for DBAs : Part I & II

SMS Tablespaces

What happens on disk when creating and populating an SMS tablespace?

db2 create tablespace TS1 managed by system
    using ('/mydir1', '/mydir2') extentsize 4 prefetchsize 8
db2 create table T1 (c1 int ...) in TS1
db2 create table T2 (c1 float ...) in TS1

[Diagram: files ./SQL00002.DAT (T1) and ./SQL00003.DAT (T2) appear under both /mydir1 and /mydir2; T1's and T2's data pages are spread across the two directories, with the first and second extents (extentsize 4) of T1's data pages highlighted.]

DB2 10: Permanent user-created SMS tablespaces are deprecated in DB2 10. Don't be alarmed - they remain fully supported. Deprecating them means we are advising customers to use automatic storage tablespaces.

Page 20: DB2 LUW Internals for DBAs : Part I & II

DMS Tablespaces

What happens on disk when creating and populating a DMS tablespace?

db2 create tablespace TS2
    managed by database
    using (file '/myfile' 1024,
           device '/dev/rhd7' 2048)
    extentsize 4 prefetchsize 8
db2 create table T1 (c1 int ...) in TS2
db2 create table T2 (c1 float ...) in TS2

[Diagram - Container (physical) address map: the file container /myfile (1024 4K pages) and the raw device container /dev/rhd7 (2048 4K pages) each begin with a container 'tag'; extents of 4 pages are striped across the two containers.]

[Diagram - Tablespace (logical) address map (extents 0, 1, 2, …): tablespace header, first SMP (space map) extent, object table, extent map for T1, first two extents of data pages for T1, extent map for T2, first extent of data pages for T2, another extent of data pages for T1, second SMP extent, …]

DB2 10: Permanent user-created DMS tablespaces are deprecated in DB2 10. Don't be alarmed - they remain fully supported. Deprecating them means we are advising customers to use automatic storage tablespaces.

Page 21: DB2 LUW Internals for DBAs : Part I & II

DMS Tablespaces with Auto Resize

/data1fs /data2fs

DB2 CREATE TABLESPACE MYTS MANAGED BY DATABASE USING

(FILE ‘/data1fs/mytsC1’ 10000,

FILE ‘/data2fs/mytsC2’ 10000) AUTORESIZE YES

DB2 CREATE TABLESPACE YOURTS MANAGED BY DATABASE USING

(FILE ‘/data1fs/yourtsC1’ 10000,

FILE ‘/data2fs/yourtsC2’ 10000) AUTORESIZE YES

INCREASESIZE 50M MAXSIZE 1G

Page 22: DB2 LUW Internals for DBAs : Part I & II

Auto-Resize DMS Tablespace Details : Example of Auto-Growth Stopping

[Diagram: an auto-resize tablespace created with 2 containers (C0 on /data1fs, C1 on /data2fs) grows in steps until the file system holding C1 fills up and auto-growth stops.]

How do you kick-start auto-growth again?
1. Make more room available on the file system holding C1
   chfs -a size=nnnnn …
2. Add a new stripe set (recommended if #1 is not possible)
   ALTER TABLESPACE … BEGIN NEW STRIPE SET
   Adding the new stripe set results in a new range being created in the tablespace map. Hence, auto-resize will only extend these new containers (C2, C3) from here on.
3. Extend C0 by some amount (reduces striping)
   ALTER TABLESPACE … EXTEND C0 …
   Extending C0 results in a new range being created that holds only that one container. Hence, auto-resize will only extend that one container.
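As a concrete sketch of option 2 for the MYTS tablespace from the previous slide, something like the following could be used; the /data3fs and /data4fs paths and the 10000-page sizes are illustrative assumptions, not from the original deck:

db2 "ALTER TABLESPACE MYTS
     BEGIN NEW STRIPE SET (FILE '/data3fs/mytsC3' 10000,
                           FILE '/data4fs/mytsC4' 10000)"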

Page 23: DB2 LUW Internals for DBAs : Part I & II

Automatic Storage Tablespaces

• Provide a single point of storage management for multiple tablespaces
• Tablespaces grow automatically by drawing storage increments from this common storage pool
• Do not define physical storage (unlike SMS and DMS tablespaces)
  • Instead, storage is defined by the database storage paths, or, in DB2 10, one or more storage groups (more on this coming up)
• Retain other useful tablespace properties, e.g.:
  • Grouping tables and indexes together that are logically related, for convenience in recovery operations
  • Keeping unrelated tables and indexes separate, for granularity and flexibility in recovery operations
  • Placing tables, indexes and other storage objects in particular buffer pools, for performance fine-tuning
  • Tailoring pagesize, prefetchsize and extentsize per table/index/object
• Use DMS mechanics internally

[Diagram: with DMS tablespaces, each tablespace in database "X" maps to its own containers; with automatic storage tablespaces, tablespaces "A", "B" and "C" all draw from a common pool of storage paths on file systems.]

Page 24: DB2 LUW Internals for DBAs : Part I & II

Automatic Storage Tablespaces

What happens on disk when creating/populating automatic storage tablespaces?

db2 create database mydb on '/a/b/c', '/d/e/f', '/g/h/i', '/k/l/m'
db2 create tablespace TS3
db2 create tablespace TS4
db2 create table T1 (c1 int ...) in TS3
db2 load insert into T1 …
db2 create table T2 (c1 int ...) in TS4
db2 load insert into T2 …
db2 alter database add storage on '/n/o/p', '/q/r/s', '/t/u/v', '/w/x/y'
db2 create table T3 (c1 int ...) in TS3

[Diagram: containers for TS3 and TS4 are created and grown across the storage paths as the tables are loaded and as storage is added.]

Note: for simplicity, the chart depicts constant initial sizes and growth increments. Both default to 32M, but are configurable (either as a constant size, or as a % of the current tablespace size).

Page 25: DB2 LUW Internals for DBAs : Part I & II

DB2 10 : Storage Groups (STOGROUPS)

• Storage Group = named pool of automatic storage paths
  • Can have up to 256 per database
• Typical usage pattern
  • Create one storage group for each unique class of storage in your database
  • Create/assign tablespaces to the storage group that matches their performance & availability needs
• Benefit
  • Simplified management of multi-class storage
  • Manage just a small number of storage groups, instead of a large number of tablespaces

[Diagram: plain automatic storage tablespaces draw from a single pool of storage paths; with storage groups, tablespaces in database "X" are assigned to STOGROUP "HOT" (an SSD RAID array) or STOGROUP "COLD" (a SATA RAID array).]

Page 26: DB2 LUW Internals for DBAs : Part I & II

DB2 10 : Storage Groups (STOGROUPS)

Before DB2 10: tablespaces "A", "B" and "C" in database "X" all draw from a single pool of storage paths on file systems.

With DB2 10 and Storage Groups: after upgrade, the existing paths become "IBMSTOGROUP" (here a SATA RAID array); a new STOGROUP "HOT" is created on an SSD RAID array.

New tablespace "D" will contain very frequently referenced data … How can I optimize its I/O performance?

CREATE STOGROUP HOT ON '/a/b/c', '/d/e/f' OVERHEAD 2 DEVICE READ RATE 300
CREATE TABLESPACE D USING STOGROUP HOT
(The tablespace inherits the STOGROUP's I/O attributes.)

Wow, queries referencing tablespace D are really fast! Queries referencing tablespace "C" are also very important …

ALTER TABLESPACE C USING STOGROUP HOT
COMMIT
(An online rebalance is kicked off at commit; use MON_GET_REBALANCE_STATUS() to monitor it.)
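A hedged sketch of monitoring the rebalance mentioned above: the tablespace name 'C' comes from the slide, and -2 (all members) is an assumed argument value; the exact output columns vary by DB2 level.

db2 "SELECT * FROM TABLE(MON_GET_REBALANCE_STATUS('C', -2))"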

Page 27: DB2 LUW Internals for DBAs : Part I & II

Tablespace Parameters Best Practices

• PREFETCHSIZE
  • Tablespace parameter that defines the number of pages DB2 will try to read at a time, when prefetching
  • Recommendation : Set to AUTOMATIC (default)
• NUM_IOSERVERS
  • Database configuration parameter that defines the number of prefetchers active in the database
  • Recommendation : Set to AUTOMATIC (default)
• NUM_IOCLEANERS
  • Database configuration parameter that defines the number of page cleaners active in the database
  • Recommendation : Set to AUTOMATIC (default)
• EXTENTSIZE
  • Tablespace parameter that defines the number of pages in each unit of tablespace storage allocation
  • Recommendation
    • Set so that the extent spreads across all internal devices in a single container (e.g. a single RAID stripe)
    • Default of 32 is usually a good choice
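As a rough sketch (the database name MYDB and tablespace name MYTS are placeholders, not from the deck), these recommendations map to commands like:

db2 update db cfg for MYDB using NUM_IOSERVERS AUTOMATIC NUM_IOCLEANERS AUTOMATIC
db2 "ALTER TABLESPACE MYTS PREFETCHSIZE AUTOMATIC"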

Page 28: DB2 LUW Internals for DBAs : Part I & II

Automatic Storage Example

DB2 CREATE STOGROUP mysg ON '/data1fs', '/data2fs'
DB2 CREATE TABLESPACE MYTS PAGESIZE 16K USING STOGROUP mysg

[Diagram: the mount points /data1fs and /data2fs sit on LUNs, each a RAID 5 (4+P) array with a 64KB strip size; a RAID stripe spans the four data disks plus parity.]

Page 29: DB2 LUW Internals for DBAs : Part I & II

EXTENTSIZE

• Set EXTENTSIZE so that one extent I/O operation drives all spindles within at least 1 container
  • At least 1 full RAID data stripe (i.e. without parity), in this example
• Example : RAID 4+P, RAID strip size = 64KB; DB2 page size = 16KB

  EXTENTSIZE = 4 * 64KB / pagesize = 256KB / 16KB = 16 pages

[Diagram: /data1fs and /data2fs LUNs, each RAID 5 (4+P) with 64KB strips; one 256KB extent (16 pages of 16KB) covers a full data stripe in each container.]
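Continuing the worked example above, a hedged sketch of a tablespace definition that matches the 16-page extent (the stogroup mysg comes from the earlier Automatic Storage Example; a 16K buffer pool must already exist, and prefetch is left AUTOMATIC per the best-practice slide):

db2 "CREATE TABLESPACE MYTS PAGESIZE 16K EXTENTSIZE 16
     PREFETCHSIZE AUTOMATIC USING STOGROUP mysg"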

Page 30: DB2 LUW Internals for DBAs : Part I & II

Reclaimable Storage Tablespaces

• Tablespaces that can free embedded unused extents, and thereby lower the tablespace high water mark
• Applies to newly created DMS and Automatic Storage tablespaces starting from V9.7

Page 31: DB2 LUW Internals for DBAs : Part I & II

Without Reclaimable Storage

[Diagram: a tablespace holds internal metadata extents plus extents for Table 1, Table 2 and Table 3. After DROP TABLE 2 and DROP TABLE 3, the freed extents remain allocated to the tablespace but not to any table. ALTER TABLESPACE … REDUCE can only free space above the high water mark, so the embedded unused extents are left as trapped, unusable space.]

Page 32: DB2 LUW Internals for DBAs : Part I & II

With Reclaimable Storage

[Diagram: the same scenario, but after DROP TABLE 2 and DROP TABLE 3, ALTER TABLESPACE … REDUCE MAX moves the remaining used extents down, lowers the high water mark, and returns the freed space to the tablespace, leaving no trapped space.]

Page 33: DB2 LUW Internals for DBAs : Part I & II

Reclaimable Storage : Usage, Hints, Tips

• Use ALTER TABLESPACE .. REDUCE MAX or <size> to free trapped unused space
  • Moves used extents from higher addresses in the tablespace to unused lower addresses
  • Lowers the high water mark accordingly
  • Shrinks/removes containers to return space back to the automatic storage paths
• Can specify an amount (size) to free up, or specify as much as possible (MAX)
• Runs in the background
  • Works in batches, committing freed extents as it progresses
• The STOP option terminates a background REDUCE operation
• Can monitor progress with MON_GET_EXTENT_MOVEMENT_STATUS()

Syntax sketch:
ALTER TABLESPACE <tsname> REDUCE [ <size> [K | M | G | PERCENT] | MAX | STOP ]
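A hedged example of the flow above; the tablespace name TS_DATA and the -2 (all members) argument are placeholders, and the monitoring function's output columns vary by release:

db2 "ALTER TABLESPACE TS_DATA REDUCE MAX"
-- the extent movement runs in the background; check its progress with:
db2 "SELECT * FROM TABLE(MON_GET_EXTENT_MOVEMENT_STATUS('TS_DATA', -2))"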

Page 34: DB2 LUW Internals for DBAs : Part I & II

Reclaimable Storage : Usage, Hints, Tips

• FYI : Why only new tablespaces?
  • Index keys in existing tablespaces point to data rows via their physical (tablespace-relative) addresses
  • Moving an extent of data to a new physical address would render its indexes inoperable
  • To avoid this issue, indexes in reclaimable storage tablespaces have a changed format: index keys point to data rows via logical (table-relative) addresses
• Consider using the ADMIN_MOVE_TABLE() stored procedure to move tables to new reclaimable storage tablespaces while maximizing table availability
• REDUCE is online, but does consume storage bandwidth
  • If storage bandwidth consumption is a concern, consider:
    ALTER TABLESPACE <tsname> REDUCE STOP
    then later, at a more convenient time,
    ALTER TABLESPACE <tsname> REDUCE MAX

Page 35: DB2 LUW Internals for DBAs : Part I & II

Agenda (section divider - identical to the agenda on Page 18)

Page 36: DB2 LUW Internals for DBAs : Part I & II

Tables & Indexes : Non-Reclaimable Space (pre-9.7) Tablespaces

[Diagram: the table has table-relative page numbers, but the B-tree index points into it using tablespace-relative page numbers.]

Indexes use tablespace-relative page numbers.

Page 37: DB2 LUW Internals for DBAs : Part I & II

Tables & Indexes : Reclaimable Space (9.7 and up) Tablespaces

[Diagram: the B-tree index points into the table using object-relative (table-relative) page numbers, e.g. page 3, slot 2.]

Indexes use object-relative page numbers.

Page 38: DB2 LUW Internals for DBAs : Part I & II

Data Page and RID Format

Page size choices: 4K, 8K, 16K and 32K

[Diagram: a data page (e.g. page 473) holds a page header, a slot directory, and records; a RID is 6 bytes - a 4-byte page number plus a 2-byte slot number - so RID 473,2 addresses slot 2 on page 473. The slot directory is an array of 2-byte integers, each containing the offset into the page of the actual record data; a slot value of -1 marks a deleted record. The page contains embedded free space (usable only after a page reorg) and contiguous free space (usable without a page reorg). An index leaf page entry carries the RID.]

Notes
1) Page reorgs are done automatically online as required. They can be monitored via MON_GET_TABLE().
2) Free space created by deletes or updates can be held reserved (not usable) until the deleting transaction is committed and older than:
   a) the oldest transaction accessing the table
   b) the oldest modifying transaction in the db

Tip: If deleted space is not being reused, look for long-running transactions (e.g. APPLID_HOLDING_OLDEST_XACT from MON_GET_TRANSACTION_LOG()).

Tip: Use larger page sizes for workloads that tend to access rows sequentially (e.g. warehousing, TEMP tables) and smaller page sizes for random-access workloads (e.g. OLTP).
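The two tips above point at monitor table functions; a hedged sketch of using them follows (MYSCHEMA and T1 are placeholder names, -2 asks for all members, and the column names should be verified against your DB2 level):

-- page reorg activity for one table
db2 "SELECT TABSCHEMA, TABNAME, PAGE_REORGS
     FROM TABLE(MON_GET_TABLE('MYSCHEMA','T1',-2))"
-- application holding the oldest uncommitted transaction
db2 "SELECT APPLID_HOLDING_OLDEST_XACT
     FROM TABLE(MON_GET_TRANSACTION_LOG(-2))"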

Page 39: DB2 LUW Internals for DBAs : Part I & II

Default Row Format

CREATE TABLE t1 (
  c1 INTEGER,
  c2 DECIMAL(12),
  c3 VARCHAR(20) NOT NULL,
  c4 VARCHAR(50) NOT NULL )

INSERT INTO T1 VALUES ( null, 0, '', 'The big red fox' )

[Diagram: the stored row has a fixed portion followed by a variable portion. Fixed-length columns store their data directly; each variable-length column has a 4-byte fixed part (offset + length) pointing at its data in the variable portion. Here c1 is NULL (no data), c3 has length 0, and c4 holds 'The big red fox'.]

Legend:
• Attribute byte (1 byte) : only present for NULLable columns; indicates if the value is in fact NULL
• Fixed portion of a variable column (4 bytes : offset + length)
• Actual column data (n bytes)

Page 40: DB2 LUW Internals for DBAs : Part I & II

Alternate Row Format & Value Compression

CREATE TABLE t1 (
  c1 INTEGER COMPRESS SYSTEM DEFAULT,
  c2 DECIMAL(12) COMPRESS SYSTEM DEFAULT,
  c3 VARCHAR(20) NOT NULL,
  c4 VARCHAR(50) NOT NULL )
VALUE COMPRESSION

INSERT INTO T1 VALUES ( null, 0, '', 'The big red fox' )

[Diagram: in the alternate row format, the fixed portion holds a 2-byte offset per column plus a 1-byte attribute byte used to indicate column=NULL or column=default value; column data lives in the variable portion. Equal offsets o3=o4 indicate that c3 has length 0, and an end offset is needed to calculate the length of c4 ('The big red fox').]

Tip: Consider the alternate row format (VALUE COMPRESSION keyword) when …
• A significant # of rows contain the column default values (e.g. 0 for numerics)
• A significant # of rows contain NULL column values
• There is a significant # of variable-length columns

Page 41: DB2 LUW Internals for DBAs : Part I & II

Click to edit Master title style

Sidebar: Row Logging

SQL What's Logged

INSERT New row image

DELETE Old row image

UPDATE Four different cases,…

Page 42: DB2 LUW Internals for DBAs : Part I & II

Currently Committed Isolation : Motivation

SELECT * FROM EMP

EMP (on disk, with in-flight changes from other transactions):
rowid  empid  name   office          salary
48     4245   Chan   Y2/11           11      <- uncommitted insert
77     6354   Smith  A1/21           43
96     7836   Jones  AA/00 -> C3/46  21      <- uncommitted update
104    1325   Tata   X1/03           33
205    5456   Baum   D2/18           22      <- uncommitted delete

Result so far under traditional locking:
EMPID  NAME   OFFICE  SALARY
6354   Smith  A1/21   43
> wait

Page 43: DB2 LUW Internals for DBAs : Part I & II

Currently Committed Isolation : Result

SELECT * FROM EMP

EMP (on disk, with the same in-flight changes):
rowid  empid  name   office          salary
48     4245   Chan   Y2/11           11      <- uncommitted insert
77     6354   Smith  A1/21           43
96     7836   Jones  AA/00 -> C3/46  21      <- uncommitted update
104    1325   Tata   X1/03           33
205    5456   Baum   D2/18           22      <- uncommitted delete

Result:
EMPID  NAME   OFFICE  SALARY
6354   Smith  A1/21   43
7836   Jones  AA/00   21
1325   Tata   X1/03   33
5456   Baum   D2/18   22
> SUCCESS

DB2 returns currently committed data without waiting for locks! (Delete and update undone; insert skipped.)

Page 44: DB2 LUW Internals for DBAs : Part I & II

Currently Committed : How Does it Work?

[Diagram: the lock list records the lock type held on each locked rowid - X(I) on rowid 48 (insert), X(U) on rowid 96 (update), X(D) on rowid 205 (delete) - and each entry carries a reference to the corresponding log record. The log buffer and active log files hold the relevant records:
  INS: Emp,1,6354,Smith,A1/21,43
  INS: Emp,4,1325,Tata,X1/03,33
  INS: Emp,1,4245,Chan,Y2/11,11
  UPD: Emp,3,7836,Jones,AA/00 -> C3/46
  DEL: Emp,5,5456,Baum,D2/18,22 ]

Uncommitted INSERTed data is skipped.

For uncommitted DELETEs and UPDATEs, when encountering a lock which would otherwise conflict, DB2 uses new information in the lock manager to reconstruct and return the previously committed data from the log buffer or log file.

Page 45: DB2 LUW Internals for DBAs : Part I & II

Currently Committed : Internals & Usage Notes

• Log-based implementation : simple & fast
  • No need for rollback segments!
  • Currently committed data is typically reconstructed from memory (the log buffer)
  • Exception: updates/deletes from mass-update transactions that spill the log buffer (active logs are read from storage in this case)
• Fallback to traditional locking
  • If the currently committed data is unavailable (or not available quickly), DB2 falls back to the traditional locking behavior
  • Examples
    • Currently committed data is only available from an archived log (as may be the case with infinite logging)
    • The updater held a table lock (not a row lock)
• Usage hints & tips
  • Consider increasing your log buffer size if you see increased log disk reads
  • Use MON_GET_TRANSACTION_LOG() to check:
    CUR_COMMIT_DISK_LOG_READS - ideally want this close to 0
    NUM_LOG_READ_IO - is it higher than normal for your system?
  • Consider increasing the lock list size (or using the AUTOMATIC setting)
    • To avoid escalation to table locks (which disables currently committed behavior for the table)
  • Be aware of the potential for a small increase in log space consumption if CC is enabled
    • The first update to a given row in a transaction logs the entire row image
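A hedged sketch of the monitoring check suggested above (-2 asks for all members; verify the exact column names against your DB2 level):

db2 "SELECT MEMBER, CUR_COMMIT_DISK_LOG_READS, NUM_LOG_READ_IO
     FROM TABLE(MON_GET_TRANSACTION_LOG(-2))"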

Page 46: DB2 LUW Internals for DBAs : Part I & II

Sidebar: Row Logging

SQL       What's Logged
INSERT    New row image
DELETE    Old row image
UPDATE    Four different cases, …

Page 47: DB2 LUW Internals for DBAs : Part I & II

UPDATE Row Logging : "Full Before + Delta"

What's logged: full before image, plus the after image of the changing bytes and any new bytes (if the row is growing).

When used:
1. Currently Committed is enabled, and
2. it is the first update to a given row in a given transaction, and
3. DATA CAPTURE CHANGES is not in effect.

Examples:
Original Row : John 500 10000 Plano TX AABB
Updated Row  : John 602 20012 Plano TX AABB
Logged       : John 500 10000 Plano TX AABB, plus the changed bytes 602 20012

Original Row : John 500 10000 Plano TX AABB
Updated Row  : Fred 500 10000 Plano TX AABBCC
Logged       : John 500 10000 Plano TX AABB, plus the changed bytes Fred and the new bytes CC

Page 48: DB2 LUW Internals for DBAs : Part I & II

UPDATE Row Logging : "Partial XOR"

What's logged: XOR between the old and new rows, from the 1st changed column to the last changed column.

When used:
1. Currently Committed is not in effect (or CC is in effect and the transaction is updating the given row again), and
2. the row length is not changing, and
3. DATA CAPTURE CHANGES is not in effect.

Examples:
Original Row : John 500 10000 Plano TX 24355
Updated Row  : John 602 20012 Plano TX 24355
Logged       : '18A0FF33C'x

Original Row : John 500 10000 Plano TX 24355
Updated Row  : Fred 500 10000 Plano TX 24357
Logged       : '1A35D8C9E88719A6C23340037DCEFF8928D0A7883'x

Tip: When UPDATEs comprise a significant portion of your workload …
• Weigh the extra UPDATE logging vs the concurrency benefits of currently committed
• Try to place frequently updated columns adjacent in the row definition

Page 49: DB2 LUW Internals for DBAs : Part I & II

UPDATE Row Logging : "Full XOR"

What's logged: XOR between the new and old rows, from the 1st word that changes to the end of the smaller row; then any residual words from the larger row.

When used: same scenario as the previous case, except the row length is changing.

Examples:
Original Row : Fred 500 10000 Plano TX AABB
Updated Row  : Frank 500 10000 Plano TX AABB
Logged       : '1A35D8C9E88719A6C23340037DF8928D0A7'x BB  (XOR from the 3rd byte to the end of the 1st row, then the last byte of the 2nd row)

Original Row : Fred 500 10000 Plano TX AABB
Updated Row  : Fred 500 10000 Plano TX AABBCC
Logged       : CC (the residual bytes of the larger row)

Tip: When UPDATEs comprise a significant portion of your workload …
• Try to place frequently updated columns at the end of the row definition

Page 50: DB2 LUW Internals for DBAs : Part I & II

UPDATE Row Logging : "Full Before & After Row Image"

What's logged: full copies of the old and new rows.

When used: whenever DATA CAPTURE CHANGES (replication) is in effect for the table.

Example:
Original Row : John 500 10000 Plano TX 24355
Updated Row  : Frank 500 10000 Plano TX 24355
Logged       : John 500 10000 Plano TX 24355, plus Frank 500 10000 Plano TX 24355

Page 51: DB2 LUW Internals for DBAs : Part I & II

Row Compression

• Rows compressed in the buffer pool, on disk, in logs, and in backup images
• Dictionary-based LZ compression replaces frequently used byte sequences with a 12-bit symbol
  • Byte sequences can span column boundaries or lie within columns
• Global view of symbol frequency (not limited to a single page)

Example rows:
Name         Dept  Salary  City    Province  Postal_Code
Zikopoulos   510   10000   Whitby  ONT       L4N5R4
Katsopoulos  500   20000   Whitby  ONT       L4N5R4

Dictionary (stored in the table):
01 = opoulos
02 = WhitbyONTL4N5R4
…

Compressed rows: Zik 01 510 10000 02 and Kats 01 500 20000 02

Page 52: DB2 LUW Internals for DBAs : Part I & II

DB2 10 Adaptive Compression : Overview

• DB2 10 adds a page-level dictionary to further compress symbols common within a page
• Adapts to changing data patterns
• New keywords on ALTER/CREATE TABLE .. COMPRESS
  • ADAPTIVE (default)
  • STATIC

ALTER TABLE … COMPRESS YES
ALTER TABLE … COMPRESS YES ADAPTIVE
ALTER TABLE … COMPRESS YES STATIC

[Diagram: the table-level compression dictionary is created by table REORG or by automatic dictionary creation; page-level compression dictionaries are built per page as new rows are inserted on the page.]

Page 53: DB2 LUW Internals for DBAs : Part I & II

Adaptive Compression : How it Works

• Rows are inserted into a page (compressed via the table dictionary)
• When the page is almost full, a page dictionary is built:
  1. Detect common recurring patterns in the original records
  2. Build the compressed page by compressing all existing records
  3. Insert the page compression dictionary (a special record)
  4. Insert more compressed records in the additional free space

[Diagram: original page vs compressed page containing the page compression dictionary.]

Page 54: DB2 LUW Internals for DBAs : Part I & II

Compression : Hints / Tips / Reminders

• Group correlated columns together in table definitions
  • E.g. place 'Make' (e.g. Honda) and 'Model' (e.g. Accord) columns adjacent to each other
  • DB2's row compression will compress common byte sequences regardless of column boundaries
• If you created tablespaces prior to V9.1, ensure you've enabled Large RIDs and Large Slots if more than 255 compressed rows will typically fit on your data pages
  • Otherwise, DB2 will only place a maximum of 255 rows per page, resulting in less efficient utilization of memory and storage
  • Call ADMIN_GET_TAB_INFO() and check LARGE_SLOTS and LARGE_RIDS for 'Y'
• When using adaptive compression remember …
  • New tables:
    • COMPRESS YES defaults to ADAPTIVE
    • (Can explicitly specify COMPRESS YES STATIC or COMPRESS YES ADAPTIVE)
  • Pre-10.1 tables:
    • By default, will stay with existing (static) compression
    • Use ALTER TABLE … COMPRESS YES ADAPTIVE to enable adaptive compression dynamically
• Estimate compression savings with ADMIN_GET_TAB_COMPRESS_INFO()
• Report actual compression rates with ADMIN_GET_TAB_DICTIONARY_INFO()
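A hedged sketch of the checks above (MYSCHEMA and T1 are placeholder names; the table functions live in the SYSPROC schema and their exact output columns vary by release):

db2 "SELECT LARGE_RIDS, LARGE_SLOTS
     FROM TABLE(SYSPROC.ADMIN_GET_TAB_INFO('MYSCHEMA','T1'))"
db2 "SELECT * FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('MYSCHEMA','T1'))"
db2 "ALTER TABLE MYSCHEMA.T1 COMPRESS YES ADAPTIVE"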

Page 55: DB2 LUW Internals for DBAs : Part I & II

INSERT Processing (& Space Management)

• Default INSERT search algorithm:
  • Use the Free Space Control Records (FSCRs) to find a page with enough space
  • Even if an FSCR indicates that a page has enough free space, that space may not be usable if it is "reserved" by an uncommitted DELETE from another transaction
    • Ensure transactions COMMIT frequently; otherwise uncommitted freed space will not be reusable
  • Search 5 FSCRs (by default); if there is no page with enough space, append the record to the end of the table
  • The DB2MAXFSCRSEARCH=N registry variable limits the number of FSCRs visited for an INSERT
    • Start with the default (5), as it is designed for most workloads
    • Increase it to favour more aggressive space reuse, or for extremely large tables
    • Decrease it to favour INSERT speed
  • Each search starts at the FSCR where the last search ended
  • Once the entire table has been searched, records are appended without searching until space is created elsewhere in the table (via DELETE, for example)
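A hedged sketch of adjusting the registry variable discussed above (the value 10 is illustrative; registry changes only take effect after the instance is recycled):

db2set DB2MAXFSCRSEARCH=10
db2stop
db2start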

Page 56: DB2 LUW Internals for DBAs : Part I & II

[Diagram: inserts use FSCRs to find a page with enough space. FSCRs appear at regular intervals in the table (e.g. pages 0, 500, 1000, 1500, 2000, 2500, …). Inserts 1 and 2 fill free slots found via the FSCRs; inserts 3 through n allocate new pages after 5 FSCRs have been searched; subsequent inserts fill up two new extents; insert n+1 resumes the FSCR search starting at the last FSCR visited, and subsequent inserts pick up where the previous search left off.]

Tips
• The DB2MAXFSCRSEARCH=N registry variable limits the # of FSCRs visited for an INSERT
• The default of 5 works well for typical workloads
• Increase it to favour more aggressive space reuse, or for extremely large tables
• Decrease it to favour INSERT speed
• The special value of -1 means an unlimited FSCR search

Page 57: DB2 LUW Internals for DBAs : Part I & II

INSERT Processing Continued

• Other search algorithm options:
  • Use a clustering index on the table (CREATE INDEX ON T1 .... CLUSTER)
    • DB2 tries to insert records on the same page as other records with similar index key values, resulting in more efficient range scans and prefetching
    • If there is no space on that page, it tries the surrounding 500 pages, then reverts to the default search algorithm but uses a worst-fit instead of a first-fit approach (to establish a new 'mini' clustering area)
  • Tips:
    • Use a clustering index to optimize queries that retrieve multiple records in index order, as it results in fewer physical I/Os
    • When a clustering index is defined, use ALTER TABLE PCTFREE nn before load or reorg. This leaves nn% free space on the table's data pages after load and reorg, and increases the likelihood that the clustering insert algorithm will find free space on the desired page
  • Use ALTER TABLE APPEND ON (avoids searching and maintenance of FSCRs)
    • Can be useful for tables that only grow (but this is very rare)
  • Use an Insert Time Clustered table
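A hedged sketch of the three options named above (the table, column and index names, and the PCTFREE value of 10, are placeholders):

db2 "CREATE INDEX ix_region ON T1 (region) CLUSTER"
db2 "ALTER TABLE T1 PCTFREE 10"
db2 "ALTER TABLE T1 APPEND ON"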

Page 58: DB2 LUW Internals for DBAs : Part I & II

Insert-Time-Clustered (ITC) Tables

CREATE TABLE … ORGANIZE BY INSERT TIME

[Diagram: rows in the same extent have a similar insert time (8am, 9am, 10am, …). After 1) INSERTs, 2) DELETE WHERE …, and 3) REORG … RECLAIM EXTENTS, the emptied extents are quickly returned to the tablespace and become available for other tables and indexes; 4) further INSERTs then reuse space.]

Page 59: DB2 LUW Internals for DBAs : Part I & II

Reclaiming Space from a Regular Table

• With non-sparse objects, only free space at the 'end' of the object can be reclaimed
• This means reorg operations usually need to perform a lot of 'heavy lifting'
  • Inplace online REORG - almost all rows in the table are moved
  • Classic offline REORG - a new copy of the table & indexes is created

Page 60: DB2 LUW Internals for DBAs : Part I & II

Reclaiming Space from an ITC Table

• DB2 10.5 ITC RECLAIM EXTENTS improvement
  - Lightweight task to move rows out of 'almost empty' extents
  - More complete space reclamation; still very fast!
• RECLAIM EXTENTS is also supported for
  - Indexes (new in DB2 10)
  - MDC Tables
  - Columnar-Organized (aka BLU) Tables

DB2 REORG TABLE T1 .. RECLAIM EXTENTS ALLOW WRITE ACCESS

[Diagram: embedded empty extents can easily be reclaimed from the table/index and returned to the tablespace.]

Page 61: DB2 LUW Internals for DBAs : Part I & II

Index REORG RECLAIM

[Diagram: physical view of an index (the logical view is a B-tree), showing index pages and keys laid out in extents.]

1) Index on "invoice date".
2) An overnight batch job deletes all invoices "greater than 30 days old and paid", leaving 20% of the index keys older than 30 days as pseudo-deleted keys.
3) REORG INDEXES …. CLEANUP ALL removes committed pseudo-deleted keys & merges pages. It leaves 8 free pages but no free extents.
4) REORG INDEXES …. RECLAIM EXTENTS intelligently moves pages from almost-empty extents to almost-full extents, then frees the empty extents back to the tablespace.
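A hedged sketch of steps 3 and 4 above for a table T1 (the table name is a placeholder; on older releases the first command is spelled CLEANUP ONLY instead of CLEANUP):

db2 "REORG INDEXES ALL FOR TABLE T1 ALLOW WRITE ACCESS CLEANUP ALL"
db2 "REORG INDEXES ALL FOR TABLE T1 ALLOW WRITE ACCESS RECLAIM EXTENTS"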

Page 62: DB2 LUW Internals for DBAs : Part I & II

Space Management & Clustering : Hints / Tips / Reminders

• Make effective use of DB2MAXFSCRSEARCH
  • Large values (or -1) to favour space reuse and reorg avoidance
  • Small values to favour INSERT speed
• If range scans are predominant, consider using clustering to optimize scan performance
  • MDC or a clustering index
  • Note: new DB2 10 smart prefetching can reduce the need for a clustering index
• APPEND mode can be useful in isolated scenarios to optimize INSERT speed
  • However, ensure you have a strategy to reclaim space if/when mass deletes occur
• Consider Insert Time Clustered tables if rows inserted at the same time are often deleted at the same time
  • Especially if easy space reclamation is a priority

Page 63: DB2 LUW Internals for DBAs : Part I & II

Agenda (section divider - identical to the agenda on Page 18)

Page 64: DB2 LUW Internals for DBAs : Part I & II

A Deeper Look at Internals : Column Storage (BLU)

• With traditional tables, each page contains entire rows
• With BLU, each page and extent contains values for a single column

[Diagram: a 10-row customer table (names, ages, addresses, zip codes, cities - Mike Hernandez, Chou Zhang, Carol Whitehead, Whitney Samuels, Ernesto Fry, Rick Washington, Pamela Funk, Sam Gerstner, Susan Nakagawa, John Piconne) stored column by column; each page and each extent (extentsize=2 assumed) holds values from one column only, and every row is identified by a TSN from 0 to 9.]

• TSN = Tuple Sequence Number (a logical Row ID); TSNs are used to stitch together the column values that belong in the same row during query processing
  • e.g. SELECT zipcode FROM t WHERE name='Mike Hernandez'
  • An internal 'page map index' allows DB2 to quickly find the page containing the zipcode for TSN 4
• Typically, column-organized tables use significantly less space than row-organized tables
  • Unusual case: column-organized tables with many columns and very few rows can be larger than row-organized tables, as each column requires at least 1 extent

Page 65: DB2 LUW Internals for DBAs : Part I & II

Column-Organized vs Row-Organized Tables (BLU)

Row-organized table:
CREATE TABLE t1 … IN TS1 INDEX IN TS2
ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
CREATE INDEX i1 …
CREATE INDEX i2 …
CREATE INDEX i3 …
[Diagram: in TS1 the table ('dat') object holds extents of pages of rows; in TS2 the index ('inx') object holds the uc1, i1, i2 and i3 indexes.]

Column-organized table:
CREATE TABLE t1 … ORGANIZE BY COLUMN IN TS1 INDEX IN TS2
ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
[Diagram: in TS1 the table ('dat') object holds metadata & the compression dictionary; the column ('cde') object holds the column data, where each extent contains pages of data for one column (c1, c2, c3, c4); and the synopsis records the range of column values existing in different TSN ranges of the table. In TS2 the index ('inx') object holds the uc1 index enforcing the unique constraint.]

Page 66: DB2 LUW Internals for DBAs : Part I & II

Synopsis Table Detail (BLU)

• Meta-data that describes which ranges of values exist in which parts of the user table
• Enables DB2 to skip portions of a table when scanning data to answer a query
• Benefits from data clustering and from loading pre-sorted data
• Tiny : typically ~0.2% the size of the base table
• Transparently and automatically maintained by DB2

User table SALES_COL (TSN = Tuple Sequence Number):
S_DATE      QTY  ...
2005-03-01  176  ...
2005-03-02  85   ...
2005-03-02  267
2005-03-04  231
...
2006-10-17  476

Synopsis table SYN130330165216275152_SALES_COL:
TSNMIN  TSNMAX  S_DATEMIN   S_DATEMAX   ...
0       1023    2005-03-01  2006-10-17  ...
1024    2047    2006-08-25  2007-09-15  ...
...
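A hedged way to find the synopsis table DB2 created for SALES_COL: synopsis tables are named SYN<timestamp>_<tablename> as shown above, and this catalog query (the LIKE pattern and SYSIBM schema filter are assumptions) lists any that match:

db2 "SELECT TABSCHEMA, TABNAME
     FROM SYSCAT.TABLES
     WHERE TABNAME LIKE 'SYN%SALES_COL'"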

Page 67: DB2 LUW Internals for DBAs : Part I & II

An Illustrative Scenario (BLU)

0) DB2_WORKLOAD=ANALYTICS
1) CREATE db, tablespaces
2) CREATE TABLE t1 … PRIMARY KEY (c1) ORGANIZE BY COLUMN IN TS1 INDEX IN TS2
3) LOAD FROM myfile INTO t1 …
4) SELECT c2,c3 … FROM t1 WHERE …
5) INSERT INTO t1 …
6) DELETE FROM t1 WHERE …
7) Automatic table maintenance returns space to the tablespace

[Diagram: in TS1, the table ('dat') object (metadata & compression dictionary), the column ('col') object (each extent contains pages of data for one column c1..c4) and the synopsis (records the range of column values existing in different regions of the table) start empty and are populated as the scenario runs; in TS2 the index ('inx') object holds the primary-key index. LOAD automatically maintains the synopsis and collects table & index statistics; INSERT also automatically maintains the synopsis.]

Page 68: DB2 LUW Internals for DBAs : Part I & II

New Compression for Column-Organized Tables

• Frequency compression
  • Values of similar frequency use the same number of bits
  • More common values are encoded using fewer bits
• Dictionary-per-column
  • Increases compression effectiveness (vs a single dictionary per table)

Frequency compression example - 3 different code lengths; code lengths are inversely proportional to the frequency of the values represented:
• 2 high-frequency states (1 bit covers 2 entries): 0 = California, 1 = New York
• 8 medium-frequency states (3 bits cover 8 entries): 000 = Arizona, 001 = Colorado, 010 = Kentucky, 011 = Illinois, …, 111 = Washington
• 40 low-frequency states (6 bits cover 64 entries): 000000 = Alaska, 000001 = Rhode Island, …

Page 69: DB2 LUW Internals for DBAs : Part I & II

BLU Compression is "Actionable"

• Actionable compression allows actions to be performed on compressed data, e.g.
  • Predicate evaluation (=, <, >, >=, <=, BETWEEN, LIKE)
  • Range predicates
  • Equality joins
• Avoiding decompression during predicate evaluation and other actions provides significant query performance gains
• The encodings are order-preserving within each code-length group:
  0 = California, 1 = New York
  000 = Arizona, 001 = Colorado, 010 = Illinois, 011 = Kentucky, …, 111 = Washington
  000000 = Alaska, 000001 = Rhode Island, …

Page 70: DB2 LUW Internals for DBAs : Part I & II

Compression Dictionaries (BLU)

• Column-level dictionaries: one per column
  • Dictionary populated during
    • LOAD REPLACE, or LOAD INSERT into an empty table, via the Analyze phase
    • SQL INSERT, INGEST via Automatic Dictionary Creation (ADC)
• Page-level dictionaries: may also be created
  • If the space savings outweigh the cost of storing the page-level dictionaries
  • Exploit local data clustering at the page level to further compress data
  • Page compression enabled for Load in 10.5 GA

[Diagram: the 'DAT' object stores the column compression dictionaries (one per column, 1..N); the 'CDE' object stores the data pages for each column, and each data page can carry its own page dictionary.]

Page 71: DB2 LUW Internals for DBAs : Part I & II

Compression Statistics (BLU)

SYSCAT.TABLES
• Only PCTPAGESSAVED applies to column-organized tables (the row-organized statistics AVGROWSIZE, AVGROWCOMPRESSIONRATIO, AVGCOMPRESSEDROWSIZE and PCTROWCOMPRESSED are n/a)
  • Approximate % of pages saved in the table
  • Runstats collects PCTPAGESSAVED by estimating the number of data pages needed to store the table in uncompressed row orientation

SYSCAT.COLUMNS
• PCTENCODED : % of values encoded (compressed) by the column-level dictionary, per column (e.g. C1 PCTENCODED = 90, C2 = 75, C3 = 100 vs C1 = 0, C2 = 10, C3 = 0)
  • Measures the % of values compressed (NOT the compression ratio)
  • If the compression ratio is too low, check this statistic to see if too many values were left uncompressed in specific columns
  • Many low values? (see the load recommendations coming up)

Tip - Estimating Compression Benefit
• On DB2 10.5 or above?
  • Load into a row-organized compressed table
  • Load into a column-organized table
  • Compare PCTPAGESSAVED for each
• Otherwise
  • Use a stand-alone tool, e.g. http://www.dbisoftware.com/blog/DB2NightShowNews.php?id=453
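A hedged sketch of checking these catalog statistics for a table (the schema name MYSCHEMA is a placeholder; SALES_COL is the example table from the synopsis slide):

db2 "SELECT PCTPAGESSAVED FROM SYSCAT.TABLES
     WHERE TABSCHEMA='MYSCHEMA' AND TABNAME='SALES_COL'"
db2 "SELECT COLNAME, PCTENCODED FROM SYSCAT.COLUMNS
     WHERE TABSCHEMA='MYSCHEMA' AND TABNAME='SALES_COL'
     ORDER BY PCTENCODED"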


Load for Column-Organized Tables

1) ANALYZE PHASE (runs only if column dictionaries need to be built): scan the input source, build histograms to track value frequency, and build the column compression dictionaries.
2) LOAD PHASE: compress values, build data pages for the user table, update the synopsis table, and build keys for any unique indexes (as necessary).

• New defaults (to improve the out-of-the-box experience):
• REPLACE RESETDICTIONARY: default for column-organized tables (vs KEEPDICTIONARY for row-organized)
• STATISTICS: default for column-organized tables is YES (vs NO for row-organized)
• New option: REPLACE RESETDICTIONARYONLY
• Creates the dictionary based on the input data without loading any rows
• Can create the dictionary prior to ingesting any data from SQL-based utilities
• COPY YES: not yet supported for column-organized tables
• BLOCKNONLOGGED DB config parameter must be NO (the default)
• If set to YES, LOAD will fail with SQL2032N

BLU Loading & Compression : Hints & Tips

• General recommendation
– New database? Just set DB2_WORKLOAD=ANALYTICS before creating it
– Existing database? Explicitly set UTIL_HEAP_SZ to at least 1,000,000 pages and ‘AUTOMATIC’ (new in 10.5)
• DBA considerations for very large tables (see the sketch below)
– Reduce the total time required for the first load of the table by reducing the duration of the ANALYZE phase
• Obtain a subset of representative data from the total data to be loaded into the table
• LOAD .. REPLACE RESETDICTIONARYONLY … the subset of data to build the column compression dictionary
• LOAD .. INSERT .. to load the full source of data; this will not trigger the ANALYZE phase
– Consider presorting the data on column(s) that will frequently appear in query predicates, to improve compression ratios and synopsis table exploitation
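A minimal sketch of the recommendations above; the database, table and file names are hypothetical and the values are illustrative only:

db2set DB2_WORKLOAD=ANALYTICS                                  -- new database: set before CREATE DATABASE
UPDATE DB CFG FOR mydb USING UTIL_HEAP_SZ 1000000 AUTOMATIC;   -- existing database
-- Build the column compression dictionary from a representative subset, without loading any rows
LOAD FROM sales_subset.del OF DEL REPLACE RESETDICTIONARYONLY INTO sales;
-- Load the full data source; the ANALYZE phase is skipped because the dictionary already exists
LOAD FROM sales_full.del OF DEL INSERT INTO sales;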


Agenda

• Architecture Overview

• Storage Architecture
• Tablespace Internals
• Automatic Storage
• Storage Groups
• Space Reclamation
• Table Management
• Tables, Records, Indexes
• Page Format, Space Management
• Row Compression (including Adaptive Compression)
• Currently Committed
• Insert Time Clustered Tables
• Columnar Tables & Compression
• Logging and I/O
• Logging and Recovery Mechanisms
• Log Archival Compression
• I/O Mechanisms
• (Time permitting) Process/Thread Model
• Base Processing Model
• Concentrator
• Intra Parallel Controls
• WLM Dispatcher
• Appendix (will not have time to present these; but material is in IDUG proceedings for you to peruse later)
• Index Management and Compression
• Memory Management
• Smart Prefetching
• MDC Tables
• More Detailed Tablespace Parameter Setting Example

Online Log Space & Active Log Space

• Configured online log space: total configured disk space for log files = LOGFILSIZ * (LOGPRIMARY + LOGSECOND) * 4K (the example diagram assumed LOGPRIMARY = 4, LOGSECOND = 0)
• Active log space: log space associated with log records written by inflight transactions or dirty pages (TOT_LOG_USED)
• Active log space high water mark: largest value of active log space seen (TOT_LOG_USED_TOP)
• Log available: log space available for new log data (TOT_LOG_AVAILABLE)
• Try to ensure TOT_LOG_USED_TOP stays comfortably below the configured online log space (see the monitoring sketch below)
• Once full, a log file will be archived; however, a log file cannot be overwritten while it is ‘active’ (by default)
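A minimal monitoring sketch for the values above. MON_GET_TRANSACTION_LOG is available from DB2 10.1; the exact column names are an assumption here, so verify them against your level:

-- Active log usage, high water mark and remaining space, across all members
SELECT MEMBER, TOTAL_LOG_USED, TOTAL_LOG_AVAILABLE, TOT_LOG_USED_TOP
  FROM TABLE(MON_GET_TRANSACTION_LOG(-2));
-- Configured online log space = LOGFILSIZ * (LOGPRIMARY + LOGSECOND) * 4 KB;
-- the three parameters can be read from the database configuration:
GET DB CFG FOR mydb;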

Hints / Tips re. Excessive Active Log Demands

• Prevent “log hogs” with MAX_LOG (example settings for these controls follow below)

• Database configuration parameter that defines the maximum active log space consumed by one transaction, as a percent of primary log space (range: 0-100; 0 = not in effect)
• Violator transaction is rolled back and the connection is forced off the database
• Set the DB2_FORCE_APP_ON_MAX_LOG registry variable to FALSE to leave the connection active
• Prevent “long inactive transactions” with NUM_LOG_SPAN
• Database configuration parameter defining the number of active log files a single transaction is allowed to span (range: 0-65535; 0 = not in effect)
• Violator transaction is rolled back and the connection is forced off the database
• Allow “Infinite Logging” (set the LOGSECOND db cfg parameter to -1)
• Allows space used by archived active logs to be overwritten with new log data
• These logs are retrieved in the event of ROLLBACK
• An active transaction can span an infinite number of logs
• Transactions are no longer limited by the size of the primary log (LOGPRIMARY x LOGFILSIZ)
• The occasional "run-away" transaction won’t cause a log-full error
• However, rollback and crash recovery performance may suffer significantly
• May need to retrieve logs from the archive

Tip

• Set the BLK_LOG_DSK_FUL database configuration parameter to YES to cause applications to wait (rather than receive an error and roll back) when DB2 runs out of space while trying to create a new log file
• Can occur if the archive device is slow; DB2 may need to create new primary logs at run time
• Watch for “ADM1826E DB2 cannot continue because the disk used for logging is full”

Tip
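A minimal configuration sketch for the controls above; the database name and values are illustrative only:

UPDATE DB CFG FOR mydb USING MAX_LOG 20;           -- one transaction may use at most 20% of primary log space
UPDATE DB CFG FOR mydb USING NUM_LOG_SPAN 4;       -- one transaction may span at most 4 active log files
db2set DB2_FORCE_APP_ON_MAX_LOG=FALSE              -- leave the connection active when MAX_LOG is violated
UPDATE DB CFG FOR mydb USING LOGSECOND -1;         -- infinite logging
UPDATE DB CFG FOR mydb USING BLK_LOG_DSK_FUL YES;  -- wait (rather than fail) when the log disk is full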

Crash Recovery

LSN = “Log Sequence Number”: a log record’s byte offset into the lifetime log stream of the database
• “MinbuffLSN”: LSN of the oldest page update in the bufferpool at the time of the crash
• “LowtranLSN”: LSN of the oldest inflight transaction at the time of the crash

Case 1 : No old transactions (eg. OLTP)
• REDO from MinbuffLSN up to the current LSN at the time of the crash (for each log record, read the page(s) and redo the action if the page LSN < the log record LSN)
• UNDO from the current LSN back to LowtranLSN

Case 2 : One or more old transaction(s) (eg. batch), where LowtranLSN lies before MinbuffLSN
• REDO (transaction state only) from LowtranLSN up to MinbuffLSN
• REDO (as above) from MinbuffLSN up to the current LSN
• UNDO from the current LSN back to LowtranLSN


Minimizing Crash Recovery Time

• Maximize “LowtranLSN”
• Minimize transaction size
• Break large insert/update/delete transactions into multiple transactions where possible
• Maximize “MinbuffLSN”
• Minimize the age of the oldest dirty page in the bufferpool
• With pureScale, the Force-At-Commit protocol ensures this for member crash recovery
• For non-pureScale, use a small setting for the SOFTMAX database configuration parameter
• So, what is “SOFTMAX”?
• Target for the amount of log bytes to REDO during crash recovery
• Expressed as a % of 1 log file

SOFTMAX = Target for bytes of REDO during CR

(Diagram: initially MinbuffLSN equals the current LSN; as more pages are updated, an “LSN gap” opens between MinbuffLSN and the current LSN; page cleaner activity then causes MinbuffLSN to “move up”, keeping the gap within the SOFTMAX db cfg value.)

• SOFTMAX is expressed as a % of 1 log file. Example:
LOGFILSIZ = 5000 pages (5000 * 4 KB) = 20 MB
SOFTMAX = 15 : (20 MB * 0.15) = 3 MB
• DB2 tries to keep the difference between MinbuffLSN and the current LSN < SOFTMAX
• Done by adjusting the aggressiveness of the page cleaners
• Higher SOFTMAX values:
• Less page cleaning activity
• Slower crash recovery
• Lower SOFTMAX values:
• More page cleaning activity
• Faster crash recovery


Tuning pureScale Member Crash Recovery (MCR)

• MCR is typically extremely fast (several seconds)

• Requires no tuning

• Why? Force-at-commit (FAC)

• pureScale’s FAC protocol ensures all pages updated by a transaction are stored in both the primary and secondary CFs before the transaction can commit
• Results in:
• The minbuffLSN value for each member remains close to the currentLSN
• Minimizes the number of pages that need to be read during MCR
• Many of the pages that are read during MCR can be read from GBP memory
• The amount of log that needs to be redone remains very small

Tuning pureScale Member Crash Recovery (MCR) (continued)

(Diagram: without pureScale, REDO runs from MinbuffLSN up to the current LSN (for each log record, read the page(s) and redo the action if the page LSN < the log record LSN) and UNDO runs back to LowtranLSN; with pureScale, MinbuffLSN (the LSN of the oldest page update in the member that is not in the CF or on disk) stays close to the current LSN, so the REDO range is very small. LowtranLSN is the LSN of the oldest active transaction in the member.)

• Reminder: keeping transactions as short as possible remains a key best practice
• Long transactions can result in longer MCR durations due to large amounts of undo to process (just as in non-pureScale)


New Recovery Time Controls in 10.5

DB Cfg Parameter: PAGE_AGE_TRGT_MCR
• Target for the age of the oldest updated page in a member’s local bufferpool that is not reflected on persistent storage (or, for pureScale, not reflected in the group bufferpool in the CF)
• Units: seconds
• Applicable to: all DB2 configurations
• Replaces: SOFTMAX (*)
• Notes: use as a more direct control over crash recovery time than SOFTMAX. In pureScale, this parameter has little effect unless you have large batch update transactions; this is the result of the pureScale ‘Force-at-Commit’ policy, which requires all transactions to send all updated pages to the CF’s GBP before the transaction can commit.

DB Cfg Parameter: PAGE_AGE_TRGT_GCR
• Target for the age of the oldest updated page in the group bufferpool (in the CF) that is not reflected on persistent storage
• Units: seconds
• Applicable to: pureScale only
• Replaces: SOFTMAX (*)
• Notes: use as a more direct control over group crash recovery time than SOFTMAX. Recall: group crash recovery only occurs in rare simultaneous failure conditions (eg. both CFs fail at the same time).

(*) SOFTMAX is now deprecated (supported, but not recommended)
• Set SOFTMAX to 0 if you want to use the new controls (example settings below)
• New databases: SOFTMAX is set to 0
• Upgraded databases: SOFTMAX is not changed
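A minimal sketch of switching to the 10.5 controls; the database name and target values are illustrative only:

UPDATE DB CFG FOR mydb USING SOFTMAX 0;              -- stop using the deprecated control
UPDATE DB CFG FOR mydb USING PAGE_AGE_TRGT_MCR 240;  -- seconds; member crash recovery target
UPDATE DB CFG FOR mydb USING PAGE_AGE_TRGT_GCR 240;  -- seconds; pureScale group crash recovery target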

Direct Agent (Synchronous) I/O

(Diagram: agents perform synchronous “random read” and “dirty steal” I/O directly against the buffer pool(s).)

• Synchronous reads are normal in OLTP workloads (eg. random key lookups). Synchronous reads can also occur in warehousing/reporting workloads (eg. tablescans) if prefetching is not occurring, or is not occurring quickly enough. This may be an indicator that the prefetching configuration should be reviewed.
• Synchronous writes can occur in the “dirty steal” scenario (an agent needs to read in a page but cannot find a clean victim slot). This may be an indicator that the page cleaning configuration should be reviewed.

Page Cleaning (Asynchronous) I/O

(Diagram: agents, the log writer, and the page cleaners themselves drive page cleaning against the buffer pool(s) via the dirty steal, object flush, threshold, softmax (aka lsn gap) and log space triggers.)

• Agents can trigger the page cleaners when they perform dirty steals and when flushing objects (for example, during not-logged operations and index creations).
• The logger can trigger page cleaning when available log disk space is getting low, or when the target recovery window is exceeded (softmax). The latter is also termed an lsn gap trigger.
• The page cleaners trigger themselves if the proportion of dirty pages exceeds the CHNGPGS_THRESH target. This is termed a threshold trigger.

Page Cleaning (Asynchronous) I/O : Alternate Page Cleaning

(Diagram: with alternate page cleaning, agents trigger object flushes, the log writer drives the log space trigger, and the page cleaners drive proactive softmax and victim cleaning against the buffer pool(s); there is no need for a dirty steal trigger.)

• Agents can trigger the page cleaners when they flush objects (for example, during not-logged operations such as non-logged index creations and not-logged transactions).
• The logger can trigger page cleaning when available log disk space is getting low.
• The page cleaners trigger themselves proactively in two ways:
1) softmax: instead of waiting for an lsn gap to occur, the page cleaners try to keep a steady amount of page cleaning going, to prevent the gap from occurring.
2) victim cleaning: instead of the threshold trigger, the page cleaners try to keep between 1% and 2% of the slots in the bufferpool clean and ready for victim selection.
• Alternate page cleaning is enabled via:
db2set DB2_USE_ALTERNATE_PAGE_CLEANING = ON

Prefetching (Asynchronous) I/O

• Agents send prefetch requests to the prefetch queue(s) during planned prefetching (eg. tablescans), sequential detection (eg. a scan through a clustered index), and list prefetch (eg. a sorted list of pages gathered through an index scan, or a list of pages for a particular column in a BLU scan). The prefetchers (aka IO servers) service these requests against the buffer pool(s).
• Contiguous groups of pages (extents) are read into discontiguous bufferpool pages. Some platforms directly support this “scatter read” capability; on other platforms the extent is first read into a temporary buffer, and each page is then copied individually into the bufferpool.
• When a bufferpool is configured for block access (a “block region”), the extent I/Os are performed directly into contiguous space in the bufferpool, if possible. This is typically more efficient than either of the above two alternatives (see the sketch below).
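A minimal sketch of configuring a block-based area in a bufferpool; the bufferpool name and sizes are illustrative, and BLOCKSIZE is normally matched to the tablespace EXTENTSIZE so that one extent fits one block:

-- Reserve 25,000 pages of the bufferpool as a block-based area made of 32-page blocks
ALTER BUFFERPOOL ibmdefaultbp NUMBLOCKPAGES 25000 BLOCKSIZE 32;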

Direct I/O (aka NO FILE SYSTEM CACHING)

• Many file systems now support “Direct I/O”
• Bypasses the file system’s buffer cache
• Combines the performance benefits of raw devices with the usability benefits of file systems
• Examples: AIX Concurrent I/O, Veritas Quick I/O

(Diagram: without Direct I/O, agents, prefetchers and page cleaners move table and index pages between the DB2 bufferpools and the filesystem buffer cache; with Direct I/O, the filesystem buffer cache is bypassed and I/O flows directly between the DB2 bufferpools and the tables and indexes on disk.)

• Bypasses the file system buffer cache
• Since V9.5, this is enabled by default on most platforms (*)
• Can be disabled via ALTER TABLESPACE … FILE SYSTEM CACHING (or explicitly enabled via … NO FILE SYSTEM CACHING); see the sketch below
(*) See details here:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.dbobj.doc/doc/c0051304.html
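A minimal sketch; the tablespace names are hypothetical:

ALTER TABLESPACE data_ts NO FILE SYSTEM CACHING;  -- use Direct/Concurrent I/O (the default since V9.5 on most platforms)
ALTER TABLESPACE lob_ts FILE SYSTEM CACHING;      -- keep file system caching, eg. for the LOB scenarios on the next charts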

I/O (and Bufferpool) Hints / Tips

• Aim for at least 80-85% hit rates on data pages, and 90-95% on index pages (OLTP), for important tables/indexes (see the monitoring sketch after this list):

100% * (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads

100% * (pool_index_l_reads – pool_index_p_reads) / pool_index_l_reads

• Take advantage of DB2’s Self Tuning Memory Manager (STMM)

• See appendix charts

• Avoid dirty steals
• Monitor element: pool_drty_pg_steal_clns

• Strategies for eliminating dirty steals:

• Ensure sufficient page cleaners (aka IOCLEANER)

• In V9 and beyond, NUM_IOCLEANERS=AUTOMATIC usually works well

• Lower setting for CHNGPGS_THRESH – though watch out for excessive disk writes

• Consider Alternate Page Cleaning

db2set DB2_USE_ALTERNATE_PAGE_CLEANING = ON

• Works best with OLTP systems, usually resulting in smoother, less-bursty write I/O

• Not fully proven in warehouse environments
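A minimal monitoring sketch for the hit-rate formulas and dirty-steal counter above, using MON_GET_BUFFERPOOL (verify the column names against your DB2 level):

SELECT VARCHAR(BP_NAME, 20) AS BP_NAME,
       DECIMAL(100.0 * (POOL_DATA_L_READS - POOL_DATA_P_READS)
               / NULLIF(POOL_DATA_L_READS, 0), 5, 2) AS DATA_HIT_PCT,
       DECIMAL(100.0 * (POOL_INDEX_L_READS - POOL_INDEX_P_READS)
               / NULLIF(POOL_INDEX_L_READS, 0), 5, 2) AS INDEX_HIT_PCT,
       POOL_DRTY_PG_STEAL_CLNS AS DIRTY_STEALS
  FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2));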

I/O (and Bufferpool) Hints / Tips (continued)

• Maximize the % of reads that are asynchronous; equivalently, minimize the synchronous read percentage (see the sketch after this list):
100% * (pool_data_p_reads – async_data_reads) / pool_data_p_reads

• Strategies:

• Ensure sufficient prefetchers (aka IOSERVERS)

• In V9 and beyond, NUM_IOSERVERS=AUTOMATIC usually works well

• Larger PREFETCHSIZE (ALTER TABLESPACE … PREFETCHSIZE)

• Use a clustering strategy (eg. clustering indexes and MDC)

• Use NO FILE SYSTEM CACHING in most cases
• Potential exception: some LOB workloads
• LOBs are not cached in DB2 bufferpools
• All LOB I/O is direct to/from the tablespace containers
• If there is a high degree of locality in the LOB read patterns, reads could be serviced from the file system cache

• Multiple LOB writes in the same transaction can be done asynchronously, with file system caching

• Such writes are sync-ed to disk at commit time

• Net: if you have a high degree of locality in LOB read patterns and/or write multiple LOBs per transaction, consider placing LOBs in separate tablespace(s) with file system caching

• Also consider LOB INLINING
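A minimal sketch of the asynchronous-read ratio; POOL_ASYNC_DATA_READS is assumed here as the MON_GET_BUFFERPOOL counterpart of the async_data_reads element, so verify it against your DB2 level:

SELECT VARCHAR(BP_NAME, 20) AS BP_NAME,
       DECIMAL(100.0 * POOL_ASYNC_DATA_READS
               / NULLIF(POOL_DATA_P_READS, 0), 5, 2) AS ASYNC_READ_PCT
  FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2));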


We’ve Covered …

• Architecture Overview

• Storage Architecture
• Tablespace Internals
• Automatic Storage
• Storage Groups
• Space Reclamation
• Table Management
• Tables, Records, Indexes
• Page Format, Space Management
• Row Compression (including Adaptive Compression)
• Currently Committed
• Insert Time Clustered Tables
• Columnar Tables & Compression
• Logging and I/O
• Logging and Recovery Mechanisms
• Log Archival Compression
• I/O Mechanisms

Questions?