Idug Tridex 2013 Db2 10.5 d

Matt Huras, IBM

Tridex Regional DB2 User’s Group

Sept 2013 Meeting, NYC

DB2 10.5 (including BLU Acceleration): A Technical Overview


DB2 10: 3x Query Performance Boost, 50% Compression Boost, Temporal Query, noSQL Graph Store, HADR Multiple Standby
• TCO & Performance
  – 3x Query Performance
  – New Index Exploitation
  – Adaptive Compression
  – Multi-temp Storage
  – Real-time Warehousing
• Ease of Development
  – Temporal Query
  – 98% SQL Compatibility
  – Graph Store
  – RCAC
• Reliability / Availability
  – pureScale Integration & Enhancements
  – WLM Enhancements
  – Reorg Avoidance
  – HADR Multiple Standby

pureScale (DB2 9.8): Virtually Unlimited Capacity, Transparent Scalability, Leading Availability
• TCO & Performance
  – Any OLTP/ERP workload
  – Start small; grow with your business
• Ease of Development
  – Application Transparent Scaling
  – Avoid the risk & cost of tuning your applications to the database topology
• Reliability / Availability
  – Maintain service across planned & unplanned events

DB2 10.5
• TCO & Performance
  – Memory-optimized BLU Acceleration
  – Workload Consolidation with pureScale
  – Even more performance!
• Ease of Development
  – Enhanced SQL Compatibility
  – More noSQL Integration
• Reliability / Availability
  – Rolling Updates
  – HADR with pureScale
  – pureScale Active/Active DR Enhancements
  – Online add/drop Member
  – Other availability enhancements

DB2 10.5 themes
• Analytics at the Speed of Thought: BLU Acceleration
• Always Available Transactions, Online Maintenance: pureScale, HADR
• Future Proof Versatility: Enhanced SQL & noSQL function


What is DB2 with BLU Acceleration?

• New innovative technology for analytic queries
  – Columnar storage
  – New run-time engine with vector (aka SIMD) processing, deep multi-core optimizations and cache-aware memory management
  – “Active compression”: unique encoding for further storage reduction beyond DB2 10 levels, and run-time processing without decompression
• “Revolution by Evolution”
  – Built directly into the DB2 kernel
  – BLU tables can coexist with traditional row tables, in the same schema, tablespaces and bufferpools
  – Query any combination of BLU or row data
  – Memory-optimized (not “in-memory”)
• Value: order-of-magnitude benefits in …
  – Performance
  – Storage savings
  – Time to value

How fast is it ? … Results from the DB2 10.5 Beta

Customer                              Speedup over DB2 10.1
Large Financial Services Company      46.8x
Global ISV Mart Workload              37.4x
Analytics Reporting Vendor            13.0x
Global Retailer                       6.1x
Large European Bank                   5.6x

8x-25x improvement is common

“It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway

POPS (Proof of Performance and Scalability)
• Derived from Redbrick performance test
• Classic sales analytics
  – 5.5 years of data (2000 days) for 63 stores
  – ~4TB of raw data
  – 2 fact tables
  – 5 dimension tables
• Broad range of queries with varying selectivity / aggregation

Substantial Storage Savings with BLU Acceleration
• 2.5x less space than DB2 10.1

Massive Performance Gains
• 133x speedup over DB2 10.1
• Maximum query speedup over 900x


Recent Internal Test (lab tests, YMMV)
• Server: Intel® Xeon® Processor E5-4650, 32 cores total (4 CPUs), 384 GB
• Storage: DS5300 (2x16 disks)
[Chart: total elapsed time over all queries (min). DB2 10.1: 621; DB2 BLU: 4.7; 133x speedup]

Significant Storage Savings (lab tests, YMMV)
• ~2x-3x storage reduction vs DB2 10.1 adaptive compression (comparing all objects: tables, indexes, etc.)
  – New advanced compression techniques
  – Fewer storage objects required
[Chart: storage footprint comparison; the surviving bar label is "DB2 with BLU Accel."]

© 2013 IBM Corporation

DB2 with BLU Acceleration : The 7 Big Ideas


7 Big Ideas (2): Compute Friendly Encoding and Compression
• Massive compression with approximate Huffman encoding
  – The more frequent the value, the fewer bits it is encoded with
  – E.g., there will typically be more sales records from states with higher populations
    • New York and California may be encoded with only 1 or 2 bits
    • Alaska and Rhode Island may be encoded in 12 bits
• Register-friendly encoding optimizes CPU & memory efficiency
  – Encoded values packed together to match the register width of the CPU
  – Fewer I/Os, better memory utilization, fewer CPU cycles to process
[Figure (conceptual): compression dictionary for the STATE encoding, showing New York, California, Illinois, Michigan, Florida, Alaska and Rhode Island with codes of different lengths packed to the register length]
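The frequency-based idea above can be illustrated with a small sketch. It uses a classic Huffman code rather than BLU's approximate Huffman encoding, and the state counts are invented for illustration; the function name `huffman_code_lengths` is hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(values):
    """Return {value: code length in bits} for a classic Huffman code."""
    freq = Counter(values)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (total count, unique tie-breaker, {value: depth so far}).
    heap = [(count, i, {value: 0}) for i, (value, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every value in them one level deeper.
        merged = {v: depth + 1 for v, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical sales-record counts per state (assumption, not from the slides).
sales_states = (["California"] * 500 + ["New York"] * 300 + ["Illinois"] * 120
                + ["Michigan"] * 60 + ["Rhode Island"] * 15 + ["Alaska"] * 5)
lengths = huffman_code_lengths(sales_states)
for state, bits in sorted(lengths.items(), key=lambda item: item[1]):
    print(f"{state}: {bits} bit(s)")
```

Frequent values end up with the shortest codes, which is the property the slide relies on; the exact bit counts depend entirely on the assumed frequencies.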

7 Big Ideas (2): Data Remains Compressed During Evaluation
• Encoded values do not need to be decompressed during evaluation
  – Predicates (=, <, >, >=, <=, BETWEEN, etc.), joins, aggregations and more work directly on encoded values

SELECT COUNT(*) FROM T1 WHERE STATE = 'California'

[Figure: the literal 'California' is encoded once with the STATE encoding, then compared against the encoded STATE column (California, New York, California, California, New York, California, Illinois, Michigan, Alaska, Rhode Is…); the count steps through 1, 2, 3, 4 as matches are found]
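A minimal sketch of predicate evaluation on encoded values follows. It assumes a simple order-preserving dictionary (sorted value to integer code), which is a stand-in for illustration and not BLU's actual encoding; the class `EncodedColumn` is hypothetical.

```python
import bisect

class EncodedColumn:
    """Column stored as integer codes from an order-preserving dictionary."""

    def __init__(self, values):
        self.sorted_values = sorted(set(values))
        code_of = {value: code for code, value in enumerate(self.sorted_values)}
        self.codes = [code_of[value] for value in values]

    def count_equal(self, literal):
        # Encode the literal once, then compare codes only.
        pos = bisect.bisect_left(self.sorted_values, literal)
        if pos == len(self.sorted_values) or self.sorted_values[pos] != literal:
            return 0  # literal not in the dictionary, so no row can match
        return sum(1 for code in self.codes if code == pos)

    def count_less_than(self, literal):
        # Order-preserving codes let range predicates run on codes too.
        bound = bisect.bisect_left(self.sorted_values, literal)
        return sum(1 for code in self.codes if code < bound)

state = EncodedColumn(["California", "New York", "California", "California",
                       "New York", "California", "Illinois", "Michigan",
                       "Alaska", "Rhode Island"])
print(state.count_equal("California"))
print(state.count_less_than("Illinois"))
```

The column values are never decoded during the scan; only the literal in the predicate is translated into code space.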

7 Big Ideas (3): Multiply the Power of the CPU
• Performance increase with Single Instruction Multiple Data (SIMD)
• Using hardware instructions, DB2 with BLU Acceleration can apply a single instruction to many data elements simultaneously
  – Predicate evaluation, joins, grouping, arithmetic
• Without SIMD processing the CPU will apply each instruction to each data element
  – E.g., compare records to 2005
[Figure, without SIMD: the processor core issues a separate "Compare = 2005" instruction for each value in the data stream (2001, 2002, … 2007); 2005 reaches the result stream]
[Figure, with SIMD: one "Compare = 2005" instruction is applied to a group of values at a time (2001-2004, 2005-2008, 2009-2012); 2005 reaches the result stream]
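The "one instruction, many values" idea can be sketched without real SIMD hardware by packing several encoded values into one integer and comparing all of them with a handful of bitwise operations (sometimes called SIMD within a register). This is a conceptual illustration only: Python integers are not CPU vector registers, and the lane width and names below are assumptions.

```python
LANE_BITS = 16   # assumed width of one encoded value
LANES = 4        # values packed into one 64-bit "register"
WORD_MASK = (1 << (LANE_BITS * LANES)) - 1

def broadcast(value):
    """Repeat one lane value in every lane of the word."""
    word = 0
    for lane in range(LANES):
        word |= value << (lane * LANE_BITS)
    return word

LOW_BITS = broadcast((1 << (LANE_BITS - 1)) - 1)  # 0x7FFF in every lane

def pack(values):
    """Pack LANES values (each below 2**LANE_BITS) into one word."""
    word = 0
    for lane, value in enumerate(values):
        word |= value << (lane * LANE_BITS)
    return word

def lanes_equal(word, target):
    """Compare every lane to target using whole-word operations only."""
    diff = word ^ broadcast(target)                  # a lane is 0 iff it matches
    # Top bit of each lane becomes 1 iff that lane of diff is nonzero;
    # adding LOW_BITS to the low 15 bits cannot carry into the next lane.
    nonzero = ((diff & LOW_BITS) + LOW_BITS) | diff | LOW_BITS
    zero_flags = ~nonzero & WORD_MASK                # top bit set iff lane matched
    top = LANE_BITS - 1
    return [bool((zero_flags >> (lane * LANE_BITS + top)) & 1)
            for lane in range(LANES)]

years = pack([2001, 2005, 2003, 2005])
matches = lanes_equal(years, 2005)
print(matches)
```

The comparison itself costs the same number of word-level operations regardless of how many lanes the word holds, which is why narrower encoded values mean more comparisons per instruction.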

7 Big Ideas (4): Core-Friendly Parallelism
• BLU queries are automatically parallelized across cores, and achieve excellent multi-core scalability via …
  – careful data placement and alignment
  – careful attention to physical attributes of the server
  – and other factors, designed to maximize CPU cache hit rate & cacheline efficiency
[Figure: core 0 working on blue data and core 1 working on green data, each with its own cache; the main memory layout for concurrent SELECT c1 FROM… / SELECT c2 FROM… queries is arranged so that cacheline ‘ping-pong’ is avoided and cache line traffic is minimal]
[Figure: a larger “working set” of memory accesses than the cache can hold causes frequent, slow memory access; BLU tries to match the “working set” to actual cache size and processes the remaining portion of data in sequence, which minimizes memory access]

7 Big Ideas (5): Column Oriented Storage
• Massive improvements in I/O efficiency
  – Only perform I/O on the columns involved in the query
  – No need to consume bandwidth for other columns
  – Deeper compression possible due to commonality within column values
• Massive improvements in memory and cache efficiency
  – Columnar data kept compressed in memory
  – Data packed into cache friendly structures
  – Late materialization
    • Predicates, joins, scans, etc. all operate on columns packed in memory
    • Rows are not materialized until absolutely necessary to build result set
  – No need to consume memory/cache space & bandwidth for unneeded columns
[Figure: columns C1 … C8 stored separately and packed in different buffers in memory. SELECT C4 ... WHERE C4=X consumes I/O bandwidth, memory buffers and memory bandwidth only for C4]
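A toy sketch of columnar access with late materialization, under the assumption that each column is just a separate Python list; `ColumnTable` and its methods are hypothetical names, not DB2 interfaces.

```python
class ColumnTable:
    """Each column is stored separately; reads are tracked per column."""

    def __init__(self, columns):
        self.columns = columns          # column name -> list of values
        self.columns_read = set()       # which columns a query touched

    def read_column(self, name):
        self.columns_read.add(name)
        return self.columns[name]

    def select(self, project, where_column, predicate):
        # Step 1: evaluate the predicate on one column, keeping row positions.
        positions = [i for i, value in enumerate(self.read_column(where_column))
                     if predicate(value)]
        # Step 2: late materialization, fetching only the projected columns
        # and only for the qualifying positions.
        needed = {name: self.read_column(name) for name in project}
        return [tuple(needed[name][i] for name in project) for i in positions]

# Eight columns C1..C8 with five rows each (made-up values).
table = ColumnTable({f"C{n}": [n * 10 + row for row in range(5)] for n in range(1, 9)})
rows = table.select(project=["C4"], where_column="C4", predicate=lambda v: v >= 43)
print(rows, table.columns_read)
```

Only C4 is ever read for this query; the other seven columns cost no I/O or memory bandwidth in this model.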

7 Big Ideas (6): Scan-Friendly Memory Caching
• Memory-optimized (not “In-Memory”)
  – No need to ensure all data fits in memory
• BLU includes new scan-friendly victim selection to keep a near optimal % of pages buffered in memory
  – Traditional RDBMSes use ‘most recently used’ victim selection for large scans
    • “There’s no hope of caching everything, so just victimize the last page read”
  – A key BLU design point is to run well when all data fits in memory, and when it doesn’t!
    • Even with large scans, BLU prefers selected pages in the bufferpool, using an algorithm that adaptively computes a target hit ratio for the current scan, based on the size of the bufferpool, the frequency of pages being re-accessed in the same scan, and other factors
  – Benefit: less I/O!
[Figure: RAM and disks, with near optimal caching]

7 Big Ideas (7): Data Skipping
• Automatic detection of large sections of data that do not qualify for a query and can be ignored
• Order of magnitude savings in all of I/O, RAM, and CPU
• No DBA action to define or use
  – “Synopsis” automatically created and maintained as data is LOADed or INSERTed
  – Persistent storage of min and max values for sections of data values
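A minimal sketch of min/max data skipping, assuming fixed-size sections and an in-memory synopsis; the section size, data and function names are illustrative assumptions rather than how DB2 lays out its synopsis.

```python
def build_synopsis(values, section_size):
    """Record (start offset, min, max) for each fixed-size section."""
    synopsis = []
    for start in range(0, len(values), section_size):
        section = values[start:start + section_size]
        synopsis.append((start, min(section), max(section)))
    return synopsis

def count_equal_with_skipping(values, synopsis, section_size, target):
    """Count matches, scanning only sections whose range can contain target."""
    matches = 0
    sections_scanned = 0
    for start, low, high in synopsis:
        if target < low or target > high:
            continue  # the whole section is skipped without reading it
        sections_scanned += 1
        matches += sum(1 for v in values[start:start + section_size] if v == target)
    return matches, sections_scanned

# 10 years of data loaded in date order: 1000 rows per year, 2004-2013.
years = [year for year in range(2004, 2014) for _ in range(1000)]
synopsis = build_synopsis(years, section_size=500)
matches, scanned = count_equal_with_skipping(years, synopsis, 500, 2010)
print(matches, scanned, len(synopsis))
```

Skipping works best when data arrives roughly ordered on the filtered column, as with date-ordered loads; randomly ordered data would leave most section ranges overlapping the target.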


How BLU Helps: A Hypothetical Example
• The setup: 4TB table with 100 columns, 10 years of data, 2004-2013.
• The query: SELECT COUNT(*) FROM MYTABLE WHERE YEAR = '2010'
• The challenge: subsecond response to a 4TB query on a 32 core server, without defining an index.
• The action:
  1. Compression reduces data size to 1/10th (divide by 10): 4TB becomes 400GB
  2. Columnar access touches only 1 of 100 columns (divide by 100): 400GB becomes 4GB
  3. Automatic synopsis eliminates pages without 2010 data (divide by 10): 4GB becomes 400MB
  4. Core-friendly parallelism on 32 core system (each core scans 1/32): 400MB becomes 12.5MB per core
  5. With compute-friendly encoding and SIMD, scan efficiency is 4x faster than traditional (divide by ~4): comparable to a traditional scan of about 3.1MB per core
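The arithmetic in this example can be checked with a few lines. The divisors are the ones stated on the slide; decimal units (1 TB = 10^12 bytes) are an assumption.

```python
TB = 10**12
MB = 10**6

# (step, divisor) pairs taken from the hypothetical example above.
steps = [
    ("compression", 10),
    ("columnar access, 1 of 100 columns", 100),
    ("synopsis data skipping", 10),
    ("parallelism across 32 cores", 32),
    ("encoding + SIMD scan efficiency", 4),
]

def remaining_work(start_bytes, steps):
    """Apply each divisor in turn and return the size after every step."""
    sizes = []
    size = start_bytes
    for name, divisor in steps:
        size = size / divisor
        sizes.append((name, size))
    return sizes

sizes = remaining_work(4 * TB, steps)
for name, size in sizes:
    print(f"{name}: {size / MB:,.3f} MB")
```

The combined reduction is 10 x 100 x 10 x 32 x 4 = 1,280,000, which takes 4TB down to roughly 3.1MB of equivalent scan work per core.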

7 Big Ideas (1): Simple to Implement and Use
• LOAD and then… run queries
  – Significantly reduced or no need for …
    • Indexes
    • REORG (it’s automated)
    • RUNSTATS (it’s automated)
    • MDC or MQTs or Materialized Views
    • Statistical views
    • Optimizer hints
• It is just DB2!
  – Same SQL, language interfaces, administration
  – Same DB2 process model, storage, bufferpools

“The BLU Acceleration technology has some obvious benefits: It makes our analytical queries run 4-15x faster and decreases the size of our tables by a factor of 10x. But it’s when I think about all the things I don't have to do with BLU, it made me appreciate the technology even more: no tuning, no partitioning, no indexes, no aggregates.”
- Andrew Juarez, Lead SAP Basis and DBA

7 Big Ideas (1): Simple to Implement and Use
• One setting optimizes the system for BLU Acceleration
  – Set DB2_WORKLOAD=ANALYTICS
  – Informs DB2 that the database will be used for analytic workloads
• Automatically configures DB2 for optimal analytics performance
  – Makes column-organized tables the default table type
  – Sets up default page (32KB) and extent size (4) appropriate for analytics
  – Enables automatic workload concurrency management
  – Enables automatic space reclaim
  – Memory for caching, sorting and hashing (bufferpool, sortheap) and for utilities (utility heap) is automatically initialized based on the server size and available RAM
• Simple Table Creation
  – If DB2_WORKLOAD=ANALYTICS, tables will be created column organized automatically
  – Data is always automatically compressed - no options
  – For mixed table types, tables can be defined as ORGANIZE BY COLUMN or ROW
• Utility to convert tables from row-organized to column-organized
  – db2convert utility


How to Create a BLU Table

CREATE STOGROUP SG1 ON 'path1', 'path2', 'path3'

CREATE TABLESPACE TS1 USING STOGROUP SG1

CREATE TABLE SALES
  (SALESKEY        BIGINT NOT NULL,
   SALESPERSONKEY  INT NOT NULL,
   PRODUCTKEY      INT NOT NULL,
   PERIODKEY       BIGINT NOT NULL,
   …)
  ORGANIZE BY COLUMN IN TS1

• Use the new DFT_TABLE_ORG database configuration parameter to set default table organization (row or column)
[Figure: table SALES is created in tablespace TS1; TS1 is created using a storage group and created with a bufferpool]

DB2 10.5 Automatic Concurrency Management
• Every additional query naturally consumes more memory, locks, CPU & memory bandwidth
• In some databases, more queries can lead to contention and performance degradation
• DB2 10.5 avoids this by automatically optimizing the level of concurrency
  – DB2 10.5 allows an arbitrarily high number of concurrent queries to be submitted, but limits the number that consume resources at any point in time
  – Lightweight queries that need instant response bypass this control
• Enabled automatically when DB2_WORKLOAD=ANALYTICS
[Figure: applications & users submit up to tens of thousands of SQL queries at once to the DB2 DBMS kernel; a moderate number of queries consume resources, automatically determined based on available machine CPU resources]

DB2 10.5 Automatic Concurrency Management : Details
• New objects created in all 10.5 databases
  – SYSDEFAULTMANAGEDSUBCLASS - default subclass for managed queries
  – SYSDEFAULTUSERWAS - default work class set to map expensive queries (cost > X) to the above subclass
  – SYSDEFAULTCONCURRENT - default concurrency threshold to limit concurrently executing managed queries to N
• Default concurrency threshold enabled on database creation when DB2_WORKLOAD=ANALYTICS
• X and N are determined automatically by DB2
  – Based on available CPU resources
[Figure (DB2 10.5 additions shown in blue): within SYSDEFAULTUSERCLASS, work from SYSDEFAULTUSERWORKLOAD passes through the SYSDEFAULTUSERWAS work class set. If query cost > X, it runs in SYSDEFAULTMANAGEDSUBCLASS, where the SYSDEFAULTCONCURRENT threshold limits concurrency to N and queues the excess; else it runs in SYSDEFAULTSUBCLASS]
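The routing logic described above (cheap queries bypass the gate, expensive ones are limited to N and the excess is queued) can be sketched as follows. This is a simplified model with hypothetical names and hand-picked X and N; DB2 determines both values itself and implements this through WLM objects, not application code.

```python
from collections import deque

class ConcurrencyGate:
    """Toy admission control: cost threshold X, concurrency limit N."""

    def __init__(self, cost_threshold, max_concurrent):
        self.cost_threshold = cost_threshold   # "X" on the slide
        self.max_concurrent = max_concurrent   # "N" on the slide
        self.running = set()                   # managed queries now executing
        self.queued = deque()                  # managed queries waiting

    def submit(self, query_id, estimated_cost):
        if estimated_cost <= self.cost_threshold:
            return "run (unmanaged)"           # lightweight queries bypass the gate
        if len(self.running) < self.max_concurrent:
            self.running.add(query_id)
            return "run (managed)"
        self.queued.append(query_id)
        return "queued"

    def finish(self, query_id):
        """Mark a managed query done; return the queued query admitted, if any."""
        self.running.discard(query_id)
        if self.queued and len(self.running) < self.max_concurrent:
            next_id = self.queued.popleft()
            self.running.add(next_id)
            return next_id
        return None

gate = ConcurrencyGate(cost_threshold=100, max_concurrent=2)
outcomes = [gate.submit("q1", 500), gate.submit("q2", 900),
            gate.submit("q3", 700), gate.submit("q4", 5)]
admitted = gate.finish("q1")
print(outcomes, admitted)
```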

Automatic Space Reclaim
• Traditional ‘reorg’ not needed with BLU tables
  – No concept of ‘clustering’
• Deleted space can be easily reclaimed via REORG TABLE t1 RECLAIM EXTENTS
  – Freed extents can exist anywhere in the column object (uses the efficient “sparse table” technique used with MDC and ITC tables)
  – The storage can be subsequently reused by any table in the tablespace
  – Done online while work continues
• Done automatically via DB2’s automatic table maintenance when DB2_WORKLOAD=ANALYTICS
  – Space is freed online while work continues
[Figure: storage extents for Column1, Column2 and Column3 holding 2012 and 2013 values. After DELETE FROM MyTable WHERE Year = 2012, the extents that held 2012 values hold only deleted data]


A Brief Look at Internals : BLU Storage

Row-organized table:

CREATE TABLE t1 …
  IN TS1 INDEX IN TS2
ALTER TABLE …
  ADD CONSTRAINT uc1
  UNIQUE (c2)
CREATE INDEX i1 …
CREATE INDEX i2 …
CREATE INDEX i3 …

• Table (dat) object in TS1: each extent contains pages of rows
• Index (inx) object in TS2: holds uc1, i1, i2 and i3

Column-organized table:

CREATE TABLE t1 …
  ORGANIZE BY COLUMN
  IN TS1 INDEX IN TS2
ALTER TABLE …
  ADD CONSTRAINT uc1
  UNIQUE (c2)

• Table (dat) object: meta data & compression dictionary
• Column (col) object: each extent contains pages of data for 1 column (c1, c2, c3, c4)
• Synopsis: records the range of column values existing in different regions of the table
• Index (inx) object in TS2: holds uc1

A Quick Look at BLU Internals : A Scenario
0) DB2_WORKLOAD=ANALYTICS
1) CREATE db, tablespaces
2) CREATE TABLE t1 … PRIMARY KEY (c1) ORGANIZE BY COLUMN IN TS1 INDEX IN TS2
3) LOAD FROM myfile INTO t1 …
   – automatically maintains synopsis & collects table & index statistics
4) SELECT c2,c3 … FROM t1 WHERE …
5) INSERT INTO t1 …
   – automatically maintains synopsis
6) DELETE FROM t1 WHERE …
7) Automatic table maintenance returns space to tablespace
[Figure: the table (dat) object (meta data & compression dictionary), the column (col) object (each extent contains pages of data for 1 column, c1 to c4), the synopsis (records the range of column values existing in different regions of the table) and the index (inx) object, shown across TS1 and TS2 as the steps run; extents are <empty> once their space has been returned]

1TB SAP BW Queries (lab tests, YMMV)
• System: IBM Power 7 750, AIX 6.1 TL6, 3.3GHz 32-core, 130GB RAM available, DS5300 w 48 spindles
[Chart: query elapsed time (seconds) for Q01, Q01a, Q02 … Q20, DB2 10.1 vs. BLU on pre-GA 10.5 build. 771s vs. 18s: 42.8x speedup]

Example BLU Use Case : EDW Offload
[Figure: an Enterprise Data Warehouse serves the EDW application; data is offloaded, “load and go”, into an analytic data mart (BLU tables) on multi-platform software, which serves an OLAP application (Cognos BI with BLU Acceleration)]

Cognos with BLU Acceleration
• Cognos BI 10.2
  – Dynamic Cubes (ROLAP)
  – Extends Dynamic Query with in-memory caching of members, data, expressions, results, and aggregates
  – 963GB of raw data
  – 7 fact tables
  – 17 dimension tables
  – Workload consists of both loading the cache and running ad-hoc reports not satisfied in the cache
• In-Memory Aggregate Cache Load
  – 18x faster than DB2 10.1
• Ad-hoc Cognos Reports
  – 14x faster than DB2 10.1
• Test system
  – Server: POWER7+ 760
  – CPU: 48 cores @ 3.4GHz, 1TB RAM
  – Cognos/DB2 client LPAR: 23 cores, 384GB RAM
  – DB2 server LPAR: 24 cores, 460GB RAM
  – 1 core, 4GB RAM dedicated to VIOS
  – Storage: V7000 with 1TB SSD and 4TB HDD
[Charts: Cognos aggregate cache load elapsed time (18x faster) and report workload elapsed time (14x faster), DB2 10.1 vs. DB2 10.5]

DB2 with BLU Acceleration : Summary
• Breakthrough technology
  – Combines and extends the leading technologies
  – Over 25 patents filed and pending
  – Leveraging years of IBM R&D spanning 10 laboratories in 7 countries worldwide
• Typical experience
  – 8x-25x performance gains
  – 10x storage savings vs. uncompressed data with indexes
  – Simple to implement and use
• Order of magnitude improvements in
  – Consumability
  – Speed
  – Storage savings
DB2 10.5 with BLU Acceleration: super analytics, super easy



DB2 10.5 & Directions
• Analytics at the Speed of Thought: BLU Acceleration
• pureScale, HADR
• Future Proof Versatility

Extending the pureScale Value Proposition
Learning from the undisputed Gold Standard... System z
• Virtually Unlimited Capacity
  – Buy only what you need, add capacity as your needs grow
• Application Transparency
  – Avoid the risk and cost of application changes
• Continuous Availability
  – Deliver uninterrupted access to your data with consistent performance

pureScale as a Consolidation Platform
• Consolidate multiple workloads onto the same resource infrastructure
• Save management and resource costs
[Figure, before: separate DB2 systems for DATABASE A, DATABASE B, … DATABASE N, each paired with a standby that is passive until failover]
[Figure, after: Workload A, B and C applications share one pureScale cluster (CF, members, shared storage) hosting DATABASE A, DATABASE B, … DATABASE N. A single pool of members can serve all database workloads]

Member Subsets : Motivation
• Current pureScale Workload Balancing Design
  – All work is automatically balanced across all members
  – Rebalancing occurs on transaction or connection boundaries (configuration option)
  – Works very well with single workloads, even with non-homogeneous system configurations
  – Not ideal for some scenarios
[Figure: automatic workload balancing spreads a combined Batch and OLTP workload across all members of the cluster (CF, shared storage)]

Member Subsets
• Provides workload balancing and management within defined member subsets
• Member subsets:
  – Are defined as a database alias with a new SP
  – Can be modified dynamically
    • Applications react by distributing work to new members and draining work from removed members
  – On member failure, applications are automatically re-routed to another member in the subset
    • If no such member is active, a member not part of any subset can be chosen (if the subset is defined as ‘inclusive’)
• Stored procedures: SYSPROC.WLM_CREATE_MEMBER_SUBSET, SYSPROC.WLM_ALTER_MEMBER_SUBSET
• Shared hot spare members can be used by any workload using an inclusive member subset, if all members in the subset fail
[Figure: subset workload balancing. Batch and OLTP, or Workload 1, Workload 2 and Workload 3, each run on their own member subset of the same database]

Per-Member Self-Memory Management Tuning
• Current pureScale STMM design
  – Single ‘tuning’ member makes local tuning decisions
  – Broadcasts memory tuning changes to other members
  – Tuning member is highly available: it automatically moves to another member in the event of failure on the original member
  – Works well in single homogeneous workload scenarios
  – Not ideal in workload consolidation scenarios
[Figure: one STMM daemon on the tuning member; its tuning decisions are broadcast to the other members (CF, shared storage)]



Per-Member Self-Memory Management Tuning
• Per-member STMM approach
  – Each member makes autonomous tuning decisions
• Key use cases:
  – Workload consolidation
  – Non-homogeneous member configurations
• Default for new databases
  – Existing databases retain current behavior
• Control via:

CALL SYSPROC.ADMIN_CMD('get stmm tuning member')

CALL SYSPROC.ADMIN_CMD('update stmm tuning member -2')

  – New -2 setting invokes per-member tuning
[Figure: every member runs its own STMM daemon and makes workload-specific tuning decisions (CF, shared storage)]

Explicit Hierarchical Locking (EHL)

• Designed to remove data sharing costs for tables/partitions that are only accessed by a single member
  – Avoids CF communication if object sharing is not occurring
• Example target scenarios
  – Workload consolidation
  – Multi-tenancy
  – Directed batch
• Enabled via new OPT_DIRECT_WRKLD database configuration parameter
  – Detection of data access patterns happens automatically, and EHL will kick in when data is not being shared after the configuration parameter is set
[Figure: four members, each accessing its own table or partition (A, B, C, D); no member/CF communication necessary]

Multi-Tenancy Demo : 10 Independent Workloads (lab tests, YMMV)
• 10 Members x 10 cores each, 2 CF x 8 cores each
• Each member runs a separate 70% read / 30% write transactional workload, representing a different tenant
  – E.g. different regions, different subsidiaries, different customers in a SaaS environment
• Over 90% scaling at 10 members!
• More than 850,000 SQL statements per second across 10 members
[Figure: members M0 to M9 plus CFp and CFs; RDMA interconnect over 10 Gb RoCE Ethernet through a 10 Gb RoCE switch; Fibre Channel storage interconnect to 4x IBM TMS820 Flash Storage Units, 20 TB each]
[Chart: relative performance (0 to 10) vs. number of members running workloads (1 to 10)]

Random Key Indexes
• Example Motivating Scenario
  – An online store defines an ORDER_NUM column
  – ORDER_NUM is indexed for fast lookups
  – Each concurrent transaction gets a newly incremented ORDER_NUM value
  – Results in frequent attempts by each member to update the last index leaf page to add the latest ORDER_NUM
• Random key indexes

CREATE INDEX i1 ON t1 (ORDER_NUM RANDOM)

  – Each ORDER_NUM is randomized before insertion into the index
  – Spreads access requests evenly across index leaf pages
  – Lookups apply reverse algorithm
  – Not usable for scans
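A rough sketch of why randomizing keys removes the hot spot. The scrambling function here is a reversible multiplicative hash chosen for illustration; it is not DB2's algorithm, and the 16-page range partitioning is an assumed stand-in for index leaf pages.

```python
MODULUS = 2**32
MULTIPLIER = 2654435761                  # odd, so it has an inverse mod 2**32
INVERSE = pow(MULTIPLIER, -1, MODULUS)   # requires Python 3.8+

def scramble(key):
    """Map a key to a pseudo-random position in the key space (reversible)."""
    return (key * MULTIPLIER) % MODULUS

def unscramble(scrambled):
    """Recover the original key from its scrambled form."""
    return (scrambled * INVERSE) % MODULUS

def leaf_page(index_key, pages=16):
    """Assume the key space is range-partitioned evenly over leaf pages."""
    return index_key * pages // MODULUS

# 64 consecutive, newly incremented order numbers.
order_nums = range(1_000_000, 1_000_064)
plain_pages = {leaf_page(n) for n in order_nums}
random_pages = {leaf_page(scramble(n)) for n in order_nums}
print(len(plain_pages), len(random_pages))
```

Ascending keys all land on the same page, while scrambled keys spread over many pages. The trade-off matches the slide: neighbouring keys are no longer adjacent in the index, so range scans cannot use it.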

Random Indexes : Extreme Example (lab tests, YMMV)
The Scenario
• 4 Members x 32c per member, SLES Linux
• Heavy insert activity into ascending timestamp index
  – Creates hot spot at high end of index
[Chart: TPS. Regular index: 2216; random index: 17449]

IBM achieves new WORLD RECORD on three-tier SAP® Sales and Distribution (SD) standard application benchmark with record 266,000 SAP SD users; reaching 1,471,680 SAPS (1)

Featuring 64-core IBM Power® 780, AIX® 7.1 & DB2® 10.5
• 1.47x more users than best Oracle result (3)
• DB2 on Power has held the leadership result for the highest number of SAP SD users on the three-tier SAP SD standard application benchmark for over 7 years (2)
[Chart: SAP SD Benchmark Users]

1) Results of DB2® 10.5 on IBM Power 780 on the three-tier SAP SD standard application benchmark on SAP enhancement package 5 for SAP ERP 6.0, achieved 266,000 SAP SD benchmark users, certification # 2013010. Configuration: 8 processors / 64 cores / 256 threads, POWER7+ 3.72 GHz, 512 GB memory, running AIX 7.1

2) Results of DB2® UDB 8.2.2 on IBM eServer p5 Model 595 on the three-tier SAP SD standard application benchmark running SAP R/3® Enterprise 4.70 (ERP) software, achieved 168,300 SAP SD benchmark users, certification # 2005021. Configuration: 32-core SMP, POWER5, 1.9 GHz, 256 GB memory, running AIX 5.3

3) Results of Oracle 11g Real Application Clusters (RAC) on SAP sales and distribution-parallel standard application benchmark running the SAP enhancement package 4 for SAP ERP 6.0, achieved 180,000 SAP SD benchmark users, certification # 2011037. Configuration: 8 x Sun Fire X4800 M2 each with 8 processors / 80 cores / 160 threads, Intel Xeon Processor E7-8870, 2.40 GHz, 8 x 512 GB memory, running Solaris 10

Source: http://www.sap.com/benchmark

SAP, R/3 and all SAP logos are trademarks or registered trademarks of SAP AG in Germany and several other countries. All other trademarks are the property of their respective owners.












Index on Expression
• What?
  – Allow indexes to be defined with an expression, e.g.

CREATE INDEX i1 ON emp (UPPER(lastname), salary+bonus)

• Value Proposition
  – Efficient execution of SQL statements with such expressions, e.g.

SELECT * FROM emp WHERE UPPER(lastname) = ?

SELECT * FROM emp WHERE salary+bonus = ?

  – Avoid the drawbacks of the work-around (index on generated column)
    • Space consumption of extra column
    • Potential need to modify applications to reference the new column

Excluding NULL Keys from Indexes
• What?
  – Allow indexes to be defined so that NULL keys are excluded, e.g.:

CREATE [UNIQUE] INDEX i1 ON t1 (c1, c2) EXCLUDE NULL KEYS

• Value Proposition
  – Support applications whose semantics require unique enforcement, but only where keys are not NULL
  – Storage savings! (Avoid indexing NULLs if they are infrequently queried)
• Notes
  – A NULL key is one where all key components are NULL

Example: the table already holds rows (C1, C2) = (1, NULL) and (NULL, NULL), with a unique index on (C1, C2)

Insert (C1, C2)    Regular Index    ENK Index
(1, NULL)          Fail             Fail
(NULL, NULL)       Fail             Success (key excluded from ENK index)
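The behaviour in the example can be modelled in a few lines. This is a toy model with a hypothetical `UniqueIndex` class; it assumes, as DB2 unique indexes do, that NULLs compare as equal for uniqueness, and uses Python's `None` for NULL.

```python
class UniqueIndex:
    """Toy unique index; optionally skips keys whose parts are all NULL."""

    def __init__(self, exclude_null_keys=False):
        self.exclude_null_keys = exclude_null_keys
        self.keys = set()

    def insert(self, key):
        """Return True if the insert succeeds, False on a uniqueness violation."""
        if self.exclude_null_keys and all(part is None for part in key):
            return True               # NULL key is not indexed, so never a duplicate
        if key in self.keys:
            return False              # duplicate key (NULLs treated as equal)
        self.keys.add(key)
        return True

regular = UniqueIndex()
enk = UniqueIndex(exclude_null_keys=True)
for index in (regular, enk):
    index.insert((1, None))           # existing rows
    index.insert((None, None))

results = {
    "regular": [regular.insert((1, None)), regular.insert((None, None))],
    "enk": [enk.insert((1, None)), enk.insert((None, None))],
}
print(results)
```

Note that (1, NULL) is still indexed and still enforced under EXCLUDE NULL KEYS, because only keys with every component NULL count as NULL keys.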

Extended Row Size
• What?
  – Allow tables to be defined with a row size which exceeds page size, e.g.:

CREATE TABLESPACE TS1 PAGESIZE 4K

CREATE TABLE T (c1 VARCHAR(4000), c2 VARCHAR(4000)) IN TS1

• Value Proposition
  – Support applications that require long row definitions
  – Avoid a lengthy table redefinition to change pagesize
• Notes
  – New maximum row length: 1,048,319 bytes
  – Excess row data stored in a LOB
  – Performance penalty when need to go ‘off page’ for a portion of row
    • Usually OK: in most scenarios instances of long rows are rare
  – If you expect long rows to be common, try to use a larger page size

Extended Row Size Scenario
0) DB2 UPDATE DB CFG FOR DB1 USING EXTENDED_ROW_SZ ENABLE
   – EXTENDED_ROW_SZ=ENABLE is the default only for new databases
1) CREATE TABLESPACE TS1 … PAGESIZE 32K
2) CREATE TABLE t1 … (C1 INT, C2 VARCHAR(30000), C3 VARCHAR(10000)) IN TS1 LONG IN TS2
3) INSERT INTO t1 VALUES (1,<10KB str>,<10KB str>)
4) INSERT INTO t1 VALUES (2,<30KB str>,<10KB str>)
   – creates extended row
5) UPDATE t1 SET C3=NULL WHERE C1=2
   – removes extended row
• SYSTABLES PCTEXTENDEDROWS column shows % of rows in a table that are extended
[Figure: table (dat) object in TS1 and LOB (lob) object in TS2]

Steady Increase in SQL Compatibility Over Time

• Steady increase in compatibility over time
  – More and more complex applications
• DB2 10.5 estimated to provide >98% statement compatibility

Data is based on DCW (Database Conversion Workbench) DB2 reports

App Development Trends
• New Era Applications: Mobile, Social, Big Data Analytics, Cloud
• Application Characteristics
  – Engaging
  – Mobile
  – Dynamic
  – Competitive
  – Fashionable
  – Scalable
  – Rapidly Changing!
• Need for Agility
  – Rapid application development and evolution
  – No “schema first”: developers resist solutions that require delays to sync up with data modelers or change windows
• NoSQL JSON data stores
  – JSON schema is simple and can be evolved rapidly without data modelers
  – Native to the application space (e.g. JavaScript)
  – Simple model for persisting Java and JavaScript objects

Background : What is JSON ?
• Simple format for data exchange
  – Self-describing, schema-less
  – Very simple (e.g. tag:value format)
  – Human readable
  – Based on JavaScript, initially targeted for web applications, … but …
• Use is rapidly spreading
  – The data interchange format for the Web
    • JavaScript is very popular in mobile and systems-of-engagement applications
  – Analytics
    • E.g. an organization stores a large quantity of web statistics as JSON documents, and wants to perform analytics

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
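To show how directly such a document maps to application objects, here is a short sketch using Python's standard `json` module on the sample document from the slide. The variable names are illustrative.

```python
import json

document = """
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
"""

# Parsing needs no schema: nested objects become dicts, arrays become lists.
person = json.loads(document)
home_numbers = [phone["number"] for phone in person["phoneNumber"]
                if phone["type"] == "home"]
print(person["address"]["city"], home_numbers)
```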

Click to edit Master title styleTypical JSON Open Source Datastore Attributes

• Optimized for high speed data ingest

• Relaxed/absent ACID properties
  • Logging is often turned off or done asynchronously to improve performance

• “Fire and forget” inserts
  • Applications include checking logic to verify that an update occurred

• No concept of commit or rollback; each JSON update is independent
  • Applications implement compensation logic to update multiple documents with ACID properties

• No document-level locking
  • Applications manage a “revision” tag to detect document update conflicts

• Data is sharded for scalability
  • Shards are replicated asynchronously for availability
  • Queries to replica nodes can sometimes return back-level data

• JSON documents are stored in “collections”
  • No “join” across collections; joins must be done “in application”

• Limited options for security, temporal, geo-spatial, ...
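To make the “revision” tag point concrete, here is a minimal sketch of the conflict detection an application has to do when the datastore offers no document-level locking. `TinyStore` and `ConflictError` are hypothetical names for an in-memory stand-in; no real datastore API is implied.

```python
class ConflictError(Exception):
    """Raised when a document changed since the caller last read it."""


class TinyStore:
    """Hypothetical in-memory document store with a per-document revision tag."""

    def __init__(self):
        self._docs = {}  # key -> (revision, document)

    def read(self, key):
        rev, doc = self._docs[key]
        return rev, dict(doc)  # copy so callers cannot mutate stored state

    def write(self, key, doc, expected_rev=None):
        current_rev = self._docs[key][0] if key in self._docs else None
        if current_rev != expected_rev:
            raise ConflictError(f"{key}: expected rev {expected_rev}, found {current_rev}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[key] = (new_rev, dict(doc))
        return new_rev


store = TinyStore()
store.write("cart:1", {"item": "camera", "qty": 1})

# Two writers read the same revision of the document.
rev_a, doc_a = store.read("cart:1")
rev_b, doc_b = store.read("cart:1")

doc_a["qty"] = 2
store.write("cart:1", doc_a, expected_rev=rev_a)  # first writer wins

doc_b["qty"] = 5
try:
    store.write("cart:1", doc_b, expected_rev=rev_b)  # stale revision
    conflict_detected = False
except ConflictError:
    conflict_detected = True  # the application must re-read and retry
```

The second writer's update is rejected rather than silently overwriting the first, which is the work a DBMS with locking and transactions would otherwise do for the application.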

The Best of Both Worlds - Agility with a Trusted Foundation

• What : Built-in JSON support in DB2 !
  • Including support for popular noSQL JSON APIs

• Why : Preserve mature DBMS features; leverage existing skills and tools
  • Multi-statement Transactions
  • ACID
  • Extreme scale, performance and high availability
  • Comprehensive Security
  • Management/Operations
  • …

DB2 JSON Datastore : Concept & Motivation


• Binary formatted JSON stored in the database (in LOBs)

• Btree indexing to JSON elements for fast query processing

• Java API and command line

• Optional “Fire-forget” inserts

• Supports transactions

• Smart query re-write

• DB2 ecosystem of tools

• Extend support to more applications and developers

[Diagram: Java apps and the JSON command shell use the JSON API over the JDBC driver (SQL via DRDA). Java, PHP and NodeJS applications use the MongoDB (BSON) wire protocol through the AIM-developed NoSQL JSON wire listener. DB2 stores each document as <binary JSON> in inlined LOBs, e.g. a “Shopping Cart” table with keys 1, 2, 3 and an index on the element “item” (“item”:”camera”).]

DB2 JSON Datastore : Features
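The storage idea on this slide (documents kept as binary values in a table, a B-tree index over a JSON element, and ordinary transactions around multi-document updates) can be imitated with Python's built-in `sqlite3` as a stand-in. This is only an analogy under stated assumptions: SQLite is not DB2, the documents are stored as UTF-8 JSON rather than BSON, and the table, column and function names are invented for the sketch.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# doc holds the serialized document; item is a copy of one JSON element,
# kept in its own column so an ordinary B-tree index can cover it.
conn.execute("CREATE TABLE carts (id INTEGER PRIMARY KEY, item TEXT, doc BLOB)")
conn.execute("CREATE INDEX carts_item ON carts (item)")


def put(cart_id, document):
    blob = json.dumps(document).encode("utf-8")  # UTF-8 JSON here, not BSON
    conn.execute("INSERT OR REPLACE INTO carts VALUES (?, ?, ?)",
                 (cart_id, document.get("item"), blob))


def find_by_item(item):
    rows = conn.execute("SELECT doc FROM carts WHERE item = ? ORDER BY id", (item,))
    return [json.loads(blob.decode("utf-8")) for (blob,) in rows]


# Multi-document update as one transaction: both writes commit or neither does.
with conn:
    put(1, {"item": "camera", "qty": 1})
    put(2, {"item": "lens", "qty": 2})

try:
    with conn:  # rolls back automatically when the block raises
        put(3, {"item": "camera", "qty": 4})
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

cameras = find_by_item("camera")
print(cameras)
```

The failed third insert leaves no trace, which is the behaviour the “Supports transactions” bullet contrasts with application-level compensation logic.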











• Active/Active DR with Geographically Dispersed pureScale Cluster (GDPC)
• Unique values:
  – Coherent active/active access regardless of site (no need for conflict detection/avoidance)
  – Synchronous (no transaction loss on failures)
  – DR site constantly being tested
• Previously validated only on Power

[Diagram: members M1 to M4 and the two CFs (CFP, CFS) split across Site A and Site B, 10s of km apart]

Active/Active Disaster Recovery with GDPC on Linux

[Chart: R/W ratio (portion of UPDATE, INSERT, or DELETE operations in the workload; axis marked 60% to 100%) plotted against site-to-site distance (10km to 70km). One region marks good candidate workloads / configurations for GDPC; the other marks good candidate workloads for other replication technologies.]

GDPC Sweet Spot

pureScale Cluster Extension without Downtime

• New members can be added to an instance while it is online
  – No impact to workloads running on existing members
  – New member configuration is copied from an existing “reference” member
    • Can be reconfigured later if needed
  – Workload can be immediately directed to the newly added member once it is started

• Other notes
  – New optional -mid option to indicate the member number to be added
  – New member can be added to an existing member host
  – Backup no longer needed after adding new members

[Diagram: a new member, with its own log, added online next to the existing members (each with its own log) and the two CFs]

Four Steps to Online Cluster Extension

1. Add the member to the instance
   • Invoked from an existing host in the instance
   • Initial new member configuration is derived from the invoking host (‘reference’ member)

2. (Optional) Reconfigure the new member as needed
   • The reference member is used both for member-specific database manager configuration (e.g. instance memory) and for member-specific database configuration parameters
   • If needed, update selected parameters on the new member

3. Start the new member
   • If using default workload balancing, clients will be automatically directed to the new member
   • Else see step 4

4. (Optional) Update member subset or affinity definitions as needed
   • Subsets: the new member is not a member of any subset until explicitly added (can be done online)
   • Affinity: you can add the new member to db2dsdriver.cfg, and dynamically reload it in the application via API

$ db2iupdt -add -m hostC -mnet hostC-ib0 db2sdin1

$ db2start member 12

$ db2 update database configuration member 5 …

Topology-Changing Restore

• Allow restore of M-member backup to N-member instance

• Allow restore from pureScale (pS) to non-pS and vice versa

• Backup image can be online if N is a superset of M

[Diagram: database “MYDB” on a pureScale instance (CF, three members, shared storage, logs) is backed up (BACKUP) to a backup image, which is then restored (RESTORE) to a pureScale instance with a different topology]

Example : Recovery after a Media Failure

[Timeline across Member 0, Member 1 and Member 2]
t0: Backup database (online)
t1: Backup tablespace TBSP0 (online)
t2: Add Member 2
t3: Backup tablespace TBSP1 (online)
t4: Media failure

1) Restore the full database backup image taken at t0
   $ db2 restore database sample from /mybackup

2) (Optional) Restore the backup of tablespace TBSP0 taken at t1
   $ db2 restore database sample tablespace(tbsp0) from /mybackup taken at t1

3) (Optional) Restore the backup of tablespace TBSP1 taken at t3
   $ db2 restore database sample tablespace(tbsp1) from /mybackup taken at t3

4) Rollforward the database to the end of logs; this replays the add member 2 event
   $ db2 rollforward database sample to end of logs and stop
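The ordering used in this example (newest full backup first, then any newer tablespace backups, then rollforward) can be sketched as a small planning function. This is an illustration only: `Backup` and `recovery_commands` are hypothetical names, the generated strings mirror the commands shown on the slide, and `t1`, `t3` stand in for real backup timestamps just as they do there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backup:
    time: int                  # position on the t0..t4 timeline
    tablespace: Optional[str]  # None means a full database backup


def recovery_commands(db: str, path: str, backups: List[Backup],
                      failure_time: int) -> List[str]:
    """Order the restores: newest full backup, newer tablespace backups, rollforward."""
    usable = [b for b in backups if b.time < failure_time]
    full = max((b for b in usable if b.tablespace is None), key=lambda b: b.time)
    cmds = [f"db2 restore database {db} from {path}"]
    # Keep only the newest backup per tablespace taken after the full backup.
    newest = {}
    for b in sorted(usable, key=lambda b: b.time):
        if b.tablespace is not None and b.time > full.time:
            newest[b.tablespace] = b
    for b in sorted(newest.values(), key=lambda b: b.time):
        cmds.append(f"db2 restore database {db} tablespace({b.tablespace}) "
                    f"from {path} taken at t{b.time}")
    # Rollforward replays everything logged after the restored images,
    # including the add-member event at t2.
    cmds.append(f"db2 rollforward database {db} to end of logs and stop")
    return cmds


plan = recovery_commands(
    "sample", "/mybackup",
    [Backup(0, None), Backup(1, "tbsp0"), Backup(3, "tbsp1")],
    failure_time=4,
)
for cmd in plan:
    print(cmd)
```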

• Flashcopy backup and restore is a largely manual process:

• Backup
  1. Identify LUN(s) associated with the database
  2. Identify free target LUN(s) for the copy
  3. Establish the flashcopy pair(s)
  4. Issue DB2 SUSPEND I/O command to tell DB2 to suspend write I/Os
  5. Issue storage commands necessary to do the actual flash copy
  6. Issue DB2 RESUME I/O command to return DB2 to normal

• Restore
  1. Restore/copy target LUN(s) containing backup of interest
  2. Issue DB2INIDB command to initialize the database for rollforward recovery
  3. Issue DB2 ROLLFORWARD command

• No history file entry

• Error prone

[Diagram: DB2 database on source LUNs, flash copied to target LUNs]

Review : Manual Flash Copy Backup
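One reason the manual procedure is error prone is that backup steps 4 to 6 must stay paired: if the storage copy fails after writes are suspended, the resume step still has to run. A minimal sketch of that ordering follows. The three callables are hypothetical stand-ins supplied by the caller; the actual DB2 and storage commands are deliberately not spelled out here.

```python
from typing import Callable, List


def flashcopy_backup(suspend_writes: Callable[[], None],
                     copy_luns: Callable[[], None],
                     resume_writes: Callable[[], None]) -> None:
    """Run the copy between suspend and resume; always resume, even if the copy fails."""
    suspend_writes()
    try:
        copy_luns()
    finally:
        resume_writes()


# Stand-ins that only record the call order; real ones would issue the
# DB2 and storage commands for backup steps 4 to 6.
log: List[str] = []


def failing_copy() -> None:
    log.append("copy")
    raise OSError("simulated storage error")


try:
    flashcopy_backup(lambda: log.append("suspend"), failing_copy,
                     lambda: log.append("resume"))
except OSError:
    log.append("backup failed")

print(log)
```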

• Flashcopy backup/restore just like any other DB2 backup

DB2 BACKUP DB sample USE SNAPSHOT

DB2 RESTORE DB sample USE SNAPSHOT

DB2 ROLLFORWARD …

[Diagram: DB2 database flash copied to target LUNs]

• History file record

• Simple !

• Wide (but not exhaustive) storage support

Review : Integrated Flash Copy Backup

• Flashcopy backup/restore just like any other DB2 backup

DB2 BACKUP DB sample USE SNAPSHOT SCRIPT '/myscript.sh'

DB2 RESTORE DB sample USE SNAPSHOT SCRIPT '/myscript.sh' TAKEN AT <timestamp>

DB2 ROLLFORWARD …

[Diagram: DB2 database flash copied to target LUNs]

• Simple to use !

• Wider storage support enabled

Scripted Interface for Flash Copy Backup

Want to write your own Script ?

Script must support these ‘actions’:

Operation            Script actions
SNAPSHOT (BACKUP)    prepare, snapshot, verify, storemetadata, rollback
RESTORE              prepare, restore
DELETE               prepare, delete
QUERY                prepare

[Diagram: Snapshot (backup) example flow between the DBA, DB2 and the script]

Example script in samples/BARVendor/libacssc.sh
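The shape of such a script is essentially a dispatcher over the action names listed on this slide. The sketch below shows that shape in Python purely for illustration. A real script is a shell script invoked by DB2 with its own argument convention, which is not reproduced here (see the sample script for that); the handler bodies, the return-code convention and the helper names are all assumptions.

```python
from typing import Callable, Dict, List

# Actions per operation, as listed on the slide.
REQUIRED_ACTIONS: Dict[str, List[str]] = {
    "snapshot": ["prepare", "snapshot", "verify", "storemetadata", "rollback"],
    "restore": ["prepare", "restore"],
    "delete": ["prepare", "delete"],
    "query": ["prepare"],
}

calls: List[str] = []


def make_handler(name: str) -> Callable[[], int]:
    def handler() -> int:
        calls.append(name)  # a real handler would drive the storage device here
        return 0            # assumed convention: 0 means success
    return handler


HANDLERS: Dict[str, Callable[[], int]] = {
    action: make_handler(action)
    for actions in REQUIRED_ACTIONS.values() for action in actions
}


def dispatch(action: str) -> int:
    """Run the handler for one action; unknown actions are reported as failures."""
    handler = HANDLERS.get(action)
    if handler is None:
        return 1
    return handler()


def missing_actions(supported) -> List[str]:
    """List the actions a script would still need to implement."""
    needed = {a for actions in REQUIRED_ACTIONS.values() for a in actions}
    return sorted(needed - set(supported))


# A successful snapshot; rollback is assumed to run only when a step fails.
for step in ["prepare", "snapshot", "verify", "storemetadata"]:
    assert dispatch(step) == 0

print(calls)
print(missing_actions(["prepare", "snapshot"]))
```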

• Online inplace reorg support on a table using adaptive compression

• Online inplace reorg support in pureScale

• Fastpath option for online inplace reorg to clean up overflow records only

• Reorg with RECLAIM EXTENTS can clean up partially empty extents for Insert Time Clustered tables

$ db2 reorg table T1 inplace cleanup overflows

REORG Enhancements

DB2 10 Review : Insert-Time-Clustered Tables

– Rows clustered by ‘insert time’
– Very predominant pattern: rows inserted together are often deleted together
– Results in many extents naturally becoming free after deletions
– Invoke extent reclamation explicitly (or rely on the Automatic Table Maintenance daemon), e.g.:
  1) INSERTS …
  2) DELETE WHERE …
  3) REORG … RECLAIM EXTENTS

• Extents quickly returned to tablespace
• Available for other tables, indexes

[Diagram: table extents laid out along an insert-time axis (8am, 9am, 10am, 11am, 12pm) with extent boundaries marked]

• New extended ITC RECLAIM
  – Lightweight task to move rows out of ‘almost empty’ extents
  – More complete space reclamation
  – Still very fast

REORG … RECLAIM EXTENTS Reclaims More !

Embedded empty extents can be easily reclaimed from the table/index

[Diagram: table extents with extent boundaries marked]

DB2 REORG TABLE T1 .. RECLAIM EXTENTS ALLOW WRITE ACCESS
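A toy model helps show why moving rows out of ‘almost empty’ extents reclaims more space than freeing only the extents that are already empty. The sketch below is an in-memory simulation under simplifying assumptions (four rows per extent, a fixed ‘almost empty’ threshold of one row, invented function names); it does not describe how DB2 implements the operation.

```python
EXTENT_SIZE = 4  # rows per extent; tiny on purpose


def load(rows):
    """Place rows into extents in arrival order, as insert-time clustering would."""
    return [rows[i:i + EXTENT_SIZE] for i in range(0, len(rows), EXTENT_SIZE)]


def delete_where(extents, predicate):
    """Delete matching rows in place; extents keep their position even when empty."""
    return [[r for r in ext if not predicate(r)] for ext in extents]


def reclaim(extents):
    """Basic reclaim: return only extents that are already completely empty."""
    kept = [ext for ext in extents if ext]
    return kept, len(extents) - len(kept)


def extended_reclaim(extents, almost_empty=1):
    """Extended reclaim: first move rows out of almost-empty extents, then reclaim."""
    extents = [list(ext) for ext in extents]
    for src in extents:
        if 0 < len(src) <= almost_empty:
            for dst in extents:
                # Only well-populated extents with spare room are targets.
                if dst is src or len(dst) <= almost_empty:
                    continue
                while src and len(dst) < EXTENT_SIZE:
                    dst.append(src.pop())
    return reclaim(extents)


# Rows are (hour, n): four rows each for 8am to 11am, two rows so far for 12pm.
rows = [(h, n) for h in (8, 9, 10, 11) for n in range(4)] + [(12, 0), (12, 1)]
table = load(rows)

# Delete everything before 11am except one straggler row from 9am.
table = delete_where(table, lambda r: r[0] < 11 and r != (9, 0))

_, freed_basic = reclaim(table)
kept_extended, freed_extended = extended_reclaim(table)
print(freed_basic, freed_extended)
```

In this run the single straggler row keeps one extent allocated under the basic reclaim, while the extended reclaim relocates it and frees that extent as well.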

Online Rolling Updates

• DBAs can apply DB2 maintenance without an outage window

• Procedure:
  1. Drain (aka Quiesce)
  2. Remove & Maintain
  3. Re-integrate
  4. Repeat until done

[Diagram: single database view over member1 and member2, each moving from code level GA to code level FP1]

db2stop member1 quiesce
db2iupdt member1
db2start member1

db2stop member2 quiesce
db2iupdt member2
db2start member2

COMMIT

FP1 committed. New function available. Cannot roll down to GA anymore

Rolling Updates Concepts

Transparent ZERO database downtime

[Diagram: member1 and member2 each moving from code level GA to code level FP1: installFixPack then db2start on member1, installFixPack then db2start on member2, then installFixPack -commit_level]

$ installFixPack -p <install_path> -l <install_log> -I <InstName> -commit_level

• installFixPack command enhanced to simplify an online update
  – One invocation per host drives all of the steps needed for all members and CFs on the host
  – Execute once per host in your instance, and one additional time to commit

Rolling Updates : More Detail

• New informational configuration parameter:
  – Current Effective Code Level (CECL): denotes the committed code level in the cluster, and therefore the level of function available in the cluster

[Diagram: CECL = 10.5 GA until installFixPack -commit_level is run; CECL = 10.5 FP1 afterwards]
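The relationship between per-member code levels and the cluster-wide CECL can be sketched as a small state model: members are updated one at a time while the others keep serving, and the commit step is what raises the CECL. `Cluster` and its methods are hypothetical names for this illustration, and the rule that a commit requires every member to be at the new level is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    members: Dict[str, str]             # member name -> installed code level
    cecl: str = "10.5 GA"               # current effective (committed) code level
    online: Dict[str, bool] = field(init=False)

    def __post_init__(self) -> None:
        self.online = {name: True for name in self.members}

    def update_member(self, name: str, level: str) -> None:
        """Drain, maintain and re-integrate one member; the others stay online."""
        self.online[name] = False        # drain (quiesce) and stop
        if not any(self.online.values()):
            self.online[name] = True
            raise RuntimeError("at least one member must keep serving")
        self.members[name] = level       # install the new code level
        self.online[name] = True         # restart and re-integrate

    def commit(self, level: str) -> None:
        """Raise the CECL; assumed to require every member at the new level."""
        if any(lvl != level for lvl in self.members.values()):
            raise RuntimeError("cannot commit: some members are on an older level")
        self.cecl = level


cluster = Cluster({"member1": "10.5 GA", "member2": "10.5 GA"})
cluster.update_member("member1", "10.5 FP1")

try:
    cluster.commit("10.5 FP1")           # too early: member2 is still on GA
    early_commit_allowed = True
except RuntimeError:
    early_commit_allowed = False

cecl_before_commit = cluster.cecl        # still GA while levels are mixed
cluster.update_member("member2", "10.5 FP1")
cluster.commit("10.5 FP1")
print(cecl_before_commit, "->", cluster.cecl)
```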

pureScale HADR

• Simple DR solution for pureScale

• Built-in resiliency
  • Tolerant of member failures on primary and standby
  • Another member takes over sending/receiving log data
  • Can access a failed member's logs

• Simple configuration
  • No need to specify all addresses of the ‘other side’ (an automatic discovery protocol does that)

• Eliminate back pressure on primary via log spooling on standby

• Initial support includes
  • Async and super-async

[Diagram: primary cluster and standby DR cluster, each with two CFs]

pureScale HADR : Attributes

• Single system view
  – START / STOP / ACTIVATE / DEACTIVATE / TAKEOVER commands only need to be issued once, not once per member

• One member on standby is designated the ‘replay’ member
  – All primary members send log to parallel threads on the replay member on standby
  – The replay member is highly available: if the current replay member fails, DB2 will automatically run replay on another member

• Assisted Remote Catchup (ARCU)
  – If one primary member is not available, standby can obtain its logs via another primary member that is available

• Standby requirement
  – Must also be running pureScale with the same number of members (they can be logical members)

[Diagram: primary site and standby site, each with three members and two CFs, connected by links. Transactions run on the primary members, each with its own logs (Logs 1, Logs 2, Logs 3). Member 3 sends member 1’s logs to the replay member on the standby.]

pureScale HADR Built-in Resiliency

DB2 10.5 Editions – New Simple Packages!

For Departmental Use (Limited Capacity)

Advanced Workgroup Server Edition (Fully Functional)
• Fully functional offering for small OLTP and analytic deployments
• Primarily used in department environments within large enterprises. Also available for SMB/MM deployments
• Limited by TB, memory, sockets and/or cores
• Includes Tools, Compression, BLU, pureScale and DPF

Workgroup Server Edition (Base Function Only)
• Capabilities for an entry level offering
• Targeting single server requirements with less intense workloads in both the OLTP and analytic space
• Limited by TB, memory, sockets and/or cores
• Does not include Tools, Compression, BLU, pureScale and DPF

For Enterprise Use (Unlimited Capacity)

Advanced Enterprise Server Edition (Fully Functional)
• Fully functional offering for Enterprise Class OLTP and/or analytic deployments
• Targeting full enterprise / full data center requirements
• No TB, memory, socket or core limit
• Includes Tools, Compression, BLU, pureScale and DPF

Enterprise Server Edition (Base Function Only)
• Capabilities for an entry level offering
• Targeting single server, enterprise requirements with more intense workloads in both the OLTP and analytic space
• No TB, memory, socket or core limit
• Does not include Tools, Compression, BLU, pureScale and DPF

Express Edition

Express-C and Developer Edition

CEO Enterprise and Advanced

• Some of the Highlights
  – Optim Performance Manager (OPM)
    • New metrics for columnar query processing and other 10.5 capabilities
  – Optim Query Workload Tuner (OQWT)
    • New Table Organization (i.e. BLU) Advisor
  – Data Studio
    • BLU support
    • HADR Multiple Standbys
    • pureScale support enhancements

Comprehensive 10.5 Tooling Support

• Speed of Thought Analytics - with new BLU Acceleration
  • 8-25x faster reporting and analytics [1]; more than 1000x seen in some lab test queries [2]
  • 10x storage space savings seen during beta test [3]

• Always Available Transactions - with enhanced pureScale
  • Online rolling maintenance updates with no planned downtime [4]
  • Designed for disaster recovery over distances of 1000s of km [5]

• Unprecedented Affordability
  • In-memory speed and simplicity on existing infrastructure
  • Optimized for SAP workloads for faster performance and to help dramatically reduce costs
  • Upgrade to DB2 with average 98% Oracle Database application compatibility [7]

• Future-Proof Versatility
  • Optimized capabilities for both OLTP and data warehousing
  • Business grade NoSQL and mobile database for greater application flexibility

[1] Based on internal IBM testing of sample analytic workloads comparing queries accessing row-based tables on DB2 10.1 vs. columnar tables on DB2 10.5. Performance improvement figures are cumulative of all queries in the workload. Individual results will vary depending on individual workloads, configurations and conditions.

[2] Based on internal IBM tests of pure analytic workloads comparing queries accessing row-based tables on DB2 10.1 vs. columnar tables on DB2 10.5. Results not typical. Individual results will vary depending on individual workloads, configurations and conditions, including size and content of the table, and number of elements being queried from a given table.

[3] Client-reported testing results in DB2 10.5 early release program. Individual results will vary depending on individual workloads, configurations and conditions, including table size and content.

[4] Based on IBM design for normal operation with rolling maintenance updates of DB2 server software on a pureScale cluster. Individual results will vary depending on individual workloads, configurations and conditions, network availability and bandwidth.

[5] Based on IBM design for normal operation under typical workload. Individual results will vary depending on individual workloads, configurations and conditions, network availability and bandwidth.

[6] Available with DB2 Advanced Enterprise Server Edition.

[7] Based on internal tests and reported client experience from 28 Sep 2011 to 07 Mar 2012.

Summary & Questions

DB2 10.5 Themes
