PracticalPartitioning v2

Managing Large Data

Partitioning
- Partitioning Overview
- Indexing
- Managing statistics
- Compression
- Purging
- Backing up

Partitioning: Facts
- Divide and Conquer
- Many Types
  - Range
  - List
  - Hash
  - Interval (11g)
  - Reference (11g)
  - Composite
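As a rough illustration of two of the types listed above, here is a minimal sketch of a list-partitioned and an interval-partitioned table. The table, column, and partition names are hypothetical and do not come from the original deck.

```sql
-- Hypothetical tables, for illustration only.

-- List partitioning: rows are placed by discrete values of REGION.
CREATE TABLE customers_by_region
( cust_id  NUMBER,
  region   VARCHAR2(10)
)
PARTITION BY LIST (region)
( PARTITION p_east  VALUES ('EAST'),
  PARTITION p_west  VALUES ('WEST'),
  PARTITION p_other VALUES (DEFAULT)
);

-- Interval partitioning (11g): a range-partitioned table for which Oracle
-- creates a new monthly partition the first time a row needs one.
CREATE TABLE sales_by_month
( sale_id    NUMBER,
  sale_date  DATE
)
PARTITION BY RANGE (sale_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p_first VALUES LESS THAN (TO_DATE('01-01-2010', 'DD-MM-YYYY'))
);
```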

Partitioning: Fifteen Years of Development

Partitioning: Facts
- Is not mostly about performance
  - Especially with OLTP
  - With OLTP you must be careful not to impede performance!
- Is mostly about administration
- Is an extra-cost option to Enterprise Edition

Partitioning
- Availability
- Performance
- Administration

Partitioning: Facts
- Increases availability of data
  - Each partition is independent
  - Thanks to partition elimination, some users may never even notice that some data was unavailable
  - Both downtime and time to recover are reduced (smaller sets of data to recover)
- Example: part1.sql

Part1.sql

    ops$tkyte%ORA11GR2> CREATE TABLE emp
      ( empno   int,
        ename   varchar2(20)
      )
      PARTITION BY HASH (empno)
      ( partition part_1 tablespace p1,
        partition part_2 tablespace p2
      )
      /

    Table created.

    ops$tkyte%ORA11GR2> insert into emp select empno, ename from scott.emp
      /

    14 rows created.

Part1.sql

    ops$tkyte%ORA11GR2> select part1, part2
      from (
      select empno || ', ' || ename part1, row_number() over (order by empno) rn1
        from emp partition(part_1)
      ) A FULL OUTER JOIN (
      select empno || ', ' || ename part2, row_number() over (order by empno) rn2
        from emp partition(part_2)
      ) B on ( a.rn1 = b.rn2 )
      /

    PART1           PART2
    --------------- ---------------
    7369, SMITH     7521, WARD
    7499, ALLEN     7566, JONES
    7654, MARTIN    7788, SCOTT
    7698, BLAKE     7844, TURNER
    7782, CLARK     7900, JAMES
    7839, KING      7902, FORD
    7876, ADAMS
    7934, MILLER

    8 rows selected.

Part1.sql

    ops$tkyte%ORA11GR2> alter tablespace p1 offline;

    Tablespace altered.

    ops$tkyte%ORA11GR2> select * from emp;
    select * from emp
                  *
    ERROR at line 1:
    ORA-00376: file 3 cannot be read at this time
    ORA-01110: data file 3:
    '/home/ora11gr2/app/ora11gr2/oradata/ora11gr2/ORA11GR2/datafile/o1_mf_p1_6rprfrmo_.dbf'

Part1.sql

    ops$tkyte%ORA11GR2> variable n number
    ops$tkyte%ORA11GR2> exec :n := 7844;

    PL/SQL procedure successfully completed.

    ops$tkyte%ORA11GR2> select * from emp where empno = :n;

         EMPNO ENAME
    ---------- --------------------
          7844 TURNER
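The query by EMPNO succeeds because only one hash partition has to be visited. A minimal sketch of how one might confirm that from the plan, once tablespace P1 is back online, is shown below. It assumes the EMP table and bind variable :n from the demo above; the exact plan text depends on the release, but a PARTITION HASH SINGLE step would be the expected sign of partition elimination.

```sql
-- Bring the offlined tablespace back once the availability demo is done.
alter tablespace p1 online;

-- Ask the optimizer how it would run the single-key lookup.
explain plan for
select * from emp where empno = :n;

-- Display the plan, including the Pstart/Pstop partition columns.
select * from table(dbms_xplan.display);
```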

Partitioning: Facts
- Reduced administrative burden
  - Performing operations on small objects is
    - Easier
    - Faster (each individual operation is; total time might increase)
    - Less resource intensive

Partitioning

    SQL> create table big_table1
      ( ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID, DATA_OBJECT_ID,
        OBJECT_TYPE, CREATED, LAST_DDL_TIME,
        TIMESTAMP, STATUS, TEMPORARY,
        GENERATED, SECONDARY )
      tablespace big1
      as
      select ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME,
             OBJECT_ID, DATA_OBJECT_ID,
             OBJECT_TYPE, CREATED, LAST_DDL_TIME,
             TIMESTAMP, STATUS, TEMPORARY,
             GENERATED, SECONDARY
        from big_table.big_table;
    Table created. (10,000,000 rows)

Partitioning

    SQL> create table big_table2
      ( ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID, DATA_OBJECT_ID,
        OBJECT_TYPE, CREATED, LAST_DDL_TIME,
        TIMESTAMP, STATUS, TEMPORARY,
        GENERATED, SECONDARY )
      partition by hash(id)
      (partition part_1 tablespace big2,
       partition part_2 tablespace big2,
       partition part_3 tablespace big2,
       partition part_4 tablespace big2,
       partition part_5 tablespace big2,
       partition part_6 tablespace big2,
       partition part_7 tablespace big2,
       partition part_8 tablespace big2
      )
      as
      select ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME,
             OBJECT_ID, DATA_OBJECT_ID,
             OBJECT_TYPE, CREATED, LAST_DDL_TIME,
             TIMESTAMP, STATUS, TEMPORARY,
             GENERATED, SECONDARY
        from big_table.big_table;
    Table created.

Partitioning

    SQL> select b.tablespace_name,
             mbytes_alloc,
             mbytes_free
        from ( select round(sum(bytes)/1024/1024) mbytes_free,
                      tablespace_name
                 from dba_free_space
                group by tablespace_name ) a,
             ( select round(sum(bytes)/1024/1024) mbytes_alloc,
                      tablespace_name
                 from dba_data_files
                group by tablespace_name ) b
       where a.tablespace_name (+) = b.tablespace_name
         and b.tablespace_name in ('BIG1','BIG2')
      /

    TABLESPACE MBYTES_ALLOC MBYTES_FREE
    ---------- ------------ -----------
    BIG1               1496         344
    BIG2               1496         344

Partitioning

    SQL> alter table big_table1 move;
    alter table big_table1 move
                *
    ERROR at line 1:
    ORA-01652: unable to extend temp segment by 1024 in tablespace BIG1

Moving this table requires a lot of free space (a resource): during the move, two copies of the table must exist.

Partitioning

    SQL> alter table big_table2 move;
    alter table big_table2 move
                *
    ERROR at line 1:
    ORA-14511: cannot perform operation on a partitioned object

We cannot move the partitioned table as a whole, but we can move each small partition, one by one:

    SQL> alter table big_table2 move partition part_1;
    Table altered.
    SQL> alter table big_table2 move partition part_2;
    Table altered.
    SQL> alter table big_table2 move partition part_3;
    Table altered.
    SQL> alter table big_table2 move partition part_4;
    Table altered.
    SQL> alter table big_table2 move partition part_5;
    Table altered.
    SQL> alter table big_table2 move partition part_6;
    Table altered.
    SQL> alter table big_table2 move partition part_7;
    Table altered.
    SQL> alter table big_table2 move partition part_8;
    Table altered.

Partitioning

Of course, we would likely automate this process:

    SQL> begin
          for x in ( select partition_name
                       from user_tab_partitions
                      where table_name = 'BIG_TABLE2' )
          loop
              execute immediate
              'alter table big_table2 move partition ' ||
               x.partition_name;
          end loop;
      end;
      /
    PL/SQL procedure successfully completed.

Partitioning
- Took less free space
- If something failed, we lost only 1/8th of the work (8 partitions)
- You would need less UNDO space at any single point in time
- You can spread the work out over many days
  - 8 hours to rebuild the entire table
  - 2 hours to rebuild a partition
  - Take 1 week to rebuild the table, a partition at a time

Partitioning: Enhanced Statement Performance
- Read query performance
  - Partition elimination is important
  - Mostly a warehouse/reporting event
  - In OLTP, partitioning rarely improves read query performance
    - You must be careful not to negatively impact it (more on that in Indexing)
  - Occasionally, it can increase read performance due to clustering
    - List partition by region, application queries by region: all data on a given block is for that region

Partitioning: Enhanced Statement Performance
- Write query performance
  - Reduced contention
    - Instead of 1 index with 1 hot block, you have N indexes with 1 hot block each
    - Instead of one set of freelists (be they ASSM or MSSM), you have N
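To make the "N indexes with 1 hot block each" idea concrete, here is a minimal sketch of a hash-partitioned global index on a key that grows monotonically. The ORDERS table and the index name are hypothetical, and the choice of 8 partitions is arbitrary.

```sql
-- Hypothetical table whose key is populated from a sequence.
CREATE TABLE orders
( order_id    NUMBER,
  order_date  DATE
);

-- Eight index partitions: concurrent inserts of increasing ORDER_ID values
-- are spread over eight right-hand edges instead of contending for one.
CREATE INDEX orders_id_idx ON orders (order_id)
GLOBAL PARTITION BY HASH (order_id) PARTITIONS 8;
```

The trade-off is that a range scan on ORDER_ID now has to probe every index partition, so this tends to suit keys that are looked up by equality.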

Partitioning: Schemes
- Range & Interval
- Hash
- List
- Reference
- Virtual Column
- Composite

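Partitioning: Composite Partitioning

Release in which each combination became available (rows: partitioning method; columns: subpartitioning method):

              Range    List     Hash
    Range     11gR1    9i       8i
    List      11gR1    11gR1    11gR1
    Hash      11gR2    11gR2    11gR2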
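As an illustration of the oldest combination, range partitioning with hash subpartitioning, a minimal sketch might look like this. The table, column, and partition names are hypothetical.

```sql
-- Hypothetical composite range-hash table: one range partition per year,
-- each split into four hash subpartitions on CUSTOMER_ID.
CREATE TABLE orders_composite
( order_id     NUMBER,
  order_date   DATE,
  customer_id  NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 4
( PARTITION p2010 VALUES LESS THAN (TO_DATE('01-01-2011', 'DD-MM-YYYY')),
  PARTITION p2011 VALUES LESS THAN (TO_DATE('01-01-2012', 'DD-MM-YYYY'))
);
```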

Indexing

Local and Global Indexes

LOCAL INDEX

Equipartition the index with the table: For every table partition, there will be an index partition that indexes just that table partition. All of the entries in a given index partition point to a single table partition, and all of the rows in a single table partition are represented in a single index partition.

GLOBAL INDEX

Partition the index by range or hash: Here the index is partitioned by range, or optionally in Oracle 10g and above by hash, and a single index partition may point to any (and all) table partitions.

Which One to Use?
- Local indexes are the first choice, if they make sense
  - The partition key almost certainly must be referenced in the predicate
  - Otherwise you will scan ALL index partitions
- Most prevalent in warehouse systems
- Less so in OLTP, to a degree

Which One to Use?
- Global indexes are the second choice
  - They affect the speed and resources used by partition operations; they can, however, be maintained during those operations, so indexes never have to become unusable
  - Necessary for uniqueness when the partition key is not among the indexed attributes
  - Necessary for runtime query performance when the table partition key is not part of the WHERE clause

Local Indexes
- Two types are defined
  - Local prefixed: the partition key is on the leading edge of the index
  - Local nonprefixed: the partition key is NOT on the leading edge
- Both can use partition elimination
- Both can support uniqueness
- There is nothing inherently better about prefixed versus nonprefixed

Local Indexes

    ops$tkyte%ORA11GR2> CREATE TABLE partitioned_table
      ( a int,
        b int,
        data char(20)
      )
      PARTITION BY RANGE (a)
      (
      PARTITION part_1 VALUES LESS THAN(2) tablespace p1,
      PARTITION part_2 VALUES LESS THAN(3) tablespace p2
      )
      /

    Table created.

Local Indexes

    ops$tkyte%ORA11GR2> create index local_prefixed on partitioned_table (a,b) local;
    Index created.

    ops$tkyte%ORA11GR2> set autotrace traceonly explain
    ops$tkyte%ORA11GR2> select * from partitioned_table where a=1 and b=2;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1622054381

    ---------------------------------------------------------------------------------
    | Id  | Operation                          | Name              | Pstart| Pstop |
    ---------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                   |                   |       |       |
    |   1 |  PARTITION RANGE SINGLE            |                   |     1 |     1 |
    |   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| PARTITIONED_TABLE |     1 |     1 |
    |*  3 |    INDEX RANGE SCAN                | LOCAL_PREFIXED    |     1 |     1 |
    ---------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       3 - access("A"=1 AND "B"=2)

    Note
    -----
       - dynamic sampling used for this statement (level=2)

Local Indexes

    ops$tkyte%ORA11GR2> drop index local_prefixed;
    Index dropped.

    ops$tkyte%ORA11GR2> create index local_nonprefixed on partitioned_table (b) local;
    Index created.

    ops$tkyte%ORA11GR2> select * from partitioned_table where a=1 and b=2;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 904532382

    ---------------------------------------------------------------------------------
    | Id  | Operation                          | Name              | Pstart| Pstop |
    ---------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                   |                   |       |       |
    |   1 |  PARTITION RANGE SINGLE            |                   |     1 |     1 |
    |*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID| PARTITIONED_TABLE |     1 |     1 |
    |*  3 |    INDEX RANGE SCAN                | LOCAL_NONPREFIXED |     1 |     1 |
    ---------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - filter("A"=1)
       3 - access("B"=2)
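Both plans above eliminate partitions only because the predicate mentions the partition key A. As a contrast, here is a sketch of the same query without that predicate, assuming the PARTITIONED_TABLE and LOCAL_NONPREFIXED index from the demo and autotrace still set to traceonly explain. One would expect the plan to show a PARTITION RANGE ALL step covering every partition, although the exact plan depends on statistics and release.

```sql
-- No predicate on the partition key A, so no partition can be ruled out:
-- every local index partition has to be probed for B = 2.
select * from partitioned_table where b = 2;
```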

Local Indexes - Uniqueness
- Local indexes can be used to enforce UNIQUE/PRIMARY KEY constraints
  - But the partition key must be included in the constraint itself
  - Uniqueness is enforced within an index partition, never across partitions
- Thus you cannot range partition a table by a date field and have a local primary key index on ID
  - You have to use a global index for that
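A minimal sketch of this rule, reusing the PARTITIONED_TABLE (range partitioned on A) from the earlier demo. The index names are hypothetical. The first statement is expected to be rejected, to the best of my knowledge with ORA-14039 (partitioning columns must form a subset of key columns of a UNIQUE index); the other two show the usual alternatives.

```sql
-- Expected to fail: the partition key A is not part of the unique key.
create unique index pt_b_uk on partitioned_table (b) local;

-- Alternative 1: include the partition key, so uniqueness can be
-- checked inside a single index partition.
create unique index pt_ab_uk on partitioned_table (a, b) local;

-- Alternative 2: a global (here nonpartitioned) unique index on B alone.
create unique index pt_b_global_uk on partitioned_table (b);
```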

Global Indexes
- Partitioned using a scheme different from the table's
  - Table might have 10 range partitions by date
  - Index might have 5 range partitions by region
- There are only global prefixed indexes; there is no such thing as a nonprefixed global index
  - The partition key for a global index is always on the leading edge of the index
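For illustration, a global range-partitioned index on the earlier PARTITIONED_TABLE could be sketched as follows. The index name, partition names, and boundary value are hypothetical. Note that the index is partitioned by B while the table is partitioned by A, and that the index partition key B is the leading column.

```sql
-- Global index partitioned by its own leading column B, independently
-- of the table's partitioning on A. The highest partition of a global
-- range-partitioned index must be bounded by MAXVALUE.
create index partitioned_b_global
on partitioned_table (b)
global partition by range (b)
( partition idx_low  values less than (1000),
  partition idx_high values less than (maxvalue)
);
```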

Index only what you need
- New in 11gR2
- You can index only part of a table
  - Maybe just the most current data needs an index
  - Older data would be full scanned
- Query plans can be generated that take this into consideration

Index only what you need

    ops$tkyte%ORA11GR2> CREATE TABLE t
      (
        dt  date,
        x   int,
        y   varchar2(30)
      )
      PARTITION BY RANGE (dt)
      (
        PARTITION part1 VALUES LESS THAN (to_date('01-jan-2010','dd-mon-yyyy')) ,
        PARTITION part2 VALUES LESS THAN (to_date('01-jan-2011','dd-mon-yyyy')) ,
        PARTITION junk VALUES LESS THAN (MAXVALUE)
      )
      /
    Table created.

    ops$tkyte%ORA11GR2> insert into t
      select to_date('01-jun-2010','dd-mon-yyyy'), rownum, object_name
        from all_objects;
    71923 rows created.

    ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats(user,'T');

Index only what you need

    ops$tkyte%ORA11GR2> create index t_idx on t(x) local unusable;

    Index created.

    ops$tkyte%ORA11GR2> alter index t_idx rebuild partition part2;

    Index altered.

Index only what you need

    ops$tkyte%ORA11GR2> set autotrace traceonly explain
    ops$tkyte%ORA11GR2> select * from t where x = 42;

    ----------------------------------------------------------------
    | Id  | Operation                             | Pstart| Pstop  |
    ----------------------------------------------------------------
    |   0 | SELECT STATEMENT                      |       |        |
    |   1 |  VIEW                                 |       |        |
    |   2 |   UNION-ALL                           |       |        |
    |   3 |    PARTITION RANGE SINGLE             |     2 |     2  |
    |   4 |     TABLE ACCESS BY LOCAL INDEX ROWID |     2 |     2  |
    |*  5 |      INDEX RANGE SCAN                 |     2 |     2  |
    |   6 |    PARTITION RANGE OR                 |KEY(OR)|KEY(OR) |
    |*  7 |     TABLE ACCESS FULL                 |KEY(OR)|KEY(OR) |
    ----------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       5 - access("X"=42)
       7 - filter("X"=42 AND ("T"."DT"=TO_DATE(' 2011-01-01 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss') OR "T"."DT" IS NULL))
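To see which parts of the table are actually indexed, one can look at the status of each index partition. This is a small sketch against the T_IDX index created above; in this demo only PART2 was rebuilt, so it should be the only partition reported as USABLE.

```sql
-- One row per local index partition; UNUSABLE partitions are skipped by
-- the optimizer and the matching table partitions are full scanned.
select partition_name, status
  from user_ind_partitions
 where index_name = 'T_IDX'
 order by partition_position;
```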

Statistics

Partitioning: Discuss Statistics
- Local
- Global

Gathering Statistics: Strategy for New Databases
- Create tables
- Optionally run (or explain) queries on empty tables
  - Prime / seed the optimizer
- Enable incremental statistics
  - For large partitioned tables
- Load data
- Gather statistics
  - Use the defaults
- Create indexes (if required!)

Gathering Statistics: Incremental Statistics
- One of the biggest problems with large tables is keeping the schema statistics up to date and accurate
- This is particularly challenging in a data warehouse, where tables continue to grow and so the statistics gathering time and resources grow proportionately
- To address this problem, 11.1 introduced the concept of incremental statistics for partitioned objects
- This means that statistics are gathered only for recently modified partitions
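A minimal sketch of switching a table to incremental statistics with DBMS_STATS, in SQL*Plus. The table name SALES is hypothetical; the gather call deliberately relies on the default sampling and granularity settings, which incremental maintenance is generally understood to require.

```sql
-- Turn on incremental (synopsis-based) global statistics for one table.
exec dbms_stats.set_table_prefs(user, 'SALES', 'INCREMENTAL', 'TRUE');

-- Gather with the defaults; later gathers only revisit changed partitions.
exec dbms_stats.gather_table_stats(user, 'SALES');

-- Confirm the preference that is now in effect.
select dbms_stats.get_prefs('INCREMENTAL', user, 'SALES') as incremental
  from dual;
```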

Gathering Statistics: The Concept of Synopses
- It is not possible to simply add partition statistics together to create an up-to-date set of global statistics
- This is because the number of distinct values (NDV) for a partition may include values common to multiple partitions
- To resolve this problem, compressed representations of the distinct values of each column are created in a structure in the SYSAUX tablespace known as a synopsis

Gathering Statistics: Synopsis Example

    Object             Column Values    NDV
    Partition #1       1,1,3,4,5        4
    Partition #2       1,2,3,4,5        5
    NDV by addition    WRONG            9
    NDV by synopsis    CORRECT          5
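The arithmetic in the table can be reproduced with a plain query. This sketch only demonstrates why adding per-partition NDVs overcounts; it says nothing about how synopses are stored or merged internally. The two value lists are the ones from the example, held in SYS.ODCINUMBERLIST collections for convenience.

```sql
with p1 as ( select column_value v from table(sys.odcinumberlist(1,1,3,4,5)) ),
     p2 as ( select column_value v from table(sys.odcinumberlist(1,2,3,4,5)) )
select (select count(distinct v) from p1)                 as ndv_p1,
       (select count(distinct v) from p2)                 as ndv_p2,
       -- naive approach: values shared by both partitions are counted twice
       (select count(distinct v) from p1)
     + (select count(distinct v) from p2)                 as ndv_by_addition,
       -- correct global NDV: distinct values over the combined data
       (select count(distinct v)
          from ( select v from p1 union all select v from p2 )) as ndv_global
  from dual;
```

This returns 4, 5, 9 and 5, matching the four rows of the example.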

Compression

Direct Path Table Compression
- Introduced in Oracle9i Release 2
  - Supports compression during bulk load operations (Direct Load, CTAS, ALTER TABLE ... MOVE, INSERT /*+ APPEND */)
  - Data modified using conventional DML is not compressed
- Optimized compression algorithm for relational data
- Improved performance for queries accessing large amounts of data
  - Fewer IOs
  - Buffer cache efficiency

Direct Path Table Compression
- Data is compressed at the database block level
  - Each block contains its own compression metadata, which improves IO efficiency
  - Local symbol table dynamically adapts to data changes
- Compression can be specified at either the table or the partition level
- Completely transparent to applications
- Noticeable impact on write performance

OLTP Table Compression
- Oracle Database 11g extends compression to OLTP data
  - Support for conventional DML operations (INSERT, UPDATE, DELETE)
- New algorithm significantly reduces write overhead
  - Batched compression ensures no impact for most OLTP transactions
- No impact on reads
  - Reads may actually see improved performance due to fewer IOs and enhanced memory efficiency

OLTP Table Compression
- Life cycle of a block:
  1. Inserts are uncompressed
  2. Block usage reaches PCTFREE, which triggers compression
  3. Inserts are again uncompressed
  4. Block usage reaches PCTFREE, which triggers compression again
- Adaptable, continuous compression
- Compression is automatically triggered when block usage reaches PCTFREE
- Compression eliminates holes created by deletions and maximizes contiguous free space in the block

OLTP Table Compression

[Figure: an initially uncompressed block of the Employee table (columns ID, FIRST_NAME, LAST_NAME) holds a header, the rows 1 John Doe, 2 Jane Doe, 3 John Smith, 4 Jane Doe, and free space. After INSERT INTO EMPLOYEE VALUES (5, Jack, Smith); COMMIT; the block is shown in two forms: uncompressed (header, five full rows, free space) and compressed (header plus a local symbol table of repeated values), the latter allowing more data per block.]

Using OLTP Table Compression
- Requires database compatibility level of 11.1 or greater
- New syntax extends the COMPRESS keyword
  - COMPRESS [FOR {ALL | DIRECT_LOAD} OPERATIONS]
  - DIRECT_LOAD (default)
    - Refers to bulk load operations from 10g and prior releases
  - ALL
    - OLTP + direct loads
- Enable compression for new tables
  - CREATE TABLE t1 COMPRESS FOR ALL OPERATIONS
- Enable only direct load compression on an existing table
  - ALTER TABLE t2 COMPRESS
  - Only new rows are compressed; existing rows remain uncompressed
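The syntax above is the 11.1 spelling. As far as I know, 11gR2 spells the same two settings COMPRESS FOR OLTP and COMPRESS BASIC, while still accepting the older form. The following sketch uses a hypothetical table T_OLTP and checks the setting in the data dictionary; OLTP compression is part of the separately licensed Advanced Compression option.

```sql
-- Hypothetical table using the 11gR2 spelling of OLTP compression.
create table t_oltp
( id       number,
  payload  varchar2(100)
)
compress for oltp;

-- COMPRESSION shows ENABLED/DISABLED; COMPRESS_FOR shows the kind.
select table_name, compression, compress_for
  from user_tables
 where table_name = 'T_OLTP';
```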

Applying Compression with Partitioning
- Challenge:
  - Want to minimize storage
  - Do not want to use the Advanced Compression option, for whatever reason
  - In OLTP, so no direct path options
  - Backup friendly

Applying Compression with Partitioning

- A current online, read-write tablespace that gets backed up like every other normal tablespace in our system. The audit trail information in this tablespace is not compressed, and it is constantly inserted into.

- A read-only tablespace containing this year's audit trail partitions to date, in a compressed format. At the beginning of each month, we make this tablespace read-write, move and compress last month's audit information into this tablespace, make it read-only again, and back it up once that month.

- A series of tablespaces for last year, the year before, and so on. These are all read-only and might even be on slow, cheap media. In the event of a media failure, we just need to restore from backup. We would occasionally pick a year at random from our backup sets to ensure they are still restorable (tapes go bad sometimes).
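The monthly step described above could be sketched roughly as follows. The table AUDIT_TRAIL, the partition P_2011_05, and the tablespace AUDIT_YTD are hypothetical names. The move is a direct path operation, so basic compression applies without the Advanced Compression option.

```sql
-- Open the year-to-date tablespace for this month's maintenance.
alter tablespace audit_ytd read write;

-- Move last month's partition out of the read-write tablespace,
-- compressing it on the way.
alter table audit_trail
  move partition p_2011_05
  tablespace audit_ytd
  compress;

-- The move leaves that partition's local index partitions unusable;
-- rebuild them. (Global indexes, if any, would need attention too.)
alter table audit_trail
  modify partition p_2011_05
  rebuild unusable local indexes;

-- Freeze the tablespace again, then back it up once.
alter tablespace audit_ytd read only;
```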

Purging

Purging
- Best facilitated by partitioning
  - Uses DDL: no undo, no redo, unless...
  - You have global indexes; they will need to be maintained or rebuilt
- If you cannot use partitioning
  - Use DDL: CREATE TABLE AS SELECT the rows to keep, instead of DELETE
  - DELETE is the single most resource intensive statement out there
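For the non-partitioned case, the "CREATE TABLE AS SELECT instead of DELETE" idea can be sketched like this. The table AUDIT_LOG, its TS column, and the 12-month retention rule are assumptions made for illustration. The table is unavailable between the DROP and the RENAME, so this suits a maintenance window.

```sql
-- Copy only the rows worth keeping (here: the last 12 months).
create table audit_log_keep
nologging
as
select *
  from audit_log
 where ts >= add_months(trunc(sysdate, 'MM'), -12);

-- Swap the new table in for the old one.
drop table audit_log;
rename audit_log_keep to audit_log;

-- Indexes, constraints, grants and triggers are not carried over by
-- CREATE TABLE AS SELECT and must be recreated at this point.
```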

Sliding Windows of Data
- Challenge:
  - Keep N years/months/whatever of data online
  - Have the data be constantly available
  - Purge old data
  - Add new data
  - Support an efficient indexing scheme (keeping availability in mind)
  - Support efficient storage (use indexes on current data mostly)

Sliding Windows of Data

We'll walk through the following steps:

- Detaching the old data: The oldest partition is either dropped or exchanged with an empty table to permit archiving of the old data.

- Loading and indexing of the new data: The new data is loaded into a work table and indexed and validated.

- Attaching the new data: Once the new data is loaded and processed, the table it is in is exchanged with an empty partition in the partitioned table, turning this newly loaded table into a partition of the larger partitioned table.

Sliding Window

    ops$tkyte@ORA11GR2> CREATE TABLE partitioned
      ( timestamp date,
        id        int
      )
      PARTITION BY RANGE (timestamp)
      (
      PARTITION fy_2004 VALUES LESS THAN
      ( to_date('01-jan-2005','dd-mon-yyyy') ) ,
      PARTITION fy_2005 VALUES LESS THAN
      ( to_date('01-jan-2006','dd-mon-yyyy') )
      )
      /
    Table created.

    ops$tkyte@ORA11GR2> insert into partitioned partition(fy_2004)
      select to_date('31-dec-2004','dd-mon-yyyy')-mod(rownum,360), object_id
        from all_objects
      /
    72090 rows created.

    ops$tkyte@ORA11GR2> insert into partitioned partition(fy_2005)
      select to_date('31-dec-2005','dd-mon-yyyy')-mod(rownum,360), object_id
        from all_objects
      /
    72090 rows created.

Sliding Window

    ops$tkyte@ORA11GR2> create index partitioned_idx_local
      on partitioned(id)
      LOCAL
      /
    Index created.

    ops$tkyte@ORA11GR2> create index partitioned_idx_global
      on partitioned(timestamp)
      GLOBAL
      /
    Index created.

Sliding Window

    ops$tkyte@ORA11GR2> create table fy_2004 ( timestamp date, id int );
    Table created.

    ops$tkyte@ORA11GR2> create index fy_2004_idx on fy_2004(id)
      /
    Index created.

This is the empty table we will archive the old data to.

Sliding Window

    ops$tkyte@ORA11GR2> create table fy_2006 ( timestamp date, id int );
    Table created.

    ops$tkyte@ORA11GR2> insert into fy_2006
      select to_date('31-dec-2006','dd-mon-yyyy')-mod(rownum,360), object_id
        from all_objects
      /
    72097 rows created.

    ops$tkyte@ORA11GR2> create index fy_2006_idx on fy_2006(id) nologging
      /
    Index created.

This is the new data to be loaded.

Sliding Window

    ops$tkyte@ORA11GR2> alter table partitioned
      exchange partition fy_2004
      with table fy_2004
      including indexes
      without validation
      /
    Table altered.

    ops$tkyte@ORA11GR2> alter table partitioned
      drop partition fy_2004
      /
    Table altered.

That is our purge or archive operation. No data was touched.

Sliding Window

    ops$tkyte@ORA11GR2> alter table partitioned
      add partition fy_2006
      values less than ( to_date('01-jan-2007','dd-mon-yyyy') )
      /
    Table altered.

    ops$tkyte@ORA11GR2> alter table partitioned
      exchange partition fy_2006
      with table fy_2006
      including indexes
      without validation
      /
    Table altered.

That was our load. No data was touched.

Sliding Window

    ops$tkyte@ORA11GR2> select index_name, status from user_indexes;

    INDEX_NAME                     STATUS
    ------------------------------ --------
    FY_2006_IDX                    VALID
    FY_2004_IDX                    VALID
    PARTITIONED_IDX_GLOBAL         UNUSABLE
    PARTITIONED_IDX_LOCAL          N/A

However, we have a problem: the global index has become unusable.

Sliding Window

    ops$tkyte@ORA11GR2> alter table partitioned
      exchange partition fy_2004
      with table fy_2004
      including indexes
      without validation
      UPDATE GLOBAL INDEXES
      /
    Table altered.

    ops$tkyte@ORA11GR2> alter table partitioned
      drop partition fy_2004
      UPDATE GLOBAL INDEXES
      /
    Table altered.

This is an online operation. It generates redo and undo, but it gives 100% availability.

Sliding Window

    ops$tkyte@ORA11GR2> alter table partitioned
      add partition fy_2006
      values less than ( to_date('01-jan-2007','dd-mon-yyyy') )
      /
    Table altered.

    ops$tkyte@ORA11GR2> alter table partitioned
      exchange partition fy_2006
      with table fy_2006
      including indexes
      without validation
      UPDATE GLOBAL INDEXES
      /
    Table altered.

The same applies here: the exchange maintains the global index online.

Sliding Window

    ops$tkyte@ORA11GR2> select index_name, status from user_indexes;

    INDEX_NAME                     STATUS
    ------------------------------ --------
    FY_2006_IDX                    VALID
    FY_2004_IDX                    VALID
    PARTITIONED_IDX_GLOBAL         VALID
    PARTITIONED_IDX_LOCAL          N/A

    6 rows selected.

Data was never unavailable. The operation did take longer, but so what?

Backing Up

Backing Up

The fastest way to do something is to not do it

Backing Up
- Use read-only tablespaces
  - Incorporate sliding tablespaces with your sliding windows of data
  - Back up once, never again
  - Put your local indexes in with your table partitions, or
  - Just don't back up indexes; it is often as fast or faster to recreate them in the event of media failure

Backing Up
- Don't back up indexes
  - Even in a read/write environment
  - They might represent 50-60% of your database volume
  - As easy to recreate in parallel/nologging as it would be to restore
    - Easier, perhaps

Backing Up
- Use true incrementals
  - Available with changed block tracking in EE since 10g
  - Demands a disk-based backup
  - We catch the backup up by applying only changed blocks to it
  - Now the time to back up a 100TB OLTP system is the same as for a 100GB system (assuming the same transaction rates)
  - Time to back up is a function of how much data is modified, not database size
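Changed block tracking is switched on from SQL, as in the following sketch. The file path is a hypothetical placeholder. The incrementally updated backup itself is driven from RMAN and is not shown here.

```sql
-- Start recording which blocks change, so incremental backups can read
-- only those blocks instead of scanning every datafile.
alter database enable block change tracking
  using file '/u01/app/oracle/bct/change_tracking.f';  -- hypothetical path

-- Confirm that tracking is enabled and where the tracking file lives.
select status, filename
  from v$block_change_tracking;
```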

Backing Up
- Use compression wherever available
  - Index key compression
  - Direct path basic compression
  - OLTP compression
  - Hybrid Columnar Compression (HCC) on Exadata/ZFS/Pillar
  - SecureFiles compression
