Joins – which, when and why
Michal Simonik
@michalsimonik [email protected]
http://www.michalsimonik.com
www.anakreongames.com
Michal Simonik © 2016
About me
● Independent consultant
● 13+ years in IT
● 12+ years with Oracle
● Database Architect
● Data Modeling and Tuning
● SQL Tuning
● Database Troubleshooting
● Consulting
Selectivity
SELECT COUNT(*) FROM mtg.order_items;                      -- 4814 rows
SELECT COUNT(*) FROM mtg.order_items WHERE order_id < 500; -- 2492 rows
● Selectivity is the fraction of a table's rows that an operation is expected to return
● If there are no statistics, dynamic sampling is used
● If there is no histogram, a uniform distribution of values is assumed
● Views
  ○ DBA_TABLES and DBA_TAB_STATISTICS
  ○ DBA_TAB_COL_STATISTICS
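The two counts on the Selectivity slide give the observed selectivity of the predicate directly. A minimal Python sketch of that arithmetic follows; the variable names are illustrative only and are not Oracle identifiers.

```python
# Row counts taken from the two COUNT(*) queries on the Selectivity slide.
total_rows = 4814      # all rows in mtg.order_items
matching_rows = 2492   # rows with order_id < 500

# Selectivity is the fraction of rows that pass the predicate.
selectivity = matching_rows / total_rows
print(f"selectivity = {selectivity:.4f}")
```

A value close to 0.5 means the predicate keeps roughly half of the table.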
Cardinality
● The expected number of rows returned by an operation
● The basic input for costing joins, filters, and sort operations
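Cardinality can be pictured as row count times selectivity. The sketch below estimates the cardinality of `order_id < 500` under a uniform distribution, as would happen without a histogram. The column bounds 1 and 1000 are an assumption made up for illustration (the slides do not state them), and the function is a simplification, not the optimizer's actual code.

```python
def range_selectivity(low, high, limit):
    """Selectivity of `col < limit`, assuming values are spread evenly over [low, high]."""
    if limit <= low:
        return 0.0
    if limit >= high:
        return 1.0
    return (limit - low) / (high - low)

num_rows = 4814                             # row count from the Selectivity slide
sel = range_selectivity(1, 1000, 500)       # hypothetical column bounds 1..1000
estimated_rows = round(num_rows * sel)      # cardinality = rows * selectivity
print(estimated_rows)
```

With these assumed bounds the estimate differs from the real count of 2492, which is the kind of gap that a histogram is meant to close.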
Cost
● Cost is the optimizer's estimate of how many standardized I/O operations the execution will require
● 1 unit = 1 single-block read
Histograms
SQL Tuning
SELECT column_name, histogram FROM user_tab_col_statistics
WHERE table_name='CARDS';
COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
ID                             NONE
KIND_ID                        HEIGHT BALANCED
RARITY_ID                      FREQUENCY
COLOR_ID                       FREQUENCY
SET_ID                         FREQUENCY
ARTIST_ID                      HEIGHT BALANCED
NAME                           HEIGHT BALANCED
IMAGE                          NONE
Frequency histograms
Frequency histogram
SELECT endpoint_number, endpoint_value FROM user_histograms WHERE table_name='CARDS'
AND column_name='RARITY_ID';
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
           4989              1
           9185              2
           9225              3
           9285              4
          13233              5
Rows with RARITY_ID = 5: 13233 - 9285 = 3948
SELECT count(*) FROM mtg.cards WHERE rarity_id = 5;
COUNT(*)
----------
3948
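In a frequency histogram the endpoint numbers are cumulative, so the row count for a value is the difference between its endpoint number and the previous one. A short Python sketch of that decoding, using the RARITY_ID endpoints from the slide (the function name is illustrative):

```python
# (endpoint_number, endpoint_value) pairs from USER_HISTOGRAMS for RARITY_ID.
endpoints = [(4989, 1), (9185, 2), (9225, 3), (9285, 4), (13233, 5)]

def frequency_counts(endpoints):
    """Turn cumulative endpoint numbers into a per-value row count."""
    counts = {}
    previous = 0
    for endpoint_number, endpoint_value in endpoints:
        counts[endpoint_value] = endpoint_number - previous
        previous = endpoint_number
    return counts

counts = frequency_counts(endpoints)
print(counts[5])
```

The count for value 5 matches the COUNT(*) result on the slide exactly.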
Height-Balanced histograms
Height-Balanced histogram
SELECT endpoint_number, endpoint_value FROM user_histograms WHERE table_name='CARDS'
AND column_name='ARTIST_ID';
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
...
            251            599
            252            604
            253            615
            254            632
Each of the 254 buckets holds 100 / 254 = ~0.39% of the rows.
Estimate for ARTIST_ID = 632 (a value that closes a single bucket):
100 / 254 / (632 - 615) = ~0.023%
(0.023 * 13233) / 100 = ~3 rows
SELECT COUNT(*) FROM mtg.cards WHERE artist_id = 632;
COUNT(*)
----------
5
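The estimate for a non-popular value can be reproduced in a few lines of Python. This mirrors the arithmetic shown on the slide and is not claimed to be the optimizer's exact internal formula; the function name is illustrative.

```python
NUM_BUCKETS = 254   # buckets in the height-balanced histogram
NUM_ROWS = 13233    # rows in mtg.cards

def non_popular_estimate(prev_endpoint_value, endpoint_value):
    """Spread one bucket's share of rows evenly over the values the bucket spans."""
    bucket_share = 1 / NUM_BUCKETS
    value_span = endpoint_value - prev_endpoint_value
    return NUM_ROWS * bucket_share / value_span

estimate = non_popular_estimate(615, 632)
print(round(estimate))
```

The estimate of about 3 rows is close to the real count of 5.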
Height-Balanced histogram
SELECT endpoint_number, endpoint_value FROM user_histograms WHERE table_name='CARDS'
AND column_name='ARTIST_ID';
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              2              2
              4              3
              6              4
              9              5
Estimate for ARTIST_ID = 5 (a popular value that closes 9 - 6 = 3 buckets):
(100 / 254 / 1) * 3 = ~1.181%
(1.181 * 13233) / 100 = ~156 rows
SELECT COUNT(*) FROM mtg.cards WHERE artist_id = 5;
COUNT(*)
----------
127
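For a popular value, which closes more than one bucket, the slide multiplies the per-bucket share by the number of buckets the value spans. A Python sketch of that arithmetic (again following the slide, not necessarily the optimizer's exact formula; the function name is illustrative):

```python
NUM_BUCKETS = 254   # buckets in the height-balanced histogram
NUM_ROWS = 13233    # rows in mtg.cards

def popular_estimate(prev_endpoint_number, endpoint_number):
    """Estimate rows for a value from the number of buckets it closes."""
    buckets_spanned = endpoint_number - prev_endpoint_number
    return NUM_ROWS * buckets_spanned / NUM_BUCKETS

estimate = popular_estimate(6, 9)   # ARTIST_ID = 5 spans endpoints 7..9
print(round(estimate))
```

The estimate of about 156 rows overshoots the real count of 127, because bucket boundaries only approximate the true frequency.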
B-Tree indexes
● A single-column index does not store NULL*
* From Oracle 11g you can use: create index idx1 on tab(col1 asc, 0);
Index hash join
SELECT /*+ INDEX_JOIN(c) */ c.id, c.kind_id, c.set_id
FROM mtg.cards c;
Bitmap indexes
● Good for indexing columns with few distinct values (low selectivity)
● Slow DML
● ALTER TABLE on an indexed column will invalidate the index
● Bitmap indexes cannot be unique
● The size of the index depends on the data distribution
Bitmap AND
SELECT * FROM mtg.cards WHERE name = 'Counterspell' AND id = 'SCB15';
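The idea behind a BITMAP AND can be sketched in Python: each distinct value gets a bitmap with one bit per row, and two predicates are combined with a bitwise AND before any table row is touched. The sample data and the `set_id` column choice are hypothetical, and Python integers stand in for Oracle's compressed bitmaps.

```python
def build_bitmaps(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    bitmaps = {}
    for row_number, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_number)
    return bitmaps

def rows_from_bitmap(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical five-row table with two indexed columns.
names = ["Counterspell", "Shock", "Counterspell", "Opt", "Counterspell"]
set_ids = [7, 7, 3, 3, 7]

name_index = build_bitmaps(names)
set_index = build_bitmaps(set_ids)

# WHERE name = 'Counterspell' AND set_id = 7  ->  AND the two bitmaps.
matching_rows = rows_from_bitmap(name_index["Counterspell"] & set_index[7])
print(matching_rows)
```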
Self join
WITH tmp AS
(SELECT
/*+ materialize */
*
FROM
(SELECT m.patient_id,
m.procedure_date,
m.id,
ROW_NUMBER() OVER (PARTITION BY m.patient_id, m.procedure_date ORDER BY m.patient_id, m.procedure_date) c,
COUNT(*) OVER (PARTITION BY m.patient_id, m.procedure_date) ct
FROM patient_data m
)
WHERE ct > 1
)
SELECT uni.id,
uni.patient_id,
uni.procedure_date,
dpl.id
FROM tmp dpl,
tmp uni
WHERE dpl.patient_id = uni.patient_id
AND dpl.procedure_date = uni.procedure_date
AND uni.c = 1
AND dpl.c > 1;
Self join
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | O/1/M |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | 1024 | 1024 | 1/0/0|
|* 3 | VIEW | | 1 | | | |
| 4 | WINDOW SORT | | 1 | 1024 | 1024 | 1/0/0|
| 5 | TABLE ACCESS FULL | PATIENT_DATA | 1 | | | |
|* 6 | HASH JOIN | | 1 | 814K| 814K| 1/0/0|
|* 7 | VIEW | | 1 | | | |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9FC85C_9DE45319 | 1 | | | |
|* 9 | VIEW | | 1 | | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9FC85C_9DE45319 | 1 | | | |
------------------------------------------------------------------------------------------------------
Joins
● Cartesian join
● Inner join
  ○ Standard join
● Semi join
  ○ Join of two sets where a row from the first set is returned if a matching row exists in the second set (the inner table)
  ○ IN, EXISTS
Joins
● Antijoin
  ○ Implemented as the opposite of a semi join
  ○ A row is excluded if there is a matching row in the inner set
  ○ NOT IN, NOT EXISTS
● Equijoin
  ○ Join based on an equality predicate
● Nonequijoin
  ○ For example, a join using BETWEEN
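The difference between an inner join, a semi join, and an antijoin is easiest to see on a tiny data set. The Python sketch below uses hypothetical `orders` and `order_items` rows; it only illustrates the result sets, not how Oracle executes them.

```python
# Hypothetical rows: (order_id, customer) and (order_id, item).
orders = [(1, "alice"), (2, "bob"), (3, "carol")]
order_items = [(1, "card"), (1, "sleeve"), (3, "box")]

item_order_ids = {order_id for order_id, _ in order_items}

# Inner join: one output row per matching pair, so order 1 appears twice.
inner = [(o, i) for o in orders for i in order_items if o[0] == i[0]]

# Semi join (IN / EXISTS): each order is returned at most once if any match exists.
semi = [o for o in orders if o[0] in item_order_ids]

# Antijoin (NOT EXISTS): an order is returned only if no match exists.
anti = [o for o in orders if o[0] not in item_order_ids]

print(len(inner), semi, anti)
```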
Joins
● Outer join
  ○ Returns all rows that satisfy the join condition, plus those rows of the outer table that do not
  ○ Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator (+)
  ○ (+) is subject to restrictions, read more at
    ■ https://docs.oracle.com/database/121/SQLRF/queries006.htm#SQLRF52355
  ○ Certain capabilities of materialized views are not supported when using ANSI joins
Nested Loop
● The inner table is searched for every row returned from the outer table
● Outer table is usually large
● Inner table is small or has a good index
● Cost = Cost(Outer) + N * Cost(Inner), where N is the number of rows returned from the outer table
● The outer table can be accessed with an index range scan if Oracle expects a small number of rows
● Watch out for a bad estimate of N from Oracle, or poor selectivity of the index on the inner table
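A nested loop join and its cost formula can be sketched in Python. The sorted list searched with `bisect` is a stand-in for a B-tree index on the inner table, and all row data and cost numbers are hypothetical.

```python
from bisect import bisect_left

def nested_loop_cost(cost_outer, n_outer_rows, cost_inner_lookup):
    """Cost = Cost(Outer) + N * Cost(Inner), as on the slide."""
    return cost_outer + n_outer_rows * cost_inner_lookup

def nested_loop_join(outer_rows, inner_rows):
    """Join (key, payload) tuples; the inner side is probed once per outer row."""
    inner_sorted = sorted(inner_rows, key=lambda row: row[0])  # index stand-in
    inner_keys = [key for key, _ in inner_sorted]
    joined = []
    for key, outer_payload in outer_rows:
        pos = bisect_left(inner_keys, key)                     # index lookup
        while pos < len(inner_keys) and inner_keys[pos] == key:
            joined.append((key, outer_payload, inner_sorted[pos][1]))
            pos += 1
    return joined

outer = [(1, "order-1"), (2, "order-2")]
inner = [(1, "item-a"), (1, "item-b"), (3, "item-c")]
result = nested_loop_join(outer, inner)

# A bad estimate of N changes the cost dramatically.
estimated_cost = nested_loop_cost(cost_outer=5, n_outer_rows=10, cost_inner_lookup=3)
actual_cost = nested_loop_cost(cost_outer=5, n_outer_rows=10000, cost_inner_lookup=3)
print(result, estimated_cost, actual_cost)
```

The two cost values show why a wrong N matters: the plan looks cheap at N = 10 but the same plan is three orders of magnitude more expensive at N = 10000.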
Hash Join
● A hash join can be performed only on an equijoin
● The smaller table (outer) is the data set used to build the hash table
● The larger table (inner) is scanned and its rows are tested against the hash table
● A hash join is most efficient if the hash table fits into memory
● Cost is C1 + C2 + a small overhead for building the hash table
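The build and probe phases of a hash join can be sketched in Python as follows. A dict plays the role of the in-memory hash table; the row data is hypothetical and the sketch ignores what happens when the hash table spills to disk.

```python
def hash_join(small_rows, large_rows):
    """Equijoin of (key, payload) tuples using a build phase and a probe phase."""
    # Build phase: hash the smaller input on the join key.
    hash_table = {}
    for key, payload in small_rows:
        hash_table.setdefault(key, []).append(payload)

    # Probe phase: scan the larger input once and look each key up.
    joined = []
    for key, payload in large_rows:
        for build_payload in hash_table.get(key, ()):
            joined.append((key, build_payload, payload))
    return joined

rarities = [(1, "rare"), (2, "common")]                              # small input
cards = [(2, "card-x"), (1, "card-y"), (2, "card-z"), (9, "card-w")]  # large input
result = hash_join(rarities, cards)
print(result)
```

Each input is read exactly once, which is where the C1 + C2 part of the cost comes from. The dict lookup works only for equality, which is why the method is limited to equijoins.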
Sort merge
● Both tables are sorted by the join columns
● This can mean two large sorts, but a suitable index can supply the rows already sorted
● Cost is C1 + C2 + SortC1 + SortC2
● Sometimes the optimizer ignores the fact that the data is already sorted and costs another sort in the execution plan
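A sort merge join can be sketched in Python as two sorts followed by a single merge pass. This is a simplified illustration on hypothetical (key, payload) tuples, handling duplicate keys on both sides.

```python
def sort_merge_join(left, right, key=lambda row: row[0]):
    """Equijoin two row lists by sorting both on the key and merging them."""
    left = sorted(left, key=key)      # SortC1
    right = sorted(right, key=key)    # SortC2
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # Pair every left row of this key with the whole right run.
            while i < len(left) and key(left[i]) == left_key:
                for right_row in right[j:j_end]:
                    joined.append((left[i], right_row))
                i += 1
            j = j_end
    return joined

left_rows = [(2, "b"), (1, "a"), (2, "c")]
right_rows = [(2, "x"), (3, "y"), (2, "z")]
result = sort_merge_join(left_rows, right_rows)
print(result)
```

If an input already arrives sorted, for example from an index, the corresponding `sorted` call is the work that could be skipped.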
BITMAP join index
● Performance benefits in a data warehouse
  ○ Low number of distinct values in the indexed columns
  ○ The query must not contain columns in the WHERE clause that are not part of the index
  ○ The overhead of DML is significant
BITMAP join index
● Restrictions
  ○ Parallel DML is only supported on the fact table
  ○ Parallel DML on one of the dimension tables will make the index unusable
  ○ Only one table can be updated concurrently by different transactions
  ○ No table can appear twice in the join
  ○ Index-organized and temporary tables are not supported
  ○ The columns in the index must all be columns of the dimension tables
  ○ The dimension table join columns must be either primary key columns or have unique constraints
  ○ If a dimension has a composite primary key, each key column must be part of the join
CROSS APPLY, OUTER APPLY
● Why does it exist?
  ○ Language Integrated Query (LINQ)
  ○ LINQ allows you to compose a query once and have it work against any data source
  ○ Problem: CROSS APPLY was not supported on Oracle
  ○ Implemented in Oracle 12c
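The semantics of CROSS APPLY and OUTER APPLY can be sketched in Python: the right side is evaluated once per left row, CROSS APPLY drops left rows with an empty result, and OUTER APPLY keeps them padded with a NULL (here `None`). The data and the `top_two_items` helper are hypothetical.

```python
def cross_apply(left_rows, table_function):
    """Pair each left row with every row its correlated right side returns."""
    return [(left, right) for left in left_rows for right in table_function(left)]

def outer_apply(left_rows, table_function):
    """Like cross_apply, but keep left rows whose right side returns nothing."""
    result = []
    for left in left_rows:
        matches = table_function(left)
        if matches:
            result.extend((left, right) for right in matches)
        else:
            result.append((left, None))
    return result

# Hypothetical data: items per order id.
items_by_order = {1: ["a", "b", "c"], 3: ["d"]}

def top_two_items(order_id):
    """Correlated right side: at most two items for the given order."""
    return items_by_order.get(order_id, [])[:2]

order_ids = [1, 2, 3]
crossed = cross_apply(order_ids, top_two_items)
outered = outer_apply(order_ids, top_two_items)
print(crossed, outered)
```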
Cluster join
● A CLUSTER join is a special case of the NESTED LOOP join
● A CLUSTER join is used when
  ○ The tables are part of a cluster
  ○ The join is an equijoin between the cluster keys
● Oracle reads each row from the first table and finds all matches in the second table by using the CLUSTER index
● CLUSTER joins can be very efficient because the joining rows are in the same physical data block
● Very rarely used
OLAP - Joining Cubes to Tables and Views
● You can join cubes to
  ○ Other cubes
  ○ Tables
  ○ Views
  ○ Other row source types
● CUBE JOIN limits the number of fetched values to improve performance
● The cube must be on the right-hand side of the join
● If CUBE JOIN is not possible, standard joins are used
  ○ Other cubes
● Hint USE_CUBE / NO_USE_CUBE
Other operations
● UNION
● UNION ALL
● INTERSECT
● MINUS
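The four set operations differ mainly in how they treat duplicates. A small Python sketch on hypothetical single-column rows (SQL does not guarantee output order; the order here is just a property of the sketch):

```python
def union_all(a, b):
    """UNION ALL: concatenate both inputs, duplicates kept."""
    return list(a) + list(b)

def union(a, b):
    """UNION: rows from both inputs, duplicates removed."""
    return list(dict.fromkeys(union_all(a, b)))

def intersect(a, b):
    """INTERSECT: distinct rows present in both inputs."""
    b_set = set(b)
    return [row for row in dict.fromkeys(a) if row in b_set]

def minus(a, b):
    """MINUS: distinct rows of the first input that are absent from the second."""
    b_set = set(b)
    return [row for row in dict.fromkeys(a) if row not in b_set]

first = [1, 2, 2, 3]
second = [2, 3, 4]
print(union_all(first, second), union(first, second),
      intersect(first, second), minus(first, second))
```

Only UNION ALL skips duplicate elimination, so it is the only one of the four that avoids the extra sort or hash step.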
OLAP - Joining Cubes to Tables and Views
INSERT INTO metadata_tab
SELECT metadata_seq.NEXTVAL,
sel.*
FROM
(SELECT DISTINCT zp_margin,
date_from_rb,
date_to_rb,
date_from_rz,
obd_to_rz
FROM fact_tab
WHERE app_code = 'P304'
AND period = 201209
MINUS
SELECT zp_margin,
date_from_rb,
date_to_rb,
date_from_rz,
obd_to_rz
FROM metadata_tab
) sel;
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Reads |
------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 | 186 | 1 |
| 1 | LOAD TABLE CONVENTIONAL | | 1 | | 0 | 186 | 1 |
| 2 | SEQUENCE | METADATA_SEQ | 1 | | 0 | 186 | 1 |
| 3 | VIEW | | 4 | 4 | 0 | 0 | 0 |
| 4 | MINUS | | 4 | | 0 | 0 | 0 |
| 5 | SORT UNIQUE | | 4 | 4 | 15 | 0 | 0 |
|* 6 | TABLE ACCESS FULL | FACT_TAB | 62 | 2480K| 2544K| 15912 | 10332 |
| 7 | SORT UNIQUE | | 4 | 166 | 180 | 0 | 0 |
|* 8 | TABLE ACCESS FULL | METADATA_TAB | 5 | 166 | 180 | 10 | 5 |
------------------------------------------------------------------------------------------------------
* Optimal memory for SORT- 2k
Thank you for your attention
Q&A
@michalsimonik [email protected]
http://www.michalsimonik.com