Joins – which, when and why

Michal Simonik

@michalsimonik [email protected]

http://www.michalsimonik.com

www.anakreongames.com

About me

● Independent consultant
● 13+ years in IT
● 12+ years with Oracle
● Database Architect
● Data Modeling and Tuning
● SQL Tuning
● Database Troubleshooting
● Consulting

Michal Simonik © 2016

Selectivity

SELECT COUNT(*) FROM mtg.order_items;                       -- 4814
SELECT COUNT(*) FROM mtg.order_items WHERE order_id < 500;  -- 2492

● Selectivity is the fraction of a table's rows that an operation is expected to return (here 2492 / 4814, about 52%)
● If there are no statistics, dynamic sampling is used
● If there is no histogram, a uniform distribution of values is assumed
● Views
    ○ DBA_TABLES and DBA_TAB_STATISTICS
    ○ DBA_TAB_COL_STATISTICS

Cardinality

● Expected number of rows returned by an operation
● Basic input for costing joins, filters and sorting operations
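The relationship between selectivity and cardinality can be sketched with the two row counts from the Selectivity slide. This is only an illustration of the arithmetic; the function names are made up for this example and the optimizer works from statistics, not from actual counts.

```python
# Row counts taken from the Selectivity slide (mtg.order_items).
TOTAL_ROWS = 4814
MATCHING_ROWS = 2492  # rows with order_id < 500


def selectivity(matching_rows: int, total_rows: int) -> float:
    """Fraction of the table's rows returned by the predicate (0.0 to 1.0)."""
    return matching_rows / total_rows


def cardinality(total_rows: int, sel: float) -> int:
    """Expected number of rows: table rows multiplied by selectivity."""
    return round(total_rows * sel)


sel = selectivity(MATCHING_ROWS, TOTAL_ROWS)
print(f"selectivity = {sel:.4f}")
print(f"cardinality = {cardinality(TOTAL_ROWS, sel)}")
```

When the selectivity is estimated rather than measured, the cardinality inherits the error, which is why the later slides keep returning to the quality of these estimates.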

Cost

● Cost is the optimizer's estimate of how many standard I/O operations the execution will require
● 1 unit = 1 single-block read

Histograms

SELECT column_name, histogram
FROM user_tab_col_statistics
WHERE table_name = 'CARDS';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
ID                             NONE
KIND_ID                        HEIGHT BALANCED
RARITY_ID                      FREQUENCY
COLOR_ID                       FREQUENCY
SET_ID                         FREQUENCY
ARTIST_ID                      HEIGHT BALANCED
NAME                           HEIGHT BALANCED
IMAGE                          NONE

Frequency histogram

SELECT endpoint_number, endpoint_value
FROM user_histograms
WHERE table_name = 'CARDS'
  AND column_name = 'RARITY_ID';

ENDPOINT_NUMBER   ENDPOINT_VALUE
----------------- --------------------
4989              1
9185              2
9225              3
9285              4
13233             5

ENDPOINT_NUMBER is cumulative, so the number of rows with RARITY_ID = 5 is 13233 - 9285 = 3948.

SELECT count(*) FROM mtg.cards WHERE rarity_id = 5;

COUNT(*)
----------
3948
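The subtraction above generalises to every value in a frequency histogram. A minimal sketch, using the endpoint pairs from the RARITY_ID output; the function name is invented for this example:

```python
# (ENDPOINT_NUMBER, ENDPOINT_VALUE) pairs from the RARITY_ID histogram.
endpoints = [(4989, 1), (9185, 2), (9225, 3), (9285, 4), (13233, 5)]


def frequencies(endpoints):
    """Turn cumulative endpoint numbers into a row count per value."""
    counts = {}
    previous = 0
    for endpoint_number, value in endpoints:
        counts[value] = endpoint_number - previous
        previous = endpoint_number
    return counts


counts = frequencies(endpoints)
print(counts[5])  # 13233 - 9285
```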

Height-Balanced histogram

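SELECT endpoint_number, endpoint_value
FROM user_histograms
WHERE table_name = 'CARDS'
  AND column_name = 'ARTIST_ID';

ENDPOINT_NUMBER   ENDPOINT_VALUE
----------------- ----------------
...
251               599
252               604
253               615
254               632

Each of the 254 buckets holds about 0.39% of the rows (100 / 254). Value 632 closes only one bucket, so that bucket is spread over the 632 - 615 = 17 values it covers:

100 / 254 / (632 - 615) = 0.023%
(0.023 * 13233) / 100 = ~3 rows expected

SELECT COUNT(*) FROM mtg.cards WHERE artist_id = 632;

COUNT(*)
----------
5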

Height-Balanced histogram

SELECT endpoint_number, endpoint_value
FROM user_histograms
WHERE table_name = 'CARDS'
  AND column_name = 'ARTIST_ID';

ENDPOINT_NUMBER   ENDPOINT_VALUE
----------------- ----------------
1                 1
2                 2
4                 3
6                 4
9                 5

Value 5 is the endpoint of 3 buckets (9 - 6), which makes it a popular value:

(100 / 254 / 1) * 3 = 1.181%
(1.181 * 13233) / 100 = ~156 rows expected

SELECT COUNT(*) FROM mtg.cards WHERE artist_id = 5;

COUNT(*)
----------
127
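Both height-balanced estimates (the value 632 in the last bucket and the popular value 5) follow the same arithmetic, which can be written down as a small sketch. It reproduces the calculation shown on the slides, not necessarily the exact formula Oracle uses internally; the function names are invented for this example.

```python
NUM_ROWS = 13233    # rows in CARDS, as used in the slide arithmetic
NUM_BUCKETS = 254   # buckets in the ARTIST_ID histogram


def estimate_popular(buckets_spanned: int) -> float:
    """Rows expected for a value that is the endpoint of several buckets."""
    return NUM_ROWS * buckets_spanned / NUM_BUCKETS


def estimate_within_bucket(previous_endpoint: int, endpoint: int) -> float:
    """Rows expected for one value when a single bucket is spread evenly
    over the values between two neighbouring endpoints."""
    return NUM_ROWS / NUM_BUCKETS / (endpoint - previous_endpoint)


# artist_id = 632: one bucket shared by 632 - 615 = 17 values
print(round(estimate_within_bucket(615, 632)))
# artist_id = 5: endpoint numbers 6 -> 9, so 3 buckets
print(round(estimate_popular(9 - 6)))
```

The actual counts on the slides (5 and 127) show how far such estimates can drift from reality in either direction.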

B-Tree indexes

● A single-column B-tree index does not store NULL values*

* From Oracle 11g you can use: CREATE INDEX idx1 ON tab (col1 ASC, 0);

Index hash join

SELECT /*+ INDEX_JOIN(c) */ c.id, c.kind_id, c.set_id
FROM mtg.cards c;
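The idea behind an index join is that the query is answered from the indexes alone: entries from several indexes on the same table are matched on ROWID, so the table itself is never visited. A rough sketch of that matching, assuming three single-column indexes on ID, KIND_ID and SET_ID; the index contents and rowids below are hypothetical, and every column is assumed to be NOT NULL so that each rowid appears in each index.

```python
# Hypothetical index contents: rowid -> indexed column value.
idx_id = {"r1": 1, "r2": 2, "r3": 3}
idx_kind_id = {"r1": 10, "r2": 20, "r3": 10}
idx_set_id = {"r1": 7, "r2": 7, "r3": 8}


def index_join(first, *rest):
    """Combine entries from several indexes that share a rowid.

    The first index drives; the others are probed by rowid, the way a
    hash join would probe a hash table built on rowid.
    """
    rows = []
    for rowid, value in first.items():
        if all(rowid in idx for idx in rest):
            rows.append((value, *(idx[rowid] for idx in rest)))
    return rows


for row in index_join(idx_id, idx_kind_id, idx_set_id):
    print(row)  # (id, kind_id, set_id)
```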

Bitmap indexes

● Good for indexing columns with few distinct values
● Slow DML
● ALTER TABLE on an indexed column will invalidate the index
● Bitmap indexes cannot be unique
● Size of the index depends on data distribution

Bitmap AND

SELECT * FROM mtg.cards WHERE name = 'Counterspell' AND id = 'SCB15';

Self join

WITH tmp AS
  (SELECT /*+ materialize */ *
   FROM
     (SELECT m.patient_id,
             m.procedure_date,
             m.id,
             ROW_NUMBER() OVER (PARTITION BY m.patient_id, m.procedure_date ORDER BY m.patient_id, m.procedure_date) c,
             COUNT(*) OVER (PARTITION BY m.patient_id, m.procedure_date) ct
      FROM patient_data m
     )
   WHERE ct > 1
  )
SELECT uni.id,
       uni.patient_id,
       uni.procedure_date,
       dpl.id
FROM tmp dpl,
     tmp uni
WHERE dpl.patient_id = uni.patient_id
  AND dpl.procedure_date = uni.procedure_date
  AND uni.c = 1
  AND dpl.c > 1;

-------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | E-Rows | OMem | 1Mem | O/1/M |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |        |      |      |       |
|   1 |  TEMP TABLE TRANSFORMATION |                             |        |      |      |       |
|   2 |   LOAD AS SELECT           |                             |        | 1024 | 1024 | 1/0/0 |
|*  3 |    VIEW                    |                             |      1 |      |      |       |
|   4 |     WINDOW SORT            |                             |      1 | 1024 | 1024 | 1/0/0 |
|   5 |      TABLE ACCESS FULL     | PATIENT_DATA                |      1 |      |      |       |
|*  6 |   HASH JOIN                |                             |      1 | 814K | 814K | 1/0/0 |
|*  7 |    VIEW                    |                             |      1 |      |      |       |
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9FC85C_9DE45319 |      1 |      |      |       |
|*  9 |    VIEW                    |                             |      1 |      |      |       |
|  10 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9FC85C_9DE45319 |      1 |      |      |       |
-------------------------------------------------------------------------------------------------

Joins

● Cartesian join
● Inner join
    ○ Standard join
● Semi join
    ○ Join of two sets where a row from the first set is returned if a matching row exists in the second set (inner table)
    ○ IN, EXISTS

● Antijoin
    ○ Implemented as the opposite of a semi join
    ○ A row is excluded if there is a matching row in the inner set
    ○ NOT IN, NOT EXISTS
● Equijoin
    ○ Join whose predicate is an equality
● Nonequijoin
    ○ For example, a join using BETWEEN

● Outer join
    ○ Returns all rows that satisfy the join condition, plus those rows of the outer table that do not
    ○ Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator (+)
    ○ (+) is subject to restrictions, read more at
        ■ https://docs.oracle.com/database/121/SQLRF/queries006.htm#SQLRF52355
    ○ Certain capabilities of materialized views are not supported when using ANSI joins
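The difference between the semi join, antijoin and outer join described above is easiest to see on a tiny data set. The following is a sketch of the semantics only, not of how Oracle executes these joins; the sample rows and function names are hypothetical, and NULL handling (which makes NOT IN behave differently from NOT EXISTS) is deliberately ignored.

```python
def semi_join(outer, inner, key):
    """Outer rows that have at least one match; each is returned once."""
    inner_keys = {row[key] for row in inner}
    return [row for row in outer if row[key] in inner_keys]


def anti_join(outer, inner, key):
    """Outer rows that have no match in the inner set."""
    inner_keys = {row[key] for row in inner}
    return [row for row in outer if row[key] not in inner_keys]


def left_outer_join(outer, inner, key):
    """Every matching pair, plus unmatched outer rows paired with None."""
    result = []
    for o in outer:
        matches = [i for i in inner if i[key] == o[key]]
        if matches:
            result.extend((o, i) for i in matches)
        else:
            result.append((o, None))
    return result


# Hypothetical sample data.
orders = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
items = [
    {"order_id": 1, "item": "a"},
    {"order_id": 1, "item": "b"},
    {"order_id": 3, "item": "c"},
]

print(semi_join(orders, items, "order_id"))        # orders 1 and 3, once each
print(anti_join(orders, items, "order_id"))        # order 2 only
print(len(left_outer_join(orders, items, "order_id")))
```

Note that the semi join returns order 1 only once even though it has two items, which is what distinguishes it from an inner join.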

Nested Loop

● The inner table is searched for every row returned from the outer table
● Outer table is usually large
● Inner table is small or has a good index
● Cost = Cost(Outer) + N * Cost(Inner), where N is the number of rows returned from the outer table
● The outer table can be accessed with an index range scan if Oracle expects a small number of rows
● Watch out for a bad estimate of N from Oracle, or of the selectivity of the index on the inner table
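A minimal sketch of the nested loop idea and of the cost formula above. The inner "index" is modelled as a plain dictionary and the cost figures are invented, so this only illustrates why a wrong N hurts so much; it is not Oracle's costing model.

```python
def nested_loop_join(outer_rows, inner_index, key):
    """For every outer row, probe the inner table through its index."""
    joined = []
    for outer in outer_rows:
        for inner in inner_index.get(outer[key], []):
            joined.append({**outer, **inner})
    return joined


def nested_loop_cost(cost_outer, n_outer_rows, cost_inner_probe):
    """Cost = Cost(Outer) + N * Cost(Inner)."""
    return cost_outer + n_outer_rows * cost_inner_probe


# Hypothetical data: the index maps a key to the inner rows with that key.
outer_rows = [{"set_id": 1, "card": "x"}, {"set_id": 2, "card": "y"}]
inner_index = {1: [{"set_id": 1, "set_name": "Alpha"}]}

print(nested_loop_join(outer_rows, inner_index, "set_id"))

# Same plan, estimated N = 100 versus actual N = 10000 (made-up numbers).
print(nested_loop_cost(10, 100, 3))
print(nested_loop_cost(10, 10000, 3))
```

Because N multiplies the inner cost, an estimate that is off by a factor of 100 makes the real work roughly 100 times larger than the plan suggested.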

Hash Join

● A hash join can be performed only on an equijoin
● The smaller table (outer) is the data set used to build the hash table
● The larger table (inner) is scanned and its rows are tested against the hash table
● A hash join is most efficient if the hash table fits into memory
● Cost is C1 + C2 plus a small overhead for the hash table
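The build and probe phases described above can be sketched in a few lines. This is an in-memory illustration with hypothetical sample rows; it does not model what happens when the hash table no longer fits into memory.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Equijoin: build a hash table on the small input, probe with the large."""
    # Build phase: one pass over the smaller row source.
    hash_table = defaultdict(list)
    for row in small:
        hash_table[row[key]].append(row)

    # Probe phase: one pass over the larger row source.
    joined = []
    for row in large:
        for match in hash_table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
sets = [{"set_id": 1, "set_name": "Alpha"}, {"set_id": 2, "set_name": "Beta"}]
cards = [
    {"set_id": 1, "card": "x"},
    {"set_id": 1, "card": "y"},
    {"set_id": 3, "card": "z"},
]

for row in hash_join(sets, cards, "set_id"):
    print(row)
```

Each input is read exactly once, which matches the C1 + C2 shape of the cost; the lookup by exact key is also why the technique only works for equality predicates.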

Sort merge

● Both tables are sorted by the join columns
● This can mean two large sorts, but a suitable index can supply the rows already in order
● Cost is C1 + C2 + SortC1 + SortC2
● Sometimes the optimizer ignores the fact that the data is already sorted and costs another sort in the execution plan
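The sort and merge steps can be sketched as follows. The example handles only an equality predicate and sorts both inputs unconditionally, so it corresponds to the C1 + C2 + SortC1 + SortC2 case; the sample rows are hypothetical.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in one pass."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the group of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row carrying this key with that group.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    joined.append({**left[i], **r})
                i += 1
            j = j_end
    return joined


# Hypothetical sample data with duplicate keys on both sides.
left = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
right = [{"k": 2, "b": 1}, {"k": 3, "b": 2}, {"k": 2, "b": 3}]

for row in sort_merge_join(left, right, "k"):
    print(row)
```

If an input already arrives in key order, for example from an index, the corresponding `sorted` call is the work that could be skipped.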

BITMAP join index

● Performance benefits in a data warehouse
    ○ Low number of distinct values in the indexed columns
    ○ The query must not use columns in the WHERE clause that are not part of the index
    ○ The overhead of DML is significant

● Restrictions
    ○ Parallel DML is only supported on the fact table
    ○ Parallel DML on one of the dimension tables will make the index unusable
    ○ Only one table can be updated concurrently by different transactions
    ○ No table can appear twice in the join
    ○ Index-organized and temporary tables are not supported
    ○ The columns in the index must all be columns of the dimension tables
    ○ The dimension table join columns must be either primary key columns or have unique constraints
    ○ If a dimension has a composite primary key, each key column must be part of the join

CROSS APPLY, OUTER APPLY

● Why does it exist?
    ○ Language Integrated Query (LINQ)
    ○ LINQ lets you compose a query once and have it work against any data source
    ○ Problem: CROSS APPLY was not supported on Oracle
    ○ Implemented in Oracle 12c

Cluster join

● CLUSTER join is a special case of the NESTED LOOP join
● CLUSTER join is used when
    ○ The tables are part of a cluster
    ○ The join is an equijoin between the cluster keys
● Oracle reads each row from the first table and finds all matches in the second table by using the CLUSTER index
● CLUSTER joins can be very efficient because the joining “rows” are in the same physical data block
● Very rarely used

OLAP - Joining Cubes to Tables and Views

● You can join cubes to
    ○ Other cubes
    ○ Tables
    ○ Views
    ○ Other row source types
● CUBE JOIN limits the number of fetched values to improve performance
● Cube must be on the right side of the equation
● If CUBE JOIN is not possible, standard joins are used
● Hint USE_CUBE / NO_USE_CUBE

Other operations

● UNION

● UNION ALL

● INTERSECT

● MINUS
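The four set operators differ mainly in how they treat duplicates: UNION ALL keeps them, while UNION, INTERSECT and MINUS return distinct rows. A small sketch of those semantics on lists of values; the sample data is hypothetical, and the sorting is there only to make the output deterministic, since SQL guarantees no order without ORDER BY.

```python
def union_all(a, b):
    """UNION ALL: concatenate both inputs, duplicates kept."""
    return a + b


def union(a, b):
    """UNION: rows from either input, duplicates removed."""
    return sorted(set(a) | set(b))


def intersect(a, b):
    """INTERSECT: distinct rows present in both inputs."""
    return sorted(set(a) & set(b))


def minus(a, b):
    """MINUS: distinct rows of the first input not present in the second."""
    return sorted(set(a) - set(b))


a = [1, 2, 2, 3]
b = [3, 4]

print(union_all(a, b))
print(union(a, b))
print(intersect(a, b))
print(minus(a, b))
```

The duplicate removal is not free: in an execution plan it typically shows up as a sort or hash step on each input, which UNION ALL avoids.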

INSERT INTO metadata_tab
SELECT metadata_seq.NEXTVAL,
       sel.*
FROM
  (SELECT DISTINCT zp_margin,
                   date_from_rb,
                   date_to_rb,
                   date_from_rz,
                   obd_to_rz
   FROM fact_tab
   WHERE app_code = 'P304'
     AND period = 201209
   MINUS
   SELECT zp_margin,
          date_from_rb,
          date_to_rb,
          date_from_rz,
          obd_to_rz
   FROM metadata_tab
  ) sel;

----------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows | Buffers | Reads |
----------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |     186 |     1 |
|   1 |  LOAD TABLE CONVENTIONAL |              |      1 |        |      0 |     186 |     1 |
|   2 |   SEQUENCE               | METADATA_SEQ |      1 |        |      0 |     186 |     1 |
|   3 |    VIEW                  |              |      4 |      4 |      0 |       0 |     0 |
|   4 |     MINUS                |              |      4 |        |      0 |       0 |     0 |
|   5 |      SORT UNIQUE         |              |      4 |      4 |     15 |       0 |     0 |
|*  6 |       TABLE ACCESS FULL  | FACT_TAB     |     62 |  2480K |  2544K |   15912 | 10332 |
|   7 |      SORT UNIQUE         |              |      4 |    166 |    180 |       0 |     0 |
|*  8 |       TABLE ACCESS FULL  | METADATA_TAB |      5 |    166 |    180 |      10 |     5 |
----------------------------------------------------------------------------------------------

* Optimal memory for SORT: 2k

Thank you for your attention

Q&A

@michalsimonik [email protected]

http://www.michalsimonik.com