

Page 1: Advanced row pattern matching

Meet Your Match
Advanced row pattern matching (12c)

Stew Ashton (stewashton.wordpress.com)
UKOUG Tech 2016

Can you read the following line? If not, please move closer.

It's much better when you can read the code ;)

Page 2: Advanced row pattern matching

Advanced usage, not all the syntax
• Reminder of the basics
• Warmup exercises
• Bin fitting
• Positive and negative sequencing
• Hierarchical summaries
• Alternatives to joining

Page 3: Advanced row pattern matching

Who am I?
• 36 years in IT
  – Developer, Technical Sales Engineer, Technical Architect
  – Aeronautics, IBM, Finance
  – Mainframe, client-server, Web apps
• 12 years using Oracle database
  – SQL performance analysis
  – Replace Java with SQL
• 4 years as in-house “Oracle Development Expert”
• Conference speaker since 2014
• Currently independent

Page 4: Advanced row pattern matching


Questions

Page 5: Advanced row pattern matching


Page 6: Advanced row pattern matching

Reminder: the Basics

• To illustrate: table with PAGE column
  – Group consecutive pages together

Input:

PAGE
1
2
3
5

Output:

FIRSTPAGE LASTPAGE CNT
1         3        3
5         5        1

Page 7: Advanced row pattern matching

Pattern and Matching Rows
• PATTERN
  – Uninterrupted series of input rows
  – Described as list of conditions (≅ “regular expressions”)
    PATTERN (A B*)
    "A" : 1 row, "B*" : 0 or more rows, as many as possible
• DEFINE (at least one) row condition
    [A undefined = TRUE]
    B AS page = PREV(page)+1
• Each series that matches the pattern is a “match”
  – "A" and "B" identify the rows that meet their conditions
  – There can be unmatched rows between series

Page 8: Advanced row pattern matching

Input, Processing, Output

1. Define input
2. Order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?

SELECT *
FROM t                                  -- 1. Define input
MATCH_RECOGNIZE (
  ORDER BY page                         -- 2. Order input
  MEASURES A.page as firstpage,         -- 6. Output: columns per row
           LAST(page) as lastpage,
           COUNT(*) cnt
  ONE ROW PER MATCH                     -- 5. Output: rows per match
  AFTER MATCH SKIP PAST LAST ROW        -- 7. Go where after match?
  PATTERN (A B*)                        -- 3. Process pattern
  DEFINE B AS page = PREV(page)+1       -- 4. using defined conditions
);
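To make the processing order concrete, here is a minimal Python sketch that emulates what the query above does for the PAGE example. It is not how Oracle implements MATCH_RECOGNIZE; the function name `group_consecutive` is invented for illustration, and the sketch assumes a single partition of integer page numbers.

```python
def group_consecutive(pages):
    """Emulate PATTERN (A B*) with B AS page = PREV(page)+1, ONE ROW PER MATCH."""
    rows = sorted(pages)                      # ORDER BY page
    matches = []
    i = 0
    while i < len(rows):
        first = rows[i]                       # A is undefined, so always TRUE: starts a match
        j = i
        # B*: greedily take rows while page = PREV(page) + 1
        while j + 1 < len(rows) and rows[j + 1] == rows[j] + 1:
            j += 1
        # MEASURES: firstpage, lastpage, cnt
        matches.append((first, rows[j], j - i + 1))
        i = j + 1                             # AFTER MATCH SKIP PAST LAST ROW
    return matches


if __name__ == "__main__":
    for firstpage, lastpage, cnt in group_consecutive([1, 2, 3, 5]):
        print(firstpage, lastpage, cnt)
```

With pages 1, 2, 3, 5 the sketch yields the two rows shown in the reminder slide: (1, 3, 3) and (5, 5, 1).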

Page 9: Advanced row pattern matching

             DEFINE               ALL ROWS PER MATCH                ONE ROW PER MATCH
pg  id   first current last   first current last final last   first current last final last
1   A    1     1       1      1     1       1    3
2   B    1     2       2      1     2       2    3
3   B    1     3       3      1     3       3    3            1     3       3    3
5   B?   1     5       5

Which row do we mean?
Column name by itself = “current” row
• DEFINE: row being evaluated ; ALL ROWS: each row ; ONE ROW: last row

Page 10: Advanced row pattern matching

Warming up: What output from this?

CUST_ID TX_DATE    DESCR
C001    2016-01-01 Inquiry
C001    2016-01-01 Inquiry
C001    2016-01-10 Sales
C001    2016-01-21 Repeat Inquiry
C001    2016-02-10 Repeat Inquiry
C001    2016-05-01 Sales
C001    2016-05-06 Sales
C001    2016-06-10 Inquiry 1
C001    2016-09-01 Inquiry 2
C002    2016-02-01 Inquiry 1
C002    2016-02-25 Inquiry 2
C003    2016-02-01 Inquiry 2
C003    2016-02-10 Sales
C003    2016-02-10 Sales
C003    2016-03-10 Inquiry 2
C004    2016-04-15 Sales

select * from t
match_recognize(
  all rows per match
  pattern (a*)
  define a as 1=1
);

Page 11: Advanced row pattern matching

Add sequence number, starting over after 40 days

Same input table as in the warmup. Starting point:

select * from t
match_recognize(
  all rows per match
  pattern (a*)
  define a as 1=1
);


Page 13: Advanced row pattern matching

Add sequence number, starting over after 40 days

Step 1: partition and order the input (the condition for A is not written yet)

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  all rows per match
  pattern (a*)
  define a as
);

Step 2: a row belongs to the match while it is within 40 days of the first row

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  all rows per match
  pattern (a*)
  define a as tx_date <= first(tx_date) + 40
);

Step 3: the running count within the match is the sequence number

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) as seq
  all rows per match
  pattern (a*)
  define a as tx_date <= first(tx_date) + 40
);

Page 14: Advanced row pattern matching

Add sequence number, starting over after 40 days

CUST_ID TX_DATE    DESCR          SEQ
C001    2016-01-01 Inquiry          1
C001    2016-01-01 Inquiry          2
C001    2016-01-10 Sales            3
C001    2016-01-21 Repeat Inquiry   4
C001    2016-02-10 Repeat Inquiry   5
C001    2016-05-01 Sales            1
C001    2016-05-06 Sales            2
C001    2016-06-10 Inquiry 1        3
C001    2016-09-01 Inquiry 2        1
C002    2016-02-01 Inquiry 1        1
C002    2016-02-25 Inquiry 2        2
C003    2016-02-01 Inquiry 2        1
C003    2016-02-10 Sales            2
C003    2016-02-10 Sales            3
C003    2016-03-10 Inquiry 2        4
C004    2016-04-15 Sales            1

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) as seq
  all rows per match
  pattern (a*)
  define a as tx_date <= first(tx_date) + 40
);
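The restart rule can be checked outside the database with a small Python sketch. It only emulates the logic of `pattern (a*)` with `a as tx_date <= first(tx_date) + 40` and a running `count(*)`; the names `DATA`, `parse` and `add_seq` are invented for illustration, and only the C001 and C002 rows from the slide are included to keep it short.

```python
from datetime import date, timedelta
from itertools import groupby

# Subset of the slide's input rows: cust_id, tx_date, descr
DATA = """\
C001 2016-01-01 Inquiry
C001 2016-01-01 Inquiry
C001 2016-01-10 Sales
C001 2016-01-21 Repeat Inquiry
C001 2016-02-10 Repeat Inquiry
C001 2016-05-01 Sales
C001 2016-05-06 Sales
C001 2016-06-10 Inquiry 1
C001 2016-09-01 Inquiry 2
C002 2016-02-01 Inquiry 1
C002 2016-02-25 Inquiry 2"""


def parse(text):
    rows = []
    for line in text.splitlines():
        cust, d, descr = line.split(" ", 2)
        rows.append((cust, date.fromisoformat(d), descr))
    return rows


def add_seq(rows, days=40):
    out = []
    # PARTITION BY cust_id ORDER BY tx_date, descr
    for _, part in groupby(sorted(rows), key=lambda r: r[0]):
        first, seq = None, 0
        for cust, tx_date, descr in part:
            # A fails when tx_date > FIRST(tx_date) + 40: a new match starts here
            if first is None or tx_date > first + timedelta(days=days):
                first, seq = tx_date, 0
            seq += 1                      # running COUNT(*) within the match
            out.append((cust, tx_date, descr, seq))
    return out


if __name__ == "__main__":
    for row in add_seq(parse(DATA)):
        print(row[0], row[1].isoformat(), row[2], row[3])
```

For these rows the sketch reproduces the SEQ column above: 1 to 5, then 1 to 3, then 1 for C001, and 1, 2 for C002.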

Page 15: Advanced row pattern matching

Sequence starts from First Sale, Inquiry outside 40 days = 0

Starting point: the previous query and its output.

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) as seq
  all rows per match
  pattern (a*)
  define a as tx_date <= first(tx_date) + 40
);


Page 17: Advanced row pattern matching

Sequence starts from Sale, Inquiry outside 40 days = 0

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) - count(inq.*) as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);

Page 18: Advanced row pattern matching

Sequence starts from Sale, Inquiry outside 40 days = 0

CUST_ID TX_DATE    DESCR          SEQ
C001    2016-01-01 Inquiry          0
C001    2016-01-01 Inquiry          0
C001    2016-01-10 Sales            1
C001    2016-01-21 Repeat Inquiry   2
C001    2016-02-10 Repeat Inquiry   3
C001    2016-05-01 Sales            1
C001    2016-05-06 Sales            2
C001    2016-06-10 Inquiry 1        3
C001    2016-09-01 Inquiry 2        0
C002    2016-02-01 Inquiry 1        0
C002    2016-02-25 Inquiry 2        0
C003    2016-02-01 Inquiry 2        0
C003    2016-02-10 Sales            1
C003    2016-02-10 Sales            2
C003    2016-03-10 Inquiry 2        3
C004    2016-04-15 Sales            1

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) - count(inq.*) as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);
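As a cross-check of the pattern `(inq* sale1{0,1} more_tx*)`, the following Python sketch walks each partition the same way: leading non-sales rows get 0, the first sale gets 1, and rows within 40 days of that sale continue the count. It is an emulation under the slide's data, not Oracle's matching algorithm; `DATA`, `parse` and `seq_from_sale` are names invented here.

```python
from datetime import date, timedelta
from itertools import groupby

DATA = """\
C001 2016-01-01 Inquiry
C001 2016-01-01 Inquiry
C001 2016-01-10 Sales
C001 2016-01-21 Repeat Inquiry
C001 2016-02-10 Repeat Inquiry
C001 2016-05-01 Sales
C001 2016-05-06 Sales
C001 2016-06-10 Inquiry 1
C001 2016-09-01 Inquiry 2
C002 2016-02-01 Inquiry 1
C002 2016-02-25 Inquiry 2
C003 2016-02-01 Inquiry 2
C003 2016-02-10 Sales
C003 2016-02-10 Sales
C003 2016-03-10 Inquiry 2
C004 2016-04-15 Sales"""


def parse(text):
    rows = []
    for line in text.splitlines():
        cust, d, descr = line.split(" ", 2)
        rows.append((cust, date.fromisoformat(d), descr))
    return rows


def seq_from_sale(rows, days=40):
    out = []
    for _, part in groupby(sorted(rows), key=lambda r: r[0]):
        part = list(part)
        j = 0
        while j < len(part):
            # inq*: rows that are not sales; COUNT(*) - COUNT(inq.*) stays 0
            while j < len(part) and part[j][2] != "Sales":
                out.append(part[j] + (0,))
                j += 1
            # sale1{0,1}: at most one sale opens the positive sequence
            if j < len(part):
                sale_date, seq = part[j][1], 1
                out.append(part[j] + (seq,))
                j += 1
                # more_tx*: any following row within 40 days of sale1
                while j < len(part) and part[j][1] <= sale_date + timedelta(days=days):
                    seq += 1
                    out.append(part[j] + (seq,))
                    j += 1
    return out


if __name__ == "__main__":
    for row in seq_from_sale(parse(DATA)):
        print(row[0], row[1].isoformat(), row[2], row[3])
```

Note that the C001 inquiry on 2016-09-01 gets 0 because no sale precedes it in its own match, which is the same reason `sale1.tx_date` is null in the SQL condition.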

Page 19: Advanced row pattern matching

Negative sequence for Inquiries within 10 days prior to Sale

Starting point: the previous query and its output.

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) - count(inq.*) as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);


Page 21: Advanced row pattern matching

Negative sequence for Inquiries within 10 days prior to Sale

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures case
      when classifier() = 'INQ'
       and tx_date >= final first(sale1.tx_date) - 10
      then count(inq.*) - final count(inq.*) - 1
      else count(*) - count(inq.*)
    end as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);

Page 22: Advanced row pattern matching

Negative sequence for Inquiries within 10 days prior to Sale

CUST_ID TX_DATE    DESCR          SEQ
C001    2016-01-01 Inquiry         -2
C001    2016-01-01 Inquiry         -1
C001    2016-01-10 Sales            1
C001    2016-01-21 Repeat Inquiry   2
C001    2016-02-10 Repeat Inquiry   3
C001    2016-05-01 Sales            1
C001    2016-05-06 Sales            2
C001    2016-06-10 Inquiry 1        3
C001    2016-09-01 Inquiry 2        0
C002    2016-02-01 Inquiry 1        0
C002    2016-02-25 Inquiry 2        0
C003    2016-02-01 Inquiry 2       -1
C003    2016-02-10 Sales            1
C003    2016-02-10 Sales            2
C003    2016-03-10 Inquiry 2        3
C004    2016-04-15 Sales            1

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures case
      when classifier() = 'INQ'
       and tx_date >= final first(sale1.tx_date) - 10
      then count(inq.*) - final count(inq.*) - 1
      else count(*) - count(inq.*)
    end as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);
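The FINAL keyword is what makes the negative numbering possible: an inquiry row needs to know the sale date and the total number of inquiries, both of which are only known once the match is complete. The Python sketch below emulates that two-phase idea (collect the match, then compute the measure). It is an illustration only; `DATA`, `parse` and `seq_with_negatives` are invented names, and only the C001 and C003 rows are included.

```python
from datetime import date, timedelta
from itertools import groupby

DATA = """\
C001 2016-01-01 Inquiry
C001 2016-01-01 Inquiry
C001 2016-01-10 Sales
C001 2016-01-21 Repeat Inquiry
C001 2016-02-10 Repeat Inquiry
C001 2016-05-01 Sales
C001 2016-05-06 Sales
C001 2016-06-10 Inquiry 1
C001 2016-09-01 Inquiry 2
C003 2016-02-01 Inquiry 2
C003 2016-02-10 Sales
C003 2016-02-10 Sales
C003 2016-03-10 Inquiry 2"""


def parse(text):
    rows = []
    for line in text.splitlines():
        cust, d, descr = line.split(" ", 2)
        rows.append((cust, date.fromisoformat(d), descr))
    return rows


def seq_with_negatives(rows, after=40, before=10):
    out = []
    for _, part in groupby(sorted(rows), key=lambda r: r[0]):
        part = list(part)
        i = 0
        while i < len(part):
            # Phase 1: collect one match of (inq* sale1{0,1} more_tx*)
            j = i
            while j < len(part) and part[j][2] != "Sales":
                j += 1
            inq, sale, more = part[i:j], None, []
            if j < len(part):
                sale = part[j]
                j += 1
                while j < len(part) and part[j][1] <= sale[1] + timedelta(days=after):
                    more.append(part[j])
                    j += 1
            # Phase 2: measures; FINAL values need the completed match
            for n, row in enumerate(inq, start=1):        # n = running COUNT(inq.*)
                near = sale is not None and row[1] >= sale[1] - timedelta(days=before)
                # COUNT(inq.*) - FINAL COUNT(inq.*) - 1, else 0
                out.append(row + ((n - len(inq) - 1) if near else 0,))
            if sale is not None:
                out.append(sale + (1,))
                for n, row in enumerate(more, start=2):
                    out.append(row + (n,))
            i = j
    return out


if __name__ == "__main__":
    for row in seq_with_negatives(parse(DATA)):
        print(row[0], row[1].isoformat(), row[2], row[3])
```

For C001 the two inquiries on 2016-01-01 fall within 10 days of the sale on 2016-01-10, so they are numbered -2 and -1, matching the output above.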

Page 23: Advanced row pattern matching


Page 24: Advanced row pattern matching

Hierarchical Summary: get salaries of mgr + subordinates

select level lvl, ename, sal
from scott.emp
start with mgr is null
connect by mgr = prior empno;

LVL ENAME  SAL
1   KING   5000
2   JONES  2975
3   SCOTT  3000
4   ADAMS  1100
3   FORD   3000
4   SMITH  800
2   BLAKE  2850
3   ALLEN  1600
3   WARD   1250
3   MARTIN 1250
3   TURNER 1500
3   JAMES  950
2   CLARK  2450
3   MILLER 1300

Page 25: Advanced row pattern matching

Hierarchical Summary: get salaries of mgr + subordinates

select * from (
  select level lvl, ename, sal
  from scott.emp
  start with mgr is null
  connect by mgr = prior empno
)
match_recognize(
  measures a.lvl lvl, a.ename ename, a.sal sal,
           sum(sal) as sum_sal
  pattern(a b*)
  define b as lvl > a.lvl
);

Page 26: Advanced row pattern matching

Hierarchical Summary: get salaries of mgr + subordinates

LVL ENAME SAL  SUM_SAL
1   KING  5000 29025

select * from (
  select level lvl, ename, sal
  from scott.emp
  start with mgr is null
  connect by mgr = prior empno
)
match_recognize(
  measures a.lvl lvl, a.ename ename, a.sal sal,
           sum(sal) as sum_sal
  pattern(a b*)
  define b as lvl > a.lvl
);

Page 27: Advanced row pattern matching

Hierarchical Summary: get salaries of mgr + subordinates

LVL ENAME SAL  SUM_SAL
1   KING  5000 29025

select * from (
  select level lvl, ename, sal
  from scott.emp
  start with mgr is null
  connect by mgr = prior empno
)
match_recognize(
  measures a.lvl lvl, a.ename ename, a.sal sal,
           sum(sal) as sum_sal
  after match skip past last row
  pattern(a b*)
  define b as lvl > a.lvl
);


Page 29: Advanced row pattern matching

Hierarchical Summary: get salaries of mgr + subordinates

LVL ENAME  SAL  SUM_SAL
1   KING   5000 29025
2   JONES  2975 10875
3   SCOTT  3000 4100
4   ADAMS  1100 1100
3   FORD   3000 3800
4   SMITH  800  800
2   BLAKE  2850 9400
3   ALLEN  1600 1600
3   WARD   1250 1250
3   MARTIN 1250 1250
3   TURNER 1500 1500
3   JAMES  950  950
2   CLARK  2450 3750
3   MILLER 1300 1300

select * from (
  select level lvl, ename, sal
  from scott.emp
  start with mgr is null
  connect by mgr = prior empno
)
match_recognize(
  measures a.lvl lvl, a.ename ename, a.sal sal,
           sum(sal) as sum_sal
  after match skip to next row
  pattern(a b*)
  define b as lvl > a.lvl
);

http://www.kibeha.dk/2015/07/row-pattern-matching-nested-within.html
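The effect of `after match skip to next row` can be emulated in a few lines of Python: every row starts its own match, and the match extends over the following rows as long as their level is deeper than the starting row's level. This is a sketch of the logic only, assuming the rows arrive in the hierarchical order shown on the slide; `EMP` and `subtree_salaries` are names invented for illustration.

```python
# (lvl, ename, sal) in the order returned by the hierarchical query on the slide
EMP = [
    (1, "KING", 5000), (2, "JONES", 2975), (3, "SCOTT", 3000), (4, "ADAMS", 1100),
    (3, "FORD", 3000), (4, "SMITH", 800), (2, "BLAKE", 2850), (3, "ALLEN", 1600),
    (3, "WARD", 1250), (3, "MARTIN", 1250), (3, "TURNER", 1500), (3, "JAMES", 950),
    (2, "CLARK", 2450), (3, "MILLER", 1300),
]


def subtree_salaries(rows):
    out = []
    # SKIP TO NEXT ROW: every row is tried as the start (A) of a match
    for i, (lvl, ename, sal) in enumerate(rows):
        total = sal
        # B*: following rows while lvl > a.lvl, i.e. the subordinates of row A
        for sub_lvl, _, sub_sal in rows[i + 1:]:
            if sub_lvl <= lvl:
                break
            total += sub_sal
        out.append((lvl, ename, sal, total))
    return out


if __name__ == "__main__":
    for lvl, ename, sal, sum_sal in subtree_salaries(EMP):
        print(lvl, ename, sal, sum_sal)
```

With the default `skip past last row`, only the first match (KING and everyone below) would be produced, because the first match consumes all rows.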

Page 30: Advanced row pattern matching

Anchors and Alternation
• Anchors
  – ^ matches the position before the first row in the partition.
  – $ matches the position after the last row in the partition.
    PATTERN(^ A $) = partition must have 1 row
• Alternation: | means OR
  – "Alternatives are preferred in the order they are specified."
    PATTERN ( A | B ) = If A condition is true then A, else if B condition is true then B

Page 31: Advanced row pattern matching

JOIN alternative: CDC compare

T1
PK VAL
1  Same value
2  Delete this
3  Old value

T2
PK VAL
1  Same value
3  New value
4  Insert this

select pk, op, val, oldrid from (
  select pk, val, rowid rid from t1
  union all
  select pk, val, null from t2
)
match_recognize(
  partition by pk
  order by rid
  measures classifier() op, first(rid) oldrid
  all rows per match
  pattern(^ D $ | ^ I $ | (^ O U $) )
  define D as rid is not null,
         U as decode(O.val, val, 0, 1) = 1
);

PK OP VAL         OLDRID
2  D  Delete this AAAkdlAAH…MAAB
3  O  Old value   AAAkdlAAH…MAAC
3  U  New value   AAAkdlAAH…MAAC
4  I  Insert this
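The three alternatives in the pattern correspond to three cases per primary key, which the following Python sketch spells out. It is an emulation of the classification only (no ROWIDs), assuming PK is unique within each table; `T1`, `T2` and `cdc_compare` are illustrative names.

```python
T1 = {1: "Same value", 2: "Delete this", 3: "Old value"}    # old version
T2 = {1: "Same value", 3: "New value", 4: "Insert this"}    # new version


def cdc_compare(old, new):
    ops = []
    for pk in sorted(old.keys() | new.keys()):               # PARTITION BY pk
        in_old, in_new = pk in old, pk in new
        if in_old and not in_new:
            ops.append((pk, "D", old[pk]))                   # ^ D $ : lone row from T1
        elif in_new and not in_old:
            ops.append((pk, "I", new[pk]))                   # ^ I $ : lone row from T2
        elif old[pk] != new[pk]:
            ops.append((pk, "O", old[pk]))                   # ^ O U $ : old row, then
            ops.append((pk, "U", new[pk]))                   # a new row with a changed value
        # Two identical rows match none of the alternatives and are not output
    return ops


if __name__ == "__main__":
    for pk, op, val in cdc_compare(T1, T2):
        print(pk, op, val)
```

PK 1 produces no output because its two rows are identical, so the U condition is false and the anchored pattern cannot match a two-row partition any other way.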

Page 32: Advanced row pattern matching

(Almost) All Rows per Match

• PATTERN ( A {- B A -} B)
  – The parts of the pattern enclosed between {- and -} are excluded from the output.
  – Here only two rows per match will be returned
  – More granular than using a WHERE clause

Page 33: Advanced row pattern matching

Avoid Inequality joins

> create table t(dte not null) as
  select sysdate + level
  from dual connect by level <= 10000;

> create table u(start_dte, end_dte) as
  select dte, dte+1/4 from t;

> select count(*) from t, u
  where t.dte = u.start_dte;

Elapsed: 00:00:00.039

> select count(*) from t, u
  where t.dte between u.start_dte and u.end_dte;

Elapsed: 00:00:09.132

Exadata? All data in buffer cache. Elapsed: 00:00:09.132
InMemory? Elapsed: 00:00:07.021

Page 34: Advanced row pattern matching

Avoid Inequality joins

Equality join:

----------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      1 |00:00:00.01 |      42 |
|   1 |  SORT AGGREGATE      |      |      1 |      1 |      1 |00:00:00.01 |      42 |
|*  2 |   HASH JOIN          |      |      1 |  10000 |  10000 |00:00:00.01 |      42 |
|   3 |    TABLE ACCESS FULL | T    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
|   4 |    TABLE ACCESS FULL | U    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
----------------------------------------------------------------------------------------

BETWEEN join:

----------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |      1 |        |      1 |00:00:14.02 |      42 |
|   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:14.02 |      42 |
|   2 |   MERGE JOIN          |      |      1 |  2500K |  10000 |00:00:13.93 |      42 |
|   3 |    SORT JOIN          |      |      1 |  10000 |  10000 |00:00:00.01 |      21 |
|   4 |     TABLE ACCESS FULL | T    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
|*  5 |    FILTER             |      |  10000 |        |  10000 |00:00:14.00 |      21 |
|*  6 |     SORT JOIN         |      |  10000 |  10000 |    50M |00:00:54.86 |      21 |
|   7 |      TABLE ACCESS FULL| U    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
----------------------------------------------------------------------------------------

Page 35: Advanced row pattern matching

Avoid Inequality joins

select count(*) from (
  select start_dte, end_dte from u
  union all
  select dte, null from t
)
match_recognize (
  order by start_dte, end_dte
  all rows per match
  pattern({-u-} t+)
  define u as end_dte is not null,
         t as start_dte < u.end_dte
);

Elapsed: 00:00:00.037

-- Works because no overlaps in U range
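The reason this is fast is that one sorted pass replaces the row-by-row range comparison. A rough Python sketch of the same idea follows: merge ranges and points into one ordered stream, remember the most recent range, and count the points that fall before its end. It uses plain integers instead of dates and assumes, like the slide, that the ranges do not overlap; `count_in_ranges` is an illustrative name.

```python
def count_in_ranges(points, ranges):
    """Count points p with start <= p < end for some range, in one sorted pass.

    Assumes the ranges do not overlap, as on the slide.
    """
    # UNION ALL of both inputs; tag 0 sorts a range before a point with the
    # same start, mirroring ORDER BY start_dte, end_dte (NULLs last).
    events = [(start, 0, end) for start, end in ranges]
    events += [(p, 1, None) for p in points]
    events.sort(key=lambda e: (e[0], e[1]))

    count, current_end = 0, None
    for start, kind, end in events:
        if kind == 0:
            current_end = end                     # u: remember the latest range
        elif current_end is not None and start < current_end:
            count += 1                            # t: start_dte < u.end_dte
    return count


if __name__ == "__main__":
    ranges = [(0, 3), (10, 13)]
    points = [0, 1, 2, 5, 10, 12, 13, 20]
    print(count_in_ranges(points, ranges))
```

If ranges could overlap, remembering only the latest range would miss points covered by an earlier, longer range, which is why the slide's caveat matters.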

Page 36: Advanced row pattern matching

Child's play

Page 37: Advanced row pattern matching

Solving Problems with pattern matching

• Clear knowledge of input & requirement
  – Beware of assumptions
• Identify typical problems and solutions
  – Consecutive sequences
  – "Start of Group"
  – Bin fitting
  – Ranges (see "Ranges, ranges everywhere!" tomorrow)
• Visualize the data processing flow
  – Intermediate results helpful / required?

Page 38: Advanced row pattern matching

Meet Your Match
Advanced row pattern matching (12c)

@StewAshton (stewashton.wordpress.com)
UKOUG Tech 2016