Analytic Functions

  • Upload
    ayisha


Page 1: Analytic Functions
Page 2: Analytic Functions

Note: the “standard” name is “window” functions

When? – Since Oracle 8i

Why? – Simple solutions to complex problems

Why exactly? – Advanced ranking, aggregation, row comparison, statistics, “what if” scenarios

Order of evaluation in SQL: just before the final ORDER BY clause

Page 3: Analytic Functions

Some of the things that are hard to do in SQL without analytic functions:

  • Calculate a running total

  • Find percentages within a group

  • Top-N queries

  • Compute a moving average

  • Perform ranking queries
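As an illustration of the "percentages within a group" item, here is a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle so that it can be run anywhere; the five rows are sample EMP data and the alias pct_of_dept is made up for the example.

```python
import sqlite3

# Sample rows (deptno, ename, sal); an assumption for illustration only.
rows = [(20, "SMITH", 800), (30, "ALLEN", 1600), (30, "WARD", 1250),
        (20, "JONES", 2975), (30, "MARTIN", 1250)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# Each salary as a percentage of its department total, without a self-join.
pct = conn.execute("""
    SELECT ename, deptno,
           ROUND(100.0 * sal / SUM(sal) OVER (PARTITION BY deptno), 1) AS pct_of_dept
    FROM emp
    ORDER BY deptno, ename
""").fetchall()

for row in pct:
    print(row)
```

Without the OVER clause the same result would need a join back to a per-department GROUP BY subquery.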

Page 4: Analytic Functions

Analytic functions compute an aggregate value based on a group of rows.

The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined.

The window determines the range of rows used to perform the calculations for the current row.

Page 5: Analytic Functions

Analytic-Function(<Argument>, <Argument>, ...) OVER (<Query-Partition-Clause> <Order-By-Clause> <Windowing-Clause>)

PARTITION BY – divides the result set into groups (partitions); the rows are not collapsed

ORDER BY – orders the rows within each partition

WINDOWING – ROWS (physical offset) or RANGE (logical offset)

Page 6: Analytic Functions

SELECT empno, ename, deptno, sal
FROM emp

EMPNO  ENAME   DEPTNO   SAL
-----  ------  ------  ----
 7369  SMITH       20   800
 7499  ALLEN       30  1600
 7521  WARD        30  1250
 7566  JONES       20  2975
 7654  MARTIN      30  1250

How are analytic functions different from group or aggregate functions?

SELECT deptno, sum(sal) sal
FROM emp
GROUP BY deptno

DEPTNO   SAL
------  ----
    30  4100
    20  3775

SELECT deptno, sum(sal) over () sal
FROM emp

DEPTNO   SAL
------  ----
    20  7875
    30  7875
    30  7875
    20  7875
    30  7875
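The contrast can be reproduced with a small runnable sketch. It assumes SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) in place of Oracle, loaded with the five sample rows shown on this slide; the variable names grouped and windowed are only for the example.

```python
import sqlite3

# The five sample rows from the slide: (empno, ename, deptno, sal).
rows = [(7369, "SMITH", 20, 800), (7499, "ALLEN", 30, 1600),
        (7521, "WARD", 30, 1250), (7566, "JONES", 20, 2975),
        (7654, "MARTIN", 30, 1250)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", rows)

# Aggregate: GROUP BY collapses the input to one row per department.
grouped = conn.execute(
    "SELECT deptno, SUM(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# Analytic: OVER () keeps every input row and attaches the grand total to each.
windowed = conn.execute(
    "SELECT deptno, SUM(sal) OVER () FROM emp ORDER BY empno"
).fetchall()

print(grouped)   # two rows, one per department
print(windowed)  # five rows, each carrying the total 7875
```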

Page 7: Analytic Functions

How are analytic functions different from group or aggregate functions?

SELECT deptno, COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;

DEPTNO  DEPT_COUNT
------  ----------
    20           2
    30           3

2 rows selected

SELECT empno, deptno, COUNT(*) OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

EMPNO  DEPTNO  DEPT_COUNT
-----  ------  ----------
 7369      20           2
 7566      20           2
 7499      30           3
 7900      30           3
 7844      30           3

5 rows selected.

Page 8: Analytic Functions

SELECT ename, deptno, sal,
       sum(sal) over () Tot,
       sum(sal) over (order by deptno, ename) Run_Tot,
       sum(sal) over (partition by deptno order by ename) Dept_Tot,
       row_number() over (partition by deptno order by ename) Seq
FROM emp
ORDER BY deptno, ename;

ENAME   DEPTNO   SAL    TOT  RUN_TOT  DEPT_TOT  SEQ
------  ------  ----  -----  -------  --------  ---
CLARK       10  2450  29025     2450      2450    1
KING        10  5000  29025     7450      7450    2
MILLER      10  1300  29025     8750      8750    3
ADAMS       20  1100  29025     9850      1100    1
FORD        20  3000  29025    12850      4100    2
JONES       20  2975  29025    15825      7075    3
SCOTT       20  3000  29025    18825     10075    4
SMITH       20   800  29025    19625     10875    5
ALLEN       30  1600  29025    21225      1600    1
BLAKE       30  2850  29025    24075      4450    2
JAMES       30   950  29025    25025      5400    3
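A runnable sketch of the same query, assuming SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle. Only five of the EMP rows are loaded, so the grand total is 12850 here rather than 29025.

```python
import sqlite3

# Five sample rows (ename, deptno, sal); a subset chosen for illustration.
rows = [("CLARK", 10, 2450), ("KING", 10, 5000), ("MILLER", 10, 1300),
        ("ADAMS", 20, 1100), ("FORD", 20, 3000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

totals = conn.execute("""
    SELECT ename, deptno, sal,
           SUM(sal) OVER ()                                     AS tot,      -- grand total
           SUM(sal) OVER (ORDER BY deptno, ename)               AS run_tot,  -- running total
           SUM(sal) OVER (PARTITION BY deptno ORDER BY ename)   AS dept_tot, -- restarts per dept
           ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS seq
    FROM emp
    ORDER BY deptno, ename
""").fetchall()

for row in totals:
    print(row)
```

Adding ORDER BY inside OVER turns the plain total into a running total, because the default window then ends at the current row.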

Page 9: Analytic Functions

How do analytic functions work, and when can they be used?

Analytic functions are computed after all joins and after the WHERE, GROUP BY and HAVING clauses of the query have been applied.

The main ORDER BY clause of the query operates after the analytic functions.

So analytic functions can only appear in the select list and in the main ORDER BY clause of the query.
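One practical consequence is that a condition on an analytic function has to go in an outer query. The sketch below shows this for a Top-N query (highest-paid employee per department). It assumes SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle; the alias rn and the variable names are made up for the example.

```python
import sqlite3

# Sample rows (ename, deptno, sal); an assumption for illustration only.
rows = [("SMITH", 20, 800), ("ALLEN", 30, 1600), ("WARD", 30, 1250),
        ("JONES", 20, 2975), ("MARTIN", 30, 1250)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# An analytic function in WHERE is rejected: WHERE is applied before it is computed.
try:
    conn.execute(
        "SELECT ename FROM emp "
        "WHERE ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) = 1"
    ).fetchall()
    rejected_in_where = False
except sqlite3.Error:
    rejected_in_where = True

# Compute the function in an inline view, then filter on it in the outer query.
top_paid = conn.execute("""
    SELECT ename, deptno, sal
    FROM (SELECT ename, deptno, sal,
                 ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rn
          FROM emp)
    WHERE rn = 1
    ORDER BY deptno
""").fetchall()

print(rejected_in_where)
print(top_paid)
```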

Page 10: Analytic Functions

ROW_NUMBER()
LAG()
LEAD()
MIN()
MAX()
RANK()
DENSE_RANK()
SUM()
AVG()
FIRST_VALUE()
LAST_VALUE()
FIRST()
LAST()

Page 11: Analytic Functions

ROW_NUMBER FUNCTION

ROW_NUMBER( ) assigns a running serial number to the rows of a partition. It is very useful in reporting, especially where each partition needs its own serial numbers.

SELECT empno, deptno, hiredate,
       ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO

EMPNO  DEPTNO  HIREDATE   SRLNO
-----  ------  ---------  -----
 7782      10  09-JUN-81      1
 7839      10  17-NOV-81      2
 7934      10  23-JAN-82      3
 7369      20  17-DEC-80      1
 7566      20  02-APR-81      2
 7902      20  03-DEC-81      3
 7788      20  09-DEC-82      4
 7876      20  12-JAN-83      5
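A runnable sketch of per-partition numbering, assuming SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle. Hire dates are stored as ISO text so that they sort correctly, and NULLS LAST is left out because the sample has no NULL dates; both are choices made for this example.

```python
import sqlite3

# Sample rows (empno, deptno, hiredate as ISO text).
rows = [(7782, 10, "1981-06-09"), (7839, 10, "1981-11-17"), (7934, 10, "1982-01-23"),
        (7369, 20, "1980-12-17"), (7566, 20, "1981-04-02")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, hiredate TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# The serial number restarts at 1 for every department.
numbered = conn.execute("""
    SELECT empno, deptno,
           ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate) AS srlno
    FROM emp
    ORDER BY deptno, srlno
""").fetchall()

for row in numbered:
    print(row)
```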

Page 12: Analytic Functions

RANK and DENSE_RANK FUNCTIONS

SELECT empno, deptno, sal,
       RANK() OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) RANK,
       DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO  DEPTNO   SAL  RANK  DENSE_RANK
-----  ------  ----  ----  ----------
 7839      10  5000     1           1
 7782      10  2450     2           2
 7934      10  1300     3           3
 7788      20  3000     1           1
 7902      20  3000     1           1
 7566      20  2975     3           2
 7876      20  1100     4           3
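The difference between the two functions shows up only when there are ties. A minimal sketch, assuming SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle; the aliases rnk and dense_rnk are made up for the example.

```python
import sqlite3

# Department 20 sample rows (empno, sal); two employees tie at 3000.
rows = [(7788, 3000), (7902, 3000), (7566, 2975), (7876, 1100)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)

ranked = conn.execute("""
    SELECT empno, sal,
           RANK()       OVER (ORDER BY sal DESC) AS rnk,        -- leaves a gap after ties
           DENSE_RANK() OVER (ORDER BY sal DESC) AS dense_rnk   -- no gap after ties
    FROM emp
    ORDER BY sal DESC, empno
""").fetchall()

for row in ranked:
    print(row)
```

After the two-way tie at rank 1, RANK continues at 3 while DENSE_RANK continues at 2.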

Page 13: Analytic Functions

LEAD and LAG FUNCTIONS

LEAD evaluates an expression on a row that comes after the current row, at the given offset, and returns the value to the current row. If no such row exists, the default is returned.

LEAD (<sql_expr>, <offset>, <default>) OVER (<analytic_clause>)

The syntax of LAG is similar except that the offset for LAG goes into the previous rows.

SELECT deptno, empno, sal,
       LEAD(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
       LAG(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

DEPTNO  EMPNO   SAL  NEXT_LOWER_SAL  PREV_HIGHER_SAL
------  -----  ----  --------------  ---------------
    10   7839  5000            2450                0
    10   7782  2450            1300             5000
    10   7934  1300               0             2450
    20   7788  3000            3000                0
    20   7902  3000            2975             3000
    20   7566  2975            1100             3000
    20   7876  1100             800             2975
    20   7369   800               0             1100
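A runnable sketch of LEAD and LAG with an offset of 1 and a default of 0, assuming SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle. The rows are a small sample without salary ties.

```python
import sqlite3

# Sample rows (deptno, empno, sal).
rows = [(10, 7839, 5000), (10, 7782, 2450), (10, 7934, 1300),
        (20, 7566, 2975), (20, 7876, 1100)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, empno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

neighbours = conn.execute("""
    SELECT deptno, empno, sal,
           LEAD(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal DESC) AS next_lower_sal,
           LAG(sal, 1, 0)  OVER (PARTITION BY deptno ORDER BY sal DESC) AS prev_higher_sal
    FROM emp
    ORDER BY deptno, sal DESC
""").fetchall()

for row in neighbours:
    print(row)
```

The default 0 appears at the partition edges: the top earner has no previous row and the lowest earner has no next row.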

Page 14: Analytic Functions

FIRST_VALUE and LAST_VALUE FUNCTIONS

The FIRST_VALUE analytic function picks the first record of the window after the ORDER BY has been applied. The <sql_expr> is computed on the columns of this first record and the result is returned. The LAST_VALUE function is used in a similar context except that it acts on the last record of the window. Note that with an ORDER BY and no windowing clause the window ends at the current row, so LAST_VALUE needs an explicit window to reach the end of the partition.

FIRST_VALUE(<sql_expr>) OVER (<analytic_clause>)

SELECT empno, deptno,
       hiredate - FIRST_VALUE(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

EMPNO  DEPTNO  DAY_GAP
-----  ------  -------
 7369      20        0
 7566      20      106
 7902      20      351
 7788      20      722
 7876      20      756
 7499      30        0
 7521      30        2
 7698      30       70
 7844      30      200
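A runnable sketch of the day-gap calculation, plus LAST_VALUE with and without an explicit window. It assumes SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle; julianday() is SQLite's way of subtracting dates, whereas Oracle subtracts DATE values directly.

```python
import sqlite3

# Department 20 sample rows (empno, hiredate as ISO text).
rows = [(7369, "1980-12-17"), (7566, "1981-04-02"), (7902, "1981-12-03")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, hiredate TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)

gaps = conn.execute("""
    SELECT empno,
           -- days since the earliest hire
           CAST(julianday(hiredate)
                - julianday(FIRST_VALUE(hiredate) OVER (ORDER BY hiredate))
                AS INTEGER) AS day_gap,
           -- default window ends at the current row
           LAST_VALUE(hiredate) OVER (ORDER BY hiredate) AS last_default,
           -- explicit window reaches the end of the partition
           LAST_VALUE(hiredate) OVER (ORDER BY hiredate
                                      ROWS BETWEEN UNBOUNDED PRECEDING
                                               AND UNBOUNDED FOLLOWING) AS last_full
    FROM emp
    ORDER BY hiredate
""").fetchall()

for row in gaps:
    print(row)
```

With the default window, LAST_VALUE simply echoes the current row's hire date; only the explicit window returns the latest hire date for every row.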

Page 15: Analytic Functions

MIN and MAX FUNCTIONS

MAX returns the maximum value of expr. MIN returns the minimum value of expr.

SELECT manager_id, last_name, salary,
       MAX(salary) OVER (PARTITION BY manager_id) AS mgr_max
FROM employees;

MANAGER_ID  LAST_NAME  SALARY  MGR_MAX
----------  ---------  ------  -------
       100  Kochhar     17000    17000
       100  De Haan     17000    17000
       100  Raphaely    11000    17000
       100  Kaufling     7900    17000
       100  Fripp        8200    17000
       100  Weiss        8000    17000
. . .
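A runnable sketch showing MAX and MIN side by side over the same partition, assuming SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle. Four of the rows above are loaded; the alias mgr_min is made up for the example.

```python
import sqlite3

# Sample rows (manager_id, last_name, salary).
rows = [(100, "Kochhar", 17000), (100, "De Haan", 17000),
        (100, "Raphaely", 11000), (100, "Weiss", 8000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (manager_id INTEGER, last_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# Every row keeps its own salary and also sees the extremes of its manager's group.
extremes = conn.execute("""
    SELECT last_name, salary,
           MAX(salary) OVER (PARTITION BY manager_id) AS mgr_max,
           MIN(salary) OVER (PARTITION BY manager_id) AS mgr_min
    FROM employees
    ORDER BY salary DESC, last_name
""").fetchall()

for row in extremes:
    print(row)
```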

Page 16: Analytic Functions

WINDOW CLAUSE

The windowing clause further restricts the rows within a partition over which the analytic function is computed.

[ROWS | RANGE] BETWEEN <start_expr> AND <end_expr>

<start_expr> can be any one of the following
  • UNBOUNDED PRECEDING
  • CURRENT ROW
  • <sql_expr> PRECEDING or FOLLOWING

<end_expr> can be any one of the following
  • UNBOUNDED FOLLOWING
  • CURRENT ROW
  • <sql_expr> PRECEDING or FOLLOWING

UNBOUNDED PRECEDING is valid only as <start_expr>; UNBOUNDED FOLLOWING is valid only as <end_expr>.

Page 17: Analytic Functions

There are two types of window clauses

1. ROW Type Windows
Syntax: Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2,..> ROWS BETWEEN <start_expr> AND <end_expr>)

(or)

Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2,..> ROWS [<start_expr> PRECEDING or UNBOUNDED PRECEDING])

2. RANGE Type Windows
Syntax: Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE BETWEEN <start_expr> AND <end_expr>)

(or)

Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE [<start_expr> PRECEDING or UNBOUNDED PRECEDING])

In the short form (without BETWEEN) the window ends at the current row.

Page 18: Analytic Functions

ROW Type Example

SELECT id, sal FROM emp

ID   SAL
--  ----
01  1000
02  2000
03  3000
04  1000
05  2000

SELECT id, sal,
       AVG(sal) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) tot_avg
FROM emp

ID   SAL  TOT_AVG
--  ----  -------
01  1000     1500
02  2000     2000
03  3000     2000
04  1000     2000
05  2000     1500
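The moving average can be checked with a runnable sketch, assuming SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle, with the five id/sal rows from this slide.

```python
import sqlite3

# The five rows from the slide: (id, sal).
rows = [(1, 1000), (2, 2000), (3, 3000), (4, 1000), (5, 2000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)

# Average over the previous, current and next row (a 3-row physical window).
moving = conn.execute("""
    SELECT id, sal,
           AVG(sal) OVER (ORDER BY id
                          ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS tot_avg
    FROM emp
    ORDER BY id
""").fetchall()

for row in moving:
    print(row)
```

The first and last rows have only one neighbour, so their average is taken over two rows: (1000 + 2000) / 2 = 1500 in both cases.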

Page 19: Analytic Functions

RANGE Type Example

SELECT ename, sal, hiredate, hiredate-50 "50_days_prior",
       first_value(ename) over (order by hiredate asc range 50 preceding) first_ename,
       first_value(hiredate) over (order by hiredate asc range 50 preceding) first_hdate
FROM emp
ORDER BY hiredate ASC

Page 20: Analytic Functions

ENAME    SAL  HIREDATE   50_days_prior  FIRST_ENAME  FIRST_HDATE
------  ----  ---------  -------------  -----------  -----------
SMITH    800  17-DEC-80  28-OCT-80      SMITH        17-DEC-80
ALLEN   1600  20-FEB-81  01-JAN-81      ALLEN        20-FEB-81
WARD    1250  22-FEB-81  03-JAN-81      ALLEN        20-FEB-81
JONES   2975  02-APR-81  11-FEB-81      ALLEN        20-FEB-81
BLAKE   2850  01-MAY-81  12-MAR-81      JONES        02-APR-81
CLARK   2450  09-JUN-81  20-APR-81      BLAKE        01-MAY-81
TURNER  1500  08-SEP-81  20-JUL-81      TURNER       08-SEP-81
MARTIN  1250  28-SEP-81  09-AUG-81      TURNER       08-SEP-81
KING    5000  17-NOV-81  28-SEP-81      MARTIN       28-SEP-81
FORD    3000  03-DEC-81  14-OCT-81      KING         17-NOV-81
JAMES    950  03-DEC-81  14-OCT-81      KING         17-NOV-81
MILLER  1300  23-JAN-82  04-DEC-81      MILLER       23-JAN-82
SCOTT   3000  19-APR-87  28-FEB-87      SCOTT        19-APR-87
ADAMS   1100  23-MAY-87  03-APR-87      SCOTT        19-APR-87
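A runnable sketch of a RANGE window, using the first five rows of the output above. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for Oracle. SQLite needs version 3.28 or later for RANGE with a numeric offset and requires a numeric ORDER BY expression, so the ISO date text is converted with julianday(); Oracle accepts the DATE column directly.

```python
import sqlite3

# First five rows of the slide's output: (ename, hiredate as ISO text).
rows = [("SMITH", "1980-12-17"), ("ALLEN", "1981-02-20"), ("WARD", "1981-02-22"),
        ("JONES", "1981-04-02"), ("BLAKE", "1981-05-01")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, hiredate TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)

# For each employee: the earliest hire within the 50 days up to their own hire date.
earliest = conn.execute("""
    SELECT ename, hiredate,
           FIRST_VALUE(ename) OVER (ORDER BY julianday(hiredate)
                                    RANGE 50 PRECEDING) AS first_ename
    FROM emp
    ORDER BY hiredate
""").fetchall()

for row in earliest:
    print(row)
```

Unlike ROWS, the window size here depends on the values: JONES sees ALLEN (41 days earlier), while BLAKE's 50-day window reaches back only to JONES.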

Page 21: Analytic Functions

Compared with equivalent self-joins or correlated subqueries, analytic functions typically:

Process the result set using fewer resources

Need fewer logical I/Os

Run in less time

Are easier to code

Page 22: Analytic Functions

Oracle calls the concept of filling in missing data with partitioned outer joins “data densification”. The partitioned outer join (LEFT OUTER JOIN with a PARTITION BY clause) and KEEP are listed among the new features in 10g.

LISTAGG and NTH_VALUE are new features available in 11g.

The LEAD and LAG functions were improved in 11g with the addition of the IGNORE NULLS option.
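Of the features named on this slide, NTH_VALUE is the one that can be sketched portably. The example assumes SQLite (through Python's standard sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle 11g and finds the second-highest-paid employee per department; the alias second_paid and the sample rows are made up for the example.

```python
import sqlite3

# Sample rows (ename, deptno, sal) without salary ties.
rows = [("KING", 10, 5000), ("CLARK", 10, 2450),
        ("JONES", 20, 2975), ("ADAMS", 20, 1100), ("SMITH", 20, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# The window must span the whole partition, otherwise the second row is not
# yet visible when the first row is being processed.
second = conn.execute("""
    SELECT DISTINCT deptno,
           NTH_VALUE(ename, 2) OVER (PARTITION BY deptno ORDER BY sal DESC
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND UNBOUNDED FOLLOWING) AS second_paid
    FROM emp
    ORDER BY deptno
""").fetchall()

print(second)
```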

Page 23: Analytic Functions