Analytic Functions
Introduced in Oracle 8i, analytic functions, also known as windowing functions, allow developers to perform tasks in SQL that were previously confined to procedural languages.
Introduction
Analytic Function Syntax
query_partition_clause
order_by_clause
windowing_clause
Using Analytic Functions
Related articles.
RANK, DENSE_RANK, FIRST and LAST Analytic Functions
FIRST_VALUE and LAST_VALUE Analytic Functions
LAG and LEAD Analytic Functions
LISTAGG Analystic Function in 11g Release 2
Top-N Queries
Pattern Matching (MATCH_RECOGNIZE) in Oracle Database 12c Release 1 (12.1)
Introduction
Probably the easiest way to understand analytic functions is to start by looking at aggregate functions. An aggregate function, as the name suggests, aggregates data from several rows into a single result row. For example, we might use the AVG aggregate function to give us an average of all the employee salaries in the EMP table.
SELECT AVG(sal)
FROM emp;
AVG(SAL)
----------
2073.21429
SQL>
The GROUP BY clause allows us to apply aggregate functions to subsets of rows. For example, we might want to display the average salary for each department.
SELECT deptno, AVG(sal)
FROM emp
GROUP BY deptno
ORDER BY deptno;
DEPTNO AVG(SAL)
---------- ----------
10 2916.66667
20 2175
30 1566.66667
SQL>
In both cases, the aggregate function reduces the number of rows returned by the query.
Analytic functions also operate on subsets of rows, similar to aggregate functions in GROUP BY queries, but they do not reduce the number of rows returned by the query. For example, the following query reports the salary for each employee, along with the average salary of the employees within the department.
SET PAGESIZE 50
BREAK ON deptno SKIP 1 DUPLICATES
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM emp;
EMPNO DEPTNO SAL AVG_DEPT_SAL
---------- ---------- ---------- ------------
7782 10 2450 2916.66667
7839 10 5000 2916.66667
7934 10 1300 2916.66667
7566 20 2975 2175
7902 20 3000 2175
7876 20 1100 2175
7369 20 800 2175
7788 20 3000 2175
7521 30 1250 1566.66667
7844 30 1500 1566.66667
7499 30 1600 1566.66667
7900 30 950 1566.66667
7698 30 2850 1566.66667
7654 30 1250 1566.66667
14 rows selected.
SQL>
This time AVG is an analytic function, operating on the group of rows defined by the contents of the OVER clause. This group of rows is known as a window, which is why analytic functions are sometimes referred to as window[ing] functions. Notice how the AVG function is still reporting the departmental average, like it did in the GROUP BY query, but the result is present in each row, rather than reducing the total number of rows returned. This is because analytic functions are performed on a result set after all join, WHERE, GROUP BY and HAVING clauses are complete, but before the final ORDER BY operation is performed.
Analytic Function Syntax
There are some variations in the syntax of the individual analytic functions, but the basic syntax for an analytic function is as follows.
analytic_function([ arguments ]) OVER (analytic_clause)
The analytic_clause breaks down into the following optional elements.
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
The sub-elements of the analytic_clause each have their own syntax diagrams, shown here. Rather than repeat the syntax diagrams, the following sections describe what each section of the analytic_clause is used for.
query_partition_clause
The query_partition_clause divides the result set into partitions, or groups, of data. The operation of the analytic function is restricted to the boundary imposed by these partitions, similar to the way a GROUP BY clause affects the action of an aggregate function. If the query_partition_clause is omitted, the whole result set is treated as a single partition. The following query uses an empty OVER clause, so the average presented is based on all the rows of the result set.
CLEAR BREAKS
SELECT empno, deptno, sal,
AVG(sal) OVER () AS avg_sal
FROM emp;
EMPNO DEPTNO SAL AVG_SAL
---------- ---------- ---------- ----------
7369 20 800 2073.21429
7499 30 1600 2073.21429
7521 30 1250 2073.21429
7566 20 2975 2073.21429
7654 30 1250 2073.21429
7698 30 2850 2073.21429
7782 10 2450 2073.21429
7788 20 3000 2073.21429
7839 10 5000 2073.21429
7844 30 1500 2073.21429
7876 20 1100 2073.21429
7900 30 950 2073.21429
7902 20 3000 2073.21429
7934 10 1300 2073.21429
SQL>
If we change the OVER clause to include a query_partition_clause based on the department, the averages presented are specifically for the department the employee belongs too.
BREAK ON deptno SKIP 1 DUPLICATES
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM emp;
EMPNO DEPTNO SAL AVG_DEPT_SAL
---------- ---------- ---------- ------------
7782 10 2450 2916.66667
7839 10 5000 2916.66667
7934 10 1300 2916.66667
7566 20 2975 2175
7902 20 3000 2175
7876 20 1100 2175
7369 20 800 2175
7788 20 3000 2175
7521 30 1250 1566.66667
7844 30 1500 1566.66667
7499 30 1600 1566.66667
7900 30 950 1566.66667
7698 30 2850 1566.66667
7654 30 1250 1566.66667
SQL>
order_by_clause
The order_by_clause is used to order rows, or siblings, within a partition. So if an analytic function is sensitive to the order of the siblings in a partition you should include an order_by_clause. The following query uses the FIRST_VALUE function to return the first salary reported in each department. Notice we have partitioned the result set by the department, but there is no order_by_clause.
BREAK ON deptno SKIP 1 DUPLICATES
SELECT empno, deptno, sal,
FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno) AS first_sal_in_dept
FROM emp;
EMPNO DEPTNO SAL FIRST_SAL_IN_DEPT
---------- ---------- ---------- -----------------
7782 10 2450 2450
7839 10 5000 2450
7934 10 1300 2450
7566 20 2975 2975
7902 20 3000 2975
7876 20 1100 2975
7369 20 800 2975
7788 20 3000 2975
7521 30 1250 1250
7844 30 1500 1250
7499 30 1600 1250
7900 30 950 1250
7698 30 2850 1250
7654 30 1250 1250
SQL>
Now compare the values of the FIRST_SAL_IN_DEPT column when we include an order_by_clause to order the siblings by ascending salary.
SELECT empno, deptno, sal,
FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY sal ASC NULLS LAST) AS first_val_in_dept
FROM emp;
EMPNO DEPTNO SAL FIRST_VAL_IN_DEPT
---------- ---------- ---------- -----------------
7934 10 1300 1300
7782 10 2450 1300
7839 10 5000 1300
7369 20 800 800
7876 20 1100 800
7566 20 2975 800
7788 20 3000 800
7902 20 3000 800
7900 30 950 950
7654 30 1250 950
7521 30 1250 950
7844 30 1500 950
7499 30 1600 950
7698 30 2850 950
SQL>
In this case the "ASC NULLS LAST" keywords are unnecessary as ASC is the default for an order_by_clause and NULLS LAST is the default for ASC orders. When ordering by DESC, the default is NULLS FIRST.
It is important to understand how the order_by_clause affects display order. The order_by_clause is guaranteed to affect the order of the rows as they are processed by the analytic function, but it may not always affect the display order. As a result, you must always use a conventional ORDER BY clause in the query if display order is important. Do not rely on any implicit ordering done by the analytic function. Remember, the conventional ORDER BY clause is performed after the analytic processing, so it will always take precedence.
windowing_clause
We have seen previously the query_partition_clause controls the window, or group of rows, the analytic operates on. The windowing_clause gives some analytic functions a further degree of control over this window within the current partition, or whole result set if no partitioning clause is used. The windowing_clause is an extension of the order_by_clause and as such, it can only be used if an order_by_clause is present. The windowing_clause has two basic forms.
RANGE BETWEEN start_point AND end_point
ROWS BETWEEN start_point AND end_point
When using ROWS BETWEEN, you are indicating a specific number of rows relative to the current row, either directly, or via an expression. Assuming you don't cross a partition boundary, that number of rows is fixed. In contrast, when you use RANGE BETWEEN you are referring to a range of values in a specific column relative to the value in the current row. As a result, Oracle doesn't know how many rows are included in the range until the ordered set is created.
It is possible to omit the BETWEEN keyword and specify a single RANGE/ROWS endpoint. In this case, Oracle assumes your specified RANGE/ROWS is the start point and the end point it the current row. I would advise against using this syntax as it will be unclear to anyone who doesn't understand this default action.
Possible values for "start_point" and "end_point" are:
UNBOUNDED PRECEDING : The window starts at the first row of the partition, or the whole result set if no partitioning clause is used. Only available for start points.
UNBOUNDED FOLLOWING : The window ends at the last row of the partition, or the whole result set if no partitioning clause is used. Only available for end points.
CURRENT ROW : The window starts or ends at the current row. Can be used as start or end point.
value_expr PRECEDING : A physical or logical offset before the current row using a constant or expression that evaluates to a positive numerical value. When used with RANGE, it can also be an interval literal if the order_by_clause uses a DATE column.
value_expr FOLLOWING : As above, but an offset after the current row.
The documentation states the start point must always be before the end point, but this is not true, as demonstrated by this rather silly, but valid, query.
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno ORDER BY sal ROWS BETWEEN 0 PRECEDING AND 0 PRECEDING) AS avg_of_current_sal
FROM emp;
EMPNO DEPTNO SAL AVG_OF_CURRENT_SAL
---------- ---------- ---------- ------------------
7934 10 1300 1300
7782 10 2450 2450
7839 10 5000 5000
7369 20 800 800
7876 20 1100 1100
7566 20 2975 2975
7788 20 3000 3000
7902 20 3000 3000
7900 30 950 950
7654 30 1250 1250
7521 30 1250 1250
7844 30 1500 1500
7499 30 1600 1600
7698 30 2850 2850
SQL>
In fact, the start point must be before or equal to the end point. In addition, the current row does not have to be part of the window. The window can be defined to start and end before or after the current row.
For analytic functions that support the windowing_clause, the default action is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". The following query is similar to one used previously to report the employee salary and average department salary, but now we have included an order_by_clause so we also get the default windowing_clause.
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno ORDER BY sal) AS avg_dept_sal_sofar
FROM emp;
EMPNO DEPTNO SAL AVG_DEPT_SAL_SOFAR
---------- ---------- ---------- ------------------
7934 10 1300 1300
7782 10 2450 1875
7839 10 5000 2916.66667
7369 20 800 800
7876 20 1100 950
7566 20 2975 1625
7788 20 3000 2175
7902 20 3000 2175
7900 30 950 950
7654 30 1250 1150
7521 30 1250 1150
7844 30 1500 1237.5
7499 30 1600 1310
7698 30 2850 1566.66667
SQL>
There are two things to notice here.
The addition of the order_by_clause without a windowing_clause means the query is now returning a running average.
The default windowing_clause is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", not "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". The fact it is RANGE, not ROWS, means it stops at the first occurrence of the value in the current row, even if that is several rows earlier. As a result, duplicate rows are only included in the average when the salary value changes. You can see this in the last two records of department 20 and in the second and third records of department 30.
In my opinion, the default windowing_clause should have been "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING". This would make the accidental inclusion of the windowing_clause much less confusing.
The following query shows one method for accessing data from previous and following rows within the current row using the windowing_clause. This can also be accomplished with LAG and LEAD.
CLEAR BREAKS
SELECT empno, deptno, sal,
FIRST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sal,
LAST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sal
FROM emp;
EMPNO DEPTNO SAL PREVIOUS_SAL NEXT_SAL
---------- ---------- ---------- ------------ ----------
7369 20 800 800 950
7900 30 950 800 1100
7876 20 1100 950 1250
7521 30 1250 1100 1250
7654 30 1250 1250 1300
7934 10 1300 1250 1500
7844 30 1500 1300 1600
7499 30 1600 1500 2450
7782 10 2450 1600 2850
7698 30 2850 2450 2975
7566 20 2975 2850 3000
7788 20 3000 2975 3000
7902 20 3000 3000 5000
7839 10 5000 3000 5000
SQL>
Using Analytic Functions
The best way to understand what analytic functions are capable of is to play around with them. This article contains links to other articles I've written about specific analytic functions and the following documentation links list all analytic functions available in Oracle 12c Release 1. The "*" indicates that these functions allow for the full analytic syntax, including the windowing_clause.
AVG *
CLUSTER_DETAILS
CLUSTER_DISTANCE
CLUSTER_ID
CLUSTER_PROBABILITY
CLUSTER_SET
CORR *
COUNT *
COVAR_POP *
COVAR_SAMP *
CUME_DIST
DENSE_RANK
FEATURE_DETAILS
FEATURE_ID
FEATURE_SET
FEATURE_VALUE
FIRST
FIRST_VALUE *
LAG
LAST
LAST_VALUE *
LEAD
LISTAGG
MAX *
MEDIAN
MIN *
NTH_VALUE *
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
PREDICTION
PREDICTION_COST
PREDICTION_DETAILS
PREDICTION_PROBABILITY
PREDICTION_SET
RANK
RATIO_TO_REPORT
REGR_ (Linear Regression) Functions *
ROW_NUMBER
STDDEV *
STDDEV_POP *
STDDEV_SAMP *
SUM *
VAR_POP *
VAR_SAMP *
VARIANCE *
Introduced in Oracle 8i, analytic functions, also known as windowing functions, allow developers to perform tasks in SQL that were previously confined to procedural languages.
Introduction
Analytic Function Syntax
query_partition_clause
order_by_clause
windowing_clause
Using Analytic Functions
Related articles.
RANK, DENSE_RANK, FIRST and LAST Analytic Functions
FIRST_VALUE and LAST_VALUE Analytic Functions
LAG and LEAD Analytic Functions
LISTAGG Analystic Function in 11g Release 2
Top-N Queries
Pattern Matching (MATCH_RECOGNIZE) in Oracle Database 12c Release 1 (12.1)
Introduction
Probably the easiest way to understand analytic functions is to start by looking at aggregate functions. An aggregate function, as the name suggests, aggregates data from several rows into a single result row. For example, we might use the AVG aggregate function to give us an average of all the employee salaries in the EMP table.
SELECT AVG(sal)
FROM emp;
AVG(SAL)
----------
2073.21429
SQL>
The GROUP BY clause allows us to apply aggregate functions to subsets of rows. For example, we might want to display the average salary for each department.
SELECT deptno, AVG(sal)
FROM emp
GROUP BY deptno
ORDER BY deptno;
DEPTNO AVG(SAL)
---------- ----------
10 2916.66667
20 2175
30 1566.66667
SQL>
In both cases, the aggregate function reduces the number of rows returned by the query.
Analytic functions also operate on subsets of rows, similar to aggregate functions in GROUP BY queries, but they do not reduce the number of rows returned by the query. For example, the following query reports the salary for each employee, along with the average salary of the employees within the department.
SET PAGESIZE 50
BREAK ON deptno SKIP 1 DUPLICATES
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM emp;
EMPNO DEPTNO SAL AVG_DEPT_SAL
---------- ---------- ---------- ------------
7782 10 2450 2916.66667
7839 10 5000 2916.66667
7934 10 1300 2916.66667
7566 20 2975 2175
7902 20 3000 2175
7876 20 1100 2175
7369 20 800 2175
7788 20 3000 2175
7521 30 1250 1566.66667
7844 30 1500 1566.66667
7499 30 1600 1566.66667
7900 30 950 1566.66667
7698 30 2850 1566.66667
7654 30 1250 1566.66667
14 rows selected.
SQL>
This time AVG is an analytic function, operating on the group of rows defined by the contents of the OVER clause. This group of rows is known as a window, which is why analytic functions are sometimes referred to as window[ing] functions. Notice how the AVG function is still reporting the departmental average, like it did in the GROUP BY query, but the result is present in each row, rather than reducing the total number of rows returned. This is because analytic functions are performed on a result set after all join, WHERE, GROUP BY and HAVING clauses are complete, but before the final ORDER BY operation is performed.
Analytic Function Syntax
There are some variations in the syntax of the individual analytic functions, but the basic syntax for an analytic function is as follows.
analytic_function([ arguments ]) OVER (analytic_clause)
The analytic_clause breaks down into the following optional elements.
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
The sub-elements of the analytic_clause each have their own syntax diagrams, shown here. Rather than repeat the syntax diagrams, the following sections describe what each section of the analytic_clause is used for.
query_partition_clause
The query_partition_clause divides the result set into partitions, or groups, of data. The operation of the analytic function is restricted to the boundary imposed by these partitions, similar to the way a GROUP BY clause affects the action of an aggregate function. If the query_partition_clause is omitted, the whole result set is treated as a single partition. The following query uses an empty OVER clause, so the average presented is based on all the rows of the result set.
CLEAR BREAKS
SELECT empno, deptno, sal,
AVG(sal) OVER () AS avg_sal
FROM emp;
EMPNO DEPTNO SAL AVG_SAL
---------- ---------- ---------- ----------
7369 20 800 2073.21429
7499 30 1600 2073.21429
7521 30 1250 2073.21429
7566 20 2975 2073.21429
7654 30 1250 2073.21429
7698 30 2850 2073.21429
7782 10 2450 2073.21429
7788 20 3000 2073.21429
7839 10 5000 2073.21429
7844 30 1500 2073.21429
7876 20 1100 2073.21429
7900 30 950 2073.21429
7902 20 3000 2073.21429
7934 10 1300 2073.21429
SQL>
If we change the OVER clause to include a query_partition_clause based on the department, the averages presented are specifically for the department the employee belongs too.
BREAK ON deptno SKIP 1 DUPLICATES
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM emp;
EMPNO DEPTNO SAL AVG_DEPT_SAL
---------- ---------- ---------- ------------
7782 10 2450 2916.66667
7839 10 5000 2916.66667
7934 10 1300 2916.66667
7566 20 2975 2175
7902 20 3000 2175
7876 20 1100 2175
7369 20 800 2175
7788 20 3000 2175
7521 30 1250 1566.66667
7844 30 1500 1566.66667
7499 30 1600 1566.66667
7900 30 950 1566.66667
7698 30 2850 1566.66667
7654 30 1250 1566.66667
SQL>
order_by_clause
The order_by_clause is used to order rows, or siblings, within a partition. So if an analytic function is sensitive to the order of the siblings in a partition you should include an order_by_clause. The following query uses the FIRST_VALUE function to return the first salary reported in each department. Notice we have partitioned the result set by the department, but there is no order_by_clause.
BREAK ON deptno SKIP 1 DUPLICATES
SELECT empno, deptno, sal,
FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno) AS first_sal_in_dept
FROM emp;
EMPNO DEPTNO SAL FIRST_SAL_IN_DEPT
---------- ---------- ---------- -----------------
7782 10 2450 2450
7839 10 5000 2450
7934 10 1300 2450
7566 20 2975 2975
7902 20 3000 2975
7876 20 1100 2975
7369 20 800 2975
7788 20 3000 2975
7521 30 1250 1250
7844 30 1500 1250
7499 30 1600 1250
7900 30 950 1250
7698 30 2850 1250
7654 30 1250 1250
SQL>
Now compare the values of the FIRST_SAL_IN_DEPT column when we include an order_by_clause to order the siblings by ascending salary.
SELECT empno, deptno, sal,
FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY sal ASC NULLS LAST) AS first_val_in_dept
FROM emp;
EMPNO DEPTNO SAL FIRST_VAL_IN_DEPT
---------- ---------- ---------- -----------------
7934 10 1300 1300
7782 10 2450 1300
7839 10 5000 1300
7369 20 800 800
7876 20 1100 800
7566 20 2975 800
7788 20 3000 800
7902 20 3000 800
7900 30 950 950
7654 30 1250 950
7521 30 1250 950
7844 30 1500 950
7499 30 1600 950
7698 30 2850 950
SQL>
In this case the "ASC NULLS LAST" keywords are unnecessary as ASC is the default for an order_by_clause and NULLS LAST is the default for ASC orders. When ordering by DESC, the default is NULLS FIRST.
It is important to understand how the order_by_clause affects display order. The order_by_clause is guaranteed to affect the order of the rows as they are processed by the analytic function, but it may not always affect the display order. As a result, you must always use a conventional ORDER BY clause in the query if display order is important. Do not rely on any implicit ordering done by the analytic function. Remember, the conventional ORDER BY clause is performed after the analytic processing, so it will always take precedence.
windowing_clause
We have seen previously the query_partition_clause controls the window, or group of rows, the analytic operates on. The windowing_clause gives some analytic functions a further degree of control over this window within the current partition, or whole result set if no partitioning clause is used. The windowing_clause is an extension of the order_by_clause and as such, it can only be used if an order_by_clause is present. The windowing_clause has two basic forms.
RANGE BETWEEN start_point AND end_point
ROWS BETWEEN start_point AND end_point
When using ROWS BETWEEN, you are indicating a specific number of rows relative to the current row, either directly, or via an expression. Assuming you don't cross a partition boundary, that number of rows is fixed. In contrast, when you use RANGE BETWEEN you are referring to a range of values in a specific column relative to the value in the current row. As a result, Oracle doesn't know how many rows are included in the range until the ordered set is created.
It is possible to omit the BETWEEN keyword and specify a single RANGE/ROWS endpoint. In this case, Oracle assumes your specified RANGE/ROWS is the start point and the end point it the current row. I would advise against using this syntax as it will be unclear to anyone who doesn't understand this default action.
Possible values for "start_point" and "end_point" are:
UNBOUNDED PRECEDING : The window starts at the first row of the partition, or the whole result set if no partitioning clause is used. Only available for start points.
UNBOUNDED FOLLOWING : The window ends at the last row of the partition, or the whole result set if no partitioning clause is used. Only available for end points.
CURRENT ROW : The window starts or ends at the current row. Can be used as start or end point.
value_expr PRECEDING : A physical or logical offset before the current row using a constant or expression that evaluates to a positive numerical value. When used with RANGE, it can also be an interval literal if the order_by_clause uses a DATE column.
value_expr FOLLOWING : As above, but an offset after the current row.
The documentation states the start point must always be before the end point, but this is not true, as demonstrated by this rather silly, but valid, query.
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno ORDER BY sal ROWS BETWEEN 0 PRECEDING AND 0 PRECEDING) AS avg_of_current_sal
FROM emp;
EMPNO DEPTNO SAL AVG_OF_CURRENT_SAL
---------- ---------- ---------- ------------------
7934 10 1300 1300
7782 10 2450 2450
7839 10 5000 5000
7369 20 800 800
7876 20 1100 1100
7566 20 2975 2975
7788 20 3000 3000
7902 20 3000 3000
7900 30 950 950
7654 30 1250 1250
7521 30 1250 1250
7844 30 1500 1500
7499 30 1600 1600
7698 30 2850 2850
SQL>
In fact, the start point must be before or equal to the end point. In addition, the current row does not have to be part of the window. The window can be defined to start and end before or after the current row.
For analytic functions that support the windowing_clause, the default action is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". The following query is similar to one used previously to report the employee salary and average department salary, but now we have included an order_by_clause so we also get the default windowing_clause.
SELECT empno, deptno, sal,
AVG(sal) OVER (PARTITION BY deptno ORDER BY sal) AS avg_dept_sal_sofar
FROM emp;
EMPNO DEPTNO SAL AVG_DEPT_SAL_SOFAR
---------- ---------- ---------- ------------------
7934 10 1300 1300
7782 10 2450 1875
7839 10 5000 2916.66667
7369 20 800 800
7876 20 1100 950
7566 20 2975 1625
7788 20 3000 2175
7902 20 3000 2175
7900 30 950 950
7654 30 1250 1150
7521 30 1250 1150
7844 30 1500 1237.5
7499 30 1600 1310
7698 30 2850 1566.66667
SQL>
There are two things to notice here.
The addition of the order_by_clause without a windowing_clause means the query is now returning a running average.
The default windowing_clause is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", not "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". The fact it is RANGE, not ROWS, means it stops at the first occurrence of the value in the current row, even if that is several rows earlier. As a result, duplicate rows are only included in the average when the salary value changes. You can see this in the last two records of department 20 and in the second and third records of department 30.
In my opinion, the default windowing_clause should have been "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING". This would make the accidental inclusion of the windowing_clause much less confusing.
The following query shows one method for accessing data from previous and following rows within the current row using the windowing_clause. This can also be accomplished with LAG and LEAD.
CLEAR BREAKS
SELECT empno, deptno, sal,
FIRST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sal,
LAST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sal
FROM emp;
EMPNO DEPTNO SAL PREVIOUS_SAL NEXT_SAL
---------- ---------- ---------- ------------ ----------
7369 20 800 800 950
7900 30 950 800 1100
7876 20 1100 950 1250
7521 30 1250 1100 1250
7654 30 1250 1250 1300
7934 10 1300 1250 1500
7844 30 1500 1300 1600
7499 30 1600 1500 2450
7782 10 2450 1600 2850
7698 30 2850 2450 2975
7566 20 2975 2850 3000
7788 20 3000 2975 3000
7902 20 3000 3000 5000
7839 10 5000 3000 5000
SQL>
Using Analytic Functions
The best way to understand what analytic functions are capable of is to play around with them. This article contains links to other articles I've written about specific analytic functions and the following documentation links list all analytic functions available in Oracle 12c Release 1. The "*" indicates that these functions allow for the full analytic syntax, including the windowing_clause.
AVG *
CLUSTER_DETAILS
CLUSTER_DISTANCE
CLUSTER_ID
CLUSTER_PROBABILITY
CLUSTER_SET
CORR *
COUNT *
COVAR_POP *
COVAR_SAMP *
CUME_DIST
DENSE_RANK
FEATURE_DETAILS
FEATURE_ID
FEATURE_SET
FEATURE_VALUE
FIRST
FIRST_VALUE *
LAG
LAST
LAST_VALUE *
LEAD
LISTAGG
MAX *
MEDIAN
MIN *
NTH_VALUE *
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
PREDICTION
PREDICTION_COST
PREDICTION_DETAILS
PREDICTION_PROBABILITY
PREDICTION_SET
RANK
RATIO_TO_REPORT
REGR_ (Linear Regression) Functions *
ROW_NUMBER
STDDEV *
STDDEV_POP *
STDDEV_SAMP *
SUM *
VAR_POP *
VAR_SAMP *
VARIANCE *
No comments:
Post a Comment