If you inspect the available analytic functions, you’ll note that many of them deal with mathematical statistics, but not all. Analytic functions are like the aggregate functions, in that they operate upon a group of rows. But they are different in that they do not return just one row. Each analytic function can specify it’s own scope, varying from the entire report to any grouping definable by the flexible syntax provided within the OVER clause, also known as the analytic clause. This independent scoping for each analytic column in a query is what really distinguishes the capabilities of analytic functions. We’ll take a stepwise approach to exposing some of these functions and building SQL queries towards the desired result. Syntax:
analytic_function([ arguments ]) OVER (analytic_clause)
analytic_clause:
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
whitespace
Ranking Functions
For our tenure/pay discrepancy problem, we first need to explore the data to survey the differences between HIRE_DATE and SALARY. Several ranking functions will come in handy. Within the OVER clauses below, the PARTITION BY keyword defines the ranking groups as departments, because we want to consider our salary adjustments based upon tenure/pay discrepancies within departments, not company-wide.
w h i t e s p a c e
SELECT
department_id AS dept,
first_name||’ ‘||last_name AS ename,
salary,
RANK() OVER
(PARTITION BY department_id ORDER BY salary DESC) AS payrank,
DENSE_RANK() OVER
(PARTITION BY department_id ORDER BY salary DESC) AS dpayrank,
PERCENT_RANK() OVER
(PARTITION BY department_id ORDER BY salary ASC) AS pctrank,
CUME_DIST() OVER
(PARTITION BY department_id ORDER BY salary ASC) AS cumedist,
FROM employees
/
w h i t e s p a c e
The difference between RANK and DENSE_RANK is that DENSE_RANK forces strictly ordinal numbers after ranking ties, while RANK skips numbers after ties. Highlighted output for Dept. 60 illustrates this. PERCENT_RANK shows the percentage of employees within the grouping (dept) with lower-ranked salaries. CUME_DIST is very similar, but includes the item itself (ranked employee) in the percentage calculation. Again, inspect Dept. 60 to see the difference. We’ll select DENSE_RANK and PERCENT_RANK for our purposes. Note that some departments are too small, so we’ll impose a minimum eligibility size of 4 staff members. We’ll also curtail the accuracy of the decimal displays to save space. Finally, we’ll add in the tenure information and take an initial stab at building a column which compares seniority and salary rankings.
w h i t e s p a c e
SELECT
department_id AS dept,
first_name||’ ‘||last_name AS ename,
salary,
DENSE_RANK() OVER
(PARTITION BY department_id ORDER BY salary DESC) AS srank,
100*PERCENT_RANK() OVER
(PARTITION BY department_id ORDER BY salary ASC) AS psrank,
sysdate – hire_date AS tenure,
DENSE_RANK() OVER
(PARTITION BY department_id ORDER BY hire_date ASC) AS trank,
100*PERCENT_RANK() OVER
(PARTITION BY department_id ORDER BY hire_date DESC) AS ptrank,
100*PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY hire_date DESC)
- 100*PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary ASC)
AS skew
FROM employees
WHERE department_id IN
(SELECT department_id FROM
(SELECT COUNT(*), department_id FROM employees
GROUP BY department_id HAVING COUNT(*) > 3)
)
ORDER BY 9 DESC, 1 ASC
/
w h i t e s p a c e
view query results
This gives some usable results. We can see from the sorted SKEW column that 12 employees have effectively equal percent ranks for tenure and salary (middle of report). Only employees above this cluster have tenure rankings which are lower, within their departments, than their salary rankings. The top two rows, Austin and Marlow, show extreme ‘skew’: quite high tenure coupled with relatively low salary. Since we are only interested in addressing relatively skewed cases, we will apply the following business rule:
• Ignore all skews below 25
• Award a 1% increase to all employees if skew >= 25
• Award an additional 1% increase (total 2%) if skew >= 50
• Award an additional 0.5% increase (total 2.5%) if skew >= 75
Before proceeding however, let’s check an alternative method for ranking employees, using NTILEs. Any ranked group, including the entire report, can be chunked into quartiles or quintiles, or N-tiles. We have a few smallish departments in our test data, so it makes sense to look for a happy medium when it comes to tile sizes. Let’s try splitting into three and four groups, respectively, and to make things interesting, let’s limit our study to the smaller departments. We could then consider an alternative adjustment rule such as this:
• Define skew as salary quartile minus tenure quartile
• Award a 1% increase to all employees if skew = 2
• Award an additional 1% increase (total 2%) if skew = 3
w h i t e s p a c e
SELECT
department_id, last_name, salary,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS s4tile,
NTILE(3) OVER (PARTITION BY department_id ORDER BY salary DESC) AS s3tile,
sysdate-hire_date AS tenure,
NTILE(4) OVER (PARTITION BY department_id ORDER BY hire_date ASC) AS t4tile,
NTILE(3) OVER (PARTITION BY department_id ORDER BY hire_date ASC) AS t3tile
FROM employees
WHERE department_id IN
(SELECT department_id FROM
(SELECT COUNT(*), department_id
FROM employees
GROUP BY department_id
HAVING COUNT(*) BETWEEN 4 AND 8)
)
ORDER BY 1 ASC, 3 DESC, 6 DESC
/
w h i t e s p a c e
view query results
It looks like there’s a granularity problem with quartiles (and tretiles) for our sample data, which becomes clearer with small departments (30, 60, 100) of six people or less. If you compare employees Raphaely and Khoo (highlighted), you’ll see they’ve both been lumped into the 1st quartile despite Raphaely’s salary being more than three times greater than Khoo’s! The opposite issue is seen with tenure data for Hunold and Patabella: they are placed into adjacent tenure quartiles although they have the two closest HIRE_DATE values in their department.
This argues for choosing the PERCENT_RANK skew measure method seen previously. It has the right granularity for the group size, or PARTITION in analytic function parlance (not to be confused with data partitions for very large tables and indexes).
Adjusting the Skew
But the following query shows PERCENT_RANK is not perfect either, just better than NTILE. It’s nearly identical with the previous one, focusing upon the same 17 employees. Look at the percentile ranking for Raphaely and Khoo; you’ll see that although the numbers have improved, Khoo, with nearly $80K less in salary is still within 20% of Raphaely.
I included two new columns, JOB_ID and a flag to indicate management status. In our case the adjacent large jumps in salary within departments has to do with these two factors. It’s easy enough to alter our strategy by excluding managers from our salary adjustment study, or alter the study to group people by job titles within departments instead of simply by departments; preferably both. But a quick discussion of our findings with the HR officer results in the spec changing: group employees by department and job title combined, and formulate salary/tenure skews on that basis.
This decision leaves the gap between Hunold and Ernst in Dept. 60 as the only grey area. (Review previous query results.) They share the same job titles, but Hunold is in management, and makes about $50K more. It’s not a problem for our needs however, because Ernst has the lowest tenure ranking in his department and job title class, thus will not be in consideration for this pay adjustment.
Pressing forward, we amend the query as follows:
w h i t e s p a c e
SELECT
department_id||job_id AS deptjob,
last_name,
salary,
sysdate – hire_date AS tenure,
100*PERCENT_RANK() OVER
(PARTITION BY department_id||job_id ORDER BY salary ASC) AS sprank,
100*PERCENT_RANK() OVER
(PARTITION BY department_id||job_id ORDER BY hire_date DESC) AS tprank,
100*PERCENT_RANK() OVER (PARTITION BY department_id||job_id ORDER BY hire_date DESC)
- 100*PERCENT_RANK() OVER (PARTITION BY department_id||job_id ORDER BY salary ASC)
AS prskew
FROM employees
WHERE department_id||job_id IN
(SELECT deptjob FROM
(SELECT COUNT(*), department_id||job_id AS deptjob
FROM employees
GROUP BY department_id||job_id
HAVING COUNT(*) > 3 )
)
ORDER BY 1 ASC, 7 DESC
/
w h i t e s p a c e
view query results
Within department-jobs, coded via the concatenated string construct DEPARTMENT_ID||JOB_TITLE, employees are sorted by decreasing percent rank skews between their tenures and salaries. The TENURE column number unit is days. Recalling that our cutoff skew is going to be a minimum 25.0 point differential, we see that only 11 employees will be affected by this pay raise. Now, we’re ready for the final query which lists the adjustments for each affected employee.
w h i t e s p a c e
column adjustment format $99999.99;
WITH skewed_pay AS
(
SELECT
department_id||job_id AS deptjob,
first_name||’ ‘||last_name AS ename,
salary,
100*PERCENT_RANK() OVER
(PARTITION BY department_id||job_id ORDER BY salary ASC) AS sprank,
sysdate – hire_date AS tenure,
100*PERCENT_RANK() OVER
(PARTITION BY department_id||job_id ORDER BY hire_date DESC) AS tprank
FROM employees
WHERE department_id||job_id IN
(SELECT deptjob FROM
(SELECT COUNT(*), department_id||job_id AS deptjob FROM employees
GROUP BY department_id||job_id HAVING COUNT(*) > 3 )
)
)
SELECT
deptjob, ename, salary, tenure,
tprank – sprank AS skew,
(CASE
WHEN tprank-sprank >= 25.0 AND tprank-sprank < 50.0 THEN .01*salary
WHEN tprank-sprank >= 50.0 AND tprank-sprank < 75.0 THEN .02*salary
WHEN tprank-sprank >= 75.0 THEN .025*salary
ELSE 0
END) adjustment
FROM skewed_pay
WHERE tprank – sprank >= 25.0
ORDER BY 5 DESC
/
w h i t e s p a c e
view query results
The Windowing Clause
Let’s summarize how the basic processing for Analytic SQL statements work. The analytic functions are not dealt with until after the basic query has been processed. Once this result set is obtained, the analytic partitions and functions are applied against it as a second pass. Afterwards, any final report-wide sorting is performed.
We’ve thus far only looked at simple analytic clauses. They can get much more complex and look like pretty strange SQL with all sorts of extra reserved words. I’ll introduce just two slightly more complex items which utilize windows of related rows. In analytic function syntax, the windowing clause is an optional element after the PARTITION BY and ORDER BY clauses of the full analytic clause. They return a value for each row in the output, which depends on other rows within the corresponding window. With them you can compute and display cumulative, moving, and centered aggregates. The last two are so-called sliding windows which I’ll cover in a subsequent post.For now, let’s consider that our HR officer is happy with the final query results, except for needing a little more info about how department budgets will be affected. We need to summarize the pay adjustment impacts by department and determine which percentages of the overall adjustment are going to which staff members. We can satisfy the first need with a cumulative windowed aggregate and the second with the RATIO_TO_REPORT analytic function.
Here’s what the completed query will look like, taking just the final SELECT. (To see the entirety, including both named WITH subqueries, click the button below.) The output is also shown.
w h i t e s p a c e
WITH
skewed_pay AS
(
SELECT. . .
),
adjust_calc AS
(
SELECT. . .
)
SELECT
department_id AS dept, job_id, ename, salary, skew, adjustment,
TO_CHAR(100.00*RATIO_TO_REPORT(adjustment) OVER (), ‘99.99’) AS R_to_R,
SUM(adjustment) OVER
(PARTITION BY department_id ORDER BY department_id ASC
ROWS UNBOUNDED PRECEDING) AS dept_cum
FROM adjust_calc
ORDER BY 1 ASC, 4 DESC
/
w h i t e s p a c e
view final report
Dept. JOB_ID ENAME SALARY SKEW ADJUSTMENT R_TO_R DEPT_CUM
===== ========== ================== ======= ====== ========== ====== ==========
30 PU_CLERK Sigal Tobias 27776 25.0 $277.76 2.79 $277.76
50 ST_MAN Payam Kaufling 78368 50.0 $1567.36 15.73 $1567.36
50 ST_CLERK James Marlow 24800 52.6 $496.00 4.98 $2063.36
60 IT_PROG Valli Pataballa 76186 25.0 $761.86 7.65 $761.86
60 IT_PROG David Austin 76186 75.0 $1904.64 19.12 $2666.50
60 IT_PROG Diana Lorentz 66662 25.0 $666.62 6.69 $3333.12
80 SA_REP Patrick Sully 94240 28.6 $942.40 9.46 $942.40
80 SA_REP Allan McEwen 89280 28.6 $892.80 8.96 $1835.20
80 SA_REP Lindsey Smith 79360 39.3 $793.60 7.96 $2628.80
80 SA_REP Louise Doran 74400 25.0 $744.00 7.47 $3372.80
100 FI_ACCOUNT Ismael Sciarra 91661 25.0 $916.61 9.20 $916.61
11 rows selected.
Aggregation Scopes (Partition Windows): entire report departments dept + job_id
w h i t e s p a c e
A couple of things to notice… For the RATIO_TO_REPORT function, the analytic partition is the entire report, so the syntax collapses to: OVER (). In other words, the PARTITION keyword is unneeded. The meaning of the 3rd row, as an example, is that employee Marlow’s adjustment of $496 per annum represents 4.98% of the entire adjustment total being awarded company-wide. The DEPT_CUM column uses the SUM function applied cumulatively within every department, as indicated by it’s PARTITION clause. The cumulative summing, downwards, is accomplished via the ROWS UNBOUNDED PRECEDING keywords. So, for example, Diana Lorentz is the final employee in the sorting sequence for Dept. 60, thus her adjustment of $666.62 is summed with the previous running department total, yielding a DEPT_CUM value of $3333.12 for IT. The power and flexibility of analytic SQL is seen in that three different aggregation scopes are operating simultaneously in this query output. The column R_TO_R works against the entire report; the DEPT_CUM column applies within department breaks; and the SKEW and ADJUSTMENT columns, which are based upon the hidden underlying PERCENT_RANK columns (in the WITH AS clauses) work against a concatenation of department and JOB_ID breaks.
This tour only touches the surface of what can be done with SQL analytic queries. But it gives an idea of how they work and introduces several functions. At a later time, I’ll delve into more complicated examples.
~RS