Itzik Ben-Gan

T-SQL Windowing Improvements in SQL Server 2022

SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Erin Stellato is a Principal Consultant with SQLskills and a Microsoft Data Platform MVP.

Erin’s Posts

Microsoft recently released the first public preview of SQL Server 2022. This release has a number of T-SQL improvements. In this article I focus on windowing and NULL-related improvements. These include the new WINDOW clause and the windowing NULL treatment clause.

I’ll be using the sample database TSQLV6 in the examples in this article. You can download this sample database here.

The WINDOW Clause

The WINDOW clause is part of the ISO/IEC SQL standard. It allows you to name parts of a window specification—or an entire one—and then use the window name in the OVER clause of your query’s window functions. This clause allows you to shorten your code by avoiding the repetition of identical parts of your window specifications. This clause is now available in Azure SQL Database and SQL Server 2022, provided you use database compatibility level 160 or higher.

The WINDOW clause is located between the query’s HAVING and ORDER BY clauses:

SELECT
FROM
WHERE
GROUP BY
HAVING
WINDOW
ORDER BY

The WINDOW clause has the following syntax:

WINDOW window_name AS ( [ reference_window_name ]   
                        [ <window partition clause> ]  
                        [ <window order clause> ]   
                        [ <window frame clause> ] )

As an example where the WINDOW clause can be handy in shortening your code, consider the following query:

USE TSQLV6;

SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER( PARTITION BY custid 
                 ORDER BY orderdate, orderid
                 ROWS UNBOUNDED PRECEDING ) AS runsumqty,
  SUM(val) OVER( PARTITION BY custid 
                 ORDER BY orderdate, orderid
                 ROWS UNBOUNDED PRECEDING ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid, orderdate, orderid;

This query generates the following output:

orderid     custid      orderdate  qty         val     runsumqty   runsumval
----------- ----------- ---------- ----------- ------- ----------- ----------
10643       1           2021-08-25 38          814.50  38          814.50
10692       1           2021-10-03 20          878.00  58          1692.50
10702       1           2021-10-13 21          330.00  79          2022.50
10835       1           2022-01-15 17          845.80  96          2868.30
10952       1           2022-03-16 18          471.20  114         3339.50
11011       1           2022-04-09 60          933.50  174         4273.00
10308       2           2020-09-18 6           88.80   6           88.80
10625       2           2021-08-08 18          479.75  24          568.55
10759       2           2021-11-28 10          320.00  34          888.55
10926       2           2022-03-04 29          514.40  63          1402.95

In this query you can see two window functions using identical window specifications, including window partitioning, ordering and framing clauses. To shorten the query, you can use the WINDOW clause to name a window specification with all three elements, say as W, and then specify OVER W in both window functions, like so:

SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER W AS runsumqty,
  SUM(val) OVER W AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid 
              ORDER BY orderdate, orderid
              ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

As you can see, when the window name represents the whole window specification that you need and not just part of it, you specify the window name right after the OVER clause without parentheses.

You may have noticed in the WINDOW clause’s syntax one window name specification can have a reference to another window name. This is especially useful when your query has different window functions with different window specifications and one window specification is the same as part of another. Consider the following query as an example:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER( PARTITION BY custid
                     ORDER BY orderdate, orderid ) AS ordernum,
  MAX(orderdate) OVER( PARTITION BY custid ) AS maxorderdate,
  SUM(qty) OVER( PARTITION BY custid 
                 ORDER BY orderdate, orderid
                 ROWS UNBOUNDED PRECEDING ) AS runsumqty,
  SUM(val) OVER( PARTITION BY custid           
                 ORDER BY orderdate, orderid   
                 ROWS UNBOUNDED PRECEDING ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid, orderdate, orderid;

This query generates the following output:

orderid  custid  orderdate  qty  val     ordernum  maxorderdate runsumqty   runsumval
-------- ------- ---------- ---- ------- --------- ------------ ----------- -----------
10643    1       2021-08-25 38   814.50  1         2022-04-09   38          814.50
10692    1       2021-10-03 20   878.00  2         2022-04-09   58          1692.50
10702    1       2021-10-13 21   330.00  3         2022-04-09   79          2022.50
10835    1       2022-01-15 17   845.80  4         2022-04-09   96          2868.30
10952    1       2022-03-16 18   471.20  5         2022-04-09   114         3339.50
11011    1       2022-04-09 60   933.50  6         2022-04-09   174         4273.00
10308    2       2020-09-18 6    88.80   1         2022-03-04   6           88.80
10625    2       2021-08-08 18   479.75  2         2022-03-04   24          568.55
10759    2       2021-11-28 10   320.00  3         2022-03-04   34          888.55
10926    2       2022-03-04 29   514.40  4         2022-03-04   63          1402.95

The MAX function’s window specification has only a window partition clause. The ROW_NUMBER function’s window specification has a window partition clause that is the same as the MAX function’s, plus a window order clause. Both SUM functions have the same window partition and order clauses as the ROW_NUMBER function’s, plus a window frame clause.

The recursive capability of the WINDOW clause’s syntax allows you to shorten the query’s code, like so:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER PO AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate,
  SUM(qty) OVER POF AS runsumqty,
  SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid ),
       PO AS ( P ORDER BY orderdate, orderid ),
       POF AS ( PO ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

The order of the window name definitions in the WINDOW clause is insignificant. For example, the following code is valid and has the same meaning as the above query:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER PO AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate,
  SUM(qty) OVER POF AS runsumqty,
  SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW POF AS ( PO ROWS UNBOUNDED PRECEDING ),
       PO AS ( P ORDER BY orderdate, orderid ),
       P AS ( PARTITION BY custid )
ORDER BY custid, orderdate, orderid;

Note, though, you can't use multiple window name references in one window name specification. You're limited to only one window name reference, plus any relevant additional window specification elements. For example, the following code isn’t valid for this reason:

SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER ( P O F ) AS runsumqty,
  SUM(val) OVER ( P O F ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid ),
       O AS ( ORDER BY orderdate, orderid ),
       F AS ( ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

This code generates the following error:

Msg 102, Level 15, State 1, Line 106
Incorrect syntax near 'O'.

You're allowed to mix one window name and additional windowing elements in a window specification, like so:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER ( P ORDER BY orderdate, orderid ) AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid )
ORDER BY custid, orderdate, orderid;

This query generates the following output:

orderid     custid      orderdate  qty         val     ordernum             maxorderdate
----------- ----------- ---------- ----------- ------- -------------------- ------------
10643       1           2021-08-25 38          814.50  1                    2022-04-09
10692       1           2021-10-03 20          878.00  2                    2022-04-09
10702       1           2021-10-13 21          330.00  3                    2022-04-09
10835       1           2022-01-15 17          845.80  4                    2022-04-09
10952       1           2022-03-16 18          471.20  5                    2022-04-09
11011       1           2022-04-09 60          933.50  6                    2022-04-09
10308       2           2020-09-18 6           88.80   1                    2022-03-04
10625       2           2021-08-08 18          479.75  2                    2022-03-04
10759       2           2021-11-28 10          320.00  3                    2022-03-04
10926       2           2022-03-04 29          514.40  4                    2022-03-04

As I mentioned before, when a window name represents the whole window specification, like with the MAX function in this query, you specify the window name right after the OVER clause without parentheses. When the window name is only part of the window specification, like with the ROW_NUMBER function in this query, you specify the window name followed by the rest of the windowing elements within parentheses.

By now, you know you're allowed to recursively define one window name based on another. However, in case it wasn’t obvious, cyclic references aren’t allowed. For example, the following query is valid since the window name definitions aren’t cyclic:

SELECT 'This is valid'
WINDOW W1 AS (), W2 AS (W1), W3 AS (W2);

This query generates the following output:

-------------
This is valid

However, the following query is invalid since the window name definitions are cyclic:

SELECT 'This is invalid'
WINDOW W1 AS (W2), W2 AS (W3), W3 AS (W1);

This code generates the following error:

Msg 5365, Level 15, State 1, Line 108
Cyclic window references are not permitted.

Lastly, the scope of the defined window names is the immediate query/table expression, and can't cross table expression boundaries. For instance, if you define a window name in the inner query of a CTE, derived table, view or inline table valued function, the outer query won’t recognize the inner window name. As an example, the following query is invalid for this reason:

WITH C AS
(
  SELECT orderid, custid, orderdate, qty, val,
    SUM(qty) OVER W AS runsumqtyall
  FROM Sales.OrderValues
  WHERE custid IN (1, 2)
  WINDOW W AS ( PARTITION BY custid 
                ORDER BY orderdate, orderid
                ROWS UNBOUNDED PRECEDING )
)
SELECT *,
  SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >= '20220101';

This code generates the following error:

Msg 5362, Level 15, State 3, Line 172
Window 'W' is undefined.

You have to define a window name you want to use in each of the scopes where you want to use it, like so:

WITH C AS
(
  SELECT orderid, custid, orderdate, qty, val,
    SUM(qty) OVER W AS runsumqtyall
  FROM Sales.OrderValues
  WHERE custid IN (1, 2)
  WINDOW W AS ( PARTITION BY custid 
                ORDER BY orderdate, orderid
                ROWS UNBOUNDED PRECEDING )
)
SELECT *,
  SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >= '20220101'
WINDOW W AS ( PARTITION BY custid 
              ORDER BY orderdate, orderid
              ROWS UNBOUNDED PRECEDING );

This query generates the following output:

orderid     custid      orderdate  qty         val     runsumqtyall runsumqty22
----------- ----------- ---------- ----------- ------- ------------ -----------
10835       1           2022-01-15 17          845.80  96           17
10952       1           2022-03-16 18          471.20  114          35
11011       1           2022-04-09 60          933.50  174          95
10926       2           2022-03-04 29          514.40  63           29

Each of the scopes defines its own window name W, and they don’t have to be based on the same specification (though they are in this example).

The Windowing NULL Treatment Clause

The NULL treatment clause is part of the ISO/IEC SQL standard and is available to the offset window functions FIRST_VALUE, LAST_VALUE, LAG and LEAD. This clause has the following syntax:

<function>(<scalar_expression>[, <other args>]) [IGNORE NULLS | RESPECT NULLS] OVER( <specification> )

The RESPECT NULLS option is the default, in case you don’t indicate this clause. It means you want the function to return the value of <scalar_expression> in the requested position (first, last, previous, next), whether it's NULL or non-NULL. The IGNORE NULLS option introduces a new capability that people have been eagerly waiting to have in T-SQL for a long time. It means you want the function to return the value of <scalar_expression> in the requested position if it's non-NULL. However, if it is NULL, you want the function to keep going in the relevant direction (backward for LAST_VALUE and LAG, forward for FIRST_VALUE and LEAD) until a non-NULL value is found. If no non-NULL value is found, then it will return a NULL.

To illustrate the utility of this clause, I’ll use a table called T1 in my examples. Use the following code to create and populate T1:

DROP TABLE IF EXISTS dbo.T1;

CREATE TABLE dbo.T1
(
  id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
  col1 INT NULL,
  col2 INT NULL
);
GO

INSERT INTO dbo.T1(id, col1, col2) VALUES
  ( 2, NULL,  200),
  ( 3,   10, NULL),
  ( 5,   -1, NULL),
  ( 7, NULL,  202),
  (11, NULL,  150),
  (13,  -12,   50),
  (17, NULL,  180),
  (19, NULL,  170),
  (23, 1759, NULL);

Suppose the column id represents the chronological order of the events recorded in T1. Each row represents an event where one or more attribute values have changed. A NULL means the attribute retains whatever last non-NULL value it had up to that point.

Suppose you need to return the last known (non-NULL) col1 value per event. Without access to the NULL treatment clause, you'd need to use a fairly complex technique such as the following:

WITH C AS
(
  SELECT id, col1,
    MAX(CASE WHEN col1 IS NOT NULL THEN id END)
      OVER(ORDER BY id
           ROWS UNBOUNDED PRECEDING) AS grp
  FROM dbo.T1
)
SELECT id, col1,
  MAX(col1) OVER(PARTITION BY grp
                 ORDER BY id
                 ROWS UNBOUNDED PRECEDING) AS lastknowncol1
FROM C;

If you aren't already familiar with this technique, it can take a bit to figure out the logic here.

This code generates the following output:

id          col1        lastknowncol1
----------- ----------- -------------
2           NULL        NULL
3           10          10
5           -1          -1
7           NULL        -1
11          NULL        -1
13          -12         -12
17          NULL        -12
19          NULL        -12
23          1759        1759

Having access to the NULL treatment clause, you can easily achieve the same using the LAST_VALUE function with the IGNORE NULLS option, like so:

SELECT id, col1,
  LAST_VALUE(col1) IGNORE NULLS OVER( ORDER BY id ROWS UNBOUNDED PRECEDING ) AS lastknowncol
FROM dbo.T1;

The difference is of course more dramatic if you need to apply this logic to multiple attributes.

Without access to the NULL treatment clause, you'd use the following code to return the last known col1 and col2 values:

WITH C AS
(
  SELECT id, col1, col2,
    MAX(CASE WHEN col1 IS NOT NULL THEN id END)
      OVER(ORDER BY id
           ROWS UNBOUNDED PRECEDING) AS grp1,
    MAX(CASE WHEN col2 IS NOT NULL THEN id END)
      OVER(ORDER BY id
           ROWS UNBOUNDED PRECEDING) AS grp2
  FROM dbo.T1
)
SELECT id,
  col1,
  MAX(col1) OVER(PARTITION BY grp1
                 ORDER BY id
                 ROWS UNBOUNDED PRECEDING) AS lastknowncol1,
  col2,
  MAX(col2) OVER(PARTITION BY grp2
                 ORDER BY id
                 ROWS UNBOUNDED PRECEDING) AS lastknowncol2
FROM C;

This code generates the following output:

id          col1        lastknowncol1 col2        lastknowncol2
----------- ----------- ------------- ----------- -------------
2           NULL        NULL          200         200
3           10          10            NULL        200
5           -1          -1            NULL        200
7           NULL        -1            202         202
11          NULL        -1            150         150
13          -12         -12           50          50
17          NULL        -12           180         180
19          NULL        -12           170         170
23          1759        1759          NULL        170

I should also note even though the table T1 has a supporting covering index with id as the key, each of the last known attribute calculations in the query above results in an explicit sort operator in the plan, as shown in Figure 1.

Figure 1: Plan for query without the NULL treatment clauseFigure 1: Plan for query without the NULL treatment clause

This fact makes this solution quite expensive.

Here’s the alternative using the NULL treatment clause:

SELECT id, 
  col1, LAST_VALUE(col1) IGNORE NULLS OVER W AS lastknowncol1,
  col2, LAST_VALUE(col2) IGNORE NULLS OVER W AS lastknowncol2
FROM dbo.T1
WINDOW W AS ( ORDER BY id ROWS UNBOUNDED PRECEDING );

This solution is so much shorter and more elegant, and the optimization of the functions with this option can rely on an ordered scan of a supporting index, and thus avoid explicit sorting, as shown in the plan for this query in Figure 2.

Figure 2: Plan for query with the NULL treatment clauseFigure 2: Plan for query with the NULL treatment clause

As mentioned, the NULL treatment clause is available to all offset window functions (FIRST_VALUE, LAST_VALUE, LAG, and LEAD). Here’s an example using LAG to return the previous known value:

SELECT id, col1, 
  LAG(col1) IGNORE NULLS OVER ( ORDER BY id ) AS prevknowncol1
FROM dbo.T1;

This code generates the following output:

id          col1        prevknowncol1
----------- ----------- -------------
2           NULL        NULL
3           10          NULL
5           -1          10
7           NULL        -1
11          NULL        -1
13          -12         -1
17          NULL        -12
19          NULL        -12
23          1759        -12

Want to try to achieve the same without the NULL treatment clause? I bet you don’t!

Conclusion and Other T-SQL Improvements in SQL Server 2022

In this article I covered T-SQL improvements in SQL Server 2022 concerning window functions and NULL handling. I showed how to:

  • Reuse parts of—or entire—window definitions with the WINDOW clause
  • Control NULL treatment in offset window functions with the NULL treatment clause

SQL Server 2022 has additional T-SQL improvements, covered by Aaron Bertrand in this article:

  • GREATEST / LEAST
  • STRING_SPLIT
  • DATE_BUCKET
  • GENERATE_SERIES