February 16, 2016 at 10:50 pm
Comments posted to this topic are about the item Windows Functions: Tell me when that changes.
February 17, 2016 at 1:03 am
Just to make it clear, they are windowing functions or window functions.
Windows functions sounds like API of the OS :).
Good Article, great reminder!
February 17, 2016 at 5:48 am
hartmann 74688 (2/17/2016)
Just to make it clear, they are windowing functions or window functions.Windows functions sounds like API of the OS :).
Good Article, great reminder!
I noticed that too (and have made the same mistake). I don't know the SSC protocol for correcting articles but I hope that gets sorted out so this excellent article has a better chance of showing up earlier in a Google search.
-- Itzik Ben-Gan 2001
February 17, 2016 at 6:13 am
Thanks for the comments guys. Nomenclature is important and I will see what SSC can do to update the article.
February 17, 2016 at 6:29 am
Alternative method for those without LEAD() and LAG()
WITH cteSAMPLEDATA
AS (--Create Sample data
SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM
(VALUES
(1, 'ORDER1', 'DS', N'20151001'),
(2, 'ORDER2', 'DS', N'20151001'),
(5, 'ORDER2', 'DS', N'20151002'),
(3, 'ORDER3', 'DS', N'20151001'),
(6, 'ORDER3', 'AG', N'20151002'),
(8, 'ORDER3', 'AG', N'20151003'),
(4, 'ORDER4', 'DS', N'20151001'),
(7, 'ORDER4', 'AG', N'20151002'),
(9, 'ORDER4', 'DS', N'20151003')
)
AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE)
)
SELECT A.ID
,A.ORDER_ID
,A.CODE
,A.ORDER_DATEAS ORDER_DATE
,B.IDAS PREVIOUS_ID
,B.CODEAS PREVIOUS_CODE
,B.ORDER_DATEAS PREVIOUS_ORDERDATE
FROM (
SELECT DISTINCT X.*
FROM cteSAMPLEDATA C
CROSS APPLY(
SELECT TOP 1 *
FROM cteSAMPLEDATA Z
WHERE Z.ORDER_ID = C.ORDER_ID
ORDER BY Z.ORDER_DATE DESC
) X
) A
CROSS APPLY(
SELECT TOP 1 *
FROM cteSAMPLEDATA Z
WHERE Z.ORDER_ID = A.ORDER_ID
AND Z.ORDER_DATE<A.ORDER_DATE
ORDER BY Z.ORDER_DATE DESC
) B
WHERE A.CODE<>B.CODE
February 17, 2016 at 6:48 am
Great article. And since you are going to clean it up for "Windows" vs "Windowing", you can clean up the following typos, too 🙂
1. "Meaning it is the only order whose most recent date (2015-10-03) has a different code (DS) that the one before it (AG) from 2015-10-02"
("that" should be "than")
2. “(note this is for SQL Server 2012+ and Azure):
“Accesses data from a subsequent row in the same result set without the use of a self-join in SQL Server 2016. ..."
(confusing you quote from SQL Server 2016 when saying it is ok in 2012+)
3. "Since our rows as sorted by ORDER_DATE descending, that would be what we want." ("as" should be "are")
4. "LEAD(H1.CODE,1) --Get the code 1 row after the current row.
OVER (PARTITION BY ORDER_ID
ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE"
"LEAD(H1.CODE,1) --Get the code 1 row before the current row.
OVER (PARTITION BY ORDER_ID
ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE" (first comment says "after" but second occurance says "before")
February 17, 2016 at 8:16 am
Thanks Michael for doing this article. I thought it was a good explanation and example.
However I was also distracted by the window/windows difference and some typos / misspellings.
Here are a couple more I found:
"They wanted to know when the most resent order’s code was different from the one previous to it." (I think you meant "recent," not that it was re-sent. It seems to be correct everywhere else)
"In this case we want to partition our recordset by ORDER_ID, essentially grouping all rows with the sam ORDER_ID together."
(sam/same)
February 17, 2016 at 8:37 am
Great content, other than the typos and nomenclature things others have already mentioned. Thanks!
February 17, 2016 at 9:42 am
Re: Alternative method for those without LEAD() and LAG()
The original code WITH LEAD() and LAG() seems easier to read for me.
The alternative method would apply in a version of SSQL which does not support LEAD or LAG. Otherwise, time to look at using the new features.
Maybe someone who is an old hand at this could compare the actual execution plans to see which method scales better ?
February 17, 2016 at 10:35 am
I have encountered this precise scenario a dozen times.
You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first.
In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE.
Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order.
Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.
February 17, 2016 at 11:57 am
davidawest00 (2/17/2016)
I have encountered this precise scenario a dozen times.You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first.
In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE.
Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order.
Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.
Good explanation.
Just out of curiousity - how many records took 20 minutes ?
February 17, 2016 at 12:51 pm
It was to get the latest + previous purchase per customer...about 40 million out of 60-80 million rows...twice.
Once for current and once for previous.
February 17, 2016 at 1:38 pm
These articles with real life examples are great to help people get rid of cursors and change the mentalities.
I just wanted to make 2 comments:
For sample data, create a table (even a temp table) to prevent repeating the creation of sample data each time to concentrate in the real solution. This will also help define the correct data types.
CREATE TABLE MYTABLE(
ID int,
ORDER_ID char(6),
CODE char(2),
ORDER_DATE date
);
INSERT INTO MYTABLE
VALUES
(1, 'ORDER1', 'DS', '20151001'),
(2, 'ORDER2', 'DS', '20151001'),
(5, 'ORDER2', 'DS', '20151002'),
(3, 'ORDER3', 'DS', '20151001'),
(6, 'ORDER3', 'AG', '20151002'),
(8, 'ORDER3', 'AG', '20151003'),
(4, 'ORDER4', 'DS', '20151001'),
(7, 'ORDER4', 'AG', '20151002'),
(9, 'ORDER4', 'DS', '20151003');
Second, I found that there might be an alternative that can give a simpler plan. I'm not sure if that means better performance, but it usually helps. The secret is to reuse your windows as much as possible.
WITH cteGET_MAX_DATE_AND_CODE
AS (
SELECT
H1.ID,
H1.ORDER_ID,
H1.CODE,
H1.ORDER_DATE,
ROW_NUMBER() --Identify the latest row for each ORDER
OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC) AS ROW_NUM,
LEAD(H1.CODE,1) --Get the code 1 row before the current row.
OVER (PARTITION BY ORDER_ID
ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE
FROM MYTABLE H1
)
SELECT
ID, ORDER_ID, CODE, PREVIOUS_CODE, ORDER_DATE
FROM cteGET_MAX_DATE_AND_CODE
WHERE
ROW_NUM = 1
AND CODE != PREVIOUS_CODE
AND PREVIOUS_CODE IS NOT NULL;
February 18, 2016 at 1:38 am
Agreed, always look to use new features.
We have a couple of old 2008R2 servers and are sometimes forced to go old school on a few queries, hence my post.
I'd be interested in the analysis of the execution plans if someone wants to step up?
February 18, 2016 at 7:34 am
The execution plans do not tell the whole story. I have seen some cases by Gila Monster where the execution was simplified but the load on the server was not improved (roughly same execution time.
Viewing 15 posts - 1 through 15 (of 30 total)
You must be logged in to reply to this topic. Login to reply