February 18, 2016 at 7:56 am
j-1064772 (2/18/2016)
The execution plans do not tell the whole story. I have seen some cases posted by Gila Monster where the execution plan was simplified but the load on the server was not improved (roughly the same execution time).
Agreed, that's why I mention it as a possibility, not a certainty. To be sure about a performance improvement, we would need to generate sample data and run tests. I'm not sure that I could generate useful sample data without knowing the real behavior of this data, which is also why I didn't provide a test.
February 18, 2016 at 7:56 am
davidawest00 (2/17/2016)
SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not
Not necessarily. The query below only does an index seek on the nc1 index of sysjobsteps when I ran it. Sure, if the CTE is complex enough, or returns enough of the rows in the table(s) it uses, it may be materialised. You can't generalise that to all cases, though.
WITH Jobs AS (
SELECT * FROM msdb.dbo.sysjobsteps
)
SELECT step_name
FROM Jobs
WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382'
John
February 18, 2016 at 8:05 am
John Mitchell-245523 (2/18/2016)
davidawest00 (2/17/2016)
SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not
Not necessarily. The query below only does an index seek on the nc1 index of sysjobsteps when I ran it. Sure, if the CTE is complex enough, or returns enough of the rows in the table(s) it uses, it may be materialised. You can't generalise that to all cases, though.
WITH Jobs AS (
SELECT * FROM msdb.dbo.sysjobsteps
)
SELECT step_name
FROM Jobs
WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382'
John
Just another example to demonstrate that we're telling SQL Server what we want, not how we want it to do it. There's a lot happening behind our code which is not interpreted left to right.
CREATE TABLE #Example(
Mystring char(5)
);
INSERT INTO #Example VALUES('a'), ('1');
WITH CTE AS (
SELECT *
FROM #Example
WHERE ISNUMERIC(Mystring) = 1
)
SELECT *
FROM CTE
--Comparing char(5) to an int forces an implicit conversion. The optimizer
--is free to evaluate this predicate before the CTE's ISNUMERIC filter, so
--the query may fail with a conversion error on the row holding 'a'.
WHERE Mystring = 1;
DROP TABLE #Example;
March 8, 2016 at 5:02 pm
Thanks for the article.
March 17, 2017 at 2:54 am
davidawest00 - Wednesday, February 17, 2016 10:35 AM
I have encountered this precise scenario a dozen times. You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first. In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE. Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order. Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.
Just to clarify, this is a myth. SQL Server does not "evaluate" a CTE in this manner at all. CTEs are simply a coding construct similar to a derived table or a correlated subquery.
There are some parallels with recursive CTEs but that's a different story.
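To see the point in action, here's a minimal sketch reusing the sysjobsteps query from earlier in the thread. The CTE form and the equivalent derived-table form should produce the same plan, because the optimizer inlines both:
WITH Jobs AS (
SELECT * FROM msdb.dbo.sysjobsteps
)
SELECT step_name
FROM Jobs
WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';
--Same query written as a derived table; the optimizer treats the two alike
SELECT step_name
FROM (SELECT * FROM msdb.dbo.sysjobsteps) AS Jobs
WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';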
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 17, 2017 at 4:21 am
You can also provide a default value for the LEAD/LAG window functions.
In this case, it means not having to add the extra AND PREVIOUS_CODE IS NOT NULL in the WHERE clause:
WITH cteSAMPLEDATA
AS (
SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM
(VALUES
(1, 'ORDER1', 'DS', N'20151001'),
(2, 'ORDER2', 'DS', N'20151001'),
(5, 'ORDER2', 'DS', N'20151002'),
(3, 'ORDER3', 'DS', N'20151001'),
(6, 'ORDER3', 'AG', N'20151002'),
(8, 'ORDER3', 'AG', N'20151003'),
(4, 'ORDER4', 'DS', N'20151001'),
(7, 'ORDER4', 'AG', N'20151002'),
(9, 'ORDER4', 'DS', N'20151003')
)
AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE)
), cteGET_MAX_DATE_AND_CODE
AS (
SELECT
H1.ID,
H1.ORDER_ID,
H1.CODE,
H1.ORDER_DATE,
MAX(H1.ORDER_DATE) --Get the latest date for each ORDER
OVER (PARTITION BY ORDER_ID) AS MAX_DATE,
LEAD(H1.CODE, 1, H1.CODE) --Get the code from the chronologically previous row; defaults to the current code when no previous row exists.
OVER (PARTITION BY ORDER_ID
ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE
FROM cteSAMPLEDATA H1
)
SELECT
ID, ORDER_ID, CODE, PREVIOUS_CODE, ORDER_DATE, MAX_DATE
FROM cteGET_MAX_DATE_AND_CODE
WHERE ORDER_DATE = MAX_DATE
AND CODE != PREVIOUS_CODE
;
March 17, 2017 at 8:57 am
ChrisM@Work - Friday, March 17, 2017 2:54 AM
davidawest00 - Wednesday, February 17, 2016 10:35 AM
I have encountered this precise scenario a dozen times. You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first. In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE. Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order. Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.
Just to clarify, this is a myth. SQL Server does not "evaluate" a CTE in this manner at all. CTEs are simply a coding construct similar to a derived table or a correlated subquery.
There are some parallels with recursive CTE's but that's a different story.
Amen to that! I knew if I read down far enough, one of you heavy hitters would pick up on that. Same thing with a View. They usually aren't "materialized first" and cannot be treated as if they were a table. They become a part of the overall code, just as an iTVF usually does.
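For illustration, here's a minimal sketch of an inline TVF (hypothetical function name, reusing sysjobsteps from earlier in the thread). Because the whole body is a single RETURN SELECT, the optimizer expands it into the calling query just like a view or CTE rather than materializing it first:
--Hypothetical inline TVF; calls to it are expanded into the outer plan
CREATE FUNCTION dbo.StepsForJob (@job_id UNIQUEIDENTIFIER)
RETURNS TABLE
AS
RETURN
SELECT step_id, step_name
FROM msdb.dbo.sysjobsteps
WHERE job_id = @job_id;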
--Jeff Moden
Change is inevitable... Change for the better is not.
March 17, 2017 at 9:14 am
adame - Thursday, February 18, 2016 1:38 AM
Agreed, always look to use new features.
Nah...you did good. I agree that there have been some very useful and well thought out new features (ROW_NUMBER and APPLY are my all-time "new feature" favorites, so far) but a lot of the new features have been, ummmm.... no other word for it... "crap" or have led to misuse. My three least favorite "new" features so far are the newer date and time types (including DATETIME2, etc., because they disallow direct temporal math), PIVOT/UNPIVOT (pre-aggregated CROSS TABS are both easier and faster for what I do), and FORMAT (44 times slower than CONVERT), with incremental Recursive CTEs following as a very close second.
Heh... and yeah.... I definitely get the "legacy" stuff. I had to use a crow-bar and jet-propelled pork chop launcher to get our company to finally upgrade from 2005 to 2012 and that was just last year. We actually had the hardware for the migration set up and ready nearly 2 years before that. A lot of people simply can't afford to do any upgrades and so I frequently shoot for the lowest common denominator (generally 2005) unless it just doesn't make sense to do so because of some huge performance gain... which usually ISN'T the case.
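For anyone wondering what I mean by a pre-aggregated CROSS TAB, here's a minimal sketch (hypothetical #Sales table and columns). Aggregate first in a derived table, then spread the values across columns with CASE; it does the same job as PIVOT:
SELECT CustomerID,
--Spread the pre-aggregated totals across one column per year
SUM(CASE WHEN SaleYear = 2015 THEN TotalAmount ELSE 0 END) AS Y2015,
SUM(CASE WHEN SaleYear = 2016 THEN TotalAmount ELSE 0 END) AS Y2016
FROM (--Pre-aggregate to one row per customer and year
SELECT CustomerID, SaleYear, SUM(Amount) AS TotalAmount
FROM #Sales
GROUP BY CustomerID, SaleYear
) AS pre
GROUP BY CustomerID;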
--Jeff Moden
Change is inevitable... Change for the better is not.
March 17, 2017 at 10:37 am
Jeff Moden - Friday, March 17, 2017 9:14 AM
adame - Thursday, February 18, 2016 1:38 AM
Agreed, always look to use new features.
Nah...you did good. I agree that there have been some very useful and well thought out new features (ROW_NUMBER and APPLY are my all-time "new feature" favorites, so far) but a lot of the new features have been, ummmm.... no other word for it... "crap" or have led to misuse. My three least favorite "new" features so far are the newer date and time types (including DATETIME2, etc., because they disallow direct temporal math), PIVOT/UNPIVOT (pre-aggregated CROSS TABS are both easier and faster for what I do), and FORMAT (44 times slower than CONVERT), with incremental Recursive CTEs following as a very close second.
Heh... and yeah.... I definitely get the "legacy" stuff. I had to use a crow-bar and jet-propelled pork chop launcher to get our company to finally upgrade from 2005 to 2012 and that was just last year. We actually had the hardware for the migration set up and ready nearly 2 years before that. A lot of people simply can't afford to do any upgrades and so I frequently shoot for the lowest common denominator (generally 2005) unless it just doesn't make sense to do so because of some huge performance gain... which usually ISN'T the case.
Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc. into the mix and you've got a sea-change in the way we code. Magic
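For anyone who never had to do it, here's a minimal sketch of that old pattern (hypothetical table and column names):
--Pre-ROW_NUMBER numbering: SELECT...INTO with the IDENTITY() function
--numbers the rows as they land in the temp table. The ORDER BY was never
--strictly guaranteed to drive the IDENTITY assignment, which is one reason
--ROW_NUMBER was such an improvement.
SELECT IDENTITY(INT, 1, 1) AS RowNum,
OrderID,
OrderDate
INTO #Numbered
FROM #SourceOrders
ORDER BY OrderID, OrderDate;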
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 17, 2017 at 6:44 pm
Nice article. I love window functions - they make for elegant code. My go-to analytical function for year-on-year type queries.
March 17, 2017 at 9:09 pm
Good example of LEAD. As with all things SQL, there are so many approaches to the problem. I do love window functions, and lead/lag are particularly useful. That said, I might have approached this specific problem differently. Mostly I throw this out there just for alternative thoughts, not as a criticism.
One of the things that struck me about the approach outlined is that at the core of the requirement in this case is that we are only interested in the most recent two records for an order. With that in mind, I would have addressed that first, so that the comparison of last value to previous value only has to manage two records per order. You could go so far as to even remove orders with only one record. I would accomplish this with a CTE that uses either ROW_NUMBER or COUNT() OVER () to serially number each set of records over an orderId with desc on date. From that CTE, it's easy to throw a filter on the next select where row_number <= 2, and now you've got a record set that doesn't have any records that don't matter. If you really wanted to get fancy, you could filter further to orders that have more than one record. This would mean the record set that you are ultimately scraping for last value to previous value comparison has precisely 2 records per order. No more and no less.
From there a lead function could be used, or you could self join the table, or you could use the pivot operator and compare columns, or you could group on orderId and compare MAX(value) to MIN(value) - any of which I feel could read cleaner, with varying degrees of performance considerations.
For me, readability is my first consideration, then performance (assuming results are correct, of course). That said, this is what I visualized immediately (not tested)
; WITH ORDERCODES AS
(
SELECT
ID
, ORDER_ID
, CODE
, ORDER_DATE
, REV_ORDER_SEQUENCE = ROW_NUMBER() OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC)
, TOTAL_ORDER_RECORDS = COUNT(1) OVER (PARTITION BY ORDER_ID)
FROM
SAMPLEVALUES
)
, ORDERSINERROR AS
(
SELECT ORDER_ID
FROM ORDERCODES OC
WHERE
OC.REV_ORDER_SEQUENCE <= 2
AND OC.TOTAL_ORDER_RECORDS > 1
GROUP BY ORDER_ID
HAVING MAX(CODE) <> MIN(CODE)
)
SELECT OC.*
FROM
ORDERCODES OC INNER JOIN
ORDERSINERROR OIE
ON OIE.ORDER_ID = OC.ORDER_ID
AND OC.REV_ORDER_SEQUENCE <= 2
ORDER BY
ORDER_ID
, ORDER_DATE
March 18, 2017 at 3:31 pm
ChrisM@Work - Friday, March 17, 2017 10:37 AM
Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc. into the mix and you've got a sea-change in the way we code. Magic
Partition By was pretty easy to solve once you had things in the Temp Table. I don't even have to tell you how because you already know. The reminder is "QU".
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2017 at 4:07 pm
Jeff Moden - Saturday, March 18, 2017 3:31 PM
ChrisM@Work - Friday, March 17, 2017 10:37 AM
Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc. into the mix and you've got a sea-change in the way we code. Magic
Partition By was pretty easy to solve once you had things in the Temp Table. I don't even have to tell you how because you already know. The reminder is "QU".
You know that I know that...
Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.
For better assistance in answering your questions, please read this.
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 22, 2017 at 12:43 pm
ChrisM@home - Saturday, March 18, 2017 4:07 PM
Jeff Moden - Saturday, March 18, 2017 3:31 PM
ChrisM@Work - Friday, March 17, 2017 10:37 AM
Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc. into the mix and you've got a sea-change in the way we code. Magic
Partition By was pretty easy to solve once you had things in the Temp Table. I don't even have to tell you how because you already know. The reminder is "QU".
You know that I know that...
Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.
Could either of you point to an article or description of a "QU solution", please?
bc
March 22, 2017 at 1:17 pm
bc_ - Wednesday, March 22, 2017 12:43 PM
ChrisM@home - Saturday, March 18, 2017 4:07 PM
Jeff Moden - Saturday, March 18, 2017 3:31 PM
ChrisM@Work - Friday, March 17, 2017 10:37 AM
Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc. into the mix and you've got a sea-change in the way we code. Magic
Partition By was pretty easy to solve once you had things in the Temp Table. I don't even have to tell you how because you already know. The reminder is "QU".
You know that I know that...
Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.
Could either of you point to an article or description of a "QU solution", please?
Here's the article explaining it.
http://www.sqlservercentral.com/articles/T-SQL/68467/
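In short, "QU" is the "quirky update": a running-total technique where an UPDATE carries a value from row to row through a variable while walking the clustered index. Here's a minimal sketch of the core pattern (hypothetical #Trans table; the article spells out the rules that make it safe, such as a clustered index in the right order, TABLOCKX and MAXDOP 1):
DECLARE @RunningTotal MONEY = 0;
--The variable accumulates across rows as the UPDATE walks the clustered
--index. Don't use this without reading the article's full set of rules.
UPDATE t
SET @RunningTotal = t.RunningTotal = @RunningTotal + t.Amount
FROM #Trans t WITH (TABLOCKX)
OPTION (MAXDOP 1);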