May 24, 2012 at 2:22 am
Jeff Moden (5/23/2012)
michael vessey (5/23/2012)
try thiswith x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
insert into fooTable
select d.name from sysdatabases d cross join x option (maxrecursion 10000);
Oooooohhhh, be careful, Michael. That's a "Counting rCTE". Read the following article for why that's generally a bad idea.
jeff - i disagree
these kinds of operations are "generally" (and i feel free to use that word since you also used it) one off hits. in this instance it's got a maxrecusrsion of 10k and the time to execute is less than 1 second... if i was trying to shave time off a query then i would start crafting cross joins.... but in this instance ITS REALLY NOT WORTH IT!!!!
i'd lose more time typing than i would save on my code
i'm generally a supporter of the principals you try to get others to follow , but give us a break, for the one off hits like this it just comes across a little preachy and high and mighty (really sorry jeff - i hope you take this with the humour it is intended :-D) with no real world value...
in the Uber environment where i need to squeeze 1ms out of a query i completely agree, but for one off hits i just think it's SQL "Nazi" time.
MVDBA
May 24, 2012 at 2:24 am
oh and a recursive CTE may be an iterative operation, but when cross joined for an insert the "Insert" is not iterative..... it's still a set based operation in the same way that loading a numbers table is iterative, but you use it in a set based way
MVDBA
May 24, 2012 at 2:29 am
...anything that necessarily iterates over 1 row at a time and cannot be made to iterate of more than 1 row at a time (you can read that as processes 1 row at a time for each execution) isn't set based..
hence my insert statement - ----- the only one that affected a table is set based... the rest is CPU spin
sorry - just catching up on all of the posts where you lot crucify me for posting using a peice of code that is QUOTED in the MS documentation
MVDBA
May 24, 2012 at 6:28 am
michael vessey (5/24/2012)
...anything that necessarily iterates over 1 row at a time and cannot be made to iterate of more than 1 row at a time (you can read that as processes 1 row at a time for each execution) isn't set based..
hence my insert statement - ----- the only one that affected a table is set based... the rest is CPU spin
sorry - just catching up on all of the posts where you lot crucify me for posting using a peice of code that is QUOTED in the MS documentation
Not sure what you're talking about. Where did that happen, Michael... on the rCTE? That wasn't a crucifixtion if that's what you're talking about.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2012 at 6:37 am
michael vessey (5/24/2012)
Jeff Moden (5/23/2012)
michael vessey (5/23/2012)
try thiswith x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
insert into fooTable
select d.name from sysdatabases d cross join x option (maxrecursion 10000);
Oooooohhhh, be careful, Michael. That's a "Counting rCTE". Read the following article for why that's generally a bad idea.
jeff - i disagree
these kinds of operations are "generally" (and i feel free to use that word since you also used it) one off hits. in this instance it's got a maxrecusrsion of 10k and the time to execute is less than 1 second... if i was trying to shave time off a query then i would start crafting cross joins.... but in this instance ITS REALLY NOT WORTH IT!!!!
i'd lose more time typing than i would save on my code
i'm generally a supporter of the principals you try to get others to follow , but give us a break, for the one off hits like this it just comes across a little preachy and high and mighty (really sorry jeff - i hope you take this with the humour it is intended :-D) with no real world value...
in the Uber environment where i need to squeeze 1ms out of a query i completely agree, but for one off hits i just think it's SQL "Nazi" time.
It's ok to disagree but, Yes... it's always really worth it with Counting rCTEs. The reason why is that someone who isn't quite aware will use your rCTE code on something bigger and get into a whole lot of trouble. Besides, why practice doing something wrong even if it's one off? Heh... and the reason why it sounds a little preachy is because some of you come off as being a little preachy in the opposite direction. Justification of bad code because of a low row count is a bad justification.
I strongly resent your implications with the word "Nazi". Let's keep this civil and straight instead of even coming close to ad hominem attacks or using such ugly words. I can take a whole lot but I'm not going to take anything that even comes close to abuse.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2012 at 6:53 am
michael vessey (5/24/2012)
oh and a recursive CTE may be an iterative operation, but when cross joined for an insert the "Insert" is not iterative..... it's still a set based operation in the same way that loading a numbers table is iterative, but you use it in a set based way
I agree that the input is set based once the rCTE has produced a result set but, No... the overall query not set based. If it were, it wouldn't need to do all of the extra reads that it does.
SET STATISTICS IO ON;
with x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
insert into fooTable
select d.name from sys.databases d cross join x option (maxrecursion 10000);
insert into fooTable
select d.name from sys.databases d cross join dbo.Tally t WHERE t.N BETWEEN 1 AND 10000
SET STATISTICS IO OFF;
Table 'fooTable'. Scan count 0, logical reads 130392, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysdbreg'. Scan count 1, logical reads 20001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 60001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(130000 row(s) affected)
Table 'fooTable'. Scan count 0, logical reads 130392, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysdbreg'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(130000 row(s) affected)
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2012 at 6:58 am
Hi, OP here. Been following the thread over the last day and you guys have provided plenty of examples I intend to test and read up on (rCTEs for example).
I've also got DD v1 and going to read Hugo's article tonight.
Just wanted to say thanks for the info TO EVERYONE - but also can we stop the bunfight. No need for bad language - we've gone from a theoretical discussion on different ways of implementing set based iteration to Defcon 1. The humble OP is sitting in the corner over here, watching the fight, and feeling a bit guilty...
Cheers.
---
Note to developers:Want to get the best help? Click here https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help (Jeff Moden)
My blog: http://uksqldba.blogspot.com
Visit http://www.DerekColley.co.uk to find out more about me.
May 24, 2012 at 7:35 am
I strongly resent your implications with the word "Nazi". Let's keep this civil and straight instead of even coming close to ad hominem attacks or using such ugly words. I can take a whole lot but I'm not going to take anything that even comes close to abuse.
jeff - trust me , this is not abuse, i am purely trying to convey that there are people who say "YOU MUST DO THIS BECAUSE IT IS THE UBER WAY" (i did not say you were doing that either) but they don't take into account the scenario you are in. I re-read my posts , and i cannot find anything abusive in them.
i find that teaching people to do it the correct way is fine, but sometimes you just have to get down and dirty
hence the use of smileys in my post, i was attempting to convey humour...
my apologies sincerely if you felt offended by my terminology
MVDBA
May 25, 2012 at 6:37 pm
michael vessey (5/24/2012)
I strongly resent your implications with the word "Nazi". Let's keep this civil and straight instead of even coming close to ad hominem attacks or using such ugly words. I can take a whole lot but I'm not going to take anything that even comes close to abuse.
jeff - trust me , this is not abuse, i am purely trying to convey that there are people who say "YOU MUST DO THIS BECAUSE IT IS THE UBER WAY" (i did not say you were doing that either) but they don't take into account the scenario you are in. I re-read my posts , and i cannot find anything abusive in them.
i find that teaching people to do it the correct way is fine, but sometimes you just have to get down and dirty
hence the use of smileys in my post, i was attempting to convey humour...
my apologies sincerely if you felt offended by my terminology
My apologies, as well. It would appear that you and I are of the same Ilk then and I misread your post. When people post solutions or recommendations with "UBER" claims of performance or best practices, I tend to put on war paint and a postal uniform. It really ticks me off especially when some of those claims are so bloody wrong.
Thank you very much for taking the time to clear up my misunderstanding of what you meant, Michael. Still... I'd lose that "N" word in your descriptions especially in threads addressed to me specifically.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2012 at 8:10 pm
michael vessey (5/24/2012)
I strongly resent your implications with the word "Nazi". Let's keep this civil and straight instead of even coming close to ad hominem attacks or using such ugly words. I can take a whole lot but I'm not going to take anything that even comes close to abuse.
jeff - trust me , this is not abuse, i am purely trying to convey that there are people who say "YOU MUST DO THIS BECAUSE IT IS THE UBER WAY" (i did not say you were doing that either) but they don't take into account the scenario you are in. I re-read my posts , and i cannot find anything abusive in them.
i find that teaching people to do it the correct way is fine, but sometimes you just have to get down and dirty
hence the use of smileys in my post, i was attempting to convey humour...
my apologies sincerely if you felt offended by my terminology
Ok... putting my money where my mouth is, let me show you that certain "UBERisms" really aren't "UBERisms" and why I disagree with your statements.
First, I agree that it's a "one off" and that you can take certain liberties with such one off code... but using an rCTE to count by one isn't a "liberty"... it's a real problem. The problem is that there are thousands of posts that show Counting rCTE's as a "simple" way to do things and thousands of other people (not to mention all the innocent "lurkers" that also think so) have posted responses to such code offerings as "Really good code . I've put this into production." and the like. So what was meant to be "one off" code has become production code for many and they're going to pay dearly for making that mistake. It's not their fault. They just don't know any better. After all, who would ever publish anything bad on the internet especially if someone is smart enough to build their own blog? :sick:
Second, the one off counting rCTE in this case (or any case, for that matter) doesn't just make a second or two of difference. It makes a LOT of difference. One of the major problems with production systems is rank amateurs making one off queries to satisfy some need. While one person making such a query usually won't cripple a server, having dozens of people using such one off queries will. It's been such a problem that Microsoft actually tried to make it possible to put limits on such queries by creating "resource governor". This query only gens 130,000 rows but I've seen such one off queries that generate much more than that along with a whole lot more resource usage and duration. So it's important that people learn the right way to do things even with one off queries.
Here's the rCTE for this problem along with the "UBER" code. Run it and see what happens.
PRINT '========== Recursive CTE method ======================================'
SET STATISTICS TIME,IO ON;
CREATE TABLE fooTable (data VARCHAR(8000) );
with x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
insert into fooTable
select d.name from sys.databases d cross join x option (maxrecursion 10000);
SET STATISTICS TIME,IO OFF;
DROP TABLE dbo.fooTable;
GO
PRINT '========== The "UBER" method ;-) ====================================='
SET STATISTICS TIME,IO ON;
SELECT d.name
INTO dbo.fooTable
FROM sys.databases d,
(
SELECT TOP (10000) 1 ID
FROM sys.all_columns ac1,
sys.all_columns ac2
)x
;
SET STATISTICS TIME,IO OFF;
DROP TABLE dbo.fooTable;
GO
Actually, I'll save you some time. Here are the results executed on a simple 2.2Ghz 64 bit dual processor box with 4GB of RAM.
========== Recursive CTE method ======================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 49 ms.
Table 'fooTable'. Scan count 0, logical reads 130405, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysdbreg'. Scan count 1, logical reads 20001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 60001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1201 ms, elapsed time = 1462 ms.
(130000 row(s) affected)
========== The "UBER" method ;-) =====================================
Table 'Worktable'. Scan count 1, logical reads 20025, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysdbreg'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolrdb'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 110 ms, elapsed time = 430 ms.
(130000 row(s) affected)
Breaking that down, the rCTE really did only take a second longer. So what's the big deal? The big deal is that I'd like to know just exactly why you would ever intentionally write code that took longer even for a one off?
Oh... I forgot. Let me answer that for you using your own words. "in this instance ITS REALLY NOT WORTH IT!!!!" and you "justified" that statement with your other statement of "i'd lose more time typing than i would save on my code". Ok. I'd almost buy that if it were true. Lets see...
The following code has 187 characters in it not including spaces.
CREATE TABLE fooTable (data VARCHAR(8000) );
with x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
insert into fooTable
select d.name from sys.databases d cross join x option (maxrecursion 10000);
Ah... let's change that code so we don't have to worry about creating the table separately just to make all things equal... The following code is now only 146 characters not including any spaces.
with x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
select d.name INTO dbo.fooTable from sys.databases d cross join x option (maxrecursion 10000);
That change actually helped make the rCTE even faster.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
========== Recursive CTE method with INTO ======================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'sysdbreg'. Scan count 1, logical reads 20001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 60001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 437 ms, elapsed time = 464 ms.
(130000 row(s) affected)
But, it's still about 4 times slower than the "UBER" code. Speaking of which, let's see how the "UBER" code compares.
The following "UBER" code has 111 characters in it not including spaces.
SELECT d.name
INTO dbo.fooTable
FROM sys.databases d,
(
SELECT TOP (10000) 1 ID
FROM sys.all_columns ac1,
sys.all_columns ac2
)x
;
Let's tally up the results. The "UBER" code is about 4 times faster, is less resource intensive, and is 35 characters shorter to type than the rcte code.
How is that "NOT worth it"???
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2012 at 9:21 pm
SQL Kiwi (5/24/2012)
Jeff Moden (5/24/2012)
Pro Tip: Reading the existing replies to a thread can help prevent posting duplicate information.
Regarding this point. You know yourself that you often rock up to a long thread, replying in order one at a time without reading all the way through first. It's great for the post count, of course, but try the following Google search: "site:sqlservercentral.com jeff moden heh wish read whole thread" < I got 1,120 results 😛
I did say that I need to learn not to do it but Jeez, Paul... You of all people should know I don't do this stuff for a post count. Besides, I'm not the only one whose ever posted an answer for something that's already been answered.
[EDIT] You really need to read some of those supposed hits. Most of them are based on words you've included and have nothing to do with me saying that I wish I read the whole thread. In fact, there's a whole lot of those posts you found that aren't even mine. 😉
Then do me the courtesy of reading my responses before getting on your anti-rCTE soapbox, and talking to me like I just picked up a SQL manual yesterday. And keep the bad language in the box where it belongs.
Ah... ok. I see what you're calling "bad language". I took strong except to your "Pro Tip" and used the word CRAP, told you that I thought you were being "snarky" because you said it, and equated your tip to the ring knocking stuff a lot of people resort to during debates instead of sticking to the subject. Perhaps we're both out of line but I do have the right to defend myself when someone takes a shot at me.
According to the nightmares I've seen in production code at many companies and the code by people on hundreds of posts and blogs who praise the "efficiency" of "counting rCTEs", I don't appreciate it when you belittle my efforts to educate people about such a danger by callig it "soapbox" even if we are in a moment of disagreement.
And, YES, I did read your responses... you said that something can still be set based even if it's a set of one. I thoroughly agree with that but not in the classic sense where rCTE's are concerned and called you on it. A single query like an rCTE doesn't make something set based in the classic sense of the term as it relates to SQL any more than a triangular join does. You even agreed that's true of triangular joins in another post. Are they actually "set based" in the purest sense of the term? Absolutely! They create a new set for each and every aggregation. The problem is that the sets they usually include also include thousands and millions of copies of the same data much like a bubble sort. Even a bubble sort is "set based" but not in the classic sense of the term as it relates to SQL.
SQL Kiwi (5/24/2012)
Jeff Moden (5/24/2012)
...anything that necessarily iterates over 1 row at a time and cannot be made to iterate of more than 1 row at a time (you can read that as processes 1 row at a time for each execution) isn't set based. Hugo's update was set based much like a while loop operating on different levels of a hierarchy is set based. It's capable of handling more than 1 row at a time for each iteration.The following recursive counting CTE is set-based then?
WITH rCTE AS
(
SELECT n = 1
UNION ALL
SELECT rCTE.n
FROM rCTE,
(
VALUES
(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1)
) AS V (v)
)
SELECT TOP (10000)
ROW_NUMBER() OVER (ORDER BY (SELECT PI()))
FROM rCTE
ORDER BY
ROW_NUMBER() OVER (ORDER BY (SELECT PI()))
OPTION (MAXRECURSION 0);
It's certainly an unusual way of doing a cross join, but yes... I consider that to be set based. And, no, I don't consider it to be a "counting" rCTE. I'd also spend some time educating a developer that wrote such a thing.
Then do me the courtesy of reading my responses before getting on your anti-rCTE soapbox, and talking to me like I just picked up a SQL manual yesterday. And keep the bad language in the box where it belongs.
No problem... Keep your "pro tips" in that same box, please.
Now that we've each had a turn swinging at each other and have bloodied each other's lip a bit, how about we drop all of this, raise a beer to each other (ouch... you do throw a good punch! 🙂 ), and call it a terrible misunderstanding between friends. At the very least, lets agree that the two of us are calling two different things "set based" and move on.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2012 at 11:12 pm
Jeff Moden (5/25/2012)
It's certainly an unusual way of doing a cross join, but yes... I consider that to be set based. And, no, I don't consider it to be a "counting" rCTE.
And this one?
WITH rCTE AS
(
SELECT n = 1
UNION ALL
SELECT
rCTE.n + F1.f
FROM rCTE
CROSS JOIN
(
VALUES
(01),(02),(03),(04),
(05),(06),(07),(08),
(09),(10),(11),(12)
) AS F1 (f)
)
SELECT TOP (10000)
rCTE.n
FROM rCTE
OPTION (MAXRECURSION 0);
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 25, 2012 at 11:34 pm
Which definition shall I use? The mathematically correct one or the classic SQL one?
Mathematically, yes.
Classic SQL server wise, no. For the same reason the following isn't.
[Edit] Actually, I'm not sure on this one... I've got some more checking to do.
[Edit 2] Either way, it's still a heck of a lot slower and more resource intensive than the equivalent "UBER" code. That's makess all of this pretty much a moot point to me because I wouldn't use your rCTE example for this and I wouldn't recommend it to anyone else for use.
[Edit 3] I'm also going to have to take back what I said about the first rCTE you posted for your questions to me. After looking at it, yes... it's an unusual way to do the equivalent of a cross join. But it's still doing multiple "executions" and making very small row count estimations in the execution plan. I guess you could call it set based in the classic SQL sense because each resursive iteration is processing a set, but man it's slow compared to a real cross join. Perhaps instead of calling it "set based", it should be called "Lasagna" code.
WITH rCTE AS
(
SELECT n = 1
UNION ALL
SELECT
rCTE.n + F1.f
FROM rCTE
CROSS JOIN
(
VALUES
(01)
) AS F1 (f)
)
SELECT TOP (10000)
rCTE.n
FROM rCTE
OPTION (MAXRECURSION 0);
Your turn. Is the following Set based or not? Please use both definitions in your response.
with x (id) as
(select 1 as id
union all
select ID+1 from X as id
where id<10000
)
select d.name from sys.databases d cross join x option (maxrecursion 10000);
--Jeff Moden
Change is inevitable... Change for the better is not.
May 26, 2012 at 12:11 am
Removed because I'm not actually sure about your spots of code yet. Please see my edit above.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 26, 2012 at 2:01 am
The difficulty you are having here is exactly why I said the meaning of 'set-based' is a bit woolly, in response to Sean saying that recursive CTEs are not set-based.
How do you define "set-based"? If each iteration processes a set (even if the set has one member) then in what sense is it not set-based? Should a "set-based" query be a single statement (the recursive CTE passes that test, the WHILE loop does not)? Or should "set-based" exclude any physical execution plan that includes a nested loops join? It all seems a bit woolly to me.
Perhaps you missed the humour in Sean's reply ('sheepishly' re: 'woolly'), but the point is that this is what started this debate, before your first post. As you can see above (especially if you read the thread back in context), I deliberately did not give a definition of 'set-based'; my reply to Sean was to explore what I see as a flaw in his reasoning.
It seems to me that 'set-based' is a purely logical concept, rooted in relational and set theory. Certainly it is possible to program procedurally in T-SQL, using WHILE loops and cursors, but any logical SQL query defines a logical set-based operation, by definition. Recursive CTEs, in particular, express a logical set-based requirement, they say nothing about how it should be implemented. Extending the meaning of 'set-based' into the physical domain therefore seems flawed.
Anyway, another recursive counting CTE to bend your mind around:
WITH Digits AS
(
SELECT
D.digit
FROM
(
VALUES
(1),(2),(3),(4),(5),
(6),(7),(8),(9),(10)
) AS D (digit)
),
rCTE AS
(
SELECT
n = d.digit
FROM Digits AS d
UNION ALL
SELECT
rCTE.n * 10 + d.digit
FROM rCTE
CROSS JOIN Digits AS d
WHERE rCTE.n < 1000
)
SELECT n
FROM rCTE;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 31 through 45 (of 46 total)
You must be logged in to reply to this topic. Login to reply