September 19, 2020 at 4:36 pm
What is the minimal way to create a tally table/tvf/cte and to describe its advantages (if any) versus recursive cte's? For the past several weeks I've observed most (all) of the SQL Server questions and answers on Stack Overflow. Over and over and over the questions require some form of sequence generation. Over and over and over the tally based solutions almost always lose to a recursive cte based solution. It's very frustrating for the Answer-ers (like me) offering the tally based solutions. What can be done about this?
Here's a CTE based tally approach I've been using. However, analysis of the execution plan shows it's horribly inefficient.
drop table if exists #a;
go
create table #a(
id int,
[start_date] date,
[end_date] date,
a int,
b int);
insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);
/* this code is horribly inefficient */
;with
n(n) as (
select * from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(n)),
tally_cte(n) as (
select row_number() over (order by (select null))
from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select id, dateadd(d, n-1, t.[start_date]) dt, a, b
from #a t
cross apply
tally_cte tc
where tc.n<=datediff(d, t.[start_date], t.[end_date])+1
order by 1, 2;
Here's the least "scary" looking numbers tvf I've been able to come up with. Does anyone have anything more succinct? I'm calling it fnNumbers because I think "tally" is a scary word for noobs to grok. Sorry, maybe that's not true but I'm trying to eliminate all possible objections. fnNumbers seems simpler.
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
@zero_or_one bit,
@n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go
He he he, now you got me started, make some jugs of coffee;-)
😎
September 19, 2020 at 8:07 pm
Ok, here we go...
The code that you originally wrote generates the full range of values that the inline cascading CTE for the Tally cte can generate, and it does so for every row that you CROSS APPLY it to. That's the inefficiency: there's no "Row Goal" for each iteration until after all of the Tally rows have been generated. Although you limit the use of each result in the WHERE clause, the WHERE clause simply occurs too late in the mix and still allows the Tally cte to generate all of the rows it's capable of generating for every row in the source table.
If you were using an fnTally function in the CROSS APPLY, the "Row Goal" is passed to the function as the 2nd operand and that's a part of the reason that an fnTally function works so well (although there can be a problem when working with larger numbers and the system "goes parallel" and the optimizer loses its mind and generates all 4 billion rows, which I have a couple of easy fixes for).
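For anyone following along, the usage pattern being described would look something like the sketch below. It assumes the two-parameter signature of the fnTally function published on SSC (dbo.fnTally(@ZeroOrOne, @MaxN)), applied to the #a test table from the first post:

```sql
-- Sketch only: assumes dbo.fnTally(@ZeroOrOne, @MaxN) as published on SSC.
-- The "Row Goal" (the DATEDIFF) is passed INTO the function as its 2nd operand,
-- so each source row generates only the rows it actually needs.
SELECT src.id
      ,dt = DATEADD(dd, t.N, src.start_date)
      ,src.a
      ,src.b
  FROM #a src
 CROSS APPLY dbo.fnTally(0, DATEDIFF(dd, src.start_date, src.end_date)) t
;
```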
Jeff Williams had the right idea of early setting of a "row goal" but his method still over-calculates what the number of rows will be for anything but the maximum range of dates in the rows of the source tables.
There's an easy fix for inlining the Tally cte... you simply have to treat it as if it were a function and pass a TOP to each iteration of its use by the CROSS APPLY, just like what would happen if it were an iTVF (Inline Table Valued Function). Remember that you must NOT treat the Tally cte as if it were a table because it's not... it's a data generator and it has to generate data for each row in the source table. Treating it as if it were a table (in other words, using a WHERE clause against it instead of a TOP in the CROSS APPLY) is a mistake that even some pros make and is the reason why so many people incorrectly think a Tally table can be beaten with cruddy code such as that found in an rCTE.
Here's some commented code that works much better to solve the problem you posted... Since you're also doing this for DATE columns, you might as well go "whole hawg" and make sure that it can handle the full range of the DATE datatype and that also means that you have to use DATEDIFF_BIG() instead of just DATEDIFF().
WITH
E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))E0(N)),
Tally(N) AS (--==== Potentially handle up to 3,652,059, which is the number of
-- days in the full range of the DATE datatype.
-- (Scalability Bullet-Proofing handles up to 10,000,000 - 1)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM E1 n1, E1 n2 ,E1 n3, E1 n4, E1 n5, E1 n6, E1 n7
)
SELECT src.id, dt = DATEADD(d, t.N, src.start_date), src.a, src.b
FROM #a src
CROSS APPLY (--==== You must treat the Tally cte as if it were a function that
-- needs TOP in it to prevent massive number generation overruns.
-- You must not treat the Tally cte as if it were a table because
-- it doesn't act like a table. It's called once for each row in
-- the source table.
SELECT TOP (DATEDIFF_BIG(d, src.start_date, src.end_date)+1)
N
FROM Tally cte
)t(N)
;
Without the comments, you can see how relatively short the code is but, on forums like StackOverflow, people will learn nothing from it simply because they didn't know enough about how to solve the original problem, and so they're not going to understand the solution unless they analyze it, and we know how often they'll do that.
WITH
E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))E0(N)),
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM E1 n1, E1 n2 ,E1 n3, E1 n4, E1 n5, E1 n6, E1 n7)
SELECT src.id, dt = DATEADD(d, t.N, src.start_date), src.a, src.b
FROM #a src
CROSS APPLY (SELECT TOP (DATEDIFF_BIG(d, src.start_date, src.end_date)+1) N FROM Tally cte) t(n)
;
Heh... I also hate underscores and brackets and find it difficult to believe that MS reserved "start_date" and "Days" as reserved words.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2020 at 9:46 pm
Ah... almost forgot... What to do about the folks that claim that performance doesn't matter if you know that there will always only be a small number of rows involved and so just several milliseconds of difference doesn't matter? Even you are thinking the same thing.
To put it in human terms as a bit of a simile, would you drink your tap water if you knew there was only a "little" lead in it?
Now, let's put it in terms of code. Here's the setup code from your SQLFiddle "1000" example...
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
@zero_or_one bit,
@n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go
drop function if exists [dbo].[fnNumbersR];
go
create function [dbo].[fnNumbersR](
@zero_or_one smallint,
@n bigint)
returns table with schemabinding as return
with numbers(n) as (
select cast(@zero_or_one as int) n
union all
select n+1
from numbers
where n<@n)
select n from numbers;
go
drop table if exists performance_metrics;
go
create table performance_metrics(
pm_id int identity(1, 1) primary key not null,
test_item nvarchar(256) not null,
start_dt datetime2(7) not null,
end_dt datetime2(7) null,
ms_diff as datediff(ms, start_dt, end_dt) persisted);
go
... and, here's your test code from the same SQLFiddle...
set nocount on;
/* begin perf test of fnNumbers */
insert performance_metrics(test_item, start_dt, end_dt)
output inserted.*
values ('fnNumbers', sysutcdatetime(), null);
/* count rows using fnNumbers */
select count(*) from dbo.fnNumbers(1, 1000);
/* end perf test of fnNumbers */
update performance_metrics
set end_dt=sysutcdatetime()
output concat(inserted.test_item, ' ran in ', inserted.ms_diff, ' milliseconds')
where test_item='fnNumbers';
/* begin perf test of fnNumbersR */
insert performance_metrics(test_item, start_dt, end_dt)
output inserted.*
values ('fnNumbersR', sysutcdatetime(), null);
/* count rows using fnNumbersR */
select count(*) from dbo.fnNumbersR(1, 1000) option (maxrecursion 0);
/* end perf test of fnNumbersR */
update performance_metrics
set end_dt=sysutcdatetime()
output concat(inserted.test_item, ' ran in ', inserted.ms_diff, ' milliseconds')
where test_item='fnNumbersR';
Last but not least, here are the results from running that test code on my laptop...
Heh... big deal, right? The difference is "only" 5 milliseconds, right? Why would anyone squawk about such a thing?
The answer is because it's stupid to drink water that you know has lead in it. 😀
But, there's more and you never see it because you're only measuring duration. Let's change the test a bit so we can see what's missing. It's still your code but I took out your measurement code...
/***** count rows using fnNumbers ******************************************************************/
select count(*) from dbo.fnNumbers(1, 1000);
GO 5
--@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
GO
/***** count rows using fnNumbersR *****************************************************************/
select count(*) from dbo.fnNumbersR(1, 1000) option (maxrecursion 0);
GO 5
Now, let's see what SQL Profiler says about the code (I have it set up to measure duration in microseconds instead of milliseconds; CPU is still measured in milliseconds)...
Ahhhhhhhhhh... so it's not actually a 1 ms to 6 ms difference, which would only be 5 times faster or so (which is still very substantial, IMHO). It's actually almost two orders of magnitude faster (about 97 times faster, in this case).
Ok... with that in mind, imagine if all the little stuff in your server ran 97 times faster! Talk about "getting the lead out"!
Whup... wait a minute... there's some arsenic in the water to go with that "little" bit of lead. Look at the reads for that bloody recursive CTE. Those are logical reads which means that not only does the rCTE unnecessarily use both CPU and a whopping 2 orders of magnitude more time (duration), but it also has to do reads from memory. The Tally based function doesn't use any!
How bad is it? There are 8KB in a single page read and, it turns out, there are 128 pages in a megabyte and soooooo... the rCTE is doing 9000/128 or 70.3125 megabytes of reads for every bloody run of the rCTE! Just a piddly 10 runs would mean 703 megabytes of reads. In fact, it only takes 15 runs of the rCTE to produce more than a GigaByte of reads from memory. That's just a silly, ignorant, and stupid waste of one of the most valuable resources there is on SQL Server.
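As a quick sanity check on the page math (8KB per page, 128 pages per megabyte):

```sql
-- 9000 logical reads per run at 8KB per page, 128 pages per megabyte
SELECT mb_per_run  = 9000 / 128.0                  -- 70.3125 MB per run
      ,mb_10_runs  = 10 * 9000 / 128.0             -- 703.125 MB for 10 runs
      ,runs_to_1gb = CEILING(1024 * 128.0 / 9000)  -- 15 runs to exceed 1GB
;
```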
Now, just imagine all the junk that runs on a busy server and imagine how much better it would run if all the "little" stuff ran two orders of magnitude faster and used 70MB less memory I/O on every single run.
That's a problem with a lot of Developers... they only look at one piece of code at a time and make silly justifications about the supposedly insignificant differences milliseconds make instead of taking the time to do it right. How much extra time does it take? If you become practiced in doing it right, it actually takes less time to do it right than to do it wrong. For example, it takes a whole lot more time to write an rCTE correctly than it does to write code to use a Tally function. The same can be true even if you have to write out an inline version because, if you're practiced at it, you have it in a code library that you can copy and paste from. Even if you have to type it out, it doesn't take long to do it right. Maybe an extra minute.
If people don't buy that (and a lot won't ever buy that), then show them how even a well written (has a transaction around the looping part) WHILE Loop blows the doors off of an rCTE especially where memory reads are concerned.
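For reference, the "well written" WHILE loop being described (with an explicit transaction wrapped around the looping part) might be sketched like this; the table and variable names are just illustrative:

```sql
-- Sketch only: a WHILE loop number generator with the looping part inside
-- one explicit transaction. Without it, every INSERT commits (and logs)
-- individually, which is a big part of why naive loops test so badly.
DECLARE @N    INT = 1
       ,@MaxN INT = 1000;

DROP TABLE IF EXISTS #Numbers;
CREATE TABLE #Numbers (N INT NOT NULL);

BEGIN TRANSACTION;
    WHILE @N <= @MaxN
    BEGIN
        INSERT #Numbers (N) VALUES (@N);
        SELECT @N += 1;
    END;
COMMIT TRANSACTION;
```

Even with that fix, the loop still loses to a Tally based approach; the point is only that it tends to beat the rCTE, especially on memory reads.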
To me, it's just stupid to write code that you know has at least two built in performance problems. To try to justify it based on low row counts might mean you been drinking water with lead in it. 😉
Like Granny used to say (and it's still true), "If you mind the pennies, the dollars will take care of themselves".
--Jeff Moden
Change is inevitable... Change for the better is not.
September 20, 2020 at 12:19 am
Ok here are two recent winning recursive cte's. Be gentle when evaluating the opposing answers 🙂
https://stackoverflow.com/questions/63922869/get-an-interval-of-dates-from-a-range-of-dates/63923536
https://stackoverflow.com/questions/63919706/fill-missing-months-on-a-date-query/63920844
As a bonus, here losing while using an ordinal splitter
Thank you for posting those links.
Heh... lordy. One guy even wrote a WHILE Loop as a solution on one of those. And then if you look at some of the methods on other threads that are absolutely spot on, you have a bunch of naysayers in comments that are naysaying because they just don't get it.
It's one of the many reasons why I usually avoid SO.
In the first one, the OP accepted an answer that doesn't meet his posted expected results. You also didn't meet the requirements stated in the post. Of course, the OP might be out in left field with his request, as well. Why would you provide a given start date if you really want to start on the day after? Maybe because it's the last date in a previous series but the OP never states that.
The first two do make some pretty good interview questions, though. I already ask one similar to the second one when I'm conducting interviews.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 20, 2020 at 1:15 pm
Ok here are two recent winning recursive cte's. Be gentle when evaluating the opposing answers 🙂
https://stackoverflow.com/questions/63922869/get-an-interval-of-dates-from-a-range-of-dates/63923536
https://stackoverflow.com/questions/63919706/fill-missing-months-on-a-date-query/63920844
As a bonus, here losing while using an ordinal splitter
Thank you for posting those links.
Heh... lordy. One guy even wrote a WHILE Loop as a solution on one of those. And then if you look at some of the methods on other threads that are absolutely spot on, you have a bunch of naysayers in comments that are naysaying because they just don't get it.
It's one of the many reasons why I usually avoid SO.
In the first one, the OP accepted an answer that doesn't meet his posted expected results. You also didn't meet the requirements stated in the post. Of course, the OP might be out in left field with his request, as well. Why would you provide a given start date if you really want to start on the day after? Maybe because it's the last date in a previous series but the OP never states that.
The first two do make some pretty good interview questions, though. I already ask one similar to the second one when I'm conducting interviews.
All of the code posted on these three SO links would graciously fail my code review and if presented to me in an interview, I would suggest to the interviewee that he might be a better fit for a paper delivery role.
😎
September 20, 2020 at 3:25 pm
Ah... almost forgot... What to do about the folks that claim that performance doesn't matter if you know that there will always only be a small number of rows involved and so just several milliseconds of difference doesn't matter? Even you are thinking the same thing.
Maybe I was looking for an excuse not to just switch to "what works". My points per answer are the lowest of all the people on the list, which means I'm working harder for less outcome. To Eirikur's point, the code I posted wasn't great, which I admit, and I did feel like I'd lost the performance baseline vs fnTally (which is what I'd like to use, or an equivalent). That's why I'm looking for better ways. Without a tested pattern it's been a struggle. Notice the timestamps on those 2 links: 3 minutes and 6 minutes vs mine, which were in the 20-30 minute range. That's a big factor because fiddling around with cte's without a good pattern takes time, as does observing the lack of performance without a row goal in the code... and then it has to be explained. The rCTE'ers seem to just have to mention the name of the technique.
What you've done by analyzing the code in the Fiddle is meaningful and amazing. Thank you. For many other things too, thank you. This could put me on the right track and provides something to work with.
Here's my analogy. People love eating hamburgers. Most people, however, do not care to know how the meat patties are made. No amount of interest piquing is going to induce any curiosity about grinding up cows. Now I'm a hamburger salesman. I sell gourmet hamburgers that take 30 minutes to prepare. When I sell my hamburgers, I'm standing next to Ronald McDonald who sells hot, fresh hamburgers as fast as people can shovel them in their mouths. How reasonable is it to expect hungry people to ignore hot, fresh hamburgers while their gourmet burger is prepared? I really don't want to make Big Mics at McDowell's (Coming to America) by switching approaches. So why would someone buy a hamburger from me? I think my hamburgers need to come with some sort of demonstrable proof of quality. And if the gourmet hamburgers are made in a factory (using an itvf) then they can be made at about the same speed as fast food.
I'm going to analyze your analysis and try to bake as much as I can into the Fiddles. I had forgotten the 'go 20' trick. I think the sys tables and dm views are accessible in Fiddle. SSC and SO answers which use the itvf fnNumbers should come with a prepared explanation with performance claims, links to Fiddles, and links to the original articles for people who want to know how the patties are made. I'll post here what I come up with and update. Thanks Jeff.
Also, as far as some answers being accepted when perhaps less than fully addressing the question being asked, that's not an isolated occurrence as I have many examples which are even more curious. But imo it's just cannon fodder and not worth worrying about.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 20, 2020 at 4:31 pm
Heh... lordy. One guy even wrote a WHILE Loop as a solution on one of those.
The WHILE loopers always think everyone else is crazy. That person's query also contained a RIGHT OUTER JOIN. When was the last time you actually saw one of those outside a textbook?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 20, 2020 at 6:55 pm
That person's query also contained a RIGHT OUTER JOIN. When was the last time you actually saw one of those not in a textbook?
About 10 minutes ago. I use them quite a bit.
FROM dbo.SomeSmallTable sst
RIGHT JOIN dbo.SomeBridgeTable bst ON bst.SomeID = sst.SomeID
LEFT JOIN dbo.SomeLargerTable slt ON bst.SomeID = slt.SomeID
;
Most folks would insist that no Outer Joins are necessary here. They'd be correct, but it lets the reader more easily understand the bridge table and, on occasion, it has helped the optimizer do things a little better.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 20, 2020 at 7:06 pm
Jeff Moden wrote: Ah... almost forgot... What to do about the folks that claim that performance doesn't matter if you know that there will always only be a small number of rows involved and so just several milliseconds of difference doesn't matter? Even you are thinking the same thing.
Maybe I was looking for an excuse not to just switch to "what works". My points per answer are the lowest of all the people on the list, which means I'm working harder for less outcome. To Eirikur's point, the code I posted wasn't great, which I admit, and I did feel like I'd lost the performance baseline vs fnTally (which is what I'd like to use, or an equivalent). That's why I'm looking for better ways. Without a tested pattern it's been a struggle. Notice the timestamps on those 2 links: 3 minutes and 6 minutes vs mine, which were in the 20-30 minute range. That's a big factor because fiddling around with cte's without a good pattern takes time, as does observing the lack of performance without a row goal in the code... and then it has to be explained. The rCTE'ers seem to just have to mention the name of the technique.
What you've done by analyzing the code in the Fiddle is meaningful and amazing. Thank you. For many other things too, thank you. This could put me on the right track and provides something to work with.
Here's my analogy. People love eating hamburgers. Most people, however, do not care to know how the meat patties are made. No amount of interest piquing is going to induce any curiosity about grinding up cows. Now I'm a hamburger salesman. I sell gourmet hamburgers that take 30 minutes to prepare. When I sell my hamburgers, I'm standing next to Ronald McDonald who sells hot, fresh hamburgers as fast as people can shovel them in their mouths. How reasonable is it to expect hungry people to ignore hot, fresh hamburgers while their gourmet burger is prepared? I really don't want to make Big Mics at McDowell's (Coming to America) by switching approaches. So why would someone buy a hamburger from me? I think my hamburgers need to come with some sort of demonstrable proof of quality. And if the gourmet hamburgers are made in a factory (using an itvf) then they can be made at about the same speed as fast food.
I'm going to analyze your analysis and try to bake in as much as I can into the Fiddles. I had forgotten the 'go 20' trick. I think the sys tables and dm views are accessible in Fiddle. SSC and SO answers which use the itvf fnNumbers should come with a prepared explanation with performance claims, links to Fiddles, and links to the original articles for people who want to know how the paddies are made. I'll post here what I come up with and update. Thanks Jeff.
Also, as far as some answers being accepted when perhaps less than fully addressing the question being asked, that's not an isolated occurrence as I have many examples which are even more curious. But imo it's just cannon fodder and not worth worrying about.
If you look at my answer count vs. the number of points I have and the position I have on Stack Overflow, you'll see that quantity doesn't matter much and, even if it did, I'm simply not going to get sucked into the SO "points matter" mindset because it's stupid. People posting the incremental rCTE solutions are doing a lot of damage to those that don't know better, which is most of the people that ask questions on such forums. All they want is ANY solution that works. Most don't give two hoots in hell about performance.
It's why I don't frequent SO very much... it gives me cramps. 😀 Ed Wagner (good friend, fellow DBA, and valued heavy hitter on SSC) recently told me about the time he had kind of a long post explaining the best way to do something... some of the moderators removed the post because "it was too long". He's never posted on SO again and I don't blame him.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 21, 2020 at 4:21 am
Here's my analogy. People love eating hamburgers. Most people, however, do not care to know how the meat paddies are made. No amount of interest piquing is going to induce any curiosity about grinding up cows. Now I'm a hamburger salesman. I sell gourmet hamburgers that take 30 minutes to prepare. When I sell my hamburgers, I'm standing next to Ronald McDonald who sells hot, fresh hamburgers as fast as people can shovel them in their mouths. How reasonable is it to expect hungry people to ignore hot, fresh hamburgers while their gourmet burger is prepared? I really don't want to make Big Mics at McDowell's (Coming to America) by switching approaches. So why would someone buy a hamburger from me? I think my hamburgers need to come with some sort of demonstrable proof of quality. And if the gourmet hamburgers are made in a factory (using itvf) then they can be made at about the same speed as fast food.
Steve, You've got a wrong analogy here.
The better one would be this:
You live in the late Soviet Union and have to choose which car to buy: a Porsche 911 or a ZAZ 968 (https://youtu.be/So9r_UZI6zQ).
Cars are pretty similar: light weight, 2 doors, engine at the back, headlights, steering wheel - lots of similarities.
As for the differences - there are a few.
The Porsche is hard to find on the streets, while the ZAZ is pretty much on every corner;
The ZAZ is much cheaper to buy and service. In fact, the ZAZ is so simple, you don't have to be a big-time mechanic to fix it yourself (you're gonna be fixing it all the time, but that's another story);
Spares and supplies for the ZAZ are much more accessible, and, again, unlike the Porsche, you should be able to fit them yourself, not needing to book a certified mechanic to do that.
As for performance - well, it's still a 30 min drive in a traffic jam, you might miss a couple of traffic lights on your way, but the ZAZ would not be much slower than the Porsche.
Which car you choose - it's up to you. Should Porsche really come with detailed description of all the advantages for every other crappy car on the market? Well, if you want something better, you need to do your homework.
If it's too much of a trouble, then you're gonna end up driving a piece of junk.
And I must say - for whatever reasons, ZAZ was way more common on the streets of USSR than Porsche.
Just like rCTE's are much more common on the pages of SO than fnTally.
_____________
Code for TallyGenerator
September 21, 2020 at 1:10 pm
Steve, You've got a wrong analogy here.
The better one would be this:
You live in the late Soviet Union and have to choose which car to buy: a Porsche 911 or a ZAZ 968 (https://youtu.be/So9r_UZI6zQ).
Cars are pretty similar: light weight, 2 doors, engine at the back, headlights, steering wheel - lots of similarities.
As for the differences - there are a few.
The Porsche is hard to find on the streets, while the ZAZ is pretty much on every corner;
The ZAZ is much cheaper to buy and service. In fact, the ZAZ is so simple, you don't have to be a big-time mechanic to fix it yourself (you're gonna be fixing it all the time, but that's another story);
Spares and supplies for the ZAZ are much more accessible, and, again, unlike the Porsche, you should be able to fit them yourself, not needing to book a certified mechanic to do that.
As for performance - well, it's still a 30 min drive in a traffic jam, you might miss a couple of traffic lights on your way, but the ZAZ would not be much slower than the Porsche.
Which car you choose - it's up to you. Should Porsche really come with detailed description of all the advantages for every other crappy car on the market? Well, if you want something better, you need to do your homework.
If it's too much of a trouble, then you're gonna end up driving a piece of junk.
And I must say - for whatever reasons, ZAZ was way more common on the streets of USSR than Porsche.
Just like rCTE's are much more common on the pages of SO than fnTally.
When James May's hand buckles the hood of the car, lol.
My friend Alec frequently posts what he calls "Soviet Jokes" on Facebook. Here's his latest
Soviet Joke Rubric is back:
A frightened man came to the KGB.
"My talking parrot has disappeared."
"That's not the kind of case we handle. Go to the criminal police."
"Excuse me, of course I know that I must go to them. I am here just to tell you officially that I disagree with the parrot."
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 21, 2020 at 2:30 pm
If you look at my answer count v.s. the number of points I have and the position I have on Stack Overflow, you'll see that quantity doesn't matter much and, even if it did, I'm simply not going to get sucked into the SO "points matter" mindset because it's stupid. People posting the incremental rCTE solutions are doing a lot of damage to those that don't know better, which is most of the people that ask questions on such forums. All they want is ANY solution that works. Most don't give two hoots in hell about performance.
It's why I don't frequent SO very much... it gives me cramps. 😀 Ed Wagner (good friend, fellow DBA, and valued heavy hitter on SSC) recently told me about the time he had kind of a long post explaining the best way to do something... some of the moderators removed the post because "it was too long". He's never posted on SO again and I don't blame him.
SO is a different thing altogether. My question requires context and more than a hit-and-run answer, so I brought it here. A few times I've noticed questions there which were going to require more than one turn of the screw to get right, and I thought the askers would be better off somewhere else, like here. But you can't do that so I just let it go.
Maybe I have been caught up in the points thing. For me it's (sometimes) fun and it's also supposed to be a good means to a good end. I really enjoy SQL programming the more the better. Prior to SSC a little over a year ago I had never posted on any SQL site because I never really had anything to say. 5 years(!) ago when I learned it was possible to really resolve the data access knot using .NET Core I quit SQL programming. Now I have something that can tremendously elevate SQL development and that's too good not to share. How to influence people to be interested enough to take a look? What avenues are available to do that? I'm trying different things.
It's frustrating that on SSC the conversation is largely directed by question asking. I do like the "more than one turn of the screw" approach also. It's not adversarial and it generally results in better outcomes. What I would like to see are threads that are people's projects and that others can offer their views on. I have some code that I want to get feedback on updating, but it's a project and not a single question. I don't know what category it would fit in or if that's something that would be received well.
Jeff, I was going to mark your response above as the answer but it doesn't actually answer the question. I asked for minimal; your answer is complete, but it's not minimal.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Jeff, I was going to mark your response above as the answer but it doesn't actually answer the question. I asked for minimal; your answer is complete, but it's not minimal.
That's because the notion of "minimal", although it can be correct in itself, won't actually fix your problem, which is your previous incorrect usage of the fnTally or inline cCTE (Cascading CTE). 😉 You have to set the correct "row goal" even if you had a "minimal" object to pull from because such a "minimal" object will almost always have more rows available than you need. Having the row goal for such an object in the WHERE clause of the outer query that uses the object isn't correct except when you have a physical Tally table with a Unique Clustered Index on the "N" column.
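For completeness, the physical Tally table exception just mentioned (where a WHERE clause against "N" is acceptable) could be built something like the sketch below; the row count and the use of sys.all_columns as a row source are just illustrative choices:

```sql
-- Sketch only: a physical Tally table with a Unique Clustered Index on N.
-- With the index in place, WHERE N <= @Something becomes a clustered index
-- seek, so a late row goal in the WHERE clause does no harm here.
DROP TABLE IF EXISTS dbo.Tally;

SELECT TOP (1000000)
       N = ISNULL(CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT), 0)
  INTO dbo.Tally
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2;

ALTER TABLE dbo.Tally
  ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N);
```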
For example, you DID create a mostly "minimal" Tally cCTE but then you didn't use it correctly and it became "horribly inefficient". Again, it's not inefficient... the code with a very late "row goal" that you wrapped it in is the actual problem, and that's why I posted the "complete" rather than the "minimal" solution you incorrectly think you're looking for. 😀
If you really want to help folks on SO, find out how and where I can publish things like the fnTally function (I've actually completed another mod to control runaway parallelism because the execution plan moves the "row goal" to a bad spot depending on system settings. And, no... it's not the latest one I have published on SSC... I need to update that one, as well). I know there's some sort of public wiki on SO but I got tired of figuring out how to get to it and make a submittal.
The other thing that concerns me about SO is that some snot-head might think they have a better way and edit the code I post. I don't mind improvements to my code (it's what I look forward to on SSC, actually) but most snot-heads don't even test and those that do, normally do so almost casually and ineffectively.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2020 at 12:11 am
Maybe my confusion on this subject is that maybe I've guessed wrong about what you mean by "Minimal" so let's look back at your original question...
What is the minimal way to create a tally table/tvf/cte and to describe its advantages (if any) versus recursive cte's?
You also posted code that had a fairly serious problem with how the Tally CTE was being used and I thought solving that problem was what you were really after. Writing a "minimal" inline function of, say, 10K would help when such errors are made but, IMHO, it would be a coverup and poor replacement for an rCTE. To be honest, I don't even want to show how to write one because of that problem.
As for convincing others, "A man forced against his will is of the same opinion still". I don't try to convince the rCTE users directly anymore because they're not prepared to listen to things like "more reads than a WHILE loop" and "a lot more CPU"; they're convinced that such things don't matter for small row counts. They're also convinced that learning something new (how ironic... they're teaching rCTEs to someone that didn't know about them and so is also learning something "new") will take too much Developer time, and that a minute of Developer time is somehow worth more than a bar of Platinum.
The only thing you can do is a demo that demonstrates how much CPU, Duration, and how many Logical Reads the given rCTE and other solutions took and then soundly beat it with the inline Tally CTE. They'll see it. Hopefully, so will the OP and the lurkers that read it. Do it as many times as they write rCTEs and maybe they'll finally get it.
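If Profiler isn't handy, a minimal version of that three-way demo (CPU, duration, logical reads) can be had from the STATISTICS settings, shown here against the function names defined earlier in this thread:

```sql
-- CPU and elapsed time land in the Messages tab from STATISTICS TIME;
-- logical reads per table come from STATISTICS IO.
SET STATISTICS TIME, IO ON;

SELECT COUNT(*) FROM dbo.fnNumbers(1, 1000);

SELECT COUNT(*) FROM dbo.fnNumbersR(1, 1000) OPTION (MAXRECURSION 0);

SET STATISTICS TIME, IO OFF;
```

The Tally based function will typically show zero logical reads, which is the "arsenic in the water" point made earlier.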
Explanations don't work on people not prepared to listen. If you can't demonstrate it, it's not worth the argument with a thread full of ring-knockers and naysayers. Let the code do the talking.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2020 at 4:13 pm
Here are new Fiddles. The old Fiddles weren't on this more popular Fiddle site; there's more than one Fiddle site, FYI.
Performance comparison of fnNumbers vs fnNumbersRC inserting 10x row counts (n=1,000)
Performance comparison of fnNumbers vs fnNumbersRC inserting 10x row counts (n=10,000)
Performance comparison of fnNumbers vs fnNumbersRC inserting 20x row counts (n=1,000)
Performance comparison of fnNumbers vs fnNumbersRC inserting 40x row counts (n=1,000)
Performance comparison of fnNumbers vs fnNumbersRC inserting 100x row counts (n=1,000)
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Viewing 15 posts - 16 through 30 (of 46 total)