What is the minimal way to create a tally table/tvf/cte and to describe its advantages (if any) versus recursive cte's? For the past several weeks I've observed most (all) of the SQL Server questions and answers on Stack Overflow. Over and over and over the questions require some form of sequence generation. Over and over and over the tally based solutions almost always lose to a recursive cte based solution. It's very frustrating for the Answer-ers (like me) offering the tally based solutions. What can be done about this?
Here's a CTE based tally approach I've been using. However, analysis of the execution plan shows it's horribly inefficient.
drop table if exists #a;
go
create table #a(
id int,
[start_date] date,
[end_date] date,
a int,
b int);
insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);
/* this code is horribly inefficient */
;with
n(n) as (
select * from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(n)),
tally_cte(n) as (
select row_number() over (order by (select null))
from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select id, dateadd(d, n-1, t.[start_date]) dt, a, b
from #a t
cross apply
tally_cte tc
where tc.n<=datediff(d, t.[start_date], t.[end_date])+1
order by 1, 2;
Here's the least "scary" looking numbers tvf I've been able to come up with. Does anyone have anything more succinct? I'm calling it fnNumbers because I think "tally" is a scary word for noobs to grok. Sorry, maybe that's not true but I'm trying to eliminate all possible objections. fnNumbers seems simpler.
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
@zero_or_one bit,
@n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 4:21 pm
The first part of the problem is how you are generating the tally - you are generating too many rows. The second part of the problem is the return from ROW_NUMBER() which returns a bigint and that introduces an implicit conversion.
Third problem - where tc.n<=datediff(d, t.[start_date], t.[end_date])+1 - uses a function to 'limit' the rows.
And finally, your code is generating a 'number' or 'tally' CTE instead of generating the list of dates. We can limit the number of rows generated by making sure we only generate the number of rows needed in the tally and generate the list of dates at the same time.
drop table if exists #a;
go
create table #a(
id int,
[start_date] date,
[end_date] date,
a int,
b int);
insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);
Declare @startDate date = (Select min(start_date) From #a)
, @endDate date = (Select max(end_date) From #a);
With t(n)
As (
Select t.n
From (
Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
, (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
, (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
, (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
)
, dates (inputDate)
As (
Select Top (datediff(day, @startDate, @endDate) + 1)
dateadd(day, checksum(row_number() over(Order By @@spid) - 1), @startDate)
From t t1, t t2, t t3, t t4
)
Select *
From dates d
Inner Join #a a On d.inputDate Between a.[start_date] And a.end_date;
There are a few more optimizations that we can perform - depending on the 'range' of data. For example, if you know that you will never need more than 8000 rows:
Declare @inputString varchar(8000) = 'some text string';
With t(n)
As (
Select t.n
From (
Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
, (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
)
, iTally (n)
As (
Select Top (len(@inputString))
checksum(row_number() over(Order By @@spid))
From t t1, t t2, t t3 -- 8000 rows
)
Select *
From iTally it
...
If you cannot define the limits - that is, you must use the values from a table to determine the number of rows to generate - then an inline-table valued function that accepts the start/end range or length can be used. That is then cross applied to the table passing in the start/end ranges.
Last note - on any functions I recommend avoiding varchar(max)/nvarchar(max) as an input parameter. As soon as you put that into your function as an input parameter you are guaranteed to have cardinality issues due to implicit conversions and the performance will definitely be worse.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
September 18, 2020 at 4:56 pm
The first part of the problem is how you are generating the tally - you are generating too many rows. The second part of the problem is the return from ROW_NUMBER() which returns a bigint and that introduces an implicit conversion.
Third problem - where tc.n<=datediff(d, t.[start_date], t.[end_date])+1 - uses a function to 'limit' the rows.
And finally, your code is generating a 'number' or 'tally' CTE instead of generating the list of dates. We can limit the number of rows generated by making sure we only generate the number of rows needed in the tally and generate the list of dates at the same time.
There are a few more optimizations that we can perform - depending on the 'range' of data. For example, if you know that you will never need more than 8000 rows:
If you cannot define the limits - that is, you must use the values from a table to determine the number of rows to generate - then an inline-table valued function that accepts the start/end range or length can be used. That is then cross applied to the table passing in the start/end ranges.
Last note - on any functions I recommend avoiding varchar(max)/nvarchar(max) as an input parameter. As soon as you put that into your function as an input parameter you are guaranteed to have cardinality issues due to implicit conversions and the performance will definitely be worse.
Thanks for the reply. Everything you've written is true but it doesn't help me. And I should know (and I do) how to optimize the query. The issue is there's no time for it. It has to be fast and simple and generic. A tvf would be best (fnTally is perfect) but it requires a "click away" to SSC to get the definition... vs copy/paste/run/mark answer in less than 5 minutes.
Here's a comparison
This code
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
@zero_or_one bit,
@n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go
print ('start_dt: '+cast(sysutcdatetime() as varchar(30)));
select count(*) from dbo.fnNumbers(0, 100000);
print ('end_dt: '+cast(sysutcdatetime() as varchar(30)));
Output
start_dt: 2020-09-18 16:51:10.2923573
(1 row(s) affected)
end_dt: 2020-09-18 16:51:10.3079740
Now the recursive CTE
drop function if exists [dbo].[fnNumbersR];
go
create function [dbo].[fnNumbersR](
@zero_or_one smallint,
@n bigint)
returns table with schemabinding as return
with numbers(n) as (
select cast(@zero_or_one as int) n
union all
select n + 1
from numbers
where n < @n)
select n from numbers;
go
print ('start_dt: '+cast(sysutcdatetime() as varchar(30)));
select count(*) from dbo.fnNumbersR(0, 100000) option (maxrecursion 0)
print ('end_dt: '+cast(sysutcdatetime() as varchar(30)));
Output
start_dt: 2020-09-18 16:51:46.0738447
(1 row(s) affected)
end_dt: 2020-09-18 16:51:47.8394976
The recursive cte takes 0:01:11 versus 0:00:02 for the tally based approach.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 5:19 pm
That's over 100,000 rows. When it drops to 10,000 the difference is much less. Then when it drops to 1,000 the difference is so small it's hard to measure.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 5:55 pm
Ok, here's a different comparison. Suppose the query contains 34 UNION ALL SELECTS from a tvf generating 1000 row values in sequence.
declare
@start_dt datetime2(7)=sysutcdatetime(),
@end_dt datetime2(7),
@ms_diff int;
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000) union all
select count(*) from dbo.fnNumbers(0, 1000)
select @end_dt=sysutcdatetime();
select @ms_diff=datediff(ms, @start_dt, @end_dt);
print ('ms_diff: '+cast(@ms_diff as varchar(30)));
Output
(34 row(s) affected)
ms_diff: 0
Versus
declare
@start_dt datetime2(7)=sysutcdatetime(),
@end_dt datetime2(7),
@ms_diff int;
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000) union all
select count(*) from dbo.fnNumbersR(0, 1000)
option (maxrecursion 0)
select @end_dt=sysutcdatetime();
select @ms_diff=datediff(ms, @start_dt, @end_dt);
print ('ms_diff: '+cast(@ms_diff as varchar(30)));
Output
(34 row(s) affected)
ms_diff: 687
When generating multiple sequences in succession the tally based method wins with just a few dozen requests
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 6:09 pm
If the number of queries is increased to 100 then ms_diff of the tally table is only 16 whereas for the recursive CTE it's 2531!
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 6:49 pm
Jeffrey Williams, the more I look at your code the more I like it. I definitely missed a few things and using TOP in the select is so obvious now. Yes yes, I could and should do a better job of coding carefully. What I have in mind tho is a paragraph and/or DB Fiddle(s) (which I haven't used) that preemptively addresses the comparison between methods. Because the recursive CTE Answer-ers can blow answers out in just a few minutes. Without a tvf I haven't been able to match that. When the answer are offered (sometimes with tally CTE, sometimes with tvf (click away)) they don't get marked as answers. So imo and as far as I can tell it's not being communicated in an approachable way and I'd like to fix that but I need some help. What I'm doing now is not working.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 7:45 pm
It should be said I'm not looking for an advantage not available to anyone else. As soon as a different method works the others will just switch and then it's back being in the same canoe with the same paddle. But at least people won't have to write recursive CTE's which is no fun and it seems likely is not the ideal way in most situations
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 18, 2020 at 10:41 pm
Here's a DB Fiddle that shows fnNumbers is about 4-10x faster to generate 100,000 rows.
Here's a DB Fiddle that shows fnNumbers is about 2x faster to generate 10,000 rows.
Here's a DB Fiddle that shows fnNumbers is about equal to generate 1,000 rows.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 19, 2020 at 2:59 pm
Thanks for the reply. Everything you've written is true but it doesn't help me. And I should know (and I do) how to optimize the query. The issue is there's no time for it. It has to be fast and simple and generic. A tvf would be best (fnTally is perfect) but it requires a "click away" to SSC to get the definition... vs copy/paste/run/mark answer in less than 5 minutes.
You may wish to check TallyGenerator script on SSC. Search for it by name.
See if it's what you're looking for.
_____________
Code for TallyGenerator
September 19, 2020 at 3:08 pm
Over and over and over the tally based solutions almost always lose to a recursive cte based solution.
I've not checked the other posts on this thread yet but do you have some links where there problem is exhibited?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2020 at 3:13 pm
However, analysis of the execution plan shows it's horribly inefficient.
Until the later versions of SSMS 18.X, the execution plan doesn't really have anything that is actually a good determination of which code is actually going to be the fastest. Yeah, there's usually some pretty strong indicators (like massive rowcounts in arrows and other blocks) as to which may be better but execution plans can actually suck pretty bad for making the final decision as to which code is better.
I'm going to read through the rest of this thread before I make more comments and then we'll get back to this subject in particular.
EDIT: Ok... I see what you're looking at. You are, in fact, looking at the row counts and I agree. Horribly inefficient. It's because you've properly set the "row goal" for each "set" of numbers you want to generate for each row identified by the start and end date of the source table. Again, I'll be back in a later post to explain and to also explain why Jeffrey William's code is better but still incorrect.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2020 at 3:21 pm
Jeffrey Williams, the more I look at your code the more I like it. I definitely missed a few things and using TOP in the select is so obvious now. Yes yes, I could and should do a better job of coding carefully. What I have in mind tho is a paragraph and/or DB Fiddle(s) (which I haven't used) that preemptively addresses the comparison between methods. Because the recursive CTE Answer-ers can blow answers out in just a few minutes. Without a tvf I haven't been able to match that. When the answer are offered (sometimes with tally CTE, sometimes with tvf (click away)) they don't get marked as answers. So imo and as far as I can tell it's not being communicated in an approachable way and I'd like to fix that but I need some help. What I'm doing now is not working.
Again, can you post some links to threads where that has occurred?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2020 at 3:39 pm
What is the minimal way to create a tally table/tvf/cte and to describe its advantages (if any) versus recursive cte's? For the past several weeks I've observed most (all) of the SQL Server questions and answers on Stack Overflow. Over and over and over the questions require some form of sequence generation. Over and over and over the tally based solutions almost always lose to a recursive cte based solution. It's very frustrating for the Answer-ers (like me) offering the tally based solutions. What can be done about this?
Here's a CTE based tally approach I've been using. However, analysis of the execution plan shows it's horribly inefficient.
drop table if exists #a;
go
create table #a(
id int,
[start_date] date,
[end_date] date,
a int,
b int);
insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);
/* this code is horribly inefficient */
;with
n(n) as (
select * from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(n)),
tally_cte(n) as (
select row_number() over (order by (select null))
from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select id, dateadd(d, n-1, t.[start_date]) dt, a, b
from #a t
cross apply
tally_cte tc
where tc.n<=datediff(d, t.[start_date], t.[end_date])+1
order by 1, 2;Here's the least "scary" looking numbers tvf I've been able to come up with. Does anyone have anything more succinct? I'm calling it fnNumbers because I think "tally" is a scary word for noobs to grok. Sorry, maybe that's not true but I'm trying to eliminate all possible objections. fnNumbers seems simpler.
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
@zero_or_one bit,
@n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go
You've got everything there except the actual code that uses the function(s). Please post that so that I don't end up writing something different as the example that you're trying to convey.
EDIT: Never mind... I found the code in your SQLFiddles.
As for using fnNumbers instead of fnTally to keep from scaring noobies away, whatever. I usually won't do such a thing because too many people have written some really nasty and inefficient stuff using the word "Numbers" (or "Nums") and most other people that use the word "Tally" usually seem to do a better job. Of course, there are exceptions (for example, Itzik Ben-Gan started the good method with his "GetNums" function, which is a good exception) in both cases but that's been my overall observation.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2020 at 4:32 pm
Ok here are two recent winning recursive cte's. Be gentle when evaluating the opposing answers 🙂
https://stackoverflow.com/questions/63922869/get-an-interval-of-dates-from-a-range-of-dates/63923536
https://stackoverflow.com/questions/63919706/fill-missing-months-on-a-date-query/63920844
As a bonus, here losing while using an ordinal splitter
https://stackoverflow.com/questions/63886788/extracting-date-from-text-in-sql-using-cursor/63887736
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Viewing 15 posts - 1 through 15 (of 46 total)
You must be logged in to reply to this topic. Login to reply