How to create a minimal tally table and to describe its advantages (if any)

Question

How to create a minimal tally table and to describe its advantages (if any)

Steve Collins

SSCrazy

Points: 2247
More actions
September 18, 2020 at 3:14 pm
Go to Answer
#3789482
What is the minimal way to create a tally table/tvf/cte and to describe its advantages (if any) versus recursive cte's? For the past several weeks I've observed most (all) of the SQL Server questions and answers on Stack Overflow. Over and over and over the questions require some form of sequence generation. Over and over and over the tally based solutions almost always lose to a recursive cte based solution. It's very frustrating for the Answer-ers (like me) offering the tally based solutions. What can be done about this?
Here's a CTE based tally approach I've been using. However, analysis of the execution plan shows it's horribly inefficient.
```
drop table if exists #a;
go
create table #a(
  id            int,
  [start_date]  date,
  [end_date]    date,
  a             int,
  b             int);

insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);

/* this code is horribly inefficient */
;with
n(n) as (
    select * from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(n)),
tally_cte(n) as (
     select row_number() over (order by (select null)) 
     from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select id, dateadd(d, n-1, t.[start_date]) dt, a, b
from #a t 
     cross apply 
     tally_cte tc 
where tc.n<=datediff(d, t.[start_date], t.[end_date])+1
order by 1, 2;
```
Here's the least "scary" looking numbers tvf I've been able to come up with. Does anyone have anything more succinct? I'm calling it fnNumbers because I think "tally" is a scary word for noobs to grok. Sorry, maybe that's not true but I'm trying to eliminate all possible objections. fnNumbers seems simpler.
```
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
  @zero_or_one   bit, 
  @n             bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
     n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go
```
- This topic was modified 5 years, 5 months ago by Steve Collins. Reason: Simplified the tvf a little more
- This topic was modified 5 years, 5 months ago by Steve Collins.
- This topic was modified 5 years, 5 months ago by Steve Collins. Reason: Made the values list 1, 2, 3, 4 to visually imply sequence
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 15 posts - 1 through 15 (of 47 total)

You must be logged in to reply to this topic. Login to reply

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 1

The first part of the problem is how you are generating the tally - you are generating too many rows. The second part of the problem is the return from ROW_NUMBER() which returns a bigint and that introduces an implicit conversion.

Third problem - where tc.n<=datediff(d, t.[start_date], t.[end_date])+1 - uses a function to 'limit' the rows.

And finally, your code is generating a 'number' or 'tally' CTE instead of generating the list of dates. We can limit the number of rows generated by making sure we only generate the number of rows needed in the tally and generate the list of dates at the same time.

drop table if exists #a;
go
create table #a(
  id            int,
  [start_date]  date,
  [end_date]    date,
  a             int,
  b             int);

insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);

Declare @startDate date = (Select min(start_date) From #a)
      , @endDate date = (Select max(end_date) From #a);

   With t(n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , dates (inputDate)
     As (
 Select Top (datediff(day, @startDate, @endDate) + 1) 
        dateadd(day, checksum(row_number() over(Order By @@spid) - 1), @startDate) 
   From t t1, t t2, t t3, t t4
        )
 Select *
   From dates           d
  Inner Join #a         a On d.inputDate Between a.[start_date] And a.end_date;

There are a few more optimizations that we can perform - depending on the 'range' of data. For example, if you know that you will never need more than 8000 rows:

Declare @inputString varchar(8000) = 'some text string';

   With t(n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , iTally (n)
     As (
 Select Top (len(@inputString)) 
        checksum(row_number() over(Order By @@spid)) 
   From t t1, t t2, t t3                                -- 8000 rows
        )
 Select *
   From iTally   it
 ...

If you cannot define the limits - that is, you must use the values from a table to determine the number of rows to generate - then an inline-table valued function that accepts the start/end range or length can be used. That is then cross applied to the table passing in the start/end ranges.

Last note - on any functions I recommend avoiding varchar(max)/nvarchar(max) as an input parameter. As soon as you put that into your function as an input parameter you are guaranteed to have cardinality issues due to implicit conversions and the performance will definitely be worse.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Steve Collins SSCrazy Points: 2247 More actions · Answer 2

Jeffrey Williams wrote:

The first part of the problem is how you are generating the tally - you are generating too many rows. The second part of the problem is the return from ROW_NUMBER() which returns a bigint and that introduces an implicit conversion.
Third problem - where tc.n<=datediff(d, t.[start_date], t.[end_date])+1 - uses a function to 'limit' the rows.
And finally, your code is generating a 'number' or 'tally' CTE instead of generating the list of dates. We can limit the number of rows generated by making sure we only generate the number of rows needed in the tally and generate the list of dates at the same time.
There are a few more optimizations that we can perform - depending on the 'range' of data. For example, if you know that you will never need more than 8000 rows:
If you cannot define the limits - that is, you must use the values from a table to determine the number of rows to generate - then an inline-table valued function that accepts the start/end range or length can be used. That is then cross applied to the table passing in the start/end ranges.
Last note - on any functions I recommend avoiding varchar(max)/nvarchar(max) as an input parameter. As soon as you put that into your function as an input parameter you are guaranteed to have cardinality issues due to implicit conversions and the performance will definitely be worse.

Thanks for the reply. Everything you've written is true but it doesn't help me. And I should know (and I do) how to optimize the query. The issue is there's no time for it. It has to be fast and simple and generic. A tvf would be best (fnTally is perfect) but it requires a "click away" to SSC to get the definition... vs copy/paste/run/mark answer in less than 5 minutes.

Here's a comparison

This code

drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
  @zero_or_one   bit, 
  @n             bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
     n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go

print ('start_dt: '+cast(sysutcdatetime() as varchar(30)));
select count(*) from dbo.fnNumbers(0, 100000);
print ('end_dt: '+cast(sysutcdatetime() as varchar(30)));

Output

start_dt: 2020-09-18 16:51:10.2923573

    (1 row(s) affected)
    
end_dt: 2020-09-18 16:51:10.3079740

Now the recursive CTE

drop function if exists [dbo].[fnNumbersR];
go
create function [dbo].[fnNumbersR](
  @zero_or_one   smallint, 
  @n             bigint)
returns table with schemabinding as return
with numbers(n) as (
    select cast(@zero_or_one as int) n
    union all
    select n + 1
        from numbers
        where n < @n)
select n from numbers;
go

print ('start_dt: '+cast(sysutcdatetime() as varchar(30)));
select count(*) from dbo.fnNumbersR(0, 100000) option (maxrecursion 0)
print ('end_dt: '+cast(sysutcdatetime() as varchar(30)));

Output

start_dt: 2020-09-18 16:51:46.0738447

    (1 row(s) affected)
    
end_dt: 2020-09-18 16:51:47.8394976

The recursive cte takes 0:01:11 versus 0:00:02 for the tally based approach.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2247 More actions · Answer 3

That's over 100,000 rows. When it drops to 10,000 the difference is much less. Then when it drops to 1,000 the difference is so small it's hard to measure.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2247 More actions · Answer 4

Ok, here's a different comparison. Suppose the query contains 34 UNION ALL SELECTS from a tvf generating 1000 row values in sequence.

declare
  @start_dt         datetime2(7)=sysutcdatetime(),
  @end_dt           datetime2(7),
  @ms_diff          int;

select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) union all 
select count(*) from dbo.fnNumbers(0, 1000) 
select @end_dt=sysutcdatetime();
select @ms_diff=datediff(ms, @start_dt, @end_dt);
print ('ms_diff: '+cast(@ms_diff as varchar(30)));

Output

    (34 row(s) affected)
    
ms_diff: 0

Versus

declare
  @start_dt         datetime2(7)=sysutcdatetime(),
  @end_dt           datetime2(7),
  @ms_diff          int;

select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) union all 
select count(*) from dbo.fnNumbersR(0, 1000) 
option (maxrecursion 0) 
select @end_dt=sysutcdatetime();
select @ms_diff=datediff(ms, @start_dt, @end_dt);
print ('ms_diff: '+cast(@ms_diff as varchar(30)));

Output

    (34 row(s) affected)
    
ms_diff: 687

When generating multiple sequences in succession the tally based method wins with just a few dozen requests

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2247 More actions · Answer 5

If the number of queries is increased to 100 then ms_diff of the tally table is only 16 whereas for the recursive CTE it's 2531!

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2247 More actions · Answer 6

Jeffrey Williams, the more I look at your code the more I like it. I definitely missed a few things and using TOP in the select is so obvious now. Yes yes, I could and should do a better job of coding carefully. What I have in mind tho is a paragraph and/or DB Fiddle(s) (which I haven't used) that preemptively addresses the comparison between methods. Because the recursive CTE Answer-ers can blow answers out in just a few minutes. Without a tvf I haven't been able to match that. When the answer are offered (sometimes with tally CTE, sometimes with tvf (click away)) they don't get marked as answers. So imo and as far as I can tell it's not being communicated in an approachable way and I'd like to fix that but I need some help. What I'm doing now is not working.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2247 More actions · Answer 7

It should be said I'm not looking for an advantage not available to anyone else. As soon as a different method works the others will just switch and then it's back being in the same canoe with the same paddle. But at least people won't have to write recursive CTE's which is no fun and it seems likely is not the ideal way in most situations

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2247 More actions · Answer 8

Here's a DB Fiddle that shows fnNumbers is about 4-10x faster to generate 100,000 rows.

Here's a DB Fiddle that shows fnNumbers is about 2x faster to generate 10,000 rows.

Here's a DB Fiddle that shows fnNumbers is about equal to generate 1,000 rows.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Sergiy SSC Guru Points: 110209 More actions · Answer 9

Steve Collins wrote:

Thanks for the reply. Everything you've written is true but it doesn't help me. And I should know (and I do) how to optimize the query. The issue is there's no time for it. It has to be fast and simple and generic. A tvf would be best (fnTally is perfect) but it requires a "click away" to SSC to get the definition... vs copy/paste/run/mark answer in less than 5 minutes.

You may wish to check TallyGenerator script on SSC. Search for it by name.

See if it's what you're looking for.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 10

Steve Collins wrote:

Over and over and over the tally based solutions almost always lose to a recursive cte based solution.

I've not checked the other posts on this thread yet but do you have some links where there problem is exhibited?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 11

Steve Collins wrote:

However, analysis of the execution plan shows it's horribly inefficient.

Until the later versions of SSMS 18.X, the execution plan doesn't really have anything that is actually a good determination of which code is actually going to be the fastest. Yeah, there's usually some pretty strong indicators (like massive rowcounts in arrows and other blocks) as to which may be better but execution plans can actually suck pretty bad for making the final decision as to which code is better.

I'm going to read through the rest of this thread before I make more comments and then we'll get back to this subject in particular.

EDIT: Ok... I see what you're looking at. You are, in fact, looking at the row counts and I agree. Horribly inefficient. It's because you've properly set the "row goal" for each "set" of numbers you want to generate for each row identified by the start and end date of the source table. Again, I'll be back in a later post to explain and to also explain why Jeffrey William's code is better but still incorrect.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 12

Steve Collins wrote:

Jeffrey Williams, the more I look at your code the more I like it. I definitely missed a few things and using TOP in the select is so obvious now. Yes yes, I could and should do a better job of coding carefully. What I have in mind tho is a paragraph and/or DB Fiddle(s) (which I haven't used) that preemptively addresses the comparison between methods. Because the recursive CTE Answer-ers can blow answers out in just a few minutes. Without a tvf I haven't been able to match that. When the answer are offered (sometimes with tally CTE, sometimes with tvf (click away)) they don't get marked as answers. So imo and as far as I can tell it's not being communicated in an approachable way and I'd like to fix that but I need some help. What I'm doing now is not working.

Again, can you post some links to threads where that has occurred?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 13

Steve Collins wrote:

What is the minimal way to create a tally table/tvf/cte and to describe its advantages (if any) versus recursive cte's? For the past several weeks I've observed most (all) of the SQL Server questions and answers on Stack Overflow. Over and over and over the questions require some form of sequence generation. Over and over and over the tally based solutions almost always lose to a recursive cte based solution. It's very frustrating for the Answer-ers (like me) offering the tally based solutions. What can be done about this?
Here's a CTE based tally approach I've been using. However, analysis of the execution plan shows it's horribly inefficient.
drop table if exists #a;
go
create table #a(
  id            int,
  [start_date]  date,
  [end_date]    date,
  a             int,
  b             int);

insert #a(id, [start_date], [end_date], a, b) values
(1, '2020-09-18', '2020-09-28', 4, 2),
(2, '2019-09-18', '2019-10-12', 6, 3);

/* this code is horribly inefficient */
;with
n(n) as (
    select * from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(n)),
tally_cte(n) as (
     select row_number() over (order by (select null)) 
     from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select id, dateadd(d, n-1, t.[start_date]) dt, a, b
from #a t 
     cross apply 
     tally_cte tc 
where tc.n<=datediff(d, t.[start_date], t.[end_date])+1
order by 1, 2;
Here's the least "scary" looking numbers tvf I've been able to come up with. Does anyone have anything more succinct? I'm calling it fnNumbers because I think "tally" is a scary word for noobs to grok. Sorry, maybe that's not true but I'm trying to eliminate all possible objections. fnNumbers seems simpler.
drop function if exists [dbo].[fnNumbers];
go
create function [dbo].[fnNumbers](
  @zero_or_one   bit, 
  @n             bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
     n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
go

~~You've got everything there except the actual code that uses the function(s). Please post that so that I don't end up writing something different as the example that you're trying to convey.~~

EDIT: Never mind... I found the code in your SQLFiddles.

As for using fnNumbers instead of fnTally to keep from scaring noobies away, whatever. I usually won't do such a thing because too many people have written some really nasty and inefficient stuff using the word "Numbers" (or "Nums") and most other people that use the word "Tally" usually seem to do a better job. Of course, there are exceptions (for example, Itzik Ben-Gan started the good method with his "GetNums" function, which is a good exception) in both cases but that's been my overall observation.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2247 More actions · Answer 14

Ok here are two recent winning recursive cte's. Be gentle when evaluating the opposing answers 🙂

https://stackoverflow.com/questions/63922869/get-an-interval-of-dates-from-a-range-of-dates/63923536

https://stackoverflow.com/questions/63919706/fill-missing-months-on-a-date-query/63920844

As a bonus, here losing while using an ordinal splitter

https://stackoverflow.com/questions/63886788/extracting-date-from-text-in-sql-using-cursor/63887736

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können