January 15, 2015 at 1:05 pm
Hello,
At what number of rows do you consider the tally table method to perform less efficiently than using an actual numbers table? I've recently had to do some work with a billion rows, and generating the tally table for it took 10 minutes, though I would imagine the crossover point is much lower than that: generating 10 or 100 million rows already seems to bring a fairly steep drop-off in performance versus the usual TOP 1000000 SELECT to generate numbers.
Thanks
January 15, 2015 at 1:19 pm
Can you post EXACTLY what you are using for your various tests?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 15, 2015 at 1:57 pm
Yeah, I just do this and replace the TOP with whatever the target number is.
USE tempdb;
GO
DECLARE @cntr BIGINT = 0;
-- Seed of 10 rows cross joined 9 times yields 10^9 candidate rows
WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
,NUMS(N) AS (SELECT TOP 2147483647 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
SELECT @cntr = COUNT_BIG(*)
FROM NUMS;
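For comparison, the "actual numbers table" the original question mentions is typically just this same SELECT persisted once into a real table. A minimal sketch, with illustrative object names, might look like this:
USE tempdb;
GO
-- A minimal persisted numbers table; object names are illustrative.
CREATE TABLE dbo.Numbers (N BIGINT NOT NULL PRIMARY KEY CLUSTERED);
GO
-- Pay the generation cost once at build time instead of on every query.
WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
INSERT INTO dbo.Numbers (N)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -- 10^6 rows from a 6-way cross join
FROM T T1, T T2, T T3, T T4, T T5, T T6;
Queries then read from dbo.Numbers instead of rebuilding the rows each time.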
January 15, 2015 at 3:11 pm
I will compare that on my system with two other methods of generating numbers and report back. Is that the only method you tried? It sounded like you had at least one other.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 15, 2015 at 4:10 pm
Everything I've tried has been an extension of stuff I've seen around here. One thing I thought would make a difference was how many levels of CTEs I was stacking to hit the row goals, but the effect was pretty slight.
All of this testing is on a box with tempdb on XtremIO, FWIW. Heh.
USE tempdb;
GO
DECLARE @cntr BIGINT = 0;
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c, E4 d, E4 e, E4 f, E4 g, E4 h), -- (10^4)^8 = 10^32 candidate rows, far more than the TOP needs
cteTally(N) AS (
SELECT TOP (2147483647) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8
)
SELECT @cntr = COUNT_BIG(*)
FROM cteTally;
PRINT @cntr;
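Both of the snippets above hard-code the TOP; "replace the TOP with whatever the target number is" simply means parameterizing that cap. A minimal sketch, with @rows as an assumed parameter name:
-- Same cascading-CTE tally with the row cap as a variable
-- (@rows is an assumed name; this covers up to 10^8 rows as written).
DECLARE @rows BIGINT = 1000000;
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS V(N)), -- 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 rows
cteTally(N) AS (
SELECT TOP (@rows) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 a, E4 b -- up to 10^8 rows
)
SELECT COUNT_BIG(*) FROM cteTally;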
January 15, 2015 at 8:56 pm
sqldriver (1/15/2015)
Hello,
At what number of rows do you consider the tally table method to perform less efficiently than using an actual numbers table? I've recently had to do some work with a billion rows, and generating the tally table for it took 10 minutes, though I would imagine the crossover point is much lower than that: generating 10 or 100 million rows already seems to bring a fairly steep drop-off in performance versus the usual TOP 1000000 SELECT to generate numbers.
Thanks
I have to ask... what are you doing that requires a billion-row set of sequential numbers? I ask because many people have thought the same thing in the past, and it almost always turned out they needed something <= 8K rows.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 15, 2015 at 9:15 pm
I have several tables with between 1 billion and 2 billion rows in them that I have to generate reports on. Sometimes periods of inactivity, sometimes periods of high activity, particular event frequencies, etc. Adding the descending-order timestamp index to them was... interesting.
Er, now that I think about it, I should probably figure out how I'm going to change that identity column to a bigint, huh?
:Whistling:
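On the identity-to-bigint change: one common pattern is to rebuild into a new table and swap names. A rough sketch only, with illustrative table and column names; a real migration also has to handle foreign keys, indexes, permissions, and a cutover window:
-- Rough sketch: rebuild into a new table with a BIGINT identity.
-- Names are illustrative; for billions of rows, batch the copy.
CREATE TABLE dbo.EventLog_New (
Id BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
EventTime DATETIME2(7) NOT NULL,
Payload VARCHAR(100) NULL
);
SET IDENTITY_INSERT dbo.EventLog_New ON;
INSERT INTO dbo.EventLog_New (Id, EventTime, Payload)
SELECT Id, EventTime, Payload FROM dbo.EventLog;
SET IDENTITY_INSERT dbo.EventLog_New OFF;
EXEC sp_rename 'dbo.EventLog', 'EventLog_Old';
EXEC sp_rename 'dbo.EventLog_New', 'EventLog';
-- Verify the seed so new inserts continue past the old maximum.
DBCC CHECKIDENT ('dbo.EventLog');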
January 15, 2015 at 10:25 pm
sqldriver (1/15/2015)
Yeah, I just do this and replace the TOP with whatever target number.
USE tempdb
DECLARE @cntr BIGINT = 0;
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
,NUMS(N) AS (SELECT TOP 2147483647 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
SELECT @cntr = COUNT_BIG(*)
FROM NUMS
Quick thought: this inline tally code returns 10^9 => 1,000,000,000 rows, falling 1,147,483,647 rows short of the target.
😎
On the speculation about the cost of the number of joins: they do come at a cost, hardly noticeable on smaller sets but potentially substantial on larger ones. Consider this code, which uses three different "seeds" for the tally:
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @cntr BIGINT = 0;
-- Each test is bracketed by two inserts whose DEFAULT(SYSDATETIME()) stamps
-- the start and end; duration per label = MAX(TDT) - MIN(TDT).
DECLARE @TESTRES TABLE(TDT DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()),TTEXT VARCHAR(50) NOT NULL);
INSERT INTO @TESTRES(TTEXT) VALUES('10^10');
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
,NUMS(N) AS (SELECT TOP 2147483647 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9,T T10)
SELECT @cntr = COUNT_BIG(*)
FROM NUMS NM;
INSERT INTO @TESTRES(TTEXT) VALUES('10^10');
INSERT INTO @TESTRES(TTEXT) VALUES('100^5');
;WITH T(N) AS (SELECT N FROM (
VALUES
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
) AS X(N))
,NUMS(N) AS (SELECT TOP 2147483647 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5)
SELECT @cntr = COUNT_BIG(*)
FROM NUMS NM;
INSERT INTO @TESTRES(TTEXT) VALUES('100^5');
INSERT INTO @TESTRES(TTEXT) VALUES('250^4');
;WITH T(N) AS (SELECT N FROM (
VALUES
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
) AS X(N))
,NUMS(N) AS (SELECT TOP 2147483647 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4)
SELECT @cntr = COUNT_BIG(*)
FROM NUMS NM;
INSERT INTO @TESTRES(TTEXT) VALUES('250^4');
SELECT
T.TTEXT
,DATEDIFF(MICROSECOND, MIN(T.TDT),MAX(T.TDT)) AS DURATION
FROM @TESTRES T
GROUP BY T.TTEXT;
Results (durations in microseconds)
TTEXT DURATION
------ ----------
10^10 103271692
100^5 68606038
250^4 70060435
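All three of these seed variants can produce at least the TOP of 2,147,483,647 rows, so each timed run counts the same number of rows and only the seed width and join depth differ. A quick sanity check of the theoretical row counts:
-- Theoretical rows per seed versus the TOP cap
SELECT POWER(CAST(10 AS BIGINT), 10) AS [10^10],  -- 10,000,000,000
       POWER(CAST(100 AS BIGINT), 5) AS [100^5],  -- 10,000,000,000
       POWER(CAST(250 AS BIGINT), 4) AS [250^4],  --  3,906,250,000
       CAST(2147483647 AS BIGINT)    AS TopCap;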
January 16, 2015 at 1:00 am
sqldriver (1/15/2015)
I have several tables with between 1 billion and 2 billion rows in them that I have to generate reports on. Sometimes periods of inactivity, sometimes periods of high activity, particular event frequencies, etc. Adding the descending-order timestamp index to them was... interesting.
Er, now that I think about it, I should probably figure out how I'm going to change that identity column to a bigint, huh?
:Whistling:
I don't know why you'd need a Tally Table or fnTally for that, though.
--Jeff Moden
Change is inevitable... Change for the better is not.
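fnTally here refers to Jeff Moden's inline table-valued function built on the same cascading-CTE idea. A simplified sketch in that style, not the published version, might look like this:
-- Simplified fnTally-style inline function (a sketch in the spirit of
-- Jeff Moden's fnTally, not the published version; caps at 10^9 rows).
CREATE FUNCTION dbo.fnTally_Sketch (@MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS V(N)), -- 10 rows
E3(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c), -- 10^3 rows
E9(N) AS (SELECT 1 FROM E3 a, E3 b, E3 c)  -- 10^9 rows
SELECT TOP (@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E9;
It would then be called as, for example, SELECT N FROM dbo.fnTally_Sketch(1000000);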
January 16, 2015 at 8:05 am
Jeff Moden (1/16/2015)
I don't know why you'd need a Tally Table or fnTally for that, though.
Because the reports all have additional column(s) added that are either NULL or the qualifying date/event/user. So the tally table is particularly helpful when someone who will undoubtedly not look at the data wants to see, say, inactive periods across a week, a day, or several months.
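The report pattern described here is the classic tally-driven gap report: generate every period in the window, then LEFT JOIN the activity so empty periods surface as NULLs. A minimal sketch, where dbo.Events and its columns are hypothetical and the dbo.Numbers table sketched earlier is assumed to exist:
-- Sketch of the gap report described above. dbo.Events and its columns
-- are hypothetical; dbo.Numbers is assumed to hold enough sequential N.
DECLARE @start DATETIME2(0) = '20150101',
        @end   DATETIME2(0) = '20150108';
SELECT s.SlotTime,
       e.UserName,   -- NULL marks a minute with no recorded activity
       e.EventType
FROM (
    SELECT DATEADD(MINUTE, CAST(n.N - 1 AS INT), @start) AS SlotTime
    FROM dbo.Numbers n
    WHERE n.N <= DATEDIFF(MINUTE, @start, @end)
) AS s
LEFT JOIN dbo.Events e
       ON e.EventTime >= s.SlotTime
      AND e.EventTime <  DATEADD(MINUTE, 1, s.SlotTime)
ORDER BY s.SlotTime;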
January 16, 2015 at 8:06 am
Eirikur Eiriksson (1/15/2015)
Quick thought: this inline tally code returns 10^9 => 1,000,000,000 rows, falling 1,147,483,647 rows short of the target.
😎
Results (durations in microseconds)
TTEXT DURATION
------ ----------
10^10 103271692
100^5 68606038
250^4 70060435
Yeah, that was my bad on the copy/paste. I was still waiting for that particular execution to finish. Thanks, though. I'm going to see if those differences hold up on my end.
EDIT:
Not too shabby.
TTEXT DURATION (microseconds)
------ ----------------------
10^10 135318870
100^5 122927660
250^4 115224179
January 16, 2015 at 8:42 am
sqldriver (1/16/2015)
Because the reports all have additional column(s) added that are either NULL or the qualifying date/event/user. So the tally table is particularly helpful when someone who will undoubtedly not look at the data wants to see, say, inactive periods across a week, a day, or several months.
That's what I thought. There are fewer than 11,000 days in a 30-year period. If you think you need 1 or 2 billion rows in a Tally Table to provide a missing-period reference for time periods as small as a single day, then you're doing something terribly wrong. The only time I might be able to see the need for a billion-row Tally Table is if your time period is in seconds, which would require something less than 950,400,000 rows (11,000 days x 86,400 seconds per day).
You should post the code for one of the reports so we can see what you're doing.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2015 at 8:47 am
Jeff Moden (1/16/2015)
That's what I thought. There are fewer than 11,000 days in a 30-year period. If you think you need 1 or 2 billion rows in a Tally Table to provide a missing-period reference for time periods as small as a single day, then you're doing something terribly wrong. The only time I might be able to see the need for a billion-row Tally Table is if your time period is in seconds, which would require something less than 950,400,000 rows (11,000 days x 86,400 seconds per day).
You should post the code for one of the reports so we can see what you're doing.
You guessed right on the seconds 🙂
"Lawyers" and their "billable hours"; we should all be so lucky.
January 16, 2015 at 9:03 am
sqldriver (1/16/2015)
You guessed right on the seconds 🙂
"Lawyers" and their "billable hours"; we should all be so lucky.
Oh my word! That's bloody well insane. :hehe: Do they track "potty time"? 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2015 at 12:46 pm
Jeff Moden (1/16/2015)
Oh my word! That's bloody well insane. :hehe: Do they track "potty time"? 😛
Don't forget the different rates for the different activities; wiping is billed at 75% of the dropping rate... :unsure:
😎