February 28, 2010 at 10:52 am
OK then, let's take a look and up the ante to 4 billion rows at the same time.
Database in SIMPLE recovery mode.
ZERO growth in database files.
ZERO growth in tempdb files.
set statistics time on
;
with
cte1 as
(select TOP 1681 1 as col from master..spt_values)
,
cte2 as
(select row_number() over (order by (select 0)) as row from cte1
cross join cte1 AS T2
cross join cte1 AS T3)
SELECT count_big(all case
when row%15 = 0 then 'FIZZBUZZ'
when row%5 = 0 then 'BUZZ'
when row%3 = 0 then 'FIZZ'
else cast(row as varchar(10))
end)
--from cte2 where row < 150000001
from cte2 where row < 4000000001
set statistics time off
February 28, 2010 at 1:55 pm
steve-893342 (2/28/2010)
OK then, let's take a look and up the ante to 4 billion rows at the same time. Database in SIMPLE recovery mode.
ZERO growth in database files.
ZERO growth in tempdb files.
That is roughly what I had a week or so ago, before I realised that Jason's method using no table access was inevitably better. The only differences are that I used a three-way join on a subset of master.sys.all_columns (I also tried master.dbo.Tally instead, which on my system is an 11,000-row Tally table with public read access), and that I was working with a smaller number of rows, because the cube root of 4,294,967,296 is a bit under 1626, so that's big enough with a three-way join. It ran several times more slowly than a six-way join that didn't read any tables. The trouble appears to be that reading more than 1000 records from a system view uses more CPU than doing the whole thing without reading from filestore.
I thought about using a five-way self-join, but it requires 85 elements in the base CTE and the gain (if any) seemed unlikely to be worth the extra typing.
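For comparison, the no-table-access shape is roughly this - a sketch of the general technique rather than Jason's exact code or the exact query I timed:
with
base(n) as ( -- 40 constant rows, built without touching any table
select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
),
nums(row) as ( -- 40^6 = 4,096,000,000 rows from a six-way cross join
select row_number() over (order by (select 0))
from base a cross join base b cross join base c
cross join base d cross join base e cross join base f
)
select count_big(all case
when row%15 = 0 then 'FIZZBUZZ'
when row%5 = 0 then 'BUZZ'
when row%3 = 0 then 'FIZZ'
else cast(row as varchar(10))
end)
from nums where row < 4000000001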
Tom
March 1, 2010 at 5:40 am
steve-893342 (2/28/2010)
OK then, let's take a look and up the ante to 4 billion rows at the same time. Database in SIMPLE recovery mode.
ZERO growth in database files.
ZERO growth in tempdb files.
You got zero growth everywhere? Cool! I'll have to give the CTE version of the table join a shot at it. How long did it take on your system, and how fast is the processor? I ask because on the test I did about 6 months ago, it took 40 minutes just for a billion rows on my poor ol' 1.8GHz, 8-year-old box.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 1, 2010 at 8:50 am
Well, I ran it on my laptop (2.13 GHz, 1 GB RAM) and it took 47 minutes for the 4 billion rows. On a more conventional server setup it took 20 minutes.
Have you got a really beefy server you can run this on?
I'm interested in how fast this will go!
March 2, 2010 at 4:45 am
OK, so how far can we push this one with the tools at my disposal? 100 billion is a nice round number and is just a billion times greater than the modest 100 used at the inception of the FIZZBUZZ challenge. On the SQL Server 2008 box I am currently using, the 4 billion row query took 17 minutes. If this query scales to the 100 billion level, it should take roughly 17 minutes x 25, which is just over 7 hours. The actual time taken for the 100 billion row query was 6 hours 56 minutes. Scaling almost exactly as predicted!
with
cte1 as
(select TOP 4642 1 as col from master.sys.all_columns)
,
cte2 as
(select row_number() over (order by (select 0)) as row from cte1
cross join cte1 AS T2
cross join cte1 AS T3)
SELECT count_big(all case
when row%15 = 0 then 'FIZZBUZZ'
when row%5 = 0 then 'BUZZ'
when row%3 = 0 then 'FIZZ'
else cast(row as varchar(12))
end)
from cte2 where row < 100000000001
March 2, 2010 at 4:46 pm
I have posted a solution on ASK that does the COUNT_BIG select for 4 billion rows in 16 minutes on SQL Express on my laptop, while I am working in Outlook and have a VM running an ERP package, so I am sure it would be quicker on a server with lots of super-fast RAM.
Surprisingly (maybe), instead of using CTEs I used good old views, and in all my tests it is slightly quicker than using CTEs....
If only there were a quick alternative to the ROW_NUMBER function that also allowed parallelism, then we could use more than one thread and speed it up even more....
I basically use a simple union of 32 "select 1" statements in the first view, then cross join that with itself in two more views to get over a million potential rows.
I then use a fourth cross-join view to do the ROW_NUMBER thing and make available over a trillion rows.
The last view does the FizzBuzz thing using the row numbers generated in view 4.
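In outline, the chain looks something like this - a simplified sketch rather than the exact views I posted on ASK, so the names and details here are just illustrative:
create view dbo.v32 as -- view 1: 32 constant rows built from a union, no table access
select 1 as n union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1 union all select 1
go
create view dbo.v1K as -- view 2: 32 x 32 = 1,024 rows
select 1 as n from dbo.v32 a cross join dbo.v32 b
go
create view dbo.v1M as -- view 3: 1,024 x 1,024 = 1,048,576 rows (the "over a million")
select 1 as n from dbo.v1K a cross join dbo.v1K b
go
create view dbo.vNums as -- view 4: 1,048,576 squared = just over a trillion numbered rows
select row_number() over (order by (select 0)) as row
from dbo.v1M a cross join dbo.v1M b
go
create view dbo.vFizzBuzz as -- view 5: FizzBuzz over the row numbers from view 4
select row,
case when row%15 = 0 then 'FIZZBUZZ'
when row%5 = 0 then 'BUZZ'
when row%3 = 0 then 'FIZZ'
else cast(row as varchar(14)) end as fizzbuzz
from dbo.vNums
go
-- e.g. the 4 billion row count is then just:
-- select count_big(fizzbuzz) from dbo.vFizzBuzz where row < 4000000001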
MM
March 2, 2010 at 5:09 pm
Nice solution.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 2, 2010 at 5:33 pm
Why thank you, Mister Magoo, that's very interesting feedback. I have most recently been looking at my version of the 4 billion on a virtual server running SQL Server 2005, and it's taking 15 minutes. My record is 13 minutes on a slightly higher-spec virtual machine. Your timings are really interesting, because it's really difficult to gauge what they mean when taken in isolation.
BTW do you intend to go for the trillion any time soon?
Here's my version
THE TRILLION
with
cte1 as
--(select TOP 10000 1 as col from master..syscolumns)
(select TOP 10000 1 as col from master..tally)
,
cte2 as
(select row_number() over (order by (select 0)) as row from cte1
cross join cte1 AS T2
cross join cte1 AS T3)
SELECT count_big(all case
when row%15 = 0 then 'FIZZBUZZ'
when row%5 = 0 then 'BUZZ'
when row%3 = 0 then 'FIZZ'
else cast(row as varchar(14))
end)
from cte2 where row < 1000000000001
March 2, 2010 at 5:44 pm
@jason - thanks.
@steve-2 - yes - just about to try it, but I am not sure if my laptop will have enough memory - have you done any monitoring to see what is required for the trillion?
MM
March 2, 2010 at 8:41 pm
Too late now!
I experimented with those union queries but couldn't find the right one, probably having too many unions or too few. Did you arrive at 32 by a trial and error process or did you have an inkling in advance?
March 2, 2010 at 8:43 pm
Tom
I think you'll find my query is written in a different way which speeds it up by a factor of 4.
March 3, 2010 at 2:05 am
Using an xml path clause, it's possible to perform the modulo operation only for 3 and 5, because the output for 15 is the concatenation of the other two values.
;with cte(num) as (
    select 1 union all
    select num+1 from cte
    where num < 100)
select
    isnull(
        -- when num is divisible by both 3 and 5, for xml path('') concatenates 'Fizz' and 'Buzz'
        (select ltrim(string)
         from (select 3 as module, 'Fizz' as string
               union all
               select 5, 'Buzz') T
         where num % module = 0
         for xml path(''))
    , num)
from cte
option (maxrecursion 100)
March 3, 2010 at 3:48 am
steve-893342 (3/2/2010)
Tom, I think you'll find my query is written in a different way which speeds it up by a factor of 4.
Yes, "order by (select 0)" is a nice. It's a "you can't do that" error that you get if you try to use "order by NULL" or "order by 0" in that context in a UDF. Using (select 0) (or (select NULL) - I wonder if it makes a difference) eliminates an expensive sort. I'm surprised it's a factor 4, though, and I don't see any other differences (apart from not being wrapped in a UDF).
Tom
March 3, 2010 at 4:37 am
steve-893342 (3/2/2010)
Too late now! I experimented with those union queries but couldn't find the right one, probably having too many unions or too few. Did you arrive at 32 by a trial and error process or did you have an inkling in advance?
Re the unions: I kind of played around with various things - mostly based on vague hunches of what might be scalable.
Along the way I tried variants of what I selected inside the first view (int, bit, char), variants of how many to select (2, 4, 8, 16, 32, etc.), and variants of how many times to cross join, but in the end it all came down to testing the performance of the various options - rather than intelligent design....
I wanted to try views just because I thought it might (possibly) save a bit of compilation time, and I was originally hoping for some parallelism (I don't know much about how the engine decides to parallelise, obviously!), but it seems row_number() anywhere in this query prevents that...
Re the TRILLION... my first attempt (overnight) did not perform as I would like (it was still running after 6.5 hours), so I think I may have a think about the maths of the cross joins and maybe try tweaking the row counts in my views to suit the larger set - but maybe I have found the limit of the views approach?
MM
March 3, 2010 at 4:57 am
Yes, Mister Magoo, that really is a very elegant solution indeed.
I've had a look at comparing it against my query and it sure is a close one to call.
Re the TRILLION - I think you'll have to give it a lot longer than that!