May 7, 2011 at 9:42 pm
Peter:
Here is my tally table generation code. The important elements are likely the DBREINDEX and UPDATE STATISTICS; while you "shouldn't" need them, I've found a very small difference by using them... and this thread is all about finding very small differences.
Much of the other silliness you see (temp tally tables just to fill permanent ones) is there to get it to work on SQL Server 2000 boxes with large values of N, and to handle the VARCHAR(MAX) splitters and other larger work.
-- 16-row base table; the cross joins below multiply it out to the row counts needed.
IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = object_id(N'[tempdb].[dbo].[#TallyTempBase]'))
DROP TABLE [dbo].[#TallyTempBase]

CREATE TABLE #TallyTempBase
(N bit NOT NULL -- the value is never read, so the smallest type will do
);

INSERT INTO #TallyTempBase
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 -- 16 rows of 1
-- Permanent 1-based tally table: N = 1 to 32767 (fits in a smallint).
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[Tally]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE [dbo].[Tally]

CREATE TABLE dbo.[Tally] (
[N] [smallint] NOT NULL ,
CONSTRAINT [PK_Tally_N] PRIMARY KEY CLUSTERED
(
[N]
) WITH FILLFACTOR = 100 ON [PRIMARY]
) ON [PRIMARY]
-- 1-based numbers via SELECT INTO with IDENTITY(), which works on SQL Server 2000;
-- the cross joins multiply the 16-row base up to 65536 candidate rows.
IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = object_id(N'[tempdb].[dbo].[#Tally]'))
DROP TABLE [dbo].[#Tally]

SELECT TOP 32767
IDENTITY(SMALLINT,1,1) AS N
INTO #Tally
FROM #TallyTempBase t1 -- 16 rows
CROSS JOIN #TallyTempBase t2 -- 256 rows
CROSS JOIN #TallyTempBase t3 -- 4096 rows
CROSS JOIN #TallyTempBase t4 -- 65536 rows

INSERT INTO dbo.Tally
SELECT TOP 32767 N
FROM #Tally
ORDER BY N

DROP TABLE #Tally

-- You "shouldn't" need these, but I've measured a very small improvement.
DBCC DBREINDEX('dbo.Tally')
UPDATE STATISTICS Tally WITH FULLSCAN, NORECOMPUTE
-- Permanent 0-based tally table: N = 0 to 32767 (32768 rows, still within a smallint).
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[Tally0Based]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE [dbo].[Tally0Based]

CREATE TABLE dbo.[Tally0Based] (
[N] [smallint] NOT NULL ,
CONSTRAINT [PK_Tally0Based_N] PRIMARY KEY CLUSTERED
(
[N]
) WITH FILLFACTOR = 100 ON [PRIMARY]
) ON [PRIMARY]

IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = object_id(N'[tempdb].[dbo].[#Tally0Based]'))
DROP TABLE [dbo].[#Tally0Based]

SELECT TOP 32768
IDENTITY(SMALLINT,0,1) AS N -- seed at 0 for the 0-based version
INTO #Tally0Based
FROM #TallyTempBase t1 -- 16 rows
CROSS JOIN #TallyTempBase t2 -- 256 rows
CROSS JOIN #TallyTempBase t3 -- 4096 rows
CROSS JOIN #TallyTempBase t4 -- 65536 rows

INSERT INTO dbo.Tally0Based
SELECT TOP 32768 N
FROM #Tally0Based
ORDER BY N

DROP TABLE #Tally0Based

-- Again, "shouldn't" be needed, but they've shown a very small improvement.
DBCC DBREINDEX('dbo.Tally0Based')
UPDATE STATISTICS Tally0Based WITH FULLSCAN, NORECOMPUTE
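As a quick sanity check that the finished table behaves (a throwaway sketch only, not one of the splitters being benchmarked in this thread), a classic tally-based split of a short CSV looks like this:

-- Throwaway sanity check: split a short CSV using dbo.Tally.
-- Delimiters are wrapped around the string so every element is framed.
DECLARE @s varchar(8000)
SET @s = ',' + 'abc,de,fgh' + ','
SELECT SUBSTRING(@s, t.N + 1, CHARINDEX(',', @s, t.N + 1) - t.N - 1) AS Item
FROM dbo.Tally t
WHERE t.N < LEN(@s)
AND SUBSTRING(@s, t.N, 1) = ','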
P.S. Don't be surprised by the optimizer; it's entirely likely that some query hints will be required to get optimal performance. From 2000 through 2008 R2, I've seen some mind-bogglingly bad choices from the optimizer. Real performance tuning is now, and always has been, about benchmarking reality... because the theory doesn't always hold true in practice, even (though rarely) at the assembly-code level (Pentium floating point bug, anyone?).
May 7, 2011 at 9:47 pm
Jeff Moden (5/7/2011)
...Does anyone else have any other suggestions for a speed enhancement before I do that?
Nadrek and Peter. Thanks a million for taking the time to tweak the code. Wayne, thanks for testing so many people's suggestions and posting the results in graph form. You've just got to love the collaboration that occurred in this thread. Well done, folks! 🙂
You're quite welcome!
For the temporary tally table variant, I would also suggest trying something other than a power-of-10 multi-CTE: 20^3 is 8000, and 21^3 (9261) is just over 9000. It's not many more UNIONs, a slightly smaller overall size, and one fewer join in the nested CTEs. Also try 91^2 (8281), with two fewer CTEs.
Also, make certain it's using a smallint, not an int; there's no reason to spend 4 bytes per number when the value never exceeds 32767.
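To show the shape I mean, here's a quick untuned sketch of the 20^3 version (layout is illustrative only, not lifted from the tested code); the CONVERT also keeps N a smallint, per the note above:

-- A 20-row CTE cross-joined with itself twice: 20 * 20 * 20 = 8000 rows,
-- one fewer join than the usual 10^4 (E1 -> E2 -> E4) stack.
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 20 rows
cteTally(N) AS (
SELECT TOP (8000) CONVERT(smallint, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c -- 8000 rows
)
SELECT N FROM cteTally;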
May 8, 2011 at 2:54 am
Nadrek (5/7/2011)
Jeff Moden (5/7/2011)
You're quite welcome!
For the temporary tally table variant, I would also suggest trying something other than a power-of-10 multi-CTE: 20^3 is 8000, and 21^3 (9261) is just over 9000. It's not many more UNIONs, a slightly smaller overall size, and one fewer join in the nested CTEs. Also try 91^2 (8281), with two fewer CTEs.
Also, make certain it's using a smallint, not an int; there's no reason to spend 4 bytes per number when the value never exceeds 32767.
I tried that too, but since the tally driving tables (E1 to E4) return a constant, it does not seem to matter one bit whether they return a bit or an int. None of those values are actually used; the only one that is used is in the cteTally, and that is a generated number, so it is likely optimized to be memory-less as well. There seems to be no data stored in memory other than some counters. I would need to look inside the memory usage itself to make sure (it can matter on a busy system), but I would have expected a speed difference, which I did not see during my previous tests.
I am certainly going to look a bit further into it though, as I did notice some implicit converts, sometimes even from int to bigint and then, in the next operation, back to int again.
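To illustrate the kind of thing I mean (a hypothetical fragment, not the splitter itself):

-- ROW_NUMBER() returns bigint; without an explicit CONVERT the plan can
-- cast bigint -> int and then back again in the next operation.
-- Converting once up front keeps the expression in a single type.
SELECT TOP (8000)
CONVERT(int, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS N
FROM sys.all_columns a
CROSS JOIN sys.all_columns b -- any sufficiently large row source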
May 8, 2011 at 10:19 pm
Ok... Peter and Nadrek... I need to know your real names for inclusion in the final documentation in the updated sprocs that I'm going to attach to the article. Either tell me here or send me an email. If you prefer for your full names not to be known, let me know that and I'll just include your SSC handles.
But I need to know sooner than later so I can put this puppy to bed. Thanks folks. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 8, 2011 at 10:35 pm
By the way... I've attached the final results, which also contain more charts than you can shake a stick at.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2011 at 3:59 am
Jeff Moden (5/8/2011)
Ok... Peter and Nadrek... I need to know your real names for inclusion in the final documentation in the updated sprocs that I'm going to attach to the article. Either tell me here or send me an email. If you prefer for your full names not to be known, let me know that and I'll just include your SSC handles.
But I need to know sooner than later so I can put this puppy to bed. Thanks folks. 🙂
Well, my name is Peter de Heer.
May 9, 2011 at 4:03 am
Jeff Moden (5/8/2011)
By the way... I've attached the final results, which also contain more charts than you can shake a stick at.
And wow, that Split method (green line) does some crazy stuff, but is overall clearly the winner.
Is that the .NET CLR one or a new SQL based one I missed?
May 9, 2011 at 4:33 am
peter-757102 (5/9/2011)
Jeff Moden (5/8/2011)
By the way... I've attached the final results, which also contain more charts than you can shake a stick at.
And wow, that Split method (green line) does some crazy stuff, but is overall clearly the winner.
Is that the .NET CLR one or a new SQL based one I missed?
That's the CLR that Paul wrote for me. As you can tell from the first chart, it doesn't like input that doesn't actually need to be split, and that appears to be the only real spot where it has trouble. As you say, it's clearly the overall winner.
Thank you for the full name, Peter. You and Nadrek did good things on this thread. The final script I attach will be the documented version of your DelimitedSplit8k_T1 (without the _T1 in the name). This function has been "helped" by a lot of good folks over time, and I wanted to be sure to give you both some credit for your extreme and well thought out participation, as well as for the great improvement you made through DelimitedSplit8k_T1.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2011 at 5:23 am
Jeff
This article inspired me to write a TallyTable article for Access users/developers.
And thanks to the other commenters for your patience and enlightening comments.
Mark
May 9, 2011 at 5:39 am
About the little wiggly bits at the start of the green lines (the SQL CLR implementation): CLR code is compiled to an intermediate language to make it portable across different processor architectures. That code is further optimized and compiled to native machine code by the just-in-time (JIT) compiler the first time it runs on the machine. The conversion is performed only once per code path for as long as the assembly stays loaded, so subsequent calls skip it.
This is why tests should run a newly-installed CLR function with a few dummy values before the 'hot laps': once the different code paths have been compiled to machine code on demand, the CLR implementation reaches its full potential.
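For example (purely hypothetical warm-up calls; dbo.SplitCLR and its signature are placeholders here, not the function from this thread):

-- A few throwaway executions so the JIT compiles the hot paths
-- before any timed runs. Name and signature are illustrative only.
SELECT * FROM dbo.SplitCLR(N'a,b,c', N',') -- normal path
SELECT * FROM dbo.SplitCLR(N'', N',') -- empty-input path
SELECT * FROM dbo.SplitCLR(N'no delimiters here', N',') -- single-element path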
My guess is that Jeff forgot about this and the first runs (on very small numbers of elements and element lengths) were running CLR code that was not yet fully compiled to native code. It really doesn't matter though - the test results are utterly conclusive.
Also, bear in mind that the CLR code is obvious and basic, and written by an amateur in ten minutes. Compare that to the effort expended to try to get 10% better performance out of the TSQL solutions - a language which just is not well suited to these types of tasks.
Consider also that the CLR function works with strings up to 2GB (MAX data types), Unicode, and is the only implementation to scale linearly with the number of elements; look at the composite graphs: the CLR lines almost overlap!
I should probably also mention that CLR routines work exclusively with Unicode, so the CLR routine has the overhead of converting the VARCHAR input strings to Unicode, and the returned results (also Unicode) are twice as big, at two bytes per character. That means the CLR routine is writing twice as much data to disk in Jeff's tests, and converting to or from Unicode for every row of input and output!
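The doubling is easy to see (just an illustration):

-- Same seven characters; the Unicode literal occupies twice the bytes.
SELECT DATALENGTH('abc,def') AS varchar_bytes, -- 7
DATALENGTH(N'abc,def') AS nvarchar_bytes -- 14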
Paul
May 9, 2011 at 5:56 am
SQLkiwi
Can people use ngen to compile the assembly and save the JIT overhead?
May 9, 2011 at 6:41 am
SQLkiwi (5/9/2011)
About the little wiggly bits at the start of the green lines (the SQL CLR implementation): CLR code is compiled to an intermediate language to make it portable across different processor architectures. That code is further optimized and compiled to native machine code by the just-in-time (JIT) compiler the first time it runs on the machine. The conversion is performed only once per code path for as long as the assembly stays loaded, so subsequent calls skip it.
This is why tests should run a newly-installed CLR function with a few dummy values before the 'hot laps': once the different code paths have been compiled to machine code on demand, the CLR implementation reaches its full potential.
My guess is that Jeff forgot about this and the first runs (on very small numbers of elements and element lengths) were running CLR code that was not yet fully compiled to native code. It really doesn't matter though - the test results are utterly conclusive.
Also, bear in mind that the CLR code is obvious and basic, and written by an amateur in ten minutes. Compare that to the effort expended to try to get 10% better performance out of the TSQL solutions - a language which just is not well suited to these types of tasks.
Consider also that the CLR function works with strings up to 2GB (MAX data types), Unicode, and is the only implementation to scale linearly with the number of elements; look at the composite graphs: the CLR lines almost overlap!
I should probably also mention that CLR routines work exclusively with Unicode, so the CLR routine has the overhead of converting the VARCHAR input strings to Unicode, and the returned results (also Unicode) are twice as big, at two bytes per character. That means the CLR routine is writing twice as much data to disk in Jeff's tests, and converting to or from Unicode for every row of input and output!
Paul
Yep... forgot about that. I'll try it a different way. Thanks, Paul.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2011 at 6:45 am
mark hutchinson (5/9/2011)
Jeff
This article inspired me to write a TallyTable article for Access users/developers.
And thanks to the other commenters for your patience and enlightening comments.
Mark
Very cool. Hopefully, it'll be here on SSC but, if it's not, let us know what the URL is... I'd love to have a read on it. Thanks. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2011 at 6:48 am
Jeff
I just submitted it yesterday. It hasn't entered the editor review phase yet. I'd like to know what you think. I might be able to incorporate your suggestions during the editing process.
May 9, 2011 at 7:38 am
Thanks, Mark. I'll have a look.
--Jeff Moden
Change is inevitable... Change for the better is not.