September 4, 2007 at 2:08 am
Hi All,
Can anybody send me some performance tuning related documents or URLs?
Regards
Karthik
karthik
September 4, 2007 at 2:40 am
Assuming that you are using SQL Server 2000 (since these are the 7 and 2000 forums, and I hope you are not stuck with 7), I'd recommend
Microsoft SQL Server 2000(TM) Performance Tuning Technical Reference.
You can read a short review by Andy Warren on this site at http://www.sqlservercentral.com/columnists/awarren/reviewofsqlserver2000performancetuning.asp
There are many books though.
September 4, 2007 at 3:24 am
Kimberly Tripp's blog and articles
http://www.sqlskills.com/blogs/kimberly/
http://www.sqlskills.com/articles.asp
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 5, 2007 at 9:56 am
Thanks Andras and Gila!
karthik
September 7, 2007 at 10:33 pm
Oh, I dunno... I've seen those blogs and many more... they go on about how to configure hardware, how to split TempDB, how to defrag indexes, how to use (too many) indexes, how to override parallelism on cursors (that should be a hint on what's coming), and yada-yada-yada... very few of them talk about how to NOT write the crap code that requires all of that other junk.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2007 at 12:04 am
Karthik,
I don't mean to sound so cynical about all the "experts"... wait, no... I do. But no one teaches the "basics" anymore. I don't know what your previous programming experience has been, but in the old days, one of the first things they taught was how to count to 10. They don't teach that simple lesson in SQL most of the time... and when they do, it usually comes out something like this...
DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter <= 10
BEGIN
    PRINT @Counter
    SET @Counter = @Counter + 1
END
That, my friend, is what I refer to as "RBAR" which is pronounced as "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row". Let me show you in very simple terms why that's so bad.... let's run the same code but let's make it count to 100,000 instead...
DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter <= 100000
BEGIN
    PRINT @Counter
    SET @Counter = @Counter + 1
END
That takes 20 seconds or so to run... most will blame the fact that PRINT is relatively slow. So, let's populate a table and then select from the table...
CREATE TABLE #MyHead (SomeNumber INT)
DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter <= 100000
BEGIN
    INSERT INTO #MyHead (SomeNumber) VALUES (@Counter)
    SET @Counter = @Counter + 1
END

SELECT * FROM #MyHead
DROP TABLE #MyHead
Oh my God! It takes even longer (about 29 seconds) even if I run Query Analyzer in the high-speed grid mode! Why? 100,000 individual INSERTs, that's why! RBAR ON STEROIDS. NOT set-based.
Ok... so how do we do it set based? The answer, as I've suggested before, is that you have to stop thinking about rows and think about columns. And, you have to think a bit outside the box! You have to ask yourself "What will automatically count?" and "How do I create a SET of rows from 1 to 100,000?" The answer is, of course, that you have to know the software you're working with (SQL in this case). I know the IDENTITY function will automatically count during a SELECT/INTO, I know a cross-join will make more rows than you can shake a stick at, AND I know how to use TOP or SET ROWCOUNT to limit the number of rows produced by the cross-join. So, here's the set based code to do the same as the loop above, sans the loop...
SELECT TOP 100000 IDENTITY(INT,1,1) AS SomeNumber
  INTO #MyHead
  FROM Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
 CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server

SELECT * FROM #MyHead
DROP TABLE #MyHead
That only takes a bit over 1, I say ONE, second to run... that's about 29 times faster than the loop! And it's even easier in SQL Server 2005 using ROW_NUMBER()!
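In case it helps, here's a minimal sketch of that 2005 flavor (the row source is my assumption... master.sys.all_columns is just a handy system view with lots of rows, and any big table will do):

--===== SQL Server 2005: build the same numbers table with ROW_NUMBER()
SELECT TOP 100000
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS SomeNumber
  INTO #MyHead
  FROM master.sys.all_columns ac1
 CROSS JOIN master.sys.all_columns ac2

SELECT * FROM #MyHead
DROP TABLE #MyHead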
Let's do another similar problem... let's create a two column table... one column called "SomeNumber" that contains a sequential number just like before. The other column will be called "RunningTotal" and will contain the sum of all the numbers as the rows progress, kinda like in a check book. First, we'll create the table with the sequential numbers and then calculate the running total... here we go...
--===== Create the table and initially populate with sequential numbers on the fly
SELECT TOP 20000
       IDENTITY(INT,1,1) AS SomeNumber,
       CAST(0 AS BIGINT) AS RunningTotal
  INTO #MyHead
  FROM Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
 CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
--===== Add a primary key on the SomeNumber column (every table should have a PK)
ALTER TABLE #MyHead
  ADD CONSTRAINT PK_MyHead_SomeNumber PRIMARY KEY CLUSTERED (SomeNumber)
Now, here's how a lot of folks will make a running total... no loop, so it's frequently mistaken for set based but it's hundreds of times worse than a loop...
--===== Create the running total the way most people do it (the WRONG way!)
UPDATE mh2
   SET RunningTotal = (SELECT SUM(mh1.SomeNumber)
                         FROM #MyHead mh1
                        WHERE mh1.SomeNumber <= mh2.SomeNumber)
  FROM #MyHead mh2
SELECT * FROM #MyHead
DROP TABLE #MyHead
Holy Moly! Almost 2 MINUTES to do a simple update on a lousy 20,000 rows! The reason why is that the code, although it has no loop in it, is hundreds of times worse than a loop because it has to "touch" over 200 MILLION rows internally: it executes the sub-query once for each row in the table! That's (X² + X)/2 rows, or 200,010,000 rows! (See the following URL for a definition of what a "Triangular Join" is.)
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=338&messageid=359124&p=2
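If you'd like to verify that 200,010,000 figure yourself, here's a quick sketch (mine, for illustration)... run it while the 20,000-row table still exists, before the DROP TABLE above; it simply counts the row pairs the correlated sub-query has to consider:

--===== Count the rows the triangular join touches internally
SELECT COUNT_BIG(*) AS RowsTouched --Returns 200,010,000 = (20000² + 20000)/2
  FROM #MyHead mh1
  JOIN #MyHead mh2
    ON mh1.SomeNumber <= mh2.SomeNumber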
Because it doesn't have a loop, lots of folks think that code is set based and then have the nerve to say set based solutions aren't as good as a cursor or WHILE loop... they simply haven't explored all of the options... they not only failed to think outside the box, they don't even realize that they're in one.
So, how to do it set based? Like I said, you have to know the language... the following code runs like an ape with its hair on fire ... First, we'll start with a clean slate, just like before...
--===== Create the table and initially populate with sequential numbers on the fly
SELECT TOP 20000
       IDENTITY(INT,1,1) AS SomeNumber,
       CAST(0 AS INT) AS RunningTotal
  INTO #MyHead
  FROM Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
 CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
--===== Add a primary key on the SomeNumber column (every table should have a PK)
ALTER TABLE #MyHead
  ADD CONSTRAINT PK_MyHead_SomeNumber PRIMARY KEY CLUSTERED (SomeNumber)
... and now the running total... don't blink when you run this or you'll miss it
/********************************************************************************
 Create the running total using some set based tricks of the trade which
 requires that you know the software tool (SQL) that you're using.
********************************************************************************/
--===== Declare and preset local variable(s)
DECLARE @RunningTotal INT
    SET @RunningTotal = 0
DECLARE @StartTime DATETIME --Runs so fast, we have to measure duration
    SET @StartTime = GETDATE()
--===== Do the running total... don't blink or you'll miss it!
     -- The variable carries the total from row to row in clustered index order
UPDATE #MyHead
   SET @RunningTotal = RunningTotal = @RunningTotal + SomeNumber
  FROM #MyHead WITH (INDEX(PK_MyHead_SomeNumber))
SELECT * FROM #MyHead
DROP TABLE #MyHead
--===== Show the duration
SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds!'
I get "250 Milliseconds!" and the correct answer on my box... compared to the two minutes (120 seconds) the other solution took, this solution ran 480 times faster!
Now... THAT's the power of "true" set based programming, and there are very few exceptions, ever.
So, let's sum this up... we performance-tuned the other query by adding the correct index to the table and it still took 2 minutes to run. And, we ran it on pretty good hardware. So, if what most folks resort to as "performance tuning" were gasoline, you wouldn't have enough to run a sugar ant's mini-bike through a matchbox...
... the ONLY way to "performance tune" is to know the software you are using and then write good, intelligent code. In order to do that, you simply need to study and practice. My estimation of high speed code tuning is 2% hardware, 1% indexes, and 97% code dependent... plain and simple... crap code will kill even the most hardy of servers.
Whew! Sorry to be so long-winded, but I wanted to give you some good, practical examples of how to "performance tune" by writing good code that doesn't need to be tuned. Lemme know what you think about all of that...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2007 at 12:57 am
P.S. Increasing the number of rows to 30,000 (just 50% more rows) caused the time on the first running total example to more than double, to 4 minutes and 19 seconds... the true set based solution also roughly doubled... it took 436 MILLIseconds.
AND, I changed the RunningTotal column and the @RunningTotal variable to BIGINT and ran the set based example on a MILLION rows... it took only 12.986 seconds... how long do you think the first running total example would take? I'm not going to tie my machine up that long to find out!
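A back-of-the-envelope estimate, if the (X² + X)/2 triangular-join math holds: a million rows means about 500 billion internal rows, roughly 2,500 times the 20,000-row case, so that 2-minute timing would stretch to something on the order of days.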
Like I said, the best type of performance tuning that you can do for your server is to write good code that doesn't need tuning.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2007 at 11:39 am
That is correct. I met one of the writers of the above book; he did not know the difference between UNION and UNION ALL, and the biggest gap was that they did not cover DBCC SHOWCONTIG. There are SQL Server 2000 tuning tools on the market that are just running SHOWCONTIG internals through Profiler. I was impressed with Ken England's book, but he did not write a 2005 version of it.
http://www.simple-talk.com/community/forums/thread/1515.aspx
http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692
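For anyone following along, the UNION vs. UNION ALL difference that writer missed takes two lines to demonstrate (a trivial sketch of my own):

SELECT 1 AS N UNION SELECT 1     --UNION removes duplicates (and pays a sort/distinct to do it): one row
SELECT 1 AS N UNION ALL SELECT 1 --UNION ALL simply appends the rows: two rows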
Kind regards,
Gift Peddie
September 8, 2007 at 7:04 pm
Yeah... how 'bout that? I'm thinking that a couple of us on this forum should get together and write our own book... one that has some useful info
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2007 at 12:22 am
> one that has some useful info
Which will not be read by anybody.
Did you notice how quickly they all lose interest when it comes to something that does not fit their minds?
_____________
Code for TallyGenerator
September 9, 2007 at 9:18 am
Heh... yeah... I did notice that... easier for them to throw hardware at the problem even though that doesn't work. Still makes them feel like they did something.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2007 at 2:19 pm
I guess you lost me at "set-based". Your computation, although very elegant and incredibly efficient, is not set-based at all. As a matter of fact, I'd venture to say you've managed to prove very effectively that under certain circumstances, set-based is not the most efficient way.
After all - if you look closely at the operation, there are TWO conceptual steps (the calculation and the storage thereof). Your optimization (VERY impressive, by the way) was to make the computation (how you come up with @RunningTotal) NOT set-based, but maintain the storage (e.g. the INSERT) as something set-based. The "pure" set-based operation (where you use the sub-query) is a lot less efficient (as you've managed to prove), but IS in fact "set-based" (since you could run it regardless of the physical order and STILL come up with the same result).
Remember - in Set Theory, sets are unordered collections, so "order is irrelevant", and order is a key concept in a running total (try changing the order around). Trying to perform this activity in a set-based environment means you can't count on what "happened before or after" the record you're on... and counting on exactly that is what your algorithm clearly does.
What you ARE taking advantage of is an algorithm which SCALES up much more efficiently (linear growth versus quadratic growth). The fact that you're computing it using a single pass through 30K records, versus an algorithm that essentially scans 30K^2 or so records, is where the difference is.
Interestingly enough, the part of the operation that CAN be done set-based is substantially more effective when you perform it in a set-based fashion. If you split the two operations apart and compare your operation to that of a cursor, you should see them have roughly the SAME performance. In my case it was 750ms vs. 820ms for the cursor (which I am attributing to the loop handling, etc.). Of course, there is no INSERT <table> SELECT * FROM Cursor, so your process is substantially more efficient there. Still - even with the individual inserts, the cursor version is a LOT more efficient than the subquery (4.5 seconds vs. 297 seconds when running the sub-query version).
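For reference, here's roughly what that cursor version looks like (my own sketch, not the exact code I timed; it assumes the same 20,000-row #MyHead table and primary key from Jeff's setup):

--===== Running total via an explicit cursor, in SomeNumber order
DECLARE @SomeNumber INT, @RunningTotal INT
SET @RunningTotal = 0

DECLARE curRT CURSOR LOCAL STATIC FOR
 SELECT SomeNumber FROM #MyHead ORDER BY SomeNumber

OPEN curRT
FETCH NEXT FROM curRT INTO @SomeNumber
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @SomeNumber
    UPDATE #MyHead
       SET RunningTotal = @RunningTotal
     WHERE SomeNumber = @SomeNumber --One singleton UPDATE per row: the slow part
    FETCH NEXT FROM curRT INTO @SomeNumber
END
CLOSE curRT
DEALLOCATE curRT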
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 9, 2007 at 2:23 pm
By the way - who are the "they" you two are beating up on (the ones who lose interest in good programming when it doesn't "fit their views")? I must have missed part of THAT discussion.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 9, 2007 at 2:58 pm
Matt,
Who told you that a subquery is "set based"?
If you heard it from a person, don't listen to his/her judgements anymore.
If you read it in a book, throw it away.
'Cause it's crap. Sorry.
The subquery is executed FOR EACH ROW.
Because for each row of the query, the value(s) supplied to the subquery are different, it must be rerun from the beginning.
It's a classic example of a HIDDEN CURSOR.
All your conclusions about the set-based approach are wrong - you just don't know what it is. You've learned it from the wrong book.
Sorry.
_____________
Code for TallyGenerator
September 9, 2007 at 3:48 pm
You're missing my point. Jeff's method isn't set-based either. It's using a SEQUENCE (a list with an ORDER), which is a vastly different concept. The calculation HAS to be done for each row as well (that's the definition of a "running total"). Like it or not, the fact that he's using a C++-style accumulator to come up with the number, and that that number is ENTIRELY dependent on the physical sort order of the underlying data, negates this as being set-based. I was just asking that you not ignore the very SAME hidden cursor buried within this piece of code.
Don't get me wrong - that's some of the very best code I've seen in some time. It performs at lightning speed, and consumes fewer resources than either the sub-query or the "traditional" cursor solution. It just doesn't fit his characterization as being set-based. In my mind it's about the same as saying "sp_MSforeachtable doesn't use a cursor" - it does, just not one I have to define.
What's making this that much more efficient is that he's "reusing" the total from the previous record to calculate the current record. In this case that in fact makes for a huge difference, and it's to his credit that he can think that way.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?