September 4, 2007 at 2:08 am
Hi All,
Can anybody send me some performance tuning related documents or URLs?
Regards
Karthik
karthik
September 4, 2007 at 2:40 am
Assuming that you are using SQL Server 2000 (since these are the 7 and 2000 forums, and I hope you are not stuck with 7), I'd recommend
Microsoft SQL Server 2000(TM) Performance Tuning Technical Reference.
You can read a short review by Andy Warren on this site at http://www.sqlservercentral.com/columnists/awarren/reviewofsqlserver2000performancetuning.asp
There are many books though.
September 4, 2007 at 3:24 am
Kimberly Tripp's blog and articles
http://www.sqlskills.com/blogs/kimberly/
http://www.sqlskills.com/articles.asp
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 5, 2007 at 9:56 am
Thanks Andras and Gila!
karthik
September 7, 2007 at 10:33 pm
Oh, I dunno... I've seen those blogs and many more... they go on about how to configure hardware, how to split TempDB, how to defrag indexes, how to use (too many) indexes, how to override parallelism on cursors (that should be a hint on what's coming), and yada-yada-yada... very few of them talk about how to NOT write the crap code that requires all of that other junk.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2007 at 12:04 am
Karthik,
I don't mean to sound so cynical about all the "experts"... wait, no... I do. But no one teaches the "basics" anymore. I don't know what your previous programming experience has been, but in the old days, one of the first things they taught was how to count to 10. They don't teach that simple lesson in SQL most of the time... and when they do, it usually comes out something like this...
DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter <= 10
BEGIN
    PRINT @Counter
    SET @Counter = @Counter + 1
END
That, my friend, is what I refer to as "RBAR" which is pronounced as "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row". Let me show you in very simple terms why that's so bad.... let's run the same code but let's make it count to 100,000 instead...
DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter <= 100000
BEGIN
    PRINT @Counter
    SET @Counter = @Counter + 1
END
That takes 20 seconds or so to run... most will blame the fact that PRINT is relatively slow. So, let's populate a table and then select from the table...
CREATE TABLE #MyHead (SomeNumber INT)
DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter <= 100000
BEGIN
    INSERT INTO #MyHead (SomeNumber) VALUES (@Counter)
    SET @Counter = @Counter + 1
END

SELECT * FROM #MyHead
DROP TABLE #MyHead
Oh my God! It takes even longer (about 29 seconds) even if I run Query Analyzer in the high-speed grid mode! Why? 100,000 individual INSERTs, that's why! RBAR ON STEROIDS. NOT set-based.
Ok... so how do we do it set based? The answer, as I've suggested before, is that you have to stop thinking about rows and think about columns. And, you have to think a bit outside the box! You have to ask yourself "What will automatically count?" and "How do I create a SET of rows from 1 to 100,000?" The answer is, of course, that you have to know the software you're working with (SQL in this case). I know the IDENTITY function will automatically count during a SELECT/INTO, I know a cross-join will make more rows than you can shake a stick at, AND I know how to use TOP or SET ROWCOUNT to limit the number of rows produced by the cross-join. So, here's the set based code to do the same as the loop above, sans the loop...
SELECT TOP 100000 IDENTITY(INT,1,1) AS SomeNumber
  INTO #MyHead
  FROM Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
 CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server

SELECT * FROM #MyHead
DROP TABLE #MyHead
That only takes a bit over 1, I say ONE, second to run... that's about 29 times faster than the loop! And it's even easier in SQL Server 2005 using ROW_NUMBER()!
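In case it helps, here's a minimal sketch of that 2005 flavor (the row source is my assumption... master.sys.all_columns is just a handy system view with lots of rows, and any big table will do):

--===== SQL Server 2005: build the same numbers table with ROW_NUMBER()
SELECT TOP 100000
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS SomeNumber
  INTO #MyHead
  FROM master.sys.all_columns ac1
 CROSS JOIN master.sys.all_columns ac2

SELECT * FROM #MyHead
DROP TABLE #MyHead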
Let's do another similar problem... let's create a two column table... one column called "SomeNumber" that contains a sequential number just like before. The other column will be called "RunningTotal" and will contain the sum of all the numbers as the rows progress, kinda like in a check book. First, we'll create the table with the sequential numbers and then calculate the running total... here we go...
--===== Create the table and initially populate with sequential numbers on the fly
SELECT TOP 20000
       IDENTITY(INT,1,1) AS SomeNumber,
       CAST(0 AS BIGINT) AS RunningTotal
  INTO #MyHead
  FROM Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
 CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
--===== Add a primary key on the SomeNumber column (every table should have a PK)
ALTER TABLE #MyHead
  ADD CONSTRAINT PK_MyHead_SomeNumber PRIMARY KEY CLUSTERED (SomeNumber)
Now, here's how a lot of folks will make a running total... no loop, so it's frequently mistaken for set based but it's hundreds of times worse than a loop...
--===== Create the running total the way most people do it (the WRONG way!)
UPDATE mh2
   SET RunningTotal = (SELECT SUM(mh1.SomeNumber)
                         FROM #MyHead mh1
                        WHERE mh1.SomeNumber <= mh2.SomeNumber)
  FROM #MyHead mh2
SELECT * FROM #MyHead
DROP TABLE #MyHead
Holy Moly! Almost 2 MINUTES to do a simple update on a lousy 20,000 rows! The reason why is that the code, although it has no loop in it, is hundreds of times worse than a loop because it has to "touch" over 200 MILLION rows internally: it executes the sub-query once for each row in the table! That's (X² + X)/2 rows, or 200,010,000 rows! (See the following URL for a definition of what a "Triangular Join" is.)
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=338&messageid=359124&p=2
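If you'd like to verify that 200,010,000 figure yourself, here's a quick sketch (mine, for illustration)... run it while the 20,000-row table still exists, before the DROP TABLE above; it simply counts the row pairs the correlated sub-query has to consider:

--===== Count the rows the triangular join touches internally
SELECT COUNT_BIG(*) AS RowsTouched --Returns 200,010,000 = (20000² + 20000)/2
  FROM #MyHead mh1
  JOIN #MyHead mh2
    ON mh1.SomeNumber <= mh2.SomeNumber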
Because it doesn't have a loop, lots of folks think that code is set based and then have the nerve to say set based solutions aren't as good as a cursor or WHILE loop... they simply haven't explored all of the options... they not only failed to think outside the box, they don't even realize that they're in one.
So, how to do it set based? Like I said, you have to know the language... the following code runs like an ape with its hair on fire ... First, we'll start with a clean slate, just like before...
--===== Create the table and initially populate with sequential numbers on the fly
SELECT TOP 20000
       IDENTITY(INT,1,1) AS SomeNumber,
       CAST(0 AS INT) AS RunningTotal
  INTO #MyHead
  FROM Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
 CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
--===== Add a primary key on the SomeNumber column (every table should have a PK)
ALTER TABLE #MyHead
  ADD CONSTRAINT PK_MyHead_SomeNumber PRIMARY KEY CLUSTERED (SomeNumber)
... and now the running total... don't blink when you run this or you'll miss it
/********************************************************************************
 Create the running total using some set based tricks of the trade which
 requires that you know the software tool (SQL) that you're using.
********************************************************************************/
--===== Declare and preset local variable(s)
DECLARE @RunningTotal INT
    SET @RunningTotal = 0
DECLARE @StartTime DATETIME --Runs so fast, we have to measure duration
    SET @StartTime = GETDATE()
--===== Do the running total... don't blink or you'll miss it!
     -- The variable carries the total from row to row in clustered index order
UPDATE #MyHead
   SET @RunningTotal = RunningTotal = @RunningTotal + SomeNumber
  FROM #MyHead WITH (INDEX(PK_MyHead_SomeNumber))
SELECT * FROM #MyHead
DROP TABLE #MyHead
--===== Show the duration
SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds!'
I get "250 Milliseconds!" and the correct answer on my box... compared to the two minutes (120 seconds) the other solution took, this solution ran 480 times faster!
Now... THAT's the power of "true" set based programming, and there are very few exceptions, ever.
So, let's sum this up... we performance-tuned the other query by adding the correct index to the table and it still took 2 minutes to run. And, we ran it on pretty good hardware. So, if what most folks resort to as "performance tuning" were gasoline, you wouldn't have enough to run a sugar ant's mini-bike through a matchbox...
... the ONLY way to "performance tune" is to know the software you are using and then write good, intelligent code. In order to do that, you simply need to study and practice. My estimation of high speed code tuning is 2% hardware, 1% indexes, and 97% code dependent... plain and simple... crap code will kill even the most hardy of servers.
Whew! Sorry to be so long-winded, but I wanted to give you some good, practical examples of how to "performance tune" by writing good code that doesn't need to be tuned. Lemme know what you think about all of that...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2007 at 12:57 am
P.S. Increasing the number of rows to 30,000 (just 50% more rows) caused the time on the first running total example to more than double, to 4 minutes and 19 seconds... the true set based solution also roughly doubled... it took 436 MILLIseconds.
AND, I changed the RunningTotal column and the @RunningTotal variable to BIGINT and ran the set based example on a MILLION rows... it took only 12.986 seconds... how long do you think the first running total example would take? I'm not going to tie my machine up that long to find out!
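A back-of-the-envelope estimate, if the (X² + X)/2 triangular-join math holds: a million rows means about 500 billion internal rows, roughly 2,500 times the 20,000-row case, so that 2-minute timing would stretch to something on the order of days.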
Like I said, the best type of performance tuning that you can do for your server is to write good code that doesn't need tuning.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2007 at 11:39 am
That is correct. I met one of the writers of the above book; he did not know the difference between UNION and UNION ALL, and the biggest gap was that they did not cover DBCC SHOWCONTIG. There are SQL Server 2000 tuning tools on the market that are just running SHOWCONTIG internals through Profiler. I was impressed with Ken England's book, but he did not write a 2005 version of it.
http://www.simple-talk.com/community/forums/thread/1515.aspx
http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692
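For anyone following along, the UNION vs. UNION ALL difference that writer missed takes two lines to demonstrate (a trivial sketch of my own):

SELECT 1 AS N UNION SELECT 1     --UNION removes duplicates (and pays a sort/distinct to do it): one row
SELECT 1 AS N UNION ALL SELECT 1 --UNION ALL simply appends the rows: two rows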
Kind regards,
Gift Peddie
September 8, 2007 at 7:04 pm
Yeah... how 'bout that? I'm thinking that a couple of us on this forum should get together and write our own book... one that has some useful info
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2007 at 12:22 am
> one that has some useful info
Which will not be read by anybody.
Did you notice how quickly they all lose interest when it comes to something that does not fit their minds?
_____________
Code for TallyGenerator
September 9, 2007 at 9:18 am
Heh... yeah... I did notice that... easier for them to throw hardware at the problem even though that doesn't work. Still makes them feel like they did something.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2007 at 2:19 pm
I guess you lost me at "set-based". Your computation, although very elegant and incredibly efficient, is not set-based at all. As a matter of fact, I'd venture to say you've managed to prove very effectively that under certain circumstances, set-based is not the most efficient way.
After all - if you look closely at the operation, there are TWO conceptual steps (the calculation and the storage thereof). Your optimization (VERY impressive, by the way) was to make the computation (how you come up with @RunningTotal) NOT set-based, but maintain the storage (e.g. the INSERT) as something set-based. The "pure" set-based operation (where you use the sub-query) is a lot less efficient (as you've managed to prove), but IS in fact "set-based" (since you could run it regardless of the physical order and STILL come up with the same result).
Remember - in Set Theory, sets are unordered collections, so "order is irrelevant", and order is a key concept in a running total (try changing the order around). Trying to perform this activity in a set-based environment means you can't count on what "happened before or after" the record you're on... and counting on exactly that is what your algorithm clearly does.
What you ARE taking advantage of is an algorithm which SCALES up much more efficiently (linear growth versus quadratic growth). The fact that you're computing it using a single pass through 30K records, versus an algorithm that essentially scans 30K^2 or so records, is where the difference is.
Interestingly enough, the part of the operation that CAN be done set-based is substantially more effective when you perform it in a set-based fashion. If you split the two operations apart and compare your operation to that of a cursor, you should see them have roughly the SAME performance. In my case it was 750ms vs. 820ms for the cursor (which I am attributing to the loop handling, etc.). Of course, there is no INSERT <table> SELECT * FROM Cursor, so your process is substantially more efficient there. Still - even with the individual inserts, the cursor version is a LOT more efficient than the subquery (4.5 seconds vs. 297 seconds when running the sub-query version).
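For reference, here's roughly what that cursor version looks like (my own sketch, not the exact code I timed; it assumes the same 20,000-row #MyHead table and primary key from Jeff's setup):

--===== Running total via an explicit cursor, in SomeNumber order
DECLARE @SomeNumber INT, @RunningTotal INT
SET @RunningTotal = 0

DECLARE curRT CURSOR LOCAL STATIC FOR
 SELECT SomeNumber FROM #MyHead ORDER BY SomeNumber

OPEN curRT
FETCH NEXT FROM curRT INTO @SomeNumber
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @SomeNumber
    UPDATE #MyHead
       SET RunningTotal = @RunningTotal
     WHERE SomeNumber = @SomeNumber --One singleton UPDATE per row: the slow part
    FETCH NEXT FROM curRT INTO @SomeNumber
END
CLOSE curRT
DEALLOCATE curRT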
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 9, 2007 at 2:23 pm
By the way - who are the "they" you two are beating up on (the ones who lose interest in good programming when it doesn't "fit their views")? I must have missed part of THAT discussion.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 9, 2007 at 2:58 pm
Matt,
Who told you that a subquery is "set based"?
If you heard it from a person, don't listen to his/her judgements anymore.
If you read it in a book, throw it away.
'Cause it's crap. Sorry.
The subquery is executed FOR EACH ROW.
Because for each row of the query, the value(s) supplied to the subquery are different, it must be rerun from the beginning.
It's a classic example of a HIDDEN CURSOR.
All your conclusions about the set-based approach are wrong - you just don't know what it is. You've learned it from the wrong book.
Sorry.
_____________
Code for TallyGenerator
September 9, 2007 at 3:48 pm
You're missing my point. Jeff's method isn't set-based either. It's using a SEQUENCE (a list with an ORDER), which is a vastly different concept. The calculation HAS to be done for each row as well (that's the definition of a "running total"). Like it or not, the fact that he's using a C++-style accumulator to come up with the number, and that that number is ENTIRELY dependent on the physical sort order of the underlying data, negates this as being set-based. I was just asking that you not ignore the very SAME hidden cursor buried within this piece of code.
Don't get me wrong - that's some of the very best code I've seen in some time. It performs at lightning speed, and consumes fewer resources than either the sub-query or the "traditional" cursor solution. It just doesn't fit his characterization as being set-based. In my mind it's about the same as saying "sp_MSforeachtable doesn't use a cursor" - it does, just not one I have to define.
What's making this that much more efficient is that he's "reusing" the total from the previous record to calculate the current record. In this case that in fact makes for a huge difference, and it's to his credit that he can think that way.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?