Performance Tuning Guide

  • Hi All,

    Can anybody send me some performance tuning related documents or URLs?

    Regards

    Karthik

  • Assuming that you are using SQL Server 2000 (since these are the 7 and 2000 forums, and I hope you are not stuck with 7), I'd recommend

    Microsoft SQL Server 2000(TM) Performance Tuning Technical Reference.

    You can read a short review by Andy Warren on this site at http://www.sqlservercentral.com/columnists/awarren/reviewofsqlserver2000performancetuning.asp

    There are many books though.


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • Kimberley Tripp's blog and articles

    http://www.sqlskills.com/blogs/kimberly/

    http://www.sqlskills.com/articles.asp

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, Andras and Gila!

    karthik

  • Oh, I dunno... I've seen those blogs and many more... they go on about how to configure hardware, how to split TempDB, how to defrag indexes, how to use (too many) indexes, how to override parallelism on cursors (that should be a hint on what's coming), and yada-yada-yada... very few of them talk about how NOT to write the crap code that requires all of that other junk.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Karthik,

    I don't mean to sound so cynical about all the "experts"... wait, no... I do.  But no one teaches the "basics" anymore.  I don't know what your previous programming experience has been, but in the old days, one of the first things they taught was how to count to 10.  They don't teach that simple lesson in SQL most of the time... and when they do, it usually comes out something like this...

    DECLARE @Counter INT
        SET @Counter = 1
      WHILE @Counter <= 10
      BEGIN
              PRINT @Counter 
                SET @Counter = @Counter + 1
        END

    That, my friend, is what I refer to as "RBAR" which is pronounced as "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".   Let me show you in very simple terms why that's so bad.... let's run the same code but let's make it count to 100,000 instead...

    DECLARE @Counter INT
        SET @Counter = 1
      WHILE @Counter <= 100000
      BEGIN
              PRINT @Counter 
                SET @Counter = @Counter + 1
        END

    That takes 20 seconds or so to run... most will blame the fact that PRINT is relatively slow.  So, let's populate a table and then select from the table...

     CREATE TABLE #MyHead (SomeNumber INT)
    DECLARE @Counter INT
        SET @Counter = 1
      WHILE @Counter <= 100000
      BEGIN
             INSERT INTO #MyHead (SomeNumber) 
             VALUES (@Counter)
                SET @Counter = @Counter + 1
        END
     
     SELECT * FROM #MyHead
       DROP TABLE #MyHead

    Oh my God!  It takes even longer (about 29 seconds) even if I run Query Analyzer in the high speed grid mode!  Why?  100,000 individual INSERTs, that's why!  RBAR ON STEROIDS.  NOT set-based.

    Ok... so how to do it set based?  The answer, as I've suggested before, is that you have to stop thinking about rows and think about columns.  And, you have to think a bit outside the box!  You have to ask yourself "What will automatically count?" and "How do I create a SET of rows from 1 to 100,000?"  The answer is, of course, you have to know the software you're working with (SQL in this case).  I know the IDENTITY function will automatically count during a SELECT/INTO and I know a cross-join will make more rows than you can shake a stick at AND I know how to use TOP or SET ROWCOUNT to limit the number of rows produced by the cross-join.  So, here's the set based code to do the same as the loop above, sans the loop...

     SELECT TOP 100000
            IDENTITY(INT,1,1) AS SomeNumber
       INTO #MyHead
       FROM      Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
      CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
     
     SELECT * FROM #MyHead
       DROP TABLE #MyHead

    That only takes a bit over 1, I say ONE, second to run... that's about 29 times faster than the loop!  And it's even easier in SQL Server 2005 using ROW_NUMBER()!
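    For reference, here's a sketch of what that number generation might look like in SQL Server 2005 using ROW_NUMBER() (this block is mine, not from the original post; it reuses the #MyHead name from the examples above):

    ```sql
    --===== Sketch only: SQL Server 2005+ syntax.  ROW_NUMBER() numbers the
    --      cross-joined rows without relying on the IDENTITY()/SELECT INTO trick.
     SELECT TOP 100000
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS SomeNumber
       INTO #MyHead
       FROM      Master.dbo.SysComments sc1
      CROSS JOIN Master.dbo.SysComments sc2

     SELECT * FROM #MyHead
       DROP TABLE #MyHead
    ```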

    Let's do another similar problem... let's create a two column table... one column called "SomeNumber" that contains a sequential number just like before.  The other column will be called "RunningTotal" and will contain the sum of all the numbers as the rows progress, kinda like in a check book.  First, we'll create the table with the sequential numbers and then calculate the running total... here we go...

    --===== Create the table and initially populate with sequential numbers on the fly
     SELECT TOP 20000
            IDENTITY(INT,1,1) AS SomeNumber,
            CAST(0 AS BIGINT) AS RunningTotal
       INTO #MyHead   
       FROM      Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
      CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
    --===== Add a primary key on the SomeNumber column (every table should have a PK)
      ALTER TABLE #MyHead
        ADD CONSTRAINT PK_MyHead_SomeNumber 
            PRIMARY KEY CLUSTERED (SomeNumber)

    Now, here's how a lot of folks will make a running total... no loop, so it's frequently mistaken for set based but it's hundreds of times worse than a loop...

    --===== Create the running total the way most people do it (the WRONG way!)
     UPDATE #MyHead
        SET RunningTotal = (SELECT SUM(SomeNumber) FROM #MyHead mh1 WHERE mh1.SomeNumber <= mh2.SomeNumber)
       FROM #MyHead mh2
     SELECT * FROM #MyHead
       DROP TABLE #MyHead

    Holy Moly!  Almost 2 MINUTES to do a simple update on a lousy 20,000 rows!  The reason why is that the code, although it has no loop in it, is hundreds of times worse than a loop: it has to "touch" over 200 MILLION rows internally because it executes the sub-query once for each row in the table!  That's (X² + X)/2 rows, or 200,010,000 rows!  (See the following URL for a definition of what a "Triangular Join" is.)

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=338&messageid=359124&p=2
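    The triangular-join arithmetic is easy to verify for yourself: with X rows, a "<=" join touches (X² + X)/2 rows in total.  A quick sanity check (my sketch, not code from the post):

    ```sql
    --===== For X = 20,000 rows, the triangular join touches (X² + X)/2 rows
    DECLARE @X BIGINT
        SET @X = 20000
     SELECT (@X * @X + @X) / 2 AS RowsTouched --Returns 200,010,000
    ```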

    Because it doesn't have a loop, lots of folks think that code is set based and then have the nerve to say set based solutions aren't as good as a cursor or WHILE loop... they simply haven't explored all of the options... not only did they fail to think outside the box, they don't even realize that they're in a box.

    So, how to do it set based?  Like I said, you have to know the language... the following code runs blazingly fast... First, we'll start with a clean slate, just like before...

    --===== Create the table and initially populate with sequential numbers on the fly
     SELECT TOP 20000
            IDENTITY(INT,1,1) AS SomeNumber,
            CAST(0 AS INT) AS RunningTotal
       INTO #MyHead   
       FROM      Master.dbo.SysComments sc1 --Contains more than 4,000 rows on a brand new server
      CROSS JOIN Master.dbo.SysComments sc2 --Contains more than 4,000 rows on a brand new server
    --===== Add a primary key on the SomeNumber column (every table should have a PK)
      ALTER TABLE #MyHead
        ADD CONSTRAINT PK_MyHead_SomeNumber 
            PRIMARY KEY CLUSTERED (SomeNumber) 

    ... and now the running total... don't blink when you run this or you'll miss it

    /********************************************************************************
     Create the running total using some set based tricks of the trade which requires
     that you know the software tool (SQL) that you're using.
    ********************************************************************************/
    --===== Declare and preset local variable(s)
    DECLARE @RunningTotal INT
        SET @RunningTotal = 0
    DECLARE @StartTime    DATETIME --Runs so fast, we have to measure duration
        SET @StartTime    = GETDATE()
    --===== Do the running total... don't blink or you'll miss it!
     UPDATE #MyHead
        SET @RunningTotal = RunningTotal = @RunningTotal + SomeNumber
       FROM #MyHead WITH (INDEX(PK_MyHead_SomeNumber))
     SELECT * FROM #MyHead
       DROP TABLE #MyHead
    --===== Show the duration
     SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds!'

    I get "250 Milliseconds!" and the correct answer on my box... compared to the two minutes (120 seconds) the other solution took, this solution ran 480 times faster!

    Now... THAT's the power of "true" set based programming, and there are very few exceptions, ever.

    So, let's sum this up... we performance tuned the other query by adding the correct index to the table and it still took 2 minutes to run.  And, we ran it on pretty good hardware.  So, if what most folks resort to as "performance tuning" were gasoline, you wouldn't have enough to run a sugar ant's mini-bike through a match box...

    ... the ONLY way to "performance tune" is to know the software you are using and then write good, intelligent code.  In order to do that, you simply need to study and practice.  My estimation of high speed code tuning is 2% hardware, 1% indexes, and 97% code dependent... plain and simple... crap code will kill even the most hardy of servers.

    Whew!  Sorry to be so long winded but I wanted to give you some good, practical examples of how to "performance tune" by writing good code that doesn't need to be tuned.  Lemme know what you think about all of that...

    --Jeff Moden



  • P.S.  Increasing the number of rows to 30,000 (just 50% more rows) caused the time on the first running total example to more than double, at 4 minutes and 19 seconds... the true set based solution also roughly doubled... it took 436 MILLIseconds.

    AND, I changed the RunningTotal column and @RunningTotal variable to BIGINT and ran the set based example on a MILLION rows... it took only 12.986 seconds... how long do you think the first running total example would take?  I'm not going to tie my machine up that long to find out

    Like I said, the best type of performance tuning that you can do to your server is to write good code that doesn't need tuning

    --Jeff Moden



  • That is correct. I met one of the writers of the above book; he did not know the difference between UNION and UNION ALL, and the biggest omission was that they did not cover DBCC SHOWCONTIG. There are SQL Server 2000 tuning tools on the market that just run SHOWCONTIG internals through Profiler.  I was impressed with Ken England's book, but he did not write a 2005 version of it.

    http://www.simple-talk.com/community/forums/thread/1515.aspx

    http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692
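    (For anyone unsure of the UNION vs. UNION ALL distinction mentioned above: UNION removes duplicate rows, which costs a sort or hash operation, while UNION ALL simply concatenates the result sets.  A minimal sketch of my own:)

    ```sql
    --===== UNION deduplicates; UNION ALL keeps everything
     SELECT 1 AS N UNION     SELECT 1 --Returns one row:  1
     SELECT 1 AS N UNION ALL SELECT 1 --Returns two rows: 1, 1
    ```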

    Kind regards,
    Gift Peddie

  • Yeah... how 'bout that?  I'm thinking that a couple of us on this forum should get together and write our own book... one that has some useful info

    --Jeff Moden



  • > one that has some useful info

    Which will not be read by anybody.

    Did you notice how quickly they all lose interest when it comes to something that does not fit their minds?

    _____________
    Code for TallyGenerator

  • Heh... yeah... I did notice that... it's easier for them to throw hardware at the problem even though that doesn't work.  Still makes them feel like they did something.

    --Jeff Moden



  • I guess you lost me at "set-based".  Your computation, although very elegant and incredibly efficient, is not set-based at all.  As a matter of fact, I'd venture to say you've managed to prove very effectively that under certain circumstances, set-based is not the most efficient way.

    After all - if you look closely at the operation there are TWO conceptual steps (the calculation and the storage thereof).  Your optimization (VERY impressive, by the way) was to make the computation (how you come up with @runningSum) NOT set-based, but maintain the storage (e.g. the INSERT) as something set-based.  The "pure" set-based operation (where you use the sub-query) is a lot less efficient (as you've managed to prove), but IS in fact "set-based" (since you could run that regardless of the physical order and STILL come up with the same result).

    Remember - in Set Theory, sets are unordered collections, so "order is irrelevant", and order is a key concept in a running total (try changing the order around), so trying to perform this activity in a set-based environment means that you can't count on what "happened before or after" the record you're on (which your algorithm clearly does).

    What you ARE taking advantage of is an algorithm which SCALES up much more efficiently (linear growth versus quadratic growth).  The fact that you're computing it using a single pass through 30K records, versus an algorithm that essentially scans 30K^2 or so records, is where the difference is.

    Interestingly enough, the part of the operation that CAN be done set-based is substantially more efficient when you perform it in a set-based fashion.  If you split the two operations apart and compare your operation to that of a cursor, you should see them have roughly the SAME performance.  In my case it was 750 ms vs. 820 ms for the cursor (which I am attributing to the loop handling, etc.).  Of course, there is no INSERT <table> SELECT * FROM cursor, so your process is substantially more efficient there.  Still, even with the individual inserts, the cursor version is a LOT more efficient than the subquery (4.5 seconds vs. 297 seconds when running the sub-query version).
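    (For concreteness, a cursor-based running total along the lines I'm describing might look something like this; a sketch of my own, reusing the #MyHead table and column names from Jeff's examples, not code from the original posts:)

    ```sql
    --===== Hypothetical cursor version of the running total over #MyHead
    DECLARE @SomeNumber   INT
    DECLARE @RunningTotal BIGINT
        SET @RunningTotal = 0

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
     SELECT SomeNumber FROM #MyHead ORDER BY SomeNumber

       OPEN cur
      FETCH NEXT FROM cur INTO @SomeNumber
      WHILE @@FETCH_STATUS = 0
      BEGIN
            --Accumulate, then write the running total one row at a time
            SET @RunningTotal = @RunningTotal + @SomeNumber
         UPDATE #MyHead
            SET RunningTotal = @RunningTotal
          WHERE SomeNumber = @SomeNumber
          FETCH NEXT FROM cur INTO @SomeNumber
        END
      CLOSE cur
    DEALLOCATE cur
    ```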

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • By the way - who are the "they" you two are beating up on (the ones who lose interest in good programming when it doesn't "fit their views")?  I must have missed part of THAT discussion. 


  • Matt,

    Who told you that a subquery is "set based"?

    If you heard it from a person, don't listen to his/her judgements anymore.

    If you read it in a book - throw it away.

    Cause it's crap. Sorry.

    The subquery is executed FOR EACH LINE.

    Because for each line of the query, the value(s) supplied to the subquery are different. So, it must be rerun from the beginning.

    It's a classical example of HIDDEN CURSOR.

    All your conclusions about the set based approach are wrong - you just don't know what it is. You've learnt it from the wrong book.

    Sorry.


  • You're missing my point.  Jeff's method isn't Set-based either.  It's using a SEQUENCE (a list with an ORDER), which is a vastly different concept.  The calculation HAS to be done for each row as well (that's the definition of a "running total").  Like it or not, the fact that he's using a C++-style accumulator to come up with the number, and that that number is ENTIRELY dependent on the physical sort order of the underlying data, negates this as being set-based.  I was just asking that you not ignore the very SAME hidden cursor buried within this piece of code.

    Don't get me wrong - that's some of the very best code I've seen in some time. It performs at lightning speed and consumes fewer resources than either the sub-query or the "traditional" cursor solution. It just doesn't fit his characterization as being set-based.  In my mind it's about the same as saying "sp_MSforeachtable doesn't use a cursor" - it does, just not one I have to define.

    What's making this that much more efficient is that he's "reusing" the total from the previous record to calculate the current record.  In this case that in fact makes for a huge difference, and it's to his credit that he can think that way.

     


Viewing 15 posts - 1 through 15 (of 33 total)
