Writing Faster T-SQL

  • My old laptop at work with 500 MB of RAM that will be replaced this week took 9 seconds. My newer laptop at home took less than 1 second.  Actually, the time is just for displaying the numbers. 

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • If you have a subscription for SQL Server Magazine there is an article by Ben-Gan:

    http://www.sqlmag.com/Article/ArticleID/94376/sql_server_94376.html

    where he compares several different methods of creating a sequence of integers placed in a table and the "multiplicative" method mentioned above seems to be the fastest.

     


  • Nice method by Ben-Gan! He "wins", 'cause I just bought his book.

    You could do this as a recursive CTE; that would be cool.

    http://msdn2.microsoft.com/en-us/library/ms186243.aspx

    Signature is NULL

  • Than is impressive.  Thanks, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Not sure why I came back to this article, but I have to take back some of what I said about the code that Kathi found in Itzik Ben-Gan's book...

    WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT N FROM NUM WHERE N <= 1000000;

    From what I can see, the general concensus is that the code above takes about 7-9 seconds to run on most machines... it's also a compartively lot of code to "remember"...

    The basis of the code I use to generate Tally tables and large volumes of test data is much shorter and only takes a little over 2 seconds to do the same thing that Ben-Gan's code does (1,000,000 rows of sequential numbers)... I suppose something similar could be done in a CTE with Row_Number over without all the individual CTE's in the above code, but the Tally table method is almost 4 times faster...

    --===== Create and populate the test table on the fly

    -- (Takes just over 2 seconds to populate on my slow 6 year old computer)

    SELECT TOP 1000000

    IDENTITY(INT,1,1) AS Number

    INTO dbo.TestTable

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    Matt Miller and I have been doing some extreme testing of some of the new functionality available in 2k5 and above... although it's highly convenient, it's not always better or faster (in fact, it's usually neither). Choose your "weapons" wisely 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Daniel Wilson (5/22/2007)


    thanks for the intro to CTE's! I know I've written the same SELECT statement into 2 parts of a query before. No more ... ah ... once we get all our customers to switch to 2005!

    Temp tables make for an effective 2000 workaround. I don't yet have 2005 (looking forward to October), so I haven't gotten to use CTEs yet.

    Advantages of temp tables (relative to CTE):

    Persists until the end of the calling context, useful if you need to use the same information for more than one query

    Can be indexed separately from the source table

    Disadvantages of temp tables:

    Does not inherit indices from source table(s)

    Persists until the end of the calling context, so you may need to remember to drop it when you're done with it. Memory resident longer than strictly necessary in some circumstances.

    Edited for clarity


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • srienstr (3/31/2008)


    Daniel Wilson (5/22/2007)


    thanks for the intro to CTE's! I know I've written the same SELECT statement into 2 parts of a query before. No more ... ah ... once we get all our customers to switch to 2005!

    Temp tables make for an effective 2000 workaround. I don't yet have 2005 (looking forward to October), so I haven't gotten to use CTEs yet.

    Advantages:

    Persists, useful if you need to use the same information for more than one query

    Can be indexed separately from the source table

    Disadvantages:

    Does not inherit indices from source table(s)

    Persists, so you need to remember to drop it when you're done with it. Memory resident longer than strictly necessary in some circumstances.

    Hard to tell whether your Advantages/Disadvantages are about Temp Tables or CTE's. I think you're talking about temp tables. If so, I absolutely agree with the advantages and the first disadvantage.

    On the "Persists" disadvantage, I kinda of agree if the sproc is big and the Temp table is used only at the beginning. I just want everyone to be clear that Temp tables will automatically drop when the proc that created it ends or the session that created it disonnects.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/31/2008)


    On the "Persists" disadvantage, I kinda of agree if the sproc is big and the Temp table is used only at the beginning. I just want everyone to be clear that Temp tables will automatically drop when the proc that created it ends or the session that created it disonnects.

    True, but it could build up quite a bit if someone builds the table in QA and uses the same QA window throughout the day. I agree it's not nearly so much of an issue once the process is made into a sproc.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • srienstr (4/1/2008)


    Jeff Moden (3/31/2008)


    On the "Persists" disadvantage, I kinda of agree if the sproc is big and the Temp table is used only at the beginning. I just want everyone to be clear that Temp tables will automatically drop when the proc that created it ends or the session that created it disonnects.

    True, but it could build up quite a bit if someone builds the table in QA and uses the same QA window throughout the day. I agree it's not nearly so much of an issue once the process is made into a sproc.

    Well - garbage collection has always sucked, in pretty much every setting and every language I've ever run into. I personally think the SQL Server does a reasonable job at it, but still - handle your own trash, an automated system can never be as aggressive as you or know precisely when it's okay to get rid of it. I personally try to never leave anything up to the automated system to clean up.

    As to the advantages/disadvantages: it's actually rather easy to build CTE's that essentially "hide" temp tables (meaning - they materialize as worktables, etc...). don't fall for the CTE = no temp table. If the scenario calls for it, sometimes it's best to just bite the bullet and control the object you need to have: create the temp table, and have the ability to index it, alter it and drop it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • srienstr (4/1/2008)


    Jeff Moden (3/31/2008)


    On the "Persists" disadvantage, I kinda of agree if the sproc is big and the Temp table is used only at the beginning. I just want everyone to be clear that Temp tables will automatically drop when the proc that created it ends or the session that created it disonnects.

    True, but it could build up quite a bit if someone builds the table in QA and uses the same QA window throughout the day. I agree it's not nearly so much of an issue once the process is made into a sproc.

    That's a horse of a differnet color... humans have to remember to cleanup their own mess. Easy to do, to... just start a new window and close the other one.

    Having data persist in a temp table during troubleshooting can be a huge advantage because you may not need to run ALL the code like you might with a CTE, Derived Table, or Table variable.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dear Jeff: Please pardon me if this seems to be a naive question - however I wanted to understand why you have chosen to use the Newid() function here when generating the random numbers? Couldn't we have just used Rand() without any parameters as such?

    Saurabh Dwivedy
    ___________________________________________________________

    My Blog: http://tinyurl.com/dwivedys

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537

    Be Happy!
  • Saurabh Dwivedy (11/21/2009)


    Dear Jeff: Please pardon me if this seems to be a naive question - however I wanted to understand why you have chosen to use the Newid() function here when generating the random numbers? Couldn't we have just used Rand() without any parameters as such?

    Excellent question...

    The problem with RAND() is that it's like GETDATE() in a query... it's only calculated once per query. Oh, it'll change every time you run the query, but it will return all the same number for all rows in the query.

    The irony of it all is that the RAND() function requires a "random seed" in order to return different values in the same query. The only "random seed" available is NEWID().

    Run this and see...

    SELECT TOP 100

    RAND() AS UnseededRAND,

    RAND(CHECKSUM(NEWID())) AS SeededRAND

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    RAND can't take NEWID() as a seed... it has to be converted first. I used to convert it to VARBINARY when I first started using it. Some folks smarter than me observed that CHECKSUM will convert to INT quite a bit faster than a converstion to BINARY or VARBINARY.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dear Jeff:

    You are truly awesome. Heartfelt thank you for explaining it so succinctly and beautifully. I was able to understand every syllable of it.

    Regards,

    Saurabh.

    Saurabh Dwivedy
    ___________________________________________________________

    My Blog: http://tinyurl.com/dwivedys

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537

    Be Happy!
  • Understanding is the key... thanks, Saurabh. :blush:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/21/2009)


    Understanding is the key... thanks, Saurabh. :blush:

    Actually, Jeff, in a query such as this:

    select getdate(), getdate();

    the getdate() function is actually called twice. Found this out during the duscussion regarding bitbuckets QotD a few days ago.

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply