Selecting from temp table outside SP?

  • I do not know much about large stored procedures and performance, and I did not develop the whole procedure. I have made some changes and thats it.

    I am really looking forward to get your comments on it.

    Here is the beast 🙂

    http://pastebin.com/fa0e03c0

    The stored procedure computes statistics about our customers and their application usage in a Citrix farm. Let me know if you need more details.

    Thanks in advance.

  • one thing that can bite you nasty is your local (temptb) copy of source data.

    1) create your temp tables up front in stead of using select into.

    2) only select the columns you need in stead of pulling over all row data

    3) create some indexes on the temptb for later usage

    since you use date, (year/month), make sure you add an extra column up front to insert an actual date your/month/01, so you don't need the date conversions anymore and still have full date functionallity.

    And fill up that date where you select your source data.

    (Index it)

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (1/18/2009)


    1) create your temp tables up front in stead of using select into.

    What is the advantage?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • - one of the advantages is the awareness of what one is creating.

    ( cfr select only the columns you need)

    - if in the case, some columns should always be non-null, you can have that in your temptb's ddl.

    - Is there any performance gain ? I don't know for sure, I cannot find a ref right now. As I recall, keeping it outside of an explicit transaction sure does have it advantages with regards to tempdb catalog contention.

    [edited]

    here's a nice ref... stating two disadvantages of select into:

    ( sql7/sql2000 forum )

    - lock resources

    - recompiles

    http://sql-server-performance.com/Community/forums/p/12553/70520.aspx

    [/edited]

    - but select into will cache tempdb objects .... yes, but if you're going to add some indexes, they nolonger are..

    source: MS doc "Working with tempdb in SQL Server 2005"

    •SQL Server 2005 caches the temporary table that is created by using a CREATE TABLE or SELECT INTO statement. In some cases, temporary tables are not cached, such as when there is an explicit DDL on a temporary table after it is created, or if there is a named constraint on the temporary table. Temporary tables are also not cached if they are part of dynamic SQL or ad-hoc batch. ....

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • thj (1/16/2009)


    I do not know much about large stored procedures and performance, I am really looking forward to get your comments on it.

    snip..

    At not even 100 lines, your beast is definitely not what I consider "large". Only the amount of data it has to process is large and there are only a few lines of code, which might make it possible to optimize it for performance. I'll give it a try and see what I can suggest.

    Three things I see offhand, are

    1. you should first create all your temp tables, so your stored procedure does not have to be recompiled each a temp table is created. Yes I know this is trivial with respect to the time required to process a huge amount of data, but this is easy to do.

    Also, a SELECT COUNT(*) from #tmpContracts might be useful, just to get an idea of what your dealing with.

    2. inserting in the WHILE clause:

    If it is possible to rewrite the row-by-row INSERT statement by a single massive all-at-once INSERT statement, as I have posted before, this can considerably improve performance.

    3. your WHERE clause:

    [font="Courier New"]WHERE DATEPART(YEAR, uh.[date]) = DATEPART(YEAR, ExecDate)

    AND DATEPART(MONTH, uh.[date]) = DATEPART(MONTH, ExecDate)

    AND sdbs.SESSIONEND>=DateAdd(Minute,H.Minute,ExecDate)

    AND sdbs.SESSIONSTART<DateAdd(Minute,H.Minute+@minutInterval,ExecDate)[/font]

    Using Date manipulation functions in a where clause means that it is not possible to use indexes, since the results of these functions is calculated on the fly.

    Off-hand, without checking to see how much it might help, I will look into creating additional columns and in a single update do the date calculations, on which indexes could be created and use these new columns in the WHERE clause.

    In the meantime, since I do not have access to the truck load of data you have to go through, I cannot easily determine wich section of the code needs attention first. I can suggest the following:

    - in all your selects, limit the scope of the data set by adding the TOP clause, i.e. [font="Courier New"]SELECT TOP 50000 ...[/font] just so it does not take eternity to run.

    - then insert the following commands:

    1. at the beginning of your routine:

    [font="Courier New"]DECLARE @dt_LastDate datetime, @ld_NewDate datetime

    SET @dt_LastDate = GETDATE()[/font]

    2. Before and atfer each of your major sections (where you put the PRINT statements) add the following commands:

    [font="Courier New"]SET @ld_NewDate = GETDATE

    SELECT @ld_NewDate, @dt_LastDate, DATEDIFF(Ms, @ld_NewDate, @dt_LastDate)

    SET @dt_LastDate = @ld_NewDate [/font]

    This will show your the most time is spent, i.e. which section of the code to optimize first.

    From your comments, it looks like you are a bit uneasy about stored procedures. The best way to gain familiarity and confidence is to get your hands dirty and actually try to improve it. Second, ask for help as you have done in this forum. But working it out yourself is the best teacher.

    Please note that since I am myself not comfortable with execution plans, I am probably not offering the best suggestions to go about it. Note that my experience is nowhere near the 10,000-point contributors.

    Hoping others will be more useful, I encourage you to become familiar and experienced in stored procs, since you have encountered a problematic one and you are the one that has to deal with it. Just think of how it will be perceived by others if you can realy tame the best and reduce the run time to a few hours ... Just that might make it very worthwhile to do it outside job hours, on your own clock.

    Please let us know how it all ends up.

  • Thanks for all the good comments. They are much appreciated!

    I will look into optimizing the SP in this week, when I have reported all the statistics to our sales department. Then I will return with the results. And probaly more questions. Thanks again.

  • Another thing I just thought of:

    At the first PRINT, abort the stored proc by entering the command

    RETURN -666

    So you can immediately get the the results, without having a few hours to see the results of all the steps, just begin with obtaining the info from the first step.

    Then move that line to the next PRINT statement to obtain the results for the two first steps.

    And so on.

    Another thing you can do, if authorized to do this is to create a "debug_log"

    table and instead of the return -666 statements, use the following

    [font="Courier New"]INSERT INTO debug_log (StepNo, New_Date, Old_Date, Elapsed_Time)

    SELECT 'Step 1', @ls_NewDate, @ls_OldDate, @ld_DateDiff).[/font]

    You could then periodically look at the contents of debug_log table (from a distinct Query Analyzer window) to see how your stored proc is going along without having to stop it at each step and without having to wait until it completes.

    Good luck.

  • Hi again guys

    Sorry that I haven't replied here for a long time, but I still haven't got time for optimizing the SP. My boss won't allow ressources for that task as it is now. It works so he is happy. Hopefully I will be able to return to this task in the near future.

    Just wanted to let you know. I appreciate your help! Thanks again.

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply