Bad performance in TSQL

  • I have a stored procedure which executes the following SQL statement:

    INSERT INTO CONTRACT_TO_IND_PARTY
        (COMP_CODE, LESSOR_NUM, MAIN_CONTRACT_NUM, SUB_CONTRACT_NUM, PARTY_ID, IND_TITLE)
    SELECT DISTINCT
        CT_COMP_CODE
        , CT_LESSOR_NUM
        , CT_MAIN_CONTRACT_NUM
        , CT_SUB_CONTRACT_NUM
        , PTY_Party_Id
        , PARSENAME(PTY_Party_Comments, 4) --Title
    FROM PT_PTY_Party_View, RAW_CT_Contract_Display_View
    WHERE CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
      AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
      AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
      AND PTY_Party_Code IS NOT NULL
      AND PTY_Party_Comments IS NOT NULL
    ORDER BY CT_COMP_CODE, CT_LESSOR_NUM, CT_MAIN_CONTRACT_NUM, CT_SUB_CONTRACT_NUM

    This query takes on the order of six hours to complete. If I run it manually, it takes about a minute. The explain plan for the SELECT statement isn't terrible: it overestimates the number of rows being returned by about 4x, but, like I said, it normally completes within a minute. None of the tables seems to have any locks on them.

  • A couple of things:

    How many rows are in each table? How many are being inserted?

    How are you running it when it takes 6 hours to run? And when it takes 1 minute to run?

    Remove 'AND PTY_Party_Code IS NOT NULL' because when parsing the field, NULL will never equal CT_COMP_CODE, CT_LESSOR_NUM or CT_MAIN_CONTRACT_NUM

    Change the join syntax to:

    FROM PT_PTY_Party_View
    INNER JOIN RAW_CT_Contract_Display_View
        ON  CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
        AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
        AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
        AND PTY_Party_Comments IS NOT NULL

    Why are you using SELECT DISTINCT?  The need for DISTINCT normally indicates incomplete join qualification.

    Why are you ordering the INSERT? There is no need for this.

    Does the show plan include 'parallelism'? If so, try it with OPTION (MAXDOP 1) at the end (a combined sketch of these suggestions is at the end of this post).

    Parsing PTY_Party_Code is a terrible idea. Consider storing it as 3 separate integer columns so that indexes can be used.
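
    A minimal sketch pulling these suggestions together (it assumes the same view and column names as the original query, keeps the DISTINCT for now, and adds MAXDOP 1 only in case the plan shows parallelism):

    INSERT INTO CONTRACT_TO_IND_PARTY
        (COMP_CODE, LESSOR_NUM, MAIN_CONTRACT_NUM, SUB_CONTRACT_NUM, PARTY_ID, IND_TITLE)
    SELECT DISTINCT
        CT_COMP_CODE
        , CT_LESSOR_NUM
        , CT_MAIN_CONTRACT_NUM
        , CT_SUB_CONTRACT_NUM
        , PTY_Party_Id
        , PARSENAME(PTY_Party_Comments, 4) --Title
    FROM PT_PTY_Party_View
    INNER JOIN RAW_CT_Contract_Display_View
        ON  CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
        AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
        AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
    WHERE PTY_Party_Comments IS NOT NULL   -- the redundant PTY_Party_Code IS NOT NULL is dropped
    -- no ORDER BY: the insert does not need one
    OPTION (MAXDOP 1)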

     

  • Functions applied to a column in the WHERE clause cause a full table scan. Such queries are slow by design.

    You need to dismantle PTY_Party_Code into the parts you need and keep them in separate, indexed integer columns (see the sketch below).

    I know, database design is a boring subject, but sometimes it's really helpful. Read some rules about it.
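
    A minimal sketch of that idea, assuming the view sits over a base table (here called PT_PTY_Party, a guessed name - the thread only shows the view). The parsed parts become computed columns, which can be indexed because the expressions are deterministic and precise:

    ALTER TABLE PT_PTY_Party ADD
        Comp_Code         AS CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int),
        Lessor_Num        AS CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int),
        Main_Contract_Num AS CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int)
    GO

    -- With the parts exposed as real columns, an index lets the join seek instead of scanning and parsing every row.
    CREATE INDEX IX_PT_PTY_Party_Code_Parts
        ON PT_PTY_Party (Comp_Code, Lessor_Num, Main_Contract_Num)
    GO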

    _____________
    Code for TallyGenerator

  • >How many rows are in each table? How many are being inserted?

    About 267,000 rows are being inserted.

    >How are you running it when it takes 6 hours to run?

    Inside a stored procedure which does a bunch of similar stuff.

    >And when it takes 1 minute to run?

    In a Query Analyzer window.

    >Remove 'AND PTY_Party_Code IS NOT NULL' because when parsing the field, NULL will never equal CT_COMP_CODE, CT_LESSOR_NUM or CT_MAIN_CONTRACT_NUM

    Good point.

    >Change the join syntax to:

    FROM PT_PTY_Party_View
    INNER JOIN RAW_CT_Contract_Display_View
        ON  CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
        AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
        AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
        AND PTY_Party_Comments IS NOT NULL

    When I did this the explain plans were exactly the same. The explain plan expects to end up with about 850k records; could it be throttling?

    >Why are you using SELECT DISTINCT? The need for DISTINCT normally indicates incomplete join qualification.

    This is inserting into a temp table. After that, we look up various IDs for each combination of comp, lessor and main. You may be right, though. Generally there's a fourth element of the key.

    >Why are you ordering the INSERT? There is no need for this.

    I wondered that too, but since I got handed this code this afternoon I figured there might be a reason. We do have some stuff that cares about the order of inserts. And dropping it doesn't change the one-minute plan by much, since it still has to do the DISTINCT.

    >Does the show plan include 'parallelism'? If so, try it with OPTION (MAXDOP 1) at the end.

    >Parsing PTY_Party_Code is a terrible idea. Consider storing it as 3 separate integer columns so that indexes can be used.

    Agreed.

    I figured that since my main problem was that it had such drastically different run times inside and outside of a stored procedure, niggling stuff like the ORDER BY and parsing PTY_Party_Code would come out in the wash.

  • Try adding "WITH RECOMPILE" to the stored procedure definition, and see if that corrects the issue.

  • [EDIT: THIS FIRST PARA WAS WRITTEN WITHOUT READING THE PREVIOUS POSTS PROPERLY] I'd be surprised if that doesn't do it, but if not, can you see any difference between the showplan outputs, and if so post them? A theoretical possibility, especially if your views are complex, is that your free SQL is reusing a 'lucky' query plan that somehow the sp doesn't pick up on, even when looking to generate a fresh execution plan. The optimiser does not always work very well, even with fresh stats, especially with very complex queries. You could clear the plan buffers with DBCC FREEPROCCACHE to create a level playing field, but only after you save the two different versions of the query plan (sketched at the end of this post).

    Alternatively, any possibility of schema ambiguities in your table names being resolved differently? The stored proc may have an owner other than you, possibly with different permissions. Relatedly, I seem to remember there is also the issue of recompiles being caused by non-schema-qualified table names - so perhaps the problem is not that the sp is reusing an old plan, but that it is recompiling every time? You could run a trace for recompiles, or just try schema-qualifying all your object references (I don't think referenced views have to do the same internally for the sp to reuse plans, but I may be wrong).

    The apparently independent problem with the non-1NF data could perhaps be temporarily resolved with indexed computed columns or indexed views in v8+. Even 1 minute is quite a long time, after all. Of course I don't know how many tables of what size are in your views, and what optimisation they might bear, or their parents, or their parents' parents...
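
    A rough sketch of the "save both plans, then level the playing field" step (run in Query Analyzer; note that DBCC FREEPROCCACHE throws away every cached plan on the server, so be careful on a shared box):

    -- Capture the plan text for the ad-hoc version, then repeat for an EXEC of the proc
    SET SHOWPLAN_TEXT ON
    GO
    -- run the INSERT ... SELECT here and save the output
    GO
    SET SHOWPLAN_TEXT OFF
    GO

    -- Only after both plans have been saved:
    DBCC FREEPROCCACHE
    GO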

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Jeffrey, because you have 3 "full table" joins in your query, you've got 267,000 * 3 ≈ 800k rows in the hash join. Plus some other minor duplicates you must have, which gives you those 850k. SQL Server has to write that to a table in tempdb and then work the data out from that table. Take into consideration that there are no indexes and no statistics on those 850k rows, so your DISTINCT and your ORDER BY go with full table scans.

    You cannot do anything with it unless you have those 3 indexed columns.

    And get rid of that ORDER BY. INSERT applies its own order without paying any attention to the order of the data being inserted - unless the target table has no index on it at all.

    _____________
    Code for TallyGenerator

  • You'd only be interested in the INSERT order if there was an identity column. Sergiy, are you saying that this won't work and new records might be 'incorrectly' numbered?

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • If you use an identity column, you don't really care which row is assigned to which number.

    Identity is just a "shortcut" for the natural key of the data in the table. And the columns making up this key must be included in a unique clustered index.

    If you need to use identity as the only unique constraint, then your data is a mess.
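
    A small illustration of that on a made-up table: the identity is only a surrogate, while the natural key carries the unique clustered index:

    CREATE TABLE dbo.Contract_Party_Example      -- hypothetical table
    (
        Row_Id            int IDENTITY(1, 1) NOT NULL,
        Comp_Code         int NOT NULL,
        Lessor_Num        int NOT NULL,
        Main_Contract_Num int NOT NULL,
        Sub_Contract_Num  int NOT NULL,
        Party_Id          int NOT NULL,
        CONSTRAINT PK_Contract_Party_Example PRIMARY KEY NONCLUSTERED (Row_Id),
        CONSTRAINT UQ_Contract_Party_Example
            UNIQUE CLUSTERED (Comp_Code, Lessor_Num, Main_Contract_Num, Sub_Contract_Num, Party_Id)
    )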

    _____________
    Code for TallyGenerator

  • Sergiy, some data is 'a mess'. Some data refers to its own processing. Someone somewhere has a legitimate reason for doing what I describe. So the answer to the question is...?

    >And those columns making this key must be included into unique clustered index.
    What on earth gives you that idea?

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • The answer is: "There is no point".

    _____________
    Code for TallyGenerator

  • No point in a unique clustered index on a composite key with a surrogate? Don't be too hard on yourself, there might be a point sometimes. It's just not necessarily a good idea, and certainly not mandatory.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • I ended up making a temp table and adding an index to it (roughly sketched at the end of this post). It sped the query up enormously.

    It doesn't quite solve my underlying problem, though. What's the best way to see explain plans for the middle of a stored procedure? Run explain on the whole thing?

    I think part of the problem was using pt_party_view with old stats information.

    The MAXDOP 1 thing worked extremely well, too. Thanks for the help.
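
    Something along these lines, for anyone who hits the same thing (the exact column types here are illustrative, not the real schema):

    -- Split PTY_Party_Code once into a temp table...
    CREATE TABLE #party_split
    (
        Comp_Code         int,
        Lessor_Num        int,
        Main_Contract_Num int,
        Party_Id          int,
        Party_Comments    varchar(255)
    )

    INSERT INTO #party_split
        (Comp_Code, Lessor_Num, Main_Contract_Num, Party_Id, Party_Comments)
    SELECT
        CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int)
        , CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int)
        , CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int)
        , PTY_Party_Id
        , PTY_Party_Comments
    FROM PT_PTY_Party_View
    WHERE PTY_Party_Code IS NOT NULL
      AND PTY_Party_Comments IS NOT NULL

    -- ...then index it, so the join to RAW_CT_Contract_Display_View uses plain integer columns.
    CREATE INDEX IX_party_split
        ON #party_split (Comp_Code, Lessor_Num, Main_Contract_Num)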

  • Also, check the definition of the views.  I've worked on systems with views nesting views many levels deep, with each level using the same tables, causing them to appear in the plan many times.
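
    One quick way to check is to pull each view's definition and dependencies with the old system procs, then repeat for anything nested inside:

    EXEC sp_helptext 'PT_PTY_Party_View'               -- shows the view's definition
    EXEC sp_depends  'RAW_CT_Contract_Display_View'    -- lists its dependencies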

  • Ah yes, nested views:

    >Of course I don't know how many tables of what size are in your views, and what optimisation they might bear, or their parents, or their parents' parents...

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant
