May 31, 2006 at 4:39 pm
I have a stored procedure which executes the following SQL statement:
INSERT INTO CONTRACT_TO_IND_PARTY (COMP_CODE, LESSOR_NUM, MAIN_CONTRACT_NUM
, SUB_CONTRACT_NUM, PARTY_ID, IND_TITLE)
SELECT DISTINCT
CT_COMP_CODE
, CT_LESSOR_NUM
, CT_MAIN_CONTRACT_NUM
, CT_SUB_CONTRACT_NUM
, PTY_Party_Id
, PARSENAME(PTY_Party_Comments, 4) --Title
FROM PT_PTY_Party_View, RAW_CT_Contract_Display_View
WHERE
CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
AND PTY_Party_Code IS NOT NULL
AND PTY_Party_Comments IS NOT NULL
ORDER BY CT_COMP_CODE, CT_LESSOR_NUM, CT_MAIN_CONTRACT_NUM, CT_SUB_CONTRACT_NUM
This query takes on the order of six hours to complete. If I run it manually, it takes about a minute. The explain plan for the SELECT statement isn't terrible: it overestimates the number of rows being returned by about 4x, but, like I said, the statement normally completes within a minute. None of the tables seems to have any locks on them.
May 31, 2006 at 4:58 pm
A couple of things:
How many rows are in each table? How many are being inserted?
How are you running it when it takes 6 hours to run? When it takes 1 minute to run?
Remove 'AND PTY_Party_Code IS NOT NULL' because when parsing the field, NULL will never equal CT_COMP_CODE, CT_LESSOR_NUM or CT_MAIN_CONTRACT_NUM
Change the join syntax to:
FROM PT_PTY_Party_View
INNER JOIN RAW_CT_Contract_Display_View
ON CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
AND PTY_Party_Comments IS NOT NULL
Why are you using SELECT DISTINCT? The need for DISTINCT normally indicates incomplete join qualification.
Why are you ordering the INSERT? There is no need for this.
Does the show plan include 'parallelism'? If so, try it with OPTION (MAXDOP 1) at the end.
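For example, the hint goes at the very end of the statement. A sketch of the whole thing, folding in the other suggestions above (join rewrite, no ORDER BY, redundant NULL check dropped):
INSERT INTO CONTRACT_TO_IND_PARTY (COMP_CODE, LESSOR_NUM, MAIN_CONTRACT_NUM
, SUB_CONTRACT_NUM, PARTY_ID, IND_TITLE)
SELECT DISTINCT
CT_COMP_CODE
, CT_LESSOR_NUM
, CT_MAIN_CONTRACT_NUM
, CT_SUB_CONTRACT_NUM
, PTY_Party_Id
, PARSENAME(PTY_Party_Comments, 4) --Title
FROM PT_PTY_Party_View
INNER JOIN RAW_CT_Contract_Display_View
ON CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
WHERE PTY_Party_Comments IS NOT NULL
OPTION (MAXDOP 1) --force a serial plan for this statement only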
Parsing PTY_Party_Code is a terrible idea. Consider storing it as 3 separate integer columns so that indexes can be used.
May 31, 2006 at 5:22 pm
Functions applied to a column in the WHERE clause cause a full table scan. Such queries are slow by design.
You need to dismantle PTY_Party_Code into the parts you need and keep them in separate indexed integer columns.
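Roughly this kind of thing (PT_PTY_Party as the name of the base table under the view is a guess, and so is the NULLability; adjust to your schema):
--Hypothetical base table name; parsed key parts stored as real, indexable columns
ALTER TABLE PT_PTY_Party ADD
Pty_Comp_Code int NULL,
Pty_Lessor_Num int NULL,
Pty_Main_Contract_Num int NULL

--One-off backfill from the existing concatenated code
UPDATE PT_PTY_Party SET
Pty_Comp_Code = CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int),
Pty_Lessor_Num = CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int),
Pty_Main_Contract_Num = CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int)
WHERE PTY_Party_Code IS NOT NULL

CREATE INDEX IX_PT_PTY_Party_Contract
ON PT_PTY_Party (Pty_Comp_Code, Pty_Lessor_Num, Pty_Main_Contract_Num)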
I know, database design is a boring subject, but sometimes it's really helpful. Read some rules about it.
_____________
Code for TallyGenerator
May 31, 2006 at 5:32 pm
How many rows are in each table? How many are being inserted?
About 267,000 rows are being inserted.
How are you running it when it takes 6 hours to run?
Inside a stored procedure which does a bunch of similar stuff.
When it takes 1 minute to run?
In a Query Analyzer window.
Remove 'AND PTY_Party_Code IS NOT NULL' because when parsing the field, NULL will never equal CT_COMP_CODE, CT_LESSOR_NUM or CT_MAIN_CONTRACT_NUM
Good point.
Change the join syntax to:
FROM PT_PTY_Party_View
INNER JOIN RAW_CT_Contract_Display_View
ON CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int) = CT_COMP_CODE
AND CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int) = CT_LESSOR_NUM
AND CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int) = CT_MAIN_CONTRACT_NUM
AND PTY_Party_Comments IS NOT NULL
When I did this the explain plans were exactly the same. The explain plan expects to end up with about 850k records; could it be throttling?
Why are you using SELECT DISTINCT? The need for DISTINCT normally indicates incomplete join qualification.
This is inserting into a temp table. After that, we look up various IDs for each combination of comp, lessor and main. You may be right, though. Generally there's a fourth element of the key.
Why are you ordering the INSERT? There is no need for this.
I wondered that too, but since I got handed this code this afternoon, I figured there might be a reason. We do have some stuff that cares about the order of inserts. And removing the ORDER BY doesn't change the one-minute plan by much, since it still has to do the DISTINCT.
Does the show plan include 'parallelism'? If so, try it with OPTION (MAXDOP 1) at the end.
Parsing PTY_Party_Code is a terrible idea. Consider storing it as 3 separate integer columns so that indexes can be used.
Agreed.
I figured that since my main problem was that it had such drastically different run times inside and outside of a stored procedure, niggling stuff like the ORDER BY and parsing PTY_Party_Code would come out in the wash.
May 31, 2006 at 5:48 pm
Try adding "WITH RECOMPILE" to the stored procedure definition, and see if that corrects the issue.
May 31, 2006 at 6:16 pm
[EDIT: THIS FIRST PARA WAS WRITTEN WITHOUT READING THE PREVIOUS POSTS PROPERLY] If that doesn't do it, I'd be surprised, but if not, can you see any difference between the showplan outputs, and if so post them? A theoretical possibility, especially if your views are complex, is that your free sql is reusing a 'lucky' query plan that somehow the sp doesn't pick up on, even when looking to generate a fresh execution plan. The optimiser does not always work very well, even with fresh stats, especially with very complex queries. You could clear the plan buffers with DBCC FREEPROCCACHE to create a level playing field, but only after you save the two different versions of the query plan.
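To make that concrete, the sequence in Query Analyzer might look like this (the statement you paste in where the comment says is your own INSERT...SELECT):
--Capture the estimated plan for the standalone statement (with SHOWPLAN on, it is not executed)
SET SHOWPLAN_TEXT ON
GO
--paste the INSERT ... SELECT here and run it to get its plan
GO
SET SHOWPLAN_TEXT OFF
GO
--Once both plans are saved, level the playing field by flushing every cached plan (server-wide)
DBCC FREEPROCCACHE
GO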
Alternatively, any possibility of schema ambiguities in your table names being resolved differently? The stored proc may have an owner other than you, possibly with different permissions. Relatedly, I seem to remember there is also the issue of recompiles being caused by non-schema-qualified table names - so perhaps the problem is not that the sp is reusing an old plan, but that it is recompiling every time? You could run a trace for recompiles, or just try schema-qualifying all your object references (I don't think referenced views have to do the same internally for the sp to reuse plans, but I may be wrong).
The apparently independent problem with the non-1NF data could perhaps be temporarily resolved with indexed computed columns or indexed views in v8+ (SQL Server 2000 and later). Even 1 minute is quite a long time, after all. Of course I don't know how many tables of what size are in your views, and what optimisation they might bear, or their parents, or their parents' parents...
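A sketch of the computed-column route, again guessing that the base table under PT_PTY_Party_View is called PT_PTY_Party - unlike adding real columns, no backfill is needed:
--Hypothetical base table name; the computed columns expose the parsed key parts
ALTER TABLE PT_PTY_Party ADD
Comp_Code_Calc AS CAST(SUBSTRING(PTY_Party_Code, 1, 10) AS int),
Lessor_Num_Calc AS CAST(SUBSTRING(PTY_Party_Code, 11, 10) AS int),
Main_Contract_Num_Calc AS CAST(SUBSTRING(PTY_Party_Code, 21, 10) AS int)

--Indexing computed columns needs the expressions to be deterministic and precise,
--and the session SET options (ARITHABORT, QUOTED_IDENTIFIER, etc.) to be right
CREATE INDEX IX_PT_PTY_Party_ParsedKey
ON PT_PTY_Party (Comp_Code_Calc, Lessor_Num_Calc, Main_Contract_Num_Calc)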
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
May 31, 2006 at 7:04 pm
Jeffrey, because you have 3 "Full Table" joins in your query, you've got 267,000 * 3 ≈ 800k lines in the Hash Join. Plus some other minor duplicates you must have, which gives you those 850k. SQL Server has to record them to a table in tempdb and work out the data in that table. Take into consideration that there are no indexes and no statistics on those 850k rows, and your DISTINCT and your ORDER BY go with a full table scan.
You cannot do anything with it unless you have those 3 indexed columns.
And get rid of that ORDER BY. INSERT applies its own order without paying any attention to the order of the data being inserted, unless the target table does not have any index on it.
_____________
Code for TallyGenerator
May 31, 2006 at 7:52 pm
You'd only be interested in the INSERT order if there was an identity column. Sergiy, are you saying that this won't work and new records might be 'incorrectly' numbered?
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 3:33 am
If you use an identity column, you don't really care which row is assigned which number.
Identity is just a "shortcut" for the natural key of the data in the table. And the columns making up this key must be included in a unique clustered index.
If you need to use identity as the only unique constraint, then your data is a mess.
_____________
Code for TallyGenerator
June 1, 2006 at 4:49 am
Sergiy, some data is 'a mess'. Some data refers to its own processing. Someone somewhere has a legitimate reason for doing what I describe. So the answer to the question is...?
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 5:22 am
Answer is "There is no point".
_____________
Code for TallyGenerator
June 1, 2006 at 5:36 am
No point in unique clustered index on composite key with a surrogate? Don't be too hard on yourself, there might be a point sometimes. It's just not necessarily a good idea, and certainly not mandatory.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 11:49 am
I ended up making a temp table and adding an index to it. It sped the query up enormously.
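Something like this - the types here are approximations rather than the actual definitions:
--Temp table shaped like the insert list; column types are guesses
CREATE TABLE #Contract_To_Ind_Party (
COMP_CODE int,
LESSOR_NUM int,
MAIN_CONTRACT_NUM int,
SUB_CONTRACT_NUM int,
PARTY_ID int,
IND_TITLE varchar(100)
)

--Index on the key parts used by the later lookups
CREATE INDEX IX_Contract_To_Ind_Party
ON #Contract_To_Ind_Party (COMP_CODE, LESSOR_NUM, MAIN_CONTRACT_NUM)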
It doesn't quite solve my underlying problem, though. What's the best way to see explain plans for the middle of a stored procedure? Run explain on the whole thing?
I think part of the problem was using pt_party_view with old stats information.
The MAXDOP 1 thing worked extremely well, too. Thanks for the help.
June 1, 2006 at 12:22 pm
Also, check the definition of the views. I've worked on systems with views nesting views many levels deep, with each level using the same tables, causing them to appear in the plan many times.
June 1, 2006 at 1:30 pm
Ah yes, nested views:
>Of course I don't know how many tables of what size are in your views, and what optimisation they might bear, or their parents, or their parents' parents...
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant