November 1, 2002 at 2:23 am
Hi all
SELECT TableA.LINE_ID
FROM
TableA
LEFT OUTER JOIN
TableB
ON TableA.RecName = TableB.RecName
AND TableA.RecType = TableB.RecType
AND TableA.Price = TableB.Price
AND TableA.Status = 'ACTIVE'
AND TableB.Status = 'ACTIVE'
WHERE TableB.LINE_ID IS NULL
When running this SQL without the second last line ("TableB.Status = 'ACTIVE'")
the query executes in about 2 seconds. When including it though it takes about 7 minutes to execute!
Any ideas on what the reason might be? I suspect it might be an indexing problem?
J
November 1, 2002 at 2:49 am
Hi, sounds like a full table scan is being performed. Indexes on all columns used to join / filter will improve things.
Regards,
Andy Jones
.
November 1, 2002 at 3:07 am
Nope, does not work. I put indexes on all the columns used in the join and the query still executes for 7 min.
J
November 1, 2002 at 4:33 am
You should look at your execution plan to see what it is doing during that step.
Try this and see what happens.
SELECT TableA.LINE_ID
FROM
TableA
LEFT OUTER JOIN
(SELECT
RecName,
RecType,
Price,
Line_ID
FROM TableB
WHERE TableB.Status = 'ACTIVE') AS TableB
ON TableA.RecName = TableB.RecName
AND TableA.RecType = TableB.RecType
AND TableA.Price = TableB.Price
AND TableA.Status = 'ACTIVE'
WHERE TableB.LINE_ID IS NULL
November 1, 2002 at 5:39 am
The execution plan indicates that 86% of the processing is on the "Clustered Index Scan" on both TableA and TableB. That being the LINE_ID columns on both.
Should I change the table design in some way to improve this?
I tried your script and it is fast
(2 seconds)
Thanks
J
November 1, 2002 at 7:33 am
No, I don't think a table design change or index will fix it. The problem I believe comes from the fact TableB must be ACTIVE and TableA must alos be. In an inner join you could set them equal However with the outer join and the on having to check both A and B status fields, I believe it is doing this at the same time and slowing you down. The subquery handles getting B handled in a specific order of operation and takes the poor decision out of the loop.
November 1, 2002 at 10:23 am
I suppose what Antares said might be possible, but it seems to me more likely that statistical information on your tables is suggesting to the query optimizer that a significant number of rows in both tables will meet your FROM conditions; consequently, it figures a table scan (or - in this case - a clustered index scan, which is equivalent to a table scan) is the most effective way to obtain your data. Alternatively, given the amount of data, the optimizer may feel that performing a FILTER operation (which it would have to do if it used any other index besides the clustered index) would be more expensive than doing the clustered index scan and filtering the rows during the scan. Try Antares' suggestion first, and if it doesn't improve things, you might want to look at:
I generally discourage the use of hints where possible, but sometimes they are necessary. Personally, I would first try placing the clustered index on a different column - again, one of the more selective columns in your join, such as RecName or Price. Usually, I place clustered indexes on less selective columns, but in this case you have to consider the logic of the optimizer. It's choosing the clustered index because either (a) it believes that the number of rows to be returned is significantly large that it might as well scan the whole table, or (b) that it believes the cost of performing a separate FILTER operation is sufficiently large that it would be less expensive to perform the clustered index scan, doing the filter there, even though the clustered index doesn't cover any of the columns in your join. By placing the clustered index on one of your more selective columns, it may (a) help the optimizer recognize that a clustered index seek will cost less than a scan (something that you can't always say when comparing nonclustered index seeks vs. scans), and (b) if it chooses to filter the columns as part of the clustered index seek, it has a good head start, because the index seek itself will sufficiently filter a large portion of the rows. If your clustered index is supporting a primary key, this will require you to drop and rebuild the primary key, and you must consider this change in relation to all of the other queries in your system that might be affected by this.
Matthew Burr
Edited by - mdburr on 11/01/2002 10:23:52 AM
Edited by - mdburr on 11/01/2002 7:10:57 PM
November 2, 2002 at 6:31 am
I think the problem could be todo with the position of the tabA.status = 'Active' clause.
If you only want result from tableA that are active move it to the where clause.
Further more is there a index on recname, rectype, price and status. IF not adding this will allow for the use of a coverin index.
SELECT TableA.LINE_ID
FROM
TableA
LEFT OUTER JOIN
TableB
ON TableA.RecName = TableB.RecName
AND TableA.RecType = TableB.RecType
AND TableA.Price = TableB.Price
AND TableB.Status = 'ACTIVE'
WHERE TableB.LINE_ID IS NULL
AND TableA.Status = 'ACTIVE'
Do these tables have lots of columns?
Simon Sabin
Co-author of SQL Server 2000 XML Distilled
http://www.amazon.co.uk/exec/obidos/ASIN/1904347088
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons
November 4, 2002 at 2:57 am
Matthew, I cannot change the clustered index to another column as the line_id column is the only true incremental column in this table. I am not familiar with the use of hints but I found your comments quite interesting. Will have to do a bit of reading on my own though.
Simon, thanks, you got it spot on. The previous solution actually gave incorrect results. I got records back where the Status on TableA was <> 'ACTIVE'. Which is wrong.
I might have given an unclear description of my requirements.
Anyway, moving "TableA.Status = 'ACTIVE'" out to the where clause solved the problem and the speed is quite reasonable as well.
Both the tables have got about 20 columns.
Thanks
J
November 5, 2002 at 9:43 am
The comment about the line_id column being the only true incremental column in the table is interesting. This is not a requirement for a clustered index; i.e. a clustered index does not need to be placed on an incremental column. I suspect that you have placed it there in order to avoid page splits, by insuring that all new inserts - since they are incremental - will be placed in the last page of the table. That is one reason of considering placing the clustered index on such a column. On the other hand, if you ever do deletes from this table, you could be losing a lot of space, since the space previously used by the now deleted row will not be reused. Also, you have to balance negative performance on inserts with negative performance on selects and determine which you can afford more. You might find it more reasonable to place the clustered index elsewhere if you can recognize significant performance gains in a larger portion of your queries. Consider also that unless you are selecting large ranges of rows based on line_id (e.g. using queries that search for "line_id BETWEEN x AND y," you may not be recognizing all of the potential performance gains that a clustered index can truly provide.
Ultimately, you may be losing performance by using a clustered index primarily to support ordered inserts of your data (i.e. all data being added to the end of the table), which in turn may be costing you in data space, since any space left when data is deleted may not be getting reused; this in contradistinction to using the clustered index in a possibly more effective way and saving yourself from page splits by using other options, such as FILLFACTOR. If you haven't yet - and you probably have - you might want to revisit this indexing decision. Of course, indexing decisions, ultimately, are a very personal thing, based on the data and activity in your database, which - obviously - you know far better than anyway outside of your company; so, take any advice with a "grain of salt."
Matthew Burr
Edited by - mdburr on 11/05/2002 09:45:59 AM
November 6, 2002 at 12:30 pm
joepin,
I'm coming late to the party so forgive me if I'm covering old material...
It looks like you are blurring the line between the join and the where clauses. The JOIN clause is only used to determine how the outer join is supposed to select elements from TableB. By imposing the condition "AND TableB.Status = 'ACTIVE'" you are implicitly converting an outer join into an inner join.
HTH
Steve Hendricks
MCSD, MCDBA
AFS Consulting Group
(949) 588-9800 x15
Steve Hendricks
MCSD, MCDBA
Data Matrix
shendricks@afsconsulting.com
(949) 588-9800 x15
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply