Select with inner joins becomes slower with new data

Question

Post reply

Select with inner joins becomes slower with new data

Ferran

Grasshopper

Points: 11
More actions
February 28, 2005 at 8:23 am

#85260

Hello everybody !
I'm new in this forum... But I have read some and it looks nice. (Sorry about my english ).
I have a strange problem...
I'm using Windows 2000 SP3, and SQLServer 2000.
The same "select" statement becomes terribly slower if the data from the "where" clausule has been inserted recently.
Example :
SELECT TABLEA1.* , TABLEB2.OID AS FFIELD0, TABLEC3.OID AS FFIELD1 FROM ((TABLEA TABLEA1 WITH (READUNCOMMITTED) INNER JOIN TABLEB TABLEB2 ON TABLEA1.OIDTableB = TABLEB2.OID) INNER JOIN TABLEC TABLEC3 ON TABLEB2.OIDTableC = TABLEC3.OID) WHERE (OIDTable4 IN (1819563, 1819565, 1819566, 1819567, 1819568, 1819569, 1819570, 1819571, 1819572, 1819573, 1819574, 1819575, 1819576, 1819577, 1819578, 1819579, 1819580))
These numbers form the "where... " are identifiers of registers inserted recently. The time needed to execute this statement is 2 minutes and 45 seconds.
The same select with other numbers of registers inserted long time ago is faster. The size of the Database is about 480MB.
How can I change the "select" to do it faster ? Which is my problem ? Can anybody help me ? Any idea ?

Ferran.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

I can think of 2 things :

1 - index are fragmented : you can run sp_updatestats to update the statistics of the table.

You should also look up DBCC INDEXDEFRAG to see if it could help you.

2 - I would try moving this part to a derived table and join to that table :

WHERE (OIDTable4 IN (1819563, 1819565, 1819566, 1819567, 1819568, 1819569, 1819570, 1819571, 1819572, 1819573, 1819574, 1819575, 1819576, 1819577, 1819578, 1819579, 1819580)).

Or you could also make a new permanent table that olds the ids that need to be selected and join to that instead of the big in clause.

RD Francis SSCarpal Tunnel Points: 4867 More actions · Answer 2

Typed this up once, and the browser seems to have eaten it. I'll try to do it a bit shorter this time.

You probably want to run this query in Query Analyzer, with Show Execution Plan on. There are a couple of things that could be happening here:

1) OIDTable4 is not in an index, and your statistics are missing or out of date. Look for parts of the execution that take a long time, and check the actual row count versus the estimated row count. If they're way off, consider updating/adding statistics (you can do this through Query Analyzer by simply right-clicking on the problem query action).

2) OIDTable4 may be in an index, but that index may not be used. When joining large tables together based on primary/foreign keys, (especially if the primary keys are also the clustered index for the table), the optimizer winds up scanning the table or the clustered index (in essence the same thing).

If OIDTable4 happens to be unique as well, that would explain a lot. The server would search through the table for all the records with an OIDTable4 from your IN clause. If OIDTable4 is unique, and it's found one record for each of the values you specified halfway through the table scan, it can stop; it knows there aren't any more records it needs. If the last record it needs is the last record in the table (or if any of the IDs doesn't exist) it has to check everything.

In fact, you might try constructing a search with ten old OIDTable4 values, and one non-existent one; that might give you an indication that this is what';s happening.

As I mentioned above, if the clustered index of a table is needed for a join condition, and other data is needed for the rest of the query, the clustered index (which is the table in full, pre-sorted in the correct order) will be scanned for the required rows. You can get around this, if performance is imperative, by using a "covering index".

A "covering index" is simply an index where all of the columns used in a query (whether in join conditions, the WHERE clause, or the select list) are a part of the index in question. When this is true, SQL Server will just read the index for everything it needs, and ignore the actual table altogether. If the index is significantly "narrower" (the total storage space for the columns from the index is much less that that for the full rows from the table), you can save a lot of reads this way.

If this sounds like your situation, and if it's worth the extra cost during INSERT operations of having another index (not to mention the additional storage cost), here's what I would do.

First, look at the existing non-clustered indexes on the table. Do any contain some of the fields you want? If so, you might want to simple add the missing field(s) to that index. Remember, of course, that this might slow down other queries, if they're already using that index as a covering index and the index data for each row suddenly doubles.

You probably don't want to add the missing field(s) to the clustered index. Clustered index fields should be chosen carefully, ideally to provide unique values for each record while remaining fairly narrow. A tuning tip I picked up only a few weeks ago is that SQL Server uses a row pointer to indicate the row an index refers to on non-clustered tables, but uses the clustered index key to indicate the row in non-clustered indexes on a table that also has a clustered index. So, if your clustered index is a half dozen varchar columns with an average length of 700 characters, those fields are used not only in the clustered index, but in the other five indexes on the table as well.

Anyway: If my guess is correct, and OIDTable4 lives in TABLEC3, then a good covering index might be (OIDTable4, OID). SQL Server would scan this index, grab only the records it needed (I'd suspect based on the OIDTable4 list in the IN clause, though ti might still do the joins first), and move on.

Of course, this is all guesswork and speculation - however, hopefully, you can find something to use here.

R David Francis