June 10, 2005 at 11:45 am
I have a query that performs as well as I need it to, but I would like to understand why the criteria I have added to the query makes it so quick. Here is the query:
DECLARE @batchuid VARCHAR(40)
SELECT tDocArchive.arcuid, tOCRTokens.oct_token FROM tOCRTokens
INNER JOIN tDocOCRTokens ON tDocOCRTokens.ocd_octuid = tOCRTokens.octuid
INNER JOIN tDocArchive ON tDocArchive.arcuid = tDocOCRTokens.ocd_arcuid
WHERE tDocArchive.doa_batchuid = @batchuid
AND tDocArchive.doa_batchorder%1 < 1
ORDER BY tDocArchive.doa_batchorder
When I run this query it takes 5 seconds to return 24599 rows but, if I remove the "tDocArchive.doa_batchorder%1 < 1" piece of criteria it takes 24 seconds to run. How is this possible given the following index details:
tDocArchive.arcuid - clustered primary key
tOCRTokens.octuid - clusterd primary key
tDocOCRTokens.ocd_octuid - indexed foreign key
tDocOCRTokens.ocd_arcuid - indexed foreign key
The doa_batchorder field does not have an index!
The only thing I can think of is that the ordering is helping the optimization. The query execution plan shows that the sort on doa_batchorder happens first.
Thanks, any insight is appreciated.
Tony
June 10, 2005 at 11:50 am
How many rows do you get if you remove that condition?
June 10, 2005 at 11:52 am
I get the same number of rows, the results are the same.
June 10, 2005 at 11:54 am
I would imagine the same number of results, as any integer will return true for n%1<1...
June 10, 2005 at 11:55 am
Another question... what is this supposed to filter?
AND tDocArchive.doa_batchorder%1 < 1
Can you rerun the queries and send us the execution plans
SET SHOWPLAN_TEXT ON
GO
Query1
GO
Query2
GO
SET SHOWPLAN_TEXT OFF
June 10, 2005 at 11:57 am
Ya... it's so obvious (when you think about it) .
June 10, 2005 at 12:06 pm
I am not attempting to filter with that filter, I only noticed that the query runs faster. Here is the plan you requested.
|--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))
|--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))
|--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)
|--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))
|--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens]))
|--Nested Loops(Inner Join, OUTER REFERENCES: ([tDocArchive].[arcuid]) WITH PREFETCH)
|--Filter(WHERE: ([tDocArchive].[doa_batchorder]%1<1))
| |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))
| |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)
|--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[idx03_tDocOCRTokens]), SEEK: ([tDocOCRTokens].[ocd_arcuid]=[tDocArchive].[arcuid]) ORDERED FORWARD)
Thanks for taking a look at this.
June 10, 2005 at 12:08 pm
where's the second plan?
June 10, 2005 at 12:13 pm
Oops, I guess that would help
With: tDocArchive.doa_batchorder%1 < 1
|--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))
|--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))
|--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)
|--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))
|--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens]))
|--Nested Loops(Inner Join, OUTER REFERENCES: ([tDocArchive].[arcuid]) WITH PREFETCH)
|--Filter(WHERE: ([tDocArchive].[doa_batchorder]%1<1))
| |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))
| |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)
|--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[idx03_tDocOCRTokens]), SEEK: ([tDocOCRTokens].[ocd_arcuid]=[tDocArchive].[arcuid]) ORDERED FORWARD)
Without: tDocArchive.doa_batchorder%1 < 1
|--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))
|--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))
|--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)
|--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))
|--Hash Match(Inner Join, HASH: ([tDocArchive].[arcuid])=([tDocOCRTokens].[ocd_arcuid]), RESIDUAL: ([tDocArchive].[arcuid]=[tDocOCRTokens].[ocd_arcuid]))
|--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))
| |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)
|--Clustered Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[pk_tDocOCRTokens]))
Thanks
June 10, 2005 at 12:31 pm
the where seems to force and index seek (removing a scan.. which is probabely the longest operation of the query).
However I can't really be sure without the relative cost to the batch... and most importantly why???
June 10, 2005 at 1:00 pm
for real, why why why
June 10, 2005 at 1:04 pm
I wish I knew...
June 10, 2005 at 4:08 pm
I would agree that the index seek is the most significant difference between the two queries. The tDocOcrTokens table has the greatest number of rows of any table in the database, over 10,000,000, all the other tables have less than 500,000 so this is the most costly operation in the query, is that what you mean when you meant by "the relative cost to the batch?"
Anyhow, Thanks a ton. If I find out why I'll post. I am off for the weekend.
June 15, 2005 at 6:42 am
Hi
1. With: tDocArchive.doa_batchorder%1 < 1 :: It uses "Nested Loops" JOIN whereas without ...%1 <1 it uses "Hash Match" JOIN. HASH JOIN uses more memory w.r.t. Nested LOOP JOIN. The execution plan is different & so is the result.
2. INDEX on tDocArchive.doa_batchuid & tDocArchive.doa_batchorder are missing.
3. JOIN tables with small records in the begning of the SQL SCRIPT.
4 UPDATE the statistics for all the indexes.
5. Preferably filter the records in the INNER JOIN. e.g. INNER JOIN TABLE1 on ( TABLE1.field10 = 'value' AND TABLE1.field1 = TABLE2.field2 )
regds/ramanuj
🙂
June 15, 2005 at 12:04 pm
I have changed the query to read as follows, performance seems to be the same.
SELECT tDocArchive.arcuid, tOCRTokens.oct_token
FROM tOCRTokens
INNER JOIN tDocOCRTokens ON tDocOCRTokens.ocd_octuid = tOCRTokens.octuid
INNER JOIN tDocArchive ON (tDocArchive.doa_batchorder%1 < 1 AND
tDocArchive.doa_batchuid = @batchuid AND
tDocArchive.arcuid = tDocOCRTokens.ocd_arcuid)
ORDER BY tDocArchive.doa_batchorder
The doa_batchuid does have an index, just not doa_batchorder, until this query I have only used doa_batchorder to order the batch, it has never been used as a filter before. I added an index to doa_batchorder but it did not seem to make a difference so I removed it. Is there a better way to convince this query to use a NESTED LOOP JOIN instead of a HASH JOIN? I have looked into query hints but I don't seem to see a hint that will do this.
Also, I have a job that updates the statistics and defrags indexes early every morning.
Thanks, Tony
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply