Strange Query Optimization

Question

Post reply

Strange Query Optimization

Tony Piazza

SSC Rookie

Points: 49
More actions
June 10, 2005 at 11:45 am

#110553

I have a query that performs as well as I need it to, but I would like to understand why the criteria I have added to the query makes it so quick. Here is the query:
DECLARE @batchuid VARCHAR(40)
SELECT tDocArchive.arcuid, tOCRTokens.oct_token FROM tOCRTokens
INNER JOIN tDocOCRTokens ON tDocOCRTokens.ocd_octuid = tOCRTokens.octuid
INNER JOIN tDocArchive ON tDocArchive.arcuid = tDocOCRTokens.ocd_arcuid
WHERE tDocArchive.doa_batchuid = @batchuid
AND tDocArchive.doa_batchorder%1 < 1
ORDER BY tDocArchive.doa_batchorder
When I run this query it takes 5 seconds to return 24599 rows but, if I remove the "tDocArchive.doa_batchorder%1 < 1" piece of criteria it takes 24 seconds to run. How is this possible given the following index details:
tDocArchive.arcuid - clustered primary key
tOCRTokens.octuid - clusterd primary key
tDocOCRTokens.ocd_octuid - indexed foreign key
tDocOCRTokens.ocd_arcuid - indexed foreign key
The doa_batchorder field does not have an index!
The only thing I can think of is that the ordering is helping the optimization. The query execution plan shows that the sort on doa_batchorder happens first.
Thanks, any insight is appreciated.
Tony

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

How many rows do you get if you remove that condition?

Tony Piazza SSC Rookie Points: 49 More actions · Answer 2

I get the same number of rows, the results are the same.

Mike Gainer Grasshopper Points: 23 More actions · Answer 3

I would imagine the same number of results, as any integer will return true for n%1<1...

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

Another question... what is this supposed to filter?

AND tDocArchive.doa_batchorder%1 < 1

Can you rerun the queries and send us the execution plans

SET SHOWPLAN_TEXT ON

GO

Query1

GO

Query2

GO

SET SHOWPLAN_TEXT OFF

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Ninja's_RGR'us

SSC Guru

Points: 294069

June 10, 2005 at 11:57 am

#564719

Ya... it's so obvious (when you think about it) .

Tony Piazza SSC Rookie Points: 49 More actions · Answer 6

I am not attempting to filter with that filter, I only noticed that the query runs faster. Here is the plan you requested.

|--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))

|--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))

|--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)

|--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))

|--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens]))

|--Nested Loops(Inner Join, OUTER REFERENCES: ([tDocArchive].[arcuid]) WITH PREFETCH)

|--Filter(WHERE: ([tDocArchive].[doa_batchorder]%1<1))

| |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))

| |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)

|--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[idx03_tDocOCRTokens]), SEEK: ([tDocOCRTokens].[ocd_arcuid]=[tDocArchive].[arcuid]) ORDERED FORWARD)

Thanks for taking a look at this.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

Ninja's_RGR'us

SSC Guru

Points: 294069

June 10, 2005 at 12:08 pm

#564729

where's the second plan?

Tony Piazza SSC Rookie Points: 49 More actions · Answer 8

Oops, I guess that would help

With: tDocArchive.doa_batchorder%1 < 1

|--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))

|--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))

|--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)

|--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))

|--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens]))

|--Nested Loops(Inner Join, OUTER REFERENCES: ([tDocArchive].[arcuid]) WITH PREFETCH)

|--Filter(WHERE: ([tDocArchive].[doa_batchorder]%1<1))

| |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))

| |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)

|--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[idx03_tDocOCRTokens]), SEEK: ([tDocOCRTokens].[ocd_arcuid]=[tDocArchive].[arcuid]) ORDERED FORWARD)

Without: tDocArchive.doa_batchorder%1 < 1

|--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))

|--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))

|--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)

|--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))

|--Hash Match(Inner Join, HASH: ([tDocArchive].[arcuid])=([tDocOCRTokens].[ocd_arcuid]), RESIDUAL: ([tDocArchive].[arcuid]=[tDocOCRTokens].[ocd_arcuid]))

|--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))

| |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)

|--Clustered Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[pk_tDocOCRTokens]))

Thanks

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 9

the where seems to force and index seek (removing a scan.. which is probabely the longest operation of the query).

However I can't really be sure without the relative cost to the batch... and most importantly why???

Mike Gainer Grasshopper Points: 23 More actions · Answer 10

Mike Gainer

Grasshopper

Points: 23

June 10, 2005 at 1:00 pm

#564758

for real, why why why

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

Ninja's_RGR'us

SSC Guru

Points: 294069

June 10, 2005 at 1:04 pm

#564759

I wish I knew...

Tony Piazza SSC Rookie Points: 49 More actions · Answer 12

I would agree that the index seek is the most significant difference between the two queries. The tDocOcrTokens table has the greatest number of rows of any table in the database, over 10,000,000, all the other tables have less than 500,000 so this is the most costly operation in the query, is that what you mean when you meant by "the relative cost to the batch?"

Anyhow, Thanks a ton. If I find out why I'll post. I am off for the weekend.

Ramanuj SSC Veteran Points: 278 More actions · Answer 13

Hi

1. With: tDocArchive.doa_batchorder%1 < 1 :: It uses "Nested Loops" JOIN whereas without ...%1 <1 it uses "Hash Match" JOIN. HASH JOIN uses more memory w.r.t. Nested LOOP JOIN. The execution plan is different & so is the result.

2. INDEX on tDocArchive.doa_batchuid & tDocArchive.doa_batchorder are missing.

3. JOIN tables with small records in the begning of the SQL SCRIPT.

4 UPDATE the statistics for all the indexes.

5. Preferably filter the records in the INNER JOIN. e.g. INNER JOIN TABLE1 on ( TABLE1.field10 = 'value' AND TABLE1.field1 = TABLE2.field2 )

regds/ramanuj

🙂

Tony Piazza SSC Rookie Points: 49 More actions · Answer 14

I have changed the query to read as follows, performance seems to be the same.

SELECT tDocArchive.arcuid, tOCRTokens.oct_token

FROM tOCRTokens

INNER JOIN tDocOCRTokens ON tDocOCRTokens.ocd_octuid = tOCRTokens.octuid

INNER JOIN tDocArchive ON (tDocArchive.doa_batchorder%1 < 1 AND

tDocArchive.doa_batchuid = @batchuid AND

tDocArchive.arcuid = tDocOCRTokens.ocd_arcuid)

ORDER BY tDocArchive.doa_batchorder

The doa_batchuid does have an index, just not doa_batchorder, until this query I have only used doa_batchorder to order the batch, it has never been used as a filter before. I added an index to doa_batchorder but it did not seem to make a difference so I removed it. Is there a better way to convince this query to use a NESTED LOOP JOIN instead of a HASH JOIN? I have looked into query hints but I don't seem to see a hint that will do this.

Also, I have a job that updates the statistics and defrags indexes early every morning.

Thanks, Tony