Tuning slow query

Question

Post reply

Tuning slow query

LOOKUP_BI-756009

SSCertifiable

Points: 7286
More actions
December 9, 2010 at 11:37 am

#232353

SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHEREB.Status IS NULL
AND A.Status IS NOT NULL
Number of rows in dbo.Test1 A = 150 000
Number of rows in dbo.Test2 B = 4500
There are clusterd indexes on both table on ID column
This query is taking 35 minutes.
How can I make it faster
I have tested the following
1) Included column A.Status & B.Status on the clustered index -Time taken 34 minutes
The query execution plan shows that the LEFT JOIN operation cost is 60% and the filter on B.Status IS NULL is 18%.These two task has the highest %

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

John Rowan SSC Guru Points: 56440 More actions · Answer 1

For starters, I'd consider indexing the Status columns. Also, remove the Distinct and add a GROUP BY A.Status clause.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

LOOKUP_BI-756009 SSCertifiable Points: 7286 More actions · Answer 2

Hi John,

I did include column Status into clusterd Index for both table A & B.

The Status column in table A has many duplicates, thats the reason for the distinct

John Rowan SSC Guru Points: 56440 More actions · Answer 3

So the clustered index is a composit index? What other columns are in it?

It would help if you could post the table DDL (full DDL w/ index defs too), a few rows of sample data, and most importantly the execution plan.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

LOOKUP_BI-756009 SSCertifiable Points: 7286 More actions · Answer 4

John Rowan (12/9/2010)
So the clustered index is a composit index? What other columns are in it?
It would help if you could post the table DDL (full DDL w/ index defs too), a few rows of sample data, and most importantly the execution plan.

Initially just the column ID was the clusterd index on both tables, but I had included column Status to the clustered index for my testing

I will post the DDL shortly, meanwhile I tried this and it seemed to run much faster.Could you have a look at it and confirm if my logic is correct

SELECT DISTINCT A.Status INTO #TEMP FROM dbo.Test1 A where A.Status IS NOT NULL

SELECT C.Status from #TEMP C

LEFT JOIN dbo.Test2 B

ON C.Status= B.Status

WHERE B.Status IS NULL

IT took roughly 1 minute

John Rowan SSC Guru Points: 56440 More actions · Answer 5

An execution plan here would help a bunch. You shouldn't have to bother with the temp table to make this work.

The challenge with adding Status to the clustered index as a secondary column is that you are not using the first column in the index in the query. So if your clustered index is ID, Status and you are not using ID in the query, its not really taking advantage of the index like it could.

The clustered index key is included in all non-clutered indexes, so creating a non-clustered index on the Status columns is probably the way to go here.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

LOOKUP_BI-756009 SSCertifiable Points: 7286 More actions · Answer 6

LOOKUP_BI-756009

SSCertifiable

Points: 7286

December 9, 2010 at 1:00 pm

#1260761

I got your point John, let me give that a try first

WO Old Hand Points: 339 More actions · Answer 7

I agree with John that the execution plan would be very informative, and that the temp table shouldn't be necessary.

Have you tried rewriting the query using an exists clause instead of the left outer join? since the population in table test2 is relatively small i would expect this to give a better result

something like:

select distinct a.status

from dbo.test1 a

where a.status not in (select distinct b.status

from dbo.test2 b)

and a.status is not null

Willem

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 8

LOOKUP_BI-756009 (12/9/2010)
SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHEREB.Status IS NULL
AND A.Status IS NOT NULL
Number of rows in dbo.Test1 A = 150 000
Number of rows in dbo.Test2 B = 4500
There are clusterd indexes on both table on ID column
This query is taking 35 minutes.
[font="Arial Black"]How can I make it faster[/font]
I have tested the following
1) Included column A.Status & B.Status on the clustered index -Time taken 34 minutes
The query execution plan shows that the LEFT JOIN operation cost is 60% and the filter on B.Status IS NULL is 18%.These two task has the highest %

Just going by the names of the columns in the ON clause... you probably can't make this as fast as you'd like even with indexes. You've most likely made a many-to-many join which is really a Cartesian Join (ie: CROSS JOIN). You need additional JOIN criteria to make this problem go away.

Of course, that's a total guess on my part because none of us actually know anything about your tables or their columns. My suggestion would be to read and heed the article at the second link in my signature line below.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 9

I'm with Jeff on this, as usual. Show the estimated query plan and not the incredibly FAT join line. Put your mouse over it and see the telephone number that pops up for estimated rows.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

LOOKUP_BI-756009 SSCertifiable Points: 7286 More actions · Answer 10

Yes, I tried John's response and did much more quicker. Thank You all.

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 11

LOOKUP_BI-756009 (12/13/2010)
Yes, I tried John's response and did much more quicker. Thank You all.

Still, I believe you have a many-to-many join and a bit of extra criteria would help. If additional criteria is not possible, you may want to have a look at the overall design of the tables and data... a change may be in order. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

250128836 Newbie Points: 3 More actions · Answer 12

Please test Query SQL:

SELECT DISTINCT A.Status

FROM dbo.Test1 A

LEFT JOIN dbo.Test2 B

ON A.Status= B.Status

WHERE A.Status IS NOT NULL

AND B.Status IS NULL

Because of Number of rows in dbo.Test1 A more than Number of rows in dbo.Test1 B

😛

Malcolm Daughtree SSCrazy Points: 2382 More actions · Answer 13

250128836 (12/15/2010)
Please test Query SQL:
SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHERE A.Status IS NOT NULL
AND B.Status IS NULL
Because of Number of rows in dbo.Test1 A more than Number of rows in dbo.Test1 B
😛

Not being picky - well Ok I am but the above query makes no sence....

as the query is internaly processed

Step 1

FROM dbo.Test1 A

LEFT JOIN dbo.Test2 B ON A.Status = B.Status

every thing a AND those IN b that has the same status AS A

A LEFT JOIN will show every Test1 status and all the matching Test2.status records Nulls are not evaluated

hence negating the need for the where clause becauce only B records with an equalling A Status will be returned.

Left joins, right joins, and Inner joins are 'Exclusive' joins if you want to use the following where clause

WHERE A.Status IS NOT NULL

AND B.Status IS NULL

Then you will need to use the inclusive OUTER join and then evaluate for nulls.

Lastly remove the Distinct keyword and include instead a GROUP BY A.status because the Group by clause is actioned after the where clause and you are forcing SQL to double handle its internal grouping.