December 9, 2010 at 11:37 am
SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHEREB.Status IS NULL
AND A.Status IS NOT NULL
Number of rows in dbo.Test1 A = 150 000
Number of rows in dbo.Test2 B = 4500
There are clusterd indexes on both table on ID column
This query is taking 35 minutes.
How can I make it faster
I have tested the following
1) Included column A.Status & B.Status on the clustered index -Time taken 34 minutes
The query execution plan shows that the LEFT JOIN operation cost is 60% and the filter on B.Status IS NULL is 18%.These two task has the highest %
December 9, 2010 at 11:41 am
For starters, I'd consider indexing the Status columns. Also, remove the Distinct and add a GROUP BY A.Status clause.
December 9, 2010 at 11:54 am
Hi John,
I did include column Status into clusterd Index for both table A & B.
The Status column in table A has many duplicates, thats the reason for the distinct
December 9, 2010 at 11:57 am
So the clustered index is a composit index? What other columns are in it?
It would help if you could post the table DDL (full DDL w/ index defs too), a few rows of sample data, and most importantly the execution plan.
December 9, 2010 at 12:47 pm
John Rowan (12/9/2010)
So the clustered index is a composit index? What other columns are in it?It would help if you could post the table DDL (full DDL w/ index defs too), a few rows of sample data, and most importantly the execution plan.
Initially just the column ID was the clusterd index on both tables, but I had included column Status to the clustered index for my testing
I will post the DDL shortly, meanwhile I tried this and it seemed to run much faster.Could you have a look at it and confirm if my logic is correct
SELECT DISTINCT A.Status INTO #TEMP FROM dbo.Test1 A where A.Status IS NOT NULL
SELECT C.Status from #TEMP C
LEFT JOIN dbo.Test2 B
ON C.Status= B.Status
WHERE B.Status IS NULL
IT took roughly 1 minute
December 9, 2010 at 12:55 pm
An execution plan here would help a bunch. You shouldn't have to bother with the temp table to make this work.
The challenge with adding Status to the clustered index as a secondary column is that you are not using the first column in the index in the query. So if your clustered index is ID, Status and you are not using ID in the query, its not really taking advantage of the index like it could.
The clustered index key is included in all non-clutered indexes, so creating a non-clustered index on the Status columns is probably the way to go here.
December 9, 2010 at 1:00 pm
I got your point John, let me give that a try first
December 12, 2010 at 11:32 am
I agree with John that the execution plan would be very informative, and that the temp table shouldn't be necessary.
Have you tried rewriting the query using an exists clause instead of the left outer join? since the population in table test2 is relatively small i would expect this to give a better result
something like:
select distinct a.status
from dbo.test1 a
where a.status not in (select distinct b.status
from dbo.test2 b)
and a.status is not null
Willem
December 12, 2010 at 6:28 pm
LOOKUP_BI-756009 (12/9/2010)
SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHEREB.Status IS NULL
AND A.Status IS NOT NULL
Number of rows in dbo.Test1 A = 150 000
Number of rows in dbo.Test2 B = 4500
There are clusterd indexes on both table on ID column
This query is taking 35 minutes.
[font="Arial Black"]How can I make it faster[/font]
I have tested the following
1) Included column A.Status & B.Status on the clustered index -Time taken 34 minutes
The query execution plan shows that the LEFT JOIN operation cost is 60% and the filter on B.Status IS NULL is 18%.These two task has the highest %
Just going by the names of the columns in the ON clause... you probably can't make this as fast as you'd like even with indexes. You've most likely made a many-to-many join which is really a Cartesian Join (ie: CROSS JOIN). You need additional JOIN criteria to make this problem go away.
Of course, that's a total guess on my part because none of us actually know anything about your tables or their columns. My suggestion would be to read and heed the article at the second link in my signature line below.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 13, 2010 at 10:42 am
I'm with Jeff on this, as usual. Show the estimated query plan and not the incredibly FAT join line. Put your mouse over it and see the telephone number that pops up for estimated rows.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 13, 2010 at 11:04 am
Yes, I tried John's response and did much more quicker. Thank You all.
December 14, 2010 at 7:37 am
LOOKUP_BI-756009 (12/13/2010)
Yes, I tried John's response and did much more quicker. Thank You all.
Still, I believe you have a many-to-many join and a bit of extra criteria would help. If additional criteria is not possible, you may want to have a look at the overall design of the tables and data... a change may be in order. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
December 15, 2010 at 7:00 pm
Please test Query SQL:
SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHERE A.Status IS NOT NULL
AND B.Status IS NULL
Because of Number of rows in dbo.Test1 A more than Number of rows in dbo.Test1 B
😛
December 15, 2010 at 10:50 pm
250128836 (12/15/2010)
Please test Query SQL:SELECT DISTINCT A.Status
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B
ON A.Status= B.Status
WHERE A.Status IS NOT NULL
AND B.Status IS NULL
Because of Number of rows in dbo.Test1 A more than Number of rows in dbo.Test1 B
😛
Not being picky - well Ok I am but the above query makes no sence....
as the query is internaly processed
Step 1
FROM dbo.Test1 A
LEFT JOIN dbo.Test2 B ON A.Status = B.Status
every thing a AND those IN b that has the same status AS A
A LEFT JOIN will show every Test1 status and all the matching Test2.status records Nulls are not evaluated
hence negating the need for the where clause becauce only B records with an equalling A Status will be returned.
Left joins, right joins, and Inner joins are 'Exclusive' joins if you want to use the following where clause
WHERE A.Status IS NOT NULL
AND B.Status IS NULL
Then you will need to use the inclusive OUTER join and then evaluate for nulls.
Lastly remove the Distinct keyword and include instead a GROUP BY A.status because the Group by clause is actioned after the where clause and you are forcing SQL to double handle its internal grouping.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply