May 4, 2006 at 9:21 am
I need to query a master table to return records where the master has two detail records with certain values.
For example with tables
Master
MasterID
Details
DetailID
MasterID
Value
I have tried
Select * From Master
INNER JOIN Details D1 ON Master.MasterID = Details.MasterID
INNER JOIN Details D2 ON Master.MasterID = Details.MasterID
WHERE D1.Value = 'Criteria1Value' AND D2.Value = = 'Criteria2Value'
this seems to work but is very slow on data with 10,000 master records and 900,000 detail records.
I would appreciated it if someone could suggest a better solution.
Simon.
May 4, 2006 at 9:38 am
Hi Simon,
There's nothing wrong with that query (except for the typo ).
What indexes do you have on the tables? What is the query plan telling you is taking the time?
This article might help you optimise things...
http://www.sqlservercentral.com/columnists/jsack/sevenshowplanredflags.asp
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 4, 2006 at 9:43 am
IMO this notation is more comprehensive ...
Select * From Master
INNER JOIN Details D1
ON Master.MasterID = Details.MasterID
and D1.Value = 'Criteria1Value'
INNER JOIN Details D2
ON Master.MasterID = Details.MasterID
and D2.Value = 'Criteria2Value'
provide indexes for Details.MasterID and Details.value if you can
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 4, 2006 at 9:44 am
I'm honestly not sure it would be any faster because I'm still learning on SQL optimization, but you could try eliminating the join as follows:
Select * From Master
WHERE MasterID IN ( SELECT MasterID From Details WHERE Value = 'Criteria1Value' )
AND MasterID IN ( SELECT MasterID From Details WHERE Value = 'Criteria2Value' )
Again, this might be slower than your query, but I'm just brainstorming.
Edited this because I tend to get EXISTS and IN confused. Not sure why.
May 4, 2006 at 9:55 am
alzdba - That re-working of the query could potentially produce different results. Also, for the OP, it might be faster. See this article for a discussion of this:
http://www.sqlservercentral.com/columnists/sjones/outerjointrouble.asp
Julie, that is indeed another way to approach this issue, but using IN rather than an inner join is almost always a bad idea (although not always).
Simon, as already noted, look at the query plan and the indexing. Also seriously consider getting rid of the Select * and replacing that with only the columns you need.
May 4, 2006 at 10:26 am
thank you for the quick replies.
The query above is just psuedo code, but I have indexes on the ID's (int) used in the join. I don't think I can do much with the value field as it is text and varies quite a bit.
I don't use * in my real query.
Regarding the different notation of using the D1.value in the ON part of the query, I have found this to run slower.
The problem I have is that the WHERE clause is also a lot more complicated and joins quite a few more times for different values.
I wil ltry out the nested Select statements to see how quick that runs.
I'm on quite a tight timescale at the moment so I doubt I can figure out the optimisation myself so any further ideas would be helpful.
May 4, 2006 at 10:42 am
Without seeing the actual query and DDL, we can't be of much help. A query plan would be really helpful as well.
May 4, 2006 at 10:44 am
A quick fix might be to see where your "Value" match may be slowing down. I would suggest using alzdba's query and commenting out sections to see where the slow down might be occurring...
SELECT *
FROM Master
INNER JOIN Details D1 ON( Master.MasterID = Details.MasterID
AND D1.Value = 'Criteria1Value')
INNER JOIN Details D2 ON( Master.MasterID = Details.MasterID
AND D2.Value = 'Criteria2Value')
-----------------------------------------------------------------------------------------
SELECT *
FROM Master
INNER JOIN Details D1 ON( Master.MasterID = Details.MasterID)
-- AND D1.Value = 'Criteria1Value')
INNER JOIN Details D2 ON( Master.MasterID = Details.MasterID
AND D2.Value = 'Criteria2Value')
-----------------------------------------------------------------------------------------
SELECT *
FROM Master
INNER JOIN Details D1 ON( Master.MasterID = Details.MasterID
AND D1.Value = 'Criteria1Value')
INNER JOIN Details D2 ON( Master.MasterID = Details.MasterID)
-- AND D2.Value = 'Criteria2Value')
I wasn't born stupid - I had to study.
May 4, 2006 at 11:13 am
If the Details table is usually accessed this way, I'd try changing the clustered index to MasterID, Value. If that isn't your primary key on that table (such as if it the PK is DetailID), then the primary key will be nonclustered, which is fine 99% of the time when the PK is an identity.
If you were not pulling many rows, then a nonclustered index on those values would help vs. not having it. But if you're pulling more than a few rows, then the nonclustered version of that index quickly becomes more expensive then a raw table scan (and you're probably seeing lots of table scans on the big table here), and will be ignored by the optimizer.
So, my advice is to give the clustered index a shot.
-Eddie
Eddie Wuerch
MCM: SQL
May 4, 2006 at 11:30 am
Julie. Your method seems to solve the speed issue in most cases. Queries that took 10 minutes now take 3 seconds !
However as the WHERE clause becomes more complex looking for different combinations of Details.Value then it slows down somewhat.
This seems to be caused by using an OR in the WHERE clause. This can be solved by re-writing the where clause to use only 'AND' and using multiple queries.
For example a where clause containing
(D1.Value = 'value1' AND D2.Value = 'value2') AND (D3.Value = 'value3' OR D4.Value = 'value4')
would take 21 seconds.
becomes two queries
1. (D1.Value = 'value1' AND D2.Value = 'value2' AND D3.Value = 'value3')
2. (D1.Value = 'value1' AND D2.Value = 'value2' AND D4.Value = 'value4')
which take 4 seconds in total.
Not very elegant but seems to work. As my .net program already creates this query in theory I can write something to work out the change to the where clause.
It still feels like this should not be so difficult though
Simon.
May 4, 2006 at 11:33 am
Eddie,
This is interesting. In most cases I would be returning 1,000's if not 10,000's of results so your suggestion is well worth trying out.
At the moment I have PK on the DetailID field which is an Identity field.
I will do some testing tomorrow and see if there is a difference.
May 4, 2006 at 3:36 pm
>>Select * From Master
I think an obvious performance culprit has been overlooked: "Select *"
Do you really need all columns from both tables ? If you slap a covering index on the Details table indexing just the [MasterID] and [Value] columns, and remove the "Select *", the query could be satisfied with only index page hits with no need to read the data pages from the Detail table.
Also, removing 2 joins to the Detail table may help:
Select M.*
From Master As M
Inner Join
(
Select MasterID
From Details
Where Value In ('Criteria1Value', 'Criteria2Value')
Group By MasterID
Having Count(*) = 2
) dt
On dt.MasterID = M.MasterID
May 5, 2006 at 1:24 am
PW,
I don't use * in my actual query. The exmaple I have provided is more of a general problem, the queries I have in practice can have very involved WHERE clause elements.
Your suggestion looks good but I'm not sure how I would use it in more complex examples for example the below is my original example extended slightly
Master
MasterID (Identity)
Details
DetailID (Identity)
MasterID (int)
Type (int)
Value (nvarchar)
SELECT Master.MasterID FROM Master
INNER JOIN Details D1 ON Details.DetailID = Master.MasterID
INNER JOIN Details D2 ON Details.DetailID = Master.MasterID
INNER JOIN Details D3 ON Details.DetailID = Master.MasterID
WHERE (D1.TypeID = 1 AND D1.Value = 'Value1') AND (D2.TypeID = 56 AND D2.Value = 'Value2' OR (D3.TypeID = 25 AND D3.Value = 'Value3'))
All of this is part of an application that allows user defined searches on a system, hence the WHERE clause needs to be generated by my program and is not fixed.
May 5, 2006 at 3:00 am
You MUST have clustered index on MasterId,TypeID columns in Details table.
_____________
Code for TallyGenerator
May 5, 2006 at 3:05 am
for this query alone, I would advise only a non-clustering (but covering) index since there is no extra data-access.
In general an " exists (select ... where correlated predicate = xxx) " is faster than an in-list
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply