My query is just too slow - Please help

  • I need to query a master table to return records where the master has two detail records with certain values.

    For example with tables

    Master

     MasterID

    Details

     DetailID

     MasterID

     Value

    I have tried

     Select * From Master

     INNER JOIN Details D1 ON Master.MasterID = Details.MasterID

     INNER JOIN Details D2 ON Master.MasterID = Details.MasterID

     WHERE D1.Value = 'Criteria1Value' AND D2.Value = = 'Criteria2Value'

    this seems to work but is very slow on data with 10,000 master records and 900,000 detail records.

    I would appreciated it if someone could suggest a better solution.

    Simon.

     

  • Hi Simon,

    There's nothing wrong with that query (except for the typo ).

    What indexes do you have on the tables? What is the query plan telling you is taking the time?

    This article might help you optimise things...

    http://www.sqlservercentral.com/columnists/jsack/sevenshowplanredflags.asp

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • IMO this notation is more comprehensive ...

    Select * From Master

     INNER JOIN Details D1

    ON Master.MasterID = Details.MasterID 

                           and D1.Value = 'Criteria1Value' 

     INNER JOIN Details D2

    ON Master.MasterID = Details.MasterID

       and D2.Value = 'Criteria2Value'

    provide indexes for Details.MasterID and Details.value if you can

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I'm honestly not sure it would be any faster because I'm still learning on SQL optimization, but you could try eliminating the join as follows:

     Select * From Master

     WHERE MasterID IN ( SELECT MasterID From Details WHERE Value = 'Criteria1Value' )

               AND MasterID IN ( SELECT MasterID From Details WHERE Value = 'Criteria2Value' )

    Again, this might be slower than your query, but I'm just brainstorming.

    Edited this because I tend to get EXISTS and IN confused.  Not sure why.

  • alzdba - That re-working of the query could potentially produce different results.  Also, for the OP, it might be faster.  See this article for a discussion of this:

    http://www.sqlservercentral.com/columnists/sjones/outerjointrouble.asp

    Julie, that is indeed another way to approach this issue, but using IN rather than an inner join is almost always a bad idea (although not always).

    Simon, as already noted, look at the query plan and the indexing.  Also seriously consider getting rid of the Select * and replacing that with only the columns you need.

  • thank you for the quick replies.

    The query above is just psuedo code, but I have indexes on the ID's (int) used in the join. I don't think I can do much with the value field as it is text and varies quite a bit.

    I don't use * in my real query.

    Regarding the different notation of using the D1.value in the ON part of the query, I have found this to run slower.

    The problem I have is that the WHERE clause is also a lot more complicated and joins quite a few more times for different values.

    I wil ltry out the nested Select statements to see how quick that runs.

    I'm on quite a tight timescale at the moment so I doubt I can figure out the optimisation myself so any further ideas would be helpful.

  • Without seeing the actual query and DDL, we can't be of much help.  A query plan would be really helpful as well.

  • A quick fix might be to see where your "Value" match may be slowing down.  I would suggest using alzdba's query and commenting out sections to see where the slow down might be occurring... 

    SELECT *

    FROM Master

       INNER JOIN Details D1 ON( Master.MasterID = Details.MasterID

                                           AND D1.Value = 'Criteria1Value')

       INNER JOIN Details D2 ON( Master.MasterID = Details.MasterID

                                           AND D2.Value = 'Criteria2Value')

    -----------------------------------------------------------------------------------------

    SELECT *

    FROM Master

       INNER JOIN Details D1 ON( Master.MasterID = Details.MasterID)

    --                                       AND D1.Value = 'Criteria1Value')

       INNER JOIN Details D2 ON( Master.MasterID = Details.MasterID

                                           AND D2.Value = 'Criteria2Value')

    -----------------------------------------------------------------------------------------

    SELECT *

    FROM Master

       INNER JOIN Details D1 ON( Master.MasterID = Details.MasterID

                                           AND D1.Value = 'Criteria1Value')

       INNER JOIN Details D2 ON( Master.MasterID = Details.MasterID)

    --                                       AND D2.Value = 'Criteria2Value')

    I wasn't born stupid - I had to study.

  • If the Details table is usually accessed this way, I'd try changing the clustered index to MasterID, Value.  If that isn't your primary key on that table (such as if it the PK is DetailID), then the primary key will be nonclustered, which is fine 99% of the time when the PK is an identity.

    If you were not pulling many rows, then a nonclustered index on those values would help vs. not having it.  But if you're pulling more than a few rows, then the nonclustered version of that index quickly becomes more expensive then a raw table scan (and you're probably seeing lots of table scans on the big table here), and will be ignored by the optimizer. 

    So, my advice is to give the clustered index a shot.

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • Julie. Your method seems to solve the speed issue in most cases. Queries that took 10 minutes now take 3 seconds !

    However as the WHERE clause becomes more complex looking for different combinations of Details.Value then it slows down somewhat.

    This seems to be caused by using an OR in the WHERE clause. This can be solved by re-writing the where clause to use only 'AND' and using multiple queries.

    For example a where clause containing

    (D1.Value = 'value1' AND D2.Value = 'value2') AND (D3.Value = 'value3' OR D4.Value = 'value4')

    would take 21 seconds.

    becomes two queries

    1. (D1.Value = 'value1' AND D2.Value = 'value2' AND D3.Value = 'value3')

    2. (D1.Value = 'value1' AND D2.Value = 'value2' AND D4.Value = 'value4')

    which take 4 seconds in total.

    Not very elegant but seems to work. As my .net program already creates this query in theory I can write something to work out the change to the where clause.

    It still feels like this should not be so difficult though

    Simon.

  • Eddie,

    This is interesting. In most cases I would be returning 1,000's if not 10,000's of results so your suggestion is well worth trying out.

    At the moment I have PK on the DetailID field which is an Identity field.

    I will do some testing tomorrow and see if there is a difference.

  • >>Select * From Master

    I think an obvious performance culprit has been overlooked: "Select *"

    Do you really need all columns from both tables ? If you slap a covering index on the Details table indexing just the [MasterID] and [Value] columns, and remove the "Select *", the query could be satisfied with only index page hits with no need to read the data pages from the Detail table.

    Also, removing 2 joins to the Detail table may help:

    Select M.*

    From Master As M

    Inner Join

    (

      Select MasterID

      From Details

      Where Value In ('Criteria1Value', 'Criteria2Value')

      Group By MasterID

      Having Count(*) = 2

    ) dt 

    On dt.MasterID = M.MasterID

      

  • PW,

    I don't use * in my actual query. The exmaple I have provided is more of a general problem, the queries I have in practice can have very involved WHERE clause elements.

    Your suggestion looks good but I'm not sure how I would use it in more complex examples for example the below is my original example extended slightly

    Master

     MasterID (Identity)

    Details

     DetailID (Identity)

     MasterID (int)

     Type (int)

     Value (nvarchar)

    SELECT Master.MasterID FROM Master

    INNER JOIN Details D1 ON Details.DetailID = Master.MasterID

    INNER JOIN Details D2 ON Details.DetailID = Master.MasterID

    INNER JOIN Details D3 ON Details.DetailID = Master.MasterID

    WHERE (D1.TypeID = 1 AND D1.Value = 'Value1') AND (D2.TypeID = 56 AND D2.Value = 'Value2' OR (D3.TypeID = 25 AND D3.Value = 'Value3'))

    All of this is part of an application that allows user defined searches on a system, hence the WHERE clause needs to be generated by my program and is not fixed.

  • You MUST have clustered index on MasterId,TypeID columns in Details table.

    _____________
    Code for TallyGenerator

  • for this query alone, I would advise only a non-clustering (but covering) index since there is no extra data-access.

    In general an " exists (select ... where correlated predicate = xxx) " is faster than an in-list

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply