COALESCE in the setting relationships

  • I had a query that was running very slowly and I couldn't figure out what was going on.  Our new DBA ran the query through a query optimizer in TOAD and it spit out some recommended changes.  They contained something I had never seen before and have been unable to find reference to online.

    On several of my joins it recommended we turn      

    INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER

    ON SAINVOICE.COMPANY = CUSTORDER.COMPANY 

    AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR

    into

    INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER

    ON SAINVOICE.COMPANY = COALESCE (CUSTORDER.COMPANY , CUSTORDER.COMPANY) 

    AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR

    The field that it recommends using COALESCE on only contains one of 3 values with null not being an option.  I can't figure out what this is doing and why it works so damn well.  Has anyone ever used this or seen it used?

     

    Set nocount on

    SELECT DISTINCT

     SAINVCLINE.COMPANY,

     SAINVCLINE.INVC_PREFIX,

     SAINVCLINE.INVC_NUMBER,

     SAINVCLINE.LINE_NBR,

     SAINVCLINE.ITEM,

     SAINVCLINE.ITEM_DESC,

     SAINVCLINE.QUANTITY,

     SAINVCLINE.SELL_UNIT_PRC,

     SAINVCLINE.UNIT_COST,

     SAINVCLINE.R_YEAR,

     SAINVCLINE.PERIOD,

     SAINVOICE.TRANS_USER4 as CUSTNUM,

     SALESREP.NAME,

     SAINVOICE.ORDER_NBR,

     SAINVOICE.INVOICE_DATE,

     CUSTORDER.SHIP_TO_NAME,

     CUSTORDER.SHIP_TO_ADDR1,

     CUSTORDER.SHIP_TO_CITY,

     CUSTORDER.SHIP_TO_STATE,

     CUSTORDER.SHIP_TO_ZIP,

     SAINVCLINE.SALES_MINCL,

     SAINVCLINE.SALES_MAJCL,

     CASE

     WHEN B.VENDOR in ('   904288', '   903690') then 'Network Appliance'

     ELSE 'nSeries'

     End as Vendor

    FROM LAW_PROD.dbo.SAINVCLINE SAINVCLINE

         INNER JOIN LAW_PROD.dbo.SAINVOICE SAINVOICE

     ON SAINVCLINE.COMPANY = COALESCE (SAINVOICE.COMPANY , SAINVOICE.COMPANY)

     AND SAINVCLINE.INVC_PREFIX = SAINVOICE.INVC_PREFIX

     AND SAINVCLINE.INVC_NUMBER = SAINVOICE.INVC_NUMBER

         INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER

     ON SAINVOICE.COMPANY = COALESCE (CUSTORDER.COMPANY , CUSTORDER.COMPANY) 

     AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR

     INNER JOIN LAW_PROD.dbo.POLINESRC A

     ON SAINVOICE.ORDER_NBR = A.SOURCE_DOC_N

     AND SAINVOICE.COMPANY = A.COMPANY --COALESCE (A.COMPANY , A.COMPANY)

     INNER JOIN LAW_PROD.dbo.PORECLINE B

     ON A.PO_NUMBER = B.PO_NUMBER

     AND A.LINE_NBR = B.LINE_NBR

     AND A.COMPANY = B.COMPANY --COALESCE (B.COMPANY , B.COMPANY)

     INNER JOIN LAW_PROD.dbo.APVENMAST C

     ON B.VENDOR = C.VENDOR 

     INNER JOIN LAW_PROD.dbo.SALESREP SALESREP

     ON SAINVCLINE.SALESMAN = SALESREP.SALESMAN 

    WHERE SAINVOICE.INVOICE_DATE between '1/1/06' and '11/15/06' --@StartDate and @EndDate

     AND (B.VENDOR = '   903690' or B.VENDOR = '   904228')

     AND SAINVCLINE.ITEM <> 'FREIGHT'

    OPTION (MAXDOP 1)

  • Interesting. It could act as a non-vendor specific JOIN hint.

    If you look at the query plans for the original and COALESCE version, is there a difference between the type of join between the tables?

    (eg Look at SAINVCLINE  and SAINVOICE)

    If there is a difference, does ISNULL work in the same way?

     

  • Actually I don't see any point of this replacement.

    COALESCE (CUSTORDER.COMPANY , CUSTORDER.COMPANY)  = CUSTORDER.COMPANY

    all the time.

    This makes query only slower.

    Probably he ment this:

    INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER

    ON SAINVOICE.COMPANY = COALESCE (CUSTORDER.COMPANY , SAINVOICE.COMPANY ) 

    AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR

    If this is the case I would suggest:

    INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER

    ON (SAINVOICE.COMPANY = CUSTORDER.COMPANY OR (CUSTORDER.COMPANY  IS NULL AND SAINVOICE.COMPANY IS NULL)&nbsp

    AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR

    Looks horribly long but on large tables it's way more effective than any kind of COALESCE or ISNULL options.

    MAke sure there are indexes on all joined columns.

    _____________
    Code for TallyGenerator

  • Sergiy,

    While appreciating your comments about the join expression and the need for correct indexes, I think the COALESCE functions here may be tricking the optimizer. (Gabola did say that the COALESCE was produced from a commercial optimizing product.)

    I find the optimizer difficult to understand, but think, when there are a lot of joins, it is unlikely to get the best join plan for more than 3 or 4 of those joins. I presume, but do not know, that it:

    1. goes through the joins in order and works out the best join type. (Merge/Hash/Nested Loop)

    2. decides when it has spent enough time on optimization and sets any remaining joins to nested loop.

    If my assumptions here are wrong, would someone please enlighten me.

    Assuming the above is correct, then maybe a COALESCE function is enough to convince the optimizer not to optimize that join and to optimize a join further down instead. A merge/hash join, instead of a nested loop join, further down the list may be enough to increase the performance of the query. (An alternative could be to re-order the joins but this may upset a carefully crafted deadlock avoidance strategy.)

    I found Gabola’s post interesting as this could provide a straight forward way of finding the best join type for each join, in a statement with a lot of joins, after indexes and join expressions have been considered. I envisaged:

    1. updating the statistics.

    2. progressively going through the query, adding COALESCE to the next join, running the query and making a note, from the query plan, of the join type for the first join without a COALESCE.

    3. removing the COALESCEs  and adding the join hints previously noted.

    If Gabola does not post a reply, I may try to make time and set up a test of this idea later next week.

    As your knowledge, and logic, are obviously considerably better than mine, I would be grateful for your thoughts on the above; even if it is only to suggest that I entrust myself to the care of the local lunatic asylum.

    Thank you,

    Ken

     

  • Gabola71, what does the COALESCE end up doing to your query plan?

    Ken and Sergiy,

    Ken's theory is interesting.  I too have seen 3rd party optimizing tools (Quest Cenral) where COALESCE was used to 'optimize' SQL code.  I put optimize in quotes because I've never seen one of these tools that have impressed me.  There is no way for these tools to break down the SQL code and restructure it in a more efficient manner so what they do is to play around with different combinations of locking hints, index hints,  and, yes, COALESCEing JOINs.  I was hoping that our developers could use the tool to optimize their SQL code when, in fact, these tools' only way of optimizing is by applying hints.  This means that it will even optimize very poorly written SQL by simply adding hints.  I took one look at what the tool was doing and have never tried it again, but it I've always been curious as to what it was trying to do by COALESCEing the JOINs.  Like Sergiy, one would think that it is redundant to compare 2 identical values in COALESCE and that it would only result in additional overhead.  While I do not plan on putting in any time to play with this one, I would be very curious if anyone comes up with some proof on this one.

    Have a good weekend!!

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks for all the responses.  I'll try to run full execution plans this weekend for the query using the COALESCE JOIN and the previous.  I'm a pretty competent coder, I had never seen this used and just looking at it doesn't make sense like one of the posters said.  But this addition took a query that was running for 15 mins down to 30 seconds.  I'll admit that some other hints will probably achieve the same results but was just curious if anyone knew what exactely the COALESCE(D) join was doing.

  • Gabola71,

    Have you managed to get the execution plans for the query before and after COALESCE was applied?

    If you are short on time, it would be good just to compare the results of each query with SET SHOWPLAN_TEXT ON.

    Regards,

    Ken

  • I'm curious what you found as well.

    Thanks,

     

    Mike

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply