November 16, 2006 at 11:11 am
I had a query that was running very slowly and I couldn't figure out what was going on. Our new DBA ran the query through a query optimizer in TOAD and it spit out some recommended changes. They contained something I had never seen before and have been unable to find reference to online.
On several of my joins it recommended we turn
INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER
ON SAINVOICE.COMPANY = CUSTORDER.COMPANY
AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR
into
INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER
ON SAINVOICE.COMPANY = COALESCE (CUSTORDER.COMPANY , CUSTORDER.COMPANY)
AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR
The field that it recommends using COALESCE on only contains one of 3 values with null not being an option. I can't figure out what this is doing and why it works so damn well. Has anyone ever used this or seen it used?
Set nocount on
SELECT DISTINCT
SAINVCLINE.COMPANY,
SAINVCLINE.INVC_PREFIX,
SAINVCLINE.INVC_NUMBER,
SAINVCLINE.LINE_NBR,
SAINVCLINE.ITEM,
SAINVCLINE.ITEM_DESC,
SAINVCLINE.QUANTITY,
SAINVCLINE.SELL_UNIT_PRC,
SAINVCLINE.UNIT_COST,
SAINVCLINE.R_YEAR,
SAINVCLINE.PERIOD,
SAINVOICE.TRANS_USER4 as CUSTNUM,
SALESREP.NAME,
SAINVOICE.ORDER_NBR,
SAINVOICE.INVOICE_DATE,
CUSTORDER.SHIP_TO_NAME,
CUSTORDER.SHIP_TO_ADDR1,
CUSTORDER.SHIP_TO_CITY,
CUSTORDER.SHIP_TO_STATE,
CUSTORDER.SHIP_TO_ZIP,
SAINVCLINE.SALES_MINCL,
SAINVCLINE.SALES_MAJCL,
CASE
WHEN B.VENDOR in (' 904288', ' 903690') then 'Network Appliance'
ELSE 'nSeries'
End as Vendor
FROM LAW_PROD.dbo.SAINVCLINE SAINVCLINE
INNER JOIN LAW_PROD.dbo.SAINVOICE SAINVOICE
ON SAINVCLINE.COMPANY = COALESCE (SAINVOICE.COMPANY , SAINVOICE.COMPANY)
AND SAINVCLINE.INVC_PREFIX = SAINVOICE.INVC_PREFIX
AND SAINVCLINE.INVC_NUMBER = SAINVOICE.INVC_NUMBER
INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER
ON SAINVOICE.COMPANY = COALESCE (CUSTORDER.COMPANY , CUSTORDER.COMPANY)
AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR
INNER JOIN LAW_PROD.dbo.POLINESRC A
ON SAINVOICE.ORDER_NBR = A.SOURCE_DOC_N
AND SAINVOICE.COMPANY = A.COMPANY --COALESCE (A.COMPANY , A.COMPANY)
INNER JOIN LAW_PROD.dbo.PORECLINE B
ON A.PO_NUMBER = B.PO_NUMBER
AND A.LINE_NBR = B.LINE_NBR
AND A.COMPANY = B.COMPANY --COALESCE (B.COMPANY , B.COMPANY)
INNER JOIN LAW_PROD.dbo.APVENMAST C
ON B.VENDOR = C.VENDOR
INNER JOIN LAW_PROD.dbo.SALESREP SALESREP
ON SAINVCLINE.SALESMAN = SALESREP.SALESMAN
WHERE SAINVOICE.INVOICE_DATE between '1/1/06' and '11/15/06' --@StartDate and @EndDate
AND (B.VENDOR = ' 903690' or B.VENDOR = ' 904228')
AND SAINVCLINE.ITEM <> 'FREIGHT'
OPTION (MAXDOP 1)
November 16, 2006 at 11:43 am
Interesting. It could act as a non-vendor specific JOIN hint.
If you look at the query plans for the original and COALESCE version, is there a difference between the type of join between the tables?
(eg Look at SAINVCLINE and SAINVOICE)
If there is a difference, does ISNULL work in the same way?
November 16, 2006 at 4:03 pm
Actually I don't see any point of this replacement.
COALESCE (CUSTORDER.COMPANY , CUSTORDER.COMPANY) = CUSTORDER.COMPANY
all the time.
This makes query only slower.
Probably he ment this:
INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER
ON SAINVOICE.COMPANY = COALESCE (CUSTORDER.COMPANY , SAINVOICE.COMPANY )
AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR
If this is the case I would suggest:
INNER JOIN LAW_PROD.dbo.CUSTORDER CUSTORDER
ON (SAINVOICE.COMPANY = CUSTORDER.COMPANY OR (CUSTORDER.COMPANY IS NULL AND SAINVOICE.COMPANY IS NULL) 
AND SAINVOICE.ORDER_NBR = CUSTORDER.ORDER_NBR
Looks horribly long but on large tables it's way more effective than any kind of COALESCE or ISNULL options.
MAke sure there are indexes on all joined columns.
_____________
Code for TallyGenerator
November 17, 2006 at 11:42 am
Sergiy,
While appreciating your comments about the join expression and the need for correct indexes, I think the COALESCE functions here may be tricking the optimizer. (Gabola did say that the COALESCE was produced from a commercial optimizing product.)
I find the optimizer difficult to understand, but think, when there are a lot of joins, it is unlikely to get the best join plan for more than 3 or 4 of those joins. I presume, but do not know, that it:
1. goes through the joins in order and works out the best join type. (Merge/Hash/Nested Loop)
2. decides when it has spent enough time on optimization and sets any remaining joins to nested loop.
If my assumptions here are wrong, would someone please enlighten me.
Assuming the above is correct, then maybe a COALESCE function is enough to convince the optimizer not to optimize that join and to optimize a join further down instead. A merge/hash join, instead of a nested loop join, further down the list may be enough to increase the performance of the query. (An alternative could be to re-order the joins but this may upset a carefully crafted deadlock avoidance strategy.)
I found Gabola’s post interesting as this could provide a straight forward way of finding the best join type for each join, in a statement with a lot of joins, after indexes and join expressions have been considered. I envisaged:
1. updating the statistics.
2. progressively going through the query, adding COALESCE to the next join, running the query and making a note, from the query plan, of the join type for the first join without a COALESCE.
3. removing the COALESCEs and adding the join hints previously noted.
If Gabola does not post a reply, I may try to make time and set up a test of this idea later next week.
As your knowledge, and logic, are obviously considerably better than mine, I would be grateful for your thoughts on the above; even if it is only to suggest that I entrust myself to the care of the local lunatic asylum.
Thank you,
Ken
November 17, 2006 at 1:08 pm
Gabola71, what does the COALESCE end up doing to your query plan?
Ken and Sergiy,
Ken's theory is interesting. I too have seen 3rd party optimizing tools (Quest Cenral) where COALESCE was used to 'optimize' SQL code. I put optimize in quotes because I've never seen one of these tools that have impressed me. There is no way for these tools to break down the SQL code and restructure it in a more efficient manner so what they do is to play around with different combinations of locking hints, index hints, and, yes, COALESCEing JOINs. I was hoping that our developers could use the tool to optimize their SQL code when, in fact, these tools' only way of optimizing is by applying hints. This means that it will even optimize very poorly written SQL by simply adding hints. I took one look at what the tool was doing and have never tried it again, but it I've always been curious as to what it was trying to do by COALESCEing the JOINs. Like Sergiy, one would think that it is redundant to compare 2 identical values in COALESCE and that it would only result in additional overhead. While I do not plan on putting in any time to play with this one, I would be very curious if anyone comes up with some proof on this one.
Have a good weekend!!
November 17, 2006 at 1:54 pm
Thanks for all the responses. I'll try to run full execution plans this weekend for the query using the COALESCE JOIN and the previous. I'm a pretty competent coder, I had never seen this used and just looking at it doesn't make sense like one of the posters said. But this addition took a query that was running for 15 mins down to 30 seconds. I'll admit that some other hints will probably achieve the same results but was just curious if anyone knew what exactely the COALESCE(D) join was doing.
November 24, 2006 at 11:45 am
Gabola71,
Have you managed to get the execution plans for the query before and after COALESCE was applied?
If you are short on time, it would be good just to compare the results of each query with SET SHOWPLAN_TEXT ON.
Regards,
Ken
November 27, 2006 at 2:13 pm
I'm curious what you found as well.
Thanks,
Mike
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply